I am a Hive newbie.
My question: our log file has a request field like "GET /img/home/search-user-ico.jpg HTTP/1.1". There are more than 10,000 records.
Example:
"GET /img/home/search-user-ico.jpg HTTP/1.1"
"GET /JavaScript/jquery-1.4.2.min.js HTTP/1.1"
"GET /ems/home HTTP/1.1"
"POST /ir HTTP/1.1"
"GET /CSS/jquery/themes/base/jquery.ui.button.css HTTP/1.1"
"GET /CSS/jquery/themes/base/images/ui-bg_glass_75_e6e6e6_1x400.png HTTP/1.1"
"GET /JavaScript/jquery/jquery-ui-1.8.5.custom.min.js HTTP/1.0"
From a field like "GET /img/home/search-user-ico.jpg HTTP/1.1", I want only the path, /img/home/search-user-ico.jpg, with the leading GET or POST and the trailing HTTP/1.1 removed. Please help me split this out using the string functions listed in the wiki. I tried some of the syntax from the wiki, but I am stuck.
I tried queries like these:
select regexp_extract(request,'a-zA-Za-zA-Z[a-zA-Z]',2) from logfile limit 10;
select regexp_extract(request,'GET(\s)([a-zA-Z])',2) from logfile limit 10;
select regexp_extract(request,'.?(\s)(.?)(\s)(.*?)',2) from logfile limit 10;
select regexp_extract(request,'.(\s)(.)(\s)(.*)',2) from logfile limit 10;
Thanks -Joe
I used RegexBuddy with the samples you provided and got just the URLs with this regex: ([\S]*) HTTP
This assumes the URL contains no literal spaces; encoded spaces are fine.
Plugged into a Hive query, it should look something like this:
select regexp_extract(request, ' (\\S*) HTTP', 1) from logfile;
(Note that there is a space before (\\S*) in the pattern. It might be fairly obvious, but I wanted to mention it in case it was missed.)
I have done a little testing in Hive and it works, at least with tests similar to the samples provided.
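If you want to check the pattern outside Hive first, here is a minimal Java sketch. Hive's regexp_extract uses Java regular expressions, so the same pattern string should behave the same way. The class name RequestPath and the method extractPath are made up for this illustration, and returning an empty string when nothing matches is a choice made for the sketch, not something taken from the Hive documentation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RequestPath {
    // Same pattern as in the Hive query: a space, a run of
    // non-whitespace characters (captured), then " HTTP".
    private static final Pattern PATH = Pattern.compile(" (\\S*) HTTP");

    // Hypothetical helper: returns capture group 1, or "" if the
    // pattern is not found (an assumption made for this sketch).
    public static String extractPath(String request) {
        Matcher m = PATH.matcher(request);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        String[] samples = {
            "GET /img/home/search-user-ico.jpg HTTP/1.1",
            "POST /ir HTTP/1.1",
            "GET /JavaScript/jquery/jquery-ui-1.8.5.custom.min.js HTTP/1.0"
        };
        // Print the extracted path for each sample request.
        for (String s : samples) {
            System.out.println(extractPath(s));
        }
    }
}
```

The pattern does not care whether the method is GET or POST, or whether the field value is wrapped in double quotes, because it only anchors on the space before the path and the " HTTP" after it.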