
Parsing Jetty log records

Developer https://www.devze.com 2023-02-27 15:07 Source: web

For the given input example:

70.80.110.200 -  -  [12/Apr/2011:05:47:34 +0000] "GET /notify/click?r=http://www.xxxxxx.com/hello_world&rt=1302587231462&iid=00000 HTTP/1.1" 302 0 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; FunWebProducts; HotbarSearchToolbar 1.1; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; AskTbFWV5/5.11.3.15590)" 4 4

I would like to define the following parsing logic (probably a regex):

  1. Extract the IP (four groups of up to three digits, separated by dots) => 70.80.110.200
  2. Extract the date => 12/Apr/2011
  3. Extract the time => 05:47:34
  4. Extract the URI (it sits inside the request string that starts with " and ends with ") => /notify/click?r=http://www.xxxxxx.com/hello_world&rt=1302587231462&iid=00000


Try with:

/^([0-9.]+).*?\[(\d+\/\w+\/\d+):(\d+:\d+:\d+).*?\].*?(\/[^ ]*).*$/

As expected, groups 1 through 4 hold the four values you specified; for example, .group(3) is the time.
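If the numeric group indices are hard to keep track of, the same expression can be written with named groups. The following is only a sketch: the class name NamedGroupsDemo, the parse helper and the group names ip, date, time and uri are made up for illustration, and the user-agent string in the sample line is shortened.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupsDemo {
    // Same expression as above, with a name attached to each of the four groups.
    static final Pattern LINE = Pattern.compile(
        "^(?<ip>[0-9.]+).*?\\[(?<date>\\d+/\\w+/\\d+):(?<time>\\d+:\\d+:\\d+).*?\\]"
        + ".*?(?<uri>/[^ ]*).*$");

    // Returns {ip, date, time, uri}, or null when the line does not match.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] {
            m.group("ip"), m.group("date"), m.group("time"), m.group("uri")
        };
    }

    public static void main(String[] args) {
        String line = "70.80.110.200 -  -  [12/Apr/2011:05:47:34 +0000] "
            + "\"GET /notify/click?r=http://www.xxxxxx.com/hello_world&rt=1302587231462&iid=00000 HTTP/1.1\" "
            + "302 0 \"-\" \"Mozilla/4.0\" 4 4";
        String[] fields = parse(line);
        System.out.println(String.join(" | ", fields));
    }
}
```

Note that the URI group simply takes the first slash after the closing bracket, so a request line without a path (for example OPTIONS * HTTP/1.1) would capture part of the protocol instead.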


Ensure Jetty is configured to do NCSA-compatible logging; then you can use any NCSA log analyzer to analyze the logs.

If you want to do it by hand, then this is a nice use case for regular expressions.
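As a rough sketch of the by-hand route, the pattern below anchors the URI on the quoted request string rather than on the first slash, and skips lines that do not look like log records. The class name LogScan, the uris helper and the pattern itself are assumptions made for this example, not something Jetty provides, and the sample line is shortened.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogScan {
    // Groups: 1 = IP, 2 = date, 3 = time, 4 = method, 5 = URI, 6 = protocol.
    static final Pattern LINE = Pattern.compile(
        "^(\\S+) .*?\\[([^:\\]]+):(\\d{2}:\\d{2}:\\d{2}) [^\\]]*\\] "
        + "\"(\\S+) (\\S+) ([^\"]*)\"");

    // Collects the request URI of every matching line; other lines are skipped.
    static List<String> uris(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.lookingAt()) { // match from the start of the line, ignore the tail
                out.add(m.group(5));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "70.80.110.200 -  -  [12/Apr/2011:05:47:34 +0000] "
                + "\"GET /notify/click?iid=00000 HTTP/1.1\" 302 0 \"-\" \"Mozilla/4.0\" 4 4",
            "not a log line");
        System.out.println(uris(lines));
    }
}
```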


Complete code sample (based on hsz's answer):

import java.util.regex.*;

public class RegexDemo {

  public static void main( String[] argv ) {
    String pat = "^([0-9.]*).*?\\[(\\d+\\/\\w+\\/\\d+):(\\d+:\\d+:\\d+).*?\\].*?(\\/[^ ]*).*$";
    Pattern p = Pattern.compile(pat);
    String target = "70.80.110.200 -  -  [12/Apr/2011:05:47:34 +0000] \"GET /notify/click?r=http://www.xxxxxx.com/hello_world&rt=1302587231462&iid=00000 HTTP/1.1\" 302 0 \"-\" \"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; FunWebProducts; HotbarSearchToolbar 1.1; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; AskTbFWV5/5.11.3.15590)\" 4 4";
    Matcher m = p.matcher(target);
    System.out.println("pattern: " + pat);
    System.out.println("target: " + target);

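    if (m.matches()) {
      System.out.println("found");
      // group(0) is the whole match; groups 1-4 are IP, date, time and URI
      for (int i=0; i <= m.groupCount(); ++i) {
        System.out.println(m.group(i));
      }
    } else {
      System.out.println("no match");
    }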
  }
}


You can try the following (it needs import java.util.regex.*;):

String s = "70.80.110.200 -  -  [12/Apr/2011:05:47:34 +0000] \"GET /notify/click?r=http://www.xxxxxx.com/hello_world&rt=1302587231462&iid=00000 HTTP/1.1\" 302 0 \"-\" \"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; FunWebProducts; HotbarSearchToolbar 1.1; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; AskTbFWV5/5.11.3.15590)\" 4 4";
Pattern p = Pattern.compile("^(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}).*?" + //ip
                            "\\[([^:]*):"+ //date
                            "(\\d{2}:\\d{2}:\\d{2}).*?\\].*?"+ //time
                            "(/[^\\s]*).*$"); //uri

Matcher m = p.matcher(s);
if(m.find()){
    String ip = m.group(1);
    String date = m.group(2);
    String time = m.group(3);
    String uri = m.group(4);
}
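The date and time come out as plain strings. If a real timestamp is needed, one option is to hand the whole bracketed field to java.time. This is a minimal sketch, assuming the field always has the form shown in the sample line (day/month-abbreviation/year:time, followed by a numeric zone offset); the class name LogTimestamp and its parse helper are hypothetical.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class LogTimestamp {
    // Locale.ENGLISH so that "Apr" parses regardless of the JVM's default locale.
    static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    // Expects the text between the square brackets, e.g. "12/Apr/2011:05:47:34 +0000".
    static OffsetDateTime parse(String bracketContent) {
        return OffsetDateTime.parse(bracketContent, FMT);
    }

    public static void main(String[] args) {
        OffsetDateTime ts = parse("12/Apr/2011:05:47:34 +0000");
        System.out.println(ts.toLocalDate() + " " + ts.toLocalTime());
    }
}
```

OffsetDateTime.parse throws a DateTimeParseException on input that does not fit the pattern, so callers would need to decide how to handle malformed records.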