I have this pattern:
"^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(.+?)\" (\\d{3}) (\\d+|\\S+)"
I use it on the following Apache access log entry:
127.0.0.1 - - [16/Jul/2011:20:29:14 +0100] "GET /TestWebPages/MScAIS-SEWN-Search-Optimisation.html HTTP/1.1" 200 5569
Sometimes there is one more element after the 7th one, and sometimes there is not. E.g.
127.0.0.1 - - [16/Jul/2011:20:29:14 +0100] "GET /TestWebPages/MScAIS-SEWN-Search-Optimisation.html HTTP/1.1" 200 5569 –
Sometimes I have the - at the end, and sometimes it just doesn't exist.
How can I add this to my pattern? I tried using (\\S{0}) but it did not work!
Try adding: (\\s–){0,1}
which means zero or one occurrence of " –". If your log contains a plain hyphen rather than an en dash, use (\\s-){0,1} instead.
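Here is a rough sketch of how the whole pattern could look with an optional trailing element. It assumes Java's java.util.regex (the doubled backslashes in the question suggest a Java string literal). The class name AccessLogPatternDemo and the method trailingField are made up for illustration. Instead of a literal dash it uses (?:\\s(\\S+))?, so it does not matter whether the last character is a hyphen or an en dash.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AccessLogPatternDemo {
    // The pattern from the question, followed by an optional group:
    // (?:\s(\S+))? means zero or one occurrence of " <token>" before the end of the line.
    static final Pattern LOG_LINE = Pattern.compile(
        "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(.+?)\" (\\d{3}) (\\d+|\\S+)"
        + "(?:\\s(\\S+))?$");

    // Returns the optional 8th element, or null if it is absent or the line does not match.
    static String trailingField(String line) {
        Matcher m = LOG_LINE.matcher(line);
        return m.matches() ? m.group(8) : null;
    }

    public static void main(String[] args) {
        String base = "127.0.0.1 - - [16/Jul/2011:20:29:14 +0100] "
            + "\"GET /TestWebPages/MScAIS-SEWN-Search-Optimisation.html HTTP/1.1\" 200 5569";
        System.out.println(trailingField(base));        // line without the extra element
        System.out.println(trailingField(base + " -")); // line with a trailing "-"
    }
}
```

Both lines match. Group 8 is null for the first and holds the trailing token for the second, so the first seven groups stay exactly as they were.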
Try the question mark, which matches zero or one occurrence:
\S? (equivalent to \S{0,1})
or, for zero or more occurrences, the asterisk:
\S* (equivalent to \S{0,})
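To see why {0} from the question did not help, here is a small sketch comparing the quantifiers, again assuming Java's java.util.regex. QuantifierDemo and matchesWithTail are hypothetical names, and the input is cut down to the last field of the log line to keep it short.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // True if the whole input is a run of digits followed by whatever tailRegex allows.
    static boolean matchesWithTail(String tailRegex, String input) {
        return Pattern.matches("\\d+" + tailRegex, input);
    }

    public static void main(String[] args) {
        // {0} means exactly zero occurrences, so a line with a tail is rejected.
        System.out.println(matchesWithTail("( \\S){0}", "5569"));   // true
        System.out.println(matchesWithTail("( \\S){0}", "5569 -")); // false

        // ? means zero or one occurrence, so both forms are accepted.
        System.out.println(matchesWithTail("( \\S)?", "5569"));     // true
        System.out.println(matchesWithTail("( \\S)?", "5569 -"));   // true

        // \S* inside the optional group allows a tail token of any length.
        System.out.println(matchesWithTail("( \\S*)?", "5569 --")); // true
    }
}
```

So {0} only ever matches the empty string, while ? (or {0,1}) is what makes the element optional.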