I have this log line from an Exchange server:
2010-05-20T01:53:33.097Z,12.10.53.144,,12.10.53.200,EXHUB-10,08CCC3F50C35F2D2;2010-05-20T01:53:32.128Z;0,EXHUB-10\Default EXHUB-10,SMTP,RECEIVE,829888,,norma@ccc.gov.my,,521647,1,,,"NEAC Sub-Working Group Meeting - Upgrade Skills of the Labour Force's and Enhance Vocational and Technical Training- 2:30 pm Monday May 24, 2010",lee.cheesung@gmail.com,<>,00A:
and I used this regex to match and group the fields:
(\d{4}-\d{2}-\d{2})(?:[\w\s]+)(\d+:\d+:\d+.\d+)(?:[\w+\d.]*),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(['"].*['"]|.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(?:(\d{4}-\d{2}-\d{2}\w\d{2}:\d{2}:\d{2}.\d+)(?:\w+)*)*(.*)
Basically, the fields in the log are separated by commas.
Unfortunately, if the user types a comma in the 'email subject' field, the log wraps that field in double quotes, as in the example above, where the date "Monday May 24, 2010" contains a comma:
.....521647,1,,,"NEAC Sub-Working Group Meeting - Upgrade Skills of the Labour Force's and Enhance Vocational and Technical Training- 2:30 pm Monday May 24, 2010",lee.keesung@gmail.com,.....
How can I capture the whole subject, including the comma but without the double quotes, in a specific group (the 19th group)?
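If you do want to stay with a regular expression, one common approach is to match one field at a time instead of writing one group per column. Below is a minimal sketch in Python (chosen only for illustration; the question does not say which language is in use). It assumes standard CSV-style quoting, i.e. a field containing a comma is wrapped in double quotes and a literal quote inside it is doubled; `FIELD_RE`, `split_fields` and `EXCERPT` are hypothetical names. The quoted alternative captures only the text between the quotes, so the subject comes back with its comma and without the quotes.

```python
import re

# One field per match: either a double-quoted field (group 1, the text between the
# quotes, where "" stands for a literal quote) or an unquoted run of non-commas (group 2).
# Assumes well-formed quoting: text between a closing quote and the next comma is skipped.
FIELD_RE = re.compile(r'(?:^|,)(?:"((?:[^"]|"")*)"|([^,]*))')


def split_fields(line):
    """Return the comma-separated fields of `line`, with surrounding quotes removed."""
    fields = []
    for match in FIELD_RE.finditer(line):
        quoted, plain = match.group(1), match.group(2)
        if quoted is not None:
            fields.append(quoted.replace('""', '"'))  # undo quote doubling
        else:
            fields.append(plain)
    return fields


# Excerpt of the log line from the question (the part shown between the "....." marks).
EXCERPT = (
    '521647,1,,,"NEAC Sub-Working Group Meeting - Upgrade Skills of the Labour '
    "Force's and Enhance Vocational and Technical Training- 2:30 pm Monday May 24, "
    '2010",lee.keesung@gmail.com'
)

if __name__ == "__main__":
    # Index 4 is the subject within this excerpt: comma kept, quotes removed.
    print(split_fields(EXCERPT)[4])
```

Empty fields (`,,`) come back as empty strings, so column positions stay stable and the subject can be picked by index just like a numbered group.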
You mention:
Basically, the information in the log is separated by the comma...also if a comma is part of the field the field will be double quoted.
which makes it a CSV file. Parsing CSV is a solved problem, so you need not reinvent the wheel: use a CSV parser from your language's library.
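As a minimal sketch of that advice, here is the sample line fed to Python's built-in `csv` module (Python is only an illustrative choice). `csv.reader` keeps the comma inside the quoted subject and strips the surrounding quotes. `LOG_LINE`, `SUBJECT_INDEX` and `parse_line` are hypothetical names, and the index 17 is an assumption counted by hand from this one sample line; it is also why the question's regex needs 19 groups, since that regex splits the timestamp into separate date and time groups.

```python
import csv

# The sample log line from the question. The subject is wrapped in double quotes
# because it contains a comma ("Monday May 24, 2010").
LOG_LINE = (
    "2010-05-20T01:53:33.097Z,12.10.53.144,,12.10.53.200,EXHUB-10,"
    "08CCC3F50C35F2D2;2010-05-20T01:53:32.128Z;0,EXHUB-10\\Default EXHUB-10,"
    "SMTP,RECEIVE,829888,,norma@ccc.gov.my,,521647,1,,,"
    '"NEAC Sub-Working Group Meeting - Upgrade Skills of the Labour Force\'s and '
    'Enhance Vocational and Technical Training- 2:30 pm Monday May 24, 2010",'
    "lee.cheesung@gmail.com,<>,00A:"
)

# Assumption: 0-based column of the subject, counted by hand from the sample line above.
SUBJECT_INDEX = 17


def parse_line(line):
    """Split one log line into fields using the standard CSV quoting rules."""
    # csv.reader accepts any iterable of lines; a one-element list is enough here.
    return next(csv.reader([line]))


if __name__ == "__main__":
    fields = parse_line(LOG_LINE)
    print(fields[SUBJECT_INDEX])  # subject with its comma, no surrounding quotes
```

To process a whole log file, you could pass the open file object (opened with `newline=""`, as the `csv` documentation recommends) to `csv.reader` instead of a one-element list, skipping any header lines that start with `#` if your files contain them.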
If you are using Perl, take a look at the Text::CSV module.
The line you gave appears to be in CSV format. Why not parse it with a CSV parser, such as:
- http://opencsv.sourceforge.net/
- http://supercsv.sourceforge.net/
For Java, use Apache Commons CSV:
http://commons.apache.org/sandbox/csv/