I'm trying to parse chat log files using C#. The problem I'm running into is that the format isn't really designed for parsing, as it doesn't use standard delimiters. Here's an example of a typical line from the file:
2010-08-09 02:07:54 [Message] Skylar Morris -> (ATL)City Waterfront: I'll be right back
date time messageType userName -> roomName: message
The fields I'd like to store are:
Date and time, joined as a DateTime type
messageType
userName
roomName
message
If the fields were separated by a standard delimiter like a space, tab, or comma, it would be fairly simple, but I'm at a loss on how to attack this.
As a follow up, using this code as a template:
List<String> fileContents = new List<String>();
string input = @"2010-08-09 02:07:54 [Message] Skylar Morris -> (ATL)City Waterfront: I'll be right back";
string pattern = @"(.*)\[(.*)\](.*)->(.+?):(.*)";
foreach (string result in Regex.Split(input, pattern))
{
    fileContents.Add(result.Trim());
}
I'm getting 7 elements: the 5 that are expected, plus one empty string before them and one after. How can I rectify this?
// Requires: using System.Linq;
foreach (string result in Regex.Split(input, pattern)
    .Where(part => !string.IsNullOrEmpty(part)))
{
    fileContents.Add(result.Trim());
}
OK, I managed to resolve it with the code above.
You know that old adage: "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."?
Well, in this case, you really do need regular expressions.
This one should cover you:
([\d]{4}-[\d]{2}-[\d]{2} [\d]{2}:[\d]{2}:[\d]{2}) \[([\w]+)\] ([a-zA-Z0-9 ]+) -> (\([\w]+\)[a-zA-Z0-9 ]+): (.*)
You should really test it, though. I just threw this together, and it may not handle everything you could see.
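As a rough sketch of how that pattern could be used, the following reads the capture groups from Regex.Match rather than Regex.Split and converts the first group to a DateTime. The ChatLine and ChatLogParser names are hypothetical, the ^ and $ anchors are an addition to the pattern, and returning null for a non-matching line is just one possible convention.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical container for one parsed line; the name and shape are illustrative.
public sealed class ChatLine
{
    public DateTime Timestamp { get; set; }
    public string MessageType { get; set; }
    public string UserName { get; set; }
    public string RoomName { get; set; }
    public string Message { get; set; }
}

public static class ChatLogParser
{
    // The pattern from the answer, with ^ and $ added (an assumption)
    // so that the whole line has to fit the layout.
    private static readonly Regex LinePattern = new Regex(
        @"^([\d]{4}-[\d]{2}-[\d]{2} [\d]{2}:[\d]{2}:[\d]{2}) \[([\w]+)\] " +
        @"([a-zA-Z0-9 ]+) -> (\([\w]+\)[a-zA-Z0-9 ]+): (.*)$");

    // Returns null when the line does not match or the timestamp is not a valid date.
    public static ChatLine Parse(string line)
    {
        Match m = LinePattern.Match(line);
        if (!m.Success)
        {
            return null;
        }

        // Groups[0] is the whole match; the captures start at index 1.
        DateTime timestamp;
        bool parsed = DateTime.TryParseExact(
            m.Groups[1].Value,
            "yyyy-MM-dd HH:mm:ss",
            CultureInfo.InvariantCulture,
            DateTimeStyles.None,
            out timestamp);
        if (!parsed)
        {
            return null;
        }

        return new ChatLine
        {
            Timestamp = timestamp,
            MessageType = m.Groups[2].Value,
            UserName = m.Groups[3].Value,
            RoomName = m.Groups[4].Value,
            Message = m.Groups[5].Value
        };
    }
}
```

Calling ChatLogParser.Parse on each line of the file and skipping the null results would give one ChatLine per well-formed line. User or room names containing characters outside a-z, A-Z, 0-9, and space would not match this pattern, so the character classes may need widening for real logs.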
Try this:
.*\[(.*)\](.*)->(.+?):(.*)
It uses the fact that the message type is in square brackets [], the name is between ] and ->, the room name is between -> and :, and the message is everything afterwards. :)
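A minimal sketch of the same delimiter idea, using Regex.Match instead of Regex.Split. Regex.Split returns the text outside the match along with the captured groups, and because the pattern consumes the whole line, the text before and after the match is empty; that is where the two extra elements come from. The LooseLineSplitter name is hypothetical. The leading group for the timestamp, the anchors, and the lazy quantifiers are assumptions added here so that a message containing [, ], or -> does not shift the fields.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LooseLineSplitter
{
    // Same delimiters as the pattern in the answer. Added here (assumptions):
    // a leading group for the timestamp, ^ and $ anchors, and lazy quantifiers
    // so each field stops at the first occurrence of its delimiter.
    private static readonly Regex Pattern =
        new Regex(@"^(.*?)\[(.*?)\](.*?)->(.+?):(.*)$");

    // Returns the five trimmed fields, or an empty list when the line does not match.
    public static List<string> Fields(string line)
    {
        var fields = new List<string>();
        Match m = Pattern.Match(line);
        if (!m.Success)
        {
            return fields;
        }

        // Skip Groups[0], which holds the entire matched line.
        for (int i = 1; i < m.Groups.Count; i++)
        {
            fields.Add(m.Groups[i].Value.Trim());
        }
        return fields;
    }
}
```

With this approach there are no empty leading or trailing elements to filter out. One limitation to keep in mind: a room name that itself contains a colon would be cut at that colon, since the room field ends at the first colon after ->.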