
Parsing a text file into fields using multiple delimiter types

Developer https://www.devze.com 2023-02-11 18:42 Source: web

I'm attempting to parse log files from a chat using C#. The problem I'm running into is that the format isn't really designed for parsing, as it doesn't use standard delimiters. Here's an example of a typical line from the file:

 2010-08-09 02:07:54 [Message] Skylar Morris -> (ATL)City Waterfront: I'll be right back
 date time messageType userName -> roomName: message

The fields I'd like to store are:

Date and Time joined as a DateTime type

messageType

userName

roomName

message

If it were separable by a standard delimiter like space, tab, or comma, it would be fairly simple, but I'm at a loss on how to attack this.


As a follow-up, using this code as a template:

List<String> fileContents = new List<String>();
string input = @"2010-08-09 02:07:54 [Message] Skylar Morris -> (ATL)City Waterfront: I'll be right back";
string pattern = @"(.*)\[(.*)\](.*)->(.+?):(.*)";

foreach (string result in Regex.Split(input, pattern))
{
   fileContents.Add(result.Trim());
}

I'm getting 7 elements (one empty before and after) instead of the 5 that are expected. How can I rectify this?

foreach (string result in Regex.Split(input, pattern)
        .Where(s => !string.IsNullOrEmpty(s)))   // Where needs "using System.Linq;"
{
   fileContents.Add(result.Trim());
}

Ok, managed to resolve it with the above code.
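
The two empty entries appear because the pattern matches the whole line, so Regex.Split reports the empty text before and after the match along with the five captures. A minimal sketch of an alternative, assuming the same pattern and input as above, is to call Regex.Match and read the capture groups directly; the class and method names here are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SplitVersusMatch
{
    // Same pattern as in the question.
    private const string Pattern = @"(.*)\[(.*)\](.*)->(.+?):(.*)";

    // Returns the trimmed capture groups, or an empty list if the line does not match.
    public static List<string> ExtractFields(string line)
    {
        var fields = new List<string>();
        Match match = Regex.Match(line, Pattern);
        if (!match.Success)
        {
            return fields;
        }

        // Groups[0] is the whole match; the captures start at index 1.
        for (int i = 1; i < match.Groups.Count; i++)
        {
            fields.Add(match.Groups[i].Value.Trim());
        }
        return fields;
    }

    public static void Main()
    {
        string input = @"2010-08-09 02:07:54 [Message] Skylar Morris -> (ATL)City Waterfront: I'll be right back";
        foreach (string field in ExtractFields(input))
        {
            Console.WriteLine(field);
        }
    }
}
```

With this approach there is nothing to filter out, and a line that does not fit the format yields no fields rather than a single unsplit string.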


You know that old adage: "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems"?

Well, in this case, you really do need regular expressions.

This one should cover you in this case:

([\d]{4}-[\d]{2}-[\d]{2} [\d]{2}:[\d]{2}:[\d]{2}) \[([\w]+)\] ([a-zA-Z0-9 ]+) -> (\([\w]+\)[a-zA-Z0-9 ]+): (.*)

You should really test it, though. I just threw this together, and it may not handle everything you could see.
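
As a rough sketch of how that pattern could feed the fields the question asks for, the code below matches one line and converts the first group to a DateTime. The ChatLogEntry and ChatLogParser names are hypothetical, and the "yyyy-MM-dd HH:mm:ss" format string is an assumption based on the single sample line.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical container for one parsed line; property names follow the question.
public sealed class ChatLogEntry
{
    public DateTime Timestamp { get; set; }
    public string MessageType { get; set; } = "";
    public string UserName { get; set; } = "";
    public string RoomName { get; set; } = "";
    public string Message { get; set; } = "";
}

public static class ChatLogParser
{
    // The pattern from this answer, unchanged.
    private static readonly Regex LinePattern = new Regex(
        @"([\d]{4}-[\d]{2}-[\d]{2} [\d]{2}:[\d]{2}:[\d]{2}) \[([\w]+)\] ([a-zA-Z0-9 ]+) -> (\([\w]+\)[a-zA-Z0-9 ]+): (.*)");

    // Returns null when the line does not match or the timestamp is not a valid date.
    public static ChatLogEntry Parse(string line)
    {
        Match m = LinePattern.Match(line);
        if (!m.Success)
        {
            return null;
        }

        DateTime timestamp;
        // Assumed format, taken from the sample line: 2010-08-09 02:07:54
        if (!DateTime.TryParseExact(m.Groups[1].Value, "yyyy-MM-dd HH:mm:ss",
                CultureInfo.InvariantCulture, DateTimeStyles.None, out timestamp))
        {
            return null;
        }

        return new ChatLogEntry
        {
            Timestamp = timestamp,
            MessageType = m.Groups[2].Value,
            UserName = m.Groups[3].Value,
            RoomName = m.Groups[4].Value,
            Message = m.Groups[5].Value.Trim()
        };
    }

    public static void Main()
    {
        ChatLogEntry entry = Parse(
            "2010-08-09 02:07:54 [Message] Skylar Morris -> (ATL)City Waterfront: I'll be right back");
        if (entry != null)
        {
            Console.WriteLine(entry.UserName + " in " + entry.RoomName + ": " + entry.Message);
        }
    }
}
```

Because the character classes only allow letters, digits, and spaces in the user and room names, lines with punctuation in those fields would come back as null here, which is one way to find the cases the pattern does not yet handle.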


Try this:

.*\[(.*)\](.*)->(.+?):(.*)

It uses the fact that the message type is in square brackets [], the name is between ] and ->, the room name is between -> and :, and the message is everything afterwards. :)
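
Note that the leading .* in this pattern is not inside parentheses, so the date and time are matched but not captured. The sketch below is a variation, not the pattern as posted: it wraps that part in a group and gives every group a name so the fields can be read by name instead of by position. The group names and the NamedGroupDemo class are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Same delimiters as the pattern above, with the timestamp captured
    // and each group named (the names are illustrative).
    public static readonly Regex LinePattern = new Regex(
        @"(?<timestamp>.*)\[(?<type>.*)\](?<user>.*)->(?<room>.+?):(?<message>.*)");

    // Reads one named group and trims the spaces that surround the delimiters.
    public static string Field(Match match, string name)
    {
        return match.Groups[name].Value.Trim();
    }

    public static void Main()
    {
        string input = "2010-08-09 02:07:54 [Message] Skylar Morris -> (ATL)City Waterfront: I'll be right back";
        Match match = LinePattern.Match(input);
        if (match.Success)
        {
            Console.WriteLine(Field(match, "timestamp"));
            Console.WriteLine(Field(match, "type"));
            Console.WriteLine(Field(match, "user"));
            Console.WriteLine(Field(match, "room"));
            Console.WriteLine(Field(match, "message"));
        }
    }
}
```

Since the pattern relies only on the delimiters, a user name containing "->" or a room name containing ":" would shift the field boundaries, so it is worth checking against real log lines.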
