I have a text data file which contains text like this:
"[category.type.group.subgroup]" - "2934:10,4388:20,3949:30"
"[category.type.group.subgroup]" - "2934:10,4388:20,3949:30"
"[category.type.group.subgroup]" - "2934:10,4388:20,3949:30"
"[category.type.group.subgroup]" - "2934:10,4388:20,3949:30"
34i23042034002340
-----
"[category.type.group.subgroup]" - "2934:10,4388:20,3949:30"
"[category.type.group.subgroup]" - "2934:10,4388:20,3949:30"
828728382
------
3498293485
AAAAAAA
I need the best way to parse this data. Specifically, I need the category, type, group, subgroup, and the numeric values inside the quotes. I was thinking of using Regex, but I was wondering whether there are other approaches that avoid several IF statements to analyze the data.
If you use Regex, you won't need several IF statements. Something like this would read several values with one regular expression:
Regex parseLine = new Regex(@"(?<num1>\d+)\:(?<num2>\d+)\,(?<num3>\d+)", RegexOptions.Compiled);
foreach (string line in File.ReadAllLines(yourFilePath))
{
    var match = parseLine.Match(line);
    if (match.Success)
    {
        var num1 = match.Groups["num1"].Value;
        var num2 = match.Groups["num2"].Value;
        var num3 = match.Groups["num3"].Value;
        // use the values.
    }
}
Try the FileHelpers library. It takes a little work to set up, but it saves you a lot of work dealing with the tricky situations that come up when parsing a file like that. It can handle delimited, fixed-width, or record-based parsing.
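// Groups 1-4 are the dotted name parts; groups 5-10 are the three number pairs.
string reg = "\"\\[([^.]+)\\.([^.]+)\\.([^.]+)\\.([^.]+)\\]\"\\s+-\\s+\"([0-9]+):([0-9]+),([0-9]+):([0-9]+),([0-9]+):([0-9]+)\"";
Regex r = new Regex(reg);
Match m = r.Match(aline); // aline is one line read from the file
if (m.Success)
{
    // Groups[n] is a Group object; use .Value to get the matched text.
    string category = m.Groups[1].Value;
    string type = m.Groups[2].Value;
    string group = m.Groups[3].Value;
    string subgroup = m.Groups[4].Value;
    string num1 = m.Groups[5].Value;
    // and so on...
}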
EDIT: I just saw that you can have an arbitrary number of number sets. The following should handle that:
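// Needs: using System.Collections.Generic; using System.Text.RegularExpressions;
// Group 5 is the optional ' - "id:value,id:value,..."' part as a whole.
string reg = "\"\\[([^.]+)\\.([^.]+)\\.([^.]+)\\.([^.]+)\\]\"(\\s+-\\s+\"(([0-9]+):([0-9]+),?)+\")?";
// reg2 pulls the individual id:value pairs back out of group 5.
string reg2 = "([0-9]+):([0-9]+),?";
Regex r = new Regex(reg);
Console.WriteLine(a); // a is one line read from the file
Console.WriteLine(reg);
Match m = r.Match(a);
if (m.Success)
{
    // Groups[n] is a Group object; use .Value to get the matched text.
    string category = m.Groups[1].Value;
    string type = m.Groups[2].Value;
    string group = m.Groups[3].Value;
    string subgroup = m.Groups[4].Value;
    MatchCollection mc = Regex.Matches(m.Groups[5].Value, reg2);
    List<string> numbers = new List<string>();
    foreach (Match match in mc)
    {
        numbers.Add(match.Groups[1].Value);
        numbers.Add(match.Groups[2].Value);
    }
}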
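As a variation on the two-regex approach, .NET keeps every capture of a repeated group, so a single pattern can probably do the whole job. The following is only a minimal sketch under assumptions: the Record and RecordParser names are made up for illustration, the numbers are assumed to fit in an int, and each record is assumed to sit alone on its line as in the sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical container for one parsed line (not from the question).
public sealed class Record
{
    public string Category = "";
    public string Type = "";
    public string Group = "";
    public string Subgroup = "";
    public List<(int Id, int Value)> Pairs = new List<(int Id, int Value)>();
}

public static class RecordParser
{
    // The "id" and "val" groups appear twice on purpose: .NET lets groups
    // share a name and collects all of their captures in match order.
    private static readonly Regex Pattern = new Regex(
        @"^""\[(?<category>[^.\]]+)\.(?<type>[^.\]]+)\.(?<group>[^.\]]+)\.(?<subgroup>[^.\]]+)\]""" +
        @"\s+-\s+""(?<id>\d+):(?<val>\d+)(?:,(?<id>\d+):(?<val>\d+))*""$");

    // Returns false for lines that are not records, such as "-----" or "AAAAAAA".
    public static bool TryParse(string line, out Record record)
    {
        record = null;
        Match m = Pattern.Match(line.Trim());
        if (!m.Success)
        {
            return false;
        }

        record = new Record
        {
            Category = m.Groups["category"].Value,
            Type = m.Groups["type"].Value,
            Group = m.Groups["group"].Value,
            Subgroup = m.Groups["subgroup"].Value
        };

        // Captures holds one entry per id:value pair, however many there are.
        CaptureCollection ids = m.Groups["id"].Captures;
        CaptureCollection vals = m.Groups["val"].Captures;
        for (int i = 0; i < ids.Count; i++)
        {
            record.Pairs.Add((int.Parse(ids[i].Value), int.Parse(vals[i].Value)));
        }
        return true;
    }
}
```

Calling RecordParser.TryParse on each string from File.ReadAllLines and skipping the lines where it returns false keeps the loop free of extra IF statements for the filler lines.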