Hi, I have a difficult regex problem. I have a partial solution, but I haven't gotten it to work perfectly yet. Essentially, I have to parse a document in an outline format such as this:
1. HEY BUDDY
1.A. Your the best
1.A.1 And i know
1.A.2. this because
1.A.3 it is
1.A.4. the
1.A.5. truth i
1.A.6. tell ya.
1.B. so anyway
1.B.1. one two
1.B.2 three four!
2. i have
2.A. many numbers
2.A.1. hahaha
2.A.1.A. 3.2 ppl
2.A.1.B. are in
2.A.1.C my head.
2.A.1.D. yes exactly
2.A.2. 3.21
2.A.3. if you dont
2.A.4 trust me
2.B then
2.B.1. youre
2.B.2.soo wrong
2.C. its not
3. even funny.
3.A. but it
3.B. kind of
3.C. is a little
4. bit i
4.A. believe.
4.A.1. talk to me
4.A.2. more about
4.B. these ppl
4.B.2. in your head.
That is my test document. I need to find each new "bullet" in the document, save the text between consecutive bullets, and then do more computation on it. The only part I haven't figured out is how to accurately identify the different outline numbers using regex. (I know it could probably be done in other ways than regex, but I'm in the process of learning regex and have my mind set on doing it this way.) What I've come up with so far is this:
(\b)(([1-9][0-9]?)(\.))([A-Z])?((\.)([1-9][0-9]?)((\.)([A-Z]))?)*(\.)?(\b)
The problem is that it doesn't recognize the 1., 2., 3., or 4. bullets, and it DOES pick up "3." from the 3.2 and 3.21 in the text. (And yes, I will have decimal numbers like these in the text.) The format of the outline is always #.A-Z.#.A-Z.#.A-Z..., and the numbers should never be higher than 99.
Thanks for any help.
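To see why the pattern above behaves this way, here is a small sketch that runs it unchanged (apart from doubling the backslashes) with Matcher.find(). It is written in Java on the assumption that Java is the target language, as the mention of the Pattern class in one of the answers suggests; the class and method names are illustrative only. The closing \b must sit between a word character and a non-word character, so after "1." followed by a space it can never match and the top-level bullets are skipped. And because nothing ties the pattern to the start of a line, find() also reports "3." inside "3.2".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OriginalPatternDemo {
    // The pattern from the question, unchanged except for doubling the backslashes.
    static final Pattern ORIGINAL = Pattern.compile(
        "(\\b)(([1-9][0-9]?)(\\.))([A-Z])?((\\.)([1-9][0-9]?)((\\.)([A-Z]))?)*(\\.)?(\\b)");

    // Collects every substring that find() reports for one line of text.
    public static List<String> matches(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = ORIGINAL.matcher(line);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // The final \b cannot match between "." and " ", so "1." is never reported.
        System.out.println(matches("1. HEY BUDDY"));     // []
        // The pattern is not anchored to the start of the line, so "3." inside "3.2" is found too.
        System.out.println(matches("2.A.1.A. 3.2 ppl")); // [2.A.1.A, 3.]
    }
}
```

Both symptoms point to the same fix: anchor the match at the start of the line instead of relying on \b.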
^[\d\w\.]+ [^\n]+$
Explanation: "start of line; any combination of digits, word characters and dots; a space; one or more non-line-break characters; end of line".
Keep in mind that you will need to escape every backslash with another backslash when you write this regex as a string literal in your code (e.g. \\d instead of \d).
The documentation of the Pattern class is extremely useful, even if you are already experienced with regex.
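A minimal Java sketch of this answer's pattern, assuming the whole document is searched as one string rather than line by line (the class and method names are illustrative). The backslashes are doubled in the string literal, and Pattern.MULTILINE is needed so that ^ and $ match at every line boundary rather than only at the start and end of the input. The last sample line also shows the weakness of the pattern: any line that begins with a word followed by a space, such as "That is my test document", matches as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineStartDemo {
    // The regex from this answer; every backslash is doubled inside the Java string literal.
    // MULTILINE makes ^ and $ match at each line boundary, not just at the ends of the input.
    static final Pattern LINE = Pattern.compile("^[\\d\\w\\.]+ [^\\n]+$", Pattern.MULTILINE);

    // Returns the leading token (the text before the first space) of every matching line.
    public static List<String> leadingTokens(String document) {
        List<String> tokens = new ArrayList<>();
        Matcher m = LINE.matcher(document);
        while (m.find()) {
            String line = m.group();
            tokens.add(line.substring(0, line.indexOf(' ')));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String doc = "1. HEY BUDDY\n2.A.1.A. 3.2 ppl\nThat is my test document";
        System.out.println(leadingTokens(doc)); // [1., 2.A.1.A., That]
    }
}
```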
Bozho's solution only works for your specific example document and will generate a lot of false matches if there are lines that don't begin with the pattern you want to match. Here's a more specific solution:
^(\d{1,2}\.([A-Z]\.)?){1,2}\s
And here's how to use it in C#:
using System;
using System.IO;
using System.Text.RegularExpressions;
class Program
{
    static void Main(string[] args)
    {
        using (var f = File.OpenText("input.txt"))
        {
            while (true)
            {
                string line = f.ReadLine();
                if (line == null) break;
                // The outline number must start the line and end with a dot followed by whitespace.
                Match match = Regex.Match(line, @"^(\d{1,2}\.([A-Z]\.)?){1,2}\s");
                if (match.Success)
                {
                    Console.WriteLine(match.Value);
                    // Drop the trailing dot and whitespace character, e.g. "1.A. " -> "1.A".
                    string result = match.Value.Substring(0, match.Value.Length - 2);
                    // Split into the individual segments, e.g. "1.A" -> { "1", "A" }.
                    string[] array = result.Split('.');
                    // ..
                }
            }
        }
    }
}
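Note that the pattern above requires a dot after every segment, so sample lines such as "1.A.1 And i know" and "2.B then" are skipped, and the {1,2} quantifier limits it to four levels. The following Java sketch is one way to relax both, under the assumption that every bullet starts its own line. It accepts a 1-99 number, any number of ".LETTER.NUMBER" pairs, an optional final ".LETTER" and an optional trailing dot, and a negative lookahead rejects lines that start with a decimal such as "3.21". It also collects the text between consecutive bullets, which is what the question ultimately needs. The names OutlineParser, Bullet and parse are illustrative and do not come from the answers above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OutlineParser {
    // Illustrative holder for one bullet: its outline number and the text that follows it.
    public static final class Bullet {
        public final String number;
        public final String text;
        Bullet(String number, String text) { this.number = number; this.text = text; }
        @Override public String toString() { return number + " -> " + text; }
    }

    // Anchored at the start of each line (MULTILINE), so "3.2" in the middle of a line is never a candidate.
    // Group 1: a 1-99 number, then ".LETTER.NUMBER" pairs, then an optional ".LETTER".
    // The trailing dot is optional; the lookahead rejects a line that starts with a decimal like "3.21".
    static final Pattern BULLET = Pattern.compile(
        "^([1-9][0-9]?(?:\\.[A-Z]\\.[1-9][0-9]?)*(?:\\.[A-Z])?)\\.?(?!\\.?[0-9])",
        Pattern.MULTILINE);

    public static List<Bullet> parse(String document) {
        List<Bullet> bullets = new ArrayList<>();
        Matcher m = BULLET.matcher(document);
        String number = null; // outline number of the bullet currently being collected
        int bodyStart = 0;    // index just after that bullet's outline number
        while (m.find()) {
            if (number != null) {
                // The previous bullet's text runs up to the start of this match.
                bullets.add(new Bullet(number, document.substring(bodyStart, m.start()).trim()));
            }
            number = m.group(1);
            bodyStart = m.end();
        }
        if (number != null) {
            // The last bullet's text runs to the end of the document.
            bullets.add(new Bullet(number, document.substring(bodyStart).trim()));
        }
        return bullets;
    }

    public static void main(String[] args) {
        String doc = String.join("\n",
            "1. HEY BUDDY",
            "1.A.1 And i know",
            "2.A.1.A. 3.2 ppl",
            "2.A.2. 3.21",
            "2.B then",
            "2.B.2.soo wrong");
        for (Bullet b : parse(doc)) {
            System.out.println(b);
        }
    }
}
```

For the sample in main, every line is recognized, including those without a trailing dot: "1.A.1" gets the text "And i know", "2.A.1.A" gets "3.2 ppl", and "2.B.2" gets "soo wrong". A line that merely continues the previous bullet's text stays part of that text unless it itself starts with something shaped like an outline number.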