RegEx Multiple Matches in Text

Developer https://www.devze.com 2023-01-29 07:16 Source: the web
I am trying to parse an email, and most of it is working, except when the record in question comes in with multiple errors.

Here's part of the text

Record #1 with LeadRecordID 4 and MTN of (813) 555-1234 has 4 errors:
   Shipping Street Address cannot be blank
   Shipping City cannot be blank
   Shipping Zipcode cannot be blank
   Errors exist in secondary records #2, #3, #4, record not processed. 
Record #2 with LeadRecordID 5 and MTN of (813) 555-4321 has 1 errors:
   Shipping Street Address cannot be blank

Here is the RegEx I'm using:

Record #(?<RecordNumber>\d*) with LeadRecordID (?<LeadRecordId>\d*) and MTN of .* has (?<NumberOfErrors>\d*) errors:(?:\r\n|)* (?<Error1>.*)

Edit: If I do this, I get two matches, but the Error group shows only one value per match; it should show all the error lines.

Record #(?<RecordNumber>\d*) with LeadRecordID (?<LeadRecordId>\d*) and MTN of .* has (?<NumberOfErrors>\d*) errors:(?:\r\n)(?<Error>.*)(?:\r\n)

Edit 2: This seems to get me a subgroup, thanks.

Record #(?<RecordNumber>\d*) with LeadRecordID (?<LeadRecordId>\d*) and MTN of [^\r\n]* has (?<NumberOfErrors>\d*) errors:(?:\r\n|)*(?<Errors>(?:(?<Error>\s{3}[^\r\n]+)(?:\r\n)*)+)


Try this pattern with Regex.Matches:

@"Record #(?<RecordNumber>\d*) with LeadRecordID (?<LeadRecordId>\d*) and MTN of [^\r\n]* has (?<NumberOfErrors>\d*) errors:(?:\r\n|)*(?<Errors>(?:\s{3}[^\r\n]+(?:\r\n)*)+)"

Test code:

        static void Main(string[] args)
        {
            string pattern =
@"Record #(?<RecordNumber>\d*) with LeadRecordID (?<LeadRecordId>\d*) and MTN of [^\r\n]* has (?<NumberOfErrors>\d*) errors:(?:\r\n|)*(?<Errors>(?:\s{3}[^\r\n]+(?:\r\n)*)+)";

            string message = @"Record #1 with LeadRecordID 4 and MTN of (813) 555-1234 has 4 errors:
   Shipping Street Address cannot be blank
   Shipping City cannot be blank
   Shipping Zipcode cannot be blank
   Errors exist in secondary records #2, #3, #4, record not processed. 
Record #2 with LeadRecordID 5 and MTN of (813) 555-4321 has 1 errors:
   Shipping Street Address cannot be blank";

            MatchCollection mc = Regex.Matches(message, pattern);

            foreach (Match m in mc)
            {
                Console.WriteLine("RecordNumber = \"{0}\"", m.Groups["RecordNumber"].Value);
                Console.WriteLine("LeadRecordId = \"{0}\"", m.Groups["LeadRecordId"].Value);
                Console.WriteLine("NumberOfErrors = \"{0}\"", m.Groups["NumberOfErrors"].Value);
                Console.WriteLine("Errors = \"{0}\"", m.Groups["Errors"].Value);

                MatchCollection errors = Regex.Matches(m.Groups["Errors"].Value, @"\s{3}(?<error>[^\r\n]+)(?:\r\n)*");
                foreach(Match g1 in errors)
                {
                    Console.WriteLine(g1.Groups["error"].Value);
                }
                Console.WriteLine("------------------------");
            }
            Console.ReadLine();
        }

Result:

RecordNumber = "1"
LeadRecordId = "4"
NumberOfErrors = "4"
Errors = "   Shipping Street Address cannot be blank
   Shipping City cannot be blank
   Shipping Zipcode cannot be blank
   Errors exist in secondary records #2, #3, #4, record not processed.
"
Shipping Street Address cannot be blank
Shipping City cannot be blank
Shipping Zipcode cannot be blank
Errors exist in secondary records #2, #3, #4, record not processed.
------------------------
RecordNumber = "2"
LeadRecordId = "5"
NumberOfErrors = "1"
Errors = "   Shipping Street Address cannot be blank"
Shipping Street Address cannot be blank
------------------------


acoolaum's answer is correct, though it runs an additional regular expression for each match. I changed the code so that it uses only one regular expression. Here's the code:

static void Main(string[] args)
{
    string pattern =
@"Record #(?<RecordNumber>\d*) with LeadRecordID (?<LeadRecordId>\d*) and MTN of [^\r\n]* has (?<NumberOfErrors>\d*) errors:\r\n(?:\s{3}(?<Error>[^\r\n]+)(?:\r\n)*)+";

    string message =
@"Record #1 with LeadRecordID 4 and MTN of (813) 555-1234 has 4 errors:
   Shipping Street Address cannot be blank
   Shipping City cannot be blank
   Shipping Zipcode cannot be blank
   Errors exist in secondary records #2, #3, #4, record not processed. 
Record #2 with LeadRecordID 5 and MTN of (813) 555-4321 has 1 errors:
   Shipping Street Address cannot be blank";

    MatchCollection mc = Regex.Matches(message, pattern);

    foreach (Match m in mc)
    {
        Console.WriteLine("RecordNumber = \"{0}\"", m.Groups["RecordNumber"].Value);
        Console.WriteLine("LeadRecordId = \"{0}\"", m.Groups["LeadRecordId"].Value);
        Console.WriteLine("NumberOfErrors = \"{0}\"", m.Groups["NumberOfErrors"].Value);
        Console.WriteLine("Errors:");

        foreach (Capture capture in m.Groups["Error"].Captures)
        {
            Console.WriteLine("\t{0}", capture.Value);
        }
        Console.WriteLine("------------------------");
    }
    Console.ReadLine();
}

Note that I changed both the regular expression itself and the code that extracts the matches (I use the Group.Captures property to read every capture of the "Error" group).
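To make the difference concrete, here is a minimal sketch (the class name, method names, and shortened pattern are made up for illustration and are not part of the answer above). When a named group sits inside a repeated construct, Group.Value holds only the last repetition, while Group.Captures keeps all of them:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class CapturesDemo
{
    // Shortened, hypothetical pattern: the "Error" group is inside (...)+,
    // so it captures once per repetition within a single match.
    const string Pattern = @"errors:(?:\n {3}(?<Error>[^\n]+))+";

    // Every capture of the repeated "Error" group in the first match.
    public static string[] ErrorCaptures(string input)
    {
        Match m = Regex.Match(input, Pattern);
        return m.Groups["Error"].Captures.Cast<Capture>().Select(c => c.Value).ToArray();
    }

    // Group.Value only: this is the last capture of the group.
    public static string LastError(string input)
    {
        return Regex.Match(input, Pattern).Groups["Error"].Value;
    }

    static void Main()
    {
        string input = "has 2 errors:\n   City cannot be blank\n   Zipcode cannot be blank";
        Console.WriteLine("Value: {0}", LastError(input));
        foreach (string error in ErrorCaptures(input))
        {
            Console.WriteLine("Capture: {0}", error);
        }
    }
}
```

This is probably why the pattern from the question's first edit seemed to return a single error per record: only Value was being read.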

Output:

RecordNumber = "1"
LeadRecordId = "4"
NumberOfErrors = "4"
Errors:
        Shipping Street Address cannot be blank
        Shipping City cannot be blank
        Shipping Zipcode cannot be blank
        Errors exist in secondary records #2, #3, #4, record not processed.
------------------------
RecordNumber = "2"
LeadRecordId = "5"
NumberOfErrors = "1"
Errors:
        Shipping Street Address cannot be blank
------------------------
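One caveat: the pattern above requires a literal \r\n after "errors:", so it will not match if the email text arrives with bare \n line endings. The following is a sketch of a more tolerant variant, under the assumption that error lines are always indented by exactly three spaces; RecordParser and Parse are hypothetical names, not part of the answer:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class RecordParser
{
    // Same structure as the pattern above, except that each error line is
    // introduced by \r?\n, so both CRLF and bare LF input can match.
    const string Pattern =
        @"Record #(?<RecordNumber>\d*) with LeadRecordID (?<LeadRecordId>\d*) and MTN of [^\r\n]* " +
        @"has (?<NumberOfErrors>\d*) errors:(?:\r?\n {3}(?<Error>[^\r\n]+))+";

    // Maps each record number to the list of its error lines.
    public static Dictionary<string, List<string>> Parse(string message)
    {
        var result = new Dictionary<string, List<string>>();
        foreach (Match m in Regex.Matches(message, Pattern))
        {
            var errors = new List<string>();
            foreach (Capture c in m.Groups["Error"].Captures)
            {
                errors.Add(c.Value.Trim()); // Trim drops trailing spaces left in the email
            }
            result[m.Groups["RecordNumber"].Value] = errors;
        }
        return result;
    }

    static void Main()
    {
        string lf = "Record #1 with LeadRecordID 4 and MTN of (813) 555-1234 has 2 errors:\n" +
                    "   Shipping City cannot be blank\n" +
                    "   Shipping Zipcode cannot be blank\n" +
                    "Record #2 with LeadRecordID 5 and MTN of (813) 555-4321 has 1 errors:\n" +
                    "   Shipping Street Address cannot be blank";
        foreach (var pair in Parse(lf))
        {
            Console.WriteLine("Record {0}: {1}", pair.Key, string.Join(" | ", pair.Value));
        }
    }
}
```

Moving the line break inside the repeated group also means a record stops cleanly at the first line that is not indented, which is where the next "Record #" header begins.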


I can't tell without seeing your code, but you probably need Regex.Matches rather than Regex.Match.
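A small sketch of the difference, using a cut-down pattern that only looks for the record header (the class and method names here are invented for the example): Regex.Match returns the first hit only, while Regex.Matches returns every non-overlapping hit.

```csharp
using System;
using System.Text.RegularExpressions;

static class MatchVsMatches
{
    // Cut-down, hypothetical pattern: just the record header.
    const string Pattern = @"Record #(?<RecordNumber>\d+)";

    // Regex.Match stops at the first hit in the input.
    public static string FirstRecord(string text)
    {
        return Regex.Match(text, Pattern).Groups["RecordNumber"].Value;
    }

    // Regex.Matches collects every non-overlapping hit.
    public static int CountRecords(string text)
    {
        return Regex.Matches(text, Pattern).Count;
    }

    static void Main()
    {
        string text = "Record #1 with LeadRecordID 4\r\nRecord #2 with LeadRecordID 5";
        Console.WriteLine("First record: {0}", FirstRecord(text));
        Console.WriteLine("Records found: {0}", CountRecords(text));
    }
}
```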


Which option do you set for matching: Singleline or Multiline?

I think you need to change your regex and use the Multiline option.
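For reference, in .NET RegexOptions.Multiline makes ^ and $ match at the start and end of every line, and RegexOptions.Singleline makes . match newlines as well. The sketch below is only one way the Multiline suggestion could be applied, assuming error lines start with exactly three spaces; MultilineDemo and IndentedLines are hypothetical names:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class MultilineDemo
{
    // With Multiline, ^ matches at the start of every line and $ just before
    // every '\n'. The optional \r keeps a trailing carriage return out of the
    // capture when the input uses CRLF line endings.
    public static string[] IndentedLines(string text)
    {
        MatchCollection mc = Regex.Matches(
            text, @"^ {3}(?<Error>[^\r\n]+)\r?$", RegexOptions.Multiline);
        return mc.Cast<Match>().Select(m => m.Groups["Error"].Value).ToArray();
    }

    static void Main()
    {
        string text = "Record #1 with LeadRecordID 4 has 2 errors:\r\n" +
                      "   Shipping City cannot be blank\r\n" +
                      "   Shipping Zipcode cannot be blank";
        foreach (string line in IndentedLines(text))
        {
            Console.WriteLine(line);
        }
    }
}
```

Note that this collects error lines across the whole text without tying them to a record, so it would still need to be combined with the record header pattern.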
