Regular Expression to find the question in the string

Developer https://www.devze.com 2023-03-14 20:34 Source: the web
I have some string like this,

\r\n21.what is your favourite pet name?\r\nA.Cat B.Dog\r\nC.Horse D.Snake\r\n22.Which country produce wheat most?\r\nA.Australia B.Bhutan\r\nC.India D.Canada.

=====================================

Now I have to find the questions as well as the choices in the string using a regular expression.

Can anybody suggest how?

I am matching the question with [1-9][.], but sometimes two questions come out merged.

Can anybody suggest any changes?


((\d+\..*?\?\\r\\n)(A\..*?)(B\..*?)(C\..*?)(D\..*?\\r\\n))

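You can use this regex, but it assumes that the last choice is followed by \r\n. In the sample string the final choice (D.Canada.) has no trailing \r\n, so the last question is matched only if you append one.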
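To illustrate the trailing \r\n caveat, here is a minimal Python sketch of the same fixed four-choice idea. It is not the answerer's exact pattern: the names text, QUESTION_RE and parsed are made up for the example, and it assumes the input contains real CR/LF characters and exactly four choices labelled A to D. The (?:\r\n|$) at the end lets the last choice finish at the end of the input.

```python
import re

# Sample text from the question (real CR/LF characters, not literal backslashes).
text = (
    "\r\n21.what is your favourite pet name?\r\nA.Cat B.Dog\r\nC.Horse D.Snake"
    "\r\n22.Which country produce wheat most?\r\nA.Australia B.Bhutan\r\nC.India D.Canada."
)

# One question ending in "?" followed by exactly four choices labelled A-D.
# (?:\r\n|$) accepts either a line break or the end of the input after choice D.
QUESTION_RE = re.compile(
    r"(\d+)\.(.*?\?)\s*"
    r"A\.(.*?)\s+B\.(.*?)\s+C\.(.*?)\s+D\.(.*?)(?:\r\n|$)"
)

# Each entry is (question number, question text, [choice A, B, C, D]).
parsed = [
    (int(m.group(1)), m.group(2), [m.group(i) for i in range(3, 7)])
    for m in QUESTION_RE.finditer(text)
]

for number, question, choices in parsed:
    print(number, question, choices)
```

Because every choice is a separate group, a question with three or five choices would not match at all. That is the trade-off of this fixed layout.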


I have created two possible regular expressions, depending on whether you want the number/letter of the question/answer to appear in the capture or not.

Pattern1: (?<Question>\d+\.[^?]+\?)(?:(?:\W*)(?<Answer>[ABCD]\..*?(?=$|(?:\s|\r\n)(?:[ABCD]\.|\d+\.))))*
Pattern2: \d+\.(?<Question>[^?]+\?)(?:(?:\W*)[ABCD]\.(?<Answer>.*?(?=$|(?:\s|\r\n)(?:[ABCD]\.|\d+\.))))*

I am assuming you want this in C#, since you tagged it as C#, so here is some sample code you can paste into a new Console Application to begin playing with:

        // Needs: using System; using System.Linq; using System.Text.RegularExpressions;
        var input = "\r\n21.what is your favourite pet name?\r\nA.Cat B.Dog\r\nC.Horse D.Snake\r\n22.Which country produce wheat most?\r\nA.Australia B.Bhutan\r\nC.India D.Canada.";
        var pattern1 = @"(?<Question>\d+\.[^?]+\?)(?:(?:\W*)(?<Answer>[ABCD]\..*?(?=$|(?:\s|\r\n)(?:[ABCD]\.|\d+\.))))*";
        var pattern2 = @"\d+\.(?<Question>[^?]+\?)(?:(?:\W*)[ABCD]\.(?<Answer>.*?(?=$|(?:\s|\r\n)(?:[ABCD]\.|\d+\.))))*";
        foreach (Match m in Regex.Matches(input, pattern2))
        {
            var question = m.Groups["Question"].Value;
            var answers = (from Capture cap in m.Groups["Answer"].Captures
                           select cap.Value).ToList();

            Console.WriteLine("Question: {0}", question);
            foreach (var answer in answers)
            {
                Console.WriteLine("Answer: {0}", answer);
            }
        }
        Console.ReadLine();

It uses a regex pattern to parse each question into a question variable, and the related answers into a list of answers. You can change which pattern is used by changing the pattern sent to the Regex.Matches() function in the first foreach.


In Python:

Find the questions:

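>>> import re
>>> s = "\r\n21.what is your favourite pet name?\r\nA.Cat B.Dog\r\nC.Horse D.Snake\r\n22.Which country produce wheat most?\r\nA.Australia B.Bhutan\r\nC.India D.Canada."
>>> re.findall(r'\d+\.([^?]*)', s)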
['what is your favourite pet name', 'Which country produce wheat most']
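The findall call above returns only the question text. As a rough sketch of how the choices could be collected in the same style, the snippet below uses a lookahead to stop each choice just before the next label, the next question number, or the end of the string. The names s, CHOICE_RE and choices are illustrative. It assumes that labels are the capital letters A to D followed by a dot, and that this sequence does not occur inside a question or an answer.

```python
import re

s = (
    "\r\n21.what is your favourite pet name?\r\nA.Cat B.Dog\r\nC.Horse D.Snake"
    "\r\n22.Which country produce wheat most?\r\nA.Australia B.Bhutan\r\nC.India D.Canada."
)

# A label (A-D plus a dot), then the shortest text that is followed by
# whitespace and another label or question number, or by the end of the string.
CHOICE_RE = re.compile(r"\b([A-D])\.(.*?)(?=\s+(?:[A-D]|\d+)\.|\s*$)")

# findall returns one (label, text) tuple per choice, in document order.
choices = CHOICE_RE.findall(s)

for label, answer in choices:
    print(label, answer)
```

The result is one flat list for the whole string. Grouping the tuples by question (for example, starting a new group at every "A") is left out to keep the sketch short.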


I'm not sure if it will work in Bengali, but the following code works OK in English (at least on the example you provided ;) ):

var input = "\r\n21.what is your favourite pet name?\r\nA.Cat B.Dog\r\nC.Horse D.Snake\r\n22.Which country produce wheat most?\r\nA.Australia B.Bhutan\r\nC.India D.Canada.";

var regex = new Regex(@"(?<number>[0-9]+)\.(?<question>.+\?)\W+((?<letter>[A-Z])\.(?<answer>\w+)\W*)+");

foreach (Match question in regex.Matches(input))
{
    Console.Write("{0}. ", question.Groups["number"].Captures[0]);
    Console.WriteLine(question.Groups["question"].Captures[0]);

    foreach (Capture answer in question.Groups["answer"].Captures)
    {
        Console.WriteLine(answer.Value);
    }
}

It prints:

21. what is your favourite pet name?
Cat
Dog
Horse
Snake
22. Which country produce wheat most?
Australia
Bhutan
India
Canada

I guess you can get what you need from there.


This can help:

[0-9]+\.(.*?)\?\s*A\.(.*?)\s*B\.(.*?)\s*C\.(.*?)\s*D\.(.*?)\r\n

Using \r\n to delimit questions is not a good idea, though it should work in your case.
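One way to avoid depending on \r\n as the delimiter is to cut the text in front of every line that starts with a question number and then parse each chunk separately. The Python sketch below shows that idea. The function name parse_quiz and the variable names are hypothetical. It assumes Python 3.7 or later (for zero-width splits), that every question starts at the beginning of a line, and that no answer line starts with digits followed by a dot.

```python
import re

text = (
    "\r\n21.what is your favourite pet name?\r\nA.Cat B.Dog\r\nC.Horse D.Snake"
    "\r\n22.Which country produce wheat most?\r\nA.Australia B.Bhutan\r\nC.India D.Canada."
)

def parse_quiz(raw):
    """Return a list of (number, question, {label: choice}) tuples."""
    results = []
    # Zero-width split in front of each line that begins with "<digits>.".
    for block in re.split(r"(?m)^(?=\d+\.)", raw):
        block = block.strip()
        header = re.match(r"(\d+)\.\s*(.*?\?)\s*(.*)", block, re.S)
        if not header:
            continue  # leading whitespace or a chunk without a question
        number, question, rest = header.groups()
        # Splitting on the labels yields ['', 'A', text, 'B', text, ...].
        parts = re.split(r"\s*\b([A-D])\.", rest)
        choices = dict(zip(parts[1::2], parts[2::2]))
        results.append((int(number), question, choices))
    return results

quiz = parse_quiz(text)
for number, question, choices in quiz:
    print(number, question, choices)
```

Since each chunk is parsed on its own, two questions cannot merge into one match, and the number of choices per question is not fixed.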
