I have a string like this:
\r\n21.what is your favourite pet name?\r\nA.Cat B.Dog\r\nC.Horse D.Snake\r\n22.Which country produce wheat most?\r\nA.Australia B.Bhutan\r\nC.India D.Canada.
=====================================
Now I have to find the questions as well as the choices in the string using a regular expression.
Can anybody suggest how?
I am matching the question with
[1-9][.]
but sometimes I get two questions merged together.
Can anybody suggest any changes?
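((\d+\..*?\?\\r\\n)(A\..*?)(B\..*?)(C\..*?)(D\..*?\\r\\n))
You can use this regex, but it assumes that the last choice is followed by \r\n characters. Note that, as written, \\r\\n matches the literal text \r\n; if your string holds real carriage-return/line-feed characters, use \r\n in the pattern instead and enable RegexOptions.Singleline so that . can cross the line break between choices B and C.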
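To see what that trailing \r\n assumption means in practice, here is a minimal Python sketch. It assumes the string contains real CR/LF characters and turns on re.DOTALL so that . can cross line breaks; the names strict, relaxed and parse are made up for this illustration, and the relaxed variant is only one possible workaround.

```python
import re

# Sample text from the question, assumed to hold real CR/LF characters.
s = ("\r\n21.what is your favourite pet name?\r\nA.Cat B.Dog\r\nC.Horse D.Snake"
     "\r\n22.Which country produce wheat most?\r\nA.Australia B.Bhutan\r\nC.India D.Canada.")

# Same shape as the regex above: one question, then exactly four choices A-D,
# and a mandatory line break after choice D.
strict = re.compile(
    r'(\d+\..*?\?)\r\n(A\..*?)(B\..*?)(C\..*?)(D\..*?)\r\n', re.DOTALL)

# Hypothetical variant: choice D may also be terminated by the end of the string.
relaxed = re.compile(
    r'(\d+\..*?\?)\r\n(A\..*?)(B\..*?)(C\..*?)(D\..*?)(?:\r\n|$)', re.DOTALL)

def parse(pattern, text):
    """Return (question, [choices]) tuples with surrounding whitespace stripped."""
    return [(m.group(1), [c.strip() for c in m.groups()[1:]])
            for m in pattern.finditer(text)]

strict_result = parse(strict, s)
relaxed_result = parse(relaxed, s)
print(len(strict_result), len(relaxed_result))  # prints: 1 2
```

The strict pattern drops question 22 because nothing follows "D.Canada."; allowing the end of the string as a terminator picks it up.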
I have created two possible regular expressions, depending on whether you want the number/letter of the question/answer to appear in the capture or not.
Pattern1: (?<Question>\d+\.[^?]+\?)(?:(?:\W*)(?<Answer>[ABCD]\..*?(?=$|(?:\s|\r\n)(?:[ABCD]\.|\d+\.))))*
Pattern2: \d+\.(?<Question>[^?]+\?)(?:(?:\W*)[ABCD]\.(?<Answer>.*?(?=$|(?:\s|\r\n)(?:[ABCD]\.|\d+\.))))*
I am assuming you want this in C#, since you tagged it as C#, so here is some sample code you can paste into a new Console Application to begin playing with:
// Needs: using System; using System.Linq; using System.Text.RegularExpressions;
var input = "\r\n21.what is your favourite pet name?\r\nA.Cat B.Dog\r\nC.Horse D.Snake\r\n22.Which country produce wheat most?\r\nA.Australia B.Bhutan\r\nC.India D.Canada.";
var pattern1 = @"(?<Question>\d+\.[^?]+\?)(?:(?:\W*)(?<Answer>[ABCD]\..*?(?=$|(?:\s|\r\n)(?:[ABCD]\.|\d+\.))))*";
var pattern2 = @"\d+\.(?<Question>[^?]+\?)(?:(?:\W*)[ABCD]\.(?<Answer>.*?(?=$|(?:\s|\r\n)(?:[ABCD]\.|\d+\.))))*";
foreach (Match m in Regex.Matches(input, pattern2))
{
    var question = m.Groups["Question"].Value;
    // The Answer group is repeated, so every capture is kept in .Captures
    var answers = (from Capture cap in m.Groups["Answer"].Captures
                   select cap.Value).ToList();
    Console.WriteLine("Question: {0}", question);
    foreach (var answer in answers)
    {
        Console.WriteLine("Answer: {0}", answer);
    }
}
Console.ReadLine();
It uses a regex pattern to parse each question into a question variable, and the related answers into a list of answers. You can change which pattern is used by changing the pattern sent to the Regex.Matches() function in the first foreach.
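In Python:
Find the questions:
>>> import re
>>> s = "\r\n21.what is your favourite pet name?\r\nA.Cat B.Dog\r\nC.Horse D.Snake\r\n22.Which country produce wheat most?\r\nA.Australia B.Bhutan\r\nC.India D.Canada."
>>> re.findall(r'\d+\.([^?]*)', s)
['what is your favourite pet name', 'Which country produce wheat most']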
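The choices can be collected in a similar way. What follows is only a rough sketch, assuming each question ends at the first "?", each choice is labelled A to D, and no choice text itself contains whitespace followed by such a label. Python's re keeps just the last capture of a repeated group, so the sketch works in two passes; block_re, choice_re and parse_quiz are names invented for this example.

```python
import re

s = ("\r\n21.what is your favourite pet name?\r\nA.Cat B.Dog\r\nC.Horse D.Snake"
     "\r\n22.Which country produce wheat most?\r\nA.Australia B.Bhutan\r\nC.India D.Canada.")

# Pass 1: a question runs from "<digits>." to the first "?"; its block of
# choices runs up to the next line that starts with "<digits>." or to the end.
block_re = re.compile(
    r'\d+\.(?P<question>[^?]+\?)(?P<choices>.*?)(?=\r\n\d+\.|\Z)', re.DOTALL)

# Pass 2: inside a block, each choice is "<A-D>." followed by text up to the
# next choice label or the end of the block.
choice_re = re.compile(r'[ABCD]\.(.*?)(?=\s+[ABCD]\.|\s*\Z)', re.DOTALL)

def parse_quiz(text):
    """Return a list of (question, [choice texts]) tuples."""
    return [(m.group('question'), choice_re.findall(m.group('choices')))
            for m in block_re.finditer(text)]

quiz = parse_quiz(s)
for question, choices in quiz:
    print(question, choices)
# what is your favourite pet name? ['Cat', 'Dog', 'Horse', 'Snake']
# Which country produce wheat most? ['Australia', 'Bhutan', 'India', 'Canada.']
```

Because each block stops at the next question number, two questions cannot be merged into one match, and the number of choices per question is not fixed at four.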
I'm not sure if it will work in Bengali, but the following code works OK in English (at least on the example you provided ;) ):
var input = "\r\n21.what is your favourite pet name?\r\nA.Cat B.Dog\r\nC.Horse D.Snake\r\n22.Which country produce wheat most?\r\nA.Australia B.Bhutan\r\nC.India D.Canada.";
var regex = new Regex(@"(?<number>[0-9]+)\.(?<question>.+\?)\W+((?<letter>[A-Z])\.(?<answer>\w+)\W*)+");
foreach (Match question in regex.Matches(input))
{
    Console.Write("{0}. ", question.Groups["number"].Captures[0]);
    Console.WriteLine(question.Groups["question"].Captures[0]);
    foreach (Capture answer in question.Groups["answer"].Captures)
    {
        Console.WriteLine(answer.Value);
    }
}
It prints:
21. what is your favourite pet name?
Cat
Dog
Horse
Snake
22. Which country produce wheat most?
Australia
Bhutan
India
Canada
I guess you can get what you need from there.
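This can help:
[0-9]+\.(.*?)\?\s*A\.(.*?)\s*B\.(.*?)\s*C\.(.*?)\s*D\.(.*?)\r\n
Using \r\n to delimit questions is not a good idea, though it should work in your case. Note that the pattern expects \r\n after choice D, so the final question in your sample, which has no trailing line break, will only match if you append one.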
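One way to stop depending on \r\n altogether is to collapse every run of whitespace to a single space before matching. A minimal Python sketch of that idea, assuming exactly four choices A to D per question and no choice text that itself looks like " B." or " 22."; the names flat, quiz_re and rows are illustrative only.

```python
import re

s = ("\r\n21.what is your favourite pet name?\r\nA.Cat B.Dog\r\nC.Horse D.Snake"
     "\r\n22.Which country produce wheat most?\r\nA.Australia B.Bhutan\r\nC.India D.Canada.")

# Collapse all whitespace (spaces, CR, LF) so line endings no longer matter.
flat = " ".join(s.split())

# number, question, then choices A-D; choice D ends at the next question
# number or at the end of the text.
quiz_re = re.compile(
    r'(\d+)\.(.*?\?) A\.(.*?) B\.(.*?) C\.(.*?) D\.(.*?)(?= \d+\.|$)')

rows = quiz_re.findall(flat)
for number, question, *choices in rows:
    print(number, question, choices)
# 21 what is your favourite pet name? ['Cat', 'Dog', 'Horse', 'Snake']
# 22 Which country produce wheat most? ['Australia', 'Bhutan', 'India', 'Canada.']
```

The same pattern then works whether the input uses \r\n, \n, or no line breaks at all.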