Regex filter &quot; with <> tags included

Developer https://www.devze.com 2023-01-17 13:30 Source: Internet
I am having problems with some regex code. Can anyone help?

I have the following string of data:

abcd &quot; something code &quot; nothing  &quot;f <b> cannot find this section </b> &quot;

I want to find the sections between the &quot; quotes.

I can get it to work using the following regex:

foreach (Match match in Regex.Matches(sourceLine, @"&quot;((\\&quot;)|[^&quot;(\\&quot;)])+&quot;"))

However, if the section between the quotes contains <> tags, the regex does not find it. I am not sure how to change the regex so that sections with <> tags are included.

Thanks for your time.


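// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
public List<string> Parse(string input)
{
    const string Quote = "&quot;";
    List<string> results = new List<string>();
    bool startSection = true;
    int startIndex = 0;
    // Match every &quot; that is not preceded by a backslash. The lookbehind
    // consumes no characters, so m.Index is always the position of the entity.
    foreach (Match m in Regex.Matches(input, @"(?<!\\)&quot;"))
    {
        if (startSection)
        {
            // opening quote: the section starts right after it
            startSection = false;
            startIndex = m.Index + Quote.Length;
        }
        else
        {
            // closing quote: capture everything up to it;
            // the next match opens a new section
            startSection = true;
            results.Add(input.Substring(startIndex, m.Index - startIndex));
        }
    }
    return results;
}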


A character class […] describes a set of allowed characters, and a negated character class [^…] describes a set of disallowed characters. So [^&quot;(\\&quot;)] means any single character except &, q, u, o, t, ;, (, \, and ). It does not mean "anything except &quot; or \&quot;".

Try this instead:

&quot;(.*?)&quot;

The lazy (ungreedy) quantifier *? matches as little as possible, whereas the greedy quantifier * matches as much as possible.
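For illustration, here is a minimal C# sketch of both points in this answer. It assumes the input really contains the literal entity &quot; (as in the question); the class name QuoteSections and its two methods are hypothetical names chosen for this example, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuoteSections
{
    // Returns the text between each pair of &quot; entities.
    // The lazy group (.*?) stops at the first closing &quot; it can reach.
    public static List<string> Extract(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, "&quot;(.*?)&quot;"))
        {
            results.Add(m.Groups[1].Value);
        }
        return results;
    }

    // True when the negated class from the question accepts the single
    // character c. It rejects each listed character individually, so
    // ordinary letters such as 'o' or 't' are rejected as well.
    public static bool ClassAccepts(char c)
    {
        return Regex.IsMatch(c.ToString(), @"^[^&quot;(\\&quot;)]$");
    }
}
```

Called on the sample string from the question, Extract should return two sections, the second one including the <b> tags. ClassAccepts('o') is false while ClassAccepts('<') is true, which shows that the negated class rejects individual letters rather than the &quot; sequence as a whole.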


You can use HttpUtility.HtmlDecode to convert this text to normal characters. Then using a regex to extract text between the double quotes would be simple.
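A rough sketch of that approach, with a few assumptions: it uses System.Net.WebUtility.HtmlDecode rather than HttpUtility.HtmlDecode so that it does not depend on System.Web, and the class name DecodedQuoteSections is made up for this example. It also ignores backslash-escaped quotes.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class DecodedQuoteSections
{
    // Decodes HTML entities first, then extracts the text between
    // each pair of plain double quotes.
    public static List<string> Extract(string encoded)
    {
        // &quot; becomes ", &lt; becomes <, and so on.
        string decoded = WebUtility.HtmlDecode(encoded);

        var results = new List<string>();
        // Lazy match so each section ends at the nearest closing quote.
        foreach (Match m in Regex.Matches(decoded, "\"(.*?)\""))
        {
            results.Add(m.Groups[1].Value);
        }
        return results;
    }
}
```

Note that decoding changes every entity in the input, not just &quot;, so the extracted sections contain the decoded text.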
