C# Regex.Matches: problems with multiple match results

Source: https://www.devze.com, 2023-03-01 16:12 (origin: web)
I am trying to use Regex.Matches and it seems to work in a different way to what I am used to with other languages like PHP. Here is what I am trying to do:

I want to get all forms from a particular webpage, but the following does not work:

        String pattern = "(?i)<form[^<>]*>(.*)<\\/form>";
        MatchCollection matches = Regex.Matches(content, pattern);

        foreach (Match myMatch in matches)
        {
            MessageBox.Show(myMatch.Result("$1"));
        }

This code does not show anything even though there are three forms on that page. It seems that when I use (.*) it just skips everything till the end of the content.


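By default, the . metacharacter in .NET matches any character except \n (it does match \r), so (.*) cannot cross a line break and a form that spans several lines never matches. Try replacing this: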

MatchCollection matches = Regex.Matches(content, pattern );

with:

MatchCollection matches = Regex.Matches(content, pattern, RegexOptions.Singleline);
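
One caveat: with RegexOptions.Singleline alone, the greedy (.*) from the question will most likely swallow everything from the first <form> to the last </form> and return a single match. The sketch below compares the three variants on made-up sample input; the class name, method name, and sample HTML are assumptions for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Hypothetical sample input: three forms, each spanning several lines.
    public const string Content =
        "<form id=\"a\">\n<input name=\"x\">\n</form>\n" +
        "<FORM id=\"b\">\n<input name=\"y\">\n</FORM>\n" +
        "<form id=\"c\">\n<input name=\"z\">\n</form>\n";

    // Counts how many matches a pattern produces on the sample input.
    public static int CountMatches(string pattern, RegexOptions options)
    {
        return Regex.Matches(Content, pattern, options).Count;
    }

    public static void Main()
    {
        const string greedy = "(?i)<form[^<>]*>(.*)</form>";
        const string lazy = "(?i)<form[^<>]*>(.*?)</form>";

        // '.' stops at \n, so no form on this input can match.
        Console.WriteLine(CountMatches(greedy, RegexOptions.None));       // 0
        // '.' now crosses lines, but greedy (.*) runs to the last </form>.
        Console.WriteLine(CountMatches(greedy, RegexOptions.Singleline)); // 1
        // Lazy (.*?) stops at the nearest </form>, giving one match per form.
        Console.WriteLine(CountMatches(lazy, RegexOptions.Singleline));   // 3
    }
}
```

So Singleline fixes the "nothing is shown" symptom, but the lazy quantifier is what yields one match per form.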


Try something like this for the main portion of your Regex:

    String pattern = "<form[\\d\\D]*?</form>";

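It is a pattern I currently use to strip all tags of a specific type out of a document, and it should work just as well for finding form elements. [\d\D] means "a digit or a non-digit", so it matches any character, line breaks included, and no RegexOptions.Singleline is needed. You can swap it for an equivalent such as [\s\S] if you prefer.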
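
As a rough illustration of that pattern, here is a minimal sketch. The helper name FindForms and the sample HTML are hypothetical, and RegexOptions.IgnoreCase is an addition on my part so that <FORM> is found too.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnyCharClassDemo
{
    // [\d\D] matches any character, newlines included; *? keeps each match minimal.
    public const string Pattern = @"<form[\d\D]*?</form>";

    // Returns the full text of every form element found (hypothetical helper).
    public static string[] FindForms(string html)
    {
        MatchCollection matches = Regex.Matches(html, Pattern, RegexOptions.IgnoreCase);
        string[] result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            result[i] = matches[i].Value;
        }
        return result;
    }

    public static void Main()
    {
        string html = "<p>intro</p>\n<form id=\"a\">\n<input>\n</form>\n<FORM id=\"b\"></FORM>";
        Console.WriteLine(FindForms(html).Length); // 2
    }
}
```

Note that the pattern has no capture group, so each Match.Value is the whole element including its tags. The bare <form prefix would also match a tag whose name merely starts with "form".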


string pattern = @"(?is)<form[^<>]*>(.*?)</form>"; 

That regex should work the same in PHP and C# (or, more accurately, PCRE and .NET). If you're getting minimal matches in PHP without the ?, you probably have the /U ("ungreedy") option set, e.g.:

preg_match_all('~<form[^<>]*>(.*)</form>~isU', $subject, $matches);

or

preg_match_all('~(?isU)<form[^<>]*>(.*)</form>~', $subject, $matches);

.NET has no equivalent for PCRE's ungreedy mode.
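
In practice this means writing the lazy quantifier explicitly in .NET. A minimal sketch of the (?is) pattern above in use, reading the captured body through Groups[1].Value (an alternative to Result("$1") from the question). The class name, method name, and sample input are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FormExtractor
{
    // (?is): i = ignore case, s = Singleline ('.' also matches \n).
    // (.*?) is lazy, so each match ends at the nearest </form>.
    private const string Pattern = @"(?is)<form[^<>]*>(.*?)</form>";

    // Returns the text between each <form ...> and its closing </form>.
    public static List<string> InnerHtmlOfForms(string content)
    {
        var bodies = new List<string>();
        foreach (Match m in Regex.Matches(content, Pattern))
        {
            bodies.Add(m.Groups[1].Value);
        }
        return bodies;
    }

    public static void Main()
    {
        string content = "<form action=\"/a\">\n  one\n</form>\n<FORM>two</FORM>";
        foreach (string body in InnerHtmlOfForms(content))
        {
            Console.WriteLine("[" + body.Trim() + "]");
        }
    }
}
```

The inline (?is) prefix is equivalent to passing RegexOptions.IgnoreCase | RegexOptions.Singleline as the third argument to Regex.Matches.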
