I am trying to use Regex.Matches, and it seems to work differently from what I am used to in other languages such as PHP. Here is what I am trying to do:
I want to get all the forms from a particular web page, but when I try the following:
String pattern = "(?i)<form[^<>]*>(.*)<\\/form>";
MatchCollection matches = Regex.Matches(content, pattern );
foreach (Match myMatch in matches)
{
MessageBox.Show(myMatch.Result("$1"));
}
This code does not show anything, even though there are three forms on that page. It seems that when I use (.*), it just skips everything until the end of the content.
By default, the . metacharacter in .NET's Regex class matches any character except \n (it does match \r), so (.*) cannot run past the end of a line. Try replacing this:
MatchCollection matches = Regex.Matches(content, pattern );
with:
MatchCollection matches = Regex.Matches(content, pattern, RegexOptions.Singleline);
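A minimal sketch of the difference, written as C# top-level statements (so it assumes .NET 5 or later). The content string is a made-up stand-in for the downloaded page, with the form body spread over several lines:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical page content: the form body spans several lines.
string content = "<html>\n<FORM action=\"/a\">\n<input name=\"q\">\n</form>\n</html>";
string pattern = "(?i)<form[^<>]*>(.*)</form>";

// Default options: '.' matches any character except '\n',
// so (.*) cannot cross the line break and there is no match at all.
MatchCollection withoutSingleline = Regex.Matches(content, pattern);

// RegexOptions.Singleline lets '.' match '\n' as well.
MatchCollection withSingleline = Regex.Matches(content, pattern, RegexOptions.Singleline);

Console.WriteLine(withoutSingleline.Count); // 0
Console.WriteLine(withSingleline.Count);    // 1

// Group 1 is the form body; Trim() drops the surrounding newlines.
Console.WriteLine(withSingleline[0].Groups[1].Value.Trim()); // <input name="q">
```

Note that with several forms on the page, Singleline plus the greedy (.*) returns a single match running from the first <form to the last </form>, which is why a lazy quantifier (.*?) is also needed to get each form separately.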
Try something like this for the main portion of your Regex:
String pattern = "<form[\\d\\D]*?</form>";
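It is a pattern I am currently using to strip all tags of a specific type out of a document, but it should work just as well for finding form tags. The class [\d\D] means "a digit or a non-digit", i.e. any character including newlines, so it works without RegexOptions.Singleline, and the lazy *? stops at the first </form> instead of the last one. You can swap [\d\D] for another any-character class such as [\s\S] if you prefer.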
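A small sketch of that pattern on a made-up page with two multi-line forms (C# top-level statements, .NET 5 or later assumed). RegexOptions.IgnoreCase is an addition here, to match the (?i) in the question; the answer's pattern itself is case-sensitive:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical page with two forms, the first spanning several lines.
string content = "<form id=\"a\">\n<input>\n</form>\n<p>text</p>\n<form id=\"b\">\n</form>";

// [\d\D] matches any character, including '\n', so no Singleline option is needed.
// The lazy *? ends each match at the first </form> that follows.
MatchCollection forms = Regex.Matches(content, @"<form[\d\D]*?</form>", RegexOptions.IgnoreCase);

foreach (Match form in forms)
{
    // Value is the whole element, from <form to </form>; there is no capture group.
    Console.WriteLine(form.Value);
}
```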
string pattern = @"(?is)<form[^<>]*>(.*?)</form>";
That regex should work the same in PHP and C# (or, more accurately, PCRE and .NET). If you're getting minimal (lazy) matches in PHP without the ? after .*, you probably have the /U ("ungreedy") option set, e.g.:
preg_match_all('~<form[^<>]*>(.*)</form>~isU', $subject, $matches);
or
preg_match_all('~(?isU)<form[^<>]*>(.*)</form>~', $subject, $matches);
.NET has no equivalent of PCRE's ungreedy mode, so every quantifier that should be lazy has to be written with an explicit ? (for example .*?).
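A minimal sketch contrasting the greedy and lazy versions of that regex in .NET (C# top-level statements, .NET 5 or later assumed). The three-form content string is invented for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical page with three forms, mixed case, spread over several lines.
string content = "<FORM id=\"1\">\none\n</FORM>\n<form id=\"2\">two</form>\n<Form id=\"3\">\nthree\n</Form>";

// (?is): i = case-insensitive, s = single-line ('.' also matches '\n').
// Greedy (.*) runs to the LAST </form>, so all three forms collapse into one match.
MatchCollection greedy = Regex.Matches(content, @"(?is)<form[^<>]*>(.*)</form>");

// Lazy (.*?) stops at the FIRST </form> after each opening tag.
// With no /U switch in .NET, the ? has to be written out explicitly.
MatchCollection lazy = Regex.Matches(content, @"(?is)<form[^<>]*>(.*?)</form>");

Console.WriteLine(greedy.Count); // 1
Console.WriteLine(lazy.Count);   // 3

foreach (Match m in lazy)
{
    // Prints the body of each form: one, two, three.
    Console.WriteLine(m.Groups[1].Value.Trim());
}
```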