php regex problem

Developers https://www.devze.com 2023-02-03 15:22 Source: Internet

I want to get the <form> from the site, but in this case there is still a lot of other HTML code inside the form part. How can I remove it? I mean, how can I use PHP with a regular expression to get just the <form> and </form> part from the site?

$str = file_get_contents('http://bingphp.codeplex.com');
preg_match_all('~<form.+</form>~iUs', $str, $match);
var_dump($match);

You should not use regular expressions for extracting HTML content. Use a DOM parser.

E.g.

$doc = new DOMDocument();
$doc->loadHTMLFile("http://bingphp.codeplex.com");

$forms = $doc->getElementsByTagName('form');
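As a minimal sketch of what to do with the resulting node list, the markup of each form can be serialized with saveHTML(). The inline $html string below is a hypothetical stand-in for the downloaded page, and the libxml_use_internal_errors() call is my addition to keep parser warnings about sloppy markup quiet:

```php
<?php
// Hypothetical sample markup standing in for the downloaded page.
$html = '<html><body><p>intro</p>'
      . '<form action="/search"><input type="text" name="q"></form>'
      . '<p>footer</p></body></html>';

libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$formsHtml = [];
foreach ($doc->getElementsByTagName('form') as $form) {
    // saveHTML($node) serializes only that node and its children.
    $formsHtml[] = $doc->saveHTML($form);
}

print_r($formsHtml);
```

Each entry in $formsHtml should then hold one form element with its contents, without the surrounding page.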

Update: If you want to remove the forms (not sure if you meant that):

// Iterate backwards: the node list is live and shrinks as nodes are removed.
for ($i = $forms->length - 1; $i >= 0; $i--) {
    $node = $forms->item($i);
    $node->parentNode->removeChild($node);
}

Update 2:

I just noticed that they have one form that wraps the whole body content. So this way or another, you will get the whole page actually.
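If a form wraps the whole page like that, one possible way to skip the surrounding markup is to query only the form controls with DOMXPath. This is a rough sketch, assuming that the input, select and textarea elements are what you are after; the sample markup and field names are made up for illustration:

```php
<?php
// Hypothetical page where a single form wraps all of the content.
$html = '<html><body><form action="/post">'
      . '<div><h1>Title</h1><input type="text" name="user"></div>'
      . '<p>Some text</p><select name="lang"><option>php</option></select>'
      . '<input type="submit" name="go" value="Go">'
      . '</form></body></html>';

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$fields = [];
// Select only form controls nested anywhere inside a form element.
$query = '//form//*[self::input or self::select or self::textarea]';
foreach ($xpath->query($query) as $field) {
    $fields[] = $field->nodeName . ':' . $field->getAttribute('name');
}

print_r($fields);
```

The headings and paragraphs inside the form are ignored, so only the controls remain in $fields.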

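The regex problem lies in the greediness. For such cases .+? is advisable. Note that the U modifier in your pattern already inverts the greediness of every quantifier, so drop U if you switch to .+? (combining the two makes the match greedy again).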

But what @Felix said. While a regular expression is workable for HTML extraction, you often look for something specific, and should thus rather parse it. It's also much simpler if you use QueryPath:

 $str = file_get_contents('http://bingphp.codeplex.com');
 print qp($str)->find("form")->html();
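To illustrate the greediness point from this answer, here is a small sketch comparing a greedy pattern, a lazy one, and the U modifier. The sample string with two forms is made up for the comparison:

```php
<?php
// Hypothetical sample with two separate forms.
$str = '<form id="a">one</form><p>between</p><form id="b">two</form>';

// Greedy: .+ runs to the last </form>, so both forms land in one match.
preg_match_all('~<form.+</form>~is', $str, $greedy);

// Lazy: .+? stops at the first </form>, giving one match per form.
preg_match_all('~<form.+?</form>~is', $str, $lazy);

// The U modifier flips the default, so a plain .+ is already lazy here.
preg_match_all('~<form.+</form>~isU', $str, $ungreedy);

var_dump(count($greedy[0]), count($lazy[0]), count($ungreedy[0]));
```

Neither variant understands nesting: with a form inside a form, or one form wrapping the whole page, a regex still cannot pair the tags reliably.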

The best way I can think of is to use the Simple HTML DOM library with PHP to get the form(s) from the HTML page using DOM queries.

It is a little more convenient than using built-in XML parsers like SimpleXML or DOMDocument.

You can find the library here.

Normally you should use DOM to parse HTML, but in this case the web site is very far from being standard HTML, with some of the code being modified in place by JavaScript. It therefore cannot be loaded into the DOM object. This might be intentional, a way of obfuscating the code.

In any case, it is not so much your RE (although using a non-greedy match would help), but the design of the site itself which is preventing you from parsing out what you want.
