When I execute the following code, I get a segmentation fault every time. Is this a known bug? How can I make this code work?
<?php
$doc = file_get_contents("http://prairieprogressive.com/");
$replace = array(
"/<script([\s\S])*?<\/ ?script>/",
"/<style([\s\S])*?<\/ ?style>/",
"/<!--([\s\S])*?-->/",
"/\r\n/"
);
$doc = preg_replace($replace,"",$doc);
echo $doc;
?>
The error (obviously) looks like:
[root@localhost 2.0]# php test.php
Segmentation fault (core dumped)
You have unnecessary capture groups, and repeating a capture group once per character, as in ([\s\S])*?, puts a heavy strain on PCRE's backtracking. Try this:
$replace = array(
"/<script.*?<\/\s?script>/s",
"/<style.*?<\/\s?style>/s",
"/<!--.*?-->/s",
"/\r\n/"
);
Another thing: \s (whitespace) combined with \S (non-whitespace) matches any character, so you can just use the . pattern with the s modifier instead.
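To check patterns of this kind without fetching the page, they can be run against a small inline string. This is only a sketch: $sample is a made-up stand-in for the downloaded HTML, and the patterns are the capture-free form with the s modifier, with nothing required between the opening tag and the closing tag.

```php
<?php
// Hypothetical sample input; it stands in for the downloaded page.
$sample = "<html><head>\r\n<style type=\"text/css\">\r\nbody { color: red; }\r\n</style>\r\n"
        . "<script>\r\nalert('hi');\r\n</script></head>\r\n"
        . "<body><!-- note -->Hello</body></html>";

$replace = array(
    "/<script.*?<\/\s?script>/s", // s lets . cross line breaks
    "/<style.*?<\/\s?style>/s",
    "/<!--.*?-->/s",
    "/\r\n/"                      // no . in this pattern, so s is not needed
);

$clean = preg_replace($replace, "", $sample);
echo $clean, "\n";
```

On this sample, the script block, the style block, the comment, and the CRLF line breaks are all removed, leaving only the surrounding markup.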
OK! It seems like there is some issue with the () operators...
When I use
$doc = preg_replace("/<style([\s\S]*)<\/ ?style>/",'',$doc);
instead of
$doc = preg_replace("/<style([\s\S])*<\/ ?style>/",'',$doc);
it works!!
This seems to be a bug.
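The difference between the two forms is where the quantifier sits: ([\s\S])* repeats the whole capture group once per character, while ([\s\S]*) is a single group around a repeated character class. Older PCRE builds handle each repetition of a group recursively, so a long block can exhaust the stack, which is a likely explanation for the crash rather than a confirmed diagnosis. The sketch below uses a short made-up string to show that both forms produce the same result, and wraps preg_replace in a hypothetical helper (replace_or_fail is not part of PHP) that surfaces PCRE errors. It cannot catch a hard segmentation fault; lowering the pcre.recursion_limit ini setting may turn some crashes into a reportable error instead.

```php
<?php
// Hypothetical helper: run preg_replace and fail loudly when PCRE
// reports an error (preg_replace returns null in that case).
function replace_or_fail($pattern, $replacement, $subject)
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        throw new RuntimeException("PCRE error code: " . preg_last_error());
    }
    return $result;
}

// Small made-up input, short enough to be safe with either pattern.
$html = "<p>a</p><style>\np { margin: 0; }\n</style><p>b</p>";

// Capture group repeated once per character (the form that crashed).
$perChar = replace_or_fail("/<style([\s\S])*<\/ ?style>/", "", $html);

// One capture group around a repeated character class (the form that worked).
$oneGroup = replace_or_fail("/<style([\s\S]*)<\/ ?style>/", "", $html);

var_dump($perChar === $oneGroup);
```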
As you mentioned in the comment, it is the style regex that is causing this. As a workaround, you can use the s modifier so that . also matches newlines:
$doc = preg_replace("/<style.*?<\/ ?style>/s",'',$doc);
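The effect of the s modifier is easy to see on a style block that spans several lines. A minimal sketch, using a made-up input string:

```php
<?php
// Made-up input with a style block spanning three lines.
$html = "<style>\nbody { color: red; }\n</style>Hello";

// Without s, . stops at the first newline, so nothing matches
// and the input comes back unchanged.
$without = preg_replace("/<style.*?<\/ ?style>/", "", $html);

// With s, . also matches newlines and the whole block is removed.
$with = preg_replace("/<style.*?<\/ ?style>/s", "", $html);

var_dump($without === $html);
var_dump($with);
```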
Try this (added the u option for Unicode and changed ([\s\S])*? to .*?):
<?php
$doc = file_get_contents("http://prairieprogressive.com/");
$replace = array(
// s is also set so that . matches newlines in multi-line blocks
"#<script.*?</ ?script>#su",
'#<style.*?</ ?style>#su',
"#<!--.*?-->#su",
"#\r\n#u"
);
$doc = preg_replace($replace,"",$doc);
echo $doc;
?>
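One caveat with the u option: it makes PCRE treat both pattern and subject as UTF-8, and preg_replace returns null when the subject is not valid UTF-8. Since a fetched page may use another encoding, it is worth checking the return value. A small sketch with a made-up ISO-8859-1 string:

```php
<?php
// "\xE9" is "é" in ISO-8859-1, but on its own it is not valid UTF-8.
$latin1 = "caf\xE9 <!-- note -->";

// With u, the invalid byte makes the whole call fail and return null.
$result = preg_replace("#<!--.*?-->#su", "", $latin1);
$err = preg_last_error();

var_dump($result);
var_dump($err === PREG_BAD_UTF8_ERROR);

// Without u, the subject is treated as raw bytes and the comment is removed.
$bytes = preg_replace("#<!--.*?-->#s", "", $latin1);
var_dump($bytes === "caf\xE9 ");
```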
What is the point of [\s\S]? It matches any whitespace character and any non-whitespace character, which is to say any character at all. If you replace it with .*, it works just fine.
EDIT: If you want to match newlines too, use the s modifier. In my opinion, that is easier to understand than the contradictory-looking [\s\S].