I'm trying to set up some exotic PHP code (I'm not an expert), and I get a FastCGI Error 500 on a PHP line containing 'preg_match_all'.
When I comment out the line, the page is returned with a 200 (though not rendered the way it was meant to be).
The code parses PHP, HTML and JavaScript content loaded from the database and composes it into the finished page.
Now, by placing some error_log entries around the code, I could determine that the line with the preg_match_all is the cause of the 500. However, the line is hit multiple times while the page loads, and on the other occasions it does not cause an error.
Here's exactly what it looks like:
preg_match_all ("/(<([\w]+)[^>]*>)((?:.|\n)*)(<\/\\2>)/",
$part['data'], $tags, PREG_PATTERN_ORDER|PREG_OFFSET_CAPTURE);
The subject string is a piece of text that looks like:
<script> ... some javascript functions ... </script>
Edit: This code is up and running correctly elsewhere, so this could very well be a PHP setting or an environment difference. I'm using PHP 5.2.13 on IIS6 with FastCGI.
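Since the same code works on another server, one way to narrow down the difference is to compare the PCRE-related settings on both machines. A minimal sketch; the helper name `describe_pcre_environment` is hypothetical, and it only reports settings, it cannot report the process stack size.

```php
<?php
// Hypothetical helper: gathers the PCRE-related settings most likely to differ
// between a server where the regex works and one where it crashes.
function describe_pcre_environment()
{
    return array(
        'php_version'     => PHP_VERSION,
        // Version of the PCRE library PHP was built with (PHP >= 5.2.4).
        'pcre_version'    => PCRE_VERSION,
        // Cap on backtracking steps before preg_* gives up (PHP >= 5.2.0).
        'backtrack_limit' => ini_get('pcre.backtrack_limit'),
        // Cap on PCRE recursion depth (PHP >= 5.2.0). The PHP manual warns that
        // a value too high for the process stack can crash PHP instead of
        // producing an error.
        'recursion_limit' => ini_get('pcre.recursion_limit'),
    );
}

// Run on both servers and diff the logged output.
error_log(print_r(describe_pcre_environment(), true));
```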
Edit: Nothing shows up in the log files, at least not in the ones I checked:
- IIS Logs
- Event Logs
- PHP Log
Edit: jab11 has pointed out the problem, but there's no solution yet.
Any thoughts or direction would be welcome.
Any chance that $part['data'] might be extremely big? I used to get a 500 error on preg_match_all when I used it on strings bigger than 100 KB.
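To check whether input size is the trigger, one option is to log the subject length right before the call and the PCRE error code if the call returns false. A minimal sketch; `match_tags_logged` is a hypothetical name, and the error branch only helps when PCRE reports an error rather than taking down the FastCGI process.

```php
<?php
// Hypothetical wrapper around the question's call: log the subject size
// before matching and the PCRE error code if the match fails.
function match_tags_logged($subject, &$tags)
{
    // Logged first, so that if the process dies inside preg_match_all(),
    // the size of the offending subject is still in the log.
    error_log('preg_match_all subject length: ' . strlen($subject));

    $count = preg_match_all("/(<([\w]+)[^>]*>)((?:.|\n)*)(<\/\\2>)/",
        $subject, $tags, PREG_PATTERN_ORDER | PREG_OFFSET_CAPTURE);

    if ($count === false) {
        // One of the PREG_*_ERROR constants, e.g. PREG_BACKTRACK_LIMIT_ERROR.
        error_log('preg_match_all failed, preg_last_error() = ' . preg_last_error());
    }
    return $count;
}
```

It would be called in place of the original line as `$count = match_tags_logged($part['data'], $tags);`.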
This is a wonderful example of why it's a bad idea to process HTML with regular expressions. I'm willing to bet you're running into a stack overflow because the HTML source string contains some unclosed tags, making the regex try all sorts of permutations in its futile attempt to find a closing tag (</\2>). In an HTML file of 32 KB, it's easy to throw your regex off the trolley. Perhaps the stack size differs between the two servers, so the same input fits on one but overflows on the other.
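If the stack-overflow explanation is right, one mitigation that keeps the regex is to avoid the repeated group `((?:.|\n)*)`. As far as I know, the PCRE library bundled with PHP 5 recurses on the C stack once per iteration of a repeated group, whereas a single-character repeat such as `.*` does not nest that way. With the `s` modifier, `.` also matches `\n`, so under PCRE's default LF newline setting `(.*)` captures the same text. This is a sketch, not a fix for the backtracking itself: unclosed tags still make the engine work hard, so the result must be checked for `false`. `match_tags_dotall` is a hypothetical name.

```php
<?php
// Hypothetical variant of the question's call. With the /s modifier "." also
// matches "\n", so "(.*)" captures the same text as "((?:.|\n)*)" but as a
// single-character repeat instead of a repeated group.
function match_tags_dotall($subject)
{
    $tags = array();
    $count = preg_match_all("/(<([\w]+)[^>]*>)(.*)(<\/\\2>)/s", $subject, $tags,
        PREG_PATTERN_ORDER | PREG_OFFSET_CAPTURE);

    // Unclosed tags still cause heavy backtracking, which can exhaust
    // pcre.backtrack_limit; preg_match_all() then returns false.
    if ($count === false) {
        error_log('preg_match_all failed, preg_last_error() = ' . preg_last_error());
        return false;
    }
    return $tags;
}
```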
A quick test:
I applied the regex to the source code of this page (after removing the closing </html> tag). RegexBuddy promptly went catatonic for about a minute before successfully matching the <head> and <body> tags. Debugging the regex from <html> on showed that it took the regex engine 970,257 steps to find out that it couldn't match.
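Following the advice not to process HTML with regular expressions, a parser-based alternative is to let PHP's DOM extension (libxml's HTML parser, which recovers from unclosed tags) locate the elements. This swaps in a different technique from the original code and is only a sketch: it assumes the DOM extension is enabled, `extract_script_bodies` is a hypothetical name, and unlike PREG_OFFSET_CAPTURE it does not return byte offsets, so it is not a drop-in replacement.

```php
<?php
// Hypothetical alternative to the regex: parse the markup and collect the
// text content of every <script> element, tolerating unclosed tags.
function extract_script_bodies($html)
{
    $doc = new DOMDocument();
    // Collect parser warnings about malformed markup internally instead of
    // emitting them; the parser still recovers and builds a tree.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $bodies = array();
    foreach ($doc->getElementsByTagName('script') as $script) {
        $bodies[] = $script->textContent;
    }
    return $bodies;
}
```

Here a missing closing tag costs the parser no extra work, whereas for the regex it means exploring every possible end position before giving up.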