I'm looking for a way to remove all JavaScript tags from an HTML string.
The following regex works fine, but I would like to add an exception:
$html = preg_replace('#<script[^>]*>.*?</script>#is', '', $html);
How can I add a rule so that scripts of type text/html are ignored?
<script type="text/html" ... > ... </script>
Any suggestions?
Thanks in advance.
You may not be trying to sanitize untrusted HTML, but just so readers of this question don't get the wrong idea:
- It won't remove JavaScript outside <script> elements: <img src=bogus onerror=alert(42)>
- It won't remove barely obfuscated scripts: <script>alert(42)</script >
- It will turn invalid content into scripts: <scrip<script></script>t>alert(42)</script>
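To make those three points concrete, here is a small PHP sketch that runs the question's regex over each example input. The helper name strip_scripts_naive is hypothetical; the pattern is copied unchanged from the question.

```php
<?php
// The regex from the question, wrapped in a hypothetical helper.
function strip_scripts_naive(string $html): string
{
    return preg_replace('#<script[^>]*>.*?</script>#is', '', $html);
}

$cases = [
    // JavaScript in an event-handler attribute: there is no <script> element at all.
    '<img src=bogus onerror=alert(42)>',
    // The closing tag has a space before ">", so the literal "</script>" never matches.
    '<script>alert(42)</script >',
    // Removing the inner empty script joins "<scrip" and "t>..." into a new script tag.
    '<scrip<script></script>t>alert(42)</script>',
];

foreach ($cases as $input) {
    echo $input, "  =>  ", strip_scripts_naive($input), PHP_EOL;
}
```

The first two inputs come back unchanged, and the third comes back as <script>alert(42)</script>.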
I'm not saying this is what you're trying to do. You may have perfectly good reasons for doing this that don't have to do with untrusted inputs, but, for later readers, don't try to roll your own HTML sanitizer with just regular expressions.
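If the goal is only the question's narrow task (dropping <script> elements but keeping type="text/html" templates), one alternative to a regex is to let an HTML parser find the elements. This is a minimal sketch, assuming PHP's DOM extension (DOMDocument) is enabled and the input is plain ASCII, since loadHTML treats undeclared input as ISO-8859-1. The function name is hypothetical. It is still not a sanitizer: attributes such as onerror are left alone. In some libxml versions, fragments with several top-level elements may be re-nested when LIBXML_HTML_NOIMPLIED is used, so wrapping the fragment in a single container element is the safer input.

```php
<?php
// Sketch: remove <script> elements via DOMDocument instead of a regex,
// keeping scripts whose type attribute is "text/html".
// Not an HTML sanitizer: event-handler attributes (onerror, ...) survive.
function remove_scripts_except_html_templates(string $html): string
{
    if ($html === '') {
        return ''; // loadHTML rejects empty input
    }

    $doc = new DOMDocument();
    // Collect parser warnings about imperfect markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Don't add implied <html>/<body> wrappers or a doctype to the fragment.
    $doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list to an array before removing nodes from the tree.
    $scripts = iterator_to_array($doc->getElementsByTagName('script'));
    foreach ($scripts as $script) {
        $type = strtolower(trim($script->getAttribute('type')));
        if ($type !== 'text/html') {
            $script->parentNode->removeChild($script);
        }
    }

    return (string) $doc->saveHTML();
}
```

Usage: remove_scripts_except_html_templates('<div><script>alert(1)</script><script type="text/html">tpl</script></div>') drops the first script and keeps the template.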
Use a greedy match that won't fall to Mike's pointers, like so:
$html = preg_replace('#<script.*</script>#is', '', $html);
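This (greedily) matches everything from the first <script to the last </script>, so any markup between two separate script elements is removed along with them. As for the exception, I'm not sure how to do that, sorry.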
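For the exception itself, one regex-only option is a negative lookahead right after <script that rejects opening tags containing type="text/html". This is a sketch, not a tested drop-in: the function name is hypothetical, it assumes the type attribute is preceded by whitespace inside the tag, and it carries every caveat from the first answer. It keeps the question's lazy .*? on purpose: a greedy .* would run from the first <script to the last </script> and swallow any text/html template sitting in between.

```php
<?php
// Sketch: the question's regex plus a negative lookahead that skips
// <script> tags whose type attribute is text/html (quoted or unquoted).
// Still a text filter, not an HTML sanitizer.
function strip_scripts_keep_html_templates(string $html): string
{
    // (?!...) fails the match when, before the tag's closing ">", the tag
    // contains whitespace, then type=, an optional quote, then text/html.
    $pattern = '#<script(?![^>]*\stype\s*=\s*["\']?text/html)[^>]*>.*?</script>#is';
    return preg_replace($pattern, '', $html);
}

$html = '<p>a</p><script>alert(1)</script>'
      . '<script type="text/html"><b>tpl</b></script>'
      . '<script src="app.js"></script><p>b</p>';

echo strip_scripts_keep_html_templates($html), PHP_EOL;
// <p>a</p><script type="text/html"><b>tpl</b></script><p>b</p>
```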