Let's assume we have a user form that generates HTML input, and the following is an example of what might get POSTed to PHP:
<p>Hello</p>
<p><strong>World</strong></p>
Later, this markup gets injected into the HTML output, inside some DIV.
What I'd like to prevent is the following being entered in:
</div>
<p>Hello</p>
<p><strong>World</strong></p>
<div>
Or even something like:
</div>
<script> someScript(); </script>
<iframe src="http://www.example.com">......
<p>Hello</p>
<p><strong>World</strong></p>
<div>
How can I use PHP to make sure that this input will not break the document, include bad iframes, or run scripts? The most important part is that I still want that information. I'm not throwing it out, but it needs to be included as harmless text of some sort.
Using alternative markup is not an option; it needs to be HTML.
What you need is HTML Purifier.
Not only does it output HTML that conforms to the standards, it also cleans XSS vulnerabilities out of the posted code.
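As a rough illustration of the suggestion above, a minimal HTML Purifier call might look like the sketch below. The include path, the helper name purify_user_html, and the allowed-tag list are assumptions made for this example. Core.EscapeInvalidTags is, as far as I can tell from the HTML Purifier configuration docs, the directive that writes disallowed tags back out as escaped text instead of dropping them, so verify it against your installed version before relying on it.

```php
<?php
// Hypothetical helper; the name and the whitelist are illustrative only.
function purify_user_html(string $dirty): string
{
    // Assumes the standalone HTML Purifier distribution is on the include path.
    // With Composer you would load vendor/autoload.php instead.
    require_once 'HTMLPurifier.auto.php';

    $config = HTMLPurifier_Config::createDefault();
    // Whitelist only the elements the form is expected to produce.
    $config->set('HTML.Allowed', 'p,strong,em');
    // Assumption: ask for disallowed tags to be escaped to text
    // rather than silently removed.
    $config->set('Core.EscapeInvalidTags', true);

    $purifier = new HTMLPurifier($config);
    return $purifier->purify($dirty);
}
```

Calling purify_user_html($_POST['body']) (the field name is made up) would then return markup that should be safe to place inside the DIV. Exactly what survives depends on the configuration, so it is worth running the examples from the question through it.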
Edit 1: You should also check out the comparison; it's interesting :)
Edit 2: You can also check out htmlspecialchars and htmlentities, but IMO HTML Purifier is far better and much more customizable when it comes to more complex cases like yours.
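To make the difference concrete: htmlspecialchars keeps everything, but it turns all markup into visible text, including the legitimate <p> and <strong> tags. A small sketch using part of the second example from the question (the variable names are made up for illustration):

```php
<?php
// Example input based on the question: a stray </div> plus a script tag.
$posted = "</div>\n<script> someScript(); </script>\n<p>Hello</p>";

// Escape &, <, > and both quote styles so the browser shows the markup as text.
$escaped = htmlspecialchars($posted, ENT_QUOTES, 'UTF-8');

echo $escaped;
```

Nothing is lost and nothing can execute, but the user's formatting is gone too, which is why a whitelist-based filter fits this question better.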
If you want to keep the broken tags but render them harmless, I'd suggest saving the input twice. Save the unmodified POST data into one database column, and the purified version into another. Display the purified version normally, and the dangerous version only when you need to.
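The two-column idea could be sketched roughly as follows. The function names and the body_raw / body_clean keys are hypothetical, and the purifier is passed in as a callable so the sketch does not depend on any particular library. With HTML Purifier you would pass [$purifier, 'purify'].

```php
<?php
// Hypothetical helper: returns both versions so each can go into its own column.
// $purify is any callable that turns untrusted HTML into safe HTML.
function prepare_post_for_storage(string $raw, callable $purify): array
{
    return [
        'body_raw'   => $raw,           // untouched submission, never echoed directly
        'body_clean' => $purify($raw),  // what normal page views display
    ];
}

// When the raw version has to be shown (say, to a moderator), escape it first
// so it appears as text instead of being interpreted by the browser.
function render_raw_as_text(string $raw): string
{
    return '<pre>' . htmlspecialchars($raw, ENT_QUOTES, 'UTF-8') . '</pre>';
}
```

Keeping the original around also means you can re-purify old posts later if you change the filter configuration.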
Somewhere on the HTML Purifier support forums there's an example of how to change <a href="dangerous.url.or.javascript">text</a> to <span>text (dangerous.url.or.javascript)</span>. This may be the sort of thing you're looking for when you say you want to keep the information, not throw it out.
HTML Purifier is highly customisable, and the author, Ambush Commander, is very helpful both on the HTML Purifier forum and here at StackOverflow.