I have two applications where users can submit HTML pages. I would like to make sure that no scripts are included in the HTML. Normally you would escape the content to get rid of scripts, but as this is HTML I can't do that. Does anyone have good suggestions on how to do this? One application is written in C# and the other in Java.
OWASP has a project to scrub HTML and CSS.
The first thing I'd do is check whether there is a <script> tag in the HTML. That solves the first issue; then you have to make sure there are no inline event handlers such as onmouseover or onclick. You could use a DOM parser to walk over all elements and remove every attribute whose name starts with 'on'.
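To make the DOM idea concrete, here is a rough Java sketch using only the JDK's built-in XML parser. It assumes the submitted page is well-formed XHTML without a DOCTYPE (the parser is configured to reject DOCTYPE declarations); real-world HTML usually is not, so you would likely need a lenient HTML parser in front of it. The class name `ScriptStripper` and the method `strip` are made up for illustration. It removes <script> elements and attributes starting with "on", and nothing else: it does not handle things like `javascript:` URLs, so treat it as a starting point rather than a complete sanitizer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
final class ScriptStripper {

    // Parses well-formed XHTML, strips scripts and on* attributes, and
    // returns the cleaned markup. Throws if the input is not well-formed.
    static String strip(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Reject DOCTYPE declarations to avoid entity-expansion (XXE) attacks.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        clean(doc.getDocumentElement());

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Recursively removes on* attributes from the element and drops any
    // <script> child elements. The root element itself is never removed.
    private static void clean(Element element) {
        // Collect names first: the attribute map is live and must not be
        // modified while iterating over it.
        NamedNodeMap attrs = element.getAttributes();
        List<String> toRemove = new ArrayList<>();
        for (int i = 0; i < attrs.getLength(); i++) {
            String name = attrs.item(i).getNodeName();
            if (name.toLowerCase(Locale.ROOT).startsWith("on")) {
                toRemove.add(name);
            }
        }
        for (String name : toRemove) {
            element.removeAttribute(name);
        }

        // Snapshot the children, because the NodeList is live as well.
        NodeList nodes = element.getChildNodes();
        List<Node> children = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            children.add(nodes.item(i));
        }
        for (Node child : children) {
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            Element childElement = (Element) child;
            if (childElement.getTagName().equalsIgnoreCase("script")) {
                element.removeChild(childElement);
            } else {
                clean(childElement);
            }
        }
    }
}
```

You would call it as `String safe = ScriptStripper.strip(userInput);` and reject the submission if it throws. Removing unwanted pieces like this is a blacklist approach; a whitelist of allowed tags and attributes is generally the safer design.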
I have little to no experience with either C# or Java, so I am unaware of any "easier" solutions that are already available. But maybe someone else here has a better idea for that.