Overflowed Stack,
I have a Java web application (tomcat) whereby I allow the user to upload HTML code through a form.
Now, since I am running on Tomcat and I actually display the user-uploaded HTML, I do not want a user to upload malicious code (JSP tags, scriptlets, EL) and have it executed on the server. I want to filter out any JSP/non-HTML content.
Writing a parser myself seems too onerous, given the many subtleties one has to take care of (comments, byte representation for the scripts, etc.).
Do you know of any API/library which does this for me? I know about Caja filtering, but I am looking for something specifically for JSPs.
Many Thanks, JP, Malta.
Using a library for content cleaning is better than trying to do it yourself with e.g. Regexes.
Try AntiSamy from the Open Web Application Security Project (OWASP).
http://www.owasp.org/index.php/Antisamy
I haven't used it (yet), but it seems to be suitable. JSP content should be automatically removed/escaped by the HTML normalization.
Edit, just found these:
Best Practice: User generated HTML cleaning
RegEx match open tags except XHTML self-contained tags
Don't worry about executing JSP code. Your JSP will be turned into a servlet once, so you will have something like:
out.println(contents);
and the contents won't be evaluated as JSP code. But you must worry about malicious JavaScript.
Just save it as *.html, not as *.jsp; then it won't be passed through the JspServlet, which does all the taglib/EL processing work. All taglibs/EL will end up plain (unparsed) in the response.
I'm not sure I have understood your question completely, but if you want to remove all content surrounded by "<%@ .. %>", you can replace it with a regex.
String resultString = subjectString.replaceAll("(?sim)<%@ .*? %>", "");
I don't have a library to remove JSP tags, but you can write a little one based on regexps that would:
- delete all "<% %>" tags
- delete all HTML tags that contain the ':' character (to avoid "" tags, for example)
I don't know whether all potentially malicious Java code is covered by these two filters, but it is a good start...
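The two filters above could be sketched roughly as follows. This is only an illustration under assumptions: the class name JspTagStripper and the method strip are hypothetical, "tags that contain ':'" is interpreted as tags whose name carries a namespace prefix (so ordinary links with "http://" in an attribute survive), and EL expressions such as ${...} are not handled at all. The loop repeats the filters until nothing changes, because a single pass can be tricked by nested input.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class JspTagStripper {
    // Scriptlets, expressions, declarations, directives and JSP comments: <% ... %>
    private static final Pattern SCRIPTING = Pattern.compile("<%.*?%>", Pattern.DOTALL);
    // Opening, closing and self-closing tags with a prefixed name, e.g. <jsp:include ...> or </c:if>
    private static final Pattern PREFIXED_TAG = Pattern.compile("</?[A-Za-z][\\w.-]*:[^>]*>");

    static String strip(String html) {
        String current = html;
        String previous;
        // Re-run both filters until the text is stable, so that input like
        // "<<% %>% ... %>" cannot reassemble a scriptlet after one pass.
        do {
            previous = current;
            current = SCRIPTING.matcher(current).replaceAll("");
            current = PREFIXED_TAG.matcher(current).replaceAll("");
        } while (!current.equals(previous));
        return current;
    }
}
```

A call such as JspTagStripper.strip(uploadedHtml) would be applied before the upload is stored. An unterminated "<%" or an attribute value containing '>' is not covered, so this is a starting point rather than a complete filter.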
Another solution, but a little more complicated: use an HTTP proxy server (Apache httpd, Nginx, etc.) that directly serves static resources (CSS, images, HTML pages) and forwards only dynamic resources to Tomcat (JSPs and .do actions, for example). When a file is uploaded, you force the file extension to ".html". You can then be sure (thanks to the HTTP proxy) that the file will not be interpreted by Tomcat.
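Forcing the extension at upload time might look like the sketch below. The class UploadNames, the method forceHtmlExtension and the fallback name "upload" are made up for illustration; the proxy configuration itself is not shown. The helper drops any directory part, cuts the name at the first dot so that no original extension survives, and replaces unusual characters.

```java
// Hypothetical helper, not part of any library.
final class UploadNames {
    static String forceHtmlExtension(String originalName) {
        // Drop any directory part the client may have sent (either separator).
        int slash = Math.max(originalName.lastIndexOf('/'), originalName.lastIndexOf('\\'));
        String base = originalName.substring(slash + 1);

        // Cut at the first dot so "page.jsp" and "page.jsp.html" lose every extension.
        int dot = base.indexOf('.');
        if (dot >= 0) {
            base = base.substring(0, dot);
        }

        // Keep only a conservative set of characters in the stored name.
        base = base.replaceAll("[^A-Za-z0-9_-]", "_");
        if (base.isEmpty()) {
            base = "upload"; // assumed fallback name
        }
        return base + ".html";
    }
}
```

For example, forceHtmlExtension("evil.jsp") yields "evil.html". Two different uploads can map to the same stored name, so a real application would probably also add a unique suffix.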
If the pages supplied by the users aren't mentioned in the web.xml and you don't have a rule "anything that ends with *.jsp is a JSP" in web.xml, Tomcat won't try to compile/run them.
What is much more important: you must filter the HTML, or users could add arbitrary JavaScript which would then steal other users' passwords. This is non-trivial. Try to clean the code with JTidy to get XML and then remove all <script> tags, <link>, <object>, maybe even <img> (unless you make sure the URLs supplied are valid; some buggy browsers might run JavaScript if the image source is actually text/JavaScript), all CSS styles, and make sure any href points to a safe URL. Don't forget <iframe> and <applet> and all the other things that might break your secure shell.
[EDIT] That should give you an idea of where this is going. In the end, you should do the reverse: allow only a very small subset of HTML, if at all. Most sites (like this one) use special markup for formatting, for two reasons:
- It's simpler for the user
- It's more secure
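To make the "special markup" idea concrete, here is a minimal sketch: escape everything the user typed, then turn a tiny markup into the only HTML that is ever emitted. The class TinyMarkup and the "**bold**" / "*italic*" syntax are assumptions chosen for illustration, not what any particular site uses.

```java
// Hypothetical helper, not part of any library.
final class TinyMarkup {
    // Escape the characters that are significant in HTML text and attributes.
    static String escapeHtml(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Escape first, then translate the markup; the user never supplies tags directly.
    static String render(String userText) {
        String safe = escapeHtml(userText);
        safe = safe.replaceAll("\\*\\*(.+?)\\*\\*", "<b>$1</b>"); // **bold**
        safe = safe.replaceAll("\\*(.+?)\\*", "<i>$1</i>");       // *italic*
        return safe;
    }
}
```

Because escaping happens before any tag is generated, user-supplied "<script>" or "<%= ... %>" ends up as inert text in the page, and the only tags in the output are the ones render itself writes.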