Are there any security risks in allowing (whitelist only) pure markup tags such as a, b, i, etc. in post submissions?
BBCode seems like a heavy solution to the problem of code injection, and whitelisting "safe" HTML tags seems easier than going through all the parsing and conversion that BBCode requires.
I have found that many BBCode libraries have issues with nested elements (is this because they use an FSA or regex instead of a proper parser?), whereas elements such as blockquote or fieldset are parsed properly by the web browser.
Any and all opinions are greatly appreciated.
This is something everyone seems to get wrong, even though it is so simple.
Use a parser
It doesn't matter whether you use Markdown, HTML, BBCode, or whatever.
Use a parser. A real parser. Not a bunch of regexes.
The parser gives you a syntax tree. From the syntax tree you derive the HTML (still as a tree of objects). Clean the tree (using a whitelist), then print the HTML.
Using HTML as the syntax is perfectly fine. Just don't try to clean it with regexes.
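To make the parse, clean, print pipeline concrete, here is a minimal sketch in Python built on the standard library's `html.parser`. The `Node` class, the `sanitize` function, and the contents of `ALLOWED_TAGS` are made up for illustration, and the sketch drops every attribute for simplicity. Treat it as a demonstration of the idea, not a vetted sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical policy for this sketch; adjust to your own needs.
ALLOWED_TAGS = {"b", "i", "em", "strong", "blockquote"}
DROP_WITH_CONTENT = {"script", "style"}   # removed together with their text
VOID_TAGS = {"br", "hr", "img", "input"}  # elements that never have children


class Node:
    """Either an element (tag set) or a text node (text set)."""

    def __init__(self, tag=None, text=None):
        self.tag = tag
        self.text = text
        self.children = []


class TreeBuilder(HTMLParser):
    """Step 1: turn the input into a tree of Node objects."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node(tag="#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag=tag)  # attributes are deliberately discarded here
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(Node(text=data))


def clean(node):
    """Step 2: apply the whitelist to the tree, in place."""
    kept = []
    for child in node.children:
        if child.text is not None:
            kept.append(child)
        elif child.tag in DROP_WITH_CONTENT:
            continue
        else:
            clean(child)
            if child.tag in ALLOWED_TAGS:
                kept.append(child)
            else:
                kept.extend(child.children)  # unwrap: keep content, drop the tag
    node.children = kept


def render(node):
    """Step 3: print the cleaned tree, escaping all text."""
    if node.text is not None:
        return escape(node.text, quote=False)
    inner = "".join(render(child) for child in node.children)
    if node.tag == "#root":
        return inner
    return f"<{node.tag}>{inner}</{node.tag}>"


def sanitize(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    clean(builder.root)
    return render(builder.root)
```

With this sketch, `sanitize('<b onclick="x()">hi</b><script>alert(1)</script> <i>there')` returns `<b>hi</b> <i>there</i>`: the event handler and the script are gone, and the unclosed `i` is closed because the output is printed from the tree rather than copied from the input.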
There is nothing wrong with using HTML as long as you:
- Use a proper HTML parser to process the input.
- Whitelist the tags so that only things you want get through.
- Whitelist the attributes on the tags. This includes parsing and whitelisting things inside style attributes if you want to allow style (and, of course, use a real CSS parser for the style attributes).
- Rewrite the HTML while you parse it.
The last point is mostly about getting consistent and correct HTML output. Your parser should take care of sorting out the usual confusion (such as incorrectly nested tags) that you find in hand-written HTML.
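As a rough illustration of the points above, the following Python sketch whitelists tags and attributes and rewrites the markup while parsing, so the output is always properly nested. It uses the standard library's `html.parser`. The `Rewriter` class, the `rewrite` function, the `ALLOWED_ATTRS` policy, and the URL scheme check on `href` are all assumptions made for this example. The style attribute is simply not allowed here, since handling it safely would need a real CSS parser.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical policy: tag -> attributes permitted on that tag.
ALLOWED_ATTRS = {
    "a": {"href", "title"},
    "b": set(),
    "i": set(),
    "blockquote": set(),
}
SAFE_SCHEMES = {"http", "https", "mailto", ""}  # "" means a relative URL


def is_safe_url(url):
    # Simplified check (an assumption of this sketch, not an exhaustive one).
    # Browsers ignore whitespace and control characters inside a scheme,
    # so strip them before looking at it.
    compact = "".join(ch for ch in url if ch > " ")
    try:
        return urlsplit(compact).scheme.lower() in SAFE_SCHEMES
    except ValueError:
        return False


class Rewriter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []    # allowed tags currently open in the output
        self.skipping = False  # True while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skipping = True
            return
        if self.skipping or tag not in ALLOWED_ATTRS:
            return
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS[tag] or value is None:
                continue
            if name == "href" and not is_safe_url(value):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.skipping = False
            return
        if tag not in self.open_tags:
            return  # stray or disallowed end tag
        # Close everything opened inside this tag first, so nesting stays valid.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if not self.skipping:
            self.out.append(escape(data, quote=False))

    def finish(self):
        self.close()
        while self.open_tags:  # close whatever the author left open
            self.out.append(f"</{self.open_tags.pop()}>")
        return "".join(self.out)


def rewrite(html):
    rewriter = Rewriter()
    rewriter.feed(html)
    return rewriter.finish()
```

For example, `rewrite('<b><i>text</b></i>')` returns `<b><i>text</i></b>`, and `rewrite('<a href="javascript:alert(1)" onclick="x()" title="t">go</a>')` returns `<a title="t">go</a>`. Because the output is generated from what the parser understood, nothing from the input is ever copied through verbatim.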