EDIT: For future reference, I'm using a non-XHTML document type declaration, <!DOCTYPE html>.
I'm creating a website using Django, and I'm trying to embed arbitrary JSON data in my pages to be used by client-side JavaScript code.
Let's say my JSON object is {"foo": "</script>"}. If I embed this directly,
<script type='text/javascript'>JSON={"foo": "</script>"};</script>
The first </script> ends the script element, cutting the JSON object short. (It also makes the site vulnerable to XSS, since this JSON object will be dynamically generated.)
If I use Django's HTML escape function, the resulting output is:
<script type='text/javascript'>JSON={&quot;foo&quot;: &quot;&lt;/script&gt;&quot;};</script>
and the browser cannot interpret the contents of the <script> tag, because entity references are not decoded there.
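To make the problem concrete, here is a minimal sketch of what generic HTML escaping does to the payload. It uses the standard library's html.escape as a stand-in for Django's escape function (an assumption for illustration; the two differ in small details such as how the single quote is encoded).

```python
import html
import json

# Serialize the example object, then HTML-escape it the way a template would.
payload = json.dumps({"foo": "</script>"})
escaped = html.escape(payload)

# The quotes and angle brackets are now entity references. Inside a
# <script> element in regular HTML these are NOT decoded, so the
# JavaScript engine sees &quot; literally and the object literal is invalid.
print(escaped)
```

So HTML escaping is the wrong tool here: the escaping has to happen at the JSON/JavaScript level instead.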
My questions are:
- Which characters am I supposed to escape / not escape in this situation?
- Is there an automated way to do this in Python / Django?
If you are using XHTML, you would be able to use entity references (&lt;, &gt;, &amp;) to escape any string you want within <script>. You would not want to use a <![CDATA[...]]> section, because the sequence "]]>" can't be expressed within a CDATA section, and you would have to change the script to express ]]> some other way.
But you're probably not using XHTML. If you're using regular HTML, the <script> tag acts somewhat like a CDATA section in XML, except that it has even more pitfalls. It ends with </script>. There are also arcane rules to allow <!-- document.write("<script>...</script>") --> (the comment and the <script> opening tag must both be present for </script> to be passed through). The compromise that the HTML5 editors adopted for future browsers is described in "HTML 5 tokenization" and "CDATA Escapes".
I think the takeaway is that you must prevent </script> from occurring in your JSON, and to be safe you should also avoid <script>, <!--, and --> to prevent runaway comments or script tags. I think it's easiest just to replace < with \u003c and --> with --\>.
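The replacement described above can be sketched in Python roughly as follows. The function name dumps_for_script is made up for illustration, and the sketch writes the > in --> as \u003e instead of \> so that the output stays strict JSON (a small deviation from the wording above, since \> is fine in a JavaScript literal but not in JSON).

```python
import json

def dumps_for_script(data):
    """Serialize data to JSON intended for embedding inside an HTML <script> element.

    Hypothetical helper, not part of Django or the standard library.
    """
    text = json.dumps(data)
    # In JSON output, "<" and "-->" can only appear inside string literals,
    # so swapping in \uXXXX escapes does not change the decoded value.
    text = text.replace("<", "\\u003c")     # blocks </script>, <script>, <!--
    text = text.replace("-->", "--\\u003e")  # blocks a stray comment close
    return text

print(dumps_for_script({"foo": "</script>"}))
```

Because the escapes are ordinary JSON string escapes, the browser's JavaScript engine (or JSON.parse) decodes them back to the original characters.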
I tried backslash-escaping the forward slash, and that seems to work:
<script type='text/javascript'>JSON={"foo": "<\/script>"};</script>
Have you tried that?
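If you want to produce that output from Python, one possible sketch is below. The helper name is hypothetical; it relies on \/ being a legal escape in both JSON and JavaScript strings. Note that it only neutralizes closing tags and does nothing about <!-- or <script>.

```python
import json

def dumps_escaping_close_tags(data):
    """Serialize to JSON and turn every "</" into "<\\/" (hypothetical helper)."""
    # "</" can only occur inside JSON string literals, where \/ decodes to "/".
    return json.dumps(data).replace("</", "<\\/")

print(dumps_escaping_close_tags({"foo": "</script>"}))
```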
On a side note, I am surprised that an embedded </script> tag in a string breaks the JavaScript. I couldn't believe it at first, but I tested it in Chrome and Firefox.
I would do something like this:
<script type='text/javascript'>JSON={"foo": "</" + "script>"};</script>
For this case in Python, I have opened a bug in the bug tracker. However, the rules are indeed complicated, as <!-- and <script> interact in quite evil ways even in the adopted HTML5 parsing rules. BTW, "\>" is not a valid JSON escape, so it would be better to replace ">" with "\u003E". The absolutely safe approach is therefore to escape < and > as \u003C and \u003E, along with a couple of other evil characters mentioned in the Python bug...
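A rough sketch of that stricter escaping is below. The function name is made up, and the extra characters beyond < and > (the ampersand and the line separators U+2028 and U+2029) are my guess at commonly escaped ones, not a list taken from the Python bug report.

```python
import json

# Characters to rewrite as \uXXXX escapes. Only "<" and ">" come from the
# discussion above; "&", U+2028 and U+2029 are assumptions added here.
_SCRIPT_ESCAPES = {
    ord("<"): "\\u003C",
    ord(">"): "\\u003E",
    ord("&"): "\\u0026",
    0x2028: "\\u2028",
    0x2029: "\\u2029",
}

def json_for_html_script(data):
    """Serialize data so none of the characters above appear literally."""
    # These characters never occur outside string literals in JSON output,
    # so a blanket character-for-escape substitution keeps the JSON valid.
    return json.dumps(data, ensure_ascii=False).translate(_SCRIPT_ESCAPES)

print(json_for_html_script({"foo": "</script>"}))
```

Recent Django versions also provide a json_script template filter that performs this kind of escaping, which may be preferable to a hand-rolled helper if your version has it.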