How can I prevent double encoding of HTML entities, or fix them programmatically?
I am using the encode_entities() function from the HTML::Entities Perl module to encode HTML entities in user input. The problem is that we also allow users to enter HTML entities directly, and these entities end up being double encoded.
For example, a user may enter:
Stackoverflow &amp; Perl = Awesome&hellip;
This ends up being encoded to
Stackoverflow &amp;amp; Perl = Awesome&amp;hellip;
This renders in the browser as
Stackoverflow &amp; Perl = Awesome&hellip;
We want this to render as
Stackoverflow & Perl = Awesome...
Is there a way to prevent this double encoding? Or is there a module or snippet of code that can easily correct these double encoding issues?
Any help is greatly appreciated!
You can decode the string first:
use HTML::Entities;
my $input = from_user();
# Decode first so entities the user typed are not encoded a second time
my $encoded = encode_entities( decode_entities $input );
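To see why this works, here is a minimal, self-contained sketch. The helper name normalize_entities is made up for illustration; only encode_entities and decode_entities come from HTML::Entities. It shows that decoding before encoding gives the same result whether the user typed entities or raw characters, and that running it again changes nothing.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities decode_entities);

# Hypothetical helper: decode whatever entities are present, then encode once.
sub normalize_entities {
    my ($text) = @_;
    return encode_entities( decode_entities($text) );
}

my $with_entities = normalize_entities('Stackoverflow &amp; Perl = Awesome&hellip;');
my $raw_text      = normalize_entities('Stackoverflow & Perl = Awesome');
my $run_twice     = normalize_entities($with_entities);

print "$with_entities\n";
print "$raw_text\n";
print "stable\n" if $run_twice eq $with_entities;
```

One trade-off to keep in mind: a user who really wants the literal text "&amp;amp;" to appear on the page (for example, when writing about HTML) can no longer get it, because every entity in the input is treated as the character it stands for.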
There is an extremely simple way to avoid this:
- Remove all the entities upon input (turn them into Unicode)
- Encode into entities again at the stage of output.
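The two steps above can be sketched roughly as follows. The %store hash and the save_comment / render_comment helpers are hypothetical stand-ins for whatever storage and templating the application really uses; only decode_entities and encode_entities are real HTML::Entities functions.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities decode_entities);

my %store;    # hypothetical stand-in for the real database

# Input stage: turn any entities the user typed into plain characters.
sub save_comment {
    my ($id, $raw) = @_;
    $store{$id} = decode_entities($raw);
    return;
}

# Output stage: encode exactly once, just before the text goes into HTML.
sub render_comment {
    my ($id) = @_;
    return encode_entities( $store{$id} );
}

save_comment(1, 'Stackoverflow &amp; Perl = Awesome&hellip;');
save_comment(2, 'Stackoverflow & Perl = Awesome');
print render_comment($_), "\n" for 1, 2;
```

Because the stored text never contains entities, there is nothing left to double encode at output time, and the same stored value can be reused for non-HTML output as well.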
Consider deferring the call to encode_entities() until you retrieve the value for display, rather than calling it before you store it. So long as you are consistent in your retrieval mechanism, the extra data in your database probably isn't worth fretting over.
Edit
Re-reading your question, I realize my answer doesn't fully address the issue, since calling encode_entities() later will still produce the same result. I don't know of an alternative myself, so this may not be much help, but you may want to look for an encoding method that respects existing entities.
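One possible shape for such a method, offered only as a sketch and not as something HTML::Entities provides out of the box: escape an ampersand only when it does not already start something that looks like an entity, then let encode_entities handle the remaining markup characters. The function name encode_preserving_entities is made up, and the regex checks only the shape of an entity (name or number followed by a semicolon), not whether the name is a real one.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# Hypothetical helper: encode text but leave existing entities untouched.
sub encode_preserving_entities {
    my ($text) = @_;

    # Escape "&" unless it is followed by a named, decimal, or hex entity body and ";".
    $text =~ s/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)/&amp;/g;

    # Encode only angle brackets and double quotes; "&" was handled above.
    return encode_entities($text, '<>"');
}

print encode_preserving_entities('Stackoverflow &amp; Perl = Awesome&hellip;'), "\n";
print encode_preserving_entities('AT&T <b>rocks</b> & more'), "\n";
```

Non-ASCII characters are passed through unchanged here, which assumes the page is served as UTF-8. Text such as "R&D;" would be mistaken for an entity by this check, so the decode-then-encode approach is usually the more predictable choice.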