On my website there's a header that should display "Welcome username.":
<span>Welcome <?php echo $username; ?>.</span>
The problem is that if the user changes their name to U+202Eusername (where U+202E is the right-to-left override character, or RLO), the whole layout breaks.
Instead of displaying "Welcome emanresu.", it displays "Welcome .emanresu" or ".emanresu Welcome" or things like that. I tried adding a U+202C (pop directional formatting, or PDF) character after the username, and it worked. Like this:
<span>Welcome <?php echo $username; ?>‬.</span>
But if the username contains more than one RLO character, it breaks again. So what I should do is match each RLO character with a PDF character, but I'm not sure how to do this. And according to the W3C specifications there's no solution to this. Am I missing something here?
You might be interested in the HTML5 <bdi> tag.
Details: http://rishida.net/blog/?p=564
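A minimal sketch of how the <bdi> suggestion could look in PHP. The function name render_welcome_bdi is hypothetical, and the sketch assumes the target browsers support <bdi>. The element isolates the username's directionality, so an unclosed RLO inside it should not affect the surrounding "Welcome" or the trailing period; the name itself may still render reversed.

```php
<?php
// Hypothetical helper (not from the original post): escapes the username
// and wraps it in <bdi> so its bidi controls stay confined to the element.
function render_welcome_bdi(string $username): string
{
    // Escape first so the username can never inject markup.
    $safe = htmlspecialchars($username, ENT_QUOTES, 'UTF-8');

    return '<span>Welcome <bdi>' . $safe . '</bdi>.</span>';
}

// Usage in a template: echo render_welcome_bdi($username);
```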
Why not search for this character in $username and, if it is found, change <span> to <span dir="rtl">? Also replace these characters with an empty string in $username.
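Roughly, that suggestion might look like the sketch below. The function name render_welcome_with_dir is made up for illustration, and it only handles U+202E; whether dir="rtl" is the right presentation for such names is a judgment call this sketch does not settle.

```php
<?php
// Hypothetical helper (not from the original post): detects RLO (U+202E),
// strips it from the name, and marks the span as right-to-left instead.
function render_welcome_with_dir(string $username): string
{
    $rlo = "\u{202E}"; // right-to-left override

    $hasRlo = strpos($username, $rlo) !== false;
    $clean  = str_replace($rlo, '', $username);

    $open = $hasRlo ? '<span dir="rtl">' : '<span>';

    return $open . 'Welcome ' . htmlspecialchars($clean, ENT_QUOTES, 'UTF-8') . '.</span>';
}
```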
Understanding Bidirectional (BIDI) Text in Unicode
This article is a very interesting general read about bidi issues, and it also has a section named "Filtering User Input" near the end that seems to address exactly the issue you're describing.
W3C's solution is that you should be filtering out RLO and other characters from the group known as “Not suitable for use in markup”.
Do this at the same time as filtering out other unwanted control codes like ASCII 0x00–0x1F (potentially including or excluding the newline character) and 0x7F–0x9F. See this question for background.
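As a rough sketch of that filtering step, something like the following could work. The function name strip_unsafe_controls and the $keepNewlines flag are assumptions for illustration. The bidi range U+202A–U+202E (LRE, RLE, PDF, LRO, RLO) is only a subset of the characters W3C lists as not suitable for markup, so extend the character class to match whatever list you decide to follow.

```php
<?php
// Hypothetical filter (not from the original post): removes control codes
// and the bidi embedding/override characters from a UTF-8 string.
function strip_unsafe_controls(string $text, bool $keepNewlines = false): string
{
    // C0 controls, DEL, and C1 controls; optionally leave LF and CR alone.
    $controls = $keepNewlines
        ? '\x{00}-\x{09}\x{0B}\x{0C}\x{0E}-\x{1F}\x{7F}-\x{9F}'
        : '\x{00}-\x{1F}\x{7F}-\x{9F}';

    // U+202A..U+202E: LRE, RLE, PDF, LRO, RLO (assumed subset, see above).
    $bidi = '\x{202A}-\x{202E}';

    // The /u modifier makes the pattern match code points, not bytes.
    $result = preg_replace('/[' . $controls . $bidi . ']/u', '', $text);

    // preg_replace returns null on failure (e.g. invalid UTF-8 input);
    // this sketch treats such input as empty.
    return $result ?? '';
}
```

Running this when the username is saved, rather than on every page view, keeps the stored value clean for every place it is later displayed.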
(You should also be using echo htmlspecialchars($username); to output the name. Maybe your usernames can't contain < or &, but that's not a good thing to rely on in your output stage. Get used to calling htmlspecialchars on everything that goes out to the page as a matter of course; define a shortcut function for it if necessary.)
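Such a shortcut might look like this. The name h is just a common convention, not something from the original post, and the flags assume the page is served as UTF-8.

```php
<?php
// Hypothetical shortcut for escaping output. ENT_QUOTES also escapes
// single quotes; ENT_SUBSTITUTE replaces invalid byte sequences instead
// of returning an empty string.
function h(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Usage in a template: <span>Welcome <?php echo h($username); ?>.</span>
```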