
multibyte identifiers list

Developer https://www.devze.com 2023-02-07 01:12 Source: Internet
I was looking into multi-byte characters and how they are used, but how many different identifiers/patterns are used for different multi-byte characters?


e.g.: &nbsp;, &#160;, U+0026, %20

How many different identifiers such as &, &#, U+, % etc. are there?

I'm trying to screen inputs: if an input has words that are more than 255 characters long, it's probably multi-byte (a hack attempt). I can then check whether the word can be split and contains one of the multi-byte identifiers, and if so stop the hack attempt.


% format - a URL-encoded value for embedding into URLs, e.g. %20 is a space (ASCII hex 20, decimal 32)
&nbsp; - a named character entity, a non-breaking space in this case
U+0026 - a Unicode character in hex notation, an & in this case
&#...; - a numbered character entity in decimal (base 10): &#38; = &
&#x...; - a numbered character entity in hex (base 16): &#x26; = &
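
To make the list above concrete, here is a minimal PHP sketch that decodes one example of each notation back to the character it stands for. It assumes PHP 7.2 or later with the mbstring extension enabled; the variable names are illustrative only and do not come from the question.

```php
<?php
// URL (percent) encoding: the two digits after % are a byte value in hex.
$space = rawurldecode('%20');            // byte 0x20 (decimal 32), a space

// HTML character entities: named, decimal and hexadecimal forms.
$flags  = ENT_QUOTES | ENT_HTML5;
$nbsp   = html_entity_decode('&nbsp;', $flags, 'UTF-8');   // U+00A0
$ampDec = html_entity_decode('&#38;', $flags, 'UTF-8');    // "&"
$ampHex = html_entity_decode('&#x26;', $flags, 'UTF-8');   // "&"

// U+0026 is a way of writing a code point number;
// mb_chr() turns that number into the actual character.
$ampCp = mb_chr(0x26, 'UTF-8');

var_dump($space === ' ');
var_dump($nbsp === "\u{00A0}");
var_dump($ampDec === '&' && $ampHex === '&' && $ampCp === '&');

// A non-breaking space is one character but two bytes in UTF-8.
var_dump(strlen($nbsp), mb_strlen($nbsp, 'UTF-8'));
```

Note that none of these notations is itself "multi-byte": each is a plain ASCII spelling of a character, and only the decoded result (such as the non-breaking space) may take more than one byte.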


Are you trying to avoid homoglyph-based spoofing? Does "identifier" mean username here?

If yes, and if your users use a Latin alphabet, just allow only ASCII letters and numbers:

$identifier = preg_replace('#[^A-Za-z0-9]+#', '', $identifier);
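
One thing to be aware of is that stripping silently changes the value. As a hedged alternative, the sketch below rejects a non-conforming identifier instead of rewriting it. The helper name isPlainAsciiIdentifier is hypothetical, and this is one possible variant rather than the method recommended above.

```php
<?php
// Hypothetical helper: true only for a non-empty string of ASCII letters/digits.
function isPlainAsciiIdentifier(string $identifier): bool
{
    // \A and \z anchor at the absolute start and end of the string,
    // so a trailing newline is not accepted.
    return preg_match('/\A[A-Za-z0-9]+\z/', $identifier) === 1;
}

// "\u{0430}" is CYRILLIC SMALL LETTER A, a homoglyph of the Latin "a".
$raw = "p\u{0430}ypal";

// Stripping (as in the line above) quietly drops the Cyrillic letter.
$stripped = preg_replace('#[^A-Za-z0-9]+#', '', $raw);

var_dump($stripped);                          // the remaining ASCII characters
var_dump(isPlainAsciiIdentifier($raw));       // spoofed name
var_dump(isPlainAsciiIdentifier('paypal'));   // plain ASCII name
```

Rejecting lets the application tell the user why the name was refused, whereas stripping could map two different inputs to the same stored identifier.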