JavaScript: validate user input against desired character set (encoding)

Developer https://www.devze.com 2023-03-28 22:16 Source: Internet
The scenario is as follows:

A user copies text from a web site that uses Win-1252 as its character set. This text is then sent to a database that I control, which uses ISO-8859-1 (a subset of Win-1252). Is there a mechanism in JavaScript to inform the user that they are trying to insert "invalid" characters into the system? Ideally it would also highlight those characters.

The general form of this problem: a sending system A supports a set of characters, AsubE, and an accepting system B supports a different set, BsubE. When AsubE is contained in BsubE there is no problem, because everything A can send, B can store. The question is how to validate the user's input when AsubE is not a subset of BsubE.


Since some characters are not defined in the subset, you could use a regular expression to define those intervals:

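function isNotAllowed(char) {
    // The ranges must sit inside a character class [...];
    // without the brackets the pattern matches the literal text "\x00-\x1f".
    return /[\x00-\x1f\x7f-\x9f]/.test(char); // 00 to 1f, or 7f to 9f
}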

Highlighting the characters as well is more complicated, but this function could be the core of it.
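A minimal sketch of what such highlighting could look like, not a definitive implementation. The names `DISALLOWED`, `escapeHtml` and `highlightInvalid` are hypothetical. It also makes one assumption beyond the answer: JavaScript strings are Unicode, so a Win-1252-only character such as the euro sign arrives as U+20AC rather than byte 0x80, and the sketch therefore also flags anything above U+00FF.

```javascript
// Assumption: the control ranges from the answer above, plus every code
// point beyond U+00FF (which ISO-8859-1 cannot represent).
// Adjust the first class if tabs and newlines should be accepted.
const DISALLOWED = /[\x00-\x1f\x7f-\x9f]|[^\x00-\xff]/;

// Escape characters that are special in HTML so the output is safe to insert.
function escapeHtml(s) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;' };
  return s.replace(/[&<>"]/g, (c) => map[c]);
}

// Returns HTML with each disallowed character wrapped in <mark>,
// plus the list of offending characters.
function highlightInvalid(text) {
  let html = '';
  const invalid = [];
  for (const ch of text) { // for...of iterates by code point
    if (DISALLOWED.test(ch)) {
      invalid.push(ch);
      html += '<mark>' + escapeHtml(ch) + '</mark>';
    } else {
      html += escapeHtml(ch);
    }
  }
  return { html, invalid };
}
```

In a page, the returned `html` could be assigned to the `innerHTML` of a preview element next to the input field, and a non-empty `invalid` array could be used to block submission.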


There is no built-in facility in JavaScript to do this. Luckily, neither Windows-1252 nor ISO-8859-1 is a variable-width encoding, so you could use a platform that does understand character encodings, such as .NET, to generate a regular expression that tests for this.

For instance, in .NET, you could make a byte array with 256 bytes, one for each character, and then use each encoding to get the appropriate string. Figure out the differences in those strings, encode them into a regular expression, and there you go.
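The result of that procedure can be sketched directly in JavaScript. This is only an illustration under stated assumptions: instead of decoding a byte array with two encoders, it hard-codes the standard Windows-1252 mapping for bytes 0x80 to 0x9F (the only bytes where the two encodings differ) and compares each entry with ISO-8859-1, where every byte maps to the code point of the same value. The names `CP1252_HIGH` and `buildCp1252OnlyRegex` are hypothetical.

```javascript
// Windows-1252 code points for bytes 0x80..0x9F.
// null marks the five bytes that the code page leaves undefined.
const CP1252_HIGH = [
  0x20AC, null,   0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021, // 0x80-0x87
  0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, null,   0x017D, null,   // 0x88-0x8F
  null,   0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014, // 0x90-0x97
  0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, null,   0x017E, 0x0178, // 0x98-0x9F
];

// Builds a regular expression matching every character that Windows-1252
// can encode but ISO-8859-1 cannot.
function buildCp1252OnlyRegex() {
  const escapes = [];
  CP1252_HIGH.forEach((codePoint, i) => {
    const byte = 0x80 + i;
    // In ISO-8859-1 the byte value and the code point are identical,
    // so any other code point is a difference between the encodings.
    if (codePoint !== null && codePoint !== byte) {
      escapes.push('\\u' + codePoint.toString(16).padStart(4, '0'));
    }
  });
  return new RegExp('[' + escapes.join('') + ']', 'g');
}
```

Calling `text.match(buildCp1252OnlyRegex())` returns the offending characters, or `null` when the text contains none. Because the expression carries the `g` flag, prefer `match` or `replace` over repeated `test` calls, which keep state in `lastIndex`.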
