We have a web site and WinForms application written in .NET 4.0 that allows users to enter any Unicode char (pretty standard).
The problem is that a small amount of our data gets submitted to an old mainframe application. While we were testing, a user entered a name with a character that ended up crashing the mainframe program. The name was BOËNS. The Ë is not supported.
What is the best way to detect if a Unicode char is supported by EBCDIC?
I tried using the following regular expression but that restricted some standard special chars (/, _, :) which are fine for the mainframe.
I would prefer to use one method to validate each char, or a method that takes a string and returns true or false depending on whether the string contains chars not supported by EBCDIC.
First, you would have to get the proper Encoding instance for EBCDIC by calling the static GetEncoding method, which takes the code page id as a parameter.
Converting a string to bytes goes through the encoder, so the fallback that matters here is the EncoderFallback, not the DecoderFallback. The instances returned by GetEncoding are read-only, so either pass the value of the static ExceptionFallback property on the EncoderFallback class to the GetEncoding overload that accepts fallbacks, or Clone the instance and set its EncoderFallback property.
Then, in your code, you would loop through each character in your string and call the GetBytes method to encode the character to the byte sequence. If it cannot be encoded, then an EncoderFallbackException is thrown; you would just have to wrap each call to GetBytes in a try/catch block to determine which character is in error.
Note, the above is required if you want to know the position of the character that failed. If you don't care about the position of the character, just whether the string will encode as a whole, then you can call the GetBytes method which takes a string parameter, and it will throw the same EncoderFallbackException if a character that cannot be encoded is encountered.
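A minimal sketch of both variants follows. The class and method names (EbcdicCheck, CanEncode, IndexOfFirstUnsupported) are hypothetical, and code page 37 (IBM037) is only an assumption; substitute whichever EBCDIC code page the mainframe actually uses.

```csharp
using System;
using System.Text;

static class EbcdicCheck
{
    // Assumption: code page 37 (IBM037). Replace with the mainframe's real code page.
    // The exception fallbacks make GetBytes throw instead of substituting '?'.
    private static readonly Encoding Ebcdic = Encoding.GetEncoding(
        37, EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);

    // True if every character of the string can be encoded in the chosen code page.
    public static bool CanEncode(string text)
    {
        try
        {
            Ebcdic.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }

    // Index of the first character that cannot be encoded, or -1 if all can.
    public static int IndexOfFirstUnsupported(string text)
    {
        char[] chars = text.ToCharArray();
        for (int i = 0; i < chars.Length; i++)
        {
            try
            {
                Ebcdic.GetBytes(chars, i, 1); // encode one char at a time
            }
            catch (EncoderFallbackException)
            {
                return i;
            }
        }
        return -1;
    }
}
```

A code page can cover more characters than the mainframe program itself tolerates, so the result is only as good as the chosen code page.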
You can escape characters in Regex using the \. So if you want to match a dot, you can do @"\.". To match /._,:[]- for example: @"[/._,:\-\[\]]". Now, EBCDIC is 8 bits, but many characters are control characters. Do you have a list of "valid" characters?
I have made this pattern:
string pattern = @"[^a-zA-Z0-9 ¢.<(+&!$*);¬/|,%_>?`:#@'=~{}\-\\" + '"' + "]";
It should find "illegal" characters. If IsMatch returns true, then there is a problem.
I have used this: http://nemesis.lonestar.org/reference/telecom/codes/ebcdic.html
Note the special handling of the ". I'm using the @ at the beginning of the string to disable \ escape expansion, so I can't escape the quote with a backslash, and so I append it to the pattern as a separate char. (Doubling it as "" inside the verbatim string would also work.)
To test it:
Regex rx = new Regex(pattern);
bool m1 = rx.IsMatch(@"a-zA-Z0-9 ¢.<(+&!$*);¬/|,%_>?`:#@'=~{}\-\\" + '"');
bool m2 = rx.IsMatch(@"€a-zA-Z0-9 ¢.<(+&!$*);¬/|,%_>?`:#@'=~{}\-\\" + '"');
m1 is false (it's the list of all the "good" characters); m2 is true (to the other list I've added the € symbol).
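To get the single true/false method asked for in the question, the pattern can be wrapped in a small helper. This is only a sketch: the names MainframeText and IsSupported are hypothetical, and the character list is the one from the pattern above, which may not match what your mainframe program accepts. Here the quote is written as "" inside the verbatim string rather than concatenated.

```csharp
using System;
using System.Text.RegularExpressions;

static class MainframeText
{
    // Matches any single character that is NOT in the allowed list.
    // Inside a verbatim string, "" stands for one literal quote character.
    private static readonly Regex Illegal = new Regex(
        @"[^a-zA-Z0-9 ¢.<(+&!$*);¬/|,%_>?`:#@'=~{}\-\\""]");

    // True when the string contains only allowed characters.
    public static bool IsSupported(string text)
    {
        return !Illegal.IsMatch(text);
    }
}
```

With this list, MainframeText.IsSupported("BOËNS") returns false, while a string such as "a/b_c:d" passes.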