Unicode strings in .NET with Hebrew letters and numbers

Developer https://www.devze.com 2023-03-18 03:48 Source: Internet
I see strange behavior when creating a string that contains a Hebrew letter and digits: the digits are always displayed to the left of the letter. For example:

string A = "\u05E9"; //A Hebrew letter
string B = "23";
string AB = A + B;
textBlock1.Text = AB;
//Output bug - B is displayed to the left of A.

This only happens when a Hebrew letter is combined with digits. If either one is replaced with something else, the output is fine:

string A = "\u20AA"; //Some random Unicode.
string B = "23";
string AB = A + B;
textBlock1.Text = AB;
//Output OK.

string A = "\u05E9"; //A Hebrew letter.
string B = "HELLO";
string AB = A + B;
textBlock1.Text = AB;
//Output OK.

I tried playing with the FlowDirection property, but it didn't help.

A workaround that displays the text properly in the first code example would be welcome.


The Unicode characters "RTL mark" (U+200F) and "LTR mark" (U+200E) were created precisely for this purpose.

In your example, simply place an LTR mark after the Hebrew character, and the numbers will then be displayed to the right of the Hebrew character, as you wish.

So your code would be adjusted as follows:

string A = "\u05E9"; //A Hebrew letter
string LTRMark = "\u200E"; 
string B = "23";
string AB = A + LTRMark + B;
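
If the string is built in many places, the same idea can be wrapped in a small helper. The following is only a sketch under stated assumptions: BidiHelper and IsolateDigitsAfterHebrew are hypothetical names (not part of .NET), and the check is limited to the basic Hebrew letters U+05D0 to U+05EA followed by ASCII digits.

```csharp
using System;
using System.Text;

public static class BidiHelper
{
    // U+200E LEFT-TO-RIGHT MARK: invisible, but acts as a strong left-to-right character.
    private const char LeftToRightMark = '\u200E';

    // Hypothetical helper (not a framework API): inserts an LTR mark wherever
    // an ASCII digit directly follows a Hebrew letter.
    public static string IsolateDigitsAfterHebrew(string text)
    {
        if (string.IsNullOrEmpty(text)) return text;

        var result = new StringBuilder(text.Length + 4);
        for (int i = 0; i < text.Length; i++)
        {
            char current = text[i];
            if (i > 0 && IsAsciiDigit(current) && IsHebrewLetter(text[i - 1]))
            {
                result.Append(LeftToRightMark);
            }
            result.Append(current);
        }
        return result.ToString();
    }

    // Assumption: only the letters alef (U+05D0) through tav (U+05EA) are checked.
    private static bool IsHebrewLetter(char c) => c >= '\u05D0' && c <= '\u05EA';

    private static bool IsAsciiDigit(char c) => c >= '0' && c <= '9';
}
```

It could then be used as textBlock1.Text = BidiHelper.IsolateDigitsAfterHebrew(A + B); strings without a Hebrew letter directly before a digit pass through unchanged.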


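This is caused by the Unicode Bidirectional Algorithm. If I understand it correctly, each Unicode character carries a directional property that determines how it is placed relative to the text next to it.

In this case \u05E9 is a right-to-left character, so the digits that follow it are laid out as part of its right-to-left run and end up on its left. This holds even if you build the string differently: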

var ab = string.Format("{0}{1}", a, b);

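You will still get the digits on the left, because the reordering happens when the text is displayed, not when the string is built. Any other Hebrew letter, such as \u05D9, behaves the same way; only a character that is not right-to-left, such as \u20AA, leaves the digits on its right.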

This ordering is part of how the script is laid out: the string itself keeps its characters in the order you wrote them, and the layout engine reorders them for display according to the rules of the script.
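
To confirm that only the display order changes, the stored order of the characters can be listed. This is a minimal sketch; LogicalOrder and Describe are hypothetical names used for illustration only.

```csharp
using System;
using System.Linq;

public static class LogicalOrder
{
    // Lists the UTF-16 code units of a string in the order they are stored.
    public static string Describe(string text)
    {
        return string.Join(" ", text.Select(c => "U+" + ((int)c).ToString("X4")));
    }
}
```

For the string from the question, LogicalOrder.Describe("\u05E9" + "23") returns "U+05E9 U+0032 U+0033": the Hebrew letter is still stored first, even though the digits are drawn to its left.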


That strange behavior has an explanation. Digits that follow Hebrew characters are treated as part of the Hebrew text, and since Hebrew is read right to left, this scenario gives:

string A = "\u05E9"; //A Hebrew letter
string B = "23";
string AB = A + B;

B comes first, followed by A.

Second scenario:

string A = "\u20AA"; //Some random Unicode.
string B = "23";
string AB = A + B;

Here A is a Unicode character that is not part of a language read right to left, so the output is A first, followed by B.

Now consider my own scenario:

string A = "\u05E9";
string B = "\u05EA";
string AB = A + B;

Both A and B belong to a language read right to left, so AB is displayed as B followed by A (reading the screen from left to right), not A followed by B.

EDIT, to answer the comment:

Taking this scenario into account:

string A = "\u05E9"; //A Hebrew letter
string B = "23";
string AB = A + B;

One way to get the letter followed by the digits is to reverse the concatenation: string AB = B + A;

That is probably not a solution that will work in general, so I guess you have to implement some checks and build the string according to your requirements.


string A = "\u05E9"; //A Hebrew letter
string B = "23";
string AB = B + A; // !
textBlock1.Text = AB;
textBlock1.FlowDirection = FlowDirection.RightToLeft;
//Output OK - A is displayed to the left of B, as intended.