Lazarus. Equivalent to Chr() for Unicode symbols

devze.com (https://www.devze.com), 2023-03-26 00:49. Source: Internet
Is there any function in Free Pascal that returns a Unicode symbol given its code point (e.g. U+1D15E)? Unfortunately, Chr() only handles single-byte characters (codes up to 255).

I want to use symbols from a custom symbol font, and it is very inconvenient to put them into the source code directly (Lazarus shows them as ? or some other placeholder because they are absent from the system fonts).


Take a look at this page. I assume that Free Pascal uses either UTF-16, in which case the character becomes a surrogate pair of two WideChars (see table), or UTF-8, in which case it becomes a sequence of byte values (see table again).

UTF-8:

const
  HalfNoteString = UTF8String(#$F0#$9D#$85#$9E);

UTF-16:

const
  HalfNoteString = UnicodeString(#$D834#$DD5E);

The names of the string types may differ, as I don't know FreePascal very well. Perhaps AnsiString and WideString.
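
The two constants above can also be computed from the code point instead of being looked up in a table. The following is a minimal sketch of the standard UTF-16 and UTF-8 encoding arithmetic, not an RTL API: the helper names CodePointToUTF16 and CodePointToUTF8 are made up for illustration, and it assumes objfpc mode with long strings enabled. It does not validate its input (surrogate code points and values above $10FFFF are not rejected).

```pascal
program CodePointDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: encode one code point as UTF-16 code units.
function CodePointToUTF16(CP: Cardinal): UnicodeString;
begin
  if CP < $10000 then
  begin
    SetLength(Result, 1);                       // BMP: a single code unit
    Result[1] := WideChar(Word(CP));
  end
  else
  begin
    Dec(CP, $10000);                            // 20 bits remain
    SetLength(Result, 2);
    Result[1] := WideChar(Word($D800 or (CP shr 10)));   // high surrogate
    Result[2] := WideChar(Word($DC00 or (CP and $3FF))); // low surrogate
  end;
end;

// Hypothetical helper: encode one code point as UTF-8 bytes.
function CodePointToUTF8(CP: Cardinal): AnsiString;
begin
  if CP < $80 then
  begin
    SetLength(Result, 1);
    Result[1] := AnsiChar(Byte(CP));
  end
  else if CP < $800 then
  begin
    SetLength(Result, 2);
    Result[1] := AnsiChar(Byte($C0 or (CP shr 6)));
    Result[2] := AnsiChar(Byte($80 or (CP and $3F)));
  end
  else if CP < $10000 then
  begin
    SetLength(Result, 3);
    Result[1] := AnsiChar(Byte($E0 or (CP shr 12)));
    Result[2] := AnsiChar(Byte($80 or ((CP shr 6) and $3F)));
    Result[3] := AnsiChar(Byte($80 or (CP and $3F)));
  end
  else
  begin
    SetLength(Result, 4);
    Result[1] := AnsiChar(Byte($F0 or (CP shr 18)));
    Result[2] := AnsiChar(Byte($80 or ((CP shr 12) and $3F)));
    Result[3] := AnsiChar(Byte($80 or ((CP shr 6) and $3F)));
    Result[4] := AnsiChar(Byte($80 or (CP and $3F)));
  end;
end;

var
  W: UnicodeString;
  U: AnsiString;
  I: Integer;
begin
  W := CodePointToUTF16($1D15E);
  for I := 1 to Length(W) do
    Write(HexStr(Ord(W[I]), 4), ' ');           // UTF-16 code units in hex
  WriteLn;
  U := CodePointToUTF8($1D15E);
  for I := 1 to Length(U) do
    Write(HexStr(Ord(U[I]), 2), ' ');           // UTF-8 bytes in hex
  WriteLn;
end.
```

For U+1D15E this prints D834 DD5E and F0 9D 85 9E, the same values as the two constants above. The strings are filled byte by byte so that no implicit code page conversion can alter them.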


I have never used Free Pascal, but if I were you, I'd try

var
  s: WideChar;                        // Char is 8 bits wide by default in FPC
begin
  s := WideChar($222b);               // Just cast a word
end;

or, if the compiler is really stubborn,

var
  s: WideChar;                        // needs a 16-bit variable, not an 8-bit Char
begin
  PWord(@s)^ := $222b;                // Forcibly write a word
end;


Current Unicode status of FPC, to the best of my knowledge:

  1. The codepage of literals can be set with $codepage: http://www.freepascal.org/docs-html/prog/progsu81.html
  2. FPC 2.4.x+ does have unicodestring (since it is more or less the Kylix widestring), but with only basic routine support (pos and copy, not routines like format), and the string "record" lacks the codepage field.
  3. Lazarus widgets expect UTF-8 in normal ansistrings (D7..D2007-style ansistrings without codepage data), and programmers must manually insert conversions where necessary. So on Windows the widgets ARE mostly using Unicode (-W) calls, but they take ansistrings containing UTF-8.
  4. FPC itself doesn't follow the UTF-8-in-ansistring scheme, so for some string-accepting routines in sysutils, Lazarus has special routines that assume UTF-8 and call the -W variants.
  5. An FPC ansistring holds the system default 1-byte encoding: ANSI on Windows, UTF-8 on most other platforms.
  6. Trunk (2.7.1) provides support for the new D2009+ ansistring (with codepages).
  7. There has been no discussion yet of how to deal with the default string type (e.g. will "string" be utf8string on *nix and unicodestring on Windows, or unicodestring or utf8string everywhere?).
  8. Other unicodestring-related enhancements (like encoding parameters to tstringlist.savetofile) are not implemented. Likewise for the pseudo objects (like TCharacter, which are afaik mostly static).

Update: 2.7.1 has a variable-encoding ansistring type, and Lazarus has been fixed to keep working. Nothing really takes advantage of it yet, though: most of the RTL still uses -A calls, and the prototypes of the sysutils and system procedures that take strings haven't been changed to rawbytestring yet.
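
As an illustration of the manual conversion mentioned in point 3, here is a small sketch that turns a UTF-16 unicodestring into the UTF-8 bytes a Lazarus widget would expect. It assumes the UTF8Encode function from the system unit; HexBytes is a made-up helper used only to show the resulting bytes, and the character is written with numeric character codes so that no unusual glyph has to appear in the source.

```pascal
program Utf8ForLclDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: hex dump of the raw bytes of a UTF-8 string.
function HexBytes(const S: UTF8String): AnsiString;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
    Result := Result + HexStr(Ord(S[I]), 2) + ' ';
end;

var
  Wide: UnicodeString;
  Utf8: UTF8String;
begin
  // U+1D15E as a UTF-16 surrogate pair, given by numeric character codes
  Wide := #$D834#$DD5E;
  // Explicit conversion to UTF-8; this is the form an LCL Caption expects
  Utf8 := UTF8Encode(Wide);
  WriteLn(HexBytes(Utf8));
end.
```

In a Lazarus form the converted value would be assigned to a property such as a label's Caption rather than printed.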


I assume the problem is to convert from the UCS-4 encoding (which is simply the Unicode code point number) to UTF-16.

In Delphi, you can use the UCS4StringToUnicodeString function.

Warning: Be careful with UCS4String type. It is actually a zero-terminated dynamic array, not a string (that means it is zero-based).

var
  S1: UCS4String;
  S: string;

begin
  SetLength(S1, 2);
  S1[0]:= UCS4Char($1D15E);
  S1[1]:= UCS4Char(0);
  S:= UCS4StringToUnicodeString(S1);
  ShowMessage(Format('%d, %x, %x', [Length(S), Ord(S[1]), Ord(S[2])]));
end;
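
The same approach can be tried in a Free Pascal console program. This is a sketch under the assumption that the FPC system unit provides UCS4String and UCS4StringToUnicodeString with the same meaning as in Delphi; CodePointToUnicodeString is a made-up wrapper name.

```pascal
program Ucs4Demo;
{$mode objfpc}{$H+}

// Hypothetical wrapper: convert a single code point through the RTL routine.
function CodePointToUnicodeString(CP: UCS4Char): UnicodeString;
var
  Buf: UCS4String;
begin
  SetLength(Buf, 2);
  Buf[0] := CP;   // the code point itself (the array is zero-based)
  Buf[1] := 0;    // zero terminator expected by the conversion
  Result := UCS4StringToUnicodeString(Buf);
end;

var
  S: UnicodeString;
  I: Integer;
begin
  S := CodePointToUnicodeString($1D15E);
  // Print the length and each UTF-16 code unit in hex
  Write(Length(S), ':');
  for I := 1 to Length(S) do
    Write(' ', HexStr(Ord(S[I]), 4));
  WriteLn;
end.
```

For a code point outside the Basic Multilingual Plane, such as U+1D15E, the result should contain two code units (a surrogate pair). For use in Lazarus widgets it still has to be converted to UTF-8.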