Delphi: TStringList does not understand BOM?

Developer https://www.devze.com 2023-03-13 15:53 Source: Internet
Does TStringList not understand BOM?

Tf1 := TFileStream.Create(LIGALOG + 'liga.log', fmOpenRead or fmShareDenyNone);

Str := TStringList.Create;
Str.LoadFromStream(Tf1);

String1 := 'FStream ' + IntToStr(Tf1.Size) + '/ String: ' + Str.Text;

If the text file is saved as UTF-8 with a BOM, then Str.Count = 0 and Str.Text = ''. Without the BOM everything works.

Is this normal?


If you're using a version of Delphi prior to 2009, it doesn't support Unicode and the BOM is meaningless to TStringList.

If you're using D2009 or higher (which supports Unicode), you can use the overloaded TStringList.LoadFromStream(Stream: TStream; Encoding: TEncoding) if you know ahead of time what the encoding is. If you don't, the RTL will try to figure it out using TEncoding.GetBufferEncoding. The Delphi XE documentation covers this topic.
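As a rough illustration of the explicit-encoding overload, here is a minimal sketch for Delphi 2009 or later. Utf8LineCount is a hypothetical helper, not an RTL routine, and the byte array merely stands in for the contents of the log file (a UTF-8 BOM followed by two short lines).

```pascal
program LoadWithEncoding;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: decodes Data as UTF-8 through the two-argument
// LoadFromStream overload and returns the number of lines found.
function Utf8LineCount(const Data: TBytes): Integer;
var
  Stream: TBytesStream;
  Lines: TStringList;
begin
  Stream := TBytesStream.Create(Data);
  try
    Lines := TStringList.Create;
    try
      // Explicit encoding, so no guessing is involved.
      Lines.LoadFromStream(Stream, TEncoding.UTF8);
      Result := Lines.Count;
    finally
      Lines.Free;
    end;
  finally
    Stream.Free;
  end;
end;

begin
  // UTF-8 BOM, then 'one', CR LF, 'two' (sample data, not the real log).
  Writeln(Utf8LineCount(TBytes.Create(
    $EF, $BB, $BF, $6F, $6E, $65, $0D, $0A, $74, $77, $6F)));
end.
```

With the file stream from the question, the equivalent call would be Str.LoadFromStream(Tf1, TEncoding.UTF8).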

If you don't know ahead of time, and the RTL isn't able to figure it out from the content, you can always read the BOM yourself from the stream, set Stream.Position to just after the BOM, and load the TStringList from that position with the encoding you determined from that BOM.
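The manual approach could look roughly like the sketch below, which handles only the UTF-8 BOM to keep it short. SkipUtf8Bom and BytesToText are hypothetical helpers, and falling back to TEncoding.Default when no BOM is present is an assumption about the input, not something the RTL requires.

```pascal
program ManualBomSkip;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: returns True and leaves Position just past the BOM
// when the stream starts with $EF $BB $BF; otherwise restores Position.
function SkipUtf8Bom(Stream: TStream): Boolean;
var
  Start: Int64;
  Head: array[0..2] of Byte;
begin
  Start := Stream.Position;
  Result := (Stream.Read(Head, SizeOf(Head)) = SizeOf(Head)) and
    (Head[0] = $EF) and (Head[1] = $BB) and (Head[2] = $BF);
  if not Result then
    Stream.Position := Start;
end;

// Hypothetical helper: decodes Data, picking the encoding from the BOM check.
function BytesToText(const Data: TBytes): string;
var
  Stream: TBytesStream;
  Lines: TStringList;
begin
  Stream := TBytesStream.Create(Data);
  try
    Lines := TStringList.Create;
    try
      // LoadFromStream reads from the current Position to the end.
      if SkipUtf8Bom(Stream) then
        Lines.LoadFromStream(Stream, TEncoding.UTF8)
      else
        // Assumption: text without a BOM uses the system ANSI code page.
        Lines.LoadFromStream(Stream, TEncoding.Default);
      Result := Lines.Text;
    finally
      Lines.Free;
    end;
  finally
    Stream.Free;
  end;
end;

begin
  // Sample data: UTF-8 BOM, then 'one', CR LF, 'two'.
  Write(BytesToText(TBytes.Create(
    $EF, $BB, $BF, $6F, $6E, $65, $0D, $0A, $74, $77, $6F)));
end.
```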

Also, creating a TFileStream simply to then load into a TStringList is a waste; TStringList.LoadFromFile will handle the file itself, and is a lot less code if that's all you're going to do with the TStream.
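For comparison, a file-based version without any TFileStream might look like this sketch (Delphi 2009 or later). ReloadFirstLine is a hypothetical helper and 'bom_demo.txt' is a placeholder file name in the current directory. The encoding is passed explicitly here; the one-argument LoadFromFile relies on the RTL's own detection instead.

```pascal
program LoadFromFileDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: writes two lines as UTF-8 (with its preamble/BOM),
// reads the file back with LoadFromFile, and returns the first line.
function ReloadFirstLine(const FileName: string): string;
var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    Lines.Add('one');
    Lines.Add('two');
    Lines.SaveToFile(FileName, TEncoding.UTF8);

    Lines.Clear;
    // The list opens and closes the file itself; no stream to manage.
    Lines.LoadFromFile(FileName, TEncoding.UTF8);
    Result := Lines[0];
  finally
    Lines.Free;
  end;
end;

begin
  Writeln(ReloadFirstLine('bom_demo.txt')); // placeholder path
  DeleteFile('bom_demo.txt');
end.
```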

EDIT: After your comment, I thought I'd include a list of the BOMs I'm familiar with - there may be more I'm not aware of:

$00 $00 $FE $FF  UTF-32, big-endian (bytes must be swapped for Windows)
$FF $FE $00 $00  UTF-32, little-endian
$FF $FE          UTF-16, little-endian
$FE $FF          UTF-16, big-endian
$EF $BB $BF      UTF-8 (must be decoded before using the Unicode data)
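A detection routine over these signatures might be sketched as follows. TBomKind, BomSize and DetectBom are hypothetical names, and the UTF-32 little-endian signature is taken to be $FF $FE $00 $00. The four-byte signatures are tested first because the UTF-32 little-endian BOM begins with the same two bytes as the UTF-16 little-endian one; as a consequence, UTF-16 little-endian text whose first character is U+0000 would be misreported.

```pascal
program DetectBomDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  TBomKind = (bkNone, bkUtf8, bkUtf16LE, bkUtf16BE, bkUtf32LE, bkUtf32BE);

const
  // Number of bytes to skip for each kind of BOM.
  BomSize: array[TBomKind] of Integer = (0, 3, 2, 2, 4, 4);

// Hypothetical helper: classifies the leading bytes of a buffer.
function DetectBom(const B: TBytes): TBomKind;

  function StartsWith(const Sig: array of Byte): Boolean;
  var
    I: Integer;
  begin
    if Length(B) < Length(Sig) then
      Exit(False);
    for I := 0 to High(Sig) do
      if B[I] <> Sig[I] then
        Exit(False);
    Result := True;
  end;

begin
  // Longest signatures first, so UTF-32 LE is not mistaken for UTF-16 LE.
  if StartsWith([$00, $00, $FE, $FF]) then
    Result := bkUtf32BE
  else if StartsWith([$FF, $FE, $00, $00]) then
    Result := bkUtf32LE
  else if StartsWith([$EF, $BB, $BF]) then
    Result := bkUtf8
  else if StartsWith([$FF, $FE]) then
    Result := bkUtf16LE
  else if StartsWith([$FE, $FF]) then
    Result := bkUtf16BE
  else
    Result := bkNone;
end;

var
  Kind: TBomKind;
begin
  // Sample data: UTF-8 BOM followed by the letter 'A'.
  Kind := DetectBom(TBytes.Create($EF, $BB, $BF, $41));
  Writeln('BOM bytes to skip: ', BomSize[Kind]);
end.
```

The caller can then set Stream.Position to BomSize[Kind] and pick a matching TEncoding, for example TEncoding.UTF8, TEncoding.Unicode (UTF-16 little-endian) or TEncoding.BigEndianUnicode.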

(For future reference: You should indicate in either the tags or the text of your question which version of Delphi you're using, as there are differences in the VCL and RTL between them. When it comes to things like Unicode/BOM type questions, these differences are extremely important.)
