
Why Doesn't Reading a UTF-16LE File Convert "\r\n" Into "\n" on Windows?

Developer https://www.devze.com 2022-12-26 08:55 Source: Internet

I am using Perl to read UTF-16LE files in Windows 7.

If I read in an ASCII file with the following code, then each "\r\n" in the file is converted into a "\n" in memory:

open CUR_FILE, "<", $asciiFile; 

If I read in a UTF-16LE (Windows code page 1200) file with the following code:

open CUR_FILE, "<:encoding(UTF-16LE)", $utf16leFile;

then each "\r\n" is left unchanged. This inconsistency causes problems when I try to run regular expressions against lines that contain line breaks.

Update:

For each line of a UTF-16LE file:

$line =~ /(.*)$/

Then the string matched in $1 will include a "\r" at the end...


What version of Perl are you using? UTF-16 and CRLF handling did not mix properly before 5.8.9 (Unicode changes in 5.8.9). I'm not sure about 5.10.0, but it works in 5.10.1 and 5.8.9. You might need to use "<:encoding(UTF-16LE):crlf" when opening the file.
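A minimal sketch of the layer approach, under one assumption that goes beyond the answer above: the layer string starts with :raw. On Windows the default :crlf layer sits below :encoding and sees undecoded bytes, so clearing it first and pushing :crlf after :encoding lets the translation run on decoded characters. The temporary file and its contents are only there to make the example self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small UTF-16LE sample with CRLF line endings, byte for byte.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode $tmp_fh, ':raw';
print {$tmp_fh} "a\0\r\0\n\0b\0\r\0\n\0";
close $tmp_fh or die "close failed: $!";

# :raw clears the default layers, :encoding decodes UTF-16LE, and the
# :crlf layer pushed last translates "\r\n" to "\n" on decoded characters.
open my $in, '<:raw:encoding(UTF-16LE):crlf', $tmp_name
    or die "Cannot open $tmp_name: $!";
my @lines = <$in>;
close $in;

# Report whether any carriage return survived in the lines read back.
for my $line (@lines) {
    print $line =~ /\r/ ? "CR still present\n" : "CR stripped\n";
}
```

If the lines still contain "\r" on your Perl version, checking the active layers with PerlIO::get_layers($in) may help narrow down why.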


That is Windows performing that magic for you. If you specify an encoding such as UTF-16LE, this is equivalent to opening the file in binary mode rather than text mode.

Newer versions of Perl (5.10 and later) have \R, a generic newline that matches "\r\n" as a unit as well as a lone "\n" or "\r", and \v, which matches any single vertical whitespace character (for example \n, \r, form feed, and the Unicode line and paragraph separators).

Does your regex logic allow using \R instead of \n?
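If \R is an option, a rough sketch of stripping the line ending regardless of its form could look like the following. It assumes Perl 5.10 or later, and strip_eol is a hypothetical helper name, not something from the question.

```perl
use strict;
use warnings;

# strip_eol is a hypothetical helper used only for illustration.
sub strip_eol {
    my ($line) = @_;
    # \R matches "\r\n" as one unit, or a single "\n" or "\r";
    # \z anchors the match at the very end of the string.
    $line =~ s/\R\z//;
    return $line;
}

# Try the helper on lines with CRLF, LF, and no line ending at all.
for my $raw_line ("foo\r\n", "bar\n", "baz") {
    my $clean = strip_eol($raw_line);
    print length($clean), "\n";
}
```

Because the ending is removed before any further matching, a later capture such as /(.*)$/ no longer picks up a trailing "\r".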
