
What is Unicode character U+2028 (LS / Line Separator) used for?

Developer (开发者) https://www.devze.com 2023-01-04 12:12 Source: web

I figured the line-breaking problem must have been solved by someone, even if the solution isn't widely adopted. Being forward-thinking, I searched for a platform-independent Unicode way to separate lines and found Unicode character U+2028. Then I found Jeff Atwood's post on this topic, where he mentions that he's "...not sure under what circumstances you would want those Unicode newline markers."

Well, me neither. I did a little digging in the C# source code, and it looks like LS (U+2028) is not supported by TextReader.ReadLine(); it is also not supported by Java's BufferedReader.readLine(). So my conclusion is that it is not widely supported.
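The Java half of that claim is easy to check. Here is a minimal sketch; the class name `ReadLineDemo` and the helper `readLines` are made up for illustration. It feeds the same two words to `BufferedReader.readLine()`, once separated by LF and once by U+2028, and counts the lines that come back.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of any library.
class ReadLineDemo {
    // Collects every line that BufferedReader.readLine() reports for the text.
    static List<String> readLines(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // A StringReader should not fail here; rethrow unchecked to keep the helper simple.
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // LF is a recognized terminator, so this text is split in two.
        System.out.println(readLines("first\nsecond").size());
        // U+2028 is not, so the whole text comes back as a single line.
        System.out.println(readLines("first\u2028second").size());
    }
}
```

The documented terminators for `readLine()` are LF, CR, and CR followed by LF, so the second call returning one line is the expected behavior rather than a bug.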

I would love to have a bright future where I can write files using a single format on Linux, macOS and Windows. Does this little character have promise? What is it currently used for?


Nicked from McDowell’s comment on the same page, and indirectly from the Unicode docs:

Traditionally, NLF started out as a line separator (and sometimes record separator). It is still used as a line separator in simple text editors such as program editors. As platforms and programs started to handle word processing with automatic line-wrap, these characters were reinterpreted to stand for paragraph separators. For example, even such simple programs as the Windows Notepad program and the Mac SimpleText program interpret their platform’s NLF as a paragraph separator, not a line separator.

NLF (New Line Function) in this context is shorthand for CR, LF and CRLF. By contrast, the two Unicode characters have unambiguous uses.
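If an application has already decided that its platform NLFs mean "line break" (an assumption the program has to make itself, since the characters don't say), it could rewrite them to the unambiguous LS. A rough sketch, with the hypothetical names `NlfNormalizer` and `toLineSeparators`:

```java
// Hypothetical helper, shown only to illustrate the idea.
class NlfNormalizer {
    // Replaces CRLF, CR and LF with U+2028 (LS).
    // CRLF is listed first so that a CR+LF pair becomes one separator, not two.
    static String toLineSeparators(String text) {
        return text.replaceAll("\r\n|\r|\n", "\u2028");
    }
}
```

A word processor that reads NLF as a paragraph break would map to U+2029 instead, which is exactly the ambiguity the quoted passage describes.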


Per the Unicode Newline Guidelines, U+2029 paragraph separator (PS) unambiguously indicates an intent to separate paragraphs. U+2028 line separator (LS) does likewise for lines. The other newline function characters, LF, CR, CR+LF, and NEL, are ambiguous, with their meanings dependent on platform and application.

For example, a LF might separate paragraphs in a word processing application but only lines in a simple text editor. By contrast, PS always separates paragraphs, regardless of the type of application.
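To make the distinction concrete, here is a small sketch that reads a string as paragraphs of lines using only PS and LS. The class `SeparatorDemo` and its `parse` method are invented for this example, and the sketch assumes the text uses PS and LS consistently, leaving any legacy CR or LF characters alone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of any library.
class SeparatorDemo {
    static final String LS = "\u2028"; // LINE SEPARATOR
    static final String PS = "\u2029"; // PARAGRAPH SEPARATOR

    // Splits text into paragraphs on PS, then each paragraph into lines on LS.
    // The -1 limit keeps trailing empty paragraphs and lines instead of dropping them.
    static List<List<String>> parse(String text) {
        List<List<String>> paragraphs = new ArrayList<>();
        for (String paragraph : text.split(PS, -1)) {
            paragraphs.add(Arrays.asList(paragraph.split(LS, -1)));
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        // Two paragraphs: the first has two lines, the second has one.
        System.out.println(parse("a" + LS + "b" + PS + "c"));
    }
}
```

No platform check or application-specific guess is needed here, which is the point of the two characters.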
