If I get an element that has a <br /> inside, and get its text with the innerText property, I'm seeing that the line break is two characters: 13 and 10. What determines this? Is it the browser or the web page's encoding?
I want to either make sure line breaks are always going to be these two characters (as long as they are part of the static content of the web page and not dynamically created content) or modify my text processing algorithm to handle both possibilities.
This is something I'll be using to split text into lines with the split method. I'm not sure if I should use split("\r\n") or some more complicated code.
split(/\r\n?|\n/g) should handle Unix newlines, Windows newlines, and old-style Mac newlines.
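As a minimal sketch of that regex in use (the helper name `splitLines` and the sample string are made up for illustration; the `g` flag is left off because `split` already splits on every match):

```javascript
// Hypothetical helper wrapping the regex suggested above.
function splitLines(text) {
  // \r\n? matches CRLF or a lone CR; the alternative \n matches a lone LF.
  return text.split(/\r\n?|\n/);
}

// Sample input mixing all three newline conventions.
const sample = "unix\nwindows\r\nold mac\rend";
console.log(splitLines(sample));
```

Because `\r\n?` consumes the `\n` that follows a `\r`, a CRLF pair yields one split rather than an empty line in between.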
There are a few other characters that Unicode considers newlines, but those extra ones are unlikely to be used to replace <br>s in HTML innerText.
The 13 corresponds to \r, which is known as CR or carriage return.
The 10 corresponds to \n, which is known as LF or line feed.
The combination of the two, "\r\n", is known as a CRLF line separator.
Some of those other Unicode newline characters are considered line terminators in other web languages. For example, U+2028 and U+2029 are line terminators in JavaScript, and U+000C is considered a line terminator in CSS.
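If you did want to treat U+2028 and U+2029 as line breaks too, a rough sketch might look like the following. The function name `splitOnJsLineTerminators` is hypothetical, and whether innerText ever produces these characters for a <br> is not something this example establishes.

```javascript
// Hypothetical helper: split on CRLF, LF, CR, and the two Unicode
// separators that JavaScript source text treats as line terminators.
function splitOnJsLineTerminators(text) {
  // \r\n is listed first so a CRLF pair counts as a single break.
  return text.split(/\r\n|[\n\r\u2028\u2029]/);
}

console.log(splitOnJsLineTerminators("a\u2028b\u2029c\r\nd"));
```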
It depends on your editor and/or OS. Windows uses \r (13) followed by \n (10). Unix systems use only \n. Old Macs used \r. You could just replace all \r\n with \n and then split on \n. So:
// "test\r\nnewline".replace('\r\n', '\n').split('\n')  // a string pattern only replaces the first newline
"test\r\nnewline".replace(/\r\n/g, '\n').split('\n')
It generally depends on the OS. Windows uses \r\n, classic Mac OS (before OS X) used \r, and Linux and modern macOS use \n.
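Since the separator varies by platform, normalizing CRLF to LF before splitting is one way to cope. The sketch below, with a made-up `input` string, contrasts a string pattern (which replaces only the first match) with a global regex:

```javascript
// Made-up input with two Windows-style line breaks.
const input = "one\r\ntwo\r\nthree";

// A string pattern replaces only the first "\r\n", so a stray "\r" survives.
const firstOnly = input.replace("\r\n", "\n").split("\n");

// A regex with the g flag replaces every "\r\n" before splitting.
const normalized = input.replace(/\r\n/g, "\n").split("\n");

console.log(firstOnly);  // ["one", "two\r", "three"]
console.log(normalized); // ["one", "two", "three"]
```

Note that this approach leaves a lone \r (the old Mac convention) untouched.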
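text.split(/\s*\n+/) splits text on newlines. Because \s also matches \r and blank lines, this handles \r\n, drops trailing whitespace on each line, and collapses runs of empty lines. It is generally safe to remove whitespace characters before a newline, but not after, since leading whitespace on the next line may be meaningful indentation.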
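To see what that pattern keeps and discards, here is a small sketch with a made-up `messy` string:

```javascript
// Made-up input: trailing spaces, a CRLF, a blank line, then an indented line.
const messy = "a  \r\n\r\n  b\nc";

// \s* swallows trailing whitespace (including \r and earlier newlines);
// \n+ must still match at least one newline, so indentation after the
// last newline in the run is preserved.
const parts = messy.split(/\s*\n+/);

console.log(parts); // ["a", "  b", "c"]
```

A lone \r with no \n after it is not treated as a line break by this pattern, so old-style Mac newlines would need separate handling.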