JS regex to split by line

Developer https://www.devze.com 2023-02-11 13:13 Source: Internet


How do you split a long piece of text into separate lines? Why does this return line1 twice?

/^(.*?)$/mg.exec('line1\r\nline2\r\n');

["line1", "line1"]

I turned on the multi-line modifier to make ^ and $ match the beginning and end of each line. I also turned on the global modifier to capture all lines.

I'd like to split with a regex rather than passing a plain string to String.split, because I'll be dealing with both Linux \n and Windows \r\n line endings.

arrayOfLines = lineString.match(/[^\r\n]+/g);

As Tim said, it is both the entire match and the capture. regex.exec(string) returns a single match per call even with the global modifier (with g set, each further call continues from regex.lastIndex), whereas string.match(regex) honours the global modifier and returns all matches at once.
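The difference between exec and match can be seen directly. This sketch swaps in the /[^\r\n]+/g pattern from the answer above instead of /^(.*?)$/mg, because with the m flag JavaScript's ^ and $ also match at the empty position between \r and \n, and an exec loop over empty matches needs extra lastIndex handling to avoid looping forever. The sample string is the one from the question.

```javascript
const text = 'line1\r\nline2\r\n';

// String.prototype.match with the g flag returns every match in one array.
const all = text.match(/[^\r\n]+/g);

// RegExp.prototype.exec returns a single match per call, g flag or not.
const first = /[^\r\n]+/g.exec(text); // first[0] is 'line1'

// With g, exec resumes from re.lastIndex, so calling it in a loop
// walks through all matches until it returns null.
const re = /[^\r\n]+/g;
const viaExec = [];
let m;
while ((m = re.exec(text)) !== null) {
  viaExec.push(m[0]);
}

console.log(all);     // [ 'line1', 'line2' ]
console.log(viaExec); // [ 'line1', 'line2' ]
```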

Use

result = subject.split(/\r?\n/);
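A minimal sketch of that split in use; the wrapper name splitOnNewlines and the sample strings are made up for illustration. It also shows two edge cases worth knowing: a trailing line break leaves an empty string at the end, and a lone \r is not treated as a break at all.

```javascript
// \r? makes the carriage return optional, so both "\n" (Unix) and
// "\r\n" (Windows) act as one line break each.
function splitOnNewlines(subject) {
  return subject.split(/\r?\n/);
}

console.log(splitOnNewlines('line1\r\nline2\nline3')); // [ 'line1', 'line2', 'line3' ]
// A trailing line break leaves one empty string at the end of the array:
console.log(splitOnNewlines('line1\r\nline2\r\n'));    // [ 'line1', 'line2', '' ]
// A lone "\r" is not a break here, so this yields a single element:
const loneCr = splitOnNewlines('line1\rline2');
```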

Your regex returns line1 twice because line1 is both the entire match and the contents of the first capturing group.
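The array that exec returns can be inspected to see this. The sketch drops the g flag, which does not change the result of a single exec call on a freshly created regex.

```javascript
const subject = 'line1\r\nline2\r\n';

// With one capturing group, the exec result holds the full match at
// index 0 and the group's text at index 1; here they are identical.
const withGroup = /^(.*?)$/m.exec(subject);
console.log(withGroup[0], withGroup[1]); // line1 line1
console.log(withGroup.length);           // 2

// Without a capturing group only the full match is returned.
const withoutGroup = /^.*?$/m.exec(subject);
console.log(withoutGroup.length);        // 1
```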

I am assuming the following constitute newlines:

\r followed by \n
\n followed by \r
\n present alone
\r present alone

Please use

var re=/\r\n|\n\r|\n|\r/g;

arrayofLines=lineString.replace(re,"\n").split("\n");

for an array of all lines, including the empty ones.
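A sketch of that approach wrapped in a function (the name allLines and the sample strings are hypothetical). The order of the alternatives matters: JavaScript tries them left to right, so listing the two-character endings first makes each \r\n or \n\r count as one line ending rather than two.

```javascript
// The two-character endings come first in the alternation, so "\r\n" and
// "\n\r" are each consumed as a single line ending, not as two.
const re = /\r\n|\n\r|\n|\r/g;

function allLines(lineString) {
  return lineString.replace(re, '\n').split('\n');
}

console.log(allLines('a\r\nb\n\rc\nd\re')); // [ 'a', 'b', 'c', 'd', 'e' ]
console.log(allLines('a\r\n\r\nb'));        // [ 'a', '', 'b' ]
```

Since split also accepts a regex, lineString.split(/\r\n|\n\r|\n|\r/) should produce the same array without the intermediate replace.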

Please use

arrayOfLines = lineString.match(/[^\r\n]+/g);

for an array of non-empty lines.
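One detail worth guarding against: match returns null, not an empty array, when nothing matches (for example on an empty string or on text that contains only line breaks). The wrapper name nonEmptyLines and the || [] fallback are additions for this sketch.

```javascript
// match() with the g flag returns every run of characters that are not
// \r or \n, or null when there is no such run (e.g. "" or "\r\n\r\n").
function nonEmptyLines(lineString) {
  return lineString.match(/[^\r\n]+/g) || [];
}

console.log(nonEmptyLines('a\r\n\r\nb\n')); // [ 'a', 'b' ]
console.log(nonEmptyLines('\r\n\r\n'));     // []
```

Note that a line containing only spaces is still returned, since spaces are not \r or \n.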

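An even simpler regex handles all line-ending combinations, even mixed in the same file, and collapses empty lines as well (though a line ending at the very start or end of the text still leaves an empty string at that end of the array):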

var lines = text.split(/[\r\n]+/g);
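A sketch of the edge case at the ends of the text, using a made-up sample. The g flag from the answer is left out here; it is harmless but unnecessary, since split always finds every separator. The filter step is one assumed way to drop the leftover empty strings.

```javascript
// Any run of \r and \n characters counts as one separator, so blank lines
// between other lines vanish.
const text = '\nline1\r\n\r\nline2\n';
const lines = text.split(/[\r\n]+/);
console.log(lines); // [ '', 'line1', 'line2', '' ]

// A break at the very start or end of the text still leaves an empty string
// at that end; filtering removes them if empty strings are never wanted.
const nonEmpty = lines.filter(line => line !== '');
console.log(nonEmpty); // [ 'line1', 'line2' ]
```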

With whitespace trimming:

var lines = text.trim().split(/\s*[\r\n]+\s*/g);
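A sketch of the trimming variant in use (the function name trimmedLines and the samples are hypothetical). Because \s also matches \r and \n, whitespace-only lines between breaks are absorbed into a single separator, while spaces inside a line are kept.

```javascript
// trim() strips leading and trailing whitespace (line breaks included), so no
// empty strings appear at either end. The \s* on both sides of [\r\n]+ also
// absorbs spaces and tabs around each break, so whitespace-only lines vanish too.
function trimmedLines(text) {
  return text.trim().split(/\s*[\r\n]+\s*/);
}

console.log(trimmedLines('  line1  \r\n\r\n\t line2 \n   \n')); // [ 'line1', 'line2' ]
console.log(trimmedLines('a\n   \nb c'));                     // [ 'a', 'b c' ]
```

One edge case: an empty or whitespace-only input returns [''] rather than an empty array.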

Unicode-Compliant Line Splitting

Unicode® Technical Standard #18 defines what constitutes a line boundary. Its section on line boundaries also gives a regular expression that matches any line boundary. Using that regex, we can define the following JS function that splits a given string at any line boundary (preserving empty lines as well as leading and trailing whitespace):

const splitLines = s => s.split(/\r\n|(?!\r\n)[\n-\r\x85\u2028\u2029]/)
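A quick check of the function on mixed input; the sample strings are made up for illustration. The character class range \n-\r covers U+000A through U+000D, that is LF, VT, FF and CR.

```javascript
// \r\n is one boundary; otherwise any single LF, VT, FF, CR (the range \n-\r,
// U+000A..U+000D), NEL (U+0085), LINE SEPARATOR (U+2028) or
// PARAGRAPH SEPARATOR (U+2029) is a boundary.
const splitLines = s => s.split(/\r\n|(?!\r\n)[\n-\r\x85\u2028\u2029]/);

console.log(splitLines('a\r\nb\nc\rd\x85e\u2028f\u2029g'));
// [ 'a', 'b', 'c', 'd', 'e', 'f', 'g' ]
console.log(splitLines('a\n\n b \n')); // [ 'a', '', ' b ', '' ]
```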

I don't understand why the negative lookahead (?!\r\n) is necessary, but that is what the Unicode document suggests.
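One plausible explanation, offered as an interpretation rather than something the standard states here: when the regex is used on its own as a split separator, the lookahead makes no difference, because JavaScript tries alternatives left to right and the \r\n branch wins at a CRLF. It does matter once the expression is embedded in a larger pattern, where backtracking could otherwise retry the second branch and count the CR of a CRLF as a boundary by itself. The sample string and the "exactly two boundaries" pattern below are made up to show both cases.

```javascript
const withLookahead = /\r\n|(?!\r\n)[\n-\r\x85\u2028\u2029]/;
const withoutLookahead = /\r\n|[\n-\r\x85\u2028\u2029]/;

// Used alone as a split separator, both give the same array: at a CR that
// starts a CRLF, the \r\n branch is tried first and succeeds.
const sample = 'a\r\nb\n\nc\rd\u2028e\r\n';
const sameSplit =
  JSON.stringify(sample.split(withLookahead)) ===
  JSON.stringify(sample.split(withoutLookahead));
console.log(sameSplit); // true

// Embedded in a larger pattern ("exactly two line boundaries"), backtracking
// can retry the second branch and treat the CR and LF of one CRLF separately.
const twoWith = new RegExp(`^(?:${withLookahead.source}){2}$`);
const twoWithout = new RegExp(`^(?:${withoutLookahead.source}){2}$`);
console.log(twoWith.test('\r\n'));    // false: CRLF stays one boundary
console.log(twoWithout.test('\r\n')); // true: CR and LF counted as two
```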
