
JS regex to split by line

Source: https://www.devze.com, 2023-02-11 13:13 (reposted from the web)

How do you split a long piece of text into separate lines? Why does this return line1 twice?

/^(.*?)$/mg.exec('line1\r\nline2\r\n');

["line1", "line1"]

I turned on the multi-line modifier to make ^ and $ match beginning and end of lines. I also turned on the global modifier to capture all lines.

I wish to use a regex split and not String.split because I'll be dealing with both Linux \n and Windows \r\n line endings.


arrayOfLines = lineString.match(/[^\r\n]+/g);

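As Tim said, line1 is both the entire match and the first capture group. It appears regex.exec(string) returns only one match per call regardless of the global modifier (with the modifier set, each call resumes from regex.lastIndex), whereas string.match(regex) honours the global modifier and returns all matches at once.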
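The difference between the two calls can be sketched as follows. This is only an illustration: collectLines is a hypothetical helper name, and the loop uses the [^\r\n]+ pattern from this answer rather than the pattern from the question.

```javascript
// What a single exec() call returns for the regex in the question.
const first = /^(.*?)$/mg.exec('line1\r\nline2\r\n');
// first[0] is the whole match and first[1] is capture group 1,
// so both entries hold 'line1'.

// Hypothetical helper: loop exec() to collect every non-empty line.
function collectLines(text) {
  const re = /[^\r\n]+/g; // one or more characters that are not CR or LF
  const lines = [];
  let match;
  // With the g flag, each exec() call resumes from re.lastIndex
  // and returns null once no further match exists.
  while ((match = re.exec(text)) !== null) {
    lines.push(match[0]);
  }
  return lines;
}

const viaExec = collectLines('line1\r\nline2\r\n');
const viaMatch = 'line1\r\nline2\r\n'.match(/[^\r\n]+/g);
```

Both viaExec and viaMatch end up holding the same two lines; match simply does the looping for you.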


Use

result = subject.split(/\r?\n/);

Your regex returns line1 twice because line1 is both the entire match and the contents of the first capturing group.
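As a small sketch of how this split behaves on mixed line endings: splitOnNewlines is a hypothetical wrapper name, and dropping the final empty element is my own assumption about what is usually wanted when the text ends with a line break.

```javascript
// Hypothetical wrapper around split(/\r?\n/).
function splitOnNewlines(text) {
  // \r?\n matches either a Windows \r\n or a bare Linux \n.
  const lines = text.split(/\r?\n/);
  // A trailing line break leaves one empty string at the end; drop it.
  if (lines.length > 0 && lines[lines.length - 1] === '') {
    lines.pop();
  }
  return lines;
}

const raw = 'line1\r\nline2\r\n'.split(/\r?\n/); // keeps the trailing ''
const cleaned = splitOnNewlines('line1\r\nline2\nline3\r\n');
```

Empty lines in the middle of the text are kept, since each line break is treated as exactly one separator.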


I am assuming the following constitute newlines:

  1. \r followed by \n
  2. \n followed by \r
  3. \n present alone
  4. \r present alone

Please use

var re=/\r\n|\n\r|\n|\r/g;

arrayofLines=lineString.replace(re,"\n").split("\n");

for an array of all lines, including the empty ones,

or

arrayOfLines = lineString.match(/[^\r\n]+/g);

for an array of non-empty lines.
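The two options can be compared side by side in a short sketch. The function names allLines and nonEmptyLines are hypothetical, and the "|| []" guard is my own addition, since match returns null when nothing matches.

```javascript
// Hypothetical helper: every line, empty ones included.
function allLines(lineString) {
  const re = /\r\n|\n\r|\n|\r/g;
  // Normalize each kind of line break to \n, then split on \n.
  return lineString.replace(re, '\n').split('\n');
}

// Hypothetical helper: only lines with at least one character.
function nonEmptyLines(lineString) {
  // match() returns null (not an empty array) when there is no match.
  return lineString.match(/[^\r\n]+/g) || [];
}

const text = 'one\r\ntwo\n\nthree\rfour';
const withEmpty = allLines(text);
const withoutEmpty = nonEmptyLines(text);
```

Here withEmpty keeps the blank line between "two" and "three", while withoutEmpty skips it.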


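Even simpler regex that handles all line ending combinations, even mixed in the same file, and removes empty lines as well (note that a line break at the very start or end of the text still leaves an empty string at that end of the array):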

var lines = text.split(/[\r\n]+/g);

With whitespace trimming:

var lines = text.trim().split(/\s*[\r\n]+\s*/g);
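A quick sketch of the edge behaviour of these two splits. The sample strings and the filter step are my own illustration, not part of the answer.

```javascript
// A leading or trailing line break leaves an empty string at that end.
const edges = '\nfoo\r\n\r\nbar\n'.split(/[\r\n]+/);

// One possible way to drop those leftovers: filter out empty strings.
const filtered = '\nfoo\r\n\r\nbar\n'
  .split(/[\r\n]+/)
  .filter(line => line !== '');

// The trimming variant removes whitespace around every line break,
// and trim() removes it at both ends of the whole text.
const trimmed = '  foo  \r\n\r\n  bar \n'.trim().split(/\s*[\r\n]+\s*/);
```

The g flag in the original patterns has no effect on split, so it is omitted in this sketch.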


Unicode Compliant Line Splitting

Unicode® Technical Standard #18 defines what constitutes line boundaries. That same section also gives a regular expression to match all line boundaries. Using that regex, we can define the following JS function that splits a given string at any line boundary (preserving empty lines as well as leading and trailing whitespace):

const splitLines = s => s.split(/\r\n|(?!\r\n)[\n-\r\x85\u2028\u2029]/)

I don't understand why the negative look-ahead part, (?!\r\n), is necessary, but that is what the Unicode document suggests.
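One way to probe the look-ahead question is to compare the pattern with a variant that omits it. This is only a sketch on one sample string, not a proof: withoutLookahead is a hypothetical variant of my own, and the sample text is made up.

```javascript
// The pattern quoted above, and a hypothetical variant without (?!\r\n).
const withLookahead = /\r\n|(?!\r\n)[\n-\r\x85\u2028\u2029]/;
const withoutLookahead = /\r\n|[\n-\r\x85\u2028\u2029]/;

// Sample with CRLF, LINE SEPARATOR, NEL, an empty line, and a lone CR.
const sample = 'a\r\nb\u2028c\x85d\n\ne\rf';

const partsA = sample.split(withLookahead);
const partsB = sample.split(withoutLookahead);

// Compare the two results element by element via their JSON form.
const sameResult = JSON.stringify(partsA) === JSON.stringify(partsB);
```

On this sample both patterns split identically, presumably because the \r\n alternative is tried first at each position when used with split. The look-ahead may still matter when the expression is embedded in a larger pattern where backtracking could otherwise match the \r on its own.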
