CR vs LF perl parsing

Developer https://www.devze.com 2023-04-08 14:17 Source: Internet
I have a Perl script which parses a text file and breaks it up per line into an array. It works fine when each line is terminated by LF, but when lines are terminated by CR my script does not handle them properly. How can I modify this line to fix this?

my @allLines = split(/^/, $entireFile);

Edit: My file has a mixture of lines, some ending in LF and some ending in CR. The split just collapses all the lines together when they end in CR.


Perl can handle both CRLF and LF line-endings with the built-in :crlf PerlIO layer:

open(my $in, '<:crlf', $filename);

will automatically convert CRLF line endings to LF and leave LF line endings unchanged. CR-only files are the odd man out, though. If you know that the file uses CR only, you can set $/ to "\r" and Perl will read it line by line (but it won't change the CR to an LF).
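As a rough sketch of the CR-only case, the snippet below reads from an in-memory string instead of a real file so that it runs on its own. The sample data and the variable names ($data, @lines) are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data: three records, each terminated by a bare CR.
my $data = "alpha\rbeta\rgamma\r";

# Read from an in-memory filehandle so the sketch needs no file on disk.
open(my $in, '<', \$data) or die "Cannot open in-memory handle: $!";

my @lines;
{
    local $/ = "\r";              # treat CR as the input record separator
    while (my $line = <$in>) {
        chomp $line;              # chomp strips the current value of $/
        push @lines, $line;
    }
}
close $in;

print "$_\n" for @lines;
```

Using local keeps the change to $/ confined to the block, so other reads in the same program still see the default separator.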

If you have to deal with files of unknown line endings (or even mixed line endings in a single file), you might want to install the PerlIO::eol module. Then you can say:

open(my $in, '<:raw:eol(LF)', $filename);

and it will automatically convert CR, CRLF, or LF line endings into LF as you read the file.

Another option is to set $/ to undef, which will read the entire file in one slurp. Then split it on /\r\n?|\n/. But that assumes that the file is small enough to fit in memory.
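A minimal sketch of the slurp-and-split approach, assuming the input is small enough to hold in memory. The sample string stands in for a real file and is purely illustrative.

```perl
use strict;
use warnings;

# Hypothetical sample mixing CRLF, CR, and LF line endings.
my $data = "one\r\ntwo\rthree\nfour\n";

open(my $in, '<', \$data) or die "Cannot open in-memory handle: $!";
my $entireFile = do { local $/; <$in> };   # $/ is undef here, so <$in> slurps everything
close $in;

# \r\n? matches CRLF or a bare CR; the alternative \n matches a bare LF.
my @allLines = split /\r\n?|\n/, $entireFile;

print scalar(@allLines), " lines\n";
```

Note that, unlike split(/^/, ...), this removes the line terminators from each element, and split drops trailing empty fields, so the final newline does not produce an extra empty entry.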


If you have mixed line endings, you can normalize them by matching a generalized line ending:

 use v5.10;

 $entireFile =~ s/\R/\n/g;

You can also open a filehandle on a string and read lines just like you would from a file:

 open my $fh, '<', \ $entireFile;
 my @lines = <$fh>;
 close $fh;
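Putting the two pieces together might look roughly like this. The sample string and the $normalized copy are assumptions added for illustration. Keep in mind that \R also matches a few less common line breaks, such as vertical tab and form feed, which may or may not be what you want.

```perl
use strict;
use warnings;
use v5.10;    # \R needs Perl 5.10 or later

# Hypothetical sample mixing CRLF, CR, and LF line endings.
my $entireFile = "one\r\ntwo\rthree\nfour";

# Work on a copy so the original string stays untouched.
(my $normalized = $entireFile) =~ s/\R/\n/g;

# Open a filehandle on the string and read it line by line.
open my $fh, '<', \$normalized or die "Cannot open in-memory handle: $!";
my @lines = <$fh>;
close $fh;

chomp @lines;    # strip the LF terminators now that they are uniform
print "$_\n" for @lines;
```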

You can even open the string with the PerlIO layers that cjm shows.


You can probably just handle the different line endings when doing the split, e.g.:

my @allLines = split(/\r\n|\r|\n/, $entireFile);


Perl will automatically split the input into lines if you read with <>, but you need to change $/ to "\r".

$/ is the "input record separator". See perldoc perlvar for details.

There is not any way to change what a regular expression considers to be the end-of-line - it's always newline.
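This is also why the original split(/^/, ...) collapses CR-terminated lines: ^ only matches after a newline. A small sketch with made-up sample strings shows the difference, and one possible workaround of converting CR and CRLF to LF before splitting.

```perl
use strict;
use warnings;

# Hypothetical samples: the same three lines with CR and with LF endings.
my $crText = "a\rb\rc\r";
my $lfText = "a\nb\nc\n";

# split /^/ breaks only after "\n", so the CR-only text stays in one piece.
my @fromCr = split /^/, $crText;
my @fromLf = split /^/, $lfText;

# Convert CRLF and bare CR to LF first, then the same split works.
(my $fixed = $crText) =~ s/\r\n?/\n/g;
my @fromFixed = split /^/, $fixed;

printf "CR: %d, LF: %d, fixed: %d\n",
    scalar(@fromCr), scalar(@fromLf), scalar(@fromFixed);
```

As with the original code, split /^/ keeps the terminating "\n" on each element.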
