开发者

Opening a CSV file created in Mac Excel with Perl

开发者 https://www.devze.com 2023-02-10 21:45 出处:网络
I\'m having a bit of trouble with the Perl code below. I can open and read in a CSV file that I\'ve made manually, but if I try to open any Mac Excel spreadsheet that I save开发者_Go百科 as a CSV file

I'm having a bit of trouble with the Perl code below. I can open and read in a CSV file that I've made manually, but if I try to open any Mac Excel spreadsheet that I save开发者_Go百科 as a CSV file, the code below reads it all as a single line.

#!/usr/bin/perl

use strict;
use warnings;

open F, "file.csv";

foreach (<F>)
{
    ($first, $second, undef, undef) = split (',', $_);
}

print "$first : $second\n";

close(F);


Always use a specialised module (such as Text::CSV or Text::CSV_XS) for this purpose as there are lots of cases where split-ing will not help (for example when the fields contain a comma which is not a field separator but is within quotes).


Traditional Macintosh (System 9 and previous) uses CR (0x0D, \r) as the line separator. Mac OS X (Unix based) uses LF(0x0A, \n) as the default line separator, so the perl script, being a Unix tool, is probably expecting LF but is getting CR. Since there are no line separators in the file perl thinks there is only one line. If it had Windows line endings (CR,LF) you'd probably be getting an invisible CR at the end of each line.

A quick loop over the input replacing 0x0D with 0x0A should fix your problem.


I've directly experienced this problem with Excel 2004 for Mac. The line endings are indeed \r, and IIRC, the text uses the MacRoman character set, rather than Latin-1 or UTF-8 as you might expect.

So as well as the good advice to use Text::CSV / Text::CSV_XS and splitting on \r, you will want to open the file using the MacRoman encoding like so:

open my $fh, "<:encoding(MacRoman)", $filename
   or die "Can't read $filename: $!";

Likewise, when reading a file exported with Excel on Windows, you may wish to use :encoding(cp1252) instead of :encoding(MacRoman) in that code.


Not sure about Mac excel, but certainly the windows version tends to enclose all values in quotes: "like","this". Also, you need to take into account the possibility of there being a quote in the value, which would show up "like""this" (there's only a single " in that value).

To actually answer your question however, it's likely that it's using a different newline character from what you'd expect. It's probably saving as \r\n instead of \n, or vice versa.


As others have suspected, your line endings are probably to blame. On my Linux-based system there are builtin utilities to change these line endings. mac2unix (which I think is just a wrapper around dos2unix will read your file and change the line endings for you. You should have something similar both on Linux and Mac (Microsoft may not care about you).

If you want to handle this in Perl, look into setting the $/ variable to set the "input record separator" from "\n" to "\r" (if thats the right ending). Try local $/ = "\r" before you read the file. Read more about it in perldoc perlvar (near $/) or in perldoc perlport (devoted to writing portable Perl code.

P.S. if I have some part of this incorrect let me know, I don't use Mac, I just think I know the theory


if you set the "special variable" that handles what it considers a newline to \r you'll be able to read one line at a time: $/="\r"; in this particular case the mac new line for perl is default \n but the file is probably using \r. This builds off what Flynn1179 & Mark Thalman said but shows you what to do to use the while () style reading.

0

精彩评论

暂无评论...
验证码 换一张
取 消