
ruby mechanize: how to read a downloaded binary csv file

Developer https://www.devze.com 2022-12-17 16:59 Source: the web

I'm not very familiar with handling binary data in ruby. I'm using mechanize to download a large number of csv files to my local disk. I then need to search these files for specific strings.

I use the save_as method in mechanize to save the file (which saves the file as binary). The content type of the file (according to mechanize) is:

application/vnd.ms-excel;charset=x-UTF-16LE-BOM

From here, I'm not sure how to read the file. I've tried reading it in as a normal file in ruby, but I just get the binary data. I've also tried just using standard unix tools (strings/grep) to try and search without any luck.

When I run the 'file' command on one of the files, I get:

foo.csv: Little-endian UTF-16 Unicode Pascal program text, with very long lines, with CRLF, CR, LF line terminators

I can see the data just fine with cat or vi. With vi I also see some control characters.

I've also tried both the csv and fastercsv ruby libraries, but I get an 'IllegalFormatError' exception with these. I've also tried this solution without any luck.

Any help would be greatly appreciated. Thanks.


You can use the 'iconv' command to convert the file to UTF-8:

# iconv -f 'UTF-16LE' -t 'UTF-8' bad_file.csv > good_file.csv

There is also a wrapper for iconv in the standard library; you could use that to convert the file's contents after reading them into your program.
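
Note that the Iconv wrapper was removed from the standard library in Ruby 2.0, so on a current Ruby the same conversion would probably be done with String#encode. Below is a minimal sketch of that idea, assuming the files really are UTF-16LE with an optional BOM, as the content type and the 'file' output suggest. The helper names (utf16le_to_utf8, matching_lines, search_utf16le_file) are made up for illustration and are not part of mechanize or the csv library.

```ruby
# Sketch: decode UTF-16LE bytes (with an optional BOM) to UTF-8, then search.
# All helper names here are hypothetical, chosen only for this example.

# Convert raw UTF-16LE bytes into a UTF-8 string without a leading BOM.
# Bytes that cannot be converted are replaced instead of raising an error.
def utf16le_to_utf8(bytes)
  text = bytes.dup.force_encoding(Encoding::UTF_16LE)
  text = text.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
  text.sub(/\A\uFEFF/, "")
end

# Return the lines of text that contain needle, without line terminators.
# The file mixes CRLF, CR and LF endings, so split on all three.
def matching_lines(text, needle)
  text.split(/\r\n|\r|\n/).select { |line| line.include?(needle) }
end

# Read a file in binary mode (so no bytes are altered) and search it.
def search_utf16le_file(path, needle)
  matching_lines(utf16le_to_utf8(File.binread(path)), needle)
end
```

Something like `puts search_utf16le_file("foo.csv", "some string")` would then print the matching lines. Once the text is UTF-8 it may also be worth retrying the csv library on it, although the mixed line terminators could still trip up its parser.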
