开发者

Excel File opened under Ubuntu, read by R, OpenOffice

开发者 https://www.devze.com 2023-02-03 12:33 出处:网络
I have a bunch of Excel files which are updated on a daily basis on a Windows machine. I transfer them t开发者_开发技巧o a Ubuntu machine and want to open them there. Specifically, I want to read the

I have a bunch of Excel files which are updated on a daily basis on a Windows machine. I transfer them t开发者_开发技巧o a Ubuntu machine and want to open them there. Specifically, I want to read the files as a database under R.

A couple years ago, I used ODBC under a Windows machine to open Excel files through R. Is there any way, I can do this with R under Ubuntu?

I could make a database .ODB file for the corresponding XLS files using OpenOffice, but I don't know of a way to connect to a .ODB database. OpenOffice seems to have ways to connect TO databases, but no way to connect to ODB.

Thanks for any potential solutions.


You might be able to get away with using xls2csv from apt-get install catdoc to dump the Excel files to CSV. Then you can pretty much pick your poison as to how they get proccessed from there. read.csv.sql from the sqldf package could be very handy if you want to extract information using SQL statements.


I would suggest xlsx package, which has no special requirements (like xlsReadWrite and others), so it can be easily used under Linux. Although it only reads (and writes) xlsx format.

Another aprouch could be using read.xls function in gdata package, which first convers xls files to csv, and read those dataframes. You will need Perl and xls2csv installed, which is not a big problem under Linux.


Your ODBC solution should work on Linux, providing you install the uniXODBC package (for your OS, you might also need the unixODBC-devel package if compiling RODBC) and the RODBC package for R. The link Gabor provides in his comment to @daroczig's answer has some details of RODBC on linux; note the points about this being read-only on Linux and the potentially difficult set-up.

You might well be better off with the options @daroczig and Gabor suggest, but if you are familiar with ODBC you might want to give it a try on Ubuntu also.


There's another solution - host your data in a database to which both your machines have access. Postgres or MySQL will cost you nothing or MS-SQL server if you've got cash rattling around. What you seem to be trying to do is exactly what networked RDBMSs were designed for. You'll be able to play with the data in Excel and R on remote machines. Win.

Copying Excel files around is a massive fail waiting to happen. Get yourself a real RDBMS. I'd go for Postgres.

0

精彩评论

暂无评论...
验证码 换一张
取 消