
SQLite or flat text file?

开发者 (Developer) https://www.devze.com 2022-12-21 22:45 Source: web

I process a lot of text/data that I exchange between Python, R, and sometimes Matlab.

My go-to is the flat text file, but I also use SQLite occasionally to store the data and access it from each program (not Matlab yet, though). I do operations like GROUP BY and AVG in R far more often than in SQL, so I don't necessarily need the database operations.

For applications that require exchanging data among programs to make use of the libraries available in each language, is there a good rule of thumb on which data exchange format/method to use (even XML or NetCDF or HDF5)?

I know that between Python and R there is rpy or rpy2, but I am asking in a more general sense: not all of the many computers I use have rpy2, and I also use a few other pieces of scientific analysis software that need access to the data at various times (the stages of processing and analysis are also separate).


If all the languages support SQLite - use it. The power of SQL might not be useful to you right now, but it probably will be at some point, and it saves you having to rewrite things later when you decide you want to be able to query your data in more complicated ways.

SQLite will also probably be substantially faster if you only want to access certain bits of data in your datastore - since doing that with a flat-text file is challenging without reading the whole file in (though it's not impossible).
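As a rough illustration of reading only "certain bits" of a datastore, here is a minimal sketch using Python's standard `sqlite3` module. The `measurements` table, its columns, and the helper names are hypothetical, chosen only for this example; an in-memory database stands in for the file you would actually share between programs.

```python
import sqlite3

def build_store(conn, rows):
    """Create a hypothetical 'measurements' table and index the lookup column."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS measurements (site TEXT, day INTEGER, value REAL)"
    )
    conn.executemany("INSERT INTO measurements VALUES (?, ?, ?)", rows)
    # The index lets SQLite locate rows for one site without scanning the table.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_site ON measurements (site)")
    conn.commit()

def read_site(conn, site):
    """Fetch only the rows belonging to one site, ordered by day."""
    cur = conn.execute(
        "SELECT day, value FROM measurements WHERE site = ? ORDER BY day", (site,)
    )
    return cur.fetchall()

# ":memory:" keeps the example self-contained; pass a file path instead
# to get a database file that other programs can open.
conn = sqlite3.connect(":memory:")
build_store(conn, [("A", 1, 0.5), ("B", 1, 1.5), ("A", 2, 0.7)])
subset = read_site(conn, "A")
```

With a database file on disk instead of `:memory:`, the same `SELECT` can be issued from any other language that has an SQLite binding.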


A flat text file (e.g. in csv format) would be the most portable solution. Almost every program/library can work with this format: R and Python have good csv support and if your data set isn't too large you can even import the csv into Excel for smaller tasks.

However, text files are unwieldy for larger data sets, since you need to read them completely for almost all operations (depending on the structure of your data).
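To make the trade-off concrete, here is a small sketch with Python's standard `csv` module. The column names and helper functions are made up for illustration, and an in-memory buffer stands in for a real file. Note that selecting a subset still means parsing every row.

```python
import csv
import io

def write_csv(handle, header, rows):
    """Write a header line followed by the data rows."""
    writer = csv.writer(handle)
    writer.writerow(header)
    writer.writerows(rows)

def filter_csv(handle, column, wanted):
    """Stream the file row by row; every row is read and parsed,
    even though only the matching ones are kept."""
    reader = csv.DictReader(handle)
    return [row for row in reader if row[column] == wanted]

# The buffer stands in for open("data.csv", "w", newline="") and a later re-open.
buffer = io.StringIO(newline="")
write_csv(buffer, ["site", "day", "value"],
          [("A", 1, 0.5), ("B", 1, 1.5), ("A", 2, 0.7)])
buffer.seek(0)
matches = filter_csv(buffer, "site", "A")
```

All values come back as strings, so numeric columns need converting after the read.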

SQLite allows you to filter the data very easily (even without much SQL expertise) and, as you already mentioned, can do some computation on its own (AVG, SUM, ...). Using the Firefox plug-in SQLiteManager you can work with the DB on every computer without any installation/configuration trouble and thus easily manage your data (import/export, filter).
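For instance, the AVG and SUM computations mentioned above could look roughly like this from Python. The `samples` table and the `site_summary` helper are hypothetical names used only for this sketch.

```python
import sqlite3

def site_summary(conn):
    """Return (site, average, total) per site, computed inside SQLite."""
    cur = conn.execute(
        "SELECT site, AVG(value), SUM(value) "
        "FROM samples GROUP BY site ORDER BY site"
    )
    return cur.fetchall()

conn = sqlite3.connect(":memory:")  # in-memory database for the example only
conn.execute("CREATE TABLE samples (site TEXT, value REAL)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?)",
    [("A", 1.0), ("A", 3.0), ("B", 5.0)],
)
summary = site_summary(conn)
```

Only the per-site summary rows are returned to Python; the raw rows stay in the database.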

So I would recommend using SQLite for larger data sets that need a lot of filtering to extract the data you want. For smaller data sets, or if there is no need to select subsets of your data, a flat (CSV) text file should be fine.
