I have a 15 GB SQLite db with 40 columns. I would like to remove most of the columns to make queries faster and the db more portable. I found this guidance, but can't get it to work: it seems to hang, and I end up corrupting the db and having to start over. Here's the script I use:
BEGIN TRANSACTION;
CREATE TEMPORARY TABLE dly_backup(DATE INTEGER,
TICKER TEXT,
TSYMBOL TEXT,
VOL INTEGER,
RET REAL,
RETX REAL,
VWRETD REAL,
VWRETX REAL,
EWRETD REAL,
EWRETX REAL,
SPRTRN REAL
);
INSERT INTO dly_backup SELECT DATE INTEGER,
TICKER TEXT,
TSYMBOL TEXT,
VOL INTEGER,
RET REAL,
RETX REAL,
vwretd REAL,
vwretx REAL,
ewretd REAL,
ewretx REAL,
sprtrn REAL
FROM dly;
DROP TABLE dly;
CREATE TABLE dly(DATE INTEGER,
TICKER TEXT,
TSYMBOL TEXT,
VOL INTEGER,
RET REAL,
RETX REAL,
VWRETD REAL,
VWRETX REAL,
EWRETD REAL,
EWRETX REAL,
SPRTRN REAL
);
INSERT INTO dly SELECT DATE INTEGER,
TICKER TEXT,
TSYMBOL TEXT,
VOL INTEGER,
RET REAL,
RETX REAL,
VWRETD REAL,
VWRETX REAL,
EWRETD REAL,
EWRETX REAL,
SPRTRN REAL
FROM dly_backup;
DROP TABLE dly_backup;
COMMIT;
Is there a better way to do this? FWIW, I have the original .csv file and import it using the RSQLite package in R.
Is there a way that I can only import a subset of columns in a .csv file? Thanks! (new to SQLite)
I actually have no idea whether 15G is a large size for SQLite - I tend to use DBMSs where 15G can be considered a configuration table :-)
However, one thing we normally do for this sort of work, which may help you out:
- take the database offline totally (in other words, prevent anyone from connection and/or changing the data) *a.
- rename the current table to the backup one.
- create a new table with the original name and reduced schema.
- copy your data into it from the backup, but with two caveats: first, disable any indexes, triggers and/or constraints; second, don't do it as part of a transaction. You know the data is good since it's coming from a table that had the constraints, and you don't need a transaction (with its associated overheads) since you're the only one using the database.
- finally, re-instate the indexes/triggers/constraints and re-open the database for business.
*a Of course, we don't really do this, we have multiple redundant database instances with failover and all sorts of other wonderful features like replication, but it's probably workable for a small database like this.
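The steps above can be sketched with Python's stdlib sqlite3 module against a tiny in-memory stand-in table (your real one has 40 columns; the column and index names here are made up). Note that SQLite has no way to "disable" an index as such - you drop it and re-create it after the bulk copy:

```python
import sqlite3

# Minimal sketch of the offline-copy recipe, on a throwaway database.
con = sqlite3.connect(":memory:")

# Stand-in for the wide original table, already renamed to the backup name.
con.execute("CREATE TABLE dly_backup (date integer, ticker text, vol integer, junk text)")
con.execute("INSERT INTO dly_backup VALUES (20230103, 'MSFT', 100, 'x')")

# New, reduced-schema table, plus an index we want on the final result.
con.execute("CREATE TABLE dly (date integer, ticker text, vol integer)")
con.execute("CREATE INDEX idx_dly_ticker ON dly(ticker)")

con.execute("DROP INDEX idx_dly_ticker")           # copy into an unindexed table
con.execute("INSERT INTO dly SELECT date, ticker, vol FROM dly_backup")
con.execute("CREATE INDEX idx_dly_ticker ON dly(ticker)")   # re-instate afterwards
con.commit()

print(con.execute("SELECT * FROM dly").fetchall())  # [(20230103, 'MSFT', 100)]
```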
One thing I have noticed is that you copy the partial rows to the backup table, re-create the original table, then copy them back row by row, before deleting the backup.
It seems to me that you could simply rename the current table to the backup one instead of doing that first copy. It doesn't matter that the backup still has the unneeded columns, since you're not going to transfer them and will eventually delete the backup table. Give this a try (it also minimises your transaction scope):
begin transaction;
alter table dly rename to dly_backup;
create table dly (
date integer, ticker text, tsymbol text, vol integer, ret real, retx real,
vwretd real, vwretx real, ewretd real, ewretx real, sprtrn real);
commit;
begin transaction;
insert into dly (
date, ticker, tsymbol, vol, ret, retx,
vwretd, vwretx, ewretd, ewretx, sprtrn
) select
date, ticker, tsymbol, vol, ret, retx,
vwretd, vwretx, ewretd, ewretx, sprtrn
from dly_backup;
commit;
This will result in a transaction that is half the size of the one you're attempting.
Then, and only then, and only if there were no errors, would you drop dly_backup. If you still had problems with the process, you would drop dly, then rename the backup table back to the original and try again.
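One SQLite-specific point once dly_backup is finally dropped: dropping a table marks its pages as free inside the file but does not shrink the file itself, so your 15 GB database stays 15 GB on disk. To actually get the smaller, more portable file, run VACUUM afterwards (it rewrites the whole database, so it needs roughly the database's size in free disk space). A small demonstration on a throwaway file:

```python
import os
import sqlite3
import tempfile

# Dropping a table frees pages inside the file but does not shrink it;
# VACUUM rewrites the database into a compact file.
path = os.path.join(tempfile.mkdtemp(), "test.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE t (x text)")
con.executemany("INSERT INTO t VALUES (?)", [("a" * 1000,)] * 1000)
con.commit()
before = os.path.getsize(path)

con.execute("DROP TABLE t")
con.commit()
con.execute("VACUUM")       # must run outside any transaction
after = os.path.getsize(path)
print(before, after)        # the file only shrinks after VACUUM
```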
Another thing you may want to try is to limit the data being transferred in a test run, to see if it runs okay with a smaller data set. Using your original code, try to create the dly_backup table but only copy across a subset of the data (assuming these are NYSE/NASDAQ quotes, you can use a where clause to get just one ticker symbol such as MSFT or IBM).
Don't drop any tables in the test run.
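A sketch of that limited test run, again with sqlite3 and a made-up three-column table standing in for your real schema - only one ticker's rows get copied, and nothing is dropped:

```python
import sqlite3

# Limited test run: copy only one ticker's rows into the backup table,
# leaving the original table untouched (and drop nothing).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dly (date integer, ticker text, vol integer)")
con.executemany("INSERT INTO dly VALUES (?, ?, ?)",
                [(1, "MSFT", 10), (1, "IBM", 20), (1, "AAPL", 30)])

con.execute("CREATE TABLE dly_backup (date integer, ticker text, vol integer)")
con.execute("INSERT INTO dly_backup SELECT date, ticker, vol FROM dly "
            "WHERE ticker = 'MSFT'")   # the subset-only test copy
con.commit()

print(con.execute("SELECT count(*) FROM dly_backup").fetchone()[0])  # 1
```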
And I've just noticed your rather strange syntax for the insert...select statement, where you have both the column name and its type. I don't know whether that's an SQLite extension, but I think it would cause problems for the other DBMSs I'm familiar with. Was that a typo on your part?
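On your last question: yes, you can avoid importing the unwanted columns in the first place. In R, read.csv's colClasses argument accepts "NULL" for columns you want skipped before handing the data frame to dbWriteTable. If you'd rather do it directly, here's a stdlib-Python sketch that reads a .csv and inserts only the columns you keep (the csv text and column names here are placeholders - match them to your real header):

```python
import csv
import io
import sqlite3

# Sketch: import only a subset of csv columns into SQLite. The inline
# csv text stands in for your real file; the column names are made up.
csv_text = """DATE,TICKER,TSYMBOL,VOL,RET,EXTRA1,EXTRA2
20230103,MSFT,MSFT,1000,0.01,x,y
20230104,IBM,IBM,2000,-0.02,x,y
"""
keep = ["DATE", "TICKER", "VOL"]          # the columns you actually want

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dly (date integer, ticker text, vol integer)")

reader = csv.DictReader(io.StringIO(csv_text))   # use open("file.csv") for a real file
rows = ([row[c] for c in keep] for row in reader)
con.executemany("INSERT INTO dly VALUES (?, ?, ?)", rows)
con.commit()

print(con.execute("SELECT * FROM dly").fetchall())
# [(20230103, 'MSFT', 1000), (20230104, 'IBM', 2000)]
```

SQLite's type affinity converts the numeric strings to integers on insert, which is why the date and volume come back as numbers rather than text.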