Upload a large CSV file (approx. 10,000,000 records, including some duplicate rows) into a MySQL table

I want to upload a large CSV file with approximately 10,000,000 records into a MySQL table that already contains the same number of records or more; the file also contains some duplicate records. I tried LOAD DATA LOCAL INFILE, but it takes a long time. How can I resolve this without waiting so long? If it can't be resolved, how can I do it with AJAX, sending and processing a batch of records at a time until the whole CSV has been uploaded and processed?


LOAD DATA INFILE isn't going to be beaten speed-wise. There are a few things you can do to speed it up:

  • Drop or disable some indexes (you'll have to wait for them to rebuild after the load, but this is often faster overall). If you're using MyISAM, you can ALTER TABLE *foo* DISABLE KEYS; unfortunately InnoDB doesn't support that, so you'll have to drop the indexes instead. (See the SQL sketch after this list.)
  • Optimize your my.cnf settings. In particular, you may be able to disable a lot of safety features (like fsync). Of course, if the server crashes, you'll have to restore a backup and start the load over again. Also, the default my.cnf is, last I checked, pretty sub-optimal for a database machine; plenty of tuning guides are around.
  • Buy faster hardware. Or rent some (e.g., try a fast Amazon EC2 instance).
  • As @ZendDevel mentions, consider other data storage solutions if you're not locked into MySQL. For example, if you're just storing a list of telephone numbers (and some data with them), a plain hash table is going to be many times faster.
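
A minimal SQL sketch of the first two bullets, assuming a table named whitelist with a secondary index idx_phone on a phone column (all three names are hypothetical); the SET statements are just session-level examples of the "disable safety checks" idea, not an exhaustive tuning list:

-- MyISAM: defer index maintenance until after the load
ALTER TABLE whitelist DISABLE KEYS;
-- InnoDB has no DISABLE KEYS; drop secondary indexes instead and re-add them afterwards
-- ALTER TABLE whitelist DROP INDEX idx_phone;

-- relax per-session checks for the duration of the load
SET unique_checks = 0;
SET foreign_key_checks = 0;

LOAD DATA LOCAL INFILE '/yourcsvfile.csv'
INTO TABLE whitelist
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\r\n';

-- restore the checks and rebuild the indexes
SET unique_checks = 1;
SET foreign_key_checks = 1;
ALTER TABLE whitelist ENABLE KEYS;
-- ALTER TABLE whitelist ADD INDEX idx_phone (phone);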

If the problem is that it's killing database performance, you can split your CSV file into multiple CSV files and load them in chunks.
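
For example, once the file has been split outside of MySQL into smaller pieces (the part file names below are hypothetical), each piece gets its own statement, so the load can be spread out or paused between chunks:

LOAD DATA LOCAL INFILE '/yourcsvfile_part01.csv' INTO TABLE yourtable FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n';
LOAD DATA LOCAL INFILE '/yourcsvfile_part02.csv' INTO TABLE yourtable FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n';
-- ...and so on for the remaining parts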


Try this:

load data local infile '/yourcsvfile.csv' into table yourtable fields terminated by ',' lines terminated by '\r\n'
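
Since the question mentions duplicate rows: if the table has a PRIMARY KEY or UNIQUE index on the relevant columns, LOAD DATA can discard duplicates for you during the load. A sketch of the same statement with the IGNORE keyword added (REPLACE would keep the last copy instead of the first):

load data local infile '/yourcsvfile.csv' ignore into table yourtable fields terminated by ',' lines terminated by '\r\n'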


Depending on your storage engine this can take a long time. I've noticed that with MyISAM it goes a bit faster. I just tested with the exact same dataset and finally went with PostgreSQL because it was more robust at loading the file. InnoDB was so slow I aborted it after two hours with a dataset of the same size: 10,000,000 records by 128 columns, full of data.


As this is a whitelist being updated on a daily basis, doesn't that mean there will be a very large number of duplicates (after the first day)? If so, it would make the upload a lot faster to use a simple script that checks whether a record already exists before inserting it.
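
Rather than checking each record one by one from a script, the same effect can be had in bulk via a staging table and INSERT IGNORE; a sketch assuming a UNIQUE key on the column(s) that define a duplicate (the names whitelist and whitelist_staging are hypothetical):

-- stage the raw file, then keep only rows not already in the main table
CREATE TABLE whitelist_staging LIKE whitelist;
LOAD DATA LOCAL INFILE '/yourcsvfile.csv' IGNORE INTO TABLE whitelist_staging
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n';
-- INSERT IGNORE skips any row whose unique key is already present in the main table
INSERT IGNORE INTO whitelist SELECT * FROM whitelist_staging;
DROP TABLE whitelist_staging;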


Try this query:

$sql="LOAD DATA LOCAL INFILE '../upload/csvfile.csv' 
INTO TABLE table_name FIELDS 
TERMINATED BY ',' 
ENCLOSED BY '' 
LINES TERMINATED BY '\n' "


I ran into the same problem and found a way out. You can check out the process for uploading a large CSV file using AJAX here:

How to use AJAX to upload large CSV file?
