Postgres insert optimization

I have a script that generates tens of thousands of inserts into a postgres db through a custom ORM. As you can imagine, it's quite slow. This is used for development purposes in order to create dummy data. Is there a simple optimization I can do at the Postgres level to make this faster? It's the only script running, sequentially, and requires no thread safety.

Perhaps I can turn off all locking, safety checks, triggers, etc.? I'm just looking for a quick and dirty solution that will greatly speed up this process.

Thanks.


If you don't need that kind of functionality in a production environment, I'd suggest you turn fsync off in your PostgreSQL config. This will speed up the inserts dramatically.

Never turn off fsync on a production database.
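For example, on a throwaway development instance (this is just a sketch; it assumes superuser access and PostgreSQL 9.4+ for ALTER SYSTEM, otherwise set it in postgresql.conf directly):

-- development only: disable fsync, then reload the configuration
-- (fsync can be changed with a config reload, no restart needed)
ALTER SYSTEM SET fsync = off;
SELECT pg_reload_conf();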


The fastest way to insert data would be the COPY command. But that requires a flat file as its input. I guess generating a flat file is not an option.

Don't commit too often, especially do not run this with autocommit enabled. "Tens of thousands" sounds like a single commit at the end would be just right.

If you can convince your ORM to make use of Postgres' multi-row insert, that would speed things up as well.

This is an example of a multi-row insert:

insert into my_table (col1, col2) 
values 
(row_1_col_value1, row_1_col_value_2), 
(row_2_col_value1, row_2_col_value_2), 
(row_3_col_value1, row_3_col_value_2)

If you can't generate the above syntax and you are using Java, make sure you are using batched statements instead of single-statement inserts (other DB layers may allow something similar).

Edit:

jmz' post inspired me to add something:

You might also see an improvement when you increase wal_buffers to some bigger value (e.g. 8MB) and checkpoint_segments (e.g. 16).
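In postgresql.conf that would look roughly like this (a sketch with the values suggested above; changing wal_buffers requires a server restart, and checkpoint_segments only exists on older releases, it was later replaced by max_wal_size):

# postgresql.conf sketch
wal_buffers = 8MB            # needs a server restart to take effect
checkpoint_segments = 16     # older releases only (replaced by max_wal_size in 9.5)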


For inserts that number in the hundreds to thousands, batch them:

begin;
insert into test (ts) values ('2010-11-29 22:32:01.383741-07');
insert into test (ts) values ('2010-11-29 22:32:01.737722-07');
-- ... up to ten thousand or so inserts ...
commit;

For inserts in the millions, use COPY:

COPY test (ts) FROM stdin;
2010-11-29 22:32:01.383741-07
2010-11-29 22:32:01.737722-07
... 1Million rows
\.
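The block above is in the format psql expects when it reads a script. If the generated data already sits in a file on the client, psql's \copy meta-command drives the same COPY FROM STDIN from the client side; a sketch, assuming a hypothetical tab-separated file test_data.tsv with one timestamp per line:

\copy test (ts) from 'test_data.tsv'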

Make sure any column used as a foreign key into another table is indexed, if that other table is more than trivial in size.
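For example (hypothetical table and column names, purely to illustrate the index):

-- hypothetical: "orders" rows reference "customers" via customer_id
create index orders_customer_id_idx on orders (customer_id);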


One thing you can do is remove all indexes, do your inserts, and then recreate the indexes.
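A sketch of that pattern, using a hypothetical index name:

-- hypothetical index on the table being loaded
drop index if exists my_table_col1_idx;
-- ... run all the inserts here ...
create index my_table_col1_idx on my_table (col1);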


Are you sending one batch of tens of thousands of INSERTs, or are you sending tens of thousands of individual INSERTs?

I know that with Hibernate you can batch up all your SQL statements and send them at the end in one big chunk, instead of paying the network and database overhead of issuing thousands of SQL statements individually.


If you are just initializing constant test data, you could also put the test data into a staging table (or tables), then copy the table contents using

INSERT INTO... SELECT...

That should be about as fast as using COPY (though I did not benchmark it), with the advantage that you can do the copy with plain SQL commands, without the hassle of setting up an external file as COPY requires.
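A sketch, reusing the my_table columns from the earlier examples and assuming a hypothetical staging table my_table_staging that already holds the dummy data:

insert into my_table (col1, col2)
select col1, col2
from my_table_staging;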


Try to do as much as possible in one request!

insert into my_table (col1, col2) 
select 
  unnest(array[row_1_col_value_1, row_2_col_value_1, row_3_col_value_1]), 
  unnest(array[row_1_col_value_2, row_2_col_value_2, row_3_col_value_2]);

This resembles the suggestion from @a_horse_with_no_name. The advantage of using unnest is that you can use query parameters that contain arrays:

insert into my_table (col1, col2) 
select unnest(:col_values_1), unnest(:col_values_2);

By collapsing three insert statements into one, you save more than 50% of the execution time. And by using query parameters with 2000 values in a single INSERT, I got a speedup factor of 150 in my application.
