PostgreSQL slow on a large table with arrays and lots of updates

I have a pretty large table (20M records) which has a 3-column index and an array column. The array column is updated daily (by appending new values) for all rows. There are also inserts, but not as many as there are updates.

The data in the array represents daily measurements corresponding to the three keys, something like this: [[date_id_1, my_value_for_date_1], [date_id_2, my_value_for_date_2]]. It is used to draw a graph of those daily values. Say I want to visualize the value for the key (a, b, c) over time: I do SELECT values FROM t WHERE a = my_a AND b = my_b AND c = my_c and then use the values array to draw the graph.
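For reference, a minimal sketch of what such a table and the daily append might look like (the table and column names are assumptions, not taken from the post; "values" has to be quoted because VALUES is a reserved word):

CREATE TABLE t (
    a        integer NOT NULL,
    b        integer NOT NULL,
    c        integer NOT NULL,
    "values" integer[][],   -- [[date_id, value], ...] pairs, appended to daily
    PRIMARY KEY (a, b, c)
);

-- daily bulk update: append one [date_id, value] pair per key
UPDATE t SET "values" = "values" || ARRAY[[20230104, 42]]
WHERE a = 1 AND b = 2 AND c = 3;

-- fetch the series for one key to draw the graph
SELECT "values" FROM t WHERE a = 1 AND b = 2 AND c = 3;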

Performance of the updates (which happen in a bulk once a day) has worsened considerably over time.

Using PostgreSQL 8.3.8.

Can you give me any hints of where to look for a solution? It could be anything from tweaking some parameters in Postgres to even moving to another database (I guess a non-relational database would be better suited for this particular table, but I don't have much experience with those).


I would take a look at the FILLFACTOR for the table. By default it's set to 100; you could lower it to 70 (to start with). After this, you have to do a VACUUM FULL to rebuild the table.

ALTER TABLE tablename SET (FILLFACTOR = 70);
VACUUM FULL tablename;
REINDEX TABLE tablename;

This gives UPDATE a chance to place the updated copy of a row on the same page as the original, which is more efficient than placing it on a different page. Or, if your database is already somewhat fragmented from lots of previous updates, it might already be sparse enough. Your database then also has the option to do HOT updates, assuming the column you are updating is not involved in any index.
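If you want to check whether HOT updates are actually kicking in after the change, one rough way is to compare the update counters in pg_stat_user_tables (available since 8.3) before and after the next bulk update:

SELECT relname,
       n_tup_upd,      -- total row updates
       n_tup_hot_upd   -- updates that could reuse the same page (HOT)
FROM pg_stat_user_tables
WHERE relname = 'tablename';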


Not sure if arrays are the way to go here.

Why not store these in a separate table (one value plus the keys per row)? Then your bulk update becomes pure insert activity, as in the sketch below.
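A minimal sketch of that normalized layout (all names are assumptions); the daily load then only inserts new rows instead of rewriting existing ones:

CREATE TABLE measurement (
    a        integer NOT NULL,
    b        integer NOT NULL,
    c        integer NOT NULL,
    date_id  integer NOT NULL,
    value    integer NOT NULL,
    PRIMARY KEY (a, b, c, date_id)
);

-- daily bulk load becomes insert-only, e.g. from a staging file:
-- COPY measurement (a, b, c, date_id, value) FROM '/path/to/daily.csv' CSV;

-- drawing the graph for one key:
SELECT date_id, value
FROM measurement
WHERE a = 1 AND b = 2 AND c = 3
ORDER BY date_id;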


The problem is in the updates. Change the schema from array-based to multiple rows per day, and the performance problem will go away.

You can add rollups into arrays later on, with some kind of cron job (see the sketch below), but avoid the per-row updates.
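One way such a rollup could look, assuming the row-per-day measurement table sketched above and keeping the array column on t as a cache. This simplified variant stores a plain 1-D array of values ordered by date_id rather than the nested [date_id, value] pairs; the ARRAY(SELECT ...) form works on 8.3, whereas array_agg() only arrived in 8.4:

-- periodically rebuild the cached series from the row-per-day table
UPDATE t
SET "values" = ARRAY(
        SELECT m.value
        FROM measurement m
        WHERE m.a = t.a AND m.b = t.b AND m.c = t.c
        ORDER BY m.date_id
    );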


Well, a 3-column index is nothing to worry about; that doesn't necessarily make it much slower. But that array column might indeed be the problem. You say you are appending values to that array column daily. By appending, do you mean appending values to all 20 million records in the table, or just some records?

The situation isn't completely clear to me, but I would suggest looking into ways of getting rid of that array column, for example by making it a separate table. This depends on your situation, though, and might not be an option. It might be just me, but I always feel 'dirty' having such a column in one of my tables, and most of the time there is a better solution for the problem you are trying to solve with it. That being said, there are certainly situations in which such a column is valid, but at the moment I can't think of one. Certainly not in a table with 20 million records.
