I am running the following query on 489 million rows (102 gb) on a computer with 2 gb of memory:
select * from table order by x, y, z, h, j, l;
I am using psycopg2 with a server cursor ("cursor_unique_name") and fetch 30000 rows at a time.
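Roughly, the psycopg2 code looks like this (a minimal sketch; the connection string and the processing loop are placeholders, not my real setup):

import psycopg2

conn = psycopg2.connect("dbname=mydb")        # placeholder DSN
cur = conn.cursor(name="cursor_unique_name")  # server-side (named) cursor
cur.itersize = 30000                          # rows fetched per network round trip

cur.execute("select * from table order by x, y, z, h, j, l")

while True:
    rows = cur.fetchmany(30000)               # fetch 30000 rows at a time
    if not rows:
        break
    for row in rows:
        pass                                  # process each row here

cur.close()
conn.close()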
Obviously the result of the query cannot stay in memory, but my question is whether the following set of queries would be just as fast:
select * into temp_table from table order by x, y, z, h, j, l;
select * from temp_table
This means that I would use a temp_table to store the ordered result and fetch data from that table instead.
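In code, the temp-table variant would be something like this (a sketch, reusing the connection from above; names are placeholders):

cur = conn.cursor()
cur.execute("select * into temp_table from table order by x, y, z, h, j, l")

fetch_cur = conn.cursor(name="temp_cursor")   # server-side cursor over the temp table
fetch_cur.execute("select * from temp_table")

while True:
    rows = fetch_cur.fetchmany(30000)
    if not rows:
        break
    for row in rows:
        pass                                  # process each row here

fetch_cur.close()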
The reason for asking this question is that the query takes only 36 minutes to complete when run manually in psql, but it took more than 8 hours (it never finished) to fetch the first 30000 rows when the query was executed through psycopg2.
If you want to fetch this table in sorted chunks, then you need to create an index. Without such an index, every fetch has to sort the whole table. Your cursor was probably sorting the table once for every row fetched; waiting for the Sun to turn into a red giant would probably end sooner…
create index tablename_order_idx on tablename (x, y, z, h, j, l);
If your table data is relatively stable then you should cluster it by this index. This way the table data will be fetched without too much seeking on disk.
cluster tablename using tablename_order_idx;
If you want to get data in chunks then you should not use a cursor, as it will always work one row at a time. You should use limit and offset:
select * from tablename order by x, y, z, h, j, l
limit 30000 offset 44*30000
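In psycopg2 this could look roughly like the following (a sketch; the connection string and the page counter are assumptions on my part, not something from your post):

import psycopg2

conn = psycopg2.connect("dbname=mydb")        # placeholder DSN
chunk = 30000
page = 0
while True:
    cur = conn.cursor()
    cur.execute(
        "select * from tablename order by x, y, z, h, j, l "
        "limit %s offset %s",
        (chunk, page * chunk),                # parameters passed safely via %s
    )
    rows = cur.fetchall()
    cur.close()
    if not rows:
        break
    for row in rows:
        pass                                  # process each row here
    page += 1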