how to select the newly added rows in a table efficiently?_问答_开发者

how to select the newly added rows in a table efficiently?

开发者 https://www.devze.com 2023-02-02 16:38 出处：网络

I need to periodically update a local cache with new additions to some DB table. The table rows contain an auto-increment sequential number (SN) field. The cache keeps this number too, so basically I

I need to periodically update a local cache with new additions to some DB table. The table rows contain an auto-increment sequential number (SN) field. The cache keeps this number too, so basically I just need to fetch all rows with S开发者_JAVA百科N larger than the highest I already have.

SELECT * FROM table where SN > <max_cached_SN>

However, the majority of the attempts will bring no data (I just need to make sure that I have an absolutely up-to-date local copy). So I wander if this will be more efficient:

count = SELECT count(*) from table;
if (count > <cache_size>)
  // fetch new rows as above

I suppose that selecting by an indexed numeric field is quite efficient, so I wander whether using count has benefit. On the other hand, this test/update will be done quite frequently and by many clients, so there is a motivation to optimize it.

this test/update will be done quite frequently and by many clients

this could lead to unexpected race competition for cache generation

I would suggest

upon new addition to your table add the newest id into a queue table
using like crontab to trigger the cache generation by checking queue table
upon new cache generated, delete the id from queue table

as you stress majority of the attempts will bring no data, the above will only trigger where there is new addition

and the queue table concept, even can expand for update and delete

I believe that

SELECT * FROM table where SN > <max_cached_SN>

will be faster, because select count(*) may call table scan. Just for clarification, do you never delete rows from this table?

SELECT COUNT(*) may involve a scan (even a full scan), while SELECT ... WHERE SN > constant can effectively use an index by SN, and looking at very few index nodes may suffice. Don't count items if you don't need the exact total, it's expensive.

You don't need to use SELECT COUNT(*)

There is two solution.

You can use a temp table that has one field that contain last count of your table, and create new Trigger after insert on your table and inc temp table field in Trigger.
You can use a temp table that has one field that contain last SN of your table is cached and create new Trigger after insert on your table and update temp table field in Trigger.

not much to this really

drop table if exists foo;
create table foo
(
foo_id int unsigned not null auto_increment primary key
)
engine=innodb;

insert into foo values (null),(null),(null),(null),(null),(null),(null),(null),(null);

select * from foo order by foo_id desc limit 10;

insert into foo values (null),(null),(null),(null),(null),(null),(null),(null),(null);

select * from foo order by foo_id desc limit 10;