Fast way to replicate a huge database table

We are currently trying to solve a performance problem: searching for data and presenting it in a paginated way takes about 2-3 minutes.

Upon further investigation (and after several rounds of SQL tuning), it seems that searching is slow simply because of the sheer amount of data.

A possible solution that I'm currently investigating is to replicate the data into a searchable cache. This cache could live in the database (e.g. a materialized view) or outside the database (a NoSQL approach). However, since I would like the cache to be horizontally scalable, I am leaning towards caching it outside the database.

I've created a proof of concept, and indeed, searching in my cache is faster than in the db. However, the initial full replication takes a long time to complete. The full replication only happens once, and subsequent replications are incremental, covering just the rows that changed since the last run, but it would still be great if I could speed up that initial full replication.
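
For what it's worth, the incremental pass itself seems straightforward. Here is a rough sketch of what I mean, assuming the table has some kind of last-modified column and that I remember the previous run's high-water mark (the table name, column name and class name below are placeholders, not my actual schema):

    import java.sql.*;

    // Hypothetical sketch of one incremental pass. YOUR_TABLE and the
    // updated_at column are placeholders; lastSync is the high-water mark
    // remembered from the previous replication run.
    public class IncrementalReplicator {
        public static void replicateChangesSince(Connection conn, Timestamp lastSync)
                throws SQLException {
            try (PreparedStatement ps = conn.prepareStatement(
                     "SELECT * FROM YOUR_TABLE WHERE updated_at > ?")) {
                ps.setTimestamp(1, lastSync);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // upsert this row into the cache here (placeholder)
                    }
                }
            }
        }
    }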

The initial full replication is another matter: aside from the slowness of the query execution itself, I also have to battle network latency. I can deal with the slow query execution time, but the network latency is really slowing the replication down.

Which leads me to my question: how can I speed up the replication? Should I spawn several threads, each doing its own query? Should I use a scrollable result set?
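
To make the threading idea concrete, what I have in mind is splitting the copy into primary-key ranges and fetching each range on its own connection with a forward-only result set and a large fetch size, roughly like the sketch below (the JDBC URL, credentials, table, id column, chunk size and the cacheRow callback are all placeholders, not my real setup):

    import java.sql.*;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.*;

    // Hypothetical sketch: copy YOUR_TABLE in parallel key ranges.
    // Table/column names, JDBC URL, credentials and cacheRow are placeholders.
    public class ParallelReplicator {

        // Placeholder: map one row into the external cache.
        static void cacheRow(ResultSet rs) throws SQLException {
            // e.g. build a key/value pair and send it to the cache
        }

        public static void main(String[] args) throws Exception {
            String url = "jdbc:postgresql://dbhost/mydb";          // assumed connection details
            long minId = 0, maxId = 12_000_000, chunk = 500_000;   // assumed key range and chunk size
            ExecutorService pool = Executors.newFixedThreadPool(8);
            List<Future<?>> tasks = new ArrayList<>();

            for (long lo = minId; lo < maxId; lo += chunk) {
                final long from = lo;
                final long to = Math.min(lo + chunk, maxId);
                tasks.add(pool.submit(() -> {
                    try (Connection c = DriverManager.getConnection(url, "user", "pass");
                         PreparedStatement ps = c.prepareStatement(
                                 "SELECT * FROM YOUR_TABLE WHERE id >= ? AND id < ?",
                                 ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
                        ps.setFetchSize(10_000);   // ask the driver to fetch rows in large batches
                        ps.setLong(1, from);
                        ps.setLong(2, to);
                        try (ResultSet rs = ps.executeQuery()) {
                            while (rs.next()) {
                                cacheRow(rs);
                            }
                        }
                    } catch (SQLException e) {
                        throw new RuntimeException(e);
                    }
                }));
            }
            for (Future<?> t : tasks) {
                t.get();   // surface any worker failure
            }
            pool.shutdown();
        }
    }

Whether this actually helps will of course depend on how well the database and the network cope with several concurrent streams.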


Replicating the data in a cache seems like replicating the functionality of the database.

From reading the other comments, I see that you are not doing this to avoid network round trips, but because of costly joins. In many DBMSs you can create a temporary table like this:

CREATE TEMPORARY TABLE abTable AS SELECT * FROM a, b;

If a and b are large (relatively permanent) tables, then you will have a one-time cost of 2-3 minutes to create the temporary table. However, if you use abTable for many queries, then the subsequent per-query cost will be much smaller than

SELECT name, city, ... FROM a, b;

Other database systems have a view concept that lets you do something like this:

CREATE VIEW abView AS SELECT * FROM a, b;

Changes in the underlying a and b tables will be reflected in abView.

If you really are concerned about network round trips, then you may be able to replicate parts of the database on the local computer.

A good database management system should be able to handle your data needs. So why reinvent the wheel?


  1. SELECT * FROM YOUR_TABLE
  2. Map results into an object or data structure
  3. Assign a unique key for each object or data structure
  4. Load the key and object or data structure into a WeakHashMap to act as your cache.
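
A minimal sketch of those four steps, assuming a hypothetical table layout and Record class (the table and column names below are made up for illustration), could look like this; the WeakHashMap is wrapped with Collections.synchronizedMap since WeakHashMap itself is not thread-safe:

    import java.sql.*;
    import java.util.Collections;
    import java.util.Map;
    import java.util.WeakHashMap;

    // Hypothetical sketch: load YOUR_TABLE into a WeakHashMap-backed cache.
    // The table name, columns and the Record class are illustrative assumptions.
    public class CacheLoader {

        record Record(long id, String name, String city) {}   // step 2: mapped structure

        public static Map<Long, Record> load(Connection conn) throws SQLException {
            // Step 4: a WeakHashMap as the cache, wrapped for thread safety.
            Map<Long, Record> cache = Collections.synchronizedMap(new WeakHashMap<>());
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("SELECT * FROM YOUR_TABLE")) {   // step 1
                while (rs.next()) {
                    Record r = new Record(rs.getLong("id"),                      // step 2
                                          rs.getString("name"),
                                          rs.getString("city"));
                    cache.put(r.id(), r);                                        // steps 3 and 4: unique key -> object
                }
            }
            return cache;
        }
    }

Keep in mind that entries in a WeakHashMap can be garbage-collected once their keys are no longer strongly referenced elsewhere, so double-check that that retention behavior is really what you want for a long-lived cache.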

I don't see why you need sorting, because your cache should access values by unique key in O(1) time. What is sorting buying you?

Be sure to think about thread safety.

I'm assuming that this is a read-only cache, and that you're doing this to avoid the constant network latency. I'm also assuming that you'll do this once at startup.

How much data per record? 12M records at 1KB per record means you'll need 12GB of RAM just to hold your cache.
