Is it a good strategy to accumulate data destined for the database in webserver memory, up to a specified limit, and send it as batch updates either after every specified interval or once the accumulated data grows beyond a threshold size?
Each piece of data would be very small, for example adding a relationship between two entities, which means adding just a set of ids to the rows.
(Of course, the delayed data should be data that is not expected to be immediately visible.)
Are there any disadvantages to this approach?
Usage: building a web application with Cassandra, using Java & JSF.
Short answer: this is a bad idea.
Cassandra's batch operations (e.g. http://pycassa.github.com/pycassa/api/pycassa/batch.html) are there to let you group updates into an idempotent unit. This allows you to retry the batch as a unit, so the purpose is roughly similar to that of a transaction in a relational database.
However, unlike with the transaction analogy, the impact on performance is negligible, and in fact making the load artificially "bursty" is usually counterproductive.
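For illustration, here is a minimal sketch of such a grouped update using the Hector Java client (Java since that is what the question uses). The keyspace handle, the "Relationships" column family, and the row keys are assumptions for the example, not taken from the question:

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class BatchExample {
    // Adds both directions of a relationship in one batch; if execute()
    // fails, the whole batch can simply be retried as a unit.
    static void addRelationship(Keyspace keyspace, String a, String b) {
        Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
        mutator.addInsertion(a, "Relationships", HFactory.createStringColumn(b, ""));
        mutator.addInsertion(b, "Relationships", HFactory.createStringColumn(a, ""));
        mutator.execute();  // sent to Cassandra as a single batch mutation
    }
}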
The main disadvantage is that it requires another thread to implement the timeout (a small amount of complexity); however, the benefits are likely to be much greater.
A simple way to implement this is to use wait/notify (there doesn't appear to be a good solution using the java.util.concurrent library):
import java.util.ArrayList;
import java.util.List;

public class WriteBuffer<T> {
    private final List<T> buffered = new ArrayList<>();
    private final int notifySize;  // drain as soon as this many entries accumulate
    private final int timeoutMS;   // otherwise wake up this often to drain stragglers

    public WriteBuffer(int notifySize, int timeoutMS) {
        this.notifySize = notifySize;
        this.timeoutMS = timeoutMS;
    }

    // Called by any number of producer threads.
    public synchronized void add(T t) {
        buffered.add(t);
        if (buffered.size() >= notifySize)
            notifyAll();  // wake the draining thread early
    }

    // Blocks until at least one entry is available: either notifyAll() fired
    // because the threshold was reached, or the timed wait elapsed and
    // entries arrived in the meantime. Then hands over everything buffered.
    public synchronized void drain(List<T> drained) throws InterruptedException {
        while (buffered.isEmpty())
            wait(timeoutMS);
        drained.addAll(buffered);
        buffered.clear();
    }
}
add and drain can be called by any number of threads; however, I imagine you would have only one thread draining, in a loop, until it is interrupted.
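For example, the single draining thread could look like the sketch below, where buffer is an instance of the class above, Update stands in for whatever type the buffered entries are, and writeBatchToCassandra is a hypothetical method performing the actual batch write:

// Hypothetical drainer thread; runs until the application interrupts it.
Thread drainer = new Thread(() -> {
    List<Update> batch = new ArrayList<>();
    try {
        while (true) {
            buffer.drain(batch);           // blocks until there is data
            writeBatchToCassandra(batch);  // hypothetical batch-write call
            batch.clear();
        }
    } catch (InterruptedException e) {
        // interrupted during wait(): fall through and let the thread exit
    }
});
drainer.start();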