Summary:
How do I synchronize a very large amount of data with a client that can't hold all of it in memory and keeps disconnecting?
Explanation:
I have a real-time (ajax/comet) app which will display some data on the web. I like to think of this as the view being on the web and the model being on the server.
Say I have a large number of records on the server, constantly being added, removed, and modified. Here are the problems:
- This being the web, the client is likely to connect and disconnect often. While the client is disconnected, data may be modified, and the client will need to be brought up to date when it reconnects. However, the client can't be sent ALL the data on every reconnection, since there is so much of it.
- Since there is so much data, the client obviously can't be sent all of it. Think of a Gmail account with thousands of messages, or Google Maps with ... the whole world!
I realize that initially a complete snapshot of some relevant subset of the data will be sent to the client, and thereafter only incremental updates. This will likely be done with some sort of sequence numbers: the client will say "the last update I received was #234" and the server will send all messages between #234 and #current.
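A minimal sketch of that sequence-number scheme, assuming an in-memory store of string values keyed by id. `ChangeLog`, `resume`, `upTo`, and `maxEntries` are hypothetical names, not from any library. The server keeps only the last `maxEntries` changes, so history does not grow without bound; a client that stayed offline long enough to fall off the end of that log gets a fresh snapshot instead of a delta. For brevity the snapshot here is the whole store; in the scenario above it would be only the relevant subset.

```typescript
// Hypothetical change-log entry types (not from any library).
type Change =
  | { seq: number; kind: "upsert"; id: string; value: string }
  | { seq: number; kind: "delete"; id: string };

// What the server sends back when a client reconnects.
type ResumeReply =
  | { type: "delta"; changes: Change[]; upTo: number }
  | { type: "snapshot"; records: Map<string, string>; upTo: number };

class ChangeLog {
  private nextSeq = 1;
  private log: Change[] = []; // contiguous run of the most recent changes
  private readonly current = new Map<string, string>(); // latest state

  // maxEntries bounds how much history the server keeps in memory.
  constructor(private readonly maxEntries: number) {}

  upsert(id: string, value: string): void {
    this.current.set(id, value);
    this.append({ seq: this.nextSeq++, kind: "upsert", id, value });
  }

  remove(id: string): void {
    this.current.delete(id);
    this.append({ seq: this.nextSeq++, kind: "delete", id });
  }

  private append(change: Change): void {
    this.log.push(change);
    if (this.log.length > this.maxEntries) {
      this.log.shift(); // discard the oldest change
    }
  }

  // Client reconnects saying "the last update I received was #lastSeq".
  resume(lastSeq: number): ResumeReply {
    const upTo = this.nextSeq - 1;
    const oldest = this.log.length > 0 ? this.log[0].seq : this.nextSeq;
    if (lastSeq < oldest - 1) {
      // Some changes after lastSeq were already discarded: fall back to a snapshot.
      return { type: "snapshot", records: new Map(this.current), upTo };
    }
    // Every change after lastSeq is still in the log: send only those.
    return { type: "delta", changes: this.log.filter((c) => c.seq > lastSeq), upTo };
  }
}
```

Either way, the client stores `upTo` as its new last-seen number and sends it on the next reconnect. `Array.prototype.shift` is linear in the log length, so a ring buffer would likely suit a large log better.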
I also realize that the client view will tell the server that it is displaying records 100-200, "so only send me those" (or perhaps 0-300, whatever the strategy).
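A sketch of the server side of that viewport idea, again with hypothetical names (`ViewportFilter`, `setViewport`, `recipients`, `margin`). It assumes each update carries the record's position in the list the client is scrolling through. The server widens the requested range by a margin (100-200 becomes 0-300 with `margin = 100`) and forwards an update only to clients whose window contains it.

```typescript
// Hypothetical update shape: `position` is the record's index in the list
// the client is scrolling through.
interface RecordUpdate {
  id: string;
  position: number;
  value: string;
}

interface Viewport {
  from: number; // inclusive
  to: number; // inclusive
}

class ViewportFilter {
  private readonly viewports = new Map<string, Viewport>();

  // margin: how many extra records to include on each side of the visible range.
  constructor(private readonly margin: number) {}

  // Called when a client reports "I'm displaying records from..to".
  setViewport(clientId: string, from: number, to: number): Viewport {
    const vp: Viewport = {
      from: Math.max(0, from - this.margin),
      to: to + this.margin,
    };
    this.viewports.set(clientId, vp);
    return vp;
  }

  // Forget a client's window when it disconnects.
  removeClient(clientId: string): void {
    this.viewports.delete(clientId);
  }

  // Ids of the clients whose (widened) window contains this update.
  recipients(update: RecordUpdate): string[] {
    const out: string[] = [];
    this.viewports.forEach((vp, clientId) => {
      if (update.position >= vp.from && update.position <= vp.to) {
        out.push(clientId);
      }
    });
    return out;
  }
}
```

Positions shift when records are inserted or removed above a window, so a real implementation would probably need to track windows by a stable sort key rather than an index; this sketch ignores that.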
However, I hate the idea of coding all of this myself. This is a general and common enough problem that there must be libraries (or at least step-by-step recipes) for it already.
I am looking to do this either in Java or node.js. If solutions are available in other languages, I'll be willing to switch.
Try a pub/sub solution. Subscribe the client to your server's events from a given start time. The server logs every data-change event along with the time it occurred. After a given time, or when the client reconnects, the client asks for a list of all data rows changed since its last sync. You can keep all the logic on the server and sync only the changes. This would result in a typical "select * from table where id in (select id from changed_rows where change_date > given_date)" statement on the server, which can be optimized.
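That query can be sketched in memory as follows, with hypothetical names (`Row`, `ChangedRow`, `changesSince`, `syncedAt`) standing in for `table`, `changed_rows`, and `change_date`. One gap worth noting: the statement as written returns only rows that still exist, so a row deleted since the last sync would never reach the client. The sketch reports such ids separately in `deletedIds`.

```typescript
// Hypothetical stand-ins for the `table` and `changed_rows` tables.
interface Row {
  id: number;
  data: string;
}

interface ChangedRow {
  id: number;
  changeDate: number; // milliseconds since epoch, taken from the server's clock
}

interface SyncResult {
  rows: Row[]; // current state of every row changed after `since`
  deletedIds: number[]; // ids that changed after `since` but no longer exist
  syncedAt: number; // value the client should send as `since` next time
}

function changesSince(table: Map<number, Row>, changedRows: ChangedRow[], since: number): SyncResult {
  // Equivalent of: select id from changed_rows where change_date > since.
  // The Set collapses repeated changes to the same row into one entry.
  const ids = new Set<number>();
  let syncedAt = since;
  for (const c of changedRows) {
    if (c.changeDate > since) {
      ids.add(c.id);
      syncedAt = Math.max(syncedAt, c.changeDate);
    }
  }

  // Equivalent of: select * from table where id in (...), plus deletions.
  const rows: Row[] = [];
  const deletedIds: number[] = [];
  ids.forEach((id) => {
    const row = table.get(id);
    if (row !== undefined) {
      rows.push(row);
    } else {
      deletedIds.push(id);
    }
  });

  return { rows, deletedIds, syncedAt };
}
```

Returning the largest `change_date` seen, rather than relying on the client's own clock, avoids clock skew between client and server. Two changes with the same timestamp can still straddle a sync, though, because of the strict `>` comparison; the sequence numbers suggested in the question avoid that problem.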