We have a database on our master server (PHP/MySQL) that stores our data, which is updated through a website. I'm trying to figure out the best way to copy this data down to a local SQLite database (running on Android). Because of the constraints of mobile internet speeds etc., I'm going to need to build in checks to see whether each row needs updating. I thought a version number for each row might be the best solution.
What are people's thoughts on this?
Thanks, Nathan
Wouldn't using a timestamp be easier? Just fetch all rows that were updated since a particular time. How much of the data do you need to copy?
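A minimal sketch of what that could look like, assuming a hypothetical updated_at UNIX-timestamp column maintained on the MySQL side and some transport (HTTP + JSON, say) between the PHP server and the Android client; all of the names here are illustrative:

import java.util.List;

class TimestampSync {
    static class Row { long id; long updatedAt; String col2, col3; }

    // Transport and storage are stubbed out: fetchChangedSince() would hit
    // the PHP endpoint, upsert() would INSERT OR REPLACE into SQLite.
    interface Server { List<Row> fetchChangedSince(long ts); }
    interface LocalDb { void upsert(Row row); }

    // The query the PHP side might run; ? is the client's last sync time.
    static final String DELTA_QUERY =
        "SELECT id, col2, col3, updated_at FROM table WHERE updated_at > ?";

    long lastSync = 0; // persist this on the device between syncs

    void sync(Server server, LocalDb db) {
        for (Row r : server.fetchChangedSince(lastSync)) {
            db.upsert(r);
            lastSync = Math.max(lastSync, r.updatedAt);
        }
    }
}

Note that a plain timestamp column catches inserts and updates but not deletes; those need tombstone rows or something like the hash comparison in the answer below.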
Depending on the size of the database, a binary search might be worthwhile.
Basically, come up with a query that runs MD5() on a subset of rows. Then it's as simple as walking down the tree in a binary fashion, skipping the sections that match. Steps:
1. Pre-compute the first 3 levels of the search (or so) on the server and send them to the client.
2. On the client, compute those same 3 levels and check for matches. For each level-3 node that doesn't match, compute the next 3 levels below it (14 hashes per non-matched node) and send them back to the server.
3. On the server, compute the next 3 levels and compare them with what the client sent. Compute the next 3 levels for the mismatched ranges.
4. Repeat steps 2-3 until each range has narrowed to a single row. At that point you've identified the differences, so send/request those rows from the server, and you're done. (A recursive sketch of the walk follows these steps.)
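Here's a rough client-driven sketch of the walk, done one level per round for clarity (the batched 3-level variant above just exchanges 14 hashes per mismatched node instead of one). rangeHash() on either side would run the MD5 query shown further down; everything else is a hypothetical stub:

import java.util.ArrayList;
import java.util.List;

class HashWalk {
    // Returns the MD5 checksum of the rows whose ids fall in [lo, hi);
    // assume it returns "" (not null) for an empty range.
    interface Side { String rangeHash(long lo, long hi); }

    // Collects the half-open id ranges, each one row wide, where the
    // two sides disagree.
    static List<long[]> diff(Side server, Side client, long lo, long hi) {
        List<long[]> out = new ArrayList<>();
        if (server.rangeHash(lo, hi).equals(client.rangeHash(lo, hi))) {
            return out;                     // whole subtree matches: skip it
        }
        if (hi - lo <= 1) {                 // narrowed to a single id
            out.add(new long[]{lo, hi});
            return out;
        }
        long mid = lo + (hi - lo) / 2;      // split the range and recurse
        out.addAll(diff(server, client, lo, mid));
        out.addAll(diff(server, client, mid, hi));
        return out;
    }
}

In practice you'd batch the recursion so each round-trip carries several levels of hashes at once, which is exactly what the 3-level batching above is doing.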
Now, why is this more efficient than sending a dump? Well, for 1 changed row, it takes approximately log2(total_rows) * (sizeof(MD5) + 2) bytes transferred (excluding overhead). So for a 1-million-row table, that's about 20 levels times 18 bytes per hash (a 16-byte raw MD5 digest plus 2 bytes of framing), or roughly 360 bytes of bidirectional data to pin down the changed row. The more rows that have changed, the more data it will use.
I chose batches of 3 levels since it's a good trade-off between processing extra data and reducing latency by cutting the number of round-trips.
Now, as far as hashing the rows goes, you could do something like:
SELECT MD5(GROUP_CONCAT(row_hash ORDER BY id ASC)) AS checksum
FROM
(
    SELECT id, MD5(CONCAT_WS('#', id, col2, col3)) AS row_hash
    FROM table
    WHERE id >= ? AND id < ?
) AS a
This produces a single MD5 that depends on every row in the half-open id range (there are other ways, but that's an illustration). Call it first with bounds that cover the whole table, e.g. 0 and MAX(id) + 1, to check the table as a whole, and then iterate down the tree from there. In practice you'd also need to raise group_concat_max_len, since its default limit of 1024 bytes would silently truncate the concatenation for large ranges.
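On the client side, SQLite has no built-in MD5(), so the Android app can compute the same range checksum in Java. A sketch, assuming rows are read out ordered by id and restricted to the same [lo, hi) window as the server query; the separator and column list must match the server exactly, or the hashes will never agree:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

class RangeHash {
    // Lowercase hex MD5, matching MySQL's MD5() output format.
    static String md5Hex(String s) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(s.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    // rows: one String[] of {id, col2, col3} per row, ordered by id ASC.
    static String rangeChecksum(Iterable<String[]> rows) throws Exception {
        StringBuilder joined = new StringBuilder();
        boolean first = true;
        for (String[] row : rows) {
            if (!first) joined.append(',');  // GROUP_CONCAT's default separator
            first = false;
            joined.append(md5Hex(String.join("#", row))); // mirrors CONCAT_WS('#', ...)
        }
        return md5Hex(joined.toString());
    }
}

One subtlety: the string form of each value has to be byte-identical on both sides (numeric formatting, NULL handling, character encoding), otherwise the checksums will diverge even when the data actually matches.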
Note that this will only work on tables with an integer primary key (since it's used as the bounding window for the binary search). But it will detect every type of change (master delete, slave delete, master change, slave change, master insert, slave insert) with the same speed, efficiency, and code.