Name for this distributed-database data-location optimization algorithm?

Developer https://www.devze.com 2023-03-16 14:02 Source: Internet
Say we have a large graph of databases connected to each other, effectively one giant distributed database. Any node on the graph can query the whole database by querying its neighbors recursively, which take the results they get from their neighbors and pass the combined result back down the query path.

Also, assume that there's the capability to stop the recursion if a node's own database contains a result that is "good enough", so that the entire network doesn't have to be queried if there's a decent result already nearby. This makes what I'm about to say relevant.

Wouldn't it make sense to transfer the returned data one step closer to the node that originated the query every time a query is made? That is, a queried node queries its neighbors and gets X, queries itself and gets Y, passes X+Y back to the node that queried it, stores X in its database, and deletes Y from its database. Wouldn't this eventually result in the distributed database having a roughly optimal distribution of data among its nodes with respect to the number of nodes that would be consulted during a query, on average?
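
To make the idea concrete, here is a minimal sketch of one reading of it, not a definitive implementation. The names `Node`, `query`, and `build_chain` are made up for illustration. "Good enough" is simplified to an exact key hit, the search stops at the first hit instead of combining results, and only the node that directly queried the holder keeps the item, so each query moves it exactly one hop toward the originator.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node: a local key/value store plus links to neighbors."""
    name: str
    store: dict = field(default_factory=dict)
    neighbors: list = field(default_factory=list, repr=False, compare=False)

    def query(self, key, visited=None):
        """Return (value, nodes_consulted); value is None if nothing was found."""
        if visited is None:
            visited = set()
        visited.add(self.name)
        if key in self.store:
            # Assumption: a local hit counts as "good enough", so stop here.
            return self.store[key], 1
        consulted = 1
        for nb in self.neighbors:
            if nb.name in visited:
                continue
            was_holder = key in nb.store
            value, n = nb.query(key, visited)
            consulted += n
            if value is not None:
                if was_holder:
                    # Migration step: the holder deletes the item and this
                    # node, one hop closer to the originator, stores it.
                    self.store[key] = nb.store.pop(key)
                return value, consulted
        return None, consulted


def build_chain(names):
    """Build a simple line topology: names[0] - names[1] - ... - names[-1]."""
    nodes = [Node(n) for n in names]
    for left, right in zip(nodes, nodes[1:]):
        left.neighbors.append(right)
        right.neighbors.append(left)
    return nodes


if __name__ == "__main__":
    nodes = build_chain("abcd")
    nodes[-1].store["k"] = "v"          # the item starts at the far end
    counts = [nodes[0].query("k")[1] for _ in range(4)]
    print(counts)                       # [4, 3, 2, 1]
```

Repeated queries from `a` consult fewer nodes each time as the item walks toward it. A query from the other end of the chain would now have to travel further, so the benefit depends on which nodes ask for which data.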

Is there a name for this technique?


This topic comes up a lot in grid computing; try a Google Scholar search for something like "data grid replica placement". It works well if there's a lot of time-locality in accesses (if a node wants some data, it'll want it a lot in the near future) and the data is read-mostly. As yi_H points out, if the data is modified often or heavily, "cache" (replica) coherency becomes a big issue.


There are techniques like this, but you have to be aware that once you "cache" a result you have to update it when the data changes, which means you either have to record alongside the data who caches it, or notify everybody. Implementing something like this requires a lot of coordination, which will hurt performance; it's not as easy as it sounds. You can also loosen the constraints your database gives you and then be aware in your application that you might get cached results which are out of sync (and, if necessary, ask for a non-cached version).
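
The two options above can be sketched roughly as follows. This is an illustrative, single-process model under assumed names (`Origin`, `Replica`, `notify`, `fresh` are all hypothetical), not a real replication protocol: the origin records which replicas cache each key and invalidates them on write, and a caller can opt into the looser mode where stale reads are possible unless it asks for a fresh copy.

```python
class Replica:
    """Hypothetical caching node."""

    def __init__(self, name):
        self.name = name
        self.cache = {}

    def read(self, origin, key, fresh=False):
        # Serve from the local cache unless the caller demands a fresh copy.
        if not fresh and key in self.cache:
            return self.cache[key]
        return origin.fetch(key, self)

    def invalidate(self, key):
        self.cache.pop(key, None)


class Origin:
    """Hypothetical authoritative store that tracks who caches what."""

    def __init__(self):
        self.data = {}
        self.holders = {}  # key -> set of replicas currently caching it

    def fetch(self, key, replica):
        # Record the replica "at the data" so it can be invalidated later.
        self.holders.setdefault(key, set()).add(replica)
        replica.cache[key] = self.data[key]
        return self.data[key]

    def write(self, key, value, notify=True):
        self.data[key] = value
        if notify:
            # Strict mode: every recorded holder drops its stale copy.
            for replica in self.holders.pop(key, set()):
                replica.invalidate(key)
        # With notify=False (loosened constraints) replicas keep serving
        # the old value until they read with fresh=True.


if __name__ == "__main__":
    origin = Origin()
    origin.data["x"] = 1
    r = Replica("r")
    print(r.read(origin, "x"))               # 1, now cached at r
    origin.write("x", 2)                     # r is invalidated
    print(r.read(origin, "x"))               # 2, refetched
    origin.write("x", 3, notify=False)       # loosened: no invalidation
    print(r.read(origin, "x"))               # 2, stale
    print(r.read(origin, "x", fresh=True))   # 3
```

In a real network each invalidation is a message that can be delayed or lost, which is where the coordination cost comes from.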
