What are some strategies for updating volatile data in Solr?

Developer (https://www.devze.com), 2023-04-07 09:39, source: web
What are some strategies for updating volatile data in Solr? Imagine if you needed to model YouTube video data in a Solr index: how would you keep the "views" data fresh without swamping Solr in updates?

I would imagine that storing the "views" data in a different data store (something like MongoDB or Redis) that is better at handling rapid updates would be the best idea.

But what is the best way to update the index periodically with that data? Would a delta-import make sense in this context? What does a delta-import do to Solr in terms of performance for running queries?


First you need to define "fresh".

Is "fresh" 1 ms? If so, by the time the value (the rendered HTML) reaches the browser, it is no longer fresh because of network latency. Does that really matter? In the vast majority of cases, no: true real-time results are not needed.

A more common limit is 1 s. In that case, Solr can handle it with either RankingAlgorithm (a plugin) or soft commits (at the time of writing, available only on the Solr 4.0 trunk).
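A minimal Python sketch of what asking for a soft commit looks like on a single update request. It builds, but does not send, a JSON update request carrying either `commitWithin` (make the documents visible within N milliseconds) or `softCommit=true`. The endpoint URL and the core name `collection1` are assumptions, and both parameters require a Solr version that supports them (soft commits arrived with 4.0); check the update handler documentation for your version.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: adjust host, port and core name for your installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/collection1/update"


def build_update_request(docs, commit_within_ms=None, soft_commit=False,
                         url=SOLR_UPDATE_URL):
    """Build (but do not send) a JSON update request for a list of documents.

    commit_within_ms asks Solr to make the docs searchable within that many
    milliseconds; soft_commit asks for an immediate soft commit instead.
    """
    params = {}
    if commit_within_ms is not None:
        params["commitWithin"] = str(commit_within_ms)
    if soft_commit:
        params["softCommit"] = "true"
    query = urllib.parse.urlencode(params)
    full_url = url + ("?" + query if query else "")
    body = json.dumps(docs).encode("utf-8")  # a JSON array of documents
    return urllib.request.Request(
        full_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually send it against a running server: `urllib.request.urlopen(build_update_request(docs, commit_within_ms=1000))`. `commitWithin` lets Solr batch several updates into one soft commit, which is gentler than forcing a commit on every request.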

"Delta-import" is a DataImportHandler term that doesn't have much intrinsic meaning. From the Solr server's point of view there are only document additions; it doesn't matter where they come from or whether a set of documents represents the "whole" dataset.
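To make that concrete, here is a hypothetical "delta" pass written by hand: it selects the records modified since the last run and turns them into ordinary documents to add. The in-memory `store`, the `VideoRecord` type and its fields are illustrative assumptions, not DataImportHandler's API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class VideoRecord:
    # Hypothetical record as kept in the primary data store.
    title: str
    views: int
    modified_at: float  # Unix timestamp of the last change


def delta_batch(store: Dict[str, VideoRecord],
                last_sync: float) -> Tuple[List[dict], float]:
    """Return (docs to add, new watermark) for records changed after last_sync.

    Solr just sees ordinary additions. A plain add replaces the whole stored
    document with the same id, so every field is sent, not only `views`.
    """
    changed = sorted(
        ((vid, rec) for vid, rec in store.items() if rec.modified_at > last_sync),
        key=lambda item: item[0],
    )
    docs = [{"id": vid, "title": rec.title, "views": rec.views}
            for vid, rec in changed]
    # The next run only needs changes newer than the newest one seen here.
    new_mark = max((rec.modified_at for _, rec in changed), default=last_sync)
    return docs, new_mark
```

Comparing with `>` can miss a change stamped with exactly the watermark time but written after the pass ran; real sync jobs usually overlap the window slightly and rely on re-adding a document being harmless.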

If you want an item indexed within 1 s of its creation or modification, do exactly that: add it to Solr right after it is created or modified (for example with a hook in your data access layer, DAL). Do this asynchronously, and use RankingAlgorithm or soft commits.
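One way to keep that DAL hook asynchronous is a small write-behind buffer, sketched here under assumptions: the hook only records the latest version of each document, and a background thread hands the accumulated batch to a `send` callable once per interval. `SolrWriteBehind`, `on_saved` and `send` are hypothetical names; `send` stands in for whatever client code posts documents to Solr. Coalescing is what keeps a hot video from swamping Solr: a hundred view changes within one interval become a single update.

```python
import threading
from typing import Callable, Dict, List, Optional


class SolrWriteBehind:
    """Coalesce document changes and push them to Solr from a background thread.

    `send` (hypothetical) receives a list of full documents. Only the latest
    version of each id is kept between flushes.
    """

    def __init__(self, send: Callable[[List[dict]], None], interval_s: float = 1.0):
        self._send = send
        self._interval = interval_s
        self._pending: Dict[str, dict] = {}
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def on_saved(self, doc: dict) -> None:
        """DAL hook: call after a record is created or modified. Never waits on Solr."""
        with self._lock:
            self._pending[doc["id"]] = doc  # a later change overwrites an earlier one

    def flush(self) -> int:
        """Send everything buffered so far and return the number of docs sent."""
        with self._lock:
            batch, self._pending = list(self._pending.values()), {}
        if batch:
            self._send(batch)  # note: if send raises, this batch is lost
        return len(batch)

    def start(self) -> None:
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
        self.flush()  # push whatever arrived after the last tick

    def _run(self) -> None:
        # Event.wait returns False on timeout and True once stop() sets the event.
        while not self._stop.wait(self._interval):
            self.flush()
```

In production, `send` might POST the batch as a JSON array to Solr's update handler with `commitWithin` or `softCommit`, and it should retry or re-queue on failure rather than dropping the batch as this sketch does.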


You might be interested in so-called "near-realtime search", or NRT, now available on Solr's trunk, which is designed to deal with exactly this problem. See http://wiki.apache.org/solr/NearRealtimeSearch for more info and links.


How about using ExternalFileField?
It lets you keep data outside your index in a separate file, which you can refresh periodically without any changes to the index.

For fast-changing data such as downloads, views, or rank, this can be a good option.
More info: http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html
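A sketch of the periodic refresh job for the views example, assuming the counts have already been read from the fast store (Redis or similar) into a dict keyed by the value of the field configured as `keyField`. It writes `key=value` lines to `external_<field>` in the given directory, sorted by key, through a temporary file and `os.replace` so Solr never reads a half-written file. The field name `views` and the target directory are assumptions: per the Solr documentation the file lives in the index data directory by default and is read when a new searcher is opened (for example after a commit), so check the details for your version.

```python
import os
import tempfile
from typing import Mapping


def write_external_file(data_dir: str, field: str,
                        values: Mapping[str, float]) -> str:
    """Atomically write `key=value` lines to <data_dir>/external_<field>.

    Assumption: data_dir is the Solr core's data directory and `field` is an
    ExternalFileField whose keyField matches the dict keys.
    """
    path = os.path.join(data_dir, "external_" + field)
    # Leading dot keeps the temp file from matching the external_<field>* pattern.
    fd, tmp = tempfile.mkstemp(dir=data_dir, prefix=".external_" + field + ".")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            for key in sorted(values):  # Solr looks keys up faster in a sorted file
                f.write(f"{key}={float(values[key])}\n")
        os.replace(tmp, path)  # readers see either the old file or the new one
    except BaseException:
        os.unlink(tmp)
        raise
    return path
```

Usage: `write_external_file("/path/to/solr/data", "views", {"vid1": 1234})`, followed by a commit so a new searcher picks up the file.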

It has some limitations, so check whether it fits your needs.
