Solr Performance Suggestions

I am facing some performance issues on my Solr installation (a 3-core server). I am indexing live Twitter data based on certain keywords; as you can imagine, the rate at which documents arrive is very high, so updates to the cores are frequent and constant. Given below are the document counts on my three cores.

Twitter  - 26874747
Core2    -  3027800
Core3    -  6074253

My server has 8 GB of RAM, but we are now experiencing a drop in server performance. What can be done to improve this? I also have a few questions.

Does a high number of commits take a lot of memory? Would reducing the number of commits per hour help? Most of my queries are based on field or date faceting; how can I improve those?

Regards, Rohit


  • Since you have a high number of commits, you might want to use a larger merge factor to improve indexing performance.
  • Index the documents in batches, not one by one (see the SolrJ sketch after this list).
  • A merge takes a lot of memory and CPU, and indexing is blocked while it runs.
  • Separate the indexing server from the query server for better performance, using a master-slave configuration.
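
A minimal SolrJ sketch of the batching point above. This is an illustration, not the poster's actual setup: the core URL, field names, and batch size are assumptions.

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchIndexer {
        private static final int BATCH_SIZE = 1000; // illustrative; tune to your ingest rate

        // Send documents in batches so each HTTP round trip carries many adds.
        public static void index(SolrServer server, List<SolrInputDocument> incoming) throws Exception {
            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
            for (SolrInputDocument doc : incoming) {
                batch.add(doc);
                if (batch.size() >= BATCH_SIZE) {
                    server.add(batch); // one round trip for the whole batch
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                server.add(batch);     // flush the remainder
            }
            // No commit here: commit on a schedule, not per batch.
        }

        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr/twitter"); // assumed URL
            List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "tweet-1");            // hypothetical schema fields
            doc.addField("text", "hello from solr");
            docs.add(doc);
            index(server, docs);
            server.commit(); // single explicit commit after the run
        }
    }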


We have also experienced a drop in Solr performance since our original setup; the best resource I have found is this: http://www.lucidimagination.com/blog/2010/01/21/the-seven-deadly-sins-of-solr/

That should get you started on improving your Solr implementation. So far, my findings are:

  • Use SolrJ for better performance via its binary updates (it adds documents to the index without needing a commit per add; we had to ditch PHP and move the backend to Java). A sketch follows this list.
  • Remove the optimization process from each commit and run it only once a day/week, depending on your data volume (the php-solr-client library runs it on every commit by default).
  • Tune your warmup queries to the data you hit the most.
  • Commit large chunks of data.
  • Monitor the JVM and the garbage collector.
  • Tune your Tomcat/Java settings.
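
To make the binary-update point concrete, here is a minimal sketch. It assumes a SolrJ client version that supports BinaryRequestWriter and commitWithin; the core URL and field names are made up for the example.

    import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.request.UpdateRequest;
    import org.apache.solr.common.SolrInputDocument;

    public class BinaryUpdater {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer server =
                    new CommonsHttpSolrServer("http://localhost:8983/solr/twitter"); // assumed URL
            server.setRequestWriter(new BinaryRequestWriter()); // javabin on the wire instead of XML

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "tweet-42");          // hypothetical schema fields
            doc.addField("text", "a sample tweet");

            UpdateRequest req = new UpdateRequest();
            req.add(doc);
            req.setCommitWithin(60000); // let Solr commit within 60 s instead of committing per add
            req.process(server);

            // Run optimize separately (e.g. from a nightly cron job), never on every commit:
            // server.optimize();
        }
    }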


Upgrade to a recent trunk build of Solr 4.0. Then follow the instructions here: http://wiki.apache.org/solr/NearRealtimeSearch

The key to the solution is to use soft commits while you import your tweets.
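
As a rough sketch of what that wiki page describes, assuming a 4.x SolrJ client (HttpSolrServer) and an illustrative core URL:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class SoftCommitImporter {
        public static void main(String[] args) throws Exception {
            SolrServer server = new HttpSolrServer("http://localhost:8983/solr/twitter"); // assumed URL

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "tweet-99");               // hypothetical schema fields
            doc.addField("text", "near-real-time tweet");
            server.add(doc);

            // Soft commit: makes new documents visible to searches without
            // flushing segments to disk (waitFlush, waitSearcher, softCommit).
            server.commit(true, true, true);

            // A hard commit still has to run periodically to persist the index,
            // typically via autoCommit in solrconfig.xml.
        }
    }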

We're using a system similar to what you describe, and we index about 500,000 tweets per hour without a hitch.

Part of the issue here is that you can't set up too much caching in this environment, because you will need to commit at some point; when that happens the cache is gone and autowarming kicks in, so make sure you look at your autowarming settings. Also check your logs when you run facet queries for indications that the number of unique values in any of the facets is too high. We're still struggling with pivot queries over 6 million unique users; that kills performance. In practice it means we're doing fine with 50M tweets or so; above that it slows down significantly, and we're waiting for some 4.0 improvements to solve this.
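
To make that facet-cardinality check concrete, here is a hedged SolrJ sketch; the field name "user" and the core URL are assumptions for illustration.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.FacetField;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class FacetCheck {
        public static void main(String[] args) throws Exception {
            SolrServer server = new HttpSolrServer("http://localhost:8983/solr/twitter"); // assumed URL

            SolrQuery q = new SolrQuery("*:*");
            q.setFacet(true);
            q.addFacetField("user"); // hypothetical field; the one to watch for high cardinality
            q.setFacetLimit(100);    // cap the number of buckets returned
            q.setFacetMinCount(1);   // skip empty buckets
            q.setRows(0);            // facets only, no documents

            QueryResponse rsp = server.query(q);
            for (FacetField ff : rsp.getFacetFields()) {
                System.out.println(ff.getName() + ": " + ff.getValueCount() + " values returned");
            }
        }
    }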
