开发者

Solr changes document's score when its random field value altered

开发者 https://www.devze.com 2023-03-10 23:55 出处:网络
I need to navigate forth and back in Solr results set ordered by score viewing documents one by one. To visualise that, first a list of 开发者_高级运维document titles is presented to user, then he or

I need to navigate forth and back in Solr results set ordered by score viewing documents one by one. To visualise that, first a list of 开发者_高级运维document titles is presented to user, then he or she can click one of the title to see more details and then needs to have an opportunity to move to the next document in the original list without getting back and clicking another title.

During viewing documents get changed: their dynamic field is modified (or created is not exists yet) to mark that document has already been viewed (used in other search).

The problem I face is that when the document is altered and re-indexed to keep those changes, sometimes (and not always, which is very disturbing) its place in the results set for the same query changes (in other words, it's score changes as that doesn't happen when browsing results sorted by one of the documents' fields). So, "Previous" / "Next" navigation doesn't work properly.

I'm not using any custom weighting or boosters on fields for score calculation. Also, that dynamic field changed during browsing doesn't participate in the query used to get the record set browsed.

So, the questions are: can the modification of the document's field not included in the query change its relevance score? And if it can, then how can I control that?

UPDATE

I did some tests and can add the following:

  1. Document changes its place in the result set even if no field is amended - just requesting the document and re-indexing it without any changes to its fields makes it take another place next time the same query over the same index is executed.

  2. That happens even if the result set is sorted explicitly ("first_name DESC"), so score (which depends on the update date) is not involved. The document stays the same, its field result set is sorted by is the same, yet its position changes.

Still have no idea how to avoid that.


In Solr, if your field is "indexed", it will have an effect on the relevancy ranking ("stored" fields show up in search results but are not necessarily searchable). If the fields in question aren't marked as indexed then you are good to go. Note that "indexed" and "stored" are not necessarily the same, hence you confusion about results lists changing even though not all fields are shown (a field can be "indexed" and not "stored" as well).

In this case I think you want your "viewed" field to be "stored" but not "indexed". If you really want to control the query, you can use copyField to copy the relevant results into a single searchable field. You can also boost terms or documents so that certain fields are "less important" to the search query.

If you want to see how the relevancy rankings are calculated, you can add "debugQuery=on" to the end of your Solr Query (see the Relevancy FAQ for more info).

However, all that being said, I would recommend you cache your search result query (at least for the first page for your results), since you will always have results changing (documents added, removed by other users, etc). Your best bet is to design a UI that anticipates this, or at least batches a user's query.


I've found the solution which doesn't eliminate the problem completely but makes it much less likely to happen.

So the problem happens when the documents are sorted by some field and there is a number of them with the same value in this field (e.g. result set is sorted by first name, and there are 100 entries for "John").

This is when the indexed time gets involved - apparently Solr uses it to sort the documents when their main sorting fields are identical. To make this case much less probable, you need to add more sorting fields, e.g. "first_name desc" should become "first_name desc, last_name desc, register_date asc".

Also, adding document's unique id as the last sorting field should remove the problem completely (the set of sorting fields will never be identical for any two documents in the index).

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号