How to index a URL in Solr so I can boost results by website

Developer https://www.devze.com 2023-04-03 06:25 Source: web
I have thousands of documents indexed in Solr, representing data crawled from different websites. One of the fields of a document is SourceURL, which contains the URL of the webpage that I crawled and indexed into this document.

I want to boost results from a specific website using a boost query. For example, I have 4 documents whose SourceURL field contains the following values:

  1. https://meta.stackoverflow.com/page1
  2. http://www.stackoverflow.com/page2
  3. https://stackoverflow.com/page3
  4. https://stackexchange.com/page1

I want to boost all results that are from stackoverflow.com itself, not from its subdomains (in this case, results 2 and 3).

Do you know how I can index the URL field and then use a boost query to identify all the documents from a specific website, as in the case above?


One way would be to parse the URL before index time and record whether it is on the primary domain (for example, in a boolean primarydomain field in your schema.xml file).
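As a rough sketch of that pre-indexing step, the snippet below derives the boolean from SourceURL in Python. The function name is_primary_domain and the rule of treating a leading "www." as part of the primary domain are assumptions made here to match the example in the question (results 2 and 3); they are not something Solr provides.

```python
from urllib.parse import urlsplit


def is_primary_domain(source_url: str, domain: str) -> bool:
    """Return True if source_url is hosted on `domain` itself (optionally
    with a leading "www."), and False for any other host or subdomain.

    Hypothetical helper: the "www." rule is an assumption taken from the
    example in the question, not a Solr feature.
    """
    # hostname is None when the string has no network location part
    host = (urlsplit(source_url).hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return host == domain.lower()


urls = [
    "https://meta.stackoverflow.com/page1",
    "http://www.stackoverflow.com/page2",
    "https://stackoverflow.com/page3",
    "https://stackexchange.com/page1",
]

# Value to store in the boolean primarydomain field of each document
flags = [is_primary_domain(u, "stackoverflow.com") for u in urls]
print(flags)  # [False, True, True, False]
```

If several websites need to be boosted independently, a variation on the same idea would be to store the normalized host in a string field instead of a single boolean.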

Then you can boost on the primarydomain field at query time. See the DisMaxQParserPlugin page on the Solr Wiki for an example of how to boost fields at query time.
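A minimal sketch of what such a request could look like, assuming the boolean primarydomain field from above and DisMax's bq (boost query) parameter. The searched field name content, the boost factor 10.0, and the helper name build_boosted_query are placeholders chosen for illustration; the snippet only builds the query string and does not contact a Solr server.

```python
from urllib.parse import urlencode


def build_boosted_query(user_query: str, boost: float = 10.0) -> str:
    """Build a URL-encoded DisMax query string that boosts documents
    whose primarydomain field is true.

    Hypothetical helper: "content" and the default boost are assumptions.
    """
    params = {
        "defType": "dismax",                  # use the DisMax query parser
        "q": user_query,                      # the user's search terms
        "qf": "content",                      # field(s) to search (assumed name)
        "bq": f"primarydomain:true^{boost}",  # boost query on the boolean field
    }
    return urlencode(params)


query_string = build_boosted_query("solr boost")
print(query_string)
```

The resulting string would be appended to the select handler URL of your own Solr core. Documents that do not match the bq clause are still returned; they just score lower than matching ones.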
