How to index a URL in Solr so I can boost results by website


Developer https://www.devze.com 2023-04-03 06:25 Source: Internet


Related topics: solr

I have thousands of documents indexed in Solr that represent data crawled from different websites. One of the fields of each document is SourceURL, which contains the URL of the web page that I crawled and indexed into that document.

I want to boost results from a specific website using a boost query. For example, I have 4 documents whose SourceURL fields contain the following values:

https://meta.stackoverflow.com/page1
http://www.stackoverflow.com/page2
https://stackoverflow.com/page3
https://stackexchange.com/page1

I want to boost all results that are from stackoverflow.com itself and not from its subdomains (in this case, results 2 and 3, treating www.stackoverflow.com as the main site).

Do you know how I can index the URL field and then use a boost query to identify all the documents from a specific website, as in the case above?

One way would be to parse the URL before indexing and record whether it belongs to the primary domain (for example, in a boolean primarydomain field in your schema.xml file).
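The pre-indexing step could look roughly like the following Python sketch. It is only an illustration under some assumptions: the function names (is_primary_domain, build_document) are hypothetical, the target domain is passed in by hand, and a leading "www." is treated as part of the primary site because the question counts www.stackoverflow.com as a match.

```python
from urllib.parse import urlsplit


def is_primary_domain(source_url: str, domain: str = "stackoverflow.com") -> bool:
    """Return True if the URL's host is the given domain (optionally with 'www.')."""
    # urlsplit(...).hostname is already lowercased; it is None for malformed URLs.
    host = urlsplit(source_url).hostname or ""
    # Assumption: 'www.' counts as the primary site, other subdomains do not.
    if host.startswith("www."):
        host = host[len("www."):]
    return host == domain.lower()


def build_document(source_url: str, domain: str = "stackoverflow.com") -> dict:
    """Build the fields to send to Solr; 'primarydomain' is the boolean field."""
    return {
        "SourceURL": source_url,
        "primarydomain": is_primary_domain(source_url, domain),
    }


if __name__ == "__main__":
    urls = [
        "https://meta.stackoverflow.com/page1",
        "http://www.stackoverflow.com/page2",
        "https://stackoverflow.com/page3",
        "https://stackexchange.com/page1",
    ]
    for url in urls:
        print(build_document(url))
```

The resulting dictionaries would then be sent to Solr with whatever client you already use for indexing; the primarydomain field has to be declared as an indexed boolean field in the schema.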

Then you can boost documents on the primarydomain field at query time. See the DisMaxQParserPlugin page on the Solr Wiki for an example of how to apply boosts at query time.
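As a rough sketch of the query side, the DisMax parser accepts a bq (boost query) parameter, so a request could add primarydomain:true with a boost factor. In the Python example below the searched field name (content), the boost value, and the host and core in the final URL are assumptions made for illustration, not values from the original setup.

```python
from urllib.parse import urlencode


def build_boosted_query(user_query: str, boost: float = 10.0) -> str:
    """Return a URL-encoded Solr query string that boosts primary-domain documents."""
    params = {
        "defType": "dismax",                        # use the DisMax query parser
        "q": user_query,                            # the user's search terms
        "qf": "content",                            # assumption: field being searched
        "bq": f"primarydomain:true^{boost:g}",      # boost query on the boolean field
    }
    return urlencode(params)


if __name__ == "__main__":
    # Hypothetical host and core name; adjust to your own Solr instance.
    base_url = "http://localhost:8983/solr/mycore/select"
    print(base_url + "?" + build_boosted_query("solr boost"))
```

Documents that do not match the boost query are still returned; they simply score lower than otherwise comparable documents from the primary domain.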