Solr and web site indexing to create a site search

devze.com (https://www.devze.com), 2022-12-23 18:52. Source: web
I was trying to build a 'site search' on a simple http site.

I have a site, let's call it www.mycompany.com, that is pure HTML.

Is there an easy way to use Solr to index the entire site and build a full-text search with Solr as the engine?

I googled for a bit and could not find anything specific of the form: do A, do B ... profit!

Also let me know if I have misunderstood what Solr is for :P

Thanks in advance.


Solr only indexes and searches text. It does not include a crawler, since crawling is outside the project's scope.

However, take a look at Nutch, which is a crawler and is not too hard to set up initially.

Nutch and Solr can be integrated if you need Solr-specific features to search the index.


$ bin/solr create -c corename
$ bin/post -c corename https://siteurl.com -recursive 2 -delay 1

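This gives a basic index of the site: bin/post follows links up to two levels deep (-recursive 2) and waits one second between fetches (-delay 1). It is not the most thorough approach, but if you want something simple, this is it.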

I think this only works on Solr 5+.
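
Once the pages are indexed, you can check the result by querying the core's select handler over HTTP. Here is a minimal sketch, assuming Solr is running locally on its default port 8983 and the core is named corename as above. The helper name build_query_url is made up for illustration, and it assumes the default configuration, where an unqualified q searches the catch-all field.

```shell
#!/bin/sh
# Assumption: Solr runs locally on its default port and serves the core
# created above. Adjust SOLR_BASE if your setup differs.
SOLR_BASE="http://localhost:8983/solr"

# Hypothetical helper: print the select-handler URL for a single-word query.
# Multi-word or special-character queries would need URL-encoding first.
build_query_url() {
    core="$1"
    term="$2"
    printf '%s/%s/select?q=%s&rows=10\n' "$SOLR_BASE" "$core" "$term"
}

# Usage (needs a running Solr instance):
#   curl "$(build_query_url corename search)"
```

The response is JSON listing the matching documents. A site search page would send the same kind of request from its backend and render the results.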


Two other options you might want to look at are Crawl Anywhere and Heritrix.
