How does Sphinx handle URLs?

Developer https://www.devze.com 2023-04-12 01:37 Source: web
When working with PostgreSQL you can break apart a URL into several different lexemes when using full text search. For example:

SELECT to_tsvector('http://www.example.com/dir/page.html');
                               to_tsvector                                
--------------------------------------------------------------------------
 '/dir/page.html':3 'www.example.com':2 'www.example.com/dir/page.html':1
(1 row)

You can see that PostgreSQL has broken up http://www.example.com/dir/page.html into the url minus the protocol (www.example.com/dir/page.html), host (www.example.com) and the url_path (/dir/page.html). This is handy because it will allow you to quickly search for www.example.com.

With that background, how does SphinxSearch handle indexing a URL? Does it behave similarly to PostgreSQL in that it breaks apart a URL into parts so that it can be easily searched?


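It literally just breaks up the source text on any characters not listed in charset_table.

So normally . and / just count as separators, and a URL ends up searchable only by its groups of letters (http, www, example, com, dir, page, html for the URL above). That is most useful combined with the phrase operator, which requires those words to appear next to each other in order.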
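To make the splitting concrete, here is a rough Python sketch of that behaviour. It is not Sphinx code: the regex only approximates the ASCII part of a default-style charset_table (digits, letters folded to lowercase, underscore), and the helper names `sphinx_like_tokens` and `phrase_match` are made up for illustration. A real index may behave differently depending on its charset_table and related settings.

```python
import re

# Assumption: approximates the ASCII part of a default-style charset_table
# (0..9, a..z, _ with A..Z folded to lowercase). Everything else separates tokens.
TOKEN_RE = re.compile(r"[0-9a-z_]+")


def sphinx_like_tokens(text):
    """Split text on every character outside the assumed charset."""
    return TOKEN_RE.findall(text.lower())


def phrase_match(doc, phrase):
    """Rough stand-in for a phrase search: the phrase's tokens must
    appear consecutively, in order, among the document's tokens."""
    d = sphinx_like_tokens(doc)
    p = sphinx_like_tokens(phrase)
    if not p:
        return False
    return any(d[i:i + len(p)] == p for i in range(len(d) - len(p) + 1))


url = "http://www.example.com/dir/page.html"
print(sphinx_like_tokens(url))
# ['http', 'www', 'example', 'com', 'dir', 'page', 'html']
print(phrase_match(url, "www.example.com"))  # True
print(phrase_match(url, "example.org"))      # False
```

Under these assumptions, searching for the host amounts to a phrase query on the words www, example and com, rather than a lookup of a single host lexeme as in PostgreSQL.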
