In PostgreSQL, full text search can break a URL apart into several different lexemes. For example:
SELECT to_tsvector('http://www.example.com/dir/page.html');
to_tsvector
--------------------------------------------------------------------------
'/dir/page.html':3 'www.example.com':2 'www.example.com/dir/page.html':1
(1 row)
You can see that PostgreSQL has broken up http://www.example.com/dir/page.html into the url minus the protocol (www.example.com/dir/page.html), host (www.example.com) and the url_path (/dir/page.html). This is handy because it will allow you to quickly search for www.example.com.
With that background, how does SphinxSearch handle indexing a URL? Does it behave similarly to PostgreSQL in that it breaks apart a URL into parts so that it can be easily searched?
It literally just breaks up the source text on any characters not listed in charset_table.
Normally . and / just count as separators, so a URL will only be searchable by its groups of letters. That is still useful when combined with the phrase operator.
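To make the splitting concrete, here is a rough Python sketch of that behaviour. It is not Sphinx code: it assumes only the ASCII part of the default charset_table (0..9, A..Z->a..z, _, a..z) and ignores settings such as min_word_len, stopwords and morphology. The function name sphinx_like_tokens is made up for illustration.

```python
import re

# Assumption: only the ASCII part of Sphinx's default charset_table
# (0..9, A..Z->a..z, _, a..z). Every other character acts as a separator.
TOKEN_RE = re.compile(r"[0-9a-z_]+")


def sphinx_like_tokens(text):
    """Approximate the split: fold case, then keep runs of charset characters."""
    return TOKEN_RE.findall(text.lower())


tokens = sphinx_like_tokens("http://www.example.com/dir/page.html")
print(tokens)
# ['http', 'www', 'example', 'com', 'dir', 'page', 'html']
```

Because the query text is split the same way, a phrase query such as "www example com" should match the host as three adjacent keywords, which is roughly what the PostgreSQL host lexeme gives you.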