When performing a query like:
select count(*) from myTextTable where tsv @@ plainto_tsquery('english', 'TERM');
I've noticed that PostgreSQL does not use the GIN index (that I defined on the tsv column) when TERM is 1 or 2 characters long; 3 or more characters work fine.
I understand that indexing 1- or 2-character terms would greatly increase the size of the index, but quickly retrieving texts that contain specific 1- or 2-character terms is essential for the application I'm developing.
Is there some full text search configuration parameter to index 1- or 2-character terms?
Some time ago I wrote my own to_tsquery() and to_tsvector() functions (in Python) because I wanted more control. AFAIK the filtering happens in plainto_tsquery(). If you replace it, you can index single characters, too.
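A minimal sketch of that idea, assuming you build the tsvector and tsquery values on the client side and cast them in SQL. The helper names (`tokenize`, `to_tsvector_literal`, `to_tsquery_literal`) are hypothetical, not the answerer's actual code, and for simplicity they do no stemming or stop-word removal; they just keep every token, whatever its length.

```python
import re
from collections import defaultdict

# Hypothetical client-side replacements for to_tsvector()/plainto_tsquery().
# Assumption: no stemming and no stop-word removal; every token is kept,
# including 1- and 2-character terms.
# \w+ never matches quotes or backslashes, so no escaping is needed below.
TOKEN_RE = re.compile(r"\w+", re.UNICODE)


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens of any length."""
    return [tok.lower() for tok in TOKEN_RE.findall(text)]


def to_tsvector_literal(text: str) -> str:
    """Build a tsvector literal such as "'ab':1 'cd':2,4" from raw text."""
    positions: dict[str, list[int]] = defaultdict(list)
    for pos, tok in enumerate(tokenize(text), start=1):
        positions[tok].append(pos)
    # Each lexeme appears once, followed by its comma-separated positions.
    return " ".join(
        f"'{lexeme}':{','.join(map(str, pos))}"
        for lexeme, pos in sorted(positions.items())
    )


def to_tsquery_literal(text: str) -> str:
    """AND together every distinct token, like plainto_tsquery() minus the filtering."""
    # dict.fromkeys drops duplicates while preserving the original order.
    return " & ".join(f"'{tok}'" for tok in dict.fromkeys(tokenize(text)))


if __name__ == "__main__":
    print(to_tsvector_literal("A cat is a Cat"))  # 'a':1,4 'cat':2,5 'is':3
    print(to_tsquery_literal("an OS"))            # 'an' & 'os'
```

The resulting strings can be passed as query parameters and cast with `::tsvector` / `::tsquery`; casting a literal this way does not run it through a text search configuration, so short terms are stored and matched as-is.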
This issue has now been solved by (a) removing a lot of noisy text from the pages (using language detection) and (b) dropping and re-creating the GIN index. My guess is that the noisy text caused an explosion in the number of lexemes, so the index became unusable, or was judged as such by the query planner.