If I have strings/phrases like this stored in the database:
- What are Q-type Operations?
- Programmer's Guide
- A.B.C's of Coding
Is there a way to pass a query parameter in like 开发者_运维百科"Programmers"
or "abc"
or "q-type"
and have it find "Programmer's"
, "A.B.C"
and "Q-type"
?
tsvector
Use the tsvector
type, which is part of the PostgreSQL text-search feature.
postgres> select 'What are Q-type Operations?'::tsvector;
tsvector
-------------------------------------
'Operations?' 'Q-type' 'What' 'are'
(1 row)
You can use familiar operators on tsvectors as well:
postgres> select 'What are Q-type Operations?'::tsvector
postgres> || 'A.B.C''s of Coding'::tsvector;
?column?
--------------------------------------------------------------
'A.B.C''s' 'Coding' 'Operations?' 'Q-type' 'What' 'are' 'of'
From tsvector documentation:
A tsvector value is a sorted list of distinct lexemes, which are words that have been normalized to merge different variants of the same word (see Chapter 12 for details). Sorting and duplicate-elimination are done automatically during input
If you also want to do language-specific normalization, like removing common words ('the', 'a', etc) and multiplies, use the to_tsvector
function. It also assigns weights to different words for text search:
postgres> select to_tsvector('english',
postgres> 'What are Q-type Operations? A.B.C''s of Coding');
to_tsvector
--------------------------------------------------------
'a.b.c':7 'code':10 'oper':6 'q':4 'q-type':3 'type':5
(1 row)
Full-blown text search
Obviously doing this for every row in a query will be expensive -- so you should store the tsvector in a separate column and use ts_query() to search for it. This also allows you to create a GiST index on the tsvector.
postgres> insert into text (phrase, tsvec)
postgres> values('What are Q-type Operations?',
postgres> to_tsvector('english', 'What are Q-type Operations?'));
INSERT 0 1
Searching is done using tsquery and the @@ operator:
postgres> select phrase from text where tsvec @@ to_tsquery('q-type');
phrase
-----------------------------
What are Q-type Operations?
(1 row)
You could try with an ILIKE with a TRANSLATE function, see here.
For example: translate(field, '.-\'', '')
Here's another link that can be relevant. Strip the value of the field from all punctuation before comparing it to the search string.
SQL Server: How do you remove punctuation from a field?
Postgresql supports full-text searching by converting text input to tsvector
types:
steve@steve@[local] =# select input, to_tsvector('english', input)\
from (values('What are Q-type Operations?'),('Programmer''s Guide'),('A.B.C''s of Coding')) x(input);
input | to_tsvector
-----------------------------+------------------------------------
What are Q-type Operations? | 'oper':6 'q':4 'q-type':3 'type':5
Programmer's Guide | 'guid':3 'programm':1
A.B.C's of Coding | 'a.b.c':1 'code':4
(3 rows)
As you can see, the stemming used by default will make "programming" "programmer" and "programmer's" all match identically.
You would typically use this by having an indexed tsvector column or expression, and then using the @@
operator to match that with a tsquery
, e.g.:
steve@steve@[local] =# select input, to_tsvector('english', input) \
from (values('What are Q-type Operations?'),('Programmer''s Guide'),('A.B.C''s of Coding')) x(input)\
where to_tsvector('english', input) @@ plainto_tsquery('english', 'programmers');
input | to_tsvector
--------------------+-----------------------
Programmer's Guide | 'guid':3 'programm':1
(1 row)
Here plainto_tsquery
analyses a user input string, and produces a query where every non-stop word in the query has to be matched by a tsvector.
This sounds like you want something along these lines:
http://www.postgresql.org/docs/9.0/static/fuzzystrmatch.html
I'm not 100% sure if that will cover what you want though.
EDIT I had to run this up locally to check (Using PostgreSQL 9.0 on Windows)
Here's what I found:
template1=> select soundex('Programmers'), soundex('Programmer''s');
soundex | soundex
---------+---------
P626 | P626
(1 row)
template1=> select soundex('abc'), soundex('A.B.C.');
soundex | soundex
---------+---------
A120 | A120
(1 row)
template1=> select soundex('Q-type'), soundex('q-type');
soundex | soundex
---------+---------
Q310 | Q310
(1 row)
So if you were to do soundex(colname) = soundex(<user param>)
should get you what you need in the where clause.
You will need to install the fuzzystrmatch module:
psql -U <dbowner> -d <database> -f SHAREDIR/contrib/fuzzystrmatch.sql
Refer to the documentation on how to locate SHAREDIR
EDIT I just noticed what I overlooked, I think this combined with the ts_vector
functionality may get you where you are aiming for.
Postgres supports pattern matching so you can build a regular expression in your where clause http://www.postgresql.org/docs/8.3/static/functions-matching.html
精彩评论