I'm trying to optimize my PostgreSQL 8.3 DB tables to the best of my ability, and I'm unsure if I need to use varchar_pattern_ops
for certain columns where I'm performing a LIKE
against the first N characters of a string. According to this documentation, the use of xxx_pattern_ops
is only necessary "...when the server does not use the standard 'C' locale".
Can someone explain what this means? How do I check what locale my database is using?
Some locale settings [docs] can only be set at initdb time, and the one relevant to _pattern_ops, LC_COLLATE, is among them: in 8.3 it is fixed for the whole cluster and cannot be changed with SET at runtime. To see the current values you can use the SHOW command.
For example:
SHOW LC_COLLATE
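_pattern_ops indexes are useful on columns that are searched with pattern matching constructs, like LIKE or regexps. In 8.3 you still have to create a regular index (without _pattern_ops) for ordinary comparisons, including equality, to use an index. From 8.4 on, plain equality can use a _pattern_ops index, but <, <=, >, and >= still need the regular one. So you have to take all this into consideration to see if you need such indexes on your tables.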
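The two-index setup described here might look roughly like the sketch below. The table customers, its columns, and the index names are hypothetical and only serve as an illustration; the operator class name varchar_pattern_ops is the one discussed in the question.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE customers (
    id   serial PRIMARY KEY,
    name varchar(100) NOT NULL
);

-- Index intended for left-anchored pattern matches in a non-C locale.
CREATE INDEX customers_name_pattern_idx
    ON customers (name varchar_pattern_ops);

-- Regular index (default operator class) for ordinary comparisons and sorting.
CREATE INDEX customers_name_idx
    ON customers (name);

-- Candidate for customers_name_pattern_idx: prefix search.
SELECT * FROM customers WHERE name LIKE 'Smi%';

-- Candidate for customers_name_idx: range comparison and ORDER BY.
SELECT * FROM customers WHERE name >= 'Smith' AND name < 'Smj' ORDER BY name;
```

Keeping both indexes costs extra disk space and slows writes a little, so it is probably only worth it on columns that really see both kinds of query.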
As for what a locale is: it's a set of rules about character ordering, formatting and similar things that vary from one language/country to another. For instance, the locale fr_CA (French in Canada) might have different sorting rules (or ways of displaying numbers and so on) than en_CA (English in Canada). The standard "C" locale is the POSIX standards-compliant default locale. Only strict ASCII characters are classified as letters, strings are ordered by their raw byte values rather than by language-specific rules, and formatting mostly follows US English conventions.
In computing, locale is a set of parameters that defines the user's language, country and any special variant preferences that the user wants to see in their user interface. Usually a locale identifier consists of at least a language identifier and a region identifier.
psql -l
lists all databases together with their Collate and Ctype settings, according to the handbook.
Example output:
                                List of databases
    Name     | Owner  | Encoding |   Collate   |    Ctype    | Access privileges
-------------+--------+----------+-------------+-------------+-------------------
 packrd      | packrd | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 postgres    | packrd | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0   | packrd | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/packrd        +
             |        |          |             |             | packrd=CTc/packrd
 template1   | packrd | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/packrd        +
             |        |          |             |             | packrd=CTc/packrd
(5 rows)
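The same information can probably be read from inside a session as well. This is a sketch that assumes PostgreSQL 8.4 or later, where pg_database has the datcollate and datctype columns; on 8.3 those columns do not exist because the locale is fixed for the whole cluster.

```sql
-- Per-database encoding, collation and character classification.
-- Assumes 8.4 or later (datcollate and datctype were added in that release).
SELECT datname,
       pg_encoding_to_char(encoding) AS encoding,
       datcollate,
       datctype
FROM pg_database
ORDER BY datname;
```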
OK, from my perusings, it appears that this initial setting
initdb --locale=xxx
--locale=locale
Specifies the locale to be used in this database. This is equivalent to specifying both --lc-collate and --lc-ctype.
basically specifies the "default" locale for all databases that you create after that (i.e. it specifies the settings for template1, which is the default template).
Locale is different from encoding. On 8.4 or later you can create a new database with a different locale and/or encoding by specifying them manually:
CREATE DATABASE korean WITH ENCODING 'EUC_KR' LC_COLLATE='ko_KR.euckr' LC_CTYPE='ko_KR.euckr' TEMPLATE=template0;
That is only needed if you want to call it out explicitly.
If you don't specify it, the new database inherits the locale of its template, which initdb originally took from the system environment and which is almost never "C".
So if your SHOW LC_COLLATE returns anything other than "C" or "POSIX", then you are not using the standard C locale and you will need to specify xxx_pattern_ops for the indexes that should serve LIKE. Note also the caveat that if you want to use the <, <=, >, or >= operators, you need to create a second index without the xxx_pattern_ops operator class (unless you are using the standard C locale on your database, which is rare). On 8.4 and later, for just = and LIKE (etc.) you don't need a second index. And if you don't need LIKE at all, then you probably don't need the index with xxx_pattern_ops either.
Even if your indexes are defined to collate with the "default" like
CREATE INDEX my_index_name
ON table_name
USING btree
(identifier COLLATE pg_catalog."default");
This is not enough: unless the default is the "C" (or POSIX, same thing) collation, the index can't be used for patterns like LIKE 'ABC%'. You need something like this:
CREATE INDEX my_index_name
ON table_name
USING btree
(identifier COLLATE pg_catalog."default" varchar_pattern_ops);
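To confirm that the planner actually picks up such an index, one option is to inspect the query plan. The following is only a sketch that reuses the table_name, identifier and my_index_name names from the example above. The enable_seqscan toggle is an assumption added for testing, since on a small table the planner may otherwise prefer a sequential scan.

```sql
-- Discourage sequential scans for this session only, so that a small test
-- table does not hide whether the index is usable at all.
SET enable_seqscan = off;

-- Left-anchored pattern: if the varchar_pattern_ops index can serve it,
-- the plan would be expected to mention my_index_name in an index condition.
EXPLAIN
SELECT *
FROM table_name
WHERE identifier LIKE 'ABC%';

-- A pattern that starts with a wildcard cannot be turned into an index
-- range condition, with or without varchar_pattern_ops.
EXPLAIN
SELECT *
FROM table_name
WHERE identifier LIKE '%ABC';

-- Restore the default planner behaviour.
RESET enable_seqscan;
```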
If you've got the option...
You could recreate the database cluster with the C locale.
You need to pass the locale to initdb when initializing your Postgres instance.
You can do this regardless of what the server's default or user's locale is.
That's a server administration command though, not a database schema designer's task. The cluster contains all the databases on the server, not just the one you're optimising.
It creates a brand new cluster, and does not migrate any of your existing databases or data. That'd be additional work.
Furthermore, if you're in a position where you can consider creating a new cluster as an option, you really should be considering using PostgreSQL 8.4 instead, which can have per-database locales, specified in the CREATE DATABASE statement.
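On 8.4 or later, a single database with the C locale could be created along these lines. This is a minimal sketch: the database name c_locale_db and the UTF8 encoding are assumptions chosen for illustration, and template0 is used because a template with a different locale cannot be copied as-is.

```sql
-- Hypothetical database that uses the C locale for collation and ctype.
-- Requires 8.4 or later; template0 avoids a locale mismatch with template1.
CREATE DATABASE c_locale_db
    WITH TEMPLATE = template0
         ENCODING = 'UTF8'
         LC_COLLATE = 'C'
         LC_CTYPE = 'C';
```

The trade-off is that ORDER BY on text columns in such a database sorts by byte value, which may not match what users of a particular language expect.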
There is also another way (assuming you want to check them, not modify them):
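Check the file /var/lib/postgres/data/postgresql.conf. The following lines should be found (note that they cover messages, monetary, numeric and time formatting only; LC_COLLATE and LC_CTYPE are not set in this file):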
# These settings are initialized by initdb, but they can be changed.
lc_messages = 'en_US.UTF-8' # locale for system error message strings
lc_monetary = 'en_US.UTF-8' # locale for monetary formatting
lc_numeric = 'en_US.UTF-8' # locale for number formatting
lc_time = 'en_US.UTF-8' # locale for time formatting
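If reading the configuration file is inconvenient, the same values can likely be inspected from a session through the pg_settings view. This is a sketch; which lc settings appear depends on the server version (older releases also list lc_collate and lc_ctype as read-only entries).

```sql
-- List the locale-related settings known to the running server.
-- The context column shows whether a setting can be changed at runtime
-- ('user') or is fixed ('internal').
SELECT name, setting, context
FROM pg_settings
WHERE name LIKE 'lc%'
ORDER BY name;
```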