Where I work we use a postgres database (8.3 soon to migrate to 8.4). There is a small debate currently on the use of enums in the database. Personally I do not like the db enum type. Among other things it puts application logic in the database and creates a posibility for a mismatch between code a开发者_JAVA技巧nd data.
I was wondering what exactly are the advantages of postgres enums (besides readablity) and what are the disadvatages?
The advantages of enums are:
- Performance is better. You can just display what you get out of the core table instead of either having a separate lookup table that translates a code to a value or having app logic that translates a code to a value. This can be especially useful in datawarehouse applications.
- Ad hoc SQL is easier to write
The disadvantages are:
- Encoding display values into your database ddl is bad form. If you translate the enum value into a different display value in your app code, then you lost a lot of the advantages of using enums.
- Adding values requires DDL changes
- Makes language localization difficult
- Database portability is decreased
Enums combine the advantages of ints with the advantages of strings: they are small and fast like ints, readable like strings, and have the additional advantage of being safe (you can't mis-spell an enum).
However, if you don't care about readability, an int is as good as an enum.
In PostgreSQL 13, btree indexes now support deduplication. If we take the following real-world example of using an ENUM to represent HTTP methods in a log table with 100 million rows:
public | test_http_enum_idx | index | postgres | test | permanent | 789 MB |
public | test_http_test_idx | index | postgres | test | permanent | 789 MB |
We can see that the index size for both are the same. For a denormalized table, saving a few bytes per row doesn't really make up for the disadvantages.
Rule of thumb for PG 13+: Use ENUMs to constrain a column to a fixed/static set of values; do not to use them to save on disk space.
Possible exception: If an ENUM of static values will help you avoid a costly JOIN or a FK --- go for it; just make sure to avoid premature optimization and measure your results in production.
When making your decision, consider that popular BI tools like Metabase did not support filtering on an ENUM, however, they'll work just fine on a TEXT column. @solaris: Reports that Metabase 0.42.1 and higher support filtering on ENUM values.
As advantage you have also DB checking, that nothing else enum value couldn't be recorded in column. The big disadvantage for me was, that enum could be modified only by adding value to the end, but since Postgres 9.1 it is in the past: https://stackoverflow.com/a/7834949/548473
Advantages
Reduce Storage: Postgres uses only 1 byte per tuple when 255 or less ENUM elements are defined or 2 bytes for 256~65535 elements. This is because, rather that storing the constant literal of the value, Postgres stores the index in the ordered set of that value. For very large tables, this might prove to be a significant storage space save.
Arbitrary sorting:
CREATE TABLE opening_hours(
week_day ENUM ('Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'),
opening_time TIME,
closing_time TIME
);
If you sort by week_day it will sort in the order you specified which is convenient in the above case.
- Cheap constraints: Instead of doing checking in your application code or some complicated db constraint, enums check that only certain values are added in a cheap way.
Disadvantages
- The list of options CANNOT be controlled by end-users since ENUM is part of the schema
- An additional query is required to see the list of options
- String operations and functions do not work on ENUM This is due to ENUM being a separate data type from the built-in data types like NUMERIC or TEXT. This can be overcome by casting ENUM values to TEXT when being operated. However, it can be a pain when using ORM.
Point is, if applications are allowed to do DDL, they are more likely to cause blocking or conflict. DDL is best done offline i.e. in single-user mode.
精彩评论