I am currently writing a web app and will need to do some ordering on a set of Chinese characters. I want to know whether databases can sort Chinese characters and, if so, how they get sorted.
For reference I will be using PostgreSQL.
PostgreSQL sorts text using the operating system's locale facility, so you get the same ordering that operating system tools such as sort give you. Set your locale to something suitable, such as zh_HK.utf8, when you initialize the database system.
If you don't like the results of that sort, you'll have to come up with a custom solution.
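One way to preview what a locale's ordering looks like, before initializing a database with it, is to sort a few sample strings through the same C library collation from a script. The following is only a sketch: the function name sort_with_locale is made up for illustration, it assumes the zh_HK.utf8 locale is installed on the machine, and it falls back to plain code point order when it is not. The result may still differ from what your PostgreSQL server produces if the server runs on a different system.

```python
import locale


def sort_with_locale(words, locale_name="zh_HK.utf8"):
    """Sort words using the C library collation for locale_name.

    Hypothetical helper for illustration. Falls back to plain code point
    order if the requested locale is not installed on this machine.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return sorted(words)
    try:
        # strxfrm turns each string into a key that compares in locale order
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore the old setting


print(sort_with_locale(["中国", "北京", "上海"]))
```

The printed order depends on the locale data installed on your system, so no expected output is shown here.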
The easiest and most common way to sort them is as binary data, either by Unicode code point or, even more simply, as raw bytes (which works well for ASCII data). Unfortunately, that does not make for a very meaningful sort order. It does group related strings together, though, so things like prefix queries should work.
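To make the binary sort concrete, here is a small sketch. Python compares strings by code point, and sorting the UTF-8 encoded bytes gives the same order. The sample words and the with_prefix helper are made up for illustration, and the helper assumes no word contains the character U+10FFFF.

```python
from bisect import bisect_left, bisect_right

words = ["中国", "北京", "上海", "中文", "中"]

# Python compares str values code point by code point.
by_code_point = sorted(words)

# Sorting the raw UTF-8 bytes yields the same order as code point order.
by_utf8_bytes = sorted(words, key=lambda w: w.encode("utf-8"))


def with_prefix(sorted_words, prefix):
    """Return the contiguous run of words that start with prefix.

    Hypothetical helper: sorted_words must already be in code point order.
    """
    lo = bisect_left(sorted_words, prefix)
    hi = bisect_right(sorted_words, prefix + "\U0010FFFF")
    return sorted_words[lo:hi]


print(by_code_point)                    # ['上海', '中', '中国', '中文', '北京']
print(with_prefix(by_code_point, "中"))  # ['中', '中国', '中文']
```

The order follows the code points (上 is U+4E0A, 中 is U+4E2D, 北 is U+5317), which has nothing to do with pronunciation or stroke count, but everything starting with 中 does end up in one contiguous block.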
For a meaningful sort order, there is no good algorithmic solution. You'd need to work with lookup tables (see, for example, this thread about mapping Chinese to pinyin, which you could then sort by).
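As a rough sketch of the lookup-table approach: map each character to its pinyin and sort on that. The tiny PINYIN table below is hand-written for this example only and ignores tones and characters with multiple readings. A real application would need a full mapping table from some external source, and the names pinyin_key and sort_by_pinyin are made up here.

```python
# Tiny illustrative table (toneless pinyin); a real table would be far larger.
PINYIN = {
    "上": "shang", "海": "hai",
    "中": "zhong", "国": "guo",
    "北": "bei", "京": "jing",
    "文": "wen",
}


def pinyin_key(word):
    """Build a sort key: one pinyin syllable per character.

    Characters missing from the table fall back to the character itself.
    """
    return [PINYIN.get(ch, ch) for ch in word]


def sort_by_pinyin(words):
    return sorted(words, key=pinyin_key)


print(sort_by_pinyin(["中国", "北京", "上海", "中文"]))
# ['北京', '上海', '中国', '中文']
```

In a database you could store the computed key in an extra column and order by that column instead of the original text.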