Good morning,
I am at the moment writing my master thesis and hence I have to justify each an every general assessment I make.
I have a flat database in MySQL which was originally composed of the following fields
- Date (DATETIME)
- Name (VARCHAR(50))
- Value (DOUBLE)
The PK of the table was a composite of the columns 1 and 2.
The thing is that I soon had more than 40 millions rows and my queries for all records over a single name were taking ages.
Hence, I decided to create an "index table" (I think the terminology is correct) where I store a mapping between Names and IDs:
- ID (INT)
- Name (VARCHAR 50)
And I changed my initial table to
- Date (DATETIME)
- ID (INT)
- Value (DOUBLE)
This way I could first find the ID of the record I was looking for, and then run a query over the large table very quickly (performance was really much better).
I assume this is because comparison between integer is much easier than between character strings, but I was looking for some literature to back this assessment (ideally some database structure book).
Do you think my assessme开发者_开发百科nt is correct?
Part of the issue is that compound keys (such as your Date,Name PK) are created by concatenating the indexed values (see http://dev.mysql.com/doc/refman/5.1/en/create-index.html), and the name (the primary thing you're looking up by here) is second. This makes it much more work to look stuff up by name, because the index won't be sorted by name -- it'll be sorted by date, THEN name, meaning mysqld will have to search the whole index instead of just grabbing the section where the PK is between "Jack, 0000-00-00" and "Jack, 9999-12-31".
If you added an index just for the name, or at least switched the PK to (Name, Date), you'd probably find your original table working much better.
Alternatively, if you did the same thing to your Date,ID table, it should be faster still, because you're all but eliminating string comparisons.
Assuming that there is a lot of duplication of data of the "Name" field, your query performance improved because integer comparisons are faster than string comparisons and you significantly reduced the size of the date table. This means less memory paging and less disk seeking.
If the name table has N rows, then you are doing N string comparisons, plus 40 million integer comparisons, instead of 40 million string comparisons. To increase query performance even more, you should add an index for the ID field of the date table.
CREATE INDEX date_id_index ON date_table (ID)
As for books, "Applied Mathematics for Database Professionals" by Lex de Haan and Toon Koppelaars is really good book if you want advanced SQL-knowledge. I should point out that you don't just "mention" books, you read them and use them as reference - just referencing books because they sound cool but not reading them will come back to bite you in the ass.
精彩评论