Which granularity to choose for database table partitioning?

I have a 20-million-record table in a MySQL database. SELECTs work really fast because I have set up good indexes, but INSERT and UPDATE operations are getting really slow. The database is the back-end of a web application under heavy load. INSERTs and UPDATEs are really slow because there are some 5 indexes on this table and the index size is about 1GB now - I guess it takes too much time to compute.

To solve this problem, I decided to partition the table. I run MySQL 4 and cannot upgrade (I have no direct control over the server), so I'll do manual partitioning - creating a separate table for each section.

The data set is composed of about 18000 different logical slices, which can be queried completely separately. I could therefore create 18000 tables (maindata1, maindata2, etc.). However, I'm not sure that this is the optimal way to do it. Besides the obvious fact that I'd have to browse through 18000 items in the administration tool whenever I want to do something manually, I'm concerned about file-system performance. The file system is ext3, and I'm not sure how fast it is at locating files in a directory with 36000 files (each table has a data file and an index file).

If this is a problem, I could join some slices of data together into the same table: for example maindata10, maindata20, etc., where maindata10 would contain slices 1 through 10. With groups of 10 I would have only 1800 tables; with groups of 20, 900 tables. A sketch of what I mean is below.
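To make the grouping concrete, here is a minimal sketch of one group table and the routing idea. Only the maindata naming and the notion of slices come from the description above; the slice_id and payload columns and the index name are made-up placeholders:

    -- Hypothetical group table holding slices 1..10. The application picks
    -- the table name before querying, e.g. group = CEILING(slice_id / 10) * 10,
    -- so slice 7 is routed to maindata10, slice 37 to maindata40, etc.
    CREATE TABLE maindata10 (
        slice_id INT NOT NULL,
        payload  VARCHAR(255),
        INDEX idx_slice (slice_id)
    ) TYPE=MyISAM;   -- MySQL 4 accepts TYPE=; later versions prefer ENGINE=

    -- Every query then touches exactly one small table with small indexes:
    SELECT * FROM maindata10 WHERE slice_id = 7;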

I wonder what the optimal size of this grouping would be, i.e. how to trade off the number of files in a directory against table size?

Edit: I also wonder if it would be a good idea to use multiple separate databases to group the files together. So even with 18000 tables, I could group them into, say, 30 databases of 600 tables each. That seems much easier to manage. I don't know whether having multiple databases would increase or decrease performance or the memory footprint (it would complicate backup and restore, though).
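As an illustration of the multi-database idea (database and table names here are hypothetical): each MySQL database is just a directory on disk, so 30 databases of 600 MyISAM tables each would cap any one directory at roughly 1800 files (three files per MyISAM table: .frm, .MYD, .MYI) instead of tens of thousands:

    CREATE DATABASE maindata_db01;

    CREATE TABLE maindata_db01.maindata10 (
        slice_id INT NOT NULL,
        payload  VARCHAR(255),
        INDEX idx_slice (slice_id)
    ) TYPE=MyISAM;

    -- Cross-database access is only a name qualifier; no extra connection:
    SELECT * FROM maindata_db01.maindata10 WHERE slice_id = 7;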


There are a few tactics you could follow to boost performance. By "partitions" I assume you mean "versions of tables with the same column layout but different data contents."

Get a server that will run MySQL 5 if you possibly can. It's faster and better at this sort of thing, enough so that you may not have a problem after you upgrade.

Are you using InnoDB? If so, can you switch to MyISAM? (If you need strict transactional integrity, you might not be able to switch.)
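If the switch is an option, it is a single statement per table. A sketch using the table name from the question; note that the conversion rewrites the whole table and locks it while it runs, so test it on a copy first:

    -- Convert the big table to MyISAM. MySQL 4 accepts TYPE=;
    -- later versions prefer ENGINE=MyISAM.
    ALTER TABLE maindata TYPE=MyISAM;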

For partitioning, try to figure out what data-slice combination will give you roughly equal-size partitions (by row count); a quick per-slice count like the one below shows how evenly the slices would fill them. If I were you, I'd go for no more than about 20 partitions unless you can prove to yourself that you need more.
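Assuming the slices are identified by a column such as slice_id (an assumption; the question does not name the column), something like this exposes the skew:

    -- Rows per slice, biggest first, to see how evenly a given
    -- grouping would balance the partitions:
    SELECT slice_id, COUNT(*) AS rows_in_slice
    FROM maindata
    GROUP BY slice_id
    ORDER BY rows_in_slice DESC;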

If only a few of your data slices are being actively updated (for example, if they are "this month's data" and "last month's data"), consider splitting those into smaller slices. For example, you might have "this week's data", "last week's", and "the week before" in their own partitions. Then, when your partitions cool off, copy their data out and combine them into bigger groups like "the quarter before last". The disadvantage is that this requires routine Sunday-evening-style maintenance jobs to run; the advantage is that most or all updates touch only a small fraction of your table.
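The maintenance job itself can be plain SQL. A sketch with hypothetical table names, moving a cooled-off weekly table into a quarterly archive:

    -- Append the cooled-off week to the archive, then drop the hot table.
    -- Wrap in LOCK TABLES (or run in a quiet window) if writers may still
    -- touch maindata_week05 while this runs.
    INSERT INTO maindata_archive_q1 SELECT * FROM maindata_week05;
    DROP TABLE maindata_week05;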


If you are using MyISAM, you should look into the MERGE engine. It gives you much the same functionality as MySQL 5 partitioning, and you will still be able to run the same SELECTs you are running now.
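A minimal MERGE sketch (column layout and names assumed, as above; the underlying MyISAM tables must be defined identically). The MERGE table can then be queried exactly like the original single table:

    -- Two identically-defined MyISAM partition tables:
    CREATE TABLE maindata_a (
        slice_id INT NOT NULL,
        payload  VARCHAR(255),
        INDEX idx_slice (slice_id)
    ) TYPE=MyISAM;

    CREATE TABLE maindata_b (
        slice_id INT NOT NULL,
        payload  VARCHAR(255),
        INDEX idx_slice (slice_id)
    ) TYPE=MyISAM;

    -- The MERGE table presents their union; new INSERTs go to the
    -- last table in the UNION list:
    CREATE TABLE maindata_all (
        slice_id INT NOT NULL,
        payload  VARCHAR(255),
        INDEX idx_slice (slice_id)
    ) TYPE=MERGE UNION=(maindata_a, maindata_b) INSERT_METHOD=LAST;

    -- Existing SELECTs keep working against the merged view:
    SELECT * FROM maindata_all WHERE slice_id = 42;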
