Background:
We have large flat files spanning around 60 GB that we are inserting into a database. We are experiencing a gradual performance degradation during insertion.
- We have 174 million records and expect another 50 million to be inserted.
- We have split the main table into 1000+ tables based on the first two characters of the entity name, e.g. entity_aa, entity_ab ... entity_zz.
- During each insertion, three queries are run: (a) a range-based search against another table, (b) a check whether the record has already been inserted, and (c) an insert into the detail (entity_briefs) table (a rough sketch follows this list).
- We added entity_briefs to handle frequent search queries, but realized that insertion slows down gradually no matter whether we run ALTER TABLE entity (or entity_briefs) DISABLE KEYS or ENABLE KEYS.
- The machine has 4 CPUs, gigabytes of disk space, and 2 GB of RAM; the operating system is 32-bit CentOS 5.4 Linux.
- We found that not all 4 CPUs are being utilized.
- We have run 4 importing scripts at once, though overall performance is still not satisfactory.
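For illustration, the per-record workflow looks roughly like this; the lookup table lookup_ranges and its columns are hypothetical stand-ins, only entity_briefs matches our schema:

-- (a) range-based search against another table
SELECT range_id FROM lookup_ranges WHERE 12345 BETWEEN range_start AND range_end;
-- (b) check whether the record is already inserted
SELECT entity_brief_id FROM entity_briefs WHERE entity_table_prefix = 'aa' AND entity_id = 67890;
-- (c) insert into the detail table
INSERT INTO entity_briefs (entity_id, entity_table_prefix, string_1, string_2, zip, phone)
VALUES (67890, 'aa', 'Some Name', 'Some Address', '90210', 5551234567);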
Problematic Table:
CREATE TABLE `entity_briefs` (
`entity_brief_id` bigint(11) NOT NULL auto_increment,
`entity_id` bigint(11) default NULL,
`entity_table_prefix` char(2) default NULL,
`string_1` varchar(255) default NULL,
`string_2` varchar(255) default NULL,
`zip` varchar(25) default NULL,
`phone` bigint(11) default NULL,
PRIMARY KEY (`entity_brief_id`),
KEY `idx_entity_id` (`entity_id`),
KEY `idx_entity_table_prefix` (`entity_table_prefix`),
KEY `idx_zip` (`zip`),
KEY `idx_string_1` (`string_1`),
KEY `idx_string_2` (`string_2`),
KEY `idx_phone` (`phone`)
);
mysqltuner.pl output:
>> MySQLTuner 1.1.1 - Major Hayden <major@mhtx.net>
>> Bug reports, feature requests, and downloads at http://mysqltuner.com/
>> Run with '--help' for additional options and output filtering
Please enter your MySQL administrative login: xxxxx
Please enter your MySQL administrative password:xxxxx
-------- General Statistics --------------------------------------------------
[--] Skipped version check for MySQLTuner script
[OK] Currently running supported MySQL version 5.0.85-community
[OK] Operating on 32-bit architecture with less than 2GB RAM
-------- Storage Engine Statistics -------------------------------------------
[--] Status: +Archive -BDB -Federated +InnoDB -ISAM -NDBCluster
[--] Data in MyISAM tables: 101M (Tables: 1344)
[!!] InnoDB is enabled but isn't being used
[!!] Total fragmented tables: 1
-------- Security Recommendations -------------------------------------------
ERROR 1142 (42000) at line 1: SELECT command denied to user 'xxxx'@'localhost' for table 'user'
[OK] All database users have passwords assigned
-------- Performance Metrics -------------------------------------------------
[--] Up for: 5d 15h 53m 55s (2M q [4.395 qps], 9K conn, TX: 1B, RX: 425M)
[--] Reads / Writes: 51% / 49%
[--] Total buffers: 34.0M global + 2.7M per thread (500 max threads)
[OK] Maximum possible memory usage: 1.3G (67% of installed RAM)
[OK] Slow queries: 0% (9/2M)
[OK] Highest usage of available connections: 1% (5/500)
[!!] Key buffer size / total MyISAM indexes: 8.0M/105.3M
[!!] Key buffer hit rate: 94.1% (72M cached / 4M reads)
[!!] Query cache is disabled
[OK] Temporary tables created on disk: 7% (101 on disk / 1K total)
[!!] Thread cache is disabled
[!!] Table cache hit rate: 0% (64 open / 277K opened)
[OK] Open file limit used: 0% (127/18K)
[OK] Table locks acquired immediately: 99% (2M immediate / 2M locks)
[!!] Connections aborted: 38%
-------- Recommendations -----------------------------------------------------
General recommendations:
Add skip-innodb to MySQL configuration to disable InnoDB
Run OPTIMIZE TABLE to defragment tables for better performance
Enable the slow query log to troubleshoot bad queries
Set thread_cache_size to 4 as a starting value
Increase table_cache gradually to avoid file descriptor limits
Your applications are not closing MySQL connections properly
Variables to adjust:
key_buffer_size (> 105.3M)
query_cache_size (>= 8M)
thread_cache_size (start at 4)
table_cache (> 64)
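For reference, those variables can be applied at runtime roughly as follows; the values are only starting points taken from the recommendations above, not tuned figures:

SET GLOBAL key_buffer_size   = 128 * 1024 * 1024;  -- larger than the 105.3M of MyISAM indexes
SET GLOBAL query_cache_size  = 8 * 1024 * 1024;    -- enable the query cache
SET GLOBAL thread_cache_size = 4;                  -- suggested starting value
SET GLOBAL table_cache       = 512;                -- raise gradually from 64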
Requirement: What optimization strategies can be used to speed up the insertion?
A few general suggestions, as I don't have a silver bullet for you:
I don't think you can expect things not to slow down at all on insert as the table sizes grow. Database insert times generally scale with database size; the trick is to make the overall performance acceptable given this expectation.
If things are slowing down and the CPU isn't pegged, then you are probably I/O bound on database access. If you find that this is the case, you may want to try faster drives, RAID 0, faster drive controllers, etc. You may even want to consider building the database on a solid-state drive and then copying it to a traditional hard drive after creation. SSDs should be much faster for the random-access behaviour you can expect from MySQL on a filesystem, though I understand you will 'wear them out' over time. Still, you can get a terabyte of solid-state storage under $10k.
Also take a good look at optimizing your insert procedure. Disabling indexes during the inserts as you mention, while it won't stop the gradual slowdown, should speed up the overall procedure significantly. I take from your description that you have some sort of insert script logic that does selects and inserts, not a simple LOAD of a flat file. You are doing three different queries per insert, possibly round-tripping the data multiple times between your client and the database. Especially look at that ranged select and make sure that this query alone doesn't have bad performance characteristics as the table grows. A rough sketch of the bulk-load approach follows below.
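Something along these lines, assuming the rows can be staged into a delimited file first (the file path is illustrative, and lookup_ranges is the hypothetical lookup table from the question):

-- suspend non-unique index maintenance on the MyISAM table for the duration of the load
ALTER TABLE entity_briefs DISABLE KEYS;

-- bulk-load pre-filtered, pre-deduplicated rows instead of one INSERT per record
LOAD DATA INFILE '/tmp/entity_briefs_aa.tsv'
INTO TABLE entity_briefs
(entity_id, entity_table_prefix, string_1, string_2, zip, phone);

-- or, if row-by-row logic is unavoidable, batch many rows into each statement
INSERT INTO entity_briefs (entity_id, entity_table_prefix, string_1, string_2, zip, phone)
VALUES (1, 'aa', 'name 1', 'addr 1', '11111', 1111111111),
       (2, 'aa', 'name 2', 'addr 2', '22222', 2222222222);

-- rebuild the non-unique indexes once at the end
ALTER TABLE entity_briefs ENABLE KEYS;

-- and confirm the ranged select uses an index rather than a full scan
EXPLAIN SELECT range_id FROM lookup_ranges WHERE 12345 BETWEEN range_start AND range_end;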
Another possibility may be throwing a lot more RAM at the problem and using it as a disk cache. If that "other table" that you are running those range selects on isn't being modified during your insertfest, perhaps you can get it into memory to cut down on drive seeking, if you determine that seek time is indeed the performance bound here. One way to do that is sketched below.
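For a MyISAM lookup table that stays read-only during the import, one option is to enlarge the key buffer and preload that table's index into it (lookup_ranges is the hypothetical table name used above; the buffer size is only an example):

-- make the key buffer large enough to hold the lookup table's index
SET GLOBAL key_buffer_size = 256 * 1024 * 1024;

-- preload the lookup table's index blocks into the key cache
LOAD INDEX INTO CACHE lookup_ranges;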