Optimising MySQL for parallel Import of massive data files. 1 Connection Per Table


I'm doing some preparatory work for a large website migration.

The database is around 10GB in size and several tables contain more than 15 million records. Unfortunately, the data only comes as one large mysqldump file in SQL format, due to client relations outside my remit, but you know how that goes. My goal is to minimize downtime and hence import the data as fast as possible.

I have attempted to use the standard MySQL CLI interface like so:

$ mysql database_name < superhuge_sql_file -u username -p

This is, however, super slow.

To try and speed things up, I've used awk to split the file into chunks, one per table with its associated data, and built a little shell script to try to import the tables in parallel, like so:

#!/bin/bash

# Split the dump on each DROP TABLE statement, writing one out_N file per table
awk '/DROP TABLE/{f=0; n++; print > (file="out_" n); close("out_" n-1)} f{print > file}; /DROP TABLE/{f=1}' superhuge.sql

# Import every chunk with its own mysql client, all in parallel
for (( i = 1; i <= 95; i++ ))
do
    mysql -u admin --password=thepassword database_name < /path/to/out_$i &
done
wait    # do not exit until all background imports have finished

It's worth mentioning that this is a "use once and destroy" script (passwords in scripts etc...).

Now, this works, but it still takes over 3 hours to complete on a quad-core server that is doing nothing else at present. The tables do import in parallel, but not all of them at once, and getting MySQL server information through the CLI is very slow while the import runs. I'm not sure why, but trying to access tables using the same mysql user account hangs while the import is in progress. max_user_connections is unlimited.

I have set max connections to 500 in my.cnf but have otherwise not configured MySQL on this server.
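
For reference, one quick way to see what those sessions are actually doing while the parallel imports run (using the same admin account as the script) is something like:

# List every connection and the statement it is currently running
mysql -u admin --password=thepassword -e "SHOW FULL PROCESSLIST"
# Inspect InnoDB internals for lock waits and pending I/O
mysql -u admin --password=thepassword -e "SHOW ENGINE INNODB STATUS\G"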

I've had a good hunt around but was wondering if there are any MySQL config options that will help speed this process up, or any other methods I have missed that will be quicker.


If you can use GNU parallel, check this example from the wardbekker gist:

# Split MYSQL dump file
zcat dump.sql.gz | awk '/DROP TABLE IF EXISTS/{n++}{print >"out" n ".sql" }'
# Parallel import using GNU Parallel http://www.gnu.org/software/parallel/
ls -rS *.sql | parallel --joblog joblog.txt mysql -uXXX -pYYY db_name "<"

This splits the big dump into separate per-table SQL files and then imports them in parallel.

So, to run 10 parallel jobs with GNU parallel, you can run:

ls -rS data.*.sql | parallel -j10 --joblog joblog.txt mysql -uuser -ppass dbname "<"
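
Applied to the uncompressed superhuge.sql from the question, the whole sequence might look like this (the out*.sql names and the job count are just illustrative assumptions):

# Split the plain-text dump on each DROP TABLE IF EXISTS statement
awk '/DROP TABLE IF EXISTS/{n++}{print > "out" n ".sql"}' superhuge.sql
# Import the chunks with at most 4 concurrent mysql clients
ls -rS out*.sql | parallel -j4 --joblog joblog.txt mysql -u admin --password=thepassword database_name "<"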

On OS X, the split step can be done with gunzip -c instead:

gunzip -c wiebetaaltwat_stable.sql.gz | awk '/DROP TABLE IF EXISTS/{n++}{filename = "out" n ".sql"; print > filename}'

Source: wardbekker/gist:964146


Related: Import sql files using xargs at Unix.SE


Importing the dumpfile to the server

$ sudo apt-get install pigz pv
$ zcat /path/to/folder/<dbname>_`date +\%Y\%m\%d_\%H\%M`.sql.gz | pv | mysql \
    --user=<yourdbuser> --password=<yourdbpassword> --database=<yournewdatabasename> \
    --compress --reconnect --unbuffered \
    --net_buffer_length=1048576 --max_allowed_packet=1073741824 \
    --connect_timeout=36000 --line-numbers --wait \
    --init-command="SET GLOBAL net_buffer_length=1048576; SET GLOBAL max_allowed_packet=1073741824; SET FOREIGN_KEY_CHECKS=0; SET UNIQUE_CHECKS=0; SET AUTOCOMMIT=1; FLUSH NO_WRITE_TO_BINLOG QUERY CACHE, STATUS, SLOW LOGS, GENERAL LOGS, ERROR LOGS, ENGINE LOGS, BINARY LOGS, LOGS;"

Optional: Command Arguments for connection

--host=127.0.0.1 / localhost / IP Address of the Import Server
--port=3306
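
For example, a variant of the same import pointed at a remote server (the address below is only a placeholder):

zcat dump.sql.gz | pv | mysql --host=192.0.2.10 --port=3306 --user=<yourdbuser> --password=<yourdbpassword> --database=<yournewdatabasename>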

The optional software packages help you import your database SQL file faster:

  • pv shows a progress bar while the dump streams into mysql
  • pigz/unpigz compress and decompress gzip files in parallel, for faster zipping and unzipping of the dump
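
For instance, a minimal pipeline using both, assuming the dump has been gzipped first:

# unpigz decompresses on all cores, pv shows throughput and progress
unpigz -c superhuge.sql.gz | pv | mysql -u admin --password=thepassword database_name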

Alternatively, you have a range of MySQL options for:

Export

https://dev.mysql.com/doc/refman/5.7/en/mysqldump.html

Import

https://dev.mysql.com/doc/refman/5.7/en/mysql-command-options.html

Configuration

https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html

https://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html

Here is a sample my.cnf I run on my quad-core SSD server; it usually imports a 100GB DB file in about 8 hours. You can tweak the settings to match your server and help it write much faster.

Check each config variable against the links above to match the variables and values to your MySQL server.

# Edit values to as per your Server Processor and Memory requirements.
[mysqld]
# Defaults
pid-file = /var/run/mysqld/mysqld.pid
socket   = /var/run/mysqld/mysqld.sock
log-error = /var/log/mysql/error.log
datadir = /var/lib/mysql
log_timestamps = SYSTEM
character_set_server = utf8mb4
collation_server = utf8mb4_general_ci

# InnoDB
innodb_buffer_pool_size = 48G
innodb_buffer_pool_instances = 48
innodb_log_file_size = 3G
innodb_log_files_in_group = 4
innodb_log_buffer_size = 256M
innodb_log_compressed_pages = OFF
innodb_large_prefix = ON
innodb_file_per_table = true
innodb_buffer_pool_load_at_startup = ON
innodb_buffer_pool_dump_at_shutdown = ON
innodb_autoinc_lock_mode = 2
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 360
innodb_flush_neighbors = 0
innodb_flush_method = O_DIRECT
innodb_io_capacity = 2500
innodb_io_capacity_max = 5000
innodb_read_io_threads = 64
innodb_write_io_threads = 64
innodb_monitor_enable = all
performance_schema = ON

key_buffer_size = 32M
wait_timeout = 30
interactive_timeout = 3600
max_connections = 1000
table_open_cache = 5000
open_files_limit = 8000
tmp_table_size = 32M
max_heap_table_size = 64M

# Slow/Error
log_output = file
slow_query_log = ON
slow_query_log_file = /var/log/mysql/slow_query.log
long_query_time = 10
log_queries_not_using_indexes = ON
log_slow_rate_limit = 100
log_slow_rate_type = query
log_slow_verbosity = full
log_slow_admin_statements = ON
log_slow_slave_statements = ON
slow_query_log_always_write_time = 1
slow_query_log_use_global_control = all

# Query
join_buffer_size = 32M
sort_buffer_size = 16M
read_rnd_buffer_size = 8M
query_cache_limit = 8M
query_cache_size = 8M
query_cache_type = 1

# TCP
max_allowed_packet = 1G
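
One import-time tweak not shown in the sample above (it keeps the fully durable innodb_flush_log_at_trx_commit = 1) is to relax redo-log flushing for the duration of the bulk load only; this trades crash safety for speed, so treat it as optional:

# Relax redo-log flushing while the import runs...
mysql -u admin -p -e "SET GLOBAL innodb_flush_log_at_trx_commit = 2"
# ...run the import, then restore full durability
mysql -u admin -p -e "SET GLOBAL innodb_flush_log_at_trx_commit = 1"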


Does the dump use multi-row (extended) INSERT statements? (Or maybe you can pre-process it so it does?)
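
If you can influence how the dump is produced, extended (multi-row) inserts are mysqldump's default and can be requested explicitly; the file names here are placeholders:

# --opt (on by default) already implies --extended-insert, i.e. multi-row INSERT statements;
# --skip-extended-insert would produce the slow one-row-per-INSERT form to avoid
mysqldump --opt --extended-insert -u username -p database_name | gzip > dump.sql.gz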

This guy covers a lot of the basics, for example:

  • Disable MySQL indexes, which makes the import many times faster. Before the import, run:

    ALTER TABLE `table_name` DISABLE KEYS;
    

    then after the import re-enable them:

    ALTER TABLE `table_name` ENABLE KEYS;
    
  • When using the MyISAM table type, use MySQL's INSERT DELAYED command instead, which encourages MySQL to write the data to disk while the database is idle (note that INSERT DELAYED is deprecated and is ignored in MySQL 5.7).

  • For InnoDB tables, use these extra commands to avoid a great deal of disk access:

    SET FOREIGN_KEY_CHECKS = 0;
    SET UNIQUE_CHECKS = 0;
    SET AUTOCOMMIT = 0;
    

    and these at the end:

    SET UNIQUE_CHECKS = 1;
    SET FOREIGN_KEY_CHECKS = 1;
    COMMIT;
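
One way to apply the commands above to a single-file dump without editing it is to wrap the stream, sketched here with the credentials and file name from the question:

# Prepend the speed-up settings, stream the dump, then restore the checks and commit
( echo "SET FOREIGN_KEY_CHECKS=0; SET UNIQUE_CHECKS=0; SET AUTOCOMMIT=0;"
  cat superhuge.sql
  echo "SET UNIQUE_CHECKS=1; SET FOREIGN_KEY_CHECKS=1; COMMIT;" ) | mysql -u admin --password=thepassword database_name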
    
