A long time ago on a system far, far away...
Trying to migrate a database from MySQL to PostgreSQL. All the documentation I have read covers, in great detail, how to migrate the structure. I have found very little documentation on migrating the data. The schema has 13 tables (which have been migrated successfully) and 9 GB of data.
MySQL version: 5.1.x
PostgreSQL version: 8.4.x
I want to use the R programming language to analyze the data using SQL select statements; PostgreSQL has PL/R, but MySQL has nothing (as far as I can tell).
A New Hope
Create the database location (/var has insufficient space; also, I dislike having the PostgreSQL version number everywhere -- upgrading would break scripts!):
sudo mkdir -p /home/postgres/main
sudo cp -Rp /var/lib/postgresql/8.4/main /home/postgres
sudo chown -R postgres.postgres /home/postgres
sudo chmod -R 700 /home/postgres
sudo usermod -d /home/postgres/ postgres
All good to here. Next, restart the server and configure the database using these installation instructions:
sudo apt-get install postgresql pgadmin3
sudo /etc/init.d/postgresql-8.4 stop
sudo vi /etc/postgresql/8.4/main/postgresql.conf
- Change data_directory to /home/postgres/main
sudo /etc/init.d/postgresql-8.4 start
sudo -u postgres psql postgres
\password postgres
sudo -u postgres createdb climate
pgadmin3
Use pgadmin3
to configure the database and create a schema.
The episode continues in a remote shell known as bash
, with both databases running, and the installation of a set of tools with a rather unusual logo: SQL Fairy.
perl Makefile.PL
sudo make install
sudo apt-get install perl-doc
(strangely, the package is not called perldoc)
perldoc SQL::Translator::Manual
Extract a PostgreSQL-friendly DDL and all the MySQL data:
sqlt -f DBI --dsn dbi:mysql:climate --db-user user --db-password password -t PostgreSQL > climate-pg-ddl.sql
- Edit climate-pg-ddl.sql to convert the identifiers to lowercase and insert the schema reference (using VIM):
:%s/"\([A-Z_]*\)"/\L\1/g
:%s/ TABLE / TABLE climate./g
:%s/ on / on climate./g
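If you'd rather script those edits than do them in VIM, the three substitutions can be sketched in Python (a sketch; the schema name climate is the one used throughout this question):

```python
import re

def adapt_ddl(ddl: str) -> str:
    """Apply the three VIM substitutions to the generated DDL:
    lowercase quoted UPPERCASE identifiers (dropping the quotes),
    then prefix table references with the climate schema."""
    # :%s/"\([A-Z_]*\)"/\L\1/g
    ddl = re.sub(r'"([A-Z_]*)"', lambda m: m.group(1).lower(), ddl)
    # :%s/ TABLE / TABLE climate./g
    ddl = ddl.replace(" TABLE ", " TABLE climate.")
    # :%s/ on / on climate./g
    ddl = ddl.replace(" on ", " on climate.")
    return ddl
```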
mysqldump --skip-add-locks --complete-insert --no-create-db --no-create-info --quick --result-file="climate-my.sql" --databases climate --skip-comments -u root -p
It might be worthwhile to simply rename the tables and columns in MySQL to lowercase:
select concat( 'RENAME TABLE climate.', TABLE_NAME, ' to climate.', lower(TABLE_NAME), ';' ) from INFORMATION_SCHEMA.TABLES where TABLE_SCHEMA='climate';
- Execute the commands from the previous step.
- There is probably a way to do the same for columns; I changed them manually because it was faster than figuring out how to write the query.
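For the columns, a query over INFORMATION_SCHEMA.COLUMNS could generate the statements much like the table query above; the wrinkle is that MySQL's CHANGE clause requires repeating the column type, so COLUMN_TYPE must be selected too. A sketch of the statement builder (the query would feed it TABLE_NAME, COLUMN_NAME, and COLUMN_TYPE; the function name is my own):

```python
def rename_column_sql(schema: str, table: str, column: str, column_type: str) -> str:
    """Build one MySQL statement that lowercases a column name.
    MySQL's CHANGE clause must repeat the column type, which is
    why INFORMATION_SCHEMA.COLUMNS needs to supply COLUMN_TYPE."""
    return (f"ALTER TABLE {schema}.{table} "
            f"CHANGE `{column}` `{column.lower()}` {column_type};")
```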
The Database Strikes Back
Recreate the structure in PostgreSQL as follows:
- Switch to pgadmin3
- Click the Execute arbitrary SQL queries icon
- Open climate-pg-ddl.sql
- Search for TABLE " and replace with TABLE climate." (inserting the schema name climate)
- Search for on " and replace with on climate." (inserting the schema name climate)
- Press F5 to execute
This results in:
Query returned successfully with no result in 122 ms.
Replies of the Jedi
At this point I am stumped.
- Where do I go from here (what are the steps) to convert climate-my.sql to climate-pg.sql so that it can be executed against PostgreSQL?
- How do I make sure the indexes are copied over correctly (to maintain referential integrity; I don't have constraints at the moment, to ease the transition)?
- How do I ensure that adding new rows in PostgreSQL will start enumerating from the index of the last row inserted (and not conflict with an existing primary key from the sequence)?
- How do you ensure the schema name comes through when transforming the data from MySQL to PostgreSQL inserts?
Resources
A fair bit of information was needed to get this far:
- https://help.ubuntu.com/community/PostgreSQL
- http://articles.sitepoint.com/article/site-mysql-postgresql-1
- http://wiki.postgresql.org/wiki/Converting_from_other_Databases_to_PostgreSQL#MySQL
- http://pgfoundry.org/frs/shownotes.php?release_id=810
- http://sqlfairy.sourceforge.net/
Thank you!
What I usually do for such migrations is two-fold:
- Extract the whole database definition from MySQL and adapt it to PostgreSQL syntax.
- Go over the database definition and transform it to take advantage of functionality in PostgreSQL that doesn't exist in MySQL.
Then do the conversion, and write a program in whatever language you are most comfortable with that accomplishes the following:
- Reads the data from the MySQL database.
- Performs whatever transformation is necessary on the data to be stored in the PostgreSQL database.
- Saves the now-transformed data in the PostgreSQL database.
Redesign the tables for PostgreSQL to take advantage of its features.
If you just do something like use a sed script to convert the SQL dump from one format to the other, all you are doing is putting a MySQL database in a PostgreSQL server. You can do that, and there will still be some benefit from doing so, but if you're going to migrate, migrate fully.
It will involve a bit more up-front time, but I have yet to come across a situation where it isn't worth it.
Convert the mysqldump file to a PostgreSQL-friendly format
Convert the data as follows (do not use mysql2pgsql.perl):
Escape the quotes.
sed "s/\\\'/\'\'/g" climate-my.sql | sed "s/\\\r/\r/g" | sed "s/\\\n/\n/g" > escaped-my.sql
Replace USE "climate"; with a search path, and turn the dump's leading comments into SQL comments:
sed "s/USE \"climate\";/SET search_path TO climate;/g" escaped-my.sql | sed "s/^\/\*/--/" > climate-pg.sql
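The escaping and search-path rewrites can also be done in one Python pass instead of chained sed processes (a sketch; the climate schema is the one from this answer):

```python
def mysql_dump_to_pg(sql: str) -> str:
    """Rewrite mysqldump text for PostgreSQL, mirroring the sed
    pipeline: fix quote/newline escapes, swap USE for search_path,
    and turn lines opening with /* into -- comments."""
    sql = sql.replace("\\'", "''")   # MySQL \' -> standard SQL ''
    sql = sql.replace("\\r", "\r")   # literal \r -> carriage return
    sql = sql.replace("\\n", "\n")   # literal \n -> newline
    sql = sql.replace('USE "climate";', "SET search_path TO climate;")
    lines = [("--" + line[2:] if line.startswith("/*") else line)
             for line in sql.splitlines(True)]
    return "".join(lines)
```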
Connect to the database.
sudo su - postgres
psql climate
Set the encoding (mysqldump ignores its encoding parameter) and then execute the script.
\encoding iso-8859-1
\i climate-pg.sql
This series of steps will probably not work for complex databases with many mixed types. However, it works for integers, varchars, and floats.
Indexes, primary keys, and sequences
Since mysqldump included the primary keys when generating the INSERT statements, they take precedence over the tables' automatic sequences. Upon inspection, the sequences for all tables remained at 1.
Set the sequence after import
Use the ALTER SEQUENCE command to set each sequence to whatever value is needed.
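As a sketch, the statements can be generated per table. This uses setval (which, unlike ALTER SEQUENCE ... RESTART, accepts a computed value) and assumes the default SERIAL sequence naming of <table>_<column>_seq; adjust both if your schema differs:

```python
def reset_sequence_sql(schema: str, table: str, pk: str = "id") -> str:
    """Build a statement that bumps a table's sequence past the
    highest imported primary key, so new INSERTs don't collide.
    Assumes the default SERIAL naming <table>_<pk>_seq."""
    seq = f"{schema}.{table}_{pk}_seq"
    return (f"SELECT setval('{seq}', "
            f"(SELECT COALESCE(MAX({pk}), 1) FROM {schema}.{table}));")
```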
Schema Prefix
There is no need to prefix tables with the schema name. Use:
SET search_path TO climate;
If you've converted the schema, then migrating the data will be the easy part:
dump the schema from PostgreSQL (you said you've already converted the schema to Postgres, so we dump it now; we will be dropping and recreating the target database to start from a clean slate):
pg_dump dbname > /tmp/dbname-schema.sql
split the schema into two parts: /tmp/dbname-schema-1.sql containing the CREATE TABLE statements, and /tmp/dbname-schema-2.sql containing the rest. PostgreSQL needs the data imported after the table definitions, but before the foreign keys, triggers, etc.

recreate the database with only part 1 of the schema:
drop database dbname
create database dbname
\i /tmp/dbname-schema-1.sql
-- now we have tables without data, triggers, foreign keys etc.
import data:
( echo 'start transaction'; mysqldump --skip-quote-names dbname | grep ^INSERT; echo 'commit' ) | psql dbname
-- now we have tables with data, but without triggers, foreign keys etc.
The --skip-quote-names option was added in MySQL 5.1.3, so if you have an older version, install a newer MySQL temporarily in /tmp/mysql (configure --prefix=/tmp/mysql && make install should do) and use /tmp/mysql/bin/mysqldump.

import the rest of the schema:
psql dbname
start transaction
\i /tmp/dbname-schema-2.sql
commit
-- we're done
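The data-import step above (a transaction wrapped around only the INSERT lines of the dump) can also be sketched as a small Python filter in place of the echo/grep pipeline:

```python
def wrap_inserts(dump_lines):
    """Keep only the INSERT statements from a mysqldump stream and
    wrap them in a single transaction, mirroring the shell pipeline
    (echo 'start transaction'; ... grep ^INSERT; echo 'commit')."""
    yield "START TRANSACTION;\n"
    for line in dump_lines:
        if line.startswith("INSERT"):
            yield line
    yield "COMMIT;\n"

# usage: pipe mysqldump output through this filter into psql, e.g.
# sys.stdout.writelines(wrap_inserts(sys.stdin))
```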
Check out etlalchemy. It allows you to migrate from MySQL to PostgreSQL, or between several other databases, in 4 lines of Python. You can read more about it here.
To install: pip install etlalchemy
To run:
from etlalchemy import ETLAlchemySource, ETLAlchemyTarget
# Migrate from MySQL to PostgreSQL
src = ETLAlchemySource("mysql://user:passwd@hostname/dbname")
tgt = ETLAlchemyTarget("postgresql://user:passwd@hostname/dbname",
drop_database=True)
tgt.addSource(src)
tgt.migrate()