I'm writing a PHP script which imports data from tables in two different databases into another one. I've got it working ok with sample data, except now I've moved to using data closer resembling its final use: 25+ million records per table, and growing daily. Obviously, efficiency is a bit of a concern.
Here's how it current works. I copy the table structure, adding a couple of extra fields to maintain key integrity:
other1.someTable (field1, field2, field3) Pk = [field1, field2]
other2.someTable (field1, field2, field3) Pk = [field1, field2]
mydb.someTable (id, source, field1, field2, field3)
Pk = id, Unique key = [source, field1, field2]
And here's the SQL. It has a ON DUPLICATE KEY UPDATE
statement because this import needs to be done regularly, updating the data in "mydb". Thankfully, records won't be deleted from the "other" database (i think!).
INSERT INTO mydb.someTable (source, field1, field2, field3)
SELECT 1, field1, field2, field3 FROM other1.someTable
ON DUPLICATE KEY UPDATE field1 = field1, field2 = field2, field3 = field3;
INSERT INTO mydb.someTable (source, field1, field2, field3)
SELECT 2, field1, field2, field3 FROM other2.someTable;
ON DUPLICATE KEY UPDATE field1 = field1, field2 = field2, field3 = field3;
My question is this: Is this the best possible way to do this? Are the开发者_开发问答re any other methods which might be faster, considering there are going to be millions and millions of records, totaling many gigabytes of data per table?
Are you sure there are no duplicate IDs? Or, if there are, are you always going to overwrite them with data from the second database?
Additionally, do you do any processing on the data you obtain from DB1 / DB2 prior to inserting / updating it into 3rd database?
If the answers are "yes" to the first question and "no" to the third, it will likely be a lot faster to use LOAD DATA INFILE. Select data from DB1 and DB2 and load them in sequence.
Well on your On Duplicate Key Update, there is not need to update field1 and field2 as they are the key and have been matched.
The other question is: do you care if 1 sets field3 to one value and then 2 sets it to another -- and again tomorrow, and the day after -- is this something to know happened?
Have you considered using federated tables?
精彩评论