We're doing a sort of roll-your-own replication from a SQL Server 2008 R2 database to a MySQL 5.1 database. This isn't true replication, as the MySQL box is acting as permanent storage and the SQL Server is what we're forced to use as a data aggregator.
So, I've written a stored procedure that checks the latest row inserted on my table in question on the MySQL database, populates a temporary table on the SQL Server end with a new batch of rows to insert back on the MySQL box, and does the inserts. The problem is, this is painfully slow. I have a backlog of >24 million rows to send from the SQL Server box to MySQL, and at the rate I'm getting <2 rows/second, it's going to take nearly 6 months to complete and will never be able to keep up with new data as it hits the SQL Server database.
The boxes are not on the same network--the MySQL box sits pretty darn near the country's connection to the Internet backbone, but the SQL Server box is (for reasons beyond our control) only connected through a business DSL line (not positive of the upstream speed). If I connect through the command line (as opposed to querying through the linked server in SQL Server), I'm able to insert rows into the MySQL database at an average of ~0.03 seconds per row. This leads me to believe that either my stored procedure is incredibly inefficient, or OPENQUERY/linked servers are inherently slow. Here is the stored procedure:
DECLARE @LastSensorLogDateFormatted DATETIME
DECLARE @LastSensorLogDate VARCHAR(30)
DECLARE @LastSensorLogMillis INT
DECLARE @LastSensorLogEibaddress VARCHAR(30)
DECLARE @SensorLogReplicated BIGINT
DECLARE @counter INT
-- Create a temporary table to store last write
-- to MySQL database on CASALA-DB01
CREATE TABLE #SensorLogRecord
(LastSensorLogDate VARCHAR(30)
, LastSensorLogMillis INT
, LastSensorLogEibaddress VARCHAR(30))
-- Dump result of query on CASALA-DB01 into our temporary table
INSERT INTO #SensorLogRecord
(LastSensorLogDate, LastSensorLogMillis, LastSensorLogEibaddress )
(SELECT date, date_millis, eib_address
FROM
OPENQUERY(MYSQL4, 'SELECT date, date_millis, eib_address
FROM cabie.sensors_log_redux ORDER BY date desc LIMIT 1'))
-- Store the last sensor log date and EIB address
-- from our temporary table into local vars
SELECT
@LastSensorLogDate = LastSensorLogDate,
@LastSensorLogMillis = LastSensorLogMillis,
@LastSensorLogEibaddress = LastSensorLogEibaddress
FROM #SensorLogRecord
SET @LastSensorLogDateFormatted =
CAST((LEFT(@LastSensorLogDate, 20) +
CAST(@LastSensorLogMillis as VARCHAR)) AS DATETIME)
SET @counter = 0
WHILE (1=1)
BEGIN
CREATE TABLE #RecordHolder (Id BIGINT)
INSERT INTO #RecordHolder (Id)
SELECT TOP 1000 Sensor_Id FROM dbo.Sensors_Log
WHERE Sensor_Id NOT IN (SELECT Row_Id FROM dbo.Sensors_Archivals)
AND dbo.Sensors_Log.Date <= GETDATE()
AND dbo.Sensors_Log.EibAddress <> @LastSensorLogEibaddress
AND dbo.Sensors_Log.Date <> @LastSensorLogDateFormatted
INSERT OPENQUERY(MYSQL4,
'SELECT date, eib_address, ip_address, value, application, phys_address
FROM sensors_log_redux_holding')
SELECT
CONVERT(VARCHAR, GNH.dbo.Sensors_Log.Date,121),
GNH.dbo.Sensors_Log.EibAddress,
GNH.dbo.Sensors_Log.Ip_Address,
GNH.dbo.Sensors_Log.Value,
GNH.dbo.Sensors_Log.Application,
GNH.dbo.Sensors_Log.Phys_Address
FROM GNH.dbo.Sensors_Log
JOIN #RecordHolder
ON (#RecordHolder.Id = GNH.dbo.Sensors_Log.Sensor_Id)
INSERT INTO dbo.Sensors_Archivals (Row_Id) SELECT Id FROM #RecordHolder
DROP TABLE #RecordHolder
-- Advance the counter by the batch size so the loop can terminate
SET @counter = @counter + 1000
IF (@counter >= 1000000)
BREAK
END
I know that's messy...we've been trying different approaches to getting this to work, so there are probably unused variables, etc. For what it's worth, just running that first SELECT query on the linked server takes upwards of 40 seconds.
EDIT: Here's my linked server setup:
- Provider: Microsoft OLE DB Provider for ODBC Drivers
- Product Name: MySQL 5.1
- Provider String: DRIVER={MySQL ODBC 5.1 Driver};SERVER=192.168.17.5;DATABASE=cabie;USERNAME=ourUsername;PASSWORD=ourPassword
- Advanced Options:
  - Collation Compatible: FALSE
  - Data Access: TRUE
  - RPC: FALSE
  - RPC Out: FALSE
  - Use Remote Collation: TRUE
  - Collation Name: (blank)
  - Connection Timeout: 0
  - Query Timeout: 0
  - Distributor: FALSE
  - Publisher: FALSE
  - Subscriber: FALSE
  - Lazy Schema Validation: FALSE
  - Enable Promotion of Distributed Transactions: TRUE
Here's part of the problem...
# Provider: Microsoft OLE DB Provider for ODBC Drivers
# Provider String: DRIVER={MySQL ODBC 5.1 Driver};...
ODBC is famously slow and you are using it on both ends.
An alternative approach would be to write a small bridge application in Delphi or C# that connects to both databases through their native client DLLs.
I know that Delphi with the open-source ZEOS database components, using the client DLLs freely available from Microsoft and MySQL, will transfer data many times faster than ODBC does.
C#, or any modern programming language that uses the native client DLLs, will do the job just as well. If you use an in-between program to bridge the two databases, make sure it does not fall back to ODBC; many (mainly older) programming environments do this out of laziness or cheapness, with the same poor transfer speeds you're getting now.
Hope this helps...
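A minimal sketch of such a bridge, assuming Python with pyodbc (SQL Server native client) and mysql-connector-python (MySQL native client). The library choices, server names, and credentials in `main` are illustrative assumptions, not part of the original setup; the table and column names are taken from the question. The transfer loop itself is factored out so it works with any pair of fetch/insert callables:

```python
import itertools
from typing import Callable, Iterable, Iterator, List, Sequence

Row = tuple  # one sensor-log row


def batches(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk


def transfer(fetch: Callable[[], Iterable[Row]],
             insert_many: Callable[[Sequence[Row]], None],
             batch_size: int = 1000) -> int:
    """Stream rows from `fetch` into `insert_many`, one multi-row
    insert per batch, and return the number of rows moved."""
    total = 0
    for chunk in batches(fetch(), batch_size):
        insert_many(chunk)  # one round-trip per batch, not one per row
        total += len(chunk)
    return total


def main() -> None:
    # Hypothetical wiring with native drivers; replace the connection
    # details with your own. Requires pyodbc and mysql-connector-python.
    import pyodbc
    import mysql.connector

    src = pyodbc.connect("DRIVER={SQL Server Native Client 10.0};"
                         "SERVER=sqlserver-box;DATABASE=GNH;"
                         "Trusted_Connection=yes")
    dst = mysql.connector.connect(host="192.168.17.5", database="cabie",
                                  user="ourUsername", password="ourPassword")

    def fetch() -> Iterable[Row]:
        cur = src.cursor()
        cur.execute("SELECT CONVERT(VARCHAR, Date, 121), EibAddress, "
                    "Ip_Address, Value, Application, Phys_Address "
                    "FROM dbo.Sensors_Log")
        return cur  # pyodbc cursors iterate row by row

    def insert_many(rows: Sequence[Row]) -> None:
        cur = dst.cursor()
        cur.executemany(
            "INSERT INTO sensors_log_redux_holding "
            "(date, eib_address, ip_address, value, application, phys_address) "
            "VALUES (%s, %s, %s, %s, %s, %s)", list(rows))
        dst.commit()

    print(transfer(fetch, insert_many), "rows transferred")
```

The point is that `executemany` over a native connection pays the DSL line's latency once per 1000-row batch instead of once per row, which is where a linked-server `INSERT ... OPENQUERY` tends to lose.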