I'm having some issues dealing with updating and inserting millions of rows in a MySQL database. I need to flag 50 million rows in Table A, insert some data from those flagged 50 million rows into Table B, then update those same 50 million rows in Table A again. There are about 130 million rows in Table A and 80 million in Table B.
This needs to happen on a live server without denying access to other queries from the website. The problem is that while this stored procedure is running, other queries from the website get blocked by its locks and the HTTP requests time out.
Here's the gist of the SP, simplified a little for illustration purposes:
DELIMITER $$
CREATE DEFINER=`user`@`localhost` PROCEDURE `MyProcedure`(
totalLimit int
)
BEGIN
SET @totalLimit = totalLimit;
/* Prepare new rows to be issued */
PREPARE STMT FROM 'UPDATE tableA SET `status` = "Being-Issued" WHERE `status` = "Available" LIMIT ?';
EXECUTE STMT USING @totalLimit;
DEALLOCATE PREPARE STMT;
/* Insert new rows for usage into tableB */
INSERT INTO tableB (/* my fields */)
SELECT /* some values from TableA */
FROM tableA
WHERE `status` = "Being-Issued";
/* Set rows as being issued */
UPDATE tableA SET `status` = 'Issued' WHERE `status` = 'Being-Issued';
END$$
DELIMITER ;
Processing 50M rows three times will be slow irrespective of what you're doing.
Make sure your updates affect smaller, disjoint sets of rows, and execute them one by one rather than all of them within a single transaction.
If you're doing this already and MySQL is misbehaving, try this slight tweak to your code:
create a temporary table

begin
    insert into tmp_table
        select your stuff
        limit ?
        for update
    do your update on A using tmp_table
commit

begin
    do your insert on B using tmp_table
    do your update on A using tmp_table
commit
This should keep locks held for a minimal time.
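For concreteness, here's a rough MySQL sketch of that outline against the question's tables. It assumes tableA has an integer primary key id and uses a 10,000-row batch, and it keeps the question's elided column lists as comments; treat it as an untested illustration, not a drop-in implementation:

CREATE TEMPORARY TABLE tmp_batch (id INT PRIMARY KEY);
START TRANSACTION;
    /* Grab and lock one small batch of available rows */
    INSERT INTO tmp_batch (id)
        SELECT id
        FROM tableA
        WHERE `status` = 'Available'
        LIMIT 10000
        FOR UPDATE;
    /* First update on A: mark only this batch */
    UPDATE tableA a
    JOIN tmp_batch t ON a.id = t.id
    SET a.`status` = 'Being-Issued';
COMMIT;
START TRANSACTION;
    /* Insert on B using the batch */
    INSERT INTO tableB (/* my fields */)
        SELECT /* some values from tableA */
        FROM tableA a
        JOIN tmp_batch t ON a.id = t.id;
    /* Second update on A: finish marking the batch */
    UPDATE tableA a
    JOIN tmp_batch t ON a.id = t.id
    SET a.`status` = 'Issued';
COMMIT;
DROP TEMPORARY TABLE tmp_batch;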
What about this? It basically calls the original stored procedure in a loop until the total amount needed is reached, with a sleep period between calls (2 seconds, say) to allow other queries time to process.
increment: the amount to do at one time (10,000 in this case)
totalLimit: the total amount to be processed
sleepSec: the number of seconds to rest between calls
DELIMITER $$
CREATE PROCEDURE `BatchedProcedure`( /* name is illustrative; header restored for completeness */
increment int,
totalLimit int,
sleepSec int
)
BEGIN
SET @x = 0;
REPEAT
DO SLEEP(sleepSec); /* DO avoids returning a result set on every iteration */
SET @x = @x + increment;
CALL OriginalProcedure( increment );
UNTIL @x >= totalLimit
END REPEAT;
END$$
DELIMITER ;
Obviously it could use a little math to make sure the increment doesn't go over the total limit if it's not evenly divisible, but it appears to work (by work I mean it allows other queries to still be processed from web requests), and seems to be faster overall as well.
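For example, the loop body could cap the final batch with LEAST() (a minimal sketch; @thisBatch is just an illustrative variable name):

REPEAT
    DO SLEEP(sleepSec);
    /* Never let the running total pass totalLimit, even when it
       is not evenly divisible by increment */
    SET @thisBatch = LEAST(increment, totalLimit - @x);
    SET @x = @x + @thisBatch;
    CALL OriginalProcedure(@thisBatch);
UNTIL @x >= totalLimit
END REPEAT;

Calling it for the 50M-row job with a 2-second pause would then look like CALL BatchedProcedure(10000, 50000000, 2);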
Any insight here? Is this a good idea? Bad idea?