开发者

Populating a table with a large amount of data from a temporary table - MySQL

开发者 https://www.devze.com 2023-02-02 07:53 出处:网络
Ok I have a temporary MySQL table with 135,000 rows, from this temporary table I want to populate several other tables.

Ok I have a temporary MySQL table with 135,000 rows, from this temporary table I want to populate several other tables.

First of all this is the structure of the temporary table

CREATE TEMPORARY TABLE TVTEMPTABLE ( PROGTITLE TEXT, SUBTITLE TEXT, EPISODE TEXT, YR YEAR, DIRECTOR TEXT, PERFORMERS TEXT, PREMIERE BOOL, FILM BOOL, RPEAT BOOL, SUBTITLES BOOL, WIDESCREEN BOOL, NEWSERIES BOOL, DEAFSIGNED BOOL, BNW BOOL, STARRATING TINYINT, CERTIFICATE VARCHAR(5), GENRE VARCHAR(50), DESCRIPTION TEXT, CHOICE BOOL, PROGDATE DATE, STARTIME TIME, ENDTIME TIME, DURATION INT, CHANNELID INT NOT NULL)

And this is the structure of one of the tables I plan to populate from this.

CREATE TABLE PROGRAMME ( PROGRAMMEID INT NOT NULL AUTO_INCREMENT, GENREID INT NOT NULL, PROGTITLE VARCHAR(50), YR YEAR, DIRECTOR VARCHAR(50), PERFORMERS TEXT, FILM BOOL, WIDESCREEN BOOL, BNW BOOL, CERTIFICATE VARCHAR(5), DESCRIPTION TEXT, PRIMARY KEY(PROGRAMMEID), INDEX (GENREID), FOREIGN KEY (GENREID) REFERENCES GENRE(GENREID) ) ENGINE=INNODB;

And this is how I do my insert to the programme table

INSE开发者_JS百科RT INTO PROGRAMME ( GENREID, PROGTITLE, YR, DIRECTOR, PERFORMERS, FILM, WIDESCREEN, BNW, CERTIFICATE, DESCRIPTION) SELECT G.GENREID, T.PROGTITLE, T.YR, T.DIRECTOR, T.PERFORMERS, T.FILM, T.WIDESCREEN, T.BNW, T.CERTIFICATE, T.DESCRIPTION FROM TVTEMPTABLE T,GENRE G WHERE G.GENRENAME = T.GENRE AND NOT EXISTS ( SELECT * FROM PROGRAMME P WHERE P.PROGTITLE = T.PROGTITLE )

This is however taking a very very long time to do, how should I approach this?

Thanks, Paul

Ok thanks guys still having a few issues with this I'm trying the left join example, I've found however that if the table I'm inserting into is empty to begin with though, then it inserts duplicates. Here's a simple example

CREATE TEMPORARY TABLE TEMP(
    GENRENAME TEXT);

CREATE TABLE GENRE(
    GENREID INT NOT NULL AUTO_INCREMENT,
    GENRENAME TEXT, PRIMARY KEY(GENREID)
) ENGINE=INNODB;

INSERT INTO TEMP(
    GENRENAME)
VALUES("news");

INSERT INTO TEMP(
    GENRENAME)
VALUES("news");

This inserts the "news" genre into the temporary table twice. Now if I run this SQL command

INSERT INTO GENRE(
    GENRENAME)
SELECT
    T.GENRENAME
FROM
    TEMP T
LEFT JOIN
    GENRE G ON G.GENRENAME=T.GENRENAME
WHERE
    G.GENRENAME IS NULL;

It inserts "news" twice into the genre table which is wrong. If I run the same command again, it correctly doesn't insert any new rows.


How about this:

INSERT INTO PROGRAMME ( GENREID, PROGTITLE, YR, DIRECTOR, PERFORMERS, FILM, WIDESCREEN, BNW, CERTIFICATE, DESCRIPTION)
SELECT G.GENREID, T.PROGTITLE, T.YR, T.DIRECTOR, T.PERFORMERS, T.FILM, T.WIDESCREEN, T.BNW, T.CERTIFICATE, T.DESCRIPTION
FROM TVTEMPTABLE T,GENRE G
WHERE G.GENRENAME = T.GENRE
AND T.PROGTITLE NOT IN ( SELECT DISTINCT P.PROGTITLE FROM PROGRAMME P) 

I believe your NOT EXISTS has to execute for each selected row. Replace it with a single static subquery and check NOT IN on it.


You are doing a (possibly large) sub-select for each row.

I would suggest doing a LEFT JOIN against PROGRAMME, and then only inserting rows where the join result is NULL, as follows:

INSERT INTO PROGRAMME (
    GENREID, PROGTITLE, YR, DIRECTOR,
    PERFORMERS, FILM, WIDESCREEN, BNW,
    CERTIFICATE, DESCRIPTION)
SELECT
    G.GENREID, T.PROGTITLE, T.YR, T.DIRECTOR,
    T.PERFORMERS, T.FILM, T.WIDESCREEN, T.BNW,
    T.CERTIFICATE, T.DESCRIPTION
FROM
    TVTEMPTABLE T
    INNER JOIN GENRE G ON G.GENRENAME=T.GENRE
    LEFT JOIN PROGRAMME P ON P.PROGTITLE=T.PROGTITLE
WHERE
    P.PROGTITLE IS NULL

As an aside: have you considered (a) pretty-printing your code to be more readable and (b) not using all-caps field names?


i would first look at the performance of the embedded select statement in your insert, especially the NOT EXISTS clause. Make sure you have good indexes there.

another thought is to break it into smaller chunks so you do not have rollback space problems. So see if you can insert only 1000 or 10,000 rows at a time, then commit, then run again. etc.


wrap your insert statement in a transaction.

start transaction;

insert into programme (...) select ... from tvtemptable ...

commit;

if it's still slow then post the explain plan of the select portion of your insert statement so we can see what's going on :P


This was a combination of using left/inner join and the distinct keyword.


Create an index for P.PROGTITLE and T.PROGTITLE.

0

精彩评论

暂无评论...
验证码 换一张
取 消