开发者

merging tables which consist of 17 million records

开发者 https://www.devze.com 2023-03-14 18:51 出处:网络
I have 3 tables in which 2 tables have 200 000 records and another table of 1 800 000records. I do merge these 3 tables using 2 contraints that is OCN and TIMESTAMP(month,year). first two tables has c

I have 3 tables in which 2 tables have 200 000 records and another table of 1 800 000 records. I do merge these 3 tables using 2 contraints that is OCN and TIMESTAMP(month,year). first two tables has columns for month and year as Monthx (which includes both month,date and year). and other table as seperate columns for ea开发者_开发问答ch month and year. I gave the query as,

mysql--> insert into trail 
  select * from A,B,C 
  where A.OCN=B.OCN 
  and B.OCN=C.OCN 
  and C.OCN=A.OCN 
  and date_format(A.Monthx,'%b')=date_format(B.Monthx,'%b') 
  and date_format(A.Monthx,'%b')=C.IMonth
  and date_format(B.Monthx,'%b')=C.month
  and year(A.Monthx)=year(B.Monthx)
  and year(B.Monthx)=C.Iyear
  and year(A.Monthx)=C.Iyear

I gave this query 4days before its still running.could u tell me whether this query is correct or wrong and provide me a exact query..(i gave tat '%b' because my C table has a column which has months in the form JAN,MAR).


Please don't use implicit where joins, bury it in 1989, where it belongs. Use explicit joins instead

select * from a inner join b on (a.ocn = b.ocn and 
date_format(A.Monthx,'%b')=date_format(B.Monthx,'%b') ....

This select part of the query (had to rewrite it because I refuse to deal with '89 syntax)

select * from A
inner join B on (
  A.OCN=B.OCN 
  and date_format(A.Monthx,'%b')=date_format(B.Monthx,'%b') 
  and year(A.Monthx)=year(B.Monthx)
  )
inner join C on (
  C.OCN=A.OCN 
  and date_format(A.Monthx,'%b')=C.IMonth 
  and date_format(B.Monthx,'%b')=C.month
  and year(B.Monthx)=C.Iyear
  and year(A.Monthx)=C.Iyear
  )

Has a lot of problems.

  1. using a function on a field will kill any opportunity to use an index on that field.
  2. you are doing a lot of duplicate test. if (A = B) and (B = C) then it logically follows that (A = C)
  3. the translations of the date fields take a lot of time

I would suggest you rewrite your tables to use fields that don't need translating (using functions), but can be compared directly.
A field like yearmonth : char(6) e.g. 201006 can be indexed and compared much faster.

If the table A,B,C have a field called ym for short than your query can be:

INSERT INTO TRAIL
SELECT a.*, b.*, c.*  FROM a
INNER JOIN b ON (
  a.ocn = b.ocn 
  AND a.ym = b.ym
  )
INNER JOIN c ON (
  a.ocn = c.ocn 
  AND a.ym = c.ym
  );

If you put indexes on ocn (primary index probably) and ym the query should run about a million rows a second (or more).


To test if your query is ok, import a small subset of records from A, B and C to a temporary database and test it their.

You have redundancies in your implicit JOIN because you are joining A.OCN with B.OCN, B.OCN with C.OCN and then C.OCN to A.OCN, on of those can be deleted. If A.OCN = B.OCN and B.CON = C.OCN, A.OCN = C.OCN is implied. Further, I guess you have redundancies in your date comparisons.

0

精彩评论

暂无评论...
验证码 换一张
取 消