开发者

SQL optimization on multi-table queries

开发者 https://www.devze.com 2023-02-04 09:57 出处:网络
I\'m working with a large database of unemployment data built from the files available from the department of labor statistics here:

I'm working with a large database of unemployment data built from the files available from the department of labor statistics here:

ftp://ftp.bls.gov/pub/time.series/la/

I'm building several queries to pull up sections of data, and after adding indexes to all the tables I could 开发者_运维百科think of many are still taking several seconds or more.

My first query returns all of the sub-areas within a state that there is unemployment data available for. After adding indexes to the Series table and the Area table, execution time went from 2 to 0.9 seconds but I can't get it any lower. I think the DISTINCT is making it take so long, but it's necessary to keep the records from returning duplicates.

SELECT DISTINCT series.area_code, area.area_text FROM Alabama 
LEFT JOIN series ON Alabama.series_id=series.series_id 
LEFT JOIN area ON series.area_code=area.area_code
WHERE area.area_type_code != 'A';

My second query, which actually pulls up the data for each of these areas, only takes 0.3 seconds even if it pulls up far more records:

USE unemploymentdata;
SELECT DISTINCT * FROM Alabama 
LEFT JOIN series ON Alabama.series_id=series.series_id 
LEFT JOIN area ON series.area_code=area.area_code
WHERE area.area_type_code != 'A' 
AND area.area_code = 'CA011420'
AND year > 2000;

I have very little knowledge about database and query optimization at this point - can anyone give me any pointers on my queries, or on adding indexes, etc to the database itself to speed up my transactions?


My guess is that the Area code / text are slowly changing data so why not put them into their own table. You can then substitute them on teh alabama table with an id which will shrink the size of that table down which will make it quicker to read from that table too.

As you aren't actually using any data from the alabama table in the first query this may be quicker without a table change.

SELECT DISTINCT series.area_code, area.area_text 
FROM  series
LEFT JOIN area ON series.area_code=area.area_code
WHERE area.area_type_code != 'A';
and series_id in (select series_id from Alabama)


Your problem is probably the LEFT join. Did you mean to make it a normal join? (The left join will return null if there are no records in the table on the right that match)


SELECT  DISTINCT 
        series.area_code, 
        area.area_text 
FROM    Alabama LEFT JOIN   
        series ON Alabama.series_id=series.series_id LEFT JOIN  
        area ON series.area_code=area.area_code
WHERE   area.area_type_code != 'A';

Can you change it to be INNER JOINS instead?

SELECT   DISTINCT 
         series.area_code, 
         area.area_text 
FROM     Alabama INNER JOIN 
         series ON Alabama.series_id=series.series_id INNER JOIN    
         area ON series.area_code=area.area_code
WHERE   area.area_type_code != 'A'

Is the alabama table needed at all? Do you get the same results if you remove it like I have here?

SELECT   DISTINCT 
         series.area_code, 
         area.area_text 
FROM     series INNER JOIN  
         area ON series.area_code = area.area_code
WHERE   area.area_type_code != 'A'

Same question about the series table, can it be removed?

SELECT   DISTINCT 
         area.area_code, 
         area.area_text 
FROM     area 
WHERE   area.area_type_code != 'A'

If not, then indexes it is.

First the area table. Add a index with the following columns

area_type_code, area_code, area_text

the series table, (test which is the faster.)

series_id, area_code

or

area_code, series_id

the alabama table create simple index with the following column

series_id
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号