I'm working with a large database of unemployment data built from the files available from the department of labor statistics here:
ftp://ftp.bls.gov/pub/time.series/la/
I'm building several queries to pull up sections of data, and after adding indexes to all the tables I could 开发者_运维百科think of many are still taking several seconds or more.
My first query returns all of the sub-areas within a state that there is unemployment data available for. After adding indexes to the Series table and the Area table, execution time went from 2 to 0.9 seconds but I can't get it any lower. I think the DISTINCT is making it take so long, but it's necessary to keep the records from returning duplicates.
SELECT DISTINCT series.area_code, area.area_text FROM Alabama
LEFT JOIN series ON Alabama.series_id=series.series_id
LEFT JOIN area ON series.area_code=area.area_code
WHERE area.area_type_code != 'A';
My second query, which actually pulls up the data for each of these areas, only takes 0.3 seconds even if it pulls up far more records:
USE unemploymentdata;
SELECT DISTINCT * FROM Alabama
LEFT JOIN series ON Alabama.series_id=series.series_id
LEFT JOIN area ON series.area_code=area.area_code
WHERE area.area_type_code != 'A'
AND area.area_code = 'CA011420'
AND year > 2000;
I have very little knowledge about database and query optimization at this point - can anyone give me any pointers on my queries, or on adding indexes, etc to the database itself to speed up my transactions?
My guess is that the Area code / text are slowly changing data so why not put them into their own table. You can then substitute them on teh alabama table with an id which will shrink the size of that table down which will make it quicker to read from that table too.
As you aren't actually using any data from the alabama table in the first query this may be quicker without a table change.
SELECT DISTINCT series.area_code, area.area_text
FROM series
LEFT JOIN area ON series.area_code=area.area_code
WHERE area.area_type_code != 'A';
and series_id in (select series_id from Alabama)
Your problem is probably the LEFT join. Did you mean to make it a normal join? (The left join will return null if there are no records in the table on the right that match)
SELECT DISTINCT
series.area_code,
area.area_text
FROM Alabama LEFT JOIN
series ON Alabama.series_id=series.series_id LEFT JOIN
area ON series.area_code=area.area_code
WHERE area.area_type_code != 'A';
Can you change it to be INNER JOINS instead?
SELECT DISTINCT
series.area_code,
area.area_text
FROM Alabama INNER JOIN
series ON Alabama.series_id=series.series_id INNER JOIN
area ON series.area_code=area.area_code
WHERE area.area_type_code != 'A'
Is the alabama table needed at all? Do you get the same results if you remove it like I have here?
SELECT DISTINCT
series.area_code,
area.area_text
FROM series INNER JOIN
area ON series.area_code = area.area_code
WHERE area.area_type_code != 'A'
Same question about the series table, can it be removed?
SELECT DISTINCT
area.area_code,
area.area_text
FROM area
WHERE area.area_type_code != 'A'
If not, then indexes it is.
First the area table. Add a index with the following columns
area_type_code, area_code, area_text
the series table, (test which is the faster.)
series_id, area_code
or
area_code, series_id
the alabama table create simple index with the following column
series_id
精彩评论