How would I make this query run faster...?
SELECT account_id, account_name, account_update, account_sold, account_mds, ftp_url, ftp_livestatus, number_digits, number_cw, client_name, ppc_status, user_name FROM Accounts, FTPDetails, SiteNumbers, Clients, PPC, Users WHERE Accounts.account_id = FTPDetails.ftp_accountid AND Accounts.account_id = SiteNumbers.number_accountid AND Accounts.account_client = Clients.client_id AND Accounts.account_id = PPC.ppc_accountid AND Accounts.account_designer = Users.user_id AND Accounts.account_active = 'active' AND FTPDetails.ftp_active = 'active' AND SiteNumbers.number_active = 'active' AND Clients.client_active = 'active' AND PPC.ppc_active = 'active' AND Users.user_active = 'active' ORDER BY Accounts.account_update DESC
Thanks in advance :)
EXPLAIN query results:
I don't really have any fore开发者_JS百科ign keys set up...I was trying to avoid making alterations to the database as will have to do a complete overhaul soon.
only primary keys are the id of each table e.g. account_id, ftp_id, ppc_id ...
Indexes
You need - at least - an index on every field that is used in a
JOIN
condition.Indexes on the fields that appear in
WHERE
orGROUP BY
orORDER BY
clauses are most of the time useful, too.When in a table, two or more fields are used in JOIns (or WHERE or GROUP BY or ORDER BY), a compound (combined) index of these (two or more) fields may be better than separate indexes. For example in the
SiteNumbers
table, possible indexes are the compound(number_accountid, number_active)
or(number_active, number_accountid)
.Condition in fields that are Boolean (ON/OFF, active/inactive) are sometimes slowing queries (as indexes are not selective and thus not very helpful). Restructuring (father normalizing) the tables is an option in that case but probably you can avoid the added complexity.
Besides the usual advice (examine the EXPLAIN plan, add indexes where needed, test variations of the query),
I notice that in your query there is a partial Cartesian Product. The table Accounts
has a one-to-many relationships to three tables FTPDetails
, SiteNumbers
and PPC
. This has the effect that if you have for example 1000 accounts, and every account is related to, say, 10 FTPDetails, 20 SiteNumbers and 3 PPCs, the query will return for every account 600 rows (the product of 10x20x3). In total 600K rows where many data are duplicated.
You could instead split the query into three plus one for base data (Account and the rest tables). That way, only 34K rows of data (having smaller length) would be transfered :
Accounts JOIN Clients JOIN Users
(with all fields needed from these tables)
1K rows
Accounts JOIN FTPDetails
(with Accounts.account_id and all fields from FTPDetails)
10K rows
Accounts JOIN SiteNumbers
(with Accounts.account_id and all fields from SiteNumbers)
20K rows
Accounts JOIN PPC
(with Accounts.account_id and all fields from PPC)
3K rows
and then use the data from the 4 queries in the client side to show combined info.
I would add the following indexes:
Table Accounts
index on (account_designer)
index on (account_client)
index on (account_active, account_id)
index on (account_update)
Table FTPDetails
index on (ftp_active, ftp_accountid)
Table SiteNumbers
index on (number_active, number_accountid)
Table PPC
index on (ppc_active, ppc_accountid)
Use EXPLAIN to find out which index could be used and which index is actually used. Create an appropriate index if necessary.
If FTPDetails.ftp_active
only has the two valid entries 'active'
and 'inactive'
, use BOOL
as data type.
As a side note: I strongly suggest using explicit joins instead of implicit ones:
SELECT
account_id, account_name, account_update, account_sold, account_mds,
ftp_url, ftp_livestatus,
number_digits, number_cw,
client_name,
ppc_status,
user_name
FROM Accounts
INNER JOIN FTPDetails
ON Accounts.account_id = FTPDetails.ftp_accountid
AND FTPDetails.ftp_active = 'active'
INNER JOIN SiteNumbers
ON Accounts.account_id = SiteNumbers.number_accountid
AND SiteNumbers.number_active = 'active'
INNER JOIN Clients
ON Accounts.account_client = Clients.client_id
AND Clients.client_active = 'active'
INNER JOIN PPC
ON Accounts.account_id = PPC.ppc_accountid
AND PPC.ppc_active = 'active'
INNER JOIN Users
ON Accounts.account_designer = Users.user_id
AND Users.user_active = 'active'
WHERE Accounts.account_active = 'active'
ORDER BY Accounts.account_update DESC
This makes the query much more readable because the join condition is close to the name of the table that is being joined.
EXPLAIN, benchmark different options. For starters, I'm sure that several queries will be faster than this monster. First, because query optimiser will spend a lot of time examining what join order is the best (5!=120 possibilities). And second, queries like SELECT ... WHERE ....active = 'active'
will be cached (though it depends on an amount of data changes).
One of your main problems is here: x.y_active = 'active'
Problem: low cardinality
The active field is a boolean field with 2 possible values, as such it has very low cardinality.
MySQL (or any SQL for that matter will not use an index when 30% or more of the rows have the same value).
Forcing the index is useless because it will make your query slower, not faster.
Solution: partition your tables
A solution is to partition your tables on the active
columns.
This will exclude all non-active fields from consideration, and will make the select
act as if you actually have a working index on the xxx-active
fields.
Sidenote
Please don't ever use implicit where
joins, it's much too error prone and consufing to be useful.
Use a syntax like Oswald's answer instead.
Links:
Cardinality: http://en.wikipedia.org/wiki/Cardinality_(SQL_statements)
Cardinality and indexes: http://www.bennadel.com/blog/1424-Exploring-The-Cardinality-And-Selectivity-Of-SQL-Conditions.htm
MySQL partitioning: http://dev.mysql.com/doc/refman/5.5/en/partitioning.html
精彩评论