I have a SUPER slow query, which I posted here: http://pastebin.com/E5sdRi7e. When I did an EXPLAIN, I got the following:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 5 Using filesort
2 DERIVED Workflow ALL PRIMARY NULL NULL NULL 9 Using temporary; Using filesort
2 DERIVED <derived3> ALL NULL NULL NULL NULL 141 Using where; Using join buffer
2 DERIVED DataSource ALL PRIMARY NULL NULL NULL 1310 Using where; Using join buffer
2 DERIVED <derived4> ALL NULL NULL NULL NULL 1310 Using where; Using join buffer
2 DERIVED User eq_ref PRIMARY PRIMARY 4 LatestDataSourceActivityLog.UserId 1
4 DERIVED t1 ALL NULL NULL NULL NULL 5400 Using where; Using temporary; Using filesort
5 DEPENDENT SUBQUERY t2 ref DataSourceId DataSourceId 4 companyname_db.t1.DataSourceId 4
3 DERIVED DataSource range PRIMARY PRIMARY 4 NULL 142 Using where
What does the above table tell me? Does it help me identify which fields should be indexed?
Any help is greatly appreciated.
Query
SELECT WrappedData.*
FROM (SELECT ParentLeafNodeDataSource.Id,
LatestDataSourceActivityLog.UserId,
DataSource.Status AS StatusCode,
( CASE
WHEN User.Name IS NULL THEN 'CompanyName'
ELSE User.Name
开发者_高级运维 END ) AS `Username`,
Workflow.Name AS WorkflowName,
LatestDataSourceActivityLog.Timestamp
FROM DataSource,
Workflow,
(SELECT *
FROM DataSource
WHERE DataSource.Id IN ( 0, 1, 2, 3,
4, 5, 6, 7,
8, 9, 10, 11,
12, 13, 16, 21,
22, 23, 24, 25,
26, 27, 28, 29,
30, 31, 32, 33,
34, 35, 36, 37,
38, 39, 40, 41,
42, 43, 44, 45,
46, 47, 48, 49,
50, 51, 52, 53,
54, 55, 56, 57,
58, 59, 60, 61,
62, 63, 64, 65,
66, 67, 68, 69,
70, 71, 72, 73,
74, 75, 76, 77,
78, 79, 80, 81,
83, 84, 85, 86,
87, 88, 89, 90,
91, 92, 93, 94,
95, 96, 97, 98,
99, 100, 101, 102,
103, 104, 105, 106,
107, 108, 109, 110,
111, 112, 113, 114,
115, 116, 117, 118,
119, 120, 142, 1293,
1294, 1295, 1296, 1297,
1298, 1299, 143, 1300,
1301, 1302, 1303, 1304,
1305, 1306, 144, 146,
145, 1307, 1308, 1309,
1310, 147, 149, 148,
150, 151 )) AS ParentLeafNodeDataSource,
(SELECT t1.*
FROM DataSourceActivityLog AS t1
WHERE Timestamp = (SELECT Max(t2.Timestamp)
FROM DataSourceActivityLog AS t2
WHERE t1.DataSourceId = t2.DataSourceId)
GROUP BY t1.DataSourceId) AS LatestDataSourceActivityLog
LEFT JOIN User
ON User.Id = LatestDataSourceActivityLog.UserId
WHERE ParentLeafNodeDataSource.Status = '203'
OR ParentLeafNodeDataSource.Status = '204'
AND Workflow.Id = ParentLeafNodeDataSource.WorkflowId
AND LatestDataSourceActivityLog.DataSourceId = ParentLeafNodeDataSource.Id
AND DataSource.Id = LatestDataSourceActivityLog.DataSourceId
AND LatestDataSourceActivityLog.UserId = 1
GROUP BY ParentLeafNodeDataSource.Id) AS WrappedData
ORDER BY WrappedData.`Timestamp` DESC
It's very difficult to say conclusively, but here are a couple refactoring things.
On performance, the first thing to look at are GROUP functions.
(SELECT t1.*
FROM DataSourceActivityLog AS t1
WHERE Timestamp = (SELECT Max(t2.Timestamp)
FROM DataSourceActivityLog AS t2
WHERE t1.DataSourceId = t2.DataSourceId)
GROUP BY t1.DataSourceId) AS LatestDataSourceActivityLog
Which can eliminate the use of MAX entirely
(SELECT t1.*
FROM DataSourceActivityLog AS t1
WHERE Timestamp = (SELECT t2.Timestamp
FROM DataSourceActivityLog AS t2
WHERE t1.DataSourceId = t2.DataSourceId
ORDER BY t2.Timestamp DESC
LIMIT 1)
GROUP BY t1.DataSourceId) AS LatestDataSourceActivityLog
Probably not a big performance issue, but here you can use IFNULL or COALESCE instead of a CASE.
( CASE
WHEN User.Name IS NULL THEN 'CompanyName'
ELSE User.Name
END )
Instead
( IFNULL(User.Name,'CompanyName' )
In terms of indexes, they increase SELECT performance by making lookups easier, but they slow down write operations as the indexes have to be updated as well. If your application isn't write-heavy, you should be indexing commonly searched columns, particularly in large tables.
In this query, it looks like you'd benefit by adding an index to DataSourceId, but I can't test if there's any gain. The primary keys will already be indexed.
I would try the following:
- the outer wrapper is completely useless, putting the ORDER BY in the inner query should work the same
- try to rewrite the subqueries to be used as JOIN's
- then move the WHERE-clauses to the relevant JOINS's so the intermediate resultset becomes smaller
- look at the WHERE and JOIN's which indexes should be made.
A quick try (I'm not sure the result will be the same)
SELECT
dsa.Status AS StatusCode,
dsb.Id,
dsl.UserId,
dsl.Timestamp
wf.Name AS WorkflowName,
COALESCE(u.Name, 'CompanyName') AS `Username`
FROM
DataSource dsa
INNER JOIN DataSource dsb
ON dsb.Id IN ( 0, 1, 2, 3, 4, 5, 6, 7, etc ))
AND dsb.Status = '203' OR dsb.Status = '204'
INNER JOIN DataSourceActivityLog dsl
ON dsl.DataSourceId=dsa.Id
AND dsl.DataSourceId=dsb.Id
AND dsl.UserId = 1
AND dsl.Timestamp=(
SELECT MAX(t2.Timestamp)
FROM DataSourceActivityLog AS dslt
WHERE dslt.DataSourceId = dsl.DataSourceId
)
INNER JOIN Workflow wf
ON wf.Id = dsb.WorkflowId
LEFT JOIN User u
ON u.Id = dsl.UserId
GROUP BY
dsl.Id
ORDER BY
dsl.Timestamp DESC
Perhaps using the refactor of Zurahn to get rid of the GROUP BY in the subquery
With indexes on:
- DataSource.WorkFlowId, DataSource.Status
- DataSourceActivityLog.Timestamp, DataSourceActivityLog.UserId, DataSourceActivityLog.DataSourceId
Ok actually, I came to the conclusion that dsb (originally ParentLeafNodeDataSource) is actually the source of the data, and this could fill the WHERE clause. Personally I try to start with the source of the data, and then JOIN the rest against that. This will usually result in a query where it is easy to understand what is actually selected. Instead of the last JOIN suddenly cutting down on the resultset. So reordering the JOIN's could do that, and it would be something like:
SELECT
dsa.Status AS StatusCode,
dsb.Id,
dsl.UserId,
dsl.Timestamp
wf.Name AS WorkflowName,
COALESCE(u.Name, 'CompanyName') AS `Username`
FROM
DataSource dsb
INNER JOIN Workflow wf
ON dsb.WorkflowId=wf.Id
INNER JOIN DataSourceActivityLog dsl
ON dsl.DataSourceId=dsb.Id
AND dsl.UserId=1
AND dsl.Timestamp=(
SELECT MAX(t2.Timestamp)
FROM DataSourceActivityLog AS dslt
WHERE dslt.DataSourceId = dsl.DataSourceId
)
INNER JOIN DataSource dsa
ON dsl.DataSourceId=dsa.Id
LEFT JOIN User u
ON dsl.UserId=u.Id
WHERE
dsb.Id IN ( 0, 1, 2, 3, 4, 5, 6, 7, etc ))
AND dsb.Status = '203' OR dsb.Status = '204'
GROUP BY
dsl.Id
ORDER BY
dsl.Timestamp DESC
Have you considered the MySql Query Profiler ?
This is how you will understand your performance problems.
Without that step, most people here will sadly prefer writing jokes on your query than trying to help you.
精彩评论