I have a query (see below) that I have a custom developed UDF that is used to calculate whether or not certain points are within a polygon (first query in UNION) or circular (second query in UNION) shape.
select e.inquiry_match_type_id
, a.geo_boundary_id
, GeoBoundaryContains(c.tpi_geo_boundary_coverage_type_id, 29.287437, -95.055807, a.lat, a.lon, a.geo_boundary_vertex_id ) in_out
, e.inquiry_id
, e.external_id
, COALESCE(f.inquiry_device_id,0) inquiry_device_id
, b.external_info1
, b.external_info2
, b.geo_boundary_id
, b.geo_boundary_type_id
from geo_boundary_vertex a
join geo_boundary b on b.geo_boundary_id = a.geo_boundary_id
join trackpoint_index_geo_boundary_mem c on c.geo_boundary_id = b.geo_boundary_id
join trackpoint_index_mem d on d.trackpoint_index_id = c.trackpoint_index_id
join inquiry_mem e on e.inquiry_id = b.inquiry_id left
outer join inquiry_device_mem f on f.inquiry_id = e.inquiry_id and f.device_id = 3201
where d.trackpoint_index_id = 3127
and b.geo_boundary_type_id = 3
and e.expiration_date >= now()
group by
a.geo_boundary_id
UNION
select e.inquiry_match_type_id
, b.geo_boundary_id
, GeoBoundaryContains( c.tpi_geo_boundary_coverage_type_id, 29.287437, -95.055807, b.centroid_lat, b.centoid_lon, b.radius ) in_out
, e.inquiry_id
, e.external_id
, COALESCE(f.inquiry_device_id,0) inquiry_device_id
, b.external_info1
, b.external_info2
, b.geo_boundary_id
, b.geo_boundary_type_id
from geo_boundary b
join trackpoint_index_geo_boundary_mem c on c.geo_boundary_id = b.geo_boundary_id
join trackpoint_index_mem d on d.trackpoint_index_id = c.trackpoint_index_id
join inquiry_mem e on e.inquiry_id = b.inquiry_id
left outer join inquiry_device_mem f on f.inquiry_id = e.inquiry_id and f.device_id = 3201
where d.trackpoint_index_id = 3127
and b.geo_boundary_type_id = 2
and e.expiration_date >= now()
group by
b.geo_boundary_id
When I run an explain for the query I get the following:
id select_type table type possible_keys key key_len ref rows Extra
------ -------------- ---------- ------- --------------------------------------------------------------------------------------------------------------------------------------------------------- ----------------------------------- ---------- ------------------------ ------- -------------------------------
1 PRIMARY d const PRIMARY PRIMARY 4 const 1 Using temporary; Using filesort
1 PRIMARY c ref PRIMARY,fk_mtp_idx_geo_boundary_mtp_idx,fk_mtp_idx_geo_boundary_geo_boundary,fk_mtp_idx_geo_boundary_mtp_mem_idx,fk_mtp_idx_geo_boundary_geo_boundary_mem fk_mtp_idx_geo_boundary_mtp_idx 4 const 9
1 PRIMARY b eq_ref PRIMARY,fk_geo_boundary_inquiry,fk_geo_boundary_geo_boundary_type PRIMARY 4 gothim.c.geo_boundary_id 1 Using where
1 PRIMARY e eq_ref PRIMARY PRIMARY 4 gothim.b.inquiry_id 1 Using where
1 PRIMARY f ref fk_inquiry_device_mem_inquiry fk_inquiry_device_mem_inquiry 4 gothim.e.inquiry_id 2
1 PRIMARY a ref fk_geo_boundary_vertex_geo_boundary fk_geo_boundary_vertex_geo_boundary 4 gothim.b.geo_boundary_id 11 Using where
2 UNION d const PRIMARY PRIMARY 4 const 1 Using temporary; Using filesort
2 UNION c ref PRIMARY,fk_mtp_idx_geo_boundary_mtp_idx,fk_mtp_idx_geo_boundary_geo_boundary,fk_mtp_idx_geo_boundary_mtp_mem_idx,fk_mtp_idx_geo_boundary_geo_boundary_mem fk_mtp_idx_geo_boundary_mtp_idx 4 const 9
2 UNION b eq_ref PRIMARY,fk_geo_boundary_inquiry,fk_geo_boundary_geo_boundary_type PRIMARY 4 gothim.c.geo_boundary_id 1 Using where
2 UNION e eq_ref PRIMARY PRIMARY 4 gothim.b.inquiry_id 1 Using where
2 UNION f ref fk_inquiry_device_mem_inquiry fk_inquiry_device_mem_inquiry 4 gothim.e.inquiry_id 2
(null) UNION RESULT <union1,2> ALL (null) (null) (null) (null) (null) Using filesort
12 record(s) selected [Fetch MetaData: 1ms] [Fetch Data: 5ms]
Now, I can split the queries up and use the ORDER BY NULL trick to get rid of the filesort however when I attempt to add that to the end of a UNION it doesn't work.
I am considering splitting the query apart into 2 queries or possibly re-writing it completely not to use a UNION (though that is a bit more difficult of course). The other thing I have working against me is that we have this in production and I'd like to limit changes - I would have loved just to be able to add ORDER BY NULL to the end of the query and be done with it, but it doesn't work w/ the UNION.
Any help wo开发者_如何学Pythonuld be greatly appreciated.
Normally, ORDER BY
can be used for the individual queries within a UNION
like this:
(
SELECT *
FROM table1, …
GROUP BY
id
ORDER BY
NULL
)
UNION ALL
(
SELECT *
FROM table2, …
GROUP BY
id
ORDER BY
NULL
)
However, as the docs state:
However, use of
ORDER BY
for individualSELECT
statements implies nothing about the order in which the rows appear in the final result becauseUNION
by default produces an unordered set of rows. Therefore, the use ofORDER BY
in this context is typically in conjunction withLIMIT
, so that it is used to determine the subset of the selected rows to retrieve for theSELECT
, even though it does not necessarily affect the order of those rows in the finalUNION
result. IfORDER BY
appears withoutLIMIT
in aSELECT
, it is optimized away because it will have no effect anyway.
This is of course a smart move, however, not too smart, since they forgot to optimize away the ordering behavior of GROUP BY
as well.
So as for now, you should add a very high LIMIT
to your individual queries:
(
SELECT *
FROM table1, …
GROUP BY
id
ORDER BY
NULL
LIMIT 100000000
)
UNION ALL
(
SELECT *
FROM table2, …
GROUP BY
id
ORDER BY
NULL
LIMIT 100000000
)
I'll post it as a bug to MySQL, hope they'll fix it in the next release, but meanwhile you could use this solution.
Note that a similar solution (using TOP 100%
) was used to force ordering of the subqueries in SQL Server 2000
, however, it stopped working in 2005
(ORDER BY
has no effect in subqueries with TOP 100%
for the optimizer).
It is safe to use it though since it won't break your queries even if the optimizer behavior changes in the next releases, but will just make them as slow as they are now.
Maybe try something like
SELECT *
FROM
(
[your entire query here]
) DerivedTable
ORDER BY NULL
I've never used MySQL so forgive me if I'm missing the plot :)
EDIT: What if you run each individual query separately (which, as you say, works), but insert the data into a temporary table. Then, at the end just do a select from the temp table.
Have you tried changing the UNION
to UNION ALL
?
A UNION
tries to remove duplicate rows. In order to do that, it would have to sort the intermediate results what might explain what you are seeing in your execution plan.
From MySQL Union
By default the MySQL UNION removes all duplicate rows from the result set even if you don’t explicit using DISTINCT after the keyword UNION.
If you use UNION ALL explicitly, the duplicate rows remain in the result set. You only use this in the cases that you want to keep duplicate rows or you are sure that there is no duplicate rows in the result set.
Edit
I doubt it will make any difference (might even be worse) but could you try following "equivalent" query
select *
from (
select b.geo_boundary_id
, GeoBoundaryContains( c.tpi_geo_boundary_coverage_type_id, 29.287437, -95.055807, b.centroid_lat, b.centoid_lon, b.radius ) in_out
from geo_boundary b
join trackpoint_index_geo_boundary_mem c on c.geo_boundary_id = b.geo_boundary_id
where b.geo_boundary_type_id = 2
group by
b.geo_boundary_id
union all
select a.geo_boundary_id
, GeoBoundaryContains(c.tpi_geo_boundary_coverage_type_id, 29.287437, -95.055807, a.lat, a.lon, a.geo_boundary_vertex_id ) in_out
from geo_boundary_vertex a
join geo_boundary b on b.geo_boundary_id = a.geo_boundary_id
join trackpoint_index_geo_boundary_mem c on c.geo_boundary_id = b.geo_boundary_id
where b.geo_boundary_type_id = 3
group by
a.geo_boundary_id
) s
inner join (
select e.inquiry_match_type_id
, e.inquiry_id
, e.external_id
, COALESCE(f.inquiry_device_id,0) inquiry_device_id
, b.external_info1
, b.external_info2
, b.geo_boundary_id
, b.geo_boundary_type_id
from geo_boundary b
join trackpoint_index_geo_boundary_mem c on c.geo_boundary_id = b.geo_boundary_id
join trackpoint_index_mem d on d.trackpoint_index_id = c.trackpoint_index_id
join inquiry_mem e on e.inquiry_id = b.inquiry_id left
outer join inquiry_device_mem f on f.inquiry_id = e.inquiry_id and f.device_id = 3201
where d.trackpoint_index_id = 3127
and b.geo_boundary_type_id IN (2, 3)
and e.expiration_date >= now()
) r on r.geo_boundary_id = s.geo_boundary_id
精彩评论