i have a contact_relationship table that stores the reported strength of a relationship between one contact and another at a given point in time.
mysql> desc contact_relationship;
+------------------+-----------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+------------------+-----------+------+-----+-------------------+-----------------------------+
| relationship_id | int(11) | YES | | NULL | |
| contact_id | int(11) | YES | MUL | NULL | |
| other_contact_id | int(11) | YES | | NULL | |
| strength | int(11) | YES | | NULL | |
| recorded | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+------------------+-----------+------+-----+-------------------+-----------------------------+
now i want to get a list of two-way relationships between contacts (meaning there are two rows, one with contact a specifying a relationship strength with contact b and another with contact b specifying a strength for contact a -- the strength of the two-way relationship is the smaller of those two strength values).
this is the query i've come up with but it is pretty slow:
select
mrcr1.contact_id,
mrcr1.other_contact_id,
case when (mrcr1.strength < mrcr2.strength) then
mrcr1.strength
else
mrcr2.strength
end strength
from (
select
cr1.*
from (
select
contact_id,
other_contact_id,
max(recorded) as max_recorded
from
contact_relationship
group by
contact_id,
other_contact_id
) as cr2
inner join contact_relationship cr1 on
cr1.contact_id = cr2.contact_id
and cr1.other_contact_id = cr2.other_contact_id
and cr1.recorded = cr2.max_recorded
) as mrcr1,
(
select
cr3.*
from (
select
contact_id,
other_contact_id,
max(recorded) as max_recorded
from
contact_relationship
group by
contact_id,
other_contact_id
) as cr4
inner join contact_relationship cr3 on
cr3.contact_id = cr4.contact_id
and cr3.other_contact_id = cr4.other_contact_id
and cr3.recorded = cr4.max_recorded
) as mrcr2
where
mrcr1.contact_id = mrcr2.other_contact_id
and mrcr1.other_contact_id = mrcr2.contact_id
and mrcr1.contact_id != mrcr1.other_contact_id
and mrcr2.contact_id != mrcr2.other_contact_id
and mrcr1.contact_id <= mrcr1.other_contact_id;
anyone have any recommendations of how to speed it up?
note that because a user may specify the strength of his relationship with a particular user more than once, you must only grab the most recent record for each pair of contacts.
update: here is the result of explaining the que开发者_开发技巧ry...
+----+-------------+----------------------+-------+----------------------------------------------------------------------------------------+------------------------------+---------+-------------------------------------+-------+--------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------------+-------+----------------------------------------------------------------------------------------+------------------------------+---------+-------------------------------------+-------+--------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 36029 | Using where |
| 1 | PRIMARY | <derived4> | ALL | NULL | NULL | NULL | NULL | 36029 | Using where; Using join buffer |
| 4 | DERIVED | <derived5> | ALL | NULL | NULL | NULL | NULL | 36021 | |
| 4 | DERIVED | cr3 | ref | contact_relationship_index_1,contact_relationship_index_2,contact_relationship_index_3 | contact_relationship_index_2 | 10 | cr4.contact_id,cr4.other_contact_id | 1 | Using where |
| 5 | DERIVED | contact_relationship | index | NULL | contact_relationship_index_3 | 14 | NULL | 37973 | Using index |
| 2 | DERIVED | <derived3> | ALL | NULL | NULL | NULL | NULL | 36021 | |
| 2 | DERIVED | cr1 | ref | contact_relationship_index_1,contact_relationship_index_2,contact_relationship_index_3 | contact_relationship_index_2 | 10 | cr2.contact_id,cr2.other_contact_id | 1 | Using where |
| 3 | DERIVED | contact_relationship | index | NULL | contact_relationship_index_3 | 14 | NULL | 37973 | Using index |
+----+-------------+----------------------+-------+----------------------------------------------------------------------------------------+------------------------------+---------+-------------------------------------+-------+--------------------------------+
You are losing a lot lot lot of time selecting the most recent record. 2 options :
1- Change the way you are stocking data, and have a table with only recent record, and an other table more like historical record.
2- Use analytic request to select the most recent record, if your DBMS allows you to do this. Something like
Select first_value(strength) over(partition by contact_id, other_contact_id order by recorded desc)
from contact_relationship
Once you have the good record line, I think your query will go a lot faster.
Scorpi0's answer got me to thinking maybe I could use a temp table...
create temporary table mrcr1 (
contact_id int,
other_contact_id int,
strength int,
index mrcr1_index_1 (
contact_id,
other_contact_id
)
) replace as
select
cr1.contact_id,
cr1.other_contact_id,
cr1.strength from (
select
contact_id,
other_contact_id,
max(recorded) as max_recorded
from
contact_relationship
group by
contact_id, other_contact_id
) as cr2
inner join
contact_relationship cr1 on
cr1.contact_id = cr2.contact_id
and cr1.other_contact_id = cr2.other_contact_id
and cr1.recorded = cr2.max_recorded;
which i had to do twice (second time into a temp table named mrcr2) because mysql has a limitation where you can't alias the same temp table twice in one query.
with my two temp tables created my query then becomes:
select
mrcr1.contact_id,
mrcr1.other_contact_id,
case when (mrcr1.strength < mrcr2.strength) then
mrcr1.strength
else
mrcr2.strength
end strength
from
mrcr1,
mrcr2
where
mrcr1.contact_id = mrcr2.other_contact_id
and mrcr1.other_contact_id = mrcr2.contact_id
and mrcr1.contact_id != mrcr1.other_contact_id
and mrcr2.contact_id != mrcr2.other_contact_id
and mrcr1.contact_id <= mrcr1.other_contact_id;
精彩评论