Help optimizing a query (shows strength of two-way relationships between contacts)

开发者 https://www.devze.com 2023-01-05 00:42 Source: web
I have a contact_relationship table that stores the reported strength of a relationship between one contact and another at a given point in time.

mysql> desc contact_relationship;
+------------------+-----------+------+-----+-------------------+-----------------------------+
| Field            | Type      | Null | Key | Default           | Extra                       |
+------------------+-----------+------+-----+-------------------+-----------------------------+
| relationship_id  | int(11)   | YES  |     | NULL              |                             |
| contact_id       | int(11)   | YES  | MUL | NULL              |                             |
| other_contact_id | int(11)   | YES  |     | NULL              |                             |
| strength         | int(11)   | YES  |     | NULL              |                             |
| recorded         | timestamp | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+------------------+-----------+------+-----+-------------------+-----------------------------+

Now I want to get a list of two-way relationships between contacts. A two-way relationship exists when there are two rows: one in which contact A reports a relationship strength with contact B, and another in which contact B reports a strength for contact A. The strength of the two-way relationship is the smaller of those two strength values.

This is the query I've come up with, but it is pretty slow:

select 
    mrcr1.contact_id, 
    mrcr1.other_contact_id, 
    case when (mrcr1.strength < mrcr2.strength) then 
        mrcr1.strength 
    else 
        mrcr2.strength 
    end strength 
from ( 
    select 
        cr1.* 
    from ( 
        select 
            contact_id,
            other_contact_id,
            max(recorded) as max_recorded 
        from 
            contact_relationship 
        group by 
            contact_id,
            other_contact_id 
    ) as cr2 
    inner join contact_relationship cr1 on 
        cr1.contact_id = cr2.contact_id 
        and cr1.other_contact_id = cr2.other_contact_id 
        and cr1.recorded = cr2.max_recorded 
) as mrcr1, 
( 
    select 
        cr3.* 
    from ( 
        select 
            contact_id,
            other_contact_id,
            max(recorded) as max_recorded 
        from 
            contact_relationship 
        group by 
            contact_id,
            other_contact_id 
    ) as cr4 
    inner join contact_relationship cr3 on 
        cr3.contact_id = cr4.contact_id 
        and cr3.other_contact_id = cr4.other_contact_id 
        and cr3.recorded = cr4.max_recorded 
) as mrcr2 
where 
    mrcr1.contact_id = mrcr2.other_contact_id 
    and mrcr1.other_contact_id = mrcr2.contact_id 
    and mrcr1.contact_id != mrcr1.other_contact_id 
    and mrcr2.contact_id != mrcr2.other_contact_id 
    and mrcr1.contact_id <= mrcr1.other_contact_id; 

Does anyone have any recommendations for how to speed it up?

Note that because a user may specify the strength of his relationship with a particular user more than once, the query must use only the most recent record for each pair of contacts.

Update: here is the result of explaining the query...

+----+-------------+----------------------+-------+----------------------------------------------------------------------------------------+------------------------------+---------+-------------------------------------+-------+--------------------------------+
| id | select_type | table                | type  | possible_keys                                                                          | key                          | key_len | ref                                 | rows  | Extra                          |
+----+-------------+----------------------+-------+----------------------------------------------------------------------------------------+------------------------------+---------+-------------------------------------+-------+--------------------------------+
|  1 | PRIMARY     | <derived2>           | ALL   | NULL                                                                                   | NULL                         | NULL    | NULL                                | 36029 | Using where                    |
|  1 | PRIMARY     | <derived4>           | ALL   | NULL                                                                                   | NULL                         | NULL    | NULL                                | 36029 | Using where; Using join buffer |
|  4 | DERIVED     | <derived5>           | ALL   | NULL                                                                                   | NULL                         | NULL    | NULL                                | 36021 |                                |
|  4 | DERIVED     | cr3                  | ref   | contact_relationship_index_1,contact_relationship_index_2,contact_relationship_index_3 | contact_relationship_index_2 | 10      | cr4.contact_id,cr4.other_contact_id |     1 | Using where                    |
|  5 | DERIVED     | contact_relationship | index | NULL                                                                                   | contact_relationship_index_3 | 14      | NULL                                | 37973 | Using index                    |
|  2 | DERIVED     | <derived3>           | ALL   | NULL                                                                                   | NULL                         | NULL    | NULL                                | 36021 |                                |
|  2 | DERIVED     | cr1                  | ref   | contact_relationship_index_1,contact_relationship_index_2,contact_relationship_index_3 | contact_relationship_index_2 | 10      | cr2.contact_id,cr2.other_contact_id |     1 | Using where                    |
|  3 | DERIVED     | contact_relationship | index | NULL                                                                                   | contact_relationship_index_3 | 14      | NULL                                | 37973 | Using index                    |
+----+-------------+----------------------+-------+----------------------------------------------------------------------------------------+------------------------------+---------+-------------------------------------+-------+--------------------------------+


You are losing a lot of time selecting the most recent record. There are two options:

1- Change the way you store the data: keep one table holding only the most recent record for each pair of contacts, and another table holding the historical records.

2- Use an analytic (window) function to select the most recent record, if your DBMS allows it. Something like:

Select first_value(strength) over(partition by contact_id, other_contact_id order by recorded desc)
from contact_relationship
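
Note that first_value(...) on its own still returns one row per input row, so a filter is needed to keep a single row per pair. A minimal sketch of one way to do that, using row_number() instead (a swapped-in technique, not what the snippet above literally does). It is written against SQLite through Python's sqlite3 module purely so it runs standalone; that choice, the sample rows, and the `rn` alias are assumptions. SQLite needs version 3.25 or later for window functions, and MySQL added them in 8.0.

```python
import sqlite3

# In-memory stand-in for the real table (timestamps stored as text here).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table contact_relationship ("
    " contact_id int, other_contact_id int, strength int, recorded text)"
)
conn.executemany(
    "insert into contact_relationship values (?, ?, ?, ?)",
    [
        (1, 2, 5, "2011-01-01 10:00:00"),
        (1, 2, 8, "2011-02-01 10:00:00"),  # newer report for the pair (1, 2)
        (2, 1, 3, "2011-01-15 10:00:00"),
        (3, 1, 7, "2011-01-20 10:00:00"),
    ],
)

# Number the rows of each (contact_id, other_contact_id) pair from newest
# to oldest, then keep only the newest one.
latest_sql = """
select contact_id, other_contact_id, strength
from (
    select
        contact_id,
        other_contact_id,
        strength,
        row_number() over (
            partition by contact_id, other_contact_id
            order by recorded desc
        ) as rn
    from contact_relationship
) as ranked
where rn = 1
order by contact_id, other_contact_id
"""
latest = conn.execute(latest_sql).fetchall()
print(latest)
```

If two rows for the same pair share the same recorded value, row_number() keeps just one of them (which one is unspecified), whereas the max(recorded) join in the question would return both.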

Once you have the right record for each pair, I think your query will run a lot faster.
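
Option 1 could look roughly like the sketch below: every report is appended to a history table, and a second table keyed on the pair is overwritten with the newest report. The table names contact_relationship_history and contact_relationship_current and the helper record_strength are hypothetical. The sketch uses SQLite through Python's sqlite3 module so it runs standalone, and it assumes reports arrive in chronological order. The upsert is SQLite's "on conflict ... do update" (3.24 or later); in MySQL the corresponding statement is "insert ... on duplicate key update".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table contact_relationship_history (
    contact_id int, other_contact_id int, strength int, recorded text
);
create table contact_relationship_current (
    contact_id int,
    other_contact_id int,
    strength int,
    recorded text,
    primary key (contact_id, other_contact_id)
);
""")

def record_strength(contact_id, other_contact_id, strength, recorded):
    """Hypothetical helper: log the report and refresh the current row."""
    row = (contact_id, other_contact_id, strength, recorded)
    with conn:  # both statements commit together or not at all
        conn.execute(
            "insert into contact_relationship_history values (?, ?, ?, ?)", row
        )
        # Upsert: one row per pair, replaced by the newest report.
        conn.execute(
            "insert into contact_relationship_current values (?, ?, ?, ?) "
            "on conflict (contact_id, other_contact_id) do update set "
            "strength = excluded.strength, recorded = excluded.recorded",
            row,
        )

record_strength(1, 2, 5, "2011-01-01 10:00:00")
record_strength(2, 1, 3, "2011-01-15 10:00:00")
record_strength(1, 2, 8, "2011-02-01 10:00:00")  # replaces the first report

current = conn.execute(
    "select contact_id, other_contact_id, strength "
    "from contact_relationship_current "
    "order by contact_id, other_contact_id"
).fetchall()
history_count = conn.execute(
    "select count(*) from contact_relationship_history"
).fetchone()[0]
print(current, history_count)
```

With a table like this, the two-way query no longer needs the max(recorded) subqueries at all; it can join the current table to itself directly.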


Scorpi0's answer got me thinking that maybe I could use a temp table...

create temporary table mrcr1 (
    contact_id int, 
    other_contact_id int, 
    strength int, 
    index mrcr1_index_1 (
        contact_id, 
        other_contact_id
    )
) replace as 
    select 
        cr1.contact_id, 
        cr1.other_contact_id, 
        cr1.strength 
    from ( 
        select 
            contact_id, 
            other_contact_id, 
            max(recorded) as max_recorded 
        from 
            contact_relationship 
        group by 
            contact_id, 
            other_contact_id 
    ) as cr2 
    inner join contact_relationship cr1 on 
        cr1.contact_id = cr2.contact_id 
        and cr1.other_contact_id = cr2.other_contact_id 
        and cr1.recorded = cr2.max_recorded;

I had to do this twice (the second time into a temp table named mrcr2) because MySQL has a limitation: you can't refer to the same temporary table more than once in a single query.

With my two temp tables created, my query then becomes:

select 
    mrcr1.contact_id, 
    mrcr1.other_contact_id, 
    case when (mrcr1.strength < mrcr2.strength) then 
        mrcr1.strength 
    else 
        mrcr2.strength 
    end strength 
from 
    mrcr1,
    mrcr2 
where 
    mrcr1.contact_id = mrcr2.other_contact_id 
    and mrcr1.other_contact_id = mrcr2.contact_id 
    and mrcr1.contact_id != mrcr1.other_contact_id 
    and mrcr2.contact_id != mrcr2.other_contact_id 
    and mrcr1.contact_id <= mrcr1.other_contact_id; 
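
On a server that supports common table expressions (MySQL 8.0 and later do), the two identical temp tables could probably be replaced by one CTE that is referenced twice. A minimal sketch under that assumption, run against SQLite through Python's sqlite3 module so it is self-contained; the CTE name `latest`, the aliases `a` and `b`, and the sample rows are made up for illustration. The three inequality conditions also collapse into a single `<`, because the join already forces the second row to be the mirror image of the first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table contact_relationship ("
    " contact_id int, other_contact_id int, strength int, recorded text)"
)
conn.executemany(
    "insert into contact_relationship values (?, ?, ?, ?)",
    [
        (1, 2, 5, "2011-01-01 10:00:00"),
        (1, 2, 8, "2011-02-01 10:00:00"),  # newer report for (1, 2)
        (2, 1, 3, "2011-01-15 10:00:00"),
        (3, 1, 7, "2011-01-20 10:00:00"),  # one-way only, no (1, 3) row
        (2, 3, 4, "2011-01-25 10:00:00"),
        (3, 2, 9, "2011-01-26 10:00:00"),
    ],
)

two_way_sql = """
with latest as (
    -- most recent row for each (contact_id, other_contact_id) pair
    select cr1.contact_id, cr1.other_contact_id, cr1.strength
    from (
        select contact_id, other_contact_id, max(recorded) as max_recorded
        from contact_relationship
        group by contact_id, other_contact_id
    ) as cr2
    inner join contact_relationship cr1 on
        cr1.contact_id = cr2.contact_id
        and cr1.other_contact_id = cr2.other_contact_id
        and cr1.recorded = cr2.max_recorded
)
select
    a.contact_id,
    a.other_contact_id,
    case when a.strength < b.strength then a.strength else b.strength end as strength
from latest a
inner join latest b on
    a.contact_id = b.other_contact_id
    and a.other_contact_id = b.contact_id
where a.contact_id < a.other_contact_id  -- report each pair once, skip self-pairs
order by a.contact_id, a.other_contact_id
"""
two_way = conn.execute(two_way_sql).fetchall()
print(two_way)
```

Whether this is faster than the temp tables depends on how the server materializes the CTE, so it is worth checking with explain on the real data.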