How to use JOIN instead of UNION to count the neighbors of "A OR B"?

开发者 https://www.devze.com 2023-01-27 01:52 Source: Internet
The following query counts the common neighbors of two nodes in the graph:

DECLARE @monthly_connections_test TABLE (
    calling_party VARCHAR(50)
  , called_party VARCHAR(50))

INSERT INTO @monthly_connections_test
          SELECT 'z1', 'z2'
UNION ALL SELECT 'z1', 'z3'
UNION ALL SELECT 'z1', 'z4'
UNION ALL SELECT 'z1', 'z5'
UNION ALL SELECT 'z1', 'z6'
UNION ALL SELECT 'z2', 'z1'
UNION ALL SELECT 'z2', 'z4'
UNION ALL SELECT 'z2', 'z5'
UNION ALL SELECT 'z2', 'z7'
UNION ALL SELECT 'z3', 'z1'
UNION ALL SELECT 'z4', 'z7'
UNION ALL SELECT 'z5', 'z1'
UNION ALL SELECT 'z5', 'z2'
UNION ALL SELECT 'z7', 'z4'
UNION ALL SELECT 'z7', 'z2'

SELECT  monthly_connections_test.calling_party AS user1,
        monthly_connections_test_1.calling_party AS user2,
        COUNT(*) AS calling_calling,
        0 AS calling_called,
        0 AS called_calling,
        0 AS called_called,
        0 AS both_directions
FROM    @monthly_connections_test AS monthly_connections_test
        INNER JOIN @monthly_connections_test AS monthly_connections_test_1
            ON  monthly_connections_test.called_party = monthly_connections_test_1.called_party
            AND monthly_connections_test.calling_party < monthly_connections_test_1.calling_party
GROUP BY monthly_connections_test.calling_party, monthly_connections_test_1.calling_party

For the following graph

[Figure: the graph for the data above; image not available]

it returns the number of common neighbors that are called by both user1 AND user2. For example, for z1 and z2 it returns 2, because both call z4 and z5.
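To make the expected result easy to check outside SQL Server, here is a minimal Python sketch of the same "called by both" logic as a set intersection. Python, the `CALLS` list, and the helper names `called_by` and `common_called` are illustrative assumptions, not part of the original question; the data is copied from the INSERT statement.

```python
from collections import defaultdict

# Directed calls (calling_party, called_party), copied from the INSERT above.
CALLS = [
    ("z1", "z2"), ("z1", "z3"), ("z1", "z4"), ("z1", "z5"), ("z1", "z6"),
    ("z2", "z1"), ("z2", "z4"), ("z2", "z5"), ("z2", "z7"),
    ("z3", "z1"), ("z4", "z7"),
    ("z5", "z1"), ("z5", "z2"),
    ("z7", "z4"), ("z7", "z2"),
]


def called_by(calls):
    """Map each calling party to the set of parties it calls."""
    out = defaultdict(set)
    for caller, callee in calls:
        out[caller].add(callee)
    return out


def common_called(calls, user1, user2):
    """Parties called by both user1 and user2 (a set intersection)."""
    neighbors = called_by(calls)
    return neighbors[user1] & neighbors[user2]


print(sorted(common_called(CALLS, "z1", "z2")))  # ['z4', 'z5']
```

The self-join on `called_party` in the query plays the role of the intersection here, and `calling_party < calling_party` keeps each pair once.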

Another thing I would like to count is the number of all neighbors of two nodes (users) that are called by either user1 or user2. For example, for the pair (z1, z2) the query should return 5: user z1 calls z2, z3, z4, z5, z6 and user z2 calls z1, z4, z5, z7. The connections between z1 and z2 have to be excluded because (z1, z2) is the observed pair, and the number of elements in (z3, z4, z5, z6) U (z4, z5, z7) is 5.
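One possible join-only formulation of this "either user1 or user2" count is sketched below. It runs against SQLite through Python's `sqlite3` module purely so it can be executed anywhere; the table name `calls`, the alias names, and the column alias `either_called` are assumptions for illustration, and the query has not been run on SQL Server.

```python
import sqlite3

# Directed calls (calling_party, called_party), copied from the INSERT above.
CALLS = [
    ("z1", "z2"), ("z1", "z3"), ("z1", "z4"), ("z1", "z5"), ("z1", "z6"),
    ("z2", "z1"), ("z2", "z4"), ("z2", "z5"), ("z2", "z7"),
    ("z3", "z1"), ("z4", "z7"),
    ("z5", "z1"), ("z5", "z2"),
    ("z7", "z4"), ("z7", "z2"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (calling_party TEXT, called_party TEXT)")
conn.executemany("INSERT INTO calls VALUES (?, ?)", CALLS)

QUERY = """
SELECT a.calling_party AS user1,
       b.calling_party AS user2,
       COUNT(DISTINCT t.called_party) AS either_called
FROM (SELECT DISTINCT calling_party FROM calls) AS a
JOIN (SELECT DISTINCT calling_party FROM calls) AS b
  ON a.calling_party < b.calling_party            -- each pair once
JOIN calls AS t
  ON t.calling_party IN (a.calling_party, b.calling_party)          -- called by either user
 AND t.called_party NOT IN (a.calling_party, b.calling_party)       -- but not the pair itself
GROUP BY a.calling_party, b.calling_party
ORDER BY user1, user2
"""

# Map each (user1, user2) pair to its count of distinct neighbors.
either_called = {(u1, u2): n for u1, u2, n in conn.execute(QUERY)}
print(either_called[("z1", "z2")])  # 5
```

Because the last join is an inner join, a pair whose users call nobody outside the pair would be missing from the result rather than reported with a count of 0.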

Does anyone know how to modify/create the join query for the above logic?

Thank you!


@Martin's answer is correct. He's a genius.

Go Martin!

CORRECTION

His answer works with 1 small modification if run against the bidirectional solution I gave. Otherwise the results are incorrect.

So your answer is his and mine :)

The full solution:

DECLARE @T1 TABLE (calling_party VARCHAR(50), called_party VARCHAR(50))

INSERT  INTO @T1
SELECT  *
FROM    dbo.monthly_connections_test

INSERT  INTO @T1
SELECT  *
FROM    (
        SELECT  called_party AS calling_party, calling_party AS called_party
        FROM    dbo.monthly_connections_test AS T2
        WHERE   T2.called_party < T2.calling_party
        ) T2
WHERE   NOT EXISTS (
        SELECT *
        FROM    monthly_connections_test
        WHERE   calling_party = T2.calling_party and called_party = T2.called_party
)

-- Count the distinct parties called by either user of each pair (u1, u2).
SELECT  u1, u2, COUNT(called_party) AS called_parties
FROM    (
        SELECT  DISTINCT u1, u2, called_party
        FROM    (
                -- Every ordered pair of calling parties.
                SELECT  a1.calling_party AS u1, a2.calling_party AS u2
                FROM    (SELECT calling_party FROM @T1 GROUP BY calling_party) a1,
                        (SELECT calling_party FROM @T1 GROUP BY calling_party) a2
                ) pairs,
                @T1 AS T
        WHERE   (u1 <> u2)
                -- Keep calls made by either user, excluding calls to the other user of the pair.
                AND ((u1 = T.calling_party AND u2 <> T.called_party)
                  OR (u2 = T.calling_party AND u1 <> T.called_party))
        ) res
GROUP BY u1, u2
ORDER BY u1, u2


I don't have SQL Server here, but this should work:

SELECT  u1, u2, COUNT(called_party) AS called_parties
FROM    (
        SELECT  DISTINCT u1, u2, called_party
        FROM    (
                -- Every ordered pair of calling parties.
                SELECT  a1.calling_party AS u1, a2.calling_party AS u2
                FROM    (SELECT calling_party FROM @monthly_connections_test GROUP BY calling_party) a1,
                        (SELECT calling_party FROM @monthly_connections_test GROUP BY calling_party) a2
                ) pairs,
                @monthly_connections_test t
        WHERE   (u1 = t.calling_party AND u2 <> t.called_party)
             OR (u2 = t.calling_party AND u1 <> t.called_party)
        ) res
GROUP BY u1, u2;

The pairs subquery simply creates all possible pairs of users. You probably have a user list somewhere else that you could use instead.
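As a rough illustration of what that pairs subquery produces, the Python sketch below contrasts the full cross product with one row per unordered pair. The `CALLERS` list holds the distinct `calling_party` values from the sample data; the variable names are assumptions made for this example.

```python
from itertools import combinations, product

# Distinct calling_party values in the sample data (z6 never calls anyone).
CALLERS = ["z1", "z2", "z3", "z4", "z5", "z7"]

# What the comma join of a1 and a2 yields: every ordered pair,
# including a user paired with itself and both orderings of each pair.
cross_pairs = list(product(CALLERS, repeat=2))

# Each unordered pair once, comparable to adding a
# "a1.calling_party < a2.calling_party" condition to the join.
unique_pairs = list(combinations(CALLERS, 2))

print(len(cross_pairs), len(unique_pairs))  # 36 15
```

With the plain cross product, rows such as (z1, z1) and the mirrored (z2, z1) reach the outer query, so a filter on the pair is needed if each pair should appear only once.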


Out of interest, doesn't z1 also call z2 and vice versa, so that the desired result, the size of (z2, z3, z4, z5, z6) U (z1, z4, z5, z7), is 7?

Would a COMPUTE operation give you the count you want?


Niko, I believe there is a missing data point in your table example for this question. I have added the following for my testing.

UNION ALL SELECT 'z1', 'z6'

I have two simple queries to answer the questions:

"the number of common neighbors which are called by user1 AND user2"

"Another thing I would like to count is the number of all neighbors of two nodes (users) which are called either by user1 or user2"

DECLARE @Party1 VARCHAR(10)
DECLARE @Party2 VARCHAR(10)
SET @Party1 = 'z1'
SET @Party2 = 'z2'

-- Neighbors called by either party, excluding the pair itself.
SELECT  COUNT(DISTINCT called_party) AS 'Total calls 2 neighbors'
FROM    @monthly_connections_test
WHERE   calling_party IN (@Party1, @Party2)
        AND called_party NOT IN (@Party1, @Party2)

-- Neighbors called by both parties.
;WITH cteAllCalls(x) AS
(
        SELECT  called_party
        FROM    @monthly_connections_test
        WHERE   called_party != @Party1 AND calling_party = @Party2
)
SELECT  COUNT(x) AS 'Total common calls'
FROM    cteAllCalls
        INNER JOIN @monthly_connections_test
            ON x = called_party
            AND called_party != @Party2
            AND calling_party = @Party1


Ok, this is a seriously tough nut to crack!

The first problem is that the data is bidirectional in the table. The first step to solving this is to make the data unidirectional.

DECLARE @T1 TABLE (calling_party VARCHAR(50), called_party VARCHAR(50))
DECLARE @T2 TABLE (calling_party VARCHAR(50), called_party VARCHAR(50))

INSERT  INTO @T1
SELECT  *
FROM    dbo.monthly_connections_test

INSERT  INTO @T1
SELECT  *
FROM    (
        SELECT  called_party AS calling_party, calling_party AS called_party
        FROM    dbo.monthly_connections_test AS T2
        WHERE   T2.called_party < T2.calling_party
        ) T2
WHERE   NOT EXISTS (
        SELECT *
        FROM    monthly_connections_test
        WHERE   calling_party = T2.calling_party and called_party = T2.called_party
)

INSERT  INTO @T2
SELECT  DISTINCT TOP (100) PERCENT calling_party, called_party
FROM    @T1
WHERE   calling_party < called_party
UNION
SELECT  DISTINCT TOP (100) PERCENT called_party AS calling_party, calling_party AS called_party
FROM    @T1
WHERE   calling_party > called_party

The above fully solves any bidirectional issues by unwrapping the data into a distinct 1:1 relationship. The result is only 9 records that represent every relation as per the original data.
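The count of 9 can be cross-checked with a small Python sketch that collapses each directed call into an unordered pair. This is only a stand-in for the `@T2` step, not the SQL itself; `CALLS` is copied from the question's INSERT and `undirected_edges` is a hypothetical helper name.

```python
# Directed calls (calling_party, called_party), copied from the question's INSERT.
CALLS = [
    ("z1", "z2"), ("z1", "z3"), ("z1", "z4"), ("z1", "z5"), ("z1", "z6"),
    ("z2", "z1"), ("z2", "z4"), ("z2", "z5"), ("z2", "z7"),
    ("z3", "z1"), ("z4", "z7"),
    ("z5", "z1"), ("z5", "z2"),
    ("z7", "z4"), ("z7", "z2"),
]


def undirected_edges(calls):
    """Collapse directed calls into unique pairs ordered as (smaller, larger)."""
    return sorted({(min(a, b), max(a, b)) for a, b in calls if a != b})


edges = undirected_edges(CALLS)
print(len(edges))  # 9
```

Ordering each pair as (smaller, larger) mirrors the two branches of the UNION, which keep rows where `calling_party < called_party` and swap the rest.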

We (yes, after these hours, this is my problem now too) should be able to query the result to get the neighbors as desired. This is the next hurdle...
