
Calculating link strength between two users with pure SQL

Developer https://www.devze.com 2023-01-23 07:06 Source: the web

One measurement of link strength between two users (buddies) is the following:

S = (number of common buddies)/(number of buddies of person1 UNION number of buddies of person2)
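Read as set arithmetic, S is the Jaccard index of the two buddy sets: the size of their intersection divided by the size of their union. Here is a minimal Python sketch with made-up buddy sets. The function name `link_strength` is hypothetical, and returning 0 when neither person has buddies is an assumption.

```python
def link_strength(buddies1: set, buddies2: set) -> float:
    """S = |common buddies| / |buddies of person1 UNION buddies of person2|."""
    union = buddies1 | buddies2
    if not union:
        # Neither person has any buddies; treat S as 0 instead of dividing by zero.
        return 0.0
    return len(buddies1 & buddies2) / len(union)


# Hypothetical example data for persons a and b.
a_buddies = {"c", "d", "e"}
b_buddies = {"d", "e", "f", "g"}
print(link_strength(a_buddies, b_buddies))  # 2 common / 5 in the union = 0.4
```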

To calculate the value above I have started to write the following query:

WITH user1 AS
(
SELECT calling_party, called_party FROM monthly_connections WHERE calling_party = 'a' OR called_party ='a'
),
user2 AS
(
SELECT calling_party, called_party FROM monthly_connections WHERE calling_party = 'b' OR called_party ='b'
),
commonUsers AS
(
SELECT COUNT (*) common_users_count FROM user1 u1 INNER JOIN user2 u2 ON u1.called_party = u2.called_party OR u1.calling_party = u2.calling_party OR u1.called_party = u2.calling_party OR u1.calling_party = u2.called_party
),
unionUsers AS
(
SELECT COUNT(*) FROM user1  UNION SELECT  COUNT(*) FROM user2
)

Then the number of unionUsers (which I am not sure is written correctly) should be used as the denominator. Anyway, I don't know how to complete the query to get the desired value, so I would appreciate your help.

Thank you!


The COUNT(*) queries return scalars that you can combine arithmetically. There is no need to use UNION (which is a set operation).
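Simply adding the two counts counts every common buddy twice. The union size is |A| + |B| - |A ∩ B|, which is inclusion-exclusion. This sketch assumes each count is taken over distinct buddies. The question's user1/user2 CTEs return connection rows, so they would first need to be reduced to distinct buddies, as done here with UNION. It uses Python's built-in sqlite3 module and made-up rows in a table shaped like the question's monthly_connections. The helper name `scalar` is hypothetical.

```python
import sqlite3

# In-memory table with the question's columns; the rows are made-up sample calls.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_connections (calling_party TEXT, called_party TEXT)")
conn.executemany(
    "INSERT INTO monthly_connections VALUES (?, ?)",
    [("a", "c"), ("d", "a"), ("a", "c"), ("b", "c"), ("b", "e")],
)

# Distinct buddies of one person, whichever direction the call went.
BUDDIES = """SELECT called_party AS buddy FROM monthly_connections WHERE calling_party = ?
             UNION
             SELECT calling_party FROM monthly_connections WHERE called_party = ?"""


def scalar(sql: str, params: tuple) -> int:
    """Run a query that returns a single value (such as a COUNT(*)) and return it."""
    return conn.execute(sql, params).fetchone()[0]


n1 = scalar(f"SELECT COUNT(*) FROM ({BUDDIES})", ("a", "a"))
n2 = scalar(f"SELECT COUNT(*) FROM ({BUDDIES})", ("b", "b"))
n_common = scalar(
    f"SELECT COUNT(*) FROM (SELECT * FROM ({BUDDIES}) INTERSECT SELECT * FROM ({BUDDIES}))",
    ("a", "a", "b", "b"),
)

# Inclusion-exclusion: subtract the buddies that n1 + n2 counted twice.
n_union = n1 + n2 - n_common
print(n1, n2, n_common, n_union)  # 2 2 1 3
print(n_common / n_union)
```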


I think that what you're trying to say is that S is the number of common buddies over the total number of people who are buddies of either person 1 or person 2.

Perhaps someone else will give you the proper SQL, but here's some pseudo code which I think will get the two numbers:

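-- x and y are placeholders for the two people's IDs
SELECT COUNT(*) AS AllFriends,
       SUM(CASE WHEN A.FriendID IS NOT NULL AND B.FriendID IS NOT NULL THEN 1 ELSE 0 END) AS JointFriends
FROM (SELECT DISTINCT FriendID FROM Friends WHERE PersonID = x) A
FULL OUTER JOIN
     (SELECT DISTINCT FriendID FROM Friends WHERE PersonID = y) B
  ON A.FriendID = B.FriendID
-- S = JointFriends / AllFriends (cast to a decimal type first to avoid integer division)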
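Here is a runnable sketch of that FULL OUTER JOIN idea. It uses Python's built-in sqlite3 module with a hypothetical Friends(PersonID, FriendID) table and made-up rows. SQLite only gained FULL OUTER JOIN in version 3.39.0, so the sketch emulates it: a LEFT JOIN from A to B, plus the rows of B that have no match in A. That gives the same rows here. JointFriends / AllFriends is then S.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Friends table: one row per (person, friend) pair; the data is made up.
conn.execute("CREATE TABLE Friends (PersonID TEXT, FriendID TEXT)")
conn.executemany(
    "INSERT INTO Friends VALUES (?, ?)",
    [("x", "p"), ("x", "q"), ("x", "r"), ("y", "q"), ("y", "r"), ("y", "s")],
)

QUERY = """
WITH A AS (SELECT DISTINCT FriendID FROM Friends WHERE PersonID = ?),
     B AS (SELECT DISTINCT FriendID FROM Friends WHERE PersonID = ?),
     C AS (
         -- FULL OUTER JOIN emulated: every friend of x, matched or not ...
         SELECT A.FriendID AS a_id, B.FriendID AS b_id
         FROM A LEFT JOIN B ON A.FriendID = B.FriendID
         UNION ALL
         -- ... plus the friends of y that x does not have.
         SELECT NULL, FriendID FROM B WHERE FriendID NOT IN (SELECT FriendID FROM A)
     )
SELECT COUNT(*) AS AllFriends,
       SUM(CASE WHEN a_id IS NOT NULL AND b_id IS NOT NULL THEN 1 ELSE 0 END) AS JointFriends
FROM C
"""

all_friends, joint_friends = conn.execute(QUERY, ("x", "y")).fetchone()
print(all_friends, joint_friends)  # 4 2
print(joint_friends / all_friends)  # 0.5
```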


WITH user1_buddies AS
(
SELECT called_party AS buddy FROM monthly_connections WHERE calling_party = '80A8A8D9D9AC58BE479C59D9BC59625691F32E76'
UNION SELECT calling_party AS buddy FROM monthly_connections WHERE called_party ='80A8A8D9D9AC58BE479C59D9BC59625691F32E76'
),
user2_buddies AS
(
SELECT calling_party AS buddy FROM monthly_connections WHERE  called_party ='11171309B5B6163D71B477D99D29763E4A7305E1'
UNION SELECT called_party AS buddy FROM monthly_connections WHERE calling_party = '11171309B5B6163D71B477D99D29763E4A7305E1'
),
commonUsers AS
(
-- buddies that appear in both lists
SELECT u1.buddy FROM user1_buddies u1 INNER JOIN user2_buddies u2 ON u1.buddy = u2.buddy
),
allUsers AS
(
-- UNION (not UNION ALL) keeps each buddy only once
SELECT buddy FROM user1_buddies UNION SELECT buddy FROM user2_buddies
)
SELECT CAST((SELECT COUNT(*) FROM commonUsers) AS decimal(10,5)) / CAST((SELECT COUNT(*) FROM allUsers) AS decimal(10,5)) AS link_strength
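To try the same query shape locally, here is a sketch that runs it through Python's sqlite3 module. Short made-up IDs 'a' and 'b' stand in for the two hashed phone IDs, and the call rows are invented. One assumption-driven change: SQLite gives decimal(10,5) NUMERIC affinity, which leaves integer counts as integers and would produce integer division. The sketch therefore casts to REAL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_connections (calling_party TEXT, called_party TEXT)")
# Made-up calls; 'a' and 'b' stand in for the two hashed phone IDs.
conn.executemany(
    "INSERT INTO monthly_connections VALUES (?, ?)",
    [("a", "c"), ("d", "a"), ("a", "e"), ("b", "c"), ("e", "b"), ("b", "f")],
)

QUERY = """
WITH user1_buddies AS (
    SELECT called_party AS buddy FROM monthly_connections WHERE calling_party = :u1
    UNION SELECT calling_party FROM monthly_connections WHERE called_party = :u1
),
user2_buddies AS (
    SELECT called_party AS buddy FROM monthly_connections WHERE calling_party = :u2
    UNION SELECT calling_party FROM monthly_connections WHERE called_party = :u2
),
commonUsers AS (
    SELECT u1.buddy FROM user1_buddies u1 INNER JOIN user2_buddies u2 ON u1.buddy = u2.buddy
),
allUsers AS (
    SELECT buddy FROM user1_buddies UNION SELECT buddy FROM user2_buddies
)
-- REAL instead of decimal(10,5), because SQLite would keep the counts as integers.
SELECT CAST((SELECT COUNT(*) FROM commonUsers) AS REAL)
     / CAST((SELECT COUNT(*) FROM allUsers) AS REAL) AS link_strength
"""

(link_strength,) = conn.execute(QUERY, {"u1": "a", "u2": "b"}).fetchone()
print(link_strength)  # 2 common / 4 in the union = 0.5
```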


It may be easier to handle this in multiple steps so you can see and check the intermediate output.

You'll need a user table with userID as its primary key. Cross join it with itself to get all buddy pairs, and add a WHERE condition that excludes rows where both userIDs are the same (no one is their own buddy or calls themselves). This defines the set of all possible buddy connections.
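The answer gives this step only in prose. Here is a minimal sketch of the self cross join, run with Python's built-in sqlite3 module. The users table and its IDs are hypothetical; the answer names only the userID key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical user table; the answer only says it needs a userID primary key.
conn.execute("CREATE TABLE users (userID TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO users VALUES (?)", [("a",), ("b",), ("c",)])

# Cross join the table with itself, dropping rows that pair a user with themselves.
pairs = conn.execute(
    """SELECT u1.userID, u2.userID
       FROM users u1 CROSS JOIN users u2
       WHERE u1.userID <> u2.userID
       ORDER BY u1.userID, u2.userID"""
).fetchall()
print(pairs)
# [('a', 'b'), ('a', 'c'), ('b', 'a'), ('b', 'c'), ('c', 'a'), ('c', 'b')]
```

With n users this yields n(n-1) ordered pairs. Using u1.userID < u2.userID instead of <> would keep each unordered pair only once.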

You have the monthly connection table already with one set of buddies, calling buddy -> called buddy. This defines one type of buddy connection.

You need another instance of your monthly connection table with the buddies swapped, called buddy -> calling buddy. This defines a second type of buddy connection.
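UNION ALL can stack the first two connection types into one list of directed links: the stored direction and the swapped direction. This sketch uses made-up calls, again via Python's sqlite3 module. The column names user1 and user2 are chosen here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_connections (calling_party TEXT, called_party TEXT)")
conn.executemany(
    "INSERT INTO monthly_connections VALUES (?, ?)", [("a", "b"), ("c", "a")]
)

# Type 1 is each row as stored (calling -> called); type 2 is the same rows swapped.
links = conn.execute(
    """SELECT calling_party AS user1, called_party AS user2 FROM monthly_connections
       UNION ALL
       SELECT called_party, calling_party FROM monthly_connections
       ORDER BY user1, user2"""
).fetchall()
print(links)  # [('a', 'b'), ('a', 'c'), ('b', 'a'), ('c', 'a')]
```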

For the third set, join two instances of your monthly connection table on the called field. Make sure your WHERE clause excludes rows where the calling user is the same in both instances. If two different users called the same third user, that defines the third type of buddy connection.
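Here is a sketch of that third set as a self-join on called_party, using made-up calls. The output column names are illustrative. Each pair of callers shows up in both orders, once for each instance of the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_connections (calling_party TEXT, called_party TEXT)")
# Made-up data: a and b both called c; d called e.
conn.executemany(
    "INSERT INTO monthly_connections VALUES (?, ?)",
    [("a", "c"), ("b", "c"), ("d", "e")],
)

# Two instances of the table joined on the called field; the callers must differ.
shared_callee = conn.execute(
    """SELECT m1.calling_party AS user1, m2.calling_party AS user2,
              m1.called_party AS common_buddy
       FROM monthly_connections m1
       JOIN monthly_connections m2 ON m1.called_party = m2.called_party
       WHERE m1.calling_party <> m2.calling_party
       ORDER BY user1, user2"""
).fetchall()
print(shared_callee)  # [('a', 'b', 'c'), ('b', 'a', 'c')]
```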

Now you can determine the number of common buddies for each buddy pair: it's the number of rows for that pair across those three sets added together.
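Following the answer's recipe literally, you can stack the three sets with UNION ALL and count rows per pair. This sketch uses made-up calls, again via Python's sqlite3 module. It counts rows exactly as described, so a direct call between two users adds to their count alongside any callee they share.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_connections (calling_party TEXT, called_party TEXT)")
# Made-up data: a called b, and both a and b also called c.
conn.executemany(
    "INSERT INTO monthly_connections VALUES (?, ?)",
    [("a", "b"), ("a", "c"), ("b", "c")],
)

# Set 1: rows as stored. Set 2: the same rows swapped.
# Set 3: pairs of different callers who called the same user.
counts = conn.execute(
    """SELECT user1, user2, COUNT(*) AS n
       FROM (
           SELECT calling_party AS user1, called_party AS user2 FROM monthly_connections
           UNION ALL
           SELECT called_party, calling_party FROM monthly_connections
           UNION ALL
           SELECT m1.calling_party, m2.calling_party
           FROM monthly_connections m1
           JOIN monthly_connections m2 ON m1.called_party = m2.called_party
           WHERE m1.calling_party <> m2.calling_party
       )
       GROUP BY user1, user2
       ORDER BY user1, user2"""
).fetchall()
print(counts)
# [('a', 'b', 2), ('a', 'c', 1), ('b', 'a', 2), ('b', 'c', 1), ('c', 'a', 1), ('c', 'b', 1)]
```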

Make sense?
