Trying not to select same connection multiple times I have tried to use charindex() = 0 condition the following way:
WITH Cluster(calling_party, called_party, link_strength, Path)
AS
(SELECT
calling_party,
called_party,
link_strength,
CONVERT(varchar(max), calling_party + '.' + called_party) AS Path
FROM
monthly_connections_test
WHERE
link_strength > 0.1 AND
calling_party = 'b'
UNION ALL
SELECT
mc.calling_party,
mc.called_party,
mc.link_strength,
CONVERT(varchar(max), cl.Path + '.' + mc.calling_party + '.' + mc.called_party) AS Path
FROM
monthly_connections_test mc
INNER JOIN Cluster cl ON
(
开发者_开发知识库 mc.called_party = cl.called_party OR
mc.called_party = cl.calling_party
) AND
(
CHARINDEX(cl.called_party + '.' + mc.calling_party, Path) = 0 AND
CHARINDEX(cl.called_party + '.' + mc.called_party, Path) = 0
)
WHERE
mc.link_strength > 0.1
)
SELECT
calling_party,
called_party,
link_strength,
Path
FROM
Cluster OPTION (maxrecursion 30000)
The condition however does not fulfill its purpose as the same rows are selected multiple times.
The actual aim here is to retrieve the whole cluster of connections to which the selected user (in the example user b) belongs.
Edit1:
I tried to modify the query the following way:
With combined_users AS
(SELECT calling_party CALLING, called_party CALLED, link_strength FROM dbo.monthly_connections_test WHERE link_strength > 0.1),
related_users1 AS
(
SELECT c.CALLING, c.CALLED, c.link_strength, CONVERT(varchar(max), '.' + c.CALLING + '.' + c.CALLED + '.') path from combined_users c where CALLING = 'a1'
UNION ALL
SELECT c.CALLING, c.CALLED, c.link_strength,
convert(varchar(max),r.path + c.CALLED + '.') path
from combined_users c
join related_users1 r
ON (c.CALLING = r.CALLED) and CHARINDEX(c.CALLING + '.' + c.CALLED + '.', r.path)= 0
),
related_users2 AS
(
SELECT c.CALLING, c.CALLED, c.link_strength, CONVERT(varchar(max), '.' + c.CALLING + '.' + c.CALLED + '.') path from combined_users c where CALLED = 'a1'
UNION ALL
SELECT c.CALLING, c.CALLED, c.link_strength,
convert(varchar(max),r.path + c.CALLING + '.') path
from combined_users c
join related_users2 r
ON c.CALLED = r.CALLING and CHARINDEX('.' + c.CALLING + '.' + c.CALLED, r.path)= 0
)
SELECT CALLING, CALLED, link_strength, path FROM
(SELECT CALLING, CALLED, link_strength, path FROM related_users1 UNION SELECT CALLING, CALLED, link_strength, path FROM related_users2) r OPTION (MAXRECURSION 30000)
To test the query I created the cluster below:
The query above returned the following table:
a1 a2 1.0000000 .a1.a2.
a11 a13 1.0000000 .a12.a1.a13.a11.
a12 a1 1.0000000 .a12.a1.
a13 a12 1.0000000 .a12.a1.a13.
a14 a13 1.0000000 .a12.a1.a13.a14.
a15 a14 1.0000000 .a12.a1.a13.a14.a15.
a2 a10 1.0000000 .a1.a2.a10.
a2 a3 1.0000000 .a1.a2.a3.
a3 a4 1.0000000 .a1.a2.a3.a4.
a3 a6 1.0000000 .a1.a2.a3.a6.
a4 a8 1.0000000 .a1.a2.a3.a4.a8.
a4 a9 1.0000000 .a1.a2.a3.a4.a9.
The query obviously returns connections toward the selected node and the connections in the opposite direction. The problem is the change of direction: for example the connections a7, a4 and a11, a10 are not selected because the direction is changed (relative to starting node).
Does anyone know how to modify the query in order to include all connections?
Thank you
Okay, there are several things to discuss here.
Zerothly, i have PostgreSQL, so all this is done with that; i'm trying to only use standard SQL, so this should work okay with SQL Server too.
Firstly, if you're only interested in calls with link strength greater than 0.1, let's say:
-- like calls, but only strong enough to be interesting
create view strong_calls (calling_party, called_party, link_strength)
as (
select calling_party, called_party, link_strength
from monthly_connections_test
where link_strength > 0.1
);
And from now on, we'll talk in terms of this table.
Secondly, you say:
The actual aim here is to retrieve the whole cluster of connections to which the selected user (in the example user b) belongs.
If that's true, why are you computing the path? If you just want to know the set of connections, you can do:
with recursive cluster (calling_party, called_party, link_strength)
as (
(
select calling_party, called_party, link_strength
from strong_calls
where calling_party = 'b'
)
union
(
select c.calling_party, c.called_party, c.link_strength
from cluster this, strong_calls c
where c.calling_party = this.called_party
or c.called_party = this.calling_party
)
)
select *
from cluster;
Thirdly, perhaps you don't really want to find what connections are in the cluster, you want to find which people are in the cluster, and what the shortest path to them from the target is. In that case, you can do:
with recursive cluster (party, path)
as (
select cast('b' as character varying), cast('b' as character varying)
union
(
select (case
when this.party = c.calling_party then c.called_party
when this.party = c.called_party then c.calling_party
end), (this.path || '.' || (case
when this.party = c.calling_party then c.called_party
when this.party = c.called_party then c.calling_party
end))
from cluster this, strong_calls c
where (this.party = c.calling_party and position(c.called_party in this.path) = 0)
or (this.party = c.called_party and position(c.calling_party in this.path) = 0)
)
)
select party, path
from cluster
where not exists (
select *
from cluster c2
where cluster.party = c2.party
and (
char_length(cluster.path) > char_length(c2.path)
or (char_length(cluster.path) = char_length(c2.path)) and (cluster.path > c2.path)
)
)
order by party, path;
As you can see, you were very much on the right track.
If you actually do want a list of all the calls in the cluster, with paths, then, er, i'll get back to you!
EDIT: Bear in mind that the queries which don't construct a path will have very different performance characteristics to those which do. Roughly speaking, the non-path queries will do O(n) work (probably in about O(log n) iteration steps), because they visit each node in the cluster, but the path-constructing steps will do much more - O(n!) maybe? - because they have to visit every path through the graph. You'll be alright if clusters are as big as in your examples, but if they're much bigger, you may find the runtime is prohibitive.
charindex('b.d','b.c.d.b')=0 because there's a 'c.' in between
Easier to read:
WITH cluster(calling_party, called_party, link_strength, PATH)
AS (SELECT calling_party,
called_party,
link_strength,
CONVERT(VARCHAR(MAX), calling_party + '.' + called_party) AS
PATH
FROM monthly_connections_test
WHERE link_strength > 0.1
AND calling_party = 'b'
UNION ALL
SELECT mc.calling_party,
mc.called_party,
mc.link_strength,
CONVERT(VARCHAR(MAX), cl.PATH + '.' + mc.calling_party + '.' +
mc.called_party)
AS PATH
FROM monthly_connections_test mc
INNER JOIN cluster cl
ON ( mc.called_party = cl.called_party
OR mc.called_party = cl.calling_party )
AND ( Charindex(cl.called_party + '.' + mc.calling_party,
PATH)
= 0
AND Charindex(cl.called_party + '.' +
mc.called_party,
PATH)
=
0 )
WHERE mc.link_strength > 0.1)
SELECT calling_party,
called_party,
link_strength,
PATH
FROM cluster
OPTION (MAXRECURSION 30000)
To address your edited question, if you want to ignore directionality of links, try:
create view symmetric_users (calling_party, called_party, link_strength)
as (
select calling_party, called_party, link_strength from monthly_connections_test
union
select called_party , calling_party, link_strength from monthly_connections_test
)
Then point your query at that.
If you have users who call each other, you will have two rows for each ordered pair of users. You should be able to filter those out by choosing the stronger.
精彩评论