I have a scenario where I have a large number of blogs. These blogs all have multiple posts. Each blog post can link to a post on another blog, but they should then never link back from that blog to the linking blog.
To clarify:
- Site A links to Site B (and can link to other Sites)
- Site B then cannot link to Site A (but can link to other Sites)
Every time a post is made I store the post's ID and the ID of the website it links to. It's important to remember that once a single post 开发者_开发知识库links to any post on another website that other website cannot link back from anywhere, not just the post it's linked to.
Site A can link to Site B as many times as it likes, and each post may link to more than one other post. An example scenario might be:
- Site A links to Site B
- Site C links to Site B
- Site D links to Site A
In the above data:
- Site A could link to Site C (or Site B again)
- Site B could link to Site D
- Site C could link to Site A or Site D (or Site B again)
- Site D could link to Site B or Site C (or Site A again)
Here is a link to some test data and a dump of the 2 tables needed: http://pastie.org/1506715
I think I need a cross join to get all possible linking combinations, but then factor in existing relationships to prevent sites from linking back in the opposite direction. The query I have so far is:
SELECT
t1.* , t2.* FROM test_posts t1, test_posts as t2
WHERE
t1.post_id != t2.post_id
ORDER BY
t1.post_id, t2.post_id;
This gives me all possible relationships between posts. What I'm struggling with is how to exclude relationships that would contradict the above rules. The previous relationships are recorded in the test_smartlinks_to_websites table, with post_id belonging the "originating" website and website_id belonging to the "destination" site (remembering that the relationship is effectively one-way between websites, not posts).
I have tried using a NOT EXISTS subquery, but I'm unsure on the exact clause (or whether that is the correct approach).
Correct me if I am wrong. It appears your task is to determine cycles in directed graph. It's not that complex as it seems. Please see this blog post for how it's done in SQL: http://devio.wordpress.com/2009/09/13/finding-cycles-in-directed-graphs-using-tsql/. Also see this link for the breadth first search in SQL: http://willets.org/sqlgraphs.html.
EDITED: added images for clarity and understanding of directed acyclic and cyclic graphs.
For example, here is something that resembles your situation. It's not a single graph but a set of graphs (or forest if they were trees). Note there is no common root. It's just nodes connected somehow. There is a cycle in the bigger subgraph where to nodes reference each other. If to remove the link upwards, the subgraph becomes acyclic.
精彩评论