this is my first question on Stack Overflow. This is not really a programming question but since most of us have to deal with theoretical problems at some point and there might be some graph theory specialists around, I thought I might give it a go.
I am currently doing some research on multilingual websites and I found some interesting patterns in the website structure. The graphs below are the website graphs of two different multilingual websites. Sorry, I don't have enough rep points to post images so I leave them as links. I used the Force Atlas algorithm for the layout. Vertices are colored according to the page language. The shaded areas correspond to the subgraphs of a specific language.
Here is the graph of the website where different language versions of the same content are very closely linked. Hence the planes representing the different language versions are overlapping.
http://www.ai.soc.i.kyoto-u.ac.jp/~julien/phd/images/tight.png
In this second graph, we have a website where language versions of a website are almost independent, thus we have almost no overlap.
http://www.ai.soc.i.kyoto-u.ac.jp/~julien/phd/images/loose.png
So here is my question:
Is there a specific metric to quantify this overlap? If so, what is it named?
Since I used a force-based layout, the number of edges between the language subgraphs. So I guess something like taking the ratio of the number of edges within the subgraph to the number edges go开发者_如何学编程ing outside/coming inside a specific subgraph might do the trick. I am sure I am not the first to get this idea so I was wondering if this metric had a name. I could then Google it from there :)
Thank you in advance!
It sounds like what you're looking for is Network Modularity. Given a graph, and a partition (breaking the graph into disjoint subgraphs), the modularity is defined as:
The fraction of the edges that fall within the given groups minus the expected such fraction if edges were distributed at random.
Modularity was the basis of some of the first community detection algorithms on networks, which try to find sets of nodes that are densely connected. Recently, modularity has been shown to be a poor metric for community detection though because of resolution limits that fail to detect small groups or break apart well defined groups in certain cases (see this paper).
And there are now other approaches than modularity, designed to overcome the limitations mentionned by job, such as surprise; or the B- and C-scores (designed to be significance indices).
精彩评论