Visualize data and clustering [closed]

Source: https://www.devze.com, 2023-01-07 05:13 (reposted from the web)
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.

We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.

Closed 7 years ago.

I am currently writing a Python script to find the similarity between documents. I have already calculated the similarity score for each document pair and stored the scores in a dictionary. It looks something like this:

{(8328, 8327): 1.0, (8313, 8306): 0.12405229825691289, (8329, 8328): 1.0, (8322, 8321): 0.99999999999999989, (8328, 8329): 1.0, (8306, 8316): 0.12405229825691289, (8320, 8319): 0.67999999999999989, (8337, 8336): 1.0000000000000002, (8319, 8320): 0.67999999999999989, (8313, 8316): 0.99999999999999989, (8321, 8322): 0.99999999999999989, (8330, 8328): 1.0}

My final goal is to cluster similar documents together. The data above can be viewed in another way. Take the document pair (8313, 8306), whose similarity score is 0.12405. I can specify that the inverse of the score is the distance between documents 8313 and 8306. Similar documents will therefore sit closer together, while less similar documents will be further apart, based on their distance.
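
The conversion described here could be sketched as follows. This is only an illustration: the dictionary name `scores` and the helper `to_distances` are hypothetical, each pair is treated as unordered, and pairs with a score of zero are skipped because their inverse is undefined.

```python
def to_distances(scores):
    """Map each unordered document pair to 1 / similarity."""
    distances = {}
    for (a, b), score in scores.items():
        if score <= 0:
            continue  # assumption: skip pairs whose inverse is undefined
        key = (min(a, b), max(a, b))  # (a, b) and (b, a) are the same pair
        distances[key] = 1.0 / score
    return distances


# A few entries taken from the dictionary shown above.
scores = {
    (8328, 8327): 1.0,
    (8313, 8306): 0.12405229825691289,
    (8320, 8319): 0.67999999999999989,
    (8319, 8320): 0.67999999999999989,
}

distances = to_distances(scores)
for pair, dist in sorted(distances.items()):
    print(pair, round(dist, 3))
```

Note that with a plain inverse, identical documents (score 1.0) end up at distance 1 rather than 0; using 1 - score instead is a common alternative if a zero distance is wanted.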

My question is: is there any open source visualization tool that can help me achieve this?


I'm not sure what the term for that type of graph would be (minimum weight spanning tree?), but check out Graphviz. There are some Python bindings for it as well, but failing that you could simply generate an input file for it, or pipe data directly in.
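
A minimal sketch of generating such an input file, assuming the similarity dictionary is named `scores` and that Graphviz's `neato` layout engine is used, since it treats the `len` edge attribute as a preferred edge length. The helper name `to_dot` is hypothetical.

```python
def to_dot(scores):
    """Return DOT text for an undirected graph with one edge per document pair."""
    lines = ["graph documents {"]
    seen = set()
    for (a, b), score in sorted(scores.items()):
        key = (min(a, b), max(a, b))
        if key in seen or score <= 0:
            continue  # skip mirrored pairs and undefined inverses
        seen.add(key)
        # len is the preferred edge length; here it is the inverse similarity
        lines.append('    %d -- %d [len=%.3f, label="%.2f"];'
                     % (key[0], key[1], 1.0 / score, score))
    lines.append("}")
    return "\n".join(lines)


# A few entries taken from the dictionary in the question.
scores = {
    (8328, 8327): 1.0,
    (8313, 8306): 0.12405229825691289,
    (8320, 8319): 0.67999999999999989,
    (8319, 8320): 0.67999999999999989,
}

dot_text = to_dot(scores)
print(dot_text)
```

Saving the text to a file such as documents.dot (an arbitrary name) and running `neato -Tpng documents.dot -o documents.png` should render it.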


I think you have to use MDS (multidimensional scaling):

http://en.wikipedia.org/wiki/Multidimensional_scaling
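
To illustrate what MDS does, here is a minimal sketch of classical (Torgerson) MDS using NumPy. It assumes a complete, symmetric distance matrix, so any document pairs missing from the dictionary would first need a distance assigned. The 3x3 matrix below is made-up example data, and `classical_mds` is a hypothetical helper name.

```python
import numpy as np


def classical_mds(dist, dims=2):
    """Embed points in `dims` dimensions from a symmetric distance matrix."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-centre the squared distances to get an inner-product matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # Keep the largest eigenvalues; clip negatives from non-Euclidean input.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:dims]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Made-up distances between three documents (a 3-4-5 triangle).
dist = np.array([
    [0.0, 3.0, 4.0],
    [3.0, 0.0, 5.0],
    [4.0, 5.0, 0.0],
])

coords = classical_mds(dist)
print(coords)
```

Each row of `coords` is a 2D position for one document, which can be passed to any scatter plot. scikit-learn's `sklearn.manifold.MDS` is a ready-made alternative that also accepts precomputed distances.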


I think Weka can do this. You might have to massage the input file to a different format first. Weka also has an API, though it's in Java, not Python.


There are lots of tools you can use to do this.

Others have already been mentioned, but you could fairly easily do something like this in Tkinter, PyGTK, PyQt, matplotlib, or really any graphics library.

However, a polar plot in matplotlib would be fairly simple:

(untested):

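import math
from matplotlib.pyplot import figure, show

# assign your data here: {(doc_a, doc_b): similarity score}
data = {(8328, 8327): 1.0, (8313, 8306): 0.12405229825691289}

fig = figure()
ax = fig.add_subplot(111, polar=True)

# one marker per document pair: angle 0, radius = similarity score
for pair in data:
    ax.plot(0, data[pair], 'o')
show()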

That should give you a rudimentary visualization. You could also change the plot call inside the loop to

ax.plot(data[pair] * math.pi, 1, 'o')

for a different style of visualization, with the score mapped to the angle instead of the radius.

The matplotlib docs are very good and they have plenty of examples.


NetworkX may help. This example could be a good starting point:

http://networkx.lanl.gov/examples/drawing/knuth_miles.html
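
As a rough sketch of how the similarity dictionary might be loaded into NetworkX (this is not taken from the linked example), the following assumes the dictionary is named `scores` and uses the similarity as the edge weight, so that the spring layout pulls similar documents closer together. Connected components are used here only as a crude stand-in for clustering.

```python
import networkx as nx

# A few entries taken from the dictionary in the question.
scores = {
    (8328, 8327): 1.0,
    (8313, 8306): 0.12405229825691289,
    (8320, 8319): 0.67999999999999989,
    (8319, 8320): 0.67999999999999989,
}

graph = nx.Graph()
for (a, b), score in scores.items():
    graph.add_edge(a, b, weight=score)  # mirrored pairs collapse into one edge

# spring_layout attracts nodes joined by heavily weighted edges more strongly
positions = nx.spring_layout(graph, weight="weight", seed=42)

# crude grouping: documents connected by any edge fall into the same component
clusters = [sorted(component) for component in nx.connected_components(graph)]
print(clusters)
```

With matplotlib installed, `nx.draw(graph, positions, with_labels=True)` would draw the result.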
