Is there a name for the pattern/algorithm I'm trying to describe below?
Say you have a tree of relevance-data like this:
- IDEs
  - Visual Studio
    - Visual Studio 2008
    - Visual Studio 2010
  - Eclipse
Then I have an object that contains a reference to "Visual Studio 2010".
Then I do a relevance-search for "Visual Studio" on this object and want to know how relevant this match is.
Is this best done while building the tree, by setting a specific value on each link between nodes individually, or can/should I set a general rule, for example that one level away is 10 points, two levels away is 5 points, and so on?
Nodes could potentially be linked to multiple other nodes; Visual Studio is also "Microsoft Software", for example. Or is this a bad idea?
Could this also be made two-way, with points both up the tree and down the tree?
These are my initial thoughts as I start testing around and building some kind of relevance engine. Please help me get on some kind of track.
This is a big can of worms, so forgive me if this is hand-wavy and general. There are all sorts of relations you could build into this data structure. Currently, you have a taxonomy of relationships. You also mentioned another category, 'Microsoft software', which will cut across your taxonomy. You could then get into has-a relationships and so on and so forth.
More generally, you're talking about an ontology. While there's been a whole lot of research about how they should be structured and searched, I don't know of any large projects that have built a rich ontology programmatically, and even if you get experts to build an ontology by hand, it's not always clear how to weight things for a 'relevance engine'. I'm not on the bleeding edge of this stuff, but most of the information retrieval techniques that work best are statistical ones that operate on simple structures, not the ones with richly structured data models.
I think you're on the right track. My advice: keep it as simple as possible. I would structure the hierarchy as a general graph and base relevance on graph distance, if necessary putting a weight on each edge. Bidirectionality is good here too, so you can penalize generalization and specialization differently as necessary. There's no real cookbook approach here; you'll have to experiment.
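To make the graph-distance idea a bit more concrete, here is a rough Python sketch, not a definitive implementation. Everything in it is an assumption made for illustration: the `RelevanceGraph` class, the `down_cost`/`up_cost` values, and the `1 / (1 + distance)` scoring formula are all hypothetical choices you would tune by experiment. Each relation is stored as two directed edges, so moving from a general term to a specific one can cost less (or more) than moving the other way, and the cheapest path is found with Dijkstra's algorithm.

```python
import heapq
from collections import defaultdict


class RelevanceGraph:
    """Hypothetical helper: a directed, weighted graph of related terms."""

    def __init__(self):
        # node -> {neighbour: cost of stepping from node to neighbour}
        self._edges = defaultdict(dict)

    def add_relation(self, general, specific, down_cost=1.0, up_cost=2.0):
        # Assumed defaults: specializing (general -> specific) is cheaper
        # than generalizing (specific -> general). Tune by experiment.
        self._edges[general][specific] = down_cost
        self._edges[specific][general] = up_cost

    def distance(self, source, target):
        """Cheapest path cost from source to target (Dijkstra)."""
        best = {source: 0.0}
        queue = [(0.0, source)]
        while queue:
            cost, node = heapq.heappop(queue)
            if node == target:
                return cost
            if cost > best.get(node, float("inf")):
                continue  # stale queue entry
            for neighbour, weight in self._edges.get(node, {}).items():
                new_cost = cost + weight
                if new_cost < best.get(neighbour, float("inf")):
                    best[neighbour] = new_cost
                    heapq.heappush(queue, (new_cost, neighbour))
        return float("inf")  # no path: unrelated terms

    def relevance(self, query, tagged):
        # Assumed scoring: 1.0 for an exact match, falling towards 0.0
        # as the path from the query term to the tagged term gets costlier.
        return 1.0 / (1.0 + self.distance(query, tagged))


graph = RelevanceGraph()
graph.add_relation("IDEs", "Visual Studio")
graph.add_relation("IDEs", "Eclipse")
graph.add_relation("Visual Studio", "Visual Studio 2008")
graph.add_relation("Visual Studio", "Visual Studio 2010")
# A cross-cutting category: a node may have more than one parent.
graph.add_relation("Microsoft Software", "Visual Studio")

# Searching for "Visual Studio" on an object tagged "Visual Studio 2010".
score = graph.relevance("Visual Studio", "Visual Studio 2010")
```

With these made-up weights, a search for "Visual Studio" against an object tagged "Visual Studio 2010" scores higher than a search for "Eclipse" against the same object, because the second path has to go up to "IDEs" and back down. A fixed points-per-level scheme is just the special case where every edge gets the same cost; per-edge weights only become necessary once some links turn out to matter more than others.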