In a script I am working on, I calculate how relevant every item in one array is to each item in another array by comparing similarities in keywords and keyphrases. In the end, I select the top 4 most relevant items for each item in that second array.
I know this is a very vague background, but is there any way to avoid making the algorithm O(n^2) (comparing every item in one array against every item in the other)? Or are there more efficient ways of calculating relevancy?
Maybe you can group your job titles and job openings into categories.
Use a list of the most frequent words and only search for matches among items that contain these words.
I mean, there is no need to compare a "Java programmer" with a "C++ job opening", but among the items with the "java" keyword you can still compare "programmer" and "project leader".
Do you see what I mean?
But please give us an example; it is easier to answer when we know what we are talking about.
Use an inverted index (hash table) to get it down to roughly O(n), since each hash lookup takes constant time on average. Put all the items from the first list into one hash table. Then iterate through all the items in the second list, looking up each item in the hash table.
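As a minimal sketch of that idea, assuming "similar" simply means "equal": the function name `match_exact` and the sample lists are made up for illustration and do not come from the question.

```python
def match_exact(first_list, second_list):
    """Return the items of second_list that also appear in first_list."""
    # Build the hash table once: O(len(first_list)) on average.
    seen = set(first_list)
    # Each membership test is O(1) on average, so this pass is
    # O(len(second_list)) instead of O(len(first) * len(second)).
    return [item for item in second_list if item in seen]


# Hypothetical sample data.
openings = ["java programmer", "c++ developer", "project leader"]
candidates = ["project leader", "python developer", "java programmer"]
matches = match_exact(openings, candidates)
print(matches)
```

This only helps when equality is the right notion of similarity; anything fuzzier needs a different key.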
What I don't know is how you are defining similar. If similarity is simply that the items in the two lists are equal, then this will work. However, if similarity is more complex, then you may need to build a separate hash table for each type of similarity you want to support. For example, you could have one hash table that keys off of the phonetic spelling of a word, and one that keys off of the exact string of the word.
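A rough sketch of keeping one hash table per similarity type might look like this. Note that `crude_phonetic_key` is a hypothetical stand-in written only for this example; a real phonetic algorithm such as Soundex or Metaphone would replace it, and the word list is invented.

```python
from collections import defaultdict


def exact_key(word):
    """Key for exact matching: the string itself."""
    return word


def crude_phonetic_key(word):
    """Hypothetical stand-in for a phonetic code (NOT Soundex/Metaphone).

    Lowercases the word, keeps its first letter, drops later vowels,
    and skips a consonant that equals the last letter already kept.
    """
    word = word.lower()
    if not word:
        return word
    key = word[0]
    for ch in word[1:]:
        if ch in "aeiouy":
            continue
        if ch != key[-1]:
            key += ch
    return key


def build_index(words, key_func):
    """Map key_func(word) to the set of words sharing that key."""
    index = defaultdict(set)
    for word in words:
        index[key_func(word)].add(word)
    return index


# Hypothetical sample data, including one misspelling.
words = ["programmer", "programer", "leader"]
exact_index = build_index(words, exact_key)
phonetic_index = build_index(words, crude_phonetic_key)

query = "Programmer"
exact_hits = exact_index.get(exact_key(query), set())
phonetic_hits = phonetic_index.get(crude_phonetic_key(query), set())
print(exact_hits, phonetic_hits)
```

Each index costs one pass over the list to build, and each query is still a constant-time lookup on average, so adding a similarity type adds a table rather than another nested loop.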
If you have one list that is large like a Job Openings list, and you want to query the list for candidate skills, you should really use a search engine. A search engine is just a set of hash tables keyed off of keywords. There is no sense rebuilding a search engine when you can use one that has already been built. First you index all the job openings, then you query the search engine using words from a candidate's resume. A popular open source search engine that you may want to look into is Solr.
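To illustrate what "a set of hash tables keyed off of keywords" means, here is a toy inverted index that returns the top 4 openings for a query. It is only a sketch under simple assumptions: relevance is the number of shared whitespace-separated keywords, and all names and sample data are invented. It does not use or imitate Solr's API, and a real search engine adds tokenization, stemming, and weighted scoring.

```python
import heapq
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each keyword to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in set(text.lower().split()):
            index[word].add(doc_id)
    return index


def top_matches(index, query, k=4):
    """Return up to k document ids ranked by number of shared keywords."""
    scores = Counter()
    for word in set(query.lower().split()):
        # Only documents that share this keyword are ever touched.
        for doc_id in index.get(word, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by id so the result is deterministic.
    return heapq.nsmallest(k, scores, key=lambda d: (-scores[d], d))


# Hypothetical sample data.
openings = {
    "job1": "senior java programmer",
    "job2": "java project leader",
    "job3": "c++ programmer",
    "job4": "python data analyst",
}
index = build_inverted_index(openings)
best = top_matches(index, "Java programmer with project experience")
print(best)
```

The index is built once over all openings; each query then visits only the openings that share at least one keyword with it, which is where the savings over the all-pairs comparison come from.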