What possible schema can I use to store word combinations?

Developer https://www.devze.com 2023-02-04 20:18 Source: Internet
I'm making a simple program in Java. Given a set of letters it'll list all the words (with more than 2 letters) that match the combinations of the letters.

For example:

If the given word is ward,

the result should be: ward, raw, daw, war, rad

I have a SQLite database with a huge list of English words, stored both in their original form and with their letters sorted alphabetically, which makes the selections faster.


The database schema looks like:

dictionary: {id, word, length}

anagram: {id, anagram, length}

anagram_dictionary: {id, word_id, anagram_id}


With the same example:

When the word raw is entered,

it searches for arw, and the results give back raw and war.

My problem is that every time I do a search, it computes all the combinations of the letters I have given.

For the example it makes this math:

4!/(4!*0!) + 4!/(3!*1!) = 1 + 4 = 5

My problem is that the given set of letters has length 16. So I have to compute the combinations of 16 taken 16 at a time + the combinations of 16 taken 15 at a time + ... + the combinations of 16 taken 1 at a time.
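
For scale, the sum of C(16, k) for k = 1..16 is 2^16 - 1 = 65535 letter subsets, and each one needs its own lookup. A minimal Java sketch that checks this arithmetic follows; the class and method names are hypothetical and not taken from the original program.

```java
class SubsetCount {
    // Binomial coefficient C(n, k), built up step by step so every
    // intermediate value is itself an exact binomial coefficient.
    static long choose(int n, int k) {
        long result = 1;
        for (int i = 1; i <= k; i++) {
            result = result * (n - k + i) / i;
        }
        return result;
    }

    // Sum of C(n, k) for k = minLen..n: how many letter subsets must be looked up.
    static long subsetsToSearch(int n, int minLen) {
        long total = 0;
        for (int k = minLen; k <= n; k++) {
            total += choose(n, k);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(subsetsToSearch(4, 3));  // 5, the "ward" example
        System.out.println(subsetsToSearch(16, 1)); // 65535 = 2^16 - 1
    }
}
```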

I need to improve the method because it takes ages to give a simple result, but I don't know how. So I am trying to store this in the database, but I can't figure out how.

Thanks in advance


It seems that the most effective way to do this would be to store words using an alpha ordered key (which you have already):

adn -> and, dna
celrstu -> cluster
etc...

Take your input, alphabetize the letters, look it up, match. Done.
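
As a rough illustration of that lookup, here is an in-memory Java sketch. It assumes the word list fits in a HashMap rather than going through the SQLite tables from the question, and `AnagramIndex`, `keyOf`, `add` and `lookup` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class AnagramIndex {
    // Maps an alphabetized key (e.g. "arw") to the words spelled with exactly those letters.
    private final Map<String, List<String>> byKey = new HashMap<>();

    // Sorts the letters of a word to build its key: "raw" -> "arw".
    static String keyOf(String word) {
        char[] letters = word.toLowerCase(Locale.ROOT).toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    // Files the word under its alphabetized key.
    void add(String word) {
        byKey.computeIfAbsent(keyOf(word), k -> new ArrayList<>()).add(word);
    }

    // Returns every stored word that is an exact anagram of the given letters.
    List<String> lookup(String letters) {
        return byKey.getOrDefault(keyOf(letters), Collections.emptyList());
    }

    public static void main(String[] args) {
        AnagramIndex index = new AnagramIndex();
        for (String w : new String[] {"raw", "war", "and", "dna", "cluster"}) {
            index.add(w);
        }
        System.out.println(index.lookup("arw"));     // [raw, war]
        System.out.println(index.lookup("custler")); // [cluster]
    }
}
```

Note that this finds exact anagrams only; shorter words made from a subset of the letters still need one lookup per subset, or one of the filtering approaches.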

If that isn't the answer to your question, you may want to adjust the wording of your question a bit...


I'm not entirely sure about your constraints and resources, which would help me tune my answer, but here goes...

While you are loading your dictionary, perform some pre-processing. Count up the letter frequencies, just as CurtainDog recommends.

Now, based on your example, it looks like you want to find the words that are subsets of your given word. You could search out its combinations OR you could eliminate the words that won't fit into that subset.

Thus:

Get the dictionary.
From this, your given word has an A, so skip this letter.
From this, your given word does not have a B, so return all words that don't have a B.
From this, your given word does not have a C, so return all words that don't have a C.
From this, your given word has a D, so skip this letter.
etc...

It seems your concern was the runtime growing as your given word gets more letters. With this solution the runtime gets better with longer words, and your worst-case scenario is (26-2)*(# of words in the dictionary).
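
One possible way to sketch this elimination in Java is with a 26-bit letter mask: a word survives only if it uses no letter that is absent from the given word. This does all the per-letter passes in a single bitwise test, which is a substitution for the letter-by-letter loop described above, and the names `LetterMaskFilter`, `maskOf` and `eliminate` are hypothetical. It ignores how many times a letter occurs, so the survivors still need a count check.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class LetterMaskFilter {
    // Builds a 26-bit mask with bit i set when the letter ('a' + i) occurs in s.
    static int maskOf(String s) {
        int mask = 0;
        for (char c : s.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                mask |= 1 << (c - 'a');
            }
        }
        return mask;
    }

    // Keeps only the words that contain no letter missing from the given word.
    static List<String> eliminate(List<String> dictionary, String given) {
        int allowed = maskOf(given);
        List<String> survivors = new ArrayList<>();
        for (String word : dictionary) {
            if ((maskOf(word) & ~allowed) == 0) {
                survivors.add(word);
            }
        }
        return survivors;
    }

    public static void main(String[] args) {
        List<String> dictionary =
                Arrays.asList("ward", "raw", "daw", "bard", "card", "draw", "radar");
        // "bard" and "card" are eliminated; "radar" survives because
        // repeated letters are not checked at this stage.
        System.out.println(eliminate(dictionary, "ward"));
        // [ward, raw, daw, draw, radar]
    }
}
```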


In your dictionary, store the frequency of each letter. Then just build your select to return only words whose letter frequencies match (or are less than or equal, if you want to be able to return partial anagrams).
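
The frequency comparison itself can be sketched in Java as below. This is an in-memory illustration under the assumption that each word's 26 letter counts are available; in the database the same test would presumably become one comparison per stored count column. `LetterCounts`, `countsOf`, `fitsIn` and `matches` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class LetterCounts {
    // Counts occurrences of each letter a-z; other characters are ignored.
    static int[] countsOf(String s) {
        int[] counts = new int[26];
        for (char c : s.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                counts[c - 'a']++;
            }
        }
        return counts;
    }

    // True when the word can be spelled from the given letters, i.e. every
    // letter count of the word is less than or equal to the count available.
    static boolean fitsIn(String word, String given) {
        int[] need = countsOf(word);
        int[] have = countsOf(given);
        for (int i = 0; i < 26; i++) {
            if (need[i] > have[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns the dictionary words longer than 2 letters that fit in the given letters.
    static List<String> matches(List<String> dictionary, String given) {
        List<String> result = new ArrayList<>();
        for (String word : dictionary) {
            if (word.length() > 2 && fitsIn(word, given)) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> dictionary =
                Arrays.asList("ward", "raw", "daw", "war", "rad", "radar", "bard", "aw");
        System.out.println(matches(dictionary, "ward")); // [ward, raw, daw, war, rad]
    }
}
```

With this check the cost is one pass over the dictionary regardless of how many letters are given, instead of one lookup per letter subset.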
