Is there a better way to find anagrams using SQL?_问答_开发者

Is there a better way to find anagrams using SQL?

开发者 https://www.devze.com 2023-01-21 14:56 出处：网络

Given the following database table: WORDS alphagram....varchar(15) word.........varchar(15) PK length.......int

相关专题：sql

Given the following database table:

WORDS
 alphagram....varchar(15)
 word.........varchar(15) PK
 length.......int

Where:

'alphagram' is the letters of a word in alphabetical order (e.g. AEINNRTT is the alphagram of INTRANET)
the primary key is 'word', and there are indexes on alphagram and length

I've found a way to find the anagrams of a given string of letters via SQL. For example, to find the anagrams of AEINNRTT this will work:

select alphagram, word, definition
from words
where length = 8
and alphagram like '%A%' 
and alphagram like '%E%' 
and alphagram like '%I%'
and alphagram like '%NN%' 
and alphagram like '%R%' 
and alphagram like '%TT%'

That will return 1 row (for INTRANET)

And if I wanted to include a known number of wildcards, for example, how many words are with INTRANET + a blank (wildcard) I just have to change the 'length' to the total number of letters + number of wild cards

e.g.

select alphagram, word, definition
from words
where length = 9
and alphagram like '%A%' 
and alphagram like '%E%' 
and alphagram like '%I%'
and alphagra开发者_如何学JAVAm like '%NN%' 
and alphagram like '%R%' 
and alphagram like '%TT%'

...will return 8 rows (ENTERTAIN, INSTANTER, INTEGRANT, INTRANETS, ITINERANT, NATTERING, RATTENING, and TRANSIENT)

My question is this: is there a more efficient way of doing this via SQL only?

This works pretty fast in SQLServer but pretty slow in SqlLite. I realise that the %xxx% searches are not fast.

You could create a kind of index column for each entry that has all the letters of the word in alphabetical order and then compare these. Each anagram will have the same index value.

One idea is to do it like this (for a given word length):

split the word into individual characters (probably using SUBSTRING() in a loop, though a better approach is probably worth a separate targeted SO question)
generate all permutations
PROFIT!

Though, as a commenter said, I'd STRONGLY advise you to do that outside SQL unless you have very good reasons not to or you're just doing this to challenge your skills.

The best way that I figured out to do this is: I created columns a...z and parsed each word and counted the number of occurrences of the given letter and put it under the respective column next when I entered the word to unscramble I counted each occurrence of each letter for that word and compared it with the words in the database This can be a little difficult to understand let me know if you need further clarification

This question is old and I may misunderstand something, but it seems your first request could be

select alphagram, word, definition
from words
where length = 8
and alphagram = 'AEINNRTT' and word <> alphagram

This works because all same-length anagrams have the same alphagram. It would use the index on alphagram and be very fast.

for the length > 8 case, it is harder to have an easy scenario, but I would try adding 26 columns to the table : alpha_a, alpha_b, .. containing the number of each letter in the alphagram. Each can have an index and then you search for

select alphagram, word, definition
from words
where length = 9
and alpha_a >= 1
and alpha_e >= 1
and alpha_i >= 1
and alpha_n >= 2
and alpha_r >= 1
and alpha_t >= 2