开发者

Finding anagrams in a word list

开发者 https://www.devze.com 2023-03-14 03:34 出处:网络
I have a word list and a file containing a number of anagrams. These anagrams are words found in the word list. I need to develop an algorithm to find the matching words and produce them in an output

I have a word list and a file containing a number of anagrams. These anagrams are words found in the word list. I need to develop an algorithm to find the matching words and produce them in an output file. The code I have developed so far has only worked for the first two words. In addition, I can't get the code to play nice with strings containing numbers anywhere in it. Please tell me how I can fix the code.

#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main (void)
{
int x = 0, y = 0;
int a = 0, b = 0;
int emptyx, emptyy;
int match = 0;
ifstream f1, f2;
ofs开发者_运维问答tream f3;
string line, line1[1500], line2[50];
size_t found;

f1.open ("wordlist.txt");
f2.open ("file.txt");
f3.open ("output.txt");

while (f1.eof() == 0)
{
    getline (f1, line);
    line1[x] = line;
    x++;
}

while (f2.eof() == 0)
{
    getline (f2, line);
    line2[y] = line;
    y++;
}

//finds position of last elements
emptyx = x-1;
emptyy = y-1;

//matching algorithm
for (y = 0; y <= emptyy; y++)
{
    for (x = 0; x <= emptyx; x++)
    {
        if (line2[y].length() == line1[x].length())
        {
            for (a = 0; a < line1[x].length(); a++)
            {
                found = line2[y].find(line1[x][a]);
                if (found != string::npos)
                {
                    match++;
                    line2[y].replace(found, 1, 1, '.');

                    if (match == line1[x].length())
                    {
                        f3 << line1[x] << ", ";
                        match = 0;
                    }
                }
            }
        }
    }
}

f1.close();
f2.close();
f3.close();

return 0;
}


Step 1: Build an index with a key of the sorted characters in each word in the wordlist and with the value being the the word.

act   -  cat
act   -  act
dgo   -  dog

...

aeeilnppp - pineapple

....

etc...

Step 2: For each anagram you want to find, sort the characters in your anagram word, and then match against the index to retrieve all words from index with matching sorted key.


Trying to improve Mitch Wheat's solution:

  • Storing both sorted order and the word is really not necessary - store only the sorted string for every word in list.

  • Anyways, when we read a word from the file we have to sort it to find if it is equal to the sorted string - and the index is indexed on sorted string, so it will not help anyways.

  1. Build a 'position independent' hash with words in word list - also store the sorted string in the hash.

  2. For every word in file, get the 'position independent' hash and check in hashtable.

  3. If hit, sort and compare to every sorted string stored at this position in hash (collisions!).

Thoughts?

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号