Generating word list from word

Developer https://www.devze.com 2023-02-25 04:27 Source: Internet
I'm looking for ideas to get started on what I would call a word obscurifier: a word list generator.

It takes a string, e.g. "hello", and generates further possibilities of similar-looking words from it, i.e. it returns something like:

  • h3ll0
  • he11o
  • HEL10
  • h3LLo
  • ...
  • ...

As you can see, it needs to be case-sensitive.

I am just looking for ideas/ways I could kick this off.

Maybe the first pass generates the capitalization variants, and then that list/array is fed to a method that substitutes numbers/symbols.

I am confident in C# and will most likely use it (at least to start) for this application.

If something that does this kind of thing has already been written and is available, all the better; I'd love to hear about it.

Thanks for reading.


This is too long to be a comment, but it's not a real answer. Merely a suggestion. First, consider this link:

http://ericlippert.com/2010/06/28/computing-a-cartesian-product-with-linq/

You could think of your problem as computing a Cartesian product of a sequence of sequences. Considering only alphanumeric characters, each one has from one to three states: the original character in lower case (if applicable), in upper case (if applicable), and the numeric replacement (again, if applicable). If you're starting with a digit, the states are the digit itself and its upper- and lower-case letter replacements. For example:

A -> a, A, 4
B -> b, B, 8
C -> c, C
D -> d, D
// etc.
1 -> 1, L, l
2 -> 2
3 -> 3, e, E
// etc.

Each of those is a sequence. So in your problem, you might look up the sequence that corresponds to each character of the original input "hello", and then compute the Cartesian product of those sequences. The approach in the linked blog post from Eric Lippert would be a great guide for continuing from here.
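To get a feel for how large the result gets, it may help to count first: the size of a Cartesian product is the product of the sizes of the sequences involved. The following is only a rough sketch. The class name `CombinationCount`, the `Alternatives` table and the `Count` method are made up for illustration, and the table assumes just the letters of "hello" with two or three states each.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CombinationCount
{
    // Hypothetical table: only the letters of "hello" are filled in here.
    static readonly Dictionary<char, char[]> Alternatives = new Dictionary<char, char[]>
    {
        { 'h', new[] { 'h', 'H' } },
        { 'e', new[] { 'e', 'E', '3' } },
        { 'l', new[] { 'l', 'L', '1' } },
        { 'o', new[] { 'o', 'O' } },
    };

    // Multiplies the number of alternatives of every character in the input.
    // Throws KeyNotFoundException for characters missing from the table.
    public static long Count(string input)
    {
        return input.Aggregate(1L, (total, c) => total * Alternatives[c].Length);
    }

    static void Main()
    {
        // 2 * 3 * 3 * 3 * 2 = 108 variants for "hello"
        Console.WriteLine(Count("hello"));
    }
}
```

The count grows multiplicatively with word length, so for long inputs it is probably worth streaming the results lazily rather than collecting them all in a list.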


This sample puts Anthony Pegram's idea into code. I hardcoded your letter mappings and input, but you will be able to change this easily.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace SO5672236
{
    static class Program
    {
        static void Main()
        {
            // Setup your letter mappings first
            Dictionary<char,string[]> substitutions = new Dictionary<char, string[]>
            {
                {'h', new[] {"h", "H"}},
                {'e', new[] {"e", "E", "3"}},
                {'l', new[] {"l", "L", "1"}},
                {'o', new[] {"o", "O"}}
            };

            // Take your input
            const string input = "hello";

            // Get mapping for each letter in your input
            IEnumerable<string[]> letters = input.Select(c => substitutions[c]);

            // Calculate the Cartesian product
            var cartesianProduct = letters.CartesianProduct();

            // Concatenate letters
            var result = cartesianProduct.Select(x => x.Aggregate(new StringBuilder(), (a, s) => a.Append(s), b => b.ToString()));

            // Print out results
            result.Foreach(Console.WriteLine);
        }

        // This function is taken from 
        // http://blogs.msdn.com/b/ericlippert/archive/2010/06/28/computing-a-cartesian-product-with-linq.aspx
        static IEnumerable<IEnumerable<T>> CartesianProduct<T>(this IEnumerable<IEnumerable<T>> sequences)
        {
            IEnumerable<IEnumerable<T>> emptyProduct = new[] { Enumerable.Empty<T>() };
            return sequences.Aggregate(
              emptyProduct,
              (accumulator, sequence) =>
                from accseq in accumulator
                from item in sequence
                select accseq.Concat(new[] { item }));
        }

        // This is a "standard" Foreach helper for enumerables
        public static void Foreach<T>(this IEnumerable<T> enumerable, Action<T> action)
        {
            foreach (T value in enumerable)
            {
                action(value);
            }
        }
    }
}
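One thing to watch in the sample above: `substitutions[c]` throws a `KeyNotFoundException` for any character that has no entry, such as an upper-case letter, a space or a digit. A possible way around that is a small lookup helper with a fallback. This is a sketch only: `SafeLookup` and `AlternativesFor` are hypothetical names, and the helper assumes the dictionary keys are lower-case.

```csharp
using System;
using System.Collections.Generic;

static class SafeLookup
{
    // Hypothetical helper, not part of the sample above.
    // Assumes the dictionary is keyed by lower-case characters.
    public static string[] AlternativesFor(IDictionary<char, string[]> substitutions, char c)
    {
        string[] mapped;
        if (substitutions.TryGetValue(char.ToLowerInvariant(c), out mapped))
        {
            return mapped;
        }

        string lower = char.ToLowerInvariant(c).ToString();
        string upper = char.ToUpperInvariant(c).ToString();

        // Unmapped letters get both cases; digits, spaces and punctuation stay as they are.
        return lower == upper ? new[] { c.ToString() } : new[] { lower, upper };
    }

    static void Main()
    {
        var substitutions = new Dictionary<char, string[]>
        {
            { 'e', new[] { "e", "E", "3" } },
        };

        // Prints the alternatives for a mapped letter, two unmapped letters and a space.
        foreach (char c in "He y")
        {
            Console.WriteLine(string.Join(",", AlternativesFor(substitutions, c)));
        }
    }
}
```

With something like this in place, the lookup line in the sample could become `input.Select(c => SafeLookup.AlternativesFor(substitutions, c))`, and the rest of the pipeline would stay the same.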


You should look into string permutation.

http://www-edlab.cs.umass.edu/cs123/Projects/Permutation/project6.htm


Start with a dictionary:

    key:    letter
    value:  list of alternate choices for that letter

Create a new empty word.
For each letter in the word,
    randomly choose one of its alternate choices and add it to the new word.
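The steps above could be sketched in C# roughly as follows. `RandomObscurer` and `Obscure` are hypothetical names, the alternates for "hello" are assumed, and copying unmapped characters through unchanged is my own assumption rather than part of the suggestion.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class RandomObscurer
{
    // Builds one variant by picking a random alternate for each letter.
    // Characters without an entry are copied unchanged (an assumption).
    public static string Obscure(string word, IDictionary<char, List<string>> alternates, Random random)
    {
        var result = new StringBuilder();
        foreach (char letter in word)
        {
            List<string> choices;
            if (alternates.TryGetValue(letter, out choices) && choices.Count > 0)
            {
                // Random.Next(n) returns a value in the range [0, n).
                result.Append(choices[random.Next(choices.Count)]);
            }
            else
            {
                result.Append(letter);
            }
        }
        return result.ToString();
    }

    static void Main()
    {
        var alternates = new Dictionary<char, List<string>>
        {
            { 'h', new List<string> { "h", "H" } },
            { 'e', new List<string> { "e", "E", "3" } },
            { 'l', new List<string> { "l", "L", "1" } },
            { 'o', new List<string> { "o", "O" } },
        };

        var random = new Random();
        for (int i = 0; i < 5; i++)
        {
            Console.WriteLine(Obscure("hello", alternates, random));
        }
    }
}
```

Note that random picking can return the same variant more than once and gives no guarantee of ever covering every combination, so it suits "give me a few variants" better than "give me the full list".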