开发者

Implement text index in C# memory

开发者 https://www.devze.com 2023-02-13 11:55 出处:网络
I have a performance sensitive task and I am considering storing of all objects which is about 100,000 items in memory. (persistent at ms sql, but co开发者_Go百科py in memory to improve complex search

I have a performance sensitive task and I am considering storing of all objects which is about 100,000 items in memory. (persistent at ms sql, but co开发者_Go百科py in memory to improve complex search performance)

Search by key works fast enough, but search by text, eg. Contains is relatively slow - it takes about 30ms per each query like this:

IEnumerable<Product> result =
   products.Where(p =>
   p.Title.Contains(itemnames[rnd.Next(itemnames.Length)]));

I already tried to use memory database db4o but it performance is even worse - about 1.5 seconds per search in 100K items.

What's options in order not to review every object Title and perform this faster?

What memory database can i use to solve this task?


Do you have the option of changing the data structure in which your products are stored? One way you can speed up your Contains search is to store every possible Product.Title substring in a Dictionary<string, List<Product>>. This will allow your search to be O(1) instead of O(n).

You can generate every substring like so:

public static IEnumberable<string> AllSubstrings(this string value)
{
    int index = 0;
    while(++index <= value.Length)
    {
        yield return value.Substring(0, index);
    }

    index = 0;
    while(++index <= value.Length - 1)
    {
        yield return value.Substring(index);
    }
}

Then you can populate your dictionary like so:

var titleIndex = new Dictionary<string, List<Product>>();

foreach(Product product in products)
{
    foreach(string substring in product.Title.AllSubstrings())
    {
        if(titleIndex.ContainsKey(substring))
        {
            index[substring].Add(product);
        }
        else
        {
            index[substring] = new List<Product> { product };
        }
    }
}

And lastly, you perform your search like so:

string searchString = itemnames[rnd.Next(itemnames.Length)];

if(titleIndex.ContainsKey(searchString))
{
    List<Product> searchResults = titleIndex[searchString];
}

Note: As you may have guessed, storing your data like this takes more CPU time up front and uses more RAM.


Try using Sql Server Full Text search instead: http://msdn.microsoft.com/en-us/library/ms142571.aspx
It may be faster than the sequential search in your example.


If you actually need to search on contained words and not truly any contained text, then create an index in memory. Create a Dictionary and add an entry for each word in the title to the dictionary. Then you can do quick lookups by individual words.

Another option would be to load the data into a SQLite in-memory database and use the built-in Full Text Search engine to do the searches.


I would try SQLite since it has embedded full-text indexes (FTS3).

0

精彩评论

暂无评论...
验证码 换一张
取 消