caching multiple key hash

开发者 (Developer) https://www.devze.com 2023-03-08 12:46 Source: the web
I want to do some caching in my project.

Suppose my API is int foo(int a, float b, float c, int d, char e).

My project makes many calls to this time-consuming API with repeating values of a, b, c, d and e. I want to store the return value of the function, using these arguments as the key.

Suppose my call sequence is:

foo(23, 3.45, 4.5, 90, 'd') // returns 1000, so I need to store it in cache as (23,3.45, 4.5, 90, 'd')->1000

foo(30, 1.2, 3.5, 100, 'e') // returns 2000, so I need to store it in cache as (30, 1.2, 3.5, 100, 'e')->2000

foo(23, 3.45, 4.5, 90, 'd') // No need to call the API; I just look up the cached value for
//(23, 3.45, 4.5, 90, 'd'), which is already stored as 1000

What would be the best strategy to implement this in C++? Which data structure would be best for the cache table?


One key note: caching is difficult.

People often think that caching will solve all their problems, but forget the problems it brings to the table. An unmanaged cache is nothing more than a giant memory leak. Two strategies of note:

  • Size limit: whenever the cache is full, adding a new entry causes another entry to be evicted (you therefore need a scheme to decide which entry to evict)
  • Time limit: entries are flushed out after a certain time has elapsed

Usually, when we hear about caches we think of an LRU (Least Recently Used) cache. Such caches are limited by size, and the least recently used entry is evicted when the cache is full. Note: this can cause contention in multi-threaded code, because even read-only accesses modify the recency order.

Such a cache is implemented in terms of two elements:

  • A (key -> value) mapping, either using a tree or a hash-map
  • A priority list, which is interleaved within the nodes for efficiency

If you go down this road, I would suggest using the Boost.MultiIndex library. There is an example of an MRU implementation which is very similar to your needs.
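As a rough illustration of the two elements described above, here is a minimal sketch that uses only standard containers rather than Boost.MultiIndex. The names Key, LruCache, get and put are made up for this example, and the key is assumed to be the five arguments of foo packed into a std::tuple. It is single-threaded and not a tuned implementation.

```cpp
#include <cstddef>
#include <list>
#include <map>
#include <tuple>
#include <utility>

// Hypothetical key type: the five arguments of foo packed into a tuple.
using Key = std::tuple<int, float, float, int, char>;

class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    // Returns true and fills `out` on a hit. A hit also moves the entry to
    // the front of the list, which is why even lookups modify the cache.
    bool get(const Key& key, int& out) {
        auto it = index_.find(key);
        if (it == index_.end()) return false;
        order_.splice(order_.begin(), order_, it->second);
        out = it->second->second;
        return true;
    }

    // Inserts or updates an entry, evicting the least recently used one
    // when the cache is already full.
    void put(const Key& key, int value) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = value;
            order_.splice(order_.begin(), order_, it->second);
            return;
        }
        if (capacity_ == 0) return;
        if (order_.size() == capacity_) {
            index_.erase(order_.back().first);
            order_.pop_back();
        }
        order_.emplace_front(key, value);
        index_[key] = order_.begin();
    }

    std::size_t size() const { return order_.size(); }

private:
    using Entry = std::pair<Key, int>;
    std::size_t capacity_;
    std::list<Entry> order_;                           // front = most recently used
    std::map<Key, std::list<Entry>::iterator> index_;  // key -> position in order_
};
```

A caller would try get first, and on a miss call foo and pass the result to put. Swapping std::map for a hash map would need a hash for the tuple key.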


If you can use Boost, look at boost::unordered_map; otherwise you can use a std::map. You will have to provide a functor for the key: a hash function (plus equality) for unordered_map, or a less-than comparison for std::map.
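A sketch of what such a functor could look like, using std::unordered_map (C++11) in place of boost::unordered_map. FooKey, FooKeyHash and FooCache are hypothetical names, and the mixing step is one common way to combine per-field hashes, not the only one.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>

// Hypothetical key struct holding the arguments of foo.
struct FooKey {
    int a; float b; float c; int d; char e;

    // unordered_map needs equality to resolve hash collisions.
    bool operator==(const FooKey& o) const {
        return a == o.a && b == o.b && c == o.c && d == o.d && e == o.e;
    }
};

// Hash functor: hashes each field with std::hash and mixes the results.
struct FooKeyHash {
    std::size_t operator()(const FooKey& k) const {
        std::size_t seed = 0;
        combine(seed, std::hash<int>()(k.a));
        combine(seed, std::hash<float>()(k.b));
        combine(seed, std::hash<float>()(k.c));
        combine(seed, std::hash<int>()(k.d));
        combine(seed, std::hash<char>()(k.e));
        return seed;
    }

private:
    static void combine(std::size_t& seed, std::size_t h) {
        seed ^= h + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    }
};

using FooCache = std::unordered_map<FooKey, int, FooKeyHash>;
```

With this in place, cache[key] = result stores a value and cache.find(key) checks for an earlier call.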


This doesn't always work and is compiler dependent (these are GCC function attributes), but you can look into using function attributes. The const and pure attributes might be of interest to you; hot might also be of interest.
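For illustration, this is roughly how the const attribute is applied with GCC or Clang. The body of foo here is a made-up stand-in. Note that the attribute is not a cache: it only tells the compiler the result depends on the arguments alone, so it may merge repeated calls that it can see next to each other.

```cpp
// GCC/Clang syntax; other compilers may reject or ignore this attribute.
// `const` promises: no side effects, and the result depends only on the arguments.
__attribute__((const)) int foo(int a, float b, float c, int d, char e);

// Hypothetical stand-in body for the expensive function.
int foo(int a, float b, float c, int d, char e) {
    return a * d + static_cast<int>(b * c) + e;
}

int call_twice(int a, float b, float c, int d, char e) {
    // With optimization enabled the compiler is allowed, but not required,
    // to evaluate foo once here and reuse the result.
    return foo(a, b, c, d, e) + foo(a, b, c, d, e);
}
```

Calls with the same arguments made from unrelated places in the program are still executed each time, so this does not replace an explicit cache.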


Nice question. You have several options. First of all, put all the values into a struct:

struct values
{
   int a;
   float b;
    ...
};
  1. If one of the values of the sequence is most representative, you can just use a std::map to map that representative value to a "bucket". Let's say that the most representative is the float b:

    std::map< float, std::list < std::pair< values, int> > >

    The bucket is represented by the std::list, and stores pairs of value structures and result value (int in this case).

  2. Declare a map from the values to the result, int. For that, the values struct must be comparable against others in the map, so you have to write operator<():

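 bool operator<(values const& left, values const& right)
 {
    // compare member by member, left against right
    if (left.a != right.a) return left.a < right.a;
    if (left.b != right.b) return left.b < right.b;
    ... // compare the remaining members the same way
    return false; // all members equal
 }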

and then declare the map as usual:

std::map<values, int>

There are other issues you have to deal with, such as copy constructors, but this is the idea.

Final note: you can also replace std::map with unordered_map, in which case you need a hash function and operator== instead of operator<.
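Option 2 could be sketched as follows. The comparison uses std::tie for a lexicographic order, the five members are assumed from the signature of foo, and cached_foo, expensive_calls and the body of foo are hypothetical names and stand-ins added for the example. It assumes no argument is NaN, since NaN breaks the ordering std::map relies on.

```cpp
#include <map>
#include <tuple>

// Members assumed from the signature of foo.
struct values { int a; float b; float c; int d; char e; };

// Lexicographic comparison over all members.
bool operator<(values const& l, values const& r) {
    return std::tie(l.a, l.b, l.c, l.d, l.e) < std::tie(r.a, r.b, r.c, r.d, r.e);
}

int expensive_calls = 0;  // counts how often the real function runs

// Hypothetical stand-in for the time-consuming API.
int foo(int a, float b, float c, int d, char e) {
    ++expensive_calls;
    return a + d + static_cast<int>(b * c) + e;
}

// Wrapper that consults the cache before calling foo.
int cached_foo(int a, float b, float c, int d, char e) {
    static std::map<values, int> cache;
    values key = {a, b, c, d, e};
    auto it = cache.find(key);
    if (it != cache.end()) return it->second;  // hit: skip the expensive call
    int result = foo(a, b, c, d, e);
    cache.emplace(key, result);
    return result;
}
```

Replaying the call sequence from the question through cached_foo runs foo twice, not three times.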


Put them all in a structure:

struct mykey{ int a; float b; float c; int d; char e; };

Then fill in the structure, hash it, and use the hash as the key:

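int foo(int a, float b, float c, int d, char e)
{
    mykey tk = { a, b, c, d, e };
    // caution: padding bytes inside tk are not guaranteed to be zero,
    // so hashing the raw bytes can give different results for equal keys
    guid key = md5( &tk, sizeof( tk ) );
    ... // look up key in the cache; on a miss, compute the result and store it
}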
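A sketch of the same idea that avoids hashing padding bytes by feeding each member into the hash separately. It substitutes FNV-1a, a simple non-cryptographic 64-bit hash, for md5, and a plain 64-bit integer for guid. fnv1a and hash_key are names made up for this example.

```cpp
#include <cstddef>
#include <cstdint>

struct mykey { int a; float b; float c; int d; char e; };

// FNV-1a (64-bit), continuing from a running hash value `h`.
std::uint64_t fnv1a(std::uint64_t h, const void* data, std::size_t len) {
    const unsigned char* p = static_cast<const unsigned char*>(data);
    for (std::size_t i = 0; i < len; ++i) {
        h ^= p[i];
        h *= 0x100000001b3ULL;  // FNV prime
    }
    return h;
}

// Hashes only the bytes of the members, never the padding between them.
std::uint64_t hash_key(const mykey& k) {
    std::uint64_t h = 0xcbf29ce484222325ULL;  // FNV offset basis
    h = fnv1a(h, &k.a, sizeof k.a);
    h = fnv1a(h, &k.b, sizeof k.b);
    h = fnv1a(h, &k.c, sizeof k.c);
    h = fnv1a(h, &k.d, sizeof k.d);
    h = fnv1a(h, &k.e, sizeof k.e);
    return h;
}
```

If the hash value alone is used as the cache key, two different argument sets that collide would share a result. A 64-bit hash makes that unlikely but not impossible.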


I'd use nested maps: use the first parameter to look up a map within a map, and so on until the final map, where you look up the last parameter and the result is the previously cached value of foo.

When you reach the last map and find that foo hasn't been called with this set of parameters, you only need to store the result of foo under the last parameter.
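One way the nested maps could look, with one level per parameter. NestedCache, lookup and store are hypothetical names for this sketch.

```cpp
#include <map>

// One map level per parameter of foo; the innermost value is the cached result.
using NestedCache =
    std::map<int, std::map<float, std::map<float, std::map<int, std::map<char, int>>>>>;

// Walks the levels with find() so that a miss does not create empty maps.
// Returns a pointer to the cached result, or nullptr on a miss.
const int* lookup(const NestedCache& cache, int a, float b, float c, int d, char e) {
    auto ia = cache.find(a);
    if (ia == cache.end()) return nullptr;
    auto ib = ia->second.find(b);
    if (ib == ia->second.end()) return nullptr;
    auto ic = ib->second.find(c);
    if (ic == ib->second.end()) return nullptr;
    auto id = ic->second.find(d);
    if (id == ic->second.end()) return nullptr;
    auto ie = id->second.find(e);
    if (ie == id->second.end()) return nullptr;
    return &ie->second;
}

// operator[] creates any missing intermediate maps on the way down.
void store(NestedCache& cache, int a, float b, float c, int d, char e, int result) {
    cache[a][b][c][d][e] = result;
}
```

This avoids writing a comparison or hash for a key struct, at the cost of several lookups per call and one small map per distinct prefix of arguments.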


I suggest using a hash table. You only need to calculate a hash of the arguments. If the hash is strong enough, you can store just the hash and the output value, without storing the arguments. This method should also be faster than using std::map.

In C++ this can be implemented with unordered_map or the non-standard hash_map extension. A very simple hash function will suffice, for example a string hash function.

By the way, the method of storing output values for arguments is called memoization.
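Memoization can also be written once as a generic wrapper. The sketch below is one possible shape, with memoize being a made-up name. It keys a std::map on a tuple of the arguments so that no custom hash is needed. The same structure works with a hash map once a hash for the tuple is supplied.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <tuple>

// Wraps `fn` so that each distinct argument tuple is computed only once.
// The cache is held through a shared_ptr so copies of the returned
// std::function all share the same cache.
template <typename R, typename... Args>
std::function<R(Args...)> memoize(std::function<R(Args...)> fn) {
    auto cache = std::make_shared<std::map<std::tuple<Args...>, R>>();
    return [fn, cache](Args... args) -> R {
        std::tuple<Args...> key(args...);
        auto it = cache->find(key);
        if (it != cache->end()) return it->second;  // already computed
        R result = fn(args...);
        cache->emplace(key, result);
        return result;
    };
}
```

To use it, wrap foo in a std::function<int(int, float, float, int, char)>, pass that to memoize, and call the returned function instead of foo. The cache here grows without bound and is not thread-safe.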
