LRU caches in C

https://www.devze.com 2023-01-03 00:56 (source: web)
I need to cache a large (but variable) number of smallish (1 kilobyte to 10 megabytes) files in memory, for a C application (in a *nix environment). Since I don't want to eat all my memory, I'd like to set hard memory limit (say, 64 megabytes) and push files into a hash table with the file name as the key and dispose of the entries with the least use. What I believe I need is an LRU cache.

Really, I'd rather not roll my own, so if someone knows where I can find a workable library, please point the way. Failing that, can someone provide a simple example of an LRU cache in C? Related posts indicated that a hash table combined with a doubly-linked list would work, but I'm not even clear on how a doubly-linked list keeps LRU order.

Side note: I realize this is almost exactly the function of memcache, but it's not an option for me. I also took a look at the source hoping to enlighten myself on LRU caching, with no success.


Related posts indicated that a hash table combined with a doubly-linked list would work, but I'm not even clear on how a doubly-linked list keeps LRU order.

I'm just taking a guess here, but you could do something like this (using pseudo-C here because I'm lazy). Here are the basic data structures:

struct File
{
    // hash key
    string Name;

    // doubly-linked list
    File* Previous;
    File* Next;

    // other file data...
};

struct Cache
{
    HashTable<string, File*> Table; // some existing hashtable implementation
    File* First; // most recent
    File* Last;  // least recent
};

And here's how you'd open and close a file:

File* Open(Cache* cache, string name)
{
    if (look up name in cache->Table succeeds)
    {
        File* found = find it from the hash table lookup
        move it to the front of the list
        return found;
    }
    else
    {
        File* newFile = open the file and create a new node for it

        insert it at the beginning of the list
        add it to the hashtable

        if (the cache is full now)
        {
            remove the last file from the list
            close it
            remove it from the hashtable too
        }

        return newFile;
    }
}

The hashtable lets you find nodes by name quickly, and the linked list lets you keep them in most-recently-used order. Since both point to the same nodes, you can move between them: look a file up by name through the hashtable, then splice it to the front of the list.

But I could be totally wrong about all of this.


If you're using Linux, I think the OS will do all you need, especially if you take advantage of the fadvise system call to let the system know what files you plan to use next.


koders.com locates a few; the one that's easiest to adapt and reuse (if you're OK with its license conditions) appears to be this one from the FreeType project (will take some figuring out for its, ahem, interesting preprocessor work). At worst, it should show you one approach whereby you can implement an LRU cache in C.

Most reusable LRU cache implementations (and there are many to be found on the net), of course, use handier languages (Java, C++, C#, Python, ...) which offer stronger data structures and, typically, memory management.


It seems you can build an LRU cache in C with uthash.

What I like most about uthash is that it's a single header file, mostly macros, so your extra dependencies are kept to a minimum.


I'm not aware of any general-purpose C library for this in a Unix environment, but it shouldn't be hard to implement.

For code samples, I suggest looking around at any of the gazillion (oi) hash table implementations out there. Whether the table uses a linked list or a tree structure for the actual processing, it is not uncommon for some form of caching to be used (such as MRU), so it may give you an idea of what an implementation might look like. Some simple Garbage Collectors and various bits of software needing a page replacement algorithm may also be worth a look.

Basically, you mark things when they are accessed and age the references. If you bump only the accessed item's own timestamp, rather than aging every other item on each access, you save a loop on every access and push the work onto the expiration operation. You'll want to do some light profiling to get a general idea of how least-recent is !recent enough for your task. Once you get to that point, you just update the cache accordingly.
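That marking-and-aging idea can be sketched with a global tick counter: each access stamps only the touched entry, and expiration scans for the smallest stamp. Everything here (the names `touch`/`cached`, the fixed slot count standing in for the memory limit) is a made-up illustration of the scheme, not a drop-in implementation.

```c
#include <string.h>

#define NSLOTS 4

typedef struct {
    char name[32];
    unsigned long stamp;   /* global tick at last access; 0 = empty slot */
} Entry;

static Entry slots[NSLOTS];
static unsigned long tick = 0;

/* Mark an entry as used "now" by stamping it with a global counter,
   instead of aging every other entry on each access. */
void touch(const char *name) {
    int i, victim = 0;
    for (i = 0; i < NSLOTS; i++)
        if (slots[i].stamp && strcmp(slots[i].name, name) == 0) {
            slots[i].stamp = ++tick;
            return;
        }
    /* Not present: take an empty slot, or expire the least recent one
       (the slot with the smallest stamp). */
    for (i = 1; i < NSLOTS; i++)
        if (slots[i].stamp < slots[victim].stamp)
            victim = i;
    strncpy(slots[victim].name, name, sizeof slots[victim].name - 1);
    slots[victim].name[sizeof slots[victim].name - 1] = '\0';
    slots[victim].stamp = ++tick;
}

int cached(const char *name) {
    int i;
    for (i = 0; i < NSLOTS; i++)
        if (slots[i].stamp && strcmp(slots[i].name, name) == 0)
            return 1;
    return 0;
}
```

The linear eviction scan is the cost this scheme pays for O(1) accesses; the hash-table-plus-linked-list design earlier in the thread makes eviction O(1) too, at the price of extra pointers per entry.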

