I'd like to ask your expert advice on a workable architecture in C#.
I have a C# service which responds to a request from a local user on the LAN, fetches packets of data from the internet, and crunches that data to produce arrays of data in a structure. Each data request takes about 2 seconds, and returns 4000 bytes. There could be tens of thousands of requests per day.
To speed everything up, and reduce bandwidth, I need to cache the results of the data crunching so that 2nd and subsequent accesses are served instantly to any other users on the LAN (there could be >50 users).
Constraints:
- The underlying data never changes, i.e. I don't have to worry about "dirty" data (great!).
- The data I want to cache is a rather complex structure, containing nested arrays of DateTime, doubles, etc. The data is crunched, using a lot of math, from the data served from the internet.
- I can't use more than 100MB of memory no matter how much data is cached (i.e. the cache must be size limited).
- I can't index the data in the cache by a numerical index, I have to index it with a combination of date ("YYYY-MM-DD") and a unique ID string ("XXXXXXXX").
- It has to be fast, i.e. it has to serve most of its responses from RAM.
- The data in the cache must be persisted to disk every 24 hours.
Here are my options at the moment:
- Cache the data in the server class, using private variables (i.e. private List or Dictionary), then serialize it to disk occasionally;
- Use a database;
I'm interested in your expert opinion.
By far the easiest solution is to use a Dictionary<string, ComplexDataStructure> for this.
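A minimal sketch of what that could look like (ComplexDataStructure here is just a placeholder for your crunched result, and the member names and key format are my own assumptions, not something you have to use):

```csharp
using System;
using System.Collections.Generic;

// Placeholder for the crunched result described in the question.
public class ComplexDataStructure
{
    public DateTime[] Timestamps;
    public double[][] Values;
    public DateTime CachedAtUtc;   // used later for expiry / eviction
}

public class ResultCache
{
    private readonly Dictionary<string, ComplexDataStructure> _items =
        new Dictionary<string, ComplexDataStructure>();

    // Combine the date and the unique ID into a single string key.
    private static string MakeKey(DateTime date, string id)
    {
        return date.ToString("yyyy-MM-dd") + "|" + id;
    }

    public bool TryGet(DateTime date, string id, out ComplexDataStructure result)
    {
        return _items.TryGetValue(MakeKey(date, id), out result);
    }

    public void Add(DateTime date, string id, ComplexDataStructure result)
    {
        result.CachedAtUtc = DateTime.UtcNow;
        _items[MakeKey(date, id)] = result;
    }
}
```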
Concerning your requirements:
1. Lifetime of the cache is easiest to manage by having a background thread that scans the cache every 10 minutes or every hour or so. With the ComplexDataStructure, you store a DateTime of when the cache entry was created, and remove the key from the dictionary once its lifetime has expired (a sketch of this maintenance thread follows this list);
2. Because you are storing the actual data structure, complexity is not an issue;
3. Limiting the size may be difficult. The SO question "sizeof() equivalent for reference types?" may help you calculate the size of the object structure. This operation will not be trivial, but you can store the result with the ComplexDataStructure. Then, the same thread as the one used for 1. can remove entries when you run out of space. An easier solution would probably be to use GC.GetTotalMemory() and determine whether the total memory usage of your process is over a specific limit; then just remove a cache item, and on the next run, when you see you're still using too much memory, remove a second one;
4. For the key, just use a string: combine the date and the unique ID;
5. Using the Dictionary<,> is probably by far the fastest way;
6. Again, use the thread from 1. and implement the persistence logic there.
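Here is a sketch of the maintenance thread described in 1., 3. and 6. The interval, the eviction policy and the member names are illustrative assumptions, and it reuses the ComplexDataStructure placeholder from the sketch above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public class CacheSweeper
{
    private readonly Dictionary<string, ComplexDataStructure> _items;
    private readonly object _sync;
    private const long MemoryLimitBytes = 100L * 1024 * 1024;   // 100 MB constraint
    private static readonly TimeSpan SweepInterval = TimeSpan.FromMinutes(10);
    private static readonly TimeSpan Lifetime = TimeSpan.FromHours(24);

    public CacheSweeper(Dictionary<string, ComplexDataStructure> items, object sync)
    {
        _items = items;
        _sync = sync;
        var thread = new Thread(SweepLoop) { IsBackground = true };
        thread.Start();
    }

    private void SweepLoop()
    {
        while (true)
        {
            Thread.Sleep(SweepInterval);
            lock (_sync)
            {
                // 1. Drop entries whose lifetime has expired.
                var expired = _items
                    .Where(kv => DateTime.UtcNow - kv.Value.CachedAtUtc > Lifetime)
                    .Select(kv => kv.Key)
                    .ToList();
                foreach (var key in expired)
                    _items.Remove(key);

                // 3. If the process is still over the memory limit, evict the
                //    oldest entries one at a time until it is not.
                while (GC.GetTotalMemory(forceFullCollection: true) > MemoryLimitBytes
                       && _items.Count > 0)
                {
                    var oldest = _items.OrderBy(kv => kv.Value.CachedAtUtc).First().Key;
                    _items.Remove(oldest);
                }

                // 6. The 24-hour persistence to disk could be triggered from this
                //    same loop (serialization details omitted here).
            }
        }
    }
}
```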
Make sure you handle your locking strategy correctly. The largest issue here is that you don't want one thread to start crunching data that a different thread is already crunching. A solution could be the following strategy (a code sketch follows the list):
- Lock the dictionary;
- Verify whether the cache item exists;
- When the cache item does not exist:
  - Create an empty cache item;
  - Add it to the dictionary;
  - Put a lock on the cache item;
  - Release the lock on the dictionary;
  - Do the data crunching;
  - Add the crunched data to the cache item;
  - Release the lock on the cache item;
- When the cache item already exists:
  - When the cache item actually does have the crunched data, return that;
  - When the cache item does not have the crunched data yet, put a lock on the cache item;
  - Inside the lock, the crunched data will have appeared (because the lock forces you to wait on the other thread).
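A rough sketch of that strategy, slightly simplified: whichever thread takes the item lock first and finds no data does the crunching, and later threads find the result already present. Type and member names are placeholders:

```csharp
using System;
using System.Collections.Generic;

public class CruncherCache
{
    // Each cache item carries its own lock object, so crunching for one key
    // does not block lookups or crunching for other keys.
    private class CacheItem
    {
        public readonly object ItemLock = new object();
        public ComplexDataStructure Data;   // null until crunching finishes
    }

    private readonly Dictionary<string, CacheItem> _items =
        new Dictionary<string, CacheItem>();
    private readonly object _dictionaryLock = new object();

    public ComplexDataStructure GetOrCrunch(string key, Func<ComplexDataStructure> crunch)
    {
        CacheItem item;

        // Lock the dictionary only long enough to find or create the item.
        lock (_dictionaryLock)
        {
            if (!_items.TryGetValue(key, out item))
            {
                item = new CacheItem();
                _items.Add(key, item);
            }
        }

        // Lock the individual item: if another thread is crunching the same
        // key, this blocks until its result is available.
        lock (item.ItemLock)
        {
            if (item.Data == null)
                item.Data = crunch();   // expensive work happens outside the dictionary lock
            return item.Data;
        }
    }
}
```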
There are other issues that will have to be resolved, but I think the basics are described here.
Perhaps something like Index4Objects (i4o)?
http://www.codeplex.com/i4o and http://staxmanade.blogspot.com/2008/12/i4o-indexspecification-for.html
Also, maybe read this response to another SO question: "i4o vs. PLINQ".
What about using the internal caching methods that IIS provides?
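Presumably that means the System.Web.Caching cache reachable through HttpRuntime.Cache. A minimal sketch, assuming a 24-hour absolute expiration (the helper name and expiration choice are my own assumptions):

```csharp
using System;
using System.Web;
using System.Web.Caching;

public static class AspNetCacheExample
{
    public static T GetOrAdd<T>(string key, Func<T> crunch) where T : class
    {
        // HttpRuntime.Cache is available even outside a web request.
        var cached = HttpRuntime.Cache.Get(key) as T;
        if (cached != null)
            return cached;

        var result = crunch();
        HttpRuntime.Cache.Insert(
            key,
            result,
            null,                           // no cache dependency
            DateTime.UtcNow.AddHours(24),   // absolute expiration
            Cache.NoSlidingExpiration);
        return result;
    }
}
```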
I think I've found the perfect solution: PostSharp + Kellerman .NET logging library. PostSharp requires a slight learning curve (about 15 minutes), but once you are up and running, you can annotate your method with the attribute [Cachable], and the system will automatically cache the results of this method for you. It's about as clean a solution as you can possibly get.
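For illustration, this is roughly what an attribute-driven cache looks like when hand-rolled as a PostSharp OnMethodBoundaryAspect. It is not the Kellerman library's actual [Cachable] attribute, just a sketch of the mechanism, and the key format is an assumption:

```csharp
using System;
using System.Collections.Concurrent;
using PostSharp.Aspects;

[Serializable]
public class CachableAttribute : OnMethodBoundaryAspect
{
    private static readonly ConcurrentDictionary<string, object> Cache =
        new ConcurrentDictionary<string, object>();

    public override void OnEntry(MethodExecutionArgs args)
    {
        var key = BuildKey(args);
        object cached;
        if (Cache.TryGetValue(key, out cached))
        {
            // Short-circuit the method and return the cached result.
            args.ReturnValue = cached;
            args.FlowBehavior = FlowBehavior.Return;
        }
        else
        {
            args.MethodExecutionTag = key;   // remember the key for OnSuccess
        }
    }

    public override void OnSuccess(MethodExecutionArgs args)
    {
        // Store the freshly computed result under the key built in OnEntry.
        Cache[(string)args.MethodExecutionTag] = args.ReturnValue;
    }

    private static string BuildKey(MethodExecutionArgs args)
    {
        return args.Method.Name + "(" + string.Join(",", args.Arguments.ToArray()) + ")";
    }
}

// Usage: the method's result is cached per argument combination.
public class DataService
{
    [Cachable]
    public double[] Crunch(DateTime date, string id)
    {
        /* expensive fetch + math here */
        return new double[0];
    }
}
```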