I'll begin by apologizing for the length of this post. To save you some time: my problem is that the class pattern I've got stuck in my head is obviously flawed, and I can't see a good solution.
In a project I'm working on, I need to run algorithms on chunks of data; let's call them DataCache. Sometimes these algorithms return results that themselves need to be cached, so I devised a scheme.
I have an Algorithm base class that looks like this:
abstract class Algorithm<T>
{
    protected abstract T ExecuteAlgorithmLogic(DataCache dataCache);
    private readonly Dictionary<DataCache, WeakReference> _resultsWeak = new Dictionary<DataCache, WeakReference>();
    private readonly Dictionary<DataCache, T> _resultsStrong = new Dictionary<DataCache, T>();
    public T ComputeResult(DataCache dataCache, bool save = false)
    {
        if (_resultsStrong.ContainsKey(dataCache))
            return _resultsStrong[dataCache];
        if (_resultsWeak.ContainsKey(dataCache))
        {
            var temp = _resultsWeak[dataCache].Target;
            if (temp != null) return (T) temp;
        }
        var result = ExecuteAlgorithmLogic(dataCache);
        _resultsWeak[dataCache] = new WeakReference(result, true);
        if (save) _resultsStrong[dataCache] = result;
        return result;
    }
}
If you call ComputeResult() and provide a DataCache, you can optionally choose to cache the result. Also, if you are lucky, the result might still be there if the GC hasn't collected it yet. Each DataCache is hundreds of megabytes in size and, before you ask, holds about 10 arrays of basic types such as int and float.
My idea here was that an actual algorithm would look something like this:
class ActualAgorithm : Algorithm<SomeType>
{
    protected override SomeType ExecuteAlgorithmLogic(DataCache dataCache)
    {
        //Elves be here
    }
}
I would define tens of .cs files, one per algorithm. There are two problems with this approach. First, for this to work, I need to instantiate each algorithm and keep that instance (otherwise the results are not cached and the entire point is moot). But then I end up with an unsightly singleton implementation in each derived class. It would look something like this:
class ActualAgorithm : Algorithm<SomeType>
{
    protected override SomeType ExecuteAlgorithmLogic(DataCache dataCache)
    {
        //Elves and dragons be here
    }
    protected ActualAgorithm(){ }
    private static ActualAgorithm _instance;
    public static ActualAgorithm Instance
    {
        get
        {
            _instance = _instance ?? new ActualAgorithm();
            return _instance;
        }
    }
}
So in each implementation I would have to duplicate the singleton code. Secondly, tens of .cs files also sounds like overkill, since what I'm really after is just a single function returning results that can be cached for various DataCache objects. Surely there must be a smarter way of doing this, and I would greatly appreciate a nudge in the right direction.
What I meant with my comment was something like this:
abstract class BaseClass<K,T> where T : BaseClass<K,T>, new()
{
    private static T _instance;
    public static T Instance
    {
        get
        {
            _instance = _instance ?? new T();
            return _instance;
        }
    }
}
class ActualClass : BaseClass<int, ActualClass>
{
    public ActualClass() {}
}
class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine(ActualClass.Instance.GetType().ToString());
        Console.ReadLine();
    }
}
The only problem here is that you'll have a public constructor.
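One more caveat with a getter of the form _instance ?? new T() is that it is not thread-safe: two threads that hit Instance at the same time can each create an instance. Below is a minimal sketch of the same generic-base idea using Lazy<T>, assuming .NET 4 or later. SingletonBase and SampleAlgorithm are made-up names for illustration, and the public constructor requirement remains.

```csharp
using System;

// Hypothetical base class: one lazily created instance per derived type
abstract class SingletonBase<T> where T : SingletonBase<T>, new()
{
    // With this constructor, Lazy<T> runs the factory at most once,
    // even when several threads ask for the value at the same time
    private static readonly Lazy<T> _lazy = new Lazy<T>(() => new T());

    public static T Instance
    {
        get { return _lazy.Value; }
    }
}

// Hypothetical derived class; the new() constraint still forces a
// public parameterless constructor (the implicit one is used here)
sealed class SampleAlgorithm : SingletonBase<SampleAlgorithm>
{
    public int Runs;
}
```

Because static fields of a generic class exist once per closed type, every derived class gets its own _lazy and therefore its own instance.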
I refined my previous answer, but as the result is rather different from the other approach I proposed, I thought I might as well post it as a separate answer. First, we'll need to declare some interfaces:
// Where to find cached data
interface DataRepository {
    void cacheData(Key k, Data d);
    Data retrieveData(Key k, Data d);
}
// If by any chance we need an algorithm somewhere
interface AlgorithmRepository {
    Algorithm getAlgorithm(Key k);
}
// The algorithm that processes data
interface Algorithm {
    void processData(Data in, Data out);
}
Given these interfaces, we can define some basic implementation for the algorithm repository:
class BaseAlgorithmRepository implements AlgorithmRepository {
    // The algorithm dictionary
    Map<Key, Algorithm> algorithms = new HashMap<Key, Algorithm>();
    // On init, we'll build our repository using this function
    void setAlgorithmForKey(Key k, Algorithm a) {
        algorithms.put(k, a);
    }
    // ... implement the other function of the interface
}
Then we can also implement something for the DataRepository:
// Named BaseDataRepository so that it does not clash with the DataRepository interface
class BaseDataRepository implements DataRepository {
    AlgorithmRepository algorithmRepository;
    Map<Key, Data> cache = new HashMap<Key, Data>();
    void cacheData(Key k, Data d) {
        cache.put(k, d);
    }
    Data retrieveData(Key k, Data in) {
        Data d = cache.get(k);
        if (d == null) {
            // Data not found in the cache, so we try to produce it ourselves
            d = new Data();
            Algorithm a = algorithmRepository.getAlgorithm(k);
            a.processData(in, d);
            // This is optional, you could simply throw an exception to say that the
            // data has not been cached and thus, the algorithm succession did not
            // produce the necessary data. So instead of the above, you could simply:
            // throw new DataNotCached(k);
            // and thus halt the whole processing
        }
        return d;
    }
}
Finally, we get to implement algorithms:
abstract class BaseAlgorithm implements Algorithm {
    DataRepository repository;
}
class SampleNoCacheAlgorithm extends BaseAlgorithm {
    void processData(Data in, Data out) {
        // do something with in to compute out
    }
}
class SampleCacheProducerAlgorithm extends BaseAlgorithm {
    static Key KEY = "SampleCacheProducerAlgorithm.myKey";
    void processData(Data in, Data out) {
        // do something with in to compute out
        // then call repository.cacheData(KEY, out);
    }
}
class SampleCacheConsumerAlgorithm extends BaseAlgorithm {
    void processData(Data in, Data out) {
        // Data tmp = repository.retrieveData(SampleCacheProducerAlgorithm.KEY, in);
        // do something with in and tmp to compute out
    }
}
To build on this, I think you could also define some special kinds of algorithms that are in fact just composites of other algorithms but also implement the Algorithm interface. An example could be:
class AlgorithmChain extends BaseAlgorithm {
    List<Algorithm> chain;
    void processData(Data in, Data out) {
        Data currentIn = in;
        for (int i = 0; i < chain.size(); i++) {
            // The last algorithm writes straight into out; earlier ones use a temporary
            Data currentOut = (i == chain.size() - 1) ? out : new Data();
            chain.get(i).processData(currentIn, currentOut);
            currentIn = currentOut;
        }
    }
}
One addition I would make to this is a DataPool that would allow you to reuse existing but unused Data objects, in order to avoid allocating lots of memory each time you call new Data().
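A DataPool along those lines could look roughly like the sketch below, written in C# since that is the language of the question. Data here is a placeholder type, and the Acquire and Release method names are my own invention. The sketch is not thread-safe and does not clear the contents of a released object, so treat it as an illustration of the idea rather than a finished component.

```csharp
using System.Collections.Generic;

// Placeholder for the Data type used above
class Data
{
    public float[] Values = new float[0];
}

// Hypothetical pool: hands out released Data objects before allocating new ones
class DataPool
{
    private readonly Stack<Data> _free = new Stack<Data>();

    // Reuse a released instance if one is available, otherwise allocate
    public Data Acquire()
    {
        return _free.Count > 0 ? _free.Pop() : new Data();
    }

    // Hand an instance back to the pool; the caller must not use it afterwards
    public void Release(Data data)
    {
        _free.Push(data);
    }
}
```

An algorithm chain would then call Acquire() for its intermediate results and Release() once the next stage has consumed them.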
I think this set of classes could give a good basis for your whole architecture, with the additional benefit that it does not employ any singleton (references to the relevant objects are always passed explicitly). That also means that implementing dummy classes for unit tests would be rather easy.
You could have your algorithms independent of their results:
class Engine<T> {
    Map<AlgorithmKey, Algorithm<T>> algorithms;
    Map<AlgorithmKey, Data> algorithmsResultCache;
    T processData(Data in);
}
interface Algorithm<T> {
    boolean doesResultNeedsToBeCached();
    T processData(Data in);
}
Then your Engine is responsible for instantiating the algorithms, which are only pieces of code where the input is data and the output is either null or some data. Each algorithm can say whether its result needs to be cached or not.
To refine my answer, I think you should give more details about how the algorithms are to be run (is there an order, is it user-adjustable, do we know in advance which algorithms will be run, ...).
Can you register your algorithm instances with a combined repository/factory of algorithms that'll keep references to them? The repository could be a singleton and, if you give the repository control of algorithm instantiation, you could use it to ensure that only one instance of each exists.
public class AlgorithmRepository
{
    //... use boilerplate singleton code
    public void CreateAlgorithm(Algorithms algorithm)
    {
        //... add to some internal hash or map, checking that it hasn't been created already
        //... Algorithms is just an enum telling it which to create (clunky factory
        //    implementation)
    }
    public void ComputeResult(Algorithms algorithm, DataCache datacache)
    {
        // Can lazy load algorithms here and make CreateAlgorithm private ..
        CreateAlgorithm(algorithm);
        //... compute and return.
    }
}
This said, having a separate class (and cs file) for each algorithm makes sense to me. You could break with convention and have multiple algo classes in a single cs file if they're lightweight and it makes it easier to manage if you're worried about the number of files -- there are worse things to do. FWIW I'd just put up with the number of files ...
Typically, when you create a Singleton class you don't want to inherit from it. When you do this you lose some of the goodness of the Singleton pattern (and what I hear from the pattern zealots is that an angel loses its wings every time you do something like this). But let's be pragmatic... sometimes you do what you have to do.
Regardless, I do not think combining generics and inheritance will work in this instance anyway.
You indicated the number of algorithms will be in the tens (not hundreds). As long as this is the case, I would create a dictionary keyed off of System.Type and store references to your methods as the values of the dictionary. In this case I used Func<DataCache, object> as the dictionary value signature.
When the class is instantiated for the first time, register all your available algorithms in the dictionary. At runtime, when the class needs to execute an algorithm for type T, it will get the Type of T and look up the algorithm in the dictionary.
If the code for the algorithms is relatively involved, I would suggest splitting them off into partial classes just to keep your code readable.
public sealed partial class Algorithm<T>
{
    private static object ExecuteForSomeType(DataCache dataCache)
    {
        return new SomeType();
    }
}
public sealed partial class Algorithm<T>
{
    private static object ExecuteForSomeOtherType(DataCache dataCache)
    {
        return new SomeOtherType();
    }
}
public sealed partial class Algorithm<T>
{
    private readonly Dictionary<System.Type, Func<DataCache, object>> _algorithms = new Dictionary<System.Type, Func<DataCache, object>>();
    private readonly Dictionary<DataCache, WeakReference> _resultsWeak = new Dictionary<DataCache, WeakReference>();
    private readonly Dictionary<DataCache, T> _resultsStrong = new Dictionary<DataCache, T>();
    private Algorithm() { }
    private static Algorithm<T> _instance;
    public static Algorithm<T> Instance
    {
        get
        {
            if (_instance == null)
            {
                _instance = new Algorithm<T>();
                _instance._algorithms.Add(typeof(SomeType), ExecuteForSomeType);
                _instance._algorithms.Add(typeof(SomeOtherType), ExecuteForSomeOtherType);
            }
            return _instance;
        }
    }
    public T ComputeResult(DataCache dataCache, bool save = false)
    {
        T strong;
        if (_resultsStrong.TryGetValue(dataCache, out strong))
            return strong;
        WeakReference weak;
        if (_resultsWeak.TryGetValue(dataCache, out weak))
        {
            // Target is null once the GC has collected the result
            var cached = weak.Target;
            if (cached != null) return (T)cached;
        }
        // Look the algorithm up by typeof(T); casting a plain new object() to T
        // would throw an InvalidCastException for any T other than object
        T returnValue = (T)_algorithms[typeof(T)](dataCache);
        _resultsWeak[dataCache] = new WeakReference(returnValue, true);
        if (save) _resultsStrong[dataCache] = returnValue;
        return returnValue;
    }
}
First off, I'd suggest you rename DataCache to something like DataInput for more clarity, because it's easy to confuse it with objects that really act as caches (_resultsWeak and _resultsStrong) to store the results.
Concerning the need for these caches to remain in memory for future use, maybe you should consider placing them in one of the wider scopes that a .NET application offers beyond the object scope, Application or Session for example.
You could also use an AlgorithmLocator (see ServiceLocator pattern) as a single point of access to all Algorithms to get rid of the singleton logic duplication in each Algorithm.
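To make the locator idea concrete, here is a minimal sketch under my own assumptions: the Get<TAlgorithm>() method and the EdgeDetection class are hypothetical names, and the locator simply keeps one instance per algorithm type in a dictionary guarded by a lock.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical locator: owns the single shared instance of every algorithm type
static class AlgorithmLocator
{
    private static readonly Dictionary<Type, object> _instances = new Dictionary<Type, object>();
    private static readonly object _gate = new object();

    // Returns the shared instance of TAlgorithm, creating it on first use
    public static TAlgorithm Get<TAlgorithm>() where TAlgorithm : class, new()
    {
        lock (_gate)
        {
            object instance;
            if (!_instances.TryGetValue(typeof(TAlgorithm), out instance))
            {
                instance = new TAlgorithm();
                _instances[typeof(TAlgorithm)] = instance;
            }
            return (TAlgorithm)instance;
        }
    }
}

// Hypothetical algorithm class that contains no singleton code of its own
class EdgeDetection
{
    public int Runs;
}
```

Callers would write AlgorithmLocator.Get<EdgeDetection>() instead of EdgeDetection.Instance, so the algorithm classes stay free of instance-management code and their caches live as long as the locator does.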
Other than that, I find your solution to be a good one overall. Whether or not it is overkill will basically depend on the homogeneity of your algorithms. If they all cache data and return their results in the same way, it will be a great benefit to have all that logic factored out into a single place. But we lack the context to judge.
Encapsulating the caching logic in a specific object held by the Algorithm (CachingStrategy ?) would also be an alternative to inheriting it, but maybe a bit awkward since the caching object would have to access the cache before and after calculation and would need to be able to trigger algorithm calculation itself and have a hand on the results.
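One way this composition could look is sketched below, with the algorithm passed in as a delegate so that the cache object can trigger the calculation itself. ResultCache and its type parameters are hypothetical names. The sketch assumes the result is a reference type, and unlike the code in the question it also honours save when the result is found through the weak reference.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache object that wraps any algorithm given as a delegate
class ResultCache<TInput, TResult> where TResult : class
{
    private readonly Func<TInput, TResult> _algorithm;
    private readonly Dictionary<TInput, TResult> _strong = new Dictionary<TInput, TResult>();
    private readonly Dictionary<TInput, WeakReference> _weak = new Dictionary<TInput, WeakReference>();

    public ResultCache(Func<TInput, TResult> algorithm)
    {
        _algorithm = algorithm;
    }

    public TResult ComputeResult(TInput input, bool save = false)
    {
        TResult result;
        if (_strong.TryGetValue(input, out result))
            return result;

        WeakReference weak;
        if (_weak.TryGetValue(input, out weak))
        {
            // Target is null once the GC has collected the result
            result = weak.Target as TResult;
            if (result != null)
            {
                if (save) _strong[input] = result;
                return result;
            }
        }

        // Cache miss: run the wrapped algorithm and remember its result
        result = _algorithm(input);
        _weak[input] = new WeakReference(result);
        if (save) _strong[input] = result;
        return result;
    }
}
```

With this shape an algorithm is just a function, for example new ResultCache<DataCache, SomeType>(ComputeSomething) where ComputeSomething is whatever method holds the logic, so no subclass per algorithm is required.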
[Edit] if you're concerned with having one .cs file per algorithm, you can always group all Algorithm classes pertaining to a particular T in the same file.