How to avoid coupling two methods that have a similar implementation now, but which may change in the future?

I've got two methods in a class that currently share a very similar implementation, and that implementation is quite costly in terms of performance.

Sample:

class Example
{
    public void process(Object obj)
    {
        boolean someFact = getFirstFact(obj);
        boolean someOtherFact = getSecondFact(obj);

        //use the two facts somehow
    }

    public boolean getFirstFact(Object obj)
    {
         boolean data = someExpensiveMethod(obj);
         //return some value derived from data
    }

    public boolean getSecondFact(Object obj)
    {
         boolean data = someExpensiveMethod(obj);
         //return some other value derived from data
    }

    public boolean someExpensiveMethod(Object obj){...}
}

I've thought about somehow caching the result of someExpensiveMethod, but that seems wasteful, given that objects tend to come in, be processed and then discarded. It also seems clunky - methods need to know about a cache, or I need to cache results in someExpensiveMethod.

Even short term cache could be bad news, since literally millions of objects get processed every day.

My concerns are twofold: firstly, there is no guarantee that these two methods will always depend on the third, so any solution should be transparent from their POV; and secondly, the obvious solution (caching inside someExpensiveMethod) could be very costly in terms of space for results that don't need to be kept long term.

"I've thought about somehow caching the result of someExpensiveMethod, but that seems wasteful, given that objects tend to come in, be processed and then discarded."

I don't see how that is wasteful. This is basically how caches work. You compare the objects that come in against the ones that you've recently processed, and when you get a "hit" you avoid the expense of calling someExpensiveMethod.

Whether caching actually works for your application will depend on a number of factors like:

the number of object / result pairs that you can keep in your cache,
the probability of a "hit",
the average cost of performing a cache probe (in the "hit" and "miss" cases),
the average cost of calling someExpensiveMethod,
the direct costs of maintaining the cache; e.g. if you use LRU or some other strategy to get rid of cache entries that are not helping, and
the indirect cost of maintaining the cache.

(The last point is hard to predict / measure, but it includes the extra memory needed to represent the cache structures, work that the GC has to do to deal with the fact that the cache and its contents are "reachable", and the GC overheads associated with weak references ... assuming that you use them.)

Ultimately, the success (or otherwise) of a caching solution is judged in terms of the system's average behavior for realistic workloads. The fact that some cached results are never used again is not really relevant.
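To make those factors measurable rather than guessed, here is a minimal sketch of a bounded LRU cache that counts hits and misses. CountingLruCache and its method names are hypothetical (they are not from the question), the loader function stands in for someExpensiveMethod, and the class is not thread-safe as written.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper: a bounded LRU cache that records its hit and miss counts. */
class CountingLruCache<K, V> {
    private final Map<K, V> entries;
    private final Function<K, V> loader; // stands in for someExpensiveMethod
    private long hits;
    private long misses;

    CountingLruCache(final int maxEntries, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true keeps the least recently used entry first,
        // so removeEldestEntry evicts in LRU order once the cap is exceeded.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    V get(K key) {
        if (entries.containsKey(key)) {
            hits++;
            return entries.get(key); // also marks the entry as recently used
        }
        misses++;
        V value = loader.apply(key);
        entries.put(key, value);
        return value;
    }

    long hits() { return hits; }
    long misses() { return misses; }
    int entryCount() { return entries.size(); }
}
```

Running a realistic workload through something like this and comparing hits() against misses() gives the hit probability directly; the memory and GC overheads still have to be measured separately.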

"It also seems clunky - methods need to know about a cache, or I need to cache results in someExpensiveMethod."

Again, IMO it is not "clunky" either way. This is the way that you implement caching.

"Even short term cache could be bad news, since literally millions of objects get processed every day."

Again, I don't see the logic of your argument. If millions of objects are processed a day and you keep (say) the last 5 minutes' worth, that is just tens of thousands of objects to cache. That is hardly "bad news".
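The arithmetic behind that estimate can be sketched as follows. The figure of 5 million objects per day is an assumption chosen for illustration, as is the even arrival rate; CacheSizing is a hypothetical name.

```java
/** Back-of-the-envelope cache sizing; every number passed in is an assumption. */
class CacheSizing {
    /** Entries needed to hold the last windowMinutes of traffic, assuming an even arrival rate. */
    static long entriesInWindow(long objectsPerDay, long windowMinutes) {
        final long minutesPerDay = 24 * 60; // 1440
        return objectsPerDay * windowMinutes / minutesPerDay;
    }
}
```

For example, CacheSizing.entriesInWindow(5_000_000L, 5) evaluates to 17361, i.e. on the order of tens of thousands of entries for a five-minute window.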

If you really are processing "literally millions" of objects a day, then:

someExpensiveMethod cannot be that expensive ... unless you have either highly effective caching and lots of memory, or a large number of processors, or both,
your concerns about elegance (unclunkiness) and avoiding coupling must be secondary to the issue of designing the application so that it can keep up, and
you'll probably need to run on a multiprocessor, and you will therefore need to deal with the fact that a cache can be a concurrency bottleneck.
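On the concurrency point, one common way to avoid a single lock around the whole cache is to build it on ConcurrentHashMap. The following is only a sketch under assumptions: ConcurrentMemo and forget are hypothetical names, the function passed in stands in for someExpensiveMethod, and there is no eviction policy, so the caller has to remove entries or the map grows without bound.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical memoizer that several threads can call without one global lock. */
class ConcurrentMemo<K, V> {
    private final ConcurrentHashMap<K, V> results = new ConcurrentHashMap<>();
    private final Function<K, V> expensive; // stands in for someExpensiveMethod

    ConcurrentMemo(Function<K, V> expensive) {
        this.expensive = expensive;
    }

    V get(K key) {
        // ConcurrentHashMap.computeIfAbsent applies the function at most once
        // per absent key; other threads asking for the same key wait for it.
        return results.computeIfAbsent(key, expensive);
    }

    /** No automatic eviction here: the caller drops an entry once the object is done with. */
    void forget(K key) {
        results.remove(key);
    }
}
```

Keys must have sensible equals and hashCode implementations, and the function must not return null, since ConcurrentHashMap stores no mapping in that case.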

Are you always invoking the process method (I mean, do you never invoke the get...Fact methods directly)? If that is the case, then you know for certain that getFirstFact is always invoked before getSecondFact.

You could then simply cache the boolean output of someExpensiveMethod in the getFirstFact method, using a private field, and reuse that value in the getSecondFact method:

class Example
{
    private boolean _expensiveMethodOutput;

    public void process(Object obj)
    {
        boolean someFact = getFirstFact(obj);
        boolean someOtherFact = getSecondFact(obj);

        //use the two facts somehow
    }

    private boolean getFirstFact(Object obj)
    {
         _expensiveMethodOutput = someExpensiveMethod(obj);
         //return some value derived from _expensiveMethodOutput
    }

    private boolean getSecondFact(Object obj)
    {
         boolean data = _expensiveMethodOutput;
         //return some other value derived from data
    }

    private boolean someExpensiveMethod(Object obj){...}
}

From your question title, I guess that you don't want to do this:

class Example
{
    public void process(Object obj)
    {
        boolean expensiveResult = someExpensiveMethod(obj);
        boolean someFact = getFirstFact(expensiveResult);
        boolean someOtherFact = getSecondFact(expensiveResult);

        //use the two facts somehow
    }
    ...

because that would mean that when changing one of the methods, you cannot access obj anymore. Also, you want to avoid executing the expensive method whenever possible. A simple solution would be

private Object lastParam = null;
private boolean lastResult = false;
public boolean someExpensiveMethod(Object obj){
    if (obj == lastParam) return lastResult;
    lastResult = actualExpensiveMethod(obj);
    lastParam = obj;
    return lastResult;
}

Of course this will not work with multithreading. (At least make sure process is synchronized.)
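If synchronizing process is too coarse, one possible variation is to keep the parameter and its result together in a single immutable object and publish it through a volatile field. This is a sketch, not a drop-in replacement: LastResultMemo is a hypothetical name, and the Predicate stands in for actualExpensiveMethod.

```java
import java.util.function.Predicate;

/** Hypothetical single-entry memo that remembers only the most recent call. */
class LastResultMemo {
    /** Immutable pair, so a reader never sees one call's parameter with another call's result. */
    private static final class Entry {
        final Object param;
        final boolean result;

        Entry(Object param, boolean result) {
            this.param = param;
            this.result = result;
        }
    }

    private final Predicate<Object> expensive; // stands in for actualExpensiveMethod
    private volatile Entry last;               // null until the first call

    LastResultMemo(Predicate<Object> expensive) {
        this.expensive = expensive;
    }

    boolean get(Object obj) {
        Entry e = last; // read the volatile field once
        if (e != null && e.param == obj) {
            return e.result;
        }
        boolean result = expensive.test(obj);
        last = new Entry(obj, result);
        return result;
    }
}
```

Two threads that race on the same object may both run the expensive call, which costs time but does not produce a wrong answer. Threads working on different objects will keep overwriting the single slot, so in that situation a per-thread slot (for instance via ThreadLocal) may suit better. The memo also keeps the last object reachable until the next call replaces it.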

I would consider introducing a factory method and a new object that encapsulates the preprocessing. This way the JVM can discard the preprocessed data as soon as the object goes out of scope.

class PreprocessedObject {
    private ... data;

    public static PreprocessedObject  create(Object obj) {
        PreprocessedObject pObj = new PreprocessedObject();
        // do expensive stuff
        pObj.data = ...
        return pObj;
    }

    public boolean getFirstFact() {
         //return some value derived from data
    }

    public boolean getSecondFact() {
         //return some other value derived from data
    }
}
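For a version of this idea that compiles, here is a small sketch with the elided parts filled in by made-up stand-ins: the input is assumed to be a String, the "expensive" step is just a character scan, and PreprocessedText, TextProcessor and the two fact methods are hypothetical names.

```java
/** Hypothetical concrete variant: the expensive step is a single scan over a String. */
final class PreprocessedText {
    private final int letters;
    private final int digits;

    private PreprocessedText(int letters, int digits) {
        this.letters = letters;
        this.digits = digits;
    }

    /** Factory method: the only place where the costly work is done. */
    static PreprocessedText create(String text) {
        int letters = 0;
        int digits = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                letters++;
            } else if (Character.isDigit(c)) {
                digits++;
            }
        }
        return new PreprocessedText(letters, digits);
    }

    /** First fact, derived from the stored data without rescanning. */
    boolean hasDigits() {
        return digits > 0;
    }

    /** Second fact, derived from the same stored data. */
    boolean isMostlyLetters() {
        return letters > digits;
    }
}

class TextProcessor {
    String process(String text) {
        PreprocessedText p = PreprocessedText.create(text); // expensive work happens once
        return p.hasDigits() + "/" + p.isMostlyLetters();
        // p becomes unreachable here, so its data is eligible for garbage collection
    }
}
```

If one of the facts later stops depending on the shared data, only that method's body changes; process keeps calling the same two methods.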

In addition to the answer from Stephen, I'd suggest you look at Google Guava. It has the concept of a computing map, which suits the problem you face here. I've written an article about that here.

In terms of code, this is what I suggest:

class Example {

    private ConcurrentMap<Object, Boolean> cache;

    void initCache() {
        cache = new MapMaker().softValues()
                    .makeComputingMap(new Function<Object, Boolean>() {

            @Override
            public Boolean apply(Object from) {
                return someExpensiveMethod(from);
            }
        });
    }

    public void process(Object obj) {
        boolean someFact = getFirstFact(obj);
        boolean someOtherFact = getSecondFact(obj);

        // use the two facts somehow
    }

    public boolean getFirstFact(Object obj) {
        boolean data = cache.get(obj);
        // return some value derived from data
    }

    public boolean getSecondFact(Object obj) {
        boolean data = cache.get(obj);
        // return some other value derived from data
    }

    public boolean someExpensiveMethod(Object obj) {...}
}
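Note that MapMaker.makeComputingMap was deprecated in favor of CacheBuilder and removed from the public API in later Guava releases, so the code above depends on an old Guava version. For readers without Guava, a loosely comparable effect can be sketched with the JDK alone. This is an assumption-laden substitute rather than an equivalent: WeakKeyMemo is a hypothetical name, and it uses weak keys (an entry disappears once its key object is no longer referenced elsewhere) instead of the soft values used above.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Hypothetical JDK-only memoizer whose entries vanish when the key object is discarded. */
class WeakKeyMemo<K, V> {
    // WeakHashMap is not thread-safe on its own, hence the synchronized wrapper.
    private final Map<K, V> results = Collections.synchronizedMap(new WeakHashMap<K, V>());
    private final Function<K, V> expensive; // stands in for someExpensiveMethod

    WeakKeyMemo(Function<K, V> expensive) {
        this.expensive = expensive;
    }

    V get(K key) {
        V cached = results.get(key);
        if (cached != null) {
            return cached;
        }
        // Computed outside the map's lock, so two threads may both compute
        // the same key; the later put simply overwrites the earlier one.
        V computed = expensive.apply(key);
        results.put(key, computed);
        return computed;
    }
}
```

Because processed objects are discarded soon after use, their entries become eligible for removal without any explicit eviction policy. WeakHashMap compares keys with equals and hashCode rather than identity, and a cached value must not hold a strong reference to its own key, or the entry can never be collected.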