Suppose I want to calculate the running average of a data set, such as:
class Averager {
public:
    float total = 0;
    size_t count = 0;
    float addData (float value) {
        this->total += value;
        return this->total / ++this->count;
    }
};
Sooner or later the total or count value will overflow, so I changed it so that it doesn't remember the total:
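class Averager {
public:
    float currentAverage = 0;
    size_t count = 0;
    float addData (float value) {
        // Reading count and ++count in the same expression is unsequenced, so increment separately.
        this->currentAverage = (this->currentAverage * count + value) / (count + 1);
        ++count;
        return this->currentAverage;
    }
};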
This seems to take longer to overflow, but the multiplication of the average by the count can still overflow, so the next solution is:
class Averager {
public:
    float currentAverage = 0;
    size_t count = 0;
    float addData (float value) {
        // Move the average toward the new value by 1/count of the difference.
        this->currentAverage += (value - this->currentAverage) / ++count;
        return this->currentAverage;
    }
};
This seems better. The next problem is: how do I prevent count from overflowing?
Aggregated buckets.
We pick a bucket size that's comfortably less than squareRoot(MAXINT). To keep it simple, let's pick 10.
Each new value is added to the current bucket, and the moving average can be computed as you describe.
When the bucket is full, start a new bucket, remembering the average of the full bucket. We can safely calculate the overall average by combining the averages of the full buckets and the current, partial bucket. When we get to 10 full buckets, we create a bigger bucket, capacity 100.
To compute the total average we first compute the average of the "10s" and then combine that with the "100s". This pattern repeats for "1,000s", "10,000s" and so on. At each stage we only need to consider two levels, one 10x bigger than the other.
Use double total; unsigned long long count;. You should still worry about accuracy, but it will be much less of a problem than with float.
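A minimal sketch of that suggestion, keeping the question's addData interface; the class name WideAverager is only illustrative. unsigned long long is at least 64 bits wide, so even at a billion samples per second the counter would take more than 500 years to wrap.

```cpp
// Hypothetical name; same shape as the question's Averager, with wider types.
class WideAverager {
public:
    double addData(double value) {
        total += value;
        // The count is converted to double for the division; that conversion
        // is exact only up to 2^53, which is still a very large sample count.
        return total / static_cast<double>(++count);
    }

private:
    double total = 0.0;
    unsigned long long count = 0;
};
```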
What about using arbitrary-precision arithmetic?
There's a list of libraries you could use on Wikipedia: http://en.wikipedia.org/wiki/Bignum#Libraries
Most arbitrary-precision arithmetic libraries will not overflow until the stored digits fill the available memory (which is quite unlikely).
You want to use Kahan's summation algorithm:
http://en.wikipedia.org/wiki/Kahan_summation_algorithm
See also the section about errors in summation in "What Every Computer Scientist Should Know About Floating-Point Arithmetic"
http://docs.sun.com/source/806-3568/ncg_goldberg.html#1262
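For illustration, here is a sketch of how Kahan summation might be folded into the question's averager. The class and member names are assumptions of mine. It reduces rounding error in the total but does nothing about the counter, and it assumes the compiler is not allowed to reassociate floating-point operations (for example, no -ffast-math), since that can optimize the compensation away.

```cpp
// Hypothetical sketch: running average backed by a Kahan-compensated sum.
class KahanAverager {
public:
    double addData(double value) {
        const double y = value - compensation;  // apply the error carried from earlier additions
        const double t = sum + y;               // low-order bits of y may be lost here
        compensation = (t - sum) - y;           // recover what was lost (negated)
        sum = t;
        ++count;
        return sum / static_cast<double>(count);
    }

private:
    double sum = 0.0;
    double compensation = 0.0;  // running estimate of the accumulated rounding error
    unsigned long long count = 0;
};
```

A plain double total would drop a 1.0 added to 1e16 entirely; the compensation term carries it into the next addition instead.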
You could use special datatypes where integers can grow without bound until your RAM is full.
I was just thinking about this too. I think this solution works in terms of the new value 'moving the needle': each new value only moves the average by its difference from the average so far, divided by the number of values that have contributed (the previous values plus 1 for itself). It will lose accuracy as the number of inputs grows, but in practice it should usually be acceptable. Here's some Java code that seems to work. I used floats and ints here to demonstrate that it works within those limitations, but you could use double to gain accuracy. This is just to give you an idea of how to average an array of near-max integers. You need to keep track of the total number of inputs and the current average, but not the total sum of the inputs. If your total number of inputs approaches MAX_INT, this eventually won't work and you should use the bucket suggestion above, but that is pretty drastic in most cases.
public float calcAverageContinuous(int[] integers)
{
    float ave = 0;
    for (int i = 0; i < integers.length; i++) {
        ave += (((float)integers[i] - ave) / (float)(i + 1));
    }
    return ave;
}