I am doing an exercise from the programming book A Book on C. The exercise suggests that, to find the average of a group of numbers, the algorithm:
avg += (x - avg) / i;
is better than:
sum += x;
avg = sum / i;
'x' is a variable used to store the input numbers. The book also suggests that, besides preventing overflow, the first algorithm has some other benefits over the second algorithm. Can anyone help me? Thanks!
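Here is my own sketch of how I read the two versions inside a loop, assuming double values and a counter i that starts at 1:

#include <stdio.h>

int main(void)
{
    double data[] = { 2.0, 4.0, 6.0, 8.0 };
    int n = 4;
    double avg = 0.0;                /* first version: running average    */
    double sum = 0.0, avg2 = 0.0;    /* second version: sum, then divide  */

    for (int i = 1; i <= n; i++) {
        double x = data[i - 1];      /* x holds the current input number  */
        avg += (x - avg) / i;
        sum += x;
        avg2 = sum / i;
    }
    printf("%g %g\n", avg, avg2);    /* both print 5 */
    return 0;
}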
I'm assuming we're talking about floating-point arithmetic here (otherwise the "better" average will be terrible).
In the second method, the intermediate result (sum) will tend to grow without bound, which means you'll eventually lose low-end precision. In the first method, the intermediate result should stay of a roughly similar magnitude to your input data (assuming your input doesn't have an enormous dynamic range), which means that it will retain precision better.
However, I can imagine that as i gets bigger and bigger, the value of (x - avg) / i will get less and less accurate (relatively). So it also has its disadvantages.
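A minimal sketch (mine, not from the answer) that makes the low-end precision loss visible with single-precision floats: every input is exactly 1.0f, yet the float sum stalls once it reaches 2^24:

#include <stdio.h>

int main(void)
{
    const long n = 20000000;         /* 20 million inputs, all equal to 1.0f */
    float sum = 0.0f, avg = 0.0f;

    for (long i = 1; i <= n; i++) {
        sum += 1.0f;                 /* stalls at 16777216 (2^24): adding 1 no longer changes it */
        avg += (1.0f - avg) / i;     /* stays exactly 1.0f after the first step */
    }
    printf("sum / n         = %f\n", sum / n);   /* about 0.84, far from 1 */
    printf("running average = %f\n", avg);       /* 1.000000 */
    return 0;
}

With a double-precision sum the same stall happens much later (around 2^53), but the principle is the same.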
It is better in the sense that it computes a running average, i.e. you don't need to have all your numbers in advance. You can calculate that as you go, or as numbers become available.
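For example, a sketch (assuming the inputs arrive on stdin one at a time) where the average is updated as each number becomes available, without storing any of the data:

#include <stdio.h>

int main(void)
{
    double x, avg = 0.0;
    int i = 0;

    while (scanf("%lf", &x) == 1) {  /* read numbers until EOF */
        i++;
        avg += (x - avg) / i;        /* update the running average */
    }
    if (i > 0)
        printf("average of %d values: %g\n", i, avg);
    return 0;
}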
The second algorithm is faster than the first, because it only has to perform n operations (the first actually requires roughly 2*n operations, a subtraction and a division per element). But it is true that the first prevents overflow. For example, take this set of 1,250 numbers: 250 copies of 4,000,000, 500 copies of 1,500,000, and 500 copies of 2,000,000. The total sum of all the integers is 2,750,000,000, but the upper bound of a C++ int is 2,147,483,647, so in this case we are dealing with an overflow problem. If you use the first algorithm, you avoid it.
So I recommend that you use the first algorithm if overflow is likely to occur; otherwise it will only add extra operations. If you decide to use the second anyway, then I recommend that you use a type with a larger range for the sum.
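A sketch of that scenario (the numbers match the ones above; a wider sum type and the running average are two possible ways out):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    long long sum = 0;    /* wider type: avoids the overflow                */
    double avg = 0.0;     /* running average: never leaves the ~1e6 range   */
    int i = 0;

    for (int k = 0; k < 1250; k++) {
        /* 250 copies of 4,000,000, then 500 of 1,500,000, then 500 of 2,000,000 */
        int x = (k < 250) ? 4000000 : (k < 750) ? 1500000 : 2000000;
        i++;
        sum += x;                    /* a 32-bit int sum would overflow here */
        avg += (x - avg) / i;
    }
    printf("INT_MAX         = %d\n", INT_MAX);         /* 2147483647    */
    printf("total (wide)    = %lld\n", sum);           /* 2750000000    */
    printf("sum / n         = %lld\n", sum / i);       /* 2200000       */
    printf("running average = %.0f\n", avg);           /* about 2200000 */
    return 0;
}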
OK, the answer lies not in overflowing the sum (since that is ruled out), but, as Oli said, in "losing the low-end precision". If the average of the numbers you are summing is much larger than the distance of each number from the average, the second approach will lose mantissa bits. Since the first approach looks only at the relative values, it doesn't suffer from that problem.
So any list of numbers that are greater than, say, 60 million (for single-precision floating point) but whose values don't vary by more than 10 or so should show you the behavior.
If you are using double-precision floats, the average value would have to be much higher (or the deltas much lower) to see the same effect.
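A sketch of exactly that case, assuming single-precision floats and values around 60 million that vary by only +/-5:

#include <stdio.h>

int main(void)
{
    const int n = 1000000;
    float sum = 0.0f, run = 0.0f;

    for (int i = 1; i <= n; i++) {
        float x = 60000000.0f + ((i % 2) ? 5.0f : -5.0f);  /* ~60 million, +/-5 */
        sum += x;                     /* grows to ~6e13, where float spacing is huge */
        run += (x - run) / i;         /* stays near 6e7, where spacing is only 4     */
    }
    printf("sum-then-divide: %.1f\n", sum / n);  /* typically noticeably off from 60000000 */
    printf("running average: %.1f\n", run);      /* stays close to 60000000                */
    return 0;
}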
sum += x;
avg = sum / i;
In the above code, suppose the inputs are numbers such as 10000, 20000, and so on, i.e. numbers with many digits. The value stored in sum may then exceed its type's maximum value. That is not the case in the first algorithm, because the difference is divided by the number of elements before it is accumulated, so the running average stays near the magnitude of the inputs.
Although, because of the large data types available in programming languages, this may not be a problem. Thus, as the experts say: "use a data type as per your application and requirements."
I like the second method (summing in a loop and dividing at the end) better, and can recognize what it is doing much faster than the first.
The performance differences, if any, are irrelevant.
And, if a sum of values overflows a big enough data type, you'll probably have more problems than calculating an average.
How about calculating it like this, assuming the ints are in an array?
for (i = 0; i < N; i++) { sum += x[i] / N; rem += x[i] % N; }
avg = sum + rem / N;
If N is large (say 0xFFFFF) and the x[i] are all small, rem itself can still add up toward the largest int, so an overflow might happen.
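A sketch of the idea as a complete function, with my own addition of folding rem back into sum whenever it reaches N, so that neither accumulator grows without bound (assuming non-negative inputs):

#include <stdio.h>

int int_average(const int x[], int N)
{
    int sum = 0, rem = 0;

    for (int i = 0; i < N; i++) {
        sum += x[i] / N;             /* accumulate the quotients   */
        rem += x[i] % N;             /* accumulate the remainders  */
        if (rem >= N) {              /* keep rem below N */
            sum += 1;
            rem -= N;
        }
    }
    return sum + rem / N;            /* rem / N is 0 after folding; kept to mirror the original */
}

int main(void)
{
    int x[] = { 7, 8, 9, 10 };
    printf("average = %d\n", int_average(x, 4));   /* 8 (34 / 4, truncated) */
    return 0;
}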