Maintaining Floating Point Accuracy with a Running Average_问答_开发者

Maintaining Floating Point Accuracy with a Running Average

开发者 https://www.devze.com 2023-02-07 06:51 出处：网络

I need to calculate the mean-squared error of a 16-bit operation for an arbitrary number of data points (upwards of 100 million).I decided to go with a running average so I wouldn\'t have to worry abo

I need to calculate the mean-squared error of a 16-bit operation for an arbitrary number of data points (upwards of 100 million). I decided to go with a running average so I wouldn't have to worry about overflow from adding a large number of squared errors. At 100 million samples I had problems with floating point precision (inaccurate results) so I moved to double.

Here is my code

int iDifference = getIdeal() - getValue();

m_iCycles++;


// calculate the running MSE as

// http://en.wikipedia.org/wiki/Moving_average

// MSE(i + 1) = MSE(i) + (E^2 - MSE(i))/(i + 1)

m_dMSE = m_dMSE + ((pow((double)iDifference,2) - m_dMSE) / (double)m_iCycle开发者_如何学Gos);

Is there a better way to implement this to maintain accuracy? I considered normalizing the MSE to one and simply keeping a sum with a final division on completion to calculate the average.

You might want to look at the Kahan Summation Algorithm - it's not exactly what you need here but it solves a very similar problem and you may be able to adapt it to your needs.

Floating point numbers don't overflow in this kind of situation, they only lose precision. So there are no advantages of a running average over a running total here. The consequence is the same whether the running total or the denominator grows.

To maintain precision in a running total, keep subtotals instead of a single total. Just keep adding to a subtotal until adding one more would cause overflow. Then move on to the next subtotal. Since they are all the same order of magnitude (in base 2), optimal precision may be achieved by converting to floating point and using a pairwise accumulation into one final total.

// first = errors, second = counter
typedef pair< vector< uint32_t >, uint32_t > running_subtotals;

void accumulate_error( uint32_t error, running_subtotals &acc ) {
    ( numeric_limits< uint32_t >::max() - error < acc.first.back()?
        * acc.first.insert( acc.first.end(), 0 ) : acc.first.back() )
        += error; // add error to current subtotal, or new one if needed
    ++ acc.second; // increment counter
}

double get_average_error( running_subtotals const &total ) {
    vector< double > acc( total.first.begin(), total.first.end() );
    while ( acc.size() != 1 ) {
        if ( acc.size() % 2 ) acc.push_back( 0 );
        for ( size_t index = 0; index < acc.size() / 2; ++ index ) {
            acc[ index ] = acc[ index * 2 ] + acc[ index * 2 + 1 ];
        }
        acc.resize( acc.size() / 2 );
    }
    return acc.front() / total.second;
}

If your other solutions don't work you might investigate the Bignum library

"GMP is a free library for arbitrary precision arithmetic, operating on signed integers, rational numbers, and floating point numbers. There is no practical limit to the precision except the ones implied by the available memory in the machine GMP runs on. GMP has a rich set of functions, and the functions have a regular interface. "