Problem with arithmetic using logarithms to avoid numerical underflow (take 2)
Having seen the above and having seen softmax normalization I was trying to normalize a vector while avoiding overflow -
that is if I have an array
x[1], x[2] x[3], x[4], ... , x[n]
the normalized form for me has the sum of squares of elements as 1.0
and is obtained by dividing each element by
sqrt(x[1]*x[1]+x[2]*x[2]+...+x[n]*x[n])
now the sum of squares can overflow even if the square root is small enough to fit into a floating point variable, so I imagined one could do something like
开发者_运维百科s=(2*log(fabs(x[1]))+2*log(fabs(x[2]))+...+2*log(fabs(x[n])))/2
and calculating the elements as
exp(log(fabs(x[1]))-s), ..., exp(log(fabs(x[n]))-s
BUT
The above is incorrect as log(A+B) is not log(A)+log(B) - now is there a way to do vector normalization that avoids overflow better?
Instead of
norm = sqrt(x[1] * x[1] + ... + x[n] * x[n])
you might want to divide the elements of the vector by the maximum possible value before squaring
max_x = max(x[1], ..., x[n])
y[1] = x[1] / max_x / n
...
y[n] = x[n] / max_x / n
norm = n * sqrt(y[1] * y[1] + ... + y[n] * y[n]) * max_x
The norm of the y
vector should then be equal or smaller than zero. The value of n * max_x
could still overflow, so you need to be careful there, too, that the operations are executed in a non-overflowing order.
You seem to be making the assumption that:
log(x^2 + y^2)
is the same as:
log(x^2) + log(y^2)
However, this isn't correct, as you can't simplify the logarithm of a sum like that.
KennyTM is correct - your ideas about logarithms are wrong.
You can't use an L2 norm, because it requires that you calculate the magnitude of the vector, which is exactly what you're having overflow issues with.
Perhaps the L-infinity norm, where you divide each component in the vector by the absolute value of the maximum component first will be better. Be sure to hang onto that max absolute value so you can get the right magnitude back.
I understand completely that you need the L2 norm, but if overflow is indeed an issue you'll need to take intermediate steps to get it:
- Find the max absolute value of the vector.
- Divide each component by the max absolute value to normalize; max value is now +/- 1.
- Calculate the square root of the sum of squares of normalized components. I'd recommend sorting the values and adding them in ascending order to make sure that small components aren't lost.
- Multiply by the max absolute value to get the L2 norm of the original vector.
精彩评论