Why does floating-point precision in C# differ when separated by parentheses and when separated by statements?

开发者 (Developer) https://www.devze.com 2022-12-23 22:45 Source: Internet

I am aware of how floating point precision works in the regular cases, but I stumbled on an odd situation in my C# code.

Why aren't result1 and result2 the exact same floating point value here?


const float A;   // Arbitrary value
const float B;   // Arbitrary value

float result1 = (A*B)*dt;

float result2 = (A*B); 
result2 *= dt;

From this page I figured float arithmetic was left-associative and that this means values are evaluated and calculated in a left-to-right manner.

The full source code involves XNA's Quaternions. I don't think it matters what my constants are or what VectorHelper.AddPitchRollYaw() does. The test passes just fine if I calculate the delta pitch/roll/yaw angles in the same manner for both orientations, but with the code as written below it fails:


X
  Expected: 0.275153548f
  But was:  0.275153786f

    [TestFixture]
    internal class QuaternionPrecisionTest
    {
        [Test]
        public void Test()
        {
            JoystickInput input;
            input.Pitch = 0.312312432f;
            input.Roll = 0.512312432f;
            input.Yaw = 0.912312432f;
            const float dt = 0.017001f;

            float pitchRate = input.Pitch * PhysicsConstants.MaxPitchRate;
            float rollRate = input.Roll * PhysicsConstants.MaxRollRate;
            float yawRate = input.Yaw * PhysicsConstants.MaxYawRate;

            Quaternion orient1 = Quaternion.Identity;
            Quaternion orient2 = Quaternion.Identity;

            for (int i = 0; i < 10000; i++)
            {
                float deltaPitch = 
                      (input.Pitch * PhysicsConstants.MaxPitchRate) * dt;
                float deltaRoll = 
                      (input.Roll * PhysicsConstants.MaxRollRate) * dt;
                float deltaYaw = 
                      (input.Yaw * PhysicsConstants.MaxYawRate) * dt;

                // Add deltas of pitch, roll and yaw to the rotation matrix
                orient1 = VectorHelper.AddPitchRollYaw(
                                orient1, deltaPitch, deltaRoll, deltaYaw);

                deltaPitch = pitchRate * dt;
                deltaRoll = rollRate * dt;
                deltaYaw = yawRate * dt;
                orient2 = VectorHelper.AddPitchRollYaw(
                                orient2, deltaPitch, deltaRoll, deltaYaw);
            }

            Assert.AreEqual(orient1.X, orient2.X, "X");
            Assert.AreEqual(orient1.Y, orient2.Y, "Y");
            Assert.AreEqual(orient1.Z, orient2.Z, "Z");
            Assert.AreEqual(orient1.W, orient2.W, "W");
        }
    }

Granted, the error is small and only presents itself after a large number of iterations, but it has caused me some great headaches.


Henk is exactly right. Just to add a bit to that.

What's happening here is that if the compiler generates code that keeps the floating point operations "on the chip" then they can be done in higher precision. If the compiler generates code that moves the results back to the stack every so often, then every time it does so, the extra precision is lost.

Whether the compiler chooses to generate the higher-precision code or not depends on all kinds of unspecified details: whether you compiled debug or retail, whether you are running in a debugger or not, whether the floats are in variables or constants, what chip architecture the particular machine has, and so on.

Basically, you are guaranteed 32 bit precision OR BETTER, but you can NEVER predict whether you're going to get better than 32 bit precision or not. Therefore you are required to NOT rely upon having exactly 32 bit precision, because that is not a guarantee we give you. Sometimes we're going to do better, and sometimes not, and if you sometimes get better results for free, don't complain about it.
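
To make the two code paths concrete, here is a minimal sketch that emulates them deterministically, using double as a stand-in for the wider "on the chip" precision. The class and method names and the 1.5f rate are hypothetical, not taken from the question's code, and a real JIT may choose either path (or neither) on its own.

```csharp
using System;

public static class PrecisionPaths
{
    // Emulates code that spills the intermediate product to a 32-bit slot:
    // the value is rounded to float after every multiplication.
    public static float RoundEachStep(float a, float b, float dt)
    {
        float product = (float)((double)a * b);   // rounded to float here
        return (float)((double)product * dt);     // and rounded again here
    }

    // Emulates code that keeps the intermediate in wider precision:
    // the whole expression is evaluated in double and rounded once.
    public static float RoundOnce(float a, float b, float dt)
    {
        return (float)((double)a * b * dt);
    }

    public static void Main()
    {
        const float pitch = 0.312312432f;   // input.Pitch from the question
        const float maxRate = 1.5f;         // hypothetical stand-in for MaxPitchRate
        const float dt = 0.017001f;

        float once = RoundOnce(pitch, maxRate, dt);
        float each = RoundEachStep(pitch, maxRate, dt);

        // "G9" prints enough digits to tell neighbouring floats apart.
        Console.WriteLine(once.ToString("G9"));
        Console.WriteLine(each.ToString("G9"));
        Console.WriteLine(once == each ? "same float" : "different floats");
    }
}
```

Whether the two results differ depends on the inputs. When they do, the gap is at most a unit or so in the last place per operation, which is the kind of error that can add up over 10000 iterations.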

Henk said that he could not find a reference on this. It is section 4.1.6 of the C# specification, which states:

Floating-point operations may be performed with higher precision than the result type of the operation. For example, some hardware architectures support an “extended” or “long double” floating-point type with greater range and precision than the double type, and implicitly perform all floating-point operations using this higher precision type. Only at excessive cost in performance can such hardware architectures be made to perform floating-point operations with less precision, and rather than require an implementation to forfeit both performance and precision, C# allows a higher precision type to be used for all floating-point operations. Other than delivering more precise results, this rarely has any measurable effects.

As for what you should do: First, always use doubles. There is no reason whatsoever to use floats for arithmetic. Use floats for storage if you want; if you have a million of them and want to use four million bytes instead of eight million bytes, that's a reasonable usage for floats. But floats COST you at runtime because the chip is optimized to do 64 bit math, not 32 bit math.
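
As a rough illustration of "doubles for arithmetic, floats for storage", the sketch below accumulates the same time step 10000 times, once in a float accumulator and once in a double that is narrowed to float only at the end. The names are hypothetical and the loop is a simplified stand-in for the quaternion update in the question.

```csharp
using System;

public static class AccumulateDemo
{
    // Float accumulator: each addition may be rounded to float precision.
    public static float SumInFloat(float dt, int steps)
    {
        float total = 0f;
        for (int i = 0; i < steps; i++)
        {
            total += dt;
        }
        return total;
    }

    // Double accumulator: arithmetic in double, narrowed to float once for storage.
    public static float SumInDouble(float dt, int steps)
    {
        double total = 0.0;
        for (int i = 0; i < steps; i++)
        {
            total += dt;
        }
        return (float)total;
    }

    public static void Main()
    {
        const float dt = 0.017001f;   // time step from the question
        const int steps = 10000;      // iteration count from the question

        Console.WriteLine(SumInFloat(dt, steps).ToString("G9"));
        Console.WriteLine(SumInDouble(dt, steps).ToString("G9"));
    }
}
```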

Second, do not rely upon floating point results being exact or reproducible. Small changes in conditions can cause small changes in results.


I couldn't find a reference to back this up but I think it is due to the following:

  • float operations are calculated in the precision available in the hardware, which means they can be done with greater precision than that of float.
  • the assignment to the intermediate result2 variable forces rounding back to float precision, but the single expression for result1 is computed entirely in native precision before being rounded down.
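
If the two forms need to agree, one option is to ask for the rounding explicitly in both. The sketch below assumes that an explicit (float) cast is compiled to a narrowing conversion even when the operand is already typed float. That is my understanding of how the C# compiler behaves, but treat it as an assumption to verify on your own runtime. The class and method names are hypothetical.

```csharp
using System;

public static class ExplicitRounding
{
    // Single expression, with the intermediate product explicitly narrowed to float.
    public static float SingleExpression(float a, float b, float dt)
    {
        return (float)((float)(a * b) * dt);
    }

    // Two statements, with the same explicit narrowing after each multiplication.
    public static float TwoStatements(float a, float b, float dt)
    {
        float result = (float)(a * b);
        result = (float)(result * dt);
        return result;
    }

    public static void Main()
    {
        const float a = 0.312312432f;   // input.Pitch from the question
        const float b = 1.5f;           // hypothetical stand-in for MaxPitchRate
        const float dt = 0.017001f;

        Console.WriteLine(SingleExpression(a, b, dt).ToString("G9"));
        Console.WriteLine(TwoStatements(a, b, dt).ToString("G9"));
    }
}
```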

On a side note, testing float or double values with == is always dangerous. The Microsoft unit testing framework provides an Assert.AreEqual(float expected, float actual, float delta) overload, with which you could solve this problem by choosing a suitable delta.
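
A framework-free sketch of what such a delta comparison amounts to is shown below. NearlyEqual is a hypothetical helper, and the 1e-5f tolerance is an arbitrary choice for illustration, not a recommendation. The two values are the "Expected" and "But was" figures from the failing test.

```csharp
using System;

public static class FloatCompare
{
    // Hypothetical helper: true when the two values differ by no more than delta.
    public static bool NearlyEqual(float expected, float actual, float delta)
    {
        return Math.Abs(expected - actual) <= delta;
    }

    public static void Main()
    {
        const float expected = 0.275153548f;   // "Expected" from the test output
        const float actual = 0.275153786f;     // "But was" from the test output

        Console.WriteLine(expected == actual);                       // exact comparison
        Console.WriteLine(NearlyEqual(expected, actual, 1e-5f));     // comparison with a tolerance
    }
}
```

The right delta depends on how much error the computation can legitimately accumulate, so it is worth deriving it from the problem rather than tuning it until the test passes.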
