开发者

Why does a C floating-point type modify the actual input of 125.1 to 125.099998 on output?

开发者 https://www.devze.com 2023-03-16 04:51 出处:网络
I wrote the following program: #include<stdio.h> int main(void) 开发者_JAVA技巧{ float f; printf(\"\\nInput a floating-point no.: \");

I wrote the following program:

 #include<stdio.h>
    int main(void)
 开发者_JAVA技巧   {
     float f;
     printf("\nInput a floating-point no.: ");
     scanf("%f",&f);
     printf("\nOutput: %f\n",f);
     return 0;
    }

I am on Ubuntu and used GCC to compile the above program. Here is my sample run and output I want to inquire about:

Input a floating-point no.: 125.1
Output: 125.099998

Why does the precision change?


Because the number 125.1 is impossible to represent exactly with floating-point numbers. This happens in most programming languages. Use e.g. printf("%.1f", f); if you want to print the number with one decimal, but be warned: the number itself is not exactly equal to 125.1.


Thank you all for your answers. Although almost all of you helped me look in the right direction I could not understand the exact reason for this behavior. So I did a bit of research in addition to reading the pages you guys pointed me to. Here is my understanding for this behavior:

Single Precision Floating Point numbers typically use 4 bytes for storage on x86/x86-64 architectures. However not all 32 bits (4 bytes = 32 bits) are used to store the magnitude of the number.

For storing as a single precision floating type, the input stream is formatted in the following notation (somewhat similar to scientific notation):

(-1)^s x 1.m x 2^(e-127), where
  s = sign of the number, range:{0,1} - takes up 1 bit
  m = mantissa (fractional portion) of the number - takes up 23 bits
  e = exponent of the number offset by 127, range:{0,..,255} - takes up 8 bits

and then stored in memory as

0th byte 1st byte 2nd byte 3rd byte
mmmmmmmm mmmmmmmm emmmmmmm seeeeeee

Therefore the decimal number 125.1 is first converted to binary form but limited to 24 bits so that the mantissa is represented by no more than 23 bits. After conversion to binary form:

125.1 = 1111101.00011001100110011

NOTE: 0.1 in decimal can be represented up to infinite bits in binary but the computer limits the representation to 17 bits so the complete representation does not exceed 24 bits.

Now converting it into the specified notation we get:

125.1 = 1.111101 00011001100110011 x 2^6
      = (-1)^0 + 1.111101 00011001100110011 x 2^(133-127)

which implies

s = 0
m = 11110100011001100110011
e = 133 = 10000101

Therefore, 125.1 will be stored in memory as:

0th byte 1st byte 2nd byte 3rd byte
mmmmmmmm mmmmmmmm emmmmmmm seeeeeee
00110011 00110011 11111010 01000010

On being passed to the printf() function the output stream is generated by converting the binary form to the decimal form. The bytes are actually stored in reverse order (from the input stream) and hence read in this order:

3rd byte 2nd byte 1st byte 0th byte
seeeeeee emmmmmmm mmmmmmmm mmmmmmmm
01000010 11111010 00110011 00110011

Next, it is converted into the specific notation for conversion

(-1)^0 + 1.111101 00011001100110011 x 2^(133-127)

On simplifying the above representation further:

= 1.111101 00011001100110011 x 2^6
= 1111101.00011001100110011

And finally converting it to decimal:

= 125.0999984741210938

but single precision floating point can represent only up to 6 decimal places, therefore the answer is rounded off to 125.099998.


Think about a fixed point representation first.

2^3=8 2^2=4 2^1=2 2^0=1 2^-1=1/2 2^-2=1/4 2^-3=1/8 2^-4=1/16

If we want to represent a fraction then we set the bits to the right of the point, so 5.5 is represented as 01011000.

But if we want to represent 5.6, there is not an exact fractional representation. The closest we can get is 01011001 == 5.5625

1/2 + 1/16 = 0.5625

2^-4 + 2^-1


Because its the closest representation of 125.1 , remember that single precision floating point are just 32 bits.


If I tell you to write 1/3 as decimal number down, you realize there a numbers which have no finite representation. .1 is the exact representation of 1/10 there this problem does not appear, BUT this is just in decimal representation. In binary representation .1 is one of those numbers that require infinite digits. As your number must be somehwere cut there is something lost.


No floating point numbers has an exact representation, they all have limited accuracy. When converting from a number in text to a float (with scanf or otherwise), you're in another world with different kinds of numbers, and precision may be lost. Same thing goes when converting from a float to a string: you decide on how many digits you want. You can't know "how many digits there are" in a float before converting to text or another format that can keep that information. This all has to do with how floats are stored:

significant_digits * baseexponent


The normal type used for floating point in C is double, not float. Your float is implicitly cast to a double, and because the float is less precise, the difference to the closest representable number to 125.1 is more apparent (and printf's default precision is tailored for use with doubles). Try this instead:

#include<stdio.h>
int main(void)
{
    double f;
    printf("\nInput a floating-point no.: ");
    scanf("%lf",&f);
    printf("\nOutput: %f\n",f);
    return 0;
}
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号