
Warning about data loss c++/c

Developer https://www.devze.com 2022-12-27 10:50 Source: Internet

I am getting a benign warning about possible data loss

warning C4244: 'argument' : conversion from 'const int' to 'float', possible loss of data

Question

I remember that float has a larger precision than int. So how can data be lost if I convert from a smaller data type (int) to a larger data type (float)?


Because float numbers are not exact: they have limited precision. You cannot represent every possible value an int can hold in a float, even though the maximum value of a float is much higher.

For instance, run this simple program:

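#include <stdio.h>
#include <limits.h>

int main(void)
{
 /* Round-trip every non-negative int below INT_MAX through a float and print the ones that change.
    Warning: this prints about two billion lines. */
 for (int i = 0; i < INT_MAX; i++)
 {
  float value = (float)i;
  /* Convert back via long long: near INT_MAX the float can round up to 2^31,
     which does not fit in an int (converting it to int would be undefined behavior). */
  long long ivalue = (long long)value;
  if (i != ivalue)
   printf("Integer %d is represented as %lld in a float\n", i, ivalue);
 }
 return 0;
}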

You'll quickly see that there are billions of integers that can't be represented as floats. For instance, all integers in the range 16,777,219 to 16,777,221 are represented as 16,777,220.

EDIT: Running the program above shows that there are 2,071,986,175 positive integers that cannot be represented exactly as floats. That leaves only about 75 million positive integers that fit correctly into a float, so only about one integer in 28 survives the conversion unchanged.

I expect the numbers to be the same for the negative integers.


On most architectures int and float are the same size, in that they have the same number of bits. However, in a float those bits are split between exponent and mantissa, meaning that there are actually fewer bits of precision in the float than in the int. This is only likely to be a problem for larger integers, though (with a 32-bit IEEE float, those whose magnitude exceeds 2^24 = 16,777,216).

On systems where an int is 32 bits, a double is usually 64 bits and so can exactly represent any int.


Both types are composed of 4 bytes (32 bits). Only one of them allows a fraction (the float).

Take this float as an example:

34.156

(integer).(fraction)

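Now use your logic: if one of them must also store fraction information (after all, it should be able to represent any number), then it has fewer bits left for the integer part.

Thus, a float cannot hold every integer that an int can. It covers a much wider range, but past a certain magnitude it can store only some of the integers in that range exactly.

To be more specific, an "int" uses 32 bits to represent an integer number (1 sign bit and 31 value bits, so a maximum of 2,147,483,647; an unsigned int reaches 4,294,967,295). A "float" has only 24 bits of significand (23 stored plus 1 implicit), so it can represent every integer only up to 16,777,216 (2^24); above that, the gaps between representable values keep growing.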

That's why when you convert from int to float you might lose data.

Example: int = 1,158,354,125

You cannot store this number exactly in a "float".

More information at:

http://en.wikipedia.org/wiki/Single_precision_floating-point_format

http://en.wikipedia.org/wiki/Integer_%28computer_science%29


Precision does not matter. The precision of int is 1, while the relative precision of a typical float (IEEE 754 single precision) is approximately 5.96e-8 (2^-24). What matters are the sets of numbers that the two formats can represent. If there are numbers that int can represent exactly that float cannot, then there is a possible loss of data.

Floats and ints are typically both 32 bits these days, but that's not guaranteed. Assuming it is the case on your machine, it follows that there must be int values that float cannot represent exactly, because there are obviously float values that int cannot represent exactly. The set of values representable by one format cannot be a proper superset of the other's if both formats use the same number of bits efficiently.

A 32-bit int effectively has 31 bits that code for the absolute value of the number. An IEEE 754 float effectively has only 24 bits that code for the mantissa (23 stored bits plus one implicit leading bit).


The fact is that both a float and an int are represented using 32 bits. The integer value uses all 32 bits, so it can accommodate numbers from -2^31 to 2^31 - 1. However, a float uses 1 bit for the sign (which is why there is also a -0.0f) and 8 bits for the exponent. This means 32 - 9 = 23 bits are left for the mantissa. However, the float assumes that if the exponent is not zero, the mantissa starts with an implicit 1. So you more or less have 24 bits for your integer, instead of 32. However, because the mantissa can be shifted, a float accommodates more than 2^24 integers.

A floating point uses a Sign, an eXponent, and a Mantissa
S X X X X X X X X M M M M M M M M M M M M M M M M M M M M M M M

An integer has a Sign, and a Mantissa
S M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M

So a 29-bit integer such as:

0 0 0 1 1 1 1 1 1 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

fits in a float because it can be shifted:

0 0 0 1 1 1 1 1 1 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
|       |                             |
|       +-----------+                 +-----------+
|                   |                             |
v                   v                             v
S  X X X X X X X X  M M M M M M M M M M M M M M M M M M M M M M M
0  1 0 0 1 1 0 1 1  1 1 1 1 1 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0

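The eXponent stores the shift with a bias of 127: the power of two at which the leading 1 sits relative to the binary point (28 here) plus 127, giving 155 = 10011011. This also shows that since the mantissa holds only 23 bits after the leading 1, an integer whose leading 1 is at bit 28 has to be shifted right by 5 bits, so its 5 lowest bits cannot be stored (here they happen to be zero, so nothing is lost).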

So this other integer cannot be converted to a float exactly. Its magnitude also needs 29 bits, so its 5 lowest bits do not fit in the mantissa and the value is rounded to a multiple of 32. With the usual round-to-nearest mode it is rounded up here, so when you convert back to an integer, the last two bits (11) end up set to zero (00):

1 1 1 0 0 1 1 1 1 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1
|                             ||
|                             || complement
|                             vv
| 0 0 1 1 0 0 0 0 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1
|       |                             |               | | | | |
|       +-----------+                 +-----------+   +-+-+-+-+--> lost bits
|                   |                             |
v                   v                             v
S  X X X X X X X X  M M M M M M M M M M M M M M M M M M M M M M M
1  1 0 0 1 1 0 1 1  1 0 0 0 0 1 1 1 1 0 0 1 1 1 1 1 0 0 0 0 0 0 0

Note: For negative numbers, we first take the absolute value: subtracting 1 and then inverting all the bits (the "complement" above) is the two's-complement negation. That magnitude is what gets saved in the mantissa. The sign, however, still gets copied as is.

Pretty simple stuff really.

IMPORTANT NOTE: Yes, the first 1 in the integer is the sign. The leading 1 of the magnitude is not copied into the mantissa: it is assumed to be 1, so it does not need to be stored.


A float is usually in the standard IEEE single-precision format. This means there are only 24 bits of precision in a float, while an int is likely to be 32-bit. So, if your int contains a number whose absolute value cannot fit in 24 bits, you are likely to have it rounded to the nearest representable number.


My stock answer to such questions is to read this - What Every Computer Scientist Should Know About Floating-Point Arithmetic.
