Understanding floating point representation errors; what's wrong with my thinking?

Developer (开发者) https://www.devze.com 2023-03-11 22:43 Source: web

I'm having some trouble understanding why some figures can't be represented as floating-point numbers.

As we know, a normal float has a sign bit, an exponent, and a mantissa. Why can't 0.1, for example, be represented exactly in this system? The way I think of it, you would put 10 (1010 in binary) in the mantissa and -2 in the exponent. As far as I know, both numbers can be represented exactly in the mantissa and exponent. So why can't we represent 0.1 exactly?

If your exponent is decimal (i.e. it represents 10^X), you can precisely represent 0.1 -- however, most floating point formats use binary exponents (i.e. they represent 2^X). Since there are no integers X and Y such that Y * (2 ^ X) = 0.1, you cannot precisely represent 0.1 in most floating point formats.
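To see this concretely, here is a minimal Python sketch. It assumes Python's float is an IEEE 754 double, which holds on common platforms; the variable names are only for illustration. It converts the stored double back into an exact fraction and compares it with the true 1/10.

```python
from fractions import Fraction

# Fraction(float) converts the stored double exactly, with no rounding.
stored = Fraction(0.1)
wanted = Fraction(1, 10)

numerator, denominator = stored.numerator, stored.denominator
# The denominator of any finite double is a power of two.
is_power_of_two = denominator & (denominator - 1) == 0

print(stored == wanted)   # False: the stored value is not exactly 1/10
print(is_power_of_two)    # True
print(stored > wanted)    # True: the nearest double lies slightly above 0.1
```

The stored value is the closest available Y * 2^X, not 0.1 itself.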

Some languages have types for both kinds of exponent. C#, for example, has a data type aptly named decimal, a floating-point format with a decimal exponent, so it can store a number like 0.1 exactly. It also has other uncommon properties: the decimal type can distinguish between 0.1 and 0.10, and x + 1 != x holds for every value of x (provided the addition does not overflow).
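The same contrast can be sketched in Python, using its decimal.Decimal as a stand-in for a decimal-exponent type. This is an analogy, not C#'s decimal: Python's Decimal rounds to a configurable context precision, so the x + 1 != x remark above should not be assumed to carry over to it.

```python
from decimal import Decimal

# Decimal exponent: 0.1, 0.2 and 0.3 are all stored exactly.
decimal_sum_exact = Decimal("0.1") + Decimal("0.2") == Decimal("0.3")
# Binary exponent: each operand is already rounded, so the sum misses 0.3.
binary_sum_exact = 0.1 + 0.2 == 0.3

# Trailing zeros are kept: the values compare equal but print differently.
same_value = Decimal("0.1") == Decimal("0.10")
texts = (str(Decimal("0.1")), str(Decimal("0.10")))

# A double has 53 bits of integer precision, so adding 1 to 2**53 is lost.
big = 2.0 ** 53
absorbed = big + 1 == big

print(decimal_sum_exact, binary_sum_exact)  # True False
print(same_value, texts)                    # True ('0.1', '0.10')
print(absorbed)                             # True
```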

For most common purposes, though, C# also has the float and double floating-point types, which cannot store 0.1 precisely because they use a binary exponent (as defined in IEEE 754). The binary floating-point types use less storage, are faster because they are easier to implement, and have more operations defined on them. In general, decimal is used only for financial values, where the exact representation of all decimal values is important and storage, speed, and range of operations are not.

You should start by reading "What Every Computer Scientist Should Know About Floating-Point Arithmetic".

Also check out:

Floating-Point Number Tutorial
Tutorial: Floating-Point Binary

Each floating-point number in the IEEE 754 standard is, in effect, some integer multiplied by some integer power of two. E.g., 3 is represented by 3 * 2^0, 96 is represented by 3 * 2^5, and 3/16 is represented by 3 * 2^-4.
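A small sketch of that decomposition in Python, assuming floats are IEEE 754 doubles. The helper name as_int_times_pow2 is made up for this example; it builds on the standard float.as_integer_ratio() method and returns an odd integer together with the power of two.

```python
def as_int_times_pow2(x: float) -> tuple:
    """Return (n, e) with x == n * 2**e exactly, n odd (or 0)."""
    # as_integer_ratio gives x as an exact fraction in lowest terms;
    # for a finite float the denominator is always a power of two.
    n, d = x.as_integer_ratio()
    e = -(d.bit_length() - 1)
    # Move any remaining factors of two from n into the exponent.
    while n != 0 and n % 2 == 0:
        n //= 2
        e += 1
    return n, e

print(as_int_times_pow2(3.0))     # (3, 0)
print(as_int_times_pow2(96.0))    # (3, 5)
print(as_int_times_pow2(3 / 16))  # (3, -4)
print(as_int_times_pow2(0.1))     # (3602879701896397, -55)
```

The last line shows what is really stored for 0.1: a large odd integer times 2^-55, which is close to 0.1 but not equal to it.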

There are no integers x and y such that 0.1 = x * 2^y, so 0.1 cannot be represented exactly by a floating-point number. Proof: if 0.1 = x * 2^y, then 10x = 2^-y. 2^-y is positive, so x is positive. Since x is an integer, 10x is a positive integer divisible by 10, and therefore by 5. So 2^-y would be an integer power of two that is divisible by 5, which is impossible because 2 is the only prime factor of a power of two.
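The same argument gives a quick test for any fraction: reduced to lowest terms, it can be written as x * 2^y only if its denominator is a power of two. A minimal sketch follows, where is_binary_exact is a hypothetical helper name. It ignores the limited mantissa and exponent width of real formats, so it only answers whether an exact binary form exists at all.

```python
from fractions import Fraction

def is_binary_exact(value: Fraction) -> bool:
    """True if value equals x * 2**y for some integers x and y."""
    d = value.denominator  # Fraction always keeps lowest terms, d >= 1
    # A positive integer is a power of two iff it has a single set bit.
    return d & (d - 1) == 0

print(is_binary_exact(Fraction(1, 10)))  # False: denominator 10 = 2 * 5
print(is_binary_exact(Fraction(3, 16)))  # True: denominator 16 = 2**4
print(is_binary_exact(Fraction(96)))     # True: denominator 1 = 2**0
```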

With a binary exponent, that would be 10 × 2^-1 = 5 (or 10 × 2^-2 = 2.5 with the exponent of -2 from the question), not 0.1.

Generally, it's like representing one-third in base ten: it's just not possible with a finite number of digits.
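The analogy can be checked with ordinary long division. The sketch below uses a made-up helper name, expand, which produces the first digits after the point of a fraction in any base. 1/10 in base two repeats forever just as 1/3 does in base ten, while 1/10 in base ten terminates.

```python
def expand(numerator: int, denominator: int, base: int, ndigits: int) -> str:
    """First ndigits fractional digits of numerator/denominator in base.

    Assumes 0 <= numerator < denominator and 2 <= base <= 10.
    """
    digits = []
    remainder = numerator
    for _ in range(ndigits):
        # Long division: shift one place in the target base, take a digit.
        remainder *= base
        digits.append(str(remainder // denominator))
        remainder %= denominator
    return "".join(digits)

print(expand(1, 3, 10, 12))   # 333333333333  (never terminates)
print(expand(1, 10, 2, 12))   # 000110011001  (0011 repeats forever)
print(expand(1, 10, 10, 3))   # 100           (terminates in base ten)
```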

By the way, 10₁₀ = 1010₂ ≠ 1100₂.

You're thinking of 1 * 10^-1, which works for a decimal floating-point representation, such as decimal in C#. The normal floating-point types (such as float and double) use a binary representation, i.e. powers of 2.

Normally, binary is used because binary values can be stored more efficiently in bits. Decimal is normally used when exact decimal precision is required, for example when counting money.