
How to add and subtract 16 bit floating point half precision numbers?

Developer https://www.devze.com 2023-04-09 17:01 Source: web


Related topics: bit bit-manipulation twos-complement

How do I add and subtract 16-bit floating point half precision numbers?

Say I need to add or subtract:

1 10000 0000000000

1 01111 1111100000

2’s complement form.

The OpenEXR library defines a half-precision floating point class. It's C++, but the code for casting between native IEEE 754 float and half should be easy to adapt. See Half/half.h as a starting point.
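The same cast-to-native, operate, cast-back idea can be sketched without OpenEXR. The example below is only an illustration and does not use the OpenEXR API: it assumes Python 3.6 or later, where the standard `struct` module's `e` format code packs and unpacks IEEE 754 binary16 values. The helper names `half_bits_to_float`, `float_to_half_bits`, `add_half` and `sub_half` are made up for this sketch.

```python
import struct


def half_bits_to_float(bits: int) -> float:
    """Reinterpret a 16-bit pattern as an IEEE 754 half and widen it to a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def float_to_half_bits(value: float) -> int:
    """Round a Python float to the nearest half and return its 16-bit pattern.

    struct raises OverflowError if the value is too large for a finite half.
    """
    return struct.unpack("<H", struct.pack("<e", value))[0]


def add_half(a_bits: int, b_bits: int) -> int:
    # The sum of two halves is exact in double precision, so only the
    # final conversion back to half rounds.
    return float_to_half_bits(half_bits_to_float(a_bits) + half_bits_to_float(b_bits))


def sub_half(a_bits: int, b_bits: int) -> int:
    return float_to_half_bits(half_bits_to_float(a_bits) - half_bits_to_float(b_bits))


a = 0b1_10000_0000000000  # the first operand from the question
b = 0b1_01111_1111100000  # the second operand from the question
print(format(add_half(a, b), "016b"))
print(format(sub_half(a, b), "016b"))
```

This leans on the host's double-precision arithmetic, so it is convenient for checking results but does not show the bit-level steps.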

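Assuming you are using a representation with denormalized numbers similar to that of IEEE single/double precision, compute the sign as (-1)^S, the mantissa as 1.M if E != 0 and 0.M if E == 0, and the exponent as E - (2^(n-1) - 1), where n is the number of exponent bits (so E - 15 for the 5-bit exponent here; when E == 0 the exponent is fixed at 1 - 15 = -14). Operate on these natural representations, then convert the result back to the 16-bit format.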

sign1 = -1, mantissa1 = 1.0, exponent1 = 1 (value -2.0)

sign2 = -1, mantissa2 = 1.11111 (binary), exponent2 = 0 (value -1.96875)

sum: sign = -1, mantissa = 1.111111 (binary), exponent = 1 (value -3.96875)

Representation: 1 10000 1111110000

Naturally, this assumes excess (biased) encoding of the exponent, which is excess-15 for the 5-bit exponent used here.
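The decode, operate and re-encode steps can also be written out with plain integer arithmetic. The following is a minimal sketch, not a definitive implementation: it assumes IEEE 754 binary16 layout (1 sign bit, 5 exponent bits with excess-15 encoding, 10 fraction bits) and round-to-nearest, ties-to-even, and it does not handle infinities or NaNs as inputs. The function names `decode`, `encode`, `add_half` and `sub_half` are hypothetical.

```python
BIAS = 15        # 2**(5 - 1) - 1, the excess-15 exponent encoding
FRAC_BITS = 10   # stored fraction bits
MIN_EXP = 1 - BIAS - FRAC_BITS  # exponent of the lowest subnormal bit (-24)


def decode(bits):
    """Return (sign, sig, exp) such that value == sign * sig * 2**exp."""
    sign = -1 if bits >> 15 else 1
    e = (bits >> FRAC_BITS) & 0x1F
    m = bits & 0x3FF
    if e == 0x1F:
        raise ValueError("infinities and NaNs are not handled in this sketch")
    if e == 0:                        # denormalized: 0.M with exponent 1 - BIAS
        return sign, m, MIN_EXP
    return sign, m | (1 << FRAC_BITS), e - BIAS - FRAC_BITS  # normal: 1.M


def encode(sign, sig, exp):
    """Round sign * sig * 2**exp (sig >= 0) to a half, ties to even."""
    s = 0x8000 if sign < 0 else 0
    if sig == 0:
        return s
    # Aim for an 11-bit significand, but never go below the subnormal exponent.
    shift = max(sig.bit_length() - (FRAC_BITS + 1), MIN_EXP - exp)
    if shift > 0:
        half = 1 << (shift - 1)
        rem = sig & ((1 << shift) - 1)
        sig >>= shift
        if rem > half or (rem == half and sig & 1):
            sig += 1                  # round to nearest, ties to even
    else:
        sig <<= -shift                # normalise to the left, exactly
    exp += shift
    if sig >> (FRAC_BITS + 1):        # rounding carried into a 12th bit
        sig >>= 1
        exp += 1
    if sig < (1 << FRAC_BITS):        # still denormalized: E field is 0
        return s | sig
    e = exp + FRAC_BITS + BIAS
    if e >= 0x1F:                     # too large: return infinity
        return s | 0x7C00
    return s | (e << FRAC_BITS) | (sig & 0x3FF)


def add_half(a, b):
    s1, m1, e1 = decode(a)
    s2, m2, e2 = decode(b)
    e = min(e1, e2)                   # align both operands to the smaller exponent
    total = s1 * (m1 << (e1 - e)) + s2 * (m2 << (e2 - e))
    if total == 0:                    # a zero sum is -0 only for (-0) + (-0)
        sign = -1 if (s1 < 0 and s2 < 0) else 1
    else:
        sign = -1 if total < 0 else 1
    return encode(sign, abs(total), e)


def sub_half(a, b):
    return add_half(a, b ^ 0x8000)    # subtracting is adding with b's sign flipped


print(format(add_half(0b1100000000000000, 0b1011111111100000), "016b"))
# prints 1100001111110000, i.e. 1 10000 1111110000 as in the worked example
```

Because Python integers are arbitrary precision, the aligned sum is exact and rounding happens only once, in `encode`. A fixed-width implementation would instead keep a few guard bits while aligning the operands.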