Half precision floating-point format

Computing

Computing is usually defined as the activity of using and improving computer hardware and software. It is the computer-specific part of information technology...

, half precision is a binary floating-point computer number format that occupies 16 bits (two bytes in modern computers) in computer memory.

In IEEE 754-2008 the 16-bit base 2 format is officially referred to as binary16. It is intended for storage (of many floating-point values where higher precision need not be stored), not for performing arithmetic computations.

Half-precision floating point is a relatively new binary floating-point format. It was created concurrently by Nvidia

NVIDIA

Nvidia is an American global technology company based in Santa Clara, California. Nvidia is best known for its graphics processors . Nvidia and chief rival AMD Graphics Techonologies have dominated the high performance GPU market, pushing other manufacturers to smaller, niche roles...

and Industrial Light & Magic. Nvidia defined the half datatype in the Cg language, released in early 2002, and was the first to implement 16-bit floating point in silicon, with the GeForce FX, released in late 2002. ILM was searching for an image format that could handle dynamic ranges, but without the hard drive and memory cost of floating-point representations that are commonly used for floating-point computation (single and double precision).

This format is used in several computer graphics

Computer graphics

Computer graphics are graphics created using computers and, more generally, the representation and manipulation of image data by a computer with help from specialized software and hardware....

environments including OpenEXR

OpenEXR

OpenEXR is a high dynamic range imaging image file format, released as an open standard along with a set of software tools created by Industrial Light and Magic , released under a free software license similar to the BSD license....

, OpenGL

OpenGL

OpenGL is a standard specification defining a cross-language, cross-platform API for writing applications that produce 2D and 3D computer graphics. The interface consists of over 250 different function calls which can be used to draw complex three-dimensional scenes from simple primitives. OpenGL...

, Cg, and D3DX

D3dx

In computing, D3DX is a high level API library which is written to supplement Microsoft's Direct3D graphics API. The D3DX library was introduced in Direct3D 7, and subsequently was improved in Direct3D 9...

. The advantage over 8-bit or 16-bit binary integers is that the increased dynamic range

Dynamic range

Dynamic range, abbreviated DR or DNR, is the ratio between the largest and smallest possible values of a changeable quantity, such as in sound and light. It is measured as a ratio, or as a base-10 or base-2 logarithmic value.-Dynamic range and human perception:The human senses of sight and...

allows for more detail to be preserved in highlights and shadow

Shadow

A shadow is an area where direct light from a light source cannot reach due to obstruction by an object. It occupies all of the space behind an opaque object with light in front of it. The cross section of a shadow is a two-dimensional silhouette, or reverse projection of the object blocking the...

s for images. The advantage over 32-bit single-precision binary formats is that it requires half the storage and bandwidth (at the expense of precision).

IEEE 754 half-precision binary floating-point format: binary16

The IEEE 754 standard specifies a binary16 as having:

Sign bit
Sign bit
In computer science, the sign bit is a bit in a computer numbering format that indicates the sign of a number. In IEEE format, the sign bit is the leftmost bit...

: 1 bit
Exponent width: 5 bits
Significant precision
Precision (arithmetic)
The precision of a value describes the number of digits that are used to express that value. In a scientific setting this would be the total number of digits or, less commonly, the number of fractional digits or decimal places...

: 11 (10 explicitly stored)

The format is assumed to have an implicit lead bit with value 1 unless the exponent field is stored with all zeros. Thus only 10 bits of the significand

Significand

The significand is part of a floating-point number, consisting of its significant digits. Depending on the interpretation of the exponent, the significand may represent an integer or a fraction.-Examples:...

appear in the memory format but the total precision is 11 bits. In IEEE 754 parlance, there are 10 bits of significand, but there are 11 bits of significand precision (log₁₀(2¹¹) ≈ 3.311 decimal digits). The bits are laid out as follows:

Exponent encoding

The half-precision binary floating-point exponent is encoded using an offset-binary representation, with the zero offset being 15; also known as exponent bias in the IEEE 754 standard.

E_min = 01_h−0F_h = −14
E_max = 1E_h−0F_h = 15
Exponent bias
Exponent bias
In IEEE 754 floating point numbers, the exponent is biased in the engineering sense of the word – the value stored is offset from the actual value by the exponent bias....

= 0F_h = 15

Thus, as defined by the offset binary representation, in order to get the true exponent the offset of 15 has to be subtracted from the stored exponent.

The stored exponents 0x00 and 0x1f are interpreted specially.

Exponent	Significand zero	Significand non-zero	Equation
00_h	zero 0 (number) 0 is both a numberand the numerical digit used to represent that number in numerals.It fulfills a central role in mathematics as the additive identity of the integers, real numbers, and many other algebraic structures. As a digit, 0 is used as a placeholder in place value systems... , −0	subnormal numbers	(−1)^signbit × 2⁻¹⁴ × 0.significandbits₂
01_h, ..., 1E_h	normalized value		(−1)^signbit × 2^{exponent−15} × 1.significandbits₂
1F_h	±infinity Infinity Infinity is a concept in many fields, most predominantly mathematics and physics, that refers to a quantity without bound or end. People have developed various ideas throughout history about the nature of infinity...	NaN NaN In computing, NaN is a value of the numeric data type representing an undefined or unrepresentable value, especially in floating-point calculations... (quiet, signalling)

The minimum strictly positive (subnormal) value is
2⁻²⁴ ≈ 5.96 × 10⁻⁸.
The minimum positive normal value is 2⁻¹⁴ ≈ 6.10 × 10⁻⁵.
The maximum representable value is (2−2⁻¹⁰) × 2¹⁵ = 65504.

Half precision examples

These examples are given in bit representation, in hexadecimal

Hexadecimal

In mathematics and computer science, hexadecimal is a positional numeral system with a radix, or base, of 16. It uses sixteen distinct symbols, most often the symbols 0–9 to represent values zero to nine, and A, B, C, D, E, F to represent values ten to fifteen...

,
of the floating-point value. This includes the sign, (biased) exponent, and significand.

3c00 = 1
c000 = −2

7bff = 6.5504 × 10⁴ (max half precision)

0400 = 2⁻¹⁴ ≈ 6.10352 × 10⁻⁵ (minimum positive normal)

0001 = 2⁻²⁴ ≈ 5.96046 × 10⁻⁸ (minimum strictly positive subnormal)

0000 = 0
8000 = −0

7c00 = infinity
fc00 = −infinity

3555 ≈ 0.33325... ≈ 1/3

By default, 1/3 rounds down like for double precision

Double precision

In computing, double precision is a computer number format that occupies two adjacent storage locations in computer memory. A double-precision number, sometimes simply called a double, may be defined to be an integer, fixed point, or floating point .Modern computers with 32-bit storage locations...

, because of the odd number of bits in the significand.
So the bits beyond the rounding point are 0101... which is less than 1/2 of a unit in the last place

Unit in the Last Place

In computer science and numerical analysis, unit in the last place or unit of least precision is the spacing between floating-point numbers, i.e., the value the least significant bit represents if it is 1...

Precision limitations on integer values

Integers between 0 and 2048 can be exactly represented

Integers between 2049 and 4096 round down to the nearest multiple of 2 (even number)

Integers between 4097 and 8192 round down to the nearest multiple of 4

Integers between 8193 and 16384 round down to the nearest multiple of 8

Integers between 16385 and 32768 round down to the nearest multiple of 16

Integers between 32769 and 65535 round down to the nearest multiple of 32

External links

Minifloats (in Survey of Floating-Point Formats)
OpenEXR site
Half precision constants from D3DX
D3dx
In computing, D3DX is a high level API library which is written to supplement Microsoft's Direct3D graphics API. The D3DX library was introduced in Direct3D 7, and subsequently was improved in Direct3D 9...

(defunct page)
OpenGL treatment of half precision
Analog devices variant (four-bit exponent)
C source code to convert between IEEE double, single, and half precision can be found here
C# source code implementing a half-precision floating-point data type can be found here

The source of this article is wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.

IEEE 754 half-precision binary floating-point format: binary16

Exponent encoding

Half precision examples

Precision limitations on integer values

See also

External links