[C language] the secret of floating point numbers

IEEE 754 standard

Storage method: sign bit, exponent, mantissa
IEEE floating point arithmetic standard: IEEE 754

type Sign bit index Mantissa
float 1 8 23
double 1 11 52

Take float as an example to discuss floating point numbers.

The memory representation of floating-point type is different from that of integer type, and the memory representation of floating-point type is relatively complex.

According to IEEE 754 standard, float divides its 32bits into three parts, with symbol bit S accounting for 1bit, index E accounting for 8bits and mantissa M accounting for 23bits.

The following discusses the memory representation of float type with a common body.

union
{
    float f; //Truth value
    struct
    {
        uint32_t M : 23; // index
        uint32_t E : 8;  // Mantissa
        uint32_t S : 1;  //Sign bit
    };
    uint32_t v; //Machine code
} f;

Floating point number calculation formula:
$$f = (-1){S}2{E-127}1.M$$

For example: given the machine code f.v = 0x41360000, the truth value f.f.

  1. Expand the machine code into binary 0x41360000 - > 0b0100 0001 0011 0110 0000; S = 0b0 = 0x0 = 0, E = 0b100 0001 0 = 0x82 = 130, M = 0b011 0110 0000 = 0x360000;
  2. 1.M is the binary decimal calculation, and the cube multiplied by 2 according to S and E is multiplied by 1 M. That is, the decimal point moves three digits to the right, and the true value F.F = 0b1011 011 = 11.375.

Code verification:

#include <stdio.h>
#include <stdint.h>

union uFloat
{
    float f; //Truth value
    struct
    {
        uint32_t M : 23; // index
        uint32_t E : 8;  // Mantissa
        uint32_t S : 1;  //Sign bit
    };
    uint32_t v; //Machine code
} f;

void uPrintln(union uFloat uf)
{
    printf("f: %f\nS: 0x%x, E: 0x%x, M: 0x%x\nv: 0x%08x\n", uf.f, uf.S, uf.E, uf.M, uf.v);
}

int main(void) {
    f.v = 0x41360000;
    uPrintln(f);
    printf("sizeof: %d\n", sizeof(f));
    return 0;
}

Output:

f: 11.375000
S: 0x0, E: 0x82, M: 0x360000
v: 0x41360000
sizeof: 4

Imprecise type

In a limited number of bits, the floating-point type is better than uint32_t represents a larger range, resulting in partial data loss. It cannot represent all accurate values. It is an imprecise type.

For example, in the open interval (33829141358099960719503586036763590656.0339620349071475272940736969149792649216.0), at least 4202553 integers cannot be represented normally.

int main(void) {
    f.f = 338291141358099960719503586036763599657.000000;
    uPrintln(f);
    return 0;
}

Output:

f: 338291141358099960719503586036763590656.000000
S: 0x0, E: 0xfe, M: 0x7e8081
v: 0x7f7e8081

The closer the floating-point data is to zero, the smaller the error is, and vice versa.

verification

This conclusion is verified by two simple procedures.

The function of this C program is to increase the floating point number corresponding to the interval sampling of machine code and print it to the table file.

#include <stdio.h>
#include <stdint.h>

union
{
    float f;
    struct
    {
        uint32_t M : 23; // index
        uint32_t E : 8;  // Mantissa
        uint32_t S : 1;  //Sign bit
    };
    int32_t i;
} u;

void uPrintln(float f)
{
    u.f = f;
    printf("i: %d, f: %f, S: %d, E: %d, M: %d\n", u.i, u.f, u.S, u.E, u.M);
}

void ufPrintln(float f, FILE *fp)
{
    u.f = f;
    fprintf(fp, "%d,0x%08x,%f,%d,%d,%d,\n", u.i, u.i, u.f, u.S, u.E, u.M);
}

void pPrintfln(uint32_t cur, uint32_t max)
{
    static uint64_t tmp = 0;
    uint64_t pten = cur * 100 / max;
    if (tmp != pten)
    {
        tmp = pten;
        printf("\rprocessing: %d", pten);
    }
}

#define MAX UINT32_MAX
#define INT UINT16_MAX

int main(void)
{
    FILE *fp = NULL;

    fp = fopen("float.csv", "w+");
    if (fp == NULL)
    {
        printf("open error!\n");
        return -1;
    }
    printf("open successful!\n");
    fprintf(fp, "i,i,f,S,E,M,\n");
    for (uint32_t i = 0; i < MAX; i += INT)
    {
        u.i = i;
        ufPrintln(u.f, fp);
        pPrintfln(i, MAX);
    }
    fclose(fp);

    return 0;
}

The function of this python file is to read table files and draw points.

import matplotlib.pyplot as plt
import pandas as pds

data = pds.read_csv('./float.csv')
data.plot(x="i.1", y="f")
plt.show()

It can be seen that when the machine code increases to a certain extent, the floating-point number begins to change exponentially, which means that the more floating-point numbers are crossed and lost between two adjacent machine codes.

communication

Wechat official account: IOT points north
Station B: IOT refers to the north
Thousand penguins: 658685162

Tags: C

Posted by maest on Tue, 10 May 2022 16:37:10 +0300