Adding 1 10 times gives more than 10 (or equal to 10).
int_sum.py
x = 0
for i in range(10):
x += 1
print(x < 10)
print(x == 10)
print(type(x))
It's natural, isn't it?
% python int_sum.py
False
True
<type 'int'>
So what if you add 0.1 10 times?
float_sum.py
x = 0.0
for i in range(10):
x += 0.1
print(x < 1)
print(type(x))
Not more than 1 ...
% python float_sum.py
True
<type 'float'>
Numerical values are represented by binary numbers on a computer, but this happens because decimal numbers are represented by finite-digit binary numbers.
Detailed explanation http://docs.python.jp/3/tutorial/floatingpoint.html However, let's take a look at the expression method of "floating point numbers", which is the premise of the explanation.
Before getting into the main subject, let's review how to represent numbers in binary.
To represent numbers in decimal, use numbers from 0 to 9 and powers of 10, for example.
2015
= 2*1000 + 0*100 + 1*10 + 5*1
= 2*10^3 + 0*10^2 + 1*10^1 + 5*10^0
I wrote like this.
Similarly, in binary numbers, the numbers 0 to 1 and the power of 2
11001011
= 1*2^7 + 1*2^6 + 0*2^5 + 0*2^4 + 1*2^3 + 0*2^2 + 1*2^1 + 1*2^0
Represents a number like. In addition, since binary notation has many digits and is difficult to write, it is often written in a compact form using hexadecimal numbers when explaining. For example, the binary number 11001011 is written as 0xcb in the hexadecimal number.
Let's analyze what the decimal representation actually looks like using a simple C program.
float.c
#include <stdio.h>
int main(int argc, char *argv[])
{
double z;
if (argc != 2) {
fprintf(stderr, "usage: %s str\n", argv[0]);
return -1;
}
if (sscanf(argv[1], "%lf", &z) != 1) {
fprintf(stderr, "parse failed\n");
return -1;
}
printf("double: %f\n", z);
printf("hex: %016llx\n", *(unsigned long long *)&z);
return 0;
}
In this program
I am doing that.
% gcc -o float float.c
If you compile with, give an argument and execute it, it will be as follows.
% ./float 1
double: 1.000000
hex: 3ff0000000000000
% ./float -1
double: -1.000000
hex: bff0000000000000
% ./float 0.5
double: 0.500000
hex: 3fe0000000000000
% ./float -0.5
double: -0.500000
hex: bfe0000000000000
% ./float 2
double: 2.000000
hex: 4000000000000000
Comparing the hexadecimal notation of 1 and -1, 0.5 and -0.5, it seems that the difference in sign corresponds to the difference between 3 and b at the beginning. When expressed in binary numbers
So you can expect a positive number if the most significant bit is 0 and a negative number if it is 1. If it is correct
So -2 should be c000000000000000. Let's actually try it.
% ./float -2
double: -2.000000
hex: c000000000000000
It seems that the expectation was correct. Next, if we focus on the part excluding the sign bit
And, when doubled, it seems that the first 3 characters (11-bit value excluding the sign bit) are incremented by 1. Therefore, 4 (2 ^ 2) can be expected to be 4010000000000000, but when you actually check it,
% ./float 4
double: 4.000000
hex: 4010000000000000
It became. In other words, it was found that the 11 bits following the sign bit represent the exponent (the number multiplied by 2).
So what do the remaining 13 characters (52 bits) represent? Given the binary number 1.1, that is, 1 + 1/2 = 1.5
% ./float 1.5
double: 1.500000
hex: 3ff8000000000000
0x8 (1000) appears at the top. Binary 1.11 That is, 1 + 1/2 + 1/4 = 1.75
% ./float 1.75
double: 1.750000
hex: 3ffc000000000000
In this case it starts at 0xc (1100). Then the binary number 1.111, that is, 1 + 1/2 + 1/4 + 1/8 = 1.875
% ./float 1.875
double: 1.875000
hex: 3ffe000000000000
And if you summarize up to here
Comparing the left and right, it seems that the value on the left is the value on the right, with the 1 above the decimal point omitted. So what happens with 1.00001? If you give 1 + 1/32 = 1.03125 as an argument
% ./float 1.03125
double: 1.031250
hex: 3ff0800000000000
It became 0x08 (0000 1000).
Again, this can be explained by the omission of the 1 above the decimal point. In the case of the power of 2 that was seen at the beginning, the lower 52 bits were 0, but if the power of 2 is expressed in binary, the most significant power is 1 and the rest is 0, so that is also explained. Is attached.
From what we have observed so far, the expression of decimal numbers can be expressed as an expression.
(-1)^(x >> 63) * 2^((0x7ff & (x >> 52)) - 0x3ff) * ((x & 0x000fffffffffffff) * 2^(-52) + 1)
It will be. Note that 0 cannot be expressed in this form, but it plays an important role when multiplying, so
% ./float 0
double: 0.000000
hex: 0000000000000000
And all the bits are set to 0.
Summary
Decimal numbers were expressed in the form of.
Let's print the value in the middle by the calculation of adding 0.1 10 times, which we saw at the beginning.
float_sum.c
#include <stdio.h>
int main()
{
double z = 0.0;
for (int i = 0; i < 10; i++)
{
z += 0.1;
printf("hex: %016llx\n", *(unsigned long long *)&z);
}
return 0;
}
% gcc -o float_sum float_sum.c
% ./float_sum
hex: 3fb999999999999a
hex: 3fc999999999999a
hex: 3fd3333333333334
hex: 3fd999999999999a
hex: 3fe0000000000000
hex: 3fe3333333333333
hex: 3fe6666666666666
hex: 3fe9999999999999
hex: 3feccccccccccccc
hex: 3fefffffffffffff
First, the first line should represent the decimal number 0.1,
2^((0x3fb - 0x3ff) * (0x1999999999999a * 2^(-52))
= 0x1999999999999a * 2^(-56)
= 7205759403792794.0 / 72057594037927936.0
It is not possible to accurately represent 0.1 (it is a little larger than 0.1). When the decimal number 0.1 is expressed as a binary number, it becomes a recurring decimal number, and in the hexadecimal number, 9 continues infinitely, but it shows that there is an error in expressing it in a finite number of digits.
Next, let's see how to calculate addition. Floating-point numbers were represented by exponents and mantissa, but their addition is in decimal.
10 * 3 + 100 * 5 = 100 * (0.3 + 5) = 100 * 5.3
Do something similar to. In fact, if you write a decimal in the form (exponent, mantissa),
3fb999999999999a + 3fb999999999999a
= (3fb, 0x1999999999999a) + (3fb, 0x1999999999999a)
= (3fb, 0x1999999999999a + 0x1999999999999a)
= (3fb, 0x33333333333334)
= (3fc, 0x1999999999999a)
= 3fc999999999999a
3fc999999999999a + 3fb999999999999a
= (3fc, 0x1999999999999a) + (3fb, 0x1999999999999a)
= (3fc, 0x1999999999999a) + (3fc, 0xccccccccccccd)
= (3fc, 0x1999999999999a + 0xccccccccccccd)
= (3fc, 0x26666666666667)
= (3fd, 0x13333333333334)
= 3fd3333333333334
Here is the rounding rule for mantissa shifts
I'm using
0x33333333333334 = ...00110100
1 bit shift →...1010 = 0x1999999999999a
0x999999999999a = ...10011010
1 bit shift →...1101 = 0xccccccccccccd
0x26666666666667 = ...01100111
1 bit shift →...0100 = 0x13333333333334
Again, we need to round to represent the exponent with 53 bits, and we can see that there is an additional error. Due to these errors, adding 0.1 10 times will result in less than 1.
Now that you understand the representation of floating point numbers using a C program, Finally, let's make sure that Python actually uses the same expression.
float_sum_hex.py
x = 0.0
for i in range(10):
x += 0.1
print(x.hex())
% python float_sum_hex.py
0x1.999999999999ap-4
0x1.999999999999ap-3
0x1.3333333333334p-2
0x1.999999999999ap-2
0x1.0000000000000p-1
0x1.3333333333333p-1
0x1.6666666666666p-1
0x1.9999999999999p-1
0x1.cccccccccccccp-1
0x1.fffffffffffffp-1
The writing method is slightly different, but the same values as those seen in C language appear in each.
In fact, if you look at the CPython implementation, the Python float value is represented by a C double.
cpython/Include/floatobject.h
typedef struct {
PyObject_HEAD
double ob_fval;
} PyFloatObject;
Recommended Posts