Well, it's not surprising to those who know it.
If you are given an array of numbers in Ruby and want to find the sum of its elements, [Array # sum](https://docs.ruby-lang.org/ja/2.7.0/method/Array/i/ The rule of thumb is to use sum.html).
Let's compare the code below.
numbers = Array.new(1000){ rand }
# (1)
numbers.inject(&:+)
# (2)
numbers.inject(:+)
# (3)
numbers.sum
The speed increases in the order of (1) → (2) → (3).
Of these, (3) is the fastest and simplest, and that alone is the reason why sum
should be adopted.
But that's not all.
If the number is Integer or Rational, there is no calculation error in the first place, but if the number is Float, the calculation error must be considered.
Floating-point arithmetic is often error-prone. It can't be helped in itself, but if you repeat the operation more than once, you have to consider the ** cumulative ** of the error.
To write the conclusion first, the third reason why sum
should be adopted is that sum
adopts "Kahan's addition algorithm" that suppresses the accumulation of errors.
Reference: [Kahan summation algorithm-Wikipedia](https://ja.wikipedia.org/wiki/%E3%82%AB%E3%83%8F%E3%83%B3%E3%81%AE%E5% 8A% A0% E7% AE% 97% E3% 82% A2% E3% 83% AB% E3% 82% B4% E3% 83% AA% E3% 82% BA% E3% 83% A0)
Before getting into the main subject, I would like to sort out some background knowledge.
It is well known that the number 0.1 cannot be represented by a binary floating point number. Simply put, the number 0.1 is an infinite decimal in binary, so it cannot be represented by a finite number of digits.
So in Ruby
puts 0.1
What does it mean that 0.1
is displayed on the terminal when you execute?
First, the Ruby processing system looks at a floating-point number literal called 0.1
and creates a Float
object.
This represents a floating point number that is "very close to 0.1 but not the same".
Roughly speaking, it is a number with a precision of about 15 digits (in decimal). Actually, what kind of bit string the Ruby Float object is represented by is environment-dependent, but it seems that it is based on the numeric type of C language double in many environments, and in that case. All the content of this article is written on this premise, and the results may differ in environments that do not apply.
Why is it displayed as 0.1
even though I am displaying a number that is not 0.1?
That's because when you give a Float object to puts
, the to_s
method first converts it to a String object called "decimal number string", but the result is the string " 0.1 "
. Is.
Unlike decimal → decimal, binary → decimal conversion does not make a finite decimal into an infinite decimal [^ go]. In fact, the Float object (the number represented by) represented by the floating-point number literal 0.1
is in decimal notation.
[^ go]: This is because the radix 10 is a multiple of 2.
0.1000000000000000055511151231257827021181583404541015625
It seems to be a finite decimal with 55 digits after the decimal point (unless I made a mistake somewhere).
However, the Ruby Float # to_s method does not return such a string.
It takes only the number of digits commensurate with the precision that the Float object can have, and returns the string " 0.1 "
. This is a specification of to_s
.
(I'm not very familiar with this, so if you have any mistakes or supplements, please.)
Let's see an example where the results are different for the main subject sum
and ʻinject (: +)`.
numbers = [0.1] * 6
puts numbers.sum == numbers.inject(:+)
# => false
The sum of the six 0.1
s does not match (again, the results may vary depending on the environment).
Let's display each result with puts
.
numbers = [0.1] * 6
puts numbers.sum # => 0.6000000000000001
puts numbers.inject(:+) # => 0.6
Looking at this, don't make a quick consensus, "Eh? ʻInject (: +) is more accurate than
sum`! Liar! "[^ Ten].
[^ ten]: There is a site that explains that puts
is more accurate than the result of adding 10 0.1s, but this is also not valid.
As we've seen, the Float object created by Ruby's floating-point literal 0.1
represents a non-0.1 number.
By the way, although the result of the ʻinject version is displayed as
0.6, it does not mean that the return value is 0.6 (a Float object representing the number). It's just that the result of
to_s the return value is the string
"0.6" `.
So how do you know exactly what the error is? To do this, use Float # to_r to convert the Float object to a Rational object. This method returns a rational number object that exactly matches the number represented by Float.
It's like this.
puts 0.1.to_r
# => 3602879701896397/36028797018963968
Somehow a great fraction was displayed, but it's not a mistake.
A Float
object based on a literal of 0.1
represents such a number.
At first glance, it looks like a string of numbers, but if you look closely, the numerator and denominator have the same numbers except for the end, and the numerator is an order of magnitude less. In other words, it can be seen that it is very close to the numerical value of 0.1.
If you bring it to Rational, there will be no calculation error in addition, subtraction, multiplication and division.
Let's do it like this.
The sum taken after making Rational is called "true sum".
The absolute value ** of the difference ** between the sum created by sum
and ʻinjectand the true sum ** is called the" error ". To compare which one has the smaller error, calculate the error of the
sum version minus the error of the ʻinject
version. If this is positive, the error of the sum
version is large, and if it is negative, the error of the ʻinject` version is large.
#Number to add
x = 0.1
#Number to add
n = 6
#True sum (Rational)
exact_sum = x.to_r * n
#An array of n x
numbers = [x] * n
#Sum by sum (Rational)
sum_by_sum = numbers.sum.to_r
#That error
error_by_sum = (exact_sum - sum_by_sum).abs
#Rational by inject
sum_by_inject = numbers.inject(:+).to_r
#That error
error_by_inject = (exact_sum - sum_by_inject).abs
#Comparison
puts sum_by_sum - sum_by_inject # => 1/9007199254740992
puts error_by_sum - error_by_inject # => 0/1
That? The sum of the sum
version and the sum of the ʻinjectversion are certainly slightly different, but are the errors exactly the same? At first glance, it's puzzling, but nothing strange. This is because the absolute value is taken when calculating the "error". The sum of the
sum and ʻinject
versions exists at the same distance ** to the left and right of the true sum. The value of the sum itself is different, but the amount of deviation from the true sum is the same.
"Hmm? Then, was it a lie to say that sum
is more accurate than ʻinject`?"
I want you to listen to the story to the end [^ sho].
[^ sho]: To be honest, I myself saw this result and thought for a moment that it was something wrong.
In this case, that is, when 6 0.1
(literal Floats) are added together, there is no difference in accuracy.
This was a lucky case for the ʻinjectversion. Then, is there a case where the
sumversion is actually closer to the true sum? Or is there the opposite case? Let's check it as follows. The number to be added is
0.1` (Float by a literal), and the number to be added is changed.
x = 0.1
1.upto(100) do |n|
exact_sum = x.to_r * n
numbers = [x] * n
sum_by_sum = numbers.sum.to_r
error_by_sum = (exact_sum - sum_by_sum).abs
sum_by_inject = numbers.inject(:+).to_r
error_by_inject = (exact_sum - sum_by_inject).abs
puts "%4d %2d" % [n, (error_by_sum <=> error_by_inject)]
end
ʻError_by_sum <=> error_by_inject should return
1if the error in the
sum version is greater,
-1if the opposite is true, and
0` if the error is the same. ..
The result is as follows.
1 0
2 0
3 0
4 0
5 0
6 0
7 -1
8 -1
9 -1
10 -1
11 -1
12 0
13 0
14 0
15 -1
16 -1
17 -1
18 -1
19 -1
20 -1
21 -1
22 -1
23 -1
24 -1
25 -1
26 -1
27 -1
28 -1
29 -1
30 -1
31 -1
32 -1
33 -1
34 -1
35 -1
36 -1
37 -1
38 -1
39 -1
40 -1
41 -1
42 -1
43 -1
44 0
45 0
46 -1
47 -1
48 -1
49 -1
50 -1
51 -1
52 -1
53 -1
54 -1
55 -1
56 -1
57 -1
58 -1
59 -1
60 -1
61 -1
62 -1
63 -1
64 -1
65 -1
66 -1
67 -1
68 -1
69 -1
70 -1
71 -1
72 -1
73 -1
74 -1
75 -1
76 -1
77 -1
78 -1
79 -1
80 -1
81 -1
82 -1
83 -1
84 -1
85 -1
86 -1
87 -1
88 -1
89 -1
90 -1
91 -1
92 -1
93 -1
94 -1
95 -1
96 -1
97 -1
98 -1
99 -1
100 -1
As you can see, there were 11 of the 1 to 100 with the same magnitude of error.
Everything else is smaller in the sum
version.
After all, within the scope of this experiment,
sum
version and the ʻinject` version is the same → a small numbersum
version has a smaller error → Manysum
version → NoneMet. But, of course, the results of such a simple experiment cannot be generalized. All I can do now is to believe in Kahan's algorithm and to confirm that the expected results were obtained within the scope of the above experiment.
By the way, there were 7 cases where the sum
version matched the true sum. Of these, three also matched the ʻinject` version.
Use sum
instead of ʻinject (: +)` to get the sum.
It's simpler and faster. And when the numerical value is a floating point number, the accumulation of errors can be suppressed.
Recommended Posts