When I was imagining what I wanted to implement, in the process it became necessary to "count the bits with the integer 1". However, I noticed that implementing it in Ruby is quite slow, so I decided to implement it in C language etc. (jkr2255 / bit_counter).
Recently, it has become a CPU instruction called POPCNT
(described later), and when I thought that someone had implemented it, I realized that surprisingly, Bignum
was almost untouched.
"If you don't have one, you can make it yourself!"
In the case of Ruby, integers allow infinite digits (conceptually), and negative numbers are two's complement representations, so the sign bit 1 is ** infinitely continuous **. As expected, counting the number of 1s in this state is "infinite" and is meaningless, so when it is a negative number, it is implemented as "counting the number of 0s and then returning it as a minus". [^ 1].
If you make an extension in C language, JRuby will not be supported. However, in this case, there were methods such as Long.bitCount ()
and BigInteger # bitCount ()
on the Java side, so call this. jkr_2255 / items / 33b1eb1b2d4099ca1c67) was almost done.
However, if the C extension build starts in JRuby, an error will occur, so add a conditional branch to see if it is JRuby in Rakefile
or gemspec
, and generate a separate Gem for JRuby to RubyGems. I had to push.
Ruby integers are divided into Fixnum
and Bignum
, but Fixnum
can easily be a C integer, so you should do it. That is
Bignum
to a bit arrayThese two.
First, check with CPUID
to see if the POPCNT
instruction introduced in SSE 4.2 can be used, and if it is valid, count with POPCNT
[^ 2]. If not, use GCC's built-in __builtin_popcountl ()
, otherwise use a handwritten version of the count function.
Bignum
to an arrayIf you continue the operation with Bignum
, an object will be created each time, and the speed will not come out at all. So, once you convert from Bignum
to a numeric array, there are two functions, rb_big_pack
and rb_integer_pack
, depending on the version of Ruby, and which one is valid may differ. Also, when the capacity of the conversion buffer is small, ʻALLOCA is used instead of ʻALLOC
to avoid overhead for the stack (in fact, this alone is 30% faster for small numbers).
If you enter the array, the rest is counting, but in the Windows x64 environment, long
is 32 bits wide, so the pointer is read so that the 64-bit POPCNT
instruction can be used.
Compared to the commonly used num.to_s (2) .count ('1')
, it was 5 to 20 times faster (on machines with POPCNT instructions).
Ichiou Rubinius also works, but it doesn't get much faster, so I'm considering something that can't be done.
[^ 1]: Java's BigInteger # bitCount ()
is to return the number of bits "different from the sign bit". That is, it also returns a positive value for negative numbers.
[^ 2]: It seems that it will be a little faster if you use AVX etc. to get enthusiastic, but I have not done so much.
Recommended Posts