The story of how even someone who doesn't understand C could add a new feature to Ruby 2.6

This article is the day 9 entry of the ISer Advent Calendar 2018.

Introduction

Needless to say, Ruby is a well-known programming language. It is used all over the world, most notably through the web application framework Ruby on Rails. I work part-time using Ruby, and I love it: it's the language I use at work and the language that made me enjoy writing programs. I had always wanted to contribute to Ruby in some way, and the most obvious way is to write code that improves Ruby itself. But of course, building Ruby takes a language other than Ruby: much of Ruby's source code is written in C. I had no practical experience with anything but Ruby, and C felt like such a high hurdle that I never acted on my wish to improve Ruby. The other day, however, things came together: I wrote a patch for Ruby, submitted it, and got it accepted. I'm writing this article to make the barrier to getting involved in Ruby development feel as low as possible.

How it started

During summer vacation, I came across an event called Cookpad Ruby Hack Challenge #5, a two-day event whose theme was "Let's develop the Ruby interpreter together!" I thought my long-standing wish might finally come true there, so I decided to take part.

In fact, one part of Ruby's specification had always bothered me. The Hash class has a method called merge that combines two hashes.

hash1 = {a: 1, b: 2}
hash2 = {c: 3, d: 4}

hash1.merge(hash2)
# => {a: 1, b: 2, c: 3, d: 4}

However, this method takes only one argument, so you cannot merge three or more hashes in a single call.

hash3 = {e: 5, f: 6}

hash1.merge(hash2, hash3)
# => ArgumentError (wrong number of arguments (given 2, expected 1))
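Before this change, the usual workarounds were to chain calls or to fold over an array of hashes. A minimal sketch using only methods that already exist before Ruby 2.6 (single-argument Hash#merge and Enumerable#reduce):

```ruby
hash1 = {a: 1, b: 2}
hash2 = {c: 3, d: 4}
hash3 = {e: 5, f: 6}

# Workaround 1: chain single-argument merge calls.
chained = hash1.merge(hash2).merge(hash3)

# Workaround 2: fold any number of hashes with reduce.
# Each step still calls merge with exactly one argument.
folded = [hash1, hash2, hash3].reduce { |acc, h| acc.merge(h) }

p chained == folded # prints true
```

Both work, but they get noisy as the number of hashes grows, which is what motivated the change below.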

This is a little inconvenient, so I decided to take this great opportunity and do my best to change it.

Feature implementation

First, here is the pull request I actually opened on GitHub: "Make the number of arguments of Hash#merge variable".

From here, I will write how the development proceeded.

First, to develop Ruby, I downloaded the source code and prepared a development environment. This is explained in detail on the following page, which is also the reference material for the Ruby Hack Challenge, so I will skip the details here.

ko1/rubyhackchallenge

Anyway, if the ruby repository is cloned into workdir/ruby with the following directory structure, that should be enough preparation to get started.

workdir/
 ├ ruby/
 ├ build/
 └ install/

From here, we will edit the Ruby source code in workdir/ruby. This time, we will edit ruby/hash.c, which contains the code related to the Hash class.

Before my edits, ruby/hash.c was 4,857 lines long. At first glance, that was enough to make my heart sink. But only a small part of it needs editing, so don't worry; let's go step by step. Near the bottom of the file, you can see many calls to a function called rb_define_method.

hash.c


    rb_define_method(rb_cHash, "initialize", rb_hash_initialize, -1);
    rb_define_method(rb_cHash, "initialize_copy", rb_hash_initialize_copy, 1);
    rb_define_method(rb_cHash, "rehash", rb_hash_rehash, 0);

    rb_define_method(rb_cHash, "to_hash", rb_hash_to_hash, 0);
    rb_define_method(rb_cHash, "to_h", rb_hash_to_h, 0);
    rb_define_method(rb_cHash, "to_a", rb_hash_to_a, 0);
//・
//・
//・

Each call associates the Ruby method named by the second argument with the C function passed as the third argument. When I searched these lines for merge, I found the following near the bottom.

hash.c


    rb_define_method(rb_cHash, "merge!", rb_hash_update, 1);

    rb_define_method(rb_cHash, "merge", rb_hash_merge, 1);

Since merge! is the destructive version of merge (a method that modifies the receiver itself), its implementation needs to change along with merge. From these rb_define_method calls, we can see that reaching the goal means changing two C functions, rb_hash_update and rb_hash_merge. Before that, look carefully at the last argument of rb_define_method: it gives the number of arguments the Ruby method takes, and -1 means variable length. Since this change makes the number of arguments variable, both of these naturally need to be changed to -1.
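The last argument of rb_define_method is visible from Ruby as the method's arity. A small sketch using Method#arity and UnboundMethod#arity; the method names one_arg and any_args are made up for illustration:

```ruby
# Hypothetical methods, defined only to compare their arity.
def one_arg(other); end
def any_args(*others); end

p method(:one_arg).arity   # prints 1
p method(:any_args).arity  # prints -1

# For a built-in method, arity reflects the last argument passed to
# rb_define_method, so on Ruby 2.6 and later this should be -1.
p Hash.instance_method(:merge).arity
```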

hash.c


    rb_define_method(rb_cHash, "merge!", rb_hash_update, -1);

    rb_define_method(rb_cHash, "merge", rb_hash_merge, -1);

Now let's look at the actual implementation of Hash#merge, starting with rb_hash_merge.

hash.c


static VALUE
rb_hash_merge(VALUE hash1, VALUE hash2)
{
    return rb_hash_update(rb_hash_dup(hash1), hash2);
}

VALUE is how a Ruby object is represented in C code, so this function returns some Ruby object. As far as I could make out, it duplicates hash1 and applies rb_hash_update (the C function that corresponds to Ruby's merge!) to that copy with hash2. I wanted this function to accept variadic arguments, but I had no idea how to do that. So I decided to look at how another method that takes variadic arguments is implemented.
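In Ruby terms, rb_hash_merge behaves roughly like the sketch below: duplicate the receiver, then run the destructive update on the copy, so the original hash is left untouched. This is a model for illustration, not the C implementation itself:

```ruby
hash1 = {a: 1, b: 2}
hash2 = {b: 20, c: 3}

# Model of rb_hash_merge: update a duplicate, never the receiver.
merged = hash1.dup.update(hash2)

p merged # same contents as hash1.merge(hash2)
p hash1  # the receiver is unchanged
```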

hash.c


static VALUE
rb_hash_flatten(int argc, VALUE *argv, VALUE hash)
{
//・
//・
//・
}

It seems that a variadic method receives the receiver itself as the last parameter (hash), the number of arguments as the first (argc), and the array of arguments as the second (argv). So I changed rb_hash_merge's parameters to this form and made its internal call to rb_hash_update follow the same convention.
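rb_hash_flatten backs Hash#flatten, whose depth argument is optional, which is why it is registered with -1. Calling it from Ruby shows what argc and argv look like on the C side: argc is 0 for the first call below and 1 for the second.

```ruby
h = {a: [1, 2], b: 3}

# No argument: argc == 0, so the default depth of 1 is used.
p h.flatten    # prints [:a, [1, 2], :b, 3]

# One argument: argc == 1 and argv[0] is 2.
p h.flatten(2) # prints [:a, 1, 2, :b, 3]
```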

hash.c


static VALUE
rb_hash_merge(int argc, VALUE *argv, VALUE self)
{
    return rb_hash_update(argc, argv, rb_hash_dup(self));
}

All you have to do now is modify rb_hash_update.

hash.c


static VALUE
rb_hash_update(VALUE hash1, VALUE hash2)
{
    rb_hash_modify(hash1);
    hash2 = to_hash(hash2);
    if (rb_block_given_p()) {
	rb_hash_foreach(hash2, rb_hash_update_block_i, hash1);
    }
    else {
	rb_hash_foreach(hash2, rb_hash_update_i, hash1);
    }
    return hash1;
}

Depending on whether a block was passed to the method, it iterates over hash2 with a different callback (rb_hash_update_block_i or rb_hash_update_i), adding each of its pairs to hash1, the receiver. My change only needs to run this original merge process once per argument, so it seemed enough to call it repeatedly in a for loop.
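The two callbacks differ only in how they treat a key that exists in both hashes: without a block the incoming value wins, and with a block the block decides. A quick Ruby check of both behaviors:

```ruby
base  = {x: 1, y: 2}
other = {y: 10, z: 3}

# No block: the value from `other` overwrites the existing one.
p base.merge(other)                                # y becomes 10

# With a block: it is called as (key, old_value, new_value),
# and only for keys present in both hashes.
p base.merge(other) { |_key, old, new| old + new } # y becomes 12
```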

I looked up how to write a for loop in C and added one.

hash.c


static VALUE
rb_hash_update(int argc, VALUE *argv, VALUE self)
{
    rb_hash_modify(self);
    /* Apply the original single-hash merge once per argument, in order. */
    for (int i = 0; i < argc; i++) {
        VALUE hash = to_hash(argv[i]);
        if (rb_block_given_p()) {
            rb_hash_foreach(hash, rb_hash_update_block_i, self);
        }
        else {
            rb_hash_foreach(hash, rb_hash_update_i, self);
        }
    }
    return self;
}

All I did was add a for loop, so it should work.
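To reason about what the new loop does, here is a pure-Ruby model of it. merge_all is a hypothetical helper, not part of Ruby: it copies the receiver, then applies the single-hash update once per argument, in order, passing the block along when one is given.

```ruby
# Hypothetical pure-Ruby model of the patched rb_hash_merge/rb_hash_update.
def merge_all(receiver, *hashes, &block)
  result = receiver.dup        # rb_hash_merge works on a copy
  hashes.each do |h|           # the new for loop over argv
    if block
      result.update(h, &block) # like rb_hash_update_block_i
    else
      result.update(h)         # like rb_hash_update_i
    end
  end
  result
end

h1 = {1 => 2, 3 => 4}
h2 = {1 => 3, 5 => 7}
h3 = {1 => 1, 2 => 4}

p merge_all(h1, h2, h3)
p merge_all(h1, h2, h3) { |k, v1, v2| k + v1 + v2 }
```

Because each update sees the result of the previous one, the block can run more than once for the same key: for key 1 it yields 1 + 2 + 3 = 6 after h2, then 1 + 6 + 1 = 8 after h3.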

Testing and tweaking

Let's create workdir/ruby/test.rb and write a Ruby script that uses the feature we implemented.

test.rb


hash1 = {a: 1, b: 2}
hash2 = {c: 3, d: 4}
hash3 = {e: 5, f: 6}

puts hash1.merge(hash2, hash3)

If you set up the environment following the material mentioned at the beginning of "Feature implementation", running the make run command in workdir/build executes this Ruby script with the Ruby you just edited. (Some features are limited; for example, extension libraries cannot be used.)

$ make run
compiling ../ruby/hash.c
linking miniruby
../ruby/tool/ifchange "--timestamp=.rbconfig.time" rbconfig.rb rbconfig.tmp
rbconfig.rb unchanged
creating verconf.h
verconf.h updated
compiling ../ruby/loadpath.c
linking static-library libruby.2.6-static.a
linking shared-library libruby.2.6.dylib
linking ruby
./miniruby -I../ruby/lib -I. -I.ext/common  ../ruby/tool/runruby.rb --extout=.ext  -- --disable-gems ../ruby/test.rb
{a: 1, b: 2, c: 3, d: 4, e: 5, f: 6}

As intended, the output combines all three hashes! The implementation seems to work, at least for now.

Next, I will write test code to confirm that this feature works properly. The tests for hash.c are in workdir/ruby/test/ruby/test_hash.rb, so edit that file. It is written in the style of minitest, a popular Ruby test framework, so it should not be too difficult for anyone used to developing in Ruby.

test_hash.rb


  def test_merge
    h1 = @cls[1=>2, 3=>4]
    h2 = {1=>3, 5=>7}
    h3 = {1=>1, 2=>4}
    assert_equal({1=>3, 3=>4, 5=>7}, h1.merge(h2))
    assert_equal({1=>6, 3=>4, 5=>7}, h1.merge(h2) {|k, v1, v2| k + v1 + v2 })
    assert_equal({1=>1, 2=>4, 3=>4, 5=>7}, h1.merge(h2, h3))
    assert_equal({1=>8, 2=>4, 3=>4, 5=>7}, h1.merge(h2, h3) {|k, v1, v2| k + v1 + v2 })
  end

The test above checks Hash#merge in four cases: with one argument and with two arguments, each with and without a block. Using @cls also lets the same test check both Hash objects and objects of a Hash subclass.
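The @cls idea can be imitated outside the test suite by running the same check against Hash and a subclass. SubHash below is a made-up class for illustration; how test_hash.rb itself assigns @cls is not shown here. The two-argument merge requires Ruby 2.6 or later.

```ruby
# Made-up subclass standing in for the subclass the test suite uses.
class SubHash < Hash; end

[Hash, SubHash].each do |cls|
  h1 = cls[1 => 2, 3 => 4]  # same constructor form as @cls[...]
  h2 = {1 => 3, 5 => 7}
  h3 = {1 => 1, 2 => 4}
  result = h1.merge(h2, h3)
  raise "#{cls}: unexpected #{result}" unless result == {1 => 1, 2 => 4, 3 => 4, 5 => 7}
  puts "#{cls}: ok"
end
```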

Let's run the tests. You can run the entire test suite with the make test-all command in workdir/build.

$ make test-all
・
・
・
optparse.rb:810:in `update':can't modify frozen -702099568038204768 (FrozenError)
make: *** [verconf.h] Error 1

Huh? I got an error. On closer inspection, I found that the update method is also implemented by rb_hash_update.

rb_define_method(rb_cHash, "update", rb_hash_update, 1);

Because the last argument here is 1, Ruby calls rb_hash_update as a fixed one-argument function, passing (self, argument). But I had already changed rb_hash_update to the variadic signature (int argc, VALUE *argv, VALUE self), so its parameters were misread and every call to update failed. The optparse library, which parses command-line arguments, uses this method, so the command could not run properly and failed with the error above. The fix is to change this rb_define_method line so that update also takes variadic arguments.
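From Ruby, update (like merge!) modifies and returns the receiver itself. rb_hash_update begins with rb_hash_modify, and that check is what raises FrozenError on a frozen hash. A short check, assuming Ruby 2.6 or later for the two-argument call:

```ruby
h = {a: 1}
returned = h.update({b: 2}, {c: 3})

p returned.equal?(h) # prints true: the receiver itself is returned
p h                  # h now contains :a, :b and :c

frozen = {a: 1}.freeze
begin
  frozen.update({b: 2})
rescue FrozenError => e
  puts "FrozenError: #{e.message}"
end
```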

rb_define_method(rb_cHash, "update", rb_hash_update, -1);

Then add a test for update as well.

test_hash.rb


  def test_update2
    h1 = @cls[1=>2, 3=>4]
    h2 = {1=>3, 5=>7}
    h3 = {1=>1, 2=>4}
    h1.update(h2, h3) {|k, v1, v2| k + v1 + v2 }
    assert_equal({1=>8, 2=>4, 3=>4, 5=>7}, h1)
  end

Then run the whole test suite again.

$ make test-all
・
・
・
Finished tests in 979.934312s, 19.6534 tests/s, 2362.7614 assertions/s.
19259 tests, 2315351 assertions, 0 failures, 0 errors, 51 skips

The test has passed.

Getting it merged

Now I'll describe how the changes above were merged into Ruby's own repository, again quoting the following document for the overall flow until a Ruby patch is accepted: ko1/rubyhackchallenge

Ruby is version-controlled with Subversion, not Git, so you can't get your changes in simply by submitting a pull request on GitHub. Ruby is discussed daily on Redmine, an open-source project management tool, so if you want to make a change, you first propose it there as a ticket on the Ruby Issue Tracking System. Tickets are divided into "Feature request" and "Bug report", and this change falls into the former. In that case, the ticket should apparently include the following:

- Abstract: a short summary of the proposal
- Background: what is wrong now, what is actually troublesome, and what the real use cases are
- Proposal
- Implementation: an implementation is strong evidence that the proposal is feasible
- Evaluation: what the proposal improves and, if there is an implementation, whether its performance is sufficient
- Discussion: points to consider, comparison with other approaches, etc.
- Summary

For feature requests in particular, what matters is whether people would actually use the feature, so writing concrete use cases in the Background section seems to improve the chances of acceptance. If the proposal is accepted, someone with commit access to the Subversion repository (a Ruby committer, of course) incorporates the change into Ruby.

Here is the ticket I created: "Make the number of arguments of Hash#merge variable". For Implementation, I opened a separate pull request on GitHub and attached its link. For Background, I searched Stack Overflow and Qiita for examples where people had to write awkward code because Hash#merge cannot merge three hashes at once, and pasted them in.

That is as far as I got during the Ruby Hack Challenge itself. At the end, I had the opportunity to present my implementation to the Ruby committers there, everyone responded positively, and it was decided on the spot that this code would be incorporated into the Ruby repository. I was really happy at that moment...

After that, I fixed the issues various people pointed out on the GitHub pull request, such as incorrect notation, indentation, and incomplete documentation. (Sometimes the reviewers even made the changes themselves... thank you!)

After a week of this... finally... [Screenshot, 2018-12-04] The code I wrote was incorporated into Ruby! I was really happy.

In closing

I was truly happy to be able to contribute to my favorite language, Ruby, thanks to many people's help. I'm deeply grateful to everyone involved in hosting the Ruby Hack Challenge, to the Ruby committers who accepted my change, and to everyone who reviewed my code. I hope this article lowers the hurdle for many people to contribute to Ruby. If you're still hesitating, consider this: I think Ruby is the only major language with such a strong community of core developers right here in Japan. It would be a waste not to make use of it!
