I made a library that lets you call Ruby methods from Python. This article introduces it, since method chains and iterators can be used in a fairly natural way.

https://github.com/yohm/rb_call

Background

We are developing a Rails app that manages jobs for scientific computing, and its behavior can be controlled through a Ruby API. However, many people in scientific computing use Python, so we received many requests for a Python API instead of a Ruby one.

What can it do?

For example, suppose you have the following Ruby code.

`minimal_sample.rb`


class MyClass
  def m1
    "m1"
  end

  def m2(a,b)
    "m2 #{a} #{b}"
  end

  def m3(a, b:)
    "m3 #{a} #{b}"
  end

  def m4(a)
    Proc.new { "m4 #{a}" }
  end

  def m5
    enum = Enumerator.new{|y|
      (1..10).each{|i|
        y << "#{i}" if i % 5 == 0
      }
    }
  end
end

if $0 == __FILE__
  obj = MyClass.new
  puts obj.m1, obj.m2(1,2), obj.m3(3,b:4)  #=> "m1", "m2 1 2", "m3 3 4"
  proc = obj.m4('arg of proc')
  puts proc.call                           #=> "m4 arg of proc"
  e = obj.m5
  e.each do |i|
    puts i                                 #=> "5", "10"
  end
end

The same thing can be written in Python as follows.

`minimal_sample.py`


from rb_call import RubySession

rb = RubySession()                          # Execute a Ruby process
rb.require('./minimal_sample')              # load a Ruby library 'minimal_sample.rb'

MyClass = rb.const('MyClass')               # get a Class defined in 'minimal_sample.rb'
obj = MyClass()                             # create an instance of MyClass
print( obj.m1(), obj.m2(1,2), obj.m3(3,b=4) )
                                            #=> "m1", "m2 1 2", "m3 3 4"
proc = obj.m4('arg of proc')
print( proc() )                             #=> "m4 arg of proc"

e = obj.m5()                                # Not only a simple Array but an Enumerator is supported
for i in e:                                 # You can iterate using `for` syntax over an Enumerable
    print(i)                                #=> "5", "10"

You can call Ruby libraries from Python almost as they are. Method chains work, and you can iterate with `for`. Although not shown in the sample, list comprehensions also work as expected, and Ruby exceptions are accessible from Python.

For example, combined with Rails code, you can write something like this.

`rails_sample.py`


author = Author.find('...id...')
Book.where( {'author':author} ).gt( {'price':100} ).asc( 'year' )

With metaprogramming and a few external libraries, the whole thing fits in compact single-file code: about 130 lines of Python and about 80 lines of Ruby. There are restrictions, described later, but it works fine for most applications.

How is it implemented?

Problems with normal RPC

I use RPC to call Ruby methods from Python, specifically MessagePack-RPC (a library, or rather a specification). The basic usage of MessagePack-RPC is described here: http://qiita.com/yohm13/items/70b626ca3ac6fbcdf939 Ruby is started as a subprocess of the Python process, and the two communicate over a socket. Roughly speaking, Python sends Ruby the name and arguments of the method to call, and Ruby sends the return value back. MessagePack-RPC defines how the exchanged data is serialized.

For simply passing arguments and getting a value back, this approach is sufficient. However, Ruby code often relies on method chains, and many libraries assume them. In Rails, for example, you frequently write code like the following.

Book.where( author: author ).gt( price: 100 ).asc( :year )

Ordinary RPC cannot express such a method chain.

The essential problem is that state cannot be kept in the middle of the chain. RPC between Ruby and Python can only exchange objects that MessagePack can serialize, so the object returned by Book.where is serialized when it is passed from the Ruby process to the Python process. Even if you then want to call a method on it, you cannot.

In other words, the Ruby object must be kept in the Ruby process, together with a mechanism for referring to it later when needed.
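The loss of state can be reproduced in a few lines. The following is only a sketch: `Query` is a hypothetical stand-in for a chainable Ruby object, and `json` is used purely as a standard-library substitute for MessagePack. The point is that the receiver ends up with plain data and nothing left to chain on.

```python
import json


class Query:
    """Hypothetical stand-in for a chainable Ruby object (e.g. a Rails relation)."""

    def __init__(self, conditions=()):
        self.conditions = list(conditions)

    def where(self, **cond):
        return Query(self.conditions + [cond])


def naive_rpc_return(value):
    # Simulate a serializing transport: only plain data reaches the receiver.
    # json is a stdlib stand-in for MessagePack here, not what rb_call uses.
    return json.loads(json.dumps(value, default=vars))


result = naive_rpc_return(Query().where(author="a1"))

# The receiver holds a plain dict, so the next link of the chain fails.
try:
    result.where(price=100)
    chain_worked = True
except AttributeError:
    chain_worked = False

print(result, chain_worked)
```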

Solution

The solution is to keep any Ruby object that cannot be serialized as a return value inside the Ruby process, and to return only its ID and class to the Python side. On the Python side, a class called RubyObject holds the (ID, class) pair received from Ruby as members, and method calls on a RubyObject are delegated to the corresponding object in the Ruby process.

The process when returning a value on the Ruby side is roughly as follows.

@@variables[ obj.object_id ] = obj          # Keep the object so that it can be looked up later by ID
MessagePack.pack( [obj.class.to_s, obj.object_id] )  # Return the class name and object ID to the Python side

However, anything that MessagePack can serialize directly, such as String and Fixnum, is sent to Python as-is.

On the Python side, the received value is handled as follows.

class RubyObject():
    def __init__(self, rb_class, obj_id):  #RubyObject holds class name and ID
        self.rb_class = rb_class
        self.obj_id = obj_id

#Handling of values returned by RPC
rb_class, obj_id = msgpack.unpackb(obj.data, encoding='utf-8')
RubyObject( rb_class, obj_id )

Conceptually, the object on the Python side holds nothing but a pointer to the Ruby object.

After that, all that remains is to forward method calls made on a RubyObject in Python to the actual object on the Ruby side. To do this, define __getattr__ (the counterpart of Ruby's method_missing), which is called when a nonexistent attribute is accessed on a RubyObject.

class RubyObject():
    ...
    def __getattr__( self, attr ):
        def _method_missing(*args, **kwargs):
            return self.send( attr, *args, **kwargs )
        return _method_missing

    def send(self, method, *args, **kwargs):
        # Send the object ID, method name, and arguments to Ruby over RPC
        obj = self.client.call('send_method', self.obj_id, method, args, kwargs )
        return self.cast(obj)       #Cast the return value to a RubyObject

Code called on the Ruby side

  def send_method( objid, method_name, args = [], kwargs = {})
    obj = find_object(objid)                       #Get the saved object from objid
    ret = obj.send(method_name, *args, **kwargs)   #Execute method
  end

A method called on a RubyObject in Python is then executed as a method of the Ruby object. Ruby objects behave as if they were assigned to Python variables, and Ruby methods can be called naturally from Python.
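To see the whole mechanism in one place, here is a minimal in-process model written only in Python. It is not the rb_call code: `ObjectStore` plays the role of the Ruby process, `RemoteObject` plays the role of RubyObject, and `Query`, `export`, and `cast` are names I made up for the illustration. There is no socket or MessagePack involved, only the idea of handing out IDs and delegating through __getattr__.

```python
class ObjectStore:
    """Plays the role of the Ruby process: keeps real objects, hands out IDs."""

    def __init__(self):
        self._objects = {}

    def export(self, value):
        # Plain data crosses the boundary as-is; anything else becomes a handle.
        if isinstance(value, (type(None), bool, int, float, str)):
            return value
        self._objects[id(value)] = value
        return ("handle", type(value).__name__, id(value))

    def send_method(self, obj_id, method, args, kwargs):
        target = self._objects[obj_id]  # look up the saved object by ID
        return self.export(getattr(target, method)(*args, **kwargs))


class RemoteObject:
    """Plays the role of RubyObject: holds only (class name, ID)."""

    def __init__(self, store, cls_name, obj_id):
        self._store = store
        self.cls_name = cls_name
        self.obj_id = obj_id

    def __getattr__(self, name):
        # Called only for attributes that do not exist on the proxy itself.
        def _method_missing(*args, **kwargs):
            ret = self._store.send_method(self.obj_id, name, args, kwargs)
            return cast(self._store, ret)
        return _method_missing


def cast(store, value):
    # Turn a handle tuple back into a proxy; pass plain values through.
    if isinstance(value, tuple) and value[:1] == ("handle",):
        return RemoteObject(store, value[1], value[2])
    return value


class Query:
    """Hypothetical chainable object living on the 'remote' side."""

    def __init__(self, steps=()):
        self.steps = tuple(steps)

    def where(self, field, value):
        return Query(self.steps + (f"where {field}={value}",))

    def asc(self, field):
        return Query(self.steps + (f"asc {field}",))

    def describe(self):
        return " | ".join(self.steps)


store = ObjectStore()
root = cast(store, store.export(Query()))
summary = root.where("author", "a1").asc("year").describe()
print(summary)  # where author=a1 | asc year
```

Each intermediate result of the chain stays inside `ObjectStore`; the caller only ever holds proxies, and the final string comes back as plain data.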

Points to be careful about when implementing

Use MessagePack Extension Type

MessagePack lets you define user-defined types through a mechanism called the Extension type. https://github.com/msgpack/msgpack/blob/master/spec.md#types-extension-type

I defined RubyObject (that is, the class name as a String and the object ID as a Fixnum) as an Extension type. On the Ruby side, I monkey-patched Object to define the to_msgpack_ext method. Incidentally, the latest msgpack gem supports Extension types, but msgpack-rpc-ruby appears to be no longer developed and did not use the latest msgpack, so I forked it to depend on the latest gem. https://github.com/yohm/msgpack-rpc-ruby The code looks like this:

Object.class_eval do
  def self.from_msgpack_ext( data )
    rb_cls, obj_id = MessagePack.unpack( data )
    RbCall.find_object( obj_id )
  end

  def to_msgpack_ext
    RbCall.store_object( self )   #Save the object in the variable
    MessagePack.pack( [self.class.to_s, self.object_id] )
  end
end
MessagePack::DefaultFactory.register_type(40, Object)

On the Python side, I likewise wrote code that converts Extension type 40 into a RubyObject.

Counting the number of object references

As it stands, every object returned from Ruby to Python adds to the variables saved in the Ruby process, which grow monotonically and leak memory. Objects that are no longer referenced on the Python side must also be released on the Ruby side.

For this reason, I defined __del__ on the Python RubyObject class. __del__ is called when the number of references to a Python object drops to zero and the object is about to be collected. At that point, the corresponding variable on the Ruby side is deleted as well. http://docs.python.jp/3/reference/datamodel.html#object.__del__

    def __del__(self):
        self.session.call('del_object', self.obj_id)

The following code is called on the Ruby side.

  def del_object(objid)
    @@variables.delete(objid)
  end

However, this simple approach breaks when two Python variables refer to the same Ruby object. Therefore, the number of times an object has been returned from Ruby to Python is also counted, and the variable on the Ruby side is released only when that count reaches zero.

class RbCall
  def self.store_object( obj )
    key = obj.object_id
    if @@variables.has_key?( key )
      @@variables[key][1] += 1
    else
      @@variables[key] = [obj, 1]
    end
  end

  def self.find_object( obj_id )
    @@variables[obj_id][0]
  end

  def del_object( args, kwargs = {} )
    objid = args[0]
    @@variables[objid][1] -= 1
    if @@variables[objid][1] == 0
      @@variables.delete(objid)
    end
    nil
  end
end

Objects deleted from @@variables are then released by Ruby's GC as usual, so object lifetimes should be managed correctly.
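The counting scheme can be tried out in Python alone. The sketch below is a model, not the rb_call code: `HandleTable` mirrors the Ruby-side store, `Proxy` mirrors the Python-side RubyObject with its __del__, and both names are invented here. It assumes CPython, where __del__ runs as soon as the last reference goes away.

```python
class HandleTable:
    """Counts how many times each object has been handed to the client."""

    def __init__(self):
        self._entries = {}  # obj_id -> [object, count]

    def store(self, obj):
        entry = self._entries.setdefault(id(obj), [obj, 0])
        entry[1] += 1
        return id(obj)

    def find(self, obj_id):
        return self._entries[obj_id][0]

    def release(self, obj_id):
        entry = self._entries[obj_id]
        entry[1] -= 1
        if entry[1] == 0:
            # Last client reference is gone; drop the object from the table.
            del self._entries[obj_id]

    def __len__(self):
        return len(self._entries)


class Proxy:
    """Client-side reference; releases its handle when it is collected."""

    def __init__(self, table, obj_id):
        self._table = table
        self.obj_id = obj_id

    def __del__(self):
        self._table.release(self.obj_id)


table = HandleTable()
shared = object()
first = Proxy(table, table.store(shared))
second = Proxy(table, table.store(shared))  # the same object returned twice

del first
still_held = len(table)     # 1 on CPython: `second` still counts as a reference
del second
after_release = len(table)  # 0 on CPython: the entry has been dropped
print(still_held, after_release)
```

Without the counter, deleting `first` would already have removed the entry, and `second` would point at nothing.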

Exceptions

Ruby exception information is also available. In the published msgpack-rpc-ruby, when an exception occurs on the Ruby side, it is sent as the result of to_s, which loses most of the exception's information. Therefore, the Ruby exception object is also sent as a Python RubyObject instance. Here too, I modified msgpack-rpc-ruby so that it serializes the exception when MessagePack can, rather than always calling to_s.

Exceptions are handled as follows. When an exception occurs on the Ruby side, msgpackrpc.error.RPCError is raised on the Python side; this is the specified behavior of msgpack-rpc-python. An instance of RubyObject is placed in the `args` attribute of that exception. If a RubyObject is present, a `RubyException`, defined on the Python side, is raised instead, and a reference to the exception object created on the Ruby side is stored in its `rb_exception` attribute. The Ruby-side exception can then be accessed from Python.

The processing on the Python side is simplified and written as follows.

class RubyObject():
    def send(self, method, *args, **kwargs):
        try:
            obj = self.session.client.call('send_method', self.obj_id, method, args, kwargs )
            return self.cast(obj)
        except msgpackrpc.error.RPCError as ex:
            arg = RubyObject.cast( ex.args[0] )
            if isinstance( arg, RubyObject ):
                raise RubyException( arg.message(), arg ) from None
            else:
                raise

class RubyException( Exception ):
    def __init__(self,message,rb_exception):
        self.args = (message,)
        self.rb_exception = rb_exception

For example, when Ruby raises an ArgumentError, the Python side can handle it as follows.

try:
    obj.my_method("invalid", "number", "of", "arg")  # call my_method on a RubyObject with the wrong number of arguments
except RubyException as ex:    # a RubyException is raised
    ex.rb_exception            # ex.rb_exception holds a RubyObject that references the Ruby exception
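The same flow can be run without a Ruby process by replacing the remote parts with stand-ins. In this sketch, `RPCError` substitutes for msgpackrpc.error.RPCError, `RemoteExceptionHandle` substitutes for a RubyObject that points at a Ruby exception, and `call_remote` is a hypothetical transport; only `RubyException` and the shape of `send` follow the article. The error message text is made up for the example.

```python
class RPCError(Exception):
    """Stand-in for msgpackrpc.error.RPCError so the sketch runs on its own."""


class RemoteExceptionHandle:
    """Stand-in for a RubyObject that references a Ruby exception."""

    def __init__(self, rb_class, text):
        self.rb_class = rb_class
        self._text = text

    def message(self):
        return self._text


class RubyException(Exception):
    def __init__(self, message, rb_exception):
        super().__init__(message)
        self.rb_exception = rb_exception  # reference to the remote exception


def call_remote(fail_with=None):
    # Hypothetical transport: raises RPCError carrying a handle on failure.
    if fail_with is not None:
        raise RPCError(fail_with)
    return "ok"


def send(fail_with=None):
    try:
        return call_remote(fail_with)
    except RPCError as ex:
        payload = ex.args[0]
        if isinstance(payload, RemoteExceptionHandle):
            # Re-raise as RubyException, keeping the handle for inspection.
            raise RubyException(payload.message(), payload) from None
        raise


try:
    send(RemoteExceptionHandle("ArgumentError", "wrong number of arguments"))
except RubyException as ex:
    caught = (ex.rb_exception.rb_class, str(ex))

print(caught)
```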

Generator support

For example, consider performing the following Ruby processing.

articles = Article.all

`Article.all` is not an Array but an Enumerable, and is not actually expanded into an array in memory. The database is first accessed when each is called, and each record is fetched at that point.

The Python side therefore also needs a generator so that a loop can be written as `for a in articles`.

To do this, define the __iter__ method in the RubyObject class on the Python side. __iter__ returns an iterator and is called implicitly by the for statement. This corresponds directly to Ruby's `each`, so __iter__ calls `each`. http://anandology.com/python-practice-book/iterators.html https://docs.ruby-lang.org/ja/latest/class/Enumerator.html

In Python, each step of a loop calls the __next__ method on the value returned by __iter__. Ruby has exactly the same correspondence: `Enumerator#next` is the matching method. When the iteration reaches the end, Ruby raises a StopIteration exception. Python follows the same convention and raises an exception called StopIteration at the end of iteration. (The two exceptions happen to share the same name.)

class RubyObject():
    ...
    def __iter__(self):
        return self.send( "each" )
    def __next__(self):
        try:
            n = self.send( "next" )
            return n
        except RubyException as ex:
            if ex.rb_exception.rb_class == 'StopIteration':  # a StopIteration exception was raised in Ruby
                raise StopIteration()  # raise StopIteration in Python
            else:
                raise

Ruby's Enumerable can now be looped over from Python.
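The translation of the end-of-iteration signal can be checked with a small Python-only model. `FakeEnumerator` stands in for a Ruby Enumerator in the other process and `RemoteError` stands in for RubyException; both are invented for this sketch, and `EnumeratorProxy` is a cut-down version of the idea rather than the real RubyObject.

```python
class RemoteError(Exception):
    """Stand-in for RubyException; rb_class names the remote exception class."""

    def __init__(self, rb_class):
        super().__init__(rb_class)
        self.rb_class = rb_class


class FakeEnumerator:
    """Stand-in for a Ruby Enumerator living in the other process."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def next(self):
        if self._pos >= len(self._items):
            raise RemoteError("StopIteration")  # what Ruby does at the end
        value = self._items[self._pos]
        self._pos += 1
        return value


class EnumeratorProxy:
    def __init__(self, remote):
        self._remote = remote

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return self._remote.next()
        except RemoteError as ex:
            if ex.rb_class == "StopIteration":
                # Translate the remote end-of-iteration into Python's own.
                raise StopIteration from None
            raise


values = [v for v in EnumeratorProxy(FakeEnumerator(["5", "10"]))]
print(values)  # ['5', '10']
```

Because the proxy follows the iterator protocol, list comprehensions work as well as for loops.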

Other

Methods are defined on RubyObject so that Python's built-in functions work properly. The corresponding Ruby methods are:

- __eq__ is ==
- __dir__ is public_methods
- __str__ is to_s
- __len__ is size
- __getitem__ is []
- __call__ is call
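A few of these mappings can be sketched as follows. This is an illustration only: `make_fake_ruby_array` is a hypothetical stand-in that maps Ruby method names to Python callables, so that `send` has something to dispatch to without a Ruby process.

```python
def make_fake_ruby_array(items):
    # Hypothetical stand-in: Ruby method names mapped to Python callables.
    return {
        "size": lambda: len(items),
        "to_s": lambda: "[" + ", ".join(str(i) for i in items) + "]",
        "[]": lambda index: items[index],
        "==": lambda other: items == other,
    }


class Proxy:
    def __init__(self, methods):
        self._methods = methods

    def send(self, name, *args):
        # In the real library this would be an RPC call to the Ruby process.
        return self._methods[name](*args)

    # Python special methods forwarded to the Ruby method of the same meaning.
    def __len__(self):
        return self.send("size")

    def __str__(self):
        return self.send("to_s")

    def __getitem__(self, key):
        return self.send("[]", key)

    def __eq__(self, other):
        return self.send("==", other)


arr = Proxy(make_fake_ruby_array([3, 1, 2]))
print(len(arr), str(arr), arr[0], arr == [3, 1, 2])
```

With these in place, `len()`, `str()`, indexing, and `==` on the proxy all end up as ordinary method calls on the other side.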

Limitations

- Python objects (those that cannot be serialized with MessagePack) cannot be passed as arguments to Ruby methods.
  - A Python function cannot be passed to a Ruby method that takes a block (essentially the same problem as above).
  - In principle, Python objects could be referenced from the Ruby process in the same way, but the implementation seems complicated, so I have left it as it is.
- Some method names cannot be used because of Python syntax.
  - For example, method names such as .class and is_a? are valid in Ruby but not in Python.
  - This can be worked around by calling .send('class') or .send('is_a?').
- Some Ruby libraries may cause problems.
  - For example, Mongoid has a class that undefs almost all public_methods for metaprogramming. http://qiita.com/yohm13/items/40376eafc045492d5f4f
  - In that case, the to_msgpack_ext defined by rb_call is also undefined, and things do not work properly.
  - After requiring Mongoid, you can avoid the problem by redefining to_msgpack_ext in the affected class.
  - I have not confirmed it, but ActiveRecord may have the same problem.
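The method-name restriction and the .send workaround can be seen with a toy proxy. The `Proxy` class below is a stand-in that only records which method names were requested; it is not the real RubyObject. The last line uses Python's built-in getattr with a string, which also reaches __getattr__; that is a general Python behavior, not something rb_call documents.

```python
class Proxy:
    """Toy proxy that records the method names it was asked to call."""

    def __init__(self):
        self.calls = []

    def send(self, method, *args):
        self.calls.append(method)
        return method

    def __getattr__(self, name):
        return lambda *args: self.send(name, *args)


obj = Proxy()

# `obj.class` cannot even be parsed, because `class` is a Python keyword.
try:
    compile("obj.class", "<sketch>", "eval")
    parses = True
except SyntaxError:
    parses = False

# Passing the name as a string avoids the syntax problem.
obj.send("class")
obj.send("is_a?")
getattr(obj, "is_a?")()  # a string attribute name also reaches __getattr__

print(parses, obj.calls)
```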

While implementing it, I found that Python and Ruby correspond very closely, and it can be implemented quite neatly just by defining the matching methods. It is only about 200 lines of code, so if you are interested, please read the source.

A mechanism to call a Ruby method from Python that can be done in 200 lines