I wrote a library that lets you call Ruby methods from Python. This article explains how it works, including method chains and iterators, which can be used in a fairly natural way.
https://github.com/yohm/rb_call
We are developing a Rails app that manages jobs for scientific and engineering computations, and its behavior can be controlled through a Ruby API. However, since many people in scientific computing use Python, we received many requests for a Python API as well.
For example, suppose you have the following Ruby code.
minimal_sample.rb
class MyClass
  def m1
    "m1"
  end

  def m2(a,b)
    "m2 #{a} #{b}"
  end

  def m3(a, b:)
    "m3 #{a} #{b}"
  end

  def m4(a)
    Proc.new { "m4 #{a}" }
  end

  def m5
    Enumerator.new{|y|
      (1..10).each{|i|
        y << "#{i}" if i % 5 == 0
      }
    }
  end
end

if $0 == __FILE__
  obj = MyClass.new
  puts obj.m1, obj.m2(1,2), obj.m3(3,b:4) #=> "m1", "m2 1 2", "m3 3 4"
  proc = obj.m4('arg of proc')
  puts proc.call #=> "m4 arg of proc"
  e = obj.m5   # m5 is an instance method, so call it on obj
  e.each do |i|
    puts i #=> "5", "10"
  end
end
The same thing can be written in Python as follows.
minimal_sample.py
from rb_call import RubySession

rb = RubySession()                # Start a Ruby process
rb.require('./minimal_sample')    # Load the Ruby file 'minimal_sample.rb'
MyClass = rb.const('MyClass')     # Get the class defined in 'minimal_sample.rb'
obj = MyClass()                   # Create an instance of MyClass
print( obj.m1(), obj.m2(1,2), obj.m3(3,b=4) )
    #=> "m1", "m2 1 2", "m3 3 4"
proc = obj.m4('arg of proc')
print( proc() )                   #=> "m4 arg of proc"
e = obj.m5()                      # Not only a simple Array but also an Enumerator is supported
for i in e:                       # You can iterate over an Enumerable with `for`
    print(i)                      #=> "5", "10"
You can call Ruby libraries from Python almost as they are.
You can chain methods, and you can iterate with for. Although the sample does not show it, list comprehensions also work as expected.
You can also access Ruby exceptions.
For example, if you combine it with Rails code, you can write it like this.
rails_sample.py
author = Author.find('...id...')
Book.where( {'author':author} ).gt( {'price':100} ).asc( 'year' )
With good use of metaprogramming and external libraries, this can be implemented compactly: the code fits in one file per language, about 130 lines of Python and about 80 lines of Ruby. There are restrictions, described later, but it works fine for most applications.
I use RPC to call Ruby methods from Python. This time, I used a library (or rather a specification) called MessagePack-RPC. For the basic usage of MessagePack-RPC, see http://qiita.com/yohm13/items/70b626ca3ac6fbcdf939. Ruby is started as a subprocess of the Python process, and the two processes communicate over a socket. Roughly speaking, Python sends Ruby the name and arguments of the method to call, and Ruby sends the return value back to Python. MessagePack-RPC defines how the exchanged data is serialized.
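The request and response exchange described above can be sketched in plain Python. This is a minimal illustration and not rb_call's code: the message layout follows the MessagePack-RPC specification ([type, msgid, method, params] for a request, [type, msgid, error, result] for a response), but nothing is serialized or sent over a socket here, and make_request, handle_request, and handlers are hypothetical names.

```python
import itertools

REQUEST, RESPONSE = 0, 1  # message type tags from the MessagePack-RPC spec

_msgid = itertools.count()


def make_request(method, *params):
    """Build a request message: [type, msgid, method, params]."""
    return [REQUEST, next(_msgid), method, list(params)]


def handle_request(message, handlers):
    """Dispatch a request and build a response: [type, msgid, error, result]."""
    msg_type, msgid, method, params = message
    if msg_type != REQUEST:
        raise ValueError("not a request message")
    try:
        result = handlers[method](*params)
        return [RESPONSE, msgid, None, result]
    except Exception as exc:
        # On failure, the error slot is filled and the result slot is empty.
        return [RESPONSE, msgid, str(exc), None]


# The "server" side exposes one method, modeled on m2 from the sample above.
handlers = {"m2": lambda a, b: f"m2 {a} {b}"}

request = make_request("m2", 1, 2)
response = handle_request(request, handlers)
print(request)
print(response)
```

In a real session, both lists would be packed with MessagePack and written to the socket between the two processes.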
This approach is fine if all you need is to pass arguments and get a value back. However, in Ruby you often want to chain methods, and many libraries assume that style. For example, in Rails you frequently write code like the following.
Book.where( author: author ).gt( price: 100 ).asc( :year )
Such a method chain cannot be realized by ordinary RPC.
The problem is essentially due to the inability to save state in the middle of the method chain.
RPC between Ruby and Python can only exchange objects that can be serialized with MessagePack, so the object returned by Book.where is serialized when it passes from the Ruby process to the Python process. Even if you then want to call another method on it, you cannot.
In other words, it is necessary to hold some Ruby object in the Ruby process, and a mechanism to refer to it later as needed is required.
Therefore, when a return value from Ruby cannot be serialized, the Ruby object is kept in the Ruby process and only its ID and class are returned to the Python side.
On the Python side, a class called RubyObject holds the (ID, class) pair received from Ruby as members, and method calls on a RubyObject are delegated to the corresponding object in the Ruby process.
The process when returning a value on the Ruby side is roughly as follows.
@@variables[ obj.object_id ] = obj                   # Keep the object so that it can be referenced later by ID
MessagePack.pack( [obj.class.to_s, obj.object_id] )  # Return the class and object ID to the Python side
However, anything that can be serialized with MessagePack, such as String and Fixnum, is sent to Python as it is.
On the Python side, the received value is handled as follows.
import msgpack

class RubyObject():
    def __init__(self, rb_class, obj_id):    # RubyObject holds the class name and ID
        self.rb_class = rb_class
        self.obj_id = obj_id

# Handling of values returned by RPC
# (raw=False decodes strings as UTF-8; older msgpack versions used encoding='utf-8')
rb_class, obj_id = msgpack.unpackb(obj.data, raw=False)
RubyObject( rb_class, obj_id )
Conceptually, the object on the Python side holds nothing but a pointer to a Ruby object.
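The handle idea can be sketched in plain Python. This is only an illustration under stated assumptions: FakeRubySide, to_wire, and from_wire are hypothetical names, a dict stands in for the MessagePack payload, and Python's id() plays the role of Ruby's object_id.

```python
SERIALIZABLE = (str, int, float, bool, type(None))


class RubyObject:
    """Python-side handle: holds only the class name and the object ID."""

    def __init__(self, rb_class, obj_id):
        self.rb_class = rb_class
        self.obj_id = obj_id


class FakeRubySide:
    """Hypothetical stand-in for the Ruby process."""

    def __init__(self):
        self.variables = {}  # obj_id -> object, kept alive for later calls

    def to_wire(self, obj):
        if isinstance(obj, SERIALIZABLE):
            return obj  # plain values are sent as they are
        self.variables[id(obj)] = obj  # keep the object on this side
        return {"handle": [type(obj).__name__, id(obj)]}


def from_wire(value):
    """Turn a received value into a plain value or a RubyObject handle."""
    if isinstance(value, dict) and "handle" in value:
        rb_class, obj_id = value["handle"]
        return RubyObject(rb_class, obj_id)
    return value


ruby = FakeRubySide()
plain = from_wire(ruby.to_wire("m1"))
# A set stands in for an object that cannot be serialized.
handle = from_wire(ruby.to_wire({1, 2, 3}))
print(plain, handle.rb_class, handle.obj_id in ruby.variables)
```

The string crosses the boundary by value, while the set stays where it was created and only its class name and ID travel.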
After that, all that remains is to forward method calls made on a RubyObject in Python to the actual object on the Ruby side.
Define a __getattr__ method (the counterpart of method_missing in Ruby), which is called when an attribute that does not exist is accessed on a RubyObject.
class RubyObject():
    ...
    def __getattr__( self, attr ):
        def _method_missing(*args, **kwargs):
            return self.send( attr, *args, **kwargs )
        return _method_missing

    def send(self, method, *args, **kwargs):
        # Send the object ID, method name, and arguments to Ruby via RPC
        obj = self.client.call('send_method', self.obj_id, method, args, kwargs )
        return self.cast(obj)    # Cast the return value to a RubyObject
The following code is called on the Ruby side.
def send_method( objid, method_name, args = [], kwargs = {})
  obj = find_object(objid)                       # Get the saved object from objid
  ret = obj.send(method_name, *args, **kwargs)   # Execute the method
end
Then, the method called for RubyObject in Python will be called as the method of Ruby. With this, Ruby objects also behave as if they were assigned to Python variables, and Ruby methods can be called naturally from Python.
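To see why this delegation makes method chains work, here is a self-contained sketch that runs in a single process. FakeServer, Proxy, and Query are hypothetical names: FakeServer plays the role of the Ruby process, Query stands in for something like a Rails query object, and a direct function call replaces the RPC round trip.

```python
SERIALIZABLE = (str, int, float, bool, type(None))


class Query:
    """Stands in for a Ruby object that must stay on the server side."""

    def __init__(self, steps=()):
        self.steps = tuple(steps)

    def where(self, cond):
        return Query(self.steps + (("where", cond),))

    def asc(self, field):
        return Query(self.steps + (("asc", field),))

    def count(self):
        return len(self.steps)


class FakeServer:
    """Hypothetical stand-in for the Ruby process."""

    def __init__(self):
        self.objects = {}

    def register(self, obj):
        self.objects[id(obj)] = obj
        return id(obj)

    def send_method(self, obj_id, method, args, kwargs):
        ret = getattr(self.objects[obj_id], method)(*args, **kwargs)
        if isinstance(ret, SERIALIZABLE):
            return ret  # plain values go back as they are
        return ("handle", self.register(ret))  # everything else stays here


class Proxy:
    """Client-side object that forwards unknown attributes as method calls."""

    def __init__(self, server, obj_id):
        self._server = server
        self._obj_id = obj_id

    def __getattr__(self, attr):
        # Only reached when normal attribute lookup fails.
        def _method_missing(*args, **kwargs):
            ret = self._server.send_method(self._obj_id, attr, args, kwargs)
            if isinstance(ret, tuple) and ret[0] == "handle":
                return Proxy(self._server, ret[1])
            return ret
        return _method_missing


server = FakeServer()
book = Proxy(server, server.register(Query()))
n = book.where({"author": "someone"}).asc("year").count()
print(n)  # 2
```

Each intermediate result of the chain is a new Proxy pointing at an object that never left the server, which is the state that plain RPC could not preserve.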
MessagePack has a specification that allows you to define a user-defined type called Extension type. https://github.com/msgpack/msgpack/blob/master/spec.md#types-extension-type
This time, I defined RubyObject (that is, a String for the class name and a Fixnum for the object ID) as an Extension type.
On the Ruby side, I monkey-patched Object and defined the to_msgpack_ext method.
Incidentally, the latest msgpack gem supports Extension types, but development of msgpack-rpc-ruby seems to have stopped and it did not use the latest msgpack, so I forked it to depend on the latest gem.
https://github.com/yohm/msgpack-rpc-ruby
The code looks like this:
Object.class_eval do
  def self.from_msgpack_ext( data )
    rb_cls, obj_id = MessagePack.unpack( data )
    RbCall.find_object( obj_id )
  end

  def to_msgpack_ext
    RbCall.store_object( self )   # Save the object in the variable
    MessagePack.pack( [self.class.to_s, self.object_id] )
  end
end

MessagePack::DefaultFactory.register_type(40, Object)
On the Python side as well, I wrote code that converts Extension type 40 to the RubyObject type.
As it stands, every time an object is returned from the Ruby side to the Python side, the number of variables saved in the Ruby process grows monotonically, which is a memory leak. Objects that are no longer referenced on the Python side need to be released on the Ruby side as well.
That is why I overrode __del__ in Python's RubyObject.
__del__ is called when the number of references to an object in Python drops to 0 and the object can be collected by the GC.
At this point, the object on the Ruby side is released as well.
http://docs.python.jp/3/reference/datamodel.html#object.del
def __del__(self):
    self.session.call('del_object', self.obj_id)
The following code is called on the Ruby side.
def del_object( objid )
  @@variables.delete(objid)
end
However, if you simply use this method, it will not work properly if two Python variables refer to one Ruby Object. Therefore, the number of references returned from the Ruby side to the Python side is also counted, and the variables on the Ruby side are also released when it becomes zero.
class RbCall
  def self.store_object( obj )
    key = obj.object_id
    if @@variables.has_key?( key )
      @@variables[key][1] += 1
    else
      @@variables[key] = [obj, 1]
    end
  end

  def self.find_object( obj_id )
    @@variables[obj_id][0]
  end

  def del_object( args, kwargs = {} )
    objid = args[0]
    @@variables[objid][1] -= 1
    if @@variables[objid][1] == 0
      @@variables.delete(objid)
    end
    nil
  end
end
Objects deleted from @@variables are properly released by Ruby's GC.
With this, the lifetime of the objects should be managed without problems.
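The interplay between __del__ and the reference count can be tried out in one process. This is a sketch under stated assumptions, not rb_call's code: HandleTable mirrors the store and delete logic shown above in Python, Handle is a hypothetical stand-in for RubyObject, and the immediate call of __del__ on `del` relies on CPython's reference counting.

```python
class HandleTable:
    """Keeps objects alive and counts how many handles point at each one."""

    def __init__(self):
        self._entries = {}  # obj_id -> [obj, count]

    def store(self, obj):
        key = id(obj)
        if key in self._entries:
            self._entries[key][1] += 1
        else:
            self._entries[key] = [obj, 1]
        return key

    def release(self, obj_id):
        entry = self._entries[obj_id]
        entry[1] -= 1
        if entry[1] == 0:
            del self._entries[obj_id]  # the last handle is gone

    def __len__(self):
        return len(self._entries)


class Handle:
    """Hypothetical stand-in for RubyObject: releases its entry when collected."""

    def __init__(self, table, obj_id):
        self._table = table
        self._obj_id = obj_id

    def __del__(self):
        self._table.release(self._obj_id)


table = HandleTable()
target = ["a", "Ruby", "object"]

# The same object is returned to the client twice, so two handles exist.
first = Handle(table, table.store(target))
second = Handle(table, table.store(target))

del first
after_first = len(table)   # the entry survives: one handle is still alive
del second
after_second = len(table)  # the entry is removed with the last handle
print(after_first, after_second)
```

Without the counter, deleting the first handle would already remove the entry, and the second handle would point at nothing.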
Ruby exception information can also be obtained.
In the published msgpack-rpc-ruby, when an exception occurs on the Ruby side, the exception is sent after converting it with to_s, but this loses most of the exception information.
Therefore, the Ruby exception object is also sent as an instance of Python's RubyObject.
Here again, I modified msgpack-rpc-ruby so that it serializes the exception if MessagePack can serialize it, instead of always calling to_s.
The processing when an exception occurs is as follows.
If an exception occurs on the Ruby side, msgpackrpc.error.RPCError is raised on the Python side. This is how msgpack-rpc-python behaves.
An instance of RubyObject is placed in the args attribute of that exception. If a RubyObject is included, a RubyException, defined on the Python side, is raised. At that time, the reference to the exception object created on the Ruby side is stored in the rb_exception attribute.
Now you can access the exceptions on the Ruby side.
The processing on the Python side is simplified and written as follows.
class RubyObject():
    def send(self, method, *args, **kwargs):
        try:
            obj = self.session.client.call('send_method', self.obj_id, method, args, kwargs )
            return self.cast(obj)
        except msgpackrpc.error.RPCError as ex:
            arg = RubyObject.cast( ex.args[0] )
            if isinstance( arg, RubyObject ):
                raise RubyException( arg.message(), arg ) from None
            else:
                raise

class RubyException( Exception ):
    def __init__(self,message,rb_exception):
        self.args = (message,)
        self.rb_exception = rb_exception
For example, when a Ruby ArgumentError is generated, the Python process is as follows.
try:
    obj.my_method("invalid", "number", "of", "arg")  # Wrong number of arguments for my_method of a RubyObject
except RubyException as ex:   # A RubyException is raised
    ex.rb_exception           # ex.rb_exception holds a RubyObject that references the Ruby exception
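The translation step can be exercised without a Ruby process. In the sketch below, RemoteError stands in for msgpackrpc.error.RPCError and FakeRubyError for a RubyObject that points at a Ruby exception; both names, as well as call_remote and my_method, are hypothetical and exist only for this illustration.

```python
class RemoteError(Exception):
    """Stand-in for msgpackrpc.error.RPCError."""


class RubyException(Exception):
    """Python-side exception that keeps a reference to the remote exception."""

    def __init__(self, message, rb_exception):
        super().__init__(message)
        self.rb_exception = rb_exception


class FakeRubyError:
    """Stand-in for a RubyObject handle that refers to a Ruby exception."""

    def __init__(self, rb_class, message):
        self.rb_class = rb_class
        self._message = message

    def message(self):
        return self._message


def call_remote(fn, *args):
    """Call fn and translate a transport error that carries a remote exception."""
    try:
        return fn(*args)
    except RemoteError as ex:
        payload = ex.args[0]
        if isinstance(payload, FakeRubyError):
            raise RubyException(payload.message(), payload) from None
        raise  # not a remote exception object: re-raise unchanged


def my_method(arg):
    # Simulates a remote call that fails on the "Ruby" side.
    raise RemoteError(FakeRubyError("ArgumentError", "wrong number of arguments"))


try:
    call_remote(my_method, "invalid")
except RubyException as ex:
    caught = (ex.rb_exception.rb_class, str(ex))

print(caught)
```

Because the handle is kept on the exception, the caller can ask the remote exception for more than its message, such as its class.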
For example, consider performing the following Ruby processing.
articles = Article.all
Article.all is not an Array but an Enumerable, and it is not actually expanded into an array in memory. The database is accessed only when each is called, and the information of each record is fetched at that point.
On the Python side as well, an iterator needs to be defined so that a loop of the form for a in articles can run.
To do this, define the __iter__ method in the RubyObject class on the Python side. __iter__ is a method that returns an iterator, and it is called implicitly by the for statement. It corresponds directly to Ruby's each, so __iter__ calls each.
http://anandology.com/python-practice-book/iterators.html
https://docs.ruby-lang.org/ja/latest/class/Enumerator.html
In Python, each step of the loop calls the __next__ method on the return value of __iter__. Ruby has exactly the same correspondence: Enumerator#next is the matching method. When the iteration reaches the end, a StopIteration exception is raised on the Ruby side. Python follows the same convention and raises an exception called StopIteration at the end of the iteration. (The two exceptions happen to share the same name.)
class RubyObject():
    ...
    def __iter__(self):
        return self.send( "each" )

    def __next__(self):
        try:
            n = self.send( "next" )
            return n
        except RubyException as ex:
            if ex.rb_exception.rb_class == 'StopIteration':  # When a StopIteration exception is raised in Ruby
                raise StopIteration()                        # Raise a StopIteration exception in Python
            else:
                raise
Now you can loop over a Ruby Enumerable from Python.
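The same bridging can be tried in isolation. In this sketch, FakeEnumerator imitates a Ruby Enumerator with a next method, RemoteStop stands in for a RubyException that wraps Ruby's StopIteration, and EnumeratorProxy is a hypothetical stand-in for RubyObject; none of these names come from rb_call.

```python
class RemoteStop(Exception):
    """Stand-in for a RubyException wrapping Ruby's StopIteration."""


class FakeEnumerator:
    """Imitates a Ruby Enumerator: `next` returns items, then raises at the end."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def next(self):
        if self._pos >= len(self._items):
            raise RemoteStop()
        item = self._items[self._pos]
        self._pos += 1
        return item


class EnumeratorProxy:
    """Adapts the remote `next` protocol to Python's iterator protocol."""

    def __init__(self, remote):
        self._remote = remote

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return self._remote.next()
        except RemoteStop:
            # Translate the remote end-of-iteration signal into Python's own.
            raise StopIteration from None


e = EnumeratorProxy(FakeEnumerator(["5", "10"]))
collected = [int(x) for x in e]
print(collected)  # [5, 10]
```

Since the proxy follows the iterator protocol, the for statement, list comprehensions, and functions such as list() all work without further code.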
Define methods in RubyObject so that Python's built-in functions work properly. The corresponding Ruby methods are:
- __eq__ is ==
- __dir__ is public_methods
- __str__ is to_s
- __len__ is size
- __getitem__ is []
- __call__ is call
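As a rough illustration of this mapping, the sketch below forwards a few special methods to Ruby-style method names. FakeRubyArray and its dispatch method are hypothetical: they replace the RPC round trip with a lookup table, and Proxy stands in for RubyObject.

```python
class FakeRubyArray:
    """Hypothetical stand-in for a Ruby Array living in the Ruby process."""

    def __init__(self, items):
        self._items = list(items)

    def dispatch(self, method, *args):
        # Ruby method names such as "[]" and "==" are not valid Python
        # identifiers, so they are looked up by string.
        table = {
            "size": lambda: len(self._items),
            "[]": lambda index: self._items[index],
            "to_s": lambda: str(self._items),
            "==": lambda other: self._items == other,
        }
        return table[method](*args)


class Proxy:
    """Forwards Python special methods to the corresponding Ruby method names."""

    def __init__(self, target):
        self._target = target

    def send(self, method, *args):
        return self._target.dispatch(method, *args)

    def __len__(self):
        return self.send("size")

    def __getitem__(self, key):
        return self.send("[]", key)

    def __str__(self):
        return self.send("to_s")

    def __eq__(self, other):
        return self.send("==", other)


arr = Proxy(FakeRubyArray([10, 20, 30]))
summary = (len(arr), arr[1], str(arr), arr == [10, 20, 30])
print(summary)  # (3, 20, '[10, 20, 30]', True)
```

Each built-in (len, indexing, str, ==) reaches the remote object through the same send path as ordinary method calls.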
- Python objects (those that cannot be serialized with MessagePack) cannot be passed as arguments to Ruby methods.
  - A Python function cannot be passed to a Ruby method that takes a block (essentially the same problem as above).
  - In principle, it should be possible to refer to Python objects from the Ruby process in the same way, but it looks complicated to implement, so I have left it as it is.
- Some method names cannot be used because of Python syntax.
  - For example, method names such as .class and is_a? can be used in Ruby, but not in Python.
  - This problem can be avoided by calling .send('class') or .send('is_a?').
- Some Ruby libraries may cause problems.
  - For example, Mongoid has a class that undefines almost all public_methods for metaprogramming.
    - http://qiita.com/yohm13/items/40376eafc045492d5f4f
  - In such a case, the to_msgpack_ext defined in rb_call is also undefined and does not work properly.
  - After requiring Mongoid, you can avoid this by redefining to_msgpack_ext in the affected class.
  - I have not confirmed it, but ActiveRecord may cause the same problem.
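The method-name restriction can be checked mechanically. The sketch below uses only the standard library to test whether a Ruby method name can appear after a dot in Python, and shows the send-style workaround on a hypothetical Proxy class (a stand-in for RubyObject, with a dict in place of the Ruby process).

```python
import keyword


def usable_as_python_attribute(name):
    """Return True if `obj.<name>` is syntactically valid Python."""
    return name.isidentifier() and not keyword.iskeyword(name)


class Proxy:
    """Hypothetical stand-in for RubyObject; `send` accepts any method name."""

    def __init__(self, methods):
        self._methods = methods

    def send(self, method, *args):
        return self._methods[method](*args)


obj = Proxy({
    "class": lambda: "MyClass",
    "is_a?": lambda cls: cls == "MyClass",
})

checks = {name: usable_as_python_attribute(name) for name in ("m1", "class", "is_a?")}
print(checks)

# Names that fail the check are still reachable as strings.
print(obj.send("class"), obj.send("is_a?", "MyClass"))
```

`class` fails because it is a Python keyword, and `is_a?` fails because `?` cannot appear in an identifier, so both have to go through send.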
When I implemented it, I found that Python and Ruby have a very similar correspondence, and it can be implemented very neatly just by defining the corresponding methods well. It's actually about 200 lines of code, so if you are interested, please read the source.