[Translation] reticulate vignette: R to Python interface

This document is a translation of the vignette "R interface to Python" (cran.r-project.org/package=reticulate/vignettes/introduction.html) of the R package reticulate (version 0.7) by RStudio et al.

License: Apache License 2.0

Overview

The **reticulate** package provides an R interface to Python modules, classes, and functions. For example, the following code imports the Python `os` module and calls some of its functions.

library(reticulate)
os <- import("os")
os$chdir("tests")
os$getcwd()

Functions and other data in Python modules and classes can be accessed with the $ operator (similar to working with R lists, environments, and reference classes).

When you call into Python, R data types are automatically converted to their Python equivalents. When a value is returned from Python to R, it is converted back to an R type. Types are converted as follows.

R	Python	Examples
Single-element vector	Scalar	`1`, `1L`, `TRUE`, `"foo"`
Multi-element vector	List	`c(1.0, 2.0, 3.0)`, `c(1L, 2L, 3L)`
List of multiple types	Tuple	`list(1L, TRUE, "foo")`
Named list	Dict	`list(a = 1L, b = 2.0)`, `dict(x = x_data)`
Matrix/Array	NumPy ndarray	`matrix(c(1,2,3,4), nrow = 2, ncol = 2)`
Function	Python function	`function(x) x + 1`
NULL, TRUE, FALSE	None, True, False	`NULL`, `TRUE`, `FALSE`
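
To make the table concrete, the following Python sketch shows what a Python function would receive for several rows, assuming the conversions listed above. `describe` is a hypothetical helper, not part of reticulate, and the NumPy row is left out so that the sketch needs only the standard library.

```python
# Hypothetical helper: report the Python type name of a received argument.
def describe(value):
    return type(value).__name__

# Python-side values assumed to correspond to rows of the table above.
examples = {
    '1L': 1,                                      # single-element vector -> scalar
    '"foo"': "foo",                               # single-element vector -> scalar
    'c(1.0, 2.0, 3.0)': [1.0, 2.0, 3.0],          # multi-element vector -> list
    'list(1L, TRUE, "foo")': (1, True, "foo"),    # list of multiple types -> tuple
    'list(a = 1L, b = 2.0)': {"a": 1, "b": 2.0},  # named list -> dict
    'NULL': None,                                 # NULL -> None
}

type_names = {r_expr: describe(py_value) for r_expr, py_value in examples.items()}
for r_expr, name in type_names.items():
    print(f"{r_expr:24} -> {name}")
```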

If Python returns an object of a custom class, R receives a reference to that object. You can call methods and access properties on this object as if it were an instance of an R reference class.

The **reticulate** package works with all Python versions 2.7 and above. NumPy integration is optional, but it requires NumPy 1.6 or higher.

Installation

**reticulate** can be installed from CRAN as follows.

install.packages("reticulate")

Identifying the location of Python

If the version of Python you want to use is in your system's PATH, it will be automatically found (by Sys.which) and used.

Alternatively, you can use one of the following functions to specify another version of Python.

Function	Description
use_python	Specify the path to a specific Python binary.
use_virtualenv	Specify the directory containing a Python virtualenv.
use_condaenv	Specify a conda environment.

Example:

library(reticulate)
use_python("/usr/local/bin/python")
use_virtualenv("~/myenv")
use_condaenv("myenv")

NumPy 1.6 or higher is required to use NumPy features with **reticulate**, so a version of Python that meets this requirement is preferred.

Also note that, by default, the `use` family of functions only gives a hint about where Python can be found (that is, no error occurs if the specified version of Python does not exist). To ensure that the specified version of Python actually exists, add the `required` argument.

use_virtualenv("~/myenv", required = TRUE)

Versions of Python are discovered and used in the following order.

The location specified by a call to `use_python`, `use_virtualenv`, or `use_condaenv`
The location specified by the RETICULATE_PYTHON environment variable
The Python location found on the system PATH (by the Sys.which function)
Other customary locations for Python, such as /usr/local/bin/python and /opt/local/bin/python

In typical use, Python is located and bound the first time you call `import` in an R session. As a result, a version of Python that contains the module specified in the call to `import` is preferred (that is, versions of Python that do not contain the specified module are skipped).

You can use the py_config function to query information about your version of Python and other versions of Python found on your system.

py_config()

Module import

You can import any Python module using the `import` function. For example:

difflib <- import("difflib")
difflib$ndiff(foo, bar)

filecmp <- import("filecmp")
filecmp$cmp(dir1, dir2)

The `import_main` and `import_builtins` functions give you access to the main module, where code is executed by default, and to the built-in Python functions. For example:

main <- import_main()

py <- import_builtins()
py$print('foo')

In general, the main module is useful when you want to execute Python code from a file or string and access the results (see the section below for more details).

Object conversion

When a Python object is returned to R, it is converted to the equivalent R type by default. However, if you would rather make the Python-to-R conversion explicit and work with native Python objects by default, you can pass convert = FALSE to the `import` function.

# Import numpy and disable automatic Python-to-R conversion
np <- import("numpy", convert = FALSE)

# Manipulate arrays with NumPy
a <- np$array(c(1:4))
sum <- a$cumsum()

# Finally, convert explicitly to R
py_to_r(sum)

As shown above, if you need access to the R object at the end of the calculation, you can explicitly call the py_to_r function.

Code execution

You can use the py_run_file and py_run_string functions to execute Python code inside the main module. Both of these functions return a reference to Python's main module, so you can access the execution results. For example.

py_run_file("script.py")

main <- py_run_string("x = 10")
main$x

Lists, tuples, dictionaries

Automatic conversion from R type to Python type works well in most cases, but in some cases more explicit manipulation is required on the R side to give the type that Python expects.

For example, if a Python API requires a list and you pass a single-element R vector, it is converted to a Python scalar. To work around this, simply use R's list function explicitly.

foo$bar(indexes = list(42L))
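
The Python sketch below illustrates why the distinction matters. `bar` is a hypothetical stand-in for an API that iterates over its `indexes` argument; it fails when it receives a scalar and works when it receives a list.

```python
# Hypothetical Python API that expects a list of indexes.
def bar(indexes):
    return [i * 2 for i in indexes]

# A single-element R vector such as 42L would arrive as a scalar,
# which cannot be iterated over.
try:
    bar(indexes=42)
    scalar_ok = True
except TypeError:
    scalar_ok = False

# Wrapping the value as list(42L) on the R side would yield a Python list.
doubled = bar(indexes=[42])
print(scalar_ok, doubled)
```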

Similarly, the Python API may require tuples instead of lists, in which case you can use the tuple function.

tuple("a", "b", "c")

R's named list is converted to a Python dictionary. You can also explicitly create a Python dictionary using the dict function.

dict(foo = "bar", index = 42L)

This may be useful if you need to pass a dictionary with more complex objects (not strings) as keys.
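
On the Python side, a dictionary key can be any hashable object, not only a string. The following sketch shows the kind of dictionary a Python API might expect in such a case; the names are hypothetical.

```python
# Keys need not be strings: any hashable object works, for example tuples.
grid = {(0, 0): "origin", (1, 2): "point"}

# Look up a value by its tuple key.
lookup = grid[(1, 2)]
print(lookup)
```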

Context managers

You can use R's generic `with` function to work with Python context manager objects (in Python, the `with` keyword does the same thing). For example:

py <- import_builtins()
with(py$open("output.txt", "w") %as% file, {
  file$write("Hello, there!")
})

This example opens a file and guarantees that it is closed automatically at the end of the with block. Note the use of the %as% operator to give an alias to the object created by the context manager.
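
For comparison, here is a sketch of the same pattern written directly in Python. It writes to a temporary directory (an assumption made so that the example leaves no files behind) and checks that the file is closed once the block ends.

```python
import os
import tempfile

# Write inside a temporary directory so that the sketch cleans up after itself.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "output.txt")

    # Python's with statement; "as file" plays the role of %as% in the R code.
    with open(path, "w") as file:
        file.write("Hello, there!")

    # The file object is closed as soon as the with block is left.
    closed_after_block = file.closed

    with open(path) as check:
        contents = check.read()

print(closed_after_block, contents)
```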

Iterator

If a Python API returns an iterator or generator, you can work with it using the `iterate` function. The `iterate` function applies an R function to each element returned by the iterator.

iterate(iter, print)

If you don't pass a function to `iterate`, the results are collected into an R vector.

results <- iterate(iter)

Note that the values of the iterator are consumed by the `iterate` function.

a <- iterate(iter) # The result is not empty
b <- iterate(iter) # The result is empty because the elements have already been consumed
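
The same behavior can be seen in plain Python, where a generator can be traversed only once. In this sketch `countdown` is a hypothetical generator and `list` stands in for collecting the results.

```python
def countdown(n):
    """Hypothetical generator yielding n, n - 1, ..., 1."""
    while n > 0:
        yield n
        n -= 1

it = countdown(3)
a = list(it)  # collects every element
b = list(it)  # the generator is already exhausted, so nothing is left
print(a, b)
```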

Callable object

Some Python objects are callable (that is, they can be called with arguments like ordinary functions) in addition to exposing methods and properties. Callable Python objects are returned to R as objects rather than functions, but you can invoke them with the $call() method. For example:

# Get a callable object
parser <- spacy$English()
# Call the object as a function
parser$call(spacy)
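
On the Python side, an object is callable when its class defines `__call__`. The following sketch uses a hypothetical `Greeter` class to show an object that has both properties and call behavior.

```python
class Greeter:
    """Hypothetical callable class: instances can be called like functions."""

    def __init__(self, greeting):
        self.greeting = greeting

    def __call__(self, name):
        return f"{self.greeting}, {name}!"

greeter = Greeter("Hello")
is_callable = callable(greeter)  # True because Greeter defines __call__
message = greeter("R")           # calling the object invokes __call__
print(is_callable, message)
```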

Advanced functions

More advanced functions are also available, which are mainly useful when creating high-level R interfaces for Python libraries.

Python object

To work with Python objects from R, you usually use the $ operator to access the object functionality you need. With $, Python objects are automatically converted to their R equivalents where possible. The following functions, by contrast, let you work with Python objects at a lower level (for example, no conversion to an R object takes place unless you explicitly call py_to_r).

function	Description
py_has_attr	Check if the object has the specified attributes.
py_get_attr	Get the attributes of a Python object.
py_set_attr	Set the attributes of the Python object.
py_list_attributes	Get a list of all attributes of a Python object.
py_call	Call a Python callable object with the specified arguments.
py_to_r	Convert a Python object to an R equivalent object.
r_to_py	Convert an R object to a Python equivalent.
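
The attribute functions in this table have close counterparts among Python's built-ins. The sketch below shows those built-ins on a hypothetical `Point` class. The correspondence is an assumption based on the function names and descriptions, not a statement about how reticulate implements them.

```python
class Point:
    """Hypothetical class with two plain attributes."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)

has_x = hasattr(p, "x")    # compare with py_has_attr
x_value = getattr(p, "x")  # compare with py_get_attr
setattr(p, "x", 10)        # compare with py_set_attr

# compare with py_list_attributes (restricted here to public names)
public_attrs = [name for name in dir(p) if not name.startswith("_")]
print(has_x, x_value, p.x, public_attrs)
```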

Configuration

You can use the following functions to query information about Python settings available on your current system.

function	Description
py_available	Check if the interface to Python is available on this system.
py_numpy_available	Check if the R interface to NumPy is available (requires NumPy 1.6 or higher).
py_module_available	Check if Python modules are available on this system.
py_config	Get information about the location and version of Python in use.

Output control

You can use the following functions to capture or suppress the output from Python.

function	Description
py_capture_output	Captures the Python output for the specified expression and returns it as an R character vector.
py_suppress_warnings	Executes the specified expression, but suppresses the display of Python warnings.
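
For readers more familiar with Python, the standard library offers comparable tools. The sketch below captures printed output and silences a warning using `contextlib.redirect_stdout` and `warnings.catch_warnings`. It is only an analogy and does not show how the reticulate functions work internally; `noisy` is a hypothetical function.

```python
import contextlib
import io
import warnings

# Capture everything printed inside the block into a string buffer.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    print("captured line")
captured = buffer.getvalue()

def noisy():
    """Hypothetical function that emits a warning before returning."""
    warnings.warn("this warning is silenced")
    return 42

# Run the function while ignoring any warnings it raises.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = noisy()
```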

Other

The following functions provide a variety of other low-level functionality.

function	Description
py_unicode	Convert a string to a Python Unicode object.
py_str	Get the string representation of a Python object.
py_is_null_xptr	Check if the Python object is a null externalptr.
py_validate_xptr	Check if the Python object is a null externalptr and throw an error if so.

Use in packages

Checking and testing on CRAN

If you want to use **reticulate** from another R package, keep in mind that when the package is submitted to CRAN, CRAN's test servers may not have Python, NumPy, or whatever other Python modules your package wraps.

To ensure that your package works well with CRAN, you should do two things:

When importing Python modules for use within a package, you should use the delay_load option to ensure that the modules (and Python) are loaded only the first time you use them.

# The Python 'foo' module that the package will use
foo <- NULL

.onLoad <- function(libname, pkgname) {
  # Delay-load the foo module (it is loaded only when accessed via $)
  foo <<- import("foo", delay_load = TRUE)
}

When writing tests, check whether the module is available and skip the test if it is not. For example, if you are using the **testthat** package, it would look like this:

# Helper function to skip a test if the 'foo' module is not available
skip_if_no_foo <- function() {
  have_foo <- py_module_available("foo")
  if (!have_foo)
    skip("Foo is not available for testing!")
}

# Call this helper function in every test
test_that("Works as expected", {
  skip_if_no_foo()
  # Write test code here
})
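
The idea behind the availability check can be sketched in Python with `importlib.util.find_spec`, which reports whether a top-level module can be located without importing it. `module_available` is a hypothetical helper and only approximates what `py_module_available` checks. The second module name is deliberately made up so that the lookup fails.

```python
import importlib.util

def module_available(name):
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(name) is not None

has_os = module_available("os")
has_missing = module_available("module_that_should_not_exist_12345")
print(has_os, has_missing)
```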

S3 methods

Because **reticulate** carries the classes of Python objects over to R, you can write S3 methods for those classes to customize, for example, the behavior of str or print. This is usually unnecessary, because the default str and print methods call PyObject_Str, which typically provides acceptable default behavior.

If you do decide to implement custom S3 methods for a Python class, keep the following in mind: the connection to Python objects is lost when an R session ends, so restoring an .RData file saved in one R session in a subsequent session effectively loses the Python objects (more precisely, they become NULL R `externalptr` objects).

This means that you should always use py_is_null_xptr before manipulating Python objects with S3 methods. For example.

#' @export
summary.MyPythonClass <- function(object, ...) {
  if (py_is_null_xptr(object))
    stop("Object is NULL")
  # Interact with the object to generate the summary
}

To make this easier, there are some shortcut methods available. The py_validate_xptr function makes the necessary checks and automatically throws an error if it fails. So the above example could be rewritten as:

#' @export
summary.MyPythonClass <- function(object, ...) {
  py_validate_xptr(object)
  # Interact with the object to generate the summary
}

Finally, the **reticulate** package exports a py_str generic function, which is called from the str method only after the appropriate validation has passed (if the object is NULL, <pointer: 0x0> is returned instead)[^tr1]. You can implement a py_str method as follows.

#' @importFrom reticulate py_str
#' @export
py_str.MyPythonClass <- function(object, ...) {
  # Interact with the object to generate the string
}

In short, implement py_str to provide custom str and print methods. For other S3 methods, be sure to call py_validate_xptr or py_is_null_xptr before manipulating the object.

[^tr1]: Translator's note: this is a little confusing, so here is some additional explanation. With reticulate, every Python object appears on the R side as an object that inherits from the python.builtin.object class. A str method is defined for python.builtin.object, and this method calls the py_str generic function. The py_str generic function, in turn, checks its argument before dispatching to a method.

```r
reticulate:::str.python.builtin.object
```

```
## function (object, ...) 
## {
##     cat(py_str(object), "\n", sep = "")
## }
## <environment: namespace:reticulate>
```

```r
reticulate::py_str
```

```
## function (object, ...) 
## {
##     if (!inherits(object, "python.builtin.object")) 
##         py_str.default(object)
##     else if (py_is_null_xptr(object) || !py_available()) 
##         "<pointer: 0x0>"
##     else UseMethod("py_str")
## }
## <environment: namespace:reticulate>
```