This document is a translation of the vignette "R interface to Python" from the R package reticulate (version 0.7) by RStudio et al. (cran.r-project.org/package=reticulate/vignettes/introduction.html).
License: Apache License 2.0
The **reticulate** package provides an R interface to Python modules, classes, and functions. For example, the following code imports the Python `os` module and calls functions in it.
library(reticulate)
os <- import("os")
os$chdir("tests")
os$getcwd()
Functions and other data in Python modules and classes can be accessed with the `$` operator (similar to working with R lists, environments, and reference classes).
When you call into Python, R data types are automatically converted to their Python equivalents. When a value is returned from Python to R, it is converted back to an R type. Types are converted as follows.
R | Python | Example |
---|---|---|
Single-element vector | Scalar | `1`, `1L`, `TRUE`, `"foo"` |
Multi-element vector | List | `c(1.0, 2.0, 3.0)`, `c(1L, 2L, 3L)` |
List of multiple types | Tuple | `list(1L, TRUE, "foo")` |
Named list | Dict | `list(a = 1L, b = 2.0)`, `dict(x = x_data)` |
Matrix/array | NumPy ndarray | `matrix(c(1,2,3,4), nrow = 2, ncol = 2)` |
Function | Python function | `function(x) x + 1` |
NULL, TRUE, FALSE | None, True, False | `NULL`, `TRUE`, `FALSE` |
If a Python object of a custom class is returned, R receives a reference to that object. You can call methods and access properties on this object as if it were an instance of an R reference class.
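As a minimal sketch of the conversions in the table, the following uses `r_to_py` and `py_to_r` to convert a few values explicitly and inspect the result. It assumes a Python installation that reticulate can find; the variable names are illustrative only.

```r
library(reticulate)

# r_to_py() performs the R-to-Python conversion explicitly and returns
# a reference to the resulting Python object.
as_scalar <- r_to_py(1L)                     # one-element vector -> Python int
as_list   <- r_to_py(c(1.0, 2.0, 3.0))       # multi-element vector -> Python list
as_dict   <- r_to_py(list(a = 1L, b = 2.0))  # named list -> Python dict

# The R class of each reference reflects the underlying Python type.
class(as_list)
class(as_dict)

# py_to_r() converts a Python object back to its R equivalent.
round_trip <- py_to_r(as_dict)
round_trip$a
round_trip$b
```

Inspecting `class()` on the reference is a quick way to check which Python type an R value will arrive as before passing it to a Python API.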
The **reticulate** package works with all Python versions 2.7 and above. NumPy integration is optional, but it requires NumPy 1.6 or higher.
**reticulate** can be installed from CRAN as follows.
install.packages("reticulate")
If the version of Python you want to use is on your system's `PATH`, it will be found automatically (by `Sys.which`) and used.
Alternatively, you can use one of the following functions to specify another version of Python.
Function | Description |
---|---|
`use_python` | Specify the path to a specific Python binary. |
`use_virtualenv` | Specify the directory containing a Python virtualenv. |
`use_condaenv` | Specify the name of a conda environment. |
Example:
library(reticulate)
use_python("/usr/local/bin/python")
use_virtualenv("~/myenv")
use_condaenv("myenv")
NumPy 1.6 or higher is required to use NumPy features with **reticulate**, so a version of Python that meets this requirement is preferred.
Also note that by default the `use` family of functions only gives a hint about where Python can be found (that is, no error occurs if the specified version of Python does not exist). To ensure that the specified version of Python actually exists, add the `required` argument.
use_virtualenv("~/myenv", required = TRUE)
Python is searched for in the following order, and the first version found is used.
1. The location specified by a call to `use_python`, `use_virtualenv`, or `use_condaenv`
2. The location specified by the `RETICULATE_PYTHON` environment variable
3. The Python location found on the system `PATH` (by the `Sys.which` function)
4. Other customary locations for Python, such as `/usr/local/bin/python` and `/opt/local/bin/python`
In typical use, Python is located and bound the first time you call `import` in an R session. As a result, a version of Python that contains the module specified in the call to `import` is preferred (that is, versions of Python that do not contain the specified module are skipped).
You can use the `py_config` function to query information about the version of Python in use and the other versions of Python found on your system.
py_config()
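The object returned by `py_config` can also be inspected field by field. The sketch below assumes that a Python installation is available and reads only the `python` and `version` fields; other fields may differ between reticulate versions.

```r
library(reticulate)

# Query the Python configuration that reticulate has bound to.
config <- py_config()

config$python   # path to the Python binary in use
config$version  # version of that Python
```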
You can import any Python module using the `import` function. For example:
difflib <- import("difflib")
difflib$ndiff(foo, bar)
filecmp <- import("filecmp")
filecmp$cmp(dir1, dir2)
The `import_main` and `import_builtins` functions give you access to the main module, where code is executed by default, and to the built-in Python functions. For example:
main <- import_main()
py <- import_builtins()
py$print('foo')
In general, the main module is useful when you want to execute Python code from a file or string and access the results (see the section below for more details).
When a Python object is returned to R, it is converted to the equivalent R type by default. However, if you would rather make the conversion from Python to R explicit and work with native Python objects by default, you can pass `convert = FALSE` to the `import` function.
# Import numpy and disable automatic conversion from Python to R
np <- import("numpy", convert = FALSE)
# Manipulate arrays with NumPy
a <- np$array(c(1:4))
sum <- a$cumsum()
# Finally, convert to R explicitly
py_to_r(sum)
As shown above, if you need access to an R object at the end of the computation, you can call the `py_to_r` function explicitly.
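The same pattern can be tried without NumPy. As a small sketch, assuming only the Python standard library is available, the following imports `math` with `convert = FALSE`, so the result stays a Python object until it is converted explicitly.

```r
library(reticulate)

# With convert = FALSE, results are returned as Python object references.
math <- import("math", convert = FALSE)
root <- math$sqrt(16)

# The value is still a Python float at this point.
class(root)

# Convert to an R numeric only when the R value is actually needed.
root_r <- py_to_r(root)
root_r
```

Deferring conversion this way avoids repeated round trips when several Python calls are chained together.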
You can use the `py_run_file` and `py_run_string` functions to execute Python code inside the main module. Both functions return a reference to Python's main module, so you can access the results of the execution. For example:
py_run_file("script.py")
main <- py_run_string("x = 10")
main$x
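A slightly longer sketch of the same idea, assuming a working Python installation: several Python statements are run from a string, and the resulting variables are read back through the returned main module. The variable names `x` and `squares` are illustrative.

```r
library(reticulate)

# Run several Python statements in the main module.
main <- py_run_string("
x = 10
squares = [i * i for i in range(4)]
")

# Variables defined by the code are available on the returned module.
main$x
main$squares

# import_main() refers to the same module, so the values are visible there too.
import_main()$x
```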
Automatic conversion from R type to Python type works well in most cases, but in some cases more explicit manipulation is required on the R side to give the type that Python expects.
For example, if a Python API requires a list and you pass an R vector with a single element, it will be converted to a Python scalar. To work around this, simply use R's `list` function explicitly.
foo$bar(indexes = list(42L))
Similarly, a Python API may require a tuple instead of a list, in which case you can use the `tuple` function.
tuple("a", "b", "c")
An R named list is converted to a Python dictionary. You can also create a Python dictionary explicitly using the `dict` function.
dict(foo = "bar", index = 42L)
This may be useful if you need to pass a dictionary with more complex objects (not strings) as keys.
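To see the difference these helpers make, the following sketch converts each argument explicitly and checks the resulting Python type. It assumes a working Python installation; the variable names are illustrative.

```r
library(reticulate)
py <- import_builtins()

# A one-element vector becomes a Python scalar ...
scalar_arg <- r_to_py(42L)
# ... while wrapping it in list() yields a one-element Python list.
list_arg <- r_to_py(list(42L))

# tuple() and dict() build the corresponding Python objects directly.
tuple_arg <- tuple("a", "b", "c")
dict_arg  <- dict(foo = "bar", index = 42L)

class(scalar_arg)
class(list_arg)

# Python's built-in len() confirms the container sizes.
py$len(list_arg)
py$len(tuple_arg)
py$len(dict_arg)
```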
You can use R's `with` generic function to work with Python context manager objects (in Python you would do the same with the `with` keyword). For example:
py <- import_builtins()
with(py$open("output.txt", "w") %as% file, {
  file$write("Hello, there!")
})
This example opens a file and guarantees that it is closed automatically at the end of the `with` block. Note the use of the `%as%` operator to give an alias to the object created by the context manager.
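A self-contained variant of this example, offered as a sketch: it writes to a temporary file (the `path` and `f` names are illustrative) and then reads the contents back from R to confirm that the text was written and the file closed.

```r
library(reticulate)
py <- import_builtins()

# Write to a temporary file so the example leaves no files behind.
path <- tempfile(fileext = ".txt")

# The file is opened by Python and closed when the with block ends.
with(py$open(path, "w") %as% f, {
  f$write("Hello, there!")
})

# Read the contents back from R.
contents <- readLines(path, warn = FALSE)
contents
```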
If a Python API returns an iterator or a generator, you can work with it using the `iterate` function. You can use `iterate` to apply an R function to each element yielded by the iterator.
iterate(iter, print)
If you don't pass a function to `iterate`, the results are collected into an R vector.
results <- iterate(iter)
Note that the values of the iterator are consumed by the `iterate` function.
a <- iterate(iter) # The result is not empty
b <- iterate(iter) # The result is empty because the elements have already been consumed
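The consumption behavior can be checked with a small sketch. It assumes a working Python installation and creates the iterator with `py_run_string`; the name `numbers` is illustrative.

```r
library(reticulate)

# Create a Python iterator in the main module and fetch a reference to it.
main <- py_run_string("numbers = iter([1, 2, 3])")
numbers <- main$numbers

# The first pass collects every element.
first <- iterate(numbers)
# The second pass finds nothing left to collect.
second <- iterate(numbers)

length(first)
length(second)
```

If the elements are needed more than once, keep the collected R result rather than the iterator itself.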
Some Python objects are callable (that is, they can be called with arguments like ordinary functions) in addition to exposing methods and properties. Callable Python objects are returned to R as objects rather than functions, but you can invoke them with the `$call()` method. For example:
# Get a callable object
parser <- spacy$English()
# Call the object as a function
parser$call(spacy)
More advanced functions are also available, which are mainly useful when creating high-level R interfaces for Python libraries.
To work with Python objects from R, you usually use the `$` operator to access the object functionality you need. With `$`, Python objects are automatically converted to their R equivalents where possible, but the following functions can be used to work with Python objects at a lower level (for example, no conversion to an R object takes place unless you explicitly call `py_to_r`).
function | Description |
---|---|
py_has_attr | Check if the object has the specified attributes. |
py_get_attr | Get the attributes of a Python object. |
py_set_attr | Set the attributes of the Python object. |
py_list_attributes | Get a list of all attributes of a Python object. |
py_call | Call a Python callable object with the specified arguments. |
py_to_r | Convert a Python object to an R equivalent object. |
r_to_py | Convert an R object to a Python equivalent. |
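As a sketch of how these lower-level functions fit together, the following looks up and calls `os.getcwd` without using `$`. It assumes a working Python installation; the result stays a Python object until `py_to_r` is called.

```r
library(reticulate)

os <- import("os", convert = FALSE)

# Check for an attribute before fetching it.
has_getcwd <- py_has_attr(os, "getcwd")

# Fetch the attribute as a Python object, then call it.
getcwd_fn <- py_get_attr(os, "getcwd")
cwd <- py_call(getcwd_fn)

# Nothing has been converted so far; convert explicitly at the end.
cwd_r <- py_to_r(cwd)
cwd_r

# List the attribute names of the module.
attrs <- py_list_attributes(os)
head(attrs)
```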
You can use the following functions to query information about Python settings available on your current system.
function | Description |
---|---|
py_available | Check if the interface to Python is available on this system. |
py_numpy_available | Check if the R interface to NumPy is available (requires NumPy 1.6 or higher). |
py_module_available | Check if Python modules are available on this system. |
py_config | Get information about the location and version of Python in use. |
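A brief sketch of these checks in use, assuming a working Python installation. The module name `surely_not_a_real_module_xyz` is a made-up placeholder for a module that is not installed.

```r
library(reticulate)

# Check for a standard library module and for a module that does not exist.
have_os      <- py_module_available("os")
have_missing <- py_module_available("surely_not_a_real_module_xyz")  # hypothetical name

# Check whether Python itself and the NumPy interface are usable.
have_python <- py_available(initialize = TRUE)
have_numpy  <- py_numpy_available()

c(os = have_os, missing = have_missing, python = have_python, numpy = have_numpy)
```

Checks like these are useful for choosing a fallback code path instead of failing when an optional Python dependency is absent.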
You can use the following functions to capture or suppress the output from Python.
function | Description |
---|---|
py_capture_output | Captures the Python output for the specified expression and returns it as an R character vector. |
py_suppress_warnings | Executes the specified expression, but suppresses the display of Python warnings. |
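For instance, output printed by Python code can be captured into an R value. This is a minimal sketch assuming a working Python installation; the printed text is arbitrary.

```r
library(reticulate)

# Capture what the Python code prints instead of sending it to the console.
captured <- py_capture_output(
  py_run_string("print('hello from Python')")
)

captured
```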
The following functions provide a variety of other low-level functionality.
function | Description |
---|---|
py_unicode | Convert a string to a Python Unicode object. |
py_str | Get the string representation of a Python object. |
py_is_null_xptr | Check if the Python object is a null externalptr. |
py_validate_xptr | Check if the Python object is a null externalptr and throw an error if so. |
If you want to use **reticulate** from another R package, you need to consider the following: when a package is submitted to CRAN, CRAN's test servers may not have Python, NumPy, or whatever other Python modules you are wrapping in the package.
To ensure that your package works well with CRAN, you should do two things:
When importing Python modules for use within a package, use the `delay_load` option to ensure that the modules (and Python) are loaded only the first time they are used.
# Python 'foo' module that you want to use in your package
foo <- NULL
.onLoad <- function(libname, pkgname) {
  # Delay-load the foo module (it is loaded only when first accessed via $)
  foo <<- import("foo", delay_load = TRUE)
}
When writing tests, check whether the module is available and skip the test if it is not. For example, if you are using the **testthat** package, it would look like this:
# Helper function to skip tests if the 'foo' module is not available
skip_if_no_foo <- function() {
  have_foo <- py_module_available("foo")
  if (!have_foo)
    skip("Foo is not available for testing!")
}
# Call this helper function in all tests
test_that("Works as expected", {
  skip_if_no_foo()
  # Write test code here
})
Because **reticulate** carries the class of a Python object exposed to R over to the R side, you can write S3 methods for that class and customize the behavior of, for example, `str` and `print`. This is usually unnecessary, because the default `str` and `print` methods call `PyObject_Str`, which typically provides acceptable default behavior.
If you do decide to implement custom S3 methods for a Python class, keep the following in mind: the connection to a Python object is lost at the end of the R session, so restoring an .RData file saved in one R session in a later R session effectively loses the Python object (more precisely, it becomes a `NULL` R `externalptr` object).
This means that you should always call `py_is_null_xptr` before manipulating Python objects in S3 methods. For example:
#' @export
summary.MyPythonClass <- function(object, ...) {
  if (py_is_null_xptr(object))
    stop("Object is NULL")
  # Otherwise, work with the object to generate the summary
}
To make this easier, some shortcuts are available. The `py_validate_xptr` function performs the necessary check and automatically throws an error if it fails. So the example above could be rewritten as:
#' @export
summary.MyPythonClass <- function(object, ...) {
  py_validate_xptr(object)
  # Work with the object to generate the summary
}
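For a live object both checks pass quietly. The following sketch, assuming a working Python installation, shows what they report for an object created in the current session; the variable name `obj` is illustrative.

```r
library(reticulate)

# A Python object created in this session has a valid pointer.
obj <- r_to_py(list(a = 1L))

# py_is_null_xptr() reports whether the underlying pointer has been lost.
is_null <- py_is_null_xptr(obj)
is_null

# py_validate_xptr() raises an error only when the pointer is null.
py_validate_xptr(obj)
```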
Finally, the **reticulate** package exports the `py_str` generic function, which is called from the `str` method only after appropriate validation has passed (if the object is NULL, `<pointer: 0x0>` is returned) [^tr1]. You can implement a `py_str` method as follows.
#' @importFrom reticulate py_str
#' @export
py_str.MyPythonClass <- function(object, ...) {
  # Work with the object to generate the string
}
In short, implement `py_str` to provide custom `str` and `print` methods. For other S3 methods, be sure to call `py_validate_xptr` or `py_is_null_xptr` before manipulating the object.
[^tr1]: This is a little confusing, so here is a supplementary note. With reticulate, every Python object appears on the R side as an object that inherits from the `python.builtin.object` class. A `str` method is defined for `python.builtin.object`, and this method calls the `py_str` generic function. The `py_str` generic function, in turn, checks its argument before dispatching to the method.
```r
reticulate:::str.python.builtin.object
```
```
## function (object, ...)
## {
##     cat(py_str(object), "\n", sep = "")
## }
## <environment: namespace:reticulate>
```
```r
reticulate::py_str
```
```
## function (object, ...)
## {
##     if (!inherits(object, "python.builtin.object"))
##         py_str.default(object)
##     else if (py_is_null_xptr(object) || !py_available())
##         "<pointer: 0x0>"
##     else UseMethod("py_str")
## }
## <environment: namespace:reticulate>
```