C++ Advent Calendar Day 10 article.
Boost 1.63 merged Boost.NumPy into Boost.Python. Because of that, the description below may no longer work as written; I will write a revised article.
Python is really useful. For me, an academic (apprentice) whose main work is simulation and analysis of the results, an IPython Notebook that lets you process, analyze, and visualize data seamlessly and interactively is indispensable. Unfortunately, C++ alone does not (as far as I know) offer equivalent functionality. cling, from CERN's ROOT project, can apparently execute C++ interactively, but there is so little information about it that I have never used it. Somebody please make a C++ Notebook for literate programming in C++ (seriously).
However, the simulations themselves run for days to weeks, so speed is essential and they cannot be written in a slow language like Python. The simulation therefore gets written in C++ (Fortran? Never heard of it).
On the other hand, looking at projects like scikit-learn and PyMC that provide Python front ends, I get the feeling that implementing the core in another language and using it from Python will only become more common. So, among the ways to write functions in C++ that can be used from Python, this article covers how to use Boost.NumPy.
I once tried plain Boost.Python and got frustrated because I could not understand the converters; Boost.NumPy, by contrast, was very easy to use (important).
To briefly mention the other options:

- Boost.NumPy, introduced here, simply constructs a C++ wrapper for Python's numpy.ndarray.
- Since C++ already has (innumerable) linear-algebra libraries such as Eigen, there is also the choice of converting a numpy.ndarray directly into an Eigen vector or matrix.
- More simply, using only Boost.Python, you can convert a C++ vector etc. into a Python list and then convert that into a numpy.ndarray.
- There are many others, such as SWIG and Cython.
I summarized how to install and compile it in a previous article, so this time I will summarize how to actually use it. The usual goal is to expose only the interface part of the C++ code to the Python side, so I will keep things as simple as possible and avoid the difficult parts.
I won't explain Boost.Python itself, so please look it up yourself. Some Boost.Python references:

- Boost.Python (official)
- Boost.Python (Japanese translation)
Tutorial for Boost.NumPy
The following is based on the Boost.NumPy tutorial. To reduce notational clutter, the namespaces are abbreviated as follows.
namespace p = boost::python;
namespace np = boost::numpy;
np::ndarray (one-dimensional)
The introduction ran long, so let's get straight to code that works:
mymod1.cpp
#include "boost/numpy.hpp"
#include <stdexcept>
#include <algorithm>
namespace p = boost::python;
namespace np = boost::numpy;
/*Double*/
void mult_two(np::ndarray a) {
int nd = a.get_nd();
if (nd != 1)
throw std::runtime_error("a must be 1-dimensional");
size_t N = a.shape(0);
if (a.get_dtype() != np::dtype::get_builtin<double>())
throw std::runtime_error("a must be float64 array");
double *p = reinterpret_cast<double *>(a.get_data());
std::transform(p, p + N, p, [](double x) { return 2 * x; });
}
BOOST_PYTHON_MODULE(mymod1) {
Py_Initialize();
np::initialize();
p::def("mult_two", mult_two);
}
This compiles as is. It is just a simple function that doubles a one-dimensional array, but it contains the important elements:
- Use get_nd() for the number of array dimensions
- Use shape(n) for the size along axis n
- The element type is determined dynamically; get it with get_dtype()
- The data can be accessed through a raw pointer
- A C++ std::runtime_error is converted to RuntimeError on the Python side

The memory is managed on the np::ndarray side, so you do not need to worry about it.
Unfortunately, because the element type is determined dynamically, the function cannot be overloaded on it; you have to branch on the dtype with an if statement at run time. The exception conversion, on the other hand, is nice.
Now let's call it from Python.
mymod1.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import mymod1
import numpy as np
if __name__ == '__main__':
a = np.array([1,2,3], dtype=np.float64)
mymod1.mult_two(a)
print(a)
b = np.array([1,2,3], dtype=np.int64)
mymod1.mult_two(b) # raise error
print(b)
The call with b raises an error because, as described above, its element type is long long rather than double:
[ 2. 4. 6.]
Traceback (most recent call last):
File "/path/of/mymod1.py", line 13, in <module>
mymod1.mult_two(b)
RuntimeError: a must be float64 array
When you use integers in numpy, int64 (long long in C++) is used unless you specify otherwise. Note that it is not int.
There are some complaints, such as not being able to overload, but we have implemented a function in C++ that can easily be used from Python.
np::ndarray (multidimensional)
The basics are the same in the multidimensional case, but you need to pay attention to the memory layout. The explanation of numpy.ndarray.strides is easy to follow, but I will explain briefly below.
Sometimes you want to manage a two-dimensional array in one contiguous memory region.
double *a_as1 = new double[N*M];
double **a_as2 = new double*[N];
for(int i=0;i<N;++i){
a_as2[i] = &a_as1[i*M];
}
Then the memory accessed as a_as2[i][j] is the same as a_as1[i*M + j]. This i*M + j addressing is exactly what ndarray.strides expresses. In this layout, advancing by 1 in the j direction advances 8 bytes in memory (a double is 8 bytes), while advancing by 1 in the i direction advances 8*M bytes (since (i+1)*M + j = i*M + j + M). These 8 and 8*M are called the strides.
This idea can be used even at higher dimensions.
If you pay attention to this, the rest is easy.
mymod2.cpp
#include "boost/numpy.hpp"
#include <iostream>
#include <stdexcept>
#include <algorithm>
namespace p = boost::python;
namespace np = boost::numpy;
void print(np::ndarray a) {
int nd = a.get_nd();
if (nd != 2)
throw std::runtime_error("a must be two-dimensional");
if (a.get_dtype() != np::dtype::get_builtin<double>())
throw std::runtime_error("a must be float64 array");
auto shape = a.get_shape();
auto strides = a.get_strides();
std::cout << "i j val\n";
for (int i = 0; i < shape[0]; ++i) {
for (int j = 0; j < shape[1]; ++j) {
std::cout << i << " " << j << " "
<< *reinterpret_cast<double *>(a.get_data() + i * strides[0] +
j * strides[1]) << std::endl;
}
}
}
BOOST_PYTHON_MODULE(mymod2) {
Py_Initialize();
np::initialize();
p::def("print", print);
}
It is faster, in terms of memory access, to look at the strides and iterate the axes in ascending-stride order, but I omitted that because it is tedious. Please do your best.
Python, or rather NumPy, is really convenient. The big point is that SciPy provides interfaces to the standard numerical libraries, with careful documentation.
Now that we have summarized how to access the data in an np::ndarray, algorithms implemented in C++ can be used from Python.
Next, I would like to look into publishing to PyPI and contributing to scikits.