C++ Advent Calendar Day 10 article.
Boost 1.63 merged Boost.NumPy into Boost.Python. Because of that, the description below may no longer work as written; I will write a revised article.
Python is really useful. For me, an academic (apprentice) whose main work is simulation and analysis of the results, an IPython Notebook that lets you process, analyze, and visualize data seamlessly and interactively is indispensable. Unfortunately, C++ alone does not (as far as I know) offer equivalent functionality. cling, from CERN's ROOT project, can apparently execute C++ interactively, but there is so little information about it that I have never used it. Somebody please make a C++ Notebook for literate programming in C++ (seriously).
However, the simulations themselves run for days to weeks, so speed is essential and they cannot be written in a slow language like Python. The simulation therefore gets written in C++ (Fortran? Never heard of it).
On the other hand, looking at projects like scikit-learn and PyMC that provide Python front ends, I get the feeling that implementing the core in another language and using it from Python will only become more common. So, among the ways to write functions in C++ that can be used from Python, this article covers how to use Boost.NumPy.
I once tried plain Boost.Python and got frustrated because I could not understand the converters; Boost.NumPy, by contrast, was very easy to use (important).
To briefly mention the other options:

- Boost.NumPy, introduced here, simply constructs a C++ wrapper for Python's numpy.ndarray.
- Since C++ already has (innumerable) linear-algebra libraries such as Eigen, there is also the choice of converting a numpy.ndarray directly into an Eigen vector or matrix.
- More simply, using only Boost.Python, you can convert a C++ vector etc. into a Python list and then convert that into a numpy.ndarray.
- There are many others, such as SWIG and Cython.
I summarized how to install and compile it in a previous article, so this time I will summarize how to actually use it. The usual goal is to expose only the interface part of the C++ code to the Python side, so I will keep things as simple as possible and avoid the difficult parts.
I won't explain Boost.Python itself, so please look it up yourself. Some Boost.Python references:

- Boost.Python (official)
- Boost.Python (Japanese translation)
Tutorial for Boost.NumPy
The following is based on the Boost.NumPy tutorial. To reduce notational clutter, the namespaces are abbreviated as follows.
namespace p = boost::python;
namespace np = boost::numpy;
np::ndarray (one-dimensional)
The introduction ran long, so let's get straight to code that works:
mymod1.cpp
#include "boost/numpy.hpp"
#include <stdexcept>
#include <algorithm>
namespace p = boost::python;
namespace np = boost::numpy;
/*Double*/
void mult_two(np::ndarray a) {
int nd = a.get_nd();
if (nd != 1)
throw std::runtime_error("a must be 1-dimensional");
size_t N = a.shape(0);
if (a.get_dtype() != np::dtype::get_builtin<double>())
throw std::runtime_error("a must be float64 array");
double *p = reinterpret_cast<double *>(a.get_data());
std::transform(p, p + N, p, [](double x) { return 2 * x; });
}
BOOST_PYTHON_MODULE(mymod1) {
Py_Initialize();
np::initialize();
p::def("mult_two", mult_two);
}
This compiles as is. It is just a simple function that doubles a one-dimensional array, but it contains the important elements:
- Use get_nd() for the number of array dimensions
- Use shape(n) for the size along axis n
- The element type is determined dynamically; get it with get_dtype()
- The data can be accessed through a raw pointer
- A C++ std::runtime_error is converted to RuntimeError on the Python side

The memory is managed on the np::ndarray side, so you do not need to worry about it.
Unfortunately, because the element type is determined dynamically, the function cannot be overloaded on it; you have to branch on the dtype with an if statement at run time. The exception conversion, on the other hand, is nice.
Now let's call it from Python.
mymod1.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import mymod1
import numpy as np
if __name__ == '__main__':
a = np.array([1,2,3], dtype=np.float64)
mymod1.mult_two(a)
print(a)
b = np.array([1,2,3], dtype=np.int64)
mymod1.mult_two(b) # raise error
print(b)
The call with b raises an error because, as described above, its element type is long long rather than double:
[ 2. 4. 6.]
Traceback (most recent call last):
File "/path/of/mymod1.py", line 13, in <module>
mymod1.mult_two(b)
RuntimeError: a must be float64 array
When you use integers in numpy, int64 (long long in C++) is used unless you specify otherwise. Note that it is not int.
There are some complaints, such as not being able to overload, but we have implemented a function in C++ that can easily be used from Python.
np::ndarray (multidimensional)
The basics are the same in the multidimensional case, but you need to pay attention to the memory layout. The explanation of numpy.ndarray.strides is easy to follow, but I will explain briefly below.
Sometimes you want to manage a two-dimensional array in one contiguous memory region.
double *a_as1 = new double[N*M];
double **a_as2 = new double*[N];
for(int i=0;i<N;++i){
a_as2[i] = &a_as1[i*M];
}
Then the memory accessed as a_as2[i][j] is the same as a_as1[i*M + j]. This i*M + j addressing is exactly what ndarray.strides expresses. In this layout, advancing by 1 in the j direction advances 8 bytes in memory (a double is 8 bytes), while advancing by 1 in the i direction advances 8*M bytes (since (i+1)*M + j = i*M + j + M). These 8 and 8*M are called the strides.
This idea can be used even at higher dimensions.
If you pay attention to this, the rest is easy.
mymod2.cpp
#include "boost/numpy.hpp"
#include <iostream>
#include <stdexcept>
#include <algorithm>
namespace p = boost::python;
namespace np = boost::numpy;
void print(np::ndarray a) {
int nd = a.get_nd();
if (nd != 2)
throw std::runtime_error("a must be two-dimensional");
if (a.get_dtype() != np::dtype::get_builtin<double>())
throw std::runtime_error("a must be float64 array");
auto shape = a.get_shape();
auto strides = a.get_strides();
std::cout << "i j val\n";
for (int i = 0; i < shape[0]; ++i) {
for (int j = 0; j < shape[1]; ++j) {
std::cout << i << " " << j << " "
<< *reinterpret_cast<double *>(a.get_data() + i * strides[0] +
j * strides[1]) << std::endl;
}
}
}
BOOST_PYTHON_MODULE(mymod2) {
Py_Initialize();
np::initialize();
p::def("print", print);
}
It is faster, in terms of memory access, to look at the strides and iterate the axes in ascending-stride order, but I omitted that because it is tedious. Please do your best.
Python, or rather NumPy, is really convenient. The big point is that SciPy provides interfaces to the standard numerical libraries, with careful documentation.
Now that we have summarized how to access the data in an np::ndarray, algorithms implemented in C++ can be used from Python.
Next, I would like to look into publishing to PyPI and contributing to scikits.