Using Sony's deep learning framework NNabla, I will walk through the procedure from training in Python to running inference from C++.
One of NNabla's features is that its core is pure C++. You train on a machine with a GPU, and being able to quickly move the trained net to C++ when deploying to an embedded device seems like a real advantage.
There is no documentation for the C++ API yet, but since ver 0.9.4, usage examples have been added under examples/cpp. For inference only, it looks usable just by imitating these.
As usual, the theme is the function-approximation example from my Chainer deep learning article: a three-layer MLP learns the function exp(x).

All the code is posted [here](https://github.com/ashitani/NNabla_exp).
First, import the basic library.
```python
import nnabla as nn
import nnabla.functions as F
import nnabla.parametric_functions as PF
import nnabla.solvers as S
```
Graph construction is almost the same as in Chainer. Functions without parameters live in F, and functions with parameters live in PF.
```python
batch_size = 100

x = nn.Variable((batch_size, 1))
h1 = F.elu(PF.affine(x, 16, name="affine1"))
h2 = F.elu(PF.affine(h1, 32, name="affine2"))
y = F.elu(PF.affine(h2, 1, name="affine3"))
```
For PF, you can either define a parameter scope or pass the name argument; note that if you specify neither, the parameter names collide and an error occurs. Since there was no leaky_relu, I substituted elu this time.
Next, define the loss function. Nothing to agonize over here.
```python
t = nn.Variable((batch_size, 1))
loss = F.mean(F.squared_error(y, t))
```
The solver definition. I went with Adam without thinking twice.
```python
solver = S.Adam()
solver.set_parameters(nn.get_parameters())
```
Now run the training loop. With the flow of forward(), zero_grad(), backward(), update(), anyone familiar with Chainer will find the workflow completely natural. There seem to be utilities for data feeding, but I did not use them this time.
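The loop below uses a get_batch helper from the linked repository; a hypothetical numpy-only sketch of what it might look like (the sampling range and dtype are my assumptions):

```python
import numpy as np

def get_batch(batch_size, x_min=-1.0, x_max=1.0):
    """Return a (x, exp(x)) pair of arrays drawn uniformly from [x_min, x_max].

    Hypothetical reimplementation; the actual helper lives in the linked
    repository and may differ in range and dtype.
    """
    x = np.random.uniform(x_min, x_max, batch_size).astype(np.float32)
    return x, np.exp(x)
```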
```python
losses = []
for i in range(10000):
    dat = get_batch(batch_size)
    x.d = dat[0].reshape((batch_size, 1))
    t.d = dat[1].reshape((batch_size, 1))
    loss.forward()
    solver.zero_grad()
    loss.backward()
    solver.update()
    losses.append(loss.d.copy())
    if i % 1000 == 0:
        print(i, loss.d)
```
I plotted the loss.
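The plotting code is not shown above; a minimal matplotlib sketch (the random stand-in array only substitutes for the losses list collected in the training loop):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

# Stand-in for the `losses` list from the training loop above.
losses = np.abs(np.random.randn(10000)) / np.arange(1, 10001)

plt.plot(losses)
plt.yscale("log")  # loss curves are easier to read on a log axis
plt.xlabel("iteration")
plt.ylabel("loss")
plt.savefig("loss.png")
```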
Use .d (or .data) to access a node's data. You can call forward() on any node in the graph; use this to run inference.
```python
x.d = 0.2
y.forward()
print(y.d[0][0])
```
At first glance it seems odd that feeding a scalar into a net built with batch size 100 works at all, but apparently the scalar is expanded into data holding the same value in all 100 slots, so the same value appears in all 100 outputs.
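Since .d is backed by a numpy array, the scalar assignment above seems to behave like ordinary numpy broadcasting; a numpy-only sketch of what happens:

```python
import numpy as np

# x.d for a (100, 1) Variable is a numpy array of that shape;
# assigning a scalar fills every slot with the same value.
d = np.zeros((100, 1), dtype=np.float32)
d[...] = 0.2

# Every one of the 100 rows now holds the same input.
print(d.shape, d[0, 0], d[99, 0])
```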
Learned parameters can be saved with save_parameters().
```python
nn.save_parameters("exp_net.h5")
```
The code for inference with the saved parameters is in the repository; the parameters can be loaded with load_parameters(). For inference, it is better to set the batch size to 1.
The inference result: the blue line is the math library's output and the red line is this net's output; they almost completely overlap.
Saving in the NNP format apparently makes the net usable from C++. I imitated the relevant example code; proper API documentation will presumably appear before long.
```python
import nnabla.utils.save

runtime_contents = {
    'networks': [
        {'name': 'runtime',
         'batch_size': 1,
         'outputs': {'y': y},
         'names': {'x': x}}],
    'executors': [
        {'name': 'runtime',
         'network': 'runtime',
         'data': ['x'],
         'output': ['y']}]}
nnabla.utils.save.save('exp_net.nnp', runtime_contents)
```
This file is used from the C++ code below, which is almost identical to the bundled example.
```cpp
#include <nbla_utils/nnp.hpp>

#include <iostream>
#include <string>
#include <cmath>

int main(int argc, char *argv[]) {
  nbla::CgVariablePtr y;
  float in_x;
  const float *y_data;

  // Load NNP file and prepare net
  nbla::Context ctx{"cpu", "CpuCachedArray", "0", "default"};
  nbla::utils::nnp::Nnp nnp(ctx);
  nnp.add("exp_net.nnp");
  auto executor = nnp.get_executor("runtime");
  executor->set_batch_size(1);
  nbla::CgVariablePtr x = executor->get_data_variables().at(0).variable;
  float *data = x->variable()->cast_data_and_get_pointer<float>(ctx);

  for (int i = 1; i < 10; i++) {
    // set input data
    in_x = 0.1 * i;
    *data = in_x;

    // execute
    executor->execute();
    y = executor->get_output_variables().at(0).variable;
    y_data = y->variable()->get_data_pointer<float>(ctx);

    // print output
    std::cout << "exp(" << in_x << "):" << "predict: " << y_data[0]
              << ", actual: " << std::exp(in_x) << std::endl;
  }
  return 0;
}
```
The Makefile used for the build is below, again essentially as in the example.
```makefile
all: exp_net.cpp
	$(CXX) -std=c++11 -O -o exp_net exp_net.cpp -lnnabla -lnnabla_utils

clean:
	rm -f exp_net
```
This is the execution result.
```
exp(0.1):predict: 1.10528, actual: 1.10517
exp(0.2):predict: 1.22363, actual: 1.2214
exp(0.3):predict: 1.34919, actual: 1.34986
exp(0.4):predict: 1.4878, actual: 1.49182
exp(0.5):predict: 1.64416, actual: 1.64872
exp(0.6):predict: 1.81886, actual: 1.82212
exp(0.7):predict: 2.01415, actual: 2.01375
exp(0.8):predict: 2.2279, actual: 2.22554
exp(0.9):predict: 2.45814, actual: 2.4596
```
The time for the inference itself, over a loop of 10000 samples, came out as:

Python | C++ |
---|---|
566msec | 360msec |

A net this small probably makes the comparison unreliable. More than anything, though, the startup overhead differs: the total execution times of programs that only compute the same 10000 samples are shown below. The h5 and NNP sizes are quite different, and the Python side has to start over from model construction, so it is an unfair comparison and only for reference, but the difference is substantial.
 | Python | C++ |
---|---|---|
real | 7.186s | 0.397s |
user | 1.355s | 0.385s |
sys | 0.286s | 0.007s |
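For reference, the per-loop numbers above can be measured with a pattern like this (pure-Python sketch; math.exp stands in for the actual y.forward() call, so the time it prints is not the one in the table):

```python
import math
import time

def infer(v):
    # Stand-in for setting x.d and calling y.forward() on the NNabla graph.
    return math.exp(v)

start = time.perf_counter()
for i in range(10000):
    infer(0.1 * (i % 10))
elapsed = time.perf_counter() - start
print("10000 inferences: %.3f msec" % (elapsed * 1000.0))
```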
Basically I just imitated the example, but being able to quickly use a model trained in Python from C++ is very convenient. The speed comparison here is not very meaningful, but at least there is no reason for C++ to be slower. I would like to try the comparison with a larger net.
NNabla's build is not yet officially supported on OS X, so there are various traps, but they will probably be resolved before long. The installation procedure as of this writing is described below.
Incidentally, there are excellent instructions here for running it with Docker. I think the idea of keeping the Jupyter notebook files on the host side is great.
The build procedure follows the official instructions.
```shell
git clone https://github.com/sony/nnabla
cd nnabla
sudo pip install -U -r python/setup_requirements.txt
sudo pip install -U -r python/requirements.txt
mkdir build
cd build
cmake ../
make
cd dist
sudo pip install -U nnabla-0.9.4.post8+g1aa7502-cp27-cp27mu-macosx_10_11_x86_64.whl
```
However, import nnabla then fails with the following error.
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/***/.pyenv/versions/2.7.11/Python.framework/Versions/2.7/lib/python2.7/site-packages/nnabla/__init__.py", line 16, in <module>
    import _init # Must be imported first
ImportError: dlopen(/Users/***/.pyenv/versions/2.7.11/Python.framework/Versions/2.7/lib/python2.7/site-packages/nnabla/_init.so, 2): Library not loaded: @rpath/libnnabla.dylib
  Referenced from: /Users/***/.pyenv/versions/2.7.11/Python.framework/Versions/2.7/lib/python2.7/site-packages/nnabla/_init.so
  Reason: image not found
```
So, add the directory containing libnnabla.dylib to DYLD_LIBRARY_PATH.
```shell
export DYLD_LIBRARY_PATH=~/.pyenv/versions/2.7.11/Python.framework/Versions/2.7/lib/python2.7/site-packages/nnabla/:$DYLD_LIBRARY_PATH
```
import nnabla now succeeds.
However, when I tried examples/vision/mnist/classification.py, it failed with:

```
Symbol not found: __gfortran_stop_numeric_f08
```

It appears to happen while preparing the MNIST data, so I did not chase it further. The example in this article worked even in this state.
For the C++ side, install libarchive with Homebrew, then configure and build with the C++ utilities enabled:

```shell
brew install libarchive
brew link --force libarchive
cmake .. -DBUILD_CPP_UTILS=ON -DBUILD_PYTHON_API=OFF
make
```
I got the following error.
```
Undefined symbols for architecture x86_64:
  "_archive_read_free", referenced from:
      nbla::utils::nnp::Nnp::add(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) in nnp.cpp.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [lib/libnnabla_utils.dylib] Error 1
make[1]: *** [src/nbla_utils/CMakeFiles/nnabla_utils.dir/all] Error 2
make: *** [all] Error 2
```
Apparently there is a libarchive version mismatch. Tracing it, the failing link command is at the end of nnabla/build_cpp/src/nbla_utils/CMakeFiles/nnabla_utils.dir/link.txt, where /usr/lib/libarchive.dylib is specified. I rewrote it to the brew-installed /usr/local/lib/libarchive.dylib and the build passed. I am not sure whether the /usr/lib copy ships with OS X or I installed it myself at some point.
Properly, cmake should pick up the brew-installed library at configure time. Well, it worked, so I am happy.
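For the record, that hand edit can also be scripted with sed; a sketch using a stand-in file (in the real build tree the target is the link.txt above):

```shell
# Demo on a stand-in copy; in the real build tree the file is
# build_cpp/src/nbla_utils/CMakeFiles/nnabla_utils.dir/link.txt.
LINK_TXT=link_demo.txt
echo 'c++ ... /usr/lib/libarchive.dylib ...' > "$LINK_TXT"

# Swap the system libarchive for the brew-installed one.
sed -e 's|/usr/lib/libarchive.dylib|/usr/local/lib/libarchive.dylib|' \
    "$LINK_TXT" > "$LINK_TXT.tmp" && mv "$LINK_TXT.tmp" "$LINK_TXT"
cat "$LINK_TXT"
```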