Overview

This post summarizes how to run principal component analysis (PCA) with sklearn from Nim. With sklearn, PCA takes only a few lines, so I call it from Nim through a library called nimpy. The part that actually performs PCA with sklearn is written in Python.

I have written before about how to call Python with nimpy, so this post is an application of that article: Call python from nim with Nimpy

Preparation on the Python side

As in the earlier article (see the link above), set CONFIGURE_OPTS="--enable-shared" when installing a Python version with pyenv or a similar tool so that libpython is built as a shared library. Then add sklearn with poetry or a similar tool.
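
To confirm that the installed interpreter really provides a shared libpython, it can be asked directly. The following is a minimal sketch using only the standard `sysconfig` module; the helper name `find_libpython` is my own, and the configuration variables it reads are not defined on every platform, so treat a `None` result as "check by hand" rather than as proof that the library is missing.

```python
import sysconfig
from pathlib import Path
from typing import Optional


def find_libpython() -> Optional[Path]:
    """Return the path of the shared libpython for this interpreter, if one exists."""
    libdir = sysconfig.get_config_var("LIBDIR")        # e.g. <prefix>/lib
    ldlibrary = sysconfig.get_config_var("LDLIBRARY")  # e.g. libpython3.7m.so
    if not libdir or not ldlibrary:
        return None  # these variables are not defined on every platform
    candidate = Path(libdir) / ldlibrary
    # A static-only build reports a .a archive here, which cannot be loaded dynamically.
    if candidate.suffix == ".a" or not candidate.exists():
        return None
    return candidate


if __name__ == "__main__":
    # 1 on a build configured with --enable-shared, 0 or None otherwise.
    print("Py_ENABLE_SHARED:", sysconfig.get_config_var("Py_ENABLE_SHARED"))
    print("libpython:", find_libpython())
```

Run it with the interpreter installed by pyenv; the printed path is the kind of value that is later passed to nimpy when the library is initialized.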

`pyproject.toml`


[tool.poetry]
name = "nimpy_pca"
version = "0.1.0"
description = ""
authors = ["Your Name <you@example.com>"]

[tool.poetry.dependencies]
python = "^3.7"
numpy = "^1.18"
scikit-learn = "^0.22.2"

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry>=1.0"]
build-backend = "poetry.masonry.api"

Calling sklearn's PCA directly from Nim is awkward, so the processing is written in Python. (nimpy works by calling methods, so styles such as an initializer call or syntax without a return value cannot be used as they are.) Data and results are exchanged as JSON to keep type passing simple.

`pca.py`


from sklearn.decomposition import PCA
import numpy as np
import json

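def pca(json_text):
    """Fit a 2-component PCA to a JSON-encoded 2D array and return plain lists."""
    data = json.loads(json_text)
    model = PCA(n_components=2)  # named model so it does not shadow pca()
    A = np.array(data)
    model.fit(A)
    # tolist() turns numpy arrays into nested lists that convert cleanly to JSON
    return {
        "components": model.components_.tolist(),
        "varianceRatio": model.explained_variance_ratio_.tolist()
    }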
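
The JSON exchange itself can be tried without scikit-learn. The sketch below shows what the caller sends and what the Python side sees after `json.loads`; the sample values and the `describe` helper are hypothetical and only illustrate the shape of the data.

```python
import json

# Hypothetical sample: four observations with three features each.
rows = [[2.5, 2.4, 0.5], [0.5, 0.7, 1.9], [2.2, 2.9, 0.8], [1.9, 2.2, 1.1]]

# What the caller sends: one JSON string instead of a typed array.
json_text = json.dumps(rows)

# What the Python side sees after json.loads: plain nested lists of floats.
decoded = json.loads(json_text)


def describe(data):
    """Return (n_samples, n_features), rejecting empty or ragged input early."""
    widths = {len(row) for row in data}
    if len(widths) != 1:
        raise ValueError("expected a non-empty list of equal-length rows")
    return len(data), widths.pop()


print(describe(decoded))
```

A shape check like this is optional, but it gives a clearer error than the one numpy or sklearn would raise for malformed input.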

Call Python from Nim with nimpy

First, specify the path to libpython so that the sklearn installed with poetry can be loaded.

import nimpy
import nimpy/py_lib as pyLib

pyLib.pyInitLibPath("/root/.pyenv/versions/3.7.7/lib/libpython3.7m.so")

Also, add the directory that contains the file to sys.path so that the Python module you just created can be imported.

discard pyImport("sys").path.append("/workspace/src")
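
The same mechanism can be observed from plain Python. This is a small sketch, assuming a throwaway module named `demo_stats` (a hypothetical name) written to a temporary directory: once the directory is on `sys.path`, the module is imported by its name without the `.py` suffix. Because the directory is appended, entries earlier in `sys.path` win, so a file that shares its name with an installed package may not be the one that gets imported.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_dir(directory: str, module_name: str):
    """Make `directory` importable, then import `module_name` (no .py suffix)."""
    if directory not in sys.path:
        sys.path.append(directory)
    importlib.invalidate_caches()  # pick up files created after startup
    return importlib.import_module(module_name)


# Hypothetical module written to a temporary directory for the demonstration.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "demo_stats.py").write_text("def double(x):\n    return 2 * x\n")
    demo = import_from_dir(tmp, "demo_stats")
    print(demo.double(21))
    sys.path.remove(tmp)  # leave sys.path as it was
```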

Then prepare the data as JSON and pass it to the function defined earlier. toJson converts the returned PyObject to a JsonNode.

# pyImport takes the module name, without the .py extension
let pcaResult = pyImport("pca").callMethod("pca", json).toJson

All that remains is to unpack the values in pcaResult, here by projecting each row of the input data onto the principal components.

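import json, sequtils, sugar

# datas is assumed to be the seq[seq[float]] that was serialized and sent to Python.
# Each row is projected onto every principal component with a dot product.
let components = pcaResult["components"].getElems
let projectedValues = datas.map(data =>
    components.map(c => zip(c.getElems, data).map(n => n[0].getFloat * n[1]).foldl(a + b))
)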
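
For comparison, the same projection can be written in plain Python. This is a sketch with made-up values, not output from the PCA above. One difference is worth knowing: sklearn's own `PCA.transform` subtracts the per-feature mean before projecting, while the Nim expression multiplies the raw values, so the two results differ by a constant offset per component. The `center` helper shows that extra step.

```python
from typing import List, Sequence


def project(rows: Sequence[Sequence[float]],
            components: Sequence[Sequence[float]]) -> List[List[float]]:
    """Project each row onto each component with a plain dot product."""
    return [
        [sum(c * x for c, x in zip(component, row)) for component in components]
        for row in rows
    ]


def center(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Subtract the per-column mean, as PCA.transform does before projecting."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


# Hypothetical values: two axis-aligned components over three features.
components = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
rows = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]

print(project(rows, components))          # [[1.0, 2.0], [3.0, 4.0]]
print(project(center(rows), components))  # [[-1.0, -1.0], [1.0, 1.0]]
```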

Summary

I got stuck on how to specify the module for pyImport and on a name collision with the Python file, but with the approach above I was able to load the Python code. Applying the same pattern, other statistical or linear algebra processing can be delegated to Python just as easily.

Principal component analysis using python from nim with nimpy