How are you holding up in the whirlwind of the coronavirus? Since staying home has freed up time I can't usually spare for study and experiments, I decided to try faiss from Go and Rust, two languages I recently started learning. Faiss is a nearest-neighbor search library from Facebook Research that I'm fond of, and one I also covered in "Introduction of a little niche function of faiss".
I'll implement the same benchmark in Python, Go, and Rust, in that order.
Python
First, environment construction. If you follow the official Installation Guide, the module installs without any problem. Installing with conda is also documented, but I have bitter memories of conda environments, so I built from source.
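As a sanity check on what the benchmark below measures: IndexFlatL2 performs an exhaustive, exact L2 search over all stored vectors. Here is a minimal NumPy sketch of that same computation (my own illustration, not faiss code):

```python
import numpy as np

def brute_force_l2_search(xb, xq, k):
    # Squared L2 distance from every query to every database vector.
    dists = ((xq[:, None, :] - xb[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k smallest distances per query, ascending.
    ids = np.argsort(dists, axis=1)[:, :k]
    return np.take_along_axis(dists, ids, axis=1), ids

rng = np.random.default_rng(0)
xb = rng.random((1000, 32), dtype=np.float32)  # database vectors
xq = rng.random((5, 32), dtype=np.float32)     # query vectors
D, I = brute_force_l2_search(xb, xq, 3)
print(I.shape)  # (5, 3)
```

This is O(nb * nq * d) per search, which is why the measured search time should grow linearly with the number of database vectors.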
Next, the source code. As with Go and Rust later, I wanted to analyze performance afterwards, so the logs are output as JSON. Also, unless you release memory in each iteration, memory usage keeps growing, so I put del for each variable and gc.collect() at the end of the loop to force the memory to be freed.
main.py
import gc
import logging
import sys
from time import time

import faiss
import numpy as np
from pythonjsonlogger import jsonlogger


def elapsed(f, *args, **kwargs):
    start = time()
    f(*args, **kwargs)
    elapsed_time = time() - start
    return elapsed_time


if __name__ == '__main__':
    # Prepare log.
    logger = logging.getLogger()
    formatter = jsonlogger.JsonFormatter('(levelname) (asctime) (pathname) (lineno) (message)')
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)

    # Define basic information.
    d = 512
    N = int(1e6)
    nd = 10
    nbs = np.arange(N / nd, N + 1, N / nd).astype(int)
    nq = 10
    k = 5

    # Start measuring performance.
    for i in range(100):
        for nb in nbs:
            # Prepare data.
            xb = np.random.rand(nb, d).astype(np.float32)
            xq = np.random.rand(nq, d).astype(np.float32)
            # Construct index.
            index = faiss.IndexFlatL2(d)
            # Evaluate performance.
            elapsed_add = elapsed(index.add, xb)
            elapsed_search = elapsed(index.search, xq, k)
            # Log performance.
            logger.info('end one iteration.', extra={
                'i': i,
                'nb': nb,
                'elapsed_add': elapsed_add,
                'elapsed_search': elapsed_search
            })
            # Force to free memory.
            del xb
            del xq
            del index
            gc.collect()
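Since each iteration is logged as one JSON object per line, the later aggregation can be done with the standard library alone. A sketch (the field names match the extras logged above; the two sample lines are dummy values for illustration, not real measurements):

```python
import json
from collections import defaultdict
from statistics import mean

def summarize(lines):
    # Group (elapsed_add, elapsed_search) pairs by nb and average them.
    groups = defaultdict(list)
    for line in lines:
        rec = json.loads(line)
        groups[rec["nb"]].append((rec["elapsed_add"], rec["elapsed_search"]))
    return {nb: (mean(a for a, _ in v), mean(s for _, s in v))
            for nb, v in sorted(groups.items())}

# Dummy log lines for illustration only.
sample = [
    '{"message": "end one iteration.", "i": 0, "nb": 100000, "elapsed_add": 0.05, "elapsed_search": 0.11}',
    '{"message": "end one iteration.", "i": 1, "nb": 100000, "elapsed_add": 0.07, "elapsed_search": 0.09}',
]
print(summarize(sample))
```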
Go
Next is Go, again starting with environment construction. For Go there was no definitive faiss wrapper, so I decided to use zhyon404/faiss, which seemed to wrap faiss most simply. That repository provides the environment via Docker, so I followed the README and ran docker build to create the environment.
Next, the source code. Go also uses logrus.JSONFormatter to output logs as JSON, and likewise frees memory in each iteration. In particular, the faiss index's memory is allocated on the C side, so it was not freed easily; I had a hard time until I found the function faiss_go.FaissIndexFree.
main.go
package main

import (
	"math/rand"
	"os"
	"runtime"
	"runtime/debug"
	"time"

	"github.com/facebookresearch/faiss/faiss_go"
	log "github.com/sirupsen/logrus"
)

func main() {
	// Prepare log.
	log.SetFormatter(&log.JSONFormatter{})
	log.SetOutput(os.Stdout)
	// Define basic information.
	d := 512
	nbs := []int{1e5, 2e5, 3e5, 4e5, 5e5, 6e5, 7e5, 8e5, 9e5, 1e6}
	nq := 10
	k := 5
	// Start measuring performance.
	for i := 0; i < 100; i++ {
		for _, nb := range nbs {
			// Prepare data.
			xb := make([]float32, d*nb)
			xq := make([]float32, d*nq)
			for i := 0; i < nb; i++ {
				for j := 0; j < d; j++ {
					xb[d*i+j] = rand.Float32()
				}
			}
			for i := 0; i < nq; i++ {
				for j := 0; j < d; j++ {
					xq[d*i+j] = rand.Float32()
				}
			}
			// Construct index.
			v := new(faiss_go.Faissindexflatl2)
			faiss_go.FaissIndexflatl2NewWith(&v, d)
			index := (*faiss_go.Faissindex)(v)
			// Evaluate performance.
			add_start := time.Now()
			faiss_go.FaissIndexAdd(index, nb, xb)
			add_end := time.Now()
			I := make([]int, k*nq)
			D := make([]float32, k*nq)
			search_start := time.Now()
			faiss_go.FaissIndexSearch(index, nq, xq, k, D, I)
			search_end := time.Now()
			// Log performance.
			log.WithFields(log.Fields{
				"i":              i,
				"nb":             nb,
				"elapsed_add":    add_end.Sub(add_start).Seconds(),
				"elapsed_search": search_end.Sub(search_start).Seconds(),
			}).Info("end one iteration.")
			// Force to free memory.
			faiss_go.FaissIndexFree(index)
			runtime.GC()
			debug.FreeOSMemory()
		}
	}
}
Rust
Finally, Rust. Again, it starts with environment construction. As the faiss wrapper I decided to use Enet4/faiss-rs, which is also published on Docs.rs. Basically, you can install it by following the README:
This will result in the dynamic library faiss_c ("libfaiss_c.so" in Linux), which needs to be installed in a place where your system will pick up. In Linux, try somewhere in the LD_LIBRARY_PATH environment variable, such as "/usr/lib", or try adding a new path to this variable.
Don't forget to add the path to the library. Depending on the environment, it also seems necessary to add it to LIBRARY_PATH.
Next, the source code. This also uses json_logger to output logs as JSON. I defined a struct following the sample, though I wonder if there is a better way. Also, Rust's default random number generation was slow enough to get in the way of the measurement, so I switched to rand_xorshift, referring to "Difference in processing speed when generating random numbers with Rust". What was interesting is that, unlike Python and Go, I could implement this without thinking about memory release at all, even though allocation in the C area is involved.
Cargo.toml
[package]
name = "rust"
version = "0.1.0"
authors = []
edition = "2018"

[dependencies]
faiss = "0.8.0"
json_logger = "0.1"
log = "0.4"
rand = "0.7"
rand_xorshift = "0.2"
rustc-serialize = "0.3"
main.rs
use faiss::{index_factory, Index, MetricType};
use log::{info, LevelFilter};
use rand::{RngCore, SeedableRng};
use rand_xorshift::XorShiftRng;
use rustc_serialize::json;
use std::time::Instant;

#[derive(RustcEncodable)]
struct LogMessage<'a> {
    msg: &'a str,
    i: i32,
    nb: i32,
    elapsed_add: f32,
    elapsed_search: f32,
}

fn main() {
    // Prepare log.
    json_logger::init("faiss", LevelFilter::Info).unwrap();
    // Define basic information.
    let d: i32 = 512;
    const N: i32 = 1_000_000;
    let nd: i32 = 10;
    let nbs: Vec<i32> = (N / nd..N + 1).step_by((N / nd) as usize).collect();
    let nq: i32 = 10;
    let k: usize = 5;
    let mut rng: XorShiftRng = SeedableRng::from_entropy();
    // Start measuring performance.
    for i in 0..100 {
        for &nb in nbs.iter() {
            // Prepare data.
            let xb: Vec<f32> = (0..nb * d)
                .map(|_| rng.next_u32() as f32 / u32::max_value() as f32)
                .collect();
            let xq: Vec<f32> = (0..nq * d)
                .map(|_| rng.next_u32() as f32 / u32::max_value() as f32)
                .collect();
            // Construct index.
            let mut index = index_factory(d as u32, "Flat", MetricType::L2).unwrap();
            // Evaluate performance.
            let start = Instant::now();
            index.add(&xb).unwrap();
            let elapsed_add = start.elapsed().as_micros() as f32 / 1e6;
            let start = Instant::now();
            index.search(&xq, k).unwrap();
            let elapsed_search = start.elapsed().as_micros() as f32 / 1e6;
            // Log performance.
            info!("{}", json::encode(&LogMessage {
                msg: "end one iteration.", i, nb, elapsed_add, elapsed_search
            }).unwrap());
        }
    }
}
Since Python, Go, and Rust all just wrap faiss's C API, I expected no performance difference, but having implemented all three I decided to measure anyway. OpenBLAS was used as the matrix-operation library, the environment was an AWS EC2 m5.large, and the AMI was Canonical, Ubuntu, 18.04 LTS, amd64 bionic image build on 2020-01-12.
The graphs below show the average processing time of add and search for each number of database vectors, varying the number of vectors from $10^5$ to $10^6$ and running 100 iterations each.
(Figure: mean add time vs. number of vectors)
(Figure: mean search time vs. number of vectors)
In every language the processing time increases linearly with the number of vectors. For add there was almost no difference between the three languages, but for search only Python performed better. I was a little surprised, since I had expected no difference between such similar wrappers. To see how this gap changes with the number of vectors, I also plotted the ratio of Go (and Rust) processing time to Python processing time: Go and Rust are slower than Python by an average factor of about 1.44, regardless of the number of vectors.
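The ratio plot itself is just an elementwise division of the per-nb mean times. A sketch of that computation (the numbers here are made up to illustrate the calculation; the real measured values are only in the graphs above):

```python
import numpy as np

nbs = np.arange(1e5, 1e6 + 1, 1e5).astype(int)
# Hypothetical mean search times in seconds -- illustration only.
python_mean = np.linspace(0.10, 1.00, len(nbs))
go_mean = 1.44 * python_mean  # constructed so the ratio is flat

ratio = go_mean / python_mean
print(round(float(ratio.mean()), 2))  # 1.44
```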
I tried faiss, the nearest-neighbor search library from Facebook Research, from Python, Go, and Rust, and measured the performance. Using the same library from different languages, it was interesting to see each language's character come through. In particular, I found it encouraging that Rust properly frees even the memory allocated in the C area; languages have evolved considerably since the days when everything was written in C.
Performance-wise, Go and Rust were about the same, and only Python was about 1.44 times faster on search. That surprised me, since all three are wrappers around the same C API and I had assumed identical performance. Only Python is officially supported, and the build path differs between Python and Go/Rust, so I suspect that is involved. It would be interesting to dig deeper into this next time!