Python, R, and Julia each have their own strengths, so there are many situations where you want to use them together. There are ways to call one language from another directly, but in data analysis it is often enough to let separate scripts handle different steps without coupling them so tightly. For example, it is easy to imagine scraping data in Python, running a multithreaded analysis in Julia, and doing the statistical analysis and visualization in R.
Why Feather?
In such a workflow, data saved with pickle in Python cannot be read from other programming languages, while CSV is slow to read and write and forces you to reparse the types every time you load it. This article briefly introduces the Feather format, which solves these workflow problems, and shows how to use it. Feather is a lightweight format for storing data with a simple API; it moves freely between programming languages and is fast to read and write.
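To make the CSV problem concrete, here is a minimal sketch, assuming pandas is installed. The column names and values are made up for illustration. It writes a small table to CSV in memory and reads it back, showing that type information is not preserved unless you reparse it yourself.

```python
import io

import pandas as pd

# Hypothetical example data: a datetime column and a text column with leading zeros.
df = pd.DataFrame(
    {
        "when": pd.to_datetime(["2020-01-01", "2020-01-02"]),
        "code": ["007", "042"],
    }
)

# Round-trip through CSV using an in-memory buffer instead of a file.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

# The "when" column comes back as plain text rather than datetimes,
# and "code" is inferred as integers, so the leading zeros are lost.
print(df.dtypes)
print(restored.dtypes)
print(restored["code"].tolist())
```

Restoring the original types would require passing options such as parse_dates and dtype to pd.read_csv on every read, which is the reparsing chore mentioned above.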
According to a comparison article, Feather performs very well in both speed and memory consumption. Actual performance depends on the kind of data you store, but the format is so easy to use that it is well worth a try.
**The Feather format does not support row labels. If you use row labels in pandas, you need to call df.reset_index() beforehand.** I don't think row labels are used much in R anyway, and some say they are not recommended.
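The row-label caveat can be sketched as follows, assuming pandas is installed. The helper names prepare_for_feather and write_with_labels, the file path, and the example data are hypothetical; feather.write_dataframe is the same call used in this article.

```python
import pandas as pd


def prepare_for_feather(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose row labels are stored as an ordinary column."""
    # reset_index() moves the index into a column and installs a default 0..n-1 index.
    return df.reset_index()


def write_with_labels(df: pd.DataFrame, path: str) -> None:
    """Write df to a Feather file, keeping the row labels as a column."""
    # Imported lazily so the helper above works even without the feather package.
    import feather

    feather.write_dataframe(prepare_for_feather(df), path)


# Hypothetical example data: city names used as row labels.
df = pd.DataFrame(
    {"population": [100, 200]},
    index=pd.Index(["tokyo", "osaka"], name="city"),
)

flat = prepare_for_feather(df)
print(list(flat.columns))  # ['city', 'population']
```

Calling write_with_labels(df, "foobar.feather") would then store the city names as a regular column, so R or Julia can read them like any other field.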
Python
python.py
import pandas as pd
import feather
# Read a Feather file into a pandas DataFrame
df = feather.read_dataframe("foobar.feather")
# Write the DataFrame to a Feather file
feather.write_dataframe(df, "foobar.feather")
R
r.r
library(feather)
# Read a Feather file into a data frame
df <- read_feather("foobar.feather")
# Write the data frame to a Feather file
write_feather(df, "foobar.feather")
Julia
julia.jl
using DataFrames
using Feather
# Read a Feather file into a DataFrame
df = Feather.read("foobar.feather")
# Write the DataFrame to a Feather file (note: the path comes first)
Feather.write("foobar.feather", df)
That's all there is to it. I think it is easier than CSV because you can read and write from any of these languages without worrying about types and headers.
Postscript: Feather V2 has recently been released. I have not yet found a corresponding package for Julia, so I do not cover it here. The content of this article applies to Feather V1.