Large DataFrames are conveniently saved with pickle, but a large pickle file may be impossible to push to git because of size limits. I therefore decided to compress the data with joblib so that it can be managed in git as well.
Saving and loading with pickle
save_pickle.py
import pandas as pd
df = pd.DataFrame([1,2,3])
df.to_pickle('df.pickle')
read_pickle.py
import pandas as pd
df = pd.read_pickle('df.pickle')
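Putting the two snippets above together, a quick check (using the same file name) confirms that the round trip preserves the data:

```python
import os
import pandas as pd

df = pd.DataFrame([1, 2, 3])
df.to_pickle('df.pickle')              # save
loaded = pd.read_pickle('df.pickle')   # load
assert loaded.equals(df)               # round trip preserves the data
os.remove('df.pickle')                 # clean up
```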
Here is how to save and load with joblib. You can change the compression level with the compress argument. Compressing too much makes both writing and reading slow, so in my case a level of 4 seemed to be a good balance.
save_joblib.py
import pandas as pd
import joblib
df = pd.DataFrame([1,2,3])
joblib.dump(df, 'df.joblib', compress=4)  # compress takes 0 (none) to 9 (max)
read_joblib.py
import pandas as pd
import joblib
df = joblib.load('df.joblib')
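To pick a compression level for your own data, a minimal sketch (file names and the toy DataFrame are arbitrary) that compares file size and write time across a few compress levels:

```python
import os
import time

import joblib
import pandas as pd

# Toy data; a genuinely large DataFrame shows the trade-off more clearly
df = pd.DataFrame({'a': range(100_000), 'b': range(100_000)})

for level in [0, 4, 9]:
    path = f'df_{level}.joblib'
    start = time.time()
    joblib.dump(df, path, compress=level)   # write with this compression level
    elapsed = time.time() - start
    size = os.path.getsize(path)            # resulting file size in bytes
    print(f'compress={level}: {size} bytes, {elapsed:.3f} s')
    os.remove(path)                         # clean up
```

Higher levels shrink the file but cost more time to write and read, which is why a middle value like 4 can be the sweet spot.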