I use these often, so I'm leaving this here as a note.
joblib and pickle are **libraries that can save almost any Python object to a file and load it back**. They work not only for text- or CSV-like data but also for saving trained models. Reading and writing also feel fast to me (though they seem to use a fair amount of memory).
In general, joblib seems to be more memory-efficient than pickle, particularly for objects that hold large NumPy arrays.
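One place where joblib can clearly save memory is with large NumPy arrays. If an array is dumped without compression, `joblib.load(..., mmap_mode="r")` returns a read-only memory-mapped array, so values are paged in from disk when accessed instead of being copied into RAM all at once. A minimal sketch, assuming NumPy is installed; the file name `big_array.joblib` and the array size are arbitrary choices for illustration.

```python
import os
import tempfile

import joblib
import numpy as np

# About 8 MB of float64, large enough that copying it is not free
arr = np.arange(1_000_000, dtype=np.float64)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "big_array.joblib")

    # Dump WITHOUT compression: memory-mapping only works on uncompressed files
    joblib.dump(arr, path)

    # mmap_mode="r" gives a read-only np.memmap backed by the file on disk
    mapped = joblib.load(path, mmap_mode="r")

    is_memmap = isinstance(mapped, np.memmap)
    total = float(mapped.sum())  # data is read from disk as it is needed
    del mapped  # release the file before the temporary directory is removed

print(is_memmap)                  # True
print(total == float(arr.sum()))  # True
```

With a compressed dump (for example `compress=3`), `mmap_mode` has no effect and the array is loaded fully into memory, so this is a trade-off between file size and memory use.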
import pandas as pd
arr = ['a','b','c','d','e']
df = pd.DataFrame({'data':arr})
df.head(5)
#   data
# 0    a
# 1    b
# 2    c
# 3    d
# 4    e
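import joblib
# Save the DataFrame (compress=3: compression level from 0 to 9; higher means a smaller file
# but slower reads and writes, and 3 is often a good compromise)
joblib.dump(df, 'test_jb.pkl', compress=3)
# Load it back
load_df = joblib.load('test_jb.pkl')
load_df.head()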
#   data
# 0    a
# 1    b
# 2    c
# 3    d
# 4    e
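The `compress` argument trades speed for file size. A rough sketch for checking the effect on your own data: it dumps the same DataFrame with no compression (`compress=0`) and with `compress=3`, compares the file sizes, and confirms that both load back to an identical DataFrame. The repetitive 100,000-row column is an assumption chosen so that compression has something to work with; real ratios depend on the data.

```python
import os
import tempfile

import joblib
import pandas as pd

# Highly repetitive data compresses well; real-world ratios will vary
df = pd.DataFrame({"data": ["a", "b", "c", "d", "e"] * 20_000})

with tempfile.TemporaryDirectory() as tmpdir:
    sizes = {}
    for level in (0, 3):
        path = os.path.join(tmpdir, f"test_jb_{level}.pkl")
        joblib.dump(df, path, compress=level)
        sizes[level] = os.path.getsize(path)

        # The loaded frame must match the original exactly (raises otherwise)
        pd.testing.assert_frame_equal(joblib.load(path), df)

# File size in bytes per compression level; the compressed file should be smaller
print(sizes)
```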
import pandas as pd
# Save with pandas' built-in pickle support (uses Python's pickle module under the hood)
df.to_pickle('test_pk.pkl')
# Load it back
load_df2 = pd.read_pickle('test_pk.pkl')
load_df2.head()
#   data
# 0    a
# 1    b
# 2    c
# 3    d
# 4    e
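`DataFrame.to_pickle` and `pd.read_pickle` are pandas' own wrappers; the standard-library `pickle` module can do the same for any picklable Python object. A minimal sketch using only the standard library; the file name `test_std.pkl` and the sample dictionary are illustrative. Only unpickle files you trust, since loading a pickle can execute arbitrary code.

```python
import pickle

# Any picklable object works: here a dict mixing a list, a tuple and a set
data = {"letters": ["a", "b", "c", "d", "e"], "shape": (5, 1), "tags": {"x", "y"}}

# Pickle files must be opened in binary mode ("wb" to save, "rb" to load)
with open("test_std.pkl", "wb") as f:
    pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)

with open("test_std.pkl", "rb") as f:
    loaded = pickle.load(f)

print(loaded == data)  # True
```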
If you pass a trained model instead of the DataFrame, the model is saved as-is in exactly the same way.
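A minimal sketch of that, assuming NumPy is installed: a straight line fitted with `np.polyfit` and wrapped in `np.poly1d` stands in for a trained model here (a real project would more likely save something like a scikit-learn estimator, using the same `joblib.dump` / `joblib.load` calls). The file name `model_jb.pkl` is arbitrary. If the model's class is defined in your own code, that class must be importable when loading, because pickle stores a reference to the class rather than its code.

```python
import joblib
import numpy as np

# Training data generated from y = 2x + 1
x = np.arange(10, dtype=float)
y = 2 * x + 1

# "Train" a model: least-squares fit of a degree-1 polynomial.
# np.poly1d wraps the fitted coefficients in a callable that makes predictions.
model = np.poly1d(np.polyfit(x, y, deg=1))

# Save the trained model with the same call used for the DataFrame
joblib.dump(model, "model_jb.pkl", compress=3)

# Load it back and predict straight away, with no retraining
loaded_model = joblib.load("model_jb.pkl")
print(loaded_model(20.0))                               # close to 41.0
print(np.allclose(loaded_model.coeffs, model.coeffs))   # True
```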