Large DataFrames are conveniently saved with pickle, but a large pickle file may be impossible to push to git because of size limits. I therefore decided to compress the data with joblib so that it can be managed in git as well.
Saving and loading with pickle
save_pickle.py
import pandas as pd
df = pd.DataFrame([1,2,3])
df.to_pickle('df.pickle')
read_pickle.py
import pandas as pd
df = pd.read_pickle('df.pickle')
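Putting the two snippets above together, a quick check (using the same file name) confirms that the round trip preserves the data:

```python
import os
import pandas as pd

df = pd.DataFrame([1, 2, 3])
df.to_pickle('df.pickle')              # save
loaded = pd.read_pickle('df.pickle')   # load
assert loaded.equals(df)               # round trip preserves the data
os.remove('df.pickle')                 # clean up
```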
Here is how to save and load with joblib. You can change the compression level with the compress argument. Compressing too much makes both writing and reading slow, so in my case a level of 4 seemed to be a good balance.
save_joblib.py
import pandas as pd
import joblib
df = pd.DataFrame([1,2,3])
joblib.dump(df, 'df.joblib', compress=4)  # compress takes 0 (none) to 9 (max)
read_joblib.py
import pandas as pd
import joblib
df = joblib.load('df.joblib')
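To pick a compression level for your own data, a minimal sketch (file names and the toy DataFrame are arbitrary) that compares file size and write time across a few compress levels:

```python
import os
import time

import joblib
import pandas as pd

# Toy data; a genuinely large DataFrame shows the trade-off more clearly
df = pd.DataFrame({'a': range(100_000), 'b': range(100_000)})

for level in [0, 4, 9]:
    path = f'df_{level}.joblib'
    start = time.time()
    joblib.dump(df, path, compress=level)   # write with this compression level
    elapsed = time.time() - start
    size = os.path.getsize(path)            # resulting file size in bytes
    print(f'compress={level}: {size} bytes, {elapsed:.3f} s')
    os.remove(path)                         # clean up
```

Higher levels shrink the file but cost more time to write and read, which is why a middle value like 4 can be the sweet spot.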