pickle is the standard-library module that saves Python objects as binary data: https://docs.python.org/ja/3/library/pickle.html
Loading is fast: because the data is already binary, no text parsing is needed. Trained models can also be pickled and reused.
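To make the "pickled and reused" point concrete, here is a minimal sketch that uses only the standard library. The `model_params` dictionary is a hypothetical stand-in for a trained model and does not come from this article; a real model object would be dumped and loaded the same way.

```python
import pickle

# Hypothetical stand-in for a trained model: just its learned parameters
model_params = {'weights': [0.5, -1.2, 3.0], 'bias': 0.1}

# Serialize to bytes, then restore the object from those bytes
payload = pickle.dumps(model_params)
restored = pickle.loads(payload)

print(type(payload).__name__)    # bytes
print(restored == model_params)  # True
```

`pickle.dumps` and `pickle.loads` work on bytes in memory, while `pickle.dump` and `pickle.load` do the same thing against an open file.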
This benchmark article is excellent: "Python: I investigated the persistence format of pandas"
To start, let's turn train.csv into a pickle. This is all the code you need:
# pickle is part of the standard library, so nothing needs to be installed
import pickle
import pandas as pd

train = pd.read_csv('../input/titanic/train.csv')

# Open the file with 'wb' (write binary)
with open('train.pickle', 'wb') as f:
    pickle.dump(train, f)
First, commit the notebook.
When the green Complete appears in the upper left, click Open Version.
Scroll to the Output section.
If you can see train.pickle there, click New Dataset.
Enter a Dataset title of your choice and create it.
The Dataset is now complete.
To use it, create a new notebook and click + Add Data.
Filter by Your Datasets.
Add the dataset you just made.
If it is displayed here, you are done.
Loading it also takes only this code:
# Open the file with 'rb' (read binary)
with open('../input/titanicdatasetpickles/train.pickle', 'rb') as f:
    train = pickle.load(f)
It is properly loaded as a DataFrame.
train.shape
# (891, 12)
!ls ../input
# titanicdatasetpickles
Let's make the dump process reusable
dump_pickles.py
import os
import pickle
import pandas as pd

# Switch the input path between Kaggle and other environments.
# os.getcwd() is used instead of IPython's _dh so this also runs as a plain script.
if os.getcwd().startswith('/kaggle/working'):
    input_path = '../input'
else:
    input_path = './input'

# Rewrite only this part for each competition
data_sets = {
    'train': f'{input_path}/titanic/train.csv',
    'test': f'{input_path}/titanic/test.csv',
    'gender_submission': f'{input_path}/titanic/gender_submission.csv',
}

for name, path in data_sets.items():
    df = pd.read_csv(path)
    with open(f'{name}.pickle', 'wb') as f:
        pickle.dump(df, f)
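The loading side can be made reusable in the same spirit. The sketch below is one possible counterpart, not part of the original script: `load_pickles` is a hypothetical helper name, and it assumes every file ending in `.pickle` in the given directory should be loaded.

```python
import pickle
from pathlib import Path


def load_pickles(directory):
    """Load every *.pickle file in `directory` into a dict keyed by file stem."""
    loaded = {}
    for path in sorted(Path(directory).glob('*.pickle')):
        with open(path, 'rb') as f:
            loaded[path.stem] = pickle.load(f)
    return loaded
```

With the Dataset from this article, usage would look like `frames = load_pickles('../input/titanicdatasetpickles')` followed by `train = frames['train']`.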
# With pandas, this
with open('./train.pickle', 'wb') as f:
    pickle.dump(train, f)

# can be written like this
train.to_pickle('./train.pickle')

# and this
with open('../input/titanicdatasetpickles/train.pickle', 'rb') as f:
    train = pickle.load(f)

# can be written like this
train = pd.read_pickle('../input/titanicdatasetpickles/train.pickle')
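As a quick sanity check of `to_pickle` and `read_pickle`, the sketch below round-trips a tiny DataFrame through a temporary directory and compares it with a CSV round trip. The column names and values are made up for illustration and are not the Titanic data; the point is only that the pickle comes back with its dtypes intact, while the CSV needs re-parsing.

```python
import os
import tempfile

import pandas as pd

# Made-up sample data for illustration (not the Titanic data)
df = pd.DataFrame({
    'PassengerId': [1, 2, 3],
    'Boarded': pd.to_datetime(['1912-04-10', '1912-04-10', '1912-04-11']),
})

with tempfile.TemporaryDirectory() as tmp:
    pickle_path = os.path.join(tmp, 'sample.pickle')
    csv_path = os.path.join(tmp, 'sample.csv')

    # Write the same frame in both formats
    df.to_pickle(pickle_path)
    df.to_csv(csv_path, index=False)

    # Read both back without any extra parsing options
    from_pickle = pd.read_pickle(pickle_path)
    from_csv = pd.read_csv(csv_path)

# The pickle restores the datetime column as-is; the CSV returns it as text
pickle_keeps_dtypes = from_pickle.dtypes.equals(df.dtypes)
csv_keeps_dtypes = from_csv.dtypes.equals(df.dtypes)
print(pickle_keeps_dtypes, csv_keeps_dtypes)
```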
ModuleNotFoundError: No module named 'pandas.core.internals.managers'; 'pandas.core.internals' is not a package
It seems to be a problem with the pandas version. Upgrading pandas solved it:
pip install -U pandas
This article saved me: "Inconsistency between pickle and pandas"
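If this kind of error is likely to come up again, one option is to wrap the load so the failure points at the probable cause. This is only a sketch: `load_pickle_with_hint` is a hypothetical helper, and the assumption that a missing module or attribute means a library version mismatch is a guess that fits the error above, not a guarantee.

```python
import pickle


def load_pickle_with_hint(path):
    """Load a pickle and add a hint when a class or module cannot be resolved."""
    try:
        with open(path, 'rb') as f:
            return pickle.load(f)
    except (ModuleNotFoundError, AttributeError) as exc:
        # Assumption: an unresolvable module or class usually means the file
        # was written by a different version of some library (e.g. pandas)
        raise RuntimeError(
            f'Could not unpickle {path}: {exc}. '
            'The file may have been written by a different library version; '
            'try aligning versions (for example, pip install -U pandas).'
        ) from exc
```

The original exception stays attached through `from exc`, so the full traceback is still available when debugging.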
Thank you for reading to the end