- numpy 1.16.3 or later
Python code example
np.load('/path/to/file.npy')
Examples of errors that occur
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-37-1db66562b57b> in <module>
----> 1 np.load('tmp.npy')
~/venv/aep/lib/python3.7/site-packages/numpy/lib/npyio.py in load(file, mmap_mode, allow_pickle, fix_imports, encoding)
451 else:
452 return format.read_array(fid, allow_pickle=allow_pickle,
--> 453 pickle_kwargs=pickle_kwargs)
454 else:
455 # Try a pickle
~/venv/aep/lib/python3.7/site-packages/numpy/lib/format.py in read_array(fp, allow_pickle, pickle_kwargs)
720 # The array contained Python objects. We need to unpickle the data.
721 if not allow_pickle:
--> 722 raise ValueError("Object arrays cannot be loaded when "
723 "allow_pickle=False")
724 if pickle_kwargs is None:
ValueError: Object arrays cannot be loaded when allow_pickle=False
Since numpy v1.16.3, the behavior of the `numpy.load()` function has changed.
Before the change | After the change |
---|---|
The default value of the `allow_pickle` option is `True` | The default value of the `allow_pickle` option is `False` |
After confirming that none of the **security concerns** described later apply, specify the `allow_pickle` option as shown below.
np.load('/path/to/file.npy', allow_pickle=True)
dtype
A numpy array (`np.ndarray`) can store strings and Python objects as well as numbers. The type of the stored values is reflected in its `dtype` attribute.
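As a small illustration of the three cases mentioned above, the following sketch builds one array of each kind and prints its `dtype`. The variable names are made up for this example; `dtype.hasobject` is used here only as a convenient way to see which arrays hold Python objects.

```python
import numpy as np

# Numbers: stored with a fixed-size numeric dtype.
numbers = np.array([1.0, 2.5, 3.0])

# Strings: stored with a fixed-width Unicode dtype (kind "U").
strings = np.array(["a", "bc"])

# Arbitrary Python objects: dtype=object, the array holds references.
objects = np.array([{"key": 1}, None], dtype=object)

for name, arr in [("numbers", numbers), ("strings", strings), ("objects", objects)]:
    # hasobject is True only when the array contains Python objects,
    # which is the case that needs pickle when saving and loading.
    print(name, arr.dtype, arr.dtype.hasobject)
```

Only the last array is the kind that the `allow_pickle` option is about.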
In numpy v1.16.0, a vulnerability was reported that could allow malicious code to be executed when a file containing a serialized numpy array of Python objects is loaded with `np.load()`. (There is, however, a counterargument regarding this vulnerability.)
Therefore, from v1.16.3, the default behavior of `np.load()` was changed as described above: if the `dtype` is a Python object and `allow_pickle=False`, a `ValueError` is raised.
In other words, it is a specification change that moves the default to the safer side.
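The behavior can be checked with a short round trip. This is a minimal sketch that assumes numpy 1.16.3 or later; `roundtrip` is a helper made up for this example, and an in-memory buffer stands in for a `.npy` file.

```python
import io

import numpy as np


def roundtrip(arr, **load_kwargs):
    """Save arr to an in-memory buffer and read it back with np.load."""
    buf = io.BytesIO()
    np.save(buf, arr)  # object arrays are pickled at this point
    buf.seek(0)
    return np.load(buf, **load_kwargs)


numeric = np.arange(3)
obj = np.array([{"a": 1}, "text"], dtype=object)

# Numeric data never needs pickle, so the default load works.
loaded_numeric = roundtrip(numeric)

# Object data is rejected by the default (allow_pickle=False since v1.16.3).
try:
    roundtrip(obj)
    default_load_error = None
except ValueError as exc:
    default_load_error = str(exc)

# Opting in explicitly restores the old behavior. Only do this for trusted data.
loaded_obj = roundtrip(obj, allow_pickle=True)

print(default_load_error)
print(loaded_obj)
```

Arrays with numeric or string dtypes are unaffected by the change; only `dtype=object` data hits the `ValueError`.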
Naturally, do not call `np.load()` with `allow_pickle=True` on **untrusted files**. As mentioned in the previous section, doing so makes arbitrary code execution possible.
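To make the risk concrete, here is a deliberately harmless sketch of why loading pickled object arrays can run code. The `Payload` class is invented for this example and only calls `print`; the point is that the file, not the caller, decides which callable runs during loading.

```python
import io

import numpy as np


class Payload:
    """Stands in for attacker-controlled content; the callable here is harmless."""

    def __reduce__(self):
        # On load, pickle calls the returned callable with these arguments.
        # A malicious file could name any importable callable instead of print.
        return (print, ("this ran during np.load",))


arr = np.empty(1, dtype=object)
arr[0] = Payload()

buf = io.BytesIO()
np.save(buf, arr)
buf.seek(0)

# The message is printed while the file is being loaded.
loaded = np.load(buf, allow_pickle=True)

# The element is whatever the callable returned (None for print), not a Payload.
print(loaded[0])
```

With `allow_pickle=False`, the same buffer is rejected with a `ValueError` before anything is unpickled.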
For ad hoc code, such as data wrangling in Jupyter or machine learning experiments, this is usually not a problem.[^1] Application developers who use Python, however, should take note.
[^1]: A `*.npy` file handed to you by a malicious colleague (?) would be a problem, though.
Naturally, I consider this a breaking change, because it changes the behavior of applications.
Python's numerical libraries may tend to treat a change of default value as a safe change.[^2] If you assume an upgrade is harmless because it is only a revision bump, you will get hurt. Application engineers who have come from other languages should be careful.
[^2]: Other examples include the default value of `n_estimators` in `sklearn.ensemble.RandomForestClassifier`.
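One possible way to keep application code from depending on a library default is to always pass the option explicitly. The following is only a sketch: `load_npy` and its `trusted` parameter are hypothetical names, not part of numpy.

```python
import io

import numpy as np


def load_npy(file, trusted=False):
    """Hypothetical wrapper that never relies on numpy's default.

    allow_pickle is always passed explicitly, so the behavior stays the
    same regardless of which numpy version is installed.
    """
    return np.load(file, allow_pickle=trusted)


# Usage with an in-memory buffer standing in for a .npy file.
buf = io.BytesIO()
np.save(buf, np.array([1, 2, 3]))
buf.seek(0)
data = load_npy(buf)  # numeric data loads without pickle
print(data)
```

The same idea applies to other libraries: passing a value such as `n_estimators` explicitly means a changed default cannot silently alter results.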