When training or validating a neural network, the data must be arranged as (data, label) pairs. Datasets such as MNIST and CIFAR10 already come in this form, but with your own data you have to build it yourself. It took me a while to figure out how, so I am recording it here.
The basic method has already been answered in English, so if you read English, see the following thread: Convert Pandas dataframe to PyTorch tensor?
This Qiita article adds context and explains the steps in a little more detail, so read only the parts you need.
train_label = torch.tensor(train['target'].values)
Quote: Convert Pandas dataframe to PyTorch tensor?
train_data = torch.tensor(train.drop('target', axis = 1).values)
Quote: Convert Pandas dataframe to PyTorch tensor?
However, this can result in the following error:
Error
TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, int64, int32, int16, int8, uint8, and bool.
As the message says, an np.ndarray whose dtype is object cannot be converted to a torch.tensor. NumPy falls back to the object dtype when the values do not share a single numeric type, for example when the DataFrame mixes numeric and string columns. The data must therefore be cast to one of the supported dtypes listed above. For example, to unify everything as float32, rewrite the line as follows.
train_data = torch.tensor(train.drop('target', axis = 1).values.astype(np.float32))
Quote: Convert Pandas dataframe to PyTorch tensor?
Choose the dtype that suits your data. For the available dtypes, see "NumPy data type dtype list and conversion by astype (cast)".
Alternatively, if you already have a DataFrame without the label column (X_train), convert it to an np.array and then to a tensor.
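To make the error and the fix concrete, here is a minimal sketch. The DataFrame contents and the names features, raw and conversion_failed are made up for illustration and are not from the original answer.

```python
import numpy as np
import pandas as pd
import torch

# Hypothetical toy data: column "b" mixes numbers and a numeric string,
# so pandas stores that column with the object dtype.
train = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [4, "5", 6.0],
    "target": [0, 1, 0],
})

features = train.drop("target", axis=1)
raw = features.values
print(raw.dtype)  # the combined array falls back to the object dtype

# Converting the object array directly fails with the TypeError shown above.
try:
    torch.tensor(raw)
    conversion_failed = False
except TypeError as err:
    conversion_failed = True
    print("TypeError:", err)

# Casting to one numeric dtype first makes the conversion work.
train_data = torch.tensor(raw.astype(np.float32))
print(train_data.dtype, train_data.shape)
```

Note that astype(np.float32) only succeeds when every value can be interpreted as a number. A column of genuinely non-numeric strings would have to be encoded or dropped first.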
train_data = torch.tensor(np.array(X_train.astype('f')))
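As a small sketch of this second route, assuming a hypothetical two-column X_train: the 'f' passed to astype is NumPy's one-character code for float32, so the resulting tensor is a float32 tensor.

```python
import numpy as np
import pandas as pd
import torch

# Hypothetical feature-only DataFrame (no label column).
X_train = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]})

# 'f' is shorthand for np.float32, so this casts every column to float32
# before the values are handed to torch.
train_data = torch.tensor(np.array(X_train.astype('f')))
print(train_data.dtype, train_data.shape)
```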
from torch.utils import data
train_tensor = data.TensorDataset(train_data, train_label)
The torch.utils.data.TensorDataset class takes tensors as arguments and bundles them into one dataset by indexing each tensor along its first dimension. Naturally, you get an error if the number of data samples and the number of labels do not match. Reference: PyTorch official tutorial
That's all.
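The following sketch shows how the resulting dataset behaves. The random data, the batch size and the DataLoader step are assumptions added for illustration; the article itself stops at TensorDataset.

```python
import torch
from torch.utils import data

# Hypothetical data: 6 samples with 3 features each, plus 6 labels.
train_data = torch.randn(6, 3)
train_label = torch.tensor([0, 1, 0, 1, 1, 0])

train_tensor = data.TensorDataset(train_data, train_label)

# Indexing returns one (data, label) pair taken along the first dimension.
x0, y0 = train_tensor[0]
n = len(train_tensor)
print(n, x0.shape, y0)

# A DataLoader can then serve the pairs in mini-batches.
loader = data.DataLoader(train_tensor, batch_size=2, shuffle=True)
batch_shapes = [(tuple(xb.shape), tuple(yb.shape)) for xb, yb in loader]
print(batch_shapes)

# If the first dimensions differ, TensorDataset refuses to build the dataset.
# In current PyTorch this surfaces as an AssertionError.
try:
    data.TensorDataset(train_data, train_label[:5])
    mismatch_rejected = False
except AssertionError:
    mismatch_rejected = True
print(mismatch_rejected)
```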