Divide the dataset (ndarray) into arbitrary proportions with NumPy

Method

Use numpy.split(), passing the list of indices at which the array should be cut.
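numpy.split(ary, indices_or_sections) accepts either an integer N, which splits the array into N equal parts and raises ValueError if the length is not divisible by N, or a sorted list of indices at which to cut. Because arbitrary proportions rarely divide the length evenly, the index-list form is the one used below. A small sketch of both forms on a toy array:

```python
import numpy as np

a = np.arange(10)

# Integer form: 5 equal parts (10 is divisible by 5)
parts = np.split(a, 5)
print([p.tolist() for p in parts])  # [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]

# Index-list form: cut before index 3 and before index 7
left, middle, right = np.split(a, [3, 7])
print(left.tolist(), middle.tolist(), right.tolist())  # [0, 1, 2] [3, 4, 5, 6] [7, 8, 9]

# Integer form with an uneven division raises ValueError
try:
    np.split(a, 3)
except ValueError as e:
    print("ValueError:", e)
```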

(Example 1) Divide the dataset into train:test = 7:3

import numpy as np

ds = np.arange(128)  # array([0, 1, 2, ..., 127])

train, test = np.split(ds, [int(ds.size * 0.7)])

train  # array([0, 1, ..., 88])
test   # array([89, 90, ..., 127])

train.size  # 89 ≈ 128 * 0.7 = 89.6
test.size   # 39 ≈ 128 * 0.3 = 38.4
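With a single cut index k, np.split(ds, [k]) yields the same two pieces as the slices ds[:k] and ds[k:]. Because int() truncates 89.6 to 89, the train part gets the rounded-down share and the remainder goes to test. A quick check of that equivalence:

```python
import numpy as np

ds = np.arange(128)
k = int(ds.size * 0.7)  # int() truncates 89.6 down to 89

train, test = np.split(ds, [k])

# The same split expressed with plain slicing
train_slice, test_slice = ds[:k], ds[k:]

print(np.array_equal(train, train_slice), np.array_equal(test, test_slice))  # True True
print(k, train.size, test.size)  # 89 89 39
```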

(Example 2) Divide the dataset into train:test:validation = 6:2:2

import numpy as np

ds = np.arange(128)  # array([0, 1, 2, ..., 127])

indices = [int(ds.size * n) for n in [0.6, 0.6 + 0.2]]  # [76, 102]
train, test, validation = np.split(ds, indices)

train       # array([0, 1, ..., 75])
test        # array([76, 77, ..., 101])
validation  # array([102, 103, ..., 127])

train.size       # 76 ≈ 128 * 0.6 = 76.8
test.size        # 26 ≈ 128 * 0.2 = 25.6
validation.size  # 26 ≈ 128 * 0.2 = 25.6
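The same pattern extends to any number of parts: the cut indices are the cumulative proportions multiplied by the array length. A minimal sketch, assuming a hypothetical helper split_by_ratio (not part of NumPy) that takes relative weights such as [6, 2, 2] or [0.6, 0.2, 0.2], normalizes them, and rounds each cut index down as int() does in the examples above:

```python
import numpy as np

def split_by_ratio(arr, ratios):
    """Hypothetical helper: split arr along axis 0 into len(ratios) consecutive parts.

    ratios are relative weights; they are normalized so they need not sum to 1.
    Cut indices are truncated toward zero, like int() in the examples above.
    """
    ratios = np.asarray(ratios, dtype=float)
    bounds = np.cumsum(ratios)[:-1] / ratios.sum()  # cumulative proportions, e.g. [0.6, 0.8]
    indices = (bounds * len(arr)).astype(int)       # cut indices, e.g. [76, 102]
    return np.split(arr, indices)

ds = np.arange(128)
train, test, validation = split_by_ratio(ds, [6, 2, 2])
print(train.size, test.size, validation.size)  # 76 26 26
```

Normalizing by the sum lets the ratio be written exactly as it appears in the heading (6:2:2) instead of as pre-computed fractions.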

