About this article

I decided to use Chainer's Trainer. To pass data to an Iterator, I wrote my own dataset by inheriting from DatasetMixin, and I tripped over a small mistake along the way, so I am keeping this as a record.

(Addition) This article assumes Chainer 1.20. The behavior may differ in newer versions.

Error message encountered

The error message I got made it hard to track down the cause.

  File "cupy/cuda/device.pyx", line 66, in cupy.cuda.device.Device.__enter__ (cupy/cuda/device.cpp:1621)
  File "cupy/cuda/device.pyx", line 81, in cupy.cuda.device.Device.use (cupy/cuda/device.cpp:1862)
  File "cupy/cuda/runtime.pyx", line 178, in cupy.cuda.runtime.setDevice (cupy/cuda/runtime.cpp:2702)
  File "cupy/cuda/runtime.pyx", line 130, in cupy.cuda.runtime.check_status (cupy/cuda/runtime.cpp:2028)
cupy.cuda.runtime.CUDARuntimeError: cudaErrorInvalidDevice: invalid device ordinal

When I saw this message, I assumed I had made a mistake in the GPU handling of the model or the data, so I spent a long time fiddling with cuda.cupy without finding a solution.

A careful reading of the docs might have revealed it, but in the end I only understood the cause after reading the Chainer source.

Part of the code I created

class MyDataset(chainer.dataset.DatasetMixin):
    def __init__(self, path):
        label_list = {}
        def get_label(l):
            num = label_list.get(l, -1)
            if num < 0:
                label_list[l] = len(label_list)
                return label_list[l]
            return num
        flist = []
        llist = []
        for root, dirs, files in os.walk(path):
            label = os.path.basename(root)  # use the directory name as the label
            label_num = get_label(label)
            for file in files:
                flist.append(os.path.join(root, file))
                llist.append(label_num)
        self.flist = flist
        self.llist = llist

    def __len__(self):
        return len(self.flist)

    def get_example(self, i):
        fname = self.flist[i]
        label = self.llist[i]
        img = Image.open(fname)
        img = np.asarray(img, dtype=np.float32).transpose(2, 0, 1)
        return img, label

If you train with this class through Iterator and Trainer on a GPU-enabled Chainer install, you get the error above whether or not you actually train on the GPU.

On a CPU-only Chainer install, the following error occurred instead.

    check_cuda_available()
  File "/path/to/lib/python3.4/site-packages/chainer/cuda.py", line 83, in check_cuda_available
    raise RuntimeError(msg)
RuntimeError: CUDA environment is not correctly set up
(see https://github.com/pfnet/chainer#installation).CuPy is not correctly installed. Please check your environment, uninstall Chainer and reinstall it with `pip install chainer --no-cache-dir -vvvv`.

Cause

Simply put, the label value returned by get_example is not a NumPy object. When you write the training loop yourself without Trainer, this is easy to notice, because wrapping a non-NumPy value in Variable raises a TypeError ("numpy.ndarray or cuda.ndarray are expected."). When I left the work to Iterator and Trainer, however, I got a different error that did not point to the cause (should I request type checking?).
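
A possible reason the error talks about devices rather than types: if the batching code asks a device-lookup helper about the first label, and that helper treats a plain Python int as a GPU device ID, then a label such as 3 becomes a request for GPU 3. The sketch below illustrates that idea only. It is a simplified stand-in written under that assumption, not Chainer's actual code, and `lookup_device`, `FakeArray`, and `AVAILABLE_DEVICES` are hypothetical names.

```python
# Simplified stand-in, NOT Chainer's actual code. All names are hypothetical.
AVAILABLE_DEVICES = 1  # assumption: a machine with a single GPU (device 0)


class FakeArray:
    """Stand-in for an array object that knows which device it lives on."""

    def __init__(self, device_id):
        self.device_id = device_id


def lookup_device(arg):
    """Return a device ID for arg, treating a plain int as a device ID."""
    if type(arg) is int:
        # A bare int is interpreted as "use GPU number arg".
        if arg >= AVAILABLE_DEVICES:
            raise RuntimeError("invalid device ordinal: %d" % arg)
        return arg
    if isinstance(arg, FakeArray):
        return arg.device_id
    return None  # no device, i.e. CPU


# Plain Python ints, like the labels returned by the original get_example.
labels = [3, 0, 1]
try:
    lookup_device(labels[0])
    outcome = "ok"
except RuntimeError as e:
    outcome = str(e)
print(outcome)
```

Under this reading, the failure depends on which label happens to come first in a batch, which would also explain why the message says nothing about the dataset.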

Countermeasures

In this case, the fix is to convert llist, the list that holds the label values, into a NumPy array.

@@ -16,7 +16,7 @@
                 flist.append(os.path.join(root, file))
                 llist.append(label_num)
         self.flist = flist
-        self.llist = llist
+        self.llist = np.asarray(llist, dtype=np.int32)
 
     def __len__(self):
         return len(self.flist)
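
The change works because indexing an int32 NumPy array returns a numpy.int32 scalar, whereas indexing a Python list returns a plain int. The following minimal sketch shows the difference and adds a small check you could run on one example before handing a dataset to an Iterator. `is_numpy_value` and `check_example` are hypothetical helpers for illustration, not part of Chainer, and the image is a dummy array.

```python
import numpy as np

llist = [0, 0, 1, 2]
as_list = llist                                # original: plain Python list
as_array = np.asarray(llist, dtype=np.int32)   # fixed: int32 NumPy array


def is_numpy_value(x):
    """True for NumPy arrays and NumPy scalars such as numpy.int32."""
    return isinstance(x, (np.ndarray, np.generic))


def check_example(example):
    """Return the indices of fields in one example that are not NumPy values."""
    return [i for i, field in enumerate(example) if not is_numpy_value(field)]


# Dummy image in (channel, height, width) layout, as in get_example.
img = np.zeros((3, 4, 4), dtype=np.float32)

bad = check_example((img, as_list[2]))     # label is a Python int
good = check_example((img, as_array[2]))   # label is a numpy.int32 scalar

print(type(as_list[2]).__name__, bad)
print(type(as_array[2]).__name__, good)
```

Running a check like this on `dataset[0]` would have flagged the label field right away instead of surfacing as a CUDA error during training.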

Finally

If you use Trainer, you get a nice progress display (ProgressBar()) and it looks cool, so let's use Trainer.