This article is the 8th-day entry of the Furukawa Lab Advent Calendar. It was written by a student at Furukawa Lab as part of his studies, so some of the content or phrasing may be imprecise.
Axes for comparing machine learning algorithms on the same task include learning stability (dependence on initial values), explainability, performance metrics (in supervised learning: accuracy, precision, and recall), and computational cost.
Depending on the situation (time and compute resources), it can be difficult to adopt an algorithm whose memory use grows so fast that it no longer fits in memory. (Ex: with an algorithm that needs $\mathcal{O}(N^3)$ memory in the number of data points $N$, the training data must be limited to about $N = 1000$, since $1000^3 = 10^9$ float64 values already come to roughly $8\,\mathrm{GB}$.)
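The back-of-envelope estimate above can be worked out in a couple of lines (`estimate_bytes` is just an illustrative helper, not a library function):

```python
def estimate_bytes(n_elements, bytes_per_element=8):
    """Rough memory estimate: element count times bytes per element (8 for float64)."""
    return n_elements * bytes_per_element

N = 1000
# an algorithm with O(N^3) memory holds on the order of N**3 values:
total = estimate_bytes(N ** 3)       # 10**9 float64 values
print(total / 1024 ** 3, "GiB")      # about 7.45 GiB
```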
Also, even when the theoretical cost is small, an implementation may not actually achieve that cost in practice.
This article describes how to check the memory consumption of a Python program.
No package

@shiracamus pointed out in the comments how to check the memory size without importing anything, so it has been added to the article as well.
```python
import numpy as np

N = 1000
D = 3
X = np.random.randn(N, D)
X.__sizeof__()
# 24112
```
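The 24112 bytes reported above break down into the data buffer plus the ndarray object header: $1000 \times 3 \times 8 = 24000$ bytes of float64 data, and roughly a hundred bytes of overhead. As a quick cross-check, `ndarray.nbytes` reports the data buffer alone:

```python
import numpy as np

N, D = 1000, 3
X = np.random.randn(N, D)

# nbytes counts only the data buffer: 1000 * 3 * 8 bytes of float64
print(X.nbytes)  # 24000

# __sizeof__ additionally includes the ndarray object header,
# whose exact size varies by numpy/Python version
print(X.__sizeof__() - X.nbytes)
```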
Extra: how to check the memory consumption of things other than plain variables
```python
X = [X]
X.__sizeof__()
# 48
```
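Note that `__sizeof__` (and `sys.getsizeof`) is shallow: the 48 bytes above count only the list object itself, not the ndarray stored inside it. A minimal recursive sketch for a "deep" size (the helper name `deep_getsizeof` is my own, not a standard API):

```python
import sys

def deep_getsizeof(obj, seen=None):
    """Recursively sum sys.getsizeof over common containers."""
    seen = set() if seen is None else seen
    if id(obj) in seen:          # avoid double-counting shared/cyclic references
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    return size

data = [[1.0] * 100, [2.0] * 100]
# the deep size counts the inner lists; the shallow size does not
print(deep_getsizeof(data) > sys.getsizeof(data))  # True
```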
```python
def f():
    pass

f.__sizeof__()
# 112
```
```python
class A():
    pass

A.__sizeof__()
# File "<stdin>", line 1, in <module>
# TypeError: descriptor '__sizeof__' of 'object' object needs an argument
# sys.getsizeof(), described below, handles this case
```
sys
```python
import sys
import numpy as np

N = 1000
D = 3
X = np.random.randn(N, D)
print(sys.getsizeof(X))
# 24112
```
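In CPython, `sys.getsizeof` is essentially `__sizeof__` plus, for garbage-collector-tracked objects, the GC header; exact byte counts vary by Python version, so the sketch below only checks the relationships:

```python
import sys

lst = [1, 2, 3]
# lists are GC-tracked, so sys.getsizeof adds the GC header on top of __sizeof__
print(sys.getsizeof(lst) > lst.__sizeof__())   # True

# a plain int is not GC-tracked, so the two values coincide
print(sys.getsizeof(42) == (42).__sizeof__())  # True

# and unlike A.__sizeof__() earlier, sys.getsizeof accepts the class object itself
class A:
    pass

print(sys.getsizeof(A) > 0)                    # True
```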
By the way, in the case of PyTorch, the size is not reported properly unless you append `.storage()` to the variable (this tripped me up once).
```python
import sys
import torch

N = 1000
D = 3
X = torch.randn(N, D)
print(sys.getsizeof(X))
# 72
print(sys.getsizeof(X.storage()))
# 12064
```
memory_profiler
Calling `sys.getsizeof` on every variable you want to inspect is tedious; that is where memory_profiler comes in. It is not a standard package, so it needs to be installed (`pip install memory_profiler`).
By the way, breakpoints inside a method decorated with `@profile` are not hit in PyCharm's debug mode (this also tripped me up once).
```python
from memory_profiler import profile
import numpy as np

@profile()
def test():
    N = 10000000
    D = 3
    X = np.random.randn(N, D)

test()
```
Running this produces the following output:

```text
Line #    Mem usage    Increment   Line Contents
================================================
     5     62.3 MiB     62.3 MiB   @profile()
     6                             def test():
     7     62.3 MiB      0.0 MiB       N = 10000000
     8     62.3 MiB      0.0 MiB       D = 3
     9    291.2 MiB    228.9 MiB       X = np.random.randn(N, D)
```
Mem usage is the running total and Increment is the amount of memory added on that line. Depending on the implementation the numbers are not entirely trustworthy, but I use this for quick checks.
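As a standard-library alternative that needs no installation, the `tracemalloc` module can give a similar picture; a minimal sketch:

```python
import tracemalloc

tracemalloc.start()

data = [0.0] * 3_000_000   # roughly 24 MB of pointer slots in the list object

# total and peak bytes allocated since tracemalloc.start()
current, peak = tracemalloc.get_traced_memory()
print(f"current: {current / 1024 ** 2:.1f} MiB, peak: {peak / 1024 ** 2:.1f} MiB")

# per-line attribution, similar in spirit to memory_profiler's table
for stat in tracemalloc.take_snapshot().statistics("lineno")[:3]:
    print(stat)

tracemalloc.stop()
```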