This article is the 8th-day entry of the Furukawa Lab Advent Calendar. It was written by a student at Furukawa Lab as part of his studies, so some of the content or phrasing may be imprecise.
Axes for comparing machine learning algorithms on the same task include learning stability (dependence on initial values), explainability, performance metrics (in supervised learning: accuracy, precision, and recall), and computational cost.
Depending on the situation (time and compute resources), it can be difficult to adopt an algorithm whose memory use grows so fast that it no longer fits in memory. (Ex: with an algorithm that needs $\mathcal{O}(N^3)$ memory in the number of data points $N$, the training data must be limited to about $N = 1000$, since $1000^3 = 10^9$ float64 values already come to roughly $8\,\mathrm{GB}$.)
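The back-of-envelope estimate above can be worked out in a couple of lines (`estimate_bytes` is just an illustrative helper, not a library function):

```python
def estimate_bytes(n_elements, bytes_per_element=8):
    """Rough memory estimate: element count times bytes per element (8 for float64)."""
    return n_elements * bytes_per_element

N = 1000
# an algorithm with O(N^3) memory holds on the order of N**3 values:
total = estimate_bytes(N ** 3)       # 10**9 float64 values
print(total / 1024 ** 3, "GiB")      # about 7.45 GiB
```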
Also, even when the theoretical cost is small, an implementation may not actually achieve that cost in practice.
This article describes how to check the memory consumption of a Python program.
No package

@shiracamus pointed out in the comments how to check the memory size without importing anything, so it has been added to the article as well.
```python
import numpy as np

N = 1000
D = 3
X = np.random.randn(N, D)
X.__sizeof__()
# 24112
```
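The 24112 bytes reported above break down into the data buffer plus the ndarray object header: $1000 \times 3 \times 8 = 24000$ bytes of float64 data, and roughly a hundred bytes of overhead. As a quick cross-check, `ndarray.nbytes` reports the data buffer alone:

```python
import numpy as np

N, D = 1000, 3
X = np.random.randn(N, D)

# nbytes counts only the data buffer: 1000 * 3 * 8 bytes of float64
print(X.nbytes)  # 24000

# __sizeof__ additionally includes the ndarray object header,
# whose exact size varies by numpy/Python version
print(X.__sizeof__() - X.nbytes)
```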
Extra: how to check the memory consumption of things other than plain variables
```python
X = [X]
X.__sizeof__()
# 48
```
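Note that `__sizeof__` (and `sys.getsizeof`) is shallow: the 48 bytes above count only the list object itself, not the ndarray stored inside it. A minimal recursive sketch for a "deep" size (the helper name `deep_getsizeof` is my own, not a standard API):

```python
import sys

def deep_getsizeof(obj, seen=None):
    """Recursively sum sys.getsizeof over common containers."""
    seen = set() if seen is None else seen
    if id(obj) in seen:          # avoid double-counting shared/cyclic references
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    return size

data = [[1.0] * 100, [2.0] * 100]
# the deep size counts the inner lists; the shallow size does not
print(deep_getsizeof(data) > sys.getsizeof(data))  # True
```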
```python
def f():
    pass

f.__sizeof__()
# 112
```
```python
class A():
    pass

A.__sizeof__()
# File "<stdin>", line 1, in <module>
# TypeError: descriptor '__sizeof__' of 'object' object needs an argument
# sys.getsizeof(), described below, handles this case
```
sys
```python
import sys
import numpy as np

N = 1000
D = 3
X = np.random.randn(N, D)
print(sys.getsizeof(X))
# 24112
```
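In CPython, `sys.getsizeof` is essentially `__sizeof__` plus, for garbage-collector-tracked objects, the GC header; exact byte counts vary by Python version, so the sketch below only checks the relationships:

```python
import sys

lst = [1, 2, 3]
# lists are GC-tracked, so sys.getsizeof adds the GC header on top of __sizeof__
print(sys.getsizeof(lst) > lst.__sizeof__())   # True

# a plain int is not GC-tracked, so the two values coincide
print(sys.getsizeof(42) == (42).__sizeof__())  # True

# and unlike A.__sizeof__() earlier, sys.getsizeof accepts the class object itself
class A:
    pass

print(sys.getsizeof(A) > 0)                    # True
```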
By the way, in the case of PyTorch, the size is not reported properly unless you append `.storage()` to the variable (this tripped me up once).
```python
import sys
import torch

N = 1000
D = 3
X = torch.randn(N, D)
print(sys.getsizeof(X))
# 72
print(sys.getsizeof(X.storage()))
# 12064
```
memory_profiler
Calling `sys.getsizeof` on every variable you want to inspect is tedious; that is where memory_profiler comes in. It is not a standard package, so it needs to be installed (`pip install memory_profiler`).
By the way, breakpoints inside a method decorated with `@profile` are not hit in PyCharm's debug mode (this also tripped me up once).
```python
from memory_profiler import profile
import numpy as np

@profile()
def test():
    N = 10000000
    D = 3
    X = np.random.randn(N, D)

test()
```
Running this produces the following output:

```text
Line #    Mem usage    Increment   Line Contents
================================================
     5     62.3 MiB     62.3 MiB   @profile()
     6                             def test():
     7     62.3 MiB      0.0 MiB       N = 10000000
     8     62.3 MiB      0.0 MiB       D = 3
     9    291.2 MiB    228.9 MiB       X = np.random.randn(N, D)
```
Mem usage is the running total and Increment is the amount of memory added on that line. Depending on the implementation the numbers are not entirely trustworthy, but I use this for quick checks.
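As a standard-library alternative that needs no installation, the `tracemalloc` module can give a similar picture; a minimal sketch:

```python
import tracemalloc

tracemalloc.start()

data = [0.0] * 3_000_000   # roughly 24 MB of pointer slots in the list object

# total and peak bytes allocated since tracemalloc.start()
current, peak = tracemalloc.get_traced_memory()
print(f"current: {current / 1024 ** 2:.1f} MiB, peak: {peak / 1024 ** 2:.1f} MiB")

# per-line attribution, similar in spirit to memory_profiler's table
for stat in tracemalloc.take_snapshot().statistics("lineno")[:3]:
    print(stat)

tracemalloc.stop()
```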