Let's see how to use a profiler in Python to time and speed up a program.
What is program optimization?
This time, we will focus on shortening processing time.
Let's measure the processing time of a program. Suppose you have two programs that each take 5 seconds; try measuring them with the `time` command.
```shell
% time ./wait.py
./wait.py  0.02s user 0.02s system 0% cpu 5.057 total
% time ./busy.py
./busy.py  5.01s user 0.02s system 99% cpu 5.038 total
```
In both cases the program takes 5 seconds in total from start to finish, but the situation is slightly different: wait.py uses almost no CPU (0%), while busy.py keeps the CPU fully occupied (99%).
Let's look at the source code that was actually used:
wait.py

```python
#!/usr/bin/env python
import time


def main():
    time.sleep(5)


if __name__ == '__main__':
    main()
```
busy.py

```python
#!/usr/bin/env python
import time


def main():
    start = time.time()
    while time.time() - start < 5:
        pass


if __name__ == '__main__':
    main()
```
The difference lies in how they use their time: wait.py sleeps and hands the CPU back to the OS, while busy.py spins in a loop, burning CPU the whole time.
Programs generally spend their time on things like computation, I/O waits, and sleeps. To make a program faster, you improve how it spends that time. But where in the program can you improve, and how do you find out?
One approach is to log the elapsed time before and after the processing you care about:

```python
import time

start = time.time()  # wall-clock time
some_func()
print("%f sec" % (time.time() - start))
```

```python
import time

start = time.clock()  # CPU time in Python 2 (removed in Python 3.8)
some_func()
print("%f sec" % (time.clock() - start))
```
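On Python 3, the same before/after measurement can be wrapped in a small context manager; `time.perf_counter()` is the clock recommended for measuring elapsed time. This helper (`timed` is my own name, not a standard-library function) is just a sketch of the idea:

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    """Print the wall-clock time spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        print("%s: %f sec" % (label, time.perf_counter() - start))


with timed("sleep"):
    time.sleep(0.1)
```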
cProfile
One of the profilers that ships with Python. Like the `time` command, run it with the program to be measured as its argument:
```shell
% python -m cProfile wait.py
         4 function calls in 5.002 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    5.002    5.002 wait.py:3(<module>)
        1    0.000    0.000    5.001    5.001 wait.py:5(main)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    5.001    5.001    5.001    5.001 {time.sleep}
```
| Item | Meaning |
|---|---|
| ncalls | number of calls |
| tottime | time spent in the function itself (excluding time in functions it called) |
| percall | time per call (tottime / ncalls) |
| cumtime | cumulative time (including functions it called) |
| percall | time per call (cumtime / ncalls) |
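In practice, you usually want the output sorted by `tottime` or `cumtime` rather than by the default standard name. A minimal sketch using the `cProfile.Profile` / `pstats` API directly (the `slow` function here is a stand-in of my own, not from the article):

```python
import cProfile
import pstats


def slow():
    """A stand-in workload to have something to profile."""
    total = 0
    for i in range(100000):
        total += i * i
    return total


profiler = cProfile.Profile()
profiler.enable()
slow()
profiler.disable()

# Sort by time spent inside each function and show only the top 5 rows
stats = pstats.Stats(profiler)
stats.sort_stats('tottime').print_stats(5)
```

The command-line form accepts the same idea via `python -m cProfile -s tottime script.py`.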
You can also drive cProfile from inside the program, save the results to a file, and inspect them later from the command line with pstats:
wait_profile.py

```python
#!/usr/bin/env python
import cProfile
import time


def main():
    time.sleep(5)


if __name__ == '__main__':
    cProfile.run("main()", "wait.prof")
```
```shell
% python -c "import pstats; pstats.Stats('wait.prof').strip_dirs().sort_stats(-1).print_stats()"
Fri Jun 17 00:25:58 2016    wait.prof

         4 function calls in 5.005 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    5.005    5.005 <string>:1(<module>)
        1    0.000    0.000    5.005    5.005 wait_profile.py:6(main)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    5.005    5.005    5.005    5.005 {time.sleep}
```
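The pstats one-liner can also be written as a short script, and the sort key can be a readable name such as `'cumulative'` instead of `-1`. A self-contained sketch that first generates a profile file and then reads it back (the sleep is shortened to 0.1 s here to keep it quick):

```python
import cProfile
import pstats
import time


def main():
    time.sleep(0.1)


# Save the profile to a file, as wait_profile.py does
cProfile.runctx('main()', globals(), locals(), 'wait.prof')

# Read it back: strip directory paths and sort by cumulative time
stats = pstats.Stats('wait.prof')
stats.strip_dirs().sort_stats('cumulative').print_stats()
```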
Optimization means improving a non-functional requirement (speed) while preserving the functional requirements (the program's behavior). A recommended way to check this is to keep an assertion on the expected result, as in the following examples.
Fibonacci sequence
fib.py

```python
#!/usr/bin/env python
def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n-2) + fib(n-1)


if __name__ == '__main__':
    assert fib(30) == 832040
```
```shell
% time ./fib.py
python fib.py  0.52s user 0.01s system 98% cpu 0.540 total
% python -m cProfile fib.py
         2692539 function calls (3 primitive calls) in 1.084 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.084    1.084 fib.py:3(<module>)
2692537/1    1.084    0.000    1.084    1.084 fib.py:3(fib)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
```
In this example, fib is called far too many times. (An ncalls value of the form M/N means M total calls, of which N were primitive, i.e. non-recursive, calls.)
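The figure 2,692,537 can be derived: if c(n) is the number of fib calls triggered by fib(n), then c(0) = c(1) = 1 and c(n) = c(n-1) + c(n-2) + 1. A small sketch (call_count is my own helper) reproducing the profiler's ncalls for n = 30:

```python
def call_count(n):
    """Number of fib() invocations made by the naive recursive fib(n)."""
    if n < 2:
        return 1
    counts = [1, 1]  # call_count(0), call_count(1)
    for _ in range(2, n + 1):
        counts.append(counts[-1] + counts[-2] + 1)
    return counts[-1]


print(call_count(30))  # 2692537, matching the profiler output
```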
Also note that profiling has overhead: comparing the execution time without the profiler (0.540 s, measured with the time command) against the time under the profiler (1.084 s), the latter is considerably longer.
Now let's actually improve performance while preserving the functionality, by caching (memoizing) results that have already been computed:
fib_optimized.py

```python
#!/usr/bin/env python
cache = {}


def fib(n):
    if n in cache:
        return cache[n]
    if n == 0:
        cache[n] = 0
    elif n == 1:
        cache[n] = 1
    else:
        cache[n] = fib(n-2) + fib(n-1)
    return cache[n]


if __name__ == '__main__':
    assert fib(30) == 832040
```
```shell
% python -m cProfile fib_optimized.py
         61 function calls (3 primitive calls) in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 fib_optimized.py:3(<module>)
     59/1    0.000    0.000    0.000    0.000 fib_optimized.py:5(fib)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
```
The number of calls has been reduced, and the overall time has been shortened!
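As an aside, on Python 3 the standard library offers `functools.lru_cache`, which achieves the same memoization without a hand-rolled global dictionary; a sketch of the same function using it:

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # cache every result, like the manual dict
def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    return fib(n - 2) + fib(n - 1)


assert fib(30) == 832040
```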
A second example: a function that computes the sum from beg to end, implemented using the built-in function sum.
takesum.py

```python
#!/usr/bin/env python
def takesum(beg, end):
    """take sum of beg, beg+1, ..., end"""
    assert beg <= end
    i = beg
    xs = []
    while i <= end:
        xs.append(i)
        i += 1
    return sum(xs)


if __name__ == '__main__':
    assert takesum(0, 10000000) == 50000005000000
```
```shell
% python -m cProfile takesum.py
         10000005 function calls in 3.482 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.076    0.076    3.482    3.482 takesum.py:3(<module>)
        1    2.418    2.418    3.405    3.405 takesum.py:3(takesum)
 10000001    0.878    0.000    0.878    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.109    0.109    0.109    0.109 {sum}
```
As an exercise, think about how you could reduce this processing time.
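As a hint, the intermediate list (and its 10,000,001 append calls) can be avoided entirely. Two possible sketches of my own, one summing over a range directly and one using the closed-form arithmetic-series formula:

```python
def takesum_range(beg, end):
    """Sum beg..end without building a list by hand."""
    assert beg <= end
    return sum(range(beg, end + 1))


def takesum_formula(beg, end):
    """Closed form: (number of terms) * (first + last) // 2."""
    assert beg <= end
    return (end - beg + 1) * (beg + end) // 2


assert takesum_range(0, 10000000) == 50000005000000
assert takesum_formula(0, 10000000) == 50000005000000
```

The closed-form version needs no loop at all, so its time no longer depends on the size of the range.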