As a Jupyter beginner, I'm writing down how to build a Jupyter environment on AWS EC2 and how to use it, so that I can follow it easily myself six months from now.
Leaving detailed settings aside, the goal is to quickly build a Jupyter environment on EC2, run a simple Python script on it, and learn the basics of Jupyter's UI. Since my Linux skills are limited, the procedure is meant to be followed by copy and paste as much as possible.
If you want to run Jupyter on a Spark cluster using Amazon EMR, see here. Also, Jupyter Notebook is going to be replaced by **JupyterLab** in the next version, and the UI and features will change significantly. For how to build a JupyterLab environment, please refer to here.
The first step is to build a Jupyter environment.
Launch the EC2 instance that will run Jupyter and log in with ssh.
Open port 8080 in the EC2 Security Group.
Install the required modules with apt-get, update pip, and install ipython[notebook]. If you see a message like WARNING! Your environment specifies an invalid locale. during ssh login, run export LC_ALL=C first.
$ export LC_ALL=C
$ sudo apt-get update
$ sudo apt-get install -y python-pip libpq-dev python-dev libpng12-dev libjpeg8-dev libfreetype6-dev libxft-dev
$ sudo pip install -U pip
$ sudo pip install numpy pandas matplotlib seaborn scikit-learn plotly ipython[notebook]
The following command creates a template of the Jupyter configuration file (~/.jupyter/jupyter_notebook_config.py).
$ jupyter notebook --generate-config
Then edit ~/.jupyter/jupyter_notebook_config.py. It is a large file in which every line is commented out with #, so add the following 5 lines at the beginning of the file and save it. (Note that with these settings anyone can access the Jupyter server.)
c = get_config()
c.NotebookApp.ip = '0.0.0.0'
c.NotebookApp.open_browser = False
c.NotebookApp.port = 8080
c.NotebookApp.token = ''
In recent versions of Jupyter, logging in with a password or token is required as a security measure. In the example above, c.NotebookApp.token = '' allows access without a token.
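Because the settings only need to sit at the top of the file, the edit can also be scripted, which fits the copy-and-paste approach of this article. Below is a minimal sketch in Python 3; the helper name prepend_settings is made up for this example. It skips the edit when the settings are already at the top, so rerunning it is harmless.

```python
from pathlib import Path

# The five settings from above. With token = '' anyone who can reach port 8080 can log in.
SETTINGS = (
    "c = get_config()\n"
    "c.NotebookApp.ip = '0.0.0.0'\n"
    "c.NotebookApp.open_browser = False\n"
    "c.NotebookApp.port = 8080\n"
    "c.NotebookApp.token = ''\n"
)


def prepend_settings(config_path, settings=SETTINGS):
    """Put settings at the top of the config file; return False if already there."""
    path = Path(config_path).expanduser()
    original = path.read_text() if path.exists() else ""
    if original.startswith(settings):
        return False  # already applied, leave the file untouched
    path.write_text(settings + original)
    return True
```

After jupyter notebook --generate-config, calling prepend_settings("~/.jupyter/jupyter_notebook_config.py") once applies the change.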
If you want to set a login password, you first need to generate the hash string of the password. Running the following command shows a prompt; enter the password you want to set.
$ python -c "import IPython;print(IPython.lib.passwd())"
It then returns a hash string starting with sha1:, such as sha1:3be1549bb425:1500071094720b33gf8f0feg474931dc5e43dfed, so copy it.
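The hash string has three fields separated by colons: the algorithm, a random salt, and the hex digest. To my understanding, the classic Notebook computes the digest by hashing the password followed by the salt. The sketch below reproduces that format with hashlib so the three fields are visible; it is an illustration under that assumption, the helper names and the use of secrets for the salt are my own choices (Python 3.6+), and IPython.lib.passwd() above remains the command to actually use.

```python
import hashlib
import secrets


def make_password_hash(passphrase, algorithm="sha1", salt_bytes=6):
    """Return an 'algorithm:salt:hexdigest' string (sketch, not the notebook's own code)."""
    # 6 random bytes -> 12 hex characters, the same salt length as the sample hash above.
    salt = secrets.token_hex(salt_bytes)
    h = hashlib.new(algorithm)
    # Assumed scheme: hash the password bytes followed by the salt bytes.
    h.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    return ":".join((algorithm, salt, h.hexdigest()))


def check_password(hashed, passphrase):
    """Return True if passphrase matches a hash produced by make_password_hash()."""
    try:
        algorithm, salt, digest = hashed.split(":", 2)
        h = hashlib.new(algorithm)
    except ValueError:
        # Malformed string or unknown algorithm name.
        return False
    h.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    return h.hexdigest() == digest


hashed = make_password_hash("my-password")
print(hashed)                                 # sha1:<12 hex chars>:<40 hex chars>
print(check_password(hashed, "my-password"))  # True
print(check_password(hashed, "wrong"))        # False
```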
Then change the contents of ~/.jupyter/jupyter_notebook_config.py, which you edited above, as follows. Replace the hash string after c.NotebookApp.password with the one you generated.
c = get_config()
c.NotebookApp.ip = '0.0.0.0'
c.NotebookApp.open_browser = False
c.NotebookApp.port = 8080
c.NotebookApp.password = u'sha1:3be1549bb425:1500071094720b33gf8f0feg474931dc5e43dfed'
Run the following command to start Jupyter.
$ jupyter notebook
Open the EC2 host in your browser, for example ec2-53-239-93-85.ap-northeast-1.compute.amazonaws.com:8080. Don't forget the port 8080. If the Jupyter login screen appears and you can log in with the password set above, it works. To run Jupyter in the background, use nohup jupyter notebook > /dev/null 2>&1 & and Jupyter will keep running even after you disconnect ssh.
To have Jupyter start in the background whenever EC2 boots, create a script called start_jupyter.sh and register it in /etc/rc.local.
1. touch ~/start_jupyter.sh
2. Open start_jupyter.sh and write /usr/local/bin/jupyter notebook in it.
3. chmod 777 ~/start_jupyter.sh
4. Open /etc/rc.local with root privileges and add the line su - ubuntu /home/ubuntu/start_jupyter.sh & before exit 0.
5. Reboot with shutdown -r now with root privileges.

It's very brief, but here is a glimpse of how to use Jupyter.
After logging in, select Python2 from New to create a Python2 notebook.
Jupyter works by writing code and explanations (Markdown) in boxes called **Cells** and executing them in order. The icons do the following:
- + icon: adds a Cell (you can also add one with Insert > Insert Cell Above or Insert Cell Below)
- Scissors icon: deletes a Cell
- Up/down arrow icons: move the focused Cell up or down
- Code: select this when entering code such as Python
- Markdown: select this when adding explanations (of the code and its results) in Markdown; formulas can also be written with MathJax
- Raw NB Convert: select this when entering formulas in LaTeX
- Heading: seems to serve a similar purpose to Markdown (it will be removed in the future)

Enter the following Python code into a Cell. It simply prints the current time 10 times, once per second.
import datetime
import time

def main():
    for count in range(0, 10):
        print_current_time()
        time.sleep(1)

def print_current_time():
    print(datetime.datetime.now().strftime('%Y/%m/%d %H:%M:%S'))

if __name__ == '__main__':
    main()
Enter the above Python code split across 4 Code Cells.
- Run it with Cell > Run All.
- Clear all outputs and restart the kernel with Kernel > Restart, then run Cell > Run All again; the Cells are processed in order from the top.
- Add a Markdown Cell and write a comment in Markdown. When you run Cell > Run All, the Cells are processed from the top and the Markdown is shown in its rendered form.
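The format string in print_current_time() controls what each printed line looks like. The sketch below applies the same format to a fixed datetime so the output is predictable; format_time is a hypothetical helper added only for this illustration.

```python
import datetime

# Same format string as print_current_time(): zero-padded date and 24-hour time.
TIME_FORMAT = '%Y/%m/%d %H:%M:%S'


def format_time(dt):
    """Return dt formatted the way print_current_time() prints it."""
    return dt.strftime(TIME_FORMAT)


# A fixed datetime instead of now(), so the result does not change between runs.
print(format_time(datetime.datetime(2017, 7, 5, 9, 3, 7)))  # 2017/07/05 09:03:07
```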
matplotlib is commonly used to create 2D charts. Running the following code on Jupyter creates and displays a chart. %matplotlib inline is the idiom needed to display matplotlib charts inside Jupyter; it is enough to declare and run it once somewhere in the notebook.
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

# 10,000 random integers from 0 to 99
x = np.random.randint(0, 100, 10000)
plt.hist(x, bins=20)
plt.show()
np.random.randint(0, 100, 10000) creates 10,000 random integers in the range 0-99, and plt.hist shows their distribution as a histogram with 20 bins.
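plt.hist only draws the bars. To inspect the numbers behind them, np.histogram returns the bin counts and bin edges; to my understanding plt.hist bins the data the same way by default. A short sketch, where the fixed seed is an addition so the run is repeatable:

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is repeatable
x = np.random.randint(0, 100, 10000)  # 10,000 ints in 0-99, as in the chart above

# Bar heights and bin edges for 20 equal-width bins over [x.min(), x.max()].
counts, edges = np.histogram(x, bins=20)

print(len(counts), len(edges))  # 20 21  (20 bins need 21 edges)
print(counts.sum())             # 10000  (every sample falls into some bin)
print(edges[1] - edges[0])      # bin width: (max - min) / 20
```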
matplotlib is a long-established 2D charting library, and some people find its parameter settings a little complicated. There is also a library called seaborn that creates modern-looking charts with short code (refer to here). However, be aware that some chart types are supported by matplotlib but not by seaborn.
3D charts can be created with a library called plotly. It is useful when you want to check the distribution of 3D data, for example in machine learning, and it supports Scatter, Surface, and Mesh in 3D. Running the following code on Jupyter displays a 3D scatter chart. Enjoy changing the viewpoint and zooming by dragging and pinching in/out.
np.random.multivariate_normal([0,0,0], [[0.1, 0, 0], [0, 1, 0], [0, 0, 2]], 1000).T creates 1000 random points from a three-dimensional normal distribution with mean [0,0,0] and variances 0.1, 1, and 2 along the three axes.
import plotly.offline
import plotly.graph_objs
import numpy as np

plotly.offline.init_notebook_mode()

# Cluster 1: 1000 points around (3, 3, 3), drawn in blue
x1, y1, z1 = np.random.multivariate_normal([3,3,3], [[0.5, 0, 0], [0, 0.5, 0], [0, 0, 0.5]], 1000).T
trace1 = plotly.graph_objs.Scatter3d(x=x1, y=y1, z=z1, mode='markers', marker=dict(size=1, color='blue'))

# Cluster 2: 1000 points around the origin with variances 0.1, 1, 2, drawn in red
x2, y2, z2 = np.random.multivariate_normal([0,0,0], [[0.1, 0, 0], [0, 1, 0], [0, 0, 2]], 1000).T
trace2 = plotly.graph_objs.Scatter3d(x=x2, y=y2, z=z2, mode='markers', marker=dict(size=1, color='red'))

fig = plotly.graph_objs.Figure(data=[trace1, trace2])
plotly.offline.iplot(fig, show_link=False)
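The second argument of np.random.multivariate_normal is a covariance matrix, which is why its diagonal entries are variances rather than standard deviations. With a diagonal covariance the three axes are independent, so the same distribution can also be drawn one axis at a time with np.random.normal, which takes a standard deviation (the square root of the variance). A sketch that checks this numerically; the seed and the per-axis comparison are additions for illustration:

```python
import numpy as np

np.random.seed(0)  # fixed seed so the check is repeatable
mean = [0, 0, 0]
cov = [[0.1, 0, 0], [0, 1, 0], [0, 0, 2]]  # diagonal = variances, off-diagonal = 0

# .T turns the (1000, 3) sample array into three arrays of 1000 values (x, y, z).
x, y, z = np.random.multivariate_normal(mean, cov, 1000).T

# Independent per-axis draws; normal() expects a standard deviation, i.e. sqrt(variance).
x2 = np.random.normal(0, np.sqrt(0.1), 1000)
y2 = np.random.normal(0, np.sqrt(1.0), 1000)
z2 = np.random.normal(0, np.sqrt(2.0), 1000)

print(x.shape)                          # (1000,)
print(np.var(x), np.var(y), np.var(z))  # roughly 0.1, 1, 2
print(np.var(x2), np.var(y2), np.var(z2))  # also roughly 0.1, 1, 2
```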
There are several ways to write formulas on Jupyter, such as Raw NB Convert and Markdown. Here, I describe how to use MathJax in Markdown.
$$r=\frac{1}{f}$$
$$\left(x + y\right)^{5}$$
Enter the above in a **Markdown Cell** and run the Cell; if the formulas are rendered, it works.
Sample formulas using MathJax on Jupyter can be found in this article and this article (//www.suluclac.com/Wiki+MathJax+Syntax). MathJax syntax is also summarized in this article.
You can rename the notebook with File > Rename and save it in *.ipynb format with File > Download as.
While using Jupyter, you sometimes want to run a shell command, for example to add a Python library or to fetch a file from another server with wget.
You can log in to the instance with ssh and run the command there, but you can also run shell commands directly from Jupyter in one of the following ways. Commands run with the privileges of the user who started Jupyter.
A Python library called commands runs shell commands from Python. It exists only in Python 2 (it was removed in Python 3).
import commands
commands.getoutput("date")
commands.getoutput("curl yahoo.co.jp")
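In a Python 3 notebook, the commands module is not available; subprocess.getoutput is its replacement, and subprocess.run gives more control when you need the exit status. A minimal sketch (capture_output and text require Python 3.7+), using echo and date only as stand-in commands:

```python
import subprocess

# Python 3 replacement for commands.getoutput(): runs the command through the shell
# and returns stdout and stderr as one string, with the trailing newline stripped.
print(subprocess.getoutput("date"))

# subprocess.run exposes the exit status and keeps stdout/stderr separate.
result = subprocess.run(["echo", "hello"], capture_output=True, text=True)
print(result.returncode)  # 0
print(result.stdout)      # hello
```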
!
As a Jupyter-specific feature, a shell command written after ! is executed.
!date
!curl yahoo.co.jp
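You can also run commands with sudo as shown below, but after !sudo su Jupyter keeps processing indefinitely, so you need to stop it with Interrupt or similar. Note also that each ! line runs in its own shell, so the !find on the next line does not run as root; prefixing the command itself, as in !sudo find / -name 'hoge.txt', is the simpler way to run a single command with root privileges.
!sudo su
!find / -name 'hoge.txt'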
Jupyter notebook extensions
A collection of Jupyter/IPython extensions is being developed (separately from the core Jupyter development team). This article gives a well-organized overview of the features it provides.
To install the extensions, edit ~/.jupyter/jupyter_notebook_config.py as in the Jupyter settings above, then run the following two lines before starting Jupyter with jupyter notebook.
mkdir -p ~/.local/share/jupyter
sudo pip install https://github.com/ipython-contrib/IPython-notebook-extensions/archive/master.zip
To be honest, I hardly use the extensions, although ExecuteTime is convenient.
Jupyter Magic Commands
Jupyter/IPython has a dedicated feature called Magic Commands. Searching for ipython magic command turns up many of them; the following are well known.
- %who: lists the variables defined in the notebook
- %whos: lists the variables along with their type and value
- %%timeit: measures the execution time of the code in the Cell
- %quickref: displays a list of the supported Magic Commands

Running a Cell that contains %whos executes the magic command. To be honest, I hardly use Magic Commands either, but I occasionally use %whos because it is convenient.
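To my knowledge, %%timeit is built on Python's standard timeit module, so roughly the same measurement can be made in plain Python, for example in a script outside Jupyter. A sketch where the timed statement is only a placeholder; unlike the magic, it uses a fixed loop count instead of choosing one automatically:

```python
import timeit

stmt = "sum(range(1000))"   # placeholder code to measure
number = 1000               # loops per measurement
repeat = 5                  # number of measurements

# Each entry is the total time in seconds for `number` executions of stmt.
times = timeit.repeat(stmt, number=number, repeat=repeat)
best_per_loop = min(times) / number
print("%d loops, best of %d: %.2f usec per loop" % (number, repeat, best_per_loop * 1e6))
```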
Have a good Jupyter life!