Jupyter Notebook lets you run Python and other languages from a web browser, and JupyterHub makes it available to multiple users. Information on this was scattered rather than organized, so this article summarizes how to set it up with login via GitHub authentication and a working connection to Django's shell environment. We use AWS ELB / EC2 / RDS, with nginx as the web server.
I think it is useful in cases such as: you have already built a Django environment connected to test data and want to work with it quickly from the web; or you want a separate environment for each user but do not want to build a complicated authentication mechanism.
When completed, you will be able to quickly do the following things on your web browser.
Roughly speaking, it looks like the following.
There is probably a way to do it, but with an HTTPS connection (SSL → nginx → Tornado in the virtualenv → JupyterHub), attaching failed when I selected the Django kernel in a notebook. HTTP access works fine, so some additional setting seems to be needed; perhaps JupyterHub has to be told that the internal traffic is HTTP. Because there is access from outside, such as the GitHub integration, it should be HTTPS, and it is a little disappointing that I did not get that far.
In the environment I'm using, the production database is masked daily and stored in a separate RDS instance. In the test / staging environment, this masked DB can be operated from Django. The goal of this document is to make that Django environment, connected to the RDS, usable through JupyterHub.
It is assumed that the Django environment is built under /var/www/jupyter.example.jp/.
Some of the directory structures that are likely to be relevant are shown below.
/var/www/jupyter.example.jp/
├── README.md
├── jupyter  # Application code (manage.py is directly under this)
├── requirements.txt
├── virtualenv
We assume that Django is already installed in the virtualenv and that django-extensions is included so that models can be loaded with shell_plus.
The virtualenv is created as follows.
cd /var/www/jupyter.example.jp/
virtualenv --prompt "(jupyter)" virtualenv
Also, place the JupyterHub configuration file under /etc/jupyterhub.
JupyterHub is operated by the jupyterhub user.
sudo useradd jupyterhub
sudo usermod -a -G shadow jupyterhub
sudo mkdir /etc/jupyterhub
sudo chown jupyterhub /etc/jupyterhub
Proceed according to the README of jupyterhub/jupyterhub (Multi-user server for Jupyter notebooks).
sudo apt-get install npm nodejs-legacy
sudo npm install -g configurable-http-proxy
(Install with pip under virtualenv environment)
pip install ipython jupyterhub
(This way of installing npm and node feels a little dated.)
Creating a JupyterHub configuration file
jupyterhub --generate-config
Open /etc/jupyterhub/jupyterhub_config.py and edit the following.
Near line 137:
c.JupyterHub.ip = '127.0.0.1'
Near line 106:
c.JupyterHub.cookie_secret_file = '/etc/jupyterhub/jupyterhub_cookie_secret'
c.JupyterHub.db_url = '/etc/jupyterhub/jupyterhub.sqlite'
Reference: http://jupyterhub.readthedocs.io/en/latest/config-examples.html
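As far as I know, JupyterHub creates the cookie secret file by itself on first start if it does not exist, but if you prefer to create it up front, a minimal sketch could look like the following. The function name write_cookie_secret is made up for illustration, and the format (32 random bytes, hex encoded, owner-only permissions) is an assumption based on the commonly documented `openssl rand -hex 32` approach.

```python
import os
import secrets


def write_cookie_secret(path):
    """Write a hex-encoded 32-byte secret to `path`, readable by the owner only.

    Hypothetical helper, not part of JupyterHub.
    """
    secret = secrets.token_hex(32)  # 32 random bytes -> 64 hex characters
    # O_EXCL refuses to overwrite an existing secret; 0o600 keeps it private.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(secret)
    return secret
```

In this setup it would be called as the jupyterhub user with the path /etc/jupyterhub/jupyterhub_cookie_secret, so that the Hub process can read the file.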
Next, configure JupyterHub so that users log in through GitHub OAuth.
pip install oauthenticator
Add the following near line 58 of jupyterhub_config.py
import os  # needed for the os.environ lookups below
c.JupyterHub.authenticator_class = 'oauthenticator.GitHubOAuthenticator'
c.GitHubOAuthenticator.oauth_callback_url = os.environ['OAUTH_CALLBACK_URL']
c.GitHubOAuthenticator.client_id = os.environ['GITHUB_CLIENT_ID']
c.GitHubOAuthenticator.client_secret = os.environ['GITHUB_CLIENT_SECRET']
Reference: https://github.com/jupyterhub/oauthenticator
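Reading the values with os.environ[...] raises a bare KeyError when a variable is missing, which is easy to overlook in the supervisor log. As a small sketch (the helper name require_env is my own, not part of JupyterHub or oauthenticator), the lookup could be wrapped so the message says what to do:

```python
import os


def require_env(name, environ=os.environ):
    """Return the value of an environment variable, or fail with a clear message.

    Hypothetical helper for jupyterhub_config.py.
    """
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            "Environment variable %s is not set; "
            "export it before starting JupyterHub" % name
        )
    return value


# Possible usage inside jupyterhub_config.py, where `c` is provided by JupyterHub:
# c.GitHubOAuthenticator.client_id = require_env('GITHUB_CLIENT_ID')
```

The second argument exists only so that the function can be tried with a plain dict instead of the real environment.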
I don't have a deep understanding of what a Spawner is, but it seems that JupyterHub needs one to be defined for process management.
Reference: http://jupyterhub.readthedocs.io/en/latest/spawners.html
pip install git+https://github.com/jupyter/sudospawner
Add the following near line 220 of jupyterhub_config.py
c.JupyterHub.confirm_no_ssl = True
c.JupyterHub.spawner_class = 'sudospawner.SudoSpawner'
c.Spawner.notebook_dir = '~/notebooks'
c.SudoSpawner.sudospawner_path = '/var/www/jupyter.example.jp/virtualenv/bin/sudospawner'
Add the following to the end of the sudoers file with sudo visudo.
## Jupyterhub
# comma-separated whitelist of users that can spawn single-user servers
Runas_Alias JUPYTER_USERS = <GitHub user names to allow, comma separated>
# the command(s) the Hub can run on behalf of the above users without needing a password
# the exact pa$
Cmnd_Alias JUPYTER_CMD = /var/www/jupyter.example.jp/virtualenv/bin/sudospawner
# actually give the Hub user permission to run the above command on behalf
# of the above users without$
jupyterhub ALL=(JUPYTER_USERS) NOPASSWD:JUPYTER_CMD
Reference: http://qiita.com/mt08/items/301f9fb93d01e78bda47
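When the list of users grows, typing the sudoers entries by hand gets error-prone. The following is only a sketch that renders the same three rules as text; the function name render_sudoers and the example user names are made up, and the output should still be pasted through visudo so that its syntax check runs.

```python
def render_sudoers(users, sudospawner_path, hub_user="jupyterhub"):
    """Build the sudoers lines that let hub_user run sudospawner as `users`.

    Hypothetical helper; it only produces text and does not touch /etc/sudoers.
    """
    if not users:
        raise ValueError("at least one user name is required")
    for name in users:
        # Reject anything that could break out of the alias list.
        if not name or any(ch in name for ch in " ,\t\n#"):
            raise ValueError("invalid user name: %r" % name)
    lines = [
        "## Jupyterhub",
        "Runas_Alias JUPYTER_USERS = " + ", ".join(users),
        "Cmnd_Alias JUPYTER_CMD = " + sudospawner_path,
        "%s ALL=(JUPYTER_USERS) NOPASSWD:JUPYTER_CMD" % hub_user,
    ]
    return "\n".join(lines) + "\n"
```

For example, render_sudoers(["alice", "bob"], "/var/www/jupyter.example.jp/virtualenv/bin/sudospawner") returns the block for two hypothetical users alice and bob.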
With the settings up to this point, you can start the JupyterHub server. Create a startup script and start it.
sudo -u jupyterhub vi /etc/jupyterhub/launch_notebook.sh
#!/bin/bash
export OAUTH_CALLBACK_URL=http://jupyter.example.jp/hub/oauth_callback
export GITHUB_CLIENT_ID=xxx
export GITHUB_CLIENT_SECRET=xxx
source /var/www/jupyter.example.jp/virtualenv/bin/activate
jupyterhub -f /etc/jupyterhub/jupyterhub_config.py
CLIENT_ID and CLIENT_SECRET required for GitHub integration can be obtained from the following. https://github.com/settings/applications/new
With this in place, run the following.
sudo -u jupyterhub /etc/jupyterhub/launch_notebook.sh
JupyterHub should then start at http://127.0.0.1:8000/.
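To confirm from the server itself that something is really listening on port 8000 before moving on to nginx, a quick check along these lines may help. It is a minimal sketch: is_listening is a name I made up, and it only tests that a TCP connection succeeds, not that the process behind the port is JupyterHub.

```python
import socket


def is_listening(host="127.0.0.1", port=8000, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

Calling is_listening() with the defaults checks 127.0.0.1:8000, the address used in this article.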
Once JupyterHub starts, the next step is to route access from ELB to it. nginx receives requests on port 80 and proxies those that JupyterHub should handle to port 8000.
sudo vi /etc/nginx/nginx.conf
http {
map $http_upgrade $connection_upgrade {
default upgrade;
'' close;
}
...
Create a conf file that describes the settings for jupyter.example.jp under /etc/nginx/sites-enabled/ (example: jupyter.example.jp.conf).
server {
listen 80;
server_name jupyter.example.jp;
# Enable the following setting once an HTTPS environment is in place
# add_header Strict-Transport-Security max-age=15768000;
# Managing literal requests to the JupyterHub front end
location / {
proxy_pass http://127.0.0.1:8000;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
# Managing WebHook/Socket requests between hub user servers and external proxy
location ~* /(api/kernels/[^/]+/(channels|iopub|shell|stdin)|terminals/websocket)/? {
proxy_pass http://127.0.0.1:8000;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
# WebSocket support
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $connection_upgrade;
}
}
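To get a feel for which requests end up in the second location block (the one with WebSocket support), the same regular expression can be tried out in Python. This is only an approximation of nginx's behavior: it assumes the pattern behaves the same under Python's re module as under nginx's PCRE, and the example paths in the note below are hypothetical.

```python
import re

# Same pattern as the second nginx location. "~*" means case-insensitive,
# and an nginx regex location matches anywhere in the path unless anchored.
WEBSOCKET_LOCATION = re.compile(
    r"/(api/kernels/[^/]+/(channels|iopub|shell|stdin)|terminals/websocket)/?",
    re.IGNORECASE,
)


def uses_websocket_location(path):
    """Return True if `path` would be handled by the WebSocket location block."""
    return WEBSOCKET_LOCATION.search(path) is not None
```

A kernel channel URL such as /user/alice/api/kernels/1234-abcd/channels matches, while /hub/login falls through to the plain location / block.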
If a Linux user with the same name as the GitHub user already exists, there is no problem; if not, you need to create one.
USER=your_github_username  # replace with the GitHub user name to use
useradd -d /home/${USER} -m ${USER}
mkdir /home/${USER}/notebooks
chown ${USER} /home/${USER}/notebooks
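If several GitHub users need accounts, it can help to first list which ones are missing on the server. The sketch below uses the standard pwd module for the lookup and returns the same three commands as argument lists without running them; the function names and the example user name are my own.

```python
import pwd


def linux_user_exists(name):
    """Return True if a local account called `name` exists."""
    try:
        pwd.getpwnam(name)
        return True
    except KeyError:
        return False


def provisioning_commands(name):
    """Return the three commands from this section for one user."""
    home = "/home/%s" % name
    notebooks = home + "/notebooks"
    return [
        ["useradd", "-d", home, "-m", name],
        ["mkdir", notebooks],
        ["chown", name, notebooks],
    ]


def plan(github_users):
    """Collect commands only for users that do not exist yet."""
    commands = []
    for name in github_users:
        if not linux_user_exists(name):
            commands.extend(provisioning_commands(name))
    return commands
```

The returned lists could be passed to subprocess.run as root; printing them first makes it easy to review what would happen.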
Jupyter notebook has the concept of a Kernel, which lets you specify which Python environment to use at runtime. By default this is the Python environment JupyterHub was started in, but if you define a kernel that uses Django's shell_plus, notebooks start already connected to the DB with the models imported, which is very convenient.
If JupyterHub is installed in your virtualenv, you can create the kernel configuration file with the following command. It prints a WARNING, but ...
% jupyter kernelspec install-self --user
[InstallNativeKernelSpec] WARNING | `jupyter kernelspec install-self` is DEPRECATED as of 4.0. You probably want `ipython kernel install` to install the IPython kernelspec.
[InstallNativeKernelSpec] Installed kernelspec python3 in /home/your_user/.local/share/jupyter/kernels/python3
If you installed it globally rather than in the virtualenv, the following may work instead.
% python -m ipykernel install
Installed kernelspec python3 in /usr/local/share/jupyter/kernels/python3
Then, move the created configuration files.
% cd /usr/local/share
% sudo mv /home/your_user/.local/share/jupyter ./
% sudo chmod 775 jupyter
% sudo chown -R root:root jupyter
% cd /usr/local/share/jupyter/kernels
% ls /usr/local/share/jupyter/kernels
python3
% sudo mv python3 django
% ls /usr/local/share/jupyter/kernels/django
kernel.json logo-32x32.png logo-64x64.png
sudo vi django/kernel.json
The file should look like the following; change it to the version shown after the arrow.
{
"language": "python",
"argv": [
"/var/www/jupyter.example.jp/virtualenv/bin/python3.4",
"-m",
"ipykernel",
"-f",
"{connection_file}"
],
"display_name": "Python 3"
}
↓
{
"display_name": "Django",
"language": "python",
"codemirror_mode": {
"version": 3,
"name": "ipython"
},
"argv": [
"/var/www/jupyter.example.jp/virtualenv/bin/python",
"/var/www/jupyter.example.jp/jupyter/manage.py",
"shell_plus",
"--settings=jupyter.settings",
"--kernel",
"--connection-file",
"{connection_file}"
]
}
※ For --settings=jupyter.settings, specify the path of the Django settings module used by manage.py.
※ To use shell_plus, django-extensions must be installed and django_extensions must be listed in INSTALLED_APPS.
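Since kernel.json is plain JSON, it can also be generated instead of edited by hand, which avoids stray commas. This is a minimal sketch that reproduces the modified file above; the function names are made up, and the paths passed in are the ones assumed throughout this article.

```python
import json


def django_kernel_spec(python_path, manage_py, settings_module,
                       display_name="Django"):
    """Build the kernel spec that starts shell_plus as a Jupyter kernel."""
    return {
        "display_name": display_name,
        "language": "python",
        "codemirror_mode": {"version": 3, "name": "ipython"},
        "argv": [
            python_path,
            manage_py,
            "shell_plus",
            "--settings=" + settings_module,
            "--kernel",
            "--connection-file",
            # Left as a literal placeholder; Jupyter fills it in at launch.
            "{connection_file}",
        ],
    }


def render_kernel_json(spec):
    """Serialize the spec as the text to write into kernel.json."""
    return json.dumps(spec, indent=1) + "\n"
```

The result of render_kernel_json can be written to /usr/local/share/jupyter/kernels/django/kernel.json with root permissions.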
Reference 1: https://ipython.org/ipython-doc/3/development/kernels.html
Reference 2: http://stackoverflow.com/questions/31088080/django-extensions-shell-plus-kernel-specify-connection-file (as mentioned in the comments, the option must be "--connection-file")
Reference 3: http://stackoverflow.com/questions/39007571/running-jupyter-with-multiple-python-and-ipython-paths
Also, installing https://github.com/Cadair/jupyter_environment_kernels may make it easier to switch kernels, but I haven't tried it.
Now you can start it by running launch_notebook.sh, but since it is troublesome to start it by hand every time the server is restarted, we let supervisord start it automatically.
sudo vi /etc/supervisor/conf.d/
[program:notebook]
command=/etc/jupyterhub/launch_notebook.sh
directory=/etc/jupyterhub/
autostart=true
autorestart=true
stopgroup=true
startretries=3
exitcodes=0,2
stopsignal=TERM
user=jupyterhub
group=jupyterhub
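Before reloading supervisor, the program section can be sanity-checked with Python's configparser. This is a rough sketch under the assumption that the file is plain INI, as it is here; supervisor uses its own parser, so passing this check does not guarantee that supervisor accepts the file. The function names are hypothetical.

```python
import configparser


def load_program(text, name):
    """Parse supervisor-style INI text and return one [program:x] section."""
    # interpolation=None so that '%' in values is kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return dict(parser["program:" + name])


def find_problems(section):
    """Return a list of human-readable issues found in the section."""
    problems = []
    if not section.get("command", "").startswith("/"):
        problems.append("command should be an absolute path")
    if not section.get("user"):
        problems.append("user is not set, so the program would run as root")
    return problems
```

An empty list from find_problems means that neither of these two simple checks found anything.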
Load the configuration file and start it.
supervisorctl reread
supervisorctl reload
supervisorctl start notebook
It will start automatically, and the log is written to /var/log/supervisor/notebook-xxx.log (the default location for the Debian/Ubuntu package).
Appendix: here I summarize the problems I ran into while building the environment and the links I referred to.
If you simply proxy port 80 access to JupyterHub without any further configuration, nginx reports the following error.
2017/02/03 17:20:44 [emerg] 16297#16297: unknown "connection_upgrade" variable
This was solved by referring to http://mogile.web.fc2.com/nginx/http/websocket.html; the following needs to be added to /etc/nginx/nginx.conf.
http {
map $http_upgrade $connection_upgrade {
default upgrade;
'' close;
}
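What the map block does is small enough to write out: it derives $connection_upgrade from the Upgrade request header. The following Python function is only an illustration of that mapping (the name connection_upgrade is borrowed from the nginx variable); nginx evaluates the map itself, and nothing here needs to be deployed.

```python
def connection_upgrade(http_upgrade):
    """Mimic the nginx map shown above.

    An empty (or missing) Upgrade header maps to 'close'; any other value
    falls through to the default, 'upgrade'.
    """
    if http_upgrade is None:
        http_upgrade = ""  # nginx treats a missing header as an empty string
    return "close" if http_upgrade == "" else "upgrade"
```

So a normal page request results in Connection: close toward JupyterHub, while a request carrying Upgrade: websocket results in Connection: upgrade.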
If you access over HTTPS through ELB, you can reach JupyterHub, but when you select a kernel, the following error appears in the supervisord log.
[I 2017-02-05 21:43:24.410 JupyterHub log:100] 200 GET /hub/api/authorizations/cookie/jupyter-hub-token-xxx/[secret]([email protected]) 14.89ms
21:44:26.703 - error: [ConfigProxy] Proxy error: Error: socket hang up
at createHangUpError (http.js:1472:15)
at Socket.socketCloseListener (http.js:1522:23)
at Socket.EventEmitter.emit (events.js:95:17)
at TCP.close (net.js:466:12)
The log mentions a cookie, and it seems this could be fixed by changing a setting somewhere, but I could not work it out and gave up. I will try again someday.
This is covered in other articles, so it could be omitted, but to use matplotlib and similar tools, install the libraries needed for data analysis as follows.
sudo apt-get install -y libpng12-dev libjpeg8-dev libfreetype6-dev libxft-dev
pip install numpy pandas matplotlib seaborn scikit-learn
* After installing these in the virtualenv, run supervisorctl restart notebook to make them available.
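To check from inside a notebook cell that the libraries are visible to the kernel, something like the following can be run. It is a small sketch using importlib from the standard library; missing_packages is a made-up name, and note that scikit-learn is imported as sklearn.

```python
import importlib.util


def missing_packages(import_names):
    """Return the top-level module names that cannot be found."""
    return [
        name for name in import_names
        if importlib.util.find_spec(name) is None
    ]


# Import names for the packages installed above (scikit-learn -> sklearn).
ANALYSIS_MODULES = ["numpy", "pandas", "matplotlib", "seaborn", "sklearn"]
```

Calling missing_packages(ANALYSIS_MODULES) returns an empty list when everything is installed in the environment the kernel runs in.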
- "Powerful notepad for modern engineers: Jupyter notebook recommendation": talks about how cool Jupyter notebook is.
- "Recommended coding environment Jupyter Notebook for data scientists": pleasantly opinionated, and you can feel the love for Jupyter notebook.