Data Scientist Article 2nd ww
I think data scientists often give presentations and hold discussions while looking at data (an offhand guess). For that reason, I feel they need a coding environment different from vim/emacs and traditional IDEs.
I think the following four points are required specifications.
Required specifications | vim/emacs | RStudio/Spyder | Spotfire/Tableau | Jupyter Notebook |
---|---|---|---|---|
Can code | ◎ | ○ | × | ○ |
Data can be visualized interactively | × | ○ | ◎ | ○ |
Ease of ensuring reproducibility | △ | △ | × | ○ |
Sexy | × (as seen by the general public) | ○ | ◎ | ◎ |
It's an arbitrary table far removed from data science, but it is true that Jupyter Notebook covers all four points without a weak spot. RStudio (or Spyder for Python) is also good, but from the viewpoint of ~~sexiness~~ ensuring reproducibility, Jupyter Notebook is better because it lets you leave comments in Markdown. I recommend it even if you are not a data scientist, because it lets you save the coding process.
- Python comes with an interactive shell out of the box, but people who were not satisfied with it created an interactive shell called IPython (Interactive Python). (Excerpt from "How to use IPython")

- Cell-oriented coding: code can be executed in units called cells
- Tab completion of reserved words, variables, module names, etc.
- Object inspection: add `?` to an object name for more information
- Various magic commands: for example, check execution speed with `%%timeit`
- Shell commands: lines starting with `!`, such as `!ls`, can be executed as shell commands
- Reuse of inputs and outputs: cell inputs and outputs are stored in variables called `In` and `Out`

- Within the IPython project, IPython Notebook appeared, which lets you use IPython input and output from the web.
- pandas tables, matplotlib graphs, MathJax formulas, etc. can be displayed using web features.
- Comments can be displayed in Markdown: descriptive power improved at a stroke.
- You can save and share the analysis process in .ipynb format.
- If you put a notebook on GitHub, you can view it as a rendered notebook through nbviewer.
- Users of other languages who saw IPython Notebook started hooking into it so that it would work with their languages as well.
- The foundation is a ZeroMQ-based communication layer called the kernel.
- Kernels began to be written for each language: Julia, Ruby, R, etc.
- At that point, isn't the name "IPython Notebook" a bit odd?
- So it was spun out of the IPython project and became an independent project called Jupyter.
- IPython Notebook 4.0 => Jupyter Notebook (JuPyteR: Julia + Python + R)
- So Jupyter Notebook and IPython Notebook are the same thing.
- The console version called qtconsole, which was developed within IPython, has also moved to Jupyter.
As a result, Jupyter Notebook

- works with various languages,
- makes coding easy,
- offers the powerful expressiveness of Markdown comments,
- has an interactive data visualization environment via the web, and
- embeds the code, which makes re-running easy.

**It is an application that provides reproducibility, storage, and sharing of the analysis process.** It is a kind of electronic lab notebook. (There is no witness, though.)
If you install Anaconda, everything you need is included. How to set up Anaconda is described here.
Start the terminal in an appropriate folder and hit the following command.
jupyter notebook
You are all set if the browser starts up and the Jupyter page is displayed at http://localhost:8888.
Old article Integration
Jupyter works for the time being without any of the settings below. If you don't need them, feel free to skip this part.
alias
`jupyter notebook` is quite long to type, so it is convenient to alias it to something short such as `note`.
It seems that the config area has changed considerably in jupyter 4.0, so those who have been using it for a long time should check it. https://jupyter.readthedocs.org/en/latest/migrating.html
jupyter notebook --generate-config
#>>> Writing default config to: ~/.jupyter/jupyter_notebook_config.py
python -c "from notebook.auth import passwd;print(passwd())"
#>>> Enter password:
#>>> Verify password:
#>>> 'sha1:........'
Copy the hashed password that starts with sha1:...
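For reference, the string printed by `passwd()` appears to have the form `algorithm:salt:digest`. The following is a minimal standard-library sketch of that format, assuming the digest is SHA-1 over the passphrase followed by the salt, which is my understanding of the legacy `notebook.auth` behaviour and is not confirmed by this article. `make_hash` and `check_hash` are hypothetical helper names; for real use, stick with the official `passwd()`.

```python
import hashlib
import secrets


def make_hash(passphrase, algorithm="sha1", salt_len=12):
    """Return 'algorithm:salt:digest' (assumed legacy notebook format)."""
    salt = secrets.token_hex(salt_len // 2)  # salt_len hexadecimal characters
    h = hashlib.new(algorithm)
    h.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    return ":".join((algorithm, salt, h.hexdigest()))


def check_hash(hashed, passphrase):
    """Recompute the digest with the stored salt and compare it."""
    try:
        algorithm, salt, digest = hashed.split(":", 2)
    except ValueError:
        return False  # not in 'algorithm:salt:digest' form
    h = hashlib.new(algorithm)
    h.update(passphrase.encode("utf-8") + salt.encode("ascii"))
    return secrets.compare_digest(h.hexdigest(), digest)
```

Because the salt is random, the same passphrase produces a different string on every call, which is why the hash, not the passphrase, goes into the config file.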
vi ~/.jupyter/jupyter_notebook_config.py
Find the parameters below, uncomment them if necessary, and enter the values.
Parameters | initial value | comment |
---|---|---|
c.NotebookApp.ip | 'localhost' | Change this if you want to access Jupyter from other client machines. '*' opens it to everyone. |
c.NotebookApp.notebook_dir | null | The root directory Jupyter starts in. It is a good idea to set this explicitly. |
c.NotebookApp.open_browser | True | Whether to open a browser at startup. Set it to False on servers without X. |
c.NotebookApp.port | 8888 | If 8888 is already in use elsewhere, specify a different port. |
c.NotebookApp.password | null | If you enter the hash string you copied earlier, password authentication is enabled. |
There are other ssl settings as well, so check the documentation if you need to publish to the web.
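Putting the table together, an edited config file might look roughly like the sketch below. The values are illustrative assumptions (the directory path in particular is hypothetical), and the password is left as the elided placeholder from above. Jupyter provides `get_config()` when it loads the file; the `except` branch is only a stand-in so that the fragment can also be run on its own.

```python
# ~/.jupyter/jupyter_notebook_config.py (sketch; values are examples only)
try:
    c = get_config()  # provided by Jupyter when it loads this file
except NameError:
    # Stand-in so the fragment runs outside Jupyter; not needed in the real file.
    from types import SimpleNamespace
    c = SimpleNamespace(NotebookApp=SimpleNamespace())

c.NotebookApp.ip = '*'                             # accept connections from other machines
c.NotebookApp.notebook_dir = '/path/to/notebooks'  # hypothetical directory
c.NotebookApp.open_browser = False                 # e.g. on a server without X
c.NotebookApp.port = 8888                          # change if 8888 is already taken
c.NotebookApp.password = 'sha1:........'           # paste the hash copied earlier
```

Opening `ip` to `'*'` without a password exposes the notebook to anyone who can reach the machine, so it makes sense to set the two together.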
If you put the libraries you always load first into a file under ~/.ipython/profile_default/startup, they are imported whenever the kernel starts.
Magic commands can also be written if you use the .ipy format, so it is a good idea to include %matplotlib inline as well.
If you also put your preferred seaborn settings there, things get easier afterwards.
If you put in too much, the kernel becomes slow to start, which is frustrating. (pandas is relatively heavy.)
Example:
00_init.ipy
%matplotlib inline
from matplotlib import pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
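To judge which imports are worth keeping in the startup file, it can help to time them. The sketch below uses only the standard library; `time_import` and `report` are hypothetical helpers, not part of IPython or Jupyter. A module that is already imported in the current process returns almost immediately, so this is best run in a fresh interpreter.

```python
import importlib
import time


def time_import(module_name):
    """Return the seconds taken to import module_name, or None if it is missing."""
    start = time.perf_counter()
    try:
        importlib.import_module(module_name)
    except ImportError:
        return None
    return time.perf_counter() - start


def report(module_names):
    """Map each module name to its import time in seconds (None if missing)."""
    return {name: time_import(name) for name in module_names}
```

For example, `report(["numpy", "pandas", "seaborn"])` in a fresh interpreter gives a rough idea of how much each line adds to kernel startup.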
extension
Jupyter Notebook also has extensions. I will omit the details because they are summarized in the following articles.
Add an extension to build a more comfortable Jupyter environment
[jupyter notebook extensions python-markdown(markdown + jinja2)](http://qiita.com/ksomemo/items/ba0f24daae2276ffd9b2)
RISE
There is a cool extension that allows you to make presentations on your Jupyter Notebook.
git clone https://github.com/damianavila/RISE
cd RISE
python setup.py install
A slideshow button will be added at the top right of the notebook page.
If you select Slideshow from the Cell Toolbar button on the notebook page, you can specify, cell by cell, how far each slide extends.
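As far as I understand, the slide type chosen in the Cell Toolbar is stored in each cell's metadata inside the .ipynb file, which is plain JSON. The sketch below builds a tiny notebook by hand to show where that setting lives. `make_cell` is a hypothetical helper, and the `slideshow.slide_type` key reflects my understanding of the file format rather than anything stated in this article.

```python
import json


def make_cell(cell_type, source, slide_type=None):
    """Build one nbformat-4-style cell, optionally tagged with a slide type."""
    cell = {"cell_type": cell_type, "metadata": {}, "source": source}
    if cell_type == "code":
        cell["execution_count"] = None
        cell["outputs"] = []
    if slide_type is not None:
        # Assumed location of the Slideshow toolbar setting
        cell["metadata"]["slideshow"] = {"slide_type": slide_type}
    return cell


notebook = {
    "cells": [
        make_cell("markdown", "# Title slide", slide_type="slide"),
        make_cell("code", "print('hello')", slide_type="fragment"),
        make_cell("markdown", "Speaker notes", slide_type="notes"),
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 0,
}

# Serialized form; writing this string to a .ipynb file gives a notebook to open.
notebook_json = json.dumps(notebook, indent=1)
```

Since the setting lives in the file itself, the slide layout travels with the notebook when you share it.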
Jupyter Content Management Extensions
(3/21 postscript) I forgot that IBM has created a super useful extension. With it installed, you can run a full-text search over notebook files from Jupyter. It is introduced on the IBM blog.
It is published on pip, so it is easy to install.
pip install jupyter_cms
jupyter cms install --user -s
jupyter cms activate
jupyter notebook
If you launch the notebook after jupyter cms activate, a search button will be added to the tree screen.
The search also covers code and comments in notebooks further down the directory tree, which improves reusability.
The search is also quite flexible (http://whoosh.readthedocs.org/en/latest/querylang.html).
(I would be more happy if there was a preview ...)
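To illustrate the idea of searching notebook contents, here is a plain substring scan using only the standard library. This is not how jupyter_cms itself works and it has none of its query flexibility; `search_notebooks` is a hypothetical helper, and it assumes nbformat-4-style files with a top-level `cells` list.

```python
import json
from pathlib import Path


def search_notebooks(root, term):
    """Yield (path, cell_index) for every cell whose source contains term."""
    term = term.lower()
    for path in sorted(Path(root).rglob("*.ipynb")):
        try:
            nb = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # skip unreadable or malformed files
        if not isinstance(nb, dict):
            continue
        for i, cell in enumerate(nb.get("cells", [])):
            source = cell.get("source", "")
            if isinstance(source, list):  # source may be stored as a list of lines
                source = "".join(source)
            if term in source.lower():
                yield path, i
```

Calling `list(search_notebooks(".", "pandas"))` from the notebook directory would list every cell, code or Markdown, that mentions the term.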
In addition, there are also extensions that allow you to create dashboards with jupyter notebook. JupyterDay NYC slides github
(3/21 postscript up to here)
There is a good article. Beginning of Jupyter
(Added on April 13, 2016) To switch between environments, install jupyter_environment_kernels. I learned about it from a wonderful article here.