I took Udemy's "Practical Python Data Science"

This post is the day 3 entry of the Jupyter Notebook Advent Calendar 2016.

Udemy is an online learning service, and its course "Practical Python Data Science" was available for 2,300 yen, so I tried it. The 17.5-hour course is taught in Japanese by Shingo Tsuji, the author of "Python Startbook", and it is very easy to follow. I haven't finished all the sections yet, but I would like to introduce the course here.

Because nearly every section of the course is explained in Jupyter Notebook, I will also point out the Jupyter Notebook features that come up along the way.

The first half explains how to use Python's data analysis libraries. Data analysis and visualization come next, and the latter half covers more practical analysis using real data.

Anaconda

Anaconda is a distribution, provided by Continuum Analytics, that bundles the libraries needed for data analysis. It also includes Jupyter Notebook. Installing Anaconda gives you pip, Python's package manager, as well, so you can use it to install any other libraries you need. I think a complete beginner like me should start by installing Anaconda to set up a Python data analysis environment.

How to use Jupyter

After installing Anaconda, start Jupyter from your working directory as shown below and a browser window will open. (On newer installations the equivalent command is jupyter notebook.)

$ ipython notebook

Click New, then Python, to create a new notebook
Code execution
Ctrl + Enter runs the code in the current cell
Shift + Enter runs the code and moves to the next cell, creating one if needed
You can also write notes in Markdown
Press Ctrl + C in the terminal to quit Jupyter
Use Rename Notebook to rename a notebook. It is saved with the .ipynb extension.
You can reopen a saved notebook by clicking a file with the .ipynb extension.
You can run terminal commands from a cell by prefixing them with !, for example !ls.

In a Markdown cell you can also write HTML directly, and it is rendered when you run the cell. For example, if an image file is in the same directory as the notebook, the tag below displays it inside the notebook.

<img src="lec28.png">

To display matplotlib plots inline in the notebook, run the following magic command.

%matplotlib inline

Explanation of data analysis

From Section 3 onward, the course uses Jupyter to explain the basics of data science, introducing one library at a time.

NumPy
NumPy arrays play a central role in data analysis
Pandas
A library that is used very often when analyzing data with Python
Series is a commonly used one-dimensional data type
DataFrame is a commonly used tabular data type
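As a minimal sketch of the three data types named above (the values and labels are made up for illustration and do not come from the course):

```python
import numpy as np
from pandas import Series, DataFrame

# NumPy array: a typed, n-dimensional container with vectorized arithmetic
arr = np.array([1, 2, 3, 4])
doubled = arr * 2  # element-wise, no explicit loop

# Series: a one-dimensional array with an index of labels
s = Series([10, 20, 30], index=['a', 'b', 'c'])

# DataFrame: a two-dimensional table whose columns are Series
df = DataFrame({'team': ['A', 'B', 'C'], 'wins': [10, 7, 12]})

print(doubled)
print(s['b'])
print(df[df['wins'] > 8])  # boolean filtering on a column
```

In a notebook you would usually leave out print and just put the variable name on the last line of the cell to see it rendered.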

With pandas you can even read data directly from the clipboard:

from pandas import Series, DataFrame
import pandas as pd
# Copy the data you want to read to the clipboard first
nfl_frame = pd.read_clipboard()
nfl_frame

Basics of data analysis

How to read data from text files, JSON, HTML, and Excel
How to merge data sets, and more
How to clean up and organize data
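The read, merge, and tidy steps listed above might look roughly like this. This is only a sketch: the two small tables are hypothetical stand-ins for files on disk, and the column names are my own, not the course's.

```python
from io import StringIO
import pandas as pd

# Hypothetical sample data, standing in for CSV files on disk
players_csv = StringIO("player_id,name\n1,Alice\n2,Bob\n3,Carol\n")
scores_csv = StringIO("player_id,score\n1,90\n2,\n4,70\n")

players = pd.read_csv(players_csv)
scores = pd.read_csv(scores_csv)

# Merge on the shared key; an inner join keeps only ids present in both tables
merged = pd.merge(players, scores, on='player_id', how='inner')

# Tidy up: drop rows whose score is missing
cleaned = merged.dropna(subset=['score'])
print(cleaned)
```

For the other formats mentioned, pandas also provides pd.read_json, pd.read_html, and pd.read_excel, which return DataFrames in the same way.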

Data visualization

Using Seaborn

Seaborn is a very good library for visualizing data, and it makes it easy to change color schemes. You can draw histograms, kernel density estimates (smooth histograms, in a nutshell), box plots, violin plots, regression lines, and more.
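To make "smooth histogram" a little more concrete, here is a hand-rolled sketch of a Gaussian kernel density estimate using only NumPy. The function name, the sample data, and the bandwidth of 0.4 are my own assumptions for illustration. In practice you would let Seaborn draw this for you, so this only shows the idea.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average a Gaussian bump centered on each sample, evaluated on grid."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Scaled distance from every grid point to every sample:
    # shape is (len(grid), len(samples))
    z = (grid[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    return bumps.mean(axis=1)

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=200)  # made-up sample data
grid = np.linspace(-5, 5, 201)
density = gaussian_kde_1d(data, grid, bandwidth=0.4)

# A density should integrate to roughly 1 over a wide enough range
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```

A smaller bandwidth follows the individual samples more closely, while a larger one gives a smoother curve.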

Practical data analysis

Analyzing Titanic data

You can download various data sets by creating an account at Kaggle and logging in. Here, the course analyzes the attribute data of Titanic passengers and survivors.

Kaggle is a platform for predictive modeling and analytics, and also the name of the company that runs it. Companies and researchers post data, and statisticians and data analysts from around the world compete to build the best model. See also: https://ja.wikipedia.org/wiki/Kaggle

Stock market data analysis

Pandas was originally created for analyzing financial data, so it is well suited to analyzing stock prices. With the Remote Data Access DataReader, you can easily download stock data and analyze it.

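# Note: pandas.io.data was later removed from pandas itself; DataReader is
# now provided by the separate pandas-datareader package.
from pandas_datareader.data import DataReader
from datetime import datetime

tech_list = ['AAPL', 'GOOG', 'MSFT', 'AMZN']
end = datetime.now()
start = datetime(end.year - 1, end.month, end.day)  # one year before today
for stock in tech_list:
    # Store each DataFrame in a global variable named after its ticker
    globals()[stock] = DataReader(stock, 'yahoo', start, end)
AAPL.describe()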

Election data analysis

The data comes from HuffPost Pollster and covers the US election. This section uses the requests module, which is convenient for fetching data from the web, and StringIO, which lets you treat CSV-formatted text like a file so that it can be loaded for analysis.
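The requests plus StringIO pattern can be sketched as follows. The helper names and the sample text are hypothetical. The sample is made-up data, not real HuffPost Pollster output, and the download helper is defined but not called, so the example runs offline.

```python
from io import StringIO
import pandas as pd

def fetch_csv_text(url):
    """Download a URL and return the response body as text."""
    # Imported here so the offline demo below runs without requests or a network
    import requests
    response = requests.get(url)
    response.raise_for_status()
    return response.text

def csv_text_to_frame(text):
    """Wrap CSV text in StringIO so pandas can read it like a file."""
    return pd.read_csv(StringIO(text))

# Made-up stand-in for downloaded poll data
sample_text = "pollster,candidate_a,candidate_b\nPoll X,45,43\nPoll Y,47,41\n"
poll_df = csv_text_to_frame(sample_text)
poll_df['difference'] = poll_df['candidate_a'] - poll_df['candidate_b']
print(poll_df)
```

With a real data source you would pass the result of fetch_csv_text(url) to csv_text_to_frame instead of the sample string.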

Much of this post has strayed from Jupyter Notebook itself, but I hope it is helpful. I think "Practical Python Data Science" is worth more than 2,300 yen. The content is very easy to understand even for a complete beginner like me, so I recommend taking it.