Environment

OS X El Capitan 10.11.6, Python 2.7.11, pandas 0.18.0, matplotlib 1.5.1, NumPy 1.10.4, IPython 4.1.2

Introduction

Related reading: "10 recommendations for plotting with Python". There are various ways to plot in Python, and the basic one is a library called matplotlib. Its default look is a little dated, though, so there is a wrapper called seaborn that makes it easy to draw stylish figures. If that still doesn't satisfy you, Bokeh may be a good choice. If you have used ggplot in R, some of this may feel familiar.

In any case, pandas, which you need anyway for shaping the data, also has a plot function that wraps matplotlib, so I will use that. I studied the following sites, but the notation differed slightly between them, probably because of version differences, which confused me a little.

http://sinhrks.hatenablog.com/entry/2015/11/15/222543
http://qiita.com/hik0107/items/de5785f680096df93efa
http://qiita.com/y__sama/items/9676f148a66c16d8f47c
http://qiita.com/TomokIshii/items/d786d25c69f20a0fc3c8

The most important point for me was the following.

DataFrame.plot() is the simplest way to draw. For a scatter plot, though, I wondered which of DataFrame.plot(kind='scatter') and DataFrame.plot.scatter() is better (or correct). The following passage in the "Other Plots" section of the official pandas "Visualization" documentation settled it for me:

You can also create these other plots using the methods DataFrame.plot.kind instead of providing the kind keyword argument. This makes it easier to discover plot methods and the specific arguments they use:

So both are correct, and the point is that DataFrame.plot.scatter() is the form that is easier to discover and understand.
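As a minimal sketch of that equivalence, the two spellings can be drawn side by side. The sample data and column names below are made up for illustration, and the Agg backend is only chosen so the snippet also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data; the column names are placeholders.
df = pd.DataFrame({"column1": [1, 2, 3, 4], "column2": [2.0, 4.1, 5.9, 8.2]})

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 4))

# Keyword form: the plot type is chosen with kind=...
df.plot(kind="scatter", x="column1", y="column2", ax=ax_left, title="kind='scatter'")

# Accessor form: the plot type is part of the method name
df.plot.scatter(x="column1", y="column2", ax=ax_right, title="plot.scatter()")

# Compare the point coordinates that each call handed to matplotlib
left_points = ax_left.collections[0].get_offsets()
right_points = ax_right.collections[0].get_offsets()
same_points = bool((left_points == right_points).all())
print(same_points)
```

The accessor form has the practical advantage that tab completion on DataFrame.plot. lists the available plot types.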

Below is a brief summary of the workflow up to plotting and the points I noticed along the way. All of it can also be found by reading the official matplotlib and pandas documentation.

Flow of drawing

Loading the library

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Command to display figures inline in Jupyter Notebook

%matplotlib inline

Loading data

Specifying the file path

file_path = "/Users/username/Documents/file_name.csv"

Comma separated files

data_frame=pd.read_csv(file_path)

Tab delimited file

data_frame=pd.read_table(file_path)

Files with other delimiters

data_frame = pd.read_table(file_path, sep='.')  # sep is the delimiter; here the file is "."-separated

Other settings

header=n  # row number (0-based) to use as the column names; the lines before it are skipped
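The sep and header settings can be tried without a file on disk. The following is a small sketch that reads from an in-memory string; the contents, the ";" delimiter, and the column names are all invented for the example.

```python
import io

import pandas as pd

# Hypothetical file contents: one line to skip, then a header row, ';'-separated.
raw = "title;demo data\nx;y\n1;10\n2;20\n3;30\n"

# sep sets the delimiter. header=1 uses the second line (0-based row 1) as the
# column names, so the line before it is discarded.
data_frame = pd.read_csv(io.StringIO(raw), sep=";", header=1)

print(data_frame.head())
```

With a real file, io.StringIO(raw) would simply be replaced by file_path.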

Checking the loaded data

data_frame.head()  # check the first few rows of the DataFrame
data_frame.tail()  # check the last few rows of the DataFrame

Data extraction

Column extraction

data_frame['column_name']  # extracted as a Series
data_frame.column_name  # same as above
data_frame[['column_name1', 'column_name2']]  # extract two columns
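A short sketch of what each form returns, using a made-up three-column frame (the names are placeholders): a single label gives a Series, while a list of labels gives a DataFrame.

```python
import pandas as pd

# Hypothetical data; the column names are placeholders.
data_frame = pd.DataFrame({
    "column_name1": [1, 2, 3],
    "column_name2": [4.0, 5.0, 6.0],
    "column_name3": ["a", "b", "c"],
})

single = data_frame["column_name1"]                   # one label -> Series
same = data_frame.column_name1                        # attribute access, same Series
pair = data_frame[["column_name1", "column_name2"]]   # list of labels -> DataFrame

print(type(single).__name__, type(pair).__name__)
```

Attribute access only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute, so the bracket form is the safer default.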

Extraction of rows

data_frame.ix['index_name']  # .ix selects rows by index label (deprecated and then removed in later pandas versions; use .loc there)
data_frame[:n]  # extract the first n rows
data_frame[data_frame['column_name'] > x]  # extract rows whose column_name value is greater than x
data_frame.query('column_name1 == x & column_name2 == y')  # use this for two or more conditions (write @x to refer to a Python variable x)
data_frame.query('column_name1 == x | column_name2 == y')  # for "or"
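The row selections above can be sketched on a tiny frame as follows. The data, index labels, and column names are invented, and .loc is used in place of .ix on the assumption that a newer pandas is installed.

```python
import pandas as pd

# Hypothetical data; index labels and column names are placeholders.
data_frame = pd.DataFrame(
    {"column_name1": [1, 5, 3, 5], "column_name2": [10, 20, 30, 40]},
    index=["a", "b", "c", "d"],
)

row_b = data_frame.loc["b"]                          # one row by index label (a Series)
first_two = data_frame[:2]                           # the first two rows
large = data_frame[data_frame["column_name1"] > 2]   # boolean mask

x, y = 5, 20
# Inside query(), @name refers to a Python variable in the calling scope.
both = data_frame.query("column_name1 == @x & column_name2 == @y")    # AND
either = data_frame.query("column_name1 == @x | column_name2 == 30")  # OR

print(list(both.index), list(either.index))
```

The boolean-mask form and query() select the same rows; query() just tends to read better once several conditions are combined.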

Plotting

Splitting the figure (2x2, etc.) and setting margins

fig, axes = plt.subplots(2, 2, figsize=(19, 19))
plt.subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=0.1, hspace=0.1)

Scatter plot

When drawing several panels in a for loop, i is the loop counter.

data_frame.plot(x='column1', y='column2', xlim=(x1, x2), linestyle='', marker='.', ax=axes.flatten()[i], color='k', title=title_list[i])

data_frame.plot.scatter(x='column1',y='column2',xlim=(x1,x2),ax=axes.flatten()[i],color='k',s=15,title=title_list[i])

Either way works, but plot.scatter() (probably) assumes a one-to-one correspondence between x and y, so it draws no legend. With plot() you can omit x= if you want the index on the x-axis. Because plot() draws a line graph by default, you get a scatter plot by setting linestyle to an empty string and specifying a marker. A column that contains NaN is plotted with the NaN values ignored, but plotting two such columns at the same time raises an error. You also need to specify a different drawing area (ax) for each plot.
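Putting the pieces together, a loop over a 2x2 grid might look like the sketch below. The four data frames, the titles, and the axis range are made up, each frame contains one NaN to show that it is skipped, and the two call styles are alternated only for comparison.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: four frames, each with one NaN in column2.
rng = np.random.RandomState(0)
frames = []
for _ in range(4):
    df = pd.DataFrame({"column1": np.arange(10.0), "column2": rng.rand(10)})
    df.loc[3, "column2"] = np.nan
    frames.append(df)
title_list = ["A", "B", "C", "D"]

fig, axes = plt.subplots(2, 2, figsize=(8, 8))
plt.subplots_adjust(wspace=0.3, hspace=0.3)

for i, df in enumerate(frames):
    ax = axes.flatten()[i]  # a different drawing area for every plot
    if i % 2 == 0:
        # Line plot turned into a scatter plot: no line, point markers only
        df.plot(x="column1", y="column2", xlim=(0, 9), linestyle="", marker=".",
                ax=ax, color="k", title=title_list[i])
    else:
        # Dedicated scatter method; s sets the marker size
        df.plot.scatter(x="column1", y="column2", xlim=(0, 9),
                        ax=ax, color="k", s=15, title=title_list[i])

fig.savefig("scatter_grid.png")  # hypothetical output file name
```

axes is a 2x2 array, so flatten() is what allows a single counter i to address the panels in order.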

Arguments of .plot()

xlim=(x1, x2)  # x range
ylim=(y1, y2)  # y range
color='k'  # color: k black, r red, b blue, g green, c cyan, m magenta, w white, y yellow
linestyle='-'  # ls. '-': solid, '--': dashed, '': none
linewidth=1  # lw
marker='.'  # '.': point, 'o': circle, 'v': triangle, 's': square, '+': plus, '': none
markersize=12  # ms
markeredgecolor=''  # mec
markeredgewidth=1  # mew
markerfacecolor=''  # mfc
label='name'  # legend label
ax=axes.flatten()[i]  # draw in the i-th panel
yerr=''  # y-axis error bars

Arguments of .plot.scatter()

s=20  # marker size

Command to add a legend

axes.flatten()[i].legend(loc='best')  # other locations include 'upper right', 'center left', 'lower center', 'center'
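As a rough illustration of how several of these arguments combine with the legend call, here is a sketch with invented data. The label "measured" and the chosen styles are arbitrary, and yerr is left out to keep the example short.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data; column names are placeholders.
df = pd.DataFrame({"x": [0, 1, 2, 3], "y": [1.0, 1.8, 3.1, 3.9]})

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
i = 0
ax = axes.flatten()[i]

# Red dashed line with white-filled, black-edged circle markers
df.plot(x="x", y="y", xlim=(-0.5, 3.5), ylim=(0, 5),
        color="r", linestyle="--", linewidth=1,
        marker="o", markersize=6, markeredgecolor="k", markerfacecolor="w",
        label="measured", ax=ax)

# Re-draw the legend at an explicit location
legend = ax.legend(loc="upper left")
labels = [text.get_text() for text in legend.get_texts()]
print(labels)
```

The text passed as label= is what appears in the legend; without it, pandas falls back to the name of the y column.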

Other notes

You shouldn't use the pylab package (from pylab import *).

http://yagays.github.io/blog/2014/08/15/ipython-notebook-matplotlib-inline/
http://qiita.com/HirofumiYashima/items/51d8dac9a784de356c5b

Load numpy and pyplot separately instead:

import numpy as np
import matplotlib.pyplot as plt

Because pylab imports everything wholesale, it may overwrite names you already use.

Sites I often rely on

pandas in general. Well organized, by one of the pandas developers. http://sinhrks.hatenablog.com/entry/2015/04/28/235430

matplotlib wiki http://seesaawiki.jp/met-python/d/matplotlib

For moving back and forth between standard Python, NumPy, and pandas http://qiita.com/richi40/items/6b3af6f4b00d62dbe8e1

Terminology

Python (2.7): A scripting language. Code can be written more cleanly than in Perl (in fact, you can hardly write it any other way). More popular than Ruby outside Japan. It is widely used in scientific computing, and there are many related libraries.
Library: A reusable collection of general-purpose programs, that is, code that provides some functionality to other programs. The modules and packages described below that ship with Python are called the standard library.
Module: A collection of components such as functions, grouped by purpose. Examples: math, sys.
Package: A directory containing module files. Examples: NumPy, pandas.

pandas: A package that provides easy-to-use data structures and functions. An ambitious package that aims to be the most powerful and flexible data analysis tool in any language.

pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal.

NumPy: The fundamental package for scientific computing in Python.
matplotlib: The most commonly used package for data visualization.
Jupyter: A web application that lets you create and share code, equations, the figures the code produces, and explanatory text. You can clean and transform data, run numerical simulations, statistical analysis, machine learning, and more. It makes exploratory analysis easy, and the results can be shared and saved.

The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.

SciPy: A Python-based software ecosystem for mathematics, science, and engineering. Rather than a single package, the name covers the combination of the packages above (NumPy, matplotlib, Jupyter, pandas) with a fundamental scientific computing package called the SciPy library. In practice, "SciPy" often refers to the SciPy library itself.

Aside

From "Other Plots" in the official pandas Visualization documentation:

In addition to these kinds, there are the DataFrame.hist(), and DataFrame.boxplot() methods, which use a separate interface.

Besides the kind-based methods, DataFrame.hist() and DataFrame.boxplot() exist as separate interfaces. A histogram can therefore be written in three ways: DataFrame.plot(kind='hist'), DataFrame.plot.hist(), and DataFrame.hist().
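The three spellings can be compared directly. The sketch below uses random sample data and a made-up column name, and draws one histogram per panel with the same number of bins.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: 200 normally distributed values in one column.
rng = np.random.RandomState(0)
df = pd.DataFrame({"value": rng.randn(200)})

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

df.plot(kind="hist", bins=20, ax=axes[0], title="plot(kind='hist')")  # keyword form
df.plot.hist(bins=20, ax=axes[1], title="plot.hist()")                # accessor form
df.hist(column="value", bins=20, ax=axes[2])                          # separate interface

# Bar heights (counts per bin) drawn in each panel
heights = [[patch.get_height() for patch in ax.patches] for ax in axes]
print([len(h) for h in heights])
```

The first two go through the same plotting machinery and accept the same arguments, while DataFrame.hist() has its own signature (for example, column= to pick the columns) and draws one panel per column.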

Drawing on Jupyter using the plot function of pandas