Environment: OS X El Capitan 10.11.6, Python 2.7.11, pandas 0.18.0, matplotlib 1.5.1, NumPy 1.10.4, IPython 4.1.2
Recommended plotting in Python
There are various ways to draw plots in Python, and the basic one is a library called matplotlib. Its default output looks a bit dated, though, so there is a wrapper called seaborn that makes it easy to draw stylish plots. If that still does not satisfy you, Bokeh may be a good choice. If you use ggplot in R, it may feel familiar.
In any case, pandas, which you need for shaping the data anyway, also has a plot function that wraps matplotlib, so I will use that. I studied the following sites, but I was a little confused because the notation differed slightly between them, probably because of version differences.
http://sinhrks.hatenablog.com/entry/2015/11/15/222543
http://qiita.com/hik0107/items/de5785f680096df93efa
http://qiita.com/y__sama/items/9676f148a66c16d8f47c
http://qiita.com/TomokIshii/items/d786d25c69f20a0fc3c8
The most important point for me was this:
DataFrame.plot()
is the simplest way to draw, but for a scatter plot I wondered which of
DataFrame.plot(kind='scatter')
or
DataFrame.plot.scatter()
is better (or correct). The "Other Plots" section of the official pandas "Visualization" documentation settled it with the following description:
You can also create these other plots using the methods DataFrame.plot.<kind> instead of providing the kind keyword argument. This makes it easier to discover plot methods and the specific arguments they use:
So both are correct, and the point is that
DataFrame.plot.scatter()
is the easier one to understand.
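As a minimal sketch of that equivalence, the two spellings can be run side by side. The sample data and the column names column1 and column2 are made up for illustration, and the Agg backend is selected only so the code runs without a display (in a notebook you would use %matplotlib inline instead).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are placeholders.
rng = np.random.RandomState(0)
df = pd.DataFrame({"column1": rng.rand(50), "column2": rng.rand(50)})

# The same scatter plot written both ways.
ax_kind = df.plot(kind="scatter", x="column1", y="column2")
ax_method = df.plot.scatter(x="column1", y="column2")

# Each call returns a matplotlib Axes holding one scatter collection
# with the same point coordinates.
same_points = np.allclose(
    ax_kind.collections[0].get_offsets(),
    ax_method.collections[0].get_offsets(),
)
```

The method form has the practical advantage that tab completion on df.plot. lists the available plot types.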
Below is a brief summary of the workflow up to drawing and the points I noticed. All of it can be found by reading the official matplotlib and pandas documentation.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
Specifying the file path
file_path="/Users/username/Documents/file_name.csv"
data_frame=pd.read_csv(file_path)
# comma-separated file
data_frame=pd.read_table(file_path)
# tab-separated file
data_frame=pd.read_table(file_path, sep='.')
# when the delimiter is "."
Other settings
sep='' # delimiter
header='' # row number to use as the column names; the rows above it are skipped
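The effect of sep and header can be checked on a small in-memory example. This is only a sketch: io.StringIO stands in for file_path, and the file contents and the ";" delimiter are invented for illustration.

```python
import io
import pandas as pd

# Hypothetical file contents; the first line is a title row we do not want.
text = "exported;2016\nname;value\na;1\nb;2\n"

# sep sets the delimiter. header=1 takes the second line (0-based row 1)
# as the column names and discards the line above it.
data_frame = pd.read_csv(io.StringIO(text), sep=";", header=1)

# read_table is the same reader with a tab as the default delimiter.
tab_frame = pd.read_table(io.StringIO("name\tvalue\na\t1\n"))
```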
data_frame.head()
# Check the first few rows of the DataFrame
data_frame.tail()
# Check the last few rows of the DataFrame
data_frame['column_name']
# Extract one column as a Series
data_frame.column_name
# Same as above
data_frame[['column_name1','column_name2']]
# Extract two columns
data_frame.ix['index_name']
# ix is an indexer for referring to rows by index (deprecated since pandas 0.20 and removed in 1.0; use .loc for labels)
data_frame[:n]
# Extract the first n rows
data_frame[data_frame['column_name'] > x]
# Extract the rows whose column_name value is greater than x
data_frame.query('column_name1 == x & column_name2 == y')
# Use this to combine two or more conditions (AND)
data_frame.query('column_name == x | column_name == y')
# For OR
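The selection idioms above can be tried on a small frame. This is a sketch with invented data; it uses .loc for the label lookup instead of .ix, since .ix is not available in current pandas.

```python
import pandas as pd

# Hypothetical frame; column and index names are placeholders.
df = pd.DataFrame(
    {"column_name1": [1, 2, 3, 4], "column_name2": [10, 20, 30, 40]},
    index=["a", "b", "c", "d"],
)

series = df["column_name1"]                       # one column as a Series
two_cols = df[["column_name1", "column_name2"]]   # two columns as a DataFrame
row_b = df.loc["b"]                               # one row by index label
first_two = df[:2]                                # the first two rows
greater = df[df["column_name1"] > 2]              # boolean mask on one column
both = df.query("column_name1 > 1 & column_name2 < 40")      # AND
either = df.query("column_name1 == 1 | column_name2 == 40")  # OR
```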
fig, axes = plt.subplots(2,2,figsize=(19,19))
plt.subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=0.1, hspace=0.1)
When drawing several panels in a for loop, i is the loop counter.
data_frame.plot(x='column1',y='column2',xlim=(x1,x2),linestyle='',marker='.',ax=axes.flatten()[i],color='k',title=title_list[i])
data_frame.plot.scatter(x='column1',y='column2',xlim=(x1,x2),ax=axes.flatten()[i],color='k',s=15,title=title_list[i])
Either way works, but plot.scatter() (probably) assumes a one-to-one correspondence between x and y, so it draws no legend. With plot() you can omit x= if you want the index on the x-axis. Because plot() draws a line graph by default, it becomes a scatter plot when you turn off the line style and specify a marker.
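Put together, drawing into a 2x2 grid from a for loop might look like the sketch below. The data, the title_list values, and the smaller figure size are invented for illustration, and the Agg backend is used only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical inputs: one small frame and one title per panel.
rng = np.random.RandomState(0)
title_list = ["A", "B", "C", "D"]
frames = [
    pd.DataFrame({"column1": rng.rand(30), "column2": rng.rand(30)})
    for _ in title_list
]

fig, axes = plt.subplots(2, 2, figsize=(8, 8))
plt.subplots_adjust(wspace=0.3, hspace=0.3)

for i, data_frame in enumerate(frames):
    ax = axes.flatten()[i]  # i-th panel, counted row by row
    if i % 2 == 0:
        # Line plot with the line switched off and a marker set.
        data_frame.plot(x="column1", y="column2", xlim=(0, 1), linestyle="",
                        marker=".", ax=ax, color="k", title=title_list[i])
    else:
        # Dedicated scatter method.
        data_frame.plot.scatter(x="column1", y="column2", xlim=(0, 1),
                                ax=ax, color="k", s=15, title=title_list[i])
```

Alternating the two styles is only there to show both calls in one figure; in practice you would pick one.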
Even if a column contains NaN, those values are ignored and the rest is plotted, but plotting from two columns at the same time raises an error. The drawing area has to be changed each time.
xlim=(x1,x2) # x range
ylim=(y1,y2) # y range
color='k' # color: k black, r red, b blue, g green, c cyan, m magenta, w white, y yellow
linestyle='-' # ls. '-': solid, '--': dashed, '': nothing
linewidth=1 # lw.
marker='.' # '.': point, 'o': circle, 'v': triangle, 's': square, '+': plus, '': nothing
markersize=12 # ms.
markeredgecolor='' # mec.
markeredgewidth=1 # mew.
markerfacecolor='' # mfc.
label='name' # legend
ax=axes.flatten()[i] # draw in the i-th panel
yerr='' # y-axis error bar
s=20 # marker size (plot.scatter)
axes.flatten()[i].legend(loc='best') # 'upper right','center left','lower center','center'
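A few of these keywords combined in one call, as a sketch with invented data (the Agg backend is again only for running without a display):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical data; column names are placeholders.
df = pd.DataFrame({"column1": [0, 1, 2, 3], "column2": [1.0, 0.5, 2.0, 1.5]})

# Dashed red line with circle markers; label sets the legend entry.
ax = df.plot(x="column1", y="column2", color="r", linestyle="--", linewidth=1,
             marker="o", markersize=4, label="name", ylim=(0, 3))
ax.legend(loc="best")  # let matplotlib choose the legend position

line = ax.lines[0]
legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
```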
You shouldn't use the pylab package (from pylab import *).
http://yagays.github.io/blog/2014/08/15/ipython-notebook-matplotlib-inline/
http://qiita.com/HirofumiYashima/items/51d8dac9a784de356c5b
import numpy as np
import matplotlib.pyplot as plt
numpy and pyplot should be loaded separately like this. pylab's wildcard import can cause name collisions.
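The kind of collision meant here can be illustrated without importing pylab at all. The sketch assumes that a wildcard import of pylab brings NumPy's sum into the namespace in place of the built-in one, and compares the two functions directly.

```python
import builtins
import numpy as np

# Built-in sum: the second argument is the start value, so this is 10 + (-1).
builtin_result = builtins.sum(range(5), -1)   # 9

# NumPy sum: the second argument is the axis, so this is the plain total.
numpy_result = np.sum(range(5), -1)           # 10
```

With the np. prefix it is always visible which function is being called, which is the reason for the explicit imports.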
pandas in general. Well organized by one of the pandas developers. http://sinhrks.hatenablog.com/entry/2015/04/28/235430
matplotlib wiki http://seesaawiki.jp/met-python/d/matplotlib
For going back and forth between standard Python, NumPy, and pandas http://qiita.com/richi40/items/6b3af6f4b00d62dbe8e1
Python (2.7) A scripting language. It can be written more cleanly than Perl (in fact, it can only be written cleanly). More popular than Ruby overseas. It is often used for scientific computing, and there are many related libraries.
Library A reusable collection of general-purpose programs; a collection of code that provides some functionality to other programs. Among the modules and packages described below, those that ship with Python are called the standard library.
Module A collection of parts such as functions, grouped by purpose. math, sys, etc.
Package A directory containing module files. NumPy, pandas, etc.
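The module/package distinction can be observed from Python itself. As a small sketch using only the standard library: a package carries a __path__ attribute pointing at its directory, while a plain module does not.

```python
import json  # a package in the standard library (a directory of modules)
import math  # a single module

math_is_package = hasattr(math, "__path__")  # False
json_is_package = hasattr(json, "__path__")  # True
```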
pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal.
NumPy The fundamental package for scientific computing in Python
matplotlib The most common package used for data visualization
Jupyter A web application that lets you create and share code, formulas, the figures drawn from them, and their explanations. You can clean and format data, run numerical calculations, statistical analysis, machine learning, and more. It makes exploratory analysis easy, and the results can be shared and stored.
The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.
From "Other Plots" in the Visualization page of the official pandas documentation:
In addition to these kinds, there are the DataFrame.hist(), and DataFrame.boxplot() methods, which use a separate interface.
That is, besides the kind argument, DataFrame.hist() and DataFrame.boxplot() exist as separate interfaces.
Therefore, there are three ways to write a histogram:
DataFrame.plot(kind='hist')
DataFrame.plot.hist()
DataFrame.hist()
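A sketch of the three spellings on invented data follows. Note that the return values differ: the two plot forms return a single Axes, while DataFrame.hist() returns an array of Axes with one panel per column. The Agg backend is used only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical data: one numeric column.
rng = np.random.RandomState(0)
df = pd.DataFrame({"column1": rng.randn(200)})

ax_kind = df.plot(kind="hist", bins=20)   # kind keyword
ax_method = df.plot.hist(bins=20)         # accessor method
axes_hist = df.hist(bins=20)              # separate interface, returns an array of Axes
```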