Drawing on Jupyter using the plot function of pandas

Environment

OS X El Capitan 10.11.6
python: 2.7.11
pandas: 0.18.0
matplotlib: 1.5.1
numpy: 1.10.4
IPython: 4.1.2

Introduction

Reference: "Recommended drawing by Python 10". There are various ways to draw plots in Python, and the basic one is a library called matplotlib. Its default output looks a little dated, though, so there is a wrapper called seaborn that makes it easy to draw stylish plots. If you are still not satisfied, Bokeh may be a good choice. If you use ggplot in R, it may feel familiar.

In any case, pandas, which I need anyway for data formatting, also has a plot function that wraps matplotlib, so I will use that. I studied the following sites, but I was a little confused because the notation differed slightly between them, probably because of version differences.

http://sinhrks.hatenablog.com/entry/2015/11/15/222543
http://qiita.com/hik0107/items/de5785f680096df93efa
http://qiita.com/y__sama/items/9676f148a66c16d8f47c
http://qiita.com/TomokIshii/items/d786d25c69f20a0fc3c8

The most important point for me was the following.

DataFrame.plot() is the simplest way to draw, but for a scatter plot I wondered whether DataFrame.plot(kind='scatter') or DataFrame.plot.scatter() was the better (correct) form. The following description in the "Other Plots" section of the official pandas "Visualization" documentation settled it for me.

You can also create these other plots using the methods DataFrame.plot.<kind> instead of providing the kind keyword argument. This makes it easier to discover plot methods and the specific arguments they use:

So both are correct, and the point is that DataFrame.plot.scatter() is the easier form to understand.
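As a minimal sketch of that equivalence, the snippet below draws the same scatter plot with both notations and compares the plotted points. The sample data and variable names are made up for illustration, and the Agg backend is selected only so the code also runs outside Jupyter (in a notebook you would use %matplotlib inline instead).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption for running outside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data; the column names are placeholders.
df = pd.DataFrame({"column1": [1, 2, 3, 4], "column2": [4, 3, 2, 1]})

fig, (ax_kind, ax_method) = plt.subplots(1, 2, figsize=(8, 4))

# Keyword form
df.plot(kind="scatter", x="column1", y="column2", ax=ax_kind)
# Accessor form
df.plot.scatter(x="column1", y="column2", ax=ax_method)

# Each call adds one collection of points to its axes.
points_kind = ax_kind.collections[0].get_offsets().tolist()
points_method = ax_method.collections[0].get_offsets().tolist()
```

Both lists hold the same four (x, y) pairs, so the choice between the two forms is a matter of readability.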

Below is a brief summary of the flow up to drawing, plus the points I noticed along the way. All of it can also be found by reading the official matplotlib and pandas documentation.

Flow of drawing

Loading the library

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Command to draw a diagram in Jupyter notebook

%matplotlib inline

Loading data

Specifying the file path

file_path = "/Users/username/Documents/file_name.csv"

Comma separated files

data_frame=pd.read_csv(file_path)

Tab delimited file

data_frame=pd.read_table(file_path)

Files with other delimiters

data_frame = pd.read_table(file_path, sep='.')  # sep is the delimiter; here the file is delimited by "."

Other settings

header = n  # row number to use as the column names; the rows before it are skipped
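The sep and header settings above can be tried without a file on disk. The following is a small sketch that reads from an in-memory string; the file contents and the ";" delimiter are invented for the example.

```python
import io

import pandas as pd

# Hypothetical file contents: one line to skip, then the header row, then data.
raw = "generated;2016-08-01\nname;value\na;1\nb;2\n"

# sep sets the delimiter. header=1 uses the second line (0-based row 1)
# as the column names and discards the line before it.
data_frame = pd.read_csv(io.StringIO(raw), sep=";", header=1)

print(data_frame.head())
```

If the lines to skip do not sit directly above the header row, the skiprows argument of read_csv is the more direct tool.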

Confirmation of read data

data_frame.head()  # check the first few rows of the DataFrame
data_frame.tail()  # check the last few rows of the DataFrame

Data extraction

Column extraction

data_frame['column_name']  # extracted as a Series
data_frame.column_name  # same as above
data_frame[['column_name1', 'column_name2']]  # extract two columns

Extraction of rows

data_frame.ix['index_name']  # .ix looks up rows by index (later pandas versions deprecated .ix in favor of .loc)
data_frame[:n]  # extract the first n rows
data_frame[data_frame['column_name'] > x]  # extract rows whose column_name value is greater than x
data_frame.query('column_name == x & column_name == y')  # use query() to combine two or more conditions (AND)
data_frame.query('column_name == x | column_name == y')  # OR
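Here is a runnable sketch of the extraction patterns above on a tiny, made-up DataFrame. It uses .loc for the label lookup, which does the job .ix did in pandas 0.18; the data and the concrete values in the conditions are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data with a string index.
data_frame = pd.DataFrame(
    {"column_name1": [1, 5, 10, 5], "column_name2": [2, 4, 6, 8]},
    index=["a", "b", "c", "d"],
)

one_column = data_frame["column_name1"]                     # Series
two_columns = data_frame[["column_name1", "column_name2"]]  # DataFrame

one_row = data_frame.loc["b"]   # row by index label
first_rows = data_frame[:2]     # first two rows
greater = data_frame[data_frame["column_name1"] > 4]  # boolean mask

# query(): & combines conditions with AND, | with OR
both = data_frame.query("column_name1 == 5 & column_name2 == 4")
either = data_frame.query("column_name1 == 1 | column_name1 == 10")
```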

Plotting

Splitting the figure (2x2, etc.) and specifying margins

fig, axes = plt.subplots(2, 2, figsize=(19, 19))
plt.subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=0.1, hspace=0.1)

Scatter plot

When drawing several panels in a for loop, i is the loop counter.

data_frame.plot(x='column1', y='column2', xlim=(x1, x2), linestyle='', marker='.', ax=axes.flatten()[i], color='k', title=title_list[i])

data_frame.plot.scatter(x='column1',y='column2',xlim=(x1,x2),ax=axes.flatten()[i],color='k',s=15,title=title_list[i])

Either way works, but plot.scatter() draws no legend, probably because it assumes a one-to-one correspondence between x and y. With plot(), you can omit x= if you want the index on the x-axis. Because plot() draws a line graph by default, you turn it into a scatter plot by setting linestyle to '' and specifying a marker. A column containing NaN is still plotted with the NaN values ignored, but plotting from two columns at the same time raises an error, so the drawing area has to be changed each time.
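Putting the split figure and the loop together, a minimal sketch could look like this. The random data, the titles, the axis limits and the smaller figure size are all made up for illustration, and the Agg backend is chosen only so the code also runs outside Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption for running outside Jupyter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: four small frames, one per panel.
rng = np.random.RandomState(0)
frames = [
    pd.DataFrame({"column1": rng.rand(20), "column2": rng.rand(20)})
    for _ in range(4)
]
title_list = ["A", "B", "C", "D"]

fig, axes = plt.subplots(2, 2, figsize=(8, 8))
plt.subplots_adjust(wspace=0.3, hspace=0.3)

for i, frame in enumerate(frames):
    ax = axes.flatten()[i]  # i-th panel, counted row by row
    if i % 2 == 0:
        # Line plot with the line removed and a marker added
        frame.plot(x="column1", y="column2", xlim=(0, 1), linestyle="",
                   marker=".", ax=ax, color="k", title=title_list[i])
    else:
        # Dedicated scatter method; s is the marker size
        frame.plot.scatter(x="column1", y="column2", xlim=(0, 1),
                           ax=ax, color="k", s=15, title=title_list[i])
```

The even panels end up holding a Line2D object and the odd panels a point collection, which is why the two forms accept different styling arguments.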

Arguments of .plot()

xlim = (x1, x2)  # x range
ylim = (y1, y2)  # y range
color = 'k'  # color: k black, r red, b blue, g green, c cyan, m magenta, w white, y yellow
linestyle = '-'  # or ls. '-': solid, '--': dashed, '': nothing
linewidth = 1  # or lw
marker = '.'  # '.': point, 'o': circle, 'v': triangle, 's': square, '+': plus, '': nothing
markersize = 12  # or ms
markeredgecolor = ''  # or mec
markeredgewidth = 1  # or mew
markerfacecolor = ''  # or mfc
label = 'name'  # legend label
ax = axes.flatten()[i]  # draw in the i-th panel
yerr = ''  # y-axis error bars
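To see several of these arguments working together, here is a small sketch that passes them in one call and then reads the resulting line properties back. The data and the chosen values are assumptions for illustration; the styling keywords are forwarded by pandas to matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption for running outside Jupyter
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"column1": [0, 1, 2, 3], "column2": [1.0, 0.5, 2.0, 1.5]})

ax = df.plot(
    x="column1", y="column2",
    xlim=(0, 3), ylim=(0, 3),
    color="r",
    linestyle="--", linewidth=1,
    marker="s", markersize=12,
    markeredgecolor="k", markeredgewidth=1, markerfacecolor="w",
    label="name",
)

line = ax.lines[0]  # the Line2D object that plot() created
print(line.get_linestyle(), line.get_marker(), line.get_markersize())
```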

Arguments of .plot.scatter()

s = 20 # marker size

Adding a legend

axes.flatten()[i].legend(loc='best')  # other values: 'upper right', 'center left', 'lower center', 'center'
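As a sketch of how the label argument and the legend call fit together, the example below labels one marker-only plot and then places the legend explicitly. The data, the label text and the panel layout are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption for running outside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"column1": [0, 1, 2], "column2": [1, 3, 2]})

fig, axes = plt.subplots(1, 2, figsize=(8, 4))
ax = axes.flatten()[0]

# label sets the text that the legend shows for this plot.
df.plot(x="column1", y="column2", linestyle="", marker="o",
        label="name", ax=ax)

# Redraw the legend at a fixed position instead of the default 'best'.
legend = ax.legend(loc="upper right")
legend_texts = [text.get_text() for text in legend.get_texts()]
```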

Other notes

- You should not use the pylab package (from pylab import *).

http://yagays.github.io/blog/2014/08/15/ipython-notebook-matplotlib-inline/
http://qiita.com/HirofumiYashima/items/51d8dac9a784de356c5b

Load numpy and pyplot separately instead:

import numpy as np
import matplotlib.pyplot as plt

Because pylab does a wildcard import, it can cause name collisions.

Sites I often rely on

pandas in general. Well organized, written by one of the pandas developers. http://sinhrks.hatenablog.com/entry/2015/04/28/235430

matplotlib wiki http://seesaawiki.jp/met-python/d/matplotlib

To go back and forth between standard python, numpy, pandas http://qiita.com/richi40/items/6b3af6f4b00d62dbe8e1

Terminology

- Package: a directory containing module files. Examples: NumPy, pandas.

pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal.

The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.

Addendum

From "Other Plots" in the official pandas "Visualization" documentation:

In addition to these kinds, there are the DataFrame.hist(), and DataFrame.boxplot() methods, which use a separate interface.

In other words, besides the kind-based forms, DataFrame.hist() and DataFrame.boxplot() exist as separate interfaces. That means there are three ways to draw a histogram: DataFrame.plot(kind='hist'), DataFrame.plot.hist(), and DataFrame.hist().
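The three histogram forms can be compared side by side with a short sketch. The data and the bins=3 setting are assumptions chosen so the bar heights are easy to check by hand; note that DataFrame.hist() returns an array of axes rather than a single one.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption for running outside Jupyter
import pandas as pd

# Hypothetical sample data: one 1, two 2s, three 3s.
df = pd.DataFrame({"column1": [1, 2, 2, 3, 3, 3]})

ax_kind = df.plot(kind="hist", bins=3)   # keyword form
ax_method = df.plot.hist(bins=3)         # accessor form
ax_frame = df.hist(bins=3)[0, 0]         # separate interface, 2-D array of axes

def bar_heights(ax):
    """Return the rounded height of every bar drawn on the given axes."""
    return [int(round(patch.get_height())) for patch in ax.patches]

heights = [bar_heights(ax) for ax in (ax_kind, ax_method, ax_frame)]
```

All three calls bin the same values, so the choice is mostly about which interface and default styling you prefer.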
