How to format a table using Pandas apply, pivot and swaplevel

Overview

Let's create a table like this with Pandas. スクリーンショット 2016-03-16 19.35.24.png

Input data

import pandas as pd

tokyo = pd.DataFrame(data={'year':['2013','2014','2015'],
                           'cnt':[10,15,20],
                           'price':[100,100,90],},
                     columns=['year','cnt','price'])

osaka = pd.DataFrame(data={'year':['2013','2014','2015'],
                           'cnt':[5,6,7],
                           'price':[100,100,100],},
                     columns=['year','cnt','price'])

yokohama = pd.DataFrame(data={'year':['2015'],
                              'cnt':[1],
                              'price':[50],},
                        columns=['year','cnt','price'])

`result`


tokyo
-----------------------
   year  cnt  price
0  2013   10    100
1  2014   15    100
2  2015   20     90

osaka
-----------------------
   year  cnt  price
0  2013    5    100
1  2014    6    100
2  2015    7    100

yokohama
-----------------------
   year  cnt  price
0  2015    1     50

The sample data assumes sales of a certain product in Tokyo, Osaka, and Yokohama for three years. cnt is the number and price is the unit price. Yokohama has data for 2015 only.

Concatenate input data (concat)

Give each DataFrame a city name and join them vertically.

#Add city name to each DataFrame
tokyo['city'] = 'tokyo'
osaka['city'] = 'osaka'
yokohama['city'] = 'yokohama'

#Combine vertically
df = pd.concat([tokyo,osaka,yokohama], ignore_index=True)

`result`


df
-------------------------------------
   year  cnt  price      city
0  2013   10    100     tokyo
1  2014   15    100     tokyo
2  2015   20     90     tokyo
3  2013    5    100     osaka
4  2014    6    100     osaka
5  2015    7    100     osaka
6  2015    1     50  yokohama

Create a new column using the information of an existing column (apply)

With the apply method, you can perform uniform processing for all columns and all rows. In this example, axis = 1 is specified, so the lambda expression is called line by line.

** Calculate sales by multiplying the number (cnt) and unit price (price) **

#axis=In 1, apply function to each line. axis axis='columns'But the same effect.
#x.cnt * x.The price part is the number*unit price
df["sales"] = df.apply(lambda x:x.cnt * x.price,axis=1)

`result`


df
------------------------------------------------
   year  cnt  price      city  sales
0  2013   10    100     tokyo   1000
1  2014   15    100     tokyo   1500
2  2015   20     90     tokyo   1800
3  2013    5    100     osaka    500
4  2014    6    100     osaka    600
5  2015    7    100     osaka    700
6  2015    1     50  yokohama     50

** Determine the Kanto / Kansai area from the city name **

#'west' if x.city == 'osaka' else 'east'Is a ternary operation.
df["area"] = df.apply(lambda x:'west' if x.city == 'osaka' else 'east',axis=1)

`result`


df
------------------------------------------------
   year  cnt  price      city  sales  area
0  2013   10    100     tokyo   1000  east
1  2014   15    100     tokyo   1500  east
2  2015   20     90     tokyo   1800  east
3  2013    5    100     osaka    500  west
4  2014    6    100     osaka    600  west
5  2015    7    100     osaka    700  west
6  2015    1     50  yokohama     50  east

Create a table of cities and years (pivot_table)

List the sales for each year in each city. Pandas automatically edits with pivot_table, but the arguments are complicated and you need to get used to it. .. This time, I referred to this page.

#Create a table of cnt and sales with area and city on the horizontal axis and year on the vertical axis.
#fill_value fills the missing part of the data with 0.
df2 = pd.pivot_table(df,
                     index=['area','city'], 
                     columns=['year'],
                     values=['cnt','sales'],
                     fill_value=0)

`result`


df2
------------------------------------------------
               cnt           sales            
year          2013 2014 2015  2013  2014  2015
area city                                     
east tokyo      10   15   20  1000  1500  1800
     yokohama    0    0    1     0     0    50
west osaka       5    6    7   500   600   700

The above is the output result of Python, but if you make it a little easier to understand, it looks like this.

              |  cnt           | sales             <Column 1st layer(No name)
              |================|===================
year          | 2013 2014 2015 | 2013  2014  2015  <Column 2nd layer(The name is year)
==============|================|===================
east|tokyo    |   10   15   20 | 1000  1500  1800
    |yokohama |    0    0    1 |    0     0    50
west|osaka    |    5    6    7 |  500   600   700
==============|================|===================
area|city　<<The Index also has two levels. The names are area and city.

Change the order of columns in Pivot results (swap level)

I'm pretty close to the output, but I want the columns to be in the order of "year"> "quantity, sales", so [swaplevel](http://pandas.pydata.org/pandas-docs/stable/generated/pandas. Edit with DataFrame.swaplevel.html). (If you know how to do it with pivot alone, please let me know.)

** Swap 2 rows in column **

#The first argument, 0, points to the 0th line of colum.
#The second argument, 1 points to the first line of colum.
df2=df2.swaplevel(0,1, axis=1)

`Result (year has moved to the beginning)`


df2
------------------------------------------------
year          2013 2014 2015  2013  2014  2015
               cnt  cnt  cnt sales sales sales
area city                                     
east tokyo      10   15   20  1000  1500  1800
     yokohama    0    0    1     0     0    50
west osaka       5    6    7   500   600   700

** Sort by year **

df3=df2[['2013','2014','2015']]

`result`


df3
------------------------------------------------
year          2013       2014       2015      
               cnt sales  cnt sales  cnt sales
area city                                     
east tokyo      10  1000   15  1500   20  1800
     yokohama    0     0    0     0    1    50
west osaka       5   500    6   600    7   700

Output to excel

Now that you have created the output you want, output it to Excel.

#pip install openpyxl
writer = pd.ExcelWriter('output.xlsx')
df3.to_excel(writer,'Sheet1')
writer.save()

スクリーンショット 2016-03-16 20.52.52.png

The rest is done by drawing ruled lines and coloring. It seems that it can be done with openpyxl, but this time I did it manually with Excel.