Let's create a table like this with Pandas.
import pandas as pd
tokyo = pd.DataFrame(data={'year':['2013','2014','2015'],
'cnt':[10,15,20],
'price':[100,100,90],},
columns=['year','cnt','price'])
osaka = pd.DataFrame(data={'year':['2013','2014','2015'],
'cnt':[5,6,7],
'price':[100,100,100],},
columns=['year','cnt','price'])
yokohama = pd.DataFrame(data={'year':['2015'],
'cnt':[1],
'price':[50],},
columns=['year','cnt','price'])
result
tokyo
-----------------------
year cnt price
0 2013 10 100
1 2014 15 100
2 2015 20 90
osaka
-----------------------
year cnt price
0 2013 5 100
1 2014 6 100
2 2015 7 100
yokohama
-----------------------
year cnt price
0 2015 1 50
The sample data assumes sales of a certain product in Tokyo, Osaka, and Yokohama for three years. cnt is the number and price is the unit price. Yokohama has data for 2015 only.
Give each DataFrame a city name and join them vertically.
#Add city name to each DataFrame
tokyo['city'] = 'tokyo'
osaka['city'] = 'osaka'
yokohama['city'] = 'yokohama'
#Combine vertically
df = pd.concat([tokyo,osaka,yokohama], ignore_index=True)
result
df
-------------------------------------
year cnt price city
0 2013 10 100 tokyo
1 2014 15 100 tokyo
2 2015 20 90 tokyo
3 2013 5 100 osaka
4 2014 6 100 osaka
5 2015 7 100 osaka
6 2015 1 50 yokohama
With the apply method, you can perform uniform processing for all columns and all rows. In this example, axis = 1 is specified, so the lambda expression is called line by line.
** Calculate sales by multiplying the number (cnt) and unit price (price) **
#axis=In 1, apply function to each line. axis axis='columns'But the same effect.
#x.cnt * x.The price part is the number*unit price
df["sales"] = df.apply(lambda x:x.cnt * x.price,axis=1)
result
df
------------------------------------------------
year cnt price city sales
0 2013 10 100 tokyo 1000
1 2014 15 100 tokyo 1500
2 2015 20 90 tokyo 1800
3 2013 5 100 osaka 500
4 2014 6 100 osaka 600
5 2015 7 100 osaka 700
6 2015 1 50 yokohama 50
** Determine the Kanto / Kansai area from the city name **
#'west' if x.city == 'osaka' else 'east'Is a ternary operation.
df["area"] = df.apply(lambda x:'west' if x.city == 'osaka' else 'east',axis=1)
result
df
------------------------------------------------
year cnt price city sales area
0 2013 10 100 tokyo 1000 east
1 2014 15 100 tokyo 1500 east
2 2015 20 90 tokyo 1800 east
3 2013 5 100 osaka 500 west
4 2014 6 100 osaka 600 west
5 2015 7 100 osaka 700 west
6 2015 1 50 yokohama 50 east
List the sales for each year in each city. Pandas automatically edits with pivot_table, but the arguments are complicated and you need to get used to it. .. This time, I referred to this page.
#Create a table of cnt and sales with area and city on the horizontal axis and year on the vertical axis.
#fill_value fills the missing part of the data with 0.
df2 = pd.pivot_table(df,
index=['area','city'],
columns=['year'],
values=['cnt','sales'],
fill_value=0)
result
df2
------------------------------------------------
cnt sales
year 2013 2014 2015 2013 2014 2015
area city
east tokyo 10 15 20 1000 1500 1800
yokohama 0 0 1 0 0 50
west osaka 5 6 7 500 600 700
The above is the output result of Python, but if you make it a little easier to understand, it looks like this.
| cnt | sales <Column 1st layer(No name)
|================|===================
year | 2013 2014 2015 | 2013 2014 2015 <Column 2nd layer(The name is year)
==============|================|===================
east|tokyo | 10 15 20 | 1000 1500 1800
|yokohama | 0 0 1 | 0 0 50
west|osaka | 5 6 7 | 500 600 700
==============|================|===================
area|city <<The Index also has two levels. The names are area and city.
I'm pretty close to the output, but I want the columns to be in the order of "year"> "quantity, sales", so [swaplevel](http://pandas.pydata.org/pandas-docs/stable/generated/pandas. Edit with DataFrame.swaplevel.html). (If you know how to do it with pivot alone, please let me know.)
** Swap 2 rows in column **
#The first argument, 0, points to the 0th line of colum.
#The second argument, 1 points to the first line of colum.
df2=df2.swaplevel(0,1, axis=1)
Result (year has moved to the beginning)
df2
------------------------------------------------
year 2013 2014 2015 2013 2014 2015
cnt cnt cnt sales sales sales
area city
east tokyo 10 15 20 1000 1500 1800
yokohama 0 0 1 0 0 50
west osaka 5 6 7 500 600 700
** Sort by year **
df3=df2[['2013','2014','2015']]
result
df3
------------------------------------------------
year 2013 2014 2015
cnt sales cnt sales cnt sales
area city
east tokyo 10 1000 15 1500 20 1800
yokohama 0 0 0 0 1 50
west osaka 5 500 6 600 7 700
Now that you have created the output you want, output it to Excel.
#pip install openpyxl
writer = pd.ExcelWriter('output.xlsx')
df3.to_excel(writer,'Sheet1')
writer.save()
The rest is done by drawing ruled lines and coloring. It seems that it can be done with openpyxl, but this time I did it manually with Excel.
Recommended Posts