A friend of mine who works for a medical company asked me if I wanted to convert a large amount of long data into wide data. In that industry, data such as patients are often represented in a wide format. So, I made a conversion tool in python, so I wrote it as an article.
Below, the conversion image
Before conversion
ID,data,item,
0,100,BPS,
0,200,LDL,
1,20,YBY,
2,XXX,YYY,
2,AAA,BBB,
After conversion
ID YBY BBB BPS LDL YYY
0 N/A N/A 100 200 N/A
1 20 N/A N/A N/A N/A
2 N/A AAA N/A N/A XXX
So I created a tool using pandas. The created tool is as follows.
import pandas as pd
import csv
csv_file = open("sample.csv", "r",
encoding="ms932", errors="", newline="")
f = csv.reader(csv_file, delimiter=",", doublequote=True,
lineterminator="\r\n", quotechar='"', skipinitialspace=True)
columns = next(f)
data = []
for row in f:
data.append(row)
df = pd.DataFrame(data, columns=columns)
df_t = df.pivot_table(values=[columns[1]], index=[columns[0]], columns=[
columns[2]], aggfunc='sum', fill_value='N/A')
df_t.to_excel('out.xlsx', index=True)
Read sample.csv, convert it to wide type data, and output it to Excel. As a premise, there are three columns in csv. Create a row for each ID, and the more items there are, the more columns will be added. Then, data is set for each value. If there is no data, N / A will be set.
Output result
If you want to convert CSV from long type to wide type, please refer to this implementation. The source is also posted on the following Github. https://github.com/kurihiro0119/transform_wide_long
Recommended Posts