Hello.
Recently, I'm also touching ** Python **.
Now, in order to improve the efficiency of internal operations, we are trying to automate the payment application process that accounting manually performs using Excel and Csv using ** Python **.
I'm using ** openpyxl ** for Excel operation and ** pandas ** for data processing, but I'm still struggling with ** pandas **, and if it's ** C # **, the code will be killed instantly. I'm also struggling with the process of assembling.
I will write down such a struggle instead of a memo.
Click here for this memo.
There may be a tsukkomi saying, "It's that easy!", But I've tried various things, so I'll make a note of it.
When I looked up how to delete a row in a DataFrame, I found a function called ** drop **. [Official] pandas.DataFrame.drop
The usage is as follows. Specify the Index number of the DataFrame and delete it.
import pandas as pd
#Read an Excel file
data_frame = pd.read_excel('{Excel file path}', sheet_name='{Sheet name}', header=0)
#10th line(Index=9)To delete
data_frame.drop(9)
#6th line(Index=5)And line 8(Index=7)To delete
data_frame.drop([5,7])
#2nd line(Index=1)5th line from(Index=4)To delete
data_frame.drop(range(1,4))
This time, as the title suggests, "** Delete data whose business partner starts with XX and description starts with △△ **" and delete lines that satisfy multiple conditions.
For that purpose, it is necessary to obtain the Index number that satisfies the conditions.
When I looked up how to get the Index number of a DataFrame, I found a function called ** index **. [Official] pandas.DataFrame.index
The usage is as follows.
#Get Index that matches the condition
drop_index = data_frame.index[data_frame['Suppliers'] == '〇〇']
#Delete Index that matches the condition
data_frame = data_frame.drop(drop_index)
Yeah, I was able to properly delete the data of "** Customers are 〇〇 **". Next, let's increase the conditions.
#Get Index that matches multiple conditions
drop_index = data_frame.index[(data_frame['Suppliers'] == '〇〇') & (data_frame['Description'] == '△△*****')]
#Delete Index that matches the condition
data_frame = data_frame.drop(drop_index)
Alright, the data of the two conditions can be deleted! : Hmm? I forgot something ... The second condition was a prefix match ... I don't know ...
For the time being, try the prefix match ** starts with **.
#Try prefix match with starts with DataFrame ...
drop_index = data_frame.index[(data_frame['Suppliers'] == '〇〇') & (data_frame['Description'].startswith('△△'))]
【result】
Error:'Series' object has no attribute 'startswith'
"** Series ** doesn't have ** starts with **!" Was angry.
Pandas has an accessor (access method) that can process strings called ** str **, and also has a ** startswith ** function that can be extracted by prefix match. [Official] pandas.Series.str.startswith
Try it with ** str.startswith **.
#Added prefix match condition with startswith of str accessor of DataFrame
drop_index = data_frame.index[(data_frame['Suppliers'] == '〇〇') & (data_frame['Description'].str.startswith('△△'))]
#Delete Index that matches the condition
data_frame = data_frame.drop(drop_index)
well done! I was able to delete the data of two conditions properly!
I would like to say, "That's it!" By the time we got to the above method, we actually implemented it in a different way.
#Loop up to the maximum number of rows in the DataFrame
for i in range(len(data_frame)):
#Chuck with specified column and row values
if data_frame['Suppliers'][i] == '〇〇' and data_frame['Description'][i].startswith('△△'):
#Delete the Index number that matches the condition
data_frame = data_frame.drop(i)
It's a bit analog, but it loops through the number of rows of data and processes each one. This method can also delete the data of two conditions properly.
There are many other methods, but this time I introduced the method I actually tried. That's it.
Next time, I'll post where I struggled with ** openpyxl ** and ** pandas **.
Thank you for reading this far.
Recommended Posts