Execution environment / conditions

Execution environment:
- Windows 10 Home 64-bit

Conditions:
- Data used: CSV data of 10,000 lines
- Columns: index, student ID, and scores for five subjects A-E (0-10000)
- Modeled on a grade list of 10,000 students who took a certain exam
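The article does not show how the CSV is loaded. The sketch below builds stand-in data of the same shape and reads it both ways, assuming the header names `index` and `studentID` (hypothetical) plus the `subjectA`..`subjectE` names used later in the article. Scores are drawn from 0 to 100 purely for illustration. The article calls the list of rows `list`; it is named `rows` here so the built-in `list` is not shadowed.

```python
import csv
import io
import random

import pandas as pd

# Subject column names as used in the article; 'index' and 'studentID'
# are hypothetical header names chosen for this sketch.
SUBJECTS = ['subjectA', 'subjectB', 'subjectC', 'subjectD', 'subjectE']


def make_csv(n_rows=10000, seed=0):
    """Build CSV text shaped like the test data (scores are random)."""
    rng = random.Random(seed)
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(['index', 'studentID'] + SUBJECTS)
    for i in range(n_rows):
        scores = [rng.randint(0, 100) for _ in SUBJECTS]
        writer.writerow([i, f'S{i:05d}'] + scores)
    return buf.getvalue()


csv_text = make_csv()

# List version: skip the header; every field stays a string, so
# row[0] = index, row[1] = student ID, row[2]..row[6] = scores.
rows = list(csv.reader(io.StringIO(csv_text)))[1:]

# DataFrame version: 'index' becomes the row index, leaving
# column 0 = studentID and columns 1-5 = subjectA..subjectE.
df = pd.read_csv(io.StringIO(csv_text), index_col='index')
```

With this layout the column positions match the ones the code below relies on: `row[2]`..`row[6]` for the list, and `iat` positions 1-5 for the DataFrame.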
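First, the average of the five subjects is calculated with a list:

```python
# Append the average of the five subject scores (row[2]..row[6]) as row[7].
# Values read from the CSV are strings, so convert them to float first.
# (The name `list` shadows the built-in; it is kept to match the article.)
for row in list:
    row.append(str((float(row[2]) + float(row[3]) + float(row[4])
                    + float(row[5]) + float(row[6])) / 5.0))
```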
This looks much like the same code in other languages, and it works. With a DataFrame, however, a single line is enough:

```python
# Column-wise arithmetic: computes the average for every row at once.
df['average'] = (df['subjectA'] + df['subjectB'] + df['subjectC'] + df['subjectD'] + df['subjectE']) / 5
```

Because the whole column is handled in one expression, there is less room for coding mistakes.
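pandas can also compute the row-wise average without spelling out every column, via `DataFrame.mean(axis=1)` on the selected subject columns. A minimal sketch on a made-up two-row frame, checking that it agrees with the explicit sum:

```python
import pandas as pd

# Tiny made-up frame using the article's column names.
df = pd.DataFrame({
    'subjectA': [80, 40],
    'subjectB': [70, 50],
    'subjectC': [90, 60],
    'subjectD': [60, 30],
    'subjectE': [100, 20],
})
subjects = ['subjectA', 'subjectB', 'subjectC', 'subjectD', 'subjectE']

# Explicit sum, as in the article.
explicit = (df['subjectA'] + df['subjectB'] + df['subjectC']
            + df['subjectD'] + df['subjectE']) / 5

# Row-wise mean over the selected columns (axis=1 means "across columns").
df['average'] = df[subjects].mean(axis=1)

print(df['average'].tolist())  # [80.0, 40.0]
```

Listing the columns once in `subjects` makes it harder to forget or duplicate a subject when the set of columns changes.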
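Next, the rows are sorted by average in descending order:

```python
# The average is stored as a string, so convert it before comparing;
# comparing the strings directly would sort lexicographically.
list = sorted(list, key=lambda x: float(x[7]), reverse=True)
```

With a list, sorting also fits in one line. The DataFrame version is a single line as well:

```python
# sort_values returns a new DataFrame, so assign the result.
df = df.sort_values('average', ascending=False)
```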
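Because the list version stores averages as strings, the choice of sort key matters. A short illustration with made-up values (the average sits at index 2 here instead of 7, only to keep the rows short):

```python
# Averages stored as strings, as in the list version (values are made up).
rows = [['0', 'S0', '9.5'], ['1', 'S1', '10.2'], ['2', 'S2', '55.0']]

# String comparison is lexicographic: '9' > '5' > '1' at the first character.
by_string = sorted(rows, key=lambda x: x[2], reverse=True)

# Converting to float compares the numeric values.
by_number = sorted(rows, key=lambda x: float(x[2]), reverse=True)

print([r[2] for r in by_string])  # ['9.5', '55.0', '10.2']
print([r[2] for r in by_number])  # ['55.0', '10.2', '9.5']
```

The DataFrame does not have this pitfall because the `average` column is numeric.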
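Then the rows are narrowed down to those with an average of 50 or higher:

```python
# Keep only the rows whose average (row[7]) is 50 or higher.
list2 = []
for row in list:
    if float(row[7]) >= 50:
        list2.append(row)
```

This uses a for loop, just like the average calculation. With a DataFrame, as you might expect, it takes one line:

```python
# Boolean indexing: keep rows whose average is 50 or higher
# (>= matches the list version above).
df2 = df[df['average'] >= 50]
```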
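The one-line filter works through boolean indexing: comparing a column with a scalar produces a boolean Series (a mask), and indexing the DataFrame with that mask keeps the rows marked `True`. A sketch on made-up data; note that the list version keeps averages of exactly 50 (`50 <= ...`), so `>=` reproduces it, whereas `>` would drop those rows:

```python
import pandas as pd

# Made-up averages, including one exactly on the boundary.
df = pd.DataFrame({'studentID': ['S0', 'S1', 'S2'],
                   'average': [72.4, 50.0, 31.8]})

# Comparing a column with a scalar yields a boolean Series (the mask).
mask = df['average'] >= 50
print(mask.tolist())  # [True, True, False]

# Indexing with the mask keeps only the rows where it is True.
df2 = df[mask]
print(df2['studentID'].tolist())  # ['S0', 'S1']
```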
Each process was run 10 times, and the average execution time was recorded:

| Process | List: mean of 10 runs [s] | DataFrame: mean of 10 runs [s] |
|---|---|---|
| Average calculation | 0.764768385887146 | 0.01179955005645752 |
| Sorting | 0.030899477005004884 | 0.011399650573730468 |
| Filtering | 0.04529948234558105 | 0.006699275970458984 |
For all three processes, the DataFrame version is not only simpler to write but also faster.
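The article does not show how the timings were taken. One way to obtain a "mean of 10 runs" figure is sketched below, assuming wall-clock timing with `time.perf_counter`; the helper names `mean_runtime`, `list_average` and `make_rows` and the generated data are hypothetical.

```python
import time


def mean_runtime(func, make_input, runs=10):
    """Return the mean wall-clock time (s) of func(data) over `runs` calls.

    make_input() builds fresh data for every run, so a function that
    mutates its input (like the list average, which appends a column)
    always starts from the same state. Building the input is not timed.
    """
    total = 0.0
    for _ in range(runs):
        data = make_input()
        start = time.perf_counter()
        func(data)
        total += time.perf_counter() - start
    return total / runs


def list_average(rows):
    # Mirrors the list version: append the average as column 7.
    for row in rows:
        row.append(str(sum(float(v) for v in row[2:7]) / 5.0))


def make_rows():
    # Hypothetical data: index, student ID and five scores, all strings.
    return [[str(i), f'S{i:05d}'] + [str((i * k) % 101) for k in range(1, 6)]
            for i in range(10000)]


elapsed = mean_runtime(list_average, make_rows)
print(f'list average: {elapsed:.4f} s per run')
```

Rebuilding the input for each run matters here: timing the appending loop ten times on the same rows would keep lengthening them.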
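For comparison, here is the average calculation written as a for loop over the DataFrame, accessing each cell with `iat`:

```python
# Column positions: 0 = student ID, 1-5 = subjects A-E, 6 = average.
# Assumes the 'average' column already exists (iat cannot add columns).
for idx in range(len(df)):
    df.iat[idx, 6] = (df.iat[idx, 1] + df.iat[idx, 2] + df.iat[idx, 3]
                      + df.iat[idx, 4] + df.iat[idx, 5]) / 5.0
```

Averaged over 10 runs, this loop takes 2.33 s, which is slower even than the list version (about 0.76 s). When working with a DataFrame, therefore, avoid explicit for loops wherever possible and use column-wise operations instead.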
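Replacing such a loop with a column-wise operation does not change the result. A minimal check on a made-up three-row frame with the same column layout (student ID first, then five subjects):

```python
import pandas as pd

# Made-up scores; column 0 is studentID, columns 1-5 are the subjects.
df = pd.DataFrame({
    'studentID': ['S0', 'S1', 'S2'],
    'subjectA': [80, 40, 55],
    'subjectB': [70, 50, 65],
    'subjectC': [90, 60, 45],
    'subjectD': [60, 30, 50],
    'subjectE': [100, 20, 35],
})
subjects = df.columns[1:6]

# Element-by-element loop (slow on large data): one scalar access per cell.
loop_avg = []
for idx in range(len(df)):
    loop_avg.append(sum(df.iat[idx, col] for col in range(1, 6)) / 5.0)

# Column-wise version (fast): one operation over the whole block of columns.
vec_avg = df[subjects].sum(axis=1) / 5.0

assert vec_avg.tolist() == loop_avg
```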