Last time I got as far as downloading the file with Selenium. This post describes reading that file, processing it, and saving the result as a CSV file.
When you want every file that matches a specific pattern in a specific folder, glob is convenient.
#Get the list of files matching a glob pattern (shell-style wildcards, not regular expressions)
import glob
file_list = glob.glob(dl_dir+'/*')
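As a minimal sketch of how the pattern narrows the match, the following runs glob against a temporary folder. The `list_files` helper and the file names are hypothetical and only there for illustration; they are not part of the original script.

```python
import glob
import os
import tempfile


def list_files(folder, pattern="*"):
    """Return the sorted paths in `folder` whose names match `pattern`."""
    # glob patterns are shell-style wildcards, not regular expressions
    return sorted(glob.glob(os.path.join(folder, pattern)))


# Hypothetical download folder with made-up file names
dl_dir = tempfile.mkdtemp()
for name in ("a.xls", "b.xls", "notes.txt"):
    open(os.path.join(dl_dir, name), "w").close()

all_files = list_files(dl_dir)           # every visible entry in the folder
xls_files = list_files(dl_dir, "*.xls")  # only the .xls files
```

Narrowing the pattern to `*.xls` keeps stray files in the download folder from being handed to the Excel reader.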
There seem to be several Python libraries for working with Excel files, but it is convenient to settle on one and learn it well. I use xlrd.
#Working with Excel files
import csv
import os
import xlrd

wb = xlrd.open_workbook(file_name) #Open the xls file
sheet_names = wb.sheet_names() #Get a list of sheet names
sheet = wb.sheet_by_name(sheet_names[1]) #Second sheet (index 1)
values2 = sheet.col_values(2, start_rowx=1) #Column index 2, skipping the first (header) row
values5 = sheet.col_values(5, start_rowx=1) #Column index 5, skipping the first (header) row
result = []
for i in range(len(values2)):
    obj = [
        word,
        someFunction2(values2[i]),
        someFunction5(values5[i])
    ]
    result.append(obj)
#file_name may include the folder, so use only its base name in the output name
out_name = 'result-{}.csv'.format(os.path.basename(file_name))
with open(os.path.join(up_dir, out_name), 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(result)
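The row-building and CSV-saving steps can also be sketched without xlrd, using plain lists in place of the sheet columns. This is only an illustration under stated assumptions: `build_rows`, `save_rows`, the sample values, and the `/downloads/report.xls` path are hypothetical, and `str` / `int` stand in for the conversion functions of the real script.

```python
import csv
import os
import tempfile


def build_rows(word, col2, col5, convert2=str, convert5=str):
    """Pair up two columns (header cell already removed) into CSV rows."""
    # zip stops at the shorter column, so ragged columns cannot raise IndexError
    return [[word, convert2(a), convert5(b)] for a, b in zip(col2, col5)]


def save_rows(rows, source_path, out_dir):
    """Write `rows` to out_dir/result-<source file name>.csv and return the path."""
    # Use only the base name so a folder in source_path does not leak into the file name
    out_name = "result-{}.csv".format(os.path.basename(source_path))
    out_path = os.path.join(out_dir, out_name)
    # newline="" lets the csv module control line endings (avoids blank rows on Windows)
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    return out_path


# Made-up column values, including the header cell at index 0
values2 = ["name", "alpha", "beta"]
values5 = ["count", 10.0, 20.0]

# Slicing with [1:] skips the header without mutating the original lists
result = build_rows("keyword", values2[1:], values5[1:], convert5=int)

up_dir = tempfile.mkdtemp()  # hypothetical output folder
out_path = save_rows(result, "/downloads/report.xls", up_dir)
```

Iterating with `zip` instead of an index avoids depending on the length of a separate list, and slicing leaves the original column values untouched.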
So far, I have been able to do the following:
- When executed, the script scrapes the site and downloads the file.
- The processed output is placed in a specific folder.
Next, I will write about "sending the processed output to S3" and "obtaining the original INPUT (words) from S3".