This is mainly a memo for myself.
Create a scraping tool to get pachinko data
This is a program that gets the number of games played (G) and the number of jackpots for each machine. It is almost complete: in testing I was able to acquire data for 5 machines, so acquiring data for all of them should not be a problem. The acquisition flow is here.
I also tried reading the table with read_html, but I could not join the resulting data frames cleanly, so instead I collected only the information I wanted into lists and then converted and concatenated those lists into a data frame. After that, the types of the acquired data are adjusted.
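As a rough illustration of that approach, here is a minimal sketch. The URL, table layout, column order, and column names are all assumptions for illustration; the real site and markup are not shown in this memo. The idea is to scrape only the cells that are needed into plain lists, build a DataFrame from them, and then fix the dtypes.

```python
import requests
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical URL; the real site is not named in this memo.
URL = "https://example.com/hall/machines"

def fetch_machine_table(url: str) -> pd.DataFrame:
    """Scrape machine number, game count (G) and jackpot count into a DataFrame."""
    resp = requests.get(url)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")

    machine_nos, games, jackpots = [], [], []
    # Collect only the cells we actually want instead of relying on read_html.
    for row in soup.select("table tr")[1:]:          # skip the header row
        cells = [td.get_text(strip=True) for td in row.select("td")]
        if len(cells) < 3:                           # assumed column layout
            continue
        machine_nos.append(cells[0])
        games.append(cells[1])
        jackpots.append(cells[2])

    df = pd.DataFrame({
        "machine_no": machine_nos,
        "games": games,
        "jackpots": jackpots,
    })
    # Adjust the types: the scraped values are strings, often with commas.
    df["games"] = df["games"].str.replace(",", "").astype(int)
    df["jackpots"] = df["jackpots"].str.replace(",", "").astype(int)
    return df

if __name__ == "__main__":
    print(fetch_machine_table(URL).head())
```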
Next is a program that acquires the slump graph of each machine. The acquisition flow I have in mind is here.
Things to watch out for with the images: some of the img src attributes are relative paths, and the site may be doing something special with the current day's data. I cannot see any regularity in which machine models use a relative path. For that reason, the data acquired is basically the previous day's data, and I still need to investigate what time the site switches over.
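One way to handle the mix of relative and absolute src values is urljoin, which leaves absolute URLs untouched and resolves relative ones against the page URL. This is only a sketch; the page URL and the way the graph image is identified below are assumptions, not the actual site's markup.

```python
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Hypothetical machine-detail page URL; the real site is not named in the memo.
PAGE_URL = "https://example.com/hall/machine/1001"

def download_slump_graph(page_url: str, out_path: str) -> None:
    """Find the slump-graph <img> and download it, handling relative src values."""
    resp = requests.get(page_url)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")

    img = soup.find("img", alt="slump")   # assumed way to identify the graph image
    if img is None:
        raise ValueError("slump graph image not found on the page")

    # Some models use a relative path in src; urljoin resolves it against the
    # page URL and leaves absolute URLs as they are.
    img_url = urljoin(page_url, img["src"])

    img_resp = requests.get(img_url)
    img_resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(img_resp.content)

if __name__ == "__main__":
    download_slump_graph(PAGE_URL, "slump_1001.png")
```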
Next is a program that analyzes the slump graph image and converts it into data. The flow I have in mind is here.
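The memo does not show how the image is analyzed, but one common way to turn a line-graph image into a series is to scan each pixel column for pixels close to the line colour and record their average height. The line colour, tolerance, and the lack of axis calibration below are all assumptions for illustration, not the actual method used.

```python
from PIL import Image

def graph_to_series(image_path: str,
                    line_rgb: tuple = (255, 0, 0),   # assumed graph-line colour
                    tolerance: int = 40) -> list:
    """Convert a slump-graph image into a list of y-positions, one per x column.

    Scans each pixel column, finds pixels close to the graph-line colour, and
    records the mean row index. Mapping these pixel heights to actual payout
    values still requires knowing the axis scale of the graph.
    """
    img = Image.open(image_path).convert("RGB")
    width, height = img.size
    pixels = img.load()

    series = []
    for x in range(width):
        hits = []
        for y in range(height):
            r, g, b = pixels[x, y]
            # Treat a pixel as part of the line if it is close to the line colour.
            if (abs(r - line_rgb[0]) < tolerance and
                    abs(g - line_rgb[1]) < tolerance and
                    abs(b - line_rgb[2]) < tolerance):
                hits.append(y)
        if hits:
            # Flip the axis so larger values sit higher on the graph.
            series.append(height - sum(hits) / len(hits))
    return series
```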
Since this is largely a memo to myself, I doubt it will be useful to anyone else.