Creating a scraping tool
This is my first Qiita post. Thank you for reading.
# Self-introduction
- 31 years old, male
- Graduated from the Department of Information Science at a national university
- At 22, joined an independent SIer and was stationed at a food wholesale company
- At 26, moved to that food wholesale company's information systems department, where I have worked ever since
# Why I started learning Python
After transferring to the planning team of the food wholesale company's information systems department, I made a proposal to introduce AI. However, AI vendors were too expensive, and the user departments rejected the proposal as not being cost-effective.
I had a complex about having done nothing but legacy development, so I wondered whether I could put together a proposal incorporating deep learning myself, and started studying Python.
While studying Python I learned about scraping, thought there would be demand for it, and decided to turn it into a tool.
# About the scraping tool
The food wholesale company has more than 50 branches and more than 100 stores, each with different customers, so it was impossible for the information systems department to handle everything itself. The tool is therefore designed so that anyone at a store with some IT knowledge and a little patience can use it.
### Execution method
A batch file is distributed to each PC's startup folder; when the PC is started in the morning, the batch file runs the Python program. The program fetches each customer's information, and if there is any difference from the previously fetched contents, it displays the URL and the new information in a pop-up.
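Conceptually, the comparison step boils down to "load the previous CSV, fetch the current data, and pop up anything new". Below is a simplified sketch of just that step; the function names (`load_rows`, `diff_rows`, `notify`) are illustrative, and the actual tool differs in detail:

```python
import csv

def load_rows(path):
    """Read a CSV file into a list of rows (each row a list of strings)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))

def diff_rows(previous, current):
    """Return the rows that appear in the new fetch but not in the old one."""
    seen = {tuple(row) for row in previous}
    return [row for row in current if tuple(row) not in seen]

def notify(url, new_rows):
    """Show the URL and the new rows. The real tool shows a pop-up;
    tkinter.messagebox.showinfo would be one way to do that."""
    print(url)
    for row in new_rows:
        print(",".join(row))
```

If `diff_rows` returns an empty list, nothing has changed since the last run and no pop-up is needed.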
### File structure
The input is a simple CSV file, so that users can create it themselves. The output is also CSV, which makes it easy to compare against the previous run.
### Specified contents
1. URL
2. Class of each item to acquire (up to 3 can be specified)
3. Output file name
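As an illustration, one row of the input CSV might look like the following. The exact column layout here is my assumption for the example: one URL, three class columns (unused ones left empty), then the output file name.

```python
import csv
import io

# Hypothetical input row: URL, class1, class2, class3 (empty = unused), output file
SAMPLE = "https://example.com/products,price,item-name,,products.csv\n"

def parse_spec(row):
    """Split one input row into the URL, class names to acquire, and output file."""
    url, c1, c2, c3, output = row
    classes = [c for c in (c1, c2, c3) if c]  # drop unused class columns
    return {"url": url, "classes": classes, "output": output}

spec = parse_spec(next(csv.reader(io.StringIO(SAMPLE))))
# spec["classes"] == ["price", "item-name"]
```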
### Challenges
1. Items without a class cannot be acquired
→ If no class is defined on the item you want, the tool cannot fetch it. I considered also supporting `id` and `name` attributes, but decided against it because it would be confusing. A point for future improvement.
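To show why a class is required, here is a standard-library-only illustration of class-based text extraction. `ClassTextExtractor` is just a name I am using for this example; in practice a library such as BeautifulSoup with `find_all(class_=...)` would be the more common choice, and the real tool's code differs.

```python
from html.parser import HTMLParser

class ClassTextExtractor(HTMLParser):
    """Collect the text of every element whose class attribute matches."""

    def __init__(self, target_class):
        super().__init__()
        self.target = target_class
        self.depth = 0        # > 0 while inside a matching element
        self.results = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.target in classes:
            self.depth += 1
            if self.depth == 1:
                self.results.append("")

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data.strip()

parser = ClassTextExtractor("price")
parser.feed('<div><span class="price">1,200</span><span>no class</span></div>')
# parser.results == ["1,200"]
```

This is exactly why challenge 1 exists: if the target text carries no class attribute, there is nothing for the extractor to match on.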
2. Extra items besides the target can be picked up
→ Since the output is not fed into any other system as input data, I ask users to delete the unnecessary parts themselves. I did handle acquiring weather information as a special case, but generalizing that would make the tool complicated, so I have not done so. A point for future improvement.
3. Users must be careful not to violate sites' terms of service and not to overload the servers being scraped.
# And now, a job change
I have written at some length, but now that I have picked up some web development skills, I want to become an engineer who does not depend on a single company, so I have started job hunting. I am using this scraping tool as my portfolio.
It is published on GitHub. I would be very grateful for any advice.
https://github.com/yamamasa2020/scraping-tool