Scrapy is a Python framework for crawling and scraping. Rather than importing a library into your own code, you write your code to follow the framework's conventions.
$ pip install scrapy
To create a project, run the following command.
$ scrapy startproject [project name]
The project name can be anything you like. Running the command generates a set of directories and files for the project.
If you download without waiting between requests, you will put load on the site you are crawling, so be careful to pause between downloads.
Add the following line to settings.py inside the project folder.

DOWNLOAD_DELAY = 1
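As a sketch, the line sits alongside the other options in the generated settings.py; ROBOTSTXT_OBEY is another standard Scrapy setting, shown here only for context.

```python
# Fragment of settings.py: wait 1 second between requests
# so the target site is not overloaded.
DOWNLOAD_DELAY = 1

# New Scrapy projects also respect robots.txt by default.
ROBOTSTXT_OBEY = True
```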
Items are where the data you crawl is stored. Define a class in items.py.
class [name of the class](scrapy.Item):
    [name of the field] = scrapy.Field()

item = [name of the class]()
item['name of the field'] = 'Examples'
The details of crawling and scraping are written mainly in the spider. Enter the following command to create a spider.
$ scrapy genspider [spider name] [domain of the site to crawl]
This creates a [spider name].py file in the spiders folder.
After that, edit the spider to match the site you are crawling.
I would appreciate it if you could point out any mistakes.