A script for running Scrapy from a script

Reference: "Run Scrapy from a script" (Scrapy documentation)

A script like the one in the official Scrapy documentation did not work for me, for reasons I could not pin down. (The crawl target was a Japanese site.)

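```python
from twisted.internet import reactor
from scrapy import log, signals
from scrapy.crawler import Crawler
from scrapy.utils.project import get_project_settings
from testspiders.spiders.followall import FollowAllSpider  # example spider from the docs

spider = FollowAllSpider(domain='scrapinghub.com')
settings = get_project_settings()
crawler = Crawler(settings)
# Stop the Twisted reactor once the spider has closed
crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()  # logging is started without the crawler
reactor.run()  # blocks here until spider_closed fires
```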

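So I changed the logging setup to follow what the command-line tool does (`scrapy crawl hogehoge`): instead of calling `log.start()` at the end, I call `log.start_from_crawler(crawler)` before `crawler.configure()`. With that change it worked.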

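```python
from twisted.internet import reactor
from scrapy import log, signals
from scrapy.crawler import Crawler
from scrapy.utils.project import get_project_settings
from testspiders.spiders.followall import FollowAllSpider  # example spider from the docs

spider = FollowAllSpider(domain='scrapinghub.com')
settings = get_project_settings()
crawler = Crawler(settings)
# Stop the Twisted reactor once the spider has closed
crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
log.start_from_crawler(crawler)  # start logging with the crawler, before configure()
crawler.configure()
crawler.crawl(spider)
crawler.start()
reactor.run()  # blocks here until spider_closed fires
```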

The cause seems to be that `log.start()` does not receive the crawler, whereas `log.start_from_crawler()` does. I have not worked out why logging fails to work without the crawler.

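In scrapy/log.py, `start_from_crawler()` passes the crawler's settings, together with the crawler itself, on to `start_from_settings()`:

```python
def start_from_crawler(crawler):
    return start_from_settings(crawler.settings, crawler)
```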
