This time I wrote code to collect text from a website using Python and Selenium, so I will summarize it here.
Selenium was originally built for automated testing of web applications, but it can also drive a real web browser, which lets you operate a website programmatically.
To explain why we decided to scrape the web with Python and Selenium this time: for the reason above (Selenium can drive an actual browser), we use not only urlopen from urllib.request, which is commonly used for web scraping, but also Selenium.
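For comparison, a minimal sketch of the urlopen approach is shown below. The URL is only a placeholder, and this only retrieves text that is already present in the raw HTML (it cannot handle pages that need browser interaction):

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://www.yahoo.co.jp/")            # placeholder URL; fetch the raw HTML
bs_obj = BeautifulSoup(html.read(), "html.parser")   # parse the downloaded HTML
print(bs_obj.title)                                  # e.g. print the <title> tag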
Basic web scraping flow with Selenium and Python
from selenium import webdriver
from bs4 import BeautifulSoup


class Crawler(object):
    def main(self, url):
        if url is not None:
            # Exception handling: stop if the browser cannot start or the URL cannot be reached
            try:
                browser = webdriver.PhantomJS()  # Create an object to operate the browser
                browser.get(url)                 # Access the URL
            except Exception as e:
                print(e)
                return
            html_source = browser.page_source                    # Page source of the visited site
            bs_obj = BeautifulSoup(html_source, "html.parser")   # Create a BeautifulSoup object from the page source
            print(url)
            print(html_source)
            print(bs_obj)
            browser.quit()


if __name__ == "__main__":
    cw = Crawler()
    cw.main("http://www.yahoo.co.jp/")
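Running the script prints the URL, the raw page source, and the parsed BeautifulSoup object for the Yahoo top page. Note that newer Selenium releases have removed PhantomJS support; if that applies to your environment, the same flow can be run with headless Chrome, roughly as in the sketch below (this assumes chromedriver is installed and on your PATH):

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup

options = Options()
options.add_argument("--headless")            # run Chrome without a visible window
browser = webdriver.Chrome(options=options)   # assumes chromedriver is available on PATH
browser.get("http://www.yahoo.co.jp/")
bs_obj = BeautifulSoup(browser.page_source, "html.parser")
print(bs_obj.title)
browser.quit()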
References (Selenium/BeautifulSoup):
- Basic usage of Selenium
- Basic usage of BeautifulSoup