I tried web scraping using Python and Selenium

Prerequisite knowledge

This time I wrote code that collects text from a website using Python and Selenium, so I will summarize how it works.

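What is Selenium?

Selenium was originally built for automated testing of web applications. Because it drives a real web browser, you can also use it to operate a website programmatically: open pages, click elements, fill in forms, and read the rendered content.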

Web scraping with Python and Selenium

Here is why I decided to scrape the site with Python and Selenium this time:

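  1. The site I wanted to scrape loaded its contents through Ajax, that is, through JavaScript requests made after the initial page load.
  2. Therefore, the urlopen function of urllib.request could not be used: it only downloads the HTML the server returns at first and does not run JavaScript, so the Ajax-loaded contents are missing from its result.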
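To see why point 2 follows from point 1, it helps to compare the HTML that urlopen receives with what a browser ends up displaying. The sketch below is a minimal offline illustration under assumed conditions: RAW_HTML is a hypothetical server response containing only an empty <div id="news"> plus a script that fills it in later, and RENDERED_HTML is what the page would look like after that script ran. In a real run, RAW_HTML would come from something like urlopen(url).read().decode(). The helper names are hypothetical, and the sketch uses only the standard library html.parser to pull the text out of that element.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change the nesting depth
VOID_TAGS = {"area", "br", "hr", "img", "input", "link", "meta", "source", "wbr"}


class ElementTextExtractor(HTMLParser):
    """Collects the text inside the element whose id attribute equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0  # > 0 while the parser is inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_of_element(html, target_id):
    """Return the stripped text inside the element with the given id."""
    parser = ElementTextExtractor(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


# Hypothetical HTML as the server sends it: an empty container plus a script
# that would fill it from a (hypothetical) /api/news endpoint.
RAW_HTML = """
<html><body>
  <h1>Top news</h1>
  <div id="news"></div>
  <script>
    fetch("/api/news").then(r => r.json()).then(items => {
      document.getElementById("news").textContent = items.join(", ");
    });
  </script>
</body></html>
"""

# Hypothetical HTML after a browser has executed the script.
RENDERED_HTML = RAW_HTML.replace(
    '<div id="news"></div>', '<div id="news">Headline A, Headline B</div>'
)

print(repr(text_of_element(RAW_HTML, "news")))       # ''
print(repr(text_of_element(RENDERED_HTML, "news")))  # 'Headline A, Headline B'
```

Since urlopen never runs the script, a scraper built on it only ever sees the empty version; a browser driven by Selenium executes the JavaScript and exposes the filled-in version through page_source.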

For this reason, instead of relying only on urlopen from urllib.request, which is often used for web scraping, I also used Selenium, which drives a real browser that executes the page's JavaScript.
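Because the site fills in its contents via Ajax after the first page load, a Selenium-based scraper also has to wait until those contents appear before reading the page source. Selenium provides WebDriverWait for this; the sketch below shows the polling idea behind it in plain Python so that it runs without a browser. The function wait_until and the fake condition find_news_items are hypothetical illustrations, not part of Selenium.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Poll condition() until it returns a truthy value, then return that value.

    Raises TimeoutError if the value is still falsy after `timeout` seconds.
    Hypothetical helper illustrating the polling idea behind Selenium's
    WebDriverWait.until; it does not use Selenium itself.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within {} seconds".format(timeout))
        time.sleep(interval)


# Hypothetical stand-in for "look for the Ajax-loaded items on the page":
# the first two calls find nothing, the third finds the loaded headlines.
calls = {"count": 0}


def find_news_items():
    calls["count"] += 1
    return ["Headline A", "Headline B"] if calls["count"] >= 3 else []


items = wait_until(find_news_items, timeout=5, interval=0.01)
print(items)  # ['Headline A', 'Headline B']
```

With a real browser, the equivalent Selenium call would be along the lines of WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.ID, "news"))), where WebDriverWait comes from selenium.webdriver.support.ui, EC is selenium.webdriver.support.expected_conditions, By is selenium.webdriver.common.by.By, and the id "news" is a placeholder for whatever element the target site loads.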

Basic web scraping flow with Selenium and Python

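from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from bs4 import BeautifulSoup


class Crawler(object):

    def main(self, url):
        if url is None:
            return

        # webdriver.PhantomJS() is no longer available in Selenium 4,
        # so this uses headless Chrome to operate the browser instead
        options = webdriver.ChromeOptions()
        options.add_argument("--headless")
        browser = webdriver.Chrome(options=options)  # Create an object to operate the browser

        # Exception handling
        try:
            browser.get(url)  # Access URL
            html_source = browser.page_source  # Returns the page source of the visited site
            # Creates a BeautifulSoup object with the page source as an argument
            bs_obj = BeautifulSoup(html_source, "html.parser")

            print(url)
            print(html_source)
            print(bs_obj)
        except WebDriverException as e:
            print("Failed to load {}: {}".format(url, e))
        finally:
            browser.quit()  # Always close the browser, even after an error


if __name__ == "__main__":
    cw = Crawler()
    cw.main("http://www.yahoo.co.jp/")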
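The crawler above prints the entire page source and the entire parsed document. Since the goal was to collect text, a natural next step is to keep only the readable text. The sketch below is one way to do that with BeautifulSoup: it removes <script> and <style> elements and joins the remaining text with newlines. The function name extract_visible_text and the sample HTML are hypothetical; in the crawler you would pass it browser.page_source instead.

```python
from bs4 import BeautifulSoup


def extract_visible_text(html_source):
    """Return the readable text of a page, one line per text fragment."""
    soup = BeautifulSoup(html_source, "html.parser")
    # <script> and <style> contents are code, not page text, so remove them
    for tag in soup(["script", "style"]):
        tag.decompose()
    # separator keeps text from different elements apart;
    # strip=True trims each fragment and skips whitespace-only ones
    return soup.get_text(separator="\n", strip=True)


sample = """
<html><head><style>h1 {color: red}</style></head>
<body><h1>Top news</h1><div id="news"><p>Headline A</p><p>Headline B</p></div>
<script>console.log("not text")</script></body></html>
"""

print(extract_visible_text(sample))
```

Removing script and style first matters because their contents are stored as text nodes in the parsed tree and could otherwise end up mixed into the collected text.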

