[Python] Code a beginner can paste without thinking at the start of a scraping session

Every time you scrape, you start by typing something like this:

test.py


from bs4 import BeautifulSoup

Writing this out every time is tedious, so here is a template of the code I am sure to need each time.

test.py


!apt-get update
!apt install chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin
!pip install selenium
!pip install requests-html

First, the library setup. I usually work in Colab, so I start with the install commands above. The imports and the code that fetches the page come next.

test.py


import pandas as pd
import datetime
from tqdm.notebook import tqdm
import requests
from bs4 import BeautifulSoup
import time
import re
from urllib.request import urlopen
import urllib.request, urllib.error
from requests_html import HTMLSession
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

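# Launch headless Chrome, load the page, and parse its HTML
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
# chromedriver was copied to /usr/bin above, so it is found on PATH;
# recent Selenium releases no longer accept the driver path as a positional argument
driver = webdriver.Chrome(options=options)
driver.implicitly_wait(10)
url = "https://www.XXX.com"  # placeholder: replace with the target URL
driver.get(url)
html = driver.page_source.encode('utf-8')
soup = BeautifulSoup(html, "html.parser")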

Everything so far can be copied and pasted without thinking. After that, run:

test.py


soup

With this, you can get as far as printing the page's HTML within a few seconds.

Strictly speaking, some of these libraries, such as tqdm, are not used here, but I bundle the imports for everything I reach for almost every time I scrape.
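Once `soup` exists, the usual next step is to pull elements out of it. The following is only a minimal sketch: the HTML string is made up and stands in for `driver.page_source`, so the tag names and link targets are assumptions for illustration, not anything from a real site.

```python
from bs4 import BeautifulSoup

# Stand-in for driver.page_source; this markup is invented for illustration.
html = """
<html>
  <head><title>Example page</title></head>
  <body>
    <h1>News</h1>
    <ul>
      <li><a href="/a">First article</a></li>
      <li><a href="/b">Second article</a></li>
    </ul>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# Text of the <title> tag
page_title = soup.title.get_text()

# (link text, href) for every <a> tag on the page
links = [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a")]

print(page_title)
print(links)
```

On a real page, the same `find_all` call would be run against the `soup` built from the browser's HTML, with the tag name adjusted to whatever the target page uses.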

I copy and paste this myself all the time.
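If pasting the block gets repetitive, the same steps could be wrapped in a helper. This is a sketch rather than part of the original template: the names `parse_html` and `fetch_soup` and the `wait_seconds` parameter are hypothetical, and it assumes chromedriver is on PATH as set up by the install commands. Unlike the template, it closes the browser when it is done.

```python
from bs4 import BeautifulSoup


def parse_html(html):
    """Parse an HTML string or bytes with the same parser as the template."""
    return BeautifulSoup(html, "html.parser")


def fetch_soup(url, wait_seconds=10):
    """Open url in headless Chrome and return the parsed page (hypothetical helper)."""
    # Imported here so the helper can be defined even where selenium is not installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')
    driver = webdriver.Chrome(options=options)  # assumes chromedriver is on PATH
    try:
        driver.implicitly_wait(wait_seconds)
        driver.get(url)
        return parse_html(driver.page_source)
    finally:
        driver.quit()  # always release the browser process
```

A call would then look like `soup = fetch_soup("https://www.XXX.com")`, with the placeholder URL replaced by the real target.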
