Basic summary of scraping with Requests that beginners can absolutely understand [Python]

Requests basics

import

import requests

Every example below needs this import.

Get the source from the website

--Get with the GET method
--Get with the POST method

You should remember these two.

Get with GET method (requests.get)

import requests

url = 'https://www.yahoo.co.jp/'
response = requests.get(url)
print(response) # →<Response [200]>

html = response.text
print(html) #→ HTML source string

requests.get(url) returns a Response object. Printing it shows the HTTP status code, which is also available as response.status_code; a successful request returns 200.

The HTML source is available as a string in response.text.
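
Because requests.get returns a Response object even when the server answers with an error status such as 404, it can help to check the status before using response.text. The following is a minimal sketch, not part of the original article: fetch_html is a hypothetical helper name, and the 10-second timeout is an arbitrary assumption.

```python
import requests


def fetch_html(url, timeout=10.0):
    """Return the HTML source of url as a string.

    fetch_html is a hypothetical helper; the timeout value is an assumption.
    """
    # Without a timeout, requests can wait indefinitely for a slow server.
    response = requests.get(url, timeout=timeout)
    # Raises requests.HTTPError for 4xx and 5xx status codes.
    response.raise_for_status()
    return response.text
```

Calling fetch_html('https://www.yahoo.co.jp/') would return the same string as response.text, or raise an exception if the request fails.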

Get by POST method (requests.post)

Some pages, such as those behind a login form, only return the source you are looking for in response to a POST request.

data = {'username': 'tarouyamada', 'password': '4r8q99fiad'}

response = requests.post(url, data=data)

This sends the request with the dictionary form-encoded as the request body.
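
To see what data= actually puts in the request, the request can be built without sending it. This is a minimal sketch using requests.Request(...).prepare(); the URL https://example.com/login is a placeholder assumption. It also contrasts data= with json=, which sends the same dictionary as a JSON body instead, a format some APIs expect.

```python
import json

import requests

# Placeholder URL (an assumption); nothing is sent in this sketch.
url = 'https://example.com/login'
data = {'username': 'tarouyamada', 'password': '4r8q99fiad'}

# data= form-encodes the dictionary, like an HTML form submission.
form_request = requests.Request('POST', url, data=data).prepare()
print(form_request.headers['Content-Type'])  # application/x-www-form-urlencoded
print(form_request.body)  # username=tarouyamada&password=4r8q99fiad

# json= serializes the same dictionary as a JSON document instead.
json_request = requests.Request('POST', url, json=data).prepare()
print(json_request.headers['Content-Type'])  # application/json
print(json.loads(json_request.body) == data)  # True
```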

How to add request headers

headers = {'user-agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36', 
'accept': 'application/json'}

response = requests.get(url, headers=headers)

This sends the request with the request headers attached. The syntax is the same for get and post.
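
If the same headers are needed for many requests, one option is to set them once on a requests.Session. This is a minimal sketch; the short User-Agent string is a made-up value for illustration.

```python
import requests

# Made-up header values for illustration (assumptions, not from a real browser).
headers = {'user-agent': 'my-scraper/0.1', 'accept': 'application/json'}

session = requests.Session()
# Headers stored on the session are merged into every request it sends.
session.headers.update(headers)

# Header names are case-insensitive, so either spelling finds the value.
print(session.headers['User-Agent'])  # my-scraper/0.1
print(session.headers['Accept'])  # application/json

# session.get(url) and session.post(url, data=...) would now send these headers.
```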

Get an image

You can get binary data using .content. Images are also a type of binary data.

response = requests.get(url)

img_data = response.content

print(img_data)
#b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00\xff\xdb\x00C\x00\x03\x02\x02\x03\x02\x02\x03\x03\x03\x03\x04\x03\x03\x04\x05\x08\x05\x05\x04\x04\x05\n\x07\x07\x06\x08\x0c\n\x0………

print(type(img_data))
# <class 'bytes'>

--The output is of type bytes

Save image

--To save the acquired image data, write response.content to a file
--Add 'b' to the open() mode to read / write binary files

with open('test.jpg', 'wb') as f:
    f.write(response.content)

Specify query parameters

params = {'q':'qiita', 'date':'2020-7-3'}

response = requests.get(url, params=params)
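
The params dictionary is URL-encoded and appended to the URL as a query string. The small sketch below prepares the request without sending it, so the final URL can be inspected; https://example.com/search is a placeholder assumption.

```python
import requests

# Placeholder URL (an assumption); the request is only prepared, not sent.
url = 'https://example.com/search'
params = {'q': 'qiita', 'date': '2020-7-3'}

prepared = requests.Request('GET', url, params=params).prepare()
print(prepared.url)  # https://example.com/search?q=qiita&date=2020-7-3
```

After a real request, response.url holds the same kind of final URL.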

View response headers

--The Content-Type header can be used to judge whether the body is text, JSON, or an image.

response = requests.get(
    'https://www.pakutaso.com/shared/img/thumb/nekocyan458A3541_TP_V.jpg')

print(response.headers)

# {'Server': 'nginx', 'Date': 'Tue, 07 Jul 2020 22:39:37 GMT', 'Content-Type': 'image/jpeg', 'Content-Length': '239027', 'Last-Modified': 'Sun, 05 Jul 2020 01:51:48 GMT', 'Connection': 'keep-alive', 'ETag': '"5f013234-3a5b3"', 'Expires': 'Thu, 06 Aug 2020 22:39:37 GMT', 'Cache-Control': 'max-age=2592000', 'X-Powered-By': 'PleskLin', 'Strict-Transport-Security': 'max-age=31536000;  includeSubDomains; preload', 'Accept-Ranges': 'bytes'}
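
One way to act on Content-Type is a small helper that maps the header value to a rough category. This is only a sketch: classify_content_type and its category names are hypothetical, and the sample value image/jpeg is taken from the output above.

```python
from requests.structures import CaseInsensitiveDict


def classify_content_type(content_type):
    """Map a Content-Type value to 'json', 'image', 'text', or 'other'.

    Hypothetical helper; the category names are assumptions.
    """
    # Drop parameters such as "; charset=utf-8" and normalize the case.
    media_type = content_type.split(';')[0].strip().lower()
    if media_type == 'application/json' or media_type.endswith('+json'):
        return 'json'
    if media_type.startswith('image/'):
        return 'image'
    if media_type.startswith('text/'):
        return 'text'
    return 'other'


# response.headers is a case-insensitive dictionary like this one.
headers = CaseInsensitiveDict({'Content-Type': 'image/jpeg'})
print(classify_content_type(headers['content-type']))  # image
print(classify_content_type('text/html; charset=UTF-8'))  # text
```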

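If there is a redirect

requests.get follows redirects by default, so the response you receive is the one from the final, redirected URL.

If you want to inspect the intermediate responses in the redirect chain, use response.history.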
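
A sketch of how the redirect chain could be listed. redirect_chain is a hypothetical helper name and the timeout is an arbitrary assumption; it relies only on the history, status_code and url attributes of the Response object.

```python
import requests


def redirect_chain(url, timeout=10.0):
    """Return (status_code, url) pairs for each hop, ending with the final response.

    redirect_chain is a hypothetical helper, not part of Requests.
    """
    response = requests.get(url, timeout=timeout)
    # response.history lists the intermediate responses, oldest first.
    hops = [(r.status_code, r.url) for r in response.history]
    # The response itself is the last stop of the chain.
    hops.append((response.status_code, response.url))
    return hops
```

For a URL that redirects from http to https, the list would typically contain a 301 entry followed by the final 200 entry. Passing allow_redirects=False to requests.get turns off redirect following, in which case history stays empty.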

Encoding check

response = requests.get(
    'https://qiita.com/')

print(response.encoding)

# utf-8
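
response.encoding is the guess Requests makes from the Content-Type response header. The sketch below calls requests.utils.get_encoding_from_headers directly on hand-written headers (assumed values, not real responses) to show that guess, including the ISO-8859-1 fallback for text types that declare no charset, which can garble pages that are actually in another encoding.

```python
from requests.utils import get_encoding_from_headers

# Hand-written header dictionaries for illustration (assumptions).
with_charset = {'content-type': 'text/html; charset=utf-8'}
without_charset = {'content-type': 'text/html'}

print(get_encoding_from_headers(with_charset))  # utf-8
print(get_encoding_from_headers(without_charset))  # ISO-8859-1
```

If response.text looks garbled, a common workaround is to set response.encoding = response.apparent_encoding (an encoding detected from the body itself) before reading response.text.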

Get JSON data

--A JSON response body can be obtained as a Python object (usually a dictionary) with response.json()

response = requests.get(url)

json_dict = response.json()
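
response.json() raises an exception when the body is not valid JSON, for example when the server returns an HTML error page. Below is a minimal sketch of guarding against that. fetch_json is a hypothetical helper and the timeout is an assumption; ValueError is caught because the JSON decoding errors raised here are subclasses of it.

```python
import requests


def fetch_json(url, timeout=10.0):
    """Return the decoded JSON body of url, or None if the body is not JSON.

    fetch_json is a hypothetical helper; the timeout value is an assumption.
    """
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    try:
        return response.json()
    except ValueError:
        # The body was not valid JSON (for example, an HTML error page).
        return None
```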
