The HTML of a web page contains all kinds of information, and it is hard to pick out what you need by reading it yourself. So we use a library called Requests to fetch the HTML.
This time, we will learn how to use Requests by extracting the headlines of the articles in the domestic (national) news section of MSN Japan.
In [1] Import Beautiful Soup, Requests, and re
In[1]
from bs4 import BeautifulSoup
import requests
import re
In [2] Store the HTML of the page in the variable urlshutoku
In[2]
urlshutoku = requests.get("https://www.msn.com/ja-jp")
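As a side note, you can check whether the request actually succeeded before going on. This is an optional sketch using the standard Requests API:
urlshutoku.status_code          # 200 means the page was fetched successfully
urlshutoku.raise_for_status()   # raises an exception on a 4xx/5xx error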
In [3] Try to display the entire page
In[3]
urlshutoku.text
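If the full dump is overwhelming, one optional trick is to slice the string and preview only the beginning:
urlshutoku.text[:500]  # show only the first 500 characters of the HTML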
The output of In [3] is mostly noise, so this time we want to display only the information we need: the headlines. To do that, we first have to find out where the headline information lives in the HTML. That's where Google Chrome's developer tools come in.
First, right-click a headline and click Inspect (I). The developer tools panel then opens.
The information used for scraping is the HTML shown in the left pane of that panel. Make sure the headline you right-clicked is highlighted in blue. Next, look at the <a> tag that corresponds to the URL of the article headline. The other headlines look the same, so the href attribute of the <a> tag seems to be the clue.
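To see how BeautifulSoup reads such a tag, here is a minimal sketch with a made-up snippet (the markup below is hypothetical; the real MSN page will differ):
from bs4 import BeautifulSoup
# hypothetical headline markup modeled on the developer tools view
rei = BeautifulSoup('<a href="/ja-jp/news/national/example-article">Example headline</a>', "html.parser")
a = rei.find("a")
print(a["href"])   # -> /ja-jp/news/national/example-article
print(a.string)    # -> Example headline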
In [4] Parse the HTML with BeautifulSoup and html.parser
In[4]
soup = BeautifulSoup(urlshutoku.text,"html.parser")
In [5] Extract the domestic headlines with find_all
In[5]
midashi = soup.find_all(href=re.compile("/ja-jp/news/national"))
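As a side note, if the pattern ever matched elements other than links, the search could be narrowed to <a> tags explicitly (an optional variant of the same find_all call):
midashi = soup.find_all("a", href=re.compile("/ja-jp/news/national"))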
If you type midashi in the Jupyter Notebook, the headline information is displayed, but it also includes the URL information. Since that is hard to read as it is, let's display only the text.
In [6] Display only the headline text with a for statement and .string
In[6]
for ichiran in midashi:
    print(ichiran.string)
Now only the headlines are displayed.
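One caveat: .string returns None when a tag contains more than one child, so blank lines may show up in the output. If that happens, get_text() from the same BeautifulSoup API is a more robust alternative:
for ichiran in midashi:
    print(ichiran.get_text(strip=True))   # joins all the text inside the tag and trims whitespace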