This is the first post in my first series: doing as much as possible in Python. (Will this series actually continue?) Also, this series does not hand you code that automates your work as-is; it is a series that shows **how to automate this and that**.
**Please be sure to read this.** Scraping means having a computer do the work a human would do, so it can access a site many times in a short period. That puts a load on the server, so you need to take measures such as limiting yourself to about one request per second.
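As a rough illustration of the "once per second" idea, here is a minimal sketch. The helper name `fetch_politely` and its parameters are my own invention for this example, not part of any library; `fetch` is whatever function you use to download a page.

```python
import time


def fetch_politely(urls, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive requests.

    `fetch` is any callable that takes a URL and returns the page.
    `sleep` is injectable only so the pause can be replaced in tests.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first one.
            sleep(delay_seconds)
        results.append(fetch(url))
    return results
```

With requests, you could call it as `fetch_politely(urls, lambda url: requests.get(url, timeout=10).text)`. One second is only the figure mentioned above; a site may ask for a longer interval.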
This next part is the most important: you have to check things like whether the site you are scraping **allows scraping**.
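One thing you can check from code is the site's robots.txt. The sketch below uses the standard library's `urllib.robotparser`; the helper `is_allowed` and the robots.txt content are hypothetical, made up for this example. Keep in mind that robots.txt is only one signal, and this check does not cover a site's terms of use.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, for illustration only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(EXAMPLE_ROBOTS_TXT, "https://example.com/wiki/Page"))
print(is_allowed(EXAMPLE_ROBOTS_TXT, "https://example.com/private/data"))
```

For a real site, `RobotFileParser` can also download the file itself through its `set_url()` and `read()` methods.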
For those of you thinking "never mind all that, just let me scrape already," let's get on with it.
First, install the library needed for scraping.
It is called Beautiful Soup.
If you have Anaconda installed, it is included from the start, but if you get an error, run this command.
conda install BeautifulSoup4 lxml
Eh? You use pip rather than the conda command? It can't be helped. ~~Kind-hearted Faguri will write that out too.~~
pip install BeautifulSoup4 lxml
Please run the above.
code.py
from bs4 import BeautifulSoup
import requests
page_data = requests.get('https://ja.wikipedia.org/wiki/%E3%82%A6%E3%82%A7%E3%83%96%E3%82%B9%E3%82%AF%E3%83%AC%E3%82%A4%E3%83%94%E3%83%B3%E3%82%B0').text
page = BeautifulSoup(page_data, 'lxml')
for element in page.select("#mw-content-text > div > p:nth-child(1)"):
    print(element.text)
Web scraping (English: Web scraping) is a computer software technology that extracts information from websites. Also known as a web crawler [1] or web spider [2]. Such software programs typically acquire WWW content by implementing low-level HTTP or by embedding a web browser.
If you are just starting out, you are probably thinking "what is this?", especially about the line `for element in page.select("#mw-content-text > div > p:nth-child(1)"):`. So here is how to find `#mw-content-text > div > p:nth-child(1)` (this sounds like a math explanation). In Google Chrome, right-click > Inspect, and then ![Verification.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/551445/6751d074-acea-990a-04af-3e246bd654fb.png)
Click the icon in the red frame and move the cursor to the part of the page you want to scrape. Then right-click the line highlighted in light blue, copy it with Copy > Copy selector, and paste it into that spot in the code. ~~Even beginners can't complain about this.~~
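To see what a copied selector actually matches without hitting a real site, you can try it on a small piece of HTML first. This is only a sketch: the HTML below is a made-up stand-in for a downloaded page, and it uses Python's built-in `html.parser` instead of `lxml` so nothing extra is needed beyond Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page.
SAMPLE_HTML = """
<div id="mw-content-text">
  <div>
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </div>
</div>
"""

# 'html.parser' ships with Python; 'lxml' would also work here.
page = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A selector in the form that Chrome's "Copy selector" produces.
selector = "#mw-content-text > div > p:nth-child(1)"

# select() returns a list of every match;
# select_one() returns the first match, or None if nothing matches.
matches = page.select(selector)
first = page.select_one(selector)

print([element.text for element in matches])
print(first.text if first is not None else "no match")
```

If `select()` comes back empty on the real page, the page structure probably differs from what Chrome showed, so it is worth copying the selector again.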
It varies from person to person, but this alone is enough to scrape with Python. Take care of yourselves! (Completely unrelated.)