JupyterLab is an execution environment that makes it easy to try out Python.
git clone https://github.com/takiguchi-yu/python-jupyterLab.git
cd python-jupyterLab
docker-compose up -d
Once the containers are up, open http://localhost:8888 in a browser. To stop the environment:
docker-compose down
Let's write a little web scraper. This sample reads a list of URLs from an external file, requests each one, and appends the result to another external file.
from bs4 import BeautifulSoup
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 12_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Mobile/15E148 Safari/604.1'
}

print('Start processing')

# Read the list of URLs from an external file
with open('./input_urls.txt', mode='r', encoding='utf-8') as f:
    for url in f:
        # Note: strip the trailing newline before requesting the URL
        result = requests.get(url.rstrip('\n'), headers=headers)
        print(result.status_code)
        soup = BeautifulSoup(result.content, 'html.parser')
        a = soup.find_all('HTML tag name here', {'class': 'Class name here'})
        # a = soup.find_all('div', {'class': 'hoge-hoge'})  # Example
        b = a[0].find(string=True)  # Get the text of the first matching HTML tag
        # Append the scraping result to an external file (output.txt)
        with open('./output.txt', 'a', encoding='utf-8') as out:
            print(b, file=out)

print('Processing completed')
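The script above passes every line of input_urls.txt to requests.get after stripping only '\n', so a blank line or a Windows-style '\r\n' ending would produce a bad request. A minimal sketch of a pre-filter is shown below. The helper name clean_urls, the sample URLs, and the rule of skipping lines that start with '#' are assumptions made for illustration, not part of the original sample.

```python
def clean_urls(lines):
    """Return the usable URLs from an iterable of text lines.

    Hypothetical helper: strips surrounding whitespace (including
    '\n' and '\r\n') and skips blank lines and '#' comment lines.
    """
    urls = []
    for line in lines:
        url = line.strip()
        if not url or url.startswith('#'):
            continue  # nothing to request on this line
        urls.append(url)
    return urls


# Example input as it might appear when iterating over the opened file
sample = [
    'https://example.com/a\n',
    '\n',
    '  https://example.com/b \r\n',
    '# a comment line\n',
]
print(clean_urls(sample))
```

In the scraping loop, this could be used as "for url in clean_urls(f):", which also makes the rstrip('\n') call unnecessary.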
You can freely add your favorite libraries.
https://qiita.com/hgaiji/items/edf71435d0565257f980