I tried out Python, which I had been curious about. The environment is Windows 10.
Anaconda3 (Python distribution)
You can install Python on its own, but after a little research Anaconda3, which already bundles frequently used libraries, seemed the better choice, so that is what I installed. I downloaded the 64-Bit Graphical Installer (466 MB) from the page below and installed it with all the default settings.
https://www.anaconda.com/products/individual
After installation, add the folder that contains the binaries to the PATH environment variable.
C:\Users\xxxxxx\anaconda3
VS Code
Search for "VS Code" and download it. After installation, add the following from Extensions in the left menu.
If the version is displayed when you type the following command at the command prompt, the environment setup is complete.
C:\Users\xxxxxx> python -V
Python 3.8.3
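The same check can also be made from inside Python, which additionally shows which interpreter is being picked up. This is only a minimal sketch using the standard library; the function name `describe_interpreter` is made up for illustration.

```python
import sys


def describe_interpreter():
    """Return the version string and the path of the running interpreter."""
    version = "{}.{}.{}".format(*sys.version_info[:3])
    return version, sys.executable


if __name__ == "__main__":
    version, executable = describe_interpreter()
    print("Python", version)           # same number that `python -V` reports
    print("Running from", executable)  # should point into the anaconda3 folder
```

If the printed path is not under the anaconda3 folder, a different Python installation is probably ahead of it in PATH.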
Create a test directory and create a "test.py" file in it.
test.py
print("test!!")
In VS Code, select "Terminal" from the View menu to display the terminal. PowerShell starts in the directory of the open .py file, so run the following command there. "test!!" is displayed.
PS C:\Users\xxxxx\workspace\test> python test.py
test!!
Next, I tried scraping. Beautiful Soup seems to be useful for this. Be sure to check robots.txt and the terms of use to confirm that scraping is allowed before you start!
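The robots.txt part of that check can be automated with the standard library's `urllib.robotparser`. The following is a rough sketch, not a complete policy check: the helper name `is_allowed`, the sample rules, and the example.com URLs are all made up for illustration, and the terms of use still have to be read by hand.

```python
from urllib import robotparser


def is_allowed(robots_lines, user_agent, url):
    """Return True if the robots.txt rules (a list of lines) permit fetching url."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# Hypothetical rules for illustration; a real site serves its own robots.txt.
sample_rules = [
    "User-agent: *",
    "Disallow: /private/",
]

print(is_allowed(sample_rules, "*", "https://example.com/index.html"))
print(is_allowed(sample_rules, "*", "https://example.com/private/page.html"))
```

The first call prints True and the second prints False. Against a real site, the parser can download the file itself: call `set_url()` with the address of the site's robots.txt, then `read()`, and then `can_fetch()` as above.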
test.py
import requests
import pandas as pd
from bs4 import BeautifulSoup
# URL to scrape
url = 'xxxxxxxx'
response = requests.get(url)
# Use the encoding guessed from the response body to avoid garbled text
response.encoding = response.apparent_encoding
# Convert to a BeautifulSoup object
bs = BeautifulSoup(response.text, 'html.parser')
# Get the h2 tags whose class attribute is class123
tags = bs.find_all('h2', attrs={'class': 'class123'})
# Print each matching tag
for tag in tags:
    print(tag)
If you get an error such as "numpy not found", a directory is missing from PATH. Add the following library directory to your environment variables and restart VS Code.
C:\Users\xxxxx\anaconda3\Library\bin
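To see which of these directories is missing from PATH, a small standard-library script can help. This is a sketch under one assumption: when the script is run with Anaconda's python, `sys.prefix` is the anaconda3 install folder. The helper name `missing_from_path` is made up for illustration.

```python
import os
import sys


def missing_from_path(required_dirs, path_value):
    """Return the directories in required_dirs that are not listed in path_value."""
    def normalize(p):
        # normcase lowercases and unifies slashes on Windows; it is a no-op elsewhere
        return os.path.normcase(os.path.normpath(p))

    on_path = {normalize(p) for p in path_value.split(os.pathsep) if p}
    return [d for d in required_dirs if normalize(d) not in on_path]


if __name__ == "__main__":
    # Assumption: sys.prefix is the Anaconda install folder.
    wanted = [sys.prefix, os.path.join(sys.prefix, "Library", "bin")]
    for directory in missing_from_path(wanted, os.environ.get("PATH", "")):
        print("Not on PATH:", directory)
```

If nothing is printed, both directories are already on PATH and the error probably has another cause.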
Scraping was much easier in Python than in JavaScript.