I'm **Shun**, and I'm studying programming. I recently became interested in Python, so I read "**Python You Can Understand Fluently**" (スラスラわかるPython). This book teaches the basic syntax of Python and how to do web scraping. [Python You Can Understand Fluently](https://www.amazon.co.jp/%E3%82%B9%E3%83%A9%E3%82%B9%E3%83%A9%E3%82%8F%E3%81%8B%E3%82%8BPython-%E5%B2%A9%E5%B4%8E-%E5%9C%AD/dp/4798151092/ref=asc_df_4798151092/?tag=jpgo-22&linkCode=df0&hvadid=295686767484&hvpos=1o1&hvnetw=g&hvrand=17010285472902510266&hvpone=&hvptwo=&hvqmt=&hvdev=c&hvdvcmdl=&hvlocint=&hvlocphy=1009343&hvtargid=pla-526272651553&psc=1&th=1&psc=1/)
Simply put, web scraping is a technique for extracting the information you want from a website.
Now that I've learned web scraping, I'll try it out. The site I'm scraping this time is BanG Dream's official site (https://bang-dream.com/). Why this site? I wanted the image below.
I created a folder called Qiita in VS Code, and I'd like to save my work in it. Next, open a command prompt and run the following commands to start the installation.
$ > pip install requests --user
$ > pip install BeautifulSoup4 --user
Once the installation is complete, start the Python interactive shell in a terminal and check whether the installation succeeded.
$ >>> import requests
>>>
$ >>> from bs4 import BeautifulSoup
>>>
If no message is displayed at this point, the installation was successful. If you get the following error message instead, the installation failed. In that case, check that your computer is connected to the Internet, then run the pip command again.
$ >>> import requests
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'requests'
>>>
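The same check can also be run as a small script instead of typing each import by hand. The following is only a minimal sketch using the standard library's `importlib.util.find_spec`; the helper name `is_installed` is made up for this illustration.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if Python can locate the module (hypothetical helper)."""
    # find_spec returns None when a top-level module cannot be found,
    # so no ModuleNotFoundError is raised here.
    return importlib.util.find_spec(module_name) is not None


# "bs4" is the import name of the BeautifulSoup4 package.
for name in ("requests", "bs4"):
    status = "OK" if is_installed(name) else "missing (run pip install again)"
    print(f"{name}: {status}")
```

If either line reports "missing", running the pip commands again should be the first thing to try.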
I saved the following code in the Qiita folder as Qiita01.py. An explanation of each line follows the code.
Qiita01.py
import requests
from bs4 import BeautifulSoup
result = requests.get("https://bang-dream.com/")
soup = BeautifulSoup(result.text, "html.parser")
img = soup.find_all('img')
print(img)
`import requests`: imports the requests library so the script can use it.

`from bs4 import BeautifulSoup`: imports BeautifulSoup from the external bs4 library.

`result = requests.get("https://bang-dream.com/")`: fetches the page. Put the URL you want to scrape here.

`soup = BeautifulSoup(result.text, "html.parser")`: passes BeautifulSoup the string to analyze (the HTML of the page) and the parser that will actually analyze it.

`img = soup.find_all('img')`: passes the tag name `img` to the find_all method.

|Method|Description|
|:--------|------|
| find_all() |Searches for the tag given as the argument and returns a list containing all matches|

`print(img)`: prints the resulting list.
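Because find_all returns whole `<img>` tags, the printed list contains more than just links. If only the image URLs are needed, something like the following may help. This is a minimal sketch, not part of Qiita01.py: the sample HTML, its image paths, and the helper name `extract_image_urls` are all made up so the example runs without accessing the site.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://bang-dream.com/"

# Hypothetical HTML standing in for result.text; the paths are invented.
SAMPLE_HTML = """
<html><body>
  <img src="/images/logo.png" alt="logo">
  <img src="https://example.com/banner.jpg">
  <img alt="no source">
</body></html>
"""


def extract_image_urls(html: str, base_url: str) -> list:
    """Collect the src of every <img> tag as an absolute URL."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for tag in soup.find_all("img"):
        src = tag.get("src")  # None when the tag has no src attribute
        if src:
            # urljoin turns relative paths into full URLs
            urls.append(urljoin(base_url, src))
    return urls


image_urls = extract_image_urls(SAMPLE_HTML, BASE_URL)
for url in image_urls:
    print(url)
```

With the real page, `result.text` would be passed in place of `SAMPLE_HTML`.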
### Output result
![キャプチャ03_LI.jpg](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/541905/797a1cf9-34e4-e62a-2cba-6396c4dedbed.jpeg)
In the terminal, the output looks something like this. Open the link underlined in red. If the following image appears, the scraping was successful.
![579de894-5bc4-4371-a0a0-da781af22bfa.jpg](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/541905/68492ed6-94b2-a5ca-f52b-055d71f3eef5.jpeg)
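Instead of opening the link in a browser, the image could also be saved into the Qiita folder. The following is a rough sketch under some assumptions: the helper names `filename_from_url` and `save_image`, the fallback file name, and the 10-second timeout are my own choices for illustration, not something from the book.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse

import requests


def filename_from_url(url: str, default: str = "image.jpg") -> str:
    """Take the last path segment of the URL as the file name."""
    name = PurePosixPath(urlparse(url).path).name
    return name or default  # fall back when the URL has no file name


def save_image(url: str, folder: str = "Qiita", timeout: float = 10) -> Path:
    """Download one image and write it into the given folder."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # stop here on 404 and similar errors

    target_dir = Path(folder)
    target_dir.mkdir(parents=True, exist_ok=True)

    target = target_dir / filename_from_url(url)
    target.write_bytes(response.content)  # images are binary, so use content
    return target
```

Calling `save_image(...)` with one of the URLs printed in the terminal should leave the file in the Qiita folder. When downloading many images, it is polite to pause between requests so the site is not overloaded.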
## Impressions
Some of you may be wondering why I wrote such a basic article. The answer is simple: this is all I'm able to write about so far... I want to deepen my knowledge of Python further.