I had been curious about scraping and wanted to collect some data, so I tried it while following the site below. https://www.atmarkit.co.jp/ait/articles/1910/18/news015_2.html I am writing this up as a review, so I hope it is helpful for those who are new to scraping! Everything was written in Google Colab using Python, so there may be some differences from running it locally.
I scraped with urllib.request and Beautiful Soup: urllib.request fetches the specified web page (or other file), and Beautiful Soup extracts the desired information from what was fetched. As on the reference site, the program retrieves the J.League standings; I have also added the step of saving them to a CSV file. The full code is shown below.
from bs4 import BeautifulSoup
from urllib import request

url = 'https://www.jleague.jp/standings/j1/'

# Fetch the page and decode it using the charset declared in the response headers
response = request.urlopen(url)
content = response.read()
charset = response.headers.get_content_charset()
response.close()
html = content.decode(charset, 'ignore')

# Parse the HTML (naming the parser explicitly avoids a BeautifulSoup warning)
soup = BeautifulSoup(html, 'html.parser')
# Each <tr> element is one row of the standings table
table = soup.find_all('tr')

standing = []
for row in table:
    tmp = []
    for item in row.find_all('td'):
        if item.a:
            # Linked cells on this page contain the club name twice,
            # so keep only the first half of the text
            tmp.append(item.text[0:len(item.text) // 2])
        else:
            tmp.append(item.text)
    # Drop the empty first and last cells of each row
    del tmp[0]
    del tmp[-1]
    standing.append(tmp)

for item in standing:
    print(item)
import pandas as pd
from google.colab import drive, files

# The first row of the table is the header, so drop it before building the DataFrame
del standing[0]

df = pd.DataFrame(standing, columns=['Rank', 'Club', 'Points', 'Played', 'Won', 'Drawn', 'Lost', 'Goals For', 'Goals Against', 'Goal Difference'])

# Mount Google Drive so the CSV can be saved under My Drive
drive.mount('/content/drive')

filename = 'j1league.csv'
path = '/content/drive/My Drive/' + filename
with open(path, 'w', encoding='utf-8-sig') as f:
    df.to_csv(f, index=False)
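After saving, it is worth confirming that the file was written as expected. The snippet below is a minimal check, assuming the same path variable as above; the files.download call is optional and simply downloads the CSV from Colab to your local machine.

# Read the CSV back to confirm the contents (utf-8-sig strips the BOM)
check = pd.read_csv(path, encoding='utf-8-sig')
print(check.head())

# Optional: download the file from Colab to your local machine
files.download(path)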
While working through this I checked the intermediate results in detail, so I had print() calls scattered throughout, but here I have shown the whole flow, up to saving the file, in one go.
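For reference, the intermediate checks mentioned above looked something like the following; these two print calls are just illustrations of that process, not part of the final script.

# One raw <tr> element as parsed by Beautiful Soup, before any cleanup
print(table[1])
# The corresponding cleaned row after extraction
print(standing[0])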