I want to extract only the store names from the Go To Eat store list and write them to a CSV file.
I am using BeautifulSoup, requests, Python 3, and Windows 10.
By selecting the HTML tag in the following code, I was able to extract the store names as a list, though each one was still wrapped in its tag:
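import csv
import requests
from bs4 import BeautifulSoup
PageNumber = 1  # example value: which page of the store list to fetch
urlName = "https://premium-gift.jp/eatosaka/use_store?events=page&id={}&store=&addr=&industry=".format(PageNumber)
dataHTML = requests.get(urlName)
soup = BeautifulSoup(dataHTML.content, "html.parser")
elems = soup.select('h3.store-card__title')  # list of <h3> Tag objects, one per store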
Next, I wanted to strip the extra markup and write the names to CSV. I was told that i.text can be used to get just the text of each element.
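with open(r'C:\Users\daisuke\Desktop\python\eat.csv', 'w') as f:
    writer = csv.writer(f)
    for i in elems:
        # Earlier string-replacement attempt, now disabled:
        # i = str(i)
        # i = i.replace('<h3 class="store-card__title">', '')
        # i = i.replace('</h3>', '')
        # i = i.replace('\xa0', ' ')
        # i = i.replace('\u3000', ' ')
        print(i.text)  # i.text is the tag's text without the <h3> markup
        try:
            writer.writerow([i.text])
        except UnicodeEncodeError:  # the file's encoding cannot represent a character
            writer.writerow(['error'])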
Running it prints the first store name and then fails with the following error:
Live spiny lobster dish Chunagon Osaka Station 3 Building
Traceback (most recent call last):
File "C:\Users\daisuke\Desktop\python\go_to_eat.py", line 24, in <module>
writer.writerow(i)
UnicodeEncodeError: 'cp932' codec can't encode character '\xa0' in position 20: illegal multibyte sequence
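The error comes from the encoding used to write the file, not from BeautifulSoup: when open() is given no encoding, it uses the locale default, which is cp932 on Japanese Windows, and cp932 has no code for U+00A0 (NO-BREAK SPACE). A minimal standard-library check; can_encode is a hypothetical helper and the sample string is made up for illustration:

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text can be encoded with encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical sample: a store name containing U+00A0 NO-BREAK SPACE.
name = "Chunagon\xa0Osaka"

print(can_encode(name, "cp932"))                       # False: cp932 has no mapping for U+00A0
print(can_encode(name, "utf-8"))                       # True
print(can_encode(name.replace("\xa0", " "), "cp932"))  # True: an ordinary space is fine
```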
As a workaround, I replaced the non-breaking space ('\xa0') with an ordinary half-width space, as shown below. However, this only treats the symptom, so it is not a real fix.
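with open(r'C:\Users\daisuke\Desktop\python\eat.csv', 'w') as f:
    writer = csv.writer(f)
    for i in elems:
        i = str(i)
        i = i.replace('<h3 class="store-card__title">', '')
        i = i.replace('</h3>', '')
        i = i.replace('\xa0', ' ')    # non-breaking space -> half-width space
        i = i.replace('\u3000', ' ')  # full-width space -> half-width space
        print(i)
        try:
            writer.writerow([i])
        except UnicodeEncodeError:
            writer.writerow(['error'])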
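If a cp932 file really is required, a somewhat broader version of the same workaround is Unicode NFKC normalization from the standard unicodedata module, which maps both U+00A0 and U+3000 (full-width space) to an ordinary space. This is a sketch, not the method used here; clean_store_name is a hypothetical helper you would call on i.text. Note that NFKC also turns full-width letters and digits into half-width ones, which may change some store names, and it does not guarantee that every remaining character exists in cp932.

```python
import unicodedata


def clean_store_name(text: str) -> str:
    """Normalize compatibility characters and trim surrounding whitespace."""
    # NFKC maps U+00A0 (NO-BREAK SPACE) and U+3000 (IDEOGRAPHIC SPACE) to U+0020.
    # Side effect: full-width alphanumerics such as "ＬＵＣＵＡ" become "LUCUA".
    return unicodedata.normalize("NFKC", text).strip()


print(clean_store_name("Chunagon\xa0Osaka\u3000Station"))  # Chunagon Osaka Station
print(clean_store_name("ＬＵＣＵＡ"))                       # LUCUA
```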
A better approach is to specify an encoding that can actually represent the character. When open() is called without an encoding, it uses the locale default, which is cp932 on Japanese Windows, and cp932 cannot encode '\xa0'. Passing the encoding keyword argument to open(), as shown below, sets the encoding used for the automatic conversion when writing, so choose one such as UTF-8 that can represent any Unicode character.
The text may look garbled when the CSV file is opened, but it displays correctly once the viewer is switched to the right character encoding (UTF-8).
with open(r'C:\Users\daisuke\Desktop\python\eat.csv', 'w', encoding='utf-8') as f:
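A small round-trip check of the UTF-8 approach, writing to a temporary file instead of the Desktop path. One assumption beyond the text above: if the garbled display happens in Excel, the "utf-8-sig" encoding (UTF-8 with a byte-order mark) is a common way to make it detect UTF-8 automatically. newline="" follows the csv module documentation's recommendation for files used with csv.writer and csv.reader.

```python
import csv
import os
import tempfile

rows = [["Chunagon\xa0Osaka"], ["Vineyard"]]  # hypothetical sample rows

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "eat.csv")
    # "utf-8-sig" writes the bytes EF BB BF (a BOM) before the UTF-8 text.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)
    with open(path, "rb") as f:
        raw = f.read()
    # Reading with "utf-8-sig" strips the BOM again.
    with open(path, encoding="utf-8-sig", newline="") as f:
        back = list(csv.reader(f))

print(raw[:3])       # b'\xef\xbb\xbf'
print(back == rows)  # True: U+00A0 survives the round trip
```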
However, when I read the CSV back, an empty row appeared after every row, as shown below. ~~I still don't know why.~~ Someone knowledgeable explained the cause in the comments, and that solved it. Thank you!
['Wolfgang Steakhouse by Wolfgang Steakhouse Osaka']
[]
['Vineyard']
[]
['Sumikoku Rotating Chicken Cuisine LUCUA']
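The empty rows match a well-known csv pitfall on Windows: csv.writer already ends each row with "\r\n", and a file opened in text mode without newline="" translates that "\n" into "\r\n" again, so each row ends in "\r\r\n" and reads back as the row followed by an empty one. The csv documentation recommends opening the file with newline="", which is presumably the fix from the comments (the post does not say). The sketch below reproduces both cases on any OS by passing newline="\r\n", which forces the same translation Windows text mode applies by default; write_and_read and the sample names are hypothetical.

```python
import csv
import os
import tempfile

names = ["Vineyard", "Sumikoku Rotating Chicken Cuisine LUCUA"]


def write_and_read(path, newline):
    """Write one name per row with the given newline setting, then read the rows back."""
    with open(path, "w", encoding="utf-8", newline=newline) as f:
        writer = csv.writer(f)  # default lineterminator is "\r\n"
        for name in names:
            writer.writerow([name])
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.reader(f))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "eat.csv")
    # newline="\r\n" turns every "\n" written into "\r\n", so rows end in "\r\r\n".
    broken = write_and_read(path, newline="\r\n")
    # newline="" writes the csv module's "\r\n" untouched.
    fixed = write_and_read(path, newline="")

print(broken)  # [['Vineyard'], [], ['Sumikoku Rotating Chicken Cuisine LUCUA'], []]
print(fixed)   # [['Vineyard'], ['Sumikoku Rotating Chicken Cuisine LUCUA']]
```

For the script above, that would mean opening the output as open(r'C:\Users\daisuke\Desktop\python\eat.csv', 'w', encoding='utf-8', newline='').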