The collaboration between UNIQLO and Jil Sander, the +J collection, has attracted a lot of attention. I have been enjoying it too. Even now, more than a month after launch, I often check the +J special page. The page seems to be updated quite frequently, probably because the collection is so popular. An update to the page does not necessarily mean items are back in stock, but fast-fashion nerds still care about it. So I decided to scrape the page with Python and check regularly whether it has been updated.
At first I tried scraping with Beautiful Soup, but it failed. The downloaded HTML differs from the HTML you actually see in the browser. The page content is generated by JavaScript, so the elements I wanted were not in the downloaded file.
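Here is a minimal sketch, using only the standard library, of why a plain HTML parser comes up empty. It assumes a simplified, hypothetical page: the HTML the server sends is just an empty container plus a script tag, and the product `<h3>` elements exist only after a browser runs that script. Both HTML strings are invented for illustration and are not UNIQLO's actual markup.

```python
from html.parser import HTMLParser


class H3Collector(HTMLParser):
    """Collect the text of every <h3> element in a document."""

    def __init__(self):
        super().__init__()
        self.in_h3 = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self.in_h3 = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "h3":
            self.in_h3 = False

    def handle_data(self, data):
        # Only text that sits inside an <h3> is kept
        if self.in_h3:
            self.titles[-1] += data


def h3_titles(html):
    parser = H3Collector()
    parser.feed(html)
    parser.close()
    return parser.titles


# Hypothetical HTML as downloaded: the product list is filled in later by JavaScript
raw_html = '<div id="app"></div><script src="bundle.js"></script>'
# Hypothetical HTML after a browser has executed the script
rendered_html = '<div id="app"><div class="item"><h3>Wool stall</h3></div></div>'

print(h3_titles(raw_html))       # []
print(h3_titles(rendered_html))  # ['Wool stall']
```

The parser is fine in both cases. The difference is only in what the HTML contains, which is why a tool that executes JavaScript is needed.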
Reference: https://gammasoft.jp/blog/how-to-download-web-page-created-javascript/
requests-html can render JavaScript, so let's scrape with it instead.
For notifications, I use Slack's Incoming Webhooks. Setup and basic usage are described here:
Reference: https://qiita.com/shtnkgm/items/4f0e4dcbb9eb52fdf316
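An Incoming Webhook is an HTTPS POST of a JSON body whose `text` field holds the message. slackweb wraps this for you. The sketch below shows the request built with only the standard library. `build_webhook_request` is a hypothetical helper, and the URL is the placeholder from config.ini. Nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request


def build_webhook_request(webhook_url, text):
    """Build (but do not send) a POST request for a Slack Incoming Webhook.

    Hypothetical helper: the payload uses only the basic `text` field.
    """
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_webhook_request(
    "https://hooks.slack.com/services/hogehogehogehoge",
    "There was no product update.",
)
print(req.get_method())      # POST
print(json.loads(req.data))  # {'text': 'There was no product update.'}
# To actually send it: urllib.request.urlopen(req)
```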
observation.py
# coding: UTF-8
import ast
import datetime

import slackweb
from requests_html import HTMLSession

from config import config, update_product

# Start a session and fetch the page
session = HTMLSession()
url = config["web_info"]["url"]
r = session.get(url)
# Run the page's JavaScript in a headless browser so the product list exists in r.html
# (requests-html downloads Chromium the first time render() is called)
r.html.render()

# Scraping: collect the product names
product_name = r.html.find(".ocI5u4BRvjaH-uauZvJ8R > h3")
product_name_array = [name.text for name in product_name]

# Difference comparison against the list saved on the previous run
# (ast.literal_eval parses the list literal that update_product writes)
priv_set = set(ast.literal_eval(config["web_info"]["product"]))
curt_set = set(product_name_array)
slack = slackweb.Slack(url=config["slack_info"]["in_webhook_url"])
dt_now = datetime.datetime.now()

# Products that were removed or added since the last run.
# Comparing set contents (not just sizes) also catches one product replacing another.
diff_result = sorted(priv_set ^ curt_set)

# Message
slack.notify(text=dt_now.strftime('%Y-%m-%d %H:%M:%S'))
if diff_result:
    for product in diff_result:
        slack.notify(text='"' + product + '" has been updated.')
    slack.notify(text=url)
else:
    slack.notify(text="There was no product update.")

# Save the current product list for the next run
update_product(product_name_array)
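The comparison above works on sets. As a result, a product that is listed twice and then drops to a single listing goes unnoticed, and the saved list in config.ini does contain repeated names. One possible refinement uses `collections.Counter` to report additions and removals per listing. `diff_products` is a hypothetical helper, not part of the original script.

```python
from collections import Counter


def diff_products(previous, current):
    """Return (added, removed) product names, counting duplicate listings.

    Counter subtraction keeps only positive counts, so each result holds
    what one list has beyond the other.
    """
    prev_counts = Counter(previous)
    curr_counts = Counter(current)
    added = sorted((curr_counts - prev_counts).elements())
    removed = sorted((prev_counts - curr_counts).elements())
    return added, removed


previous = ["Wool stall", "Wool blend easy pants", "Wool blend easy pants"]
current = ["Wool stall", "Wool blend easy pants", "Hybrid down oversized hoodie"]
added, removed = diff_products(previous, current)
print(added)    # ['Hybrid down oversized hoodie']
print(removed)  # ['Wool blend easy pants']
```

If `added` and `removed` are both empty, the page is unchanged, including its duplicate listings.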
config.py
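import configparser
import re

# Read the configuration file (UTF-8, since product names may contain non-ASCII text)
config = configparser.ConfigParser()
config.read('config.ini', encoding='utf-8')

def update_product(product_name):
    """Overwrite the `product =` line in config.ini with the latest product list."""
    with open("config.ini", "r", encoding="utf-8") as f:
        lines = f.readlines()
    with open("config.ini", "w", encoding="utf-8") as f:
        for line in lines:
            if re.match(r'product =', line):
                # Keep the trailing newline so the following line is not merged into this one
                f.write("product = {}\n".format(product_name))
                continue
            f.write(line)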
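`update_product` rewrites config.ini line by line with a regular expression. An alternative is to let configparser update the value and write the file back. This sketch assumes config.ini keeps the same sections and keys, and `save_products` is a hypothetical name. The list is stored as its Python `repr` so that `ast.literal_eval` can parse it again. Interpolation is disabled, so if you adopt this, read the file with `interpolation=None` as well. Note that `ConfigParser.write` does not preserve comments.

```python
import ast
import configparser
import os
import tempfile


def save_products(path, product_names):
    """Hypothetical alternative to update_product(): store the list via configparser.

    Interpolation is disabled so a '%' in a product name is written verbatim.
    Note: ConfigParser.write() drops any comments that were in the file.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path, encoding="utf-8")
    # repr() of a list is a Python literal that ast.literal_eval can read back
    parser["web_info"]["product"] = repr(list(product_names))
    with open(path, "w", encoding="utf-8") as f:
        parser.write(f)


# Demo on a throwaway copy of a minimal config file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.ini")
    with open(path, "w", encoding="utf-8") as f:
        f.write("[web_info]\nproduct = []\n")
    save_products(path, ["Wool stall", "Wool blend easy pants"])
    check = configparser.ConfigParser(interpolation=None)
    check.read(path, encoding="utf-8")
    print(ast.literal_eval(check["web_info"]["product"]))
    # ['Wool stall', 'Wool blend easy pants']
```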
config.ini
[slack_info]
in_webhook_url = https://hooks.slack.com/services/hogehogehogehoge
[web_info]
url = https://www.uniqlo.com/jp/ja/spl/collaboration/plusj/men/
product = ['Wool blend jacket (striped) can be set up', 'Wool blend oversized jacket', 'Hybrid down oversized hoodie', 'Cashmere blend crew neck sweater (long sleeves)', 'Merino Blend V-neck Cardigan (Long Sleeve)', 'Merino Blend V-neck Cardigan (Long Sleeve / Cloud)', 'Supima cotton oversized shirt (long sleeves)', 'Supima Cotton Oversized Shirt (Long Sleeve / Striped)', 'Supima cotton oversized shirt (long sleeves)', 'Supima Cotton Oversized Shirt (Long Sleeve / Striped)', 'Supima Cotton Oversized Shirt (Long Sleeve / Striped)', 'Supima cotton mock neck T (long sleeves)', 'Supima Cotton Crew Neck T (Long Sleeve)', 'Wool blend easy pants', 'Wool blend pants set up', 'Wool blend pants (stripes) can be set up', 'Wool stall']
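Because `update_product` writes the list with `format()`, the `product` value is the Python `repr` of a list. The sketch below reads it back with `ast.literal_eval` and contrasts that with stripping the brackets and splitting on `", "`. The second product name in the sample is made up to show an edge case (an apostrophe plus a comma) that the splitting approach gets wrong. `raw_product_value` is a hypothetical helper.

```python
import ast
import configparser

SAMPLE_INI = """\
[web_info]
url = https://www.uniqlo.com/jp/ja/spl/collaboration/plusj/men/
product = ['Wool blend easy pants', "Men's Wool stall, grey"]
"""


def raw_product_value(ini_text):
    """Return the raw `product` string from [web_info] (hypothetical helper)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser["web_info"]["product"]


raw = raw_product_value(SAMPLE_INI)

# literal_eval only accepts Python literals, so no code is executed
parsed = ast.literal_eval(raw)
# String surgery removes the apostrophe and splits inside the second name
naive = raw.strip("[]").replace("'", "").split(", ")

print(parsed)  # ['Wool blend easy pants', "Men's Wool stall, grey"]
print(naive)   # ['Wool blend easy pants', '"Mens Wool stall', 'grey"']
```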
When I ran it, it appeared to work as intended.
Reference: https://qiita.com/taka-kawa/items/f0597b2f375da7ddbb73