Do you know a site called Amaoku? It is a marketplace for buying and selling Amazon gift certificates, which trade there at a discount of about 5 to 10%.
How can you buy a gift certificate on this site at the best possible price? For example, is the discount rate better on Tuesdays, or worse around the 25th of the month?
Fortunately, Amaoku publishes its past transaction data. In this article I scrape that transaction data with Python + Beautiful Soup and analyze it with R.
To state the conclusions first:
- There is no relationship between face value and discount rate
- There is no relationship between the validity period and the discount rate
- Prices are currently high. Wait until the rate reaches 92.5-95% before buying.
- The discount rate does not change with the day of the week
- The discount rate does not change with the day of the month
- It is slightly cheaper during the day than at other times
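The Python code used is below. The flow is: find the index of the last page, fetch each page in turn, and write the cleaned table rows to a CSV file.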
amaoku_scraping.py
# coding: UTF-8
from bs4 import BeautifulSoup
import urllib.request
import time

file = open("C:/Users/user/amaoku_transaction_data.csv", 'w', encoding='utf-8')

# get last page index from the pager's "last" link
last_index = 0
html = urllib.request.urlopen("https://amaoku.jp/past_gift/index_amazon/")
soup = BeautifulSoup(html, "lxml")
a_s = soup.find(class_="pager_link").find_all('a')
for a in a_s:
    # a.string is None for links with nested markup, so guard before comparing;
    # the label must match the text of the site's last-page link
    if a.string and a.string.replace(u"\xa0", u" ") == u'last "':
        last_index = int(a.get('href').split('/')[-1])

# get auction data from each page
# last_index = 20  # uncomment to limit the crawl while testing
page_index = 0
while page_index <= last_index:
    url = 'https://amaoku.jp/past_gift/index_amazon/' + str(page_index)
    html = urllib.request.urlopen(url)
    soup = BeautifulSoup(html, 'lxml')
    rows = soup.find('table', class_='contacttable').find('tbody').find_all('tr')
    # get sales data from a page
    for row in rows:
        line_elements = []
        # if the row is a header, skip
        if row.has_attr('class') and row['class'][0] == 'tr_tit':
            continue
        items = row.find_all('td')
        for item in items:
            # if the item is empty, skip
            if item.string is None:
                continue
            # clean the string: drop thousands separators, non-breaking spaces,
            # the yen suffix and percent signs, then trim the ends
            # (the inner space of a date-time cell must survive for the R code)
            element = item.string.replace(',', '').replace('\xa0', '').replace(u'円', '').replace('%', '').strip()
            line_elements.append(element)
        line = ','.join(line_elements)
        if line == '':
            continue
        file.write(line + '\n')
    print("Page {0} processed".format(page_index))
    time.sleep(1)  # be polite to the server
    # 20 items per page
    page_index += 20
file.close()
print("Task completed")
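The two pure pieces of the scraper, the page offsets and the cell cleaning, can be checked without touching the network. The following is only a sketch under assumptions: `page_offsets` and `clean_cell` are hypothetical helper names that do not appear in the script, and the yen suffix is assumed to be `円`. Rather than removing every space, the sketch only trims whitespace at the ends, so that a date-time cell keeps the space the R code later relies on.

```python
def page_offsets(last_index, per_page=20):
    """Offsets appended to the listing URL: 0, 20, 40, ... up to last_index."""
    return list(range(0, last_index + 1, per_page))


def clean_cell(text):
    """Drop thousands separators, non-breaking spaces, the yen suffix and
    percent signs, then trim whitespace at both ends."""
    for junk in (",", "\xa0", "円", "%"):
        text = text.replace(junk, "")
    return text.strip()


print(page_offsets(60))
print(clean_cell("10,000円"))
print(clean_cell("97.5\xa0%"))
print(clean_cell("2015/12/20 18:58"))
```

Keeping these steps as small functions makes it easy to confirm what ends up in the CSV before running a crawl that sleeps one second per page.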
Read the file with read.csv and give each column a name. The date and time columns are converted to the Date class.
uri <- "D:/workspace/amaoku_analyze/amaoku_transaction_data.csv"
dat <- read.csv(uri, header=T, fileEncoding="UTF-8", stringsAsFactors = F)
names(dat) <- c("biddate", "facevalue", "bidprice", "discount", "validdate")
dat$biddate2 <- as.Date(dat$biddate)
dat$validdate2 <- as.Date(dat$validdate)
For reference, the columns are:
- biddate: date and time of purchase
- facevalue: face value
- bidprice: purchase price
- discount: discount rate
- validdate: expiration date

Checking for rows that contain NA and similar missing values turned up 170 of them.
sum(!complete.cases(dat)) # 170
I'll remove them.
dat = dat[complete.cases(dat),]
That leaves 176899 rows and 7 columns.
> str(dat)
'data.frame': 176899 obs. of 7 variables:
$ biddate : chr "2015/12/20 18:58" "2015/12/20 18:03" "2015/12/20 18:03" "2015/12/20 18:01" ...
$ facevalue : int 10000 5000 5000 20000 3000 5000 5000 3000 10000 3000 ...
$ bidprice : int 9750 4825 4825 19300 2880 4800 4825 2895 9700 2895 ...
$ discount : num 97.5 96.5 96.5 96.5 96 96 96.5 96.5 97 96.5 ...
$ validdate : chr "2015/12/20" "2016/12/20" "2016/11/20" "2016/12/20" ...
$ biddate2 : Date, format: "2015-12-20" "2015-12-20" "2015-12-20" ...
$ validdate2: Date, format: "2015-12-20" "2016-12-20" "2016-11-20" ...
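The `discount` column deserves a second look: in the sample rows it equals the purchase price as a percentage of face value, so a smaller number means a cheaper certificate. A quick check in Python, using the first four rows printed by `str(dat)` (the helper name `price_ratio_percent` is made up for this sketch and is not part of the original code):

```python
# (facevalue, bidprice, discount) for the first four rows of the str(dat) output
rows = [
    (10000, 9750, 97.5),
    (5000, 4825, 96.5),
    (5000, 4825, 96.5),
    (20000, 19300, 96.5),
]


def price_ratio_percent(facevalue, bidprice):
    """Purchase price as a percentage of face value."""
    return 100.0 * bidprice / facevalue


for facevalue, bidprice, discount in rows:
    ratio = price_ratio_percent(facevalue, bidprice)
    # the last column reports whether the recomputed value matches the listed one
    print(facevalue, bidprice, ratio, abs(ratio - discount) < 1e-9)
```

Read this way, a value of 92.5 means paying 92.5% of face value, that is, 7.5% off.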
One might expect that the higher the face value, the higher the discount rate. Is that actually the case?
require(ggplot2)
ggplot(dat, aes(facevalue, discount)) + geom_point() + labs(x="Face value [yen]", y="Discount rate [%]")
At first glance, it seems that there is no such tendency. Let's look at the slope of the regression line.
>summary(lm(discount ~ facevalue, data=dat))
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.401e+01 5.586e-03 16828.37 <2e-16 ***
facevalue -1.812e-05 2.516e-07 -72.03 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The slope is -1.812e-05, and the Pr(>|t|) value shows that it is statistically significant. In other words, when the face value increases by 1,000 yen, the discount rate tends to go down by about 0.02 percentage points. That is practically within the margin of error.
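To see how small that effect is, the fitted line can be evaluated directly. This is a minimal sketch that only reuses the two coefficients from the `lm()` summary; the function name `fitted_discount` is hypothetical, and the face values 3000 and 20000 are simply two that appear in the data.

```python
slope = -1.812e-05      # change in discount rate [%] per yen of face value
intercept = 9.401e+01   # fitted discount rate [%] at a face value of 0 yen


def fitted_discount(facevalue):
    """Discount rate [%] predicted by the regression line."""
    return intercept + slope * facevalue


per_1000_yen = slope * 1000
print(round(per_1000_yen, 4))  # -0.0181

# Gap between a 3,000 yen and a 20,000 yen certificate, in percentage points
gap = fitted_discount(20000) - fitted_discount(3000)
print(round(gap, 3))
```

With this many rows, even a slope that small comes out as significant, which is why the size of the effect matters more here than the p-value.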
**Conclusion: There is no relationship between face value and discount rate**
Generally speaking, the shorter the validity period, the lower the demand, so the discount rate is likely to be better. Is that really so?
Calculate the validity period from the expiration date and purchase date and time, and plot it together with the discount rate.
dat$timediff <- as.numeric(difftime(dat$validdate2, dat$biddate2, units = "days")) / 365.24
ggplot(dat, aes(timediff, discount)) + geom_point() +
labs(x="Valid period [year]", y="Discount [%]")
There seems to be no particular tendency here either. Computed the same way as before, the slope of the regression line was -0.099743 (p < 2e-16).
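The validity period used here is simply the number of days between purchase and expiry, divided by 365.24. A small Python sketch of the same calculation, using dates that appear in the `str(dat)` output (the function name `validity_years` is hypothetical):

```python
from datetime import date


def validity_years(bid, valid):
    """Days from purchase to expiry, divided by 365.24 as in the R code."""
    return (valid - bid).days / 365.24


bid = date(2015, 12, 20)
for valid in (date(2015, 12, 20), date(2016, 11, 20), date(2016, 12, 20)):
    print(valid, round(validity_years(bid, valid), 3))
```

A negative result would mean that the expiry date precedes the purchase date.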
The discount rate appears to dip at a validity period of 1 year, but that is probably because there are many samples there, so the tails of the distribution spread wider. Below is the histogram.
ggplot(dat, aes(timediff)) + geom_histogram(binwidth = 1/12) + xlim(-1, 5) +
  labs(x="Valid period [year]", y="Frequency")
**Conclusion: There is no relationship between the validity period and the discount rate**
How does the discount rate change over the course of the year? Is there a cheap season?
ggplot(dat, aes(biddate2, discount)) + geom_point(size=1) +
ylim(75, 100) + labs(x="Date", y="Discount [%]")
The numbers on the horizontal axis are the months of 2015. The rate meanders over time. Because the data acquired this time covers only the past year, I can't say anything detailed about seasonal variation, but compared with the year as a whole, prices at this time of year look high. Judging from the graph, 92.5-95% looks like the normal market rate.
**Conclusion: Prices are currently high. Wait until the rate reaches 92.5-95% before buying.**
Let's also check the day of the week. Since the site has more users on Saturdays and Sundays, sellers should have the advantage then, and the discount rate should be worse.
weekdays_names=c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday")
dat$weekdays <- factor(weekdays(dat$biddate2), levels= weekdays_names)
wddf <- aggregate(discount ~ weekdays, dat, mean)
ggplot(data=dat, aes(weekdays, discount)) + geom_boxplot() +
ylim(75,110) + labs(x="Day of the week", y="Discount rate [%]")
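The per-weekday mean computed with `aggregate` can be sketched in Python as follows. Note that R's `weekdays()` returns names in the current locale, so the English level names only match under an English locale; the sketch avoids that by using the numeric weekday index. The function name and the sample values are hypothetical and are not taken from the Amaoku data.

```python
from collections import defaultdict
from datetime import date

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]


def mean_discount_by_weekday(rows):
    """rows: iterable of (date, discount). Returns {weekday name: mean discount}."""
    buckets = defaultdict(list)
    for day, discount in rows:
        # date.weekday() is 0 for Monday ... 6 for Sunday, independent of locale
        buckets[WEEKDAY_NAMES[day.weekday()]].append(discount)
    return {name: sum(values) / len(values) for name, values in buckets.items()}


# Hypothetical sample values, for illustration only
sample = [
    (date(2015, 12, 19), 96.0),  # a Saturday
    (date(2015, 12, 20), 97.5),  # a Sunday
    (date(2015, 12, 20), 96.5),  # a Sunday
]
print(mean_discount_by_weekday(sample))
```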
**Conclusion: The discount rate does not change with the day of the week**
Since the 25th is payday, users' wallets are flush, and certificates will sell even on slightly worse terms, so the discount rate may worsen.
dat$days <- factor(format(as.POSIXct(dat$biddate), format="%d"))
ggplot(dat, aes(days, discount)) + geom_boxplot() +
ylim(75,100) + labs(x="Day of a month", y="Discount rate [%]")
**Conclusion: The discount rate does not change with the day of the month**
Could it be that the number of users drops, and the discount rate improves, late at night, early in the morning, or during the daytime? Let's check.
dat$hours <- factor(format(as.POSIXct(dat$biddate), format="%H"))
ggplot(dat, aes(hours, discount)) + geom_boxplot() +
ylim(75,100) + labs(x="Hour of a day", y="Discount rate [%]")
From 23:00 to 8:00 the next morning, prices are higher. The difference is not large, but if you are looking for a low price, daytime is best.
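The night window wraps around midnight, so classifying a purchase takes an "or" rather than a simple range check. A minimal Python sketch, assuming the `biddate` format shown in the `str(dat)` output and treating 8:00 itself as daytime (the function names `bid_hour` and `is_night` are hypothetical):

```python
from datetime import datetime


def bid_hour(biddate):
    """Extract the hour from a biddate string such as '2015/12/20 18:58'."""
    return datetime.strptime(biddate, "%Y/%m/%d %H:%M").hour


def is_night(hour):
    """True for 23:00-07:59, the window where the box plot shows higher prices."""
    return hour >= 23 or hour < 8


for stamp in ("2015/12/20 18:58", "2015/12/20 18:03"):
    hour = bid_hour(stamp)
    print(stamp, hour, "night" if is_night(hour) else "day")
```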
**Conclusion: Daytime is slightly cheaper than other times**
How was it? I hope this information helps you buy Amazon gift certificates at a low price.