How to Get an Amazon Gift Certificate Cheaply with Amaoku

Do you know a site called Amaoku? It is a site where you can buy and sell Amazon gift certificates, which trade there at a discount of about 5 to 10%.

How can you buy a gift certificate at the best possible price on this site? For example, is there a tendency for the discount rate to be good on Tuesdays and bad around the 25th?

Fortunately, Amaoku has released Past Transaction Data to the public. In this article, that transaction data is scraped with Python + Beautiful Soup and analyzed with R.

To state the conclusions first:

- There is no relationship between face value and discount rate
- There is no relationship between the validity period and the discount rate
- Currently expensive. Wait until it reaches 92.5-95% before buying.
- The discount rate does not change with the day of the week
- The discount rate does not change with the day of the month
- Slightly cheaper during the day than at other times

Environment

Windows 8.1 64bit
Python 3.5
Beautiful Soup
R version 3.2.1

Web scraping

The Python code used is below. The flow is as follows:

1. Get the range of pages to scrape
2. Get the transaction information row by row using Beautiful Soup
3. Write the information to a CSV file
4. Move to the next page
5. Repeat steps 2 to 4 over the range obtained in step 1

`amaoku_scraping.py`


# coding: UTF-8

from bs4 import BeautifulSoup
import urllib.request
import time

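# write UTF-8 explicitly, since the R code reads the file with fileEncoding="UTF-8"
file = open("C:/Users/user/amaoku_transaction_data.csv", 'w', encoding="utf-8")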

# get last page index
last_index = 0
html = urllib.request.urlopen("https://amaoku.jp/past_gift/index_amazon/")
soup = BeautifulSoup(html, "lxml")
a_s =  soup.find(class_="pager_link").find_all('a')
for a in a_s:
    if a.string.replace(u"\xa0", u" ") == u'last "':
        last_index = int(a.get('href').split('/')[-1])

# get auction data from a page 
# for a quick test run, cap the range here, e.g. last_index = 20
page_index = 0
while page_index <= last_index:
    url = 'https://amaoku.jp/past_gift/index_amazon/' + str(page_index)
    html = urllib.request.urlopen(url)
    soup = BeautifulSoup(html, 'lxml') 
    rows = soup.find('table', class_='contacttable').find('tbody').find_all('tr')
    # get sales data from a page
    for row in rows:
        line_elements = []
        # if the row is a header, skip 
        if row.has_attr('class') and row['class'][0] == 'tr_tit':
            continue
        items = row.find_all('td')
        for item in items:
            # if the item is empty, skip
            if item.string is None:
                continue
            # clean the string
            # clean the string: drop thousands separators, non-breaking spaces, the yen unit (円) and '%'
            element = item.string.replace(',', '').replace('&nbsp;', '').replace('\xa0', '').replace(u'円', '').replace('%', '')
            line_elements.append(element)
        line = ','.join(line_elements)
        if line == '':
            continue
        file.write(line + '\n')

    print("Page {0} processed".format(page_index))
    time.sleep(1)
    # 20 items per page, so the index in the URL advances by 20
    page_index += 20

file.close()
print("Task completed")
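The string cleaning inside the loop can be tried without touching the site. The following is only a minimal sketch: `clean_cell` is a hypothetical helper, the sample cells are an assumed rendering of the first transaction shown later in the `str(dat)` output, and it assumes prices are displayed with a trailing 円 (the yen unit).

```python
# Sketch of the cell-cleaning step; clean_cell and the sample cells are hypothetical.
def clean_cell(text):
    """Strip thousands separators, non-breaking spaces, the yen unit and '%'."""
    for junk in (",", "\xa0", "円", "%"):
        text = text.replace(junk, "")
    return text.strip()

# One table row as it might appear on the page (formatting is an assumption).
cells = ["2015/12/20 18:58", "10,000円", "9,750円", "97.5%", "2015/12/20"]
line = ",".join(clean_cell(c) for c in cells)
print(line)  # 2015/12/20 18:58,10000,9750,97.5,2015/12/20
```

Removing the commas from the numbers before joining matters here, because the fields are written out separated by commas.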

Analyze with R

Preprocessing

Read the file with read.csv and assign a name to each column. The date and time columns are converted to the Date class.

uri <- "D:/workspace/amaoku_analyze/amaoku_transaction_data.csv"
dat <- read.csv(uri, header=T, fileEncoding="UTF-8", stringsAsFactors = F)
names(dat) <- c("biddate", "facevalue", "bidprice", "discount", "validdate")
dat$biddate2 <- as.Date(dat$biddate)
dat$validdate2 <- as.Date(dat$validdate)

For reference, the columns are:

- biddate: date and time of purchase
- facevalue: face value
- bidprice: purchase price
- discount: discount rate (the purchase price as a percentage of face value, e.g. 9750 / 10000 = 97.5)
- validdate: expiration date

When I checked for rows containing NA and the like, there were 170.

sum(!complete.cases(dat))  # 170

I'll remove them.

dat = dat[complete.cases(dat),]
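For readers who want to check the file on the Python side first, a rough counterpart of this filter might look like the sketch below. It is not the author's method: `parse_row` is a hypothetical helper, the sample text only imitates the layout of the scraped file, and treating an empty or unparsable field as "incomplete" is an assumption.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample in the same five-column layout as the scraped file.
sample = """2015/12/20 18:58,10000,9750,97.5,2015/12/20
2015/12/20 18:03,5000,4825,96.5,2016/12/20
2015/12/20 18:01,20000,19300,,2016/12/20
"""

def parse_row(row):
    """Return a typed record, or None when a field is missing or malformed."""
    if len(row) != 5 or any(field == "" for field in row):
        return None
    try:
        return {
            "biddate": datetime.strptime(row[0], "%Y/%m/%d %H:%M"),
            "facevalue": int(row[1]),
            "bidprice": int(row[2]),
            "discount": float(row[3]),
            "validdate": datetime.strptime(row[4], "%Y/%m/%d").date(),
        }
    except ValueError:
        return None

parsed = [parse_row(r) for r in csv.reader(io.StringIO(sample))]
records = [p for p in parsed if p is not None]
dropped = len(parsed) - len(records)
print(len(records), "rows kept,", dropped, "dropped")
```

In this sample the third row has an empty discount field, so it is the one that gets dropped.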

The data is 176899 rows and 7 columns.

> str(dat)

'data.frame':	176899 obs. of  7 variables:
 $ biddate   : chr  "2015/12/20 18:58" "2015/12/20 18:03" "2015/12/20 18:03" "2015/12/20 18:01" ...
 $ facevalue : int  10000 5000 5000 20000 3000 5000 5000 3000 10000 3000 ...
 $ bidprice  : int  9750 4825 4825 19300 2880 4800 4825 2895 9700 2895 ...
 $ discount  : num  97.5 96.5 96.5 96.5 96 96 96.5 96.5 97 96.5 ...
 $ validdate : chr  "2015/12/20" "2016/12/20" "2016/11/20" "2016/12/20" ...
 $ biddate2  : Date, format: "2015-12-20" "2015-12-20" "2015-12-20" ...
 $ validdate2: Date, format: "2015-12-20" "2016-12-20" "2016-11-20" ...

Does the discount rate increase as the face value increases?

One might expect that the higher the face value, the higher the discount rate. Is that actually the case?

require(ggplot2)
ggplot(dat, aes(facevalue, discount)) + geom_point() + labs(x="Face value [yen]", y="Discount rate [%]")

At first glance, it seems that there is no such tendency. Let's look at the slope of the regression line.

>summary(lm(discount ~ facevalue, data=dat))

Coefficients:
              Estimate Std. Error  t value Pr(>|t|)    
(Intercept)  9.401e+01  5.586e-03 16828.37   <2e-16 ***
facevalue   -1.812e-05  2.516e-07   -72.03   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The slope is -1.812e-05, and judging by the Pr(>|t|) value it is significant. In other words, there is a tendency for the price to go down by about 0.02 percentage points for every 1000 yen increase in face value. That is practically within the margin of error.
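The size of this effect can be checked with quick arithmetic. The sketch below only plugs the coefficients from the regression output into the fitted line; the two face values, 3000 and 20000 yen, are picked from the values visible in the `str(dat)` output, and `fitted_rate` is a hypothetical helper name.

```python
# Coefficients copied from the lm() summary.
intercept = 9.401e+01
slope = -1.812e-05  # change in rate [%] per yen of face value

def fitted_rate(facevalue):
    """Rate predicted by the regression line for a given face value."""
    return intercept + slope * facevalue

per_1000_yen = slope * 1000
gap = fitted_rate(20000) - fitted_rate(3000)
print(round(per_1000_yen, 4))  # -0.0181
print(round(gap, 3))           # -0.308
```

Even between a 3000 yen and a 20000 yen certificate, the fitted line differs by only about 0.3 points, which is small next to the spread visible in the scatter plot.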

**Conclusion: There is no relationship between face value and discount rate**

Is there a relationship between the validity period and the discount rate?

Generally speaking, the shorter the validity period, the lower the demand, so the discount rate is likely to be higher. Is that really so?

Calculate the validity period from the expiration date and purchase date and time, and plot it together with the discount rate.

dat$timediff <- as.numeric(difftime(dat$validdate2, dat$biddate2, units = "days")) / 365.24
ggplot(dat, aes(timediff, discount)) + geom_point() +
    labs(x="Valid period [year]", y="Discount [%]")
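The validity-period calculation itself is simple date arithmetic. As a minimal Python sketch of the same formula (the function name `validity_years` is hypothetical; the dates come from the second row of the `str(dat)` output):

```python
from datetime import date

def validity_years(biddate, validdate):
    """Validity period in years, using the same 365.24-day year as the R code."""
    return (validdate - biddate).days / 365.24

# Bid on 2015-12-20, valid until 2016-12-20.
years = validity_years(date(2015, 12, 20), date(2016, 12, 20))
print(round(years, 3))  # 1.002
```

The result is slightly above 1 because the interval contains 29 February 2016 and is therefore 366 days long.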

There seems to be no particular tendency here either. The slope of the regression line, computed in the same way as before, was -0.099743 (p < 2e-16).

The discount rate appears to dip at a validity period of 1 year, but that is probably because the number of samples there is large, so the tails of the distribution spread wider. The histogram of validity periods is plotted at the end of this section.

**Conclusion: There is no relationship between the validity period and the discount rate**

ggplot(dat, aes(timediff)) + geom_histogram(binwidth = 1/12) + xlim(-1, 5) +
    labs(x="Valid period [year]", y="Frequency")

How is the discount rate changing throughout the year?

How does the discount rate change when viewed throughout the year? Is there a cheap season?

ggplot(dat, aes(biddate2, discount)) + geom_point(size=1) +
    ylim(75, 100) + labs(x="Date", y="Discount [%]")

The numbers on the horizontal axis are the months of 2015. The rate meanders over the year. Since the data acquired this time covers only the past year, seasonal fluctuations cannot be determined, but compared with the year as a whole, prices currently look high. Judging from the graph, 92.5-95% looks like the usual market rate.

**Conclusion: Currently expensive. Wait until it reaches 92.5-95% before buying.**

Is there a relationship between the day of the week and the discount rate?

Next, check the day of the week. Since the site has more users on Saturdays and Sundays, one would expect those days to favor sellers and the discount rate to be worse.

weekdays_names=c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday")
dat$weekdays <- factor(weekdays(dat$biddate2), levels= weekdays_names)
wddf <- aggregate(discount ~ weekdays, dat, mean)
ggplot(data=dat, aes(weekdays, discount)) + geom_boxplot() + 
    ylim(75,110) + labs(x="Day of the week", y="Discount rate [%]")
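Note that R's `weekdays()` returns day names in the session's locale, so the `levels` above only match in an English locale. As a locale-independent illustration of the same per-weekday mean, here is a small Python sketch; the three rows are hypothetical and only their timestamp format follows the data.

```python
from collections import defaultdict
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical (timestamp, rate) pairs for illustration only.
rows = [
    ("2015/12/18 21:10", 96.0),  # a Friday
    ("2015/12/20 18:58", 97.5),  # a Sunday
    ("2015/12/20 18:03", 96.5),  # a Sunday
]

rates_by_day = defaultdict(list)
for stamp, rate in rows:
    when = datetime.strptime(stamp, "%Y/%m/%d %H:%M")
    # datetime.weekday() returns 0 for Monday through 6 for Sunday.
    rates_by_day[WEEKDAYS[when.weekday()]].append(rate)

mean_by_day = {day: sum(v) / len(v) for day, v in rates_by_day.items()}
print(mean_by_day)  # {'Friday': 96.0, 'Sunday': 97.0}
```

Indexing a fixed list with `weekday()` avoids depending on the locale that `strftime("%A")` would use.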

**Conclusion: The discount rate does not change on any day of the week**

Is there a relationship between the date and the discount rate?

Since the 25th is payday, buyers have more money to spend and listings may sell even on slightly worse terms, so the discount rate may worsen.

dat$days <- factor(format(as.POSIXct(dat$biddate), format="%d"))
ggplot(dat, aes(days, discount)) + geom_boxplot() + 
    ylim(75,100) + labs(x="Day of a month", y="Discount rate [%]")

**Conclusion: The discount rate does not change with the day of the month**

Does the discount rate change depending on the time of day?

Could it be that the number of users drops late at night, in the early morning, or during the daytime, and the discount rate improves? Let's check.

dat$hours <- factor(format(as.POSIXct(dat$biddate), format="%H"))
ggplot(dat, aes(hours, discount)) + geom_boxplot() +
    ylim(75,100) + labs(x="Hour of a day", y="Discount rate [%]")

From 23:00 to 8:00 the next morning, the price is high. It's not a big difference, but if you're looking for a low price, it's best to buy during the day.
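To compare the two time windows numerically rather than by eye, the bids could be split at those hours. This is only a sketch: `is_night` is a hypothetical helper, the rows are made-up values, and counting 8:00 itself as daytime is an assumption.

```python
from datetime import datetime

def is_night(stamp):
    """True for bids from 23:00 up to (not including) 08:00."""
    hour = datetime.strptime(stamp, "%Y/%m/%d %H:%M").hour
    return hour >= 23 or hour < 8

# Hypothetical (timestamp, rate) pairs for illustration only.
rows = [
    ("2015/12/20 18:58", 97.5),
    ("2015/12/20 23:30", 98.0),
    ("2015/12/21 07:15", 98.5),
]

night = [rate for stamp, rate in rows if is_night(stamp)]
day = [rate for stamp, rate in rows if not is_night(stamp)]
print(sum(night) / len(night), sum(day) / len(day))  # 98.25 97.5
```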

**Conclusion: Daytime is slightly cheaper than other times**

Finally

What did you think? I hope this information helps you buy Amazon gift certificates at a low price.
