I tried verifying the results of an A/B test with a chi-square test

In digital marketing, the PDCA cycle is run on various measures every day. In that process, simply comparing the size of the numbers may not be enough to verify the effect of a measure accurately. To explore how statistical knowledge can lead to better analysis and improvement proposals, I worked through a case study using statistical hypothesis testing.

The topic this time is to verify whether the results of a website A/B test differ significantly, assuming a significance level of 0.05. I used the chi-square test of independence, which is often used to evaluate A/B tests, and took the data from Kaggle's "audacity ab testing" dataset. https://www.kaggle.com/samtyagi/audacity-ab-testing

First, import the libraries and load the data.

import numpy as np
import pandas as pd
import scipy.stats

# Load the event log downloaded from Kaggle
df=pd.read_csv("homepage_actions.csv")
# Show the first five rows
df.head()

[Image: Capture.PNG, output of df.head()]

The columns are as follows.
timestamp: access date and time
id: user ID
group: control for the control group, experiment for the test group
action: click if the user clicked, view if the user only viewed the page

Next, aggregate the data, starting with the total number of rows in each group.

group=df.groupby('group').count()
group

[Image: Capture4.PNG, row counts per group]

Next, use a pivot table to count the clicks and views in each group.

pd.pivot_table(df,index='group',columns='action',values='id',aggfunc='count')

[Image: Capture3.PNG, pivot table of action counts per group]
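As an alternative sketch, the same kind of group-by-action count table can be built with pd.crosstab. The rows below are made-up toy data in the same column layout as homepage_actions.csv, not values from the real file, and the name toy is only for illustration.

```python
import pandas as pd

# Hypothetical toy rows shaped like homepage_actions.csv
# (ids and actions are invented for illustration only).
toy = pd.DataFrame({
    "id": [1, 1, 2, 3, 3, 4],
    "group": ["control", "control", "control",
              "experiment", "experiment", "experiment"],
    "action": ["view", "click", "view", "view", "click", "view"],
})

# Rows are groups, columns are actions, cells are row counts.
table = pd.crosstab(toy["group"], toy["action"])
print(table)
```

A table built this way can be converted with table.to_numpy() and passed to a test function, which avoids typing the counts by hand.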

The click rate of each group, calculated as the number of click rows divided by all rows in the group, is:
Control group: 932 ÷ 4264 ≈ 0.2186
Test group: 928 ÷ 3924 ≈ 0.2365
The click rate itself is higher in the test group.
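The click rate arithmetic can be reproduced in a few lines. This is a minimal sketch that hard-codes the counts quoted in the article; the names counts, rates, and diff are only for illustration.

```python
# Counts quoted in the article: (click rows, total rows) per group.
counts = {"control": (932, 4264), "experiment": (928, 3924)}

# Click rate = click rows / all rows in the group.
rates = {name: clicks / total for name, (clicks, total) in counts.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.4f}")

# Absolute difference in click rate between the two groups.
diff = rates["experiment"] - rates["control"]
print(f"difference: {diff:.4f}")
```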

Can we say that the test group's click rate is significantly higher? Let's check with the chi-square test. In Python, use the chi2_contingency function in scipy.stats.

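# Contingency table: rows = control, experiment; columns = click, view
data=np.array([[932,3332],[928,2996]])
# correction=False turns off Yates' continuity correction
chi2,p,dof,expected=scipy.stats.chi2_contingency(data,correction=False)

print("Chi-square value:", chi2)
print("p-value:", p)
print("Degrees of freedom:", dof)
print("Expected frequencies:", expected)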

[Image: Capture1.PNG, output of the chi-square test]
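To make it clearer what the test computes, the statistic can be reproduced by hand with NumPy. This is a sketch of Pearson's chi-square formula without continuity correction, not SciPy's actual implementation, and it assumes the same 2x2 table as the article. The variable names are only for illustration.

```python
import numpy as np

# Observed counts as used in the article: rows = control, experiment;
# columns = click rows, view rows.
observed = np.array([[932, 3332], [928, 2996]], dtype=float)

row_totals = observed.sum(axis=1, keepdims=True)  # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 2)
total = observed.sum()

# Expected count in each cell if group and action were independent.
expected = row_totals @ col_totals / total

# Pearson's statistic: sum of (observed - expected)^2 / expected.
chi2_manual = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
print("Chi-square value:", chi2_manual)
print("Degrees of freedom:", dof)
```

The expected frequencies are the counts each cell would have if both groups shared the overall click rate, so the statistic grows as the observed table moves away from that case.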

Looking at the output, the p-value is greater than 0.05. The null hypothesis, that the click rate does not differ between the two groups, therefore cannot be rejected at the 5% significance level. In other words, the data do not show that the test group has a significantly higher click rate. Simply comparing the raw numbers is not enough to judge the effect.
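The decision rule can also be written out explicitly. The sketch below assumes the article's significance level of 0.05 and its 2x2 table, and shows that comparing the p-value with alpha gives the same decision as comparing the statistic with the critical value of the chi-square distribution. The names alpha, critical, reject_by_p, and reject_by_stat are only for illustration.

```python
from scipy.stats import chi2, chi2_contingency

alpha = 0.05  # significance level assumed in the article
observed = [[932, 3332], [928, 2996]]

stat, p_value, dof, _ = chi2_contingency(observed, correction=False)

# Critical value: the statistic must exceed this to reject at level alpha.
critical = chi2.ppf(1 - alpha, dof)

# The two decision rules are equivalent.
reject_by_p = p_value < alpha
reject_by_stat = stat > critical

print(f"statistic={stat:.3f}, critical={critical:.3f}, p={p_value:.4f}")
print("reject null hypothesis:", bool(reject_by_p))
```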

Click rate alone may not always be enough to judge whether a measure is good or bad, but this kind of analysis can provide evidence for choosing more effective measures.
