Thanks for reading. This is pbird yellow.
This time, I estimated the number of AA appearances from 1 million hands of data. Specifically, we estimate how often AA is likely to appear in an unknown 2000 hands that you will play in the future. The estimation procedure is as follows.
① Aggregate the hands and create a histogram
② Test whether the aggregated data is normally distributed
③ Compute the mean and the standard deviation
④ Estimate the number of AA occurrences with 95% confidence
The following book explains the contents above, and the technical terms that appear later, in a very easy-to-understand way, so I will post it here.
・"Complete Self-study: Introduction to Statistics Kindle Edition" Hiroyuki Kojima (Author) https://amzn.to/3mSPpqf
Below is a histogram of the results. In the histograms in this article, the vertical axis shows the aggregated frequency and the horizontal axis shows the number of times AA appeared in 2000 hands.
■ What is a histogram?
A histogram is simply a "graph of aggregated data". Taking the figure above as an example:
・AA appeared 8 times per 2000 hands (horizontal axis) in 72 of the blocks taken from the 1 million hands (vertical axis).
・AA appeared 2 times per 2000 hands (horizontal axis) in only 1 block (vertical axis).
And so on. A minimal sketch of how to draw one follows.
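For illustration, here is a minimal matplotlib sketch of such a histogram. The counts below are made-up example values, not the article's actual data:

import matplotlib.pyplot as plt

# hypothetical counts of AA per 2000-hand block (illustration only)
countdata = [8, 2, 9, 11, 7, 10, 6, 9, 12, 8]
plt.hist(countdata, bins=range(0, 16), color='tab:cyan', rwidth=0.9)
plt.xlabel('times AA appeared per 2000 hands')
plt.ylabel('Frequency')
plt.show()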
■ What is the SW test?
There are various tests for normality, but this time we will use the SW (Shapiro-Wilk) test. The SW test verifies whether the original data group (= population) behind the aggregated data is normally distributed. If normality holds, various laws become available, and those laws make it possible to estimate the number of AA occurrences.
The SW test uses the p-value to judge normality. The p-value expresses the probability that, assuming the population is normal, randomly drawn data would be distributed like the aggregated data. In general, if that probability is less than 5%, it is judged too low, and we conclude that the population is not normal in the first place. In the figure above, the p-value is 0.08 > 0.05 (= 5%), so the data can just barely be said to be normal.
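In Python the SW test is a one-liner via scipy. A minimal sketch, using synthetic normal data as a stand-in for the per-block counts (the mean and deviation here are chosen only for illustration):

import numpy as np
import scipy.stats as st

rng = np.random.default_rng(0)
sample = rng.normal(loc=9.3, scale=3.0, size=500)  # synthetic stand-in data
stat, p = st.shapiro(sample)
print(f'W = {stat:.3f}, p = {p:.3f}')
if p >= 0.05:
    print('cannot reject normality at the 5% level')
else:
    print('normality is rejected at the 5% level')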
■ Mean and standard deviation
In the figure above:
・Average → mean (μ)
・Deviation → standard deviation (σ)
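These come straight off the aggregated data with pandas. A quick sketch (countdata is again a hypothetical per-block count list; ddof=0 matches the source code below):

import pandas as pd

countdata = [8, 2, 9, 11, 7, 10, 6, 9, 12, 8]  # hypothetical per-block AA counts
s = pd.Series(countdata)
mu = s.mean()        # mean (μ)
sig = s.std(ddof=0)  # standard deviation (σ)
print(f'μ = {mu:.2f}, σ = {sig:.2f}')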
■ Estimating with 95% confidence
Let μ be the mean of the aggregated data and σ its standard deviation. Then, with 95% probability, the number of AA occurrences x falls within "μ - 1.96σ ≤ x ≤ μ + 1.96σ".
So: **"If you play 2000 hands, AA will appear between 3.37 and 15.29 times with 95% probability."**
Since the estimate is based on only 1 million hands of data, it contains some error. As the data grows and the sample mean and standard deviation approach the population mean and population standard deviation, the estimate will become more accurate.
By the way, the KK case looks like this. Since the p-value = 0.03 < 0.05, the null hypothesis is rejected and the population cannot be said to be normal. In that case the aggregated data cannot be treated as normal either, so the 95% estimate cannot be calculated.
However, if the number of hands increases, the p-value should rise and the data should turn out to be normal.
QQ, on the other hand, does pass the normality test: **"If you play 2000 hands, QQ will appear between 3.40 and 14.54 times with 95% probability."**
By the way, some of you may be asking: why isn't KK normal? What's the difference from the AA and QQ histograms?!
...I agree with you!!!
The problem is that the p-value lands right around the boundary. It could be solved by switching from blocks of 2000 hands to blocks of 1000 hands, but that creates another problem: the values on the horizontal axis can only be 0 or more, so the distribution gets cut off at zero and a proper analysis becomes impossible...
The point is that 1 million hands is too few lol
Still, it is really convenient that Python can do such complicated calculations in an instant. The source code is below, so please make use of it!!
To be honest, I don't understand the inner workings of the SW test very well. In Python you can run it with just one line, so even though I know what kind of number comes out, the calculation process is hard to follow. If you know of any books that explain it with concrete examples, I would be grateful if you could let me know!
Below is the source code. I am a complete beginner at programming, so if you have any suggestions for writing better code, please let me know!!
pokermain.py
from holdcards import Holdcards
from plotgraph import Plotgraph
import os
import glob
import re

path = 'Write the path here'
hand = "AA"   # the hand you want to look up
count = 2000  # block size: number of hands per sample

num = lambda val: int(re.sub(r"\D", "", val))  # numeric sort key for file names
filelist = sorted(glob.glob(os.path.join(path, "*.txt"), recursive=True), key=num)
totcards = []
graphdata = []
countdata = []
counthands = []

# collect the hole cards from every hand-history file
for item in filelist:
    print(item)
    with open(item) as f:
        data = f.readlines()
        card = Holdcards()
        h_cards = card.find_holdcards(data)
        totcards += h_cards

# split the collected hands into full blocks of `count` hands
i = 0
while len(totcards[count*i:count*(i+1)]) == count:
    graphdata.append(totcards[count*i:count*(i+1)])
    i += 1

# count how often the target hand appears in each block
for item in graphdata:
    countdata.append(item.count(hand))

graph = Plotgraph()
graph.writehist(countdata, hand, count, len(graphdata)*count)  # histogram + SW test + estimate
holdcards.py
class Holdcards:
    def __init__(self):
        self.trump = {"A":"14","K":"13","Q":"12","J":"11","T":"10","9":"9","8":"8","7":"7","6":"6","5":"5","4":"4","3":"3","2":"2"}
        self.r_trump = {"14":"A","13":"K","12":"Q","11":"J","10":"T","9":"9","8":"8","7":"7","6":"6","5":"5","4":"4","3":"3","2":"2"}
        self.hands = 0
        self.tothands = 0
        self.handlist = []

    def find_holdcards(self, data):
        holdcards = []
        for item in data:
            if 'Dealt to' in item:
                item = item[-7:-2]  # e.g. "Ah Kd" from a line like "Dealt to Hero [Ah Kd]"
                if item[1] == item[4]:  # same suit -> suited
                    if int(self.trump.get(item[0])) > int(self.trump.get(item[3])):
                        item = item[0] + item[3] + 's'
                    else:
                        item = item[3] + item[0] + 's'
                else:
                    if int(self.trump.get(item[0])) > int(self.trump.get(item[3])):
                        item = item[0] + item[3] + 'o'
                    elif item[0] == item[3]:  # same rank -> pocket pair
                        item = item[0] + item[3]
                    else:
                        item = item[3] + item[0] + 'o'
                holdcards.append(item)
        return holdcards
plotgraph.py
import numpy as np
import pandas as pd
import scipy.stats as st
import math
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import matplotlib.transforms as ts

class Plotgraph:
    def __init__(self):
        pass

    # countdata: per-block counts, hand: hand label, count: block size, tothands: total hands
    def writehist(self, countdata, hand, count, tothands):
        df = pd.DataFrame({'p1': countdata})
        target = 'p1'  # column of the data frame to plot
        # (1) statistics
        mu = round(df[target].mean(), 2)        # mean
        sig = round(df[target].std(ddof=0), 2)  # standard deviation (ddof = degrees of freedom = 0)
        print(f'■ mean: {df[target].mean():.2f}, standard deviation: {df[target].std(ddof=0):.2f}')
        ci1, ci2 = (None, None)
        # drawing parameters
        x_min = round(mu - 3*sig)
        x_max = round(mu + 3*sig)  # plotting range (lower and upper limits)
        j = 10  # y-axis (frequency) step size
        k = 1   # class (bin) width
        bins = int((x_max - x_min)/k)  # number of bins: (x_max-x_min)/k, e.g. (100-40)/5 -> 12
        d = 0.001
        # drawing starts here
        plt.figure(dpi=96)
        plt.xlim(x_min, x_max)
        hist_data = plt.hist(df[target], bins=bins, color='tab:cyan', range=(x_min, x_max), rwidth=0.9)
        n = len(hist_data[0])  # number of bins (used below as the sample size)
        plt.title("hand = " + hand + " , totalhands = " + str(tothands))
        # (2) histogram
        plt.gca().set_xticks(np.arange(x_min, x_max - k + d, k))
        # normality test at the 5% significance level (applied to the histogram's bin frequencies)
        _, p = st.shapiro(hist_data[0])
        print(hist_data[0])
        print(st.shapiro(hist_data[0]))
        if p >= 0.05:
            print(f' - p={p:.2f} ( p>=0.05 ), so the population can be said to be normal')
            U2 = df[target].var(ddof=1)  # population variance estimate (unbiased variance)
            print(U2)
            DF = n - 1  # degrees of freedom
            SE = math.sqrt(U2/n)  # standard error
            print(SE)
            # 95% confidence interval for the mean (confidence passed positionally for scipy compatibility)
            ci1, ci2 = st.t.interval(0.95, DF, loc=mu, scale=SE)
        else:
            print(f' ※ p={p:.2f} ( p<0.05 ), so the population cannot be said to be normal')
        # (3) fitted curve assuming a normal distribution
        sig = df[target].std(ddof=1)  # unbiased standard deviation (ddof=1)
        nx = np.linspace(x_min, x_max + d, 150)  # 150 divisions
        ny = st.norm.pdf(nx, mu, sig) * k * len(df[target])
        plt.plot(nx, ny, color='tab:blue', linewidth=1.5, linestyle='--')
        # (4) x-axis ticks / label
        plt.xlabel('total "' + str(hand) + '" / ' + str(count) + ' hands', fontsize=12)
        plt.gca().set_xticks(np.arange(x_min, x_max + d, k))
        # (5) y-axis ticks / label
        y_max = max(hist_data[0].max(), st.norm.pdf(mu, mu, sig) * k * len(df[target]))
        y_max = int(((y_max//j) + 1) * j)  # smallest multiple of j above the maximum frequency
        plt.ylim(0, y_max)
        plt.gca().set_yticks(range(0, y_max + 1, j))
        plt.ylabel('Frequency', fontsize=12)
        # (6) text output of mean, standard deviation, and p-value
        tx = 0.03  # text position adjustment
        ty = 0.91
        tt = 0.08
        tp = dict(horizontalalignment='left', verticalalignment='bottom',
                  transform=plt.gca().transAxes, fontsize=11)
        plt.text(tx, ty, f'average {mu:.2f}', **tp)
        plt.text(tx, ty - tt, f'deviation {sig:.2f}', **tp)
        plt.text(tx, ty - tt - tt, f'P-value {p:.2f}', **tp)
        plt.vlines(mu, 0, y_max, color='black', linewidth=1)
        plt.show()