On June 18, 2014, Dodgers pitcher Clayton Kershaw pitched all nine innings of the Colorado Rockies vs. Los Angeles Dodgers game, striking out 15 and throwing a no-hitter. In this post, we compare him with the opposing Rockies pitchers and analyze why Clayton Kershaw was able to achieve a no-hit, no-run game.
・ Python 3.7.5 ・ Windows10 ・ Jupyter Notebook (Anaconda3)
$ jupyter notebook
baseball_analysis.ipynb
%matplotlib inline
import requests
import xml.etree.ElementTree as ET
import os
import pandas as pd
baseball_analysis.ipynb
# Create the data frame that will hold one row per pitch
pitchDF = pd.DataFrame(columns = ['pitchIdx', 'inning', 'frame', 'ab', 'abIdx', 'batter', 'stand', 'speed',
                                  'pitchtype', 'px', 'pz', 'szTop', 'szBottom', 'des'], dtype=object)
# Dictionary that maps pitch type codes to readable names
pitchDictionary = { "FA":"fastball", "FF":"4-seam fb", "FT": "2-seam fb", "FC": "fb-cutter", "":"unknown", None: "none",
                    "FS":"fb-splitter", "SL":"slider", "CH":"changeup","CU":"curveball","KC":"knuckle-curve",
                    "KN":"knuckleball","EP":"eephus", "UN":"unidentified", "PO":"pitchout", "SI":"sinker", "SF":"split-finger"
                  }
# top = top half of the inning, bottom = bottom half of the inning
frames = ["top", "bottom"]
baseball_analysis.ipynb
#Read player information distributed by MLB Advanced Media
url = 'https://gd2.mlb.com/components/game/mlb/year_2014/month_06/day_18/gid_2014_06_18_colmlb_lanmlb_1/players.xml'
resp = requests.get(url)
xmlfile = "myplayers.xml"
with open(xmlfile, mode='wb') as f:
    f.write(resp.content)
statinfo = os.stat(xmlfile)
#Parse xml file
tree = ET.parse(xmlfile)
game = tree.getroot()
teams = game.findall("./team")
playerDict = {}
for team in teams:
    players = team.findall("./player")
    for player in players:
        # Add player ID and player name to the dictionary
        playerDict[ player.attrib.get("id") ] = player.attrib.get("first") + " " + player.attrib.get("last")
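To make the parsing step concrete, here is a minimal sketch that builds the same kind of ID-to-name dictionary from an inline XML string instead of the downloaded file. The sample XML, the player names, and the helper name `build_player_dict` are assumptions for illustration; only the `team`/`player` elements and the `id`, `first`, `last` attributes are taken from the cell above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample that mimics the structure used above (not real feed data)
SAMPLE_XML = """
<game>
  <team type="away">
    <player id="1001" first="Taro" last="Yamada"/>
  </team>
  <team type="home">
    <player id="2001" first="Jiro" last="Suzuki"/>
    <player id="2002" first="Saburo" last="Sato"/>
  </team>
</game>
"""

def build_player_dict(xml_text):
    """Return {player id: "First Last"} for every player under every team."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for team in root.findall("./team"):
        for player in team.findall("./player"):
            first = player.attrib.get("first", "")
            last = player.attrib.get("last", "")
            result[player.attrib.get("id")] = (first + " " + last).strip()
    return result

players = build_player_dict(SAMPLE_XML)
print(players["2001"])
```

Using empty-string defaults for `first` and `last` avoids a `TypeError` if a player entry happens to lack one of the attributes.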
baseball_analysis.ipynb
#Read the data for each inning distributed by MLB Advanced Media
url = 'https://gd2.mlb.com/components/game/mlb/year_2014/month_06/day_18/gid_2014_06_18_colmlb_lanmlb_1/inning/inning_all.xml'
resp = requests.get(url)
xmlfile = "mygame.xml"
with open(xmlfile, 'wb') as f:
    f.write(resp.content)
statinfo = os.stat(xmlfile)
#Parse xml file
tree = ET.parse(xmlfile)
root = tree.getroot()
innings = root.findall("./inning")
totalPitchCount = 0
topPitchCount = 0
bottomPitchCount = 0
for inning in innings:
    for i in range(len(frames)):
        fr = inning.find(frames[i])
        if fr is not None:
            for ab in fr.iter('atbat'):
                battername = playerDict[ab.get('batter')]
                standside = ab.get('stand')
                abIdx = ab.get('num')
                abPitchCount = 0
                pitches = ab.findall("pitch")
                for pitch in pitches:
                    # A pitch without a start_speed attribute is stored with speed 0.0
                    if pitch.attrib.get("start_speed") is None:
                        speed = 0.0
                    else:
                        speed = float(pitch.attrib.get("start_speed"))
                    # Missing locations become 0.0; otherwise round to two decimals
                    pxFloat = 0.0 if pitch.attrib.get("px") is None else float('{0:.2f}'.format(float(pitch.attrib.get("px"))))
                    pzFloat = 0.0 if pitch.attrib.get("pz") is None else float('{0:.2f}'.format(float(pitch.attrib.get("pz"))))
                    szTop = 0.0 if pitch.attrib.get("sz_top") is None else float('{0:.2f}'.format(float(pitch.attrib.get("sz_top"))))
                    szBot = 0.0 if pitch.attrib.get("sz_bot") is None else float('{0:.2f}'.format(float(pitch.attrib.get("sz_bot"))))
                    abPitchCount = abPitchCount + 1
                    totalPitchCount = totalPitchCount + 1
                    if frames[i]=='top':
                        topPitchCount = topPitchCount + 1
                    else:
                        bottomPitchCount = bottomPitchCount + 1
                    inn = inning.attrib.get("num")
                    verbosePitch = pitchDictionary[pitch.get("pitch_type")]
                    desPitch = pitch.get("des")
                    # Add to data frame
                    pitchDF.loc[totalPitchCount] = [float(totalPitchCount), inn, frames[i], abIdx, abPitchCount, battername, standside, speed,
                                                    verbosePitch, pxFloat, pzFloat, szTop, szBot, desPitch]
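The four location fields above repeat the same pattern: a missing attribute becomes 0.0, otherwise the value is rounded to two decimals. As a sketch, that pattern could be pulled into a small helper. The name `attr_float` and the sample `pitch` element are hypothetical and not part of the original notebook.

```python
import xml.etree.ElementTree as ET

def attr_float(element, name, default=0.0, digits=2):
    """Read an XML attribute as a float rounded to `digits`; use `default` if absent."""
    raw = element.attrib.get(name)
    if raw is None:
        return default
    return round(float(raw), digits)

# Hypothetical pitch element with one attribute (sz_bot) missing
pitch = ET.fromstring('<pitch px="-0.4567" pz="2.3449" sz_top="3.41"/>')
print(attr_float(pitch, "px"), attr_float(pitch, "pz"), attr_float(pitch, "sz_bot"))
```

With such a helper, each of the four assignments would shrink to a single short call, for example `pxFloat = attr_float(pitch, "px")`.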
baseball_analysis.ipynb
pitchDF
# pitchIdx = serial number of the pitch
# inning = inning
# frame = top or bottom of the inning
# ab = at-bat number
# abIdx = pitch number within the at-bat
# batter = batter name
# stand = batting side (R → right-handed, L → left-handed)
# speed = pitch speed
# pitchtype = pitch type
# px = horizontal location when crossing home plate (right → positive, left → negative)
# pz = vertical location (height) when crossing home plate
# szTop = distance from the ground to the top of the batter's strike zone
# szBottom = distance from the ground to the bottom of the batter's strike zone
# des = result of the pitch
baseball_analysis.ipynb
import matplotlib.pyplot as plt
import matplotlib.patches as patches
#Draw a new window
fig1 = plt.figure()
#Add subplot
ax1 = fig1.add_subplot(111, aspect='equal')
# The strike zone is 17 inches wide (about 1.4 feet)
# The strike zone height is taken as 1.5 to 3.5 feet
# A baseball is about 3 inches in diameter (0.25 feet)
# Conversion: feet = inches / 12
# Strike zone
# The blue frame is the strike zone
platewidthInFeet = 17 / 12
szHeightInFeet = 3.5 - 1.5
# Expanded strike zone: half a ball wider on every side
# The light blue frame is the expanded strike zone
expandedPlateInFeet = 20 / 12
ballInFeet = 3 / 12
halfBallInFeet = ballInFeet / 2
ax1.add_patch(patches.Rectangle((expandedPlateInFeet/-2, 1.5 - halfBallInFeet), expandedPlateInFeet, szHeightInFeet + ballInFeet, color='lightblue'))
ax1.add_patch(patches.Rectangle((platewidthInFeet/-2, 1.5), platewidthInFeet, szHeightInFeet))
plt.ylim(0, 5)
plt.xlim(-2, 2)
plt.show()
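The same dimensions can be used to classify a pitch location numerically rather than by eye. The sketch below is one possible check, assuming `px` and `pz` are given in feet as in the data frame above. The function name `in_zone` and the `margin` parameter are illustrative and do not appear in the original notebook.

```python
PLATE_WIDTH_FT = 17 / 12   # home plate width in feet
BALL_FT = 3 / 12           # approximate ball diameter in feet, as used above

def in_zone(px, pz, sz_top=3.5, sz_bot=1.5, margin=0.0):
    """True if (px, pz) lies inside the zone widened by `margin` feet on every side."""
    half_width = PLATE_WIDTH_FT / 2 + margin
    return abs(px) <= half_width and (sz_bot - margin) <= pz <= (sz_top + margin)

# A pitch just off the edge of the plate: checked against the strict zone,
# then against the zone expanded by half a ball (the light blue frame)
print(in_zone(0.75, 2.5))
print(in_zone(0.75, 2.5, margin=BALL_FT / 2))
```

Passing each batter's own `szTop` and `szBottom` as `sz_top` and `sz_bot` would adapt the check to the individual strike zone instead of the fixed 1.5 to 3.5 feet.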
baseball_analysis.ipynb
uniqDesList = pitchDF.des.unique()
ballColList = []
strikeColList = []
ballCount = 0
strikeCount = 0
for index, row in pitchDF.iterrows():
    des = row['des']
    # The first pitch of an at-bat resets the count
    if row['abIdx'] == 1:
        ballCount = 0
        strikeCount = 0
    # Record the count as it stood before this pitch
    ballColList.append(ballCount)
    strikeColList.append(strikeCount)
    if 'Ball' in des:
        ballCount = ballCount + 1
    elif 'Foul' in des:
        # A foul with two strikes does not add a third strike
        if strikeCount != 2:
            strikeCount = strikeCount + 1
    elif 'Strike' in des:
        strikeCount = strikeCount + 1
#Add to data frame
pitchDF['ballCount'] = ballColList
pitchDF['strikeCount'] = strikeColList
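The count logic is easier to verify on a single at-bat. The following sketch applies the same rules to a plain list of result strings. The function name and the sample at-bat are made up for illustration; the result strings only imitate the style of the `des` column.

```python
def counts_before_each_pitch(descriptions):
    """For one at-bat, return the (balls, strikes) count before each pitch."""
    balls, strikes = 0, 0
    counts = []
    for des in descriptions:
        counts.append((balls, strikes))
        if 'Ball' in des:
            balls += 1
        elif 'Foul' in des:
            # A foul never produces the third strike
            if strikes < 2:
                strikes += 1
        elif 'Strike' in des:
            strikes += 1
    return counts

# Hypothetical at-bat: ball, called strike, two fouls, swinging strike
at_bat = ['Ball', 'Called Strike', 'Foul', 'Foul', 'Swinging Strike']
print(counts_before_each_pitch(at_bat))
```

The second foul leaves the count at 1-2, which is why the last two pitches are thrown at the same count.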
baseball_analysis.ipynb
pitchDF
baseball_analysis.ipynb
df= pitchDF.loc[pitchDF['frame']=='top']
ax1 = df.plot(kind='scatter', x='px', y='pz', marker='o', color='red', figsize=[8,8], ylim=[0,4], xlim=[-2,2])
ax1.set_xlabel('horizontal location')
ax1.set_ylabel('vertical location')
ax1.set_title("Clayton Kershaw's pitching tendency")
ax1.set_aspect(aspect=1)
platewidthInFeet = 17 / 12
expandedPlateInFeet = 20 / 12
szTop = df["szTop"].iloc[0]
szBottom = df["szBottom"].iloc[0]
szHeightInFeet = szTop - szBottom
ballInFeet = 3 / 12
halfBallInFeet = ballInFeet / 2
outrect = ax1.add_patch(patches.Rectangle((expandedPlateInFeet/-2, szBottom - halfBallInFeet), expandedPlateInFeet, szHeightInFeet + ballInFeet, color='lightblue'))
rect = ax1.add_patch(patches.Rectangle((platewidthInFeet/-2, szBottom), platewidthInFeet, szHeightInFeet))
outrect.zorder=-2
rect.zorder=-1
plt.ylim(0, 5)
plt.xlim(-2.5, 2.5)
plt.show()
baseball_analysis.ipynb
df= pitchDF.loc[pitchDF['frame']=='bottom']
ax1 = df.plot(kind='scatter', x='px', y='pz', marker='o', color='red', figsize=[8,8], ylim=[0,4], xlim=[-2,2])
ax1.set_xlabel('horizontal location')
ax1.set_ylabel('vertical location')
ax1.set_title('Rockies pitching tendency')
ax1.set_aspect(aspect=1)
platewidthInFeet = 17 / 12
expandedPlateInFeet = 20 / 12
szTop = df["szTop"].iloc[0]
szBottom = df["szBottom"].iloc[0]
szHeightInFeet = szTop - szBottom
ballInFeet = 3 / 12
halfBallInFeet = ballInFeet / 2
outrect = ax1.add_patch(patches.Rectangle((expandedPlateInFeet/-2, szBottom - halfBallInFeet), expandedPlateInFeet, szHeightInFeet + ballInFeet, color='lightblue'))
rect = ax1.add_patch(patches.Rectangle((platewidthInFeet/-2, szBottom), platewidthInFeet, szHeightInFeet))
outrect.zorder=-2
rect.zorder=-1
plt.ylim(0, 5)
plt.xlim(-2.5, 2.5)
plt.show()
Comparing the two sides, I found the following: **Kershaw strike rate: 65%** **Rockies strike rate: 56%**. Kershaw seems to miss horizontally less often than the Rockies pitchers. The large vertical spread may be the effect of his slider or of a four-seam fastball that appears to rise.
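The cells above do not show how a strike rate can be computed from the data frame. One possible definition, assumed here, treats every pitch whose `des` text does not contain "Ball" as a strike, so fouls and balls put in play count as strikes. The sample rows and the function name `strike_rate` are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows with the same 'frame' and 'des' columns as pitchDF
sample = pd.DataFrame({
    'frame': ['top', 'top', 'top', 'top', 'bottom', 'bottom', 'bottom', 'bottom'],
    'des': ['Called Strike', 'Ball', 'Foul', 'In play, out(s)',
            'Ball', 'Ball In Dirt', 'Swinging Strike', 'In play, no out'],
})

def strike_rate(df):
    """Share of pitches per frame whose description does not contain 'Ball'."""
    is_strike = ~df['des'].str.contains('Ball')
    return is_strike.groupby(df['frame']).mean()

rates = strike_rate(sample)
print(rates)
```

Other definitions are possible, for example excluding hit-by-pitch results, so the percentages depend on the rule that is chosen.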
Next, let's look at the first-pitch tendencies.
baseball_analysis.ipynb
df= pitchDF.loc[pitchDF['frame']=='top'].loc[pitchDF['abIdx']==1]
ax1 = df.plot(kind='scatter', x='px', y='pz', marker='o', color='red', figsize=[8,8], ylim=[0,4], xlim=[-2,2])
ax1.set_xlabel('horizontal location')
ax1.set_ylabel('vertical location')
ax1.set_title("Clayton Kershaw's first-pitch tendency")
ax1.set_aspect(aspect=1)
platewidthInFeet = 17 / 12
expandedPlateInFeet = 20 / 12
szTop = df["szTop"].iloc[0]
szBottom = df["szBottom"].iloc[0]
szHeightInFeet = szTop - szBottom
ballInFeet = 3 / 12
halfBallInFeet = ballInFeet / 2
outrect = ax1.add_patch(patches.Rectangle((expandedPlateInFeet/-2, szBottom - halfBallInFeet), expandedPlateInFeet, szHeightInFeet + ballInFeet, color='lightblue'))
rect = ax1.add_patch(patches.Rectangle((platewidthInFeet/-2, szBottom), platewidthInFeet, szHeightInFeet))
outrect.zorder=-2
rect.zorder=-1
plt.ylim(0, 5)
plt.xlim(-2.5, 2.5)
plt.show()
baseball_analysis.ipynb
df= pitchDF.loc[pitchDF['frame']=='bottom'].loc[pitchDF['abIdx']==1]
ax1 = df.plot(kind='scatter', x='px', y='pz', marker='o', color='red', figsize=[8,8], ylim=[0,4], xlim=[-2,2])
ax1.set_xlabel('horizontal location')
ax1.set_ylabel('vertical location')
ax1.set_title("Rockies' first-pitch tendency")
ax1.set_aspect(aspect=1)
platewidthInFeet = 17 / 12
expandedPlateInFeet = 20 / 12
szTop = df["szTop"].iloc[0]
szBottom = df["szBottom"].iloc[0]
szHeightInFeet = szTop - szBottom
ballInFeet = 3 / 12
halfBallInFeet = ballInFeet / 2
outrect = ax1.add_patch(patches.Rectangle((expandedPlateInFeet/-2, szBottom - halfBallInFeet), expandedPlateInFeet, szHeightInFeet + ballInFeet, color='lightblue'))
rect = ax1.add_patch(patches.Rectangle((platewidthInFeet/-2, szBottom), platewidthInFeet, szHeightInFeet))
outrect.zorder=-2
rect.zorder=-1
plt.ylim(0, 5)
plt.xlim(-2.5, 2.5)
plt.show()
Comparing the two sides, I found the following: **Kershaw first-pitch strike rate: 71%** **Rockies first-pitch strike rate: 64%**.
Kershaw throws few balls and gets ahead in the count with early strikes.
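A first-pitch rate only needs one extra condition, `abIdx == 1`. The sketch below combines both conditions into a single boolean mask rather than chaining two `.loc` calls. It again assumes that a strike is any result without "Ball" in its text, and the sample rows and function name are hypothetical.

```python
import pandas as pd

# Hypothetical sample with the columns used above (abIdx = pitch number within the at-bat)
sample = pd.DataFrame({
    'frame': ['top', 'top', 'top', 'top', 'bottom', 'bottom'],
    'abIdx': [1, 2, 1, 1, 1, 2],
    'des':   ['Called Strike', 'Ball', 'Ball', 'Foul', 'Ball', 'Called Strike'],
})

def first_pitch_strike_rate(df, frame):
    """Strike share among the first pitches of at-bats in the given half-inning."""
    # One combined mask instead of chained .loc calls
    first = df[(df['frame'] == frame) & (df['abIdx'] == 1)]
    if first.empty:
        return float('nan')
    return (~first['des'].str.contains('Ball')).mean()

print(first_pitch_strike_rate(sample, 'top'))
print(first_pitch_strike_rate(sample, 'bottom'))
```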
Next, let's look at how pitch speed changes over the game.
baseball_analysis.ipynb
df = pitchDF.loc[(pitchDF['frame']=='top')]
speed = df['speed']
print(sum(speed) / len(speed))
print(max(speed))
print(min(speed))
print(max(speed) - min(speed))
ax = df.plot(x='pitchIdx', y='speed', color='blue', figsize=[12,6])
ax.set_ylabel('speed')
ax.set_title('Clayton Kershaw ball speed change')
plt.savefig('pitch_kershaw_speed.png')
plt.show()
>>>>>>>>>>>>>>>>>>>>>>>>>
#Average pitch speed: 87.88504672897201
#Fastest: 95.0
#Slowest: 72.4
#Difference between fastest and slowest: 22.599999999999994
baseball_analysis.ipynb
df = pitchDF.loc[(pitchDF['frame']=='bottom')]
speed = df['speed']
print(sum(speed) / len(speed))
print(max(speed))
print(min(speed))
print(max(speed) - min(speed))
ax = df.plot(x='pitchIdx', y='speed', color='blue', figsize=[12,6])
ax.set_ylabel('speed')
ax.set_title('Rockies ball speed change')
plt.savefig('pitch_rockies_speed.png')
plt.show()
>>>>>>>>>>>>>>>>>>>>>>>>>
#Average pitch speed: 89.13599999999998
#Fastest: 96.3
#Slowest: 71.8
#Difference between fastest and slowest: 24.5
Comparing the two sides, I found the following. Kershaw: **Average speed: 87.9 mph** **Fastest: 95.0 mph** **Slowest: 72.4 mph** **Speed range: 22.6 mph**
Rockies: **Average speed: 89.1 mph** **Fastest: 96.3 mph** **Slowest: 71.8 mph** **Speed range: 24.5 mph**
The Rockies used five pitchers, so it is natural that their overall tendency differs.
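The four printed statistics can also be produced for both halves of the inning in one step. The sketch below uses made-up speeds and a hypothetical function name. It also drops rows with speed 0, because the loading cell stores 0 when a pitch has no `start_speed`; the printed minimums above are well above zero, so this filter would not change the figures for this game.

```python
import pandas as pd

# Hypothetical sample; 0.0 marks a pitch with no recorded speed
sample = pd.DataFrame({
    'frame': ['top', 'top', 'top', 'bottom', 'bottom', 'bottom'],
    'speed': [93.0, 74.0, 88.0, 95.0, 0.0, 85.0],
})

def speed_summary(df):
    """Mean, max, min and range of pitch speed per frame, ignoring speeds of 0."""
    tracked = df[df['speed'] > 0]
    summary = tracked.groupby('frame')['speed'].agg(['mean', 'max', 'min'])
    summary['range'] = summary['max'] - summary['min']
    return summary

summary = speed_summary(sample)
print(summary)
```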
Next, let's look at the ratio of pitch types.
baseball_analysis.ipynb
df = pitchDF.loc[(pitchDF['frame']=='top')]
df.pitchtype.value_counts().plot(kind='pie', autopct="%.1f%%", pctdistance=0.8)
plt.axis('equal')
plt.axis('off')
plt.title('Pitch type ratio')
plt.show()
baseball_analysis.ipynb
df = pitchDF.loc[(pitchDF['pitchtype']=='4-seam fb') & (pitchDF['frame']=='top')]
df.des.value_counts().plot(kind='pie', autopct="%.1f%%", pctdistance=0.8)
plt.axis('equal')
plt.axis('off')
plt.title('4-seam event results')
plt.show()
baseball_analysis.ipynb
df = pitchDF.loc[(pitchDF['pitchtype']=='slider') & (pitchDF['frame']=='top')]
df.des.value_counts().plot(kind='pie', autopct="%.1f%%", pctdistance=0.8)
plt.axis('equal')
plt.axis('off')
plt.title('slider event result')
plt.show()
baseball_analysis.ipynb
df = pitchDF.loc[(pitchDF['pitchtype']=='curveball') & (pitchDF['frame']=='top')]
df.des.value_counts().plot(kind='pie', autopct="%.1f%%", pctdistance=0.8)
plt.axis('equal')
plt.axis('off')
plt.title('curveball event result')
plt.show()
baseball_analysis.ipynb
df = pitchDF.loc[(pitchDF['pitchtype']=='changeup') & (pitchDF['frame']=='top')]
df.des.value_counts().plot(kind='pie', autopct="%.1f%%", pctdistance=0.8)
plt.axis('equal')
plt.axis('off')
plt.title('changeup event result')
plt.show()
Comparing the out rates for each pitch type, I found the following: **4-seam: 35.7%** **Slider: 18.8%** **Curve: 22.3%** **Changeup: 0%**.
The four-seam fastball, which accounts for about half of his pitches, performs quite well.
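The same shares can be tabulated in one table instead of being read off separate pie charts. This is a minimal sketch with made-up rows. It takes the share of "In play, out(s)" as the out rate, which is an assumption; strikeouts would need separate handling.

```python
import pandas as pd

# Hypothetical sample with the 'pitchtype' and 'des' columns of pitchDF
sample = pd.DataFrame({
    'pitchtype': ['4-seam fb', '4-seam fb', '4-seam fb', '4-seam fb', 'slider', 'slider'],
    'des': ['In play, out(s)', 'Ball', 'Called Strike', 'In play, out(s)',
            'Swinging Strike', 'Ball'],
})

# Rows: pitch type, columns: result, values: share within that pitch type
shares = pd.crosstab(sample['pitchtype'], sample['des'], normalize='index')
print(shares.round(3))
```

Because `normalize='index'` divides each row by its own total, every row sums to 1 and the columns are directly comparable across pitch types.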
Next, let's look at the pitch mix by count.
baseball_analysis.ipynb
titleList = []
dataList = []
fig, axes = plt.subplots(4, 3, figsize=(12,16))
# Build one data frame per ball-strike count
for b in range(4):
    for s in range(3):
        df = pitchDF.loc[(pitchDF['ballCount']==b) & (pitchDF['strikeCount']==s) & (pitchDF['frame']=='top')]
        title = "Count:" + str(b) + "-" + str(s) + " (" + str(len(df)) + ")"
        titleList.append(title)
        dataList.append(df)
for i, ax in enumerate(axes.flatten()):
    x = dataList[i].pitchtype.value_counts()
    # Take the labels from value_counts so they match the order of the slices
    l = x.index
    ax.pie(x, autopct="%.1f%%", pctdistance=0.9, labels=l)
    ax.set_title(titleList[i])
plt.show()
As expected, it is almost all four-seam fastballs.
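Twelve small pie charts are hard to compare at a glance, so a count-by-pitch-type table can be a useful complement. The sketch below uses made-up rows with the `ballCount`, `strikeCount` and `pitchtype` columns built earlier; it is one possible layout, not part of the original notebook.

```python
import pandas as pd

# Hypothetical sample rows
sample = pd.DataFrame({
    'ballCount':   [0, 0, 0, 1, 1, 0],
    'strikeCount': [0, 0, 1, 1, 1, 2],
    'pitchtype':   ['4-seam fb', '4-seam fb', 'slider', '4-seam fb', 'curveball', 'slider'],
})

# One row per ball-strike count, one column per pitch type, values are pitch counts
mix = (sample.groupby(['ballCount', 'strikeCount', 'pitchtype'])
             .size()
             .unstack(fill_value=0))
print(mix)
```

Counts that never occur in the data simply do not appear as rows, which also shows which counts were never reached.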
Next, let's look at the results by count.
baseball_analysis.ipynb
titleList = []
dataList = []
fig, axes = plt.subplots(4, 3, figsize=(12,16))
for b in range(4):
    for s in range(3):
        # Filter on the count and the half-inning only; the 'des' column is not a boolean mask
        df = pitchDF.loc[(pitchDF['ballCount']==b) & (pitchDF['strikeCount']==s) & (pitchDF['frame']=='top')]
        title = "Count:" + str(b) + "-" + str(s) + " (" + str(len(df)) + ")"
        titleList.append(title)
        dataList.append(df)
for i, ax in enumerate(axes.flatten()):
    x = dataList[i].des.value_counts()
    # Take the labels from value_counts so they match the order of the slices
    l = x.index
    ax.pie(x, autopct="%.1f%%", pctdistance=0.9, labels=l)
    ax.set_title(titleList[i])
plt.show()
At every count, called strikes and in-play outs (outs recorded on balls put in play) make up a large share of the results.
- He throws many first-pitch strikes and works in favorable counts (he gets ahead early).
- He relies heavily on the four-seam fastball.
- Watch out for the slider, which comes when the batter does not expect it (probably one with vertical break).
I got a rough picture of Kershaw's characteristics, but without comparing this game with other games I could not pin down why he achieved the no-hit, no-run game. Records of his past matchups against these batters would also be needed. Kershaw had good control and threw only 107 pitches in this game. MLB has more games than NPB and starters pitch through the season on four days' rest, so even a good pitcher tends to be taken out at around 120 pitches because of pitch limits. Good control may therefore be the most important factor in achieving a no-hit, no-run game in the major leagues. This post ran long, but thank you for reading this far. If you find any mistakes, I would be very grateful if you could point them out in the comments.