environment:
In this article, I want to analyze baseball data like Sabermetrics myself! This is a summary of the process of bringing in the necessary data that I did and putting it in the database.
As of October 29, 2015, Softbank won the Japan Series in Japan, and the Royals won the World Series for the second time in a row, which is a very exciting season for me as a baseball lover.
The reason why I was interested in Saber was that I wanted to study data analysis at first. Then, when I thought about what kind of data I should analyze and how to get the most out of it, I arrived at the idea of hacking my favorite baseball data with Python.
Therefore, I hope this article will be of some help to those who are thinking about the same thing.
Retrosheet Unfortunately, as for NPB, as detailed data as MLB is not published and cannot be downloaded (it can be collected by scraping).
So, this time, I will download the MLB data from Retrosheet (there are other Lahman etc.) and put it in the database.
(1) py-retrosheet, Chadwick Retrosheet data is very difficult to read as it is, and data cleaning is very troublesome. However, there are some people who have arranged such data neatly, so I will use that code.
$ brew update
$ brew install chadwick
$ git clone https://github.com/wellsoliver/py-retrosheet.git
Also download the libraries needed to use py-retrosheet. mysql-python required for sqlalchemy and MySQL used this time.
$ pip install sqlalchemy mysql-python
(2) MySQL Data download and import to SQL are also done with py-retrosheet, so I will prepare a database such as MySQL.
$ brew install mysql
Next, make various settings for MySQL. (This time, create and use a test user)
$ mysql_secure_installation #Security settings such as password
$ mysql.server start
$ mysql -u root -p
Enter password:What you set
mysql> CREATE DATABASE retrosheet #Create database
mysql> CREATE USER test IDENTIFIED BY "*****(Your Password)" #Create a test user named test
mysql> GRANT ALL ON retrosheet.* TO test #Set retrosheet database access to test users
After that, import the schema with the test user created earlier
$ mysql -u test[USER_NAME] -p retrosheet[DB_NAME] < sql/schema.sql
This completes the MySQL settings.
parse.py[line58-59,61]
sql = 'SELECT * FROM rosters WHERE year = %s AND player_id = %s AND team_tx = %s'
res = conn.execute(sql, [row[0], row[1], row[6]])
if results.rowcount() == 1:
parse.py[Revised]
sql = 'SELECT * FROM rosters WHERE year = ? AND player_id = ? AND team_tx = ?'
res = conn.execute(sql, [row[0], row[1], row[6]])
if len(results.fetchall()) == 1:
Finally, edit confit.ini and set up the database you just created
$ cd scripts
$ cp config.ini.dist config.ini
$ vim config.ini # config.ini edit
config.ini
[database]
engine = mysql # or sqlite, postgresql
host = localhost
database = retrosheet
schema = retrosheet
user = test
password = ****** # your password
Then download the data with download.py and import it into SQL with parse.py!
$ python download.py -y 2014 #Specify the year to download in the Christian era, no option is possible
$ python parse.py -y 2014 #Import into SQL
Check the contents of SQL with this, and if it can be imported properly, it ends. All you have to do is analyze the data as you like!
mysql> select * from retrosheet.teams;
+---------+-------+---------------+--------------+
| TEAM_ID | LG_ID | LOC_TEAM_TX | NAME_TEAM_TX |
+---------+-------+---------------+--------------+
| ANA | A | Anaheim | Angels |
| BAL | A | Baltimore | Orioles |
| BOS | A | Boston | Red Sox |
| CHA | A | Chicago | White Sox |
| CLE | A | Cleveland | Indians |
| DET | A | Detroit | Tigers |
| HOU | A | Houston | Astros |
| KCA | A | Kansas City | Royals |
| MIN | A | Minnesota | Twins |
| NYA | A | New York | Yankees |
| OAK | A | Oakland | Athletics |
| SEA | A | Seattle | Mariners |
| TBA | A | Tampa Bay | Rays |
| TEX | A | Texas | Rangers |
| TOR | A | Toronto | Blue Jays |
| ARI | N | Arizona | Diamondbacks |
| ATL | N | Atlanta | Braves |
| CHN | N | Chicago | Cubs |
| CIN | N | Cincinnati | Reds |
| COL | N | Colorado | Rockies |
| LAN | N | Los Angeles | Dodgers |
| MIA | N | Miami | Marlins |
| MIL | N | Milwaukee | Brewers |
| NYN | N | New York | Mets |
| PHI | N | Philadelphia | Phillies |
| PIT | N | Pittsburgh | Pirates |
| SDN | N | San Diego | Padres |
| SFN | N | San Francisco | Giants |
| SLN | N | St. Louis | Cardinals |
| WAS | N | Washington | Nationals |
+---------+-------+---------------+--------------+
Recommended Posts