I participated in Kaggle's NFL competition

Overview of the competition

--Held from October 10, 2019 to November 27, 2019 (https://www.kaggle.com/c/nfl-big-data-bowl-2020/)
--The task is to predict how many yards the offense gains on a running play in American football

Available data

--Each running play comes with play-level information and player-level information
--Play-level information:
  --Weather, temperature, wind speed, wind direction
  --Stadium, location, turf condition
  --Start time
  --Home / away score
  --Down number (the offense ends after 4 downs *1)
  --Yards needed to continue the offense
  --Yards gained ← this is the prediction target
  --etc.
--Player-level information *2:
  --NFL ID
  --Location (X, Y)
  --Movement (S: speed, A: acceleration)
  --Orientation (Dir: body orientation, Orientation: face orientation)
  --Birthday
  --Height, weight
  --Home / away
  --Jersey number
  --Position
  --etc.

*1: If the offense advances 10 yards or more within 4 downs, it keeps the ball and continues. Otherwise, offense and defense switch.
*2: Each play has 22 rows of data in total: 11 offensive and 11 defensive players.

About the team

--The team name "Griffin Series" is inherited from AI RUSH
--The team members this time were Mitsuno (tenyaf) and Sato (foo_foo)
--We used Slack for communication and task management (in hindsight, we should have used Trello or a similar tool for task management)
--Code was managed by forking each other's Kaggle notebooks
--Neither of us had any prior knowledge of American football

Result

--Public leaderboard: 47th out of 2,038 teams (as of the end of the 1st stage, 11/28)
--We made it into the silver medal range!

Introduction of team solution

Outline of solution

--Feature engineering focused on player position and speed
--Estimated the state after 0.5 and 1.0 seconds (calculated from speed and acceleration) to expand the feature set
--The model is a simple MLP
--Post-processing to remove impossible yardage values

Preprocessing

--Plays run in two directions, "attack to the left" and "attack to the right", so the direction is unified
--Team name identification
--Weather / wind speed / height processing
  --Weather: categorized by whether the text contains a certain keyword (e.g. sunny)
  --Wind speed: extract the numeric value, dropping units such as "mph"
  --Height: unified to feet and inches
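
The preprocessing steps above can be sketched roughly as follows. This is only an illustration, not the code from the team notebook: the coordinate ranges (X from 0 to 120 yards, Y from 0 to 53.3 yards), the "left" label, the "6-2" height format, the conversion of height to inches, and all function names are assumptions.

```python
import re

FIELD_LENGTH = 120.0   # yards, including both end zones (assumed X range)
FIELD_WIDTH = 53.3     # yards (assumed Y range)


def unify_direction(x, y, direction_deg, play_direction):
    """Mirror plays that go left so that every play goes to the right."""
    if play_direction == "left":
        return FIELD_LENGTH - x, FIELD_WIDTH - y, (direction_deg + 180.0) % 360.0
    return x, y, direction_deg


def parse_wind_speed(raw):
    """Pull the first number out of strings such as '12 mph'; None if absent."""
    match = re.search(r"\d+(?:\.\d+)?", str(raw))
    return float(match.group()) if match else None


def is_sunny(raw):
    """Keyword-based weather flag: 1 if the text mentions 'sunny', else 0."""
    return int("sunny" in str(raw).lower())


def height_to_inches(raw):
    """Convert a 'feet-inches' string such as '6-2' to total inches."""
    feet, inches = raw.split("-")
    return int(feet) * 12 + int(inches)
```

Mirroring both axes (rather than only X) keeps left and right relative to the offense consistent after the flip; whether that matters depends on the features built on top.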

Feature engineering

--Statistics in each play
  --X, Y, S, A (min, max, mean, std, var)
  --X and Y components of S (min, max, mean, std, var)

--Rusher (*1) information
  --X, Y, S, A (min, max, mean, std, var)
  --Magnitude of S in the X direction and in the Y direction
  --Distance to offensive / defensive team members (min, max, mean, sum, var)
  --Shortest time to collide with a defensive team member (min, max, mean, std)
  --Speed ratio with the closest defensive member
  --Distance from the center of gravity of offensive / defensive team members
  --Distance from the line of scrimmage

--Other player information
  --Wide receiver (*2) information
    --Wide receiver distance from the line of scrimmage (min, max, mean)
    --X, Y, S, A of opponents / teammates within 5 yards of the wide receiver (min, max, mean, std, var)

*1: The ball carrier, usually the running back (RB).
*2: There are multiple wide receivers per play (about 1 to 3).
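
As an illustration of the rusher features listed above, here is a minimal sketch of the distance, speed-ratio, and center-of-gravity features relative to the defense. The dict layout with keys "X", "Y", "S", the feature names, and the use of population variance are assumptions made for this example, not the actual implementation.

```python
import math
from statistics import mean, pvariance


def rusher_distance_features(rusher, defenders):
    """rusher and each defender are dicts with keys 'X', 'Y', 'S' (hypothetical layout)."""
    dists = [math.hypot(d["X"] - rusher["X"], d["Y"] - rusher["Y"]) for d in defenders]
    nearest = defenders[dists.index(min(dists))]
    # Centroid ("center of gravity") of the defensive players.
    cx = mean(d["X"] for d in defenders)
    cy = mean(d["Y"] for d in defenders)
    return {
        "def_dist_min": min(dists),
        "def_dist_max": max(dists),
        "def_dist_mean": mean(dists),
        "def_dist_sum": sum(dists),
        "def_dist_var": pvariance(dists),  # population variance (assumed choice)
        # Guard against a stationary defender to avoid division by zero.
        "speed_ratio_nearest": rusher["S"] / max(nearest["S"], 1e-6),
        "def_centroid_dist": math.hypot(cx - rusher["X"], cy - rusher["Y"]),
    }
```

The same function can be called with the offensive teammates instead of the defenders to get the offensive-side counterparts.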

Reinforcement using data after 0.5 seconds and 1.0 seconds

--Estimate X, Y, and S a given number of seconds ahead from "velocity", "acceleration", and "body orientation" *1
--Replace X, Y, and S with their values after 0.5 seconds and 1.0 seconds, then extract the features again and combine them (number of features: 146 → 379 *2)

(Figure: player positions at the start and 1.0 seconds later)

*1: Calculated assuming constant-acceleration motion.
*2: Count after removing duplicate features.
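
The constant-acceleration estimate can be sketched as below. The angle convention (heading measured counter-clockwise from the +X axis, in degrees), the assumption that acceleration acts along the heading, and the function name are choices made for this example; raw tracking angles may need converting first.

```python
import math


def extrapolate(x, y, s, a, theta_deg, t):
    """Predict (x, y, s) after t seconds under constant acceleration.

    theta_deg is the heading measured counter-clockwise from the +X axis
    (an assumed convention, not necessarily that of the raw data).
    """
    theta = math.radians(theta_deg)
    travelled = s * t + 0.5 * a * t * t   # distance covered along the heading
    return (
        x + travelled * math.cos(theta),
        y + travelled * math.sin(theta),
        s + a * t,                        # speed after t seconds
    )


start = (10.0, 20.0, 4.0, 2.0, 0.0)       # x, y, s, a, heading (made-up values)
after_half = extrapolate(*start, t=0.5)
after_one = extrapolate(*start, t=1.0)
```

Running the feature extraction once on the original values and once on each extrapolated state gives the three feature sets that are then combined.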

Neural Network architecture

--Forked and used the model from a public kernel
  --Optimizer = Adam
  --Loss = categorical cross entropy
--Since the possible yardage values are discrete, the model is trained as a 199-class classification over [-99, 99]
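
To make the 199-class setup concrete, here is a small sketch of how a yardage value could be mapped to a class index and a one-hot target for the cross-entropy loss. The helper names are hypothetical, and the forked kernel may encode targets differently.

```python
NUM_CLASSES = 199   # yards -99 .. 99 inclusive
OFFSET = 99         # class 0 corresponds to -99 yards


def yards_to_class(yards):
    """Map a yardage in [-99, 99] to a class index in [0, 198]."""
    return yards + OFFSET


def class_to_yards(index):
    """Inverse mapping from class index back to yardage."""
    return index - OFFSET


def one_hot(yards):
    """One-hot target vector for categorical cross entropy."""
    target = [0.0] * NUM_CLASSES
    target[yards_to_class(yards)] = 1.0
    return target
```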

Post-processing

--Probabilities of yardage values that are impossible given the position of the line of scrimmage are set to 0
--As required by the submission rules, the output is converted into a cumulative distribution over -99 to 99 yards
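
A minimal sketch of this post-processing, under stated assumptions: `yards_to_goal` is a hypothetical input giving the distance from the line of scrimmage to the opponent's goal line, the field is 100 yards between goal lines, and the masked probabilities are renormalized before accumulation (the renormalization is an assumption of this example).

```python
from itertools import accumulate

OFFSET = 99  # index 0 corresponds to -99 yards


def postprocess(probs, yards_to_goal):
    """Zero out impossible yardages, renormalize, and return the CDF.

    probs: 199 class probabilities for -99 .. 99 yards.
    yards_to_goal: distance from the line of scrimmage to the opponent's
    goal line (hypothetical input derived from the play data).
    """
    max_gain = yards_to_goal          # cannot advance past the goal line
    max_loss = yards_to_goal - 100    # cannot retreat past the own goal line
    masked = [
        p if max_loss <= (i - OFFSET) <= max_gain else 0.0
        for i, p in enumerate(probs)
    ]
    total = sum(masked)
    if total == 0.0:                  # degenerate case: fall back to the raw output
        masked = list(probs)
        total = sum(masked) or 1.0
    cdf = list(accumulate(p / total for p in masked))
    return [min(c, 1.0) for c in cdf]  # clip rounding overshoot
```

For example, with 10 yards to go to the goal line, every entry of the returned list from +10 yards onward is 1, and every entry below -90 yards is 0.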

Ideas that could not be realized

--Graph convolution, treating the players' location information as a graph
--Sorting the 22 players in a fixed order and adding their X, Y, S, A to the columns as-is
--X, Y, S, A information for other important positions
--Neural network tuning

Impressions

tenyaf
--This was my first Kaggle competition, and I enjoyed the subject matter very much.
--It was a battle against processing time, and I realized the need to write fast code.
--Many ideas came to mind but never reached implementation, so I want to improve my implementation skills.

foo_foo
--The feature engineering went relatively well.
--I learned how to use NNs in tabular-data competitions.
--I should have used a tool for task management.
--Next time I will do my best to get gold.