Learning Objectives
By the end of this chapter, you will be able to:
- Work with NFL tracking data (Next Gen Stats)
- Analyze player movement and positioning
- Calculate spatial metrics (separation, coverage)
- Build tracking-based models
- Visualize tracking data effectively
Introduction
Player tracking data represents one of the most significant advances in football analytics. Since the NFL began installing RFID chips in player shoulder pads and the football in 2015, analysts have had access to precise location data for all 22 players on the field, sampled 10 times per second. This granular spatial and temporal data supports analyses that were impossible with traditional play-by-play data.
Next Gen Stats (NGS) tracking data captures:
- Player locations (x, y coordinates)
- Player velocity and acceleration
- Player orientation (direction facing)
- Ball location and trajectory
- All measurements 10 times per second
What is Tracking Data?
Tracking data provides the precise location and movement of every player and the ball throughout a play. Each frame captures 22 players plus the ball, recorded 10 times per second, so a six-second play produces about 60 frames and roughly 1,400 rows (23 tracked objects × 60 frames).
This chapter will teach you how to work with tracking data to answer questions that traditional statistics cannot address:
- How fast was the receiver running when the ball arrived?
- How much separation did the receiver create?
- Which defenders were in optimal position?
- What spatial area did each defender control?
- How do route patterns cluster together?
Tracking Data Structure and Format
Data Format
NFL tracking data is typically provided in a long format with one row per player (or the ball) per frame:
#| label: load-libraries-r
#| message: false
#| warning: false
library(tidyverse)
library(nflfastR)
library(arrow)
library(gganimate)
library(ggforce)
library(ggrepel)
library(gt)
library(plotly)
#| label: load-libraries-py
#| message: false
#| warning: false
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.spatial import Voronoi, voronoi_plot_2d
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import warnings
warnings.filterwarnings('ignore')
Loading Tracking Data
The NFL has released samples of its tracking data through the Big Data Bowl competitions hosted on Kaggle. Let's load one week of data and examine its structure:
#| label: load-tracking-r
#| eval: false
#| echo: true
# Load tracking data (example from Big Data Bowl)
tracking <- read_csv("data/tracking_week_1.csv")
# Examine structure
glimpse(tracking)
# Sample output:
# Rows: 1,234,567
# Columns: 15
# $ gameId <dbl> 2021091200, 2021091200, ...
# $ playId <dbl> 97, 97, 97, ...
# $ nflId <dbl> NA, 47848, 32488, ...
# $ frameId <dbl> 1, 1, 1, ...
# $ time <chr> "2021-09-12 16:03:21.5", ...
# $ jerseyNumber <dbl> NA, 88, 26, ...
# $ team <chr> "football", "ARI", "ARI", ...
# $ playDirection <chr> "left", "left", "left", ...
# $ x <dbl> 46.84, 28.33, 31.82, ...
# $ y <dbl> 26.65, 18.77, 30.52, ...
# $ s <dbl> 0.00, 1.38, 1.02, ...
# $ a <dbl> 0.00, 1.84, 1.42, ...
# $ dis <dbl> 0.00, 0.14, 0.10, ...
# $ o <dbl> NA, 171.26, 79.38, ...
# $ dir <dbl> NA, 174.88, 81.20, ...
#| label: load-tracking-py
#| eval: false
#| echo: true
# Load tracking data (example from Big Data Bowl)
tracking = pd.read_csv("data/tracking_week_1.csv")
# Examine structure
tracking.info()  # info() prints its own summary and returns None
print("\nFirst few rows:")
print(tracking.head())
# Sample columns:
# - gameId: Game identifier
# - playId: Play identifier
# - nflId: Player identifier (NA for football)
# - frameId: Frame number (1-N)
# - time: Timestamp
# - jerseyNumber: Player jersey number
# - team: Team abbreviation or 'football'
# - playDirection: Direction of play
# - x, y: Field coordinates
# - s: Speed (yards/second)
# - a: Acceleration (yards/second^2)
# - dis: Distance traveled (yards)
# - o: Orientation (degrees)
# - dir: Direction of movement (degrees)
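Because `s` is reported in yards per second, it can help to convert speeds to miles per hour when comparing them with the figures usually quoted on broadcasts. The following is a minimal sketch; the helper name `yps_to_mph` is an illustrative assumption, not part of any tracking library.

```python
YARDS_PER_MILE = 1760
SECONDS_PER_HOUR = 3600


def yps_to_mph(speed_yps):
    """Convert a speed from yards/second to miles/hour."""
    return speed_yps * SECONDS_PER_HOUR / YARDS_PER_MILE


# A player moving at 10 yd/s is running at a little over 20 mph
print(f"{yps_to_mph(10.0):.2f} mph")
```

Since the function only multiplies and divides, it should also work elementwise on a pandas column such as `tracking['s']`.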
Coordinate System
Understanding the coordinate system is crucial:
- x coordinate: Position along the length of the field (0-120 yards)
  - 0 = back of left end zone
  - 10 = left goal line
  - 60 = midfield
  - 110 = right goal line
  - 120 = back of right end zone
- y coordinate: Position across the width of the field (0-53.3 yards)
  - 0 = left sideline
  - 26.65 = middle of field
  - 53.3 = right sideline
- playDirection: "left" or "right" indicating offensive direction
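Because the offense can drive toward either end zone, these conventions are usually paired with a step that reorients every play to run left to right. The sketch below is one way to do that, under stated assumptions: the function name `standardize_direction` is hypothetical, the field is rotated 180 degrees (so `y` is mirrored along with `x`), and the angle columns `o` and `dir` are assumed to be in degrees on a 0-360 scale.

```python
import pandas as pd

FIELD_LENGTH = 120.0  # yards, including both end zones
FIELD_WIDTH = 53.3    # yards, matching the y range listed above


def standardize_direction(tracking):
    """Return a copy in which plays moving left are rotated to move right."""
    out = tracking.copy()
    flip = out["playDirection"] == "left"
    # Rotate the field 180 degrees for plays moving left
    out.loc[flip, "x"] = FIELD_LENGTH - out.loc[flip, "x"]
    out.loc[flip, "y"] = FIELD_WIDTH - out.loc[flip, "y"]
    # Angles rotate with the field; missing angles (the football) stay missing
    for col in ("o", "dir"):
        if col in out.columns:
            out.loc[flip, col] = (out.loc[flip, col] + 180) % 360
    return out


# Two-row demonstration: one play moving left, one moving right
demo = pd.DataFrame({
    "playDirection": ["left", "right"],
    "x": [30.0, 30.0],
    "y": [10.0, 10.0],
    "dir": [90.0, 90.0],
})
print(standardize_direction(demo))
```

Mirroring only `x` would also make every play run left to right, but it swaps the left and right sides of the formation; rotating both axes preserves them. The `playDirection` column is left untouched so the original orientation can still be recovered.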
Standardizing Play Direction
Always standardize plays to go in one direction (typically left-to-right) to make analysis easier. This involves flipping x coordinates when playDirection is "left".
Creating Synthetic Tracking Data
For demonstration purposes, let's create synthetic tracking data for a simple passing play:
#| label: synthetic-tracking-r
#| message: false
#| warning: false
# Create synthetic tracking data for one play
set.seed(123)
# Function to generate player trajectory
generate_trajectory <- function(start_x, start_y, end_x, end_y, frames = 50) {
x <- seq(start_x, end_x, length.out = frames)
y <- seq(start_y, end_y, length.out = frames)
# Add some noise for realism
y <- y + rnorm(frames, 0, 0.5)
tibble(
frameId = 1:frames,
x = x,
y = y
)
}
# Generate receiver route (go route)
receiver <- generate_trajectory(75, 20, 95, 22) %>%
mutate(
nflId = 1001,
jerseyNumber = 88,
team = "OFF",
position = "WR"
)
# Generate cornerback coverage
cornerback <- generate_trajectory(75, 18, 93, 20) %>%
mutate(
nflId = 2001,
jerseyNumber = 25,
team = "DEF",
position = "CB"
)
# Generate safety help
safety <- generate_trajectory(85, 26.65, 95, 24) %>%
mutate(
nflId = 2002,
jerseyNumber = 43,
team = "DEF",
position = "S"
)
# Combine all players
tracking_example <- bind_rows(receiver, cornerback, safety) %>%
mutate(
gameId = 2023091000,
playId = 1
)
# Calculate speed and acceleration
tracking_example <- tracking_example %>%
group_by(nflId) %>%
arrange(frameId) %>%
mutate(
# Speed calculation
dx = x - lag(x, default = first(x)),
dy = y - lag(y, default = first(y)),
s = sqrt(dx^2 + dy^2) * 10, # multiply by 10 for yards/second
# Acceleration
a = (s - lag(s, default = first(s))) * 10,
# Direction
dir = atan2(dy, dx) * 180 / pi
) %>%
ungroup()
# Display sample
tracking_example %>%
filter(frameId <= 3) %>%
select(frameId, nflId, jerseyNumber, position, x, y, s) %>%
gt() %>%
fmt_number(columns = c(x, y, s), decimals = 2) %>%
tab_header(title = "Sample Tracking Data")
#| label: synthetic-tracking-py
#| message: false
#| warning: false
# Create synthetic tracking data for one play
np.random.seed(123)
def generate_trajectory(start_x, start_y, end_x, end_y, frames=50):
    """Generate player trajectory with noise"""
    x = np.linspace(start_x, end_x, frames)
    y = np.linspace(start_y, end_y, frames)
    # Add noise for realism
    y = y + np.random.normal(0, 0.5, frames)
    return pd.DataFrame({
        'frameId': range(1, frames + 1),
        'x': x,
        'y': y
    })
# Generate receiver route (go route)
receiver = generate_trajectory(75, 20, 95, 22)
receiver['nflId'] = 1001
receiver['jerseyNumber'] = 88
receiver['team'] = 'OFF'
receiver['position'] = 'WR'
# Generate cornerback coverage
cornerback = generate_trajectory(75, 18, 93, 20)
cornerback['nflId'] = 2001
cornerback['jerseyNumber'] = 25
cornerback['team'] = 'DEF'
cornerback['position'] = 'CB'
# Generate safety help
safety = generate_trajectory(85, 26.65, 95, 24)
safety['nflId'] = 2002
safety['jerseyNumber'] = 43
safety['team'] = 'DEF'
safety['position'] = 'S'
# Combine all players
tracking_example = pd.concat([receiver, cornerback, safety], ignore_index=True)
tracking_example['gameId'] = 2023091000
tracking_example['playId'] = 1
# Calculate speed and acceleration
tracking_example = tracking_example.sort_values(['nflId', 'frameId'])
for nfl_id in tracking_example['nflId'].unique():
    mask = tracking_example['nflId'] == nfl_id
    # Calculate frame-to-frame differences (0 for the first frame)
    dx = tracking_example.loc[mask, 'x'].diff().fillna(0)
    dy = tracking_example.loc[mask, 'y'].diff().fillna(0)
    # Speed (yards/second): distance per frame times 10 frames/second
    tracking_example.loc[mask, 's'] = np.sqrt(dx**2 + dy**2) * 10
    # Acceleration (yards/second^2)
    s_diff = tracking_example.loc[mask, 's'].diff().fillna(0)
    tracking_example.loc[mask, 'a'] = s_diff * 10
    # Direction of movement (degrees, counterclockwise from the +x axis)
    tracking_example.loc[mask, 'dir'] = np.arctan2(dy, dx) * 180 / np.pi
# Display sample
print("Sample Tracking Data:")
print(tracking_example[tracking_example['frameId'] <= 3][
['frameId', 'nflId', 'jerseyNumber', 'position', 'x', 'y', 's']
].to_string(index=False))
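The synthetic `dir` column above comes from `atan2(dy, dx)`, so it is measured counterclockwise from the positive x axis and ranges from -180 to 180 degrees. Released tracking files report `dir` and `o` on a 0-360 scale. The sketch below converts between the two under the assumption that 0 degrees points toward increasing y and angles grow clockwise; that convention should be checked against the data dictionary of whichever dataset you use, and both function names are illustrative.

```python
import math


def math_to_field_angle(angle_deg):
    """Map a counterclockwise-from-+x angle to clockwise-from-+y, in [0, 360)."""
    return (90.0 - angle_deg) % 360.0


def heading_from_step(dx, dy):
    """Heading of a frame-to-frame displacement on the assumed field scale."""
    return math_to_field_angle(math.degrees(math.atan2(dy, dx)))


print(heading_from_step(1.0, 0.0))   # moving toward increasing x
print(heading_from_step(0.0, 1.0))   # moving toward increasing y
print(heading_from_step(-1.0, 0.0))  # moving toward decreasing x
```

Keeping synthetic and real angles on the same scale matters as soon as features such as direction changes are compared across the two sources.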
Player Speed, Acceleration, and Distance Metrics
Speed Analysis
Speed is one of the most straightforward metrics from tracking data. Let's analyze speed distributions and identify top speeds:
#| label: speed-analysis-r
#| message: false
#| warning: false
# Calculate max speed for each player
max_speeds <- tracking_example %>%
group_by(nflId, jerseyNumber, position) %>%
summarise(
max_speed = max(s, na.rm = TRUE),
avg_speed = mean(s, na.rm = TRUE),
frames = n(),
.groups = "drop"
)
max_speeds %>%
arrange(desc(max_speed)) %>%
gt() %>%
fmt_number(columns = c(max_speed, avg_speed), decimals = 2) %>%
cols_label(
nflId = "Player ID",
jerseyNumber = "Jersey",
position = "Position",
max_speed = "Max Speed (yd/s)",
avg_speed = "Avg Speed (yd/s)",
frames = "Frames"
) %>%
tab_header(title = "Player Speed Summary")
#| label: speed-analysis-py
#| message: false
#| warning: false
# Calculate max speed for each player
max_speeds = (tracking_example
.groupby(['nflId', 'jerseyNumber', 'position'])
.agg(
max_speed=('s', 'max'),
avg_speed=('s', 'mean'),
frames=('s', 'count')
)
.reset_index()
.sort_values('max_speed', ascending=False)
)
print("\nPlayer Speed Summary:")
print(max_speeds.to_string(index=False))
Speed Over Time Visualization
#| label: fig-speed-time-r
#| fig-cap: "Player speed throughout the play"
#| fig-width: 10
#| fig-height: 6
#| message: false
#| warning: false
tracking_example %>%
filter(!is.na(s)) %>%
ggplot(aes(x = frameId, y = s, color = position, group = nflId)) +
geom_line(linewidth = 1) +
geom_point(size = 2, alpha = 0.6) +
scale_color_manual(
values = c("WR" = "#00BFC4", "CB" = "#F8766D", "S" = "#7CAE00"),
labels = c("WR" = "Wide Receiver", "CB" = "Cornerback", "S" = "Safety")
) +
labs(
title = "Player Speed Throughout the Play",
subtitle = "Speed measured in yards per second",
x = "Frame Number",
y = "Speed (yards/second)",
color = "Position"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14),
legend.position = "top"
)
#| label: fig-speed-time-py
#| fig-cap: "Player speed throughout the play - Python"
#| fig-width: 10
#| fig-height: 6
#| message: false
#| warning: false
# Filter out NA values
plot_data = tracking_example.dropna(subset=['s'])
# Create plot
plt.figure(figsize=(10, 6))
colors = {'WR': '#00BFC4', 'CB': '#F8766D', 'S': '#7CAE00'}
labels = {'WR': 'Wide Receiver', 'CB': 'Cornerback', 'S': 'Safety'}
for position in plot_data['position'].unique():
    data = plot_data[plot_data['position'] == position]
    plt.plot(data['frameId'], data['s'],
             color=colors[position],
             label=labels[position],
             marker='o', markersize=4, alpha=0.6, linewidth=2)
plt.xlabel('Frame Number', fontsize=12)
plt.ylabel('Speed (yards/second)', fontsize=12)
plt.title('Player Speed Throughout the Play\nSpeed measured in yards per second',
fontsize=14, fontweight='bold')
plt.legend(title='Position', loc='upper right')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Acceleration Analysis
Acceleration measures how quickly players change speed—important for evaluating quickness and explosiveness:
#| label: acceleration-analysis-r
#| message: false
#| warning: false
# Calculate acceleration metrics
acceleration_summary <- tracking_example %>%
group_by(nflId, jerseyNumber, position) %>%
summarise(
max_accel = max(a, na.rm = TRUE),
max_decel = min(a, na.rm = TRUE),
avg_accel = mean(a[a > 0], na.rm = TRUE),
.groups = "drop"
)
acceleration_summary %>%
arrange(desc(max_accel)) %>%
gt() %>%
fmt_number(columns = c(max_accel, max_decel, avg_accel), decimals = 2) %>%
cols_label(
jerseyNumber = "Jersey",
position = "Position",
max_accel = "Max Acceleration",
max_decel = "Max Deceleration",
avg_accel = "Avg Acceleration"
) %>%
tab_header(
title = "Player Acceleration Summary",
subtitle = "Measured in yards/second²"
)
#| label: acceleration-analysis-py
#| message: false
#| warning: false
# Calculate acceleration metrics
acceleration_data = tracking_example.dropna(subset=['a'])
acceleration_summary = []
for nfl_id in acceleration_data['nflId'].unique():
    player_data = acceleration_data[acceleration_data['nflId'] == nfl_id]
    acceleration_summary.append({
        'nflId': nfl_id,
        'jerseyNumber': player_data['jerseyNumber'].iloc[0],
        'position': player_data['position'].iloc[0],
        'max_accel': player_data['a'].max(),
        'max_decel': player_data['a'].min(),
        # Average over frames where the player is speeding up
        'avg_accel': player_data[player_data['a'] > 0]['a'].mean()
    })
accel_df = pd.DataFrame(acceleration_summary).sort_values('max_accel', ascending=False)
print("\nPlayer Acceleration Summary:")
print("Measured in yards/second²")
print(accel_df.to_string(index=False))
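Acceleration computed by differencing speed frame to frame is sensitive to noise, because every difference is multiplied by the 10 Hz frame rate. One common mitigation is to smooth speed with a short centered moving average before differencing. The sketch below illustrates the idea on simulated data; the function name `smoothed_acceleration`, the five-frame window, and the noise level are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

FRAME_RATE = 10  # frames per second


def smoothed_acceleration(speed, window=5):
    """Difference a centered rolling mean of speed to estimate acceleration."""
    smooth = speed.rolling(window=window, center=True, min_periods=1).mean()
    return smooth.diff().fillna(0) * FRAME_RATE


# Simulated player speeding up steadily from 0 to 8 yd/s over 40 frames
rng = np.random.default_rng(0)
true_speed = pd.Series(np.linspace(0, 8, 40))
noisy_speed = true_speed + rng.normal(0, 0.3, len(true_speed))

raw_accel = noisy_speed.diff().fillna(0) * FRAME_RATE
smooth_accel = smoothed_acceleration(noisy_speed)

print(f"Raw acceleration std:      {raw_accel.std():.2f}")
print(f"Smoothed acceleration std: {smooth_accel.std():.2f}")
```

A wider window suppresses more noise but also blunts genuine bursts, so the window length is a trade-off to tune rather than a fixed rule.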
Distance Traveled
Calculate total distance covered by each player:
#| label: distance-analysis-r
#| message: false
#| warning: false
# Calculate distance metrics
distance_summary <- tracking_example %>%
group_by(nflId, jerseyNumber, position) %>%
arrange(frameId) %>%
summarise(
total_distance = sum(sqrt(dx^2 + dy^2), na.rm = TRUE),
straight_line_distance = sqrt((last(x) - first(x))^2 + (last(y) - first(y))^2),
efficiency = straight_line_distance / total_distance,
.groups = "drop"
)
distance_summary %>%
arrange(desc(total_distance)) %>%
gt() %>%
fmt_number(columns = c(total_distance, straight_line_distance), decimals = 2) %>%
fmt_percent(columns = efficiency, decimals = 1) %>%
cols_label(
jerseyNumber = "Jersey",
position = "Position",
total_distance = "Total Distance (yd)",
straight_line_distance = "Straight Distance (yd)",
efficiency = "Route Efficiency"
) %>%
tab_header(title = "Player Distance Summary")
#| label: distance-analysis-py
#| message: false
#| warning: false
# Calculate distance metrics
distance_summary = []
for nfl_id in tracking_example['nflId'].unique():
    player_data = tracking_example[tracking_example['nflId'] == nfl_id].sort_values('frameId')
    # Calculate distances
    dx = player_data['x'].diff().fillna(0)
    dy = player_data['y'].diff().fillna(0)
    total_dist = np.sqrt(dx**2 + dy**2).sum()
    straight_dist = np.sqrt(
        (player_data['x'].iloc[-1] - player_data['x'].iloc[0])**2 +
        (player_data['y'].iloc[-1] - player_data['y'].iloc[0])**2
    )
    distance_summary.append({
        'nflId': nfl_id,
        'jerseyNumber': player_data['jerseyNumber'].iloc[0],
        'position': player_data['position'].iloc[0],
        'total_distance': total_dist,
        'straight_line_distance': straight_dist,
        'efficiency': straight_dist / total_dist if total_dist > 0 else 0
    })
dist_df = pd.DataFrame(distance_summary).sort_values('total_distance', ascending=False)
print("\nPlayer Distance Summary:")
print(dist_df.to_string(index=False))
Separation and Coverage Metrics
Calculating Separation Distance
Separation is the distance between a receiver and the nearest defender—a critical metric for pass completion probability:
#| label: separation-calc-r
#| message: false
#| warning: false
# Function to calculate separation
calculate_separation <- function(tracking_data) {
tracking_data %>%
group_by(frameId) %>%
summarise(
wr_x = x[position == "WR"],
wr_y = y[position == "WR"],
# Find minimum distance to any defender
separation = min(
sqrt((x[team == "DEF"] - wr_x)^2 + (y[team == "DEF"] - wr_y)^2)
),
.groups = "drop"
)
}
# Calculate separation for our example
separation_data <- calculate_separation(tracking_example)
# Summary statistics
separation_summary <- separation_data %>%
summarise(
min_separation = min(separation, na.rm = TRUE),
max_separation = max(separation, na.rm = TRUE),
avg_separation = mean(separation, na.rm = TRUE),
median_separation = median(separation, na.rm = TRUE)
)
separation_summary %>%
gt() %>%
fmt_number(decimals = 2) %>%
cols_label(
min_separation = "Min (yd)",
max_separation = "Max (yd)",
avg_separation = "Mean (yd)",
median_separation = "Median (yd)"
) %>%
tab_header(title = "Receiver Separation Summary")
#| label: separation-calc-py
#| message: false
#| warning: false
def calculate_separation(tracking_data):
    """Calculate separation between receiver and nearest defender"""
    separations = []
    for frame in tracking_data['frameId'].unique():
        frame_data = tracking_data[tracking_data['frameId'] == frame]
        # Get receiver position
        wr_pos = frame_data[frame_data['position'] == 'WR'][['x', 'y']].values
        # Get defender positions
        def_pos = frame_data[frame_data['team'] == 'DEF'][['x', 'y']].values
        if len(wr_pos) > 0 and len(def_pos) > 0:
            # Calculate distances to all defenders
            distances = cdist(wr_pos, def_pos)
            min_distance = distances.min()
            separations.append({
                'frameId': frame,
                'separation': min_distance
            })
    return pd.DataFrame(separations)
# Calculate separation
separation_data = calculate_separation(tracking_example)
# Summary statistics
print("\nReceiver Separation Summary:")
print(f"Min: {separation_data['separation'].min():.2f} yards")
print(f"Max: {separation_data['separation'].max():.2f} yards")
print(f"Mean: {separation_data['separation'].mean():.2f} yards")
print(f"Median: {separation_data['separation'].median():.2f} yards")
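The separation value alone does not say which defender is closest, and that identity is needed whenever coverage is attributed to individual players. The following sketch pairs every defender with the receiver in each frame and keeps the closest one. It assumes the column names used in this chapter (`frameId`, `nflId`, `team`, `x`, `y`) and a defense labeled `'DEF'`; the function name `nearest_defender_by_frame` is hypothetical.

```python
import numpy as np
import pandas as pd


def nearest_defender_by_frame(tracking, receiver_id):
    """For each frame, return the closest defender to the receiver and the distance."""
    wr = tracking.loc[tracking["nflId"] == receiver_id, ["frameId", "x", "y"]]
    defs = tracking.loc[tracking["team"] == "DEF", ["frameId", "nflId", "x", "y"]]
    # One row per (frame, defender), with the receiver's position alongside
    pairs = defs.merge(wr, on="frameId", suffixes=("_def", "_wr"))
    pairs["separation"] = np.hypot(pairs["x_def"] - pairs["x_wr"],
                                   pairs["y_def"] - pairs["y_wr"])
    closest = pairs.groupby("frameId")["separation"].idxmin()
    return (pairs.loc[closest, ["frameId", "nflId", "separation"]]
            .rename(columns={"nflId": "nearest_defender"})
            .reset_index(drop=True))


# Tiny two-frame example: the safety closes on the receiver in frame 2
demo = pd.DataFrame({
    "frameId": [1, 1, 1, 2, 2, 2],
    "nflId":   [1001, 2001, 2002, 1001, 2001, 2002],
    "team":    ["OFF", "DEF", "DEF", "OFF", "DEF", "DEF"],
    "x":       [10.0, 10.0, 15.0, 11.0, 11.0, 11.2],
    "y":       [20.0, 18.0, 20.0, 20.0, 19.5, 20.0],
})
print(nearest_defender_by_frame(demo, receiver_id=1001))
```

On the chapter's synthetic play, the same call would be `nearest_defender_by_frame(tracking_example, 1001)`.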
Separation Over Time
#| label: fig-separation-time-r
#| fig-cap: "Receiver separation from nearest defender over time"
#| fig-width: 10
#| fig-height: 6
#| message: false
#| warning: false
separation_data %>%
ggplot(aes(x = frameId, y = separation)) +
geom_line(color = "#00BFC4", linewidth = 1.2) +
geom_point(color = "#00BFC4", size = 2, alpha = 0.6) +
geom_hline(yintercept = 2, linetype = "dashed", color = "red", alpha = 0.7) +
annotate("text", x = max(separation_data$frameId) * 0.8, y = 2.3,
label = "Tight Coverage (2 yards)", color = "red", size = 3.5) +
labs(
title = "Receiver Separation Throughout the Play",
subtitle = "Distance to nearest defender measured in yards",
x = "Frame Number",
y = "Separation (yards)"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14)
)
#| label: fig-separation-time-py
#| fig-cap: "Receiver separation from nearest defender - Python"
#| fig-width: 10
#| fig-height: 6
#| message: false
#| warning: false
plt.figure(figsize=(10, 6))
plt.plot(separation_data['frameId'], separation_data['separation'],
color='#00BFC4', linewidth=2, marker='o', markersize=4, alpha=0.6)
plt.axhline(y=2, color='red', linestyle='--', alpha=0.7)
plt.text(separation_data['frameId'].max() * 0.8, 2.3,
'Tight Coverage (2 yards)', color='red', fontsize=10)
plt.xlabel('Frame Number', fontsize=12)
plt.ylabel('Separation (yards)', fontsize=12)
plt.title('Receiver Separation Throughout the Play\nDistance to nearest defender measured in yards',
fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Coverage Quality Metrics
Develop metrics for coverage quality based on position and separation:
#| label: coverage-quality-r
#| message: false
#| warning: false
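# Calculate coverage quality metrics
# Use each defender's own distance to the receiver; joining the receiver's
# nearest-defender separation would give every defender identical values
wr_path <- tracking_example %>%
  filter(position == "WR") %>%
  select(frameId, wr_x = x, wr_y = y)
coverage_metrics <- tracking_example %>%
  filter(team == "DEF") %>%
  left_join(wr_path, by = "frameId") %>%
  mutate(separation = sqrt((x - wr_x)^2 + (y - wr_y)^2)) %>%
  group_by(nflId, jerseyNumber, position) %>%
  summarise(
    avg_separation_allowed = mean(separation, na.rm = TRUE),
    min_separation = min(separation, na.rm = TRUE),
    tight_coverage_pct = mean(separation < 2, na.rm = TRUE),
    .groups = "drop"
  )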
coverage_metrics %>%
arrange(avg_separation_allowed) %>%
gt() %>%
fmt_number(columns = c(avg_separation_allowed, min_separation), decimals = 2) %>%
fmt_percent(columns = tight_coverage_pct, decimals = 1) %>%
cols_label(
jerseyNumber = "Jersey",
position = "Position",
avg_separation_allowed = "Avg Separation (yd)",
min_separation = "Min Separation (yd)",
tight_coverage_pct = "Tight Coverage %"
) %>%
tab_header(
title = "Defender Coverage Quality",
subtitle = "Tight coverage defined as < 2 yards"
)
#| label: coverage-quality-py
#| message: false
#| warning: false
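# Calculate coverage quality metrics
# Use each defender's own distance to the receiver; merging the receiver's
# nearest-defender separation would give every defender identical values
wr_path = (tracking_example[tracking_example['position'] == 'WR']
           [['frameId', 'x', 'y']]
           .rename(columns={'x': 'wr_x', 'y': 'wr_y'}))
defenders = tracking_example[tracking_example['team'] == 'DEF'].copy()
defenders = defenders.merge(wr_path, on='frameId')
defenders['separation'] = np.sqrt(
    (defenders['x'] - defenders['wr_x'])**2 +
    (defenders['y'] - defenders['wr_y'])**2
)
coverage_metrics = (defenders
    .groupby(['nflId', 'jerseyNumber', 'position'])
    .agg(
        avg_separation_allowed=('separation', 'mean'),
        min_separation=('separation', 'min'),
        tight_coverage_pct=('separation', lambda x: (x < 2).mean())
    )
    .reset_index()
    .sort_values('avg_separation_allowed')
)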
print("\nDefender Coverage Quality:")
print("Tight coverage defined as < 2 yards")
print(coverage_metrics.to_string(index=False))
Route Analysis and Path Clustering
Route Path Features
Extract features from route paths for clustering and classification:
#| label: route-features-r
#| message: false
#| warning: false
# Calculate route features
calculate_route_features <- function(player_data) {
player_data <- player_data %>% arrange(frameId)
tibble(
total_distance = sum(sqrt(player_data$dx^2 + player_data$dy^2), na.rm = TRUE),
depth = max(player_data$x) - min(player_data$x),
width = max(player_data$y) - min(player_data$y),
max_speed = max(player_data$s, na.rm = TRUE),
avg_speed = mean(player_data$s, na.rm = TRUE),
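# Wrap heading differences into [-180, 180) so crossing the ±180° boundary is not counted as a turn
direction_changes = sum(abs((diff(player_data$dir) + 180) %% 360 - 180) > 45, na.rm = TRUE),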
end_x = last(player_data$x),
end_y = last(player_data$y)
)
}
# Example route features for receiver
route_features <- tracking_example %>%
filter(position == "WR") %>%
calculate_route_features()
route_features %>%
pivot_longer(everything(), names_to = "Feature", values_to = "Value") %>%
gt() %>%
fmt_number(columns = Value, decimals = 2) %>%
cols_label(
Feature = "Route Feature",
Value = "Value"
) %>%
tab_header(title = "Route Path Features")
#| label: route-features-py
#| message: false
#| warning: false
def calculate_route_features(player_data):
    """Calculate features from route path"""
    player_data = player_data.sort_values('frameId')
    # Calculate distance
    dx = player_data['x'].diff().fillna(0)
    dy = player_data['y'].diff().fillna(0)
    total_distance = np.sqrt(dx**2 + dy**2).sum()
    # Direction changes: wrap heading differences into [-180, 180) so that
    # crossing the ±180° boundary is not counted as a turn
    dir_diff = player_data['dir'].diff().fillna(0)
    dir_diff = ((dir_diff + 180) % 360 - 180).abs()
    direction_changes = (dir_diff > 45).sum()
    features = {
        'total_distance': total_distance,
        'depth': player_data['x'].max() - player_data['x'].min(),
        'width': player_data['y'].max() - player_data['y'].min(),
        'max_speed': player_data['s'].max(),
        'avg_speed': player_data['s'].mean(),
        'direction_changes': direction_changes,
        'end_x': player_data['x'].iloc[-1],
        'end_y': player_data['y'].iloc[-1]
    }
    return features
# Calculate features for receiver
wr_data = tracking_example[tracking_example['position'] == 'WR']
route_features = calculate_route_features(wr_data)
print("\nRoute Path Features:")
for feature, value in route_features.items():
    print(f"{feature:20s}: {value:.2f}")
Route Clustering
Use clustering to identify common route patterns:
#| label: route-clustering-r
#| eval: false
#| echo: true
# Create dataset of multiple routes (synthetic example)
set.seed(456)
# Generate different route types
generate_route_set <- function(n_routes = 100) {
routes <- list()
for (i in 1:n_routes) {
route_type <- sample(c("go", "slant", "out", "post"), 1)
if (route_type == "go") {
# Straight vertical route
route <- generate_trajectory(75, 20, 95, 22 + rnorm(1, 0, 2))
} else if (route_type == "slant") {
# Diagonal route
route <- generate_trajectory(75, 20, 85, 26 + rnorm(1, 0, 1))
} else if (route_type == "out") {
# Out route
route <- generate_trajectory(75, 20, 85, 15 + rnorm(1, 0, 1))
} else {
# Post route
route <- generate_trajectory(75, 20, 95, 26 + rnorm(1, 0, 1))
}
route$route_id <- i
route$true_type <- route_type
routes[[i]] <- route
}
bind_rows(routes)
}
# Generate routes
all_routes <- generate_route_set()
# Calculate features for each route
route_features <- all_routes %>%
group_by(route_id) %>%
summarise(
depth = max(x) - min(x),
width = max(y) - min(y),
end_y = last(y),
true_type = first(true_type),
.groups = "drop"
)
# Perform k-means clustering
set.seed(789)
kmeans_result <- route_features %>%
select(depth, width, end_y) %>%
scale() %>%
kmeans(centers = 4, nstart = 25)
# Add cluster assignments
route_features$cluster <- as.factor(kmeans_result$cluster)
# Visualize clusters
ggplot(route_features, aes(x = depth, y = end_y, color = cluster)) +
geom_point(size = 3, alpha = 0.6) +
labs(
title = "Route Clustering",
subtitle = "Routes grouped by depth and ending position",
x = "Route Depth (yards)",
y = "Ending Y Position (yards)",
color = "Cluster"
) +
theme_minimal()
#| label: route-clustering-py
#| eval: false
#| echo: true
# Create dataset of multiple routes (synthetic example)
np.random.seed(456)
def generate_route_set(n_routes=100):
    """Generate multiple routes of different types"""
    routes = []
    for i in range(n_routes):
        route_type = np.random.choice(['go', 'slant', 'out', 'post'])
        if route_type == 'go':
            route = generate_trajectory(75, 20, 95, 22 + np.random.normal(0, 2))
        elif route_type == 'slant':
            route = generate_trajectory(75, 20, 85, 26 + np.random.normal(0, 1))
        elif route_type == 'out':
            route = generate_trajectory(75, 20, 85, 15 + np.random.normal(0, 1))
        else:  # post
            route = generate_trajectory(75, 20, 95, 26 + np.random.normal(0, 1))
        route['route_id'] = i
        route['true_type'] = route_type
        routes.append(route)
    return pd.concat(routes, ignore_index=True)
# Generate routes
all_routes = generate_route_set()
# Calculate features for each route
route_features = (all_routes
.groupby('route_id')
.agg(
depth=('x', lambda x: x.max() - x.min()),
width=('y', lambda x: x.max() - x.min()),
end_y=('y', 'last'),
true_type=('true_type', 'first')
)
.reset_index()
)
# Perform k-means clustering
scaler = StandardScaler()
features_scaled = scaler.fit_transform(route_features[['depth', 'width', 'end_y']])
kmeans = KMeans(n_clusters=4, random_state=789, n_init=25)
route_features['cluster'] = kmeans.fit_predict(features_scaled)
# Visualize clusters
plt.figure(figsize=(10, 6))
for cluster in range(4):
    cluster_data = route_features[route_features['cluster'] == cluster]
    plt.scatter(cluster_data['depth'], cluster_data['end_y'],
                label=f'Cluster {cluster}', alpha=0.6, s=50)
plt.xlabel('Route Depth (yards)', fontsize=12)
plt.ylabel('Ending Y Position (yards)', fontsize=12)
plt.title('Route Clustering\nRoutes grouped by depth and ending position',
fontsize=14, fontweight='bold')
plt.legend(title='Cluster')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
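Because the synthetic routes carry a `true_type` label, the clusters can be checked against the route types that generated them. One simple measure is purity: the share of routes that belong to the majority type of their cluster. The sketch below is a pure-Python illustration; `cluster_purity` is a hypothetical helper, and purity is only one of several possible agreement measures.

```python
from collections import Counter, defaultdict


def cluster_purity(true_labels, cluster_ids):
    """Share of items whose label matches the majority label of their cluster."""
    true_labels = list(true_labels)
    cluster_ids = list(cluster_ids)
    if not true_labels or len(true_labels) != len(cluster_ids):
        raise ValueError("inputs must be non-empty and of equal length")
    members = defaultdict(list)
    for label, cluster in zip(true_labels, cluster_ids):
        members[cluster].append(label)
    # Count the most common label inside each cluster
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in members.values()
    )
    return majority_total / len(true_labels)


# Eight routes, one post route grouped with the go routes
true_types = ["go", "go", "post", "post", "slant", "slant", "out", "out"]
clusters = [0, 0, 0, 1, 2, 2, 3, 3]
print(cluster_purity(true_types, clusters))
```

With the objects from the clustering code, the call would be `cluster_purity(route_features['true_type'], route_features['cluster'])`. Purity tends to rise as the number of clusters grows, so it is most informative when the cluster count is held fixed.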
Expected Completion Models with Tracking
Feature Engineering
Create features from tracking data for completion probability models:
#| label: xcomp-features-r
#| eval: false
#| echo: true
# Function to calculate features at target time (when ball arrives)
calculate_completion_features <- function(tracking_data, target_frame) {
target_data <- tracking_data %>% filter(frameId == target_frame)
# Get receiver and defender positions
receiver <- target_data %>% filter(position == "WR")
defenders <- target_data %>% filter(team == "DEF")
# Calculate separation metrics
distances_to_defenders <- sqrt(
(defenders$x - receiver$x)^2 + (defenders$y - receiver$y)^2
)
tibble(
separation = min(distances_to_defenders),
defenders_within_1yd = sum(distances_to_defenders < 1),
defenders_within_2yd = sum(distances_to_defenders < 2),
defenders_within_3yd = sum(distances_to_defenders < 3),
receiver_speed = receiver$s,
closest_defender_speed = defenders$s[which.min(distances_to_defenders)],
target_x = receiver$x,
target_y = receiver$y,
depth_of_target = receiver$x - 75 # assuming LOS at 75
)
}
# Example usage
target_frame <- 30 # Frame when ball arrives
completion_features <- calculate_completion_features(tracking_example, target_frame)
completion_features %>%
pivot_longer(everything(), names_to = "Feature", values_to = "Value") %>%
gt() %>%
fmt_number(columns = Value, decimals = 2) %>%
tab_header(title = "Completion Probability Features")
#| label: xcomp-features-py
#| eval: false
#| echo: true
def calculate_completion_features(tracking_data, target_frame):
    """Calculate features at target time for completion model"""
    target_data = tracking_data[tracking_data['frameId'] == target_frame]
    # Get receiver and defenders
    receiver = target_data[target_data['position'] == 'WR'].iloc[0]
    defenders = target_data[target_data['team'] == 'DEF']
    # Calculate distances
    distances = np.sqrt(
        (defenders['x'] - receiver['x'])**2 +
        (defenders['y'] - receiver['y'])**2
    )
    features = {
        'separation': distances.min(),
        'defenders_within_1yd': (distances < 1).sum(),
        'defenders_within_2yd': (distances < 2).sum(),
        'defenders_within_3yd': (distances < 3).sum(),
        'receiver_speed': receiver['s'],
        'closest_defender_speed': defenders.iloc[distances.argmin()]['s'],
        'target_x': receiver['x'],
        'target_y': receiver['y'],
        'depth_of_target': receiver['x'] - 75  # assuming LOS at 75
    }
    return features
# Example usage
target_frame = 30 # Frame when ball arrives
completion_features = calculate_completion_features(tracking_example, target_frame)
print("\nCompletion Probability Features:")
for feature, value in completion_features.items():
    print(f"{feature:25s}: {value:.2f}")
Expected Completion Model
Build a logistic regression model for completion probability:
#| label: xcomp-model-r
#| eval: false
#| echo: true
# Load completion data (synthetic example)
# In practice, you would load actual tracking data with completion outcomes
set.seed(999)
n_plays <- 1000
# Simulate features and outcomes
completion_data <- tibble(
separation = abs(rnorm(n_plays, 2.5, 1.5)),
defenders_within_2yd = rpois(n_plays, 0.8),
receiver_speed = abs(rnorm(n_plays, 10, 2)),
depth_of_target = abs(rnorm(n_plays, 15, 8)),
air_yards = abs(rnorm(n_plays, 12, 6))
) %>%
mutate(
# Simulate completion probability
xcomp = plogis(
0.5 +
0.3 * separation -
0.4 * defenders_within_2yd -
0.02 * depth_of_target +
0.01 * receiver_speed
),
# Simulate actual completions
completion = rbinom(n_plays, 1, xcomp)
)
# Fit logistic regression model
xcomp_model <- glm(
completion ~ separation + defenders_within_2yd +
receiver_speed + depth_of_target + air_yards,
data = completion_data,
family = binomial(link = "logit")
)
# Model summary
summary(xcomp_model)
# Calculate predicted probabilities
completion_data$predicted_xcomp <- predict(xcomp_model, type = "response")
# Evaluate model performance
library(pROC)
roc_obj <- roc(completion_data$completion, completion_data$predicted_xcomp)
auc(roc_obj)
#| label: xcomp-model-py
#| eval: false
#| echo: true
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, roc_curve
# Simulate features and outcomes
np.random.seed(999)
n_plays = 1000
completion_data = pd.DataFrame({
'separation': np.abs(np.random.normal(2.5, 1.5, n_plays)),
'defenders_within_2yd': np.random.poisson(0.8, n_plays),
'receiver_speed': np.abs(np.random.normal(10, 2, n_plays)),
'depth_of_target': np.abs(np.random.normal(15, 8, n_plays)),
'air_yards': np.abs(np.random.normal(12, 6, n_plays))
})
# Simulate completion probability
from scipy.special import expit
completion_data['xcomp'] = expit(
0.5 +
0.3 * completion_data['separation'] -
0.4 * completion_data['defenders_within_2yd'] -
0.02 * completion_data['depth_of_target'] +
0.01 * completion_data['receiver_speed']
)
completion_data['completion'] = np.random.binomial(
1, completion_data['xcomp']
)
# Prepare features and target
features = ['separation', 'defenders_within_2yd', 'receiver_speed',
'depth_of_target', 'air_yards']
X = completion_data[features]
y = completion_data['completion']
# Split data
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
# Fit logistic regression
xcomp_model = LogisticRegression(random_state=42, max_iter=1000)
xcomp_model.fit(X_train, y_train)
# Predictions
y_pred_proba = xcomp_model.predict_proba(X_test)[:, 1]
# Evaluate
auc_score = roc_auc_score(y_test, y_pred_proba)
print(f"\nModel AUC: {auc_score:.3f}")
# Feature importance
print("\nFeature Coefficients:")
for feature, coef in zip(features, xcomp_model.coef_[0]):
    print(f"{feature:25s}: {coef:7.3f}")
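AUC only measures how well the model ranks plays. For an expected completion model, the predicted probabilities should also match observed completion rates. The following is a minimal sketch of a binned calibration check and a Brier score, assuming predictions and outcomes are available as arrays. The helper names `calibration_table` and `brier_score` and the simulated inputs are illustrative and are not part of the chapter's model.

```python
import numpy as np
import pandas as pd

def calibration_table(y_true, y_prob, n_bins=5):
    """Compare mean predicted probability with observed rate in equal-width bins."""
    df = pd.DataFrame({"y_true": np.asarray(y_true), "y_prob": np.asarray(y_prob)})
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    df["bin"] = pd.cut(df["y_prob"], bins=edges, include_lowest=True)
    return (
        df.groupby("bin", observed=True)
        .agg(n=("y_true", "size"),
             mean_predicted=("y_prob", "mean"),
             observed_rate=("y_true", "mean"))
        .reset_index()
    )

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return float(np.mean((np.asarray(y_prob) - np.asarray(y_true)) ** 2))

# Simulated, perfectly calibrated predictions (for illustration only)
rng = np.random.default_rng(0)
demo_prob = rng.uniform(0.05, 0.95, 5000)
demo_outcome = rng.binomial(1, demo_prob)

table = calibration_table(demo_outcome, demo_prob)
brier = brier_score(demo_outcome, demo_prob)
print(table)
print(f"Brier score: {brier:.3f}")
```

In a well-calibrated model the `mean_predicted` and `observed_rate` columns stay close in every bin. With real data, `y_test` and `y_pred_proba` from the chunk above would take the place of the simulated arrays.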
Defensive Alignment Detection
Formation Detection
Identify defensive formations from pre-snap positioning:
#| label: formation-detection-r
#| eval: false
#| echo: true
# Function to detect defensive alignment
detect_defensive_formation <- function(tracking_data, snap_frame = 1) {
# Get defender positions at snap
defenders <- tracking_data %>%
filter(frameId == snap_frame, team == "DEF") %>%
arrange(x)
# Count defenders by region
box_count <- sum(defenders$y > 14 & defenders$y < 40 & defenders$x < 80)
deep_safeties <- sum(defenders$x > 85)
# Classify formation
formation <- case_when(
deep_safeties >= 2 & box_count <= 6 ~ "Cover 2",
deep_safeties == 1 & box_count <= 6 ~ "Cover 3",
deep_safeties == 0 ~ "Cover 0/1",
box_count >= 8 ~ "Heavy Box",
TRUE ~ "Other"
)
list(
formation = formation,
box_count = box_count,
deep_safeties = deep_safeties,
defender_positions = defenders %>% select(x, y, position)
)
}
# Example usage
formation_info <- detect_defensive_formation(tracking_example)
cat("Detected Formation:", formation_info$formation, "\n")
cat("Box Count:", formation_info$box_count, "\n")
cat("Deep Safeties:", formation_info$deep_safeties, "\n")
#| label: formation-detection-py
#| eval: false
#| echo: true
def detect_defensive_formation(tracking_data, snap_frame=1):
    """Detect defensive formation from pre-snap positioning"""
    # Get defender positions at snap
    defenders = (tracking_data
                 .query(f"frameId == {snap_frame} & team == 'DEF'")
                 .sort_values('x')
                 )
    # Count defenders by region
    box_count = ((defenders['y'] > 14) &
                 (defenders['y'] < 40) &
                 (defenders['x'] < 80)).sum()
    deep_safeties = (defenders['x'] > 85).sum()
    # Classify formation
    if deep_safeties >= 2 and box_count <= 6:
        formation = "Cover 2"
    elif deep_safeties == 1 and box_count <= 6:
        formation = "Cover 3"
    elif deep_safeties == 0:
        formation = "Cover 0/1"
    elif box_count >= 8:
        formation = "Heavy Box"
    else:
        formation = "Other"
    return {
        'formation': formation,
        'box_count': box_count,
        'deep_safeties': deep_safeties,
        'defender_positions': defenders[['x', 'y', 'position']]
    }
# Example usage
formation_info = detect_defensive_formation(tracking_example)
print(f"\nDetected Formation: {formation_info['formation']}")
print(f"Box Count: {formation_info['box_count']}")
print(f"Deep Safeties: {formation_info['deep_safeties']}")
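Because the classifier is an ordered chain of rules, the order of the checks decides the label when more than one rule could apply. The sketch below isolates the same thresholds in a small helper (the name `classify_shell` is hypothetical) so the precedence can be tested without tracking data. Note that a frame with no deep safeties is labeled "Cover 0/1" before the box-count rule is ever reached.

```python
def classify_shell(deep_safeties, box_count):
    """Rule chain with the same thresholds and order as the functions above."""
    if deep_safeties >= 2 and box_count <= 6:
        return "Cover 2"
    if deep_safeties == 1 and box_count <= 6:
        return "Cover 3"
    if deep_safeties == 0:
        return "Cover 0/1"
    if box_count >= 8:
        return "Heavy Box"
    return "Other"

# (deep_safeties, box_count) pairs chosen to hit every branch
cases = [(2, 6), (1, 5), (0, 9), (1, 8), (2, 7)]
labels = {case: classify_shell(*case) for case in cases}
for case, label in labels.items():
    print(case, "->", label)
```

If a loaded box should take priority over the safety count, the `box_count >= 8` check would have to move to the top of the chain. Which order is right depends on how the labels are used downstream.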
Defensive Alignment Visualization
#| label: fig-formation-viz-r
#| fig-cap: "Defensive alignment at snap"
#| fig-width: 12
#| fig-height: 8
#| eval: false
#| echo: true
# Visualize defensive formation
plot_defensive_formation <- function(tracking_data, snap_frame = 1) {
snap_data <- tracking_data %>% filter(frameId == snap_frame)
ggplot() +
# Field boundaries
geom_rect(aes(xmin = 0, xmax = 120, ymin = 0, ymax = 53.3),
fill = "#196f0c", alpha = 0.3) +
# Yard lines
geom_vline(xintercept = seq(10, 110, by = 10),
color = "white", alpha = 0.3) +
# Line of scrimmage
geom_vline(xintercept = 75, color = "yellow", linewidth = 1) +
# Plot defenders
geom_point(data = snap_data %>% filter(team == "DEF"),
aes(x = x, y = y),
color = "red", size = 6, alpha = 0.8) +
geom_text(data = snap_data %>% filter(team == "DEF"),
aes(x = x, y = y, label = jerseyNumber),
color = "white", size = 3, fontface = "bold") +
# Plot offense
geom_point(data = snap_data %>% filter(team == "OFF"),
aes(x = x, y = y),
color = "blue", size = 6, alpha = 0.8) +
geom_text(data = snap_data %>% filter(team == "OFF"),
aes(x = x, y = y, label = jerseyNumber),
color = "white", size = 3, fontface = "bold") +
# Formatting
coord_fixed() +
scale_x_continuous(limits = c(60, 95)) +
scale_y_continuous(limits = c(0, 53.3)) +
labs(
title = "Defensive Alignment at Snap",
subtitle = paste("Formation:", detect_defensive_formation(tracking_data, snap_frame)$formation),
x = "Field Position (yards)",
y = "Field Width (yards)"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14),
panel.grid = element_blank()
)
}
plot_defensive_formation(tracking_example)
#| label: fig-formation-viz-py
#| fig-cap: "Defensive alignment at snap - Python"
#| fig-width: 12
#| fig-height: 8
#| eval: false
#| echo: true
def plot_defensive_formation(tracking_data, snap_frame=1):
    """Visualize defensive formation"""
    snap_data = tracking_data[tracking_data['frameId'] == snap_frame]
    # Classify the formation from the data passed in rather than a global
    formation = detect_defensive_formation(tracking_data, snap_frame)['formation']
    fig, ax = plt.subplots(figsize=(12, 8))
    # Field background
    ax.add_patch(plt.Rectangle((0, 0), 120, 53.3,
                               facecolor='#196f0c', alpha=0.3))
    # Yard lines
    for x in range(10, 111, 10):
        ax.axvline(x, color='white', alpha=0.3, linewidth=0.5)
    # Line of scrimmage
    ax.axvline(75, color='yellow', linewidth=2)
    # Plot defenders
    defenders = snap_data[snap_data['team'] == 'DEF']
    ax.scatter(defenders['x'], defenders['y'],
               c='red', s=200, alpha=0.8, zorder=3)
    for _, player in defenders.iterrows():
        ax.text(player['x'], player['y'], str(int(player['jerseyNumber'])),
                color='white', fontsize=10, fontweight='bold',
                ha='center', va='center', zorder=4)
    # Plot offense
    offense = snap_data[snap_data['team'] == 'OFF']
    ax.scatter(offense['x'], offense['y'],
               c='blue', s=200, alpha=0.8, zorder=3)
    for _, player in offense.iterrows():
        ax.text(player['x'], player['y'], str(int(player['jerseyNumber'])),
                color='white', fontsize=10, fontweight='bold',
                ha='center', va='center', zorder=4)
    ax.set_xlim(60, 95)
    ax.set_ylim(0, 53.3)
    ax.set_aspect('equal')
    ax.set_xlabel('Field Position (yards)', fontsize=12)
    ax.set_ylabel('Field Width (yards)', fontsize=12)
    ax.set_title(f'Defensive Alignment at Snap\nFormation: {formation}',
                 fontsize=14, fontweight='bold')
    ax.grid(False)
    plt.tight_layout()
    plt.show()
plot_defensive_formation(tracking_example)
Spatial Control and Voronoi Diagrams
Voronoi Tessellation
Voronoi diagrams partition the field into regions based on which player is closest to each point:
#| label: voronoi-r
#| eval: false
#| echo: true
library(deldir)
library(ggvoronoi)
# Function to create Voronoi diagram
create_voronoi_plot <- function(tracking_data, frame_num) {
frame_data <- tracking_data %>%
filter(frameId == frame_num) %>%
select(x, y, team, jerseyNumber)
# Create Voronoi tessellation
ggplot(frame_data, aes(x = x, y = y, fill = team)) +
# Field background
geom_rect(aes(xmin = 0, xmax = 120, ymin = 0, ymax = 53.3),
fill = "#196f0c", alpha = 0.3, inherit.aes = FALSE) +
# Voronoi regions
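# Clip the tessellation to the field rectangle, not to the player points
geom_voronoi(alpha = 0.4,
outline = data.frame(x = c(0, 120, 120, 0),
y = c(0, 0, 53.3, 53.3),
group = 1)) +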
# Player positions
geom_point(aes(color = team), size = 5) +
geom_text(aes(label = jerseyNumber),
color = "white", size = 3, fontface = "bold") +
# Styling
scale_fill_manual(values = c("OFF" = "blue", "DEF" = "red")) +
scale_color_manual(values = c("OFF" = "blue", "DEF" = "red")) +
coord_fixed(xlim = c(60, 100), ylim = c(0, 53.3)) +
labs(
title = "Spatial Control via Voronoi Diagram",
subtitle = paste("Frame:", frame_num),
x = "Field Position (yards)",
y = "Field Width (yards)"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14),
legend.position = "top"
)
}
# Create Voronoi plot
create_voronoi_plot(tracking_example, frame_num = 25)
#| label: voronoi-py
#| eval: false
#| echo: true
from scipy.spatial import Voronoi, voronoi_plot_2d

def create_voronoi_plot(tracking_data, frame_num):
    """Create Voronoi diagram for spatial control"""
    frame_data = tracking_data[tracking_data['frameId'] == frame_num]
    # Get player positions
    points = frame_data[['x', 'y']].values
    # Create Voronoi diagram
    vor = Voronoi(points)
    fig, ax = plt.subplots(figsize=(12, 8))
    # Field background
    ax.add_patch(plt.Rectangle((0, 0), 120, 53.3,
                               facecolor='#196f0c', alpha=0.3))
    # Plot Voronoi regions
    voronoi_plot_2d(vor, ax=ax, show_vertices=False,
                    line_colors='black', line_width=1.5)
    # Color regions by team; vor.point_region[i] is the region index of player i
    for player_idx, region_idx in enumerate(vor.point_region):
        region = vor.regions[region_idx]
        # Unbounded regions contain -1 and are skipped, so players on the
        # outer edge of the formation are left unshaded
        if -1 not in region and len(region) > 0:
            polygon = [vor.vertices[i] for i in region]
            team = frame_data.iloc[player_idx]['team']
            color = 'blue' if team == 'OFF' else 'red'
            ax.fill(*zip(*polygon), alpha=0.3, color=color)
    # Plot players
    for team in ['OFF', 'DEF']:
        team_data = frame_data[frame_data['team'] == team]
        color = 'blue' if team == 'OFF' else 'red'
        ax.scatter(team_data['x'], team_data['y'],
                   c=color, s=200, edgecolors='white', linewidths=2, zorder=3)
        for _, player in team_data.iterrows():
            ax.text(player['x'], player['y'],
                    str(int(player['jerseyNumber'])),
                    color='white', fontsize=10, fontweight='bold',
                    ha='center', va='center', zorder=4)
    ax.set_xlim(60, 100)
    ax.set_ylim(0, 53.3)
    ax.set_aspect('equal')
    ax.set_xlabel('Field Position (yards)', fontsize=12)
    ax.set_ylabel('Field Width (yards)', fontsize=12)
    ax.set_title(f'Spatial Control via Voronoi Diagram\nFrame: {frame_num}',
                 fontsize=14, fontweight='bold')
    plt.tight_layout()
    plt.show()
create_voronoi_plot(tracking_example, frame_num=25)
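A Voronoi diagram can also be summarized as a number: the share of the visible field that is closest to each team. The sketch below approximates that share by laying a grid of cell centers over the plotting window and assigning each cell to its nearest player, which is the Voronoi rule applied cell by cell. The function name `voronoi_area_share`, the 0.5-yard step, and the two-player demo are assumptions for illustration, not part of the chapter's data.

```python
import numpy as np

def voronoi_area_share(player_xy, player_team,
                       x_range=(60, 100), y_range=(0, 53.3), step=0.5):
    """Approximate each team's share of Voronoi area on a grid of cell centers."""
    player_xy = np.asarray(player_xy, dtype=float)
    player_team = np.asarray(player_team)
    # Cell centers inside the window
    gx = np.arange(x_range[0] + step / 2, x_range[1], step)
    gy = np.arange(y_range[0] + step / 2, y_range[1], step)
    xx, yy = np.meshgrid(gx, gy)
    cells = np.column_stack([xx.ravel(), yy.ravel()])
    # Squared distance from every cell to every player: (n_cells, n_players)
    d2 = ((cells[:, None, :] - player_xy[None, :, :]) ** 2).sum(axis=2)
    owner = player_team[d2.argmin(axis=1)]  # team of the nearest player
    teams, counts = np.unique(owner, return_counts=True)
    return {str(t): float(c) / len(owner) for t, c in zip(teams, counts)}

# Two players placed symmetrically about x = 80 split the window evenly
shares = voronoi_area_share([[70.0, 26.65], [90.0, 26.65]], ["OFF", "DEF"])
print(shares)
```

A finer `step` gives a closer approximation at the cost of more distance calculations. Unlike the polygon plot, this version has no unbounded regions to handle, because the window itself bounds every cell.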
Pitch Control Model
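A more sophisticated approach models spatial control with influence functions, where each player's influence on a location decays with distance and grows with the player's speed: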
#| label: pitch-control-r
#| eval: false
#| echo: true
# Simplified influence-based control surface, loosely inspired by the
# pitch control model of Spearman et al. (2017); not a full implementation
calculate_pitch_control <- function(tracking_data, frame_num,
                                    grid_x = seq(60, 100, 1),
                                    grid_y = seq(0, 53.3, 1)) {
  frame_data <- tracking_data %>% filter(frameId == frame_num)
  # Split by team once, outside the grid loop
  off_players <- frame_data %>% filter(team == "OFF")
  def_players <- frame_data %>% filter(team == "DEF")
  # Create grid
  grid <- expand_grid(grid_x = grid_x, grid_y = grid_y)
  grid$off_control <- 0
  grid$def_control <- 0
  # Calculate influence for each grid point
  for (i in seq_len(nrow(grid))) {
    gx <- grid$grid_x[i]
    gy <- grid$grid_y[i]
    # Offensive influence: decays with distance, scaled by speed
    off_distances <- sqrt((off_players$x - gx)^2 + (off_players$y - gy)^2)
    off_influence <- sum(exp(-off_distances / (off_players$s + 1)))
    # Defensive influence
    def_distances <- sqrt((def_players$x - gx)^2 + (def_players$y - gy)^2)
    def_influence <- sum(exp(-def_distances / (def_players$s + 1)))
    # Each team's share of total influence at this grid point
    total_influence <- off_influence + def_influence
    grid$off_control[i] <- off_influence / total_influence
    grid$def_control[i] <- def_influence / total_influence
  }
  grid
}
# Calculate and visualize pitch control
pitch_control <- calculate_pitch_control(tracking_example, frame_num = 25)
ggplot(pitch_control, aes(x = grid_x, y = grid_y, fill = off_control)) +
geom_tile() +
scale_fill_gradient2(
low = "red", mid = "white", high = "blue",
midpoint = 0.5,
name = "Offensive\nControl"
) +
geom_point(
data = tracking_example %>% filter(frameId == 25),
aes(x = x, y = y, color = team),
size = 5, inherit.aes = FALSE
) +
scale_color_manual(values = c("OFF" = "blue", "DEF" = "red")) +
coord_fixed() +
labs(
title = "Pitch Control Model",
subtitle = "Probability of offensive control at each location",
x = "Field Position (yards)",
y = "Field Width (yards)"
) +
theme_minimal()
#| label: pitch-control-py
#| eval: false
#| echo: true
def calculate_pitch_control(tracking_data, frame_num,
                            grid_x=np.arange(60, 101, 1),
                            grid_y=np.arange(0, 54, 1)):
    """Calculate pitch control using influence functions"""
    frame_data = tracking_data[tracking_data['frameId'] == frame_num]
    # Create grid
    xx, yy = np.meshgrid(grid_x, grid_y)
    # Initialize control arrays
    off_control = np.zeros_like(xx, dtype=float)
    def_control = np.zeros_like(xx, dtype=float)
    # Get player data
    off_players = frame_data[frame_data['team'] == 'OFF']
    def_players = frame_data[frame_data['team'] == 'DEF']
    # Calculate influence for each grid point
    for i in range(xx.shape[0]):
        for j in range(xx.shape[1]):
            gx, gy = xx[i, j], yy[i, j]
            # Offensive influence
            off_distances = np.sqrt(
                (off_players['x'] - gx)**2 + (off_players['y'] - gy)**2
            )
            off_speeds = off_players['s'].values
            off_influence = np.sum(np.exp(-off_distances / (off_speeds + 1)))
            # Defensive influence
            def_distances = np.sqrt(
                (def_players['x'] - gx)**2 + (def_players['y'] - gy)**2
            )
            def_speeds = def_players['s'].values
            def_influence = np.sum(np.exp(-def_distances / (def_speeds + 1)))
            # Calculate control probability
            total_influence = off_influence + def_influence
            off_control[i, j] = off_influence / total_influence
            def_control[i, j] = def_influence / total_influence
    return xx, yy, off_control, def_control, frame_data
# Calculate and visualize
xx, yy, off_control, def_control, frame_data = calculate_pitch_control(
tracking_example, frame_num=25
)
plt.figure(figsize=(12, 8))
# Plot heatmap
plt.contourf(xx, yy, off_control, levels=20, cmap='RdBu', alpha=0.8)
plt.colorbar(label='Offensive Control Probability')
# Plot players
off = frame_data[frame_data['team'] == 'OFF']
plt.scatter(off['x'], off['y'], c='blue', s=200, edgecolors='white',
linewidths=2, label='Offense', zorder=3)
deff = frame_data[frame_data['team'] == 'DEF']
plt.scatter(deff['x'], deff['y'], c='red', s=200, edgecolors='white',
linewidths=2, label='Defense', zorder=3)
plt.xlabel('Field Position (yards)', fontsize=12)
plt.ylabel('Field Width (yards)', fontsize=12)
plt.title('Pitch Control Model\nProbability of offensive control at each location',
fontsize=14, fontweight='bold')
plt.legend()
plt.axis('equal')
plt.tight_layout()
plt.show()
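The nested loops above evaluate the same expression at every grid point, so the calculation can be written with NumPy broadcasting instead. The sketch below uses the same influence function, exp(-distance / (speed + 1)), and returns the offensive share of total influence. The function name `influence_control` and the two-player demo are assumptions for illustration. It takes plain arrays rather than the chapter's data frame.

```python
import numpy as np

def influence_control(off_xy, off_speed, def_xy, def_speed, grid_x, grid_y):
    """Offensive share of influence on a grid, computed without Python loops."""
    xx, yy = np.meshgrid(grid_x, grid_y)
    cells = np.stack([xx, yy], axis=-1).astype(float)  # shape (ny, nx, 2)

    def team_influence(xy, speed):
        xy = np.asarray(xy, dtype=float)
        speed = np.asarray(speed, dtype=float)
        # Distance from every cell to every player: shape (ny, nx, n_players)
        dist = np.linalg.norm(cells[:, :, None, :] - xy[None, None, :, :], axis=-1)
        return np.exp(-dist / (speed + 1.0)).sum(axis=-1)

    off = team_influence(off_xy, off_speed)
    deff = team_influence(def_xy, def_speed)
    return xx, yy, off / (off + deff)

# One player per team, equal speed, mirrored about x = 80
grid_x = np.arange(60, 101, 1)
grid_y = np.arange(0, 54, 1)
xx, yy, control = influence_control(
    off_xy=[[70.0, 25.0]], off_speed=[5.0],
    def_xy=[[90.0, 25.0]], def_speed=[5.0],
    grid_x=grid_x, grid_y=grid_y,
)
print(control[25, 10], control[25, 20])  # at the offensive player, and at x = 80
```

With mirrored players of equal speed, control is exactly one half along the line midway between them and rises toward one near the offensive player, which is a quick sanity check before running the model on a full frame.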
Animation and Visualization
Static Frame Visualization
#| label: fig-field-viz-r
#| fig-cap: "Field visualization with player positions"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false
# Function to plot a single frame
plot_frame <- function(tracking_data, frame_num) {
frame_data <- tracking_data %>% filter(frameId == frame_num)
ggplot() +
# Field
geom_rect(aes(xmin = 0, xmax = 120, ymin = 0, ymax = 53.3),
fill = "#196f0c", alpha = 0.5) +
# Hash marks and yard lines
geom_vline(xintercept = seq(10, 110, by = 5),
color = "white", alpha = 0.3) +
geom_hline(yintercept = c(0, 53.3), color = "white") +
# Line of scrimmage
geom_vline(xintercept = 75, color = "yellow", linewidth = 1.5) +
# Players
geom_point(data = frame_data, aes(x = x, y = y, color = team),
size = 8, alpha = 0.8) +
geom_text(data = frame_data, aes(x = x, y = y, label = jerseyNumber),
color = "white", size = 3, fontface = "bold") +
# Paths (trace), drawn from the data passed to the function
geom_path(data = tracking_data %>%
filter(frameId <= frame_num),
aes(x = x, y = y, color = team, group = nflId),
alpha = 0.3, linewidth = 0.8) +
scale_color_manual(
values = c("OFF" = "#0000FF", "DEF" = "#FF0000"),
labels = c("OFF" = "Offense", "DEF" = "Defense")
) +
coord_fixed(xlim = c(65, 105), ylim = c(0, 53.3)) +
labs(
title = paste("Play Tracking - Frame", frame_num),
x = "Field Position (yards)",
y = "Field Width (yards)",
color = "Team"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14),
panel.grid = element_blank(),
legend.position = "top"
)
}
# Plot a single frame
plot_frame(tracking_example, frame_num = 25)
#| label: fig-field-viz-py
#| fig-cap: "Field visualization with player positions - Python"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false
def plot_frame(tracking_data, frame_num):
    """Plot a single frame of tracking data"""
    frame_data = tracking_data[tracking_data['frameId'] == frame_num]
    fig, ax = plt.subplots(figsize=(12, 8))
    # Field
    ax.add_patch(plt.Rectangle((0, 0), 120, 53.3,
                               facecolor='#196f0c', alpha=0.5))
    # Yard lines
    for x in range(10, 111, 5):
        ax.axvline(x, color='white', alpha=0.3, linewidth=0.5)
    # Sidelines
    ax.axhline(0, color='white', linewidth=1)
    ax.axhline(53.3, color='white', linewidth=1)
    # Line of scrimmage
    ax.axvline(75, color='yellow', linewidth=2)
    # Player paths (trace)
    for nfl_id in tracking_data['nflId'].unique():
        player_path = tracking_data[
            (tracking_data['nflId'] == nfl_id) &
            (tracking_data['frameId'] <= frame_num)
        ]
        team = player_path['team'].iloc[0]
        color = 'blue' if team == 'OFF' else 'red'
        ax.plot(player_path['x'], player_path['y'],
                color=color, alpha=0.3, linewidth=1.5)
    # Current player positions
    for team, color in [('OFF', 'blue'), ('DEF', 'red')]:
        team_data = frame_data[frame_data['team'] == team]
        ax.scatter(team_data['x'], team_data['y'],
                   c=color, s=250, alpha=0.8, edgecolors='white',
                   linewidths=2, zorder=3)
        for _, player in team_data.iterrows():
            ax.text(player['x'], player['y'],
                    str(int(player['jerseyNumber'])),
                    color='white', fontsize=10, fontweight='bold',
                    ha='center', va='center', zorder=4)
    ax.set_xlim(65, 105)
    ax.set_ylim(0, 53.3)
    ax.set_aspect('equal')
    ax.set_xlabel('Field Position (yards)', fontsize=12)
    ax.set_ylabel('Field Width (yards)', fontsize=12)
    ax.set_title(f'Play Tracking - Frame {frame_num}',
                 fontsize=14, fontweight='bold')
    ax.grid(False)
    # Legend
    from matplotlib.patches import Patch
    legend_elements = [
        Patch(facecolor='blue', label='Offense'),
        Patch(facecolor='red', label='Defense')
    ]
    ax.legend(handles=legend_elements, loc='upper right')
    plt.tight_layout()
    plt.show()
plot_frame(tracking_example, frame_num=25)
Animated Visualization
#| label: animation-r
#| eval: false
#| echo: true
library(gganimate)
# Create animated visualization
create_play_animation <- function(tracking_data) {
p <- ggplot() +
# Field
geom_rect(aes(xmin = 0, xmax = 120, ymin = 0, ymax = 53.3),
fill = "#196f0c", alpha = 0.5) +
# Yard lines
geom_vline(xintercept = seq(10, 110, by = 5),
color = "white", alpha = 0.3) +
# Line of scrimmage
geom_vline(xintercept = 75, color = "yellow", linewidth = 1.5) +
# Player paths (built up progressively by transition_reveal)
geom_path(data = tracking_data,
aes(x = x, y = y, color = team, group = nflId),
alpha = 0.3, linewidth = 0.8) +
# Players (grouped by player so each keeps one current marker)
geom_point(data = tracking_data,
aes(x = x, y = y, color = team, group = nflId),
size = 8, alpha = 0.8) +
geom_text(data = tracking_data,
aes(x = x, y = y, label = jerseyNumber, group = nflId),
color = "white", size = 3, fontface = "bold") +
scale_color_manual(
values = c("OFF" = "#0000FF", "DEF" = "#FF0000")
) +
coord_fixed(xlim = c(65, 105), ylim = c(0, 53.3)) +
labs(
title = "Play Animation - Frame {round(frame_along)}",
x = "Field Position (yards)",
y = "Field Width (yards)",
color = "Team"
) +
theme_minimal() +
theme(panel.grid = element_blank()) +
# Animation: transition_reveal() keeps earlier path data visible, whereas
# transition_manual() shows one frame's rows at a time and would leave
# geom_path() with a single point per player (no visible trail)
transition_reveal(frameId)
# Animate
animate(p, nframes = max(tracking_data$frameId),
fps = 10, width = 1000, height = 600)
}
# Create animation
# anim <- create_play_animation(tracking_example)
# anim_save("play_animation.gif", animation = anim)
#| label: animation-py
#| eval: false
#| echo: true
from matplotlib.animation import FuncAnimation
from IPython.display import HTML
def create_play_animation(tracking_data):
    """Create animated visualization of play"""
    fig, ax = plt.subplots(figsize=(12, 8))

    def update(frame_num):
        ax.clear()
        # Field
        ax.add_patch(plt.Rectangle((0, 0), 120, 53.3,
                                   facecolor='#196f0c', alpha=0.5))
        # Yard lines
        for x in range(10, 111, 5):
            ax.axvline(x, color='white', alpha=0.3, linewidth=0.5)
        ax.axvline(75, color='yellow', linewidth=2)
        # Player paths up to current frame
        for nfl_id in tracking_data['nflId'].unique():
            player_path = tracking_data[
                (tracking_data['nflId'] == nfl_id) &
                (tracking_data['frameId'] <= frame_num)
            ]
            team = player_path['team'].iloc[0]
            color = 'blue' if team == 'OFF' else 'red'
            ax.plot(player_path['x'], player_path['y'],
                    color=color, alpha=0.3, linewidth=1.5)
        # Current positions
        frame_data = tracking_data[tracking_data['frameId'] == frame_num]
        for team, color in [('OFF', 'blue'), ('DEF', 'red')]:
            team_data = frame_data[frame_data['team'] == team]
            ax.scatter(team_data['x'], team_data['y'],
                       c=color, s=250, alpha=0.8, edgecolors='white',
                       linewidths=2, zorder=3)
            for _, player in team_data.iterrows():
                ax.text(player['x'], player['y'],
                        str(int(player['jerseyNumber'])),
                        color='white', fontsize=10, fontweight='bold',
                        ha='center', va='center', zorder=4)
        ax.set_xlim(65, 105)
        ax.set_ylim(0, 53.3)
        ax.set_aspect('equal')
        ax.set_xlabel('Field Position (yards)', fontsize=12)
        ax.set_ylabel('Field Width (yards)', fontsize=12)
        ax.set_title(f'Play Animation - Frame {frame_num}',
                     fontsize=14, fontweight='bold')
        ax.grid(False)

    frames = sorted(tracking_data['frameId'].unique())
    anim = FuncAnimation(fig, update, frames=frames,
                         interval=100, repeat=True)
    plt.close()
    return anim
# Create animation
# anim = create_play_animation(tracking_example)
# HTML(anim.to_html5_video())
# Or save: anim.save('play_animation.mp4', writer='ffmpeg')
Advanced Tracking Applications
Pass Rush Analysis
Analyze pass rush paths and pressure generation:
#| label: pass-rush-r
#| eval: false
#| echo: true
# Calculate pass rush metrics
calculate_rush_metrics <- function(tracking_data, qb_frame) {
  # QB location at the release frame, used as the reference point
  qb_pos <- tracking_data %>%
    filter(position == "QB", frameId == qb_frame) %>%
    slice(1)
  tracking_data %>%
    filter(team == "DEF") %>%
    group_by(nflId, jerseyNumber) %>%
    summarise(
      # Snap-to-release time in seconds (10 frames per second); same for every rusher
      time_to_qb = qb_frame / 10,
      # Closest approach to the QB's release position
      distance_to_qb = min(sqrt((x - qb_pos$x)^2 + (y - qb_pos$y)^2)),
      max_speed = max(s, na.rm = TRUE),
      # Assumes per-frame displacement columns dx and dy already exist
      rush_distance = sum(sqrt(dx^2 + dy^2), na.rm = TRUE),
      pressure_created = distance_to_qb < 3,
      .groups = "drop"
    )
}
#| label: pass-rush-py
#| eval: false
#| echo: true
def calculate_rush_metrics(tracking_data, qb_frame):
    """Calculate pass rush metrics"""
    rush_metrics = []
    defenders = tracking_data[tracking_data['team'] == 'DEF']
    # Get QB position at release (same for every rusher, so look it up once)
    qb_pos = tracking_data[
        (tracking_data['position'] == 'QB') &
        (tracking_data['frameId'] == qb_frame)
    ][['x', 'y']].values[0]
    for nfl_id in defenders['nflId'].unique():
        player_data = defenders[defenders['nflId'] == nfl_id]
        # Calculate distances to QB
        distances = np.sqrt(
            (player_data['x'] - qb_pos[0])**2 +
            (player_data['y'] - qb_pos[1])**2
        )
        rush_metrics.append({
            'nflId': nfl_id,
            'jerseyNumber': player_data['jerseyNumber'].iloc[0],
            'time_to_qb': qb_frame / 10,
            'distance_to_qb': distances.min(),
            'max_speed': player_data['s'].max(),
            'pressure_created': distances.min() < 3
        })
    return pd.DataFrame(rush_metrics)
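The metrics above measure each rusher against the quarterback's position at one frame. A related question is when each rusher first got close, measured against where the quarterback was in that same frame. The sketch below is one way to compute that, assuming the column names used in this chapter (`frameId`, `nflId`, `team`, `position`, `x`, `y`) and the same 3-yard pressure radius. The function name `time_to_pressure` and the small synthetic play are illustrative only.

```python
import numpy as np
import pandas as pd

def time_to_pressure(tracking, pressure_radius=3.0, frame_rate=10):
    """First frame at which each defender is within pressure_radius yards of the QB."""
    qb = (tracking.loc[tracking["position"] == "QB", ["frameId", "x", "y"]]
          .rename(columns={"x": "qb_x", "y": "qb_y"}))
    # Pair every defender row with the QB location in the same frame
    rushers = tracking[tracking["team"] == "DEF"].merge(qb, on="frameId", how="inner")
    rushers["dist_to_qb"] = np.hypot(rushers["x"] - rushers["qb_x"],
                                     rushers["y"] - rushers["qb_y"])
    out = rushers.groupby("nflId")["dist_to_qb"].min().rename("min_distance").to_frame()
    inside = rushers[rushers["dist_to_qb"] < pressure_radius]
    # Index alignment leaves NaN for defenders who never got inside the radius
    out["first_pressure_frame"] = inside.groupby("nflId")["frameId"].min()
    # Same frames-to-seconds convention as the functions above (frame / 10)
    out["time_to_pressure_s"] = out["first_pressure_frame"] / frame_rate
    return out.reset_index()

# Synthetic play: a stationary QB, one rusher closing 1 yard per frame, one far away
frames = np.arange(1, 11)
demo = pd.concat([
    pd.DataFrame({"frameId": frames, "nflId": 0, "team": "OFF",
                  "position": "QB", "x": 70.0, "y": 26.0}),
    pd.DataFrame({"frameId": frames, "nflId": 1, "team": "DEF",
                  "position": "DE", "x": 81.0 - frames, "y": 26.0}),
    pd.DataFrame({"frameId": frames, "nflId": 2, "team": "DEF",
                  "position": "CB", "x": 85.0, "y": 40.0}),
], ignore_index=True)

result = time_to_pressure(demo)
print(result)
```

Defenders who never come within the radius keep a missing value rather than a large number, which avoids treating coverage players as slow pass rushers when the results are averaged.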
Expected Yards After Catch (xYAC)
Model expected YAC based on spatial configuration:
#| label: xyac-r
#| eval: false
#| echo: true
# Calculate xYAC features
calculate_xyac_features <- function(tracking_data, catch_frame) {
  catch_data <- tracking_data %>% filter(frameId == catch_frame)
  # Use the first WR row so the features describe a single receiver
  receiver <- catch_data %>% filter(position == "WR") %>% slice(1)
  defenders <- catch_data %>% filter(team == "DEF")
  # Distance from each defender to the receiver
  distances <- sqrt(
    (defenders$x - receiver$x)^2 + (defenders$y - receiver$y)^2
  )
  # Defenders ahead of receiver (offense assumed to move toward increasing x)
  defenders_ahead <- sum(defenders$x > receiver$x)
  # Space metrics
  tibble(
    receiver_speed = receiver$s,
    nearest_defender = min(distances),
    defenders_within_5yd = sum(distances < 5),
    defenders_ahead = defenders_ahead,
    # Normalized distance to the nearest sideline: 0 at the sideline, 1 at midfield
    open_field = (26.65 - abs(receiver$y - 26.65)) / 26.65,
    distance_to_endzone = 110 - receiver$x
  )
}
# Example model
# yac_model <- lm(yards_after_catch ~ receiver_speed + nearest_defender +
# defenders_within_5yd + defenders_ahead + open_field,
# data = yac_training_data)
#| label: xyac-py
#| eval: false
#| echo: true
def calculate_xyac_features(tracking_data, catch_frame):
    """Calculate expected YAC features"""
    catch_data = tracking_data[tracking_data['frameId'] == catch_frame]
    receiver = catch_data[catch_data['position'] == 'WR'].iloc[0]
    defenders = catch_data[catch_data['team'] == 'DEF']
    # Calculate distances
    distances = np.sqrt(
        (defenders['x'] - receiver['x'])**2 +
        (defenders['y'] - receiver['y'])**2
    )
    features = {
        'receiver_speed': receiver['s'],
        'nearest_defender': distances.min(),
        'defenders_within_5yd': (distances < 5).sum(),
        # Offense assumed to move toward increasing x
        'defenders_ahead': (defenders['x'] > receiver['x']).sum(),
        # Normalized distance to the nearest sideline: 0 at the sideline, 1 at midfield
        'open_field': (26.65 - abs(receiver['y'] - 26.65)) / 26.65,
        'distance_to_endzone': 110 - receiver['x']
    }
    return features
# Example model
# from sklearn.ensemble import RandomForestRegressor
# yac_model = RandomForestRegressor(n_estimators=100, random_state=42)
# yac_model.fit(X_train, y_train)
Receiver Route Running Efficiency
Evaluate route running quality:
#| label: route-efficiency-r
#| message: false
#| warning: false
# Calculate route efficiency metrics
calculate_route_efficiency <- function(player_tracking) {
player_tracking <- player_tracking %>% arrange(frameId)
tibble(
route_id = first(player_tracking$nflId),
total_distance = sum(sqrt(
player_tracking$dx^2 + player_tracking$dy^2
), na.rm = TRUE),
straight_distance = sqrt(
(last(player_tracking$x) - first(player_tracking$x))^2 +
(last(player_tracking$y) - first(player_tracking$y))^2
),
efficiency_ratio = straight_distance / total_distance,
avg_speed = mean(player_tracking$s, na.rm = TRUE),
top_speed = max(player_tracking$s, na.rm = TRUE),
speed_variance = sd(player_tracking$s, na.rm = TRUE),
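# Wrap heading changes into [-180, 180) so 350 -> 10 counts as 20 degrees, not 340
sharp_cuts = sum(abs((diff(player_tracking$dir) + 180) %% 360 - 180) > 45, na.rm = TRUE)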
) %>%
mutate(
route_quality_score = (efficiency_ratio * 0.3) +
(avg_speed / 15 * 0.3) +
(top_speed / 20 * 0.2) +
(1 / (sharp_cuts + 1) * 0.2)
)
}
# Calculate for receiver
wr_tracking <- tracking_example %>% filter(position == "WR")
route_efficiency <- calculate_route_efficiency(wr_tracking)
route_efficiency %>%
select(-route_id) %>%
pivot_longer(everything(), names_to = "Metric", values_to = "Value") %>%
gt() %>%
fmt_number(columns = Value, decimals = 3) %>%
tab_header(title = "Route Running Efficiency Metrics")
#| label: route-efficiency-py
#| message: false
#| warning: false
def calculate_route_efficiency(player_tracking):
    """Calculate route running efficiency metrics"""
    player_tracking = player_tracking.sort_values('frameId')
    # Calculate distances
    dx = player_tracking['x'].diff().fillna(0)
    dy = player_tracking['y'].diff().fillna(0)
    total_distance = np.sqrt(dx**2 + dy**2).sum()
    straight_distance = np.sqrt(
        (player_tracking['x'].iloc[-1] - player_tracking['x'].iloc[0])**2 +
        (player_tracking['y'].iloc[-1] - player_tracking['y'].iloc[0])**2
    )
    # Direction changes, wrapped into [-180, 180) so that a heading
    # moving from 350 to 10 degrees counts as a 20-degree change
    raw_changes = player_tracking['dir'].diff().fillna(0)
    dir_changes = ((raw_changes + 180) % 360 - 180).abs()
    sharp_cuts = (dir_changes > 45).sum()
    efficiency_ratio = straight_distance / total_distance if total_distance > 0 else 0
    avg_speed = player_tracking['s'].mean()
    top_speed = player_tracking['s'].max()
    # Route quality score
    route_quality_score = (
        (efficiency_ratio * 0.3) +
        (avg_speed / 15 * 0.3) +
        (top_speed / 20 * 0.2) +
        (1 / (sharp_cuts + 1) * 0.2)
    )
    return {
        'total_distance': total_distance,
        'straight_distance': straight_distance,
        'efficiency_ratio': efficiency_ratio,
        'avg_speed': avg_speed,
        'top_speed': top_speed,
        'speed_variance': player_tracking['s'].std(),
        'sharp_cuts': sharp_cuts,
        'route_quality_score': route_quality_score
    }
# Calculate for receiver
wr_tracking = tracking_example[tracking_example['position'] == 'WR']
route_efficiency = calculate_route_efficiency(wr_tracking)
print("\nRoute Running Efficiency Metrics:")
for metric, value in route_efficiency.items():
    print(f"{metric:25s}: {value:.3f}")
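Counting sharp cuts depends on how heading changes are measured. Direction is an angle, so a player whose `dir` moves from 350 to 10 degrees has turned 20 degrees, not 340. The sketch below shows a wrap-aware difference next to the naive one, assuming `dir` is recorded in degrees on a 0 to 360 scale. The helper name `heading_change` and the four sample headings are illustrative.

```python
import numpy as np

def heading_change(dir_degrees):
    """Signed change between consecutive headings, wrapped into [-180, 180)."""
    d = np.diff(np.asarray(dir_degrees, dtype=float))
    return (d + 180.0) % 360.0 - 180.0

headings = [350.0, 10.0, 20.0, 120.0]

naive = np.abs(np.diff(headings))          # 340, 10, 100
wrapped = np.abs(heading_change(headings))  # 20, 10, 100

naive_cuts = int((naive > 45).sum())
wrapped_cuts = int((wrapped > 45).sum())
print("naive sharp cuts:  ", naive_cuts)
print("wrapped sharp cuts:", wrapped_cuts)
```

The naive count reports two sharp cuts because the step across 0 degrees looks like a 340-degree turn. The wrapped count reports one, the genuine 100-degree change at the end.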
Summary
In this chapter, we explored player tracking data and spatial analysis in football:
- Data Structure: Learned the format and coordinate system of tracking data
- Movement Metrics: Calculated speed, acceleration, and distance traveled
- Separation Analysis: Measured receiver-defender separation and coverage quality
- Route Analysis: Clustered routes and evaluated route running efficiency
- Completion Models: Built expected completion models using spatial features
- Formation Detection: Identified defensive alignments from positioning data
- Spatial Control: Applied Voronoi diagrams and pitch control models
- Visualization: Created static and animated visualizations of plays
- Advanced Applications: Analyzed pass rush, expected YAC, and route quality
Tracking data provides unprecedented insight into player movement and spatial dynamics, enabling analysis that was impossible with traditional statistics. As tracking technology continues to improve and become more widely available, these spatial analytics will become increasingly important for team strategy and player evaluation.
Key Takeaways
1. Tracking data captures player locations 10 times per second, providing granular movement data
2. Separation is a critical metric for completion probability
3. Spatial control models (Voronoi, pitch control) quantify field dominance
4. Route clustering can identify patterns and tendencies
5. Tracking-based models can outperform models built only on traditional statistics for many predictions
6. Visualization is crucial for communicating tracking insights
Exercises
Conceptual Questions
- Coordinate Systems: Why is it important to standardize play direction when working with tracking data? What problems could arise if you don't?
- Separation vs Coverage: Explain the difference between separation distance and coverage quality. Can a defender provide good coverage even with high separation?
- Spatial Control: Compare and contrast Voronoi diagrams and pitch control models for measuring spatial dominance. What are the advantages and limitations of each?
Coding Exercises
Exercise 1: Speed Analysis
Using tracking data:
a) Calculate the top speed achieved by each position group
b) Identify plays where receivers reached maximum speed
c) Analyze the relationship between receiver speed and separation
**Dataset**: Use Big Data Bowl tracking data or the synthetic data from this chapter.
Exercise 2: Separation Heat Map
Create a heat map showing:
a) Average separation by field location
b) Separation at different route depths
c) Identify "hot zones" where receivers generate most separation
**Visualization**: Use ggplot2 or matplotlib to create the heat map.
Exercise 3: Route Clustering
Perform route clustering analysis:
a) Extract features from 100+ routes (use synthetic or real data)
b) Apply k-means clustering with k=4
c) Visualize the clusters and interpret the route types
d) Calculate separation achieved by each route cluster
**Method**: Use features like depth, width, direction changes, and ending position.
Exercise 4: Expected Completion Model
Build an expected completion probability model:
a) Create features from tracking data (separation, coverage, speed)
b) Train a logistic regression or random forest model
c) Evaluate model performance (AUC, calibration)
d) Identify which features are most important
**Advanced**: Compare model performance with and without tracking features.
Exercise 5: Play Animation
Create an animated visualization:
a) Load a full play from tracking data
b) Create frames showing player movement
c) Add trails showing player paths
d) Highlight key events (snap, throw, catch)
**Output**: Save as GIF or MP4 file.
Further Reading
Academic Papers
- Fernández, J., & Bornn, L. (2018). "Wide Open Spaces: A statistical technique for measuring space creation in professional soccer." MIT Sloan Sports Analytics Conference.
- Spearman, W. (2018). "Beyond Expected Goals." MIT Sloan Sports Analytics Conference.
- Steiner, S., & Raabe, D. (2020). "Player tracking data in football: An application to position-specific player performance." Journal of Sports Analytics, 6(4), 241-252.
Technical Resources
- NFL Big Data Bowl Competition (Kaggle)
- Next Gen Stats Documentation (NFL.com)
- Fernández, J. et al. (2019). "Decomposing the Immeasurable Sport" (tracking data methodology)
Books and Guides
- Alamar, B. (2013). Sports Analytics: A Guide for Coaches, Managers, and Other Decision Makers. Columbia University Press.
- "A Framework for Tactical Analysis and Individual Offensive Production Assessment in Soccer Using Markov Chains" - Link, D. et al.
References
:::