Learning Objectives

By the end of this chapter, you will be able to:

  1. Work with NFL tracking data (Next Gen Stats)
  2. Analyze player movement and positioning
  3. Calculate spatial metrics (separation, coverage)
  4. Build tracking-based models
  5. Visualize tracking data effectively

Introduction

Player tracking data represents one of the most significant advances in football analytics. Since the NFL began installing RFID chips in players' shoulder pads and in the football in 2015, analysts have had access to precise location data for all 22 players on the field, sampled 10 times per second. This granular spatial and temporal data opens avenues of analysis that were impossible with traditional play-by-play data.

Next Gen Stats (NGS) tracking data captures:

  • Player locations (x, y coordinates)
  • Player velocity and acceleration
  • Player orientation (direction facing)
  • Ball location and trajectory
  • All measurements 10 times per second

What is Tracking Data?

Tracking data provides the precise location and movement of every player and the ball throughout a play. Each frame captures 22 players plus the ball, and frames are recorded 10 times per second, so a typical play lasting several seconds produces well over a thousand rows.
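To make the data volume concrete, the short sketch below works through the arithmetic. The sampling rate and the 23 tracked objects come from the description above; the five-second play length and the `rows_per_play` helper are illustrative assumptions, not properties of any particular dataset.

```python
# Back-of-the-envelope size of one play in long format.
FRAMES_PER_SECOND = 10   # sampling rate stated above
TRACKED_OBJECTS = 23     # 22 players plus the ball

def rows_per_play(duration_seconds):
    """Return (frames, rows) for a play of the given duration."""
    frames = round(duration_seconds * FRAMES_PER_SECOND)
    return frames, frames * TRACKED_OBJECTS

frames, rows = rows_per_play(5.0)  # assumed 5-second play
print(f"{frames} frames, {rows} rows")  # 50 frames, 1150 rows
```

Multiplying by the number of plays in a game gives a rough sense of why a single week of tracking data runs to more than a million rows.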

This chapter will teach you how to work with tracking data to answer questions that traditional statistics cannot address:

  • How fast was the receiver running when the ball arrived?
  • How much separation did the receiver create?
  • Which defenders were in optimal position?
  • What spatial area did each defender control?
  • How do route patterns cluster together?

Tracking Data Structure and Format

Data Format

NFL tracking data is typically provided in long format, with one row per player (or the ball) per frame:

#| label: load-libraries-r
#| message: false
#| warning: false

library(tidyverse)
library(nflfastR)
library(arrow)
library(gganimate)
library(ggforce)
library(ggrepel)
library(gt)
library(plotly)
#| label: load-libraries-py
#| message: false
#| warning: false

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.spatial import Voronoi, voronoi_plot_2d
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import warnings
warnings.filterwarnings('ignore')

Loading Tracking Data

The NFL has released tracking data through its Big Data Bowl competitions on Kaggle. Let's load one week of that data and examine the structure:

#| label: load-tracking-r
#| eval: false
#| echo: true

# Load tracking data (example from Big Data Bowl)
tracking <- read_csv("data/tracking_week_1.csv")

# Examine structure
glimpse(tracking)

# Sample output:
# Rows: 1,234,567
# Columns: 15
# $ gameId    <dbl> 2021091200, 2021091200, ...
# $ playId    <dbl> 97, 97, 97, ...
# $ nflId     <dbl> NA, 47848, 32488, ...
# $ frameId   <dbl> 1, 1, 1, ...
# $ time      <chr> "2021-09-12 16:03:21.5", ...
# $ jerseyNumber <dbl> NA, 88, 26, ...
# $ team      <chr> "football", "ARI", "ARI", ...
# $ playDirection <chr> "left", "left", "left", ...
# $ x         <dbl> 46.84, 28.33, 31.82, ...
# $ y         <dbl> 26.65, 18.77, 30.52, ...
# $ s         <dbl> 0.00, 1.38, 1.02, ...
# $ a         <dbl> 0.00, 1.84, 1.42, ...
# $ dis       <dbl> 0.00, 0.14, 0.10, ...
# $ o         <dbl> NA, 171.26, 79.38, ...
# $ dir       <dbl> NA, 174.88, 81.20, ...
#| label: load-tracking-py
#| eval: false
#| echo: true

# Load tracking data (example from Big Data Bowl)
tracking = pd.read_csv("data/tracking_week_1.csv")

# Examine structure
print(tracking.info())
print("\nFirst few rows:")
print(tracking.head())

# Sample columns:
# - gameId: Game identifier
# - playId: Play identifier
# - nflId: Player identifier (NaN for the football)
# - frameId: Frame number (1-N)
# - time: Timestamp
# - jerseyNumber: Player jersey number
# - team: Team abbreviation or 'football'
# - playDirection: Direction of play
# - x, y: Field coordinates
# - s: Speed (yards/second)
# - a: Acceleration (yards/second^2)
# - dis: Distance traveled (yards)
# - o: Orientation (degrees)
# - dir: Direction of movement (degrees)
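Because the football shares the table with the players, a common first step is to split it out and confirm that each frame holds the expected number of player rows. The sketch below assumes the column names listed above and uses a hand-built toy frame with only two players; `split_ball_and_players` is a hypothetical helper name, and the values are illustrative.

```python
import pandas as pd

# Toy frame: one football row plus two player rows (real frames have 22 players).
toy = pd.DataFrame({
    "gameId": [1, 1, 1],
    "playId": [97, 97, 97],
    "frameId": [1, 1, 1],
    "nflId": [None, 47848, 32488],
    "team": ["football", "ARI", "TEN"],
    "x": [46.84, 28.33, 31.82],
    "y": [26.65, 18.77, 30.52],
})

def split_ball_and_players(tracking):
    """Separate football rows from player rows using the team column."""
    is_ball = tracking["team"] == "football"
    return tracking[is_ball].copy(), tracking[~is_ball].copy()

ball, players = split_ball_and_players(toy)

# Player rows per frame; on real data, values other than 22 flag frames to inspect.
players_per_frame = players.groupby(["gameId", "playId", "frameId"]).size()
print(players_per_frame)
```

Running the same count on a full week of data is a cheap way to catch missing or duplicated rows before computing any metrics.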

Coordinate System

Understanding the coordinate system is crucial:

  • x coordinate: Position along the length of the field (0-120 yards)
      • 0 = back of left end zone
      • 10 = left goal line
      • 60 = midfield
      • 110 = right goal line
      • 120 = back of right end zone

  • y coordinate: Position across the width of the field (0-53.3 yards)
      • 0 = left sideline
      • 26.65 = middle of field
      • 53.3 = right sideline

  • playDirection: "left" or "right" indicating offensive direction
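The landmarks above translate directly into small helper functions. The sketch below converts an x coordinate to the familiar yard-line number and measures how close a player is to a sideline. Both function names are hypothetical; the constants come from the list above.

```python
FIELD_WIDTH = 53.3  # yards, from the coordinate description above

def yardline_from_x(x):
    """Convert a tracking x coordinate to the conventional yard-line number.

    Returns None inside either end zone (x < 10 or x > 110).
    """
    if x < 10 or x > 110:
        return None
    # Yard lines count up to 50 at midfield (x = 60) and back down again.
    return 50 - abs(x - 60)

def distance_to_nearest_sideline(y):
    """Yards from the closer sideline for a y coordinate in [0, 53.3]."""
    return min(y, FIELD_WIDTH - y)

print(yardline_from_x(35))                 # 25
print(yardline_from_x(5))                  # None (in the end zone)
print(distance_to_nearest_sideline(50.0))  # about 3.3
```

Note that the yard-line number alone does not say which half of the field the ball is on; that still requires x itself or the play direction.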

Standardizing Play Direction

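Always standardize plays to go in one direction (typically left-to-right) to make analysis easier. For plays where playDirection is "left", this means reflecting the x coordinate (120 - x). To keep each player's left and right consistent, the y coordinate is usually reflected as well (53.3 - y), and the o and dir angles are rotated by 180 degrees.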
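A minimal sketch of this standardization follows. It assumes the column names from the loading example and that `o` and `dir` are stored in degrees on a 0-360 scale. `standardize_direction` is a hypothetical helper, and rotating the whole field by 180 degrees (rather than mirroring x only) is one common convention, not the only possible one.

```python
import pandas as pd

FIELD_LENGTH = 120.0  # yards, including both end zones
FIELD_WIDTH = 53.3    # yards

def standardize_direction(tracking):
    """Return a copy in which every play runs left to right."""
    out = tracking.copy()
    flip = out["playDirection"] == "left"

    # Rotate the field 180 degrees for left-moving plays.
    out.loc[flip, "x"] = FIELD_LENGTH - out.loc[flip, "x"]
    out.loc[flip, "y"] = FIELD_WIDTH - out.loc[flip, "y"]

    # Angles rotate with the field; NaN angles (the football) stay NaN.
    for angle_col in ("o", "dir"):
        if angle_col in out.columns:
            out.loc[flip, angle_col] = (out.loc[flip, angle_col] + 180) % 360

    # Relabel so that a repeated call does not flip the rows again.
    out["playDirection"] = "right"
    return out

example = pd.DataFrame({
    "playDirection": ["left", "right"],
    "x": [46.84, 46.84],
    "y": [26.65, 20.0],
    "o": [171.26, 90.0],
    "dir": [174.88, 90.0],
})
standardized = standardize_direction(example)
print(standardized)
```

The first row moves from x = 46.84 to x = 73.16 and its angles shift by 180 degrees, while the second row, already moving right, is left unchanged.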

Creating Synthetic Tracking Data

For demonstration purposes, let's create synthetic tracking data for a simple passing play:

#| label: synthetic-tracking-r
#| message: false
#| warning: false

# Create synthetic tracking data for one play
set.seed(123)

# Function to generate player trajectory
generate_trajectory <- function(start_x, start_y, end_x, end_y, frames = 50) {
  x <- seq(start_x, end_x, length.out = frames)
  y <- seq(start_y, end_y, length.out = frames)

  # Add some noise for realism
  y <- y + rnorm(frames, 0, 0.5)

  tibble(
    frameId = 1:frames,
    x = x,
    y = y
  )
}

# Generate receiver route (go route)
receiver <- generate_trajectory(75, 20, 95, 22) %>%
  mutate(
    nflId = 1001,
    jerseyNumber = 88,
    team = "OFF",
    position = "WR"
  )

# Generate cornerback coverage
cornerback <- generate_trajectory(75, 18, 93, 20) %>%
  mutate(
    nflId = 2001,
    jerseyNumber = 25,
    team = "DEF",
    position = "CB"
  )

# Generate safety help
safety <- generate_trajectory(85, 26.65, 95, 24) %>%
  mutate(
    nflId = 2002,
    jerseyNumber = 43,
    team = "DEF",
    position = "S"
  )

# Combine all players
tracking_example <- bind_rows(receiver, cornerback, safety) %>%
  mutate(
    gameId = 2023091000,
    playId = 1
  )

# Calculate speed and acceleration
tracking_example <- tracking_example %>%
  group_by(nflId) %>%
  arrange(frameId) %>%
  mutate(
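    # Speed: per-frame displacement times 10 frames/second gives yards/second
    # (the first frame has no previous position, so its speed is 0)
    dx = x - lag(x, default = first(x)),
    dy = y - lag(y, default = first(y)),
    s = sqrt(dx^2 + dy^2) * 10,
    # Acceleration: change in speed per second (yards/second^2)
    a = (s - lag(s, default = first(s))) * 10,
    # Direction of motion in degrees, counterclockwise from the +x axis
    # (real tracking data reports dir on a 0-360 scale instead)
    dir = atan2(dy, dx) * 180 / pi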
  ) %>%
  ungroup()

# Display sample
tracking_example %>%
  filter(frameId <= 3) %>%
  select(frameId, nflId, jerseyNumber, position, x, y, s) %>%
  gt() %>%
  fmt_number(columns = c(x, y, s), decimals = 2) %>%
  tab_header(title = "Sample Tracking Data")
#| label: synthetic-tracking-py
#| message: false
#| warning: false

# Create synthetic tracking data for one play
np.random.seed(123)

def generate_trajectory(start_x, start_y, end_x, end_y, frames=50):
    """Generate player trajectory with noise"""
    x = np.linspace(start_x, end_x, frames)
    y = np.linspace(start_y, end_y, frames)

    # Add noise for realism
    y = y + np.random.normal(0, 0.5, frames)

    return pd.DataFrame({
        'frameId': range(1, frames + 1),
        'x': x,
        'y': y
    })

# Generate receiver route (go route)
receiver = generate_trajectory(75, 20, 95, 22)
receiver['nflId'] = 1001
receiver['jerseyNumber'] = 88
receiver['team'] = 'OFF'
receiver['position'] = 'WR'

# Generate cornerback coverage
cornerback = generate_trajectory(75, 18, 93, 20)
cornerback['nflId'] = 2001
cornerback['jerseyNumber'] = 25
cornerback['team'] = 'DEF'
cornerback['position'] = 'CB'

# Generate safety help
safety = generate_trajectory(85, 26.65, 95, 24)
safety['nflId'] = 2002
safety['jerseyNumber'] = 43
safety['team'] = 'DEF'
safety['position'] = 'S'

# Combine all players
tracking_example = pd.concat([receiver, cornerback, safety], ignore_index=True)
tracking_example['gameId'] = 2023091000
tracking_example['playId'] = 1

# Calculate speed and acceleration
tracking_example = tracking_example.sort_values(['nflId', 'frameId'])

for nfl_id in tracking_example['nflId'].unique():
    mask = tracking_example['nflId'] == nfl_id

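    # Frame-to-frame displacement (the first frame has no predecessor, so 0)
    dx = tracking_example.loc[mask, 'x'].diff().fillna(0)
    dy = tracking_example.loc[mask, 'y'].diff().fillna(0)

    # Speed: displacement per frame times 10 frames/second = yards/second
    tracking_example.loc[mask, 's'] = np.sqrt(dx**2 + dy**2) * 10

    # Acceleration: change in speed per second (yards/second^2)
    s_diff = tracking_example.loc[mask, 's'].diff().fillna(0)
    tracking_example.loc[mask, 'a'] = s_diff * 10

    # Direction of motion in degrees, counterclockwise from the +x axis
    # (real tracking data reports dir on a 0-360 scale instead)
    tracking_example.loc[mask, 'dir'] = np.arctan2(dy, dx) * 180 / np.pi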

# Display sample
print("Sample Tracking Data:")
print(tracking_example[tracking_example['frameId'] <= 3][
    ['frameId', 'nflId', 'jerseyNumber', 'position', 'x', 'y', 's']
].to_string(index=False))
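The speed and acceleration columns above are finite differences: displacement between consecutive frames divided by the 0.1-second frame interval. The sketch below shows one way to compute the same quantities with a grouped `diff` instead of a loop. `add_kinematics` is a hypothetical helper. Unlike the code above, it leaves the first frame as missing rather than 0, which is a design choice that avoids a spurious acceleration spike on the second frame.

```python
import numpy as np
import pandas as pd

FRAME_SECONDS = 0.1  # 10 frames per second

def add_kinematics(df):
    """Add speed `s` (yd/s) and acceleration `a` (yd/s^2) for each player."""
    out = df.sort_values(["nflId", "frameId"]).copy()
    by_player = out.groupby("nflId")
    dx = by_player["x"].diff()   # NaN on each player's first frame
    dy = by_player["y"].diff()
    out["s"] = np.sqrt(dx**2 + dy**2) / FRAME_SECONDS
    out["a"] = out.groupby("nflId")["s"].diff() / FRAME_SECONDS
    return out

# A player covering exactly 1 yard per frame along x moves at 10 yd/s.
demo = pd.DataFrame({
    "nflId": [1] * 4,
    "frameId": [1, 2, 3, 4],
    "x": [0.0, 1.0, 2.0, 3.0],
    "y": [0.0, 0.0, 0.0, 0.0],
})
result = add_kinematics(demo)
print(result[["frameId", "s", "a"]])
```

For this constant-velocity path the speed is 10 yd/s from the second frame on and the acceleration is 0 from the third frame on, which is a handy sanity check before applying the function to noisier data.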

Player Speed, Acceleration, and Distance Metrics

Speed Analysis

Speed is one of the most straightforward metrics from tracking data. Let's analyze speed distributions and identify top speeds:

#| label: speed-analysis-r
#| message: false
#| warning: false

# Calculate max speed for each player
max_speeds <- tracking_example %>%
  group_by(nflId, jerseyNumber, position) %>%
  summarise(
    max_speed = max(s, na.rm = TRUE),
    avg_speed = mean(s, na.rm = TRUE),
    frames = n(),
    .groups = "drop"
  )

max_speeds %>%
  arrange(desc(max_speed)) %>%
  gt() %>%
  fmt_number(columns = c(max_speed, avg_speed), decimals = 2) %>%
  cols_label(
    nflId = "Player ID",
    jerseyNumber = "Jersey",
    position = "Position",
    max_speed = "Max Speed (yd/s)",
    avg_speed = "Avg Speed (yd/s)",
    frames = "Frames"
  ) %>%
  tab_header(title = "Player Speed Summary")
#| label: speed-analysis-py
#| message: false
#| warning: false

# Calculate max speed for each player
max_speeds = (tracking_example
    .groupby(['nflId', 'jerseyNumber', 'position'])
    .agg(
        max_speed=('s', 'max'),
        avg_speed=('s', 'mean'),
        frames=('s', 'count')
    )
    .reset_index()
    .sort_values('max_speed', ascending=False)
)

print("\nPlayer Speed Summary:")
print(max_speeds.to_string(index=False))
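Tracking files store speed in yards per second, whereas broadcast Next Gen Stats figures are usually quoted in miles per hour. The conversion is a fixed ratio (1,760 yards per mile, 3,600 seconds per hour). The helper name below is illustrative.

```python
YARDS_PER_MILE = 1760
SECONDS_PER_HOUR = 3600

def yps_to_mph(speed_yps):
    """Convert a speed from yards per second to miles per hour."""
    return speed_yps * SECONDS_PER_HOUR / YARDS_PER_MILE

print(f"{yps_to_mph(10.0):.2f} mph")  # 20.45 mph
```

A useful rule of thumb is that 1 yard per second is a little over 2 mph, so a player moving at 10 yd/s is running at roughly 20 mph.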

Speed Over Time Visualization

#| label: fig-speed-time-r
#| fig-cap: "Player speed throughout the play"
#| fig-width: 10
#| fig-height: 6
#| message: false
#| warning: false

tracking_example %>%
  filter(!is.na(s)) %>%
  ggplot(aes(x = frameId, y = s, color = position, group = nflId)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2, alpha = 0.6) +
  scale_color_manual(
    values = c("WR" = "#00BFC4", "CB" = "#F8766D", "S" = "#7CAE00"),
    labels = c("WR" = "Wide Receiver", "CB" = "Cornerback", "S" = "Safety")
  ) +
  labs(
    title = "Player Speed Throughout the Play",
    subtitle = "Speed measured in yards per second",
    x = "Frame Number",
    y = "Speed (yards/second)",
    color = "Position"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "top"
  )
#| label: fig-speed-time-py
#| fig-cap: "Player speed throughout the play - Python"
#| fig-width: 10
#| fig-height: 6
#| message: false
#| warning: false

# Filter out NA values
plot_data = tracking_example.dropna(subset=['s'])

# Create plot
plt.figure(figsize=(10, 6))

colors = {'WR': '#00BFC4', 'CB': '#F8766D', 'S': '#7CAE00'}
labels = {'WR': 'Wide Receiver', 'CB': 'Cornerback', 'S': 'Safety'}

for position in plot_data['position'].unique():
    data = plot_data[plot_data['position'] == position]
    plt.plot(data['frameId'], data['s'],
             color=colors[position],
             label=labels[position],
             marker='o', markersize=4, alpha=0.6, linewidth=2)

plt.xlabel('Frame Number', fontsize=12)
plt.ylabel('Speed (yards/second)', fontsize=12)
plt.title('Player Speed Throughout the Play\nSpeed measured in yards per second',
          fontsize=14, fontweight='bold')
plt.legend(title='Position', loc='upper right')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Acceleration Analysis

Acceleration measures how quickly players change speed—important for evaluating quickness and explosiveness:

#| label: acceleration-analysis-r
#| message: false
#| warning: false

# Calculate acceleration metrics
acceleration_summary <- tracking_example %>%
  group_by(nflId, jerseyNumber, position) %>%
  summarise(
    max_accel = max(a, na.rm = TRUE),
    max_decel = min(a, na.rm = TRUE),
    avg_accel = mean(a[a > 0], na.rm = TRUE),
    .groups = "drop"
  )

acceleration_summary %>%
  arrange(desc(max_accel)) %>%
  gt() %>%
  fmt_number(columns = c(max_accel, max_decel, avg_accel), decimals = 2) %>%
  cols_label(
    jerseyNumber = "Jersey",
    position = "Position",
    max_accel = "Max Acceleration",
    max_decel = "Max Deceleration",
    avg_accel = "Avg Acceleration"
  ) %>%
  tab_header(
    title = "Player Acceleration Summary",
    subtitle = "Measured in yards/second²"
  )
#| label: acceleration-analysis-py
#| message: false
#| warning: false

# Calculate acceleration metrics
acceleration_data = tracking_example.dropna(subset=['a'])

acceleration_summary = []
for nfl_id in acceleration_data['nflId'].unique():
    player_data = acceleration_data[acceleration_data['nflId'] == nfl_id]

    acceleration_summary.append({
        'nflId': nfl_id,
        'jerseyNumber': player_data['jerseyNumber'].iloc[0],
        'position': player_data['position'].iloc[0],
        'max_accel': player_data['a'].max(),
        'max_decel': player_data['a'].min(),
        'avg_accel': player_data[player_data['a'] > 0]['a'].mean()
    })

accel_df = pd.DataFrame(acceleration_summary).sort_values('max_accel', ascending=False)

print("\nPlayer Acceleration Summary:")
print("Measured in yards/second²")
print(accel_df.to_string(index=False))

Distance Traveled

Calculate total distance covered by each player:

#| label: distance-analysis-r
#| message: false
#| warning: false

# Calculate distance metrics
distance_summary <- tracking_example %>%
  group_by(nflId, jerseyNumber, position) %>%
  arrange(frameId) %>%
  summarise(
    total_distance = sum(sqrt(dx^2 + dy^2), na.rm = TRUE),
    straight_line_distance = sqrt((last(x) - first(x))^2 + (last(y) - first(y))^2),
    efficiency = straight_line_distance / total_distance,
    .groups = "drop"
  )

distance_summary %>%
  arrange(desc(total_distance)) %>%
  gt() %>%
  fmt_number(columns = c(total_distance, straight_line_distance), decimals = 2) %>%
  fmt_percent(columns = efficiency, decimals = 1) %>%
  cols_label(
    jerseyNumber = "Jersey",
    position = "Position",
    total_distance = "Total Distance (yd)",
    straight_line_distance = "Straight Distance (yd)",
    efficiency = "Route Efficiency"
  ) %>%
  tab_header(title = "Player Distance Summary")
#| label: distance-analysis-py
#| message: false
#| warning: false

# Calculate distance metrics
distance_summary = []

for nfl_id in tracking_example['nflId'].unique():
    player_data = tracking_example[tracking_example['nflId'] == nfl_id].sort_values('frameId')

    # Calculate distances
    dx = player_data['x'].diff().fillna(0)
    dy = player_data['y'].diff().fillna(0)
    total_dist = np.sqrt(dx**2 + dy**2).sum()

    straight_dist = np.sqrt(
        (player_data['x'].iloc[-1] - player_data['x'].iloc[0])**2 +
        (player_data['y'].iloc[-1] - player_data['y'].iloc[0])**2
    )

    distance_summary.append({
        'nflId': nfl_id,
        'jerseyNumber': player_data['jerseyNumber'].iloc[0],
        'position': player_data['position'].iloc[0],
        'total_distance': total_dist,
        'straight_line_distance': straight_dist,
        'efficiency': straight_dist / total_dist if total_dist > 0 else 0
    })

dist_df = pd.DataFrame(distance_summary).sort_values('total_distance', ascending=False)

print("\nPlayer Distance Summary:")
print(dist_df.to_string(index=False))
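The efficiency column is the ratio of straight-line displacement to path length, so it equals 1 for a perfectly straight path and falls toward 0 as the path winds. A small worked case makes the definition easy to check by hand; `route_efficiency` is a hypothetical standalone version of the calculation above.

```python
import numpy as np

def route_efficiency(x, y):
    """Straight-line displacement divided by total path length."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    step_lengths = np.hypot(np.diff(x), np.diff(y))
    total = step_lengths.sum()
    straight = np.hypot(x[-1] - x[0], y[-1] - y[0])
    return straight / total if total > 0 else 0.0

# 4 yards upfield, then 3 yards toward the sideline:
# path length 7, displacement 5 (a 3-4-5 triangle).
eff = route_efficiency([0, 4, 4], [0, 0, 3])
print(f"{eff:.3f}")  # 0.714
```

Keep in mind that frame-to-frame measurement noise inflates path length, so efficiency computed on noisy coordinates will be biased downward.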

Separation and Coverage Metrics

Calculating Separation Distance

Separation is the distance between a receiver and the nearest defender—a critical metric for pass completion probability:

#| label: separation-calc-r
#| message: false
#| warning: false

# Function to calculate separation
calculate_separation <- function(tracking_data) {
  tracking_data %>%
    group_by(frameId) %>%
    summarise(
      wr_x = x[position == "WR"],
      wr_y = y[position == "WR"],
      # Find minimum distance to any defender
      separation = min(
        sqrt((x[team == "DEF"] - wr_x)^2 + (y[team == "DEF"] - wr_y)^2)
      ),
      .groups = "drop"
    )
}

# Calculate separation for our example
separation_data <- calculate_separation(tracking_example)

# Summary statistics
separation_summary <- separation_data %>%
  summarise(
    min_separation = min(separation, na.rm = TRUE),
    max_separation = max(separation, na.rm = TRUE),
    avg_separation = mean(separation, na.rm = TRUE),
    median_separation = median(separation, na.rm = TRUE)
  )

separation_summary %>%
  gt() %>%
  fmt_number(decimals = 2) %>%
  cols_label(
    min_separation = "Min (yd)",
    max_separation = "Max (yd)",
    avg_separation = "Mean (yd)",
    median_separation = "Median (yd)"
  ) %>%
  tab_header(title = "Receiver Separation Summary")
#| label: separation-calc-py
#| message: false
#| warning: false

def calculate_separation(tracking_data):
    """Calculate separation between receiver and nearest defender"""
    separations = []

    for frame in tracking_data['frameId'].unique():
        frame_data = tracking_data[tracking_data['frameId'] == frame]

        # Get receiver position
        wr_pos = frame_data[frame_data['position'] == 'WR'][['x', 'y']].values

        # Get defender positions
        def_pos = frame_data[frame_data['team'] == 'DEF'][['x', 'y']].values

        if len(wr_pos) > 0 and len(def_pos) > 0:
            # Calculate distances to all defenders
            distances = cdist(wr_pos, def_pos)
            min_distance = distances.min()

            separations.append({
                'frameId': frame,
                'separation': min_distance
            })

    return pd.DataFrame(separations)

# Calculate separation
separation_data = calculate_separation(tracking_example)

# Summary statistics
print("\nReceiver Separation Summary:")
print(f"Min: {separation_data['separation'].min():.2f} yards")
print(f"Max: {separation_data['separation'].max():.2f} yards")
print(f"Mean: {separation_data['separation'].mean():.2f} yards")
print(f"Median: {separation_data['separation'].median():.2f} yards")
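The functions above assume a single receiver per frame. With several offensive players on the field, one option is to pair every offensive player with every defender in the same frame and keep the minimum distance. The sketch below does this with a merge on `frameId`. It assumes the `team` labels "OFF" and "DEF" used in this chapter's synthetic data (real data would need the possession team to assign those labels), and `nearest_defender_separation` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def nearest_defender_separation(tracking):
    """Separation for every offensive player in every frame."""
    offense = tracking[tracking["team"] == "OFF"]
    defense = tracking[tracking["team"] == "DEF"]

    # All offense-defense pairs that share a frame.
    pairs = offense.merge(defense, on="frameId", suffixes=("_off", "_def"))
    pairs["distance"] = np.hypot(pairs["x_off"] - pairs["x_def"],
                                 pairs["y_off"] - pairs["y_def"])

    return (pairs.groupby(["frameId", "nflId_off"])["distance"]
                 .min()
                 .rename("separation")
                 .reset_index())

demo = pd.DataFrame({
    "frameId": [1, 1, 1, 1],
    "nflId":   [1001, 1002, 2001, 2002],
    "team":    ["OFF", "OFF", "DEF", "DEF"],
    "x":       [80.0, 70.0, 83.0, 70.0],
    "y":       [20.0, 40.0, 24.0, 41.0],
})
sep = nearest_defender_separation(demo)
print(sep)
```

In the toy frame, player 1001 is 5 yards from the nearest defender and player 1002 is 1 yard away. The pairwise merge grows with offense × defense rows per frame, which is manageable for 11-on-11 but worth filtering to eligible receivers on large datasets.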

Separation Over Time

#| label: fig-separation-time-r
#| fig-cap: "Receiver separation from nearest defender over time"
#| fig-width: 10
#| fig-height: 6
#| message: false
#| warning: false

separation_data %>%
  ggplot(aes(x = frameId, y = separation)) +
  geom_line(color = "#00BFC4", linewidth = 1.2) +
  geom_point(color = "#00BFC4", size = 2, alpha = 0.6) +
  geom_hline(yintercept = 2, linetype = "dashed", color = "red", alpha = 0.7) +
  annotate("text", x = max(separation_data$frameId) * 0.8, y = 2.3,
           label = "Tight Coverage (2 yards)", color = "red", size = 3.5) +
  labs(
    title = "Receiver Separation Throughout the Play",
    subtitle = "Distance to nearest defender measured in yards",
    x = "Frame Number",
    y = "Separation (yards)"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14)
  )
#| label: fig-separation-time-py
#| fig-cap: "Receiver separation from nearest defender - Python"
#| fig-width: 10
#| fig-height: 6
#| message: false
#| warning: false

plt.figure(figsize=(10, 6))

plt.plot(separation_data['frameId'], separation_data['separation'],
         color='#00BFC4', linewidth=2, marker='o', markersize=4, alpha=0.6)

plt.axhline(y=2, color='red', linestyle='--', alpha=0.7)
plt.text(separation_data['frameId'].max() * 0.8, 2.3,
         'Tight Coverage (2 yards)', color='red', fontsize=10)

plt.xlabel('Frame Number', fontsize=12)
plt.ylabel('Separation (yards)', fontsize=12)
plt.title('Receiver Separation Throughout the Play\nDistance to nearest defender measured in yards',
          fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Coverage Quality Metrics

Develop metrics for coverage quality based on position and separation:

#| label: coverage-quality-r
#| message: false
#| warning: false

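# Calculate coverage quality metrics
# Measure each defender's own distance to the receiver in every frame;
# joining the nearest-defender separation instead would give every
# defender identical numbers
receiver_xy <- tracking_example %>%
  filter(position == "WR") %>%
  select(frameId, wr_x = x, wr_y = y)

coverage_metrics <- tracking_example %>%
  filter(team == "DEF") %>%
  left_join(receiver_xy, by = "frameId") %>%
  mutate(separation = sqrt((x - wr_x)^2 + (y - wr_y)^2)) %>%
  group_by(nflId, jerseyNumber, position) %>%
  summarise(
    avg_separation_allowed = mean(separation, na.rm = TRUE),
    min_separation = min(separation, na.rm = TRUE),
    tight_coverage_pct = mean(separation < 2, na.rm = TRUE),
    .groups = "drop"
  )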

coverage_metrics %>%
  arrange(avg_separation_allowed) %>%
  gt() %>%
  fmt_number(columns = c(avg_separation_allowed, min_separation), decimals = 2) %>%
  fmt_percent(columns = tight_coverage_pct, decimals = 1) %>%
  cols_label(
    jerseyNumber = "Jersey",
    position = "Position",
    avg_separation_allowed = "Avg Separation (yd)",
    min_separation = "Min Separation (yd)",
    tight_coverage_pct = "Tight Coverage %"
  ) %>%
  tab_header(
    title = "Defender Coverage Quality",
    subtitle = "Tight coverage defined as < 2 yards"
  )
#| label: coverage-quality-py
#| message: false
#| warning: false

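# Calculate coverage quality metrics
# Use each defender's own distance to the receiver; merging the
# nearest-defender separation would give every defender identical numbers
receiver_xy = (tracking_example[tracking_example['position'] == 'WR']
    [['frameId', 'x', 'y']]
    .rename(columns={'x': 'wr_x', 'y': 'wr_y'}))

defenders = tracking_example[tracking_example['team'] == 'DEF'].copy()
defenders = defenders.merge(receiver_xy, on='frameId')
defenders['separation'] = np.sqrt(
    (defenders['x'] - defenders['wr_x'])**2 +
    (defenders['y'] - defenders['wr_y'])**2
)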

coverage_metrics = (defenders
    .groupby(['nflId', 'jerseyNumber', 'position'])
    .agg(
        avg_separation_allowed=('separation', 'mean'),
        min_separation=('separation', 'min'),
        tight_coverage_pct=('separation', lambda x: (x < 2).mean())
    )
    .reset_index()
    .sort_values('avg_separation_allowed')
)

print("\nDefender Coverage Quality:")
print("Tight coverage defined as < 2 yards")
print(coverage_metrics.to_string(index=False))

Route Analysis and Path Clustering

Route Path Features

Extract features from route paths for clustering and classification:

#| label: route-features-r
#| message: false
#| warning: false

# Calculate route features
calculate_route_features <- function(player_data) {
  player_data <- player_data %>% arrange(frameId)

  tibble(
    total_distance = sum(sqrt(player_data$dx^2 + player_data$dy^2), na.rm = TRUE),
    depth = max(player_data$x) - min(player_data$x),
    width = max(player_data$y) - min(player_data$y),
    max_speed = max(player_data$s, na.rm = TRUE),
    avg_speed = mean(player_data$s, na.rm = TRUE),
    direction_changes = sum(abs(diff(player_data$dir)) > 45, na.rm = TRUE),
    end_x = last(player_data$x),
    end_y = last(player_data$y)
  )
}

# Example route features for receiver
route_features <- tracking_example %>%
  filter(position == "WR") %>%
  calculate_route_features()

route_features %>%
  pivot_longer(everything(), names_to = "Feature", values_to = "Value") %>%
  gt() %>%
  fmt_number(columns = Value, decimals = 2) %>%
  cols_label(
    Feature = "Route Feature",
    Value = "Value"
  ) %>%
  tab_header(title = "Route Path Features")
#| label: route-features-py
#| message: false
#| warning: false

def calculate_route_features(player_data):
    """Calculate features from route path"""
    player_data = player_data.sort_values('frameId')

    # Calculate distance
    dx = player_data['x'].diff().fillna(0)
    dy = player_data['y'].diff().fillna(0)
    total_distance = np.sqrt(dx**2 + dy**2).sum()

    # Direction changes
    dir_diff = player_data['dir'].diff().fillna(0).abs()
    direction_changes = (dir_diff > 45).sum()

    features = {
        'total_distance': total_distance,
        'depth': player_data['x'].max() - player_data['x'].min(),
        'width': player_data['y'].max() - player_data['y'].min(),
        'max_speed': player_data['s'].max(),
        'avg_speed': player_data['s'].mean(),
        'direction_changes': direction_changes,
        'end_x': player_data['x'].iloc[-1],
        'end_y': player_data['y'].iloc[-1]
    }

    return features

# Calculate features for receiver
wr_data = tracking_example[tracking_example['position'] == 'WR']
route_features = calculate_route_features(wr_data)

print("\nRoute Path Features:")
for feature, value in route_features.items():
    print(f"{feature:20s}: {value:.2f}")
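One caveat for the direction-change count: headings are angles, so a plain difference misbehaves at the wraparound point. On a 0-360 scale, a turn from 355 to 5 degrees is a 10-degree change, but a naive subtraction reports 350. The sketch below wraps each difference into the range [-180, 180) before thresholding. Both function names are hypothetical, and the 45-degree threshold matches the one used above.

```python
import numpy as np

def angular_change(angles_deg):
    """Signed frame-to-frame change in heading, wrapped to [-180, 180)."""
    angles = np.asarray(angles_deg, dtype=float)
    return (np.diff(angles) + 180.0) % 360.0 - 180.0

def count_direction_changes(angles_deg, threshold=45.0):
    """Number of frame-to-frame heading changes larger than the threshold."""
    return int((np.abs(angular_change(angles_deg)) > threshold).sum())

headings = [350.0, 355.0, 5.0, 10.0, 100.0]

naive = int((np.abs(np.diff(headings)) > 45).sum())
wrapped = count_direction_changes(headings)
print(naive, wrapped)  # 2 1
```

Only the final 90-degree turn is a genuine cut; the naive count also flags the harmless crossing from 355 to 5 degrees.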

Route Clustering

Use clustering to identify common route patterns:

#| label: route-clustering-r
#| eval: false
#| echo: true

# Create dataset of multiple routes (synthetic example)
set.seed(456)

# Generate different route types
generate_route_set <- function(n_routes = 100) {
  routes <- list()

  for (i in 1:n_routes) {
    route_type <- sample(c("go", "slant", "out", "post"), 1)

    if (route_type == "go") {
      # Straight vertical route
      route <- generate_trajectory(75, 20, 95, 22 + rnorm(1, 0, 2))
    } else if (route_type == "slant") {
      # Diagonal route
      route <- generate_trajectory(75, 20, 85, 26 + rnorm(1, 0, 1))
    } else if (route_type == "out") {
      # Out route
      route <- generate_trajectory(75, 20, 85, 15 + rnorm(1, 0, 1))
    } else {
      # Post route
      route <- generate_trajectory(75, 20, 95, 26 + rnorm(1, 0, 1))
    }

    route$route_id <- i
    route$true_type <- route_type
    routes[[i]] <- route
  }

  bind_rows(routes)
}

# Generate routes
all_routes <- generate_route_set()

# Calculate features for each route
route_features <- all_routes %>%
  group_by(route_id) %>%
  summarise(
    depth = max(x) - min(x),
    width = max(y) - min(y),
    end_y = last(y),
    true_type = first(true_type),
    .groups = "drop"
  )

# Perform k-means clustering
set.seed(789)
kmeans_result <- route_features %>%
  select(depth, width, end_y) %>%
  scale() %>%
  kmeans(centers = 4, nstart = 25)

# Add cluster assignments
route_features$cluster <- as.factor(kmeans_result$cluster)

# Visualize clusters
ggplot(route_features, aes(x = depth, y = end_y, color = cluster)) +
  geom_point(size = 3, alpha = 0.6) +
  labs(
    title = "Route Clustering",
    subtitle = "Routes grouped by depth and ending position",
    x = "Route Depth (yards)",
    y = "Ending Y Position (yards)",
    color = "Cluster"
  ) +
  theme_minimal()
#| label: route-clustering-py
#| eval: false
#| echo: true

# Create dataset of multiple routes (synthetic example)
np.random.seed(456)

def generate_route_set(n_routes=100):
    """Generate multiple routes of different types"""
    routes = []

    for i in range(n_routes):
        route_type = np.random.choice(['go', 'slant', 'out', 'post'])

        if route_type == 'go':
            route = generate_trajectory(75, 20, 95, 22 + np.random.normal(0, 2))
        elif route_type == 'slant':
            route = generate_trajectory(75, 20, 85, 26 + np.random.normal(0, 1))
        elif route_type == 'out':
            route = generate_trajectory(75, 20, 85, 15 + np.random.normal(0, 1))
        else:  # post
            route = generate_trajectory(75, 20, 95, 26 + np.random.normal(0, 1))

        route['route_id'] = i
        route['true_type'] = route_type
        routes.append(route)

    return pd.concat(routes, ignore_index=True)

# Generate routes
all_routes = generate_route_set()

# Calculate features for each route
route_features = (all_routes
    .groupby('route_id')
    .agg(
        depth=('x', lambda x: x.max() - x.min()),
        width=('y', lambda x: x.max() - x.min()),
        end_y=('y', 'last'),
        true_type=('true_type', 'first')
    )
    .reset_index()
)

# Perform k-means clustering
scaler = StandardScaler()
features_scaled = scaler.fit_transform(route_features[['depth', 'width', 'end_y']])

kmeans = KMeans(n_clusters=4, random_state=789, n_init=25)
route_features['cluster'] = kmeans.fit_predict(features_scaled)

# Visualize clusters
plt.figure(figsize=(10, 6))
for cluster in range(4):
    cluster_data = route_features[route_features['cluster'] == cluster]
    plt.scatter(cluster_data['depth'], cluster_data['end_y'],
               label=f'Cluster {cluster}', alpha=0.6, s=50)

plt.xlabel('Route Depth (yards)', fontsize=12)
plt.ylabel('Ending Y Position (yards)', fontsize=12)
plt.title('Route Clustering\nRoutes grouped by depth and ending position',
          fontsize=14, fontweight='bold')
plt.legend(title='Cluster')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
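Because the synthetic routes carry a `true_type` label, the clustering can be checked against it. Cluster numbers are arbitrary, so the comparison uses a cross-tabulation and a simple purity score: the share of routes that fall in their cluster's majority type. The sketch below uses a small hand-made label set rather than the generated routes, and `cluster_purity` is a hypothetical helper.

```python
import pandas as pd

def cluster_purity(true_labels, cluster_ids):
    """Cross-tabulate clusters against true labels and compute purity."""
    table = pd.crosstab(pd.Series(cluster_ids, name="cluster"),
                        pd.Series(true_labels, name="true_type"))
    # For each cluster, count the routes of its most common true type.
    purity = table.max(axis=1).sum() / table.to_numpy().sum()
    return table, purity

true_type = ["go", "go", "slant", "slant", "out", "out", "post", "post"]
cluster   = [2, 2, 0, 0, 1, 1, 3, 2]   # one post route lands with the go routes

table, purity = cluster_purity(true_type, cluster)
print(table)
print(f"Purity: {purity:.3f}")  # Purity: 0.875
```

With the clustering output above, the same call would take `route_features['true_type']` and `route_features['cluster']` as inputs. Purity rewards many small clusters, so it is best read alongside the table itself.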

Expected Completion Models with Tracking

Feature Engineering

Create features from tracking data for completion probability models:

#| label: xcomp-features-r
#| eval: false
#| echo: true

# Function to calculate features at target time (when ball arrives)
calculate_completion_features <- function(tracking_data, target_frame) {
  target_data <- tracking_data %>% filter(frameId == target_frame)

  # Get the receiver and the defenders at the target frame
  receiver <- target_data %>% filter(position == "WR")
  defenders <- target_data %>% filter(team == "DEF")

  # Calculate separation metrics
  distances_to_defenders <- sqrt(
    (defenders$x - receiver$x)^2 + (defenders$y - receiver$y)^2
  )

  tibble(
    separation = min(distances_to_defenders),
    defenders_within_1yd = sum(distances_to_defenders < 1),
    defenders_within_2yd = sum(distances_to_defenders < 2),
    defenders_within_3yd = sum(distances_to_defenders < 3),
    receiver_speed = receiver$s,
    closest_defender_speed = defenders$s[which.min(distances_to_defenders)],
    target_x = receiver$x,
    target_y = receiver$y,
    depth_of_target = receiver$x - 75  # assuming LOS at 75
  )
}

# Example usage
target_frame <- 30  # Frame when ball arrives
completion_features <- calculate_completion_features(tracking_example, target_frame)

completion_features %>%
  pivot_longer(everything(), names_to = "Feature", values_to = "Value") %>%
  gt() %>%
  fmt_number(columns = Value, decimals = 2) %>%
  tab_header(title = "Completion Probability Features")
#| label: xcomp-features-py
#| eval: false
#| echo: true

def calculate_completion_features(tracking_data, target_frame):
    """Calculate features at target time for completion model"""
    target_data = tracking_data[tracking_data['frameId'] == target_frame]

    # Get receiver and defenders
    receiver = target_data[target_data['position'] == 'WR'].iloc[0]
    defenders = target_data[target_data['team'] == 'DEF']

    # Calculate distances
    distances = np.sqrt(
        (defenders['x'] - receiver['x'])**2 +
        (defenders['y'] - receiver['y'])**2
    )

    features = {
        'separation': distances.min(),
        'defenders_within_1yd': (distances < 1).sum(),
        'defenders_within_2yd': (distances < 2).sum(),
        'defenders_within_3yd': (distances < 3).sum(),
        'receiver_speed': receiver['s'],
        'closest_defender_speed': defenders.iloc[distances.argmin()]['s'],
        'target_x': receiver['x'],
        'target_y': receiver['y'],
        'depth_of_target': receiver['x'] - 75  # assuming LOS at 75
    }

    return features

# Example usage
target_frame = 30  # Frame when ball arrives
completion_features = calculate_completion_features(tracking_example, target_frame)

print("\nCompletion Probability Features:")
for feature, value in completion_features.items():
    print(f"{feature:25s}: {value:.2f}")

Expected Completion Model

Build a logistic regression model for completion probability:

#| label: xcomp-model-r
#| eval: false
#| echo: true

# Simulate completion data (synthetic example)
# In practice, you would load actual tracking data with completion outcomes

set.seed(999)
n_plays <- 1000

# Simulate features and outcomes
completion_data <- tibble(
  separation = abs(rnorm(n_plays, 2.5, 1.5)),
  defenders_within_2yd = rpois(n_plays, 0.8),
  receiver_speed = abs(rnorm(n_plays, 10, 2)),
  depth_of_target = abs(rnorm(n_plays, 15, 8)),
  air_yards = abs(rnorm(n_plays, 12, 6))
) %>%
  mutate(
    # Simulate completion probability
    xcomp = plogis(
      0.5 +
      0.3 * separation -
      0.4 * defenders_within_2yd -
      0.02 * depth_of_target +
      0.01 * receiver_speed
    ),
    # Simulate actual completions
    completion = rbinom(n_plays, 1, xcomp)
  )

# Fit logistic regression model
xcomp_model <- glm(
  completion ~ separation + defenders_within_2yd +
    receiver_speed + depth_of_target + air_yards,
  data = completion_data,
  family = binomial(link = "logit")
)

# Model summary
summary(xcomp_model)

# Calculate predicted probabilities
completion_data$predicted_xcomp <- predict(xcomp_model, type = "response")

# Evaluate model performance (in-sample AUC; hold out a test set for an unbiased estimate)
library(pROC)
roc_obj <- roc(completion_data$completion, completion_data$predicted_xcomp)
auc(roc_obj)
#| label: xcomp-model-py
#| eval: false
#| echo: true

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, roc_curve

# Simulate features and outcomes
np.random.seed(999)
n_plays = 1000

completion_data = pd.DataFrame({
    'separation': np.abs(np.random.normal(2.5, 1.5, n_plays)),
    'defenders_within_2yd': np.random.poisson(0.8, n_plays),
    'receiver_speed': np.abs(np.random.normal(10, 2, n_plays)),
    'depth_of_target': np.abs(np.random.normal(15, 8, n_plays)),
    'air_yards': np.abs(np.random.normal(12, 6, n_plays))
})

# Simulate completion probability
from scipy.special import expit

completion_data['xcomp'] = expit(
    0.5 +
    0.3 * completion_data['separation'] -
    0.4 * completion_data['defenders_within_2yd'] -
    0.02 * completion_data['depth_of_target'] +
    0.01 * completion_data['receiver_speed']
)

completion_data['completion'] = np.random.binomial(
    1, completion_data['xcomp']
)

# Prepare features and target
features = ['separation', 'defenders_within_2yd', 'receiver_speed',
            'depth_of_target', 'air_yards']
X = completion_data[features]
y = completion_data['completion']

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit logistic regression
xcomp_model = LogisticRegression(random_state=42, max_iter=1000)
xcomp_model.fit(X_train, y_train)

# Predictions
y_pred_proba = xcomp_model.predict_proba(X_test)[:, 1]

# Evaluate
auc_score = roc_auc_score(y_test, y_pred_proba)
print(f"\nModel AUC: {auc_score:.3f}")

# Feature importance
print("\nFeature Coefficients:")
for feature, coef in zip(features, xcomp_model.coef_[0]):
    print(f"{feature:25s}: {coef:7.3f}")
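
To make the link between coefficients and probabilities concrete, the sketch below plugs one hypothetical target into the same linear predictor used to simulate `xcomp` above. The coefficients are the simulation's, not fitted values, and the play itself (3 yards of separation, one defender within 2 yards, a 10-yard target depth, a receiver speed of 10) is an illustrative assumption rather than real data.

```python
import math

def logistic(z):
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Coefficients copied from the simulation above (not fitted values)
intercept = 0.5
coefs = {
    'separation': 0.3,
    'defenders_within_2yd': -0.4,
    'depth_of_target': -0.02,
    'receiver_speed': 0.01,
}

# Hypothetical target (illustrative values only)
play = {
    'separation': 3.0,
    'defenders_within_2yd': 1,
    'depth_of_target': 10.0,
    'receiver_speed': 10.0,
}

# Linear predictor: intercept plus coefficient-weighted features
linear_predictor = intercept + sum(coefs[k] * play[k] for k in coefs)
xcomp_example = logistic(linear_predictor)

# One extra yard of separation adds 0.3 to the linear predictor
play_more_open = dict(play, separation=4.0)
xcomp_more_open = logistic(
    intercept + sum(coefs[k] * play_more_open[k] for k in coefs)
)

print(f"Linear predictor: {linear_predictor:.2f}")
print(f"xComp: {xcomp_example:.3f}")
print(f"xComp with +1 yd separation: {xcomp_more_open:.3f}")
```

The effect of a feature on the probability depends on where the play already sits on the logistic curve, which is why coefficients are read on the log-odds scale rather than as fixed percentage-point changes.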

Defensive Alignment Detection

Formation Detection

Identify defensive formations from pre-snap positioning:

#| label: formation-detection-r
#| eval: false
#| echo: true

# Function to detect defensive alignment
detect_defensive_formation <- function(tracking_data, snap_frame = 1) {
  # Get defender positions at snap
  defenders <- tracking_data %>%
    filter(frameId == snap_frame, team == "DEF") %>%
    arrange(x)

  # Count defenders by region
  box_count <- sum(defenders$y > 14 & defenders$y < 40 & defenders$x < 80)
  deep_safeties <- sum(defenders$x > 85)

  # Classify formation
  formation <- case_when(
    deep_safeties >= 2 & box_count <= 6 ~ "Cover 2",
    deep_safeties == 1 & box_count <= 6 ~ "Cover 3",
    deep_safeties == 0 ~ "Cover 0/1",
    box_count >= 8 ~ "Heavy Box",
    TRUE ~ "Other"
  )

  list(
    formation = formation,
    box_count = box_count,
    deep_safeties = deep_safeties,
    defender_positions = defenders %>% select(x, y, position)
  )
}

# Example usage
formation_info <- detect_defensive_formation(tracking_example)
cat("Detected Formation:", formation_info$formation, "\n")
cat("Box Count:", formation_info$box_count, "\n")
cat("Deep Safeties:", formation_info$deep_safeties, "\n")
#| label: formation-detection-py
#| eval: false
#| echo: true

def detect_defensive_formation(tracking_data, snap_frame=1):
    """Detect defensive formation from pre-snap positioning"""
    # Get defender positions at snap
    defenders = (tracking_data
        .query(f"frameId == {snap_frame} & team == 'DEF'")
        .sort_values('x')
    )

    # Count defenders by region
    box_count = ((defenders['y'] > 14) &
                 (defenders['y'] < 40) &
                 (defenders['x'] < 80)).sum()

    deep_safeties = (defenders['x'] > 85).sum()

    # Classify formation
    if deep_safeties >= 2 and box_count <= 6:
        formation = "Cover 2"
    elif deep_safeties == 1 and box_count <= 6:
        formation = "Cover 3"
    elif deep_safeties == 0:
        formation = "Cover 0/1"
    elif box_count >= 8:
        formation = "Heavy Box"
    else:
        formation = "Other"

    return {
        'formation': formation,
        'box_count': box_count,
        'deep_safeties': deep_safeties,
        'defender_positions': defenders[['x', 'y', 'position']]
    }

# Example usage
formation_info = detect_defensive_formation(tracking_example)
print(f"\nDetected Formation: {formation_info['formation']}")
print(f"Box Count: {formation_info['box_count']}")
print(f"Deep Safeties: {formation_info['deep_safeties']}")

Defensive Alignment Visualization

#| label: fig-formation-viz-r
#| fig-cap: "Defensive alignment at snap"
#| fig-width: 12
#| fig-height: 8
#| eval: false
#| echo: true

# Visualize defensive formation
plot_defensive_formation <- function(tracking_data, snap_frame = 1) {
  snap_data <- tracking_data %>% filter(frameId == snap_frame)

  ggplot() +
    # Field boundaries
    geom_rect(aes(xmin = 0, xmax = 120, ymin = 0, ymax = 53.3),
              fill = "#196f0c", alpha = 0.3) +
    # Yard lines
    geom_vline(xintercept = seq(10, 110, by = 10),
               color = "white", alpha = 0.3) +
    # Line of scrimmage
    geom_vline(xintercept = 75, color = "yellow", linewidth = 1) +
    # Plot defenders
    geom_point(data = snap_data %>% filter(team == "DEF"),
               aes(x = x, y = y),
               color = "red", size = 6, alpha = 0.8) +
    geom_text(data = snap_data %>% filter(team == "DEF"),
              aes(x = x, y = y, label = jerseyNumber),
              color = "white", size = 3, fontface = "bold") +
    # Plot offense
    geom_point(data = snap_data %>% filter(team == "OFF"),
               aes(x = x, y = y),
               color = "blue", size = 6, alpha = 0.8) +
    geom_text(data = snap_data %>% filter(team == "OFF"),
              aes(x = x, y = y, label = jerseyNumber),
              color = "white", size = 3, fontface = "bold") +
    # Formatting
    # Zoom via coord limits; scale limits would drop the field rectangle
    coord_fixed(xlim = c(60, 95), ylim = c(0, 53.3)) +
    labs(
      title = "Defensive Alignment at Snap",
      subtitle = paste("Formation:",
                       detect_defensive_formation(tracking_data, snap_frame)$formation),
      x = "Field Position (yards)",
      y = "Field Width (yards)"
    ) +
    theme_minimal() +
    theme(
      plot.title = element_text(face = "bold", size = 14),
      panel.grid = element_blank()
    )
}

plot_defensive_formation(tracking_example)

#| label: fig-formation-viz-py
#| fig-cap: "Defensive alignment at snap - Python"
#| fig-width: 12
#| fig-height: 8
#| eval: false
#| echo: true

def plot_defensive_formation(tracking_data, snap_frame=1):
    """Visualize defensive formation"""
    snap_data = tracking_data[tracking_data['frameId'] == snap_frame]

    fig, ax = plt.subplots(figsize=(12, 8))

    # Field background
    ax.add_patch(plt.Rectangle((0, 0), 120, 53.3,
                               facecolor='#196f0c', alpha=0.3))

    # Yard lines
    for x in range(10, 111, 10):
        ax.axvline(x, color='white', alpha=0.3, linewidth=0.5)

    # Line of scrimmage
    ax.axvline(75, color='yellow', linewidth=2)

    # Plot defenders
    defenders = snap_data[snap_data['team'] == 'DEF']
    ax.scatter(defenders['x'], defenders['y'],
              c='red', s=200, alpha=0.8, zorder=3)
    for _, player in defenders.iterrows():
        ax.text(player['x'], player['y'], str(int(player['jerseyNumber'])),
               color='white', fontsize=10, fontweight='bold',
               ha='center', va='center', zorder=4)

    # Plot offense
    offense = snap_data[snap_data['team'] == 'OFF']
    ax.scatter(offense['x'], offense['y'],
              c='blue', s=200, alpha=0.8, zorder=3)
    for _, player in offense.iterrows():
        ax.text(player['x'], player['y'], str(int(player['jerseyNumber'])),
               color='white', fontsize=10, fontweight='bold',
               ha='center', va='center', zorder=4)

    ax.set_xlim(60, 95)
    ax.set_ylim(0, 53.3)
    ax.set_aspect('equal')
    ax.set_xlabel('Field Position (yards)', fontsize=12)
    ax.set_ylabel('Field Width (yards)', fontsize=12)
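    # Detect the formation from the data passed in rather than a global variable
    formation = detect_defensive_formation(tracking_data, snap_frame)['formation']
    ax.set_title(f'Defensive Alignment at Snap\nFormation: {formation}',
                fontsize=14, fontweight='bold')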
    ax.grid(False)

    plt.tight_layout()
    plt.show()

plot_defensive_formation(tracking_example)

Spatial Control and Voronoi Diagrams

Voronoi Tessellation

Voronoi diagrams partition the field into regions based on which player is closest to each point:

#| label: voronoi-r
#| eval: false
#| echo: true

library(deldir)
library(ggvoronoi)

# Function to create Voronoi diagram
create_voronoi_plot <- function(tracking_data, frame_num) {
  frame_data <- tracking_data %>%
    filter(frameId == frame_num) %>%
    select(x, y, team, jerseyNumber)

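  # Field boundary used to clip the Voronoi cells
  field_outline <- data.frame(
    x = c(0, 120, 120, 0),
    y = c(0, 0, 53.3, 53.3),
    group = 1
  )

  # Create Voronoi tessellation
  ggplot(frame_data, aes(x = x, y = y, fill = team)) +
    # Field background
    geom_rect(aes(xmin = 0, xmax = 120, ymin = 0, ymax = 53.3),
              fill = "#196f0c", alpha = 0.3, inherit.aes = FALSE) +
    # Voronoi regions, clipped to the field
    geom_voronoi(alpha = 0.4, outline = field_outline) +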
    # Player positions
    geom_point(aes(color = team), size = 5) +
    geom_text(aes(label = jerseyNumber),
              color = "white", size = 3, fontface = "bold") +
    # Styling
    scale_fill_manual(values = c("OFF" = "blue", "DEF" = "red")) +
    scale_color_manual(values = c("OFF" = "blue", "DEF" = "red")) +
    coord_fixed(xlim = c(60, 100), ylim = c(0, 53.3)) +
    labs(
      title = "Spatial Control via Voronoi Diagram",
      subtitle = paste("Frame:", frame_num),
      x = "Field Position (yards)",
      y = "Field Width (yards)"
    ) +
    theme_minimal() +
    theme(
      plot.title = element_text(face = "bold", size = 14),
      legend.position = "top"
    )
}

# Create Voronoi plot
create_voronoi_plot(tracking_example, frame_num = 25)
#| label: voronoi-py
#| eval: false
#| echo: true

from scipy.spatial import Voronoi, voronoi_plot_2d

def create_voronoi_plot(tracking_data, frame_num):
    """Create Voronoi diagram for spatial control"""
    frame_data = tracking_data[tracking_data['frameId'] == frame_num]

    # Get player positions
    points = frame_data[['x', 'y']].values

    # Create Voronoi diagram
    vor = Voronoi(points)

    fig, ax = plt.subplots(figsize=(12, 8))

    # Field background
    ax.add_patch(plt.Rectangle((0, 0), 120, 53.3,
                               facecolor='#196f0c', alpha=0.3))

    # Plot Voronoi regions
    voronoi_plot_2d(vor, ax=ax, show_vertices=False,
                   line_colors='black', line_width=1.5)

    # Color bounded regions by team (unbounded edge regions contain -1 and are skipped)
    for point_idx, region_idx in enumerate(vor.point_region):
        region = vor.regions[region_idx]
        if len(region) > 0 and -1 not in region:
            polygon = [vor.vertices[i] for i in region]
            team = frame_data.iloc[point_idx]['team']
            color = 'blue' if team == 'OFF' else 'red'
            ax.fill(*zip(*polygon), alpha=0.3, color=color)

    # Plot players
    for team in ['OFF', 'DEF']:
        team_data = frame_data[frame_data['team'] == team]
        color = 'blue' if team == 'OFF' else 'red'
        ax.scatter(team_data['x'], team_data['y'],
                  c=color, s=200, edgecolors='white', linewidths=2, zorder=3)
        for _, player in team_data.iterrows():
            ax.text(player['x'], player['y'],
                   str(int(player['jerseyNumber'])),
                   color='white', fontsize=10, fontweight='bold',
                   ha='center', va='center', zorder=4)

    ax.set_xlim(60, 100)
    ax.set_ylim(0, 53.3)
    ax.set_aspect('equal')
    ax.set_xlabel('Field Position (yards)', fontsize=12)
    ax.set_ylabel('Field Width (yards)', fontsize=12)
    ax.set_title(f'Spatial Control via Voronoi Diagram\nFrame: {frame_num}',
                fontsize=14, fontweight='bold')

    plt.tight_layout()
    plt.show()

create_voronoi_plot(tracking_example, frame_num=25)
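
Because every point in a Voronoi cell belongs to its nearest player, the share of the field each team controls can be approximated without constructing polygons: sample a grid, assign each grid point to the closest player, and count. The following is a minimal sketch of that idea using only NumPy. The `team_space_share` name, the four player positions, and the 1-yard grid step are assumptions made for illustration.

```python
import numpy as np

def team_space_share(xy, teams, x_range=(60, 100), y_range=(0, 53.3), step=1.0):
    """Approximate each team's share of a field window by nearest-player assignment."""
    xy = np.asarray(xy, dtype=float)
    teams = np.asarray(teams)

    # Grid of sample points covering the window
    gx = np.arange(x_range[0], x_range[1] + step / 2, step)
    gy = np.arange(y_range[0], y_range[1] + step / 2, step)
    grid = np.array(np.meshgrid(gx, gy)).reshape(2, -1).T  # shape (n_points, 2)

    # Squared distance from every grid point to every player (broadcasting)
    d2 = ((grid[:, None, :] - xy[None, :, :]) ** 2).sum(axis=2)

    # The closest player owns the grid point (this is its Voronoi cell)
    owner = d2.argmin(axis=1)
    owner_team = teams[owner]

    return {str(team): float((owner_team == team).mean())
            for team in np.unique(teams)}

# Hypothetical positions: two offensive and two defensive players
positions = [(70, 20), (72, 35), (85, 15), (88, 40)]
teams = ['OFF', 'OFF', 'DEF', 'DEF']

shares = team_space_share(positions, teams)
for team, share in shares.items():
    print(f"{team}: {share:.1%} of the window")
```

A finer `step` gives a closer approximation to the true cell areas at the cost of more distance computations. Unlike the polygon-based plot, this approach also credits players with the unbounded cells at the edge of the formation.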

Pitch Control Model

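A pitch control model is a more sophisticated approach to spatial control. Instead of assigning each location to the single nearest player, it sums a distance- and speed-weighted influence over every player and reports each team's share of the total: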

#| label: pitch-control-r
#| eval: false
#| echo: true

# Simplified pitch control model, loosely based on Spearman et al. (2017)
calculate_pitch_control <- function(tracking_data, frame_num,
                                   grid_x = seq(60, 100, 1),
                                   grid_y = seq(0, 53.3, 1)) {
  frame_data <- tracking_data %>% filter(frameId == frame_num)

  # Create grid
  grid <- expand_grid(grid_x = grid_x, grid_y = grid_y)

  # Player data is the same for every grid point, so extract it once
  off_players <- frame_data %>% filter(team == "OFF")
  def_players <- frame_data %>% filter(team == "DEF")

  # Calculate influence for each grid point
  grid$off_control <- 0
  grid$def_control <- 0

  for (i in seq_len(nrow(grid))) {
    gx <- grid$grid_x[i]
    gy <- grid$grid_y[i]

    # Offensive influence: decays with distance, more slowly for faster players
    off_distances <- sqrt((off_players$x - gx)^2 + (off_players$y - gy)^2)
    off_influence <- sum(exp(-off_distances / (off_players$s + 1)))

    # Defensive influence
    def_distances <- sqrt((def_players$x - gx)^2 + (def_players$y - gy)^2)
    def_influence <- sum(exp(-def_distances / (def_players$s + 1)))

    # Each team's share of the total influence
    total_influence <- off_influence + def_influence
    grid$off_control[i] <- off_influence / total_influence
    grid$def_control[i] <- def_influence / total_influence
  }

  grid
}

# Calculate and visualize pitch control
pitch_control <- calculate_pitch_control(tracking_example, frame_num = 25)

ggplot(pitch_control, aes(x = grid_x, y = grid_y, fill = off_control)) +
  geom_tile() +
  scale_fill_gradient2(
    low = "red", mid = "white", high = "blue",
    midpoint = 0.5,
    name = "Offensive\nControl"
  ) +
  geom_point(
    data = tracking_example %>% filter(frameId == 25),
    aes(x = x, y = y, color = team),
    size = 5, inherit.aes = FALSE
  ) +
  scale_color_manual(values = c("OFF" = "blue", "DEF" = "red")) +
  coord_fixed() +
  labs(
    title = "Pitch Control Model",
    subtitle = "Probability of offensive control at each location",
    x = "Field Position (yards)",
    y = "Field Width (yards)"
  ) +
  theme_minimal()
#| label: pitch-control-py
#| eval: false
#| echo: true

def calculate_pitch_control(tracking_data, frame_num,
                           grid_x=np.arange(60, 101, 1),
                           grid_y=np.arange(0, 54, 1)):
    """Calculate pitch control using influence functions"""
    frame_data = tracking_data[tracking_data['frameId'] == frame_num]

    # Create grid
    xx, yy = np.meshgrid(grid_x, grid_y)

    # Initialize control arrays
    off_control = np.zeros_like(xx, dtype=float)
    def_control = np.zeros_like(xx, dtype=float)

    # Get player data
    off_players = frame_data[frame_data['team'] == 'OFF']
    def_players = frame_data[frame_data['team'] == 'DEF']

    # Calculate influence for each grid point
    for i in range(xx.shape[0]):
        for j in range(xx.shape[1]):
            gx, gy = xx[i, j], yy[i, j]

            # Offensive influence
            off_distances = np.sqrt(
                (off_players['x'] - gx)**2 + (off_players['y'] - gy)**2
            )
            off_speeds = off_players['s'].values
            off_influence = np.sum(np.exp(-off_distances / (off_speeds + 1)))

            # Defensive influence
            def_distances = np.sqrt(
                (def_players['x'] - gx)**2 + (def_players['y'] - gy)**2
            )
            def_speeds = def_players['s'].values
            def_influence = np.sum(np.exp(-def_distances / (def_speeds + 1)))

            # Calculate control probability
            total_influence = off_influence + def_influence
            off_control[i, j] = off_influence / total_influence
            def_control[i, j] = def_influence / total_influence

    return xx, yy, off_control, def_control, frame_data

# Calculate and visualize
xx, yy, off_control, def_control, frame_data = calculate_pitch_control(
    tracking_example, frame_num=25
)

plt.figure(figsize=(12, 8))

# Plot heatmap
plt.contourf(xx, yy, off_control, levels=20, cmap='RdBu', alpha=0.8)
plt.colorbar(label='Offensive Control Probability')

# Plot players
off = frame_data[frame_data['team'] == 'OFF']
plt.scatter(off['x'], off['y'], c='blue', s=200, edgecolors='white',
           linewidths=2, label='Offense', zorder=3)

deff = frame_data[frame_data['team'] == 'DEF']
plt.scatter(deff['x'], deff['y'], c='red', s=200, edgecolors='white',
           linewidths=2, label='Defense', zorder=3)

plt.xlabel('Field Position (yards)', fontsize=12)
plt.ylabel('Field Width (yards)', fontsize=12)
plt.title('Pitch Control Model\nProbability of offensive control at each location',
          fontsize=14, fontweight='bold')
plt.legend()
plt.axis('equal')
plt.tight_layout()
plt.show()
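
The nested loops above evaluate one grid cell at a time. The same influence function, exp(-distance / (speed + 1)) summed per team and then normalized, can be computed for the whole grid at once with NumPy broadcasting. This is a sketch under the same simplifying assumptions as the code above; the `team_influence` and `pitch_control_vectorized` names and the toy one-on-one frame are hypothetical.

```python
import numpy as np

def team_influence(xy, speeds, xx, yy):
    """Sum exp(-distance / (speed + 1)) over one team's players at every grid cell."""
    xy = np.asarray(xy, dtype=float)
    speeds = np.asarray(speeds, dtype=float)
    # Distances with shape (n_players, n_rows, n_cols)
    dist = np.sqrt(
        (xy[:, 0, None, None] - xx[None, :, :]) ** 2
        + (xy[:, 1, None, None] - yy[None, :, :]) ** 2
    )
    return np.exp(-dist / (speeds[:, None, None] + 1)).sum(axis=0)

def pitch_control_vectorized(off_xy, off_s, def_xy, def_s,
                             grid_x=np.arange(60, 101, 1),
                             grid_y=np.arange(0, 54, 1)):
    """Offensive share of total influence at every grid cell."""
    xx, yy = np.meshgrid(grid_x, grid_y)
    off_inf = team_influence(off_xy, off_s, xx, yy)
    def_inf = team_influence(def_xy, def_s, xx, yy)
    return xx, yy, off_inf / (off_inf + def_inf)

# Toy frame: one receiver and one defender (made-up values)
xx, yy, off_control = pitch_control_vectorized(
    off_xy=[(75, 20)], off_s=[8.0],
    def_xy=[(85, 20)], def_s=[4.0],
)

print(off_control.shape)
# Row 20, column 15 is the grid cell at y = 20, x = 75 (the receiver's location)
print(f"Control at the receiver's spot: {off_control[20, 15]:.2f}")
```

The returned `xx`, `yy`, and control array have the same layout as the loop-based version, so they can be passed to `plt.contourf` unchanged.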

Animation and Visualization

Static Frame Visualization

#| label: fig-field-viz-r
#| fig-cap: "Field visualization with player positions"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false

# Function to plot a single frame
plot_frame <- function(tracking_data, frame_num) {
  frame_data <- tracking_data %>% filter(frameId == frame_num)

  ggplot() +
    # Field
    geom_rect(aes(xmin = 0, xmax = 120, ymin = 0, ymax = 53.3),
              fill = "#196f0c", alpha = 0.5) +
    # Yard lines (every 5 yards) and sidelines
    geom_vline(xintercept = seq(10, 110, by = 5),
               color = "white", alpha = 0.3) +
    geom_hline(yintercept = c(0, 53.3), color = "white") +
    # Line of scrimmage
    geom_vline(xintercept = 75, color = "yellow", linewidth = 1.5) +
    # Players
    geom_point(data = frame_data, aes(x = x, y = y, color = team),
               size = 8, alpha = 0.8) +
    geom_text(data = frame_data, aes(x = x, y = y, label = jerseyNumber),
              color = "white", size = 3, fontface = "bold") +
    # Paths (trace)
    geom_path(data = tracking_data %>%
                filter(frameId <= frame_num) %>%
                arrange(nflId, frameId),
              aes(x = x, y = y, color = team, group = nflId),
              alpha = 0.3, linewidth = 0.8) +
    scale_color_manual(
      values = c("OFF" = "#0000FF", "DEF" = "#FF0000"),
      labels = c("OFF" = "Offense", "DEF" = "Defense")
    ) +
    coord_fixed(xlim = c(65, 105), ylim = c(0, 53.3)) +
    labs(
      title = paste("Play Tracking - Frame", frame_num),
      x = "Field Position (yards)",
      y = "Field Width (yards)",
      color = "Team"
    ) +
    theme_minimal() +
    theme(
      plot.title = element_text(face = "bold", size = 14),
      panel.grid = element_blank(),
      legend.position = "top"
    )
}

# Plot a single frame
plot_frame(tracking_example, frame_num = 25)

#| label: fig-field-viz-py
#| fig-cap: "Field visualization with player positions - Python"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false

def plot_frame(tracking_data, frame_num):
    """Plot a single frame of tracking data"""
    frame_data = tracking_data[tracking_data['frameId'] == frame_num]

    fig, ax = plt.subplots(figsize=(12, 8))

    # Field
    ax.add_patch(plt.Rectangle((0, 0), 120, 53.3,
                               facecolor='#196f0c', alpha=0.5))

    # Yard lines
    for x in range(10, 111, 5):
        ax.axvline(x, color='white', alpha=0.3, linewidth=0.5)

    # Sidelines
    ax.axhline(0, color='white', linewidth=1)
    ax.axhline(53.3, color='white', linewidth=1)

    # Line of scrimmage
    ax.axvline(75, color='yellow', linewidth=2)

    # Player paths (trace)
    for nfl_id in tracking_data['nflId'].unique():
        player_path = tracking_data[
            (tracking_data['nflId'] == nfl_id) &
            (tracking_data['frameId'] <= frame_num)
        ]
        team = player_path['team'].iloc[0]
        color = 'blue' if team == 'OFF' else 'red'
        ax.plot(player_path['x'], player_path['y'],
               color=color, alpha=0.3, linewidth=1.5)

    # Current player positions
    for team, color in [('OFF', 'blue'), ('DEF', 'red')]:
        team_data = frame_data[frame_data['team'] == team]
        ax.scatter(team_data['x'], team_data['y'],
                  c=color, s=250, alpha=0.8, edgecolors='white',
                  linewidths=2, zorder=3)
        for _, player in team_data.iterrows():
            ax.text(player['x'], player['y'],
                   str(int(player['jerseyNumber'])),
                   color='white', fontsize=10, fontweight='bold',
                   ha='center', va='center', zorder=4)

    ax.set_xlim(65, 105)
    ax.set_ylim(0, 53.3)
    ax.set_aspect('equal')
    ax.set_xlabel('Field Position (yards)', fontsize=12)
    ax.set_ylabel('Field Width (yards)', fontsize=12)
    ax.set_title(f'Play Tracking - Frame {frame_num}',
                fontsize=14, fontweight='bold')
    ax.grid(False)

    # Legend
    from matplotlib.patches import Patch
    legend_elements = [
        Patch(facecolor='blue', label='Offense'),
        Patch(facecolor='red', label='Defense')
    ]
    ax.legend(handles=legend_elements, loc='upper right')

    plt.tight_layout()
    plt.show()

plot_frame(tracking_example, frame_num=25)

Animated Visualization

#| label: animation-r
#| eval: false
#| echo: true

library(gganimate)

# Create animated visualization
create_play_animation <- function(tracking_data) {
  p <- ggplot() +
    # Field
    geom_rect(aes(xmin = 0, xmax = 120, ymin = 0, ymax = 53.3),
              fill = "#196f0c", alpha = 0.5) +
    # Yard lines
    geom_vline(xintercept = seq(10, 110, by = 5),
               color = "white", alpha = 0.3) +
    # Line of scrimmage
    geom_vline(xintercept = 75, color = "yellow", linewidth = 1.5) +
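    # Player paths: drop frameId so this layer stays static across frames
    # (with transition_manual, a per-frame path would contain only one point)
    geom_path(data = tracking_data %>%
                arrange(nflId, frameId) %>%
                select(-frameId),
              aes(x = x, y = y, color = team, group = nflId),
              alpha = 0.3, linewidth = 0.8) +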
    # Players
    geom_point(data = tracking_data,
               aes(x = x, y = y, color = team),
               size = 8, alpha = 0.8) +
    geom_text(data = tracking_data,
              aes(x = x, y = y, label = jerseyNumber),
              color = "white", size = 3, fontface = "bold") +
    scale_color_manual(
      values = c("OFF" = "#0000FF", "DEF" = "#FF0000")
    ) +
    coord_fixed(xlim = c(65, 105), ylim = c(0, 53.3)) +
    labs(
      title = "Play Animation - Frame {frame}",
      x = "Field Position (yards)",
      y = "Field Width (yards)",
      color = "Team"
    ) +
    theme_minimal() +
    theme(panel.grid = element_blank()) +
    # Animation
    transition_manual(frameId)

  # Animate
  animate(p, nframes = max(tracking_data$frameId),
          fps = 10, width = 1000, height = 600)
}

# Create animation
# anim <- create_play_animation(tracking_example)
# anim_save("play_animation.gif", animation = anim)
#| label: animation-py
#| eval: false
#| echo: true

from matplotlib.animation import FuncAnimation
from IPython.display import HTML

def create_play_animation(tracking_data):
    """Create animated visualization of play"""

    fig, ax = plt.subplots(figsize=(12, 8))

    def update(frame_num):
        ax.clear()

        # Field
        ax.add_patch(plt.Rectangle((0, 0), 120, 53.3,
                                   facecolor='#196f0c', alpha=0.5))

        # Yard lines
        for x in range(10, 111, 5):
            ax.axvline(x, color='white', alpha=0.3, linewidth=0.5)
        ax.axvline(75, color='yellow', linewidth=2)

        # Player paths up to current frame
        for nfl_id in tracking_data['nflId'].unique():
            player_path = tracking_data[
                (tracking_data['nflId'] == nfl_id) &
                (tracking_data['frameId'] <= frame_num)
            ]
            team = player_path['team'].iloc[0]
            color = 'blue' if team == 'OFF' else 'red'
            ax.plot(player_path['x'], player_path['y'],
                   color=color, alpha=0.3, linewidth=1.5)

        # Current positions
        frame_data = tracking_data[tracking_data['frameId'] == frame_num]
        for team, color in [('OFF', 'blue'), ('DEF', 'red')]:
            team_data = frame_data[frame_data['team'] == team]
            ax.scatter(team_data['x'], team_data['y'],
                      c=color, s=250, alpha=0.8, edgecolors='white',
                      linewidths=2, zorder=3)
            for _, player in team_data.iterrows():
                ax.text(player['x'], player['y'],
                       str(int(player['jerseyNumber'])),
                       color='white', fontsize=10, fontweight='bold',
                       ha='center', va='center', zorder=4)

        ax.set_xlim(65, 105)
        ax.set_ylim(0, 53.3)
        ax.set_aspect('equal')
        ax.set_xlabel('Field Position (yards)', fontsize=12)
        ax.set_ylabel('Field Width (yards)', fontsize=12)
        ax.set_title(f'Play Animation - Frame {frame_num}',
                    fontsize=14, fontweight='bold')
        ax.grid(False)

    frames = sorted(tracking_data['frameId'].unique())
    anim = FuncAnimation(fig, update, frames=frames,
                        interval=100, repeat=True)

    plt.close()
    return anim

# Create animation
# anim = create_play_animation(tracking_example)
# HTML(anim.to_html5_video())
# Or save: anim.save('play_animation.mp4', writer='ffmpeg')

Advanced Tracking Applications

Pass Rush Analysis

Analyze pass rush paths and pressure generation:

#| label: pass-rush-r
#| eval: false
#| echo: true

# Calculate pass rush metrics
calculate_rush_metrics <- function(tracking_data, qb_frame) {
  # QB position at the release frame
  qb <- tracking_data %>%
    filter(position == "QB", frameId == qb_frame) %>%
    slice(1)

  tracking_data %>%
    filter(team == "DEF") %>%
    group_by(nflId, jerseyNumber) %>%
    summarise(
      time_to_qb = qb_frame / 10,  # release frame in seconds (10 frames per second)
      # Closest approach to the QB's release spot
      distance_to_qb = min(sqrt((x - qb$x)^2 + (y - qb$y)^2)),
      max_speed = max(s, na.rm = TRUE),
      # Assumes per-frame displacement columns dx and dy are already present
      rush_distance = sum(sqrt(dx^2 + dy^2), na.rm = TRUE),
      pressure_created = distance_to_qb < 3,
      .groups = "drop"
    )
}
#| label: pass-rush-py
#| eval: false
#| echo: true

def calculate_rush_metrics(tracking_data, qb_frame):
    """Calculate pass rush metrics"""
    rush_metrics = []

    defenders = tracking_data[tracking_data['team'] == 'DEF']

    # Get QB position at release (the same for every defender)
    qb_pos = tracking_data[
        (tracking_data['position'] == 'QB') &
        (tracking_data['frameId'] == qb_frame)
    ][['x', 'y']].values[0]

    for nfl_id in defenders['nflId'].unique():
        player_data = defenders[defenders['nflId'] == nfl_id]

        # Calculate distances to QB
        distances = np.sqrt(
            (player_data['x'] - qb_pos[0])**2 +
            (player_data['y'] - qb_pos[1])**2
        )

        rush_metrics.append({
            'nflId': nfl_id,
            'jerseyNumber': player_data['jerseyNumber'].iloc[0],
            'time_to_qb': qb_frame / 10,
            'distance_to_qb': distances.min(),
            'max_speed': player_data['s'].max(),
            'pressure_created': distances.min() < 3
        })

    return pd.DataFrame(rush_metrics)

Expected Yards After Catch (xYAC)

Model expected YAC based on spatial configuration:

#| label: xyac-r
#| eval: false
#| echo: true

# Calculate xYAC features
calculate_xyac_features <- function(tracking_data, catch_frame) {
  catch_data <- tracking_data %>% filter(frameId == catch_frame)

  receiver <- catch_data %>% filter(position == "WR") %>% slice(1)  # first WR if several
  defenders <- catch_data %>% filter(team == "DEF")

  # Calculate distances from each defender to the receiver
  distances <- sqrt(
    (defenders$x - receiver$x)^2 + (defenders$y - receiver$y)^2
  )

  # Defenders ahead of receiver
  defenders_ahead <- sum(defenders$x > receiver$x)

  # Space metrics
  tibble(
    receiver_speed = receiver$s,
    nearest_defender = min(distances),
    defenders_within_5yd = sum(distances < 5),
    defenders_ahead = defenders_ahead,
    open_field = (53.3 - max(abs(receiver$y - 26.65))) / 26.65,
    distance_to_endzone = 110 - receiver$x
  )
}

# Example model
# yac_model <- lm(yards_after_catch ~ receiver_speed + nearest_defender +
#                 defenders_within_5yd + defenders_ahead + open_field,
#                 data = yac_training_data)
#| label: xyac-py
#| eval: false
#| echo: true

def calculate_xyac_features(tracking_data, catch_frame):
    """Calculate expected YAC features"""
    catch_data = tracking_data[tracking_data['frameId'] == catch_frame]

    receiver = catch_data[catch_data['position'] == 'WR'].iloc[0]
    defenders = catch_data[catch_data['team'] == 'DEF']

    # Calculate distances
    distances = np.sqrt(
        (defenders['x'] - receiver['x'])**2 +
        (defenders['y'] - receiver['y'])**2
    )

    features = {
        'receiver_speed': receiver['s'],
        'nearest_defender': distances.min(),
        'defenders_within_5yd': (distances < 5).sum(),
        'defenders_ahead': (defenders['x'] > receiver['x']).sum(),
        'open_field': (53.3 - abs(receiver['y'] - 26.65)) / 26.65,
        'distance_to_endzone': 110 - receiver['x']
    }

    return features

# Example model
# from sklearn.ensemble import RandomForestRegressor
# yac_model = RandomForestRegressor(n_estimators=100, random_state=42)
# yac_model.fit(X_train, y_train)

Receiver Route Running Efficiency

Evaluate route running quality:

#| label: route-efficiency-r
#| message: false
#| warning: false

# Calculate route efficiency metrics
calculate_route_efficiency <- function(player_tracking) {
  player_tracking <- player_tracking %>% arrange(frameId)

  tibble(
    route_id = first(player_tracking$nflId),
    total_distance = sum(sqrt(
      player_tracking$dx^2 + player_tracking$dy^2
    ), na.rm = TRUE),
    straight_distance = sqrt(
      (last(player_tracking$x) - first(player_tracking$x))^2 +
      (last(player_tracking$y) - first(player_tracking$y))^2
    ),
    efficiency_ratio = straight_distance / total_distance,
    avg_speed = mean(player_tracking$s, na.rm = TRUE),
    top_speed = max(player_tracking$s, na.rm = TRUE),
    speed_variance = sd(player_tracking$s, na.rm = TRUE),
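    # Headings wrap at 360 degrees, so use the smaller angle between frames
    sharp_cuts = {
      d <- abs(diff(player_tracking$dir)) %% 360
      sum(pmin(d, 360 - d) > 45, na.rm = TRUE)
    }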
  ) %>%
    mutate(
      route_quality_score = (efficiency_ratio * 0.3) +
                           (avg_speed / 15 * 0.3) +
                           (top_speed / 20 * 0.2) +
                           (1 / (sharp_cuts + 1) * 0.2)
    )
}

# Calculate for receiver
wr_tracking <- tracking_example %>% filter(position == "WR")
route_efficiency <- calculate_route_efficiency(wr_tracking)

route_efficiency %>%
  select(-route_id) %>%
  pivot_longer(everything(), names_to = "Metric", values_to = "Value") %>%
  gt() %>%
  fmt_number(columns = Value, decimals = 3) %>%
  tab_header(title = "Route Running Efficiency Metrics")
#| label: route-efficiency-py
#| message: false
#| warning: false

def calculate_route_efficiency(player_tracking):
    """Calculate route running efficiency metrics for one player's route."""
    player_tracking = player_tracking.sort_values('frameId')

    # Frame-to-frame displacement and total path length
    dx = player_tracking['x'].diff().fillna(0)
    dy = player_tracking['y'].diff().fillna(0)
    total_distance = np.sqrt(dx**2 + dy**2).sum()

    # Straight-line distance from the first frame to the last
    straight_distance = np.sqrt(
        (player_tracking['x'].iloc[-1] - player_tracking['x'].iloc[0])**2 +
        (player_tracking['y'].iloc[-1] - player_tracking['y'].iloc[0])**2
    )

    # Heading change between frames, wrapped so 350 -> 10 counts as 20 degrees
    dir_changes = player_tracking['dir'].diff().abs() % 360
    dir_changes = np.minimum(dir_changes, 360 - dir_changes)
    sharp_cuts = int((dir_changes > 45).sum())

    # Guard against division by zero for a stationary player
    efficiency_ratio = straight_distance / total_distance if total_distance > 0 else 0
    avg_speed = player_tracking['s'].mean()
    top_speed = player_tracking['s'].max()

    # Illustrative weighted score; weights sum to 1
    route_quality_score = (
        (efficiency_ratio * 0.3) +
        (avg_speed / 15 * 0.3) +
        (top_speed / 20 * 0.2) +
        (1 / (sharp_cuts + 1) * 0.2)
    )

    return {
        'total_distance': total_distance,
        'straight_distance': straight_distance,
        'efficiency_ratio': efficiency_ratio,
        'avg_speed': avg_speed,
        'top_speed': top_speed,
        # Note: this is the standard deviation of speed, not the variance
        'speed_variance': player_tracking['s'].std(),
        'sharp_cuts': sharp_cuts,
        'route_quality_score': route_quality_score
    }

# Calculate for receiver
# (the function assumes one player's frames; if tracking_example contains
# several WRs, filter to a single nflId first)
wr_tracking = tracking_example[tracking_example['position'] == 'WR']
route_efficiency = calculate_route_efficiency(wr_tracking)

print("\nRoute Running Efficiency Metrics:")
for metric, value in route_efficiency.items():
    print(f"{metric:25s}: {value:.3f}")
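
The efficiency ratio is easy to check by hand on a made-up route. The sketch below is a minimal, standard-library-only illustration and is not part of the chapter's pipeline: the helper names `path_efficiency` and `heading_change` and the out-route coordinates are hypothetical. It also shows why heading differences are worth wrapping at 360 degrees before counting sharp cuts, since a raw difference can greatly overstate a small turn.

```python
import math

def path_efficiency(points):
    """Return (total_distance, straight_distance, ratio) for a list of (x, y) points."""
    # Sum the distance between each pair of consecutive points
    total = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    # Distance from the first point to the last
    straight = math.dist(points[0], points[-1])
    ratio = straight / total if total > 0 else 0.0
    return total, straight, ratio

def heading_change(a, b):
    """Smallest absolute angle in degrees between two headings on a 0-360 scale."""
    diff = abs(b - a) % 360
    return min(diff, 360 - diff)

# Hypothetical out route: a 10-yard stem downfield (x), then a 90-degree
# break and 5 yards toward the sideline (y), sampled at 1-yard steps
stem = [(float(i), 0.0) for i in range(11)]
out = [(10.0, float(j)) for j in range(1, 6)]

total, straight, ratio = path_efficiency(stem + out)
print(f"total={total:.2f} straight={straight:.2f} ratio={ratio:.3f}")

# A heading that moves from 350 to 10 degrees is a 20-degree turn,
# even though the raw difference is 340
print(heading_change(350, 10), abs(10 - 350))
```

A route with a single 90-degree break scores well below 1 on this ratio even when it is run perfectly, so the ratio is better read as a description of route shape than as a standalone grade of the receiver.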

Summary

In this chapter, we explored player tracking data and spatial analysis in football:

  • Data Structure: Learned the format and coordinate system of tracking data
  • Movement Metrics: Calculated speed, acceleration, and distance traveled
  • Separation Analysis: Measured receiver-defender separation and coverage quality
  • Route Analysis: Clustered routes and evaluated route running efficiency
  • Completion Models: Built expected completion models using spatial features
  • Formation Detection: Identified defensive alignments from positioning data
  • Spatial Control: Applied Voronoi diagrams and pitch control models
  • Visualization: Created static and animated visualizations of plays
  • Advanced Applications: Analyzed pass rush, expected YAC, and route quality

Tracking data provides unprecedented insight into player movement and spatial dynamics, enabling analysis that was impossible with traditional statistics. As tracking technology continues to improve and become more widely available, these spatial analytics will become increasingly important for team strategy and player evaluation.

Key Takeaways

  1. Tracking data captures player locations 10 times per second, providing granular movement data
  2. Separation is a critical metric for completion probability
  3. Spatial control models (Voronoi, pitch control) quantify field dominance
  4. Route clustering can identify patterns and tendencies
  5. Tracking-based models can outperform models built on traditional statistics alone for many predictions
  6. Visualization is crucial for communicating tracking insights

Exercises

Conceptual Questions

  1. Coordinate Systems: Why is it important to standardize play direction when working with tracking data? What problems could arise if you don't?

  2. Separation vs Coverage: Explain the difference between separation distance and coverage quality. Can a defender provide good coverage even with high separation?

  3. Spatial Control: Compare and contrast Voronoi diagrams and pitch control models for measuring spatial dominance. What are the advantages and limitations of each?

Coding Exercises

Exercise 1: Speed Analysis

Using tracking data:

  a) Calculate the top speed achieved by each position group
  b) Identify plays where receivers reached maximum speed
  c) Analyze the relationship between receiver speed and separation

**Dataset**: Use Big Data Bowl tracking data or the synthetic data from this chapter.

Exercise 2: Separation Heat Map

Create a heat map showing:

  a) Average separation by field location
  b) Separation at different route depths
  c) "Hot zones" where receivers generate the most separation

**Visualization**: Use ggplot2 or matplotlib to create the heat map.

Exercise 3: Route Clustering

Perform route clustering analysis:

  a) Extract features from 100+ routes (use synthetic or real data)
  b) Apply k-means clustering with k=4
  c) Visualize the clusters and interpret the route types
  d) Calculate separation achieved by each route cluster

**Method**: Use features like depth, width, direction changes, and ending position.

Exercise 4: Expected Completion Model

Build an expected completion probability model:

  a) Create features from tracking data (separation, coverage, speed)
  b) Train a logistic regression or random forest model
  c) Evaluate model performance (AUC, calibration)
  d) Identify which features are most important

**Advanced**: Compare model performance with and without tracking features.

Exercise 5: Play Animation

Create an animated visualization:

  a) Load a full play from tracking data
  b) Create frames showing player movement
  c) Add trails showing player paths
  d) Highlight key events (snap, throw, catch)

**Output**: Save as GIF or MP4 file.

Further Reading

Academic Papers

  • Fernández, J., & Bornn, L. (2018). "Wide Open Spaces: A statistical technique for measuring space creation in professional soccer." MIT Sloan Sports Analytics Conference.

  • Spearman, W. (2018). "Beyond Expected Goals." MIT Sloan Sports Analytics Conference.

  • Steiner, S., & Raabe, D. (2020). "Player tracking data in football: An application to position-specific player performance." Journal of Sports Analytics, 6(4), 241-252.

Technical Resources

  • NFL Big Data Bowl Competition (Kaggle)
  • Next Gen Stats Documentation (NFL.com)
  • Fernández, J. et al. (2019). "Decomposing the Immeasurable Sport" (tracking data methodology)

Books and Guides

  • Alamar, B. (2013). Sports Analytics: A Guide for Coaches, Managers, and Other Decision Makers. Columbia University Press.

  • "A Framework for Tactical Analysis and Individual Offensive Production Assessment in Soccer Using Markov Chains" - Link, D. et al.

References

:::