Chapter 16: Run Defense Analytics | Football Analytics Textbook

Learning ObjectivesBy the end of this chapter, you will be able to:

Evaluate run defense beyond yards per carry allowed
Analyze defensive front effectiveness
Study gap integrity and assignment discipline
Measure tackle efficiency and missed tackles
Understand run fits and defensive scheme

Introduction

In the chess match between offense and defense, stopping the run remains a fundamental aspect of defensive success. While modern football has become increasingly pass-heavy, the ability to control the line of scrimmage and limit rushing efficiency creates favorable down-and-distance situations, forces opponents into predictable passing situations, and controls game tempo.

Traditional run defense metrics like yards per carry allowed and rushing yards per game provide only a surface-level view of defensive effectiveness. They fail to account for situational context, opponent quality, game script, and the specific roles different defenders play in run defense. This chapter introduces a comprehensive framework for evaluating run defense using advanced analytics.

Understanding run defense requires analyzing multiple levels: the defensive line's ability to control gaps and generate penetration, linebackers' gap discipline and tackling efficiency, safeties' run support and pursuit angles, and the overall scheme's effectiveness in different situations. We'll examine each dimension using data-driven approaches.

The Modern Run Defense Challenge

Today's NFL offenses employ diverse rushing schemes: inside and outside zone, gap/power concepts, read-option plays, and quarterback designed runs. Effective run defense requires versatility, discipline, and the ability to adapt to multiple offensive approaches while maintaining sound fundamentals.

The Problem with Yards Per Carry Allowed

Why YPC Allowed is Misleading

Yards per carry allowed has been the standard run defense metric for decades, but it suffers from critical limitations:

Game Script Effects: Teams winning by multiple scores face more rushing attempts as opponents abandon the run. This inflates YPC allowed for good teams.

Opponent Quality: Playing against elite rushing offenses vs poor ones dramatically affects averages.

Volume Bias: Defenses facing many carries often have better YPC allowed because they include more short-yardage situations.

Situational Blindness: Giving up 5 yards on 3rd-and-10 is excellent defense, but it counts the same as 5 yards on 3rd-and-4.

Outlier Sensitivity: A single 70-yard run can skew an entire game's statistics.

Let's examine real data to illustrate these limitations:

R
Python

#| label: ypc-allowed-limitations-r
#| message: false
#| warning: false
#| cache: true

library(tidyverse)
library(nflfastR)
library(gt)
library(nflplotR)

# Load 2023 play-by-play data
pbp <- load_pbp(2023)

# Calculate basic run defense stats
run_defense_basic <- pbp %>%
  filter(play_type == "run", !is.na(defteam)) %>%
  group_by(defteam) %>%
  summarise(
    rushes_faced = n(),
    total_yards = sum(yards_gained, na.rm = TRUE),
    ypc_allowed = mean(yards_gained, na.rm = TRUE),
    longest_allowed = max(yards_gained, na.rm = TRUE),
    rushes_over_20 = sum(yards_gained >= 20, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(ypc_allowed)

# Show top 10 by YPC allowed
run_defense_basic %>%
  head(10) %>%
  gt() %>%
  cols_label(
    defteam = "Defense",
    rushes_faced = "Rushes",
    total_yards = "Yards",
    ypc_allowed = "YPC",
    longest_allowed = "Long",
    rushes_over_20 = "20+ Yd Runs"
  ) %>%
  fmt_number(
    columns = ypc_allowed,
    decimals = 2
  ) %>%
  fmt_number(
    columns = c(rushes_faced, total_yards, longest_allowed, rushes_over_20),
    decimals = 0
  ) %>%
  tab_header(
    title = "Best Run Defenses by Yards Per Carry Allowed",
    subtitle = "2023 Season"
  )

#| label: ypc-allowed-limitations-py
#| message: false
#| warning: false
#| cache: true

import pandas as pd
import numpy as np
import nfl_data_py as nfl

# Load 2023 play-by-play data
pbp = nfl.import_pbp_data([2023])

# Calculate basic run defense stats
run_defense_basic = (pbp
    .query("play_type == 'run' & defteam.notna()")
    .groupby('defteam')
    .agg(
        rushes_faced=('play_id', 'count'),
        total_yards=('yards_gained', 'sum'),
        ypc_allowed=('yards_gained', 'mean'),
        longest_allowed=('yards_gained', 'max'),
        rushes_over_20=('yards_gained', lambda x: (x >= 20).sum())
    )
    .reset_index()
    .sort_values('ypc_allowed')
)

print("Best Run Defenses by Yards Per Carry Allowed (2023 Season):\n")
print(run_defense_basic.head(10).to_string(index=False))

The Outlier Effect on Run Defense

Let's examine how a few long runs impact defensive YPC:

R
Python

#| label: outlier-effect-defense-r
#| message: false
#| warning: false

# Analyze impact of longest runs on defensive YPC
outlier_impact <- pbp %>%
  filter(play_type == "run", !is.na(defteam)) %>%
  group_by(defteam) %>%
  summarise(
    rushes_faced = n(),
    ypc_with_long = mean(yards_gained, na.rm = TRUE),
    ypc_without_long = mean(yards_gained[yards_gained != max(yards_gained)], na.rm = TRUE),
    longest_run = max(yards_gained, na.rm = TRUE),
    ypc_difference = ypc_with_long - ypc_without_long,
    .groups = "drop"
  ) %>%
  arrange(desc(ypc_difference))

# Show teams most affected by outliers
outlier_impact %>%
  head(10) %>%
  gt() %>%
  cols_label(
    defteam = "Defense",
    rushes_faced = "Rushes",
    ypc_with_long = "YPC (w/ Long)",
    ypc_without_long = "YPC (w/o Long)",
    longest_run = "Longest Run",
    ypc_difference = "Difference"
  ) %>%
  fmt_number(
    columns = c(ypc_with_long, ypc_without_long, ypc_difference),
    decimals = 2
  ) %>%
  fmt_number(
    columns = c(rushes_faced, longest_run),
    decimals = 0
  ) %>%
  tab_header(
    title = "Impact of Longest Run Allowed on YPC",
    subtitle = "How much does one run skew defensive stats?"
  )

#| label: outlier-effect-defense-py
#| message: false
#| warning: false

# Analyze impact of longest runs
def calc_ypc_without_max(series):
    return series[series != series.max()].mean()

outlier_impact = (pbp
    .query("play_type == 'run' & defteam.notna()")
    .groupby('defteam')
    .agg(
        rushes_faced=('yards_gained', 'count'),
        ypc_with_long=('yards_gained', 'mean'),
        ypc_without_long=('yards_gained', calc_ypc_without_max),
        longest_run=('yards_gained', 'max')
    )
    .reset_index()
)

outlier_impact['ypc_difference'] = (outlier_impact['ypc_with_long'] -
                                     outlier_impact['ypc_without_long'])

print("\nImpact of Longest Run Allowed on YPC:\n")
print(outlier_impact.nlargest(10, 'ypc_difference').to_string(index=False))

Run EPA Allowed: A Better Framework

Expected Points Added (EPA) allowed provides a superior framework for evaluating run defense by accounting for context. Run EPA allowed measures the change in expected points from the defense's perspective on rushing plays.

Understanding Run EPA Allowed

Why Run EPA Allowed is Better Than YPC

Run EPA allowed accounts for: 1. **Down and distance**: Allowing 3 yards on 3rd-and-10 is good defense, on 3rd-and-2 is bad 2. **Field position**: Stops near the goal line are more valuable 3. **Game situation**: Context matters for true defensive value 4. **Success prevention**: Not all yards are equally damaging

Let's calculate run EPA allowed for all defenses:

R
Python

#| label: run-epa-allowed-r
#| message: false
#| warning: false

# Calculate comprehensive run defense metrics
run_defense_advanced <- pbp %>%
  filter(play_type == "run", !is.na(defteam), !is.na(epa)) %>%
  group_by(defteam) %>%
  summarise(
    rushes_faced = n(),
    total_yards = sum(yards_gained, na.rm = TRUE),
    ypc_allowed = mean(yards_gained, na.rm = TRUE),
    epa_per_rush = mean(epa, na.rm = TRUE),
    total_epa_allowed = sum(epa, na.rm = TRUE),
    success_rate_allowed = mean(epa > 0, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(epa_per_rush)

# Display top run defenses by EPA
run_defense_advanced %>%
  head(15) %>%
  gt() %>%
  cols_label(
    defteam = "Defense",
    rushes_faced = "Rushes",
    total_yards = "Yards",
    ypc_allowed = "YPC",
    epa_per_rush = "EPA/Rush",
    total_epa_allowed = "Total EPA",
    success_rate_allowed = "Success % Allowed"
  ) %>%
  fmt_number(
    columns = c(ypc_allowed, epa_per_rush),
    decimals = 3
  ) %>%
  fmt_number(
    columns = total_epa_allowed,
    decimals = 1
  ) %>%
  fmt_percent(
    columns = success_rate_allowed,
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(rushes_faced, total_yards),
    decimals = 0
  ) %>%
  data_color(
    columns = epa_per_rush,
    colors = scales::col_numeric(
      palette = c("green", "white", "red"),
      domain = c(-0.2, 0.1)
    )
  ) %>%
  tab_header(
    title = "Run Defense Efficiency Rankings",
    subtitle = "2023 Season (Lower EPA is better)"
  )

#| label: run-epa-allowed-py
#| message: false
#| warning: false

# Calculate comprehensive run defense metrics
run_defense_advanced = (pbp
    .query("play_type == 'run' & defteam.notna() & epa.notna()")
    .groupby('defteam')
    .agg(
        rushes_faced=('play_id', 'count'),
        total_yards=('yards_gained', 'sum'),
        ypc_allowed=('yards_gained', 'mean'),
        epa_per_rush=('epa', 'mean'),
        total_epa_allowed=('epa', 'sum'),
        success_rate_allowed=('epa', lambda x: (x > 0).mean())
    )
    .reset_index()
    .sort_values('epa_per_rush')
)

print("\nRun Defense Efficiency Rankings (2023 Season):\n")
print(run_defense_advanced.head(15).to_string(index=False))

Comparing YPC and EPA Rankings

Not all defenses with good YPC allowed have good EPA allowed:

R
Python

#| label: fig-ypc-vs-epa-defense-r
#| fig-cap: "Yards per carry allowed vs EPA per rush allowed for defenses"
#| fig-width: 10
#| fig-height: 8
#| message: false
#| warning: false

run_defense_advanced %>%
  ggplot(aes(x = ypc_allowed, y = epa_per_rush)) +
  geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
  geom_vline(xintercept = mean(run_defense_advanced$ypc_allowed),
             linetype = "dashed", alpha = 0.5) +
  geom_nfl_logos(aes(team_abbr = defteam), width = 0.05, alpha = 0.8) +
  geom_smooth(method = "lm", se = TRUE, color = "#D50A0A", linetype = "dashed") +
  labs(
    title = "Run Defense Efficiency: YPC Allowed vs EPA per Rush",
    subtitle = "2023 NFL Season | Lower values = better defense",
    x = "Yards Per Carry Allowed",
    y = "EPA per Rush Allowed",
    caption = "Data: nflfastR | Negative EPA = good defense"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(size = 11),
    legend.position = "none"
  )

#| label: fig-ypc-vs-epa-defense-py
#| fig-cap: "Yards per carry allowed vs EPA per rush - Python"
#| fig-width: 10
#| fig-height: 8
#| message: false
#| warning: false

import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

fig, ax = plt.subplots(figsize=(10, 8))

# Create scatter plot
scatter = ax.scatter(run_defense_advanced['ypc_allowed'],
                     run_defense_advanced['epa_per_rush'],
                     s=150, alpha=0.6, c='#013369')

# Add team labels
for _, row in run_defense_advanced.iterrows():
    ax.annotate(row['defteam'],
                xy=(row['ypc_allowed'], row['epa_per_rush']),
                xytext=(3, 3), textcoords='offset points',
                fontsize=8, alpha=0.8)

# Add reference lines
ax.axhline(y=0, color='gray', linestyle='--', alpha=0.5)
ax.axvline(x=run_defense_advanced['ypc_allowed'].mean(),
           color='gray', linestyle='--', alpha=0.5)

# Add regression line
z = np.polyfit(run_defense_advanced['ypc_allowed'],
               run_defense_advanced['epa_per_rush'], 1)
p = np.poly1d(z)
ax.plot(run_defense_advanced['ypc_allowed'],
        p(run_defense_advanced['ypc_allowed']),
        "r--", alpha=0.8, linewidth=2)

ax.set_xlabel('Yards Per Carry Allowed', fontsize=12)
ax.set_ylabel('EPA per Rush Allowed', fontsize=12)
ax.set_title('Run Defense Efficiency: YPC Allowed vs EPA per Rush\n2023 NFL Season | Lower = Better Defense',
             fontsize=14, fontweight='bold')
ax.text(0.98, 0.02, 'Data: nfl_data_py | Negative EPA = good defense',
        transform=ax.transAxes, ha='right', fontsize=8, style='italic')

plt.tight_layout()
plt.show()

📊 Visualization Output

The code above generates a visualization. To see the output, run this code in your R or Python environment. The resulting plot will help illustrate the concepts discussed in this section.

Stuff Rate: Defensive Line Dominance

Stuff rate measures the percentage of rushing attempts stopped at or behind the line of scrimmage. This metric captures defensive line penetration and front-seven physicality.

Calculating Stuff Rate

Stuff Rate Definition

**Stuff Rate** = Percentage of runs stopped for 0 or negative yards This metric indicates: - Defensive line gap control - Linebacker aggressiveness and timing - Overall front-seven dominance - Ability to create negative plays

R
Python

#| label: stuff-rate-r
#| message: false
#| warning: false

# Calculate stuff rate and related metrics
stuff_rate_analysis <- pbp %>%
  filter(play_type == "run", !is.na(defteam)) %>%
  group_by(defteam) %>%
  summarise(
    rushes_faced = n(),
    stuffs = sum(yards_gained <= 0, na.rm = TRUE),
    stuff_rate = mean(yards_gained <= 0, na.rm = TRUE),
    negative_plays = sum(yards_gained < 0, na.rm = TRUE),
    tfl_rate = mean(yards_gained < 0, na.rm = TRUE),
    explosives_allowed = sum(yards_gained >= 10, na.rm = TRUE),
    explosive_rate = mean(yards_gained >= 10, na.rm = TRUE),
    epa_per_rush = mean(epa, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(stuff_rate))

# Display stuff rate leaders
stuff_rate_analysis %>%
  head(15) %>%
  gt() %>%
  cols_label(
    defteam = "Defense",
    rushes_faced = "Rushes",
    stuffs = "Stuffs",
    stuff_rate = "Stuff Rate",
    negative_plays = "TFLs",
    tfl_rate = "TFL Rate",
    explosives_allowed = "10+ Yd",
    explosive_rate = "Explosive %",
    epa_per_rush = "EPA/Rush"
  ) %>%
  fmt_percent(
    columns = c(stuff_rate, tfl_rate, explosive_rate),
    decimals = 1
  ) %>%
  fmt_number(
    columns = epa_per_rush,
    decimals = 3
  ) %>%
  fmt_number(
    columns = c(rushes_faced, stuffs, negative_plays, explosives_allowed),
    decimals = 0
  ) %>%
  data_color(
    columns = stuff_rate,
    colors = scales::col_numeric(
      palette = "Greens",
      domain = NULL
    )
  ) %>%
  tab_header(
    title = "Run Defense Stuff Rate Leaders",
    subtitle = "2023 Season | Higher stuff rate = more dominant"
  ) %>%
  tab_source_note(
    source_note = "Stuff = ≤0 yards | TFL = <0 yards | Explosive = ≥10 yards"
  )

#| label: stuff-rate-py
#| message: false
#| warning: false

# Calculate stuff rate analysis
stuff_rate_analysis = (pbp
    .query("play_type == 'run' & defteam.notna()")
    .groupby('defteam')
    .agg(
        rushes_faced=('play_id', 'count'),
        stuffs=('yards_gained', lambda x: (x <= 0).sum()),
        stuff_rate=('yards_gained', lambda x: (x <= 0).mean()),
        negative_plays=('yards_gained', lambda x: (x < 0).sum()),
        tfl_rate=('yards_gained', lambda x: (x < 0).mean()),
        explosives_allowed=('yards_gained', lambda x: (x >= 10).sum()),
        explosive_rate=('yards_gained', lambda x: (x >= 10).mean()),
        epa_per_rush=('epa', 'mean')
    )
    .reset_index()
    .sort_values('stuff_rate', ascending=False)
)

print("\nRun Defense Stuff Rate Leaders (2023 Season):\n")
print(stuff_rate_analysis.head(15).to_string(index=False))
print("\nNote: Stuff = ≤0 yards | TFL = <0 yards | Explosive = ≥10 yards")

Stuff Rate vs EPA Allowed

Strong correlation between stuff rate and overall run defense EPA:

R
Python

#| label: fig-stuff-rate-correlation-r
#| fig-cap: "Relationship between stuff rate and run defense EPA"
#| fig-width: 10
#| fig-height: 8
#| message: false
#| warning: false

stuff_rate_analysis %>%
  ggplot(aes(x = stuff_rate, y = epa_per_rush)) +
  geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
  geom_smooth(method = "lm", se = TRUE, color = "#013369", fill = "#D3D3D3") +
  geom_nfl_logos(aes(team_abbr = defteam), width = 0.04, alpha = 0.8) +
  scale_x_continuous(labels = scales::percent_format()) +
  labs(
    title = "Stuff Rate vs Run Defense EPA",
    subtitle = "2023 NFL Season | Higher stuff rate = lower EPA allowed",
    x = "Stuff Rate (% of runs ≤0 yards)",
    y = "EPA per Rush Allowed",
    caption = "Data: nflfastR | Bottom-right = elite run defense"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(size = 11),
    legend.position = "none"
  )

#| label: fig-stuff-rate-correlation-py
#| fig-cap: "Stuff rate vs EPA correlation - Python"
#| fig-width: 10
#| fig-height: 8
#| message: false
#| warning: false

fig, ax = plt.subplots(figsize=(10, 8))

# Create scatter plot
scatter = ax.scatter(stuff_rate_analysis['stuff_rate'],
                     stuff_rate_analysis['epa_per_rush'],
                     s=150, alpha=0.6, c='#013369')

# Add team labels
for _, row in stuff_rate_analysis.iterrows():
    ax.annotate(row['defteam'],
                xy=(row['stuff_rate'], row['epa_per_rush']),
                xytext=(3, 3), textcoords='offset points',
                fontsize=8, alpha=0.8)

# Add reference line
ax.axhline(y=0, color='gray', linestyle='--', alpha=0.5)

# Add regression line
z = np.polyfit(stuff_rate_analysis['stuff_rate'],
               stuff_rate_analysis['epa_per_rush'], 1)
p = np.poly1d(z)
x_line = np.linspace(stuff_rate_analysis['stuff_rate'].min(),
                     stuff_rate_analysis['stuff_rate'].max(), 100)
ax.plot(x_line, p(x_line), "r-", alpha=0.8, linewidth=2)

# Add correlation coefficient
corr = stats.pearsonr(stuff_rate_analysis['stuff_rate'],
                      stuff_rate_analysis['epa_per_rush'])[0]
ax.text(0.05, 0.95, f'Correlation: {corr:.3f}',
        transform=ax.transAxes, fontsize=11,
        bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

ax.set_xlabel('Stuff Rate (% of runs ≤0 yards)', fontsize=12)
ax.set_ylabel('EPA per Rush Allowed', fontsize=12)
ax.set_title('Stuff Rate vs Run Defense EPA\n2023 NFL Season | Higher stuff rate = lower EPA',
             fontsize=14, fontweight='bold')
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: '{:.0%}'.format(y)))

plt.tight_layout()
plt.show()

Yards Before Contact Allowed

Yards before contact allowed measures how far runners advance before being touched. This metric helps separate defensive line/linebacker effectiveness from tackling ability.

Analyzing Pre-Contact Defense

Yards Before Contact Allowed

This metric indicates: - **Low YBC**: Good gap discipline and penetration - **High YBC**: Blockers creating running lanes - **Gap control**: Defensive line winning at point of attack - **LB flow**: Fast pursuit to contact point

Since standard play-by-play doesn't include yards before contact, we'll analyze it conceptually and with proxy metrics:

R
Python

#| label: ybc-analysis-r
#| message: false
#| warning: false

# Analyze runs by distance to estimate blocking vs tackling
ybc_proxy <- pbp %>%
  filter(play_type == "run", !is.na(defteam)) %>%
  mutate(
    run_type = case_when(
      yards_gained <= 0 ~ "At/Behind LOS",
      yards_gained <= 3 ~ "Short (1-3)",
      yards_gained <= 7 ~ "Medium (4-7)",
      yards_gained <= 10 ~ "Good (8-10)",
      TRUE ~ "Explosive (10+)"
    )
  ) %>%
  group_by(defteam, run_type) %>%
  summarise(
    plays = n(),
    .groups = "drop_last"
  ) %>%
  mutate(
    pct = plays / sum(plays)
  ) %>%
  ungroup()

# Calculate defensive line effectiveness score
# Higher % of short runs = better gap control
dl_effectiveness <- ybc_proxy %>%
  filter(run_type %in% c("At/Behind LOS", "Short (1-3)")) %>%
  group_by(defteam) %>%
  summarise(
    short_run_pct = sum(pct),
    .groups = "drop"
  ) %>%
  left_join(
    run_defense_advanced %>% select(defteam, epa_per_rush),
    by = "defteam"
  ) %>%
  arrange(desc(short_run_pct))

# Display results
dl_effectiveness %>%
  head(15) %>%
  gt() %>%
  cols_label(
    defteam = "Defense",
    short_run_pct = "Short Run %",
    epa_per_rush = "EPA/Rush"
  ) %>%
  fmt_percent(
    columns = short_run_pct,
    decimals = 1
  ) %>%
  fmt_number(
    columns = epa_per_rush,
    decimals = 3
  ) %>%
  data_color(
    columns = short_run_pct,
    colors = scales::col_numeric(
      palette = "Greens",
      domain = NULL
    )
  ) %>%
  tab_header(
    title = "Defensive Line Gap Control",
    subtitle = "% of runs stopped for 0-3 yards (proxy for YBC allowed)"
  )

#| label: ybc-analysis-py
#| message: false
#| warning: false

# Analyze run distance distribution
ybc_data = pbp.query("play_type == 'run' & defteam.notna()").copy()

def classify_run(yards):
    if yards <= 0:
        return "At/Behind LOS"
    elif yards <= 3:
        return "Short (1-3)"
    elif yards <= 7:
        return "Medium (4-7)"
    elif yards <= 10:
        return "Good (8-10)"
    else:
        return "Explosive (10+)"

ybc_data['run_type'] = ybc_data['yards_gained'].apply(classify_run)

ybc_proxy = (ybc_data
    .groupby(['defteam', 'run_type'])
    .size()
    .reset_index(name='plays')
)

ybc_proxy['pct'] = ybc_proxy.groupby('defteam')['plays'].transform(lambda x: x / x.sum())

# Calculate DL effectiveness
dl_effectiveness = (ybc_proxy
    .query("run_type.isin(['At/Behind LOS', 'Short (1-3)'])")
    .groupby('defteam')
    .agg(short_run_pct=('pct', 'sum'))
    .reset_index()
    .merge(run_defense_advanced[['defteam', 'epa_per_rush']], on='defteam')
    .sort_values('short_run_pct', ascending=False)
)

print("\nDefensive Line Gap Control:\n")
print(dl_effectiveness.head(15).to_string(index=False))
print("\nNote: Higher short run % indicates better gap control (proxy for low YBC allowed)")

Missed Tackles and Broken Tackles Allowed

Missed tackles are one of the most damaging defensive failures. They turn should-be stops into big gains and expose poor fundamentals.

Estimating Missed Tackles

While charting services track missed tackles, we can estimate them from play-by-play data:

R
Python

#| label: missed-tackles-r
#| message: false
#| warning: false

# Estimate missed tackles using yards after expected contact
# Runs that exceed expected distance suggest missed tackles
set.seed(42)

missed_tackle_proxy <- pbp %>%
  filter(play_type == "run", !is.na(defteam), !is.na(epa)) %>%
  mutate(
    # Estimate based on unusually long runs for the situation
    potential_broken = case_when(
      down == 1 & yards_gained >= 8 ~ 1,
      down == 2 & yards_gained >= 7 ~ 1,
      down >= 3 & yards_gained > ydstogo + 3 ~ 1,
      TRUE ~ 0
    )
  ) %>%
  group_by(defteam) %>%
  summarise(
    rushes_faced = n(),
    estimated_broken = sum(potential_broken),
    broken_tackle_rate = mean(potential_broken),
    avg_yards_on_broken = mean(yards_gained[potential_broken == 1], na.rm = TRUE),
    epa_on_broken = mean(epa[potential_broken == 1], na.rm = TRUE),
    epa_on_solid = mean(epa[potential_broken == 0], na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(broken_tackle_rate)

# Display best tackling defenses
missed_tackle_proxy %>%
  head(15) %>%
  gt() %>%
  cols_label(
    defteam = "Defense",
    rushes_faced = "Rushes",
    estimated_broken = "Est. Broken",
    broken_tackle_rate = "Broken Rate",
    avg_yards_on_broken = "Avg Yds (Broken)",
    epa_on_broken = "EPA (Broken)",
    epa_on_solid = "EPA (Solid)"
  ) %>%
  fmt_percent(
    columns = broken_tackle_rate,
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(avg_yards_on_broken, epa_on_broken, epa_on_solid),
    decimals = 2
  ) %>%
  fmt_number(
    columns = c(rushes_faced, estimated_broken),
    decimals = 0
  ) %>%
  data_color(
    columns = broken_tackle_rate,
    colors = scales::col_numeric(
      palette = c("green", "white", "red"),
      domain = NULL
    )
  ) %>%
  tab_header(
    title = "Estimated Tackle Efficiency",
    subtitle = "Proxy metric based on unexpectedly long runs"
  )

#| label: missed-tackles-py
#| message: false
#| warning: false

# Estimate broken tackles
def estimate_broken(row):
    if row['down'] == 1 and row['yards_gained'] >= 8:
        return 1
    elif row['down'] == 2 and row['yards_gained'] >= 7:
        return 1
    elif row['down'] >= 3 and row['yards_gained'] > row['ydstogo'] + 3:
        return 1
    return 0

run_data = pbp.query("play_type == 'run' & defteam.notna() & epa.notna()").copy()
run_data['potential_broken'] = run_data.apply(estimate_broken, axis=1)

missed_tackle_proxy = (run_data
    .groupby('defteam')
    .agg(
        rushes_faced=('play_id', 'count'),
        estimated_broken=('potential_broken', 'sum'),
        broken_tackle_rate=('potential_broken', 'mean'),
        avg_yards_on_broken=('yards_gained',
            lambda x: x[run_data.loc[x.index, 'potential_broken'] == 1].mean()),
        epa_on_broken=('epa',
            lambda x: x[run_data.loc[x.index, 'potential_broken'] == 1].mean()),
        epa_on_solid=('epa',
            lambda x: x[run_data.loc[x.index, 'potential_broken'] == 0].mean())
    )
    .reset_index()
    .sort_values('broken_tackle_rate')
)

print("\nEstimated Tackle Efficiency:\n")
print(missed_tackle_proxy.head(15).to_string(index=False))
print("\nNote: Proxy based on unexpectedly long runs for the situation")

Run Defense by Direction

Understanding defensive effectiveness by run direction reveals scheme strengths, personnel matchups, and potential vulnerabilities.

#| label: directional-defense-r
#| message: false
#| warning: false

# Analyze run defense by direction
directional_defense <- pbp %>%
  filter(play_type == "run", !is.na(defteam), !is.na(run_location)) %>%
  group_by(defteam, run_location) %>%
  summarise(
    rushes = n(),
    ypc_allowed = mean(yards_gained, na.rm = TRUE),
    epa_per_rush = mean(epa, na.rm = TRUE),
    success_rate_allowed = mean(epa > 0, na.rm = TRUE),
    stuff_rate = mean(yards_gained <= 0, na.rm = TRUE),
    .groups = "drop"
  )

# Find teams with meaningful samples in all directions
balanced_defenses <- directional_defense %>%
  group_by(defteam) %>%
  filter(n() == 3, all(rushes >= 50)) %>%
  ungroup()

# Show sample of directional performance
balanced_defenses %>%
  filter(defteam %in% c("SF", "BAL", "CLE", "BUF", "KC", "DAL")) %>%
  arrange(defteam, run_location) %>%
  gt() %>%
  cols_label(
    defteam = "Defense",
    run_location = "Direction",
    rushes = "Rushes",
    ypc_allowed = "YPC",
    epa_per_rush = "EPA/Rush",
    success_rate_allowed = "Success %",
    stuff_rate = "Stuff %"
  ) %>%
  fmt_number(
    columns = c(ypc_allowed, epa_per_rush),
    decimals = 2
  ) %>%
  fmt_percent(
    columns = c(success_rate_allowed, stuff_rate),
    decimals = 1
  ) %>%
  fmt_number(
    columns = rushes,
    decimals = 0
  ) %>%
  tab_header(
    title = "Run Defense by Direction",
    subtitle = "Selected teams from 2023 season"
  )

#| label: directional-defense-py
#| message: false
#| warning: false

# Analyze directional run defense
directional_defense = (pbp
    .query("play_type == 'run' & defteam.notna() & run_location.notna()")
    .groupby(['defteam', 'run_location'])
    .agg(
        rushes=('play_id', 'count'),
        ypc_allowed=('yards_gained', 'mean'),
        epa_per_rush=('epa', 'mean'),
        success_rate_allowed=('epa', lambda x: (x > 0).mean()),
        stuff_rate=('yards_gained', lambda x: (x <= 0).mean())
    )
    .reset_index()
)

# Filter to teams with balanced samples
balanced_defenses = (directional_defense
    .groupby('defteam')
    .filter(lambda x: len(x) == 3 and all(x['rushes'] >= 50))
)

# Show sample
sample_teams = ['SF', 'BAL', 'CLE', 'BUF', 'KC', 'DAL']
print("\nRun Defense by Direction (Selected Teams):\n")
print(balanced_defenses[balanced_defenses['defteam'].isin(sample_teams)]
      .sort_values(['defteam', 'run_location'])
      .to_string(index=False))

Visualizing Directional Weaknesses

R
Python

#| label: fig-directional-heatmap-r
#| fig-cap: "Run defense EPA by direction heatmap"
#| fig-width: 12
#| fig-height: 10
#| message: false
#| warning: false

# Create heatmap of directional EPA
directional_defense %>%
  mutate(
    run_location = factor(run_location, levels = c("left", "middle", "right"))
  ) %>%
  ggplot(aes(x = run_location, y = reorder(defteam, epa_per_rush),
             fill = epa_per_rush)) +
  geom_tile(color = "white", linewidth = 0.5) +
  geom_text(aes(label = sprintf("%.2f", epa_per_rush)),
            color = "white", fontface = "bold", size = 2.5) +
  scale_fill_gradient2(
    low = "#013369", mid = "white", high = "#D50A0A",
    midpoint = 0, name = "EPA/Rush\nAllowed",
    limits = c(-0.3, 0.3)
  ) +
  labs(
    title = "Run Defense EPA Allowed by Direction",
    subtitle = "2023 NFL Season | Blue = good defense, Red = poor defense",
    x = "Run Direction",
    y = "Defense",
    caption = "Data: nflfastR | Lower EPA = better run defense"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    axis.text.y = element_text(size = 8),
    legend.position = "right"
  )

#| label: fig-directional-heatmap-py
#| fig-cap: "Directional run defense heatmap - Python"
#| fig-width: 12
#| fig-height: 10
#| message: false
#| warning: false

# Create heatmap
heatmap_data = directional_defense.pivot(
    index='defteam', columns='run_location', values='epa_per_rush'
)

# Reorder columns
heatmap_data = heatmap_data[['left', 'middle', 'right']]

# Sort by average EPA
heatmap_data = heatmap_data.loc[heatmap_data.mean(axis=1).sort_values().index]

fig, ax = plt.subplots(figsize=(12, 10))
sns.heatmap(heatmap_data, annot=True, fmt='.2f', cmap='RdBu_r',
            center=0, cbar_kws={'label': 'EPA/Rush Allowed'},
            vmin=-0.3, vmax=0.3,
            linewidths=0.5, linecolor='white', ax=ax)

ax.set_title('Run Defense EPA Allowed by Direction\n2023 NFL Season | Blue = Good Defense',
             fontsize=14, fontweight='bold', pad=20)
ax.set_xlabel('Run Direction', fontsize=12)
ax.set_ylabel('Defense', fontsize=12)
ax.text(0.98, -0.05, 'Data: nfl_data_py | Lower EPA = better run defense',
        transform=ax.transAxes, ha='right', fontsize=8, style='italic')

plt.tight_layout()
plt.show()

Short Yardage and Goal Line Defense

Critical situations reveal a defense's toughness and gap discipline. Short yardage and goal line defense are make-or-break moments.

Short Yardage Conversion Defense

R
Python

#| label: short-yardage-defense-r
#| message: false
#| warning: false

# Analyze short yardage run defense (1-2 yards to go, 3rd/4th down)
short_yardage_defense <- pbp %>%
  filter(
    play_type == "run",
    !is.na(defteam),
    ydstogo <= 2,
    down %in% c(3, 4)
  ) %>%
  group by(defteam) %>%
  filter(n() >= 10) %>%  # Minimum sample
  summarise(
    attempts = n(),
    conversions_allowed = sum(yards_gained >= ydstogo, na.rm = TRUE),
    stop_rate = 1 - mean(yards_gained >= ydstogo, na.rm = TRUE),
    ypc_allowed = mean(yards_gained, na.rm = TRUE),
    stuff_rate = mean(yards_gained <= 0, na.rm = TRUE),
    tfl_rate = mean(yards_gained < 0, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(stop_rate))

# Display best short yardage defenses
short_yardage_defense %>%
  head(20) %>%
  gt() %>%
  cols_label(
    defteam = "Defense",
    attempts = "Attempts",
    conversions_allowed = "Conversions",
    stop_rate = "Stop Rate",
    ypc_allowed = "YPC",
    stuff_rate = "Stuff %",
    tfl_rate = "TFL %"
  ) %>%
  fmt_percent(
    columns = c(stop_rate, stuff_rate, tfl_rate),
    decimals = 1
  ) %>%
  fmt_number(
    columns = ypc_allowed,
    decimals = 2
  ) %>%
  fmt_number(
    columns = c(attempts, conversions_allowed),
    decimals = 0
  ) %>%
  data_color(
    columns = stop_rate,
    colors = scales::col_numeric(
      palette = "Greens",
      domain = NULL
    )
  ) %>%
  tab_header(
    title = "Short Yardage Run Defense",
    subtitle = "3rd/4th down, 1-2 yards to go (Min. 10 attempts)"
  )

#| label: short-yardage-defense-py
#| message: false
#| warning: false

# Analyze short yardage defense
short_yardage_data = pbp.query(
    "play_type == 'run' & defteam.notna() & ydstogo <= 2 & down.isin([3, 4])"
).copy()

short_yardage_defense = (short_yardage_data
    .groupby('defteam')
    .filter(lambda x: len(x) >= 10)
    .groupby('defteam')
    .agg(
        attempts=('play_id', 'count'),
        conversions_allowed=('yards_gained',
            lambda x: (x >= short_yardage_data.loc[x.index, 'ydstogo']).sum()),
        ypc_allowed=('yards_gained', 'mean'),
        stuff_rate=('yards_gained', lambda x: (x <= 0).mean()),
        tfl_rate=('yards_gained', lambda x: (x < 0).mean())
    )
    .reset_index()
)

short_yardage_defense['stop_rate'] = 1 - (
    short_yardage_defense['conversions_allowed'] /
    short_yardage_defense['attempts']
)

short_yardage_defense = short_yardage_defense.sort_values('stop_rate', ascending=False)

print("\nShort Yardage Run Defense:\n")
print(short_yardage_defense.head(20).to_string(index=False))

Goal Line Defense

R
Python

#| label: goal-line-defense-r
#| message: false
#| warning: false

# Analyze goal line run defense (inside 5-yard line)
goal_line_defense <- pbp %>%
  filter(
    play_type == "run",
    !is.na(defteam),
    yardline_100 <= 5
  ) %>%
  group_by(defteam) %>%
  filter(n() >= 15) %>%
  summarise(
    attempts = n(),
    tds_allowed = sum(touchdown == 1, na.rm = TRUE),
    td_rate = mean(touchdown == 1, na.rm = TRUE),
    ypc_allowed = mean(yards_gained, na.rm = TRUE),
    epa_per_rush = mean(epa, na.rm = TRUE),
    stuff_rate = mean(yards_gained <= 0, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(td_rate)

# Display best goal line defenses
goal_line_defense %>%
  head(20) %>%
  gt() %>%
  cols_label(
    defteam = "Defense",
    attempts = "Attempts",
    tds_allowed = "TDs",
    td_rate = "TD Rate",
    ypc_allowed = "YPC",
    epa_per_rush = "EPA/Rush",
    stuff_rate = "Stuff %"
  ) %>%
  fmt_percent(
    columns = c(td_rate, stuff_rate),
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(ypc_allowed, epa_per_rush),
    decimals = 2
  ) %>%
  fmt_number(
    columns = c(attempts, tds_allowed),
    decimals = 0
  ) %>%
  data_color(
    columns = td_rate,
    colors = scales::col_numeric(
      palette = c("green", "white", "red"),
      domain = NULL
    )
  ) %>%
  tab_header(
    title = "Goal Line Run Defense",
    subtitle = "Inside the 5-yard line (Min. 15 attempts)"
  )

#| label: goal-line-defense-py
#| message: false
#| warning: false

# Analyze goal line defense
goal_line_defense = (pbp
    .query("play_type == 'run' & defteam.notna() & yardline_100 <= 5")
    .groupby('defteam')
    .filter(lambda x: len(x) >= 15)
    .groupby('defteam')
    .agg(
        attempts=('play_id', 'count'),
        tds_allowed=('touchdown', 'sum'),
        td_rate=('touchdown', 'mean'),
        ypc_allowed=('yards_gained', 'mean'),
        epa_per_rush=('epa', 'mean'),
        stuff_rate=('yards_gained', lambda x: (x <= 0).mean())
    )
    .reset_index()
    .sort_values('td_rate')
)

print("\nGoal Line Run Defense:\n")
print(goal_line_defense.head(20).to_string(index=False))

Linebacker and Defensive Line Evaluation

Evaluating individual defenders in run defense requires combining team-level metrics with personnel grouping analysis.

Position Group Run Defense

R
Python

#| label: position-group-defense-r
#| message: false
#| warning: false

# Analyze run defense by defensive personnel grouping
personnel_defense <- pbp %>%
  filter(play_type == "run", !is.na(defteam), !is.na(defense_personnel)) %>%
  # Parse defensive personnel (simplified)
  mutate(
    # Extract number of DL, LB, DB from personnel string
    def_front = case_when(
      str_detect(defense_personnel, "4 DL") ~ "4-3 Front",
      str_detect(defense_personnel, "3 DL") ~ "3-4 Front",
      TRUE ~ "Other"
    )
  ) %>%
  group_by(defteam, def_front) %>%
  filter(n() >= 30) %>%
  summarise(
    plays = n(),
    ypc_allowed = mean(yards_gained, na.rm = TRUE),
    epa_per_rush = mean(epa, na.rm = TRUE),
    stuff_rate = mean(yards_gained <= 0, na.rm = TRUE),
    success_rate_allowed = mean(epa > 0, na.rm = TRUE),
    .groups = "drop"
  )

# Compare defensive fronts
front_comparison <- personnel_defense %>%
  group_by(def_front) %>%
  summarise(
    teams = n_distinct(defteam),
    total_plays = sum(plays),
    avg_ypc = mean(ypc_allowed),
    avg_epa = mean(epa_per_rush),
    avg_stuff = mean(stuff_rate),
    .groups = "drop"
  ) %>%
  filter(def_front != "Other") %>%
  arrange(avg_epa)

front_comparison %>%
  gt() %>%
  cols_label(
    def_front = "Defensive Front",
    teams = "Teams",
    total_plays = "Total Plays",
    avg_ypc = "Avg YPC",
    avg_epa = "Avg EPA",
    avg_stuff = "Avg Stuff %"
  ) %>%
  fmt_number(
    columns = c(avg_ypc, avg_epa),
    decimals = 3
  ) %>%
  fmt_percent(
    columns = avg_stuff,
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(teams, total_plays),
    decimals = 0
  ) %>%
  tab_header(
    title = "Run Defense by Defensive Front",
    subtitle = "Comparing 3-4 vs 4-3 effectiveness"
  )

#| label: position-group-defense-py
#| message: false
#| warning: false

# Analyze by defensive personnel
def classify_front(pers):
    if pd.isna(pers):
        return "Other"
    if "4 DL" in str(pers):
        return "4-3 Front"
    elif "3 DL" in str(pers):
        return "3-4 Front"
    else:
        return "Other"

personnel_data = pbp.query(
    "play_type == 'run' & defteam.notna() & defense_personnel.notna()"
).copy()

personnel_data['def_front'] = personnel_data['defense_personnel'].apply(classify_front)

personnel_defense = (personnel_data
    .groupby(['defteam', 'def_front'])
    .filter(lambda x: len(x) >= 30)
    .groupby(['defteam', 'def_front'])
    .agg(
        plays=('play_id', 'count'),
        ypc_allowed=('yards_gained', 'mean'),
        epa_per_rush=('epa', 'mean'),
        stuff_rate=('yards_gained', lambda x: (x <= 0).mean()),
        success_rate_allowed=('epa', lambda x: (x > 0).mean())
    )
    .reset_index()
)

# Compare fronts
front_comparison = (personnel_defense
    .query("def_front != 'Other'")
    .groupby('def_front')
    .agg(
        teams=('defteam', 'nunique'),
        total_plays=('plays', 'sum'),
        avg_ypc=('ypc_allowed', 'mean'),
        avg_epa=('epa_per_rush', 'mean'),
        avg_stuff=('stuff_rate', 'mean')
    )
    .reset_index()
    .sort_values('avg_epa')
)

print("\nRun Defense by Defensive Front:\n")
print(front_comparison.to_string(index=False))

Box Count Analysis

Number of defenders in the box significantly impacts run defense:

R
Python

#| label: box-count-analysis-r
#| message: false
#| warning: false

# Analyze run defense by box count
box_count_defense <- pbp %>%
  filter(play_type == "run", !is.na(defteam), !is.na(defenders_in_box)) %>%
  mutate(
    box_category = case_when(
      defenders_in_box <= 6 ~ "Light Box (≤6)",
      defenders_in_box == 7 ~ "Standard Box (7)",
      defenders_in_box >= 8 ~ "Loaded Box (8+)",
      TRUE ~ "Unknown"
    )
  ) %>%
  group_by(box_category) %>%
  summarise(
    plays = n(),
    ypc_allowed = mean(yards_gained, na.rm = TRUE),
    epa_per_rush = mean(epa, na.rm = TRUE),
    stuff_rate = mean(yards_gained <= 0, na.rm = TRUE),
    explosive_rate = mean(yards_gained >= 10, na.rm = TRUE),
    success_rate_allowed = mean(epa > 0, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(box_category != "Unknown") %>%
  arrange(factor(box_category, levels = c("Light Box (≤6)", "Standard Box (7)", "Loaded Box (8+")))

box_count_defense %>%
  gt() %>%
  cols_label(
    box_category = "Box Count",
    plays = "Plays",
    ypc_allowed = "YPC",
    epa_per_rush = "EPA/Rush",
    stuff_rate = "Stuff %",
    explosive_rate = "Explosive %",
    success_rate_allowed = "Success %"
  ) %>%
  fmt_number(
    columns = c(ypc_allowed, epa_per_rush),
    decimals = 3
  ) %>%
  fmt_percent(
    columns = c(stuff_rate, explosive_rate, success_rate_allowed),
    decimals = 1
  ) %>%
  fmt_number(
    columns = plays,
    decimals = 0,
    use_seps = TRUE
  ) %>%
  tab_header(
    title = "Run Defense Performance by Box Count",
    subtitle = "How many defenders needed to stop the run?"
  )

#| label: box-count-analysis-py
#| message: false
#| warning: false

# Analyze by box count
def categorize_box(count):
    if pd.isna(count):
        return "Unknown"
    if count <= 6:
        return "Light Box (≤6)"
    elif count == 7:
        return "Standard Box (7)"
    else:
        return "Loaded Box (8+)"

box_data = pbp.query(
    "play_type == 'run' & defteam.notna() & defenders_in_box.notna()"
).copy()

box_data['box_category'] = box_data['defenders_in_box'].apply(categorize_box)

box_count_defense = (box_data
    .query("box_category != 'Unknown'")
    .groupby('box_category')
    .agg(
        plays=('play_id', 'count'),
        ypc_allowed=('yards_gained', 'mean'),
        epa_per_rush=('epa', 'mean'),
        stuff_rate=('yards_gained', lambda x: (x <= 0).mean()),
        explosive_rate=('yards_gained', lambda x: (x >= 10).mean()),
        success_rate_allowed=('epa', lambda x: (x > 0).mean())
    )
    .reset_index()
)

# Sort by box category
box_order = ['Light Box (≤6)', 'Standard Box (7)', 'Loaded Box (8+)']
box_count_defense['box_category'] = pd.Categorical(
    box_count_defense['box_category'],
    categories=box_order,
    ordered=True
)
box_count_defense = box_count_defense.sort_values('box_category')

print("\nRun Defense Performance by Box Count:\n")
print(box_count_defense.to_string(index=False))

Formation-Specific Run Defense

Different offensive formations present different run defense challenges.

Run Defense vs Personnel Groupings

R
Python

#| label: formation-defense-r
#| message: false
#| warning: false

# Analyze run defense vs offensive personnel
formation_defense <- pbp %>%
  filter(play_type == "run", !is.na(defteam), !is.na(offense_personnel)) %>%
  mutate(
    off_personnel = case_when(
      str_detect(offense_personnel, "1 RB, 1 TE") ~ "11 Personnel",
      str_detect(offense_personnel, "1 RB, 2 TE") ~ "12 Personnel",
      str_detect(offense_personnel, "1 RB, 3 TE") ~ "13 Personnel",
      str_detect(offense_personnel, "2 RB, 1 TE") ~ "21 Personnel",
      str_detect(offense_personnel, "2 RB, 2 TE") ~ "22 Personnel",
      TRUE ~ "Other"
    )
  ) %>%
  filter(off_personnel != "Other") %>%
  group_by(defteam, off_personnel) %>%
  filter(n() >= 20) %>%
  summarise(
    plays = n(),
    ypc_allowed = mean(yards_gained, na.rm = TRUE),
    epa_per_rush = mean(epa, na.rm = TRUE),
    stuff_rate = mean(yards_gained <= 0, na.rm = TRUE),
    .groups = "drop"
  )

# Compare defensive performance by offensive personnel
personnel_comparison <- formation_defense %>%
  group_by(off_personnel) %>%
  summarise(
    teams = n_distinct(defteam),
    total_plays = sum(plays),
    avg_ypc = mean(ypc_allowed),
    avg_epa = mean(epa_per_rush),
    avg_stuff = mean(stuff_rate),
    .groups = "drop"
  ) %>%
  arrange(desc(total_plays))

personnel_comparison %>%
  gt() %>%
  cols_label(
    off_personnel = "Offensive Personnel",
    teams = "Defenses",
    total_plays = "Total Plays",
    avg_ypc = "Avg YPC",
    avg_epa = "Avg EPA",
    avg_stuff = "Avg Stuff %"
  ) %>%
  fmt_number(
    columns = c(avg_ypc, avg_epa),
    decimals = 3
  ) %>%
  fmt_percent(
    columns = avg_stuff,
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(teams, total_plays),
    decimals = 0
  ) %>%
  tab_header(
    title = "Run Defense vs Offensive Personnel",
    subtitle = "How do defenses perform against different formations?"
  )

#| label: formation-defense-py
#| message: false
#| warning: false

# Classify offensive personnel
def classify_offense_personnel(pers):
    if pd.isna(pers):
        return "Other"
    pers = str(pers)
    if "1 RB, 1 TE" in pers:
        return "11 Personnel"
    elif "1 RB, 2 TE" in pers:
        return "12 Personnel"
    elif "1 RB, 3 TE" in pers:
        return "13 Personnel"
    elif "2 RB, 1 TE" in pers:
        return "21 Personnel"
    elif "2 RB, 2 TE" in pers:
        return "22 Personnel"
    else:
        return "Other"

formation_data = pbp.query(
    "play_type == 'run' & defteam.notna() & offense_personnel.notna()"
).copy()

formation_data['off_personnel'] = formation_data['offense_personnel'].apply(
    classify_offense_personnel
)

formation_defense = (formation_data
    .query("off_personnel != 'Other'")
    .groupby(['defteam', 'off_personnel'])
    .filter(lambda x: len(x) >= 20)
    .groupby(['defteam', 'off_personnel'])
    .agg(
        plays=('play_id', 'count'),
        ypc_allowed=('yards_gained', 'mean'),
        epa_per_rush=('epa', 'mean'),
        stuff_rate=('yards_gained', lambda x: (x <= 0).mean())
    )
    .reset_index()
)

# Compare by personnel
personnel_comparison = (formation_defense
    .groupby('off_personnel')
    .agg(
        teams=('defteam', 'nunique'),
        total_plays=('plays', 'sum'),
        avg_ypc=('ypc_allowed', 'mean'),
        avg_epa=('epa_per_rush', 'mean'),
        avg_stuff=('stuff_rate', 'mean')
    )
    .reset_index()
    .sort_values('total_plays', ascending=False)
)

print("\nRun Defense vs Offensive Personnel:\n")
print(personnel_comparison.to_string(index=False))

Comprehensive Run Defense Rankings

Let's create a composite run defense ranking combining multiple metrics:

R
Python

#| label: comprehensive-rankings-r
#| message: false
#| warning: false

# Build comprehensive run defense ranking
comprehensive_defense <- pbp %>%
  filter(play_type == "run", !is.na(defteam)) %>%
  group_by(defteam) %>%
  summarise(
    rushes = n(),
    epa_per_rush = mean(epa, na.rm = TRUE),
    success_rate_allowed = mean(epa > 0, na.rm = TRUE),
    stuff_rate = mean(yards_gained <= 0, na.rm = TRUE),
    explosive_rate = mean(yards_gained >= 10, na.rm = TRUE),
    ypc_allowed = mean(yards_gained, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(
    # Rank each component (lower is better)
    epa_rank = rank(epa_per_rush),
    success_rank = rank(success_rate_allowed),
    stuff_rank = rank(desc(stuff_rate)),
    explosive_rank = rank(explosive_rate),

    # Composite score (average of ranks)
    composite_score = (epa_rank + success_rank + stuff_rank + explosive_rank) / 4,
    overall_rank = rank(composite_score)
  ) %>%
  arrange(overall_rank)

# Display comprehensive rankings
comprehensive_defense %>%
  select(overall_rank, defteam, epa_per_rush, success_rate_allowed,
         stuff_rate, explosive_rate, ypc_allowed, composite_score) %>%
  head(20) %>%
  gt() %>%
  cols_label(
    overall_rank = "Rank",
    defteam = "Defense",
    epa_per_rush = "EPA/Rush",
    success_rate_allowed = "Success %",
    stuff_rate = "Stuff %",
    explosive_rate = "Explosive %",
    ypc_allowed = "YPC",
    composite_score = "Composite"
  ) %>%
  fmt_number(
    columns = c(epa_per_rush, ypc_allowed, composite_score),
    decimals = 3
  ) %>%
  fmt_percent(
    columns = c(success_rate_allowed, stuff_rate, explosive_rate),
    decimals = 1
  ) %>%
  fmt_number(
    columns = overall_rank,
    decimals = 0
  ) %>%
  data_color(
    columns = composite_score,
    colors = scales::col_numeric(
      palette = c("green", "yellow", "red"),
      domain = NULL
    )
  ) %>%
  tab_header(
    title = "Comprehensive Run Defense Rankings",
    subtitle = "2023 Season - Composite of EPA, Success Rate, Stuff Rate, and Explosives"
  )

#| label: comprehensive-rankings-py
#| message: false
#| warning: false

# Build comprehensive rankings
comprehensive_defense = (pbp
    .query("play_type == 'run' & defteam.notna()")
    .groupby('defteam')
    .agg(
        rushes=('play_id', 'count'),
        epa_per_rush=('epa', 'mean'),
        success_rate_allowed=('epa', lambda x: (x > 0).mean()),
        stuff_rate=('yards_gained', lambda x: (x <= 0).mean()),
        explosive_rate=('yards_gained', lambda x: (x >= 10).mean()),
        ypc_allowed=('yards_gained', 'mean')
    )
    .reset_index()
)

# Rank components
comprehensive_defense['epa_rank'] = comprehensive_defense['epa_per_rush'].rank()
comprehensive_defense['success_rank'] = comprehensive_defense['success_rate_allowed'].rank()
comprehensive_defense['stuff_rank'] = comprehensive_defense['stuff_rate'].rank(ascending=False)
comprehensive_defense['explosive_rank'] = comprehensive_defense['explosive_rate'].rank()

# Composite score
comprehensive_defense['composite_score'] = (
    comprehensive_defense['epa_rank'] +
    comprehensive_defense['success_rank'] +
    comprehensive_defense['stuff_rank'] +
    comprehensive_defense['explosive_rank']
) / 4

comprehensive_defense['overall_rank'] = comprehensive_defense['composite_score'].rank()
comprehensive_defense = comprehensive_defense.sort_values('overall_rank')

print("\nComprehensive Run Defense Rankings (2023 Season):\n")
display_cols = ['overall_rank', 'defteam', 'epa_per_rush', 'success_rate_allowed',
                'stuff_rate', 'explosive_rate', 'ypc_allowed', 'composite_score']
print(comprehensive_defense[display_cols].head(20).to_string(index=False))

Visualizing Run Defense Performance

R
Python

#| label: fig-defense-scatter-r
#| fig-cap: "Run defense performance: EPA vs Stuff Rate"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false

comprehensive_defense %>%
  ggplot(aes(x = stuff_rate, y = epa_per_rush)) +
  geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
  geom_vline(xintercept = mean(comprehensive_defense$stuff_rate),
             linetype = "dashed", alpha = 0.5) +
  geom_nfl_logos(aes(team_abbr = defteam), width = 0.055, alpha = 0.8) +
  annotate("rect", xmin = mean(comprehensive_defense$stuff_rate),
           xmax = Inf, ymin = -Inf, ymax = 0,
           fill = "green", alpha = 0.1) +
  annotate("text", x = max(comprehensive_defense$stuff_rate) * 0.95,
           y = min(comprehensive_defense$epa_per_rush) * 1.1,
           label = "Elite\nRun Defense", hjust = 1, fontface = "bold",
           color = "darkgreen", size = 5) +
  scale_x_continuous(labels = scales::percent_format()) +
  labs(
    title = "Run Defense Performance Matrix",
    subtitle = "2023 NFL Season | Bottom-right = elite run defense (high stuff rate, low EPA)",
    x = "Stuff Rate (% of runs ≤0 yards)",
    y = "EPA per Rush Allowed",
    caption = "Data: nflfastR | Lower EPA and higher stuff rate = better defense"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(size = 11),
    legend.position = "none"
  )

#| label: fig-defense-scatter-py
#| fig-cap: "Run defense performance matrix - Python"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false

fig, ax = plt.subplots(figsize=(12, 8))

# Create scatter
scatter = ax.scatter(comprehensive_defense['stuff_rate'],
                     comprehensive_defense['epa_per_rush'],
                     s=200, alpha=0.6, c='#013369')

# Add team labels
for _, row in comprehensive_defense.iterrows():
    ax.annotate(row['defteam'],
                xy=(row['stuff_rate'], row['epa_per_rush']),
                xytext=(3, 3), textcoords='offset points',
                fontsize=9, alpha=0.8, fontweight='bold')

# Add reference lines
ax.axhline(y=0, color='gray', linestyle='--', alpha=0.5)
ax.axvline(x=comprehensive_defense['stuff_rate'].mean(),
           color='gray', linestyle='--', alpha=0.5)

# Highlight elite quadrant
ax.axhspan(comprehensive_defense['epa_per_rush'].min(),
           0,
           xmin=0.5, xmax=1,
           alpha=0.1, color='green')

ax.text(0.95, 0.05, 'Elite\nRun Defense',
        transform=ax.transAxes, fontsize=14, fontweight='bold',
        color='darkgreen', ha='right', va='bottom')

ax.set_xlabel('Stuff Rate (% of runs ≤0 yards)', fontsize=12)
ax.set_ylabel('EPA per Rush Allowed', fontsize=12)
ax.set_title('Run Defense Performance Matrix\n2023 NFL Season | Bottom-right = Elite',
             fontsize=14, fontweight='bold')
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: '{:.0%}'.format(y)))

plt.tight_layout()
plt.show()

Run-Stopping vs Coverage Trade-offs

Defensive coordinators face a fundamental trade-off: commit more defenders to stop the run (load the box) or prioritize pass coverage?

Analyzing the Trade-off

R
Python

#| label: tradeoff-analysis-r
#| message: false
#| warning: false

# Analyze run vs pass defense balance
team_balance <- pbp %>%
  filter(!is.na(defteam), play_type %in% c("run", "pass")) %>%
  group_by(defteam, play_type) %>%
  summarise(
    plays = n(),
    epa_per_play = mean(epa, na.rm = TRUE),
    success_rate = mean(epa > 0, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  pivot_wider(
    names_from = play_type,
    values_from = c(plays, epa_per_play, success_rate),
    names_sep = "_"
  ) %>%
  mutate(
    run_defense_rank = rank(epa_per_play_run),
    pass_defense_rank = rank(epa_per_play_pass),
    balance_score = abs(run_defense_rank - pass_defense_rank),
    defense_type = case_when(
      run_defense_rank <= 10 & pass_defense_rank <= 10 ~ "Elite Both",
      run_defense_rank <= 10 ~ "Run Focused",
      pass_defense_rank <= 10 ~ "Pass Focused",
      TRUE ~ "Average Both"
    )
  ) %>%
  arrange(run_defense_rank + pass_defense_rank)

# Display balanced defenses
team_balance %>%
  select(defteam, epa_per_play_run, epa_per_play_pass,
         run_defense_rank, pass_defense_rank, defense_type) %>%
  head(20) %>%
  gt() %>%
  cols_label(
    defteam = "Defense",
    epa_per_play_run = "Run EPA",
    epa_per_play_pass = "Pass EPA",
    run_defense_rank = "Run Rank",
    pass_defense_rank = "Pass Rank",
    defense_type = "Type"
  ) %>%
  fmt_number(
    columns = c(epa_per_play_run, epa_per_play_pass),
    decimals = 3
  ) %>%
  fmt_number(
    columns = c(run_defense_rank, pass_defense_rank),
    decimals = 0
  ) %>%
  data_color(
    columns = c(epa_per_play_run, epa_per_play_pass),
    colors = scales::col_numeric(
      palette = c("green", "white", "red"),
      domain = c(-0.2, 0.2)
    )
  ) %>%
  tab_header(
    title = "Run vs Pass Defense Balance",
    subtitle = "Best overall defenses in 2023"
  )

#| label: tradeoff-analysis-py
#| message: false
#| warning: false

# Analyze run vs pass balance
team_balance = (pbp
    .query("defteam.notna() & play_type.isin(['run', 'pass'])")
    .groupby(['defteam', 'play_type'])
    .agg(
        plays=('play_id', 'count'),
        epa_per_play=('epa', 'mean'),
        success_rate=('epa', lambda x: (x > 0).mean())
    )
    .reset_index()
)

# Pivot to wide format
team_balance_wide = team_balance.pivot(
    index='defteam',
    columns='play_type',
    values=['plays', 'epa_per_play', 'success_rate']
)

team_balance_wide.columns = ['_'.join(col).strip() for col in team_balance_wide.columns]
team_balance_wide = team_balance_wide.reset_index()

# Rank and classify
team_balance_wide['run_defense_rank'] = team_balance_wide['epa_per_play_run'].rank()
team_balance_wide['pass_defense_rank'] = team_balance_wide['epa_per_play_pass'].rank()

def classify_defense(row):
    if row['run_defense_rank'] <= 10 and row['pass_defense_rank'] <= 10:
        return "Elite Both"
    elif row['run_defense_rank'] <= 10:
        return "Run Focused"
    elif row['pass_defense_rank'] <= 10:
        return "Pass Focused"
    else:
        return "Average Both"

team_balance_wide['defense_type'] = team_balance_wide.apply(classify_defense, axis=1)
team_balance_wide['total_rank'] = (team_balance_wide['run_defense_rank'] +
                                    team_balance_wide['pass_defense_rank'])
team_balance_wide = team_balance_wide.sort_values('total_rank')

print("\nRun vs Pass Defense Balance (Best Overall Defenses):\n")
display_cols = ['defteam', 'epa_per_play_run', 'epa_per_play_pass',
                'run_defense_rank', 'pass_defense_rank', 'defense_type']
print(team_balance_wide[display_cols].head(20).to_string(index=False))

Visualizing the Trade-off

R
Python

#| label: fig-tradeoff-viz-r
#| fig-cap: "Run defense vs pass defense trade-off"
#| fig-width: 12
#| fig-height: 10
#| message: false
#| warning: false

team_balance %>%
  ggplot(aes(x = run_defense_rank, y = pass_defense_rank)) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed",
              color = "gray", alpha = 0.5) +
  geom_nfl_logos(aes(team_abbr = defteam), width = 0.055, alpha = 0.8) +
  annotate("rect", xmin = 0, xmax = 10, ymin = 0, ymax = 10,
           fill = "green", alpha = 0.1) +
  annotate("text", x = 5, y = 5, label = "Elite\nBoth",
           fontface = "bold", color = "darkgreen", size = 5) +
  scale_x_continuous(breaks = seq(0, 32, 4)) +
  scale_y_continuous(breaks = seq(0, 32, 4)) +
  coord_fixed() +
  labs(
    title = "Run Defense vs Pass Defense Rankings",
    subtitle = "2023 NFL Season | Top-left corner = elite at both",
    x = "Run Defense Rank (Lower is Better)",
    y = "Pass Defense Rank (Lower is Better)",
    caption = "Data: nflfastR | Teams above diagonal = better pass D, below = better run D"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(size = 11),
    legend.position = "none",
    panel.grid.minor = element_blank()
  )

#| label: fig-tradeoff-viz-py
#| fig-cap: "Run vs pass defense trade-off - Python"
#| fig-width: 12
#| fig-height: 10
#| message: false
#| warning: false

fig, ax = plt.subplots(figsize=(12, 10))

# Create scatter
scatter = ax.scatter(team_balance_wide['run_defense_rank'],
                     team_balance_wide['pass_defense_rank'],
                     s=200, alpha=0.6, c='#013369')

# Add team labels
for _, row in team_balance_wide.iterrows():
    ax.annotate(row['defteam'],
                xy=(row['run_defense_rank'], row['pass_defense_rank']),
                xytext=(3, 3), textcoords='offset points',
                fontsize=9, alpha=0.8, fontweight='bold')

# Add diagonal line
ax.plot([0, 32], [0, 32], 'k--', alpha=0.3)

# Highlight elite quadrant
ax.axhspan(0, 10, xmin=0, xmax=10/32, alpha=0.1, color='green')
ax.text(5, 5, 'Elite\nBoth', fontsize=14, fontweight='bold',
        color='darkgreen', ha='center', va='center')

ax.set_xlabel('Run Defense Rank (Lower is Better)', fontsize=12)
ax.set_ylabel('Pass Defense Rank (Lower is Better)', fontsize=12)
ax.set_title('Run Defense vs Pass Defense Rankings\n2023 NFL Season | Top-left = Elite at Both',
             fontsize=14, fontweight='bold')
ax.set_xlim(0, 33)
ax.set_ylim(0, 33)
ax.set_aspect('equal')
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

Summary

Run defense analytics extends far beyond yards per carry allowed. Modern evaluation requires:

Key Metrics:
- EPA per rush allowed: Context-aware defensive effectiveness
- Stuff rate: Defensive line dominance and gap control
- Success rate allowed: Consistency in preventing successful plays
- Explosive plays allowed: Preventing big runs
- Yards before contact: Separating line play from tackling

Situational Analysis:
- Short yardage stop rates
- Goal line defense
- Run defense by direction
- Performance vs different formations

Personnel Evaluation:
- Defensive front effectiveness (3-4 vs 4-3)
- Box count impact
- Linebacker and defensive line contributions
- Gap discipline and assignment soundness

Strategic Considerations:
- Run defense vs pass coverage trade-offs
- When to load the box vs play coverage
- Adapting to opponent personnel
- Situational run defense priorities

Key Findings from 2023:
- Elite run defenses combine high stuff rates with low EPA allowed
- Box count significantly impacts run defense success
- Best defenses maintain balance between run and pass defense
- Short yardage and goal line defense require specific skill sets
- Directional weaknesses can be exploited by savvy offenses

Exercises

Conceptual Questions

EPA vs YPC: Explain why a defense allowing 4.5 YPC might actually be better than one allowing 3.8 YPC. What contextual factors could explain this?
Stuff Rate Importance: Why is stuff rate considered a better indicator of defensive line performance than just yards per carry allowed?
Box Count Trade-off: Discuss the strategic implications of loading the box to stop the run. What are the risks and benefits?

Coding Exercises

Exercise 1: Complete Run Defense Profile

Build a comprehensive run defense profile for your favorite team including: a) EPA and success rate allowed overall and by direction b) Stuff rate and explosive play rate c) Short yardage and goal line performance d) Performance vs different offensive personnel groupings e) Box count analysis Create visualizations for each component and a summary dashboard.

Exercise 2: Defensive Front Analysis

Compare 3-4 and 4-3 defensive fronts: a) Calculate run defense metrics for each front type b) Analyze stuff rate and gap control differences c) Examine performance by run direction d) Determine if one front is superior or if it's situation-dependent Which front would you recommend and why?

Exercise 3: Tackling Efficiency Model

Develop a model to estimate missed tackles: a) Use play-by-play data to identify plays with likely missed tackles b) Calculate team-level tackling efficiency metrics c) Analyze correlation with overall defensive performance d) Identify teams with tackling issues vs strong fundamentals How much does tackling efficiency impact overall run defense?

Exercise 4: Run Defense Trade-off Analysis

Analyze the run defense vs pass coverage trade-off: a) Calculate run and pass EPA allowed for all teams b) Identify teams that excel at both, neither, or one c) Analyze how box count affects both metrics d) Determine optimal defensive balance for different game situations When should a defense prioritize run-stopping over coverage?

References

:::

Learning ObjectivesBy the end of this chapter, you will be able to:

Introduction

The Modern Run Defense Challenge

The Problem with Yards Per Carry Allowed

Why YPC Allowed is Misleading

The Outlier Effect on Run Defense

Run EPA Allowed: A Better Framework

Understanding Run EPA Allowed

Why Run EPA Allowed is Better Than YPC

Comparing YPC and EPA Rankings

📊 Visualization Output

Stuff Rate: Defensive Line Dominance

Calculating Stuff Rate

Stuff Rate Definition

Stuff Rate vs EPA Allowed

Yards Before Contact Allowed

Analyzing Pre-Contact Defense

Yards Before Contact Allowed

Missed Tackles and Broken Tackles Allowed

Estimating Missed Tackles

Run Defense by Direction

Analyzing Directional Run Defense

Visualizing Directional Weaknesses

Short Yardage and Goal Line Defense

Short Yardage Conversion Defense

Goal Line Defense

Linebacker and Defensive Line Evaluation

Position Group Run Defense

Box Count Analysis

Formation-Specific Run Defense

Run Defense vs Personnel Groupings

Comprehensive Run Defense Rankings

Visualizing Run Defense Performance

Run-Stopping vs Coverage Trade-offs

Analyzing the Trade-off

Visualizing the Trade-off

Summary

Exercises

Conceptual Questions

Coding Exercises

Exercise 1: Complete Run Defense Profile

Exercise 2: Defensive Front Analysis

Exercise 3: Tackling Efficiency Model

Exercise 4: Run Defense Trade-off Analysis

Further Reading

References