Learning Objectives

By the end of this chapter, you will be able to:

  1. Build comprehensive defensive rating systems that combine multiple metrics
  2. Integrate pass and run defense metrics into unified evaluation frameworks
  3. Adjust defensive metrics for opponent quality and game context
  4. Create position-specific defensive grades for different defensive roles
  5. Develop predictive models for future defensive performance
  6. Analyze defensive unit synergy and coordination
  7. Evaluate defensive consistency and clutch performance

Introduction

Throughout the previous chapters in this part, we've explored various aspects of defensive play: basic defensive metrics (Chapter 13), coverage analysis (Chapter 14), pass rush evaluation (Chapter 15), and run defense (Chapter 16). Each chapter provided valuable insights into specific defensive components, but real-world defensive evaluation requires integrating these elements into comprehensive rating systems.

This chapter synthesizes everything we've learned, building advanced defensive metrics that account for multiple factors simultaneously. We'll develop opponent-adjusted ratings, context-aware metrics, and position-specific evaluation frameworks. Most importantly, we'll create predictive models that help forecast future defensive performance—crucial for player evaluation, roster construction, and game planning.

Why Comprehensive Defensive Metrics Matter

Modern NFL offenses are increasingly sophisticated, requiring defensive evaluation that goes beyond simple statistics. Comprehensive defensive metrics allow us to:

  • Compare players across different schemes and situations
  • Identify which defensive units truly excel vs. benefit from favorable matchups
  • Predict future performance more accurately than traditional stats
  • Make data-driven decisions about personnel and strategy
  • Account for the interconnected nature of defensive play

The Challenge of Defensive Evaluation

Why Defense is Harder to Evaluate

Defensive evaluation presents unique challenges compared to offensive analysis:

  1. Collective Nature: Defensive success requires all 11 players executing their assignments
  2. Scheme Dependence: Player performance is heavily influenced by defensive scheme
  3. Negative Attribution: Defense is often judged by what doesn't happen
  4. Limited Tracking Data: Public tracking data for defensive players is more limited
  5. Context Sensitivity: Quality of opponent significantly impacts defensive statistics

The 11-Player Problem

Unlike offense where quarterbacks and skill players can be evaluated somewhat independently, defensive performance is inherently collective. A defensive back playing perfect coverage can still allow a completion if the pass rush fails. A dominant pass rush may go unrewarded if the coverage breaks down.
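
This interdependence can be made concrete with a toy probability model. The numbers below are purely illustrative (not estimated from NFL data), and the independence assumption is a deliberate simplification; the point is only that a unit's disruption rate depends jointly on both phases:

```python
# Toy model of the 11-player problem: a dropback is disrupted if either
# the rush wins before the throw OR the coverage holds long enough.
# Win probabilities are illustrative, not estimates from data, and
# independence between rush and coverage is a simplifying assumption.

def disruption_rate(p_rush_win: float, p_coverage_hold: float) -> float:
    """P(play disrupted) when either unit winning is enough."""
    return 1 - (1 - p_rush_win) * (1 - p_coverage_hold)

# An elite rush paired with poor coverage...
elite_rush = disruption_rate(0.45, 0.20)  # 0.56
# ...can be matched by two merely good, balanced units.
balanced = disruption_rate(0.30, 0.37)    # ~0.559
print(f"elite rush: {elite_rush:.3f}, balanced: {balanced:.3f}")
```

Neither the rush statistic nor the coverage statistic alone identifies the better unit here, which is why this chapter combines them into composite ratings.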

The Multi-Dimensional Nature of Defense

Effective defensive evaluation must consider multiple dimensions:

  • Run Defense: Stopping the running game
  • Pass Coverage: Limiting passing yards and completions
  • Pass Rush: Generating pressure and sacks
  • Takeaways: Creating turnovers
  • Situational Performance: Third downs, red zone, etc.
  • Consistency: Performance variance game-to-game
  • Opponent Quality: Strength of opposing offenses

Building Composite Defensive Ratings

The Defensive EPA Framework

We'll start by building a comprehensive defensive rating system based on EPA. First, let's load our data and prepare it for analysis.

#| label: setup-r
#| message: false
#| warning: false

library(tidyverse)
library(nflfastR)
library(nflplotR)
library(gt)
library(glue)

# Load multiple seasons for robust analysis
seasons <- 2020:2023
pbp <- load_pbp(seasons)

# Filter to regular plays
pbp_clean <- pbp %>%
  filter(
    !is.na(posteam),
    !is.na(defteam),
    !is.na(epa),
    play_type %in% c("pass", "run"),
    season_type == "REG"
  )

cat("Loaded", nrow(pbp_clean), "plays from", min(seasons), "-", max(seasons), "\n")
#| label: setup-py
#| message: false
#| warning: false

import pandas as pd
import numpy as np
import nfl_data_py as nfl
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# Load multiple seasons for robust analysis
seasons = list(range(2020, 2024))
pbp = nfl.import_pbp_data(seasons)

# Filter to regular plays
pbp_clean = pbp[
    pbp['posteam'].notna() &
    pbp['defteam'].notna() &
    pbp['epa'].notna() &
    pbp['play_type'].isin(['pass', 'run']) &
    (pbp['season_type'] == 'REG')
].copy()

print(f"Loaded {len(pbp_clean):,} plays from {min(seasons)}-{max(seasons)}")

Basic Defensive EPA Metrics

Let's calculate foundational defensive metrics by team and season:

#| label: basic-def-metrics-r
#| message: false
#| warning: false

# Calculate comprehensive defensive metrics
def_metrics <- pbp_clean %>%
  group_by(season, defteam) %>%
  summarise(
    plays = n(),

    # Overall EPA
    epa_per_play = -mean(epa),  # Negative because lower EPA is better for defense

    # Pass defense
    pass_plays = sum(play_type == "pass"),
    pass_epa = -mean(epa[play_type == "pass"]),
    pass_success_rate = 1 - mean(epa[play_type == "pass"] > 0),

    # Run defense
    run_plays = sum(play_type == "run"),
    run_epa = -mean(epa[play_type == "run"]),
    run_success_rate = 1 - mean(epa[play_type == "run"] > 0),

    # Explosive plays allowed
    explosive_pass_rate = mean(yards_gained[play_type == "pass"] >= 20),
    explosive_run_rate = mean(yards_gained[play_type == "run"] >= 10),

    # Pressure and sacks (rate computed over pass plays, matching the Python version)
    sacks = sum(sack == 1, na.rm = TRUE),
    sack_rate = sacks / pass_plays,

    # Turnovers
    interceptions = sum(interception == 1, na.rm = TRUE),
    fumbles = sum(fumble_lost == 1, na.rm = TRUE),
    takeaways = interceptions + fumbles,
    takeaway_rate = takeaways / plays,

    .groups = "drop"
  ) %>%
  arrange(season, desc(epa_per_play))

# Display top defenses from 2023
def_metrics %>%
  filter(season == 2023) %>%
  select(defteam, plays, epa_per_play, pass_epa, run_epa,
         sack_rate, takeaway_rate) %>%
  head(10) %>%
  gt() %>%
  cols_label(
    defteam = "Team",
    plays = "Plays",
    epa_per_play = "EPA/Play",
    pass_epa = "Pass EPA",
    run_epa = "Run EPA",
    sack_rate = "Sack %",
    takeaway_rate = "TO %"
  ) %>%
  fmt_number(
    columns = c(epa_per_play, pass_epa, run_epa),
    decimals = 3
  ) %>%
  fmt_percent(
    columns = c(sack_rate, takeaway_rate),
    decimals = 1
  ) %>%
  data_color(
    columns = epa_per_play,
    colors = scales::col_numeric(
      palette = c("#d73027", "#fee090", "#4575b4"),
      domain = NULL
    )
  ) %>%
  tab_header(
    title = "Top 10 Defenses by EPA/Play",
    subtitle = "2023 NFL Regular Season"
  )
#| label: basic-def-metrics-py
#| message: false
#| warning: false

# Calculate comprehensive defensive metrics
def_metrics = pbp_clean.groupby(['season', 'defteam']).agg({
    'epa': ['count', 'mean'],
    'play_type': lambda x: (x == 'pass').sum(),
    'yards_gained': 'mean',
    'sack': 'sum',          # pandas sum skips NaN by default
    'interception': 'sum',
    'fumble_lost': 'sum'
}).reset_index()

# Flatten column names
def_metrics.columns = ['season', 'defteam', 'plays', 'epa_mean',
                       'pass_plays', 'yards_per_play', 'sacks',
                       'interceptions', 'fumbles']

# Calculate derived metrics
def_metrics['epa_per_play'] = -def_metrics['epa_mean']  # Negative for defense
def_metrics['run_plays'] = def_metrics['plays'] - def_metrics['pass_plays']

# Pass defense metrics
pass_plays = pbp_clean[pbp_clean['play_type'] == 'pass']
pass_metrics = pass_plays.groupby(['season', 'defteam']).agg({
    'epa': 'mean',
    'yards_gained': lambda x: (x >= 20).mean()
}).reset_index()
pass_metrics.columns = ['season', 'defteam', 'pass_epa_raw', 'explosive_pass_rate']
pass_metrics['pass_epa'] = -pass_metrics['pass_epa_raw']

# Run defense metrics
run_plays = pbp_clean[pbp_clean['play_type'] == 'run']
run_metrics = run_plays.groupby(['season', 'defteam']).agg({
    'epa': 'mean',
    'yards_gained': lambda x: (x >= 10).mean()
}).reset_index()
run_metrics.columns = ['season', 'defteam', 'run_epa_raw', 'explosive_run_rate']
run_metrics['run_epa'] = -run_metrics['run_epa_raw']

# Merge all metrics
def_metrics = def_metrics.merge(
    pass_metrics[['season', 'defteam', 'pass_epa', 'explosive_pass_rate']],
    on=['season', 'defteam']
)
def_metrics = def_metrics.merge(
    run_metrics[['season', 'defteam', 'run_epa', 'explosive_run_rate']],
    on=['season', 'defteam']
)

# Calculate rates
def_metrics['sack_rate'] = def_metrics['sacks'] / def_metrics['pass_plays']
def_metrics['takeaways'] = def_metrics['interceptions'] + def_metrics['fumbles']
def_metrics['takeaway_rate'] = def_metrics['takeaways'] / def_metrics['plays']

# Display top defenses from 2023
print("\nTop 10 Defenses by EPA/Play (2023):")
top_def_2023 = def_metrics[def_metrics['season'] == 2023].nlargest(10, 'epa_per_play')
print(top_def_2023[['defteam', 'plays', 'epa_per_play', 'pass_epa',
                     'run_epa', 'sack_rate', 'takeaway_rate']].to_string(index=False))

Creating a Composite Defensive Rating

Now let's create a composite rating that combines multiple defensive dimensions:

#| label: composite-rating-r
#| message: false
#| warning: false

# Function to standardize metrics (z-scores)
standardize <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Create composite rating
def_composite <- def_metrics %>%
  group_by(season) %>%
  mutate(
    # Standardize each component (higher is better)
    epa_z = standardize(epa_per_play),
    pass_epa_z = standardize(pass_epa),
    run_epa_z = standardize(run_epa),
    sack_z = standardize(sack_rate),
    takeaway_z = standardize(takeaway_rate),
    explosive_pass_z = standardize(-explosive_pass_rate),  # Negative: fewer is better
    explosive_run_z = standardize(-explosive_run_rate),

    # Weighted composite score
    # EPA components: 50%, Pressure: 20%, Turnovers: 15%, Explosive plays: 15%
    composite_score = (
      0.25 * epa_z +           # Overall EPA
      0.15 * pass_epa_z +      # Pass EPA
      0.10 * run_epa_z +       # Run EPA
      0.20 * sack_z +          # Sack rate
      0.15 * takeaway_z +      # Takeaway rate
      0.08 * explosive_pass_z + # Explosive pass defense
      0.07 * explosive_run_z    # Explosive run defense
    )
  ) %>%
  ungroup() %>%
  arrange(season, desc(composite_score))

# Display 2023 rankings
def_composite %>%
  filter(season == 2023) %>%
  select(defteam, epa_per_play, pass_epa, run_epa,
         sack_rate, takeaway_rate, composite_score) %>%
  mutate(rank = row_number()) %>%
  select(rank, everything()) %>%
  head(15) %>%
  gt() %>%
  cols_label(
    rank = "Rank",
    defteam = "Team",
    epa_per_play = "EPA/Play",
    pass_epa = "Pass EPA",
    run_epa = "Run EPA",
    sack_rate = "Sack %",
    takeaway_rate = "TO %",
    composite_score = "Composite"
  ) %>%
  fmt_number(
    columns = c(epa_per_play, pass_epa, run_epa, composite_score),
    decimals = 3
  ) %>%
  fmt_percent(
    columns = c(sack_rate, takeaway_rate),
    decimals = 1
  ) %>%
  data_color(
    columns = composite_score,
    colors = scales::col_numeric(
      palette = c("#d73027", "#fee090", "#4575b4"),
      domain = NULL
    )
  ) %>%
  tab_header(
    title = "Composite Defensive Rankings",
    subtitle = "2023 NFL Regular Season - Weighted Multi-Metric Rating"
  ) %>%
  tab_source_note(
    source_note = "Weights: EPA 50%, Pressure 20%, Turnovers 15%, Explosive Plays 15%"
  )
#| label: composite-rating-py
#| message: false
#| warning: false

# Function to standardize metrics (z-scores)
def standardize(x):
    return (x - x.mean()) / x.std()

# Create composite rating
def_composite = def_metrics.copy()

# Standardize each component by season
for season in def_composite['season'].unique():
    mask = def_composite['season'] == season

    def_composite.loc[mask, 'epa_z'] = standardize(def_composite.loc[mask, 'epa_per_play'])
    def_composite.loc[mask, 'pass_epa_z'] = standardize(def_composite.loc[mask, 'pass_epa'])
    def_composite.loc[mask, 'run_epa_z'] = standardize(def_composite.loc[mask, 'run_epa'])
    def_composite.loc[mask, 'sack_z'] = standardize(def_composite.loc[mask, 'sack_rate'])
    def_composite.loc[mask, 'takeaway_z'] = standardize(def_composite.loc[mask, 'takeaway_rate'])
    def_composite.loc[mask, 'explosive_pass_z'] = standardize(-def_composite.loc[mask, 'explosive_pass_rate'])
    def_composite.loc[mask, 'explosive_run_z'] = standardize(-def_composite.loc[mask, 'explosive_run_rate'])

# Weighted composite score
def_composite['composite_score'] = (
    0.25 * def_composite['epa_z'] +
    0.15 * def_composite['pass_epa_z'] +
    0.10 * def_composite['run_epa_z'] +
    0.20 * def_composite['sack_z'] +
    0.15 * def_composite['takeaway_z'] +
    0.08 * def_composite['explosive_pass_z'] +
    0.07 * def_composite['explosive_run_z']
)

# Display 2023 rankings
print("\nComposite Defensive Rankings (2023):")
rankings_2023 = def_composite[def_composite['season'] == 2023].sort_values(
    'composite_score', ascending=False
)
rankings_2023['rank'] = range(1, len(rankings_2023) + 1)

print(rankings_2023.head(15)[['rank', 'defteam', 'epa_per_play', 'pass_epa',
                               'run_epa', 'sack_rate', 'takeaway_rate',
                               'composite_score']].to_string(index=False))

Customizing Composite Ratings

The weights in this composite rating (50% EPA, 20% pressure, 15% turnovers, 15% explosive plays) can be adjusted based on your priorities or analytical findings. Some analysts may weight turnovers less due to year-to-year variance, while others may emphasize pass defense more heavily in pass-heavy eras.
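
To make such re-weighting easy to experiment with, the composite can be written as a dot product over the z-scored columns, with the weights held in a dictionary. A minimal Python sketch, reusing the column names from the def_composite frame above (the defaults below are this chapter's weights, not a recommendation):

```python
import pandas as pd

# Default weights from this chapter; adjust to taste. They should sum
# to 1 so composite scores stay on a comparable scale across weightings.
WEIGHTS = {
    'epa_z': 0.25, 'pass_epa_z': 0.15, 'run_epa_z': 0.10,
    'sack_z': 0.20, 'takeaway_z': 0.15,
    'explosive_pass_z': 0.08, 'explosive_run_z': 0.07,
}

def composite_score(df: pd.DataFrame, weights: dict = WEIGHTS) -> pd.Series:
    """Weighted sum of standardized components; validates the weights."""
    total = sum(weights.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"weights sum to {total}, expected 1.0")
    return sum(w * df[col] for col, w in weights.items())
```

Swapping in a turnover-light scheme is then a one-line change to the dictionary, and the sum-to-one check guards against silent scale drift when you edit it.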

Opponent-Adjusted Defensive Metrics

The Opponent Quality Problem

Raw defensive statistics can be misleading. A defense that faces weak offenses will naturally have better statistics than one facing elite offenses, even if the latter is actually the better unit.

We need to adjust for opponent quality to make fair comparisons.
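
The direction of the adjustment is easy to get backwards, so a small worked example helps (illustrative numbers; sign convention as in this chapter, with defensive EPA flipped so higher is better). A defense that faced offenses averaging +0.03 EPA/play above the league norm has been undersold by its raw number and should be credited:

```python
# Illustrative numbers only. Convention: defensive EPA is sign-flipped,
# so HIGHER is better for the defense.

def opponent_adjust(raw_def_epa: float, avg_opp_off_epa: float,
                    league_avg_off_epa: float = 0.0) -> float:
    """Credit (or debit) a defense for its schedule: add back the EPA
    gap between its average opponent and a league-average offense."""
    return raw_def_epa + (avg_opp_off_epa - league_avg_off_epa)

# Two defenses with identical raw numbers but opposite schedules:
tough = opponent_adjust(0.05, avg_opp_off_epa=+0.03)  # 0.08, moves up
soft = opponent_adjust(0.05, avg_opp_off_epa=-0.03)   # 0.02, moves down
print(tough, soft)
```

One caveat this simple approach inherits: each opponent's quality estimate includes its games against the defense being adjusted, so a fully rigorous version would exclude those games or iterate the adjustment to convergence.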

#| label: opponent-adjustment-r
#| message: false
#| warning: false

# Step 1: Calculate offensive quality (EPA generated)
off_quality <- pbp_clean %>%
  group_by(season, posteam) %>%
  summarise(
    off_epa = mean(epa),
    .groups = "drop"
  )

# Step 2: Join opponent offensive quality onto each defensive play
# (the opponent on a defensive snap is posteam, so join on posteam)
pbp_adj <- pbp_clean %>%
  left_join(
    off_quality %>% rename(opp_off_quality = off_epa),
    by = c("season", "posteam")
  )

# Step 3: Calculate opponent-adjusted defensive metrics
def_opponent_adj <- pbp_adj %>%
  group_by(season, defteam) %>%
  summarise(
    plays = n(),

    # Raw metrics
    raw_epa = -mean(epa),

    # Average opponent quality faced
    avg_opp_quality = mean(opp_off_quality, na.rm = TRUE),

    .groups = "drop"
  )

# Step 4: Calculate league-average opponent quality
league_avg_opp <- mean(def_opponent_adj$avg_opp_quality, na.rm = TRUE)

# Step 5: Adjust defensive EPA for opponent strength
def_opponent_adj <- def_opponent_adj %>%
  mutate(
    # Adjustment factor: how much stronger/weaker were opponents than average
    opp_adjustment = avg_opp_quality - league_avg_opp,

    # Opponent-adjusted EPA: add back the schedule difference, crediting
    # defenses that faced stronger-than-average offenses
    adj_epa = raw_epa + opp_adjustment,

    # Calculate difference
    adjustment_impact = adj_epa - raw_epa
  ) %>%
  arrange(season, desc(adj_epa))

# Compare raw vs adjusted for 2023
def_opponent_adj %>%
  filter(season == 2023) %>%
  select(defteam, raw_epa, avg_opp_quality, adj_epa, adjustment_impact) %>%
  mutate(
    raw_rank = rank(-raw_epa),
    adj_rank = rank(-adj_epa),
    rank_change = raw_rank - adj_rank
  ) %>%
  arrange(adj_rank) %>%
  head(15) %>%
  gt() %>%
  cols_label(
    defteam = "Team",
    raw_epa = "Raw EPA",
    avg_opp_quality = "Opp Quality",
    adj_epa = "Adj EPA",
    adjustment_impact = "Impact",
    raw_rank = "Raw Rank",
    adj_rank = "Adj Rank",
    rank_change = "Rank Δ"
  ) %>%
  fmt_number(
    columns = c(raw_epa, avg_opp_quality, adj_epa, adjustment_impact),
    decimals = 3
  ) %>%
  fmt_number(
    columns = c(raw_rank, adj_rank, rank_change),
    decimals = 0
  ) %>%
  data_color(
    columns = rank_change,
    colors = scales::col_numeric(
      palette = c("#d73027", "white", "#4575b4"),
      domain = c(-10, 10)
    )
  ) %>%
  tab_header(
    title = "Opponent-Adjusted Defensive EPA",
    subtitle = "2023 NFL Regular Season - Accounting for Schedule Strength"
  )
#| label: opponent-adjustment-py
#| message: false
#| warning: false

# Step 1: Calculate offensive quality
off_quality = pbp_clean.groupby(['season', 'posteam'])['epa'].mean().reset_index()
off_quality.columns = ['season', 'posteam', 'off_epa']

# Step 2: Join to get opponent quality for each defensive play
# (the opponent on a defensive snap is posteam, so merge on posteam)
pbp_adj = pbp_clean.merge(
    off_quality.rename(columns={'off_epa': 'opp_off_quality'}),
    on=['season', 'posteam'],
    how='left'
)

# Step 3: Calculate opponent-adjusted metrics
def_opponent_adj = pbp_adj.groupby(['season', 'defteam']).agg({
    'epa': ['count', lambda x: -x.mean()],
    'opp_off_quality': 'mean'
}).reset_index()

def_opponent_adj.columns = ['season', 'defteam', 'plays', 'raw_epa', 'avg_opp_quality']

# Step 4: Calculate adjustments
league_avg_opp = def_opponent_adj['avg_opp_quality'].mean()

def_opponent_adj['opp_adjustment'] = def_opponent_adj['avg_opp_quality'] - league_avg_opp
# Credit defenses that faced stronger-than-average offenses by adding back the gap
def_opponent_adj['adj_epa'] = def_opponent_adj['raw_epa'] + def_opponent_adj['opp_adjustment']
def_opponent_adj['adjustment_impact'] = def_opponent_adj['adj_epa'] - def_opponent_adj['raw_epa']

# Calculate ranks
def_2023 = def_opponent_adj[def_opponent_adj['season'] == 2023].copy()
def_2023['raw_rank'] = def_2023['raw_epa'].rank(ascending=False)
def_2023['adj_rank'] = def_2023['adj_epa'].rank(ascending=False)
def_2023['rank_change'] = def_2023['raw_rank'] - def_2023['adj_rank']

# Display results
print("\nOpponent-Adjusted Defensive EPA (2023):")
print(def_2023.sort_values('adj_rank').head(15)[
    ['defteam', 'raw_epa', 'avg_opp_quality', 'adj_epa',
     'adjustment_impact', 'raw_rank', 'adj_rank', 'rank_change']
].to_string(index=False))

Strength of Schedule Visualization

Let's visualize how opponent quality affects defensive rankings:

#| label: fig-opponent-adjustment-r
#| fig-cap: "Impact of opponent adjustment on defensive rankings"
#| fig-width: 10
#| fig-height: 8
#| message: false
#| warning: false

# Create comparison plot for 2023
plot_data <- def_opponent_adj %>%
  filter(season == 2023) %>%
  mutate(
    raw_rank = rank(-raw_epa),
    adj_rank = rank(-adj_epa),
    rank_change = raw_rank - adj_rank,
    improved = rank_change > 0
  )

ggplot(plot_data, aes(x = raw_rank, y = adj_rank)) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed", color = "gray50") +
  geom_point(aes(color = rank_change), size = 3, alpha = 0.7) +
  geom_nfl_logos(aes(team_abbr = defteam), width = 0.05, alpha = 0.8) +
  scale_color_gradient2(
    low = "#d73027", mid = "white", high = "#4575b4",
    midpoint = 0,
    name = "Rank Change"
  ) +
  scale_x_continuous(breaks = seq(0, 32, 4)) +
  scale_y_continuous(breaks = seq(0, 32, 4)) +
  coord_fixed() +
  labs(
    title = "Opponent-Adjusted Defensive Rankings vs Raw Rankings",
    subtitle = "2023 NFL Season - Points below the line improved after adjustment",
    x = "Raw EPA Rank",
    y = "Opponent-Adjusted EPA Rank",
    caption = "Data: nflfastR | Teams below the line faced tougher schedules"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    panel.grid.minor = element_blank()
  )

#| label: fig-opponent-adjustment-py
#| fig-cap: "Impact of opponent adjustment on defensive rankings - Python"
#| fig-width: 10
#| fig-height: 8
#| message: false
#| warning: false

# Create comparison plot for 2023
fig, ax = plt.subplots(figsize=(10, 8))

# Plot diagonal line
ax.plot([0, 32], [0, 32], 'k--', alpha=0.3, label='No Change')

# Scatter plot
scatter = ax.scatter(
    def_2023['raw_rank'],
    def_2023['adj_rank'],
    c=def_2023['rank_change'],
    cmap='RdBu',
    s=100,
    alpha=0.6,
    edgecolors='black',
    linewidth=0.5,
    vmin=-10, vmax=10
)

# Add colorbar
cbar = plt.colorbar(scatter, ax=ax)
cbar.set_label('Rank Change', rotation=270, labelpad=20)

# Labels
ax.set_xlabel('Raw EPA Rank', fontsize=12)
ax.set_ylabel('Opponent-Adjusted EPA Rank', fontsize=12)
ax.set_title('Opponent-Adjusted Defensive Rankings vs Raw Rankings\n2023 NFL Season',
             fontsize=14, fontweight='bold')

# Grid and limits
ax.grid(True, alpha=0.3)
ax.set_xlim(0, 32)
ax.set_ylim(0, 32)
ax.set_aspect('equal')

plt.tight_layout()
plt.show()

Context-Adjusted EPA

Accounting for Game Situation

Defensive performance varies significantly based on game context. Let's adjust for:

  • Score Differential: Prevent defense vs. aggressive schemes
  • Quarter/Time: Early-game vs. late-game situations
  • Down and Distance: Expected vs. unexpected situations
#| label: context-adjustment-r
#| message: false
#| warning: false

# Create context categories
pbp_context <- pbp_clean %>%
  mutate(
    # Score context (score_differential is from the offense's perspective,
    # so "Winning" means the opposing offense is ahead)
    score_context = case_when(
      score_differential >= 14 ~ "Winning Big",
      score_differential >= 7 ~ "Winning",
      score_differential >= -6 ~ "Close",
      score_differential >= -13 ~ "Losing",
      TRUE ~ "Losing Big"
    ),

    # Time context
    time_context = case_when(
      qtr <= 2 ~ "First Half",
      qtr == 3 ~ "Third Quarter",
      qtr == 4 & game_seconds_remaining > 300 ~ "Fourth Quarter",
      TRUE ~ "Final 5 Minutes"
    ),

    # Down context
    down_context = case_when(
      down == 1 ~ "First Down",
      down == 2 ~ "Second Down",
      down == 3 & ydstogo >= 7 ~ "Third & Long",
      down == 3 ~ "Third & Short",
      down == 4 ~ "Fourth Down",
      TRUE ~ "Other"
    ),

    # High leverage situations
    high_leverage = (
      (down >= 3) |  # Third or fourth down
      (qtr == 4 & abs(score_differential) <= 8) |  # Close late game
      (yardline_100 <= 20)  # Red zone
    )
  )

# Calculate context-specific EPA
def_by_context <- pbp_context %>%
  group_by(season, defteam, score_context) %>%
  summarise(
    plays = n(),
    epa_per_play = -mean(epa),
    .groups = "drop"
  ) %>%
  pivot_wider(
    names_from = score_context,
    values_from = c(plays, epa_per_play),
    names_sep = "_"
  )

# High leverage performance
def_leverage <- pbp_context %>%
  group_by(season, defteam, high_leverage) %>%
  summarise(
    plays = n(),
    epa_per_play = -mean(epa),
    success_rate = 1 - mean(epa > 0),
    .groups = "drop"
  ) %>%
  mutate(leverage_label = ifelse(high_leverage, "High Leverage", "Standard"))

# Display 2023 high leverage performance
def_leverage %>%
  filter(season == 2023, high_leverage == TRUE) %>%
  arrange(desc(epa_per_play)) %>%
  select(defteam, plays, epa_per_play, success_rate) %>%
  head(15) %>%
  gt() %>%
  cols_label(
    defteam = "Team",
    plays = "Plays",
    epa_per_play = "EPA/Play",
    success_rate = "Success %"
  ) %>%
  fmt_number(
    columns = epa_per_play,
    decimals = 3
  ) %>%
  fmt_percent(
    columns = success_rate,
    decimals = 1
  ) %>%
  data_color(
    columns = epa_per_play,
    colors = scales::col_numeric(
      palette = c("#d73027", "#fee090", "#4575b4"),
      domain = NULL
    )
  ) %>%
  tab_header(
    title = "High Leverage Defensive Performance",
    subtitle = "2023 NFL Season - 3rd/4th downs, close late games, red zone"
  )
#| label: context-adjustment-py
#| message: false
#| warning: false

# Create context categories
pbp_context = pbp_clean.copy()

# Score context (score_differential is from the offense's perspective;
# bin edges chosen to match the R cutpoints)
pbp_context['score_context'] = pd.cut(
    pbp_context['score_differential'],
    bins=[-100, -14, -7, 6, 13, 100],
    labels=['Losing Big', 'Losing', 'Close', 'Winning', 'Winning Big']
)

# Time context
def get_time_context(row):
    if row['qtr'] <= 2:
        return 'First Half'
    elif row['qtr'] == 3:
        return 'Third Quarter'
    elif row['qtr'] == 4 and row['game_seconds_remaining'] > 300:
        return 'Fourth Quarter'
    else:
        return 'Final 5 Minutes'

pbp_context['time_context'] = pbp_context.apply(get_time_context, axis=1)

# Down context
def get_down_context(row):
    if row['down'] == 1:
        return 'First Down'
    elif row['down'] == 2:
        return 'Second Down'
    elif row['down'] == 3 and row['ydstogo'] >= 7:
        return 'Third & Long'
    elif row['down'] == 3:
        return 'Third & Short'
    elif row['down'] == 4:
        return 'Fourth Down'
    else:
        return 'Other'

pbp_context['down_context'] = pbp_context.apply(get_down_context, axis=1)

# High leverage situations
pbp_context['high_leverage'] = (
    (pbp_context['down'] >= 3) |
    ((pbp_context['qtr'] == 4) & (pbp_context['score_differential'].abs() <= 8)) |
    (pbp_context['yardline_100'] <= 20)
)

# High leverage performance
def_leverage = pbp_context.groupby(['season', 'defteam', 'high_leverage']).agg({
    'epa': ['count', lambda x: -x.mean(), lambda x: (x <= 0).mean()]
}).reset_index()

def_leverage.columns = ['season', 'defteam', 'high_leverage',
                        'plays', 'epa_per_play', 'success_rate']

# Display 2023 high leverage performance
print("\nHigh Leverage Defensive Performance (2023):")
high_lev_2023 = def_leverage[
    (def_leverage['season'] == 2023) &
    (def_leverage['high_leverage'] == True)
].sort_values('epa_per_play', ascending=False)

print(high_lev_2023.head(15)[['defteam', 'plays', 'epa_per_play',
                                'success_rate']].to_string(index=False))

Clutch Defense Analysis

Let's identify which defenses perform best in clutch situations:

#| label: clutch-defense-r
#| message: false
#| warning: false

# Compare standard vs high leverage performance
clutch_comparison <- def_leverage %>%
  filter(season == 2023) %>%
  select(defteam, high_leverage, epa_per_play) %>%
  pivot_wider(
    names_from = high_leverage,
    values_from = epa_per_play,
    names_prefix = "leverage_"
  ) %>%
  rename(
    standard = leverage_FALSE,
    clutch = leverage_TRUE
  ) %>%
  mutate(
    clutch_delta = clutch - standard,
    clutch_performer = clutch_delta > 0
  ) %>%
  arrange(desc(clutch_delta))

# Display top clutch performers
clutch_comparison %>%
  select(defteam, standard, clutch, clutch_delta) %>%
  head(15) %>%
  gt() %>%
  cols_label(
    defteam = "Team",
    standard = "Standard EPA",
    clutch = "Clutch EPA",
    clutch_delta = "Clutch Δ"
  ) %>%
  fmt_number(
    columns = c(standard, clutch, clutch_delta),
    decimals = 3
  ) %>%
  data_color(
    columns = clutch_delta,
    colors = scales::col_numeric(
      palette = c("#d73027", "white", "#4575b4"),
      domain = NULL
    )
  ) %>%
  tab_header(
    title = "Clutch Defensive Performance",
    subtitle = "2023 NFL Season - High Leverage EPA vs Standard Situations"
  ) %>%
  tab_source_note(
    source_note = "Positive delta indicates better performance in clutch situations"
  )
#| label: clutch-defense-py
#| message: false
#| warning: false

# Compare standard vs high leverage performance
clutch_comparison = def_leverage[def_leverage['season'] == 2023].pivot_table(
    index='defteam',
    columns='high_leverage',
    values='epa_per_play'
).reset_index()

clutch_comparison.columns = ['defteam', 'standard', 'clutch']
clutch_comparison['clutch_delta'] = clutch_comparison['clutch'] - clutch_comparison['standard']
clutch_comparison = clutch_comparison.sort_values('clutch_delta', ascending=False)

print("\nClutch Defensive Performance (2023):")
print(clutch_comparison.head(15)[['defteam', 'standard', 'clutch',
                                   'clutch_delta']].to_string(index=False))
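
One caution before labeling anyone "clutch": high-leverage samples are small (typically a few hundred plays per team-season), so raw deltas are noisy. A simple hedge, sketched here as an illustration rather than as the chapter's method (the prior_plays constant is an arbitrary choice, not an estimated quantity), is to shrink each delta toward zero in proportion to its sample size:

```python
def shrunken_delta(delta: float, n_plays: int, prior_plays: int = 300) -> float:
    """Shrink a small-sample clutch delta toward 0 (the league prior).

    prior_plays sets the shrinkage strength: with n_plays == prior_plays,
    the delta is halved. It is a tuning knob, not an estimate.
    """
    return delta * n_plays / (n_plays + prior_plays)

print(shrunken_delta(0.10, 150))   # heavily shrunk: ~0.033
print(shrunken_delta(0.10, 3000))  # barely shrunk: ~0.091
```

Teams with large raw deltas on tiny high-leverage samples then drop down the clutch rankings, which is usually the right call given year-to-year variance.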

Position-Specific Defensive Evaluation

Framework for Position Evaluation

Different defensive positions require different evaluation metrics:

| Position       | Key Metrics                                                    | Data Sources            |
|----------------|----------------------------------------------------------------|-------------------------|
| Cornerback     | Coverage EPA, targets, completion %, yards/target, TDs allowed | Play-by-play, tracking  |
| Safety         | Run support, coverage EPA, tackles, missed tackles             | Play-by-play, tackles   |
| Linebacker     | Run stop rate, coverage, pressure, versatility                 | Play-by-play, tackles   |
| Defensive Line | Pressure rate, run stop rate, sacks, hurries                   | Play-by-play, pass rush |
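
This framework can be encoded as a small configuration mapping each position group to its grading metrics. The metric names below are placeholders for columns you would construct from the listed sources, not fields that exist in nflfastR play-by-play as-is:

```python
# Sketch of a position-specific grading config. The metric names are
# placeholders for derived columns, not existing nflfastR fields.
POSITION_METRICS = {
    'CB': ['coverage_epa', 'targets', 'completion_pct',
           'yards_per_target', 'tds_allowed'],
    'S':  ['run_support_grade', 'coverage_epa', 'tackles', 'missed_tackles'],
    'LB': ['run_stop_rate', 'coverage_epa', 'pressure_rate', 'versatility'],
    'DL': ['pressure_rate', 'run_stop_rate', 'sacks', 'hurries'],
}

def metrics_for(position: str) -> list:
    """Look up the grading metrics for a position group."""
    if position not in POSITION_METRICS:
        raise ValueError(f"no grading config for position {position!r}")
    return POSITION_METRICS[position]

print(metrics_for('CB'))
```

Keeping the mapping in one place makes it easy to swap metrics in or out per position without touching the grading code itself.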

Cornerback Evaluation

#| label: cb-evaluation-r
#| message: false
#| warning: false

# Filter to targeted passes
cb_targets <- pbp_clean %>%
  filter(
    play_type == "pass",
    !is.na(pass_location),
    !is.na(air_yards)
  ) %>%
  mutate(
    # Categorize coverage situation
    coverage_situation = case_when(
      air_yards <= 0 ~ "Behind LOS",
      air_yards <= 5 ~ "Short",
      air_yards <= 15 ~ "Intermediate",
      air_yards <= 25 ~ "Deep",
      TRUE ~ "Very Deep"
    ),

    # Target outcome
    target_result = case_when(
      complete_pass == 1 & touchdown == 1 ~ "TD",
      complete_pass == 1 ~ "Completion",
      interception == 1 ~ "Interception",
      incomplete_pass == 1 ~ "Incomplete",
      TRUE ~ "Other"
    )
  )

# Team-level CB performance metrics
cb_team_metrics <- cb_targets %>%
  group_by(season, defteam) %>%
  summarise(
    targets = n(),
    completions = sum(complete_pass == 1, na.rm = TRUE),
    completion_pct = mean(complete_pass == 1, na.rm = TRUE),
    yards = sum(yards_gained, na.rm = TRUE),
    yards_per_target = mean(yards_gained, na.rm = TRUE),
    tds_allowed = sum(touchdown == 1, na.rm = TRUE),
    interceptions = sum(interception == 1, na.rm = TRUE),
    epa_per_target = -mean(epa),

    # Deep coverage
    deep_targets = sum(air_yards > 15, na.rm = TRUE),
    deep_completions = sum(complete_pass == 1 & air_yards > 15, na.rm = TRUE),
    deep_completion_pct = mean(complete_pass[air_yards > 15] == 1, na.rm = TRUE),

    .groups = "drop"
  )

# Display 2023 best coverage units
cb_team_metrics %>%
  filter(season == 2023) %>%
  arrange(desc(epa_per_target)) %>%
  select(defteam, targets, completion_pct, yards_per_target,
         tds_allowed, interceptions, epa_per_target) %>%
  head(12) %>%
  gt() %>%
  cols_label(
    defteam = "Team",
    targets = "Targets",
    completion_pct = "Comp %",
    yards_per_target = "Yds/Tgt",
    tds_allowed = "TDs",
    interceptions = "INTs",
    epa_per_target = "EPA/Tgt"
  ) %>%
  fmt_percent(
    columns = completion_pct,
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(yards_per_target, epa_per_target),
    decimals = 2
  ) %>%
  fmt_number(
    columns = c(targets, tds_allowed, interceptions),
    decimals = 0
  ) %>%
  data_color(
    columns = epa_per_target,
    colors = scales::col_numeric(
      palette = c("#d73027", "#fee090", "#4575b4"),
      domain = NULL
    )
  ) %>%
  tab_header(
    title = "Best Coverage Units by EPA per Target",
    subtitle = "2023 NFL Season - Team Coverage Metrics"
  )
#| label: cb-evaluation-py
#| message: false
#| warning: false

# Filter to targeted passes
cb_targets = pbp_clean[
    (pbp_clean['play_type'] == 'pass') &
    (pbp_clean['pass_location'].notna()) &
    (pbp_clean['air_yards'].notna())
].copy()

# Categorize coverage situations
def get_coverage_situation(air_yards):
    if air_yards <= 0:
        return 'Behind LOS'
    elif air_yards <= 5:
        return 'Short'
    elif air_yards <= 15:
        return 'Intermediate'
    elif air_yards <= 25:
        return 'Deep'
    else:
        return 'Very Deep'

cb_targets['coverage_situation'] = cb_targets['air_yards'].apply(get_coverage_situation)

# Team-level CB performance metrics
cb_metrics_list = []
for (season, defteam), group in cb_targets.groupby(['season', 'defteam']):
    deep_targets = group[group['air_yards'] > 15]

    metrics = {
        'season': season,
        'defteam': defteam,
        'targets': len(group),
        'completions': group['complete_pass'].sum(),
        'completion_pct': group['complete_pass'].mean(),
        'yards': group['yards_gained'].sum(),
        'yards_per_target': group['yards_gained'].mean(),
        'tds_allowed': group['touchdown'].sum(),
        'interceptions': group['interception'].sum(),
        'epa_per_target': -group['epa'].mean(),
        'deep_targets': len(deep_targets),
        'deep_completion_pct': deep_targets['complete_pass'].mean() if len(deep_targets) > 0 else np.nan
    }
    cb_metrics_list.append(metrics)

cb_team_metrics = pd.DataFrame(cb_metrics_list)

# Display 2023 best coverage units
print("\nBest Coverage Units by EPA per Target (2023):")
top_coverage_2023 = cb_team_metrics[cb_team_metrics['season'] == 2023].sort_values(
    'epa_per_target', ascending=False
)

print(top_coverage_2023.head(12)[['defteam', 'targets', 'completion_pct',
                                   'yards_per_target', 'tds_allowed',
                                   'interceptions', 'epa_per_target']].to_string(index=False))

Pass Rush Evaluation

#| label: pass-rush-r
#| message: false
#| warning: false

# Pass rush metrics by team
pass_rush_metrics <- pbp_clean %>%
  filter(play_type == "pass") %>%
  group_by(season, defteam) %>%
  summarise(
    pass_plays = n(),

    # Sacks and pressure
    sacks = sum(sack == 1, na.rm = TRUE),
    sack_rate = mean(sack == 1, na.rm = TRUE),

    # QB hits and hurries (if available)
    qb_hits = sum(qb_hit == 1, na.rm = TRUE),

    # Time-to-throw proxy: low air yards suggest the QB was forced to release quickly
    quick_passes = sum(air_yards <= 5, na.rm = TRUE),
    quick_pass_rate = mean(air_yards <= 5, na.rm = TRUE),

    # EPA on non-sack passes
    pass_epa = -mean(epa[sack == 0], na.rm = TRUE),

    # Explosive pass plays allowed
    explosive_allowed = sum(yards_gained >= 20, na.rm = TRUE),
    explosive_rate = mean(yards_gained >= 20, na.rm = TRUE),

    .groups = "drop"
  ) %>%
  mutate(
    # Pressure proxy: higher sack rate + forcing quick passes
    pressure_score = sack_rate * 100 + quick_pass_rate * 50
  )

# Display 2023 best pass rush units
pass_rush_metrics %>%
  filter(season == 2023) %>%
  arrange(desc(pressure_score)) %>%
  select(defteam, pass_plays, sacks, sack_rate, quick_pass_rate,
         pass_epa, pressure_score) %>%
  head(12) %>%
  gt() %>%
  cols_label(
    defteam = "Team",
    pass_plays = "Plays",
    sacks = "Sacks",
    sack_rate = "Sack %",
    quick_pass_rate = "Quick %",
    pass_epa = "Pass EPA",
    pressure_score = "Pressure"
  ) %>%
  fmt_number(
    columns = c(sacks),
    decimals = 0
  ) %>%
  fmt_percent(
    columns = c(sack_rate, quick_pass_rate),
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(pass_epa, pressure_score),
    decimals = 2
  ) %>%
  data_color(
    columns = pressure_score,
    colors = scales::col_numeric(
      palette = c("#fee090", "#4575b4"),
      domain = NULL
    )
  ) %>%
  tab_header(
    title = "Best Pass Rush Units",
    subtitle = "2023 NFL Season - Pressure Score = 100 × Sack Rate + 50 × Quick Pass Rate"
  )
#| label: pass-rush-py
#| message: false
#| warning: false

# Pass rush metrics by team
pass_plays = pbp_clean[pbp_clean['play_type'] == 'pass'].copy()

pass_rush_list = []
for (season, defteam), group in pass_plays.groupby(['season', 'defteam']):
    non_sack = group[group['sack'] == 0]

    metrics = {
        'season': season,
        'defteam': defteam,
        'pass_plays': len(group),
        'sacks': group['sack'].sum(),
        'sack_rate': group['sack'].mean(),
        'qb_hits': group['qb_hit'].sum() if 'qb_hit' in group.columns else 0,
        'quick_passes': (group['air_yards'] <= 5).sum(),
        'quick_pass_rate': (group['air_yards'] <= 5).mean(),
        'pass_epa': -non_sack['epa'].mean() if len(non_sack) > 0 else 0,
        'explosive_allowed': (group['yards_gained'] >= 20).sum(),
        'explosive_rate': (group['yards_gained'] >= 20).mean()
    }
    pass_rush_list.append(metrics)

pass_rush_metrics = pd.DataFrame(pass_rush_list)
pass_rush_metrics['pressure_score'] = (
    pass_rush_metrics['sack_rate'] * 100 +
    pass_rush_metrics['quick_pass_rate'] * 50
)

# Display 2023 best pass rush units
print("\nBest Pass Rush Units (2023):")
top_rush_2023 = pass_rush_metrics[pass_rush_metrics['season'] == 2023].sort_values(
    'pressure_score', ascending=False
)

print(top_rush_2023.head(12)[['defteam', 'pass_plays', 'sacks', 'sack_rate',
                               'quick_pass_rate', 'pass_epa',
                               'pressure_score']].to_string(index=False))

Run Defense Evaluation

#| label: run-defense-r
#| message: false
#| warning: false

# Run defense metrics
run_defense_metrics <- pbp_clean %>%
  filter(play_type == "run") %>%
  group_by(season, defteam) %>%
  summarise(
    run_plays = n(),

    # Basic metrics
    yards_per_carry = mean(yards_gained, na.rm = TRUE),
    run_epa = -mean(epa, na.rm = TRUE),
    success_rate = 1 - mean(epa > 0, na.rm = TRUE),

    # Explosive run prevention
    explosive_runs = sum(yards_gained >= 10, na.rm = TRUE),
    explosive_rate = mean(yards_gained >= 10, na.rm = TRUE),

    # Stuff rate (runs for 0 or negative yards)
    stuffs = sum(yards_gained <= 0, na.rm = TRUE),
    stuff_rate = mean(yards_gained <= 0, na.rm = TRUE),

    # Power situations (short yardage)
    power_situations = sum(down %in% c(3, 4) & ydstogo <= 2),
    power_stops = sum(down %in% c(3, 4) & ydstogo <= 2 & yards_gained < ydstogo),
    power_stop_rate = ifelse(
      power_situations > 0,
      power_stops / power_situations,
      NA_real_
    ),

    .groups = "drop"
  )

# Display 2023 best run defenses
run_defense_metrics %>%
  filter(season == 2023) %>%
  arrange(desc(run_epa)) %>%
  select(defteam, run_plays, yards_per_carry, run_epa, success_rate,
         stuff_rate, explosive_rate) %>%
  head(12) %>%
  gt() %>%
  cols_label(
    defteam = "Team",
    run_plays = "Plays",
    yards_per_carry = "YPC",
    run_epa = "EPA/Play",
    success_rate = "Success %",
    stuff_rate = "Stuff %",
    explosive_rate = "Explosive %"
  ) %>%
  fmt_number(
    columns = c(run_plays),
    decimals = 0
  ) %>%
  fmt_number(
    columns = c(yards_per_carry, run_epa),
    decimals = 2
  ) %>%
  fmt_percent(
    columns = c(success_rate, stuff_rate, explosive_rate),
    decimals = 1
  ) %>%
  data_color(
    columns = run_epa,
    colors = scales::col_numeric(
      palette = c("#d73027", "#fee090", "#4575b4"),
      domain = NULL
    )
  ) %>%
  tab_header(
    title = "Best Run Defense Units",
    subtitle = "2023 NFL Season - Run Defense Metrics"
  )
#| label: run-defense-py
#| message: false
#| warning: false

# Run defense metrics
run_plays = pbp_clean[pbp_clean['play_type'] == 'run'].copy()

run_defense_list = []
for (season, defteam), group in run_plays.groupby(['season', 'defteam']):
    power_sits = group[(group['down'].isin([3, 4])) & (group['ydstogo'] <= 2)]
    power_stops = power_sits[power_sits['yards_gained'] < power_sits['ydstogo']]

    metrics = {
        'season': season,
        'defteam': defteam,
        'run_plays': len(group),
        'yards_per_carry': group['yards_gained'].mean(),
        'run_epa': -group['epa'].mean(),
        'success_rate': 1 - (group['epa'] > 0).mean(),
        'explosive_runs': (group['yards_gained'] >= 10).sum(),
        'explosive_rate': (group['yards_gained'] >= 10).mean(),
        'stuffs': (group['yards_gained'] <= 0).sum(),
        'stuff_rate': (group['yards_gained'] <= 0).mean(),
        'power_situations': len(power_sits),
        'power_stop_rate': len(power_stops) / len(power_sits) if len(power_sits) > 0 else np.nan
    }
    run_defense_list.append(metrics)

run_defense_metrics = pd.DataFrame(run_defense_list)

# Display 2023 best run defenses
print("\nBest Run Defense Units (2023):")
top_run_def_2023 = run_defense_metrics[run_defense_metrics['season'] == 2023].sort_values(
    'run_epa', ascending=False
)

print(top_run_def_2023.head(12)[['defteam', 'run_plays', 'yards_per_carry',
                                  'run_epa', 'success_rate', 'stuff_rate',
                                  'explosive_rate']].to_string(index=False))

Defensive Unit Synergy Metrics

Measuring Coordination

Great defenses aren't just collections of good players—they feature coordinated units working together. Let's measure unit synergy:

#| label: unit-synergy-r
#| message: false
#| warning: false

# Calculate coverage and pass rush interaction
pass_synergy <- pbp_clean %>%
  filter(play_type == "pass", !is.na(air_yards)) %>%
  mutate(
    # Time proxy: deep passes suggest more time
    time_proxy = case_when(
      sack == 1 ~ "Sack",
      air_yards > 15 ~ "Long Time",
      air_yards > 8 ~ "Medium Time",
      TRUE ~ "Quick Release"
    )
  ) %>%
  group_by(season, defteam, time_proxy) %>%
  summarise(
    plays = n(),
    epa = -mean(epa),
    completion_pct = mean(complete_pass == 1, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  pivot_wider(
    names_from = time_proxy,
    values_from = c(plays, epa, completion_pct)
  )

# Synergy score: how well coverage holds up with time
synergy_score <- pass_synergy %>%
  filter(season == 2023) %>%
  mutate(
    # Better coverage on longer plays = better synergy
    synergy_score =
      `epa_Long Time` * 0.4 +
      `epa_Medium Time` * 0.3 +
      (1 - `completion_pct_Long Time`) * 0.3
  ) %>%
  arrange(desc(synergy_score))

# Display top synergy teams
synergy_score %>%
  select(defteam, `epa_Quick Release`, `epa_Medium Time`, `epa_Long Time`,
         `completion_pct_Long Time`, synergy_score) %>%
  head(12) %>%
  gt() %>%
  cols_label(
    defteam = "Team",
    `epa_Quick Release` = "Quick EPA",
    `epa_Medium Time` = "Med EPA",
    `epa_Long Time` = "Long EPA",
    `completion_pct_Long Time` = "Long Comp %",
    synergy_score = "Synergy"
  ) %>%
  fmt_number(
    columns = c(`epa_Quick Release`, `epa_Medium Time`, `epa_Long Time`, synergy_score),
    decimals = 3
  ) %>%
  fmt_percent(
    columns = `completion_pct_Long Time`,
    decimals = 1
  ) %>%
  data_color(
    columns = synergy_score,
    colors = scales::col_numeric(
      palette = c("#fee090", "#4575b4"),
      domain = NULL
    )
  ) %>%
  tab_header(
    title = "Pass Defense Unit Synergy",
    subtitle = "2023 NFL Season - Coverage holding up with time = better synergy"
  )
#| label: unit-synergy-py
#| message: false
#| warning: false

# Calculate coverage and pass rush interaction
pass_plays_syn = pbp_clean[
    (pbp_clean['play_type'] == 'pass') &
    (pbp_clean['air_yards'].notna())
].copy()

# Time proxy
def get_time_proxy(row):
    if row['sack'] == 1:
        return 'Sack'
    elif row['air_yards'] > 15:
        return 'Long Time'
    elif row['air_yards'] > 8:
        return 'Medium Time'
    else:
        return 'Quick Release'

pass_plays_syn['time_proxy'] = pass_plays_syn.apply(get_time_proxy, axis=1)

# Calculate metrics by time proxy
synergy_list = []
for (season, defteam), team_data in pass_plays_syn.groupby(['season', 'defteam']):
    metrics = {'season': season, 'defteam': defteam}

    for time_cat in ['Quick Release', 'Medium Time', 'Long Time', 'Sack']:
        cat_data = team_data[team_data['time_proxy'] == time_cat]
        if len(cat_data) > 0:
            metrics[f'epa_{time_cat}'] = -cat_data['epa'].mean()
            metrics[f'comp_pct_{time_cat}'] = cat_data['complete_pass'].mean()
            metrics[f'plays_{time_cat}'] = len(cat_data)

    synergy_list.append(metrics)

synergy_df = pd.DataFrame(synergy_list)

# Calculate synergy score for 2023
synergy_2023 = synergy_df[synergy_df['season'] == 2023].copy()
synergy_2023['synergy_score'] = (
    synergy_2023.get('epa_Long Time', 0) * 0.4 +
    synergy_2023.get('epa_Medium Time', 0) * 0.3 +
    (1 - synergy_2023.get('comp_pct_Long Time', 0.5)) * 0.3
)

# Display top synergy teams
print("\nPass Defense Unit Synergy (2023):")
top_synergy = synergy_2023.nlargest(12, 'synergy_score')
print(top_synergy[['defteam', 'epa_Quick Release', 'epa_Medium Time',
                    'epa_Long Time', 'comp_pct_Long Time',
                    'synergy_score']].to_string(index=False))

Defensive Consistency vs Variance

Measuring Performance Stability

Consistent defenses are more valuable than volatile ones. Let's measure week-to-week consistency:

#| label: consistency-r
#| message: false
#| warning: false

# Calculate weekly defensive performance
weekly_def <- pbp_clean %>%
  group_by(season, week, defteam) %>%
  summarise(
    plays = n(),
    epa_per_play = -mean(epa),
    .groups = "drop"
  ) %>%
  filter(plays >= 30)  # Minimum plays threshold

# Calculate consistency metrics
def_consistency <- weekly_def %>%
  group_by(season, defteam) %>%
  summarise(
    games = n(),
    mean_epa = mean(epa_per_play),
    sd_epa = sd(epa_per_play),
    cv = sd_epa / abs(mean_epa),  # Coefficient of variation
    best_game = max(epa_per_play),
    worst_game = min(epa_per_play),
    range = best_game - worst_game,
    .groups = "drop"
  ) %>%
  filter(games >= 10)  # Full season minimum

# Display 2023 most consistent defenses
def_consistency %>%
  filter(season == 2023) %>%
  arrange(sd_epa) %>%
  select(defteam, mean_epa, sd_epa, cv, best_game, worst_game, range) %>%
  head(15) %>%
  gt() %>%
  cols_label(
    defteam = "Team",
    mean_epa = "Mean EPA",
    sd_epa = "Std Dev",
    cv = "CV",
    best_game = "Best",
    worst_game = "Worst",
    range = "Range"
  ) %>%
  fmt_number(
    columns = c(mean_epa, sd_epa, cv, best_game, worst_game, range),
    decimals = 3
  ) %>%
  data_color(
    columns = sd_epa,
    colors = scales::col_numeric(
      palette = c("#4575b4", "#fee090", "#d73027"),
      domain = NULL
    )
  ) %>%
  tab_header(
    title = "Most Consistent Defensive Units",
    subtitle = "2023 NFL Season - Lower standard deviation = more consistent"
  ) %>%
  tab_source_note(
    source_note = "CV = Coefficient of Variation (SD / Mean)"
  )
#| label: consistency-py
#| message: false
#| warning: false

# Calculate weekly defensive performance
weekly_def = pbp_clean.groupby(['season', 'week', 'defteam']).agg({
    'epa': ['count', lambda x: -x.mean()]
}).reset_index()

weekly_def.columns = ['season', 'week', 'defteam', 'plays', 'epa_per_play']
weekly_def = weekly_def[weekly_def['plays'] >= 30]

# Calculate consistency metrics
consistency_list = []
for (season, defteam), group in weekly_def.groupby(['season', 'defteam']):
    if len(group) >= 10:  # Full season minimum
        metrics = {
            'season': season,
            'defteam': defteam,
            'games': len(group),
            'mean_epa': group['epa_per_play'].mean(),
            'sd_epa': group['epa_per_play'].std(),
            'best_game': group['epa_per_play'].max(),
            'worst_game': group['epa_per_play'].min()
        }
        metrics['cv'] = metrics['sd_epa'] / abs(metrics['mean_epa'])
        metrics['range'] = metrics['best_game'] - metrics['worst_game']
        consistency_list.append(metrics)

def_consistency = pd.DataFrame(consistency_list)

# Display 2023 most consistent defenses
print("\nMost Consistent Defensive Units (2023):")
consistent_2023 = def_consistency[def_consistency['season'] == 2023].sort_values('sd_epa')
print(consistent_2023.head(15)[['defteam', 'mean_epa', 'sd_epa', 'cv',
                                 'best_game', 'worst_game', 'range']].to_string(index=False))

Visualizing Consistency

#| label: fig-consistency-r
#| fig-cap: "Defensive performance consistency"
#| fig-width: 10
#| fig-height: 8
#| message: false
#| warning: false

# Create consistency plot for 2023
consistency_2023 <- def_consistency %>%
  filter(season == 2023)

ggplot(consistency_2023, aes(x = mean_epa, y = sd_epa)) +
  geom_point(aes(size = range), alpha = 0.4, color = "gray60") +
  geom_nfl_logos(aes(team_abbr = defteam), width = 0.06, alpha = 0.8) +
  geom_vline(xintercept = 0, linetype = "dashed", alpha = 0.3) +
  geom_hline(yintercept = median(consistency_2023$sd_epa),
             linetype = "dashed", alpha = 0.3, color = "blue") +
  scale_size_continuous(range = c(3, 15), name = "Performance Range") +
  labs(
    title = "Defensive Performance: Mean vs Consistency",
    subtitle = "2023 NFL Season - Top right = Good but inconsistent, Bottom right = Good and consistent",
    x = "Mean Defensive EPA per Play (Higher = Better)",
    y = "Standard Deviation (Lower = More Consistent)",
    caption = "Data: nflfastR | Size represents range (best - worst game)"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    panel.grid.minor = element_blank()
  )


#| label: fig-consistency-py
#| fig-cap: "Defensive performance consistency - Python"
#| fig-width: 10
#| fig-height: 8
#| message: false
#| warning: false

# Create consistency plot for 2023
consistency_2023 = def_consistency[def_consistency['season'] == 2023]

fig, ax = plt.subplots(figsize=(10, 8))

# Scatter plot with size based on range
scatter = ax.scatter(
    consistency_2023['mean_epa'],
    consistency_2023['sd_epa'],
    s=consistency_2023['range'] * 500,
    alpha=0.6,
    c=consistency_2023['mean_epa'],
    cmap='RdYlBu',
    edgecolors='black',
    linewidth=0.5
)

# Add reference lines
ax.axvline(x=0, color='gray', linestyle='--', alpha=0.3)
ax.axhline(y=consistency_2023['sd_epa'].median(), color='blue',
           linestyle='--', alpha=0.3)

# Labels
ax.set_xlabel('Mean Defensive EPA per Play (Higher = Better)', fontsize=12)
ax.set_ylabel('Standard Deviation (Lower = More Consistent)', fontsize=12)
ax.set_title('Defensive Performance: Mean vs Consistency\n2023 NFL Season',
             fontsize=14, fontweight='bold')

# Colorbar
cbar = plt.colorbar(scatter, ax=ax)
cbar.set_label('Mean EPA', rotation=270, labelpad=20)

# Add grid
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

Predictive Defensive Models

Building a Predictive Framework

Let's build models to predict future defensive performance based on current metrics:

#| label: predictive-model-r
#| message: false
#| warning: false

# Create year-to-year prediction dataset
predict_data <- def_composite %>%
  arrange(defteam, season) %>%
  group_by(defteam) %>%
  mutate(
    next_season_epa = lead(epa_per_play),
    next_season_composite = lead(composite_score)
  ) %>%
  ungroup() %>%
  filter(!is.na(next_season_epa))

# Split into train and test
train_data <- predict_data %>% filter(season < 2023)
test_data <- predict_data %>% filter(season == 2023)

# Build multiple models
library(broom)

# Model 1: Simple persistence (this year predicts next year)
model1 <- lm(next_season_epa ~ epa_per_play, data = train_data)

# Model 2: Multiple components
model2 <- lm(next_season_epa ~ epa_per_play + sack_rate + takeaway_rate +
               explosive_pass_rate + explosive_run_rate, data = train_data)

# Model 3: Composite score
model3 <- lm(next_season_composite ~ composite_score, data = train_data)

# Display model comparisons
model_comparison <- bind_rows(
  glance(model1) %>% mutate(model = "EPA Only"),
  glance(model2) %>% mutate(model = "Multi-Component"),
  glance(model3) %>% mutate(model = "Composite Score")
) %>%
  select(model, r.squared, adj.r.squared, sigma, AIC)

model_comparison %>%
  gt() %>%
  cols_label(
    model = "Model",
    r.squared = "R²",
    adj.r.squared = "Adj R²",
    sigma = "RMSE",
    AIC = "AIC"
  ) %>%
  fmt_number(
    columns = c(r.squared, adj.r.squared, sigma),
    decimals = 3
  ) %>%
  fmt_number(
    columns = AIC,
    decimals = 1
  ) %>%
  tab_header(
    title = "Defensive Performance Prediction Models",
    subtitle = "Year-to-Year Predictive Power"
  )

# Make predictions for 2024
predictions_2024 <- def_composite %>%
  filter(season == 2023) %>%
  mutate(
    pred_epa_model1 = predict(model1, newdata = .),
    pred_epa_model2 = predict(model2, newdata = .),
    pred_composite = predict(model3, newdata = .)
  ) %>%
  arrange(desc(pred_composite))

# Display top predicted defenses for 2024
predictions_2024 %>%
  select(defteam, epa_per_play, composite_score,
         pred_epa_model1, pred_epa_model2, pred_composite) %>%
  head(15) %>%
  gt() %>%
  cols_label(
    defteam = "Team",
    epa_per_play = "2023 EPA",
    composite_score = "2023 Comp",
    pred_epa_model1 = "Pred EPA (M1)",
    pred_epa_model2 = "Pred EPA (M2)",
    pred_composite = "Pred Comp"
  ) %>%
  fmt_number(
    columns = c(epa_per_play, composite_score, pred_epa_model1,
                pred_epa_model2, pred_composite),
    decimals = 3
  ) %>%
  tab_header(
    title = "Predicted Best Defenses for 2024",
    subtitle = "Based on 2023 performance and historical patterns"
  )
#| label: predictive-model-py
#| message: false
#| warning: false

from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error

# Create year-to-year prediction dataset
predict_data = def_composite.sort_values(['defteam', 'season']).copy()
predict_data['next_season_epa'] = predict_data.groupby('defteam')['epa_per_play'].shift(-1)
predict_data['next_season_composite'] = predict_data.groupby('defteam')['composite_score'].shift(-1)
predict_data = predict_data.dropna(subset=['next_season_epa'])

# Split into train and test
train_data = predict_data[predict_data['season'] < 2023]
test_data = predict_data[predict_data['season'] == 2023]

# Model 1: Simple persistence
X1_train = train_data[['epa_per_play']]
y1_train = train_data['next_season_epa']
model1 = LinearRegression().fit(X1_train, y1_train)

# Model 2: Multiple components
X2_train = train_data[['epa_per_play', 'sack_rate', 'takeaway_rate',
                        'explosive_pass_rate', 'explosive_run_rate']]
y2_train = train_data['next_season_epa']
model2 = LinearRegression().fit(X2_train, y2_train)

# Model 3: Composite score
X3_train = train_data[['composite_score']]
y3_train = train_data['next_season_composite']
model3 = LinearRegression().fit(X3_train, y3_train)

# Calculate R² and RMSE
models_comparison = []
for i, (model, X, y) in enumerate([
    (model1, X1_train, y1_train),
    (model2, X2_train, y2_train),
    (model3, X3_train, y3_train)
], 1):
    pred = model.predict(X)
    models_comparison.append({
        'model': ['EPA Only', 'Multi-Component', 'Composite Score'][i-1],
        'r_squared': r2_score(y, pred),
        'rmse': np.sqrt(mean_squared_error(y, pred))
    })

print("\nDefensive Performance Prediction Models:")
print(pd.DataFrame(models_comparison).to_string(index=False))

# Make predictions for 2024
pred_2024 = def_composite[def_composite['season'] == 2023].copy()
pred_2024['pred_epa_model1'] = model1.predict(pred_2024[['epa_per_play']])
pred_2024['pred_epa_model2'] = model2.predict(
    pred_2024[['epa_per_play', 'sack_rate', 'takeaway_rate',
               'explosive_pass_rate', 'explosive_run_rate']]
)
pred_2024['pred_composite'] = model3.predict(pred_2024[['composite_score']])

# Display top predicted defenses
print("\nPredicted Best Defenses for 2024:")
pred_2024_sorted = pred_2024.sort_values('pred_composite', ascending=False)
print(pred_2024_sorted.head(15)[['defteam', 'epa_per_play', 'composite_score',
                                  'pred_epa_model1', 'pred_epa_model2',
                                  'pred_composite']].to_string(index=False))

Understanding Predictive Accuracy

Defensive performance shows moderate year-to-year correlation (R² typically 0.15-0.30), which is lower than offensive metrics. This reflects:

1. **Personnel turnover**: Defensive rosters change significantly
2. **Scheme changes**: New coordinators and strategies
3. **Injury impact**: Key injuries affect defense more than offense
4. **Turnover regression**: High turnover rates rarely sustain
5. **Opponent evolution**: Offenses adapt to defensive strengths
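The turnover-regression point deserves emphasis: takeaway rates are noisy, so feeding raw rates into a predictive model overweights lucky seasons. One common remedy, sketched here with an assumed prior strength and an illustrative league rate (neither comes from the chapter's data), is to shrink each team's observed rate toward the league average:

```python
# Regression-to-the-mean sketch: shrink a team's observed takeaway rate toward
# the league average before using it as a predictor. `prior_strength` is an
# assumed tuning constant (in plays), not a value estimated in this chapter.
def shrink_rate(takeaways, plays, league_rate, prior_strength=300):
    return (takeaways + league_rate * prior_strength) / (plays + prior_strength)

league_rate = 0.022          # illustrative league-wide takeaways per play
raw = 35 / 1000              # a "hot" team: 35 takeaways on 1,000 plays
shrunk = shrink_rate(35, 1000, league_rate)
print(round(raw, 3), round(shrunk, 3))  # shrunk rate sits between raw and league average
```

The more plays a team has, the less its rate is shrunk; `prior_strength` controls how aggressively small samples are pulled toward the mean.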

Feature Importance Analysis

Let's identify which defensive components best predict future performance:

#| label: feature-importance-r
#| message: false
#| warning: false

# Calculate correlations with next year performance
feature_correlations <- train_data %>%
  select(next_season_epa, epa_per_play, pass_epa, run_epa,
         sack_rate, takeaway_rate, explosive_pass_rate, explosive_run_rate) %>%
  cor(use = "complete.obs")

# Extract correlations with next season
importance <- feature_correlations[-1, 1] %>%
  as.data.frame() %>%
  rownames_to_column("feature") %>%
  rename(correlation = ".") %>%
  arrange(desc(abs(correlation)))

# Visualize feature importance
importance %>%
  mutate(
    feature = fct_reorder(feature, correlation),
    feature_label = case_when(
      feature == "epa_per_play" ~ "Overall EPA",
      feature == "pass_epa" ~ "Pass EPA",
      feature == "run_epa" ~ "Run EPA",
      feature == "sack_rate" ~ "Sack Rate",
      feature == "takeaway_rate" ~ "Takeaway Rate",
      feature == "explosive_pass_rate" ~ "Explosive Pass Rate",
      feature == "explosive_run_rate" ~ "Explosive Run Rate",
      TRUE ~ feature
    )
  ) %>%
  ggplot(aes(x = correlation, y = feature_label, fill = correlation > 0)) +
  geom_col() +
  geom_vline(xintercept = 0, linetype = "solid", color = "black") +
  scale_fill_manual(values = c("TRUE" = "#4575b4", "FALSE" = "#d73027"), guide = "none") +
  labs(
    title = "Feature Importance for Predicting Next Season Defensive EPA",
    subtitle = "Correlation with year-ahead defensive performance",
    x = "Correlation with Next Season EPA",
    y = NULL,
    caption = "Data: 2020-2022 training data"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 13),
    panel.grid.major.y = element_blank()
  )
#| label: feature-importance-py
#| message: false
#| warning: false

# Calculate correlations with next year performance
features = ['epa_per_play', 'pass_epa', 'run_epa', 'sack_rate',
            'takeaway_rate', 'explosive_pass_rate', 'explosive_run_rate']

correlations = train_data[features + ['next_season_epa']].corr()['next_season_epa'][:-1]
importance = pd.DataFrame({
    'feature': correlations.index,
    'correlation': correlations.values
}).sort_values('correlation', key=abs, ascending=False)

# Visualize feature importance
fig, ax = plt.subplots(figsize=(10, 6))

feature_labels = {
    'epa_per_play': 'Overall EPA',
    'pass_epa': 'Pass EPA',
    'run_epa': 'Run EPA',
    'sack_rate': 'Sack Rate',
    'takeaway_rate': 'Takeaway Rate',
    'explosive_pass_rate': 'Explosive Pass Rate',
    'explosive_run_rate': 'Explosive Run Rate'
}

importance['feature_label'] = importance['feature'].map(feature_labels)
colors = ['#4575b4' if x > 0 else '#d73027' for x in importance['correlation']]

ax.barh(importance['feature_label'], importance['correlation'], color=colors)
ax.axvline(x=0, color='black', linewidth=0.8)
ax.set_xlabel('Correlation with Next Season EPA', fontsize=12)
ax.set_title('Feature Importance for Predicting Next Season Defensive EPA\nCorrelation with year-ahead defensive performance',
             fontsize=13, fontweight='bold')
ax.grid(True, axis='x', alpha=0.3)

plt.tight_layout()
plt.show()

Historical Defensive Comparisons

Era Adjustments

Comparing defenses across eras requires adjusting for rule changes and offensive evolution:

#| label: era-adjustment-r
#| message: false
#| warning: false

# Calculate league average by season (as proxy for era)
league_avg_by_season <- pbp_clean %>%
  group_by(season) %>%
  summarise(
    league_avg_epa = mean(epa, na.rm = TRUE),
    league_avg_yards = mean(yards_gained, na.rm = TRUE),
    .groups = "drop"
  )

# Adjust defensive metrics for era
def_era_adjusted <- def_metrics %>%
  left_join(league_avg_by_season, by = "season") %>%
  mutate(
    # Era-adjusted EPA (compare to league average)
    era_adj_epa = epa_per_play - (-league_avg_epa)
  ) %>%
  group_by(season) %>%
  mutate(
    # Z-score within season (fairer than a pooled z-score across eras)
    era_z_score = (epa_per_play - mean(epa_per_play)) / sd(epa_per_play)
  ) %>%
  ungroup()

# Find best defenses across all seasons
best_defenses_all_time <- def_era_adjusted %>%
  arrange(desc(era_z_score)) %>%
  select(season, defteam, plays, epa_per_play, era_adj_epa, era_z_score) %>%
  head(20)

best_defenses_all_time %>%
  gt() %>%
  cols_label(
    season = "Season",
    defteam = "Team",
    plays = "Plays",
    epa_per_play = "Raw EPA",
    era_adj_epa = "Era Adj EPA",
    era_z_score = "Z-Score"
  ) %>%
  fmt_number(
    columns = c(plays),
    decimals = 0
  ) %>%
  fmt_number(
    columns = c(epa_per_play, era_adj_epa, era_z_score),
    decimals = 3
  ) %>%
  data_color(
    columns = era_z_score,
    colors = scales::col_numeric(
      palette = c("#fee090", "#4575b4"),
      domain = NULL
    )
  ) %>%
  tab_header(
    title = "Best Defensive Seasons (Era-Adjusted)",
    subtitle = "2020-2023 NFL Seasons - Ranked by within-season z-score"
  ) %>%
  tab_source_note(
    source_note = "Z-score accounts for league-wide offensive performance in each season"
  )
#| label: era-adjustment-py
#| message: false
#| warning: false

# Calculate league average by season
league_avg_by_season = pbp_clean.groupby('season').agg({
    'epa': 'mean',
    'yards_gained': 'mean'
}).reset_index()
league_avg_by_season.columns = ['season', 'league_avg_epa', 'league_avg_yards']

# Adjust defensive metrics for era
def_era_adjusted = def_metrics.merge(league_avg_by_season, on='season')
def_era_adjusted['era_adj_epa'] = (
    def_era_adjusted['epa_per_play'] - (-def_era_adjusted['league_avg_epa'])
)

# Z-score within season
def_era_adjusted['era_z_score'] = def_era_adjusted.groupby('season')['epa_per_play'].transform(
    lambda x: (x - x.mean()) / x.std()
)

# Find best defenses across all seasons
best_defenses_all_time = def_era_adjusted.nlargest(20, 'era_z_score')

print("\nBest Defensive Seasons (Era-Adjusted) 2020-2023:")
print(best_defenses_all_time[['season', 'defteam', 'plays', 'epa_per_play',
                               'era_adj_epa', 'era_z_score']].to_string(index=False))
#| label: fig-historical-trends-r
#| fig-cap: "Defensive EPA trends over time"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false

# Calculate percentiles by season
def_percentiles <- def_era_adjusted %>%
  group_by(season) %>%
  summarise(
    p90 = quantile(epa_per_play, 0.90),
    p75 = quantile(epa_per_play, 0.75),
    p50 = quantile(epa_per_play, 0.50),
    p25 = quantile(epa_per_play, 0.25),
    p10 = quantile(epa_per_play, 0.10),
    .groups = "drop"
  )

# Plot trends
ggplot(def_percentiles, aes(x = season)) +
  geom_ribbon(aes(ymin = p10, ymax = p90), fill = "#4575b4", alpha = 0.2) +
  geom_ribbon(aes(ymin = p25, ymax = p75), fill = "#4575b4", alpha = 0.4) +
  geom_line(aes(y = p50), color = "#4575b4", linewidth = 1.5) +
  geom_point(aes(y = p50), color = "#4575b4", size = 3) +
  geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
  scale_x_continuous(breaks = 2020:2023) +
  labs(
    title = "Distribution of Defensive EPA Over Time",
    subtitle = "Dark band = 25th-75th percentile, Light band = 10th-90th percentile",
    x = "Season",
    y = "Defensive EPA per Play (Higher = Better)",
    caption = "Data: nflfastR"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    panel.grid.minor = element_blank()
  )


#| label: fig-historical-trends-py
#| fig-cap: "Defensive EPA trends over time - Python"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false

# Calculate percentiles by season
percentiles = def_era_adjusted.groupby('season')['epa_per_play'].quantile(
    [0.10, 0.25, 0.50, 0.75, 0.90]
).unstack()

percentiles.columns = ['p10', 'p25', 'p50', 'p75', 'p90']
percentiles = percentiles.reset_index()

# Plot trends
fig, ax = plt.subplots(figsize=(12, 8))

# Fill between percentiles
ax.fill_between(percentiles['season'], percentiles['p10'], percentiles['p90'],
                alpha=0.2, color='#4575b4', label='10th-90th percentile')
ax.fill_between(percentiles['season'], percentiles['p25'], percentiles['p75'],
                alpha=0.4, color='#4575b4', label='25th-75th percentile')

# Median line
ax.plot(percentiles['season'], percentiles['p50'],
        color='#4575b4', linewidth=2.5, marker='o', markersize=8, label='Median')

# Reference line
ax.axhline(y=0, color='gray', linestyle='--', alpha=0.5)

# Labels and formatting
ax.set_xlabel('Season', fontsize=12)
ax.set_ylabel('Defensive EPA per Play (Higher = Better)', fontsize=12)
ax.set_title('Distribution of Defensive EPA Over Time',
             fontsize=14, fontweight='bold')
ax.legend(loc='best')
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

Complete Defensive Dashboard

Let's create a comprehensive defensive evaluation dashboard:

#| label: complete-dashboard-r
#| message: false
#| warning: false
#| fig-width: 14
#| fig-height: 12

library(patchwork)

# Select a team to analyze
team_to_analyze <- "SF"  # San Francisco 49ers
team_data_2023 <- def_composite %>% filter(season == 2023, defteam == team_to_analyze)

# 1. Multi-metric comparison plot
plot1 <- def_composite %>%
  filter(season == 2023) %>%
  select(defteam, epa_per_play, pass_epa, run_epa, composite_score) %>%
  mutate(
    highlight = defteam == team_to_analyze,
    rank = rank(-composite_score)
  ) %>%
  ggplot(aes(x = pass_epa, y = run_epa)) +
  geom_vline(xintercept = 0, linetype = "dashed", alpha = 0.3) +
  geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.3) +
  geom_point(aes(alpha = highlight, size = highlight), color = "gray60") +
  geom_nfl_logos(
    data = . %>% filter(highlight),
    aes(team_abbr = defteam),
    width = 0.08
  ) +
  scale_alpha_manual(values = c(0.3, 1), guide = "none") +
  scale_size_manual(values = c(2, 4), guide = "none") +
  labs(
    title = "Pass Defense vs Run Defense",
    x = "Pass Defense EPA (Higher = Better)",
    y = "Run Defense EPA (Higher = Better)"
  ) +
  theme_minimal()

# 2. Rank comparison
plot2 <- def_composite %>%
  filter(season == 2023) %>%
  mutate(
    epa_rank = rank(-epa_per_play),
    composite_rank = rank(-composite_score),
    highlight = defteam == team_to_analyze
  ) %>%
  ggplot(aes(x = epa_rank, y = composite_rank)) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed", alpha = 0.3) +
  geom_point(aes(alpha = highlight, size = highlight), color = "gray60") +
  geom_nfl_logos(
    data = . %>% filter(highlight),
    aes(team_abbr = defteam),
    width = 0.08
  ) +
  scale_alpha_manual(values = c(0.3, 1), guide = "none") +
  scale_size_manual(values = c(2, 4), guide = "none") +
  scale_x_continuous(breaks = seq(0, 32, 8)) +
  scale_y_continuous(breaks = seq(0, 32, 8)) +
  coord_fixed() +
  labs(
    title = "EPA Rank vs Composite Rank",
    x = "EPA Rank",
    y = "Composite Rank"
  ) +
  theme_minimal()

# 3. Historical performance
plot3 <- def_composite %>%
  filter(defteam == team_to_analyze) %>%
  ggplot(aes(x = season, y = epa_per_play)) +
  geom_line(linewidth = 1.5, color = "#4575b4") +
  geom_point(size = 3, color = "#4575b4") +
  geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.3) +
  labs(
    title = glue("{team_to_analyze} Defensive EPA Trend"),
    x = "Season",
    y = "EPA per Play"
  ) +
  theme_minimal()

# 4. Component breakdown
plot4 <- def_composite %>%
  filter(season == 2023, defteam == team_to_analyze) %>%
  select(epa_z, pass_epa_z, run_epa_z, sack_z, takeaway_z) %>%
  pivot_longer(everything(), names_to = "metric", values_to = "z_score") %>%
  mutate(
    metric = factor(metric,
                   levels = c("epa_z", "pass_epa_z", "run_epa_z", "sack_z", "takeaway_z"),
                   labels = c("Overall EPA", "Pass EPA", "Run EPA", "Sack Rate", "Turnovers"))
  ) %>%
  ggplot(aes(x = z_score, y = metric, fill = z_score > 0)) +
  geom_col() +
  geom_vline(xintercept = 0, linetype = "solid") +
  scale_fill_manual(values = c("FALSE" = "#d73027", "TRUE" = "#4575b4"), guide = "none") +
  labs(
    title = glue("{team_to_analyze} Component Performance (Z-Scores)"),
    x = "Z-Score (vs League Average)",
    y = NULL
  ) +
  theme_minimal()

# Combine plots
(plot1 | plot2) / (plot3 | plot4) +
  plot_annotation(
    title = glue("{team_to_analyze} Comprehensive Defensive Dashboard - 2023 Season"),
    theme = theme(plot.title = element_text(face = "bold", size = 16))
  )

#| label: complete-dashboard-py
#| message: false
#| warning: false
#| fig-width: 14
#| fig-height: 12

# Select a team to analyze
team_to_analyze = "SF"  # San Francisco 49ers

fig, axes = plt.subplots(2, 2, figsize=(14, 12))

# 1. Pass Defense vs Run Defense
ax1 = axes[0, 0]
data_2023 = def_composite[def_composite['season'] == 2023].copy()  # copy so rank columns can be added later
team_data = data_2023[data_2023['defteam'] == team_to_analyze]

ax1.scatter(data_2023['pass_epa'], data_2023['run_epa'],
           alpha=0.3, s=50, color='gray')
ax1.scatter(team_data['pass_epa'], team_data['run_epa'],
           s=200, color='#d73027', edgecolors='black', linewidth=2, zorder=5)
ax1.axvline(x=0, color='gray', linestyle='--', alpha=0.3)
ax1.axhline(y=0, color='gray', linestyle='--', alpha=0.3)
ax1.set_xlabel('Pass Defense EPA (Higher = Better)')
ax1.set_ylabel('Run Defense EPA (Higher = Better)')
ax1.set_title('Pass Defense vs Run Defense')
ax1.grid(True, alpha=0.3)

# 2. EPA Rank vs Composite Rank
ax2 = axes[0, 1]
data_2023['epa_rank'] = data_2023['epa_per_play'].rank(ascending=False)
data_2023['composite_rank'] = data_2023['composite_score'].rank(ascending=False)

ax2.scatter(data_2023['epa_rank'], data_2023['composite_rank'],
           alpha=0.3, s=50, color='gray')
team_data_ranks = data_2023[data_2023['defteam'] == team_to_analyze]
ax2.scatter(team_data_ranks['epa_rank'], team_data_ranks['composite_rank'],
           s=200, color='#d73027', edgecolors='black', linewidth=2, zorder=5)
ax2.plot([0, 32], [0, 32], 'k--', alpha=0.3)
ax2.set_xlabel('EPA Rank')
ax2.set_ylabel('Composite Rank')
ax2.set_title('EPA Rank vs Composite Rank')
ax2.grid(True, alpha=0.3)

# 3. Historical performance
ax3 = axes[1, 0]
team_history = def_composite[def_composite['defteam'] == team_to_analyze]
ax3.plot(team_history['season'], team_history['epa_per_play'],
        marker='o', linewidth=2.5, markersize=8, color='#4575b4')
ax3.axhline(y=0, color='gray', linestyle='--', alpha=0.3)
ax3.set_xlabel('Season')
ax3.set_ylabel('EPA per Play')
ax3.set_title(f'{team_to_analyze} Defensive EPA Trend')
ax3.grid(True, alpha=0.3)

# 4. Component breakdown
ax4 = axes[1, 1]
team_2023 = def_composite[
    (def_composite['season'] == 2023) &
    (def_composite['defteam'] == team_to_analyze)
].iloc[0]

components = {
    'Overall EPA': team_2023['epa_z'],
    'Pass EPA': team_2023['pass_epa_z'],
    'Run EPA': team_2023['run_epa_z'],
    'Sack Rate': team_2023['sack_z'],
    'Turnovers': team_2023['takeaway_z']
}

colors = ['#4575b4' if v > 0 else '#d73027' for v in components.values()]
ax4.barh(list(components.keys()), list(components.values()), color=colors)
ax4.axvline(x=0, color='black', linewidth=0.8)
ax4.set_xlabel('Z-Score (vs League Average)')
ax4.set_title(f'{team_to_analyze} Component Performance (Z-Scores)')
ax4.grid(True, axis='x', alpha=0.3)

plt.suptitle(f'{team_to_analyze} Comprehensive Defensive Dashboard - 2023 Season',
             fontsize=16, fontweight='bold', y=0.995)
plt.tight_layout()
plt.show()

Summary

In this chapter, we've built a comprehensive framework for advanced defensive evaluation:

Key Concepts Covered

  1. Composite Defensive Ratings: Combined multiple metrics (EPA, pressure, turnovers, explosive plays) into weighted ratings

  2. Opponent Adjustments: Accounted for strength of schedule to make fair comparisons between defensive units

  3. Context-Adjusted Metrics: Adjusted for game situation, score differential, and leverage to identify clutch performance

  4. Position-Specific Evaluation: Created targeted metrics for different defensive positions (CB, pass rush, run defense)

  5. Unit Synergy: Measured coordination between coverage and pass rush units

  6. Consistency Analysis: Identified reliable defenses vs. volatile ones using variance metrics

  7. Predictive Modeling: Built models to forecast future defensive performance

  8. Historical Comparisons: Developed era adjustments for comparing defenses across seasons
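
The first concept in the list above, combining standardized metrics into a weighted composite, can be sketched in a few lines. The team labels, metric values, and weights below are hypothetical, chosen purely for illustration (the chapter's `def_composite` table plays this role with real data):

```python
import pandas as pd

# Hypothetical team-level metrics, all oriented so higher = better
metrics = pd.DataFrame({
    'defteam': ['A', 'B', 'C', 'D'],
    'epa_per_play':  [0.06, 0.02, -0.01, -0.04],
    'sack_rate':     [0.09, 0.07, 0.06, 0.05],
    'takeaway_rate': [0.16, 0.13, 0.11, 0.10],
})

# Illustrative weights; a real system would justify these choices
weights = {'epa_per_play': 0.5, 'sack_rate': 0.3, 'takeaway_rate': 0.2}

# Standardize each metric to a z-score, then combine with the weights
for col in weights:
    metrics[col + '_z'] = (metrics[col] - metrics[col].mean()) / metrics[col].std()

metrics['composite'] = sum(w * metrics[c + '_z'] for c, w in weights.items())
print(metrics[['defteam', 'composite']].sort_values('composite', ascending=False))
```

Because every metric is standardized before weighting, no single component dominates simply because its raw scale is larger, which is the same reason the chapter's dashboard plots z-scores rather than raw values.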

Key Takeaways

  • Defense is multifaceted: No single metric captures defensive quality; comprehensive evaluation requires combining multiple dimensions
  • Context matters: Raw statistics can be misleading; adjusting for opponent and situation is essential
  • Predictability is limited: Defensive performance is less stable year-to-year than offense (R² ~0.20-0.30)
  • Turnovers regress: High turnover rates are difficult to sustain
  • Synergy is valuable: Coordinated units outperform collections of individuals
  • Consistency counts: Reliable defenses are more valuable than volatile ones
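
The "predictability is limited" and "turnovers regress" points are two sides of the same regression-to-the-mean coin. A small simulation with synthetic team-seasons (the year-to-year correlation `r = 0.5` here is an assumption for illustration, not an estimate from the chapter's data) shows why a forecast shrunk toward the league mean beats naive persistence:

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 5000, 0.5                                   # simulated team-seasons; assumed correlation

# Standardized metric in season N, and a correlated season N+1
year1 = rng.normal(0, 1, n)
year2 = r * year1 + np.sqrt(1 - r**2) * rng.normal(0, 1, n)

mse_naive = np.mean((year2 - year1) ** 2)          # persistence: next year = this year
mse_shrunk = np.mean((year2 - r * year1) ** 2)     # shrink toward the league mean (0)
print(f"naive MSE: {mse_naive:.2f}, shrunk MSE: {mse_shrunk:.2f}")
```

The shrunk forecast wins because extreme seasons are partly noise; the less stable the metric (turnover rates especially), the smaller the appropriate `r` and the more aggressive the shrinkage.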

Practical Applications

These advanced defensive metrics enable:

  • Team Evaluation: Identifying truly elite defenses vs. those benefiting from weak opponents
  • Player Valuation: Assessing individual contributions to defensive success
  • Strategic Planning: Understanding which defensive strengths are sustainable
  • Roster Construction: Building balanced, synergistic defensive units
  • Game Planning: Exploiting opponent defensive weaknesses
  • Draft Analysis: Projecting how defensive prospects will contribute

Exercises

Conceptual Questions

  1. Composite Rating Design: Design your own composite defensive rating system. What weights would you assign to different components (EPA, pressure, turnovers, etc.) and why?

  2. Turnover Sustainability: Research and explain why turnover rates show low year-to-year correlation. What implications does this have for defensive evaluation?

  3. Position Value: Which defensive position has the greatest impact on overall defensive EPA? Use evidence from the chapter to support your answer.

Coding Exercises

Exercise 1: Custom Defensive Rating System

Create your own composite defensive rating that includes:

  a) At least 5 different metrics
  b) Justification for your chosen weights
  c) Validation by comparing rankings to playoff success
  d) Visualization comparing your rating to EPA-only rankings

**Bonus**: Test your rating system's predictive power for next-season performance.

Exercise 2: Opponent-Adjusted Analysis

Load data from the 2023 season and:

  a) Calculate opponent-adjusted EPA for each team
  b) Identify which teams' rankings changed most after adjustment
  c) Determine whether playoff teams faced stronger or weaker schedules
  d) Create visualizations showing the impact of adjustment

**Hint**: Consider both offensive and defensive opponent quality.

Exercise 3: Position-Specific Framework

Build a complete evaluation framework for one defensive position:

  a) Identify 5-7 key metrics for the position
  b) Calculate these metrics by team for the 2023 season
  c) Create a composite score for position unit quality
  d) Validate by comparing to overall defensive performance

**Challenge**: If player-level data is available, extend to individual player evaluation.

Exercise 4: Predictive Defensive Model

Build a model to predict next-season defensive EPA:

  a) Use data from 2020-2022 as training data
  b) Test different feature combinations
  c) Validate on the 2023 season
  d) Generate predictions for 2024
  e) Compare your model's accuracy to simple persistence (this year = next year)

**Advanced**: Incorporate personnel changes (e.g., coordinator changes, key player departures) as features.

Exercise 5: Defensive Dashboard

Create an interactive defensive evaluation dashboard that includes:

  a) Overall defensive ranking (multiple metrics)
  b) Pass defense vs. run defense scatter plot
  c) Situational performance breakdown
  d) Historical trend line
  e) Opponent-adjusted metrics
  f) Consistency/variance analysis

Make it filterable by team and season.

Further Reading

Academic Research

  • Yurko, R., Ventura, S., & Horowitz, M. (2019). "nflWAR: A reproducible method for offensive player evaluation in football." Journal of Quantitative Analysis in Sports, 15(3), 163-183.

  • Lopez, M. J., Matthews, G. J., & Baumer, B. S. (2018). "How often does the best team win? A unified approach to understanding randomness in North American sport." The Annals of Applied Statistics, 12(4), 2483-2516.

  • Burke, B. (2019). "Defenses and the Passing Game." ESPN Analytics.

Industry Resources

  • Football Outsiders. "DVOA Explained." https://www.footballoutsiders.com/info/methods

  • Pro Football Focus. "PFF Grading." https://www.pff.com/grades

  • Baldwin, B. (2021). "The nflfastR Beginner's Guide." https://www.nflfastR.com/

Books

  • Alamar, B. (2013). Sports Analytics: A Guide for Coaches, Managers, and Other Decision Makers. Columbia University Press.

  • Miller, T. W. (2015). Sports Analytics and Data Science. Pearson FT Press.
