Learning Objectives

By the end of this chapter, you will be able to:
- Build comprehensive defensive rating systems that combine multiple metrics
- Integrate pass and run defense metrics into unified evaluation frameworks
- Adjust defensive metrics for opponent quality and game context
- Create position-specific defensive grades for different defensive roles
- Develop predictive models for future defensive performance
- Analyze defensive unit synergy and coordination
- Evaluate defensive consistency and clutch performance
Introduction
Throughout the previous chapters in this part, we've explored various aspects of defensive play: basic defensive metrics (Chapter 13), coverage analysis (Chapter 14), pass rush evaluation (Chapter 15), and run defense (Chapter 16). Each chapter provided valuable insights into specific defensive components, but real-world defensive evaluation requires integrating these elements into comprehensive rating systems.
This chapter synthesizes everything we've learned, building advanced defensive metrics that account for multiple factors simultaneously. We'll develop opponent-adjusted ratings, context-aware metrics, and position-specific evaluation frameworks. Most importantly, we'll create predictive models that help forecast future defensive performance—crucial for player evaluation, roster construction, and game planning.
Why Comprehensive Defensive Metrics Matter
Modern NFL offenses are increasingly sophisticated, requiring defensive evaluation that goes beyond simple statistics. Comprehensive defensive metrics allow us to:
- **Compare players across different schemes and situations**
- **Identify which defensive units truly excel vs. benefit from favorable matchups**
- **Predict future performance more accurately than traditional stats**
- **Make data-driven decisions about personnel and strategy**
- **Account for the interconnected nature of defensive play**

The Challenge of Defensive Evaluation
Why Defense is Harder to Evaluate
Defensive evaluation presents unique challenges compared to offensive analysis:
- Collective Nature: Defensive success requires all 11 players executing their assignments
- Scheme Dependence: Player performance is heavily influenced by defensive scheme
- Negative Attribution: Defense is often judged by what doesn't happen
- Limited Tracking Data: Public tracking data for defensive players is more limited
- Context Sensitivity: Quality of opponent significantly impacts defensive statistics
The 11-Player Problem
Unlike offense, where quarterbacks and skill players can be evaluated somewhat independently, defensive performance is inherently collective. A defensive back playing perfect coverage can still allow a completion if the pass rush fails. A dominant pass rush may go unrewarded if the coverage breaks down.

The Multi-Dimensional Nature of Defense
Effective defensive evaluation must consider multiple dimensions:
- Run Defense: Stopping the running game
- Pass Coverage: Limiting passing yards and completions
- Pass Rush: Generating pressure and sacks
- Takeaways: Creating turnovers
- Situational Performance: Third downs, red zone, etc.
- Consistency: Performance variance game-to-game
- Opponent Quality: Strength of opposing offenses
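These dimensions can be kept together as one record per team-season before any weighting is applied. A minimal sketch (the field names and numbers are illustrative, not nflfastR columns):

```python
from dataclasses import dataclass

@dataclass
class DefensiveProfile:
    """One team-season across the dimensions listed above (schematic)."""
    run_epa: float               # run defense (EPA allowed, sign-flipped so higher = better)
    pass_epa: float              # pass coverage
    sack_rate: float             # pass rush
    takeaway_rate: float         # takeaways
    third_down_stop_rate: float  # situational performance
    game_epa_sd: float           # consistency (lower = steadier)
    avg_opp_epa: float           # opponent quality faced

profile = DefensiveProfile(
    run_epa=0.05, pass_epa=0.08, sack_rate=0.07,
    takeaway_rate=0.03, third_down_stop_rate=0.62,
    game_epa_sd=0.12, avg_opp_epa=0.01,
)
print(profile.pass_epa)
```

Keeping the dimensions explicit like this makes the later weighting and adjustment steps easier to audit than a single opaque number.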
Building Composite Defensive Ratings
The Defensive EPA Framework
We'll start by building a comprehensive defensive rating system based on EPA. First, let's load our data and prepare it for analysis.
#| label: setup-r
#| message: false
#| warning: false
library(tidyverse)
library(nflfastR)
library(nflplotR)
library(gt)
library(glue)
# Load multiple seasons for robust analysis
seasons <- 2020:2023
pbp <- load_pbp(seasons)
# Filter to regular plays
pbp_clean <- pbp %>%
filter(
!is.na(posteam),
!is.na(defteam),
!is.na(epa),
play_type %in% c("pass", "run"),
season_type == "REG"
)
cat("Loaded", nrow(pbp_clean), "plays from", min(seasons), "-", max(seasons), "\n")
#| label: setup-py
#| message: false
#| warning: false
import pandas as pd
import numpy as np
import nfl_data_py as nfl
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
# Load multiple seasons for robust analysis
seasons = list(range(2020, 2024))
pbp = nfl.import_pbp_data(seasons)
# Filter to regular plays
pbp_clean = pbp[
pbp['posteam'].notna() &
pbp['defteam'].notna() &
pbp['epa'].notna() &
pbp['play_type'].isin(['pass', 'run']) &
(pbp['season_type'] == 'REG')
].copy()
print(f"Loaded {len(pbp_clean):,} plays from {min(seasons)}-{max(seasons)}")
Basic Defensive EPA Metrics
Let's calculate foundational defensive metrics by team and season:
#| label: basic-def-metrics-r
#| message: false
#| warning: false
# Calculate comprehensive defensive metrics
def_metrics <- pbp_clean %>%
group_by(season, defteam) %>%
summarise(
plays = n(),
# Overall EPA
epa_per_play = -mean(epa), # Negative because lower EPA is better for defense
# Pass defense
pass_plays = sum(play_type == "pass"),
pass_epa = -mean(epa[play_type == "pass"]),
pass_success_rate = 1 - mean(epa[play_type == "pass"] > 0),
# Run defense
run_plays = sum(play_type == "run"),
run_epa = -mean(epa[play_type == "run"]),
run_success_rate = 1 - mean(epa[play_type == "run"] > 0),
# Explosive plays allowed
explosive_pass_rate = mean(yards_gained[play_type == "pass"] >= 20, na.rm = TRUE),
explosive_run_rate = mean(yards_gained[play_type == "run"] >= 10, na.rm = TRUE),
# Pressure and sacks (sack rate per dropback)
sacks = sum(sack == 1, na.rm = TRUE),
sack_rate = sacks / pass_plays,
# Turnovers
interceptions = sum(interception == 1, na.rm = TRUE),
fumbles = sum(fumble_lost == 1, na.rm = TRUE),
takeaways = interceptions + fumbles,
takeaway_rate = takeaways / plays,
.groups = "drop"
) %>%
arrange(season, desc(epa_per_play))
# Display top defenses from 2023
def_metrics %>%
filter(season == 2023) %>%
select(defteam, plays, epa_per_play, pass_epa, run_epa,
sack_rate, takeaway_rate) %>%
head(10) %>%
gt() %>%
cols_label(
defteam = "Team",
plays = "Plays",
epa_per_play = "EPA/Play",
pass_epa = "Pass EPA",
run_epa = "Run EPA",
sack_rate = "Sack %",
takeaway_rate = "TO %"
) %>%
fmt_number(
columns = c(epa_per_play, pass_epa, run_epa),
decimals = 3
) %>%
fmt_percent(
columns = c(sack_rate, takeaway_rate),
decimals = 1
) %>%
data_color(
columns = epa_per_play,
colors = scales::col_numeric(
palette = c("#d73027", "#fee090", "#4575b4"),
domain = NULL
)
) %>%
tab_header(
title = "Top 10 Defenses by EPA/Play",
subtitle = "2023 NFL Regular Season"
)
#| label: basic-def-metrics-py
#| message: false
#| warning: false
# Calculate comprehensive defensive metrics
def_metrics = pbp_clean.groupby(['season', 'defteam']).agg({
'epa': ['count', 'mean'],
'play_type': lambda x: (x == 'pass').sum(),
'yards_gained': 'mean',
'sack': 'sum',          # pandas sum() skips NaN by default
'interception': 'sum',
'fumble_lost': 'sum'
}).reset_index()
# Flatten column names
def_metrics.columns = ['season', 'defteam', 'plays', 'epa_mean',
'pass_plays', 'yards_per_play', 'sacks',
'interceptions', 'fumbles']
# Calculate derived metrics
def_metrics['epa_per_play'] = -def_metrics['epa_mean'] # Negative for defense
def_metrics['run_plays'] = def_metrics['plays'] - def_metrics['pass_plays']
# Pass defense metrics
pass_plays = pbp_clean[pbp_clean['play_type'] == 'pass']
pass_metrics = pass_plays.groupby(['season', 'defteam']).agg({
'epa': 'mean',
'yards_gained': lambda x: (x >= 20).mean()
}).reset_index()
pass_metrics.columns = ['season', 'defteam', 'pass_epa_raw', 'explosive_pass_rate']
pass_metrics['pass_epa'] = -pass_metrics['pass_epa_raw']
# Run defense metrics
run_plays = pbp_clean[pbp_clean['play_type'] == 'run']
run_metrics = run_plays.groupby(['season', 'defteam']).agg({
'epa': 'mean',
'yards_gained': lambda x: (x >= 10).mean()
}).reset_index()
run_metrics.columns = ['season', 'defteam', 'run_epa_raw', 'explosive_run_rate']
run_metrics['run_epa'] = -run_metrics['run_epa_raw']
# Merge all metrics
def_metrics = def_metrics.merge(
pass_metrics[['season', 'defteam', 'pass_epa', 'explosive_pass_rate']],
on=['season', 'defteam']
)
def_metrics = def_metrics.merge(
run_metrics[['season', 'defteam', 'run_epa', 'explosive_run_rate']],
on=['season', 'defteam']
)
# Calculate rates
def_metrics['sack_rate'] = def_metrics['sacks'] / def_metrics['pass_plays']
def_metrics['takeaways'] = def_metrics['interceptions'] + def_metrics['fumbles']
def_metrics['takeaway_rate'] = def_metrics['takeaways'] / def_metrics['plays']
# Display top defenses from 2023
print("\nTop 10 Defenses by EPA/Play (2023):")
top_def_2023 = def_metrics[def_metrics['season'] == 2023].nlargest(10, 'epa_per_play')
print(top_def_2023[['defteam', 'plays', 'epa_per_play', 'pass_epa',
'run_epa', 'sack_rate', 'takeaway_rate']].to_string(index=False))
Creating a Composite Defensive Rating
Now let's create a composite rating that combines multiple defensive dimensions:
#| label: composite-rating-r
#| message: false
#| warning: false
# Function to standardize metrics (z-scores)
standardize <- function(x) {
(x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}
# Create composite rating
def_composite <- def_metrics %>%
group_by(season) %>%
mutate(
# Standardize each component (higher is better)
epa_z = standardize(epa_per_play),
pass_epa_z = standardize(pass_epa),
run_epa_z = standardize(run_epa),
sack_z = standardize(sack_rate),
takeaway_z = standardize(takeaway_rate),
explosive_pass_z = standardize(-explosive_pass_rate), # Negative: fewer is better
explosive_run_z = standardize(-explosive_run_rate),
# Weighted composite score
# EPA components: 50%, Pressure: 20%, Turnovers: 15%, Explosive plays: 15%
composite_score = (
0.25 * epa_z + # Overall EPA
0.15 * pass_epa_z + # Pass EPA
0.10 * run_epa_z + # Run EPA
0.20 * sack_z + # Sack rate
0.15 * takeaway_z + # Takeaway rate
0.08 * explosive_pass_z + # Explosive pass defense
0.07 * explosive_run_z # Explosive run defense
)
) %>%
ungroup() %>%
arrange(season, desc(composite_score))
# Display 2023 rankings
def_composite %>%
filter(season == 2023) %>%
select(defteam, epa_per_play, pass_epa, run_epa,
sack_rate, takeaway_rate, composite_score) %>%
mutate(rank = row_number()) %>%
select(rank, everything()) %>%
head(15) %>%
gt() %>%
cols_label(
rank = "Rank",
defteam = "Team",
epa_per_play = "EPA/Play",
pass_epa = "Pass EPA",
run_epa = "Run EPA",
sack_rate = "Sack %",
takeaway_rate = "TO %",
composite_score = "Composite"
) %>%
fmt_number(
columns = c(epa_per_play, pass_epa, run_epa, composite_score),
decimals = 3
) %>%
fmt_percent(
columns = c(sack_rate, takeaway_rate),
decimals = 1
) %>%
data_color(
columns = composite_score,
colors = scales::col_numeric(
palette = c("#d73027", "#fee090", "#4575b4"),
domain = NULL
)
) %>%
tab_header(
title = "Composite Defensive Rankings",
subtitle = "2023 NFL Regular Season - Weighted Multi-Metric Rating"
) %>%
tab_source_note(
source_note = "Weights: EPA 50%, Pressure 20%, Turnovers 15%, Explosive Plays 15%"
)
#| label: composite-rating-py
#| message: false
#| warning: false
# Function to standardize metrics (z-scores)
def standardize(x):
return (x - x.mean()) / x.std()
# Create composite rating
def_composite = def_metrics.copy()
# Standardize each component by season
for season in def_composite['season'].unique():
mask = def_composite['season'] == season
def_composite.loc[mask, 'epa_z'] = standardize(def_composite.loc[mask, 'epa_per_play'])
def_composite.loc[mask, 'pass_epa_z'] = standardize(def_composite.loc[mask, 'pass_epa'])
def_composite.loc[mask, 'run_epa_z'] = standardize(def_composite.loc[mask, 'run_epa'])
def_composite.loc[mask, 'sack_z'] = standardize(def_composite.loc[mask, 'sack_rate'])
def_composite.loc[mask, 'takeaway_z'] = standardize(def_composite.loc[mask, 'takeaway_rate'])
def_composite.loc[mask, 'explosive_pass_z'] = standardize(-def_composite.loc[mask, 'explosive_pass_rate'])
def_composite.loc[mask, 'explosive_run_z'] = standardize(-def_composite.loc[mask, 'explosive_run_rate'])
# Weighted composite score
def_composite['composite_score'] = (
0.25 * def_composite['epa_z'] +
0.15 * def_composite['pass_epa_z'] +
0.10 * def_composite['run_epa_z'] +
0.20 * def_composite['sack_z'] +
0.15 * def_composite['takeaway_z'] +
0.08 * def_composite['explosive_pass_z'] +
0.07 * def_composite['explosive_run_z']
)
# Display 2023 rankings
print("\nComposite Defensive Rankings (2023):")
rankings_2023 = def_composite[def_composite['season'] == 2023].sort_values(
'composite_score', ascending=False
)
rankings_2023['rank'] = range(1, len(rankings_2023) + 1)
print(rankings_2023.head(15)[['rank', 'defteam', 'epa_per_play', 'pass_epa',
'run_epa', 'sack_rate', 'takeaway_rate',
'composite_score']].to_string(index=False))
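Because the weights are a judgment call, it is worth checking how sensitive the rankings are to them. A sketch with toy z-scores and two hypothetical weight vectors (neither is the chapter's official weighting):

```python
import pandas as pd

# Toy standardized components for five teams (invented numbers)
z = pd.DataFrame({
    'team': ['A', 'B', 'C', 'D', 'E'],
    'epa_z': [1.2, 0.4, -0.1, -0.6, -0.9],
    'sack_z': [0.2, 1.1, 0.5, -0.8, -1.0],
    'takeaway_z': [-0.5, 0.9, 1.3, -0.2, -1.5],
}).set_index('team')

weights = {
    'baseline':     {'epa_z': 0.60, 'sack_z': 0.25, 'takeaway_z': 0.15},
    'discount_tos': {'epa_z': 0.70, 'sack_z': 0.25, 'takeaway_z': 0.05},
}

# Recompute the composite under each weighting and compare orderings
for name, w in weights.items():
    score = sum(z[col] * wt for col, wt in w.items())
    print(name, score.sort_values(ascending=False).index.tolist())
```

If the top of the table reshuffles substantially between reasonable weightings, the composite is telling you less than its decimal places suggest.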
Customizing Composite Ratings
The weights in this composite rating (EPA 50%, pressure 20%, turnovers 15%, explosive plays 15%) can be adjusted to match your priorities or analytical findings. Some analysts weight turnovers less because of their high year-to-year variance, while others emphasize pass defense more heavily in pass-heavy eras.

Opponent-Adjusted Defensive Metrics
The Opponent Quality Problem
Raw defensive statistics can be misleading. A defense that faces weak offenses will naturally have better statistics than one facing elite offenses, even if the latter is actually the better unit.
We need to adjust for opponent quality to make fair comparisons.
#| label: opponent-adjustment-r
#| message: false
#| warning: false
# Step 1: Calculate offensive quality (EPA generated)
off_quality <- pbp_clean %>%
group_by(season, posteam) %>%
summarise(
off_epa = mean(epa),
.groups = "drop"
)
# Step 2: Join each defensive play to the opposing offense's quality
pbp_adj <- pbp_clean %>%
left_join(
off_quality %>% rename(opp_off_quality = off_epa),
by = c("season", "posteam")
)
# Step 3: Calculate opponent-adjusted defensive metrics
def_opponent_adj <- pbp_adj %>%
group_by(season, defteam) %>%
summarise(
plays = n(),
# Raw metrics
raw_epa = -mean(epa),
# Average opponent quality faced
avg_opp_quality = mean(opp_off_quality, na.rm = TRUE),
.groups = "drop"
)
# Step 4: Calculate league-average opponent quality
league_avg_opp <- mean(def_opponent_adj$avg_opp_quality, na.rm = TRUE)
# Step 5: Adjust defensive EPA for opponent strength
def_opponent_adj <- def_opponent_adj %>%
mutate(
# Adjustment factor: how much stronger/weaker were opponents than average
opp_adjustment = avg_opp_quality - league_avg_opp,
# Opponent-adjusted EPA (credit defenses that faced stronger offenses)
adj_epa = raw_epa + opp_adjustment,
# Calculate difference
adjustment_impact = adj_epa - raw_epa
) %>%
arrange(season, desc(adj_epa))
# Compare raw vs adjusted for 2023
def_opponent_adj %>%
filter(season == 2023) %>%
select(defteam, raw_epa, avg_opp_quality, adj_epa, adjustment_impact) %>%
mutate(
raw_rank = rank(-raw_epa),
adj_rank = rank(-adj_epa),
rank_change = raw_rank - adj_rank
) %>%
arrange(adj_rank) %>%
head(15) %>%
gt() %>%
cols_label(
defteam = "Team",
raw_epa = "Raw EPA",
avg_opp_quality = "Opp Quality",
adj_epa = "Adj EPA",
adjustment_impact = "Impact",
raw_rank = "Raw Rank",
adj_rank = "Adj Rank",
rank_change = "Rank Δ"
) %>%
fmt_number(
columns = c(raw_epa, avg_opp_quality, adj_epa, adjustment_impact),
decimals = 3
) %>%
fmt_number(
columns = c(raw_rank, adj_rank, rank_change),
decimals = 0
) %>%
data_color(
columns = rank_change,
colors = scales::col_numeric(
palette = c("#d73027", "white", "#4575b4"),
domain = c(-10, 10)
)
) %>%
tab_header(
title = "Opponent-Adjusted Defensive EPA",
subtitle = "2023 NFL Regular Season - Accounting for Schedule Strength"
)
#| label: opponent-adjustment-py
#| message: false
#| warning: false
# Step 1: Calculate offensive quality
off_quality = pbp_clean.groupby(['season', 'posteam'])['epa'].mean().reset_index()
off_quality.columns = ['season', 'posteam', 'off_epa']
# Step 2: Join each defensive play to the opposing offense's quality
pbp_adj = pbp_clean.merge(
off_quality.rename(columns={'off_epa': 'opp_off_quality'}),
on=['season', 'posteam'],
how='left'
)
# Step 3: Calculate opponent-adjusted metrics
def_opponent_adj = pbp_adj.groupby(['season', 'defteam']).agg({
'epa': ['count', lambda x: -x.mean()],
'opp_off_quality': 'mean'
}).reset_index()
def_opponent_adj.columns = ['season', 'defteam', 'plays', 'raw_epa', 'avg_opp_quality']
# Step 4: Calculate adjustments
league_avg_opp = def_opponent_adj['avg_opp_quality'].mean()
def_opponent_adj['opp_adjustment'] = def_opponent_adj['avg_opp_quality'] - league_avg_opp
# Credit defenses that faced stronger-than-average offenses
def_opponent_adj['adj_epa'] = def_opponent_adj['raw_epa'] + def_opponent_adj['opp_adjustment']
def_opponent_adj['adjustment_impact'] = def_opponent_adj['adj_epa'] - def_opponent_adj['raw_epa']
# Calculate ranks
def_2023 = def_opponent_adj[def_opponent_adj['season'] == 2023].copy()
def_2023['raw_rank'] = def_2023['raw_epa'].rank(ascending=False)
def_2023['adj_rank'] = def_2023['adj_epa'].rank(ascending=False)
def_2023['rank_change'] = def_2023['raw_rank'] - def_2023['adj_rank']
# Display results
print("\nOpponent-Adjusted Defensive EPA (2023):")
print(def_2023.sort_values('adj_rank').head(15)[
['defteam', 'raw_epa', 'avg_opp_quality', 'adj_epa',
'adjustment_impact', 'raw_rank', 'adj_rank', 'rank_change']
].to_string(index=False))
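The single-pass adjustment has a subtler limitation: each opponent's offensive EPA includes its games against the very defense being rated, and the offensive numbers are themselves unadjusted. Iterating the two estimates against each other (backfitting an additive model) addresses both. A sketch on toy per-game data (teams and numbers are invented):

```python
import pandas as pd

# Toy per-game data: average offensive EPA/play for posteam vs. defteam
games = pd.DataFrame({
    'posteam': ['A', 'A', 'B', 'B', 'C', 'C'],
    'defteam': ['B', 'C', 'A', 'C', 'A', 'B'],
    'off_epa': [0.10, 0.20, -0.05, 0.05, 0.00, 0.15],
})

teams = sorted(set(games['posteam']) | set(games['defteam']))
off = {t: 0.0 for t in teams}   # offensive quality above league average
deff = {t: 0.0 for t in teams}  # defensive allowance (negative = stingy)

# Backfit the additive model: off_epa ~ off[posteam] + deff[defteam]
for _ in range(100):
    for t in teams:
        rows = games[games['posteam'] == t]
        off[t] = (rows['off_epa'] - rows['defteam'].map(deff)).mean()
    for t in teams:
        rows = games[games['defteam'] == t]
        deff[t] = (rows['off_epa'] - rows['posteam'].map(off)).mean()
    # Re-center so defensive adjustments sum to zero (identifiability)
    center = sum(deff.values()) / len(deff)
    deff = {t: v - center for t, v in deff.items()}
    off = {t: v + center for t, v in off.items()}

print({t: round(v, 3) for t, v in deff.items()})
```

On real data the loop converges quickly and each defense ends up rated against opponents who are themselves schedule-adjusted, which a one-pass correction cannot achieve.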
Strength of Schedule Visualization
Let's visualize how opponent quality affects defensive rankings:
#| label: fig-opponent-adjustment-r
#| fig-cap: "Impact of opponent adjustment on defensive rankings"
#| fig-width: 10
#| fig-height: 8
#| message: false
#| warning: false
# Create comparison plot for 2023
plot_data <- def_opponent_adj %>%
filter(season == 2023) %>%
mutate(
raw_rank = rank(-raw_epa),
adj_rank = rank(-adj_epa),
rank_change = raw_rank - adj_rank,
improved = rank_change > 0
)
ggplot(plot_data, aes(x = raw_rank, y = adj_rank)) +
geom_abline(intercept = 0, slope = 1, linetype = "dashed", color = "gray50") +
geom_point(aes(color = rank_change), size = 3, alpha = 0.7) +
geom_nfl_logos(aes(team_abbr = defteam), width = 0.05, alpha = 0.8) +
scale_color_gradient2(
low = "#d73027", mid = "white", high = "#4575b4",
midpoint = 0,
name = "Rank Change"
) +
scale_x_continuous(breaks = seq(0, 32, 4)) +
scale_y_continuous(breaks = seq(0, 32, 4)) +
coord_fixed() +
labs(
title = "Opponent-Adjusted Defensive Rankings vs Raw Rankings",
subtitle = "2023 NFL Season - Points below the line improved after adjustment",
x = "Raw EPA Rank",
y = "Opponent-Adjusted EPA Rank",
caption = "Data: nflfastR | Teams below the line faced tougher schedules"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14),
panel.grid.minor = element_blank()
)
#| label: fig-opponent-adjustment-py
#| fig-cap: "Impact of opponent adjustment on defensive rankings - Python"
#| fig-width: 10
#| fig-height: 8
#| message: false
#| warning: false
# Create comparison plot for 2023
fig, ax = plt.subplots(figsize=(10, 8))
# Plot diagonal line
ax.plot([0, 32], [0, 32], 'k--', alpha=0.3, label='No Change')
# Scatter plot
scatter = ax.scatter(
def_2023['raw_rank'],
def_2023['adj_rank'],
c=def_2023['rank_change'],
cmap='RdBu',
s=100,
alpha=0.6,
edgecolors='black',
linewidth=0.5,
vmin=-10, vmax=10
)
# Add colorbar
cbar = plt.colorbar(scatter, ax=ax)
cbar.set_label('Rank Change', rotation=270, labelpad=20)
# Labels
ax.set_xlabel('Raw EPA Rank', fontsize=12)
ax.set_ylabel('Opponent-Adjusted EPA Rank', fontsize=12)
ax.set_title('Opponent-Adjusted Defensive Rankings vs Raw Rankings\n2023 NFL Season',
fontsize=14, fontweight='bold')
# Grid and limits
ax.grid(True, alpha=0.3)
ax.set_xlim(0, 32)
ax.set_ylim(0, 32)
ax.set_aspect('equal')
plt.tight_layout()
plt.show()
Context-Adjusted EPA
Accounting for Game Situation
Defensive performance varies significantly based on game context. Let's adjust for:
- Score Differential: Prevent defense vs. aggressive schemes
- Quarter/Time: Early-game vs. late-game situations
- Down and Distance: Expected vs. unexpected situations
#| label: context-adjustment-r
#| message: false
#| warning: false
# Create context categories
pbp_context <- pbp_clean %>%
mutate(
# Score context
score_context = case_when(
score_differential >= 14 ~ "Winning Big",
score_differential >= 7 ~ "Winning",
score_differential >= -6 ~ "Close",
score_differential >= -13 ~ "Losing",
TRUE ~ "Losing Big"
),
# Time context
time_context = case_when(
qtr <= 2 ~ "First Half",
qtr == 3 ~ "Third Quarter",
qtr == 4 & game_seconds_remaining > 300 ~ "Fourth Quarter",
TRUE ~ "Final 5 Minutes"
),
# Down context
down_context = case_when(
down == 1 ~ "First Down",
down == 2 ~ "Second Down",
down == 3 & ydstogo >= 7 ~ "Third & Long",
down == 3 ~ "Third & Short",
down == 4 ~ "Fourth Down",
TRUE ~ "Other"
),
# High leverage situations
high_leverage = (
(down >= 3) | # Third or fourth down
(qtr == 4 & abs(score_differential) <= 8) | # Close late game
(yardline_100 <= 20) # Red zone
)
)
# Calculate context-specific EPA
def_by_context <- pbp_context %>%
group_by(season, defteam, score_context) %>%
summarise(
plays = n(),
epa_per_play = -mean(epa),
.groups = "drop"
) %>%
pivot_wider(
names_from = score_context,
values_from = c(plays, epa_per_play),
names_sep = "_"
)
# High leverage performance
def_leverage <- pbp_context %>%
group_by(season, defteam, high_leverage) %>%
summarise(
plays = n(),
epa_per_play = -mean(epa),
success_rate = 1 - mean(epa > 0),
.groups = "drop"
) %>%
mutate(leverage_label = ifelse(high_leverage, "High Leverage", "Standard"))
# Display 2023 high leverage performance
def_leverage %>%
filter(season == 2023, high_leverage == TRUE) %>%
arrange(desc(epa_per_play)) %>%
select(defteam, plays, epa_per_play, success_rate) %>%
head(15) %>%
gt() %>%
cols_label(
defteam = "Team",
plays = "Plays",
epa_per_play = "EPA/Play",
success_rate = "Success %"
) %>%
fmt_number(
columns = epa_per_play,
decimals = 3
) %>%
fmt_percent(
columns = success_rate,
decimals = 1
) %>%
data_color(
columns = epa_per_play,
colors = scales::col_numeric(
palette = c("#d73027", "#fee090", "#4575b4"),
domain = NULL
)
) %>%
tab_header(
title = "High Leverage Defensive Performance",
subtitle = "2023 NFL Season - 3rd/4th downs, close late games, red zone"
)
#| label: context-adjustment-py
#| message: false
#| warning: false
# Create context categories
pbp_context = pbp_clean.copy()
# Score context
pbp_context['score_context'] = pd.cut(
pbp_context['score_differential'],
bins=[-100, -14, -7, 6, 13, 100],
labels=['Losing Big', 'Losing', 'Close', 'Winning', 'Winning Big']
)
# Time context
def get_time_context(row):
if row['qtr'] <= 2:
return 'First Half'
elif row['qtr'] == 3:
return 'Third Quarter'
elif row['qtr'] == 4 and row['game_seconds_remaining'] > 300:
return 'Fourth Quarter'
else:
return 'Final 5 Minutes'
pbp_context['time_context'] = pbp_context.apply(get_time_context, axis=1)
# Down context
def get_down_context(row):
if row['down'] == 1:
return 'First Down'
elif row['down'] == 2:
return 'Second Down'
elif row['down'] == 3 and row['ydstogo'] >= 7:
return 'Third & Long'
elif row['down'] == 3:
return 'Third & Short'
elif row['down'] == 4:
return 'Fourth Down'
else:
return 'Other'
pbp_context['down_context'] = pbp_context.apply(get_down_context, axis=1)
# High leverage situations
pbp_context['high_leverage'] = (
(pbp_context['down'] >= 3) |
((pbp_context['qtr'] == 4) & (pbp_context['score_differential'].abs() <= 8)) |
(pbp_context['yardline_100'] <= 20)
)
# High leverage performance
def_leverage = pbp_context.groupby(['season', 'defteam', 'high_leverage']).agg({
'epa': ['count', lambda x: -x.mean(), lambda x: (x <= 0).mean()]
}).reset_index()
def_leverage.columns = ['season', 'defteam', 'high_leverage',
'plays', 'epa_per_play', 'success_rate']
# Display 2023 high leverage performance
print("\nHigh Leverage Defensive Performance (2023):")
high_lev_2023 = def_leverage[
(def_leverage['season'] == 2023) &
(def_leverage['high_leverage'] == True)
].sort_values('epa_per_play', ascending=False)
print(high_lev_2023.head(15)[['defteam', 'plays', 'epa_per_play',
'success_rate']].to_string(index=False))
Clutch Defense Analysis
Let's identify which defenses perform best in clutch situations:
#| label: clutch-defense-r
#| message: false
#| warning: false
# Compare standard vs high leverage performance
clutch_comparison <- def_leverage %>%
filter(season == 2023) %>%
select(defteam, high_leverage, epa_per_play) %>%
pivot_wider(
names_from = high_leverage,
values_from = epa_per_play,
names_prefix = "leverage_"
) %>%
rename(
standard = leverage_FALSE,
clutch = leverage_TRUE
) %>%
mutate(
clutch_delta = clutch - standard,
clutch_performer = clutch_delta > 0
) %>%
arrange(desc(clutch_delta))
# Display top clutch performers
clutch_comparison %>%
select(defteam, standard, clutch, clutch_delta) %>%
head(15) %>%
gt() %>%
cols_label(
defteam = "Team",
standard = "Standard EPA",
clutch = "Clutch EPA",
clutch_delta = "Clutch Δ"
) %>%
fmt_number(
columns = c(standard, clutch, clutch_delta),
decimals = 3
) %>%
data_color(
columns = clutch_delta,
colors = scales::col_numeric(
palette = c("#d73027", "white", "#4575b4"),
domain = NULL
)
) %>%
tab_header(
title = "Clutch Defensive Performance",
subtitle = "2023 NFL Season - High Leverage EPA vs Standard Situations"
) %>%
tab_source_note(
source_note = "Positive delta indicates better performance in clutch situations"
)
#| label: clutch-defense-py
#| message: false
#| warning: false
# Compare standard vs high leverage performance
clutch_comparison = def_leverage[def_leverage['season'] == 2023].pivot_table(
index='defteam',
columns='high_leverage',
values='epa_per_play'
).reset_index()
clutch_comparison.columns = ['defteam', 'standard', 'clutch']
clutch_comparison['clutch_delta'] = clutch_comparison['clutch'] - clutch_comparison['standard']
clutch_comparison = clutch_comparison.sort_values('clutch_delta', ascending=False)
print("\nClutch Defensive Performance (2023):")
print(clutch_comparison.head(15)[['defteam', 'standard', 'clutch',
'clutch_delta']].to_string(index=False))
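Clutch deltas rest on relatively few plays, so some of the team-to-team spread is pure sampling noise. A quick sanity check is to compare the observed spread with what chance alone would produce; a sketch with simulated play-level EPA (the play counts and EPA spread are assumed, not measured):

```python
import numpy as np

rng = np.random.default_rng(0)

n_teams, n_clutch_plays = 32, 250   # plausible high-leverage sample size (assumed)
epa_sd = 1.4                        # rough play-level EPA spread (assumed)

# Null world: every defense has identical true clutch skill (delta = 0),
# so any spread in observed per-team means is pure sampling noise
null_deltas = rng.normal(0.0, epa_sd, (n_teams, n_clutch_plays)).mean(axis=1)

noise_sd = null_deltas.std()
theory_sd = epa_sd / np.sqrt(n_clutch_plays)   # standard error of a 250-play mean
print(round(noise_sd, 3), round(theory_sd, 3))
```

If the real team-to-team spread of clutch deltas is close to this standard error, "clutch" rankings are mostly noise; genuine clutch skill would show up as spread well beyond it.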
Position-Specific Defensive Evaluation
Framework for Position Evaluation
Different defensive positions require different evaluation metrics:
| Position | Key Metrics | Data Sources |
|---|---|---|
| Cornerback | Coverage EPA, targets, completion %, yards/target, TDs allowed | Play-by-play, tracking |
| Safety | Run support, coverage EPA, tackles, missed tackles | Play-by-play, tackles |
| Linebacker | Run stop rate, coverage, pressure, versatility | Play-by-play, tackles |
| Defensive Line | Pressure rate, run stop rate, sacks, hurries | Play-by-play, pass rush |
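One way to operationalize the table is a per-position weight map over standardized metrics. A minimal sketch (the weights and metric names are hypothetical, not drawn from any published grading system):

```python
# Hypothetical per-position weights over z-scored metrics
POSITION_WEIGHTS = {
    'CB':   {'coverage_epa_z': 0.60, 'completion_pct_z': 0.25, 'td_allowed_z': 0.15},
    'EDGE': {'pressure_rate_z': 0.55, 'run_stop_z': 0.30, 'sack_z': 0.15},
}

def position_grade(position: str, z_scores: dict) -> float:
    """Weighted sum of standardized metrics for one player (schematic)."""
    weights = POSITION_WEIGHTS[position]
    # Missing metrics default to league average (z = 0)
    return sum(wt * z_scores.get(metric, 0.0) for metric, wt in weights.items())

grade = position_grade('CB', {'coverage_epa_z': 1.0, 'completion_pct_z': 0.5,
                              'td_allowed_z': -0.2})
print(round(grade, 3))
```

Because every position's grade is a weighted sum of z-scores, grades are comparable across positions even though the underlying metrics differ.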
Cornerback Evaluation
#| label: cb-evaluation-r
#| message: false
#| warning: false
# Filter to targeted passes
cb_targets <- pbp_clean %>%
filter(
play_type == "pass",
!is.na(pass_location),
!is.na(air_yards)
) %>%
mutate(
# Categorize coverage situation
coverage_situation = case_when(
air_yards <= 0 ~ "Behind LOS",
air_yards <= 5 ~ "Short",
air_yards <= 15 ~ "Intermediate",
air_yards <= 25 ~ "Deep",
TRUE ~ "Very Deep"
),
# Target outcome
target_result = case_when(
complete_pass == 1 & touchdown == 1 ~ "TD",
complete_pass == 1 ~ "Completion",
interception == 1 ~ "Interception",
incomplete_pass == 1 ~ "Incomplete",
TRUE ~ "Other"
)
)
# Team-level CB performance metrics
cb_team_metrics <- cb_targets %>%
group_by(season, defteam) %>%
summarise(
targets = n(),
completions = sum(complete_pass == 1, na.rm = TRUE),
completion_pct = mean(complete_pass == 1, na.rm = TRUE),
yards = sum(yards_gained, na.rm = TRUE),
yards_per_target = mean(yards_gained, na.rm = TRUE),
tds_allowed = sum(touchdown == 1, na.rm = TRUE),
interceptions = sum(interception == 1, na.rm = TRUE),
epa_per_target = -mean(epa),
# Deep coverage
deep_targets = sum(air_yards > 15, na.rm = TRUE),
deep_completions = sum(complete_pass == 1 & air_yards > 15, na.rm = TRUE),
deep_completion_pct = mean(complete_pass[air_yards > 15] == 1, na.rm = TRUE),
.groups = "drop"
)
# Display 2023 best coverage units
cb_team_metrics %>%
filter(season == 2023) %>%
arrange(desc(epa_per_target)) %>%
select(defteam, targets, completion_pct, yards_per_target,
tds_allowed, interceptions, epa_per_target) %>%
head(12) %>%
gt() %>%
cols_label(
defteam = "Team",
targets = "Targets",
completion_pct = "Comp %",
yards_per_target = "Yds/Tgt",
tds_allowed = "TDs",
interceptions = "INTs",
epa_per_target = "EPA/Tgt"
) %>%
fmt_percent(
columns = completion_pct,
decimals = 1
) %>%
fmt_number(
columns = c(yards_per_target, epa_per_target),
decimals = 2
) %>%
fmt_number(
columns = c(targets, tds_allowed, interceptions),
decimals = 0
) %>%
data_color(
columns = epa_per_target,
colors = scales::col_numeric(
palette = c("#d73027", "#fee090", "#4575b4"),
domain = NULL
)
) %>%
tab_header(
title = "Best Coverage Units by EPA per Target",
subtitle = "2023 NFL Season - Team Coverage Metrics"
)
#| label: cb-evaluation-py
#| message: false
#| warning: false
# Filter to targeted passes
cb_targets = pbp_clean[
(pbp_clean['play_type'] == 'pass') &
(pbp_clean['pass_location'].notna()) &
(pbp_clean['air_yards'].notna())
].copy()
# Categorize coverage situations
def get_coverage_situation(air_yards):
if air_yards <= 0:
return 'Behind LOS'
elif air_yards <= 5:
return 'Short'
elif air_yards <= 15:
return 'Intermediate'
elif air_yards <= 25:
return 'Deep'
else:
return 'Very Deep'
cb_targets['coverage_situation'] = cb_targets['air_yards'].apply(get_coverage_situation)
# Team-level CB performance metrics
cb_metrics_list = []
for (season, defteam), group in cb_targets.groupby(['season', 'defteam']):
deep_targets = group[group['air_yards'] > 15]
metrics = {
'season': season,
'defteam': defteam,
'targets': len(group),
'completions': group['complete_pass'].sum(),
'completion_pct': group['complete_pass'].mean(),
'yards': group['yards_gained'].sum(),
'yards_per_target': group['yards_gained'].mean(),
'tds_allowed': group['touchdown'].sum(),
'interceptions': group['interception'].sum(),
'epa_per_target': -group['epa'].mean(),
'deep_targets': len(deep_targets),
'deep_completion_pct': deep_targets['complete_pass'].mean() if len(deep_targets) > 0 else 0
}
cb_metrics_list.append(metrics)
cb_team_metrics = pd.DataFrame(cb_metrics_list)
# Display 2023 best coverage units
print("\nBest Coverage Units by EPA per Target (2023):")
top_coverage_2023 = cb_team_metrics[cb_team_metrics['season'] == 2023].sort_values(
'epa_per_target', ascending=False
)
print(top_coverage_2023.head(12)[['defteam', 'targets', 'completion_pct',
'yards_per_target', 'tds_allowed',
'interceptions', 'epa_per_target']].to_string(index=False))
Pass Rush Evaluation
#| label: pass-rush-r
#| message: false
#| warning: false
# Pass rush metrics by team
pass_rush_metrics <- pbp_clean %>%
filter(play_type == "pass") %>%
group_by(season, defteam) %>%
summarise(
pass_plays = n(),
# Sacks and pressure
sacks = sum(sack == 1, na.rm = TRUE),
sack_rate = mean(sack == 1, na.rm = TRUE),
# QB hits and hurries (if available)
qb_hits = sum(qb_hit == 1, na.rm = TRUE),
# Quick-throw proxy: short air yards suggest pressure forcing the ball out early
quick_passes = sum(air_yards <= 5, na.rm = TRUE),
quick_pass_rate = mean(air_yards <= 5, na.rm = TRUE),
# EPA on non-sack passes
pass_epa = -mean(epa[sack == 0], na.rm = TRUE),
# Explosive pass plays allowed
explosive_allowed = sum(yards_gained >= 20, na.rm = TRUE),
explosive_rate = mean(yards_gained >= 20, na.rm = TRUE),
.groups = "drop"
) %>%
mutate(
# Pressure proxy: higher sack rate + forcing quick passes
pressure_score = sack_rate * 100 + quick_pass_rate * 50
)
# Display 2023 best pass rush units
pass_rush_metrics %>%
filter(season == 2023) %>%
arrange(desc(pressure_score)) %>%
select(defteam, pass_plays, sacks, sack_rate, quick_pass_rate,
pass_epa, pressure_score) %>%
head(12) %>%
gt() %>%
cols_label(
defteam = "Team",
pass_plays = "Plays",
sacks = "Sacks",
sack_rate = "Sack %",
quick_pass_rate = "Quick %",
pass_epa = "Pass EPA",
pressure_score = "Pressure"
) %>%
fmt_number(
columns = c(sacks),
decimals = 0
) %>%
fmt_percent(
columns = c(sack_rate, quick_pass_rate),
decimals = 1
) %>%
fmt_number(
columns = c(pass_epa, pressure_score),
decimals = 2
) %>%
data_color(
columns = pressure_score,
colors = scales::col_numeric(
palette = c("#fee090", "#4575b4"),
domain = NULL
)
) %>%
tab_header(
title = "Best Pass Rush Units",
subtitle = "2023 NFL Season - Pressure Score = Sack Rate × 100 + Quick Pass Rate × 50"
)
#| label: pass-rush-py
#| message: false
#| warning: false
# Pass rush metrics by team
pass_plays = pbp_clean[pbp_clean['play_type'] == 'pass'].copy()
pass_rush_list = []
for (season, defteam), group in pass_plays.groupby(['season', 'defteam']):
non_sack = group[group['sack'] == 0]
metrics = {
'season': season,
'defteam': defteam,
'pass_plays': len(group),
'sacks': group['sack'].sum(),
'sack_rate': group['sack'].mean(),
'qb_hits': group['qb_hit'].sum() if 'qb_hit' in group.columns else 0,
'quick_passes': (group['air_yards'] <= 5).sum(),
'quick_pass_rate': (group['air_yards'] <= 5).mean(),
'pass_epa': -non_sack['epa'].mean() if len(non_sack) > 0 else 0,
'explosive_allowed': (group['yards_gained'] >= 20).sum(),
'explosive_rate': (group['yards_gained'] >= 20).mean()
}
pass_rush_list.append(metrics)
pass_rush_metrics = pd.DataFrame(pass_rush_list)
pass_rush_metrics['pressure_score'] = (
pass_rush_metrics['sack_rate'] * 100 +
pass_rush_metrics['quick_pass_rate'] * 50
)
# Display 2023 best pass rush units
print("\nBest Pass Rush Units (2023):")
top_rush_2023 = pass_rush_metrics[pass_rush_metrics['season'] == 2023].sort_values(
'pressure_score', ascending=False
)
print(top_rush_2023.head(12)[['defteam', 'pass_plays', 'sacks', 'sack_rate',
'quick_pass_rate', 'pass_epa',
'pressure_score']].to_string(index=False))
Run Defense Evaluation
#| label: run-defense-r
#| message: false
#| warning: false
# Run defense metrics
run_defense_metrics <- pbp_clean %>%
filter(play_type == "run") %>%
group_by(season, defteam) %>%
summarise(
run_plays = n(),
# Basic metrics
yards_per_carry = mean(yards_gained, na.rm = TRUE),
run_epa = -mean(epa, na.rm = TRUE),
success_rate = 1 - mean(epa > 0, na.rm = TRUE),
# Explosive run prevention
explosive_runs = sum(yards_gained >= 10, na.rm = TRUE),
explosive_rate = mean(yards_gained >= 10, na.rm = TRUE),
# Stuff rate (runs for 0 or negative yards)
stuffs = sum(yards_gained <= 0, na.rm = TRUE),
stuff_rate = mean(yards_gained <= 0, na.rm = TRUE),
# Power situations (short yardage)
power_situations = sum(down %in% c(3, 4) & ydstogo <= 2),
power_stops = sum(down %in% c(3, 4) & ydstogo <= 2 & yards_gained < ydstogo),
power_stop_rate = ifelse(
power_situations > 0,
power_stops / power_situations,
NA_real_
),
.groups = "drop"
)
# Display 2023 best run defenses
run_defense_metrics %>%
filter(season == 2023) %>%
arrange(desc(run_epa)) %>%
select(defteam, run_plays, yards_per_carry, run_epa, success_rate,
stuff_rate, explosive_rate) %>%
head(12) %>%
gt() %>%
cols_label(
defteam = "Team",
run_plays = "Plays",
yards_per_carry = "YPC",
run_epa = "EPA/Play",
success_rate = "Success %",
stuff_rate = "Stuff %",
explosive_rate = "Explosive %"
) %>%
fmt_number(
columns = c(run_plays),
decimals = 0
) %>%
fmt_number(
columns = c(yards_per_carry, run_epa),
decimals = 2
) %>%
fmt_percent(
columns = c(success_rate, stuff_rate, explosive_rate),
decimals = 1
) %>%
data_color(
columns = run_epa,
colors = scales::col_numeric(
palette = c("#d73027", "#fee090", "#4575b4"),
domain = NULL
)
) %>%
tab_header(
title = "Best Run Defense Units",
subtitle = "2023 NFL Season - Run Defense Metrics"
)
#| label: run-defense-py
#| message: false
#| warning: false
# Run defense metrics
run_plays = pbp_clean[pbp_clean['play_type'] == 'run'].copy()
run_defense_list = []
for (season, defteam), group in run_plays.groupby(['season', 'defteam']):
power_sits = group[(group['down'].isin([3, 4])) & (group['ydstogo'] <= 2)]
power_stops = power_sits[power_sits['yards_gained'] < power_sits['ydstogo']]
metrics = {
'season': season,
'defteam': defteam,
'run_plays': len(group),
'yards_per_carry': group['yards_gained'].mean(),
'run_epa': -group['epa'].mean(),
'success_rate': 1 - (group['epa'] > 0).mean(),
'explosive_runs': (group['yards_gained'] >= 10).sum(),
'explosive_rate': (group['yards_gained'] >= 10).mean(),
'stuffs': (group['yards_gained'] <= 0).sum(),
'stuff_rate': (group['yards_gained'] <= 0).mean(),
'power_situations': len(power_sits),
'power_stop_rate': len(power_stops) / len(power_sits) if len(power_sits) > 0 else np.nan
}
run_defense_list.append(metrics)
run_defense_metrics = pd.DataFrame(run_defense_list)
# Display 2023 best run defenses
print("\nBest Run Defense Units (2023):")
top_run_def_2023 = run_defense_metrics[run_defense_metrics['season'] == 2023].sort_values(
'run_epa', ascending=False
)
print(top_run_def_2023.head(12)[['defteam', 'run_plays', 'yards_per_carry',
'run_epa', 'success_rate', 'stuff_rate',
'explosive_rate']].to_string(index=False))
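Power stop rate is computed from very few snaps per team, so raw rates are noisy. One common remedy, sketched below with hypothetical counts and an assumed league-average stop rate, is empirical-Bayes-style shrinkage using a pseudo-count prior:

```python
import pandas as pd

# Hypothetical counts: a 3-for-4 team should not automatically outrank
# an 18-for-25 team just because its raw rate is higher
teams = pd.DataFrame({
    'defteam': ['A', 'B'],
    'power_stops': [3, 18],
    'power_situations': [4, 25],
})

league_rate = 0.60   # assumed league-average power stop rate
prior_n = 20         # prior weight, in pseudo "situations"

teams['raw_rate'] = teams['power_stops'] / teams['power_situations']
# Shrink each rate toward the league average; small samples move more
teams['shrunk_rate'] = (
    (teams['power_stops'] + league_rate * prior_n)
    / (teams['power_situations'] + prior_n)
)
# A: raw 0.750 shrinks to 0.625; B: raw 0.720 shrinks to ~0.667,
# so the larger sample now ranks first
```

The prior weight of 20 pseudo-situations is an assumption; in practice you would tune it (or fit a beta-binomial) against held-out data.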
Defensive Unit Synergy Metrics
Measuring Coordination
Great defenses aren't just collections of good players—they feature coordinated units working together. Let's measure unit synergy:
#| label: unit-synergy-r
#| message: false
#| warning: false
# Calculate coverage and pass rush interaction
pass_synergy <- pbp_clean %>%
filter(play_type == "pass", !is.na(air_yards)) %>%
mutate(
# Time proxy: deep passes suggest more time
time_proxy = case_when(
sack == 1 ~ "Sack",
air_yards > 15 ~ "Long Time",
air_yards > 8 ~ "Medium Time",
TRUE ~ "Quick Release"
)
) %>%
group_by(season, defteam, time_proxy) %>%
summarise(
plays = n(),
epa = -mean(epa),
completion_pct = mean(complete_pass == 1, na.rm = TRUE),
.groups = "drop"
) %>%
pivot_wider(
names_from = time_proxy,
values_from = c(plays, epa, completion_pct)
)
# Synergy score: how well coverage holds up with time
synergy_score <- pass_synergy %>%
filter(season == 2023) %>%
mutate(
# Better coverage on longer plays = better synergy
synergy_score =
`epa_Long Time` * 0.4 +
`epa_Medium Time` * 0.3 +
(1 - `completion_pct_Long Time`) * 0.3
) %>%
arrange(desc(synergy_score))
# Display top synergy teams
synergy_score %>%
select(defteam, `epa_Quick Release`, `epa_Medium Time`, `epa_Long Time`,
`completion_pct_Long Time`, synergy_score) %>%
head(12) %>%
gt() %>%
cols_label(
defteam = "Team",
`epa_Quick Release` = "Quick EPA",
`epa_Medium Time` = "Med EPA",
`epa_Long Time` = "Long EPA",
`completion_pct_Long Time` = "Long Comp %",
synergy_score = "Synergy"
) %>%
fmt_number(
columns = c(`epa_Quick Release`, `epa_Medium Time`, `epa_Long Time`, synergy_score),
decimals = 3
) %>%
fmt_percent(
columns = `completion_pct_Long Time`,
decimals = 1
) %>%
data_color(
columns = synergy_score,
colors = scales::col_numeric(
palette = c("#fee090", "#4575b4"),
domain = NULL
)
) %>%
tab_header(
title = "Pass Defense Unit Synergy",
subtitle = "2023 NFL Season - Coverage holding up with time = better synergy"
)
#| label: unit-synergy-py
#| message: false
#| warning: false
# Calculate coverage and pass rush interaction
pass_plays_syn = pbp_clean[
(pbp_clean['play_type'] == 'pass') &
(pbp_clean['air_yards'].notna())
].copy()
# Time proxy
def get_time_proxy(row):
if row['sack'] == 1:
return 'Sack'
elif row['air_yards'] > 15:
return 'Long Time'
elif row['air_yards'] > 8:
return 'Medium Time'
else:
return 'Quick Release'
pass_plays_syn['time_proxy'] = pass_plays_syn.apply(get_time_proxy, axis=1)
# Calculate metrics by time proxy
synergy_list = []
for (season, defteam), team_data in pass_plays_syn.groupby(['season', 'defteam']):
metrics = {'season': season, 'defteam': defteam}
for time_cat in ['Quick Release', 'Medium Time', 'Long Time', 'Sack']:
cat_data = team_data[team_data['time_proxy'] == time_cat]
if len(cat_data) > 0:
metrics[f'epa_{time_cat}'] = -cat_data['epa'].mean()
metrics[f'comp_pct_{time_cat}'] = cat_data['complete_pass'].mean()
metrics[f'plays_{time_cat}'] = len(cat_data)
synergy_list.append(metrics)
synergy_df = pd.DataFrame(synergy_list)
# Calculate synergy score for 2023
synergy_2023 = synergy_df[synergy_df['season'] == 2023].copy()
synergy_2023['synergy_score'] = (
synergy_2023.get('epa_Long Time', 0) * 0.4 +
synergy_2023.get('epa_Medium Time', 0) * 0.3 +
(1 - synergy_2023.get('comp_pct_Long Time', 0.5)) * 0.3
)
# Display top synergy teams
print("\nPass Defense Unit Synergy (2023):")
top_synergy = synergy_2023.nlargest(12, 'synergy_score')
print(top_synergy[['defteam', 'epa_Quick Release', 'epa_Medium Time',
'epa_Long Time', 'comp_pct_Long Time',
'synergy_score']].to_string(index=False))
Defensive Consistency vs Variance
Measuring Performance Stability
Consistent defenses are more valuable than volatile ones. Let's measure week-to-week consistency:
#| label: consistency-r
#| message: false
#| warning: false
# Calculate weekly defensive performance
weekly_def <- pbp_clean %>%
group_by(season, week, defteam) %>%
summarise(
plays = n(),
epa_per_play = -mean(epa),
.groups = "drop"
) %>%
filter(plays >= 30) # Minimum plays threshold
# Calculate consistency metrics
def_consistency <- weekly_def %>%
group_by(season, defteam) %>%
summarise(
games = n(),
mean_epa = mean(epa_per_play),
sd_epa = sd(epa_per_play),
cv = sd_epa / abs(mean_epa), # Coefficient of variation
best_game = max(epa_per_play),
worst_game = min(epa_per_play),
range = best_game - worst_game,
.groups = "drop"
) %>%
filter(games >= 10) # Full season minimum
# Display 2023 most consistent defenses
def_consistency %>%
filter(season == 2023) %>%
arrange(sd_epa) %>%
select(defteam, mean_epa, sd_epa, cv, best_game, worst_game, range) %>%
head(15) %>%
gt() %>%
cols_label(
defteam = "Team",
mean_epa = "Mean EPA",
sd_epa = "Std Dev",
cv = "CV",
best_game = "Best",
worst_game = "Worst",
range = "Range"
) %>%
fmt_number(
columns = c(mean_epa, sd_epa, cv, best_game, worst_game, range),
decimals = 3
) %>%
data_color(
columns = sd_epa,
colors = scales::col_numeric(
palette = c("#4575b4", "#fee090", "#d73027"),
domain = NULL
)
) %>%
tab_header(
title = "Most Consistent Defensive Units",
subtitle = "2023 NFL Season - Lower standard deviation = more consistent"
) %>%
tab_source_note(
source_note = "CV = Coefficient of Variation (SD / Mean)"
)
#| label: consistency-py
#| message: false
#| warning: false
# Calculate weekly defensive performance
weekly_def = pbp_clean.groupby(['season', 'week', 'defteam']).agg({
'epa': ['count', lambda x: -x.mean()]
}).reset_index()
weekly_def.columns = ['season', 'week', 'defteam', 'plays', 'epa_per_play']
weekly_def = weekly_def[weekly_def['plays'] >= 30]
# Calculate consistency metrics
consistency_list = []
for (season, defteam), group in weekly_def.groupby(['season', 'defteam']):
if len(group) >= 10: # Full season minimum
metrics = {
'season': season,
'defteam': defteam,
'games': len(group),
'mean_epa': group['epa_per_play'].mean(),
'sd_epa': group['epa_per_play'].std(),
'best_game': group['epa_per_play'].max(),
'worst_game': group['epa_per_play'].min()
}
metrics['cv'] = metrics['sd_epa'] / abs(metrics['mean_epa'])
metrics['range'] = metrics['best_game'] - metrics['worst_game']
consistency_list.append(metrics)
def_consistency = pd.DataFrame(consistency_list)
# Display 2023 most consistent defenses
print("\nMost Consistent Defensive Units (2023):")
consistent_2023 = def_consistency[def_consistency['season'] == 2023].sort_values('sd_epa')
print(consistent_2023.head(15)[['defteam', 'mean_epa', 'sd_epa', 'cv',
'best_game', 'worst_game', 'range']].to_string(index=False))
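One caution on the CV column: the coefficient of variation divides by |mean EPA|, so it explodes for defenses whose mean is near zero even when their week-to-week volatility is ordinary. That is why the table ranks on the standard deviation itself. A two-line sketch of the failure mode, with hypothetical numbers:

```python
# Two defenses with identical volatility (sd = 0.05 EPA/play):
# the near-average one gets a huge CV purely because its mean is ~0
sd = 0.05
mean_good, mean_average = 0.10, 0.005

cv_good = sd / abs(mean_good)        # 0.5  -- looks stable
cv_average = sd / abs(mean_average)  # 10.0 -- looks wildly volatile
```

Treat CV as a secondary diagnostic, useful mainly for comparing units whose mean performance is similar in magnitude.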
Visualizing Consistency
#| label: fig-consistency-r
#| fig-cap: "Defensive performance consistency"
#| fig-width: 10
#| fig-height: 8
#| message: false
#| warning: false
# Create consistency plot for 2023
consistency_2023 <- def_consistency %>%
filter(season == 2023)
ggplot(consistency_2023, aes(x = mean_epa, y = sd_epa)) +
geom_point(aes(size = range), alpha = 0.4, color = "gray60") +
geom_nfl_logos(aes(team_abbr = defteam), width = 0.06, alpha = 0.8) +
geom_vline(xintercept = 0, linetype = "dashed", alpha = 0.3) +
geom_hline(yintercept = median(consistency_2023$sd_epa),
linetype = "dashed", alpha = 0.3, color = "blue") +
scale_size_continuous(range = c(3, 15), name = "Performance Range") +
labs(
title = "Defensive Performance: Mean vs Consistency",
subtitle = "2023 NFL Season - Top right = Good but inconsistent, Bottom right = Good and consistent",
x = "Mean Defensive EPA per Play (Higher = Better)",
y = "Standard Deviation (Lower = More Consistent)",
caption = "Data: nflfastR | Size represents range (best - worst game)"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14),
panel.grid.minor = element_blank()
)
📊 Visualization Output
The code above generates a visualization. To see the output, run this code in your R or Python environment. The resulting plot will help illustrate the concepts discussed in this section.
#| label: fig-consistency-py
#| fig-cap: "Defensive performance consistency - Python"
#| fig-width: 10
#| fig-height: 8
#| message: false
#| warning: false
# Create consistency plot for 2023
consistency_2023 = def_consistency[def_consistency['season'] == 2023]
fig, ax = plt.subplots(figsize=(10, 8))
# Scatter plot with size based on range
scatter = ax.scatter(
consistency_2023['mean_epa'],
consistency_2023['sd_epa'],
s=consistency_2023['range'] * 500,
alpha=0.6,
c=consistency_2023['mean_epa'],
cmap='RdYlBu',
edgecolors='black',
linewidth=0.5
)
# Add reference lines
ax.axvline(x=0, color='gray', linestyle='--', alpha=0.3)
ax.axhline(y=consistency_2023['sd_epa'].median(), color='blue',
linestyle='--', alpha=0.3)
# Labels
ax.set_xlabel('Mean Defensive EPA per Play (Higher = Better)', fontsize=12)
ax.set_ylabel('Standard Deviation (Lower = More Consistent)', fontsize=12)
ax.set_title('Defensive Performance: Mean vs Consistency\n2023 NFL Season',
fontsize=14, fontweight='bold')
# Colorbar
cbar = plt.colorbar(scatter, ax=ax)
cbar.set_label('Mean EPA', rotation=270, labelpad=20)
# Add grid
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Predictive Defensive Models
Building a Predictive Framework
Let's build models to predict future defensive performance based on current metrics:
#| label: predictive-model-r
#| message: false
#| warning: false
# Create year-to-year prediction dataset
predict_data <- def_composite %>%
arrange(defteam, season) %>%
group_by(defteam) %>%
mutate(
next_season_epa = lead(epa_per_play),
next_season_composite = lead(composite_score)
) %>%
ungroup() %>%
filter(!is.na(next_season_epa))
# Split into train and test
train_data <- predict_data %>% filter(season < 2023)
test_data <- predict_data %>% filter(season == 2023)
# Build multiple models
library(broom)
# Model 1: Simple persistence (this year predicts next year)
model1 <- lm(next_season_epa ~ epa_per_play, data = train_data)
# Model 2: Multiple components
model2 <- lm(next_season_epa ~ epa_per_play + sack_rate + takeaway_rate +
explosive_pass_rate + explosive_run_rate, data = train_data)
# Model 3: Composite score
model3 <- lm(next_season_composite ~ composite_score, data = train_data)
# Display model comparisons
model_comparison <- bind_rows(
glance(model1) %>% mutate(model = "EPA Only"),
glance(model2) %>% mutate(model = "Multi-Component"),
glance(model3) %>% mutate(model = "Composite Score")
) %>%
select(model, r.squared, adj.r.squared, sigma, AIC)
model_comparison %>%
gt() %>%
cols_label(
model = "Model",
r.squared = "R²",
adj.r.squared = "Adj R²",
sigma = "RMSE",
AIC = "AIC"
) %>%
fmt_number(
columns = c(r.squared, adj.r.squared, sigma),
decimals = 3
) %>%
fmt_number(
columns = AIC,
decimals = 1
) %>%
tab_header(
title = "Defensive Performance Prediction Models",
subtitle = "Year-to-Year Predictive Power"
)
# Make predictions for 2024
predictions_2024 <- def_composite %>%
filter(season == 2023) %>%
mutate(
pred_epa_model1 = predict(model1, newdata = .),
pred_epa_model2 = predict(model2, newdata = .),
pred_composite = predict(model3, newdata = .)
) %>%
arrange(desc(pred_composite))
# Display top predicted defenses for 2024
predictions_2024 %>%
select(defteam, epa_per_play, composite_score,
pred_epa_model1, pred_epa_model2, pred_composite) %>%
head(15) %>%
gt() %>%
cols_label(
defteam = "Team",
epa_per_play = "2023 EPA",
composite_score = "2023 Comp",
pred_epa_model1 = "Pred EPA (M1)",
pred_epa_model2 = "Pred EPA (M2)",
pred_composite = "Pred Comp"
) %>%
fmt_number(
columns = c(epa_per_play, composite_score, pred_epa_model1,
pred_epa_model2, pred_composite),
decimals = 3
) %>%
tab_header(
title = "Predicted Best Defenses for 2024",
subtitle = "Based on 2023 performance and historical patterns"
)
#| label: predictive-model-py
#| message: false
#| warning: false
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error
# Create year-to-year prediction dataset
predict_data = def_composite.sort_values(['defteam', 'season']).copy()
predict_data['next_season_epa'] = predict_data.groupby('defteam')['epa_per_play'].shift(-1)
predict_data['next_season_composite'] = predict_data.groupby('defteam')['composite_score'].shift(-1)
predict_data = predict_data.dropna(subset=['next_season_epa'])
# Split into train and test
train_data = predict_data[predict_data['season'] < 2023]
test_data = predict_data[predict_data['season'] == 2023]
# Model 1: Simple persistence
X1_train = train_data[['epa_per_play']]
y1_train = train_data['next_season_epa']
model1 = LinearRegression().fit(X1_train, y1_train)
# Model 2: Multiple components
X2_train = train_data[['epa_per_play', 'sack_rate', 'takeaway_rate',
'explosive_pass_rate', 'explosive_run_rate']]
y2_train = train_data['next_season_epa']
model2 = LinearRegression().fit(X2_train, y2_train)
# Model 3: Composite score
X3_train = train_data[['composite_score']]
y3_train = train_data['next_season_composite']
model3 = LinearRegression().fit(X3_train, y3_train)
# Calculate R² and RMSE
models_comparison = []
for i, (model, X, y) in enumerate([
(model1, X1_train, y1_train),
(model2, X2_train, y2_train),
(model3, X3_train, y3_train)
], 1):
pred = model.predict(X)
models_comparison.append({
'model': ['EPA Only', 'Multi-Component', 'Composite Score'][i-1],
'r_squared': r2_score(y, pred),
'rmse': np.sqrt(mean_squared_error(y, pred))
})
print("\nDefensive Performance Prediction Models:")
print(pd.DataFrame(models_comparison).to_string(index=False))
# Make predictions for 2024
pred_2024 = def_composite[def_composite['season'] == 2023].copy()
pred_2024['pred_epa_model1'] = model1.predict(pred_2024[['epa_per_play']])
pred_2024['pred_epa_model2'] = model2.predict(
pred_2024[['epa_per_play', 'sack_rate', 'takeaway_rate',
'explosive_pass_rate', 'explosive_run_rate']]
)
pred_2024['pred_composite'] = model3.predict(pred_2024[['composite_score']])
# Display top predicted defenses
print("\nPredicted Best Defenses for 2024:")
pred_2024_sorted = pred_2024.sort_values('pred_composite', ascending=False)
print(pred_2024_sorted.head(15)[['defteam', 'epa_per_play', 'composite_score',
'pred_epa_model1', 'pred_epa_model2',
'pred_composite']].to_string(index=False))
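Model 1's fitted slope on `epa_per_play` typically comes out well below 1, and that slope is the whole story of regression to the mean: next season's expectation is the league mean plus only a fraction of this season's edge. A hand-computed sketch with hypothetical numbers (league mean 0, assumed year-to-year correlation r = 0.45, i.e. R² around 0.20):

```python
league_mean = 0.0   # league-average defensive EPA per play (assumed)
r = 0.45            # assumed year-to-year correlation (R^2 ~ 0.20)

this_season = 0.10  # a hypothetical elite defense, +0.10 EPA/play

# Regression-to-the-mean forecast: keep only a fraction r of the edge
predicted_next = league_mean + r * (this_season - league_mean)
# predicted_next ~ 0.045: more than half the edge is expected to vanish
```

This is also why the predicted 2024 rankings above compress toward the middle: the model is applying exactly this kind of shrinkage to every team.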
Understanding Predictive Accuracy
Defensive performance shows moderate year-to-year correlation (R² typically 0.15-0.30), which is lower than offensive metrics. This reflects:
1. **Personnel turnover**: Defensive rosters change significantly
2. **Scheme changes**: New coordinators and strategies
3. **Injury impact**: Key injuries affect defense more than offense
4. **Turnover regression**: High turnover rates rarely sustain
5. **Opponent evolution**: Offenses adapt to defensive strengths
Feature Importance Analysis
Let's identify which defensive components best predict future performance:
#| label: feature-importance-r
#| message: false
#| warning: false
# Calculate correlations with next year performance
feature_correlations <- train_data %>%
select(next_season_epa, epa_per_play, pass_epa, run_epa,
sack_rate, takeaway_rate, explosive_pass_rate, explosive_run_rate) %>%
cor(use = "complete.obs")
# Extract correlations with next season
importance <- feature_correlations[-1, 1] %>%
as.data.frame() %>%
rownames_to_column("feature") %>%
rename(correlation = ".") %>%
arrange(desc(abs(correlation)))
# Visualize feature importance
importance %>%
mutate(
feature = fct_reorder(feature, correlation),
feature_label = case_when(
feature == "epa_per_play" ~ "Overall EPA",
feature == "pass_epa" ~ "Pass EPA",
feature == "run_epa" ~ "Run EPA",
feature == "sack_rate" ~ "Sack Rate",
feature == "takeaway_rate" ~ "Takeaway Rate",
feature == "explosive_pass_rate" ~ "Explosive Pass Rate",
feature == "explosive_run_rate" ~ "Explosive Run Rate",
TRUE ~ feature
)
) %>%
ggplot(aes(x = correlation, y = feature_label, fill = correlation > 0)) +
geom_col() +
geom_vline(xintercept = 0, linetype = "solid", color = "black") +
scale_fill_manual(values = c("TRUE" = "#4575b4", "FALSE" = "#d73027"), guide = "none") +
labs(
title = "Feature Importance for Predicting Next Season Defensive EPA",
subtitle = "Correlation with year-ahead defensive performance",
x = "Correlation with Next Season EPA",
y = NULL,
caption = "Data: 2020-2022 training data"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 13),
panel.grid.major.y = element_blank()
)
#| label: feature-importance-py
#| message: false
#| warning: false
# Calculate correlations with next year performance
features = ['epa_per_play', 'pass_epa', 'run_epa', 'sack_rate',
'takeaway_rate', 'explosive_pass_rate', 'explosive_run_rate']
correlations = train_data[features + ['next_season_epa']].corr()['next_season_epa'][:-1]
importance = pd.DataFrame({
'feature': correlations.index,
'correlation': correlations.values
}).sort_values('correlation', key=abs, ascending=False)
# Visualize feature importance
fig, ax = plt.subplots(figsize=(10, 6))
feature_labels = {
'epa_per_play': 'Overall EPA',
'pass_epa': 'Pass EPA',
'run_epa': 'Run EPA',
'sack_rate': 'Sack Rate',
'takeaway_rate': 'Takeaway Rate',
'explosive_pass_rate': 'Explosive Pass Rate',
'explosive_run_rate': 'Explosive Run Rate'
}
importance['feature_label'] = importance['feature'].map(feature_labels)
colors = ['#4575b4' if x > 0 else '#d73027' for x in importance['correlation']]
ax.barh(importance['feature_label'], importance['correlation'], color=colors)
ax.axvline(x=0, color='black', linewidth=0.8)
ax.set_xlabel('Correlation with Next Season EPA', fontsize=12)
ax.set_title('Feature Importance for Predicting Next Season Defensive EPA\nCorrelation with year-ahead defensive performance',
fontsize=13, fontweight='bold')
ax.grid(True, axis='x', alpha=0.3)
plt.tight_layout()
plt.show()
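A caveat on reading this chart: the components are themselves correlated (sack rate feeds into overall EPA, for example), so the univariate correlations overlap rather than stack. A quick sketch of the check on synthetic stand-in data, where the collinearity is built in by construction:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 128

# Synthetic stand-ins: overall EPA is partly driven by sack rate,
# so the two "features" are collinear by construction
sack_rate = rng.normal(0.07, 0.015, n)
epa_per_play = 0.8 * (sack_rate - 0.07) + rng.normal(0, 0.02, n)
next_season_epa = 0.4 * epa_per_play + rng.normal(0, 0.03, n)

features = pd.DataFrame({
    'sack_rate': sack_rate,
    'epa_per_play': epa_per_play,
    'next_season_epa': next_season_epa,
})

# Inspect feature-feature correlations before trusting a univariate
# importance ranking; large off-diagonal values mean shared signal
corr = features.corr()
```

When the off-diagonal correlations are high, a multivariate model (like Model 2 above) or regularized regression gives a cleaner picture of each component's independent contribution.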
Historical Defensive Comparisons
Era Adjustments
Comparing defenses across eras requires adjusting for rule changes and offensive evolution:
#| label: era-adjustment-r
#| message: false
#| warning: false
# Calculate league average by season (as proxy for era)
league_avg_by_season <- pbp_clean %>%
group_by(season) %>%
summarise(
league_avg_epa = mean(epa),
league_avg_yards = mean(yards_gained, na.rm = TRUE),
.groups = "drop"
)
# Adjust defensive metrics for era
def_era_adjusted <- def_metrics %>%
  left_join(league_avg_by_season, by = "season") %>%
  group_by(season) %>%
  mutate(
    # Era-adjusted EPA: defensive EPA relative to the league-average
    # defensive EPA (-league_avg_epa) for that season
    era_adj_epa = epa_per_play - (-league_avg_epa),
    # Z-score within season
    era_z_score = (epa_per_play - mean(epa_per_play)) / sd(epa_per_play)
  ) %>%
  ungroup()
# Find best defenses across all seasons
best_defenses_all_time <- def_era_adjusted %>%
arrange(desc(era_z_score)) %>%
select(season, defteam, plays, epa_per_play, era_adj_epa, era_z_score) %>%
head(20)
best_defenses_all_time %>%
gt() %>%
cols_label(
season = "Season",
defteam = "Team",
plays = "Plays",
epa_per_play = "Raw EPA",
era_adj_epa = "Era Adj EPA",
era_z_score = "Z-Score"
) %>%
fmt_number(
columns = c(plays),
decimals = 0
) %>%
fmt_number(
columns = c(epa_per_play, era_adj_epa, era_z_score),
decimals = 3
) %>%
data_color(
columns = era_z_score,
colors = scales::col_numeric(
palette = c("#fee090", "#4575b4"),
domain = NULL
)
) %>%
tab_header(
title = "Best Defensive Seasons (Era-Adjusted)",
subtitle = "2020-2023 NFL Seasons - Ranked by within-season z-score"
) %>%
tab_source_note(
source_note = "Z-score accounts for league-wide offensive performance in each season"
)
#| label: era-adjustment-py
#| message: false
#| warning: false
# Calculate league average by season
league_avg_by_season = pbp_clean.groupby('season').agg({
'epa': 'mean',
'yards_gained': 'mean'
}).reset_index()
league_avg_by_season.columns = ['season', 'league_avg_epa', 'league_avg_yards']
# Adjust defensive metrics for era
def_era_adjusted = def_metrics.merge(league_avg_by_season, on='season')
def_era_adjusted['era_adj_epa'] = (
def_era_adjusted['epa_per_play'] - (-def_era_adjusted['league_avg_epa'])
)
# Z-score within season
def_era_adjusted['era_z_score'] = def_era_adjusted.groupby('season')['epa_per_play'].transform(
lambda x: (x - x.mean()) / x.std()
)
# Find best defenses across all seasons
best_defenses_all_time = def_era_adjusted.nlargest(20, 'era_z_score')
print("\nBest Defensive Seasons (Era-Adjusted) 2020-2023:")
print(best_defenses_all_time[['season', 'defteam', 'plays', 'epa_per_play',
'era_adj_epa', 'era_z_score']].to_string(index=False))
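The within-season z-score is what makes the table comparable across seasons: the same raw EPA can be excellent in one scoring environment and mediocre in another. A toy illustration with hypothetical league spreads:

```python
import pandas as pd

# Toy data: a +0.05 defense in a low-scoring 2020 environment vs the
# same +0.05 in a higher-scoring 2023 (hypothetical numbers)
toy = pd.DataFrame({
    'season': [2020, 2020, 2020, 2023, 2023, 2023],
    'epa_per_play': [0.05, 0.00, -0.05, 0.05, 0.10, 0.15],
})

# Standardize within each season, exactly as in the analysis above
toy['era_z'] = toy.groupby('season')['epa_per_play'].transform(
    lambda x: (x - x.mean()) / x.std()
)
# The 0.05 defense grades +1 sd in 2020 but -1 sd in 2023
```

The raw EPA column answers "how good was this defense in absolute terms?"; the z-score answers "how good was it relative to its peers that year?", which is the fairer cross-era question.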
Historical Trends Visualization
#| label: fig-historical-trends-r
#| fig-cap: "Defensive EPA trends over time"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false
# Calculate percentiles by season
def_percentiles <- def_era_adjusted %>%
group_by(season) %>%
summarise(
p90 = quantile(epa_per_play, 0.90),
p75 = quantile(epa_per_play, 0.75),
p50 = quantile(epa_per_play, 0.50),
p25 = quantile(epa_per_play, 0.25),
p10 = quantile(epa_per_play, 0.10),
.groups = "drop"
)
# Plot trends
ggplot(def_percentiles, aes(x = season)) +
geom_ribbon(aes(ymin = p10, ymax = p90), fill = "#4575b4", alpha = 0.2) +
geom_ribbon(aes(ymin = p25, ymax = p75), fill = "#4575b4", alpha = 0.4) +
geom_line(aes(y = p50), color = "#4575b4", size = 1.5) +
geom_point(aes(y = p50), color = "#4575b4", size = 3) +
geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
scale_x_continuous(breaks = 2020:2023) +
labs(
title = "Distribution of Defensive EPA Over Time",
subtitle = "Dark band = 25th-75th percentile, Light band = 10th-90th percentile",
x = "Season",
y = "Defensive EPA per Play (Higher = Better)",
caption = "Data: nflfastR"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14),
panel.grid.minor = element_blank()
)
#| label: fig-historical-trends-py
#| fig-cap: "Defensive EPA trends over time - Python"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false
# Calculate percentiles by season
percentiles = def_era_adjusted.groupby('season')['epa_per_play'].quantile(
[0.10, 0.25, 0.50, 0.75, 0.90]
).unstack()
percentiles.columns = ['p10', 'p25', 'p50', 'p75', 'p90']
percentiles = percentiles.reset_index()
# Plot trends
fig, ax = plt.subplots(figsize=(12, 8))
# Fill between percentiles
ax.fill_between(percentiles['season'], percentiles['p10'], percentiles['p90'],
alpha=0.2, color='#4575b4', label='10th-90th percentile')
ax.fill_between(percentiles['season'], percentiles['p25'], percentiles['p75'],
alpha=0.4, color='#4575b4', label='25th-75th percentile')
# Median line
ax.plot(percentiles['season'], percentiles['p50'],
color='#4575b4', linewidth=2.5, marker='o', markersize=8, label='Median')
# Reference line
ax.axhline(y=0, color='gray', linestyle='--', alpha=0.5)
# Labels and formatting
ax.set_xlabel('Season', fontsize=12)
ax.set_ylabel('Defensive EPA per Play (Higher = Better)', fontsize=12)
ax.set_title('Distribution of Defensive EPA Over Time',
fontsize=14, fontweight='bold')
ax.legend(loc='best')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Complete Defensive Dashboard
Let's create a comprehensive defensive evaluation dashboard:
#| label: complete-dashboard-r
#| message: false
#| warning: false
#| fig-width: 14
#| fig-height: 12
library(patchwork)
# Select a team to analyze
team_to_analyze <- "SF" # San Francisco 49ers
team_data_2023 <- def_composite %>% filter(season == 2023, defteam == team_to_analyze)
# 1. Multi-metric comparison plot
plot1 <- def_composite %>%
filter(season == 2023) %>%
select(defteam, epa_per_play, pass_epa, run_epa, composite_score) %>%
mutate(
highlight = defteam == team_to_analyze,
rank = rank(-composite_score)
) %>%
ggplot(aes(x = pass_epa, y = run_epa)) +
geom_vline(xintercept = 0, linetype = "dashed", alpha = 0.3) +
geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.3) +
geom_point(aes(alpha = highlight, size = highlight), color = "gray60") +
geom_nfl_logos(
data = . %>% filter(highlight),
aes(team_abbr = defteam),
width = 0.08
) +
scale_alpha_manual(values = c(0.3, 1), guide = "none") +
scale_size_manual(values = c(2, 4), guide = "none") +
labs(
title = "Pass Defense vs Run Defense",
x = "Pass Defense EPA (Higher = Better)",
y = "Run Defense EPA (Higher = Better)"
) +
theme_minimal()
# 2. Rank comparison
plot2 <- def_composite %>%
filter(season == 2023) %>%
mutate(
epa_rank = rank(-epa_per_play),
composite_rank = rank(-composite_score),
highlight = defteam == team_to_analyze
) %>%
ggplot(aes(x = epa_rank, y = composite_rank)) +
geom_abline(intercept = 0, slope = 1, linetype = "dashed", alpha = 0.3) +
geom_point(aes(alpha = highlight, size = highlight), color = "gray60") +
geom_nfl_logos(
data = . %>% filter(highlight),
aes(team_abbr = defteam),
width = 0.08
) +
scale_alpha_manual(values = c(0.3, 1), guide = "none") +
scale_size_manual(values = c(2, 4), guide = "none") +
scale_x_continuous(breaks = seq(0, 32, 8)) +
scale_y_continuous(breaks = seq(0, 32, 8)) +
coord_fixed() +
labs(
title = "EPA Rank vs Composite Rank",
x = "EPA Rank",
y = "Composite Rank"
) +
theme_minimal()
# 3. Historical performance
plot3 <- def_composite %>%
filter(defteam == team_to_analyze) %>%
ggplot(aes(x = season, y = epa_per_play)) +
geom_line(size = 1.5, color = "#4575b4") +
geom_point(size = 3, color = "#4575b4") +
geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.3) +
labs(
title = glue("{team_to_analyze} Defensive EPA Trend"),
x = "Season",
y = "EPA per Play"
) +
theme_minimal()
# 4. Component breakdown
plot4 <- def_composite %>%
filter(season == 2023, defteam == team_to_analyze) %>%
select(epa_z, pass_epa_z, run_epa_z, sack_z, takeaway_z) %>%
pivot_longer(everything(), names_to = "metric", values_to = "z_score") %>%
mutate(
metric = factor(metric,
levels = c("epa_z", "pass_epa_z", "run_epa_z", "sack_z", "takeaway_z"),
labels = c("Overall EPA", "Pass EPA", "Run EPA", "Sack Rate", "Turnovers"))
) %>%
ggplot(aes(x = z_score, y = metric, fill = z_score > 0)) +
geom_col() +
geom_vline(xintercept = 0, linetype = "solid") +
scale_fill_manual(values = c("FALSE" = "#d73027", "TRUE" = "#4575b4"), guide = "none") +
labs(
title = glue("{team_to_analyze} Component Performance (Z-Scores)"),
x = "Z-Score (vs League Average)",
y = NULL
) +
theme_minimal()
# Combine plots
(plot1 | plot2) / (plot3 | plot4) +
plot_annotation(
title = glue("{team_to_analyze} Comprehensive Defensive Dashboard - 2023 Season"),
theme = theme(plot.title = element_text(face = "bold", size = 16))
)
#| label: complete-dashboard-py
#| message: false
#| warning: false
#| fig-width: 14
#| fig-height: 12
# Select a team to analyze
team_to_analyze = "SF" # San Francisco 49ers
fig, axes = plt.subplots(2, 2, figsize=(14, 12))
# 1. Pass Defense vs Run Defense
ax1 = axes[0, 0]
data_2023 = def_composite[def_composite['season'] == 2023].copy()
team_data = data_2023[data_2023['defteam'] == team_to_analyze]
ax1.scatter(data_2023['pass_epa'], data_2023['run_epa'],
alpha=0.3, s=50, color='gray')
ax1.scatter(team_data['pass_epa'], team_data['run_epa'],
s=200, color='#d73027', edgecolors='black', linewidth=2, zorder=5)
ax1.axvline(x=0, color='gray', linestyle='--', alpha=0.3)
ax1.axhline(y=0, color='gray', linestyle='--', alpha=0.3)
ax1.set_xlabel('Pass Defense EPA (Higher = Better)')
ax1.set_ylabel('Run Defense EPA (Higher = Better)')
ax1.set_title('Pass Defense vs Run Defense')
ax1.grid(True, alpha=0.3)
# 2. EPA Rank vs Composite Rank
ax2 = axes[0, 1]
data_2023['epa_rank'] = data_2023['epa_per_play'].rank(ascending=False)
data_2023['composite_rank'] = data_2023['composite_score'].rank(ascending=False)
ax2.scatter(data_2023['epa_rank'], data_2023['composite_rank'],
alpha=0.3, s=50, color='gray')
team_data_ranks = data_2023[data_2023['defteam'] == team_to_analyze]
ax2.scatter(team_data_ranks['epa_rank'], team_data_ranks['composite_rank'],
s=200, color='#d73027', edgecolors='black', linewidth=2, zorder=5)
ax2.plot([0, 32], [0, 32], 'k--', alpha=0.3)
ax2.set_xlabel('EPA Rank')
ax2.set_ylabel('Composite Rank')
ax2.set_title('EPA Rank vs Composite Rank')
ax2.grid(True, alpha=0.3)
# 3. Historical performance
ax3 = axes[1, 0]
team_history = def_composite[def_composite['defteam'] == team_to_analyze]
ax3.plot(team_history['season'], team_history['epa_per_play'],
         marker='o', linewidth=2.5, markersize=8, color='#4575b4')
ax3.axhline(y=0, color='gray', linestyle='--', alpha=0.3)
ax3.set_xlabel('Season')
ax3.set_ylabel('EPA per Play')
ax3.set_title(f'{team_to_analyze} Defensive EPA Trend')
ax3.grid(True, alpha=0.3)

# 4. Component breakdown
ax4 = axes[1, 1]
team_2023 = def_composite[
    (def_composite['season'] == 2023) &
    (def_composite['defteam'] == team_to_analyze)
].iloc[0]
components = {
    'Overall EPA': team_2023['epa_z'],
    'Pass EPA': team_2023['pass_epa_z'],
    'Run EPA': team_2023['run_epa_z'],
    'Sack Rate': team_2023['sack_z'],
    'Turnovers': team_2023['takeaway_z']
}
colors = ['#4575b4' if v > 0 else '#d73027' for v in components.values()]
ax4.barh(list(components.keys()), list(components.values()), color=colors)
ax4.axvline(x=0, color='black', linewidth=0.8)
ax4.set_xlabel('Z-Score (vs League Average)')
ax4.set_title(f'{team_to_analyze} Component Performance (Z-Scores)')
ax4.grid(True, axis='x', alpha=0.3)

plt.suptitle(f'{team_to_analyze} Comprehensive Defensive Dashboard - 2023 Season',
             fontsize=16, fontweight='bold', y=0.995)
plt.tight_layout()
plt.show()
Summary
In this chapter, we've built a comprehensive framework for advanced defensive evaluation:
Key Concepts Covered
- Composite Defensive Ratings: Combined multiple metrics (EPA, pressure, turnovers, explosive plays) into weighted ratings
- Opponent Adjustments: Accounted for strength of schedule to make fair comparisons between defensive units
- Context-Adjusted Metrics: Adjusted for game situation, score differential, and leverage to identify clutch performance
- Position-Specific Evaluation: Created targeted metrics for different defensive positions (CB, pass rush, run defense)
- Unit Synergy: Measured coordination between coverage and pass rush units
- Consistency Analysis: Identified reliable defenses vs. volatile ones using variance metrics
- Predictive Modeling: Built models to forecast future defensive performance
- Historical Comparisons: Developed era adjustments for comparing defenses across seasons
Key Takeaways
- Defense is multifaceted: No single metric captures defensive quality; comprehensive evaluation requires combining multiple dimensions
- Context matters: Raw statistics can be misleading; adjusting for opponent and situation is essential
- Predictability is limited: Defensive performance is less stable year-to-year than offense (R² ~0.20-0.30)
- Turnovers regress: High turnover rates are difficult to sustain
- Synergy is valuable: Coordinated units outperform collections of individuals
- Consistency counts: Reliable defenses are more valuable than volatile ones
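Two of these takeaways, limited predictability and turnover regression, can be illustrated with a quick simulation: give each team a persistent "true skill," let EPA load fairly strongly on it and turnovers only weakly, then compare year-to-year correlations. The loadings below are chosen to roughly reproduce the R² of ~0.20-0.30 quoted above, and the oversized synthetic league keeps sampling noise small; none of these numbers are empirical estimates:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000  # oversized synthetic "league" so sampling noise stays small

# Persistent team quality that carries over between seasons
skill = rng.normal(0.0, 1.0, n)

# EPA loads fairly strongly on skill; turnovers are mostly noise
epa_y1 = 0.7 * skill + rng.normal(0.0, 0.72, n)
epa_y2 = 0.7 * skill + rng.normal(0.0, 0.72, n)
to_y1 = 0.2 * skill + rng.normal(0.0, 0.98, n)
to_y2 = 0.2 * skill + rng.normal(0.0, 0.98, n)

r_epa = np.corrcoef(epa_y1, epa_y2)[0, 1]  # theoretically ~0.49, i.e. R^2 ~0.24
r_to = np.corrcoef(to_y1, to_y2)[0, 1]     # theoretically near zero

print(f"year-to-year r: EPA {r_epa:.2f}, turnovers {r_to:.2f}")
```

The practical consequence is regression to the mean: in standardized units, the best forecast of next year's value is roughly r times this year's, so a spike in takeaways should be discounted far more heavily than a strong EPA season.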
Practical Applications
These advanced defensive metrics enable:
- Team Evaluation: Identifying truly elite defenses vs. those benefiting from weak opponents
- Player Valuation: Assessing individual contributions to defensive success
- Strategic Planning: Understanding which defensive strengths are sustainable
- Roster Construction: Building balanced, synergistic defensive units
- Game Planning: Exploiting opponent defensive weaknesses
- Draft Analysis: Projecting how defensive prospects will contribute
Exercises
Conceptual Questions
- Composite Rating Design: Design your own composite defensive rating system. What weights would you assign to different components (EPA, pressure, turnovers, etc.) and why?
- Turnover Sustainability: Research and explain why turnover rates show low year-to-year correlation. What implications does this have for defensive evaluation?
- Position Value: Which defensive position has the greatest impact on overall defensive EPA? Use evidence from the chapter to support your answer.
Coding Exercises
Exercise 1: Custom Defensive Rating System
Create your own composite defensive rating that includes:
a) At least 5 different metrics
b) Justification for your chosen weights
c) Validation by comparing rankings to playoff success
d) Visualization comparing your rating to EPA-only rankings
**Bonus**: Test your rating system's predictive power for next-season performance.
Exercise 2: Opponent-Adjusted Analysis
Load data from the 2023 season and:
a) Calculate opponent-adjusted EPA for each team
b) Identify which teams' rankings changed most after adjustment
c) Determine whether playoff teams faced stronger or weaker schedules
d) Create visualizations showing the impact of adjustment
**Hint**: Consider both offensive and defensive opponent quality.
Exercise 3: Position-Specific Framework
Build a complete evaluation framework for one defensive position:
a) Identify 5-7 key metrics for the position
b) Calculate these metrics by team for the 2023 season
c) Create a composite score for position unit quality
d) Validate by comparing to overall defensive performance
**Challenge**: If player-level data is available, extend to individual player evaluation.
Exercise 4: Predictive Defensive Model
Build a model to predict next-season defensive EPA:
a) Use data from 2020-2022 as training data
b) Test different feature combinations
c) Validate on the 2023 season
d) Generate predictions for 2024
e) Compare your model's accuracy to simple persistence (this year = next year)
**Advanced**: Incorporate personnel changes (e.g., coordinator changes, key player departures) as features.
Exercise 5: Defensive Dashboard
Create an interactive defensive evaluation dashboard that includes:
a) Overall defensive ranking (multiple metrics)
b) Pass defense vs. run defense scatter plot
c) Situational performance breakdown
d) Historical trend line
e) Opponent-adjusted metrics
f) Consistency/variance analysis
Make it filterable by team and season.
Further Reading
Academic Research
- Yurko, R., Ventura, S., & Horowitz, M. (2019). "nflWAR: A reproducible method for offensive player evaluation in football." Journal of Quantitative Analysis in Sports, 15(3), 163-183.
- Lopez, M. J., Matthews, G. J., & Baumer, B. S. (2018). "How often does the best team win? A unified approach to understanding randomness in North American sport." The Annals of Applied Statistics, 12(4), 2483-2516.
- Burke, B. (2019). "Defenses and the Passing Game." ESPN Analytics.
Industry Resources
- Football Outsiders. "DVOA Explained." https://www.footballoutsiders.com/info/methods
- Pro Football Focus. "PFF Grading." https://www.pff.com/grades
- Baldwin, B. (2021). "The nflfastR Beginner's Guide." https://www.nflfastR.com/
Books
- Alamar, B. (2013). Sports Analytics: A Guide for Coaches, Managers, and Other Decision Makers. Columbia University Press.
- Miller, T. W. (2015). Sports Analytics and Data Science. Pearson FT Press.
References
:::