Chapter 13: Defensive Analytics Basics | Football Analytics Textbook

Learning Objectives

By the end of this chapter, you will be able to:

- Understand fundamental defensive concepts and terminology
- Calculate basic defensive efficiency metrics
- Analyze defensive formations and alignments
- Evaluate defensive performance by situation
- Measure defensive consistency and explosiveness allowed

[Figure: Defensive Coverage Performance. Compares EPA allowed and usage frequency for different coverage schemes. Negative EPA allowed (green) is good for the defense.]

Introduction

"Defense wins championships." This football axiom has endured for generations, and while modern analytics has shown that offense and defense contribute roughly equally to winning, defensive excellence remains critical to sustained success. Yet defense has traditionally been more difficult to measure than offense—partly because defensive players react to offensive schemes, and partly because traditional statistics like tackles and interceptions don't capture the full picture of defensive performance.

Understanding defensive performance presents analytical challenges that differ fundamentally from offensive analysis. When we measure offense, we track what a team actively does: yards gained, points scored, successful plays executed. Defense, however, is inherently reactive. A defense doesn't "create" yards or points; instead, it prevents them. This reactive nature means defensive metrics must be framed differently: yards allowed, points prevented, plays stopped. Moreover, defensive performance is heavily influenced by factors outside the defense's control, such as the strength of opposing offenses and the field position handed to it by its own offense's turnovers and by special teams.

Consider the challenge of comparing two defenses: one faces high-powered offenses each week and allows 350 yards per game, while another faces weaker offenses and allows 300 yards per game. Which defense is actually better? Traditional statistics can't answer this question because they lack context. Modern defensive analytics addresses this limitation by using context-aware metrics that adjust for situation, opponent quality, and game circumstances.

This chapter introduces the fundamental concepts and metrics used to evaluate defensive performance in modern football analytics. We'll examine how defenses stop offenses, prevent explosive plays, and perform in critical situations. Most importantly, we'll learn to measure defensive effectiveness using advanced metrics that account for context and situation. By the end of this chapter, you'll understand how to evaluate defenses using the same rigorous, data-driven approach that NFL teams employ in their war rooms every Sunday.

The Defensive Perspective: Preventing Points Rather Than Scoring Them

When we analyze defense, we must shift our mental framework from the offensive perspective we've developed in previous chapters. On offense, positive outcomes are easy to identify: gaining yards, scoring points, converting first downs. Defense operates in reverse—success means preventing those same outcomes. This fundamental difference creates unique measurement challenges.

The most important insight in defensive analytics is that defense is measured from the opponent's perspective. When we calculate defensive EPA, we're actually calculating how many expected points the offense gained, then reversing the sign. A play where the offense gains +0.3 EPA represents a defensive failure worth -0.3 EPA from the defense's viewpoint. This framing can be confusing at first, but it's essential for proper defensive evaluation.

Consider two defensive plays on first-and-10 from the 50-yard line: in one, the defense allows a 5-yard completion; in the other, it records a 5-yard sack. Both move the ball 5 yards, but the defensive outcomes are dramatically different. The completion gives the offense second-and-5, a modestly positive EPA result for the offense, while the sack creates second-and-15, a sharply negative one (the exact values depend on the expected points model). A good defensive metric must capture this distinction, which is why EPA-based metrics have become the gold standard for defensive evaluation.

Throughout this chapter, we'll adopt the defensive coordinator's perspective. We'll measure success by points prevented, explosive plays stopped, and scoring drives ended. We'll learn to identify which defenses consistently force offenses into unfavorable situations, and which defenses allow offenses to operate at peak efficiency. This framework will prepare you for the more advanced defensive topics in Chapters 14-16, where we'll examine pass defense, run defense, and situational decision-making in greater depth.

What is Defensive Analytics?

Defensive analytics encompasses the measurement, evaluation, and optimization of defensive performance. This includes everything from individual play efficiency to overall defensive strategy, formation effectiveness, coverage schemes, and situational decision-making.

The Evolution of Defensive Football

From Steel Curtain to Modern Multiple Defenses

The NFL has seen defensive philosophy evolve dramatically over the past five decades:

1970s-1980s: Traditional 4-3 and 3-4 base defenses dominated, with defenses spending most of the game in base personnel. The Pittsburgh Steelers' "Steel Curtain" epitomized this era's physical, run-stopping defense.

1990s-2000s: The Tampa 2 defense, popularized by Tony Dungy and Monte Kiffin, emphasized speed and zone coverage. Defenses began using more sub-packages (nickel and dime) as offenses passed more frequently.

2010s-Present: Modern defenses use multiple personnel packages and coverage schemes, adjusting to offensive tendencies. Nickel defense (5 DBs) has become the new "base" defense for most teams. Versatile defenders who can cover, rush the passer, and stop the run are at a premium.

The Analytics Impact

Analytics has revealed that preventing explosive plays and creating havoc (sacks, turnovers, negative plays) are more important than simply limiting yards. This has led defenses to emphasize pressure and deep coverage over traditional run-stopping.

Basic Defensive Metrics

Traditional Statistics

Before exploring advanced metrics, let's review traditional defensive statistics:

Volume Metrics:
- Total yards allowed (passing + rushing)
- Total plays faced
- Points allowed
- Turnovers forced

Rate Metrics:
- Yards per play allowed
- Yards per carry allowed
- Yards per attempt allowed
- Completion percentage allowed
- Third down conversion rate allowed

While these metrics provide useful baseline information, they lack crucial context. A defense that repeatedly defends short fields because its own offense turns the ball over or stalls deep in its own territory will naturally allow more points than an equally good defense that is handed favorable field position.

Yards Per Play Allowed

Yards per play allowed is one of the simplest yet most informative defensive efficiency metrics. It measures how many yards a defense surrenders on average each time the opposing offense runs a play. This metric normalizes for the number of opportunities faced, making it a fair comparison between defenses that face different numbers of plays.

$$ \text{Yards Per Play Allowed} = \frac{\text{Total Yards Allowed}}{\text{Defensive Plays}} $$

Like its offensive counterpart, yards per play allowed is strongly correlated with winning. Teams that limit opponents to fewer yards per play typically win more games. The league average typically hovers around 5.3-5.5 yards per play, with elite defenses holding opponents under 5.0 yards per play and struggling defenses allowing over 6.0 yards per play. This metric has proven remarkably predictive: defenses in the worst quartile of YPP allowed (the most yards per play) make the playoffs less than 15% of the time, while defenses in the best quartile (the fewest yards per play) make the playoffs more than 50% of the time.

Why does yards per play matter more than total yards? Consider two scenarios: Defense A faces 70 plays and allows 350 yards (5.0 YPP), while Defense B faces 55 plays and allows 330 yards (6.0 YPP). Defense B allowed fewer total yards, but Defense A was actually more efficient on a per-play basis. The difference in total plays faced could be due to factors outside defensive control—such as the offense's time of possession or pace of play—making yards per play a fairer comparison. A defense paired with a fast-paced offense that goes three-and-out frequently will face many more plays than a defense paired with a ball-control offense, but this says nothing about the defensive unit's quality.
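The comparison above is easy to check by hand. The short sketch below uses only the numbers from the Defense A and Defense B example; `yards_per_play` is an illustrative helper written for this note, not a function from nflfastR or nfl_data_py.

```python
def yards_per_play(total_yards: float, plays: int) -> float:
    """Yards allowed divided by the number of defensive plays faced."""
    if plays <= 0:
        raise ValueError("plays must be positive")
    return total_yards / plays


# Numbers taken from the Defense A / Defense B example in the text
defense_a = yards_per_play(total_yards=350, plays=70)
defense_b = yards_per_play(total_yards=330, plays=55)

print(f"Defense A: 350 yards on 70 plays = {defense_a:.1f} YPP")
print(f"Defense B: 330 yards on 55 plays = {defense_b:.1f} YPP")
# Defense B allowed fewer total yards but more yards per play
```

Ranking the two defenses by total yards and by yards per play gives opposite orderings, which is the point of the example.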

Breaking down yards per play allowed into pass and run components reveals important insights about a defense's strengths and weaknesses. Most NFL defenses are better against either the pass or the run, but not equally good at both. Understanding these splits helps identify matchup advantages and potential game-planning strategies. For example, a defense that allows 4.5 yards per rush but 7.5 yards per pass attempt is vulnerable through the air and should be attacked with an aggressive passing game, even if their overall YPP allowed looks respectable.

In the following analysis, we'll calculate defensive yards per play for every team in the 2023 season. We'll examine both overall efficiency and the pass-run split, which will reveal each defense's strengths, weaknesses, and structural characteristics. This foundational metric will serve as our baseline for more sophisticated defensive evaluation throughout the chapter.

#| label: yards-per-play-allowed-r
#| message: false
#| warning: false
#| cache: true

library(tidyverse)
library(nflfastR)
library(gt)

# Load 2023 season play-by-play data
# This includes every play from every game with advanced metrics
pbp_2023 <- load_pbp(2023)

# Calculate team defensive yards per play allowed
# We analyze both overall efficiency and pass vs run splits
defensive_ypp <- pbp_2023 %>%
  # Filter to plays where defense is recorded and play type is pass or run
  # This excludes special teams plays, penalties, and other non-offensive plays
  filter(!is.na(defteam), play_type %in% c("pass", "run")) %>%
  # Group by defending team to calculate team-level metrics
  group_by(defteam) %>%
  summarise(
    # Count total defensive plays faced
    plays = n(),
    # Sum total yards allowed on all plays
    total_yards = sum(yards_gained, na.rm = TRUE),
    # Calculate overall yards per play allowed
    yards_per_play = total_yards / plays,
    # Count plays by type to calculate type-specific efficiency
    pass_plays = sum(play_type == "pass"),
    rush_plays = sum(play_type == "run"),
    # Sum yards allowed by play type
    pass_yards = sum(yards_gained[play_type == "pass"], na.rm = TRUE),
    rush_yards = sum(yards_gained[play_type == "run"], na.rm = TRUE),
    # Calculate yards per play for each type
    yards_per_pass = pass_yards / pass_plays,
    yards_per_rush = rush_yards / rush_plays,
    .groups = "drop"
  ) %>%
  # Sort by overall yards per play (lowest is best for defense)
  arrange(yards_per_play)

# Display top 10 defenses in a formatted table
defensive_ypp %>%
  head(10) %>%
  gt() %>%
  cols_label(
    defteam = "Team",
    plays = "Plays",
    total_yards = "Total Yards",
    yards_per_play = "YPP",
    pass_plays = "Pass Plays",
    rush_plays = "Rush Plays",
    yards_per_pass = "YPA",
    yards_per_rush = "YPC"
  ) %>%
  fmt_number(
    columns = c(yards_per_play, yards_per_pass, yards_per_rush),
    decimals = 2
  ) %>%
  fmt_number(
    columns = c(plays, total_yards, pass_plays, rush_plays),
    decimals = 0,
    use_seps = TRUE
  ) %>%
  tab_header(
    title = "Defensive Yards Per Play Leaders",
    subtitle = "2023 NFL Season - Top 10 Teams (Lowest YPP Allowed)"
  )

#| label: yards-per-play-allowed-py
#| message: false
#| warning: false
#| cache: true

import pandas as pd
import numpy as np
import nfl_data_py as nfl

# Load 2023 season play-by-play data using nfl_data_py
# This Python package provides the same data as nflfastR in R
pbp_2023 = nfl.import_pbp_data([2023])

# Calculate team defensive yards per play allowed
# Using pandas query and groupby for efficient aggregation
defensive_ypp = (pbp_2023
    # Filter to plays with recorded defense and pass/run play types
    .query("defteam.notna() & play_type.isin(['pass', 'run'])")
    # Flag play types and split yards by type before grouping, so the
    # aggregation does not need to index back into pbp_2023
    .assign(
        is_pass=lambda d: d['play_type'] == 'pass',
        is_run=lambda d: d['play_type'] == 'run',
        pass_yds=lambda d: d['yards_gained'].where(d['play_type'] == 'pass', 0),
        rush_yds=lambda d: d['yards_gained'].where(d['play_type'] == 'run', 0),
    )
    # Group by defending team
    .groupby('defteam')
    # Aggregate multiple metrics for each team
    .agg(
        plays=('play_type', 'size'),  # Total plays faced (matches n() in the R version)
        total_yards=('yards_gained', 'sum'),  # Total yards allowed
        pass_plays=('is_pass', 'sum'),  # Number of pass plays faced
        rush_plays=('is_run', 'sum'),  # Number of run plays faced
        pass_yards=('pass_yds', 'sum'),  # Yards allowed on pass plays
        rush_yards=('rush_yds', 'sum'),  # Yards allowed on run plays
    )
    .reset_index()
)

# Calculate per-play efficiency metrics
defensive_ypp['yards_per_play'] = defensive_ypp['total_yards'] / defensive_ypp['plays']
defensive_ypp['yards_per_pass'] = defensive_ypp['pass_yards'] / defensive_ypp['pass_plays']
defensive_ypp['yards_per_rush'] = defensive_ypp['rush_yards'] / defensive_ypp['rush_plays']

# Sort by overall yards per play (ascending - lower is better)
# and select top 10 defenses
defensive_ypp_top10 = (defensive_ypp
    .sort_values('yards_per_play')
    .head(10)
    [['defteam', 'plays', 'total_yards', 'yards_per_play',
      'pass_plays', 'rush_plays', 'yards_per_pass', 'yards_per_rush']]
)

print("Defensive Yards Per Play Leaders - 2023 NFL Season (Top 10 - Lowest YPP)")
print("=" * 80)
print(defensive_ypp_top10.to_string(index=False))

This code calculates defensive efficiency by measuring how many yards a defense allows per play. The analysis proceeds in several steps:

1. **Data Loading**: We load the complete 2023 NFL play-by-play dataset, which contains information about every play including the defending team, play type, and yards gained.
2. **Filtering**: We keep only pass and run plays where the defense is recorded. This excludes special teams plays, penalties without plays, and administrative plays that don't represent actual defensive performance.
3. **Aggregation**: For each defending team, we calculate:
   - Total plays faced (sample size matters for reliability)
   - Total yards allowed (raw volume metric)
   - Overall yards per play (primary efficiency metric)
   - Separate calculations for pass and run plays to identify defensive strengths and weaknesses
4. **Pass vs Run Split**: By calculating separate efficiency rates for passes and runs, we can identify whether a defense is balanced or has a clear weakness. For example, a defense might allow 6.0 yards per pass attempt but only 3.5 yards per rush, indicating it should be attacked through the air.

**Interpretation Note**: When examining these results, look for defenses with:

- Overall YPP below 5.0 (elite)
- No outsized gap between pass and run efficiency (passes gain more per play than runs against almost every defense, so judge each figure against its own league norm)
- Large sample sizes (a full season is roughly 1,000 defensive plays; be cautious with splits built on far fewer)

Key Insight: Pass Defense vs Run Defense Trade-offs

Most defenses cannot be elite against both the pass and run simultaneously. Defenses that use more defensive backs (nickel/dime packages) typically excel against the pass but struggle against the run. Conversely, defenses with heavy base packages (more linebackers) stop the run well but can be vulnerable to passes. Understanding these trade-offs helps identify matchup advantages.

The yards per play metric reveals fascinating strategic insights. Elite defenses like the Baltimore Ravens and Cleveland Browns in 2023 typically achieved their rankings through different approaches—some with elite pass defense and adequate run defense, others with dominant run defense and good-enough pass defense. There's rarely a "perfect" defense that excels at everything, which means offensive coordinators can always find potential areas to attack.

When interpreting these results, look for several key patterns. First, examine the gap between yards per pass attempt and yards per carry—large gaps (greater than 2.0 yards) indicate exploitable imbalances. Second, compare overall yards per play to the league average of approximately 5.4 yards—teams more than 0.5 yards below average are elite, while teams 0.5 yards above average are vulnerable. Third, consider the volume of plays faced: defenses facing many more plays than average may be dealing with poor offensive support (quick three-and-outs) or extremely fast pace, which can inflate their total yards allowed even if their per-play efficiency is good.
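The first two rules of thumb in the paragraph above can be written down as a small helper. This is a sketch only: the thresholds are the approximate figures quoted in the text, and `describe_defense` and the sample inputs are hypothetical, not real team values.

```python
LEAGUE_AVG_YPP = 5.4   # approximate league average quoted in the text
TIER_MARGIN = 0.5      # distance from average that marks elite / vulnerable
IMBALANCE_GAP = 2.0    # pass-minus-rush gap treated as exploitable


def describe_defense(ypp: float, yards_per_pass: float, yards_per_rush: float) -> dict:
    """Apply the text's rules of thumb to one defense's per-play numbers."""
    if ypp < LEAGUE_AVG_YPP - TIER_MARGIN:
        tier = "elite"
    elif ypp > LEAGUE_AVG_YPP + TIER_MARGIN:
        tier = "vulnerable"
    else:
        tier = "average"
    gap = yards_per_pass - yards_per_rush
    return {
        "tier": tier,
        "pass_rush_gap": round(gap, 2),
        "imbalanced": abs(gap) > IMBALANCE_GAP,
    }


# Hypothetical inputs for illustration only
stingy = describe_defense(ypp=4.7, yards_per_pass=5.8, yards_per_rush=3.9)
leaky = describe_defense(ypp=6.1, yards_per_pass=7.5, yards_per_rush=4.5)
print(stingy)
print(leaky)
```

The third check, play volume, is left out because it needs league-wide play counts rather than a single team's rates.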

Common Pitfall: Confusing Volume and Efficiency

A defense can rank poorly in total yards allowed while ranking well in yards per play allowed (or vice versa). Always focus on per-play metrics rather than volume metrics when evaluating defensive quality. Total yards allowed tells you more about the offense and game script than the defense.

Points Allowed Per Drive

Points allowed per drive accounts for the number of opportunities defenses face, providing a more accurate measure of defensive effectiveness than total points allowed. While yards per play tells us about efficiency on individual plays, points per drive captures the ultimate outcome: whether the defense prevented scoring.

$$ \text{Points Allowed Per Drive} = \frac{\text{Total Points Allowed}}{\text{Defensive Drives}} $$

This metric is superior to total points allowed because it normalizes for pace and number of possessions. Consider two defenses: Defense A allows 300 points in a season while facing 150 drives (2.00 points per drive), while Defense B allows 280 points while facing 130 drives (2.15 points per drive). Defense B allowed fewer total points but was actually less efficient per opportunity. The difference in drives faced might reflect their offense's ability to sustain drives and control time of possession, not the defensive unit's quality.

Elite defenses typically allow fewer than 1.70 points per drive, while struggling defenses allow more than 2.20 points per drive. The league average hovers around 1.95 points per drive. This seemingly small difference compounds over a season: at 11 drives per game and 17 games, the gap between an elite defense (1.70 PPD) and a poor defense (2.20 PPD) amounts to 93.5 points per season. Using the common rule of thumb that roughly 35 to 40 points of point differential equate to one NFL win, that gap is worth about 2.5 wins.
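The per-drive arithmetic in this section can be reproduced in a few lines. The sketch below uses the Defense A and Defense B totals and the 11-drives-per-game, 17-game season from the text; the function names are illustrative helpers, not library calls.

```python
def points_per_drive(points_allowed: float, drives: int) -> float:
    """Points allowed divided by defensive drives faced."""
    return points_allowed / drives


def season_point_gap(ppd_low: float, ppd_high: float,
                     drives_per_game: int = 11, games: int = 17) -> float:
    """Season-long difference in points allowed between two per-drive rates."""
    return (ppd_high - ppd_low) * drives_per_game * games


# Defense A vs Defense B from the text
ppd_a = points_per_drive(300, 150)
ppd_b = points_per_drive(280, 130)
print(f"Defense A: {ppd_a:.2f} PPD, Defense B: {ppd_b:.2f} PPD")

# Elite (1.70) vs poor (2.20) defense over a full season
gap = season_point_gap(1.70, 2.20)
print(f"Season gap: {gap:.1f} points")
```

Changing `drives_per_game` shows how sensitive the season total is to pace: the per-drive rates stay fixed while the point gap scales with the number of possessions.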

One advantage of points per drive over yards per play is that it inherently accounts for red zone and goal-line performance. A defense might allow significant yardage but stiffen in scoring position, forcing field goals instead of touchdowns. Points per drive captures this "bend-but-don't-break" approach, while yards per play does not. Conversely, a defense might allow few yards overall but give up too many explosive touchdowns, looking good by yards per play but poor by points per drive.

When calculating points per drive, we face an important methodological decision: how many points to assign to each scoring drive. The code below assigns 7 points for touchdowns (including the extra point) and 3 points for field goals. This approximation works well for season-long analysis, though it slightly overstates the points allowed on touchdown drives, since extra points succeed only about 94% of the time and two-point attempts are ignored. It also assigns 0 points to every other drive result, so safeties and defensive return touchdowns are not credited to the defense.

#| label: points-per-drive-allowed-r
#| message: false
#| warning: false

# Calculate points allowed per drive
defensive_scoring <- pbp_2023 %>%
  filter(!is.na(defteam), play_type %in% c("pass", "run")) %>%
  # fixed_drive restarts in every game, so include game_id to keep drives separate
  group_by(game_id, defteam, fixed_drive) %>%
  summarise(
    plays = n(),
    drive_points = last(fixed_drive_result),
    .groups = "drop"
  ) %>%
  mutate(
    points = case_when(
      drive_points == "Touchdown" ~ 7,
      drive_points == "Field goal" ~ 3,
      TRUE ~ 0
    )
  ) %>%
  group_by(defteam) %>%
  summarise(
    total_plays = sum(plays),
    total_drives = n(),
    total_points = sum(points),
    points_per_drive = total_points / total_drives,
    touchdown_rate = sum(drive_points == "Touchdown") / total_drives,
    .groups = "drop"
  ) %>%
  arrange(points_per_drive)

# Display results
defensive_scoring %>%
  head(10) %>%
  gt() %>%
  cols_label(
    defteam = "Team",
    total_plays = "Plays",
    total_drives = "Drives",
    total_points = "Points",
    points_per_drive = "Pts/Drive",
    touchdown_rate = "TD Rate"
  ) %>%
  fmt_number(
    columns = points_per_drive,
    decimals = 3
  ) %>%
  fmt_percent(
    columns = touchdown_rate,
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(total_plays, total_drives, total_points),
    decimals = 0,
    use_seps = TRUE
  ) %>%
  tab_header(
    title = "Best Scoring Defenses",
    subtitle = "2023 NFL Season - Lowest Points Per Drive Allowed"
  )

#| label: points-per-drive-allowed-py
#| message: false
#| warning: false

# Calculate points allowed per drive
def calculate_points_allowed(result):
    if result == 'Touchdown':
        return 7
    elif result == 'Field goal':
        return 3
    else:
        return 0

defensive_data = (pbp_2023
    .query("defteam.notna() & play_type.isin(['pass', 'run'])")
    .copy()
)

# Calculate points per drive
drive_summary = (defensive_data
    # fixed_drive restarts in every game, so include game_id to keep drives separate
    .groupby(['game_id', 'defteam', 'fixed_drive'])
    .agg(
        plays=('play_id', 'count'),
        drive_result=('fixed_drive_result', 'last')
    )
    .reset_index()
)

drive_summary['points'] = drive_summary['drive_result'].apply(calculate_points_allowed)
drive_summary['is_td'] = (drive_summary['drive_result'] == 'Touchdown').astype(int)

defensive_scoring = (drive_summary
    .groupby('defteam')
    .agg(
        total_plays=('plays', 'sum'),
        total_drives=('fixed_drive', 'count'),
        total_points=('points', 'sum'),
        touchdowns_allowed=('is_td', 'sum')
    )
    .reset_index()
)

defensive_scoring['points_per_drive'] = defensive_scoring['total_points'] / defensive_scoring['total_drives']
defensive_scoring['touchdown_rate'] = defensive_scoring['touchdowns_allowed'] / defensive_scoring['total_drives']

# Display top 10
scoring_top10 = (defensive_scoring
    .sort_values('points_per_drive')
    .head(10)
)

print("\nBest Scoring Defenses - 2023 NFL Season")
print("Lowest Points Per Drive Allowed")
print("=" * 75)
print(scoring_top10.to_string(index=False))

This code calculates how many points defenses allow per drive, providing insight into scoring efficiency rather than just yardage efficiency. The analysis proceeds through several stages:

1. **Drive Identification**: We group plays into drives using the `fixed_drive` variable. Because `fixed_drive` numbers drives within a single game, the grouping must also include `game_id`; otherwise drives from different games are merged and the drive count comes out far too low. Each drive can end in several ways: touchdown, field goal, punt, turnover, downs, or end of half.
2. **Drive Outcome Determination**: For each drive, we extract the final result using the `fixed_drive_result` variable. This tells us whether the offense scored and, if so, how.
3. **Point Assignment**: We convert categorical outcomes (touchdown, field goal, etc.) into point values. Touchdowns are worth 7 points (6 + extra point), field goals are worth 3 points, and all other outcomes (punts, turnovers, etc.) are worth 0 points.
4. **Aggregation by Team**: We sum the total points allowed and count the total drives faced for each defense, then calculate the ratio.

**Key Methodological Decisions**:

- We assign 7 points for touchdowns rather than 6 because extra points succeed about 94% of the time. This approximation is close enough for defensive evaluation purposes.
- We filter to only pass and run plays to ensure we're measuring plays where the defense was actively engaged, excluding special teams plays.
- The `fixed_drive` variable handles edge cases like drives spanning multiple quarters or drives that end due to penalties.

**Interpretation Notes**: The best defenses in points per drive allowed are not always the same as the best in yards per play. Teams with excellent red zone defense will perform better in this metric, while teams that allow explosive plays for touchdowns will perform worse. Also note the "touchdown rate" column, which shows what percentage of drives end in touchdowns: elite defenses keep this below 15%, while struggling defenses often allow touchdowns on 25% or more of drives.
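To see how much the 7-point touchdown approximation matters, the sketch below compares it with the expected value of a touchdown when the extra point succeeds 94% of the time. It assumes every touchdown is followed by a kicked extra point (no two-point tries), and the 40-touchdown season is a hypothetical input chosen only for illustration.

```python
XP_SUCCESS_RATE = 0.94  # approximate extra-point success rate quoted in the text

# 6 points for the touchdown plus the expected value of a 1-point kick
expected_td_points = 6 + XP_SUCCESS_RATE * 1

# How much the flat 7-point assumption overcounts each touchdown
overcount_per_td = 7 - expected_td_points


def season_overcount(touchdowns_allowed: int) -> float:
    """Total points overcounted across a season of touchdowns allowed."""
    return touchdowns_allowed * overcount_per_td


print(f"Expected points per touchdown: {expected_td_points:.2f}")
print(f"Overcount per touchdown: {overcount_per_td:.2f}")
print(f"Overcount for 40 touchdowns allowed: {season_overcount(40):.1f} points")
```

Under these assumptions the error is a few points per season, which is small next to the differences between defenses that the metric is meant to detect.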

When examining the results, pay special attention to the touchdown rate column. This metric reveals how often the defense completely breaks down and allows a touchdown. The difference between allowing a touchdown (7 points) and forcing a field goal (3 points) is four points per scoring drive. Over a season of roughly 187 drives (11 per game for 17 games), a defense that turns one in ten of its touchdowns allowed into field goals (e.g., dropping from a 20% TD rate to an 18% TD rate) saves about 15 points, a little under half a win by the common rule of thumb of roughly 35 to 40 points per win.
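The size of this saving depends on how many drives a defense faces, so it is worth computing for more than one assumption. In the sketch below, 187 drives corresponds to 11 drives per game over 17 games (the figures used earlier in the chapter), while 170 drives is a hypothetical slower-paced season; `points_saved` is an illustrative helper.

```python
def points_saved(drives: int, td_rate_before: float, td_rate_after: float,
                 td_points: int = 7, fg_points: int = 3) -> float:
    """Points saved when some touchdown drives become field-goal drives."""
    drives_converted = drives * (td_rate_before - td_rate_after)
    return drives_converted * (td_points - fg_points)


savings = {}
for drives in (170, 187):
    # Touchdown rate falls from 20% to 18% of drives, as in the text
    savings[drives] = points_saved(drives, 0.20, 0.18)
    print(f"{drives} drives: about {savings[drives]:.1f} points saved")
```

The function treats every avoided touchdown as a made field goal; a stop that forces a punt or a missed kick would save more than four points.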

The relationship between yards per play and points per drive is not perfectly linear. Some defenses are "bend-but-don't-break," allowing yards but preventing scores. The 2019 Green Bay Packers epitomized this approach, ranking 18th in yards allowed but 9th in points allowed. They accomplished this through strong red zone defense and by forcing field goals rather than allowing touchdowns. Other defenses are "boom-or-bust," allowing very few yards when they play well but occasionally surrendering quick-strike touchdowns on explosive plays.

Best Practice: Combine Multiple Defensive Metrics

No single defensive metric tells the complete story. Always examine yards per play, points per drive, EPA, and success rate together. Defenses that rank well in all metrics are truly elite, while defenses with divergent rankings have specific strengths or weaknesses to investigate.
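One simple way to put this advice into practice is to rank each defense on every metric and look at how far its ranks diverge. The sketch below does this for three made-up defenses; the team names and values are hypothetical, and the rank-spread summary is just one possible way to flag divergence, not an established statistic.

```python
# Hypothetical values for three made-up defenses (not real team data)
defenses = {
    "Team X": {"ypp": 4.8, "ppd": 1.65, "def_epa": 0.08, "sr_allowed": 0.41},
    "Team Y": {"ypp": 5.1, "ppd": 2.25, "def_epa": -0.04, "sr_allowed": 0.44},
    "Team Z": {"ypp": 5.9, "ppd": 1.90, "def_epa": 0.01, "sr_allowed": 0.49},
}

# True means a higher value is better for the defense (negated EPA convention)
HIGHER_IS_BETTER = {"ypp": False, "ppd": False, "def_epa": True, "sr_allowed": False}


def rank_by(metric: str) -> dict:
    """Rank teams on one metric, with 1 being the best defense."""
    ordered = sorted(defenses, key=lambda team: defenses[team][metric],
                     reverse=HIGHER_IS_BETTER[metric])
    return {team: position + 1 for position, team in enumerate(ordered)}


ranks = {metric: rank_by(metric) for metric in HIGHER_IS_BETTER}

# A large spread between best and worst rank signals a metric worth investigating
rank_spread = {}
for team in defenses:
    team_ranks = [ranks[metric][team] for metric in HIGHER_IS_BETTER]
    rank_spread[team] = max(team_ranks) - min(team_ranks)
    print(team, team_ranks, "spread:", rank_spread[team])
```

With a full 32-team table, a defense ranked in the top five on yards per play but outside the top twenty on points per drive would stand out immediately.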

Defensive EPA and Success Rate

EPA Against (Defensive EPA)

Expected Points Added (EPA) is just as valuable for measuring defense as offense—perhaps even more so, because it automatically adjusts for the context and situation each defense faces. Defensive EPA measures how many expected points a defense prevents relative to average defensive performance in the same situation. This context-awareness makes EPA superior to yards or points for defensive evaluation.

$$ \text{Defensive EPA} = \text{EP}_{\text{start}} - \text{EP}_{\text{end}} $$

Understanding defensive EPA requires thinking from the offense's perspective first, then flipping the sign. Each play starts with an expected points value based on field position, down, and distance. The offense either increases this value (positive EPA, good for offense) or decreases it (negative EPA, bad for offense). From the defense's perspective, we simply reverse these values: offensive EPA gains are defensive EPA losses, and vice versa.

For example, consider a play that starts at first-and-10 from the offense's own 25-yard line (approximately 0.5 expected points). If the offense completes a 15-yard pass to their own 40-yard line, the new situation (first-and-10 from the 40) has approximately 1.2 expected points. The offense gained +0.7 EPA on this play. From the defense's perspective, they allowed +0.7 EPA, which represents a defensive failure. Negating this value gives -0.7 defensive EPA, so under this convention lower (more negative) values represent worse defensive performance.

Many analysts prefer to work with defensive EPA negated (multiplied by -1) so that positive values represent good defensive plays and negative values represent poor defensive plays. This makes defensive EPA parallel to offensive EPA in interpretation. In this chapter, we'll use the negated convention: positive defensive EPA values are good for the defense, negative values are bad. A defense that averages +0.10 EPA per play is excellent (preventing 0.10 expected points per play above average), while a defense averaging -0.10 EPA per play is poor (allowing 0.10 extra expected points per play).
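The negated convention amounts to a single sign flip before averaging. The sketch below applies it to five hypothetical plays; the EPA values are invented for illustration and do not come from any play-by-play data.

```python
# Offensive EPA on five hypothetical plays faced by one defense
offensive_epa = [0.7, -1.5, 0.2, -0.4, 0.5]

# Negated convention used in this chapter: positive values are good for the defense
defensive_epa = [-epa for epa in offensive_epa]

def_epa_per_play = sum(defensive_epa) / len(defensive_epa)
print(f"Defensive EPA per play: {def_epa_per_play:+.2f}")

# The largest defensive EPA corresponds to the offense's worst play
best_defensive_play = max(defensive_epa)
print(f"Best defensive play: {best_defensive_play:+.1f}")
```

Here the single large negative play for the offense outweighs its three modest gains, so the defense finishes above zero even though it "lost" more plays than it won.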

Why is EPA so valuable for defensive evaluation? First, it accounts for situation. A defense facing third-and-15 with the offense backed up at its own 20-yard line has an easier task than a defense facing first-and-goal from the 3-yard line, and EPA automatically adjusts for this. Second, it accounts for down-and-distance changes. A 5-yard gain on third-and-10 is a defensive success (no first down), while a 5-yard gain on third-and-3 is a defensive failure (first down). EPA captures this distinction, while yards per play treats both plays equally. Third, it correlates strongly with winning, more so than yardage-based metrics.

In the following analysis, we'll calculate defensive EPA for every NFL team, examining both overall defensive performance and the pass-run split. We'll also calculate defensive success rate, which measures how often the defense prevents positive-EPA plays. These metrics will give us a sophisticated, context-aware view of defensive quality that goes far beyond traditional statistics.


#| label: defensive-epa-r
#| message: false
#| warning: false

# Calculate defensive EPA
defensive_epa <- pbp_2023 %>%
  filter(!is.na(defteam), !is.na(epa), play_type %in% c("pass", "run")) %>%
  group_by(defteam) %>%
  summarise(
    plays = n(),
    epa_per_play = -mean(epa, na.rm = TRUE),  # Negated so higher is better for defense
    success_rate_allowed = mean(epa > 0, na.rm = TRUE),
    pass_epa = -mean(epa[play_type == "pass"], na.rm = TRUE),
    run_epa = -mean(epa[play_type == "run"], na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(epa_per_play))

# Display top 10 defenses
defensive_epa %>%
  head(10) %>%
  gt() %>%
  cols_label(
    defteam = "Team",
    plays = "Plays",
    epa_per_play = "EPA/Play",
    success_rate_allowed = "Success Rate",
    pass_epa = "Pass EPA",
    run_epa = "Run EPA"
  ) %>%
  fmt_number(
    columns = c(epa_per_play, pass_epa, run_epa),
    decimals = 3
  ) %>%
  fmt_percent(
    columns = success_rate_allowed,
    decimals = 1
  ) %>%
  fmt_number(
    columns = plays,
    decimals = 0,
    use_seps = TRUE
  ) %>%
  tab_header(
    title = "Best Defenses by EPA",
    subtitle = "2023 NFL Season - Higher EPA is Better for Defense"
  ) %>%
  tab_footnote(
    footnote = "EPA negated so positive values indicate better defense",
    locations = cells_column_labels(columns = epa_per_play)
  )

#| label: defensive-epa-py
#| message: false
#| warning: false

# Calculate defensive EPA
defensive_epa_data = (pbp_2023
    .query("defteam.notna() & epa.notna() & play_type.isin(['pass', 'run'])")
    .copy()
)

defensive_epa = (defensive_epa_data
    .groupby('defteam')
    .agg(
        plays=('play_id', 'count'),
        epa_per_play=('epa', lambda x: -x.mean()),  # Negative so higher is better
        success_rate_allowed=('epa', lambda x: (x > 0).mean())
    )
    .reset_index()
)

# Calculate pass and run EPA separately
pass_epa = (defensive_epa_data.query("play_type == 'pass'")
    .groupby('defteam')['epa'].mean()
    .rename('pass_epa'))

run_epa = (defensive_epa_data.query("play_type == 'run'")
    .groupby('defteam')['epa'].mean()
    .rename('run_epa'))

defensive_epa = defensive_epa.join(pass_epa, on='defteam').join(run_epa, on='defteam')
defensive_epa['pass_epa'] = -defensive_epa['pass_epa']  # Negate for defense
defensive_epa['run_epa'] = -defensive_epa['run_epa']

# Display top 10
defensive_epa_top10 = (defensive_epa
    .sort_values('epa_per_play', ascending=False)
    .head(10)
)

print("\nBest Defenses by EPA - 2023 NFL Season")
print("Higher EPA is Better for Defense (negated values)")
print("=" * 85)
print(defensive_epa_top10.to_string(index=False))

Defensive Success Rate

Success rate allowed measures how often the offense produces a "successful" play (defined as EPA > 0) against a defense; the lower it is, the more often the defense gets a stop. While EPA captures the magnitude of success or failure, success rate captures the consistency or frequency of defensive stops. Together, these metrics provide a fuller picture of defensive performance.

$$ \text{Success Rate Allowed} = \frac{\text{Plays with EPA} > 0}{\text{Total Plays}} $$

Lower success rates allowed indicate better defensive performance. The league average success rate allowed is approximately 45-47%, meaning offenses gain positive EPA on a little under half of all plays. Elite defenses hold opponents below 42% success rate, while struggling defenses allow success rates above 50%. This might seem like a small difference, but over a full season (approximately 1,000 defensive plays), the gap between 42% and 50% amounts to 80 additional successful offensive plays (0.08 × 1,000), enough to materially affect win-loss records.

Success rate complements EPA by revealing defensive consistency. A defense might have good EPA per play by making big plays (sacks, turnovers, tackles for loss) while still allowing too many successful plays overall. Conversely, a defense might allow few successful plays but give up enormous EPA on the occasional explosive play. The best defenses excel at both: preventing successful plays consistently while also generating havoc and negative plays.

Consider the relationship between these metrics, using the negated EPA convention from the code above (positive EPA per play is good for the defense). A defense allowing a 45% success rate with +0.05 EPA per play is performing well on both dimensions: it limits how often the offense succeeds and how much damage those successes do. A defense allowing a 45% success rate with -0.05 EPA per play has a problem: it prevents success at an average rate, but the offense succeeds explosively when it does succeed. This defense needs to prevent big plays. A defense allowing a 50% success rate with +0.05 EPA per play has the opposite problem: the offense succeeds too often, even if not explosively. This defense needs to force more negative plays.
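
The three profiles described above can be sketched as a small classifier. This is an illustrative sketch, not part of the chapter's pipeline: the function name `classify_defense` and the default reference values (45% success rate, 0.0 EPA per play) are assumptions chosen to mirror the examples in the text, and EPA is taken in the negated, defense-perspective sign.

```python
def classify_defense(success_rate_allowed, epa_per_play,
                     avg_success_rate=0.45, avg_epa=0.0):
    """Label a defense relative to league-average reference points.

    epa_per_play is negated (positive = good for the defense).
    The default reference values are illustrative assumptions.
    """
    consistent = success_rate_allowed <= avg_success_rate
    impactful = epa_per_play >= avg_epa
    if consistent and impactful:
        return "strong on both dimensions"
    if consistent:
        return "limits successes but gives up big plays"
    if impactful:
        return "allows too many successes"
    return "weak on both dimensions"


# The three example defenses from the paragraph above
for sr, epa in [(0.45, 0.05), (0.45, -0.05), (0.50, 0.05)]:
    print(f"{sr:.0%} success, {epa:+.2f} EPA/play -> {classify_defense(sr, epa)}")
```

In practice the reference points would come from the league means in the `defensive_epa` table rather than fixed constants.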

When interpreting success rate allowed, remember that it is calculated per play, not per drive. A defense that allows the offense to succeed on 45% of plays will still force plenty of punts: if each play were an independent 55% chance of a stop, three consecutive stops would occur 0.55³ ≈ 16.6% of the time. This is a simplification, since a play with EPA ≤ 0 is not exactly the same as a failed series and plays are not independent, but it shows how modest advantages in per-play success rate compound into substantial drive-stopping ability.
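
The repeated-trials arithmetic is easy to check directly. The sketch below assumes every play is an independent stop with probability equal to one minus the success rate allowed, which real drives only approximate; the function name is hypothetical.

```python
def three_and_out_probability(success_rate_allowed, plays=3):
    """Chance of `plays` consecutive stops, assuming independent plays."""
    stop_rate = 1.0 - success_rate_allowed
    return stop_rate ** plays


# Compare an elite, an average, and a struggling defense
for rate in (0.42, 0.45, 0.50):
    prob = three_and_out_probability(rate)
    print(f"{rate:.0%} success allowed -> {prob:.1%} three-and-out")
```

Under these assumptions, moving from a 50% to a 42% success rate allowed raises the three-and-out chance from 12.5% to about 19.5%.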

#| label: fig-defensive-epa-viz-r
#| fig-cap: "Defensive EPA and Success Rate (2023 season)"
#| fig-width: 10
#| fig-height: 8
#| message: false
#| warning: false

library(nflplotR)

# Create scatter plot
defensive_epa %>%
  ggplot(aes(x = success_rate_allowed, y = epa_per_play)) +
  geom_hline(yintercept = mean(defensive_epa$epa_per_play),
             linetype = "dashed", color = "gray50", alpha = 0.7) +
  geom_vline(xintercept = mean(defensive_epa$success_rate_allowed),
             linetype = "dashed", color = "gray50", alpha = 0.7) +
  geom_nfl_logos(aes(team_abbr = defteam), width = 0.05, alpha = 0.8) +
  scale_x_reverse(labels = scales::percent_format()) +
  labs(
    title = "Defensive EPA vs Success Rate Allowed",
    subtitle = "2023 NFL Season | Top-right quadrant = Best defenses",
    x = "Success Rate Allowed (Lower is Better)",
    y = "EPA per Play (Higher is Better for Defense)",
    caption = "Data: nflfastR | EPA negated so positive = better defense"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(size = 11),
    panel.grid.minor = element_blank()
  )

#| label: fig-defensive-epa-viz-py
#| fig-cap: "Defensive EPA and Success Rate - Python (2023 season)"
#| fig-width: 10
#| fig-height: 8
#| message: false
#| warning: false

import matplotlib.pyplot as plt

# Create scatter plot
fig, ax = plt.subplots(figsize=(10, 8))

# Add quadrant lines
ax.axhline(y=defensive_epa['epa_per_play'].mean(),
           color='gray', linestyle='--', alpha=0.7, linewidth=1)
ax.axvline(x=defensive_epa['success_rate_allowed'].mean(),
           color='gray', linestyle='--', alpha=0.7, linewidth=1)

# Plot points
ax.scatter(defensive_epa['success_rate_allowed'],
          defensive_epa['epa_per_play'],
          s=200, alpha=0.6, c='#00BFC4', edgecolors='black', linewidth=1.5)

# Add team labels
for idx, row in defensive_epa.iterrows():
    ax.annotate(row['defteam'],
               (row['success_rate_allowed'], row['epa_per_play']),
               fontsize=8, ha='center', va='center', fontweight='bold')

ax.set_xlabel('Success Rate Allowed (Lower is Better)', fontsize=12)
ax.set_ylabel('EPA per Play (Higher is Better for Defense)', fontsize=12)
ax.set_title('Defensive EPA vs Success Rate Allowed\n2023 NFL Season | Top-right quadrant = Best defenses',
             fontsize=14, fontweight='bold', pad=20)
ax.invert_xaxis()  # Reverse x-axis so lower is to the right
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'{x:.0%}'))
ax.grid(alpha=0.3)

plt.text(0.98, 0.02, 'Data: nfl_data_py | EPA negated so positive = better defense',
         transform=ax.transAxes,
         ha='right', fontsize=8, style='italic')
plt.tight_layout()
plt.show()

Defensive Formations and Personnel

Base Defensive Formations

Modern defenses employ several base formations and sub-packages, distinguished by the number of defensive linemen, linebackers, and defensive backs on the field:

4-3 Defense (4 DL, 3 LB, 4 DB):
- Traditional defense emphasizing run stopping
- Four defensive linemen provide a strong pass rush
- Used more frequently on early downs

3-4 Defense (3 DL, 4 LB, 4 DB):
- Outside linebackers often function as pass rushers
- More flexible than 4-3, with versatile linebackers
- Can disguise blitzes and coverages effectively

Nickel Defense (4 DL, 2 LB, 5 DB):
- Most common "base" defense in modern NFL
- Extra defensive back to counter passing attacks
- Used on approximately 60% of defensive snaps

Dime Defense (4 DL, 1 LB, 6 DB):
- Used in obvious passing situations
- Maximum pass coverage personnel
- Vulnerable to runs but optimized for passing downs
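
To connect these package definitions to data, a personnel string can be mapped to a package name. The sketch below assumes strings shaped like `"4 DL, 2 LB, 5 DB"`; that format and the helper name `classify_package` are assumptions for illustration, and real data may contain other position labels that this simple parser ignores.

```python
import re


def classify_package(personnel):
    """Map a personnel string such as '4 DL, 2 LB, 5 DB' to a package name."""
    # Pull out the counts for the three position groups we care about
    counts = {pos: int(n) for n, pos in re.findall(r"(\d+)\s*(DL|LB|DB)", personnel)}
    dl, lb, db = counts.get("DL", 0), counts.get("LB", 0), counts.get("DB", 0)
    if db == 4:
        # Four defensive backs: a base front, named by its DL-LB split
        return {(4, 3): "4-3 base", (3, 4): "3-4 base"}.get((dl, lb), "other base")
    if db == 5:
        return "nickel"
    if db == 6:
        return "dime"
    return "other"


for example in ["4 DL, 3 LB, 4 DB", "3 DL, 4 LB, 4 DB", "4 DL, 2 LB, 5 DB", "4 DL, 1 LB, 6 DB"]:
    print(f"{example} -> {classify_package(example)}")
```

Classifying by defensive back count means a 2-4-5 or 3-3-5 grouping is also labeled nickel, which matches how the term is generally used.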

#| label: defensive-personnel-r
#| message: false
#| warning: false

# Analyze defensive personnel usage
# Note: defense_personnel comes from participation data, so this assumes
# pbp_2023 already has those columns joined in
defensive_personnel <- pbp_2023 %>%
  filter(!is.na(defteam), !is.na(defense_personnel), play_type %in% c("pass", "run")) %>%
  group_by(defense_personnel) %>%
  summarise(
    plays = n(),
    epa_allowed = -mean(epa, na.rm = TRUE),  # Negated so higher is better for the defense
    success_rate_allowed = mean(epa > 0, na.rm = TRUE),
    yards_per_play = mean(yards_gained, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  # Share of all filtered plays; computed before head() so the denominator is complete
  mutate(pct_of_plays = plays / sum(plays), .after = plays) %>%
  arrange(desc(plays)) %>%
  head(10)

# Display results
defensive_personnel %>%
  gt() %>%
  cols_label(
    defense_personnel = "Personnel",
    plays = "Plays",
    pct_of_plays = "% of Snaps",
    epa_allowed = "EPA/Play",
    success_rate_allowed = "Success Rate",
    yards_per_play = "YPP"
  ) %>%
  fmt_percent(
    columns = c(pct_of_plays, success_rate_allowed),
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(epa_allowed, yards_per_play),
    decimals = 2
  ) %>%
  fmt_number(
    columns = plays,
    decimals = 0,
    use_seps = TRUE
  ) %>%
  tab_header(
    title = "Defensive Personnel Usage and Effectiveness",
    subtitle = "2023 NFL Season - Top 10 Most Common"
  )

#| label: defensive-personnel-py
#| message: false
#| warning: false

# Analyze defensive personnel usage
personnel_data = pbp_2023.query(
    "defteam.notna() & defense_personnel.notna() & play_type.isin(['pass', 'run'])"
).copy()

total_plays = len(personnel_data)

defensive_personnel = (personnel_data
    .groupby('defense_personnel')
    .agg(
        plays=('play_id', 'count'),
        epa_allowed=('epa', lambda x: -x.mean()),
        success_rate_allowed=('epa', lambda x: (x > 0).mean()),
        yards_per_play=('yards_gained', 'mean')
    )
    .reset_index()
)

defensive_personnel['pct_of_plays'] = defensive_personnel['plays'] / total_plays

# Sort and display top 10
personnel_top10 = (defensive_personnel
    .sort_values('plays', ascending=False)
    .head(10)
)

print("\nDefensive Personnel Usage and Effectiveness - 2023 NFL Season")
print("Top 10 Most Common")
print("=" * 90)
print(personnel_top10.to_string(index=False))

Coverage Schemes: Two-High vs Single-High Safety

One of the most important defensive concepts in modern football is the safety alignment:

Single-High Safety (Cover 1, Cover 3):
- One deep safety in the middle of the field
- Allows for more defenders in the box (run support)
- More aggressive, but vulnerable to deep passes down the sidelines
- Better against run-heavy offenses

Two-High Safety (Cover 2, Cover 4, Cover 6):
- Two deep safeties splitting the field
- Better protection against deep passes
- Fewer defenders in the box (vulnerable to runs)
- Forces offenses to be patient and take underneath throws

#| label: safety-coverage-r
#| message: false
#| warning: false

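# Analyze defensive performance by box count, used here as a rough proxy for
# safety shell (heavier boxes usually mean single-high; lighter boxes, two-high).
# Filter on defenders_in_box so missing counts do not fall into the catch-all bucket.
safety_coverage <- pbp_2023 %>%
  filter(!is.na(defteam), !is.na(defenders_in_box), play_type %in% c("pass", "run")) %>%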
  mutate(
    coverage_shell = case_when(
      defenders_in_box >= 8 ~ "Heavy Box (8+)",
      defenders_in_box == 7 ~ "Standard Box (7)",
      defenders_in_box == 6 ~ "Light Box (6)",
      TRUE ~ "Very Light Box (<6)"
    )
  ) %>%
  group_by(coverage_shell, play_type) %>%
  summarise(
    plays = n(),
    epa_allowed = -mean(epa, na.rm = TRUE),
    success_rate_allowed = mean(epa > 0, na.rm = TRUE),
    yards_per_play = mean(yards_gained, na.rm = TRUE),
    explosive_rate = mean(
      (play_type == "pass" & yards_gained >= 20) |
      (play_type == "run" & yards_gained >= 10),
      na.rm = TRUE
    ),
    .groups = "drop"
  ) %>%
  arrange(coverage_shell, play_type)

# Display results
safety_coverage %>%
  gt() %>%
  cols_label(
    coverage_shell = "Box Configuration",
    play_type = "Play Type",
    plays = "Plays",
    epa_allowed = "EPA/Play",
    success_rate_allowed = "Success %",
    yards_per_play = "YPP",
    explosive_rate = "Explosive %"
  ) %>%
  fmt_number(
    columns = c(epa_allowed, yards_per_play),
    decimals = 2
  ) %>%
  fmt_percent(
    columns = c(success_rate_allowed, explosive_rate),
    decimals = 1
  ) %>%
  fmt_number(
    columns = plays,
    decimals = 0,
    use_seps = TRUE
  ) %>%
  tab_header(
    title = "Defensive Performance by Box Configuration",
    subtitle = "2023 NFL Season - By Play Type"
  )

#| label: safety-coverage-py
#| message: false
#| warning: false

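# Analyze defensive performance by box configuration, used here as a rough proxy
# for safety shell. Filter on defenders_in_box so missing counts are not
# mislabeled as "Very Light Box (<6)" by categorize_box.
safety_data = pbp_2023.query(
    "defteam.notna() & defenders_in_box.notna() & play_type.isin(['pass', 'run'])"
).copy()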

def categorize_box(defenders):
    if defenders >= 8:
        return "Heavy Box (8+)"
    elif defenders == 7:
        return "Standard Box (7)"
    elif defenders == 6:
        return "Light Box (6)"
    else:
        return "Very Light Box (<6)"

safety_data['coverage_shell'] = safety_data['defenders_in_box'].apply(categorize_box)

# Calculate explosive plays
safety_data['is_explosive'] = (
    ((safety_data['play_type'] == 'pass') & (safety_data['yards_gained'] >= 20)) |
    ((safety_data['play_type'] == 'run') & (safety_data['yards_gained'] >= 10))
).astype(int)

safety_coverage = (safety_data
    .groupby(['coverage_shell', 'play_type'])
    .agg(
        plays=('play_id', 'count'),
        epa_allowed=('epa', lambda x: -x.mean()),
        success_rate_allowed=('epa', lambda x: (x > 0).mean()),
        yards_per_play=('yards_gained', 'mean'),
        explosive_rate=('is_explosive', 'mean')
    )
    .reset_index()
)

print("\nDefensive Performance by Box Configuration - 2023 NFL Season")
print("By Play Type")
print("=" * 100)
print(safety_coverage.to_string(index=False))

Explosiveness Allowed

Explosive plays allowed are among the most damaging outcomes for a defense, and preventing them is critical to defensive success. Explosive play rate allowed is often cited as one of the strongest indicators of defensive performance, in some analyses even stronger than overall EPA or yards per play. This makes intuitive sense: a single 60-yard touchdown pass can undo an entire drive's worth of good defensive plays.

In football analytics, we typically define explosive plays as gains of 20+ yards on passes and 10+ yards on runs. These thresholds capture plays that fundamentally change field position and expected points. A 20-yard pass gain typically adds 1.5-2.5 EPA depending on field position, while a typical successful pass adds only 0.3-0.5 EPA. The explosive play is worth 3-5 times as much as a normal successful play. Similarly, a 10-yard run is worth about 1.0-1.5 EPA, compared to 0.2-0.3 EPA for a typical successful run.

The impact of explosive plays goes beyond their immediate EPA value. Explosive plays demoralize defenses, energize offenses, shift momentum, and often lead to scoring drives even if the explosive play itself doesn't score. Conversely, defenses that prevent explosive plays force offenses to execute long, methodical drives where each play must succeed—a much more difficult task that gives the defense multiple opportunities to create negative plays or force turnovers.

Elite defenses typically allow explosive plays on fewer than 8% of plays, with the very best defenses staying below 6%. Struggling defenses allow explosive plays on 12% or more of plays. Consider the cumulative impact: over 1,000 defensive plays per season, the difference between a 6% and a 12% explosive play rate is 60 additional explosive plays allowed. At 1-2 EPA each, that totals approximately 60-120 EPA over the season. Because EPA is measured in points, this is on the order of 60-120 points, or roughly 3.5-7 points per game across a 17-game schedule. That single metric difference can be the margin between a playoff berth and missing the postseason.
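
The cumulative cost can be worked through in a few lines. This is a back-of-the-envelope sketch: the function name, the 1-2 EPA range per explosive play, and the 17-game schedule are stated assumptions, and it treats each explosive play's EPA as entirely additional rather than netting out the play it replaces.

```python
def explosive_play_cost(plays, low_rate, high_rate,
                        epa_per_explosive=(1.0, 2.0), games=17):
    """Season cost of a higher explosive-play rate, as low/high EPA bounds."""
    extra = plays * (high_rate - low_rate)
    epa_low, epa_high = (extra * value for value in epa_per_explosive)
    return {
        "extra_explosive_plays": extra,
        "season_epa": (epa_low, epa_high),
        "points_per_game": (epa_low / games, epa_high / games),
    }


cost = explosive_play_cost(plays=1000, low_rate=0.06, high_rate=0.12)
print(f"Extra explosive plays: {cost['extra_explosive_plays']:.0f}")
print("Season EPA: {:.0f} to {:.0f}".format(*cost["season_epa"]))
print("Points per game: {:.1f} to {:.1f}".format(*cost["points_per_game"]))
```

Since EPA is expressed in points, the season totals and the per-game figures are in the same unit.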

Explosive plays allowed vary dramatically by play type and coverage strategy. Defenses using two-high safety shells (Cover 2, Cover 4) typically allow fewer explosive passes but more explosive runs, since two safeties deep leaves fewer defenders in the box. Defenses using single-high looks (Cover 1, Cover 3) can stop the run better with more box defenders but are vulnerable to explosive passes down the sidelines. The best defensive coordinators adjust their coverage based on opponent tendencies and game situation.

Measuring Explosive Plays Allowed

In the following analysis, we'll calculate explosive play rates allowed for each defense, examining both overall explosiveness and the pass-run split. We'll identify which defenses excel at preventing big plays and which defenses are vulnerable. This metric is particularly useful for game-planning: if you're facing a defense that struggles to prevent explosive passes, an aggressive vertical passing attack might be optimal even if your offense typically prefers a conservative approach.

#| label: explosive-defense-r
#| message: false
#| warning: false

# Calculate explosive plays allowed
explosive_defense <- pbp_2023 %>%
  filter(!is.na(defteam), play_type %in% c("pass", "run")) %>%
  mutate(
    explosive = case_when(
      play_type == "pass" & yards_gained >= 20 ~ 1,
      play_type == "run" & yards_gained >= 10 ~ 1,
      TRUE ~ 0
    ),
    explosive_pass = if_else(play_type == "pass" & yards_gained >= 20, 1, 0),
    explosive_run = if_else(play_type == "run" & yards_gained >= 10, 1, 0)
  ) %>%
  group_by(defteam) %>%
  summarise(
    total_plays = n(),
    explosive_plays_allowed = sum(explosive),
    explosive_rate_allowed = explosive_plays_allowed / total_plays,
    pass_plays = sum(play_type == "pass"),
    explosive_pass_allowed = sum(explosive_pass),
    explosive_pass_rate = explosive_pass_allowed / pass_plays,
    run_plays = sum(play_type == "run"),
    explosive_run_allowed = sum(explosive_run),
    explosive_run_rate = explosive_run_allowed / run_plays,
    .groups = "drop"
  ) %>%
  arrange(explosive_rate_allowed)

# Display best defenses at preventing explosive plays
explosive_defense %>%
  head(10) %>%
  gt() %>%
  cols_label(
    defteam = "Team",
    total_plays = "Plays",
    explosive_plays_allowed = "Explosive",
    explosive_rate_allowed = "Overall %",
    explosive_pass_rate = "Pass %",
    explosive_run_rate = "Run %"
  ) %>%
  fmt_percent(
    columns = c(explosive_rate_allowed, explosive_pass_rate, explosive_run_rate),
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(total_plays, explosive_plays_allowed),
    decimals = 0,
    use_seps = TRUE
  ) %>%
  tab_header(
    title = "Best Defenses at Preventing Explosive Plays",
    subtitle = "2023 NFL Season - Pass: 20+ yds, Run: 10+ yds"
  )

#| label: explosive-defense-py
#| message: false
#| warning: false

# Calculate explosive plays allowed
explosive_def_data = pbp_2023.query(
    "defteam.notna() & play_type.isin(['pass', 'run'])"
).copy()

# Define explosive plays with vectorized comparisons (faster than a row-wise apply)
explosive_def_data['explosive_pass'] = ((explosive_def_data['play_type'] == 'pass') &
                                         (explosive_def_data['yards_gained'] >= 20)).astype(int)
explosive_def_data['explosive_run'] = ((explosive_def_data['play_type'] == 'run') &
                                        (explosive_def_data['yards_gained'] >= 10)).astype(int)
# A play is either a pass or a run, so the two flags never overlap
explosive_def_data['explosive'] = (explosive_def_data['explosive_pass'] +
                                   explosive_def_data['explosive_run'])

explosive_defense = (explosive_def_data
    .groupby('defteam')
    .agg(
        total_plays=('play_id', 'count'),
        explosive_plays_allowed=('explosive', 'sum'),
        pass_plays=('play_type', lambda x: (x == 'pass').sum()),
        explosive_pass_allowed=('explosive_pass', 'sum'),
        run_plays=('play_type', lambda x: (x == 'run').sum()),
        explosive_run_allowed=('explosive_run', 'sum')
    )
    .reset_index()
)

explosive_defense['explosive_rate_allowed'] = explosive_defense['explosive_plays_allowed'] / explosive_defense['total_plays']
explosive_defense['explosive_pass_rate'] = explosive_defense['explosive_pass_allowed'] / explosive_defense['pass_plays']
explosive_defense['explosive_run_rate'] = explosive_defense['explosive_run_allowed'] / explosive_defense['run_plays']

# Display top 10 (lowest rates)
explosive_best = (explosive_defense
    .sort_values('explosive_rate_allowed')
    .head(10)
    [['defteam', 'total_plays', 'explosive_plays_allowed', 'explosive_rate_allowed',
      'explosive_pass_rate', 'explosive_run_rate']]
)

print("\nBest Defenses at Preventing Explosive Plays - 2023 NFL Season")
print("Pass: 20+ yds, Run: 10+ yds")
print("=" * 90)
print(explosive_best.to_string(index=False))

Visualizing Explosive Plays Allowed

#| label: fig-explosive-defense-r
#| fig-cap: "Explosive plays allowed by defense (2023 season)"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false

# Prepare data for visualization
explosive_viz <- explosive_defense %>%
  select(defteam, explosive_pass_rate, explosive_run_rate) %>%
  pivot_longer(
    cols = c(explosive_pass_rate, explosive_run_rate),
    names_to = "type",
    values_to = "rate"
  ) %>%
  mutate(
    type = if_else(type == "explosive_pass_rate", "Pass (20+ yds)", "Run (10+ yds)")
  )

# Create plot
ggplot(explosive_viz, aes(x = reorder(defteam, -rate), y = rate, fill = type)) +
  geom_col(position = "dodge", alpha = 0.8) +
  coord_flip() +
  scale_y_continuous(labels = scales::percent_format(), breaks = seq(0, 0.20, 0.05)) +
  scale_fill_manual(values = c("Pass (20+ yds)" = "#F8766D", "Run (10+ yds)" = "#00BFC4")) +
  labs(
    title = "Explosive Plays Allowed by Defense",
    subtitle = "2023 NFL Season | Pass: 20+ yards, Run: 10+ yards | Lower is Better",
    x = "Team",
    y = "Explosive Play Rate Allowed",
    fill = "Play Type",
    caption = "Data: nflfastR"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    plot.subtitle = element_text(size = 12),
    axis.text = element_text(size = 9),
    legend.position = "top"
  )

#| label: fig-explosive-defense-py
#| fig-cap: "Explosive plays allowed by defense - Python (2023 season)"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false

import matplotlib.pyplot as plt

# Prepare data for visualization
explosive_viz = explosive_defense.copy()
explosive_viz = explosive_viz.sort_values('explosive_rate_allowed', ascending=False)

# Create plot
fig, ax = plt.subplots(figsize=(12, 8))

x = range(len(explosive_viz))
width = 0.35

bars1 = ax.barh([i - width/2 for i in x], explosive_viz['explosive_pass_rate'],
                width, label='Pass (20+ yds)', color='#F8766D', alpha=0.8)
bars2 = ax.barh([i + width/2 for i in x], explosive_viz['explosive_run_rate'],
                width, label='Run (10+ yds)', color='#00BFC4', alpha=0.8)

ax.set_yticks(x)
ax.set_yticklabels(explosive_viz['defteam'], fontsize=9)
ax.set_xlabel('Explosive Play Rate Allowed', fontsize=12)
ax.set_ylabel('Team', fontsize=12)
ax.set_title('Explosive Plays Allowed by Defense\n2023 NFL Season | Pass: 20+ yards, Run: 10+ yards | Lower is Better',
             fontsize=16, fontweight='bold', pad=20)
ax.legend(title='Play Type', loc='lower right', fontsize=10)
ax.set_xlim(0, 0.20)
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'{x:.0%}'))
ax.grid(axis='x', alpha=0.3)

plt.text(0.98, 0.02, 'Data: nfl_data_py',
         transform=ax.transAxes,
         ha='right', fontsize=8, style='italic')
plt.tight_layout()
plt.show()

Red Zone Defense

The red zone (the area inside the defense's own 20-yard line) is where defenses must prevent touchdowns and force field goals. Red zone defense is one of the most critical aspects of defensive performance, often determining the difference between winning and losing close games. The compressed field fundamentally changes offensive and defensive strategy: routes are shorter, windows are tighter, and each yard gained or denied has a larger effect on the likelihood of a touchdown.

From an EPA perspective, red zone plays are enormously high-leverage. A touchdown from the 10-yard line is worth about 7 points but adds only 1-2 EPA, because the offense already had 5-6 expected points from that field position. However, preventing that touchdown and forcing a field goal costs the offense 4 points, a swing that can move win probability substantially in a close game. This is why red zone defense is often called "championship defense": teams that excel in this area tend to win close games.
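
A short calculation makes the touchdown-versus-field-goal swing concrete. This sketch treats EPA on a scoring play as points scored minus expected points before the play; the value of 5.0 expected points is an assumed figure within the range quoted above, and the helper name is hypothetical. It ignores extra-point variance and the value of the ensuing kickoff.

```python
def epa_of_outcome(points_scored, expected_points_before):
    """EPA of a drive-ending score: actual points minus prior expectation."""
    return points_scored - expected_points_before


ep_before = 5.0  # assumed expected points for the offense near the 10-yard line

td_epa = epa_of_outcome(7, ep_before)  # touchdown with extra point
fg_epa = epa_of_outcome(3, ep_before)  # field goal instead
swing = td_epa - fg_epa                # what the defense saves by forcing the kick

print(f"TD EPA: {td_epa:+.1f}, FG EPA: {fg_epa:+.1f}, swing: {swing:.1f} points")
```

The swing equals 4 points whatever expected-points value is assumed, because the prior expectation cancels out of the difference.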

Red zone defense differs from normal defense in several key ways. First, the field is compressed, eliminating deep routes and forcing offenses to attack horizontally or vertically in short spaces. This favors defenses that can match up well man-to-man and generate pressure without blitzing. Second, down-and-distance matters even more—third-and-goal from the 8 is very different from third-and-goal from the 1. Third, the value of stopping a touchdown versus allowing a field goal is enormous, making red zone touchdown percentage one of the most important defensive metrics.

Elite red zone defenses typically allow touchdowns on fewer than 50% of red zone possessions, with the best defenses staying below 45%. Poor red zone defenses allow touchdowns on 60% or more of red zone trips. The difference between these rates (15 percentage points over approximately 50 red zone drives per season) is 7-8 touchdowns prevented, worth approximately 28-32 points (assuming they become field goals instead), or roughly one win under the common rule of thumb that 30-40 points of point differential equal one win.
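
The season-level arithmetic can be sketched as follows. The function name is hypothetical, the 4 points saved per touchdown assumes every prevented touchdown becomes a made field goal, and the 35 points-per-win conversion is an assumed rule-of-thumb value rather than a figure from this chapter.

```python
def red_zone_value(drives, td_rate_good, td_rate_poor,
                   points_per_td_saved=4.0, points_per_win=35.0):
    """Touchdowns prevented, points saved, and approximate wins added."""
    tds_prevented = drives * (td_rate_poor - td_rate_good)
    points_saved = tds_prevented * points_per_td_saved
    wins_added = points_saved / points_per_win
    return tds_prevented, points_saved, wins_added


tds, points, wins = red_zone_value(drives=50, td_rate_good=0.45, td_rate_poor=0.60)
print(f"TDs prevented: {tds:.1f}, points saved: {points:.0f}, wins added: {wins:.2f}")
```

Changing `points_per_win` shows how sensitive the wins estimate is to the conversion chosen.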

Interestingly, red zone defense is one of the least stable defensive metrics year-to-year, suggesting it contains significant randomness or is highly dependent on personnel matchups. A team with an excellent red zone defense one year often regresses toward average the next year. This instability makes red zone defense both critically important (high leverage) and difficult to predict (high variance).

Red Zone Metrics

In the following analysis, we'll examine red zone defensive performance by calculating touchdown rates allowed, field goal rates allowed, and points per red zone trip. We'll identify which defenses excel in this critical area and which defenses struggle to prevent scores when opponents reach scoring position. Pay particular attention to the touchdown rate allowed—this is often the single most important red zone metric.

Key Insight: Red Zone Defense Wins Championships

Teams that rank in the top 10 in red zone touchdown percentage allowed make the playoffs at nearly double the rate of teams ranking in the bottom 10. In close games (decided by 7 points or fewer), elite red zone defenses win approximately 60% of the time, while poor red zone defenses win only 35% of the time.

#| label: red-zone-defense-r
#| message: false
#| warning: false

# Calculate red zone defense
red_zone_defense <- pbp_2023 %>%
  filter(!is.na(defteam), yardline_100 <= 20, down %in% c(1, 2, 3, 4)) %>%
  # fixed_drive restarts at 1 in every game, so game_id is needed to identify a drive
  group_by(game_id, defteam, fixed_drive) %>%
  slice(1) %>%  # First play of each red zone drive
  ungroup() %>%
  group_by(defteam) %>%
  summarise(
    red_zone_drives = n(),
    touchdowns_allowed = sum(fixed_drive_result == "Touchdown", na.rm = TRUE),
    field_goals_allowed = sum(fixed_drive_result == "Field goal", na.rm = TRUE),
    stops = red_zone_drives - touchdowns_allowed - field_goals_allowed,
    td_rate_allowed = touchdowns_allowed / red_zone_drives,
    scoring_rate_allowed = (touchdowns_allowed + field_goals_allowed) / red_zone_drives,
    points_per_trip = (touchdowns_allowed * 7 + field_goals_allowed * 3) / red_zone_drives,
    .groups = "drop"
  ) %>%
  arrange(td_rate_allowed)

# Display best red zone defenses
red_zone_defense %>%
  head(10) %>%
  gt() %>%
  cols_label(
    defteam = "Team",
    red_zone_drives = "RZ Drives",
    touchdowns_allowed = "TDs",
    field_goals_allowed = "FGs",
    stops = "Stops",
    td_rate_allowed = "TD %",
    scoring_rate_allowed = "Score %",
    points_per_trip = "Pts/Trip"
  ) %>%
  fmt_percent(
    columns = c(td_rate_allowed, scoring_rate_allowed),
    decimals = 1
  ) %>%
  fmt_number(
    columns = points_per_trip,
    decimals = 2
  ) %>%
  fmt_number(
    columns = c(red_zone_drives, touchdowns_allowed, field_goals_allowed, stops),
    decimals = 0
  ) %>%
  tab_header(
    title = "Best Red Zone Defenses",
    subtitle = "2023 NFL Season - Ranked by TD % Allowed"
  )

#| label: red-zone-defense-py
#| message: false
#| warning: false

# Calculate red zone defense
rz_def_data = pbp_2023.query(
    "defteam.notna() & yardline_100 <= 20 & down.isin([1, 2, 3, 4])"
).copy()

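# Get first play of each red zone drive
# fixed_drive restarts at 1 in every game, so game_id is needed to identify a drive
rz_def_drives = (rz_def_data
    .sort_values(['game_id', 'defteam', 'fixed_drive', 'play_id'])
    .groupby(['game_id', 'defteam', 'fixed_drive'])
    .first()
    .reset_index()
)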

red_zone_defense = (rz_def_drives
    .groupby('defteam')
    .agg(
        red_zone_drives=('fixed_drive', 'count'),
        touchdowns_allowed=('fixed_drive_result', lambda x: (x == 'Touchdown').sum()),
        field_goals_allowed=('fixed_drive_result', lambda x: (x == 'Field goal').sum())
    )
    .reset_index()
)

red_zone_defense['stops'] = (red_zone_defense['red_zone_drives'] -
                              red_zone_defense['touchdowns_allowed'] -
                              red_zone_defense['field_goals_allowed'])
red_zone_defense['td_rate_allowed'] = red_zone_defense['touchdowns_allowed'] / red_zone_defense['red_zone_drives']
red_zone_defense['scoring_rate_allowed'] = ((red_zone_defense['touchdowns_allowed'] +
                                              red_zone_defense['field_goals_allowed']) /
                                             red_zone_defense['red_zone_drives'])
red_zone_defense['points_per_trip'] = ((red_zone_defense['touchdowns_allowed'] * 7 +
                                        red_zone_defense['field_goals_allowed'] * 3) /
                                       red_zone_defense['red_zone_drives'])

# Display top 10 (lowest TD rate)
rz_def_best = (red_zone_defense
    .sort_values('td_rate_allowed')
    .head(10)
)

print("\nBest Red Zone Defenses - 2023 NFL Season")
print("Ranked by TD % Allowed")
print("=" * 90)
print(rz_def_best.to_string(index=False))

Red Zone Defense Visualization

#| label: fig-red-zone-defense-r
#| fig-cap: "Red zone outcomes allowed by defense (2023 season)"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false

# Create visualization
red_zone_defense %>%
  mutate(
    td_pct = td_rate_allowed * 100,
    fg_pct = (field_goals_allowed / red_zone_drives) * 100,
    stop_pct = (stops / red_zone_drives) * 100,
    # Order teams by stop rate here, while there is still one row per team
    defteam = reorder(defteam, stop_pct)
  ) %>%
  select(defteam, td_pct, fg_pct, stop_pct) %>%
  pivot_longer(
    cols = c(td_pct, fg_pct, stop_pct),
    names_to = "outcome",
    values_to = "percentage"
  ) %>%
  mutate(
    outcome = factor(outcome,
                    levels = c("td_pct", "fg_pct", "stop_pct"),
                    labels = c("Touchdown", "Field Goal", "Stop"))
  ) %>%
  ggplot(aes(x = defteam, y = percentage, fill = outcome)) +
  geom_col() +
  coord_flip() +
  scale_fill_manual(
    values = c("Stop" = "#4CAF50", "Field Goal" = "#FF9800", "Touchdown" = "#F44336")
  ) +
  labs(
    title = "Red Zone Outcomes Allowed by Defense",
    subtitle = "2023 NFL Season - Percentage of Red Zone Drives",
    x = "Team",
    y = "Percentage of Red Zone Drives",
    fill = "Outcome",
    caption = "Data: nflfastR"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    plot.subtitle = element_text(size = 12),
    axis.text = element_text(size = 9),
    legend.position = "top"
  )

#| label: fig-red-zone-defense-py
#| fig-cap: "Red zone outcomes allowed by defense - Python (2023 season)"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false

# Prepare data for stacked bar chart
rz_viz = red_zone_defense.copy()
rz_viz['td_pct'] = (rz_viz['touchdowns_allowed'] / rz_viz['red_zone_drives']) * 100
rz_viz['fg_pct'] = (rz_viz['field_goals_allowed'] / rz_viz['red_zone_drives']) * 100
rz_viz['stop_pct'] = (rz_viz['stops'] / rz_viz['red_zone_drives']) * 100

rz_viz = rz_viz.sort_values('stop_pct', ascending=False)

# Create stacked bar chart
fig, ax = plt.subplots(figsize=(12, 8))

teams = rz_viz['defteam']
y_pos = range(len(teams))

ax.barh(y_pos, rz_viz['stop_pct'], label='Stop', color='#4CAF50')
ax.barh(y_pos, rz_viz['fg_pct'], left=rz_viz['stop_pct'],
        label='Field Goal', color='#FF9800')
ax.barh(y_pos, rz_viz['td_pct'],
        left=rz_viz['stop_pct'] + rz_viz['fg_pct'],
        label='Touchdown', color='#F44336')

ax.set_yticks(y_pos)
ax.set_yticklabels(teams, fontsize=9)
ax.set_xlabel('Percentage of Red Zone Drives', fontsize=12)
ax.set_ylabel('Team', fontsize=12)
ax.set_title('Red Zone Outcomes Allowed by Defense\n2023 NFL Season - Percentage of Red Zone Drives',
             fontsize=16, fontweight='bold', pad=20)
ax.legend(title='Outcome', loc='lower right', fontsize=10)
ax.set_xlim(0, 100)
ax.grid(axis='x', alpha=0.3)

plt.text(0.98, 0.02, 'Data: nfl_data_py',
         transform=ax.transAxes,
         ha='right', fontsize=8, style='italic')
plt.tight_layout()
plt.show()

Third Down Defense

Third down defense is critical for getting off the field and preventing sustained drives. Forcing a punt with a third down stop is a key defensive objective and one of the most closely watched indicators of overall defensive quality. Third down is the "money down" for defenses: it is usually their last chance to force a punt and get their own offense back on the field. Success on third down determines whether a drive continues or ends.

The importance of third down defense is hard to overstate. Consider that the average NFL drive includes 2-3 third downs. A defense that turns third-and-long situations into stops at a high rate will face far fewer total plays, which limits defensive fatigue and gives the offense fewer chances to score. Conversely, a defense that allows frequent third down conversions will face marathon drives that drain the clock and increase the likelihood of allowing points.

Third down stop rate varies enormously by distance to the first down marker. Third-and-2 is dramatically different from third-and-12, and EPA-based analysis has shown these situations have fundamentally different characteristics. On third-and-short (1-3 yards), offenses convert approximately 60-65% of the time, while on third-and-long (10+ yards), conversion rates drop to 20-30%. Elite defenses excel particularly on third-and-medium (4-6 yards), where defensive scheme and execution matter most.
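To check how an individual defense performs at each distance, stop rate can be split by team and distance bucket. The following is a minimal sketch, assuming a play-by-play DataFrame with nflfastR-style columns `defteam`, `ydstogo`, and `third_down_converted`, already filtered to third-down pass and run plays. The function name and the `DISTANCE_LABELS` constant are hypothetical, and the bin edges follow the four categories used by the code in this section.

```python
import pandas as pd

# Hypothetical label list; bin edges follow the categories used in this section's code
DISTANCE_LABELS = [
    "Short (1-3 yds)",
    "Medium (4-6 yds)",
    "Long (7-10 yds)",
    "Very Long (11+ yds)",
]


def stop_rate_by_team_and_distance(third_downs: pd.DataFrame) -> pd.DataFrame:
    """Third-down stop rate for each defense, split by yards to go."""
    binned = third_downs.assign(
        # Right-inclusive bins: (-inf, 3], (3, 6], (6, 10], (10, inf)
        distance_category=pd.cut(
            third_downs["ydstogo"],
            bins=[-float("inf"), 3, 6, 10, float("inf")],
            labels=DISTANCE_LABELS,
        ).astype(str),
        # A stop is any third down that was not converted
        stop=(third_downs["third_down_converted"] == 0).astype(int),
    )
    rates = (
        binned.groupby(["defteam", "distance_category"])["stop"]
        .mean()
        .unstack()
    )
    # Put the columns in football order; buckets with no plays show as NaN
    return rates.reindex(columns=DISTANCE_LABELS)
```

Each row is a defense and each column a distance bucket, so a team that is strong on third-and-medium but weak on third-and-short stands out at a glance. Buckets with only a handful of plays should be read with caution.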

One fascinating aspect of third down defense is the trade-off between yards allowed and conversion prevention. Some defenses play conservatively, conceding short completions while preventing the first down. Others play aggressively, trying to generate negative plays even at the risk of allowing conversions on blown coverages. The optimal strategy depends on field position, score, and time remaining. We'll explore these strategic considerations in depth in Chapter 16 on defensive decision-making.

From an analytics perspective, third down EPA is more valuable than third down conversion rate, because it accounts for the magnitude of success or failure. Allowing a third-and-15 conversion that gains exactly 15 yards is bad, but allowing a 45-yard touchdown on third-and-15 is catastrophic. EPA captures this distinction while conversion rate does not. However, both metrics provide useful information: conversion rate measures consistency, while EPA measures impact.
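A small worked example makes the distinction concrete. This is a sketch on made-up numbers: the two defenses `AAA` and `BBB` are hypothetical, and `epa` is assumed to be recorded from the offense's perspective, as in nflfastR-style data. Both defenses allow one conversion in four third downs, but one of them gives up a long touchdown on its conversion.

```python
import pandas as pd

# Toy third-down plays for two hypothetical defenses (all values are made up).
# `epa` is from the offense's perspective.
plays = pd.DataFrame({
    "defteam": ["AAA"] * 4 + ["BBB"] * 4,
    "third_down_converted": [1, 0, 0, 0, 1, 0, 0, 0],
    # AAA's conversion barely moves the chains; BBB's is a long touchdown
    "epa": [1.5, -1.0, -1.2, -0.8, 5.5, -1.0, -1.2, -0.8],
})

summary = plays.groupby("defteam").agg(
    conversion_rate_allowed=("third_down_converted", "mean"),
    # Negate so that higher values are better for the defense
    def_epa_per_play=("epa", lambda x: -x.mean()),
)
print(summary)
```

The conversion rate allowed is identical for both teams, while the defensive EPA per play separates them, which is the gap between consistency and impact described above.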

Understanding Third Down Success

A "successful" third down for the defense is any play that does not result in a first down. This includes incomplete passes, runs short of the marker, sacks, and turnovers. From an EPA perspective, even allowing exactly enough yards for a first down is a marginal failure, while forcing a turnover or sack is an enormous success worth 2-4 EPA.

#| label: third-down-defense-r
#| message: false
#| warning: false

# Calculate third down defense
third_down_defense <- pbp_2023 %>%
  filter(!is.na(defteam), down == 3, play_type %in% c("pass", "run")) %>%
  mutate(
    distance_category = case_when(
      ydstogo <= 3 ~ "Short (1-3 yds)",
      ydstogo <= 6 ~ "Medium (4-6 yds)",
      ydstogo <= 10 ~ "Long (7-10 yds)",
      TRUE ~ "Very Long (11+ yds)"
    )
  ) %>%
  group_by(defteam) %>%
  summarise(
    third_downs = n(),
    stops = sum(third_down_converted == 0, na.rm = TRUE),
    conversions_allowed = sum(third_down_converted == 1, na.rm = TRUE),
    stop_rate = stops / third_downs,
    avg_distance = mean(ydstogo, na.rm = TRUE),
    avg_epa = -mean(epa, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(stop_rate))

# Display results
third_down_defense %>%
  head(10) %>%
  gt() %>%
  cols_label(
    defteam = "Team",
    third_downs = "3rd Downs",
    stops = "Stops",
    conversions_allowed = "Conv Allowed",
    stop_rate = "Stop %",
    avg_distance = "Avg Distance",
    avg_epa = "Avg EPA"
  ) %>%
  fmt_percent(
    columns = stop_rate,
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(avg_distance, avg_epa),
    decimals = 2
  ) %>%
  fmt_number(
    columns = c(third_downs, stops, conversions_allowed),
    decimals = 0,
    use_seps = TRUE
  ) %>%
  tab_header(
    title = "Best Third Down Defenses",
    subtitle = "2023 NFL Season - Highest Stop Rate"
  )

#| label: third-down-defense-py
#| message: false
#| warning: false

# Calculate third down defense
third_down_def_data = pbp_2023.query(
    "defteam.notna() & down == 3 & play_type.isin(['pass', 'run'])"
).copy()

third_down_defense = (third_down_def_data
    .groupby('defteam')
    .agg(
        third_downs=('play_id', 'count'),
        stops=('third_down_converted', lambda x: (x == 0).sum()),
        conversions_allowed=('third_down_converted', 'sum'),
        avg_distance=('ydstogo', 'mean'),
        avg_epa=('epa', lambda x: -x.mean())
    )
    .reset_index()
)

third_down_defense['stop_rate'] = third_down_defense['stops'] / third_down_defense['third_downs']

# Display top 10
third_down_best = (third_down_defense
    .sort_values('stop_rate', ascending=False)
    .head(10)
)

print("\nBest Third Down Defenses - 2023 NFL Season")
print("Highest Stop Rate")
print("=" * 85)
print(third_down_best.to_string(index=False))

Third Down Defense by Distance

#| label: fig-third-down-defense-distance-r
#| fig-cap: "Third down stop rate by distance (2023 season)"
#| fig-width: 10
#| fig-height: 6
#| message: false
#| warning: false

# Calculate stop rate by distance
third_down_distance_def <- pbp_2023 %>%
  filter(!is.na(defteam), down == 3, play_type %in% c("pass", "run")) %>%
  mutate(
    distance_category = case_when(
      ydstogo <= 3 ~ "Short (1-3 yds)",
      ydstogo <= 6 ~ "Medium (4-6 yds)",
      ydstogo <= 10 ~ "Long (7-10 yds)",
      TRUE ~ "Very Long (11+ yds)"
    ),
    distance_category = factor(distance_category,
                               levels = c("Short (1-3 yds)", "Medium (4-6 yds)",
                                        "Long (7-10 yds)", "Very Long (11+ yds)"))
  ) %>%
  group_by(distance_category) %>%
  summarise(
    attempts = n(),
    stops = sum(third_down_converted == 0, na.rm = TRUE),
    stop_rate = stops / attempts,
    .groups = "drop"
  )

# Create visualization
ggplot(third_down_distance_def, aes(x = distance_category, y = stop_rate)) +
  geom_col(fill = "#4CAF50", alpha = 0.8) +
  geom_text(aes(label = scales::percent(stop_rate, accuracy = 0.1)),
            vjust = -0.5, fontface = "bold", size = 4) +
  scale_y_continuous(labels = scales::percent_format(), limits = c(0, 0.8)) +
  labs(
    title = "Third Down Stop Rate by Distance",
    subtitle = "2023 NFL Season - Defensive Perspective",
    x = "Distance to First Down",
    y = "Stop Rate",
    caption = "Data: nflfastR"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(size = 11),
    axis.text.x = element_text(angle = 0, hjust = 0.5, size = 10)
  )

#| label: fig-third-down-defense-distance-py
#| fig-cap: "Third down stop rate by distance - Python (2023 season)"
#| fig-width: 10
#| fig-height: 6
#| message: false
#| warning: false

# Calculate stop rate by distance
third_down_dist_def = pbp_2023.query(
    "defteam.notna() & down == 3 & play_type.isin(['pass', 'run'])"
).copy()

third_down_dist_def['distance_category'] = third_down_dist_def['ydstogo'].apply(categorize_distance)

distance_summary_def = (third_down_dist_def
    .groupby('distance_category')
    .agg(
        attempts=('play_id', 'count'),
        stops=('third_down_converted', lambda x: (x == 0).sum())
    )
    .reset_index()
)

distance_summary_def['stop_rate'] = distance_summary_def['stops'] / distance_summary_def['attempts']

# Order categories
distance_summary_def['distance_category'] = pd.Categorical(
    distance_summary_def['distance_category'],
    categories=category_order,
    ordered=True
)
distance_summary_def = distance_summary_def.sort_values('distance_category')

# Create visualization
fig, ax = plt.subplots(figsize=(10, 6))

bars = ax.bar(distance_summary_def['distance_category'],
              distance_summary_def['stop_rate'],
              color='#4CAF50', alpha=0.8)

# Add percentage labels on bars
for i, bar in enumerate(bars):
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height,
            f'{height:.1%}',
            ha='center', va='bottom', fontweight='bold', fontsize=11)

ax.set_ylim(0, 0.8)
ax.set_ylabel('Stop Rate', fontsize=12)
ax.set_xlabel('Distance to First Down', fontsize=12)
ax.set_title('Third Down Stop Rate by Distance\n2023 NFL Season - Defensive Perspective',
             fontsize=14, fontweight='bold', pad=20)
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, p: f'{y:.0%}'))
ax.grid(axis='y', alpha=0.3)

plt.text(0.98, 0.02, 'Data: nfl_data_py',
         transform=ax.transAxes,
         ha='right', fontsize=8, style='italic')
plt.tight_layout()
plt.show()
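The Python chunk above calls `categorize_distance` and `category_order`, neither of which is defined in this section. If they are not already available in your session, a minimal sketch consistent with the R `case_when` bins might look like the following. The exact definitions are an assumption, and the vectorized variant is an optional addition rather than part of the original code.

```python
import pandas as pd

# Assumed definitions: labels and cut points copied from the R case_when above
category_order = [
    "Short (1-3 yds)",
    "Medium (4-6 yds)",
    "Long (7-10 yds)",
    "Very Long (11+ yds)",
]


def categorize_distance(ydstogo):
    """Map yards-to-go to one of the four distance labels."""
    if ydstogo <= 3:
        return category_order[0]
    if ydstogo <= 6:
        return category_order[1]
    if ydstogo <= 10:
        return category_order[2]
    # Catch-all, like the TRUE branch of case_when
    return category_order[3]


def categorize_distance_vectorized(ydstogo: pd.Series) -> pd.Series:
    """Same bins without a Python-level loop (right-inclusive intervals)."""
    return pd.cut(
        ydstogo,
        bins=[-float("inf"), 3, 6, 10, float("inf")],
        labels=category_order,
    )
```

The scalar version works with `.apply(categorize_distance)` exactly as the chunk uses it. The `pd.cut` version returns an ordered categorical directly, which would make the later `pd.Categorical` reordering step unnecessary.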

Takeaways and Turnovers

The Most Valuable Defensive Plays

Takeaways (interceptions and fumble recoveries) represent the most valuable plays a defense can make. A typical takeaway is worth 3-5 EPA depending on field position, compared to 0.5-1.5 EPA for a sack or 1.0-2.0 EPA for a tackle for loss. Takeaways not only end the opponent's drive but also give the offense possession, often in favorable field position. The dual benefit—preventing opponent points while creating scoring opportunities—makes takeaways the single most impactful defensive outcome.

From a strategic perspective, turnovers are partially skill-based and partially luck-based. Teams can create favorable conditions for turnovers by generating pressure (which leads to hurried throws and fumbles), playing tight coverage (which creates interception opportunities), and emphasizing ball-stripping techniques. However, whether a tipped pass falls into a defender's hands or harmlessly to the ground, or whether a fumble bounces toward the defense or the offense, involves significant randomness.

This skill-luck mix creates an important analytical challenge: turnover rate is highly unstable year-over-year. A defense that forces 25 turnovers one season might force only 15 the next season with similar personnel and scheme. This regression toward the mean makes turnover rate one of the least predictive defensive metrics for future performance. However, turnovers remain incredibly valuable when they do occur, even if we can't reliably predict which teams will generate them.

The EPA value of turnovers varies enormously by field position and game situation. An interception at the opponent's 20-yard line that the defense returns to midfield represents an approximately 6-7 EPA swing: the offense loses 3 expected points from their field position, and the defense gains 3-4 expected points from the new field position. An interception returned for a touchdown represents a 10+ EPA swing. These enormous impacts mean that even small differences in turnover rate—say, 5 turnovers over a season—can equal 20-25 EPA or 2-3 wins.
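The arithmetic in this paragraph can be written out directly. This is only a sketch of the bookkeeping using the approximate figures quoted above. The function name is hypothetical, and the 4-5 EPA per takeaway used for the season total is the value implied by "5 turnovers equal 20-25 EPA", not a measured quantity.

```python
def turnover_epa_swing(ep_lost_by_offense: float, ep_gained_by_defense: float) -> float:
    """Total swing: expected points the offense gives up plus those the other side gains."""
    return ep_lost_by_offense + ep_gained_by_defense


# Interception example from the text: about 3 EP lost, 3 to 4 EP gained
low_swing = turnover_epa_swing(3.0, 3.0)
high_swing = turnover_epa_swing(3.0, 4.0)
print(f"Swing per interception: {low_swing:.0f} to {high_swing:.0f} EPA")

# Season-level arithmetic: 5 extra takeaways at an implied 4 to 5 EPA each
extra_takeaways = 5
season_low = extra_takeaways * 4.0
season_high = extra_takeaways * 5.0
print(f"Season impact: {season_low:.0f} to {season_high:.0f} EPA")
```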

Common Pitfall: Overvaluing Turnover Rate

While turnovers are extremely valuable when they occur, turnover rate is one of the least stable defensive metrics year-over-year. Don't assume a defense that generated many turnovers last season will repeat that performance. Focus on more stable metrics (EPA, success rate, pressure rate) for predicting future defensive performance.
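One way to check stability yourself is to correlate each team's value of a metric in one season with its value the next season. The sketch below assumes a team-season table with columns `team` and `season` plus one column per metric. The function name and the toy values are hypothetical and are not real results.

```python
import pandas as pd


def year_over_year_corr(team_seasons: pd.DataFrame, metric: str) -> float:
    """Correlate each team's metric in season N with the same metric in season N+1."""
    current = team_seasons[["team", "season", metric]]
    # Relabel season N+1 rows as season N so they line up with the current rows
    following = current.assign(season=current["season"] - 1)
    paired = current.merge(
        following, on=["team", "season"], suffixes=("", "_next")
    )
    return paired[metric].corr(paired[f"{metric}_next"])


# Toy team-season table (values are made up for illustration only)
toy = pd.DataFrame({
    "team": ["A", "A", "B", "B", "C", "C"],
    "season": [2022, 2023] * 3,
    "takeaway_rate": [0.030, 0.021, 0.022, 0.027, 0.026, 0.024],
})
print(year_over_year_corr(toy, "takeaway_rate"))
```

Run on real multi-season data, a value near zero indicates that last year's number says little about next year's. Running the same function on EPA per play, success rate, and sack rate gives a like-for-like stability comparison.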

Measuring Defensive Takeaways

In the following analysis, we'll calculate turnover rates and examine their impact on defensive performance. We'll also explore the relationship between pressure rate and turnovers, demonstrating how defenses can tilt the odds in their favor even if they can't completely control whether turnovers occur.

#| label: takeaways-defense-r
#| message: false
#| warning: false
#| cache: true

# Calculate defensive takeaways and their impact
takeaways_defense <- pbp_2023 %>%
  # Filter to plays where defense is recorded
  filter(!is.na(defteam), play_type %in% c("pass", "run")) %>%
  # Group by defending team
  group_by(defteam) %>%
  summarise(
    # Count total plays faced
    plays = n(),
    # Count interceptions (defensive team recovered)
    interceptions = sum(interception == 1, na.rm = TRUE),
    # Count fumbles recovered by defense
    fumbles_recovered = sum(fumble_lost == 1, na.rm = TRUE),
    # Total takeaways
    total_takeaways = interceptions + fumbles_recovered,
    # Takeaway rate (per play)
    takeaway_rate = total_takeaways / plays,
    # Calculate average EPA on plays with takeaways
    takeaway_epa = -mean(epa[interception == 1 | fumble_lost == 1], na.rm = TRUE),
    # Overall defensive EPA
    def_epa = -mean(epa, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  # Sort by total takeaways (descending)
  arrange(desc(total_takeaways))

# Display top 10 defenses in takeaways
takeaways_defense %>%
  head(10) %>%
  gt() %>%
  cols_label(
    defteam = "Team",
    plays = "Plays",
    interceptions = "INTs",
    fumbles_recovered = "Fumbles",
    total_takeaways = "Total",
    takeaway_rate = "Rate",
    takeaway_epa = "Takeaway EPA",
    def_epa = "Overall EPA"
  ) %>%
  fmt_number(
    columns = c(takeaway_epa, def_epa),
    decimals = 2
  ) %>%
  fmt_percent(
    columns = takeaway_rate,
    decimals = 2
  ) %>%
  fmt_number(
    columns = c(plays, interceptions, fumbles_recovered, total_takeaways),
    decimals = 0,
    use_seps = TRUE
  ) %>%
  tab_header(
    title = "Defensive Takeaways Leaders",
    subtitle = "2023 NFL Season - Most Turnovers Forced"
  )

#| label: takeaways-defense-py
#| message: false
#| warning: false
#| cache: true

# Calculate defensive takeaways and their impact
takeaways_data = pbp_2023.query(
    "defteam.notna() & play_type.isin(['pass', 'run'])"
).copy()

# Calculate takeaways by team
takeaways_defense = (takeaways_data
    .groupby('defteam')
    .agg(
        plays=('play_id', 'count'),
        interceptions=('interception', 'sum'),
        fumbles_recovered=('fumble_lost', 'sum'),
        def_epa=('epa', lambda x: -x.mean())
    )
    .reset_index()
)

# Calculate total takeaways and rate
takeaways_defense['total_takeaways'] = (takeaways_defense['interceptions'] +
                                         takeaways_defense['fumbles_recovered'])
takeaways_defense['takeaway_rate'] = takeaways_defense['total_takeaways'] / takeaways_defense['plays']

# Calculate average EPA on takeaway plays
takeaway_plays = takeaways_data.query("interception == 1 | fumble_lost == 1")
takeaway_epa_by_team = (takeaway_plays
    .groupby('defteam')['epa']
    .mean()
    .rename('takeaway_epa')
)

takeaways_defense = takeaways_defense.join(takeaway_epa_by_team, on='defteam')
takeaways_defense['takeaway_epa'] = -takeaways_defense['takeaway_epa'].fillna(0)

# Display top 10
takeaways_top10 = (takeaways_defense
    .sort_values('total_takeaways', ascending=False)
    .head(10)
)

print("\nDefensive Takeaways Leaders - 2023 NFL Season")
print("Most Turnovers Forced")
print("=" * 90)
print(takeaways_top10.to_string(index=False))

This code calculates how often defenses force turnovers and measures the EPA impact of takeaways. The analysis reveals several important insights:

Turnover Types: We separate interceptions from fumbles recovered because they occur through different mechanisms. Interceptions typically result from coverage and quarterback pressure, while fumbles result from hard hits, strip attempts, and ball security issues.
Takeaway Rate: We calculate takeaways as a percentage of total plays faced. Elite defenses force turnovers on 3-4% of plays, while average defenses are closer to 2-2.5%. This seemingly small difference amounts to 10-15 additional turnovers over a full season.
EPA Impact: The `takeaway_epa` column shows the average EPA value of plays where turnovers occur. These plays are typically worth 4-6 EPA for the defense, enormous compared to the typical play value of 0.1-0.2 EPA.
Overall Performance: We include overall defensive EPA to show that teams with many takeaways don't always have the best overall defenses. Turnover luck can mask underlying defensive weaknesses or elevate already-good defenses to elite status.

Key Finding: Takeaway leaders often, but not always, overlap with overall defensive EPA leaders. Teams that generate pressure and play tight coverage create more turnover opportunities, but converting those opportunities into actual turnovers involves some luck.

The relationship between turnovers and winning is complex. Teams that win the turnover battle (force more turnovers than they commit) win approximately 75-80% of their games. However, this correlation doesn't imply that turnovers are fully within a team's control or that turnover margin is predictive across seasons. Much of the year-to-year variation in turnover margin is regression to the mean—teams with excellent turnover margins one year typically regress toward average the next year.

This instability has important implications for team-building and evaluation. While you should always try to force turnovers when the opportunity presents itself, you shouldn't build your defensive strategy around expecting a high turnover rate. Instead, focus on sustainable defensive advantages: generating pressure, preventing explosive plays, stopping the run efficiently, and excelling in key situations like third down and red zone. Turnovers will come as a bonus when defensive execution forces offensive mistakes.

Best Practice: Focus on Turnover Creation, Not Turnover Results

Rather than measuring turnover rate (which is unstable), measure the factors that create turnover opportunities: pressure rate on the quarterback, tight coverage (measured by yards per attempt allowed), and forced fumbles. These metrics are more stable and predictive of future performance.
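These opportunity metrics can be sketched from play-by-play data. The example below assumes nflfastR-style columns (`play_id`, `defteam`, `play_type`, `qb_hit`, `sack`, `yards_gained`, `fumble_forced`) are present in your data. The function name is hypothetical, QB hits and sacks stand in for pressure because true pressure counts usually come from charting data, and yards per dropback (which includes sack yardage) stands in for yards per attempt.

```python
import pandas as pd


def turnover_opportunity_metrics(pbp: pd.DataFrame) -> pd.DataFrame:
    """Per-defense rates for the kinds of plays that tend to precede takeaways."""
    # Pass-side proxies, computed on pass plays (sacks included)
    dropbacks = pbp[pbp["play_type"] == "pass"]
    pass_side = dropbacks.groupby("defteam").agg(
        dropbacks=("play_id", "count"),
        qb_hit_rate=("qb_hit", "mean"),
        sack_rate=("sack", "mean"),
        yards_per_dropback_allowed=("yards_gained", "mean"),
    )

    # Forced fumbles, computed on all scrimmage plays
    scrimmage = pbp[pbp["play_type"].isin(["pass", "run"])]
    forced = scrimmage.groupby("defteam").agg(
        plays=("play_id", "count"),
        forced_fumbles=("fumble_forced", "sum"),
    )
    forced["forced_fumble_rate"] = forced["forced_fumbles"] / forced["plays"]

    return pass_side.join(forced)
```

Forced fumbles are counted whether or not the defense recovers them, which is the point: the recovery is the noisy part, while forcing the ball out is closer to a repeatable skill.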

Pass Rush Metrics

Sacks and Pressure

Sacks are one of the most impactful defensive plays, dramatically reducing EPA and killing offensive drives. While not quite as valuable as turnovers, sacks are worth approximately 2-3 EPA depending on field position and down. More importantly, pass-rush production is more stable and predictable than turnover production, which makes pressure rate a fundamental building block of elite defenses. The tables below report sack rate, the pass-rush outcome computed directly in this section.

#| label: pass-rush-defense-r
#| message: false
#| warning: false

# Calculate pass rush metrics
pass_rush_defense <- pbp_2023 %>%
  filter(!is.na(defteam), (play_type == "pass" | sack == 1)) %>%
  group_by(defteam) %>%
  summarise(
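    pass_attempts = n(),  # dropbacks: pass plays plus sacks
    sacks = sum(sack == 1, na.rm = TRUE),
    sack_rate = sacks / pass_attempts,
    # yards_gained is zero or negative on sacks, so sack_yards is the yardage lost
    sack_yards = sum(yards_gained[sack == 1], na.rm = TRUE),
    avg_sack_yards = if_else(sacks > 0, sack_yards / sacks, 0),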
    pass_epa = -mean(epa, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(sack_rate))

# Display best pass rush defenses
pass_rush_defense %>%
  head(10) %>%
  gt() %>%
  cols_label(
    defteam = "Team",
    pass_attempts = "Pass Attempts",
    sacks = "Sacks",
    sack_rate = "Sack Rate",
    sack_yards = "Sack Yards",
    avg_sack_yards = "Avg Sack Yds",
    pass_epa = "Pass EPA"
  ) %>%
  fmt_percent(
    columns = sack_rate,
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(avg_sack_yards, pass_epa, sack_yards),
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(pass_attempts, sacks),
    decimals = 0,
    use_seps = TRUE
  ) %>%
  tab_header(
    title = "Best Pass Rush Defenses (Highest Sack Rate)",
    subtitle = "2023 NFL Season"
  )

#| label: pass-rush-defense-py
#| message: false
#| warning: false

# Calculate pass rush metrics
pass_rush_data = pbp_2023.query(
    "defteam.notna() & (play_type == 'pass' | sack == 1)"
).copy()

pass_rush_defense = (pass_rush_data
    .groupby('defteam')
    .agg(
        pass_attempts=('play_id', 'count'),
        sacks=('sack', 'sum'),
        pass_epa=('epa', lambda x: -x.mean())
    )
    .reset_index()
)

# Calculate sack yards
sack_yards_def = (pass_rush_data.query("sack == 1")
    .groupby('defteam')
    .agg(
        sack_yards=('yards_gained', 'sum')
    )
)

pass_rush_defense = pass_rush_defense.join(sack_yards_def, on='defteam')
pass_rush_defense['sack_yards'] = pass_rush_defense['sack_yards'].fillna(0)
pass_rush_defense['sack_rate'] = pass_rush_defense['sacks'] / pass_rush_defense['pass_attempts']
pass_rush_defense['avg_sack_yards'] = np.where(
    pass_rush_defense['sacks'] > 0,
    pass_rush_defense['sack_yards'] / pass_rush_defense['sacks'],
    0
)

# Display best pass rush defenses
pass_rush_best = (pass_rush_defense
    .sort_values('sack_rate', ascending=False)
    .head(10)
)

print("\nBest Pass Rush Defenses (Highest Sack Rate) - 2023 NFL Season")
print("=" * 85)
print(pass_rush_best.to_string(index=False))

Comprehensive Defensive Dashboard

Building a Complete Defensive Evaluation System

Let's combine multiple metrics into a comprehensive defensive ranking. No single metric tells the complete story of defensive performance—EPA captures context but might miss explosiveness, yards per play captures efficiency but misses situational performance, and turnover rate is valuable but unstable. By combining multiple metrics into a composite score, we can create a more robust evaluation that captures different dimensions of defensive quality.

Our comprehensive dashboard will combine three key metrics:

EPA per play (negated for defense): Captures overall efficiency adjusted for situation
Success rate allowed: Measures consistency of preventing successful offensive plays
Explosive play rate allowed: Captures big-play prevention, critical for winning

We'll calculate z-scores for each metric, which standardize them to the same scale (mean of 0, standard deviation of 1). This allows us to combine metrics that have different units (EPA is in points, rates are percentages) into a single composite score. Teams that excel across all three dimensions will have the highest composite scores, while teams that excel in only one dimension will have moderate scores.

This multi-dimensional approach mirrors how NFL front offices evaluate defenses. Rather than focusing on any single metric, successful organizations examine multiple indicators to build a complete picture. A defense might have excellent EPA but allow too many explosive plays, or might prevent big plays but struggle with consistency on early downs. The composite score identifies truly elite defenses that perform well across all dimensions.

Understanding Composite Scores

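A composite score averages a defense's z-scores (standard deviations from the league mean) across the three metrics. A composite score of +1.0 means the defense averages one standard deviation better than the league across those metrics. Scores above +0.5 are clearly above average, scores above +1.0 are elite, and scores above +1.5 are exceptional. Because the z-scores are computed within a single season, the score ranks a defense against that season's league rather than against other years.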
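The calculation is easiest to follow on a tiny example. The sketch below uses four hypothetical defenses with made-up values. It assumes `epa_per_play` has already been negated so that higher is better, and it uses the sample standard deviation (pandas' default for `std()`), which is also what R's `scale()` uses.

```python
import pandas as pd

# Made-up values for four hypothetical defenses (not real team data)
toy = pd.DataFrame({
    "defteam": ["AAA", "BBB", "CCC", "DDD"],
    "epa_per_play": [0.10, 0.02, -0.03, -0.09],          # negated: higher is better
    "success_rate_allowed": [0.40, 0.43, 0.45, 0.48],    # lower is better
    "explosive_rate_allowed": [0.07, 0.10, 0.09, 0.12],  # lower is better
}).set_index("defteam")


def zscore(s: pd.Series) -> pd.Series:
    """Standardize to mean 0 and sample standard deviation 1."""
    return (s - s.mean()) / s.std()


z = pd.DataFrame({
    "epa_z": zscore(toy["epa_per_play"]),
    # Flip the sign of "allowed" rates so that higher z is always better
    "success_z": zscore(-toy["success_rate_allowed"]),
    "explosive_z": zscore(-toy["explosive_rate_allowed"]),
})
# Equal-weight average of the three z-scores
z["composite_score"] = z[["epa_z", "success_z", "explosive_z"]].mean(axis=1)
print(z.round(2))
```

Equal weighting is a design choice, not a requirement. If one metric matters more for your purpose, a weighted average of the same z-scores works the same way.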

#| label: comprehensive-defense-r
#| message: false
#| warning: false

# Create comprehensive defensive metrics
comprehensive_defense <- pbp_2023 %>%
  filter(!is.na(defteam), play_type %in% c("pass", "run")) %>%
  group_by(defteam) %>%
  summarise(
    plays = n(),
    yards_per_play = mean(yards_gained, na.rm = TRUE),
    epa_per_play = -mean(epa, na.rm = TRUE),  # Negative for defense
    success_rate_allowed = mean(epa > 0, na.rm = TRUE),
    explosive_rate_allowed = mean(
      (play_type == "pass" & yards_gained >= 20) |
      (play_type == "run" & yards_gained >= 10),
      na.rm = TRUE
    ),
    .groups = "drop"
  ) %>%
  # Calculate z-scores for composite ranking
  # (as.numeric() drops the one-column matrix that scale() returns)
  mutate(
    epa_z = as.numeric(scale(epa_per_play)),
    success_z = as.numeric(scale(-success_rate_allowed)),  # Negative because lower is better
    explosive_z = as.numeric(scale(-explosive_rate_allowed)),
    composite_score = (epa_z + success_z + explosive_z) / 3
  ) %>%
  arrange(desc(composite_score))

# Display top 10
comprehensive_defense %>%
  head(10) %>%
  select(defteam, plays, yards_per_play, epa_per_play, success_rate_allowed,
         explosive_rate_allowed, composite_score) %>%
  gt() %>%
  cols_label(
    defteam = "Team",
    plays = "Plays",
    yards_per_play = "YPP",
    epa_per_play = "EPA",
    success_rate_allowed = "Success %",
    explosive_rate_allowed = "Explosive %",
    composite_score = "Score"
  ) %>%
  fmt_number(
    columns = c(yards_per_play, epa_per_play, composite_score),
    decimals = 2
  ) %>%
  fmt_percent(
    columns = c(success_rate_allowed, explosive_rate_allowed),
    decimals = 1
  ) %>%
  fmt_number(
    columns = plays,
    decimals = 0,
    use_seps = TRUE
  ) %>%
  data_color(
    columns = composite_score,
    colors = scales::col_numeric(
      palette = c("#F44336", "#FFFFFF", "#4CAF50"),
      domain = c(-2, 2)
    )
  ) %>%
  tab_header(
    title = "Comprehensive Defensive Rankings",
    subtitle = "2023 NFL Season - Composite Score (EPA + Success + Explosive)"
  )

#| label: comprehensive-defense-py
#| message: false
#| warning: false

from scipy import stats

# Create comprehensive defensive metrics
comp_def_data = pbp_2023.query(
    "defteam.notna() & play_type.isin(['pass', 'run'])"
).copy()

# Calculate explosive plays
comp_def_data['is_explosive'] = (
    ((comp_def_data['play_type'] == 'pass') & (comp_def_data['yards_gained'] >= 20)) |
    ((comp_def_data['play_type'] == 'run') & (comp_def_data['yards_gained'] >= 10))
).astype(int)

comprehensive_defense = (comp_def_data
    .groupby('defteam')
    .agg(
        plays=('play_id', 'count'),
        yards_per_play=('yards_gained', 'mean'),
        epa_per_play=('epa', lambda x: -x.mean()),  # Negative for defense
        success_rate_allowed=('epa', lambda x: (x > 0).mean()),
        explosive_rate_allowed=('is_explosive', 'mean')
    )
    .reset_index()
)

# Calculate z-scores (ddof=1 uses the sample standard deviation, matching R's scale())
comprehensive_defense['epa_z'] = stats.zscore(comprehensive_defense['epa_per_play'], ddof=1)
comprehensive_defense['success_z'] = stats.zscore(-comprehensive_defense['success_rate_allowed'], ddof=1)
comprehensive_defense['explosive_z'] = stats.zscore(-comprehensive_defense['explosive_rate_allowed'], ddof=1)
comprehensive_defense['composite_score'] = (
    comprehensive_defense['epa_z'] +
    comprehensive_defense['success_z'] +
    comprehensive_defense['explosive_z']
) / 3

# Display top 10
comp_def_top10 = (comprehensive_defense
    .sort_values('composite_score', ascending=False)
    .head(10)
    [['defteam', 'plays', 'yards_per_play', 'epa_per_play',
      'success_rate_allowed', 'explosive_rate_allowed', 'composite_score']]
)

print("\nComprehensive Defensive Rankings - 2023 NFL Season")
print("Composite Score (EPA + Success + Explosive)")
print("=" * 100)
print(comp_def_top10.to_string(index=False))

Summary

In this chapter, we've covered the fundamental concepts and metrics for evaluating defensive performance from a modern analytics perspective. We began by recognizing that defense must be measured differently than offense—from the opponent's perspective, accounting for what the defense prevents rather than what it accomplishes. This conceptual foundation underpins all defensive analytics.

Key Takeaways:

Context Matters More for Defense: Raw statistics like total yards allowed can be highly misleading because they don't account for game script, pace, or field position. EPA and success rate provide context-aware measurements that adjust for these factors, making them superior to traditional metrics.
Prevent Explosives Above All: Limiting big plays (20+ yard passes, 10+ yard runs) is one of the most important defensive objectives. Explosive plays allowed is one of the strongest predictors of defensive performance, sometimes even stronger than overall EPA. A single explosive play can undo an entire drive's worth of good defensive execution.
Situational Defense Determines Championships: Red zone and third down defense are critical situations that disproportionately affect outcomes. Elite red zone defenses force field goals instead of allowing touchdowns, saving approximately 4 points per red zone stop. Elite third down defenses get off the field, preventing extended drives and reducing total plays faced.
Personnel Flexibility is Essential: Modern defenses must adapt to offensive personnel with nickel (5 DBs) and dime (6 DBs) packages. The days of playing base defense (4-3 or 3-4) on most snaps are over—nickel is now the true "base" defense for most teams, used on 60%+ of snaps.
The Pass-Run Tradeoff: Almost no defense excels equally against both the pass and run. Defenses optimized to stop the pass (more DBs, lighter boxes) typically allow more yards per carry. Defenses optimized to stop the run (fewer DBs, heavier boxes) typically allow more yards per attempt. Understanding these tradeoffs helps identify matchup advantages.
Turnover Value vs. Turnover Predictability: Turnovers are extremely valuable when they occur (3-5 EPA per turnover), but turnover rate is one of the least stable metrics year-over-year. Focus on creating turnover opportunities (pressure, tight coverage) rather than expecting high turnover rates to persist.
Comprehensive Evaluation Over Single Metrics: Combine multiple metrics (EPA, success rate, explosiveness, situational performance) for complete defensive analysis. Defenses that rank well across all dimensions are truly elite, while defenses with divergent rankings have specific strengths or weaknesses worth investigating.

Metrics Covered:

Basic Efficiency: Yards per play allowed (5.0 or less is elite), points per drive allowed (under 1.70 is elite)
EPA and Success Rate: Context-aware defensive performance measuring expected points prevented and consistency of stopping successful plays
Personnel Analysis: Formation and coverage shell tendencies, including nickel/dime usage and box defender counts
Explosiveness: Big plays allowed (20+ yard passes, 10+ yard runs), with elite defenses holding the explosive play rate allowed below 8%
Red Zone: Touchdown percentage allowed (elite: under 50%), field goal percentage, points per red zone trip
Third Down: Stop rates by distance category, with elite defenses exceeding 60% stop rate overall
Pass Rush: Sack rates and pressure metrics, with sacks worth 2-3 EPA each
Takeaways: Interception and fumble recovery rates, worth 3-5 EPA per turnover but highly unstable year-over-year
Comprehensive Rankings: Composite defensive scores combining EPA, success rate, and explosive play prevention

The Defensive Coordinator's Toolkit:

The metrics we've covered in this chapter form the foundation of modern defensive evaluation. A coordinator scouting a defense, whether self-scouting their own unit or preparing to attack an opponent's, would examine:

Overall efficiency (EPA, yards per play) to understand baseline defensive quality
Pass-run splits to identify whether to attack through the air or on the ground
Explosive play rates to assess big-play risk and determine appropriate aggression level
Third down performance to understand how to approach key downs
Red zone tendencies to prepare goal-line packages and red zone play-calls
Personnel packages to anticipate defensive alignments and match personnel appropriately

This analytical framework extends beyond opponent preparation to roster construction, in-game adjustments, and performance evaluation. Front offices use these metrics to identify free agent targets, evaluate draft prospects, and determine coaching effectiveness. In-game, coaches use real-time EPA and success rate to assess whether their defensive game plan is working or needs adjustment.

Looking Ahead:

This chapter has provided the foundation for defensive analytics, but we've only scratched the surface. In the coming chapters, we'll dive deeper into specific aspects of defensive performance:

Chapter 14 will explore pass defense in detail, examining coverage schemes, cornerback performance, and quarterback pressure
Chapter 15 will analyze run defense, including gap integrity, tackle success, and defending different run concepts
Chapter 16 will investigate defensive decision-making, including when to blitz, coverage selection by situation, and risk-reward tradeoffs

The metrics and concepts from this chapter will serve as building blocks for these more advanced topics. You now have the tools to evaluate defensive performance at a professional level—understanding not just which defenses are good or bad, but why they succeed or fail, and how their performance translates into wins and losses.

Final Insight: Defense is Half the Equation

While we often focus on offensive analytics because offense is easier to measure and more exciting to watch, defense contributes equally to winning. Teams that rank in the top 10 in defensive EPA make the playoffs at approximately the same rate as teams ranking in the top 10 in offensive EPA. Building a championship team requires excellence on both sides of the ball—or at least adequate performance on one side paired with elite performance on the other.

Exercises

Conceptual Questions

EPA for Defense: Explain why defensive EPA is negated (multiplied by -1) in many analyses. What does a positive defensive EPA indicate before negation, and what does it indicate after?
Two-High Safety: Why have many NFL defenses shifted toward two-high safety looks in recent years? What are the trade-offs?
Red Zone Defense: Is it better to prevent red zone touchdowns (forcing field goals) or to prevent red zone trips altogether? Defend your answer with analytical reasoning.

Coding Exercises

Exercise 1: Complete Defensive Profile

Create a comprehensive defensive profile for a specific team that includes:
a) Overall efficiency metrics (EPA allowed, success rate allowed, yards per play)
b) Pass-run defense split and efficiency against each
c) Explosive plays allowed
d) Red zone performance
e) Third down stop rates
f) A visualization comparing the team to league averages
**Bonus**: Add pressure rate and coverage metrics if available.

Exercise 2: Formation Effectiveness Analysis

Analyze the effectiveness of different defensive personnel groupings:
a) Calculate EPA and success rate allowed for each defensive personnel grouping
b) Determine which personnel groupings are most effective against pass vs run
c) Identify which teams use which packages most effectively
d) Create a visualization showing personnel usage vs effectiveness
**Hint**: Filter for plays where `defense_personnel` is not null.

Exercise 3: Situational Defense Evaluation

Examine defensive performance in key situations:
a) Early-down (1st and 2nd down) defense metrics
b) Third down stop rates by distance category
c) Red zone defense (inside the 20, inside the 10, inside the 5)
d) Performance in close games (4th quarter, one-score margin)
e) Visualizations comparing teams across situations
**Challenge**: Identify which teams are most consistent across all situations.
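As a starting point for the situational filters in this exercise, the sketch below buckets third downs by distance and isolates close-game plays. It is illustrative only. It assumes nflverse-style columns (`down`, `ydstogo`, `first_down`, `qtr`, `score_differential`, `epa`), and the distance buckets (short: 1-3 yards, medium: 4-6, long: 7+) and the "one score" cutoff of 8 points are assumptions you may want to change. The toy data is invented.

```python
import pandas as pd

# Toy data for illustration only; these are not real plays.
toy = pd.DataFrame({
    "defteam":            ["AAA", "AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "down":               [3, 3, 3, 1, 3, 3, 2],
    "ydstogo":            [2, 2, 8, 10, 5, 9, 6],
    "first_down":         [1, 0, 0, 0, 1, 0, 0],
    "qtr":                [1, 2, 4, 4, 4, 3, 4],
    "score_differential": [0, 3, -7, 14, 3, -10, -4],
    "epa":                [0.8, -0.9, -1.2, 0.1, 1.0, -0.7, -0.2],
})

# Third-down stop rate by distance bucket.
third = toy[toy["down"] == 3].copy()
third["distance"] = pd.cut(
    third["ydstogo"],
    bins=[0, 3, 6, 99],  # right-inclusive: (0, 3], (3, 6], (6, 99]
    labels=["short", "medium", "long"],
)
third["stop"] = third["first_down"].eq(0)  # a stop is a third down not converted
stop_rates = (
    third.groupby(["defteam", "distance"], observed=True)["stop"]
    .mean()
    .unstack("distance")
)

# Close-game plays: 4th quarter with a margin of 8 points or fewer.
# abs() makes the sign convention of score_differential irrelevant.
close = toy[(toy["qtr"] == 4) & (toy["score_differential"].abs() <= 8)]
close_epa = close.groupby("defteam")["epa"].mean().rename("close_game_epa_allowed")

print(stop_rates)
print(close_epa)
```

Buckets with no plays for a team show up as `NaN` in `stop_rates`, which is worth handling explicitly before comparing teams on small samples.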

Exercise 4: Advanced Defensive Metrics

Create an advanced defensive evaluation system:
a) Calculate havoc rate: (sacks + turnovers + TFLs) divided by total defensive plays
b) Measure defensive consistency (standard deviation of EPA)
c) Analyze performance by opponent quality
d) Create a composite defensive score using multiple metrics
e) Visualize the relationship between different defensive metrics
**Bonus**: Weight metrics by their correlation with winning.
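One way to approach parts (a) and (d) is sketched below, under stated assumptions. The havoc calculation assumes nflverse-style 0/1 indicator columns (`sack`, `tackled_for_loss`, `interception`, `fumble_lost`) and counts each play at most once, so a play flagged as both a sack and a tackle for loss is not double-counted; this departs slightly from a literal sum of the three counts. The composite uses equal-weight z-scores, which is only one of many reasonable weighting schemes. The function names and all data are hypothetical.

```python
import pandas as pd


def havoc_rate(pbp: pd.DataFrame) -> pd.Series:
    """Share of scrimmage plays ending in a sack, TFL, or turnover."""
    plays = pbp[pbp["play_type"].isin(["pass", "run"])]
    turnover = plays["interception"].eq(1) | plays["fumble_lost"].eq(1)
    # Logical OR counts a play once even if several flags are set.
    havoc = plays["sack"].eq(1) | plays["tackled_for_loss"].eq(1) | turnover
    return havoc.groupby(plays["defteam"]).mean().rename("havoc_rate")


def composite_score(metrics: pd.DataFrame, lower_is_better: list[str]) -> pd.Series:
    """Equal-weight mean of z-scores, oriented so that higher is better."""
    z = (metrics - metrics.mean()) / metrics.std(ddof=0)
    # Flip metrics where a smaller value means a better defense.
    z[lower_is_better] = -z[lower_is_better]
    return z.mean(axis=1).rename("composite")


# Toy play-level data for illustration only.
toy = pd.DataFrame({
    "defteam":          ["AAA"] * 4 + ["BBB"] * 4,
    "play_type":        ["pass", "run", "pass", "run"] * 2,
    "sack":             [1, 0, 0, 0, 0, 0, 0, 0],
    "tackled_for_loss": [1, 1, 0, 0, 0, 0, 0, 0],
    "interception":     [0, 0, 1, 0, 0, 0, 0, 0],
    "fumble_lost":      [0, 0, 0, 0, 0, 1, 0, 0],
})
havoc = havoc_rate(toy)

# Toy team-level metrics for illustration only.
team_metrics = pd.DataFrame(
    {
        "epa_per_play": [-0.10, 0.00, 0.10],
        "success_rate": [0.40, 0.45, 0.50],
        "havoc_rate":   [0.20, 0.15, 0.10],
    },
    index=["AAA", "BBB", "CCC"],
)
composite = composite_score(team_metrics, ["epa_per_play", "success_rate"])

print(havoc)
print(composite.sort_values(ascending=False))
```

A metric with zero variance across teams produces `NaN` z-scores in this sketch, so constant columns should be dropped before the composite is computed. For the bonus, the equal weights could be replaced with weights derived from each metric's correlation with winning.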

Exercise 5: Coverage Analysis

Take a deep dive into coverage schemes (if data is available):
a) Analyze performance in different coverage shells (single-high vs two-high)
b) Measure effectiveness by number of defenders in the box
c) Examine blitz frequency and success
d) Determine optimal coverage strategies by down and distance
e) Create visualizations showing coverage tendencies and effectiveness
**Challenge**: Compare coverage strategies between winning and losing teams.

References

