Learning Objectives

By the end of this chapter, you will be able to:

  1. Understand different coverage schemes (Cover 0-6) and their tactical applications
  2. Analyze coverage effectiveness using EPA, success rate, and completion metrics
  3. Evaluate cornerback and safety performance with advanced statistics
  4. Study man vs zone coverage trade-offs and situational effectiveness
  5. Use tracking data and play-by-play data for comprehensive coverage analysis

Introduction

Pass defense has become increasingly complex in modern football. As offenses have evolved to exploit defensive weaknesses through sophisticated passing attacks, defenses must respond with diverse coverage schemes, disguises, and personnel packages. Understanding coverage analysis is essential for evaluating defensive performance beyond simple statistics like interceptions or passes defended.

The modern NFL has witnessed a dramatic shift in defensive philosophy over the past decade. In 2011, defenses played single-high safety structures (Cover 1 and Cover 3) on approximately 60% of passing plays. By 2023, that number had inverted, with two-high safety shells (Cover 2, Cover 4, Cover 6) becoming the dominant alignment. This evolution represents a direct response to explosive passing offenses and mobile quarterbacks who threaten defenses both through the air and on the ground.

However, this shift comes with tradeoffs. Two-high safety structures provide better deep coverage and reduce the likelihood of explosive plays, but they also remove a defender from the box, making them more vulnerable to the running game. Elite offensive coordinators have exploited this vulnerability, leading to the resurgence of gap-scheme running attacks in recent seasons. The chess match between offensive and defensive coordinators continues to evolve, making coverage analysis more important than ever.

This chapter explores how data science can illuminate the effectiveness of different coverage schemes, evaluate individual defender performance, and identify optimal coverage strategies based on game situations. We will examine the fundamental coverage families (man versus zone), dive deep into specific coverage schemes (Cover 0 through Cover 6), analyze individual defender performance using advanced metrics, and explore how coverage effectiveness varies with game context, personnel groupings, and pressure packages.

By the end of this chapter, you will understand not just which coverages perform best on average, but which coverages excel in specific situations, how to evaluate individual defenders using opportunity-adjusted metrics, and how to identify coverage tendencies that create exploitable patterns. This knowledge is essential for defensive coordinators designing game plans, personnel evaluators assessing defensive back talent, and analysts seeking to understand the hidden dimensions of pass defense.

What is Coverage Analysis?

Coverage analysis is the systematic evaluation of defensive pass coverage schemes and individual defender performance using statistical methods, tracking data, and advanced metrics. It encompasses scheme identification, player evaluation, matchup analysis, and strategic optimization.

Understanding Coverage Schemes

Coverage Terminology and Philosophy

Before diving into analysis, we must understand the fundamental coverage schemes used in football. The coverage numbering system (Cover 0, 1, 2, 3, 4, 6) provides a common language for discussing defensive pass coverage, but it's important to recognize that these are not rigid, unchanging structures. Modern defenses layer disguises, pattern-matching principles, and post-snap adjustments on top of these base concepts.

The coverage number typically indicates how many deep defenders are protecting against vertical threats. However, this simplified explanation masks significant complexity in how these coverages actually function on the field. Coverages also differ in their fundamental philosophy: are defenders playing man-to-man against specific receivers, or are they playing zone coverage in assigned areas of the field? This man versus zone distinction is often more important than the specific coverage number.

Let's examine each major coverage family in detail, understanding not just the alignment but the philosophy, strengths, weaknesses, and typical usage patterns.

Cover 0 (Zero): All-Out Pressure Coverage

Cover 0 represents the most aggressive coverage scheme in football: pure man coverage with no deep safety help. Every eligible receiver is covered man-to-man by a defender, with no safety providing help over the top. This leaves no "free" defender to help on any matchup, creating a high-risk, high-reward scenario.

The primary advantage of Cover 0 is that it allows the defense to bring additional pass rushers while still maintaining man coverage on every eligible receiver. If the offense has five eligible receivers, the defense can bring six rushers (or more if they bring defensive backs on blitzes) while still covering everyone man-to-man. This creates immediate pressure on the quarterback and forces quick decisions.
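The rusher arithmetic above can be sketched in a few lines. This is a minimal illustration, assuming every released eligible receiver draws exactly one man defender; the function name `max_rushers` and its parameters are hypothetical, not part of any library.

```python
DEFENDERS_ON_FIELD = 11


def max_rushers(eligible_receivers: int, deep_help: int = 0) -> int:
    """Defenders left to rush after man-covering each released receiver.

    deep_help is the number of unassigned deep safeties
    (0 for Cover 0, 1 for Cover 1).
    """
    if not 0 <= eligible_receivers <= 5:
        raise ValueError("this sketch expects between 0 and 5 released receivers")
    if deep_help < 0:
        raise ValueError("deep_help cannot be negative")
    return DEFENDERS_ON_FIELD - eligible_receivers - deep_help


# Cover 0 against five released receivers
print("Cover 0, five receivers:", max_rushers(5))
# Cover 1 keeps one safety deep, so one fewer rusher is available
print("Cover 1, five receivers:", max_rushers(5, deep_help=1))
```

The same subtraction shows why keeping a back in to block changes the picture: each receiver who stays in protection frees one more potential rusher.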

However, the risks are substantial. If any defender loses their matchup or gets beaten on a double move, there is no safety help to prevent a touchdown. Cover 0 is particularly vulnerable to play-action passes, as defenders cannot "peek" at the backfield without losing their coverage responsibility. Elite quarterbacks who can quickly identify Cover 0 and throw hot routes can exploit this coverage for big gains.

Defenses typically deploy Cover 0 in obvious passing situations (3rd-and-long, two-minute drills) where they need to generate quick pressure, or against less experienced quarterbacks who struggle with pre-snap recognition. Some aggressive defensive coordinators also use Cover 0 in short-yardage situations, bringing pressure to force a quick throw before routes can develop.

Cover 1 (One-High Safety): Aggressive Man Coverage with Help

Cover 1 maintains the man-coverage philosophy of Cover 0 while adding a critical safety valve: a single free safety who patrols the deep middle of the field, providing help over the top. This safety, often called the "free safety" or "post safety," doesn't have a specific man coverage assignment but instead reads the quarterback and provides help to whichever deep route threatens to beat its defender.

The presence of this deep safety allows cornerbacks to play more aggressively, knowing they have help if they get beaten vertically. This aggressive positioning enables cornerbacks to jump underneath routes and create turnovers. Cover 1 is excellent for generating pressure while maintaining reasonable coverage security.

Cover 1's primary weakness lies in its vulnerability to bunch formations and route concepts designed to create natural picks or rubs. When multiple receivers are aligned close together, it becomes difficult for man-coverage defenders to navigate through traffic without interfering with each other. Additionally, the single deep safety must choose which deep threat to help on when multiple vertical routes attack different areas of the field simultaneously.

Teams with elite cornerbacks who excel in man coverage heavily favor Cover 1, as it allows their best players to impact the game in isolation. The coverage also works well against offenses that rely on timing routes, as physical man coverage can disrupt the timing between quarterback and receiver.

Cover 2 (Two-High Safeties): The Foundation of Zone Coverage

Cover 2 represents the traditional zone coverage scheme, featuring two deep safeties who split the field into halves and five underneath defenders covering shorter zones. Each deep safety is responsible for all deep routes in their half of the field, while underneath defenders read the quarterback and break on passing routes in their zones.

The greatest strength of Cover 2 is its protection against deep passes. With two safeties splitting deep coverage responsibilities, it's very difficult for offenses to throw over the top for big gains. This makes Cover 2 a favorite in prevent situations or when protecting a lead late in games. Run support is more demanding, however: both safeties start deep, so the box is one defender lighter than in single-high looks, and the safeties must attack downhill quickly to fill run gaps.

However, Cover 2 has well-known weaknesses that sophisticated offenses exploit. The seams between the deep safeties and underneath defenders (particularly the "deep middle hole" between the safeties at 12-15 yards depth) are vulnerable to intermediate routes. Flood concepts that put multiple receivers into a single defender's zone force difficult coverage decisions. Quarterbacks who can layer throws into these windows find success against Cover 2.

The "Tampa 2" variation addresses some of these weaknesses by having the middle linebacker carry the deep middle zone, but this requires an exceptional athlete at linebacker who can cover significant ground in their drop.

Cover 3 (Three-Deep): Balanced Zone Protection

Cover 3 divides the deep field into thirds rather than halves, with three defenders (typically two cornerbacks and one safety) each responsible for deep routes in their third. The remaining defenders cover four underneath zones, creating a balanced structure that protects both deep and intermediate areas more evenly than Cover 2.

This coverage excels against vertical passing attacks, as it provides more defenders in deep coverage than Cover 2 while still maintaining underneath coverage. The cornerbacks in Cover 3 can play with outside leverage, confident they have help inside from the deep safety. This makes Cover 3 particularly effective against offenses that stretch the field vertically with multiple deep threats.

The primary weakness of Cover 3 appears in the flat zones. With cornerbacks bailing to deep-third responsibilities, the flat defenders (typically outside linebackers or nickel backs) must cover significant width, creating opportunities for quick passes to the flat or bubble screens. Offenses attack Cover 3 with quick passes to the perimeter, forcing defenders to tackle in space.

Cover 3 has seen reduced usage in recent years as offenses have evolved to attack its flat-zone vulnerabilities more effectively. However, it remains valuable in specific situations, particularly on third-and-long where protecting against deep completions is paramount.

Cover 4 (Quarters): Pattern-Matching Zone

Cover 4, also known as "Quarters" coverage, represents the most sophisticated zone coverage concept. Four deep defenders each protect a quarter of the field, but unlike traditional zone coverage, these defenders use pattern-matching principles: they read the route combinations being run and adjust their coverage responsibilities based on the offense's patterns.

In pure quarters coverage, the four deep defenders (typically two cornerbacks and two safeties) initially divide the field into quadrants. However, their actual coverage responsibilities change based on the number of vertical threats from their side. If their side has only one vertical threat, one defender can carry that route deep while the other sits on the intermediate zone. If their side has two vertical threats, both defenders stay deep. This pattern-matching creates a coverage that can morph between two-deep and four-deep based on the route concept.
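The pattern-matching rule described above can be written as a small decision function. This is a simplified sketch of the rule as stated in the text, not a full quarters rulebook; the function names are hypothetical, and the case of zero vertical threats is left out because the text does not describe it.

```python
def deep_defenders_on_side(vertical_threats: int) -> int:
    """How many of a side's two quarters defenders (corner, safety) stay deep."""
    if vertical_threats < 1:
        raise ValueError("the rule sketched here covers one or more vertical threats")
    # Two or more verticals: both defenders stay deep.
    # One vertical: one defender carries it, the other sits on the intermediate zone.
    return 2 if vertical_threats >= 2 else 1


def quarters_shell(left_verticals: int, right_verticals: int) -> str:
    """Describe the deep shell that results from the routes on each side."""
    total = deep_defenders_on_side(left_verticals) + deep_defenders_on_side(right_verticals)
    return f"{total}-deep"


print(quarters_shell(1, 1))
print(quarters_shell(2, 2))
```

With one vertical threat per side the structure plays like a two-deep coverage, and with two per side it becomes four-deep, which is the morphing behavior the paragraph describes.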

Cover 4 provides exceptional flexibility and adapts naturally to different offensive formations and route combinations. It's particularly effective against condensed formations where traditional zone coverages struggle to distribute defenders appropriately. The pattern-matching principles make it difficult for offenses to create clear conflicts in coverage.

However, Cover 4 requires intelligent, well-coached defenders who can quickly read and react to route patterns. Less experienced players may struggle with the complex rules and communication required. The coverage is also vulnerable to pick routes and combination patterns designed to create confusion about which defender has which receiver.

Cover 6 (Quarter-Quarter-Half): Hybrid Coverage

Cover 6 combines elements of Cover 4 and Cover 2 into a single coverage, typically playing quarters (Cover 4) to one side of the field and two-deep (Cover 2) to the other side. This creates an asymmetric coverage structure that can adapt to unbalanced offensive formations or specific game plan focuses.

The flexibility of Cover 6 allows defenses to play quarters coverage to the strength of the formation (where more receivers are aligned) while playing simpler two-deep coverage to the weak side. This optimizes defensive resources by deploying more sophisticated coverage where the offense is more likely to attack while maintaining sound coverage on the backside.

Offenses can struggle to identify Cover 6 pre-snap, as it can resemble pure Cover 2 or Cover 4 based on alignment. The post-snap rotation that reveals the true coverage structure can confuse quarterbacks and disrupt timing. However, once offenses recognize the coverage, they can potentially create advantageous matchups by attacking the coverage's seams or leveraging the different coverage structures on each side of the field.
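For later analysis it can help to hold the schemes described in this section in one lookup structure. The sketch below is an illustrative encoding of the descriptions above; the `Coverage` class, its field names, and the shell labels are assumptions made for this example rather than a standard schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coverage:
    name: str
    family: str          # "man" or "zone"
    deep_defenders: int  # defenders assigned to deep zones
    shell: str           # "no-deep", "one-high", or "two-high"


COVERAGES = {
    "Cover 0": Coverage("Cover 0", "man", 0, "no-deep"),
    "Cover 1": Coverage("Cover 1", "man", 1, "one-high"),
    "Cover 2": Coverage("Cover 2", "zone", 2, "two-high"),
    "Cover 3": Coverage("Cover 3", "zone", 3, "one-high"),
    "Cover 4": Coverage("Cover 4", "zone", 4, "two-high"),
    # Quarter-quarter-half: two quarters defenders plus one deep-half defender
    "Cover 6": Coverage("Cover 6", "zone", 3, "two-high"),
}


def by_shell(shell: str) -> list[str]:
    """Names of coverages that use the given safety shell."""
    return [c.name for c in COVERAGES.values() if c.shell == shell]


print("Two-high:", by_shell("two-high"))
print("One-high:", by_shell("one-high"))
```

Cover 6 is the one entry where the number in the name does not match the count of deep defenders, which is why the text says the number only typically indicates that count.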

Modern Coverage Complexity

Modern defenses rarely play pure coverages. Instead, they use pattern-matching principles, disguises, and post-snap rotations to confuse quarterbacks and create uncertainty. Our analysis must account for these nuances.

Two-High vs One-High Safety Structures

One of the most important strategic decisions in pass defense is the number of deep safeties. This choice fundamentally affects both pass and run defense:

Two-High Safety Advantages:
- Better deep coverage against vertical passing attacks
- More flexibility in defending multiple vertical threats
- Reduced risk of explosive plays
- Better positioned against modern spread offenses

One-High Safety Advantages:
- Additional defender in the box for run support
- More aggressive man coverage capabilities
- Better for defending tight formations
- Creates clearer matchups for cornerbacks

The trend in recent NFL seasons has been toward increased two-high safety usage, driven by explosive passing offenses and mobile quarterbacks.

Loading and Preparing Coverage Data

Now that we understand the theoretical foundations of coverage schemes, we can begin our empirical analysis. Coverage analysis presents unique data challenges compared to other football analytics domains. While play-by-play data tracks outcomes (completions, yards, EPA), it contains limited information about the specific coverage played on each snap.

The nflfastR dataset includes a pass_defense_man_zone field that categorizes plays as either "man" or "zone" coverage based on charting from various sources. However, this broad categorization doesn't distinguish between specific coverage types (Cover 1 vs Cover 0, or Cover 2 vs Cover 3). For detailed coverage identification, analysts typically need to integrate:

  1. Next Gen Stats data - Provides tracking data showing defender alignments and movements
  2. Charting services - Manual charting of coverage schemes (Sports Info Solutions, Pro Football Focus)
  3. Film analysis - Direct observation of All-22 film to identify coverage structures
  4. Proxy variables - Using personnel groupings, formation data, and defensive outcomes to infer coverage

For this chapter, we'll primarily use the man/zone classification from nflfastR while demonstrating how to enhance analysis with additional data sources where available. We'll also explore techniques for inferring coverage types from observable patterns in the data.

Let's begin by loading multiple seasons of play-by-play data. Using multiple seasons is crucial for coverage analysis because:

  • Sample size: Individual coverage schemes may only be used on 10-20% of plays, requiring large samples for reliable estimates
  • Trend analysis: Coverage usage has evolved significantly in recent years; multiple seasons reveal these trends
  • Stability assessment: Multiple seasons help distinguish true coverage effectiveness from random variation
  • Personnel changes: Tracking coverage performance across personnel transitions helps isolate scheme effects from talent effects
#| label: setup-r
#| message: false
#| warning: false
#| cache: true

# Load required libraries for data manipulation and visualization
library(tidyverse)      # Data manipulation and visualization ecosystem
library(nflfastR)       # NFL play-by-play data access
library(nflplotR)       # NFL-specific plotting utilities (logos, colors)
library(gt)             # Publication-quality tables
library(scales)         # Formatting scales for plots

# Load multiple seasons of play-by-play data for robust analysis
# Three seasons (2021-2023) provide roughly 60,000 dropbacks
# (about 20,000 per season). This sample size is crucial for analyzing
# coverage schemes that may only appear on 10-20% of plays
pbp <- load_pbp(2021:2023)

# Filter for pass plays with coverage data available
# We need to exclude plays without valid EPA (penalties, timeouts, etc.)
# and keep dropbacks, where coverage was actually played
pass_plays <- pbp %>%
  filter(
    !is.na(epa),                        # Valid EPA calculation
    !is.na(posteam),                    # Known possessing team
    play_type == "pass",                # Dropbacks: pass attempts and sacks
    !is.na(pass_defense_man_zone)       # Coverage type available
  ) %>%
  mutate(
    # Create binary indicators for coverage type
    # These make filtering and grouping easier in subsequent analysis
    is_man_coverage = if_else(pass_defense_man_zone == "man", 1, 0),
    is_zone_coverage = if_else(pass_defense_man_zone == "zone", 1, 0),

    # Define success metrics for defensive evaluation
    # Positive EPA means offense gained expected points (bad for defense)
    epa_success = if_else(epa > 0, 1, 0),

    # Explosive plays are 20+ yard gains
    # These are critically important to prevent in coverage
    explosive_play = if_else(yards_gained >= 20, 1, 0),

    # Categorize defensive personnel by the number of defensive backs
    # Format is "DL-LB-DB" (e.g., "4-2-5" is Nickel: 4 DL, 2 LB, 5 DB)
    def_personnel_grouped = case_when(
      grepl("-4$", defteam_formation) ~ "Base",      # 4 DBs (4-3-4, 3-4-4)
      grepl("-5$", defteam_formation) ~ "Nickel",    # 5 DBs (4-2-5, 3-3-5, 2-4-5)
      grepl("-6$", defteam_formation) ~ "Dime",      # 6 DBs (4-1-6, 3-2-6)
      TRUE ~ "Other"                                 # Specialty packages or missing
    )
  )

# Display data loading summary
# Coverage share is measured against all valid pass plays, because
# pass_plays itself only keeps rows where coverage is charted
n_all_pass <- pbp %>%
  filter(!is.na(epa), !is.na(posteam), play_type == "pass") %>%
  nrow()
cat("Loaded", nrow(pass_plays), "pass plays from 2021-2023\n")
cat("Coverage information available:",
    round(100 * nrow(pass_plays) / n_all_pass, 1),
    "% of plays\n")
cat("\nCoverage distribution:\n")
pass_plays %>%
  count(pass_defense_man_zone) %>%
  mutate(pct = 100 * n / sum(n)) %>%
  print()
#| label: setup-py
#| message: false
#| warning: false
#| cache: true

# Import required libraries for data analysis and visualization
import pandas as pd           # Data manipulation with DataFrames
import numpy as np            # Numerical computing and arrays
import nfl_data_py as nfl     # NFL data access for Python
import matplotlib.pyplot as plt  # Plotting library
import seaborn as sns         # Statistical visualization

# Configure visualization aesthetics
# Using seaborn style for professional-looking plots
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

# Load multiple seasons of play-by-play data
# Years 2021-2023 provide sufficient sample size for coverage analysis
pbp = nfl.import_pbp_data([2021, 2022, 2023])

# Filter for pass plays with coverage data available
# Chain filtering and feature engineering using method chaining
pass_plays = (pbp
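    # Initial filtering for valid pass plays
    # engine="python" lets query() evaluate method calls such as .notna()
    .query("epa.notna() & posteam.notna() & play_type == 'pass'", engine="python")
    # Require coverage information to be present
    .query("pass_defense_man_zone.notna()", engine="python")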
    # Create derived features using assign
    .assign(
        # Binary indicators for coverage type
        # Convert boolean to integer (0/1) for easier aggregation
        is_man_coverage=lambda x: (x['pass_defense_man_zone'] == 'man').astype(int),
        is_zone_coverage=lambda x: (x['pass_defense_man_zone'] == 'zone').astype(int),

        # Success metrics from defensive perspective
        # Positive EPA = offense gained expected points = defensive failure
        epa_success=lambda x: (x['epa'] > 0).astype(int),

        # Explosive plays (20+ yards) are critical coverage failures
        explosive_play=lambda x: (x['yards_gained'] >= 20).astype(int),

        # Categorize defensive personnel by the number of defensive backs
        # Format: "DL-LB-DB" (e.g., "4-2-5" = Nickel)
        def_personnel_grouped=lambda x: x['defteam_formation'].apply(
            lambda y: 'Base' if pd.notna(y) and y.endswith('-4') else
                     'Nickel' if pd.notna(y) and y.endswith('-5') else
                     'Dime' if pd.notna(y) and y.endswith('-6') else
                     'Other'
        )
    )
)

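# Display data loading summary
# Coverage share is measured against all valid pass plays, because
# pass_plays itself only keeps rows where coverage is charted
n_all_pass = len(
    pbp.query("epa.notna() & posteam.notna() & play_type == 'pass'", engine="python")
)
print(f"Loaded {len(pass_plays):,} pass plays from 2021-2023")
print(f"Coverage information available: "
      f"{100 * len(pass_plays) / n_all_pass:.1f}% of plays")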

# Show coverage distribution
print("\nCoverage distribution:")
coverage_dist = pass_plays['pass_defense_man_zone'].value_counts()
for cov, count in coverage_dist.items():
    pct = 100 * count / len(pass_plays)
    print(f"  {cov}: {count:,} plays ({pct:.1f}%)")
This code sets up our coverage analysis environment by loading and preparing data. Let's break down the key components:

**Data Loading Strategy**: We load three seasons (2021-2023) rather than a single season because coverage schemes may only appear on 10-20% of plays. A single season contains roughly 20,000 dropbacks, so we need multiple seasons to achieve sufficient sample sizes for reliable statistical estimates.

**Filtering Logic**: We apply multiple filters to ensure data quality:
- `epa.notna()`: Excludes plays without valid EPA (penalties, special circumstances)
- `play_type == 'pass'`: Focuses only on passing plays where coverage was actually used
- `pass_defense_man_zone.notna()`: Requires coverage classification to be available

**Feature Engineering**: We create several derived variables:
- **Binary coverage indicators** (`is_man_coverage`, `is_zone_coverage`): Enable easy filtering and aggregation by coverage type
- **Success metrics** (`epa_success`, `explosive_play`): Define what constitutes defensive success or failure
- **Personnel groupings** (`def_personnel_grouped`): Categorize defensive formations, as coverage usage varies significantly by personnel

**Personnel Package Terminology**: Packages are named for the number of defensive backs on the field.
- **Nickel** (4-2-5, 3-3-5, or 2-4-5): Five defensive backs. The most common modern package, used on ~60% of plays.
- **Dime** (4-1-6 or 3-2-6): Six defensive backs. Used in obvious passing situations.
- **Base** (4-3-4 or 3-4-4): Four defensive backs behind a traditional seven-man front. Increasingly rare in modern NFL.

The output shows us how much data we have and what proportion uses man versus zone coverage. This distribution itself tells an important story about modern defensive philosophy.
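The sample-size argument can be made concrete with a rough standard-error calculation. The sketch below assumes a per-play EPA standard deviation of 1.5, which is an illustrative value rather than an estimate from this dataset; `epa_standard_error` and `plays_needed` are hypothetical helper names.

```python
import math

ASSUMED_EPA_SD = 1.5  # assumption for illustration, not measured from the data


def epa_standard_error(n_plays: int, epa_sd: float = ASSUMED_EPA_SD) -> float:
    """Standard error of mean EPA for a sample of n_plays."""
    if n_plays <= 0:
        raise ValueError("n_plays must be positive")
    return epa_sd / math.sqrt(n_plays)


def plays_needed(margin: float, epa_sd: float = ASSUMED_EPA_SD, z: float = 1.96) -> int:
    """Plays required for an approximate 95% interval of +/- margin on mean EPA."""
    if margin <= 0:
        raise ValueError("margin must be positive")
    return math.ceil((z * epa_sd / margin) ** 2)


# A coverage used on 10% of plays yields far fewer snaps than the full sample
for n in (500, 2_500, 10_000):
    print(f"{n:>6} plays -> standard error {epa_standard_error(n):.3f} EPA/play")

print("Plays needed for +/- 0.05 EPA:", plays_needed(0.05))
```

Because the standard error shrinks with the square root of the sample, halving the uncertainty takes four times as many plays, which is why pooling seasons matters most for rarely used coverages.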

Data Limitation: Coverage Classification

The `pass_defense_man_zone` field provides only broad categorization (man vs zone) and doesn't distinguish between specific coverage types (Cover 1 vs Cover 0, Cover 2 vs Cover 4). More detailed coverage analysis requires supplementary data sources like Next Gen Stats tracking data or professional charting services. Additionally, modern pattern-matching coverages blur the line between traditional man and zone concepts.

Man vs Zone Coverage Analysis

Overall Effectiveness Comparison

Now that we have our data prepared, we can begin analyzing the fundamental question in pass defense: which is more effective, man coverage or zone coverage? This question has been debated in coaching circles for decades, but data provides empirical evidence to inform the discussion.

However, we must be careful in our interpretation. Coverage effectiveness depends heavily on context: the quality of defenders, the offensive personnel, game situation, and coaching scheme all matter. A simple overall comparison provides a starting point, but we'll need to examine situational splits to develop actionable insights.

We'll evaluate coverage effectiveness using multiple metrics:

  • EPA per play: The primary metric for overall play value
  • Success rate: Percentage of plays where the offense gains positive EPA
  • Completion percentage: How often passes are completed against each coverage
  • Yards per attempt: Average yards gained per pass attempt
  • Explosive play rate: Percentage of plays resulting in 20+ yard gains
  • Interception rate: Frequency of turnovers created
  • Sack rate: Frequency of sacks (though this is influenced by pass rush more than coverage)

Each metric captures a different dimension of coverage performance. EPA provides the most comprehensive measure, but examining multiple metrics reveals the mechanisms through which coverages succeed or fail.

#| label: man-zone-comparison-r
#| message: false
#| warning: false

# Calculate comprehensive man vs zone metrics
# Group by coverage type and compute multiple performance indicators
# to get a complete picture of coverage effectiveness
man_zone_comparison <- pass_plays %>%
  group_by(coverage_type = pass_defense_man_zone) %>%
  summarise(
    # Sample size - important for assessing statistical significance
    plays = n(),

    # Primary effectiveness metric: EPA allowed
    # Lower (more negative) is better for defense
    avg_epa = mean(epa, na.rm = TRUE),

    # Success rate: percentage of plays where offense gains positive EPA
    # Lower is better for defense
    success_rate = mean(epa_success, na.rm = TRUE),

    # Completion percentage: completions per pass attempt
    # Sacks are excluded because no pass was thrown; lower is better for defense
    completion_pct = mean(complete_pass[sack == 0], na.rm = TRUE),

    # Average yards gained per pass attempt (sacks excluded)
    # Lower is better for defense
    yards_per_attempt = mean(yards_gained[sack == 0], na.rm = TRUE),

    # Explosive play rate: percentage of 20+ yard gains
    # Critical metric - explosive plays dramatically shift game outcomes
    explosive_rate = mean(explosive_play, na.rm = TRUE),

    # Interception rate: interceptions per pass attempt (sacks excluded)
    # Higher is better for defense
    interception_rate = mean(interception[sack == 0], na.rm = TRUE),

    # Sack rate: frequency of sacks
    # Reflects both coverage and pass rush performance
    sack_rate = mean(sack, na.rm = TRUE),

    .groups = "drop"
  )

# Create publication-quality formatted table
# Color-code EPA to highlight better/worse performing coverages
man_zone_comparison %>%
  gt() %>%
  cols_label(
    coverage_type = "Coverage Type",
    plays = "Plays",
    avg_epa = "EPA/Play",
    success_rate = "Success Rate",
    completion_pct = "Completion %",
    yards_per_attempt = "Yards/Att",
    explosive_rate = "Explosive %",
    interception_rate = "INT %",
    sack_rate = "Sack %"
  ) %>%
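  # Format EPA and yards per attempt with three decimal places
  fmt_number(
    columns = c(avg_epa, yards_per_attempt),
    decimals = 3
  ) %>%
  # Rates are stored as proportions, so display them as percentages
  fmt_percent(
    columns = c(success_rate, completion_pct, explosive_rate,
                interception_rate, sack_rate),
    decimals = 1
  ) %>%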
  # Format play counts with thousands separators
  fmt_number(
    columns = plays,
    decimals = 0,
    use_seps = TRUE
  ) %>%
  # Color-code EPA column: green (good for defense) to red (bad for defense)
  data_color(
    columns = avg_epa,
    colors = scales::col_numeric(
      palette = c("#d73027", "#fee08b", "#1a9850"),
      domain = c(max(man_zone_comparison$avg_epa),
                 min(man_zone_comparison$avg_epa)),
      reverse = TRUE  # Reverse because lower EPA is better
    )
  ) %>%
  tab_header(
    title = "Man vs Zone Coverage Performance",
    subtitle = "NFL 2021-2023 Seasons"
  ) %>%
  tab_source_note("Data: nflfastR | Lower EPA is better for defense")
#| label: man-zone-comparison-py
#| message: false
#| warning: false

# Calculate comprehensive man vs zone metrics
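man_zone_comparison = (pass_plays
    .assign(
        # Blank out sacks so per-attempt metrics ignore plays with no throw
        # (mean skips NaN, so the denominator becomes pass attempts only)
        complete_att=lambda x: x['complete_pass'].where(x['sack'] == 0),
        yards_att=lambda x: x['yards_gained'].where(x['sack'] == 0),
        interception_att=lambda x: x['interception'].where(x['sack'] == 0)
    )
    .groupby('pass_defense_man_zone')
    .agg(
        plays=('epa', 'count'),
        avg_epa=('epa', 'mean'),
        success_rate=('epa_success', 'mean'),
        completion_pct=('complete_att', 'mean'),
        yards_per_attempt=('yards_att', 'mean'),
        explosive_rate=('explosive_play', 'mean'),
        interception_rate=('interception_att', 'mean'),
        sack_rate=('sack', 'mean')
    )
    .reset_index()
    .rename(columns={'pass_defense_man_zone': 'coverage_type'})
)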

print("\n=== Man vs Zone Coverage Performance (2021-2023) ===\n")
print(man_zone_comparison.to_string(index=False))
print("\nNote: Lower EPA is better for defense")
This analysis aggregates all pass plays from 2021-2023 and compares man versus zone coverage across multiple performance dimensions. The methodology is straightforward:

  1. **Group plays by coverage type** (man or zone)
  2. **Calculate mean values** for each performance metric within each group
  3. **Compare the aggregated results** to identify which coverage performs better overall

The metrics we're examining tell different parts of the coverage effectiveness story:

- **EPA/Play**: The most comprehensive metric, combining completion probability, yards gained, and field position into a single expected points value
- **Success Rate**: A binary version of EPA: did the play generate positive EPA for the offense?
- **Completion Percentage**: The most basic measure: how often do passes connect?
- **Yards per Attempt**: Average yards gained, accounting for both completions and incompletions
- **Explosive Rate**: Critical for understanding ceiling and floor: does this coverage prevent or allow big plays?
- **Interception Rate**: Defensive playmaking potential
- **Sack Rate**: Coverage-sack relationship (though sacks depend heavily on pass rush)

**Key Statistical Considerations**:

- With thousands of plays per coverage type, our sample sizes are large, so even small differences can be statistically significant
- However, **statistical significance doesn't equal practical significance**: a 0.01 EPA difference might be statistically significant but strategically meaningless
- We must consider **selection bias**: defenses don't choose man vs zone randomly. They select coverages based on situation, opponent, personnel, and game plan
- **Confounding factors**: Coaching quality, defender talent, opponent strength, and game script all influence these results
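The selection-bias concern can be illustrated by comparing a naive EPA gap with one computed within game situations. The data below is a toy example invented purely for illustration, not real NFL results, and the column names `situation` and `coverage` are hypothetical stand-ins for whatever situational split an analyst chooses.

```python
import pandas as pd

# Toy data: man coverage is called mostly on third-and-long, where EPA is
# low against every coverage; zone is called mostly on early downs.
plays = pd.DataFrame({
    "situation": ["early_down"] * 4 + ["third_long"] * 4,
    "coverage": ["man", "zone", "zone", "zone", "man", "man", "man", "zone"],
    "epa": [0.3, 0.2, 0.2, 0.2, -0.4, -0.4, -0.4, -0.5],
})


def naive_epa_gap(df: pd.DataFrame) -> float:
    """Mean EPA allowed in man minus mean EPA allowed in zone, ignoring context."""
    means = df.groupby("coverage")["epa"].mean()
    return float(means["man"] - means["zone"])


def stratified_epa_gap(df: pd.DataFrame, strata_col: str = "situation"):
    """Man-minus-zone EPA gap within each stratum, plus a play-weighted average."""
    cell = df.groupby([strata_col, "coverage"])["epa"].agg(["mean", "count"])
    means = cell["mean"].unstack("coverage")
    counts = cell["count"].unstack("coverage")
    gap_by_stratum = means["man"] - means["zone"]
    weights = counts.sum(axis=1) / counts.to_numpy().sum()
    return gap_by_stratum, float((gap_by_stratum * weights).sum())


naive_gap = naive_epa_gap(plays)
gap_by_situation, adjusted_gap = stratified_epa_gap(plays)
print(f"Naive gap (man - zone): {naive_gap:+.3f}")
print(gap_by_situation.round(3))
print(f"Situation-adjusted gap: {adjusted_gap:+.3f}")
```

In this toy data the naive gap makes man coverage look better, yet within each situation man allows more EPA than zone. The reversal comes entirely from when each coverage is called, which is the kind of confounding that situational splits are meant to expose.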

Interpreting Coverage EPA

When evaluating coverage performance, **lower EPA allowed is better for the defense**. A negative EPA means the defense is preventing the offense from gaining expected points. For example, if zone coverage allows +0.05 EPA per play and man coverage allows -0.02 EPA per play, man coverage is performing better because it is preventing the offense from generating expected points.

**Critical Context**: These aggregate numbers mask enormous variation. The difference between elite and poor execution of the same coverage scheme often exceeds the difference between coverage schemes themselves. A well-executed Cover 2 with elite safeties outperforms poorly-executed Cover 1 with below-average cornerbacks.

Interpretation of Man vs Zone Results

When we examine the comprehensive comparison between man and zone coverage, several patterns typically emerge in modern NFL data:

Zone Coverage Characteristics (generally):
- Higher completion percentages: Zone coverage, by design, concedes underneath completions while protecting deep areas. Quarterbacks often complete 65-70% of passes against zone compared to 60-65% against man coverage.
- Lower explosive play rates: The primary purpose of zone coverage—particularly two-high safety shells—is preventing explosive plays. Zone coverage typically reduces 20+ yard play rates by 1-2 percentage points compared to man coverage.
- Lower sack rates: Zone coverage often requires fewer blitzers, resulting in fewer sacks. Man coverage enables defenses to bring more rushers while maintaining coverage integrity.
- Fewer turnovers: Zone coverage generates fewer interceptions than man coverage because defenders are playing areas rather than tracking specific receivers, making it harder to jump routes.

Man Coverage Characteristics (generally):
- Lower completion percentages: Tight man coverage disrupts timing and makes completions more difficult, particularly against precision timing routes.
- Higher explosive play potential: When man coverage fails, it often fails catastrophically. A beaten cornerback with no safety help can give up a touchdown.
- Higher sack rates: Man coverage allows defenses to bring additional rushers, increasing sack frequency.
- More turnovers: Man coverage defenders stay attached to a specific receiver, which lets them undercut and jump routes, leading to more interceptions.

The Overall EPA Picture: Despite zone coverage's higher completion rates, the two coverage families often show similar overall EPA in aggregate data. Why? Because zone coverage's tendency to allow completions is offset by preventing explosive plays, while man coverage's lower completion rates are offset by occasional catastrophic busts. The result is often a near-equilibrium in EPA—but this equilibrium masks very different risk profiles.

Think of it like investing: zone coverage is bonds (steady, predictable, lower variance), while man coverage is stocks (higher variance, bigger upside and downside). The choice between them depends on your personnel, your opponent, and your strategic philosophy—are you trying to minimize downside risk or maximize upside potential?
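The bonds-versus-stocks analogy comes down to equal means with unequal spread. The sketch below uses two made-up outcome distributions, chosen so that both average 0.12 EPA allowed per play; the outcome values and probabilities are assumptions for illustration, not estimates from the data.

```python
import math

# Hypothetical per-play outcomes as (EPA allowed, probability) pairs.
ZONE = [(-0.5, 0.30), (0.2, 0.60), (1.5, 0.10)]   # many modest gains, few busts
MAN = [(-0.6, 0.50), (0.3, 0.40), (3.0, 0.10)]    # more wins, bigger busts


def mean_and_sd(outcomes):
    """Return the expected value and standard deviation of a discrete distribution."""
    mean = sum(epa * p for epa, p in outcomes)
    variance = sum(p * (epa - mean) ** 2 for epa, p in outcomes)
    return mean, math.sqrt(variance)


zone_mean, zone_sd = mean_and_sd(ZONE)
man_mean, man_sd = mean_and_sd(MAN)

print(f"Zone: mean {zone_mean:.2f}, sd {zone_sd:.2f}")
print(f"Man:  mean {man_mean:.2f}, sd {man_sd:.2f}")
```

Both coverages allow the same EPA on average, but the man distribution has roughly twice the standard deviation. An aggregate EPA comparison would call them equal and miss that difference in risk.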

Practical Application: Coverage Selection Framework

When choosing between man and zone coverage, defensive coordinators should consider:

  1. **Personnel**: Do you have cornerbacks capable of winning one-on-one matchups consistently?
  2. **Opponent**: Does the offense rely on timing routes (vulnerable to man) or vertical shots (vulnerable to zone)?
  3. **Game Script**: Protecting a lead favors zone coverage (prevent explosives); chasing points favors man coverage (force punts)
  4. **Down & Distance**: Short yardage favors man (prevent any completion); long yardage favors zone (prevent conversions)
  5. **Field Position**: Backed up favors zone (can't afford explosives); in opponents' territory favors man (force mistakes)

The best defenses don't commit to one coverage philosophy; they adapt based on these contextual factors.
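The five factors can be read as a simple tally, sketched below. This is a toy heuristic rather than a model any team is known to use: `Situation`, `coverage_votes`, the equal weighting of factors, and the yardage cutoffs (3 or fewer for short, 8 or more for long, borrowed from this chapter's distance buckets) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    corners_win_one_on_one: bool  # personnel: can the corners hold up alone?
    opponent_style: str           # "timing" or "vertical"
    score_margin: int             # defense's score minus offense's score
    ydstogo: int                  # yards to go for a first down
    defending_own_half: bool      # True when the ball is in the defense's territory


def coverage_votes(s: Situation) -> dict:
    """Tally one vote per factor; a factor with no clear lean casts no vote."""
    votes = {"man": 0, "zone": 0}

    # 1. Personnel: strong corners favor man
    votes["man" if s.corners_win_one_on_one else "zone"] += 1
    # 2. Opponent: timing routes favor man, vertical shots favor zone
    if s.opponent_style == "timing":
        votes["man"] += 1
    elif s.opponent_style == "vertical":
        votes["zone"] += 1
    # 3. Game script: leading favors zone, trailing favors man
    if s.score_margin > 0:
        votes["zone"] += 1
    elif s.score_margin < 0:
        votes["man"] += 1
    # 4. Down and distance: short favors man, long favors zone
    if s.ydstogo <= 3:
        votes["man"] += 1
    elif s.ydstogo >= 8:
        votes["zone"] += 1
    # 5. Field position: backed up favors zone, otherwise man
    votes["zone" if s.defending_own_half else "man"] += 1
    return votes


protecting_lead = Situation(False, "vertical", 7, 12, True)
chasing_points = Situation(True, "timing", -4, 2, False)

print(coverage_votes(protecting_lead))
print(coverage_votes(chasing_points))
```

Real situations rarely line up this cleanly. When the votes split, the tally only shows which factors are in tension and does not settle the call.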

Coverage Performance by Down and Distance

The aggregate comparison provides useful baseline information, but coverage effectiveness varies dramatically by game situation. A coverage that excels on first down may struggle on third-and-long. Let's examine how man and zone coverage perform in different down-and-distance scenarios, which will reveal when each coverage type provides strategic advantages.

#| label: coverage-by-situation-r
#| message: false
#| warning: false

# Create down/distance categories
coverage_by_situation <- pass_plays %>%
  filter(down %in% c(1, 2, 3)) %>%
  mutate(
    distance_category = case_when(
      ydstogo <= 3 ~ "Short (≤3)",
      ydstogo <= 7 ~ "Medium (4-7)",
      ydstogo <= 10 ~ "Long (8-10)",
      TRUE ~ "Very Long (>10)"
    )
  ) %>%
  group_by(down, distance_category, coverage_type = pass_defense_man_zone) %>%
  summarise(
    plays = n(),
    avg_epa = mean(epa, na.rm = TRUE),
    success_rate = mean(epa_success, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(plays >= 100)  # Minimum sample size

# Visualize EPA by situation
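coverage_by_situation %>%
  # Order distance buckets from shortest to longest instead of alphabetically
  mutate(distance_category = factor(
    distance_category,
    levels = c("Short (≤3)", "Medium (4-7)", "Long (8-10)", "Very Long (>10)")
  )) %>%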
  ggplot(aes(x = distance_category, y = avg_epa, fill = coverage_type)) +
  geom_col(position = "dodge") +
  geom_hline(yintercept = 0, linetype = "dashed", color = "black") +
  facet_wrap(~down, labeller = labeller(down = function(x) paste("Down:", x))) +
  scale_fill_manual(
    values = c("man" = "#E69F00", "zone" = "#56B4E9"),
    labels = c("Man Coverage", "Zone Coverage")
  ) +
  labs(
    title = "Coverage EPA Allowed by Down and Distance",
    subtitle = "NFL 2021-2023 | Minimum 100 plays per category",
    x = "Distance to Go",
    y = "Average EPA Allowed",
    fill = "Coverage Type",
    caption = "Data: nflfastR | Lower EPA is better for defense"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "top"
  )
#| label: coverage-by-situation-py
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false

# Create down/distance categories
def categorize_distance(yards):
    if yards <= 3:
        return "Short (≤3)"
    elif yards <= 7:
        return "Medium (4-7)"
    elif yards <= 10:
        return "Long (8-10)"
    else:
        return "Very Long (>10)"

coverage_by_situation = (pass_plays
    .query("down in [1, 2, 3]")
    .assign(distance_category=lambda x: x['ydstogo'].apply(categorize_distance))
    .groupby(['down', 'distance_category', 'pass_defense_man_zone'])
    .agg(
        plays=('epa', 'count'),
        avg_epa=('epa', 'mean'),
        success_rate=('epa_success', 'mean')
    )
    .reset_index()
    .query("plays >= 100")  # Minimum sample size
)

# Create visualization
fig, axes = plt.subplots(1, 3, figsize=(15, 5), sharey=True)

for idx, down in enumerate([1, 2, 3]):
    ax = axes[idx]
    data = coverage_by_situation.query(f"down == {down}")

    # Prepare data for plotting
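    # Keep distance buckets in football order (groupby sorts them alphabetically)
    bucket_order = ["Short (≤3)", "Medium (4-7)", "Long (8-10)", "Very Long (>10)"]
    categories = [cat for cat in bucket_order if cat in set(data['distance_category'])]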
    x = np.arange(len(categories))
    width = 0.35

    man_data = data.query("pass_defense_man_zone == 'man'").set_index('distance_category')
    zone_data = data.query("pass_defense_man_zone == 'zone'").set_index('distance_category')

    # Plot bars
    ax.bar(x - width/2, [man_data.loc[cat, 'avg_epa'] if cat in man_data.index else 0
                         for cat in categories],
           width, label='Man Coverage', color='#E69F00')
    ax.bar(x + width/2, [zone_data.loc[cat, 'avg_epa'] if cat in zone_data.index else 0
                         for cat in categories],
           width, label='Zone Coverage', color='#56B4E9')

    ax.axhline(y=0, color='black', linestyle='--', alpha=0.7)
    ax.set_xlabel('Distance to Go')
    ax.set_title(f'Down: {down}')
    ax.set_xticks(x)
    ax.set_xticklabels(categories, rotation=45, ha='right')

    if idx == 0:
        ax.set_ylabel('Average EPA Allowed')
        ax.legend()

plt.suptitle('Coverage EPA Allowed by Down and Distance\nNFL 2021-2023 | Minimum 100 plays per category',
             fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

Pressure and Coverage Interaction

Pass defense is never just about coverage or just about pass rush—it's about the synergy between the two. The old defensive coaching adage states, "Coverage creates pressure, and pressure creates coverage." This bidirectional relationship means that evaluating coverage in isolation misses a critical dimension of defensive performance.

When pass rushers generate pressure, they reduce the time quarterbacks have to find open receivers, making even mediocre coverage appear effective. Conversely, when coverage forces quarterbacks to hold the ball longer while searching for an open target, it gives pass rushers additional time to get home with a sack. This interdependence makes it crucial to analyze how coverage performance varies with different levels of pass rush success.

We can categorize plays into three pressure levels:

  1. Clean Pocket: No quarterback hit or sack; QB has time to go through progressions
  2. Pressure (No Sack): Quarterback is hit but still gets the pass off; the play is disrupted but not stopped
  3. Sack: Quarterback is taken down behind the line; complete coverage/rush success

By examining how man and zone coverage perform under each pressure level, we can identify which coverage types provide the best "coverage-sack" combinations and which coverages depend most heavily on pass rush support.

#| label: pressure-coverage-r
#| message: false
#| warning: false

# Analyze coverage performance with/without pressure
pressure_coverage <- pass_plays %>%
  filter(!is.na(qb_hit)) %>%
  mutate(
    pressure_indicator = case_when(
      sack == 1 ~ "Sack",
      qb_hit == 1 ~ "Pressure (No Sack)",
      TRUE ~ "Clean Pocket"
    )
  ) %>%
  group_by(pressure_indicator, coverage_type = pass_defense_man_zone) %>%
  summarise(
    plays = n(),
    avg_epa = mean(epa, na.rm = TRUE),
    completion_pct = mean(complete_pass, na.rm = TRUE),
    yards_per_attempt = mean(yards_gained, na.rm = TRUE),
    .groups = "drop"
  )

# Visualize the interaction
pressure_coverage %>%
  ggplot(aes(x = pressure_indicator, y = avg_epa,
             color = coverage_type, group = coverage_type)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 3) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") +
  scale_color_manual(
    values = c("man" = "#E69F00", "zone" = "#56B4E9"),
    labels = c("Man Coverage", "Zone Coverage")
  ) +
  labs(
    title = "Coverage Performance by Pass Rush Pressure",
    subtitle = "EPA Allowed: Clean Pocket vs Pressure vs Sack",
    x = "Pressure Type",
    y = "Average EPA Allowed",
    color = "Coverage Type",
    caption = "Data: nflfastR 2021-2023"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "top"
  )
#| label: pressure-coverage-py
#| fig-width: 10
#| fig-height: 6
#| message: false
#| warning: false

# Analyze coverage performance with/without pressure
def categorize_pressure(row):
    if row['sack'] == 1:
        return "Sack"
    elif row['qb_hit'] == 1:
        return "Pressure (No Sack)"
    else:
        return "Clean Pocket"

pressure_coverage = (pass_plays
    .query("qb_hit.notna()")
    .assign(pressure_indicator=lambda x: x.apply(categorize_pressure, axis=1))
    .groupby(['pressure_indicator', 'pass_defense_man_zone'])
    .agg(
        plays=('epa', 'count'),
        avg_epa=('epa', 'mean'),
        completion_pct=('complete_pass', 'mean'),
        yards_per_attempt=('yards_gained', 'mean')
    )
    .reset_index()
)

# Create visualization
plt.figure(figsize=(10, 6))

for coverage in ['man', 'zone']:
    data = pressure_coverage.query(f"pass_defense_man_zone == '{coverage}'")
    color = '#E69F00' if coverage == 'man' else '#56B4E9'
    label = 'Man Coverage' if coverage == 'man' else 'Zone Coverage'

    plt.plot(data['pressure_indicator'], data['avg_epa'],
             marker='o', linewidth=2, markersize=8,
             color=color, label=label)

plt.axhline(y=0, color='gray', linestyle='--', alpha=0.7)
plt.xlabel('Pressure Type', fontsize=12)
plt.ylabel('Average EPA Allowed', fontsize=12)
plt.title('Coverage Performance by Pass Rush Pressure\nEPA Allowed: Clean Pocket vs Pressure vs Sack',
          fontsize=14, fontweight='bold')
plt.legend(title='Coverage Type')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
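One way to quantify how much each coverage leans on the pass rush is to subtract EPA allowed under pressure from EPA allowed in a clean pocket. The sketch below assumes a summary table shaped like `pressure_coverage`; the EPA values are made up for illustration, and `rush_dependence` is a hypothetical column name rather than an established metric.

```python
import pandas as pd

# Hypothetical summary with the same columns as `pressure_coverage`.
pressure_summary = pd.DataFrame({
    "pressure_indicator": ["Clean Pocket", "Pressure (No Sack)", "Sack"] * 2,
    "pass_defense_man_zone": ["man"] * 3 + ["zone"] * 3,
    "avg_epa": [0.25, -0.30, -1.80, 0.15, -0.20, -1.75],
})

# One row per coverage type, one column per pressure level
wide = pressure_summary.pivot(
    index="pass_defense_man_zone",
    columns="pressure_indicator",
    values="avg_epa",
)

# Gap between clean-pocket and pressured EPA allowed
wide["rush_dependence"] = wide["Clean Pocket"] - wide["Pressure (No Sack)"]

print(wide.round(2))
```

A larger gap means the coverage gives up more when the rush does not arrive. In this invented example man coverage shows the bigger gap (0.55 versus 0.35), which would mark it as the more rush-dependent of the two.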

Cornerback Performance Analysis

Evaluating cornerback performance requires careful consideration of both opportunity and outcome metrics. Unlike quarterback evaluation, where we have clear outcome statistics (completions, touchdowns, interceptions), cornerback evaluation is complicated by the fact that the best cornerbacks often receive fewer targets precisely because they're so good—quarterbacks avoid throwing at elite corners.

This creates a fundamental challenge in cornerback evaluation:

The Coverage Paradox: Elite cornerbacks may have fewer opportunities to make plays (targets, passes defended, interceptions) because offenses avoid them. Meanwhile, below-average cornerbacks get targeted frequently, accumulating counting stats (tackles, passes defended) that make them appear productive when they're actually liabilities.

To solve this paradox, we need opportunity-adjusted metrics that account for:

  1. Target Rate: How often is the cornerback tested relative to their snap count?
  2. Catch Rate Allowed: What percentage of targets are completed when thrown at this cornerback?
  3. Yards Per Target: Average yards gained on throws targeting this cornerback
  4. EPA Per Target: Expected points added when quarterbacks target this cornerback
  5. Coverage Type Context: Does the cornerback excel in man, zone, or both?

These metrics normalize for opportunity and provide fairer comparisons across cornerbacks with different usage patterns.
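The paradox is easier to see with numbers. The sketch below computes the first four metrics for two invented stat lines; `CornerSeason`, its fields, and every value are hypothetical, and real inputs would come from charting or tracking data.

```python
from dataclasses import dataclass


@dataclass
class CornerSeason:
    name: str
    coverage_snaps: int
    targets: int
    receptions_allowed: int
    yards_allowed: int
    total_epa_allowed: float
    passes_defended: int


def opportunity_adjusted(cb: CornerSeason) -> dict:
    """Convert raw counts into per-snap and per-target rates."""
    return {
        "target_rate": cb.targets / cb.coverage_snaps,
        "catch_rate_allowed": cb.receptions_allowed / cb.targets,
        "yards_per_target": cb.yards_allowed / cb.targets,
        "epa_per_target": cb.total_epa_allowed / cb.targets,
    }


# Two made-up seasons with the same number of coverage snaps
avoided = CornerSeason("Corner A", 550, 55, 28, 330, -5.5, 8)
picked_on = CornerSeason("Corner B", 550, 110, 74, 990, 22.0, 14)

a_metrics = opportunity_adjusted(avoided)
b_metrics = opportunity_adjusted(picked_on)

for cb, metrics in [(avoided, a_metrics), (picked_on, b_metrics)]:
    rates = ", ".join(f"{key}={value:.3f}" for key, value in metrics.items())
    print(f"{cb.name}: passes_defended={cb.passes_defended}, {rates}")
```

Corner B leads in passes defended (14 to 8) yet allows more on every per-target measure, while Corner A is targeted half as often on the same number of coverage snaps.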

The Role of Tracking Data in CB Evaluation

Ideally, cornerback evaluation would use Next Gen Stats tracking data to identify:

- **Separation allowed**: How much space do receivers create against this cornerback?
- **Closest defender**: Is this cornerback truly the nearest defender, or are we misattributing the coverage?
- **Route types**: Which specific route concepts does the cornerback struggle against?
- **Target depth**: Is the cornerback vulnerable deep, underneath, or both?

Without tracking data, we must simulate or approximate these measurements using play-by-play data. The following analysis demonstrates the methodology, but professional analysts should integrate tracking data when available.

Let's build a comprehensive cornerback evaluation framework using available data.

Target-Based Metrics

#| label: cb-targets-r
#| message: false
#| warning: false
#| cache: true

# Load roster data for player names
rosters <- load_rosters(2021:2023)

# Create cornerback target dataset
# Note: This is simulated because individual target data requires tracking data
# In practice, you would use NFL Next Gen Stats or charting data
set.seed(42)
cb_targets <- pass_plays %>%
  filter(complete_pass == 1 | incomplete_pass == 1) %>%
  sample_n(min(5000, n())) %>%  # Sample for demonstration
  mutate(
    # Simulate defender assignment (would come from tracking data)
    primary_defender_id = sample(rosters$gsis_id[rosters$position == "CB"], n(), replace = TRUE)
  ) %>%
  inner_join(
    rosters %>% select(gsis_id, full_name, position, team),
    by = c("primary_defender_id" = "gsis_id")
  ) %>%
  filter(position == "CB")

# Calculate CB performance metrics
cb_performance <- cb_targets %>%
  group_by(player_name = full_name, team = team) %>%
  summarise(
    targets = n(),
    receptions = sum(complete_pass, na.rm = TRUE),
    yards_allowed = sum(yards_gained, na.rm = TRUE),
    tds_allowed = sum(td_pass, na.rm = TRUE),
    avg_epa_allowed = mean(epa, na.rm = TRUE),
    completion_pct_allowed = mean(complete_pass, na.rm = TRUE),
    yards_per_target = mean(yards_gained, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(targets >= 30) %>%  # Minimum target threshold
  arrange(avg_epa_allowed) %>%
  mutate(
    cb_rank = row_number(),
    yards_per_reception = yards_allowed / receptions
  )

# Display top performers
cb_performance %>%
  head(15) %>%
  gt() %>%
  cols_label(
    player_name = "Player",
    team = "Team",
    targets = "Targets",
    receptions = "Rec",
    yards_allowed = "Yards",
    tds_allowed = "TDs",
    avg_epa_allowed = "EPA/Target",
    completion_pct_allowed = "Comp %",
    yards_per_target = "Yards/Target",
    cb_rank = "Rank"
  ) %>%
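  fmt_number(
    columns = c(avg_epa_allowed, yards_per_target),
    decimals = 2
  ) %>%
  # Completion rate is stored as a proportion, so show it as a percentage
  fmt_percent(
    columns = completion_pct_allowed,
    decimals = 1
  ) %>%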
  fmt_number(
    columns = c(targets, receptions, yards_allowed, tds_allowed, cb_rank),
    decimals = 0
  ) %>%
  data_color(
    columns = avg_epa_allowed,
    # Low EPA allowed (good for the defense) maps to green, high to red
    colors = scales::col_numeric(
      palette = c("#1a9850", "#fee08b", "#d73027"),
      domain = NULL
    )
  ) %>%
  tab_header(
    title = "Top 15 Cornerbacks by EPA Allowed per Target",
    subtitle = "Minimum 30 targets | 2021-2023 seasons"
  ) %>%
  tab_source_note("Data: nflfastR (Simulated target data for demonstration)")
#| label: cb-targets-py
#| message: false
#| warning: false
#| cache: true

# Load roster data
rosters = nfl.import_rosters([2021, 2022, 2023])

# Create cornerback target dataset (simulated for demonstration)
np.random.seed(42)
# Filter first so the sample size is capped by the rows that survive the filter
targeted = pass_plays.query("complete_pass == 1 | incomplete_pass == 1")
cb_sample = targeted.sample(n=min(5000, len(targeted))).copy()

# Simulate defender assignment (would come from tracking data in practice)
cb_ids = rosters.query("position == 'CB'")['gsis_id'].dropna().unique()
cb_sample['primary_defender_id'] = np.random.choice(cb_ids, len(cb_sample))

# Join with roster data
cb_targets = cb_sample.merge(
    rosters[['gsis_id', 'full_name', 'position', 'team']],
    left_on='primary_defender_id',
    right_on='gsis_id',
    how='inner'
).query("position == 'CB'")

# Calculate CB performance metrics
cb_performance = (cb_targets
    .groupby(['full_name', 'team'])
    .agg(
        targets=('epa', 'count'),
        receptions=('complete_pass', 'sum'),
        yards_allowed=('yards_gained', 'sum'),
        tds_allowed=('td_pass', 'sum'),
        avg_epa_allowed=('epa', 'mean'),
        completion_pct_allowed=('complete_pass', 'mean'),
        yards_per_target=('yards_gained', 'mean')
    )
    .reset_index()
    .query("targets >= 30")  # Minimum target threshold
    .sort_values('avg_epa_allowed')
    .assign(
        cb_rank=lambda x: range(1, len(x) + 1),
        yards_per_reception=lambda x: x['yards_allowed'] / x['receptions']
    )
)

print("\n=== Top 15 Cornerbacks by EPA Allowed per Target ===")
print("Minimum 30 targets | 2021-2023 seasons\n")
print(cb_performance.head(15).to_string(index=False))
print("\nNote: Target data simulated for demonstration purposes")

Coverage Type Performance Split

Elite cornerbacks often excel in specific coverage schemes. Let's analyze how cornerback performance varies between man and zone coverage.

#| label: cb-coverage-split-r
#| message: false
#| warning: false

# Calculate CB performance by coverage type
cb_by_coverage <- cb_targets %>%
  group_by(player_name = full_name, coverage_type = pass_defense_man_zone) %>%
  summarise(
    targets = n(),
    avg_epa = mean(epa, na.rm = TRUE),
    completion_pct = mean(complete_pass, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(targets >= 15) %>%  # Minimum for each coverage type
  pivot_wider(
    names_from = coverage_type,
    values_from = c(targets, avg_epa, completion_pct),
    names_sep = "_"
  ) %>%
  mutate(
    epa_diff = avg_epa_zone - avg_epa_man,
    better_in = if_else(epa_diff > 0, "Man", "Zone")
  ) %>%
  filter(!is.na(avg_epa_man), !is.na(avg_epa_zone))

# Visualize man vs zone performance
cb_by_coverage %>%
  head(20) %>%
  ggplot(aes(x = avg_epa_man, y = avg_epa_zone)) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed", color = "gray50") +
  geom_point(aes(color = better_in, size = targets_man + targets_zone), alpha = 0.7) +
  geom_text(aes(label = player_name), size = 2.5, vjust = -0.8, check_overlap = TRUE) +
  scale_color_manual(
    values = c("Man" = "#E69F00", "Zone" = "#56B4E9")
  ) +
  scale_size_continuous(range = c(2, 8), name = "Total Targets") +
  labs(
    title = "Cornerback Performance: Man vs Zone Coverage",
    subtitle = "EPA Allowed per Target | Points above the diagonal are better in man coverage",
    x = "EPA Allowed in Man Coverage",
    y = "EPA Allowed in Zone Coverage",
    color = "Better In",
    caption = "Data: nflfastR 2021-2023 | Min 15 targets per coverage type"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "right"
  ) +
  coord_fixed()
#| label: cb-coverage-split-py
#| fig-width: 10
#| fig-height: 10
#| message: false
#| warning: false

# Calculate CB performance by coverage type
cb_by_coverage = (cb_targets
    .groupby(['full_name', 'pass_defense_man_zone'])
    .agg(
        targets=('epa', 'count'),
        avg_epa=('epa', 'mean'),
        completion_pct=('complete_pass', 'mean')
    )
    .reset_index()
    .query("targets >= 15")  # Minimum for each coverage type
    .pivot_table(
        index='full_name',
        columns='pass_defense_man_zone',
        values=['targets', 'avg_epa', 'completion_pct'],
        aggfunc='first'
    )
)

# Flatten column names
cb_by_coverage.columns = ['_'.join(col).strip() for col in cb_by_coverage.columns.values]
cb_by_coverage = cb_by_coverage.reset_index()

# Filter for players with both coverage types
cb_by_coverage = (cb_by_coverage
    .query("avg_epa_man.notna() & avg_epa_zone.notna()")
    .assign(
        epa_diff=lambda x: x['avg_epa_zone'] - x['avg_epa_man'],
        better_in=lambda x: x['epa_diff'].apply(lambda v: 'Man' if v > 0 else 'Zone'),
        total_targets=lambda x: x['targets_man'] + x['targets_zone']
    )
)

# Create visualization
plt.figure(figsize=(10, 10))

for coverage_type in ['Man', 'Zone']:
    data = cb_by_coverage.query(f"better_in == '{coverage_type}'")
    color = '#E69F00' if coverage_type == 'Man' else '#56B4E9'
    plt.scatter(data['avg_epa_man'], data['avg_epa_zone'],
               s=data['total_targets'] * 3, alpha=0.6,
               color=color, label=f'Better in {coverage_type}')

# Add diagonal line
lims = [
    np.min([plt.xlim()[0], plt.ylim()[0]]),
    np.max([plt.xlim()[1], plt.ylim()[1]])
]
plt.plot(lims, lims, 'k--', alpha=0.5, zorder=0)

# Add labels for top players
top_players = cb_by_coverage.nsmallest(10, 'avg_epa_man')
for _, row in top_players.iterrows():
    plt.annotate(row['full_name'],
                (row['avg_epa_man'], row['avg_epa_zone']),
                fontsize=7, alpha=0.8)

plt.xlabel('EPA Allowed in Man Coverage', fontsize=12)
plt.ylabel('EPA Allowed in Zone Coverage', fontsize=12)
plt.title('Cornerback Performance: Man vs Zone Coverage\nEPA Allowed per Target | Points above the diagonal are better in man coverage',
          fontsize=14, fontweight='bold')
plt.legend(title='Better In')
plt.grid(True, alpha=0.3)
plt.axis('equal')
plt.tight_layout()
plt.show()

Safety Performance Metrics

Safeties play a unique role in pass defense, providing deep coverage, run support, and versatility. Evaluating safety performance requires different metrics than cornerback evaluation.

#| label: safety-performance-r
#| message: false
#| warning: false
#| cache: true

# Simulate safety involvement (would use tracking data in practice)
set.seed(123)
safety_plays <- pass_plays %>%
  sample_n(min(4000, n())) %>%
  mutate(
    safety_id = sample(rosters$gsis_id[rosters$position == "S"], n(), replace = TRUE),
    safety_role = sample(c("Deep Third", "Deep Half", "Box", "Blitz"), n(),
                        replace = TRUE, prob = c(0.4, 0.3, 0.2, 0.1))
  ) %>%
  inner_join(
    rosters %>% select(gsis_id, full_name, position, team),
    by = c("safety_id" = "gsis_id")
  ) %>%
  filter(position == "S")

# Calculate safety metrics by role
safety_by_role <- safety_plays %>%
  group_by(safety_name = full_name, team, safety_role) %>%
  summarise(
    snaps = n(),
    avg_epa = mean(epa, na.rm = TRUE),
    explosive_rate = mean(explosive_play, na.rm = TRUE),
    completion_rate = mean(complete_pass, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(snaps >= 20)

# Overall safety performance
safety_overall <- safety_plays %>%
  group_by(safety_name = full_name, team) %>%
  summarise(
    total_snaps = n(),
    avg_epa = mean(epa, na.rm = TRUE),
    explosive_rate = mean(explosive_play, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(total_snaps >= 50) %>%
  arrange(avg_epa) %>%
  head(15)

# Display top safeties
safety_overall %>%
  gt() %>%
  cols_label(
    safety_name = "Safety",
    team = "Team",
    total_snaps = "Snaps",
    avg_epa = "EPA/Play",
    explosive_rate = "Explosive %"
  ) %>%
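  fmt_number(
    columns = avg_epa,
    decimals = 3
  ) %>%
  # Explosive rate is stored as a proportion, so show it as a percentage
  fmt_percent(
    columns = explosive_rate,
    decimals = 1
  ) %>%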
  fmt_number(
    columns = total_snaps,
    decimals = 0
  ) %>%
  data_color(
    columns = avg_epa,
    # Low EPA allowed (good for the defense) maps to green, high to red
    colors = scales::col_numeric(
      palette = c("#1a9850", "#fee08b", "#d73027"),
      domain = NULL
    )
  ) %>%
  tab_header(
    title = "Top 15 Safeties by EPA Allowed",
    subtitle = "Minimum 50 snaps | 2021-2023 seasons"
  )
#| label: safety-performance-py
#| message: false
#| warning: false
#| cache: true

# Simulate safety involvement
np.random.seed(123)
safety_sample = pass_plays.sample(n=min(4000, len(pass_plays))).copy()

safety_ids = rosters.query("position == 'S'")['gsis_id'].dropna().unique()
safety_sample['safety_id'] = np.random.choice(safety_ids, len(safety_sample))
safety_sample['safety_role'] = np.random.choice(
    ['Deep Third', 'Deep Half', 'Box', 'Blitz'],
    len(safety_sample),
    p=[0.4, 0.3, 0.2, 0.1]
)

# Join with roster
safety_plays = safety_sample.merge(
    rosters[['gsis_id', 'full_name', 'position', 'team']],
    left_on='safety_id',
    right_on='gsis_id',
    how='inner'
).query("position == 'S'")

# Overall safety performance
safety_overall = (safety_plays
    .groupby(['full_name', 'team'])
    .agg(
        total_snaps=('epa', 'count'),
        avg_epa=('epa', 'mean'),
        explosive_rate=('explosive_play', 'mean')
    )
    .reset_index()
    .query("total_snaps >= 50")
    .sort_values('avg_epa')
    .head(15)
)

print("\n=== Top 15 Safeties by EPA Allowed ===")
print("Minimum 50 snaps | 2021-2023 seasons\n")
print(safety_overall.to_string(index=False))

Target Separation and Contested Catches

One of the most important aspects of coverage analysis is understanding target separation: how close defenders are to receivers at the catch point. Measuring it directly requires tracking data; play-by-play data supports only a rough proxy, such as comparing completion rates across target depths.
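When tracking coordinates are available, separation itself is a short calculation: the distance from the targeted receiver to the nearest defender at the moment the ball arrives. The sketch below assumes (x, y) positions in yards for a single frame; the player labels, coordinates, and the `separation_at_arrival` helper are hypothetical, and real tracking feeds use their own schemas.

```python
import math


def separation_at_arrival(receiver_xy, defender_positions):
    """Return (defender_id, distance in yards) for the nearest defender."""
    rx, ry = receiver_xy
    distances = (
        (defender_id, math.hypot(dx - rx, dy - ry))
        for defender_id, (dx, dy) in defender_positions.items()
    )
    return min(distances, key=lambda pair: pair[1])


# Hypothetical positions (yards) at the frame where the pass arrives
receiver = (42.0, 18.0)
defenders = {
    "CB1": (43.2, 19.6),
    "FS": (52.0, 26.5),
    "NB": (40.0, 10.0),
}

nearest_id, separation = separation_at_arrival(receiver, defenders)
print(f"Nearest defender: {nearest_id}, separation: {separation:.1f} yards")
```

The same calculation also answers the closest-defender question, since the minimum identifies which player was actually in coverage. Without coordinates like these, the analysis has to fall back on play-by-play proxies.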

#| label: separation-analysis-r
#| message: false
#| warning: false

# Analyze completion probability by coverage and air yards
# (Crude proxy for separation: deeper targets typically arrive in tighter windows)
separation_proxy <- pass_plays %>%
  filter(!is.na(air_yards), air_yards >= 0) %>%
  mutate(
    air_yards_bin = cut(air_yards,
                       breaks = c(0, 5, 10, 15, 20, 100),
                       labels = c("0-5", "6-10", "11-15", "16-20", "20+"),
                       include.lowest = TRUE)
  ) %>%
  group_by(air_yards_bin, coverage_type = pass_defense_man_zone) %>%
  summarise(
    attempts = n(),
    completions = sum(complete_pass, na.rm = TRUE),
    completion_pct = mean(complete_pass, na.rm = TRUE),
    avg_epa = mean(epa, na.rm = TRUE),
    int_rate = mean(interception, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(attempts >= 100)

# Visualize completion rate by depth
separation_proxy %>%
  ggplot(aes(x = air_yards_bin, y = completion_pct,
             color = coverage_type, group = coverage_type)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 3) +
  scale_y_continuous(labels = scales::percent_format()) +
  scale_color_manual(
    values = c("man" = "#E69F00", "zone" = "#56B4E9"),
    labels = c("Man Coverage", "Zone Coverage")
  ) +
  labs(
    title = "Completion Rate by Air Yards and Coverage Type",
    subtitle = "Deeper throws typically arrive with less separation at the catch point",
    x = "Air Yards",
    y = "Completion Percentage",
    color = "Coverage Type",
    caption = "Data: nflfastR 2021-2023"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "top"
  )
#| label: separation-analysis-py
#| fig-width: 10
#| fig-height: 6
#| message: false
#| warning: false

# Analyze completion probability by coverage and air yards
separation_proxy = (pass_plays
    .query("air_yards.notna() & air_yards >= 0")
    .assign(
        air_yards_bin=lambda x: pd.cut(
            x['air_yards'],
            bins=[0, 5, 10, 15, 20, 100],
            labels=['0-5', '6-10', '11-15', '16-20', '20+'],
            include_lowest=True
        )
    )
    .groupby(['air_yards_bin', 'pass_defense_man_zone'])
    .agg(
        attempts=('epa', 'count'),
        completions=('complete_pass', 'sum'),
        completion_pct=('complete_pass', 'mean'),
        avg_epa=('epa', 'mean'),
        int_rate=('interception', 'mean')
    )
    .reset_index()
    .query("attempts >= 100")
)

# Create visualization
plt.figure(figsize=(10, 6))

for coverage in ['man', 'zone']:
    data = separation_proxy.query(f"pass_defense_man_zone == '{coverage}'")
    color = '#E69F00' if coverage == 'man' else '#56B4E9'
    label = 'Man Coverage' if coverage == 'man' else 'Zone Coverage'

    plt.plot(data['air_yards_bin'], data['completion_pct'],
            marker='o', linewidth=2, markersize=8,
            color=color, label=label)

plt.xlabel('Air Yards', fontsize=12)
plt.ylabel('Completion Percentage', fontsize=12)
plt.title('Completion Rate by Air Yards and Coverage Type\nDeeper throws typically arrive with less separation at the catch point',
          fontsize=14, fontweight='bold')
plt.legend(title='Coverage Type')
plt.gca().yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: '{:.0%}'.format(y)))
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Coverage-Adjusted Quarterback Metrics

Quarterbacks face different coverage schemes based on down, distance, and score. To accurately evaluate QB performance, we should adjust for coverage difficulty.

#| label: coverage-adjusted-qb-r
#| message: false
#| warning: false

# Calculate QB performance vs different coverages
qb_vs_coverage <- pass_plays %>%
  filter(!is.na(passer_id)) %>%
  group_by(qb_name = passer, coverage_type = pass_defense_man_zone) %>%
  summarise(
    attempts = n(),
    completions = sum(complete_pass, na.rm = TRUE),
    yards = sum(yards_gained, na.rm = TRUE),
    tds = sum(td_pass, na.rm = TRUE),
    ints = sum(interception, na.rm = TRUE),
    avg_epa = mean(epa, na.rm = TRUE),
    success_rate = mean(epa_success, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(attempts >= 100)

# Calculate EPA above average vs each coverage
league_avg_by_coverage <- pass_plays %>%
  group_by(coverage_type = pass_defense_man_zone) %>%
  summarise(league_avg_epa = mean(epa, na.rm = TRUE), .groups = "drop")

qb_coverage_adjusted <- qb_vs_coverage %>%
  left_join(league_avg_by_coverage, by = c("coverage_type")) %>%
  mutate(
    epa_above_avg = avg_epa - league_avg_epa
  ) %>%
  group_by(qb_name) %>%
  summarise(
    total_attempts = sum(attempts),
    avg_epa_vs_man = mean(avg_epa[coverage_type == "man"], na.rm = TRUE),
    avg_epa_vs_zone = mean(avg_epa[coverage_type == "zone"], na.rm = TRUE),
    coverage_adjusted_epa = weighted.mean(epa_above_avg, attempts, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(total_attempts >= 200) %>%
  arrange(desc(coverage_adjusted_epa)) %>%
  head(15)

# Display results
qb_coverage_adjusted %>%
  gt() %>%
  cols_label(
    qb_name = "Quarterback",
    total_attempts = "Attempts",
    avg_epa_vs_man = "EPA vs Man",
    avg_epa_vs_zone = "EPA vs Zone",
    coverage_adjusted_epa = "Coverage-Adj EPA"
  ) %>%
  fmt_number(
    columns = c(avg_epa_vs_man, avg_epa_vs_zone, coverage_adjusted_epa),
    decimals = 3
  ) %>%
  fmt_number(
    columns = total_attempts,
    decimals = 0,
    use_seps = TRUE
  ) %>%
  data_color(
    columns = coverage_adjusted_epa,
    colors = scales::col_numeric(
      palette = c("#d73027", "#fee08b", "#1a9850"),
      domain = NULL
    )
  ) %>%
  tab_header(
    title = "Top Quarterbacks by Coverage-Adjusted EPA",
    subtitle = "Minimum 200 attempts | 2021-2023"
  )
#| label: coverage-adjusted-qb-py
#| message: false
#| warning: false

# Calculate QB performance vs different coverages
qb_vs_coverage = (pass_plays
    .query("passer_id.notna()")
    .groupby(['passer', 'pass_defense_man_zone'])
    .agg(
        attempts=('epa', 'count'),
        completions=('complete_pass', 'sum'),
        yards=('yards_gained', 'sum'),
        tds=('td_pass', 'sum'),
        ints=('interception', 'sum'),
        avg_epa=('epa', 'mean'),
        success_rate=('epa_success', 'mean')
    )
    .reset_index()
    .query("attempts >= 100")
)

# Calculate league average by coverage
league_avg = (pass_plays
    .groupby('pass_defense_man_zone')
    .agg(league_avg_epa=('epa', 'mean'))
    .reset_index()
)

# Calculate coverage-adjusted EPA
qb_coverage_adjusted = (qb_vs_coverage
    .merge(league_avg, on='pass_defense_man_zone')
    .assign(epa_above_avg=lambda x: x['avg_epa'] - x['league_avg_epa'])
    .groupby('passer')
    .apply(lambda x: pd.Series({
        'total_attempts': x['attempts'].sum(),
        'avg_epa_vs_man': x[x['pass_defense_man_zone'] == 'man']['avg_epa'].mean(),
        'avg_epa_vs_zone': x[x['pass_defense_man_zone'] == 'zone']['avg_epa'].mean(),
        'coverage_adjusted_epa': np.average(x['epa_above_avg'], weights=x['attempts'])
    }))
    .reset_index()
    .query("total_attempts >= 200")
    .sort_values('coverage_adjusted_epa', ascending=False)
    .head(15)
)

print("\n=== Top Quarterbacks by Coverage-Adjusted EPA ===")
print("Minimum 200 attempts | 2021-2023\n")
print(qb_coverage_adjusted.to_string(index=False))
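
For reference, the quantity computed by the code above can be written compactly. The symbols are notation introduced here for convenience, not names from nflfastR: $n_{q,c}$ is the number of attempts by quarterback $q$ against coverage type $c$, $\overline{\mathrm{EPA}}_{q,c}$ is his average EPA on those attempts, and $\overline{\mathrm{EPA}}_{c}$ is the league-average EPA against that coverage type. The sum runs over the coverage labels in the data for which the quarterback meets the attempt minimum (here, man and zone).

$$
\mathrm{CovAdjEPA}_q = \frac{\sum_{c} n_{q,c}\,\left(\overline{\mathrm{EPA}}_{q,c} - \overline{\mathrm{EPA}}_{c}\right)}{\sum_{c} n_{q,c}}
$$

A positive value means the quarterback outperformed the league average given the mix of coverages he actually faced, rather than simply facing an easier mix.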

Slot vs Outside Coverage

Modern offenses increasingly use slot receivers to create mismatches. Analyzing slot versus outside coverage effectiveness is crucial for defensive evaluation.

#| label: slot-outside-r
#| message: false
#| warning: false

# Approximate slot vs outside based on personnel
# (In practice, would use tracking data for precise alignment)
slot_analysis <- pass_plays %>%
  mutate(
    # Approximate: 11 personnel (3 WR) more likely to use slot
    likely_slot = if_else(
      grepl("^1-1", off_personnel) & complete_pass + incomplete_pass == 1,
      1, 0
    )
  ) %>%
  filter(likely_slot %in% c(0, 1)) %>%
  group_by(
    alignment = if_else(likely_slot == 1, "Slot Heavy", "Outside Heavy"),
    coverage_type = pass_defense_man_zone
  ) %>%
  summarise(
    plays = n(),
    avg_epa = mean(epa, na.rm = TRUE),
    completion_pct = mean(complete_pass, na.rm = TRUE),
    yards_per_attempt = mean(yards_gained, na.rm = TRUE),
    explosive_rate = mean(explosive_play, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(plays >= 100)

# Visualize
slot_analysis %>%
  ggplot(aes(x = alignment, y = avg_epa, fill = coverage_type)) +
  geom_col(position = "dodge", width = 0.7) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  scale_fill_manual(
    values = c("man" = "#E69F00", "zone" = "#56B4E9"),
    labels = c("Man Coverage", "Zone Coverage")
  ) +
  labs(
    title = "EPA Allowed: Slot vs Outside Alignment",
    subtitle = "Coverage performance varies by receiver alignment",
    x = "Offensive Alignment",
    y = "Average EPA Allowed",
    fill = "Coverage Type",
    caption = "Data: nflfastR 2021-2023 | Approximated from personnel groupings"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "top"
  )

#| label: slot-outside-py
#| fig-width: 10
#| fig-height: 6
#| message: false
#| warning: false

# Approximate slot vs outside based on personnel
# (vectorized: flags pass attempts whose personnel string starts with '1-1')
def is_likely_slot(df):
    is_11_personnel = df['off_personnel'].str.startswith('1-1', na=False)
    is_attempt = (df['complete_pass'] + df['incomplete_pass']) == 1
    return (is_11_personnel & is_attempt).astype(int)

slot_analysis = (pass_plays
    .assign(likely_slot=is_likely_slot)
    .query("likely_slot.isin([0, 1])")
    .assign(
        alignment=lambda x: x['likely_slot'].map({1: 'Slot Heavy', 0: 'Outside Heavy'})
    )
    .groupby(['alignment', 'pass_defense_man_zone'])
    .agg(
        plays=('epa', 'count'),
        avg_epa=('epa', 'mean'),
        completion_pct=('complete_pass', 'mean'),
        yards_per_attempt=('yards_gained', 'mean'),
        explosive_rate=('explosive_play', 'mean')
    )
    .reset_index()
    .query("plays >= 100")
)

# Create visualization
fig, ax = plt.subplots(figsize=(10, 6))

alignments = slot_analysis['alignment'].unique()
x = np.arange(len(alignments))
width = 0.35

man_data = slot_analysis.query("pass_defense_man_zone == 'man'").set_index('alignment')
zone_data = slot_analysis.query("pass_defense_man_zone == 'zone'").set_index('alignment')

ax.bar(x - width/2, [man_data.loc[a, 'avg_epa'] if a in man_data.index else 0
                     for a in alignments],
       width, label='Man Coverage', color='#E69F00')
ax.bar(x + width/2, [zone_data.loc[a, 'avg_epa'] if a in zone_data.index else 0
                     for a in alignments],
       width, label='Zone Coverage', color='#56B4E9')

ax.axhline(y=0, color='black', linestyle='--', alpha=0.7)
ax.set_xlabel('Offensive Alignment', fontsize=12)
ax.set_ylabel('Average EPA Allowed', fontsize=12)
ax.set_title('EPA Allowed: Slot vs Outside Alignment\nCoverage performance varies by receiver alignment',
             fontsize=14, fontweight='bold')
ax.set_xticks(x)
ax.set_xticklabels(alignments)
ax.legend(title='Coverage Type')
ax.grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

Blitz Packages and Coverage Disguise

Blitzing represents one of the most critical strategic decisions in defensive play-calling: sacrifice a coverage defender to send an additional pass rusher, hoping to generate pressure before receivers get open. This creates an immediate tradeoff between pass rush and coverage quality.

The mathematics of blitzing are straightforward but the strategic implications are complex:

Standard Rush (4 rushers):
- 4 pass rushers vs 5 offensive linemen (disadvantage)
- 7 coverage defenders vs 5 eligible receivers (advantage)
- Quarterback has more time, but fewer open receivers

Blitz (5+ rushers):
- 5+ pass rushers vs 5 offensive linemen (even numbers or a numerical advantage for the rush)
- 6 or fewer coverage defenders vs 5 eligible receivers (disadvantage)
- Quarterback has less time, but potentially more open receivers

The key question: Does the pressure generated by bringing extra rushers create more value than the coverage vulnerability it creates?

Data can help answer this question by comparing:
1. How often blitzes generate pressure/sacks versus standard rushes
2. How much EPA increases when offenses complete passes against the blitz
3. Which coverage types (man vs zone) work best with blitzes
4. Which situations favor blitzing versus dropping extra defenders into coverage

The Blitz Paradox

Surprisingly, data often shows that blitzing doesn't improve defensive EPA despite increasing sack rates. Why? Because when blitzes fail to generate pressure, the reduced coverage leads to much higher EPA allowed. The occasional sacks and pressures don't fully compensate for the explosive plays allowed when the blitz fails. This doesn't mean blitzing is wrong; it means blitzing must be used strategically:

- Against inexperienced quarterbacks who struggle with blitz recognition
- In obvious passing situations where offenses expect conservative coverage
- With elite man-coverage corners who can hold up one-on-one
- As part of a mixed strategy to create uncertainty

The key insight: **Blitz effectiveness depends more on when and how you blitz than how often you blitz.**
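
The arithmetic behind the paradox can be sketched by treating EPA allowed per dropback as a mix of pressured and clean-pocket outcomes. Everything in the block below is an assumption made for illustration: the function names are hypothetical and the pressure rates and EPA values are invented, not estimates from the chapter's data.

```python
def expected_epa(pressure_rate, epa_pressured, epa_clean):
    """EPA allowed per dropback as a mix of pressured and clean-pocket plays."""
    return pressure_rate * epa_pressured + (1 - pressure_rate) * epa_clean


def break_even_pressure_rate(target_epa, epa_pressured, epa_clean):
    """Pressure rate at which a call's expected EPA allowed equals target_epa."""
    return (epa_clean - target_epa) / (epa_clean - epa_pressured)


# Hypothetical inputs: the blitz pressures more often, but a clean pocket
# against the blitz is worth more to the offense (fewer coverage defenders)
standard = expected_epa(pressure_rate=0.30, epa_pressured=-0.40, epa_clean=0.20)
blitz = expected_epa(pressure_rate=0.40, epa_pressured=-0.40, epa_clean=0.35)
needed = break_even_pressure_rate(standard, epa_pressured=-0.40, epa_clean=0.35)

print(f"Standard rush EPA allowed: {standard:.3f}")
print(f"Blitz EPA allowed:         {blitz:.3f}")
print(f"Blitz pressure rate needed to break even: {needed:.1%}")
```

With these made-up numbers, the blitz raises the pressure rate from 30% to 40% yet still allows more EPA per dropback (0.05 versus 0.02). It would need to reach a 44% pressure rate just to match the standard rush.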

Let's analyze blitz effectiveness across different coverage types and game situations.

#| label: blitz-coverage-r
#| message: false
#| warning: false

# Analyze blitz effectiveness by coverage
blitz_analysis <- pass_plays %>%
  filter(!is.na(blitz)) %>%
  group_by(
    blitz_called = if_else(blitz == 1, "Blitz", "No Blitz"),
    coverage_type = pass_defense_man_zone
  ) %>%
  summarise(
    plays = n(),
    avg_epa = mean(epa, na.rm = TRUE),
    sack_rate = mean(sack, na.rm = TRUE),
    pressure_rate = mean(qb_hit, na.rm = TRUE),
    completion_pct = mean(complete_pass, na.rm = TRUE),
    explosive_rate = mean(explosive_play, na.rm = TRUE),
    .groups = "drop"
  )

# Create comparison table
blitz_analysis %>%
  gt() %>%
  cols_label(
    blitz_called = "Blitz",
    coverage_type = "Coverage",
    plays = "Plays",
    avg_epa = "EPA/Play",
    sack_rate = "Sack %",
    pressure_rate = "Pressure %",
    completion_pct = "Comp %",
    explosive_rate = "Explosive %"
  ) %>%
  fmt_percent(
    columns = c(sack_rate, pressure_rate, completion_pct, explosive_rate),
    decimals = 1
  ) %>%
  fmt_number(
    columns = avg_epa,
    decimals = 3
  ) %>%
  fmt_number(
    columns = plays,
    decimals = 0,
    use_seps = TRUE
  ) %>%
  # Lower EPA allowed is better for the defense, so low values map to green
  data_color(
    columns = avg_epa,
    colors = scales::col_numeric(
      palette = c("#1a9850", "#fee08b", "#d73027"),
      domain = NULL
    )
  ) %>%
  tab_header(
    title = "Blitz Effectiveness by Coverage Type",
    subtitle = "2021-2023 NFL Seasons"
  )

# Visualize the pressure-coverage tradeoff
blitz_analysis %>%
  ggplot(aes(x = pressure_rate, y = avg_epa,
             color = coverage_type, shape = blitz_called)) +
  geom_point(size = 5) +
  geom_text(aes(label = paste(blitz_called, "-", coverage_type)),
            vjust = -1, size = 3) +
  scale_color_manual(
    values = c("man" = "#E69F00", "zone" = "#56B4E9")
  ) +
  scale_x_continuous(labels = scales::percent_format()) +
  labs(
    title = "Pressure Rate vs EPA Allowed by Coverage and Blitz",
    subtitle = "Blitzing increases pressure but may increase EPA allowed",
    x = "Pressure Rate (QB Hits)",
    y = "Average EPA Allowed",
    color = "Coverage Type",
    shape = "Blitz Status",
    caption = "Data: nflfastR 2021-2023"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "right"
  )

#| label: blitz-coverage-py
#| fig-width: 10
#| fig-height: 6
#| message: false
#| warning: false

# Analyze blitz effectiveness by coverage
blitz_analysis = (pass_plays
    .query("blitz.notna()")
    .assign(blitz_called=lambda x: x['blitz'].map({1: 'Blitz', 0: 'No Blitz'}))
    .groupby(['blitz_called', 'pass_defense_man_zone'])
    .agg(
        plays=('epa', 'count'),
        avg_epa=('epa', 'mean'),
        sack_rate=('sack', 'mean'),
        pressure_rate=('qb_hit', 'mean'),
        completion_pct=('complete_pass', 'mean'),
        explosive_rate=('explosive_play', 'mean')
    )
    .reset_index()
)

print("\n=== Blitz Effectiveness by Coverage Type ===")
print("2021-2023 NFL Seasons\n")
print(blitz_analysis.to_string(index=False))

# Visualize pressure-coverage tradeoff
plt.figure(figsize=(10, 6))

for coverage in ['man', 'zone']:
    for blitz_status in ['Blitz', 'No Blitz']:
        data = blitz_analysis.query(
            f"pass_defense_man_zone == '{coverage}' & blitz_called == '{blitz_status}'"
        )

        if len(data) > 0:
            color = '#E69F00' if coverage == 'man' else '#56B4E9'
            marker = 'o' if blitz_status == 'Blitz' else 's'

            plt.scatter(data['pressure_rate'], data['avg_epa'],
                       s=150, color=color, marker=marker, alpha=0.7,
                       label=f'{blitz_status} - {coverage.title()}')

            # Add labels
            for _, row in data.iterrows():
                plt.annotate(f"{blitz_status}\n{coverage.title()}",
                           (row['pressure_rate'], row['avg_epa']),
                           xytext=(0, 10), textcoords='offset points',
                           ha='center', fontsize=8)

plt.xlabel('Pressure Rate (QB Hits)', fontsize=12)
plt.ylabel('Average EPA Allowed', fontsize=12)
plt.title('Pressure Rate vs EPA Allowed by Coverage and Blitz\nBlitzing increases pressure but may increase EPA allowed',
          fontsize=14, fontweight='bold')
plt.gca().xaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: '{:.0%}'.format(y)))
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Coverage Tendency Analysis

Defensive coordinators face a fundamental tension: they want to play coverages that match their personnel strengths and game situation, but they also need to maintain unpredictability to prevent offenses from exploiting coverage tendencies. This creates a strategic optimization problem.

The Predictability Problem: If a defense becomes too predictable in its coverage usage—for example, always playing Cover 2 on third-and-long—savvy offensive coordinators will design specific plays to exploit that coverage. But if a defense tries to remain completely unpredictable, it may end up playing suboptimal coverages that don't match the situation or their personnel strengths.

The optimal strategy lies somewhere in between: maintain enough variability to prevent exploitation while still leveraging situational advantages. Let's analyze:

  1. How predictable are NFL defenses? What percentage of the time do they use their most common coverage in each situation?
  2. When does predictability appear? Which down-and-distance combinations show the strongest coverage tendencies?
  3. Does predictability hurt performance? Do teams that are more predictable allow more EPA?
  4. How does game script affect coverage? Do losing teams play more aggressively? Do winning teams play more conservatively?

The 70% Rule for Coverage Tendencies

A useful heuristic from coaching circles: if you use a particular coverage more than 70% of the time in a specific situation, you're likely too predictable. Offenses can design plays specifically for that coverage and execute them with confidence. However, this rule has exceptions:

- **Extreme talent advantage**: If you have significantly better players, you can be more predictable
- **Situational advantages**: In obvious run situations or prevent scenarios, predictability matters less
- **Limited opponent preparation**: Early in the season or against less sophisticated offenses, tendencies matter less

The key is monitoring your tendencies and adjusting when opponents demonstrate they're exploiting them.
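
As a minimal sketch of how the 70% heuristic could be checked, the block below computes each team's most-used coverage per situation and flags shares above a threshold. The `coverage_predictability` helper, the `situation` column, and the two teams are hypothetical; only `defteam` and `pass_defense_man_zone` follow the column names used elsewhere in this chapter.

```python
import pandas as pd

# Hypothetical charted plays. The teams and rows are invented for illustration.
plays = pd.DataFrame({
    "defteam": ["AAA"] * 10 + ["BBB"] * 10,
    "situation": ["3rd & Long"] * 20,
    "pass_defense_man_zone": ["zone"] * 8 + ["man"] * 2    # AAA: 80% zone
                             + ["zone"] * 5 + ["man"] * 5,  # BBB: 50/50
})


def coverage_predictability(df, threshold=0.70):
    """Return each team's most-used coverage per situation and flag heavy tendencies."""
    keys = ["defteam", "situation"]
    counts = (
        df.groupby(keys + ["pass_defense_man_zone"])
        .size()
        .reset_index(name="plays")
    )
    counts["share"] = counts["plays"] / counts.groupby(keys)["plays"].transform("sum")
    # idxmax picks the row with the largest share within each team-situation
    top = counts.loc[counts.groupby(keys)["share"].idxmax()]
    return (
        top.rename(columns={"pass_defense_man_zone": "top_coverage"})
        .assign(too_predictable=lambda d: d["share"] > threshold)
        .reset_index(drop=True)
    )


tendencies = coverage_predictability(plays)
print(tendencies.to_string(index=False))
```

Here team AAA plays zone on 80% of its third-and-long snaps and is flagged, while team BBB's even split is not. On real data the same function would need a minimum-plays filter per team and situation, since small cells produce extreme shares by chance.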

Let's examine coverage tendencies across different game situations to identify when defenses are most and least predictable.

#| label: coverage-tendencies-r
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false

# Calculate coverage usage by down and distance
coverage_tendencies <- pass_plays %>%
  filter(down %in% c(1, 2, 3)) %>%
  mutate(
    distance_category = case_when(
      ydstogo <= 3 ~ "Short",
      ydstogo <= 7 ~ "Medium",
      TRUE ~ "Long"
    )
  ) %>%
  group_by(down, distance_category, coverage_type = pass_defense_man_zone) %>%
  summarise(
    plays = n(),
    .groups = "drop"
  ) %>%
  group_by(down, distance_category) %>%
  mutate(
    coverage_pct = plays / sum(plays)
  ) %>%
  ungroup()

# Visualize coverage tendencies
coverage_tendencies %>%
  ggplot(aes(x = distance_category, y = coverage_pct, fill = coverage_type)) +
  geom_col(position = "stack") +
  facet_wrap(~down, labeller = labeller(down = function(x) paste("Down:", x))) +
  scale_y_continuous(labels = scales::percent_format()) +
  scale_fill_manual(
    values = c("man" = "#E69F00", "zone" = "#56B4E9"),
    labels = c("Man Coverage", "Zone Coverage")
  ) +
  labs(
    title = "Coverage Usage Tendencies by Down and Distance",
    subtitle = "NFL 2021-2023 | Percentage of plays by coverage type",
    x = "Distance to Go",
    y = "Percentage of Plays",
    fill = "Coverage Type",
    caption = "Data: nflfastR"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "top"
  )

# Analyze by score differential
# nflfastR's score_differential is from the offense's (posteam) perspective,
# so flip the sign to describe the game state the defense is facing
coverage_by_score <- pass_plays %>%
  mutate(
    def_score_diff = -score_differential,
    score_diff_category = case_when(
      def_score_diff <= -14 ~ "Losing by 14+",
      def_score_diff <= -7 ~ "Losing by 7-13",
      def_score_diff <= -3 ~ "Losing by 3-6",
      abs(def_score_diff) <= 2 ~ "Tied/Close",
      def_score_diff <= 6 ~ "Winning by 3-6",
      def_score_diff <= 13 ~ "Winning by 7-13",
      TRUE ~ "Winning by 14+"
    )
  ) %>%
  group_by(score_diff_category, coverage_type = pass_defense_man_zone) %>%
  summarise(
    plays = n(),
    avg_epa = mean(epa, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  group_by(score_diff_category) %>%
  mutate(
    coverage_pct = plays / sum(plays)
  ) %>%
  ungroup()

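# Visualize score differential tendencies, ordered from largest deficit to largest lead
score_levels <- c("Losing by 14+", "Losing by 7-13", "Losing by 3-6", "Tied/Close",
                  "Winning by 3-6", "Winning by 7-13", "Winning by 14+")
coverage_by_score %>%
  mutate(score_diff_category = factor(score_diff_category, levels = score_levels)) %>%
  ggplot(aes(x = score_diff_category,
             y = coverage_pct, fill = coverage_type)) +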
  geom_col(position = "dodge") +
  coord_flip() +
  scale_y_continuous(labels = scales::percent_format()) +
  scale_fill_manual(
    values = c("man" = "#E69F00", "zone" = "#56B4E9"),
    labels = c("Man Coverage", "Zone Coverage")
  ) +
  labs(
    title = "Coverage Usage by Score Differential",
    subtitle = "Defensive coverage tendencies change based on game situation",
    x = "Score Differential (Defense Perspective)",
    y = "Percentage of Plays",
    fill = "Coverage Type",
    caption = "Data: nflfastR 2021-2023"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "top"
  )

#| label: coverage-tendencies-py
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false

# Calculate coverage usage by down and distance
def categorize_distance_simple(yards):
    if yards <= 3:
        return "Short"
    elif yards <= 7:
        return "Medium"
    else:
        return "Long"

coverage_tendencies = (pass_plays
    .query("down.isin([1, 2, 3])")
    .assign(distance_category=lambda x: x['ydstogo'].apply(categorize_distance_simple))
    .groupby(['down', 'distance_category', 'pass_defense_man_zone'])
    .size()
    .reset_index(name='plays')
)

# Calculate percentages
coverage_tendencies['coverage_pct'] = (
    coverage_tendencies.groupby(['down', 'distance_category'])['plays']
    .transform(lambda x: x / x.sum())
)

# Visualize coverage tendencies
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

for idx, down in enumerate([1, 2, 3]):
    ax = axes[idx]
    data = coverage_tendencies.query(f"down == {down}")

    # Pivot for stacking
    pivot = data.pivot_table(
        index='distance_category',
        columns='pass_defense_man_zone',
        values='coverage_pct',
        fill_value=0
    )

    # Draw the legend on the last panel only; passing an empty label list
    # to ax.legend() would leave an empty titled box on the other panels
    pivot.plot(kind='bar', stacked=True, ax=ax,
               color=['#E69F00', '#56B4E9'], width=0.7, legend=(idx == 2))

    ax.set_title(f'Down: {down}')
    ax.set_xlabel('Distance to Go')
    ax.set_ylabel('Percentage of Plays' if idx == 0 else '')
    ax.set_ylim(0, 1)
    ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: '{:.0%}'.format(y)))
    if idx == 2:
        ax.legend(['Man Coverage', 'Zone Coverage'], title='Coverage Type')
    ax.grid(True, alpha=0.3, axis='y')

plt.suptitle('Coverage Usage Tendencies by Down and Distance\nNFL 2021-2023 | Percentage of plays by coverage type',
             fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

# Analyze by score differential
def categorize_score_diff(diff):
    if diff <= -14:
        return "Losing by 14+"
    elif diff <= -7:
        return "Losing by 7-13"
    elif diff <= -3:
        return "Losing by 3-6"
    elif abs(diff) <= 2:
        return "Tied/Close"
    elif diff <= 6:
        return "Winning by 3-6"
    elif diff <= 13:
        return "Winning by 7-13"
    else:
        return "Winning by 14+"

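# score_differential is from the offense's (posteam) perspective in nflfastR,
# so negate it to categorize the game state from the defense's side
coverage_by_score = (pass_plays
    .assign(score_diff_category=lambda x: (-x['score_differential']).apply(categorize_score_diff))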
    .groupby(['score_diff_category', 'pass_defense_man_zone'])
    .agg(
        plays=('epa', 'count'),
        avg_epa=('epa', 'mean')
    )
    .reset_index()
)

coverage_by_score['coverage_pct'] = (
    coverage_by_score.groupby('score_diff_category')['plays']
    .transform(lambda x: x / x.sum())
)

print("\n=== Coverage Usage by Score Differential ===")
print(coverage_by_score.to_string(index=False))

Summary

Coverage analysis represents one of the most complex challenges in football analytics, requiring integration of scheme theory, individual evaluation, situational context, and game theory principles. This chapter has explored multiple dimensions of coverage analysis, from fundamental coverage concepts to advanced statistical techniques for player evaluation. Let's consolidate the key insights:

Core Principles of Coverage Analysis

Coverage Schemes Have Distinct Risk-Return Profiles: Different coverages excel in different situations. Man coverage offers higher upside (more turnovers, lower completion percentages) but higher downside risk (explosive plays when coverage busts occur). Zone coverage provides more conservative, predictable results—fewer explosives allowed but higher completion rates. The choice between them depends on your personnel quality, opponent tendencies, and strategic goals.

The Two-High Revolution: Modern NFL offenses have forced a dramatic shift toward two-high safety structures (Cover 2, Cover 4, Cover 6). From 2011 to 2023, two-high safety usage increased from roughly 40% to over 60% of passing plays. This shift reflects offensive evolution toward explosive passing attacks and mobile quarterbacks. However, this trend is already creating counter-trends, with creative offenses exploiting light boxes with innovative run games.

Context Dominates Simple Averages: While aggregate statistics show man and zone coverage performing similarly in overall EPA, this aggregate equality masks enormous situational variation. Man coverage excels in short-yardage situations and when protecting leads. Zone coverage dominates in long-yardage situations and when preventing explosive plays is paramount. Defensive coordinators must match coverage to context rather than relying on a single preferred scheme.

Individual Player Evaluation

Opportunity-Adjusted Metrics Are Essential: Traditional counting stats (interceptions, passes defended, tackles) favor high-volume defenders who get targeted frequently—often a sign of poor coverage. Elite cornerbacks may have fewer opportunities precisely because quarterbacks avoid them. Proper evaluation requires opportunity-adjusted metrics:
- EPA per target (not total EPA)
- Catch rate allowed (not total receptions)
- Yards per target (not total yards)
- Target rate relative to snaps played
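
To make the contrast with counting stats concrete, here is a small sketch that converts season totals into the four rates listed above. The two defenders and all of their numbers are invented, and the column names are hypothetical; they are not fields in the nflfastR data.

```python
import pandas as pd

# Hypothetical season totals for two cornerbacks (invented for illustration)
cb = pd.DataFrame({
    "defender": ["CB A", "CB B"],
    "coverage_snaps": [600, 550],
    "targets": [60, 110],
    "receptions_allowed": [33, 66],
    "yards_allowed": [420, 770],
    "epa_allowed": [-3.0, 11.0],
    "interceptions": [2, 4],
})

# Divide each total by the relevant opportunity count
rates = cb.assign(
    target_rate=lambda d: d["targets"] / d["coverage_snaps"],
    catch_rate_allowed=lambda d: d["receptions_allowed"] / d["targets"],
    yards_per_target=lambda d: d["yards_allowed"] / d["targets"],
    epa_per_target=lambda d: d["epa_allowed"] / d["targets"],
)

cols = ["defender", "interceptions", "target_rate",
        "catch_rate_allowed", "yards_per_target", "epa_per_target"]
print(rates[cols].round(3).to_string(index=False))
```

In this invented example, CB B has twice as many interceptions, but he is targeted twice as often per coverage snap (20% versus 10%) and allows positive EPA per target (0.10 versus -0.05). The counting stat rewards the defender quarterbacks are more willing to throw at.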

Coverage Type Matters for Player Evaluation: Some cornerbacks excel in man coverage but struggle in zone, or vice versa. Evaluating players without considering coverage context can lead to personnel mismatches. Teams must understand whether their cornerbacks are "man corners" or "zone corners" and deploy them accordingly.

The Safety Position Requires Different Metrics: While cornerbacks can be evaluated primarily on targets allowed, safeties have broader responsibilities including run support, deep coverage help, and disguise roles. Safety evaluation requires examining explosive play prevention, run support effectiveness, and leveraged positioning rather than just target-based metrics.

Strategic and Tactical Insights

Pressure-Coverage Synergy Is Nonlinear: Pass rush and coverage don't simply add together—they multiply. Great coverage allows pass rushers more time to get home; great pressure makes mediocre coverage look elite. This synergy means that:
- Defenses need either elite coverage OR elite pass rush (ideally both)
- Blitzing works best when the coverage behind it (usually man) can hold up
- Investing in both coverage and rush players provides disproportionate returns

Blitz Frequency Doesn't Equal Blitz Effectiveness: The data consistently shows that teams that blitz most frequently don't necessarily generate better defensive results. What matters is:
- Blitzing in the right situations (against inexperienced QBs, when unexpected)
- Having the right coverage behind blitzes (usually man coverage)
- Maintaining unpredictability in when you blitz

Predictability Is a Vulnerability: Defenses that become too predictable in their coverage usage allow offenses to design and practice specific plays against their tendencies. The optimal strategy maintains enough variability to prevent exploitation while still leveraging situational and personnel advantages. The ~70% threshold (using one coverage >70% in a specific situation) serves as a useful warning sign for over-reliance.

Methodological Considerations

Data Limitations Shape Analysis: The publicly available coverage data (man vs zone) provides only broad categorization. Precise coverage identification (Cover 1 vs Cover 0, Cover 2 vs Cover 4) requires additional data sources:
- Next Gen Stats tracking data (alignment and movement patterns)
- Professional charting services (manual coverage identification)
- Film analysis (direct observation)
- Statistical inference (using proxies like personnel and outcomes)

Selection Bias Affects Coverage Comparisons: Defenses don't randomly choose coverages—they select based on situation, opponent, personnel, and game plan. This creates selection bias in coverage comparisons. A coverage that appears effective might simply be used in favorable situations. Controlling for situational variables (down, distance, score, personnel) helps isolate true coverage effectiveness.
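
One simple way to act on this is direct standardization: compare coverages within each situation, then average those within-situation results over a common situation mix. The sketch below uses invented cell means (the `cells` table and every number in it are hypothetical, not results from the chapter's data) to show how a pooled comparison and an adjusted one can point in opposite directions. A regression with situational covariates is the more general tool; this is only the smallest version of the idea.

```python
import pandas as pd

# Hypothetical plays and average EPA allowed by situation and coverage
cells = pd.DataFrame({
    "situation": ["3rd & Short", "3rd & Short", "3rd & Long", "3rd & Long"],
    "coverage":  ["man", "zone", "man", "zone"],
    "plays":     [300, 100, 100, 300],
    "avg_epa":   [0.30, 0.35, -0.20, -0.15],
})

# Naive comparison: pool all plays, ignoring when each coverage is called
cells["epa_total"] = cells["avg_epa"] * cells["plays"]
pooled = cells.groupby("coverage")[["epa_total", "plays"]].sum()
naive = pooled["epa_total"] / pooled["plays"]

# Adjusted comparison: weight each situation by its overall share of plays,
# so both coverages are evaluated on the same situation mix
wide = cells.pivot(index="situation", columns="coverage", values="avg_epa")
situation_weight = cells.groupby("situation")["plays"].sum()
situation_weight = situation_weight / situation_weight.sum()
adjusted = wide.mul(situation_weight, axis=0).sum()

print("Naive EPA allowed per play:")
print(naive.round(3).to_string())
print("\nSituation-adjusted EPA allowed per play:")
print(adjusted.round(3).to_string())
```

In this made-up example, the pooled averages favor zone (-0.025 versus 0.175 EPA per play) only because zone is called more often on third-and-long, where every coverage looks good. Holding the situation mix fixed, man allows 0.05 EPA per play against 0.10 for zone.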

Sample Size Requirements Are Substantial: Because specific coverages may only be used on 10-20% of plays, and because we need to segment by situation, coverage analysis requires large sample sizes. Analysis of a single game or even a single season provides unreliable estimates. Multiple seasons of data (or tracking data with precise coverage identification) are essential for robust conclusions.
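
A rough sense of scale comes from the standard error of a mean, which shrinks with the square root of the number of plays. The sketch below assumes a per-play EPA standard deviation of 1.5; that value is an assumption chosen for illustration, not something estimated in this chapter, and the helper names are hypothetical.

```python
import math


def ci_half_width(sd, n, z=1.96):
    """Half-width of an approximate 95% confidence interval for a mean."""
    return z * sd / math.sqrt(n)


def plays_needed(sd, half_width, z=1.96):
    """Smallest sample size whose confidence interval is within +/- half_width."""
    return math.ceil((z * sd / half_width) ** 2)


ASSUMED_EPA_SD = 1.5  # assumption for illustration only

for n in (50, 500, 5000):
    print(f"n = {n:>5}: mean EPA known to within +/- {ci_half_width(ASSUMED_EPA_SD, n):.3f}")

print("Plays needed for +/- 0.05 EPA:", plays_needed(ASSUMED_EPA_SD, 0.05))
```

Under that assumed spread, pinning down average EPA to within 0.05 takes roughly 3,500 plays in a single cell. Once the data is split by coverage, down, distance, and team, very few cells reach that size, which is why single-season splits should be read cautiously.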

Practical Applications

For Defensive Coordinators: Use data to identify:
- Which coverages work best with your personnel
- Which situations reveal predictable coverage tendencies
- Where your defense allows explosive plays (coverage breakdowns vs scheme vulnerabilities)
- Whether your blitz-coverage balance is optimal

For Personnel Evaluators: Use data to assess:
- Individual defender performance using opportunity-adjusted metrics
- Whether cornerback prospects fit man or zone schemes
- How safety candidates perform in different roles (deep, box, hybrid)
- Whether defensive backs maintain performance across coverage types

For Offensive Analysts: Use opponent data to identify:
- Coverage tendencies by down, distance, and formation
- Individual defender vulnerabilities (which corners to target)
- Situations where defenses become predictable
- Blitz frequencies and patterns to design protections and hot routes

Future Directions

Tracking Data Integration: The next frontier in coverage analysis involves integrating player tracking data to measure:
- Separation allowed at the catch point
- Route coverage smoothness (how closely defenders mirror receivers)
- Leverage positioning (are defenders properly positioned to defend their responsibility?)
- Pattern recognition (how quickly do defenders identify and respond to route combinations?)

Machine Learning Applications: Advanced methods can enhance coverage analysis:
- Classification models to predict coverage type from pre-snap alignment
- Clustering algorithms to identify coverage shells and disguises
- Expected completion probability models that control for coverage type
- Defender tracking to automatically identify coverage assignments

Integrated Defensive Evaluation: Future analysis should integrate coverage, pass rush, and run defense into unified defensive frameworks that capture the complementarities between these dimensions. Coverage doesn't exist in isolation—it's part of a comprehensive defensive system.

The field of coverage analysis continues to evolve as new data sources become available and analytical techniques advance. The principles established in this chapter—understanding scheme theory, using opportunity-adjusted metrics, accounting for context, and recognizing data limitations—will remain foundational as the field progresses.

Exercises

Conceptual Questions

  1. Coverage Philosophy: Explain why a team might prefer zone coverage over man coverage even in situations where man coverage allows lower EPA. What factors beyond EPA should influence this decision?

  2. CB vs Safety Evaluation: Why do cornerbacks and safeties require different performance metrics? What unique responsibilities does each position have?

  3. Blitz Paradox: The data often shows that blitzing doesn't improve EPA allowed despite increasing pressure rate. Explain this apparent paradox and discuss when blitzing might still be optimal.

Coding Exercises

Exercise 1: Coverage Shell Identification

Using the play-by-play data, create a function that identifies the likely defensive coverage shell (one-high vs two-high safety) based on available information. Calculate EPA allowed for each shell type.

**Challenge**: Extend this to identify specific coverages (Cover 1, Cover 2, Cover 3) using personnel groupings and formation data.

Exercise 2: Cornerback Matchup Analysis

Build a comprehensive cornerback evaluation system that includes:

a) EPA allowed per target by coverage type (man/zone)
b) Performance vs different receiver types (outside/slot)
c) Success rate on different pass depths (short/medium/deep)
d) Create a composite "CB Score" that weights these factors

Identify the top 10 cornerbacks by your composite score.

Exercise 3: Optimal Coverage Selection

For a specific team, analyze their coverage tendencies and effectiveness:

a) Calculate coverage usage rates by down and distance
b) Determine which coverages work best in different situations
c) Identify situations where they're predictable (>70% usage of one coverage)
d) Recommend coverage strategy adjustments based on the data

**Advanced**: Build a classification model that predicts coverage type based on situation, then evaluate prediction accuracy.

Exercise 4: Pressure-Coverage Optimization

Analyze the relationship between pass rush and coverage:

a) Calculate the "break-even" pressure rate where blitzing EPA equals standard rush EPA
b) Identify teams that blitz most effectively (lowest EPA when blitzing)
c) Determine optimal blitz situations based on down, distance, and personnel
d) Visualize the pressure-coverage efficiency frontier for all NFL teams

Further Reading

  • Burke, B. (2020). "The Evolution of Coverage." ESPN Analytics Blog.

  • Yurko, R., Ventura, S., & Horowitz, M. (2019). "nflWAR: A Reproducible Method for Offensive Player Evaluation in Football." Journal of Quantitative Analysis in Sports, 15(3), 163-183.

  • Lopez, M., et al. (2020). "How to Measure Defensive Performance in the NFL." MIT Sloan Sports Analytics Conference.

  • PFF College (2021). "Coverage vs Coverage: Understanding Modern Pass Defense." Pro Football Focus.

  • Eager, E. (2019). "Assessing Defensive Back Performance Using Expected Completion Percentage." Sports Info Solutions.

  • Schatz, A. (2018). "Game Charting and Coverage Statistics." Football Outsiders Almanac.
