Chapter 18: Scheme Analysis | Football Analytics Textbook

Learning Objectives

By the end of this chapter, you will be able to:

Identify defensive scheme tendencies and base defensive alignments
Analyze coordinator philosophy and strategic fingerprints
Study game-planning approaches and in-game adaptations
Evaluate scheme fit and personnel alignment
Use analytics for offensive game planning against specific defenses
Map coverage tendencies by situation and personnel
Analyze blitz patterns and pressure schemes
Generate comprehensive defensive scouting reports

Introduction

Throughout Part III, we've examined individual components of defensive performance: coverage schemes, pass rush effectiveness, run defense, and defensive metrics. While each analysis provides valuable insights, comprehensive defensive evaluation requires synthesizing these elements into a complete scouting and game-planning framework.

Consider the challenge every offensive coordinator faces: walking into a stadium on Sunday morning knowing that the opposing defense has spent all week preparing to stop their offense. The difference between winning and losing often comes down to one question: How well did you scout your opponent's defensive scheme?

Defensive scheme analysis is the art and science of reverse-engineering an opponent's defensive philosophy from observable data. In the film room, coaches spend countless hours breaking down defensive alignments, coverage shells, and blitz packages. But modern analytics adds a powerful dimension to this traditional scouting process: the ability to identify patterns across hundreds or thousands of plays, revealing tendencies that might not be obvious from watching game film alone.

This chapter brings together everything we've learned about defensive analytics to build complete scheme analysis systems. We'll learn how to identify defensive philosophies from play-by-play data, distinguish between 3-4 and 4-3 base fronts, map coverage tendencies by situation, analyze coordinator decision-making patterns, and create professional-grade scouting reports that inform offensive game planning. By the end of this chapter, you'll understand how NFL teams combine traditional film study with data analytics to gain competitive advantages in preparation and in-game adjustments.

What makes scheme analysis different: Unlike player evaluation, which focuses on individual performance, scheme analysis examines collective decision-making. We're not asking "How good is this linebacker?" but rather "When does this defense blitz on third-and-medium?" and "What coverage do they prefer against 11 personnel?" These pattern-recognition questions are perfectly suited to analytical approaches, especially when we have multiple seasons of play-by-play data to analyze.

What is Defensive Scheme Analysis?

Defensive scheme analysis is a comprehensive evaluation framework that:
- Identifies base defensive alignments and personnel groupings
- Maps coverage and blitz tendencies by situation and formation
- Analyzes coordinator philosophy and strategic patterns
- Evaluates scheme fit with available personnel
- Identifies exploitable weaknesses and matchup advantages
- Informs offensive game planning and play-calling
- Predicts defensive adjustments and in-game adaptations

The Evolution of Defensive Scheme Analysis

Traditional Scouting Methods

Historically, defensive scouting relied on:

Film Study: Hours of manual video review
Tendency Cards: Hand-charted situational tendencies
Down-and-Distance Sheets: Play-type preferences
Formation Cards: Personnel and alignment breakdowns

The Analytics Revolution

Modern defensive analysis leverages:

Play-by-Play Data: Automated tendency tracking
Next Gen Stats: Player tracking and alignment data
Machine Learning: Pattern recognition and prediction
Visual Analytics: Interactive tendency mapping

The Competitive Advantage

Teams that effectively combine traditional film study with analytics-driven scheme analysis gain significant competitive advantages in game planning and in-game adjustments.

Base Defense Identification

Understanding Defensive Fronts

The foundation of any defensive scheme is its base front—the alignment of defensive linemen and linebackers that defines how the defense approaches stopping both the run and pass. While modern NFL defenses are increasingly multiple and adaptable, most teams still have a foundational structure they return to in standard situations.

Why base fronts matter: The choice between a 4-3 and 3-4 front isn't just about counting players. It reflects fundamental philosophical differences about how to allocate defensive resources. A 4-3 defense emphasizes defensive line play, using four down linemen to control gaps and pressure the quarterback. A 3-4 defense prioritizes versatility, using an extra linebacker to create more complex blitz packages and coverage disguises.

Understanding a team's base front helps offensive coordinators predict defensive behavior in key situations and identify personnel matchups to exploit. For example, a 3-4 defense often means at least one linebacker will drop into coverage on passing downs, potentially creating favorable matchups for receiving running backs or tight ends.

Common Defensive Fronts

NFL defenses primarily use three base fronts, each with distinct strategic implications:

4-3 Defense: Four down linemen, three linebackers
- Strengths: Superior run defense with four gap-controlling linemen; more predictable pass rush lanes; better suited for teams with elite defensive tackles
- Weaknesses: Fewer coverage resources (only three linebackers); less schematic flexibility; struggles against spread formations with multiple receivers
- Strategic implications: Typically better against power running teams; more vulnerable to passing attacks; requires elite defensive line talent
- Common teams: 49ers (dominant defensive line), Ravens (situational)
- Personnel requirements: Need four legitimate NFL defensive linemen; strong safety support critical

3-4 Defense: Three down linemen, four linebackers
- Strengths: Maximum scheme versatility with four linebackers who can rush or drop; superior disguise capabilities; better matchups against spread offenses
- Weaknesses: Requires a true nose tackle (rare and expensive); run defense can be compromised if nose gets blocked; more complex communication requirements
- Strategic implications: Creates confusion about who's rushing; better against passing teams; allows more creative blitz designs
- Common teams: Steelers (historical), Patriots (Belichick era), Packers (recent years)
- Personnel requirements: Elite nose tackle essential; versatile outside linebackers who can rush and cover

Hybrid/Multiple Fronts: Flexible personnel packages that adapt to offensive formations
- Strengths: Ultimate matchup flexibility; can adjust to any offensive personnel; maximizes defensive talent utilization
- Weaknesses: Increased complexity requires sophisticated defenders; communication challenges; harder to master
- Strategic implications: Makes game planning difficult for offenses; allows defensive coordinators to dictate matchups; requires deep roster
- Common teams: Rams (McVay/Raheem Morris era), Bills (Frazier/McDermott), Ravens (Don "Wink" Martindale era)
- Personnel requirements: Position-flexible defenders; high football IQ across roster; extensive coaching

The Evolution Toward Multiple Fronts

Modern NFL offenses have forced defenses to evolve. The traditional base front distinction (4-3 vs 3-4) matters less than it once did because teams spend most snaps in nickel or dime packages (5-6 defensive backs) rather than their base defense. However, understanding base tendencies still reveals coordinator philosophy and personnel priorities.

Identifying Base Defense from Play-by-Play Data

While film study traditionally identified defensive fronts through visual alignment analysis, we can infer base defensive structures from play-by-play data by analyzing personnel groupings. Defensive personnel fields (available for recent seasons in nflverse participation data, which can be joined to nflfastR play-by-play) record how many defensive linemen (DL), linebackers (LB), and defensive backs (DB) were on the field for each play.

The analytical approach: By examining which personnel packages each team uses most frequently in early-down, neutral situations (first and second down with 4-10 yards to go), we can identify their base defensive preference. Teams that primarily use packages with 4 DL and 3 LB are running a 4-3 base, while teams favoring 3 DL and 4 LB are running a 3-4 base.
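As a minimal sketch of this rule in plain Python (standard library only), assume each play has already been reduced to a dict with hypothetical keys `defteam`, `down`, `ydstogo`, `dl`, and `lb`. The team codes and plays below are made-up placeholders, not real data.

```python
from collections import Counter, defaultdict

def is_early_down_neutral(play):
    """First or second down with 4-10 yards to go."""
    return play["down"] in (1, 2) and 4 <= play["ydstogo"] <= 10

def infer_base_front(plays):
    """Return {team: front} from each team's most-used DL/LB split on early downs."""
    counts = defaultdict(Counter)
    for play in plays:
        if is_early_down_neutral(play):
            counts[play["defteam"]][(play["dl"], play["lb"])] += 1
    fronts = {}
    for team, package_counts in counts.items():
        (dl, lb), _ = package_counts.most_common(1)[0]
        if (dl, lb) == (4, 3):
            fronts[team] = "4-3"
        elif (dl, lb) == (3, 4):
            fronts[team] = "3-4"
        else:
            fronts[team] = "Hybrid"
    return fronts

# Toy data for two placeholder teams; the third-down snap and the
# 2nd-and-2 snap are excluded by the early-down filter
plays = [
    {"defteam": "AAA", "down": 1, "ydstogo": 10, "dl": 4, "lb": 3},
    {"defteam": "AAA", "down": 2, "ydstogo": 6, "dl": 4, "lb": 3},
    {"defteam": "AAA", "down": 3, "ydstogo": 8, "dl": 4, "lb": 2},
    {"defteam": "BBB", "down": 1, "ydstogo": 10, "dl": 3, "lb": 4},
    {"defteam": "BBB", "down": 2, "ydstogo": 2, "dl": 4, "lb": 3},
]
print(infer_base_front(plays))  # {'AAA': '4-3', 'BBB': '3-4'}
```

`Counter.most_common` breaks ties in favor of the package seen first, a detail worth checking on real data where two packages can be close.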

Important considerations:
- Modern defenses spend 60-70% of snaps in sub-packages (nickel/dime), so base defense appears less frequently than in past eras
- Some teams are truly multiple, using both 4-3 and 3-4 looks depending on situation
- Personnel groupings don't always reveal the actual front shown (a team might use 4-3 personnel but align in a 3-4 look)
- Context matters—game script and opponent tendencies influence defensive personnel usage


#| label: setup-r
#| message: false
#| warning: false
#| cache: true

# Load required packages for defensive scheme analysis
# tidyverse: Data manipulation and visualization
# nflfastR: NFL play-by-play data access
# nflplotR: NFL team logos and branding
# gt/gtExtras: Professional table formatting
# patchwork: Combining multiple plots
library(tidyverse)
library(nflfastR)
library(nflplotR)
library(gt)
library(gtExtras)
library(patchwork)

# Set minimal theme for cleaner visualizations
theme_set(theme_minimal())

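# Load play-by-play data for 2023 season
# This includes all regular season and playoff games
# Contains ~45,000-50,000 rows (plays plus administrative entries)
pbp <- load_pbp(2023)

# Display confirmation with total play count
cat("✓ Loaded", nrow(pbp), "plays from 2023 season\n")

# Check which scheme fields used in this chapter are present; personnel,
# coverage, and blitz fields may need to be joined from charting or
# participation data before the analyses below will run
scheme_cols <- c("defense_personnel", "coverage_scheme", "blitz")
print(setNames(scheme_cols %in% names(pbp), scheme_cols))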

#| label: setup-py
#| message: false
#| warning: false

import pandas as pd
import numpy as np
import nfl_data_py as nfl
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

# Load play-by-play data
pbp = nfl.import_pbp_data([2023])

print(f"✓ Loaded {len(pbp):,} plays from 2023 season")

Defensive Personnel Package Analysis

Now that we understand the conceptual differences between defensive fronts, let's use play-by-play data to identify each team's base defensive structure. This analysis examines personnel groupings across all 32 NFL teams to classify their foundational defensive approach.

Methodology: We'll parse the defense_personnel field, which uses notation like "4 DL, 3 LB, 4 DB" to indicate personnel groupings. By finding the DL/LB combination each team uses most frequently in early-down, neutral situations, we can infer their base front preference.

What to look for: Teams that use 4 DL + 3 LB combinations most often are running a 4-3 base. Teams favoring 3 DL + 4 LB are running a 3-4 base. Teams with significant diversity across multiple packages are running hybrid/multiple schemes.
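Matching each count to its position label, rather than relying on the order of the comma-separated pieces, makes the parse more forgiving. A minimal sketch using Python's `re` module; the string format is assumed to look like the example above, and any position label other than DL, LB, or DB is ignored.

```python
import re

def parse_personnel(personnel):
    """Parse a string like '4 DL, 2 LB, 5 DB' into position counts.

    Each number is paired with the label that follows it, so the order of
    the pieces does not matter. Labels other than DL, LB, and DB are ignored,
    and positions that never appear default to 0.
    """
    counts = {"DL": 0, "LB": 0, "DB": 0}
    for number, position in re.findall(r"(\d+)\s*([A-Z]+)", personnel):
        if position in counts:
            counts[position] += int(number)
    return counts

print(parse_personnel("4 DL, 2 LB, 5 DB"))  # {'DL': 4, 'LB': 2, 'DB': 5}
print(parse_personnel("5 DB, 3 LB, 3 DL"))  # {'DL': 3, 'LB': 3, 'DB': 5}
```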


#| label: personnel-analysis-r
#| message: false
#| warning: false

# Analyze defensive personnel packages across all teams
# Goal: Identify each team's most common personnel grouping in
# early-down, neutral situations (1st/2nd down, 4-10 yards to go)
def_personnel <- pbp %>%
  # Filter to early-down plays with recorded defensive personnel
  filter(
    !is.na(defense_personnel),
    down %in% c(1, 2),
    between(ydstogo, 4, 10)
  ) %>%
  # Parse "X DL, Y LB, Z DB" by matching each count to its position label,
  # so the parse does not depend on the order of the comma-separated pieces
  mutate(
    dl = as.numeric(str_extract(defense_personnel, "\\d+(?=\\s*DL)")),  # DL count
    lb = as.numeric(str_extract(defense_personnel, "\\d+(?=\\s*LB)")),  # LB count
    db = as.numeric(str_extract(defense_personnel, "\\d+(?=\\s*DB)"))   # DB count
  ) %>%
  # Count plays by team and personnel package
  group_by(defteam, defense_personnel, dl, lb, db) %>%
  summarise(
    plays = n(),  # Total plays in this package
    .groups = "drop"
  ) %>%
  # Calculate usage percentage for each team
  group_by(defteam) %>%
  mutate(
    pct = plays / sum(plays),           # Share of team's snaps
    rank = row_number(desc(plays))      # Rank by frequency
  ) %>%
  ungroup()

# Identify base defense for each team
# Use most common personnel package to infer base front
base_defense <- def_personnel %>%
  # Get each team's #1 most used personnel package
  filter(rank == 1) %>%
  # Classify base front based on DL/LB split
  mutate(
    base_front = case_when(
      dl == 4 & lb == 3 ~ "4-3",      # Traditional 4-3
      dl == 3 & lb == 4 ~ "3-4",      # Traditional 3-4
      TRUE ~ "Hybrid"                  # Multiple/hybrid scheme
    )
  ) %>%
  select(defteam, base_front, defense_personnel, plays, pct)

# Display results
base_defense %>%
  arrange(base_front, desc(pct)) %>%
  gt() %>%
  cols_label(
    defteam = "Team",
    base_front = "Base Front",
    defense_personnel = "Personnel",
    plays = "Plays",
    pct = "Usage %"
  ) %>%
  fmt_number(
    columns = pct,
    decimals = 1,
    scale_by = 100
  ) %>%
  fmt_number(
    columns = plays,
    decimals = 0,
    use_seps = TRUE
  ) %>%
  data_color(
    columns = pct,
    palette = "Blues"
  ) %>%
  tab_header(
    title = "Base Defensive Fronts - 2023 Season",
    subtitle = "Most common personnel package by team"
  )

#| label: personnel-analysis-py
#| message: false
#| warning: false

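# Analyze defensive personnel packages in early-down, neutral situations
# (1st/2nd down, 4-10 yards to go)
def_personnel = (pbp
    .dropna(subset=['defense_personnel'])
    .query("down in [1, 2] and 4 <= ydstogo <= 10")
    .copy()
)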

# Extract personnel numbers
def_personnel['dl'] = def_personnel['defense_personnel'].str.extract(r'(\d+)\s*DL')[0].astype(float)
def_personnel['lb'] = def_personnel['defense_personnel'].str.extract(r'(\d+)\s*LB')[0].astype(float)
def_personnel['db'] = def_personnel['defense_personnel'].str.extract(r'(\d+)\s*DB')[0].astype(float)

# Calculate usage by team
personnel_summary = (def_personnel
    .groupby(['defteam', 'defense_personnel', 'dl', 'lb', 'db'])
    .size()
    .reset_index(name='plays')
)

personnel_summary['pct'] = (
    personnel_summary.groupby('defteam')['plays']
    .transform(lambda x: x / x.sum() * 100)
)

personnel_summary['rank'] = (
    personnel_summary.groupby('defteam')['plays']
    .rank(ascending=False, method='first')
)

# Identify base defense
base_defense = (personnel_summary
    .query("rank == 1")
    .copy()
)

# Classify base front
def classify_front(row):
    if row['dl'] == 4 and row['lb'] == 3:
        return '4-3'
    elif row['dl'] == 3 and row['lb'] == 4:
        return '3-4'
    else:
        return 'Hybrid'

base_defense['base_front'] = base_defense.apply(classify_front, axis=1)

# Display results
print("\nBase Defensive Fronts - 2023 Season")
print("=" * 70)
print(base_defense[['defteam', 'base_front', 'defense_personnel', 'plays', 'pct']]
      .sort_values(['base_front', 'pct'], ascending=[True, False])
      .to_string(index=False))

Personnel Package Notation

Defensive personnel is typically notated as "DL-LB-DB":
- 4-2-5: Four defensive linemen, two linebackers, five defensive backs (nickel)
- 4-3-4: Four defensive linemen, three linebackers, four defensive backs (base 4-3)
- 3-4-4: Three defensive linemen, four linebackers, four defensive backs (base 3-4)
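The shorthand and the usual package names can be generated from the three counts. A small sketch; naming packages purely by defensive-back count (4 base, 5 nickel, 6 dime) is a simplification, and the labels for other counts are placeholders chosen here.

```python
def notation(dl, lb, db):
    """Format personnel counts in the DL-LB-DB shorthand, e.g. '4-2-5'."""
    return f"{dl}-{lb}-{db}"

def package_name(db):
    """Name a package by defensive-back count (simplified convention)."""
    names = {4: "base", 5: "nickel", 6: "dime"}
    if db in names:
        return names[db]
    # Placeholder labels for less common counts
    return "heavy (<4 DB)" if db < 4 else "7+ DB"

for dl, lb, db in [(4, 2, 5), (4, 3, 4), (3, 4, 4), (2, 4, 5)]:
    print(notation(dl, lb, db), package_name(db))
# 4-2-5 nickel
# 4-3-4 base
# 3-4-4 base
# 2-4-5 nickel
```

The last line shows why a 2-4-5 grouping is still a nickel package: the name follows the DB count, not the front.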

Interpreting the results: When you run this analysis, you'll discover that most NFL teams don't fit neatly into traditional 4-3 or 3-4 categories. Many teams' most common personnel package is actually nickel (4-2-5 or 3-3-5), reflecting the modern NFL's passing emphasis. However, the underlying philosophy—whether they prefer four down linemen or three—still reveals important strategic tendencies.
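One way to capture that underlying preference, sketched here as an assumption rather than a standard metric, is to ignore the DB count and ask how often a team has four or more versus three or fewer down linemen on the field. The 60% threshold is an arbitrary illustrative choice.

```python
from collections import Counter

def front_lean(dl_counts, threshold=0.6):
    """Classify a team's lean from its per-snap defensive-line counts.

    dl_counts: iterable of DL counts, one per snap (e.g. [4, 4, 3, 4, ...]).
    threshold: share of snaps needed to call a lean (illustrative choice).
    Returns (label, share_four_or_more, share_three_or_fewer).
    """
    counts = Counter(dl_counts)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no snaps supplied")
    four_plus = sum(n for dl, n in counts.items() if dl >= 4) / total
    three_or_fewer = 1 - four_plus
    if four_plus >= threshold:
        label = "four-down lean"
    elif three_or_fewer >= threshold:
        label = "odd-front lean"
    else:
        label = "multiple"
    return label, four_plus, three_or_fewer

label, four, three = front_lean([4, 4, 4, 4, 4, 4, 4, 3, 3, 3])
print(label, round(four, 2))  # four-down lean 0.7
```

Because it pools nickel, dime, and base snaps, this view still separates a four-down nickel team (4-2-5) from an odd-front nickel team (3-3-5) even when neither ever shows its "base" personnel.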

Why Some Teams Show "Hybrid" Classification

If a team's most common package doesn't fit the traditional 4-3 or 3-4 patterns (for example, 4-2-5 or 3-3-5 nickel, or a six-DB dime package), the code classifies it as "Hybrid." Because nickel is now the most-used package for many teams, this label needs a second look: sometimes it reflects nothing more than sub-package usage, but it can also indicate a genuinely multiple scheme that adapts to each opponent rather than a fixed base structure. The Baltimore Ravens under Don "Wink" Martindale, for instance, famously used such diverse personnel that identifying a "base" defense was nearly impossible.

Context Dependence of Base Defense

Base defense identification from season-long data can be misleading. Some teams change their base front mid-season due to injuries or performance. Others use different fronts against different opponents. Always examine base defense in context of specific games and situations rather than treating it as an immutable characteristic.
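A simple guard against treating the base front as fixed is to recompute a team's most-used package week by week and flag the weeks where it changes. A sketch, assuming one team's snaps arrive as hypothetical `(week, personnel)` tuples:

```python
from collections import Counter, defaultdict

def weekly_top_package(snaps):
    """Return {week: most-used personnel string} for one team's snaps."""
    by_week = defaultdict(Counter)
    for week, personnel in snaps:
        by_week[week][personnel] += 1
    # Sort by week so the change detection below walks the season in order
    return {week: c.most_common(1)[0][0] for week, c in sorted(by_week.items())}

def package_changes(weekly):
    """List (week, old, new) whenever the top package differs from the prior week."""
    changes = []
    previous = None
    for week, package in weekly.items():
        if previous is not None and package != previous:
            changes.append((week, previous, package))
        previous = package
    return changes

snaps = [
    (1, "4 DL, 3 LB, 4 DB"), (1, "4 DL, 3 LB, 4 DB"),
    (2, "4 DL, 3 LB, 4 DB"),
    (3, "3 DL, 4 LB, 4 DB"), (3, "3 DL, 4 LB, 4 DB"), (3, "4 DL, 3 LB, 4 DB"),
]
print(package_changes(weekly_top_package(snaps)))
# [(3, '4 DL, 3 LB, 4 DB', '3 DL, 4 LB, 4 DB')]
```

On real data a single-week flip is weak evidence; a change is more meaningful when it persists or lines up with an injury or a specific opponent.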

Coverage Tendency Mapping

Coverage Classification and Strategic Implications

Once we've identified a team's base front, the next layer of scheme analysis involves mapping their coverage tendencies. Coverage—how the defense defends receivers—is arguably more important than front structure in the modern NFL. A defense might run a 4-3 front but play man coverage, or run a 3-4 front with predominantly zone coverage, and these choices dramatically affect offensive strategy.

The coverage spectrum: NFL coverage schemes exist on a spectrum from pure man coverage (each defender assigned to a specific receiver) to pure zone coverage (each defender responsible for an area of the field). Most modern defenses blend man and zone principles, creating hybrid coverages that combine elements of both approaches.

Why coverage tendencies matter for game planning: If an offensive coordinator knows that a defense plays Cover 3 zone on 60% of third-and-long situations, they can design route combinations specifically to attack Cover 3's vulnerable areas (the seams and the flats). Conversely, if a defense frequently shows man coverage, the offense might emphasize pick routes and bunch formations to create natural rubs that free receivers.
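A tendency such as "Cover 3 on 60% of third-and-long snaps" is just a conditional share. A minimal sketch, assuming plays are dicts with hypothetical keys `down`, `ydstogo`, and `coverage`, and defining third-and-long as 8 or more yards to go:

```python
from collections import Counter

def coverage_share(plays, down, min_ydstogo):
    """Share of each coverage call on a given down at or beyond a distance."""
    calls = Counter(
        p["coverage"] for p in plays
        if p["down"] == down and p["ydstogo"] >= min_ydstogo
    )
    total = sum(calls.values())
    return {cov: n / total for cov, n in calls.most_common()} if total else {}

# Made-up snaps for illustration only
plays = [
    {"down": 3, "ydstogo": 9, "coverage": "Cover 3"},
    {"down": 3, "ydstogo": 12, "coverage": "Cover 3"},
    {"down": 3, "ydstogo": 8, "coverage": "Cover 3"},
    {"down": 3, "ydstogo": 10, "coverage": "Cover 1"},
    {"down": 3, "ydstogo": 11, "coverage": "Cover 4"},
    {"down": 3, "ydstogo": 2, "coverage": "Cover 0"},  # short yardage: excluded
    {"down": 1, "ydstogo": 10, "coverage": "Cover 2"},  # wrong down: excluded
]
print(coverage_share(plays, down=3, min_ydstogo=8))
# {'Cover 3': 0.6, 'Cover 1': 0.2, 'Cover 4': 0.2}
```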

Modern defenses use various coverage schemes, each with distinct strategic purposes:

Man Coverage Schemes:
- Cover 0: Pure man coverage with no deep safety help; ultra-aggressive; vulnerable to any broken coverage
- Cover 1: Man coverage with single high safety as last line of defense; balanced aggression
- Cover 2 Man: Man coverage with two deep safeties; safer against deep shots but vulnerable to intermediate routes

Zone Coverage Schemes:
- Cover 2: Two deep safeties divide the field in half, five underneath defenders cover shallow zones; vulnerable to deep middle
- Cover 3: Three deep defenders each cover one-third of the field, four underneath zones; sound against deep passes
- Cover 4 (Quarters): Four deep zones with four defenders splitting the field into quarters; excellent against four vertical routes
- Cover 6: Quarter-quarter-half coverage that plays Cover 4 (quarters) to one side of the field and Cover 2 (halves) to the other; creates asymmetric coverage
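For tendency work it often helps to collapse these calls into two questions a quarterback asks pre-snap: man or zone, and how many deep safeties. The mapping below follows the descriptions above (Cover 3 is treated as single-high because one safety plays the deep middle); the label strings, including "2 Man", mirror the ones used in this chapter's code and are assumptions about how a given data source spells them.

```python
from collections import Counter

# (family, safety shell) for each coverage described above
COVERAGE_TRAITS = {
    "Cover 0": ("man", "zero-high"),
    "Cover 1": ("man", "single-high"),
    "2 Man": ("man", "two-high"),
    "Cover 2": ("zone", "two-high"),
    "Cover 3": ("zone", "single-high"),
    "Cover 4": ("zone", "two-high"),
    "Cover 6": ("zone", "two-high"),
}

def coverage_profile(calls):
    """Summarize coverage calls as man/zone and safety-shell rates.

    Calls not in COVERAGE_TRAITS are skipped rather than guessed at.
    """
    known = [COVERAGE_TRAITS[c] for c in calls if c in COVERAGE_TRAITS]
    if not known:
        return {}
    n = len(known)
    families = Counter(family for family, _ in known)
    shells = Counter(shell for _, shell in known)
    profile = {f"{k}_rate": v / n for k, v in families.items()}
    profile.update({f"{k}_rate": v / n for k, v in shells.items()})
    return profile

print(coverage_profile(["Cover 3", "Cover 3", "Cover 1", "Cover 4", "2 Man", "Prevent"]))
# {'zone_rate': 0.6, 'man_rate': 0.4, 'single-high_rate': 0.6, 'two-high_rate': 0.4}
```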

Pattern-Matching Coverage

Modern NFL defenses increasingly use "pattern-matching" or "match" coverage, which starts in zone but converts to man coverage based on route patterns. This creates classification challenges: is it man or zone? The answer: both. Depending on the charting source, these coverages may be recorded as man coverage because defenders end up matched to specific receivers.

Coverage Disguise and Data Limitations

Elite defensive coordinators disguise their coverage shells pre-snap, showing Cover 2 but rotating to Cover 3 after the snap, or vice versa. Play-by-play data typically records the post-snap coverage, but quarterbacks must make pre-snap reads. This creates a gap between what the data shows and what the quarterback actually sees, limiting our ability to fully analyze coverage effectiveness.
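When a charting source does record both the pre-snap shell and the post-snap coverage (hypothetical fields here; many sources provide only one), the size of that gap can be measured directly as a rotation rate. A minimal sketch:

```python
from collections import defaultdict

def rotation_by_look(plays):
    """Share of snaps that rotated out of each pre-snap look.

    plays: iterable of (pre_snap_shell, post_snap_shell) pairs, such as
    ("two-high", "single-high") for a Cover 2 look that rotates to Cover 3.
    """
    totals, rotated = defaultdict(int), defaultdict(int)
    for pre, post in plays:
        totals[pre] += 1
        rotated[pre] += pre != post  # True counts as 1
    return {look: rotated[look] / totals[look] for look in totals}

# Made-up snaps: two of three two-high looks rotate, one of two single-high looks does
snaps = [
    ("two-high", "single-high"),
    ("two-high", "single-high"),
    ("two-high", "two-high"),
    ("single-high", "single-high"),
    ("single-high", "two-high"),
]
print(rotation_by_look(snaps))
```

Broken down by the pre-snap look, the number answers the quarterback's practical question: how often does the shell I see survive the snap?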

#| label: coverage-tendencies-r
#| message: false
#| warning: false

# Analyze coverage tendencies by down and distance
coverage_tendencies <- pbp %>%
  filter(
    !is.na(coverage_scheme),
    !is.na(down),
    down <= 3
  ) %>%
  mutate(
    distance_bin = case_when(
      ydstogo <= 3 ~ "Short (1-3)",
      ydstogo <= 7 ~ "Medium (4-7)",
      ydstogo <= 10 ~ "Long (8-10)",
      TRUE ~ "Very Long (11+)"
    ),
    distance_bin = factor(distance_bin,
                          levels = c("Short (1-3)", "Medium (4-7)",
                                    "Long (8-10)", "Very Long (11+)"))
  ) %>%
  group_by(defteam, down, distance_bin, coverage_scheme) %>%
  summarise(
    plays = n(),
    .groups = "drop"
  ) %>%
  group_by(defteam, down, distance_bin) %>%
  mutate(
    pct = plays / sum(plays)
  ) %>%
  ungroup()

# Focus on primary coverage schemes
primary_coverage <- coverage_tendencies %>%
  filter(coverage_scheme %in% c("Cover 1", "Cover 2", "Cover 3",
                                "Cover 4", "Cover 6", "2 Man")) %>%
  group_by(defteam, coverage_scheme) %>%
  summarise(
    total_plays = sum(plays),
    avg_pct = mean(pct),
    .groups = "drop"
  ) %>%
  group_by(defteam) %>%
  mutate(
    overall_pct = total_plays / sum(total_plays)
  ) %>%
  ungroup()

# Get top coverage for each team
top_coverage <- primary_coverage %>%
  group_by(defteam) %>%
  slice_max(overall_pct, n = 3) %>%
  mutate(rank = row_number()) %>%
  ungroup()

# Display results
top_coverage %>%
  filter(rank == 1) %>%
  arrange(desc(overall_pct)) %>%
  select(defteam, coverage_scheme, overall_pct, avg_pct) %>%
  head(10) %>%
  gt() %>%
  cols_label(
    defteam = "Team",
    coverage_scheme = "Primary Coverage",
    overall_pct = "Overall %",
    avg_pct = "Avg Situation %"
  ) %>%
  fmt_percent(
    columns = c(overall_pct, avg_pct),
    decimals = 1
  ) %>%
  data_color(
    columns = overall_pct,
    palette = "Greens"
  ) %>%
  tab_header(
    title = "Primary Coverage Schemes - 2023",
    subtitle = "Most frequently used coverage by team"
  )

#| label: coverage-tendencies-py
#| message: false
#| warning: false

# Analyze coverage tendencies
coverage_data = (pbp
    .dropna(subset=['coverage_scheme', 'down'])
    .query("down <= 3")
    .copy()
)

# Create distance bins
def classify_distance(ydstogo):
    if ydstogo <= 3:
        return "Short (1-3)"
    elif ydstogo <= 7:
        return "Medium (4-7)"
    elif ydstogo <= 10:
        return "Long (8-10)"
    else:
        return "Very Long (11+)"

coverage_data['distance_bin'] = coverage_data['ydstogo'].apply(classify_distance)

# Calculate tendencies
coverage_tendencies = (coverage_data
    .groupby(['defteam', 'down', 'distance_bin', 'coverage_scheme'])
    .size()
    .reset_index(name='plays')
)

coverage_tendencies['pct'] = (
    coverage_tendencies.groupby(['defteam', 'down', 'distance_bin'])['plays']
    .transform(lambda x: x / x.sum())
)

# Primary coverage schemes
primary_schemes = ['Cover 1', 'Cover 2', 'Cover 3', 'Cover 4', 'Cover 6', '2 Man']

primary_coverage = (coverage_tendencies
    .query("coverage_scheme in @primary_schemes")
    .groupby(['defteam', 'coverage_scheme'])
    .agg(
        total_plays=('plays', 'sum'),
        avg_pct=('pct', 'mean')
    )
    .reset_index()
)

primary_coverage['overall_pct'] = (
    primary_coverage.groupby('defteam')['total_plays']
    .transform(lambda x: x / x.sum())
)

# Top coverage per team
top_coverage = (primary_coverage
    .sort_values(['defteam', 'overall_pct'], ascending=[True, False])
    .groupby('defteam')
    .head(1)
    .sort_values('overall_pct', ascending=False)
)

print("\nPrimary Coverage Schemes - 2023")
print("=" * 70)
print(top_coverage[['defteam', 'coverage_scheme', 'overall_pct', 'avg_pct']].head(10).to_string(index=False))

Coverage Heatmap by Down and Distance


#| label: fig-coverage-heatmap-r
#| fig-cap: "Coverage scheme usage by down and distance - Example team"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false

# Select a team for detailed analysis (e.g., SF - known for diverse coverage)
example_team <- "SF"

coverage_heatmap_data <- coverage_tendencies %>%
  filter(
    defteam == example_team,
    coverage_scheme %in% c("Cover 1", "Cover 2", "Cover 3",
                           "Cover 4", "Cover 6", "2 Man")
  )

# Create heatmap: darker tiles = higher usage; label text switches to
# white on dark tiles so it stays readable
ggplot(coverage_heatmap_data, aes(x = distance_bin, y = coverage_scheme, fill = pct)) +
  geom_tile(color = "white", linewidth = 0.5) +
  geom_text(aes(label = scales::percent(pct, accuracy = 1),
                color = pct > 0.4),
            size = 3, fontface = "bold", show.legend = FALSE) +
  scale_color_manual(values = c("TRUE" = "white", "FALSE" = "black")) +
  facet_wrap(~ paste("Down", down), ncol = 3) +
  scale_fill_gradient(
    low = "#c6dbef",
    high = "#08306b",
    labels = scales::percent_format()
  ) +
  labs(
    title = paste(example_team, "Coverage Tendencies by Down and Distance"),
    subtitle = "2023 Season (regular season and playoffs)",
    x = "Distance to Go",
    y = "Coverage Scheme",
    fill = "Usage %",
    caption = "Data: nflfastR"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "right",
    panel.grid = element_blank()
  )


#| label: fig-coverage-heatmap-py
#| fig-cap: "Coverage scheme usage by down and distance - Python"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false

# Select example team
example_team = "SF"

heatmap_data = (coverage_tendencies
    .query("defteam == @example_team")
    .query("coverage_scheme in @primary_schemes")
)

# Create subplots for each down
fig, axes = plt.subplots(1, 3, figsize=(14, 6))

for i, down in enumerate([1, 2, 3]):
    down_data = heatmap_data.query("down == @down")

    # Pivot for heatmap
    pivot_data = down_data.pivot_table(
        index='coverage_scheme',
        columns='distance_bin',
        values='pct',
        fill_value=0
    )

    # Reorder columns
    col_order = ['Short (1-3)', 'Medium (4-7)', 'Long (8-10)', 'Very Long (11+)']
    pivot_data = pivot_data.reindex(columns=[c for c in col_order if c in pivot_data.columns])

    # Create heatmap
    sns.heatmap(
        pivot_data,
        annot=True,
        fmt='.1%',
        cmap='Blues',
        cbar_kws={'label': 'Usage %'},
        ax=axes[i],
        vmin=0,
        vmax=pivot_data.max().max() if len(pivot_data) > 0 else 1
    )

    axes[i].set_title(f'Down {down}', fontsize=12, fontweight='bold')
    axes[i].set_xlabel('Distance to Go')
    axes[i].set_ylabel('Coverage Scheme' if i == 0 else '')

plt.suptitle(f'{example_team} Coverage Tendencies by Down and Distance\n2023 Season (regular season and playoffs)',
             fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

Blitz Frequency and Patterns

Understanding Blitz Strategy

Blitzing, sending more pass rushers than the standard four-man rush (typically by adding linebackers or defensive backs), represents one of the most critical decisions defensive coordinators make. The blitz creates a fundamental trade-off: increased pressure on the quarterback versus decreased coverage resources in the secondary. Understanding when, how often, and how successfully teams blitz reveals core philosophical differences between defensive coordinators.

The blitz calculus: A basic pass protection uses the five offensive linemen, so it can account for five rushers. Many charting and tracking sources count any rush of five or more defenders as a blitz; sending six or more exceeds a five-man protection unless the offense keeps a running back or tight end in to block. Either way, each extra rusher is one fewer defender in coverage: the defense gains numbers in the pass rush at the cost of numbers on the back end, which can leave receivers in one-on-one matchups with little or no safety help.
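The counting behind this trade-off can be sketched in a few lines. The model is deliberately simplified, and its assumptions are conveniences chosen for illustration: 11 defenders; an offense of a quarterback, five linemen, and five eligible receivers; and every blocker beyond the five linemen is an eligible receiver kept in to protect.

```python
def numbers_count(rushers, protectors):
    """Rushers vs. blockers and coverage defenders vs. route runners.

    Simplified model: 11 defenders; the offense has a quarterback, five
    linemen, and five eligible receivers, and each blocker beyond the five
    linemen is an eligible receiver kept in to protect.
    """
    if not 5 <= protectors <= 10:
        raise ValueError("protectors must be between 5 and 10")
    if not 0 <= rushers <= 11:
        raise ValueError("rushers must be between 0 and 11")
    route_runners = 5 - (protectors - 5)
    coverage_defenders = 11 - rushers
    return {
        "unblocked_rushers": max(rushers - protectors, 0),
        "route_runners": route_runners,
        "coverage_defenders": coverage_defenders,
        "extra_coverage_defenders": coverage_defenders - route_runners,
    }

for rushers, protectors in [(4, 5), (6, 5), (6, 6)]:
    print(rushers, protectors, numbers_count(rushers, protectors))
```

Rushing four against a five-man protection leaves two spare coverage defenders; a six-man rush against the same protection produces an unblocked rusher but no spare defender; keeping a sixth blocker in picks up that rusher at the cost of one route runner.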

Why blitz rates vary dramatically: Some coordinators (historically, Rex Ryan and Gregg Williams) blitz frequently, believing pressure disrupts timing and forces mistakes. Others (historically, Vic Fangio, Pete Carroll, Dan Quinn, and Belichick in certain game plans) rarely blitz, preferring to rush four and drop seven into coverage, trusting their front four to generate pressure while maintaining coverage integrity.

Modern blitz trends: League-wide blitz rates have decreased over the past decade as passing offenses have become more sophisticated at identifying and exploiting blitzes. Elite quarterbacks often perform better against the blitz than against standard four-man rushes, because a blitz tends to clarify the coverage and leave receivers in one-on-one matchups with favorable leverage. However, well-timed blitzes against less experienced quarterbacks or in specific situations remain highly effective.

Blitz Analysis Framework

Blitz analysis examines when and how defenses send extra rushers. Our analytical approach considers:

Overall blitz frequency: What percentage of pass plays feature a blitz?
Situational tendencies: When does the defense blitz (down, distance, field position)?
Blitz effectiveness: Does blitzing improve defensive performance (measured by EPA)?
Blitz success differential: How much better or worse is the defense when blitzing vs. not blitzing?
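Frequency, effectiveness, and the differential reduce to a few lines of arithmetic, and the situational view comes from running the same function on subsets (for example, third down only). A sketch, assuming plays are hypothetical `(blitz_flag, epa)` pairs with EPA from the offense's perspective, so a negative differential means the defense allowed less EPA when it blitzed:

```python
from statistics import mean

def blitz_summary(plays):
    """Blitz rate, EPA with and without a blitz, and their differential.

    plays: iterable of (blitz_flag, offensive_epa) pairs, flag 1 or 0.
    """
    plays = list(plays)
    blitz_epa = [epa for flag, epa in plays if flag == 1]
    base_epa = [epa for flag, epa in plays if flag == 0]
    if not blitz_epa or not base_epa:
        raise ValueError("need both blitz and non-blitz plays")
    summary = {
        "blitz_rate": len(blitz_epa) / len(plays),
        "epa_when_blitz": mean(blitz_epa),
        "epa_no_blitz": mean(base_epa),
    }
    # Offensive EPA: negative differential favors the blitz
    summary["epa_diff"] = summary["epa_when_blitz"] - summary["epa_no_blitz"]
    return summary

# Made-up plays for illustration only
print(blitz_summary([(1, -0.5), (1, 0.3), (0, 0.2), (0, 0.1)]))
```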

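The code below assumes a play-level blitz indicator column named blitz (1 = blitz, 0 = no blitz). This field is not part of the core nflfastR play-by-play, so it generally has to be derived from charting or participation data (for example, by flagging dropbacks with five or more pass rushers) and joined to the play-by-play before these patterns can be analyzed systematically.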


#| label: blitz-analysis-r
#| message: false
#| warning: false

# Analyze blitz tendencies
blitz_analysis <- pbp %>%
  filter(
    !is.na(blitz),
    play_type %in% c("pass", "run"),
    down <= 3
  ) %>%
  mutate(
    distance_bin = case_when(
      ydstogo <= 3 ~ "Short",
      ydstogo <= 7 ~ "Medium",
      TRUE ~ "Long"
    ),
    field_position = case_when(
      yardline_100 >= 80 ~ "Opponent Territory",
      yardline_100 >= 50 ~ "Midfield",
      yardline_100 >= 20 ~ "Own Territory",
      TRUE ~ "Red Zone Defense"
    )
  ) %>%
  group_by(defteam, down, distance_bin) %>%
  summarise(
    plays = n(),
    blitz_plays = sum(blitz == 1),
    blitz_rate = mean(blitz == 1),
    blitz_epa = mean(epa[blitz == 1], na.rm = TRUE),
    no_blitz_epa = mean(epa[blitz == 0], na.rm = TRUE),
    .groups = "drop"
  )

# Overall blitz rates by team
team_blitz <- pbp %>%
  filter(
    !is.na(blitz),
    play_type == "pass"
  ) %>%
  group_by(defteam) %>%
  summarise(
    total_dropbacks = n(),
    blitz_plays = sum(blitz == 1),
    blitz_rate = mean(blitz == 1),
    blitz_epa = mean(epa[blitz == 1], na.rm = TRUE),
    no_blitz_epa = mean(epa[blitz == 0], na.rm = TRUE),
    epa_diff = blitz_epa - no_blitz_epa,
    .groups = "drop"
  ) %>%
  arrange(desc(blitz_rate))

# Display results
team_blitz %>%
  head(15) %>%
  gt() %>%
  cols_label(
    defteam = "Team",
    total_dropbacks = "Dropbacks",
    blitz_plays = "Blitzes",
    blitz_rate = "Blitz %",
    blitz_epa = "EPA When Blitz",
    no_blitz_epa = "EPA No Blitz",
    epa_diff = "EPA Diff"
  ) %>%
  fmt_percent(
    columns = blitz_rate,
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(blitz_epa, no_blitz_epa, epa_diff),
    decimals = 3
  ) %>%
  fmt_number(
    columns = c(total_dropbacks, blitz_plays),
    decimals = 0,
    use_seps = TRUE
  ) %>%
  data_color(
    columns = blitz_rate,
    palette = "Reds"
  ) %>%
  data_color(
    columns = epa_diff,
    palette = "RdYlGn",
    reverse = TRUE
  ) %>%
  tab_header(
    title = "Defensive Blitz Rates - 2023",
    subtitle = "Pass plays only, sorted by blitz frequency"
  )

#| label: blitz-analysis-py
#| message: false
#| warning: false

# Analyze blitz tendencies
blitz_data = (pbp
    .dropna(subset=['blitz'])
    .query("play_type in ['pass', 'run'] and down <= 3")
    .copy()
)

# Create bins
def distance_bin(ydstogo):
    if ydstogo <= 3:
        return "Short"
    elif ydstogo <= 7:
        return "Medium"
    else:
        return "Long"

blitz_data['distance_bin'] = blitz_data['ydstogo'].apply(distance_bin)

# Team blitz rates (pass plays only)
pass_blitz = (pbp
    .dropna(subset=['blitz'])
    .query("play_type == 'pass'")
    # Split EPA by blitz status; the other column is NaN, so the means below skip it
    .assign(
        blitz_epa=lambda d: d['epa'].where(d['blitz'] == 1),
        no_blitz_epa=lambda d: d['epa'].where(d['blitz'] == 0)
    )
)

team_blitz = (pass_blitz
    .groupby('defteam')
    .agg(
        total_dropbacks=('blitz', 'count'),
        blitz_plays=('blitz', 'sum'),
        blitz_rate=('blitz', 'mean'),
        blitz_epa=('blitz_epa', 'mean'),
        no_blitz_epa=('no_blitz_epa', 'mean')
    )
    .reset_index()
)

team_blitz['epa_diff'] = team_blitz['blitz_epa'] - team_blitz['no_blitz_epa']
team_blitz = team_blitz.sort_values('blitz_rate', ascending=False)

print("\nDefensive Blitz Rates - 2023")
print("=" * 90)
print(team_blitz.head(15).to_string(index=False))

Blitz Success by Situation


#| label: fig-blitz-success-r
#| fig-cap: "Blitz effectiveness by down and distance"
#| fig-width: 11
#| fig-height: 7
#| message: false
#| warning: false

# Aggregate blitz success
blitz_success <- pbp %>%
  filter(
    !is.na(blitz),
    play_type == "pass",
    down <= 3
  ) %>%
  mutate(
    distance_bin = case_when(
      ydstogo <= 3 ~ "Short (1-3)",
      ydstogo <= 7 ~ "Medium (4-7)",
      TRUE ~ "Long (8+)"
    ),
    distance_bin = factor(distance_bin,
                          levels = c("Short (1-3)", "Medium (4-7)", "Long (8+)"))
  ) %>%
  group_by(down, distance_bin, blitz) %>%
  summarise(
    plays = n(),
    avg_epa = mean(epa, na.rm = TRUE),
    success_rate = mean(epa > 0, na.rm = TRUE),
    sack_rate = mean(sack == 1, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(
    blitz_label = ifelse(blitz == 1, "Blitz", "No Blitz")
  )

# Create visualization
ggplot(blitz_success, aes(x = distance_bin, y = avg_epa, fill = blitz_label)) +
  geom_col(position = "dodge", alpha = 0.8) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "black") +
  geom_text(aes(label = sprintf("%.3f", avg_epa),
                vjust = ifelse(avg_epa > 0, -0.5, 1.5)),
            position = position_dodge(width = 0.9),
            size = 3) +
  facet_wrap(~ paste("Down", down), ncol = 3) +
  scale_fill_manual(
    values = c("Blitz" = "#d62728", "No Blitz" = "#2ca02c")
  ) +
  labs(
    title = "Offensive EPA Against Blitz vs. No Blitz",
    subtitle = "By down and distance - 2023 Season",
    x = "Distance to Go",
    y = "Average EPA (Offense)",
    fill = "Defensive Strategy",
    caption = "Data: nflfastR | Note: Negative EPA favors defense"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "top",
    axis.text.x = element_text(angle = 0)
  )


#| label: fig-blitz-success-py
#| fig-cap: "Blitz effectiveness by down and distance - Python"
#| fig-width: 12
#| fig-height: 7
#| message: false
#| warning: false

# Calculate blitz success
blitz_success_data = (pbp
    .dropna(subset=['blitz', 'epa'])
    .query("play_type == 'pass' and down <= 3")
    .copy()
)

# Create distance bins
def distance_bin_full(ydstogo):
    if ydstogo <= 3:
        return "Short (1-3)"
    elif ydstogo <= 7:
        return "Medium (4-7)"
    else:
        return "Long (8+)"

blitz_success_data['distance_bin'] = blitz_success_data['ydstogo'].apply(distance_bin_full)
blitz_success_data['blitz_label'] = blitz_success_data['blitz'].map({1: 'Blitz', 0: 'No Blitz'})

# Aggregate
blitz_success = (blitz_success_data
    .groupby(['down', 'distance_bin', 'blitz_label'])
    .agg(
        plays=('epa', 'count'),
        avg_epa=('epa', 'mean')
    )
    .reset_index()
)

# Fixed bin order so bars line up with tick labels even if a bin is missing
bin_order = ["Short (1-3)", "Medium (4-7)", "Long (8+)"]
x = np.arange(len(bin_order))
width = 0.35

# Create subplot for each down
fig, axes = plt.subplots(1, 3, figsize=(14, 5))

for i, down in enumerate([1, 2, 3]):
    down_data = blitz_success.query("down == @down")

    # Reindex to the fixed order; a missing bin becomes NaN and draws no bar
    blitz_epa = (down_data[down_data['blitz_label'] == 'Blitz']
                 .set_index('distance_bin')['avg_epa'].reindex(bin_order))
    no_blitz_epa = (down_data[down_data['blitz_label'] == 'No Blitz']
                    .set_index('distance_bin')['avg_epa'].reindex(bin_order))

    # Create grouped bar chart
    axes[i].bar(x - width/2, blitz_epa, width, label='Blitz', color='#d62728', alpha=0.8)
    axes[i].bar(x + width/2, no_blitz_epa, width, label='No Blitz', color='#2ca02c', alpha=0.8)

    axes[i].axhline(y=0, color='black', linestyle='--', alpha=0.5)
    axes[i].set_xlabel('Distance to Go')
    axes[i].set_ylabel('Average EPA (Offense)' if i == 0 else '')
    axes[i].set_title(f'Down {down}', fontweight='bold')
    axes[i].set_xticks(x)
    axes[i].set_xticklabels(bin_order)
    if i == 2:
        axes[i].legend()

plt.suptitle('Offensive EPA Against Blitz vs. No Blitz\nBy Down and Distance - 2023 Season',
             fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

Interpreting Blitz EPA

Lower (more negative) EPA values favor the defense. The EPA difference is EPA allowed when blitzing minus EPA allowed when not blitzing, so:
- Negative EPA difference: the blitz is working (the offense is worse against the blitz)
- Positive EPA difference: the blitz is backfiring (the offense is better against the blitz)
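A minimal sketch of the EPA-difference calculation on a handful of made-up dropbacks for one hypothetical defense (the values are illustrative, not drawn from 2023 data). It mirrors the `blitz_epa`, `no_blitz_epa`, and `epa_diff` columns computed earlier:

```python
import pandas as pd

# Toy dropbacks for one hypothetical defense (illustrative values, not real data)
plays = pd.DataFrame({
    "blitz": [1, 1, 1, 0, 0, 0, 0],
    "epa":   [-0.8, 0.4, -0.5, 0.1, 0.3, -0.2, 0.2],
})

# Mean EPA allowed with and without a blitz
blitz_epa = plays.loc[plays["blitz"] == 1, "epa"].mean()
no_blitz_epa = plays.loc[plays["blitz"] == 0, "epa"].mean()
epa_diff = blitz_epa - no_blitz_epa

# Negative difference = offense does worse against the blitz
verdict = "blitz working" if epa_diff < 0 else "blitz backfiring"
print(f"blitz {blitz_epa:.3f} | no blitz {no_blitz_epa:.3f} | diff {epa_diff:.3f} -> {verdict}")
```

Here the blitz plays average -0.300 EPA against +0.100 without a blitz, a difference of -0.400, so by the rule above the blitz is working. With only seven plays, a single big gain against the blitz would swing the sign; real comparisons need far larger samples.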

Situational Scheme Adjustments

Red Zone Defense

As the field shrinks near the goal line, defenses typically change their coverage mix. The code below tracks coverage usage and EPA allowed across field zones:


#| label: red-zone-defense-r
#| message: false
#| warning: false

# Analyze red zone defensive adjustments
red_zone_def <- pbp %>%
  filter(
    !is.na(coverage_scheme),
    play_type == "pass"
  ) %>%
  mutate(
    # yardline_100 = yards to the opponent's end zone (small = near the goal line)
    zone_type = case_when(
      yardline_100 >= 80 ~ "Own Goal to Own 20",
      yardline_100 >= 50 ~ "Own 21 to Midfield",
      yardline_100 > 20 ~ "Opp 49 to Opp 21",
      yardline_100 > 10 ~ "Red Zone (Opp 20-11)",
      TRUE ~ "Opp 10 to Goal"
    )
  ) %>%
  group_by(zone_type, coverage_scheme) %>%
  summarise(
    plays = n(),
    avg_epa = mean(epa, na.rm = TRUE),
    success_rate = mean(epa > 0, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  group_by(zone_type) %>%
  mutate(
    usage_pct = plays / sum(plays)
  ) %>%
  ungroup() %>%
  filter(coverage_scheme %in% c("Cover 1", "Cover 2", "Cover 3",
                                "Cover 4", "Cover 6", "2 Man"))

# Display top coverages by zone
red_zone_summary <- red_zone_def %>%
  group_by(zone_type) %>%
  slice_max(usage_pct, n = 3) %>%
  ungroup() %>%
  arrange(zone_type, desc(usage_pct))

red_zone_summary %>%
  gt() %>%
  cols_label(
    zone_type = "Field Zone",
    coverage_scheme = "Coverage",
    plays = "Plays",
    usage_pct = "Usage %",
    avg_epa = "Avg EPA",
    success_rate = "Success %"
  ) %>%
  fmt_percent(
    columns = c(usage_pct, success_rate),
    decimals = 1
  ) %>%
  fmt_number(
    columns = avg_epa,
    decimals = 3
  ) %>%
  fmt_number(
    columns = plays,
    decimals = 0,
    use_seps = TRUE
  ) %>%
  tab_header(
    title = "Coverage Schemes by Field Position",
    subtitle = "Top 3 coverages per zone - 2023 Season"
  )

#| label: red-zone-defense-py
#| message: false
#| warning: false

# Analyze red zone adjustments
# yardline_100 = yards to the opponent's end zone (small = near the goal line)
def classify_zone(yardline):
    if yardline >= 80:
        return "Own Goal to Own 20"
    elif yardline >= 50:
        return "Own 21 to Midfield"
    elif yardline > 20:
        return "Opp 49 to Opp 21"
    elif yardline > 10:
        return "Red Zone (Opp 20-11)"
    else:
        return "Opp 10 to Goal"

red_zone_data = (pbp
    .dropna(subset=['coverage_scheme'])
    .query("play_type == 'pass'")
    .copy()
)

red_zone_data['zone_type'] = red_zone_data['yardline_100'].apply(classify_zone)

# Calculate metrics
red_zone_def = (red_zone_data
    .groupby(['zone_type', 'coverage_scheme'])
    .agg(
        plays=('epa', 'count'),
        avg_epa=('epa', 'mean'),
        success_rate=('epa', lambda x: (x > 0).mean())
    )
    .reset_index()
)

red_zone_def['usage_pct'] = (
    red_zone_def.groupby('zone_type')['plays']
    .transform(lambda x: x / x.sum())
)

# Filter to main coverages
primary_schemes = ['Cover 1', 'Cover 2', 'Cover 3', 'Cover 4', 'Cover 6', '2 Man']
red_zone_def = red_zone_def.query("coverage_scheme in @primary_schemes")

# Top 3 per zone
red_zone_summary = (red_zone_def
    .sort_values(['zone_type', 'usage_pct'], ascending=[True, False])
    .groupby('zone_type')
    .head(3)
)

print("\nCoverage Schemes by Field Position - 2023")
print("=" * 85)
print(red_zone_summary.to_string(index=False))
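Because `yardline_100` counts yards to the opponent's end zone, small values sit near the goal line and the red zone is `yardline_100 <= 20`. A vectorized sketch of field-zone binning with `pandas.cut`; the zone edges and labels are illustrative choices, not a league standard:

```python
import pandas as pd

# yardline_100 = yards from the opponent's end zone (1 = opponent's 1, 99 = own 1)
ZONE_EDGES = [0, 10, 20, 49, 79, 100]
ZONE_LABELS = [
    "Opp 10 to Goal",
    "Red Zone (Opp 20-11)",
    "Opp 49 to Opp 21",
    "Own 21 to Midfield",
    "Own Goal to Own 20",
]

def field_zone(yardline_100: pd.Series) -> pd.Series:
    """Bin yardline_100 into field zones; bins are right-inclusive, e.g. (10, 20]."""
    return pd.cut(yardline_100, bins=ZONE_EDGES, labels=ZONE_LABELS, right=True)

sample = pd.Series([95, 80, 65, 50, 35, 20, 15, 10, 3])
print(pd.DataFrame({"yardline_100": sample, "zone": field_zone(sample)}))
```

Right-inclusive edges mean the 20 itself counts as red zone and midfield (50) falls in the own-territory bin.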

Third Down Defense

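On third down the defense only needs to keep the offense short of the line to gain, so coverage choices tend to shift with the distance required. The code below measures coverage usage, conversion rate, and sack rate by third-down distance: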


#| label: third-down-defense-r
#| message: false
#| warning: false

# Third down defensive analysis
third_down_def <- pbp %>%
  filter(
    down == 3,
    !is.na(coverage_scheme),
    play_type == "pass"
  ) %>%
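  mutate(
    distance_category = case_when(
      ydstogo <= 3 ~ "3rd & Short (1-3)",
      ydstogo <= 6 ~ "3rd & Medium (4-6)",
      ydstogo <= 10 ~ "3rd & Long (7-10)",
      TRUE ~ "3rd & Very Long (11+)"
    ),
    # Factor levels keep situations in short-to-long order instead of alphabetical
    distance_category = factor(distance_category, levels = c(
      "3rd & Short (1-3)", "3rd & Medium (4-6)",
      "3rd & Long (7-10)", "3rd & Very Long (11+)"
    ))
  ) %>%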
  group_by(distance_category, coverage_scheme) %>%
  summarise(
    plays = n(),
    conversion_rate = mean(third_down_converted == 1, na.rm = TRUE),
    avg_epa = mean(epa, na.rm = TRUE),
    sack_rate = mean(sack == 1, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  group_by(distance_category) %>%
  mutate(
    usage_pct = plays / sum(plays)
  ) %>%
  ungroup() %>%
  filter(
    coverage_scheme %in% c("Cover 1", "Cover 2", "Cover 3",
                           "Cover 4", "Cover 6", "2 Man"),
    plays >= 20  # Minimum sample size
  )

# Display results
third_down_def %>%
  arrange(distance_category, desc(usage_pct)) %>%
  head(16) %>%
  gt() %>%
  cols_label(
    distance_category = "Situation",
    coverage_scheme = "Coverage",
    plays = "Plays",
    usage_pct = "Usage %",
    conversion_rate = "Conv %",
    avg_epa = "EPA",
    sack_rate = "Sack %"
  ) %>%
  fmt_percent(
    columns = c(usage_pct, conversion_rate, sack_rate),
    decimals = 1
  ) %>%
  fmt_number(
    columns = avg_epa,
    decimals = 3
  ) %>%
  fmt_number(
    columns = plays,
    decimals = 0,
    use_seps = TRUE
  ) %>%
  data_color(
    columns = conversion_rate,
    palette = "RdYlGn",
    reverse = TRUE
  ) %>%
  tab_header(
    title = "Third Down Coverage Analysis",
    subtitle = "Coverage performance by distance - 2023 Season"
  )

#| label: third-down-defense-py
#| message: false
#| warning: false

# Third down analysis
def distance_category(ydstogo):
    if ydstogo <= 3:
        return "3rd & Short (1-3)"
    elif ydstogo <= 6:
        return "3rd & Medium (4-6)"
    elif ydstogo <= 10:
        return "3rd & Long (7-10)"
    else:
        return "3rd & Very Long (11+)"

third_down_data = (pbp
    .query("down == 3 and play_type == 'pass'")
    .dropna(subset=['coverage_scheme'])
    .copy()
)

third_down_data['distance_category'] = third_down_data['ydstogo'].apply(distance_category)

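# Calculate metrics
third_down_def = (third_down_data
    .groupby(['distance_category', 'coverage_scheme'])
    .agg(
        plays=('epa', 'count'),
        conversion_rate=('third_down_converted', 'mean'),
        avg_epa=('epa', 'mean'),
        sack_rate=('sack', 'mean')
    )
    .reset_index()
)

third_down_def['usage_pct'] = (
    third_down_def.groupby('distance_category')['plays']
    .transform(lambda x: x / x.sum())
)

# Order situations short-to-long so sorting does not fall back to alphabetical order
category_order = ["3rd & Short (1-3)", "3rd & Medium (4-6)",
                  "3rd & Long (7-10)", "3rd & Very Long (11+)"]
third_down_def['distance_category'] = pd.Categorical(
    third_down_def['distance_category'], categories=category_order, ordered=True
)

# Filter to primary coverages with a minimum sample size
primary_schemes = ['Cover 1', 'Cover 2', 'Cover 3', 'Cover 4', 'Cover 6', '2 Man']
third_down_def = (third_down_def
    .query("coverage_scheme in @primary_schemes and plays >= 20")
    .sort_values(['distance_category', 'usage_pct'], ascending=[True, False])
)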

print("\nThird Down Coverage Analysis - 2023")
print("=" * 90)
print(third_down_def.head(16).to_string(index=False))
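The `plays >= 20` filter is a sample-size guard, and a quick interval calculation shows how noisy 20 plays still are. This sketch uses the simple normal-approximation (Wald) interval, which is rough at small samples; a Wilson interval is a common, better-behaved alternative. The counts are hypothetical:

```python
import math

def conversion_rate_ci(conversions: int, attempts: int, z: float = 1.96):
    """Normal-approximation (Wald) 95% interval for a conversion rate, clipped to [0, 1]."""
    p = conversions / attempts
    se = math.sqrt(p * (1 - p) / attempts)
    return p, max(0.0, p - z * se), min(1.0, p + z * se)

# Same 40% conversion rate at two sample sizes
for conv, att in [(8, 20), (80, 200)]:
    p, lo, hi = conversion_rate_ci(conv, att)
    print(f"{conv}/{att}: {p:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

At 8 of 20 the interval runs from about 18.5% to 61.5%; at 80 of 200 it narrows to about 33.2% to 46.8%. Coverage splits near the 20-play floor are best read as rough tendencies.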

Coordinator Philosophy Fingerprints

Understanding Defensive Philosophy Through Data

Every defensive coordinator brings a distinct philosophical approach to their role—a set of beliefs about how to build and deploy a defense. Some coordinators are aggressive, preferring man coverage and frequent blitzes to create negative plays. Others are conservative, playing zone coverage and rushing four to limit explosive plays and force offenses to execute long, mistake-free drives. These philosophical differences aren't random; they reflect fundamental beliefs about risk tolerance, personnel trust, and optimal defensive strategy.

What constitutes a defensive philosophy: We can characterize a coordinator's philosophy along multiple dimensions:

Aggression level: Blitz frequency and use of zero or single-high safety looks
Coverage preference: Man vs. zone coverage tendencies
Risk tolerance: Willingness to give up explosive plays to create turnovers/negative plays
Adaptability: Consistency of approach vs. game-plan variation
Complexity: Scheme diversity vs. doing a few things exceptionally well

Why philosophy matters: Understanding a coordinator's philosophy helps predict their behavior in novel situations. If a coordinator has historically blitzed 40% of the time on third-and-long, they will probably carry a similar tendency into games against new opponents, even though weekly game plans shift. If they prefer Cover 3 in the red zone, that preference gives offensive coordinators a starting point for game planning.
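To turn a claim like "40% on third-and-long" into a number, filter to the situation and average the blitz flag by team. A minimal sketch on toy rows, assuming the same 0/1 `blitz` column used above and defining third-and-long as 7+ yards to go (matching this chapter's third-down buckets); the teams and plays are hypothetical:

```python
import pandas as pd

# Toy play-level data (illustrative, not real); blitz is 1/0
plays = pd.DataFrame({
    "defteam": ["AAA"] * 6 + ["BBB"] * 4,
    "down":    [3, 3, 3, 1, 3, 2, 3, 3, 1, 3],
    "ydstogo": [9, 12, 7, 10, 2, 8, 11, 8, 10, 4],
    "blitz":   [1, 1, 0, 0, 1, 1, 0, 1, 1, 0],
})

# Situational filter, then blitz frequency per defense
third_and_long = plays.query("down == 3 and ydstogo >= 7")
rates = third_and_long.groupby("defteam")["blitz"].agg(plays="count", blitz_rate="mean")
print(rates)
```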

Identifying Coordinator Tendencies Through Pattern Analysis

We can identify coordinator philosophical fingerprints by analyzing patterns across multiple dimensions simultaneously. By combining blitz frequency with coverage preference, we create a two-dimensional philosophy space that reveals distinct coaching archetypes.

The four defensive archetypes:

Aggressive Man: High blitz rate (>35%) + high man coverage rate (>50%)
- Philosophy: Attack the quarterback, trust DBs in man coverage
- Risk: Vulnerable to quick throws and crossing routes
- Historical examples: Rex Ryan, Vic Fangio (early career)
Aggressive Zone: High blitz rate (>35%) + low man coverage rate (<50%)
- Philosophy: Pressure with blitz but protect with zone coverage
- Risk: Can be predictable; zone coverage susceptible to option routes
- Historical examples: Dick LeBeau, Gregg Williams
Conservative Man: Low blitz rate (<35%) + high man coverage rate (>50%)
- Philosophy: Trust front four pressure, play man coverage behind it
- Risk: If DL can't pressure, coverage eventually breaks down
- Historical examples: Bill Belichick (situational), Brandon Staley
Conservative Zone: Low blitz rate (<35%) + low man coverage rate (<50%)
- Philosophy: Bend-but-don't-break, prevent explosive plays
- Risk: Allows high completion rates and time-consuming drives
- Historical examples: Pete Carroll, Dan Quinn (recent)
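Because the 35% and 50% cutoffs are hard lines, a team sitting just above or below one can switch archetypes on a handful of plays. This sketch reproduces the four-way classification with `numpy.select` and flags teams within a margin of either cutoff; the 3-percentage-point `MARGIN` and the team rows are hypothetical:

```python
import numpy as np
import pandas as pd

BLITZ_CUT, MAN_CUT = 0.35, 0.50  # the chapter's archetype thresholds
MARGIN = 0.03                    # hypothetical "borderline" band (3 percentage points)

def classify_archetypes(df: pd.DataFrame) -> pd.DataFrame:
    aggressive = df["blitz_rate"] > BLITZ_CUT
    man = df["man_coverage_rate"] > MAN_CUT
    out = df.copy()
    out["philosophy"] = np.select(
        [aggressive & man, aggressive & ~man, ~aggressive & man],
        ["Aggressive Man", "Aggressive Zone", "Conservative Man"],
        default="Conservative Zone",
    )
    # Flag teams within MARGIN of either cutoff: a few plays could flip their label
    out["borderline"] = (
        (df["blitz_rate"] - BLITZ_CUT).abs().lt(MARGIN)
        | (df["man_coverage_rate"] - MAN_CUT).abs().lt(MARGIN)
    )
    return out

teams = pd.DataFrame({
    "defteam": ["AAA", "BBB", "CCC", "DDD"],  # hypothetical teams
    "blitz_rate": [0.42, 0.36, 0.22, 0.30],
    "man_coverage_rate": [0.61, 0.30, 0.55, 0.25],
})
result = classify_archetypes(teams)
print(result)
```

In the toy data, BBB is labeled Aggressive Zone but blitzes only one point above the cutoff, so its label deserves less weight than AAA's.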

Philosophy vs. Personnel Constraints

It's important to distinguish between true philosophical preferences and personnel-driven constraints. A coordinator might prefer aggressive man coverage but lack the cornerback talent to execute it, forcing a more zone-heavy, conservative approach. Ideally, we'd analyze a coordinator across multiple teams/seasons to separate philosophy from personnel limitations.
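One way to separate philosophy from personnel is to follow coordinators rather than teams. Play-by-play data does not record the defensive coordinator, so this sketch assumes a hand-built `coordinators` table keyed by season and team; all names and counts are hypothetical. Blitz rate is pooled (total blitzes over total dropbacks) rather than averaged across team-seasons, so larger samples carry more weight:

```python
import pandas as pd

# Hypothetical team-season blitz counts (not real data)
team_seasons = pd.DataFrame({
    "season":    [2021, 2022, 2023, 2022, 2023],
    "defteam":   ["AAA", "AAA", "BBB", "CCC", "CCC"],
    "dropbacks": [600, 580, 610, 590, 620],
    "blitzes":   [250, 230, 240, 120, 130],
})
# Hypothetical coordinator assignments
coordinators = pd.DataFrame({
    "season":      [2021, 2022, 2023, 2022, 2023],
    "defteam":     ["AAA", "AAA", "BBB", "CCC", "CCC"],
    "coordinator": ["Coach X", "Coach X", "Coach X", "Coach Y", "Coach Y"],
})

by_coordinator = (
    team_seasons.merge(coordinators, on=["season", "defteam"], validate="one_to_one")
    .groupby("coordinator")
    .agg(teams=("defteam", "nunique"), seasons=("season", "nunique"),
         dropbacks=("dropbacks", "sum"), blitzes=("blitzes", "sum"))
    .assign(blitz_rate=lambda d: d["blitzes"] / d["dropbacks"])  # pooled rate
)
print(by_coordinator)
```

A coordinator whose blitz rate stays similar across different rosters is more likely showing a philosophical preference than a personnel constraint.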

Offensive Game Planning Based on Philosophy

Once you've identified a coordinator's philosophy archetype:
- **Against Aggressive Man**: Use pick routes, bunch formations, and the quick passing game to exploit man coverage
- **Against Aggressive Zone**: Attack seams and soft spots in the zone, and use route adjustments
- **Against Conservative Man**: Run longer-developing routes; trust receivers to win
- **Against Conservative Zone**: Be patient, take underneath throws, and avoid turnovers


#| label: coordinator-fingerprint-r
#| message: false
#| warning: false

# Build coordinator fingerprint
coordinator_fingerprint <- pbp %>%
  filter(
    !is.na(coverage_scheme),
    !is.na(blitz),
    play_type == "pass",
    down <= 3
  ) %>%
  group_by(defteam) %>%
  summarise(
    # Coverage preferences
    cover1_rate = mean(coverage_scheme == "Cover 1", na.rm = TRUE),
    cover2_rate = mean(coverage_scheme == "Cover 2", na.rm = TRUE),
    cover3_rate = mean(coverage_scheme == "Cover 3", na.rm = TRUE),
    man_coverage_rate = mean(coverage_scheme %in% c("Cover 1", "2 Man", "Cover 0"), na.rm = TRUE),

    # Pressure preferences
    blitz_rate = mean(blitz == 1),

    # Aggressiveness
    cover0_rate = mean(coverage_scheme == "Cover 0", na.rm = TRUE),

    # Results
    avg_epa = mean(epa, na.rm = TRUE),
    success_rate = mean(epa > 0, na.rm = TRUE),

    total_plays = n(),
    .groups = "drop"
  ) %>%
  mutate(
    philosophy = case_when(
      blitz_rate > 0.35 & man_coverage_rate > 0.50 ~ "Aggressive Man",
      blitz_rate > 0.35 & man_coverage_rate <= 0.50 ~ "Aggressive Zone",
      blitz_rate <= 0.35 & man_coverage_rate > 0.50 ~ "Conservative Man",
      TRUE ~ "Conservative Zone"
    )
  )

# Display results
coordinator_fingerprint %>%
  arrange(desc(blitz_rate)) %>%
  select(defteam, philosophy, blitz_rate, man_coverage_rate,
         cover3_rate, avg_epa, success_rate) %>%
  head(16) %>%
  gt() %>%
  cols_label(
    defteam = "Team",
    philosophy = "Philosophy",
    blitz_rate = "Blitz %",
    man_coverage_rate = "Man %",
    cover3_rate = "Cover 3 %",
    avg_epa = "EPA",
    success_rate = "Off Success %"
  ) %>%
  fmt_percent(
    columns = c(blitz_rate, man_coverage_rate, cover3_rate, success_rate),
    decimals = 1
  ) %>%
  fmt_number(
    columns = avg_epa,
    decimals = 3
  ) %>%
  data_color(
    columns = blitz_rate,
    palette = "Reds"
  ) %>%
  data_color(
    columns = avg_epa,
    palette = "RdYlGn",
    reverse = TRUE
  ) %>%
  tab_header(
    title = "Defensive Coordinator Philosophy Fingerprints",
    subtitle = "2023 Season - Pass plays on downs 1-3"
  )

#| label: coordinator-fingerprint-py
#| message: false
#| warning: false

# Build coordinator fingerprint
fingerprint_data = (pbp
    .dropna(subset=['coverage_scheme', 'blitz'])
    .query("play_type == 'pass' and down <= 3")
)

# Calculate team tendencies
coordinator_fingerprint = (fingerprint_data
    .groupby('defteam')
    .agg(
        cover1_rate=('coverage_scheme', lambda x: (x == 'Cover 1').mean()),
        cover2_rate=('coverage_scheme', lambda x: (x == 'Cover 2').mean()),
        cover3_rate=('coverage_scheme', lambda x: (x == 'Cover 3').mean()),
        man_coverage_rate=('coverage_scheme', lambda x: x.isin(['Cover 1', '2 Man', 'Cover 0']).mean()),
        blitz_rate=('blitz', 'mean'),
        cover0_rate=('coverage_scheme', lambda x: (x == 'Cover 0').mean()),
        avg_epa=('epa', 'mean'),
        success_rate=('epa', lambda x: (x > 0).mean()),
        total_plays=('epa', 'count')
    )
    .reset_index()
)

# Classify philosophy
def classify_philosophy(row):
    if row['blitz_rate'] > 0.35 and row['man_coverage_rate'] > 0.50:
        return "Aggressive Man"
    elif row['blitz_rate'] > 0.35:
        return "Aggressive Zone"
    elif row['man_coverage_rate'] > 0.50:
        return "Conservative Man"
    else:
        return "Conservative Zone"

coordinator_fingerprint['philosophy'] = coordinator_fingerprint.apply(classify_philosophy, axis=1)

# Display results
print("\nDefensive Coordinator Philosophy Fingerprints - 2023")
print("=" * 100)
print(coordinator_fingerprint
      .sort_values('blitz_rate', ascending=False)
      [['defteam', 'philosophy', 'blitz_rate', 'man_coverage_rate',
        'cover3_rate', 'avg_epa', 'success_rate']]
      .head(16)
      .to_string(index=False))

Philosophy Clustering Visualization


#| label: fig-philosophy-cluster-r
#| fig-cap: "Defensive coordinator philosophy clustering"
#| fig-width: 11
#| fig-height: 8
#| message: false
#| warning: false

# Create philosophy scatter plot
ggplot(coordinator_fingerprint,
       aes(x = man_coverage_rate, y = blitz_rate, color = philosophy)) +
  geom_point(size = 4, alpha = 0.7) +
  geom_nfl_logos(aes(team_abbr = defteam), width = 0.04, alpha = 0.8) +
  geom_vline(xintercept = 0.50, linetype = "dashed", alpha = 0.3) +
  geom_hline(yintercept = 0.35, linetype = "dashed", alpha = 0.3) +
  # High blitz rate (top half) = aggressive; high man rate (right half) = man
  annotate("text", x = 0.25, y = 0.48, label = "Aggressive\nZone",
           fontface = "italic", color = "gray40", size = 3.5) +
  annotate("text", x = 0.75, y = 0.48, label = "Aggressive\nMan",
           fontface = "italic", color = "gray40", size = 3.5) +
  annotate("text", x = 0.25, y = 0.22, label = "Conservative\nZone",
           fontface = "italic", color = "gray40", size = 3.5) +
  annotate("text", x = 0.75, y = 0.22, label = "Conservative\nMan",
           fontface = "italic", color = "gray40", size = 3.5) +
  scale_x_continuous(labels = scales::percent_format(), limits = c(0.15, 0.85)) +
  scale_y_continuous(labels = scales::percent_format(), limits = c(0.15, 0.55)) +
  scale_color_manual(
    values = c(
      "Aggressive Man" = "#d62728",
      "Aggressive Zone" = "#ff7f0e",
      "Conservative Man" = "#2ca02c",
      "Conservative Zone" = "#1f77b4"
    )
  ) +
  labs(
    title = "Defensive Coordinator Philosophy Clustering",
    subtitle = "Man Coverage Rate vs. Blitz Rate - 2023 Season",
    x = "Man Coverage Rate",
    y = "Blitz Rate",
    color = "Philosophy",
    caption = "Data: nflfastR | Pass plays on downs 1-3"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "bottom",
    panel.grid.minor = element_blank()
  )

#| label: fig-philosophy-cluster-py
#| fig-cap: "Defensive coordinator philosophy clustering - Python"
#| fig-width: 11
#| fig-height: 8
#| message: false
#| warning: false

# Create scatter plot
fig, ax = plt.subplots(figsize=(11, 8))

# Define colors for each philosophy
philosophy_colors = {
    "Aggressive Man": "#d62728",
    "Aggressive Zone": "#ff7f0e",
    "Conservative Man": "#2ca02c",
    "Conservative Zone": "#1f77b4"
}

# Plot each philosophy
for philosophy, color in philosophy_colors.items():
    data = coordinator_fingerprint[coordinator_fingerprint['philosophy'] == philosophy]
    ax.scatter(data['man_coverage_rate'], data['blitz_rate'],
              c=color, label=philosophy, s=100, alpha=0.7)

# Add reference lines
ax.axvline(x=0.50, linestyle='--', alpha=0.3, color='gray')
ax.axhline(y=0.35, linestyle='--', alpha=0.3, color='gray')

# Add quadrant labels: high blitz rate (top half) = aggressive, high man rate (right) = man
ax.text(0.25, 0.48, 'Aggressive\nZone', ha='center', va='center',
        fontsize=10, style='italic', color='gray')
ax.text(0.75, 0.48, 'Aggressive\nMan', ha='center', va='center',
        fontsize=10, style='italic', color='gray')
ax.text(0.25, 0.22, 'Conservative\nZone', ha='center', va='center',
        fontsize=10, style='italic', color='gray')
ax.text(0.75, 0.22, 'Conservative\nMan', ha='center', va='center',
        fontsize=10, style='italic', color='gray')

ax.set_xlabel('Man Coverage Rate', fontsize=12)
ax.set_ylabel('Blitz Rate', fontsize=12)
ax.set_title('Defensive Coordinator Philosophy Clustering\nMan Coverage Rate vs. Blitz Rate - 2023 Season',
            fontsize=14, fontweight='bold')
ax.legend(title='Philosophy', loc='upper right')
ax.set_xlim(0.15, 0.85)
ax.set_ylim(0.15, 0.55)

# Format as percentages
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'{x:.0%}'))
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, p: f'{y:.0%}'))

plt.tight_layout()
plt.show()

Personnel Package Correlations

Matching Personnel to Coverage

Defensive personnel groupings shape which coverages a defense can call. The code below cross-tabulates each personnel grouping with the coverages it was paired with:


#| label: personnel-coverage-r
#| message: false
#| warning: false

# Analyze personnel-coverage relationships
personnel_coverage <- pbp %>%
  filter(
    !is.na(defense_personnel),
    !is.na(coverage_scheme)
  ) %>%
  separate(defense_personnel,
           into = c("dl", "lb", "db"),
           sep = ",",
           remove = FALSE,
           fill = "right") %>%
  mutate(
    dl = as.numeric(str_extract(dl, "\\d+")),
    lb = as.numeric(str_extract(lb, "\\d+")),
    db = as.numeric(str_extract(db, "\\d+"))
  ) %>%
  filter(!is.na(db)) %>%
  group_by(defense_personnel, db, coverage_scheme) %>%
  summarise(
    plays = n(),
    avg_epa = mean(epa, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  group_by(defense_personnel) %>%
  mutate(
    total_plays = sum(plays),
    coverage_pct = plays / total_plays
  ) %>%
  ungroup() %>%
  filter(
    total_plays >= 100,  # Minimum usage
    coverage_scheme %in% c("Cover 1", "Cover 2", "Cover 3",
                           "Cover 4", "Cover 6", "2 Man")
  )

# Get top coverage for each personnel
top_personnel_coverage <- personnel_coverage %>%
  group_by(defense_personnel) %>%
  slice_max(coverage_pct, n = 2) %>%
  ungroup() %>%
  arrange(defense_personnel, desc(coverage_pct))

# Display
top_personnel_coverage %>%
  head(20) %>%
  gt() %>%
  cols_label(
    defense_personnel = "Personnel",
    db = "DBs",
    coverage_scheme = "Coverage",
    plays = "Plays",
    coverage_pct = "Coverage %",
    avg_epa = "Avg EPA"
  ) %>%
  fmt_percent(
    columns = coverage_pct,
    decimals = 1
  ) %>%
  fmt_number(
    columns = avg_epa,
    decimals = 3
  ) %>%
  fmt_number(
    columns = plays,
    decimals = 0,
    use_seps = TRUE
  ) %>%
  tab_header(
    title = "Personnel-Coverage Correlations",
    subtitle = "Top coverages by defensive personnel grouping"
  )

#| label: personnel-coverage-py
#| message: false
#| warning: false

# Analyze personnel-coverage relationships
pers_cov_data = (pbp
    .dropna(subset=['defense_personnel', 'coverage_scheme'])
    .copy()
)

# Extract DB count
pers_cov_data['db'] = pers_cov_data['defense_personnel'].str.extract(r'(\d+)\s*DB')[0].astype(float)
pers_cov_data = pers_cov_data.dropna(subset=['db'])

# Calculate correlations
personnel_coverage = (pers_cov_data
    .groupby(['defense_personnel', 'db', 'coverage_scheme'])
    .agg(
        plays=('epa', 'count'),
        avg_epa=('epa', 'mean')
    )
    .reset_index()
)

personnel_coverage['total_plays'] = (
    personnel_coverage.groupby('defense_personnel')['plays']
    .transform('sum')
)

personnel_coverage['coverage_pct'] = (
    personnel_coverage['plays'] / personnel_coverage['total_plays']
)

# Filter
primary_schemes = ['Cover 1', 'Cover 2', 'Cover 3', 'Cover 4', 'Cover 6', '2 Man']
personnel_coverage = (personnel_coverage
    .query("total_plays >= 100 and coverage_scheme in @primary_schemes")
)

# Top coverage per personnel
top_personnel_coverage = (personnel_coverage
    .sort_values(['defense_personnel', 'coverage_pct'], ascending=[True, False])
    .groupby('defense_personnel')
    .head(2)
)

print("\nPersonnel-Coverage Correlations")
print("=" * 80)
print(top_personnel_coverage.head(20).to_string(index=False))
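The DB count drives the conventional package names (base, nickel, dime). A sketch that parses all three position counts, assuming `defense_personnel` strings look like `"4 DL, 2 LB, 5 DB"` as the DB-count regex above expects; the sample strings are made up:

```python
import pandas as pd

# Assumed string format, e.g. "4 DL, 2 LB, 5 DB"
personnel = pd.Series(["4 DL, 2 LB, 5 DB", "3 DL, 4 LB, 4 DB",
                       "2 DL, 3 LB, 6 DB", "4 DL, 3 LB, 4 DB"])

# One integer column per position group, pulled from "<count> <position>"
counts = pd.DataFrame({
    pos.lower(): personnel.str.extract(rf"(\d+)\s*{pos}")[0].astype(int)
    for pos in ["DL", "LB", "DB"]
})

# Conventional package names keyed on the number of defensive backs
package_names = {4: "Base", 5: "Nickel", 6: "Dime"}
counts["package"] = counts["db"].map(package_names).fillna("Other")
print(counts.assign(personnel=personnel))
```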

Scheme Fit Evaluation

The Critical Importance of Personnel-Scheme Alignment

One of the most consequential decisions in building a defensive unit is ensuring alignment between the scheme you want to run and the personnel you have available. A perfectly designed scheme executed by ill-suited players produces poor results, while a simpler scheme run by perfectly suited players can dominate. This concept—scheme fit—represents the intersection of coaching philosophy and roster construction.

What is scheme fit: Scheme fit measures how well a team's defensive personnel matches the demands of their scheme. A 3-4 defense requires a massive nose tackle who can command double-teams, athletic outside linebackers who can rush and cover, and disciplined inside linebackers. A 4-3 defense needs four quality defensive linemen, fast linebackers, and cornerbacks who can play zone or man. Man-heavy schemes need elite cornerback talent. Zone-heavy schemes need smart, instinctive defenders who can read routes.

Why scheme fit matters: Poor scheme fit manifests in multiple ways:
- Defensive players unable to execute their assignments
- Defensive coordinator unable to call preferred schemes
- Forced into predictable play-calling based on personnel limitations
- Individual players' skills underutilized
- Potentially higher injury risk for players asked to fill ill-suited roles

Historical examples of scheme fit issues:
- Rob Ryan's Dallas Cowboys (2011-2012): Attempted to run a complex 3-4 scheme without an elite nose tackle, leading to persistent run defense issues
- Brandon Staley's Chargers (2021-2023): Played extensive two-high safety coverage but lacked the cornerback talent to consistently play man coverage underneath
- Vic Fangio's Bears (2015-2018): Built a roster well suited to his scheme (Khalil Mack, Eddie Jackson, Kyle Fuller), and by 2018 the Bears fielded one of the league's top defenses

Evaluating Personnel-Scheme Alignment

We can assess scheme fit analytically by examining:

Coverage diversity: Can the defense run multiple coverages effectively, or are they limited?
Weighted performance: How well do their coverages perform on average, weighting each coverage by how often it is called?
Usage concentration: Do they lean heavily on one coverage (suggesting limitations)?
Situational effectiveness: Do they have different coverages for different situations?

The following analysis measures two components of scheme fit: a usage-weighted EPA and a coverage diversity index (1 minus the usage share of the most-called coverage):


#| label: scheme-fit-r
#| message: false
#| warning: false

# Evaluate scheme fit
scheme_fit <- pbp %>%
  filter(
    !is.na(coverage_scheme),
    !is.na(defteam),
    play_type == "pass"
  ) %>%
  group_by(defteam, coverage_scheme) %>%
  summarise(
    plays = n(),
    avg_epa = mean(epa, na.rm = TRUE),
    success_rate_allowed = mean(epa > 0, na.rm = TRUE),
    completion_pct = mean(complete_pass == 1, na.rm = TRUE),
    yards_per_att = mean(yards_gained, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  group_by(defteam) %>%
  mutate(
    usage_pct = plays / sum(plays)
  ) %>%
  ungroup()

# Calculate weighted performance
team_scheme_fit <- scheme_fit %>%
  filter(coverage_scheme %in% c("Cover 1", "Cover 2", "Cover 3",
                                "Cover 4", "Cover 6", "2 Man")) %>%
  group_by(defteam) %>%
  summarise(
    primary_coverage = coverage_scheme[which.max(usage_pct)],
    primary_usage = max(usage_pct),
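    # Renormalize: usage_pct was computed over all coverages, so the
    # primary-scheme shares alone need not sum to 1
    weighted_epa = weighted.mean(avg_epa, usage_pct, na.rm = TRUE),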
    coverage_diversity = 1 - max(usage_pct),  # Higher = more diverse
    .groups = "drop"
  ) %>%
  arrange(weighted_epa)

# Display top defensive schemes
team_scheme_fit %>%
  head(16) %>%
  gt() %>%
  cols_label(
    defteam = "Team",
    primary_coverage = "Primary Coverage",
    primary_usage = "Usage %",
    weighted_epa = "Weighted EPA",
    coverage_diversity = "Diversity Index"
  ) %>%
  fmt_percent(
    columns = c(primary_usage, coverage_diversity),
    decimals = 1
  ) %>%
  fmt_number(
    columns = weighted_epa,
    decimals = 3
  ) %>%
  data_color(
    columns = weighted_epa,
    palette = "RdYlGn",
    reverse = TRUE
  ) %>%
  data_color(
    columns = coverage_diversity,
    palette = "Blues"
  ) %>%
  tab_header(
    title = "Defensive Scheme Fit Evaluation",
    subtitle = "Weighted EPA and coverage diversity - 2023 Season"
  ) %>%
  tab_footnote(
    footnote = "Lower weighted EPA = better defense",
    locations = cells_column_labels(columns = weighted_epa)
  )

#| label: scheme-fit-py
#| message: false
#| warning: false

# Evaluate scheme fit
scheme_fit_data = (pbp
    .dropna(subset=['coverage_scheme', 'defteam'])
    .query("play_type == 'pass'")
)

# Calculate coverage performance
scheme_fit = (scheme_fit_data
    .groupby(['defteam', 'coverage_scheme'])
    .agg(
        plays=('epa', 'count'),
        avg_epa=('epa', 'mean'),
        success_rate_allowed=('epa', lambda x: (x > 0).mean()),
        completion_pct=('complete_pass', lambda x: x.mean()),
        yards_per_att=('yards_gained', 'mean')
    )
    .reset_index()
)

scheme_fit['usage_pct'] = (
    scheme_fit.groupby('defteam')['plays']
    .transform(lambda x: x / x.sum())
)

# Calculate weighted metrics
primary_schemes = ['Cover 1', 'Cover 2', 'Cover 3', 'Cover 4', 'Cover 6', '2 Man']

team_scheme_fit = []
for team in scheme_fit['defteam'].unique():
    team_data = scheme_fit.query("defteam == @team and coverage_scheme in @primary_schemes")

    if len(team_data) > 0:
        primary_idx = team_data['usage_pct'].idxmax()

        team_scheme_fit.append({
            'defteam': team,
            'primary_coverage': team_data.loc[primary_idx, 'coverage_scheme'],
            'primary_usage': team_data.loc[primary_idx, 'usage_pct'],
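            # Renormalize weights to the primary schemes (shares need not sum to 1)
            'weighted_epa': (team_data['avg_epa'] * team_data['usage_pct']).sum() / team_data['usage_pct'].sum(),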
            'coverage_diversity': 1 - team_data['usage_pct'].max()
        })

team_scheme_fit = pd.DataFrame(team_scheme_fit).sort_values('weighted_epa')

print("\nDefensive Scheme Fit Evaluation - 2023")
print("=" * 85)
print(team_scheme_fit.head(16).to_string(index=False))
print("\nNote: Lower weighted EPA = better defense")

Interpreting Scheme Fit Results

When analyzing the scheme fit results, several patterns typically emerge:

High-performing, diverse defenses: Teams with good weighted EPA (negative, favoring defense) and high diversity index have excellent scheme fit. They can run multiple coverages effectively, making them unpredictable and difficult to game plan against. These teams typically have both quality coaching and quality personnel.

High-performing, predictable defenses: Teams with good weighted EPA but low diversity index have found a coverage that works with their personnel and stick with it. While this creates some predictability, executing one coverage at an elite level can still produce excellent results. The Tampa-2 Buccaneers under Monte Kiffin exemplified this approach.

Struggling, diverse defenses: Teams with poor weighted EPA despite high diversity index often lack scheme fit—they're trying many coverages but executing none well. This suggests either poor personnel across the board or players who don't fit the scheme's demands.

Struggling, predictable defenses: Teams with poor weighted EPA and low diversity index have the worst of both worlds—they can only run one or two coverages, and they don't even execute those well. This indicates severe scheme fit issues and typically leads to coordinator changes.
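The four quadrants above can be sketched in a few lines of plain Python. This is a minimal illustration and not the chapter's pipeline. `classify_scheme_fit` is a hypothetical helper that takes weighted EPA and coverage diversity values, such as those produced by the scheme-fit code above. It assumes league medians as the cutoffs between "good/poor" EPA and "high/low" diversity; other benchmarks would work just as well.

```python
from statistics import median


def classify_scheme_fit(teams):
    """Place each defense in one of the four scheme-fit quadrants.

    teams: dict mapping team -> (weighted_epa, coverage_diversity).
    Cutoffs are the league medians of each metric (an assumption; any
    benchmark could be substituted). Lower weighted EPA = better defense.
    """
    epa_cut = median(epa for epa, _ in teams.values())
    div_cut = median(div for _, div in teams.values())
    labels = {}
    for team, (epa, div) in teams.items():
        performance = "High-performing" if epa <= epa_cut else "Struggling"
        style = "diverse" if div > div_cut else "predictable"
        labels[team] = f"{performance}, {style}"
    return labels


# Hypothetical inputs, not real 2023 results
example = {
    "Team A": (-0.10, 0.60),
    "Team B": (-0.08, 0.30),
    "Team C": (0.05, 0.65),
    "Team D": (0.07, 0.25),
}
for team, label in classify_scheme_fit(example).items():
    print(team, "->", label)
```

Median cutoffs guarantee a roughly even split across quadrants. That suits relative scouting, but it will label some teams "struggling" even in a season where every defense is solid.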

The Diversity Trap

High coverage diversity isn't always desirable. Sometimes it indicates a coordinator "searching" for something that works, unable to establish an identity. The best defenses often show moderate diversity—they have a clear identity and primary coverage but can mix in alternatives to keep offenses honest.

Defensive Scheme Effectiveness Metrics

Measuring Scheme Performance Beyond Basic Stats

Traditional defensive statistics—yards allowed, points allowed, turnovers—tell us what happened but not why it happened. To truly understand defensive scheme effectiveness, we need metrics that account for context and isolate scheme performance from individual player talent.

Why we need better metrics: A defense might allow few yards because their offense controls the clock, limiting opponent opportunities. Another might force turnovers due to elite individual talent rather than scheme-created opportunities. Scheme effectiveness metrics attempt to separate scheme quality from situational factors and player talent.

Key Scheme Effectiveness Metrics

1. Coverage EPA (Expected Points Added)

Coverage EPA measures the average EPA allowed on plays where the defense uses a specific coverage scheme. Negative values favor the defense (on average, those plays reduced the offense's expected points).

What it measures: How effective a coverage scheme is at limiting offensive production
Advantages: Context-aware, accounts for down/distance/field position
Limitations: Doesn't fully separate scheme from personnel talent
Benchmark: League average coverage EPA is approximately 0; elite coverages allow -0.10 to -0.15 EPA

2. Success Rate Allowed

The percentage of plays where the offense achieves a "successful" play (EPA > 0) against a specific coverage or scheme.

What it measures: Consistency of scheme effectiveness
Advantages: Easy to interpret, stable metric
Limitations: Treats all successful plays equally (5-yard gain vs. 50-yard gain)
Benchmark: League average ~47-48% success rate allowed; elite schemes hold offenses below 43%

3. Explosive Play Rate Allowed

The percentage of plays resulting in explosive gains (EPA > 1.0, roughly 15+ yard gains) against a coverage scheme.

What it measures: Scheme's vulnerability to big plays
Advantages: Identifies high-risk vs. safe schemes
Limitations: Small sample sizes can create noise
Benchmark: League average ~6-8% explosive rate; conservative schemes allow ~4-5%, aggressive schemes ~9-11%
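The first three metrics reduce to simple aggregations over per-play EPA. Here is a minimal sketch in plain Python, assuming each pass play arrives as a `(coverage_scheme, epa)` pair. `coverage_effectiveness` is a hypothetical helper. The thresholds are the ones defined above: EPA > 0 counts as a success and EPA > 1.0 counts as explosive.

```python
from collections import defaultdict


def coverage_effectiveness(plays, explosive_epa=1.0):
    """Coverage EPA, success rate allowed, and explosive rate allowed.

    plays: iterable of (coverage_scheme, epa) pairs for pass plays.
    Plays with missing EPA (None) are skipped.
    """
    by_coverage = defaultdict(list)
    for coverage, epa in plays:
        if epa is not None:
            by_coverage[coverage].append(epa)

    summary = {}
    for coverage, epas in by_coverage.items():
        n = len(epas)
        summary[coverage] = {
            "plays": n,
            "coverage_epa": sum(epas) / n,                          # metric 1
            "success_rate_allowed": sum(e > 0 for e in epas) / n,   # metric 2
            "explosive_rate_allowed": sum(e > explosive_epa for e in epas) / n,  # metric 3
        }
    return summary


# Toy data for illustration only
plays = [("Cover 3", 0.5), ("Cover 3", -0.5), ("Cover 3", 1.5),
         ("Cover 3", -1.0), ("Cover 1", 2.0), ("Cover 1", -0.2)]
print(coverage_effectiveness(plays)["Cover 3"])
```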

4. Pressure-Adjusted Coverage Metrics

Coverage performance split by whether the defense generated pressure (QB hit or hurry) on the play.

What it measures: How much coverage success depends on pressure
Advantages: Reveals whether coverage can succeed without pass rush help
Limitations: Pressure data not always available in play-by-play data
Insight: Big gaps between pressure/no-pressure performance suggest coverage vulnerability
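A sketch of the pressure split follows. It assumes each pass play carries a boolean pressure flag, and both that flag and the helper name `pressure_split` are hypothetical. As noted above, many play-by-play sources do not record pressure reliably, so the flag would usually come from separate charting data.

```python
def pressure_split(plays):
    """Average EPA allowed with and without pressure, plus the gap.

    plays: iterable of (pressured, epa) pairs, where `pressured` is a
    hypothetical boolean flag (QB hit or hurry) from a charting source.
    """
    plays = list(plays)  # allow generators, since we scan twice
    with_pressure = [epa for pressured, epa in plays if pressured]
    no_pressure = [epa for pressured, epa in plays if not pressured]
    if not with_pressure or not no_pressure:
        raise ValueError("need plays both with and without pressure")

    epa_pressure = sum(with_pressure) / len(with_pressure)
    epa_clean = sum(no_pressure) / len(no_pressure)
    return {
        "epa_with_pressure": epa_pressure,
        "epa_without_pressure": epa_clean,
        # Large positive gap: coverage holds up mainly when the rush gets home
        "pressure_gap": epa_clean - epa_pressure,
    }


print(pressure_split([(True, -0.6), (True, -0.2), (False, 0.3), (False, 0.1)]))
```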

5. Personnel-Adjusted Performance

How well a coverage performs against different offensive personnel groupings (11, 12, 21, etc.).

What it measures: Coverage versatility and matchup dependencies
Advantages: Identifies specific exploitable matchups
Limitations: Requires sufficient sample size for each personnel group
Insight: Wide variance suggests scheme has specific strengths/weaknesses
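Personnel-adjusted performance is the same EPA calculation grouped one level deeper. The sketch below uses plain Python with a hypothetical `personnel_spread` helper. For one coverage, it computes EPA allowed by offensive personnel group and drops groups with thin samples. It then reports the gap between the best and worst groupings as a rough measure of the "wide variance" described above.

```python
from collections import defaultdict


def personnel_spread(plays, min_plays=25):
    """EPA allowed by offensive personnel group for a single coverage.

    plays: iterable of (personnel_group, epa) pairs, e.g. ("11", 0.2).
    Groups below min_plays are dropped to respect the sample-size
    limitation. Returns (per-group EPA, best-to-worst spread).
    """
    groups = defaultdict(list)
    for group, epa in plays:
        groups[group].append(epa)

    kept = {g: sum(v) / len(v) for g, v in groups.items() if len(v) >= min_plays}
    if not kept:
        return {}, None
    return kept, max(kept.values()) - min(kept.values())


# min_plays lowered only so the toy example has enough data
epa_by_group, spread = personnel_spread(
    [("11", 0.1), ("11", 0.3), ("12", -0.2), ("12", 0.0), ("21", 0.9)],
    min_plays=2,
)
print(epa_by_group, spread)
```

The 21-personnel snap is excluded here despite its high EPA, which illustrates how the minimum-sample rule trades sensitivity for stability.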

The Context-Dependence of Scheme Effectiveness

No defensive scheme is universally "good" or "bad." Scheme effectiveness depends on:
- Quality of personnel executing it
- Opponent offensive tendencies and personnel
- Game situation (score, time, field position)
- Supporting scheme elements (blitz, front, formation)
- Individual player execution on that specific play

Analytics can identify tendencies and average effectiveness, but football remains a game of situational execution.

Scheme Effectiveness vs. Player Talent

A persistent challenge in defensive analytics is separating scheme effectiveness from player talent. Does a coverage allow low EPA because it's well-designed, or because elite cornerbacks are executing it? Several statistical approaches can help separate these effects, though perfect separation remains elusive:
- **Multi-level modeling**: Modeling both player and scheme effects simultaneously
- **Year-over-year comparison**: Following coordinators across team changes
- **Replacement-level analysis**: Comparing performance when key players are injured

Exploiting Defensive Weaknesses

Identifying Exploitable Tendencies

The ultimate purpose of defensive scheme analysis from an offensive perspective is identifying exploitable weaknesses—specific situations, coverages, or matchups where the defense consistently struggles. This is where analytics-driven scouting produces tangible competitive advantages.

What makes a tendency exploitable: Not every defensive tendency represents a weakness. A team might play Cover 3 on 70% of first downs and execute it perfectly. That's a tendency, but not necessarily exploitable. Exploitable tendencies have three characteristics:

Predictability: The defense does something consistently in specific situations
Vulnerability: That something has demonstrable weaknesses (high EPA allowed)
Repeatability: The pattern persists across games and opponents
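The three criteria translate naturally into filters. This minimal sketch assumes plays have already been tagged with four fields: a game ID, a situation label (for example "3rd & long"), the charted coverage, and EPA. The helper name `exploitable_tendencies` and all three thresholds are hypothetical choices to tune, not established cutoffs. Repeatability is approximated as the number of separate games in which the coverage allowed positive average EPA in that situation.

```python
from collections import defaultdict


def exploitable_tendencies(plays, min_share=0.40, min_epa=0.10, min_games=3):
    """Flag (situation, coverage) pairs that meet all three criteria.

    plays: list of dicts with keys 'game_id', 'situation', 'coverage', 'epa'.
    Thresholds are illustrative assumptions, not league standards:
      predictability - coverage share within the situation >= min_share
      vulnerability  - average EPA allowed >= min_epa
      repeatability  - positive average EPA allowed in >= min_games games
    """
    situation_plays = defaultdict(int)
    pair_epa = defaultdict(list)
    pair_game_epa = defaultdict(lambda: defaultdict(list))
    for p in plays:
        key = (p["situation"], p["coverage"])
        situation_plays[p["situation"]] += 1
        pair_epa[key].append(p["epa"])
        pair_game_epa[key][p["game_id"]].append(p["epa"])

    flagged = []
    for (situation, coverage), epas in pair_epa.items():
        share = len(epas) / situation_plays[situation]
        avg_epa = sum(epas) / len(epas)
        bad_games = sum(
            1 for g in pair_game_epa[(situation, coverage)].values()
            if sum(g) / len(g) > 0
        )
        if share >= min_share and avg_epa >= min_epa and bad_games >= min_games:
            flagged.append((situation, coverage, share, avg_epa, bad_games))
    # Most damaging tendencies first
    return sorted(flagged, key=lambda row: row[3], reverse=True)


# Toy charting data: Cover 3 on 3rd & long, beaten in three different games
plays = [
    {"game_id": g, "situation": "3rd & long", "coverage": "Cover 3", "epa": 0.3}
    for g in (1, 2, 3)
] + [{"game_id": 1, "situation": "3rd & long", "coverage": "Cover 1", "epa": 0.5}]
for row in exploitable_tendencies(plays):
    print(row)
```

The Cover 1 snap allows more EPA but is not flagged. It is neither predictable (25% share) nor repeated across games, which matches the distinction the text draws between a weakness and an exploitable weakness.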

The game planning process: Once exploitable tendencies are identified:
1. Design plays specifically to attack those weaknesses
2. Create constraint plays to prevent defensive adjustment
3. Build play sequences that set up the exploitation
4. Prepare the quarterback with pre-snap reads to identify the tendency


#| label: exploit-weaknesses-r
#| message: false
#| warning: false

# Identify defensive weaknesses
# Focus on specific matchups and situations

weakness_analysis <- pbp %>%
  filter(
    !is.na(coverage_scheme),
    !is.na(personnel),
    play_type == "pass",
    down <= 3
  ) %>%
  mutate(
    off_personnel_type = case_when(
      str_detect(personnel, "1 RB, 1 TE") ~ "11 Personnel",
      str_detect(personnel, "1 RB, 2 TE") ~ "12 Personnel",
      str_detect(personnel, "2 RB, 1 TE") ~ "21 Personnel",
      str_detect(personnel, "1 RB, 0 TE") ~ "10 Personnel",
      TRUE ~ "Other"
    )
  ) %>%
  filter(off_personnel_type != "Other") %>%
  group_by(defteam, coverage_scheme, off_personnel_type) %>%
  summarise(
    plays = n(),
    avg_epa = mean(epa, na.rm = TRUE),
    success_rate = mean(epa > 0, na.rm = TRUE),
    explosive_rate = mean(epa > 1.0, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(
    plays >= 25,  # Minimum sample
    coverage_scheme %in% c("Cover 1", "Cover 2", "Cover 3", "Cover 4", "Cover 6")
  )

# Find worst matchups for each team
worst_matchups <- weakness_analysis %>%
  group_by(defteam) %>%
  slice_max(avg_epa, n = 3) %>%
  ungroup() %>%
  arrange(desc(avg_epa))

# Display
worst_matchups %>%
  head(20) %>%
  gt() %>%
  cols_label(
    defteam = "Team",
    coverage_scheme = "Coverage",
    off_personnel_type = "vs Personnel",
    plays = "Plays",
    avg_epa = "EPA",
    success_rate = "Success %",
    explosive_rate = "Explosive %"
  ) %>%
  fmt_number(
    columns = avg_epa,
    decimals = 3
  ) %>%
  fmt_percent(
    columns = c(success_rate, explosive_rate),
    decimals = 1
  ) %>%
  fmt_number(
    columns = plays,
    decimals = 0,
    use_seps = TRUE
  ) %>%
  data_color(
    columns = avg_epa,
    palette = "Reds"
  ) %>%
  tab_header(
    title = "Defensive Weaknesses by Matchup",
    subtitle = "Worst coverage-personnel matchups for each team"
  ) %>%
  tab_footnote(
    footnote = "Higher EPA = worse for defense (better attack point)",
    locations = cells_column_labels(columns = avg_epa)
  )

#| label: exploit-weaknesses-py
#| message: false
#| warning: false

# Identify defensive weaknesses
def classify_off_personnel(pers):
    if pd.isna(pers):
        return "Other"
    if "1 RB, 1 TE" in pers:
        return "11 Personnel"
    elif "1 RB, 2 TE" in pers:
        return "12 Personnel"
    elif "2 RB, 1 TE" in pers:
        return "21 Personnel"
    elif "1 RB, 0 TE" in pers:
        return "10 Personnel"
    else:
        return "Other"

weakness_data = (pbp
    .dropna(subset=['coverage_scheme', 'personnel'])
    .query("play_type == 'pass' and down <= 3")
    .copy()
)

weakness_data['off_personnel_type'] = weakness_data['personnel'].apply(classify_off_personnel)
weakness_data = weakness_data.query("off_personnel_type != 'Other'")

# Calculate matchup performance
weakness_analysis = (weakness_data
    .groupby(['defteam', 'coverage_scheme', 'off_personnel_type'])
    .agg(
        plays=('epa', 'count'),
        avg_epa=('epa', 'mean'),
        success_rate=('epa', lambda x: (x > 0).mean()),
        explosive_rate=('epa', lambda x: (x > 1.0).mean())
    )
    .reset_index()
)

primary_schemes = ['Cover 1', 'Cover 2', 'Cover 3', 'Cover 4', 'Cover 6']
weakness_analysis = (weakness_analysis
    .query("plays >= 25 and coverage_scheme in @primary_schemes")
)

# Find worst matchups
worst_matchups = (weakness_analysis
    .sort_values(['defteam', 'avg_epa'], ascending=[True, False])
    .groupby('defteam')
    .head(3)
    .sort_values('avg_epa', ascending=False)
)

print("\nDefensive Weaknesses by Matchup")
print("=" * 90)
print(worst_matchups.head(20).to_string(index=False))
print("\nNote: Higher EPA = worse for defense (better attack point)")

Game Planning Recommendations

Using Weakness Analysis for Game Planning

When game planning against a specific defense:
1. **Identify primary coverages** used in key situations
2. **Find exploitable matchups** between offensive personnel and defensive coverage
3. **Target high EPA situations** where the defense struggles
4. **Leverage explosive play opportunities** in favorable matchups
5. **Plan constraint plays** to prevent defensive adjustments

Complete Defensive Scouting Report

Building a Comprehensive Scouting Report

Let's create a complete function to generate a defensive scouting report:


#| label: scouting-report-r
#| message: false
#| warning: false

# Function to generate comprehensive defensive scouting report
generate_defensive_scouting_report <- function(pbp_data, team, season = 2023) {

  cat(strrep("=", 72), "\n", sep = "")
  cat("DEFENSIVE SCOUTING REPORT:", team, "-", season, "Season\n")
  cat(strrep("=", 72), "\n\n", sep = "")

  team_data <- pbp_data %>% filter(defteam == team, play_type == "pass")

  # 1. Base Defense
  cat("1. BASE DEFENSIVE ALIGNMENT\n")
  cat(strrep("-", 40), "\n", sep = "")

  base_def <- team_data %>%
    filter(!is.na(defense_personnel)) %>%
    count(defense_personnel, sort = TRUE) %>%
    mutate(pct = n / sum(n)) %>%
    head(3)

  for(i in 1:nrow(base_def)) {
    cat(sprintf("   %s: %.1f%% (%d plays)\n",
                base_def$defense_personnel[i],
                base_def$pct[i] * 100,
                base_def$n[i]))
  }
  cat("\n")

  # 2. Coverage Tendencies
  cat("2. COVERAGE TENDENCIES\n")
  cat(strrep("-", 40), "\n", sep = "")

  coverage_summary <- team_data %>%
    filter(!is.na(coverage_scheme)) %>%
    count(coverage_scheme, sort = TRUE) %>%
    mutate(pct = n / sum(n)) %>%
    head(5)

  for(i in 1:nrow(coverage_summary)) {
    cat(sprintf("   %s: %.1f%%\n",
                coverage_summary$coverage_scheme[i],
                coverage_summary$pct[i] * 100))
  }
  cat("\n")

  # 3. Blitz Frequency
  cat("3. PRESSURE SCHEME\n")
  cat(strrep("-", 40), "\n", sep = "")

  blitz_stats <- team_data %>%
    filter(!is.na(blitz)) %>%
    summarise(
      blitz_rate = mean(blitz == 1),
      blitz_epa = mean(epa[blitz == 1], na.rm = TRUE),
      no_blitz_epa = mean(epa[blitz == 0], na.rm = TRUE)
    )

  cat(sprintf("   Blitz Rate: %.1f%%\n", blitz_stats$blitz_rate * 100))
  cat(sprintf("   EPA when blitzing: %.3f\n", blitz_stats$blitz_epa))
  cat(sprintf("   EPA when not blitzing: %.3f\n", blitz_stats$no_blitz_epa))
  cat("\n")

  # 4. Third Down Performance
  cat("4. THIRD DOWN DEFENSE\n")
  cat(strrep("-", 40), "\n", sep = "")

  third_down <- team_data %>%
    filter(down == 3) %>%
    mutate(
      distance_cat = case_when(
        ydstogo <= 3 ~ "Short",
        ydstogo <= 6 ~ "Medium",
        TRUE ~ "Long"
      )
    ) %>%
    group_by(distance_cat) %>%
    summarise(
      conv_rate = mean(third_down_converted == 1, na.rm = TRUE),
      .groups = "drop"
    )

  for(i in 1:nrow(third_down)) {
    cat(sprintf("   3rd & %s: %.1f%% conversion rate\n",
                third_down$distance_cat[i],
                third_down$conv_rate[i] * 100))
  }
  cat("\n")

  # 5. Key Weaknesses
  cat("5. EXPLOITABLE WEAKNESSES\n")
  cat(strrep("-", 40), "\n", sep = "")

  weaknesses <- team_data %>%
    filter(!is.na(coverage_scheme), !is.na(epa)) %>%
    group_by(coverage_scheme) %>%
    summarise(
      plays = n(),
      avg_epa = mean(epa, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    filter(plays >= 30) %>%
    arrange(desc(avg_epa)) %>%
    head(3)

  for (i in seq_len(nrow(weaknesses))) {  # safe when no coverage reaches 30 plays
    cat(sprintf("   Attack %s (EPA: %.3f, %d plays)\n",
                weaknesses$coverage_scheme[i],
                weaknesses$avg_epa[i],
                weaknesses$plays[i]))
  }
  cat("\n")

  # 6. Game Planning Recommendations
  cat("6. GAME PLANNING RECOMMENDATIONS\n")
  cat(strrep("-", 40), "\n", sep = "")
  cat("   - Target primary coverage weakness identified above\n")
  cat("   - Use personnel groupings that create favorable matchups\n")
  cat("   - Have constraint plays ready for defensive adjustments\n")

  cat("\n", strrep("=", 72), "\n", sep = "")
}

# Generate example report
generate_defensive_scouting_report(pbp, "SF", 2023)

#| label: scouting-report-py
#| message: false
#| warning: false

def generate_defensive_scouting_report(pbp_data, team, season=2023):
    """Generate comprehensive defensive scouting report"""

    print("=" * 72)
    print(f"DEFENSIVE SCOUTING REPORT: {team} - {season} Season")
    print("=" * 72)
    print()

    team_data = pbp_data.query(f"defteam == '{team}' and play_type == 'pass'")

    # 1. Base Defense
    print("1. BASE DEFENSIVE ALIGNMENT")
    print("-" * 40)

    base_def = (team_data
        .dropna(subset=['defense_personnel'])
        ['defense_personnel']
        .value_counts()
    )
    # Share of all charted snaps (not just the top 3), matching the R version
    base_share = base_def / base_def.sum()

    for pers, count in base_def.head(3).items():
        print(f"   {pers}: {base_share[pers]*100:.1f}% ({count} plays)")
    print()

    # 2. Coverage Tendencies
    print("2. COVERAGE TENDENCIES")
    print("-" * 40)

    coverage = (team_data
        .dropna(subset=['coverage_scheme'])
        ['coverage_scheme']
        .value_counts(normalize=True)  # share of all charted coverages
        .head(5)
    )

    for cov, share in coverage.items():
        print(f"   {cov}: {share*100:.1f}%")
    print()

    # 3. Blitz Frequency
    print("3. PRESSURE SCHEME")
    print("-" * 40)

    blitz_data = team_data.dropna(subset=['blitz'])
    blitz_rate = blitz_data['blitz'].mean()
    blitz_epa = blitz_data.query("blitz == 1")['epa'].mean()
    no_blitz_epa = blitz_data.query("blitz == 0")['epa'].mean()

    print(f"   Blitz Rate: {blitz_rate*100:.1f}%")
    print(f"   EPA when blitzing: {blitz_epa:.3f}")
    print(f"   EPA when not blitzing: {no_blitz_epa:.3f}")
    print()

    # 4. Third Down Performance
    print("4. THIRD DOWN DEFENSE")
    print("-" * 40)

    third_down = team_data.query("down == 3").copy()

    def distance_cat(ydstogo):
        if ydstogo <= 3:
            return "Short"
        elif ydstogo <= 6:
            return "Medium"
        else:
            return "Long"

    third_down['distance_cat'] = third_down['ydstogo'].apply(distance_cat)

    conv_rates = (third_down
        .groupby('distance_cat')
        ['third_down_converted']
        .mean()
    )

    for cat in ['Short', 'Medium', 'Long']:
        if cat in conv_rates.index:
            print(f"   3rd & {cat}: {conv_rates[cat]*100:.1f}% conversion rate")
    print()

    # 5. Key Weaknesses
    print("5. EXPLOITABLE WEAKNESSES")
    print("-" * 40)

    weaknesses = (team_data
        .dropna(subset=['coverage_scheme', 'epa'])
        .groupby('coverage_scheme')
        .agg(
            plays=('epa', 'count'),
            avg_epa=('epa', 'mean')
        )
        .query("plays >= 30")
        .sort_values('avg_epa', ascending=False)
        .head(3)
    )

    for cov, row in weaknesses.iterrows():
        print(f"   Attack {cov} (EPA: {row['avg_epa']:.3f}, {int(row['plays'])} plays)")
    print()

    # 6. Recommendations
    print("6. GAME PLANNING RECOMMENDATIONS")
    print("-" * 40)
    print("   - Target primary coverage weakness identified above")
    print("   - Use personnel groupings that create favorable matchups")
    print("   - Have constraint plays ready for defensive adjustments")

    print()
    print("=" * 72)

# Generate example report
generate_defensive_scouting_report(pbp, "SF", 2023)

Advanced Analytics Applications

Predictive Modeling for Defensive Play-Calling

Machine Learning for Defense Prediction

Advanced teams use machine learning to predict defensive play-calls based on:
- Game situation (down, distance, score, time)
- Offensive personnel and formation
- Field position
- Historical tendencies
- Coordinator patterns

This allows offenses to call plays with higher expected success rates against predicted defensive schemes.
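Before reaching for machine learning, it helps to know how well the simplest tendency model does. The sketch below uses plain Python with hypothetical `fit_tendency_baseline` and `accuracy` helpers. It predicts whichever coverage a defense has used most often in each situation and falls back to the overall most common coverage for unseen situations. This is a lookup table, not a learned model. A learned model, such as the logistic regression suggested in Exercise 5, is only adding value if it beats this baseline on held-out games.

```python
from collections import Counter, defaultdict


def fit_tendency_baseline(train):
    """Most-frequent-coverage baseline for predicting defensive calls.

    train: list of (situation, coverage) pairs, where situation is any
    hashable context, e.g. (down, distance_bucket, offense_personnel).
    """
    by_situation = defaultdict(Counter)
    overall = Counter()
    for situation, coverage in train:
        by_situation[situation][coverage] += 1
        overall[coverage] += 1
    fallback = overall.most_common(1)[0][0]  # for situations never seen in training

    def predict(situation):
        counts = by_situation.get(situation)
        return counts.most_common(1)[0][0] if counts else fallback

    return predict


def accuracy(predict, test):
    """Share of held-out plays where the predicted coverage was correct."""
    return sum(predict(s) == c for s, c in test) / len(test)


train = [((3, "long"), "Cover 3"), ((3, "long"), "Cover 3"), ((3, "long"), "Cover 1"),
         ((1, "medium"), "Cover 1"), ((1, "medium"), "Cover 1")]
predict = fit_tendency_baseline(train)
print(predict((3, "long")), accuracy(predict, [((3, "long"), "Cover 3"),
                                               ((1, "medium"), "Cover 2")]))
```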

In-Game Adjustment Tracking

Defensive coordinators make adjustments throughout games. Track these adjustments:


#| label: in-game-adjustments-r
#| message: false
#| warning: false

# Track coverage adjustments by quarter
in_game_adjustments <- pbp %>%
  filter(
    !is.na(coverage_scheme),
    !is.na(qtr),
    play_type == "pass",
    qtr <= 4
  ) %>%
  group_by(defteam, qtr, coverage_scheme) %>%
  summarise(
    plays = n(),
    avg_epa = mean(epa, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  group_by(defteam, qtr) %>%
  mutate(
    usage_pct = plays / sum(plays)
  ) %>%
  ungroup() %>%
  filter(coverage_scheme %in% c("Cover 1", "Cover 2", "Cover 3"))

# Example: SF's coverage adjustments by quarter
sf_adjustments <- in_game_adjustments %>%
  filter(defteam == "SF") %>%
  select(qtr, coverage_scheme, usage_pct, avg_epa) %>%
  pivot_wider(
    names_from = qtr,
    values_from = c(usage_pct, avg_epa),
    names_glue = "Q{qtr}_{.value}"  # e.g. Q1_usage_pct, Q1_avg_epa
  )

sf_adjustments %>%
  gt() %>%
  cols_label(
    coverage_scheme = "Coverage"
  ) %>%
  fmt_percent(
    columns = contains("usage_pct"),
    decimals = 1
  ) %>%
  fmt_number(
    columns = contains("avg_epa"),
    decimals = 3
  ) %>%
  tab_header(
    title = "SF In-Game Coverage Adjustments",
    subtitle = "Usage and EPA by quarter - 2023 Season"
  )

#| label: in-game-adjustments-py
#| message: false
#| warning: false

# Track in-game adjustments
adjustment_data = (pbp
    .dropna(subset=['coverage_scheme', 'qtr'])
    .query("play_type == 'pass' and qtr <= 4")
)

in_game_adjustments = (adjustment_data
    .groupby(['defteam', 'qtr', 'coverage_scheme'])
    .agg(
        plays=('epa', 'count'),
        avg_epa=('epa', 'mean')
    )
    .reset_index()
)

in_game_adjustments['usage_pct'] = (
    in_game_adjustments.groupby(['defteam', 'qtr'])['plays']
    .transform(lambda x: x / x.sum())
)

# Filter to main coverages
primary_schemes = ['Cover 1', 'Cover 2', 'Cover 3']
in_game_adjustments = in_game_adjustments.query("coverage_scheme in @primary_schemes")

# Example: SF adjustments
sf_adjustments = (in_game_adjustments
    .query("defteam == 'SF'")
    [['qtr', 'coverage_scheme', 'usage_pct', 'avg_epa']]
)

print("\nSF In-Game Coverage Adjustments by Quarter - 2023")
print("=" * 70)
print(sf_adjustments.to_string(index=False))

Using Scouting Reports for Game Planning

The defensive scouting report generated by the function above provides a framework for offensive game planning. Here's how offensive coordinators typically use this information:

Pre-game preparation:
1. Identify primary coverage: Know what coverage you'll see most often
2. Find the weakness: Target the coverage/situation combination where the defense struggles
3. Plan personnel matchups: Use offensive personnel groupings that exploit defensive tendencies
4. Script opening plays: Design the first 10-15 plays to test defensive tendencies
5. Prepare adjustments: Have constraint plays ready if the defense changes their approach

In-game application:
- Verify tendencies: Confirm pre-game scouting matches actual in-game behavior
- Exploit weaknesses: When scouting is confirmed, attack identified weaknesses
- Adjust to deviations: If defense shows unexpected looks, adapt play-calling
- Force defensive adjustments: Once you exploit a weakness, force them to change
- Have counter-punches ready: When they adjust, have plays designed for the adjustment
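The "verify tendencies" step can be made concrete by comparing charted in-game calls with the scouting report's coverage shares. This is a minimal sketch, and `tendency_deviations` is a hypothetical helper. Both thresholds are illustrative guesses: nothing is flagged until `min_plays` calls have been charted, because a handful of early snaps says little, and only deviations larger than `tolerance` are reported.

```python
from collections import Counter


def tendency_deviations(scouted_shares, observed_calls, min_plays=8, tolerance=0.15):
    """Compare in-game coverage usage with the pre-game scouting report.

    scouted_shares: dict coverage -> expected share from the report.
    observed_calls: list of coverages identified so far in the game.
    Returns {coverage: observed share - scouted share} for coverages
    that deviate by more than `tolerance`.
    """
    n = len(observed_calls)
    if n < min_plays:
        return {}
    counts = Counter(observed_calls)
    deviations = {}
    # Include coverages the report never mentioned (scouted share 0)
    for coverage in set(scouted_shares) | set(counts):
        diff = counts[coverage] / n - scouted_shares.get(coverage, 0.0)
        if abs(diff) > tolerance:
            deviations[coverage] = round(diff, 3)
    return deviations


scouted = {"Cover 3": 0.5, "Cover 1": 0.3, "Cover 2": 0.2}
observed = ["Cover 1"] * 6 + ["Cover 3"] * 3 + ["Cover 2"]
print(tendency_deviations(scouted, observed))
```

In this toy case, the defense is playing far more Cover 1 and less Cover 3 than scouted. That is the kind of deviation that should trigger the "adjust to deviations" step.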

The Chess Match of In-Game Adjustments

Football at the highest level is a chess match between offensive and defensive coordinators. The offense attacks a weakness. The defense adjusts to take away that weakness, creating a new vulnerability. The offense recognizes and exploits the new vulnerability. This cycle continues throughout the game. The coordinator who anticipates and prepares for these adjustments gains an advantage.

The Integration of Film and Data

While data analytics provides powerful pattern recognition capabilities, it doesn't replace film study. The most effective scouting combines:
- **Data**: Identifies tendencies across large sample sizes
- **Film**: Reveals execution details, player techniques, and subtle tells
- **Football knowledge**: Interprets why tendencies exist and when they might change

Teams that integrate all three sources create the most comprehensive scouting reports.

Summary

This chapter synthesized defensive analytics into a comprehensive scheme analysis and game planning framework. We covered multiple layers of defensive evaluation, from foundational structure to situational tendencies to philosophical fingerprints.

Key Concepts Covered

1. Base Defense Identification
We learned to recognize 4-3, 3-4, and hybrid fronts from play-by-play data by analyzing defensive personnel groupings. While modern defenses spend most snaps in sub-packages, understanding the base front reveals coordinator philosophy and personnel priorities. The distinction between four down linemen (4-3) and three down linemen (3-4) reflects fundamental strategic differences in how teams allocate defensive resources.

2. Coverage Tendency Mapping
We analyzed coverage usage by down, distance, field position, and situation, creating comprehensive tendency maps. Understanding that a defense plays Cover 3 on 60% of third-and-long situations, but switches to Cover 2 in the red zone, allows offensive coordinators to prepare specific plays for specific situations.

3. Blitz Pattern Analysis
We identified pressure frequencies and effectiveness across situations. Some defensive coordinators blitz frequently (>35% of plays), while others rarely blitz (<25%). More importantly, we learned to measure blitz effectiveness through EPA differential, revealing which coordinators blitz successfully and which coordinators would be better served rushing four.

4. Coordinator Philosophy Fingerprints
By combining blitz frequency with coverage preference, we created a two-dimensional philosophy space revealing four distinct coordinator archetypes: Aggressive Man, Aggressive Zone, Conservative Man, and Conservative Zone. Each archetype has characteristic strengths, weaknesses, and optimal offensive counter-strategies.

5. Defensive Scheme Effectiveness Metrics
We established metrics beyond basic statistics to measure scheme performance: Coverage EPA, Success Rate Allowed, Explosive Play Rate, Pressure-Adjusted Performance, and Personnel-Adjusted Performance. These context-aware metrics account for down, distance, and field position, but they only partly separate scheme quality from player talent.

6. Personnel-Scheme Correlation
We analyzed how defensive personnel groupings correlate with coverage schemes, revealing which packages are used with which coverages. This helps identify when defenses are in their comfort zones versus when they're forced into less-preferred packages.

7. Scheme Fit Evaluation
We assessed personnel-scheme alignment by examining coverage diversity and weighted performance. Teams with both diverse and effective coverage repertoires show excellent scheme fit. Teams limited to one or two coverages are often working around personnel constraints, though elite execution of a single coverage can still produce a strong defense.

8. Weakness Identification
We developed methods for finding exploitable matchups by identifying coverage-personnel combinations where defenses consistently allow high EPA. These represent specific attack points for offensive game planning.

9. Complete Scouting Reports
We built comprehensive defensive evaluation frameworks that synthesize all previous analyses into actionable scouting reports, mimicking the professional game planning process.

The Analytical Advantage in Modern Football

These techniques enable data-driven offensive game planning that exploits defensive tendencies and creates competitive advantages. The modern NFL involves extensive defensive scheme analysis because:

Offenses have become more sophisticated: Spread formations and RPOs force defenses to adapt
Pattern recognition is difficult from film alone: Analyzing 1,000+ plays manually is time-consuming
Tendencies exist but aren't always obvious: Data reveals subtle patterns missed by eye
Game planning time is limited: Analytics accelerates the scouting process
In-game adjustments require quick recognition: Understanding coordinator philosophy helps predict adjustments

The Limitations of Scheme Analysis

While powerful, scheme analysis has important limitations:

Personnel changes affect tendencies: Injuries or personnel moves can shift scheme usage
Game plan variation: Coordinators adjust their approach based on opponent
Small sample issues: Some tendency splits have limited sample sizes
Execution matters: Perfect scheme can fail with poor execution
Disguise and deception: Coordinators intentionally create misleading tendencies

From Analysis to Action

The ultimate value of scheme analysis lies in converting insights into action. Offensive coordinators who effectively integrate defensive scouting into their game planning:
1. Enter games with clear situational plans
2. Call plays with higher expected success rates
3. Adapt more quickly to in-game defensive adjustments
4. Force defensive coordinators out of comfortable schemes
5. Create favorable matchups for their offensive players

Practical Application

NFL teams combine film study with these analytical frameworks to:
1. **Identify defensive tendencies before games**: Know what to expect in key situations
2. **Call plays that exploit scheme weaknesses**: Target specific coverage-personnel vulnerabilities
3. **Adjust based on in-game defensive changes**: Recognize and respond to coordinator adjustments
4. **Evaluate defensive coordinator effectiveness**: Assess coaching quality separate from player talent
5. **Make personnel decisions based on scheme fit**: Ensure roster construction aligns with defensive philosophy
6. **Prepare quarterbacks with situational knowledge**: Give QBs pre-snap indicators of likely coverage
7. **Design practice plans that simulate opponent schemes**: Prepare offense against looks they'll actually see

Next Steps in Defensive Analysis

This chapter focused on scheme identification and game planning from an offensive perspective. Future analyses might explore:
- Defensive coordinator tracking across multiple teams: Separating individual coordinator tendencies from team effects
- Predictive modeling of defensive play-calls: Using machine learning to forecast defensive scheme selection
- Temporal tendency analysis: How defensive tendencies change throughout games and seasons
- Success of specific offensive counter-strategies: Measuring which offensive approaches best exploit specific defensive tendencies
- Coordinator decision-making in critical situations: Analysis of scheme choices in high-leverage moments

The field of defensive scheme analysis continues to evolve as data becomes richer (player tracking, more detailed coverage classifications) and analytical methods become more sophisticated (machine learning, causal inference). The fundamental challenge—reverse-engineering opponent strategy from observable patterns—will remain central to competitive football strategy.

Exercises

Conceptual Questions

Philosophy Classification: Explain how you would classify a defensive coordinator as "aggressive" vs. "conservative" using play-by-play data. What metrics would you use?
Coverage Shell Tendencies: Why might a defense show more Cover 2 on first down than third down? What strategic factors drive these tendencies?
Blitz Timing: When is blitzing most effective? When does it backfire? How would you determine optimal blitz situations for a specific defense?

Coding Exercises

Exercise 1: Team Defensive Profile

Build a complete defensive profile for your favorite team:
a) Identify their base defensive front and primary personnel packages
b) Map their coverage tendencies by down and distance
c) Calculate their blitz frequency and effectiveness
d) Classify their coordinator's philosophy
e) Identify their top 3 defensive weaknesses

Create visualizations for each component.

Exercise 2: Coverage Effectiveness Analysis

For a specific defensive coordinator:
a) Calculate EPA allowed for each coverage scheme
b) Analyze how coverage effectiveness varies by field position
c) Identify which offensive personnel groupings give each coverage the most trouble
d) Create a heatmap showing coverage usage and effectiveness by situation

**Bonus**: Compare this coordinator's coverage preferences to league averages.

Exercise 3: Matchup Exploitation

Select an upcoming game matchup:
a) Analyze the defensive team's tendencies in key situations
b) Identify 3-5 specific scheme weaknesses
c) Recommend offensive personnel groupings to exploit these weaknesses
d) Create a game plan summary with specific play-calling recommendations

This simulates real-world offensive game planning.

Advanced Exercises

Exercise 4: Coordinator Comparison

Compare two defensive coordinators:
a) Build complete philosophical profiles for each
b) Analyze their adjustment patterns throughout games
c) Evaluate their performance in different game situations (leading, trailing, close games)
d) Determine which coordinator is more adaptable based on in-game coverage changes

Create a comprehensive comparison report with visualizations.

Exercise 5: Predictive Tendency Model

Build a predictive model for defensive play-calling:
a) Use game situation, personnel, and historical tendencies to predict coverage
b) Calculate prediction accuracy for each defensive coordinator
c) Identify which coordinators are most predictable vs. most diverse
d) Evaluate how predictability correlates with defensive performance

**Hint**: Start with a simple logistic regression model predicting man vs. zone coverage.

Exercise 6: Complete Game Planning Report

Create a professional-quality defensive scouting report:
a) Select a specific opponent defense
b) Analyze all components: base front, coverage, blitz, adjustments
c) Identify exploitable weaknesses and recommended attacks
d) Include visualizations showing tendencies and vulnerabilities
e) Provide specific play-calling recommendations by down and distance

Format this as you would present it to an offensive coaching staff.
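For part (e), recommendations by down and distance can start from a per-situation summary of what the defense calls and what it allows. The sketch below reuses the Short/Medium/Long cut points from the appendix helpers. The `situational_call_sheet` name, the `min_plays` default and the play-dictionary keys (`defteam`, `down`, `ydstogo`, `coverage_scheme`, `epa`) are assumptions.

```python
from collections import Counter, defaultdict
from statistics import mean


def distance_bin(ydstogo):
    """Short/Medium/Long with the same cut points as the appendix helpers."""
    if ydstogo <= 3:
        return "Short"
    if ydstogo <= 7:
        return "Medium"
    return "Long"


def situational_call_sheet(plays, team, min_plays=15):
    """Per (down, distance bin): the defense's go-to coverage and EPA allowed."""
    situations = defaultdict(list)
    for p in plays:
        if (p["defteam"] == team and p["down"] in (1, 2, 3)
                and p.get("coverage_scheme") is not None
                and p.get("epa") is not None):
            situations[(p["down"], distance_bin(p["ydstogo"]))].append(p)

    sheet = {}
    for key, group in sorted(situations.items()):
        if len(group) < min_plays:
            continue  # skip situations with too few snaps to read
        coverage, count = Counter(p["coverage_scheme"] for p in group).most_common(1)[0]
        sheet[key] = {
            "plays": len(group),
            "top_coverage": coverage,
            "top_share": count / len(group),
            "epa_allowed": mean(p["epa"] for p in group),
        }
    return sheet


# Hypothetical toy input
plays = [
    {"defteam": "SF", "down": 3, "ydstogo": 9, "coverage_scheme": "Cover 1", "epa": 0.2},
    {"defteam": "SF", "down": 3, "ydstogo": 11, "coverage_scheme": "Cover 1", "epa": 0.4},
    {"defteam": "SF", "down": 3, "ydstogo": 8, "coverage_scheme": "Cover 3", "epa": -0.3},
    {"defteam": "SF", "down": 1, "ydstogo": 10, "coverage_scheme": "Cover 3", "epa": 0.1},
]
print(situational_call_sheet(plays, "SF", min_plays=3))
```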

References

:::

## Appendix: Additional Code

### Complete Tendency Tracking Function

```{r}
#| label: appendix-tendency-tracker
#| echo: true
#| eval: false

library(dplyr)

# Comprehensive tendency tracking function.
# Assumes pbp_data includes the charting columns used in this chapter:
# coverage_scheme, blitz (1/0), and defense_personnel.
# `.env$team` refers to the function argument, so a data column that
# happens to be named `team` cannot silently take its place.
track_defensive_tendencies <- function(pbp_data, team, min_plays = 10) {

  tendencies <- list()

  # Overall coverage distribution
  tendencies$coverage_distribution <- pbp_data %>%
    filter(defteam == .env$team, !is.na(coverage_scheme)) %>%
    count(coverage_scheme, sort = TRUE) %>%
    mutate(pct = n / sum(n))

  # Down and distance tendencies (downs 1-3, three distance bins)
  tendencies$down_distance <- pbp_data %>%
    filter(defteam == .env$team, !is.na(coverage_scheme), down <= 3) %>%
    mutate(
      distance_bin = case_when(
        ydstogo <= 3 ~ "Short",
        ydstogo <= 7 ~ "Medium",
        TRUE ~ "Long"
      )
    ) %>%
    group_by(down, distance_bin, coverage_scheme) %>%
    summarise(plays = n(), .groups = "drop") %>%
    filter(plays >= min_plays)

  # Blitz tendencies on pass plays (EPA from the offense's perspective)
  tendencies$blitz <- pbp_data %>%
    filter(defteam == .env$team, !is.na(blitz), play_type == "pass") %>%
    summarise(
      total_plays = n(),
      blitz_plays = sum(blitz == 1),
      blitz_rate = mean(blitz == 1),
      blitz_epa = mean(epa[blitz == 1], na.rm = TRUE),
      no_blitz_epa = mean(epa[blitz == 0], na.rm = TRUE)
    )

  # Personnel correlations: share of each coverage within a personnel package
  tendencies$personnel <- pbp_data %>%
    filter(defteam == .env$team, !is.na(defense_personnel), !is.na(coverage_scheme)) %>%
    group_by(defense_personnel, coverage_scheme) %>%
    summarise(plays = n(), .groups = "drop") %>%
    group_by(defense_personnel) %>%
    mutate(pct = plays / sum(plays)) %>%
    ungroup() %>%
    filter(plays >= min_plays)

  return(tendencies)
}

# Usage example
# sf_tendencies <- track_defensive_tendencies(pbp, "SF")
```

### Coverage Visualization Helper

```{python}
#| label: appendix-viz-helper
#| echo: true
#| eval: false

import matplotlib.pyplot as plt
import seaborn as sns


def plot_coverage_heatmap(pbp_data, team, season=2023):
    """
    Create a coverage tendency heatmap (one panel per down) for a specific team
    """
    # Restrict to the requested season when the data spans several seasons
    if 'season' in pbp_data.columns:
        pbp_data = pbp_data[pbp_data['season'] == season]

    # Filter to the team's charted pass plays on downs 1-3
    team_data = (pbp_data
        .query("defteam == @team and play_type == 'pass' and down <= 3")
        .dropna(subset=['coverage_scheme'])
        .copy()
    )

    # Create distance bins (same cut points as the R tendency tracker)
    def distance_bin(ydstogo):
        if ydstogo <= 3:
            return "Short"
        elif ydstogo <= 7:
            return "Medium"
        else:
            return "Long"

    team_data['distance_bin'] = team_data['ydstogo'].apply(distance_bin)

    # Share of each coverage within every down/distance cell
    coverage_pct = (team_data
        .groupby(['down', 'distance_bin', 'coverage_scheme'])
        .size()
        .reset_index(name='plays')
    )

    coverage_pct['pct'] = (
        coverage_pct.groupby(['down', 'distance_bin'])['plays']
        .transform(lambda x: x / x.sum())
    )

    # Keep the main coverages; shares still use all coverages as the denominator
    main_coverages = ['Cover 1', 'Cover 2', 'Cover 3', 'Cover 4']
    plot_data = coverage_pct.query("coverage_scheme in @main_coverages")

    # One heatmap panel per down
    fig, axes = plt.subplots(1, 3, figsize=(15, 5))

    for i, down in enumerate([1, 2, 3]):
        down_data = plot_data.query("down == @down")
        pivot = (down_data
            .pivot_table(
                index='coverage_scheme',
                columns='distance_bin',
                values='pct',
                fill_value=0
            )
            # Fixed row/column order so the three panels line up
            .reindex(index=main_coverages,
                     columns=['Short', 'Medium', 'Long'],
                     fill_value=0)
        )

        sns.heatmap(pivot, annot=True, fmt='.1%', cmap='Blues',
                    ax=axes[i], vmin=0, vmax=0.6, cbar=(i == 2))
        axes[i].set_title(f'Down {down}', fontweight='bold')
        axes[i].set_xlabel('Distance')
        axes[i].set_ylabel('Coverage' if i == 0 else '')

    plt.suptitle(f'{team} Coverage Tendencies - {season}',
                 fontsize=14, fontweight='bold')
    plt.tight_layout()

    return fig

# Usage:
# fig = plot_coverage_heatmap(pbp, 'SF', 2023)
# plt.show()
```

### Game Plan Generator

```{r}
#| label: appendix-game-plan
#| echo: true
#| eval: false

library(dplyr)

# Automated game plan generator: target the coverage that allows the most EPA
generate_game_plan <- function(pbp_data, opponent_defense) {

  # Analyze opponent
  opp_data <- pbp_data %>% filter(defteam == opponent_defense)

  # Find weaknesses: EPA allowed by coverage on pass plays (offense's view),
  # keeping only coverages with at least 30 charted plays
  weaknesses <- opp_data %>%
    filter(!is.na(coverage_scheme), play_type == "pass") %>%
    group_by(coverage_scheme) %>%
    summarise(
      plays = n(),
      avg_epa = mean(epa, na.rm = TRUE),
      success_rate = mean(epa > 0, na.rm = TRUE),
      explosive_rate = mean(epa > 1.0, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    filter(plays >= 30) %>%
    arrange(desc(avg_epa))

  # Without this guard, an empty table would produce an "NA" recommendation
  if (nrow(weaknesses) == 0) {
    stop("No coverage has 30+ charted pass plays for ", opponent_defense)
  }

  # Build recommendations from the top-ranked (most vulnerable) coverage
  game_plan <- list(
    primary_coverage = weaknesses$coverage_scheme[1],
    target_epa = weaknesses$avg_epa[1],
    success_rate = weaknesses$success_rate[1],
    recommendation = sprintf(
      "Target %s coverage (EPA: %.3f, Success: %.1f%%)",
      weaknesses$coverage_scheme[1],
      weaknesses$avg_epa[1],
      weaknesses$success_rate[1] * 100
    )
  )

  return(game_plan)
}
```
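`generate_game_plan` ranks coverages by raw EPA allowed, so it can point at a coverage the opponent rarely calls. One hypothetical refinement weights each coverage's EPA edge by how often the defense actually uses it. The edge is measured against the defense's overall average. The sketch below is in Python and assumes plays arrive as dictionaries with `defteam`, `play_type`, `coverage_scheme` and `epa` keys. The `coverage_opportunities` name and the weighting rule are illustrative, not an established metric.

```python
from collections import defaultdict
from statistics import mean


def coverage_opportunities(plays, team, min_plays=30):
    """Rank coverages by usage share times EPA edge over the defense's average."""
    by_cov = defaultdict(list)
    for p in plays:
        if (p["defteam"] == team and p["play_type"] == "pass"
                and p.get("coverage_scheme") is not None
                and p.get("epa") is not None):
            by_cov[p["coverage_scheme"]].append(p["epa"])

    total = sum(len(epas) for epas in by_cov.values())
    if total == 0:
        return []
    # Baseline uses every charted pass play, including thinly sampled coverages
    baseline = mean(e for epas in by_cov.values() for e in epas)

    rows = []
    for cov, epas in by_cov.items():
        if len(epas) < min_plays:
            continue
        usage = len(epas) / total
        edge = mean(epas) - baseline
        rows.append({"coverage": cov, "usage": usage,
                     "epa_edge": edge, "opportunity": usage * edge})
    return sorted(rows, key=lambda r: r["opportunity"], reverse=True)


# Hypothetical toy input: Cover 0 allows the most EPA per play but is rare
def toy_play(coverage, epa):
    return {"defteam": "SF", "play_type": "pass", "coverage_scheme": coverage, "epa": epa}


plays = ([toy_play("Cover 0", 1.0)] + [toy_play("Cover 3", 0.6)] * 3
         + [toy_play("Cover 1", -0.4)] * 4)
for row in coverage_opportunities(plays, "SF", min_plays=1):
    print(row)
```

In the toy data, Cover 0 allows the most EPA per play. Once usage is factored in, Cover 3 ranks first because the defense calls it three times as often.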