Chapter 6: Offensive Analytics Basics | Football Analytics Textbook

Learning Objectives

By the end of this chapter, you will be able to:

- Understand fundamental offensive concepts and terminology
- Calculate basic offensive efficiency metrics
- Analyze play-calling tendencies
- Evaluate offensive line performance
- Measure explosive play rates
- Compare the efficiency (EPA) of passing and rushing plays across different game situations

Introduction

Offense wins games. Defense may win championships, as the old adage goes, but modern football is increasingly offense-focused. Rule changes that favor the passing game, evolving offensive schemes, and an abundance of elite quarterback play have made offensive analytics more important than ever.

This chapter introduces the fundamental concepts and metrics used to evaluate offensive performance. We'll move beyond basic statistics like total yards and touchdowns to examine more sophisticated measures that account for context, efficiency, and situational factors.

What is Offensive Analytics?

Offensive analytics encompasses the measurement, evaluation, and optimization of offensive performance. This includes everything from individual play efficiency to overall offensive strategy, formation effectiveness, and situational decision-making.

The Evolution of Offensive Football

From Three Yards and a Cloud of Dust to Air Raid

The NFL has undergone a dramatic offensive transformation over the past five decades:

1970s-1980s: Run-first offenses dominated, and teams passed far less often than they do today. The focus was on power running and ball control.

1990s-2000s: The West Coast offense, pioneered by Bill Walsh and popularized by his San Francisco 49ers teams of the 1980s, spread across the league. It emphasized short, high-percentage passes as an extension of the running game.

2010s-Present: Spread offenses, RPOs (run-pass options), and pass-heavy attacks have become the norm. Teams now average more than 30 pass attempts per game, and offensive efficiency has reached historic highs.

The Analytics Impact

The shift toward passing reflects more than scheme evolution; it is also data-driven. Analytics has consistently shown that, on average, passing plays are more efficient than rushing plays (for example, they generate more expected points added per play), which has pushed teams toward more pass-heavy strategies.

Basic Offensive Metrics

Traditional Statistics

Before diving into advanced metrics, let's review the traditional statistics that have long been used to evaluate offense:

Volume Metrics:
- Total yards (passing + rushing)
- Total plays
- First downs
- Points scored

Rate Metrics:
- Yards per play
- Yards per carry
- Yards per attempt
- Completion percentage

While these metrics provide useful information, they lack context. A 5-yard gain on 3rd-and-3 is far more valuable than a 5-yard gain on 3rd-and-10, yet traditional statistics treat them equally.
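One way to give a raw yardage gain context is to judge it against the down and the distance to go. The plain-Python sketch below scores a play as a "success" only if it gains a minimum share of the yards to go. The thresholds (40% of the distance on first down, 60% on second, 100% on third and fourth) are an assumption here: they follow one widely cited convention, and other analysts use different cutoffs or define success as EPA > 0, as the code later in this chapter does.

```python
# Sketch of a yardage-based success rule that accounts for down and distance.
# Assumption: the 40% / 60% / 100% thresholds are one common convention,
# not the only definition in use.

def required_fraction(down: int) -> float:
    """Share of the yards to go that a play must gain to count as successful."""
    if down == 1:
        return 0.4
    if down == 2:
        return 0.6
    return 1.0  # third and fourth down: the play must reach the line to gain


def is_successful(down: int, ydstogo: int, yards_gained: int) -> bool:
    """Return True if the play gained enough yards for its down and distance."""
    return yards_gained >= required_fraction(down) * ydstogo


# The same 5-yard gain, judged in two different situations
print(is_successful(3, 3, 5))   # True: converts 3rd-and-3
print(is_successful(3, 10, 5))  # False: falls short on 3rd-and-10
```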

Yards Per Play

Yards per play is one of the simplest yet most informative efficiency metrics:

$$ \text{Yards Per Play} = \frac{\text{Total Yards}}{\text{Total Plays}} $$

Despite its simplicity, yards per play tracks winning closely: teams that average more yards per play tend to win more games.
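The formula divides total yards by total plays, which is not the same as averaging each game's yards per play: the pooled version weights every snap equally, while a game-by-game average weights every game equally. The sketch below illustrates the difference on made-up game totals; the team code that follows computes the pooled version.

```python
# Hypothetical per-game totals for one team: (yards, plays)
games = [(300, 60), (200, 25)]


def yards_per_play(games):
    """Season yards per play: total yards divided by total plays (pooled)."""
    total_yards = sum(yards for yards, _ in games)
    total_plays = sum(plays for _, plays in games)
    return total_yards / total_plays


def mean_of_game_ypp(games):
    """Average of each game's yards per play (every game weighted equally)."""
    return sum(yards / plays for yards, plays in games) / len(games)


print(round(yards_per_play(games), 2))    # 5.88  (500 / 85)
print(round(mean_of_game_ypp(games), 2))  # 6.5   ((5.0 + 8.0) / 2)
```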


#| label: yards-per-play-r
#| message: false
#| warning: false
#| cache: true

library(tidyverse)
library(nflfastR)
library(gt)

# Load 2023 season data
pbp_2023 <- load_pbp(2023)

# Calculate team offensive yards per play
offensive_ypp <- pbp_2023 %>%
  filter(!is.na(posteam), play_type %in% c("pass", "run")) %>%
  group_by(posteam) %>%
  summarise(
    plays = n(),
    total_yards = sum(yards_gained, na.rm = TRUE),
    yards_per_play = total_yards / plays,
    pass_plays = sum(play_type == "pass"),
    rush_plays = sum(play_type == "run"),
    pass_yards = sum(yards_gained[play_type == "pass"], na.rm = TRUE),
    rush_yards = sum(yards_gained[play_type == "run"], na.rm = TRUE),
    yards_per_pass = pass_yards / pass_plays,
    yards_per_rush = rush_yards / rush_plays,
    .groups = "drop"
  ) %>%
  arrange(desc(yards_per_play))

# Display top 10 teams
offensive_ypp %>%
  head(10) %>%
  gt() %>%
  cols_label(
    posteam = "Team",
    plays = "Plays",
    total_yards = "Total Yards",
    yards_per_play = "YPP",
    pass_plays = "Pass Plays",
    rush_plays = "Rush Plays",
    yards_per_pass = "YPA",
    yards_per_rush = "YPC"
  ) %>%
  fmt_number(
    columns = c(yards_per_play, yards_per_pass, yards_per_rush),
    decimals = 2
  ) %>%
  fmt_number(
    columns = c(plays, total_yards, pass_plays, rush_plays),
    decimals = 0,
    use_seps = TRUE
  ) %>%
  tab_header(
    title = "Offensive Yards Per Play Leaders",
    subtitle = "2023 NFL Season - Top 10 Teams"
  )

#| label: yards-per-play-py
#| message: false
#| warning: false
#| cache: true

import pandas as pd
import numpy as np
import nfl_data_py as nfl

# Load 2023 season data
pbp_2023 = nfl.import_pbp_data([2023])

# Calculate team offensive yards per play
# Helper columns hold yards only for the matching play type (0 otherwise),
# so each can be summed directly without looking rows up in pbp_2023 by index
ypp_data = (pbp_2023
    .query("posteam.notna() & play_type.isin(['pass', 'run'])")
    .assign(
        pass_yards=lambda d: d['yards_gained'].where(d['play_type'] == 'pass', 0),
        rush_yards=lambda d: d['yards_gained'].where(d['play_type'] == 'run', 0)
    )
)

offensive_ypp = (ypp_data
    .groupby('posteam')
    .agg(
        plays=('play_id', 'count'),
        total_yards=('yards_gained', 'sum'),
        pass_plays=('play_type', lambda x: (x == 'pass').sum()),
        rush_plays=('play_type', lambda x: (x == 'run').sum()),
        pass_yards=('pass_yards', 'sum'),
        rush_yards=('rush_yards', 'sum')
    )
    .reset_index()
)

# Calculate per-play metrics
offensive_ypp['yards_per_play'] = offensive_ypp['total_yards'] / offensive_ypp['plays']
offensive_ypp['yards_per_pass'] = offensive_ypp['pass_yards'] / offensive_ypp['pass_plays']
offensive_ypp['yards_per_rush'] = offensive_ypp['rush_yards'] / offensive_ypp['rush_plays']

# Sort and display top 10
offensive_ypp_top10 = (offensive_ypp
    .sort_values('yards_per_play', ascending=False)
    .head(10)
    [['posteam', 'plays', 'total_yards', 'yards_per_play',
      'pass_plays', 'rush_plays', 'yards_per_pass', 'yards_per_rush']]
)

print("Offensive Yards Per Play Leaders - 2023 NFL Season (Top 10)")
print("=" * 80)
print(offensive_ypp_top10.to_string(index=False))

First Down Rate

First down rate measures how often an offense converts a new set of downs:

$$ \text{First Down Rate} = \frac{\text{First Downs}}{\text{Total Plays}} $$

This metric captures an offense's ability to sustain drives and maintain possession.
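The team calculation boils down to counting first downs and plays per offense, restricted to pass and run plays. The standard-library sketch below mirrors that grouping on hypothetical plays for two made-up teams; the `first_down` flag is assumed to be 0/1, as in the play-by-play data.

```python
from collections import defaultdict


def first_down_rate(plays):
    """Return {team: first downs / plays}, counting only pass and run plays.

    plays: iterable of (posteam, play_type, first_down) with first_down as 0/1.
    """
    counts = defaultdict(lambda: [0, 0])  # team -> [first downs, plays]
    for team, play_type, first_down in plays:
        if play_type not in ("pass", "run"):
            continue  # skip punts, field goals, and other special teams plays
        counts[team][0] += first_down
        counts[team][1] += 1
    return {team: fd / n for team, (fd, n) in counts.items()}


# Hypothetical plays for two made-up offenses
plays = [
    ("TEAM_A", "pass", 1), ("TEAM_A", "run", 0), ("TEAM_A", "run", 1),
    ("TEAM_A", "pass", 0), ("TEAM_A", "punt", 0),
    ("TEAM_B", "pass", 1), ("TEAM_B", "run", 0), ("TEAM_B", "pass", 0),
]
print(first_down_rate(plays))
```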


#| label: first-down-rate-r
#| message: false
#| warning: false

# Calculate first down rate
first_down_rate <- pbp_2023 %>%
  filter(!is.na(posteam), play_type %in% c("pass", "run")) %>%
  group_by(posteam) %>%
  summarise(
    plays = n(),
    first_downs = sum(first_down == 1, na.rm = TRUE),
    first_down_rate = first_downs / plays,
    .groups = "drop"
  ) %>%
  arrange(desc(first_down_rate))

# Display results
first_down_rate %>%
  head(10) %>%
  gt() %>%
  cols_label(
    posteam = "Team",
    plays = "Total Plays",
    first_downs = "First Downs",
    first_down_rate = "1st Down Rate"
  ) %>%
  fmt_percent(
    columns = first_down_rate,
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(plays, first_downs),
    decimals = 0,
    use_seps = TRUE
  ) %>%
  tab_header(
    title = "First Down Rate Leaders",
    subtitle = "2023 NFL Season"
  )

#| label: first-down-rate-py
#| message: false
#| warning: false

# Calculate first down rate
first_down_rate = (pbp_2023
    .query("posteam.notna() & play_type.isin(['pass', 'run'])")
    .groupby('posteam')
    .agg(
        plays=('play_id', 'count'),
        first_downs=('first_down', 'sum')
    )
    .reset_index()
)

first_down_rate['first_down_rate'] = first_down_rate['first_downs'] / first_down_rate['plays']

# Sort and display
first_down_rate_top10 = (first_down_rate
    .sort_values('first_down_rate', ascending=False)
    .head(10)
)

print("\nFirst Down Rate Leaders - 2023 NFL Season")
print("=" * 60)
for _, row in first_down_rate_top10.iterrows():
    print(f"{row['posteam']:4s} | Plays: {row['plays']:4.0f} | "
          f"1st Downs: {row['first_downs']:4.0f} | "
          f"Rate: {row['first_down_rate']:.1%}")

Scoring Efficiency

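Points are what matter most. Points per play and points per drive measure how efficiently an offense converts its opportunities into points. In the code below, each drive's points are approximated from its result: 7 for a touchdown and 3 for a field goal, ignoring missed extra points and two-point conversions.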

$$ \text{Points Per Play} = \frac{\text{Total Points}}{\text{Total Plays}} $$

$$ \text{Points Per Drive} = \frac{\text{Total Points}}{\text{Total Drives}} $$
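One detail matters when turning these formulas into code: nflfastR's `fixed_drive` numbers restart in every game, so a drive must be identified by game, team, and drive number together. The plain-Python sketch below applies both formulas to hypothetical plays. The result labels mirror `fixed_drive_result`, and the 7/3 point values are the same approximation described above.

```python
from collections import defaultdict

# Points credited per drive result (approximation: a touchdown counts as 7)
DRIVE_POINTS = {"Touchdown": 7, "Field goal": 3}

# Hypothetical plays: (game_id, posteam, fixed_drive, fixed_drive_result)
plays = [
    ("G1", "TEAM_A", 1, "Touchdown"), ("G1", "TEAM_A", 1, "Touchdown"),
    ("G1", "TEAM_A", 3, "Punt"),
    ("G2", "TEAM_A", 1, "Field goal"), ("G2", "TEAM_A", 1, "Field goal"),
]


def scoring_efficiency(plays):
    """Return {team: (points_per_play, points_per_drive)}.

    Drives are keyed by (game_id, posteam, fixed_drive); keying on
    (posteam, fixed_drive) alone would merge drive 1 of G1 with drive 1 of G2.
    """
    drives = {}  # drive key -> [result, play count]
    for game_id, team, drive, result in plays:
        key = (game_id, team, drive)
        if key not in drives:
            drives[key] = [result, 0]
        drives[key][1] += 1

    totals = defaultdict(lambda: [0, 0, 0])  # team -> [points, plays, drives]
    for (_, team, _), (result, n_plays) in drives.items():
        totals[team][0] += DRIVE_POINTS.get(result, 0)
        totals[team][1] += n_plays
        totals[team][2] += 1

    return {team: (pts / n_plays, pts / n_drives)
            for team, (pts, n_plays, n_drives) in totals.items()}


print(scoring_efficiency(plays))  # 10 points on 5 plays and 3 drives
```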


#| label: scoring-efficiency-r
#| message: false
#| warning: false

# Calculate scoring efficiency
scoring_efficiency <- pbp_2023 %>%
  filter(!is.na(posteam), play_type %in% c("pass", "run")) %>%
  group_by(game_id, posteam, fixed_drive) %>%  # drive numbers restart every game
  summarise(
    plays = n(),
    drive_points = last(fixed_drive_result),
    .groups = "drop"
  ) %>%
  mutate(
    points = case_when(
      drive_points == "Touchdown" ~ 7,
      drive_points == "Field goal" ~ 3,
      TRUE ~ 0
    )
  ) %>%
  group_by(posteam) %>%
  summarise(
    total_plays = sum(plays),
    total_drives = n(),
    total_points = sum(points),
    points_per_play = total_points / total_plays,
    points_per_drive = total_points / total_drives,
    .groups = "drop"
  ) %>%
  arrange(desc(points_per_drive))

# Display results
scoring_efficiency %>%
  head(10) %>%
  gt() %>%
  cols_label(
    posteam = "Team",
    total_plays = "Plays",
    total_drives = "Drives",
    total_points = "Points",
    points_per_play = "Pts/Play",
    points_per_drive = "Pts/Drive"
  ) %>%
  fmt_number(
    columns = c(points_per_play, points_per_drive),
    decimals = 3
  ) %>%
  fmt_number(
    columns = c(total_plays, total_drives, total_points),
    decimals = 0,
    use_seps = TRUE
  ) %>%
  tab_header(
    title = "Scoring Efficiency Leaders",
    subtitle = "2023 NFL Season"
  )

#| label: scoring-efficiency-py
#| message: false
#| warning: false

# Calculate scoring efficiency
def calculate_drive_points(result):
    if result == 'Touchdown':
        return 7
    elif result == 'Field goal':
        return 3
    else:
        return 0

scoring_data = (pbp_2023
    .query("posteam.notna() & play_type.isin(['pass', 'run'])")
    .copy()
)

# Calculate points per drive
drive_summary = (scoring_data
    .groupby(['game_id', 'posteam', 'fixed_drive'])  # drive numbers restart every game
    .agg(
        plays=('play_id', 'count'),
        drive_result=('fixed_drive_result', 'last')
    )
    .reset_index()
)

drive_summary['points'] = drive_summary['drive_result'].apply(calculate_drive_points)

scoring_efficiency = (drive_summary
    .groupby('posteam')
    .agg(
        total_plays=('plays', 'sum'),
        total_drives=('fixed_drive', 'count'),
        total_points=('points', 'sum')
    )
    .reset_index()
)

scoring_efficiency['points_per_play'] = scoring_efficiency['total_points'] / scoring_efficiency['total_plays']
scoring_efficiency['points_per_drive'] = scoring_efficiency['total_points'] / scoring_efficiency['total_drives']

# Display top 10
scoring_top10 = (scoring_efficiency
    .sort_values('points_per_drive', ascending=False)
    .head(10)
)

print("\nScoring Efficiency Leaders - 2023 NFL Season")
print("=" * 75)
print(scoring_top10.to_string(index=False))

Offensive Formations and Personnel Groupings

Understanding offensive formations and personnel groupings is crucial for analyzing play-calling and scheme effectiveness.

Personnel Groupings

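Personnel groupings describe the combination of skill position players on the field. The two-digit code gives the number of running backs, then the number of tight ends; with five offensive linemen and a quarterback on the field, the remaining skill players are wide receivers:

- 11 Personnel: 1 RB, 1 TE, 3 WR (most common in the modern NFL)
- 12 Personnel: 1 RB, 2 TE, 2 WR
- 21 Personnel: 2 RB, 1 TE, 2 WR
- 22 Personnel: 2 RB, 2 TE, 1 WR
- 10 Personnel: 1 RB, 0 TE, 4 WR (spread/empty sets)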
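The personnel notation is mechanical enough to decode in code. The helper below is hypothetical (it is not part of nflfastR or nfl_data_py) and assumes the usual convention of five offensive linemen, so the five remaining skill players split into running backs, tight ends, and wide receivers. Jumbo packages with an extra lineman fall outside this assumption.

```python
def decode_personnel(code: str) -> dict:
    """Translate a two-digit personnel code such as "11" into position counts.

    Assumes first digit = running backs, second digit = tight ends, and that
    the rest of the five non-QB skill players are wide receivers.
    """
    if len(code) != 2 or not code.isdigit():
        raise ValueError(f"expected a two-digit code, got {code!r}")
    rb, te = int(code[0]), int(code[1])
    wr = 5 - rb - te
    if wr < 0:
        raise ValueError(f"{code!r} implies more than five skill players")
    return {"RB": rb, "TE": te, "WR": wr}


for code in ["11", "12", "21", "22", "10"]:
    print(code, decode_personnel(code))
```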


#| label: personnel-analysis-r
#| message: false
#| warning: false

# Analyze personnel grouping usage
personnel_usage <- pbp_2023 %>%
  filter(!is.na(posteam), !is.na(personnel_o), play_type %in% c("pass", "run")) %>%
  group_by(personnel_o) %>%
  summarise(
    plays = n(),
    pass_plays = sum(play_type == "pass"),
    run_plays = sum(play_type == "run"),
    pass_rate = pass_plays / plays,
    avg_epa = mean(epa, na.rm = TRUE),
    success_rate = mean(epa > 0, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(plays)) %>%
  head(10)

# Display results
personnel_usage %>%
  gt() %>%
  cols_label(
    personnel_o = "Personnel",
    plays = "Total Plays",
    pass_plays = "Pass",
    run_plays = "Run",
    pass_rate = "Pass %",
    avg_epa = "Avg EPA",
    success_rate = "Success %"
  ) %>%
  fmt_percent(
    columns = c(pass_rate, success_rate),
    decimals = 1
  ) %>%
  fmt_number(
    columns = avg_epa,
    decimals = 3
  ) %>%
  fmt_number(
    columns = c(plays, pass_plays, run_plays),
    decimals = 0,
    use_seps = TRUE
  ) %>%
  tab_header(
    title = "Personnel Grouping Usage and Efficiency",
    subtitle = "2023 NFL Season - Top 10 Most Common"
  )

#| label: personnel-analysis-py
#| message: false
#| warning: false

# Analyze personnel grouping usage
personnel_data = pbp_2023.query(
    "posteam.notna() & personnel_o.notna() & play_type.isin(['pass', 'run'])"
).copy()

personnel_usage = (personnel_data
    .groupby('personnel_o')
    .agg(
        plays=('play_id', 'count'),
        pass_plays=('play_type', lambda x: (x == 'pass').sum()),
        run_plays=('play_type', lambda x: (x == 'run').sum()),
        avg_epa=('epa', 'mean'),
        success_rate=('epa', lambda x: (x.dropna() > 0).mean())  # skip missing EPA, like na.rm = TRUE
    )
    .reset_index()
)

personnel_usage['pass_rate'] = personnel_usage['pass_plays'] / personnel_usage['plays']

# Sort and display top 10
personnel_top10 = (personnel_usage
    .sort_values('plays', ascending=False)
    .head(10)
)

print("\nPersonnel Grouping Usage and Efficiency - 2023 NFL Season")
print("=" * 85)
print(personnel_top10.to_string(index=False))

Formation Analysis

While play-by-play data doesn't always include detailed formation information, we can infer formation tendencies from personnel groupings and down-distance situations.

Formation Tendencies

Teams often use specific formations in predictable situations. For example, 21 and 22 personnel are typically used in short-yardage and goal-line situations, while 11 personnel dominates in obvious passing situations.
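A tendency like this can be checked by tabulating personnel usage within down-and-distance buckets. The plain-Python sketch below computes each grouping's share of plays per distance bucket; the bucket cutoffs and the plays are made-up assumptions chosen only to illustrate the calculation.

```python
from collections import Counter, defaultdict


def distance_bucket(ydstogo: int) -> str:
    """Group yards to go into coarse situations (cutoffs are illustrative)."""
    if ydstogo <= 2:
        return "short (1-2)"
    if ydstogo <= 6:
        return "medium (3-6)"
    return "long (7+)"


def personnel_share_by_bucket(plays):
    """Return {bucket: {personnel: share of that bucket's plays}}."""
    counts = defaultdict(Counter)
    for personnel, ydstogo in plays:
        counts[distance_bucket(ydstogo)][personnel] += 1
    return {
        bucket: {p: n / sum(c.values()) for p, n in c.items()}
        for bucket, c in counts.items()
    }


# Hypothetical plays: (personnel, yards to go)
plays = [("22", 1), ("21", 2), ("11", 1), ("22", 1),
         ("11", 10), ("11", 8), ("12", 9), ("11", 4)]
shares = personnel_share_by_bucket(plays)
print(shares["short (1-2)"])
```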

Play-Calling Balance and Tendencies

Pass-Run Balance

The balance between passing and rushing attempts is a fundamental aspect of offensive strategy:
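At its core, the balance calculation is a pass rate plus a mean EPA for each play type. The standard-library sketch below shows that logic on hypothetical plays for one offense. A missing EPA is represented as `None` (an assumption of this sketch); such plays still count toward the pass rate but are skipped when averaging, mirroring `na.rm = TRUE` in the R code.

```python
from statistics import mean


def play_balance(plays):
    """Summarize pass rate and mean EPA by play type for one offense.

    plays: list of (play_type, epa) tuples with play_type in {"pass", "run"}
    and epa either a number or None when missing.
    """
    n_pass = sum(1 for t, _ in plays if t == "pass")
    pass_epa = [e for t, e in plays if t == "pass" and e is not None]
    run_epa = [e for t, e in plays if t == "run" and e is not None]
    return {
        "pass_rate": n_pass / len(plays),
        "pass_epa": mean(pass_epa) if pass_epa else None,
        "run_epa": mean(run_epa) if run_epa else None,
    }


# Hypothetical plays for one offense
plays = [("pass", 0.5), ("pass", -0.25), ("run", 0.25), ("pass", None), ("run", -0.5)]
print(play_balance(plays))
```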


#| label: play-calling-balance-r
#| message: false
#| warning: false

# Calculate play-calling balance by team
play_balance <- pbp_2023 %>%
  filter(!is.na(posteam), play_type %in% c("pass", "run")) %>%
  group_by(posteam) %>%
  summarise(
    total_plays = n(),
    pass_plays = sum(play_type == "pass"),
    run_plays = sum(play_type == "run"),
    pass_rate = pass_plays / total_plays,
    pass_epa = mean(epa[play_type == "pass"], na.rm = TRUE),
    run_epa = mean(epa[play_type == "run"], na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(pass_rate))

# Display results
play_balance %>%
  head(10) %>%
  gt() %>%
  cols_label(
    posteam = "Team",
    total_plays = "Total Plays",
    pass_plays = "Pass Plays",
    run_plays = "Run Plays",
    pass_rate = "Pass Rate",
    pass_epa = "Pass EPA",
    run_epa = "Run EPA"
  ) %>%
  fmt_percent(
    columns = pass_rate,
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(pass_epa, run_epa),
    decimals = 3
  ) %>%
  fmt_number(
    columns = c(total_plays, pass_plays, run_plays),
    decimals = 0,
    use_seps = TRUE
  ) %>%
  tab_header(
    title = "Most Pass-Heavy Offenses",
    subtitle = "2023 NFL Season"
  )

#| label: play-calling-balance-py
#| message: false
#| warning: false

# Calculate play-calling balance by team
balance_data = pbp_2023.query(
    "posteam.notna() & play_type.isin(['pass', 'run'])"
).copy()

play_balance = (balance_data
    .groupby('posteam')
    .agg(
        total_plays=('play_id', 'count'),
        pass_plays=('play_type', lambda x: (x == 'pass').sum()),
        run_plays=('play_type', lambda x: (x == 'run').sum())
    )
    .reset_index()
)

# Calculate pass EPA and run EPA separately
pass_epa = (balance_data.query("play_type == 'pass'")
    .groupby('posteam')['epa'].mean()
    .rename('pass_epa'))

run_epa = (balance_data.query("play_type == 'run'")
    .groupby('posteam')['epa'].mean()
    .rename('run_epa'))

play_balance = play_balance.join(pass_epa, on='posteam').join(run_epa, on='posteam')
play_balance['pass_rate'] = play_balance['pass_plays'] / play_balance['total_plays']

# Display top 10 most pass-heavy
balance_top10 = (play_balance
    .sort_values('pass_rate', ascending=False)
    .head(10)
)

print("\nMost Pass-Heavy Offenses - 2023 NFL Season")
print("=" * 85)
print(balance_top10.to_string(index=False))

Situational Tendencies

Play-calling varies dramatically by down and distance. Understanding these tendencies is crucial for both offense and defense:
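Because tendencies depend on distance as well as down (3rd-and-2 and 3rd-and-12 call for very different plays), it often helps to split by both. The plain-Python sketch below computes pass rate by down-and-distance situation; the short/medium/long cutoffs and the plays are illustrative assumptions.

```python
from collections import defaultdict

SUFFIX = {1: "st", 2: "nd", 3: "rd", 4: "th"}


def situation_label(down: int, ydstogo: int) -> str:
    """Label such as '3rd & long'; the 3- and 6-yard cutoffs are illustrative."""
    if ydstogo <= 3:
        dist = "short"
    elif ydstogo <= 6:
        dist = "medium"
    else:
        dist = "long"
    return f"{down}{SUFFIX[down]} & {dist}"


def pass_rate_by_situation(plays):
    """plays: iterable of (down, ydstogo, play_type). Returns {label: pass rate}."""
    tallies = defaultdict(lambda: [0, 0])  # label -> [pass plays, total plays]
    for down, ydstogo, play_type in plays:
        label = situation_label(down, ydstogo)
        tallies[label][0] += int(play_type == "pass")
        tallies[label][1] += 1
    return {label: p / n for label, (p, n) in tallies.items()}


# Hypothetical plays: (down, yards to go, play type)
plays = [(1, 10, "run"), (1, 10, "pass"), (2, 8, "pass"),
         (3, 2, "run"), (3, 9, "pass"), (3, 12, "pass")]
print(pass_rate_by_situation(plays))
```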


#| label: situational-tendencies-r
#| message: false
#| warning: false

# Analyze play-calling by down
down_tendencies <- pbp_2023 %>%
  filter(!is.na(down), play_type %in% c("pass", "run")) %>%
  mutate(
    down_label = paste0(down,
                       case_when(down == 1 ~ "st",
                                down == 2 ~ "nd",
                                down == 3 ~ "rd",
                                down == 4 ~ "th"),
                       " Down")
  ) %>%
  group_by(down_label) %>%
  summarise(
    plays = n(),
    pass_plays = sum(play_type == "pass"),
    run_plays = sum(play_type == "run"),
    pass_rate = pass_plays / plays,
    avg_yards_to_go = mean(ydstogo, na.rm = TRUE),
    .groups = "drop"
  )

# Display results
down_tendencies %>%
  gt() %>%
  cols_label(
    down_label = "Down",
    plays = "Total Plays",
    pass_plays = "Pass Plays",
    run_plays = "Run Plays",
    pass_rate = "Pass Rate",
    avg_yards_to_go = "Avg Yards To Go"
  ) %>%
  fmt_percent(
    columns = pass_rate,
    decimals = 1
  ) %>%
  fmt_number(
    columns = avg_yards_to_go,
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(plays, pass_plays, run_plays),
    decimals = 0,
    use_seps = TRUE
  ) %>%
  tab_header(
    title = "Play-Calling Tendencies by Down",
    subtitle = "2023 NFL Season"
  )

#| label: situational-tendencies-py
#| message: false
#| warning: false

# Analyze play-calling by down
down_data = pbp_2023.query(
    "down.notna() & play_type.isin(['pass', 'run'])"
).copy()

def get_down_suffix(down):
    suffixes = {1: 'st', 2: 'nd', 3: 'rd', 4: 'th'}
    return f"{int(down)}{suffixes.get(int(down), 'th')} Down"

down_data['down_label'] = down_data['down'].apply(get_down_suffix)

down_tendencies = (down_data
    .groupby('down_label')
    .agg(
        plays=('play_id', 'count'),
        pass_plays=('play_type', lambda x: (x == 'pass').sum()),
        run_plays=('play_type', lambda x: (x == 'run').sum()),
        avg_yards_to_go=('ydstogo', 'mean')
    )
    .reset_index()
)

down_tendencies['pass_rate'] = down_tendencies['pass_plays'] / down_tendencies['plays']

print("\nPlay-Calling Tendencies by Down - 2023 NFL Season")
print("=" * 80)
print(down_tendencies.to_string(index=False))

Explosive Plays

Explosive plays are game-changing gains that sharply shift field position and scoring probability. A common definition is:

- Passing: 20+ yards
- Rushing: 10+ yards (some analysts use 15+)
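Because the rushing cutoff varies between analysts, it is convenient to make it a parameter. The sketch below (plain Python, hypothetical plays) flags explosive plays under the definition above and shows how the rate changes when the rushing threshold moves from 10 to 15 yards.

```python
def is_explosive(play_type: str, yards_gained: float, run_threshold: int = 10) -> bool:
    """Flag explosive plays: passes of 20+ yards, runs of run_threshold+ yards.

    Play types other than "pass" and "run" are never explosive.
    """
    if play_type == "pass":
        return yards_gained >= 20
    if play_type == "run":
        return yards_gained >= run_threshold
    return False


def explosive_rate(plays, run_threshold: int = 10) -> float:
    """Share of (play_type, yards_gained) plays that are explosive."""
    flags = [is_explosive(t, y, run_threshold) for t, y in plays]
    return sum(flags) / len(flags)


# Hypothetical plays
plays = [("pass", 25), ("pass", 8), ("run", 12), ("run", 3)]
print(explosive_rate(plays))                    # 0.5: the 12-yard run counts
print(explosive_rate(plays, run_threshold=15))  # 0.25: it no longer counts
```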

Why Explosive Plays Matter

Explosive play rate is widely regarded as one of the strongest indicators of offensive success: teams that generate more explosive plays tend to score more points and win more games.

#| label: explosive-plays-r
#| message: false
#| warning: false

# Calculate explosive play rates
explosive_plays <- pbp_2023 %>%
  filter(!is.na(posteam), play_type %in% c("pass", "run")) %>%
  mutate(
    explosive = case_when(
      play_type == "pass" & yards_gained >= 20 ~ 1,
      play_type == "run" & yards_gained >= 10 ~ 1,
      TRUE ~ 0
    ),
    explosive_pass = if_else(play_type == "pass" & yards_gained >= 20, 1, 0),
    explosive_run = if_else(play_type == "run" & yards_gained >= 10, 1, 0)
  ) %>%
  group_by(posteam) %>%
  summarise(
    total_plays = n(),
    explosive_plays = sum(explosive),
    explosive_rate = explosive_plays / total_plays,
    pass_plays = sum(play_type == "pass"),
    explosive_pass_plays = sum(explosive_pass),
    explosive_pass_rate = explosive_pass_plays / pass_plays,
    run_plays = sum(play_type == "run"),
    explosive_run_plays = sum(explosive_run),
    explosive_run_rate = explosive_run_plays / run_plays,
    .groups = "drop"
  ) %>%
  arrange(desc(explosive_rate))

# Display results
explosive_plays %>%
  head(10) %>%
  gt() %>%
  cols_label(
    posteam = "Team",
    total_plays = "Plays",
    explosive_plays = "Explosive",
    explosive_rate = "Overall %",
    explosive_pass_rate = "Pass %",
    explosive_run_rate = "Run %"
  ) %>%
  fmt_percent(
    columns = c(explosive_rate, explosive_pass_rate, explosive_run_rate),
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(total_plays, explosive_plays),
    decimals = 0,
    use_seps = TRUE
  ) %>%
  tab_header(
    title = "Explosive Play Rate Leaders",
    subtitle = "2023 NFL Season - Pass: 20+ yds, Run: 10+ yds"
  )

#| label: explosive-plays-py
#| message: false
#| warning: false

# Calculate explosive play rates
explosive_data = pbp_2023.query(
    "posteam.notna() & play_type.isin(['pass', 'run'])"
).copy()

# Flag explosive plays: passes of 20+ yards, runs of 10+ yards
explosive_data['explosive_pass'] = ((explosive_data['play_type'] == 'pass') &
                                     (explosive_data['yards_gained'] >= 20)).astype(int)
explosive_data['explosive_run'] = ((explosive_data['play_type'] == 'run') &
                                    (explosive_data['yards_gained'] >= 10)).astype(int)

# A play is explosive if it meets either threshold (vectorized instead of a row-wise apply)
explosive_data['explosive'] = explosive_data['explosive_pass'] | explosive_data['explosive_run']

explosive_summary = (explosive_data
    .groupby('posteam')
    .agg(
        total_plays=('play_id', 'count'),
        explosive_plays=('explosive', 'sum'),
        pass_plays=('play_type', lambda x: (x == 'pass').sum()),
        explosive_pass_plays=('explosive_pass', 'sum'),
        run_plays=('play_type', lambda x: (x == 'run').sum()),
        explosive_run_plays=('explosive_run', 'sum')
    )
    .reset_index()
)

explosive_summary['explosive_rate'] = explosive_summary['explosive_plays'] / explosive_summary['total_plays']
explosive_summary['explosive_pass_rate'] = explosive_summary['explosive_pass_plays'] / explosive_summary['pass_plays']
explosive_summary['explosive_run_rate'] = explosive_summary['explosive_run_plays'] / explosive_summary['run_plays']

# Display top 10
explosive_top10 = (explosive_summary
    .sort_values('explosive_rate', ascending=False)
    .head(10)
    [['posteam', 'total_plays', 'explosive_plays', 'explosive_rate',
      'explosive_pass_rate', 'explosive_run_rate']]
)

print("\nExplosive Play Rate Leaders - 2023 NFL Season")
print("Pass: 20+ yds, Run: 10+ yds")
print("=" * 90)
print(explosive_top10.to_string(index=False))

Visualizing Explosive Play Rates


#| label: fig-explosive-plays-r
#| fig-cap: "Explosive play rates by team (2023 season)"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false

library(nflplotR)

# Prepare data for visualization
explosive_viz <- explosive_plays %>%
  select(posteam, explosive_pass_rate, explosive_run_rate) %>%
  pivot_longer(
    cols = c(explosive_pass_rate, explosive_run_rate),
    names_to = "type",
    values_to = "rate"
  ) %>%
  mutate(
    type = if_else(type == "explosive_pass_rate", "Passing (20+ yds)", "Rushing (10+ yds)")
  )

# Create plot
ggplot(explosive_viz, aes(x = reorder(posteam, rate), y = rate, fill = type)) +
  geom_col(position = "dodge", alpha = 0.8) +
  coord_flip() +
  scale_y_continuous(labels = scales::percent_format(), breaks = seq(0, 0.20, 0.05)) +
  scale_fill_manual(values = c("Passing (20+ yds)" = "#00BFC4", "Rushing (10+ yds)" = "#F8766D")) +
  labs(
    title = "Explosive Play Rates by Team",
    subtitle = "2023 NFL Season | Passing: 20+ yards, Rushing: 10+ yards",
    x = "Team",
    y = "Explosive Play Rate",
    fill = "Play Type",
    caption = "Data: nflfastR"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    plot.subtitle = element_text(size = 12),
    axis.text = element_text(size = 9),
    legend.position = "top"
  )

#| label: fig-explosive-plays-py
#| fig-cap: "Explosive play rates by team - Python (2023 season)"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false

import matplotlib.pyplot as plt

# Prepare data for visualization
explosive_viz = explosive_summary.copy()
explosive_viz = explosive_viz.sort_values('explosive_rate')

# Create plot
fig, ax = plt.subplots(figsize=(12, 8))

x = range(len(explosive_viz))
width = 0.35

bars1 = ax.barh([i - width/2 for i in x], explosive_viz['explosive_pass_rate'],
                width, label='Passing (20+ yds)', color='#00BFC4', alpha=0.8)
bars2 = ax.barh([i + width/2 for i in x], explosive_viz['explosive_run_rate'],
                width, label='Rushing (10+ yds)', color='#F8766D', alpha=0.8)

ax.set_yticks(x)
ax.set_yticklabels(explosive_viz['posteam'], fontsize=9)
ax.set_xlabel('Explosive Play Rate', fontsize=12)
ax.set_ylabel('Team', fontsize=12)
ax.set_title('Explosive Play Rates by Team\n2023 NFL Season | Passing: 20+ yards, Rushing: 10+ yards',
             fontsize=16, fontweight='bold', pad=20)
ax.legend(title='Play Type', loc='lower right', fontsize=10)
ax.set_xlim(0, 0.20)
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'{x:.0%}'))
ax.grid(axis='x', alpha=0.3)

plt.text(0.98, 0.02, 'Data: nfl_data_py',
         transform=ax.transAxes,
         ha='right', fontsize=8, style='italic')
plt.tight_layout()
plt.show()


Offensive Line Metrics

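Traditional statistics say little about the offensive line, yet it is fundamental to offensive success. With play-by-play data alone, we can approximate its performance through sack rate (pass protection) and rushing efficiency (run blocking).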

Sacks and Pressure

Sack rate measures how often the quarterback is brought down behind the line of scrimmage:

$$ \text{Sack Rate} = \frac{\text{Sacks}}{\text{Pass Attempts} + \text{Sacks}} $$
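As a quick worked example, the function below applies the sack-rate formula to hypothetical season totals. Counting sacks in the denominator matters: a sack is a dropback that never produced a pass attempt, so dividing by attempts alone would overstate the rate (40 / 560 is about 7.1%, versus 6.7% with sacks included).

```python
def sack_rate(sacks: int, pass_attempts: int) -> float:
    """Sacks divided by dropbacks, where dropbacks = pass attempts + sacks."""
    dropbacks = pass_attempts + sacks
    if dropbacks == 0:
        raise ValueError("no dropbacks")
    return sacks / dropbacks


# Hypothetical season totals: 40 sacks on 560 pass attempts
print(f"{sack_rate(40, 560):.1%}")  # 6.7%  (40 / 600)
```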


#| label: sack-analysis-r
#| message: false
#| warning: false

# Calculate sack rates
sack_rates <- pbp_2023 %>%
  filter(!is.na(posteam), (play_type %in% c("pass") | sack == 1)) %>%
  group_by(posteam) %>%
  summarise(
    pass_attempts = n(),  # dropbacks: nflfastR codes sacks as play_type "pass"
    sacks = sum(sack == 1, na.rm = TRUE),
    sack_rate = sacks / pass_attempts,
    sack_yards_lost = sum(yards_gained[sack == 1], na.rm = TRUE),
    avg_sack_yards = if_else(sacks > 0, sack_yards_lost / sacks, 0),
    .groups = "drop"
  ) %>%
  arrange(sack_rate)

# Display best offensive lines (lowest sack rate)
sack_rates %>%
  head(10) %>%
  gt() %>%
  cols_label(
    posteam = "Team",
    pass_attempts = "Pass Attempts",
    sacks = "Sacks",
    sack_rate = "Sack Rate",
    sack_yards_lost = "Yards Lost",
    avg_sack_yards = "Avg Sack Yds"
  ) %>%
  fmt_percent(
    columns = sack_rate,
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(avg_sack_yards, sack_yards_lost),
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(pass_attempts, sacks),
    decimals = 0,
    use_seps = TRUE
  ) %>%
  tab_header(
    title = "Best Pass Protection (Lowest Sack Rate)",
    subtitle = "2023 NFL Season"
  )

#| label: sack-analysis-py
#| message: false
#| warning: false

# Calculate sack rates
sack_data = pbp_2023.query(
    "posteam.notna() & (play_type == 'pass' | sack == 1)"
).copy()

sack_summary = (sack_data
    .groupby('posteam')
    .agg(
        pass_attempts=('play_id', 'count'),  # dropbacks: sacks are coded as play_type 'pass'
        sacks=('sack', 'sum')
    )
    .reset_index()
)

# Calculate sack yards lost
sack_yards = (sack_data.query("sack == 1")
    .groupby('posteam')
    .agg(
        sack_yards_lost=('yards_gained', 'sum')
    )
)

sack_summary = sack_summary.join(sack_yards, on='posteam')
sack_summary['sack_yards_lost'] = sack_summary['sack_yards_lost'].fillna(0)
sack_summary['sack_rate'] = sack_summary['sacks'] / sack_summary['pass_attempts']
sack_summary['avg_sack_yards'] = np.where(
    sack_summary['sacks'] > 0,
    sack_summary['sack_yards_lost'] / sack_summary['sacks'],
    0
)

# Display best pass protection
sack_best = (sack_summary
    .sort_values('sack_rate')
    .head(10)
)

print("\nBest Pass Protection (Lowest Sack Rate) - 2023 NFL Season")
print("=" * 75)
print(sack_best.to_string(index=False))

Run Blocking Efficiency

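Individual run-blocking grades require tracking or charting data, but we can approximate a team's run blocking with yards per carry, success rate, stuffed rate (the share of carries gaining zero or negative yards), and explosive runs. These team rushing numbers also reflect the running backs and the scheme, so they are proxies rather than pure blocking measures: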
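Because yards per carry is an average, a single long run can move it a lot, while stuffed rate ignores how far the successful runs went. The sketch below computes the three yardage-based measures used in the code (success rate needs EPA, so it is left out) on a hypothetical list of carries, with and without one 45-yard run.

```python
def run_blocking_summary(rush_yards):
    """Team rushing proxies from a list of yards gained on each carry."""
    carries = len(rush_yards)
    return {
        "yards_per_carry": sum(rush_yards) / carries,
        "stuffed_rate": sum(1 for y in rush_yards if y <= 0) / carries,
        "explosive_runs": sum(1 for y in rush_yards if y >= 10),
    }


# Hypothetical carries: one 45-yard run among mostly short gains
carries = [3, -1, 0, 4, 2, 45, 1, 5]
print(run_blocking_summary(carries))                          # YPC 7.375
print(run_blocking_summary([y for y in carries if y != 45]))  # YPC 2.0
```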


#| label: run-blocking-r
#| message: false
#| warning: false

# Calculate run blocking efficiency
run_blocking <- pbp_2023 %>%
  filter(!is.na(posteam), play_type == "run") %>%
  group_by(posteam) %>%
  summarise(
    rush_attempts = n(),
    total_yards = sum(yards_gained, na.rm = TRUE),
    yards_per_carry = total_yards / rush_attempts,
    success_rate = mean(epa > 0, na.rm = TRUE),
    stuffed_rate = mean(yards_gained <= 0, na.rm = TRUE),
    explosive_runs = sum(yards_gained >= 10, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(yards_per_carry))

# Display results
run_blocking %>%
  head(10) %>%
  gt() %>%
  cols_label(
    posteam = "Team",
    rush_attempts = "Carries",
    total_yards = "Yards",
    yards_per_carry = "YPC",
    success_rate = "Success %",
    stuffed_rate = "Stuffed %",
    explosive_runs = "Explosive"
  ) %>%
  fmt_number(
    columns = yards_per_carry,
    decimals = 2
  ) %>%
  fmt_percent(
    columns = c(success_rate, stuffed_rate),
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(rush_attempts, total_yards, explosive_runs),
    decimals = 0,
    use_seps = TRUE
  ) %>%
  tab_header(
    title = "Run Blocking Efficiency Leaders",
    subtitle = "2023 NFL Season"
  )

#| label: run-blocking-py
#| message: false
#| warning: false

# Calculate run blocking efficiency
run_data = pbp_2023.query("posteam.notna() & play_type == 'run'").copy()

run_blocking = (run_data
    .groupby('posteam')
    .agg(
        rush_attempts=('play_id', 'count'),
        total_yards=('yards_gained', 'sum'),
        success_rate=('epa', lambda x: (x.dropna() > 0).mean()),  # skip missing EPA
        stuffed_rate=('yards_gained', lambda x: (x <= 0).mean()),
        explosive_runs=('yards_gained', lambda x: (x >= 10).sum())
    )
    .reset_index()
)

run_blocking['yards_per_carry'] = run_blocking['total_yards'] / run_blocking['rush_attempts']

# Display top 10
run_blocking_top10 = (run_blocking
    .sort_values('yards_per_carry', ascending=False)
    .head(10)
)

print("\nRun Blocking Efficiency Leaders - 2023 NFL Season")
print("=" * 85)
print(run_blocking_top10.to_string(index=False))

Red Zone Efficiency

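The red zone (inside the opponent's 20-yard line) is where offenses must convert drives into points. The gap between a touchdown and a field goal makes red zone efficiency a major driver of scoring. A red zone trip is counted once per drive, however many plays the drive runs inside the 20.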
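The trip-counting rule can be sketched in plain Python. The hypothetical helper below counts a drive once if any of its plays starts at or inside the opponent's 20 (`yardline_100 <= 20`), keys drives by game, team, and drive number, and uses the same 7/3 point approximation as the scoring-efficiency code. The plays are made up.

```python
from collections import defaultdict

DRIVE_POINTS = {"Touchdown": 7, "Field goal": 3}  # approximation: ignores PAT outcomes


def red_zone_summary(plays):
    """plays: (game_id, posteam, fixed_drive, yardline_100, fixed_drive_result).

    Returns {team: {"td_rate": ..., "points_per_trip": ...}}.
    """
    trips = {}  # (game_id, posteam, fixed_drive) -> drive result
    for game_id, team, drive, yardline_100, result in plays:
        if yardline_100 <= 20:
            trips.setdefault((game_id, team, drive), result)  # one trip per drive

    summary = defaultdict(lambda: {"trips": 0, "tds": 0, "points": 0})
    for (_, team, _), result in trips.items():
        s = summary[team]
        s["trips"] += 1
        s["tds"] += int(result == "Touchdown")
        s["points"] += DRIVE_POINTS.get(result, 0)
    return {team: {"td_rate": s["tds"] / s["trips"],
                   "points_per_trip": s["points"] / s["trips"]}
            for team, s in summary.items()}


# Hypothetical plays: G2 drive 4 never reaches the red zone
plays = [
    ("G1", "TEAM_A", 2, 45, "Touchdown"),
    ("G1", "TEAM_A", 2, 15, "Touchdown"),
    ("G1", "TEAM_A", 2, 3, "Touchdown"),
    ("G1", "TEAM_A", 5, 18, "Field goal"),
    ("G2", "TEAM_A", 2, 12, "Field goal"),
    ("G2", "TEAM_A", 4, 60, "Punt"),
]
print(red_zone_summary(plays))  # 3 trips: 1 TD, 2 FGs
```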

Red Zone Scoring


#| label: red-zone-r
#| message: false
#| warning: false

# Calculate red zone efficiency
red_zone <- pbp_2023 %>%
  filter(!is.na(posteam), yardline_100 <= 20, down %in% c(1, 2, 3, 4)) %>%
  group_by(game_id, posteam, fixed_drive) %>%  # drive numbers restart every game
  slice(1) %>%  # First play of each red zone drive
  ungroup() %>%
  group_by(posteam) %>%
  summarise(
    red_zone_trips = n(),
    touchdowns = sum(fixed_drive_result == "Touchdown", na.rm = TRUE),
    field_goals = sum(fixed_drive_result == "Field goal", na.rm = TRUE),
    scores = touchdowns + field_goals,
    td_rate = touchdowns / red_zone_trips,
    scoring_rate = scores / red_zone_trips,
    points = touchdowns * 7 + field_goals * 3,
    points_per_trip = points / red_zone_trips,
    .groups = "drop"
  ) %>%
  arrange(desc(td_rate))

# Display results
red_zone %>%
  head(10) %>%
  gt() %>%
  cols_label(
    posteam = "Team",
    red_zone_trips = "RZ Trips",
    touchdowns = "TDs",
    field_goals = "FGs",
    scores = "Scores",
    td_rate = "TD %",
    scoring_rate = "Score %",
    points_per_trip = "Pts/Trip"
  ) %>%
  fmt_percent(
    columns = c(td_rate, scoring_rate),
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(points_per_trip),
    decimals = 2
  ) %>%
  fmt_number(
    columns = c(red_zone_trips, touchdowns, field_goals, scores),
    decimals = 0
  ) %>%
  tab_header(
    title = "Red Zone Efficiency Leaders",
    subtitle = "2023 NFL Season - Ranked by TD Rate"
  )

#| label: red-zone-py
#| message: false
#| warning: false

# Calculate red zone efficiency
red_zone_data = pbp_2023.query(
    "posteam.notna() & yardline_100 <= 20 & down.isin([1, 2, 3, 4])"
).copy()

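# Get the first red zone play of each drive
# (fixed_drive restarts in every game, so game_id is part of the drive key;
#  drop_duplicates keeps whole rows, unlike groupby().first())
red_zone_drives = (red_zone_data
    .sort_values(['posteam', 'game_id', 'fixed_drive', 'play_id'])
    .drop_duplicates(['posteam', 'game_id', 'fixed_drive'])
)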

red_zone_summary = (red_zone_drives
    .groupby('posteam')
    .agg(
        red_zone_trips=('fixed_drive', 'count'),
        touchdowns=('fixed_drive_result', lambda x: (x == 'Touchdown').sum()),
        field_goals=('fixed_drive_result', lambda x: (x == 'Field goal').sum())
    )
    .reset_index()
)

red_zone_summary['scores'] = red_zone_summary['touchdowns'] + red_zone_summary['field_goals']
red_zone_summary['td_rate'] = red_zone_summary['touchdowns'] / red_zone_summary['red_zone_trips']
red_zone_summary['scoring_rate'] = red_zone_summary['scores'] / red_zone_summary['red_zone_trips']
red_zone_summary['points'] = red_zone_summary['touchdowns'] * 7 + red_zone_summary['field_goals'] * 3
red_zone_summary['points_per_trip'] = red_zone_summary['points'] / red_zone_summary['red_zone_trips']

# Display top 10
red_zone_top10 = (red_zone_summary
    .sort_values('td_rate', ascending=False)
    .head(10)
)

print("\nRed Zone Efficiency Leaders - 2023 NFL Season")
print("Ranked by TD Rate")
print("=" * 90)
print(red_zone_top10.to_string(index=False))

Red Zone Play Calling


#| label: fig-red-zone-tendencies-r
#| fig-cap: "Red zone scoring rates by team (2023 season)"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false

# Create visualization
red_zone %>%
  mutate(
    td_pct = td_rate * 100,
    fg_pct = (field_goals / red_zone_trips) * 100,
    fail_pct = ((red_zone_trips - scores) / red_zone_trips) * 100,
    # Order teams by touchdown rate before reshaping to long format
    posteam = reorder(posteam, td_pct)
  ) %>%
  select(posteam, td_pct, fg_pct, fail_pct) %>%
  pivot_longer(
    cols = c(td_pct, fg_pct, fail_pct),
    names_to = "outcome",
    values_to = "percentage"
  ) %>%
  mutate(
    outcome = factor(outcome,
                     levels = c("fail_pct", "fg_pct", "td_pct"),
                     labels = c("No Score", "Field Goal", "Touchdown"))
  ) %>%
  ggplot(aes(x = posteam, y = percentage, fill = outcome)) +
  geom_col() +
  coord_flip() +
  scale_fill_manual(
    values = c("Touchdown" = "#4CAF50", "Field Goal" = "#FF9800", "No Score" = "#F44336")
  ) +
  labs(
    title = "Red Zone Outcomes by Team",
    subtitle = "2023 NFL Season - Percentage of Red Zone Trips",
    x = "Team",
    y = "Percentage of Red Zone Trips",
    fill = "Outcome",
    caption = "Data: nflfastR"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    plot.subtitle = element_text(size = 12),
    axis.text = element_text(size = 9),
    legend.position = "top"
  )

#| label: fig-red-zone-tendencies-py
#| fig-cap: "Red zone scoring rates by team - Python (2023 season)"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false

# Prepare data for stacked bar chart
red_zone_viz = red_zone_summary.copy()
red_zone_viz['td_pct'] = (red_zone_viz['touchdowns'] / red_zone_viz['red_zone_trips']) * 100
red_zone_viz['fg_pct'] = (red_zone_viz['field_goals'] / red_zone_viz['red_zone_trips']) * 100
red_zone_viz['fail_pct'] = ((red_zone_viz['red_zone_trips'] - red_zone_viz['scores']) /
                            red_zone_viz['red_zone_trips']) * 100

red_zone_viz = red_zone_viz.sort_values('td_pct')

# Create stacked bar chart
fig, ax = plt.subplots(figsize=(12, 8))

teams = red_zone_viz['posteam']
y_pos = range(len(teams))

ax.barh(y_pos, red_zone_viz['td_pct'], label='Touchdown', color='#4CAF50')
ax.barh(y_pos, red_zone_viz['fg_pct'], left=red_zone_viz['td_pct'],
        label='Field Goal', color='#FF9800')
ax.barh(y_pos, red_zone_viz['fail_pct'],
        left=red_zone_viz['td_pct'] + red_zone_viz['fg_pct'],
        label='No Score', color='#F44336')

ax.set_yticks(y_pos)
ax.set_yticklabels(teams, fontsize=9)
ax.set_xlabel('Percentage of Red Zone Trips', fontsize=12)
ax.set_ylabel('Team', fontsize=12)
ax.set_title('Red Zone Outcomes by Team\n2023 NFL Season - Percentage of Red Zone Trips',
             fontsize=16, fontweight='bold', pad=20)
ax.legend(title='Outcome', loc='lower right', fontsize=10)
ax.set_xlim(0, 100)
ax.grid(axis='x', alpha=0.3)

plt.text(0.98, 0.02, 'Data: nfl_data_py',
         transform=ax.transAxes,
         ha='right', fontsize=8, style='italic')
plt.tight_layout()
plt.show()

Third Down Conversions

Third down conversion rate is one of the most important offensive metrics. Converting third downs sustains drives and keeps the opposing defense on the field.
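The chapter groups third downs into distance buckets of 1-3, 4-6, 7-10, and 11+ yards. Those buckets can be built without a row-by-row helper. The sketch below is one way to do it. It assumes pandas' `pd.cut` with right-closed bins and a small hypothetical set of third-down plays, then computes conversion rate (conversions divided by attempts) for each bucket.

```python
import pandas as pd

# Hypothetical third-down plays: yards to go and whether the offense converted
third = pd.DataFrame({
    "ydstogo":              [1, 2, 3, 5, 6, 8, 10, 12, 15, 4],
    "third_down_converted": [1, 1, 0, 1, 0, 0, 1, 0, 0, 1],
})

# Right-closed bins: (0, 3] = 1-3 yds, (3, 6] = 4-6, (6, 10] = 7-10, (10, inf) = 11+
labels = ["Short (1-3 yds)", "Medium (4-6 yds)",
          "Long (7-10 yds)", "Very Long (11+ yds)"]
third["distance_category"] = pd.cut(
    third["ydstogo"], bins=[0, 3, 6, 10, float("inf")], labels=labels
)

# Conversion rate = conversions / attempts within each bucket (categories stay in order)
by_distance = (third
    .groupby("distance_category", observed=False)["third_down_converted"]
    .agg(attempts="count", conversions="sum")
)
by_distance["conversion_rate"] = by_distance["conversions"] / by_distance["attempts"]
print(by_distance)
```

Because the result is indexed by an ordered categorical, the buckets come out from shortest to longest without a separate sorting step.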


#| label: third-down-r
#| message: false
#| warning: false

# Calculate third down conversion rates
third_down <- pbp_2023 %>%
  filter(!is.na(posteam), down == 3, play_type %in% c("pass", "run")) %>%
  group_by(posteam) %>%
  summarise(
    third_downs = n(),
    conversions = sum(third_down_converted == 1, na.rm = TRUE),
    conversion_rate = conversions / third_downs,
    avg_distance = mean(ydstogo, na.rm = TRUE),
    avg_epa = mean(epa, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(conversion_rate))

# Display results
third_down %>%
  head(10) %>%
  gt() %>%
  cols_label(
    posteam = "Team",
    third_downs = "3rd Downs",
    conversions = "Conversions",
    conversion_rate = "Conv %",
    avg_distance = "Avg Distance",
    avg_epa = "Avg EPA"
  ) %>%
  fmt_percent(
    columns = conversion_rate,
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(avg_distance, avg_epa),
    decimals = 2
  ) %>%
  fmt_number(
    columns = c(third_downs, conversions),
    decimals = 0,
    use_seps = TRUE
  ) %>%
  tab_header(
    title = "Third Down Conversion Leaders",
    subtitle = "2023 NFL Season"
  )

#| label: third-down-py
#| message: false
#| warning: false

# Calculate third down conversion rates
third_down_data = pbp_2023.query(
    "posteam.notna() & down == 3 & play_type.isin(['pass', 'run'])"
).copy()

third_down_summary = (third_down_data
    .groupby('posteam')
    .agg(
        third_downs=('play_id', 'count'),
        conversions=('third_down_converted', 'sum'),
        avg_distance=('ydstogo', 'mean'),
        avg_epa=('epa', 'mean')
    )
    .reset_index()
)

third_down_summary['conversion_rate'] = third_down_summary['conversions'] / third_down_summary['third_downs']

# Display top 10
third_down_top10 = (third_down_summary
    .sort_values('conversion_rate', ascending=False)
    .head(10)
)

print("\nThird Down Conversion Leaders - 2023 NFL Season")
print("=" * 80)
print(third_down_top10.to_string(index=False))

Third Down by Distance


#| label: fig-third-down-distance-r
#| fig-cap: "Third down conversion rate by distance (2023 season)"
#| fig-width: 10
#| fig-height: 6
#| message: false
#| warning: false

# Calculate conversion rate by distance
third_down_distance <- pbp_2023 %>%
  filter(!is.na(posteam), down == 3, play_type %in% c("pass", "run")) %>%
  mutate(
    distance_category = case_when(
      ydstogo <= 3 ~ "Short (1-3 yds)",
      ydstogo <= 6 ~ "Medium (4-6 yds)",
      ydstogo <= 10 ~ "Long (7-10 yds)",
      TRUE ~ "Very Long (11+ yds)"
    ),
    distance_category = factor(distance_category,
                               levels = c("Short (1-3 yds)", "Medium (4-6 yds)",
                                        "Long (7-10 yds)", "Very Long (11+ yds)"))
  ) %>%
  group_by(distance_category) %>%
  summarise(
    attempts = n(),
    conversions = sum(third_down_converted == 1, na.rm = TRUE),
    conversion_rate = conversions / attempts,
    .groups = "drop"
  )

# Create visualization
ggplot(third_down_distance, aes(x = distance_category, y = conversion_rate)) +
  geom_col(fill = "#00BFC4", alpha = 0.8) +
  geom_text(aes(label = scales::percent(conversion_rate, accuracy = 0.1)),
            vjust = -0.5, fontface = "bold", size = 4) +
  scale_y_continuous(labels = scales::percent_format(), limits = c(0, 0.7)) +
  labs(
    title = "Third Down Conversion Rate by Distance",
    subtitle = "2023 NFL Season",
    x = "Distance to First Down",
    y = "Conversion Rate",
    caption = "Data: nflfastR"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(size = 11),
    axis.text.x = element_text(angle = 0, hjust = 0.5, size = 10)
  )


#| label: fig-third-down-distance-py
#| fig-cap: "Third down conversion rate by distance - Python (2023 season)"
#| fig-width: 10
#| fig-height: 6
#| message: false
#| warning: false

# Calculate conversion rate by distance
def categorize_distance(yds):
    if yds <= 3:
        return "Short (1-3 yds)"
    elif yds <= 6:
        return "Medium (4-6 yds)"
    elif yds <= 10:
        return "Long (7-10 yds)"
    else:
        return "Very Long (11+ yds)"

third_down_dist = pbp_2023.query(
    "posteam.notna() & down == 3 & play_type.isin(['pass', 'run'])"
).copy()

third_down_dist['distance_category'] = third_down_dist['ydstogo'].apply(categorize_distance)

distance_summary = (third_down_dist
    .groupby('distance_category')
    .agg(
        attempts=('play_id', 'count'),
        conversions=('third_down_converted', 'sum')
    )
    .reset_index()
)

distance_summary['conversion_rate'] = distance_summary['conversions'] / distance_summary['attempts']

# Order categories
category_order = ["Short (1-3 yds)", "Medium (4-6 yds)", "Long (7-10 yds)", "Very Long (11+ yds)"]
distance_summary['distance_category'] = pd.Categorical(
    distance_summary['distance_category'],
    categories=category_order,
    ordered=True
)
distance_summary = distance_summary.sort_values('distance_category')

# Create visualization
fig, ax = plt.subplots(figsize=(10, 6))

bars = ax.bar(distance_summary['distance_category'],
              distance_summary['conversion_rate'],
              color='#00BFC4', alpha=0.8)

# Add percentage labels on bars
for i, bar in enumerate(bars):
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width()/2., height,
            f'{height:.1%}',
            ha='center', va='bottom', fontweight='bold', fontsize=11)

ax.set_ylim(0, 0.7)
ax.set_ylabel('Conversion Rate', fontsize=12)
ax.set_xlabel('Distance to First Down', fontsize=12)
ax.set_title('Third Down Conversion Rate by Distance\n2023 NFL Season',
             fontsize=14, fontweight='bold', pad=20)
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, p: f'{y:.0%}'))
ax.grid(axis='y', alpha=0.3)

plt.text(0.98, 0.02, 'Data: nfl_data_py',
         transform=ax.transAxes,
         ha='right', fontsize=8, style='italic')
plt.tight_layout()
plt.show()


Time of Possession and Pace

Time of Possession

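Time of possession (TOP) measures how long an offense controls the ball. TOP does correlate with winning, but much of that relationship likely runs in reverse: teams protecting a lead run the ball and drain the clock, so a large TOP edge is often a result of winning rather than a cause. Even so, high TOP can reflect efficient, sustained drives, and it limits the number of possessions the opposing offense gets.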
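To compute TOP yourself, you first need possession time in seconds. In the nflfastR play-by-play data, drive-level fields such as `drive_time_of_possession` appear to be stored as "M:SS" text; verify this against your copy of the data. The sketch below uses hypothetical drives and an illustrative helper name, `clock_to_seconds`. It converts the strings to seconds and sums them per team.

```python
import pandas as pd

def clock_to_seconds(clock: pd.Series) -> pd.Series:
    """Hypothetical helper: convert 'M:SS' strings (e.g. '4:35') to seconds."""
    parts = clock.str.split(":", expand=True).astype(float)
    return parts[0] * 60 + parts[1]

# Hypothetical drive-level rows for one game
drives = pd.DataFrame({
    "posteam": ["BUF", "MIA", "BUF", "MIA"],
    "drive_time_of_possession": ["4:35", "1:10", "6:02", "2:48"],
})
drives["drive_seconds"] = clock_to_seconds(drives["drive_time_of_possession"])

# Time of possession per team, and each team's share of the game total
top = drives.groupby("posteam")["drive_seconds"].sum()
top_share = top / top.sum()
print(top, top_share, sep="\n")
```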

Pace of Play

Pace refers to how quickly an offense runs plays. It is typically measured as seconds of possession per offensive play:

$$ \text{Seconds Per Play} = \frac{\text{Time of Possession (seconds)}}{\text{Offensive Plays}} $$
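Here is a small worked example of the pace calculation, using three hypothetical drives. The pooled figure divides total possession seconds by total plays. Dividing average drive duration by average plays per drive gives the same number, because both averages share the same drive count. Averaging each drive's own seconds per play instead gives a 3-play drive the same weight as a 12-play drive.

```python
# Hypothetical drives as (plays, seconds of possession)
drive_log = [(3, 95), (12, 410), (8, 250)]

total_plays = sum(p for p, _ in drive_log)
total_seconds = sum(s for _, s in drive_log)

# Pooled pace: total possession time divided by total plays
pooled = total_seconds / total_plays

# Mean of per-drive paces: every drive counts equally regardless of length
per_drive_mean = sum(s / p for p, s in drive_log) / len(drive_log)

print(round(pooled, 2), round(per_drive_mean, 2))
```

The two numbers differ, so it is worth stating which version a pace table reports.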


#| label: pace-analysis-r
#| message: false
#| warning: false

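# Calculate pace of play
# drive_time_of_possession is stored as an "M:SS" string, so convert it to seconds
pace_data <- pbp_2023 %>%
  filter(!is.na(posteam), play_type %in% c("pass", "run"),
         !is.na(drive_play_count), !is.na(drive_time_of_possession)) %>%
  mutate(
    drive_seconds = as.numeric(sub(":.*", "", drive_time_of_possession)) * 60 +
      as.numeric(sub(".*:", "", drive_time_of_possession))
  ) %>%
  group_by(posteam, game_id, fixed_drive) %>%
  summarise(
    drive_plays = max(drive_play_count, na.rm = TRUE),
    drive_duration = max(drive_seconds, na.rm = TRUE),
    .groups = "drop"
  ) %>%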
  filter(drive_duration > 0, drive_plays > 0) %>%
  group_by(posteam) %>%
  summarise(
    total_drives = n(),
    avg_plays_per_drive = mean(drive_plays, na.rm = TRUE),
    avg_drive_duration = mean(drive_duration, na.rm = TRUE),
    seconds_per_play = avg_drive_duration / avg_plays_per_drive,
    .groups = "drop"
  ) %>%
  arrange(seconds_per_play)

# Display fastest-paced offenses
pace_data %>%
  head(10) %>%
  gt() %>%
  cols_label(
    posteam = "Team",
    total_drives = "Drives",
    avg_plays_per_drive = "Plays/Drive",
    avg_drive_duration = "Avg Duration (sec)",
    seconds_per_play = "Sec/Play"
  ) %>%
  fmt_number(
    columns = c(avg_plays_per_drive, avg_drive_duration, seconds_per_play),
    decimals = 1
  ) %>%
  fmt_number(
    columns = total_drives,
    decimals = 0,
    use_seps = TRUE
  ) %>%
  tab_header(
    title = "Fastest-Paced Offenses",
    subtitle = "2023 NFL Season - Ranked by Seconds Per Play"
  )

#| label: pace-analysis-py
#| message: false
#| warning: false

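# Calculate pace of play
pace_plays = pbp_2023.query(
    "posteam.notna() & play_type.isin(['pass', 'run']) & "
    "drive_play_count.notna() & drive_time_of_possession.notna()"
).copy()

# drive_time_of_possession is stored as an "M:SS" string; convert it to seconds
clock = pace_plays['drive_time_of_possession'].str.split(':', expand=True).astype(float)
pace_plays['drive_seconds'] = clock[0] * 60 + clock[1]

drive_summary = (pace_plays
    .groupby(['posteam', 'game_id', 'fixed_drive'])
    .agg(
        drive_plays=('drive_play_count', 'max'),
        drive_duration=('drive_seconds', 'max')
    )
    .reset_index()
    .query("drive_duration > 0 & drive_plays > 0")
)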

pace_summary = (drive_summary
    .groupby('posteam')
    .agg(
        total_drives=('fixed_drive', 'count'),
        avg_plays_per_drive=('drive_plays', 'mean'),
        avg_drive_duration=('drive_duration', 'mean')
    )
    .reset_index()
)

pace_summary['seconds_per_play'] = pace_summary['avg_drive_duration'] / pace_summary['avg_plays_per_drive']

# Display fastest-paced offenses
pace_fastest = (pace_summary
    .sort_values('seconds_per_play')
    .head(10)
)

print("\nFastest-Paced Offenses - 2023 NFL Season")
print("Ranked by Seconds Per Play")
print("=" * 85)
print(pace_fastest.to_string(index=False))

Situational Offense

Early Down Efficiency

Early down efficiency (1st and 2nd down) sets up favorable third down situations:


#| label: early-down-r
#| message: false
#| warning: false

# Analyze early down performance
early_down <- pbp_2023 %>%
  filter(!is.na(posteam), down %in% c(1, 2), play_type %in% c("pass", "run")) %>%
  group_by(posteam) %>%
  summarise(
    plays = n(),
    avg_epa = mean(epa, na.rm = TRUE),
    success_rate = mean(epa > 0, na.rm = TRUE),
    avg_yards = mean(yards_gained, na.rm = TRUE),
    explosive_rate = mean(
      (play_type == "pass" & yards_gained >= 20) |
      (play_type == "run" & yards_gained >= 10),
      na.rm = TRUE
    ),
    .groups = "drop"
  ) %>%
  arrange(desc(avg_epa))

# Display results
early_down %>%
  head(10) %>%
  gt() %>%
  cols_label(
    posteam = "Team",
    plays = "Plays",
    avg_epa = "Avg EPA",
    success_rate = "Success %",
    avg_yards = "Avg Yards",
    explosive_rate = "Explosive %"
  ) %>%
  fmt_number(
    columns = c(avg_epa, avg_yards),
    decimals = 2
  ) %>%
  fmt_percent(
    columns = c(success_rate, explosive_rate),
    decimals = 1
  ) %>%
  fmt_number(
    columns = plays,
    decimals = 0,
    use_seps = TRUE
  ) %>%
  tab_header(
    title = "Early Down Efficiency Leaders",
    subtitle = "2023 NFL Season - 1st and 2nd Down"
  )

#| label: early-down-py
#| message: false
#| warning: false

# Analyze early down performance
early_down_data = pbp_2023.query(
    "posteam.notna() & down.isin([1, 2]) & play_type.isin(['pass', 'run'])"
).copy()

# Calculate explosive plays
early_down_data['is_explosive'] = (
    ((early_down_data['play_type'] == 'pass') & (early_down_data['yards_gained'] >= 20)) |
    ((early_down_data['play_type'] == 'run') & (early_down_data['yards_gained'] >= 10))
).astype(int)

early_down_summary = (early_down_data
    .groupby('posteam')
    .agg(
        plays=('play_id', 'count'),
        avg_epa=('epa', 'mean'),
        success_rate=('epa', lambda x: (x > 0).mean()),
        avg_yards=('yards_gained', 'mean'),
        explosive_rate=('is_explosive', 'mean')
    )
    .reset_index()
)

# Display top 10
early_down_top10 = (early_down_summary
    .sort_values('avg_epa', ascending=False)
    .head(10)
)

print("\nEarly Down Efficiency Leaders - 2023 NFL Season")
print("1st and 2nd Down")
print("=" * 85)
print(early_down_top10.to_string(index=False))
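The early-down tables rely on two per-play definitions. A play is a success when EPA > 0, and it is explosive when it is a pass of 20+ yards or a run of 10+ yards.

One subtlety matters when moving between the R and Python versions. R's `mean(epa > 0, na.rm = TRUE)` drops plays with missing EPA. In pandas, `(x > 0).mean()` evaluates `NaN > 0` as `False`, so those plays count as failures.

The sketch below makes the NaN handling explicit. It uses hypothetical plays, and the helper names `success_rate` and `is_explosive` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical early-down plays; the last one has no EPA value
plays = pd.DataFrame({
    "play_type":    ["pass", "run", "pass", "run", "pass"],
    "yards_gained": [25, 4, 7, 12, 0],
    "epa":          [2.1, -0.3, 0.4, 1.0, np.nan],
})

def success_rate(epa: pd.Series) -> float:
    """Share of plays with EPA > 0, ignoring plays whose EPA is missing (like na.rm = TRUE)."""
    valid = epa.dropna()
    return float((valid > 0).mean())

def is_explosive(df: pd.DataFrame) -> pd.Series:
    """Chapter definition: passes of 20+ yards or runs of 10+ yards."""
    return (((df["play_type"] == "pass") & (df["yards_gained"] >= 20)) |
            ((df["play_type"] == "run") & (df["yards_gained"] >= 10)))

naive = (plays["epa"] > 0).mean()  # the NaN row is counted as an unsuccessful play
print(success_rate(plays["epa"]), naive, is_explosive(plays).mean())
```

Here the NaN-aware success rate is 0.75, against 0.6 for the naive version. With full-season samples the gap is usually small, but it can shift close rankings.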

Late-Game Situations

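Late-game performance is measured here on 4th-quarter plays with the score within one possession (a differential of 8 points or fewer), when each snap has the largest effect on the outcome: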


#| label: clutch-offense-r
#| message: false
#| warning: false

# Analyze late-game performance
clutch_offense <- pbp_2023 %>%
  filter(
    !is.na(posteam),
    play_type %in% c("pass", "run"),
    qtr == 4,
    abs(score_differential) <= 8
  ) %>%
  group_by(posteam) %>%
  summarise(
    plays = n(),
    avg_epa = mean(epa, na.rm = TRUE),
    success_rate = mean(epa > 0, na.rm = TRUE),
    first_down_rate = mean(first_down == 1, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(plays >= 30) %>%  # Minimum play threshold
  arrange(desc(avg_epa))

# Display results
clutch_offense %>%
  head(10) %>%
  gt() %>%
  cols_label(
    posteam = "Team",
    plays = "Plays",
    avg_epa = "Avg EPA",
    success_rate = "Success %",
    first_down_rate = "1st Down %"
  ) %>%
  fmt_number(
    columns = avg_epa,
    decimals = 3
  ) %>%
  fmt_percent(
    columns = c(success_rate, first_down_rate),
    decimals = 1
  ) %>%
  fmt_number(
    columns = plays,
    decimals = 0,
    use_seps = TRUE
  ) %>%
  tab_header(
    title = "Clutch Offense Leaders",
    subtitle = "2023 NFL Season - 4th Quarter, One Score Games"
  )

#| label: clutch-offense-py
#| message: false
#| warning: false

# Analyze late-game performance
clutch_data = pbp_2023.query(
    "posteam.notna() & play_type.isin(['pass', 'run']) & "
    "qtr == 4 & score_differential.abs() <= 8"
).copy()

clutch_summary = (clutch_data
    .groupby('posteam')
    .agg(
        plays=('play_id', 'count'),
        avg_epa=('epa', 'mean'),
        success_rate=('epa', lambda x: (x > 0).mean()),
        first_down_rate=('first_down', 'mean')
    )
    .reset_index()
    .query("plays >= 30")  # Minimum play threshold
)

# Display top 10
clutch_top10 = (clutch_summary
    .sort_values('avg_epa', ascending=False)
    .head(10)
)

print("\nClutch Offense Leaders - 2023 NFL Season")
print("4th Quarter, One Score Games (min. 30 plays)")
print("=" * 75)
print(clutch_top10.to_string(index=False))

Comprehensive Offensive Dashboard

Let's combine multiple metrics into a comprehensive offensive ranking:


#| label: comprehensive-offense-r
#| message: false
#| warning: false

# Create comprehensive offensive metrics
comprehensive <- pbp_2023 %>%
  filter(!is.na(posteam), play_type %in% c("pass", "run")) %>%
  group_by(posteam) %>%
  summarise(
    plays = n(),
    yards_per_play = mean(yards_gained, na.rm = TRUE),
    avg_epa = mean(epa, na.rm = TRUE),
    success_rate = mean(epa > 0, na.rm = TRUE),
    explosive_rate = mean(
      (play_type == "pass" & yards_gained >= 20) |
      (play_type == "run" & yards_gained >= 10),
      na.rm = TRUE
    ),
    .groups = "drop"
  ) %>%
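  # Calculate z-scores for composite ranking
  # (scale() returns a one-column matrix, so convert each result to a plain vector)
  mutate(
    epa_z = as.numeric(scale(avg_epa)),
    success_z = as.numeric(scale(success_rate)),
    explosive_z = as.numeric(scale(explosive_rate)),
    composite_score = (epa_z + success_z + explosive_z) / 3
  ) %>%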
  arrange(desc(composite_score))

# Display top 10
comprehensive %>%
  head(10) %>%
  select(posteam, plays, yards_per_play, avg_epa, success_rate, explosive_rate, composite_score) %>%
  gt() %>%
  cols_label(
    posteam = "Team",
    plays = "Plays",
    yards_per_play = "YPP",
    avg_epa = "EPA",
    success_rate = "Success %",
    explosive_rate = "Explosive %",
    composite_score = "Score"
  ) %>%
  fmt_number(
    columns = c(yards_per_play, avg_epa, composite_score),
    decimals = 2
  ) %>%
  fmt_percent(
    columns = c(success_rate, explosive_rate),
    decimals = 1
  ) %>%
  fmt_number(
    columns = plays,
    decimals = 0,
    use_seps = TRUE
  ) %>%
  data_color(
    columns = composite_score,
    fn = scales::col_numeric(
      palette = c("#F44336", "#FFFFFF", "#4CAF50"),
      domain = c(-2, 2)
    )
  ) %>%
  tab_header(
    title = "Comprehensive Offensive Rankings",
    subtitle = "2023 NFL Season - Composite Score (EPA + Success + Explosive)"
  )

#| label: comprehensive-offense-py
#| message: false
#| warning: false

from scipy import stats

# Create comprehensive offensive metrics
comp_data = pbp_2023.query(
    "posteam.notna() & play_type.isin(['pass', 'run'])"
).copy()

# Calculate explosive plays
comp_data['is_explosive'] = (
    ((comp_data['play_type'] == 'pass') & (comp_data['yards_gained'] >= 20)) |
    ((comp_data['play_type'] == 'run') & (comp_data['yards_gained'] >= 10))
).astype(int)

comprehensive = (comp_data
    .groupby('posteam')
    .agg(
        plays=('play_id', 'count'),
        yards_per_play=('yards_gained', 'mean'),
        avg_epa=('epa', 'mean'),
        success_rate=('epa', lambda x: (x > 0).mean()),
        explosive_rate=('is_explosive', 'mean')
    )
    .reset_index()
)

# Calculate z-scores (ddof=1 uses the sample standard deviation, matching R's scale())
comprehensive['epa_z'] = stats.zscore(comprehensive['avg_epa'], ddof=1)
comprehensive['success_z'] = stats.zscore(comprehensive['success_rate'], ddof=1)
comprehensive['explosive_z'] = stats.zscore(comprehensive['explosive_rate'], ddof=1)
comprehensive['composite_score'] = (
    comprehensive['epa_z'] +
    comprehensive['success_z'] +
    comprehensive['explosive_z']
) / 3

# Display top 10
comp_top10 = (comprehensive
    .sort_values('composite_score', ascending=False)
    .head(10)
    [['posteam', 'plays', 'yards_per_play', 'avg_epa',
      'success_rate', 'explosive_rate', 'composite_score']]
)

print("\nComprehensive Offensive Rankings - 2023 NFL Season")
print("Composite Score (EPA + Success + Explosive)")
print("=" * 95)
print(comp_top10.to_string(index=False))
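The composite score is an equal-weight average of z-scores, and the choice of standard deviation is a common source of mismatched numbers. R's `scale()` standardizes with the sample standard deviation (n-1), while `scipy.stats.zscore` defaults to `ddof=0`.

The sketch below uses hypothetical team metrics and an illustrative helper, `composite`. It shows that the choice multiplies every composite by the same constant, so the ranking does not change.

```python
import pandas as pd

# Hypothetical team-level metrics (not real 2023 values)
teams = pd.DataFrame({
    "posteam":        ["A", "B", "C", "D"],
    "avg_epa":        [0.15, 0.02, -0.05, 0.08],
    "success_rate":   [0.48, 0.44, 0.41, 0.47],
    "explosive_rate": [0.09, 0.11, 0.07, 0.08],
})
metrics = ["avg_epa", "success_rate", "explosive_rate"]

def composite(df: pd.DataFrame, ddof: int) -> pd.Series:
    """Average of per-metric z-scores; ddof=1 mirrors R's scale(), ddof=0 scipy's default."""
    z = (df[metrics] - df[metrics].mean()) / df[metrics].std(ddof=ddof)
    return z.mean(axis=1)

sample_sd = composite(teams, ddof=1)
population_sd = composite(teams, ddof=0)

# Ranking under each convention
rank_sample = list(teams.loc[sample_sd.sort_values(ascending=False).index, "posteam"])
rank_population = list(teams.loc[population_sd.sort_values(ascending=False).index, "posteam"])
print(rank_sample, rank_population)
```

Both conventions rank these hypothetical teams A, B, D, C. Only the magnitudes differ, by a factor of sqrt(n/(n-1)), which matters if a fixed color scale such as a -2 to 2 domain is applied to the scores.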

Summary

In this chapter, we've covered the fundamental concepts and metrics for evaluating offensive performance:

Key Takeaways:

Efficiency Matters: Yards per play, EPA, and success rate provide more insight than volume statistics
Context is Critical: Situational metrics (red zone, third down, personnel) reveal how offenses perform when it matters most
Balance vs Effectiveness: The pass-run balance matters less than efficiency in each area
Explosive Plays Win Games: Generating big plays is one of the strongest predictors of offensive success
Comprehensive Evaluation: No single metric tells the whole story—combine multiple measures for complete analysis

Metrics Covered:

Basic efficiency: Yards per play, first down rate, scoring efficiency
Personnel analysis: Formation and grouping tendencies
Play-calling: Pass-run balance, situational tendencies
Explosive plays: 20+ yard passes, 10+ yard runs
Offensive line: Sack rates, run blocking efficiency
Red zone: Touchdown and scoring rates
Third down: Conversion rates by distance
Pace: Time of possession and seconds per play
Situational: Early down, clutch situations

Exercises

Conceptual Questions

Efficiency vs Volume: Why is yards per play generally more predictive of success than total yards? What are the limitations of using yards per play as an evaluation metric?
Personnel Groupings: How do personnel groupings (11, 12, 21, etc.) influence defensive strategies? What advantages does each grouping provide?
Red Zone Philosophy: Some teams prioritize touchdown rate in the red zone, while others accept field goals as successful outcomes. What factors should influence this philosophy?

Coding Exercises

Exercise 1: Team Offensive Profile

Create a comprehensive offensive profile for a specific team that includes:

a) Overall efficiency metrics (EPA, success rate, yards per play)
b) Pass-run split and efficiency in each
c) Explosive play rates
d) Red zone performance
e) Third down conversion rates
f) A visualization comparing the team to league averages

**Bonus**: Add play-calling tendencies by down and distance.

Exercise 2: Formation Effectiveness Analysis

Analyze the effectiveness of different personnel groupings:

a) Calculate EPA and success rate for each personnel grouping
b) Determine pass rate from each grouping
c) Identify which groupings are most effective for each team
d) Create a visualization showing personnel usage vs effectiveness

**Hint**: Filter for plays where `personnel_o` is not null.

Exercise 3: Red Zone Optimization

Examine red zone play-calling and success:

a) Calculate pass vs run rates in different red zone areas (inside 5, 6-10, 11-20)
b) Measure EPA for passes vs runs in each area
c) Identify the teams that are most efficient in the red zone
d) Determine optimal play-calling strategies by field position

**Bonus**: Include down and distance in your analysis.

Exercise 4: Advanced Situational Analysis

Create a comprehensive situational offense report:

a) Early down (1st & 2nd) efficiency metrics
b) Third down conversion rates by distance category
c) Two-minute drill efficiency (final two minutes of each half)
d) Performance when leading vs trailing
e) Visualizations comparing teams across situations

**Challenge**: Identify which teams perform best under pressure (close games, 4th quarter).

Exercise 5: Explosive Play Analysis

Deep dive into explosive plays:

a) Calculate explosive play rates for each team
b) Analyze which personnel groupings generate the most explosive plays
c) Examine explosive play rates by down and distance
d) Determine the correlation between explosive play rate and points scored
e) Create visualizations showing explosive play trends

**Bonus**: Compare explosive play rates in different game situations (score differential, quarter).

