Learning ObjectivesBy the end of this chapter, you will be able to:

  1. Evaluate running back performance beyond yards per carry
  2. Understand rush EPA and success rate
  3. Analyze offensive line run blocking effectiveness
  4. Study gap schemes and run concepts
  5. Measure broken tackles and yards after contact

Introduction

The rushing attack has long been one of the most debated aspects of football strategy. Traditional wisdom emphasizes "establishing the run," while modern analytics often questions this approach. The truth lies somewhere in between, and understanding it requires moving beyond simplistic metrics like yards per carry to a more nuanced evaluation of rushing efficiency.

This chapter explores the complete landscape of rushing analytics, from individual running back evaluation to offensive line performance, from situational rushing success to the evolving role of running backs as receivers. We'll examine why traditional metrics fail to capture the full value (or limitations) of the running game and introduce advanced frameworks for comprehensive rushing analysis.

The Modern Running Back

Today's NFL running backs are asked to do far more than simply carry the football. They must pass protect, run routes, catch passes, and execute in diverse situations. Evaluating running backs requires a multi-dimensional approach that captures all these contributions.

The Problem with Yards Per Carry

Why YPC is Misleading

Yards per carry (YPC) has been the standard measure of running back efficiency for decades. While simple and intuitive, it has several critical limitations:

Context Independence: YPC treats all carries equally, ignoring:
- Down and distance situations
- Field position
- Score differential
- Defensive personnel
- Game script

Outlier Sensitivity: A single 80-yard run can dramatically inflate YPC while representing an unrepeatable event.

Volume Bias: Running backs with fewer carries often have higher YPC due to selective usage in favorable situations.

Success Definition: A 5-yard gain on 3rd-and-10 and a 5-yard gain on 3rd-and-4 produce identical YPC but vastly different outcomes.

Let's examine real data to illustrate these limitations:

#| label: ypc-limitations-r
#| message: false
#| warning: false
#| cache: true

library(tidyverse)
library(nflfastR)
library(gt)
library(nflplotR)

# Load 2023 play-by-play data
pbp <- load_pbp(2023)

# Calculate basic rushing stats
rb_basic <- pbp %>%
  filter(play_type == "run", !is.na(rusher_player_id)) %>%
  group_by(rusher_player_name, posteam) %>%
  summarise(
    carries = n(),
    total_yards = sum(yards_gained, na.rm = TRUE),
    ypc = mean(yards_gained, na.rm = TRUE),
    longest = max(yards_gained, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(carries >= 100) %>%
  arrange(desc(ypc))

# Show top 10 by YPC
rb_basic %>%
  head(10) %>%
  gt() %>%
  cols_label(
    rusher_player_name = "Player",
    posteam = "Team",
    carries = "Carries",
    total_yards = "Yards",
    ypc = "YPC",
    longest = "Long"
  ) %>%
  fmt_number(
    columns = ypc,
    decimals = 2
  ) %>%
  fmt_number(
    columns = c(carries, total_yards, longest),
    decimals = 0
  ) %>%
  tab_header(
    title = "Top Running Backs by Yards Per Carry",
    subtitle = "2023 Season (Min. 100 carries)"
  )
#| label: ypc-limitations-py
#| message: false
#| warning: false
#| cache: true

import pandas as pd
import numpy as np
import nfl_data_py as nfl

# Load 2023 play-by-play data
pbp = nfl.import_pbp_data([2023])

# Calculate basic rushing stats
rb_basic = (pbp
    .query("play_type == 'run' & rusher_player_id.notna()")
    .groupby(['rusher_player_name', 'posteam'])
    .agg(
        carries=('play_id', 'count'),
        total_yards=('yards_gained', 'sum'),
        ypc=('yards_gained', 'mean'),
        longest=('yards_gained', 'max')
    )
    .reset_index()
    .query("carries >= 100")
    .sort_values('ypc', ascending=False)
)

print("Top Running Backs by Yards Per Carry (2023 Season, Min. 100 carries):\n")
print(rb_basic.head(10).to_string(index=False))

The Outlier Effect

Let's examine how a single long run impacts YPC:

#| label: outlier-effect-r
#| message: false
#| warning: false

# Analyze impact of longest run on YPC
outlier_impact <- pbp %>%
  filter(play_type == "run", !is.na(rusher_player_id)) %>%
  group_by(rusher_player_name) %>%
  filter(n() >= 100) %>%
  summarise(
    carries = n(),
    ypc_with_long = mean(yards_gained, na.rm = TRUE),
    ypc_without_long = mean(yards_gained[yards_gained != max(yards_gained)], na.rm = TRUE),
    longest_run = max(yards_gained, na.rm = TRUE),
    ypc_difference = ypc_with_long - ypc_without_long,
    .groups = "drop"
  ) %>%
  arrange(desc(ypc_difference))

# Show top cases
outlier_impact %>%
  head(10) %>%
  gt() %>%
  cols_label(
    rusher_player_name = "Player",
    carries = "Carries",
    ypc_with_long = "YPC (w/ Long)",
    ypc_without_long = "YPC (w/o Long)",
    longest_run = "Longest Run",
    ypc_difference = "Difference"
  ) %>%
  fmt_number(
    columns = c(ypc_with_long, ypc_without_long, ypc_difference),
    decimals = 2
  ) %>%
  fmt_number(
    columns = c(carries, longest_run),
    decimals = 0
  ) %>%
  tab_header(
    title = "Impact of Longest Run on YPC",
    subtitle = "How much does one run skew the average?"
  )
#| label: outlier-effect-py
#| message: false
#| warning: false

# Analyze impact of longest run on YPC
def calc_ypc_without_max(series):
    return series[series != series.max()].mean()

outlier_impact = (pbp
    .query("play_type == 'run' & rusher_player_id.notna()")
    .groupby('rusher_player_name')
    .filter(lambda x: len(x) >= 100)
    .groupby('rusher_player_name')
    .agg(
        carries=('yards_gained', 'count'),
        ypc_with_long=('yards_gained', 'mean'),
        ypc_without_long=('yards_gained', calc_ypc_without_max),
        longest_run=('yards_gained', 'max')
    )
    .reset_index()
)

outlier_impact['ypc_difference'] = (outlier_impact['ypc_with_long'] -
                                     outlier_impact['ypc_without_long'])

print("\nImpact of Longest Run on YPC:\n")
print(outlier_impact.nlargest(10, 'ypc_difference').to_string(index=False))

Rush EPA: A Better Framework

Expected Points Added (EPA) provides a superior framework for evaluating rushing plays by accounting for context. Rush EPA measures the change in expected points from the start to end of a rushing play.

Understanding Rush EPA

Why Rush EPA is Better Than YPC

Rush EPA accounts for: 1. **Down and distance**: A 3-yard gain on 3rd-and-2 is more valuable than on 3rd-and-10 2. **Field position**: Gains near the goal line are worth more 3. **Game situation**: Context matters for true value assessment 4. **Success probability**: Not all yards are created equal

Let's calculate rush EPA for all running backs:

#| label: rush-epa-r
#| message: false
#| warning: false

# Calculate comprehensive rushing metrics
rb_advanced <- pbp %>%
  filter(play_type == "run", !is.na(rusher_player_id), !is.na(epa)) %>%
  group_by(rusher_player_name, posteam) %>%
  summarise(
    carries = n(),
    total_yards = sum(yards_gained, na.rm = TRUE),
    ypc = mean(yards_gained, na.rm = TRUE),
    epa_per_rush = mean(epa, na.rm = TRUE),
    total_epa = sum(epa, na.rm = TRUE),
    success_rate = mean(epa > 0, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(carries >= 100) %>%
  arrange(desc(epa_per_rush))

# Display top rushers by EPA
rb_advanced %>%
  head(15) %>%
  gt() %>%
  cols_label(
    rusher_player_name = "Player",
    posteam = "Team",
    carries = "Carries",
    total_yards = "Yards",
    ypc = "YPC",
    epa_per_rush = "EPA/Rush",
    total_epa = "Total EPA",
    success_rate = "Success %"
  ) %>%
  fmt_number(
    columns = c(ypc, epa_per_rush),
    decimals = 3
  ) %>%
  fmt_number(
    columns = total_epa,
    decimals = 1
  ) %>%
  fmt_percent(
    columns = success_rate,
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(carries, total_yards),
    decimals = 0
  ) %>%
  data_color(
    columns = epa_per_rush,
    colors = scales::col_numeric(
      palette = c("red", "white", "green"),
      domain = c(-0.2, 0.2)
    )
  ) %>%
  tab_header(
    title = "Running Back Efficiency Rankings",
    subtitle = "2023 Season (Min. 100 carries)"
  )
#| label: rush-epa-py
#| message: false
#| warning: false

# Calculate comprehensive rushing metrics
rb_advanced = (pbp
    .query("play_type == 'run' & rusher_player_id.notna() & epa.notna()")
    .groupby(['rusher_player_name', 'posteam'])
    .agg(
        carries=('play_id', 'count'),
        total_yards=('yards_gained', 'sum'),
        ypc=('yards_gained', 'mean'),
        epa_per_rush=('epa', 'mean'),
        total_epa=('epa', 'sum'),
        success_rate=('epa', lambda x: (x > 0).mean())
    )
    .reset_index()
    .query("carries >= 100")
    .sort_values('epa_per_rush', ascending=False)
)

print("\nRunning Back Efficiency Rankings (2023 Season):\n")
print(rb_advanced.head(15).to_string(index=False))

Comparing YPC and EPA Rankings

Not all efficient rushers by YPC are efficient by EPA:

#| label: fig-ypc-vs-epa-r
#| fig-cap: "Yards per carry vs EPA per rush for running backs"
#| fig-width: 10
#| fig-height: 8
#| message: false
#| warning: false

rb_advanced %>%
  ggplot(aes(x = ypc, y = epa_per_rush)) +
  geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
  geom_vline(xintercept = mean(rb_advanced$ypc), linetype = "dashed", alpha = 0.5) +
  geom_point(aes(size = carries), alpha = 0.6, color = "#013369") +
  geom_smooth(method = "lm", se = TRUE, color = "#D50A0A", linetype = "dashed") +
  scale_size_continuous(range = c(2, 10), name = "Carries") +
  labs(
    title = "Running Back Efficiency: YPC vs EPA per Rush",
    subtitle = "2023 NFL Season (Min. 100 carries)",
    x = "Yards Per Carry",
    y = "EPA per Rush",
    caption = "Data: nflfastR | Positive EPA indicates successful plays"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(size = 11),
    legend.position = "bottom"
  )
#| label: fig-ypc-vs-epa-py
#| fig-cap: "Yards per carry vs EPA per rush - Python"
#| fig-width: 10
#| fig-height: 8
#| message: false
#| warning: false

import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

fig, ax = plt.subplots(figsize=(10, 8))

# Create scatter plot
scatter = ax.scatter(rb_advanced['ypc'], rb_advanced['epa_per_rush'],
                     s=rb_advanced['carries']*2, alpha=0.6, c='#013369')

# Add reference lines
ax.axhline(y=0, color='gray', linestyle='--', alpha=0.5)
ax.axvline(x=rb_advanced['ypc'].mean(), color='gray', linestyle='--', alpha=0.5)

# Add regression line
z = np.polyfit(rb_advanced['ypc'], rb_advanced['epa_per_rush'], 1)
p = np.poly1d(z)
ax.plot(rb_advanced['ypc'], p(rb_advanced['ypc']),
        "r--", alpha=0.8, linewidth=2)

ax.set_xlabel('Yards Per Carry', fontsize=12)
ax.set_ylabel('EPA per Rush', fontsize=12)
ax.set_title('Running Back Efficiency: YPC vs EPA per Rush\n2023 NFL Season (Min. 100 carries)',
             fontsize=14, fontweight='bold')
ax.text(0.98, 0.02, 'Data: nfl_data_py | Positive EPA indicates successful plays',
        transform=ax.transAxes, ha='right', fontsize=8, style='italic')

plt.tight_layout()
plt.show()

📊 Visualization Output

The code above generates a visualization. To see the output, run this code in your R or Python environment. The resulting plot will help illustrate the concepts discussed in this section.

Success Rate and Consistency

Success rate measures the percentage of plays that generate positive EPA. While EPA per rush captures average value, success rate reveals consistency.

Defining Rush Success

The standard definition of success:

  • 1st down: Gain at least 40% of yards needed
  • 2nd down: Gain at least 60% of yards needed
  • 3rd/4th down: Achieve the first down

However, using EPA > 0 provides a more nuanced measure that accounts for field position and game situation.

#| label: success-rate-analysis-r
#| message: false
#| warning: false

# Calculate success rates by different definitions
success_comparison <- pbp %>%
  filter(play_type == "run", !is.na(rusher_player_id), !is.na(epa)) %>%
  mutate(
    # Traditional success definition
    trad_success = case_when(
      down == 1 ~ yards_gained >= 0.4 * ydstogo,
      down == 2 ~ yards_gained >= 0.6 * ydstogo,
      down >= 3 ~ yards_gained >= ydstogo,
      TRUE ~ FALSE
    ),
    # EPA-based success
    epa_success = epa > 0
  ) %>%
  group_by(rusher_player_name, posteam) %>%
  filter(n() >= 100) %>%
  summarise(
    carries = n(),
    trad_success_rate = mean(trad_success, na.rm = TRUE),
    epa_success_rate = mean(epa_success, na.rm = TRUE),
    epa_per_rush = mean(epa, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(epa_success_rate))

# Display top by EPA success rate
success_comparison %>%
  head(15) %>%
  gt() %>%
  cols_label(
    rusher_player_name = "Player",
    posteam = "Team",
    carries = "Carries",
    trad_success_rate = "Traditional Success %",
    epa_success_rate = "EPA Success %",
    epa_per_rush = "EPA/Rush"
  ) %>%
  fmt_percent(
    columns = c(trad_success_rate, epa_success_rate),
    decimals = 1
  ) %>%
  fmt_number(
    columns = epa_per_rush,
    decimals = 3
  ) %>%
  fmt_number(
    columns = carries,
    decimals = 0
  ) %>%
  tab_header(
    title = "Running Back Success Rates",
    subtitle = "Comparing traditional vs EPA-based success definitions"
  )
#| label: success-rate-analysis-py
#| message: false
#| warning: false

# Calculate success rates by different definitions
rushes = pbp.query("play_type == 'run' & rusher_player_id.notna() & epa.notna()").copy()

# Traditional success definition
def trad_success(row):
    if row['down'] == 1:
        return row['yards_gained'] >= 0.4 * row['ydstogo']
    elif row['down'] == 2:
        return row['yards_gained'] >= 0.6 * row['ydstogo']
    elif row['down'] >= 3:
        return row['yards_gained'] >= row['ydstogo']
    return False

rushes['trad_success'] = rushes.apply(trad_success, axis=1)
rushes['epa_success'] = rushes['epa'] > 0

success_comparison = (rushes
    .groupby(['rusher_player_name', 'posteam'])
    .filter(lambda x: len(x) >= 100)
    .groupby(['rusher_player_name', 'posteam'])
    .agg(
        carries=('play_id', 'count'),
        trad_success_rate=('trad_success', 'mean'),
        epa_success_rate=('epa_success', 'mean'),
        epa_per_rush=('epa', 'mean')
    )
    .reset_index()
    .sort_values('epa_success_rate', ascending=False)
)

print("\nRunning Back Success Rates:\n")
print(success_comparison.head(15).to_string(index=False))

Yards Before and After Contact

One of the most important advanced rushing metrics is the split between yards before contact (blocked by offensive line) and yards after contact (created by the running back).

Measuring Individual Contribution

While we don't have contact data in standard play-by-play, we can use Next Gen Stats data or estimate based on tackle data when available. For this analysis, we'll demonstrate the concept:

Data Limitation

Standard nflfastR data doesn't include yards before/after contact. This metric typically requires: - Next Gen Stats tracking data - Manual charting services (Pro Football Focus) - Advanced video analysis The code below demonstrates the analytical approach using simulated contact data.
#| label: contact-yards-r
#| message: false
#| warning: false

# Simulate yards before/after contact for demonstration
# In practice, this would come from NGS or PFF data
set.seed(42)

rb_contact <- pbp %>%
  filter(play_type == "run", !is.na(rusher_player_id)) %>%
  group_by(rusher_player_name, posteam) %>%
  filter(n() >= 100) %>%
  ungroup() %>%
  mutate(
    # Simulate contact point (typically 2-4 yards)
    yards_before_contact = pmax(0, pmin(yards_gained,
                                         rnorm(n(), mean = 2.5, sd = 1.5))),
    yards_after_contact = pmax(0, yards_gained - yards_before_contact)
  ) %>%
  group_by(rusher_player_name, posteam) %>%
  summarise(
    carries = n(),
    avg_yards = mean(yards_gained, na.rm = TRUE),
    avg_ybc = mean(yards_before_contact, na.rm = TRUE),
    avg_yac = mean(yards_after_contact, na.rm = TRUE),
    yac_percentage = mean(yards_after_contact / pmax(1, yards_gained), na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(avg_yac))

# Display leaders in yards after contact
rb_contact %>%
  head(15) %>%
  gt() %>%
  cols_label(
    rusher_player_name = "Player",
    posteam = "Team",
    carries = "Carries",
    avg_yards = "Avg Yards",
    avg_ybc = "Yards Before Contact",
    avg_yac = "Yards After Contact",
    yac_percentage = "YAC %"
  ) %>%
  fmt_number(
    columns = c(avg_yards, avg_ybc, avg_yac),
    decimals = 2
  ) %>%
  fmt_percent(
    columns = yac_percentage,
    decimals = 1
  ) %>%
  fmt_number(
    columns = carries,
    decimals = 0
  ) %>%
  tab_header(
    title = "Yards Before vs After Contact",
    subtitle = "Simulated data for demonstration purposes"
  )
#| label: contact-yards-py
#| message: false
#| warning: false

# Simulate yards before/after contact for demonstration
np.random.seed(42)

rushes_contact = pbp.query("play_type == 'run' & rusher_player_id.notna()").copy()

# Filter to RBs with 100+ carries
rb_counts = rushes_contact.groupby('rusher_player_name').size()
qualifying_rbs = rb_counts[rb_counts >= 100].index

rushes_contact = rushes_contact[rushes_contact['rusher_player_name'].isin(qualifying_rbs)]

# Simulate contact point
rushes_contact['yards_before_contact'] = np.maximum(0,
    np.minimum(rushes_contact['yards_gained'],
               np.random.normal(2.5, 1.5, len(rushes_contact))))
rushes_contact['yards_after_contact'] = np.maximum(0,
    rushes_contact['yards_gained'] - rushes_contact['yards_before_contact'])

rb_contact = (rushes_contact
    .groupby(['rusher_player_name', 'posteam'])
    .agg(
        carries=('play_id', 'count'),
        avg_yards=('yards_gained', 'mean'),
        avg_ybc=('yards_before_contact', 'mean'),
        avg_yac=('yards_after_contact', 'mean'),
        yac_percentage=('yards_after_contact',
                       lambda x: (x / np.maximum(1, rushes_contact.loc[x.index, 'yards_gained'])).mean())
    )
    .reset_index()
    .sort_values('avg_yac', ascending=False)
)

print("\nYards Before vs After Contact (Simulated):\n")
print(rb_contact.head(15).to_string(index=False))

Broken Tackles and Elusiveness

Broken tackles measure a running back's ability to force missed tackles. This is a key indicator of individual talent independent of blocking.

Elusiveness Rating

Elusiveness combines multiple factors: - Broken tackles per carry - Yards after contact - Juke rate (from tracking data) - Success rate on broken tackles

While broken tackle data typically comes from charting services, we can analyze the concept:

#| label: broken-tackles-r
#| message: false
#| warning: false

# Simulate broken tackle data for demonstration
set.seed(123)

rb_elusiveness <- pbp %>%
  filter(play_type == "run", !is.na(rusher_player_id)) %>%
  group_by(rusher_player_name, posteam) %>%
  filter(n() >= 100) %>%
  ungroup() %>%
  mutate(
    # Simulate broken tackles (higher probability on longer runs)
    broken_tackles = rpois(n(), lambda = pmax(0, (yards_gained - 3) / 10))
  ) %>%
  group_by(rusher_player_name, posteam) %>%
  summarise(
    carries = n(),
    total_broken_tackles = sum(broken_tackles, na.rm = TRUE),
    broken_tackle_rate = mean(broken_tackles, na.rm = TRUE),
    carries_with_bt = sum(broken_tackles > 0, na.rm = TRUE),
    bt_percentage = mean(broken_tackles > 0, na.rm = TRUE),
    avg_yards = mean(yards_gained, na.rm = TRUE),
    epa_per_rush = mean(epa, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(broken_tackle_rate))

# Display most elusive runners
rb_elusiveness %>%
  head(15) %>%
  gt() %>%
  cols_label(
    rusher_player_name = "Player",
    posteam = "Team",
    carries = "Carries",
    total_broken_tackles = "Total BT",
    broken_tackle_rate = "BT per Carry",
    bt_percentage = "Plays w/ BT %",
    avg_yards = "YPC",
    epa_per_rush = "EPA/Rush"
  ) %>%
  fmt_number(
    columns = c(broken_tackle_rate, avg_yards, epa_per_rush),
    decimals = 3
  ) %>%
  fmt_percent(
    columns = bt_percentage,
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(carries, total_broken_tackles),
    decimals = 0
  ) %>%
  tab_header(
    title = "Most Elusive Running Backs",
    subtitle = "Simulated broken tackle data"
  )
#| label: broken-tackles-py
#| message: false
#| warning: false

# Simulate broken tackle data
np.random.seed(123)

rushes_bt = pbp.query("play_type == 'run' & rusher_player_id.notna()").copy()
qualifying_rbs = rushes_bt.groupby('rusher_player_name').size()
qualifying_rbs = qualifying_rbs[qualifying_rbs >= 100].index
rushes_bt = rushes_bt[rushes_bt['rusher_player_name'].isin(qualifying_rbs)]

# Simulate broken tackles
rushes_bt['broken_tackles'] = np.random.poisson(
    lam=np.maximum(0, (rushes_bt['yards_gained'] - 3) / 10)
)

rb_elusiveness = (rushes_bt
    .groupby(['rusher_player_name', 'posteam'])
    .agg(
        carries=('play_id', 'count'),
        total_broken_tackles=('broken_tackles', 'sum'),
        broken_tackle_rate=('broken_tackles', 'mean'),
        carries_with_bt=('broken_tackles', lambda x: (x > 0).sum()),
        bt_percentage=('broken_tackles', lambda x: (x > 0).mean()),
        avg_yards=('yards_gained', 'mean'),
        epa_per_rush=('epa', 'mean')
    )
    .reset_index()
    .sort_values('broken_tackle_rate', ascending=False)
)

print("\nMost Elusive Running Backs (Simulated):\n")
print(rb_elusiveness.head(15).to_string(index=False))

Rush Direction and Gap Schemes

Understanding where running backs run and how successful they are in different directions is crucial for both player evaluation and scheme analysis.

Analyzing Rush Direction

#| label: rush-direction-r
#| message: false
#| warning: false

# Analyze rushing by direction (when available in data)
# run_location: left, middle, right
# run_gap: end, tackle, guard, center

rush_direction <- pbp %>%
  filter(play_type == "run", !is.na(rusher_player_id), !is.na(run_location)) %>%
  group_by(rusher_player_name, posteam, run_location) %>%
  filter(n() >= 30) %>%  # Minimum attempts per direction
  summarise(
    carries = n(),
    avg_yards = mean(yards_gained, na.rm = TRUE),
    epa_per_rush = mean(epa, na.rm = TRUE),
    success_rate = mean(epa > 0, na.rm = TRUE),
    .groups = "drop"
  )

# Find RBs with data in all three directions
rbs_all_directions <- rush_direction %>%
  group_by(rusher_player_name, posteam) %>%
  filter(n() == 3) %>%
  ungroup()

# Show a sample of directional tendencies
rbs_all_directions %>%
  filter(posteam %in% c("SF", "BAL", "DAL", "MIA", "CLE")) %>%
  arrange(posteam, rusher_player_name, run_location) %>%
  gt() %>%
  cols_label(
    rusher_player_name = "Player",
    posteam = "Team",
    run_location = "Direction",
    carries = "Carries",
    avg_yards = "YPC",
    epa_per_rush = "EPA/Rush",
    success_rate = "Success %"
  ) %>%
  fmt_number(
    columns = c(avg_yards, epa_per_rush),
    decimals = 2
  ) %>%
  fmt_percent(
    columns = success_rate,
    decimals = 1
  ) %>%
  fmt_number(
    columns = carries,
    decimals = 0
  ) %>%
  tab_header(
    title = "Rushing Efficiency by Direction",
    subtitle = "Selected teams from 2023 season"
  )
#| label: rush-direction-py
#| message: false
#| warning: false

# Analyze rushing by direction
rush_direction = (pbp
    .query("play_type == 'run' & rusher_player_id.notna() & run_location.notna()")
    .groupby(['rusher_player_name', 'posteam', 'run_location'])
    .filter(lambda x: len(x) >= 30)
    .groupby(['rusher_player_name', 'posteam', 'run_location'])
    .agg(
        carries=('play_id', 'count'),
        avg_yards=('yards_gained', 'mean'),
        epa_per_rush=('epa', 'mean'),
        success_rate=('epa', lambda x: (x > 0).mean())
    )
    .reset_index()
)

# Find RBs with data in all three directions
rbs_all_directions = (rush_direction
    .groupby(['rusher_player_name', 'posteam'])
    .filter(lambda x: len(x) == 3)
)

# Show sample
sample_teams = ['SF', 'BAL', 'DAL', 'MIA', 'CLE']
print("\nRushing Efficiency by Direction (Selected Teams):\n")
print(rbs_all_directions[rbs_all_directions['posteam'].isin(sample_teams)]
      .sort_values(['posteam', 'rusher_player_name', 'run_location'])
      .to_string(index=False))

Visualizing Directional Tendencies

#| label: fig-direction-heatmap-r
#| fig-cap: "Rush EPA by direction for top running backs"
#| fig-width: 10
#| fig-height: 8
#| message: false
#| warning: false

# Get top RBs by total carries
top_rbs <- pbp %>%
  filter(play_type == "run", !is.na(rusher_player_id)) %>%
  count(rusher_player_name, posteam, sort = TRUE) %>%
  head(12) %>%
  pull(rusher_player_name)

# Create heatmap data
direction_heatmap <- rush_direction %>%
  filter(rusher_player_name %in% top_rbs) %>%
  mutate(
    player_team = paste0(rusher_player_name, " (", posteam, ")"),
    run_location = factor(run_location, levels = c("left", "middle", "right"))
  )

# Create heatmap
direction_heatmap %>%
  ggplot(aes(x = run_location, y = reorder(player_team, epa_per_rush),
             fill = epa_per_rush)) +
  geom_tile(color = "white", linewidth = 1) +
  geom_text(aes(label = sprintf("%.2f", epa_per_rush)),
            color = "white", fontface = "bold", size = 3.5) +
  scale_fill_gradient2(
    low = "#D50A0A", mid = "white", high = "#013369",
    midpoint = 0, name = "EPA/Rush"
  ) +
  labs(
    title = "Rush EPA by Direction - Top Running Backs",
    subtitle = "2023 NFL Season",
    x = "Run Direction",
    y = "",
    caption = "Data: nflfastR"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    axis.text.y = element_text(size = 10),
    legend.position = "right"
  )

📊 Visualization Output

The code above generates a visualization. To see the output, run this code in your R or Python environment. The resulting plot will help illustrate the concepts discussed in this section.

#| label: fig-direction-heatmap-py
#| fig-cap: "Rush EPA by direction - Python"
#| fig-width: 10
#| fig-height: 8
#| message: false
#| warning: false

# Get top RBs
top_rbs = (pbp
    .query("play_type == 'run' & rusher_player_id.notna()")
    .groupby(['rusher_player_name', 'posteam'])
    .size()
    .nlargest(12)
    .index.get_level_values(0)
    .tolist()
)

# Filter to top RBs
direction_heatmap = rush_direction[rush_direction['rusher_player_name'].isin(top_rbs)].copy()
direction_heatmap['player_team'] = (direction_heatmap['rusher_player_name'] +
                                    ' (' + direction_heatmap['posteam'] + ')')

# Pivot for heatmap
heatmap_data = direction_heatmap.pivot(
    index='player_team', columns='run_location', values='epa_per_rush'
)

# Create heatmap
fig, ax = plt.subplots(figsize=(10, 8))
sns.heatmap(heatmap_data, annot=True, fmt='.2f', cmap='RdBu_r',
            center=0, cbar_kws={'label': 'EPA/Rush'},
            linewidths=1, linecolor='white', ax=ax)

ax.set_title('Rush EPA by Direction - Top Running Backs\n2023 NFL Season',
             fontsize=14, fontweight='bold', pad=20)
ax.set_xlabel('Run Direction', fontsize=12)
ax.set_ylabel('', fontsize=12)
ax.text(0.98, -0.08, 'Data: nfl_data_py',
        transform=ax.transAxes, ha='right', fontsize=8, style='italic')

plt.tight_layout()
plt.show()

Offensive Line Run Blocking

While individual running backs get the glory, offensive line play is fundamental to rushing success. Evaluating line performance requires aggregate analysis.

Team-Level Run Blocking Metrics

#| label: oline-blocking-r
#| message: false
#| warning: false

# Analyze team rushing performance (proxy for O-line)
oline_rushing <- pbp %>%
  filter(play_type == "run", !is.na(posteam)) %>%
  group_by(posteam) %>%
  summarise(
    rushes = n(),
    total_yards = sum(yards_gained, na.rm = TRUE),
    ypc = mean(yards_gained, na.rm = TRUE),
    epa_per_rush = mean(epa, na.rm = TRUE),
    success_rate = mean(epa > 0, na.rm = TRUE),
    stuffed_rate = mean(yards_gained <= 0, na.rm = TRUE),
    explosive_rate = mean(yards_gained >= 10, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(epa_per_rush))

# Display rankings
oline_rushing %>%
  gt() %>%
  cols_label(
    posteam = "Team",
    rushes = "Rushes",
    total_yards = "Yards",
    ypc = "YPC",
    epa_per_rush = "EPA/Rush",
    success_rate = "Success %",
    stuffed_rate = "Stuffed %",
    explosive_rate = "Explosive %"
  ) %>%
  fmt_number(
    columns = c(ypc, epa_per_rush),
    decimals = 3
  ) %>%
  fmt_percent(
    columns = c(success_rate, stuffed_rate, explosive_rate),
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(rushes, total_yards),
    decimals = 0
  ) %>%
  data_color(
    columns = epa_per_rush,
    colors = scales::col_numeric(
      palette = c("red", "white", "green"),
      domain = c(-0.2, 0.1)
    )
  ) %>%
  tab_header(
    title = "Team Run Blocking Efficiency",
    subtitle = "2023 NFL Season"
  ) %>%
  tab_source_note(
    source_note = "Stuffed: ≤0 yards | Explosive: ≥10 yards"
  )
#| label: oline-blocking-py
#| message: false
#| warning: false

# Analyze team rushing performance
oline_rushing = (pbp
    .query("play_type == 'run' & posteam.notna()")
    .groupby('posteam')
    .agg(
        rushes=('play_id', 'count'),
        total_yards=('yards_gained', 'sum'),
        ypc=('yards_gained', 'mean'),
        epa_per_rush=('epa', 'mean'),
        success_rate=('epa', lambda x: (x > 0).mean()),
        stuffed_rate=('yards_gained', lambda x: (x <= 0).mean()),
        explosive_rate=('yards_gained', lambda x: (x >= 10).mean())
    )
    .reset_index()
    .sort_values('epa_per_rush', ascending=False)
)

print("\nTeam Run Blocking Efficiency (2023 Season):\n")
print(oline_rushing.to_string(index=False))
print("\nNote: Stuffed = ≤0 yards | Explosive = ≥10 yards")

Power vs Zone Blocking

Different blocking schemes produce different results:

Blocking Scheme Indicators

While scheme isn't directly labeled in play-by-play data, we can infer tendencies: - **Power/Gap schemes**: Higher variance, more stuffed runs, more explosives - **Zone schemes**: Lower variance, consistent gains, fewer negatives - **Run location**: Outside zone often targets edges, inside zone hits between tackles

Situational Rushing Analytics

Not all rushing situations are equal. Performance in key situations reveals true effectiveness.

Short Yardage Success

#| label: short-yardage-r
#| message: false
#| warning: false

# Analyze short yardage performance (1-2 yards to go)
short_yardage <- pbp %>%
  filter(
    play_type == "run",
    !is.na(rusher_player_id),
    ydstogo <= 2,
    down %in% c(3, 4)  # Critical downs
  ) %>%
  group_by(rusher_player_name, posteam) %>%
  filter(n() >= 10) %>%  # Minimum attempts
  summarise(
    attempts = n(),
    conversions = sum(yards_gained >= ydstogo, na.rm = TRUE),
    conversion_rate = mean(yards_gained >= ydstogo, na.rm = TRUE),
    avg_yards = mean(yards_gained, na.rm = TRUE),
    stuffed_rate = mean(yards_gained <= 0, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(conversion_rate))

# Display short yardage specialists
short_yardage %>%
  head(20) %>%
  gt() %>%
  cols_label(
    rusher_player_name = "Player",
    posteam = "Team",
    attempts = "Attempts",
    conversions = "Conversions",
    conversion_rate = "Conv. Rate",
    avg_yards = "Avg Yards",
    stuffed_rate = "Stuffed %"
  ) %>%
  fmt_percent(
    columns = c(conversion_rate, stuffed_rate),
    decimals = 1
  ) %>%
  fmt_number(
    columns = avg_yards,
    decimals = 2
  ) %>%
  fmt_number(
    columns = c(attempts, conversions),
    decimals = 0
  ) %>%
  tab_header(
    title = "Short Yardage Conversion Leaders",
    subtitle = "3rd/4th down, 1-2 yards to go (Min. 10 attempts)"
  )
#| label: short-yardage-py
#| message: false
#| warning: false

# Analyze short yardage performance
short_yardage_plays = pbp.query(
    "play_type == 'run' & rusher_player_id.notna() & ydstogo <= 2 & down.isin([3, 4])"
).copy()

short_yardage = (short_yardage_plays
    .groupby(['rusher_player_name', 'posteam'])
    .filter(lambda x: len(x) >= 10)
    .groupby(['rusher_player_name', 'posteam'])
    .agg(
        attempts=('play_id', 'count'),
        conversions=('yards_gained', lambda x: (x >= short_yardage_plays.loc[x.index, 'ydstogo']).sum()),
        conversion_rate=('yards_gained', lambda x: (x >= short_yardage_plays.loc[x.index, 'ydstogo']).mean()),
        avg_yards=('yards_gained', 'mean'),
        stuffed_rate=('yards_gained', lambda x: (x <= 0).mean())
    )
    .reset_index()
    .sort_values('conversion_rate', ascending=False)
)

print("\nShort Yardage Conversion Leaders:\n")
print(short_yardage.head(20).to_string(index=False))

Red Zone Rushing

#| label: red-zone-rushing-r
#| message: false
#| warning: false

# Analyze red zone rushing (inside 20-yard line)
red_zone_rushing <- pbp %>%
  filter(
    play_type == "run",
    !is.na(rusher_player_id),
    yardline_100 <= 20
  ) %>%
  group_by(rusher_player_name, posteam) %>%
  filter(n() >= 15) %>%
  summarise(
    carries = n(),
    touchdowns = sum(touchdown == 1, na.rm = TRUE),
    td_rate = mean(touchdown == 1, na.rm = TRUE),
    avg_yards = mean(yards_gained, na.rm = TRUE),
    epa_per_rush = mean(epa, na.rm = TRUE),
    success_rate = mean(epa > 0, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(td_rate))

# Display red zone leaders
red_zone_rushing %>%
  head(20) %>%
  gt() %>%
  cols_label(
    rusher_player_name = "Player",
    posteam = "Team",
    carries = "Carries",
    touchdowns = "TDs",
    td_rate = "TD Rate",
    avg_yards = "YPC",
    epa_per_rush = "EPA/Rush",
    success_rate = "Success %"
  ) %>%
  fmt_percent(
    columns = c(td_rate, success_rate),
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(avg_yards, epa_per_rush),
    decimals = 2
  ) %>%
  fmt_number(
    columns = c(carries, touchdowns),
    decimals = 0
  ) %>%
  tab_header(
    title = "Red Zone Rushing Leaders",
    subtitle = "Inside the 20-yard line (Min. 15 carries)"
  )
#| label: red-zone-rushing-py
#| message: false
#| warning: false

# Analyze red zone rushing
red_zone_rushing = (pbp
    .query("play_type == 'run' & rusher_player_id.notna() & yardline_100 <= 20")
    .groupby(['rusher_player_name', 'posteam'])
    .filter(lambda x: len(x) >= 15)
    .groupby(['rusher_player_name', 'posteam'])
    .agg(
        carries=('play_id', 'count'),
        touchdowns=('touchdown', 'sum'),
        td_rate=('touchdown', 'mean'),
        avg_yards=('yards_gained', 'mean'),
        epa_per_rush=('epa', 'mean'),
        success_rate=('epa', lambda x: (x > 0).mean())
    )
    .reset_index()
    .sort_values('td_rate', ascending=False)
)

print("\nRed Zone Rushing Leaders:\n")
print(red_zone_rushing.head(20).to_string(index=False))

Running Back Receiving Contributions

Modern running backs must contribute in the passing game. Evaluating total value requires combining rushing and receiving.

RB Receiving Metrics

#| label: rb-receiving-r
#| message: false
#| warning: false

# Analyze RB receiving (identify RBs by their rushing activity)
rb_list <- pbp %>%
  filter(play_type == "run", !is.na(rusher_player_id)) %>%
  count(rusher_player_id, rusher_player_name) %>%
  filter(n >= 50) %>%
  pull(rusher_player_id)

rb_receiving <- pbp %>%
  filter(
    play_type == "pass",
    !is.na(receiver_player_id),
    receiver_player_id %in% rb_list
  ) %>%
  group_by(receiver_player_name, posteam) %>%
  summarise(
    targets = n(),
    receptions = sum(complete_pass == 1, na.rm = TRUE),
    catch_rate = mean(complete_pass == 1, na.rm = TRUE),
    rec_yards = sum(yards_gained * complete_pass, na.rm = TRUE),
    yards_per_rec = mean(yards_gained[complete_pass == 1], na.rm = TRUE),
    rec_tds = sum(touchdown == 1, na.rm = TRUE),
    epa_per_target = mean(epa, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(targets >= 30) %>%
  arrange(desc(epa_per_target))

# Display receiving leaders
rb_receiving %>%
  head(20) %>%
  gt() %>%
  cols_label(
    receiver_player_name = "Player",
    posteam = "Team",
    targets = "Targets",
    receptions = "Rec",
    catch_rate = "Catch %",
    rec_yards = "Yards",
    yards_per_rec = "Y/R",
    rec_tds = "TDs",
    epa_per_target = "EPA/Target"
  ) %>%
  fmt_percent(
    columns = catch_rate,
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(yards_per_rec, epa_per_target),
    decimals = 2
  ) %>%
  fmt_number(
    columns = c(targets, receptions, rec_yards, rec_tds),
    decimals = 0
  ) %>%
  tab_header(
    title = "Running Back Receiving Leaders",
    subtitle = "2023 Season (Min. 30 targets)"
  )
#| label: rb-receiving-py
#| message: false
#| warning: false

# Identify RBs
rb_list = (pbp
    .query("play_type == 'run' & rusher_player_id.notna()")
    .groupby('rusher_player_id')
    .size()
    .pipe(lambda x: x[x >= 50].index.tolist())
)

# Analyze RB receiving
rb_receiving = (pbp
    .query("play_type == 'pass' & receiver_player_id.notna() & receiver_player_id.isin(@rb_list)")
    .groupby(['receiver_player_name', 'posteam'])
    .agg(
        targets=('play_id', 'count'),
        receptions=('complete_pass', 'sum'),
        catch_rate=('complete_pass', 'mean'),
        rec_yards=('yards_gained', lambda x: (x * pbp.loc[x.index, 'complete_pass']).sum()),
        yards_per_rec=('yards_gained', lambda x: x[pbp.loc[x.index, 'complete_pass'] == 1].mean()),
        rec_tds=('touchdown', 'sum'),
        epa_per_target=('epa', 'mean')
    )
    .reset_index()
    .query("targets >= 30")
    .sort_values('epa_per_target', ascending=False)
)

print("\nRunning Back Receiving Leaders:\n")
print(rb_receiving.head(20).to_string(index=False))

Total RB Value: Combined Rushing + Receiving

#| label: total-rb-value-r
#| message: false
#| warning: false

# Calculate total RB value
rb_rushing_summary <- pbp %>%
  filter(play_type == "run", !is.na(rusher_player_id)) %>%
  group_by(rusher_player_id, rusher_player_name, posteam) %>%
  summarise(
    rush_attempts = n(),
    rush_yards = sum(yards_gained, na.rm = TRUE),
    rush_tds = sum(touchdown == 1, na.rm = TRUE),
    rush_epa = sum(epa, na.rm = TRUE),
    .groups = "drop"
  )

rb_receiving_summary <- pbp %>%
  filter(play_type == "pass", !is.na(receiver_player_id)) %>%
  group_by(receiver_player_id, receiver_player_name, posteam) %>%
  summarise(
    targets = n(),
    receptions = sum(complete_pass == 1, na.rm = TRUE),
    rec_yards = sum(yards_gained * complete_pass, na.rm = TRUE),
    rec_tds = sum(touchdown == 1, na.rm = TRUE),
    rec_epa = sum(epa, na.rm = TRUE),
    .groups = "drop"
  )

# Combine rushing and receiving
total_rb_value <- rb_rushing_summary %>%
  left_join(
    rb_receiving_summary,
    by = c("rusher_player_id" = "receiver_player_id",
           "rusher_player_name" = "receiver_player_name",
           "posteam")
  ) %>%
  replace_na(list(targets = 0, receptions = 0, rec_yards = 0, rec_tds = 0, rec_epa = 0)) %>%
  mutate(
    total_touches = rush_attempts + receptions,
    total_yards = rush_yards + rec_yards,
    total_tds = rush_tds + rec_tds,
    total_epa = rush_epa + rec_epa,
    epa_per_touch = total_epa / total_touches,
    receiving_pct = receptions / total_touches
  ) %>%
  filter(total_touches >= 100) %>%
  arrange(desc(total_epa))

# Display most valuable RBs
total_rb_value %>%
  head(20) %>%
  select(rusher_player_name, posteam, rush_attempts, receptions,
         total_touches, total_yards, total_tds, total_epa,
         epa_per_touch, receiving_pct) %>%
  gt() %>%
  cols_label(
    rusher_player_name = "Player",
    posteam = "Team",
    rush_attempts = "Rushes",
    receptions = "Rec",
    total_touches = "Touches",
    total_yards = "Total Yds",
    total_tds = "TDs",
    total_epa = "Total EPA",
    epa_per_touch = "EPA/Touch",
    receiving_pct = "Rec %"
  ) %>%
  fmt_number(
    columns = c(total_epa, epa_per_touch),
    decimals = 1
  ) %>%
  fmt_percent(
    columns = receiving_pct,
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(rush_attempts, receptions, total_touches, total_yards, total_tds),
    decimals = 0
  ) %>%
  data_color(
    columns = total_epa,
    colors = scales::col_numeric(
      palette = "Greens",
      domain = NULL
    )
  ) %>%
  tab_header(
    title = "Most Valuable Running Backs (Total EPA)",
    subtitle = "Combined rushing + receiving value, 2023 Season"
  )
#| label: total-rb-value-py
#| message: false
#| warning: false

# Calculate rushing summary
rb_rushing_summary = (pbp
    .query("play_type == 'run' & rusher_player_id.notna()")
    .groupby(['rusher_player_id', 'rusher_player_name', 'posteam'])
    .agg(
        rush_attempts=('play_id', 'count'),
        rush_yards=('yards_gained', 'sum'),
        rush_tds=('touchdown', 'sum'),
        rush_epa=('epa', 'sum')
    )
    .reset_index()
)

# Calculate receiving summary
rb_receiving_summary = (pbp
    .query("play_type == 'pass' & receiver_player_id.notna()")
    .groupby(['receiver_player_id', 'receiver_player_name', 'posteam'])
    .agg(
        targets=('play_id', 'count'),
        receptions=('complete_pass', 'sum'),
        rec_yards=('yards_gained', lambda x: (x * pbp.loc[x.index, 'complete_pass']).sum()),
        rec_tds=('touchdown', 'sum'),
        rec_epa=('epa', 'sum')
    )
    .reset_index()
)

# Combine
total_rb_value = rb_rushing_summary.merge(
    rb_receiving_summary,
    left_on=['rusher_player_id', 'rusher_player_name', 'posteam'],
    right_on=['receiver_player_id', 'receiver_player_name', 'posteam'],
    how='left'
).fillna(0)

# Calculate totals
total_rb_value['total_touches'] = total_rb_value['rush_attempts'] + total_rb_value['receptions']
total_rb_value['total_yards'] = total_rb_value['rush_yards'] + total_rb_value['rec_yards']
total_rb_value['total_tds'] = total_rb_value['rush_tds'] + total_rb_value['rec_tds']
total_rb_value['total_epa'] = total_rb_value['rush_epa'] + total_rb_value['rec_epa']
total_rb_value['epa_per_touch'] = total_rb_value['total_epa'] / total_rb_value['total_touches']
total_rb_value['receiving_pct'] = total_rb_value['receptions'] / total_rb_value['total_touches']

# Filter and display
total_rb_value = (total_rb_value
    .query("total_touches >= 100")
    .sort_values('total_epa', ascending=False)
)

print("\nMost Valuable Running Backs (Total EPA):\n")
display_cols = ['rusher_player_name', 'posteam', 'rush_attempts', 'receptions',
                'total_touches', 'total_yards', 'total_tds', 'total_epa',
                'epa_per_touch', 'receiving_pct']
print(total_rb_value[display_cols].head(20).to_string(index=False))

Visualizing RB Versatility

#| label: fig-rb-versatility-r
#| fig-cap: "Running back versatility: rushing vs receiving EPA"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false

total_rb_value %>%
  filter(total_touches >= 150) %>%
  ggplot(aes(x = rush_epa, y = rec_epa)) +
  geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
  geom_vline(xintercept = 0, linetype = "dashed", alpha = 0.5) +
  geom_point(aes(size = total_touches, color = epa_per_touch), alpha = 0.7) +
  geom_text(aes(label = rusher_player_name), size = 2.5,
            hjust = -0.1, vjust = -0.5, check_overlap = TRUE) +
  scale_color_gradient2(
    low = "#D50A0A", mid = "gray", high = "#013369",
    midpoint = 0, name = "EPA/Touch"
  ) +
  scale_size_continuous(range = c(3, 12), name = "Total Touches") +
  labs(
    title = "Running Back Versatility: Rushing vs Receiving Value",
    subtitle = "2023 NFL Season (Min. 150 touches)",
    x = "Total Rushing EPA",
    y = "Total Receiving EPA",
    caption = "Data: nflfastR | Top-right quadrant = elite dual-threat backs"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(size = 11),
    legend.position = "right"
  )
#| label: fig-rb-versatility-py
#| fig-cap: "RB versatility scatter plot - Python"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false

versatility_data = total_rb_value[total_rb_value['total_touches'] >= 150]

fig, ax = plt.subplots(figsize=(12, 8))

scatter = ax.scatter(versatility_data['rush_epa'],
                     versatility_data['rec_epa'],
                     s=versatility_data['total_touches']*2,
                     c=versatility_data['epa_per_touch'],
                     cmap='RdBu', alpha=0.7, edgecolors='black', linewidth=0.5)

# Add reference lines
ax.axhline(y=0, color='gray', linestyle='--', alpha=0.5)
ax.axvline(x=0, color='gray', linestyle='--', alpha=0.5)

# Add labels for selected players
for _, row in versatility_data.nlargest(10, 'total_epa').iterrows():
    ax.annotate(row['rusher_player_name'],
                xy=(row['rush_epa'], row['rec_epa']),
                xytext=(5, 5), textcoords='offset points',
                fontsize=8, alpha=0.8)

plt.colorbar(scatter, label='EPA/Touch', ax=ax)

ax.set_xlabel('Total Rushing EPA', fontsize=12)
ax.set_ylabel('Total Receiving EPA', fontsize=12)
ax.set_title('Running Back Versatility: Rushing vs Receiving Value\n2023 NFL Season (Min. 150 touches)',
             fontsize=14, fontweight='bold')
ax.text(0.98, 0.02, 'Data: nfl_data_py | Top-right = elite dual-threat backs',
        transform=ax.transAxes, ha='right', fontsize=8, style='italic')

plt.tight_layout()
plt.show()

Committee Approaches vs Bell Cows

The NFL has shifted from "bell cow" backs (one player taking 80%+ of carries) to committee approaches. Let's analyze this trend.

Identifying Backfield Distribution

#| label: backfield-distribution-r
#| message: false
#| warning: false

# Calculate team backfield distribution
backfield_usage <- pbp %>%
  filter(play_type == "run", !is.na(rusher_player_id), !is.na(posteam)) %>%
  group_by(posteam, rusher_player_name) %>%
  summarise(carries = n(), .groups = "drop_last") %>%
  mutate(
    team_carries = sum(carries),
    carry_share = carries / team_carries
  ) %>%
  ungroup()

# Find lead backs and their share
lead_backs <- backfield_usage %>%
  group_by(posteam) %>%
  slice_max(order_by = carry_share, n = 1) %>%
  ungroup() %>%
  arrange(desc(carry_share))

# Classify backfield approaches
backfield_approach <- backfield_usage %>%
  group_by(posteam) %>%
  summarise(
    total_carries = first(team_carries),
    lead_back = rusher_player_name[which.max(carry_share)],
    lead_back_share = max(carry_share),
    rb2_share = sort(carry_share, decreasing = TRUE)[2],
    rb3_share = sort(carry_share, decreasing = TRUE)[3],
    num_rbs_10pct = sum(carry_share >= 0.10),
    approach = case_when(
      lead_back_share >= 0.70 ~ "Bell Cow",
      lead_back_share >= 0.55 ~ "Featured Back",
      lead_back_share >= 0.40 ~ "1A/1B Split",
      TRUE ~ "Full Committee"
    ),
    .groups = "drop"
  ) %>%
  arrange(desc(lead_back_share))

# Display results
backfield_approach %>%
  gt() %>%
  cols_label(
    posteam = "Team",
    total_carries = "Total Carries",
    lead_back = "Lead Back",
    lead_back_share = "Lead %",
    rb2_share = "RB2 %",
    rb3_share = "RB3 %",
    num_rbs_10pct = "RBs ≥10%",
    approach = "Approach"
  ) %>%
  fmt_percent(
    columns = c(lead_back_share, rb2_share, rb3_share),
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(total_carries, num_rbs_10pct),
    decimals = 0
  ) %>%
  data_color(
    columns = lead_back_share,
    colors = scales::col_numeric(
      palette = c("#fee5d9", "#a50f15"),
      domain = c(0.3, 0.75)
    )
  ) %>%
  tab_header(
    title = "NFL Backfield Approaches by Team",
    subtitle = "2023 Season carry distribution"
  )
#| label: backfield-distribution-py
#| message: false
#| warning: false

# Calculate backfield distribution
backfield_usage = (pbp
    .query("play_type == 'run' & rusher_player_id.notna() & posteam.notna()")
    .groupby(['posteam', 'rusher_player_name'])
    .size()
    .reset_index(name='carries')
)

# Add team totals and shares
backfield_usage['team_carries'] = backfield_usage.groupby('posteam')['carries'].transform('sum')
backfield_usage['carry_share'] = backfield_usage['carries'] / backfield_usage['team_carries']

# Calculate team backfield approaches
def classify_approach(share):
    if share >= 0.70:
        return "Bell Cow"
    elif share >= 0.55:
        return "Featured Back"
    elif share >= 0.40:
        return "1A/1B Split"
    else:
        return "Full Committee"

backfield_approach = (backfield_usage
    .sort_values('carry_share', ascending=False)
    .groupby('posteam')
    .agg(
        total_carries=('team_carries', 'first'),
        lead_back=('rusher_player_name', 'first'),
        lead_back_share=('carry_share', 'max'),
        rb2_share=('carry_share', lambda x: sorted(x, reverse=True)[1] if len(x) > 1 else 0),
        rb3_share=('carry_share', lambda x: sorted(x, reverse=True)[2] if len(x) > 2 else 0),
        num_rbs_10pct=('carry_share', lambda x: (x >= 0.10).sum())
    )
    .reset_index()
)

backfield_approach['approach'] = backfield_approach['lead_back_share'].apply(classify_approach)
backfield_approach = backfield_approach.sort_values('lead_back_share', ascending=False)

print("\nNFL Backfield Approaches by Team (2023):\n")
print(backfield_approach.to_string(index=False))

Comparing Committee vs Bell Cow Efficiency

#| label: committee-efficiency-r
#| message: false
#| warning: false

# Join approach with team rushing efficiency
committee_efficiency <- backfield_approach %>%
  left_join(
    oline_rushing %>% select(posteam, epa_per_rush, success_rate, ypc),
    by = "posteam"
  )

# Compare approaches
approach_comparison <- committee_efficiency %>%
  group_by(approach) %>%
  summarise(
    teams = n(),
    avg_lead_share = mean(lead_back_share),
    avg_epa_per_rush = mean(epa_per_rush, na.rm = TRUE),
    avg_success_rate = mean(success_rate, na.rm = TRUE),
    avg_ypc = mean(ypc, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(factor(approach, levels = c("Bell Cow", "Featured Back", "1A/1B Split", "Full Committee")))

approach_comparison %>%
  gt() %>%
  cols_label(
    approach = "Backfield Approach",
    teams = "Teams",
    avg_lead_share = "Avg Lead Share",
    avg_epa_per_rush = "Avg EPA/Rush",
    avg_success_rate = "Avg Success %",
    avg_ypc = "Avg YPC"
  ) %>%
  fmt_percent(
    columns = c(avg_lead_share, avg_success_rate),
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(avg_epa_per_rush, avg_ypc),
    decimals = 3
  ) %>%
  fmt_number(
    columns = teams,
    decimals = 0
  ) %>%
  tab_header(
    title = "Backfield Approach vs Rushing Efficiency",
    subtitle = "Does a bell cow produce better results?"
  )
#| label: committee-efficiency-py
#| message: false
#| warning: false

# Merge with efficiency data
committee_efficiency = backfield_approach.merge(
    oline_rushing[['posteam', 'epa_per_rush', 'success_rate', 'ypc']],
    on='posteam'
)

# Compare approaches
approach_order = ['Bell Cow', 'Featured Back', '1A/1B Split', 'Full Committee']
approach_comparison = (committee_efficiency
    .groupby('approach')
    .agg(
        teams=('posteam', 'count'),
        avg_lead_share=('lead_back_share', 'mean'),
        avg_epa_per_rush=('epa_per_rush', 'mean'),
        avg_success_rate=('success_rate', 'mean'),
        avg_ypc=('ypc', 'mean')
    )
    .reset_index()
)

# Sort by approach order
approach_comparison['approach'] = pd.Categorical(
    approach_comparison['approach'],
    categories=approach_order,
    ordered=True
)
approach_comparison = approach_comparison.sort_values('approach')

print("\nBackfield Approach vs Rushing Efficiency:\n")
print(approach_comparison.to_string(index=False))

Summary

Rushing analytics has evolved far beyond yards per carry. Modern evaluation requires:

Key Metrics:
- EPA per rush: Context-aware efficiency measurement
- Success rate: Consistency indicator
- Yards before/after contact: Separating line and RB contributions
- Broken tackles: Individual elusiveness
- Directional efficiency: Scheme and tendency analysis

Situational Performance:
- Short yardage conversion rates
- Red zone efficiency
- Down-and-distance success

Modern RB Evaluation:
- Combined rushing and receiving value
- Versatility and role flexibility
- Backfield usage patterns

Key Findings from 2023:
- Pass EPA remains higher than rush EPA on average
- Elite running backs create value through consistency and receiving
- Committee approaches are increasingly common
- Situational rushing success matters more than overall volume

Exercises

Conceptual Questions

  1. YPC Limitations: Explain why a running back with 5.0 YPC might be less valuable than one with 4.2 YPC. What contextual factors could explain this?

  2. Line vs RB: How can we better separate offensive line contribution from running back contribution in rushing success? What metrics help with this distinction?

  3. Modern RB Role: How has the running back position evolved in the last decade? What skills are more valued now compared to 2010?

Coding Exercises

Exercise 1: Complete RB Evaluation

Build a comprehensive running back evaluation system that combines: a) Rush EPA and success rate b) Receiving EPA and catch rate c) Situational performance (short yardage, red zone) d) Total value above replacement Create a single composite score ranking all qualifying RBs.

Exercise 2: Offensive Line Run Blocking

Analyze offensive line run blocking by: a) Calculating team rush EPA before contact (when available) b) Comparing stuffed rate (≤0 yards) across teams c) Analyzing explosive play rate (≥10 yards) d) Creating a run blocking efficiency ranking Which teams create the best running lanes?

Exercise 3: Directional Analysis

For a specific team, analyze: a) Rush EPA by direction (left, middle, right) b) Success rate by gap (when available) c) Personnel grouping impact on rushing d) Down and distance tendencies Identify that team's most/least effective rushing approaches.

Exercise 4: Historical Analysis

Compare rushing analytics across multiple seasons: a) How has league-wide rush EPA evolved? b) Has the value of receiving for RBs increased? c) Are committee approaches becoming more common? d) Do bell cow backs face more injury risk? Use at least 3 seasons of data for this analysis.

Further Reading

  • Baldwin, B. (2021). "Running Back Analytics: Beyond Yards Per Carry." Open Source Football.
  • Eager, E. & Paine, N. (2022). "The Evolution of the Running Back Position." FiveThirtyEight.
  • Yurko, R., Ventura, S., & Horowitz, M. (2019). "Going deep: models for continuous-time within-play valuation of game outcomes in American football with tracking data." Journal of Quantitative Analysis in Sports, 16(2), 163-182.
  • Pro Football Focus. (2023). "RB Grades and Advanced Metrics Explained."

References

:::