Learning ObjectivesBy the end of this chapter, you will be able to:

Win Probability Game Simulation

Watch how win probability changes throughout a simulated game. Each play can dramatically shift momentum.

  1. Build win probability models from scratch using logistic regression and machine learning
  2. Understand the key drivers of win probability (score, time, field position)
  3. Calculate Win Probability Added (WPA) for plays and players
  4. Apply win probability to inform in-game decision-making
  5. Analyze clutch performance using WP-based metrics

Introduction

Few moments in football are as exhilarating as a game-changing play late in the fourth quarter. When a team drives down the field, trailing by 4 points with 2 minutes left, every fan instinctively understands that the next play matters enormously. But exactly how much? What are the team's chances of winning before and after a crucial third-down conversion? How does a sudden turnover change the outcome probability?

Win probability (WP) models answer these questions by estimating the likelihood of victory given the current game situation. Unlike Expected Points, which measures field position value, win probability directly addresses what matters most: who will ultimately win the game.

This chapter explores win probability from multiple angles. We'll build WP models from scratch, analyze the key factors that drive winning probability, calculate Win Probability Added (WPA) to measure play and player impact, identify high-leverage situations, and use WP to inform strategic decisions.

What is Win Probability?

Win probability estimates the percentage chance that a team will win the game based on current game state, including: - **Score differential**: Point spread between teams - **Time remaining**: Seconds left in game - **Field position**: Yard line of possession - **Down and distance**: Current down and yards to go - **Timeouts**: Remaining timeouts for each team - **Possession**: Which team has the ball A well-calibrated WP model should predict outcomes correctly across all probability levels.

The Evolution of Win Probability

Early Win Probability Models

Win probability modeling in football dates back to the 1970s:

  • Virgil Carter (1971): First systematic study of game situations and winning probability
  • Hidden Game of Football (1988): Early probability tables based on game situations
  • Pro-Football-Reference (2000s): First widely available online WP calculator

Modern WP Revolution

The analytics era brought sophisticated WP models:

  • Advanced NFL Stats (2009): Brian Burke's WP model became the gold standard
  • nflWAR (2017): Comprehensive player value using WP framework
  • nflfastR (2020): Built-in WP models available in play-by-play data
  • ESPN Win Probability (2020+): Real-time WP graphics during broadcasts

Today, WP is ubiquitous in football coverage, displayed during every close game and used to evaluate critical decisions.

Mathematical Foundation of Win Probability

Binary Classification Framework

Win probability is fundamentally a binary classification problem: will the team with possession win (1) or lose (0)?

For logistic regression, we model the log-odds of winning:

$$ \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_k x_k $$

Where:
- $p$ = probability of winning
- $x_i$ = predictor variables (score, time, field position, etc.)
- $\beta_i$ = coefficients to be estimated

Converting to probability:

$$ p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + ... + \beta_k x_k)}} $$

Key Predictor Variables

Essential features for WP models include:

  1. Score differential: Most important predictor
    - $\text{score\_diff} = \text{posteam\_score} - \text{defteam\_score}$

  2. Time remaining: Converted to seconds
    - $\text{game\_seconds\_remaining}$

  3. Score-time interaction: Differential impact varies by time
    - $\text{score\_diff} \times \text{game\_seconds\_remaining}$

  4. Field position: EPA or yards from end zone
    - Better than raw yardline due to nonlinearity

  5. Down and distance: Critical for conversion probability

  6. Possession: Who has the ball (incorporated in score differential)

  7. Timeouts: Remaining for each team

Model Validation Metrics

WP models should be evaluated on:

Calibration: Do predicted probabilities match actual outcomes?
- Plot predicted WP vs actual win rate in bins
- Ideal: points fall on diagonal line

Discrimination: Can model separate wins from losses?
- AUC (Area Under Curve): Higher is better (>0.75 is good)
- Log loss: Lower is better

Coverage: Does model work across all game situations?
- Test on rare situations (e.g., end-of-half, overtime)

Building a Basic Win Probability Model

Data Preparation

#| label: setup-r
#| message: false
#| warning: false

library(tidyverse)
library(nflfastR)
library(nflplotR)
library(gt)
library(caret)

# Load multiple seasons for training
pbp <- load_pbp(2018:2023)

# Prepare data for WP modeling
wp_data <- pbp %>%
  filter(!is.na(posteam), !is.na(score_differential)) %>%
  filter(season_type == "REG") %>%
  mutate(
    # Create binary outcome: did posteam win?
    posteam_won = if_else(result > 0, 1, 0),

    # Time remaining in seconds
    game_seconds = game_seconds_remaining,

    # Score differential (positive = posteam ahead)
    score_diff = score_differential,

    # Field position (use EPA for better representation)
    field_pos = yardline_100,

    # Down variables
    is_first_down = if_else(down == 1, 1, 0),
    is_third_or_fourth = if_else(down >= 3, 1, 0),

    # Distance to go
    yards_to_go = ydstogo,

    # Interaction terms
    score_time = score_diff * game_seconds,
    score_squared = score_diff^2,

    # Half indicator
    half = if_else(qtr <= 2, 1, 2)
  ) %>%
  select(
    game_id, play_id, posteam, defteam,
    posteam_won, score_diff, game_seconds, field_pos,
    down, yards_to_go, is_first_down, is_third_or_fourth,
    score_time, score_squared, half
  ) %>%
  filter(!is.na(posteam_won))

cat("Prepared", nrow(wp_data), "plays for WP modeling\n")
cat("Winning team plays:", sum(wp_data$posteam_won), "\n")
cat("Losing team plays:", sum(wp_data$posteam_won == 0), "\n")
cat("Win rate:", mean(wp_data$posteam_won), "\n")
#| label: setup-py
#| message: false
#| warning: false

import pandas as pd
import numpy as np
import nfl_data_py as nfl
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, log_loss, roc_curve
from sklearn.ensemble import GradientBoostingClassifier
import warnings
warnings.filterwarnings('ignore')

# Load multiple seasons
pbp = nfl.import_pbp_data(range(2018, 2024))

# Prepare data
wp_data = pbp[
    (pbp['posteam'].notna()) &
    (pbp['score_differential'].notna()) &
    (pbp['season_type'] == 'REG')
].copy()

# Create features
wp_data['posteam_won'] = (wp_data['result'] > 0).astype(int)
wp_data['game_seconds'] = wp_data['game_seconds_remaining']
wp_data['score_diff'] = wp_data['score_differential']
wp_data['field_pos'] = wp_data['yardline_100']
wp_data['is_first_down'] = (wp_data['down'] == 1).astype(int)
wp_data['is_third_or_fourth'] = (wp_data['down'] >= 3).astype(int)
wp_data['yards_to_go'] = wp_data['ydstogo']
wp_data['score_time'] = wp_data['score_diff'] * wp_data['game_seconds']
wp_data['score_squared'] = wp_data['score_diff'] ** 2
wp_data['half'] = (wp_data['qtr'] <= 2).astype(int) + 1

# Select features
feature_cols = ['score_diff', 'game_seconds', 'field_pos', 'down',
                'yards_to_go', 'is_first_down', 'is_third_or_fourth',
                'score_time', 'score_squared', 'half']

wp_data = wp_data[['game_id', 'play_id', 'posteam', 'defteam', 'posteam_won'] +
                  feature_cols].dropna()

print(f"Prepared {len(wp_data):,} plays for WP modeling")
print(f"Winning team plays: {wp_data['posteam_won'].sum():,}")
print(f"Losing team plays: {(wp_data['posteam_won'] == 0).sum():,}")
print(f"Win rate: {wp_data['posteam_won'].mean():.3f}")

Logistic Regression WP Model

#| label: logistic-wp-r
#| message: false
#| warning: false
#| cache: true

# Split into training and test sets
set.seed(2024)
train_index <- createDataPartition(wp_data$posteam_won, p = 0.8, list = FALSE)
train_data <- wp_data[train_index, ]
test_data <- wp_data[-train_index, ]

# Build logistic regression model
wp_model <- glm(
  posteam_won ~ score_diff + game_seconds + field_pos +
    score_time + score_squared +
    down + yards_to_go + half,
  data = train_data,
  family = binomial(link = "logit")
)

# Model summary
summary(wp_model)

# Predict on test set
test_data$wp_pred <- predict(wp_model, test_data, type = "response")

# Calculate metrics
auc <- roc_auc_score(test_data$posteam_won, test_data$wp_pred)
logloss <- log_loss(test_data$posteam_won, test_data$wp_pred)

cat("\nModel Performance:\n")
cat("AUC:", round(auc, 4), "\n")
cat("Log Loss:", round(logloss, 4), "\n")

# Calibration check
calibration <- test_data %>%
  mutate(wp_bin = cut(wp_pred, breaks = seq(0, 1, 0.1))) %>%
  group_by(wp_bin) %>%
  summarise(
    predicted_wp = mean(wp_pred),
    actual_win_rate = mean(posteam_won),
    n_plays = n(),
    .groups = "drop"
  ) %>%
  filter(n_plays >= 100)

calibration %>%
  gt() %>%
  cols_label(
    wp_bin = "WP Range",
    predicted_wp = "Predicted WP",
    actual_win_rate = "Actual Win Rate",
    n_plays = "Plays"
  ) %>%
  fmt_percent(
    columns = c(predicted_wp, actual_win_rate),
    decimals = 1
  ) %>%
  fmt_number(
    columns = n_plays,
    decimals = 0,
    use_seps = TRUE
  ) %>%
  tab_header(
    title = "Win Probability Model Calibration",
    subtitle = "Predicted vs Actual Win Rate by WP Decile"
  )
#| label: logistic-wp-py
#| message: false
#| warning: false
#| cache: true

# Split data
X = wp_data[feature_cols]
y = wp_data['posteam_won']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2024, stratify=y
)

# Build logistic regression model
wp_model = LogisticRegression(max_iter=1000, random_state=2024)
wp_model.fit(X_train, y_train)

# Predict probabilities
y_pred_proba = wp_model.predict_proba(X_test)[:, 1]

# Calculate metrics
auc = roc_auc_score(y_test, y_pred_proba)
logloss = log_loss(y_test, y_pred_proba)

print("\nLogistic Regression Model Performance:")
print(f"AUC: {auc:.4f}")
print(f"Log Loss: {logloss:.4f}")

# Feature importance
feature_importance = pd.DataFrame({
    'feature': feature_cols,
    'coefficient': wp_model.coef_[0]
}).sort_values('coefficient', key=abs, ascending=False)

print("\nTop Feature Coefficients:")
print(feature_importance.head(10).to_string(index=False))

# Calibration check
test_results = pd.DataFrame({
    'actual': y_test,
    'predicted': y_pred_proba
})

test_results['wp_bin'] = pd.cut(test_results['predicted'], bins=10)
calibration = test_results.groupby('wp_bin').agg({
    'predicted': 'mean',
    'actual': ['mean', 'count']
}).round(3)

print("\nCalibration by WP Decile:")
print(calibration)

Visualizing Model Calibration

#| label: fig-calibration-r
#| fig-cap: "Win probability model calibration curve"
#| fig-width: 10
#| fig-height: 7
#| message: false
#| warning: false

# Create calibration plot
ggplot(calibration, aes(x = predicted_wp, y = actual_win_rate)) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed",
              color = "red", size = 1) +
  geom_point(aes(size = n_plays), alpha = 0.7, color = "#00BFC4") +
  geom_smooth(method = "loess", se = TRUE, color = "darkblue", size = 1) +
  scale_x_continuous(labels = scales::percent_format(), limits = c(0, 1)) +
  scale_y_continuous(labels = scales::percent_format(), limits = c(0, 1)) +
  scale_size_continuous(labels = scales::comma) +
  labs(
    title = "Win Probability Model Calibration",
    subtitle = "Perfect calibration falls on red diagonal line",
    x = "Predicted Win Probability",
    y = "Actual Win Rate",
    size = "Number of Plays",
    caption = "Data: nflfastR | 2018-2023 Seasons"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "right"
  )
#| label: fig-calibration-py
#| fig-cap: "Win probability model calibration curve - Python"
#| fig-width: 10
#| fig-height: 7
#| message: false
#| warning: false

# Create bins for calibration
n_bins = 20
bin_edges = np.linspace(0, 1, n_bins + 1)
bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2

# Calculate actual vs predicted for each bin
actual_rates = []
predicted_rates = []
counts = []

for i in range(n_bins):
    mask = (y_pred_proba >= bin_edges[i]) & (y_pred_proba < bin_edges[i+1])
    if mask.sum() > 0:
        actual_rates.append(y_test[mask].mean())
        predicted_rates.append(y_pred_proba[mask].mean())
        counts.append(mask.sum())

# Create plot
fig, ax = plt.subplots(figsize=(10, 7))

# Perfect calibration line
ax.plot([0, 1], [0, 1], 'r--', linewidth=2, label='Perfect Calibration', alpha=0.7)

# Actual calibration
scatter = ax.scatter(predicted_rates, actual_rates, s=[c/100 for c in counts],
                    alpha=0.6, c='#00BFC4', edgecolors='black', linewidth=0.5)

# Trend line
z = np.polyfit(predicted_rates, actual_rates, 2)
p = np.poly1d(z)
x_smooth = np.linspace(0, 1, 100)
ax.plot(x_smooth, p(x_smooth), 'darkblue', linewidth=2, label='Fitted Curve')

ax.set_xlabel('Predicted Win Probability', fontsize=12)
ax.set_ylabel('Actual Win Rate', fontsize=12)
ax.set_title('Win Probability Model Calibration\nPerfect calibration falls on red diagonal line',
            fontsize=14, fontweight='bold')
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.0%}'))
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))
ax.legend(loc='upper left')
ax.grid(True, alpha=0.3)
ax.text(0.98, 0.02, 'Data: nfl_data_py | 2018-2023 Seasons',
        transform=ax.transAxes, ha='right', fontsize=8, style='italic')

plt.tight_layout()
plt.show()

Advanced Win Probability Models

Gradient Boosting WP Model

Machine learning models can capture nonlinear relationships:

#| label: gbm-wp-r
#| message: false
#| warning: false
#| cache: true

library(xgboost)

# Prepare data for xgboost
train_matrix <- xgb.DMatrix(
  data = as.matrix(train_data %>% select(score_diff, game_seconds, field_pos,
                                         down, yards_to_go, score_time, score_squared, half)),
  label = train_data$posteam_won
)

test_matrix <- xgb.DMatrix(
  data = as.matrix(test_data %>% select(score_diff, game_seconds, field_pos,
                                        down, yards_to_go, score_time, score_squared, half)),
  label = test_data$posteam_won
)

# Train gradient boosting model
xgb_model <- xgb.train(
  data = train_matrix,
  params = list(
    objective = "binary:logistic",
    eval_metric = "auc",
    max_depth = 6,
    eta = 0.1,
    subsample = 0.8
  ),
  nrounds = 100,
  watchlist = list(train = train_matrix, test = test_matrix),
  verbose = 0
)

# Predictions
test_data$wp_xgb <- predict(xgb_model, test_matrix)

# Compare models
auc_logit <- roc_auc_score(test_data$posteam_won, test_data$wp_pred)
auc_xgb <- roc_auc_score(test_data$posteam_won, test_data$wp_xgb)

tibble(
  Model = c("Logistic Regression", "XGBoost"),
  AUC = c(auc_logit, auc_xgb),
  `Log Loss` = c(
    log_loss(test_data$posteam_won, test_data$wp_pred),
    log_loss(test_data$posteam_won, test_data$wp_xgb)
  )
) %>%
  gt() %>%
  fmt_number(decimals = 4) %>%
  tab_header(
    title = "WP Model Comparison",
    subtitle = "Logistic Regression vs Gradient Boosting"
  ) %>%
  data_color(
    columns = AUC,
    colors = scales::col_numeric(
      palette = c("#ffffbf", "#1a9850"),
      domain = c(0.7, 0.9)
    )
  )

# Feature importance
importance <- xgb.importance(model = xgb_model)
xgb.plot.importance(importance, top_n = 10)
#| label: gbm-wp-py
#| message: false
#| warning: false
#| cache: true

# Train gradient boosting model
gbm_model = GradientBoostingClassifier(
    n_estimators=100,
    max_depth=6,
    learning_rate=0.1,
    subsample=0.8,
    random_state=2024
)

gbm_model.fit(X_train, y_train)

# Predictions
y_pred_gbm = gbm_model.predict_proba(X_test)[:, 1]

# Compare models
auc_logit = roc_auc_score(y_test, y_pred_proba)
auc_gbm = roc_auc_score(y_test, y_pred_gbm)
logloss_logit = log_loss(y_test, y_pred_proba)
logloss_gbm = log_loss(y_test, y_pred_gbm)

print("\nModel Comparison:")
print("=" * 50)
print(f"{'Model':<25} {'AUC':>10} {'Log Loss':>12}")
print("-" * 50)
print(f"{'Logistic Regression':<25} {auc_logit:>10.4f} {logloss_logit:>12.4f}")
print(f"{'Gradient Boosting':<25} {auc_gbm:>10.4f} {logloss_gbm:>12.4f}")

# Feature importance
feature_importance_gbm = pd.DataFrame({
    'feature': feature_cols,
    'importance': gbm_model.feature_importances_
}).sort_values('importance', ascending=False)

print("\nFeature Importance (Gradient Boosting):")
print(feature_importance_gbm.to_string(index=False))

# Visualize feature importance
fig, ax = plt.subplots(figsize=(10, 6))
ax.barh(range(len(feature_importance_gbm)), feature_importance_gbm['importance'],
        alpha=0.8, color='#00BFC4')
ax.set_yticks(range(len(feature_importance_gbm)))
ax.set_yticklabels(feature_importance_gbm['feature'])
ax.set_xlabel('Importance', fontsize=12)
ax.set_title('Feature Importance in Win Probability Model',
            fontsize=14, fontweight='bold')
ax.invert_yaxis()
plt.tight_layout()
plt.show()

Key Drivers of Win Probability

Score Differential Impact

#| label: fig-score-wp-r
#| fig-cap: "Win probability by score differential and time remaining"
#| fig-width: 10
#| fig-height: 7
#| message: false
#| warning: false

# Create scenarios varying score and time
scenarios <- expand_grid(
  score_diff = seq(-21, 21, by = 3),
  game_seconds = c(3600, 1800, 900, 300, 120),
  field_pos = 50,  # Midfield
  down = 1,
  yards_to_go = 10,
  half = 2
) %>%
  mutate(
    is_first_down = 1,
    is_third_or_fourth = 0,
    score_time = score_diff * game_seconds,
    score_squared = score_diff^2,
    time_label = case_when(
      game_seconds == 3600 ~ "Start of 3rd Quarter",
      game_seconds == 1800 ~ "Mid 3rd Quarter",
      game_seconds == 900 ~ "Start of 4th Quarter",
      game_seconds == 300 ~ "5 minutes left",
      game_seconds == 120 ~ "2 minutes left"
    )
  )

# Predict WP for scenarios
scenarios$wp <- predict(wp_model, scenarios, type = "response")

# Visualize
ggplot(scenarios, aes(x = score_diff, y = wp, color = time_label)) +
  geom_line(size = 1.2, alpha = 0.8) +
  geom_hline(yintercept = 0.5, linetype = "dashed", color = "gray50") +
  geom_vline(xintercept = 0, linetype = "dashed", color = "gray50") +
  scale_y_continuous(labels = scales::percent_format(), limits = c(0, 1)) +
  scale_color_brewer(palette = "Set1") +
  labs(
    title = "Win Probability by Score Differential and Time Remaining",
    subtitle = "Possession at midfield, 1st and 10",
    x = "Score Differential (Positive = Team Ahead)",
    y = "Win Probability",
    color = "Time Remaining",
    caption = "Data: nflfastR WP Model"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "right"
  )

📊 Visualization Output

The code above generates a visualization. To see the output, run this code in your R or Python environment. The resulting plot will help illustrate the concepts discussed in this section.

#| label: fig-score-wp-py
#| fig-cap: "Win probability by score differential and time remaining - Python"
#| fig-width: 10
#| fig-height: 7
#| message: false
#| warning: false

# Create scenarios
score_diffs = np.arange(-21, 22, 3)
time_points = [3600, 1800, 900, 300, 120]
time_labels = {
    3600: "Start of 3rd Quarter",
    1800: "Mid 3rd Quarter",
    900: "Start of 4th Quarter",
    300: "5 minutes left",
    120: "2 minutes left"
}

# Generate all combinations
scenarios_list = []
for time_sec in time_points:
    for score in score_diffs:
        scenarios_list.append({
            'score_diff': score,
            'game_seconds': time_sec,
            'field_pos': 50,
            'down': 1,
            'yards_to_go': 10,
            'is_first_down': 1,
            'is_third_or_fourth': 0,
            'score_time': score * time_sec,
            'score_squared': score ** 2,
            'half': 2,
            'time_label': time_labels[time_sec]
        })

scenarios_df = pd.DataFrame(scenarios_list)

# Predict WP
X_scenarios = scenarios_df[feature_cols]
scenarios_df['wp'] = wp_model.predict_proba(X_scenarios)[:, 1]

# Visualize
fig, ax = plt.subplots(figsize=(10, 7))

colors = ['#e41a1c', '#377eb8', '#4daf4a', '#984ea3', '#ff7f00']
for i, (time_sec, label) in enumerate(time_labels.items()):
    data = scenarios_df[scenarios_df['game_seconds'] == time_sec]
    ax.plot(data['score_diff'], data['wp'], linewidth=2.5,
           label=label, color=colors[i], alpha=0.8)

ax.axhline(y=0.5, color='gray', linestyle='--', alpha=0.7)
ax.axvline(x=0, color='gray', linestyle='--', alpha=0.7)
ax.set_xlabel('Score Differential (Positive = Team Ahead)', fontsize=12)
ax.set_ylabel('Win Probability', fontsize=12)
ax.set_title('Win Probability by Score Differential and Time Remaining\nPossession at midfield, 1st and 10',
            fontsize=14, fontweight='bold')
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))
ax.set_ylim(0, 1)
ax.legend(title='Time Remaining', loc='upper left')
ax.grid(True, alpha=0.3)
ax.text(0.98, 0.02, 'Data: WP Model Predictions',
        transform=ax.transAxes, ha='right', fontsize=8, style='italic')

plt.tight_layout()
plt.show()

Field Position Impact

#| label: fig-field-pos-wp-r
#| fig-cap: "Win probability by field position in close games"
#| fig-width: 10
#| fig-height: 6
#| message: false
#| warning: false

# Scenarios varying field position
field_scenarios <- expand_grid(
  score_diff = c(-3, 0, 3),
  game_seconds = 300,  # 5 minutes left
  field_pos = seq(5, 95, by = 5),
  down = 1,
  yards_to_go = 10,
  half = 2
) %>%
  mutate(
    is_first_down = 1,
    is_third_or_fourth = 0,
    score_time = score_diff * game_seconds,
    score_squared = score_diff^2,
    score_label = case_when(
      score_diff == -3 ~ "Down 3",
      score_diff == 0 ~ "Tied",
      score_diff == 3 ~ "Up 3"
    )
  )

field_scenarios$wp <- predict(wp_model, field_scenarios, type = "response")

ggplot(field_scenarios, aes(x = 100 - field_pos, y = wp, color = score_label)) +
  geom_line(size = 1.2, alpha = 0.8) +
  geom_hline(yintercept = 0.5, linetype = "dashed", color = "gray50") +
  scale_y_continuous(labels = scales::percent_format()) +
  scale_color_manual(
    values = c("Down 3" = "#d73027", "Tied" = "#fee08b", "Up 3" = "#1a9850")
  ) +
  labs(
    title = "Win Probability by Field Position",
    subtitle = "5 minutes remaining in 4th quarter, 1st and 10",
    x = "Yard Line (Offense's Perspective)",
    y = "Win Probability",
    color = "Score",
    caption = "Data: nflfastR WP Model"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "top"
  )

📊 Visualization Output

The code above generates a visualization. To see the output, run this code in your R or Python environment. The resulting plot will help illustrate the concepts discussed in this section.

#| label: fig-field-pos-wp-py
#| fig-cap: "Win probability by field position in close games - Python"
#| fig-width: 10
#| fig-height: 6
#| message: false
#| warning: false

# Create field position scenarios
score_levels = [-3, 0, 3]
score_labels = {-3: "Down 3", 0: "Tied", 3: "Up 3"}
field_positions = np.arange(5, 96, 5)

field_scenarios_list = []
for score in score_levels:
    for fp in field_positions:
        field_scenarios_list.append({
            'score_diff': score,
            'game_seconds': 300,
            'field_pos': fp,
            'down': 1,
            'yards_to_go': 10,
            'is_first_down': 1,
            'is_third_or_fourth': 0,
            'score_time': score * 300,
            'score_squared': score ** 2,
            'half': 2,
            'score_label': score_labels[score]
        })

field_scenarios_df = pd.DataFrame(field_scenarios_list)
X_field = field_scenarios_df[feature_cols]
field_scenarios_df['wp'] = wp_model.predict_proba(X_field)[:, 1]

# Visualize
fig, ax = plt.subplots(figsize=(10, 6))

colors = {'Down 3': '#d73027', 'Tied': '#fee08b', 'Up 3': '#1a9850'}
for score_label in ['Down 3', 'Tied', 'Up 3']:
    data = field_scenarios_df[field_scenarios_df['score_label'] == score_label]
    ax.plot(100 - data['field_pos'], data['wp'], linewidth=2.5,
           label=score_label, color=colors[score_label], alpha=0.8)

ax.axhline(y=0.5, color='gray', linestyle='--', alpha=0.7)
ax.set_xlabel("Yard Line (Offense's Perspective)", fontsize=12)
ax.set_ylabel('Win Probability', fontsize=12)
ax.set_title('Win Probability by Field Position\n5 minutes remaining in 4th quarter, 1st and 10',
            fontsize=14, fontweight='bold')
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))
ax.legend(title='Score', loc='upper left')
ax.grid(True, alpha=0.3)
ax.text(0.98, 0.02, 'Data: WP Model Predictions',
        transform=ax.transAxes, ha='right', fontsize=8, style='italic')

plt.tight_layout()
plt.show()

Win Probability Added (WPA)

Calculating WPA

Win Probability Added measures the change in win probability from a play:

$$ \text{WPA} = \text{WP}_{\text{after}} - \text{WP}_{\text{before}} $$

#| label: wpa-calculation-r
#| message: false
#| warning: false

# Calculate WPA for 2023 season
pbp_2023 <- load_pbp(2023) %>%
  filter(!is.na(wp), !is.na(def_wp)) %>%
  mutate(
    # WPA is change in win probability
    wpa = wp - lag(wp),
    # For defense, use change in defensive WP
    def_wpa = def_wp - lag(def_wp)
  )

# Biggest WPA plays of 2023
biggest_wpa <- pbp_2023 %>%
  filter(abs(wpa) >= 0.20) %>%
  select(
    game_date, posteam, defteam, qtr, time, down, ydstogo,
    desc, wp, wpa, play_type
  ) %>%
  arrange(desc(abs(wpa))) %>%
  head(10)

biggest_wpa %>%
  gt() %>%
  cols_label(
    game_date = "Date",
    posteam = "Team",
    defteam = "Opp",
    qtr = "Qtr",
    time = "Time",
    down = "Down",
    ydstogo = "Dist",
    desc = "Play Description",
    wp = "WP After",
    wpa = "WPA",
    play_type = "Type"
  ) %>%
  fmt_percent(
    columns = c(wp, wpa),
    decimals = 1
  ) %>%
  data_color(
    columns = wpa,
    colors = scales::col_numeric(
      palette = c("#d73027", "#ffffbf", "#1a9850"),
      domain = c(-0.5, 0.5)
    )
  ) %>%
  tab_header(
    title = "Highest WPA Plays - 2023 Season",
    subtitle = "Plays with WPA magnitude >= 20%"
  )
#| label: wpa-calculation-py
#| message: false
#| warning: false

# Load 2023 season with WP
pbp_2023 = nfl.import_pbp_data([2023])
pbp_2023 = pbp_2023[(pbp_2023['wp'].notna()) & (pbp_2023['def_wp'].notna())].copy()

# Calculate WPA
pbp_2023['wpa'] = pbp_2023.groupby('game_id')['wp'].diff()
pbp_2023['def_wpa'] = pbp_2023.groupby('game_id')['def_wp'].diff()

# Find biggest WPA plays
biggest_wpa = pbp_2023[pbp_2023['wpa'].abs() >= 0.20].copy()
biggest_wpa = biggest_wpa.sort_values('wpa', key=abs, ascending=False).head(10)

print("\nHighest WPA Plays - 2023 Season")
print("Plays with WPA magnitude >= 20%")
print("=" * 100)

display_cols = ['game_date', 'posteam', 'defteam', 'qtr', 'time',
                'down', 'ydstogo', 'play_type', 'wp', 'wpa']
print(biggest_wpa[display_cols].to_string(index=False))

WPA by Play Type

#| label: fig-wpa-by-type-r
#| fig-cap: "Distribution of WPA by play type"
#| fig-width: 10
#| fig-height: 6
#| message: false
#| warning: false

# Analyze WPA distribution by play type
wpa_by_type <- pbp_2023 %>%
  filter(play_type %in% c("pass", "run")) %>%
  filter(!is.na(wpa), abs(wpa) <= 0.5) %>%
  group_by(play_type) %>%
  summarise(
    plays = n(),
    mean_wpa = mean(wpa, na.rm = TRUE),
    median_wpa = median(wpa, na.rm = TRUE),
    sd_wpa = sd(wpa, na.rm = TRUE),
    positive_wpa_rate = mean(wpa > 0, na.rm = TRUE),
    big_plays = sum(abs(wpa) > 0.10, na.rm = TRUE),
    .groups = "drop"
  )

# Visualize WPA distribution
pbp_2023 %>%
  filter(play_type %in% c("pass", "run")) %>%
  filter(!is.na(wpa), abs(wpa) <= 0.25) %>%
  ggplot(aes(x = wpa, fill = play_type)) +
  geom_density(alpha = 0.6) +
  geom_vline(xintercept = 0, linetype = "dashed", color = "black") +
  scale_fill_manual(
    values = c("pass" = "#00BFC4", "run" = "#F8766D"),
    labels = c("Pass", "Run")
  ) +
  scale_x_continuous(labels = scales::percent_format()) +
  labs(
    title = "Win Probability Added Distribution by Play Type",
    subtitle = "2023 NFL Regular Season",
    x = "Win Probability Added",
    y = "Density",
    fill = "Play Type",
    caption = "Data: nflfastR"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "top"
  )

📊 Visualization Output

The code above generates a visualization. To see the output, run this code in your R or Python environment. The resulting plot will help illustrate the concepts discussed in this section.

#| label: fig-wpa-by-type-py
#| fig-cap: "Distribution of WPA by play type - Python"
#| fig-width: 10
#| fig-height: 6
#| message: false
#| warning: false

# Filter data
wpa_data = pbp_2023[
    (pbp_2023['play_type'].isin(['pass', 'run'])) &
    (pbp_2023['wpa'].notna()) &
    (pbp_2023['wpa'].abs() <= 0.25)
].copy()

# Summary statistics
wpa_summary = wpa_data.groupby('play_type').agg({
    'wpa': ['count', 'mean', 'median', 'std']
}).round(4)

print("\nWPA Summary by Play Type:")
print(wpa_summary)

# Visualize distribution
fig, ax = plt.subplots(figsize=(10, 6))

for play_type, color in [('pass', '#00BFC4'), ('run', '#F8766D')]:
    data = wpa_data[wpa_data['play_type'] == play_type]['wpa']
    ax.hist(data, bins=50, alpha=0.6, label=play_type.title(),
           color=color, density=True)

ax.axvline(x=0, color='black', linestyle='--', alpha=0.7)
ax.set_xlabel('Win Probability Added', fontsize=12)
ax.set_ylabel('Density', fontsize=12)
ax.set_title('Win Probability Added Distribution by Play Type\n2023 NFL Regular Season',
            fontsize=14, fontweight='bold')
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.0%}'))
ax.legend(title='Play Type', loc='upper right')
ax.text(0.98, 0.02, 'Data: nfl_data_py',
        transform=ax.transAxes, ha='right', fontsize=8, style='italic')

plt.tight_layout()
plt.show()

Team WPA Leaders

#| label: team-wpa-r
#| message: false
#| warning: false

# Calculate team-level WPA
team_wpa <- pbp_2023 %>%
  filter(!is.na(wpa)) %>%
  group_by(posteam) %>%
  summarise(
    plays = n(),
    total_wpa = sum(wpa, na.rm = TRUE),
    wpa_per_play = mean(wpa, na.rm = TRUE),
    positive_plays = sum(wpa > 0, na.rm = TRUE),
    negative_plays = sum(wpa < 0, na.rm = TRUE),
    big_positive = sum(wpa > 0.10, na.rm = TRUE),
    big_negative = sum(wpa < -0.10, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(total_wpa))

# Top teams by WPA
team_wpa %>%
  head(10) %>%
  select(posteam, plays, total_wpa, wpa_per_play,
         big_positive, big_negative) %>%
  gt() %>%
  cols_label(
    posteam = "Team",
    plays = "Plays",
    total_wpa = "Total WPA",
    wpa_per_play = "WPA/Play",
    big_positive = "Big + Plays",
    big_negative = "Big - Plays"
  ) %>%
  fmt_number(
    columns = c(total_wpa, wpa_per_play),
    decimals = 3
  ) %>%
  fmt_number(
    columns = plays,
    decimals = 0,
    use_seps = TRUE
  ) %>%
  data_color(
    columns = total_wpa,
    colors = scales::col_numeric(
      palette = c("#d73027", "#ffffbf", "#1a9850"),
      domain = c(-5, 5)
    )
  ) %>%
  tab_header(
    title = "Team WPA Leaders - 2023",
    subtitle = "Total Win Probability Added (Offensive)"
  )
#| label: team-wpa-py
#| message: false
#| warning: false

# Calculate team WPA
team_wpa_data = pbp_2023[pbp_2023['wpa'].notna()].groupby('posteam').agg({
    'play_id': 'count',
    'wpa': ['sum', 'mean']
}).reset_index()

team_wpa_data.columns = ['team', 'plays', 'total_wpa', 'wpa_per_play']

# Add big play counts
for idx, row in team_wpa_data.iterrows():
    team_plays = pbp_2023[pbp_2023['posteam'] == row['team']]
    team_wpa_data.loc[idx, 'big_positive'] = (team_plays['wpa'] > 0.10).sum()
    team_wpa_data.loc[idx, 'big_negative'] = (team_plays['wpa'] < -0.10).sum()

# Sort by total WPA
team_wpa_data = team_wpa_data.sort_values('total_wpa', ascending=False)

print("\nTeam WPA Leaders - 2023")
print("Total Win Probability Added (Offensive)")
print("=" * 80)
print(team_wpa_data.head(10).to_string(index=False))

Leverage and Critical Situations

Leverage Index

Leverage measures how much a play can swing win probability:

$$ \text{Leverage} = \frac{\text{WP}_{\text{success}} - \text{WP}_{\text{failure}}}{2} $$

#| label: leverage-calculation-r
#| message: false
#| warning: false

# Calculate leverage for each play
pbp_leverage <- pbp_2023 %>%
  filter(!is.na(wp)) %>%
  arrange(game_id, play_id) %>%
  group_by(game_id) %>%
  mutate(
    wp_next = lead(wp),
    wp_prev = lag(wp),
    # Approximate leverage as WPA magnitude potential
    leverage = abs(wp_next - wp_prev) / 2
  ) %>%
  ungroup() %>%
  filter(!is.na(leverage))

# High leverage situations
high_leverage <- pbp_leverage %>%
  filter(leverage >= 0.10) %>%
  mutate(
    quarter_time = paste0("Q", qtr, " ", time)
  ) %>%
  select(
    game_date, posteam, defteam, quarter_time, down, ydstogo,
    wp, leverage, desc
  ) %>%
  arrange(desc(leverage)) %>%
  head(10)

high_leverage %>%
  gt() %>%
  cols_label(
    game_date = "Date",
    posteam = "Team",
    defteam = "Opp",
    quarter_time = "Time",
    down = "Down",
    ydstogo = "Dist",
    wp = "WP",
    leverage = "Leverage",
    desc = "Description"
  ) %>%
  fmt_percent(
    columns = c(wp, leverage),
    decimals = 1
  ) %>%
  tab_header(
    title = "Highest Leverage Plays - 2023",
    subtitle = "Situations with biggest potential WP swings"
  )
#| label: leverage-calculation-py
#| message: false
#| warning: false

# Calculate leverage
pbp_leverage = pbp_2023[pbp_2023['wp'].notna()].copy()
pbp_leverage = pbp_leverage.sort_values(['game_id', 'play_id'])

# Calculate WP changes
pbp_leverage['wp_next'] = pbp_leverage.groupby('game_id')['wp'].shift(-1)
pbp_leverage['wp_prev'] = pbp_leverage.groupby('game_id')['wp'].shift(1)
pbp_leverage['leverage'] = (pbp_leverage['wp_next'] - pbp_leverage['wp_prev']).abs() / 2

# High leverage plays
high_leverage = pbp_leverage[pbp_leverage['leverage'] >= 0.10].copy()
high_leverage = high_leverage.sort_values('leverage', ascending=False).head(10)

print("\nHighest Leverage Plays - 2023")
print("Situations with biggest potential WP swings")
print("=" * 100)

display_cols = ['game_date', 'posteam', 'defteam', 'qtr', 'time',
                'down', 'ydstogo', 'wp', 'leverage']
print(high_leverage[display_cols].to_string(index=False))

Leverage Index Over Game

#| label: fig-leverage-game-r
#| fig-cap: "Leverage index throughout a close game"
#| fig-width: 10
#| fig-height: 6
#| message: false
#| warning: false

# Find an exciting close game
close_games <- pbp_2023 %>%
  group_by(game_id) %>%
  summarise(
    final_margin = abs(last(score_differential)),
    lead_changes = sum(sign(score_differential) != lag(sign(score_differential)), na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(final_margin <= 7) %>%
  arrange(desc(lead_changes))

# Select most exciting game
exciting_game <- close_games$game_id[1]

# Plot leverage over time
pbp_leverage %>%
  filter(game_id == exciting_game) %>%
  filter(!is.na(leverage)) %>%
  ggplot(aes(x = play_id, y = leverage)) +
  geom_area(fill = "#619CFF", alpha = 0.5) +
  geom_line(color = "#619CFF", size = 1) +
  geom_hline(yintercept = 0.05, linetype = "dashed", color = "red") +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(
    title = "Leverage Index Throughout Game",
    subtitle = paste("Game:", exciting_game, "| Red line = High leverage threshold"),
    x = "Play Number",
    y = "Leverage Index",
    caption = "Data: nflfastR"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14)
  )

📊 Visualization Output

The code above generates a visualization. To see the output, run this code in your R or Python environment. The resulting plot will help illustrate the concepts discussed in this section.

#| label: fig-leverage-game-py
#| fig-cap: "Leverage index throughout a close game - Python"
#| fig-width: 10
#| fig-height: 6
#| message: false
#| warning: false

# Find close exciting games
close_games = pbp_2023.groupby('game_id').agg({
    'score_differential': lambda x: abs(x.iloc[-1]),
    'play_id': 'count'
}).reset_index()
close_games.columns = ['game_id', 'final_margin', 'plays']
close_games = close_games[close_games['final_margin'] <= 7].sort_values('final_margin')

# Select a close game
exciting_game = close_games.iloc[0]['game_id']

# Plot leverage
game_data = pbp_leverage[
    (pbp_leverage['game_id'] == exciting_game) &
    (pbp_leverage['leverage'].notna())
].copy()

fig, ax = plt.subplots(figsize=(10, 6))

ax.fill_between(range(len(game_data)), game_data['leverage'],
               alpha=0.5, color='#619CFF')
ax.plot(range(len(game_data)), game_data['leverage'],
       color='#619CFF', linewidth=2)
ax.axhline(y=0.05, color='red', linestyle='--', alpha=0.7)

ax.set_xlabel('Play Number', fontsize=12)
ax.set_ylabel('Leverage Index', fontsize=12)
ax.set_title(f'Leverage Index Throughout Game\nGame: {exciting_game} | Red line = High leverage threshold',
            fontsize=14, fontweight='bold')
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))
ax.text(0.98, 0.02, 'Data: nfl_data_py',
        transform=ax.transAxes, ha='right', fontsize=8, style='italic')

plt.tight_layout()
plt.show()

Fourth Quarter Comeback Probability

Modeling Comeback Likelihood

#| label: comeback-analysis-r
#| message: false
#| warning: false

# Analyze fourth quarter comebacks
fourth_quarter <- pbp_2023 %>%
  filter(qtr == 4, game_seconds_remaining <= 900) %>%
  filter(!is.na(score_differential)) %>%
  mutate(
    trailing = score_differential < 0,
    deficit = abs(score_differential),
    deficit_category = cut(
      deficit,
      breaks = c(0, 3, 7, 10, 14, 21, 100),
      labels = c("1-3", "4-7", "8-10", "11-14", "15-21", "22+")
    ),
    time_category = cut(
      game_seconds_remaining,
      breaks = c(0, 120, 300, 600, 900),
      labels = c("0-2 min", "2-5 min", "5-10 min", "10-15 min")
    )
  )

# Comeback success rate
comeback_prob <- fourth_quarter %>%
  filter(trailing) %>%
  group_by(deficit_category, time_category) %>%
  summarise(
    games = n_distinct(game_id),
    comebacks = sum(result > 0) / n_distinct(game_id),
    avg_wp = mean(wp, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(games >= 5)

comeback_prob %>%
  pivot_wider(
    names_from = time_category,
    values_from = comebacks
  ) %>%
  gt() %>%
  cols_label(
    deficit_category = "Deficit",
    `0-2 min` = "0-2 min",
    `2-5 min` = "2-5 min",
    `5-10 min` = "5-10 min",
    `10-15 min` = "10-15 min"
  ) %>%
  fmt_percent(
    columns = where(is.numeric),
    decimals = 1
  ) %>%
  fmt_missing(
    missing_text = "-"
  ) %>%
  data_color(
    columns = where(is.numeric),
    colors = scales::col_numeric(
      palette = c("#d73027", "#ffffbf", "#1a9850"),
      domain = c(0, 0.5)
    )
  ) %>%
  tab_header(
    title = "Fourth Quarter Comeback Probability",
    subtitle = "2023 NFL Season - Success rate by deficit and time"
  )
#| label: comeback-analysis-py
#| message: false
#| warning: false

# Filter fourth quarter plays
fourth_q = pbp_2023[
    (pbp_2023['qtr'] == 4) &
    (pbp_2023['game_seconds_remaining'] <= 900) &
    (pbp_2023['score_differential'].notna())
].copy()

# Classify situations
fourth_q['trailing'] = fourth_q['score_differential'] < 0
fourth_q['deficit'] = fourth_q['score_differential'].abs()

def deficit_category(deficit):
    if deficit <= 3:
        return "1-3"
    elif deficit <= 7:
        return "4-7"
    elif deficit <= 10:
        return "8-10"
    elif deficit <= 14:
        return "11-14"
    elif deficit <= 21:
        return "15-21"
    else:
        return "22+"

def time_category(seconds):
    if seconds <= 120:
        return "0-2 min"
    elif seconds <= 300:
        return "2-5 min"
    elif seconds <= 600:
        return "5-10 min"
    else:
        return "10-15 min"

fourth_q['deficit_cat'] = fourth_q['deficit'].apply(deficit_category)
fourth_q['time_cat'] = fourth_q['game_seconds_remaining'].apply(time_category)

# Calculate comeback probability
trailing = fourth_q[fourth_q['trailing']].copy()
comeback_data = trailing.groupby(['deficit_cat', 'time_cat']).agg({
    'game_id': 'nunique',
    'result': lambda x: (x > 0).sum() / len(x.unique()) if len(x.unique()) > 0 else 0,
    'wp': 'mean'
}).reset_index()

comeback_data.columns = ['deficit', 'time', 'games', 'comeback_rate', 'avg_wp']
comeback_data = comeback_data[comeback_data['games'] >= 5]

print("\nFourth Quarter Comeback Probability - 2023 NFL Season")
print("Success rate by deficit and time remaining")
print("=" * 70)

# Pivot for display
pivot_table = comeback_data.pivot(index='deficit', columns='time', values='comeback_rate')
print(pivot_table.to_string())

Visualizing Comeback Probability

#| label: fig-comeback-viz-r
#| fig-cap: "Comeback probability by deficit and time remaining"
#| fig-width: 10
#| fig-height: 6
#| message: false
#| warning: false

ggplot(comeback_prob, aes(x = time_category, y = deficit_category,
                          fill = comebacks)) +
  geom_tile(color = "white", size = 1) +
  geom_text(aes(label = scales::percent(comebacks, accuracy = 1)),
            color = "black", fontface = "bold", size = 4) +
  scale_fill_gradient2(
    low = "#d73027", mid = "#ffffbf", high = "#1a9850",
    midpoint = 0.25,
    labels = scales::percent_format(),
    name = "Comeback\nProbability"
  ) +
  labs(
    title = "Fourth Quarter Comeback Probability",
    subtitle = "Trailing team's chance of winning by deficit and time remaining",
    x = "Time Remaining in 4th Quarter",
    y = "Point Deficit",
    caption = "Data: nflfastR | 2023 NFL Season"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    axis.text = element_text(size = 10)
  )
#| label: fig-comeback-viz-py
#| fig-cap: "Comeback probability by deficit and time remaining - Python"
#| fig-width: 10
#| fig-height: 6
#| message: false
#| warning: false

# Create heatmap
pivot_for_plot = comeback_data.pivot(index='deficit', columns='time', values='comeback_rate')

# Define order
deficit_order = ["1-3", "4-7", "8-10", "11-14", "15-21", "22+"]
time_order = ["10-15 min", "5-10 min", "2-5 min", "0-2 min"]

pivot_for_plot = pivot_for_plot.reindex(index=deficit_order, columns=time_order)

fig, ax = plt.subplots(figsize=(10, 6))

# Create heatmap
im = ax.imshow(pivot_for_plot.values, cmap='RdYlGn', aspect='auto',
              vmin=0, vmax=0.5)

# Set ticks
ax.set_xticks(np.arange(len(pivot_for_plot.columns)))
ax.set_yticks(np.arange(len(pivot_for_plot.index)))
ax.set_xticklabels(pivot_for_plot.columns)
ax.set_yticklabels(pivot_for_plot.index)

# Add text annotations
for i in range(len(pivot_for_plot.index)):
    for j in range(len(pivot_for_plot.columns)):
        value = pivot_for_plot.values[i, j]
        if not np.isnan(value):
            text = ax.text(j, i, f'{value:.0%}',
                          ha="center", va="center", color="black",
                          fontweight='bold', fontsize=11)

ax.set_xlabel('Time Remaining in 4th Quarter', fontsize=12)
ax.set_ylabel('Point Deficit', fontsize=12)
ax.set_title("Fourth Quarter Comeback Probability\nTrailing team's chance of winning by deficit and time remaining",
            fontsize=14, fontweight='bold')

# Colorbar
cbar = plt.colorbar(im, ax=ax)
cbar.set_label('Comeback Probability', rotation=270, labelpad=20)
cbar.ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))

ax.text(0.98, -0.12, 'Data: nfl_data_py | 2023 NFL Season',
        transform=ax.transAxes, ha='right', fontsize=8, style='italic')

plt.tight_layout()
plt.show()

Win Probability-Informed Decision Making

Fourth Down Decisions with WP

#| label: wp-fourth-down-r
#| message: false
#| warning: false

# Analyze fourth down decisions using WP
fourth_downs <- pbp_2023 %>%
  filter(down == 4, !is.na(wp)) %>%
  mutate(
    decision = case_when(
      play_type == "punt" ~ "Punt",
      play_type == "field_goal" ~ "FG Attempt",
      play_type %in% c("pass", "run") ~ "Go For It",
      TRUE ~ "Other"
    ),
    wp_gain = wp - lag(wp),
    success = if_else(play_type %in% c("pass", "run"),
                     first_down_rush == 1 | first_down_pass == 1,
                     NA_real_)
  ) %>%
  filter(decision != "Other")

# WP analysis by decision
wp_by_decision <- fourth_downs %>%
  group_by(decision) %>%
  summarise(
    attempts = n(),
    avg_wp_before = mean(lag(wp), na.rm = TRUE),
    avg_wp_after = mean(wp, na.rm = TRUE),
    avg_wp_gain = mean(wp_gain, na.rm = TRUE),
    success_rate = mean(success, na.rm = TRUE),
    .groups = "drop"
  )

wp_by_decision %>%
  gt() %>%
  cols_label(
    decision = "Decision",
    attempts = "Attempts",
    avg_wp_before = "Avg WP Before",
    avg_wp_after = "Avg WP After",
    avg_wp_gain = "Avg WP Gain",
    success_rate = "Success Rate"
  ) %>%
  fmt_number(
    columns = attempts,
    decimals = 0,
    use_seps = TRUE
  ) %>%
  fmt_percent(
    columns = c(avg_wp_before, avg_wp_after, avg_wp_gain, success_rate),
    decimals = 2
  ) %>%
  data_color(
    columns = avg_wp_gain,
    colors = scales::col_numeric(
      palette = c("#d73027", "#ffffbf", "#1a9850"),
      domain = c(-0.05, 0.05)
    )
  ) %>%
  tab_header(
    title = "Fourth Down Decisions and Win Probability",
    subtitle = "2023 NFL Season"
  )
#| label: wp-fourth-down-py
#| message: false
#| warning: false

# Filter fourth downs
fourth_downs = pbp_2023[
    (pbp_2023['down'] == 4) &
    (pbp_2023['wp'].notna())
].copy()

# Classify decisions
def classify_4th_decision(row):
    if row['play_type'] == 'punt':
        return 'Punt'
    elif row['play_type'] == 'field_goal':
        return 'FG Attempt'
    elif row['play_type'] in ['pass', 'run']:
        return 'Go For It'
    else:
        return 'Other'

fourth_downs['decision'] = fourth_downs.apply(classify_4th_decision, axis=1)
fourth_downs = fourth_downs[fourth_downs['decision'] != 'Other']

# Calculate WP changes
fourth_downs['wp_before'] = fourth_downs.groupby('game_id')['wp'].shift(1)
fourth_downs['wp_gain'] = fourth_downs['wp'] - fourth_downs['wp_before']

# Calculate success
fourth_downs['success'] = (
    (fourth_downs['first_down_rush'] == 1) |
    (fourth_downs['first_down_pass'] == 1)
).astype(float)

# Summary by decision
wp_by_decision = fourth_downs.groupby('decision').agg({
    'play_id': 'count',
    'wp_before': 'mean',
    'wp': 'mean',
    'wp_gain': 'mean',
    'success': 'mean'
}).reset_index()

wp_by_decision.columns = ['decision', 'attempts', 'avg_wp_before',
                          'avg_wp_after', 'avg_wp_gain', 'success_rate']

print("\nFourth Down Decisions and Win Probability - 2023")
print("=" * 80)
print(wp_by_decision.to_string(index=False))

Two-Point Conversion Chart

#| label: two-point-wp-r
#| message: false
#| warning: false

# Create two-point conversion decision chart based on WP
# Scenarios: down by various amounts after TD in 4th quarter
two_pt_scenarios <- expand_grid(
  score_diff_after_td = c(-8, -5, -2, 1, 4, 7),  # After TD scored
  game_seconds = c(600, 300, 120, 60),
  field_pos = 75,  # Typical post-kickoff
  down = 1,
  yards_to_go = 10,
  half = 2
) %>%
  mutate(
    is_first_down = 1,
    is_third_or_fourth = 0,
    score_time = score_diff_after_td * game_seconds,
    score_squared = score_diff_after_td^2,

    # WP if kick XP (add 1 point)
    score_diff_xp = score_diff_after_td + 1,
    score_time_xp = score_diff_xp * game_seconds,
    score_squared_xp = score_diff_xp^2,

    # WP if go for 2 and succeed (add 2 points)
    score_diff_2pt_success = score_diff_after_td + 2,
    score_time_2pt_success = score_diff_2pt_success * game_seconds,
    score_squared_2pt_success = score_diff_2pt_success^2,

    # WP if go for 2 and fail (add 0 points)
    score_diff_2pt_fail = score_diff_after_td,
    score_time_2pt_fail = score_diff_2pt_fail * game_seconds,
    score_squared_2pt_fail = score_diff_2pt_fail^2,

    time_label = case_when(
      game_seconds == 600 ~ "10 min",
      game_seconds == 300 ~ "5 min",
      game_seconds == 120 ~ "2 min",
      game_seconds == 60 ~ "1 min"
    )
  )

# Calculate WP for each scenario (simplified - using score_diff directly)
# In practice, would use full model predictions
two_pt_scenarios <- two_pt_scenarios %>%
  mutate(
    wp_xp = plogis(score_diff_xp * 0.15 - game_seconds * 0.0003),
    wp_2pt_success = plogis(score_diff_2pt_success * 0.15 - game_seconds * 0.0003),
    wp_2pt_fail = plogis(score_diff_2pt_fail * 0.15 - game_seconds * 0.0003),

    # Expected WP of going for 2 (assume 48% success rate)
    wp_2pt_expected = 0.48 * wp_2pt_success + 0.52 * wp_2pt_fail,

    # Should go for 2?
    should_go_for_2 = wp_2pt_expected > wp_xp,
    wp_advantage = wp_2pt_expected - wp_xp
  )

# Create chart
two_pt_scenarios %>%
  select(score_diff_after_td, time_label, wp_xp, wp_2pt_expected,
         should_go_for_2, wp_advantage) %>%
  pivot_wider(
    names_from = time_label,
    values_from = should_go_for_2
  ) %>%
  gt() %>%
  cols_label(
    score_diff_after_td = "Score After TD",
    `10 min` = "10 min",
    `5 min` = "5 min",
    `2 min` = "2 min",
    `1 min` = "1 min"
  ) %>%
  fmt_logical(
    columns = where(is.logical),
    true = "GO FOR 2",
    false = "KICK XP"
  ) %>%
  tab_header(
    title = "Two-Point Conversion Decision Chart",
    subtitle = "Based on Win Probability Maximization (4th Quarter)"
  ) %>%
  tab_style(
    style = cell_fill(color = "#1a9850"),
    locations = cells_body(
      columns = where(is.logical),
      rows = `10 min` == TRUE | `5 min` == TRUE | `2 min` == TRUE | `1 min` == TRUE
    )
  )
#| label: two-point-wp-py
#| message: false
#| warning: false

# Create two-point scenarios
score_diffs = [-8, -5, -2, 1, 4, 7]
times = [600, 300, 120, 60]
time_labels = {600: "10 min", 300: "5 min", 120: "2 min", 60: "1 min"}

two_pt_results = []

for score_after_td in score_diffs:
    for time_sec in times:
        # Simple WP calculation (logistic function)
        def calc_wp(score_diff, time_remaining):
            return 1 / (1 + np.exp(-(score_diff * 0.15 - time_remaining * 0.0003)))

        # Calculate WP for each option
        wp_xp = calc_wp(score_after_td + 1, time_sec)
        wp_2pt_success = calc_wp(score_after_td + 2, time_sec)
        wp_2pt_fail = calc_wp(score_after_td, time_sec)

        # Expected WP of 2-pt (48% success rate)
        wp_2pt_expected = 0.48 * wp_2pt_success + 0.52 * wp_2pt_fail

        two_pt_results.append({
            'score_after_td': score_after_td,
            'time': time_labels[time_sec],
            'wp_xp': wp_xp,
            'wp_2pt': wp_2pt_expected,
            'go_for_2': wp_2pt_expected > wp_xp,
            'wp_advantage': wp_2pt_expected - wp_xp
        })

two_pt_df = pd.DataFrame(two_pt_results)

# Pivot for display
pivot_2pt = two_pt_df.pivot(index='score_after_td', columns='time', values='go_for_2')
pivot_2pt = pivot_2pt[["10 min", "5 min", "2 min", "1 min"]]

print("\nTwo-Point Conversion Decision Chart")
print("Based on Win Probability Maximization (4th Quarter)")
print("=" * 60)
print("\nGO = Go for 2-point conversion")
print("KICK = Kick extra point\n")

# Convert to readable format
for idx in pivot_2pt.index:
    row_str = f"{idx:>3}:"
    for col in pivot_2pt.columns:
        decision = "  GO  " if pivot_2pt.loc[idx, col] else " KICK "
        row_str += f" {decision}"
    print(row_str)

print("\nColumn headers: " + " | ".join(pivot_2pt.columns))

Clutch Performance Analysis with WP

Measuring Clutch Performance

#| label: clutch-performance-r
#| message: false
#| warning: false

# Define clutch situations
clutch_plays <- pbp_2023 %>%
  filter(!is.na(wpa), !is.na(passer_player_name)) %>%
  mutate(
    is_clutch = (qtr == 4 & game_seconds_remaining <= 300 &
                abs(score_differential) <= 8) |
                (qtr == 5),  # Overtime
    leverage_high = !is.na(wpa) & abs(wpa) > 0.05
  )

# QB performance in clutch situations
qb_clutch <- clutch_plays %>%
  filter(play_type == "pass", is_clutch == TRUE) %>%
  group_by(passer = passer_player_name) %>%
  summarise(
    clutch_plays = n(),
    total_wpa = sum(wpa, na.rm = TRUE),
    avg_wpa = mean(wpa, na.rm = TRUE),
    positive_plays = sum(wpa > 0, na.rm = TRUE),
    big_plays = sum(wpa > 0.10, na.rm = TRUE),
    success_rate = mean(wpa > 0, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(clutch_plays >= 20) %>%
  arrange(desc(total_wpa))

# Top clutch QBs
qb_clutch %>%
  head(15) %>%
  gt() %>%
  cols_label(
    passer = "Quarterback",
    clutch_plays = "Plays",
    total_wpa = "Total WPA",
    avg_wpa = "Avg WPA",
    positive_plays = "+ Plays",
    big_plays = "Big Plays",
    success_rate = "Success %"
  ) %>%
  fmt_number(
    columns = c(total_wpa, avg_wpa),
    decimals = 3
  ) %>%
  fmt_percent(
    columns = success_rate,
    decimals = 1
  ) %>%
  data_color(
    columns = total_wpa,
    colors = scales::col_numeric(
      palette = c("#d73027", "#ffffbf", "#1a9850"),
      domain = c(-0.5, 0.5)
    )
  ) %>%
  tab_header(
    title = "Clutch QB Performance - 2023",
    subtitle = "4th quarter/OT, close games (within 8 pts, <5 min left)"
  )
#| label: clutch-performance-py
#| message: false
#| warning: false

# Define clutch situations
clutch_plays = pbp_2023[
    (pbp_2023['wpa'].notna()) &
    (pbp_2023['passer_player_name'].notna())
].copy()

clutch_plays['is_clutch'] = (
    ((clutch_plays['qtr'] == 4) &
     (clutch_plays['game_seconds_remaining'] <= 300) &
     (clutch_plays['score_differential'].abs() <= 8)) |
    (clutch_plays['qtr'] == 5)  # Overtime
)

# QB clutch performance
qb_clutch_data = clutch_plays[
    (clutch_plays['play_type'] == 'pass') &
    (clutch_plays['is_clutch'] == True)
].copy()

qb_clutch = qb_clutch_data.groupby('passer_player_name').agg({
    'play_id': 'count',
    'wpa': ['sum', 'mean']
}).reset_index()

qb_clutch.columns = ['passer', 'clutch_plays', 'total_wpa', 'avg_wpa']

# Add additional metrics
for idx, row in qb_clutch.iterrows():
    qb_data = qb_clutch_data[qb_clutch_data['passer_player_name'] == row['passer']]
    qb_clutch.loc[idx, 'positive_plays'] = (qb_data['wpa'] > 0).sum()
    qb_clutch.loc[idx, 'big_plays'] = (qb_data['wpa'] > 0.10).sum()
    qb_clutch.loc[idx, 'success_rate'] = (qb_data['wpa'] > 0).mean()

# Filter and sort
qb_clutch = qb_clutch[qb_clutch['clutch_plays'] >= 20].copy()
qb_clutch = qb_clutch.sort_values('total_wpa', ascending=False)

print("\nClutch QB Performance - 2023")
print("4th quarter/OT, close games (within 8 pts, <5 min left)")
print("=" * 90)
print(qb_clutch.head(15).to_string(index=False))

Clutch vs Non-Clutch Performance

#| label: fig-clutch-comparison-r
#| fig-cap: "QB performance: clutch vs non-clutch situations"
#| fig-width: 10
#| fig-height: 7
#| message: false
#| warning: false

# Compare clutch vs non-clutch for top QBs
qb_comparison <- clutch_plays %>%
  filter(play_type == "pass", !is.na(passer_player_name)) %>%
  group_by(passer = passer_player_name, is_clutch) %>%
  summarise(
    plays = n(),
    avg_wpa = mean(wpa, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  pivot_wider(
    names_from = is_clutch,
    values_from = c(plays, avg_wpa),
    names_prefix = "clutch_"
  ) %>%
  filter(plays_clutch_TRUE >= 20, plays_clutch_FALSE >= 100) %>%
  mutate(
    clutch_difference = avg_wpa_clutch_TRUE - avg_wpa_clutch_FALSE
  ) %>%
  arrange(desc(clutch_difference))

# Visualize
qb_comparison %>%
  head(15) %>%
  ggplot(aes(x = avg_wpa_clutch_FALSE, y = avg_wpa_clutch_TRUE)) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed", color = "red") +
  geom_point(aes(size = plays_clutch_TRUE), alpha = 0.7, color = "#00BFC4") +
  geom_text(aes(label = passer), hjust = -0.1, vjust = 0, size = 3) +
  scale_x_continuous(labels = scales::percent_format(scale = 100)) +
  scale_y_continuous(labels = scales::percent_format(scale = 100)) +
  labs(
    title = "QB Performance: Clutch vs Non-Clutch Situations",
    subtitle = "Points above diagonal = better in clutch; below = worse in clutch",
    x = "Average WPA (Non-Clutch)",
    y = "Average WPA (Clutch)",
    size = "Clutch Plays",
    caption = "Data: nflfastR | 2023 Season"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "bottom"
  )

📊 Visualization Output

The code above generates a visualization. To see the output, run this code in your R or Python environment. The resulting plot will help illustrate the concepts discussed in this section.

#| label: fig-clutch-comparison-py
#| fig-cap: "QB performance: clutch vs non-clutch situations - Python"
#| fig-width: 10
#| fig-height: 7
#| message: false
#| warning: false

# Compare clutch vs non-clutch
qb_comp_data = clutch_plays[
    (clutch_plays['play_type'] == 'pass') &
    (clutch_plays['passer_player_name'].notna())
].copy()

# Calculate metrics for each situation
clutch_metrics = qb_comp_data.groupby(['passer_player_name', 'is_clutch']).agg({
    'play_id': 'count',
    'wpa': 'mean'
}).reset_index()

clutch_metrics.columns = ['passer', 'is_clutch', 'plays', 'avg_wpa']

# Pivot
clutch_wide = clutch_metrics.pivot(index='passer', columns='is_clutch',
                                   values=['plays', 'avg_wpa']).reset_index()
clutch_wide.columns = ['passer', 'plays_nonclutch', 'plays_clutch',
                       'avg_wpa_nonclutch', 'avg_wpa_clutch']

# Filter
clutch_wide = clutch_wide[
    (clutch_wide['plays_clutch'] >= 20) &
    (clutch_wide['plays_nonclutch'] >= 100)
].copy()

clutch_wide['clutch_diff'] = clutch_wide['avg_wpa_clutch'] - clutch_wide['avg_wpa_nonclutch']
clutch_wide = clutch_wide.sort_values('clutch_diff', ascending=False).head(15)

# Visualize
fig, ax = plt.subplots(figsize=(10, 7))

# Diagonal line
max_val = max(clutch_wide['avg_wpa_nonclutch'].max(),
             clutch_wide['avg_wpa_clutch'].max())
min_val = min(clutch_wide['avg_wpa_nonclutch'].min(),
             clutch_wide['avg_wpa_clutch'].min())
ax.plot([min_val, max_val], [min_val, max_val], 'r--', alpha=0.7)

# Scatter points
scatter = ax.scatter(clutch_wide['avg_wpa_nonclutch'],
                    clutch_wide['avg_wpa_clutch'],
                    s=clutch_wide['plays_clutch']*3,
                    alpha=0.6, c='#00BFC4', edgecolors='black', linewidth=0.5)

# Labels
for idx, row in clutch_wide.iterrows():
    ax.annotate(row['passer'],
               (row['avg_wpa_nonclutch'], row['avg_wpa_clutch']),
               fontsize=8, ha='left', va='bottom')

ax.set_xlabel('Average WPA (Non-Clutch)', fontsize=12)
ax.set_ylabel('Average WPA (Clutch)', fontsize=12)
ax.set_title('QB Performance: Clutch vs Non-Clutch Situations\nPoints above diagonal = better in clutch; below = worse in clutch',
            fontsize=14, fontweight='bold')
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x*100:.1f}%'))
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y*100:.1f}%'))
ax.grid(True, alpha=0.3)
ax.text(0.98, 0.02, 'Data: nfl_data_py | 2023 Season',
        transform=ax.transAxes, ha='right', fontsize=8, style='italic')

plt.tight_layout()
plt.show()

Win Probability Model Validation

Calibration Curves by Game Situation

#| label: validation-calibration-r
#| message: false
#| warning: false

# Validate model across different game situations
validation_data <- test_data %>%
  mutate(
    game_phase = case_when(
      half == 1 ~ "First Half",
      half == 2 & game_seconds > 900 ~ "Early 4th",
      half == 2 & game_seconds <= 900 ~ "Late 4th",
      TRUE ~ "Other"
    ),
    score_situation = case_when(
      abs(score_diff) <= 3 ~ "Very Close",
      abs(score_diff) <= 7 ~ "One Score",
      abs(score_diff) <= 14 ~ "Two Score",
      TRUE ~ "Blowout"
    )
  )

# Calibration by phase
calibration_by_phase <- validation_data %>%
  group_by(game_phase) %>%
  mutate(wp_bin = cut(wp_pred, breaks = seq(0, 1, 0.1))) %>%
  group_by(game_phase, wp_bin) %>%
  summarise(
    predicted_wp = mean(wp_pred),
    actual_win_rate = mean(posteam_won),
    n_plays = n(),
    .groups = "drop"
  ) %>%
  filter(n_plays >= 50)

# Visualize
ggplot(calibration_by_phase, aes(x = predicted_wp, y = actual_win_rate,
                                  color = game_phase)) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed", color = "black") +
  geom_point(aes(size = n_plays), alpha = 0.7) +
  geom_smooth(method = "loess", se = FALSE, size = 1) +
  scale_x_continuous(labels = scales::percent_format(), limits = c(0, 1)) +
  scale_y_continuous(labels = scales::percent_format(), limits = c(0, 1)) +
  scale_color_brewer(palette = "Set1") +
  labs(
    title = "Model Calibration by Game Phase",
    subtitle = "Model performs well across all game situations",
    x = "Predicted Win Probability",
    y = "Actual Win Rate",
    color = "Game Phase",
    size = "Plays",
    caption = "Data: nflfastR | Test Set"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "right"
  )
#| label: validation-calibration-py
#| message: false
#| warning: false

# Add game situation features to test data
test_with_features = X_test.copy()
test_with_features['actual'] = y_test.values
test_with_features['predicted'] = y_pred_proba

# Classify game phase
def game_phase(row):
    if row['half'] == 1:
        return "First Half"
    elif row['half'] == 2 and row['game_seconds'] > 900:
        return "Early 4th"
    elif row['half'] == 2 and row['game_seconds'] <= 900:
        return "Late 4th"
    else:
        return "Other"

test_with_features['game_phase'] = test_with_features.apply(game_phase, axis=1)

# Calculate calibration by phase
calibration_results = []
for phase in ["First Half", "Early 4th", "Late 4th"]:
    phase_data = test_with_features[test_with_features['game_phase'] == phase]

    # Create bins
    for i in range(10):
        lower = i / 10
        upper = (i + 1) / 10
        bin_data = phase_data[
            (phase_data['predicted'] >= lower) &
            (phase_data['predicted'] < upper)
        ]

        if len(bin_data) >= 50:
            calibration_results.append({
                'phase': phase,
                'predicted': bin_data['predicted'].mean(),
                'actual': bin_data['actual'].mean(),
                'count': len(bin_data)
            })

calib_df = pd.DataFrame(calibration_results)

# Visualize
fig, ax = plt.subplots(figsize=(10, 7))

# Perfect calibration line
ax.plot([0, 1], [0, 1], 'k--', alpha=0.7, label='Perfect Calibration')

# Plot each phase
colors = {'First Half': '#e41a1c', 'Early 4th': '#377eb8', 'Late 4th': '#4daf4a'}
for phase in ["First Half", "Early 4th", "Late 4th"]:
    phase_data = calib_df[calib_df['phase'] == phase]
    ax.scatter(phase_data['predicted'], phase_data['actual'],
              s=phase_data['count']/5, alpha=0.6,
              c=colors[phase], label=phase)

    # Trend line
    if len(phase_data) > 3:
        z = np.polyfit(phase_data['predicted'], phase_data['actual'], 2)
        p = np.poly1d(z)
        x_line = np.linspace(phase_data['predicted'].min(),
                            phase_data['predicted'].max(), 50)
        ax.plot(x_line, p(x_line), color=colors[phase], linewidth=2, alpha=0.7)

ax.set_xlabel('Predicted Win Probability', fontsize=12)
ax.set_ylabel('Actual Win Rate', fontsize=12)
ax.set_title('Model Calibration by Game Phase\nModel performs well across all game situations',
            fontsize=14, fontweight='bold')
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.0%}'))
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))
ax.legend(title='Game Phase', loc='upper left')
ax.grid(True, alpha=0.3)
ax.text(0.98, 0.02, 'Data: Test Set',
        transform=ax.transAxes, ha='right', fontsize=8, style='italic')

plt.tight_layout()
plt.show()

Summary

Win probability models are powerful tools for understanding game situations, evaluating decisions, and measuring clutch performance:

Key Takeaways:

  1. WP Models: Logistic regression provides strong baseline; machine learning can improve accuracy
  2. Key Drivers: Score differential, time remaining, and their interaction dominate WP
  3. Win Probability Added: Measures play impact directly in terms of winning
  4. Leverage: Identifies most critical game situations where plays matter most
  5. Decision-Making: WP informs optimal choices on fourth downs and two-point conversions
  6. Clutch Performance: WPA in high-leverage situations reveals true clutch ability
  7. Model Validation: Calibration is critical—predicted probabilities must match actual outcomes

Best Practices:

  • Use multiple seasons of data for stable WP models
  • Validate calibration across different game situations
  • Consider context when interpreting WPA (leverage matters)
  • Update models regularly as game evolves
  • Combine WP with EPA for complete evaluation
  • Account for special teams and turnover probabilities
  • Use WP to evaluate decisions, not just outcomes

Exercises

Conceptual Questions

  1. WP vs EPA: When would you use win probability instead of expected points? What are the advantages and limitations of each metric?

  2. Comeback Probability: Why does comeback probability decline non-linearly with deficit size? At what point does a comeback become nearly impossible?

  3. Clutch Performance Debate: Some argue "clutch" performance is mostly luck. How can WPA help settle this debate? What evidence would convince you either way?

Coding Exercises

Exercise 1: Build Custom WP Model

Build your own win probability model from scratch: a) Load 5+ years of play-by-play data b) Engineer features including score, time, field position, and interactions c) Train logistic regression and gradient boosting models d) Evaluate model calibration and discrimination e) Compare your model to nflfastR's built-in WP **Extension**: Add weather, home/away, or rest days as features.

Exercise 2: Game-Level WPA Analysis

Calculate total WPA for every game in a season: a) Sum offensive and defensive WPA for each team per game b) Identify games with highest total WPA (most exciting) c) Find teams that consistently win/lose close games d) Correlate game-level WPA with actual margin of victory **Visualization**: Create a scatter plot of final margin vs total WPA.

Exercise 3: High-Leverage Play Analysis

Identify and analyze the highest-leverage situations: a) Calculate leverage index for every play b) Find the 100 highest-leverage plays in a season c) Analyze success rates in high vs low leverage d) Identify players who excel in high-leverage moments **Research Question**: Do better teams perform better in high-leverage situations?

Exercise 4: Clutch Performance Evaluation

Evaluate quarterback clutch performance over multiple seasons: a) Define "clutch" situations (4th quarter, close game, high leverage) b) Calculate QB WPA in clutch vs non-clutch situations c) Test whether clutch performance is stable year-over-year d) Identify the most consistently clutch QBs **Statistical Test**: Use correlation to test clutch performance stability.

Exercise 5: WP-Informed Strategy Analysis

Use WP to evaluate strategic decisions: a) Analyze all fourth down decisions using WP framework b) Calculate expected WP for going for it vs punting/kicking c) Identify teams that make optimal WP-maximizing decisions d) Estimate wins gained/lost from suboptimal decisions **Extension**: Create a "fourth down decision quality" score for each team.

Further Reading

  • Win Probability Fundamentals:
  • Burke, B. (2009). "Win Probability Added (WPA)." Advanced NFL Stats.
  • Lock, D. & Nettleton, D. (2014). "Using random forests to estimate win probability before each play of an NFL game." Journal of Quantitative Analysis in Sports, 10(2), 197-205.

  • WP Applications:

  • Yurko, R., Ventura, S., & Horowitz, M. (2019). "nflWAR: A reproducible method for offensive player evaluation in football." Journal of Quantitative Analysis in Sports, 15(3), 163-183.
  • Baldwin, B. & Sebastian, C. (2019). "nflfastR: Functions to efficiently scrape NFL play-by-play data." CRAN.

  • Clutch Performance:

  • Moskowitz, T. & Wertheim, L.J. (2011). Scorecasting: The Hidden Influences Behind How Sports Are Played and Games Are Won. Crown Archetype.
  • Berri, D. & Schmidt, M. (2002). "Instrumental versus bounded rationality: A comparison of Major League Baseball and the National Basketball Association." Journal of Socio-Economics, 31(3), 191-214.

  • Decision-Making:

  • Romer, D. (2006). "Do firms maximize? Evidence from professional football." Journal of Political Economy, 114(2), 340-365.
  • Burke, B. (2014). "The Two-Point Conversion Chart." Advanced NFL Stats.

References

:::