Learning Objectives

By the end of this chapter, you will be able to:
- Understand the mathematical foundation of Expected Points (EP) and Expected Points Added (EPA)
- Build expected points models from scratch using historical data
- Calculate and interpret EPA in different game contexts
- Recognize EPA limitations and apply appropriate adjustments
- Use EPA for strategic decision-making and team evaluation
- Compare EPA with traditional metrics and understand when to use each
- Implement EPA-based analysis in both R and Python
Introduction
In Chapter 1, we introduced Expected Points Added (EPA) as one of the foundational metrics in modern football analytics. We learned that EPA measures the value of each play by comparing the expected points before and after the play. This chapter takes a deep dive into EPA, exploring its mathematical foundations, practical implementation, and strategic applications in ways that will fundamentally change how you think about football performance evaluation.
Expected Points Added represents perhaps the single most important conceptual breakthrough in football analytics over the past two decades. Before EPA, analysts relied on traditional statistics—total yards, completion percentage, yards per carry—that failed to capture the situational context that makes football unique among major sports. A 5-yard gain means something entirely different on 3rd-and-4 (a successful conversion that extends the drive) versus 3rd-and-20 (an unsuccessful play that results in a punt). EPA elegantly solves this problem by measuring value in terms of scoring expectation rather than raw yardage.
The power of EPA lies in its ability to reduce the complexity of football into a single, context-aware metric. Every play in football occurs within a specific game state defined by down, distance, field position, score, and time remaining. Traditional statistics treat all yards equally, but we know intuitively that's wrong—a 1-yard touchdown plunge from the goal line is more valuable than a 50-yard gain that still leaves the offense at midfield on 4th-and-1. EPA captures these nuances by asking a simple but profound question: "How much closer to scoring (or further from allowing the opponent to score) did this play move the offense?"
This chapter will guide you through the complete EPA framework, from theoretical foundations to practical implementation. We'll build expected points models from scratch, learning not just how to use EPA but why it works and when it's most applicable. You'll learn to calculate EPA for any game situation, analyze EPA patterns across different contexts, and apply EPA insights to strategic decision-making. By the end, you'll understand why NFL teams now consider EPA their primary metric for evaluating offensive and defensive performance, why it predicts future success better than traditional stats, and how to implement EPA-based analysis in your own work.
Understanding EPA deeply is essential for anyone serious about football analytics. It forms the foundation for player evaluation (which quarterbacks add the most value?), play-calling optimization (which play types work best in which situations?), fourth-down decision models (when should teams attempt conversions?), and countless other applications. The concepts and methods you'll learn in this chapter will reappear throughout the remainder of this textbook, as EPA serves as the analytical backbone for modern football analysis.
Why This Chapter Matters
EPA is the single most important metric in modern football analytics. It forms the foundation for:

- Player evaluation and grading
- Play-calling analysis and optimization
- Fourth-down decision-making
- Draft analysis and team building
- Coaching strategy and game planning

Review: Expected Points Framework
Before diving deep into the mathematics and implementation of EPA, let's review and expand upon the core concepts introduced in Chapter 1. This review will establish the conceptual foundation we'll build upon throughout this chapter, ensuring we have a shared understanding of what Expected Points represents and why it matters.
Expected Points (EP): The Foundation of Context-Aware Analysis
Expected Points represents the average number of points a team can expect to score on the current drive before the opponent scores, given the current game state defined by down, distance, and field position. This might sound simple, but it's a profound insight that fundamentally changed how we think about football value.
To understand Expected Points, imagine we collected data from 50,000 drives over the past decade and identified every instance where a team faced 1st-and-10 at their own 20-yard line. Across those thousands of drives, we track what happened next: How often did teams score touchdowns? Field goals? How often did their opponents score next? When we average the net points scored across all those drives, we get the Expected Points for that situation—approximately 0.4 points.
Let's examine some illustrative examples to build intuition about how Expected Points varies across different situations:
- 1st-and-10 at own 20-yard line: ~0.4 expected points
  - This is roughly the league average starting field position after a touchback
  - Teams in this situation score a touchdown before their opponent about 15% of the time (7 pts) and a field goal about 8% of the time (3 pts)
  - The opponent often scores next, dragging down the expectation
  - Net result: slightly positive scoring expectation
- 1st-and-10 at opponent 20-yard line: ~4.0 expected points
  - Often called the "red zone," this represents excellent field position
  - Teams score touchdowns about 35% of the time, field goals 40% of the time
  - Very unlikely that the opponent scores next from this position
  - This is nearly the value of a guaranteed field goal plus a coin flip for a touchdown
- 3rd-and-15 at own 30-yard line: ~0.2 expected points
  - Long third downs dramatically reduce scoring expectation
  - Conversion rate is low (~15-20%), meaning a punt is likely
  - Poor field position makes scoring difficult even after conversion
  - The opponent's good field position after the likely punt reduces net expectation
- 1st-and-goal at the 1-yard line: ~6.0 expected points
  - The goal line represents the highest expected points situation
  - Teams score touchdowns 85-90% of the time from this position
  - The remaining 10-15% usually results in field goals or turnovers
  - The average outcome is very close to an automatic touchdown
These values are empirically derived from analyzing hundreds of thousands of historical drives from NFL data. The key insight is that these values represent what actually happens on average, not theoretical calculations or subjective assessments. Expected Points is grounded in observed reality, which gives it predictive power and makes it useful for decision-making.
Notice the pattern: Expected Points increases as you move closer to your opponent's goal line, increases with more favorable downs (1st > 2nd > 3rd), and increases with shorter distances to the first down marker. These relationships aren't linear—moving from your own 20 to your own 40 is worth a different amount than moving from your opponent's 20 to the goal line. The EP framework captures these non-linear relationships that make football strategically rich.
Expected Points Added (EPA): Measuring Play Value
With Expected Points providing a valuation for every game state, Expected Points Added naturally follows as a measure of play value. EPA quantifies how much a single play changed the team's scoring expectation by comparing the Expected Points before and after the play. This simple difference tells us whether a play moved the team closer to scoring (positive EPA) or further away (negative EPA).
The mathematical formula for EPA is elegantly straightforward:
$$ \text{EPA} = \text{EP}_{\text{end}} - \text{EP}_{\text{start}} $$
Where:
- $\text{EP}_{\text{start}}$ is the expected points at the beginning of the play (based on down, distance, field position before the snap)
- $\text{EP}_{\text{end}}$ is the expected points at the end of the play (based on the resulting down, distance, field position after the play concludes)
This subtraction captures everything that happened during the play: yards gained or lost, down changes (conversion or loss of down), turnovers, penalties, and even immediate scores (touchdowns, field goals, safeties). Because EP already accounts for situational context, EPA automatically incorporates that context into play valuation.
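The formula translates directly into code. Below is a minimal Python sketch; the EP values passed in are illustrative approximations (a real workflow would obtain them from a fitted EP model):

```python
def epa(ep_start: float, ep_end: float) -> float:
    """EPA is simply the change in expected points over one play."""
    return ep_end - ep_start

# Illustrative values: a 15-yard completion that moves the offense
# from ~0.5 EP to ~1.2 EP gains about +0.7 expected points.
gain = epa(ep_start=0.5, ep_end=1.2)

# An incompletion on 3rd-and-8 that forces a punt can swing EP
# from ~1.3 down to ~-0.8, costing about -2.1 expected points.
loss = epa(ep_start=1.3, ep_end=-0.8)

print(round(gain, 1), round(loss, 1))   # 0.7 -2.1
```

Note that the same one-line subtraction captures yardage, down changes, and turnovers alike, because all of that context is already baked into the two EP values.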
Example: EPA Calculation for a Successful Pass Play
Let's walk through a detailed example to illustrate how EPA works in practice.

**Starting Situation**: 1st-and-10 at own 25-yard line

- Field position: 75 yards from opponent's end zone
- Expected Points: ~0.5 points
- This represents slightly better than touchback field position

**The Play**: Quarterback completes a 15-yard pass to the slot receiver

**Ending Situation**: 1st-and-10 at own 40-yard line

- Field position: 60 yards from opponent's end zone
- Expected Points: ~1.2 points
- The offense has moved into more favorable territory

**EPA Calculation**:

$$\text{EPA} = 1.2 - 0.5 = +0.7 \text{ expected points}$$

**Interpretation**: This play added 0.7 expected points to the offense's scoring expectation. To put this in perspective, over a typical 65-play game, accumulating +0.7 EPA per play would result in approximately 45.5 expected points—an exceptionally high-scoring performance. This single play was successful because it meaningfully improved the team's position to score.

**Why This Play Has Positive EPA**:

1. **Yardage gain**: 15 yards of field position improvement
2. **Maintained favorable down**: Stayed on 1st down (the gain exceeded the distance, so the conversion was automatic)
3. **Moved into better EP territory**: The EP curve is non-linear, so moving from the 25 to the 40 provides significant value
4. **No negative events**: No turnover, sack, penalty, or loss of down

Example: EPA Calculation for an Unsuccessful Pass Play
Now let's examine a contrasting example where a play generates negative EPA.

**Starting Situation**: 3rd-and-8 at own 42-yard line

- Field position: 58 yards from opponent's end zone
- Expected Points: ~1.3 points
- This is a moderately difficult conversion situation

**The Play**: Quarterback throws incomplete pass (defensed by cornerback)

**Ending Situation**: 4th-and-8 at own 42-yard line (team will punt)

- Field position: Unchanged at 58 yards from end zone
- Expected Points after punt: ~-0.8 points (opponent gets ball around midfield)
- The offense failed to convert and must surrender possession

**EPA Calculation**:

$$\text{EPA} = -0.8 - 1.3 = -2.1 \text{ expected points}$$

**Interpretation**: This play cost the offense 2.1 expected points. Despite gaining zero yards (incompletion), the play has substantial negative value because it forced the team to punt, giving the opponent favorable field position. This demonstrates EPA's power—it captures that failing to convert on 3rd down has major consequences beyond the immediate yardage.

**Why This Play Has Negative EPA**:

1. **Loss of down**: Moved from 3rd to 4th down without converting
2. **Change of possession**: Team must punt, giving opponent the ball
3. **Opponent field position**: Punt likely results in opponent starting around midfield
4. **Lost opportunity**: The failure to convert ended the drive prematurely

This example illustrates why completion percentage, while informative, misses critical context. An incomplete pass can be mildly negative (on 1st down) or severely negative (on 3rd down), and EPA captures this distinction automatically.

The Power of EPA's Context Awareness:
What makes EPA revolutionary is its automatic incorporation of situational context. Consider three hypothetical 5-yard runs:
- 1st-and-10 from own 30: ~+0.3 EPA (modest gain, maintains manageable distance)
- 3rd-and-3 from own 30: ~+1.8 EPA (conversion! Drive extends with fresh downs)
- 3rd-and-8 from own 30: ~-1.5 EPA (non-conversion, leads to punt)
All three plays gain the same 5 yards, yet EPA reveals their vastly different values. Traditional statistics (yards per carry: 5.0 for all three) completely miss this. EPA's context-awareness makes it far more useful for evaluating play quality, player performance, and strategic decisions.
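The three 5-yard runs can be reproduced with a toy EP lookup table. The table values below are invented purely for illustration, chosen so the resulting EPAs roughly match the approximate figures quoted above; a real model would estimate every entry from historical data:

```python
# Toy EP lookup: (down, yards_to_go, yardline_100) -> expected points.
# All values are illustrative stand-ins, NOT fitted model output.
EP = {
    (1, 10, 70): 0.9,   # 1st-and-10 at own 30
    (2,  5, 65): 1.2,   # 2nd-and-5 at own 35
    (3,  3, 70): 1.0,   # 3rd-and-3 at own 30
    (1, 10, 65): 2.8,   # 1st-and-10 at own 35 (fresh set of downs)
    (3,  8, 70): 0.8,   # 3rd-and-8 at own 30
    (4,  3, 65): -0.7,  # 4th-and-3 at own 35 (punt coming)
}

def play_epa(start, end):
    """EPA of a play given its start and end game states."""
    return EP[end] - EP[start]

# The same 5-yard gain in three different contexts:
print(play_epa((1, 10, 70), (2, 5, 65)))   # routine gain, ~+0.3
print(play_epa((3, 3, 70), (1, 10, 65)))   # conversion, ~+1.8
print(play_epa((3, 8, 70), (4, 3, 65)))    # failed 3rd down, ~-1.5
```

The yardage is identical in all three calls; only the game-state keys differ, and that alone flips the sign and magnitude of the play's value.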
Mathematical Foundation of Expected Points
Understanding the mathematical foundations of Expected Points helps us appreciate why it works, how to build robust models, and when its assumptions might break down. This section transitions from the intuitive understanding we've developed to the formal statistical framework underlying EP and EPA. Don't worry if some concepts feel technical—we'll provide intuitive explanations alongside the mathematics, and you don't need advanced statistical training to grasp the key ideas.
The Expected Points Model: A Conditional Expectation Framework
At its core, the expected points model is a conditional expectation problem from probability theory. We're trying to estimate the expected value of a random variable (points scored) conditional on known information about the current state (down, distance, field position). Mathematically, we write this as:
$$ E[\text{Points} \mid X] $$
Where:
- $E[\cdot]$ denotes the expected value (average) of a random quantity
- $\text{Points}$ is the net points the offense will score before the defense scores
- $X$ is a vector of state variables describing the current game situation
- The vertical bar $\mid$ means "conditional on" or "given that we know"
This notation says: "What is the expected (average) number of points we expect the offense to score on this drive, given what we know about the current situation?" The beauty of this formulation is that it explicitly acknowledges uncertainty (we don't know what will happen next) while providing the best estimate based on historical patterns.
Why Conditional Expectation Matters: The conditioning aspect is crucial. We're not asking "What's the average points scored per drive in the NFL?" (which would be around 1.8-2.0 points). We're asking "What's the average points scored from THIS SPECIFIC situation?" The answer depends entirely on the situation—hence "conditional" expectation. A team facing 1st-and-10 at the opponent's 5-yard line has very different expectations than a team facing 4th-and-15 at their own 10.
The Learning Problem: Building an EP model means learning the function $f$ such that $f(X) \approx E[\text{Points} \mid X]$ from historical data. We observe thousands of drives, record their starting situations ($X$) and outcomes ($\text{Points}$), then use statistical/machine learning methods to estimate the relationship. This is supervised learning: we have inputs (game situations), outputs (points scored), and we want to learn the mapping between them.
State Variables: Defining the Game Situation
The accuracy and usefulness of an EP model depends critically on which state variables we include. More variables generally improve model accuracy but require more data and computational complexity. The art of EP modeling involves balancing predictive power against simplicity and interpretability.
The core state variables for a basic EP model include:
1. Field Position ($\text{yardline}$)
- Measured as distance from opponent's end zone (0-100 yards)
- Often represented as yardline_100 in nflfastR data
- Most important predictor of scoring expectation
- Relationship is non-linear: gaining 10 yards at midfield has different value than gaining 10 yards in the red zone
- Example values: Own 20 = 80 yards from goal; Opponent 20 = 20 yards from goal
2. Down ($\text{down}$)
- Categorical variable: 1st, 2nd, 3rd, or 4th down
- Captures how many attempts remain to gain first down
- First down is most valuable (three chances remaining)
- Fourth down usually results in punt/field goal attempt rather than offensive play
- Down interacts with distance—1st-and-15 is different from 1st-and-5
3. Distance to First Down ($\text{distance}$)
- Continuous variable: yards needed for first down (1-50+ yards)
- Also called ydstogo (yards to go) in most datasets
- Short distances favor offense (easier conversions)
- Long distances (10+) make conversions difficult and favor defense
- Interacts strongly with down: 3rd-and-1 vs. 3rd-and-15 are completely different situations
4. Score Differential ($\text{score\_diff}$) - Optional but increasingly common
- Points the offense is ahead (positive) or behind (negative)
- Affects strategy: teams protecting leads play conservatively, teams trailing play aggressively
- Particularly important late in games when teams may prioritize clock management over scoring
- Basic EP models often exclude this for simplicity; advanced models include it
5. Time Remaining ($\text{seconds}$) - Optional for advanced models
- Seconds remaining in half or game
- Interacts with score differential for strategy effects
- Early in games, less important; late in games, critically important
- Teams trailing with little time remaining take more risks
- Including time creates a time-varying EP model
6. Additional Variables - For specialized models
- Quarter: Some teams play differently in different quarters
- Timeouts Remaining: Affects clock management and two-minute drill effectiveness
- Weather Conditions: Wind, precipitation affect passing and kicking
- Home/Away: Home field advantage exists, though effect size is modest
- Team Strength: Adjusting for team quality improves predictions but complicates interpretation
For most applications, a model using field position, down, and distance provides excellent results. These three variables capture the vast majority of variation in scoring expectations. Adding score differential and time remaining improves accuracy for late-game situations but complicates the model and requires more data.
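A minimal representation of these state variables might look like the sketch below. The class name, field names, and validation are illustrative choices (the field names follow nflfastR's conventions), not part of any particular library:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class GameState:
    """Core (and optional) state variables for an EP model."""
    yardline_100: int                   # yards from opponent's end zone (1-99)
    down: int                           # 1, 2, 3, or 4
    ydstogo: int                        # yards needed for a first down
    score_diff: Optional[int] = None    # optional: offense lead (+) or deficit (-)
    seconds_left: Optional[int] = None  # optional: time remaining in the half

    def __post_init__(self):
        # Guard against impossible situations before they reach a model.
        assert 1 <= self.down <= 4, "down must be 1-4"
        assert 1 <= self.yardline_100 <= 99, "yardline_100 out of range"
        assert self.ydstogo >= 1, "ydstogo must be positive"

# 1st-and-10 at the offense's own 25 (75 yards from the goal line):
state = GameState(yardline_100=75, down=1, ydstogo=10)
```

Keeping the optional fields defaulted to `None` mirrors the simple-versus-complex model trade-off: a basic model ignores them, while an advanced model can require them.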
Model Complexity Trade-offs
**Simple Model** (Field Position + Down + Distance):

- **Pros**: Easy to build, interpret, and explain; stable estimates; works well most of the time
- **Cons**: Misses some strategic considerations; treats all game contexts equally
- **Best for**: Player evaluation, general performance analysis, most analytical applications

**Complex Model** (Simple Model + Score + Time + More):

- **Pros**: More accurate in specific situations; captures strategic behavior; better for decision-making
- **Cons**: Harder to build and interpret; requires more data; can overfit; less stable estimates
- **Best for**: In-game decision support, late-game strategy, fourth-down models, coaching applications

Most analysts start with simple models and add complexity only when necessary. The principle of parsimony (Occam's Razor) applies: prefer simpler models unless complexity demonstrably improves performance for your specific use case.

Drive Outcomes: Defining the Target Variable
The target variable in an EP model—what we're trying to predict—is the net points scored by the offense before the defense scores. This requires carefully defining all possible drive outcomes and their point values. The accounting must be precise because EP serves as the foundation for all subsequent EPA calculations.
Positive Outcomes (Offense scores):

- Touchdown: +7 points
  - We conventionally include the PAT (extra point) in touchdown value
  - Extra points succeed about 94% of the time (since the 2015 rule change lengthened the attempt), so the PAT's expected value is just under 1 point
  - Rounding the total to 7 points is standard practice in EP models
  - Some advanced models use 6.95 or calculate dynamic PAT values
- Field Goal: +3 points
  - Constant value regardless of distance
  - In reality, long field goals are less likely to succeed
  - But we're measuring points scored, not probability of attempting
  - If a drive results in a made field goal, it's worth exactly 3 points
- Two-Point Conversion: +8 points (for touchdowns with successful 2PC)
  - Rare enough that many models simplify by using 7 for all TDs
  - Sophisticated models track 2PC attempts and success separately

Negative Outcomes (Opponent scores next):

- Opponent Touchdown: -7 points
  - Occurs when the drive ends (turnover, turnover on downs, punt) and the opponent scores a TD before the offense scores again
  - Represents giving the opponent field position that leads to their scoring
  - The negative value reflects that the drive not only failed to score but enabled opponent scoring
- Opponent Field Goal: -3 points
  - Similar logic to opponent touchdown
  - Drive ends, opponent gets the ball, opponent kicks a field goal before the offense scores
- Safety (offense commits): -2 points
  - Rare but devastating outcome
  - Offense tackled in its own end zone
  - Most commonly occurs when the offense has terrible field position (own 1-5 yard line)

Neutral Outcomes (No score):

- No Score by Either Team: 0 points
  - Drive ends, opponent gets the ball, but neither team scores before the next change of possession
  - Most common outcome for drives starting from poor field position
  - Represents "trading possessions" without scoring
Why Net Points Matter: The EP framework measures net points scored on the current drive before the opponent scores. This is crucial—it's not just "did the offense score?" but "what was the net scoring outcome?" A drive that ends in a punt from midfield (opponent gets ball at own 20) has different implications than a drive that ends in a turnover at the opponent's 10 (opponent gets ball with excellent field position). The first might result in "no score by either team" (0 points), while the second might result in "opponent field goal" (-3 points).
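This accounting is easy to encode as a lookup. The outcome labels below are illustrative (nflfastR's `fixed_drive_result` uses its own category names), but the point values match the table above:

```python
# Net point value of each way a drive can resolve, per the accounting above.
# Keys are hypothetical labels, not nflfastR's exact category strings.
DRIVE_POINTS = {
    "touchdown":            7,   # includes the near-automatic PAT
    "field_goal":           3,
    "opponent_touchdown":  -7,   # opponent scores a TD before we score again
    "opponent_field_goal": -3,
    "safety":              -2,   # offense tackled in its own end zone
    "no_score":             0,   # possessions trade without points
}

def net_points(outcome: str) -> int:
    """Map a drive-outcome label to the EP model's target variable."""
    return DRIVE_POINTS[outcome]
```

Every play in a drive inherits that drive's net points as its training label, which is what lets a model learn how each game state relates to eventual scoring.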
Drive Outcome Examples
**Example 1: Straightforward Touchdown Drive**

- Start: 1st-and-10 at own 25
- Series of plays advances to opponent's 2-yard line
- Touchdown scored on goal line run
- **Outcome**: +7 points

**Example 2: Failed Drive Leading to Opponent Score**

- Start: 1st-and-10 at own 30
- Three-and-out after incomplete passes
- Punt gives opponent ball at their own 40
- Opponent drives and scores touchdown
- **Outcome**: -7 points (even though the offense never scored, the drive's net result is negative)

**Example 3: Back-and-Forth with No Scoring**

- Start: 1st-and-10 at own 20
- Drive advances to midfield, then stalls
- Punt gives opponent ball at their own 15
- Opponent's drive also fails, punts back
- Neither team scores before next change of possession
- **Outcome**: 0 points

Model Formulation: From Outcomes to Expected Points
With drive outcomes defined, we can now formulate the Expected Points model mathematically. The key insight is that EP is a weighted average of all possible outcomes, where weights are the probabilities of each outcome occurring.
$$ EP(y, d, D) = \sum_{k} P(\text{outcome}_k \mid y, d, D) \times \text{points}_k $$
Where:
- $EP(y, d, D)$ is the expected points for a given situation
- $y$ = yardline, measured as yards from opponent's end zone (0-100)
- $d$ = down (1, 2, 3, or 4)
- $D$ = distance to first down (1-50+ yards)
- $P(\text{outcome}_k \mid y, d, D)$ = probability of outcome $k$ occurring from this situation
- $\text{points}_k$ = point value of outcome $k$ (from table above)
- The sum $\sum_k$ is over all possible outcomes (TD, FG, safety, opponent TD, opponent FG, no score)
Interpreting the Formula: This equation says that Expected Points is calculated by:
1. Identifying all possible ways the drive could end (touchdown, field goal, etc.)
2. Estimating the probability of each outcome from this specific situation
3. Multiplying each outcome's point value by its probability
4. Summing across all outcomes to get the expected value
Example Calculation: Suppose from 1st-and-10 at opponent's 25-yard line, historical data shows:
- Touchdown: 35% probability × 7 points = 2.45 expected points
- Field goal: 40% probability × 3 points = 1.20 expected points
- Opponent TD: 5% probability × -7 points = -0.35 expected points
- Opponent FG: 8% probability × -3 points = -0.24 expected points
- No score: 12% probability × 0 points = 0.00 expected points
Sum = 2.45 + 1.20 - 0.35 - 0.24 + 0.00 = 3.06 expected points
This calculation shows why field position near the goal line is valuable: the high probability of scoring (75% combined TD/FG rate) outweighs the low probability of giving up points.
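The calculation above is just a weighted sum of outcome probabilities and point values. A quick sketch, using the illustrative probabilities from the worked example:

```python
# Illustrative outcome probabilities for 1st-and-10 at the opponent's 25,
# paired with each outcome's point value (probability, points).
outcomes = {
    "touchdown":   (0.35,  7),
    "field_goal":  (0.40,  3),
    "opponent_td": (0.05, -7),
    "opponent_fg": (0.08, -3),
    "no_score":    (0.12,  0),
}

# EP is the probability-weighted average of the point values.
ep = sum(prob * points for prob, points in outcomes.values())
print(f"EP = {ep:.2f}")   # EP = 3.06
```

Swapping in probability estimates for any other situation gives that situation's EP, which is exactly what a fitted model does for every possible game state at once.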
Statistical Methods for Building EP Models
The formula above tells us what we want to calculate, but not how to calculate it from data. We need statistical methods to learn the relationship between game situations $(y, d, D)$ and expected points. Multiple approaches exist, each with different trade-offs between accuracy, interpretability, computational cost, and data requirements.
1. Binning and Averaging (Simplest approach)
- Method: Group similar situations into bins, calculate average points for each bin
- Example: All "1st-and-10 from yards 60-65" plays go in one bin
- Pros: Extremely simple; easy to understand and explain; no statistical software needed
- Cons: Requires arbitrary bin boundaries; doesn't smooth across boundaries; needs lots of data for rare situations
- When to use: Quick exploratory analysis; teaching/explaining concepts; situations with lots of data
2. Generalized Additive Models (GAM) (Flexible smoothing)
- Method: Fits smooth curves for continuous variables (yardline, distance) and factors for categorical ones (down)
- Example: Uses splines to create smooth EP curve as yardline changes
- Pros: Smooth predictions; interpretable; handles non-linearities well; built-in uncertainty estimates
- Cons: Limited ability to capture complex interactions; slower to fit than simple models
- When to use: When interpretability matters; when visualizing relationships; academic research
3. Random Forests (Ensemble tree method)
- Method: Builds many decision trees, each on random subset of data, averages predictions
- Example: Tree 1 might split on "yardline < 50?", Tree 2 on "distance < 5?", etc.
- Pros: Captures complex interactions automatically; robust to outliers; handles categorical and continuous variables easily
- Cons: "Black box" (hard to interpret); can overfit; slower prediction than simpler models
- When to use: When predictive accuracy is paramount; when interactions between variables are important
4. Gradient Boosting (Industry standard)
- Method: Builds trees sequentially, each correcting errors from previous trees
- Example: XGBoost, LightGBM, CatBoost implementations
- Pros: Highest accuracy in practice; handles missing data; fast prediction; feature importance available
- Cons: Requires careful tuning; can overfit; computationally expensive to train; less interpretable
- When to use: Production models; when maximum accuracy needed; when you have engineering resources
5. Neural Networks (Maximum flexibility)
- Method: Learns complex non-linear functions through layers of interconnected nodes
- Example: Deep learning models with multiple hidden layers
- Pros: Can learn any function given enough data; extremely flexible; state-of-the-art for complex problems
- Cons: Requires huge amounts of data; computationally expensive; complete black box; easy to overfit; hard to tune
- When to use: When you have massive datasets (10+ seasons); when traditional methods plateau; research applications
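To make approach 1 concrete, the sketch below bins a handful of hand-made plays by down, distance, and a 5-yard field-position bucket, then averages the drive points within each bin. The play records are invented for illustration; a real model would use hundreds of thousands of rows:

```python
from collections import defaultdict

# Each record: (down, ydstogo, yardline_100, net points scored on the drive).
# These values are fabricated purely to demonstrate the mechanics.
plays = [
    (1, 10, 80, 0), (1, 10, 82, 3), (1, 10, 81, -3),
    (1, 10, 62, 3), (1, 10, 60, 7), (1, 10, 64, 0),
]

def bin_key(down, ydstogo, yardline_100, width=5):
    """Group plays by down, distance, and a 5-yard field-position bin."""
    return (down, ydstogo, (yardline_100 // width) * width)

totals = defaultdict(lambda: [0.0, 0])   # bin -> [sum of points, play count]
for down, togo, yardline, points in plays:
    acc = totals[bin_key(down, togo, yardline)]
    acc[0] += points
    acc[1] += 1

# The EP estimate for a bin is simply its average observed net points.
ep_table = {key: total / count for key, (total, count) in totals.items()}
print(ep_table[(1, 10, 80)])   # 1st-and-10, yards 80-84 bin
```

The weaknesses discussed above are visible even here: estimates jump discontinuously at bin boundaries, and a bin with few plays produces a noisy average.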
Model Selection in Practice
**For learning and exploration**: Start with binning/averaging to build intuition. It's simple enough to calculate by hand and helps you understand the data patterns.

**For analysis and research**: Use GAM or random forests. GAMs provide interpretable smooth curves perfect for visualizing relationships. Random forests offer good accuracy without requiring extensive tuning.

**For production systems**: Use gradient boosting (XGBoost or LightGBM). This is what nflfastR and professional teams use because it provides the best accuracy-to-effort ratio. The slight interpretability loss is worth the performance gain.

**For cutting-edge research**: Consider neural networks only if you have extensive data, computational resources, and expertise. For most applications, gradient boosting is sufficient.

In this chapter, we'll implement binning, GAM, and gradient boosting approaches so you understand the trade-offs and can choose appropriately for your applications.

Building an Expected Points Model from Scratch
Now that we understand the theory, let's implement a complete Expected Points model from scratch using historical NFL data. This hands-on section will take you through every step of model building, from data preparation through model training to performance evaluation. By building models yourself, you'll gain deep insights into how EP works, what choices modelers make, and how to adapt models for your specific needs.
We'll implement three different modeling approaches—binning, GAM, and gradient boosting—so you can compare their performance and understand the trade-offs. Each approach offers different benefits: binning is simple and transparent, GAM provides smooth interpretable curves, and gradient boosting delivers maximum accuracy. By implementing all three, you'll develop intuition about which method works best for different applications.
This is where theory meets practice. You'll see that building an EP model isn't just about running a statistical procedure—it requires thoughtful decisions about data filtering, feature engineering, model selection, and validation. The code examples provide both R and Python implementations, with detailed comments explaining every step. Even if you're new to these programming languages, the extensive comments will help you understand the logic.
Data Preparation: Creating Training and Test Sets
Before building any model, we need high-quality data. For EP models, this means collecting historical play-by-play data that includes down, distance, field position, and drive outcomes. We'll use multiple seasons for training (2016-2022) to ensure robust estimates, and reserve the most recent season (2023) for testing to evaluate how well our model generalizes to new data.
Why Multiple Seasons? Using seven seasons (2016-2022) gives us approximately 300,000+ plays, providing sufficient data even for rare situations like 4th-and-long from deep in your own territory. More data leads to more stable estimates and better model performance. However, we don't want to go back too far—the NFL evolves over time, and play-calling from the 1990s looks very different from today's game.
Why Hold Out 2023? We never evaluate a model on the same data used to train it—that would be circular reasoning (of course a model performs well on data it was trained on!). Instead, we hold out the most recent season as a test set, simulating how the model would perform on future, unseen data. This tests whether our model has learned generalizable patterns rather than memorizing specific situations.
Data Filtering Decisions: Not every play in the database belongs in our model:
- We exclude plays without down/distance/field position (kickoffs, penalties without plays)
- We keep only regular season games (playoffs involve different team quality and strategy)
- We track drive outcomes to calculate the target variable (points scored)
These filtering decisions affect model performance and interpretation, so we'll document them clearly in the code.
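The Python side of this workflow mirrors the R code that follows. Here is a minimal sketch of the filtering and season-based split using plain dicts with nflfastR-style column names (`down`, `ydstogo`, `yardline_100`, `season_type`, `season`); in practice you would apply the same logic to a pandas DataFrame:

```python
def usable(play):
    """Keep only regular-season plays with complete state information."""
    return (
        play.get("down") is not None
        and play.get("ydstogo") is not None
        and play.get("yardline_100") is not None
        and play.get("season_type") == "REG"
    )

def split_by_season(plays, test_season=2023):
    """Train on earlier seasons; hold out the most recent as a test set."""
    clean = [p for p in plays if usable(p)]
    train = [p for p in clean if p["season"] < test_season]
    test = [p for p in clean if p["season"] == test_season]
    return train, test

# Tiny fabricated sample to show which rows survive each filter:
plays = [
    {"season": 2016, "season_type": "REG", "down": 1, "ydstogo": 10, "yardline_100": 75},
    {"season": 2023, "season_type": "REG", "down": 3, "ydstogo": 8, "yardline_100": 58},
    {"season": 2023, "season_type": "POST", "down": 2, "ydstogo": 5, "yardline_100": 40},  # playoff: excluded
    {"season": 2022, "season_type": "REG", "down": None, "ydstogo": 10, "yardline_100": 30},  # no down: excluded
]
train, test = split_by_season(plays)
print(len(train), len(test))   # 1 1
```

The key property to preserve, whatever tooling you use, is that no 2023 play influences any training statistic: the split happens before any modeling step.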
#| label: setup-r
#| message: false
#| warning: false
# Load required libraries for EPA model building
# These packages provide the tools we need for data manipulation,
# statistical modeling, and visualization
library(tidyverse) # Data manipulation (dplyr), visualization (ggplot2), etc.
library(nflfastR) # Access to NFL play-by-play data with advanced metrics
library(mgcv) # Generalized Additive Models for smooth non-linear relationships
library(xgboost) # Gradient boosting implementation - industry standard for EP models
library(gt) # Publication-quality tables for displaying results
library(nflplotR) # NFL team logos and styling for visualizations
# Set random seed for reproducibility
# This ensures that any random processes (like train/test splits,
# cross-validation folds, or model initialization) produce the same
# results each time we run the code
set.seed(42)
#| label: load-ep-data-r
#| message: false
#| warning: false
#| cache: true
# STEP 1: Load Historical Play-by-Play Data
# We'll use 2016-2023 seasons to build and test our EP model
# This gives us ~350,000 plays with sufficient data for even rare situations
seasons <- 2016:2023
pbp <- load_pbp(seasons)
# Display initial data summary
cat("=== RAW DATA LOADED ===\n")
cat("Total plays across all seasons:", format(nrow(pbp), big.mark = ","), "\n")
cat("Seasons included:", paste(unique(pbp$season), collapse = ", "), "\n\n")
# STEP 2: Filter and Prepare Data for EP Modeling
# We need to:
# - Keep only plays with valid down, distance, and field position
# - Exclude playoffs (different strategic considerations)
# - Calculate drive outcomes (what happened next)
# - Create target variable (net points scored)
ep_data <- pbp %>%
# Filter to plays with complete state information
filter(
!is.na(down), # Must have down information (1st, 2nd, 3rd, 4th)
!is.na(ydstogo), # Must have distance information (yards to go)
!is.na(yardline_100), # Must have field position (yards from opponent goal)
season_type == "REG" # Only regular season (playoffs are different strategically)
) %>%
# Group by game and drive to calculate drive-level outcomes
# Each drive is a sequence of plays ending in score or change of possession
group_by(game_id, drive) %>%
mutate(
# Get the final result of this drive
# fixed_drive_result is nflfastR's categorization of how the drive ended
drive_result = last(fixed_drive_result),
# Convert drive results to point values
# This is our target variable - what we're trying to predict
drive_points = case_when(
# Positive outcomes (offense scores)
drive_result == "Touchdown" ~ 7, # TD + expected PAT value
drive_result == "Field goal" ~ 3, # Successful FG
# Negative outcomes (defense scores next)
drive_result == "Opp touchdown" ~ -7, # Opponent scores TD after turnover/punt
drive_result == "Opp field goal" ~ -3, # Opponent scores FG after turnover/punt
drive_result == "Safety" ~ -2, # Offense commits safety
# Neutral outcome (no score)
TRUE ~ 0 # No score before next possession change
)
) %>%
ungroup() %>%
# Select only the variables we need for modeling
# Keeping dataset focused improves performance and clarity
select(
  season, game_id, play_id, drive, # Identifiers
  down, ydstogo, yardline_100, # State variables (predictors)
  qtr, half_seconds_remaining, # Additional context (optional)
  score_differential, # Score context (optional)
  drive_points, # Target variable
  ep, # nflfastR's EP prediction (for model comparison)
  epa # nflfastR's EPA (per-play value, for reference)
)
# STEP 3: Create Train/Test Split
# Training set: 2016-2022 (7 seasons for model building)
# Test set: 2023 (1 season for evaluation)
# This simulates how the model would perform on future, unseen data
train_data <- ep_data %>% filter(season <= 2022)
test_data <- ep_data %>% filter(season == 2023)
# Display data split summary
cat("=== DATA PREPARATION COMPLETE ===\n")
cat("Training observations:", format(nrow(train_data), big.mark = ","),
"(", round(nrow(train_data) / nrow(ep_data) * 100, 1), "% )\n")
cat("Testing observations:", format(nrow(test_data), big.mark = ","),
"(", round(nrow(test_data) / nrow(ep_data) * 100, 1), "% )\n")
cat("Training seasons:", paste(unique(train_data$season), collapse = ", "), "\n")
cat("Test season:", unique(test_data$season), "\n\n")
# Verify data quality
cat("=== DATA QUALITY CHECKS ===\n")
cat("Missing values in train_data:\n")
cat(" down:", sum(is.na(train_data$down)), "\n")
cat(" ydstogo:", sum(is.na(train_data$ydstogo)), "\n")
cat(" yardline_100:", sum(is.na(train_data$yardline_100)), "\n")
cat(" drive_points:", sum(is.na(train_data$drive_points)), "\n")
#| label: setup-py
#| message: false
#| warning: false
# Load required libraries for EPA model building
# Python equivalents to the R packages used above
import pandas as pd # Data manipulation (Python's equivalent to dplyr)
import numpy as np # Numerical computing and array operations
import nfl_data_py as nfl # NFL data access (Python equivalent of nflfastR)
import matplotlib.pyplot as plt # Basic plotting (equivalent to base R graphics)
import seaborn as sns # Statistical visualizations (similar to ggplot2)
from sklearn.ensemble import GradientBoostingRegressor # Gradient boosting models
from sklearn.metrics import mean_squared_error, mean_absolute_error # Model evaluation
import warnings
warnings.filterwarnings('ignore') # Suppress warnings for cleaner output
# Set random seed for reproducibility
# Ensures consistent results across runs for any random operations
np.random.seed(42)
#| label: load-ep-data-py
#| message: false
#| warning: false
#| cache: true
# STEP 1: Load Historical Play-by-Play Data
# Using same seasons as R version (2016-2023) for consistency
seasons = list(range(2016, 2024)) # Creates [2016, 2017, ..., 2023]
pbp = nfl.import_pbp_data(seasons)
# Display initial data summary
print("=== RAW DATA LOADED ===")
print(f"Total plays across all seasons: {len(pbp):,}")
print(f"Seasons included: {', '.join(map(str, sorted(pbp['season'].unique())))}\n")
# STEP 2: Filter and Prepare Data for EP Modeling
# Apply same filtering logic as R version
ep_data = pbp[
pbp['down'].notna() & # Must have down information
pbp['ydstogo'].notna() & # Must have distance information
pbp['yardline_100'].notna() & # Must have field position
(pbp['season_type'] == 'REG') # Only regular season games
].copy() # .copy() creates independent DataFrame, prevents SettingWithCopyWarning
# STEP 3: Calculate Drive Outcomes
# Define function to compute net points scored on each drive
def calculate_drive_points(group):
"""
Calculate the net points scored on a drive.
Parameters:
-----------
group : DataFrame
A group of plays from a single drive
Returns:
--------
int : Net points scored on this drive (positive if offense scores,
negative if opponent scores next, 0 if no score)
"""
# Get the final result of this drive
# .iloc[-1] accesses the last element (final play of drive)
drive_result = group['fixed_drive_result'].iloc[-1]
# Convert drive outcomes to point values (same logic as R version)
if drive_result == 'Touchdown':
return 7 # TD + PAT
elif drive_result == 'Field goal':
return 3 # Successful field goal
elif drive_result == 'Safety':
return -2 # Offense commits safety
elif drive_result == 'Opp touchdown':
return -7 # Opponent scores TD next
elif drive_result == 'Opp field goal':
return -3 # Opponent scores FG next
else:
return 0 # No score by either team
# Apply the function to each drive using groupby
# groupby.apply returns ONE value per drive, so we merge the per-drive
# result back onto every play of that drive via the drive identifiers
# (assigning the grouped result directly would not align with the
# play-level index)
drive_points = (
    ep_data.groupby(['game_id', 'drive'])
    .apply(calculate_drive_points)
    .rename('drive_points')
)
ep_data = ep_data.merge(drive_points, left_on=['game_id', 'drive'],
                        right_index=True, how='left')
# STEP 4: Select Relevant Columns
# Keep only variables needed for modeling to reduce memory usage
ep_data = ep_data[[
    'season', 'game_id', 'play_id', 'drive',  # Identifiers
    'down', 'ydstogo', 'yardline_100',        # State variables (predictors)
    'qtr', 'half_seconds_remaining',          # Additional context
    'score_differential',                     # Score context
    'drive_points',                           # Target variable
    'ep',                                     # nflfastR's EP prediction (for model comparison)
    'epa'                                     # nflfastR's EPA (per-play value, for reference)
]]
# STEP 5: Create Train/Test Split
# Same split as R version for direct comparability
train_data = ep_data[ep_data['season'] <= 2022] # 7 seasons for training
test_data = ep_data[ep_data['season'] == 2023] # 1 season for testing
# Display data split summary
print("=== DATA PREPARATION COMPLETE ===")
print(f"Training observations: {len(train_data):,} ({len(train_data)/len(ep_data)*100:.1f}%)")
print(f"Testing observations: {len(test_data):,} ({len(test_data)/len(ep_data)*100:.1f}%)")
print(f"Training seasons: {', '.join(map(str, sorted(train_data['season'].unique())))}")
print(f"Test season: {test_data['season'].unique()[0]}\n")
# Verify data quality
print("=== DATA QUALITY CHECKS ===")
print("Missing values in train_data:")
print(f" down: {train_data['down'].isna().sum()}")
print(f" ydstogo: {train_data['ydstogo'].isna().sum()}")
print(f" yardline_100: {train_data['yardline_100'].isna().sum()}")
print(f" drive_points: {train_data['drive_points'].isna().sum()}")
Method 1: Binning and Averaging
The simplest approach to building an Expected Points model is binning and averaging—grouping similar game situations together and calculating the average points scored for each group. While this method is less sophisticated than GAM or gradient boosting, it's an excellent starting point because it's transparent, easy to understand, and requires no advanced statistical knowledge or specialized software.
The Core Idea: Imagine creating a massive lookup table where each row represents a unique combination of down, distance, and field position, and the value is the average points scored historically from that situation. When you want to know the EP for a new play, you simply look up the matching situation in your table. If you face 1st-and-10 from your own 30, you find all historical plays with 1st-and-10 from yards 25-35 (a bin), calculate their average drive outcome, and use that as your EP estimate.
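The lookup-table idea can be sketched directly. All EP values below are hypothetical placeholders (not estimates from real data), and the bin scheme mirrors the one used later in this section: down, a distance bucket, and a 10-yard field-position bin.

```python
# Minimal sketch of an EP lookup table with invented values
ep_lookup = {
    # (down, distance bucket, yardline_100 // 10) -> historical average drive points
    (1, '10+', 7): 0.61,   # 1st-and-10, 70-79 yards from the opponent goal
    (1, '10+', 6): 1.05,   # 1st-and-10, 60-69 yards out
    (3, '2-3', 7): 0.12,   # 3rd-and-short from the same area
}

def lookup_ep(down, ydstogo, yardline_100):
    """Bin the situation, then read off the historical average for that bin."""
    if ydstogo == 1:
        dist = '1'
    elif ydstogo <= 3:
        dist = '2-3'
    elif ydstogo <= 6:
        dist = '4-6'
    elif ydstogo <= 9:
        dist = '7-9'
    else:
        dist = '10+'
    return ep_lookup.get((down, dist, yardline_100 // 10))

print(lookup_ep(1, 10, 75))   # 0.61 (1st-and-10 from the own 25)
```

Every prediction is a direct read from the table; there is no model to fit, only averages to maintain.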
Why Binning? In theory, we could create a separate estimate for every exact situation (1st-and-10 from the 30.5-yard line), but this creates two problems:
1. Data Sparsity: Many exact situations occur rarely or never, giving us no data or unreliable estimates
2. No Smoothing: We assume 1st-and-10 from the 30-yard line is fundamentally different from 1st-and-10 from the 31-yard line, when they're nearly identical
Binning solves both problems by grouping nearby situations together, increasing sample sizes and implicitly assuming similar situations have similar expected points.
The Trade-off: Binning introduces a classic bias-variance trade-off:
- Narrow bins (e.g., 5-yard field position increments): More precise, but less data per bin → higher variance, unstable estimates
- Wide bins (e.g., 20-yard field position increments): More stable, but less precise → higher bias, less discriminating
We'll use moderate bin widths that balance these concerns: 10-yard field position bins, categorical distance bins (1 yard, 2-3 yards, 4-6 yards, 7-9 yards, 10+ yards), and explicit down labels. This provides reasonable granularity while maintaining sufficient data per bin.
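The bin-width trade-off is easy to demonstrate on synthetic data. The linear "true EP" curve and noise level below are invented purely to make the effect visible; with these settings, the wide bins' bias dominates their variance advantage.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: a hypothetical linear "true EP" curve plus drive-outcome noise
yardline = rng.integers(1, 100, size=5000)        # yards from the opponent goal
true_ep = 6 - 0.07 * yardline                     # invented true relationship
points = true_ep + rng.normal(0, 3, size=5000)    # noisy observed outcomes

def binned_rmse(width):
    """Estimate EP by within-bin averaging; return (n_bins, RMSE vs true curve)."""
    bins = yardline // width
    bin_means = {b: points[bins == b].mean() for b in np.unique(bins)}
    preds = np.array([bin_means[b] for b in bins])
    return len(bin_means), float(np.sqrt(np.mean((preds - true_ep) ** 2)))

for width in (5, 20):
    n_bins, rmse = binned_rmse(width)
    print(f"bin width {width:2d}: {n_bins:2d} bins, RMSE vs true EP = {rmse:.3f}")
```

With less data, the comparison can flip: narrow bins' per-bin noise grows while the wide bins' bias stays fixed, which is exactly the balance the moderate bin widths above try to strike.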
Advantages of Binning:
- Extremely simple to implement and explain to non-technical audiences
- No assumptions about functional form (non-parametric)
- Fast to compute, even with massive datasets
- Easy to update as new data arrives (just add new observations to bins)
- Interpretable: every prediction comes directly from historical averages
Disadvantages of Binning:
- Requires arbitrary choices about bin boundaries (where to split field position?)
- Discontinuities at bin boundaries (going from 39 to 41 yards changes bins despite similar situations)
- Doesn't handle sparse situations gracefully (what if a bin has only 2 observations?)
- Can't capture subtle interactions between variables
- Less accurate than sophisticated methods
Despite these limitations, binning provides a solid baseline model that often performs surprisingly well. Professional teams used binning-based EP models before modern machine learning became standard. Even today, binning is useful for exploratory analysis, quick approximations, and sanity-checking more complex models.
#| label: binning-model-r
#| message: false
#| warning: false
# Create binned EP lookup table
ep_binned <- train_data %>%
mutate(
# Create bins for each variable
ydline_bin = cut(yardline_100,
breaks = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100),
include.lowest = TRUE,
labels = FALSE),
dist_bin = case_when(
ydstogo == 1 ~ "1",
ydstogo %in% 2:3 ~ "2-3",
ydstogo %in% 4:6 ~ "4-6",
ydstogo %in% 7:9 ~ "7-9",
TRUE ~ "10+"
)
) %>%
group_by(down, dist_bin, ydline_bin) %>%
summarise(
ep_binned = mean(drive_points),
n = n(),
.groups = "drop"
) %>%
filter(n >= 10) # Require minimum observations
# Apply to test data
test_predictions_binned <- test_data %>%
mutate(
ydline_bin = cut(yardline_100,
breaks = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100),
include.lowest = TRUE,
labels = FALSE),
dist_bin = case_when(
ydstogo == 1 ~ "1",
ydstogo %in% 2:3 ~ "2-3",
ydstogo %in% 4:6 ~ "4-6",
ydstogo %in% 7:9 ~ "7-9",
TRUE ~ "10+"
)
) %>%
left_join(ep_binned, by = c("down", "dist_bin", "ydline_bin")) %>%
  # Situations with no matching bin (rare combinations removed by the
  # n >= 10 rule) fall back to 0 EP; a production model would handle
  # these more gracefully
  replace_na(list(ep_binned = 0))
# Calculate model performance
rmse_binned <- sqrt(mean((test_predictions_binned$drive_points -
test_predictions_binned$ep_binned)^2, na.rm = TRUE))
cat("Binning Model RMSE:", round(rmse_binned, 3), "\n")
#| label: binning-model-py
#| message: false
#| warning: false
# Create binned EP lookup table
train_binned = train_data.copy()
train_binned['ydline_bin'] = pd.cut(
train_binned['yardline_100'],
bins=[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
labels=False
)
train_binned['dist_bin'] = pd.cut(
train_binned['ydstogo'],
bins=[0, 1, 3, 6, 9, 100],
labels=['1', '2-3', '4-6', '7-9', '10+']
)
ep_binned = (train_binned
.groupby(['down', 'dist_bin', 'ydline_bin'])
.agg(
ep_binned=('drive_points', 'mean'),
n=('drive_points', 'count')
)
.reset_index()
.query('n >= 10') # Require minimum observations
)
# Apply to test data
test_binned = test_data.copy()
test_binned['ydline_bin'] = pd.cut(
test_binned['yardline_100'],
bins=[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
labels=False
)
test_binned['dist_bin'] = pd.cut(
test_binned['ydstogo'],
bins=[0, 1, 3, 6, 9, 100],
labels=['1', '2-3', '4-6', '7-9', '10+']
)
test_predictions_binned = test_binned.merge(
    ep_binned[['down', 'dist_bin', 'ydline_bin', 'ep_binned']],
    on=['down', 'dist_bin', 'ydline_bin'],
    how='left'
).fillna({'ep_binned': 0})  # unmatched (rare) situations fall back to 0 EP
# Calculate model performance
rmse_binned = np.sqrt(mean_squared_error(
test_predictions_binned['drive_points'],
test_predictions_binned['ep_binned']
))
print(f"Binning Model RMSE: {rmse_binned:.3f}")
Method 2: Generalized Additive Model (GAM)
Binning's hard boundaries disappear if we instead fit smooth functions of the predictors. A generalized additive model estimates EP as a sum of smooth, non-linear terms:

$$ \widehat{EP} = \beta_0 + f_1(\text{yardline}) + f_2(\text{ydstogo}) + \gamma_{\text{down}} $$

where $f_1$ and $f_2$ are spline functions learned from the data and $\gamma_{\text{down}}$ is a separate effect for each down. This yields continuous predictions (no jumps at bin edges) while making no strong assumptions about the shape of each relationship.
#| label: gam-model-r
#| message: false
#| warning: false
#| cache: true
# Fit GAM model
gam_model <- gam(
drive_points ~ s(yardline_100, k = 20) +
s(ydstogo, k = 10) +
factor(down),
data = train_data,
family = gaussian()
)
# Generate predictions
test_data$ep_gam <- predict(gam_model, newdata = test_data)
# Calculate model performance
rmse_gam <- sqrt(mean((test_data$drive_points - test_data$ep_gam)^2,
na.rm = TRUE))
cat("GAM Model RMSE:", round(rmse_gam, 3), "\n")
# Model summary
summary(gam_model)
#| label: gam-model-py
#| message: false
#| warning: false
#| cache: true
from pygam import LinearGAM, s, f
# Prepare features
X_train = train_data[['yardline_100', 'ydstogo', 'down']].values
y_train = train_data['drive_points'].values
X_test = test_data[['yardline_100', 'ydstogo', 'down']].values
y_test = test_data['drive_points'].values
# Fit GAM model
gam_model = LinearGAM(s(0, n_splines=20) + s(1, n_splines=10) + f(2))
gam_model.fit(X_train, y_train)
# Generate predictions
test_data_py = test_data.copy()
test_data_py['ep_gam'] = gam_model.predict(X_test)
# Calculate model performance
rmse_gam = np.sqrt(mean_squared_error(y_test, test_data_py['ep_gam']))
print(f"GAM Model RMSE: {rmse_gam:.3f}")
print(f"\nModel Summary:")
print(f"Number of terms: {len(gam_model.terms)}")
print(f"Effective degrees of freedom: {gam_model.statistics_['edof']:.2f}")
Method 3: Gradient Boosting (Production Model)
Gradient boosting builds an ensemble of shallow decision trees sequentially, with each new tree fit to the residual errors of the trees before it. Because trees split on interactions automatically (short yardage matters differently near the goal line, for example), boosted models typically deliver the best accuracy and are the standard choice for production EP models, including nflfastR's.
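Before reaching for a tuned library, the boosting idea itself fits in a few lines. This is a conceptual sketch on synthetic data (the true curve and noise level are invented, and only field position is used as a feature); the production-style implementations follow below.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(1, 99, size=(2000, 1))            # field position only, for clarity
y = 6 - 0.07 * X[:, 0] + rng.normal(0, 2, 2000)   # hypothetical noisy outcomes

# Boosting: start from a constant prediction, then repeatedly fit a shallow
# tree to the current residuals and take a small step in that direction
pred = np.full_like(y, y.mean())
learning_rate = 0.1
for _ in range(50):
    residuals = y - pred                          # what the ensemble still misses
    tree = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
    pred += learning_rate * tree.predict(X)       # small step toward the residuals

baseline_rmse = float(np.sqrt(np.mean((y - y.mean()) ** 2)))
boosted_rmse = float(np.sqrt(np.mean((y - pred) ** 2)))
print(f"constant-mean RMSE: {baseline_rmse:.3f}, after 50 trees: {boosted_rmse:.3f}")
```

Libraries like xgboost add regularization, subsampling, and early stopping on top of this same loop.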
#| label: xgb-model-r
#| message: false
#| warning: false
#| cache: true
# Prepare data for XGBoost
train_matrix <- train_data %>%
select(yardline_100, ydstogo, down) %>%
as.matrix()
train_label <- train_data$drive_points
test_matrix <- test_data %>%
select(yardline_100, ydstogo, down) %>%
as.matrix()
# Create DMatrix objects
dtrain <- xgb.DMatrix(data = train_matrix, label = train_label)
dtest <- xgb.DMatrix(data = test_matrix, label = test_data$drive_points)
# Set parameters
params <- list(
objective = "reg:squarederror",
eta = 0.1,
max_depth = 6,
subsample = 0.8,
colsample_bytree = 0.8
)
# Train model with cross-validation
xgb_model <- xgb.train(
params = params,
data = dtrain,
nrounds = 200,
watchlist = list(train = dtrain, test = dtest),
early_stopping_rounds = 20,
verbose = 0
)
# Generate predictions
test_data$ep_xgb <- predict(xgb_model, dtest)
# Calculate model performance
rmse_xgb <- sqrt(mean((test_data$drive_points - test_data$ep_xgb)^2,
na.rm = TRUE))
cat("XGBoost Model RMSE:", round(rmse_xgb, 3), "\n")
cat("Best iteration:", xgb_model$best_iteration, "\n")
# Feature importance
importance <- xgb.importance(model = xgb_model)
print(importance)
#| label: xgb-model-py
#| message: false
#| warning: false
#| cache: true
from sklearn.ensemble import GradientBoostingRegressor
# Prepare features
X_train = train_data[['yardline_100', 'ydstogo', 'down']].values
y_train = train_data['drive_points'].values
X_test = test_data[['yardline_100', 'ydstogo', 'down']].values
y_test = test_data['drive_points'].values
# Train gradient boosting model
gb_model = GradientBoostingRegressor(
n_estimators=200,
learning_rate=0.1,
max_depth=6,
subsample=0.8,
random_state=42,
verbose=0
)
gb_model.fit(X_train, y_train)
# Generate predictions
test_data_py['ep_gb'] = gb_model.predict(X_test)
# Calculate model performance
rmse_gb = np.sqrt(mean_squared_error(y_test, test_data_py['ep_gb']))
print(f"Gradient Boosting Model RMSE: {rmse_gb:.3f}")
# Feature importance
feature_importance = pd.DataFrame({
'feature': ['yardline_100', 'ydstogo', 'down'],
'importance': gb_model.feature_importances_
}).sort_values('importance', ascending=False)
print("\nFeature Importance:")
print(feature_importance.to_string(index=False))
Model Comparison
Let's compare all three modeling approaches:
#| label: model-comparison-r
#| message: false
#| warning: false
# Create comparison table
model_comparison <- tibble(
  Model = c("Binning", "GAM", "XGBoost", "nflfastR EP"),
  RMSE = c(
    sqrt(mean((test_predictions_binned$drive_points -
                 test_predictions_binned$ep_binned)^2, na.rm = TRUE)),
    sqrt(mean((test_data$drive_points - test_data$ep_gam)^2, na.rm = TRUE)),
    sqrt(mean((test_data$drive_points - test_data$ep_xgb)^2, na.rm = TRUE)),
    # Compare against nflfastR's EP column; epa measures per-play CHANGE
    # in EP, so it is not a prediction of drive points
    sqrt(mean((test_data$drive_points - test_data$ep)^2, na.rm = TRUE))
  ),
  MAE = c(
    mean(abs(test_predictions_binned$drive_points -
               test_predictions_binned$ep_binned), na.rm = TRUE),
    mean(abs(test_data$drive_points - test_data$ep_gam), na.rm = TRUE),
    mean(abs(test_data$drive_points - test_data$ep_xgb), na.rm = TRUE),
    mean(abs(test_data$drive_points - test_data$ep), na.rm = TRUE)
  )
) %>%
mutate(
RMSE_pct = (RMSE / min(RMSE) - 1) * 100,
MAE_pct = (MAE / min(MAE) - 1) * 100
)
model_comparison %>%
gt() %>%
cols_label(
Model = "Model Type",
RMSE = "RMSE",
MAE = "MAE",
RMSE_pct = "RMSE vs Best (%)",
MAE_pct = "MAE vs Best (%)"
) %>%
fmt_number(
columns = c(RMSE, MAE),
decimals = 3
) %>%
fmt_number(
columns = c(RMSE_pct, MAE_pct),
decimals = 1
) %>%
tab_header(
title = "Expected Points Model Comparison",
subtitle = "2023 Season Test Set Performance"
)
#| label: model-comparison-py
#| message: false
#| warning: false
# Calculate metrics for all models
models = {
    'Binning': test_predictions_binned['ep_binned'],
    'GAM': test_data_py['ep_gam'],
    'Gradient Boosting': test_data_py['ep_gb'],
    # nflfastR's EP column; epa measures per-play change, not drive points
    'nflfastR EP': test_data['ep']
}
comparison_data = []
actual = test_data['drive_points'].to_numpy(dtype=float)
for model_name, predictions in models.items():
    preds = np.asarray(predictions, dtype=float)
    # All prediction vectors align row-for-row with test_data;
    # drop any rows with missing values before scoring
    mask = ~np.isnan(preds) & ~np.isnan(actual)
    comparison_data.append({
        'Model': model_name,
        'RMSE': np.sqrt(mean_squared_error(actual[mask], preds[mask])),
        'MAE': mean_absolute_error(actual[mask], preds[mask])
    })
comparison_df = pd.DataFrame(comparison_data)
comparison_df['RMSE_pct'] = (comparison_df['RMSE'] / comparison_df['RMSE'].min() - 1) * 100
comparison_df['MAE_pct'] = (comparison_df['MAE'] / comparison_df['MAE'].min() - 1) * 100
print("\nExpected Points Model Comparison")
print("2023 Season Test Set Performance")
print("=" * 70)
print(comparison_df.to_string(index=False))
Visualizing the Expected Points Curve
One of the most important visualizations in football analytics is the expected points curve showing how EP varies by field position.
#| label: fig-ep-curve-r
#| fig-cap: "Expected Points by Field Position and Down"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false
# Create prediction grid
ep_curve_data <- expand_grid(
yardline_100 = 1:99,
down = 1:4,
ydstogo = 10
) %>%
mutate(
ep = predict(xgb_model,
xgb.DMatrix(data = as.matrix(select(., yardline_100, ydstogo, down))))
)
# Plot EP curve
ggplot(ep_curve_data, aes(x = yardline_100, y = ep, color = factor(down))) +
geom_line(size = 1.5) +
geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
geom_hline(yintercept = 3, linetype = "dotted", alpha = 0.5, color = "darkgreen") +
geom_hline(yintercept = 7, linetype = "dotted", alpha = 0.5, color = "darkblue") +
scale_x_reverse(
    breaks = seq(0, 100, 10),
    # yardline_100 = 0 is the OPPONENT goal line; 100 is the offense's own
    labels = c("Opp\nGoal", seq(10, 40, 10), "50", seq(40, 10, -10), "Own\nGoal")
  ) +
scale_color_manual(
values = c("1" = "#0072B2", "2" = "#009E73", "3" = "#D55E00", "4" = "#CC79A7"),
labels = c("1st Down", "2nd Down", "3rd Down", "4th Down")
) +
annotate("text", x = 95, y = 3.2, label = "Field Goal",
size = 3, color = "darkgreen", hjust = 0) +
annotate("text", x = 95, y = 7.2, label = "Touchdown",
size = 3, color = "darkblue", hjust = 0) +
labs(
title = "Expected Points by Field Position and Down",
subtitle = "Based on XGBoost model trained on 2016-2022 data, 10 yards to go",
x = "Field Position (Yards from Opponent Goal)",
y = "Expected Points",
color = "Down",
caption = "Data: nflfastR | Model: Custom XGBoost"
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(face = "bold", size = 16),
plot.subtitle = element_text(size = 11),
legend.position = "top",
panel.grid.minor = element_blank()
)
#| label: fig-ep-curve-py
#| fig-cap: "Expected Points by Field Position and Down - Python"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false
# Create prediction grid
yardlines = np.arange(1, 100)
downs = [1, 2, 3, 4]
ydstogo = 10
fig, ax = plt.subplots(figsize=(12, 8))
colors = ['#0072B2', '#009E73', '#D55E00', '#CC79A7']
labels = ['1st Down', '2nd Down', '3rd Down', '4th Down']
for down, color, label in zip(downs, colors, labels):
X_grid = np.column_stack([
yardlines,
np.full_like(yardlines, ydstogo),
np.full_like(yardlines, down)
])
ep_values = gb_model.predict(X_grid)
ax.plot(yardlines, ep_values, color=color, linewidth=2.5, label=label)
# Add reference lines
ax.axhline(y=0, color='black', linestyle='--', alpha=0.5)
ax.axhline(y=3, color='darkgreen', linestyle=':', alpha=0.5)
ax.axhline(y=7, color='darkblue', linestyle=':', alpha=0.5)
# Add annotations
ax.text(95, 3.2, 'Field Goal', fontsize=10, color='darkgreen')
ax.text(95, 7.2, 'Touchdown', fontsize=10, color='darkblue')
# Format x-axis
ax.set_xlim(100, 0)  # Reverse x-axis: offense drives left-to-right
ax.set_xticks([0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
# yardline_100 = 0 is the OPPONENT goal line; 100 is the offense's own
ax.set_xticklabels(['Opp\nGoal', '10', '20', '30', '40', '50',
                    '40', '30', '20', '10', 'Own\nGoal'])
ax.set_xlabel('Field Position (Yards from Opponent Goal)', fontsize=12, fontweight='bold')
ax.set_ylabel('Expected Points', fontsize=12, fontweight='bold')
ax.set_title('Expected Points by Field Position and Down\n' +
'Based on Gradient Boosting model trained on 2016-2022 data, 10 yards to go',
fontsize=14, fontweight='bold', pad=20)
ax.legend(loc='upper left', fontsize=10, framealpha=0.9)
ax.grid(True, alpha=0.3)
ax.text(0.98, 0.02, 'Data: nfl_data_py | Model: Custom Gradient Boosting',
transform=ax.transAxes, ha='right', fontsize=8, style='italic')
plt.tight_layout()
plt.show()
Key Observations from the EP Curve
1. **Field position matters tremendously**: Moving from your own 20 to your own 40 increases EP by ~0.8 points
2. **Diminishing returns**: The EP curve flattens as you approach the goal line
3. **Down matters**: 1st down is always more valuable than 4th down at the same field position
4. **The break-even point**: With this net-points definition of EP, the curve crosses zero deep in the offense's own territory (roughly the own 10-15 yard line); by midfield a 1st-and-10 offense already expects about +2 points

Calculating and Interpreting EPA
Now that we have an EP model, we can calculate EPA for any play by finding the difference in expected points before and after the play.
EPA Components
EPA can be decomposed into several components:
$$ \text{EPA} = \text{EPA}_{\text{yards}} + \text{EPA}_{\text{down}} + \text{EPA}_{\text{turnover}} + \text{EPA}_{\text{score}} $$
Where:
- $\text{EPA}_{\text{yards}}$ = value from yardage gained
- $\text{EPA}_{\text{down}}$ = value/penalty from down change
- $\text{EPA}_{\text{turnover}}$ = penalty for turnovers
- $\text{EPA}_{\text{score}}$ = value from immediate scoring
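To make the decomposition concrete, here is a toy calculation using hypothetical EP values (invented round numbers, not model output). An incompletion changes only the down, so its entire EPA is the down component; a completed pass mixes the yardage and down components.

```python
import math

# Hypothetical EP values for three game states (invented for illustration)
ep = {
    '1st-and-10, own 25': 0.6,
    '2nd-and-10, own 25': 0.3,
    '1st-and-10, own 40': 1.4,
}

# EPA is simply EP(after) - EP(before)
incompletion_epa = ep['2nd-and-10, own 25'] - ep['1st-and-10, own 25']
completion_epa = ep['1st-and-10, own 40'] - ep['1st-and-10, own 25']

# The incompletion gains no yards, so all of its EPA is the down component
print(f"Incompletion EPA: {incompletion_epa:+.1f}")      # -0.3
# The 15-yard completion both resets the down AND improves field position
print(f"15-yard completion EPA: {completion_epa:+.1f}")  # +0.8
```

Any real play's EPA can be read the same way: two table (or model) lookups and a subtraction, with the components attributing that difference to yards, down, turnover, or score.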
Calculating EPA from Scratch
#| label: calculate-epa-r
#| message: false
#| warning: false
# Function to calculate EPA for a single play
# (assumes the play ends in an ordinary down state; scoring plays and
# turnovers require additional handling, e.g., flipping the sign of EP
# on a change of possession)
calculate_epa <- function(start_down, start_ydstogo, start_yardline,
end_down, end_ydstogo, end_yardline,
ep_model) {
# Get EP at start of play
start_state <- matrix(c(start_yardline, start_ydstogo, start_down), nrow = 1)
ep_start <- predict(ep_model, xgb.DMatrix(data = start_state))
# Get EP at end of play
end_state <- matrix(c(end_yardline, end_ydstogo, end_down), nrow = 1)
ep_end <- predict(ep_model, xgb.DMatrix(data = end_state))
# EPA is the difference
epa <- ep_end - ep_start
return(list(ep_start = ep_start, ep_end = ep_end, epa = epa))
}
# Example: 15-yard completion on 1st-and-10 from own 25
example_play <- calculate_epa(
start_down = 1, start_ydstogo = 10, start_yardline = 75,
end_down = 1, end_ydstogo = 10, end_yardline = 60,
ep_model = xgb_model
)
cat("Example Play: 15-yard completion on 1st-and-10 from own 25\n")
cat("EP Start:", round(example_play$ep_start, 3), "\n")
cat("EP End:", round(example_play$ep_end, 3), "\n")
cat("EPA:", round(example_play$epa, 3), "\n")
#| label: calculate-epa-py
#| message: false
#| warning: false
def calculate_epa(start_down, start_ydstogo, start_yardline,
end_down, end_ydstogo, end_yardline,
ep_model):
"""Calculate EPA for a single play"""
# Get EP at start of play
start_state = np.array([[start_yardline, start_ydstogo, start_down]])
ep_start = ep_model.predict(start_state)[0]
# Get EP at end of play
end_state = np.array([[end_yardline, end_ydstogo, end_down]])
ep_end = ep_model.predict(end_state)[0]
# EPA is the difference
epa = ep_end - ep_start
return {
'ep_start': ep_start,
'ep_end': ep_end,
'epa': epa
}
# Example: 15-yard completion on 1st-and-10 from own 25
example_play = calculate_epa(
start_down=1, start_ydstogo=10, start_yardline=75,
end_down=1, end_ydstogo=10, end_yardline=60,
ep_model=gb_model
)
print("Example Play: 15-yard completion on 1st-and-10 from own 25")
print(f"EP Start: {example_play['ep_start']:.3f}")
print(f"EP End: {example_play['ep_end']:.3f}")
print(f"EPA: {example_play['epa']:.3f}")
EPA Distributions by Play Type
Let's analyze how EPA varies by play type:
#| label: fig-epa-by-play-type-r
#| fig-cap: "EPA Distribution by Play Type"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false
# Load fresh data with play types
pbp_2023 <- load_pbp(2023)
# Calculate EPA statistics by play type
epa_by_type <- pbp_2023 %>%
filter(!is.na(epa), play_type %in% c("pass", "run")) %>%
group_by(play_type) %>%
summarise(
n = n(),
mean_epa = mean(epa),
median_epa = median(epa),
sd_epa = sd(epa),
q25 = quantile(epa, 0.25),
q75 = quantile(epa, 0.75),
success_rate = mean(epa > 0),
.groups = "drop"
)
# Display summary table
epa_by_type %>%
gt() %>%
cols_label(
play_type = "Play Type",
n = "Plays",
mean_epa = "Mean EPA",
median_epa = "Median EPA",
sd_epa = "Std Dev",
q25 = "25th %ile",
q75 = "75th %ile",
success_rate = "Success %"
) %>%
fmt_number(
columns = c(mean_epa, median_epa, sd_epa, q25, q75),
decimals = 3
) %>%
fmt_percent(
columns = success_rate,
decimals = 1
) %>%
fmt_number(
columns = n,
decimals = 0,
use_seps = TRUE
) %>%
tab_header(
title = "EPA Statistics by Play Type",
subtitle = "2023 NFL Regular Season"
)
# Create violin plot
pbp_2023 %>%
filter(!is.na(epa), play_type %in% c("pass", "run"), abs(epa) < 5) %>%
ggplot(aes(x = play_type, y = epa, fill = play_type)) +
geom_violin(alpha = 0.6, trim = FALSE) +
geom_boxplot(width = 0.2, fill = "white", alpha = 0.8) +
geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.7) +
scale_fill_manual(
values = c("pass" = "#00BFC4", "run" = "#F8766D"),
labels = c("Pass", "Run")
) +
scale_x_discrete(labels = c("Pass", "Run")) +
labs(
title = "EPA Distribution by Play Type",
subtitle = "2023 NFL Regular Season (EPA limited to ±5 for visualization)",
x = "Play Type",
y = "Expected Points Added",
caption = "Data: nflfastR"
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(face = "bold", size = 16),
legend.position = "none",
panel.grid.major.x = element_blank()
)
#| label: fig-epa-by-play-type-py
#| fig-cap: "EPA Distribution by Play Type - Python"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false
# Load fresh data
pbp_2023_py = nfl.import_pbp_data([2023])
# Calculate EPA statistics by play type
epa_by_type = (pbp_2023_py
.query("epa.notna() & play_type.isin(['pass', 'run'])")
.groupby('play_type')
.agg(
n=('epa', 'count'),
mean_epa=('epa', 'mean'),
median_epa=('epa', 'median'),
sd_epa=('epa', 'std'),
q25=('epa', lambda x: x.quantile(0.25)),
q75=('epa', lambda x: x.quantile(0.75)),
success_rate=('epa', lambda x: (x > 0).mean())
)
.reset_index()
)
print("\nEPA Statistics by Play Type (2023 NFL Regular Season)")
print("=" * 80)
print(epa_by_type.to_string(index=False))
# Create violin plot
fig, ax = plt.subplots(figsize=(12, 8))
plot_data = pbp_2023_py.query(
"epa.notna() & play_type.isin(['pass', 'run']) & abs(epa) < 5"
)
positions = [1, 2]
colors = ['#00BFC4', '#F8766D']
labels = ['Pass', 'Run']
for pos, play_type, color in zip(positions, ['pass', 'run'], colors):
data = plot_data[plot_data['play_type'] == play_type]['epa']
# Create violin plot
parts = ax.violinplot([data], positions=[pos], widths=0.7,
showmeans=False, showextrema=False)
for pc in parts['bodies']:
pc.set_facecolor(color)
pc.set_alpha(0.6)
# Add boxplot
bp = ax.boxplot([data], positions=[pos], widths=0.2,
patch_artist=True,
boxprops=dict(facecolor='white', alpha=0.8),
medianprops=dict(color='black', linewidth=2),
whiskerprops=dict(color='black'),
capprops=dict(color='black'))
ax.axhline(y=0, color='black', linestyle='--', alpha=0.7)
ax.set_xticks(positions)
ax.set_xticklabels(labels)
ax.set_xlabel('Play Type', fontsize=12, fontweight='bold')
ax.set_ylabel('Expected Points Added', fontsize=12, fontweight='bold')
ax.set_title('EPA Distribution by Play Type\n' +
'2023 NFL Regular Season (EPA limited to ±5 for visualization)',
fontsize=14, fontweight='bold', pad=20)
ax.grid(True, alpha=0.3, axis='y')
ax.text(0.98, 0.02, 'Data: nfl_data_py',
transform=ax.transAxes, ha='right', fontsize=8, style='italic')
plt.tight_layout()
plt.show()
Key Insights from EPA Distributions
1. **Pass plays have higher mean EPA** than run plays (~0.05 vs -0.05), but also higher variance
2. **Pass plays have heavier tails**: More big gains and big losses
3. **Both distributions are centered near zero**: Because EP models are calibrated to historical averages, league-wide EPA averages out to roughly zero by construction
4. **Success rates differ**: Pass plays succeed ~48% of the time, runs ~43%

EPA by Game Situation: Understanding Context-Dependent EPA
While we've established that EPA provides a context-aware measure of play value, it's crucial to understand HOW EPA varies across different game situations. Not all positive EPA is equally impressive, and not all negative EPA is equally damaging. A play's EPA value must be interpreted relative to the specific situation—what was possible given the down, distance, and field position?
This section explores how EPA distributions change across different game contexts, revealing patterns that inform strategic decision-making. You'll see that certain situations favor offense or defense, that some downs are more volatile than others, and that field position fundamentally alters what constitutes "success." Understanding these patterns helps you:
- Evaluate play-calling: Was that conservative run on 2nd-and-short smart given the situation?
- Assess player performance: Did the quarterback deliver positive EPA in difficult situations?
- Identify team tendencies: Does this offense struggle specifically on 3rd-and-medium?
- Make strategic decisions: When should we be aggressive vs. conservative based on expected EPA?
The patterns we'll uncover aren't just statistical curiosities—they reflect fundamental football realities about down-and-distance management, field position value, and optimal strategy. These insights explain why coaches make certain decisions, why some statistics mislead, and how to think probabilistically about game situations.
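As a concrete sketch of "relative to the situation," the snippet below computes a situational baseline and each play's EPA over that expectation. The data is synthetic and the EPA values are illustrative assumptions, though the distance buckets mirror the ones used in this chapter's figures.

```python
import pandas as pd

# Synthetic plays; the epa values are illustrative, not real nflfastR output
plays = pd.DataFrame({
    "down":    [1, 1, 1, 3, 3, 3, 3],
    "ydstogo": [10, 10, 10, 2, 2, 15, 15],
    "epa":     [0.3, -0.2, 0.0, 0.8, -0.9, -0.4, 1.6],
})

# Bucket distance the same way the chapter's figures do
plays["dist_category"] = pd.cut(
    plays["ydstogo"], bins=[0, 3, 6, 9, 100],
    labels=["Short (1-3)", "Medium (4-6)", "Long (7-9)", "Very Long (10+)"],
)

# Situational baseline: mean EPA among plays in the same (down, distance) bucket
plays["baseline"] = (plays
    .groupby(["down", "dist_category"], observed=True)["epa"]
    .transform("mean"))

# A play's EPA relative to what a typical play achieves in that exact spot
plays["epa_over_expected"] = plays["epa"] - plays["baseline"]
```

The same +0.8 EPA play looks much more impressive once you know the average play from that spot loses value; this "over expectation" framing is the idea behind the situational comparisons that follow.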
EPA by Down and Distance: How Game State Affects Play Value
Down and distance fundamentally shape what's possible on any play. On 1st-and-10, the offense has three attempts to gain 10 yards—a manageable task with multiple strategic options. On 3rd-and-15, the offense must gain 15 yards in one attempt or face punting—a much more challenging situation with limited options. EPA reflects these realities, with distributions that shift dramatically across downs and distances.
Let's examine how EPA varies by down to understand these contextual differences and their strategic implications.
#| label: fig-epa-by-down-distance-r
#| fig-cap: "Mean EPA by Down and Distance Category"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false
# Calculate EPA by down and distance
epa_down_dist <- pbp_2023 %>%
filter(
!is.na(epa),
play_type %in% c("pass", "run")
) %>%
mutate(
dist_category = case_when(
ydstogo <= 3 ~ "Short (1-3)",
ydstogo <= 6 ~ "Medium (4-6)",
ydstogo <= 9 ~ "Long (7-9)",
TRUE ~ "Very Long (10+)"
),
dist_category = factor(dist_category,
levels = c("Short (1-3)", "Medium (4-6)",
"Long (7-9)", "Very Long (10+)"))
) %>%
group_by(down, dist_category, play_type) %>%
summarise(
n = n(),
mean_epa = mean(epa),
se_epa = sd(epa) / sqrt(n()),
.groups = "drop"
) %>%
filter(n >= 20) # Require minimum sample size
# Create faceted plot
ggplot(epa_down_dist, aes(x = dist_category, y = mean_epa, fill = play_type)) +
geom_col(position = "dodge", alpha = 0.8) +
geom_errorbar(
aes(ymin = mean_epa - 1.96 * se_epa, ymax = mean_epa + 1.96 * se_epa),
position = position_dodge(0.9),
width = 0.2
) +
geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
facet_wrap(~down, labeller = labeller(
    down = function(x) paste0(x, c(`1` = "st", `2` = "nd", `3` = "rd", `4` = "th")[x], " Down")
  )) +
scale_fill_manual(
values = c("pass" = "#00BFC4", "run" = "#F8766D"),
labels = c("Pass", "Run")
) +
labs(
title = "Mean EPA by Down and Distance",
subtitle = "2023 NFL Regular Season with 95% confidence intervals",
x = "Distance Category",
y = "Mean EPA",
fill = "Play Type",
caption = "Data: nflfastR | Minimum 20 plays per category"
) +
theme_minimal(base_size = 11) +
theme(
plot.title = element_text(face = "bold", size = 14),
legend.position = "top",
axis.text.x = element_text(angle = 15, hjust = 1),
panel.grid.major.x = element_blank()
)
📊 Visualization Output
The code above generates a visualization. To see the output, run this code in your R or Python environment. The resulting plot will help illustrate the concepts discussed in this section.
#| label: fig-epa-by-down-distance-py
#| fig-cap: "Mean EPA by Down and Distance Category - Python"
#| fig-width: 12
#| fig-height: 10
#| message: false
#| warning: false
# Calculate EPA by down and distance
pbp_filtered = pbp_2023_py.query("epa.notna() & play_type.isin(['pass', 'run'])").copy()  # .copy() avoids SettingWithCopyWarning below
pbp_filtered['dist_category'] = pd.cut(
pbp_filtered['ydstogo'],
bins=[0, 3, 6, 9, 100],
labels=['Short (1-3)', 'Medium (4-6)', 'Long (7-9)', 'Very Long (10+)']
)
epa_down_dist = (pbp_filtered
.groupby(['down', 'dist_category', 'play_type'])
.agg(
n=('epa', 'count'),
mean_epa=('epa', 'mean'),
se_epa=('epa', lambda x: x.std() / np.sqrt(len(x)))
)
.reset_index()
.query('n >= 20')
)
# Create subplots
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
axes = axes.flatten()
for idx, down in enumerate([1, 2, 3, 4]):
ax = axes[idx]
data = epa_down_dist[epa_down_dist['down'] == down]
if len(data) == 0:
continue
# Set up positions
categories = ['Short (1-3)', 'Medium (4-6)', 'Long (7-9)', 'Very Long (10+)']
x = np.arange(len(categories))
width = 0.35
pass_data = data[data['play_type'] == 'pass'].set_index('dist_category')
run_data = data[data['play_type'] == 'run'].set_index('dist_category')
# Plot bars
for i, cat in enumerate(categories):
if cat in pass_data.index:
row = pass_data.loc[cat]
ax.bar(i - width/2, row['mean_epa'], width,
label='Pass' if i == 0 else '',
color='#00BFC4', alpha=0.8)
ax.errorbar(i - width/2, row['mean_epa'],
yerr=1.96*row['se_epa'],
color='black', capsize=3, linewidth=1)
if cat in run_data.index:
row = run_data.loc[cat]
ax.bar(i + width/2, row['mean_epa'], width,
label='Run' if i == 0 else '',
color='#F8766D', alpha=0.8)
ax.errorbar(i + width/2, row['mean_epa'],
yerr=1.96*row['se_epa'],
color='black', capsize=3, linewidth=1)
ax.axhline(y=0, color='black', linestyle='--', alpha=0.5)
ax.set_xticks(x)
ax.set_xticklabels(categories, rotation=15, ha='right')
ax.set_ylabel('Mean EPA', fontweight='bold')
ax.set_title(f'{down}{"st" if down==1 else "nd" if down==2 else "rd" if down==3 else "th"} Down',
fontweight='bold')
ax.grid(True, alpha=0.3, axis='y')
if idx == 0:
ax.legend(loc='upper left')
fig.suptitle('Mean EPA by Down and Distance\n2023 NFL Regular Season with 95% confidence intervals',
fontsize=14, fontweight='bold', y=0.995)
plt.tight_layout()
plt.show()
EPA by Field Position
#| label: fig-epa-by-field-position-r
#| fig-cap: "Mean EPA by Field Position Zone"
#| fig-width: 12
#| fig-height: 7
#| message: false
#| warning: false
# Calculate EPA by field position zone
epa_field_position <- pbp_2023 %>%
filter(
!is.na(epa),
play_type %in% c("pass", "run")
) %>%
mutate(
field_zone = case_when(
yardline_100 >= 90 ~ "Own 10",
yardline_100 >= 75 ~ "Own 11-25",
yardline_100 >= 55 ~ "Own 26-45",
yardline_100 >= 45 ~ "Midfield",
yardline_100 >= 25 ~ "Opp 45-26",
yardline_100 >= 10 ~ "Opp 25-11",
TRUE ~ "Red Zone"
),
field_zone = factor(field_zone,
levels = c("Own 10", "Own 11-25", "Own 26-45",
"Midfield", "Opp 45-26", "Opp 25-11", "Red Zone"))
) %>%
group_by(field_zone, play_type) %>%
summarise(
n = n(),
mean_epa = mean(epa),
success_rate = mean(epa > 0),
.groups = "drop"
)
# Create dual-axis plot
p1 <- ggplot(epa_field_position, aes(x = field_zone, group = play_type)) +
geom_line(aes(y = mean_epa, color = play_type), size = 1.5) +
geom_point(aes(y = mean_epa, color = play_type), size = 3) +
geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
scale_color_manual(
values = c("pass" = "#00BFC4", "run" = "#F8766D"),
labels = c("Pass", "Run")
) +
labs(
title = "Mean EPA by Field Position Zone",
subtitle = "2023 NFL Regular Season",
x = "Field Position Zone",
y = "Mean EPA",
color = "Play Type",
caption = "Data: nflfastR"
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(face = "bold", size = 14),
axis.text.x = element_text(angle = 30, hjust = 1),
legend.position = "top",
panel.grid.major.x = element_blank()
)
p1
#| label: fig-epa-by-field-position-py
#| fig-cap: "Mean EPA by Field Position Zone - Python"
#| fig-width: 12
#| fig-height: 7
#| message: false
#| warning: false
# Calculate EPA by field position zone
def field_zone(yardline):
if yardline >= 90:
return "Own 10"
elif yardline >= 75:
return "Own 11-25"
elif yardline >= 55:
return "Own 26-45"
elif yardline >= 45:
return "Midfield"
elif yardline >= 25:
return "Opp 45-26"
elif yardline >= 10:
return "Opp 25-11"
else:
return "Red Zone"
pbp_field = pbp_2023_py.query("epa.notna() & play_type.isin(['pass', 'run'])").copy()  # .copy() avoids SettingWithCopyWarning below
pbp_field['field_zone'] = pbp_field['yardline_100'].apply(field_zone)
zone_order = ["Own 10", "Own 11-25", "Own 26-45", "Midfield",
"Opp 45-26", "Opp 25-11", "Red Zone"]
pbp_field['field_zone'] = pd.Categorical(pbp_field['field_zone'],
categories=zone_order,
ordered=True)
epa_field_position = (pbp_field
.groupby(['field_zone', 'play_type'])
.agg(
n=('epa', 'count'),
mean_epa=('epa', 'mean'),
success_rate=('epa', lambda x: (x > 0).mean())
)
.reset_index()
)
# Create plot
fig, ax = plt.subplots(figsize=(12, 7))
for play_type, color, label in [('pass', '#00BFC4', 'Pass'),
('run', '#F8766D', 'Run')]:
data = epa_field_position[epa_field_position['play_type'] == play_type]
ax.plot(data['field_zone'], data['mean_epa'],
marker='o', markersize=8, linewidth=2.5,
color=color, label=label)
ax.axhline(y=0, color='black', linestyle='--', alpha=0.5)
ax.set_xlabel('Field Position Zone', fontsize=12, fontweight='bold')
ax.set_ylabel('Mean EPA', fontsize=12, fontweight='bold')
ax.set_title('Mean EPA by Field Position Zone\n2023 NFL Regular Season',
fontsize=14, fontweight='bold', pad=20)
ax.legend(loc='upper right', fontsize=11, title='Play Type')
ax.grid(True, alpha=0.3)
plt.xticks(rotation=30, ha='right')
ax.text(0.98, 0.02, 'Data: nfl_data_py',
transform=ax.transAxes, ha='right', fontsize=8, style='italic')
plt.tight_layout()
plt.show()
Field Position Effects on EPA
1. **Red Zone uniqueness**: EPA distributions are markedly different in the red zone due to the limited field and high scoring probability
2. **Pass advantage increases** as you move away from your own goal
3. **Run plays stabilize** in EPA across most field positions
4. **Variance increases** near midfield, where more play types are viable
Contextual EPA Analysis
Basic EPA doesn't account for all game context. Let's explore several important adjustments.
Score-Adjusted EPA
Game script significantly impacts play-calling and EPA. Teams with large leads run more, while teams trailing pass more.
#| label: fig-score-adjusted-epa-r
#| fig-cap: "EPA by Play Type and Score Differential"
#| fig-width: 12
#| fig-height: 7
#| message: false
#| warning: false
# Calculate EPA by score differential
epa_score <- pbp_2023 %>%
filter(
!is.na(epa),
play_type %in% c("pass", "run"),
abs(score_differential) <= 21
) %>%
mutate(
score_diff_bin = cut(score_differential,
breaks = seq(-21, 21, 3),
include.lowest = TRUE)
) %>%
group_by(score_diff_bin, play_type) %>%
summarise(
n = n(),
mean_epa = mean(epa),
.groups = "drop"
) %>%
filter(n >= 20)
# Extract midpoint of bins for plotting
epa_score <- epa_score %>%
mutate(
score_mid = as.numeric(gsub("\\(|\\]|\\[|\\,.*", "", score_diff_bin)) + 1.5
)
# Plot
ggplot(epa_score, aes(x = score_mid, y = mean_epa, color = play_type)) +
geom_line(size = 1.3) +
geom_point(size = 2.5) +
geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
geom_vline(xintercept = 0, linetype = "dotted", alpha = 0.5) +
scale_color_manual(
values = c("pass" = "#00BFC4", "run" = "#F8766D"),
labels = c("Pass", "Run")
) +
labs(
title = "EPA by Play Type and Score Differential",
subtitle = "2023 NFL Regular Season | Positive = Team Leading",
x = "Score Differential (Team Score - Opponent Score)",
y = "Mean EPA",
color = "Play Type",
caption = "Data: nflfastR"
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(face = "bold", size = 14),
legend.position = "top"
)
#| label: fig-score-adjusted-epa-py
#| fig-cap: "EPA by Play Type and Score Differential - Python"
#| fig-width: 12
#| fig-height: 7
#| message: false
#| warning: false
# Calculate EPA by score differential
pbp_score = pbp_2023_py.query(
    "epa.notna() & play_type.isin(['pass', 'run']) & abs(score_differential) <= 21"
).copy()  # .copy() avoids SettingWithCopyWarning on the assignment below
pbp_score['score_diff_bin'] = pd.cut(
pbp_score['score_differential'],
bins=range(-21, 24, 3)
)
epa_score = (pbp_score
.groupby(['score_diff_bin', 'play_type'])
.agg(
n=('epa', 'count'),
mean_epa=('epa', 'mean')
)
.reset_index()
.query('n >= 20')
)
# Get midpoint of bins
epa_score['score_mid'] = epa_score['score_diff_bin'].apply(lambda x: x.mid)
# Plot
fig, ax = plt.subplots(figsize=(12, 7))
for play_type, color, label in [('pass', '#00BFC4', 'Pass'),
('run', '#F8766D', 'Run')]:
data = epa_score[epa_score['play_type'] == play_type].sort_values('score_mid')
ax.plot(data['score_mid'], data['mean_epa'],
marker='o', markersize=7, linewidth=2,
color=color, label=label)
ax.axhline(y=0, color='black', linestyle='--', alpha=0.5)
ax.axvline(x=0, color='black', linestyle=':', alpha=0.5)
ax.set_xlabel('Score Differential (Team Score - Opponent Score)',
fontsize=12, fontweight='bold')
ax.set_ylabel('Mean EPA', fontsize=12, fontweight='bold')
ax.set_title('EPA by Play Type and Score Differential\n' +
'2023 NFL Regular Season | Positive = Team Leading',
fontsize=14, fontweight='bold', pad=20)
ax.legend(loc='upper right', fontsize=11, title='Play Type')
ax.grid(True, alpha=0.3)
ax.text(0.98, 0.02, 'Data: nfl_data_py',
transform=ax.transAxes, ha='right', fontsize=8, style='italic')
plt.tight_layout()
plt.show()
Garbage Time Adjustment
"Garbage time" refers to situations where the game outcome is largely decided. These plays can skew EPA metrics and should often be filtered.
#| label: garbage-time-analysis-r
#| message: false
#| warning: false
# Define garbage time function
is_garbage_time <- function(score_diff, qtr, seconds_remaining) {
# Garbage time if:
# - Up/down 3+ scores in Q4 with <5 min
# - Up/down 4+ scores anytime in Q4
(abs(score_diff) >= 24 & qtr == 4 & seconds_remaining < 300) |
(abs(score_diff) >= 31 & qtr == 4)
}
# Compare EPA with and without garbage time
epa_comparison <- pbp_2023 %>%
filter(!is.na(epa), play_type %in% c("pass", "run")) %>%
mutate(
is_garbage = is_garbage_time(score_differential, qtr, game_seconds_remaining)
) %>%
group_by(posteam) %>%
summarise(
all_epa = mean(epa),
non_garbage_epa = mean(epa[!is_garbage]),
garbage_pct = mean(is_garbage) * 100,
.groups = "drop"
) %>%
arrange(desc(non_garbage_epa))
# Display top 10 teams
epa_comparison %>%
head(10) %>%
gt() %>%
cols_label(
posteam = "Team",
all_epa = "All Plays EPA",
non_garbage_epa = "Non-Garbage EPA",
garbage_pct = "Garbage Time %"
) %>%
fmt_number(
columns = c(all_epa, non_garbage_epa),
decimals = 3
) %>%
fmt_number(
columns = garbage_pct,
decimals = 1
) %>%
tab_header(
title = "Top 10 Offenses by EPA (2023)",
subtitle = "Comparing all plays vs non-garbage time"
) %>%
data_color(
columns = non_garbage_epa,
colors = scales::col_numeric(
palette = c("#FEE5D9", "#A50F15"),
domain = NULL
)
)
#| label: garbage-time-analysis-py
#| message: false
#| warning: false
def is_garbage_time(row):
"""Define garbage time situations"""
score_diff = abs(row['score_differential'])
qtr = row['qtr']
seconds = row['game_seconds_remaining']
# Garbage time if:
# - Up/down 3+ scores in Q4 with <5 min
# - Up/down 4+ scores anytime in Q4
return ((score_diff >= 24 and qtr == 4 and seconds < 300) or
(score_diff >= 31 and qtr == 4))
# Compare EPA with and without garbage time
pbp_garbage = pbp_2023_py.query("epa.notna() & play_type.isin(['pass', 'run'])").copy()  # .copy() avoids SettingWithCopyWarning below
pbp_garbage['is_garbage'] = pbp_garbage.apply(is_garbage_time, axis=1)
epa_comparison = (pbp_garbage
.groupby('posteam')
.agg(
all_epa=('epa', 'mean'),
non_garbage_epa=('epa', lambda x: x[~pbp_garbage.loc[x.index, 'is_garbage']].mean()),
garbage_pct=('is_garbage', lambda x: x.mean() * 100)
)
.reset_index()
.sort_values('non_garbage_epa', ascending=False)
)
print("\nTop 10 Offenses by EPA (2023)")
print("Comparing all plays vs non-garbage time")
print("=" * 70)
print(epa_comparison.head(10).to_string(index=False))
EPA Stability and Predictive Power
One of EPA's greatest strengths is its stability and predictive power. Let's examine how well EPA predicts future performance and win outcomes.
Year-to-Year Stability
#| label: epa-stability-r
#| message: false
#| warning: false
#| cache: true
# Load multiple seasons
pbp_multi <- load_pbp(2020:2023)
# Calculate team EPA by season
team_epa_by_season <- pbp_multi %>%
filter(!is.na(epa), play_type %in% c("pass", "run")) %>%
group_by(season, posteam) %>%
summarise(
off_epa = mean(epa),
plays = n(),
.groups = "drop"
) %>%
filter(plays >= 500) # Minimum play threshold
# Calculate year-to-year correlation
stability_results <- team_epa_by_season %>%
arrange(posteam, season) %>%
group_by(posteam) %>%
mutate(
next_season_epa = lead(off_epa)
) %>%
ungroup() %>%
filter(!is.na(next_season_epa))
# Calculate correlation
epa_correlation <- cor(stability_results$off_epa,
stability_results$next_season_epa)
cat("Year-to-year EPA correlation:", round(epa_correlation, 3), "\n")
cat("R-squared:", round(epa_correlation^2, 3), "\n")
#| label: epa-stability-py
#| message: false
#| warning: false
#| cache: true
# Load multiple seasons
pbp_multi = nfl.import_pbp_data([2020, 2021, 2022, 2023])
# Calculate team EPA by season
team_epa_by_season = (pbp_multi
.query("epa.notna() & play_type.isin(['pass', 'run'])")
.groupby(['season', 'posteam'])
.agg(
off_epa=('epa', 'mean'),
plays=('epa', 'count')
)
.reset_index()
.query('plays >= 500')
)
# Calculate year-to-year correlation
stability_results = (team_epa_by_season
    .sort_values(['posteam', 'season'])
    .assign(next_season_epa=lambda d: d.groupby('posteam')['off_epa'].shift(-1))
    .dropna(subset=['next_season_epa'])
)
# Calculate correlation
epa_correlation = stability_results['off_epa'].corr(
stability_results['next_season_epa']
)
print(f"\nYear-to-year EPA correlation: {epa_correlation:.3f}")
print(f"R-squared: {epa_correlation**2:.3f}")
EPA and Winning
#| label: fig-epa-wins-r
#| fig-cap: "Team EPA vs Win Percentage"
#| fig-width: 10
#| fig-height: 8
#| message: false
#| warning: false
# Calculate team EPA and win percentage
team_performance <- pbp_2023 %>%
  filter(!is.na(epa), play_type %in% c("pass", "run")) %>%
  group_by(posteam) %>%
  summarise(
    off_epa = mean(epa),
    plays = n(),
    .groups = "drop"
  )
# Get win totals
games_2023 <- nflreadr::load_schedules(2023) %>%
filter(game_type == "REG") %>%
select(season, week, home_team, away_team, home_score, away_score)
team_wins <- bind_rows(
games_2023 %>%
transmute(team = home_team,
win = as.numeric(home_score > away_score)),
games_2023 %>%
transmute(team = away_team,
win = as.numeric(away_score > home_score))
) %>%
group_by(team) %>%
summarise(
games = n(),
wins = sum(win),
win_pct = wins / games,
.groups = "drop"
)
# Combine
team_full <- team_performance %>%
left_join(team_wins, by = c("posteam" = "team"))
# Calculate correlation
epa_win_cor <- cor(team_full$off_epa, team_full$win_pct, use = "complete.obs")
# Plot
ggplot(team_full, aes(x = off_epa, y = win_pct)) +
geom_hline(yintercept = 0.5, linetype = "dashed", alpha = 0.5) +
geom_vline(xintercept = 0, linetype = "dashed", alpha = 0.5) +
geom_smooth(method = "lm", se = TRUE, color = "gray60", alpha = 0.3) +
geom_nfl_logos(aes(team_abbr = posteam), width = 0.05) +
labs(
title = "Offensive EPA vs Win Percentage",
subtitle = sprintf("2023 NFL Regular Season | Correlation: r = %.3f", epa_win_cor),
x = "Offensive EPA per Play",
y = "Win Percentage",
caption = "Data: nflfastR"
) +
scale_y_continuous(labels = scales::percent_format()) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(face = "bold", size = 14),
panel.grid.minor = element_blank()
)
#| label: fig-epa-wins-py
#| fig-cap: "Team EPA vs Win Percentage - Python"
#| fig-width: 10
#| fig-height: 8
#| message: false
#| warning: false
from scipy import stats
# Calculate team EPA
team_performance = (pbp_2023_py
    .query("epa.notna() & play_type.isin(['pass', 'run'])")
    .groupby('posteam')
    .agg(
        off_epa=('epa', 'mean'),
        plays=('epa', 'count')
    )
    .reset_index()
)
# Get win totals
games_2023 = nfl.import_schedules([2023])
games_reg = games_2023[games_2023['game_type'] == 'REG']
home_results = pd.DataFrame({
'team': games_reg['home_team'],
'win': (games_reg['home_score'] > games_reg['away_score']).astype(int)
})
away_results = pd.DataFrame({
'team': games_reg['away_team'],
'win': (games_reg['away_score'] > games_reg['home_score']).astype(int)
})
team_wins = pd.concat([home_results, away_results]).groupby('team').agg({
'win': ['count', 'sum']
}).reset_index()
team_wins.columns = ['team', 'games', 'wins']
team_wins['win_pct'] = team_wins['wins'] / team_wins['games']
# Combine
team_full = team_performance.merge(team_wins, left_on='posteam', right_on='team')
# Calculate correlation
epa_win_cor = team_full['off_epa'].corr(team_full['win_pct'])
# Plot
fig, ax = plt.subplots(figsize=(10, 8))
# Add reference lines
ax.axhline(y=0.5, color='gray', linestyle='--', alpha=0.5)
ax.axvline(x=0, color='gray', linestyle='--', alpha=0.5)
# Regression line
slope, intercept, r_value, p_value, std_err = stats.linregress(
team_full['off_epa'], team_full['win_pct']
)
line_x = np.array([team_full['off_epa'].min(), team_full['off_epa'].max()])
line_y = slope * line_x + intercept
ax.plot(line_x, line_y, color='gray', alpha=0.6, linewidth=2)
# Scatter plot
ax.scatter(team_full['off_epa'], team_full['win_pct'],
s=150, alpha=0.6, color='steelblue')
# Add team labels
for _, row in team_full.iterrows():
ax.annotate(row['posteam'],
(row['off_epa'], row['win_pct']),
fontsize=8, ha='center', va='center')
ax.set_xlabel('Offensive EPA per Play', fontsize=12, fontweight='bold')
ax.set_ylabel('Win Percentage', fontsize=12, fontweight='bold')
ax.set_title(f'Offensive EPA vs Win Percentage\n' +
f'2023 NFL Regular Season | Correlation: r = {epa_win_cor:.3f}',
fontsize=14, fontweight='bold', pad=20)
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))
ax.grid(True, alpha=0.3)
ax.text(0.98, 0.02, 'Data: nfl_data_py',
transform=ax.transAxes, ha='right', fontsize=8, style='italic')
plt.tight_layout()
plt.show()
EPA's Predictive Power
EPA is one of the strongest predictors of team success:
1. **High correlation with wins**: r ≈ 0.75-0.85 in most seasons
2. **Year-to-year stability**: r ≈ 0.40-0.50, higher than most metrics
3. **Better than yards**: explains ~60% of win variance vs ~40% for total yards
4. **Additive**: offensive + defensive EPA explains ~75% of win variance
EPA vs Traditional Metrics
Let's compare EPA to traditional football statistics to understand its advantages.
EPA vs Yards per Play
#| label: fig-epa-vs-yards-r
#| fig-cap: "Comparing EPA and Total Yards as Predictors"
#| fig-width: 12
#| fig-height: 6
#| message: false
#| warning: false
# Calculate team yards
team_yards <- pbp_2023 %>%
filter(play_type %in% c("pass", "run")) %>%
group_by(posteam) %>%
summarise(
total_yards = sum(yards_gained, na.rm = TRUE),
total_plays = n(),
yards_per_play = total_yards / total_plays,
.groups = "drop"
)
# Join with EPA and wins
comparison_data <- team_full %>%
left_join(team_yards, by = "posteam")
# Calculate correlations
cor_epa <- cor(comparison_data$off_epa, comparison_data$win_pct,
use = "complete.obs")
cor_yards <- cor(comparison_data$yards_per_play, comparison_data$win_pct,
use = "complete.obs")
# Create comparison plots
p1 <- ggplot(comparison_data, aes(x = off_epa, y = win_pct)) +
geom_smooth(method = "lm", se = TRUE, color = "gray60", alpha = 0.3) +
geom_nfl_logos(aes(team_abbr = posteam), width = 0.05) +
labs(
title = "EPA per Play vs Win %",
subtitle = sprintf("r = %.3f | R² = %.3f", cor_epa, cor_epa^2),
x = "EPA per Play",
y = "Win Percentage"
) +
scale_y_continuous(labels = scales::percent_format()) +
theme_minimal() +
theme(plot.title = element_text(face = "bold"))
p2 <- ggplot(comparison_data, aes(x = yards_per_play, y = win_pct)) +
geom_smooth(method = "lm", se = TRUE, color = "gray60", alpha = 0.3) +
geom_nfl_logos(aes(team_abbr = posteam), width = 0.05) +
labs(
title = "Yards per Play vs Win %",
subtitle = sprintf("r = %.3f | R² = %.3f", cor_yards, cor_yards^2),
x = "Yards per Play",
y = "Win Percentage"
) +
scale_y_continuous(labels = scales::percent_format()) +
theme_minimal() +
theme(plot.title = element_text(face = "bold"))
library(patchwork)
p1 + p2 +
plot_annotation(
title = "EPA vs Yards: Which Predicts Winning Better?",
subtitle = "2023 NFL Regular Season",
caption = "Data: nflfastR"
)
#| label: fig-epa-vs-yards-py
#| fig-cap: "Comparing EPA and Total Yards as Predictors - Python"
#| fig-width: 12
#| fig-height: 6
#| message: false
#| warning: false
# Calculate team yards
team_yards = (pbp_2023_py
.query("play_type.isin(['pass', 'run'])")
.groupby('posteam')
.agg(
total_yards=('yards_gained', 'sum'),
total_plays=('yards_gained', 'count')
)
.reset_index()
)
team_yards['yards_per_play'] = team_yards['total_yards'] / team_yards['total_plays']
# Join with EPA and wins
comparison_data = team_full.merge(team_yards, on='posteam')
# Calculate correlations
cor_epa = comparison_data['off_epa'].corr(comparison_data['win_pct'])
cor_yards = comparison_data['yards_per_play'].corr(comparison_data['win_pct'])
# Create comparison plots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))
# EPA plot
slope, intercept = np.polyfit(comparison_data['off_epa'],
comparison_data['win_pct'], 1)
line_x = np.array([comparison_data['off_epa'].min(),
comparison_data['off_epa'].max()])
ax1.plot(line_x, slope * line_x + intercept, color='gray', alpha=0.6, linewidth=2)
ax1.scatter(comparison_data['off_epa'], comparison_data['win_pct'],
s=150, alpha=0.6, color='steelblue')
for _, row in comparison_data.iterrows():
ax1.annotate(row['posteam'], (row['off_epa'], row['win_pct']),
fontsize=7, ha='center', va='center')
ax1.set_xlabel('EPA per Play', fontsize=11, fontweight='bold')
ax1.set_ylabel('Win Percentage', fontsize=11, fontweight='bold')
ax1.set_title(f'EPA per Play vs Win %\nr = {cor_epa:.3f} | R² = {cor_epa**2:.3f}',
fontsize=12, fontweight='bold')
ax1.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))
ax1.grid(True, alpha=0.3)
# Yards plot
slope, intercept = np.polyfit(comparison_data['yards_per_play'],
comparison_data['win_pct'], 1)
line_x = np.array([comparison_data['yards_per_play'].min(),
comparison_data['yards_per_play'].max()])
ax2.plot(line_x, slope * line_x + intercept, color='gray', alpha=0.6, linewidth=2)
ax2.scatter(comparison_data['yards_per_play'], comparison_data['win_pct'],
s=150, alpha=0.6, color='coral')
for _, row in comparison_data.iterrows():
ax2.annotate(row['posteam'], (row['yards_per_play'], row['win_pct']),
fontsize=7, ha='center', va='center')
ax2.set_xlabel('Yards per Play', fontsize=11, fontweight='bold')
ax2.set_ylabel('Win Percentage', fontsize=11, fontweight='bold')
ax2.set_title(f'Yards per Play vs Win %\nr = {cor_yards:.3f} | R² = {cor_yards**2:.3f}',
fontsize=12, fontweight='bold')
ax2.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))
ax2.grid(True, alpha=0.3)
fig.suptitle('EPA vs Yards: Which Predicts Winning Better?\n2023 NFL Regular Season',
fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()
print(f"\nCorrelation with Win %:")
print(f" EPA per Play: r = {cor_epa:.3f} (R² = {cor_epa**2:.3f})")
print(f" Yards per Play: r = {cor_yards:.3f} (R² = {cor_yards**2:.3f})")
print(f" EPA explains {(cor_epa**2 - cor_yards**2) / cor_yards**2 * 100:.1f}% more variance")
Why EPA is Superior
EPA outperforms traditional metrics for several reasons:
- Context-aware: Accounts for down, distance, and field position
- Point-based: Measures true value (points) rather than yards
- Negative plays matter: A 3-yard loss on 3rd-and-2 is catastrophic, not just -3 yards
- Turnover value: Properly weights the massive swing of turnovers
- Efficiency-focused: Not inflated by garbage time or prevent defense
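The "context-aware" point can be reduced to arithmetic. In this toy sketch (the expected-points values are illustrative guesses, not output from a fitted EP model), the same 5-yard gain carries opposite EPA signs depending on down and distance:

```python
def epa(ep_before: float, ep_after: float) -> float:
    """EPA is simply the change in expected points across a play."""
    return ep_after - ep_before

# 3rd-and-4: a 5-yard gain converts and the drive continues (EP rises)
# The EP values here are plausible illustrative numbers, not fitted values
converted = epa(ep_before=1.8, ep_after=2.6)

# 3rd-and-20: the same 5-yard gain still forces a punt (EP falls)
short_of_sticks = epa(ep_before=0.9, ep_after=0.2)

print(f"3rd-and-4:  {converted:+.1f} EPA")        # identical yardage...
print(f"3rd-and-20: {short_of_sticks:+.1f} EPA")  # ...opposite value
```

A yards-based stat records +5 for both plays; EPA records roughly +0.8 for one and -0.7 for the other.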
Advanced EPA Applications
Situational EPA Analysis
#| label: situational-epa-r
#| message: false
#| warning: false
# Early down EPA vs late down EPA
situational_epa <- pbp_2023 %>%
filter(!is.na(epa), play_type %in% c("pass", "run")) %>%
mutate(
situation = case_when(
down <= 2 ~ "Early Downs (1st/2nd)",
down >= 3 ~ "Late Downs (3rd/4th)"
)
) %>%
group_by(posteam, situation, play_type) %>%
summarise(
mean_epa = mean(epa),
plays = n(),
.groups = "drop"
) %>%
filter(plays >= 30)
# Reshape for comparison
situational_wide <- situational_epa %>%
pivot_wider(
names_from = c(situation, play_type),
values_from = mean_epa,
names_sep = "_"
)
# Display top teams
situational_wide %>%
arrange(desc(`Early Downs (1st/2nd)_pass`)) %>%
head(10) %>%
gt() %>%
cols_label(
posteam = "Team",
`Early Downs (1st/2nd)_pass` = "Early Pass",
`Early Downs (1st/2nd)_run` = "Early Run",
`Late Downs (3rd/4th)_pass` = "Late Pass",
`Late Downs (3rd/4th)_run` = "Late Run"
) %>%
fmt_number(
columns = -posteam,
decimals = 3
) %>%
tab_header(
title = "Situational EPA by Team",
subtitle = "2023 NFL Regular Season - Top 10 Early Down Passing"
) %>%
data_color(
columns = -posteam,
colors = scales::col_numeric(
palette = c("#FEE5D9", "#A50F15"),
domain = NULL
)
)
#| label: situational-epa-py
#| message: false
#| warning: false
# Early down EPA vs late down EPA
pbp_situational = pbp_2023_py.query("epa.notna() & play_type.isin(['pass', 'run'])").copy()  # .copy() avoids SettingWithCopyWarning below
def categorize_situation(down):
if down <= 2:
return "Early Downs (1st/2nd)"
else:
return "Late Downs (3rd/4th)"
pbp_situational['situation'] = pbp_situational['down'].apply(categorize_situation)
situational_epa = (pbp_situational
.groupby(['posteam', 'situation', 'play_type'])
.agg(
mean_epa=('epa', 'mean'),
plays=('epa', 'count')
)
.reset_index()
.query('plays >= 30')
)
# Reshape for comparison
situational_wide = situational_epa.pivot_table(
index='posteam',
columns=['situation', 'play_type'],
values='mean_epa'
).reset_index()
situational_wide.columns = ['_'.join(col).strip('_') for col in situational_wide.columns.values]
print("\nSituational EPA by Team (2023 NFL Regular Season)")
print("Top 10 Early Down Passing Teams")
print("=" * 80)
print(situational_wide.sort_values(
'Early Downs (1st/2nd)_pass', ascending=False
).head(10).to_string(index=False))
Player-Level EPA
While this chapter focuses on team EPA, player-level EPA is crucial for evaluation:
#| label: player-epa-r
#| message: false
#| warning: false
# Top QBs by EPA
qb_epa <- pbp_2023 %>%
filter(
!is.na(epa),
play_type == "pass",
!is.na(passer_player_name)
) %>%
group_by(passer_player_name, passer_player_id) %>%
summarise(
dropbacks = n(),
total_epa = sum(epa),
mean_epa = mean(epa),
success_rate = mean(epa > 0),
.groups = "drop"
) %>%
filter(dropbacks >= 200) %>%
arrange(desc(total_epa))
# Display top 15 QBs
qb_epa %>%
  head(15) %>%
  gt() %>%
  cols_label(
    passer_player_name = "Quarterback",
    dropbacks = "Dropbacks",
    total_epa = "Total EPA",
    mean_epa = "EPA/Play",
    success_rate = "Success Rate"
  ) %>%
  fmt_number(
    columns = c(total_epa, mean_epa),
    decimals = 2
  ) %>%
  fmt_percent(
    columns = success_rate,
    decimals = 1
  ) %>%
  fmt_number(
    columns = dropbacks,
    decimals = 0,
    use_seps = TRUE
  ) %>%
  cols_hide(passer_player_id) %>%
  tab_header(
    title = "Top Quarterbacks by EPA",
    subtitle = "2023 NFL Regular Season (min. 200 dropbacks)"
  ) %>%
  data_color(
    columns = total_epa,
    colors = scales::col_numeric(
      palette = c("#FEE5D9", "#A50F15"),
      domain = NULL
    )
  )
#| label: player-epa-py
#| message: false
#| warning: false
# Top QBs by EPA
qb_epa = (pbp_2023_py
    .query("epa.notna() & play_type == 'pass' & passer_player_name.notna()")
    .groupby(['passer_player_name', 'passer_player_id'])
    .agg(
        dropbacks=('epa', 'count'),
        total_epa=('epa', 'sum'),
        mean_epa=('epa', 'mean'),
        success_rate=('epa', lambda x: (x > 0).mean())
    )
    .reset_index()
    .query('dropbacks >= 200')
    .sort_values('total_epa', ascending=False)
)
print("\nTop Quarterbacks by EPA (2023 NFL Regular Season)")
print("Minimum 200 dropbacks")
print("=" * 90)
print(qb_epa[['passer_player_name', 'dropbacks', 'total_epa',
              'mean_epa', 'success_rate']].head(15).to_string(index=False))
EPA Limitations and Considerations
While EPA is extremely valuable, it's important to understand its limitations:
1. Credit Assignment
EPA assigns all credit/blame to the offense/defense on the field, but doesn't distinguish between:
- Quarterback vs receiver contributions on a pass
- Offensive line vs running back on a run
- Coverage vs pass rush on defense
Solution: Use tracking data or film analysis to decompose EPA by player role.
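A common first-pass decomposition splits each completed pass's EPA into an air component and a yards-after-catch component. The sketch below uses nflfastR's `air_epa`/`yac_epa` naming convention, but every value is invented for illustration; real analysis would pull these columns from play-by-play data:

```python
import pandas as pd

# Toy completions; air_epa / yac_epa follow nflfastR's naming convention,
# but every number here is invented for illustration
passes = pd.DataFrame({
    "passer":  ["QB_A", "QB_A", "QB_B"],
    "epa":     [1.2, -0.4, 0.9],
    "air_epa": [0.8, -0.1, 0.2],   # value attributable to the throw itself
    "yac_epa": [0.4, -0.3, 0.7],   # value created after the catch
})

# Sum each component per passer to see who depends on yards after catch
decomp = passes.groupby("passer")[["epa", "air_epa", "yac_epa"]].sum()
print(decomp)
```

Summing the components per passer hints at how much of a quarterback's EPA depends on receivers after the catch, though even this split leaves protection and coverage unmeasured.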
2. Randomness and Luck
Short-term EPA can be influenced by:
- Dropped passes (assigned to QB)
- Tipped interceptions
- Fumble luck
Solution: Use larger sample sizes and complement with other metrics.
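The effect of sample size is easy to demonstrate by simulation. Assuming play-level EPA behaves roughly like noise with a standard deviation of about 1.4 points around a team's true skill (an illustrative figure, not an estimate from our data), the spread of observed EPA/play shrinks with the square root of the number of plays:

```python
import numpy as np

rng = np.random.default_rng(42)
true_skill = 0.05   # assumed "true" EPA/play for an above-average offense
play_sd = 1.4       # assumed play-level spread (illustrative, not fitted)

sds = []
for n_plays in [10, 50, 200, 1000]:
    # 2,000 simulated samples of n_plays each; how much do their means vary?
    means = rng.normal(true_skill, play_sd, size=(2000, n_plays)).mean(axis=1)
    sds.append(means.std())
    print(f"n = {n_plays:4d}: sd of observed EPA/play = {sds[-1]:.3f}")
```

At 10 plays the observed average is mostly noise; by a few hundred plays it tracks true skill closely, which is why EPA/play should never be quoted from a handful of snaps.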
3. Scheme and Situation
EPA doesn't fully account for:
- Offensive scheme complexity
- Defensive alignments and disguises
- Intentional conservative play-calling
Solution: Perform situational adjustments and context-aware analysis.
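One simple context-aware adjustment is opponent adjustment: compare each play's EPA to what the average offense earns against that same defense. A minimal sketch on invented numbers (the team abbreviations and values are placeholders):

```python
import pandas as pd

# Toy plays; every number is invented for illustration
plays = pd.DataFrame({
    "posteam": ["KC", "KC", "BUF", "BUF"],
    "defteam": ["DEN", "NYJ", "DEN", "NYJ"],
    "epa":     [0.20, 0.05, 0.30, -0.10],
})

# Baseline: the EPA an average offense earns against each defense
baseline = plays.groupby("defteam")["epa"].transform("mean")

# Opponent-adjusted EPA: raw EPA minus what that defense typically allows
plays["adj_epa"] = plays["epa"] - baseline
print(plays.groupby("posteam")["adj_epa"].mean())
```

The same idea underlies schedule-adjusted team ratings: raw EPA flatters offenses that faced weak defenses, and the subtraction removes that distortion.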
4. Terminal Value
Basic EP models don't account for:
- Kneel downs at end of game
- Time management considerations
- Two-minute drill scenarios
Solution: Use Win Probability Added (WPA) for end-game situations.
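In practice, analysts simply filter such plays out before aggregating. The sketch below assumes nflfastR-style `qb_kneel` and `wp` (pre-snap win probability) columns on invented data; the 5%/95% win-probability cutoffs for garbage time are a common convention, not a rule:

```python
import pandas as pd

# Toy plays using nflfastR-style column names; values invented
pbp = pd.DataFrame({
    "epa":      [0.3, -1.2, 0.1, 0.5],
    "qb_kneel": [0, 0, 1, 0],
    "wp":       [0.55, 0.40, 0.98, 0.99],  # pre-snap offense win probability
})

# Drop kneeldowns and garbage time (wp outside 5%-95%) before aggregating
clean = pbp.query("qb_kneel == 0 and 0.05 <= wp <= 0.95")
print(f"kept {len(clean)} of {len(pbp)} plays, "
      f"EPA/play = {clean['epa'].mean():+.3f}")
```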
When NOT to Use EPA
1. **Small samples**: EPA is noisy at the play level; you need 50+ plays for stability
2. **Kneeldowns**: Filter out or adjust for end-game kneeldowns
3. **Special teams**: Use specialized metrics for kicks, punts, and returns
4. **Individual player credit**: EPA alone doesn't separate QB/RB/WR contributions
5. **Draft evaluation**: Combine with college EPA, athletic testing, and scouting
Strategic Applications of EPA
Fourth Down Decision-Making
EPA is fundamental to fourth-down analytics. The decision framework:
Go for it when: $\text{EP}_{\text{go}} > \max(\text{EP}_{\text{punt}}, \text{EP}_{\text{FG}})$
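To make the inequality concrete, plug in numbers for a hypothetical 4th-and-2 at the opponent's 38-yard line. Every input below is an illustrative assumption rather than output from the EP models built earlier:

```python
# 4th-and-2 at the opponent 38: go for it, attempt a long FG, or punt?
# Every input below is an illustrative assumption, not a fitted value.
p_convert = 0.55       # assumed conversion probability on 4th-and-2
ep_if_convert = 3.2    # assumed EP with a fresh set of downs near the 35
ep_if_fail = -1.5      # assumed EP after handing the opponent the ball here
p_fg = 0.45            # assumed make probability on a ~56-yard attempt
ep_punt = -0.2         # assumed EP after a typical punt from this spot

ep_go = p_convert * ep_if_convert + (1 - p_convert) * ep_if_fail
# Simplification: a missed FG is treated like a failed conversion
ep_fg = p_fg * 3 + (1 - p_fg) * ep_if_fail

print(f"EP(go) = {ep_go:+.3f}, EP(FG) = {ep_fg:+.3f}, EP(punt) = {ep_punt:+.3f}")
```

A full treatment would also net out the opponent's expected points after the kickoff that follows a made field goal, but the structure of the calculation is the same: compute the expected value of each option and take the maximum.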
#| label: fourth-down-analysis-r
#| message: false
#| warning: false
# Analyze 4th down decisions
fourth_down <- pbp_2023 %>%
  filter(down == 4, play_type %in% c("pass", "run", "punt", "field_goal")) %>%
  mutate(
    decision = case_when(
      play_type %in% c("pass", "run") ~ "Go for it",
      play_type == "field_goal" ~ "Field Goal",
      play_type == "punt" ~ "Punt"
    ),
    ydstogo_bin = cut(ydstogo,
                      breaks = c(0, 2, 5, 10, 100),
                      labels = c("1-2", "3-5", "6-10", "11+"))
  ) %>%
  group_by(yardline_100, ydstogo_bin, decision) %>%
  summarise(
    attempts = n(),
    mean_epa = mean(epa, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(attempts >= 5)
# Show mean EPA by decision type in opponent territory
# (compute the weighted mean before overwriting `attempts` with its sum,
# otherwise weighted.mean() receives a length-1 weight vector and errors)
fourth_down %>%
  filter(yardline_100 <= 40, ydstogo_bin %in% c("1-2", "3-5")) %>%
  group_by(ydstogo_bin, decision) %>%
  summarise(
    mean_epa = weighted.mean(mean_epa, attempts),
    attempts = sum(attempts),
    .groups = "drop"
  ) %>%
  pivot_wider(names_from = decision, values_from = c(attempts, mean_epa)) %>%
  gt() %>%
  tab_header(
    title = "4th Down EPA by Decision",
    subtitle = "Opponent Territory (inside 40), 2023 Season"
  ) %>%
  fmt_number(
    columns = starts_with("mean"),
    decimals = 3
  )
#| label: fourth-down-analysis-py
#| message: false
#| warning: false
# Analyze 4th down decisions
fourth_down = pbp_2023_py.query(
    "down == 4 & play_type.isin(['pass', 'run', 'punt', 'field_goal'])"
).copy()
def categorize_decision(play_type):
    if play_type in ['pass', 'run']:
        return "Go for it"
    elif play_type == 'field_goal':
        return "Field Goal"
    else:
        return "Punt"
fourth_down['decision'] = fourth_down['play_type'].apply(categorize_decision)
fourth_down['ydstogo_bin'] = pd.cut(
    fourth_down['ydstogo'],
    bins=[0, 2, 5, 10, 100],
    labels=["1-2", "3-5", "6-10", "11+"]
)
fourth_summary = (fourth_down
    # observed=True drops empty combinations of the categorical ydstogo_bin
    .groupby(['yardline_100', 'ydstogo_bin', 'decision'], observed=True)
    .agg(
        attempts=('epa', 'count'),
        mean_epa=('epa', 'mean')
    )
    .reset_index()
    .query('attempts >= 5')
)
# Show mean EPA by decision type in opponent territory
opp_territory = (fourth_summary
    .query("yardline_100 <= 40 & ydstogo_bin.isin(['1-2', '3-5'])")
    .groupby(['ydstogo_bin', 'decision'], observed=True)
    .apply(lambda x: pd.Series({
        'attempts': x['attempts'].sum(),
        'mean_epa': np.average(x['mean_epa'], weights=x['attempts'])
    }))
    .reset_index()
)
print("\n4th Down EPA by Decision")
print("Opponent Territory (inside 40), 2023 Season")
print("=" * 60)
for dist in ["1-2", "3-5"]:
    print(f"\n{dist} yards to go:")
    subset = opp_territory[opp_territory['ydstogo_bin'] == dist]
    for _, row in subset.iterrows():
        print(f"  {row['decision']:15s}: {row['attempts']:3.0f} attempts, "
              f"EPA = {row['mean_epa']:+.3f}")
Play-Calling Optimization
EPA can identify optimal play-calling strategies:
#| label: play-calling-optimization-r
#| message: false
#| warning: false
# Calculate pass rate and EPA by situation
playcalling <- pbp_2023 %>%
  filter(
    !is.na(epa),
    play_type %in% c("pass", "run"),
    down %in% c(1, 2)
  ) %>%
  mutate(
    ydstogo_group = case_when(
      ydstogo <= 3 ~ "Short",
      ydstogo <= 7 ~ "Medium",
      TRUE ~ "Long"
    )
  ) %>%
  group_by(posteam, down, ydstogo_group) %>%
  summarise(
    plays = n(),
    pass_rate = mean(play_type == "pass"),
    mean_epa = mean(epa),
    pass_epa = mean(epa[play_type == "pass"]),
    run_epa = mean(epa[play_type == "run"]),
    .groups = "drop"
  )
# Show 1st down efficiency
playcalling %>%
  filter(down == 1, ydstogo_group == "Medium") %>%
  mutate(epa_diff = pass_epa - run_epa) %>%
  arrange(desc(epa_diff)) %>%
  head(10) %>%
  gt() %>%
  cols_label(
    posteam = "Team",
    plays = "Plays",
    pass_rate = "Pass Rate",
    pass_epa = "Pass EPA",
    run_epa = "Run EPA",
    epa_diff = "EPA Diff"
  ) %>%
  fmt_percent(
    columns = pass_rate,
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(pass_epa, run_epa, epa_diff, mean_epa),
    decimals = 3
  ) %>%
  cols_hide(c(down, ydstogo_group)) %>%
  tab_header(
    title = "1st Down Pass vs Run Efficiency",
    subtitle = "Medium distance (4-7 yards), 2023 Season"
  )
#| label: play-calling-optimization-py
#| message: false
#| warning: false
# Calculate pass rate and EPA by situation
pbp_playcalling = pbp_2023_py.query(
    "epa.notna() & play_type.isin(['pass', 'run']) & down.isin([1, 2])"
).copy()
def ydstogo_group(yards):
    if yards <= 3:
        return "Short"
    elif yards <= 7:
        return "Medium"
    else:
        return "Long"
pbp_playcalling['ydstogo_group'] = pbp_playcalling['ydstogo'].apply(ydstogo_group)
playcalling = (pbp_playcalling
    .groupby(['posteam', 'down', 'ydstogo_group'])
    .apply(lambda x: pd.Series({
        'plays': len(x),
        'pass_rate': (x['play_type'] == 'pass').mean(),
        'mean_epa': x['epa'].mean(),
        'pass_epa': x.loc[x['play_type'] == 'pass', 'epa'].mean(),
        'run_epa': x.loc[x['play_type'] == 'run', 'epa'].mean()
    }))
    .reset_index()
)
# Show 1st down efficiency
first_down_medium = (playcalling
    .query("down == 1 & ydstogo_group == 'Medium'")
    .assign(epa_diff=lambda x: x['pass_epa'] - x['run_epa'])
    .sort_values('epa_diff', ascending=False)
)
print("\n1st Down Pass vs Run Efficiency")
print("Medium distance (4-7 yards), 2023 Season")
print("=" * 90)
print(first_down_medium[['posteam', 'plays', 'pass_rate', 'pass_epa',
                         'run_epa', 'epa_diff']].head(10).to_string(index=False))
Summary
Expected Points Added (EPA) is the cornerstone metric of modern football analytics. In this chapter, we:
- Built EP models from scratch using binning, GAM, and gradient boosting approaches
- Visualized the EP curve showing how expected points vary by field position
- Calculated EPA for plays and understood its components
- Analyzed EPA distributions by play type, down, distance, and field position
- Examined contextual factors including score differential and garbage time
- Demonstrated EPA's stability and strong correlation with winning
- Compared EPA to traditional metrics showing its superior predictive power
- Applied EPA to strategic decisions including fourth downs and play-calling
- Recognized EPA limitations and when to use complementary metrics
EPA provides a unified framework for evaluating football performance that:
- Accounts for context: Down, distance, field position matter
- Measures true value: Points, not yards
- Predicts success: High correlation with wins and future performance
- Enables comparison: Apples-to-apples across situations
- Guides decisions: Optimal fourth-down strategy, play-calling, personnel
Key Takeaways
EPA Best Practices
1. **Use sufficient sample sizes**: 50+ plays minimum, ideally 200+
2. **Filter garbage time**: Remove plays when the game is decided
3. **Consider context**: Score, time, and opponent strength matter
4. **Combine with other metrics**: Use Success Rate, CPOE, and yards for a complete picture
5. **Understand limitations**: EPA doesn't separate individual contributions
6. **Update models regularly**: EP models should be retrained with recent data
7. **Visualize distributions**: Don't just report means; show full distributions
Exercises
Conceptual Questions
1. EP Model Comparison: Why does the gradient boosting model outperform simple binning? What are the tradeoffs?
2. EPA Interpretation: A team averages +0.15 EPA per play. Over a 65-play game, how many more points would we expect them to score than an average team?
3. Context Effects: Why does pass EPA vary more by score differential than run EPA? What does this tell us about game theory?
4. Stability: EPA has a year-to-year correlation of ~0.45. Is this high or low? How does it compare to win percentage?
Coding Exercises
Exercise 1: Build a Custom EP Model
Create your own EP model with these requirements:
a) Include additional features: quarter, score differential, time remaining
b) Use cross-validation to select optimal hyperparameters
c) Compare in-sample and out-of-sample performance
d) Plot feature importance
**Bonus**: Build separate models for different field zones (own territory, midfield, red zone)
Exercise 2: Team EPA Profiles
For each team in 2023, create a comprehensive EPA profile:
a) Calculate EPA by down (1st, 2nd, 3rd)
b) Calculate EPA by distance (short, medium, long)
c) Calculate EPA by field position zone
d) Create a radar chart or heatmap visualizing the profile
e) Identify each team's strengths and weaknesses
**Bonus**: Cluster teams based on their EPA profiles
Exercise 3: EPA-Based Play Recommendations
Build a recommendation system that suggests optimal play types:
a) For each game situation (down, distance, field position), calculate historical EPA for pass vs run
b) Account for team strength (offensive and defensive EPA)
c) Adjust for score differential and time remaining
d) Output a recommendation: pass, run, or neutral
e) Validate recommendations against actual play-calling and outcomes
**Bonus**: Include league-wide pass rate and adjust for game theory (defenses expect pass in obvious passing situations)
Exercise 4: QB EPA Decomposition
Analyze quarterback EPA and attempt to decompose it:
a) Calculate QB EPA per play
b) Estimate "expected EPA" based on pass location and depth (using receiver position)
c) Calculate "EPA over expected" (similar to CPOE)
d) Compare rankings: total EPA vs EPA over expected
e) Identify QBs who outperform/underperform expected EPA
**Bonus**: Analyze how EPA over expected correlates with QB pressure and time to throw
Exercise 5: Predictive EPA Model
Build a model that predicts future EPA:
a) Use team EPA from weeks 1-8 to predict weeks 9-17
b) Include additional features: schedule strength, injury status, weather
c) Calculate prediction error (MAE, RMSE)
d) Compare to naive baselines (e.g., league average, prior year)
e) Identify which teams exceeded or fell short of predictions
**Bonus**: Use this model to predict playoff outcomes
Further Reading
Academic Papers
- Yurko, R., Ventura, S., & Horowitz, M. (2019). "nflWAR: a reproducible method for offensive player evaluation in football." Journal of Quantitative Analysis in Sports, 15(3), 163-183.
- Lopez, M. J., Matthews, G. J., & Baumer, B. S. (2018). "How often does the best team win? A unified approach to understanding randomness in North American sport." The Annals of Applied Statistics, 12(4), 2483-2516.
- Burke, B. (2009). "Expected Points (EP) and Expected Points Added (EPA)." Advanced NFL Stats.
Books and Resources
- Football Outsiders. "Football Outsiders Almanac" (annual). Covers DVOA, DYAR, and other advanced metrics.
- Baldwin, B. & Yurko, R. (2023). "nflfastR documentation." Comprehensive guide to EPA and related metrics.
- The Athletic and ESPN Analytics. Regular EPA-based analysis and team rankings.
Online Resources
- nflfastR GitHub: https://github.com/nflverse/nflfastR
- Open Source Football: https://www.opensourcefootball.com/
- nflverse documentation: https://nflverse.nflverse.com/
References
:::