Learning Objectives

By the end of this chapter, you will be able to:
- Understand the mathematical foundation of Expected Points (EP) and Expected Points Added (EPA)
- Build expected points models from scratch using historical data
- Calculate and interpret EPA in different game contexts
- Recognize EPA limitations and apply appropriate adjustments
- Use EPA for strategic decision-making and team evaluation
- Compare EPA with traditional metrics and understand when to use each
- Implement EPA-based analysis in both R and Python
Introduction
In Chapter 1, we introduced Expected Points Added (EPA) as one of the foundational metrics in modern football analytics. We learned that EPA measures the value of each play by comparing the expected points before and after the play. This chapter takes a deep dive into EPA, exploring its mathematical foundations, practical implementation, and strategic applications in ways that will fundamentally change how you think about football performance evaluation.
Expected Points Added represents perhaps the single most important conceptual breakthrough in football analytics over the past two decades. Before EPA, analysts relied on traditional statistics—total yards, completion percentage, yards per carry—that failed to capture the situational context that makes football unique among major sports. A 5-yard gain means something entirely different on 3rd-and-4 (a successful conversion that extends the drive) versus 3rd-and-20 (an unsuccessful play that results in a punt). EPA elegantly solves this problem by measuring value in terms of scoring expectation rather than raw yardage.
The power of EPA lies in its ability to reduce the complexity of football into a single, context-aware metric. Every play in football occurs within a specific game state defined by down, distance, field position, score, and time remaining. Traditional statistics treat all yards equally, but we know intuitively that's wrong—a 1-yard touchdown plunge from the goal line is more valuable than a 50-yard gain that still leaves the offense at midfield on 4th-and-1. EPA captures these nuances by asking a simple but profound question: "How much closer to scoring (or further from allowing the opponent to score) did this play move the offense?"
This chapter will guide you through the complete EPA framework, from theoretical foundations to practical implementation. We'll build expected points models from scratch, learning not just how to use EPA but why it works and when it's most applicable. You'll learn to calculate EPA for any game situation, analyze EPA patterns across different contexts, and apply EPA insights to strategic decision-making. By the end, you'll understand why NFL teams now consider EPA their primary metric for evaluating offensive and defensive performance, why it predicts future success better than traditional stats, and how to implement EPA-based analysis in your own work.
Understanding EPA deeply is essential for anyone serious about football analytics. It forms the foundation for player evaluation (which quarterbacks add the most value?), play-calling optimization (which play types work best in which situations?), fourth-down decision models (when should teams attempt conversions?), and countless other applications. The concepts and methods you'll learn in this chapter will reappear throughout the remainder of this textbook, as EPA serves as the analytical backbone for modern football analysis.
Why This Chapter Matters
EPA is the single most important metric in modern football analytics. It forms the foundation for:

- Player evaluation and grading
- Play-calling analysis and optimization
- Fourth-down decision-making
- Draft analysis and team building
- Coaching strategy and game planning

Review: Expected Points Framework
Before diving deep into the mathematics and implementation of EPA, let's review and expand upon the core concepts introduced in Chapter 1. This review will establish the conceptual foundation we'll build upon throughout this chapter, ensuring we have a shared understanding of what Expected Points represents and why it matters.
Expected Points (EP): The Foundation of Context-Aware Analysis
Expected Points represents the average number of points a team can expect to score on the current drive before the opponent scores, given the current game state defined by down, distance, and field position. This might sound simple, but it's a profound insight that fundamentally changed how we think about football value.
To understand Expected Points, imagine we collected data from 50,000 drives over the past decade and identified every instance where a team faced 1st-and-10 at their own 20-yard line. Across those thousands of drives, we track what happened next: How often did teams score touchdowns? Field goals? How often did their opponents score next? When we average the net points scored across all those drives, we get the Expected Points for that situation—approximately 0.4 points.
Let's examine some illustrative examples to build intuition about how Expected Points varies across different situations:
- 1st-and-10 at own 20-yard line: ~0.4 expected points
  - This is roughly the league average starting field position after a touchback
  - Teams in this situation score a touchdown before their opponent about 15% of the time (7 pts) and a field goal about 8% of the time (3 pts)
  - The opponent often scores next, dragging down the expectation
  - Net result: slightly positive scoring expectation
- 1st-and-10 at opponent 20-yard line: ~4.0 expected points
  - Often called the "red zone," this represents excellent field position
  - Teams score touchdowns about 35% of the time, field goals 40% of the time
  - Very unlikely that the opponent scores next from this position
  - This is nearly the value of a guaranteed field goal plus a coin flip for a touchdown
- 3rd-and-15 at own 30-yard line: ~0.2 expected points
  - Long third downs dramatically reduce scoring expectation
  - Conversion rate is low (~15-20%), meaning a punt is likely
  - Poor field position makes scoring difficult even after conversion
  - The opponent's good field position after the likely punt reduces net expectation
- 1st-and-goal at the 1-yard line: ~6.0 expected points
  - The goal line represents the highest expected points situation
  - Teams score touchdowns 85-90% of the time from this position
  - The remaining 10-15% usually results in field goals or turnovers
  - The average outcome is very close to an automatic touchdown
These values are empirically derived from analyzing hundreds of thousands of historical drives from NFL data. The key insight is that these values represent what actually happens on average, not theoretical calculations or subjective assessments. Expected Points is grounded in observed reality, which gives it predictive power and makes it useful for decision-making.
Notice the pattern: Expected Points increases as you move closer to your opponent's goal line, increases with more favorable downs (1st > 2nd > 3rd), and increases with shorter distances to the first down marker. These relationships aren't linear—moving from your own 20 to your own 40 is worth a different amount than moving from your opponent's 20 to the goal line. The EP framework captures these non-linear relationships that make football strategically rich.
Expected Points Added (EPA): Measuring Play Value
With Expected Points providing a valuation for every game state, Expected Points Added naturally follows as a measure of play value. EPA quantifies how much a single play changed the team's scoring expectation by comparing the Expected Points before and after the play. This simple difference tells us whether a play moved the team closer to scoring (positive EPA) or further away (negative EPA).
The mathematical formula for EPA is elegantly straightforward:
$$ \text{EPA} = \text{EP}_{\text{end}} - \text{EP}_{\text{start}} $$
Where:
- $\text{EP}_{\text{start}}$ is the expected points at the beginning of the play (based on down, distance, field position before the snap)
- $\text{EP}_{\text{end}}$ is the expected points at the end of the play (based on the resulting down, distance, field position after the play concludes)
This subtraction captures everything that happened during the play: yards gained or lost, down changes (conversion or loss of down), turnovers, penalties, and even immediate scores (touchdowns, field goals, safeties). Because EP already accounts for situational context, EPA automatically incorporates that context into play valuation.
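The formula translates directly into code. Below is a minimal Python sketch; the EP values passed in are illustrative approximations (a real workflow would obtain them from a fitted EP model):

```python
def epa(ep_start: float, ep_end: float) -> float:
    """EPA is simply the change in expected points over one play."""
    return ep_end - ep_start

# Illustrative values: a 15-yard completion that moves the offense
# from ~0.5 EP to ~1.2 EP gains about +0.7 expected points.
gain = epa(ep_start=0.5, ep_end=1.2)

# An incompletion on 3rd-and-8 that forces a punt can swing EP
# from ~1.3 down to ~-0.8, costing about -2.1 expected points.
loss = epa(ep_start=1.3, ep_end=-0.8)

print(round(gain, 1), round(loss, 1))   # 0.7 -2.1
```

Note that the same one-line subtraction captures yardage, down changes, and turnovers alike, because all of that context is already baked into the two EP values.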
Example: EPA Calculation for a Successful Pass Play
Let's walk through a detailed example to illustrate how EPA works in practice.

**Starting Situation**: 1st-and-10 at own 25-yard line

- Field position: 75 yards from opponent's end zone
- Expected Points: ~0.5 points
- This represents slightly better than touchback field position

**The Play**: Quarterback completes a 15-yard pass to the slot receiver

**Ending Situation**: 1st-and-10 at own 40-yard line

- Field position: 60 yards from opponent's end zone
- Expected Points: ~1.2 points
- The offense has moved into more favorable territory

**EPA Calculation**:

$$\text{EPA} = 1.2 - 0.5 = +0.7 \text{ expected points}$$

**Interpretation**: This play added 0.7 expected points to the offense's scoring expectation. To put this in perspective, over a typical 65-play game, accumulating +0.7 EPA per play would result in approximately 45.5 expected points—an exceptionally high-scoring performance. This single play was successful because it meaningfully improved the team's position to score.

**Why This Play Has Positive EPA**:

1. **Yardage gain**: 15 yards of field position improvement
2. **Maintained favorable down**: Stayed on 1st down (the gain exceeded the distance, so the conversion was automatic)
3. **Moved into better EP territory**: The EP curve is non-linear, so moving from the 25 to the 40 provides significant value
4. **No negative events**: No turnover, sack, penalty, or loss of down

Example: EPA Calculation for an Unsuccessful Pass Play
Now let's examine a contrasting example where a play generates negative EPA.

**Starting Situation**: 3rd-and-8 at own 42-yard line

- Field position: 58 yards from opponent's end zone
- Expected Points: ~1.3 points
- This is a moderately difficult conversion situation

**The Play**: Quarterback throws incomplete pass (defensed by cornerback)

**Ending Situation**: 4th-and-8 at own 42-yard line (team will punt)

- Field position: Unchanged at 58 yards from end zone
- Expected Points after punt: ~-0.8 points (opponent gets ball around midfield)
- The offense failed to convert and must surrender possession

**EPA Calculation**:

$$\text{EPA} = -0.8 - 1.3 = -2.1 \text{ expected points}$$

**Interpretation**: This play cost the offense 2.1 expected points. Despite gaining zero yards (incompletion), the play has substantial negative value because it forced the team to punt, giving the opponent favorable field position. This demonstrates EPA's power—it captures that failing to convert on 3rd down has major consequences beyond the immediate yardage.

**Why This Play Has Negative EPA**:

1. **Loss of down**: Moved from 3rd to 4th down without converting
2. **Change of possession**: Team must punt, giving opponent the ball
3. **Opponent field position**: Punt likely results in opponent starting around midfield
4. **Lost opportunity**: The failure to convert ended the drive prematurely

This example illustrates why completion percentage, while informative, misses critical context. An incomplete pass can be mildly negative (on 1st down) or severely negative (on 3rd down), and EPA captures this distinction automatically.

The Power of EPA's Context Awareness:
What makes EPA revolutionary is its automatic incorporation of situational context. Consider three hypothetical 5-yard runs:
- 1st-and-10 from own 30: ~+0.3 EPA (modest gain, maintains manageable distance)
- 3rd-and-3 from own 30: ~+1.8 EPA (conversion! Drive extends with fresh downs)
- 3rd-and-8 from own 30: ~-1.5 EPA (non-conversion, leads to punt)
All three plays gain the same 5 yards, yet EPA reveals their vastly different values. Traditional statistics (yards per carry: 5.0 for all three) completely miss this. EPA's context-awareness makes it far more useful for evaluating play quality, player performance, and strategic decisions.
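The three 5-yard runs can be reproduced with a toy EP lookup table. The table values below are invented purely for illustration, chosen so the resulting EPAs roughly match the approximate figures quoted above; a real model would estimate every entry from historical data:

```python
# Toy EP lookup: (down, yards_to_go, yardline_100) -> expected points.
# All values are illustrative stand-ins, NOT fitted model output.
EP = {
    (1, 10, 70): 0.9,   # 1st-and-10 at own 30
    (2,  5, 65): 1.2,   # 2nd-and-5 at own 35
    (3,  3, 70): 1.0,   # 3rd-and-3 at own 30
    (1, 10, 65): 2.8,   # 1st-and-10 at own 35 (fresh set of downs)
    (3,  8, 70): 0.8,   # 3rd-and-8 at own 30
    (4,  3, 65): -0.7,  # 4th-and-3 at own 35 (punt coming)
}

def play_epa(start, end):
    """EPA of a play given its start and end game states."""
    return EP[end] - EP[start]

# The same 5-yard gain in three different contexts:
print(play_epa((1, 10, 70), (2, 5, 65)))   # routine gain, ~+0.3
print(play_epa((3, 3, 70), (1, 10, 65)))   # conversion, ~+1.8
print(play_epa((3, 8, 70), (4, 3, 65)))    # failed 3rd down, ~-1.5
```

The yardage is identical in all three calls; only the game-state keys differ, and that alone flips the sign and magnitude of the play's value.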
Mathematical Foundation of Expected Points
Understanding the mathematical foundations of Expected Points helps us appreciate why it works, how to build robust models, and when its assumptions might break down. This section transitions from the intuitive understanding we've developed to the formal statistical framework underlying EP and EPA. Don't worry if some concepts feel technical—we'll provide intuitive explanations alongside the mathematics, and you don't need advanced statistical training to grasp the key ideas.
The Expected Points Model: A Conditional Expectation Framework
At its core, the expected points model is a conditional expectation problem from probability theory. We're trying to estimate the expected value of a random variable (points scored) conditional on known information about the current state (down, distance, field position). Mathematically, we write this as:
$$ E[\text{Points} \mid X] $$
Where:
- $E[\cdot]$ denotes the expected value (average) of a random quantity
- $\text{Points}$ is the net points the offense will score before the defense scores
- $X$ is a vector of state variables describing the current game situation
- The vertical bar $\mid$ means "conditional on" or "given that we know"
This notation says: "What is the expected (average) number of points we expect the offense to score on this drive, given what we know about the current situation?" The beauty of this formulation is that it explicitly acknowledges uncertainty (we don't know what will happen next) while providing the best estimate based on historical patterns.
Why Conditional Expectation Matters: The conditioning aspect is crucial. We're not asking "What's the average points scored per drive in the NFL?" (which would be around 1.8-2.0 points). We're asking "What's the average points scored from THIS SPECIFIC situation?" The answer depends entirely on the situation—hence "conditional" expectation. A team facing 1st-and-10 at the opponent's 5-yard line has very different expectations than a team facing 4th-and-15 at their own 10.
The Learning Problem: Building an EP model means learning the function $f$ such that $f(X) \approx E[\text{Points} \mid X]$ from historical data. We observe thousands of drives, record their starting situations ($X$) and outcomes ($\text{Points}$), then use statistical/machine learning methods to estimate the relationship. This is supervised learning: we have inputs (game situations), outputs (points scored), and we want to learn the mapping between them.
State Variables: Defining the Game Situation
The accuracy and usefulness of an EP model depends critically on which state variables we include. More variables generally improve model accuracy but require more data and computational complexity. The art of EP modeling involves balancing predictive power against simplicity and interpretability.
The core state variables for a basic EP model include:
1. Field Position ($\text{yardline}$)
- Measured as distance from opponent's end zone (0-100 yards)
- Often represented as yardline_100 in nflfastR data
- Most important predictor of scoring expectation
- Relationship is non-linear: gaining 10 yards at midfield has different value than gaining 10 yards in the red zone
- Example values: Own 20 = 80 yards from goal; Opponent 20 = 20 yards from goal
2. Down ($\text{down}$)
- Categorical variable: 1st, 2nd, 3rd, or 4th down
- Captures how many attempts remain to gain first down
- First down is most valuable (three chances remaining)
- Fourth down usually results in punt/field goal attempt rather than offensive play
- Down interacts with distance—1st-and-15 is different from 1st-and-5
3. Distance to First Down ($\text{distance}$)
- Continuous variable: yards needed for first down (1-50+ yards)
- Also called ydstogo (yards to go) in most datasets
- Short distances favor offense (easier conversions)
- Long distances (10+) make conversions difficult and favor defense
- Interacts strongly with down: 3rd-and-1 vs. 3rd-and-15 are completely different situations
4. Score Differential ($\text{score\_diff}$) - Optional but increasingly common
- Points the offense is ahead (positive) or behind (negative)
- Affects strategy: teams protecting leads play conservatively, teams trailing play aggressively
- Particularly important late in games when teams may prioritize clock management over scoring
- Basic EP models often exclude this for simplicity; advanced models include it
5. Time Remaining ($\text{seconds}$) - Optional for advanced models
- Seconds remaining in half or game
- Interacts with score differential for strategy effects
- Early in games, less important; late in games, critically important
- Teams trailing with little time remaining take more risks
- Including time creates a time-varying EP model
6. Additional Variables - For specialized models
- Quarter: Some teams play differently in different quarters
- Timeouts Remaining: Affects clock management and two-minute drill effectiveness
- Weather Conditions: Wind, precipitation affect passing and kicking
- Home/Away: Home field advantage exists, though effect size is modest
- Team Strength: Adjusting for team quality improves predictions but complicates interpretation
For most applications, a model using field position, down, and distance provides excellent results. These three variables capture the vast majority of variation in scoring expectations. Adding score differential and time remaining improves accuracy for late-game situations but complicates the model and requires more data.
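A minimal representation of these state variables might look like the sketch below. The class name, field names, and validation are illustrative choices (the field names follow nflfastR's conventions), not part of any particular library:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class GameState:
    """Core (and optional) state variables for an EP model."""
    yardline_100: int                   # yards from opponent's end zone (1-99)
    down: int                           # 1, 2, 3, or 4
    ydstogo: int                        # yards needed for a first down
    score_diff: Optional[int] = None    # optional: offense lead (+) or deficit (-)
    seconds_left: Optional[int] = None  # optional: time remaining in the half

    def __post_init__(self):
        # Guard against impossible situations before they reach a model.
        assert 1 <= self.down <= 4, "down must be 1-4"
        assert 1 <= self.yardline_100 <= 99, "yardline_100 out of range"
        assert self.ydstogo >= 1, "ydstogo must be positive"

# 1st-and-10 at the offense's own 25 (75 yards from the goal line):
state = GameState(yardline_100=75, down=1, ydstogo=10)
```

Keeping the optional fields defaulted to `None` mirrors the simple-versus-complex model trade-off: a basic model ignores them, while an advanced model can require them.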
Model Complexity Trade-offs
**Simple Model** (Field Position + Down + Distance):

- **Pros**: Easy to build, interpret, and explain; stable estimates; works well most of the time
- **Cons**: Misses some strategic considerations; treats all game contexts equally
- **Best for**: Player evaluation, general performance analysis, most analytical applications

**Complex Model** (Simple Model + Score + Time + More):

- **Pros**: More accurate in specific situations; captures strategic behavior; better for decision-making
- **Cons**: Harder to build and interpret; requires more data; can overfit; less stable estimates
- **Best for**: In-game decision support, late-game strategy, fourth-down models, coaching applications

Most analysts start with simple models and add complexity only when necessary. The principle of parsimony (Occam's Razor) applies: prefer simpler models unless complexity demonstrably improves performance for your specific use case.

Drive Outcomes: Defining the Target Variable
The target variable in an EP model—what we're trying to predict—is the net points scored by the offense before the defense scores. This requires carefully defining all possible drive outcomes and their point values. The accounting must be precise because EP serves as the foundation for all subsequent EPA calculations.
Positive Outcomes (Offense scores):

- Touchdown: +7 points
  - We conventionally include the PAT (extra point) in touchdown value
  - Extra points succeed about 94% of the time (since the 2015 rule change lengthened the attempt), so the PAT's expected value is just under 1 point
  - Rounding the total to 7 points is standard practice in EP models
  - Some advanced models use 6.95 or calculate dynamic PAT values
- Field Goal: +3 points
  - Constant value regardless of distance
  - In reality, long field goals are less likely to succeed
  - But we're measuring points scored, not probability of attempting
  - If a drive results in a made field goal, it's worth exactly 3 points
- Two-Point Conversion: +8 points (for touchdowns with successful 2PC)
  - Rare enough that many models simplify by using 7 for all TDs
  - Sophisticated models track 2PC attempts and success separately

Negative Outcomes (Opponent scores next):

- Opponent Touchdown: -7 points
  - Occurs when the drive ends (turnover, turnover on downs, punt) and the opponent scores a TD before the offense scores again
  - Represents giving the opponent field position that leads to their scoring
  - The negative value reflects that the drive not only failed to score but enabled opponent scoring
- Opponent Field Goal: -3 points
  - Similar logic to opponent touchdown
  - Drive ends, opponent gets the ball, opponent kicks a field goal before the offense scores
- Safety (offense commits): -2 points
  - Rare but devastating outcome
  - Offense tackled in its own end zone
  - Most commonly occurs when the offense has terrible field position (own 1-5 yard line)

Neutral Outcomes (No score):

- No Score by Either Team: 0 points
  - Drive ends, opponent gets the ball, but neither team scores before the next change of possession
  - Most common outcome for drives starting from poor field position
  - Represents "trading possessions" without scoring
Why Net Points Matter: The EP framework measures net points scored on the current drive before the opponent scores. This is crucial—it's not just "did the offense score?" but "what was the net scoring outcome?" A drive that ends in a punt from midfield (opponent gets ball at own 20) has different implications than a drive that ends in a turnover at the opponent's 10 (opponent gets ball with excellent field position). The first might result in "no score by either team" (0 points), while the second might result in "opponent field goal" (-3 points).
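This accounting is easy to encode as a lookup. The outcome labels below are illustrative (nflfastR's `fixed_drive_result` uses its own category names), but the point values match the table above:

```python
# Net point value of each way a drive can resolve, per the accounting above.
# Keys are hypothetical labels, not nflfastR's exact category strings.
DRIVE_POINTS = {
    "touchdown":            7,   # includes the near-automatic PAT
    "field_goal":           3,
    "opponent_touchdown":  -7,   # opponent scores a TD before we score again
    "opponent_field_goal": -3,
    "safety":              -2,   # offense tackled in its own end zone
    "no_score":             0,   # possessions trade without points
}

def net_points(outcome: str) -> int:
    """Map a drive-outcome label to the EP model's target variable."""
    return DRIVE_POINTS[outcome]
```

Every play in a drive inherits that drive's net points as its training label, which is what lets a model learn how each game state relates to eventual scoring.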
Drive Outcome Examples
**Example 1: Straightforward Touchdown Drive**

- Start: 1st-and-10 at own 25
- Series of plays advances to opponent's 2-yard line
- Touchdown scored on goal line run
- **Outcome**: +7 points

**Example 2: Failed Drive Leading to Opponent Score**

- Start: 1st-and-10 at own 30
- Three-and-out after incomplete passes
- Punt gives opponent ball at their own 40
- Opponent drives and scores touchdown
- **Outcome**: -7 points (even though the offense never scored, the drive's net result is negative)

**Example 3: Back-and-Forth with No Scoring**

- Start: 1st-and-10 at own 20
- Drive advances to midfield, then stalls
- Punt gives opponent ball at their own 15
- Opponent's drive also fails, punts back
- Neither team scores before next change of possession
- **Outcome**: 0 points

Model Formulation: From Outcomes to Expected Points
With drive outcomes defined, we can now formulate the Expected Points model mathematically. The key insight is that EP is a weighted average of all possible outcomes, where weights are the probabilities of each outcome occurring.
$$ EP(y, d, D) = \sum_{k} P(\text{outcome}_k \mid y, d, D) \times \text{points}_k $$
Where:
- $EP(y, d, D)$ is the expected points for a given situation
- $y$ = yardline, measured as yards from opponent's end zone (0-100)
- $d$ = down (1, 2, 3, or 4)
- $D$ = distance to first down (1-50+ yards)
- $P(\text{outcome}_k \mid y, d, D)$ = probability of outcome $k$ occurring from this situation
- $\text{points}_k$ = point value of outcome $k$ (from table above)
- The sum $\sum_k$ is over all possible outcomes (TD, FG, safety, opponent TD, opponent FG, no score)
Interpreting the Formula: This equation says that Expected Points is calculated by:
1. Identifying all possible ways the drive could end (touchdown, field goal, etc.)
2. Estimating the probability of each outcome from this specific situation
3. Multiplying each outcome's point value by its probability
4. Summing across all outcomes to get the expected value
Example Calculation: Suppose from 1st-and-10 at opponent's 25-yard line, historical data shows:
- Touchdown: 35% probability × 7 points = 2.45 expected points
- Field goal: 40% probability × 3 points = 1.20 expected points
- Opponent TD: 5% probability × -7 points = -0.35 expected points
- Opponent FG: 8% probability × -3 points = -0.24 expected points
- No score: 12% probability × 0 points = 0.00 expected points
Sum = 2.45 + 1.20 - 0.35 - 0.24 + 0.00 = 3.06 expected points
This calculation shows why field position near the goal line is valuable: the high probability of scoring (75% combined TD/FG rate) outweighs the low probability of giving up points.
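The calculation above is just a weighted sum of outcome probabilities and point values. A quick sketch, using the illustrative probabilities from the worked example:

```python
# Illustrative outcome probabilities for 1st-and-10 at the opponent's 25,
# paired with each outcome's point value (probability, points).
outcomes = {
    "touchdown":   (0.35,  7),
    "field_goal":  (0.40,  3),
    "opponent_td": (0.05, -7),
    "opponent_fg": (0.08, -3),
    "no_score":    (0.12,  0),
}

# EP is the probability-weighted average of the point values.
ep = sum(prob * points for prob, points in outcomes.values())
print(f"EP = {ep:.2f}")   # EP = 3.06
```

Swapping in probability estimates for any other situation gives that situation's EP, which is exactly what a fitted model does for every possible game state at once.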
Statistical Methods for Building EP Models
The formula above tells us what we want to calculate, but not how to calculate it from data. We need statistical methods to learn the relationship between game situations $(y, d, D)$ and expected points. Multiple approaches exist, each with different trade-offs between accuracy, interpretability, computational cost, and data requirements.
1. Binning and Averaging (Simplest approach)
- Method: Group similar situations into bins, calculate average points for each bin
- Example: All "1st-and-10 from yards 60-65" plays go in one bin
- Pros: Extremely simple; easy to understand and explain; no statistical software needed
- Cons: Requires arbitrary bin boundaries; doesn't smooth across boundaries; needs lots of data for rare situations
- When to use: Quick exploratory analysis; teaching/explaining concepts; situations with lots of data
2. Generalized Additive Models (GAM) (Flexible smoothing)
- Method: Fits smooth curves for continuous variables (yardline, distance) and factors for categorical ones (down)
- Example: Uses splines to create smooth EP curve as yardline changes
- Pros: Smooth predictions; interpretable; handles non-linearities well; built-in uncertainty estimates
- Cons: Limited ability to capture complex interactions; slower to fit than simple models
- When to use: When interpretability matters; when visualizing relationships; academic research
3. Random Forests (Ensemble tree method)
- Method: Builds many decision trees, each on random subset of data, averages predictions
- Example: Tree 1 might split on "yardline < 50?", Tree 2 on "distance < 5?", etc.
- Pros: Captures complex interactions automatically; robust to outliers; handles categorical and continuous variables easily
- Cons: "Black box" (hard to interpret); can overfit; slower prediction than simpler models
- When to use: When predictive accuracy is paramount; when interactions between variables are important
4. Gradient Boosting (Industry standard)
- Method: Builds trees sequentially, each correcting errors from previous trees
- Example: XGBoost, LightGBM, CatBoost implementations
- Pros: Highest accuracy in practice; handles missing data; fast prediction; feature importance available
- Cons: Requires careful tuning; can overfit; computationally expensive to train; less interpretable
- When to use: Production models; when maximum accuracy needed; when you have engineering resources
5. Neural Networks (Maximum flexibility)
- Method: Learns complex non-linear functions through layers of interconnected nodes
- Example: Deep learning models with multiple hidden layers
- Pros: Can learn any function given enough data; extremely flexible; state-of-the-art for complex problems
- Cons: Requires huge amounts of data; computationally expensive; complete black box; easy to overfit; hard to tune
- When to use: When you have massive datasets (10+ seasons); when traditional methods plateau; research applications
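To make approach 1 concrete, the sketch below bins a handful of hand-made plays by down, distance, and a 5-yard field-position bucket, then averages the drive points within each bin. The play records are invented for illustration; a real model would use hundreds of thousands of rows:

```python
from collections import defaultdict

# Each record: (down, ydstogo, yardline_100, net points scored on the drive).
# These values are fabricated purely to demonstrate the mechanics.
plays = [
    (1, 10, 80, 0), (1, 10, 82, 3), (1, 10, 81, -3),
    (1, 10, 62, 3), (1, 10, 60, 7), (1, 10, 64, 0),
]

def bin_key(down, ydstogo, yardline_100, width=5):
    """Group plays by down, distance, and a 5-yard field-position bin."""
    return (down, ydstogo, (yardline_100 // width) * width)

totals = defaultdict(lambda: [0.0, 0])   # bin -> [sum of points, play count]
for down, togo, yardline, points in plays:
    acc = totals[bin_key(down, togo, yardline)]
    acc[0] += points
    acc[1] += 1

# The EP estimate for a bin is simply its average observed net points.
ep_table = {key: total / count for key, (total, count) in totals.items()}
print(ep_table[(1, 10, 80)])   # 1st-and-10, yards 80-84 bin
```

The weaknesses discussed above are visible even here: estimates jump discontinuously at bin boundaries, and a bin with few plays produces a noisy average.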
Model Selection in Practice
**For learning and exploration**: Start with binning/averaging to build intuition. It's simple enough to calculate by hand and helps you understand the data patterns.

**For analysis and research**: Use GAM or random forests. GAMs provide interpretable smooth curves perfect for visualizing relationships. Random forests offer good accuracy without requiring extensive tuning.

**For production systems**: Use gradient boosting (XGBoost or LightGBM). This is what nflfastR and professional teams use because it provides the best accuracy-to-effort ratio. The slight interpretability loss is worth the performance gain.

**For cutting-edge research**: Consider neural networks only if you have extensive data, computational resources, and expertise. For most applications, gradient boosting is sufficient.

In this chapter, we'll implement binning, GAM, and gradient boosting approaches so you understand the trade-offs and can choose appropriately for your applications.

Building an Expected Points Model from Scratch
Now that we understand the theory, let's implement a complete Expected Points model from scratch using historical NFL data. This hands-on section will take you through every step of model building, from data preparation through model training to performance evaluation. By building models yourself, you'll gain deep insights into how EP works, what choices modelers make, and how to adapt models for your specific needs.
We'll implement three different modeling approaches—binning, GAM, and gradient boosting—so you can compare their performance and understand the trade-offs. Each approach offers different benefits: binning is simple and transparent, GAM provides smooth interpretable curves, and gradient boosting delivers maximum accuracy. By implementing all three, you'll develop intuition about which method works best for different applications.
This is where theory meets practice. You'll see that building an EP model isn't just about running a statistical procedure—it requires thoughtful decisions about data filtering, feature engineering, model selection, and validation. The code examples provide both R and Python implementations, with detailed comments explaining every step. Even if you're new to these programming languages, the extensive comments will help you understand the logic.
Data Preparation: Creating Training and Test Sets
Before building any model, we need high-quality data. For EP models, this means collecting historical play-by-play data that includes down, distance, field position, and drive outcomes. We'll use multiple seasons for training (2016-2022) to ensure robust estimates, and reserve the most recent season (2023) for testing to evaluate how well our model generalizes to new data.
Why Multiple Seasons? Using seven seasons (2016-2022) gives us approximately 300,000+ plays, providing sufficient data even for rare situations like 4th-and-long from deep in your own territory. More data leads to more stable estimates and better model performance. However, we don't want to go back too far—the NFL evolves over time, and play-calling from the 1990s looks very different from today's game.
Why Hold Out 2023? We never evaluate a model on the same data used to train it—that would be circular reasoning (of course a model performs well on data it was trained on!). Instead, we hold out the most recent season as a test set, simulating how the model would perform on future, unseen data. This tests whether our model has learned generalizable patterns rather than memorizing specific situations.
Data Filtering Decisions: Not every play in the database belongs in our model:
- We exclude plays without down/distance/field position (kickoffs, penalties without plays)
- We keep only regular season games (playoffs involve different team quality and strategy)
- We track drive outcomes to calculate the target variable (points scored)
These filtering decisions affect model performance and interpretation, so we'll document them clearly in the code.
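The Python side of this workflow mirrors the R code that follows. Here is a minimal sketch of the filtering and season-based split using plain dicts with nflfastR-style column names (`down`, `ydstogo`, `yardline_100`, `season_type`, `season`); in practice you would apply the same logic to a pandas DataFrame:

```python
def usable(play):
    """Keep only regular-season plays with complete state information."""
    return (
        play.get("down") is not None
        and play.get("ydstogo") is not None
        and play.get("yardline_100") is not None
        and play.get("season_type") == "REG"
    )

def split_by_season(plays, test_season=2023):
    """Train on earlier seasons; hold out the most recent as a test set."""
    clean = [p for p in plays if usable(p)]
    train = [p for p in clean if p["season"] < test_season]
    test = [p for p in clean if p["season"] == test_season]
    return train, test

# Tiny fabricated sample to show which rows survive each filter:
plays = [
    {"season": 2016, "season_type": "REG", "down": 1, "ydstogo": 10, "yardline_100": 75},
    {"season": 2023, "season_type": "REG", "down": 3, "ydstogo": 8, "yardline_100": 58},
    {"season": 2023, "season_type": "POST", "down": 2, "ydstogo": 5, "yardline_100": 40},  # playoff: excluded
    {"season": 2022, "season_type": "REG", "down": None, "ydstogo": 10, "yardline_100": 30},  # no down: excluded
]
train, test = split_by_season(plays)
print(len(train), len(test))   # 1 1
```

The key property to preserve, whatever tooling you use, is that no 2023 play influences any training statistic: the split happens before any modeling step.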
#| label: setup-r
#| message: false
#| warning: false
# Load required libraries for EPA model building
# These packages provide the tools we need for data manipulation,
# statistical modeling, and visualization
library(tidyverse) # Data manipulation (dplyr), visualization (ggplot2), etc.
library(nflfastR) # Access to NFL play-by-play data with advanced metrics
library(mgcv) # Generalized Additive Models for smooth non-linear relationships
library(xgboost) # Gradient boosting implementation - industry standard for EP models
library(gt) # Publication-quality tables for displaying results
library(nflplotR) # NFL team logos and styling for visualizations
# Set random seed for reproducibility
# This ensures that any random processes (like train/test splits,
# cross-validation folds, or model initialization) produce the same
# results each time we run the code
set.seed(42)
#| label: load-ep-data-r
#| message: false
#| warning: false
#| cache: true
# STEP 1: Load Historical Play-by-Play Data
# We'll use 2016-2023 seasons to build and test our EP model
# This gives us ~350,000 plays with sufficient data for even rare situations
seasons <- 2016:2023
pbp <- load_pbp(seasons)
# Display initial data summary
cat("=== RAW DATA LOADED ===\n")
cat("Total plays across all seasons:", format(nrow(pbp), big.mark = ","), "\n")
cat("Seasons included:", paste(unique(pbp$season), collapse = ", "), "\n\n")
# STEP 2: Filter and Prepare Data for EP Modeling
# We need to:
# - Keep only plays with valid down, distance, and field position
# - Exclude playoffs (different strategic considerations)
# - Calculate drive outcomes (what happened next)
# - Create target variable (net points scored)
ep_data <- pbp %>%
# Filter to plays with complete state information
filter(
!is.na(down), # Must have down information (1st, 2nd, 3rd, 4th)
!is.na(ydstogo), # Must have distance information (yards to go)
!is.na(yardline_100), # Must have field position (yards from opponent goal)
season_type == "REG" # Only regular season (playoffs are different strategically)
) %>%
# Group by game and drive to calculate drive-level outcomes
# Each drive is a sequence of plays ending in score or change of possession
group_by(game_id, drive) %>%
mutate(
# Get the final result of this drive
# fixed_drive_result is nflfastR's categorization of how the drive ended
drive_result = last(fixed_drive_result),
# Convert drive results to point values
# This is our target variable - what we're trying to predict
drive_points = case_when(
# Positive outcomes (offense scores)
drive_result == "Touchdown" ~ 7, # TD + expected PAT value
drive_result == "Field goal" ~ 3, # Successful FG
# Negative outcomes (defense scores next)
drive_result == "Opp touchdown" ~ -7, # Opponent scores TD after turnover/punt
drive_result == "Opp field goal" ~ -3, # Opponent scores FG after turnover/punt
drive_result == "Safety" ~ -2, # Offense commits safety
# Neutral outcome (no score)
TRUE ~ 0 # No score before next possession change
)
) %>%
ungroup() %>%
# Select only the variables we need for modeling
# Keeping dataset focused improves performance and clarity
select(
  season, game_id, play_id, drive, # Identifiers
  down, ydstogo, yardline_100, # State variables (predictors)
  qtr, half_seconds_remaining, # Additional context (optional)
  score_differential, # Score context (optional)
  drive_points, # Target variable
  ep, # nflfastR's EP prediction (for model comparison)
  epa # nflfastR's EPA (per-play value, for reference)
)
# STEP 3: Create Train/Test Split
# Training set: 2016-2022 (7 seasons for model building)
# Test set: 2023 (1 season for evaluation)
# This simulates how the model would perform on future, unseen data
train_data <- ep_data %>% filter(season <= 2022)
test_data <- ep_data %>% filter(season == 2023)
# Display data split summary
cat("=== DATA PREPARATION COMPLETE ===\n")
cat("Training observations:", format(nrow(train_data), big.mark = ","),
"(", round(nrow(train_data) / nrow(ep_data) * 100, 1), "% )\n")
cat("Testing observations:", format(nrow(test_data), big.mark = ","),
"(", round(nrow(test_data) / nrow(ep_data) * 100, 1), "% )\n")
cat("Training seasons:", paste(unique(train_data$season), collapse = ", "), "\n")
cat("Test season:", unique(test_data$season), "\n\n")
# Verify data quality
cat("=== DATA QUALITY CHECKS ===\n")
cat("Missing values in train_data:\n")
cat(" down:", sum(is.na(train_data$down)), "\n")
cat(" ydstogo:", sum(is.na(train_data$ydstogo)), "\n")
cat(" yardline_100:", sum(is.na(train_data$yardline_100)), "\n")
cat(" drive_points:", sum(is.na(train_data$drive_points)), "\n")
#| label: setup-py
#| message: false
#| warning: false
# Load required libraries for EPA model building
# Python equivalents to the R packages used above
import pandas as pd # Data manipulation (Python's equivalent to dplyr)
import numpy as np # Numerical computing and array operations
import nfl_data_py as nfl # NFL data access (Python equivalent of nflfastR)
import matplotlib.pyplot as plt # Basic plotting (equivalent to base R graphics)
import seaborn as sns # Statistical visualizations (similar to ggplot2)
from sklearn.ensemble import GradientBoostingRegressor # Gradient boosting models
from sklearn.metrics import mean_squared_error, mean_absolute_error # Model evaluation
import warnings
warnings.filterwarnings('ignore') # Suppress warnings for cleaner output
# Set random seed for reproducibility
# Ensures consistent results across runs for any random operations
np.random.seed(42)
#| label: load-ep-data-py
#| message: false
#| warning: false
#| cache: true
# STEP 1: Load Historical Play-by-Play Data
# Using same seasons as R version (2016-2023) for consistency
seasons = list(range(2016, 2024)) # Creates [2016, 2017, ..., 2023]
pbp = nfl.import_pbp_data(seasons)
# Display initial data summary
print("=== RAW DATA LOADED ===")
print(f"Total plays across all seasons: {len(pbp):,}")
print(f"Seasons included: {', '.join(map(str, sorted(pbp['season'].unique())))}\n")
# STEP 2: Filter and Prepare Data for EP Modeling
# Apply same filtering logic as R version
ep_data = pbp[
pbp['down'].notna() & # Must have down information
pbp['ydstogo'].notna() & # Must have distance information
pbp['yardline_100'].notna() & # Must have field position
(pbp['season_type'] == 'REG') # Only regular season games
].copy() # .copy() creates independent DataFrame, prevents SettingWithCopyWarning
# STEP 3: Calculate Drive Outcomes
# Define function to compute net points scored on each drive
def calculate_drive_points(group):
"""
Calculate the net points scored on a drive.
Parameters:
-----------
group : DataFrame
A group of plays from a single drive
Returns:
--------
int : Net points scored on this drive (positive if offense scores,
negative if opponent scores next, 0 if no score)
"""
# Get the final result of this drive
# .iloc[-1] accesses the last element (final play of drive)
drive_result = group['fixed_drive_result'].iloc[-1]
# Convert drive outcomes to point values (same logic as R version)
if drive_result == 'Touchdown':
return 7 # TD + PAT
elif drive_result == 'Field goal':
return 3 # Successful field goal
elif drive_result == 'Safety':
return -2 # Offense commits safety
elif drive_result == 'Opp touchdown':
return -7 # Opponent scores TD next
elif drive_result == 'Opp field goal':
return -3 # Opponent scores FG next
else:
return 0 # No score by either team
# Apply the function to each drive using groupby
# groupby.apply returns ONE value per drive, so we merge the per-drive
# result back onto every play of that drive via the drive identifiers
# (assigning the grouped result directly would not align with the
# play-level index)
drive_points = (
    ep_data.groupby(['game_id', 'drive'])
    .apply(calculate_drive_points)
    .rename('drive_points')
)
ep_data = ep_data.merge(drive_points, left_on=['game_id', 'drive'],
                        right_index=True, how='left')
# STEP 4: Select Relevant Columns
# Keep only variables needed for modeling to reduce memory usage
ep_data = ep_data[[
    'season', 'game_id', 'play_id', 'drive',  # Identifiers
    'down', 'ydstogo', 'yardline_100',        # State variables (predictors)
    'qtr', 'half_seconds_remaining',          # Additional context
    'score_differential',                     # Score context
    'drive_points',                           # Target variable
    'ep',                                     # nflfastR's EP prediction (for model comparison)
    'epa'                                     # nflfastR's EPA (per-play value, for reference)
]]
# STEP 5: Create Train/Test Split
# Same split as R version for direct comparability
train_data = ep_data[ep_data['season'] <= 2022] # 7 seasons for training
test_data = ep_data[ep_data['season'] == 2023] # 1 season for testing
# Display data split summary
print("=== DATA PREPARATION COMPLETE ===")
print(f"Training observations: {len(train_data):,} ({len(train_data)/len(ep_data)*100:.1f}%)")
print(f"Testing observations: {len(test_data):,} ({len(test_data)/len(ep_data)*100:.1f}%)")
print(f"Training seasons: {', '.join(map(str, sorted(train_data['season'].unique())))}")
print(f"Test season: {test_data['season'].unique()[0]}\n")
# Verify data quality
print("=== DATA QUALITY CHECKS ===")
print("Missing values in train_data:")
print(f" down: {train_data['down'].isna().sum()}")
print(f" ydstogo: {train_data['ydstogo'].isna().sum()}")
print(f" yardline_100: {train_data['yardline_100'].isna().sum()}")
print(f" drive_points: {train_data['drive_points'].isna().sum()}")
Method 1: Binning and Averaging
The simplest approach to building an Expected Points model is binning and averaging—grouping similar game situations together and calculating the average points scored for each group. While this method is less sophisticated than GAM or gradient boosting, it's an excellent starting point because it's transparent, easy to understand, and requires no advanced statistical knowledge or specialized software.
The Core Idea: Imagine creating a massive lookup table where each row represents a unique combination of down, distance, and field position, and the value is the average points scored historically from that situation. When you want to know the EP for a new play, you simply look up the matching situation in your table. If you face 1st-and-10 from your own 30, you find all historical plays with 1st-and-10 from yards 25-35 (a bin), calculate their average drive outcome, and use that as your EP estimate.
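The lookup-table idea can be sketched directly. All EP values below are hypothetical placeholders (not estimates from real data), and the bin scheme mirrors the one used later in this section: down, a distance bucket, and a 10-yard field-position bin.

```python
# Minimal sketch of an EP lookup table with invented values
ep_lookup = {
    # (down, distance bucket, yardline_100 // 10) -> historical average drive points
    (1, '10+', 7): 0.61,   # 1st-and-10, 70-79 yards from the opponent goal
    (1, '10+', 6): 1.05,   # 1st-and-10, 60-69 yards out
    (3, '2-3', 7): 0.12,   # 3rd-and-short from the same area
}

def lookup_ep(down, ydstogo, yardline_100):
    """Bin the situation, then read off the historical average for that bin."""
    if ydstogo == 1:
        dist = '1'
    elif ydstogo <= 3:
        dist = '2-3'
    elif ydstogo <= 6:
        dist = '4-6'
    elif ydstogo <= 9:
        dist = '7-9'
    else:
        dist = '10+'
    return ep_lookup.get((down, dist, yardline_100 // 10))

print(lookup_ep(1, 10, 75))   # 0.61 (1st-and-10 from the own 25)
```

Every prediction is a direct read from the table; there is no model to fit, only averages to maintain.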
Why Binning? In theory, we could create a separate estimate for every exact situation (1st-and-10 from the 30.5-yard line), but this creates two problems:
1. Data Sparsity: Many exact situations occur rarely or never, giving us no data or unreliable estimates
2. No Smoothing: We assume 1st-and-10 from the 30-yard line is fundamentally different from 1st-and-10 from the 31-yard line, when they're nearly identical
Binning solves both problems by grouping nearby situations together, increasing sample sizes and implicitly assuming similar situations have similar expected points.
The Trade-off: Binning introduces a classic bias-variance trade-off:
- Narrow bins (e.g., 5-yard field position increments): More precise, but less data per bin → higher variance, unstable estimates
- Wide bins (e.g., 20-yard field position increments): More stable, but less precise → higher bias, less discriminating
We'll use moderate bin widths that balance these concerns: 10-yard field position bins, categorical distance bins (1 yard, 2-3 yards, 4-6 yards, 7-9 yards, 10+ yards), and explicit down labels. This provides reasonable granularity while maintaining sufficient data per bin.
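The bin-width trade-off is easy to demonstrate on synthetic data. The linear "true EP" curve and noise level below are invented purely to make the effect visible; with these settings, the wide bins' bias dominates their variance advantage.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: a hypothetical linear "true EP" curve plus drive-outcome noise
yardline = rng.integers(1, 100, size=5000)        # yards from the opponent goal
true_ep = 6 - 0.07 * yardline                     # invented true relationship
points = true_ep + rng.normal(0, 3, size=5000)    # noisy observed outcomes

def binned_rmse(width):
    """Estimate EP by within-bin averaging; return (n_bins, RMSE vs true curve)."""
    bins = yardline // width
    bin_means = {b: points[bins == b].mean() for b in np.unique(bins)}
    preds = np.array([bin_means[b] for b in bins])
    return len(bin_means), float(np.sqrt(np.mean((preds - true_ep) ** 2)))

for width in (5, 20):
    n_bins, rmse = binned_rmse(width)
    print(f"bin width {width:2d}: {n_bins:2d} bins, RMSE vs true EP = {rmse:.3f}")
```

With less data, the comparison can flip: narrow bins' per-bin noise grows while the wide bins' bias stays fixed, which is exactly the balance the moderate bin widths above try to strike.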
Advantages of Binning:
- Extremely simple to implement and explain to non-technical audiences
- No assumptions about functional form (non-parametric)
- Fast to compute, even with massive datasets
- Easy to update as new data arrives (just add new observations to bins)
- Interpretable: every prediction comes directly from historical averages
Disadvantages of Binning:
- Requires arbitrary choices about bin boundaries (where to split field position?)
- Discontinuities at bin boundaries (going from 39 to 41 yards changes bins despite similar situations)
- Doesn't handle sparse situations gracefully (what if a bin has only 2 observations?)
- Can't capture subtle interactions between variables
- Less accurate than sophisticated methods
Despite these limitations, binning provides a solid baseline model that often performs surprisingly well. Professional teams used binning-based EP models before modern machine learning became standard. Even today, binning is useful for exploratory analysis, quick approximations, and sanity-checking more complex models.
#| label: binning-model-r
#| message: false
#| warning: false
# Create binned EP lookup table
ep_binned <- train_data %>%
mutate(
# Create bins for each variable
ydline_bin = cut(yardline_100,
breaks = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100),
include.lowest = TRUE,
labels = FALSE),
dist_bin = case_when(
ydstogo == 1 ~ "1",
ydstogo %in% 2:3 ~ "2-3",
ydstogo %in% 4:6 ~ "4-6",
ydstogo %in% 7:9 ~ "7-9",
TRUE ~ "10+"
)
) %>%
group_by(down, dist_bin, ydline_bin) %>%
summarise(
ep_binned = mean(drive_points),
n = n(),
.groups = "drop"
) %>%
filter(n >= 10) # Require minimum observations
# Apply to test data
test_predictions_binned <- test_data %>%
mutate(
ydline_bin = cut(yardline_100,
breaks = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100),
include.lowest = TRUE,
labels = FALSE),
dist_bin = case_when(
ydstogo == 1 ~ "1",
ydstogo %in% 2:3 ~ "2-3",
ydstogo %in% 4:6 ~ "4-6",
ydstogo %in% 7:9 ~ "7-9",
TRUE ~ "10+"
)
) %>%
left_join(ep_binned, by = c("down", "dist_bin", "ydline_bin")) %>%
  # Situations with no matching bin (rare combinations removed by the
  # n >= 10 rule) fall back to 0 EP; a production model would handle
  # these more gracefully
  replace_na(list(ep_binned = 0))
# Calculate model performance
rmse_binned <- sqrt(mean((test_predictions_binned$drive_points -
test_predictions_binned$ep_binned)^2, na.rm = TRUE))
cat("Binning Model RMSE:", round(rmse_binned, 3), "\n")
#| label: binning-model-py
#| message: false
#| warning: false
# Create binned EP lookup table
train_binned = train_data.copy()
train_binned['ydline_bin'] = pd.cut(
train_binned['yardline_100'],
bins=[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
labels=False
)
train_binned['dist_bin'] = pd.cut(
train_binned['ydstogo'],
bins=[0, 1, 3, 6, 9, 100],
labels=['1', '2-3', '4-6', '7-9', '10+']
)
ep_binned = (train_binned
.groupby(['down', 'dist_bin', 'ydline_bin'])
.agg(
ep_binned=('drive_points', 'mean'),
n=('drive_points', 'count')
)
.reset_index()
.query('n >= 10') # Require minimum observations
)
# Apply to test data
test_binned = test_data.copy()
test_binned['ydline_bin'] = pd.cut(
test_binned['yardline_100'],
bins=[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
labels=False
)
test_binned['dist_bin'] = pd.cut(
test_binned['ydstogo'],
bins=[0, 1, 3, 6, 9, 100],
labels=['1', '2-3', '4-6', '7-9', '10+']
)
test_predictions_binned = test_binned.merge(
    ep_binned[['down', 'dist_bin', 'ydline_bin', 'ep_binned']],
    on=['down', 'dist_bin', 'ydline_bin'],
    how='left'
).fillna({'ep_binned': 0})  # unmatched (rare) situations fall back to 0 EP
# Calculate model performance
rmse_binned = np.sqrt(mean_squared_error(
test_predictions_binned['drive_points'],
test_predictions_binned['ep_binned']
))
print(f"Binning Model RMSE: {rmse_binned:.3f}")
Method 2: Generalized Additive Model (GAM)
Binning's hard boundaries disappear if we instead fit smooth functions of the predictors. A generalized additive model estimates EP as a sum of smooth, non-linear terms:

$$ \widehat{EP} = \beta_0 + f_1(\text{yardline}) + f_2(\text{ydstogo}) + \gamma_{\text{down}} $$

where $f_1$ and $f_2$ are spline functions learned from the data and $\gamma_{\text{down}}$ is a separate effect for each down. This yields continuous predictions (no jumps at bin edges) while making no strong assumptions about the shape of each relationship.
#| label: gam-model-r
#| message: false
#| warning: false
#| cache: true
# Fit GAM model
gam_model <- gam(
drive_points ~ s(yardline_100, k = 20) +
s(ydstogo, k = 10) +
factor(down),
data = train_data,
family = gaussian()
)
# Generate predictions
test_data$ep_gam <- predict(gam_model, newdata = test_data)
# Calculate model performance
rmse_gam <- sqrt(mean((test_data$drive_points - test_data$ep_gam)^2,
na.rm = TRUE))
cat("GAM Model RMSE:", round(rmse_gam, 3), "\n")
# Model summary
summary(gam_model)
#| label: gam-model-py
#| message: false
#| warning: false
#| cache: true
from pygam import LinearGAM, s, f
# Prepare features
X_train = train_data[['yardline_100', 'ydstogo', 'down']].values
y_train = train_data['drive_points'].values
X_test = test_data[['yardline_100', 'ydstogo', 'down']].values
y_test = test_data['drive_points'].values
# Fit GAM model
gam_model = LinearGAM(s(0, n_splines=20) + s(1, n_splines=10) + f(2))
gam_model.fit(X_train, y_train)
# Generate predictions
test_data_py = test_data.copy()
test_data_py['ep_gam'] = gam_model.predict(X_test)
# Calculate model performance
rmse_gam = np.sqrt(mean_squared_error(y_test, test_data_py['ep_gam']))
print(f"GAM Model RMSE: {rmse_gam:.3f}")
print(f"\nModel Summary:")
print(f"Number of terms: {len(gam_model.terms)}")
print(f"Effective degrees of freedom: {gam_model.statistics_['edof']:.2f}")
Method 3: Gradient Boosting (Production Model)
Gradient boosting builds an ensemble of shallow decision trees sequentially, with each new tree fit to the residual errors of the trees before it. Because trees split on interactions automatically (short yardage matters differently near the goal line, for example), boosted models typically deliver the best accuracy and are the standard choice for production EP models, including nflfastR's.
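Before reaching for a tuned library, the boosting idea itself fits in a few lines. This is a conceptual sketch on synthetic data (the true curve and noise level are invented, and only field position is used as a feature); the production-style implementations follow below.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(1, 99, size=(2000, 1))            # field position only, for clarity
y = 6 - 0.07 * X[:, 0] + rng.normal(0, 2, 2000)   # hypothetical noisy outcomes

# Boosting: start from a constant prediction, then repeatedly fit a shallow
# tree to the current residuals and take a small step in that direction
pred = np.full_like(y, y.mean())
learning_rate = 0.1
for _ in range(50):
    residuals = y - pred                          # what the ensemble still misses
    tree = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
    pred += learning_rate * tree.predict(X)       # small step toward the residuals

baseline_rmse = float(np.sqrt(np.mean((y - y.mean()) ** 2)))
boosted_rmse = float(np.sqrt(np.mean((y - pred) ** 2)))
print(f"constant-mean RMSE: {baseline_rmse:.3f}, after 50 trees: {boosted_rmse:.3f}")
```

Libraries like xgboost add regularization, subsampling, and early stopping on top of this same loop.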
#| label: xgb-model-r
#| message: false
#| warning: false
#| cache: true
# Prepare data for XGBoost
train_matrix <- train_data %>%
select(yardline_100, ydstogo, down) %>%
as.matrix()
train_label <- train_data$drive_points
test_matrix <- test_data %>%
select(yardline_100, ydstogo, down) %>%
as.matrix()
# Create DMatrix objects
dtrain <- xgb.DMatrix(data = train_matrix, label = train_label)
dtest <- xgb.DMatrix(data = test_matrix, label = test_data$drive_points)
# Set parameters
params <- list(
objective = "reg:squarederror",
eta = 0.1,
max_depth = 6,
subsample = 0.8,
colsample_bytree = 0.8
)
# Train model with cross-validation
xgb_model <- xgb.train(
params = params,
data = dtrain,
nrounds = 200,
watchlist = list(train = dtrain, test = dtest),
early_stopping_rounds = 20,
verbose = 0
)
# Generate predictions
test_data$ep_xgb <- predict(xgb_model, dtest)
# Calculate model performance
rmse_xgb <- sqrt(mean((test_data$drive_points - test_data$ep_xgb)^2,
na.rm = TRUE))
cat("XGBoost Model RMSE:", round(rmse_xgb, 3), "\n")
cat("Best iteration:", xgb_model$best_iteration, "\n")
# Feature importance
importance <- xgb.importance(model = xgb_model)
print(importance)
#| label: xgb-model-py
#| message: false
#| warning: false
#| cache: true
from sklearn.ensemble import GradientBoostingRegressor
# Prepare features
X_train = train_data[['yardline_100', 'ydstogo', 'down']].values
y_train = train_data['drive_points'].values
X_test = test_data[['yardline_100', 'ydstogo', 'down']].values
y_test = test_data['drive_points'].values
# Train gradient boosting model
gb_model = GradientBoostingRegressor(
n_estimators=200,
learning_rate=0.1,
max_depth=6,
subsample=0.8,
random_state=42,
verbose=0
)
gb_model.fit(X_train, y_train)
# Generate predictions
test_data_py['ep_gb'] = gb_model.predict(X_test)
# Calculate model performance
rmse_gb = np.sqrt(mean_squared_error(y_test, test_data_py['ep_gb']))
print(f"Gradient Boosting Model RMSE: {rmse_gb:.3f}")
# Feature importance
feature_importance = pd.DataFrame({
'feature': ['yardline_100', 'ydstogo', 'down'],
'importance': gb_model.feature_importances_
}).sort_values('importance', ascending=False)
print("\nFeature Importance:")
print(feature_importance.to_string(index=False))
Model Comparison
Let's compare all three modeling approaches:
#| label: model-comparison-r
#| message: false
#| warning: false
# Create comparison table
model_comparison <- tibble(
  Model = c("Binning", "GAM", "XGBoost", "nflfastR EP"),
  RMSE = c(
    sqrt(mean((test_predictions_binned$drive_points -
                 test_predictions_binned$ep_binned)^2, na.rm = TRUE)),
    sqrt(mean((test_data$drive_points - test_data$ep_gam)^2, na.rm = TRUE)),
    sqrt(mean((test_data$drive_points - test_data$ep_xgb)^2, na.rm = TRUE)),
    # Compare against nflfastR's EP column; epa measures per-play CHANGE
    # in EP, so it is not a prediction of drive points
    sqrt(mean((test_data$drive_points - test_data$ep)^2, na.rm = TRUE))
  ),
  MAE = c(
    mean(abs(test_predictions_binned$drive_points -
               test_predictions_binned$ep_binned), na.rm = TRUE),
    mean(abs(test_data$drive_points - test_data$ep_gam), na.rm = TRUE),
    mean(abs(test_data$drive_points - test_data$ep_xgb), na.rm = TRUE),
    mean(abs(test_data$drive_points - test_data$ep), na.rm = TRUE)
  )
) %>%
mutate(
RMSE_pct = (RMSE / min(RMSE) - 1) * 100,
MAE_pct = (MAE / min(MAE) - 1) * 100
)
model_comparison %>%
gt() %>%
cols_label(
Model = "Model Type",
RMSE = "RMSE",
MAE = "MAE",
RMSE_pct = "RMSE vs Best (%)",
MAE_pct = "MAE vs Best (%)"
) %>%
fmt_number(
columns = c(RMSE, MAE),
decimals = 3
) %>%
fmt_number(
columns = c(RMSE_pct, MAE_pct),
decimals = 1
) %>%
tab_header(
title = "Expected Points Model Comparison",
subtitle = "2023 Season Test Set Performance"
)
#| label: model-comparison-py
#| message: false
#| warning: false
# Calculate metrics for all models
models = {
    'Binning': test_predictions_binned['ep_binned'],
    'GAM': test_data_py['ep_gam'],
    'Gradient Boosting': test_data_py['ep_gb'],
    # nflfastR's EP column; epa measures per-play change, not drive points
    'nflfastR EP': test_data['ep']
}
comparison_data = []
actual = test_data['drive_points'].to_numpy(dtype=float)
for model_name, predictions in models.items():
    preds = np.asarray(predictions, dtype=float)
    # All prediction vectors align row-for-row with test_data;
    # drop any rows with missing values before scoring
    mask = ~np.isnan(preds) & ~np.isnan(actual)
    comparison_data.append({
        'Model': model_name,
        'RMSE': np.sqrt(mean_squared_error(actual[mask], preds[mask])),
        'MAE': mean_absolute_error(actual[mask], preds[mask])
    })
comparison_df = pd.DataFrame(comparison_data)
comparison_df['RMSE_pct'] = (comparison_df['RMSE'] / comparison_df['RMSE'].min() - 1) * 100
comparison_df['MAE_pct'] = (comparison_df['MAE'] / comparison_df['MAE'].min() - 1) * 100
print("\nExpected Points Model Comparison")
print("2023 Season Test Set Performance")
print("=" * 70)
print(comparison_df.to_string(index=False))
Visualizing the Expected Points Curve
One of the most important visualizations in football analytics is the expected points curve showing how EP varies by field position.
#| label: fig-ep-curve-r
#| fig-cap: "Expected Points by Field Position and Down"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false
# Create prediction grid
ep_curve_data <- expand_grid(
yardline_100 = 1:99,
down = 1:4,
ydstogo = 10
) %>%
mutate(
ep = predict(xgb_model,
xgb.DMatrix(data = as.matrix(select(., yardline_100, ydstogo, down))))
)
# Plot EP curve
ggplot(ep_curve_data, aes(x = yardline_100, y = ep, color = factor(down))) +
geom_line(size = 1.5) +
geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
geom_hline(yintercept = 3, linetype = "dotted", alpha = 0.5, color = "darkgreen") +
geom_hline(yintercept = 7, linetype = "dotted", alpha = 0.5, color = "darkblue") +
scale_x_reverse(
    breaks = seq(0, 100, 10),
    # yardline_100 = 0 is the OPPONENT goal line; 100 is the offense's own
    labels = c("Opp\nGoal", seq(10, 40, 10), "50", seq(40, 10, -10), "Own\nGoal")
  ) +
scale_color_manual(
values = c("1" = "#0072B2", "2" = "#009E73", "3" = "#D55E00", "4" = "#CC79A7"),
labels = c("1st Down", "2nd Down", "3rd Down", "4th Down")
) +
annotate("text", x = 95, y = 3.2, label = "Field Goal",
size = 3, color = "darkgreen", hjust = 0) +
annotate("text", x = 95, y = 7.2, label = "Touchdown",
size = 3, color = "darkblue", hjust = 0) +
labs(
title = "Expected Points by Field Position and Down",
subtitle = "Based on XGBoost model trained on 2016-2022 data, 10 yards to go",
x = "Field Position (Yards from Opponent Goal)",
y = "Expected Points",
color = "Down",
caption = "Data: nflfastR | Model: Custom XGBoost"
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(face = "bold", size = 16),
plot.subtitle = element_text(size = 11),
legend.position = "top",
panel.grid.minor = element_blank()
)
#| label: fig-ep-curve-py
#| fig-cap: "Expected Points by Field Position and Down - Python"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false
# Create prediction grid
yardlines = np.arange(1, 100)
downs = [1, 2, 3, 4]
ydstogo = 10
fig, ax = plt.subplots(figsize=(12, 8))
colors = ['#0072B2', '#009E73', '#D55E00', '#CC79A7']
labels = ['1st Down', '2nd Down', '3rd Down', '4th Down']
for down, color, label in zip(downs, colors, labels):
X_grid = np.column_stack([
yardlines,
np.full_like(yardlines, ydstogo),
np.full_like(yardlines, down)
])
ep_values = gb_model.predict(X_grid)
ax.plot(yardlines, ep_values, color=color, linewidth=2.5, label=label)
# Add reference lines
ax.axhline(y=0, color='black', linestyle='--', alpha=0.5)
ax.axhline(y=3, color='darkgreen', linestyle=':', alpha=0.5)
ax.axhline(y=7, color='darkblue', linestyle=':', alpha=0.5)
# Add annotations
ax.text(95, 3.2, 'Field Goal', fontsize=10, color='darkgreen')
ax.text(95, 7.2, 'Touchdown', fontsize=10, color='darkblue')
# Format x-axis
ax.set_xlim(100, 0)  # Reverse x-axis: offense drives left-to-right
ax.set_xticks([0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
# yardline_100 = 0 is the OPPONENT goal line; 100 is the offense's own
ax.set_xticklabels(['Opp\nGoal', '10', '20', '30', '40', '50',
                    '40', '30', '20', '10', 'Own\nGoal'])
ax.set_xlabel('Field Position (Yards from Opponent Goal)', fontsize=12, fontweight='bold')
ax.set_ylabel('Expected Points', fontsize=12, fontweight='bold')
ax.set_title('Expected Points by Field Position and Down\n' +
'Based on Gradient Boosting model trained on 2016-2022 data, 10 yards to go',
fontsize=14, fontweight='bold', pad=20)
ax.legend(loc='upper left', fontsize=10, framealpha=0.9)
ax.grid(True, alpha=0.3)
ax.text(0.98, 0.02, 'Data: nfl_data_py | Model: Custom Gradient Boosting',
transform=ax.transAxes, ha='right', fontsize=8, style='italic')
plt.tight_layout()
plt.show()
Key Observations from the EP Curve
1. **Field position matters tremendously**: Moving from your own 20 to your own 40 increases EP by ~0.8 points
2. **Diminishing returns**: The EP curve flattens as you approach the goal line
3. **Down matters**: 1st down is always more valuable than 4th down at the same field position
4. **The break-even point**: With this net-points definition of EP, the curve crosses zero deep in the offense's own territory (roughly the own 10-15 yard line); by midfield a 1st-and-10 offense already expects about +2 points

Calculating and Interpreting EPA
Now that we have an EP model, we can calculate EPA for any play by finding the difference in expected points before and after the play.
EPA Components
EPA can be decomposed into several components:
$$ \text{EPA} = \text{EPA}_{\text{yards}} + \text{EPA}_{\text{down}} + \text{EPA}_{\text{turnover}} + \text{EPA}_{\text{score}} $$
Where:
- $\text{EPA}_{\text{yards}}$ = value from yardage gained
- $\text{EPA}_{\text{down}}$ = value/penalty from down change
- $\text{EPA}_{\text{turnover}}$ = penalty for turnovers
- $\text{EPA}_{\text{score}}$ = value from immediate scoring
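To make the decomposition concrete, here is a toy calculation using hypothetical EP values (invented round numbers, not model output). An incompletion changes only the down, so its entire EPA is the down component; a completed pass mixes the yardage and down components.

```python
import math

# Hypothetical EP values for three game states (invented for illustration)
ep = {
    '1st-and-10, own 25': 0.6,
    '2nd-and-10, own 25': 0.3,
    '1st-and-10, own 40': 1.4,
}

# EPA is simply EP(after) - EP(before)
incompletion_epa = ep['2nd-and-10, own 25'] - ep['1st-and-10, own 25']
completion_epa = ep['1st-and-10, own 40'] - ep['1st-and-10, own 25']

# The incompletion gains no yards, so all of its EPA is the down component
print(f"Incompletion EPA: {incompletion_epa:+.1f}")      # -0.3
# The 15-yard completion both resets the down AND improves field position
print(f"15-yard completion EPA: {completion_epa:+.1f}")  # +0.8
```

Any real play's EPA can be read the same way: two table (or model) lookups and a subtraction, with the components attributing that difference to yards, down, turnover, or score.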
Calculating EPA from Scratch
#| label: calculate-epa-r
#| message: false
#| warning: false
# Function to calculate EPA for a single play
# (assumes the play ends in an ordinary down state; scoring plays and
# turnovers require additional handling, e.g., flipping the sign of EP
# on a change of possession)
calculate_epa <- function(start_down, start_ydstogo, start_yardline,
end_down, end_ydstogo, end_yardline,
ep_model) {
# Get EP at start of play
start_state <- matrix(c(start_yardline, start_ydstogo, start_down), nrow = 1)
ep_start <- predict(ep_model, xgb.DMatrix(data = start_state))
# Get EP at end of play
end_state <- matrix(c(end_yardline, end_ydstogo, end_down), nrow = 1)
ep_end <- predict(ep_model, xgb.DMatrix(data = end_state))
# EPA is the difference
epa <- ep_end - ep_start
return(list(ep_start = ep_start, ep_end = ep_end, epa = epa))
}
# Example: 15-yard completion on 1st-and-10 from own 25
example_play <- calculate_epa(
start_down = 1, start_ydstogo = 10, start_yardline = 75,
end_down = 1, end_ydstogo = 10, end_yardline = 60,
ep_model = xgb_model
)
cat("Example Play: 15-yard completion on 1st-and-10 from own 25\n")
cat("EP Start:", round(example_play$ep_start, 3), "\n")
cat("EP End:", round(example_play$ep_end, 3), "\n")
cat("EPA:", round(example_play$epa, 3), "\n")
#| label: calculate-epa-py
#| message: false
#| warning: false
def calculate_epa(start_down, start_ydstogo, start_yardline,
end_down, end_ydstogo, end_yardline,
ep_model):
"""Calculate EPA for a single play"""
# Get EP at start of play
start_state = np.array([[start_yardline, start_ydstogo, start_down]])
ep_start = ep_model.predict(start_state)[0]
# Get EP at end of play
end_state = np.array([[end_yardline, end_ydstogo, end_down]])
ep_end = ep_model.predict(end_state)[0]
# EPA is the difference
epa = ep_end - ep_start
return {
'ep_start': ep_start,
'ep_end': ep_end,
'epa': epa
}
# Example: 15-yard completion on 1st-and-10 from own 25
example_play = calculate_epa(
start_down=1, start_ydstogo=10, start_yardline=75,
end_down=1, end_ydstogo=10, end_yardline=60,
ep_model=gb_model
)
print("Example Play: 15-yard completion on 1st-and-10 from own 25")
print(f"EP Start: {example_play['ep_start']:.3f}")
print(f"EP End: {example_play['ep_end']:.3f}")
print(f"EPA: {example_play['epa']:.3f}")
EPA Distributions by Play Type
Let's analyze how EPA varies by play type:
#| label: fig-epa-by-play-type-r
#| fig-cap: "EPA Distribution by Play Type"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false
# Load fresh data with play types
pbp_2023 <- load_pbp(2023)
# Calculate EPA statistics by play type
epa_by_type <- pbp_2023 %>%
filter(!is.na(epa), play_type %in% c("pass", "run")) %>%
group_by(play_type) %>%
summarise(
n = n(),
mean_epa = mean(epa),
median_epa = median(epa),
sd_epa = sd(epa),
q25 = quantile(epa, 0.25),
q75 = quantile(epa, 0.75),
success_rate = mean(epa > 0),
.groups = "drop"
)
# Display summary table
epa_by_type %>%
gt() %>%
cols_label(
play_type = "Play Type",
n = "Plays",
mean_epa = "Mean EPA",
median_epa = "Median EPA",
sd_epa = "Std Dev",
q25 = "25th %ile",
q75 = "75th %ile",
success_rate = "Success %"
) %>%
fmt_number(
columns = c(mean_epa, median_epa, sd_epa, q25, q75),
decimals = 3
) %>%
fmt_percent(
columns = success_rate,
decimals = 1
) %>%
fmt_number(
columns = n,
decimals = 0,
use_seps = TRUE
) %>%
tab_header(
title = "EPA Statistics by Play Type",
subtitle = "2023 NFL Regular Season"
)
# Create violin plot
pbp_2023 %>%
filter(!is.na(epa), play_type %in% c("pass", "run"), abs(epa) < 5) %>%
ggplot(aes(x = play_type, y = epa, fill = play_type)) +
geom_violin(alpha = 0.6, trim = FALSE) +
geom_boxplot(width = 0.2, fill = "white", alpha = 0.8) +
geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.7) +
scale_fill_manual(
values = c("pass" = "#00BFC4", "run" = "#F8766D"),
labels = c("Pass", "Run")
) +
scale_x_discrete(labels = c("Pass", "Run")) +
labs(
title = "EPA Distribution by Play Type",
subtitle = "2023 NFL Regular Season (EPA limited to ±5 for visualization)",
x = "Play Type",
y = "Expected Points Added",
caption = "Data: nflfastR"
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(face = "bold", size = 16),
legend.position = "none",
panel.grid.major.x = element_blank()
)
#| label: fig-epa-by-play-type-py
#| fig-cap: "EPA Distribution by Play Type - Python"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false
# Load fresh data
pbp_2023_py = nfl.import_pbp_data([2023])
# Calculate EPA statistics by play type
epa_by_type = (pbp_2023_py
.query("epa.notna() & play_type.isin(['pass', 'run'])")
.groupby('play_type')
.agg(
n=('epa', 'count'),
mean_epa=('epa', 'mean'),
median_epa=('epa', 'median'),
sd_epa=('epa', 'std'),
q25=('epa', lambda x: x.quantile(0.25)),
q75=('epa', lambda x: x.quantile(0.75)),
success_rate=('epa', lambda x: (x > 0).mean())
)
.reset_index()
)
print("\nEPA Statistics by Play Type (2023 NFL Regular Season)")
print("=" * 80)
print(epa_by_type.to_string(index=False))
# Create violin plot
fig, ax = plt.subplots(figsize=(12, 8))
plot_data = pbp_2023_py.query(
"epa.notna() & play_type.isin(['pass', 'run']) & abs(epa) < 5"
)
positions = [1, 2]
colors = ['#00BFC4', '#F8766D']
labels = ['Pass', 'Run']
for pos, play_type, color in zip(positions, ['pass', 'run'], colors):
data = plot_data[plot_data['play_type'] == play_type]['epa']
# Create violin plot
parts = ax.violinplot([data], positions=[pos], widths=0.7,
showmeans=False, showextrema=False)
for pc in parts['bodies']:
pc.set_facecolor(color)
pc.set_alpha(0.6)
# Add boxplot
bp = ax.boxplot([data], positions=[pos], widths=0.2,
patch_artist=True,
boxprops=dict(facecolor='white', alpha=0.8),
medianprops=dict(color='black', linewidth=2),
whiskerprops=dict(color='black'),
capprops=dict(color='black'))
ax.axhline(y=0, color='black', linestyle='--', alpha=0.7)
ax.set_xticks(positions)
ax.set_xticklabels(labels)
ax.set_xlabel('Play Type', fontsize=12, fontweight='bold')
ax.set_ylabel('Expected Points Added', fontsize=12, fontweight='bold')
ax.set_title('EPA Distribution by Play Type\n' +
'2023 NFL Regular Season (EPA limited to ±5 for visualization)',
fontsize=14, fontweight='bold', pad=20)
ax.grid(True, alpha=0.3, axis='y')
ax.text(0.98, 0.02, 'Data: nfl_data_py',
transform=ax.transAxes, ha='right', fontsize=8, style='italic')
plt.tight_layout()
plt.show()
Key Insights from EPA Distributions
1. **Pass plays have higher mean EPA** than run plays (~0.05 vs -0.05), but also higher variance
2. **Pass plays have heavier tails**: More big gains and big losses
3. **Both distributions are centered near zero**: Because EP models are calibrated to historical averages, league-wide EPA averages out to roughly zero by construction
4. **Success rates differ**: Pass plays succeed ~48% of the time, runs ~43%

EPA by Game Situation: Understanding Context-Dependent EPA
While we've established that EPA provides a context-aware measure of play value, it's crucial to understand HOW EPA varies across different game situations. Not all positive EPA is equally impressive, and not all negative EPA is equally damaging. A play's EPA value must be interpreted relative to the specific situation—what was possible given the down, distance, and field position?
This section explores how EPA distributions change across different game contexts, revealing patterns that inform strategic decision-making. You'll see that certain situations favor offense or defense, that some downs are more volatile than others, and that field position fundamentally alters what constitutes "success." Understanding these patterns helps you:
- Evaluate play-calling: Was that conservative run on 2nd-and-short smart given the situation?
- Assess player performance: Did the quarterback deliver positive EPA in difficult situations?
- Identify team tendencies: Does this offense struggle specifically on 3rd-and-medium?
- Make strategic decisions: When should we be aggressive vs. conservative based on expected EPA?
The patterns we'll uncover aren't just statistical curiosities—they reflect fundamental football realities about down-and-distance management, field position value, and optimal strategy. These insights explain why coaches make certain decisions, why some statistics mislead, and how to think probabilistically about game situations.
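As a concrete sketch of "relative to the situation," the snippet below computes a situational baseline and each play's EPA over that expectation. The data is synthetic and the EPA values are illustrative assumptions, though the distance buckets mirror the ones used in this chapter's figures.

```python
import pandas as pd

# Synthetic plays; the epa values are illustrative, not real nflfastR output
plays = pd.DataFrame({
    "down":    [1, 1, 1, 3, 3, 3, 3],
    "ydstogo": [10, 10, 10, 2, 2, 15, 15],
    "epa":     [0.3, -0.2, 0.0, 0.8, -0.9, -0.4, 1.6],
})

# Bucket distance the same way the chapter's figures do
plays["dist_category"] = pd.cut(
    plays["ydstogo"], bins=[0, 3, 6, 9, 100],
    labels=["Short (1-3)", "Medium (4-6)", "Long (7-9)", "Very Long (10+)"],
)

# Situational baseline: mean EPA among plays in the same (down, distance) bucket
plays["baseline"] = (plays
    .groupby(["down", "dist_category"], observed=True)["epa"]
    .transform("mean"))

# A play's EPA relative to what a typical play achieves in that exact spot
plays["epa_over_expected"] = plays["epa"] - plays["baseline"]
```

The same +0.8 EPA play looks much more impressive once you know the average play from that spot loses value; this "over expectation" framing is the idea behind the situational comparisons that follow.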
EPA by Down and Distance: How Game State Affects Play Value
Down and distance fundamentally shape what's possible on any play. On 1st-and-10, the offense has three attempts to gain 10 yards—a manageable task with multiple strategic options. On 3rd-and-15, the offense must gain 15 yards in one attempt or face punting—a much more challenging situation with limited options. EPA reflects these realities, with distributions that shift dramatically across downs and distances.
Let's examine how EPA varies by down to understand these contextual differences and their strategic implications.
#| label: fig-epa-by-down-distance-r
#| fig-cap: "Mean EPA by Down and Distance Category"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false
# Calculate EPA by down and distance
epa_down_dist <- pbp_2023 %>%
filter(
!is.na(epa),
play_type %in% c("pass", "run")
) %>%
mutate(
dist_category = case_when(
ydstogo <= 3 ~ "Short (1-3)",
ydstogo <= 6 ~ "Medium (4-6)",
ydstogo <= 9 ~ "Long (7-9)",
TRUE ~ "Very Long (10+)"
),
dist_category = factor(dist_category,
levels = c("Short (1-3)", "Medium (4-6)",
"Long (7-9)", "Very Long (10+)"))
) %>%
group_by(down, dist_category, play_type) %>%
summarise(
n = n(),
mean_epa = mean(epa),
se_epa = sd(epa) / sqrt(n()),
.groups = "drop"
) %>%
filter(n >= 20) # Require minimum sample size
# Create faceted plot
ggplot(epa_down_dist, aes(x = dist_category, y = mean_epa, fill = play_type)) +
geom_col(position = "dodge", alpha = 0.8) +
geom_errorbar(
aes(ymin = mean_epa - 1.96 * se_epa, ymax = mean_epa + 1.96 * se_epa),
position = position_dodge(0.9),
width = 0.2
) +
geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
facet_wrap(~down, labeller = labeller(
    down = function(x) paste0(x, c(`1` = "st", `2` = "nd", `3` = "rd", `4` = "th")[x], " Down")
  )) +
scale_fill_manual(
values = c("pass" = "#00BFC4", "run" = "#F8766D"),
labels = c("Pass", "Run")
) +
labs(
title = "Mean EPA by Down and Distance",
subtitle = "2023 NFL Regular Season with 95% confidence intervals",
x = "Distance Category",
y = "Mean EPA",
fill = "Play Type",
caption = "Data: nflfastR | Minimum 20 plays per category"
) +
theme_minimal(base_size = 11) +
theme(
plot.title = element_text(face = "bold", size = 14),
legend.position = "top",
axis.text.x = element_text(angle = 15, hjust = 1),
panel.grid.major.x = element_blank()
)
📊 Visualization Output
The code above generates a visualization. To see the output, run this code in your R or Python environment. The resulting plot will help illustrate the concepts discussed in this section.
#| label: fig-epa-by-down-distance-py
#| fig-cap: "Mean EPA by Down and Distance Category - Python"
#| fig-width: 12
#| fig-height: 10
#| message: false
#| warning: false
# Calculate EPA by down and distance
pbp_filtered = pbp_2023_py.query("epa.notna() & play_type.isin(['pass', 'run'])").copy()  # .copy() avoids SettingWithCopyWarning below
pbp_filtered['dist_category'] = pd.cut(
pbp_filtered['ydstogo'],
bins=[0, 3, 6, 9, 100],
labels=['Short (1-3)', 'Medium (4-6)', 'Long (7-9)', 'Very Long (10+)']
)
epa_down_dist = (pbp_filtered
.groupby(['down', 'dist_category', 'play_type'])
.agg(
n=('epa', 'count'),
mean_epa=('epa', 'mean'),
se_epa=('epa', lambda x: x.std() / np.sqrt(len(x)))
)
.reset_index()
.query('n >= 20')
)
# Create subplots
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
axes = axes.flatten()
for idx, down in enumerate([1, 2, 3, 4]):
ax = axes[idx]
data = epa_down_dist[epa_down_dist['down'] == down]
if len(data) == 0:
continue
# Set up positions
categories = ['Short (1-3)', 'Medium (4-6)', 'Long (7-9)', 'Very Long (10+)']
x = np.arange(len(categories))
width = 0.35
pass_data = data[data['play_type'] == 'pass'].set_index('dist_category')
run_data = data[data['play_type'] == 'run'].set_index('dist_category')
# Plot bars
for i, cat in enumerate(categories):
if cat in pass_data.index:
row = pass_data.loc[cat]
ax.bar(i - width/2, row['mean_epa'], width,
label='Pass' if i == 0 else '',
color='#00BFC4', alpha=0.8)
ax.errorbar(i - width/2, row['mean_epa'],
yerr=1.96*row['se_epa'],
color='black', capsize=3, linewidth=1)
if cat in run_data.index:
row = run_data.loc[cat]
ax.bar(i + width/2, row['mean_epa'], width,
label='Run' if i == 0 else '',
color='#F8766D', alpha=0.8)
ax.errorbar(i + width/2, row['mean_epa'],
yerr=1.96*row['se_epa'],
color='black', capsize=3, linewidth=1)
ax.axhline(y=0, color='black', linestyle='--', alpha=0.5)
ax.set_xticks(x)
ax.set_xticklabels(categories, rotation=15, ha='right')
ax.set_ylabel('Mean EPA', fontweight='bold')
ax.set_title(f'{down}{"st" if down==1 else "nd" if down==2 else "rd" if down==3 else "th"} Down',
fontweight='bold')
ax.grid(True, alpha=0.3, axis='y')
if idx == 0:
ax.legend(loc='upper left')
fig.suptitle('Mean EPA by Down and Distance\n2023 NFL Regular Season with 95% confidence intervals',
fontsize=14, fontweight='bold', y=0.995)
plt.tight_layout()
plt.show()
EPA by Field Position
#| label: fig-epa-by-field-position-r
#| fig-cap: "Mean EPA by Field Position Zone"
#| fig-width: 12
#| fig-height: 7
#| message: false
#| warning: false
# Calculate EPA by field position zone
epa_field_position <- pbp_2023 %>%
filter(
!is.na(epa),
play_type %in% c("pass", "run")
) %>%
mutate(
field_zone = case_when(
yardline_100 >= 90 ~ "Own 10",
yardline_100 >= 75 ~ "Own 11-25",
yardline_100 >= 55 ~ "Own 26-45",
yardline_100 >= 45 ~ "Midfield",
yardline_100 >= 25 ~ "Opp 45-26",
yardline_100 >= 10 ~ "Opp 25-11",
TRUE ~ "Red Zone"
),
field_zone = factor(field_zone,
levels = c("Own 10", "Own 11-25", "Own 26-45",
"Midfield", "Opp 45-26", "Opp 25-11", "Red Zone"))
) %>%
group_by(field_zone, play_type) %>%
summarise(
n = n(),
mean_epa = mean(epa),
success_rate = mean(epa > 0),
.groups = "drop"
)
# Create dual-axis plot
p1 <- ggplot(epa_field_position, aes(x = field_zone, group = play_type)) +
geom_line(aes(y = mean_epa, color = play_type), size = 1.5) +
geom_point(aes(y = mean_epa, color = play_type), size = 3) +
geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
scale_color_manual(
values = c("pass" = "#00BFC4", "run" = "#F8766D"),
labels = c("Pass", "Run")
) +
labs(
title = "Mean EPA by Field Position Zone",
subtitle = "2023 NFL Regular Season",
x = "Field Position Zone",
y = "Mean EPA",
color = "Play Type",
caption = "Data: nflfastR"
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(face = "bold", size = 14),
axis.text.x = element_text(angle = 30, hjust = 1),
legend.position = "top",
panel.grid.major.x = element_blank()
)
p1
#| label: fig-epa-by-field-position-py
#| fig-cap: "Mean EPA by Field Position Zone - Python"
#| fig-width: 12
#| fig-height: 7
#| message: false
#| warning: false
# Calculate EPA by field position zone
def field_zone(yardline):
if yardline >= 90:
return "Own 10"
elif yardline >= 75:
return "Own 11-25"
elif yardline >= 55:
return "Own 26-45"
elif yardline >= 45:
return "Midfield"
elif yardline >= 25:
return "Opp 45-26"
elif yardline >= 10:
return "Opp 25-11"
else:
return "Red Zone"
pbp_field = pbp_2023_py.query("epa.notna() & play_type.isin(['pass', 'run'])").copy()  # .copy() avoids SettingWithCopyWarning below
pbp_field['field_zone'] = pbp_field['yardline_100'].apply(field_zone)
zone_order = ["Own 10", "Own 11-25", "Own 26-45", "Midfield",
"Opp 45-26", "Opp 25-11", "Red Zone"]
pbp_field['field_zone'] = pd.Categorical(pbp_field['field_zone'],
categories=zone_order,
ordered=True)
epa_field_position = (pbp_field
.groupby(['field_zone', 'play_type'])
.agg(
n=('epa', 'count'),
mean_epa=('epa', 'mean'),
success_rate=('epa', lambda x: (x > 0).mean())
)
.reset_index()
)
# Create plot
fig, ax = plt.subplots(figsize=(12, 7))
for play_type, color, label in [('pass', '#00BFC4', 'Pass'),
('run', '#F8766D', 'Run')]:
data = epa_field_position[epa_field_position['play_type'] == play_type]
ax.plot(data['field_zone'], data['mean_epa'],
marker='o', markersize=8, linewidth=2.5,
color=color, label=label)
ax.axhline(y=0, color='black', linestyle='--', alpha=0.5)
ax.set_xlabel('Field Position Zone', fontsize=12, fontweight='bold')
ax.set_ylabel('Mean EPA', fontsize=12, fontweight='bold')
ax.set_title('Mean EPA by Field Position Zone\n2023 NFL Regular Season',
fontsize=14, fontweight='bold', pad=20)
ax.legend(loc='upper right', fontsize=11, title='Play Type')
ax.grid(True, alpha=0.3)
plt.xticks(rotation=30, ha='right')
ax.text(0.98, 0.02, 'Data: nfl_data_py',
transform=ax.transAxes, ha='right', fontsize=8, style='italic')
plt.tight_layout()
plt.show()
Field Position Effects on EPA
1. **Red Zone uniqueness**: EPA distributions are markedly different in the red zone due to the limited field and high scoring probability
2. **Pass advantage increases** as you move away from your own goal
3. **Run plays stabilize** in EPA across most field positions
4. **Variance increases** near midfield, where more play types are viable
Contextual EPA Analysis
Basic EPA doesn't account for all game context. Let's explore several important adjustments.
Score-Adjusted EPA
Game script significantly impacts play-calling and EPA. Teams with large leads run more, while teams trailing pass more.
#| label: fig-score-adjusted-epa-r
#| fig-cap: "EPA by Play Type and Score Differential"
#| fig-width: 12
#| fig-height: 7
#| message: false
#| warning: false
# Calculate EPA by score differential
epa_score <- pbp_2023 %>%
filter(
!is.na(epa),
play_type %in% c("pass", "run"),
abs(score_differential) <= 21
) %>%
mutate(
score_diff_bin = cut(score_differential,
breaks = seq(-21, 21, 3),
include.lowest = TRUE)
) %>%
group_by(score_diff_bin, play_type) %>%
summarise(
n = n(),
mean_epa = mean(epa),
.groups = "drop"
) %>%
filter(n >= 20)
# Extract midpoint of bins for plotting
epa_score <- epa_score %>%
mutate(
score_mid = as.numeric(gsub("\\(|\\]|\\[|\\,.*", "", score_diff_bin)) + 1.5
)
# Plot
ggplot(epa_score, aes(x = score_mid, y = mean_epa, color = play_type)) +
geom_line(size = 1.3) +
geom_point(size = 2.5) +
geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.5) +
geom_vline(xintercept = 0, linetype = "dotted", alpha = 0.5) +
scale_color_manual(
values = c("pass" = "#00BFC4", "run" = "#F8766D"),
labels = c("Pass", "Run")
) +
labs(
title = "EPA by Play Type and Score Differential",
subtitle = "2023 NFL Regular Season | Positive = Team Leading",
x = "Score Differential (Team Score - Opponent Score)",
y = "Mean EPA",
color = "Play Type",
caption = "Data: nflfastR"
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(face = "bold", size = 14),
legend.position = "top"
)
#| label: fig-score-adjusted-epa-py
#| fig-cap: "EPA by Play Type and Score Differential - Python"
#| fig-width: 12
#| fig-height: 7
#| message: false
#| warning: false
# Calculate EPA by score differential
pbp_score = pbp_2023_py.query(
    "epa.notna() & play_type.isin(['pass', 'run']) & abs(score_differential) <= 21"
).copy()  # .copy() avoids SettingWithCopyWarning on the assignment below
pbp_score['score_diff_bin'] = pd.cut(
pbp_score['score_differential'],
bins=range(-21, 24, 3)
)
epa_score = (pbp_score
.groupby(['score_diff_bin', 'play_type'])
.agg(
n=('epa', 'count'),
mean_epa=('epa', 'mean')
)
.reset_index()
.query('n >= 20')
)
# Get midpoint of bins
epa_score['score_mid'] = epa_score['score_diff_bin'].apply(lambda x: x.mid)
# Plot
fig, ax = plt.subplots(figsize=(12, 7))
for play_type, color, label in [('pass', '#00BFC4', 'Pass'),
('run', '#F8766D', 'Run')]:
data = epa_score[epa_score['play_type'] == play_type].sort_values('score_mid')
ax.plot(data['score_mid'], data['mean_epa'],
marker='o', markersize=7, linewidth=2,
color=color, label=label)
ax.axhline(y=0, color='black', linestyle='--', alpha=0.5)
ax.axvline(x=0, color='black', linestyle=':', alpha=0.5)
ax.set_xlabel('Score Differential (Team Score - Opponent Score)',
fontsize=12, fontweight='bold')
ax.set_ylabel('Mean EPA', fontsize=12, fontweight='bold')
ax.set_title('EPA by Play Type and Score Differential\n' +
'2023 NFL Regular Season | Positive = Team Leading',
fontsize=14, fontweight='bold', pad=20)
ax.legend(loc='upper right', fontsize=11, title='Play Type')
ax.grid(True, alpha=0.3)
ax.text(0.98, 0.02, 'Data: nfl_data_py',
transform=ax.transAxes, ha='right', fontsize=8, style='italic')
plt.tight_layout()
plt.show()
Garbage Time Adjustment
"Garbage time" refers to situations where the game outcome is largely decided. These plays can skew EPA metrics and should often be filtered.
#| label: garbage-time-analysis-r
#| message: false
#| warning: false
# Define garbage time function
is_garbage_time <- function(score_diff, qtr, seconds_remaining) {
# Garbage time if:
# - Up/down 3+ scores in Q4 with <5 min
# - Up/down 4+ scores anytime in Q4
(abs(score_diff) >= 24 & qtr == 4 & seconds_remaining < 300) |
(abs(score_diff) >= 31 & qtr == 4)
}
# Compare EPA with and without garbage time
epa_comparison <- pbp_2023 %>%
filter(!is.na(epa), play_type %in% c("pass", "run")) %>%
mutate(
is_garbage = is_garbage_time(score_differential, qtr, game_seconds_remaining)
) %>%
group_by(posteam) %>%
summarise(
all_epa = mean(epa),
non_garbage_epa = mean(epa[!is_garbage]),
garbage_pct = mean(is_garbage) * 100,
.groups = "drop"
) %>%
arrange(desc(non_garbage_epa))
# Display top 10 teams
epa_comparison %>%
head(10) %>%
gt() %>%
cols_label(
posteam = "Team",
all_epa = "All Plays EPA",
non_garbage_epa = "Non-Garbage EPA",
garbage_pct = "Garbage Time %"
) %>%
fmt_number(
columns = c(all_epa, non_garbage_epa),
decimals = 3
) %>%
fmt_number(
columns = garbage_pct,
decimals = 1
) %>%
tab_header(
title = "Top 10 Offenses by EPA (2023)",
subtitle = "Comparing all plays vs non-garbage time"
) %>%
data_color(
columns = non_garbage_epa,
colors = scales::col_numeric(
palette = c("#FEE5D9", "#A50F15"),
domain = NULL
)
)
#| label: garbage-time-analysis-py
#| message: false
#| warning: false
def is_garbage_time(row):
"""Define garbage time situations"""
score_diff = abs(row['score_differential'])
qtr = row['qtr']
seconds = row['game_seconds_remaining']
# Garbage time if:
# - Up/down 3+ scores in Q4 with <5 min
# - Up/down 4+ scores anytime in Q4
return ((score_diff >= 24 and qtr == 4 and seconds < 300) or
(score_diff >= 31 and qtr == 4))
# Compare EPA with and without garbage time
pbp_garbage = pbp_2023_py.query("epa.notna() & play_type.isin(['pass', 'run'])").copy()  # .copy() avoids SettingWithCopyWarning below
pbp_garbage['is_garbage'] = pbp_garbage.apply(is_garbage_time, axis=1)
epa_comparison = (pbp_garbage
.groupby('posteam')
.agg(
all_epa=('epa', 'mean'),
non_garbage_epa=('epa', lambda x: x[~pbp_garbage.loc[x.index, 'is_garbage']].mean()),
garbage_pct=('is_garbage', lambda x: x.mean() * 100)
)
.reset_index()
.sort_values('non_garbage_epa', ascending=False)
)
print("\nTop 10 Offenses by EPA (2023)")
print("Comparing all plays vs non-garbage time")
print("=" * 70)
print(epa_comparison.head(10).to_string(index=False))
EPA Stability and Predictive Power
One of EPA's greatest strengths is its stability and predictive power. Let's examine how well EPA predicts future performance and win outcomes.
Year-to-Year Stability
#| label: epa-stability-r
#| message: false
#| warning: false
#| cache: true
# Load multiple seasons
pbp_multi <- load_pbp(2020:2023)
# Calculate team EPA by season
team_epa_by_season <- pbp_multi %>%
filter(!is.na(epa), play_type %in% c("pass", "run")) %>%
group_by(season, posteam) %>%
summarise(
off_epa = mean(epa),
plays = n(),
.groups = "drop"
) %>%
filter(plays >= 500) # Minimum play threshold
# Calculate year-to-year correlation
stability_results <- team_epa_by_season %>%
arrange(posteam, season) %>%
group_by(posteam) %>%
mutate(
next_season_epa = lead(off_epa)
) %>%
ungroup() %>%
filter(!is.na(next_season_epa))
# Calculate correlation
epa_correlation <- cor(stability_results$off_epa,
stability_results$next_season_epa)
cat("Year-to-year EPA correlation:", round(epa_correlation, 3), "\n")
cat("R-squared:", round(epa_correlation^2, 3), "\n")
#| label: epa-stability-py
#| message: false
#| warning: false
#| cache: true
# Load multiple seasons
pbp_multi = nfl.import_pbp_data([2020, 2021, 2022, 2023])
# Calculate team EPA by season
team_epa_by_season = (pbp_multi
.query("epa.notna() & play_type.isin(['pass', 'run'])")
.groupby(['season', 'posteam'])
.agg(
off_epa=('epa', 'mean'),
plays=('epa', 'count')
)
.reset_index()
.query('plays >= 500')
)
# Calculate year-to-year correlation
stability_results = (team_epa_by_season
    .sort_values(['posteam', 'season'])
    .assign(next_season_epa=lambda d: d.groupby('posteam')['off_epa'].shift(-1))
    .dropna(subset=['next_season_epa'])
)
# Calculate correlation
epa_correlation = stability_results['off_epa'].corr(
stability_results['next_season_epa']
)
print(f"\nYear-to-year EPA correlation: {epa_correlation:.3f}")
print(f"R-squared: {epa_correlation**2:.3f}")
EPA and Winning
#| label: fig-epa-wins-r
#| fig-cap: "Team EPA vs Win Percentage"
#| fig-width: 10
#| fig-height: 8
#| message: false
#| warning: false
# Calculate team EPA and win percentage
team_performance <- pbp_2023 %>%
  filter(!is.na(epa), play_type %in% c("pass", "run")) %>%
  group_by(posteam) %>%
  summarise(
    off_epa = mean(epa),
    plays = n(),
    .groups = "drop"
  )
# Get win totals
games_2023 <- nflreadr::load_schedules(2023) %>%
filter(game_type == "REG") %>%
select(season, week, home_team, away_team, home_score, away_score)
team_wins <- bind_rows(
games_2023 %>%
transmute(team = home_team,
win = as.numeric(home_score > away_score)),
games_2023 %>%
transmute(team = away_team,
win = as.numeric(away_score > home_score))
) %>%
group_by(team) %>%
summarise(
games = n(),
wins = sum(win),
win_pct = wins / games,
.groups = "drop"
)
# Combine
team_full <- team_performance %>%
left_join(team_wins, by = c("posteam" = "team"))
# Calculate correlation
epa_win_cor <- cor(team_full$off_epa, team_full$win_pct, use = "complete.obs")
# Plot
ggplot(team_full, aes(x = off_epa, y = win_pct)) +
geom_hline(yintercept = 0.5, linetype = "dashed", alpha = 0.5) +
geom_vline(xintercept = 0, linetype = "dashed", alpha = 0.5) +
geom_smooth(method = "lm", se = TRUE, color = "gray60", alpha = 0.3) +
geom_nfl_logos(aes(team_abbr = posteam), width = 0.05) +
labs(
title = "Offensive EPA vs Win Percentage",
subtitle = sprintf("2023 NFL Regular Season | Correlation: r = %.3f", epa_win_cor),
x = "Offensive EPA per Play",
y = "Win Percentage",
caption = "Data: nflfastR"
) +
scale_y_continuous(labels = scales::percent_format()) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(face = "bold", size = 14),
panel.grid.minor = element_blank()
)
#| label: fig-epa-wins-py
#| fig-cap: "Team EPA vs Win Percentage - Python"
#| fig-width: 10
#| fig-height: 8
#| message: false
#| warning: false
from scipy import stats
# Calculate team EPA
team_performance = (pbp_2023_py
    .query("epa.notna() & play_type.isin(['pass', 'run'])")
    .groupby('posteam')
    .agg(
        off_epa=('epa', 'mean'),
        plays=('epa', 'count')
    )
    .reset_index()
)
# Get win totals
games_2023 = nfl.import_schedules([2023])
games_reg = games_2023[games_2023['game_type'] == 'REG']
home_results = pd.DataFrame({
'team': games_reg['home_team'],
'win': (games_reg['home_score'] > games_reg['away_score']).astype(int)
})
away_results = pd.DataFrame({
'team': games_reg['away_team'],
'win': (games_reg['away_score'] > games_reg['home_score']).astype(int)
})
team_wins = pd.concat([home_results, away_results]).groupby('team').agg({
'win': ['count', 'sum']
}).reset_index()
team_wins.columns = ['team', 'games', 'wins']
team_wins['win_pct'] = team_wins['wins'] / team_wins['games']
# Combine
team_full = team_performance.merge(team_wins, left_on='posteam', right_on='team')
# Calculate correlation
epa_win_cor = team_full['off_epa'].corr(team_full['win_pct'])
# Plot
fig, ax = plt.subplots(figsize=(10, 8))
# Add reference lines
ax.axhline(y=0.5, color='gray', linestyle='--', alpha=0.5)
ax.axvline(x=0, color='gray', linestyle='--', alpha=0.5)
# Regression line
slope, intercept, r_value, p_value, std_err = stats.linregress(
team_full['off_epa'], team_full['win_pct']
)
line_x = np.array([team_full['off_epa'].min(), team_full['off_epa'].max()])
line_y = slope * line_x + intercept
ax.plot(line_x, line_y, color='gray', alpha=0.6, linewidth=2)
# Scatter plot
ax.scatter(team_full['off_epa'], team_full['win_pct'],
s=150, alpha=0.6, color='steelblue')
# Add team labels
for _, row in team_full.iterrows():
ax.annotate(row['posteam'],
(row['off_epa'], row['win_pct']),
fontsize=8, ha='center', va='center')
ax.set_xlabel('Offensive EPA per Play', fontsize=12, fontweight='bold')
ax.set_ylabel('Win Percentage', fontsize=12, fontweight='bold')
ax.set_title(f'Offensive EPA vs Win Percentage\n' +
f'2023 NFL Regular Season | Correlation: r = {epa_win_cor:.3f}',
fontsize=14, fontweight='bold', pad=20)
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))
ax.grid(True, alpha=0.3)
ax.text(0.98, 0.02, 'Data: nfl_data_py',
transform=ax.transAxes, ha='right', fontsize=8, style='italic')
plt.tight_layout()
plt.show()
EPA's Predictive Power
EPA is one of the strongest predictors of team success:
1. **High correlation with wins**: r ≈ 0.75-0.85 in most seasons
2. **Year-to-year stability**: r ≈ 0.40-0.50, higher than most metrics
3. **Better than yards**: explains ~60% of win variance vs ~40% for total yards
4. **Additive**: offensive + defensive EPA explains ~75% of win variance
EPA vs Traditional Metrics
Let's compare EPA to traditional football statistics to understand its advantages.
EPA vs Yards per Play
#| label: fig-epa-vs-yards-r
#| fig-cap: "Comparing EPA and Total Yards as Predictors"
#| fig-width: 12
#| fig-height: 6
#| message: false
#| warning: false
# Calculate team yards
team_yards <- pbp_2023 %>%
filter(play_type %in% c("pass", "run")) %>%
group_by(posteam) %>%
summarise(
total_yards = sum(yards_gained, na.rm = TRUE),
total_plays = n(),
yards_per_play = total_yards / total_plays,
.groups = "drop"
)
# Join with EPA and wins
comparison_data <- team_full %>%
left_join(team_yards, by = "posteam")
# Calculate correlations
cor_epa <- cor(comparison_data$off_epa, comparison_data$win_pct,
use = "complete.obs")
cor_yards <- cor(comparison_data$yards_per_play, comparison_data$win_pct,
use = "complete.obs")
# Create comparison plots
p1 <- ggplot(comparison_data, aes(x = off_epa, y = win_pct)) +
geom_smooth(method = "lm", se = TRUE, color = "gray60", alpha = 0.3) +
geom_nfl_logos(aes(team_abbr = posteam), width = 0.05) +
labs(
title = "EPA per Play vs Win %",
subtitle = sprintf("r = %.3f | R² = %.3f", cor_epa, cor_epa^2),
x = "EPA per Play",
y = "Win Percentage"
) +
scale_y_continuous(labels = scales::percent_format()) +
theme_minimal() +
theme(plot.title = element_text(face = "bold"))
p2 <- ggplot(comparison_data, aes(x = yards_per_play, y = win_pct)) +
geom_smooth(method = "lm", se = TRUE, color = "gray60", alpha = 0.3) +
geom_nfl_logos(aes(team_abbr = posteam), width = 0.05) +
labs(
title = "Yards per Play vs Win %",
subtitle = sprintf("r = %.3f | R² = %.3f", cor_yards, cor_yards^2),
x = "Yards per Play",
y = "Win Percentage"
) +
scale_y_continuous(labels = scales::percent_format()) +
theme_minimal() +
theme(plot.title = element_text(face = "bold"))
library(patchwork)
p1 + p2 +
plot_annotation(
title = "EPA vs Yards: Which Predicts Winning Better?",
subtitle = "2023 NFL Regular Season",
caption = "Data: nflfastR"
)
#| label: fig-epa-vs-yards-py
#| fig-cap: "Comparing EPA and Total Yards as Predictors - Python"
#| fig-width: 12
#| fig-height: 6
#| message: false
#| warning: false
# Calculate team yards
team_yards = (pbp_2023_py
.query("play_type.isin(['pass', 'run'])")
.groupby('posteam')
.agg(
total_yards=('yards_gained', 'sum'),
total_plays=('yards_gained', 'count')
)
.reset_index()
)
team_yards['yards_per_play'] = team_yards['total_yards'] / team_yards['total_plays']
# Join with EPA and wins
comparison_data = team_full.merge(team_yards, on='posteam')
# Calculate correlations
cor_epa = comparison_data['off_epa'].corr(comparison_data['win_pct'])
cor_yards = comparison_data['yards_per_play'].corr(comparison_data['win_pct'])
# Create comparison plots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))
# EPA plot
slope, intercept = np.polyfit(comparison_data['off_epa'],
comparison_data['win_pct'], 1)
line_x = np.array([comparison_data['off_epa'].min(),
comparison_data['off_epa'].max()])
ax1.plot(line_x, slope * line_x + intercept, color='gray', alpha=0.6, linewidth=2)
ax1.scatter(comparison_data['off_epa'], comparison_data['win_pct'],
s=150, alpha=0.6, color='steelblue')
for _, row in comparison_data.iterrows():
ax1.annotate(row['posteam'], (row['off_epa'], row['win_pct']),
fontsize=7, ha='center', va='center')
ax1.set_xlabel('EPA per Play', fontsize=11, fontweight='bold')
ax1.set_ylabel('Win Percentage', fontsize=11, fontweight='bold')
ax1.set_title(f'EPA per Play vs Win %\nr = {cor_epa:.3f} | R² = {cor_epa**2:.3f}',
fontsize=12, fontweight='bold')
ax1.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))
ax1.grid(True, alpha=0.3)
# Yards plot
slope, intercept = np.polyfit(comparison_data['yards_per_play'],
comparison_data['win_pct'], 1)
line_x = np.array([comparison_data['yards_per_play'].min(),
comparison_data['yards_per_play'].max()])
ax2.plot(line_x, slope * line_x + intercept, color='gray', alpha=0.6, linewidth=2)
ax2.scatter(comparison_data['yards_per_play'], comparison_data['win_pct'],
s=150, alpha=0.6, color='coral')
for _, row in comparison_data.iterrows():
ax2.annotate(row['posteam'], (row['yards_per_play'], row['win_pct']),
fontsize=7, ha='center', va='center')
ax2.set_xlabel('Yards per Play', fontsize=11, fontweight='bold')
ax2.set_ylabel('Win Percentage', fontsize=11, fontweight='bold')
ax2.set_title(f'Yards per Play vs Win %\nr = {cor_yards:.3f} | R² = {cor_yards**2:.3f}',
fontsize=12, fontweight='bold')
ax2.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))
ax2.grid(True, alpha=0.3)
fig.suptitle('EPA vs Yards: Which Predicts Winning Better?\n2023 NFL Regular Season',
fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()
print(f"\nCorrelation with Win %:")
print(f" EPA per Play: r = {cor_epa:.3f} (R² = {cor_epa**2:.3f})")
print(f" Yards per Play: r = {cor_yards:.3f} (R² = {cor_yards**2:.3f})")
print(f" EPA explains {(cor_epa**2 - cor_yards**2) / cor_yards**2 * 100:.1f}% more variance")
Why EPA is Superior
EPA outperforms traditional metrics for several reasons:
- Context-aware: Accounts for down, distance, and field position
- Point-based: Measures true value (points) rather than yards
- Negative plays matter: A 3-yard loss on 3rd-and-2 is catastrophic, not just -3 yards
- Turnover value: Properly weights the massive swing of turnovers
- Efficiency-focused: Not inflated by garbage time or prevent defense
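The "context-aware" point can be reduced to arithmetic. In this toy sketch (the expected-points values are illustrative guesses, not output from a fitted EP model), the same 5-yard gain carries opposite EPA signs depending on down and distance:

```python
def epa(ep_before: float, ep_after: float) -> float:
    """EPA is simply the change in expected points across a play."""
    return ep_after - ep_before

# 3rd-and-4: a 5-yard gain converts and the drive continues (EP rises)
# The EP values here are plausible illustrative numbers, not fitted values
converted = epa(ep_before=1.8, ep_after=2.6)

# 3rd-and-20: the same 5-yard gain still forces a punt (EP falls)
short_of_sticks = epa(ep_before=0.9, ep_after=0.2)

print(f"3rd-and-4:  {converted:+.1f} EPA")        # identical yardage...
print(f"3rd-and-20: {short_of_sticks:+.1f} EPA")  # ...opposite value
```

A yards-based stat records +5 for both plays; EPA records roughly +0.8 for one and -0.7 for the other.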
Advanced EPA Applications
Situational EPA Analysis
#| label: situational-epa-r
#| message: false
#| warning: false
# Early down EPA vs late down EPA
situational_epa <- pbp_2023 %>%
filter(!is.na(epa), play_type %in% c("pass", "run")) %>%
mutate(
situation = case_when(
down <= 2 ~ "Early Downs (1st/2nd)",
down >= 3 ~ "Late Downs (3rd/4th)"
)
) %>%
group_by(posteam, situation, play_type) %>%
summarise(
mean_epa = mean(epa),
plays = n(),
.groups = "drop"
) %>%
filter(plays >= 30)
# Reshape for comparison
situational_wide <- situational_epa %>%
pivot_wider(
names_from = c(situation, play_type),
values_from = mean_epa,
names_sep = "_"
)
# Display top teams
situational_wide %>%
arrange(desc(`Early Downs (1st/2nd)_pass`)) %>%
head(10) %>%
gt() %>%
cols_label(
posteam = "Team",
`Early Downs (1st/2nd)_pass` = "Early Pass",
`Early Downs (1st/2nd)_run` = "Early Run",
`Late Downs (3rd/4th)_pass` = "Late Pass",
`Late Downs (3rd/4th)_run` = "Late Run"
) %>%
fmt_number(
columns = -posteam,
decimals = 3
) %>%
tab_header(
title = "Situational EPA by Team",
subtitle = "2023 NFL Regular Season - Top 10 Early Down Passing"
) %>%
data_color(
columns = -posteam,
colors = scales::col_numeric(
palette = c("#FEE5D9", "#A50F15"),
domain = NULL
)
)
#| label: situational-epa-py
#| message: false
#| warning: false
# Early down EPA vs late down EPA
pbp_situational = pbp_2023_py.query("epa.notna() & play_type.isin(['pass', 'run'])").copy()  # .copy() avoids SettingWithCopyWarning below
def categorize_situation(down):
if down <= 2:
return "Early Downs (1st/2nd)"
else:
return "Late Downs (3rd/4th)"
pbp_situational['situation'] = pbp_situational['down'].apply(categorize_situation)
situational_epa = (pbp_situational
.groupby(['posteam', 'situation', 'play_type'])
.agg(
mean_epa=('epa', 'mean'),
plays=('epa', 'count')
)
.reset_index()
.query('plays >= 30')
)
# Reshape for comparison
situational_wide = situational_epa.pivot_table(
index='posteam',
columns=['situation', 'play_type'],
values='mean_epa'
).reset_index()
situational_wide.columns = ['_'.join(col).strip('_') for col in situational_wide.columns.values]
print("\nSituational EPA by Team (2023 NFL Regular Season)")
print("Top 10 Early Down Passing Teams")
print("=" * 80)
print(situational_wide.sort_values(
'Early Downs (1st/2nd)_pass', ascending=False
).head(10).to_string(index=False))
Player-Level EPA
While this chapter focuses on team EPA, player-level EPA is crucial for evaluation:
#| label: player-epa-r
#| message: false
#| warning: false
# Top QBs by EPA
qb_epa <- pbp_2023 %>%
filter(
!is.na(epa),
play_type == "pass",
!is.na(passer_player_name)
) %>%
group_by(passer_player_name, passer_player_id) %>%
summarise(
dropbacks = n(),
total_epa = sum(epa),
mean_epa = mean(epa),
success_rate = mean(epa > 0),
.groups = "drop"
) %>%
filter(dropbacks >= 200) %>%
arrange(desc(total_epa))
# Display top 15 QBs
qb_epa %>%
  head(15) %>%
  gt() %>%
  cols_label(
    passer_player_name = "Quarterback",
    dropbacks = "Dropbacks",
    total_epa = "Total EPA",
    mean_epa = "EPA/Play",
    success_rate = "Success Rate"
  ) %>%
  fmt_number(
    columns = c(total_epa, mean_epa),
    decimals = 2
  ) %>%
  fmt_percent(
    columns = success_rate,
    decimals = 1
  ) %>%
  fmt_number(
    columns = dropbacks,
    decimals = 0,
    use_seps = TRUE
  ) %>%
  cols_hide(passer_player_id) %>%
  tab_header(
    title = "Top Quarterbacks by EPA",
    subtitle = "2023 NFL Regular Season (min. 200 dropbacks)"
  ) %>%
  data_color(
    columns = total_epa,
    colors = scales::col_numeric(
      palette = c("#FEE5D9", "#A50F15"),
      domain = NULL
    )
  )
#| label: player-epa-py
#| message: false
#| warning: false
# Top QBs by EPA
qb_epa = (pbp_2023_py
    .query("epa.notna() & play_type == 'pass' & passer_player_name.notna()")
    .groupby(['passer_player_name', 'passer_player_id'])
    .agg(
        dropbacks=('epa', 'count'),
        total_epa=('epa', 'sum'),
        mean_epa=('epa', 'mean'),
        success_rate=('epa', lambda x: (x > 0).mean())
    )
    .reset_index()
    .query('dropbacks >= 200')
    .sort_values('total_epa', ascending=False)
)
print("\nTop Quarterbacks by EPA (2023 NFL Regular Season)")
print("Minimum 200 dropbacks")
print("=" * 90)
print(qb_epa[['passer_player_name', 'dropbacks', 'total_epa',
              'mean_epa', 'success_rate']].head(15).to_string(index=False))
EPA Limitations and Considerations
While EPA is extremely valuable, it's important to understand its limitations:
1. Credit Assignment
EPA assigns all credit/blame to the offense/defense on the field, but doesn't distinguish between:
- Quarterback vs receiver contributions on a pass
- Offensive line vs running back on a run
- Coverage vs pass rush on defense
Solution: Use tracking data or film analysis to decompose EPA by player role.
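A common first-pass decomposition splits each completed pass's EPA into an air component and a yards-after-catch component. The sketch below uses nflfastR's `air_epa`/`yac_epa` naming convention, but every value is invented for illustration; real analysis would pull these columns from play-by-play data:

```python
import pandas as pd

# Toy completions; air_epa / yac_epa follow nflfastR's naming convention,
# but every number here is invented for illustration
passes = pd.DataFrame({
    "passer":  ["QB_A", "QB_A", "QB_B"],
    "epa":     [1.2, -0.4, 0.9],
    "air_epa": [0.8, -0.1, 0.2],   # value attributable to the throw itself
    "yac_epa": [0.4, -0.3, 0.7],   # value created after the catch
})

# Sum each component per passer to see who depends on yards after catch
decomp = passes.groupby("passer")[["epa", "air_epa", "yac_epa"]].sum()
print(decomp)
```

Summing the components per passer hints at how much of a quarterback's EPA depends on receivers after the catch, though even this split leaves protection and coverage unmeasured.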
2. Randomness and Luck
Short-term EPA can be influenced by:
- Dropped passes (assigned to QB)
- Tipped interceptions
- Fumble luck
Solution: Use larger sample sizes and complement with other metrics.
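The effect of sample size is easy to demonstrate by simulation. Assuming play-level EPA behaves roughly like noise with a standard deviation of about 1.4 points around a team's true skill (an illustrative figure, not an estimate from our data), the spread of observed EPA/play shrinks with the square root of the number of plays:

```python
import numpy as np

rng = np.random.default_rng(42)
true_skill = 0.05   # assumed "true" EPA/play for an above-average offense
play_sd = 1.4       # assumed play-level spread (illustrative, not fitted)

sds = []
for n_plays in [10, 50, 200, 1000]:
    # 2,000 simulated samples of n_plays each; how much do their means vary?
    means = rng.normal(true_skill, play_sd, size=(2000, n_plays)).mean(axis=1)
    sds.append(means.std())
    print(f"n = {n_plays:4d}: sd of observed EPA/play = {sds[-1]:.3f}")
```

At 10 plays the observed average is mostly noise; by a few hundred plays it tracks true skill closely, which is why EPA/play should never be quoted from a handful of snaps.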
3. Scheme and Situation
EPA doesn't fully account for:
- Offensive scheme complexity
- Defensive alignments and disguises
- Intentional conservative play-calling
Solution: Perform situational adjustments and context-aware analysis.
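One simple context-aware adjustment is opponent adjustment: compare each play's EPA to what the average offense earns against that same defense. A minimal sketch on invented numbers (the team abbreviations and values are placeholders):

```python
import pandas as pd

# Toy plays; every number is invented for illustration
plays = pd.DataFrame({
    "posteam": ["KC", "KC", "BUF", "BUF"],
    "defteam": ["DEN", "NYJ", "DEN", "NYJ"],
    "epa":     [0.20, 0.05, 0.30, -0.10],
})

# Baseline: the EPA an average offense earns against each defense
baseline = plays.groupby("defteam")["epa"].transform("mean")

# Opponent-adjusted EPA: raw EPA minus what that defense typically allows
plays["adj_epa"] = plays["epa"] - baseline
print(plays.groupby("posteam")["adj_epa"].mean())
```

The same idea underlies schedule-adjusted team ratings: raw EPA flatters offenses that faced weak defenses, and the subtraction removes that distortion.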
4. Terminal Value
Basic EP models don't account for:
- Kneel downs at end of game
- Time management considerations
- Two-minute drill scenarios
Solution: Use Win Probability Added (WPA) for end-game situations.
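In practice, analysts simply filter such plays out before aggregating. The sketch below assumes nflfastR-style `qb_kneel` and `wp` (pre-snap win probability) columns on invented data; the 5%/95% win-probability cutoffs for garbage time are a common convention, not a rule:

```python
import pandas as pd

# Toy plays using nflfastR-style column names; values invented
pbp = pd.DataFrame({
    "epa":      [0.3, -1.2, 0.1, 0.5],
    "qb_kneel": [0, 0, 1, 0],
    "wp":       [0.55, 0.40, 0.98, 0.99],  # pre-snap offense win probability
})

# Drop kneeldowns and garbage time (wp outside 5%-95%) before aggregating
clean = pbp.query("qb_kneel == 0 and 0.05 <= wp <= 0.95")
print(f"kept {len(clean)} of {len(pbp)} plays, "
      f"EPA/play = {clean['epa'].mean():+.3f}")
```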
When NOT to Use EPA
1. **Small samples**: EPA is noisy at the play level; you need 50+ plays for stability
2. **Kneeldowns**: Filter out or adjust for end-game kneeldowns
3. **Special teams**: Use specialized metrics for kicks, punts, and returns
4. **Individual player credit**: EPA alone doesn't separate QB/RB/WR contributions
5. **Draft evaluation**: Combine with college EPA, athletic testing, and scouting
Strategic Applications of EPA
Fourth Down Decision-Making
EPA is fundamental to fourth-down analytics. The decision framework:
Go for it when: $\text{EP}_{\text{go}} > \max(\text{EP}_{\text{punt}}, \text{EP}_{\text{FG}})$
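To make the inequality concrete, plug in numbers for a hypothetical 4th-and-2 at the opponent's 38-yard line. Every input below is an illustrative assumption rather than output from the EP models built earlier:

```python
# 4th-and-2 at the opponent 38: go for it, attempt a long FG, or punt?
# Every input below is an illustrative assumption, not a fitted value.
p_convert = 0.55       # assumed conversion probability on 4th-and-2
ep_if_convert = 3.2    # assumed EP with a fresh set of downs near the 35
ep_if_fail = -1.5      # assumed EP after handing the opponent the ball here
p_fg = 0.45            # assumed make probability on a ~56-yard attempt
ep_punt = -0.2         # assumed EP after a typical punt from this spot

ep_go = p_convert * ep_if_convert + (1 - p_convert) * ep_if_fail
# Simplification: a missed FG is treated like a failed conversion
ep_fg = p_fg * 3 + (1 - p_fg) * ep_if_fail

print(f"EP(go) = {ep_go:+.3f}, EP(FG) = {ep_fg:+.3f}, EP(punt) = {ep_punt:+.3f}")
```

A full treatment would also net out the opponent's expected points after the kickoff that follows a made field goal, but the structure of the calculation is the same: compute the expected value of each option and take the maximum.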
#| label: fourth-down-analysis-r
#| message: false
#| warning: false
# Analyze 4th down decisions
fourth_down <- pbp_2023 %>%
  filter(down == 4, play_type %in% c("pass", "run", "punt", "field_goal")) %>%
  mutate(
    decision = case_when(
      play_type %in% c("pass", "run") ~ "Go for it",
      play_type == "field_goal" ~ "Field Goal",
      play_type == "punt" ~ "Punt"
    ),
    ydstogo_bin = cut(ydstogo,
                      breaks = c(0, 2, 5, 10, 100),
                      labels = c("1-2", "3-5", "6-10", "11+"))
  ) %>%
  group_by(yardline_100, ydstogo_bin, decision) %>%
  summarise(
    attempts = n(),
    mean_epa = mean(epa, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(attempts >= 5)
# Show mean EPA by decision type in opponent territory
# (compute the weighted mean before overwriting `attempts` with its sum,
# otherwise weighted.mean() receives a length-1 weight vector and errors)
fourth_down %>%
  filter(yardline_100 <= 40, ydstogo_bin %in% c("1-2", "3-5")) %>%
  group_by(ydstogo_bin, decision) %>%
  summarise(
    mean_epa = weighted.mean(mean_epa, attempts),
    attempts = sum(attempts),
    .groups = "drop"
  ) %>%
  pivot_wider(names_from = decision, values_from = c(attempts, mean_epa)) %>%
  gt() %>%
  tab_header(
    title = "4th Down EPA by Decision",
    subtitle = "Opponent Territory (inside 40), 2023 Season"
  ) %>%
  fmt_number(
    columns = starts_with("mean"),
    decimals = 3
  )
#| label: fourth-down-analysis-py
#| message: false
#| warning: false
# Analyze 4th down decisions
fourth_down = pbp_2023_py.query(
    "down == 4 & play_type.isin(['pass', 'run', 'punt', 'field_goal'])"
).copy()
def categorize_decision(play_type):
    if play_type in ['pass', 'run']:
        return "Go for it"
    elif play_type == 'field_goal':
        return "Field Goal"
    else:
        return "Punt"
fourth_down['decision'] = fourth_down['play_type'].apply(categorize_decision)
fourth_down['ydstogo_bin'] = pd.cut(
    fourth_down['ydstogo'],
    bins=[0, 2, 5, 10, 100],
    labels=["1-2", "3-5", "6-10", "11+"]
)
fourth_summary = (fourth_down
    # observed=True drops empty combinations of the categorical ydstogo_bin
    .groupby(['yardline_100', 'ydstogo_bin', 'decision'], observed=True)
    .agg(
        attempts=('epa', 'count'),
        mean_epa=('epa', 'mean')
    )
    .reset_index()
    .query('attempts >= 5')
)
# Show mean EPA by decision type in opponent territory
opp_territory = (fourth_summary
    .query("yardline_100 <= 40 & ydstogo_bin.isin(['1-2', '3-5'])")
    .groupby(['ydstogo_bin', 'decision'], observed=True)
    .apply(lambda x: pd.Series({
        'attempts': x['attempts'].sum(),
        'mean_epa': np.average(x['mean_epa'], weights=x['attempts'])
    }))
    .reset_index()
)
print("\n4th Down EPA by Decision")
print("Opponent Territory (inside 40), 2023 Season")
print("=" * 60)
for dist in ["1-2", "3-5"]:
    print(f"\n{dist} yards to go:")
    subset = opp_territory[opp_territory['ydstogo_bin'] == dist]
    for _, row in subset.iterrows():
        print(f"  {row['decision']:15s}: {row['attempts']:3.0f} attempts, "
              f"EPA = {row['mean_epa']:+.3f}")
Play-Calling Optimization
EPA can identify optimal play-calling strategies:
#| label: play-calling-optimization-r
#| message: false
#| warning: false
# Calculate pass rate and EPA by situation
playcalling <- pbp_2023 %>%
  filter(
    !is.na(epa),
    play_type %in% c("pass", "run"),
    down %in% c(1, 2)
  ) %>%
  mutate(
    ydstogo_group = case_when(
      ydstogo <= 3 ~ "Short",
      ydstogo <= 7 ~ "Medium",
      TRUE ~ "Long"
    )
  ) %>%
  group_by(posteam, down, ydstogo_group) %>%
  summarise(
    plays = n(),
    pass_rate = mean(play_type == "pass"),
    mean_epa = mean(epa),
    pass_epa = mean(epa[play_type == "pass"]),
    run_epa = mean(epa[play_type == "run"]),
    .groups = "drop"
  )
# Show 1st down efficiency
playcalling %>%
  filter(down == 1, ydstogo_group == "Medium") %>%
  mutate(epa_diff = pass_epa - run_epa) %>%
  arrange(desc(epa_diff)) %>%
  head(10) %>%
  gt() %>%
  cols_label(
    posteam = "Team",
    plays = "Plays",
    pass_rate = "Pass Rate",
    pass_epa = "Pass EPA",
    run_epa = "Run EPA",
    epa_diff = "EPA Diff"
  ) %>%
  fmt_percent(
    columns = pass_rate,
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(pass_epa, run_epa, epa_diff, mean_epa),
    decimals = 3
  ) %>%
  cols_hide(c(down, ydstogo_group)) %>%
  tab_header(
    title = "1st Down Pass vs Run Efficiency",
    subtitle = "Medium distance (4-7 yards), 2023 Season"
  )
#| label: play-calling-optimization-py
#| message: false
#| warning: false
# Calculate pass rate and EPA by situation
pbp_playcalling = pbp_2023_py.query(
    "epa.notna() & play_type.isin(['pass', 'run']) & down.isin([1, 2])"
).copy()
def ydstogo_group(yards):
    if yards <= 3:
        return "Short"
    elif yards <= 7:
        return "Medium"
    else:
        return "Long"
pbp_playcalling['ydstogo_group'] = pbp_playcalling['ydstogo'].apply(ydstogo_group)
playcalling = (pbp_playcalling
    .groupby(['posteam', 'down', 'ydstogo_group'])
    .apply(lambda x: pd.Series({
        'plays': len(x),
        'pass_rate': (x['play_type'] == 'pass').mean(),
        'mean_epa': x['epa'].mean(),
        'pass_epa': x.loc[x['play_type'] == 'pass', 'epa'].mean(),
        'run_epa': x.loc[x['play_type'] == 'run', 'epa'].mean()
    }))
    .reset_index()
)
# Show 1st down efficiency
first_down_medium = (playcalling
    .query("down == 1 & ydstogo_group == 'Medium'")
    .assign(epa_diff=lambda x: x['pass_epa'] - x['run_epa'])
    .sort_values('epa_diff', ascending=False)
)
print("\n1st Down Pass vs Run Efficiency")
print("Medium distance (4-7 yards), 2023 Season")
print("=" * 90)
print(first_down_medium[['posteam', 'plays', 'pass_rate', 'pass_epa',
                         'run_epa', 'epa_diff']].head(10).to_string(index=False))
Summary
Expected Points Added (EPA) is the cornerstone metric of modern football analytics. In this chapter, we:
- Built EP models from scratch using binning, GAM, and gradient boosting approaches
- Visualized the EP curve showing how expected points vary by field position
- Calculated EPA for plays and understood its components
- Analyzed EPA distributions by play type, down, distance, and field position
- Examined contextual factors including score differential and garbage time
- Demonstrated EPA's stability and strong correlation with winning
- Compared EPA to traditional metrics showing its superior predictive power
- Applied EPA to strategic decisions including fourth downs and play-calling
- Recognized EPA limitations and when to use complementary metrics
EPA provides a unified framework for evaluating football performance that:
- Accounts for context: Down, distance, field position matter
- Measures true value: Points, not yards
- Predicts success: High correlation with wins and future performance
- Enables comparison: Apples-to-apples across situations
- Guides decisions: Optimal fourth-down strategy, play-calling, personnel
Key Takeaways
EPA Best Practices
1. **Use sufficient sample sizes**: 50+ plays minimum, ideally 200+
2. **Filter garbage time**: Remove plays when the game is decided
3. **Consider context**: Score, time, and opponent strength matter
4. **Combine with other metrics**: Use Success Rate, CPOE, and yards for a complete picture
5. **Understand limitations**: EPA doesn't separate individual contributions
6. **Update models regularly**: EP models should be retrained with recent data
7. **Visualize distributions**: Don't just report means; show full distributions
Exercises
Conceptual Questions
1. EP Model Comparison: Why does the gradient boosting model outperform simple binning? What are the tradeoffs?
2. EPA Interpretation: A team averages +0.15 EPA per play. Over a 65-play game, how many more points would we expect them to score than an average team?
3. Context Effects: Why does pass EPA vary more by score differential than run EPA? What does this tell us about game theory?
4. Stability: EPA has a year-to-year correlation of ~0.45. Is this high or low? How does it compare to win percentage?
Coding Exercises
Exercise 1: Build a Custom EP Model
Create your own EP model with these requirements:
a) Include additional features: quarter, score differential, time remaining
b) Use cross-validation to select optimal hyperparameters
c) Compare in-sample and out-of-sample performance
d) Plot feature importance
**Bonus**: Build separate models for different field zones (own territory, midfield, red zone)
Exercise 2: Team EPA Profiles
For each team in 2023, create a comprehensive EPA profile:
a) Calculate EPA by down (1st, 2nd, 3rd)
b) Calculate EPA by distance (short, medium, long)
c) Calculate EPA by field position zone
d) Create a radar chart or heatmap visualizing the profile
e) Identify each team's strengths and weaknesses
**Bonus**: Cluster teams based on their EPA profiles
Exercise 3: EPA-Based Play Recommendations
Build a recommendation system that suggests optimal play types:
a) For each game situation (down, distance, field position), calculate historical EPA for pass vs run
b) Account for team strength (offensive and defensive EPA)
c) Adjust for score differential and time remaining
d) Output a recommendation: pass, run, or neutral
e) Validate recommendations against actual play-calling and outcomes
**Bonus**: Include league-wide pass rate and adjust for game theory (defenses expect pass in obvious passing situations)
Exercise 4: QB EPA Decomposition
Analyze quarterback EPA and attempt to decompose it:
a) Calculate QB EPA per play
b) Estimate "expected EPA" based on pass location and depth (using receiver position)
c) Calculate "EPA over expected" (similar to CPOE)
d) Compare rankings: total EPA vs EPA over expected
e) Identify QBs who outperform/underperform expected EPA
**Bonus**: Analyze how EPA over expected correlates with QB pressure and time to throw
Exercise 5: Predictive EPA Model
Build a model that predicts future EPA:
a) Use team EPA from weeks 1-8 to predict weeks 9-17
b) Include additional features: schedule strength, injury status, weather
c) Calculate prediction error (MAE, RMSE)
d) Compare to naive baselines (e.g., league average, prior year)
e) Identify which teams exceeded or fell short of predictions
**Bonus**: Use this model to predict playoff outcomes
Further Reading
Academic Papers
- Yurko, R., Ventura, S., & Horowitz, M. (2019). "nflWAR: a reproducible method for offensive player evaluation in football." Journal of Quantitative Analysis in Sports, 15(3), 163-183.
- Lopez, M. J., Matthews, G. J., & Baumer, B. S. (2018). "How often does the best team win? A unified approach to understanding randomness in North American sport." The Annals of Applied Statistics, 12(4), 2483-2516.
- Burke, B. (2009). "Expected Points (EP) and Expected Points Added (EPA)." Advanced NFL Stats.
Books and Resources
- Football Outsiders. "Football Outsiders Almanac" (annual). Covers DVOA, DYAR, and other advanced metrics.
- Baldwin, B. & Yurko, R. (2023). "nflfastR documentation." Comprehensive guide to EPA and related metrics.
- The Athletic and ESPN Analytics. Regular EPA-based analysis and team rankings.
Online Resources
- nflfastR GitHub: https://github.com/nflverse/nflfastR
- Open Source Football: https://www.opensourcefootball.com/
- nflverse documentation: https://nflverse.nflverse.com/
References
:::