Learning Objectives
By the end of this chapter, you will be able to:
- Understand when to attempt two-point conversions based on game situations
- Build two-point conversion probability models using historical data
- Analyze score-specific optimal strategies using dynamic programming
- Study play-calling tendencies and success rates on two-point attempts
- Evaluate and create two-point conversion decision charts
Introduction
After scoring a touchdown, coaches face a critical decision: kick an extra point for one nearly-guaranteed point, or attempt a two-point conversion for two points with lower probability of success. This seemingly simple choice involves complex strategic considerations including current score, time remaining, opponent quality, and downstream game scenarios.
The NFL's 2015 rule change moving the extra point from the 2-yard line to the 15-yard line (33-yard kick) made this decision more interesting. Extra points, once automatic at 99.5% success rate, dropped to approximately 94% success. Meanwhile, two-point conversions historically succeed at roughly 47-49% rates, making the expected value comparison much closer.
At first glance, this appears to be a simple mathematical optimization problem: multiply success probability by points awarded and choose the option with higher expected value. However, this analysis ignores critical game-theoretic considerations. The optimal decision depends not just on expected points, but on how those points affect win probability in the current game state. A two-point conversion that ties the game may be worth far more than its 0.96 expected point value suggests, while the same attempt when leading by 20 points offers minimal strategic value.
Throughout this chapter, we'll develop both the mathematical framework and the practical intuition needed to make optimal two-point conversion decisions. We'll start by examining historical data to understand success rates, then build predictive models, develop score-specific decision frameworks, and finally analyze play-calling patterns and defensive strategies. By the end, you'll be able to evaluate coaching decisions and create your own decision support tools.
The Fundamental Tradeoff
The two-point decision balances:

- **Extra Point**: ~94% probability of 1 point (0.94 expected points)
- **Two-Point Conversion**: ~48% probability of 2 points (0.96 expected points)

In neutral game situations, two-point conversions offer slightly higher expected value (0.96 vs 0.94 points), but game context often dominates this small 0.02-point edge. The real value of going for two comes from specific score differentials where converting (or failing) fundamentally changes your path to winning.

Why This Chapter Matters
Two-point conversion decisions represent one of the clearest examples where analytics can improve coaching decisions. Research shows that coaches systematically make suboptimal choices, often kicking when they should go for two and vice versa. Unlike complex fourth-down scenarios, two-point situations have:

1. **Binary outcomes**: Success or failure, no ambiguity
2. **Known probabilities**: Historical data provides robust estimates
3. **Clear counterfactuals**: We know exactly what happened with each choice
4. **Measurable impact**: Win probability changes are quantifiable

This makes two-point strategy an ideal domain for data-driven decision-making and an excellent learning opportunity for understanding how to combine statistical analysis with game theory.

Historical Context and Rule Changes
The Evolution of Extra Points
Before 2015, extra points were essentially automatic. Kicked from the 2-yard line (roughly 20-yard attempts), kickers converted 99.5% of the time. This near-certainty made the post-touchdown decision trivial: always kick, unless the score differential late in a game demanded exactly two points.
The 2015 rule change, moving extra points to the 15-yard line (creating 33-yard attempts), fundamentally altered this calculus. Let's examine exactly how the landscape changed and why this matters for strategic decision-making.
Pre-2015 Era: The "Automatic" Extra Point
- Success rate: 99.5%
- Expected value: 0.995 points
- Two-point conversion rate: ~48%
- Two-point expected value: 0.96 points
- Expected value advantage for kicking: +0.035 points
- Strategic implication: Always kick unless score differential requires exactly two points
Post-2015 Era: The Strategic Decision
- Extra point success rate: ~94%
- Expected value: 0.94 points
- Two-point conversion rate: ~48%
- Two-point expected value: 0.96 points
- Expected value advantage for two-point: +0.02 points
- Strategic implication: Consider going for two even in neutral situations
The rule change created a paradigm shift. While the 0.02-point advantage for going for two seems small, it compounds over a season. A team that scores 40 touchdowns and always goes for two instead of kicking extra points gains approximately 0.8 expected points across the season. More importantly, the closer expected values mean that game situation now dominates the decision—a single percentage point change in win probability can swing the optimal choice.
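The comparison above reduces to a simple break-even condition: in pure expected points, going for two pays off whenever the two-point success rate exceeds half the extra-point rate. A minimal sketch using the chapter's approximate league-wide rates (round numbers, for illustration only):

```python
# Break-even analysis for the post-TD decision, using the chapter's
# approximate rates (illustrative, not computed from data).
XP_RATE_PRE, XP_RATE_POST = 0.995, 0.94   # extra point success, pre/post 2015
TWO_PT_RATE = 0.48                         # two-point conversion success

def expected_points(p_success: float, points: int) -> float:
    """Expected points from an attempt worth `points` with success prob `p_success`."""
    return p_success * points

for era, xp_rate in [("Pre-2015", XP_RATE_PRE), ("Post-2015", XP_RATE_POST)]:
    ev_xp = expected_points(xp_rate, 1)
    ev_2pt = expected_points(TWO_PT_RATE, 2)
    # Going for two breaks even when P(2PT) * 2 equals P(XP) * 1
    break_even = xp_rate / 2
    print(f"{era}: EV(XP)={ev_xp:.3f}, EV(2PT)={ev_2pt:.3f}, "
          f"break-even 2PT rate={break_even:.1%}")
```

Pre-2015 the break-even rate was 49.75%, above the ~48% league conversion rate; post-2015 it dropped to 47%, just below it. That small gap is exactly the 0.02-point edge discussed above.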
Common Misconception: "Always Follow Expected Value"
Many analysts fall into the trap of believing that because two-point conversions have higher expected value (0.96 vs 0.94), teams should always go for two. This ignores several critical factors:

1. **Win probability vs. expected points**: Sometimes a guaranteed point changes win probability more than a higher-variance two-point attempt
2. **Information value**: Attempting (and failing) a two-point conversion reveals information that affects future decisions
3. **Risk tolerance**: In close games, minimizing variance may be more valuable than maximizing expected points
4. **Possession dynamics**: The number of remaining possessions affects whether variance helps or hurts

We'll explore these nuances throughout this chapter, demonstrating that optimal strategy requires game-state-specific analysis, not blanket rules.

Historical Trends
To understand how the rule change affected behavior and outcomes, we need to analyze comprehensive data spanning both eras. We'll examine two-point attempt rates and success rates from 2010 through 2023, allowing us to see both the pre-rule change baseline and the post-change evolution as coaches adapted their strategies.
The following analysis loads play-by-play data for all post-touchdown conversion attempts, classifies them as extra points or two-point attempts, and calculates success rates. This will reveal not just the average changes, but also year-to-year trends that show strategic evolution.
#| label: load-libraries-r
#| message: false
#| warning: false
# Load required libraries for data manipulation and visualization
library(tidyverse) # Core data manipulation (dplyr, ggplot2, etc.)
library(nflfastR) # NFL play-by-play data
library(nflplotR) # NFL-specific plotting utilities
library(gt) # Grammar of tables for nice table formatting
library(gtExtras) # Additional table styling options
library(patchwork) # Combining multiple plots
# Set consistent plot theme for all visualizations
theme_set(theme_minimal(base_size = 12))
#| label: load-historical-data-r
#| message: false
#| warning: false
#| cache: true
# Load play-by-play data from 2010-2023
# This provides 14 seasons: 5 pre-rule change (2010-2014) and 9 post-rule change (2015-2023)
pbp <- load_pbp(2010:2023)
# Filter for extra point and two-point conversion attempts
# We need both types to compare rates and success probabilities
conversion_attempts <- pbp %>%
filter(
# Include plays that are either two-point or extra point attempts
two_point_attempt == 1 | extra_point_attempt == 1,
# Exclude plays with missing team information (data quality check)
!is.na(posteam)
) %>%
mutate(
# Create readable attempt type label
attempt_type = case_when(
two_point_attempt == 1 ~ "Two-Point",
extra_point_attempt == 1 ~ "Extra Point"
),
# Define success for each attempt type
# Two-point: Check if conversion result was "success"
# Extra point: Check if result was "good"
success = case_when(
two_point_attempt == 1 ~ two_point_conv_result == "success",
extra_point_attempt == 1 ~ extra_point_result == "good"
),
# Flag for post-rule change era (2015 onward)
post_rule_change = season >= 2015
)
cat("Loaded", nrow(conversion_attempts), "conversion attempts from 2010-2023\n")
#| label: fig-historical-trends-r
#| fig-cap: "Two-point conversion attempt rates and success rates over time. The top panel shows that coaches increased two-point attempts after the 2015 rule change, recognizing the changed expected value calculus. The bottom panel reveals that extra point success dropped dramatically after 2015, while two-point conversion rates remained relatively stable, fundamentally altering the strategic landscape."
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false
# Calculate annual statistics for attempt rates and success rates
# This aggregation will show trends over time
annual_stats <- conversion_attempts %>%
group_by(season, attempt_type) %>%
summarise(
# Count total attempts of each type
attempts = n(),
# Calculate success rate (proportion of successful attempts)
success_rate = mean(success, na.rm = TRUE),
.groups = "drop"
) %>%
group_by(season) %>%
mutate(
# Calculate total attempts across both types
total_attempts = sum(attempts),
# Calculate percentage of attempts that were two-point conversions
# This shows how aggressive coaches were
pct_of_attempts = attempts / total_attempts
) %>%
ungroup()
# Plot 1: Two-point attempt rate over time
# This reveals strategic evolution in coaching decisions
p1 <- annual_stats %>%
filter(attempt_type == "Two-Point") %>%
ggplot(aes(x = season, y = pct_of_attempts)) +
geom_line(color = "#0077BE", linewidth = 1.2) +
geom_point(size = 3, color = "#0077BE") +
# Add vertical line at rule change
geom_vline(xintercept = 2015, linetype = "dashed", color = "red", alpha = 0.5) +
annotate("text", x = 2015, y = max(annual_stats$pct_of_attempts[annual_stats$attempt_type == "Two-Point"]),
label = "Rule Change", vjust = -0.5, color = "red") +
scale_y_continuous(labels = scales::percent_format(accuracy = 0.1)) +
scale_x_continuous(breaks = seq(2010, 2023, 2)) +
labs(
title = "Two-Point Conversion Attempt Rate",
subtitle = "Percentage of post-TD attempts that are two-point conversions",
x = NULL,
y = "% of Attempts"
) +
theme(plot.title = element_text(face = "bold"))
# Plot 2: Success rates over time for both attempt types
# This shows the impact of the rule change on outcomes
p2 <- annual_stats %>%
ggplot(aes(x = season, y = success_rate, color = attempt_type)) +
geom_line(linewidth = 1.2) +
geom_point(size = 3) +
# Add vertical line at rule change
geom_vline(xintercept = 2015, linetype = "dashed", color = "red", alpha = 0.5) +
scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
scale_x_continuous(breaks = seq(2010, 2023, 2)) +
scale_color_manual(
values = c("Two-Point" = "#0077BE", "Extra Point" = "#FF6B35"),
name = "Attempt Type"
) +
labs(
title = "Success Rates Over Time",
subtitle = "Extra point success dropped after 2015 rule change",
x = "Season",
y = "Success Rate"
) +
theme(
plot.title = element_text(face = "bold"),
legend.position = "top"
)
# Combine plots vertically for easy comparison
p1 / p2 +
plot_annotation(
caption = "Data: nflfastR | Rule change: 2015 (XP moved from 2-yard line to 15-yard line)"
)
#| label: load-libraries-py
#| message: false
#| warning: false
# Import required libraries for data analysis and visualization
import pandas as pd # Data manipulation
import numpy as np # Numerical operations
import nfl_data_py as nfl # NFL data access
import matplotlib.pyplot as plt # Plotting
import seaborn as sns # Statistical visualization
from scipy import stats # Statistical functions
# Set plot style for consistent, professional-looking visualizations
sns.set_style("whitegrid")
plt.rcParams['figure.dpi'] = 100
#| label: load-historical-data-py
#| message: false
#| warning: false
#| cache: true
# Load play-by-play data from 2010-2023
# range(2010, 2024) covers seasons 2010 through 2023
pbp = nfl.import_pbp_data(list(range(2010, 2024)))
# Filter for extra point and two-point conversion attempts
# We use boolean indexing to select relevant plays
conversion_attempts = pbp[
((pbp['two_point_attempt'] == 1) | (pbp['extra_point_attempt'] == 1)) &
(pbp['posteam'].notna())
].copy()
# Create attempt type classification
conversion_attempts['attempt_type'] = np.where(
conversion_attempts['two_point_attempt'] == 1,
'Two-Point',
'Extra Point'
)
# Define success based on attempt type
conversion_attempts['success'] = np.where(
conversion_attempts['two_point_attempt'] == 1,
conversion_attempts['two_point_conv_result'] == 'success',
conversion_attempts['extra_point_result'] == 'good'
)
# Flag post-rule change era
conversion_attempts['post_rule_change'] = conversion_attempts['season'] >= 2015
print(f"Loaded {len(conversion_attempts):,} conversion attempts from 2010-2023")
#| label: fig-historical-trends-py
#| fig-cap: "Two-point conversion attempt rates and success rates over time. The 2015 rule change created a clear inflection point in both coaching behavior (more two-point attempts) and outcomes (lower extra point success rates)."
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false
# Calculate annual statistics
# Group by season and attempt type to get yearly trends
annual_stats = (conversion_attempts
.groupby(['season', 'attempt_type'])
.agg(
attempts=('success', 'count'),
success_rate=('success', 'mean')
)
.reset_index()
)
# Calculate percentage of attempts that were two-point conversions
annual_stats['total_attempts'] = annual_stats.groupby('season')['attempts'].transform('sum')
annual_stats['pct_of_attempts'] = annual_stats['attempts'] / annual_stats['total_attempts']
# Create subplots for attempt rate and success rate trends
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))
# Plot 1: Two-point attempt rate over time
two_pt_data = annual_stats[annual_stats['attempt_type'] == 'Two-Point']
ax1.plot(two_pt_data['season'], two_pt_data['pct_of_attempts'],
color='#0077BE', linewidth=2, marker='o', markersize=6)
ax1.axvline(x=2015, color='red', linestyle='--', alpha=0.5, linewidth=1.5)
ax1.text(2015, two_pt_data['pct_of_attempts'].max(), 'Rule Change',
color='red', ha='center', va='bottom')
ax1.set_ylabel('% of Attempts', fontsize=11)
ax1.set_title('Two-Point Conversion Attempt Rate\nPercentage of post-TD attempts that are two-point conversions',
fontsize=12, fontweight='bold')
ax1.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.1%}'))
ax1.set_xticks(range(2010, 2024, 2))
ax1.grid(True, alpha=0.3)
# Plot 2: Success rates over time for both attempt types
for attempt_type, color in [('Two-Point', '#0077BE'), ('Extra Point', '#FF6B35')]:
data = annual_stats[annual_stats['attempt_type'] == attempt_type]
ax2.plot(data['season'], data['success_rate'],
color=color, linewidth=2, marker='o', markersize=6, label=attempt_type)
ax2.axvline(x=2015, color='red', linestyle='--', alpha=0.5, linewidth=1.5)
ax2.set_xlabel('Season', fontsize=11)
ax2.set_ylabel('Success Rate', fontsize=11)
ax2.set_title('Success Rates Over Time\nExtra point success dropped after 2015 rule change',
fontsize=12, fontweight='bold')
ax2.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))
ax2.set_xticks(range(2010, 2024, 2))
ax2.legend(loc='upper right', frameon=True)
ax2.grid(True, alpha=0.3)
plt.tight_layout()
fig.text(0.99, 0.01, 'Data: nfl_data_py | Rule change: 2015 (XP moved from 2-yard line to 15-yard line)',
ha='right', fontsize=8, style='italic')
plt.show()
Data Analysis Best Practice: Pre/Post Comparison
When analyzing rule changes or other interventions, always include sufficient data from both before and after the change. In this analysis, we used five pre-rule change seasons (2010-2014) and nine post-rule change seasons (2015-2023). This provides:

1. **Baseline establishment**: Multiple years before the change to establish normal variation
2. **Trend detection**: Enough post-change data to see if effects persist or fade
3. **Statistical power**: Larger sample sizes increase confidence in observed differences
4. **Context**: Ability to separate rule change effects from general NFL evolution

For similar analyses, aim for at least 3-5 years of data on each side of the intervention point.

Expected Value Analysis
Basic Expected Value Calculation
The fundamental decision framework for two-point conversions begins with expected value: multiply the probability of success by the points awarded. While we'll later see that expected points don't tell the whole story, they provide our starting point for analysis.
Expected value for each option is calculated as:
$$ \text{EV}_{\text{XP}} = P(\text{XP success}) \times 1 $$
$$ \text{EV}_{\text{2PT}} = P(\text{2PT success}) \times 2 $$
However, we need accurate estimates of these probabilities. Historical averages provide a baseline, but the rule change means we should analyze pre-2015 and post-2015 eras separately. Additionally, we need confidence intervals around our estimates—a 48% success rate with 100 attempts is very different from 48% with 10,000 attempts.
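The sample-size point can be made concrete with the same normal-approximation interval used below, $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}$. A quick illustrative comparison of a 48% success rate measured on 100 versus 10,000 attempts:

```python
import math

def binomial_ci(p: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% CI for a proportion using the normal approximation."""
    se = math.sqrt(p * (1 - p) / n)  # binomial standard error
    return p - z * se, p + z * se

for n in (100, 10_000):
    lo, hi = binomial_ci(0.48, n)
    print(f"n={n:>6}: 48% success rate, 95% CI = ({lo:.1%}, {hi:.1%})")
```

With 100 attempts the interval spans roughly 38% to 58%, wide enough to be strategically useless; with 10,000 attempts it narrows to about 47% to 49%.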
The following analysis calculates success rates separately for each era, computes expected values, and provides statistical confidence intervals. This allows us to quantify uncertainty in our estimates and determine whether observed differences are meaningful or could plausibly be due to random variation.
#| label: calculate-ev-r
#| message: false
#| warning: false
# Calculate success rates pre and post rule change
# We'll compute expected values and confidence intervals for each era
ev_comparison <- conversion_attempts %>%
# Create era labels for grouping
mutate(period = if_else(post_rule_change, "Post-2015", "Pre-2015")) %>%
# Group by era and attempt type
group_by(period, attempt_type) %>%
summarise(
# Count attempts and successes
attempts = n(),
successes = sum(success, na.rm = TRUE),
# Calculate success rate (proportion of successes)
success_rate = mean(success, na.rm = TRUE),
.groups = "drop"
) %>%
mutate(
# Calculate expected value based on attempt type
# Extra point: success_rate * 1 point
# Two-point: success_rate * 2 points
expected_value = case_when(
attempt_type == "Extra Point" ~ success_rate * 1,
attempt_type == "Two-Point" ~ success_rate * 2
),
# Calculate standard error using binomial formula: sqrt(p(1-p)/n)
std_error = sqrt(success_rate * (1 - success_rate) / attempts),
# Calculate 95% confidence interval bounds
# Using normal approximation: estimate ± 1.96 * SE
ev_lower = case_when(
attempt_type == "Extra Point" ~ (success_rate - 1.96 * std_error) * 1,
attempt_type == "Two-Point" ~ (success_rate - 1.96 * std_error) * 2
),
ev_upper = case_when(
attempt_type == "Extra Point" ~ (success_rate + 1.96 * std_error) * 1,
attempt_type == "Two-Point" ~ (success_rate + 1.96 * std_error) * 2
)
)
# Display table with formatting
ev_comparison %>%
select(period, attempt_type, attempts, success_rate, expected_value) %>%
arrange(period, desc(attempt_type)) %>%
gt() %>%
cols_label(
period = "Period",
attempt_type = "Attempt Type",
attempts = "Attempts",
success_rate = "Success Rate",
expected_value = "Expected Value"
) %>%
# Format numbers appropriately
fmt_number(
columns = c(success_rate, expected_value),
decimals = 3
) %>%
fmt_number(
columns = attempts,
decimals = 0,
use_seps = TRUE
) %>%
# Highlight maximum expected value in each period
tab_style(
style = cell_fill(color = "#E8F4F8"),
locations = cells_body(
columns = expected_value,
rows = expected_value == max(expected_value, na.rm = TRUE)
)
) %>%
tab_header(
title = "Expected Value Comparison",
subtitle = "Pre-2015 vs Post-2015 Rule Change"
) %>%
tab_source_note(
source_note = "95% confidence intervals calculated using normal approximation"
)
#| label: calculate-ev-py
#| message: false
#| warning: false
# Calculate success rates pre and post rule change
# Create period labels for grouping
conversion_attempts['period'] = np.where(
conversion_attempts['post_rule_change'],
'Post-2015',
'Pre-2015'
)
# Aggregate by period and attempt type
ev_comparison = (conversion_attempts
.groupby(['period', 'attempt_type'])
.agg(
attempts=('success', 'count'),
successes=('success', 'sum'),
success_rate=('success', 'mean')
)
.reset_index()
)
# Calculate expected values based on attempt type
ev_comparison['expected_value'] = np.where(
ev_comparison['attempt_type'] == 'Extra Point',
ev_comparison['success_rate'] * 1, # 1 point for extra point
ev_comparison['success_rate'] * 2 # 2 points for two-point conversion
)
# Calculate standard error using binomial formula
ev_comparison['std_error'] = np.sqrt(
ev_comparison['success_rate'] * (1 - ev_comparison['success_rate']) /
ev_comparison['attempts']
)
# Calculate 95% confidence interval bounds
ev_comparison['ev_lower'] = np.where(
ev_comparison['attempt_type'] == 'Extra Point',
(ev_comparison['success_rate'] - 1.96 * ev_comparison['std_error']) * 1,
(ev_comparison['success_rate'] - 1.96 * ev_comparison['std_error']) * 2
)
ev_comparison['ev_upper'] = np.where(
ev_comparison['attempt_type'] == 'Extra Point',
(ev_comparison['success_rate'] + 1.96 * ev_comparison['std_error']) * 1,
(ev_comparison['success_rate'] + 1.96 * ev_comparison['std_error']) * 2
)
# Display results
print("\nExpected Value Comparison: Pre-2015 vs Post-2015 Rule Change")
print("="*70)
print(ev_comparison[['period', 'attempt_type', 'attempts', 'success_rate', 'expected_value']]
.to_string(index=False))
The results reveal the magnitude of the rule change impact:
Pre-2015 Era:
- Extra points: 0.995 expected value (99.5% × 1 point)
- Two-point conversions: 0.960 expected value (48.0% × 2 points)
- Advantage to kicking: +0.035 expected points
Post-2015 Era:
- Extra points: 0.940 expected value (94.0% × 1 point)
- Two-point conversions: 0.960 expected value (48.0% × 2 points)
- Advantage to going for two: +0.020 expected points
This represents a swing of 0.055 expected points between the two eras—a substantial change that fundamentally alters optimal strategy. Over a full season of 40 touchdowns, always going for two (post-2015) would gain approximately 0.8 expected points compared to always kicking.
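The season-long claim can be sanity-checked with a quick Monte Carlo: simulate 40 post-touchdown decisions under each blanket policy and compare the distributions of conversion points. This is an illustrative sketch using the chapter's approximate rates, not a model of real schedules:

```python
import numpy as np

rng = np.random.default_rng(42)
N_SIMS, N_TDS = 100_000, 40
XP_RATE, TWO_PT_RATE = 0.94, 0.48  # approximate post-2015 rates

# Conversion points over a 40-touchdown season under each policy
always_kick = rng.binomial(N_TDS, XP_RATE, N_SIMS) * 1
always_go = rng.binomial(N_TDS, TWO_PT_RATE, N_SIMS) * 2

print(f"Always kick:       mean={always_kick.mean():.2f}, sd={always_kick.std():.2f}")
print(f"Always go for two: mean={always_go.mean():.2f}, sd={always_go.std():.2f}")
```

The always-go policy gains roughly 0.8 points on average (38.4 vs 37.6) but with about four times the standard deviation, which is precisely why variance considerations can override the expected-value edge in individual games.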
Why Expected Value Doesn't Tell the Whole Story
While two-point conversions have higher expected value post-2015, this doesn't mean teams should always go for two. Expected value optimization is appropriate for repeated decisions over many games (a season-long strategy), but individual game decisions should optimize win probability, not expected points. Consider these scenarios:

**Scenario 1**: Leading 28-26, you score a TD with 2:00 remaining, putting you up 8. Expected value says go for two (0.96 vs 0.94 points). But kicking almost guarantees a 9-point lead (a two-possession game), while a failed two-point attempt leaves the lead at 8, still a one-possession game in which a touchdown plus conversion ties. The near-certain two-possession lead is worth far more than 0.02 expected points.

**Scenario 2**: Trailing 14-6, you score a TD to make it 14-12. Expected value is nearly equal, but going for two gives you information: if it succeeds, the game is tied and any later score wins; if it fails, you know you still need a field goal to take the lead. This information value isn't captured in simple expected value calculations.

We'll formalize these intuitions later with win probability models that account for game state, time remaining, and future possession dynamics.

Situation-Specific Success Rates
While overall success rates provide a baseline, two-point conversion success likely varies by situation. Game context might affect success rates through several mechanisms:
- Defensive preparation: Late-game situations may allow defenses to better anticipate two-point plays
- Play calling: Desperation situations might force predictable play calls
- Execution pressure: High-stakes attempts may affect execution quality
- Score effects: Trailing teams might take more risks or use suboptimal plays
Understanding these situational differences helps us build more accurate probability models and make better real-time decisions. We'll analyze success rates by quarter, game situation, and score differential.
#| label: situational-success-r
#| message: false
#| warning: false
# Analyze two-point success by various factors
# Focus on post-2015 era for current relevance
two_pt_situations <- conversion_attempts %>%
filter(two_point_attempt == 1, season >= 2015) %>%
mutate(
# Capture score differential before the conversion attempt
# This is the score after TD but before the conversion
score_diff_before = score_differential,
# Categorize quarter and time situations
quarter_situation = case_when(
qtr <= 2 ~ "First Half",
qtr == 3 ~ "Third Quarter",
qtr == 4 & game_seconds_remaining > 300 ~ "Early 4th Q",
qtr == 4 & game_seconds_remaining <= 300 ~ "Late 4th Q",
TRUE ~ "Overtime"
),
# Classify play type from play description
# This is imperfect but gives us directional insight
play_type_cat = case_when(
grepl("pass", desc, ignore.case = TRUE) ~ "Pass",
grepl("rush|run", desc, ignore.case = TRUE) ~ "Run",
TRUE ~ "Other"
)
)
# Calculate success by quarter situation
quarter_success <- two_pt_situations %>%
group_by(quarter_situation) %>%
summarise(
attempts = n(),
success_rate = mean(success, na.rm = TRUE),
.groups = "drop"
) %>%
arrange(desc(attempts)) # Order by sample size
# Success by play type
play_type_success <- two_pt_situations %>%
filter(play_type_cat != "Other") %>%
group_by(play_type_cat) %>%
summarise(
attempts = n(),
success_rate = mean(success, na.rm = TRUE),
.groups = "drop"
)
# Display quarter situation table
quarter_success %>%
gt() %>%
cols_label(
quarter_situation = "Game Situation",
attempts = "Attempts",
success_rate = "Success Rate"
) %>%
fmt_number(
columns = success_rate,
decimals = 1,
scale_by = 100,
pattern = "{x}%"
) %>%
fmt_number(
columns = attempts,
decimals = 0,
use_seps = TRUE
) %>%
tab_header(
title = "Two-Point Conversion Success by Game Situation",
subtitle = "2015-2023 Seasons"
) %>%
tab_source_note(
source_note = "Data: nflfastR"
)
#| label: situational-success-py
#| message: false
#| warning: false
# Analyze two-point success by various factors
# Filter to post-2015 era for current relevance
two_pt_situations = conversion_attempts[
(conversion_attempts['two_point_attempt'] == 1) &
(conversion_attempts['season'] >= 2015)
].copy()
# Capture score differential before conversion
two_pt_situations['score_diff_before'] = two_pt_situations['score_differential']
# Create quarter situation categories
def quarter_situation(row):
"""Categorize play by quarter and time remaining"""
if row['qtr'] <= 2:
return 'First Half'
elif row['qtr'] == 3:
return 'Third Quarter'
elif row['qtr'] == 4 and row['game_seconds_remaining'] > 300:
return 'Early 4th Q'
elif row['qtr'] == 4 and row['game_seconds_remaining'] <= 300:
return 'Late 4th Q'
else:
return 'Overtime'
two_pt_situations['quarter_situation'] = two_pt_situations.apply(quarter_situation, axis=1)
# Classify play type from description
def play_type_cat(desc):
"""Extract play type from play description"""
if pd.isna(desc):
return 'Other'
desc_lower = str(desc).lower()
if 'pass' in desc_lower:
return 'Pass'
elif 'rush' in desc_lower or 'run' in desc_lower:
return 'Run'
else:
return 'Other'
two_pt_situations['play_type_cat'] = two_pt_situations['desc'].apply(play_type_cat)
# Calculate success by quarter situation
quarter_success = (two_pt_situations
.groupby('quarter_situation')
.agg(
attempts=('success', 'count'),
success_rate=('success', 'mean')
)
.reset_index()
.sort_values('attempts', ascending=False)
)
print("\nTwo-Point Conversion Success by Game Situation (2015-2023)")
print("="*60)
print(quarter_success.to_string(index=False))
# Calculate success by play type
play_type_success = (two_pt_situations[two_pt_situations['play_type_cat'] != 'Other']
.groupby('play_type_cat')
.agg(
attempts=('success', 'count'),
success_rate=('success', 'mean')
)
.reset_index()
)
print("\n\nTwo-Point Conversion Success by Play Type (2015-2023)")
print("="*60)
print(play_type_success.to_string(index=False))
These situational breakdowns reveal several interesting patterns:
Timing Effects: Success rates appear relatively stable across quarters, suggesting that defensive preparation and situational pressure don't dramatically affect outcomes. This is somewhat surprising—we might expect late-game pressure or defensive anticipation to reduce success rates, but the data doesn't strongly support this.
Play Type Differences: Pass plays typically show slightly higher success rates than run plays (typically 49-51% vs 45-47%), though the difference is smaller than many expect. This suggests offensive coordinators have reasonably balanced play calling, preventing defenses from selling out against one approach.
Sample Size Reality Check: While we observe differences between situations, many are likely within the range of random variation. With even 100-200 attempts per category, we'd expect success rates to vary by ±4-5 percentage points due to chance alone. Statistical significance testing would be needed to confirm which differences are meaningful.
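The significance testing mentioned above can be done with a standard two-proportion z-test. A sketch using hypothetical counts of the same order as the pass/run split discussed earlier (the specific numbers are illustrative, not computed from the data):

```python
import math

def two_proportion_z_test(success1: int, n1: int, success2: int, n2: int):
    """Two-sided z-test for a difference between two proportions."""
    p1, p2 = success1 / n1, success2 / n2
    p_pool = (success1 + success2) / (n1 + n2)  # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF (via erf)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical: 50% on 600 pass attempts vs 46% on 400 run attempts
z, p = two_proportion_z_test(300, 600, 184, 400)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Even a 4-point gap on these sample sizes yields z of about 1.24 and a p-value around 0.2, far from conventional significance, which reinforces the caution about over-interpreting situational splits.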
Avoiding the Small Sample Fallacy
When analyzing situational success rates, resist the temptation to over-interpret small differences based on limited data. For example, if "overtime" shows a 60% success rate but only has 15 attempts, this doesn't mean teams are dramatically better in overtime—it could easily be random variation. Rules of thumb for sample sizes:

- **10-50 attempts**: Treat success rates as very uncertain, ±10-15 percentage points
- **50-200 attempts**: Moderate confidence, ±5-8 percentage points
- **200-1000 attempts**: Good confidence, ±3-5 percentage points
- **1000+ attempts**: High confidence, ±1-3 percentage points

Always calculate confidence intervals to quantify uncertainty rather than treating point estimates as truth.

Building Two-Point Conversion Probability Models
Logistic Regression Model
While overall success rates provide a useful baseline, we can build more sophisticated models that account for multiple factors simultaneously. Logistic regression allows us to estimate how various features (home field, score differential, game situation, etc.) independently affect two-point conversion probability.
This modeling approach offers several advantages:
1. Simultaneous control: We can isolate each factor's effect while controlling for others
2. Probabilistic predictions: We get probability estimates for specific situations
3. Uncertainty quantification: Coefficients come with confidence intervals
4. Interpretability: Coefficients show which factors matter most
We'll build a logistic regression model using features that theory suggests might affect success:
- Home field advantage: Teams converting at home may benefit from crowd noise and familiarity
- Score differential: Trailing/leading status might affect play calling or defensive alignment
- Late game pressure: Critical situations might affect execution
- Weather conditions: Roof type (dome/outdoor/retractable) may affect passing success
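Before fitting the full model on real data, the mechanics of logistic regression can be seen in a self-contained sketch fit by Newton's method (iteratively reweighted least squares) on synthetic attempts. All data, feature names, and coefficient values here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Synthetic two-point attempts: home indicator and score differential
home = rng.integers(0, 2, n).astype(float)
score_diff = rng.normal(0, 7, n)
X = np.column_stack([np.ones(n), home, score_diff])  # intercept, home, score_diff

# "True" log-odds: baseline near 48%, small home boost, no score effect (assumed)
true_beta = np.array([-0.08, 0.10, 0.0])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ true_beta))).astype(float)

# Newton's method on the log-likelihood (IRLS), the standard GLM fitting approach
beta = np.zeros(3)
for _ in range(25):
    p_hat = 1 / (1 + np.exp(-X @ beta))     # current predicted probabilities
    W = p_hat * (1 - p_hat)                 # observation weights
    grad = X.T @ (y - p_hat)                # score (gradient of log-likelihood)
    H = (X * W[:, None]).T @ X              # Fisher information matrix
    beta += np.linalg.solve(H, grad)        # Newton update

print("estimated coefficients:", np.round(beta, 3))
print("odds ratio for home:", round(float(np.exp(beta[1])), 3))
```

Exponentiating a coefficient gives the odds ratio, the same interpretation used for the nflfastR model's coefficient table below. With 5,000 attempts the home effect is recovered only roughly, echoing the sample-size caveats from earlier in the chapter.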
#| label: two-pt-model-r
#| message: false
#| warning: false
# Prepare modeling data with relevant features
model_data <- conversion_attempts %>%
filter(
two_point_attempt == 1, # Only two-point conversion attempts
season >= 2015, # Post-rule change era
!is.na(success) # Must have known outcome
) %>%
mutate(
# Create binary indicator for home attempts
home_attempt = if_else(posteam == home_team, 1, 0),
# Score differential (positive = leading, negative = trailing)
score_diff = score_differential,
# Late game indicator (4th quarter, <5 minutes)
is_late_game = if_else(qtr == 4 & game_seconds_remaining < 300, 1, 0),
# Simplify roof types to three categories
roof_type = case_when(
roof == "outdoors" ~ "Outdoor",
roof == "dome" ~ "Dome",
TRUE ~ "Retractable"
)
) %>%
# Select modeling variables and remove any rows with missing data
select(success, home_attempt, score_diff, is_late_game, qtr, roof_type) %>%
filter(complete.cases(.))
# Fit logistic regression model
# Logistic regression models binary outcomes (success/failure)
# It estimates log-odds of success as a linear function of predictors
model <- glm(
success ~ home_attempt + score_diff + is_late_game + qtr + roof_type,
data = model_data,
family = binomial(link = "logit") # Logistic regression specification
)
# Display model summary
summary(model)
# Generate predicted probabilities for each attempt
# type = "response" converts log-odds to probabilities
model_data$predicted_prob <- predict(model, type = "response")
# Evaluate model performance
library(pROC)
# ROC curve and AUC measure discrimination ability
# AUC of 1.0 = perfect predictions, 0.5 = no better than random
roc_obj <- roc(model_data$success, model_data$predicted_prob)
auc_value <- auc(roc_obj)
# Brier score measures calibration (lower is better)
# It's the mean squared error of predicted probabilities
brier_score <- mean((model_data$success - model_data$predicted_prob)^2)
cat("\nModel Performance:\n")
cat("AUC:", round(auc_value, 3), "\n")
cat("Brier Score:", round(brier_score, 4), "\n")
# Create interpretable coefficient table
coef_df <- data.frame(
variable = names(coef(model)),
coefficient = coef(model),
odds_ratio = exp(coef(model)), # Exponentiate to get odds ratios
p_value = summary(model)$coefficients[, 4]
) %>%
filter(variable != "(Intercept)")
# Display coefficient table
coef_df %>%
gt() %>%
cols_label(
variable = "Variable",
coefficient = "Coefficient",
odds_ratio = "Odds Ratio",
p_value = "P-value"
) %>%
fmt_number(
columns = c(coefficient, odds_ratio),
decimals = 3
) %>%
fmt_number(
columns = p_value,
decimals = 4
) %>%
tab_header(
title = "Two-Point Conversion Success Model",
subtitle = "Logistic Regression Coefficients"
)
#| label: two-pt-model-py
#| message: false
#| warning: false
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, brier_score_loss
from sklearn.preprocessing import LabelEncoder
import warnings
warnings.filterwarnings('ignore')
# Prepare modeling data with relevant features
model_data = conversion_attempts[
(conversion_attempts['two_point_attempt'] == 1) &
(conversion_attempts['season'] >= 2015) &
(conversion_attempts['success'].notna())
].copy()
# Create feature variables
model_data['home_attempt'] = (model_data['posteam'] == model_data['home_team']).astype(int)
model_data['score_diff'] = model_data['score_differential']
model_data['is_late_game'] = (
(model_data['qtr'] == 4) &
(model_data['game_seconds_remaining'] < 300)
).astype(int)
# Categorize roof type
def roof_type(roof):
if roof == 'outdoors':
return 'Outdoor'
elif roof == 'dome':
return 'Dome'
else:
return 'Retractable'
model_data['roof_type'] = model_data['roof'].apply(roof_type)
# Select features and remove missing values
features = ['home_attempt', 'score_diff', 'is_late_game', 'qtr', 'roof_type']
model_data_clean = model_data[features + ['success']].dropna()
# Encode categorical variables (roof_type) as numeric
# Note: integer label encoding imposes an arbitrary ordering on a nominal
# variable; one-hot encoding (pd.get_dummies) is generally preferable for
# linear models, but label encoding keeps this example compact
le = LabelEncoder()
model_data_clean['roof_encoded'] = le.fit_transform(model_data_clean['roof_type'])
# Prepare feature matrix (X) and target vector (y)
X = model_data_clean[['home_attempt', 'score_diff', 'is_late_game', 'qtr', 'roof_encoded']]
y = model_data_clean['success'].astype(int)
# Fit logistic regression model
# sklearn applies L2 regularization by default; a large C makes the fit
# effectively unregularized, matching R's glm()
model = LogisticRegression(C=1e6, random_state=42, max_iter=1000)
model.fit(X, y)
# Generate predicted probabilities
# [:, 1] selects probability of success (class 1)
y_pred_proba = model.predict_proba(X)[:, 1]
# Calculate performance metrics
auc_value = roc_auc_score(y, y_pred_proba)
brier_score = brier_score_loss(y, y_pred_proba)
print("\nTwo-Point Conversion Success Model")
print("="*60)
print(f"AUC: {auc_value:.3f}")
print(f"Brier Score: {brier_score:.4f}")
# Display coefficients with odds ratios
coef_df = pd.DataFrame({
'Variable': ['home_attempt', 'score_diff', 'is_late_game', 'qtr', 'roof_encoded'],
'Coefficient': model.coef_[0],
'Odds_Ratio': np.exp(model.coef_[0]) # Exponentiate for interpretability
})
print("\nModel Coefficients:")
print(coef_df.to_string(index=False))
Understanding Model Performance
The AUC value of approximately 0.52-0.54 might seem low, but this is actually expected for two-point conversion prediction. Here's why: Two-point conversions are inherently high-variance events. Even if we knew every relevant factor—offensive and defensive quality, play call, defensive alignment—there's still massive randomness in execution: dropped passes, broken tackles, referee calls, lucky bounces. Compare to other prediction tasks:
- **Coin flips**: AUC = 0.50 (pure random)
- **Two-point conversions**: AUC ≈ 0.53 (mostly random with small signal)
- **Fourth down conversions**: AUC ≈ 0.65 (more predictable factors)
- **Game winners**: AUC ≈ 0.75 (many predictive factors)
An AUC of 0.53 means our model performs slightly better than random chance—we've captured a small amount of signal in a very noisy system. This is still useful: improving from 48% to 50% success probability changes expected value from 0.96 to 1.00 points, a meaningful difference.
The model results typically show:
Significant Factors:
- Home field: Small positive effect (1-2 percentage points), likely due to crowd noise disrupting defensive communication
- Score differential: Minimal effect, suggesting teams don't significantly change success rates when desperate
- Late game: Slightly negative effect, possibly due to defensive preparation or offensive predictability
Insignificant Factors:
- Quarter: Little systematic variation across quarters
- Roof type: Minimal effect on success rates
Strategic Implications: The relatively flat effects across most variables suggest that baseline success rate (~48%) is a reasonable estimate for most situations. Adjustments based on game context should be modest (±2-3 percentage points at most) unless we have strong team-specific information.
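The baseline expected-value comparison behind that ~48% figure takes only a few lines (the success rates are the chapter's assumed post-2015 baselines):

```python
p_2pt, p_xp = 0.48, 0.94  # assumed baseline success rates (post-2015)

ev_2pt = 2 * p_2pt  # expected points from going for two
ev_xp = 1 * p_xp    # expected points from kicking

# The 2PT success rate at which both options have equal expected value
breakeven = p_xp / 2

print(f"EV(2PT) = {ev_2pt:.2f} points, EV(XP) = {ev_xp:.2f} points")
print(f"Break-even 2PT success rate: {breakeven:.0%}")
```

The break-even rate of 47% sits just below the observed league rate, which is why modest team- or situation-specific adjustments can flip the expected-value verdict in either direction.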
Score-Specific Optimal Strategies
The optimal two-point decision depends heavily on the score differential. While expected value analysis suggests always going for two (0.96 vs 0.94 points), win probability considerations often override this. The key insight is that specific score differentials create strategic inflection points where certain point totals fundamentally change the path to winning.
Traditional analytical wisdom identifies several critical scenarios for going for two:
- Down 8 after a TD (trailed by 14 before scoring): Going for two early sets up a potential tie with a later TD + 2PT, and provides information about what's needed
- Down 2 after a TD (trailed by 8 before scoring): Going for two attempts to tie the game immediately
- Up 1 after a TD (trailed by 5 before scoring): Going for two attempts to create a 3-point lead (field goal margin)
- Late game scenarios: Any situation where specific point differentials affect winning paths
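The down-14 logic can be sanity-checked with a quick calculation. Conditional on scoring two more touchdowns while holding the opponent scoreless, ignoring missed extra points, and treating overtime as a coin flip (all illustrative assumptions), going for two early wins more often:

```python
p2, p_ot = 0.48, 0.5  # assumed 2PT success rate; overtime as a coin flip

# Strategy A: kick both extra points -> tied at the end -> overtime
wp_kick_both = p_ot

# Strategy B: go for two after the first TD
#   success (down 6): second TD + XP wins in regulation
#   failure (down 8): second TD + 2PT ties -> overtime
wp_go_early = p2 * 1.0 + (1 - p2) * (p2 * p_ot)

print(f"Kick both XPs:    win prob {wp_kick_both:.3f}")
print(f"Go for two early: win prob {wp_go_early:.3f}")
```

Even this rough sketch shows the early attempt adding roughly ten percentage points of win probability over playing for overtime, which is why the recommendation survives far more detailed models.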
Let's formalize this intuition by creating decision charts that map score differentials to optimal decisions.
The Information Value of Two-Point Attempts
One underappreciated aspect of two-point decisions is their information value. When you attempt a two-point conversion, the result provides information that affects optimal strategy for the remainder of the game.
**Example**: Down 14, you score a TD (now down 8). If you go for two:
- **Success**: You're down 6. Now you know a second TD ties and the XP wins.
- **Failure**: You're down 8. Now you know you need a second TD plus a 2PT to tie.
This information helps you optimize clock management and play calling on subsequent drives. If you kick (down 7), you defer the conversion decision until after your next touchdown, when there's no time left to adjust if the attempt fails.
This information value isn't captured in simple expected value calculations but can be formalized in dynamic programming models that account for decision trees.
The Classic Two-Point Chart
The classic two-point conversion chart provides guidance for trailing teams based on score differential after a touchdown. These charts, popularized by analysts like Kevin Cole and Brian Burke, codify the situations where analytics clearly favor going for two.
#| label: two-point-chart-r
#| message: false
#| warning: false
#| fig-width: 10
#| fig-height: 8
# Create two-point decision chart for trailing scenarios
# This function encodes analytical guidance for when to go for two
create_two_point_chart <- function(p_2pt = 0.48, p_xp = 0.94) {
# Score differentials to consider (when trailing before scoring TD)
score_diffs <- 1:14
# Calculate optimal decision for each scenario
decisions <- tibble(
score_diff_after_td = score_diffs - 6, # Points still trailing after the TD (negative = now leading)
go_for_2 = case_when(
# Down 8 (now down 2): Go for 2 to tie immediately
score_diffs == 8 ~ TRUE,
# Down 14 (now down 8): Go for 2 to set up second 2PT for tie
score_diffs == 14 ~ TRUE,
# Down 7 (now down 1): Kick to tie (most risk-averse option)
score_diffs == 7 ~ FALSE,
# Down 1-6: Generally kick to take lead or close gap
score_diffs <= 6 ~ FALSE,
# Down 9-13: Complex situations, generally kick
TRUE ~ FALSE
),
# Provide reasoning for each decision
reasoning = case_when(
score_diffs == 8 ~ "Go for tie",
score_diffs == 14 ~ "Set up 2nd 2PT to tie",
score_diffs == 7 ~ "Take the tie",
score_diffs <= 6 ~ "Take lead/close gap",
TRUE ~ "Kick XP"
)
)
return(decisions)
}
# Generate chart data
chart_data <- create_two_point_chart()
# Create visual representation
chart_data %>%
mutate(
decision_label = if_else(go_for_2, "GO FOR 2", "KICK XP"),
# score_diff_after_td is the points still trailing after the TD
# (negative values mean the team now leads)
score_label = case_when(
score_diff_after_td > 0 ~ paste0("Down ", score_diff_after_td),
score_diff_after_td == 0 ~ "Tied",
TRUE ~ paste0("Up ", -score_diff_after_td)
)
) %>%
ggplot(aes(x = reorder(score_label, -score_diff_after_td),
y = 1,
fill = go_for_2)) +
geom_tile(color = "white", size = 2) +
# Add decision text
geom_text(aes(label = decision_label),
size = 5, fontface = "bold", color = "white") +
# Add reasoning text below decision
geom_text(aes(label = reasoning),
y = 0.6, size = 3.5, color = "white") +
scale_fill_manual(
values = c("TRUE" = "#D50032", "FALSE" = "#0076CE"),
guide = "none"
) +
labs(
title = "Two-Point Conversion Decision Chart",
subtitle = "Optimal decisions when TRAILING after scoring a touchdown",
x = "Score Differential (After TD, Before Conversion Attempt)",
y = NULL,
caption = "Assumes: 48% 2PT success, 94% XP success\nChart represents general strategy; late-game situations require dynamic analysis"
) +
theme_minimal() +
theme(
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
panel.grid = element_blank(),
plot.title = element_text(face = "bold", size = 16),
plot.subtitle = element_text(size = 11),
axis.text.x = element_text(size = 11, face = "bold")
) +
coord_fixed(ratio = 3)
#| label: two-point-chart-py
#| message: false
#| warning: false
#| fig-width: 10
#| fig-height: 8
def create_two_point_chart(p_2pt=0.48, p_xp=0.94):
"""
Create two-point decision chart for trailing scenarios
Args:
p_2pt: Probability of two-point conversion success
p_xp: Probability of extra point success
Returns:
DataFrame with optimal decisions by score differential
"""
score_diffs = list(range(1, 15))
decisions = []
for diff in score_diffs:
# Points still trailing after the TD (negative = now leading)
score_diff_after_td = diff - 6
# Apply decision logic based on score differential
if diff == 8:
go_for_2 = True
reasoning = "Go for tie"
elif diff == 14:
go_for_2 = True
reasoning = "Set up 2nd 2PT to tie"
elif diff == 7:
go_for_2 = False
reasoning = "Take the tie"
elif diff <= 6:
go_for_2 = False
reasoning = "Take lead/close gap"
else:
go_for_2 = False
reasoning = "Kick XP"
decisions.append({
'score_diff_after_td': score_diff_after_td,
'go_for_2': go_for_2,
'reasoning': reasoning
})
return pd.DataFrame(decisions)
# Generate chart data
chart_data = create_two_point_chart()
# Create visualization
fig, ax = plt.subplots(figsize=(12, 4))
for idx, row in chart_data.iterrows():
decision_label = "GO FOR 2" if row['go_for_2'] else "KICK XP"
color = '#D50032' if row['go_for_2'] else '#0076CE'
# Draw rectangle for each score scenario
rect = plt.Rectangle((idx, 0), 1, 1, facecolor=color, edgecolor='white', linewidth=2)
ax.add_patch(rect)
# Add decision label
ax.text(idx + 0.5, 0.7, decision_label,
ha='center', va='center', fontsize=10, fontweight='bold', color='white')
# Add reasoning label
ax.text(idx + 0.5, 0.3, row['reasoning'],
ha='center', va='center', fontsize=8, color='white')
# Configure plot
ax.set_xlim(0, len(chart_data))
ax.set_ylim(0, 1)
def diff_label(d):
    # d is the points still trailing after the TD (negative = now leading)
    return f"Down {d}" if d > 0 else ("Tied" if d == 0 else f"Up {-d}")
ax.set_xticks([i + 0.5 for i in range(len(chart_data))])
ax.set_xticklabels([diff_label(row['score_diff_after_td'])
                    for _, row in chart_data.iterrows()],
                   fontweight='bold', fontsize=10)
ax.set_yticks([])
ax.set_xlabel('Score Differential (After TD, Before Conversion Attempt)',
fontsize=12, fontweight='bold')
ax.set_title('Two-Point Conversion Decision Chart\nOptimal decisions when TRAILING after scoring a touchdown',
fontsize=14, fontweight='bold', pad=20)
ax.text(0.5, -0.4, 'Assumes: 48% 2PT success, 94% XP success\nChart represents general strategy; late-game situations require dynamic analysis',
transform=ax.transAxes, ha='center', fontsize=8, style='italic')
plt.tight_layout()
plt.show()
The decision chart reveals an important principle: two-point conversions are most valuable when they enable or prevent specific score differentials that fundamentally alter win paths. Converting when down 2 (after the touchdown) ties the game—a massive win probability change. Converting when down 5 makes it a 3-point game instead of a 4-point game—a smaller win probability change.
Why "Down 14" is Counterintuitive
Many coaches and fans find the "go for two when down 14" recommendation counterintuitive. The reasoning seems backwards: why try the harder option first when you could kick two easy extra points? The key is information and optionality:
**Strategy 1: Kick both extra points**
- First TD: Kick → down 7
- Second TD: Kick → tied (overtime), or go for two → win or lose on a single play
- Problem: You don't learn whether you need a two-point conversion until after the second touchdown, when there is no time left to recover from a failure
**Strategy 2: Go for two on first TD, adjust based on result**
- First TD: Go for 2
- Success → down 6 → Second TD ties, and the XP wins in regulation
- Failure → down 8 → Second TD + 2PT still ties
- Advantage: Maintains optionality—a failed early attempt leaves time to adjust clock management and play calling, while a successful one converts a likely overtime into a regulation win
The second strategy gives you flexibility to adapt based on how the game unfolds, while the first strategy locks you into a predetermined path.
Dynamic Programming for Late-Game Decisions
The simple decision chart provides good heuristics, but late-game situations require more sophisticated analysis. When time is limited, we need to consider:
- Expected possessions: How many more drives will each team have?
- Clock management: Can the opponent run out the clock if they get the ball?
- Win/tie/loss probabilities: Different outcomes have different values in different time situations
- Downstream scenarios: How does this decision affect future decision points?
Dynamic programming provides a framework for this analysis. We work backwards from the end of the game, calculating optimal strategies at each decision point given optimal play thereafter. While a full dynamic programming solution is complex (requiring possession models, scoring probabilities, etc.), we can build a simplified model that captures the key insights.
#| label: dynamic-programming-r
#| message: false
#| warning: false
# Dynamic programming for optimal two-point decisions
# This is a simplified model for demonstration purposes
# A full model would incorporate possession probabilities, opponent scoring rates, etc.
calculate_win_probability_2pt <- function(
score_diff_after_td, # Score after TD, before conversion
time_remaining_seconds,
p_2pt = 0.48,
p_xp = 0.94
) {
# This simplified model uses score differential as proxy for win probability
# In practice, you'd use a full win probability model incorporating:
# - Time remaining
# - Timeouts
# - Possession dynamics
# - Team quality
# - Field position
# Calculate score after each possible outcome
score_after_2pt_success <- score_diff_after_td + 2
score_after_2pt_fail <- score_diff_after_td
score_after_xp_success <- score_diff_after_td + 1
score_after_xp_fail <- score_diff_after_td
# Simple win probability based on score differential
# Using normal CDF centered at 0 as approximation
# Positive score differential → higher win probability
wp_2pt_success <- pnorm(score_after_2pt_success / 7)
wp_2pt_fail <- pnorm(score_after_2pt_fail / 7)
wp_xp_success <- pnorm(score_after_xp_success / 7)
wp_xp_fail <- pnorm(score_after_xp_fail / 7)
# Calculate expected win probability for each decision
ev_2pt <- p_2pt * wp_2pt_success + (1 - p_2pt) * wp_2pt_fail
ev_xp <- p_xp * wp_xp_success + (1 - p_xp) * wp_xp_fail
return(list(
ev_2pt = ev_2pt,
ev_xp = ev_xp,
optimal = if_else(ev_2pt > ev_xp, "Go for 2", "Kick XP"),
advantage = ev_2pt - ev_xp
))
}
# Create decision matrix for different score/time scenarios
scenarios <- expand_grid(
score_diff = seq(-8, -1, 1), # Trailing by 1-8 after TD
time_remaining = c(600, 300, 120, 60) # 10:00, 5:00, 2:00, 1:00
) %>%
rowwise() %>%
mutate(
# Calculate optimal decision for each scenario
# (inside rowwise(), the list-column element must be unwrapped with [[1]])
result = list(calculate_win_probability_2pt(score_diff, time_remaining)),
ev_2pt = result[[1]]$ev_2pt,
ev_xp = result[[1]]$ev_xp,
optimal = result[[1]]$optimal,
advantage = result[[1]]$advantage
) %>%
ungroup() %>%
mutate(
# Create readable time labels
time_label = case_when(
time_remaining == 600 ~ "10:00",
time_remaining == 300 ~ "5:00",
time_remaining == 120 ~ "2:00",
time_remaining == 60 ~ "1:00"
)
)
# Visualize decision matrix as heatmap
scenarios %>%
# Order time labels chronologically rather than alphabetically
mutate(time_label = factor(time_label, levels = c("10:00", "5:00", "2:00", "1:00"))) %>%
ggplot(aes(x = factor(score_diff), y = time_label, fill = advantage)) +
geom_tile(color = "white", size = 1) +
# Add optimal decision text to each cell
geom_text(aes(label = optimal), size = 4, fontface = "bold") +
scale_fill_gradient2(
low = "#0076CE", # Blue for "kick XP"
mid = "white", # White for neutral
high = "#D50032", # Red for "go for 2"
midpoint = 0,
name = "WP Advantage\n(2PT - XP)"
) +
labs(
title = "Dynamic Two-Point Conversion Decisions",
subtitle = "Optimal strategy by score and time remaining (simplified model)",
x = "Score Differential (After TD, Before Conversion)",
y = "Time Remaining",
caption = "Assumes: 48% 2PT success, 94% XP success | Model simplified for illustration"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14),
legend.position = "right"
)
#| label: dynamic-programming-py
#| message: false
#| warning: false
from scipy.stats import norm
def calculate_win_probability_2pt(
score_diff_after_td,
time_remaining_seconds,
p_2pt=0.48,
p_xp=0.94
):
"""
Calculate expected win probability for 2PT vs XP decision
This is a simplified model for demonstration. A full model would incorporate:
- Expected possessions remaining
- Opponent scoring probabilities
- Field position effects
- Timeout situations
- Team-specific factors
Args:
score_diff_after_td: Score differential after TD, before conversion attempt
time_remaining_seconds: Seconds remaining in game
p_2pt: Probability of two-point conversion success
p_xp: Probability of extra point success
Returns:
Dictionary with expected win probabilities and optimal decision
"""
# Calculate score after each possible outcome
score_after_2pt_success = score_diff_after_td + 2
score_after_2pt_fail = score_diff_after_td
score_after_xp_success = score_diff_after_td + 1
score_after_xp_fail = score_diff_after_td
# Simple win probability based on score differential
# Using normal CDF as approximation (centered at 0, scaled by 7)
wp_2pt_success = norm.cdf(score_after_2pt_success / 7)
wp_2pt_fail = norm.cdf(score_after_2pt_fail / 7)
wp_xp_success = norm.cdf(score_after_xp_success / 7)
wp_xp_fail = norm.cdf(score_after_xp_fail / 7)
# Calculate expected win probability for each decision
ev_2pt = p_2pt * wp_2pt_success + (1 - p_2pt) * wp_2pt_fail
ev_xp = p_xp * wp_xp_success + (1 - p_xp) * wp_xp_fail
return {
'ev_2pt': ev_2pt,
'ev_xp': ev_xp,
'optimal': 'Go for 2' if ev_2pt > ev_xp else 'Kick XP',
'advantage': ev_2pt - ev_xp
}
# Create decision matrix for various score/time scenarios
score_diffs = list(range(-8, 0)) # Down 8 to down 1
time_remaining_values = [600, 300, 120, 60] # Different time scenarios
time_labels = {600: '10:00', 300: '5:00', 120: '2:00', 60: '1:00'}
scenarios = []
for score_diff in score_diffs:
for time_remaining in time_remaining_values:
result = calculate_win_probability_2pt(score_diff, time_remaining)
scenarios.append({
'score_diff': score_diff,
'time_remaining': time_remaining,
'time_label': time_labels[time_remaining],
'ev_2pt': result['ev_2pt'],
'ev_xp': result['ev_xp'],
'optimal': result['optimal'],
'advantage': result['advantage']
})
scenarios_df = pd.DataFrame(scenarios)
# Create heatmap visualization
# Order rows chronologically (pivot would otherwise sort time labels alphabetically)
time_order = ['10:00', '5:00', '2:00', '1:00']
pivot_data = scenarios_df.pivot(index='time_label', columns='score_diff', values='advantage').reindex(time_order)
pivot_labels = scenarios_df.pivot(index='time_label', columns='score_diff', values='optimal').reindex(time_order)
fig, ax = plt.subplots(figsize=(12, 6))
# Create heatmap with color gradient
im = ax.imshow(pivot_data.values, cmap='RdBu_r', aspect='auto', vmin=-0.1, vmax=0.1)
# Set axis ticks and labels
ax.set_xticks(np.arange(len(pivot_data.columns)))
ax.set_yticks(np.arange(len(pivot_data.index)))
ax.set_xticklabels(pivot_data.columns)
ax.set_yticklabels(pivot_data.index)
# Add text annotations showing optimal decision
for i in range(len(pivot_data.index)):
for j in range(len(pivot_data.columns)):
text = ax.text(j, i, pivot_labels.iloc[i, j],
ha="center", va="center", color="black", fontweight='bold', fontsize=9)
# Labels and title
ax.set_xlabel('Score Differential (After TD, Before Conversion)', fontsize=12, fontweight='bold')
ax.set_ylabel('Time Remaining', fontsize=12, fontweight='bold')
ax.set_title('Dynamic Two-Point Conversion Decisions\nOptimal strategy by score and time remaining (simplified model)',
fontsize=14, fontweight='bold', pad=20)
# Add colorbar
cbar = plt.colorbar(im, ax=ax)
cbar.set_label('WP Advantage\n(2PT - XP)', rotation=270, labelpad=20, fontweight='bold')
plt.tight_layout()
fig.text(0.5, 0.01, 'Assumes: 48% 2PT success, 94% XP success | Model simplified for illustration',
ha='center', fontsize=8, style='italic')
plt.show()
Even with this simplified model, we can see key patterns:
Down 2 (trailing by 8, scored TD): Strong preference for going for two across all time scenarios. Converting ties the game immediately—a massive win probability boost that far outweighs the expected value calculation.
Down 1 (trailing by 7, scored TD): Preference for kicking across all times. Guaranteeing a tie is worth more than the variance of a two-point attempt, especially with sufficient time for overtime.
Intermediate Scores: More nuanced decisions where time remaining would matter more in a full model. With very little time, guaranteed points become more valuable because there's no chance to make up failed attempts.
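Before outlining what a complete model requires, a toy backward induction over a few alternating possessions shows the mechanics. Every rate here (drive touchdown probability, conversion rates) is an illustrative assumption, and field goals, clock, and field position are ignored:

```python
from functools import lru_cache

# Toy backward induction: possessions alternate, each drive scores a TD with
# probability Q_TD (no field goals, no clock). All rates are illustrative.
Q_TD, P_2PT, P_XP = 0.25, 0.48, 0.94

@lru_cache(maxsize=None)
def win_prob(diff, drives_left, our_ball):
    """P(we win) given score differential; ties are worth 0.5 (coin-flip OT)."""
    if drives_left == 0:
        return 1.0 if diff > 0 else (0.5 if diff == 0 else 0.0)
    if our_ball:
        no_score = (1 - Q_TD) * win_prob(diff, drives_left - 1, False)
        # After a TD (+6), take whichever conversion choice is better
        go2 = (P_2PT * win_prob(diff + 8, drives_left - 1, False)
               + (1 - P_2PT) * win_prob(diff + 6, drives_left - 1, False))
        kick = (P_XP * win_prob(diff + 7, drives_left - 1, False)
                + (1 - P_XP) * win_prob(diff + 6, drives_left - 1, False))
        return no_score + Q_TD * max(go2, kick)
    no_score = (1 - Q_TD) * win_prob(diff, drives_left - 1, True)
    # Opponent simply kicks the XP after a TD in this sketch
    opp_score = Q_TD * (P_XP * win_prob(diff - 7, drives_left - 1, True)
                        + (1 - P_XP) * win_prob(diff - 6, drives_left - 1, True))
    return no_score + opp_score

# We just scored to pull within 2 (down 8 before the TD); two drives remain
go2 = P_2PT * win_prob(0, 2, False) + (1 - P_2PT) * win_prob(-2, 2, False)
kick = P_XP * win_prob(-1, 2, False) + (1 - P_XP) * win_prob(-2, 2, False)
print(f"Down 2 after TD: go-for-2 WP = {go2:.3f}, kick WP = {kick:.3f}")
```

Under these assumptions the recursion strongly prefers going for two when down 2, echoing the heatmap above; a production model would enrich the state space rather than change this basic structure.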
Building a Full Dynamic Programming Model
A complete dynamic programming model for two-point decisions would need:
1. **State Space**: Define all possible game states
- Score differential
- Time remaining
- Possession (who has the ball)
- Field position
- Timeouts remaining
- Down and distance
2. **Transition Probabilities**: Model how states evolve
- Probability of scoring on a drive by field position
- Time taken per drive
- Probability of defensive stops
- Turnover probabilities
3. **Value Function**: Define value of each state
- Terminal states (game over): 1 for win, 0 for loss, 0.5 for tie
- Non-terminal states: Expected value of optimal future play
4. **Backward Induction**: Work backwards from game end
- Start with 0:00 remaining (terminal states)
- Calculate optimal decisions and values for each state
- Move backwards in time until game start
This is computationally intensive but provides exact optimal strategies for all scenarios. Chapter 24 on fourth down decisions covers similar dynamic programming approaches in more detail.
Play-Calling on Two-Point Attempts
Beyond deciding when to attempt two-point conversions, teams must decide how to attempt them. Should teams pass or run? From what formations? With what personnel packages? These tactical decisions significantly affect success probability.
Analyzing play-calling patterns reveals strategic tendencies and helps identify best practices. We can examine:
1. Pass vs. run balance: Do teams pass or run more often?
2. Success rates by play type: Which approach works better?
3. Trends over time: Has play-calling evolved?
4. Predictability: Do teams become too predictable in their calls?
Historical Play-Calling Trends
The following analysis examines how teams have called plays on two-point conversions since the 2015 rule change, tracking both the distribution of pass vs. run plays and their relative success rates.
#| label: play-calling-analysis-r
#| message: false
#| warning: false
#| fig-width: 10
#| fig-height: 6
# Analyze play-calling on two-point attempts
# We'll classify plays as pass or run and track success rates
two_pt_plays <- conversion_attempts %>%
filter(
two_point_attempt == 1,
season >= 2015
) %>%
mutate(
# Classify play type from play description text
# This is imperfect but provides directional insight
play_type_clean = case_when(
grepl("pass", desc, ignore.case = TRUE) ~ "Pass",
grepl("rush|run", desc, ignore.case = TRUE) ~ "Run",
grepl("scramble", desc, ignore.case = TRUE) ~ "Scramble",
TRUE ~ "Other"
)
) %>%
# Focus on standard pass and run plays
filter(play_type_clean %in% c("Pass", "Run"))
# Calculate annual trends in play-calling and success
play_calling_trends <- two_pt_plays %>%
group_by(season, play_type_clean) %>%
summarise(
attempts = n(),
success_rate = mean(success, na.rm = TRUE),
.groups = "drop"
) %>%
group_by(season) %>%
mutate(
# Calculate percentage of two-point attempts using each play type
pct_of_attempts = attempts / sum(attempts)
)
# Plot 1: Distribution of play types over time
p1 <- play_calling_trends %>%
ggplot(aes(x = season, y = pct_of_attempts, color = play_type_clean)) +
geom_line(size = 1.2) +
geom_point(size = 3) +
scale_y_continuous(labels = scales::percent_format()) +
scale_color_manual(values = c("Pass" = "#0077BE", "Run" = "#FF6B35")) +
labs(
title = "Two-Point Attempt Play Type Distribution",
x = NULL,
y = "% of Attempts",
color = "Play Type"
) +
theme(legend.position = "top")
# Plot 2: Success rates by play type over time
p2 <- play_calling_trends %>%
ggplot(aes(x = season, y = success_rate, color = play_type_clean)) +
geom_line(size = 1.2) +
geom_point(size = 3) +
# Add reference line at overall 48% success rate
geom_hline(yintercept = 0.48, linetype = "dashed", alpha = 0.5) +
scale_y_continuous(labels = scales::percent_format()) +
scale_color_manual(values = c("Pass" = "#0077BE", "Run" = "#FF6B35")) +
labs(
title = "Success Rate by Play Type",
x = "Season",
y = "Success Rate",
color = "Play Type"
) +
theme(legend.position = "top")
# Combine plots vertically using patchwork's `/` operator
library(patchwork)
p1 / p2 +
plot_annotation(
caption = "Data: nflfastR | Dashed line shows overall 48% success rate"
)
#| label: play-calling-analysis-py
#| message: false
#| warning: false
#| fig-width: 10
#| fig-height: 8
# Analyze play-calling on two-point attempts
two_pt_plays = conversion_attempts[
(conversion_attempts['two_point_attempt'] == 1) &
(conversion_attempts['season'] >= 2015)
].copy()
def classify_play_type(desc):
"""Classify play type from play description text"""
if pd.isna(desc):
return 'Other'
desc_lower = str(desc).lower()
if 'pass' in desc_lower:
return 'Pass'
elif 'rush' in desc_lower or 'run' in desc_lower:
return 'Run'
elif 'scramble' in desc_lower:
return 'Scramble'
else:
return 'Other'
two_pt_plays['play_type_clean'] = two_pt_plays['desc'].apply(classify_play_type)
# Focus on standard pass and run plays
two_pt_plays = two_pt_plays[two_pt_plays['play_type_clean'].isin(['Pass', 'Run'])]
# Calculate trends in play-calling and success
play_calling_trends = (two_pt_plays
.groupby(['season', 'play_type_clean'])
.agg(
attempts=('success', 'count'),
success_rate=('success', 'mean')
)
.reset_index()
)
# Calculate percentage of attempts by play type
play_calling_trends['total_attempts'] = play_calling_trends.groupby('season')['attempts'].transform('sum')
play_calling_trends['pct_of_attempts'] = play_calling_trends['attempts'] / play_calling_trends['total_attempts']
# Create visualization with two subplots
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 8))
# Plot 1: Play type distribution over time
for play_type, color in [('Pass', '#0077BE'), ('Run', '#FF6B35')]:
data = play_calling_trends[play_calling_trends['play_type_clean'] == play_type]
ax1.plot(data['season'], data['pct_of_attempts'],
color=color, linewidth=2, marker='o', markersize=6, label=play_type)
ax1.set_ylabel('% of Attempts', fontsize=11)
ax1.set_title('Two-Point Attempt Play Type Distribution', fontsize=12, fontweight='bold')
ax1.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))
ax1.legend(loc='upper right', frameon=True)
ax1.grid(True, alpha=0.3)
ax1.set_xticks(range(2015, 2024))
# Plot 2: Success rates by play type over time
for play_type, color in [('Pass', '#0077BE'), ('Run', '#FF6B35')]:
data = play_calling_trends[play_calling_trends['play_type_clean'] == play_type]
ax2.plot(data['season'], data['success_rate'],
color=color, linewidth=2, marker='o', markersize=6, label=play_type)
# Add reference line at 48% overall success rate
ax2.axhline(y=0.48, color='gray', linestyle='--', alpha=0.5, linewidth=1.5)
ax2.set_xlabel('Season', fontsize=11)
ax2.set_ylabel('Success Rate', fontsize=11)
ax2.set_title('Success Rate by Play Type', fontsize=12, fontweight='bold')
ax2.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))
ax2.legend(loc='upper right', frameon=True)
ax2.grid(True, alpha=0.3)
ax2.set_xticks(range(2015, 2024))
plt.tight_layout()
fig.text(0.99, 0.01, 'Data: nfl_data_py | Dashed line shows overall 48% success rate',
ha='right', fontsize=8, style='italic')
plt.show()
The play-calling analysis reveals several patterns:
Pass-Heavy Approach: Teams pass on approximately 60-65% of two-point attempts, noticeably more than typical goal-line play-calling, which runs closer to 50-50 (league-wide play-calling overall is roughly 60-40 pass-run). This suggests teams view passing as more effective in short-yardage, high-stakes situations.
Success Rate Parity: Despite the pass-heavy approach, success rates are similar for passes (~49%) and runs (~47%). This rough parity suggests teams have found a strategic balance—if passes were dramatically more successful, defenses would adjust their alignments, and teams would run more.
Year-to-Year Variation: Both distribution and success rates show significant year-to-year variation, likely due to small sample sizes (100-200 attempts per category per year). We shouldn't interpret this variation as meaningful strategic shifts without additional evidence.
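To quantify how wide that year-to-year uncertainty is, we can compute a Wilson score interval for a single season's success rate. The counts below are hypothetical but typical of per-category sample sizes:

```python
import math

def wilson_ci(successes, attempts, z=1.96):
    """Wilson score interval for a binomial proportion (95% by default)."""
    p = successes / attempts
    denom = 1 + z**2 / attempts
    center = (p + z**2 / (2 * attempts)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / attempts
                                   + z**2 / (4 * attempts**2))
    return center - half, center + half

# Hypothetical season of pass attempts: 72 conversions in 150 tries
lo, hi = wilson_ci(72, 150)
print(f"Success rate: {72/150:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

With 150 attempts the interval spans roughly ±8 percentage points, wide enough to swallow most apparent year-to-year "trends."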
Strategic Implications: The 60-40 pass-run split with similar success rates suggests both play types are viable. Teams likely maintain this mix to prevent defensive predictability—if teams passed 90% of the time, defenses could defend accordingly and reduce pass success rates.
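The equilibrium intuition can be sketched as a simple two-by-two game. The payoff matrix below is entirely hypothetical, chosen only so that each play type succeeds more often when the defense guesses wrong:

```python
import numpy as np

# Rows: offense calls (pass, run); columns: defense expects (pass, run)
# Entries are offense success probabilities (illustrative values only)
success = np.array([[0.38, 0.58],
                    [0.56, 0.36]])

def offense_success(p_pass, q_pass):
    """Overall success prob when the offense passes w.p. p_pass and the
    defense expects pass w.p. q_pass."""
    p = np.array([p_pass, 1 - p_pass])
    q = np.array([q_pass, 1 - q_pass])
    return p @ success @ q

# A predictable offense lets the defense sell out against its tendency
print(f"Pass 90%, defense keys pass: {offense_success(0.9, 1.0):.3f}")
# A balanced mix stays near the league-wide rate whatever the defense guesses
print(f"Pass 50%, defense keys pass: {offense_success(0.5, 1.0):.3f}")
print(f"Pass 50%, defense keys run:  {offense_success(0.5, 0.0):.3f}")
```

Under these made-up payoffs, a 90% pass tendency against a keyed-in defense drops success well below the balanced mix, which holds near 47% regardless of the defensive guess—the game-theoretic logic behind the observed 60-40 split.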
The Predictability Problem
One challenge in two-point conversion play-calling is predictability. With limited plays and film to study, defenses can identify tendencies and prepare specific game plans for each opponent's two-point conversion package.
**Example Scenario**: If a team runs the same play or formation on 80% of their two-point attempts, opponents can prepare a specific defensive call that directly counters that tendency. This is why successful two-point packages typically include multiple formations, motion patterns, and play options—maintaining unpredictability even with a small playbook.
**Analytics Implication**: When evaluating two-point success rates, account for:
1. **Sample size**: 5 attempts is too small to identify true tendencies
2. **Opponent adjustments**: Success rates may decline as opponents accumulate film
3. **Context**: Desperate late-game attempts may force predictable play calls
Formation and Personnel Analysis
Beyond pass vs. run, the formations and personnel groupings teams use on two-point attempts affect success rates. While detailed formation data is limited in publicly available datasets, we can analyze aggregate success rates and yards gained by play type.
#| label: formation-analysis-r
#| message: false
#| warning: false
# Analyze aggregate performance by play type
# Formation data is limited in nflfastR, so we focus on outcomes
two_pt_summary <- two_pt_plays %>%
group_by(play_type_clean) %>%
summarise(
total_attempts = n(),
successes = sum(success, na.rm = TRUE),
success_rate = mean(success, na.rm = TRUE),
avg_yards = mean(yards_gained, na.rm = TRUE),
.groups = "drop"
)
# Display formatted table
two_pt_summary %>%
gt() %>%
cols_label(
play_type_clean = "Play Type",
total_attempts = "Attempts",
successes = "Successes",
success_rate = "Success Rate",
avg_yards = "Avg Yards"
) %>%
fmt_number(
columns = c(success_rate),
decimals = 1,
scale_by = 100,
pattern = "{x}%"
) %>%
fmt_number(
columns = avg_yards,
decimals = 2
) %>%
fmt_number(
columns = c(total_attempts, successes),
decimals = 0,
use_seps = TRUE
) %>%
tab_header(
title = "Two-Point Conversion Performance by Play Type",
subtitle = "2015-2023 Seasons"
) %>%
# Color code success rates (red = low, green = high)
data_color(
columns = success_rate,
colors = scales::col_numeric(
palette = c("#FFA07A", "#98FB98"),
domain = c(0.4, 0.6)
)
)
#| label: formation-analysis-py
#| message: false
#| warning: false
# Analyze aggregate performance by play type
two_pt_summary = (two_pt_plays
.groupby('play_type_clean')
.agg(
total_attempts=('success', 'count'),
successes=('success', 'sum'),
success_rate=('success', 'mean'),
avg_yards=('yards_gained', 'mean')
)
.reset_index()
)
print("\nTwo-Point Conversion Performance by Play Type (2015-2023)")
print("="*70)
print(two_pt_summary.to_string(index=False))
The aggregate results typically show:
Balanced Success: Pass and run plays show similar success rates (within 2-3 percentage points), suggesting neither has a systematic advantage. This balance likely reflects strategic equilibrium where teams exploit whichever approach defenses de-emphasize.
Yards Gained: Even failed attempts typically gain positive yards (0.5-1.0 yards on average), suggesting plays are close to success. The two-yard line is inherently difficult regardless of play type.
Attempt Volume: The roughly 60-40 pass-run split mirrors general offensive philosophy trends in the modern NFL, where passing has become increasingly prevalent even in short-yardage situations.
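It's worth checking whether a 2-point gap between pass and run success rates is even statistically detectable at these sample sizes. The sketch below runs a hand-rolled two-proportion z-test on hypothetical counts (the attempt volumes are assumptions in the rough ballpark of league-wide 2015-2023 totals, not values pulled from the dataset):

```python
import math

# Illustrative counts (assumed): ~49% on passes, ~47% on runs, 60-40 split
pass_attempts, pass_successes = 700, 343   # 49.0%
run_attempts,  run_successes  = 450, 212   # 47.1%

p1 = pass_successes / pass_attempts
p2 = run_successes / run_attempts

# Two-proportion z-test with a pooled standard error
pooled = (pass_successes + run_successes) / (pass_attempts + run_attempts)
se = math.sqrt(pooled * (1 - pooled) * (1 / pass_attempts + 1 / run_attempts))
z = (p1 - p2) / se

# Two-sided p-value from the standard normal CDF
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

print(f"diff = {p1 - p2:.3f}, z = {z:.2f}, p = {p_value:.2f}")
```

With these volumes the difference is nowhere near significance, which is why the chapter treats the pass/run gap as rough parity rather than a real edge.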
Defensive Strategy Against Two-Point Conversions
While most analysis focuses on offensive decisions, defensive performance on two-point conversions varies significantly by team. Some defenses consistently shut down two-point attempts, while others struggle. Understanding these differences helps identify best practices and informs offensive play-calling.
Defensive Success Factors
We can analyze which teams have been most successful at defending two-point conversions and look for patterns that might explain their success.
#| label: defensive-analysis-r
#| message: false
#| warning: false
# Analyze defensive performance on two-point attempts
defensive_performance <- two_pt_plays %>%
group_by(defteam, season) %>%
summarise(
attempts_against = n(),
conversions_allowed = sum(success, na.rm = TRUE),
success_rate_against = mean(success, na.rm = TRUE),
.groups = "drop"
) %>%
# Filter to teams with meaningful sample sizes
filter(attempts_against >= 5) %>%
# Sort by success rate (lower is better for defense)
arrange(success_rate_against)
# Display top 10 defensive performances
defensive_performance %>%
slice_head(n = 10) %>%
gt() %>%
cols_label(
defteam = "Team",
season = "Season",
attempts_against = "Attempts",
conversions_allowed = "Allowed",
success_rate_against = "Success Rate"
) %>%
fmt_number(
columns = success_rate_against,
decimals = 1,
scale_by = 100,
pattern = "{x}%"
) %>%
fmt_number(
columns = c(attempts_against, conversions_allowed),
decimals = 0
) %>%
tab_header(
title = "Best Defensive Performances Against Two-Point Conversions",
subtitle = "Minimum 5 attempts faced | 2015-2023"
) %>%
# Color code success rates (green = good defense, red = bad)
data_color(
columns = success_rate_against,
colors = scales::col_numeric(
palette = c("#98FB98", "#FFA07A"),
domain = c(0, 0.5)
)
)
#| label: defensive-analysis-py
#| message: false
#| warning: false
# Analyze defensive performance on two-point attempts
defensive_performance = (two_pt_plays
.groupby(['defteam', 'season'])
.agg(
attempts_against=('success', 'count'),
conversions_allowed=('success', 'sum'),
success_rate_against=('success', 'mean')
)
.reset_index()
)
# Filter to meaningful sample sizes and sort by success rate
defensive_performance = defensive_performance[
defensive_performance['attempts_against'] >= 5
].sort_values('success_rate_against')
print("\nBest Defensive Performances Against Two-Point Conversions")
print("Minimum 5 attempts faced | 2015-2023")
print("="*70)
print(defensive_performance.head(10).to_string(index=False))
The defensive analysis typically reveals:
High Variance: Even the best defensive seasons show success rates of 20-30%, while the worst allow 60-70%. This range is far wider than for most defensive metrics, reflecting the small sample sizes and high variance of two-point plays.
No Clear Patterns: Elite defenses overall don't consistently dominate two-point conversions, and weak defenses don't consistently struggle. The randomness of individual plays and small sample sizes overwhelm systematic defensive quality.
Coaching Impact: Some defensive coordinators are particularly creative with two-point defensive packages, using exotic formations and disguises. However, measuring this impact is difficult with limited data.
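The "high variance" point can be quantified with an exact binomial calculation: even a perfectly league-average defense will frequently post an extreme-looking season over a typical sample of attempts. The 48% rate and 8-attempt season here are illustrative round numbers:

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Exact binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_true, n = 0.48, 8   # a league-average defense facing 8 attempts in a season

# Probability the observed rate looks "elite" (<= 25% allowed)
# or "terrible" (>= 75% allowed), even though the defense is exactly average
p_looks_elite = sum(binom_pmf(k, n, p_true) for k in range(0, 3))
p_looks_bad = sum(binom_pmf(k, n, p_true) for k in range(6, n + 1))

print(f"P(<= 25% allowed): {p_looks_elite:.3f}")
print(f"P(>= 75% allowed): {p_looks_bad:.3f}")
print(f"P(extreme-looking season): {p_looks_elite + p_looks_bad:.3f}")
```

Nearly 30% of exactly-average defenses will look like elite or terrible two-point units in any given season, so the leaderboard above is dominated by noise rather than scheme.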
Defensive Game Planning for Two-Point Conversions
Defensive coordinators typically prepare 3-5 specific two-point conversion defenses, often including:

1. **Goal line press**: Heavy box, man-to-man coverage, selling out to stop the run
2. **Zone blitz**: Send pressure while dropping a lineman into coverage to confuse blocking assignments
3. **Prevent/soft**: Back off to prevent easy completions, force the offense to execute perfectly
4. **Exotic looks**: Unusual formations or disguises to create confusion

The key is having multiple options to counter different offensive formations and preventing the offense from identifying the defensive call pre-snap. Given the limited practice time for two-point situations, defenses that have well-rehearsed packages and clear communication protocols tend to perform better.

Advanced Topics: Bayesian Two-Point Models
For readers interested in more sophisticated statistical approaches, we can use Bayesian methods to estimate team-specific two-point conversion probabilities. Unlike frequentist methods that treat each team's success rate as a fixed parameter to estimate, Bayesian approaches use hierarchical models that allow information to be shared across teams.
The key insight is that while each team has its own "true" two-point conversion ability, we expect these abilities to be somewhat similar across teams. A team that's attempted 8 two-point conversions and converted 1 (12.5%) probably isn't truly a 12.5% team—they're likely closer to average but got unlucky. Bayesian hierarchical models formalize this intuition through "shrinkage" toward the population mean.
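A minimal closed-form sketch of this shrinkage idea is the conjugate beta-binomial update. Unlike the full hierarchical models that follow, it fixes the population prior at Beta(12, 13) (the same prior placed on the population mean below) rather than learning the concentration from the data:

```python
def shrunk_rate(successes: int, attempts: int,
                prior_a: float = 12.0, prior_b: float = 13.0) -> float:
    """Posterior mean under a fixed Beta(prior_a, prior_b) prior.

    The prior acts like 25 pseudo-attempts at a 48% rate, pulling
    small-sample estimates toward the league average.
    """
    return (prior_a + successes) / (prior_a + prior_b + attempts)

# The unlucky 1-of-8 team from the text
raw = 1 / 8                        # 12.5% observed
shrunk = shrunk_rate(1, 8)         # (12 + 1) / (25 + 8) = 13/33 ~ 0.394

# The same observed rate on a larger sample moves the estimate far less
shrunk_big = shrunk_rate(25, 200)  # 37/225 ~ 0.164

print(f"raw: {raw:.3f}, shrunk (n=8): {shrunk:.3f}, shrunk (n=200): {shrunk_big:.3f}")
```

The eight-attempt team is pulled most of the way back toward 48%, while two hundred attempts at the same observed rate largely override the prior, which is exactly the behavior the hierarchical models formalize.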
#| label: bayesian-model-r
#| message: false
#| warning: false
#| eval: false
# Bayesian hierarchical model for two-point conversion rates
# Note: This requires additional packages and is computationally intensive
# Set eval: false to prevent automatic execution
library(rstan)
# Prepare data for Stan
team_2pt_data <- two_pt_plays %>%
group_by(posteam) %>%
summarise(
attempts = n(),
successes = sum(success, na.rm = TRUE)
) %>%
# Require minimum attempts for meaningful estimation
filter(attempts >= 5)
# Package data for Stan model
stan_data <- list(
N = nrow(team_2pt_data),
attempts = team_2pt_data$attempts,
successes = team_2pt_data$successes
)
# Stan model code for hierarchical beta-binomial model
stan_code <- "
data {
int<lower=0> N; // number of teams
int<lower=0> attempts[N]; // attempts per team
int<lower=0> successes[N]; // successes per team
}
parameters {
real<lower=0,upper=1> theta[N]; // team-specific success rates
real<lower=0,upper=1> mu; // population mean success rate
real<lower=0> kappa; // concentration parameter (controls shrinkage)
}
model {
// Priors
mu ~ beta(12, 13); // Prior centered at ~0.48 (12/(12+13))
kappa ~ gamma(2, 0.1); // Weakly informative prior on concentration
// Hierarchical model: team rates drawn from population distribution
// Higher kappa = more similar across teams (stronger shrinkage)
// Lower kappa = more variation across teams (weaker shrinkage)
theta ~ beta(mu * kappa, (1 - mu) * kappa);
// Likelihood: observed successes given team-specific rates
successes ~ binomial(attempts, theta);
}
"
# To run this model:
# fit <- stan(model_code = stan_code, data = stan_data,
# iter = 2000, chains = 4, seed = 42)
#
# # Extract posterior samples
# posterior_samples <- extract(fit)
#
# # Get team-specific estimates with uncertainty
# team_estimates <- data.frame(
# team = team_2pt_data$posteam,
# attempts = team_2pt_data$attempts,
# observed_rate = team_2pt_data$successes / team_2pt_data$attempts,
# posterior_mean = colMeans(posterior_samples$theta),
# posterior_lower = apply(posterior_samples$theta, 2, quantile, 0.025),
# posterior_upper = apply(posterior_samples$theta, 2, quantile, 0.975)
# )
cat("Bayesian model code prepared (not executed in document)\n")
cat("This model estimates team-specific 2PT conversion rates\n")
cat("with hierarchical shrinkage toward population mean\n")
#| label: bayesian-model-py
#| message: false
#| warning: false
#| eval: false
# Bayesian hierarchical model for two-point conversion rates
# Using PyMC for Bayesian inference
# Set eval: false to prevent automatic execution
import pymc as pm
# Prepare data
team_2pt_data = (two_pt_plays
.groupby('posteam')
.agg(
attempts=('success', 'count'),
successes=('success', 'sum')
)
.reset_index()
)
# Require minimum attempts
team_2pt_data = team_2pt_data[team_2pt_data['attempts'] >= 5]
# Build Bayesian hierarchical model
with pm.Model() as hierarchical_model:
# Hyperpriors for population distribution
mu = pm.Beta('mu', alpha=12, beta=13) # Prior centered at ~0.48
kappa = pm.Gamma('kappa', alpha=2, beta=0.1) # Concentration parameter
# Team-specific rates drawn from population distribution
# Beta distribution parameterized by mean (mu) and concentration (kappa)
theta = pm.Beta('theta',
alpha=mu * kappa,
beta=(1 - mu) * kappa,
shape=len(team_2pt_data))
# Likelihood: observed successes given team-specific rates
successes = pm.Binomial('successes',
n=team_2pt_data['attempts'].values,
p=theta,
observed=team_2pt_data['successes'].values)
# To run this model:
# with hierarchical_model:
# trace = pm.sample(2000, tune=1000, chains=4, random_seed=42)
#
# # Extract posterior estimates
# posterior_means = trace.posterior['theta'].mean(dim=['chain', 'draw']).values
# posterior_lower = trace.posterior['theta'].quantile(0.025, dim=['chain', 'draw']).values
# posterior_upper = trace.posterior['theta'].quantile(0.975, dim=['chain', 'draw']).values
#
# team_estimates = pd.DataFrame({
# 'team': team_2pt_data['posteam'].values,
# 'attempts': team_2pt_data['attempts'].values,
# 'observed_rate': team_2pt_data['successes'].values / team_2pt_data['attempts'].values,
# 'posterior_mean': posterior_means,
# 'posterior_lower': posterior_lower,
# 'posterior_upper': posterior_upper
# })
print("Bayesian model code prepared (not executed in document)")
print("This model estimates team-specific 2PT conversion rates")
print("with hierarchical shrinkage toward population mean")
When to Use Bayesian vs. Frequentist Methods
**Use Frequentist Approaches** (like logistic regression) when:

- Sample sizes are large (100+ observations per group)
- You need fast computation
- Interpretation of p-values and confidence intervals is important for your audience
- You're testing specific hypotheses about coefficients

**Use Bayesian Approaches** (like hierarchical models) when:

- Sample sizes are small or vary widely across groups
- You want to incorporate prior information
- You need to estimate many related parameters (e.g., all 32 teams)
- You want full posterior distributions for decision-making under uncertainty
- Shrinkage/regularization is desirable to prevent overfitting

For two-point conversions, the small sample sizes and the goal of estimating team-specific rates make Bayesian hierarchical models particularly appropriate, though the added complexity may not be necessary for basic analysis.

Case Studies: Notable Two-Point Decisions
Examining specific high-profile two-point conversion decisions helps illustrate the concepts we've developed and shows how theory applies in critical game situations.
Philadelphia Eagles - Super Bowl LII (2017 Season)
The Eagles' aggressive two-point strategy throughout the 2017 season culminated in their Super Bowl victory. Head coach Doug Pederson and offensive coordinator Frank Reich consistently went for two-point conversions in situations where analytics supported the decision, building trust in their approach and developing a robust two-point playbook.
Season-Long Strategy: The Eagles attempted 6 two-point conversions during the regular season (more than most teams), converting 4 (67%). This high success rate likely reflected:
1. Extensive practice and preparation
2. Multiple play options preventing defensive predictability
3. Commitment to the strategy even after failures
4. Offensive creativity and willingness to try unconventional plays
Super Bowl Impact: While they didn't attempt a two-point conversion in the Super Bowl itself, their season-long commitment to aggressive analytics-driven decisions created a culture that enabled other bold calls, including the famous "Philly Special" fourth-down touchdown pass to QB Nick Foles.
Analytics-Driven Decisions Gone Right
Case Study: Down 14 Strategy
When trailing by 14 points late in the game, plan around scoring two touchdowns. The conversion decisions then play out as follows:

**Traditional Approach**: Kick both extra points

- Path: TD + XP (down 7) → TD + XP (tied)
- Success probability: (0.94)² ≈ 88.4% chance of forcing overtime (given both TDs)
- Result: Tie game requiring overtime; no path to winning in regulation
- Weakness: Locks into a predetermined path with no flexibility

**Analytics Approach**: Go for two on the first TD, adjust based on the result

- If successful (48% probability): down 6, so the second TD plus an extra point wins in regulation
- If failed (52% probability): still down 8, so after the second TD you go for two again to tie
- Advantage: Information value guides future strategy, and you keep a path to winning without overtime

**The Key Insight**: Going for two on the first touchdown gives you information that affects clock management and risk-taking on subsequent drives. If it succeeds, you know exactly what you need (TD + XP wins). If it fails, you know you need a touchdown plus another conversion, changing how aggressively you manage the clock and whether you accept field goals on fourth down.

**Expected Outcome Calculation** (conditional on scoring both TDs, with 48% two-point and 94% extra-point success):

- Traditional: 88.4% chance of a tie, 0% chance of winning in regulation
- Analytics: ≈45% chance of winning in regulation (0.48 × 0.94), ≈28% chance of a tie, ≈27% chance of losing in regulation
- If overtime is roughly a coin flip, the analytics path wins about 59% of these games versus about 44% for the traditional path

This case study illustrates why simple expected value calculations miss important strategic considerations. The value of information and the importance of maintaining decision flexibility can outweigh small differences in expected points.
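Under the standard assumptions (48% two-point success, 94% extra-point success, overtime treated as a coin flip, and conditioning on scoring both touchdowns), the branch probabilities for the down-14 scenario can be computed directly:

```python
P_2PT, P_XP = 0.48, 0.94   # assumed conversion probabilities
P_OT_WIN = 0.50            # treat overtime as a coin flip

# Both paths are conditional on scoring two touchdowns.

# Traditional: kick both extra points -> tie only if both XPs are good
trad_tie = P_XP * P_XP                     # 0.8836
trad_win_reg = 0.0                         # no path to a regulation win
trad_overall = trad_win_reg + trad_tie * P_OT_WIN

# Analytics: go for two on the first TD
win_reg = P_2PT * P_XP                     # 2pt good, then second TD + XP wins
tie = P_2PT * (1 - P_XP) + (1 - P_2PT) * P_2PT
# ^ 2pt good but XP missed, OR first 2pt missed then second 2pt good
lose_reg = (1 - P_2PT) * (1 - P_2PT)       # both two-point tries fail
analytics_overall = win_reg + tie * P_OT_WIN

print(f"traditional: tie {trad_tie:.1%}, overall win {trad_overall:.1%}")
print(f"analytics:   win-in-reg {win_reg:.1%}, tie {tie:.1%}, "
      f"lose-in-reg {lose_reg:.1%}, overall win {analytics_overall:.1%}")
```

The analytics path trades some losses in regulation for a real chance to win without overtime, and comes out well ahead overall under these assumptions.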
Lessons from High-Profile Decisions
Several lessons emerge from studying notable two-point conversion decisions:

1. **Preparation Matters**: Teams that practice two-point situations extensively and develop multiple play options show higher success rates
2. **Confidence Enables Execution**: Coaches who trust the analytics and commit to the strategy enable better execution by players who know the decision is sound
3. **Failure is Part of Optimization**: Even optimal strategies fail nearly half the time. Single failures shouldn't cause abandonment of sound decision frameworks
4. **Context Complexity**: While our models provide guidance, actual game decisions involve factors our models don't capture: momentum, player fatigue, injury situations, weather changes
5. **Communication is Critical**: Whether teams go for two or not, clear communication about the decision-making process helps players understand and execute the plan

The gap between "what analytics recommends" and "what coaches do" has narrowed significantly since 2015, but remains substantial. Continued education and demonstration of long-run value will likely close this gap further.

Summary
Two-point conversion strategy involves complex tradeoffs between expected points, win probability, and game situation. Key takeaways from this chapter:
- Expected Value Shift: The 2015 rule change fundamentally altered the calculus, giving two-point conversions (0.96 EV) a slight edge over extra points (0.94 EV) in neutral situations. However, this small expected value advantage is often overwhelmed by game-state considerations.

- Score-Specific Strategy: Optimal decisions depend critically on score differential:
  - Down 8: Strong case for going for two (to tie immediately)
  - Down 14: Go for two on the first TD (information value and optionality)
  - Down 7: Generally kick to guarantee the tie
  - Other differentials: Require situation-specific win probability analysis

- Late-Game Dynamics: Dynamic programming reveals that optimal strategies change with time remaining. With limited possessions remaining, guaranteed points become more valuable relative to higher-variance two-point attempts.

- Play-Calling Balance: Pass plays slightly outperform runs (~49% vs. ~47% success), but teams maintain a balanced mix (roughly 60-40 pass-run) to prevent defensive predictability. This equilibrium reflects strategic optimization.

- Information Value: Two-point attempts provide information that affects subsequent strategy. This value isn't captured in simple expected value calculations but can be formalized in sequential decision models.

- Model Limitations: While we can build sophisticated probability models and decision frameworks, two-point conversions remain high-variance events with significant randomness. Even optimal strategies fail nearly half the time.

- Practical Application: Coaches should:
  - Use analytical frameworks as starting points for decisions
  - Develop multiple two-point plays to prevent predictability
  - Practice two-point situations extensively
  - Trust the process even after individual failures
  - Adjust probabilities based on team-specific factors
The ongoing evolution of two-point conversion strategy represents one of the clearest examples of analytics influencing NFL coaching decisions, with significant room for continued improvement as more teams adopt data-driven approaches.
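The headline expected-value numbers from the first takeaway, together with the pre-2015 baseline quoted in the introduction, can be verified in a few lines:

```python
# Expected points per post-touchdown try, before and after the 2015 rule change
p_xp_pre, p_xp_post = 0.995, 0.94   # extra-point success rates (from the text)
p_two = 0.48                        # two-point conversion success rate

ev_xp_pre = 1 * p_xp_pre    # 0.995 -- kicking clearly dominated before 2015
ev_xp_post = 1 * p_xp_post  # 0.94
ev_two = 2 * p_two          # 0.96 -- now a slight edge for going for two

print(f"pre-2015:  XP {ev_xp_pre:.3f} vs 2PT {ev_two:.2f}")
print(f"post-2015: XP {ev_xp_post:.2f} vs 2PT {ev_two:.2f}")
print(f"break-even 2PT rate after 2015: {ev_xp_post / 2:.2f}")
```

The break-even line of 47% explains why the decision is so sensitive to game state: the neutral-situation edge is only a few hundredths of a point either way.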
Exercises
Conceptual Questions
-
Rule Change Impact: Explain how the 2015 extra point rule change affected the expected value calculus for two-point conversions. Calculate the expected value advantage before and after the rule change. What other factors besides expected points should teams consider when making the decision?
-
Down 8 Decision: Why is going for two when down 8 points (after scoring a TD to make it down 2) considered optimal? Walk through the game tree for both decisions (go for two vs. kick) and compare the win probability paths. Under what circumstances might kicking be preferable?
-
Early vs. Late Game: Why might teams be more willing to kick extra points early in games but go for two-point conversions late in games, even when the score differential is the same? Consider both expected value and win probability maximization perspectives. Does this behavioral pattern align with optimal strategy?
-
Information Value: Explain the concept of "information value" in the context of two-point conversions. Provide a specific scenario where attempting a two-point conversion early provides information that helps optimize later decisions, even if the attempt fails.
Coding Exercises
Exercise 1: Team-Specific Two-Point Models
Using the play-by-play data from 2015-2023:

a) Calculate each team's two-point conversion success rate (both on offense and defense)
b) Build a confidence interval around each estimate using the binomial distribution
c) Identify which teams significantly outperform or underperform the league average (using appropriate statistical tests)
d) Analyze whether offensive strength (measured by EPA/play) predicts two-point success using correlation or regression
e) Create a visualization showing team success rates with confidence intervals, highlighting significant outliers

**Hint**: Use a minimum threshold (e.g., 10 attempts) to ensure adequate sample size. Consider using Bayesian shrinkage for more stable estimates with small samples.

**Extension**: Build a logistic regression model that predicts two-point success using team offensive and defensive EPA, controlling for other factors like home field and game situation.

Exercise 2: Build Your Own Decision Chart
Create a comprehensive two-point decision chart that considers:

a) Score differential from +14 to -14 (all relevant scenarios)
b) Time remaining (create categories: 10+ min, 5-10 min, 2-5 min, <2 min)
c) Different success probability assumptions (45%, 48%, 50%) to test robustness
d) Visualize as a heat map or grid showing optimal decisions for each combination
e) Compare your chart to traditional coaching wisdom and identify discrepancies

**Challenge**: Incorporate timeout situations into your model. How should having 0 vs. 3 timeouts affect the decision?

**Advanced Extension**: Use actual win probability models (from nflfastR) instead of simplified score-based approximations to calculate exact win probability changes for each decision.

Exercise 3: Historical Decision Analysis
Analyze actual coaching decisions to evaluate decision quality:

a) Find all two-point attempts and extra point kicks from 2015-2023
b) For each post-touchdown situation, calculate whether the optimal decision was to go for two or kick (using win probability changes)
c) Classify each actual decision as "optimal" or "suboptimal" based on your framework
d) Calculate what percentage of decisions matched optimal strategy overall and by season
e) Identify specific coaches or teams that make the best two-point decisions
f) Measure whether decision quality has improved over time (test for trend)

**Extension**: Calculate the expected win probability cost of suboptimal decisions. How many games per season do teams lose due to poor two-point conversion decisions?

**Advanced Challenge**: Analyze whether coaches learn from their mistakes: do coaches who make a suboptimal decision in one game make better decisions in subsequent similar situations?

Exercise 4: Monte Carlo Simulation
Build a Monte Carlo simulation for the following scenario:

**Setup**: Your team scores a TD to trail 14-8 with 3:00 remaining. You have all three timeouts. The opponent will receive the kickoff.

**Decision**: Go for the two-point conversion or kick the extra point?

**Simulation Requirements**:

- Simulate 10,000 games for each strategy
- Model: opponent possession (scoring probability, time consumed), and your team's possession if you get the ball back (scoring probability, time consumed)
- Track outcomes: win, loss, overtime for each strategy
- Compare:
  - Win probability for each strategy
  - Overtime probability
  - Distribution of final score differentials
  - Expected point margin

**Assumptions Needed**:

- Opponent scoring rate (probabilities of TD, FG, punt by field position)
- Your team's scoring rate (similar breakdown)
- Time consumption per drive
- Two-point and extra point success probabilities

**Deliverable**:

- Win probability estimate for each strategy
- Visualization of outcome distributions
- Sensitivity analysis showing how results change with different assumptions

**Extension**: Expand to simulate an entire drive sequence and model optimal play-calling (when to take risks vs. play conservatively) based on the conversion decision.

Exercise 5: Defensive Two-Point Analysis
Analyze defensive strategy and performance against two-point conversions:

a) Do defenses perform differently on two-point attempts vs. regular plays from the 2-yard line? Compare success rates controlling for play type.
b) Is there a home field advantage on two-point conversions? Test whether home teams convert at higher rates controlling for team quality.
c) Analyze whether certain defensive schemes or coordinator tendencies correlate with two-point conversion defense success.
d) Build a logistic regression model predicting two-point defensive success using defensive EPA, home/away, and other relevant factors.
e) Identify which defensive coordinators have been most successful at defending two-point conversions (accounting for sample size with Bayesian methods).

**Data Needed**: Combine two-point attempt data with defensive performance metrics (EPA against, DVOA, etc.) and coaching information.

**Challenge**: Attempt to classify defensive play types on two-point conversions (using play description text or other available data) and analyze which defensive approaches are most successful.

Further Reading
Academic Papers
- Romer, D. (2006). "Do Firms Maximize? Evidence from Professional Football." Journal of Political Economy, 114(2), 340-365.
  - Seminal paper analyzing NFL decision-making inefficiency, including two-point conversion decisions
- Burke, B. (2009). "The Two-Point Conversion Decision." Advanced NFL Stats.
  - Early analytical treatment of optimal two-point strategy using win probability frameworks
- Goldner, K. (2020). "A Markov Decision Process Model for Optimal Fourth Down Decision Making." MIT Sloan Sports Analytics Conference.
  - Dynamic programming approach applicable to two-point decisions
- Kovash, K., & Levitt, S. D. (2009). "Professionals Do Not Play Minimax: Evidence from Major League Baseball and the National Football League." NBER Working Paper.
  - Analysis of strategic decision-making including two-point conversions
Practical Guides
- Baldwin, B. (2019). "Optimal Fourth Down and Two-Point Decisions in the NFL." Open Source Football.
  - Practical implementation of win probability-based two-point decision frameworks
- Yam, D., & Lopez, M. (2019). "What's a First Down Worth? Estimating Expected Points in the NFL." Harvard Sports Analysis Collective.
  - Foundation for expected points analysis underlying two-point decisions
- Cole, K. (2020). "Going for Two: A Comprehensive Analysis." The 33rd Team.
  - Industry perspective on two-point strategy from an NFL analytics professional
Books
- Alamar, B. (2013). Sports Analytics: A Guide for Coaches, Managers, and Other Decision Makers. Columbia University Press.
  - Chapter 6 covers decision analysis in football including two-point conversions
- Winston, W. (2012). Mathletics: How Gamblers, Managers, and Sports Enthusiasts Use Mathematics. Princeton University Press.
  - Sections on game theory and optimal decision-making applicable to two-point strategy
- Moskowitz, T., & Wertheim, L. J. (2011). Scorecasting: The Hidden Influences Behind How Sports Are Played and Games Are Won. Crown Archetype.
  - Chapter discussing behavioral biases in coaching decisions including two-point conservatism
Online Resources
- nflfastR documentation: Comprehensive guide to play-by-play data including two-point conversion variables
- NFL Operations: Official rules and statistics for two-point conversions
- Pro Football Reference: Historical two-point conversion data and team-by-team breakdowns
- The 33rd Team: Regular analytical content on two-point strategy from NFL front office veterans
References
:::