Learning Objectives
By the end of this chapter, you will be able to:
- Understand when to attempt two-point conversions based on game situations
- Build two-point conversion probability models using historical data
- Analyze score-specific optimal strategies using dynamic programming
- Study play-calling tendencies and success rates on two-point attempts
- Evaluate and create two-point conversion decision charts
Introduction
After scoring a touchdown, coaches face a critical decision: kick an extra point for one nearly-guaranteed point, or attempt a two-point conversion for two points with lower probability of success. This seemingly simple choice involves complex strategic considerations including current score, time remaining, opponent quality, and downstream game scenarios.
The NFL's 2015 rule change moving the extra point from the 2-yard line to the 15-yard line (33-yard kick) made this decision more interesting. Extra points, once automatic at 99.5% success rate, dropped to approximately 94% success. Meanwhile, two-point conversions historically succeed at roughly 47-49% rates, making the expected value comparison much closer.
At first glance, this appears to be a simple mathematical optimization problem: multiply success probability by points awarded and choose the option with higher expected value. However, this analysis ignores critical game-theoretic considerations. The optimal decision depends not just on expected points, but on how those points affect win probability in the current game state. A two-point conversion that ties the game may be worth far more than its 0.96 expected point value suggests, while the same attempt when leading by 20 points offers minimal strategic value.
Throughout this chapter, we'll develop both the mathematical framework and the practical intuition needed to make optimal two-point conversion decisions. We'll start by examining historical data to understand success rates, then build predictive models, develop score-specific decision frameworks, and finally analyze play-calling patterns and defensive strategies. By the end, you'll be able to evaluate coaching decisions and create your own decision support tools.
The Fundamental Tradeoff
The two-point decision balances:

- **Extra Point**: ~94% probability of 1 point (0.94 expected points)
- **Two-Point Conversion**: ~48% probability of 2 points (0.96 expected points)

In neutral game situations, two-point conversions offer slightly higher expected value (0.96 vs 0.94 points), but game context often dominates this small 0.02-point edge. The real value of going for two comes from specific score differentials where converting (or failing) fundamentally changes your path to winning.

Why This Chapter Matters
Two-point conversion decisions represent one of the clearest examples where analytics can improve coaching decisions. Research shows that coaches systematically make suboptimal choices, often kicking when they should go for two and vice versa. Unlike complex fourth-down scenarios, two-point situations have:

1. **Binary outcomes**: Success or failure, no ambiguity
2. **Known probabilities**: Historical data provides robust estimates
3. **Clear counterfactuals**: We know exactly what happened with each choice
4. **Measurable impact**: Win probability changes are quantifiable

This makes two-point strategy an ideal domain for data-driven decision-making and an excellent learning opportunity for understanding how to combine statistical analysis with game theory.

Historical Context and Rule Changes
The Evolution of Extra Points
Before 2015, extra points were essentially automatic. Kicked from the 2-yard line (roughly 20-yard attempts), kickers converted 99.5% of the time. This near-certainty made the post-touchdown decision trivial: always kick, unless the score differential late in a game demanded exactly two points.
The 2015 rule change, moving extra points to the 15-yard line (creating 33-yard attempts), fundamentally altered this calculus. Let's examine exactly how the landscape changed and why this matters for strategic decision-making.
Pre-2015 Era: The "Automatic" Extra Point
- Success rate: 99.5%
- Expected value: 0.995 points
- Two-point conversion rate: ~48%
- Two-point expected value: 0.96 points
- Expected value advantage for kicking: +0.035 points
- Strategic implication: Always kick unless score differential requires exactly two points
Post-2015 Era: The Strategic Decision
- Extra point success rate: ~94%
- Expected value: 0.94 points
- Two-point conversion rate: ~48%
- Two-point expected value: 0.96 points
- Expected value advantage for two-point: +0.02 points
- Strategic implication: Consider going for two even in neutral situations
The rule change created a paradigm shift. While the 0.02-point advantage for going for two seems small, it compounds over a season. A team that scores 40 touchdowns and always goes for two instead of kicking extra points gains approximately 0.8 expected points across the season. More importantly, the closer expected values mean that game situation now dominates the decision—a single percentage point change in win probability can swing the optimal choice.
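The comparison above reduces to a simple break-even condition: in pure expected points, going for two pays off whenever the two-point success rate exceeds half the extra-point rate. A minimal sketch using the chapter's approximate league-wide rates (round numbers, for illustration only):

```python
# Break-even analysis for the post-TD decision, using the chapter's
# approximate rates (illustrative, not computed from data).
XP_RATE_PRE, XP_RATE_POST = 0.995, 0.94   # extra point success, pre/post 2015
TWO_PT_RATE = 0.48                         # two-point conversion success

def expected_points(p_success: float, points: int) -> float:
    """Expected points from an attempt worth `points` with success prob `p_success`."""
    return p_success * points

for era, xp_rate in [("Pre-2015", XP_RATE_PRE), ("Post-2015", XP_RATE_POST)]:
    ev_xp = expected_points(xp_rate, 1)
    ev_2pt = expected_points(TWO_PT_RATE, 2)
    # Going for two breaks even when P(2PT) * 2 equals P(XP) * 1
    break_even = xp_rate / 2
    print(f"{era}: EV(XP)={ev_xp:.3f}, EV(2PT)={ev_2pt:.3f}, "
          f"break-even 2PT rate={break_even:.1%}")
```

Pre-2015 the break-even rate was 49.75%, above the ~48% league conversion rate; post-2015 it dropped to 47%, just below it. That small gap is exactly the 0.02-point edge discussed above.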
Common Misconception: "Always Follow Expected Value"
Many analysts fall into the trap of believing that because two-point conversions have higher expected value (0.96 vs 0.94), teams should always go for two. This ignores several critical factors:

1. **Win probability vs. expected points**: Sometimes a guaranteed point changes win probability more than a higher-variance two-point attempt
2. **Information value**: Attempting (and failing) a two-point conversion reveals information that affects future decisions
3. **Risk tolerance**: In close games, minimizing variance may be more valuable than maximizing expected points
4. **Possession dynamics**: The number of remaining possessions affects whether variance helps or hurts

We'll explore these nuances throughout this chapter, demonstrating that optimal strategy requires game-state-specific analysis, not blanket rules.

Historical Trends
To understand how the rule change affected behavior and outcomes, we need to analyze comprehensive data spanning both eras. We'll examine two-point attempt rates and success rates from 2010 through 2023, allowing us to see both the pre-rule change baseline and the post-change evolution as coaches adapted their strategies.
The following analysis loads play-by-play data for all post-touchdown conversion attempts, classifies them as extra points or two-point attempts, and calculates success rates. This will reveal not just the average changes, but also year-to-year trends that show strategic evolution.
#| label: load-libraries-r
#| message: false
#| warning: false
# Load required libraries for data manipulation and visualization
library(tidyverse) # Core data manipulation (dplyr, ggplot2, etc.)
library(nflfastR) # NFL play-by-play data
library(nflplotR) # NFL-specific plotting utilities
library(gt) # Grammar of tables for nice table formatting
library(gtExtras) # Additional table styling options
library(patchwork) # Combining multiple plots
# Set consistent plot theme for all visualizations
theme_set(theme_minimal(base_size = 12))
#| label: load-historical-data-r
#| message: false
#| warning: false
#| cache: true
# Load play-by-play data from 2010-2023
# This provides 14 seasons: 5 pre-rule change (2010-2014) and 9 post-rule change (2015-2023)
pbp <- load_pbp(2010:2023)
# Filter for extra point and two-point conversion attempts
# We need both types to compare rates and success probabilities
conversion_attempts <- pbp %>%
filter(
# Include plays that are either two-point or extra point attempts
two_point_attempt == 1 | extra_point_attempt == 1,
# Exclude plays with missing team information (data quality check)
!is.na(posteam)
) %>%
mutate(
# Create readable attempt type label
attempt_type = case_when(
two_point_attempt == 1 ~ "Two-Point",
extra_point_attempt == 1 ~ "Extra Point"
),
# Define success for each attempt type
# Two-point: Check if conversion result was "success"
# Extra point: Check if result was "good"
success = case_when(
two_point_attempt == 1 ~ two_point_conv_result == "success",
extra_point_attempt == 1 ~ extra_point_result == "good"
),
# Flag for post-rule change era (2015 onward)
post_rule_change = season >= 2015
)
cat("Loaded", nrow(conversion_attempts), "conversion attempts from 2010-2023\n")
#| label: fig-historical-trends-r
#| fig-cap: "Two-point conversion attempt rates and success rates over time. The top panel shows that coaches increased two-point attempts after the 2015 rule change, recognizing the changed expected value calculus. The bottom panel reveals that extra point success dropped dramatically after 2015, while two-point conversion rates remained relatively stable, fundamentally altering the strategic landscape."
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false
# Calculate annual statistics for attempt rates and success rates
# This aggregation will show trends over time
annual_stats <- conversion_attempts %>%
group_by(season, attempt_type) %>%
summarise(
# Count total attempts of each type
attempts = n(),
# Calculate success rate (proportion of successful attempts)
success_rate = mean(success, na.rm = TRUE),
.groups = "drop"
) %>%
group_by(season) %>%
mutate(
# Calculate total attempts across both types
total_attempts = sum(attempts),
# Calculate percentage of attempts that were two-point conversions
# This shows how aggressive coaches were
pct_of_attempts = attempts / total_attempts
) %>%
ungroup()
# Plot 1: Two-point attempt rate over time
# This reveals strategic evolution in coaching decisions
p1 <- annual_stats %>%
filter(attempt_type == "Two-Point") %>%
ggplot(aes(x = season, y = pct_of_attempts)) +
geom_line(color = "#0077BE", linewidth = 1.2) +
geom_point(size = 3, color = "#0077BE") +
# Add vertical line at rule change
geom_vline(xintercept = 2015, linetype = "dashed", color = "red", alpha = 0.5) +
annotate("text", x = 2015, y = max(annual_stats$pct_of_attempts[annual_stats$attempt_type == "Two-Point"]),
label = "Rule Change", vjust = -0.5, color = "red") +
scale_y_continuous(labels = scales::percent_format(accuracy = 0.1)) +
scale_x_continuous(breaks = seq(2010, 2023, 2)) +
labs(
title = "Two-Point Conversion Attempt Rate",
subtitle = "Percentage of post-TD attempts that are two-point conversions",
x = NULL,
y = "% of Attempts"
) +
theme(plot.title = element_text(face = "bold"))
# Plot 2: Success rates over time for both attempt types
# This shows the impact of the rule change on outcomes
p2 <- annual_stats %>%
ggplot(aes(x = season, y = success_rate, color = attempt_type)) +
geom_line(linewidth = 1.2) +
geom_point(size = 3) +
# Add vertical line at rule change
geom_vline(xintercept = 2015, linetype = "dashed", color = "red", alpha = 0.5) +
scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
scale_x_continuous(breaks = seq(2010, 2023, 2)) +
scale_color_manual(
values = c("Two-Point" = "#0077BE", "Extra Point" = "#FF6B35"),
name = "Attempt Type"
) +
labs(
title = "Success Rates Over Time",
subtitle = "Extra point success dropped after 2015 rule change",
x = "Season",
y = "Success Rate"
) +
theme(
plot.title = element_text(face = "bold"),
legend.position = "top"
)
# Combine plots vertically for easy comparison
p1 / p2 +
plot_annotation(
caption = "Data: nflfastR | Rule change: 2015 (XP moved from 2-yard line to 15-yard line)"
)
#| label: load-libraries-py
#| message: false
#| warning: false
# Import required libraries for data analysis and visualization
import pandas as pd # Data manipulation
import numpy as np # Numerical operations
import nfl_data_py as nfl # NFL data access
import matplotlib.pyplot as plt # Plotting
import seaborn as sns # Statistical visualization
from scipy import stats # Statistical functions
# Set plot style for consistent, professional-looking visualizations
sns.set_style("whitegrid")
plt.rcParams['figure.dpi'] = 100
#| label: load-historical-data-py
#| message: false
#| warning: false
#| cache: true
# Load play-by-play data from 2010-2023
# range(2010, 2024) covers seasons 2010 through 2023
pbp = nfl.import_pbp_data(list(range(2010, 2024)))
# Filter for extra point and two-point conversion attempts
# We use boolean indexing to select relevant plays
conversion_attempts = pbp[
((pbp['two_point_attempt'] == 1) | (pbp['extra_point_attempt'] == 1)) &
(pbp['posteam'].notna())
].copy()
# Create attempt type classification
conversion_attempts['attempt_type'] = np.where(
conversion_attempts['two_point_attempt'] == 1,
'Two-Point',
'Extra Point'
)
# Define success based on attempt type
conversion_attempts['success'] = np.where(
conversion_attempts['two_point_attempt'] == 1,
conversion_attempts['two_point_conv_result'] == 'success',
conversion_attempts['extra_point_result'] == 'good'
)
# Flag post-rule change era
conversion_attempts['post_rule_change'] = conversion_attempts['season'] >= 2015
print(f"Loaded {len(conversion_attempts):,} conversion attempts from 2010-2023")
#| label: fig-historical-trends-py
#| fig-cap: "Two-point conversion attempt rates and success rates over time. The 2015 rule change created a clear inflection point in both coaching behavior (more two-point attempts) and outcomes (lower extra point success rates)."
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false
# Calculate annual statistics
# Group by season and attempt type to get yearly trends
annual_stats = (conversion_attempts
.groupby(['season', 'attempt_type'])
.agg(
attempts=('success', 'count'),
success_rate=('success', 'mean')
)
.reset_index()
)
# Calculate percentage of attempts that were two-point conversions
annual_stats['total_attempts'] = annual_stats.groupby('season')['attempts'].transform('sum')
annual_stats['pct_of_attempts'] = annual_stats['attempts'] / annual_stats['total_attempts']
# Create subplots for attempt rate and success rate trends
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))
# Plot 1: Two-point attempt rate over time
two_pt_data = annual_stats[annual_stats['attempt_type'] == 'Two-Point']
ax1.plot(two_pt_data['season'], two_pt_data['pct_of_attempts'],
color='#0077BE', linewidth=2, marker='o', markersize=6)
ax1.axvline(x=2015, color='red', linestyle='--', alpha=0.5, linewidth=1.5)
ax1.text(2015, two_pt_data['pct_of_attempts'].max(), 'Rule Change',
color='red', ha='center', va='bottom')
ax1.set_ylabel('% of Attempts', fontsize=11)
ax1.set_title('Two-Point Conversion Attempt Rate\nPercentage of post-TD attempts that are two-point conversions',
fontsize=12, fontweight='bold')
ax1.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.1%}'))
ax1.set_xticks(range(2010, 2024, 2))
ax1.grid(True, alpha=0.3)
# Plot 2: Success rates over time for both attempt types
for attempt_type, color in [('Two-Point', '#0077BE'), ('Extra Point', '#FF6B35')]:
data = annual_stats[annual_stats['attempt_type'] == attempt_type]
ax2.plot(data['season'], data['success_rate'],
color=color, linewidth=2, marker='o', markersize=6, label=attempt_type)
ax2.axvline(x=2015, color='red', linestyle='--', alpha=0.5, linewidth=1.5)
ax2.set_xlabel('Season', fontsize=11)
ax2.set_ylabel('Success Rate', fontsize=11)
ax2.set_title('Success Rates Over Time\nExtra point success dropped after 2015 rule change',
fontsize=12, fontweight='bold')
ax2.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))
ax2.set_xticks(range(2010, 2024, 2))
ax2.legend(loc='upper right', frameon=True)
ax2.grid(True, alpha=0.3)
plt.tight_layout()
fig.text(0.99, 0.01, 'Data: nfl_data_py | Rule change: 2015 (XP moved from 2-yard line to 15-yard line)',
ha='right', fontsize=8, style='italic')
plt.show()
Data Analysis Best Practice: Pre/Post Comparison
When analyzing rule changes or other interventions, always include sufficient data from both before and after the change. In this analysis, we used five pre-rule change seasons (2010-2014) and nine post-rule change seasons (2015-2023). This provides:

1. **Baseline establishment**: Multiple years before the change to establish normal variation
2. **Trend detection**: Enough post-change data to see if effects persist or fade
3. **Statistical power**: Larger sample sizes increase confidence in observed differences
4. **Context**: Ability to separate rule change effects from general NFL evolution

For similar analyses, aim for at least 3-5 years of data on each side of the intervention point.

Expected Value Analysis
Basic Expected Value Calculation
The fundamental decision framework for two-point conversions begins with expected value: multiply the probability of success by the points awarded. While we'll later see that expected points don't tell the whole story, they provide our starting point for analysis.
Expected value for each option is calculated as:
$$ \text{EV}_{\text{XP}} = P(\text{XP success}) \times 1 $$
$$ \text{EV}_{\text{2PT}} = P(\text{2PT success}) \times 2 $$
However, we need accurate estimates of these probabilities. Historical averages provide a baseline, but the rule change means we should analyze pre-2015 and post-2015 eras separately. Additionally, we need confidence intervals around our estimates—a 48% success rate with 100 attempts is very different from 48% with 10,000 attempts.
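The sample-size point can be made concrete with the same normal-approximation interval used below, $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}$. A quick illustrative comparison of a 48% success rate measured on 100 versus 10,000 attempts:

```python
import math

def binomial_ci(p: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% CI for a proportion using the normal approximation."""
    se = math.sqrt(p * (1 - p) / n)  # binomial standard error
    return p - z * se, p + z * se

for n in (100, 10_000):
    lo, hi = binomial_ci(0.48, n)
    print(f"n={n:>6}: 48% success rate, 95% CI = ({lo:.1%}, {hi:.1%})")
```

With 100 attempts the interval spans roughly 38% to 58%, wide enough to be strategically useless; with 10,000 attempts it narrows to about 47% to 49%.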
The following analysis calculates success rates separately for each era, computes expected values, and provides statistical confidence intervals. This allows us to quantify uncertainty in our estimates and determine whether observed differences are meaningful or could plausibly be due to random variation.
#| label: calculate-ev-r
#| message: false
#| warning: false
# Calculate success rates pre and post rule change
# We'll compute expected values and confidence intervals for each era
ev_comparison <- conversion_attempts %>%
# Create era labels for grouping
mutate(period = if_else(post_rule_change, "Post-2015", "Pre-2015")) %>%
# Group by era and attempt type
group_by(period, attempt_type) %>%
summarise(
# Count attempts and successes
attempts = n(),
successes = sum(success, na.rm = TRUE),
# Calculate success rate (proportion of successes)
success_rate = mean(success, na.rm = TRUE),
.groups = "drop"
) %>%
mutate(
# Calculate expected value based on attempt type
# Extra point: success_rate * 1 point
# Two-point: success_rate * 2 points
expected_value = case_when(
attempt_type == "Extra Point" ~ success_rate * 1,
attempt_type == "Two-Point" ~ success_rate * 2
),
# Calculate standard error using binomial formula: sqrt(p(1-p)/n)
std_error = sqrt(success_rate * (1 - success_rate) / attempts),
# Calculate 95% confidence interval bounds
# Using normal approximation: estimate ± 1.96 * SE
ev_lower = case_when(
attempt_type == "Extra Point" ~ (success_rate - 1.96 * std_error) * 1,
attempt_type == "Two-Point" ~ (success_rate - 1.96 * std_error) * 2
),
ev_upper = case_when(
attempt_type == "Extra Point" ~ (success_rate + 1.96 * std_error) * 1,
attempt_type == "Two-Point" ~ (success_rate + 1.96 * std_error) * 2
)
)
# Display table with formatting
ev_comparison %>%
select(period, attempt_type, attempts, success_rate, expected_value) %>%
arrange(period, desc(attempt_type)) %>%
gt() %>%
cols_label(
period = "Period",
attempt_type = "Attempt Type",
attempts = "Attempts",
success_rate = "Success Rate",
expected_value = "Expected Value"
) %>%
# Format numbers appropriately
fmt_number(
columns = c(success_rate, expected_value),
decimals = 3
) %>%
fmt_number(
columns = attempts,
decimals = 0,
use_seps = TRUE
) %>%
# Highlight maximum expected value in each period
tab_style(
style = cell_fill(color = "#E8F4F8"),
locations = cells_body(
columns = expected_value,
rows = expected_value == max(expected_value, na.rm = TRUE)
)
) %>%
tab_header(
title = "Expected Value Comparison",
subtitle = "Pre-2015 vs Post-2015 Rule Change"
) %>%
tab_source_note(
source_note = "95% confidence intervals calculated using normal approximation"
)
#| label: calculate-ev-py
#| message: false
#| warning: false
# Calculate success rates pre and post rule change
# Create period labels for grouping
conversion_attempts['period'] = np.where(
conversion_attempts['post_rule_change'],
'Post-2015',
'Pre-2015'
)
# Aggregate by period and attempt type
ev_comparison = (conversion_attempts
.groupby(['period', 'attempt_type'])
.agg(
attempts=('success', 'count'),
successes=('success', 'sum'),
success_rate=('success', 'mean')
)
.reset_index()
)
# Calculate expected values based on attempt type
ev_comparison['expected_value'] = np.where(
ev_comparison['attempt_type'] == 'Extra Point',
ev_comparison['success_rate'] * 1, # 1 point for extra point
ev_comparison['success_rate'] * 2 # 2 points for two-point conversion
)
# Calculate standard error using binomial formula
ev_comparison['std_error'] = np.sqrt(
ev_comparison['success_rate'] * (1 - ev_comparison['success_rate']) /
ev_comparison['attempts']
)
# Calculate 95% confidence interval bounds
ev_comparison['ev_lower'] = np.where(
ev_comparison['attempt_type'] == 'Extra Point',
(ev_comparison['success_rate'] - 1.96 * ev_comparison['std_error']) * 1,
(ev_comparison['success_rate'] - 1.96 * ev_comparison['std_error']) * 2
)
ev_comparison['ev_upper'] = np.where(
ev_comparison['attempt_type'] == 'Extra Point',
(ev_comparison['success_rate'] + 1.96 * ev_comparison['std_error']) * 1,
(ev_comparison['success_rate'] + 1.96 * ev_comparison['std_error']) * 2
)
# Display results
print("\nExpected Value Comparison: Pre-2015 vs Post-2015 Rule Change")
print("="*70)
print(ev_comparison[['period', 'attempt_type', 'attempts', 'success_rate', 'expected_value']]
.to_string(index=False))
The results reveal the magnitude of the rule change impact:
Pre-2015 Era:
- Extra points: 0.995 expected value (99.5% × 1 point)
- Two-point conversions: 0.960 expected value (48.0% × 2 points)
- Advantage to kicking: +0.035 expected points
Post-2015 Era:
- Extra points: 0.940 expected value (94.0% × 1 point)
- Two-point conversions: 0.960 expected value (48.0% × 2 points)
- Advantage to going for two: +0.020 expected points
This represents a swing of 0.055 expected points between the two eras—a substantial change that fundamentally alters optimal strategy. Over a full season of 40 touchdowns, always going for two (post-2015) would gain approximately 0.8 expected points compared to always kicking.
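The season-long claim can be sanity-checked with a quick Monte Carlo: simulate 40 post-touchdown decisions under each blanket policy and compare the distributions of conversion points. This is an illustrative sketch using the chapter's approximate rates, not a model of real schedules:

```python
import numpy as np

rng = np.random.default_rng(42)
N_SIMS, N_TDS = 100_000, 40
XP_RATE, TWO_PT_RATE = 0.94, 0.48  # approximate post-2015 rates

# Conversion points over a 40-touchdown season under each policy
always_kick = rng.binomial(N_TDS, XP_RATE, N_SIMS) * 1
always_go = rng.binomial(N_TDS, TWO_PT_RATE, N_SIMS) * 2

print(f"Always kick:       mean={always_kick.mean():.2f}, sd={always_kick.std():.2f}")
print(f"Always go for two: mean={always_go.mean():.2f}, sd={always_go.std():.2f}")
```

The always-go policy gains roughly 0.8 points on average (38.4 vs 37.6) but with about four times the standard deviation, which is precisely why variance considerations can override the expected-value edge in individual games.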
Why Expected Value Doesn't Tell the Whole Story
While two-point conversions have higher expected value post-2015, this doesn't mean teams should always go for two. Expected value optimization is appropriate for repeated decisions over many games (a season-long strategy), but individual game decisions should optimize win probability, not expected points. Consider these scenarios:

**Scenario 1**: Leading 28-26, you score a TD with 2:00 remaining, putting you up 8. Expected value says go for two (0.96 vs 0.94 points). But kicking almost guarantees a 9-point lead (a two-possession game), while a failed two-point attempt leaves the lead at 8, still a one-possession game in which a touchdown plus conversion ties. The near-certain two-possession lead is worth far more than 0.02 expected points.

**Scenario 2**: Trailing 14-6, you score a TD to make it 14-12. Expected value is nearly equal, but going for two gives you information: if it succeeds, the game is tied and any later score wins; if it fails, you know you still need a field goal to take the lead. This information value isn't captured in simple expected value calculations.

We'll formalize these intuitions later with win probability models that account for game state, time remaining, and future possession dynamics.

Situation-Specific Success Rates
While overall success rates provide a baseline, two-point conversion success likely varies by situation. Game context might affect success rates through several mechanisms:
- Defensive preparation: Late-game situations may allow defenses to better anticipate two-point plays
- Play calling: Desperation situations might force predictable play calls
- Execution pressure: High-stakes attempts may affect execution quality
- Score effects: Trailing teams might take more risks or use suboptimal plays
Understanding these situational differences helps us build more accurate probability models and make better real-time decisions. We'll analyze success rates by quarter, game situation, and score differential.
#| label: situational-success-r
#| message: false
#| warning: false
# Analyze two-point success by various factors
# Focus on post-2015 era for current relevance
two_pt_situations <- conversion_attempts %>%
filter(two_point_attempt == 1, season >= 2015) %>%
mutate(
# Capture score differential before the conversion attempt
# This is the score after TD but before the conversion
score_diff_before = score_differential,
# Categorize quarter and time situations
quarter_situation = case_when(
qtr <= 2 ~ "First Half",
qtr == 3 ~ "Third Quarter",
qtr == 4 & game_seconds_remaining > 300 ~ "Early 4th Q",
qtr == 4 & game_seconds_remaining <= 300 ~ "Late 4th Q",
TRUE ~ "Overtime"
),
# Classify play type from play description
# This is imperfect but gives us directional insight
play_type_cat = case_when(
grepl("pass", desc, ignore.case = TRUE) ~ "Pass",
grepl("rush|run", desc, ignore.case = TRUE) ~ "Run",
TRUE ~ "Other"
)
)
# Calculate success by quarter situation
quarter_success <- two_pt_situations %>%
group_by(quarter_situation) %>%
summarise(
attempts = n(),
success_rate = mean(success, na.rm = TRUE),
.groups = "drop"
) %>%
arrange(desc(attempts)) # Order by sample size
# Success by play type
play_type_success <- two_pt_situations %>%
filter(play_type_cat != "Other") %>%
group_by(play_type_cat) %>%
summarise(
attempts = n(),
success_rate = mean(success, na.rm = TRUE),
.groups = "drop"
)
# Display quarter situation table
quarter_success %>%
gt() %>%
cols_label(
quarter_situation = "Game Situation",
attempts = "Attempts",
success_rate = "Success Rate"
) %>%
fmt_number(
columns = success_rate,
decimals = 1,
scale_by = 100,
pattern = "{x}%"
) %>%
fmt_number(
columns = attempts,
decimals = 0,
use_seps = TRUE
) %>%
tab_header(
title = "Two-Point Conversion Success by Game Situation",
subtitle = "2015-2023 Seasons"
) %>%
tab_source_note(
source_note = "Data: nflfastR"
)
#| label: situational-success-py
#| message: false
#| warning: false
# Analyze two-point success by various factors
# Filter to post-2015 era for current relevance
two_pt_situations = conversion_attempts[
(conversion_attempts['two_point_attempt'] == 1) &
(conversion_attempts['season'] >= 2015)
].copy()
# Capture score differential before conversion
two_pt_situations['score_diff_before'] = two_pt_situations['score_differential']
# Create quarter situation categories
def quarter_situation(row):
"""Categorize play by quarter and time remaining"""
if row['qtr'] <= 2:
return 'First Half'
elif row['qtr'] == 3:
return 'Third Quarter'
elif row['qtr'] == 4 and row['game_seconds_remaining'] > 300:
return 'Early 4th Q'
elif row['qtr'] == 4 and row['game_seconds_remaining'] <= 300:
return 'Late 4th Q'
else:
return 'Overtime'
two_pt_situations['quarter_situation'] = two_pt_situations.apply(quarter_situation, axis=1)
# Classify play type from description
def play_type_cat(desc):
"""Extract play type from play description"""
if pd.isna(desc):
return 'Other'
desc_lower = str(desc).lower()
if 'pass' in desc_lower:
return 'Pass'
elif 'rush' in desc_lower or 'run' in desc_lower:
return 'Run'
else:
return 'Other'
two_pt_situations['play_type_cat'] = two_pt_situations['desc'].apply(play_type_cat)
# Calculate success by quarter situation
quarter_success = (two_pt_situations
.groupby('quarter_situation')
.agg(
attempts=('success', 'count'),
success_rate=('success', 'mean')
)
.reset_index()
.sort_values('attempts', ascending=False)
)
print("\nTwo-Point Conversion Success by Game Situation (2015-2023)")
print("="*60)
print(quarter_success.to_string(index=False))
# Calculate success by play type
play_type_success = (two_pt_situations[two_pt_situations['play_type_cat'] != 'Other']
.groupby('play_type_cat')
.agg(
attempts=('success', 'count'),
success_rate=('success', 'mean')
)
.reset_index()
)
print("\n\nTwo-Point Conversion Success by Play Type (2015-2023)")
print("="*60)
print(play_type_success.to_string(index=False))
These situational breakdowns reveal several interesting patterns:
Timing Effects: Success rates appear relatively stable across quarters, suggesting that defensive preparation and situational pressure don't dramatically affect outcomes. This is somewhat surprising—we might expect late-game pressure or defensive anticipation to reduce success rates, but the data doesn't strongly support this.
Play Type Differences: Pass plays typically show slightly higher success rates than run plays (typically 49-51% vs 45-47%), though the difference is smaller than many expect. This suggests offensive coordinators have reasonably balanced play calling, preventing defenses from selling out against one approach.
Sample Size Reality Check: While we observe differences between situations, many are likely within the range of random variation. With even 100-200 attempts per category, we'd expect success rates to vary by ±4-5 percentage points due to chance alone. Statistical significance testing would be needed to confirm which differences are meaningful.
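The significance testing mentioned above can be done with a standard two-proportion z-test. A sketch using hypothetical counts of the same order as the pass/run split discussed earlier (the specific numbers are illustrative, not computed from the data):

```python
import math

def two_proportion_z_test(success1: int, n1: int, success2: int, n2: int):
    """Two-sided z-test for a difference between two proportions."""
    p1, p2 = success1 / n1, success2 / n2
    p_pool = (success1 + success2) / (n1 + n2)  # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF (via erf)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical: 50% on 600 pass attempts vs 46% on 400 run attempts
z, p = two_proportion_z_test(300, 600, 184, 400)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Even a 4-point gap on these sample sizes yields z of about 1.24 and a p-value around 0.2, far from conventional significance, which reinforces the caution about over-interpreting situational splits.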
Avoiding the Small Sample Fallacy
When analyzing situational success rates, resist the temptation to over-interpret small differences based on limited data. For example, if "overtime" shows a 60% success rate but only has 15 attempts, this doesn't mean teams are dramatically better in overtime—it could easily be random variation. Rules of thumb for sample sizes:

- **10-50 attempts**: Treat success rates as very uncertain, ±10-15 percentage points
- **50-200 attempts**: Moderate confidence, ±5-8 percentage points
- **200-1000 attempts**: Good confidence, ±3-5 percentage points
- **1000+ attempts**: High confidence, ±1-3 percentage points

Always calculate confidence intervals to quantify uncertainty rather than treating point estimates as truth.

Building Two-Point Conversion Probability Models
Logistic Regression Model
While overall success rates provide a useful baseline, we can build more sophisticated models that account for multiple factors simultaneously. Logistic regression allows us to estimate how various features (home field, score differential, game situation, etc.) independently affect two-point conversion probability.
This modeling approach offers several advantages:
1. Simultaneous control: We can isolate each factor's effect while controlling for others
2. Probabilistic predictions: We get probability estimates for specific situations
3. Uncertainty quantification: Coefficients come with confidence intervals
4. Interpretability: Coefficients show which factors matter most
We'll build a logistic regression model using features that theory suggests might affect success:
- Home field advantage: Teams converting at home may benefit from crowd noise and familiarity
- Score differential: Trailing/leading status might affect play calling or defensive alignment
- Late game pressure: Critical situations might affect execution
- Weather conditions: Roof type (dome/outdoor/retractable) may affect passing success
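Before fitting the full model on real data, the mechanics of logistic regression can be seen in a self-contained sketch fit by Newton's method (iteratively reweighted least squares) on synthetic attempts. All data, feature names, and coefficient values here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Synthetic two-point attempts: home indicator and score differential
home = rng.integers(0, 2, n).astype(float)
score_diff = rng.normal(0, 7, n)
X = np.column_stack([np.ones(n), home, score_diff])  # intercept, home, score_diff

# "True" log-odds: baseline near 48%, small home boost, no score effect (assumed)
true_beta = np.array([-0.08, 0.10, 0.0])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ true_beta))).astype(float)

# Newton's method on the log-likelihood (IRLS), the standard GLM fitting approach
beta = np.zeros(3)
for _ in range(25):
    p_hat = 1 / (1 + np.exp(-X @ beta))     # current predicted probabilities
    W = p_hat * (1 - p_hat)                 # observation weights
    grad = X.T @ (y - p_hat)                # score (gradient of log-likelihood)
    H = (X * W[:, None]).T @ X              # Fisher information matrix
    beta += np.linalg.solve(H, grad)        # Newton update

print("estimated coefficients:", np.round(beta, 3))
print("odds ratio for home:", round(float(np.exp(beta[1])), 3))
```

Exponentiating a coefficient gives the odds ratio, the same interpretation used for the nflfastR model's coefficient table below. With 5,000 attempts the home effect is recovered only roughly, echoing the sample-size caveats from earlier in the chapter.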
#| label: two-pt-model-r
#| message: false
#| warning: false
# Prepare modeling data with relevant features
model_data <- conversion_attempts %>%
filter(
two_point_attempt == 1, # Only two-point conversion attempts
season >= 2015, # Post-rule change era
!is.na(success) # Must have known outcome
) %>%
mutate(
# Create binary indicator for home attempts
home_attempt = if_else(posteam == home_team, 1, 0),
# Score differential (positive = leading, negative = trailing)
score_diff = score_differential,
# Late game indicator (4th quarter, <5 minutes)
is_late_game = if_else(qtr == 4 & game_seconds_remaining < 300, 1, 0),
# Simplify roof types to three categories
roof_type = case_when(
roof == "outdoors" ~ "Outdoor",
roof == "dome" ~ "Dome",
TRUE ~ "Retractable"
)
) %>%
# Select modeling variables and remove any rows with missing data
select(success, home_attempt, score_diff, is_late_game, qtr, roof_type) %>%
filter(complete.cases(.))
# Fit logistic regression model
# Logistic regression models binary outcomes (success/failure)
# It estimates log-odds of success as a linear function of predictors
model <- glm(
success ~ home_attempt + score_diff + is_late_game + qtr + roof_type,
data = model_data,
family = binomial(link = "logit") # Logistic regression specification
)
# Display model summary
summary(model)
# Generate predicted probabilities for each attempt
# type = "response" converts log-odds to probabilities
model_data$predicted_prob <- predict(model, type = "response")
# Evaluate model performance
library(pROC)
# ROC curve and AUC measure discrimination ability
# AUC of 1.0 = perfect predictions, 0.5 = no better than random
roc_obj <- roc(model_data$success, model_data$predicted_prob)
auc_value <- auc(roc_obj)
# Brier score measures calibration (lower is better)
# It's the mean squared error of predicted probabilities
brier_score <- mean((model_data$success - model_data$predicted_prob)^2)
cat("\nModel Performance:\n")
cat("AUC:", round(auc_value, 3), "\n")
cat("Brier Score:", round(brier_score, 4), "\n")
# Create interpretable coefficient table
coef_df <- data.frame(
variable = names(coef(model)),
coefficient = coef(model),
odds_ratio = exp(coef(model)), # Exponentiate to get odds ratios
p_value = summary(model)$coefficients[, 4]
) %>%
filter(variable != "(Intercept)")
# Display coefficient table
coef_df %>%
gt() %>%
cols_label(
variable = "Variable",
coefficient = "Coefficient",
odds_ratio = "Odds Ratio",
p_value = "P-value"
) %>%
fmt_number(
columns = c(coefficient, odds_ratio),
decimals = 3
) %>%
fmt_number(
columns = p_value,
decimals = 4
) %>%
tab_header(
title = "Two-Point Conversion Success Model",
subtitle = "Logistic Regression Coefficients"
)
#| label: two-pt-model-py
#| message: false
#| warning: false
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, brier_score_loss
from sklearn.preprocessing import LabelEncoder
import warnings
warnings.filterwarnings('ignore')
# Prepare modeling data with relevant features
model_data = conversion_attempts[
(conversion_attempts['two_point_attempt'] == 1) &
(conversion_attempts['season'] >= 2015) &
(conversion_attempts['success'].notna())
].copy()
# Create feature variables
model_data['home_attempt'] = (model_data['posteam'] == model_data['home_team']).astype(int)
model_data['score_diff'] = model_data['score_differential']
model_data['is_late_game'] = (
(model_data['qtr'] == 4) &
(model_data['game_seconds_remaining'] < 300)
).astype(int)
# Categorize roof type
def roof_type(roof):
if roof == 'outdoors':
return 'Outdoor'
elif roof == 'dome':
return 'Dome'
else:
return 'Retractable'
model_data['roof_type'] = model_data['roof'].apply(roof_type)
# Select features and remove missing values
features = ['home_attempt', 'score_diff', 'is_late_game', 'qtr', 'roof_type']
model_data_clean = model_data[features + ['success']].dropna()
# Encode categorical variables (roof_type) as numeric
# Note: integer label encoding imposes an arbitrary ordering on a nominal
# variable; one-hot encoding (pd.get_dummies) is generally preferable for
# linear models, but label encoding keeps this example compact
le = LabelEncoder()
model_data_clean['roof_encoded'] = le.fit_transform(model_data_clean['roof_type'])
# Prepare feature matrix (X) and target vector (y)
X = model_data_clean[['home_attempt', 'score_diff', 'is_late_game', 'qtr', 'roof_encoded']]
y = model_data_clean['success'].astype(int)
# Fit logistic regression model
# sklearn applies L2 regularization by default; a large C makes the fit
# effectively unregularized, matching R's glm()
model = LogisticRegression(C=1e6, random_state=42, max_iter=1000)
model.fit(X, y)
# Generate predicted probabilities
# [:, 1] selects probability of success (class 1)
y_pred_proba = model.predict_proba(X)[:, 1]
# Calculate performance metrics
auc_value = roc_auc_score(y, y_pred_proba)
brier_score = brier_score_loss(y, y_pred_proba)
print("\nTwo-Point Conversion Success Model")
print("="*60)
print(f"AUC: {auc_value:.3f}")
print(f"Brier Score: {brier_score:.4f}")
# Display coefficients with odds ratios
coef_df = pd.DataFrame({
'Variable': ['home_attempt', 'score_diff', 'is_late_game', 'qtr', 'roof_encoded'],
'Coefficient': model.coef_[0],
'Odds_Ratio': np.exp(model.coef_[0]) # Exponentiate for interpretability
})
print("\nModel Coefficients:")
print(coef_df.to_string(index=False))
Understanding Model Performance
The AUC value of approximately 0.52-0.54 might seem low, but this is actually expected for two-point conversion prediction. Here's why: Two-point conversions are inherently high-variance events. Even if we knew every relevant factor—offensive and defensive quality, play call, defensive alignment—there's still massive randomness in execution: dropped passes, broken tackles, referee calls, lucky bounces. Compare to other prediction tasks:
- **Coin flips**: AUC = 0.50 (pure random)
- **Two-point conversions**: AUC ≈ 0.53 (mostly random with small signal)
- **Fourth down conversions**: AUC ≈ 0.65 (more predictable factors)
- **Game winners**: AUC ≈ 0.75 (many predictive factors)
An AUC of 0.53 means our model performs slightly better than random chance—we've captured a small amount of signal in a very noisy system. This is still useful: improving from 48% to 50% success probability changes expected value from 0.96 to 1.00 points, a meaningful difference.
The model results typically show:
Significant Factors:
- Home field: Small positive effect (1-2 percentage points), likely due to crowd noise disrupting defensive communication
- Score differential: Minimal effect, suggesting teams don't significantly change success rates when desperate
- Late game: Slightly negative effect, possibly due to defensive preparation or offensive predictability
Insignificant Factors:
- Quarter: Little systematic variation across quarters
- Roof type: Minimal effect on success rates
Strategic Implications: The relatively flat effects across most variables suggest that baseline success rate (~48%) is a reasonable estimate for most situations. Adjustments based on game context should be modest (±2-3 percentage points at most) unless we have strong team-specific information.
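The baseline expected-value comparison behind that ~48% figure takes only a few lines (the success rates are the chapter's assumed post-2015 baselines):

```python
p_2pt, p_xp = 0.48, 0.94  # assumed baseline success rates (post-2015)

ev_2pt = 2 * p_2pt  # expected points from going for two
ev_xp = 1 * p_xp    # expected points from kicking

# The 2PT success rate at which both options have equal expected value
breakeven = p_xp / 2

print(f"EV(2PT) = {ev_2pt:.2f} points, EV(XP) = {ev_xp:.2f} points")
print(f"Break-even 2PT success rate: {breakeven:.0%}")
```

The break-even rate of 47% sits just below the observed league rate, which is why modest team- or situation-specific adjustments can flip the expected-value verdict in either direction.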
Score-Specific Optimal Strategies
The optimal two-point decision depends heavily on the score differential. While expected value analysis suggests always going for two (0.96 vs 0.94 points), win probability considerations often override this. The key insight is that specific score differentials create strategic inflection points where certain point totals fundamentally change the path to winning.
Traditional analytical wisdom identifies several critical scenarios for going for two:
- Down 8 after a TD (trailed by 14 before scoring): Going for two early sets up a potential tie with a later TD + 2PT, and provides information about what's needed
- Down 2 after a TD (trailed by 8 before scoring): Going for two attempts to tie the game immediately
- Up 1 after a TD (trailed by 5 before scoring): Going for two attempts to create a 3-point lead (field goal margin)
- Late game scenarios: Any situation where specific point differentials affect winning paths
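The down-14 logic can be sanity-checked with a quick calculation. Conditional on scoring two more touchdowns while holding the opponent scoreless, ignoring missed extra points, and treating overtime as a coin flip (all illustrative assumptions), going for two early wins more often:

```python
p2, p_ot = 0.48, 0.5  # assumed 2PT success rate; overtime as a coin flip

# Strategy A: kick both extra points -> tied at the end -> overtime
wp_kick_both = p_ot

# Strategy B: go for two after the first TD
#   success (down 6): second TD + XP wins in regulation
#   failure (down 8): second TD + 2PT ties -> overtime
wp_go_early = p2 * 1.0 + (1 - p2) * (p2 * p_ot)

print(f"Kick both XPs:    win prob {wp_kick_both:.3f}")
print(f"Go for two early: win prob {wp_go_early:.3f}")
```

Even this rough sketch shows the early attempt adding roughly ten percentage points of win probability over playing for overtime, which is why the recommendation survives far more detailed models.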
Let's formalize this intuition by creating decision charts that map score differentials to optimal decisions.
The Information Value of Two-Point Attempts
One underappreciated aspect of two-point decisions is their information value. When you attempt a two-point conversion, the result provides information that affects optimal strategy for the remainder of the game.
**Example**: Down 14, you score a TD (now down 8). If you go for two:
- **Success**: You're down 6. Now you know a second TD ties and the XP wins.
- **Failure**: You're down 8. Now you know you need a second TD plus a 2PT to tie.
This information helps you optimize clock management and play calling on subsequent drives. If you kick (down 7), you defer the conversion decision until after your next touchdown, when there's no time left to adjust if the attempt fails.
This information value isn't captured in simple expected value calculations but can be formalized in dynamic programming models that account for decision trees.
The Classic Two-Point Chart
The classic two-point conversion chart provides guidance for trailing teams based on score differential after a touchdown. These charts, popularized by analysts like Kevin Cole and Brian Burke, codify the situations where analytics clearly favor going for two.
#| label: two-point-chart-r
#| message: false
#| warning: false
#| fig-width: 10
#| fig-height: 8
# Create two-point decision chart for trailing scenarios
# This function encodes analytical guidance for when to go for two
create_two_point_chart <- function(p_2pt = 0.48, p_xp = 0.94) {
# Score differentials to consider (when trailing before scoring TD)
score_diffs <- 1:14
# Calculate optimal decision for each scenario
decisions <- tibble(
score_diff_after_td = score_diffs - 6, # Points still trailing after the TD (negative = now leading)
go_for_2 = case_when(
# Down 8 (now down 2): Go for 2 to tie immediately
score_diffs == 8 ~ TRUE,
# Down 14 (now down 8): Go for 2 to set up second 2PT for tie
score_diffs == 14 ~ TRUE,
# Down 7 (now down 1): Kick to tie (most risk-averse option)
score_diffs == 7 ~ FALSE,
# Down 1-6: Generally kick to take lead or close gap
score_diffs <= 6 ~ FALSE,
# Down 9-13: Complex situations, generally kick
TRUE ~ FALSE
),
# Provide reasoning for each decision
reasoning = case_when(
score_diffs == 8 ~ "Go for tie",
score_diffs == 14 ~ "Set up 2nd 2PT to tie",
score_diffs == 7 ~ "Take the tie",
score_diffs <= 6 ~ "Take lead/close gap",
TRUE ~ "Kick XP"
)
)
return(decisions)
}
# Generate chart data
chart_data <- create_two_point_chart()
# Create visual representation
chart_data %>%
mutate(
decision_label = if_else(go_for_2, "GO FOR 2", "KICK XP"),
# score_diff_after_td is the points still trailing after the TD
# (negative values mean the team now leads)
score_label = case_when(
score_diff_after_td > 0 ~ paste0("Down ", score_diff_after_td),
score_diff_after_td == 0 ~ "Tied",
TRUE ~ paste0("Up ", -score_diff_after_td)
)
) %>%
ggplot(aes(x = reorder(score_label, -score_diff_after_td),
y = 1,
fill = go_for_2)) +
geom_tile(color = "white", size = 2) +
# Add decision text
geom_text(aes(label = decision_label),
size = 5, fontface = "bold", color = "white") +
# Add reasoning text below decision
geom_text(aes(label = reasoning),
y = 0.6, size = 3.5, color = "white") +
scale_fill_manual(
values = c("TRUE" = "#D50032", "FALSE" = "#0076CE"),
guide = "none"
) +
labs(
title = "Two-Point Conversion Decision Chart",
subtitle = "Optimal decisions when TRAILING after scoring a touchdown",
x = "Score Differential (After TD, Before Conversion Attempt)",
y = NULL,
caption = "Assumes: 48% 2PT success, 94% XP success\nChart represents general strategy; late-game situations require dynamic analysis"
) +
theme_minimal() +
theme(
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
panel.grid = element_blank(),
plot.title = element_text(face = "bold", size = 16),
plot.subtitle = element_text(size = 11),
axis.text.x = element_text(size = 11, face = "bold")
) +
coord_fixed(ratio = 3)
#| label: two-point-chart-py
#| message: false
#| warning: false
#| fig-width: 10
#| fig-height: 8
def create_two_point_chart(p_2pt=0.48, p_xp=0.94):
"""
Create two-point decision chart for trailing scenarios
Args:
p_2pt: Probability of two-point conversion success
p_xp: Probability of extra point success
Returns:
DataFrame with optimal decisions by score differential
"""
score_diffs = list(range(1, 15))
decisions = []
for diff in score_diffs:
# Points still trailing after the TD (negative = now leading)
score_diff_after_td = diff - 6
# Apply decision logic based on score differential
if diff == 8:
go_for_2 = True
reasoning = "Go for tie"
elif diff == 14:
go_for_2 = True
reasoning = "Set up 2nd 2PT to tie"
elif diff == 7:
go_for_2 = False
reasoning = "Take the tie"
elif diff <= 6:
go_for_2 = False
reasoning = "Take lead/close gap"
else:
go_for_2 = False
reasoning = "Kick XP"
decisions.append({
'score_diff_after_td': score_diff_after_td,
'go_for_2': go_for_2,
'reasoning': reasoning
})
return pd.DataFrame(decisions)
# Generate chart data
chart_data = create_two_point_chart()
# Create visualization
fig, ax = plt.subplots(figsize=(12, 4))
for idx, row in chart_data.iterrows():
decision_label = "GO FOR 2" if row['go_for_2'] else "KICK XP"
color = '#D50032' if row['go_for_2'] else '#0076CE'
# Draw rectangle for each score scenario
rect = plt.Rectangle((idx, 0), 1, 1, facecolor=color, edgecolor='white', linewidth=2)
ax.add_patch(rect)
# Add decision label
ax.text(idx + 0.5, 0.7, decision_label,
ha='center', va='center', fontsize=10, fontweight='bold', color='white')
# Add reasoning label
ax.text(idx + 0.5, 0.3, row['reasoning'],
ha='center', va='center', fontsize=8, color='white')
# Configure plot
ax.set_xlim(0, len(chart_data))
ax.set_ylim(0, 1)
def diff_label(d):
    # d is the points still trailing after the TD (negative = now leading)
    return f"Down {d}" if d > 0 else ("Tied" if d == 0 else f"Up {-d}")
ax.set_xticks([i + 0.5 for i in range(len(chart_data))])
ax.set_xticklabels([diff_label(row['score_diff_after_td'])
                    for _, row in chart_data.iterrows()],
                   fontweight='bold', fontsize=10)
ax.set_yticks([])
ax.set_xlabel('Score Differential (After TD, Before Conversion Attempt)',
fontsize=12, fontweight='bold')
ax.set_title('Two-Point Conversion Decision Chart\nOptimal decisions when TRAILING after scoring a touchdown',
fontsize=14, fontweight='bold', pad=20)
ax.text(0.5, -0.4, 'Assumes: 48% 2PT success, 94% XP success\nChart represents general strategy; late-game situations require dynamic analysis',
transform=ax.transAxes, ha='center', fontsize=8, style='italic')
plt.tight_layout()
plt.show()
The decision chart reveals an important principle: two-point conversions are most valuable when they enable or prevent specific score differentials that fundamentally alter win paths. Converting when down 2 (after the touchdown) ties the game—a massive win probability change. Converting when down 5 makes it a 3-point game instead of a 4-point game—a smaller win probability change.
Why "Down 14" is Counterintuitive
Many coaches and fans find the "go for two when down 14" recommendation counterintuitive. The reasoning seems backwards: why try the harder option first when you could kick two easy extra points? The key is information and optionality:
**Strategy 1: Kick both extra points**
- First TD: Kick → down 7
- Second TD: Kick → tied (overtime), or go for two → win or lose on a single play
- Problem: You don't learn whether you need a two-point conversion until after the second touchdown, when there is no time left to recover from a failure
**Strategy 2: Go for two on first TD, adjust based on result**
- First TD: Go for 2
- Success → down 6 → Second TD ties, and the XP wins in regulation
- Failure → down 8 → Second TD + 2PT still ties
- Advantage: Maintains optionality—a failed early attempt leaves time to adjust clock management and play calling, while a successful one converts a likely overtime into a regulation win
The second strategy gives you flexibility to adapt based on how the game unfolds, while the first strategy locks you into a predetermined path.
Dynamic Programming for Late-Game Decisions
The simple decision chart provides good heuristics, but late-game situations require more sophisticated analysis. When time is limited, we need to consider:
- Expected possessions: How many more drives will each team have?
- Clock management: Can the opponent run out the clock if they get the ball?
- Win/tie/loss probabilities: Different outcomes have different values in different time situations
- Downstream scenarios: How does this decision affect future decision points?
Dynamic programming provides a framework for this analysis. We work backwards from the end of the game, calculating optimal strategies at each decision point given optimal play thereafter. While a full dynamic programming solution is complex (requiring possession models, scoring probabilities, etc.), we can build a simplified model that captures the key insights.
#| label: dynamic-programming-r
#| message: false
#| warning: false
# Dynamic programming for optimal two-point decisions
# This is a simplified model for demonstration purposes
# A full model would incorporate possession probabilities, opponent scoring rates, etc.
calculate_win_probability_2pt <- function(
score_diff_after_td, # Score after TD, before conversion
time_remaining_seconds,
p_2pt = 0.48,
p_xp = 0.94
) {
# This simplified model uses score differential as proxy for win probability
# In practice, you'd use a full win probability model incorporating:
# - Time remaining
# - Timeouts
# - Possession dynamics
# - Team quality
# - Field position
# Calculate score after each possible outcome
score_after_2pt_success <- score_diff_after_td + 2
score_after_2pt_fail <- score_diff_after_td
score_after_xp_success <- score_diff_after_td + 1
score_after_xp_fail <- score_diff_after_td
# Simple win probability based on score differential
# Using normal CDF centered at 0 as approximation
# Positive score differential → higher win probability
wp_2pt_success <- pnorm(score_after_2pt_success / 7)
wp_2pt_fail <- pnorm(score_after_2pt_fail / 7)
wp_xp_success <- pnorm(score_after_xp_success / 7)
wp_xp_fail <- pnorm(score_after_xp_fail / 7)
# Calculate expected win probability for each decision
ev_2pt <- p_2pt * wp_2pt_success + (1 - p_2pt) * wp_2pt_fail
ev_xp <- p_xp * wp_xp_success + (1 - p_xp) * wp_xp_fail
return(list(
ev_2pt = ev_2pt,
ev_xp = ev_xp,
optimal = if_else(ev_2pt > ev_xp, "Go for 2", "Kick XP"),
advantage = ev_2pt - ev_xp
))
}
# Create decision matrix for different score/time scenarios
scenarios <- expand_grid(
score_diff = seq(-8, -1, 1), # Trailing by 1-8 after TD
time_remaining = c(600, 300, 120, 60) # 10:00, 5:00, 2:00, 1:00
) %>%
rowwise() %>%
mutate(
# Calculate optimal decision for each scenario
# (inside rowwise(), the list-column element must be unwrapped with [[1]])
result = list(calculate_win_probability_2pt(score_diff, time_remaining)),
ev_2pt = result[[1]]$ev_2pt,
ev_xp = result[[1]]$ev_xp,
optimal = result[[1]]$optimal,
advantage = result[[1]]$advantage
) %>%
ungroup() %>%
mutate(
# Create readable time labels
time_label = case_when(
time_remaining == 600 ~ "10:00",
time_remaining == 300 ~ "5:00",
time_remaining == 120 ~ "2:00",
time_remaining == 60 ~ "1:00"
)
)
# Visualize decision matrix as heatmap
scenarios %>%
# Order time labels chronologically rather than alphabetically
mutate(time_label = factor(time_label, levels = c("10:00", "5:00", "2:00", "1:00"))) %>%
ggplot(aes(x = factor(score_diff), y = time_label, fill = advantage)) +
geom_tile(color = "white", size = 1) +
# Add optimal decision text to each cell
geom_text(aes(label = optimal), size = 4, fontface = "bold") +
scale_fill_gradient2(
low = "#0076CE", # Blue for "kick XP"
mid = "white", # White for neutral
high = "#D50032", # Red for "go for 2"
midpoint = 0,
name = "WP Advantage\n(2PT - XP)"
) +
labs(
title = "Dynamic Two-Point Conversion Decisions",
subtitle = "Optimal strategy by score and time remaining (simplified model)",
x = "Score Differential (After TD, Before Conversion)",
y = "Time Remaining",
caption = "Assumes: 48% 2PT success, 94% XP success | Model simplified for illustration"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14),
legend.position = "right"
)
#| label: dynamic-programming-py
#| message: false
#| warning: false
from scipy.stats import norm
def calculate_win_probability_2pt(
score_diff_after_td,
time_remaining_seconds,
p_2pt=0.48,
p_xp=0.94
):
"""
Calculate expected win probability for 2PT vs XP decision
This is a simplified model for demonstration. A full model would incorporate:
- Expected possessions remaining
- Opponent scoring probabilities
- Field position effects
- Timeout situations
- Team-specific factors
Args:
score_diff_after_td: Score differential after TD, before conversion attempt
time_remaining_seconds: Seconds remaining in game
p_2pt: Probability of two-point conversion success
p_xp: Probability of extra point success
Returns:
Dictionary with expected win probabilities and optimal decision
"""
# Calculate score after each possible outcome
score_after_2pt_success = score_diff_after_td + 2
score_after_2pt_fail = score_diff_after_td
score_after_xp_success = score_diff_after_td + 1
score_after_xp_fail = score_diff_after_td
# Simple win probability based on score differential
# Using normal CDF as approximation (centered at 0, scaled by 7)
wp_2pt_success = norm.cdf(score_after_2pt_success / 7)
wp_2pt_fail = norm.cdf(score_after_2pt_fail / 7)
wp_xp_success = norm.cdf(score_after_xp_success / 7)
wp_xp_fail = norm.cdf(score_after_xp_fail / 7)
# Calculate expected win probability for each decision
ev_2pt = p_2pt * wp_2pt_success + (1 - p_2pt) * wp_2pt_fail
ev_xp = p_xp * wp_xp_success + (1 - p_xp) * wp_xp_fail
return {
'ev_2pt': ev_2pt,
'ev_xp': ev_xp,
'optimal': 'Go for 2' if ev_2pt > ev_xp else 'Kick XP',
'advantage': ev_2pt - ev_xp
}
# Create decision matrix for various score/time scenarios
score_diffs = list(range(-8, 0)) # Down 8 to down 1
time_remaining_values = [600, 300, 120, 60] # Different time scenarios
time_labels = {600: '10:00', 300: '5:00', 120: '2:00', 60: '1:00'}
scenarios = []
for score_diff in score_diffs:
for time_remaining in time_remaining_values:
result = calculate_win_probability_2pt(score_diff, time_remaining)
scenarios.append({
'score_diff': score_diff,
'time_remaining': time_remaining,
'time_label': time_labels[time_remaining],
'ev_2pt': result['ev_2pt'],
'ev_xp': result['ev_xp'],
'optimal': result['optimal'],
'advantage': result['advantage']
})
scenarios_df = pd.DataFrame(scenarios)
# Create heatmap visualization
# Order rows chronologically (pivot would otherwise sort time labels alphabetically)
time_order = ['10:00', '5:00', '2:00', '1:00']
pivot_data = scenarios_df.pivot(index='time_label', columns='score_diff', values='advantage').reindex(time_order)
pivot_labels = scenarios_df.pivot(index='time_label', columns='score_diff', values='optimal').reindex(time_order)
fig, ax = plt.subplots(figsize=(12, 6))
# Create heatmap with color gradient
im = ax.imshow(pivot_data.values, cmap='RdBu_r', aspect='auto', vmin=-0.1, vmax=0.1)
# Set axis ticks and labels
ax.set_xticks(np.arange(len(pivot_data.columns)))
ax.set_yticks(np.arange(len(pivot_data.index)))
ax.set_xticklabels(pivot_data.columns)
ax.set_yticklabels(pivot_data.index)
# Add text annotations showing optimal decision
for i in range(len(pivot_data.index)):
for j in range(len(pivot_data.columns)):
text = ax.text(j, i, pivot_labels.iloc[i, j],
ha="center", va="center", color="black", fontweight='bold', fontsize=9)
# Labels and title
ax.set_xlabel('Score Differential (After TD, Before Conversion)', fontsize=12, fontweight='bold')
ax.set_ylabel('Time Remaining', fontsize=12, fontweight='bold')
ax.set_title('Dynamic Two-Point Conversion Decisions\nOptimal strategy by score and time remaining (simplified model)',
fontsize=14, fontweight='bold', pad=20)
# Add colorbar
cbar = plt.colorbar(im, ax=ax)
cbar.set_label('WP Advantage\n(2PT - XP)', rotation=270, labelpad=20, fontweight='bold')
plt.tight_layout()
fig.text(0.5, 0.01, 'Assumes: 48% 2PT success, 94% XP success | Model simplified for illustration',
ha='center', fontsize=8, style='italic')
plt.show()
Even with this simplified model, we can see key patterns:
Down 2 (trailing by 8, scored TD): Strong preference for going for two across all time scenarios. Converting ties the game immediately—a massive win probability boost that far outweighs the expected value calculation.
Down 1 (trailing by 7, scored TD): Preference for kicking across all times. Guaranteeing a tie is worth more than the variance of a two-point attempt, especially with sufficient time for overtime.
Intermediate Scores: More nuanced decisions where time remaining would matter more in a full model. With very little time, guaranteed points become more valuable because there's no chance to make up failed attempts.
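Before outlining what a complete model requires, a toy backward induction over a few alternating possessions shows the mechanics. Every rate here (drive touchdown probability, conversion rates) is an illustrative assumption, and field goals, clock, and field position are ignored:

```python
from functools import lru_cache

# Toy backward induction: possessions alternate, each drive scores a TD with
# probability Q_TD (no field goals, no clock). All rates are illustrative.
Q_TD, P_2PT, P_XP = 0.25, 0.48, 0.94

@lru_cache(maxsize=None)
def win_prob(diff, drives_left, our_ball):
    """P(we win) given score differential; ties are worth 0.5 (coin-flip OT)."""
    if drives_left == 0:
        return 1.0 if diff > 0 else (0.5 if diff == 0 else 0.0)
    if our_ball:
        no_score = (1 - Q_TD) * win_prob(diff, drives_left - 1, False)
        # After a TD (+6), take whichever conversion choice is better
        go2 = (P_2PT * win_prob(diff + 8, drives_left - 1, False)
               + (1 - P_2PT) * win_prob(diff + 6, drives_left - 1, False))
        kick = (P_XP * win_prob(diff + 7, drives_left - 1, False)
                + (1 - P_XP) * win_prob(diff + 6, drives_left - 1, False))
        return no_score + Q_TD * max(go2, kick)
    no_score = (1 - Q_TD) * win_prob(diff, drives_left - 1, True)
    # Opponent simply kicks the XP after a TD in this sketch
    opp_score = Q_TD * (P_XP * win_prob(diff - 7, drives_left - 1, True)
                        + (1 - P_XP) * win_prob(diff - 6, drives_left - 1, True))
    return no_score + opp_score

# We just scored to pull within 2 (down 8 before the TD); two drives remain
go2 = P_2PT * win_prob(0, 2, False) + (1 - P_2PT) * win_prob(-2, 2, False)
kick = P_XP * win_prob(-1, 2, False) + (1 - P_XP) * win_prob(-2, 2, False)
print(f"Down 2 after TD: go-for-2 WP = {go2:.3f}, kick WP = {kick:.3f}")
```

Under these assumptions the recursion strongly prefers going for two when down 2, echoing the heatmap above; a production model would enrich the state space rather than change this basic structure.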
Building a Full Dynamic Programming Model
A complete dynamic programming model for two-point decisions would need:
1. **State Space**: Define all possible game states
- Score differential
- Time remaining
- Possession (who has the ball)
- Field position
- Timeouts remaining
- Down and distance
2. **Transition Probabilities**: Model how states evolve
- Probability of scoring on a drive by field position
- Time taken per drive
- Probability of defensive stops
- Turnover probabilities
3. **Value Function**: Define value of each state
- Terminal states (game over): 1 for win, 0 for loss, 0.5 for tie
- Non-terminal states: Expected value of optimal future play
4. **Backward Induction**: Work backwards from game end
- Start with 0:00 remaining (terminal states)
- Calculate optimal decisions and values for each state
- Move backwards in time until game start
This is computationally intensive but provides exact optimal strategies for all scenarios. Chapter 24 on fourth down decisions covers similar dynamic programming approaches in more detail.
Play-Calling on Two-Point Attempts
Beyond deciding when to attempt two-point conversions, teams must decide how to attempt them. Should teams pass or run? From what formations? With what personnel packages? These tactical decisions significantly affect success probability.
Analyzing play-calling patterns reveals strategic tendencies and helps identify best practices. We can examine:
1. Pass vs. run balance: Do teams pass or run more often?
2. Success rates by play type: Which approach works better?
3. Trends over time: Has play-calling evolved?
4. Predictability: Do teams become too predictable in their calls?
Historical Play-Calling Trends
The following analysis examines how teams have called plays on two-point conversions since the 2015 rule change, tracking both the distribution of pass vs. run plays and their relative success rates.
#| label: play-calling-analysis-r
#| message: false
#| warning: false
#| fig-width: 10
#| fig-height: 6
# Analyze play-calling on two-point attempts
# We'll classify plays as pass or run and track success rates
two_pt_plays <- conversion_attempts %>%
filter(
two_point_attempt == 1,
season >= 2015
) %>%
mutate(
# Classify play type from play description text
# This is imperfect but provides directional insight
play_type_clean = case_when(
grepl("pass", desc, ignore.case = TRUE) ~ "Pass",
grepl("rush|run", desc, ignore.case = TRUE) ~ "Run",
grepl("scramble", desc, ignore.case = TRUE) ~ "Scramble",
TRUE ~ "Other"
)
) %>%
# Focus on standard pass and run plays
filter(play_type_clean %in% c("Pass", "Run"))
# Calculate annual trends in play-calling and success
play_calling_trends <- two_pt_plays %>%
group_by(season, play_type_clean) %>%
summarise(
attempts = n(),
success_rate = mean(success, na.rm = TRUE),
.groups = "drop"
) %>%
group_by(season) %>%
mutate(
# Calculate percentage of two-point attempts using each play type
pct_of_attempts = attempts / sum(attempts)
)
# Plot 1: Distribution of play types over time
p1 <- play_calling_trends %>%
ggplot(aes(x = season, y = pct_of_attempts, color = play_type_clean)) +
geom_line(size = 1.2) +
geom_point(size = 3) +
scale_y_continuous(labels = scales::percent_format()) +
scale_color_manual(values = c("Pass" = "#0077BE", "Run" = "#FF6B35")) +
labs(
title = "Two-Point Attempt Play Type Distribution",
x = NULL,
y = "% of Attempts",
color = "Play Type"
) +
theme(legend.position = "top")
# Plot 2: Success rates by play type over time
p2 <- play_calling_trends %>%
ggplot(aes(x = season, y = success_rate, color = play_type_clean)) +
geom_line(size = 1.2) +
geom_point(size = 3) +
# Add reference line at overall 48% success rate
geom_hline(yintercept = 0.48, linetype = "dashed", alpha = 0.5) +
scale_y_continuous(labels = scales::percent_format()) +
scale_color_manual(values = c("Pass" = "#0077BE", "Run" = "#FF6B35")) +
labs(
title = "Success Rate by Play Type",
x = "Season",
y = "Success Rate",
color = "Play Type"
) +
theme(legend.position = "top")
# Combine plots vertically using patchwork's `/` operator
library(patchwork)
p1 / p2 +
plot_annotation(
caption = "Data: nflfastR | Dashed line shows overall 48% success rate"
)
#| label: play-calling-analysis-py
#| message: false
#| warning: false
#| fig-width: 10
#| fig-height: 8
# Analyze play-calling on two-point attempts
two_pt_plays = conversion_attempts[
(conversion_attempts['two_point_attempt'] == 1) &
(conversion_attempts['season'] >= 2015)
].copy()
def classify_play_type(desc):
"""Classify play type from play description text"""
if pd.isna(desc):
return 'Other'
desc_lower = str(desc).lower()
if 'pass' in desc_lower:
return 'Pass'
elif 'rush' in desc_lower or 'run' in desc_lower:
return 'Run'
elif 'scramble' in desc_lower:
return 'Scramble'
else:
return 'Other'
two_pt_plays['play_type_clean'] = two_pt_plays['desc'].apply(classify_play_type)
# Focus on standard pass and run plays
two_pt_plays = two_pt_plays[two_pt_plays['play_type_clean'].isin(['Pass', 'Run'])]
# Calculate trends in play-calling and success
play_calling_trends = (two_pt_plays
.groupby(['season', 'play_type_clean'])
.agg(
attempts=('success', 'count'),
success_rate=('success', 'mean')
)
.reset_index()
)
# Calculate percentage of attempts by play type
play_calling_trends['total_attempts'] = play_calling_trends.groupby('season')['attempts'].transform('sum')
play_calling_trends['pct_of_attempts'] = play_calling_trends['attempts'] / play_calling_trends['total_attempts']
# Create visualization with two subplots
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 8))
# Plot 1: Play type distribution over time
for play_type, color in [('Pass', '#0077BE'), ('Run', '#FF6B35')]:
data = play_calling_trends[play_calling_trends['play_type_clean'] == play_type]
ax1.plot(data['season'], data['pct_of_attempts'],
color=color, linewidth=2, marker='o', markersize=6, label=play_type)
ax1.set_ylabel('% of Attempts', fontsize=11)
ax1.set_title('Two-Point Attempt Play Type Distribution', fontsize=12, fontweight='bold')
ax1.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))
ax1.legend(loc='upper right', frameon=True)
ax1.grid(True, alpha=0.3)
ax1.set_xticks(range(2015, 2024))
# Plot 2: Success rates by play type over time
for play_type, color in [('Pass', '#0077BE'), ('Run', '#FF6B35')]:
data = play_calling_trends[play_calling_trends['play_type_clean'] == play_type]
ax2.plot(data['season'], data['success_rate'],
color=color, linewidth=2, marker='o', markersize=6, label=play_type)
# Add reference line at 48% overall success rate
ax2.axhline(y=0.48, color='gray', linestyle='--', alpha=0.5, linewidth=1.5)
ax2.set_xlabel('Season', fontsize=11)
ax2.set_ylabel('Success Rate', fontsize=11)
ax2.set_title('Success Rate by Play Type', fontsize=12, fontweight='bold')
ax2.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))
ax2.legend(loc='upper right', frameon=True)
ax2.grid(True, alpha=0.3)
ax2.set_xticks(range(2015, 2024))
plt.tight_layout()
fig.text(0.99, 0.01, 'Data: nfl_data_py | Dashed line shows overall 48% success rate',
ha='right', fontsize=8, style='italic')
plt.show()
The play-calling analysis reveals several patterns:
Pass-Heavy Approach: Teams pass on approximately 60-65% of two-point attempts, noticeably more than typical goal-line play-calling, which runs closer to 50-50 (league-wide play-calling overall is roughly 60-40 pass-run). This suggests teams view passing as more effective in short-yardage, high-stakes situations.
Success Rate Parity: Despite the pass-heavy approach, success rates are similar for passes (~49%) and runs (~47%). This rough parity suggests teams have found a strategic balance—if passes were dramatically more successful, defenses would adjust their alignments, and teams would run more.
Year-to-Year Variation: Both distribution and success rates show significant year-to-year variation, likely due to small sample sizes (100-200 attempts per category per year). We shouldn't interpret this variation as meaningful strategic shifts without additional evidence.
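To quantify how wide that year-to-year uncertainty is, we can compute a Wilson score interval for a single season's success rate. The counts below are hypothetical but typical of per-category sample sizes:

```python
import math

def wilson_ci(successes, attempts, z=1.96):
    """Wilson score interval for a binomial proportion (95% by default)."""
    p = successes / attempts
    denom = 1 + z**2 / attempts
    center = (p + z**2 / (2 * attempts)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / attempts
                                   + z**2 / (4 * attempts**2))
    return center - half, center + half

# Hypothetical season of pass attempts: 72 conversions in 150 tries
lo, hi = wilson_ci(72, 150)
print(f"Success rate: {72/150:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

With 150 attempts the interval spans roughly ±8 percentage points, wide enough to swallow most apparent year-to-year "trends."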
Strategic Implications: The 60-40 pass-run split with similar success rates suggests both play types are viable. Teams likely maintain this mix to prevent defensive predictability—if teams passed 90% of the time, defenses could defend accordingly and reduce pass success rates.
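The equilibrium intuition can be sketched as a simple two-by-two game. The payoff matrix below is entirely hypothetical, chosen only so that each play type succeeds more often when the defense guesses wrong:

```python
import numpy as np

# Rows: offense calls (pass, run); columns: defense expects (pass, run)
# Entries are offense success probabilities (illustrative values only)
success = np.array([[0.38, 0.58],
                    [0.56, 0.36]])

def offense_success(p_pass, q_pass):
    """Overall success prob when the offense passes w.p. p_pass and the
    defense expects pass w.p. q_pass."""
    p = np.array([p_pass, 1 - p_pass])
    q = np.array([q_pass, 1 - q_pass])
    return p @ success @ q

# A predictable offense lets the defense sell out against its tendency
print(f"Pass 90%, defense keys pass: {offense_success(0.9, 1.0):.3f}")
# A balanced mix stays near the league-wide rate whatever the defense guesses
print(f"Pass 50%, defense keys pass: {offense_success(0.5, 1.0):.3f}")
print(f"Pass 50%, defense keys run:  {offense_success(0.5, 0.0):.3f}")
```

Under these made-up payoffs, a 90% pass tendency against a keyed-in defense drops success well below the balanced mix, which holds near 47% regardless of the defensive guess—the game-theoretic logic behind the observed 60-40 split.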
The Predictability Problem
One challenge in two-point conversion play-calling is predictability. With limited plays and film to study, defenses can identify tendencies and prepare specific game plans for each opponent's two-point conversion package.
**Example Scenario**: If a team runs the same play or formation on 80% of their two-point attempts, opponents can prepare a specific defensive call that directly counters that tendency. This is why successful two-point packages typically include multiple formations, motion patterns, and play options—maintaining unpredictability even with a small playbook.
**Analytics Implication**: When evaluating two-point success rates, account for:
1. **Sample size**: 5 attempts is too small to identify true tendencies
2. **Opponent adjustments**: Success rates may decline as opponents accumulate film
3. **Context**: Desperate late-game attempts may force predictable play calls
Formation and Personnel Analysis
Beyond pass vs. run, the formations and personnel groupings teams use on two-point attempts affect success rates. While detailed formation data is limited in publicly available datasets, we can analyze aggregate success rates and yards gained by play type.
#| label: formation-analysis-r
#| message: false
#| warning: false
# Analyze aggregate performance by play type
# Formation data is limited in nflfastR, so we focus on outcomes
two_pt_summary <- two_pt_plays %>%
group_by(play_type_clean) %>%
summarise(
total_attempts = n(),
successes = sum(success, na.rm = TRUE),
success_rate = mean(success, na.rm = TRUE),
avg_yards = mean(yards_gained, na.rm = TRUE),
.groups = "drop"
)
# Display formatted table
two_pt_summary %>%
gt() %>%
cols_label(
play_type_clean = "Play Type",
total_attempts = "Attempts",
successes = "Successes",
success_rate = "Success Rate",
avg_yards = "Avg Yards"
) %>%
fmt_number(
columns = c(success_rate),
decimals = 1,
scale_by = 100,
pattern = "{x}%"
) %>%
fmt_number(
columns = avg_yards,
decimals = 2
) %>%
fmt_number(
columns = c(total_attempts, successes),
decimals = 0,
use_seps = TRUE
) %>%
tab_header(
title = "Two-Point Conversion Performance by Play Type",
subtitle = "2015-2023 Seasons"
) %>%
# Color code success rates (red = low, green = high)
data_color(
columns = success_rate,
colors = scales::col_numeric(
palette = c("#FFA07A", "#98FB98"),
domain = c(0.4, 0.6)
)
)
#| label: formation-analysis-py
#| message: false
#| warning: false
# Analyze aggregate performance by play type
two_pt_summary = (two_pt_plays
.groupby('play_type_clean')
.agg(
total_attempts=('success', 'count'),
successes=('success', 'sum'),
success_rate=('success', 'mean'),
avg_yards=('yards_gained', 'mean')
)
.reset_index()
)
print("\nTwo-Point Conversion Performance by Play Type (2015-2023)")
print("="*70)
print(two_pt_summary.to_string(index=False))
The aggregate results typically show:
Balanced Success: Pass and run plays show similar success rates (within 2-3 percentage points), suggesting neither has a systematic advantage. This balance likely reflects strategic equilibrium where teams exploit whichever approach defenses de-emphasize.
Yards Gained: Even failed attempts typically gain positive yards (0.5-1.0 yards on average), suggesting plays are close to success. The two-yard line is inherently difficult regardless of play type.
Attempt Volume: The roughly 60-40 pass-run split mirrors general offensive philosophy trends in the modern NFL, where passing has become increasingly prevalent even in short-yardage situations.
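It's worth checking whether a 2-point gap between pass and run success rates is even statistically detectable at these sample sizes. The sketch below runs a hand-rolled two-proportion z-test on hypothetical counts (the attempt volumes are assumptions in the rough ballpark of league-wide 2015-2023 totals, not values pulled from the dataset):

```python
import math

# Illustrative counts (assumed): ~49% on passes, ~47% on runs, 60-40 split
pass_attempts, pass_successes = 700, 343   # 49.0%
run_attempts,  run_successes  = 450, 212   # 47.1%

p1 = pass_successes / pass_attempts
p2 = run_successes / run_attempts

# Two-proportion z-test with a pooled standard error
pooled = (pass_successes + run_successes) / (pass_attempts + run_attempts)
se = math.sqrt(pooled * (1 - pooled) * (1 / pass_attempts + 1 / run_attempts))
z = (p1 - p2) / se

# Two-sided p-value from the standard normal CDF
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

print(f"diff = {p1 - p2:.3f}, z = {z:.2f}, p = {p_value:.2f}")
```

With these volumes the difference is nowhere near significance, which is why the chapter treats the pass/run gap as rough parity rather than a real edge.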
Defensive Strategy Against Two-Point Conversions
While most analysis focuses on offensive decisions, defensive performance on two-point conversions varies significantly by team. Some defenses consistently shut down two-point attempts, while others struggle. Understanding these differences helps identify best practices and informs offensive play-calling.
Defensive Success Factors
We can analyze which teams have been most successful at defending two-point conversions and look for patterns that might explain their success.
#| label: defensive-analysis-r
#| message: false
#| warning: false
# Analyze defensive performance on two-point attempts
defensive_performance <- two_pt_plays %>%
group_by(defteam, season) %>%
summarise(
attempts_against = n(),
conversions_allowed = sum(success, na.rm = TRUE),
success_rate_against = mean(success, na.rm = TRUE),
.groups = "drop"
) %>%
# Filter to teams with meaningful sample sizes
filter(attempts_against >= 5) %>%
# Sort by success rate (lower is better for defense)
arrange(success_rate_against)
# Display top 10 defensive performances
defensive_performance %>%
slice_head(n = 10) %>%
gt() %>%
cols_label(
defteam = "Team",
season = "Season",
attempts_against = "Attempts",
conversions_allowed = "Allowed",
success_rate_against = "Success Rate"
) %>%
fmt_number(
columns = success_rate_against,
decimals = 1,
scale_by = 100,
pattern = "{x}%"
) %>%
fmt_number(
columns = c(attempts_against, conversions_allowed),
decimals = 0
) %>%
tab_header(
title = "Best Defensive Performances Against Two-Point Conversions",
subtitle = "Minimum 5 attempts faced | 2015-2023"
) %>%
# Color code success rates (green = good defense, red = bad)
data_color(
columns = success_rate_against,
colors = scales::col_numeric(
palette = c("#98FB98", "#FFA07A"),
domain = c(0, 0.5)
)
)
#| label: defensive-analysis-py
#| message: false
#| warning: false
# Analyze defensive performance on two-point attempts
defensive_performance = (two_pt_plays
.groupby(['defteam', 'season'])
.agg(
attempts_against=('success', 'count'),
conversions_allowed=('success', 'sum'),
success_rate_against=('success', 'mean')
)
.reset_index()
)
# Filter to meaningful sample sizes and sort by success rate
defensive_performance = defensive_performance[
defensive_performance['attempts_against'] >= 5
].sort_values('success_rate_against')
print("\nBest Defensive Performances Against Two-Point Conversions")
print("Minimum 5 attempts faced | 2015-2023")
print("="*70)
print(defensive_performance.head(10).to_string(index=False))
The defensive analysis typically reveals:
High Variance: Even the best defensive seasons show success rates of 20-30%, while the worst allow 60-70%. This range is far wider than for most defensive metrics, reflecting the small sample sizes and high variance of two-point plays.
No Clear Patterns: Elite defenses overall don't consistently dominate two-point conversions, and weak defenses don't consistently struggle. The randomness of individual plays and small sample sizes overwhelm systematic defensive quality.
Coaching Impact: Some defensive coordinators are particularly creative with two-point defensive packages, using exotic formations and disguises. However, measuring this impact is difficult with limited data.
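The "high variance" point can be quantified with an exact binomial calculation: even a perfectly league-average defense will frequently post an extreme-looking season over a typical sample of attempts. The 48% rate and 8-attempt season here are illustrative round numbers:

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Exact binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_true, n = 0.48, 8   # a league-average defense facing 8 attempts in a season

# Probability the observed rate looks "elite" (<= 25% allowed)
# or "terrible" (>= 75% allowed), even though the defense is exactly average
p_looks_elite = sum(binom_pmf(k, n, p_true) for k in range(0, 3))
p_looks_bad = sum(binom_pmf(k, n, p_true) for k in range(6, n + 1))

print(f"P(<= 25% allowed): {p_looks_elite:.3f}")
print(f"P(>= 75% allowed): {p_looks_bad:.3f}")
print(f"P(extreme-looking season): {p_looks_elite + p_looks_bad:.3f}")
```

Nearly 30% of exactly-average defenses will look like elite or terrible two-point units in any given season, so the leaderboard above is dominated by noise rather than scheme.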
Defensive Game Planning for Two-Point Conversions
Defensive coordinators typically prepare 3-5 specific two-point conversion defenses, often including:

1. **Goal line press**: Heavy box, man-to-man coverage, selling out to stop the run
2. **Zone blitz**: Send pressure while dropping a lineman into coverage to confuse blocking assignments
3. **Prevent/soft**: Back off to prevent easy completions, force the offense to execute perfectly
4. **Exotic looks**: Unusual formations or disguises to create confusion

The key is having multiple options to counter different offensive formations and preventing the offense from identifying the defensive call pre-snap. Given the limited practice time for two-point situations, defenses that have well-rehearsed packages and clear communication protocols tend to perform better.

Advanced Topics: Bayesian Two-Point Models
For readers interested in more sophisticated statistical approaches, we can use Bayesian methods to estimate team-specific two-point conversion probabilities. Unlike frequentist methods that treat each team's success rate as a fixed parameter to estimate, Bayesian approaches use hierarchical models that allow information to be shared across teams.
The key insight is that while each team has its own "true" two-point conversion ability, we expect these abilities to be somewhat similar across teams. A team that's attempted 8 two-point conversions and converted 1 (12.5%) probably isn't truly a 12.5% team—they're likely closer to average but got unlucky. Bayesian hierarchical models formalize this intuition through "shrinkage" toward the population mean.
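A minimal closed-form sketch of this shrinkage idea is the conjugate beta-binomial update. Unlike the full hierarchical models that follow, it fixes the population prior at Beta(12, 13) (the same prior placed on the population mean below) rather than learning the concentration from the data:

```python
def shrunk_rate(successes: int, attempts: int,
                prior_a: float = 12.0, prior_b: float = 13.0) -> float:
    """Posterior mean under a fixed Beta(prior_a, prior_b) prior.

    The prior acts like 25 pseudo-attempts at a 48% rate, pulling
    small-sample estimates toward the league average.
    """
    return (prior_a + successes) / (prior_a + prior_b + attempts)

# The unlucky 1-of-8 team from the text
raw = 1 / 8                        # 12.5% observed
shrunk = shrunk_rate(1, 8)         # (12 + 1) / (25 + 8) = 13/33 ~ 0.394

# The same observed rate on a larger sample moves the estimate far less
shrunk_big = shrunk_rate(25, 200)  # 37/225 ~ 0.164

print(f"raw: {raw:.3f}, shrunk (n=8): {shrunk:.3f}, shrunk (n=200): {shrunk_big:.3f}")
```

The eight-attempt team is pulled most of the way back toward 48%, while two hundred attempts at the same observed rate largely override the prior, which is exactly the behavior the hierarchical models formalize.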
#| label: bayesian-model-r
#| message: false
#| warning: false
#| eval: false
# Bayesian hierarchical model for two-point conversion rates
# Note: This requires additional packages and is computationally intensive
# Set eval: false to prevent automatic execution
library(rstan)
# Prepare data for Stan
team_2pt_data <- two_pt_plays %>%
group_by(posteam) %>%
summarise(
attempts = n(),
successes = sum(success, na.rm = TRUE)
) %>%
# Require minimum attempts for meaningful estimation
filter(attempts >= 5)
# Package data for Stan model
stan_data <- list(
N = nrow(team_2pt_data),
attempts = team_2pt_data$attempts,
successes = team_2pt_data$successes
)
# Stan model code for hierarchical beta-binomial model
stan_code <- "
data {
int<lower=0> N; // number of teams
int<lower=0> attempts[N]; // attempts per team
int<lower=0> successes[N]; // successes per team
}
parameters {
real<lower=0,upper=1> theta[N]; // team-specific success rates
real<lower=0,upper=1> mu; // population mean success rate
real<lower=0> kappa; // concentration parameter (controls shrinkage)
}
model {
// Priors
mu ~ beta(12, 13); // Prior centered at ~0.48 (12/(12+13))
kappa ~ gamma(2, 0.1); // Weakly informative prior on concentration
// Hierarchical model: team rates drawn from population distribution
// Higher kappa = more similar across teams (stronger shrinkage)
// Lower kappa = more variation across teams (weaker shrinkage)
theta ~ beta(mu * kappa, (1 - mu) * kappa);
// Likelihood: observed successes given team-specific rates
successes ~ binomial(attempts, theta);
}
"
# To run this model:
# fit <- stan(model_code = stan_code, data = stan_data,
# iter = 2000, chains = 4, seed = 42)
#
# # Extract posterior samples
# posterior_samples <- extract(fit)
#
# # Get team-specific estimates with uncertainty
# team_estimates <- data.frame(
# team = team_2pt_data$posteam,
# attempts = team_2pt_data$attempts,
# observed_rate = team_2pt_data$successes / team_2pt_data$attempts,
# posterior_mean = colMeans(posterior_samples$theta),
# posterior_lower = apply(posterior_samples$theta, 2, quantile, 0.025),
# posterior_upper = apply(posterior_samples$theta, 2, quantile, 0.975)
# )
cat("Bayesian model code prepared (not executed in document)\n")
cat("This model estimates team-specific 2PT conversion rates\n")
cat("with hierarchical shrinkage toward population mean\n")
#| label: bayesian-model-py
#| message: false
#| warning: false
#| eval: false
# Bayesian hierarchical model for two-point conversion rates
# Using PyMC for Bayesian inference
# Set eval: false to prevent automatic execution
import pymc as pm
# Prepare data
team_2pt_data = (two_pt_plays
.groupby('posteam')
.agg(
attempts=('success', 'count'),
successes=('success', 'sum')
)
.reset_index()
)
# Require minimum attempts
team_2pt_data = team_2pt_data[team_2pt_data['attempts'] >= 5]
# Build Bayesian hierarchical model
with pm.Model() as hierarchical_model:
# Hyperpriors for population distribution
mu = pm.Beta('mu', alpha=12, beta=13) # Prior centered at ~0.48
kappa = pm.Gamma('kappa', alpha=2, beta=0.1) # Concentration parameter
# Team-specific rates drawn from population distribution
# Beta distribution parameterized by mean (mu) and concentration (kappa)
theta = pm.Beta('theta',
alpha=mu * kappa,
beta=(1 - mu) * kappa,
shape=len(team_2pt_data))
# Likelihood: observed successes given team-specific rates
successes = pm.Binomial('successes',
n=team_2pt_data['attempts'].values,
p=theta,
observed=team_2pt_data['successes'].values)
# To run this model:
# with hierarchical_model:
# trace = pm.sample(2000, tune=1000, chains=4, random_seed=42)
#
# # Extract posterior estimates
# posterior_means = trace.posterior['theta'].mean(dim=['chain', 'draw']).values
# posterior_lower = trace.posterior['theta'].quantile(0.025, dim=['chain', 'draw']).values
# posterior_upper = trace.posterior['theta'].quantile(0.975, dim=['chain', 'draw']).values
#
# team_estimates = pd.DataFrame({
# 'team': team_2pt_data['posteam'].values,
# 'attempts': team_2pt_data['attempts'].values,
# 'observed_rate': team_2pt_data['successes'].values / team_2pt_data['attempts'].values,
# 'posterior_mean': posterior_means,
# 'posterior_lower': posterior_lower,
# 'posterior_upper': posterior_upper
# })
print("Bayesian model code prepared (not executed in document)")
print("This model estimates team-specific 2PT conversion rates")
print("with hierarchical shrinkage toward population mean")
When to Use Bayesian vs. Frequentist Methods
**Use Frequentist Approaches** (like logistic regression) when:

- Sample sizes are large (100+ observations per group)
- You need fast computation
- Interpretation of p-values and confidence intervals is important for your audience
- You're testing specific hypotheses about coefficients

**Use Bayesian Approaches** (like hierarchical models) when:

- Sample sizes are small or vary widely across groups
- You want to incorporate prior information
- You need to estimate many related parameters (e.g., all 32 teams)
- You want full posterior distributions for decision-making under uncertainty
- Shrinkage/regularization is desirable to prevent overfitting

For two-point conversions, the small sample sizes and the goal of estimating team-specific rates make Bayesian hierarchical models particularly appropriate, though the added complexity may not be necessary for basic analysis.

Case Studies: Notable Two-Point Decisions
Examining specific high-profile two-point conversion decisions helps illustrate the concepts we've developed and shows how theory applies in critical game situations.
Philadelphia Eagles - Super Bowl LII (2017 Season)
The Eagles' aggressive two-point strategy throughout the 2017 season culminated in their Super Bowl victory. Head coach Doug Pederson and offensive coordinator Frank Reich consistently went for two-point conversions in situations where analytics supported the decision, building trust in their approach and developing a robust two-point playbook.
Season-Long Strategy: The Eagles attempted 6 two-point conversions during the regular season (more than most teams), converting 4 (67%). This high success rate likely reflected:
1. Extensive practice and preparation
2. Multiple play options preventing defensive predictability
3. Commitment to the strategy even after failures
4. Offensive creativity and willingness to try unconventional plays
Super Bowl Impact: While they didn't attempt a two-point conversion in the Super Bowl itself, their season-long commitment to aggressive analytics-driven decisions created a culture that enabled other bold calls, including the famous "Philly Special" fourth-down touchdown pass to QB Nick Foles.
Analytics-Driven Decisions Gone Right
Case Study: Down 14 Strategy
When trailing by 14 points late in the game, plan around scoring two touchdowns. The conversion decisions then play out as follows:

**Traditional Approach**: Kick both extra points

- Path: TD + XP (down 7) → TD + XP (tied)
- Success probability: (0.94)² ≈ 88.4% chance of forcing overtime (given both TDs)
- Result: Tie game requiring overtime; no path to winning in regulation
- Weakness: Locks into a predetermined path with no flexibility

**Analytics Approach**: Go for two on the first TD, adjust based on the result

- If successful (48% probability): down 6, so the second TD plus an extra point wins in regulation
- If failed (52% probability): still down 8, so after the second TD you go for two again to tie
- Advantage: Information value guides future strategy, and you keep a path to winning without overtime

**The Key Insight**: Going for two on the first touchdown gives you information that affects clock management and risk-taking on subsequent drives. If it succeeds, you know exactly what you need (TD + XP wins). If it fails, you know you need a touchdown plus another conversion, changing how aggressively you manage the clock and whether you accept field goals on fourth down.

**Expected Outcome Calculation** (conditional on scoring both TDs, with 48% two-point and 94% extra-point success):

- Traditional: 88.4% chance of a tie, 0% chance of winning in regulation
- Analytics: ≈45% chance of winning in regulation (0.48 × 0.94), ≈28% chance of a tie, ≈27% chance of losing in regulation
- If overtime is roughly a coin flip, the analytics path wins about 59% of these games versus about 44% for the traditional path

This case study illustrates why simple expected value calculations miss important strategic considerations. The value of information and the importance of maintaining decision flexibility can outweigh small differences in expected points.
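Under the standard assumptions (48% two-point success, 94% extra-point success, overtime treated as a coin flip, and conditioning on scoring both touchdowns), the branch probabilities for the down-14 scenario can be computed directly:

```python
P_2PT, P_XP = 0.48, 0.94   # assumed conversion probabilities
P_OT_WIN = 0.50            # treat overtime as a coin flip

# Both paths are conditional on scoring two touchdowns.

# Traditional: kick both extra points -> tie only if both XPs are good
trad_tie = P_XP * P_XP                     # 0.8836
trad_win_reg = 0.0                         # no path to a regulation win
trad_overall = trad_win_reg + trad_tie * P_OT_WIN

# Analytics: go for two on the first TD
win_reg = P_2PT * P_XP                     # 2pt good, then second TD + XP wins
tie = P_2PT * (1 - P_XP) + (1 - P_2PT) * P_2PT
# ^ 2pt good but XP missed, OR first 2pt missed then second 2pt good
lose_reg = (1 - P_2PT) * (1 - P_2PT)       # both two-point tries fail
analytics_overall = win_reg + tie * P_OT_WIN

print(f"traditional: tie {trad_tie:.1%}, overall win {trad_overall:.1%}")
print(f"analytics:   win-in-reg {win_reg:.1%}, tie {tie:.1%}, "
      f"lose-in-reg {lose_reg:.1%}, overall win {analytics_overall:.1%}")
```

The analytics path trades some losses in regulation for a real chance to win without overtime, and comes out well ahead overall under these assumptions.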
Lessons from High-Profile Decisions
Several lessons emerge from studying notable two-point conversion decisions:

1. **Preparation Matters**: Teams that practice two-point situations extensively and develop multiple play options show higher success rates
2. **Confidence Enables Execution**: Coaches who trust the analytics and commit to the strategy enable better execution by players who know the decision is sound
3. **Failure is Part of Optimization**: Even optimal strategies fail nearly half the time. Single failures shouldn't cause abandonment of sound decision frameworks
4. **Context Complexity**: While our models provide guidance, actual game decisions involve factors our models don't capture: momentum, player fatigue, injury situations, weather changes
5. **Communication is Critical**: Whether teams go for two or not, clear communication about the decision-making process helps players understand and execute the plan

The gap between "what analytics recommends" and "what coaches do" has narrowed significantly since 2015, but remains substantial. Continued education and demonstration of long-run value will likely close this gap further.

Summary
Two-point conversion strategy involves complex tradeoffs between expected points, win probability, and game situation. Key takeaways from this chapter:
- Expected Value Shift: The 2015 rule change fundamentally altered the calculus, giving two-point conversions (0.96 EV) a slight edge over extra points (0.94 EV) in neutral situations. However, this small expected value advantage is often overwhelmed by game-state considerations.

- Score-Specific Strategy: Optimal decisions depend critically on score differential:
  - Down 8: Strong case for going for two (to tie immediately)
  - Down 14: Go for two on the first TD (information value and optionality)
  - Down 7: Generally kick to guarantee the tie
  - Other differentials: Require situation-specific win probability analysis

- Late-Game Dynamics: Dynamic programming reveals that optimal strategies change with time remaining. With limited possessions remaining, guaranteed points become more valuable relative to higher-variance two-point attempts.

- Play-Calling Balance: Pass plays slightly outperform runs (~49% vs. ~47% success), but teams maintain a balanced mix (roughly 60-40 pass-run) to prevent defensive predictability. This equilibrium reflects strategic optimization.

- Information Value: Two-point attempts provide information that affects subsequent strategy. This value isn't captured in simple expected value calculations but can be formalized in sequential decision models.

- Model Limitations: While we can build sophisticated probability models and decision frameworks, two-point conversions remain high-variance events with significant randomness. Even optimal strategies fail nearly half the time.

- Practical Application: Coaches should:
  - Use analytical frameworks as starting points for decisions
  - Develop multiple two-point plays to prevent predictability
  - Practice two-point situations extensively
  - Trust the process even after individual failures
  - Adjust probabilities based on team-specific factors
The ongoing evolution of two-point conversion strategy represents one of the clearest examples of analytics influencing NFL coaching decisions, with significant room for continued improvement as more teams adopt data-driven approaches.
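The headline expected-value numbers from the first takeaway, together with the pre-2015 baseline quoted in the introduction, can be verified in a few lines:

```python
# Expected points per post-touchdown try, before and after the 2015 rule change
p_xp_pre, p_xp_post = 0.995, 0.94   # extra-point success rates (from the text)
p_two = 0.48                        # two-point conversion success rate

ev_xp_pre = 1 * p_xp_pre    # 0.995 -- kicking clearly dominated before 2015
ev_xp_post = 1 * p_xp_post  # 0.94
ev_two = 2 * p_two          # 0.96 -- now a slight edge for going for two

print(f"pre-2015:  XP {ev_xp_pre:.3f} vs 2PT {ev_two:.2f}")
print(f"post-2015: XP {ev_xp_post:.2f} vs 2PT {ev_two:.2f}")
print(f"break-even 2PT rate after 2015: {ev_xp_post / 2:.2f}")
```

The break-even line of 47% explains why the decision is so sensitive to game state: the neutral-situation edge is only a few hundredths of a point either way.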
Exercises
Conceptual Questions
-
Rule Change Impact: Explain how the 2015 extra point rule change affected the expected value calculus for two-point conversions. Calculate the expected value advantage before and after the rule change. What other factors besides expected points should teams consider when making the decision?
-
Down 8 Decision: Why is going for two when down 8 points (after scoring a TD to make it down 2) considered optimal? Walk through the game tree for both decisions (go for two vs. kick) and compare the win probability paths. Under what circumstances might kicking be preferable?
-
Early vs. Late Game: Why might teams be more willing to kick extra points early in games but go for two-point conversions late in games, even when the score differential is the same? Consider both expected value and win probability maximization perspectives. Does this behavioral pattern align with optimal strategy?
-
Information Value: Explain the concept of "information value" in the context of two-point conversions. Provide a specific scenario where attempting a two-point conversion early provides information that helps optimize later decisions, even if the attempt fails.
Coding Exercises
Exercise 1: Team-Specific Two-Point Models
Using the play-by-play data from 2015-2023:

a) Calculate each team's two-point conversion success rate (both on offense and defense)
b) Build a confidence interval around each estimate using the binomial distribution
c) Identify which teams significantly outperform or underperform the league average (using appropriate statistical tests)
d) Analyze whether offensive strength (measured by EPA/play) predicts two-point success using correlation or regression
e) Create a visualization showing team success rates with confidence intervals, highlighting significant outliers

**Hint**: Use a minimum threshold (e.g., 10 attempts) to ensure adequate sample size. Consider using Bayesian shrinkage for more stable estimates with small samples.

**Extension**: Build a logistic regression model that predicts two-point success using team offensive and defensive EPA, controlling for other factors like home field and game situation.

Exercise 2: Build Your Own Decision Chart
Create a comprehensive two-point decision chart that considers:

a) Score differential from +14 to -14 (all relevant scenarios)
b) Time remaining (create categories: 10+ min, 5-10 min, 2-5 min, <2 min)
c) Different success probability assumptions (45%, 48%, 50%) to test robustness
d) Visualize as a heat map or grid showing optimal decisions for each combination
e) Compare your chart to traditional coaching wisdom and identify discrepancies

**Challenge**: Incorporate timeout situations into your model. How should having 0 vs. 3 timeouts affect the decision?

**Advanced Extension**: Use actual win probability models (from nflfastR) instead of simplified score-based approximations to calculate exact win probability changes for each decision.

Exercise 3: Historical Decision Analysis
Analyze actual coaching decisions to evaluate decision quality:

a) Find all two-point attempts and extra point kicks from 2015-2023
b) For each post-touchdown situation, calculate whether the optimal decision was to go for two or kick (using win probability changes)
c) Classify each actual decision as "optimal" or "suboptimal" based on your framework
d) Calculate what percentage of decisions matched optimal strategy overall and by season
e) Identify specific coaches or teams that make the best two-point decisions
f) Measure whether decision quality has improved over time (test for trend)

**Extension**: Calculate the expected win probability cost of suboptimal decisions. How many games per season do teams lose due to poor two-point conversion decisions?

**Advanced Challenge**: Analyze whether coaches learn from their mistakes: do coaches who make a suboptimal decision in one game make better decisions in subsequent similar situations?

Exercise 4: Monte Carlo Simulation
Build a Monte Carlo simulation for the following scenario:

**Setup**: Your team scores a TD to trail 14-8 with 3:00 remaining. You have all three timeouts. The opponent will receive the kickoff.

**Decision**: Go for the two-point conversion or kick the extra point?

**Simulation Requirements**:

- Simulate 10,000 games for each strategy
- Model: opponent possession (scoring probability, time consumed), and your team's possession if you get the ball back (scoring probability, time consumed)
- Track outcomes: win, loss, overtime for each strategy
- Compare:
  - Win probability for each strategy
  - Overtime probability
  - Distribution of final score differentials
  - Expected point margin

**Assumptions Needed**:

- Opponent scoring rate (probabilities of TD, FG, punt by field position)
- Your team's scoring rate (similar breakdown)
- Time consumption per drive
- Two-point and extra point success probabilities

**Deliverable**:

- Win probability estimate for each strategy
- Visualization of outcome distributions
- Sensitivity analysis showing how results change with different assumptions

**Extension**: Expand to simulate an entire drive sequence and model optimal play-calling (when to take risks vs. play conservatively) based on the conversion decision.

Exercise 5: Defensive Two-Point Analysis
Analyze defensive strategy and performance against two-point conversions:

a) Do defenses perform differently on two-point attempts vs. regular plays from the 2-yard line? Compare success rates controlling for play type.
b) Is there a home field advantage on two-point conversions? Test whether home teams convert at higher rates controlling for team quality.
c) Analyze whether certain defensive schemes or coordinator tendencies correlate with two-point conversion defense success.
d) Build a logistic regression model predicting two-point defensive success using defensive EPA, home/away, and other relevant factors.
e) Identify which defensive coordinators have been most successful at defending two-point conversions (accounting for sample size with Bayesian methods).

**Data Needed**: Combine two-point attempt data with defensive performance metrics (EPA against, DVOA, etc.) and coaching information.

**Challenge**: Attempt to classify defensive play types on two-point conversions (using play description text or other available data) and analyze which defensive approaches are most successful.

Further Reading
Academic Papers
- Romer, D. (2006). "Do Firms Maximize? Evidence from Professional Football." Journal of Political Economy, 114(2), 340-365.
  - Seminal paper analyzing NFL decision-making inefficiency, including two-point conversion decisions
- Burke, B. (2009). "The Two-Point Conversion Decision." Advanced NFL Stats.
  - Early analytical treatment of optimal two-point strategy using win probability frameworks
- Goldner, K. (2020). "A Markov Decision Process Model for Optimal Fourth Down Decision Making." MIT Sloan Sports Analytics Conference.
  - Dynamic programming approach applicable to two-point decisions
- Kovash, K., & Levitt, S. D. (2009). "Professionals Do Not Play Minimax: Evidence from Major League Baseball and the National Football League." NBER Working Paper.
  - Analysis of strategic decision-making including two-point conversions
Practical Guides
- Baldwin, B. (2019). "Optimal Fourth Down and Two-Point Decisions in the NFL." Open Source Football.
  - Practical implementation of win probability-based two-point decision frameworks
- Yam, D., & Lopez, M. (2019). "What's a First Down Worth? Estimating Expected Points in the NFL." Harvard Sports Analysis Collective.
  - Foundation for expected points analysis underlying two-point decisions
- Cole, K. (2020). "Going for Two: A Comprehensive Analysis." The 33rd Team.
  - Industry perspective on two-point strategy from an NFL analytics professional
Books
- Alamar, B. (2013). Sports Analytics: A Guide for Coaches, Managers, and Other Decision Makers. Columbia University Press.
  - Chapter 6 covers decision analysis in football including two-point conversions
- Winston, W. (2012). Mathletics: How Gamblers, Managers, and Sports Enthusiasts Use Mathematics. Princeton University Press.
  - Sections on game theory and optimal decision-making applicable to two-point strategy
- Moskowitz, T., & Wertheim, L. J. (2011). Scorecasting: The Hidden Influences Behind How Sports Are Played and Games Are Won. Crown Archetype.
  - Chapter discussing behavioral biases in coaching decisions including two-point conservatism
Online Resources
- nflfastR documentation: Comprehensive guide to play-by-play data including two-point conversion variables
- NFL Operations: Official rules and statistics for two-point conversions
- Pro Football Reference: Historical two-point conversion data and team-by-team breakdowns
- The 33rd Team: Regular analytical content on two-point strategy from NFL front office veterans
References
:::