Chapter 27: NFL Draft Analytics | Football Analytics Textbook

Learning Objectives

By the end of this chapter, you will be able to:

Evaluate draft pick value and trade curves
Analyze draft success rates by position and round
Build college-to-NFL projection models
Assess draft strategy and team performance
Optimize draft capital allocation

Introduction

The NFL Draft represents one of the most critical mechanisms for team building in professional football. Every spring, 32 teams select from a pool of college players, making decisions that can shape their franchise for years to come. The stakes are enormous: a successful draft can transform a struggling team into a championship contender, while poor draft decisions can waste valuable resources and set a franchise back years.

Traditional draft evaluation relied heavily on subjective scouting reports, combine performances, and gut instincts. While scouting expertise remains invaluable, modern analytics provides powerful tools to complement traditional evaluation methods. Data-driven approaches help teams:

Quantify pick value to make optimal trade decisions
Identify historical patterns of success and failure by position and round
Project college performance to NFL outcomes using statistical models
Evaluate combine metrics for predictive validity
Assess team draft performance over time
Optimize resource allocation across draft positions

The Draft as an Optimization Problem

The NFL Draft is fundamentally an optimization problem under uncertainty. Teams must allocate limited resources (draft picks) to maximize long-term team value while accounting for:

Positional scarcity and value
Projection uncertainty (college-to-NFL translation)
Information asymmetry (other teams' preferences)
Temporal constraints (immediate vs future needs)
Salary cap implications

This chapter explores the analytical frameworks that help teams navigate these challenges and make better draft decisions.
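To make the uncertainty concrete, here is a minimal sketch that compares one early pick against two later picks by the probability of landing at least one successful player. The hit probabilities are hypothetical placeholders rather than estimates from this chapter, and treating picks as independent is a simplifying assumption.

```python
from math import prod

def prob_at_least_one_hit(hit_probs):
    """P(at least one success) for independent picks with the given hit probabilities."""
    return 1.0 - prod(1.0 - p for p in hit_probs)

# Hypothetical hit probabilities, chosen only for illustration
keep_pick = [0.50]          # one early pick
trade_down = [0.35, 0.30]   # two later picks received in a trade

p_keep = prob_at_least_one_hit(keep_pick)
p_trade = prob_at_least_one_hit(trade_down)
print(f"Keep: {p_keep:.3f}  Trade down: {p_trade:.3f}")
```

Under these assumed numbers the two later picks give a slightly better chance of at least one hit, although this ignores how good the hit is, which is why the rest of the chapter works with value measures rather than hit counts alone.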

The Evolution of Draft Analytics

Traditional Approaches (Pre-2000s)

Historically, draft evaluation focused on:

Scouting reports: Subjective evaluations from college game film
Combine performance: Physical measurements and athletic testing
Positional need: Filling roster gaps
Best player available: Simple ranking systems

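The Jimmy Johnson Trade Value Chart, created in the 1990s, was one of the first systematic attempts to quantify draft pick value. While the chart is still widely used, empirical studies find that it overvalues early picks relative to the on-field value those picks have historically produced.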

The Analytics Revolution (2000s-Present)

Modern draft analytics emerged from several key developments:

2003: Michael Lewis's Moneyball is published and goes on to inspire football analytics
2005: Massey and Thaler's working paper (published in 2013) shows systematic overvaluation of high picks
2011: Harvard Sports Analysis Collective publishes an empirical value chart
2015+: Machine learning models for draft projection gain traction
2020s: Integration of tracking data and advanced college metrics

Draft Pick Value Charts

The Jimmy Johnson Chart

The traditional trade value chart assigns points to each pick:

Pick	Points	Pick	Points	Pick	Points
1	3000	11	1250	21	800
2	2600	12	1200	22	780
3	2200	13	1150	23	760
4	1800	14	1100	24	740
5	1700	15	1050	25	720
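As a small worked example of how the chart is used, the sketch below adds up points on each side of a trade. The point values are copied from the excerpt above; the trade itself and the helper name `trade_balance` are hypothetical and serve only to show the arithmetic.

```python
# Point values copied from the chart excerpt above (picks 1-5, 11-15, 21-25)
JJ_POINTS = {1: 3000, 2: 2600, 3: 2200, 4: 1800, 5: 1700,
             11: 1250, 12: 1200, 13: 1150, 14: 1100, 15: 1050,
             21: 800, 22: 780, 23: 760, 24: 740, 25: 720}

def trade_balance(picks_given, picks_received, chart=JJ_POINTS):
    """Return points received minus points given; positive favors the receiving side."""
    given = sum(chart[p] for p in picks_given)
    received = sum(chart[p] for p in picks_received)
    return received - given

# Hypothetical trade: move down from pick 5 in exchange for picks 11 and 25
balance = trade_balance(picks_given=[5], picks_received=[11, 25])
print(balance)  # 1250 + 720 - 1700 = 270
```

By the chart's own accounting, the team moving down comes out 270 points ahead in this example.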

Problems with the Jimmy Johnson Chart:

Overvalues top picks relative to their historical performance
Doesn't account for positional differences
Based on subjective valuation, not empirical analysis
Doesn't consider salary cap implications

Modern Value Charts

Let's implement modern draft value models based on historical performance.

#| label: setup-r
#| message: false
#| warning: false

library(tidyverse)
library(nflfastR)
library(nflplotR)
library(gt)
library(scales)

# Set seed for reproducibility
set.seed(2024)

#| label: jimmy-johnson-chart-r
#| message: false
#| warning: false

# Implement Jimmy Johnson trade value chart
jimmy_johnson_value <- function(pick) {
  case_when(
    pick == 1 ~ 3000,
    pick == 2 ~ 2600,
    pick == 3 ~ 2200,
    pick == 4 ~ 1800,
    pick == 5 ~ 1700,
    pick == 6 ~ 1600,
    pick == 7 ~ 1500,
    pick == 8 ~ 1400,
    pick == 9 ~ 1350,
    pick == 10 ~ 1300,
    # Picks 11-32: linear approximations anchored to the chart (pick 11 = 1250, pick 21 = 800)
    pick >= 11 & pick <= 20 ~ 1250 - (pick - 11) * 50,
    pick >= 21 & pick <= 32 ~ 800 - (pick - 21) * 20,
    pick >= 33 & pick <= 64 ~ 580 - (pick - 33) * 10,
    pick >= 65 & pick <= 96 ~ 265 - (pick - 65) * 6,
    pick >= 97 & pick <= 128 ~ 112 - (pick - 97) * 2.4,
    pick >= 129 & pick <= 160 ~ 43 - (pick - 129) * 0.8,
    pick >= 161 & pick <= 192 ~ 24 - (pick - 161) * 0.4,
    pick >= 193 & pick <= 224 ~ 11 - (pick - 193) * 0.2,
    pick >= 225 & pick <= 256 ~ 4.4 - (pick - 225) * 0.1,
    TRUE ~ 0
  )
}

# Create draft pick value dataframe
draft_values <- tibble(
  pick = 1:256,
  jimmy_johnson = jimmy_johnson_value(pick),
  round = ceiling(pick / 32)
)

# Display first round values
draft_values %>%
  filter(pick <= 32) %>%
  gt() %>%
  cols_label(
    pick = "Pick",
    jimmy_johnson = "JJ Value",
    round = "Round"
  ) %>%
  fmt_number(
    columns = jimmy_johnson,
    decimals = 0
  ) %>%
  tab_header(
    title = "Jimmy Johnson Trade Value Chart",
    subtitle = "First Round (Picks 1-32)"
  )

#| label: setup-py
#| message: false
#| warning: false

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
import warnings
warnings.filterwarnings('ignore')

# Set random seed
np.random.seed(2024)

# Set plot style
sns.set_style("whitegrid")
plt.rcParams['figure.dpi'] = 100

#| label: jimmy-johnson-chart-py
#| message: false
#| warning: false

def jimmy_johnson_value(pick):
    """Calculate Jimmy Johnson trade value for a draft pick"""
    if pick == 1:
        return 3000
    elif pick == 2:
        return 2600
    elif pick == 3:
        return 2200
    elif pick == 4:
        return 1800
    elif pick == 5:
        return 1700
    elif pick == 6:
        return 1600
    elif pick == 7:
        return 1500
    elif pick == 8:
        return 1400
    elif pick == 9:
        return 1350
    elif pick == 10:
        return 1300
    # Picks 11-32: linear approximations anchored to the chart (pick 11 = 1250, pick 21 = 800)
    elif pick <= 20:
        return 1250 - (pick - 11) * 50
    elif pick <= 32:
        return 800 - (pick - 21) * 20
    elif pick <= 64:
        return 580 - (pick - 33) * 10
    elif pick <= 96:
        return 265 - (pick - 65) * 6
    elif pick <= 128:
        return 112 - (pick - 97) * 2.4
    elif pick <= 160:
        return 43 - (pick - 129) * 0.8
    elif pick <= 192:
        return 24 - (pick - 161) * 0.4
    elif pick <= 224:
        return 11 - (pick - 193) * 0.2
    elif pick <= 256:
        return 4.4 - (pick - 225) * 0.1
    else:
        return 0

# Create draft pick value dataframe
draft_values = pd.DataFrame({
    'pick': range(1, 257),
    'jimmy_johnson': [jimmy_johnson_value(p) for p in range(1, 257)]
})
draft_values['round'] = np.ceil(draft_values['pick'] / 32).astype(int)

# Display first round values
print("\nJimmy Johnson Trade Value Chart - First Round:\n")
print(draft_values[draft_values['pick'] <= 32][['pick', 'jimmy_johnson', 'round']].to_string(index=False))

Value Chart Based on Historical Performance

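We'll create a more empirically grounded value chart using Approximate Value (AV), Pro Football Reference's single-number estimate of player contribution. The code below runs on simulated data to illustrate the workflow; substitute real draft and AV data for an actual analysis.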

#| label: empirical-value-r
#| message: false
#| warning: false
#| cache: true

# Simulate draft data (in practice, load from Pro Football Reference or similar)
# This simulates historical draft picks with their career AV
simulate_draft_data <- function(n_years = 10) {
  set.seed(123)

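  # Generate draft picks for n_years drafts starting in 2010
  draft_data <- expand_grid(
    year = 2010:(2010 + n_years - 1),
    pick = 1:256
  ) %>%
    mutate(
      round = ceiling(pick / 32),
      # Simulate expected career AV by pick band. The bands are rough and not
      # continuous at their boundaries (e.g., pick 11 is assigned a higher
      # expectation than pick 10), so the curve is only broadly decreasing.
      # The noise added below uses the same SD (15) for every pick.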
      expected_av = case_when(
        pick <= 10 ~ 60 - pick * 3,
        pick <= 32 ~ 50 - pick * 1.5,
        pick <= 64 ~ 40 - pick * 0.6,
        pick <= 96 ~ 30 - pick * 0.4,
        pick <= 160 ~ 25 - pick * 0.15,
        TRUE ~ 10 - pick * 0.03
      ),
      # Add noise
      career_av = pmax(0, rnorm(n(), expected_av, 15)),
      # Simulate if player became starter
      starter = career_av > 25,
      # Simulate if player was "hit"
      hit = career_av > 40,
      # Position simulation
      position = sample(
        c("QB", "RB", "WR", "TE", "OL", "DL", "LB", "DB"),
        n(),
        replace = TRUE,
        prob = c(0.08, 0.10, 0.12, 0.05, 0.20, 0.15, 0.12, 0.18)
      )
    )

  return(draft_data)
}

# Generate simulated data
draft_historical <- simulate_draft_data()

# Calculate average value by pick
pick_values <- draft_historical %>%
  group_by(pick, round) %>%
  summarise(
    avg_av = mean(career_av),
    median_av = median(career_av),
    starter_rate = mean(starter),
    hit_rate = mean(hit),
    n_players = n(),
    .groups = "drop"
  )

# Display value by pick for first round
pick_values %>%
  filter(pick <= 32) %>%
  gt() %>%
  cols_label(
    pick = "Pick",
    round = "Round",
    avg_av = "Avg AV",
    median_av = "Med AV",
    starter_rate = "Starter Rate",
    hit_rate = "Hit Rate",
    n_players = "N"
  ) %>%
  fmt_number(
    columns = c(avg_av, median_av),
    decimals = 1
  ) %>%
  fmt_percent(
    columns = c(starter_rate, hit_rate),
    decimals = 1
  ) %>%
  tab_header(
    title = "Draft Pick Value by Historical Performance",
    subtitle = "First Round (Picks 1-32)"
  )

#| label: empirical-value-py
#| message: false
#| warning: false
#| cache: true

def simulate_draft_data(n_years=10):
    """Simulate historical draft data with career AV"""
    np.random.seed(123)

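    # Generate draft picks for n_years drafts starting in 2010
    years = range(2010, 2010 + n_years)
    picks = range(1, 257)

    data = []
    for year in years:
        for pick in picks:
            round_num = int(np.ceil(pick / 32))

            # Expected AV by pick band (rough: the bands are not continuous at
            # their boundaries, so the curve is only broadly decreasing)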
            if pick <= 10:
                expected_av = 60 - pick * 3
            elif pick <= 32:
                expected_av = 50 - pick * 1.5
            elif pick <= 64:
                expected_av = 40 - pick * 0.6
            elif pick <= 96:
                expected_av = 30 - pick * 0.4
            elif pick <= 160:
                expected_av = 25 - pick * 0.15
            else:
                expected_av = 10 - pick * 0.03

            # Add noise
            career_av = max(0, np.random.normal(expected_av, 15))

            data.append({
                'year': year,
                'pick': pick,
                'round': round_num,
                'career_av': career_av,
                'starter': career_av > 25,
                'hit': career_av > 40,
                'position': np.random.choice(
                    ['QB', 'RB', 'WR', 'TE', 'OL', 'DL', 'LB', 'DB'],
                    p=[0.08, 0.10, 0.12, 0.05, 0.20, 0.15, 0.12, 0.18]
                )
            })

    return pd.DataFrame(data)

# Generate simulated data
draft_historical = simulate_draft_data()

# Calculate average value by pick
pick_values = (draft_historical
    .groupby(['pick', 'round'])
    .agg({
        'career_av': ['mean', 'median'],
        'starter': 'mean',
        'hit': 'mean',
        'year': 'count'
    })
    .reset_index()
)

pick_values.columns = ['pick', 'round', 'avg_av', 'median_av',
                       'starter_rate', 'hit_rate', 'n_players']

# Display first round
print("\nDraft Pick Value by Historical Performance - First Round:\n")
first_round = pick_values[pick_values['pick'] <= 32].copy()
first_round['starter_rate'] = (first_round['starter_rate'] * 100).round(1)
first_round['hit_rate'] = (first_round['hit_rate'] * 100).round(1)
print(first_round.to_string(index=False))
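Per-pick averages from ten drafts are noisy, so a value chart is usually smoothed before it is used. The sketch below fits value = a + b * ln(pick) by least squares. The logarithmic form is an assumption made for illustration, not the method used in this chapter, and the data are synthetic.

```python
import numpy as np

def fit_log_curve(picks, values):
    """Least-squares fit of value = a + b * ln(pick); returns (a, b)."""
    b, a = np.polyfit(np.log(picks), values, deg=1)
    return a, b

def smoothed_value(pick, a, b):
    """Evaluate the fitted curve, floored at zero."""
    return np.maximum(0.0, a + b * np.log(pick))

# Synthetic per-pick averages (hypothetical numbers, for illustration only)
rng = np.random.default_rng(0)
picks = np.arange(1, 257)
noisy_avg_av = 40 - 6 * np.log(picks) + rng.normal(0, 2, size=picks.size)

a, b = fit_log_curve(picks, noisy_avg_av)
curve = smoothed_value(picks, a, b)
print(f"a = {a:.2f}, b = {b:.2f}")
```

With real data, the same two functions could be applied to the `pick` and `avg_av` columns of `pick_values`; whether a log curve fits well enough is something to check against the residuals.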

#| label: fig-value-curves-r
#| fig-cap: "Comparison of draft value curves"
#| fig-width: 12
#| fig-height: 7
#| message: false
#| warning: false

# Combine value systems
value_comparison <- draft_values %>%
  left_join(pick_values %>% select(pick, avg_av), by = "pick") %>%
  mutate(
    # Normalize values to 0-100 scale
    jj_normalized = (jimmy_johnson / max(jimmy_johnson, na.rm = TRUE)) * 100,
    av_normalized = (avg_av / max(avg_av, na.rm = TRUE)) * 100
  )

# Create comparison plot
ggplot(value_comparison, aes(x = pick)) +
  geom_line(aes(y = jj_normalized, color = "Jimmy Johnson"),
            linewidth = 1.2, alpha = 0.8) +
  geom_line(aes(y = av_normalized, color = "Empirical (AV)"),
            linewidth = 1.2, alpha = 0.8) +
  geom_vline(xintercept = seq(32, 256, 32),
             linetype = "dashed", alpha = 0.3) +
  scale_color_manual(
    values = c("Jimmy Johnson" = "#d62728", "Empirical (AV)" = "#2ca02c"),
    name = "Value System"
  ) +
  scale_x_continuous(breaks = c(1, 32, 64, 96, 128, 160, 192, 224, 256)) +
  labs(
    title = "Draft Pick Value Curves: Traditional vs Empirical",
    subtitle = "Normalized to 0-100 scale | Vertical lines indicate round boundaries",
    x = "Draft Pick",
    y = "Normalized Value",
    caption = "Note: Jimmy Johnson chart overvalues early picks relative to historical performance"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(size = 10),
    legend.position = "top",
    panel.grid.minor = element_blank()
  )

#| label: fig-value-curves-py
#| fig-cap: "Comparison of draft value curves - Python"
#| fig-width: 12
#| fig-height: 7
#| message: false
#| warning: false

# Combine value systems
value_comparison = draft_values.merge(
    pick_values[['pick', 'avg_av']],
    on='pick'
)

# Normalize to 0-100 scale
value_comparison['jj_normalized'] = (
    value_comparison['jimmy_johnson'] / value_comparison['jimmy_johnson'].max()
) * 100
value_comparison['av_normalized'] = (
    value_comparison['avg_av'] / value_comparison['avg_av'].max()
) * 100

# Create plot
fig, ax = plt.subplots(figsize=(12, 7))

ax.plot(value_comparison['pick'], value_comparison['jj_normalized'],
        label='Jimmy Johnson', color='#d62728', linewidth=2, alpha=0.8)
ax.plot(value_comparison['pick'], value_comparison['av_normalized'],
        label='Empirical (AV)', color='#2ca02c', linewidth=2, alpha=0.8)

# Add round boundaries
for round_end in range(32, 257, 32):
    ax.axvline(x=round_end, color='gray', linestyle='--', alpha=0.3)

ax.set_xlabel('Draft Pick', fontsize=12)
ax.set_ylabel('Normalized Value', fontsize=12)
ax.set_title('Draft Pick Value Curves: Traditional vs Empirical\nNormalized to 0-100 scale',
             fontsize=14, fontweight='bold', pad=20)
ax.legend(title='Value System', loc='upper right', fontsize=10)
ax.set_xticks([1, 32, 64, 96, 128, 160, 192, 224, 256])
ax.grid(True, alpha=0.3)
ax.text(0.98, 0.02, 'Note: Jimmy Johnson chart overvalues early picks',
        transform=ax.transAxes, ha='right', fontsize=8, style='italic')
plt.tight_layout()
plt.show()

Key Insight: The Surplus Value Curve

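Empirical value curves estimated from real career outcomes are typically much flatter than the Jimmy Johnson chart, which suggests that traditional trade values overvalue top picks. Teams that recognize this can trade down and accumulate additional picks at a discount. The simulated curve above only illustrates the workflow: its shape follows from the assumed expected AV bands, so confirm the comparison with real data.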
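The trade-down logic can be sketched by pricing a trade with the chart the market uses and then valuing the same picks with an empirical curve. In the sketch below, the Jimmy Johnson points come from the chart earlier in the chapter, while the expected AV figures are hypothetical placeholders, not estimates.

```python
# Market price: Jimmy Johnson points (from the chart earlier in the chapter)
MARKET_POINTS = {4: 1800, 14: 1100, 25: 720}
# Empirical value: hypothetical expected career AV per pick (illustrative only)
EXPECTED_AV = {4: 40.0, 14: 30.0, 25: 24.0}

def evaluate_trade(give, receive, market=MARKET_POINTS, value=EXPECTED_AV):
    """Return (market point difference, expected AV difference) for the side trading down."""
    market_diff = sum(market[p] for p in receive) - sum(market[p] for p in give)
    value_diff = sum(value[p] for p in receive) - sum(value[p] for p in give)
    return market_diff, value_diff

# A trade that is close to even on chart points
market_diff, value_diff = evaluate_trade(give=[4], receive=[14, 25])
print(market_diff, value_diff)
```

When the assumed empirical curve is flatter than the market chart, a trade that looks roughly even in points can still add expected AV for the team moving down. The size of that gain depends entirely on the empirical values supplied.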

Draft Success Rates by Position and Round

Understanding historical success rates helps teams set realistic expectations and identify value.

#| label: success-rates-r
#| message: false
#| warning: false

# Calculate success rates by round and position
success_by_round_position <- draft_historical %>%
  group_by(round, position) %>%
  summarise(
    n_picks = n(),
    avg_av = mean(career_av),
    starter_rate = mean(starter),
    hit_rate = mean(hit),
    bust_rate = mean(career_av < 10),
    .groups = "drop"
  ) %>%
  arrange(round, desc(avg_av))

# Success rates by round (all positions)
success_by_round <- draft_historical %>%
  group_by(round) %>%
  summarise(
    n_picks = n(),
    avg_av = mean(career_av),
    median_av = median(career_av),
    starter_rate = mean(starter),
    hit_rate = mean(hit),
    bust_rate = mean(career_av < 10),
    .groups = "drop"
  )

# Display table
success_by_round %>%
  gt() %>%
  cols_label(
    round = "Round",
    n_picks = "N Picks",
    avg_av = "Avg AV",
    median_av = "Med AV",
    starter_rate = "Starter %",
    hit_rate = "Hit %",
    bust_rate = "Bust %"
  ) %>%
  fmt_number(
    columns = c(avg_av, median_av),
    decimals = 1
  ) %>%
  fmt_percent(
    columns = c(starter_rate, hit_rate, bust_rate),
    decimals = 1
  ) %>%
  tab_header(
    title = "Draft Success Rates by Round",
    subtitle = "All positions, simulated 2010-2019 drafts"
  ) %>%
  tab_footnote(
    footnote = "Starter: Career AV > 25 | Hit: Career AV > 40 | Bust: Career AV < 10"
  )

#| label: success-rates-py
#| message: false
#| warning: false

# Calculate success rates by round
success_by_round = (draft_historical
    .groupby('round')
    .agg({
        'year': 'count',
        'career_av': ['mean', 'median'],
        'starter': 'mean',
        'hit': 'mean'
    })
    .reset_index()
)

success_by_round.columns = ['round', 'n_picks', 'avg_av', 'median_av',
                             'starter_rate', 'hit_rate']

# Add bust rate (share of picks with career AV < 10)
bust_rate = (draft_historical
    .assign(bust=lambda d: d['career_av'] < 10)
    .groupby('round')['bust']
    .mean()
    .reset_index(name='bust_rate')
)

success_by_round = success_by_round.merge(bust_rate, on='round')

# Display
print("\nDraft Success Rates by Round (All Positions):\n")
display_df = success_by_round.copy()
display_df['avg_av'] = display_df['avg_av'].round(1)
display_df['median_av'] = display_df['median_av'].round(1)
display_df['starter_rate'] = (display_df['starter_rate'] * 100).round(1)
display_df['hit_rate'] = (display_df['hit_rate'] * 100).round(1)
display_df['bust_rate'] = (display_df['bust_rate'] * 100).round(1)
print(display_df.to_string(index=False))
print("\nNote: Starter: Career AV > 25 | Hit: Career AV > 40 | Bust: Career AV < 10")

Position-Specific Success Rates

#| label: fig-position-success-r
#| fig-cap: "Success rates by position and round"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false

# Filter to first 3 rounds for clarity
position_success_plot <- success_by_round_position %>%
  filter(round <= 3)

ggplot(position_success_plot, aes(x = position, y = hit_rate, fill = as.factor(round))) +
  geom_col(position = "dodge", alpha = 0.8) +
  scale_fill_manual(
    values = c("1" = "#1f77b4", "2" = "#ff7f0e", "3" = "#2ca02c"),
    name = "Round"
  ) +
  scale_y_continuous(labels = percent_format()) +
  labs(
    title = "Draft Hit Rates by Position and Round",
    subtitle = "Percentage of players with career AV > 40 (Rounds 1-3)",
    x = "Position",
    y = "Hit Rate",
    caption = "Data: Simulated draft data 2010-2019"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "top",
    panel.grid.major.x = element_blank()
  )

#| label: fig-position-success-py
#| fig-cap: "Success rates by position and round - Python"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false

# Calculate by position and round
success_by_round_position = (draft_historical
    .groupby(['round', 'position'])
    .agg({
        'year': 'count',
        'career_av': 'mean',
        'starter': 'mean',
        'hit': 'mean'
    })
    .reset_index()
)

success_by_round_position.columns = ['round', 'position', 'n_picks',
                                      'avg_av', 'starter_rate', 'hit_rate']

# Filter to rounds 1-3
plot_data = success_by_round_position[success_by_round_position['round'] <= 3].copy()

# Create plot
fig, ax = plt.subplots(figsize=(12, 8))

positions = sorted(plot_data['position'].unique())
rounds = [1, 2, 3]
x = np.arange(len(positions))
width = 0.25

colors = ['#1f77b4', '#ff7f0e', '#2ca02c']

for i, round_num in enumerate(rounds):
    round_data = plot_data[plot_data['round'] == round_num]
    round_data = round_data.set_index('position').reindex(positions, fill_value=0)
    ax.bar(x + i * width, round_data['hit_rate'], width,
           label=f'Round {round_num}', color=colors[i], alpha=0.8)

ax.set_xlabel('Position', fontsize=12)
ax.set_ylabel('Hit Rate', fontsize=12)
ax.set_title('Draft Hit Rates by Position and Round\nPercentage of players with career AV > 40 (Rounds 1-3)',
             fontsize=14, fontweight='bold', pad=20)
ax.set_xticks(x + width)
ax.set_xticklabels(positions)
ax.legend(title='Round', loc='upper right')
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))
ax.grid(True, axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

Position Variability

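In real draft data, success rates vary by position, reflecting both positional value and the difficulty of projecting college performance to the NFL. The working expectation is that premium positions (QB, OL, DL) show higher hit rates in early rounds, while skill positions show more variability. The simulated data cannot confirm this: positions are assigned at random, independently of career AV, so any positional differences in the chart above are sampling noise.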
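Position-by-round cells often contain only a few dozen picks, so an observed hit rate should be read with an interval around it. The sketch below computes a Wilson score interval for a binomial proportion. The counts are hypothetical, and the choice of the Wilson interval is an assumption for illustration, not something the chapter prescribes.

```python
from math import sqrt

def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95% coverage)."""
    if n == 0:
        raise ValueError("n must be positive")
    p_hat = hits / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Hypothetical cell: 9 hits among 25 first-round picks at one position
low, high = wilson_interval(hits=9, n=25)
print(f"Hit rate {9 / 25:.1%}, 95% interval: {low:.1%} to {high:.1%}")
```

An interval this wide is a reminder that two positions with different observed hit rates in the same round may not differ in any meaningful way.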

College-to-NFL Performance Correlation

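A critical question in draft analytics is how well college statistics predict NFL success. The simulated data below build this relationship in by construction, so the resulting correlations illustrate the method rather than answer the question.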

#| label: college-nfl-correlation-r
#| message: false
#| warning: false
#| cache: true

# Simulate college statistics
set.seed(456)

draft_with_college <- draft_historical %>%
  mutate(
    # College production metrics (normalized, position-specific)
    college_production = case_when(
      position == "QB" ~ rnorm(n(), 70, 15),
      position == "RB" ~ rnorm(n(), 65, 20),
      position == "WR" ~ rnorm(n(), 60, 18),
      position == "TE" ~ rnorm(n(), 55, 16),
      position %in% c("OL", "DL") ~ rnorm(n(), 50, 12),
      position == "LB" ~ rnorm(n(), 58, 14),
      position == "DB" ~ rnorm(n(), 62, 15),
      TRUE ~ rnorm(n(), 50, 15)
    ),
    # Athletic score (combine)
    athletic_score = rnorm(n(), 50, 10),
    # Add correlation with NFL success
    college_production = college_production + career_av * 0.3 + rnorm(n(), 0, 5),
    athletic_score = athletic_score + career_av * 0.2 + rnorm(n(), 0, 5)
  ) %>%
  mutate(
    college_production = pmax(0, pmin(100, college_production)),
    athletic_score = pmax(0, pmin(100, athletic_score))
  )

# Calculate correlations by position
college_correlations <- draft_with_college %>%
  group_by(position) %>%
  summarise(
    n = n(),
    production_cor = cor(college_production, career_av),
    athletic_cor = cor(athletic_score, career_av),
    combined_cor = cor(college_production + athletic_score, career_av),
    .groups = "drop"
  ) %>%
  arrange(desc(combined_cor))

# Display correlations
college_correlations %>%
  gt() %>%
  cols_label(
    position = "Position",
    n = "N",
    production_cor = "Production ρ",
    athletic_cor = "Athletic ρ",
    combined_cor = "Combined ρ"
  ) %>%
  fmt_number(
    columns = c(production_cor, athletic_cor, combined_cor),
    decimals = 3
  ) %>%
  tab_header(
    title = "College-to-NFL Correlations by Position",
    subtitle = "Correlation with career AV"
  ) %>%
  tab_footnote(
    footnote = "ρ = Pearson correlation coefficient"
  )

#| label: college-nfl-correlation-py
#| message: false
#| warning: false
#| cache: true

# Simulate college statistics
np.random.seed(456)

def generate_college_stats(df):
    """Add simulated college statistics"""
    college_prod = []
    athletic = []

    for _, row in df.iterrows():
        # Position-specific college production
        if row['position'] == 'QB':
            base_prod = np.random.normal(70, 15)
        elif row['position'] == 'RB':
            base_prod = np.random.normal(65, 20)
        elif row['position'] == 'WR':
            base_prod = np.random.normal(60, 18)
        elif row['position'] == 'TE':
            base_prod = np.random.normal(55, 16)
        elif row['position'] in ['OL', 'DL']:
            base_prod = np.random.normal(50, 12)
        elif row['position'] == 'LB':
            base_prod = np.random.normal(58, 14)
        elif row['position'] == 'DB':
            base_prod = np.random.normal(62, 15)
        else:
            base_prod = np.random.normal(50, 15)

        # Add correlation with NFL success
        college_prod.append(
            np.clip(base_prod + row['career_av'] * 0.3 + np.random.normal(0, 5), 0, 100)
        )

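        # Athletic score: noisy baseline (mean 50, SD 10, matching the R version)
        # plus a built-in link to career AV
        athletic.append(
            np.clip(np.random.normal(50, 10) + row['career_av'] * 0.2 + np.random.normal(0, 5), 0, 100)
        )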

    return college_prod, athletic

draft_with_college = draft_historical.copy()
draft_with_college['college_production'], draft_with_college['athletic_score'] = \
    generate_college_stats(draft_with_college)

# Calculate correlations by position
college_correlations = (draft_with_college
    .groupby('position')
    .apply(lambda x: pd.Series({
        'n': len(x),
        'production_cor': x['college_production'].corr(x['career_av']),
        'athletic_cor': x['athletic_score'].corr(x['career_av']),
        'combined_cor': (x['college_production'] + x['athletic_score']).corr(x['career_av'])
    }))
    .reset_index()
    .sort_values('combined_cor', ascending=False)
)

print("\nCollege-to-NFL Correlations by Position:\n")
print(college_correlations.to_string(index=False))
print("\nNote: ρ = Pearson correlation coefficient")
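A correlation computed within a single position rests on a limited sample, so it helps to report an interval alongside the point estimate. The sketch below uses the Fisher z-transform to get an approximate 95% interval for Pearson's r. The data are synthetic and the helper name `pearson_ci` is hypothetical.

```python
import numpy as np

def pearson_ci(x, y, z_crit=1.96):
    """Pearson r with an approximate 95% interval via the Fisher z-transform."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = x.size
    if n < 4:
        raise ValueError("need at least 4 observations")
    r = np.corrcoef(x, y)[0, 1]
    z = np.arctanh(r)               # Fisher transform of r
    se = 1.0 / np.sqrt(n - 3)       # approximate standard error on the z scale
    return r, np.tanh(z - z_crit * se), np.tanh(z + z_crit * se)

# Synthetic example (hypothetical data, not the chapter's simulation)
rng = np.random.default_rng(1)
production = rng.normal(60, 15, size=200)
career_av = 0.3 * production + rng.normal(0, 12, size=200)

r, lo, hi = pearson_ci(production, career_av)
print(f"r = {r:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

The approximation assumes roughly bivariate normal data. For clipped or heavily skewed scores, a bootstrap interval may be a safer choice.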

Visualization of College-NFL Relationship

#| label: fig-college-nfl-scatter-r
#| fig-cap: "College production vs NFL career value"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false

# Create faceted scatter plot by position
draft_with_college %>%
  filter(position %in% c("QB", "RB", "WR", "OL", "DL", "LB", "DB")) %>%
  ggplot(aes(x = college_production, y = career_av)) +
  geom_point(aes(color = position), alpha = 0.4, size = 1) +
  geom_smooth(method = "lm", se = TRUE, color = "black", linewidth = 0.8) +
  facet_wrap(~ position, ncol = 4) +
  scale_color_manual(
    values = c("QB" = "#d62728", "RB" = "#2ca02c", "WR" = "#1f77b4",
               "OL" = "#ff7f0e", "DL" = "#9467bd", "LB" = "#8c564b",
               "DB" = "#e377c2")
  ) +
  labs(
    title = "College Production vs NFL Career Value by Position",
    subtitle = "Relationship varies significantly across positions",
    x = "College Production Score (0-100)",
    y = "Career Approximate Value",
    caption = "Data: Simulated draft data | Lines show linear fit"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "none",
    strip.text = element_text(face = "bold", size = 10)
  )

#| label: fig-college-nfl-scatter-py
#| fig-cap: "College production vs NFL career value - Python"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false

# Filter positions
positions_to_plot = ['QB', 'RB', 'WR', 'OL', 'DL', 'LB', 'DB']
plot_data = draft_with_college[draft_with_college['position'].isin(positions_to_plot)]

# Create subplot grid
fig, axes = plt.subplots(2, 4, figsize=(14, 8))
axes = axes.ravel()

colors = {'QB': '#d62728', 'RB': '#2ca02c', 'WR': '#1f77b4',
          'OL': '#ff7f0e', 'DL': '#9467bd', 'LB': '#8c564b', 'DB': '#e377c2'}

for idx, position in enumerate(positions_to_plot):
    pos_data = plot_data[plot_data['position'] == position]

    axes[idx].scatter(pos_data['college_production'], pos_data['career_av'],
                     alpha=0.4, s=20, color=colors[position])

    # Add regression line
    z = np.polyfit(pos_data['college_production'], pos_data['career_av'], 1)
    p = np.poly1d(z)
    x_line = np.linspace(pos_data['college_production'].min(),
                        pos_data['college_production'].max(), 100)
    axes[idx].plot(x_line, p(x_line), 'k-', linewidth=1.5, alpha=0.8)

    axes[idx].set_title(position, fontweight='bold', fontsize=11)
    axes[idx].set_xlabel('College Production Score', fontsize=9)
    axes[idx].set_ylabel('Career AV', fontsize=9)
    axes[idx].grid(True, alpha=0.3)

# Remove empty subplot
fig.delaxes(axes[7])

fig.suptitle('College Production vs NFL Career Value by Position\nRelationship varies significantly across positions',
             fontsize=14, fontweight='bold', y=0.995)
plt.tight_layout()
plt.show()

NFL Combine Metrics and Success

The NFL Combine provides standardized athletic testing. But how predictive are these metrics?

#| label: combine-analysis-r
#| message: false
#| warning: false
#| cache: true

# Simulate combine metrics
set.seed(789)

draft_with_combine <- draft_with_college %>%
  mutate(
    # Simulate combine metrics (position-dependent)
    forty_yard = case_when(
      position %in% c("WR", "DB") ~ rnorm(n(), 4.50, 0.10),
      position %in% c("RB", "LB") ~ rnorm(n(), 4.60, 0.12),
      position == "QB" ~ rnorm(n(), 4.75, 0.15),
      position == "TE" ~ rnorm(n(), 4.70, 0.12),
      position %in% c("OL", "DL") ~ rnorm(n(), 5.10, 0.20),
      TRUE ~ rnorm(n(), 4.80, 0.20)
    ),
    # Faster times correlate weakly with success
    forty_yard = forty_yard - career_av * 0.002,

    # Vertical jump
    vertical = case_when(
      position %in% c("WR", "DB") ~ rnorm(n(), 36, 3),
      position %in% c("RB", "LB") ~ rnorm(n(), 34, 3),
      position %in% c("OL", "DL") ~ rnorm(n(), 28, 3),
      TRUE ~ rnorm(n(), 32, 3)
    ) + career_av * 0.1,

    # Broad jump
    broad_jump = case_when(
      position %in% c("WR", "DB") ~ rnorm(n(), 120, 6),
      position %in% c("RB", "LB") ~ rnorm(n(), 118, 6),
      position %in% c("OL", "DL") ~ rnorm(n(), 105, 7),
      TRUE ~ rnorm(n(), 115, 6)
    ) + career_av * 0.2,

    # Bench press (OL/DL specific)
    bench_press = case_when(
      position %in% c("OL", "DL") ~ round(rnorm(n(), 25, 4) + career_av * 0.05),
      position == "LB" ~ round(rnorm(n(), 22, 3) + career_av * 0.04),
      TRUE ~ round(rnorm(n(), 15, 3) + career_av * 0.02)
    ),

    # Height and weight (BMI proxy)
    height_inches = case_when(
      position == "QB" ~ rnorm(n(), 74, 2),
      position %in% c("WR", "DB") ~ rnorm(n(), 71, 2),
      position %in% c("RB") ~ rnorm(n(), 70, 2),
      position %in% c("OL", "DL") ~ rnorm(n(), 76, 2),
      position %in% c("TE", "LB") ~ rnorm(n(), 74, 2),
      TRUE ~ rnorm(n(), 73, 2)
    ),

    weight_lbs = case_when(
      position %in% c("OL", "DL") ~ rnorm(n(), 310, 20),
      position == "TE" ~ rnorm(n(), 250, 15),
      position == "LB" ~ rnorm(n(), 240, 15),
      position == "QB" ~ rnorm(n(), 220, 10),
      position == "RB" ~ rnorm(n(), 215, 15),
      position %in% c("WR", "DB") ~ rnorm(n(), 195, 15),
      TRUE ~ rnorm(n(), 220, 20)
    )
  )

# Calculate combine metric importance by position
combine_correlations <- draft_with_combine %>%
  group_by(position) %>%
  summarise(
    forty_cor = cor(forty_yard, career_av, use = "complete.obs"),
    vertical_cor = cor(vertical, career_av, use = "complete.obs"),
    broad_cor = cor(broad_jump, career_av, use = "complete.obs"),
    bench_cor = cor(bench_press, career_av, use = "complete.obs"),
    .groups = "drop"
  )

# Display
combine_correlations %>%
  gt() %>%
  cols_label(
    position = "Position",
    forty_cor = "40-Yard",
    vertical_cor = "Vertical",
    broad_cor = "Broad Jump",
    bench_cor = "Bench"
  ) %>%
  fmt_number(
    columns = c(forty_cor, vertical_cor, broad_cor, bench_cor),
    decimals = 3
  ) %>%
  tab_header(
    title = "Combine Metric Correlations with NFL Success",
    subtitle = "By position (negative for 40-yard dash = faster is better)"
  )

#| label: combine-analysis-py
#| message: false
#| warning: false
#| cache: true

# Simulate combine metrics
np.random.seed(789)

def generate_combine_metrics(df):
    """Add simulated combine metrics"""
    metrics = {
        'forty_yard': [],
        'vertical': [],
        'broad_jump': [],
        'bench_press': [],
        'height_inches': [],
        'weight_lbs': []
    }

    for _, row in df.iterrows():
        pos = row['position']
        av = row['career_av']

        # 40-yard dash (faster = lower time)
        if pos in ['WR', 'DB']:
            forty = np.random.normal(4.50, 0.10) - av * 0.002
        elif pos in ['RB', 'LB']:
            forty = np.random.normal(4.60, 0.12) - av * 0.002
        elif pos == 'QB':
            forty = np.random.normal(4.75, 0.15) - av * 0.002
        elif pos == 'TE':
            forty = np.random.normal(4.70, 0.12) - av * 0.002
        elif pos in ['OL', 'DL']:
            forty = np.random.normal(5.10, 0.20) - av * 0.002
        else:
            forty = np.random.normal(4.80, 0.20) - av * 0.002

        # Vertical jump
        if pos in ['WR', 'DB']:
            vertical = np.random.normal(36, 3) + av * 0.1
        elif pos in ['RB', 'LB']:
            vertical = np.random.normal(34, 3) + av * 0.1
        elif pos in ['OL', 'DL']:
            vertical = np.random.normal(28, 3) + av * 0.1
        else:
            vertical = np.random.normal(32, 3) + av * 0.1

        # Broad jump
        if pos in ['WR', 'DB']:
            broad = np.random.normal(120, 6) + av * 0.2
        elif pos in ['RB', 'LB']:
            broad = np.random.normal(118, 6) + av * 0.2
        elif pos in ['OL', 'DL']:
            broad = np.random.normal(105, 7) + av * 0.2
        else:
            broad = np.random.normal(115, 6) + av * 0.2

        # Bench press
        if pos in ['OL', 'DL']:
            bench = round(np.random.normal(25, 4) + av * 0.05)
        elif pos == 'LB':
            bench = round(np.random.normal(22, 3) + av * 0.04)
        else:
            bench = round(np.random.normal(15, 3) + av * 0.02)

        # Height
        if pos == 'QB':
            height = np.random.normal(74, 2)
        elif pos in ['WR', 'DB']:
            height = np.random.normal(71, 2)
        elif pos == 'RB':
            height = np.random.normal(70, 2)
        elif pos in ['OL', 'DL']:
            height = np.random.normal(76, 2)
        elif pos in ['TE', 'LB']:
            height = np.random.normal(74, 2)
        else:
            height = np.random.normal(73, 2)

        # Weight
        if pos in ['OL', 'DL']:
            weight = np.random.normal(310, 20)
        elif pos == 'TE':
            weight = np.random.normal(250, 15)
        elif pos == 'LB':
            weight = np.random.normal(240, 15)
        elif pos == 'QB':
            weight = np.random.normal(220, 10)
        elif pos == 'RB':
            weight = np.random.normal(215, 15)
        elif pos in ['WR', 'DB']:
            weight = np.random.normal(195, 15)
        else:
            weight = np.random.normal(220, 20)

        metrics['forty_yard'].append(forty)
        metrics['vertical'].append(vertical)
        metrics['broad_jump'].append(broad)
        metrics['bench_press'].append(bench)
        metrics['height_inches'].append(height)
        metrics['weight_lbs'].append(weight)

    return metrics

# Add metrics to dataframe
combine_metrics = generate_combine_metrics(draft_with_college)
for key, values in combine_metrics.items():
    draft_with_college[key] = values

draft_with_combine = draft_with_college.copy()

# Calculate correlations by position
combine_correlations = (draft_with_combine
    .groupby('position')
    .apply(lambda x: pd.Series({
        'forty_cor': x['forty_yard'].corr(x['career_av']),
        'vertical_cor': x['vertical'].corr(x['career_av']),
        'broad_cor': x['broad_jump'].corr(x['career_av']),
        'bench_cor': x['bench_press'].corr(x['career_av'])
    }))
    .reset_index()
)

print("\nCombine Metric Correlations with NFL Success:\n")
print(combine_correlations.to_string(index=False))
print("\nNote: Negative for 40-yard dash = faster is better")

Combine Limitations

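While combine metrics are measurable and objective, their correlation with NFL success is position-dependent and generally modest. In the simulated data above, that relationship is built in by construction (each tested metric is shifted by a small multiple of career_av), so the correlations illustrate the method rather than real effect sizes. Athletic testing should complement, not replace, film study and college production analysis. The combine is best used to identify outliers (positive or negative) rather than as a primary evaluation tool.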
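
The outlier idea can be sketched with position-adjusted z-scores: each player's result is compared with the mean and standard deviation of his own position group. The helper name `flag_combine_outliers`, the toy data, and the two-standard-deviation threshold are illustrative assumptions, not an established scouting standard.

```python
import numpy as np
import pandas as pd

def flag_combine_outliers(df, metric, threshold=2.0, lower_is_better=False):
    """Flag players whose metric is unusual relative to their position group.

    `threshold` is measured in standard deviations and is an illustrative choice.
    """
    grouped = df.groupby("position")[metric]
    z = (df[metric] - grouped.transform("mean")) / grouped.transform("std")
    if lower_is_better:  # e.g. 40-yard dash, where a lower time is better
        z = -z
    out = df.copy()
    out[f"{metric}_z"] = z
    out[f"{metric}_flag"] = np.select(
        [z >= threshold, z <= -threshold],
        ["positive outlier", "negative outlier"],
        default="typical",
    )
    return out

# Toy data (hypothetical values, for illustration only)
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "position": ["WR"] * 50 + ["OL"] * 50,
    "forty_yard": np.concatenate([
        rng.normal(4.50, 0.10, 50),
        rng.normal(5.10, 0.20, 50),
    ]),
})
toy.loc[0, "forty_yard"] = 4.15  # an unusually fast receiver

flagged = flag_combine_outliers(toy, "forty_yard", lower_is_better=True)
print(flagged.loc[0, ["position", "forty_yard", "forty_yard_z", "forty_yard_flag"]])
```

Standardizing within position matters because a 4.80 forty is slow for a receiver but fast for a lineman; a single league-wide cutoff would mostly flag positions rather than players.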

Draft Trade Value and Optimization

How should teams value draft pick trades?

Trade Value Model


#| label: trade-value-model-r
#| message: false
#| warning: false

# Create surplus value model (AV-based value minus salary cost)
# In practice, would use actual rookie salary scale

# Create modern value chart based on expected AV
modern_value_chart <- pick_values %>%
  mutate(
    # Expected value based on historical AV
    expected_value = avg_av,
    # Normalize to 1000 points for pick 1
    value_points = (expected_value / max(expected_value)) * 1000,
    # Calculate surplus value (simplified)
    # Illustrative rookie cost (in $ millions), non-increasing in pick;
    # in practice, use the actual rookie wage scale
    rookie_cost = case_when(
      pick <= 10 ~ 30 - pick,
      pick <= 32 ~ 20 - (pick - 10) * 0.45,
      pick <= 100 ~ 10 - (pick - 32) * 0.05,
      TRUE ~ 5 - (pick - 100) * 0.01
    ),
    # Surplus = value - cost (simplified; AV and dollars are not on a common
    # scale, so halving the cost is only a crude, illustrative scaling)
    surplus_value = expected_value - (rookie_cost / 2)
  )

# Compare value charts
value_chart_comparison <- draft_values %>%
  left_join(modern_value_chart %>% select(pick, value_points, surplus_value),
            by = "pick") %>%
  rename(
    jj_value = jimmy_johnson,
    modern_value = value_points
  )

# Display first round comparison
value_chart_comparison %>%
  filter(pick <= 32) %>%
  select(pick, round, jj_value, modern_value, surplus_value) %>%
  gt() %>%
  cols_label(
    pick = "Pick",
    round = "Rnd",
    jj_value = "JJ Chart",
    modern_value = "Modern",
    surplus_value = "Surplus"
  ) %>%
  fmt_number(
    columns = c(jj_value, modern_value, surplus_value),
    decimals = 0
  ) %>%
  tab_header(
    title = "Draft Pick Value Chart Comparison",
    subtitle = "First Round (Picks 1-32)"
  )

#| label: trade-value-model-py
#| message: false
#| warning: false

# Create modern value chart
modern_value_chart = pick_values.copy()

# Expected value from historical AV
modern_value_chart['expected_value'] = modern_value_chart['avg_av']

# Normalize to 1000 for pick 1
max_value = modern_value_chart['expected_value'].max()
modern_value_chart['value_points'] = (
    modern_value_chart['expected_value'] / max_value
) * 1000

# Simplified rookie cost model (illustrative $ millions, non-increasing in pick;
# in practice, use the actual rookie wage scale)
def calculate_rookie_cost(pick):
    if pick <= 10:
        return 30 - pick
    elif pick <= 32:
        return 20 - (pick - 10) * 0.45
    elif pick <= 100:
        return 10 - (pick - 32) * 0.05
    else:
        return 5 - (pick - 100) * 0.01

modern_value_chart['rookie_cost'] = modern_value_chart['pick'].apply(
    calculate_rookie_cost
)

# Surplus value
modern_value_chart['surplus_value'] = (
    modern_value_chart['expected_value'] -
    modern_value_chart['rookie_cost'] / 2
)

# Combine charts
value_chart_comparison = draft_values.merge(
    modern_value_chart[['pick', 'value_points', 'surplus_value']],
    on='pick',
    how='left'  # keep every pick in draft_values, matching the R left_join
)

value_chart_comparison.rename(columns={
    'jimmy_johnson': 'jj_value',
    'value_points': 'modern_value'
}, inplace=True)

# Display first round
print("\nDraft Pick Value Chart Comparison - First Round:\n")
first_rd = value_chart_comparison[value_chart_comparison['pick'] <= 32][
    ['pick', 'round', 'jj_value', 'modern_value', 'surplus_value']
].copy()
print(first_rd.to_string(index=False))

Trade Analysis Function


#| label: trade-analysis-r
#| message: false
#| warning: false

# Function to analyze draft trades
analyze_trade <- function(team_a_picks, team_b_picks, value_system = "modern") {

  # Get value chart
  if (value_system == "modern") {
    values <- value_chart_comparison %>%
      select(pick, value = modern_value)
  } else {
    values <- value_chart_comparison %>%
      select(pick, value = jj_value)
  }

  # Calculate team values
  team_a_value <- values %>%
    filter(pick %in% team_a_picks) %>%
    summarise(total = sum(value, na.rm = TRUE)) %>%
    pull(total)

  team_b_value <- values %>%
    filter(pick %in% team_b_picks) %>%
    summarise(total = sum(value, na.rm = TRUE)) %>%
    pull(total)

  # Return analysis
  list(
    team_a_value = team_a_value,
    team_b_value = team_b_value,
    difference = team_a_value - team_b_value,
    winner = if_else(team_a_value > team_b_value, "Team A", "Team B"),
    fair = abs(team_a_value - team_b_value) < 50
  )
}

# Example trade: Pick 5 for picks 15 and 45
trade_example <- analyze_trade(
  team_a_picks = c(5),
  team_b_picks = c(15, 45),
  value_system = "modern"
)

cat("\nTrade Analysis:\n")
cat("Team A receives: Pick 5\n")
cat("Team B receives: Picks 15, 45\n\n")
cat(sprintf("Team A Value: %.0f\n", trade_example$team_a_value))
cat(sprintf("Team B Value: %.0f\n", trade_example$team_b_value))
cat(sprintf("Difference: %.0f\n", trade_example$difference))
cat(sprintf("Winner: %s\n", trade_example$winner))
cat(sprintf("Fair trade: %s\n", ifelse(trade_example$fair, "Yes", "No")))

#| label: trade-analysis-py
#| message: false
#| warning: false

def analyze_trade(team_a_picks, team_b_picks, value_system='modern'):
    """Analyze a draft pick trade"""

    # Get value chart
    if value_system == 'modern':
        values = value_chart_comparison[['pick', 'modern_value']].copy()
        values.columns = ['pick', 'value']
    else:
        values = value_chart_comparison[['pick', 'jj_value']].copy()
        values.columns = ['pick', 'value']

    # Calculate team values
    team_a_value = values[values['pick'].isin(team_a_picks)]['value'].sum()
    team_b_value = values[values['pick'].isin(team_b_picks)]['value'].sum()

    difference = team_a_value - team_b_value
    winner = "Team A" if difference > 0 else "Team B"
    fair = abs(difference) < 50

    return {
        'team_a_value': team_a_value,
        'team_b_value': team_b_value,
        'difference': difference,
        'winner': winner,
        'fair': fair
    }

# Example trade
trade_example = analyze_trade(
    team_a_picks=[5],
    team_b_picks=[15, 45],
    value_system='modern'
)

print("\nTrade Analysis:")
print("Team A receives: Pick 5")
print("Team B receives: Picks 15, 45\n")
print(f"Team A Value: {trade_example['team_a_value']:.0f}")
print(f"Team B Value: {trade_example['team_b_value']:.0f}")
print(f"Difference: {trade_example['difference']:.0f}")
print(f"Winner: {trade_example['winner']}")
print(f"Fair trade: {'Yes' if trade_example['fair'] else 'No'}")
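
The function above silently ignores any pick that is missing from the value chart and judges fairness with a fixed 50-point gap. A hedged variant is sketched below: it raises an error for unknown picks, reports ties explicitly, and measures the gap relative to the larger package. The name `analyze_trade_checked`, the toy chart values, and the 5% tolerance are assumptions made for illustration.

```python
import pandas as pd

def analyze_trade_checked(chart, team_a_picks, team_b_picks, tolerance=0.05):
    """Compare two pick packages using a pick -> value chart.

    `chart` is a DataFrame with columns `pick` and `value`.
    `tolerance` is the relative gap (share of the larger package)
    still treated as fair; 5% is an illustrative assumption.
    """
    lookup = chart.set_index("pick")["value"]
    requested = set(team_a_picks) | set(team_b_picks)
    missing = sorted(requested - set(lookup.index))
    if missing:
        raise ValueError(f"Picks not in value chart: {missing}")

    a_value = float(lookup.loc[list(team_a_picks)].sum())
    b_value = float(lookup.loc[list(team_b_picks)].sum())
    gap = a_value - b_value
    larger = max(a_value, b_value)
    relative_gap = abs(gap) / larger if larger > 0 else 0.0

    if gap == 0:
        winner = "Even"
    else:
        winner = "Team A" if gap > 0 else "Team B"

    return {
        "team_a_value": a_value,
        "team_b_value": b_value,
        "difference": gap,
        "relative_gap": relative_gap,
        "winner": winner,
        "fair": relative_gap <= tolerance,
    }

# Hypothetical chart values, for illustration only
toy_chart = pd.DataFrame({"pick": [5, 15, 45], "value": [800.0, 550.0, 300.0]})
result = analyze_trade_checked(toy_chart, team_a_picks=[5], team_b_picks=[15, 45])
print(result)
```

A relative tolerance scales with the size of the deal: a 50-point gap is minor in a trade involving a top-five pick but large in a swap of late-round selections.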

Positional Value and Scarcity

Not all positions are equally valuable or scarce in the draft.


#| label: positional-value-r
#| message: false
#| warning: false

# Calculate positional value metrics
positional_value <- draft_historical %>%
  group_by(position, round) %>%
  summarise(
    n_drafted = n(),
    avg_av = mean(career_av),
    hit_rate = mean(hit),
    .groups = "drop"
  )

# Overall positional value (across all rounds)
position_overall <- draft_historical %>%
  group_by(position) %>%
  summarise(
    n_drafted = n(),
    avg_av = mean(career_av),
    hit_rate = mean(hit),
    starter_rate = mean(starter),
    top_picks = sum(pick <= 32),
    avg_pick = mean(pick),
    .groups = "drop"
  ) %>%
  arrange(desc(avg_av))

# Display
position_overall %>%
  gt() %>%
  cols_label(
    position = "Position",
    n_drafted = "N Drafted",
    avg_av = "Avg AV",
    hit_rate = "Hit Rate",
    starter_rate = "Starter Rate",
    top_picks = "1st Rd Picks",
    avg_pick = "Avg Pick"
  ) %>%
  fmt_number(
    columns = c(avg_av, avg_pick),
    decimals = 1
  ) %>%
  fmt_percent(
    columns = c(hit_rate, starter_rate),
    decimals = 1
  ) %>%
  tab_header(
    title = "Positional Value in the NFL Draft",
    subtitle = "Aggregated across all rounds (2010-2019)"
  )

#| label: positional-value-py
#| message: false
#| warning: false

# Calculate positional value
position_overall = (draft_historical
    .groupby('position')
    .agg({
        'year': 'count',
        'career_av': 'mean',
        'hit': 'mean',
        'starter': 'mean',
        'pick': ['mean', lambda x: (x <= 32).sum()]
    })
    .reset_index()
)

position_overall.columns = ['position', 'n_drafted', 'avg_av', 'hit_rate',
                            'starter_rate', 'avg_pick', 'top_picks']

position_overall = position_overall[['position', 'n_drafted', 'avg_av',
                                    'hit_rate', 'starter_rate', 'top_picks',
                                    'avg_pick']].sort_values('avg_av', ascending=False)

print("\nPositional Value in the NFL Draft:\n")
display_df = position_overall.copy()
display_df['avg_av'] = display_df['avg_av'].round(1)
display_df['avg_pick'] = display_df['avg_pick'].round(1)
display_df['hit_rate'] = (display_df['hit_rate'] * 100).round(1)
display_df['starter_rate'] = (display_df['starter_rate'] * 100).round(1)
print(display_df.to_string(index=False))
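
The pandas version above depends on the order in which `.agg` emits its columns, because the names are assigned afterwards by position. A sketch using named aggregation, which binds each output name to its source column and function, is shown below. The toy frame stands in for `draft_historical` and its values are hypothetical.

```python
import pandas as pd

# Toy stand-in for draft_historical (hypothetical values)
toy = pd.DataFrame({
    "position": ["QB", "QB", "WR", "WR", "WR"],
    "pick": [1, 40, 10, 33, 120],
    "career_av": [80, 20, 50, 30, 5],
    "hit": [1, 0, 1, 0, 0],
    "starter": [1, 1, 1, 1, 0],
})

position_overall_named = (
    toy.groupby("position")
    .agg(
        n_drafted=("pick", "size"),
        avg_av=("career_av", "mean"),
        hit_rate=("hit", "mean"),
        starter_rate=("starter", "mean"),
        # Count of selections inside the first 32 picks
        top_picks=("pick", lambda s: int((s <= 32).sum())),
        avg_pick=("pick", "mean"),
    )
    .reset_index()
    .sort_values("avg_av", ascending=False)
)

print(position_overall_named.to_string(index=False))
```

With this form, reordering or adding a metric cannot silently mislabel a column.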

Positional Scarcity by Round


#| label: fig-position-scarcity-r
#| fig-cap: "Positional hit rates decline by round"
#| fig-width: 12
#| fig-height: 7
#| message: false
#| warning: false

# Calculate by round
position_by_round <- draft_historical %>%
  filter(round <= 5) %>%
  group_by(position, round) %>%
  summarise(
    hit_rate = mean(hit),
    .groups = "drop"
  )

ggplot(position_by_round, aes(x = round, y = hit_rate, color = position)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 2.5) +
  scale_y_continuous(labels = percent_format()) +
  scale_x_continuous(breaks = 1:5) +
  scale_color_brewer(palette = "Set2") +
  labs(
    title = "Draft Hit Rates by Position and Round",
    subtitle = "Rounds 1-5 | Hit = Career AV > 40",
    x = "Round",
    y = "Hit Rate",
    color = "Position"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "right"
  )


#| label: fig-position-scarcity-py
#| fig-cap: "Positional hit rates decline by round - Python"
#| fig-width: 12
#| fig-height: 7
#| message: false
#| warning: false

# Calculate by round
position_by_round = (draft_historical[draft_historical['round'] <= 5]
    .groupby(['position', 'round'])
    .agg({'hit': 'mean'})
    .reset_index()
    .rename(columns={'hit': 'hit_rate'})
)

# Create plot
fig, ax = plt.subplots(figsize=(12, 7))

positions = position_by_round['position'].unique()
colors = plt.cm.Set2(np.linspace(0, 1, len(positions)))

for i, position in enumerate(positions):
    pos_data = position_by_round[position_by_round['position'] == position]
    ax.plot(pos_data['round'], pos_data['hit_rate'],
           marker='o', linewidth=2, markersize=7,
           label=position, color=colors[i])

ax.set_xlabel('Round', fontsize=12)
ax.set_ylabel('Hit Rate', fontsize=12)
ax.set_title('Draft Hit Rates by Position and Round\nRounds 1-5 | Hit = Career AV > 40',
            fontsize=14, fontweight='bold', pad=20)
ax.set_xticks(range(1, 6))
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))
ax.legend(title='Position', bbox_to_anchor=(1.05, 1), loc='upper left')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
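
Hit rates for a single position and round often rest on few picks, so differences between lines in the chart can be noise. One way to show that uncertainty is a Wilson score interval around each rate, sketched below. The counts in the toy table are hypothetical, and `wilson_interval` is a helper defined here rather than a library function.

```python
import numpy as np
import pandas as pd

def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 is about 95%)."""
    hits = np.asarray(hits, dtype=float)
    n = np.asarray(n, dtype=float)
    p = hits / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * np.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Hypothetical counts of picks and hits per position and round
toy = pd.DataFrame({
    "position": ["QB", "QB", "WR", "WR"],
    "round": [1, 5, 1, 5],
    "n": [12, 8, 40, 45],
    "hits": [6, 0, 20, 3],
})
toy["hit_rate"] = toy["hits"] / toy["n"]
toy["lower"], toy["upper"] = wilson_interval(toy["hits"], toy["n"])

print(toy.round(3).to_string(index=False))
```

In this toy table the two first-round rows share the same 50% hit rate, but the quarterback interval is much wider because it rests on 12 picks instead of 40.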

Team Draft Performance Evaluation

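How do we evaluate a team's draft performance? One approach compares the total career AV of a team's picks with the AV expected from the draft slots it used. The team assignments below are simulated at random, so the resulting ranking illustrates the calculation rather than any real front office's record.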


#| label: team-draft-performance-r
#| message: false
#| warning: false
#| cache: true

# Simulate team assignments
set.seed(999)

teams <- c("ARI", "ATL", "BAL", "BUF", "CAR", "CHI", "CIN", "CLE",
           "DAL", "DEN", "DET", "GB", "HOU", "IND", "JAX", "KC",
           "LAC", "LAR", "LV", "MIA", "MIN", "NE", "NO", "NYG",
           "NYJ", "PHI", "PIT", "SEA", "SF", "TB", "TEN", "WAS")

draft_with_teams <- draft_historical %>%
  mutate(
    team = sample(teams, n(), replace = TRUE)
  )

# Calculate team draft grades
team_performance <- draft_with_teams %>%
  group_by(team) %>%
  summarise(
    n_picks = n(),
    total_av = sum(career_av),
    avg_av = mean(career_av),
    hit_rate = mean(hit),
    bust_rate = mean(career_av < 10),
    starters = sum(starter),
    # Expected AV based on draft position
    expected_av = sum(pick_values$avg_av[match(pick, pick_values$pick)]),
    # Actual vs expected
    av_over_expected = total_av - expected_av,
    .groups = "drop"
  ) %>%
  arrange(desc(av_over_expected))

# Display top 10 teams
team_performance %>%
  head(10) %>%
  mutate(rank = row_number()) %>%
  select(rank, team, n_picks, avg_av, hit_rate, starters, av_over_expected) %>%
  gt() %>%
  cols_label(
    rank = "Rank",
    team = "Team",
    n_picks = "Picks",
    avg_av = "Avg AV",
    hit_rate = "Hit Rate",
    starters = "Starters",
    av_over_expected = "AV vs Exp"
  ) %>%
  fmt_number(
    columns = avg_av,
    decimals = 1
  ) %>%
  fmt_percent(
    columns = hit_rate,
    decimals = 1
  ) %>%
  fmt_number(
    columns = av_over_expected,
    decimals = 0
  ) %>%
  tab_header(
    title = "Best Drafting Teams (2010-2019)",
    subtitle = "Ranked by AV above/below expected based on draft position"
  )

#| label: team-draft-performance-py
#| message: false
#| warning: false
#| cache: true

# Simulate team assignments
np.random.seed(999)

teams = ["ARI", "ATL", "BAL", "BUF", "CAR", "CHI", "CIN", "CLE",
         "DAL", "DEN", "DET", "GB", "HOU", "IND", "JAX", "KC",
         "LAC", "LAR", "LV", "MIA", "MIN", "NE", "NO", "NYG",
         "NYJ", "PHI", "PIT", "SEA", "SF", "TB", "TEN", "WAS"]

draft_with_teams = draft_historical.copy()
draft_with_teams['team'] = np.random.choice(teams, len(draft_with_teams))

# Add expected AV
draft_with_teams = draft_with_teams.merge(
    pick_values[['pick', 'avg_av']].rename(columns={'avg_av': 'expected_av'}),
    on='pick'
)

# Calculate team performance
team_performance = (draft_with_teams
    .groupby('team')
    .agg({
        'year': 'count',
        'career_av': ['sum', 'mean'],
        'hit': 'mean',
        'starter': 'sum',
        'expected_av': 'sum'
    })
    .reset_index()
)

team_performance.columns = ['team', 'n_picks', 'total_av', 'avg_av',
                            'hit_rate', 'starters', 'expected_av']

# Calculate AV over expected
team_performance['av_over_expected'] = (
    team_performance['total_av'] - team_performance['expected_av']
)

# Add bust rate
bust_rate = (draft_with_teams
    .groupby('team')
    .apply(lambda x: (x['career_av'] < 10).mean())
    .reset_index(name='bust_rate')
)

team_performance = team_performance.merge(bust_rate, on='team')

# Sort and display top 10
team_performance = team_performance.sort_values('av_over_expected', ascending=False)

print("\nBest Drafting Teams (2010-2019):\n")
top_10 = team_performance.head(10).copy()
top_10['rank'] = range(1, 11)
top_10['avg_av'] = top_10['avg_av'].round(1)
top_10['hit_rate'] = (top_10['hit_rate'] * 100).round(1)
top_10['av_over_expected'] = top_10['av_over_expected'].round(0)
print(top_10[['rank', 'team', 'n_picks', 'avg_av', 'hit_rate',
              'starters', 'av_over_expected']].to_string(index=False))
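
One way to judge whether a team's AV over expected exceeds what chance alone produces is a label-shuffling (permutation) check: shuffle the team labels many times and see how large the best team's total gets with no skill involved. The sketch below uses hypothetical residuals (actual minus expected AV) with an assumed noise scale; with real data, `residual` and `teams` would come from the drafted-player table.

```python
import numpy as np

rng = np.random.default_rng(999)

# Hypothetical stand-in: 2,560 picks, each with a residual = career AV - expected AV
n_picks, n_teams = 2560, 32
residual = rng.normal(0, 15, n_picks)        # assumed noise scale
teams = rng.integers(0, n_teams, n_picks)    # team index for each pick

def best_team_total(residual, teams, n_teams):
    """Largest team-level sum of (actual - expected) AV."""
    totals = np.bincount(teams, weights=residual, minlength=n_teams)
    return totals.max()

observed = best_team_total(residual, teams, n_teams)

# Null distribution: shuffling labels removes any link between team and outcome
null_best = np.array([
    best_team_total(residual, rng.permutation(teams), n_teams)
    for _ in range(1000)
])

# Share of shuffles in which the best team does at least as well as observed
p_value = (null_best >= observed).mean()

print(f"Observed best team AV over expected: {observed:.0f}")
print(f"Typical best team under shuffling:   {np.median(null_best):.0f}")
print(f"Permutation p-value:                 {p_value:.3f}")
```

Some team always tops the ranking, so the relevant comparison is the leader's margin against the leaders produced by shuffling, not against zero.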

Classifying Hits vs Misses

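Evaluating picks requires clear criteria for success. The scheme below uses round-specific career AV thresholds, so a later-round pick needs less production to earn the same label: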


#| label: hit-miss-classification-r
#| message: false
#| warning: false

# Create classification system
draft_classified <- draft_historical %>%
  mutate(
    # Classification based on AV and round
    classification = case_when(
      # Round 1
      round == 1 & career_av >= 60 ~ "Elite",
      round == 1 & career_av >= 40 ~ "Hit",
      round == 1 & career_av >= 25 ~ "Starter",
      round == 1 & career_av >= 10 ~ "Contributor",
      round == 1 ~ "Bust",

      # Round 2
      round == 2 & career_av >= 50 ~ "Elite",
      round == 2 & career_av >= 35 ~ "Hit",
      round == 2 & career_av >= 20 ~ "Starter",
      round == 2 & career_av >= 10 ~ "Contributor",
      round == 2 ~ "Bust",

      # Rounds 3-4
      round %in% 3:4 & career_av >= 40 ~ "Elite",
      round %in% 3:4 & career_av >= 25 ~ "Hit",
      round %in% 3:4 & career_av >= 15 ~ "Starter",
      round %in% 3:4 & career_av >= 5 ~ "Contributor",
      round %in% 3:4 ~ "Bust",

      # Rounds 5+
      round >= 5 & career_av >= 30 ~ "Elite",
      round >= 5 & career_av >= 20 ~ "Hit",
      round >= 5 & career_av >= 10 ~ "Starter",
      round >= 5 & career_av >= 3 ~ "Contributor",
      TRUE ~ "Bust"
    ),
    classification = factor(
      classification,
      levels = c("Elite", "Hit", "Starter", "Contributor", "Bust")
    )
  )

# Distribution by round
classification_dist <- draft_classified %>%
  group_by(round, classification) %>%
  summarise(n = n(), .groups = "drop") %>%
  group_by(round) %>%
  mutate(pct = n / sum(n))

# Display for rounds 1-3
classification_dist %>%
  filter(round <= 3) %>%
  select(-n) %>%
  pivot_wider(names_from = classification, values_from = pct, values_fill = 0) %>%
  gt() %>%
  cols_label(round = "Round") %>%
  fmt_percent(
    columns = c(Elite, Hit, Starter, Contributor, Bust),
    decimals = 1
  ) %>%
  tab_header(
    title = "Draft Pick Classification Distribution",
    subtitle = "Rounds 1-3"
  )

#| label: hit-miss-classification-py
#| message: false
#| warning: false

def classify_pick(row):
    """Classify draft pick based on AV and round"""
    av = row['career_av']
    rd = row['round']

    if rd == 1:
        if av >= 60: return 'Elite'
        elif av >= 40: return 'Hit'
        elif av >= 25: return 'Starter'
        elif av >= 10: return 'Contributor'
        else: return 'Bust'
    elif rd == 2:
        if av >= 50: return 'Elite'
        elif av >= 35: return 'Hit'
        elif av >= 20: return 'Starter'
        elif av >= 10: return 'Contributor'
        else: return 'Bust'
    elif rd in [3, 4]:
        if av >= 40: return 'Elite'
        elif av >= 25: return 'Hit'
        elif av >= 15: return 'Starter'
        elif av >= 5: return 'Contributor'
        else: return 'Bust'
    else:  # Round 5+
        if av >= 30: return 'Elite'
        elif av >= 20: return 'Hit'
        elif av >= 10: return 'Starter'
        elif av >= 3: return 'Contributor'
        else: return 'Bust'

# Apply classification
draft_classified = draft_historical.copy()
draft_classified['classification'] = draft_classified.apply(classify_pick, axis=1)

# Calculate distribution
classification_dist = (draft_classified
    .groupby(['round', 'classification'])
    .size()
    .reset_index(name='n')
)

classification_dist['pct'] = (classification_dist
    .groupby('round')['n']
    .transform(lambda x: x / x.sum())
)

# Pivot for display
classification_pivot = classification_dist[classification_dist['round'] <= 3].pivot(
    index='round',
    columns='classification',
    values='pct'
).fillna(0)

# Reorder columns
col_order = ['Elite', 'Hit', 'Starter', 'Contributor', 'Bust']
classification_pivot = classification_pivot[[c for c in col_order if c in classification_pivot.columns]]

print("\nDraft Pick Classification Distribution (Rounds 1-3):\n")
print((classification_pivot * 100).round(1).to_string())
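
The same cutoffs can be kept in one lookup table, which makes them easier to audit and adjust than nested conditionals. The sketch below is an alternative arrangement of the thresholds used above; the names `THRESHOLDS`, `round_tier`, and `classify_pick_table` are introduced here for illustration, and the toy picks are hypothetical.

```python
import pandas as pd

LABELS = ["Elite", "Hit", "Starter", "Contributor", "Bust"]

# Minimum career AV for Elite, Hit, Starter, Contributor (same cutoffs as above)
THRESHOLDS = {
    "R1":   [60, 40, 25, 10],
    "R2":   [50, 35, 20, 10],
    "R3-4": [40, 25, 15, 5],
    "R5+":  [30, 20, 10, 3],
}

def round_tier(rd):
    """Map a draft round to its threshold tier."""
    if rd == 1:
        return "R1"
    if rd == 2:
        return "R2"
    if rd in (3, 4):
        return "R3-4"
    return "R5+"

def classify_pick_table(av, rd):
    """Return the first label whose cutoff the player's career AV meets."""
    for label, cutoff in zip(LABELS, THRESHOLDS[round_tier(rd)]):
        if av >= cutoff:
            return label
    return "Bust"

# Hypothetical picks, for illustration only
toy = pd.DataFrame({
    "round": [1, 1, 2, 3, 7, 6],
    "career_av": [60, 39, 35, 4, 3, 2],
})
toy["classification"] = pd.Categorical(
    [classify_pick_table(av, rd) for av, rd in zip(toy["career_av"], toy["round"])],
    categories=LABELS,
    ordered=True,  # keeps Elite > ... > Bust order in tables and plots
)
print(toy.to_string(index=False))
```

Storing the result as an ordered categorical mirrors the factor levels set in the R version, so grouped summaries come out in Elite-to-Bust order without a manual column reorder.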

Visualization of Classifications


#| label: fig-classification-viz-r
#| fig-cap: "Draft pick classification by round"
#| fig-width: 12
#| fig-height: 7
#| message: false
#| warning: false

# Stacked bar chart
classification_dist %>%
  filter(round <= 5) %>%
  ggplot(aes(x = factor(round), y = pct, fill = classification)) +
  geom_col(position = "stack") +
  scale_fill_manual(
    values = c(
      "Elite" = "#2E7D32",
      "Hit" = "#66BB6A",
      "Starter" = "#FDD835",
      "Contributor" = "#FFB74D",
      "Bust" = "#E53935"
    )
  ) +
  scale_y_continuous(labels = percent_format()) +
  labs(
    title = "Draft Pick Outcomes by Round",
    subtitle = "Distribution of player classifications (Rounds 1-5)",
    x = "Round",
    y = "Percentage",
    fill = "Classification"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "right"
  )


#| label: fig-classification-viz-py
#| fig-cap: "Draft pick classification by round - Python"
#| fig-width: 12
#| fig-height: 7
#| message: false
#| warning: false

# Filter to rounds 1-5
plot_data = classification_dist[classification_dist['round'] <= 5].copy()

# Pivot for stacking
plot_pivot = plot_data.pivot(index='round', columns='classification', values='pct').fillna(0)

# Reorder columns
col_order = ['Elite', 'Hit', 'Starter', 'Contributor', 'Bust']
plot_pivot = plot_pivot[[c for c in col_order if c in plot_pivot.columns]]

# Create stacked bar chart
fig, ax = plt.subplots(figsize=(12, 7))

colors = {
    'Elite': '#2E7D32',
    'Hit': '#66BB6A',
    'Starter': '#FDD835',
    'Contributor': '#FFB74D',
    'Bust': '#E53935'
}

bottom = np.zeros(len(plot_pivot))
for classification in col_order:
    if classification in plot_pivot.columns:
        ax.bar(plot_pivot.index, plot_pivot[classification],
              bottom=bottom, label=classification,
              color=colors[classification], alpha=0.9)
        bottom += plot_pivot[classification].values

ax.set_xlabel('Round', fontsize=12)
ax.set_ylabel('Percentage', fontsize=12)
ax.set_title('Draft Pick Outcomes by Round\nDistribution of player classifications (Rounds 1-5)',
            fontsize=14, fontweight='bold', pad=20)
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))
ax.legend(title='Classification', loc='upper right')
ax.set_xticks(range(1, 6))
plt.tight_layout()
plt.show()

Machine Learning for Draft Projection

Modern draft analytics increasingly uses machine learning to project player success.


#| label: ml-draft-model-r
#| message: false
#| warning: false
#| cache: true

library(randomForest)

# Prepare training data
ml_data <- draft_with_combine %>%
  select(
    career_av,
    pick,
    round,
    position,
    college_production,
    athletic_score,
    forty_yard,
    vertical,
    broad_jump,
    bench_press,
    height_inches,
    weight_lbs
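  ) %>%
  # randomForest expects categorical predictors as factors, not character
  mutate(position = factor(position)) %>%
  na.omit()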

# Split train/test
set.seed(2024)
train_idx <- sample(1:nrow(ml_data), 0.8 * nrow(ml_data))
train_data <- ml_data[train_idx, ]
test_data <- ml_data[-train_idx, ]

# Train random forest model
rf_model <- randomForest(
  career_av ~ .,
  data = train_data,
  ntree = 500,
  mtry = 4,
  importance = TRUE
)

# Make predictions
train_pred <- predict(rf_model, train_data)
test_pred <- predict(rf_model, test_data)

# Calculate metrics
# (R² is computed here as the squared correlation between actual and predicted)
train_rmse <- sqrt(mean((train_data$career_av - train_pred)^2))
test_rmse <- sqrt(mean((test_data$career_av - test_pred)^2))
train_r2 <- cor(train_data$career_av, train_pred)^2
test_r2 <- cor(test_data$career_av, test_pred)^2

# Display results
cat("\nRandom Forest Model Performance:\n")
cat(sprintf("Training RMSE: %.2f\n", train_rmse))
cat(sprintf("Testing RMSE: %.2f\n", test_rmse))
cat(sprintf("Training R²: %.3f\n", train_r2))
cat(sprintf("Testing R²: %.3f\n", test_r2))

# Feature importance
importance_df <- as.data.frame(importance(rf_model)) %>%
  rownames_to_column("feature") %>%
  as_tibble() %>%
  arrange(desc(`%IncMSE`))

# Display top features
importance_df %>%
  head(10) %>%
  select(feature, `%IncMSE`, IncNodePurity) %>%
  gt() %>%
  cols_label(
    feature = "Feature",
    `%IncMSE` = "% Inc MSE",
    IncNodePurity = "Node Purity"
  ) %>%
  fmt_number(
    columns = c(`%IncMSE`, IncNodePurity),
    decimals = 2
  ) %>%
  tab_header(
    title = "Random Forest Feature Importance",
    subtitle = "Top 10 predictors of NFL career value"
  )

#| label: ml-draft-model-py
#| message: false
#| warning: false
#| cache: true

# Prepare data
ml_data = draft_with_combine[[
    'career_av', 'pick', 'round', 'position', 'college_production',
    'athletic_score', 'forty_yard', 'vertical', 'broad_jump',
    'bench_press', 'height_inches', 'weight_lbs'
]].dropna()

# Encode position
ml_data_encoded = pd.get_dummies(ml_data, columns=['position'], drop_first=True)

# Split features and target
X = ml_data_encoded.drop('career_av', axis=1)
y = ml_data_encoded['career_av']

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2024
)

# Train Random Forest
rf_model = RandomForestRegressor(
    n_estimators=500,
    max_features='sqrt',
    random_state=2024,
    n_jobs=-1
)

rf_model.fit(X_train, y_train)

# Predictions
train_pred = rf_model.predict(X_train)
test_pred = rf_model.predict(X_test)

# Metrics
train_rmse = np.sqrt(mean_squared_error(y_train, train_pred))
test_rmse = np.sqrt(mean_squared_error(y_test, test_pred))
train_r2 = r2_score(y_train, train_pred)
test_r2 = r2_score(y_test, test_pred)

print("\nRandom Forest Model Performance:")
print(f"Training RMSE: {train_rmse:.2f}")
print(f"Testing RMSE: {test_rmse:.2f}")
print(f"Training R²: {train_r2:.3f}")
print(f"Testing R²: {test_r2:.3f}")

# Feature importance
importance_df = pd.DataFrame({
    'feature': X.columns,
    'importance': rf_model.feature_importances_
}).sort_values('importance', ascending=False)

print("\nTop 10 Feature Importances:")
print(importance_df.head(10).to_string(index=False))
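
A useful reference point for any projection model is how much draft slot alone explains, since `pick` already encodes the league's consensus evaluation. The sketch below fits a one-variable baseline, career AV as a linear function of log(pick), and compares it with predicting the training mean. The data are simulated here under assumed parameters; with the chapter's data, the same split used for the random forest would be reused.

```python
import numpy as np

rng = np.random.default_rng(2024)

# Hypothetical data: career AV decays with log(pick), plus noise (assumed parameters)
pick = rng.integers(1, 257, size=2000)
career_av = np.clip(60 - 9 * np.log(pick) + rng.normal(0, 12, pick.size), 0, None)

# 80/20 train/test split
idx = rng.permutation(pick.size)
train, test = idx[:1600], idx[1600:]

# Baseline: career_av ~ a + b * log(pick), least squares on the training set
b, a = np.polyfit(np.log(pick[train]), career_av[train], deg=1)
baseline_pred = a + b * np.log(pick[test])

def rmse(actual, predicted):
    """Root mean squared error."""
    return np.sqrt(np.mean((actual - predicted) ** 2))

slot_rmse = rmse(career_av[test], baseline_pred)
mean_rmse = rmse(career_av[test], career_av[train].mean())

print(f"Slope on log(pick):      {b:.2f}")
print(f"RMSE, draft-slot only:   {slot_rmse:.2f}")
print(f"RMSE, training mean:     {mean_rmse:.2f}")
```

If a model with combine and college features barely improves on the draft-slot baseline, most of its apparent skill comes from information teams had already priced into the pick.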

Model Comparison

Let's compare multiple modeling approaches:


#| label: model-comparison-r
#| message: false
#| warning: false
#| cache: true

library(glmnet)

# Prepare matrix for glmnet
x_train <- model.matrix(career_av ~ . - 1, data = train_data)
y_train <- train_data$career_av
x_test <- model.matrix(career_av ~ . - 1, data = test_data)
y_test <- test_data$career_av

# Ridge regression
ridge_model <- cv.glmnet(x_train, y_train, alpha = 0)
ridge_pred <- predict(ridge_model, x_test, s = "lambda.min")
ridge_rmse <- sqrt(mean((y_test - ridge_pred)^2))
ridge_r2 <- cor(y_test, ridge_pred)^2

# Lasso regression
lasso_model <- cv.glmnet(x_train, y_train, alpha = 1)
lasso_pred <- predict(lasso_model, x_test, s = "lambda.min")
lasso_rmse <- sqrt(mean((y_test - lasso_pred)^2))
lasso_r2 <- cor(y_test, lasso_pred)^2

# Linear regression baseline
lm_model <- lm(career_av ~ ., data = train_data)
lm_pred <- predict(lm_model, test_data)
lm_rmse <- sqrt(mean((y_test - lm_pred)^2))
# Use test-set R² so the value is comparable with the other models
lm_r2 <- cor(y_test, lm_pred)^2

# Compare models
model_comparison <- tibble(
  Model = c("Linear Regression", "Ridge", "Lasso", "Random Forest"),
  RMSE = c(lm_rmse, ridge_rmse, lasso_rmse, test_rmse),
  R_squared = c(lm_r2, ridge_r2, lasso_r2, test_r2)
) %>%
  arrange(RMSE)

model_comparison %>%
  gt() %>%
  cols_label(
    Model = "Model",
    RMSE = "RMSE",
    R_squared = "R²"
  ) %>%
  fmt_number(
    columns = c(RMSE, R_squared),
    decimals = 3
  ) %>%
  tab_header(
    title = "Draft Projection Model Comparison",
    subtitle = "Test set performance"
  )

#| label: model-comparison-py
#| message: false
#| warning: false
#| cache: true

# Ridge Regression
ridge_model = Ridge(alpha=1.0)
ridge_model.fit(X_train, y_train)
ridge_pred = ridge_model.predict(X_test)
ridge_rmse = np.sqrt(mean_squared_error(y_test, ridge_pred))
ridge_r2 = r2_score(y_test, ridge_pred)

# Gradient Boosting
gb_model = GradientBoostingRegressor(
    n_estimators=200,
    max_depth=4,
    random_state=2024
)
gb_model.fit(X_train, y_train)
gb_pred = gb_model.predict(X_test)
gb_rmse = np.sqrt(mean_squared_error(y_test, gb_pred))
gb_r2 = r2_score(y_test, gb_pred)

# Compare models
model_comparison = pd.DataFrame({
    'Model': ['Ridge', 'Random Forest', 'Gradient Boosting'],
    'RMSE': [ridge_rmse, test_rmse, gb_rmse],
    'R_squared': [ridge_r2, test_r2, gb_r2]
}).sort_values('RMSE')

print("\nDraft Projection Model Comparison (Test Set):\n")
print(model_comparison.to_string(index=False))
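
A note on comparing the two tables: the R comparison scores models with squared correlation (`cor(y_test, pred)^2`), while the Python comparison uses `r2_score`, which reports 1 - SS_res / SS_tot. Squared correlation ignores any constant bias or rescaling in the predictions, so it is never lower than the variance-explained version and the two columns are not directly interchangeable. The sketch below illustrates the gap on synthetic numbers; the data and the helper names `squared_corr` and `r2_explained` are made up for illustration and are not part of the chapter's pipeline.

```python
import numpy as np

rng = np.random.default_rng(2024)

# Synthetic outcomes and two sets of predictions (illustrative only)
y_true = rng.gamma(shape=2.0, scale=10.0, size=500)
pred_noisy = y_true + rng.normal(0.0, 8.0, size=500)
pred_biased = pred_noisy + 15.0  # same predictions with a constant upward bias


def squared_corr(y, pred):
    """Squared Pearson correlation, as in R's cor(y, pred)^2."""
    return float(np.corrcoef(y, pred)[0, 1] ** 2)


def r2_explained(y, pred):
    """1 - SS_res / SS_tot, the quantity sklearn's r2_score reports."""
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)


for name, pred in [("noisy", pred_noisy), ("noisy + bias", pred_biased)]:
    print(f"{name:>12}: cor^2 = {squared_corr(y_true, pred):.3f}, "
          f"R^2 = {r2_explained(y_true, pred):.3f}")
```

The constant bias leaves the squared correlation unchanged but lowers the variance-explained R², which can even turn negative. When ranking models across the R and Python tables, it is safer to compare RMSE or to compute the same R² definition on both sides.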

Prediction Intervals

R
Python

#| label: fig-prediction-intervals-r
#| fig-cap: "Model predictions vs actual career AV"
#| fig-width: 12
#| fig-height: 7
#| message: false
#| warning: false

# Create prediction dataframe
pred_df <- test_data %>%
  mutate(
    predicted = test_pred,
    residual = career_av - predicted
  )

# Plot predictions vs actual
ggplot(pred_df, aes(x = predicted, y = career_av)) +
  geom_point(alpha = 0.4, color = "#1f77b4") +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "red") +
  geom_smooth(method = "lm", se = TRUE, color = "black", linewidth = 0.8) +
  labs(
    title = "Random Forest Predictions vs Actual Career AV",
    subtitle = sprintf("Test Set | RMSE = %.2f | R² = %.3f", test_rmse, test_r2),
    x = "Predicted Career AV",
    y = "Actual Career AV",
    caption = "Red dashed line = perfect predictions | Black line = actual fit"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14)
  )


#| label: fig-prediction-intervals-py
#| fig-cap: "Model predictions vs actual career AV - Python"
#| fig-width: 12
#| fig-height: 7
#| message: false
#| warning: false

# Create plot
fig, ax = plt.subplots(figsize=(12, 7))

ax.scatter(test_pred, y_test, alpha=0.4, color='#1f77b4', s=30)

# Perfect prediction line
max_val = max(test_pred.max(), y_test.max())
ax.plot([0, max_val], [0, max_val], 'r--', label='Perfect Predictions', linewidth=2)

# Trend line
z = np.polyfit(test_pred, y_test, 1)
p = np.poly1d(z)
x_line = np.linspace(test_pred.min(), test_pred.max(), 100)
ax.plot(x_line, p(x_line), 'k-', label='Actual Fit', linewidth=2)

ax.set_xlabel('Predicted Career AV', fontsize=12)
ax.set_ylabel('Actual Career AV', fontsize=12)
ax.set_title(f'Random Forest Predictions vs Actual Career AV\nTest Set | RMSE = {test_rmse:.2f} | R² = {test_r2:.3f}',
            fontsize=14, fontweight='bold', pad=20)
ax.legend(loc='upper left')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
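
The plots in this section show point predictions only; the vertical scatter around the fitted line is what a prediction interval needs to capture. One simple way to attach an interval to any point predictor is to take a quantile of the absolute residuals on a held-out calibration set (a split-conformal-style interval). The sketch below is a minimal illustration under stated assumptions: the data are synthetic, the log-linear model stands in for the chapter's random forest, and the names `predict_av` and `half_width` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2024)

# Synthetic stand-in for draft data: pick number -> career AV (illustrative only)
n = 1500
pick = rng.integers(1, 257, size=n)  # picks 1..256
career_av = np.maximum(
    0.0, 40.0 * np.exp(-pick / 80.0) + rng.normal(0.0, 8.0, size=n)
)

# Split into training, calibration, and test sets
idx = rng.permutation(n)
train, calib, test = idx[:900], idx[900:1200], idx[1200:]

# Any point-prediction model works here; a line in log(pick) keeps the sketch short
coefs = np.polyfit(np.log(pick[train]), career_av[train], deg=1)


def predict_av(p):
    """Point prediction of career AV from pick number (assumed toy model)."""
    return np.polyval(coefs, np.log(p))


# The 90th percentile of absolute calibration residuals sets the interval half-width
abs_resid = np.abs(career_av[calib] - predict_av(pick[calib]))
half_width = float(np.quantile(abs_resid, 0.90))

lower = predict_av(pick[test]) - half_width
upper = predict_av(pick[test]) + half_width
coverage = float(np.mean((career_av[test] >= lower) & (career_av[test] <= upper)))

print(f"90% interval half-width: {half_width:.1f} AV")
print(f"Empirical test coverage: {coverage:.1%}")
```

This gives the same interval width to every player, which is a deliberate simplification. Quantile-based models or position-specific calibration sets would let the width vary with pick number or position.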

Model Insights

Machine learning models can identify complex patterns in draft data, but they are far from perfect predictors. Test-set R² values for models of this kind typically fall between 0.3 and 0.5, meaning that pre-draft measurables explain roughly 30-50% of the variance in NFL success and leave half or more unexplained. This highlights the inherent uncertainty in draft evaluation and the importance of pairing analytics with scouting expertise.
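
To put those R² values in the units a front office cares about, recall that RMSE equals the standard deviation of the outcome multiplied by sqrt(1 - R²) when R² is computed as 1 - SS_res / SS_tot on the same data. The short sketch below works through that arithmetic; the helper name `residual_sd_fraction` is introduced here purely for illustration.

```python
import math


def residual_sd_fraction(r2):
    """Fraction of the outcome's standard deviation that remains as prediction error."""
    return math.sqrt(1.0 - r2)


for r2 in (0.3, 0.4, 0.5):
    frac = residual_sd_fraction(r2)
    print(f"R^2 = {r2:.1f} -> typical error is about {frac:.0%} of the raw spread in career AV")
```

Even at R² = 0.5, the typical prediction error is still about 71% of the spread you would face with no model at all, which is why the intervals around any single prospect remain wide.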

Summary

This chapter covered the analytical foundations of NFL draft evaluation:

Key Takeaways:

Draft Pick Value: Traditional value charts (Jimmy Johnson) overvalue early picks; empirical models based on historical performance provide better trade guidance
Success Rates: Hit rates decline sharply by round, with significant variation by position. First-round picks have ~40-50% hit rates, dropping to ~15-20% by round 3
College-NFL Translation: College production correlates moderately with NFL success (r ≈ 0.3-0.5), varying by position. QBs and WRs show stronger correlations than linemen
Combine Metrics: Athletic testing has modest predictive value; explosion metrics (vertical, broad jump) often outperform straight-line speed. Position-specific evaluation is critical
Trade Optimization: Value-based trading can create competitive advantages. Trading down often provides surplus value
Positional Strategy: Premium positions (QB, OL, DL, DB) warrant higher investment; scarcity varies by draft class
Team Evaluation: Measuring performance vs expected value (given draft position) provides fairer team assessment than raw totals
Classification Systems: Round-adjusted thresholds for "hits" and "busts" enable meaningful evaluation
ML Applications: Random forests and gradient boosting can outperform linear approaches but still explain only about 30-50% of the variance in career outcomes, highlighting draft uncertainty
Integrated Approach: Best practices combine analytics (value charts, statistical models) with scouting expertise and team-specific needs

Exercises

Conceptual Questions

Value Chart Economics: Explain why the Jimmy Johnson trade chart overvalues high picks. What economic principles drive this overvaluation?
Position Strategy: Given limited draft capital, should a team prioritize premium positions or best player available? Justify your answer with data.
Combine Skepticism: Why might combine performance be less predictive for certain positions? Provide examples.

Coding Exercises

Exercise 1: Build Your Own Value Chart

Create a custom draft value chart using the simulated data:

a) Calculate expected AV by pick number
b) Incorporate positional adjustments
c) Account for rookie salary cap hits
d) Create a surplus value model
e) Compare your chart to the Jimmy Johnson chart

Bonus: Create a position-specific value chart (e.g., separate values for QB vs OL)

Exercise 2: Trade Analyzer

Build a function that evaluates proposed draft trades:

a) Accept two sets of picks (Team A and Team B)
b) Calculate value for each team using multiple value systems
c) Recommend whether each team should accept
d) Account for positional needs (if provided)
e) Generate a trade "fairness" score

Test case: Team A offers pick #10; Team B offers picks #20, #52, and next year's 2nd round pick (estimated #45)
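
As a possible starting point for this exercise, the sketch below sets up the test case with a single placeholder value system. Everything in it is an assumption to be replaced: `pick_value` is a hypothetical exponential curve (not the Jimmy Johnson chart or any published chart), and the 0.9 annual discount on future picks is an arbitrary illustrative choice.

```python
import math


def pick_value(pick, top_value=3000.0, decay=0.03):
    """Hypothetical smooth value curve (NOT a published chart); swap in your own."""
    return top_value * math.exp(-decay * (pick - 1))


def package_value(picks, value_fn=pick_value, annual_discount=0.9):
    """Sum the value of (pick_number, years_in_future) tuples, discounting future picks."""
    return sum(value_fn(p) * annual_discount ** years for p, years in picks)


def evaluate_trade(team_a_gives, team_b_gives, **kwargs):
    """Compare what each side gives up under one value system."""
    a_val = package_value(team_a_gives, **kwargs)
    b_val = package_value(team_b_gives, **kwargs)
    return {
        "team_a_gives": a_val,
        "team_b_gives": b_val,
        "ratio_b_over_a": b_val / a_val,  # > 1 means Team A receives more than it gives
    }


# Test case from the exercise: (pick_number, years_in_future)
result = evaluate_trade([(10, 0)], [(20, 0), (52, 0), (45, 1)])
for key, value in result.items():
    print(f"{key}: {value:.2f}")
```

Passing a different `value_fn` (for example, a lookup into an empirical AV-based chart) lets you compare the verdict across value systems, which is what part (b) asks for.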

Exercise 3: College-to-NFL Projection

Develop a position-specific projection model:

a) Filter data to a single position (e.g., WR)
b) Create relevant features from college and combine data
c) Train multiple models (linear, tree-based, ensemble)
d) Evaluate with cross-validation
e) Identify the most important predictive features
f) Generate predictions with confidence intervals

Advanced: Build separate models for each position and compare predictive accuracy

Exercise 4: Draft Class Analysis

Analyze the quality of draft classes:

a) Calculate average AV by draft year
b) Identify "strong" and "weak" draft classes
c) Analyze positional strength by year (e.g., 2011 had great WRs)
d) Create visualizations showing class quality trends
e) Adjust for years of experience (recent classes have less accumulated AV)

Research question: Do weak draft classes create more parity in the NFL?

Exercise 5: Team Draft Strategy

Evaluate a specific team's draft strategy:

a) Load real historical draft data for one team (2010-2023)
b) Calculate their hit rate by round
c) Identify positional preferences
d) Measure AV vs expected given draft positions
e) Analyze their trading behavior (trade up/down frequency)
f) Create a comprehensive draft performance report

Teams to consider: Green Bay (historically strong), Cleveland (historically weak), Baltimore (analytics-driven)

References
