Learning Objectives
By the end of this chapter, you will be able to:
- Evaluate draft pick value and trade curves
- Analyze draft success rates by position and round
- Build college-to-NFL projection models
- Assess draft strategy and team performance
- Optimize draft capital allocation
Introduction
The NFL Draft represents one of the most critical mechanisms for team building in professional football. Every spring, 32 teams select from a pool of college players, making decisions that can shape their franchise for years to come. The stakes are enormous: a successful draft can transform a struggling team into a championship contender, while poor draft decisions can waste valuable resources and set a franchise back years.
Traditional draft evaluation relied heavily on subjective scouting reports, combine performances, and gut instincts. While scouting expertise remains invaluable, modern analytics provides powerful tools to complement traditional evaluation methods. Data-driven approaches help teams:
- Quantify pick value to make optimal trade decisions
- Identify historical patterns of success and failure by position and round
- Project college performance to NFL outcomes using statistical models
- Evaluate combine metrics for predictive validity
- Assess team draft performance over time
- Optimize resource allocation across draft positions
The Draft as an Optimization Problem
The NFL Draft is fundamentally an optimization problem under uncertainty. Teams must allocate limited resources (draft picks) to maximize long-term team value while accounting for:
- Positional scarcity and value
- Projection uncertainty (college-to-NFL translation)
- Information asymmetry (other teams' preferences)
- Temporal constraints (immediate vs future needs)
- Salary cap implications
This chapter explores the analytical frameworks that help teams navigate these challenges and make better draft decisions.
The Evolution of Draft Analytics
Traditional Approaches (Pre-2000s)
Historically, draft evaluation focused on:
- Scouting reports: Subjective evaluations from college game film
- Combine performance: Physical measurements and athletic testing
- Positional need: Filling roster gaps
- Best player available: Simple ranking systems
The Jimmy Johnson Trade Value Chart, created in the 1990s, was one of the first systematic attempts to quantify draft pick value. It is still widely used, but it significantly overvalues early picks relative to the on-field value those picks have historically produced.
The Analytics Revolution (2000s-Present)
Modern draft analytics emerged from several key developments:
- 2003: Michael Lewis's Moneyball is published and inspires football analytics
- 2005: Massey and Thaler's working paper (published in Management Science in 2013) shows systematic overvaluation of high picks
- 2011: Harvard Sports Analysis Collective develops new value charts
- 2015+: Machine learning models for draft projection gain traction
- 2020s: Integration of tracking data and advanced college metrics
Draft Pick Value Charts
The Jimmy Johnson Chart
The traditional trade value chart assigns a point value to each pick. Selected first-round values are shown below:
| Pick | Points | Pick | Points | Pick | Points |
|---|---|---|---|---|---|
| 1 | 3000 | 11 | 1250 | 21 | 800 |
| 2 | 2600 | 12 | 1200 | 22 | 780 |
| 3 | 2200 | 13 | 1150 | 23 | 760 |
| 4 | 1800 | 14 | 1100 | 24 | 740 |
| 5 | 1700 | 15 | 1050 | 25 | 720 |
Problems with the Jimmy Johnson Chart:
- Overvalues top picks relative to their historical performance
- Doesn't account for positional differences
- Based on subjective valuation, not empirical analysis
- Doesn't consider salary cap implications
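To make the chart's arithmetic concrete, here is a minimal sketch that scores a trade using only the point values listed in the table above. The helper names (`package_points`, `trade_balance`) and the example trade are hypothetical and chosen purely for illustration.

```python
# Points copied from the table above (selected picks only)
JJ_POINTS = {
    1: 3000, 2: 2600, 3: 2200, 4: 1800, 5: 1700,
    11: 1250, 12: 1200, 13: 1150, 14: 1100, 15: 1050,
    21: 800, 22: 780, 23: 760, 24: 740, 25: 720,
}


def package_points(picks):
    """Sum the chart points for a list of picks (hypothetical helper)."""
    return sum(JJ_POINTS[p] for p in picks)


def trade_balance(picks_received, picks_given):
    """Chart points gained by the side receiving `picks_received`."""
    return package_points(picks_received) - package_points(picks_given)


# Hypothetical trade: move down from pick 2 to pick 4 and add pick 21
balance = trade_balance(picks_received=[4, 21], picks_given=[2])
print(f"Received: {package_points([4, 21])} points")
print(f"Given:    {package_points([2])} points")
print(f"Balance:  {balance:+d} points")
```

Under the chart this trade is exactly even, which is the sense in which the chart defines a "fair" exchange. Whether it is fair in terms of expected on-field value is the question the rest of this section addresses.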
Modern Value Charts
Let's implement modern draft value models based on historical performance.
#| label: setup-r
#| message: false
#| warning: false
library(tidyverse)
library(nflfastR)
library(nflplotR)
library(gt)
library(scales)
# Set seed for reproducibility
set.seed(2024)
#| label: jimmy-johnson-chart-r
#| message: false
#| warning: false
# Implement Jimmy Johnson trade value chart
jimmy_johnson_value <- function(pick) {
  case_when(
    pick == 1 ~ 3000,
    pick == 2 ~ 2600,
    pick == 3 ~ 2200,
    pick == 4 ~ 1800,
    pick == 5 ~ 1700,
    pick == 6 ~ 1600,
    pick == 7 ~ 1500,
    pick == 8 ~ 1400,
    pick == 9 ~ 1350,
    pick == 10 ~ 1300,
    # Picks 11+ use piecewise-linear approximations of the chart.
    # The first two bands are anchored to the table above
    # (1250 points at pick 11, 800 points at pick 21).
    pick >= 11 & pick <= 20 ~ 1250 - (pick - 11) * 50,
    pick >= 21 & pick <= 32 ~ 800 - (pick - 21) * 20,
    pick >= 33 & pick <= 64 ~ 580 - (pick - 33) * 10,
    pick >= 65 & pick <= 96 ~ 265 - (pick - 65) * 6,
    pick >= 97 & pick <= 128 ~ 112 - (pick - 97) * 2.4,
    pick >= 129 & pick <= 160 ~ 43 - (pick - 129) * 0.8,
    pick >= 161 & pick <= 192 ~ 24 - (pick - 161) * 0.4,
    pick >= 193 & pick <= 224 ~ 11 - (pick - 193) * 0.2,
    pick >= 225 & pick <= 256 ~ 4.4 - (pick - 225) * 0.1,
    TRUE ~ 0
  )
}
# Create draft pick value dataframe
# Simplification: 32 picks per round, so picks 225-256 fall in a nominal
# round 8 (real drafts have seven rounds plus compensatory picks)
draft_values <- tibble(
  pick = 1:256,
  jimmy_johnson = jimmy_johnson_value(pick),
  round = ceiling(pick / 32)
)
# Display first round values
draft_values %>%
filter(pick <= 32) %>%
gt() %>%
cols_label(
pick = "Pick",
jimmy_johnson = "JJ Value",
round = "Round"
) %>%
fmt_number(
columns = jimmy_johnson,
decimals = 0
) %>%
tab_header(
title = "Jimmy Johnson Trade Value Chart",
subtitle = "First Round (Picks 1-32)"
)
#| label: setup-py
#| message: false
#| warning: false
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
import warnings
warnings.filterwarnings('ignore')
# Set random seed
np.random.seed(2024)
# Set plot style
sns.set_style("whitegrid")
plt.rcParams['figure.dpi'] = 100
#| label: jimmy-johnson-chart-py
#| message: false
#| warning: false
def jimmy_johnson_value(pick):
    """Calculate Jimmy Johnson trade value for a draft pick"""
    if pick == 1:
        return 3000
    elif pick == 2:
        return 2600
    elif pick == 3:
        return 2200
    elif pick == 4:
        return 1800
    elif pick == 5:
        return 1700
    elif pick == 6:
        return 1600
    elif pick == 7:
        return 1500
    elif pick == 8:
        return 1400
    elif pick == 9:
        return 1350
    elif pick == 10:
        return 1300
    elif pick <= 20:
        # Picks 11+ use piecewise-linear approximations of the chart.
        # This band is anchored to the table above (1250 points at pick 11).
        return 1250 - (pick - 11) * 50
    elif pick <= 32:
        # Anchored to the table above (800 points at pick 21)
        return 800 - (pick - 21) * 20
    elif pick <= 64:
        return 580 - (pick - 33) * 10
    elif pick <= 96:
        return 265 - (pick - 65) * 6
    elif pick <= 128:
        return 112 - (pick - 97) * 2.4
    elif pick <= 160:
        return 43 - (pick - 129) * 0.8
    elif pick <= 192:
        return 24 - (pick - 161) * 0.4
    elif pick <= 224:
        return 11 - (pick - 193) * 0.2
    elif pick <= 256:
        return 4.4 - (pick - 225) * 0.1
    else:
        return 0

# Create draft pick value dataframe
draft_values = pd.DataFrame({
    'pick': range(1, 257),
    'jimmy_johnson': [jimmy_johnson_value(p) for p in range(1, 257)]
})
# Simplification: 32 picks per round, so picks 225-256 fall in a nominal
# round 8 (real drafts have seven rounds plus compensatory picks)
draft_values['round'] = np.ceil(draft_values['pick'] / 32).astype(int)
# Display first round values
print("\nJimmy Johnson Trade Value Chart - First Round:\n")
print(draft_values[draft_values['pick'] <= 32][['pick', 'jimmy_johnson', 'round']].to_string(index=False))
Value Chart Based on Historical Performance
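We'll create a more empirically grounded value chart using Approximate Value (AV), Pro Football Reference's single-number estimate of a player's contribution. The code below uses simulated draft histories as a stand-in for real data.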
#| label: empirical-value-r
#| message: false
#| warning: false
#| cache: true
# Simulate draft data (in practice, load from Pro Football Reference or similar)
# This simulates historical draft picks with their career AV
simulate_draft_data <- function(n_years = 10) {
  set.seed(123)
  # Generate draft picks over n_years drafts, starting in 2010
  draft_data <- expand_grid(
    year = 2010:(2009 + n_years),
    pick = 1:256
  ) %>%
    mutate(
      round = ceiling(pick / 32),
      # Expected career AV declines within each pick band
      # (the bands are illustrative and not continuous at their boundaries)
      expected_av = case_when(
        pick <= 10 ~ 60 - pick * 3,
        pick <= 32 ~ 50 - pick * 1.5,
        pick <= 64 ~ 40 - pick * 0.6,
        pick <= 96 ~ 30 - pick * 0.4,
        pick <= 160 ~ 25 - pick * 0.15,
        TRUE ~ 10 - pick * 0.03
      ),
      # Add noise (constant SD of 15 for every pick), floored at zero
      career_av = pmax(0, rnorm(n(), expected_av, 15)),
      # Simulate if player became starter
      starter = career_av > 25,
      # Simulate if player was "hit"
      hit = career_av > 40,
      # Position simulation (drawn independently of career AV)
      position = sample(
        c("QB", "RB", "WR", "TE", "OL", "DL", "LB", "DB"),
        n(),
        replace = TRUE,
        prob = c(0.08, 0.10, 0.12, 0.05, 0.20, 0.15, 0.12, 0.18)
      )
    )
  return(draft_data)
}
# Generate simulated data
draft_historical <- simulate_draft_data()
# Calculate average value by pick
pick_values <- draft_historical %>%
group_by(pick, round) %>%
summarise(
avg_av = mean(career_av),
median_av = median(career_av),
starter_rate = mean(starter),
hit_rate = mean(hit),
n_players = n(),
.groups = "drop"
)
# Display value by pick for first round
pick_values %>%
filter(pick <= 32) %>%
gt() %>%
cols_label(
pick = "Pick",
round = "Round",
avg_av = "Avg AV",
median_av = "Med AV",
starter_rate = "Starter Rate",
hit_rate = "Hit Rate",
n_players = "N"
) %>%
fmt_number(
columns = c(avg_av, median_av),
decimals = 1
) %>%
fmt_percent(
columns = c(starter_rate, hit_rate),
decimals = 1
) %>%
tab_header(
title = "Draft Pick Value by Historical Performance",
subtitle = "First Round (Picks 1-32)"
)
#| label: empirical-value-py
#| message: false
#| warning: false
#| cache: true
def simulate_draft_data(n_years=10):
    """Simulate historical draft data with career AV"""
    np.random.seed(123)
    # Generate draft picks over n_years drafts, starting in 2010
    years = range(2010, 2010 + n_years)
    picks = range(1, 257)
    data = []
    for year in years:
        for pick in picks:
            round_num = int(np.ceil(pick / 32))
            # Expected career AV declines within each pick band
            # (the bands are illustrative and not continuous at their boundaries)
            if pick <= 10:
                expected_av = 60 - pick * 3
            elif pick <= 32:
                expected_av = 50 - pick * 1.5
            elif pick <= 64:
                expected_av = 40 - pick * 0.6
            elif pick <= 96:
                expected_av = 30 - pick * 0.4
            elif pick <= 160:
                expected_av = 25 - pick * 0.15
            else:
                expected_av = 10 - pick * 0.03
            # Add noise (constant SD of 15 for every pick), floored at zero
            career_av = max(0, np.random.normal(expected_av, 15))
            data.append({
                'year': year,
                'pick': pick,
                'round': round_num,
                'career_av': career_av,
                'starter': career_av > 25,
                'hit': career_av > 40,
                # Position is drawn independently of career AV
                'position': np.random.choice(
                    ['QB', 'RB', 'WR', 'TE', 'OL', 'DL', 'LB', 'DB'],
                    p=[0.08, 0.10, 0.12, 0.05, 0.20, 0.15, 0.12, 0.18]
                )
            })
    return pd.DataFrame(data)
# Generate simulated data
draft_historical = simulate_draft_data()
# Calculate average value by pick
pick_values = (draft_historical
.groupby(['pick', 'round'])
.agg({
'career_av': ['mean', 'median'],
'starter': 'mean',
'hit': 'mean',
'year': 'count'
})
.reset_index()
)
pick_values.columns = ['pick', 'round', 'avg_av', 'median_av',
'starter_rate', 'hit_rate', 'n_players']
# Display first round
print("\nDraft Pick Value by Historical Performance - First Round:\n")
first_round = pick_values[pick_values['pick'] <= 32].copy()
first_round['starter_rate'] = (first_round['starter_rate'] * 100).round(1)
first_round['hit_rate'] = (first_round['hit_rate'] * 100).round(1)
print(first_round.to_string(index=False))
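Per-pick averages built from only ten drafts are noisy, so a common next step is to smooth them before treating them as a value chart. The sketch below is not part of the chapter's pipeline. It fabricates its own noisy curve (the `40 - 6 * log(pick)` shape and the noise level are assumptions made purely for illustration) and fits a log-linear curve with `np.polyfit`. With real data, the `noisy_avg_av` array would be replaced by the per-pick average AV.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noisy per-pick averages standing in for a real avg_av column
picks = np.arange(1, 257)
true_curve = 40 - 6 * np.log(picks)
noisy_avg_av = true_curve + rng.normal(0, 3, size=picks.size)

# Fit avg_av = intercept + slope * log(pick) by least squares.
# np.polyfit returns coefficients from highest degree to lowest.
slope, intercept = np.polyfit(np.log(picks), noisy_avg_av, deg=1)
smoothed_av = intercept + slope * np.log(picks)

for p in (1, 32, 64, 128, 256):
    print(f"pick {p:>3}: raw {noisy_avg_av[p - 1]:5.1f}  "
          f"smoothed {smoothed_av[p - 1]:5.1f}")
```

A log-linear form is only one possible choice. Its appeal is that the fitted curve is guaranteed to decline with pick number whenever the slope is negative, which raw per-pick averages are not.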
Visualizing Draft Value Curves
#| label: fig-value-curves-r
#| fig-cap: "Comparison of draft value curves"
#| fig-width: 12
#| fig-height: 7
#| message: false
#| warning: false
# Combine value systems
value_comparison <- draft_values %>%
left_join(pick_values %>% select(pick, avg_av), by = "pick") %>%
mutate(
# Normalize values to 0-100 scale
jj_normalized = (jimmy_johnson / max(jimmy_johnson, na.rm = TRUE)) * 100,
av_normalized = (avg_av / max(avg_av, na.rm = TRUE)) * 100
)
# Create comparison plot
ggplot(value_comparison, aes(x = pick)) +
geom_line(aes(y = jj_normalized, color = "Jimmy Johnson"),
linewidth = 1.2, alpha = 0.8) +
geom_line(aes(y = av_normalized, color = "Empirical (AV)"),
linewidth = 1.2, alpha = 0.8) +
geom_vline(xintercept = seq(32, 256, 32),
linetype = "dashed", alpha = 0.3) +
scale_color_manual(
values = c("Jimmy Johnson" = "#d62728", "Empirical (AV)" = "#2ca02c"),
name = "Value System"
) +
scale_x_continuous(breaks = c(1, 32, 64, 96, 128, 160, 192, 224, 256)) +
labs(
title = "Draft Pick Value Curves: Traditional vs Empirical",
subtitle = "Normalized to 0-100 scale | Vertical lines indicate round boundaries",
x = "Draft Pick",
y = "Normalized Value",
caption = "Note: Jimmy Johnson chart overvalues early picks relative to historical performance"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14),
plot.subtitle = element_text(size = 10),
legend.position = "top",
panel.grid.minor = element_blank()
)
#| label: fig-value-curves-py
#| fig-cap: "Comparison of draft value curves - Python"
#| fig-width: 12
#| fig-height: 7
#| message: false
#| warning: false
# Combine value systems
value_comparison = draft_values.merge(
pick_values[['pick', 'avg_av']],
on='pick'
)
# Normalize to 0-100 scale
value_comparison['jj_normalized'] = (
value_comparison['jimmy_johnson'] / value_comparison['jimmy_johnson'].max()
) * 100
value_comparison['av_normalized'] = (
value_comparison['avg_av'] / value_comparison['avg_av'].max()
) * 100
# Create plot
fig, ax = plt.subplots(figsize=(12, 7))
ax.plot(value_comparison['pick'], value_comparison['jj_normalized'],
label='Jimmy Johnson', color='#d62728', linewidth=2, alpha=0.8)
ax.plot(value_comparison['pick'], value_comparison['av_normalized'],
label='Empirical (AV)', color='#2ca02c', linewidth=2, alpha=0.8)
# Add round boundaries
for round_end in range(32, 257, 32):
    ax.axvline(x=round_end, color='gray', linestyle='--', alpha=0.3)
ax.set_xlabel('Draft Pick', fontsize=12)
ax.set_ylabel('Normalized Value', fontsize=12)
ax.set_title('Draft Pick Value Curves: Traditional vs Empirical\nNormalized to 0-100 scale',
fontsize=14, fontweight='bold', pad=20)
ax.legend(title='Value System', loc='upper right', fontsize=10)
ax.set_xticks([1, 32, 64, 96, 128, 160, 192, 224, 256])
ax.grid(True, alpha=0.3)
ax.text(0.98, 0.02, 'Note: Jimmy Johnson chart overvalues early picks',
transform=ax.transAxes, ha='right', fontsize=8, style='italic')
plt.tight_layout()
plt.show()
Key Insight: The Surplus Value Curve
The empirical value curve is much flatter than the Jimmy Johnson chart, suggesting that traditional draft trade values significantly overvalue top picks. This creates opportunities for savvy teams to trade down and accumulate additional picks.
Draft Success Rates by Position and Round
Understanding historical success rates helps teams set realistic expectations and identify value.
#| label: success-rates-r
#| message: false
#| warning: false
# Calculate success rates by round and position
success_by_round_position <- draft_historical %>%
group_by(round, position) %>%
summarise(
n_picks = n(),
avg_av = mean(career_av),
starter_rate = mean(starter),
hit_rate = mean(hit),
bust_rate = mean(career_av < 10),
.groups = "drop"
) %>%
arrange(round, desc(avg_av))
# Success rates by round (all positions)
success_by_round <- draft_historical %>%
group_by(round) %>%
summarise(
n_picks = n(),
avg_av = mean(career_av),
median_av = median(career_av),
starter_rate = mean(starter),
hit_rate = mean(hit),
bust_rate = mean(career_av < 10),
.groups = "drop"
)
# Display table
success_by_round %>%
gt() %>%
cols_label(
round = "Round",
n_picks = "N Picks",
avg_av = "Avg AV",
median_av = "Med AV",
starter_rate = "Starter %",
hit_rate = "Hit %",
bust_rate = "Bust %"
) %>%
fmt_number(
columns = c(avg_av, median_av),
decimals = 1
) %>%
fmt_percent(
columns = c(starter_rate, hit_rate, bust_rate),
decimals = 1
) %>%
tab_header(
title = "Draft Success Rates by Round",
subtitle = "All positions, 2010-2019 drafts"
) %>%
tab_footnote(
footnote = "Starter: Career AV > 25 | Hit: Career AV > 40 | Bust: Career AV < 10"
)
#| label: success-rates-py
#| message: false
#| warning: false
# Calculate success rates by round
success_by_round = (draft_historical
.groupby('round')
.agg({
'year': 'count',
'career_av': ['mean', 'median'],
'starter': 'mean',
'hit': 'mean'
})
.reset_index()
)
success_by_round.columns = ['round', 'n_picks', 'avg_av', 'median_av',
'starter_rate', 'hit_rate']
# Add bust rate
bust_rate = (draft_historical
.groupby('round')
.apply(lambda x: (x['career_av'] < 10).mean())
.reset_index(name='bust_rate')
)
success_by_round = success_by_round.merge(bust_rate, on='round')
# Display
print("\nDraft Success Rates by Round (All Positions):\n")
display_df = success_by_round.copy()
display_df['avg_av'] = display_df['avg_av'].round(1)
display_df['median_av'] = display_df['median_av'].round(1)
display_df['starter_rate'] = (display_df['starter_rate'] * 100).round(1)
display_df['hit_rate'] = (display_df['hit_rate'] * 100).round(1)
display_df['bust_rate'] = (display_df['bust_rate'] * 100).round(1)
print(display_df.to_string(index=False))
print("\nNote: Starter: Career AV > 25 | Hit: Career AV > 40 | Bust: Career AV < 10")
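Rates like these rest on very different sample sizes: a full round pools hundreds of picks, while a single position within a round may have only a couple of dozen. One way to keep that in view is to attach an interval to each rate. The sketch below uses the Wilson score interval. The function name and the example counts are hypothetical and are not taken from the tables above.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical counts with the same observed rate but different sample sizes
for hits, n in [(8, 25), (96, 300)]:
    lower, upper = wilson_interval(hits, n)
    print(f"{hits}/{n}: hit rate {hits / n:.1%}, "
          f"95% interval {lower:.1%} to {upper:.1%}")
```

Both examples have the same observed hit rate, but the interval for the smaller group is several times wider, which is worth remembering when comparing positions within a single round.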
Position-Specific Success Rates
#| label: fig-position-success-r
#| fig-cap: "Success rates by position and round"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false
# Filter to first 3 rounds for clarity
position_success_plot <- success_by_round_position %>%
filter(round <= 3)
ggplot(position_success_plot, aes(x = position, y = hit_rate, fill = as.factor(round))) +
geom_col(position = "dodge", alpha = 0.8) +
scale_fill_manual(
values = c("1" = "#1f77b4", "2" = "#ff7f0e", "3" = "#2ca02c"),
name = "Round"
) +
scale_y_continuous(labels = percent_format()) +
labs(
title = "Draft Hit Rates by Position and Round",
subtitle = "Percentage of players with career AV > 40 (Rounds 1-3)",
x = "Position",
y = "Hit Rate",
caption = "Data: Simulated draft data 2010-2019"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14),
legend.position = "top",
panel.grid.major.x = element_blank()
)
#| label: fig-position-success-py
#| fig-cap: "Success rates by position and round - Python"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false
# Calculate by position and round
success_by_round_position = (draft_historical
.groupby(['round', 'position'])
.agg({
'year': 'count',
'career_av': 'mean',
'starter': 'mean',
'hit': 'mean'
})
.reset_index()
)
success_by_round_position.columns = ['round', 'position', 'n_picks',
'avg_av', 'starter_rate', 'hit_rate']
# Filter to rounds 1-3
plot_data = success_by_round_position[success_by_round_position['round'] <= 3].copy()
# Create plot
fig, ax = plt.subplots(figsize=(12, 8))
positions = sorted(plot_data['position'].unique())
rounds = [1, 2, 3]
x = np.arange(len(positions))
width = 0.25
colors = ['#1f77b4', '#ff7f0e', '#2ca02c']
for i, round_num in enumerate(rounds):
    round_data = plot_data[plot_data['round'] == round_num]
    round_data = round_data.set_index('position').reindex(positions, fill_value=0)
    ax.bar(x + i * width, round_data['hit_rate'], width,
           label=f'Round {round_num}', color=colors[i], alpha=0.8)
ax.set_xlabel('Position', fontsize=12)
ax.set_ylabel('Hit Rate', fontsize=12)
ax.set_title('Draft Hit Rates by Position and Round\nPercentage of players with career AV > 40 (Rounds 1-3)',
fontsize=14, fontweight='bold', pad=20)
ax.set_xticks(x + width)
ax.set_xticklabels(positions)
ax.legend(title='Round', loc='upper right')
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))
ax.grid(True, axis='y', alpha=0.3)
plt.tight_layout()
plt.show()
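Position Variability
Success rates vary significantly by position. Premium positions (QB, OL, DL) tend to have higher hit rates in early rounds, while skill positions show more variability. This reflects both positional importance and the difficulty of projecting college performance to the NFL. The simulated data used above assigns positions at random, independent of career AV, so it cannot reproduce this pattern. The position-to-position differences visible in the charts are sampling noise.
College-to-NFL Performance Correlation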
A critical question in draft analytics: How well do college statistics predict NFL success?
#| label: college-nfl-correlation-r
#| message: false
#| warning: false
#| cache: true
# Simulate college statistics
set.seed(456)
draft_with_college <- draft_historical %>%
mutate(
# College production metrics (normalized, position-specific)
college_production = case_when(
position == "QB" ~ rnorm(n(), 70, 15),
position == "RB" ~ rnorm(n(), 65, 20),
position == "WR" ~ rnorm(n(), 60, 18),
position == "TE" ~ rnorm(n(), 55, 16),
position %in% c("OL", "DL") ~ rnorm(n(), 50, 12),
position == "LB" ~ rnorm(n(), 58, 14),
position == "DB" ~ rnorm(n(), 62, 15),
TRUE ~ rnorm(n(), 50, 15)
),
# Athletic score (combine)
athletic_score = rnorm(n(), 50, 10),
# Add correlation with NFL success
college_production = college_production + career_av * 0.3 + rnorm(n(), 0, 5),
athletic_score = athletic_score + career_av * 0.2 + rnorm(n(), 0, 5)
) %>%
mutate(
college_production = pmax(0, pmin(100, college_production)),
athletic_score = pmax(0, pmin(100, athletic_score))
)
# Calculate correlations by position
college_correlations <- draft_with_college %>%
group_by(position) %>%
summarise(
n = n(),
production_cor = cor(college_production, career_av),
athletic_cor = cor(athletic_score, career_av),
combined_cor = cor(college_production + athletic_score, career_av),
.groups = "drop"
) %>%
arrange(desc(combined_cor))
# Display correlations
college_correlations %>%
gt() %>%
cols_label(
position = "Position",
n = "N",
production_cor = "Production ρ",
athletic_cor = "Athletic ρ",
combined_cor = "Combined ρ"
) %>%
fmt_number(
columns = c(production_cor, athletic_cor, combined_cor),
decimals = 3
) %>%
tab_header(
title = "College-to-NFL Correlations by Position",
subtitle = "Correlation with career AV"
) %>%
tab_footnote(
footnote = "ρ = Pearson correlation coefficient"
)
#| label: college-nfl-correlation-py
#| message: false
#| warning: false
#| cache: true
# Simulate college statistics
np.random.seed(456)
def generate_college_stats(df):
    """Add simulated college statistics"""
    college_prod = []
    athletic = []
    for _, row in df.iterrows():
        # Position-specific college production
        if row['position'] == 'QB':
            base_prod = np.random.normal(70, 15)
        elif row['position'] == 'RB':
            base_prod = np.random.normal(65, 20)
        elif row['position'] == 'WR':
            base_prod = np.random.normal(60, 18)
        elif row['position'] == 'TE':
            base_prod = np.random.normal(55, 16)
        elif row['position'] in ['OL', 'DL']:
            base_prod = np.random.normal(50, 12)
        elif row['position'] == 'LB':
            base_prod = np.random.normal(58, 14)
        elif row['position'] == 'DB':
            base_prod = np.random.normal(62, 15)
        else:
            base_prod = np.random.normal(50, 15)
        # Athletic score baseline (combine), matching the R version's N(50, 10)
        base_athletic = np.random.normal(50, 10)
        # Add correlation with NFL success, then clip to the 0-100 scale
        college_prod.append(
            np.clip(base_prod + row['career_av'] * 0.3 + np.random.normal(0, 5), 0, 100)
        )
        athletic.append(
            np.clip(base_athletic + row['career_av'] * 0.2 + np.random.normal(0, 5), 0, 100)
        )
    return college_prod, athletic

draft_with_college = draft_historical.copy()
draft_with_college['college_production'], draft_with_college['athletic_score'] = \
    generate_college_stats(draft_with_college)
# Calculate correlations by position
college_correlations = (draft_with_college
.groupby('position')
.apply(lambda x: pd.Series({
'n': len(x),
'production_cor': x['college_production'].corr(x['career_av']),
'athletic_cor': x['athletic_score'].corr(x['career_av']),
'combined_cor': (x['college_production'] + x['athletic_score']).corr(x['career_av'])
}))
.reset_index()
.sort_values('combined_cor', ascending=False)
)
print("\nCollege-to-NFL Correlations by Position:\n")
print(college_correlations.to_string(index=False))
print("\nNote: values are Pearson correlation coefficients")
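Correlations computed within a single position group come from fewer players than the pooled figures, so they are less precise. A rough way to gauge this is Fisher's z-transform, which gives an approximate confidence interval for a Pearson correlation. The sketch below is a generic illustration. The function name and the example values (`r = 0.30`, group sizes of 128 and 512) are hypothetical and are not results from the tables above.

```python
import math


def pearson_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson correlation via Fisher's z-transform."""
    if n <= 3:
        raise ValueError("need more than 3 observations")
    if not -1 < r < 1:
        raise ValueError("r must be strictly between -1 and 1")
    z = math.atanh(r)            # Fisher transform of r
    se = 1 / math.sqrt(n - 3)    # approximate standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Same hypothetical correlation, two hypothetical group sizes
for n in (128, 512):
    lower, upper = pearson_ci(0.30, n)
    print(f"r = 0.30, n = {n}: approx. 95% CI {lower:.2f} to {upper:.2f}")
```

If the intervals for two positions overlap heavily, the apparent difference in predictability between them may simply reflect sample size.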
Visualization of College-NFL Relationship
#| label: fig-college-nfl-scatter-r
#| fig-cap: "College production vs NFL career value"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false
# Create faceted scatter plot by position
draft_with_college %>%
filter(position %in% c("QB", "RB", "WR", "OL", "DL", "LB", "DB")) %>%
ggplot(aes(x = college_production, y = career_av)) +
geom_point(aes(color = position), alpha = 0.4, size = 1) +
geom_smooth(method = "lm", se = TRUE, color = "black", linewidth = 0.8) +
facet_wrap(~ position, ncol = 4) +
scale_color_manual(
values = c("QB" = "#d62728", "RB" = "#2ca02c", "WR" = "#1f77b4",
"OL" = "#ff7f0e", "DL" = "#9467bd", "LB" = "#8c564b",
"DB" = "#e377c2")
) +
labs(
title = "College Production vs NFL Career Value by Position",
subtitle = "Relationship varies significantly across positions",
x = "College Production Score (0-100)",
y = "Career Approximate Value",
caption = "Data: Simulated draft data | Lines show linear fit"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14),
legend.position = "none",
strip.text = element_text(face = "bold", size = 10)
)
#| label: fig-college-nfl-scatter-py
#| fig-cap: "College production vs NFL career value - Python"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false
# Filter positions
positions_to_plot = ['QB', 'RB', 'WR', 'OL', 'DL', 'LB', 'DB']
plot_data = draft_with_college[draft_with_college['position'].isin(positions_to_plot)]
# Create subplot grid
fig, axes = plt.subplots(2, 4, figsize=(14, 8))
axes = axes.ravel()
colors = {'QB': '#d62728', 'RB': '#2ca02c', 'WR': '#1f77b4',
'OL': '#ff7f0e', 'DL': '#9467bd', 'LB': '#8c564b', 'DB': '#e377c2'}
for idx, position in enumerate(positions_to_plot):
    pos_data = plot_data[plot_data['position'] == position]
    axes[idx].scatter(pos_data['college_production'], pos_data['career_av'],
                      alpha=0.4, s=20, color=colors[position])
    # Add regression line
    z = np.polyfit(pos_data['college_production'], pos_data['career_av'], 1)
    p = np.poly1d(z)
    x_line = np.linspace(pos_data['college_production'].min(),
                         pos_data['college_production'].max(), 100)
    axes[idx].plot(x_line, p(x_line), 'k-', linewidth=1.5, alpha=0.8)
    axes[idx].set_title(position, fontweight='bold', fontsize=11)
    axes[idx].set_xlabel('College Production Score', fontsize=9)
    axes[idx].set_ylabel('Career AV', fontsize=9)
    axes[idx].grid(True, alpha=0.3)
# Remove empty subplot
fig.delaxes(axes[7])
fig.suptitle('College Production vs NFL Career Value by Position\nRelationship varies significantly across positions',
fontsize=14, fontweight='bold', y=0.995)
plt.tight_layout()
plt.show()
NFL Combine Metrics and Success
The NFL Combine provides standardized athletic testing. But how predictive are these metrics?
#| label: combine-analysis-r
#| message: false
#| warning: false
#| cache: true
# Simulate combine metrics
set.seed(789)
draft_with_combine <- draft_with_college %>%
mutate(
# Simulate combine metrics (position-dependent)
forty_yard = case_when(
position %in% c("WR", "DB") ~ rnorm(n(), 4.50, 0.10),
position %in% c("RB", "LB") ~ rnorm(n(), 4.60, 0.12),
position == "QB" ~ rnorm(n(), 4.75, 0.15),
position == "TE" ~ rnorm(n(), 4.70, 0.12),
position %in% c("OL", "DL") ~ rnorm(n(), 5.10, 0.20),
TRUE ~ rnorm(n(), 4.80, 0.20)
),
# Faster times correlate weakly with success
forty_yard = forty_yard - career_av * 0.002,
# Vertical jump
vertical = case_when(
position %in% c("WR", "DB") ~ rnorm(n(), 36, 3),
position %in% c("RB", "LB") ~ rnorm(n(), 34, 3),
position %in% c("OL", "DL") ~ rnorm(n(), 28, 3),
TRUE ~ rnorm(n(), 32, 3)
) + career_av * 0.1,
# Broad jump
broad_jump = case_when(
position %in% c("WR", "DB") ~ rnorm(n(), 120, 6),
position %in% c("RB", "LB") ~ rnorm(n(), 118, 6),
position %in% c("OL", "DL") ~ rnorm(n(), 105, 7),
TRUE ~ rnorm(n(), 115, 6)
) + career_av * 0.2,
# Bench press (OL/DL specific)
bench_press = case_when(
position %in% c("OL", "DL") ~ round(rnorm(n(), 25, 4) + career_av * 0.05),
position == "LB" ~ round(rnorm(n(), 22, 3) + career_av * 0.04),
TRUE ~ round(rnorm(n(), 15, 3) + career_av * 0.02)
),
# Height and weight (BMI proxy)
height_inches = case_when(
position == "QB" ~ rnorm(n(), 74, 2),
position %in% c("WR", "DB") ~ rnorm(n(), 71, 2),
position %in% c("RB") ~ rnorm(n(), 70, 2),
position %in% c("OL", "DL") ~ rnorm(n(), 76, 2),
position %in% c("TE", "LB") ~ rnorm(n(), 74, 2),
TRUE ~ rnorm(n(), 73, 2)
),
weight_lbs = case_when(
position %in% c("OL", "DL") ~ rnorm(n(), 310, 20),
position == "TE" ~ rnorm(n(), 250, 15),
position == "LB" ~ rnorm(n(), 240, 15),
position == "QB" ~ rnorm(n(), 220, 10),
position == "RB" ~ rnorm(n(), 215, 15),
position %in% c("WR", "DB") ~ rnorm(n(), 195, 15),
TRUE ~ rnorm(n(), 220, 20)
)
)
# Calculate combine metric importance by position
combine_correlations <- draft_with_combine %>%
group_by(position) %>%
summarise(
forty_cor = cor(forty_yard, career_av, use = "complete.obs"),
vertical_cor = cor(vertical, career_av, use = "complete.obs"),
broad_cor = cor(broad_jump, career_av, use = "complete.obs"),
bench_cor = cor(bench_press, career_av, use = "complete.obs"),
.groups = "drop"
)
# Display
combine_correlations %>%
gt() %>%
cols_label(
position = "Position",
forty_cor = "40-Yard",
vertical_cor = "Vertical",
broad_cor = "Broad Jump",
bench_cor = "Bench"
) %>%
fmt_number(
columns = c(forty_cor, vertical_cor, broad_cor, bench_cor),
decimals = 3
) %>%
tab_header(
title = "Combine Metric Correlations with NFL Success",
subtitle = "By position (negative for 40-yard dash = faster is better)"
)
#| label: combine-analysis-py
#| message: false
#| warning: false
#| cache: true
# Simulate combine metrics
np.random.seed(789)
def generate_combine_metrics(df):
    """Add simulated combine metrics"""
    metrics = {
        'forty_yard': [],
        'vertical': [],
        'broad_jump': [],
        'bench_press': [],
        'height_inches': [],
        'weight_lbs': []
    }
    for _, row in df.iterrows():
        pos = row['position']
        av = row['career_av']
        # 40-yard dash (faster = lower time)
        if pos in ['WR', 'DB']:
            forty = np.random.normal(4.50, 0.10) - av * 0.002
        elif pos in ['RB', 'LB']:
            forty = np.random.normal(4.60, 0.12) - av * 0.002
        elif pos == 'QB':
            forty = np.random.normal(4.75, 0.15) - av * 0.002
        elif pos == 'TE':
            forty = np.random.normal(4.70, 0.12) - av * 0.002
        elif pos in ['OL', 'DL']:
            forty = np.random.normal(5.10, 0.20) - av * 0.002
        else:
            forty = np.random.normal(4.80, 0.20) - av * 0.002
        # Vertical jump
        if pos in ['WR', 'DB']:
            vertical = np.random.normal(36, 3) + av * 0.1
        elif pos in ['RB', 'LB']:
            vertical = np.random.normal(34, 3) + av * 0.1
        elif pos in ['OL', 'DL']:
            vertical = np.random.normal(28, 3) + av * 0.1
        else:
            vertical = np.random.normal(32, 3) + av * 0.1
        # Broad jump
        if pos in ['WR', 'DB']:
            broad = np.random.normal(120, 6) + av * 0.2
        elif pos in ['RB', 'LB']:
            broad = np.random.normal(118, 6) + av * 0.2
        elif pos in ['OL', 'DL']:
            broad = np.random.normal(105, 7) + av * 0.2
        else:
            broad = np.random.normal(115, 6) + av * 0.2
        # Bench press
        if pos in ['OL', 'DL']:
            bench = round(np.random.normal(25, 4) + av * 0.05)
        elif pos == 'LB':
            bench = round(np.random.normal(22, 3) + av * 0.04)
        else:
            bench = round(np.random.normal(15, 3) + av * 0.02)
        # Height
        if pos == 'QB':
            height = np.random.normal(74, 2)
        elif pos in ['WR', 'DB']:
            height = np.random.normal(71, 2)
        elif pos == 'RB':
            height = np.random.normal(70, 2)
        elif pos in ['OL', 'DL']:
            height = np.random.normal(76, 2)
        elif pos in ['TE', 'LB']:
            height = np.random.normal(74, 2)
        else:
            height = np.random.normal(73, 2)
        # Weight
        if pos in ['OL', 'DL']:
            weight = np.random.normal(310, 20)
        elif pos == 'TE':
            weight = np.random.normal(250, 15)
        elif pos == 'LB':
            weight = np.random.normal(240, 15)
        elif pos == 'QB':
            weight = np.random.normal(220, 10)
        elif pos == 'RB':
            weight = np.random.normal(215, 15)
        elif pos in ['WR', 'DB']:
            weight = np.random.normal(195, 15)
        else:
            weight = np.random.normal(220, 20)
        metrics['forty_yard'].append(forty)
        metrics['vertical'].append(vertical)
        metrics['broad_jump'].append(broad)
        metrics['bench_press'].append(bench)
        metrics['height_inches'].append(height)
        metrics['weight_lbs'].append(weight)
    return metrics

# Add metrics to a copy so draft_with_college itself is left unchanged
draft_with_combine = draft_with_college.copy()
combine_metrics = generate_combine_metrics(draft_with_combine)
for key, values in combine_metrics.items():
    draft_with_combine[key] = values
# Calculate correlations by position
combine_correlations = (draft_with_combine
.groupby('position')
.apply(lambda x: pd.Series({
'forty_cor': x['forty_yard'].corr(x['career_av']),
'vertical_cor': x['vertical'].corr(x['career_av']),
'broad_cor': x['broad_jump'].corr(x['career_av']),
'bench_cor': x['bench_press'].corr(x['career_av'])
}))
.reset_index()
)
print("\nCombine Metric Correlations with NFL Success:\n")
print(combine_correlations.to_string(index=False))
print("\nNote: Negative for 40-yard dash = faster is better")
Combine Limitations
While combine metrics are measurable and objective, their correlation with NFL success is position-dependent and generally modest. Athletic testing should complement, not replace, film study and college production analysis. The combine is best used to identify outliers (positive or negative) rather than as a primary evaluation tool.
Draft Trade Value and Optimization
How should teams value draft pick trades?
Trade Value Model
#| label: trade-value-model-r
#| message: false
#| warning: false
# Create surplus value model (AV-based value minus salary cost)
# In practice, would use actual rookie salary scale
# Create modern value chart based on expected AV
modern_value_chart <- pick_values %>%
mutate(
# Expected value based on historical AV
expected_value = avg_av,
# Normalize to 1000 points for pick 1
value_points = (expected_value / max(expected_value)) * 1000,
# Calculate surplus value (simplified)
# Stylized rookie cost (in $ millions) that falls with pick number inside each tier;
# the tiers are not continuous, so cost steps back up at picks 11 and 33
rookie_cost = case_when(
pick <= 10 ~ 30 - pick * 2,
pick <= 32 ~ 20 - (pick - 10) * 0.8,
pick <= 100 ~ 10 - (pick - 32) * 0.05,
TRUE ~ 5 - (pick - 100) * 0.01
),
# Surplus = Value - Cost (simplified)
surplus_value = expected_value - (rookie_cost / 2)
)
# Compare value charts
value_chart_comparison <- draft_values %>%
left_join(modern_value_chart %>% select(pick, value_points, surplus_value),
by = "pick") %>%
rename(
jj_value = jimmy_johnson,
modern_value = value_points
)
# Display first round comparison
value_chart_comparison %>%
filter(pick <= 32) %>%
select(pick, round, jj_value, modern_value, surplus_value) %>%
gt() %>%
cols_label(
pick = "Pick",
round = "Rnd",
jj_value = "JJ Chart",
modern_value = "Modern",
surplus_value = "Surplus"
) %>%
fmt_number(
columns = c(jj_value, modern_value, surplus_value),
decimals = 0
) %>%
tab_header(
title = "Draft Pick Value Chart Comparison",
subtitle = "First Round (Picks 1-32)"
)
#| label: trade-value-model-py
#| message: false
#| warning: false
# Create modern value chart
modern_value_chart = pick_values.copy()
# Expected value from historical AV
modern_value_chart['expected_value'] = modern_value_chart['avg_av']
# Normalize to 1000 for pick 1
max_value = modern_value_chart['expected_value'].max()
modern_value_chart['value_points'] = (
modern_value_chart['expected_value'] / max_value
) * 1000
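# Simplified rookie cost model (in $ millions): cost falls with pick number
# inside each tier, but the tiers are not continuous (cost steps back up at
# picks 11 and 33)
def calculate_rookie_cost(pick):
    if pick <= 10:
        return 30 - pick * 2
    elif pick <= 32:
        return 20 - (pick - 10) * 0.8
    elif pick <= 100:
        return 10 - (pick - 32) * 0.05
    else:
        return 5 - (pick - 100) * 0.01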
modern_value_chart['rookie_cost'] = modern_value_chart['pick'].apply(
calculate_rookie_cost
)
# Surplus value
modern_value_chart['surplus_value'] = (
modern_value_chart['expected_value'] -
modern_value_chart['rookie_cost'] / 2
)
# Combine charts
value_chart_comparison = draft_values.merge(
modern_value_chart[['pick', 'value_points', 'surplus_value']],
on='pick'
)
value_chart_comparison.rename(columns={
'jimmy_johnson': 'jj_value',
'value_points': 'modern_value'
}, inplace=True)
# Display first round
print("\nDraft Pick Value Chart Comparison - First Round:\n")
first_rd = value_chart_comparison[value_chart_comparison['pick'] <= 32][
['pick', 'round', 'jj_value', 'modern_value', 'surplus_value']
].copy()
print(first_rd.to_string(index=False))
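The chart above normalizes by the maximum expected value, which only equals pick 1 when the averages are already smooth. With raw per-pick averages, noise can make a later pick look more valuable than an earlier one. The following is a minimal sketch of one way to smooth the curve first, assuming a logarithmic decline in value; the data are simulated stand-ins for `pick_values['avg_av']`, and the functional form is an illustrative assumption rather than the chapter's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noisy average AV by pick (stand-in for pick_values['avg_av'])
picks = np.arange(1, 257)
avg_av = 60 - 9 * np.log(picks) + rng.normal(0, 4, size=picks.size)

# Fit avg_av ~ intercept + slope * ln(pick); polyfit returns [slope, intercept]
slope, intercept = np.polyfit(np.log(picks), avg_av, 1)

# Smoothed expected value, floored at a small positive number
smoothed = np.clip(intercept + slope * np.log(picks), 0.1, None)

# Normalize so that pick 1 is worth exactly 1000 points
value_points = smoothed / smoothed[0] * 1000

# Show the smoothed value of picks 1, 10, 32 and 100
print(np.round(value_points[[0, 9, 31, 99]], 0))
```

Because the fitted curve is monotone, every later pick is worth no more than an earlier one, which keeps trade comparisons internally consistent.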
Trade Analysis Function
#| label: trade-analysis-r
#| message: false
#| warning: false
# Function to analyze draft trades
analyze_trade <- function(team_a_picks, team_b_picks, value_system = "modern") {
# Get value chart
if (value_system == "modern") {
values <- value_chart_comparison %>%
select(pick, value = modern_value)
} else {
values <- value_chart_comparison %>%
select(pick, value = jj_value)
}
# Calculate team values
team_a_value <- values %>%
filter(pick %in% team_a_picks) %>%
summarise(total = sum(value, na.rm = TRUE)) %>%
pull(total)
team_b_value <- values %>%
filter(pick %in% team_b_picks) %>%
summarise(total = sum(value, na.rm = TRUE)) %>%
pull(total)
# Return analysis
list(
team_a_value = team_a_value,
team_b_value = team_b_value,
difference = team_a_value - team_b_value,
winner = if_else(team_a_value > team_b_value, "Team A", "Team B"),
fair = abs(team_a_value - team_b_value) < 50
)
}
# Example trade: Pick 5 for picks 15 and 45
trade_example <- analyze_trade(
team_a_picks = c(5),
team_b_picks = c(15, 45),
value_system = "modern"
)
cat("\nTrade Analysis:\n")
cat("Team A receives: Pick 5\n")
cat("Team B receives: Picks 15, 45\n\n")
cat(sprintf("Team A Value: %.0f\n", trade_example$team_a_value))
cat(sprintf("Team B Value: %.0f\n", trade_example$team_b_value))
cat(sprintf("Difference: %.0f\n", trade_example$difference))
cat(sprintf("Winner: %s\n", trade_example$winner))
cat(sprintf("Fair trade: %s\n", ifelse(trade_example$fair, "Yes", "No")))
#| label: trade-analysis-py
#| message: false
#| warning: false
def analyze_trade(team_a_picks, team_b_picks, value_system='modern'):
    """Analyze a draft pick trade"""
    # Get value chart
    if value_system == 'modern':
        values = value_chart_comparison[['pick', 'modern_value']].copy()
        values.columns = ['pick', 'value']
    else:
        values = value_chart_comparison[['pick', 'jj_value']].copy()
        values.columns = ['pick', 'value']
    # Calculate team values
    team_a_value = values[values['pick'].isin(team_a_picks)]['value'].sum()
    team_b_value = values[values['pick'].isin(team_b_picks)]['value'].sum()
    difference = team_a_value - team_b_value
    winner = "Team A" if difference > 0 else "Team B"
    fair = abs(difference) < 50
    return {
        'team_a_value': team_a_value,
        'team_b_value': team_b_value,
        'difference': difference,
        'winner': winner,
        'fair': fair
    }
# Example trade
trade_example = analyze_trade(
team_a_picks=[5],
team_b_picks=[15, 45],
value_system='modern'
)
print("\nTrade Analysis:")
print("Team A receives: Pick 5")
print("Team B receives: Picks 15, 45\n")
print(f"Team A Value: {trade_example['team_a_value']:.0f}")
print(f"Team B Value: {trade_example['team_b_value']:.0f}")
print(f"Difference: {trade_example['difference']:.0f}")
print(f"Winner: {trade_example['winner']}")
print(f"Fair trade: {'Yes' if trade_example['fair'] else 'No'}")
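The verdict on a trade depends heavily on how steeply the chosen chart discounts later picks. The sketch below illustrates this with two hypothetical exponential curves; `steep_chart` and `flat_chart` are invented for illustration and are not the Jimmy Johnson chart or the AV-based chart built above.

```python
import math

def steep_chart(pick):
    """Hypothetical chart that discounts later picks quickly."""
    return 3000 * math.exp(-0.05 * (pick - 1))

def flat_chart(pick):
    """Hypothetical chart that discounts later picks slowly."""
    return 1000 * math.exp(-0.015 * (pick - 1))

def trade_ratio(picks_received, picks_given, chart):
    """Value received divided by value given up under a chart function."""
    received = sum(chart(p) for p in picks_received)
    given = sum(chart(p) for p in picks_given)
    return received / given

# The team moving up receives pick 5 and gives up picks 15 and 45
ratio_steep = trade_ratio([5], [15, 45], steep_chart)
ratio_flat = trade_ratio([5], [15, 45], flat_chart)

print(f"Steep chart ratio: {ratio_steep:.2f}")
print(f"Flat chart ratio:  {ratio_flat:.2f}")
```

A ratio above 1 favors the team moving up and a ratio below 1 favors the team moving down, so the same trade can look lopsided in opposite directions depending on the curve.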
Positional Value and Scarcity
Not all positions are equally valuable or scarce in the draft.
#| label: positional-value-r
#| message: false
#| warning: false
# Calculate positional value metrics
positional_value <- draft_historical %>%
group_by(position, round) %>%
summarise(
n_drafted = n(),
avg_av = mean(career_av),
hit_rate = mean(hit),
.groups = "drop"
)
# Overall positional value (across all rounds)
position_overall <- draft_historical %>%
group_by(position) %>%
summarise(
n_drafted = n(),
avg_av = mean(career_av),
hit_rate = mean(hit),
starter_rate = mean(starter),
top_picks = sum(pick <= 32),
avg_pick = mean(pick),
.groups = "drop"
) %>%
arrange(desc(avg_av))
# Display
position_overall %>%
gt() %>%
cols_label(
position = "Position",
n_drafted = "N Drafted",
avg_av = "Avg AV",
hit_rate = "Hit Rate",
starter_rate = "Starter Rate",
top_picks = "1st Rd Picks",
avg_pick = "Avg Pick"
) %>%
fmt_number(
columns = c(avg_av, avg_pick),
decimals = 1
) %>%
fmt_percent(
columns = c(hit_rate, starter_rate),
decimals = 1
) %>%
tab_header(
title = "Positional Value in the NFL Draft",
subtitle = "Aggregated across all rounds (2010-2019)"
)
#| label: positional-value-py
#| message: false
#| warning: false
# Calculate positional value
position_overall = (draft_historical
.groupby('position')
.agg({
'year': 'count',
'career_av': 'mean',
'hit': 'mean',
'starter': 'mean',
'pick': ['mean', lambda x: (x <= 32).sum()]
})
.reset_index()
)
position_overall.columns = ['position', 'n_drafted', 'avg_av', 'hit_rate',
'starter_rate', 'avg_pick', 'top_picks']
position_overall = position_overall[['position', 'n_drafted', 'avg_av',
'hit_rate', 'starter_rate', 'top_picks',
'avg_pick']].sort_values('avg_av', ascending=False)
print("\nPositional Value in the NFL Draft:\n")
display_df = position_overall.copy()
display_df['avg_av'] = display_df['avg_av'].round(1)
display_df['avg_pick'] = display_df['avg_pick'].round(1)
display_df['hit_rate'] = (display_df['hit_rate'] * 100).round(1)
display_df['starter_rate'] = (display_df['starter_rate'] * 100).round(1)
print(display_df.to_string(index=False))
Positional Scarcity by Round
#| label: fig-position-scarcity-r
#| fig-cap: "Positional hit rates decline by round"
#| fig-width: 12
#| fig-height: 7
#| message: false
#| warning: false
# Calculate by round
position_by_round <- draft_historical %>%
filter(round <= 5) %>%
group_by(position, round) %>%
summarise(
hit_rate = mean(hit),
.groups = "drop"
)
ggplot(position_by_round, aes(x = round, y = hit_rate, color = position)) +
geom_line(linewidth = 1.2) +
geom_point(size = 2.5) +
scale_y_continuous(labels = percent_format()) +
scale_x_continuous(breaks = 1:5) +
scale_color_brewer(palette = "Set2") +
labs(
title = "Draft Hit Rates by Position and Round",
subtitle = "Rounds 1-5 | Hit = Career AV > 40",
x = "Round",
y = "Hit Rate",
color = "Position"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14),
legend.position = "right"
)
#| label: fig-position-scarcity-py
#| fig-cap: "Positional hit rates decline by round - Python"
#| fig-width: 12
#| fig-height: 7
#| message: false
#| warning: false
# Calculate by round
position_by_round = (draft_historical[draft_historical['round'] <= 5]
.groupby(['position', 'round'])
.agg({'hit': 'mean'})
.reset_index()
.rename(columns={'hit': 'hit_rate'})
)
# Create plot
fig, ax = plt.subplots(figsize=(12, 7))
positions = position_by_round['position'].unique()
colors = plt.cm.Set2(np.linspace(0, 1, len(positions)))
for i, position in enumerate(positions):
    pos_data = position_by_round[position_by_round['position'] == position]
    ax.plot(pos_data['round'], pos_data['hit_rate'],
            marker='o', linewidth=2, markersize=7,
            label=position, color=colors[i])
ax.set_xlabel('Round', fontsize=12)
ax.set_ylabel('Hit Rate', fontsize=12)
ax.set_title('Draft Hit Rates by Position and Round\nRounds 1-5 | Hit = Career AV > 40',
fontsize=14, fontweight='bold', pad=20)
ax.set_xticks(range(1, 6))
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))
ax.legend(title='Position', bbox_to_anchor=(1.05, 1), loc='upper left')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
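Hit rates for one position in one round often rest on a few dozen picks, so the lines in a plot like this can move around for reasons unrelated to true positional differences. As a hedged sketch, a Wilson score interval gives a rough sense of that uncertainty; the helper name `wilson_interval` and the example counts are hypothetical.

```python
import math

def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = hits / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Same observed 30% hit rate, different sample sizes (hypothetical counts)
small_low, small_high = wilson_interval(6, 20)
large_low, large_high = wilson_interval(60, 200)

print(f"6 of 20:   {small_low:.3f} to {small_high:.3f}")
print(f"60 of 200: {large_low:.3f} to {large_high:.3f}")
```

The interval for the smaller cell is several times wider, which is a reason to be cautious about ranking positions within a single round.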
Team Draft Performance Evaluation
How do we evaluate a team's draft performance?
#| label: team-draft-performance-r
#| message: false
#| warning: false
#| cache: true
# Simulate team assignments
set.seed(999)
teams <- c("ARI", "ATL", "BAL", "BUF", "CAR", "CHI", "CIN", "CLE",
"DAL", "DEN", "DET", "GB", "HOU", "IND", "JAX", "KC",
"LAC", "LAR", "LV", "MIA", "MIN", "NE", "NO", "NYG",
"NYJ", "PHI", "PIT", "SEA", "SF", "TB", "TEN", "WAS")
draft_with_teams <- draft_historical %>%
mutate(
team = sample(teams, n(), replace = TRUE)
)
# Calculate team draft grades
team_performance <- draft_with_teams %>%
group_by(team) %>%
summarise(
n_picks = n(),
total_av = sum(career_av),
avg_av = mean(career_av),
hit_rate = mean(hit),
bust_rate = mean(career_av < 10),
starters = sum(starter),
# Expected AV based on draft position
expected_av = sum(pick_values$avg_av[match(pick, pick_values$pick)]),
# Actual vs expected
av_over_expected = total_av - expected_av,
.groups = "drop"
) %>%
arrange(desc(av_over_expected))
# Display top 10 teams
team_performance %>%
head(10) %>%
mutate(rank = row_number()) %>%
select(rank, team, n_picks, avg_av, hit_rate, starters, av_over_expected) %>%
gt() %>%
cols_label(
rank = "Rank",
team = "Team",
n_picks = "Picks",
avg_av = "Avg AV",
hit_rate = "Hit Rate",
starters = "Starters",
av_over_expected = "AV vs Exp"
) %>%
fmt_number(
columns = avg_av,
decimals = 1
) %>%
fmt_percent(
columns = hit_rate,
decimals = 1
) %>%
fmt_number(
columns = av_over_expected,
decimals = 0
) %>%
tab_header(
title = "Best Drafting Teams (2010-2019)",
subtitle = "Ranked by AV above/below expected based on draft position"
)
#| label: team-draft-performance-py
#| message: false
#| warning: false
#| cache: true
# Simulate team assignments
np.random.seed(999)
teams = ["ARI", "ATL", "BAL", "BUF", "CAR", "CHI", "CIN", "CLE",
"DAL", "DEN", "DET", "GB", "HOU", "IND", "JAX", "KC",
"LAC", "LAR", "LV", "MIA", "MIN", "NE", "NO", "NYG",
"NYJ", "PHI", "PIT", "SEA", "SF", "TB", "TEN", "WAS"]
draft_with_teams = draft_historical.copy()
draft_with_teams['team'] = np.random.choice(teams, len(draft_with_teams))
# Add expected AV
draft_with_teams = draft_with_teams.merge(
pick_values[['pick', 'avg_av']].rename(columns={'avg_av': 'expected_av'}),
on='pick'
)
# Calculate team performance
team_performance = (draft_with_teams
.groupby('team')
.agg({
'year': 'count',
'career_av': ['sum', 'mean'],
'hit': 'mean',
'starter': 'sum',
'expected_av': 'sum'
})
.reset_index()
)
team_performance.columns = ['team', 'n_picks', 'total_av', 'avg_av',
'hit_rate', 'starters', 'expected_av']
# Calculate AV over expected
team_performance['av_over_expected'] = (
team_performance['total_av'] - team_performance['expected_av']
)
# Add bust rate
bust_rate = (draft_with_teams
.groupby('team')
.apply(lambda x: (x['career_av'] < 10).mean())
.reset_index(name='bust_rate')
)
team_performance = team_performance.merge(bust_rate, on='team')
# Sort and display top 10
team_performance = team_performance.sort_values('av_over_expected', ascending=False)
print("\nBest Drafting Teams (2010-2019):\n")
top_10 = team_performance.head(10).copy()
top_10['rank'] = range(1, 11)
top_10['avg_av'] = top_10['avg_av'].round(1)
top_10['hit_rate'] = (top_10['hit_rate'] * 100).round(1)
top_10['av_over_expected'] = top_10['av_over_expected'].round(0)
print(top_10[['rank', 'team', 'n_picks', 'avg_av', 'hit_rate',
'starters', 'av_over_expected']].to_string(index=False))
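Because team labels in this example are assigned at random, any spread in AV over expected is noise by construction. A useful check with real data is to ask how large the best team's total would be if drafting skill did not exist. The sketch below simulates that null distribution; the residuals, their standard deviation, and the pick counts are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(999)
n_picks, n_teams, n_sims = 2560, 32, 500

# Hypothetical per-pick residuals: career AV minus expected AV for the slot
residual = rng.normal(0, 15, size=n_picks)

# For each simulation, shuffle picks across teams and record the best total
null_best = np.empty(n_sims)
for s in range(n_sims):
    team = rng.integers(0, n_teams, size=n_picks)
    totals = np.bincount(team, weights=residual, minlength=n_teams)
    null_best[s] = totals.max()

# A team would need to beat this total to stand out from pure luck
threshold = np.quantile(null_best, 0.95)
print(f"Median best-team total under the null: {np.median(null_best):.0f}")
print(f"95th percentile under the null:        {threshold:.0f}")
```

If the top team in a real ranking does not clear a threshold like this, the ranking is hard to distinguish from chance.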
Classifying Hits vs Misses
Defining clear criteria for draft success:
#| label: hit-miss-classification-r
#| message: false
#| warning: false
# Create classification system
draft_classified <- draft_historical %>%
mutate(
# Classification based on AV and round
classification = case_when(
# Round 1
round == 1 & career_av >= 60 ~ "Elite",
round == 1 & career_av >= 40 ~ "Hit",
round == 1 & career_av >= 25 ~ "Starter",
round == 1 & career_av >= 10 ~ "Contributor",
round == 1 ~ "Bust",
# Round 2
round == 2 & career_av >= 50 ~ "Elite",
round == 2 & career_av >= 35 ~ "Hit",
round == 2 & career_av >= 20 ~ "Starter",
round == 2 & career_av >= 10 ~ "Contributor",
round == 2 ~ "Bust",
# Rounds 3-4
round %in% 3:4 & career_av >= 40 ~ "Elite",
round %in% 3:4 & career_av >= 25 ~ "Hit",
round %in% 3:4 & career_av >= 15 ~ "Starter",
round %in% 3:4 & career_av >= 5 ~ "Contributor",
round %in% 3:4 ~ "Bust",
# Rounds 5+
round >= 5 & career_av >= 30 ~ "Elite",
round >= 5 & career_av >= 20 ~ "Hit",
round >= 5 & career_av >= 10 ~ "Starter",
round >= 5 & career_av >= 3 ~ "Contributor",
TRUE ~ "Bust"
),
classification = factor(
classification,
levels = c("Elite", "Hit", "Starter", "Contributor", "Bust")
)
)
# Distribution by round
classification_dist <- draft_classified %>%
group_by(round, classification) %>%
summarise(n = n(), .groups = "drop") %>%
group_by(round) %>%
mutate(pct = n / sum(n))
# Display for rounds 1-3
classification_dist %>%
filter(round <= 3) %>%
select(-n) %>%
pivot_wider(names_from = classification, values_from = pct, values_fill = 0) %>%
gt() %>%
cols_label(round = "Round") %>%
fmt_percent(
columns = c(Elite, Hit, Starter, Contributor, Bust),
decimals = 1
) %>%
tab_header(
title = "Draft Pick Classification Distribution",
subtitle = "Rounds 1-3"
)
#| label: hit-miss-classification-py
#| message: false
#| warning: false
def classify_pick(row):
    """Classify draft pick based on AV and round"""
    av = row['career_av']
    rd = row['round']
    if rd == 1:
        if av >= 60: return 'Elite'
        elif av >= 40: return 'Hit'
        elif av >= 25: return 'Starter'
        elif av >= 10: return 'Contributor'
        else: return 'Bust'
    elif rd == 2:
        if av >= 50: return 'Elite'
        elif av >= 35: return 'Hit'
        elif av >= 20: return 'Starter'
        elif av >= 10: return 'Contributor'
        else: return 'Bust'
    elif rd in [3, 4]:
        if av >= 40: return 'Elite'
        elif av >= 25: return 'Hit'
        elif av >= 15: return 'Starter'
        elif av >= 5: return 'Contributor'
        else: return 'Bust'
    else:  # Round 5+
        if av >= 30: return 'Elite'
        elif av >= 20: return 'Hit'
        elif av >= 10: return 'Starter'
        elif av >= 3: return 'Contributor'
        else: return 'Bust'
# Apply classification
draft_classified = draft_historical.copy()
draft_classified['classification'] = draft_classified.apply(classify_pick, axis=1)
# Calculate distribution
classification_dist = (draft_classified
.groupby(['round', 'classification'])
.size()
.reset_index(name='n')
)
classification_dist['pct'] = (classification_dist
.groupby('round')['n']
.transform(lambda x: x / x.sum())
)
# Pivot for display
classification_pivot = classification_dist[classification_dist['round'] <= 3].pivot(
index='round',
columns='classification',
values='pct'
).fillna(0)
# Reorder columns
col_order = ['Elite', 'Hit', 'Starter', 'Contributor', 'Bust']
classification_pivot = classification_pivot[[c for c in col_order if c in classification_pivot.columns]]
print("\nDraft Pick Classification Distribution (Rounds 1-3):\n")
print((classification_pivot * 100).round(1).to_string())
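The row-wise `apply` above is easy to read but slow on large tables. As an alternative sketch, the same round-adjusted thresholds can be stored in a lookup table and applied with NumPy comparisons. The names `THRESHOLDS`, `LATE_ROUND`, and `classify_picks` are hypothetical, and the demo rows are made up.

```python
import numpy as np
import pandas as pd

# (Elite, Hit, Starter, Contributor) cutoffs, copied from the rules above
THRESHOLDS = {
    1: (60, 40, 25, 10),
    2: (50, 35, 20, 10),
    3: (40, 25, 15, 5),
    4: (40, 25, 15, 5),
}
LATE_ROUND = (30, 20, 10, 3)  # Rounds 5 and later

def classify_picks(df):
    """Label each pick by counting how many cutoffs its career AV meets."""
    cuts = np.array([THRESHOLDS.get(r, LATE_ROUND) for r in df['round']])
    av = df['career_av'].to_numpy()[:, None]
    # Cutoffs are descending, so 4 met -> Elite, ..., 0 met -> Bust
    n_met = (av >= cuts).sum(axis=1)
    labels = np.array(['Bust', 'Contributor', 'Starter', 'Hit', 'Elite'])
    return pd.Series(labels[n_met], index=df.index, name='classification')

demo = pd.DataFrame({
    'round': [1, 1, 2, 3, 6, 7],
    'career_av': [65, 9, 36, 15, 3, 2],
})
demo['classification'] = classify_picks(demo)
print(demo.to_string(index=False))
```

Keeping the cutoffs in one table also makes it easier to test alternative thresholds without editing nested conditionals.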
Visualization of Classifications
#| label: fig-classification-viz-r
#| fig-cap: "Draft pick classification by round"
#| fig-width: 12
#| fig-height: 7
#| message: false
#| warning: false
# Stacked bar chart
classification_dist %>%
filter(round <= 5) %>%
ggplot(aes(x = factor(round), y = pct, fill = classification)) +
geom_col(position = "stack") +
scale_fill_manual(
values = c(
"Elite" = "#2E7D32",
"Hit" = "#66BB6A",
"Starter" = "#FDD835",
"Contributor" = "#FFB74D",
"Bust" = "#E53935"
)
) +
scale_y_continuous(labels = percent_format()) +
labs(
title = "Draft Pick Outcomes by Round",
subtitle = "Distribution of player classifications (Rounds 1-5)",
x = "Round",
y = "Percentage",
fill = "Classification"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14),
legend.position = "right"
)
#| label: fig-classification-viz-py
#| fig-cap: "Draft pick classification by round - Python"
#| fig-width: 12
#| fig-height: 7
#| message: false
#| warning: false
# Filter to rounds 1-5
plot_data = classification_dist[classification_dist['round'] <= 5].copy()
# Pivot for stacking
plot_pivot = plot_data.pivot(index='round', columns='classification', values='pct').fillna(0)
# Reorder columns
col_order = ['Elite', 'Hit', 'Starter', 'Contributor', 'Bust']
plot_pivot = plot_pivot[[c for c in col_order if c in plot_pivot.columns]]
# Create stacked bar chart
fig, ax = plt.subplots(figsize=(12, 7))
colors = {
'Elite': '#2E7D32',
'Hit': '#66BB6A',
'Starter': '#FDD835',
'Contributor': '#FFB74D',
'Bust': '#E53935'
}
bottom = np.zeros(len(plot_pivot))
for classification in col_order:
    if classification in plot_pivot.columns:
        ax.bar(plot_pivot.index, plot_pivot[classification],
               bottom=bottom, label=classification,
               color=colors[classification], alpha=0.9)
        bottom += plot_pivot[classification].values
ax.set_xlabel('Round', fontsize=12)
ax.set_ylabel('Percentage', fontsize=12)
ax.set_title('Draft Pick Outcomes by Round\nDistribution of player classifications (Rounds 1-5)',
fontsize=14, fontweight='bold', pad=20)
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))
ax.legend(title='Classification', loc='upper right')
ax.set_xticks(range(1, 6))
plt.tight_layout()
plt.show()
Machine Learning for Draft Projection
Modern draft analytics increasingly uses machine learning to project player success.
#| label: ml-draft-model-r
#| message: false
#| warning: false
#| cache: true
library(randomForest)
# Prepare training data
ml_data <- draft_with_combine %>%
select(
career_av,
pick,
round,
position,
college_production,
athletic_score,
forty_yard,
vertical,
broad_jump,
bench_press,
height_inches,
weight_lbs
) %>%
na.omit()
# Split train/test
set.seed(2024)
train_idx <- sample(1:nrow(ml_data), 0.8 * nrow(ml_data))
train_data <- ml_data[train_idx, ]
test_data <- ml_data[-train_idx, ]
# Train random forest model
rf_model <- randomForest(
career_av ~ .,
data = train_data,
ntree = 500,
mtry = 4,
importance = TRUE
)
# Make predictions
train_pred <- predict(rf_model, train_data)
test_pred <- predict(rf_model, test_data)
# Calculate metrics
train_rmse <- sqrt(mean((train_data$career_av - train_pred)^2))
test_rmse <- sqrt(mean((test_data$career_av - test_pred)^2))
train_r2 <- cor(train_data$career_av, train_pred)^2
test_r2 <- cor(test_data$career_av, test_pred)^2
# Display results
cat("\nRandom Forest Model Performance:\n")
cat(sprintf("Training RMSE: %.2f\n", train_rmse))
cat(sprintf("Testing RMSE: %.2f\n", test_rmse))
cat(sprintf("Training R²: %.3f\n", train_r2))
cat(sprintf("Testing R²: %.3f\n", test_r2))
# Feature importance
importance_df <- as.data.frame(importance(rf_model)) %>%
rownames_to_column("feature") %>%
as_tibble() %>%
arrange(desc(`%IncMSE`))
# Display top features
importance_df %>%
head(10) %>%
select(feature, `%IncMSE`, IncNodePurity) %>%
gt() %>%
cols_label(
feature = "Feature",
`%IncMSE` = "% Inc MSE",
IncNodePurity = "Node Purity"
) %>%
fmt_number(
columns = c(`%IncMSE`, IncNodePurity),
decimals = 2
) %>%
tab_header(
title = "Random Forest Feature Importance",
subtitle = "Top 10 predictors of NFL career value"
)
#| label: ml-draft-model-py
#| message: false
#| warning: false
#| cache: true
# Prepare data
ml_data = draft_with_combine[[
'career_av', 'pick', 'round', 'position', 'college_production',
'athletic_score', 'forty_yard', 'vertical', 'broad_jump',
'bench_press', 'height_inches', 'weight_lbs'
]].dropna()
# Encode position
ml_data_encoded = pd.get_dummies(ml_data, columns=['position'], drop_first=True)
# Split features and target
X = ml_data_encoded.drop('career_av', axis=1)
y = ml_data_encoded['career_av']
# Train/test split
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=2024
)
# Train Random Forest
rf_model = RandomForestRegressor(
n_estimators=500,
max_features='sqrt',
random_state=2024,
n_jobs=-1
)
rf_model.fit(X_train, y_train)
# Predictions
train_pred = rf_model.predict(X_train)
test_pred = rf_model.predict(X_test)
# Metrics
train_rmse = np.sqrt(mean_squared_error(y_train, train_pred))
test_rmse = np.sqrt(mean_squared_error(y_test, test_pred))
train_r2 = r2_score(y_train, train_pred)
test_r2 = r2_score(y_test, test_pred)
print("\nRandom Forest Model Performance:")
print(f"Training RMSE: {train_rmse:.2f}")
print(f"Testing RMSE: {test_rmse:.2f}")
print(f"Training R²: {train_r2:.3f}")
print(f"Testing R²: {test_r2:.3f}")
# Feature importance
importance_df = pd.DataFrame({
'feature': X.columns,
'importance': rf_model.feature_importances_
}).sort_values('importance', ascending=False)
print("\nTop 10 Feature Importances:")
print(importance_df.head(10).to_string(index=False))
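Impurity-based importances such as `feature_importances_` are computed on the training data and can favor features with many distinct values. Permutation importance on held-out data is a common cross-check. The sketch below uses scikit-learn's `permutation_importance` on simulated data with one informative feature and one pure-noise feature; the variable names and data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2024)
n = 600

# Simulated features: 'signal' drives the target, 'noise' does not
signal = rng.normal(size=n)
noise = rng.normal(size=n)
X_demo = np.column_stack([signal, noise])
y_demo = 3 * signal + rng.normal(scale=1.0, size=n)

Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)
demo_model = RandomForestRegressor(n_estimators=100, random_state=0)
demo_model.fit(Xd_train, yd_train)

# Drop in test-set R² when each feature is shuffled, averaged over repeats
result = permutation_importance(
    demo_model, Xd_test, yd_test, n_repeats=10, random_state=0
)
for name, mean, std in zip(['signal', 'noise'],
                           result.importances_mean,
                           result.importances_std):
    print(f"{name:>6}: {mean:.3f} +/- {std:.3f}")
```

Applied to the draft model, a feature whose permutation importance sits near zero on the test set is contributing little beyond what the other features already capture.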
Model Comparison
Let's compare multiple modeling approaches:
#| label: model-comparison-r
#| message: false
#| warning: false
#| cache: true
library(glmnet)
# Prepare matrix for glmnet
x_train <- model.matrix(career_av ~ . - 1, data = train_data)
y_train <- train_data$career_av
x_test <- model.matrix(career_av ~ . - 1, data = test_data)
y_test <- test_data$career_av
# Ridge regression
ridge_model <- cv.glmnet(x_train, y_train, alpha = 0)
ridge_pred <- predict(ridge_model, x_test, s = "lambda.min")
ridge_rmse <- sqrt(mean((y_test - ridge_pred)^2))
ridge_r2 <- cor(y_test, ridge_pred)^2
# Lasso regression
lasso_model <- cv.glmnet(x_train, y_train, alpha = 1)
lasso_pred <- predict(lasso_model, x_test, s = "lambda.min")
lasso_rmse <- sqrt(mean((y_test - lasso_pred)^2))
lasso_r2 <- cor(y_test, lasso_pred)^2
# Linear regression baseline
lm_model <- lm(career_av ~ ., data = train_data)
lm_pred <- predict(lm_model, test_data)
lm_rmse <- sqrt(mean((y_test - lm_pred)^2))
# Use test-set R² so the baseline is comparable with the other models
lm_r2 <- cor(y_test, lm_pred)^2
# Compare models
model_comparison <- tibble(
Model = c("Linear Regression", "Ridge", "Lasso", "Random Forest"),
RMSE = c(lm_rmse, ridge_rmse, lasso_rmse, test_rmse),
R_squared = c(lm_r2, ridge_r2, lasso_r2, test_r2)
) %>%
arrange(RMSE)
model_comparison %>%
gt() %>%
cols_label(
Model = "Model",
RMSE = "RMSE",
R_squared = "R²"
) %>%
fmt_number(
columns = c(RMSE, R_squared),
decimals = 3
) %>%
tab_header(
title = "Draft Projection Model Comparison",
subtitle = "Test set performance"
)
#| label: model-comparison-py
#| message: false
#| warning: false
#| cache: true
# Ridge Regression
ridge_model = Ridge(alpha=1.0)
ridge_model.fit(X_train, y_train)
ridge_pred = ridge_model.predict(X_test)
ridge_rmse = np.sqrt(mean_squared_error(y_test, ridge_pred))
ridge_r2 = r2_score(y_test, ridge_pred)
# Gradient Boosting
gb_model = GradientBoostingRegressor(
n_estimators=200,
max_depth=4,
random_state=2024
)
gb_model.fit(X_train, y_train)
gb_pred = gb_model.predict(X_test)
gb_rmse = np.sqrt(mean_squared_error(y_test, gb_pred))
gb_r2 = r2_score(y_test, gb_pred)
# Compare models
model_comparison = pd.DataFrame({
'Model': ['Ridge', 'Random Forest', 'Gradient Boosting'],
'RMSE': [ridge_rmse, test_rmse, gb_rmse],
'R_squared': [ridge_r2, test_r2, gb_r2]
}).sort_values('RMSE')
print("\nDraft Projection Model Comparison (Test Set):\n")
print(model_comparison.to_string(index=False))
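A single train/test split can rank models differently from one random seed to the next. As a hedged sketch, k-fold cross-validation gives a more stable comparison; the example below uses simulated data and scikit-learn's `cross_val_score`, and it scales features before the ridge fit because the ridge penalty depends on feature scale. The data and the `X_cv`/`y_cv` names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2024)

# Simulated stand-in for the draft feature matrix and career AV target
X_cv = rng.normal(size=(400, 5))
y_cv = 20 + 4 * X_cv[:, 0] - 2 * X_cv[:, 1] + rng.normal(scale=5, size=400)

cv = KFold(n_splits=5, shuffle=True, random_state=2024)
models = {
    'Ridge': make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    'Random Forest': RandomForestRegressor(n_estimators=100, random_state=2024),
}

cv_rmse = {}
for name, model in models.items():
    # The scorer returns negative RMSE, so flip the sign
    scores = cross_val_score(
        model, X_cv, y_cv, cv=cv, scoring='neg_root_mean_squared_error'
    )
    cv_rmse[name] = -scores
    print(f"{name}: RMSE = {cv_rmse[name].mean():.2f} "
          f"(fold sd {cv_rmse[name].std():.2f})")
```

Reporting the fold-to-fold spread alongside the mean makes it clearer whether a gap between two models is larger than the sampling noise.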
Prediction Intervals
#| label: fig-prediction-intervals-r
#| fig-cap: "Model predictions vs actual career AV"
#| fig-width: 12
#| fig-height: 7
#| message: false
#| warning: false
# Create prediction dataframe
pred_df <- test_data %>%
mutate(
predicted = test_pred,
residual = career_av - predicted
)
# Plot predictions vs actual
ggplot(pred_df, aes(x = predicted, y = career_av)) +
geom_point(alpha = 0.4, color = "#1f77b4") +
geom_abline(slope = 1, intercept = 0, linetype = "dashed", color = "red") +
geom_smooth(method = "lm", se = TRUE, color = "black", linewidth = 0.8) +
labs(
title = "Random Forest Predictions vs Actual Career AV",
subtitle = sprintf("Test Set | RMSE = %.2f | R² = %.3f", test_rmse, test_r2),
x = "Predicted Career AV",
y = "Actual Career AV",
caption = "Red dashed line = perfect predictions | Black line = actual fit"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14)
)
#| label: fig-prediction-intervals-py
#| fig-cap: "Model predictions vs actual career AV - Python"
#| fig-width: 12
#| fig-height: 7
#| message: false
#| warning: false
# Create plot
fig, ax = plt.subplots(figsize=(12, 7))
ax.scatter(test_pred, y_test, alpha=0.4, color='#1f77b4', s=30)
# Perfect prediction line
max_val = max(test_pred.max(), y_test.max())
ax.plot([0, max_val], [0, max_val], 'r--', label='Perfect Predictions', linewidth=2)
# Trend line
z = np.polyfit(test_pred, y_test, 1)
p = np.poly1d(z)
x_line = np.linspace(test_pred.min(), test_pred.max(), 100)
ax.plot(x_line, p(x_line), 'k-', label='Actual Fit', linewidth=2)
ax.set_xlabel('Predicted Career AV', fontsize=12)
ax.set_ylabel('Actual Career AV', fontsize=12)
ax.set_title(f'Random Forest Predictions vs Actual Career AV\nTest Set | RMSE = {test_rmse:.2f} | R² = {test_r2:.3f}',
fontsize=14, fontweight='bold', pad=20)
ax.legend(loc='upper left')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
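The scatter plots above show point predictions only. One rough way to attach a range to each prediction is to look at the spread of predictions across the individual trees in the forest. The sketch below does this on simulated data; it is an assumption-laden illustration, since the spread reflects disagreement among trees and is usually narrower than a calibrated prediction interval.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2024)

# Simulated stand-in data: 400 training rows, 100 new prospects
X_pi = rng.normal(size=(500, 3))
y_pi = 20 + 5 * X_pi[:, 0] + rng.normal(scale=6, size=500)

forest = RandomForestRegressor(
    n_estimators=200, min_samples_leaf=5, random_state=2024
)
forest.fit(X_pi[:400], y_pi[:400])

X_new = X_pi[400:]

# One row of predictions per tree: shape (n_trees, n_prospects)
per_tree = np.stack([tree.predict(X_new) for tree in forest.estimators_])

# The forest's point prediction is the mean across trees
point = per_tree.mean(axis=0)

# 10th and 90th percentiles of the tree predictions for each prospect
lower, upper = np.percentile(per_tree, [10, 90], axis=0)

for i in range(3):
    print(f"Prospect {i}: {point[i]:.1f} ({lower[i]:.1f} to {upper[i]:.1f})")
```

Methods designed for intervals, such as quantile regression forests or conformal prediction, are better suited when the ranges will inform actual decisions.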
Model Insights
Machine learning models can identify complex patterns in draft data, but they are far from perfect predictors. R² values typically range from 0.3 to 0.5, meaning the models explain only 30-50% of the variance in career AV from pre-draft information (draft slot, college production, and combine measurements). This highlights the inherent uncertainty in draft evaluation and the importance of scouting expertise alongside analytics.
Summary
This chapter covered the analytical foundations of NFL draft evaluation:
Key Takeaways:
- Draft Pick Value: Traditional value charts (Jimmy Johnson) overvalue early picks; empirical models based on historical performance provide better trade guidance
- Success Rates: Hit rates decline sharply by round, with significant variation by position. First-round picks have ~40-50% hit rates, dropping to ~15-20% by round 3
- College-NFL Translation: College production correlates moderately with NFL success (r ≈ 0.3-0.5), varying by position. QBs and WRs show stronger correlations than linemen
- Combine Metrics: Athletic testing has modest predictive value; explosion metrics (vertical, broad jump) often outperform straight-line speed. Position-specific evaluation is critical
- Trade Optimization: Value-based trading can create competitive advantages. Trading down often provides surplus value
- Positional Strategy: Premium positions (QB, OL, DL, DB) warrant higher investment; scarcity varies by draft class
- Team Evaluation: Measuring performance vs expected value (given draft position) provides fairer team assessment than raw totals
- Classification Systems: Round-adjusted thresholds for "hits" and "busts" enable meaningful evaluation
- ML Applications: Random forests and gradient boosting models outperform linear approaches but explain only 30-50% of variance, highlighting draft uncertainty
- Integrated Approach: Best practices combine analytics (value charts, statistical models) with scouting expertise and team-specific needs
Exercises
Conceptual Questions
- Value Chart Economics: Explain why the Jimmy Johnson trade chart overvalues high picks. What economic principles drive this overvaluation?
- Position Strategy: Given limited draft capital, should a team prioritize premium positions or best player available? Justify your answer with data.
- Combine Skepticism: Why might combine performance be less predictive for certain positions? Provide examples.
Coding Exercises
Exercise 1: Build Your Own Value Chart
Create a custom draft value chart using the simulated data:
a) Calculate expected AV by pick number
b) Incorporate positional adjustments
c) Account for rookie salary cap hits
d) Create a surplus value model
e) Compare your chart to the Jimmy Johnson chart
Bonus: Create a position-specific value chart (e.g., separate values for QB vs OL)
Exercise 2: Trade Analyzer
Build a function that evaluates proposed draft trades:
a) Accept two sets of picks (Team A and Team B)
b) Calculate value for each team using multiple value systems
c) Recommend whether each team should accept
d) Account for positional needs (if provided)
e) Generate a trade "fairness" score
Test case: Team A offers pick #10; Team B offers picks #20, #52, and next year's 2nd round pick (estimated #45)
Exercise 3: College-to-NFL Projection
Develop a position-specific projection model:
a) Filter data to a single position (e.g., WR)
b) Create relevant features from college and combine data
c) Train multiple models (linear, tree-based, ensemble)
d) Evaluate with cross-validation
e) Identify the most important predictive features
f) Generate predictions with confidence intervals
Advanced: Build separate models for each position and compare predictive accuracy
Exercise 4: Draft Class Analysis
Analyze the quality of draft classes:
a) Calculate average AV by draft year
b) Identify "strong" and "weak" draft classes
c) Analyze positional strength by year (e.g., 2011 had great WRs)
d) Create visualizations showing class quality trends
e) Adjust for years of experience (recent classes have less accumulated AV)
Research question: Do weak draft classes create more parity in the NFL?
Exercise 5: Team Draft Strategy
Evaluate a specific team's draft strategy:
a) Load real historical draft data for one team (2010-2023)
b) Calculate their hit rate by round
c) Identify positional preferences
d) Measure AV vs expected given draft positions
e) Analyze their trading behavior (trade up/down frequency)
f) Create a comprehensive draft performance report
Teams to consider: Green Bay (historically strong), Cleveland (historically weak), Baltimore (analytics-driven)
Further Reading
Academic Research
- Massey, C., & Thaler, R. H. (2013). "The Loser's Curse: Decision Making and Market Efficiency in the National Football League Draft." Management Science, 59(7), 1479-1495.
- Hendricks, W., DeBrock, L., & Koenker, R. (2003). "Uncertainty, Hiring, and Subsequent Performance: The NFL Draft." Journal of Labor Economics, 21(4), 857-886.
- Mulholland, J., & Jensen, S. T. (2014). "Predicting the NFL Draft." Journal of Quantitative Analysis in Sports, 10(4), 381-396.
Industry Applications
- Fitzgerald, M. (2015). "Building a Better Draft Value Chart." Harvard Sports Analysis Collective.
- Baldwin, B. (2020). "Drafting Analytics: What We Know and Don't Know." Open Source Football.
- Buehlmann, C. (2019). "Machine Learning Applications in NFL Draft Projection." MIT Sloan Sports Analytics Conference.
Tools and Data
- Pro Football Reference: Draft data and Approximate Value calculations
- Stathead Football: Query interface for historical draft data
- nflfastR/nflverse: R packages with draft-related data
- nfl-data-py: Python library for NFL data including draft information
References
:::