Chapter 29: Roster Construction | Football Analytics Textbook

Learning Objectives

By the end of this chapter, you will be able to:

Build optimal roster construction frameworks
Balance spending across positions
Analyze draft-vs-free agency strategies
Study team building philosophies
Optimize roster given cap and draft capital

Introduction

Building a championship roster in the NFL is one of the most complex optimization problems in professional sports. Teams must balance competing objectives: winning now versus building for the future, star players versus depth, draft picks versus free agents, and limited salary cap resources spread across 53 roster spots plus the practice squad.

This chapter explores the analytical frameworks for roster construction, examining how successful teams allocate resources, build through the draft versus free agency, manage player age curves, and optimize their rosters subject to salary cap and draft capital constraints.

The Roster Construction Challenge

NFL teams face unique constraints: a hard salary cap ($224.8M in 2023), a 53-man roster limit, draft capital that depreciates over time, and constant roster turnover. Successful teams must allocate resources strategically to maximize both short-term competitiveness and long-term sustainability.

The Analytics Revolution in Team Building

Historical Approaches

Traditional roster construction relied heavily on:

Scout evaluations and "eye test"
Positional archetypes and prototypes
Star player acquisition regardless of positional value
Year-to-year reactive decisions

Modern Analytics-Driven Approaches

Contemporary teams use:

Positional value frameworks: Understanding which positions provide the most value
Draft capital valuation: Quantifying the value of picks
Contract efficiency analysis: Identifying undervalued players
Age curve modeling: Predicting performance decline
Optimization models: Maximizing expected wins subject to constraints
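The last item can be made concrete with a toy example. The sketch below is a minimal illustration, not a production model: it brute-forces the best subset of a small player pool under a cap limit. The player names, costs, projected values, and the $75M budget are all invented for illustration. A real front office would use a linear or integer program (the setup code later in this chapter imports lpSolve and SciPy's linprog for that purpose) rather than enumeration.

```python
from itertools import combinations

# Hypothetical player pool: (name, cap cost in $M, projected value).
# Every name and number here is invented for illustration.
POOL = [
    ("QB_A", 40.0, 9.0),
    ("WR_B", 22.0, 5.0),
    ("OL_C", 18.0, 4.5),
    ("CB_D", 15.0, 3.5),
    ("RB_E", 10.0, 1.5),
]


def best_roster(pool, cap):
    """Return (total_value, total_cost, names) of the best subset under the cap."""
    best = (0.0, 0.0, ())
    for k in range(1, len(pool) + 1):
        for combo in combinations(pool, k):
            cost = sum(p[1] for p in combo)
            value = sum(p[2] for p in combo)
            # Keep the subset only if it is affordable and strictly better
            if cost <= cap and value > best[0]:
                best = (value, cost, tuple(p[0] for p in combo))
    return best


value, cost, names = best_roster(POOL, cap=75.0)
print(names, cost, value)
```

Enumeration is only feasible for a handful of players, since the number of subsets doubles with each addition. It is used here because it makes the objective (maximize value) and the constraint (stay under the cap) easy to see.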

Positional Importance and Allocation

Understanding Positional Value

Not all positions contribute equally to team success. Research consistently shows certain positions have outsized impact on winning.


#| label: setup-r
#| message: false
#| warning: false

library(tidyverse)
library(nflfastR)
library(nflplotR)
library(gt)
library(lpSolve)
library(scales)

# Set theme
theme_set(theme_minimal())

#| label: positional-value-r
#| message: false
#| warning: false
#| cache: true

# Calculate positional value from EPA contributions
# Load multiple seasons for robust estimates
pbp <- load_pbp(2020:2023)

# Calculate QB value (EPA per play)
qb_value <- pbp %>%
  filter(!is.na(epa), !is.na(passer_player_id), play_type == "pass") %>%
  group_by(season, passer_player_id, passer_player_name) %>%
  summarise(
    plays = n(),
    epa_per_play = mean(epa),
    total_epa = sum(epa),
    .groups = "drop"
  ) %>%
  filter(plays >= 100) %>%
  mutate(position = "QB")

# Calculate RB value
rb_value <- pbp %>%
  filter(!is.na(epa), !is.na(rusher_player_id),
         play_type == "run", !is.na(rusher_player_name)) %>%
  group_by(season, rusher_player_id, rusher_player_name) %>%
  summarise(
    plays = n(),
    epa_per_play = mean(epa),
    total_epa = sum(epa),
    .groups = "drop"
  ) %>%
  filter(plays >= 50) %>%
  mutate(position = "RB")

# Calculate WR value (receiving EPA)
wr_value <- pbp %>%
  filter(!is.na(epa), !is.na(receiver_player_id), play_type == "pass",
         complete_pass == 1 | incomplete_pass == 1) %>%
  group_by(season, receiver_player_id, receiver_player_name) %>%
  summarise(
    plays = n(),
    epa_per_play = mean(epa),
    total_epa = sum(epa),
    .groups = "drop"
  ) %>%
  filter(plays >= 30) %>%
  mutate(position = "WR/TE")

# Combine and analyze positional value distribution
positional_value <- bind_rows(
  qb_value %>% select(season, position, epa_per_play, total_epa, plays),
  rb_value %>% select(season, position, epa_per_play, total_epa, plays),
  wr_value %>% select(season, position, epa_per_play, total_epa, plays)
)

# Summary statistics by position
position_summary <- positional_value %>%
  group_by(position) %>%
  summarise(
    players = n(),
    mean_epa_play = mean(epa_per_play),
    sd_epa_play = sd(epa_per_play),
    p90_p10_spread = quantile(epa_per_play, 0.9) - quantile(epa_per_play, 0.1),
    .groups = "drop"
  )

position_summary %>%
  gt() %>%
  cols_label(
    position = "Position",
    players = "Player-Seasons",
    mean_epa_play = "Mean EPA/Play",
    sd_epa_play = "Std Dev",
    p90_p10_spread = "90th-10th Percentile Spread"
  ) %>%
  fmt_number(columns = c(mean_epa_play, sd_epa_play, p90_p10_spread), decimals = 3) %>%
  tab_header(
    title = "Positional Value Analysis",
    subtitle = "EPA contribution variance indicates positional importance"
  )

#| label: setup-py
#| message: false
#| warning: false

import pandas as pd
import numpy as np
import nfl_data_py as nfl
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.optimize import linprog
import warnings
warnings.filterwarnings('ignore')

# Set style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

#| label: positional-value-py
#| message: false
#| warning: false
#| cache: true

# Load play-by-play data
pbp = nfl.import_pbp_data([2020, 2021, 2022, 2023])

# QB value
qb_value = (pbp
    .query("epa.notna() & passer_player_id.notna() & play_type == 'pass'")
    .groupby(['season', 'passer_player_id', 'passer_player_name'])
    .agg(
        plays=('epa', 'count'),
        epa_per_play=('epa', 'mean'),
        total_epa=('epa', 'sum')
    )
    .reset_index()
    .query("plays >= 100")
    .assign(position='QB')
)

# RB value
rb_value = (pbp
    .query("epa.notna() & rusher_player_id.notna() & play_type == 'run' & rusher_player_name.notna()")
    .groupby(['season', 'rusher_player_id', 'rusher_player_name'])
    .agg(
        plays=('epa', 'count'),
        epa_per_play=('epa', 'mean'),
        total_epa=('epa', 'sum')
    )
    .reset_index()
    .query("plays >= 50")
    .assign(position='RB')
)

# WR value
wr_value = (pbp
    .query("epa.notna() & receiver_player_id.notna() & play_type == 'pass' & (complete_pass == 1 | incomplete_pass == 1)")
    .groupby(['season', 'receiver_player_id', 'receiver_player_name'])
    .agg(
        plays=('epa', 'count'),
        epa_per_play=('epa', 'mean'),
        total_epa=('epa', 'sum')
    )
    .reset_index()
    .query("plays >= 30")
    .assign(position='WR/TE')
)

# Combine
positional_value = pd.concat([
    qb_value[['season', 'position', 'epa_per_play', 'total_epa', 'plays']],
    rb_value[['season', 'position', 'epa_per_play', 'total_epa', 'plays']],
    wr_value[['season', 'position', 'epa_per_play', 'total_epa', 'plays']]
])

# Summary
position_summary = positional_value.groupby('position').agg(
    players=('epa_per_play', 'count'),
    mean_epa_play=('epa_per_play', 'mean'),
    sd_epa_play=('epa_per_play', 'std'),
    p90_p10_spread=('epa_per_play', lambda x: np.percentile(x, 90) - np.percentile(x, 10))
).reset_index()

print("\nPositional Value Analysis")
print("=" * 80)
print(position_summary.to_string(index=False))

Positional Value Insights

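The spread between the 90th and 10th percentile players at a position indicates how much a team gains by upgrading from a weak starter to a strong one. Positions with larger spreads (QB is the usual example) offer more opportunity for competitive advantage through superior talent, which suggests allocating more resources to them. Because the minimum play thresholds above differ by position (100, 50, and 30 plays), part of any measured spread reflects small-sample noise rather than talent.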
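To make the spread statistic concrete, here is a minimal sketch that computes it on two small samples using only the standard library. The EPA-per-play values are invented for illustration, and `p90_p10_spread` is a hypothetical helper name. The "inclusive" quantile method is chosen because it uses linear interpolation between order statistics, which should correspond to the default behavior of `np.percentile` used in the chapter code.

```python
import statistics


def p90_p10_spread(values):
    """Spread between the 90th and 10th percentiles of a sample."""
    # quantiles(n=10) returns the nine cut points at 10%, 20%, ..., 90%
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    return deciles[-1] - deciles[0]


# Hypothetical EPA/play samples (invented numbers, not real player data)
qb_epa = [-0.20, -0.10, -0.05, 0.00, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35]
rb_epa = [-0.10, -0.08, -0.06, -0.04, -0.02, 0.00, 0.02, 0.04, 0.06, 0.08, 0.10]

qb_spread = p90_p10_spread(qb_epa)
rb_spread = p90_p10_spread(rb_epa)
print(f"QB spread: {qb_spread:.2f}, RB spread: {rb_spread:.2f}")
```

In this toy sample the wider QB spread means that moving from a 10th percentile to a 90th percentile player is worth more EPA per play at QB than at RB.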

Visualizing Positional Value Distributions


#| label: fig-positional-value-r
#| fig-cap: "EPA distribution by position shows QB has highest variance"
#| fig-width: 12
#| fig-height: 6
#| message: false
#| warning: false

ggplot(positional_value, aes(x = epa_per_play, fill = position)) +
  geom_density(alpha = 0.6) +
  geom_vline(xintercept = 0, linetype = "dashed", color = "black") +
  scale_fill_manual(
    values = c("QB" = "#C8102E", "RB" = "#0076B6", "WR/TE" = "#FFB612")
  ) +
  labs(
    title = "Positional Value Distribution",
    subtitle = "QB shows highest variance - indicating highest positional importance",
    x = "EPA per Play",
    y = "Density",
    fill = "Position",
    caption = "Data: nflfastR | 2020-2023 seasons"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "top"
  )


#| label: fig-positional-value-py
#| fig-cap: "EPA distribution by position (Python)"
#| fig-width: 12
#| fig-height: 6

plt.figure(figsize=(12, 6))

for position, color in [('QB', '#C8102E'), ('RB', '#0076B6'), ('WR/TE', '#FFB612')]:
    data = positional_value[positional_value['position'] == position]['epa_per_play']
    data.plot(kind='density', label=position, color=color, alpha=0.6)

plt.axvline(x=0, color='black', linestyle='--', alpha=0.7)
plt.xlabel('EPA per Play', fontsize=12)
plt.ylabel('Density', fontsize=12)
plt.title('Positional Value Distribution\nQB shows highest variance - indicating highest positional importance',
          fontsize=14, fontweight='bold')
plt.legend(title='Position', loc='upper right')
plt.text(0.98, 0.02, 'Data: nfl_data_py | 2020-2023',
         transform=plt.gca().transAxes, ha='right', fontsize=8, style='italic')
plt.tight_layout()
plt.show()


Salary Allocation by Position

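Let's analyze how championship teams allocate the salary cap across positions. The allocations in this example are simulated for demonstration; they are not actual contract data.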
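Before looking at the simulation, it helps to see what a set of cap shares means in dollars. The sketch below is a rough illustration: it takes the mean position shares used as inputs to the simulation in this section, normalizes them so they sum to 100%, and converts them to dollar budgets under the $224.8M cap cited earlier. The helper names `normalize` and `to_dollars` are hypothetical.

```python
CAP_2023 = 224.8  # $M, the 2023 cap figure cited earlier in the chapter

# Mean cap shares used as inputs to this section's simulation (not observed data)
MEAN_SHARE = {
    "QB": 0.18, "OL": 0.15, "DL": 0.13, "WR": 0.12, "CB": 0.11,
    "LB": 0.09, "S": 0.08, "TE": 0.06, "RB": 0.05, "ST": 0.03,
}


def normalize(shares):
    """Rescale shares so they sum to 1 (the same step the simulation applies per team)."""
    total = sum(shares.values())
    return {pos: share / total for pos, share in shares.items()}


def to_dollars(shares, cap):
    """Convert normalized shares into $M budgets, rounded to one decimal place."""
    return {pos: round(share * cap, 1) for pos, share in normalize(shares).items()}


budget = to_dollars(MEAN_SHARE, CAP_2023)
for pos, dollars in budget.items():
    print(f"{pos:>2}: ${dollars:.1f}M")
```

Under these inputs the QB room receives more than three times the RB room's budget, which is the kind of gap the simulated table in this section is meant to illustrate.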


#| label: salary-allocation-r
#| message: false
#| warning: false
#| cache: true

# No roster or contract data is loaded here:
# the allocations below are simulated for demonstration
set.seed(42)

# Simulate championship team salary allocations
positions <- c("QB", "RB", "WR", "TE", "OL", "DL", "LB", "CB", "S", "ST")

championship_teams <- tibble(
  team = rep(c("2022 Chiefs", "2021 Rams", "2020 Bucs", "2019 Chiefs"), each = 10),
  position = rep(positions, 4)
) %>%
  mutate(
    # Realistic salary percentages based on research
    pct_cap = case_when(
      position == "QB" ~ rnorm(n(), 0.18, 0.03),
      position == "OL" ~ rnorm(n(), 0.15, 0.02),
      position == "DL" ~ rnorm(n(), 0.13, 0.02),
      position == "WR" ~ rnorm(n(), 0.12, 0.02),
      position == "CB" ~ rnorm(n(), 0.11, 0.02),
      position == "LB" ~ rnorm(n(), 0.09, 0.015),
      position == "S" ~ rnorm(n(), 0.08, 0.015),
      position == "TE" ~ rnorm(n(), 0.06, 0.015),
      position == "RB" ~ rnorm(n(), 0.05, 0.015),
      position == "ST" ~ rnorm(n(), 0.03, 0.01),
      TRUE ~ 0.05
    ),
    pct_cap = pmax(0, pct_cap) # Ensure non-negative
  ) %>%
  group_by(team) %>%
  mutate(pct_cap = pct_cap / sum(pct_cap)) %>% # Normalize to 100%
  ungroup()

# Average allocation across championship teams
avg_allocation <- championship_teams %>%
  group_by(position) %>%
  summarise(
    avg_pct = mean(pct_cap),
    sd_pct = sd(pct_cap),
    .groups = "drop"
  ) %>%
  arrange(desc(avg_pct))

avg_allocation %>%
  gt() %>%
  cols_label(
    position = "Position",
    avg_pct = "Avg % of Cap",
    sd_pct = "Std Dev"
  ) %>%
  fmt_percent(columns = c(avg_pct, sd_pct), decimals = 1) %>%
  tab_header(
    title = "Championship Team Salary Allocation",
    subtitle = "Simulated averages for recent Super Bowl winners"
  ) %>%
  data_color(
    columns = avg_pct,
    colors = scales::col_numeric(
      palette = c("white", "#C8102E"),
      domain = NULL
    )
  )

#| label: salary-allocation-py
#| message: false
#| warning: false
#| cache: true

# Simulate championship team salary allocations
np.random.seed(42)

positions = ["QB", "RB", "WR", "TE", "OL", "DL", "LB", "CB", "S", "ST"]
teams = ["2022 Chiefs", "2021 Rams", "2020 Bucs", "2019 Chiefs"]

salary_data = []
for team in teams:
    for position in positions:
        if position == "QB":
            pct = np.random.normal(0.18, 0.03)
        elif position == "OL":
            pct = np.random.normal(0.15, 0.02)
        elif position == "DL":
            pct = np.random.normal(0.13, 0.02)
        elif position == "WR":
            pct = np.random.normal(0.12, 0.02)
        elif position == "CB":
            pct = np.random.normal(0.11, 0.02)
        elif position == "LB":
            pct = np.random.normal(0.09, 0.015)
        elif position == "S":
            pct = np.random.normal(0.08, 0.015)
        elif position == "TE":
            pct = np.random.normal(0.06, 0.015)
        elif position == "RB":
            pct = np.random.normal(0.05, 0.015)
        else:  # ST
            pct = np.random.normal(0.03, 0.01)

        salary_data.append({
            'team': team,
            'position': position,
            'pct_cap': max(0, pct)
        })

championship_teams = pd.DataFrame(salary_data)

# Normalize to 100% per team
championship_teams['pct_cap'] = championship_teams.groupby('team')['pct_cap'].transform(
    lambda x: x / x.sum()
)

# Average allocation
avg_allocation = championship_teams.groupby('position').agg(
    avg_pct=('pct_cap', 'mean'),
    sd_pct=('pct_cap', 'std')
).reset_index().sort_values('avg_pct', ascending=False)

print("\nChampionship Team Salary Allocation")
print("=" * 60)
print(avg_allocation.to_string(index=False))

Draft vs Free Agency Strategies

The Value of Draft Picks

Draft picks represent cost-controlled talent. Understanding the value of each pick is crucial for roster construction.


#| label: draft-value-r
#| message: false
#| warning: false

# Create draft pick value chart (based on research)
# Using Approximate Value (AV) as proxy for career value

draft_value <- tibble(
  pick = 1:224,
  round = case_when(
    pick <= 32 ~ 1,
    pick <= 64 ~ 2,
    pick <= 100 ~ 3,
    pick <= 136 ~ 4,
    pick <= 176 ~ 5,
    pick <= 216 ~ 6,
    TRUE ~ 7
  )
) %>%
  mutate(
    # Expected career AV (based on historical data)
    expected_av = case_when(
      pick == 1 ~ 65,
      pick <= 5 ~ 55,
      pick <= 10 ~ 45,
      pick <= 20 ~ 35,
      pick <= 32 ~ 28,
      pick <= 50 ~ 22,
      pick <= 75 ~ 16,
      pick <= 100 ~ 12,
      pick <= 150 ~ 8,
      pick <= 200 ~ 5,
      TRUE ~ 3
    ),
    # Add some realistic variation
    expected_av = expected_av * rnorm(n(), 1, 0.15),
    expected_av = pmax(0, expected_av),

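    # Convert to value relative to pick 1 (pick 1 = 100); dividing by the max
    # would anchor on whichever pick drew the largest random variation
    relative_value = expected_av / expected_av[pick == 1] * 100,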

    # 4-year rookie contract value (millions)
    contract_value = case_when(
      round == 1 ~ (33 - pick) * 0.8 + 10,  # $10-35M
      round == 2 ~ (65 - pick) * 0.2 + 5,   # $5-12M
      round == 3 ~ (101 - pick) * 0.08 + 3, # $3-6M
      TRUE ~ 3.5                             # ~$3.5M minimum
    ),

    # Cost efficiency (expected career AV per $M of rookie contract)
    value_efficiency = expected_av / contract_value
  )

draft_value %>%
  filter(pick %in% c(1, 10, 20, 32, 50, 75, 100, 150, 200)) %>%
  gt() %>%
  cols_label(
    pick = "Pick",
    round = "Round",
    expected_av = "Expected Career AV",
    relative_value = "Relative Value",
    contract_value = "4-Year Contract ($M)",
    value_efficiency = "AV per $M"
  ) %>%
  fmt_number(columns = c(expected_av, relative_value, value_efficiency), decimals = 1) %>%
  fmt_currency(columns = contract_value, decimals = 1) %>%
  tab_header(
    title = "Draft Pick Value Analysis",
    subtitle = "Expected career value and cost efficiency by pick"
  )

#| label: draft-value-py
#| message: false
#| warning: false

# Create draft value chart
picks = np.arange(1, 225)
rounds = np.where(picks <= 32, 1,
          np.where(picks <= 64, 2,
          np.where(picks <= 100, 3,
          np.where(picks <= 136, 4,
          np.where(picks <= 176, 5,
          np.where(picks <= 216, 6, 7))))))

# Expected AV
expected_av = np.where(picks == 1, 65,
              np.where(picks <= 5, 55,
              np.where(picks <= 10, 45,
              np.where(picks <= 20, 35,
              np.where(picks <= 32, 28,
              np.where(picks <= 50, 22,
              np.where(picks <= 75, 16,
              np.where(picks <= 100, 12,
              np.where(picks <= 150, 8,
              np.where(picks <= 200, 5, 3))))))))))

# Add variation
np.random.seed(42)
expected_av = expected_av * np.random.normal(1, 0.15, len(picks))
expected_av = np.maximum(0, expected_av)

# Contract values
contract_value = np.where(rounds == 1, (33 - picks) * 0.8 + 10,
                 np.where(rounds == 2, (65 - picks) * 0.2 + 5,
                 np.where(rounds == 3, (101 - picks) * 0.08 + 3, 3.5)))

draft_value = pd.DataFrame({
    'pick': picks,
    'round': rounds,
    'expected_av': expected_av,
    'relative_value': expected_av / expected_av[0] * 100,  # index 0 is pick 1
    'contract_value': contract_value,
    'value_efficiency': expected_av / contract_value
})

# Show key picks
print("\nDraft Pick Value Analysis")
print("=" * 90)
print(draft_value[draft_value['pick'].isin([1, 10, 20, 32, 50, 75, 100, 150, 200])].to_string(index=False))

#| label: fig-draft-curve-r
#| fig-cap: "Draft pick value follows power law decay"
#| fig-width: 12
#| fig-height: 6

ggplot(draft_value, aes(x = pick, y = relative_value)) +
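  geom_line(color = "#0076B6", linewidth = 1.2) +
  # Mark the first pick of each round (rounds here are not all 32 picks long)
  geom_point(data = draft_value %>% group_by(round) %>% slice_min(pick, n = 1) %>% ungroup(),
             aes(color = as.factor(round)), size = 3) +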
  scale_color_manual(
    values = c("1" = "#C8102E", "2" = "#0076B6", "3" = "#FFB612",
               "4" = "#006778", "5" = "#A5ACAF", "6" = "#773141", "7" = "#101820")
  ) +
  scale_x_continuous(breaks = seq(0, 224, 32)) +
  labs(
    title = "NFL Draft Pick Value Curve",
    subtitle = "Value decays rapidly in first round, levels off in later rounds",
    x = "Overall Pick Number",
    y = "Relative Value (Pick #1 = 100)",
    color = "Round",
    caption = "Based on historical career Approximate Value (AV)"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "right"
  )

#| label: fig-draft-curve-py
#| fig-cap: "Draft pick value curve (Python)"
#| fig-width: 12
#| fig-height: 6

plt.figure(figsize=(12, 6))

plt.plot(draft_value['pick'], draft_value['relative_value'],
         color='#0076B6', linewidth=2, label='Expected Value')

# Highlight round breaks
round_colors = {1: '#C8102E', 2: '#0076B6', 3: '#FFB612',
                4: '#006778', 5: '#A5ACAF', 6: '#773141', 7: '#101820'}
for round_num in range(1, 8):
    round_data = draft_value[draft_value['round'] == round_num]
    first_pick = round_data.iloc[0]
    plt.scatter(first_pick['pick'], first_pick['relative_value'],
                color=round_colors[round_num], s=100, zorder=5,
                label=f'Round {round_num}')

plt.xlabel('Overall Pick Number', fontsize=12)
plt.ylabel('Relative Value (Pick #1 = 100)', fontsize=12)
plt.title('NFL Draft Pick Value Curve\nValue decays rapidly in first round, levels off in later rounds',
          fontsize=14, fontweight='bold')
plt.legend(loc='upper right', ncol=2)
plt.grid(True, alpha=0.3)
plt.text(0.98, 0.02, 'Based on historical career Approximate Value (AV)',
         transform=plt.gca().transAxes, ha='right', fontsize=8, style='italic')
plt.tight_layout()
plt.show()

Draft vs Free Agency Cost Comparison


#| label: draft-fa-comparison-r
#| message: false
#| warning: false

# Compare cost of acquiring talent via draft vs FA
# Simulate player quality levels and costs

set.seed(123)

# Draft players (cost = rookie contract)
draft_players <- tibble(
  source = "Draft",
  round = rep(1:4, each = 25),
  pick_in_round = rep(1:25, 4)
) %>%
  mutate(
    overall_pick = (round - 1) * 32 + pick_in_round,
    performance_level = case_when(
      round == 1 ~ rnorm(n(), 75, 15),
      round == 2 ~ rnorm(n(), 65, 12),
      round == 3 ~ rnorm(n(), 55, 12),
      round == 4 ~ rnorm(n(), 50, 12)
    ),
    performance_level = pmin(100, pmax(0, performance_level)),
    annual_cost = case_when(
      round == 1 ~ rnorm(n(), 6, 2),
      round == 2 ~ rnorm(n(), 2.5, 0.5),
      round == 3 ~ rnorm(n(), 1.5, 0.3),
      round == 4 ~ rnorm(n(), 1.0, 0.2)
    ),
    annual_cost = pmax(0.9, annual_cost)
  )

# Free agent players (market rate)
fa_players <- tibble(
  source = "Free Agency",
  tier = rep(c("Elite", "Starter", "Backup"), each = 30)
) %>%
  mutate(
    performance_level = case_when(
      tier == "Elite" ~ rnorm(n(), 85, 8),
      tier == "Starter" ~ rnorm(n(), 65, 10),
      tier == "Backup" ~ rnorm(n(), 45, 10)
    ),
    performance_level = pmin(100, pmax(0, performance_level)),
    annual_cost = case_when(
      tier == "Elite" ~ rnorm(n(), 18, 4),
      tier == "Starter" ~ rnorm(n(), 8, 2),
      tier == "Backup" ~ rnorm(n(), 2, 0.5)
    ),
    annual_cost = pmax(1, annual_cost)
  )

# Combine
all_players <- bind_rows(
  draft_players %>% select(source, performance_level, annual_cost),
  fa_players %>% select(source, performance_level, annual_cost)
) %>%
  mutate(value_efficiency = performance_level / annual_cost)

# Summary comparison
comparison <- all_players %>%
  group_by(source) %>%
  summarise(
    players = n(),
    avg_performance = mean(performance_level),
    avg_cost = mean(annual_cost),
    avg_efficiency = mean(value_efficiency),
    .groups = "drop"
  )

comparison %>%
  gt() %>%
  cols_label(
    source = "Acquisition Source",
    players = "Sample Size",
    avg_performance = "Avg Performance",
    avg_cost = "Avg Annual Cost ($M)",
    avg_efficiency = "Performance per $M"
  ) %>%
  fmt_number(columns = c(avg_performance, avg_efficiency), decimals = 1) %>%
  fmt_currency(columns = avg_cost, decimals = 1) %>%
  tab_header(
    title = "Draft vs Free Agency Comparison",
    subtitle = "Cost efficiency of different talent acquisition methods"
  )

#| label: draft-fa-comparison-py
#| message: false
#| warning: false

np.random.seed(123)

# Draft players
draft_data = []
for round_num in range(1, 5):
    for pick in range(1, 26):
        if round_num == 1:
            perf = np.random.normal(75, 15)
            cost = np.random.normal(6, 2)
        elif round_num == 2:
            perf = np.random.normal(65, 12)
            cost = np.random.normal(2.5, 0.5)
        elif round_num == 3:
            perf = np.random.normal(55, 12)
            cost = np.random.normal(1.5, 0.3)
        else:
            perf = np.random.normal(50, 12)
            cost = np.random.normal(1.0, 0.2)

        draft_data.append({
            'source': 'Draft',
            'performance_level': np.clip(perf, 0, 100),
            'annual_cost': max(0.9, cost)
        })

draft_players = pd.DataFrame(draft_data)

# Free agents
fa_data = []
for tier in ['Elite', 'Starter', 'Backup']:
    for _ in range(30):
        if tier == 'Elite':
            perf = np.random.normal(85, 8)
            cost = np.random.normal(18, 4)
        elif tier == 'Starter':
            perf = np.random.normal(65, 10)
            cost = np.random.normal(8, 2)
        else:
            perf = np.random.normal(45, 10)
            cost = np.random.normal(2, 0.5)

        fa_data.append({
            'source': 'Free Agency',
            'performance_level': np.clip(perf, 0, 100),
            'annual_cost': max(1, cost)
        })

fa_players = pd.DataFrame(fa_data)

# Combine
all_players = pd.concat([draft_players, fa_players])
all_players['value_efficiency'] = all_players['performance_level'] / all_players['annual_cost']

# Summary
comparison = all_players.groupby('source').agg(
    players=('performance_level', 'count'),
    avg_performance=('performance_level', 'mean'),
    avg_cost=('annual_cost', 'mean'),
    avg_efficiency=('value_efficiency', 'mean')
).reset_index()

print("\nDraft vs Free Agency Comparison")
print("=" * 80)
print(comparison.to_string(index=False))

Key Insight: Draft Value Efficiency

In the simulated data above, draft picks offer significantly better value efficiency (performance per dollar) than free agents, because rookie contracts are cost-controlled. Free agency, however, provides more certainty and immediate impact. Successful teams balance both approaches.
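The efficiency comparison can be checked by hand from the simulation's own inputs. The sketch below uses the mean performance and mean annual cost assigned to each draft round and free agent tier in the code above and computes performance per $M for each group. Note one assumption: this is a ratio of means, whereas the chapter code averages per-player ratios, so the simulated table will not match these figures exactly.

```python
# (mean performance, mean annual cost in $M) for each group,
# copied from the simulation inputs in this section
GROUPS = {
    "Draft R1": (75, 6.0),
    "Draft R2": (65, 2.5),
    "Draft R3": (55, 1.5),
    "Draft R4": (50, 1.0),
    "FA Elite": (85, 18.0),
    "FA Starter": (65, 8.0),
    "FA Backup": (45, 2.0),
}

# Performance points per $M of annual cost
efficiency = {name: perf / cost for name, (perf, cost) in GROUPS.items()}

for name, eff in sorted(efficiency.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<10} {eff:5.1f} performance points per $M")
```

Within the draft, efficiency rises in later rounds because cost falls faster than expected performance. The same pattern holds across free agent tiers, which is why elite free agents look the least efficient even though they have the highest expected performance.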

Star Players vs Depth Approaches

The Star Player Premium

Should teams invest heavily in superstar players or build balanced depth?
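One way to quantify the difference between the two philosophies is the share of the cap committed to the five highest-ranked players. The sketch below reproduces the two salary formulas used in this section's simulation as cap shares and sums them. The function names are hypothetical, and the formulas are stylized assumptions of the simulation, not observed contract structures.

```python
ROSTER_SIZE = 53


def star_heavy_share(rank):
    """Cap share by salary rank under the star-heavy formula in this section."""
    if rank <= 5:
        return 0.12 * (6 - rank) / 5
    if rank <= 15:
        return 0.04 * (16 - rank) / 10
    return 0.40 / 38  # remaining 40% split evenly among 38 depth players


def balanced_share(rank):
    """Cap share by salary rank under the linearly declining balanced formula."""
    total = ROSTER_SIZE * (ROSTER_SIZE + 1) // 2  # 1 + 2 + ... + 53 = 1431
    return (ROSTER_SIZE + 1 - rank) / total


ranks = range(1, ROSTER_SIZE + 1)
star_top5 = sum(star_heavy_share(r) for r in ranks if r <= 5)
bal_top5 = sum(balanced_share(r) for r in ranks if r <= 5)
star_total = sum(star_heavy_share(r) for r in ranks)
bal_total = sum(balanced_share(r) for r in ranks)

print(f"Top-5 share: star-heavy {star_top5:.1%}, balanced {bal_top5:.1%}")
print(f"Cap used:    star-heavy {star_total:.1%}, balanced {bal_total:.1%}")
```

Two details are worth noting when reading the simulation output: the star-heavy formula commits about twice as much of the cap to its top five players, and it spends only 98% of the cap while the balanced formula spends all of it, so total salary differs slightly between the two rosters.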


#| label: fig-star-vs-depth-r
#| fig-cap: "Comparing star-heavy vs balanced roster construction"
#| fig-width: 12
#| fig-height: 8
#| message: false
#| warning: false

# Simulate two roster construction philosophies
set.seed(456)

salary_cap <- 224.8  # 2023 cap in millions

# Star-heavy approach: Few expensive players, cheap depth
star_heavy <- tibble(
  player_rank = 1:53,
  philosophy = "Star-Heavy"
) %>%
  mutate(
    # Top players get premium contracts
    salary = case_when(
      player_rank <= 5 ~ salary_cap * 0.12 * (6 - player_rank) / 5,
      player_rank <= 15 ~ salary_cap * 0.04 * (16 - player_rank) / 10,
      TRUE ~ (salary_cap * 0.40) / 38  # Remaining 40% split among depth
    ),
    # Performance correlates with salary but with noise
    performance = 30 + (salary / max(salary)) * 60 + rnorm(n(), 0, 8),
    performance = pmin(95, pmax(30, performance))
  )

# Balanced approach: More even distribution
balanced <- tibble(
  player_rank = 1:53,
  philosophy = "Balanced"
) %>%
  mutate(
    # More gradual decline in salary
    salary = salary_cap * (54 - player_rank) / sum(1:53),
    performance = 35 + (salary / max(salary)) * 55 + rnorm(n(), 0, 6),
    performance = pmin(95, pmax(35, performance))
  )

roster_comparison <- bind_rows(star_heavy, balanced)

# Calculate team metrics
team_metrics <- roster_comparison %>%
  group_by(philosophy) %>%
  summarise(
    total_salary = sum(salary),
    avg_starter_perf = mean(performance[player_rank <= 22]),  # Starters
    avg_depth_perf = mean(performance[player_rank > 22]),     # Depth
    top5_perf = mean(performance[player_rank <= 5]),
    worst10_perf = mean(performance[player_rank >= 44]),
    total_perf = sum(performance),
    .groups = "drop"
  )

# Visualize roster composition
p1 <- ggplot(roster_comparison, aes(x = player_rank, y = salary, color = philosophy)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 2, alpha = 0.6) +
  scale_color_manual(values = c("Star-Heavy" = "#C8102E", "Balanced" = "#0076B6")) +
  labs(
    title = "Salary Distribution by Philosophy",
    x = "Player Rank (1 = highest paid)",
    y = "Annual Salary ($M)",
    color = "Philosophy"
  ) +
  theme_minimal()

p2 <- ggplot(roster_comparison, aes(x = player_rank, y = performance, color = philosophy)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 2, alpha = 0.6) +
  geom_hline(yintercept = 60, linetype = "dashed", color = "gray40", alpha = 0.6) +
  annotate("text", x = 45, y = 62, label = "Starter threshold", size = 3) +
  scale_color_manual(values = c("Star-Heavy" = "#C8102E", "Balanced" = "#0076B6")) +
  labs(
    title = "Performance Distribution by Philosophy",
    x = "Player Rank",
    y = "Performance Rating",
    color = "Philosophy"
  ) +
  theme_minimal()

# Show both plots
p1
p2

# Display metrics table
team_metrics %>%
  gt() %>%
  cols_label(
    philosophy = "Philosophy",
    total_salary = "Total Salary ($M)",
    avg_starter_perf = "Avg Starter Perf",
    avg_depth_perf = "Avg Depth Perf",
    top5_perf = "Top 5 Perf",
    worst10_perf = "Bottom 10 Perf",
    total_perf = "Total Team Perf"
  ) %>%
  fmt_currency(columns = total_salary, decimals = 1) %>%
  fmt_number(columns = c(avg_starter_perf, avg_depth_perf, top5_perf,
                         worst10_perf, total_perf), decimals = 1) %>%
  tab_header(
    title = "Roster Philosophy Comparison",
    subtitle = "Key performance metrics"
  )


#| label: fig-star-vs-depth-py
#| fig-cap: "Star vs depth philosophy comparison (Python)"
#| fig-width: 12
#| fig-height: 8

np.random.seed(456)
salary_cap = 224.8

# Star-heavy roster
star_heavy_data = []
for rank in range(1, 54):
    if rank <= 5:
        salary = salary_cap * 0.12 * (6 - rank) / 5
    elif rank <= 15:
        salary = salary_cap * 0.04 * (16 - rank) / 10
    else:
        salary = (salary_cap * 0.40) / 38

    max_salary = salary_cap * 0.12
    performance = 30 + (salary / max_salary) * 60 + np.random.normal(0, 8)
    performance = np.clip(performance, 30, 95)

    star_heavy_data.append({
        'player_rank': rank,
        'philosophy': 'Star-Heavy',
        'salary': salary,
        'performance': performance
    })

# Balanced roster
balanced_data = []
total_ranks = sum(range(1, 54))
for rank in range(1, 54):
    salary = salary_cap * (54 - rank) / total_ranks
    max_salary = salary_cap * 53 / total_ranks
    performance = 35 + (salary / max_salary) * 55 + np.random.normal(0, 6)
    performance = np.clip(performance, 35, 95)

    balanced_data.append({
        'player_rank': rank,
        'philosophy': 'Balanced',
        'salary': salary,
        'performance': performance
    })

roster_comparison = pd.DataFrame(star_heavy_data + balanced_data)

# Plot salary distribution
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 10))

for philosophy, color in [('Star-Heavy', '#C8102E'), ('Balanced', '#0076B6')]:
    data = roster_comparison[roster_comparison['philosophy'] == philosophy]
    ax1.plot(data['player_rank'], data['salary'], color=color,
             linewidth=2, marker='o', markersize=4, alpha=0.6, label=philosophy)

ax1.set_xlabel('Player Rank (1 = highest paid)', fontsize=11)
ax1.set_ylabel('Annual Salary ($M)', fontsize=11)
ax1.set_title('Salary Distribution by Philosophy', fontsize=13, fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Plot performance distribution
for philosophy, color in [('Star-Heavy', '#C8102E'), ('Balanced', '#0076B6')]:
    data = roster_comparison[roster_comparison['philosophy'] == philosophy]
    ax2.plot(data['player_rank'], data['performance'], color=color,
             linewidth=2, marker='o', markersize=4, alpha=0.6, label=philosophy)

ax2.axhline(y=60, color='gray', linestyle='--', alpha=0.6)
ax2.text(45, 62, 'Starter threshold', fontsize=9)
ax2.set_xlabel('Player Rank', fontsize=11)
ax2.set_ylabel('Performance Rating', fontsize=11)
ax2.set_title('Performance Distribution by Philosophy', fontsize=13, fontweight='bold')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Calculate metrics
team_metrics = roster_comparison.groupby('philosophy').apply(
    lambda x: pd.Series({
        'total_salary': x['salary'].sum(),
        'avg_starter_perf': x[x['player_rank'] <= 22]['performance'].mean(),
        'avg_depth_perf': x[x['player_rank'] > 22]['performance'].mean(),
        'top5_perf': x[x['player_rank'] <= 5]['performance'].mean(),
        'worst10_perf': x[x['player_rank'] >= 44]['performance'].mean(),
        'total_perf': x['performance'].sum()
    })
).reset_index()

print("\nRoster Philosophy Comparison")
print("=" * 90)
print(team_metrics.to_string(index=False))

Age Distribution and Team Windows

Understanding Age Curves

Player performance varies by age. Understanding these curves is crucial for roster construction.
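The stylized curves used in this section combine a logistic rise (development) with an exponential decay that starts at a position-specific age (decline). The sketch below restates that functional form with the same parameters as the chapter code and finds the peak age for each position. The parameters are modeling assumptions chosen for illustration, not estimates fitted to data.

```python
import math

# (logistic midpoint age, growth rate, decline onset age, decline rate),
# matching the stylized curves in this section's code
PARAMS = {
    "QB": (27, 0.40, 32, 0.05),
    "WR": (25, 0.50, 29, 0.08),
    "RB": (24, 0.60, 27, 0.12),
    "OL": (26, 0.35, 31, 0.06),
}


def performance(age, midpoint, growth, onset, decline):
    """Rating on a 30-90 scale: logistic rise times exponential decay."""
    rise = 1 / (1 + math.exp(-growth * (age - midpoint)))
    decay = math.exp(-decline * max(0, age - onset))
    return 30 + 60 * rise * decay


def peak_age(position, ages=range(22, 36)):
    """Age in the given range with the highest modeled performance."""
    return max(ages, key=lambda age: performance(age, *PARAMS[position]))


peaks = {pos: peak_age(pos) for pos in PARAMS}
print(peaks)
```

With these parameters each modeled peak falls exactly at the decline onset age, so the onset parameter effectively sets the peak. The ordering (RB earliest, QB latest) is built into the assumptions rather than discovered from data.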


#| label: age-curves-r
#| message: false
#| warning: false

# Model performance by age for different positions
ages <- 22:35

age_curves <- tibble(
  age = rep(ages, 4),
  position = rep(c("QB", "WR", "RB", "OL"), each = length(ages))
) %>%
  mutate(
    # Different age curves by position
    performance = case_when(
      position == "QB" ~ 100 / (1 + exp(-0.4 * (age - 27))) *
                         exp(-0.05 * pmax(0, age - 32)),
      position == "WR" ~ 100 / (1 + exp(-0.5 * (age - 25))) *
                         exp(-0.08 * pmax(0, age - 29)),
      position == "RB" ~ 100 / (1 + exp(-0.6 * (age - 24))) *
                         exp(-0.12 * pmax(0, age - 27)),
      position == "OL" ~ 100 / (1 + exp(-0.35 * (age - 26))) *
                         exp(-0.06 * pmax(0, age - 31))
    ),
    # Normalize to realistic scale
    performance = (performance / 100) * 60 + 30
  )

# Find peak age for each position
peak_ages <- age_curves %>%
  group_by(position) %>%
  slice_max(performance, n = 1, with_ties = FALSE) %>%
  ungroup() %>%  # drop grouping so gt() renders one flat table
  select(position, peak_age = age, peak_performance = performance)

peak_ages %>%
  gt() %>%
  cols_label(
    position = "Position",
    peak_age = "Peak Age",
    peak_performance = "Peak Performance"
  ) %>%
  fmt_number(columns = peak_performance, decimals = 1) %>%
  tab_header(title = "Peak Performance Age by Position")

#| label: age-curves-py
#| message: false
#| warning: false

ages = np.arange(22, 36)
positions = ['QB', 'WR', 'RB', 'OL']

age_curve_data = []
for position in positions:
    for age in ages:
        if position == 'QB':
            perf = 100 / (1 + np.exp(-0.4 * (age - 27))) * np.exp(-0.05 * max(0, age - 32))
        elif position == 'WR':
            perf = 100 / (1 + np.exp(-0.5 * (age - 25))) * np.exp(-0.08 * max(0, age - 29))
        elif position == 'RB':
            perf = 100 / (1 + np.exp(-0.6 * (age - 24))) * np.exp(-0.12 * max(0, age - 27))
        else:  # OL
            perf = 100 / (1 + np.exp(-0.35 * (age - 26))) * np.exp(-0.06 * max(0, age - 31))

        performance = (perf / 100) * 60 + 30

        age_curve_data.append({
            'age': age,
            'position': position,
            'performance': performance
        })

age_curves = pd.DataFrame(age_curve_data)

# Peak ages
peak_ages = age_curves.loc[age_curves.groupby('position')['performance'].idxmax()]
print("\nPeak Performance Age by Position")
print("=" * 60)
print(peak_ages[['position', 'age', 'performance']].to_string(index=False))
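
One practical use of these curves is comparing contract windows. The following is a minimal sketch using only the standard library; the parameter table is copied from the simulation above, while `age_curve` and `contract_window_avg` are hypothetical helper names introduced for illustration.

```python
import math

# Curve parameters copied from the simulation above:
# (logistic steepness, logistic midpoint, decline rate, decline-onset age)
CURVE_PARAMS = {
    "QB": (0.40, 27, 0.05, 32),
    "WR": (0.50, 25, 0.08, 29),
    "RB": (0.60, 24, 0.12, 27),
    "OL": (0.35, 26, 0.06, 31),
}


def age_curve(position: str, age: int) -> float:
    """Stylized performance rating (30-90 scale) for a position at a given age."""
    k, midpoint, decline, onset = CURVE_PARAMS[position]
    rise = 1.0 / (1.0 + math.exp(-k * (age - midpoint)))
    decay = math.exp(-decline * max(0, age - onset))
    return 30.0 + 60.0 * rise * decay


def contract_window_avg(position: str, start_age: int, years: int) -> float:
    """Average projected rating over a contract that starts at start_age."""
    ratings = [age_curve(position, start_age + t) for t in range(years)]
    return sum(ratings) / years


for pos in CURVE_PARAMS:
    peak_age = max(range(22, 36), key=lambda a: age_curve(pos, a))
    from_26 = contract_window_avg(pos, 26, 4)
    from_30 = contract_window_avg(pos, 30, 4)
    print(f"{pos}: peak age {peak_age}, "
          f"4-yr avg from 26 = {from_26:.1f}, from 30 = {from_30:.1f}")
```

Under these assumed curves, a four-year deal starting at age 30 projects better than one starting at 26 for a quarterback, while the reverse holds for a running back.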

Visualizing Age Curves

#| label: fig-age-curves-r
#| fig-cap: "Performance age curves vary significantly by position"
#| fig-width: 12
#| fig-height: 6

ggplot(age_curves, aes(x = age, y = performance, color = position)) +
  geom_line(linewidth = 1.3) +  # linewidth replaces size for lines in ggplot2 >= 3.4
  geom_point(data = peak_ages, aes(x = peak_age, y = peak_performance),
             size = 4, shape = 21, fill = "white", stroke = 2) +
  scale_color_manual(
    values = c("QB" = "#C8102E", "WR" = "#0076B6",
               "RB" = "#FFB612", "OL" = "#006778")
  ) +
  scale_x_continuous(breaks = seq(22, 35, 2)) +
  labs(
    title = "Performance Age Curves by Position",
    subtitle = "Points indicate peak performance age",
    x = "Age",
    y = "Performance Rating",
    color = "Position",
    caption = "Stylized curves from assumed parameters, not fitted to historical data"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "top"
  )

#| label: fig-age-curves-py
#| fig-cap: "Age curves by position (Python)"
#| fig-width: 12
#| fig-height: 6

plt.figure(figsize=(12, 6))

colors = {'QB': '#C8102E', 'WR': '#0076B6', 'RB': '#FFB612', 'OL': '#006778'}

for position in positions:
    data = age_curves[age_curves['position'] == position]
    plt.plot(data['age'], data['performance'], color=colors[position],
             linewidth=2.5, marker='o', markersize=5, label=position)

    # Mark peak
    peak = peak_ages[peak_ages['position'] == position].iloc[0]
    plt.scatter(peak['age'], peak['performance'], color=colors[position],
                s=200, edgecolor='white', linewidth=3, zorder=5)

plt.xlabel('Age', fontsize=12)
plt.ylabel('Performance Rating', fontsize=12)
plt.title('Performance Age Curves by Position\nPoints indicate peak performance age',
          fontsize=14, fontweight='bold')
plt.legend(title='Position', loc='upper right')
plt.xticks(range(22, 36, 2))
plt.grid(True, alpha=0.3)
plt.text(0.98, 0.02, 'Stylized curves from assumed parameters, not fitted to historical data',
         transform=plt.gca().transAxes, ha='right', fontsize=8, style='italic')
plt.tight_layout()
plt.show()

Team Age Distribution Analysis

#| label: team-age-dist-r
#| message: false
#| warning: false
#| cache: true

# Simulate age distributions for different team strategies
set.seed(789)

# Championship window team (older, win-now)
win_now <- tibble(
  team = "Win-Now Team",
  player = 1:53,
  age = c(
    rnorm(10, 29, 2),  # Veterans
    rnorm(20, 26, 2),  # Prime players
    rnorm(15, 24, 1.5), # Young starters
    rnorm(8, 23, 1)    # Rookies/depth
  )
) %>%
  mutate(age = round(pmin(35, pmax(22, age))))

# Rebuilding team (younger)
rebuild <- tibble(
  team = "Rebuilding Team",
  player = 1:53,
  age = c(
    rnorm(5, 29, 2),   # Few veterans
    rnorm(12, 26, 2),  # Some prime
    rnorm(20, 24, 1.5), # Many young starters
    rnorm(16, 23, 1)   # Many rookies
  )
) %>%
  mutate(age = round(pmin(35, pmax(22, age))))

# Balanced team
balanced_team <- tibble(
  team = "Balanced Team",
  player = 1:53,
  age = c(
    rnorm(7, 29, 2),
    rnorm(18, 26, 2),
    rnorm(18, 24, 1.5),
    rnorm(10, 23, 1)
  )
) %>%
  mutate(age = round(pmin(35, pmax(22, age))))

all_teams <- bind_rows(win_now, rebuild, balanced_team)

# Summary statistics
age_summary <- all_teams %>%
  group_by(team) %>%
  summarise(
    mean_age = mean(age),
    median_age = median(age),
    pct_under_25 = mean(age < 25) * 100,
    pct_over_29 = mean(age > 29) * 100,
    .groups = "drop"
  )

age_summary %>%
  gt() %>%
  cols_label(
    team = "Team Strategy",
    mean_age = "Mean Age",
    median_age = "Median Age",
    pct_under_25 = "% Under 25",
    pct_over_29 = "% Over 29"
  ) %>%
  fmt_number(columns = c(mean_age, median_age), decimals = 1) %>%
  fmt_number(columns = c(pct_under_25, pct_over_29), decimals = 0) %>%
  tab_header(
    title = "Team Age Distribution by Strategy"
  )

#| label: team-age-dist-py
#| message: false
#| warning: false

np.random.seed(789)

# Win-now team
win_now_ages = np.concatenate([
    np.random.normal(29, 2, 10),
    np.random.normal(26, 2, 20),
    np.random.normal(24, 1.5, 15),
    np.random.normal(23, 1, 8)
])
win_now_ages = np.clip(np.round(win_now_ages), 22, 35)

# Rebuilding team
rebuild_ages = np.concatenate([
    np.random.normal(29, 2, 5),
    np.random.normal(26, 2, 12),
    np.random.normal(24, 1.5, 20),
    np.random.normal(23, 1, 16)
])
rebuild_ages = np.clip(np.round(rebuild_ages), 22, 35)

# Balanced team
balanced_ages = np.concatenate([
    np.random.normal(29, 2, 7),
    np.random.normal(26, 2, 18),
    np.random.normal(24, 1.5, 18),
    np.random.normal(23, 1, 10)
])
balanced_ages = np.clip(np.round(balanced_ages), 22, 35)

# Create dataframe
all_teams = pd.DataFrame({
    'team': ['Win-Now Team'] * 53 + ['Rebuilding Team'] * 53 + ['Balanced Team'] * 53,
    'age': np.concatenate([win_now_ages, rebuild_ages, balanced_ages])
})

# Summary
age_summary = all_teams.groupby('team').agg(
    mean_age=('age', 'mean'),
    median_age=('age', 'median'),
    pct_under_25=('age', lambda x: (x < 25).mean() * 100),
    pct_over_29=('age', lambda x: (x > 29).mean() * 100)
).reset_index()

print("\nTeam Age Distribution by Strategy")
print("=" * 80)
print(age_summary.to_string(index=False))
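
The summary table is a snapshot, but a competitive window is about how that snapshot moves. As a rough sketch, the code below ages a roster forward with no turnover and tracks the share of players older than 29. The ten-player `core_ages` list is hypothetical and is not drawn from the simulated rosters above; `pct_over` and `age_forward` are illustrative helper names.

```python
def pct_over(ages, threshold=29):
    """Share of players (in percent) older than threshold."""
    return 100.0 * sum(a > threshold for a in ages) / len(ages)


def age_forward(ages, years):
    """Age every player by `years` seasons, assuming no roster turnover."""
    return [a + years for a in ages]


# Hypothetical 10-player core (assumption for illustration only)
core_ages = [23, 24, 25, 26, 26, 27, 28, 28, 29, 31]

for years_ahead in range(4):
    projected = age_forward(core_ages, years_ahead)
    mean_age = sum(projected) / len(projected)
    print(f"+{years_ahead} yr: mean age {mean_age:.1f}, "
          f"over 29: {pct_over(projected):.0f}%")
```

In this toy example the over-29 share climbs from 10% to 50% within three seasons, which is why a roster that looks balanced today can still face simultaneous declines if the core is clustered in its late twenties.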

Age Pyramid Visualization

#| label: fig-age-pyramid-r
#| fig-cap: "Age distribution pyramids show team building philosophy"
#| fig-width: 12
#| fig-height: 8

# Create age distribution histogram
ggplot(all_teams, aes(x = age, fill = team)) +
  geom_histogram(binwidth = 1, alpha = 0.7, position = "identity") +
  facet_wrap(~team, ncol = 1) +
  scale_fill_manual(
    values = c("Win-Now Team" = "#C8102E",
               "Rebuilding Team" = "#0076B6",
               "Balanced Team" = "#FFB612")
  ) +
  scale_x_continuous(breaks = seq(22, 35, 2)) +
  labs(
    title = "Team Age Distribution by Strategy",
    subtitle = "Win-now teams skew older, rebuilding teams skew younger",
    x = "Player Age",
    y = "Number of Players",
    fill = "Team Type"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "none",
    strip.text = element_text(face = "bold", size = 11)
  )

#| label: fig-age-pyramid-py
#| fig-cap: "Age distribution by team strategy (Python)"
#| fig-width: 12
#| fig-height: 8

fig, axes = plt.subplots(3, 1, figsize=(12, 10))

teams_to_plot = [
    ('Win-Now Team', '#C8102E'),
    ('Rebuilding Team', '#0076B6'),
    ('Balanced Team', '#FFB612')
]

for idx, (team_name, color) in enumerate(teams_to_plot):
    data = all_teams[all_teams['team'] == team_name]['age']
    # Half-integer bin edges center each bar on its integer age, matching the tick labels
    axes[idx].hist(data, bins=np.arange(21.5, 36.5, 1.0), alpha=0.7, color=color, edgecolor='black')
    axes[idx].set_title(team_name, fontsize=12, fontweight='bold')
    axes[idx].set_ylabel('Number of Players', fontsize=10)
    axes[idx].set_xticks(range(22, 36, 2))
    axes[idx].grid(True, alpha=0.3, axis='y')

axes[2].set_xlabel('Player Age', fontsize=11)

fig.suptitle('Team Age Distribution by Strategy\nWin-now teams skew older, rebuilding teams skew younger',
             fontsize=14, fontweight='bold', y=0.995)
plt.tight_layout()
plt.show()

Championship Roster Case Studies

Analyzing Recent Super Bowl Winners

#| label: championship-analysis-r
#| message: false
#| warning: false

# Analyze key characteristics of recent champions
champions <- tibble(
  team = c("2022 Chiefs", "2021 Rams", "2020 Buccaneers", "2019 Chiefs",
           "2018 Patriots", "2017 Eagles"),
  season = c(2022, 2021, 2020, 2019, 2018, 2017),
  # Key roster metrics
  avg_age = c(26.3, 27.1, 27.8, 26.1, 26.9, 26.5),
  draft_pct = c(62, 48, 54, 65, 58, 61),  # % from draft
  fa_pct = c(38, 52, 46, 35, 42, 39),     # % from FA
  top5_salary_pct = c(32, 38, 35, 30, 33, 31),  # Top 5 players % of cap
  qb_salary_pct = c(18, 24, 8, 5, 14, 7),  # QB % of cap (rookie QB = low)
  homegrown_stars = c(4, 2, 3, 5, 3, 4),  # Pro Bowl level drafted players
  acquired_stars = c(2, 4, 3, 1, 2, 2)    # Pro Bowl level FA/trade
) %>%
  mutate(
    star_approach = ifelse(top5_salary_pct > 33, "Star-Heavy", "Balanced"),
    build_method = ifelse(draft_pct > 55, "Draft-Built", "Hybrid")
  )

# Summary table
champions %>%
  select(team, season, avg_age, draft_pct, qb_salary_pct, star_approach, build_method) %>%
  gt() %>%
  cols_label(
    team = "Champion",
    season = "Season",
    avg_age = "Avg Age",
    draft_pct = "% Drafted",
    qb_salary_pct = "QB % Cap",
    star_approach = "Approach",
    build_method = "Build Method"
  ) %>%
  fmt_number(columns = avg_age, decimals = 1) %>%
  fmt_number(columns = c(draft_pct, qb_salary_pct), decimals = 0) %>%
  tab_header(
    title = "Recent Super Bowl Champions",
    subtitle = "Key roster construction characteristics"
  ) %>%
  data_color(
    columns = draft_pct,
    colors = scales::col_numeric(
      palette = c("white", "#0076B6"),
      domain = c(40, 70)
    )
  )

# Pattern analysis
pattern_summary <- champions %>%
  summarise(
    avg_draft_pct = mean(draft_pct),
    avg_age = mean(avg_age),
    avg_qb_cap = mean(qb_salary_pct),
    pct_draft_built = mean(build_method == "Draft-Built") * 100,
    pct_balanced = mean(star_approach == "Balanced") * 100
  )

pattern_summary %>%
  gt() %>%
  cols_label(
    avg_draft_pct = "Avg % Drafted",
    avg_age = "Avg Age",
    avg_qb_cap = "Avg QB % Cap",
    pct_draft_built = "% Draft-Built Teams",
    pct_balanced = "% Balanced Rosters"
  ) %>%
  fmt_number(columns = c(avg_draft_pct, avg_qb_cap, pct_draft_built, pct_balanced),
             decimals = 0) %>%
  fmt_number(columns = avg_age, decimals = 1) %>%
  tab_header(
    title = "Championship Team Patterns",
    subtitle = "Common characteristics across recent winners"
  )

#| label: championship-analysis-py
#| message: false
#| warning: false

# Championship team data
champions_data = {
    'team': ['2022 Chiefs', '2021 Rams', '2020 Buccaneers', '2019 Chiefs',
             '2018 Patriots', '2017 Eagles'],
    'season': [2022, 2021, 2020, 2019, 2018, 2017],
    'avg_age': [26.3, 27.1, 27.8, 26.1, 26.9, 26.5],
    'draft_pct': [62, 48, 54, 65, 58, 61],
    'fa_pct': [38, 52, 46, 35, 42, 39],
    'top5_salary_pct': [32, 38, 35, 30, 33, 31],
    'qb_salary_pct': [18, 24, 8, 5, 14, 7],
    'homegrown_stars': [4, 2, 3, 5, 3, 4],
    'acquired_stars': [2, 4, 3, 1, 2, 2]
}

champions = pd.DataFrame(champions_data)
champions['star_approach'] = champions['top5_salary_pct'].apply(
    lambda x: 'Star-Heavy' if x > 33 else 'Balanced'
)
champions['build_method'] = champions['draft_pct'].apply(
    lambda x: 'Draft-Built' if x > 55 else 'Hybrid'
)

print("\nRecent Super Bowl Champions")
print("=" * 90)
print(champions[['team', 'season', 'avg_age', 'draft_pct', 'qb_salary_pct',
                 'star_approach', 'build_method']].to_string(index=False))

# Patterns
print("\n\nChampionship Team Patterns")
print("=" * 60)
print(f"Average % Drafted: {champions['draft_pct'].mean():.0f}%")
print(f"Average Age: {champions['avg_age'].mean():.1f}")
print(f"Average QB % Cap: {champions['qb_salary_pct'].mean():.0f}%")
print(f"% Draft-Built Teams: {(champions['build_method'] == 'Draft-Built').mean() * 100:.0f}%")
print(f"% Balanced Rosters: {(champions['star_approach'] == 'Balanced').mean() * 100:.0f}%")

Championship Team Insights

The six recent Super Bowl winners in this sample show common patterns:
- Draft-centric: about 58% of the roster came from the draft on average
- Relatively young: roster average ages between 26.1 and 27.8
- QB flexibility: quarterback cap share ranges from 5% to 24%, a mix of expensive veterans and cost-controlled rookie contracts
- Balanced approach: four of the six avoid star-heavy construction (top-five cap share above 33%)
- Homegrown core: typically 3-4 Pro Bowl-level players developed internally
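
The "Balanced" and "Draft-Built" labels depend on the 33% and 55% cutoffs chosen in the code above, and with only six teams a one-point shift can move a team across the line. The sketch below, which copies two columns from the champions table, shows how the shares change as the cutoffs move. The helper names are introduced here for illustration.

```python
# Values copied from the champions table above (same team order)
top5_salary_pct = [32, 38, 35, 30, 33, 31]
draft_pct = [62, 48, 54, 65, 58, 61]


def share_at_or_below(values, cutoff):
    """Percent of teams whose value does not exceed the cutoff."""
    return 100.0 * sum(v <= cutoff for v in values) / len(values)


def share_above(values, cutoff):
    """Percent of teams whose value exceeds the cutoff."""
    return 100.0 * sum(v > cutoff for v in values) / len(values)


print("Share labeled Balanced as the star-heavy cutoff moves")
for cutoff in (31, 32, 33, 34, 35):
    share = share_at_or_below(top5_salary_pct, cutoff)
    print(f"  top-5 cap share <= {cutoff}%: {share:.0f}% of champions")

print("Share labeled Draft-Built as the draft cutoff moves")
for cutoff in (50, 55, 60):
    share = share_above(draft_pct, cutoff)
    print(f"  drafted share > {cutoff}%: {share:.0f}% of champions")
```

The "most champions are balanced" reading holds at the 33% cutoff but drops to half the sample at 32%, so the pattern is better treated as suggestive than as a firm rule.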

Roster Optimization Models

Linear Programming Approach

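We can formulate roster construction as a constrained optimization problem: choose the set of players that maximizes total performance subject to the salary cap, the 53-man roster limit, and positional minimums and maximums. The code below approximates a solution with a greedy value-per-dollar heuristic rather than solving the program exactly.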
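
One way to write the problem down, using player performance as a stand-in for expected wins, is as a binary integer program. The notation is introduced here for illustration: $x_i = 1$ if player $i$ is rostered, $p_i$ and $s_i$ are that player's performance and salary, $C$ is the cap ($224.8M), $N$ is the roster limit (53), and $m_k$, $M_k$ are the minimum and maximum counts for position $k$ with candidate set $\mathcal{P}_k$.

```latex
\begin{aligned}
\max_{x \in \{0,1\}^{n}} \quad & \sum_{i=1}^{n} p_i \, x_i \\
\text{subject to} \quad & \sum_{i=1}^{n} s_i \, x_i \le C, \\
& \sum_{i=1}^{n} x_i \le N, \\
& m_k \le \sum_{i \in \mathcal{P}_k} x_i \le M_k \quad \text{for each position } k .
\end{aligned}
```

Because each $x_i$ must be 0 or 1, this is an integer linear program rather than a pure linear program; relaxing $x_i$ to the interval $[0, 1]$ gives the linear programming relaxation, whose solution may include fractional players.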

#| label: optimization-model-r
#| message: false
#| warning: false

# Simplified roster optimization model
# Objective: Maximize expected wins subject to cap and roster constraints

# Define player pool (simplified)
set.seed(999)

player_pool <- tibble(
  player_id = 1:100,
  position = sample(c("QB", "RB", "WR", "TE", "OL", "DL", "LB", "CB", "S"),
                    100, replace = TRUE,
                    prob = c(0.06, 0.08, 0.15, 0.06, 0.20, 0.15, 0.10, 0.12, 0.08)),
  performance = rnorm(100, 60, 15),
  performance = pmin(95, pmax(30, performance)),
  salary = case_when(
    performance > 80 ~ rnorm(n(), 15, 3),
    performance > 70 ~ rnorm(n(), 10, 2),
    performance > 60 ~ rnorm(n(), 6, 1.5),
    performance > 50 ~ rnorm(n(), 3, 1),
    TRUE ~ rnorm(n(), 1.5, 0.5)
  ),
  salary = pmax(0.9, salary),
  age = round(22 + (performance - 30) / 3 + rnorm(n(), 0, 2)),
  age = pmin(35, pmax(22, age))
) %>%
  arrange(desc(performance))

# Position requirements
position_needs <- tribble(
  ~position, ~min_players, ~max_players,
  "QB",      2,            4,
  "RB",      3,            6,
  "WR",      5,            8,
  "TE",      2,            4,
  "OL",      8,            12,
  "DL",      6,            10,
  "LB",      5,            8,
  "CB",      5,            8,
  "S",       4,            6
)

# Simple greedy algorithm (not true optimization but illustrative)
build_optimal_roster <- function(players, salary_cap = 224.8, roster_size = 53) {
  # Calculate value per dollar
  players <- players %>%
    mutate(value_per_dollar = performance / salary)

  # Start from an empty frame with the same columns so the filters below work
  selected <- players[0, ]
  remaining_cap <- salary_cap
  remaining_spots <- roster_size

  # First pass: Meet minimum requirements
  for (pos in position_needs$position) {
    min_needed <- position_needs$min_players[position_needs$position == pos]

    pos_players <- players %>%
      filter(position == pos, !player_id %in% selected$player_id) %>%
      arrange(desc(value_per_dollar)) %>%
      head(min_needed)

    if (nrow(pos_players) > 0 && sum(pos_players$salary) <= remaining_cap) {
      selected <- bind_rows(selected, pos_players)
      remaining_cap <- remaining_cap - sum(pos_players$salary)
      remaining_spots <- remaining_spots - nrow(pos_players)
    }
  }

  # Second pass: Fill remaining spots with best value, respecting position maximums
  while (remaining_spots > 0) {
    # Positions that have already reached max_players
    full_positions <- selected %>%
      count(position) %>%
      inner_join(position_needs, by = "position") %>%
      filter(n >= max_players) %>%
      pull(position)

    best_player <- players %>%
      filter(!player_id %in% selected$player_id,
             !position %in% full_positions,
             salary <= remaining_cap) %>%
      arrange(desc(value_per_dollar)) %>%
      head(1)

    # Stop when no affordable, eligible player remains
    if (nrow(best_player) == 0) break

    selected <- bind_rows(selected, best_player)
    remaining_cap <- remaining_cap - best_player$salary
    remaining_spots <- remaining_spots - 1
  }

  return(selected)
}

# Build optimal roster
optimal_roster <- build_optimal_roster(player_pool)

# Analyze results
roster_summary <- optimal_roster %>%
  summarise(
    total_players = n(),
    total_salary = sum(salary),
    avg_performance = mean(performance),
    total_performance = sum(performance),
    avg_age = mean(age),
    cap_space = 224.8 - total_salary
  )

roster_summary %>%
  gt() %>%
  cols_label(
    total_players = "Players",
    total_salary = "Total Salary ($M)",
    avg_performance = "Avg Performance",
    total_performance = "Total Performance",
    avg_age = "Avg Age",
    cap_space = "Cap Space ($M)"
  ) %>%
  fmt_number(columns = c(avg_performance, total_performance, avg_age), decimals = 1) %>%
  fmt_currency(columns = c(total_salary, cap_space), decimals = 1) %>%
  tab_header(
    title = "Optimized Roster Summary",
    subtitle = "Greedy algorithm maximizing performance per dollar"
  )

# Position breakdown
position_breakdown <- optimal_roster %>%
  group_by(position) %>%
  summarise(
    count = n(),
    total_salary = sum(salary),
    avg_performance = mean(performance),
    .groups = "drop"
  ) %>%
  arrange(desc(total_salary))

position_breakdown %>%
  gt() %>%
  cols_label(
    position = "Position",
    count = "Players",
    total_salary = "Total Salary ($M)",
    avg_performance = "Avg Performance"
  ) %>%
  fmt_currency(columns = total_salary, decimals = 1) %>%
  fmt_number(columns = avg_performance, decimals = 1) %>%
  tab_header(
    title = "Optimized Roster by Position"
  )

#| label: optimization-model-py
#| message: false
#| warning: false

np.random.seed(999)

# Create player pool
positions = ["QB", "RB", "WR", "TE", "OL", "DL", "LB", "CB", "S"]
position_probs = [0.06, 0.08, 0.15, 0.06, 0.20, 0.15, 0.10, 0.12, 0.08]

player_positions = np.random.choice(positions, size=100, p=position_probs)
performance = np.clip(np.random.normal(60, 15, 100), 30, 95)

salary = np.where(performance > 80, np.random.normal(15, 3, 100),
         np.where(performance > 70, np.random.normal(10, 2, 100),
         np.where(performance > 60, np.random.normal(6, 1.5, 100),
         np.where(performance > 50, np.random.normal(3, 1, 100),
                  np.random.normal(1.5, 0.5, 100)))))
salary = np.maximum(0.9, salary)

age = np.clip(np.round(22 + (performance - 30) / 3 + np.random.normal(0, 2, 100)), 22, 35)

player_pool = pd.DataFrame({
    'player_id': range(1, 101),
    'position': player_positions,
    'performance': performance,
    'salary': salary,
    'age': age,
    'value_per_dollar': performance / salary
}).sort_values('performance', ascending=False)

# Position requirements
position_needs = pd.DataFrame({
    'position': ['QB', 'RB', 'WR', 'TE', 'OL', 'DL', 'LB', 'CB', 'S'],
    'min_players': [2, 3, 5, 2, 8, 6, 5, 5, 4],
    'max_players': [4, 6, 8, 4, 12, 10, 8, 8, 6]
})

# Greedy roster builder
def build_optimal_roster(players, salary_cap=224.8, roster_size=53):
    selected = []
    remaining_cap = salary_cap
    remaining_spots = roster_size

    # Meet minimum requirements
    for _, need in position_needs.iterrows():
        pos = need['position']
        min_needed = need['min_players']

        pos_players = (players[
            (players['position'] == pos) &
            (~players['player_id'].isin([p['player_id'] for p in selected]))
        ].sort_values('value_per_dollar', ascending=False)
         .head(min_needed))

        if len(pos_players) > 0 and pos_players['salary'].sum() <= remaining_cap:
            selected.extend(pos_players.to_dict('records'))
            remaining_cap -= pos_players['salary'].sum()
            remaining_spots -= len(pos_players)

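    # Fill remaining spots, skipping positions already at max_players
    selected_ids = [p['player_id'] for p in selected]
    max_by_pos = dict(zip(position_needs['position'], position_needs['max_players']))
    pos_counts = {}
    for p in selected:
        pos_counts[p['position']] = pos_counts.get(p['position'], 0) + 1

    while remaining_spots > 0:
        full = [pos for pos, limit in max_by_pos.items()
                if pos_counts.get(pos, 0) >= limit]
        available = players[
            (~players['player_id'].isin(selected_ids)) &
            (~players['position'].isin(full)) &
            (players['salary'] <= remaining_cap)
        ].sort_values('value_per_dollar', ascending=False)

        if len(available) == 0:
            break

        best = available.iloc[0].to_dict()
        selected.append(best)
        selected_ids.append(best['player_id'])
        pos_counts[best['position']] = pos_counts.get(best['position'], 0) + 1
        remaining_cap -= best['salary']
        remaining_spots -= 1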

    return pd.DataFrame(selected)

# Build roster
optimal_roster = build_optimal_roster(player_pool)

# Summary
print("\nOptimized Roster Summary")
print("=" * 70)
print(f"Total Players: {len(optimal_roster)}")
print(f"Total Salary: ${optimal_roster['salary'].sum():.1f}M")
print(f"Avg Performance: {optimal_roster['performance'].mean():.1f}")
print(f"Total Performance: {optimal_roster['performance'].sum():.1f}")
print(f"Avg Age: {optimal_roster['age'].mean():.1f}")
print(f"Cap Space: ${224.8 - optimal_roster['salary'].sum():.1f}M")

# Position breakdown
print("\n\nOptimized Roster by Position")
print("=" * 70)
position_breakdown = optimal_roster.groupby('position').agg(
    count=('player_id', 'count'),
    total_salary=('salary', 'sum'),
    avg_performance=('performance', 'mean')
).reset_index().sort_values('total_salary', ascending=False)
print(position_breakdown.to_string(index=False))
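
The greedy heuristic is fast, but ranking by performance per dollar favors cheap players and can leave both cap space and performance on the table. The sketch below makes that concrete on a hypothetical four-player pool with a toy cap and roster size, comparing the greedy pick against an exhaustive search. All names and numbers are assumptions for illustration, and the exhaustive search is only practical for tiny pools.

```python
from itertools import combinations

# Hypothetical pool: (name, performance, salary in $M)
pool = [
    ("A", 90, 15.0),
    ("B", 70, 8.0),
    ("C", 68, 8.0),
    ("D", 45, 1.0),
]
CAP = 16.0        # toy salary cap in $M
ROSTER_SIZE = 2   # toy roster limit


def greedy_by_value(players, cap, size):
    """Take players in descending performance-per-dollar order while they fit."""
    chosen, remaining = [], cap
    for player in sorted(players, key=lambda p: p[1] / p[2], reverse=True):
        if len(chosen) < size and player[2] <= remaining:
            chosen.append(player)
            remaining -= player[2]
    return chosen


def exhaustive_best(players, cap, size):
    """Check every roster of up to `size` players; keep the best one under the cap."""
    best, best_perf = [], 0
    for r in range(1, size + 1):
        for combo in combinations(players, r):
            salary = sum(p[2] for p in combo)
            perf = sum(p[1] for p in combo)
            if salary <= cap and perf > best_perf:
                best, best_perf = list(combo), perf
    return best


def total_perf(roster):
    """Sum of performance ratings across a roster."""
    return sum(p[1] for p in roster)


greedy = greedy_by_value(pool, CAP, ROSTER_SIZE)
optimal = exhaustive_best(pool, CAP, ROSTER_SIZE)
print("Greedy :", [p[0] for p in greedy], total_perf(greedy))
print("Optimal:", [p[0] for p in optimal], total_perf(optimal))
```

Here the greedy rule spends its first slot on the cheapest player and finishes with a lower total than the exhaustive search. For a realistic pool, the same comparison would call for an integer programming solver rather than enumeration.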

Visualizing the Optimal Roster

#| label: fig-optimal-roster-r
#| fig-cap: "Optimal roster composition balances performance and cost"
#| fig-width: 12
#| fig-height: 6

ggplot(optimal_roster, aes(x = salary, y = performance, color = position)) +
  geom_point(size = 3, alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE, color = "black", linetype = "dashed",
              linewidth = 0.8, alpha = 0.5) +
  scale_color_manual(
    values = c("QB" = "#C8102E", "RB" = "#0076B6", "WR" = "#FFB612",
               "TE" = "#006778", "OL" = "#A5ACAF", "DL" = "#773141",
               "LB" = "#101820", "CB" = "#FF6B35", "S" = "#004E89")
  ) +
  labs(
    title = "Optimized Roster: Performance vs Salary",
    subtitle = "Players selected greedily by performance per dollar, subject to cap and roster limits",
    x = "Annual Salary ($M)",
    y = "Performance Rating",
    color = "Position",
    caption = "Dashed line shows overall performance-salary relationship"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "right"
  )

#| label: fig-optimal-roster-py
#| fig-cap: "Optimal roster visualization (Python)"
#| fig-width: 12
#| fig-height: 6

plt.figure(figsize=(12, 6))

pos_colors = {
    'QB': '#C8102E', 'RB': '#0076B6', 'WR': '#FFB612',
    'TE': '#006778', 'OL': '#A5ACAF', 'DL': '#773141',
    'LB': '#101820', 'CB': '#FF6B35', 'S': '#004E89'
}

for position in optimal_roster['position'].unique():
    pos_data = optimal_roster[optimal_roster['position'] == position]
    plt.scatter(pos_data['salary'], pos_data['performance'],
                c=pos_colors.get(position, 'gray'), label=position,
                s=100, alpha=0.7, edgecolors='black', linewidth=0.5)

# Trend line
z = np.polyfit(optimal_roster['salary'], optimal_roster['performance'], 1)
p = np.poly1d(z)
plt.plot(optimal_roster['salary'].sort_values(),
         p(optimal_roster['salary'].sort_values()),
         "k--", alpha=0.5, linewidth=2, label='Trend')

plt.xlabel('Annual Salary ($M)', fontsize=12)
plt.ylabel('Performance Rating', fontsize=12)
plt.title('Optimized Roster: Performance vs Salary\nPlayers selected greedily by performance per dollar, subject to cap and roster limits',
          fontsize=14, fontweight='bold')
plt.legend(loc='lower right', ncol=2)
plt.grid(True, alpha=0.3)
plt.text(0.98, 0.02, 'Dashed line shows overall performance-salary relationship',
         transform=plt.gca().transAxes, ha='right', fontsize=8, style='italic')
plt.tight_layout()
plt.show()

Special Teams and Practice Squad

Special Teams Roster Allocation

Special teams are often overlooked but crucial for field position and hidden yardage.

#| label: special-teams-r
#| message: false
#| warning: false

# Special teams roster spots and value
st_positions <- tibble(
  specialist = c("Kicker", "Punter", "Long Snapper"),
  roster_spots = c(1, 1, 1),
  avg_salary = c(3.2, 2.1, 1.2),
  replacement_value = c(45, 40, 35),
  starter_value = c(70, 65, 55)
) %>%
  mutate(
    value_over_replacement = starter_value - replacement_value,
    value_per_dollar = value_over_replacement / avg_salary
  )

st_positions %>%
  gt() %>%
  cols_label(
    specialist = "Specialist",
    roster_spots = "Spots",
    avg_salary = "Avg Salary ($M)",
    replacement_value = "Replacement Level",
    starter_value = "Quality Starter",
    value_over_replacement = "Value Over Replacement",
    value_per_dollar = "VOR per $M"
  ) %>%
  fmt_currency(columns = avg_salary, decimals = 1) %>%
  fmt_number(columns = c(replacement_value, starter_value,
                         value_over_replacement, value_per_dollar), decimals = 1) %>%
  tab_header(
    title = "Special Teams Specialist Value",
    subtitle = "Kickers and punters tie on value over replacement; long snappers lead per $M"
  )

#| label: special-teams-py
#| message: false
#| warning: false

st_data = {
    'specialist': ['Kicker', 'Punter', 'Long Snapper'],
    'roster_spots': [1, 1, 1],
    'avg_salary': [3.2, 2.1, 1.2],
    'replacement_value': [45, 40, 35],
    'starter_value': [70, 65, 55]
}

st_positions = pd.DataFrame(st_data)
st_positions['value_over_replacement'] = (st_positions['starter_value'] -
                                           st_positions['replacement_value'])
st_positions['value_per_dollar'] = (st_positions['value_over_replacement'] /
                                     st_positions['avg_salary'])

print("\nSpecial Teams Specialist Value")
print("=" * 90)
print(st_positions.to_string(index=False))

Practice Squad Strategy

Practice Squad Optimization

The practice squad (16 players) serves multiple purposes:
- Development: Young players gaining experience
- Scout team: Simulating opponent tendencies
- Injury insurance: Immediate replacements
- Positional flexibility: Cover multiple positions

Optimal practice squad allocation:
- 3-4 developmental QBs/skill players
- 4-5 offensive linemen (injury-prone position)
- 3-4 defensive backs (scheme versatility)
- 2-3 developmental edge rushers
- 1-2 special teams aces
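
The allocation ranges above do not pin down a single answer, so it is worth checking that they can add up to 16. The short sketch below does that arithmetic and builds one feasible allocation. The group labels and the fill order (top of the list first) are assumptions for illustration, not a recommended priority.

```python
# Position-group ranges from the allocation above: (min, max)
ranges = {
    "QB/skill": (3, 4),
    "OL": (4, 5),
    "DB": (3, 4),
    "Edge": (2, 3),
    "Special teams": (1, 2),
}
SQUAD_SIZE = 16

low = sum(lo for lo, _ in ranges.values())
high = sum(hi for _, hi in ranges.values())
print(f"Ranges allow {low} to {high} players; target is {SQUAD_SIZE}")

# Start every group at its minimum, then add one player per group
# (in listed order) until the squad is full.
allocation = {group: lo for group, (lo, _) in ranges.items()}
for group, (_, hi) in ranges.items():
    if sum(allocation.values()) < SQUAD_SIZE and allocation[group] < hi:
        allocation[group] += 1

print(allocation, "total =", sum(allocation.values()))
```

Since the minimums sum to 13 and the maximums to 18, a team has three discretionary spots to place according to its own injury history and depth chart.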

Summary

Roster construction is a complex optimization problem requiring teams to:

Strategic Decisions:
- Allocate resources across positions based on positional value
- Balance draft picks (cost-controlled) vs free agents (certainty)
- Choose between star-heavy and balanced approaches
- Manage age distribution and championship windows

Key Insights from Analysis:
1. Positional value varies: QB shows highest variance, suggesting concentrated investment
2. Draft provides efficiency: Better value per dollar but higher uncertainty
3. Championship patterns: rosters averaging about 58% drafted players, mostly balanced cap structures, average age 26-28
4. Age curves differ: Position-specific peaks require strategic planning
5. Optimization matters: Systematic approaches outperform ad-hoc decisions

Best Practices:
- Build core through draft (especially QB, OL, DL)
- Use free agency for immediate needs and veteran leadership
- Maintain age balance to avoid simultaneous declines
- Invest heavily in high-variance positions (QB, pass rush, OT)
- Monitor contract efficiency and dead cap
- Develop organizational player development systems

Exercises

Conceptual Questions

Positional Value: Explain why quarterback shows the highest variance in EPA contribution. How should this inform salary allocation strategy?
Draft vs Free Agency: Under what circumstances should a team prioritize free agency over the draft? Consider both competitive window and roster needs.
Age Management: Design a 5-year age management plan for a team with an aging roster (average age 28.5) that wants to remain competitive while getting younger.

Coding Exercises

Exercise 1: Championship Roster Analysis

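Using the championship team data:
a) Calculate the correlation between draft percentage and the other roster metrics (for example, top-five salary share or QB cap share). Every team in the sample won the title, so relating draft percentage to championship success itself would require adding non-champion teams
b) Analyze whether QB salary percentage relates to team building strategy
c) Create visualizations comparing different championship team approaches

Hint: Use correlation analysis and create scatter plots with trend lines.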

Exercise 2: Build Your Optimization Model

Create a roster optimization model that:
a) Maximizes expected team performance
b) Stays under the salary cap
c) Meets minimum position requirements
d) Includes age diversity constraints

Challenge: Add constraints for maximum players from same draft class or college.

Exercise 3: Age Curve Analysis

Using the age curve data:
a) Calculate the optimal contract length for each position given age curves
b) Identify which ages represent "buy low" opportunities
c) Model expected performance decline for players at different ages

Extension: Estimate when to trade aging players before value cliff.

Exercise 4: Draft Value Analysis

Analyze draft pick value:
a) Create your own draft value chart using historical player data
b) Calculate break-even points for trading up in the draft
c) Model expected contract value vs performance for each round

Data: Use roster and contract data from nflreadr (load_rosters(), load_contracts()) combined with performance metrics derived from nflfastR play-by-play data.
