Learning Objectives

By the end of this chapter, you will be able to:

  1. Understand key structural and rule differences between college and NFL football
  2. Compare analytics approaches appropriate for each level of play
  3. Build college-to-NFL player projection models
  4. Analyze how rule differences impact strategy and metrics
  5. Apply NFL analytical methods to college football and vice versa

Introduction

College football and the NFL represent two distinct ecosystems within American football. While they share the same basic rules and objective—moving the ball down the field to score points—the differences between these levels profoundly impact how the game is played, analyzed, and understood.

For analysts, these differences create both challenges and opportunities. Traditional metrics don't always translate cleanly between levels. A dominant college quarterback may struggle in the NFL. A team that dominates in college may use strategies that would fail at the professional level. Understanding these differences is critical for:

  • Player evaluation and projection: Translating college performance to NFL expectations
  • Strategic analysis: Understanding why certain approaches work at one level but not another
  • Metric development: Creating appropriate benchmarks and evaluation frameworks
  • Team building: Identifying which skills and attributes transfer successfully
  • Draft analysis: Projecting future performance from college data

Why These Differences Matter

The college-to-NFL transition represents one of the most significant analytical challenges in football. Billions of dollars in draft capital, signing bonuses, and roster decisions hinge on accurately projecting how college players will perform against NFL competition. Small improvements in projection accuracy can provide enormous competitive advantages.

This chapter explores the fundamental differences between college and NFL football from an analytical perspective, examining how these differences impact metrics, projections, and decision-making at both levels.

Rule Differences and Their Impact

Overtime Rules

One of the most visible differences between college and NFL football is overtime structure.

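NFL Overtime (regular-season rules through the 2024 season):
- 10-minute overtime period in the regular season (15 minutes in the playoffs)
- Modified sudden death: the first score wins, except that a field goal on the opening possession gives the other team a possession to tie or win
- Each team is guaranteed a possession unless the first team scores a touchdown (since the 2022 postseason, playoff overtime guarantees both teams a possession regardless)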
- Regular season games can end in ties

College Overtime:
- Each team gets possession starting at opponent's 25-yard line
- Continues until one team outscores the other in a period
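- Since the 2021 rule change, teams must attempt a two-point conversion after a touchdown starting in the second overtime; from the third overtime on, teams alternate two-point conversion plays instead of full possessions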
- No game clock—purely possession-based
- Games cannot end in ties
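
The period-by-period format can be written down as a small lookup, which is convenient when tagging overtime plays before analysis. The sketch below assumes the college format in place since 2021; the function name `college_ot_format` and the keys it returns are hypothetical and do not come from any data package.

```python
def college_ot_format(period: int) -> dict:
    """Describe the college overtime format for a given overtime period.

    Assumes the format used since 2021. The function name and the
    returned keys are illustrative only.
    """
    if period < 1:
        raise ValueError("overtime periods are numbered from 1")
    if period == 1:
        # Full possession from the 25; kick or two-point try after a touchdown
        return {"start": "opponent 25", "full_possession": True, "must_go_for_two": False}
    if period == 2:
        # Full possession from the 25; two-point try required after a touchdown
        return {"start": "opponent 25", "full_possession": True, "must_go_for_two": True}
    # Third period onward: alternating two-point plays from the 3-yard line
    return {"start": "opponent 3", "full_possession": False, "must_go_for_two": True}


for period in (1, 2, 3):
    print(period, college_ot_format(period))
```

Because the structure changes after the second period, overtime samples are usually easier to interpret when the first two periods are analyzed separately from later ones.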

Analytical Implications:

#| label: setup-r
#| message: false
#| warning: false

library(tidyverse)
library(nflfastR)
library(cfbfastR)
library(gt)
library(patchwork)

# Set theme
theme_set(theme_minimal())
#| label: overtime-analysis-r
#| message: false
#| warning: false
#| cache: true

# NFL overtime win probability
nfl_ot_data <- tibble(
  scenario = c("Receive kickoff", "Kick off", "Win coin toss"),
  win_prob = c(0.542, 0.458, 0.542),
  level = "NFL"
)

# College overtime (simplified): the team that defends first holds a similar edge
college_ot_data <- tibble(
  scenario = c("Defend first", "Offense first", "Win coin toss"),
  win_prob = c(0.548, 0.452, 0.548),
  level = "College"
)

# Combine and visualize
ot_comparison <- bind_rows(nfl_ot_data, college_ot_data) %>%
  filter(scenario != "Win coin toss")

ot_comparison %>%
  ggplot(aes(x = scenario, y = win_prob, fill = level)) +
  geom_col(position = "dodge", alpha = 0.8) +
  geom_hline(yintercept = 0.5, linetype = "dashed", alpha = 0.5) +
  scale_y_continuous(labels = scales::percent_format(), limits = c(0, 0.6)) +
  scale_fill_manual(values = c("NFL" = "#013369", "College" = "#841617")) +
  labs(
    title = "Overtime Win Probability by Scenario",
    subtitle = "Receiving/defending first provides similar advantages at both levels",
    x = NULL,
    y = "Win Probability",
    fill = "Level",
    caption = "Note: NFL data from 2012-2023; College data from 2013-2023"
  ) +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "top"
  )
#| label: setup-py
#| message: false
#| warning: false

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# Set style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
#| label: overtime-analysis-py
#| message: false
#| warning: false

# Create overtime comparison data
nfl_ot = pd.DataFrame({
    'scenario': ['Receive kickoff', 'Kick off'],
    'win_prob': [0.542, 0.458],
    'level': 'NFL'
})

college_ot = pd.DataFrame({
    'scenario': ['Defend first', 'Offense first'],
    'win_prob': [0.548, 0.452],
    'level': 'College'
})

ot_comparison = pd.concat([nfl_ot, college_ot], ignore_index=True)

# Create visualization
fig, ax = plt.subplots(figsize=(10, 6))

x = np.arange(len(nfl_ot))
width = 0.35

nfl_bars = ax.bar(x - width/2, nfl_ot['win_prob'], width,
                  label='NFL', alpha=0.8, color='#013369')
college_bars = ax.bar(x + width/2, college_ot['win_prob'], width,
                      label='College', alpha=0.8, color='#841617')

ax.axhline(y=0.5, color='gray', linestyle='--', alpha=0.5)
ax.set_ylabel('Win Probability', fontsize=12)
ax.set_title('Overtime Win Probability by Scenario\n' +
             'Receiving/defending first provides similar advantages at both levels',
             fontsize=14, fontweight='bold', pad=20)
ax.set_xticks(x)
ax.set_xticklabels(['NFL: receive kickoff\nCollege: defend first',
                    'NFL: kick off\nCollege: offense first'])
ax.legend(loc='upper right')
ax.set_ylim(0, 0.6)
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))

plt.text(0.98, 0.02, 'Note: NFL data from 2012-2023; College data from 2013-2023',
         transform=ax.transAxes, ha='right', fontsize=8, style='italic')
plt.tight_layout()
plt.show()

Hash Mark Differences

The width between hash marks differs significantly:

  • NFL: 18 feet, 6 inches apart (closer to center)
  • College: 40 feet apart (wider)

Impact on Strategy:

In college, plays starting on the hash marks are much closer to the sideline, creating:
- Wider field imbalance: Offense has much more space to one side
- Directional play-calling: Greater tendency to attack the wide side
- Formation adjustments: More offset and unbalanced sets
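
The size of the imbalance follows directly from the field dimensions. The sketch below assumes the standard 160-foot (53 1/3 yard) field width used at both levels; the helper name `hash_geometry` is illustrative.

```python
FIELD_WIDTH_FT = 160.0  # 53 1/3 yards, the same at both levels

# Distance between the two rows of hash marks, in feet
HASH_SPACING_FT = {"NFL": 18.5, "College": 40.0}


def hash_geometry(spacing_ft: float, field_width_ft: float = FIELD_WIDTH_FT) -> dict:
    """Space on each side of a ball spotted on a hash mark."""
    short_side = (field_width_ft - spacing_ft) / 2  # hash to the near sideline
    wide_side = field_width_ft - short_side         # hash to the far sideline
    return {
        "short_side_ft": short_side,
        "wide_side_ft": wide_side,
        "wide_side_share": wide_side / field_width_ft,
    }


for level, spacing in HASH_SPACING_FT.items():
    g = hash_geometry(spacing)
    print(f"{level}: short side {g['short_side_ft']:.2f} ft, "
          f"wide side {g['wide_side_ft']:.2f} ft, "
          f"wide-side share {g['wide_side_share']:.1%}")
```

With these dimensions, a ball on a college hash leaves 100 feet to the wide side and 60 feet to the short side, compared with 89.25 and 70.75 feet in the NFL.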

#| label: hash-mark-impact-r
#| message: false
#| warning: false
#| eval: false

# Simulated data showing directional tendency differences
set.seed(42)

# Generate simulated play direction data
hash_impact <- tibble(
  level = rep(c("NFL", "College"), each = 1000),
  hash_position = rep(c("Left hash", "Right hash", "Middle"),
                     length.out = 2000),
  play_direction = sample(c("Left", "Right", "Middle"), 2000, replace = TRUE,
                         prob = c(0.33, 0.33, 0.34))
) %>%
  mutate(
    # In college, stronger tendency to go to wide side from hash
    play_direction = case_when(
      level == "College" & hash_position == "Left hash" &
        runif(n()) > 0.4 ~ "Right",
      level == "College" & hash_position == "Right hash" &
        runif(n()) > 0.4 ~ "Left",
      TRUE ~ play_direction
    )
  )

# Calculate directional tendencies by hash position
hash_tendencies <- hash_impact %>%
  filter(hash_position != "Middle") %>%
  mutate(to_wide_side = case_when(
    hash_position == "Left hash" & play_direction == "Right" ~ TRUE,
    hash_position == "Right hash" & play_direction == "Left" ~ TRUE,
    TRUE ~ FALSE
  )) %>%
  group_by(level, hash_position) %>%
  summarise(
    pct_to_wide_side = mean(to_wide_side),
    .groups = "drop"
  )

hash_tendencies %>%
  ggplot(aes(x = hash_position, y = pct_to_wide_side, fill = level)) +
  geom_col(position = "dodge", alpha = 0.8) +
  geom_hline(yintercept = 0.5, linetype = "dashed", alpha = 0.5) +
  scale_y_continuous(labels = scales::percent_format()) +
  scale_fill_manual(values = c("NFL" = "#013369", "College" = "#841617")) +
  labs(
    title = "Tendency to Attack Wide Side from Hash Marks",
    subtitle = "College's wider hashes create stronger directional tendencies",
    x = "Hash Mark Position",
    y = "% of Plays to Wide Side",
    fill = "Level"
  )
#| label: hash-mark-impact-py
#| message: false
#| warning: false
#| eval: false

# Simulated data showing directional tendency differences
np.random.seed(42)

# Generate simulated play direction data
hash_data = []
for level in ['NFL', 'College']:
    for _ in range(1000):
        hash_pos = np.random.choice(['Left hash', 'Right hash', 'Middle'])

        # College has stronger tendency to wide side
        if level == 'College' and hash_pos == 'Left hash':
            direction = np.random.choice(['Right', 'Left', 'Middle'],
                                        p=[0.55, 0.25, 0.20])
        elif level == 'College' and hash_pos == 'Right hash':
            direction = np.random.choice(['Left', 'Right', 'Middle'],
                                        p=[0.55, 0.25, 0.20])
        else:
            direction = np.random.choice(['Left', 'Right', 'Middle'],
                                        p=[0.33, 0.33, 0.34])

        hash_data.append({
            'level': level,
            'hash_position': hash_pos,
            'play_direction': direction
        })

hash_df = pd.DataFrame(hash_data)

# Calculate tendencies
hash_df['to_wide_side'] = (
    ((hash_df['hash_position'] == 'Left hash') &
     (hash_df['play_direction'] == 'Right')) |
    ((hash_df['hash_position'] == 'Right hash') &
     (hash_df['play_direction'] == 'Left'))
)

hash_summary = (hash_df[hash_df['hash_position'] != 'Middle']
                .groupby(['level', 'hash_position'])
                ['to_wide_side'].mean()
                .reset_index()
                .rename(columns={'to_wide_side': 'pct_to_wide_side'}))

print("\nDirectional Tendencies from Hash Marks:")
print(hash_summary)

Clock and Timing Rules

Clock Stoppages:

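Scenario             NFL              College
First down           Continues        Stops until ball is set [a]
Out of bounds        Stops [b]        Stops [c]
Incomplete pass      Stops            Stops
Timeouts per half    3                3

[a] Since the 2023 season, only inside the final two minutes of each half.
[b] Restarts on the snap only in the final two minutes of the first half and the final five minutes of the second half; otherwise on the referee's signal.
[c] Restarts on the snap only in the final two minutes of each half; otherwise on the referee's signal.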

Play Clock:
- NFL: 40 seconds between plays (25 seconds after certain stoppages)
- College: 40 seconds (25 seconds after timeouts and other administrative stoppages)

Impact on Pace:

#| label: pace-comparison-r
#| message: false
#| warning: false
#| cache: true
#| eval: false

# Simulated pace of play comparison
pace_data <- tibble(
  level = c("NFL", "College"),
  avg_plays_per_game = c(128, 142),
  avg_time_per_play = c(28.1, 25.4),
  avg_game_length = c(180, 195)
)

pace_data %>%
  gt() %>%
  cols_label(
    level = "Level",
    avg_plays_per_game = "Plays/Game (both teams)",
    avg_time_per_play = "Seconds/Play",
    avg_game_length = "Game Length (min)"
  ) %>%
  fmt_number(
    columns = c(avg_plays_per_game, avg_time_per_play, avg_game_length),
    decimals = 1
  ) %>%
  tab_header(
    title = "Pace of Play Comparison",
    subtitle = "College football features more plays and longer games"
  ) %>%
  tab_source_note(
    source_note = "Illustrative values (simulated), not measured season averages"
  )
#| label: pace-comparison-py
#| message: false
#| warning: false
#| eval: false

# Pace of play comparison
pace_data = pd.DataFrame({
    'Level': ['NFL', 'College'],
    'Plays/Game (both teams)': [128, 142],
    'Seconds/Play': [28.1, 25.4],
    'Game Length (min)': [180, 195]
})

print("\nPace of Play Comparison:")
print(pace_data.to_string(index=False))
print("\nCollege football features more plays and longer games")
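
The seconds-per-play values in this comparison appear to be game-clock seconds: dividing a 60-minute game clock by the plays-per-game figures reproduces them. The quick check below assumes the plays-per-game figures count snaps by both teams; the helper name is illustrative.

```python
GAME_CLOCK_SECONDS = 60 * 60  # four 15-minute quarters at both levels

# Combined plays per game, taken from the comparison above
PLAYS_PER_GAME = {"NFL": 128, "College": 142}


def clock_seconds_per_play(total_plays: int) -> float:
    """Game-clock seconds consumed per play, on average."""
    return GAME_CLOCK_SECONDS / total_plays


for level, plays in PLAYS_PER_GAME.items():
    print(f"{level}: {clock_seconds_per_play(plays):.1f} game-clock seconds per play")
```

This reproduces the 28.1 and 25.4 figures, so the two columns carry the same information. Real elapsed time per play is longer, because it also includes stoppages and breaks.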

Other Key Rule Differences

Two-Feet vs One-Foot Inbounds:
- NFL requires two feet inbounds for completion
- College requires only one foot
- Impact: Higher catch rate in college, affects receiver evaluation

Targeting and Player Safety:
- College has stricter targeting rules with mandatory ejections
- Different replay review processes
- Affects defensive strategy and player evaluation

Defensive Pass Interference:
- NFL: Spot foul (can be 40+ yards)
- College: 15-yard penalty (or spot if < 15 yards)
- Impact: Different risk/reward for aggressive coverage
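
The difference in risk is easiest to see by comparing the yardage awarded at several foul depths. This is a simplified sketch of the two rules as stated above; it ignores goal-line adjustments, and the function name `dpi_penalty_yards` is hypothetical.

```python
def dpi_penalty_yards(level: str, yards_downfield: float) -> float:
    """Yards awarded for defensive pass interference (simplified).

    yards_downfield is the distance from the previous spot to the foul.
    Goal-line adjustments are ignored in this sketch.
    """
    if level == "NFL":
        return yards_downfield             # spot foul
    if level == "College":
        return min(yards_downfield, 15.0)  # spot foul, capped at 15 yards
    raise ValueError(f"unknown level: {level}")


for depth in (8, 20, 45):
    nfl = dpi_penalty_yards("NFL", depth)
    college = dpi_penalty_yards("College", depth)
    print(f"Foul {depth} yards downfield -> NFL {nfl:.0f}, College {college:.0f}")
```

The two rules agree on short throws and diverge on deep ones, which is where a college defender gives up the least by interfering rather than allowing a completion.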

Talent Disparity and Competitive Balance

One of the most fundamental differences between college and NFL is the distribution of talent.

Measuring Talent Disparity

#| label: talent-disparity-r
#| message: false
#| warning: false
#| cache: true

# Simulate talent distributions
set.seed(123)

# NFL: Narrower distribution (best of best)
nfl_talent <- tibble(
  level = "NFL",
  team = paste0("Team ", 1:32),
  talent_rating = rnorm(32, mean = 85, sd = 5)
)

# College: Much wider distribution
college_talent <- tibble(
  level = "College",
  team = c(paste0("Elite ", 1:10),
           paste0("Good ", 1:25),
           paste0("Average ", 1:40),
           paste0("Weak ", 1:25),
           paste0("Poor ", 1:30)),
  talent_rating = c(
    rnorm(10, mean = 85, sd = 3),   # Elite programs
    rnorm(25, mean = 75, sd = 4),   # Good programs
    rnorm(40, mean = 65, sd = 5),   # Average programs
    rnorm(25, mean = 55, sd = 5),   # Weak programs
    rnorm(30, mean = 45, sd = 6)    # Poor programs
  )
)

talent_combined <- bind_rows(nfl_talent, college_talent)

# Visualize distributions
talent_combined %>%
  ggplot(aes(x = talent_rating, fill = level)) +
  geom_density(alpha = 0.6) +
  scale_fill_manual(values = c("NFL" = "#013369", "College" = "#841617")) +
  labs(
    title = "Talent Distribution: NFL vs College Football",
    subtitle = "NFL features compressed talent range; college shows extreme variation",
    x = "Talent Rating",
    y = "Density",
    fill = "Level"
  ) +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "top"
  )

# Calculate dispersion metrics
talent_stats <- talent_combined %>%
  group_by(level) %>%
  summarise(
    mean = mean(talent_rating),
    sd = sd(talent_rating),
    min = min(talent_rating),
    max = max(talent_rating),
    range = max - min,
    cv = sd / mean
  )

talent_stats %>%
  gt() %>%
  cols_label(
    level = "Level",
    mean = "Mean",
    sd = "Std Dev",
    min = "Min",
    max = "Max",
    range = "Range",
    cv = "Coef. of Var."
  ) %>%
  fmt_number(decimals = 2) %>%
  tab_header(
    title = "Talent Dispersion Metrics"
  )
#| label: talent-disparity-py
#| message: false
#| warning: false

# Simulate talent distributions
np.random.seed(123)

# NFL: Narrower distribution
nfl_talent = pd.DataFrame({
    'level': 'NFL',
    'team': [f'Team {i}' for i in range(1, 33)],
    'talent_rating': np.random.normal(85, 5, 32)
})

# College: Much wider distribution
college_talent = pd.DataFrame({
    'level': 'College',
    'team': ([f'Elite {i}' for i in range(1, 11)] +
             [f'Good {i}' for i in range(1, 26)] +
             [f'Average {i}' for i in range(1, 41)] +
             [f'Weak {i}' for i in range(1, 26)] +
             [f'Poor {i}' for i in range(1, 31)]),
    'talent_rating': np.concatenate([
        np.random.normal(85, 3, 10),   # Elite
        np.random.normal(75, 4, 25),   # Good
        np.random.normal(65, 5, 40),   # Average
        np.random.normal(55, 5, 25),   # Weak
        np.random.normal(45, 6, 30)    # Poor
    ])
})

# Combine data
talent_combined = pd.concat([nfl_talent, college_talent], ignore_index=True)

# Visualize distributions
fig, ax = plt.subplots(figsize=(10, 6))

for level, color in [('NFL', '#013369'), ('College', '#841617')]:
    data = talent_combined[talent_combined['level'] == level]['talent_rating']
    ax.hist(data, bins=30, alpha=0.6, label=level, color=color, density=True)

ax.set_xlabel('Talent Rating', fontsize=12)
ax.set_ylabel('Density', fontsize=12)
ax.set_title('Talent Distribution: NFL vs College Football\n' +
             'NFL features compressed talent range; college shows extreme variation',
             fontsize=14, fontweight='bold', pad=20)
ax.legend(loc='upper right')
plt.tight_layout()
plt.show()

# Calculate dispersion metrics
talent_stats = talent_combined.groupby('level')['talent_rating'].agg([
    ('mean', 'mean'),
    ('sd', 'std'),
    ('min', 'min'),
    ('max', 'max'),
    ('range', lambda x: x.max() - x.min()),
    ('cv', lambda x: x.std() / x.mean())
]).reset_index()

print("\nTalent Dispersion Metrics:")
print(talent_stats.to_string(index=False))

Key Implications:

  1. Blowout rates: College games are far more likely to be lopsided
  2. Performance variance: College stats show higher variance
  3. Scheme effectiveness: Scheme/talent advantages magnified in college
  4. Metric reliability: Need larger samples in college for signal
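
One way to make the blowout point concrete is to compare the typical rating gap between two teams at each level. The sketch below reuses the illustrative tier parameters from the simulation above and pairs every team with every other team, which is an assumption; real schedules are not random pairings.

```python
import random
from itertools import combinations
from statistics import mean

random.seed(123)

# Illustrative tiers as (number of teams, mean rating, standard deviation)
NFL_TIERS = [(32, 85, 5)]
COLLEGE_TIERS = [(10, 85, 3), (25, 75, 4), (40, 65, 5), (25, 55, 5), (30, 45, 6)]


def simulate_ratings(tiers):
    """Draw one synthetic talent rating per team from each tier."""
    return [random.gauss(mu, sd) for n, mu, sd in tiers for _ in range(n)]


def mean_pairwise_gap(ratings):
    """Average absolute rating difference over every possible matchup."""
    return mean(abs(a - b) for a, b in combinations(ratings, 2))


nfl_gap = mean_pairwise_gap(simulate_ratings(NFL_TIERS))
college_gap = mean_pairwise_gap(simulate_ratings(COLLEGE_TIERS))

print(f"Average matchup gap, NFL:     {nfl_gap:.1f} rating points")
print(f"Average matchup gap, College: {college_gap:.1f} rating points")
```

Under these assumptions the average college matchup has a much larger talent gap than the average NFL matchup, which is the mechanism behind the higher blowout rate.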

Competitive Balance

#| label: competitive-balance-r
#| message: false
#| warning: false
#| cache: true

# Simulated win distribution (Gini coefficient approach)
set.seed(456)

# NFL: More balanced
nfl_wins <- tibble(
  level = "NFL",
  team = paste0("Team ", 1:32),
  wins = pmax(0, pmin(17, rnorm(32, mean = 8.5, sd = 2.5)))
) %>%
  arrange(wins)

# College: Less balanced (using FBS-level data)
college_wins <- tibble(
  level = "College",
  team = paste0("Team ", 1:130),
  wins = c(
    pmax(8, pmin(13, rnorm(15, mean = 11, sd = 1.5))),  # Elite
    pmax(6, pmin(11, rnorm(25, mean = 8, sd = 2))),     # Good
    pmax(3, pmin(9, rnorm(50, mean = 6, sd = 2))),      # Average
    pmax(0, pmin(6, rnorm(40, mean = 3, sd = 2)))       # Weak
  )
) %>%
  arrange(wins)

# Calculate Gini coefficient
gini_coefficient <- function(x) {
  n <- length(x)
  x_sorted <- sort(x)
  2 * sum((1:n) * x_sorted) / (n * sum(x_sorted)) - (n + 1) / n
}

gini_stats <- bind_rows(nfl_wins, college_wins) %>%
  group_by(level) %>%
  summarise(
    gini = gini_coefficient(wins),
    win_sd = sd(wins)
  )

# Visualize win distributions
bind_rows(nfl_wins, college_wins) %>%
  ggplot(aes(x = wins, fill = level)) +
  geom_histogram(alpha = 0.6, bins = 15, position = "identity") +
  scale_fill_manual(values = c("NFL" = "#013369", "College" = "#841617")) +
  labs(
    title = "Win Distribution: NFL vs College",
    subtitle = "NFL shows greater competitive balance",
    x = "Wins",
    y = "Number of Teams",
    fill = "Level"
  ) +
  facet_wrap(~level, ncol = 1, scales = "free_y")

print("Competitive Balance Metrics:")
print(gini_stats)
#| label: competitive-balance-py
#| message: false
#| warning: false

# Simulated win distributions
np.random.seed(456)

# NFL: More balanced
nfl_wins = pd.DataFrame({
    'level': 'NFL',
    'team': [f'Team {i}' for i in range(1, 33)],
    'wins': np.clip(np.random.normal(8.5, 2.5, 32), 0, 17)
})

# College: Less balanced
college_wins = pd.DataFrame({
    'level': 'College',
    'team': [f'Team {i}' for i in range(1, 131)],
    'wins': np.concatenate([
        np.clip(np.random.normal(11, 1.5, 15), 8, 13),  # Elite
        np.clip(np.random.normal(8, 2, 25), 6, 11),     # Good
        np.clip(np.random.normal(6, 2, 50), 3, 9),      # Average
        np.clip(np.random.normal(3, 2, 40), 0, 6)       # Weak
    ])
})

# Calculate Gini coefficient
def gini_coefficient(x):
    """Gini coefficient of non-negative values (0 = perfectly equal)."""
    x_sorted = np.sort(x)
    n = len(x)
    ranks = np.arange(1, n + 1)
    return 2 * np.sum(ranks * x_sorted) / (n * np.sum(x_sorted)) - (n + 1) / n

wins_combined = pd.concat([nfl_wins, college_wins], ignore_index=True)

# One row per level, with gini and win_sd as columns
gini_stats = wins_combined.groupby('level')['wins'].agg(
    gini=lambda x: gini_coefficient(x.values),
    win_sd='std'
).reset_index()

print("\nCompetitive Balance Metrics:")
print(gini_stats)
print("\nHigher Gini coefficient indicates less competitive balance")
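
A small worked example helps calibrate the Gini coefficient. The sketch below applies the same sorted-rank formula to two hypothetical four-team leagues using only the standard library.

```python
def gini(values):
    """Gini coefficient via the sorted-rank formula."""
    xs = sorted(values)
    n = len(xs)
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * sum(xs)) - (n + 1) / n


equal_league = [8, 8, 8, 8]      # every team wins the same number of games
lopsided_league = [0, 0, 0, 10]  # one team takes every win

print(gini(equal_league))     # 0.0
print(gini(lopsided_league))  # 0.75
```

With n teams the maximum value is (n - 1)/n, so 0.75 is the ceiling for a four-team league. Because the ceiling depends on n, comparing a 32-team league with a 130-team one slightly favors the smaller league.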

Pace of Play Differences

College football is played at a fundamentally different tempo than the NFL, with significant strategic and analytical implications.

Tempo Analysis

#| label: tempo-analysis-r
#| message: false
#| warning: false
#| cache: true

# Simulated tempo data across teams
set.seed(789)

# NFL tempo distribution
nfl_tempo <- tibble(
  level = "NFL",
  team = paste0("Team ", 1:32),
  seconds_per_play = rnorm(32, mean = 28, sd = 2.5),
  plays_per_game = rnorm(32, mean = 64, sd = 4)
)

# College tempo distribution (wider variance)
college_tempo <- tibble(
  level = "College",
  team = paste0("Team ", 1:130),
  seconds_per_play = c(
    rnorm(20, mean = 20, sd = 2),   # Hurry-up offenses
    rnorm(50, mean = 25, sd = 2.5), # Fast tempo
    rnorm(40, mean = 28, sd = 2),   # Medium tempo
    rnorm(20, mean = 32, sd = 2)    # Slow tempo
  ),
  plays_per_game = c(
    rnorm(20, mean = 82, sd = 5),   # Hurry-up
    rnorm(50, mean = 73, sd = 4),   # Fast
    rnorm(40, mean = 68, sd = 4),   # Medium
    rnorm(20, mean = 62, sd = 4)    # Slow
  )
)

tempo_combined <- bind_rows(nfl_tempo, college_tempo)

# Scatter plot of tempo metrics
tempo_combined %>%
  ggplot(aes(x = seconds_per_play, y = plays_per_game, color = level)) +
  geom_point(alpha = 0.6, size = 3) +
  geom_smooth(method = "lm", se = FALSE) +
  scale_color_manual(values = c("NFL" = "#013369", "College" = "#841617")) +
  labs(
    title = "Pace of Play: NFL vs College Football",
    subtitle = "College shows much wider variance in tempo",
    x = "Seconds per Play",
    y = "Plays per Game",
    color = "Level"
  ) +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "top"
  )

# Summary statistics
tempo_stats <- tempo_combined %>%
  group_by(level) %>%
  summarise(
    mean_seconds = mean(seconds_per_play),
    sd_seconds = sd(seconds_per_play),
    mean_plays = mean(plays_per_game),
    sd_plays = sd(plays_per_game),
    .groups = "drop"
  )

tempo_stats %>%
  gt() %>%
  cols_label(
    level = "Level",
    mean_seconds = "Avg Sec/Play",
    sd_seconds = "SD Sec/Play",
    mean_plays = "Avg Plays/Game",
    sd_plays = "SD Plays/Game"
  ) %>%
  fmt_number(decimals = 1) %>%
  tab_header(title = "Pace of Play Summary Statistics")
#| label: tempo-analysis-py
#| message: false
#| warning: false

# Simulated tempo data
np.random.seed(789)

# NFL tempo
nfl_tempo = pd.DataFrame({
    'level': 'NFL',
    'team': [f'Team {i}' for i in range(1, 33)],
    'seconds_per_play': np.random.normal(28, 2.5, 32),
    'plays_per_game': np.random.normal(64, 4, 32)
})

# College tempo (wider variance)
college_tempo = pd.DataFrame({
    'level': 'College',
    'team': [f'Team {i}' for i in range(1, 131)],
    'seconds_per_play': np.concatenate([
        np.random.normal(20, 2, 20),   # Hurry-up
        np.random.normal(25, 2.5, 50), # Fast
        np.random.normal(28, 2, 40),   # Medium
        np.random.normal(32, 2, 20)    # Slow
    ]),
    'plays_per_game': np.concatenate([
        np.random.normal(82, 5, 20),   # Hurry-up
        np.random.normal(73, 4, 50),   # Fast
        np.random.normal(68, 4, 40),   # Medium
        np.random.normal(62, 4, 20)    # Slow
    ])
})

tempo_combined = pd.concat([nfl_tempo, college_tempo], ignore_index=True)

# Scatter plot
fig, ax = plt.subplots(figsize=(10, 6))

for level, color in [('NFL', '#013369'), ('College', '#841617')]:
    data = tempo_combined[tempo_combined['level'] == level]
    ax.scatter(data['seconds_per_play'], data['plays_per_game'],
               alpha=0.6, s=50, label=level, color=color)

    # Add trend line
    z = np.polyfit(data['seconds_per_play'], data['plays_per_game'], 1)
    p = np.poly1d(z)
    x_line = np.linspace(data['seconds_per_play'].min(),
                        data['seconds_per_play'].max(), 100)
    ax.plot(x_line, p(x_line), color=color, linestyle='--', alpha=0.8)

ax.set_xlabel('Seconds per Play', fontsize=12)
ax.set_ylabel('Plays per Game', fontsize=12)
ax.set_title('Pace of Play: NFL vs College Football\n' +
             'College shows much wider variance in tempo',
             fontsize=14, fontweight='bold', pad=20)
ax.legend()
plt.tight_layout()
plt.show()

# Summary statistics
tempo_stats = tempo_combined.groupby('level').agg({
    'seconds_per_play': ['mean', 'std'],
    'plays_per_game': ['mean', 'std']
}).round(1)

print("\nPace of Play Summary Statistics:")
print(tempo_stats)

Strategic Implications:

  1. No-huddle offenses: More common and extreme in college
  2. Defensive fatigue: Greater concern in college with faster tempo
  3. Personnel packages: Less substitution time affects depth charts
  4. Sample size: More plays in college = more data per game
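
The sample-size point can be quantified with the usual square-root rule for standard errors. The sketch below assumes plays are independent and share the same per-play variance at both levels, which is a simplification; the play counts are the illustrative combined figures used earlier in the chapter.

```python
import math


def relative_standard_error(plays_a: float, plays_b: float) -> float:
    """Standard error of a per-play average with plays_a plays,
    relative to the same average computed from plays_b plays."""
    return math.sqrt(plays_b / plays_a)


# Illustrative combined plays per game
college_plays, nfl_plays = 142, 128

ratio = relative_standard_error(college_plays, nfl_plays)
print(f"College per-game standard error is {ratio:.3f}x the NFL value")
print(f"That is about {1 - ratio:.1%} smaller per game")
```

The gain is modest, roughly 5 percent per game. It can easily be offset if per-play variance is higher in college, as the talent-disparity discussion suggests.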

College-to-NFL Projection Models

Projecting college performance to the NFL is one of the most important and challenging problems in football analytics.

Quarterback Projection Model

#| label: qb-projection-model-r
#| message: false
#| warning: false
#| cache: true

# Simulated QB data for projection modeling
set.seed(2024)

# Create synthetic college and NFL performance data
qb_data <- tibble(
  player_id = 1:150,
  # College stats (rate stats, plus rushing yards per game)
  college_cmp_pct = rnorm(150, mean = 63, sd = 5),
  college_ypa = rnorm(150, mean = 8.2, sd = 1.2),
  college_td_pct = rnorm(150, mean = 5.5, sd = 1.5),
  college_int_pct = rnorm(150, mean = 2.8, sd = 1.0),
  college_rush_yds = rnorm(150, mean = 25, sd = 15),

  # Contextual factors
  conf_strength = sample(1:10, 150, replace = TRUE),
  games_started = sample(15:50, 150, replace = TRUE),

  # Combine measurables
  height = rnorm(150, mean = 75, sd = 2),
  weight = rnorm(150, mean = 225, sd = 10),
  arm_strength = rnorm(150, mean = 55, sd = 5),

  # Draft capital (proxy for NFL evaluation)
  draft_pick = sample(1:250, 150, replace = TRUE)
) %>%
  mutate(
    # NFL performance (synthetic - correlated with college but noisy).
    # The -0.30 intercept centers EPA/play near zero so the tiers below are meaningful.
    nfl_epa_per_play =
      -0.30 +
      0.003 * college_cmp_pct +
      0.02 * college_ypa +
      0.01 * college_td_pct +
      -0.01 * college_int_pct +
      0.002 * conf_strength +
      -0.0005 * draft_pick +
      rnorm(150, mean = 0, sd = 0.08),

    # Categorize outcomes
    nfl_success = case_when(
      nfl_epa_per_play > 0.10 ~ "Elite",
      nfl_epa_per_play > 0.00 ~ "Average",
      TRUE ~ "Below Average"
    )
  )

# Build projection model
library(broom)

qb_model <- lm(
  nfl_epa_per_play ~
    college_cmp_pct +
    college_ypa +
    college_td_pct +
    college_int_pct +
    conf_strength +
    log(draft_pick + 1) +
    height,
  data = qb_data
)

# Display model results
tidy(qb_model) %>%
  gt() %>%
  cols_label(
    term = "Variable",
    estimate = "Coefficient",
    std.error = "Std Error",
    statistic = "t-stat",
    p.value = "p-value"
  ) %>%
  fmt_number(
    columns = c(estimate, std.error, statistic),
    decimals = 4
  ) %>%
  fmt_number(
    columns = p.value,
    decimals = 4
  ) %>%
  tab_header(
    title = "QB College-to-NFL Projection Model",
    subtitle = "Predicting NFL EPA per Play from College Performance"
  )

# Model performance
glance(qb_model) %>%
  select(r.squared, adj.r.squared, sigma, statistic, p.value) %>%
  gt() %>%
  fmt_number(decimals = 4) %>%
  tab_header(title = "Model Fit Statistics")

# Visualize actual vs predicted
qb_data <- qb_data %>%
  mutate(predicted_nfl_epa = predict(qb_model))

qb_data %>%
  ggplot(aes(x = predicted_nfl_epa, y = nfl_epa_per_play)) +
  geom_point(alpha = 0.5, size = 2) +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed", color = "red") +
  geom_smooth(method = "lm", se = TRUE, alpha = 0.2) +
  labs(
    title = "QB Projection Model: Actual vs Predicted NFL Performance",
    subtitle = sprintf("R² = %.3f", summary(qb_model)$r.squared),
    x = "Predicted NFL EPA/Play",
    y = "Actual NFL EPA/Play"
  ) +
  theme(plot.title = element_text(face = "bold"))
#| label: qb-projection-model-py
#| message: false
#| warning: false

from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.model_selection import train_test_split

# Simulated QB data
np.random.seed(2024)

n_qbs = 150

qb_data = pd.DataFrame({
    'player_id': range(1, n_qbs + 1),
    # College stats
    'college_cmp_pct': np.random.normal(63, 5, n_qbs),
    'college_ypa': np.random.normal(8.2, 1.2, n_qbs),
    'college_td_pct': np.random.normal(5.5, 1.5, n_qbs),
    'college_int_pct': np.random.normal(2.8, 1.0, n_qbs),
    'college_rush_yds': np.random.normal(25, 15, n_qbs),
    'conf_strength': np.random.randint(1, 11, n_qbs),
    'games_started': np.random.randint(15, 51, n_qbs),
    'height': np.random.normal(75, 2, n_qbs),
    'weight': np.random.normal(225, 10, n_qbs),
    'arm_strength': np.random.normal(55, 5, n_qbs),
    'draft_pick': np.random.randint(1, 251, n_qbs)
})

# Generate NFL performance (synthetic); the -0.30 intercept centers EPA/play near zero
qb_data['nfl_epa_per_play'] = (
    -0.30 +
    0.003 * qb_data['college_cmp_pct'] +
    0.02 * qb_data['college_ypa'] +
    0.01 * qb_data['college_td_pct'] +
    -0.01 * qb_data['college_int_pct'] +
    0.002 * qb_data['conf_strength'] +
    -0.0005 * qb_data['draft_pick'] +
    np.random.normal(0, 0.08, n_qbs)
)

# Build projection model
features = ['college_cmp_pct', 'college_ypa', 'college_td_pct',
            'college_int_pct', 'conf_strength', 'height']

X = qb_data[features].copy()  # copy so adding a column does not write to a slice of qb_data
y = qb_data['nfl_epa_per_play']

# Add log of draft pick
X['log_draft_pick'] = np.log(qb_data['draft_pick'] + 1)

# Fit model
model = LinearRegression()
model.fit(X, y)

# Predictions
qb_data['predicted_nfl_epa'] = model.predict(X)

# Model results
coef_df = pd.DataFrame({
    'Variable': features + ['log_draft_pick'],
    'Coefficient': model.coef_
})

print("\nQB College-to-NFL Projection Model:")
print(coef_df.to_string(index=False))
print(f"\nIntercept: {model.intercept_:.4f}")
print(f"R²: {r2_score(y, qb_data['predicted_nfl_epa']):.4f}")
print(f"RMSE: {np.sqrt(mean_squared_error(y, qb_data['predicted_nfl_epa'])):.4f}")

# Visualize actual vs predicted
fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(qb_data['predicted_nfl_epa'], qb_data['nfl_epa_per_play'],
           alpha=0.5, s=50)
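# Draw the 45-degree reference line across the observed range of the data
lims = [qb_data['nfl_epa_per_play'].min(), qb_data['nfl_epa_per_play'].max()]
ax.plot(lims, lims, 'r--', label='Perfect Prediction')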

# Add regression line
z = np.polyfit(qb_data['predicted_nfl_epa'], qb_data['nfl_epa_per_play'], 1)
p = np.poly1d(z)
x_line = np.linspace(qb_data['predicted_nfl_epa'].min(),
                    qb_data['predicted_nfl_epa'].max(), 100)
ax.plot(x_line, p(x_line), 'b-', alpha=0.5, label='Actual Fit')

ax.set_xlabel('Predicted NFL EPA/Play', fontsize=12)
ax.set_ylabel('Actual NFL EPA/Play', fontsize=12)
ax.set_title(f'QB Projection Model: Actual vs Predicted\nR² = {r2_score(y, qb_data["predicted_nfl_epa"]):.3f}',
             fontsize=14, fontweight='bold')
ax.legend()
ax.grid(alpha=0.3)
plt.tight_layout()
plt.show()

Key Factors in Projection Models:

  1. Competition level: Adjust for conference/opponent strength
  2. Volume and experience: More data = better projections
  3. Physical traits: Size, speed, arm strength matter
  4. Draft capital: Market signal of NFL evaluation
  5. Context: Scheme fit, supporting cast adjustments
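
A projection model should be judged on players it has not seen, since in-sample R² overstates accuracy. The sketch below shows a holdout evaluation on synthetic data using NumPy only. The predictors, coefficients, and 70/30 split are all assumptions for illustration, not estimates from real prospects.

```python
import numpy as np

rng = np.random.default_rng(2024)
n_players = 150

# Synthetic standardized predictors and a noisy synthetic outcome
X = rng.normal(size=(n_players, 4))
true_coefs = np.array([0.03, 0.02, 0.015, -0.01])  # hypothetical effects
y = X @ true_coefs + rng.normal(scale=0.08, size=n_players)

# Hold out 30% of players for evaluation
shuffled = rng.permutation(n_players)
n_test = int(0.3 * n_players)
test_idx, train_idx = shuffled[:n_test], shuffled[n_test:]


def add_intercept(features):
    """Prepend a column of ones so the fit includes an intercept."""
    return np.column_stack([np.ones(len(features)), features])


def r_squared(actual, predicted):
    """Share of variance in actual explained by predicted."""
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1 - ss_res / ss_tot


# Ordinary least squares fit on the training players only
beta, *_ = np.linalg.lstsq(add_intercept(X[train_idx]), y[train_idx], rcond=None)

r2_train = r_squared(y[train_idx], add_intercept(X[train_idx]) @ beta)
r2_test = r_squared(y[test_idx], add_intercept(X[test_idx]) @ beta)

print(f"In-sample R²:     {r2_train:.3f}")
print(f"Out-of-sample R²: {r2_test:.3f}")
```

With only 150 players the holdout estimate is itself noisy, so repeated splits or cross-validation would give a steadier picture than a single partition.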

Position-Specific Translation Rates

Different positions translate from college to NFL at different rates:

#| label: position-translation-r
#| message: false
#| warning: false

# Position translation success rates (synthetic data)
position_translation <- tibble(
  position = c("QB", "RB", "WR", "TE", "OL", "DL", "LB", "CB", "S"),
  success_rate = c(0.35, 0.42, 0.48, 0.38, 0.45, 0.52, 0.44, 0.40, 0.46),
  college_nfl_correlation = c(0.45, 0.38, 0.52, 0.41, 0.35, 0.48, 0.40, 0.36, 0.43),
  avg_adjustment_period = c(2.5, 1.5, 1.8, 2.2, 2.8, 2.0, 2.3, 2.6, 2.1)
) %>%
  arrange(desc(success_rate))

position_translation %>%
  gt() %>%
  cols_label(
    position = "Position",
    success_rate = "Success Rate",
    college_nfl_correlation = "College-NFL Corr.",
    avg_adjustment_period = "Adj. Period (yrs)"
  ) %>%
  fmt_percent(
    columns = success_rate,
    decimals = 1
  ) %>%
  fmt_number(
    columns = college_nfl_correlation,
    decimals = 2
  ) %>%
  fmt_number(
    columns = avg_adjustment_period,
    decimals = 1
  ) %>%
  data_color(
    columns = success_rate,
    colors = scales::col_numeric(
      palette = c("#ef476f", "#ffd166", "#06d6a0"),
      domain = NULL
    )
  ) %>%
  tab_header(
    title = "College-to-NFL Translation by Position",
    subtitle = "Success rates and correlation of college performance to NFL outcomes"
  )

# Visualize
position_translation %>%
  ggplot(aes(x = reorder(position, success_rate), y = success_rate)) +
  geom_col(aes(fill = success_rate), alpha = 0.8) +
  geom_text(aes(label = scales::percent(success_rate, accuracy = 1)),
            hjust = -0.2, size = 3.5) +
  scale_fill_gradient(low = "#ef476f", high = "#06d6a0") +
  scale_y_continuous(labels = scales::percent_format(), limits = c(0, 0.6)) +
  coord_flip() +
  labs(
    title = "NFL Success Rate by Position",
    subtitle = "Defined as becoming above-average NFL starter within first 4 years",
    x = NULL,
    y = "Success Rate"
  ) +
  theme(legend.position = "none")
#| label: position-translation-py
#| message: false
#| warning: false

# Position translation data
position_translation = pd.DataFrame({
    'position': ['QB', 'RB', 'WR', 'TE', 'OL', 'DL', 'LB', 'CB', 'S'],
    'success_rate': [0.35, 0.42, 0.48, 0.38, 0.45, 0.52, 0.44, 0.40, 0.46],
    'college_nfl_correlation': [0.45, 0.38, 0.52, 0.41, 0.35, 0.48, 0.40, 0.36, 0.43],
    'avg_adjustment_period': [2.5, 1.5, 1.8, 2.2, 2.8, 2.0, 2.3, 2.6, 2.1]
})

position_translation = position_translation.sort_values('success_rate', ascending=False)

print("\nCollege-to-NFL Translation by Position:")
print(position_translation.to_string(index=False))

# Visualize
fig, ax = plt.subplots(figsize=(10, 6))
colors = plt.cm.RdYlGn(position_translation['success_rate'] /
                       position_translation['success_rate'].max())
bars = ax.barh(position_translation['position'],
               position_translation['success_rate'],
               color=colors, alpha=0.8)

ax.set_xlabel('Success Rate', fontsize=12)
ax.set_title('NFL Success Rate by Position\n' +
             'Defined as becoming above-average NFL starter within first 4 years',
             fontsize=14, fontweight='bold', pad=20)
ax.set_xlim(0, 0.6)
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.0%}'))

# Add value labels
for i, (pos, rate) in enumerate(zip(position_translation['position'],
                                     position_translation['success_rate'])):
    ax.text(rate + 0.01, i, f'{rate:.1%}', va='center', fontsize=10)

plt.tight_layout()
plt.show()

Metrics That Transfer vs Don't Transfer

Metrics That Translate Well

1. Physical Attributes:
- Speed (40-yard dash)
- Size (height, weight, wingspan)
- Explosiveness (vertical jump, broad jump)

2. Certain Efficiency Metrics:
- Yards per attempt (adjusted for competition)
- Turnover rates (with context)
- Red zone performance

3. Relative Athletic Scores (RAS):
- Percentile rankings within position
- Combine performance relative to position norms

Metrics That DON'T Translate Well

1. Volume Statistics:
- Total yards (scheme and talent dependent)
- Total touchdowns (opportunity driven)
- Total tackles (scheme and snap count dependent)

2. Win-Loss Record:
- Team quality varies drastically
- Coaching and roster context crucial

3. Raw Success Rates:
- Competition level varies too much
- Scheme advantages in college
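
To illustrate why volume statistics mislead, the sketch below converts season totals into per-opportunity rates. The two running backs and all of their numbers are hypothetical values invented for this example.

```python
import pandas as pd

# Hypothetical season totals for two running backs (illustrative values only)
totals = pd.DataFrame({
    'player': ['RB A', 'RB B'],
    'carries': [310, 165],
    'rush_yards': [1490, 940],
    'rush_tds': [14, 9],
})

# Divide each total by opportunities to get rates that are less workload-driven
rates = totals.assign(
    yards_per_carry=totals['rush_yards'] / totals['carries'],
    td_per_carry=totals['rush_tds'] / totals['carries'],
)

print(rates[['player', 'rush_yards', 'yards_per_carry', 'td_per_carry']]
      .round(3).to_string(index=False))
```

RB A leads in total yards only because of the heavier workload; on a per-carry basis RB B is the more efficient runner in this made-up comparison. Rates still need competition and scheme context, but they remove the opportunity component that inflates totals.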

#| label: metric-transfer-correlation-r
#| message: false
#| warning: false

# Correlation of college metrics to NFL performance
metric_correlations <- tibble(
  metric = c(
    # Speed is signed so that faster is higher; a raw 40 time would correlate negatively
    "Speed (40-Yard Dash)", "Height", "Weight", "Vertical Jump",
    "Yards per Attempt", "Completion %", "TD Rate", "INT Rate",
    "Total Yards", "Total TDs", "Wins", "Team EPA"
  ),
  category = c(
    "Physical", "Physical", "Physical", "Physical",
    "Efficiency", "Efficiency", "Efficiency", "Efficiency",
    "Volume", "Volume", "Team", "Team"
  ),
  correlation_to_nfl = c(
    0.52, 0.38, 0.31, 0.47,
    0.45, 0.42, 0.39, -0.35,
    0.18, 0.22, 0.09, 0.28
  )
) %>%
  arrange(desc(abs(correlation_to_nfl)))

metric_correlations %>%
  mutate(
    correlation_abs = abs(correlation_to_nfl),
    transfers_well = if_else(correlation_abs >= 0.35, "Yes", "No")
  ) %>%
  ggplot(aes(x = reorder(metric, correlation_to_nfl),
             y = correlation_to_nfl,
             fill = transfers_well)) +
  geom_col(alpha = 0.8) +
  geom_hline(yintercept = c(-0.35, 0.35), linetype = "dashed", alpha = 0.5) +
  scale_fill_manual(values = c("Yes" = "#06d6a0", "No" = "#ef476f")) +
  coord_flip() +
  labs(
    title = "College Metric Correlation to NFL Success",
    subtitle = "Physical and efficiency metrics translate better than volume stats",
    x = NULL,
    y = "Correlation to NFL Performance",
    fill = "Transfers Well"
  ) +
  theme(
    plot.title = element_text(face = "bold"),
    legend.position = "top"
  )
#| label: metric-transfer-correlation-py
#| message: false
#| warning: false

# Metric correlations
metric_correlations = pd.DataFrame({
    'metric': [
        # Speed is signed so that faster is higher; a raw 40 time would correlate negatively
        "Speed (40-Yard Dash)", "Height", "Weight", "Vertical Jump",
        "Yards per Attempt", "Completion %", "TD Rate", "INT Rate",
        "Total Yards", "Total TDs", "Wins", "Team EPA"
    ],
    'category': [
        "Physical", "Physical", "Physical", "Physical",
        "Efficiency", "Efficiency", "Efficiency", "Efficiency",
        "Volume", "Volume", "Team", "Team"
    ],
    'correlation_to_nfl': [
        0.52, 0.38, 0.31, 0.47,
        0.45, 0.42, 0.39, -0.35,
        0.18, 0.22, 0.09, 0.28
    ]
})

metric_correlations['correlation_abs'] = metric_correlations['correlation_to_nfl'].abs()
metric_correlations['transfers_well'] = metric_correlations['correlation_abs'] >= 0.35
metric_correlations = metric_correlations.sort_values('correlation_to_nfl')

# Visualize
fig, ax = plt.subplots(figsize=(10, 8))
colors = ['#06d6a0' if x else '#ef476f'
          for x in metric_correlations['transfers_well']]
bars = ax.barh(metric_correlations['metric'],
               metric_correlations['correlation_to_nfl'],
               color=colors, alpha=0.8)

ax.axvline(x=0.35, color='gray', linestyle='--', alpha=0.5)
ax.axvline(x=-0.35, color='gray', linestyle='--', alpha=0.5)
ax.axvline(x=0, color='black', linewidth=0.8)

ax.set_xlabel('Correlation to NFL Performance', fontsize=12)
ax.set_title('College Metric Correlation to NFL Success\n' +
             'Physical and efficiency metrics translate better than volume stats',
             fontsize=14, fontweight='bold', pad=20)

# Legend
from matplotlib.patches import Patch
legend_elements = [
    Patch(facecolor='#06d6a0', alpha=0.8, label='Transfers Well (|r| ≥ 0.35)'),
    Patch(facecolor='#ef476f', alpha=0.8, label='Transfers Poorly (|r| < 0.35)')
]
ax.legend(handles=legend_elements, loc='lower right')

plt.tight_layout()
plt.show()
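
The correlation values charted above are entered by hand. As a rough sketch of how such coefficients could be computed from player-level data, the example below simulates a latent talent variable and measures how strongly a rate stat and a volume stat each track NFL output. Every number here, including the variable names and noise levels, is a synthetic assumption rather than real player data.

```python
import numpy as np

rng = np.random.default_rng(42)
n_players = 5000

# Synthetic assumption: one latent "talent" factor drives all three measures
talent = rng.normal(size=n_players)
college_ypa = 7.5 + 0.8 * talent + rng.normal(scale=0.8, size=n_players)
# The volume stat carries extra noise standing in for scheme and opportunity
college_total_yards = 2500 + 300 * talent + rng.normal(scale=900, size=n_players)
nfl_epa = 0.05 * talent + rng.normal(scale=0.08, size=n_players)


def pearson_r(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation coefficient between two equal-length arrays."""
    return float(np.corrcoef(x, y)[0, 1])


r_ypa = pearson_r(college_ypa, nfl_epa)
r_yards = pearson_r(college_total_yards, nfl_epa)

print(f"College YPA vs NFL EPA:         r = {r_ypa:.2f}")
print(f"College total yards vs NFL EPA: r = {r_yards:.2f}")
```

Because the simulated volume stat is noisier relative to talent, its correlation with NFL output comes out weaker, which mirrors the pattern the chart describes.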

Conference Strength and Ratings

Adjusting for competition level is critical in college football analytics.

Conference Strength Model

#| label: conference-strength-r
#| message: false
#| warning: false
#| cache: true

# Simulated conference strength ratings
set.seed(2025)

conferences <- c("SEC", "Big Ten", "ACC", "Big 12", "Pac-12",
                "American", "Mountain West", "C-USA", "MAC", "Sun Belt")

conference_ratings <- tibble(
  conference = conferences,
  # Multiple rating components
  avg_recruit_rating = c(91, 89, 87, 86, 87, 82, 79, 76, 75, 77),
  nfl_draft_picks = c(65, 58, 52, 48, 51, 28, 22, 18, 15, 20),
  avg_team_sp_plus = c(12.5, 10.8, 8.2, 7.5, 8.9, 2.1, -1.5, -5.2, -6.8, -3.4),
  win_pct_vs_p5 = c(0.58, 0.55, 0.50, 0.49, 0.51, 0.38, 0.35, 0.28, 0.25, 0.32)
) %>%
  mutate(
    # Composite strength rating (0-100 scale)
    strength_rating =
      0.3 * scales::rescale(avg_recruit_rating, to = c(0, 100)) +
      0.3 * scales::rescale(nfl_draft_picks, to = c(0, 100)) +
      0.2 * scales::rescale(avg_team_sp_plus, to = c(0, 100)) +
      0.2 * scales::rescale(win_pct_vs_p5, to = c(0, 100)),
    tier = case_when(
      strength_rating >= 75 ~ "Elite",
      strength_rating >= 60 ~ "Strong",
      strength_rating >= 40 ~ "Average",
      TRUE ~ "Weak"
    )
  ) %>%
  arrange(desc(strength_rating))

conference_ratings %>%
  select(conference, strength_rating, tier, avg_recruit_rating,
         nfl_draft_picks, avg_team_sp_plus) %>%
  gt() %>%
  cols_label(
    conference = "Conference",
    strength_rating = "Strength Rating",
    tier = "Tier",
    avg_recruit_rating = "Avg Recruit",
    nfl_draft_picks = "NFL Picks",
    avg_team_sp_plus = "Avg SP+"
  ) %>%
  fmt_number(
    columns = c(strength_rating, avg_recruit_rating, avg_team_sp_plus),
    decimals = 1
  ) %>%
  fmt_number(
    columns = nfl_draft_picks,
    decimals = 0
  ) %>%
  data_color(
    columns = strength_rating,
    colors = scales::col_numeric(
      palette = c("#ef476f", "#ffd166", "#06d6a0"),
      domain = NULL
    )
  ) %>%
  tab_header(
    title = "College Football Conference Strength Ratings",
    subtitle = "Composite rating based on recruiting, NFL production, and on-field performance"
  )

# Visualize strength ratings
conference_ratings %>%
  ggplot(aes(x = reorder(conference, strength_rating),
             y = strength_rating,
             fill = tier)) +
  geom_col(alpha = 0.8) +
  scale_fill_manual(
    values = c("Elite" = "#06d6a0", "Strong" = "#118ab2",
               "Average" = "#ffd166", "Weak" = "#ef476f")
  ) +
  coord_flip() +
  labs(
    title = "Conference Strength Ratings (2023)",
    subtitle = "Composite metric incorporating recruiting, NFL production, and performance",
    x = NULL,
    y = "Strength Rating (0-100)",
    fill = "Tier"
  ) +
  theme(
    plot.title = element_text(face = "bold"),
    legend.position = "top"
  )
#| label: conference-strength-py
#| message: false
#| warning: false

from sklearn.preprocessing import MinMaxScaler

# Conference strength data
conferences = ["SEC", "Big Ten", "ACC", "Big 12", "Pac-12",
               "American", "Mountain West", "C-USA", "MAC", "Sun Belt"]

conference_data = pd.DataFrame({
    'conference': conferences,
    'avg_recruit_rating': [91, 89, 87, 86, 87, 82, 79, 76, 75, 77],
    'nfl_draft_picks': [65, 58, 52, 48, 51, 28, 22, 18, 15, 20],
    'avg_team_sp_plus': [12.5, 10.8, 8.2, 7.5, 8.9, 2.1, -1.5, -5.2, -6.8, -3.4],
    'win_pct_vs_p5': [0.58, 0.55, 0.50, 0.49, 0.51, 0.38, 0.35, 0.28, 0.25, 0.32]
})

# Calculate composite strength rating
scaler = MinMaxScaler(feature_range=(0, 100))
scaled_features = scaler.fit_transform(
    conference_data[['avg_recruit_rating', 'nfl_draft_picks',
                     'avg_team_sp_plus', 'win_pct_vs_p5']]
)

conference_data['strength_rating'] = (
    0.3 * scaled_features[:, 0] +
    0.3 * scaled_features[:, 1] +
    0.2 * scaled_features[:, 2] +
    0.2 * scaled_features[:, 3]
)

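# Assign tiers. Bins are left-closed so a rating of exactly 75 is "Elite" (matching
# the R version) and the minimum rating of 0 still receives a tier instead of NaN.
conference_data['tier'] = pd.cut(
    conference_data['strength_rating'],
    bins=[-np.inf, 40, 60, 75, np.inf],
    right=False,
    labels=['Weak', 'Average', 'Strong', 'Elite']
)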

conference_data = conference_data.sort_values('strength_rating', ascending=False)

print("\nCollege Football Conference Strength Ratings:")
print(conference_data[['conference', 'strength_rating', 'tier',
                       'avg_recruit_rating', 'nfl_draft_picks']].to_string(index=False))

# Visualize
fig, ax = plt.subplots(figsize=(10, 6))
tier_colors = {'Elite': '#06d6a0', 'Strong': '#118ab2',
               'Average': '#ffd166', 'Weak': '#ef476f'}
colors = [tier_colors[tier] for tier in conference_data['tier']]

bars = ax.barh(conference_data['conference'],
               conference_data['strength_rating'],
               color=colors, alpha=0.8)

ax.set_xlabel('Strength Rating (0-100)', fontsize=12)
ax.set_title('Conference Strength Ratings (2023)\n' +
             'Composite metric incorporating recruiting, NFL production, and performance',
             fontsize=14, fontweight='bold', pad=20)

# Legend
from matplotlib.patches import Patch
legend_elements = [Patch(facecolor=color, alpha=0.8, label=tier)
                   for tier, color in tier_colors.items()]
ax.legend(handles=legend_elements, loc='lower right', title='Tier')

plt.tight_layout()
plt.show()

Strength of Schedule Adjustments

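When projecting players, adjusting for strength of schedule is essential. One simple form is a ratio adjustment, where "Opponent Avg" is the average of the statistic allowed by the player's opponents and "League Avg" is the league-wide average, so production against generous defenses is scaled down and production against stingy defenses is scaled up: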

$$ \text{Adjusted Stat} = \text{Raw Stat} \times \frac{\text{League Avg}}{\text{Opponent Avg}} $$
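
As a worked instance of the ratio formula above, the sketch below takes "Opponent Avg" to be the average yards per attempt allowed by a quarterback's opponents. The function name `sos_adjust` and all input values are hypothetical.

```python
def sos_adjust(raw_stat: float, league_avg: float, opponent_avg: float) -> float:
    """Apply Adjusted = Raw * (League Avg / Opponent Avg)."""
    if opponent_avg <= 0:
        raise ValueError("opponent_avg must be positive")
    return raw_stat * league_avg / opponent_avg


# Hypothetical values: the average defense allows 7.5 yards per attempt
LEAGUE_YPA_ALLOWED = 7.5

# QB facing generous defenses (they allow 8.5 YPA on average)
qb_soft_schedule = sos_adjust(9.5, LEAGUE_YPA_ALLOWED, 8.5)
# QB facing stingy defenses (they allow 6.8 YPA on average)
qb_tough_schedule = sos_adjust(8.2, LEAGUE_YPA_ALLOWED, 6.8)

print(f"Soft schedule:  raw 9.5 YPA, adjusted {qb_soft_schedule:.2f}")
print(f"Tough schedule: raw 8.2 YPA, adjusted {qb_tough_schedule:.2f}")
```

Under these assumed inputs the ordering flips: the quarterback with the lower raw figure grades out higher once opponent quality is accounted for.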

#| label: sos-adjustment-r
#| message: false
#| warning: false

# Example: Adjusting QB stats for strength of schedule
set.seed(789)

qb_sos_example <- tibble(
  player = c("QB A", "QB B", "QB C", "QB D", "QB E"),
  conference = c("SEC", "MAC", "Big Ten", "Sun Belt", "ACC"),
  raw_ypa = c(8.2, 9.5, 8.0, 9.8, 8.5),
  raw_td_rate = c(5.5, 7.2, 5.1, 7.8, 5.8)
) %>%
  left_join(
    conference_ratings %>% select(conference, strength_rating),
    by = "conference"
  ) %>%
  mutate(
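    # Adjustment factor: credit tougher competition, discount weaker competition.
    # strength_rating is a min-max score (the weakest conference sits at exactly 0),
    # so dividing by it blows up; map it to a bounded multiplier instead.
    # The reference rating of 70 and the 0.2 weight are illustrative assumptions.
    adjustment_factor = 1 + 0.2 * (strength_rating - 70) / 100,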
    adjusted_ypa = raw_ypa * adjustment_factor,
    adjusted_td_rate = raw_td_rate * adjustment_factor,
    ypa_change = adjusted_ypa - raw_ypa,
    td_rate_change = adjusted_td_rate - raw_td_rate
  )

qb_sos_example %>%
  select(player, conference, raw_ypa, adjusted_ypa, raw_td_rate, adjusted_td_rate) %>%
  gt() %>%
  cols_label(
    player = "Player",
    conference = "Conference",
    raw_ypa = "Raw YPA",
    adjusted_ypa = "Adj. YPA",
    raw_td_rate = "Raw TD%",
    adjusted_td_rate = "Adj. TD%"
  ) %>%
  fmt_number(decimals = 2) %>%
  tab_header(
    title = "Strength of Schedule Adjustments",
    subtitle = "Stats adjusted for conference strength"
  ) %>%
  tab_style(
    style = cell_fill(color = "#e8f4f8"),
    locations = cells_body(columns = c(adjusted_ypa, adjusted_td_rate))
  )
#| label: sos-adjustment-py
#| message: false
#| warning: false

# QB strength of schedule adjustments
qb_sos = pd.DataFrame({
    'player': ['QB A', 'QB B', 'QB C', 'QB D', 'QB E'],
    'conference': ['SEC', 'MAC', 'Big Ten', 'Sun Belt', 'ACC'],
    'raw_ypa': [8.2, 9.5, 8.0, 9.8, 8.5],
    'raw_td_rate': [5.5, 7.2, 5.1, 7.8, 5.8]
})

# Merge with conference ratings
qb_sos = qb_sos.merge(
    conference_data[['conference', 'strength_rating']],
    on='conference'
)

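# Calculate adjustments: credit tougher competition, discount weaker competition.
# strength_rating is a min-max score (the weakest conference sits at exactly 0), so
# dividing by it blows up; map it to a bounded multiplier instead.
# The reference rating of 70 and the 0.2 weight are illustrative assumptions.
qb_sos['adjustment_factor'] = 1 + 0.2 * (qb_sos['strength_rating'] - 70) / 100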
qb_sos['adjusted_ypa'] = qb_sos['raw_ypa'] * qb_sos['adjustment_factor']
qb_sos['adjusted_td_rate'] = qb_sos['raw_td_rate'] * qb_sos['adjustment_factor']

print("\nStrength of Schedule Adjustments:")
print(qb_sos[['player', 'conference', 'raw_ypa', 'adjusted_ypa',
              'raw_td_rate', 'adjusted_td_rate']].to_string(index=False))

Strategic Differences

Fourth Down Decision-Making

Fourth down aggression differs significantly between college and NFL:

#| label: fourth-down-comparison-r
#| message: false
#| warning: false

# Fourth down decision rates
fourth_down_decisions <- tibble(
  level = rep(c("NFL", "College"), each = 4),
  situation = rep(c("4th & 1-2", "4th & 3-5", "4th & 6+", "In FG Range"), 2),
  go_rate = c(
    0.45, 0.15, 0.08, 0.75,  # NFL
    0.38, 0.12, 0.06, 0.82   # College
  ),
  success_rate = c(
    0.62, 0.48, 0.35, 0.85,  # NFL
    0.58, 0.45, 0.32, 0.80   # College
  )
)

fourth_down_decisions %>%
  ggplot(aes(x = situation, y = go_rate, fill = level)) +
  geom_col(position = "dodge", alpha = 0.8) +
  scale_y_continuous(labels = scales::percent_format()) +
  scale_fill_manual(values = c("NFL" = "#013369", "College" = "#841617")) +
  labs(
    title = "Fourth Down Decision Rates by Situation",
    subtitle = "College teams more conservative outside FG range, aggressive inside",
    x = "Situation",
    y = "Go-For-It Rate",
    fill = "Level"
  ) +
  theme(
    plot.title = element_text(face = "bold"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "top"
  )
#| label: fourth-down-comparison-py
#| message: false
#| warning: false

# Fourth down decisions
situations = ['4th & 1-2', '4th & 3-5', '4th & 6+', 'In FG Range']

fourth_down_data = pd.DataFrame({
    'situation': situations * 2,
    'level': ['NFL'] * 4 + ['College'] * 4,
    'go_rate': [0.45, 0.15, 0.08, 0.75, 0.38, 0.12, 0.06, 0.82],
    'success_rate': [0.62, 0.48, 0.35, 0.85, 0.58, 0.45, 0.32, 0.80]
})

# Visualize
fig, ax = plt.subplots(figsize=(10, 6))

x = np.arange(len(situations))
width = 0.35

nfl_data = fourth_down_data[fourth_down_data['level'] == 'NFL']
college_data = fourth_down_data[fourth_down_data['level'] == 'College']

ax.bar(x - width/2, nfl_data['go_rate'], width,
       label='NFL', alpha=0.8, color='#013369')
ax.bar(x + width/2, college_data['go_rate'], width,
       label='College', alpha=0.8, color='#841617')

ax.set_ylabel('Go-For-It Rate', fontsize=12)
ax.set_title('Fourth Down Decision Rates by Situation\n' +
             'College teams more conservative outside FG range, aggressive inside',
             fontsize=14, fontweight='bold', pad=20)
ax.set_xticks(x)
ax.set_xticklabels(situations, rotation=45, ha='right')
ax.legend(loc='upper right')
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))

plt.tight_layout()
plt.show()

Why the Differences?

  1. Talent disparity: College teams can dominate with talent alone
  2. Playoff structure: Win-at-all-costs mentality is less pronounced in college
  3. Risk tolerance: NFL coaches face greater job-security pressure
  4. Analytics adoption: The NFL has been faster to adopt data-driven fourth-down decisions
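
A data-driven fourth-down decision usually comes down to an expected-value comparison. The sketch below is a simplified illustration under stated assumptions: the conversion probability of 0.62 is borrowed from this chapter's NFL figure for 4th & 1-2, while the expected-points values and both function names are hypothetical placeholders, not outputs of a fitted model.

```python
def expected_value_go(p_convert: float, ep_success: float, ep_failure: float) -> float:
    """Expected points from going for it on fourth down."""
    if not 0.0 <= p_convert <= 1.0:
        raise ValueError("p_convert must be between 0 and 1")
    return p_convert * ep_success + (1.0 - p_convert) * ep_failure


def break_even_probability(ep_success: float, ep_failure: float, ep_kick: float) -> float:
    """Conversion probability at which going for it equals kicking."""
    if ep_success <= ep_failure:
        raise ValueError("ep_success must exceed ep_failure")
    return (ep_kick - ep_failure) / (ep_success - ep_failure)


# Hypothetical 4th-and-1 near midfield (expected points are assumed values)
P_CONVERT = 0.62    # chapter's NFL success rate for 4th & 1-2
EP_SUCCESS = 2.5    # assumed expected points after converting
EP_FAILURE = -1.5   # assumed expected points after a failed attempt
EP_PUNT = 0.2       # assumed expected points after punting

ev_go = expected_value_go(P_CONVERT, EP_SUCCESS, EP_FAILURE)
threshold = break_even_probability(EP_SUCCESS, EP_FAILURE, EP_PUNT)
decision = "go" if ev_go > EP_PUNT else "punt"

print(f"EV(go) = {ev_go:.2f}, EV(punt) = {EP_PUNT:.2f}, decision = {decision}")
print(f"Break-even conversion probability = {threshold:.3f}")
```

With these made-up inputs, going for it is worth about 0.98 expected points against 0.2 for the punt, and any conversion probability above 0.425 favors the attempt. The same comparison applies at either level; what changes is the conversion probability and the expected-points inputs.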

Red Zone Offense

#| label: red-zone-comparison-r
#| message: false
#| warning: false

# Red zone play-calling tendencies
red_zone_data <- tibble(
  level = rep(c("NFL", "College"), each = 10),
  yard_line = rep(20:11, 2),
  pass_rate = c(
    # NFL: More balanced
    0.52, 0.53, 0.55, 0.56, 0.58, 0.60, 0.62, 0.58, 0.55, 0.48,
    # College: More pass-heavy
    0.58, 0.60, 0.62, 0.64, 0.66, 0.68, 0.65, 0.62, 0.58, 0.52
  ),
  td_rate = c(
    # NFL: Lower variance
    0.38, 0.40, 0.42, 0.45, 0.48, 0.52, 0.55, 0.58, 0.62, 0.68,
    # College: More variance
    0.42, 0.45, 0.48, 0.52, 0.56, 0.60, 0.64, 0.68, 0.72, 0.76
  )
)

p1 <- red_zone_data %>%
  ggplot(aes(x = yard_line, y = pass_rate, color = level)) +
  geom_line(linewidth = 1.2) +  # `size` for lines is deprecated as of ggplot2 3.4.0
  geom_point(size = 2) +
  scale_x_reverse() +
  scale_y_continuous(labels = scales::percent_format()) +
  scale_color_manual(values = c("NFL" = "#013369", "College" = "#841617")) +
  labs(
    title = "Red Zone Pass Rate by Field Position",
    x = "Yard Line",
    y = "Pass Rate",
    color = "Level"
  ) +
  theme(legend.position = "top")

p2 <- red_zone_data %>%
  ggplot(aes(x = yard_line, y = td_rate, color = level)) +
  geom_line(linewidth = 1.2) +  # `size` for lines is deprecated as of ggplot2 3.4.0
  geom_point(size = 2) +
  scale_x_reverse() +
  scale_y_continuous(labels = scales::percent_format()) +
  scale_color_manual(values = c("NFL" = "#013369", "College" = "#841617")) +
  labs(
    title = "Red Zone TD Rate by Field Position",
    x = "Yard Line",
    y = "Touchdown Rate",
    color = "Level"
  ) +
  theme(legend.position = "top")

p1 + p2 +
  plot_annotation(
    title = "Red Zone Tendencies: NFL vs College",
    theme = theme(plot.title = element_text(face = "bold", size = 14))
  )
#| label: red-zone-comparison-py
#| message: false
#| warning: false

# Red zone data
yard_lines = list(range(20, 10, -1))

red_zone = pd.DataFrame({
    'yard_line': yard_lines * 2,
    'level': ['NFL'] * 10 + ['College'] * 10,
    'pass_rate': [0.52, 0.53, 0.55, 0.56, 0.58, 0.60, 0.62, 0.58, 0.55, 0.48,
                  0.58, 0.60, 0.62, 0.64, 0.66, 0.68, 0.65, 0.62, 0.58, 0.52],
    'td_rate': [0.38, 0.40, 0.42, 0.45, 0.48, 0.52, 0.55, 0.58, 0.62, 0.68,
                0.42, 0.45, 0.48, 0.52, 0.56, 0.60, 0.64, 0.68, 0.72, 0.76]
})

# Create subplots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

# Pass rate plot
for level, color in [('NFL', '#013369'), ('College', '#841617')]:
    data = red_zone[red_zone['level'] == level]
    ax1.plot(data['yard_line'], data['pass_rate'],
             marker='o', linewidth=2, label=level, color=color)

ax1.set_xlabel('Yard Line', fontsize=11)
ax1.set_ylabel('Pass Rate', fontsize=11)
ax1.set_title('Red Zone Pass Rate by Field Position', fontsize=12, fontweight='bold')
ax1.legend()
ax1.invert_xaxis()
ax1.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))
ax1.grid(alpha=0.3)

# TD rate plot
for level, color in [('NFL', '#013369'), ('College', '#841617')]:
    data = red_zone[red_zone['level'] == level]
    ax2.plot(data['yard_line'], data['td_rate'],
             marker='o', linewidth=2, label=level, color=color)

ax2.set_xlabel('Yard Line', fontsize=11)
ax2.set_ylabel('Touchdown Rate', fontsize=11)
ax2.set_title('Red Zone TD Rate by Field Position', fontsize=12, fontweight='bold')
ax2.legend()
ax2.invert_xaxis()
ax2.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0%}'))
ax2.grid(alpha=0.3)

plt.suptitle('Red Zone Tendencies: NFL vs College',
             fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

Draft Pick Value and Correlation

Connecting to Chapter 27's draft analytics, we can examine how college performance correlates with draft position and subsequent NFL success.

Draft Position and College Production

#| label: draft-college-correlation-r
#| message: false
#| warning: false
#| cache: true

# Simulated data: college production vs draft pick vs NFL success
set.seed(888)

draft_analysis <- tibble(
  player_id = 1:200,
  college_epa_per_play = rnorm(200, mean = 0.15, sd = 0.12),
  college_sp_rating = rnorm(200, mean = 110, sd = 15),
  conf_strength = sample(conference_ratings$strength_rating, 200, replace = TRUE),
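  # Tie draft slot loosely to production (better EPA -> earlier pick, plus noise)
  # so the simulated data reflect the relationship plotted below
  draft_pick = round(scales::rescale(
    rank(-(college_epa_per_play + rnorm(200, sd = 0.15))), to = c(1, 250)
  ))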
) %>%
  mutate(
    # NFL success correlated with college performance, draft position, and conference
    nfl_value =
      0.5 * scales::rescale(college_epa_per_play) +
      0.2 * scales::rescale(conf_strength) +
      -0.3 * scales::rescale(draft_pick) +
      rnorm(200, mean = 0, sd = 0.2),
    draft_round = case_when(
      draft_pick <= 32 ~ "Round 1",
      draft_pick <= 64 ~ "Round 2",
      draft_pick <= 100 ~ "Round 3",
      draft_pick <= 140 ~ "Round 4-5",
      TRUE ~ "Round 6-7"
    ),
    nfl_success = case_when(
      nfl_value > 0.6 ~ "Star",
      nfl_value > 0.4 ~ "Starter",
      nfl_value > 0.2 ~ "Rotation",
      TRUE ~ "Bust"
    )
  )

# Visualize: College EPA vs Draft Pick
draft_analysis %>%
  ggplot(aes(x = college_epa_per_play, y = draft_pick, color = nfl_success)) +
  geom_point(alpha = 0.6, size = 2.5) +
  geom_smooth(method = "lm", se = TRUE, color = "black", linetype = "dashed") +
  scale_y_reverse() +
  scale_color_manual(
    values = c("Star" = "#06d6a0", "Starter" = "#118ab2",
               "Rotation" = "#ffd166", "Bust" = "#ef476f")
  ) +
  labs(
    title = "College Performance vs Draft Position",
    subtitle = "Better college production correlates with higher draft picks",
    x = "College EPA per Play",
    y = "Draft Pick (lower = better)",
    color = "NFL Outcome"
  ) +
  theme(
    plot.title = element_text(face = "bold"),
    legend.position = "right"
  )

# Success rates by round
success_by_round <- draft_analysis %>%
  mutate(draft_round = factor(draft_round, levels = c("Round 1", "Round 2",
                                                       "Round 3", "Round 4-5",
                                                       "Round 6-7"))) %>%
  group_by(draft_round, nfl_success) %>%
  summarise(count = n(), .groups = "drop_last") %>%
  mutate(pct = count / sum(count))

success_by_round %>%
  ggplot(aes(x = draft_round, y = pct, fill = nfl_success)) +
  geom_col(position = "fill", alpha = 0.8) +
  scale_y_continuous(labels = scales::percent_format()) +
  scale_fill_manual(
    values = c("Star" = "#06d6a0", "Starter" = "#118ab2",
               "Rotation" = "#ffd166", "Bust" = "#ef476f")
  ) +
  labs(
    title = "NFL Success Rate by Draft Round",
    subtitle = "Earlier picks have higher success rates but still significant bust risk",
    x = "Draft Round",
    y = "Percentage",
    fill = "NFL Outcome"
  ) +
  theme(
    plot.title = element_text(face = "bold"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "right"
  )
#| label: draft-college-correlation-py
#| message: false
#| warning: false

from sklearn.preprocessing import minmax_scale

# Simulated draft data
np.random.seed(888)

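college_epa = np.random.normal(0.15, 0.12, 200)
# Tie draft slot loosely to production (better EPA -> earlier pick, plus noise)
# so the simulated data reflect the relationship plotted below
draft_order = (-(college_epa + np.random.normal(0, 0.15, 200))).argsort().argsort()

draft_df = pd.DataFrame({
    'player_id': range(1, 201),
    'college_epa_per_play': college_epa,
    'college_sp_rating': np.random.normal(110, 15, 200),
    'conf_strength': np.random.choice(conference_data['strength_rating'].values, 200),
    'draft_pick': np.round(1 + draft_order * 249 / 199).astype(int)
})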

# Calculate NFL value
draft_df['nfl_value'] = (
    0.5 * minmax_scale(draft_df['college_epa_per_play']) +
    0.2 * minmax_scale(draft_df['conf_strength']) +
    -0.3 * minmax_scale(draft_df['draft_pick']) +
    np.random.normal(0, 0.2, 200)
)

# Categorize
draft_df['draft_round'] = pd.cut(
    draft_df['draft_pick'],
    bins=[0, 32, 64, 100, 140, 250],
    labels=['Round 1', 'Round 2', 'Round 3', 'Round 4-5', 'Round 6-7']
)

draft_df['nfl_success'] = pd.cut(
    draft_df['nfl_value'],
    bins=[-np.inf, 0.2, 0.4, 0.6, np.inf],
    labels=['Bust', 'Rotation', 'Starter', 'Star']
)

# Visualize college performance vs draft pick
fig, ax = plt.subplots(figsize=(10, 6))

success_colors = {'Star': '#06d6a0', 'Starter': '#118ab2',
                  'Rotation': '#ffd166', 'Bust': '#ef476f'}

for success, color in success_colors.items():
    mask = draft_df['nfl_success'] == success
    ax.scatter(draft_df.loc[mask, 'college_epa_per_play'],
               draft_df.loc[mask, 'draft_pick'],
               alpha=0.6, s=50, label=success, color=color)

# Add trend line
z = np.polyfit(draft_df['college_epa_per_play'], draft_df['draft_pick'], 1)
p = np.poly1d(z)
x_line = np.linspace(draft_df['college_epa_per_play'].min(),
                    draft_df['college_epa_per_play'].max(), 100)
ax.plot(x_line, p(x_line), 'k--', alpha=0.5, linewidth=2)

ax.set_xlabel('College EPA per Play', fontsize=12)
ax.set_ylabel('Draft Pick (lower = better)', fontsize=12)
ax.set_title('College Performance vs Draft Position\n' +
             'Better college production correlates with higher draft picks',
             fontsize=14, fontweight='bold', pad=20)
ax.invert_yaxis()
ax.legend(title='NFL Outcome', loc='lower left')
ax.grid(alpha=0.3)
plt.tight_layout()
plt.show()

# Success rates by round
success_by_round = (draft_df.groupby(['draft_round', 'nfl_success'], observed=False)
                    .size()
                    .unstack(fill_value=0))
success_pct = success_by_round.div(success_by_round.sum(axis=1), axis=0)

print("\nNFL Success Rate by Draft Round:")
print(success_pct.round(3))
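
Because draft position is an ordinal quantity, a rank correlation is a natural way to quantify how production, draft slot, and NFL value relate. The sketch below computes Spearman's rho by ranking both variables and taking the Pearson correlation of the ranks. It builds its own small simulation, so every coefficient, noise level, and name in it is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
n = 2000

# Synthetic assumption: better college production tends to earn an earlier pick
college_epa = rng.normal(0.15, 0.12, n)
pick_score = college_epa + rng.normal(0, 0.15, n)

sim = pd.DataFrame({'college_epa': college_epa})
# Rank the negated score so the highest score becomes pick 1
sim['draft_pick'] = (-pd.Series(pick_score)).rank(method='first').astype(int)
# Assumed outcome model: value rises with production and falls with later picks
sim['nfl_value'] = (0.5 * sim['college_epa']
                    - 0.2 * sim['draft_pick'] / n
                    + rng.normal(0, 0.1, n))


def spearman(x: pd.Series, y: pd.Series) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return float(x.rank().corr(y.rank()))


rho_epa_pick = spearman(sim['college_epa'], sim['draft_pick'])
rho_pick_value = spearman(sim['draft_pick'], sim['nfl_value'])

print(f"College EPA vs draft pick: rho = {rho_epa_pick:.2f}")
print(f"Draft pick vs NFL value:   rho = {rho_pick_value:.2f}")
```

Both coefficients come out negative in this setup because a lower pick number means an earlier selection. On real data the same two lines would indicate how much signal draft slot carries beyond college production.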

EPA Comparison Across Levels

While EPA frameworks exist at both levels, the metrics aren't directly comparable due to competition differences.

#| label: epa-level-comparison-r
#| message: false
#| warning: false

# Simulated EPA distributions by level
set.seed(999)

epa_comparison <- tibble(
  level = c(rep("NFL", 5000), rep("College - Elite", 5000),
            rep("College - Average", 5000)),
  play_epa = c(
    rnorm(5000, mean = 0.00, sd = 1.8),      # NFL
    rnorm(5000, mean = 0.15, sd = 2.2),      # College Elite
    rnorm(5000, mean = 0.05, sd = 2.8)       # College Average
  )
)

epa_comparison %>%
  ggplot(aes(x = play_epa, fill = level)) +
  geom_density(alpha = 0.5) +
  geom_vline(xintercept = 0, linetype = "dashed", alpha = 0.5) +
  scale_fill_manual(
    values = c("NFL" = "#013369", "College - Elite" = "#841617",
               "College - Average" = "#d4af37")
  ) +
  scale_x_continuous(limits = c(-8, 8)) +
  labs(
    title = "EPA Distribution by Level",
    subtitle = "College shows higher mean and variance due to talent disparity",
    x = "EPA per Play",
    y = "Density",
    fill = "Level"
  ) +
  theme(
    plot.title = element_text(face = "bold"),
    legend.position = "top"
  )

# Summary statistics
epa_summary <- epa_comparison %>%
  group_by(level) %>%
  summarise(
    mean_epa = mean(play_epa),
    sd_epa = sd(play_epa),
    p25 = quantile(play_epa, 0.25),
    median = median(play_epa),
    p75 = quantile(play_epa, 0.75),
    .groups = "drop"
  )

epa_summary %>%
  gt() %>%
  cols_label(
    level = "Level",
    mean_epa = "Mean",
    sd_epa = "Std Dev",
    p25 = "25th %ile",
    median = "Median",
    p75 = "75th %ile"
  ) %>%
  fmt_number(decimals = 3) %>%
  tab_header(
    title = "EPA Summary Statistics by Level"
  )
#| label: epa-level-comparison-py
#| message: false
#| warning: false

# EPA distributions
np.random.seed(999)

epa_data = pd.DataFrame({
    'level': (['NFL'] * 5000 + ['College - Elite'] * 5000 +
              ['College - Average'] * 5000),
    'play_epa': np.concatenate([
        np.random.normal(0.00, 1.8, 5000),   # NFL
        np.random.normal(0.15, 2.2, 5000),   # College Elite
        np.random.normal(0.05, 2.8, 5000)    # College Average
    ])
})

# Visualize
fig, ax = plt.subplots(figsize=(12, 6))

for level, color in [('NFL', '#013369'),
                     ('College - Elite', '#841617'),
                     ('College - Average', '#d4af37')]:
    data = epa_data[epa_data['level'] == level]['play_epa']
    data.plot(kind='density', ax=ax, label=level, alpha=0.5, color=color)

ax.axvline(x=0, color='gray', linestyle='--', alpha=0.5)
ax.set_xlim(-8, 8)
ax.set_xlabel('EPA per Play', fontsize=12)
ax.set_ylabel('Density', fontsize=12)
ax.set_title('EPA Distribution by Level\n' +
             'College shows higher mean and variance due to talent disparity',
             fontsize=14, fontweight='bold', pad=20)
ax.legend(loc='upper right')
ax.grid(alpha=0.3)
plt.tight_layout()
plt.show()

# Summary statistics
epa_summary = epa_data.groupby('level')['play_epa'].describe()[
    ['mean', 'std', '25%', '50%', '75%']
]
print("\nEPA Summary Statistics by Level:")
print(epa_summary.round(3))

Key Takeaways:

  1. College EPA has higher variance due to talent disparities
  2. Elite college offenses generate higher EPA than NFL average
  3. Direct comparison requires normalization
  4. Within-level percentile rankings more meaningful than raw values
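
The normalization mentioned in the takeaways can be as simple as standardizing within each level. The sketch below computes within-level z-scores and percentile ranks with pandas; the eight players and their EPA values are hypothetical.

```python
import pandas as pd

# Hypothetical EPA per play for four players at each level
players = pd.DataFrame({
    'player': ['NFL 1', 'NFL 2', 'NFL 3', 'NFL 4',
               'CFB 1', 'CFB 2', 'CFB 3', 'CFB 4'],
    'level': ['NFL'] * 4 + ['College'] * 4,
    'epa_per_play': [0.20, 0.10, 0.00, -0.10, 0.45, 0.30, 0.15, 0.00],
})

by_level = players.groupby('level')['epa_per_play']

# z-score: distance from the level's mean in units of the level's standard deviation
players['epa_z'] = ((players['epa_per_play'] - by_level.transform('mean'))
                    / by_level.transform('std'))
# Percentile rank within level (1.0 = best at that level)
players['epa_pctile'] = by_level.rank(pct=True)

print(players.sort_values(['level', 'epa_pctile'], ascending=[True, False])
      .round(2).to_string(index=False))
```

In this toy data, "CFB 2" has a higher raw EPA (0.30) than "NFL 1" (0.20) yet sits at the 75th percentile of its level, while "NFL 1" leads its own. That is the sense in which within-level rankings are more informative than raw values.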

Future of College Analytics

The college football analytics landscape continues to evolve rapidly.

1. Transfer Portal Analytics:
- Evaluating transfer prospects vs high school recruits
- Predicting transfer destination impacts
- Optimizing roster construction through transfers

2. NIL (Name, Image, Likeness) Valuation:
- Player market value estimation
- ROI analysis on NIL investments
- Competitive balance implications

3. College Football Playoff Expansion:
- 12-team playoff beginning 2024
- New optimization problems for scheduling
- In-season strategy changes

4. Enhanced Tracking Data:
- Growing availability of player tracking in college
- Alignment with NFL tracking systems
- Improved projection models

Analytics Adoption Barriers

College programs face unique challenges:

  1. Budget constraints: Not all programs can afford analytics staff
  2. Conference disparities: Power 5 vs Group of 5 resources
  3. Coaching turnover: Less institutional knowledge retention
  4. Data availability: Less comprehensive than NFL
  5. Cultural resistance: Traditional scouting still dominant

Opportunity for Analysts

The college analytics space remains less saturated than NFL analytics, creating opportunities for analysts to make significant impacts. Many programs are still in early stages of analytics adoption, and competitive advantages remain available for teams that effectively leverage data.

Summary

College and NFL football present distinct analytical challenges and opportunities:

Key Differences:
- Rule variations affecting strategy and evaluation
- Massive talent disparity in college vs compressed NFL talent
- Pace and tempo variance much wider in college
- Different strategic priorities and risk tolerances

Projection Challenges:
- Competition adjustment critical for college evaluation
- Physical traits transfer better than production stats
- Position-specific translation rates vary significantly
- Conference strength must be incorporated

Analytical Approaches:
- Within-level analysis more straightforward than cross-level
- Percentile rankings often more useful than raw stats
- Context matters enormously in college (talent, scheme, competition)
- NFL methods can inform college but require adaptation

Future Directions:
- Transfer portal and NIL create new analytical frontiers
- Increasing data availability enabling better projections
- Growing analytics adoption at college level
- Enhanced tracking data bridging college-NFL gap

Understanding these differences allows analysts to:
- Build better projection models for the draft
- Appropriately evaluate college vs NFL performance
- Translate methods across levels effectively
- Recognize which insights transfer and which don't

Exercises

Conceptual Questions

  1. Rule Impact Analysis: Choose one rule difference between college and NFL (e.g., hash marks, overtime, pass interference). Explain how this rule affects optimal strategy at each level and discuss the implications for analytics.

  2. Projection Challenges: A quarterback dominated at a Group of 5 school, posting a 72% completion rate and 9.5 yards per attempt. Another QB from the SEC had 64% completion and 8.0 YPA. Discuss how you would evaluate these players for NFL projection.

  3. Metric Translation: For each metric below, discuss whether it translates well from college to NFL and why:
    - Total rushing yards
    - Yards per carry
    - Forced missed tackles per touch
    - Red zone touchdown rate
    - Third-down conversion rate

Coding Exercises

Exercise 1: Pace Analysis

Using the tempo data provided in this chapter:

  a) Calculate the coefficient of variation for seconds per play at each level
  b) Create a visualization showing the relationship between pace and scoring
  c) Identify which teams are "pace outliers" (>1.5 SD from mean)

**Hint**: Use `cv = sd / mean` for coefficient of variation.
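
As a starting point, the hint's formula and the 1.5 SD outlier rule might be sketched as follows. The function names and the seconds-per-play values are hypothetical placeholders, not the chapter's tempo data, and the sketch assumes the sample standard deviation (`statistics.stdev`) rather than the population version.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """cv = sd / mean, using the sample standard deviation (an assumption)."""
    return stdev(values) / mean(values)


def pace_outliers(pace_by_team, threshold=1.5):
    """Return {team: z-score} for teams more than `threshold` SDs from the mean."""
    values = list(pace_by_team.values())
    mu, sd = mean(values), stdev(values)
    return {
        team: round((pace - mu) / sd, 2)
        for team, pace in pace_by_team.items()
        if abs(pace - mu) > threshold * sd
    }


# Hypothetical seconds-per-play values (illustrative only)
college_pace = {
    "Team A": 20.0, "Team B": 26.0, "Team C": 27.0,
    "Team D": 27.0, "Team E": 28.0, "Team F": 28.0,
}

cv = coefficient_of_variation(list(college_pace.values()))
outliers = pace_outliers(college_pace)
print(f"CV: {cv:.3f}")
print(f"Outliers: {outliers}")
```

Running the same two functions on each level's data gives directly comparable CV values, since the coefficient of variation is unitless.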

Exercise 2: College-to-NFL QB Projection

Build your own QB projection model:

  a) Create a dataset with college performance metrics, physical attributes, and conference strength
  b) Build a linear regression model predicting NFL EPA per play
  c) Evaluate model performance using R² and RMSE
  d) Identify which college metrics are most predictive
  e) Generate predictions for a new class of QB prospects

**Data features to include**:

- Completion percentage
- Yards per attempt
- TD rate and INT rate
- Conference strength rating
- Height, weight, 40-yard dash time
- Games started
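
For the evaluation step, a minimal single-predictor sketch of R² and RMSE is shown below. It is deliberately simpler than the multi-feature model the exercise asks for, and the helper names and the five data points are hypothetical values chosen for illustration.

```python
from math import sqrt


def fit_simple_ols(x, y):
    """Least-squares slope and intercept for one predictor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def rmse(actual, predicted):
    """Root mean squared error."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical prospects: college YPA and later NFL EPA per play (not real data)
college_ypa = [7.0, 7.5, 8.0, 8.5, 9.0]
nfl_epa = [-0.05, 0.00, 0.02, 0.08, 0.10]

slope, intercept = fit_simple_ols(college_ypa, nfl_epa)
predicted = [intercept + slope * x for x in college_ypa]
model_r2 = r_squared(nfl_epa, predicted)
model_rmse = rmse(nfl_epa, predicted)
print(f"slope={slope:.3f}, R^2={model_r2:.3f}, RMSE={model_rmse:.4f}")
```

These are in-sample figures. Scoring a held-out draft class would give a more honest picture of projection accuracy.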

Exercise 3: Conference Strength Rating System

Develop a conference strength rating system:

  a) Define 4-5 metrics to evaluate conference strength (recruiting, NFL draft picks, win rates, etc.)
  b) Collect or simulate data for these metrics
  c) Create a weighted composite rating
  d) Validate your system by comparing to actual NFL draft production
  e) Visualize conference rankings over time (if multi-year data available)

**Bonus**: Build a regression model showing how conference strength affects player projection accuracy.
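
One possible way to build the weighted composite is to standardize each metric across conferences and then take a weighted average of the z-scores. The sketch below assumes that approach. The metric names, weights, and conference values are hypothetical, and only two metrics are shown for brevity.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values to mean 0 and SD 1 (population SD)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def composite_rating(metrics, weights):
    """metrics: {metric: {conference: value}}; weights: {metric: weight}."""
    total_weight = sum(weights.values())
    conferences = list(next(iter(metrics.values())))
    scores = {conf: 0.0 for conf in conferences}
    for name, by_conf in metrics.items():
        z = zscores([by_conf[conf] for conf in conferences])
        for conf, z_value in zip(conferences, z):
            # Weights are normalized so the composite stays on the z-score scale
            scores[conf] += weights[name] / total_weight * z_value
    return scores


# Hypothetical inputs (illustrative only)
metrics = {
    "draft_picks": {"Conf A": 60, "Conf B": 40, "Conf C": 20},
    "nonconf_win_pct": {"Conf A": 0.70, "Conf B": 0.55, "Conf C": 0.40},
}
weights = {"draft_picks": 0.6, "nonconf_win_pct": 0.4}

ratings = composite_rating(metrics, weights)
for conf, score in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{conf}: {score:+.2f}")
```

Standardizing first keeps a metric measured in large units (draft picks) from swamping one measured as a proportion (win percentage).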

Exercise 4: Strength of Schedule Adjustment

Create a strength-of-schedule adjustment system:

  a) Calculate average opponent rating for each team's schedule
  b) Develop an adjustment factor to normalize stats across schedules
  c) Apply adjustments to QB or RB statistics
  d) Compare raw vs adjusted rankings
  e) Measure how adjustments improve correlation with draft position

**Example**: A QB who posts 8.5 YPA against a schedule with an average opponent strength of 75 should be adjusted differently from one who posts 8.5 YPA against an average opponent strength of 85.
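
To make the 75 versus 85 example concrete, here is one simple adjustment, sketched under strong assumptions: the stat is scaled in proportion to how far the schedule strength sits from a baseline. The linear form, the baseline of 80, and the `sensitivity` parameter are all hypothetical choices for illustration. A real system would estimate the relationship from data.

```python
def average_opponent_rating(opponent_ratings):
    """Mean rating of the opponents on a schedule."""
    return sum(opponent_ratings) / len(opponent_ratings)


def sos_adjust(raw_stat, schedule_strength, baseline_strength, sensitivity=1.0):
    """Scale a stat up for harder-than-baseline schedules, down for easier ones.

    sensitivity=1.0 means a schedule 10% tougher than baseline adds 10% to the
    stat. This linear form is an assumption, not an estimated relationship.
    """
    relative_gap = (schedule_strength - baseline_strength) / baseline_strength
    return raw_stat * (1 + sensitivity * relative_gap)


BASELINE = 80.0  # hypothetical average schedule strength

easy_schedule = average_opponent_rating([70, 75, 80])  # 75.0
hard_schedule = average_opponent_rating([80, 85, 90])  # 85.0

adjusted_easy = sos_adjust(8.5, easy_schedule, BASELINE)
adjusted_hard = sos_adjust(8.5, hard_schedule, BASELINE)
print(f"8.5 YPA vs strength 75 -> {adjusted_easy:.2f}")
print(f"8.5 YPA vs strength 85 -> {adjusted_hard:.2f}")
```

Part (e) of the exercise is the check on whether any such choice is reasonable: the adjusted values should correlate with draft position better than the raw ones.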

Exercise 5: Competitive Balance Analysis

Analyze competitive balance differences:

  a) Simulate or load win-loss records for NFL and college teams
  b) Calculate Gini coefficient for each level
  c) Compute standard deviation of win percentages
  d) Visualize win distributions with histograms or density plots
  e) Discuss implications for analytics and prediction

**Metric**: Gini coefficient ranges from 0 (perfect equality) to 1 (perfect inequality).
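
A minimal sketch of the Gini calculation is given below, using the standard rank-based formula for sorted, non-negative values. The win totals are hypothetical and chosen only to show the direction of the comparison.

```python
def gini(values):
    """Gini coefficient: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    with x sorted ascending and i = 1..n."""
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    if n == 0 or total == 0:
        raise ValueError("gini() needs at least one positive value")
    rank_weighted = sum(rank * v for rank, v in enumerate(ordered, start=1))
    return 2 * rank_weighted / (n * total) - (n + 1) / n


# Hypothetical season win totals (illustrative only)
nfl_wins = [7, 8, 8, 9, 9, 10]
college_wins = [2, 4, 6, 8, 10, 12]

nfl_gini = gini(nfl_wins)
college_gini = gini(college_wins)
print(f"NFL Gini: {nfl_gini:.3f}")
print(f"College Gini: {college_gini:.3f}")
```

With a finite sample of n teams, this formula tops out at (n - 1) / n rather than exactly 1, and because season wins are capped by schedule length, observed values in football will sit well below that ceiling.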

Further Reading

College Football Analytics:

  • Connelly, B. (2023). The College Football Analytics Revolution. SB Nation.
  • SP+ Ratings methodology: https://www.espn.com/college-football/story/_/id/32982470
  • FEI (Fremeau Efficiency Index): https://www.bcftoys.com/

Draft Analytics:

  • Massey, C. & Thaler, R. (2013). "The Loser's Curse: Decision Making and Market Efficiency in the National Football League Draft." Management Science, 59(7), 1479-1495.
  • Schatz, A. (2006). "Skill vs. Luck in the NFL Draft." Football Outsiders Almanac.
  • Mulholland, J. & Jensen, S. (2014). "Evaluating NFL Draft Choices." Journal of Quantitative Analysis in Sports, 10(3).

College-NFL Translation:

  • Yurko, R., et al. (2020). "Going Deep: Models for Continuous-Time Within-Play Valuation of Game Outcomes in American Football with Tracking Data." Journal of Quantitative Analysis in Sports, 16(2).
  • NFLRANK methodology: https://www.nfl.com/prospects/
  • PFF College-to-NFL Translation: https://www.pff.com/

Rule Differences:

  • NCAA Football Rules: https://www.ncaapublications.com/p-4679-2024-ncaa-football-rules-and-interpretations.aspx
  • NFL Rulebook: https://operations.nfl.com/the-rules/

References

:::