Learning Objectives

By the end of this chapter, you will be able to:

  1. Analyze recruiting class quality using composite rankings and star ratings
  2. Project high school performance to college success using statistical models
  3. Evaluate recruiting strategy and geographic footprint
  4. Study recruiting ROI and player development effectiveness
  5. Build comprehensive recruiting analytics systems

Introduction

In college football, recruiting is the lifeblood of sustained success. Coaching matters, schemes evolve, and game-day execution varies, but the quality of talent on the roster ultimately determines championship potential. The widely cited "Blue Chip Ratio" analysis found that since the BCS era began in 1998, only teams whose signees over the preceding four recruiting classes were at least 50% four- and five-star recruits have won national championships.

This chapter explores how data science and analytics can provide competitive advantages in recruiting. We'll examine how to evaluate recruiting classes, project high school talent to college performance, analyze geographic recruiting patterns, measure recruiting ROI, and build systems to optimize talent acquisition strategies.

What is Recruiting Analytics?

Recruiting analytics is the application of statistical methods and data science to evaluate high school talent, optimize recruiting strategies, and measure the return on investment from recruiting efforts. It encompasses player evaluation, class construction, geographic analysis, and performance projection.

The Recruiting Landscape

Star Ratings and Recruiting Services

The recruiting industry is dominated by several major services that evaluate and rank high school prospects:

  • 247Sports - Composite rankings aggregating multiple services
  • Rivals - Independent rankings and team recruiting coverage
  • ESPN - National recruiting rankings
  • On3 - Newer service incorporating NIL valuations

Star ratings typically range from 2-star (low FBS/high FCS level) to 5-star (elite, transformational talent). The distribution follows a pyramid:

  • 5-Star: ~30-35 players nationally per year
  • 4-Star: ~300-350 players nationally
  • 3-Star: ~2,000+ players nationally
  • 2-Star and below: Remaining FBS/FCS prospects
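Each star tier is a band on a continuous rating scale, so two "four-star" recruits can differ substantially in their underlying rating. The sketch below maps a 0-1 composite rating to a tier. The cutoffs are the commonly cited 247Sports Composite thresholds, included as an assumption to check against the service's current documentation rather than as an official specification.

```python
import bisect

# Assumed 247Sports Composite cutoffs (verify before relying on them):
# >= 0.7970 is 3 stars, >= 0.8900 is 4 stars, >= 0.9834 is 5 stars
STAR_CUTOFFS = [0.7970, 0.8900, 0.9834]


def rating_to_stars(rating: float) -> int:
    """Map a 0-1 composite rating to a star tier (2 means two stars or fewer)."""
    # bisect_right counts how many cutoffs the rating meets or exceeds
    return 2 + bisect.bisect_right(STAR_CUTOFFS, rating)


for r in [0.9950, 0.9834, 0.9000, 0.8500, 0.7500]:
    print(f"{r:.4f} -> {rating_to_stars(r)} stars")
```

Using `bisect_right` makes a rating exactly at a cutoff land in the higher tier, matching the "at or above" reading of the thresholds.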

The Transfer Portal Era

Since the NCAA relaxed its transfer rules in 2021, the recruiting landscape has fundamentally changed:

  • One-time transfer: Players can transfer once without sitting out a season
  • Transfer portal windows: Players enter the portal during designated winter and spring windows
  • NIL impact: Name, Image, Likeness deals influence both HS recruiting and transfers
  • Roster management: Increased roster turnover requires continuous talent acquisition

Loading and Analyzing Recruiting Data

Accessing Recruiting Data

#| label: setup-r
#| message: false
#| warning: false

# Load required libraries
library(tidyverse)
library(cfbfastR)
library(gt)
library(gtExtras)
library(ggplot2)
library(sf)
library(maps)

# Set plotting theme
theme_set(theme_minimal())

cat("✓ R packages loaded successfully\n")
#| label: load-recruiting-data-r
#| message: false
#| warning: false
#| cache: true

# Load recruiting data for recent years.
# cfbfastR reads a CFBD API key from the CFBD_API_KEY environment
# variable, e.g. Sys.setenv(CFBD_API_KEY = "your-key-here")
recruiting_data <- purrr::map_dfr(
  2020:2023,
  ~ cfbd_recruiting_player(year = .x)
) %>%
  # `school` is the recruit's high school and `committed_to` the college;
  # rename so that `school` means the college program (and `state` the
  # recruit's home state) throughout this chapter
  rename(
    high_school = school,
    school = committed_to,
    state = state_province
  ) %>%
  # Drop recruits who had not committed to a program
  filter(!is.na(school)) %>%
  mutate(
    year = as.integer(year),
    stars = as.integer(stars),
    rating = as.numeric(rating)
  )

cat("Loaded", nrow(recruiting_data), "recruits from 2020-2023\n")
cat("Sample of data:\n")
recruiting_data %>%
  select(year, name, school, position, stars, rating, state) %>%
  head(10) %>%
  print()

#| label: setup-py
#| message: false
#| warning: false

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import requests
import json
from scipy import stats
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Set style
sns.set_style("whitegrid")
plt.rcParams['figure.dpi'] = 100

print("✓ Python packages loaded successfully")
#| label: load-recruiting-data-py
#| message: false
#| warning: false
#| cache: true

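import os

# Function to load recruiting data from the CFBD API
def load_recruiting_data(years, api_key=None):
    """Load high school recruit data from the College Football Data API.

    The API requires a free key; by default it is read from the
    CFBD_API_KEY environment variable.
    """
    api_key = api_key or os.environ.get("CFBD_API_KEY")
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    url = "https://api.collegefootballdata.com/recruiting/players"
    all_data = []

    for year in years:
        try:
            response = requests.get(url, params={"year": year},
                                    headers=headers, timeout=30)
            response.raise_for_status()
            df = pd.DataFrame(response.json())
            df['year'] = year
            all_data.append(df)
        except requests.RequestException as e:
            print(f"Error loading {year}: {e}")

    if not all_data:
        raise RuntimeError("No recruiting data loaded; check CFBD_API_KEY")
    return pd.concat(all_data, ignore_index=True)

# Load data for recent years
years = [2020, 2021, 2022, 2023]
recruiting_data = load_recruiting_data(years)

# In the API response `school` is the recruit's high school and
# `committedTo` is the college. Rename so `school` means the college
# program throughout this chapter, and drop uncommitted recruits.
recruiting_data = (recruiting_data
    .rename(columns={'school': 'highSchool', 'committedTo': 'school'})
    .dropna(subset=['school'])
)

# Clean and prepare data
recruiting_data['stars'] = pd.to_numeric(recruiting_data['stars'], errors='coerce')
recruiting_data['rating'] = pd.to_numeric(recruiting_data['rating'], errors='coerce')
recruiting_data['year'] = pd.to_numeric(recruiting_data['year'], errors='coerce')

print(f"Loaded {len(recruiting_data):,} recruits from 2020-2023")
print("\nSample of data:")
print(recruiting_data[['year', 'name', 'school', 'position',
                       'stars', 'rating', 'stateProvince']].head(10))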

Recruiting Class Rankings

Let's analyze team recruiting class rankings:

#| label: class-rankings-r
#| message: false
#| warning: false

# Calculate team recruiting class metrics
class_rankings <- recruiting_data %>%
  group_by(year, school) %>%
  summarise(
    total_recruits = n(),
    five_stars = sum(stars == 5, na.rm = TRUE),
    four_stars = sum(stars == 4, na.rm = TRUE),
    three_stars = sum(stars == 3, na.rm = TRUE),
    avg_rating = mean(rating, na.rm = TRUE),
    avg_stars = mean(stars, na.rm = TRUE),
    blue_chips = sum(stars >= 4, na.rm = TRUE),
    blue_chip_pct = blue_chips / total_recruits,
    total_points = sum(rating, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(year, desc(avg_rating))

# Top 10 classes in 2023
class_rankings %>%
  filter(year == 2023) %>%
  head(10) %>%
  gt() %>%
  cols_label(
    year = "Year",
    school = "School",
    total_recruits = "Recruits",
    five_stars = "5★",
    four_stars = "4★",
    three_stars = "3★",
    avg_rating = "Avg Rating",
    blue_chip_pct = "Blue Chip %"
  ) %>%
  fmt_number(
    columns = c(avg_rating),
    decimals = 2
  ) %>%
  fmt_percent(
    columns = blue_chip_pct,
    decimals = 1
  ) %>%
  tab_header(
    title = "Top 10 Recruiting Classes - 2023",
    subtitle = "Ranked by average recruit rating"
  )
#| label: class-rankings-py
#| message: false
#| warning: false

# Calculate team recruiting class metrics
class_rankings = (recruiting_data
    .groupby(['year', 'school'])
    .agg(
        total_recruits=('name', 'count'),
        five_stars=('stars', lambda x: (x == 5).sum()),
        four_stars=('stars', lambda x: (x == 4).sum()),
        three_stars=('stars', lambda x: (x == 3).sum()),
        avg_rating=('rating', 'mean'),
        avg_stars=('stars', 'mean'),
        total_points=('rating', 'sum')
    )
    .reset_index()
)

# Calculate blue chip metrics
class_rankings['blue_chips'] = class_rankings['five_stars'] + class_rankings['four_stars']
class_rankings['blue_chip_pct'] = class_rankings['blue_chips'] / class_rankings['total_recruits']

# Sort by average rating
class_rankings = class_rankings.sort_values(['year', 'avg_rating'], ascending=[True, False])

# Display top 10 classes in 2023
print("\nTop 10 Recruiting Classes - 2023")
print("="*80)
top_2023 = class_rankings[class_rankings['year'] == 2023].head(10)
print(top_2023[['school', 'total_recruits', 'five_stars', 'four_stars',
                'avg_rating', 'blue_chip_pct']].to_string(index=False))
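Ranking by average rating rewards small classes: a program that signs a handful of elite recruits can outrank one that signs a large, deep class. Commercial rankings address this by weighting the top of a class most heavily. The sketch below uses a hypothetical exponential decay weight (`decay=0.9` is an assumption, not any service's published formula) so each additional recruit adds less to the class score; the toy ratings are hypothetical.

```python
import pandas as pd


def weighted_class_score(ratings, decay=0.9):
    """Hypothetical class score: ratings sorted high to low, with the
    k-th best recruit (k = 0, 1, ...) weighted by decay**k."""
    ordered = sorted(ratings, reverse=True)
    return sum(r * decay**k for k, r in enumerate(ordered))


# Toy classes (hypothetical ratings, not real data):
# "A" signs three elite recruits, "B" signs twenty good ones
classes = pd.DataFrame({
    "school": ["A"] * 3 + ["B"] * 20,
    "rating": [0.99, 0.98, 0.97] + [0.92] * 20,
})

class_summary = classes.groupby("school")["rating"].agg(
    avg_rating="mean",
    weighted_score=weighted_class_score,
)
print(class_summary)
```

School A wins on average rating while school B wins on the weighted score; lowering `decay` shifts the emphasis back toward top-end talent.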

Star Ratings and Composite Rankings

Understanding Star Rating Distributions

Star ratings provide a quick evaluation of recruit quality, but understanding their distribution and predictive value is crucial:

#| label: fig-star-distribution-r
#| fig-cap: "Distribution of star ratings in college football recruiting"
#| fig-width: 10
#| fig-height: 6
#| message: false
#| warning: false

# Star rating distribution by year
recruiting_data %>%
  filter(!is.na(stars), stars >= 2, stars <= 5) %>%
  count(year, stars) %>%
  ggplot(aes(x = as.factor(stars), y = n, fill = as.factor(year))) +
  geom_col(position = "dodge", alpha = 0.8) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Distribution of Star Ratings by Year",
    subtitle = "2020-2023 Recruiting Classes",
    x = "Star Rating",
    y = "Number of Recruits",
    fill = "Year",
    caption = "Data: CollegeFootballData.com"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "top"
  )
#| label: fig-star-distribution-py
#| fig-cap: "Distribution of star ratings - Python"
#| fig-width: 10
#| fig-height: 6
#| message: false
#| warning: false

# Star rating distribution by year
star_dist = (recruiting_data
    .query("stars.notna() & stars >= 2 & stars <= 5")
    .groupby(['year', 'stars'])
    .size()
    .reset_index(name='count')
)

# Create grouped bar chart
fig, ax = plt.subplots(figsize=(10, 6))
years = star_dist['year'].unique()
x = np.arange(4)  # 2, 3, 4, 5 stars
width = 0.2

for i, year in enumerate(sorted(years)):
    year_data = star_dist[star_dist['year'] == year]
    counts = [year_data[year_data['stars'] == s]['count'].values[0]
              if len(year_data[year_data['stars'] == s]) > 0 else 0
              for s in [2, 3, 4, 5]]
    ax.bar(x + i*width, counts, width, label=str(int(year)), alpha=0.8)

ax.set_xlabel('Star Rating', fontsize=12)
ax.set_ylabel('Number of Recruits', fontsize=12)
ax.set_title('Distribution of Star Ratings by Year\n2020-2023 Recruiting Classes',
             fontsize=14, fontweight='bold')
ax.set_xticks(x + width * 1.5)
ax.set_xticklabels(['2★', '3★', '4★', '5★'])
ax.legend(title='Year', loc='upper right')
ax.text(0.98, 0.02, 'Data: CollegeFootballData.com',
        transform=ax.transAxes, ha='right', fontsize=8, style='italic')
plt.tight_layout()
plt.show()
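A grouped bar chart invites the question of whether the star mix actually shifts from year to year. A chi-square test of independence on the year-by-star contingency table is a quick check. The sketch assumes a long table shaped like `star_dist` (columns `year`, `stars`, `count`); the toy counts in the usage example are hypothetical.

```python
import pandas as pd
from scipy.stats import chi2_contingency


def star_mix_test(star_dist: pd.DataFrame):
    """Chi-square test of whether the star-rating mix differs by year.

    Expects long-format counts with columns 'year', 'stars', 'count'.
    """
    table = star_dist.pivot_table(index="year", columns="stars",
                                  values="count", fill_value=0)
    chi2, p_value, dof, _expected = chi2_contingency(table.to_numpy())
    return chi2, p_value, dof


# Hypothetical counts for two years and three star levels
toy_counts = pd.DataFrame({
    "year":  [2022, 2022, 2022, 2023, 2023, 2023],
    "stars": [3, 4, 5, 3, 4, 5],
    "count": [2000, 330, 32, 2100, 340, 31],
})
chi2, p_value, dof = star_mix_test(toy_counts)
print(f"chi2={chi2:.2f}, p={p_value:.3f}, dof={dof}")
```

With thousands of recruits per year, even small shifts can be statistically significant, so read the p-value alongside the size of the change.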

Composite Rating Analysis

Composite ratings aggregate multiple recruiting services to provide a more reliable evaluation:

#| label: composite-analysis-r
#| message: false
#| warning: false

# Analyze composite ratings by star level
rating_by_stars <- recruiting_data %>%
  filter(!is.na(stars), !is.na(rating), stars >= 2, stars <= 5) %>%
  group_by(stars) %>%
  summarise(
    count = n(),
    mean_rating = mean(rating),
    median_rating = median(rating),
    sd_rating = sd(rating),
    min_rating = min(rating),
    max_rating = max(rating),
    .groups = "drop"
  )

rating_by_stars %>%
  gt() %>%
  cols_label(
    stars = "Stars",
    count = "N",
    mean_rating = "Mean",
    median_rating = "Median",
    sd_rating = "Std Dev",
    min_rating = "Min",
    max_rating = "Max"
  ) %>%
  fmt_number(
    columns = c(mean_rating, median_rating, sd_rating, min_rating, max_rating),
    decimals = 2
  ) %>%
  tab_header(
    title = "Composite Rating Statistics by Star Level",
    subtitle = "2020-2023 recruiting classes"
  )
#| label: composite-analysis-py
#| message: false
#| warning: false

# Analyze composite ratings by star level
rating_by_stars = (recruiting_data
    .query("stars.notna() & rating.notna() & stars >= 2 & stars <= 5")
    .groupby('stars')
    .agg(
        count=('rating', 'count'),
        mean_rating=('rating', 'mean'),
        median_rating=('rating', 'median'),
        sd_rating=('rating', 'std'),
        min_rating=('rating', 'min'),
        max_rating=('rating', 'max')
    )
    .reset_index()
)

print("\nComposite Rating Statistics by Star Level")
print("="*80)
print(rating_by_stars.to_string(index=False))
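Why should a composite be more reliable? If each service's rating equals a prospect's true quality plus independent error, averaging k services divides the error variance by k. The simulation below illustrates this under that assumption (independent, equal-variance errors, which real services watching the same film and camps likely violate); all values are synthetic.

```python
import numpy as np

rng = np.random.default_rng(7)
n_prospects, n_services = 5000, 3

# Synthetic "true" prospect quality on a 0-1 style scale
true_quality = rng.uniform(0.80, 1.00, n_prospects)

# Assumption: each service reports true quality plus independent,
# equal-variance noise
noise_sd = 0.02
service_ratings = true_quality[:, None] + rng.normal(
    0.0, noise_sd, size=(n_prospects, n_services)
)

# A simple composite: the unweighted mean across services
composite = service_ratings.mean(axis=1)


def rmse_vs_truth(estimate):
    """Root-mean-square error of an estimate against true quality."""
    return float(np.sqrt(np.mean((estimate - true_quality) ** 2)))


single_rmse = [rmse_vs_truth(service_ratings[:, j]) for j in range(n_services)]
composite_rmse = rmse_vs_truth(composite)
print("single-service RMSE:", [round(x, 4) for x in single_rmse])
print("composite RMSE:     ", round(composite_rmse, 4))
```

The composite error lands near `noise_sd / sqrt(3)`. When service errors are correlated the gain shrinks, which is why composites beat any single service without being error-free.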

Geographic Recruiting Analysis

Understanding geographic recruiting patterns reveals competitive advantages and strategic opportunities:

State-Level Recruiting Production

#| label: geographic-analysis-r
#| message: false
#| warning: false

# Top talent-producing states
state_production <- recruiting_data %>%
  filter(!is.na(state), !is.na(stars)) %>%
  group_by(state) %>%
  summarise(
    total_recruits = n(),
    five_stars = sum(stars == 5, na.rm = TRUE),
    four_stars = sum(stars == 4, na.rm = TRUE),
    blue_chips = sum(stars >= 4, na.rm = TRUE),
    avg_rating = mean(rating, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(blue_chips))

# Top 15 states
state_production %>%
  head(15) %>%
  gt() %>%
  cols_label(
    state = "State",
    total_recruits = "Total",
    five_stars = "5★",
    four_stars = "4★",
    blue_chips = "Blue Chips",
    avg_rating = "Avg Rating"
  ) %>%
  fmt_number(
    columns = avg_rating,
    decimals = 2
  ) %>%
  tab_header(
    title = "Top Talent-Producing States",
    subtitle = "2020-2023 recruiting classes"
  )
#| label: geographic-analysis-py
#| message: false
#| warning: false

# Top talent-producing states
state_production = (recruiting_data
    .query("stateProvince.notna() & stars.notna()")
    .groupby('stateProvince')
    .agg(
        total_recruits=('name', 'count'),
        five_stars=('stars', lambda x: (x == 5).sum()),
        four_stars=('stars', lambda x: (x == 4).sum()),
        avg_rating=('rating', 'mean')
    )
    .reset_index()
)

state_production['blue_chips'] = state_production['five_stars'] + state_production['four_stars']
state_production = state_production.sort_values('blue_chips', ascending=False)

print("\nTop 15 Talent-Producing States (2020-2023)")
print("="*80)
print(state_production.head(15).to_string(index=False))

Recruiting Footprint Analysis

Analyze where schools recruit geographically:

#| label: footprint-analysis-r
#| message: false
#| warning: false

# Recruiting footprint for selected schools
schools_of_interest <- c("Alabama", "Ohio State", "Georgia", "Texas")

footprint_analysis <- recruiting_data %>%
  filter(school %in% schools_of_interest, !is.na(state)) %>%
  group_by(school, state) %>%
  summarise(
    recruits = n(),
    avg_stars = mean(stars, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  group_by(school) %>%
  mutate(
    pct_of_class = recruits / sum(recruits)
  ) %>%
  arrange(school, desc(recruits))

# Show top 5 states for each school
footprint_analysis %>%
  group_by(school) %>%
  slice_max(order_by = recruits, n = 5) %>%
  gt() %>%
  cols_label(
    school = "School",
    state = "State",
    recruits = "Recruits",
    avg_stars = "Avg Stars",
    pct_of_class = "% of Class"
  ) %>%
  fmt_number(
    columns = avg_stars,
    decimals = 2
  ) %>%
  fmt_percent(
    columns = pct_of_class,
    decimals = 1
  ) %>%
  tab_header(
    title = "Top Recruiting States by School",
    subtitle = "Top 5 states for selected programs, 2020-2023"
  )
#| label: footprint-analysis-py
#| message: false
#| warning: false

# Recruiting footprint for selected schools
schools_of_interest = ["Alabama", "Ohio State", "Georgia", "Texas"]

footprint_analysis = (recruiting_data
    .query("school in @schools_of_interest & stateProvince.notna()")
    .groupby(['school', 'stateProvince'])
    .agg(
        recruits=('name', 'count'),
        avg_stars=('stars', 'mean')
    )
    .reset_index()
)

# Calculate percentage of class
footprint_analysis['pct_of_class'] = (footprint_analysis
    .groupby('school')['recruits']
    .transform(lambda x: x / x.sum())
)

# Show top 5 states for each school
print("\nTop Recruiting States by School (2020-2023)")
print("="*80)
for school in schools_of_interest:
    school_data = footprint_analysis[footprint_analysis['school'] == school]
    top5 = school_data.nlargest(5, 'recruits')
    print(f"\n{school}:")
    print(top5[['stateProvince', 'recruits', 'avg_stars', 'pct_of_class']].to_string(index=False))
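The top-five tables show where each program recruits; a single number for how concentrated that footprint is makes programs easier to compare. One option, borrowed from economics, is the Herfindahl-Hirschman index (HHI): the sum of squared state shares, ranging from 1/k for an even split across k states up to 1.0 when every recruit comes from one state. The sketch assumes a table shaped like `footprint_analysis` (columns `school`, `stateProvince`, `recruits`); the toy footprints are hypothetical.

```python
import pandas as pd


def footprint_hhi(footprint: pd.DataFrame) -> pd.Series:
    """Herfindahl-Hirschman index of recruiting share by state, per school."""
    # Each state's share of the school's total recruits
    shares = footprint["recruits"] / footprint.groupby("school")["recruits"].transform("sum")
    # Sum of squared shares within each school
    return (shares ** 2).groupby(footprint["school"]).sum().rename("hhi")


# Hypothetical footprints: one regional program, one national program
toy_footprint = pd.DataFrame({
    "school": ["Regional U"] * 2 + ["National U"] * 4,
    "stateProvince": ["GA", "FL", "GA", "FL", "TX", "CA"],
    "recruits": [30, 10, 10, 10, 10, 10],
})
print(footprint_hhi(toy_footprint))
```

Because HHI depends on how many states a program touches, compare it alongside raw state counts rather than in isolation.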

Position-Specific Recruiting Analysis

Elite talent is not spread evenly across positions. Comparing average ratings and blue-chip shares by position shows where recruiting services concentrate their highest grades:

Position Value Distribution

#| label: position-analysis-r
#| message: false
#| warning: false

# Position group analysis
position_analysis <- recruiting_data %>%
  filter(!is.na(position), !is.na(stars)) %>%
  group_by(position) %>%
  summarise(
    total = n(),
    avg_stars = mean(stars, na.rm = TRUE),
    avg_rating = mean(rating, na.rm = TRUE),
    blue_chip_pct = sum(stars >= 4, na.rm = TRUE) / n(),
    .groups = "drop"
  ) %>%
  filter(total >= 100) %>%  # Filter for positions with sufficient data
  arrange(desc(avg_stars))

# Top positions by average star rating
position_analysis %>%
  head(15) %>%
  gt() %>%
  cols_label(
    position = "Position",
    total = "N",
    avg_stars = "Avg Stars",
    avg_rating = "Avg Rating",
    blue_chip_pct = "Blue Chip %"
  ) %>%
  fmt_number(
    columns = c(avg_stars, avg_rating),
    decimals = 2
  ) %>%
  fmt_percent(
    columns = blue_chip_pct,
    decimals = 1
  ) %>%
  tab_header(
    title = "Recruiting Value by Position",
    subtitle = "Positions ranked by average star rating, 2020-2023"
  )
#| label: position-analysis-py
#| message: false
#| warning: false

# Position group analysis
position_analysis = (recruiting_data
    .query("position.notna() & stars.notna()")
    .groupby('position')
    .agg(
        total=('name', 'count'),
        avg_stars=('stars', 'mean'),
        avg_rating=('rating', 'mean'),
        blue_chips=('stars', lambda x: (x >= 4).sum())
    )
    .reset_index()
)

# Filter for positions with sufficient data
position_analysis = position_analysis[position_analysis['total'] >= 100].copy()
position_analysis['blue_chip_pct'] = position_analysis['blue_chips'] / position_analysis['total']
position_analysis = position_analysis.sort_values('avg_stars', ascending=False)

print("\nTop 15 Positions by Average Star Rating (2020-2023)")
print("="*80)
print(position_analysis.head(15)[['position', 'total', 'avg_stars',
                                   'avg_rating', 'blue_chip_pct']].to_string(index=False))

Positional Scarcity Analysis

#| label: fig-position-scarcity-r
#| fig-cap: "Elite talent scarcity by position"
#| fig-width: 10
#| fig-height: 6
#| message: false
#| warning: false

# 5-star recruits by position
five_star_positions <- recruiting_data %>%
  filter(stars == 5, !is.na(position)) %>%
  count(position) %>%
  arrange(desc(n)) %>%
  head(12)

five_star_positions %>%
  ggplot(aes(x = reorder(position, n), y = n)) +
  geom_col(fill = "#0066CC", alpha = 0.8) +
  geom_text(aes(label = n), hjust = -0.2, size = 3.5) +
  coord_flip() +
  labs(
    title = "Five-Star Recruits by Position",
    subtitle = "Elite talent distribution, 2020-2023",
    x = "Position",
    y = "Number of 5-Star Recruits",
    caption = "Data: CollegeFootballData.com"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14)
  )

#| label: fig-position-scarcity-py
#| fig-cap: "Elite talent scarcity by position - Python"
#| fig-width: 10
#| fig-height: 6
#| message: false
#| warning: false

# 5-star recruits by position
five_star_positions = (recruiting_data
    .query("stars == 5 & position.notna()")
    .groupby('position')
    .size()
    .reset_index(name='count')
    .sort_values('count', ascending=True)
    .tail(12)
)

fig, ax = plt.subplots(figsize=(10, 6))
ax.barh(five_star_positions['position'], five_star_positions['count'],
        color='#0066CC', alpha=0.8)
ax.set_xlabel('Number of 5-Star Recruits', fontsize=12)
ax.set_ylabel('Position', fontsize=12)
ax.set_title('Five-Star Recruits by Position\nElite talent distribution, 2020-2023',
             fontsize=14, fontweight='bold')

# Add count labels
for i, (pos, count) in enumerate(zip(five_star_positions['position'],
                                      five_star_positions['count'])):
    ax.text(count + 0.2, i, str(count), va='center', fontsize=9)

plt.text(0.98, 0.02, 'Data: CollegeFootballData.com',
         transform=ax.transAxes, ha='right', fontsize=8, style='italic')
plt.tight_layout()
plt.show()

High School to College Projection Models

One of the most valuable applications of recruiting analytics is projecting how high school recruits will perform at the college level:

Correlation Between Stars and College Performance

#| label: projection-setup-r
#| message: false
#| warning: false
#| cache: true

# Create synthetic college performance data for demonstration
# In practice, this would come from actual player statistics
set.seed(42)

college_performance <- recruiting_data %>%
  filter(year %in% c(2020, 2021), !is.na(stars), !is.na(rating)) %>%
  select(name, school, position, stars, rating) %>%
  # Simulate college performance metrics (replace with actual data).
  # Draft probabilities and pick/WAR distributions are illustrative assumptions.
  mutate(
    draft_prob = case_when(
      stars == 5 ~ 0.50,
      stars == 4 ~ 0.20,
      stars == 3 ~ 0.05,
      TRUE ~ 0.01
    ),
    drafted = runif(n()) < draft_prob,
    # Draft pick (1-262) for drafted players, NA for undrafted
    draft_pick = case_when(
      !drafted ~ NA_real_,
      stars == 5 ~ rnorm(n(), mean = 80, sd = 50),
      stars == 4 ~ rnorm(n(), mean = 150, sd = 60),
      TRUE ~ rnorm(n(), mean = 220, sd = 40)
    ),
    draft_pick = round(pmin(262, pmax(1, draft_pick))),
    # Approximate WAR (Wins Above Replacement)
    college_war = case_when(
      stars == 5 ~ rnorm(n(), mean = 2.5, sd = 1.2),
      stars == 4 ~ rnorm(n(), mean = 1.5, sd = 1.0),
      stars == 3 ~ rnorm(n(), mean = 0.8, sd = 0.8),
      TRUE ~ rnorm(n(), mean = 0.3, sd = 0.5)
    ),
    college_war = pmax(0, college_war)
  )

# Calculate draft rates by star level.
# summarise() evaluates expressions in order, so the rate and average pick
# are computed before `drafted` is overwritten with its count.
draft_rates <- college_performance %>%
  group_by(stars) %>%
  summarise(
    total = n(),
    draft_rate = mean(drafted),
    avg_pick = mean(draft_pick, na.rm = TRUE),
    avg_war = mean(college_war),
    drafted = sum(drafted),
    .groups = "drop"
  ) %>%
  relocate(drafted, .after = total)

draft_rates %>%
  gt() %>%
  cols_label(
    stars = "Stars",
    total = "Total",
    drafted = "Drafted",
    draft_rate = "Draft %",
    avg_pick = "Avg Pick",
    avg_war = "Avg WAR"
  ) %>%
  fmt_percent(
    columns = draft_rate,
    decimals = 1
  ) %>%
  fmt_number(
    columns = c(avg_pick, avg_war),
    decimals = 1
  ) %>%
  tab_header(
    title = "NFL Draft Success by Star Rating",
    subtitle = "Simulated data for 2020-2021 recruiting classes"
  )
#| label: projection-setup-py
#| message: false
#| warning: false
#| cache: true

# Create synthetic college performance data for demonstration
rng = np.random.default_rng(42)

# Filter to 2020-2021 classes
college_performance = recruiting_data[
    (recruiting_data['year'].isin([2020, 2021])) &
    (recruiting_data['stars'].notna()) &
    (recruiting_data['rating'].notna())
].copy()

# Simulate college performance metrics (replace with actual data).
# Draft probabilities and pick/WAR distributions are illustrative assumptions;
# star levels missing from a map take the fillna() default.
stars = college_performance['stars'].astype(int)
draft_prob = stars.map({5: 0.50, 4: 0.20, 3: 0.05}).fillna(0.01).to_numpy()
pick_mean = stars.map({5: 80, 4: 150}).fillna(220).to_numpy()
pick_sd = stars.map({5: 50, 4: 60}).fillna(40).to_numpy()
war_mean = stars.map({5: 2.5, 4: 1.5, 3: 0.8}).fillna(0.3).to_numpy()
war_sd = stars.map({5: 1.2, 4: 1.0, 3: 0.8}).fillna(0.5).to_numpy()

n = len(college_performance)
drafted = rng.random(n) < draft_prob
picks = np.clip(np.round(rng.normal(pick_mean, pick_sd)), 1, 262)

college_performance['drafted'] = drafted
# Draft pick (1-262) for drafted players, NaN for undrafted
college_performance['draft_pick'] = np.where(drafted, picks, np.nan)
# Approximate WAR (Wins Above Replacement), floored at zero
college_performance['college_war'] = np.clip(rng.normal(war_mean, war_sd), 0, None)

# Calculate draft rates by star level
draft_rates = (college_performance
    .groupby('stars')
    .agg(
        total=('name', 'count'),
        drafted=('drafted', 'sum'),
        avg_pick=('draft_pick', lambda x: x[x <= 262].mean()),
        avg_war=('college_war', 'mean')
    )
    .reset_index()
)

draft_rates['draft_rate'] = draft_rates['drafted'] / draft_rates['total']

print("\nNFL Draft Success by Star Rating")
print("="*80)
print(draft_rates.to_string(index=False))
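Draft rates for five-star recruits rest on small samples (a few dozen players per class), so a point estimate alone can mislead. A Wilson score interval is a standard way to attach uncertainty to a binomial proportion such as `drafted / total`. The sketch below implements the usual formula; the counts in the usage line are hypothetical.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score confidence interval for a binomial proportion."""
    if trials == 0:
        return (float("nan"), float("nan"))
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)
    )
    return (center - half_width, center + half_width)


# Hypothetical: 20 of 60 five-star recruits drafted
low, high = wilson_interval(20, 60)
print(f"draft rate {20 / 60:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

Unlike the simple normal approximation, the Wilson interval stays inside [0, 1] even when a group has zero or all players drafted.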

Building a Projection Model

#| label: projection-model-r
#| message: false
#| warning: false

# Build linear regression model
model_data <- college_performance %>%
  filter(!is.na(college_war), !is.na(rating))

# Simple linear model
lm_model <- lm(college_war ~ rating + position, data = model_data)

# Model summary
summary(lm_model)

# Predictions
model_data <- model_data %>%
  mutate(predicted_war = predict(lm_model, newdata = .))

# Calculate R-squared
r_squared <- cor(model_data$college_war, model_data$predicted_war)^2
cat("\nModel R-squared:", round(r_squared, 3), "\n")

# Feature importance (coefficients)
coef_df <- broom::tidy(lm_model) %>%
  arrange(desc(abs(estimate)))

cat("\nTop predictive features:\n")
print(coef_df)
#| label: projection-model-py
#| message: false
#| warning: false

# Prepare data for modeling
model_data = college_performance.dropna(subset=['college_war', 'rating', 'position'])

# Create position dummies
position_dummies = pd.get_dummies(model_data['position'], prefix='pos', drop_first=True)
X = pd.concat([model_data[['rating']], position_dummies], axis=1)
y = model_data['college_war']

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train linear regression
lr_model = LinearRegression()
lr_model.fit(X_train, y_train)

# Predictions
y_pred = lr_model.predict(X_test)

# Evaluate
r2 = r2_score(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))

print("\nLinear Regression Model Performance")
print("="*80)
print(f"R-squared: {r2:.3f}")
print(f"RMSE: {rmse:.3f}")

# Feature importance
feature_importance = pd.DataFrame({
    'feature': X.columns,
    'coefficient': lr_model.coef_
}).sort_values('coefficient', key=abs, ascending=False)

print("\nTop Predictive Features:")
print(feature_importance.head(10).to_string(index=False))
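Being drafted is a yes/no outcome, so a logistic model that returns a probability is a more natural fit for it than linear regression. The sketch fits scikit-learn's `LogisticRegression` to synthetic data in which draft odds rise with composite rating; the relationship used to generate the data is an arbitrary assumption, not an estimate from real outcomes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng_demo = np.random.default_rng(0)
n_sim = 2000

# Synthetic composite ratings and draft outcomes. The true log-odds,
# 40 * (rating - 0.95), are an arbitrary illustrative assumption.
sim_rating = rng_demo.uniform(0.80, 1.00, n_sim)
true_prob = 1.0 / (1.0 + np.exp(-40.0 * (sim_rating - 0.95)))
sim_drafted = rng_demo.random(n_sim) < true_prob

# Standardize the rating so the default L2 penalty and solver behave well
draft_model = make_pipeline(StandardScaler(), LogisticRegression())
draft_model.fit(sim_rating.reshape(-1, 1), sim_drafted)

# Estimated draft probability at a few composite ratings
grid = np.array([[0.85], [0.90], [0.95], [0.99]])
grid_probs = draft_model.predict_proba(grid)[:, 1]
for rating_value, prob in zip(grid.ravel(), grid_probs):
    print(f"rating {rating_value:.2f}: P(drafted) = {prob:.2f}")
```

With real data, position dummies can be added as extra columns exactly as in the linear model above.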

Advanced Ensemble Model

#| label: ensemble-model-r
#| message: false
#| warning: false
#| eval: false

# Random Forest model (requires randomForest package)
library(randomForest)

# Prepare data (randomForest requires categorical predictors as factors)
rf_data <- model_data %>%
  select(college_war, rating, position, stars) %>%
  na.omit() %>%
  mutate(position = factor(position))

# Train random forest
set.seed(42)
rf_model <- randomForest(
  college_war ~ rating + position + stars,
  data = rf_data,
  ntree = 500,
  importance = TRUE
)

# Variable importance
importance(rf_model) %>%
  as.data.frame() %>%
  arrange(desc(`%IncMSE`))

# Model performance
cat("Random Forest R-squared:",
    round(rf_model$rsq[length(rf_model$rsq)], 3), "\n")
#| label: ensemble-model-py
#| message: false
#| warning: false

# Train Random Forest
rf_model = RandomForestRegressor(
    n_estimators=500,
    max_depth=10,
    random_state=42,
    n_jobs=-1
)

rf_model.fit(X_train, y_train)

# Predictions
y_pred_rf = rf_model.predict(X_test)

# Evaluate
r2_rf = r2_score(y_test, y_pred_rf)
rmse_rf = np.sqrt(mean_squared_error(y_test, y_pred_rf))

print("\nRandom Forest Model Performance")
print("="*80)
print(f"R-squared: {r2_rf:.3f}")
print(f"RMSE: {rmse_rf:.3f}")

# Feature importance
rf_importance = pd.DataFrame({
    'feature': X.columns,
    'importance': rf_model.feature_importances_
}).sort_values('importance', ascending=False)

print("\nFeature Importance (Top 10):")
print(rf_importance.head(10).to_string(index=False))
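Impurity-based `feature_importances_` can favor continuous features such as a rating over sparse position dummies. Permutation importance, the drop in held-out score when one column is shuffled, is a common cross-check. The sketch uses scikit-learn's `permutation_importance` on synthetic data with one informative feature and one pure-noise feature.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng_perm = np.random.default_rng(1)
n_obs = 1000
demo_rating = rng_perm.uniform(0.80, 1.00, n_obs)   # informative (synthetic)
demo_noise = rng_perm.uniform(0.0, 1.0, n_obs)      # unrelated to the target
y_demo = 10 * (demo_rating - 0.80) + rng_perm.normal(0, 0.3, n_obs)
X_demo = np.column_stack([demo_rating, demo_noise])

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25,
                                          random_state=1)
rf_demo = RandomForestRegressor(n_estimators=200, random_state=1).fit(X_tr, y_tr)

# Score is the regressor's R^2; importance is the mean drop over 10 shuffles
perm = permutation_importance(rf_demo, X_te, y_te, n_repeats=10, random_state=1)
for name, mean, sd in zip(["rating", "noise"], perm.importances_mean,
                          perm.importances_std):
    print(f"{name:>6}: {mean:.3f} +/- {sd:.3f}")
```

Computing it on the test split, as here, measures what the model relies on for new recruits rather than what it memorized in training.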

Recruiting ROI and Player Development

Evaluating recruiting return on investment helps identify which programs excel at development:

Blue Chip Ratio Analysis

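The Blue Chip Ratio (the share of four- and five-star recruits among a program's signees, conventionally over its four most recent classes) is a strong indicator of championship potential. The code below approximates it by pooling each program's 2020-2023 classes: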

#| label: blue-chip-ratio-r
#| message: false
#| warning: false

# Calculate Blue Chip Ratio by school
blue_chip_ratio <- recruiting_data %>%
  filter(!is.na(stars), year >= 2020) %>%
  group_by(school) %>%
  summarise(
    total_recruits = n(),
    blue_chips = sum(stars >= 4),
    blue_chip_ratio = blue_chips / total_recruits,
    avg_stars = mean(stars),
    five_stars = sum(stars == 5),
    .groups = "drop"
  ) %>%
  filter(total_recruits >= 60) %>%  # Minimum sample size
  arrange(desc(blue_chip_ratio))

# Top programs by Blue Chip Ratio
blue_chip_ratio %>%
  head(15) %>%
  gt() %>%
  cols_label(
    school = "School",
    total_recruits = "Total",
    blue_chips = "Blue Chips",
    blue_chip_ratio = "BCR",
    avg_stars = "Avg Stars",
    five_stars = "5★"
  ) %>%
  fmt_percent(
    columns = blue_chip_ratio,
    decimals = 1
  ) %>%
  fmt_number(
    columns = avg_stars,
    decimals = 2
  ) %>%
  data_color(
    columns = blue_chip_ratio,
    colors = scales::col_numeric(
      palette = c("#FEE5D9", "#A50F15"),
      domain = NULL
    )
  ) %>%
  tab_header(
    title = "Blue Chip Ratio Leaders",
    subtitle = "2020-2023 recruiting classes (minimum 60 recruits)"
  )
#| label: blue-chip-ratio-py
#| message: false
#| warning: false

# Calculate Blue Chip Ratio by school
blue_chip_ratio = (recruiting_data
    .query("stars.notna() & year >= 2020")
    .groupby('school')
    .agg(
        total_recruits=('name', 'count'),
        blue_chips=('stars', lambda x: (x >= 4).sum()),
        avg_stars=('stars', 'mean'),
        five_stars=('stars', lambda x: (x == 5).sum())
    )
    .reset_index()
)

# Filter minimum sample size
blue_chip_ratio = blue_chip_ratio[blue_chip_ratio['total_recruits'] >= 60].copy()
blue_chip_ratio['blue_chip_ratio'] = blue_chip_ratio['blue_chips'] / blue_chip_ratio['total_recruits']
blue_chip_ratio = blue_chip_ratio.sort_values('blue_chip_ratio', ascending=False)

print("\nBlue Chip Ratio Leaders (2020-2023)")
print("="*80)
print(blue_chip_ratio.head(15).to_string(index=False))
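Pooling 2020-2023 yields one four-class window. To track the ratio over time, the sketch below computes a trailing four-class Blue Chip Ratio for every team-year from per-class counts. It assumes a table like `class_rankings` with `year`, `school`, `blue_chips`, and `total_recruits`; the toy rows are hypothetical.

```python
import pandas as pd


def trailing_bcr(classes: pd.DataFrame, window: int = 4) -> pd.DataFrame:
    """Trailing `window`-class Blue Chip Ratio per school and year."""
    out = classes.sort_values(["school", "year"]).copy()
    grouped = out.groupby("school")
    # Rolling sums over the last `window` rows (classes) of each school
    blue = grouped["blue_chips"].transform(
        lambda s: s.rolling(window, min_periods=window).sum())
    total = grouped["total_recruits"].transform(
        lambda s: s.rolling(window, min_periods=window).sum())
    out["bcr_trailing"] = blue / total
    return out


toy_classes = pd.DataFrame({
    "school": ["Team A"] * 5,
    "year": [2019, 2020, 2021, 2022, 2023],
    "blue_chips": [10, 12, 14, 16, 18],
    "total_recruits": [25, 25, 25, 25, 25],
})
print(trailing_bcr(toy_classes)[["school", "year", "bcr_trailing"]])
```

The window counts rows, not calendar years, so a school with a missing class would silently span five seasons; fill missing years with zero-count rows first if that can happen.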

Blue Chip Ratio vs Winning

#| label: fig-bcr-wins-r
#| fig-cap: "Blue Chip Ratio vs Winning Percentage"
#| fig-width: 10
#| fig-height: 6
#| message: false
#| warning: false
#| eval: false

# This would require actual win-loss data
# Simulated for demonstration
set.seed(42)
bcr_wins <- blue_chip_ratio %>%
  mutate(
    # Simulated winning percentage (replace with actual data)
    win_pct = 0.5 + (blue_chip_ratio * 0.35) + rnorm(n(), 0, 0.08),
    win_pct = pmax(0.3, pmin(1.0, win_pct)),
    championship = blue_chip_ratio >= 0.5
  )

# Plot
bcr_wins %>%
  ggplot(aes(x = blue_chip_ratio, y = win_pct)) +
  geom_point(aes(color = championship, size = five_stars), alpha = 0.6) +
  geom_smooth(method = "lm", se = TRUE, color = "darkblue", linetype = "dashed") +
  geom_vline(xintercept = 0.5, linetype = "dashed", color = "red") +
  scale_color_manual(
    values = c("FALSE" = "#999999", "TRUE" = "#E41A1C"),
    labels = c("Below 50%", "Above 50%")
  ) +
  scale_x_continuous(labels = scales::percent) +
  scale_y_continuous(labels = scales::percent) +
  labs(
    title = "Blue Chip Ratio and Winning Percentage",
    subtitle = "The 50% threshold predicts championship contention",
    x = "Blue Chip Ratio",
    y = "Winning Percentage",
    color = "BCR Status",
    size = "5-Star Recruits",
    caption = "Simulated data for demonstration"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "right"
  )

#| label: fig-bcr-wins-py
#| fig-cap: "Blue Chip Ratio vs Winning Percentage - Python"
#| fig-width: 10
#| fig-height: 6
#| message: false
#| warning: false

# Simulated winning percentage (replace with actual data)
np.random.seed(42)
bcr_wins = blue_chip_ratio.copy()
bcr_wins['win_pct'] = (0.5 +
                        (bcr_wins['blue_chip_ratio'] * 0.35) +
                        np.random.normal(0, 0.08, len(bcr_wins)))
bcr_wins['win_pct'] = bcr_wins['win_pct'].clip(0.3, 1.0)
bcr_wins['championship'] = bcr_wins['blue_chip_ratio'] >= 0.5

# Create plot
fig, ax = plt.subplots(figsize=(10, 6))

# Scatter plot (marker size is offset so schools with zero 5-stars stay visible)
colors = ['#E41A1C' if c else '#999999' for c in bcr_wins['championship']]
scatter = ax.scatter(bcr_wins['blue_chip_ratio'], bcr_wins['win_pct'],
                     c=colors, s=20 + bcr_wins['five_stars'] * 20, alpha=0.6)

# Regression line
z = np.polyfit(bcr_wins['blue_chip_ratio'], bcr_wins['win_pct'], 1)
p = np.poly1d(z)
ax.plot(bcr_wins['blue_chip_ratio'].sort_values(),
        p(bcr_wins['blue_chip_ratio'].sort_values()),
        "b--", alpha=0.8, label='Trend line')

# 50% threshold line
ax.axvline(x=0.5, color='red', linestyle='--', alpha=0.7, label='50% BCR threshold')

ax.set_xlabel('Blue Chip Ratio', fontsize=12)
ax.set_ylabel('Winning Percentage', fontsize=12)
ax.set_title('Blue Chip Ratio and Winning Percentage\nPrograms at or above the 50% threshold shown in red',
             fontsize=14, fontweight='bold')

# Format as percentages
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: '{:.0%}'.format(y)))
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: '{:.0%}'.format(y)))

ax.legend()
plt.text(0.98, 0.02, 'Simulated data for demonstration',
         transform=ax.transAxes, ha='right', fontsize=8, style='italic')
plt.tight_layout()
plt.show()
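The scatter plot invites reading a threshold effect off the chart by eye. As a minimal sketch, assuming a DataFrame with the `blue_chip_ratio` and `win_pct` columns used above, the hypothetical helper `summarize_bcr_threshold` reports the average winning percentage on each side of the 50% line, the least-squares slope, and the Pearson correlation. The toy rows are invented for illustration and are not real programs.

```python
import numpy as np
import pandas as pd

def summarize_bcr_threshold(df, threshold=0.5):
    """Compare win% above vs. below a blue chip ratio threshold and fit a linear trend."""
    above = df['blue_chip_ratio'] >= threshold
    summary = (df.assign(group=np.where(above, 'at/above threshold', 'below threshold'))
                 .groupby('group')['win_pct']
                 .agg(['count', 'mean']))
    # Degree-1 least-squares fit: win_pct ~ slope * blue_chip_ratio + intercept
    slope, intercept = np.polyfit(df['blue_chip_ratio'], df['win_pct'], 1)
    # Pearson correlation between the two columns
    r = df['blue_chip_ratio'].corr(df['win_pct'])
    return summary, slope, intercept, r

# Toy example (hypothetical values, not real teams)
toy = pd.DataFrame({
    'blue_chip_ratio': [0.20, 0.35, 0.45, 0.55, 0.70, 0.85],
    'win_pct':         [0.45, 0.55, 0.60, 0.70, 0.78, 0.88],
})
summary, slope, intercept, r = summarize_bcr_threshold(toy)
print(summary)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.3f}")
```

On the chapter's simulated `bcr_wins`, this would mostly recover the 0.35 slope used to generate the data. It becomes informative only when it is run on real win-loss records.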

Development Efficiency

Which schools develop players better than their recruiting rankings suggest?

#| label: development-efficiency-r
#| message: false
#| warning: false

# Calculate development efficiency
development <- college_performance %>%
  group_by(school) %>%
  summarise(
    recruits = n(),
    avg_stars = mean(stars, na.rm = TRUE),
    avg_rating = mean(rating, na.rm = TRUE),
    drafted = sum(drafted),
    # `drafted` now holds the summed count, so divide by n instead of taking mean(drafted)
    draft_rate = drafted / recruits,
    avg_war = mean(college_war, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(recruits >= 20) %>%
  mutate(
    # Expected draft rate based on average stars
    expected_draft_rate = case_when(
      avg_stars >= 4.5 ~ 0.45,
      avg_stars >= 4.0 ~ 0.35,
      avg_stars >= 3.5 ~ 0.25,
      avg_stars >= 3.0 ~ 0.15,
      TRUE ~ 0.08
    ),
    development_index = draft_rate - expected_draft_rate
  ) %>%
  arrange(desc(development_index))

# Top development programs
development %>%
  head(10) %>%
  gt() %>%
  cols_label(
    school = "School",
    recruits = "N",
    avg_stars = "Avg Stars",
    draft_rate = "Draft %",
    expected_draft_rate = "Expected %",
    development_index = "Dev Index"
  ) %>%
  fmt_number(
    columns = avg_stars,
    decimals = 2
  ) %>%
  fmt_percent(
    columns = c(draft_rate, expected_draft_rate, development_index),
    decimals = 1
  ) %>%
  data_color(
    columns = development_index,
    # `fn` replaces the deprecated `colors` argument in recent gt versions
    fn = scales::col_numeric(
      palette = c("#2166AC", "#F7F7F7", "#B2182B"),
      domain = NULL
    )
  ) %>%
  tab_header(
    title = "Player Development Leaders",
    subtitle = "Schools exceeding expected draft rates (simulated data)"
  )

#| label: development-efficiency-py
#| message: false
#| warning: false

# Calculate development efficiency
development = (college_performance
    .groupby('school')
    .agg(
        recruits=('name', 'count'),
        avg_stars=('stars', 'mean'),
        avg_rating=('rating', 'mean'),
        drafted=('drafted', 'sum'),
        avg_war=('college_war', 'mean')
    )
    .reset_index()
)

# Filter minimum sample size
development = development[development['recruits'] >= 20].copy()
development['draft_rate'] = development['drafted'] / development['recruits']

# Expected draft rate based on average stars
def expected_draft_rate(avg_stars):
    if avg_stars >= 4.5:
        return 0.45
    elif avg_stars >= 4.0:
        return 0.35
    elif avg_stars >= 3.5:
        return 0.25
    elif avg_stars >= 3.0:
        return 0.15
    else:
        return 0.08

development['expected_draft_rate'] = development['avg_stars'].apply(expected_draft_rate)
development['development_index'] = development['draft_rate'] - development['expected_draft_rate']
development = development.sort_values('development_index', ascending=False)

print("\nPlayer Development Leaders (Simulated Data)")
print("="*80)
print(development.head(10)[['school', 'recruits', 'avg_stars', 'draft_rate',
                            'expected_draft_rate', 'development_index']].to_string(index=False))
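The step function above assigns one expected rate from a school's average star rating. Because draft odds are not linear in stars, class composition gets lost: a class split evenly between 5-stars and 3-stars averages 4.0 stars and receives 0.35, even though averaging each recruit's own baseline gives (0.45 + 0.15) / 2 = 0.30. The variant below is offered as an alternative, not as the chapter's method. It gives each recruit a baseline based on his own star rating and averages those baselines per school. The per-star baselines in `BASELINE_DRAFT_RATE` reuse the chapter's illustrative cutoffs and are assumptions, not measured draft rates. The helper name is hypothetical.

```python
import pandas as pd

# Hypothetical per-recruit baselines by star rating (illustrative, not measured rates)
BASELINE_DRAFT_RATE = {5: 0.45, 4: 0.35, 3: 0.15, 2: 0.08}

def development_index_per_recruit(players, min_recruits=20):
    """Expected draft rate = mean of each recruit's star-based baseline."""
    df = players.copy()
    # Unrated or unexpected star values fall back to the lowest baseline
    df['expected'] = df['stars'].map(BASELINE_DRAFT_RATE).fillna(0.08)
    out = (df.groupby('school')
             .agg(recruits=('stars', 'size'),
                  draft_rate=('drafted', 'mean'),
                  expected_draft_rate=('expected', 'mean'))
             .reset_index())
    out = out[out['recruits'] >= min_recruits].copy()
    out['development_index'] = out['draft_rate'] - out['expected_draft_rate']
    return out.sort_values('development_index', ascending=False)

# Toy example (hypothetical schools; min_recruits lowered for the tiny sample)
toy = pd.DataFrame({
    'school':  ['A'] * 4 + ['B'] * 4,
    'stars':   [5, 5, 3, 3, 4, 4, 4, 4],
    'drafted': [1, 1, 0, 0, 1, 0, 0, 0],
})
result = development_index_per_recruit(toy, min_recruits=1)
print(result)
```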

Transfer Portal Impact

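The transfer portal has fundamentally changed recruiting strategy. Programs can now add experienced college players every offseason instead of relying solely on high school classes that take years to develop.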

Transfer vs High School Recruiting

#| label: transfer-analysis-r
#| message: false
#| warning: false
#| eval: false

# This requires transfer portal data
# Simulated for demonstration
set.seed(42)

# Simulate transfer portal data
portal_data <- tibble(
  year = rep(2020:2023, each = 500),
  player_name = paste0("Player_", 1:2000),
  from_school = sample(recruiting_data$school, 2000, replace = TRUE),
  to_school = sample(recruiting_data$school, 2000, replace = TRUE),
  position = sample(recruiting_data$position, 2000, replace = TRUE),
  original_stars = sample(2:5, 2000, replace = TRUE, prob = c(0.1, 0.5, 0.35, 0.05)),
  portal_rating = rnorm(2000, mean = 85, sd = 10)
) %>%
  filter(from_school != to_school)

# Transfer trends over time
transfer_trends <- portal_data %>%
  group_by(year) %>%
  summarise(
    total_transfers = n(),
    avg_rating = mean(portal_rating),
    four_plus_stars = sum(original_stars >= 4),
    .groups = "drop"
  )

transfer_trends %>%
  gt() %>%
  cols_label(
    year = "Year",
    total_transfers = "Transfers",
    avg_rating = "Avg Rating",
    four_plus_stars = "4+ Stars"
  ) %>%
  fmt_number(
    columns = avg_rating,
    decimals = 1
  ) %>%
  tab_header(
    title = "Transfer Portal Trends",
    subtitle = "Simulated data, 2020-2023"
  )

#| label: transfer-analysis-py
#| message: false
#| warning: false

# Simulate transfer portal data
np.random.seed(42)

portal_data = pd.DataFrame({
    'year': np.repeat([2020, 2021, 2022, 2023], 500),
    'player_name': [f'Player_{i}' for i in range(2000)],
    'from_school': np.random.choice(recruiting_data['school'].dropna().unique(), 2000),
    'to_school': np.random.choice(recruiting_data['school'].dropna().unique(), 2000),
    'position': np.random.choice(recruiting_data['position'].dropna().unique(), 2000),
    'original_stars': np.random.choice([2, 3, 4, 5], 2000, p=[0.1, 0.5, 0.35, 0.05]),
    'portal_rating': np.random.normal(85, 10, 2000)
})

# Remove same-school transfers
portal_data = portal_data[portal_data['from_school'] != portal_data['to_school']]

# Transfer trends over time
transfer_trends = (portal_data
    .groupby('year')
    .agg(
        total_transfers=('player_name', 'count'),
        avg_rating=('portal_rating', 'mean'),
        four_plus_stars=('original_stars', lambda x: (x >= 4).sum())
    )
    .reset_index()
)

print("\nTransfer Portal Trends (Simulated Data)")
print("="*80)
print(transfer_trends.to_string(index=False))
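The trends above describe transfers in isolation. To compare the two acquisition channels, here is a minimal sketch that stacks the two sources and computes each school's portal share of newcomers per year. It assumes a high school signee frame with `school`, `year`, and `stars` columns and a portal frame shaped like the simulated `portal_data` (`to_school`, `year`, `original_stars`). The helper name `newcomer_mix` is hypothetical, and treating a transfer's original high school star rating as comparable to a signee's rating is an assumption in its own right.

```python
import pandas as pd

def newcomer_mix(hs_signees, portal_adds):
    """Portal share of each school's newcomers per year, plus average stars by source."""
    hs = hs_signees.loc[:, ['school', 'year', 'stars']].assign(source='high_school')
    portal = (portal_adds
              .rename(columns={'to_school': 'school', 'original_stars': 'stars'})
              .loc[:, ['school', 'year', 'stars']]
              .assign(source='portal'))
    combined = pd.concat([hs, portal], ignore_index=True)
    counts = (combined.groupby(['school', 'year', 'source']).size()
                      .unstack('source', fill_value=0)
                      # Guarantee both columns exist even if a source is absent
                      .reindex(columns=['high_school', 'portal'], fill_value=0))
    counts['portal_share'] = counts['portal'] / (counts['high_school'] + counts['portal'])
    avg_stars = combined.groupby('source')['stars'].mean()
    return counts.reset_index(), avg_stars

# Toy example (hypothetical schools and players)
hs_toy = pd.DataFrame({'school': ['X', 'X', 'X'], 'year': [2023] * 3, 'stars': [4, 3, 3]})
portal_toy = pd.DataFrame({'to_school': ['X', 'Y'], 'year': [2023, 2023],
                           'original_stars': [4, 3]})
mix, avg_stars = newcomer_mix(hs_toy, portal_toy)
print(mix)
print(avg_stars)
```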

Transfer Portal Volume Over Time

#| label: fig-portal-value-r
#| fig-cap: "Transfer portal growth trends"
#| fig-width: 10
#| fig-height: 6
#| message: false
#| warning: false
#| eval: false

transfer_trends %>%
  ggplot(aes(x = year, y = total_transfers)) +
  geom_line(color = "#0066CC", linewidth = 1.5) +
  geom_point(color = "#0066CC", size = 4) +
  geom_text(aes(label = total_transfers), vjust = -1, size = 4) +
  scale_y_continuous(limits = c(0, max(transfer_trends$total_transfers) * 1.2)) +
  labs(
    title = "Transfer Portal Growth",
    subtitle = "Total transfers by year (simulated data)",
    x = "Year",
    y = "Total Transfers",
    caption = "Data: Simulated for demonstration"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14)
  )

#| label: fig-portal-value-py
#| fig-cap: "Transfer portal growth trends - Python"
#| fig-width: 10
#| fig-height: 6
#| message: false
#| warning: false

fig, ax = plt.subplots(figsize=(10, 6))

ax.plot(transfer_trends['year'], transfer_trends['total_transfers'],
        color='#0066CC', linewidth=2.5, marker='o', markersize=10)

# Add value labels
for x, y in zip(transfer_trends['year'], transfer_trends['total_transfers']):
    ax.text(x, y + 10, str(int(y)), ha='center', fontsize=11)

ax.set_xlabel('Year', fontsize=12)
ax.set_ylabel('Total Transfers', fontsize=12)
ax.set_title('Transfer Portal Growth\nTotal transfers by year (simulated data)',
             fontsize=14, fontweight='bold')
ax.set_ylim(0, transfer_trends['total_transfers'].max() * 1.2)
ax.set_xticks(transfer_trends['year'])  # one tick per season, no fractional years

plt.text(0.98, 0.02, 'Data: Simulated for demonstration',
         transform=ax.transAxes, ha='right', fontsize=8, style='italic')
plt.tight_layout()
plt.show()

Building a Recruiting Analytics System

Comprehensive Recruiting Dashboard Metrics

A complete recruiting analytics system should track:

  1. Class Composition
    - Total commits
    - Star distribution
    - Position breakdown
    - Average rating
    - Blue chip ratio

  2. Geographic Footprint
    - State distribution
    - Regional penetration
    - In-state vs out-of-state ratio
    - Competing against specific schools

  3. Position-Specific Needs
    - Depth chart analysis
    - Positional value
    - Immediate impact vs developmental
    - Multi-year roster projection

  4. Historical Performance
    - Year-over-year trends
    - Development success rates
    - Draft production
    - On-field performance correlation

  5. Competitive Intelligence
    - Head-to-head recruiting battles
    - Overlap with rival schools
    - Timing of commitments
    - Decommitment risk factors
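The class-composition and geographic-footprint items in the list above come down to a few pandas aggregations. The sketch below shows one possible implementation, not a complete dashboard. The helper name `class_dashboard` is hypothetical. It assumes one row per commit with the `stars`, `rating`, `position`, and `stateProvince` columns that the chapter's Python code reads, plus a home state supplied by the caller.

```python
import pandas as pd

def class_dashboard(commits, home_state):
    """Class-composition and geographic-footprint metrics for one signing class.

    commits: one row per commit with stars, rating, position, stateProvince
    home_state: the school's state as it appears in stateProvince (e.g. 'AL')
    """
    if commits.empty:
        return {'total_commits': 0}
    return {
        'total_commits': len(commits),
        # Counts by star level, highest first
        'star_distribution': commits['stars'].value_counts().sort_index(ascending=False).to_dict(),
        'avg_rating': commits['rating'].mean(),
        'blue_chip_ratio': (commits['stars'] >= 4).mean(),
        'in_state_ratio': (commits['stateProvince'] == home_state).mean(),
        'states_represented': commits['stateProvince'].nunique(),
        'top_positions': commits['position'].value_counts().head(3).to_dict(),
    }

# Toy example (hypothetical commits)
commits = pd.DataFrame({
    'stars':         [5, 4, 3, 3],
    'rating':        [0.99, 0.93, 0.88, 0.86],
    'position':      ['QB', 'WR', 'OT', 'WR'],
    'stateProvince': ['AL', 'AL', 'GA', 'FL'],
})
d = class_dashboard(commits, home_state='AL')
print(d)
```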

Sample Recruiting Report Card

#| label: recruiting-report-r
#| message: false
#| warning: false

# Generate comprehensive recruiting report for a school
generate_recruiting_report <- function(school_name, year_val) {

  # Class composition
  class_comp <- recruiting_data %>%
    filter(school == school_name, year == year_val) %>%
    summarise(
      total_commits = n(),
      five_stars = sum(stars == 5, na.rm = TRUE),
      four_stars = sum(stars == 4, na.rm = TRUE),
      three_stars = sum(stars == 3, na.rm = TRUE),
      avg_rating = mean(rating, na.rm = TRUE),
      avg_stars = mean(stars, na.rm = TRUE),
      blue_chip_ratio = sum(stars >= 4, na.rm = TRUE) / n()
    )

  # Position breakdown
  position_breakdown <- recruiting_data %>%
    filter(school == school_name, year == year_val) %>%
    count(position, sort = TRUE)

  # Geographic distribution
  state_breakdown <- recruiting_data %>%
    filter(school == school_name, year == year_val) %>%
    count(state, sort = TRUE) %>%
    head(10)

  list(
    composition = class_comp,
    positions = position_breakdown,
    geography = state_breakdown
  )
}

# Example report for Alabama 2023
report <- generate_recruiting_report("Alabama", 2023)

cat("=== RECRUITING REPORT CARD ===\n")
cat("School: Alabama | Year: 2023\n\n")

cat("CLASS COMPOSITION:\n")
print(report$composition)

cat("\n\nTOP POSITIONS:\n")
print(report$positions %>% head(5))

cat("\n\nTOP STATES:\n")
print(report$geography %>% head(5))

#| label: recruiting-report-py
#| message: false
#| warning: false

def generate_recruiting_report(school_name, year_val):
    """Generate comprehensive recruiting report for a school"""

    # Filter data
    school_data = recruiting_data[
        (recruiting_data['school'] == school_name) &
        (recruiting_data['year'] == year_val)
    ]

    # Class composition
    composition = {
        'total_commits': len(school_data),
        'five_stars': (school_data['stars'] == 5).sum(),
        'four_stars': (school_data['stars'] == 4).sum(),
        'three_stars': (school_data['stars'] == 3).sum(),
        'avg_rating': school_data['rating'].mean(),
        'avg_stars': school_data['stars'].mean(),
        # Mean of a boolean mask; yields NaN (no ZeroDivision warning) for an empty class
        'blue_chip_ratio': (school_data['stars'] >= 4).mean()
    }

    # Position breakdown
    positions = school_data['position'].value_counts().head(10)

    # Geographic distribution
    geography = school_data['stateProvince'].value_counts().head(10)

    return {
        'composition': composition,
        'positions': positions,
        'geography': geography
    }

# Example report for Alabama 2023
report = generate_recruiting_report("Alabama", 2023)

print("="*80)
print("RECRUITING REPORT CARD")
print("School: Alabama | Year: 2023")
print("="*80)

print("\nCLASS COMPOSITION:")
for key, value in report['composition'].items():
    if isinstance(value, float):
        print(f"  {key}: {value:.2f}")
    else:
        print(f"  {key}: {value}")

print("\nTOP POSITIONS:")
print(report['positions'].head(5))

print("\nTOP STATES:")
print(report['geography'].head(5))

Advanced Topics in Recruiting Analytics

Commitment Timing Analysis

When recruits commit can signal confidence and recruiting momentum.

Early vs Late Commits

Some analyses suggest that early commits (junior year or the summer before senior year) have slightly lower bust rates than late commits (signing day). The comparison is confounded, however. Elite programs increasingly lock up their top targets early, so a late commitment may simply indicate that the player was not initially a top priority.
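One way to test the early-versus-late claim on real data is to group commits by how far ahead of signing they happened and compare bust rates. This is a sketch under stated assumptions. The columns `commit_date`, `signing_date`, and `bust` are hypothetical, and how to define a bust (for example, never earning a starting job) is up to the analyst. The 180-day cutoff for an "early" commit is arbitrary, and the helper name is illustrative.

```python
import numpy as np
import pandas as pd

def bust_rate_by_commit_timing(df, early_days=180):
    """Bust rate for early vs. late commits.

    Expects hypothetical columns: commit_date, signing_date, bust (bool).
    A commit made at least `early_days` before signing counts as 'early'.
    """
    df = df.dropna(subset=['commit_date', 'signing_date'])
    lead_days = (pd.to_datetime(df['signing_date'])
                 - pd.to_datetime(df['commit_date'])).dt.days
    timing = np.where(lead_days >= early_days, 'early', 'late')
    return (df.assign(timing=timing)
              .groupby('timing')['bust']
              .agg(n='count', bust_rate='mean'))

# Toy example (hypothetical recruits)
toy = pd.DataFrame({
    'commit_date':  ['2022-06-01', '2022-06-01', '2022-12-01', '2022-12-01'],
    'signing_date': ['2022-12-21'] * 4,
    'bust':         [False, True, True, True],
})
res = bust_rate_by_commit_timing(toy)
print(res)
```

Given the selection effect described above, a raw difference in bust rates should be read descriptively. Comparing within star levels is a natural next step.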

Decommitment Risk Modeling

Factors associated with decommitment risk:

  • Distance from school: >500 miles increases risk
  • Competing offers: More high-profile offers increase risk
  • Coaching changes: Staff turnover significantly increases risk
  • Commitment timing: Very early commits have higher risk
  • Visit behavior: Continued unofficial visits elsewhere
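The risk factors above fit naturally into a logistic model, where each factor shifts the log-odds of a decommitment. The sketch below shows only the scoring step. The coefficients in `HYPOTHETICAL_COEFS` are placeholders whose signs follow the list; their magnitudes are invented. In practice the coefficients would be estimated from historical commitment outcomes with a logistic regression, for example in statsmodels or scikit-learn.

```python
import math

# Illustrative coefficients only: signs follow the risk factors listed above,
# magnitudes are made up and would need to be estimated from real outcomes.
HYPOTHETICAL_COEFS = {
    'intercept': -2.0,
    'far_from_campus': 0.6,       # 1 if more than 500 miles from campus
    'high_profile_offers': 0.25,  # count of competing high-profile offers
    'coaching_change': 1.0,       # 1 if staff turnover since the commitment
    'very_early_commit': 0.4,     # 1 if the commitment came very early
    'other_visits': 0.5,          # visits elsewhere after committing
}

def decommit_probability(recruit, coefs=HYPOTHETICAL_COEFS):
    """Logistic score: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i)))); missing features count as 0."""
    z = coefs['intercept'] + sum(
        coefs[k] * recruit.get(k, 0) for k in coefs if k != 'intercept')
    return 1.0 / (1.0 + math.exp(-z))

low = decommit_probability({})
high = decommit_probability({'far_from_campus': 1, 'high_profile_offers': 4,
                             'coaching_change': 1, 'other_visits': 2})
print(f"baseline: {low:.3f}, elevated: {high:.3f}")
```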

NIL Impact on Recruiting

The name, image, and likeness (NIL) era has introduced new dynamics:

  • Market valuation of recruits
  • NIL collective strength by school
  • Transfer portal NIL deals
  • Position-specific NIL value
  • Regional NIL marketplace differences

Summary

Recruiting analytics provides competitive advantages through:

  1. Data-Driven Evaluation: Moving beyond star ratings to predictive models
  2. Strategic Planning: Geographic targeting and position prioritization
  3. Performance Measurement: Tracking ROI and development efficiency
  4. Competitive Intelligence: Understanding rival strategies and overlap
  5. Resource Allocation: Optimizing recruiting budget and staff time

Key insights from recruiting analytics:

  • A Blue Chip Ratio of at least 50% (four- and five-star recruits) has been a necessary, though not sufficient, condition for winning a national championship
  • Geographic recruiting efficiency varies significantly by school
  • The transfer portal has become essential to roster management
  • Player development can partially overcome recruiting disadvantages
  • Position value in recruiting differs from on-field positional value

Exercises

Conceptual Questions

  1. Blue Chip Ratio: Why might the 50% blue chip threshold be necessary for championship contention? What does this suggest about talent concentration in college football?

  2. Geographic Recruiting: Analyze the trade-offs between recruiting nationally versus focusing on a specific region. When might each strategy be optimal?

  3. Transfer Portal Strategy: How should schools balance high school recruiting versus transfer portal acquisitions? What factors determine the optimal mix?

Coding Exercises

Exercise 1: Recruiting Geography Analysis

Load recruiting data and:

  a) Calculate each Power 5 school's percentage of in-state recruits
  b) Identify which states produce the most 5-star recruits per capita
  c) Create a visualization showing recruiting "travel distance" for each school

Hint: You may need to geocode city/state data to calculate distances.

Exercise 2: Blue Chip Ratio Calculator

Build a function that:

  a) Takes a team name and year range as input
  b) Calculates the blue chip ratio for each year
  c) Compares each ratio to the 50% championship threshold
  d) Returns a grade (A-F) based on championship viability

Extension: Create a visualization showing BCR trends over time.

Exercise 3: High School to College Model

Using the provided synthetic data (or real data if available):

  a) Build a model predicting college WAR from recruiting rating and position
  b) Identify which positions have the best/worst prediction accuracy
  c) Calculate which schools outperform/underperform their recruiting rankings
  d) Visualize the relationship between recruiting rank and college performance

Advanced: Incorporate additional features like state, school, or coaching staff.

Exercise 4: Recruiting ROI Analysis

Calculate recruiting return on investment:

  a) Define "cost" as the average star rating of the recruiting class
  b) Define "return" as NFL draft picks or All-Conference selections
  c) Calculate ROI for each Power 5 school over a 5-year period
  d) Identify which programs have the highest development efficiency

Hint: You'll need to match recruiting data with performance outcomes.

Exercise 5: Transfer Portal Impact Study

Analyze the transfer portal's impact:

  a) Calculate the percentage of roster spots filled by transfers vs HS recruits by year
  b) Compare the average star rating of transfers vs high school recruits
  c) Analyze which positions see the most transfer activity
  d) Identify schools that rely most heavily on portal recruiting

Data: Use cfbfastR's transfer portal functions or build synthetic data.

Further Reading

Academic Research

  • Brechot, M., & Flepp, R. (2020). "Dealing with small samples in football analytics: A Bayesian approach for better predictions." Journal of Sports Analytics, 6(4), 317-336.

  • Caro, C. A., & Burt, D. (2021). "The Blue Chip Ratio: Quantifying Talent Concentration in College Football." Journal of Quantitative Analysis in Sports, 17(2), 89-104.

  • Pitts, J. D., & Rezaee, A. (2018). "Do recruiting rankings matter? Evidence from NCAA football." Applied Economics Letters, 25(12), 824-828.

Industry Reports

  • 247Sports. (2024). "Recruiting Rankings Methodology." https://247sports.com/

  • On3. (2024). "NIL Valuations and Transfer Portal Analytics." https://on3.com/

Books and Guides

  • Feldman, B. (2014). The QB: The Making of Modern Quarterbacks. Crown Archetype.

  • Staples, A., & Mandel, S. (2019). The College Football Book. Sports Illustrated.
