Chapter 31: Machine Learning for NFL | Football Analytics Textbook

Learning ObjectivesBy the end of this chapter, you will be able to:

Apply supervised learning methods to football prediction problems
Build classification models for categorical outcomes (wins, play types)
Develop regression models for continuous metrics (EPA, player performance)
Engineer football-specific features for improved model performance
Evaluate and validate machine learning models using appropriate metrics
Interpret model predictions using SHAP values and feature importance
Select appropriate algorithms for different football analytics tasks

Introduction

Machine learning has revolutionized football analytics, enabling teams to make predictions and identify patterns that would be impossible through traditional statistical methods. From predicting game outcomes to forecasting player performance, from classifying play types to optimizing personnel decisions, ML algorithms power many of the most sophisticated analytics applications in modern football.

Unlike traditional statistical models that rely on predefined relationships and assumptions, machine learning algorithms can automatically discover complex, non-linear patterns in data. This capability is particularly valuable in football, where interactions between variables—field position, down and distance, personnel groupings, game situation—create intricate relationships that simple models struggle to capture.

This chapter provides a comprehensive introduction to machine learning for football analytics. We'll explore both classification and regression problems, implement multiple algorithms, develop football-specific features, validate model performance, and interpret predictions. By the end, you'll have a complete toolkit for building, evaluating, and deploying ML models for football.

What is Machine Learning?

Machine learning is a branch of artificial intelligence focused on building systems that learn from data without being explicitly programmed. Instead of writing rules to solve a problem, we provide examples and let the algorithm discover the patterns. **Key characteristics:** - **Data-driven**: Learn from examples rather than rules - **Pattern discovery**: Automatically identify relationships - **Generalization**: Make predictions on new, unseen data - **Adaptability**: Improve as more data becomes available

Supervised vs Unsupervised Learning

Machine learning encompasses several paradigms, but two dominate football analytics applications:

Supervised Learning

Supervised learning uses labeled training data where the outcome is known. The algorithm learns to map inputs (features) to outputs (labels).

Classification tasks (categorical outcomes):
- Will the team win this game? (binary: yes/no)
- What play type will be called? (multi-class: pass/run/punt/FG)
- Will this be a successful play? (binary: success/failure)
- Will the player make the Pro Bowl? (binary: yes/no)

Regression tasks (continuous outcomes):
- What will the EPA of this play be?
- How many yards will this player gain next season?
- What's the expected point differential?
- What will this player's PFF grade be?

Unsupervised Learning

Unsupervised learning finds patterns in unlabeled data without predefined outcomes.

Common applications:
- Clustering: Group similar plays, players, or teams
- Dimensionality reduction: Simplify high-dimensional tracking data
- Anomaly detection: Identify unusual plays or performances

This chapter focuses on supervised learning, which addresses most practical football analytics problems.

Choosing Between Classification and Regression

- Use **classification** when predicting categories (win/loss, play type) - Use **regression** when predicting quantities (EPA, yards, points) - Some problems can be framed either way (e.g., "Will EPA be positive?" vs "What will EPA be?")

Classification Problems in Football

Classification models predict categorical outcomes. Let's explore two fundamental football classification problems.

Win Prediction with Logistic Regression

The most basic classification task is predicting game outcomes. We'll build a logistic regression model using pre-game team statistics.

R
Python

#| label: win-prediction-r
#| message: false
#| warning: false
#| cache: true

library(tidyverse)
library(nflfastR)
library(caret)
library(gt)

# Load multiple seasons
pbp <- load_pbp(2018:2023)

# Calculate team stats by game
team_game_stats <- pbp %>%
  filter(!is.na(posteam), !is.na(epa)) %>%
  group_by(season, week, game_id, posteam) %>%
  summarise(
    off_epa_play = mean(epa, na.rm = TRUE),
    off_success_rate = mean(epa > 0, na.rm = TRUE),
    off_pass_epa = mean(epa[pass == 1], na.rm = TRUE),
    off_rush_epa = mean(epa[rush == 1], na.rm = TRUE),
    plays = n(),
    .groups = "drop"
  )

# Get game results
game_results <- pbp %>%
  filter(!is.na(posteam), !is.na(result)) %>%
  group_by(season, week, game_id, posteam) %>%
  summarise(
    won = first(result) > 0,
    result = first(result),
    .groups = "drop"
  )

# Calculate rolling averages (last 4 games)
team_rolling <- team_game_stats %>%
  arrange(posteam, season, week) %>%
  group_by(posteam) %>%
  mutate(
    avg_off_epa = lag(zoo::rollmean(off_epa_play, k = 4, fill = NA, align = "right"), 1),
    avg_success_rate = lag(zoo::rollmean(off_success_rate, k = 4, fill = NA, align = "right"), 1),
    avg_pass_epa = lag(zoo::rollmean(off_pass_epa, k = 4, fill = NA, align = "right"), 1),
    avg_rush_epa = lag(zoo::rollmean(off_rush_epa, k = 4, fill = NA, align = "right"), 1)
  ) %>%
  ungroup()

# Join with results and remove NAs
model_data <- team_rolling %>%
  inner_join(game_results, by = c("season", "week", "game_id", "posteam")) %>%
  filter(!is.na(avg_off_epa)) %>%
  mutate(won = factor(won, levels = c(FALSE, TRUE), labels = c("Loss", "Win")))

# Split into train/test (2018-2022 train, 2023 test)
train_data <- model_data %>% filter(season < 2023)
test_data <- model_data %>% filter(season == 2023)

# Train logistic regression
set.seed(42)
logistic_model <- glm(
  won ~ avg_off_epa + avg_success_rate + avg_pass_epa + avg_rush_epa,
  data = train_data,
  family = binomial(link = "logit")
)

# Model summary
summary(logistic_model)

# Predictions on test set
test_predictions <- test_data %>%
  mutate(
    pred_prob = predict(logistic_model, newdata = test_data, type = "response"),
    pred_class = factor(ifelse(pred_prob > 0.5, "Win", "Loss"),
                       levels = c("Loss", "Win"))
  )

# Calculate accuracy
accuracy <- mean(test_predictions$pred_class == test_predictions$won)
cat("\nTest Set Accuracy:", round(accuracy * 100, 1), "%\n")

#| label: win-prediction-py
#| message: false
#| warning: false
#| cache: true

import pandas as pd
import numpy as np
import nfl_data_py as nfl
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
import warnings
warnings.filterwarnings('ignore')

# Load multiple seasons
pbp = nfl.import_pbp_data(list(range(2018, 2024)))

# Calculate team stats by game
team_game_stats = (pbp
    .query("posteam.notna() & epa.notna()")
    .groupby(['season', 'week', 'game_id', 'posteam'])
    .agg(
        off_epa_play=('epa', 'mean'),
        off_success_rate=('epa', lambda x: (x > 0).mean()),
        off_pass_epa=('epa', lambda x: x[pbp.loc[x.index, 'pass'] == 1].mean()),
        off_rush_epa=('epa', lambda x: x[pbp.loc[x.index, 'rush'] == 1].mean()),
        plays=('epa', 'count')
    )
    .reset_index()
)

# Get game results
game_results = (pbp
    .query("posteam.notna() & result.notna()")
    .groupby(['season', 'week', 'game_id', 'posteam'])
    .agg(
        won=('result', lambda x: x.iloc[0] > 0),
        result=('result', 'first')
    )
    .reset_index()
)

# Calculate rolling averages
team_game_stats = team_game_stats.sort_values(['posteam', 'season', 'week'])

team_rolling = (team_game_stats
    .groupby('posteam')
    .apply(lambda x: x.assign(
        avg_off_epa=x['off_epa_play'].rolling(window=4, min_periods=1).mean().shift(1),
        avg_success_rate=x['off_success_rate'].rolling(window=4, min_periods=1).mean().shift(1),
        avg_pass_epa=x['off_pass_epa'].rolling(window=4, min_periods=1).mean().shift(1),
        avg_rush_epa=x['off_rush_epa'].rolling(window=4, min_periods=1).mean().shift(1)
    ))
    .reset_index(drop=True)
)

# Join with results
model_data = (team_rolling
    .merge(game_results, on=['season', 'week', 'game_id', 'posteam'])
    .dropna(subset=['avg_off_epa'])
)

# Split train/test
train_data = model_data[model_data['season'] < 2023]
test_data = model_data[model_data['season'] == 2023]

# Prepare features
feature_cols = ['avg_off_epa', 'avg_success_rate', 'avg_pass_epa', 'avg_rush_epa']
X_train = train_data[feature_cols]
y_train = train_data['won']
X_test = test_data[feature_cols]
y_test = test_data['won']

# Train logistic regression
logistic_model = LogisticRegression(random_state=42, max_iter=1000)
logistic_model.fit(X_train, y_train)

# Predictions
y_pred = logistic_model.predict(X_test)
y_pred_proba = logistic_model.predict_proba(X_test)[:, 1]

# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"\nTest Set Accuracy: {accuracy * 100:.1f}%")

# Coefficients
coef_df = pd.DataFrame({
    'Feature': feature_cols,
    'Coefficient': logistic_model.coef_[0]
})
print("\nModel Coefficients:")
print(coef_df.to_string(index=False))

Play Type Classification with Random Forest

Predicting offensive play-calling is a multi-class classification problem. Random forests excel at capturing complex interactions between game situation variables.

R
Python

#| label: play-type-classification-r
#| message: false
#| warning: false
#| cache: true

library(randomForest)

# Prepare play-level data
play_data <- pbp %>%
  filter(
    season == 2023,
    !is.na(down),
    down <= 3,
    play_type %in% c("pass", "run")
  ) %>%
  select(
    play_type,
    down,
    ydstogo,
    yardline_100,
    qtr,
    score_differential,
    half_seconds_remaining,
    shotgun,
    no_huddle
  ) %>%
  mutate(
    play_type = factor(play_type),
    shotgun = as.integer(shotgun),
    no_huddle = as.integer(no_huddle)
  ) %>%
  drop_na()

# Split data
set.seed(42)
train_idx <- sample(1:nrow(play_data), 0.8 * nrow(play_data))
train_plays <- play_data[train_idx, ]
test_plays <- play_data[-train_idx, ]

# Train random forest
rf_model <- randomForest(
  play_type ~ .,
  data = train_plays,
  ntree = 100,
  mtry = 3,
  importance = TRUE
)

# Predictions
rf_predictions <- predict(rf_model, newdata = test_plays)
rf_accuracy <- mean(rf_predictions == test_plays$play_type)

cat("\nRandom Forest Accuracy:", round(rf_accuracy * 100, 1), "%\n")

# Confusion matrix
conf_matrix <- table(Predicted = rf_predictions, Actual = test_plays$play_type)
print("\nConfusion Matrix:")
print(conf_matrix)

# Feature importance
importance_df <- as.data.frame(importance(rf_model)) %>%
  rownames_to_column("Feature") %>%
  arrange(desc(MeanDecreaseAccuracy))

print("\nTop Features by Importance:")
print(head(importance_df, 5))

#| label: play-type-classification-py
#| message: false
#| warning: false
#| cache: true

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.model_selection import train_test_split

# Prepare play-level data
play_data = (pbp
    .query("season == 2023 & down.notna() & down <= 3 & play_type.isin(['pass', 'run'])")
    [['play_type', 'down', 'ydstogo', 'yardline_100', 'qtr',
      'score_differential', 'half_seconds_remaining', 'shotgun', 'no_huddle']]
    .assign(
        shotgun=lambda x: x['shotgun'].astype(int),
        no_huddle=lambda x: x['no_huddle'].astype(int)
    )
    .dropna()
)

# Features and target
X = play_data.drop('play_type', axis=1)
y = play_data['play_type']

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train random forest
rf_model = RandomForestClassifier(
    n_estimators=100,
    max_features=3,
    random_state=42,
    n_jobs=-1
)
rf_model.fit(X_train, y_train)

# Predictions
y_pred = rf_model.predict(X_test)
rf_accuracy = accuracy_score(y_test, y_pred)

print(f"\nRandom Forest Accuracy: {rf_accuracy * 100:.1f}%")

# Confusion matrix
conf_mat = confusion_matrix(y_test, y_pred)
print("\nConfusion Matrix:")
print(pd.DataFrame(
    conf_mat,
    index=['Actual Pass', 'Actual Run'],
    columns=['Predicted Pass', 'Predicted Run']
))

# Feature importance
importance_df = pd.DataFrame({
    'Feature': X.columns,
    'Importance': rf_model.feature_importances_
}).sort_values('Importance', ascending=False)

print("\nTop Features by Importance:")
print(importance_df.head(5).to_string(index=False))

Regression Problems in Football

Regression models predict continuous numerical outcomes. Let's explore EPA prediction and player performance modeling.

EPA Prediction with Gradient Boosting

Gradient boosting machines (GBM) excel at regression tasks with complex feature interactions. We'll predict EPA for upcoming plays.

R
Python

#| label: epa-prediction-r
#| message: false
#| warning: false
#| cache: true

library(xgboost)

# Prepare regression data
epa_data <- pbp %>%
  filter(
    season %in% 2022:2023,
    !is.na(epa),
    !is.na(down),
    down <= 4,
    play_type %in% c("pass", "run")
  ) %>%
  select(
    epa,
    down,
    ydstogo,
    yardline_100,
    qtr,
    score_differential,
    half_seconds_remaining,
    shotgun,
    no_huddle,
    play_type
  ) %>%
  mutate(
    shotgun = as.integer(shotgun),
    no_huddle = as.integer(no_huddle),
    is_pass = as.integer(play_type == "pass")
  ) %>%
  select(-play_type) %>%
  drop_na()

# Split by season
train_epa <- epa_data %>% filter(half_seconds_remaining > 900 | qtr < 4)
test_epa <- epa_data %>% filter(half_seconds_remaining <= 900 & qtr == 4)

# Prepare matrices for xgboost
X_train <- train_epa %>% select(-epa) %>% as.matrix()
y_train <- train_epa$epa
X_test <- test_epa %>% select(-epa) %>% as.matrix()
y_test <- test_epa$epa

# Train gradient boosting model
dtrain <- xgb.DMatrix(data = X_train, label = y_train)
dtest <- xgb.DMatrix(data = X_test, label = y_test)

params <- list(
  objective = "reg:squarederror",
  eta = 0.1,
  max_depth = 6,
  subsample = 0.8,
  colsample_bytree = 0.8
)

xgb_model <- xgb.train(
  params = params,
  data = dtrain,
  nrounds = 100,
  watchlist = list(train = dtrain, test = dtest),
  verbose = 0
)

# Predictions
predictions <- predict(xgb_model, dtest)

# Evaluation metrics
rmse <- sqrt(mean((predictions - y_test)^2))
mae <- mean(abs(predictions - y_test))
r2 <- cor(predictions, y_test)^2

cat("\nGradient Boosting Performance:\n")
cat("RMSE:", round(rmse, 3), "\n")
cat("MAE:", round(mae, 3), "\n")
cat("R²:", round(r2, 3), "\n")

# Feature importance
importance_matrix <- xgb.importance(
  feature_names = colnames(X_train),
  model = xgb_model
)

print("\nTop 5 Features:")
print(head(importance_matrix, 5))

#| label: epa-prediction-py
#| message: false
#| warning: false
#| cache: true

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Prepare regression data
epa_data = (pbp
    .query("season in [2022, 2023] & epa.notna() & down.notna() & down <= 4 & play_type.isin(['pass', 'run'])")
    [['epa', 'down', 'ydstogo', 'yardline_100', 'qtr', 'score_differential',
      'half_seconds_remaining', 'shotgun', 'no_huddle', 'play_type']]
    .assign(
        shotgun=lambda x: x['shotgun'].astype(int),
        no_huddle=lambda x: x['no_huddle'].astype(int),
        is_pass=lambda x: (x['play_type'] == 'pass').astype(int)
    )
    .drop('play_type', axis=1)
    .dropna()
)

# Split data
train_epa = epa_data.query("half_seconds_remaining > 900 | qtr < 4")
test_epa = epa_data.query("half_seconds_remaining <= 900 & qtr == 4")

X_train = train_epa.drop('epa', axis=1)
y_train = train_epa['epa']
X_test = test_epa.drop('epa', axis=1)
y_test = test_epa['epa']

# Train gradient boosting model
gbm_model = GradientBoostingRegressor(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=6,
    subsample=0.8,
    random_state=42
)
gbm_model.fit(X_train, y_train)

# Predictions
predictions = gbm_model.predict(X_test)

# Evaluation metrics
rmse = np.sqrt(mean_squared_error(y_test, predictions))
mae = mean_absolute_error(y_test, predictions)
r2 = r2_score(y_test, predictions)

print("\nGradient Boosting Performance:")
print(f"RMSE: {rmse:.3f}")
print(f"MAE: {mae:.3f}")
print(f"R²: {r2:.3f}")

# Feature importance
importance_df = pd.DataFrame({
    'Feature': X_train.columns,
    'Importance': gbm_model.feature_importances_
}).sort_values('Importance', ascending=False)

print("\nTop 5 Features:")
print(importance_df.head(5).to_string(index=False))

Player Performance Regression

Predicting future player performance requires careful feature engineering and handling of small sample sizes.

R
Python

#| label: player-performance-r
#| message: false
#| warning: false
#| cache: true

# Calculate QB season stats
qb_stats <- pbp %>%
  filter(
    season %in% 2019:2023,
    !is.na(passer_player_name),
    !is.na(epa),
    pass == 1
  ) %>%
  group_by(season, passer_player_id, passer_player_name) %>%
  summarise(
    attempts = n(),
    epa_per_play = mean(epa),
    success_rate = mean(epa > 0),
    cpoe = mean(cpoe, na.rm = TRUE),
    air_yards_per_att = mean(air_yards, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(attempts >= 200)

# Create lagged features (predict next season)
qb_model_data <- qb_stats %>%
  arrange(passer_player_id, season) %>%
  group_by(passer_player_id) %>%
  mutate(
    next_epa = lead(epa_per_play),
    seasons_played = row_number()
  ) %>%
  ungroup() %>%
  filter(!is.na(next_epa)) %>%
  select(
    next_epa,
    epa_per_play,
    success_rate,
    cpoe,
    air_yards_per_att,
    attempts,
    seasons_played
  )

# Split data
set.seed(42)
train_idx <- sample(1:nrow(qb_model_data), 0.8 * nrow(qb_model_data))
train_qb <- qb_model_data[train_idx, ]
test_qb <- qb_model_data[-train_idx, ]

# Linear regression with regularization
library(glmnet)

X_train_qb <- train_qb %>% select(-next_epa) %>% as.matrix()
y_train_qb <- train_qb$next_epa
X_test_qb <- test_qb %>% select(-next_epa) %>% as.matrix()
y_test_qb <- test_qb$next_epa

# Ridge regression
ridge_model <- cv.glmnet(
  X_train_qb,
  y_train_qb,
  alpha = 0,
  nfolds = 5
)

# Predictions
ridge_pred <- predict(ridge_model, newx = X_test_qb, s = "lambda.min")

# Evaluation
qb_rmse <- sqrt(mean((ridge_pred - y_test_qb)^2))
qb_r2 <- cor(ridge_pred, y_test_qb)^2

cat("\nQB Performance Prediction:\n")
cat("RMSE:", round(qb_rmse, 3), "EPA per play\n")
cat("R²:", round(qb_r2, 3), "\n")

#| label: player-performance-py
#| message: false
#| warning: false
#| cache: true

from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Calculate QB season stats
qb_stats = (pbp
    .query("season in [2019, 2020, 2021, 2022, 2023] & passer_player_name.notna() & epa.notna() & pass == 1")
    .groupby(['season', 'passer_player_id', 'passer_player_name'])
    .agg(
        attempts=('epa', 'count'),
        epa_per_play=('epa', 'mean'),
        success_rate=('epa', lambda x: (x > 0).mean()),
        cpoe=('cpoe', 'mean'),
        air_yards_per_att=('air_yards', 'mean')
    )
    .reset_index()
    .query("attempts >= 200")
)

# Create lagged features
qb_stats = qb_stats.sort_values(['passer_player_id', 'season'])
qb_model_data = (qb_stats
    .assign(
        next_epa=lambda x: x.groupby('passer_player_id')['epa_per_play'].shift(-1),
        seasons_played=lambda x: x.groupby('passer_player_id').cumcount() + 1
    )
    .dropna(subset=['next_epa'])
    [['next_epa', 'epa_per_play', 'success_rate', 'cpoe',
      'air_yards_per_att', 'attempts', 'seasons_played']]
)

# Split data
from sklearn.model_selection import train_test_split
X_qb = qb_model_data.drop('next_epa', axis=1)
y_qb = qb_model_data['next_epa']

X_train_qb, X_test_qb, y_train_qb, y_test_qb = train_test_split(
    X_qb, y_qb, test_size=0.2, random_state=42
)

# Standardize features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_qb)
X_test_scaled = scaler.transform(X_test_qb)

# Ridge regression
ridge_model = Ridge(alpha=1.0)
ridge_model.fit(X_train_scaled, y_train_qb)

# Predictions
ridge_pred = ridge_model.predict(X_test_scaled)

# Evaluation
qb_rmse = np.sqrt(mean_squared_error(y_test_qb, ridge_pred))
qb_r2 = r2_score(y_test_qb, ridge_pred)

print("\nQB Performance Prediction:")
print(f"RMSE: {qb_rmse:.3f} EPA per play")
print(f"R²: {qb_r2:.3f}")

Feature Engineering for Football Data

Feature engineering—creating meaningful variables from raw data—often matters more than algorithm selection. Football provides rich opportunities for creative feature development.

#| label: feature-engineering-r
#| message: false
#| warning: false

# Example: Create time-based features
pbp_features <- pbp %>%
  filter(season == 2023, !is.na(epa)) %>%
  mutate(
    # Game situation features
    is_first_half = qtr <= 2,
    is_two_minute_drill = half_seconds_remaining <= 120,
    is_goal_to_go = yardline_100 <= 10 & ydstogo >= yardline_100,
    is_red_zone = yardline_100 <= 20,
    is_third_down = down == 3,
    is_fourth_down = down == 4,

    # Score situation
    is_close_game = abs(score_differential) <= 7,
    is_winning = score_differential > 0,
    is_losing = score_differential < 0,
    score_diff_bucket = cut(
      score_differential,
      breaks = c(-Inf, -14, -7, 0, 7, 14, Inf),
      labels = c("Down 14+", "Down 7-13", "Down 1-6", "Up 1-6", "Up 7-13", "Up 14+")
    ),

    # Field position
    field_zone = cut(
      yardline_100,
      breaks = c(0, 20, 50, 80, 100),
      labels = c("Red Zone", "Mid-Field-Near", "Mid-Field-Far", "Own Territory")
    ),

    # Down-distance combinations
    down_distance_combo = case_when(
      down == 1 ~ "First Down",
      down == 2 & ydstogo <= 3 ~ "Second & Short",
      down == 2 & ydstogo >= 7 ~ "Second & Long",
      down == 3 & ydstogo <= 3 ~ "Third & Short",
      down == 3 & ydstogo >= 7 ~ "Third & Long",
      TRUE ~ "Other"
    )
  )

# Show summary of engineered features
feature_summary <- pbp_features %>%
  summarise(
    pct_first_half = mean(is_first_half),
    pct_two_min = mean(is_two_minute_drill),
    pct_red_zone = mean(is_red_zone),
    pct_close_game = mean(is_close_game),
    pct_third_down = mean(is_third_down)
  )

print("Feature Statistics:")
print(feature_summary)

#| label: feature-engineering-py
#| message: false
#| warning: false

# Create time-based features
pbp_features = (pbp
    .query("season == 2023 & epa.notna()")
    .assign(
        # Game situation features
        is_first_half=lambda x: x['qtr'] <= 2,
        is_two_minute_drill=lambda x: x['half_seconds_remaining'] <= 120,
        is_goal_to_go=lambda x: (x['yardline_100'] <= 10) & (x['ydstogo'] >= x['yardline_100']),
        is_red_zone=lambda x: x['yardline_100'] <= 20,
        is_third_down=lambda x: x['down'] == 3,
        is_fourth_down=lambda x: x['down'] == 4,

        # Score situation
        is_close_game=lambda x: x['score_differential'].abs() <= 7,
        is_winning=lambda x: x['score_differential'] > 0,
        is_losing=lambda x: x['score_differential'] < 0,
        score_diff_bucket=lambda x: pd.cut(
            x['score_differential'],
            bins=[-np.inf, -14, -7, 0, 7, 14, np.inf],
            labels=['Down 14+', 'Down 7-13', 'Down 1-6', 'Up 1-6', 'Up 7-13', 'Up 14+']
        ),

        # Field position
        field_zone=lambda x: pd.cut(
            x['yardline_100'],
            bins=[0, 20, 50, 80, 100],
            labels=['Red Zone', 'Mid-Field-Near', 'Mid-Field-Far', 'Own Territory']
        ),

        # Down-distance combinations
        down_distance_combo=lambda x: np.select(
            [
                x['down'] == 1,
                (x['down'] == 2) & (x['ydstogo'] <= 3),
                (x['down'] == 2) & (x['ydstogo'] >= 7),
                (x['down'] == 3) & (x['ydstogo'] <= 3),
                (x['down'] == 3) & (x['ydstogo'] >= 7)
            ],
            [
                'First Down',
                'Second & Short',
                'Second & Long',
                'Third & Short',
                'Third & Long'
            ],
            default='Other'
        )
    )
)

# Feature statistics
feature_summary = pbp_features[[
    'is_first_half', 'is_two_minute_drill', 'is_red_zone',
    'is_close_game', 'is_third_down'
]].mean()

print("Feature Statistics:")
print(feature_summary)

Aggregated Historical Features

R
Python

#| label: historical-features-r
#| message: false
#| warning: false

# Team tendency features (rolling windows)
team_tendencies <- pbp %>%
  filter(season == 2023, !is.na(posteam)) %>%
  arrange(posteam, week, game_id) %>%
  group_by(posteam) %>%
  mutate(
    # Rolling pass rate
    pass_rate_L3 = lag(zoo::rollmean(pass, k = 50, fill = NA, align = "right"), 1),

    # Rolling EPA
    epa_L3 = lag(zoo::rollmean(epa, k = 50, fill = NA, align = "right"), 1),

    # Situational tendencies
    third_down_conv_rate = lag(
      zoo::rollmean(ifelse(down == 3, as.numeric(first_down), NA),
                   k = 20, fill = NA, align = "right", na.rm = TRUE),
      1
    )
  ) %>%
  ungroup()

print("Historical features created")

#| label: historical-features-py
#| message: false
#| warning: false

# Team tendency features
team_tendencies = (pbp
    .query("season == 2023 & posteam.notna()")
    .sort_values(['posteam', 'week', 'game_id'])
    .assign(
        pass_int=lambda x: x['pass'].astype(float)
    )
    .groupby('posteam', group_keys=False)
    .apply(lambda x: x.assign(
        pass_rate_L3=x['pass_int'].rolling(window=50, min_periods=1).mean().shift(1),
        epa_L3=x['epa'].rolling(window=50, min_periods=1).mean().shift(1)
    ))
)

print("Historical features created")

Model Selection and Algorithms

Different algorithms have different strengths for football analytics problems:

Algorithm Comparison

Algorithm	Strengths	Weaknesses	Best For
Logistic Regression	Interpretable, fast, probabilistic	Linear decision boundaries	Win prediction, simple classification
Random Forest	Handles interactions, robust	Less interpretable, can overfit	Play-calling, complex classification
Gradient Boosting	High accuracy, handles non-linearity	Slower training, tuning required	EPA prediction, player metrics
Neural Networks	Extremely flexible, handles complex patterns	Black box, needs lots of data	Tracking data, computer vision
Ridge/Lasso	Regularization prevents overfitting	Assumes linearity	Player projections, small samples

Cross-Validation and Model Selection

R
Python

#| label: cross-validation-r
#| message: false
#| warning: false
#| cache: true

library(caret)

# Prepare data for model comparison
model_comp_data <- pbp %>%
  filter(
    season == 2023,
    !is.na(epa),
    down <= 3,
    play_type %in% c("pass", "run")
  ) %>%
  select(
    epa,
    down,
    ydstogo,
    yardline_100,
    score_differential,
    half_seconds_remaining
  ) %>%
  drop_na() %>%
  slice_sample(n = 5000)  # Sample for speed

# Set up cross-validation
train_control <- trainControl(
  method = "cv",
  number = 5,
  verboseIter = FALSE
)

# Train multiple models
set.seed(42)

# Linear model
lm_cv <- train(
  epa ~ .,
  data = model_comp_data,
  method = "lm",
  trControl = train_control
)

# Ridge regression
ridge_cv <- train(
  epa ~ .,
  data = model_comp_data,
  method = "ridge",
  trControl = train_control,
  tuneGrid = expand.grid(lambda = seq(0, 1, by = 0.1))
)

# Random forest
rf_cv <- train(
  epa ~ .,
  data = model_comp_data,
  method = "rf",
  trControl = train_control,
  tuneGrid = expand.grid(mtry = c(2, 3, 4)),
  ntree = 50
)

# Compare models
model_comparison <- resamples(list(
  Linear = lm_cv,
  Ridge = ridge_cv,
  RandomForest = rf_cv
))

summary(model_comparison)

#| label: cross-validation-py
#| message: false
#| warning: false
#| cache: true

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.ensemble import RandomForestRegressor

# Prepare data
model_comp_data = (pbp
    .query("season == 2023 & epa.notna() & down <= 3 & play_type.isin(['pass', 'run'])")
    [['epa', 'down', 'ydstogo', 'yardline_100', 'score_differential', 'half_seconds_remaining']]
    .dropna()
    .sample(n=5000, random_state=42)
)

X = model_comp_data.drop('epa', axis=1)
y = model_comp_data['epa']

# Define models
models = {
    'Linear': LinearRegression(),
    'Ridge': Ridge(alpha=1.0),
    'RandomForest': RandomForestRegressor(n_estimators=50, random_state=42)
}

# Cross-validation
cv_results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring='r2')
    cv_results[name] = {
        'mean_r2': scores.mean(),
        'std_r2': scores.std()
    }

# Display results
cv_df = pd.DataFrame(cv_results).T
print("\nCross-Validation Results:")
print(cv_df.to_string())

Hyperparameter Tuning

Hyperparameters control model complexity and learning. Systematic tuning improves performance.

R
Python

#| label: hyperparameter-tuning-r
#| message: false
#| warning: false
#| cache: true

# Grid search for Random Forest
rf_grid <- expand.grid(
  mtry = c(2, 3, 4, 5),
  splitrule = c("variance", "extratrees"),
  min.node.size = c(5, 10, 20)
)

# Use ranger (faster RF implementation)
library(ranger)

rf_tuned <- train(
  epa ~ .,
  data = model_comp_data,
  method = "ranger",
  trControl = train_control,
  tuneGrid = rf_grid,
  num.trees = 100,
  importance = "impurity"
)

# Best parameters
cat("\nBest Hyperparameters:\n")
print(rf_tuned$bestTune)

cat("\nBest RMSE:", round(min(rf_tuned$results$RMSE), 3), "\n")

#| label: hyperparameter-tuning-py
#| message: false
#| warning: false
#| cache: true

from sklearn.model_selection import GridSearchCV

# Define parameter grid
param_grid = {
    'n_estimators': [50, 100],
    'max_depth': [5, 10, None],
    'min_samples_split': [2, 5],
    'min_samples_leaf': [1, 2]
}

# Grid search
rf_grid_search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=3,
    scoring='r2',
    n_jobs=-1
)

rf_grid_search.fit(X, y)

# Best parameters
print("\nBest Hyperparameters:")
print(rf_grid_search.best_params_)

print(f"\nBest R² Score: {rf_grid_search.best_score_:.3f}")

Model Evaluation Metrics

Choosing the right evaluation metric depends on the problem and business context.

Classification Metrics

R
Python

#| label: classification-metrics-r
#| message: false
#| warning: false
#| fig-width: 10
#| fig-height: 5

library(pROC)
library(yardstick)

# Get predictions from win prediction model
eval_data <- test_predictions %>%
  mutate(
    won_binary = as.integer(won == "Win"),
    pred_binary = as.integer(pred_class == "Win")
  )

# Confusion matrix
conf_mat <- table(Predicted = eval_data$pred_class, Actual = eval_data$won)

# Calculate metrics
accuracy <- mean(eval_data$pred_binary == eval_data$won_binary)
precision <- sum(eval_data$pred_binary == 1 & eval_data$won_binary == 1) /
             sum(eval_data$pred_binary == 1)
recall <- sum(eval_data$pred_binary == 1 & eval_data$won_binary == 1) /
          sum(eval_data$won_binary == 1)
f1 <- 2 * (precision * recall) / (precision + recall)

cat("Classification Metrics:\n")
cat("Accuracy: ", round(accuracy, 3), "\n")
cat("Precision:", round(precision, 3), "\n")
cat("Recall:   ", round(recall, 3), "\n")
cat("F1 Score: ", round(f1, 3), "\n")

# ROC curve
roc_obj <- roc(eval_data$won_binary, eval_data$pred_prob)
auc_value <- auc(roc_obj)

cat("\nAUC:", round(auc_value, 3), "\n")

#| label: classification-metrics-py
#| message: false
#| warning: false

from sklearn.metrics import (
    accuracy_score, precision_score, recall_score,
    f1_score, roc_auc_score, roc_curve
)

# Classification metrics (using win prediction from earlier)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
auc = roc_auc_score(y_test, y_pred_proba)

print("Classification Metrics:")
print(f"Accuracy:  {accuracy:.3f}")
print(f"Precision: {precision:.3f}")
print(f"Recall:    {recall:.3f}")
print(f"F1 Score:  {f1:.3f}")
print(f"\nAUC: {auc:.3f}")

Visualizing Model Performance

R
Python

#| label: fig-roc-curve-r
#| fig-cap: "ROC Curve for Win Prediction Model"
#| fig-width: 8
#| fig-height: 6
#| message: false
#| warning: false

# ROC Curve
plot(roc_obj,
     col = "#0072B2",
     lwd = 2,
     main = "ROC Curve: Win Prediction Model",
     print.auc = TRUE,
     print.auc.x = 0.4,
     print.auc.y = 0.4)
abline(a = 0, b = 1, lty = 2, col = "gray")

#| label: fig-roc-curve-py
#| fig-cap: "ROC Curve for Win Prediction Model (Python)"
#| fig-width: 8
#| fig-height: 6
#| message: false
#| warning: false

# ROC Curve
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)

plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, color='#0072B2', lw=2, label=f'ROC Curve (AUC = {auc:.3f})')
plt.plot([0, 1], [0, 1], color='gray', lw=1, linestyle='--', label='Random Classifier')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate', fontsize=12)
plt.ylabel('True Positive Rate', fontsize=12)
plt.title('ROC Curve: Win Prediction Model', fontsize=14, fontweight='bold')
plt.legend(loc='lower right')
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()

Confusion Matrix Visualization

R
Python

#| label: fig-confusion-matrix-r
#| fig-cap: "Confusion Matrix for Win Prediction"
#| fig-width: 7
#| fig-height: 6
#| message: false
#| warning: false

library(ggplot2)

# Create confusion matrix dataframe
conf_df <- as.data.frame(conf_mat) %>%
  mutate(
    Percentage = Freq / sum(Freq) * 100,
    Label = paste0(Freq, "\n(", round(Percentage, 1), "%)")
  )

ggplot(conf_df, aes(x = Actual, y = Predicted, fill = Freq)) +
  geom_tile(color = "white", size = 1.5) +
  geom_text(aes(label = Label), size = 5, fontface = "bold") +
  scale_fill_gradient(low = "#E8F4F8", high = "#0072B2") +
  labs(
    title = "Confusion Matrix: Win Prediction Model",
    subtitle = "2023 Season Test Set",
    x = "Actual Outcome",
    y = "Predicted Outcome"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    axis.text = element_text(size = 11),
    legend.position = "none"
  )

📊 Visualization Output

The code above generates a visualization. To see the output, run this code in your R or Python environment. The resulting plot will help illustrate the concepts discussed in this section.

#| label: fig-confusion-matrix-py
#| fig-cap: "Confusion Matrix for Win Prediction (Python)"
#| fig-width: 7
#| fig-height: 6
#| message: false
#| warning: false

import seaborn as sns

# Create confusion matrix
conf_mat = confusion_matrix(y_test, y_pred)

plt.figure(figsize=(7, 6))
sns.heatmap(
    conf_mat,
    annot=True,
    fmt='d',
    cmap='Blues',
    xticklabels=['Loss', 'Win'],
    yticklabels=['Loss', 'Win'],
    cbar_kws={'label': 'Count'}
)
plt.xlabel('Actual Outcome', fontsize=12)
plt.ylabel('Predicted Outcome', fontsize=12)
plt.title('Confusion Matrix: Win Prediction Model\n2023 Season Test Set',
          fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

📊 Visualization Output

The code above generates a visualization. To see the output, run this code in your R or Python environment. The resulting plot will help illustrate the concepts discussed in this section.

Model Interpretability with SHAP

SHAP (SHapley Additive exPlanations) values provide model-agnostic interpretability by measuring each feature's contribution to predictions.

R
Python

#| label: shap-analysis-r
#| message: false
#| warning: false
#| cache: true

library(shapr)

# Note: SHAP in R requires specific setup
# Using variable importance as alternative
importance_plot_data <- importance_df %>%
  head(10) %>%
  mutate(Feature = fct_reorder(Feature, MeanDecreaseAccuracy))

ggplot(importance_plot_data, aes(x = MeanDecreaseAccuracy, y = Feature)) +
  geom_col(fill = "#0072B2") +
  labs(
    title = "Feature Importance: Play Type Prediction",
    subtitle = "Random Forest Model",
    x = "Mean Decrease in Accuracy",
    y = NULL
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    axis.text.y = element_text(size = 11)
  )

#| label: shap-analysis-py
#| message: false
#| warning: false
#| cache: true
#| fig-width: 10
#| fig-height: 6

# SHAP analysis
try:
    import shap

    # Sample data for SHAP (computationally expensive)
    X_sample = X_test.sample(n=min(100, len(X_test)), random_state=42)

    # Create explainer
    explainer = shap.TreeExplainer(rf_model)
    shap_values = explainer.shap_values(X_sample)

    # Summary plot
    plt.figure(figsize=(10, 6))
    shap.summary_plot(shap_values[1], X_sample, show=False)
    plt.title("SHAP Summary: Play Type Prediction", fontsize=14, fontweight='bold')
    plt.tight_layout()
    plt.show()

except ImportError:
    print("SHAP library not available. Install with: pip install shap")

    # Alternative: Feature importance
    importance_df_top = importance_df.head(10)

    plt.figure(figsize=(10, 6))
    plt.barh(range(len(importance_df_top)), importance_df_top['Importance'])
    plt.yticks(range(len(importance_df_top)), importance_df_top['Feature'])
    plt.xlabel('Feature Importance', fontsize=12)
    plt.title('Feature Importance: Play Type Prediction',
              fontsize=14, fontweight='bold')
    plt.gca().invert_yaxis()
    plt.tight_layout()
    plt.show()

Handling Football-Specific Data Challenges

Football data presents unique challenges that require specialized approaches.

Class Imbalance

Some outcomes are rare (4th down conversions, touchdowns), creating imbalanced datasets.

Solutions for Class Imbalance:

1. **Resampling**: Oversample minority class or undersample majority class 2. **Class weights**: Penalize misclassification of rare events more heavily 3. **SMOTE**: Synthetic Minority Over-sampling Technique 4. **Ensemble methods**: Use algorithms robust to imbalance (e.g., balanced random forest) 5. **Appropriate metrics**: Use F1, precision-recall, or AUC instead of accuracy

Small Sample Sizes

Player-level predictions often suffer from limited observations (rookies, backups).

Solutions for Small Samples:

1. **Regularization**: Ridge/Lasso regression prevents overfitting 2. **Hierarchical models**: Borrow strength across similar players 3. **Feature engineering**: Create stable aggregate features 4. **Domain priors**: Incorporate expert knowledge (Bayesian methods) 5. **Ensemble predictions**: Combine multiple weak learners

Temporal Dependencies

Football data has strong time dependencies (player improvement, scheme changes).

Solutions for Temporal Dependencies:

1. **Time-series CV**: Use rolling or expanding window cross-validation 2. **Recurrent networks**: LSTM/GRU for sequential patterns 3. **Temporal features**: Include time trends, seasonal effects 4. **Separate test period**: Never test on past data

Production Deployment Considerations

Moving models from development to production requires careful planning.

Model Versioning and Monitoring

#| eval: false

# Example model versioning structure
model_metadata <- list(
  model_name = "win_probability_v2",
  algorithm = "logistic_regression",
  training_date = Sys.Date(),
  training_seasons = "2018-2022",
  features = c("avg_off_epa", "avg_success_rate", "avg_pass_epa", "avg_rush_epa"),
  performance = list(
    train_accuracy = 0.68,
    test_accuracy = 0.66,
    auc = 0.72
  ),
  hyperparameters = list(
    regularization = "none"
  )
)

# Save model and metadata
saveRDS(logistic_model, "models/win_prob_v2.rds")
saveRDS(model_metadata, "models/win_prob_v2_metadata.rds")

Retraining Strategy

Model Drift and Retraining

Football strategies evolve. Models must be retrained regularly: - **Weekly updates**: For in-season predictions - **Seasonal retraining**: After each season completes - **Rule change updates**: When NFL rules change significantly - **Performance monitoring**: Track prediction accuracy over time - **A/B testing**: Compare new models against production baseline

API Deployment Example

#| eval: false

# Example Flask API for model serving
from flask import Flask, request, jsonify
import joblib

app = Flask(__name__)

# Load model
model = joblib.load('models/win_prob_v2.pkl')

@app.route('/predict', methods=['POST'])
def predict():
    # Get features from request
    data = request.get_json()
    features = [[
        data['avg_off_epa'],
        data['avg_success_rate'],
        data['avg_pass_epa'],
        data['avg_rush_epa']
    ]]

    # Make prediction
    probability = model.predict_proba(features)[0][1]

    return jsonify({
        'win_probability': float(probability),
        'model_version': 'v2'
    })

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

Advanced Topics and Extensions

Ensemble Methods

Combining multiple models often outperforms single models:

R
Python

#| label: ensemble-methods-r
#| message: false
#| warning: false
#| eval: false

# Stacking ensemble
library(caretEnsemble)

# Train multiple models
model_list <- caretList(
  epa ~ .,
  data = model_comp_data,
  trControl = train_control,
  methodList = c("lm", "ridge", "rf"),
  tuneList = NULL
)

# Stack models
stacked_model <- caretStack(
  model_list,
  method = "glm",
  trControl = train_control
)

# Stacked predictions often improve performance

#| label: ensemble-methods-py
#| message: false
#| warning: false
#| eval: false

from sklearn.ensemble import VotingRegressor

# Create ensemble
ensemble = VotingRegressor([
    ('linear', LinearRegression()),
    ('ridge', Ridge(alpha=1.0)),
    ('rf', RandomForestRegressor(n_estimators=50, random_state=42))
])

# Train ensemble
ensemble.fit(X_train, y_train)

# Ensemble predictions
ensemble_pred = ensemble.predict(X_test)

Neural Networks for Complex Patterns

Deep learning excels with large datasets and complex interactions:

#| eval: false

# Simple neural network example
from sklearn.neural_network import MLPRegressor

nn_model = MLPRegressor(
    hidden_layer_sizes=(64, 32, 16),
    activation='relu',
    solver='adam',
    max_iter=500,
    random_state=42
)

nn_model.fit(X_train, y_train)
nn_pred = nn_model.predict(X_test)

Summary

Machine learning provides powerful tools for football analytics, enabling predictions and insights impossible with traditional methods. Key takeaways:

Classification vs Regression:
- Classification for categorical outcomes (wins, play types)
- Regression for continuous metrics (EPA, yards, player performance)

Algorithm Selection:
- Logistic regression for interpretable classification
- Random forests for complex interactions
- Gradient boosting for maximum predictive accuracy
- Ridge/Lasso for regularization with small samples

Feature Engineering:
- Domain knowledge drives feature creation
- Temporal, situational, and aggregated features
- Handle football-specific patterns

Evaluation:
- Choose metrics appropriate for problem (accuracy, AUC, RMSE, R²)
- Use cross-validation to prevent overfitting
- Visualize performance with ROC curves, confusion matrices

Interpretability:
- SHAP values explain individual predictions
- Feature importance identifies key drivers
- Balance accuracy with interpretability

Production:
- Version models and track performance
- Retrain regularly as strategies evolve
- Monitor for model drift

In this chapter, we:

Distinguished supervised from unsupervised learning
Built classification models for wins and play types
Developed regression models for EPA and player performance
Engineered football-specific features
Compared algorithms and tuned hyperparameters
Evaluated models with appropriate metrics
Interpreted predictions with SHAP values
Addressed football data challenges
Considered production deployment

Exercises

Conceptual Questions

Algorithm Selection: You need to predict whether a team will convert on 4th down. The dataset is highly imbalanced (conversions are rare). Which algorithm would you choose and why? What techniques would you use to handle the class imbalance?
Feature Engineering: Design five features for predicting quarterback performance next season. Explain why each feature might be predictive and how you would calculate it from play-by-play data.
Cross-Validation Strategy: Why is random cross-validation inappropriate for time-series football data? Describe a better validation strategy for predicting game outcomes.

Coding Exercises

Exercise 1: Build a Fourth Down Conversion Model

Using nflfastR data, build a classification model to predict 4th down conversion success. **Requirements:** a) Load 2023 season data and filter to 4th down plays b) Create features: distance, field position, score situation, time remaining c) Train logistic regression, random forest, and gradient boosting models d) Compare model performance using accuracy, precision, recall, and AUC e) Visualize feature importance **Bonus**: Handle class imbalance using SMOTE or class weights.

Exercise 2: Receiver Yards After Catch Prediction

Build a regression model to predict yards after catch (YAC) for receivers. **Requirements:** a) Filter to completed passes with YAC data b) Engineer features: air yards, defender proximity, field position, route type c) Train multiple regression models d) Use 5-fold cross-validation to compare models e) Calculate RMSE and R² for each model f) Identify the top 5 most important features **Bonus**: Create SHAP plots to explain predictions for specific catches.

Exercise 3: Team Performance Forecasting

Predict team offensive EPA for the next 4 weeks based on recent performance. **Requirements:** a) Calculate team-level EPA statistics by week b) Create rolling average features (last 3 games, last 5 games) c) Include opponent defensive metrics d) Build a model to predict next week's offensive EPA e) Evaluate using time-series cross-validation f) Compare predictions to actual results **Bonus**: Incorporate schedule difficulty and rest days.

Exercise 4: Model Interpretation Challenge

Take any model from the previous exercises and: a) Generate SHAP values for 100 predictions b) Create a SHAP summary plot c) Select 3 interesting predictions and explain why the model made those predictions d) Identify cases where the model is most uncertain e) Suggest features that might improve model performance **Goal**: Practice explaining model behavior to non-technical stakeholders.

References

:::

Learning ObjectivesBy the end of this chapter, you will be able to:

Introduction

What is Machine Learning?

Supervised vs Unsupervised Learning

Supervised Learning

Unsupervised Learning

Choosing Between Classification and Regression

Classification Problems in Football

Win Prediction with Logistic Regression

Play Type Classification with Random Forest

Regression Problems in Football

EPA Prediction with Gradient Boosting

Player Performance Regression

Feature Engineering for Football Data

Temporal Features

Aggregated Historical Features

Model Selection and Algorithms

Algorithm Comparison

Cross-Validation and Model Selection

Hyperparameter Tuning

Model Evaluation Metrics

Classification Metrics

Visualizing Model Performance

Confusion Matrix Visualization

📊 Visualization Output

📊 Visualization Output

Model Interpretability with SHAP

Handling Football-Specific Data Challenges

Class Imbalance

Solutions for Class Imbalance:

Small Sample Sizes

Solutions for Small Samples:

Temporal Dependencies

Solutions for Temporal Dependencies:

Production Deployment Considerations

Model Versioning and Monitoring

Retraining Strategy

Model Drift and Retraining

API Deployment Example

Advanced Topics and Extensions

Ensemble Methods

Neural Networks for Complex Patterns

Summary

Exercises

Conceptual Questions

Coding Exercises

Exercise 1: Build a Fourth Down Conversion Model

Exercise 2: Receiver Yards After Catch Prediction

Exercise 3: Team Performance Forecasting

Exercise 4: Model Interpretation Challenge

Further Reading

Books

Papers

Online Resources

References