Advanced Evaluation
In the previous video, we used training/test splits to evaluate models
Problem: If we use the test set to select our model:
Test error becomes an overly optimistic estimate of generalization error
We’ve essentially “leaked” information from the test set into our model selection process
Better Approach
Instead of two-way splits, use a three-way split of your data:
Training Set (typically ~60%): Used to fit model parameters (w,b)
Cross-Validation Set (typically ~20%): Used for model selection
Test Set (typically ~20%): Used for final evaluation of selected model
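As a concrete illustration, here is a minimal sketch of producing such a 60/20/20 split with scikit-learn's train_test_split; the array names X and y and the random placeholder data are assumptions for the example, not part of the course material:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))   # features
y = rng.normal(size=1000)        # targets

# First carve off 60% for training, then split the remaining 40%
# in half: 20% cross-validation, 20% test.
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.40, random_state=1)
X_cv, X_test, y_cv, y_test = train_test_split(X_temp, y_temp, test_size=0.50, random_state=1)

print(X_train.shape, X_cv.shape, X_test.shape)   # (600, 2) (200, 2) (200, 2)
```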
For each subset, we compute an error measure:
Training Error
J_train(w,b) = (1/(2m_train)) ∑_{i=1}^{m_train} (f_{w,b}(x^(i)) - y^(i))²
Measures how well model fits training data
Cross-Validation Error
J_cv(w,b) = (1/(2m_cv)) ∑_{i=1}^{m_cv} (f_{w,b}(x_cv^(i)) - y_cv^(i))²
Used for model selection
Also called validation error or dev error
Test Error
J_test(w,b) = (1/(2m_test)) ∑_{i=1}^{m_test} (f_{w,b}(x_test^(i)) - y_test^(i))²
Only used for final evaluation
Provides an unbiased estimate of generalization error
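All three quantities are the same squared-error cost evaluated on different subsets. A small helper makes that explicit; this is a sketch that assumes `model` is any fitted regressor with a `predict` method and that the split arrays come from the sketch above:

```python
import numpy as np

def squared_error_cost(model, X, y):
    """(1/(2m)) times the sum of squared prediction errors on (X, y)."""
    m = X.shape[0]
    return np.sum((model.predict(X) - y) ** 2) / (2 * m)

# After fitting `model` on the training set only:
# J_train = squared_error_cost(model, X_train, y_train)
# J_cv    = squared_error_cost(model, X_cv,    y_cv)
# J_test  = squared_error_cost(model, X_test,  y_test)
```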
Step-by-Step Approach
Train multiple models with different polynomial degrees (d=1,2,…,10)
For each d, fit parameters w^d, b^d using only the training set
Evaluate each model on the cross-validation set
Compute J_cv(w^1,b^1), J_cv(w^2,b^2), …, J_cv(w^10,b^10)
Select the model with the lowest cross-validation error
If d=4 gives the lowest J_cv, choose this model
Estimate generalization error using the test set
Report J_test(w^4,b^4) as your final performance estimate
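Below is a sketch of this degree-selection loop using scikit-learn pipelines (PolynomialFeatures plus LinearRegression); it assumes the split arrays and the squared_error_cost helper from the earlier sketches:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

models, cv_errors = [], []
for d in range(1, 11):                          # candidate degrees d = 1..10
    model = make_pipeline(PolynomialFeatures(degree=d), LinearRegression())
    model.fit(X_train, y_train)                 # fit w, b on the training set only
    models.append(model)
    cv_errors.append(squared_error_cost(model, X_cv, y_cv))   # selection uses the CV set

best_d = int(np.argmin(cv_errors)) + 1          # degree with the lowest J_cv
best_model = models[best_d - 1]

# The test set is touched only once, for the final report.
J_test = squared_error_cost(best_model, X_test, y_test)
print(f"selected d={best_d}, estimated generalization error J_test={J_test:.4f}")
```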
Broader Applications
The same three-way split approach works for selecting between different neural network architectures:
Train multiple neural network architectures (different sizes/depths)
Each trained on the training set only
Evaluate each network on the cross-validation set
For classification, J_cv is typically the fraction of misclassified examples
Select the architecture with the lowest cross-validation error
Report final performance using the test set
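Here is a sketch of the same selection procedure for network architectures, with scikit-learn's MLPClassifier standing in for whatever framework you use; the candidate layer sizes are illustrative, and classification-style targets plus a train/CV/test split are assumed:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Candidate architectures: hidden-layer sizes, chosen only for illustration.
architectures = [(25,), (25, 15), (50, 25, 10)]

trained, cv_errors = [], []
for hidden in architectures:
    clf = MLPClassifier(hidden_layer_sizes=hidden, max_iter=1000, random_state=1)
    clf.fit(X_train, y_train)                              # training set only
    cv_errors.append(np.mean(clf.predict(X_cv) != y_cv))   # fraction misclassified on the CV set
    trained.append(clf)

best = int(np.argmin(cv_errors))                           # architecture with the lowest J_cv
test_error = np.mean(trained[best].predict(X_test) != y_test)   # final, one-time report
print(f"chosen architecture {architectures[best]}, test error {test_error:.3f}")
```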
Use Training Set For
Fitting model parameters (w, b)
Training neural networks
Use Cross-Validation Set For
Selecting model type or architecture
Choosing hyperparameters
Making any other model decisions
Use Test Set For
ONLY final evaluation
Never for making decisions
Getting an unbiased estimate of generalization error
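To make this division of roles concrete, here is a sketch of choosing a single hyperparameter with the cross-validation set while the test set is consulted exactly once at the end; the regularization strength is only an illustrative choice of hyperparameter, and the names follow the earlier sketches:

```python
import numpy as np
from sklearn.linear_model import Ridge

lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]                     # candidate regularization strengths
cv_errors = []
for lam in lambdas:
    model = Ridge(alpha=lam).fit(X_train, y_train)           # training set: fit parameters
    cv_errors.append(squared_error_cost(model, X_cv, y_cv))  # CV set: make the decision

best_lambda = lambdas[int(np.argmin(cv_errors))]             # decision made without the test set
final_model = Ridge(alpha=best_lambda).fit(X_train, y_train)
J_test = squared_error_cost(final_model, X_test, y_test)     # test set: final evaluation only
```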
This three-way split approach is widely used in practice for model selection
Next, we’ll explore powerful diagnostics to improve model performance
The most important diagnostic: bias and variance analysis
The three-way split into training, cross-validation, and test sets provides a robust framework for both selecting the best model and fairly estimating its performance on new data. By reserving the cross-validation set for model selection decisions and keeping the test set completely untouched until the final evaluation, we avoid the optimistic bias that comes from testing on data that influenced our model choices.