How Regularization Affects Bias and Variance

  • Regularization parameter λ controls the tradeoff between:
    • Keeping model parameters small (high regularization)
    • Fitting the training data well (low regularization)
  • Understanding this relationship helps you choose good λ values (the cost function below makes the tradeoff explicit)
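
For reference, the regularized cost being minimized here is the squared-error cost for linear regression used earlier in the course:

$$
J(\vec{w}, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)} \right)^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2
$$

When λ is large, the penalty term dominates and drives each wⱼ toward zero; when λ = 0, the penalty vanishes and only the training-set fit matters.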

Effect of Different λ Values on Model Fit

Example: 4th Order Polynomial

Consider fitting a 4th-order polynomial at three regularization strengths (a code sketch follows this list):

Very large λ (e.g., λ = 10,000):
  • Forces all parameters (w₁, w₂, etc.) to be very close to zero
  • Model becomes approximately f(x) ≈ b (just a constant)
  • Results in high bias (underfitting)
  • J_train is large (performs poorly on training data)

λ = 0 (or very small):
  • No regularization effect
  • Model fits training data too closely (wiggly curve)
  • Results in high variance (overfitting)
  • J_train is small but J_cv is much larger (poor generalization)

Intermediate λ:
  • Balances fitting training data and keeping parameters small
  • Creates a model that is “just right”
  • Low J_train and low J_cv (good performance on both sets)
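
Below is a minimal sketch of these three regimes using scikit-learn, whose Ridge `alpha` parameter plays the role of λ; the synthetic data and the pipeline details are illustrative assumptions, not part of the course material.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic 1-D data: a noisy quadratic trend (assumed for illustration)
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 3, 20)).reshape(-1, 1)
y = 1.0 + 2.0 * X.ravel() - 0.5 * X.ravel() ** 2 + rng.normal(0, 0.3, 20)

# Fit a 4th-order polynomial at three regularization strengths
for lam in [0.0, 1.0, 1e4]:
    model = make_pipeline(
        PolynomialFeatures(degree=4, include_bias=False),
        StandardScaler(),
        Ridge(alpha=lam),  # alpha corresponds to lambda
    )
    model.fit(X, y)
    w = model.named_steps["ridge"].coef_
    # Large lambda shrinks every w_j toward zero (model ≈ constant b);
    # lambda = 0 leaves the coefficients free to overfit
    print(f"lambda = {lam:>7}: max |w_j| = {np.abs(w).max():.4f}")
```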
Systematic Approach

Similar to selecting the polynomial degree, you can use cross-validation to choose λ (a code sketch follows these steps):

  1. Try many different λ values (e.g., λ = 0, 0.01, 0.02, 0.04, …, 10)

  2. For each λ value:

  • Train model by minimizing regularized cost function
  • Obtain parameters w_λ, b_λ
  • Evaluate cross-validation error J_cv(w_λ, b_λ)
  3. Choose the λ with the lowest cross-validation error
  • If J_cv(w₅, b₅) is lowest (i.e., the fifth λ value wins), use λ₅ and its parameters w₅, b₅
  4. Estimate generalization error using the test set
  • Report J_test(w₅, b₅) as the final performance estimate
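
The whole procedure in code, again a sketch with scikit-learn standing in for the course's implementation; the synthetic data, the 60/20/20 split, and the doubling λ grid are assumptions for illustration:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data (same shape as the sketch above)
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 3, 100)).reshape(-1, 1)
y = 1.0 + 2.0 * X.ravel() - 0.5 * X.ravel() ** 2 + rng.normal(0, 0.3, 100)

# 60/20/20 split: training / cross-validation / test
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_cv, X_test, y_cv, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

def make_model(lam):
    # Ridge's alpha plays the role of lambda in the regularized cost
    return make_pipeline(
        PolynomialFeatures(degree=4, include_bias=False),
        StandardScaler(),
        Ridge(alpha=lam),
    )

# Step 1: a grid of lambda values, roughly doubling each time
lambdas = [0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10]

# Step 2: train on the training set, evaluate J_cv for each lambda
cv_errors = []
for lam in lambdas:
    model = make_model(lam).fit(X_train, y_train)
    cv_errors.append(mean_squared_error(y_cv, model.predict(X_cv)))

# Step 3: pick the lambda with the lowest cross-validation error
best_lam = lambdas[int(np.argmin(cv_errors))]
best_model = make_model(best_lam).fit(X_train, y_train)

# Step 4: report test error as the generalization estimate
print(f"chosen lambda = {best_lam}")
print(f"J_test = {mean_squared_error(y_test, best_model.predict(X_test)):.4f}")
```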
Visual Analysis

When plotting training and cross-validation errors against λ (sketched in code after these lists):

Training Error (J_train)

  • Increases as λ increases
  • Higher regularization prioritizes keeping parameters small over fitting the training data
  • J_train is lowest when λ = 0 (no regularization)

Cross-Validation Error (J_cv)

  • Forms a U-shaped curve
  • Too small λ: high J_cv due to overfitting (high variance)
  • Too large λ: high J_cv due to underfitting (high bias)
  • Optimal λ at minimum of this curve
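
Continuing the sketch above (same split, `make_model` pipeline, and λ grid), recording J_train alongside J_cv produces both curves:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error

train_errors, cv_errors = [], []
for lam in lambdas:
    model = make_model(lam).fit(X_train, y_train)
    train_errors.append(mean_squared_error(y_train, model.predict(X_train)))
    cv_errors.append(mean_squared_error(y_cv, model.predict(X_cv)))

plt.plot(lambdas, train_errors, marker="o", label="J_train")  # rises as lambda grows
plt.plot(lambdas, cv_errors, marker="o", label="J_cv")        # U-shaped curve
plt.xscale("symlog", linthresh=0.01)  # log-like axis that can still show lambda = 0
plt.xlabel("λ")
plt.ylabel("error (MSE)")
plt.legend()
plt.show()
```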

Comparing Regularization with Polynomial Selection

  • Both techniques allow finding the sweet spot between bias and variance
  • For polynomial degree selection:
    • Left side (low degree): high bias/underfitting
    • Right side (high degree): high variance/overfitting
  • For regularization parameter selection:
    • Left side (low λ): high variance/overfitting
    • Right side (high λ): high bias/underfitting
  • The two J_cv curves are roughly mirror images of each other: increasing degree grows model flexibility, while increasing λ shrinks it
  • Cross-validation helps select optimal values in both cases

Regularization provides another powerful tool for managing the bias-variance tradeoff in your models. By systematically trying different λ values and evaluating each on a cross-validation set, you can select the regularization strength that generalizes best, avoiding both underfitting and overfitting.