How Regularization Affects Bias and Variance

  • Regularization parameter λ controls the tradeoff between:
    • Keeping model parameters small (high regularization)
    • Fitting the training data well (low regularization)
  • Understanding this relationship helps you choose good λ values (the cost function below makes the tradeoff explicit)
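
For reference, the regularized cost being minimized here is the squared-error cost for linear regression used earlier in the course:

$$
J(\vec{w}, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)} \right)^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2
$$

When λ is large, the penalty term dominates and drives each wⱼ toward zero; when λ = 0, the penalty vanishes and only the training-set fit matters.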

Effect of Different λ Values on Model Fit

Example: 4th Order Polynomial

Consider fitting a 4th-order polynomial at three regularization strengths (a code sketch follows this list):

Very large λ (e.g., λ = 10,000):
  • Forces all parameters (w₁, w₂, etc.) to be very close to zero
  • Model becomes approximately f(x) ≈ b (just a constant)
  • Results in high bias (underfitting)
  • J_train is large (performs poorly on training data)

λ = 0 (or very small):
  • No regularization effect
  • Model fits training data too closely (wiggly curve)
  • Results in high variance (overfitting)
  • J_train is small but J_cv is much larger (poor generalization)

Intermediate λ:
  • Balances fitting training data and keeping parameters small
  • Creates a model that is “just right”
  • Low J_train and low J_cv (good performance on both sets)
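
Below is a minimal sketch of these three regimes using scikit-learn, whose Ridge `alpha` parameter plays the role of λ; the synthetic data and the pipeline details are illustrative assumptions, not part of the course material.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic 1-D data: a noisy quadratic trend (assumed for illustration)
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 3, 20)).reshape(-1, 1)
y = 1.0 + 2.0 * X.ravel() - 0.5 * X.ravel() ** 2 + rng.normal(0, 0.3, 20)

# Fit a 4th-order polynomial at three regularization strengths
for lam in [0.0, 1.0, 1e4]:
    model = make_pipeline(
        PolynomialFeatures(degree=4, include_bias=False),
        StandardScaler(),
        Ridge(alpha=lam),  # alpha corresponds to lambda
    )
    model.fit(X, y)
    w = model.named_steps["ridge"].coef_
    # Large lambda shrinks every w_j toward zero (model ≈ constant b);
    # lambda = 0 leaves the coefficients free to overfit
    print(f"lambda = {lam:>7}: max |w_j| = {np.abs(w).max():.4f}")
```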
Systematic Approach

Similar to selecting the polynomial degree, you can use cross-validation to choose λ (a code sketch follows these steps):

  1. Try many different λ values (e.g., λ = 0, 0.01, 0.02, 0.04, …, 10)

  2. For each λ value:

  • Train model by minimizing regularized cost function
  • Obtain parameters w_λ, b_λ
  • Evaluate cross-validation error J_cv(w_λ, b_λ)
  3. Choose the λ with the lowest cross-validation error
  • If J_cv(w₅, b₅) is lowest (i.e., the fifth λ value wins), use λ₅ and its parameters w₅, b₅
  4. Estimate generalization error using the test set
  • Report J_test(w₅, b₅) as the final performance estimate
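
The whole procedure in code, again a sketch with scikit-learn standing in for the course's implementation; the synthetic data, the 60/20/20 split, and the doubling λ grid are assumptions for illustration:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data (same shape as the sketch above)
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 3, 100)).reshape(-1, 1)
y = 1.0 + 2.0 * X.ravel() - 0.5 * X.ravel() ** 2 + rng.normal(0, 0.3, 100)

# 60/20/20 split: training / cross-validation / test
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_cv, X_test, y_cv, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

def make_model(lam):
    # Ridge's alpha plays the role of lambda in the regularized cost
    return make_pipeline(
        PolynomialFeatures(degree=4, include_bias=False),
        StandardScaler(),
        Ridge(alpha=lam),
    )

# Step 1: a grid of lambda values, roughly doubling each time
lambdas = [0, 0.01, 0.02, 0.04, 0.08, 0.16, 0.32, 0.64, 1.28, 2.56, 5.12, 10]

# Step 2: train on the training set, evaluate J_cv for each lambda
cv_errors = []
for lam in lambdas:
    model = make_model(lam).fit(X_train, y_train)
    cv_errors.append(mean_squared_error(y_cv, model.predict(X_cv)))

# Step 3: pick the lambda with the lowest cross-validation error
best_lam = lambdas[int(np.argmin(cv_errors))]
best_model = make_model(best_lam).fit(X_train, y_train)

# Step 4: report test error as the generalization estimate
print(f"chosen lambda = {best_lam}")
print(f"J_test = {mean_squared_error(y_test, best_model.predict(X_test)):.4f}")
```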
Visual Analysis

When plotting training and cross-validation errors against λ (sketched in code after these lists):

Training Error (J_train)

  • Increases as λ increases
  • Higher regularization prioritizes keeping parameters small over fitting the training data
  • J_train is lowest when λ = 0 (no regularization)

Cross-Validation Error (J_cv)

  • Forms a U-shaped curve
  • Too small λ: high J_cv due to overfitting (high variance)
  • Too large λ: high J_cv due to underfitting (high bias)
  • Optimal λ at minimum of this curve
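
Continuing the sketch above (same split, `make_model` pipeline, and λ grid), recording J_train alongside J_cv produces both curves:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error

train_errors, cv_errors = [], []
for lam in lambdas:
    model = make_model(lam).fit(X_train, y_train)
    train_errors.append(mean_squared_error(y_train, model.predict(X_train)))
    cv_errors.append(mean_squared_error(y_cv, model.predict(X_cv)))

plt.plot(lambdas, train_errors, marker="o", label="J_train")  # rises as lambda grows
plt.plot(lambdas, cv_errors, marker="o", label="J_cv")        # U-shaped curve
plt.xscale("symlog", linthresh=0.01)  # log-like axis that can still show lambda = 0
plt.xlabel("λ")
plt.ylabel("error (MSE)")
plt.legend()
plt.show()
```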

Comparing Regularization with Polynomial Selection

  • Both techniques allow finding the sweet spot between bias and variance
  • For polynomial degree selection:
    • Left side (low degree): high bias/underfitting
    • Right side (high degree): high variance/overfitting
  • For regularization parameter selection:
    • Left side (low λ): high variance/overfitting
    • Right side (high λ): high bias/underfitting
  • The two J_cv curves are roughly mirror images of each other: increasing degree grows model flexibility, while increasing λ shrinks it
  • Cross-validation helps select optimal values in both cases

Regularization provides another powerful tool for managing the bias-variance tradeoff in your models. By systematically trying different λ values and evaluating each on a cross-validation set, you can select the regularization strength that generalizes best, avoiding both underfitting and overfitting.