Model Evaluation and Selection Lab
Lab Overview
This lab provides hands-on practice with:
- Splitting datasets into training, cross-validation, and test sets
- Evaluating regression and classification models
- Improving models by adding polynomial features
- Comparing different neural network architectures
- Implementing systematic model selection
Part 1: Regression Models
Dataset Setup and Visualization
- Loaded a dataset of 50 examples, each with an input feature `x` and a target `y`
- Plotted the dataset to visualize the relationship between input and target
Dataset Splitting
- Split the data into:
  - Training set (60%): 30 examples
  - Cross-validation set (20%): 10 examples
  - Test set (20%): 10 examples
```python
from sklearn.model_selection import train_test_split

# Get 60% of the dataset as the training set
x_train, x_, y_train, y_ = train_test_split(x, y, test_size=0.40, random_state=1)

# Split the remaining 40% into two halves: one for CV, one for the test set
x_cv, x_test, y_cv, y_test = train_test_split(x_, y_, test_size=0.50, random_state=1)
```
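The same 60/20/20 split can also be sketched with NumPy alone. This is an assumed equivalent, not the lab's code. The helper `split_60_20_20` and the synthetic `x`, `y` are hypothetical. Because the shuffling differs from `train_test_split`, the examples that land in each subset will not match the lab's.

```python
import numpy as np

def split_60_20_20(x, y, seed=1):
    """Hypothetical helper: shuffle once, then cut into 60% train, 20% CV, 20% test."""
    idx = np.random.default_rng(seed).permutation(len(x))
    n_train = round(0.6 * len(x))
    n_cv = round(0.2 * len(x))
    tr = idx[:n_train]
    cv = idx[n_train:n_train + n_cv]
    te = idx[n_train + n_cv:]  # whatever remains is the test set
    return (x[tr], y[tr]), (x[cv], y[cv]), (x[te], y[te])

# Synthetic stand-in for the lab's 50-example dataset (not the real data)
x = np.linspace(0, 10, 50).reshape(-1, 1)
y = 3 * x[:, 0] + 2

train, cv, test = split_60_20_20(x, y)
print(len(train[0]), len(cv[0]), len(test[0]))  # 30 10 10
```

Shuffling a single index array guarantees that the three subsets are disjoint and together cover every example.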
Feature Scaling
- Used `StandardScaler` to compute the z-score of each input: $z = \frac{x - \mu}{\sigma}$, where $\mu$ and $\sigma$ are the mean and standard deviation of the training set
- It is crucial to scale the CV and test sets with the training set’s mean and standard deviation:
  - Fit and transform on the training set: `X_train_scaled = scaler.fit_transform(x_train)`
  - Only transform the CV and test sets: `X_cv_scaled = scaler.transform(x_cv)`
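Here is a NumPy sketch of the fit/transform split, assuming synthetic data and hypothetical helper names. Statistics are computed once on the training set and reused unchanged for the CV set. `StandardScaler` uses the population standard deviation (`ddof=0`), which is also NumPy's default.

```python
import numpy as np

def fit_scaler(X_train):
    """Per-feature mean and population std, computed on the training set only."""
    return X_train.mean(axis=0), X_train.std(axis=0)

def transform(X, mu, sigma):
    """z = (x - mu) / sigma, always with the training-set mu and sigma."""
    return (X - mu) / sigma

rng = np.random.default_rng(0)
X_train = rng.normal(5.0, 2.0, size=(30, 1))  # synthetic training inputs
X_cv = rng.normal(5.0, 2.0, size=(10, 1))     # synthetic CV inputs

mu, sigma = fit_scaler(X_train)
X_train_scaled = transform(X_train, mu, sigma)
X_cv_scaled = transform(X_cv, mu, sigma)  # no re-fitting on the CV set
```

After scaling, the training set has mean 0 and standard deviation 1 exactly. The CV set only approximately does, which is expected. Re-fitting on the CV set would leak information about it into preprocessing.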
Linear Model Evaluation
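- Trained a linear regression model on the scaled training data
- Computed the mean squared error (MSE) cost on the training and CV sets, using the $\frac{1}{2m}$ convention (half of the usual MSE):
  - $J_{\text{train}} = \frac{1}{2m_{\text{train}}} \sum_{i=1}^{m_{\text{train}}} \left(f(x_{\text{train}}^{(i)}) - y_{\text{train}}^{(i)}\right)^2$
  - $J_{\text{cv}} = \frac{1}{2m_{\text{cv}}} \sum_{i=1}^{m_{\text{cv}}} \left(f(x_{\text{cv}}^{(i)}) - y_{\text{cv}}^{(i)}\right)^2$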
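The cost above translates directly into a small function. The name `half_mse` is hypothetical. The result equals scikit-learn's `mean_squared_error(y_true, y_pred) / 2`. Both inputs are flattened so that an `(m, 1)` prediction array is compared element by element with an `(m,)` target array, rather than broadcast against it.

```python
import numpy as np

def half_mse(y_true, y_pred):
    """J = 1/(2m) * sum((f(x) - y)^2), the lab's cost convention."""
    y_true = np.ravel(y_true)
    y_pred = np.ravel(y_pred)
    m = len(y_true)
    return np.sum((y_pred - y_true) ** 2) / (2 * m)

# Squared errors are 0, 0, 4, so J = 4 / (2 * 3)
print(half_mse([1.0, 2.0, 5.0], [1.0, 2.0, 3.0]))  # 0.6666666666666666
```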
Adding Polynomial Features
- Created polynomial features up to degree 10 using `PolynomialFeatures`
- For each polynomial degree:
  - Added polynomial features
  - Scaled features
  - Trained a linear regression model
  - Computed training and CV MSEs
```python
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Initialize lists to save errors, models, and transforms
train_mses = []
cv_mses = []
models = []
polys = []
scalers = []

# Loop over different polynomial degrees
for degree in range(1, 11):
    # Add polynomial features
    poly = PolynomialFeatures(degree, include_bias=False)
    X_train_mapped = poly.fit_transform(x_train)

    # Scale features
    scaler_poly = StandardScaler()
    X_train_mapped_scaled = scaler_poly.fit_transform(X_train_mapped)

    # Train model and compute errors
    # ...
```
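The loop body is elided in the lab code. The following NumPy-only sketch shows what a complete iteration could look like. Hand-rolled stand-ins replace `PolynomialFeatures`, `StandardScaler`, and `LinearRegression` (least squares via `np.linalg.lstsq`). The data is a synthetic noisy quadratic rather than the lab's dataset, and the helper names are hypothetical, so the errors and the best degree will differ from the lab's results.

```python
import numpy as np

def poly_features(x, degree):
    # Columns x, x^2, ..., x^degree with no bias column (like include_bias=False)
    return np.column_stack([x ** p for p in range(1, degree + 1)])

def half_mse(y_true, y_pred):
    # J = 1/(2m) * sum of squared errors
    return np.sum((y_pred - y_true) ** 2) / (2 * len(y_true))

# Synthetic stand-in data (assumption): a noisy quadratic, 30 train / 10 CV examples
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 50)
y = 2 * x**2 - x + 1 + rng.normal(0, 0.5, 50)
idx = rng.permutation(50)
tr, cv = idx[:30], idx[30:40]

train_mses, cv_mses = [], []
for degree in range(1, 11):
    X_train_mapped = poly_features(x[tr], degree)
    X_cv_mapped = poly_features(x[cv], degree)

    # Scale with the training set's statistics only
    mu, sigma = X_train_mapped.mean(axis=0), X_train_mapped.std(axis=0)
    X_train_scaled = (X_train_mapped - mu) / sigma
    X_cv_scaled = (X_cv_mapped - mu) / sigma

    # Ordinary least squares with an explicit intercept column
    A_train = np.column_stack([np.ones(len(tr)), X_train_scaled])
    A_cv = np.column_stack([np.ones(len(cv)), X_cv_scaled])
    w, *_ = np.linalg.lstsq(A_train, y[tr], rcond=None)

    train_mses.append(half_mse(y[tr], A_train @ w))
    cv_mses.append(half_mse(y[cv], A_cv @ w))

best_degree = int(np.argmin(cv_mses)) + 1
print("lowest CV error at degree", best_degree)
```

The CV set is mapped and scaled with the objects fitted on the training set, matching the `transform`-only rule for CV and test data.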
- Results showed:
  - The linear model (degree=1) had high training and CV MSEs
  - Adding a quadratic term (degree=2) dramatically reduced both errors
  - Performance remained relatively stable through degree=5
  - Higher degrees (6-10) showed increasing CV error, indicating overfitting
Model Selection
- Selected the model with the lowest CV MSE (degree = 5)
- Computed its test MSE to estimate the generalization error
- Reported the training, CV, and test MSEs for the selected model
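The selection rule has three steps: fit every candidate on the training set, pick the one with the lowest CV error, then touch the test set once. This sketch runs on synthetic data. `numpy.polynomial.Polynomial.fit` is a compact substitute for the scaled linear-regression pipeline, not the lab's method. As a result the chosen degree need not be 5.

```python
import numpy as np
from numpy.polynomial import Polynomial

def half_mse(y_true, y_pred):
    # J = 1/(2m) * sum of squared errors
    return np.sum((y_pred - y_true) ** 2) / (2 * len(y_true))

# Synthetic stand-in data (assumption): a noisy quadratic split 30/10/10
rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, 50)
y = 2 * x**2 - x + 1 + rng.normal(0, 0.5, 50)
idx = rng.permutation(50)
tr, cv, te = idx[:30], idx[30:40], idx[40:]

# 1. Fit every candidate degree on the training set only
candidates = {d: Polynomial.fit(x[tr], y[tr], d) for d in range(1, 11)}

# 2. Choose the candidate with the lowest CV error
cv_errors = {d: half_mse(y[cv], p(x[cv])) for d, p in candidates.items()}
best_degree = min(cv_errors, key=cv_errors.get)

# 3. Evaluate only the chosen model on the test set, exactly once
test_error = half_mse(y[te], candidates[best_degree](x[te]))
print(best_degree, test_error)
```

Because the test set plays no part in choosing the degree, `test_error` remains an unbiased estimate of generalization error for the selected model.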
Part 2: Neural Network Regression Models
- Used the same dataset, but with neural network models
- Tested multiple architectures:
  - Model 1: small network (1 hidden layer)
  - Model 2: medium network (2 hidden layers)
  - Model 3: larger network (3 hidden layers)
- For each architecture:
  - Trained the model on the scaled training data
  - Computed training and CV MSEs
- Selected the model with the lowest CV MSE
Part 3: Classification Tasks
Dataset and Preparation
- Loaded a binary classification dataset with 200 examples
- Each example had 2 input features and a target (0 or 1)
- Split into training (60%), CV (20%), and test (20%) sets
- Scaled features using the training set statistics
Classification Error Metrics
- Measured performance using the misclassification rate:
  - The fraction of examples where the predicted class differs from the actual class
  - Computed as `np.mean(predictions != ground_truth)`
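```python
import numpy as np
import tensorflow as tf

threshold = 0.5  # decision threshold on the predicted probability

# The output layer is linear, so model.predict returns logits; sigmoid turns them into probabilities
yhat = tf.math.sigmoid(model.predict(x_scaled))
yhat = np.where(yhat >= threshold, 1, 0)

# Fraction of misclassified examples; flatten both so (m, 1) predictions don't broadcast against (m,) targets
error = np.mean(yhat.ravel() != np.ravel(y_true))
```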
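The whole chain can be checked without TensorFlow: logits to probabilities, to 0/1 predictions, to error rate. Here is a NumPy sketch that assumes hypothetical logits shaped `(m, 1)`, the shape Keras `predict()` returns for a single output unit. Thresholding the sigmoid at 0.5 is the same as checking `logit >= 0`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def misclassification_rate(logits, y_true, threshold=0.5):
    """Fraction of examples whose thresholded prediction differs from the label."""
    probs = sigmoid(np.ravel(logits))           # logits -> probabilities
    preds = np.where(probs >= threshold, 1, 0)  # probabilities -> 0/1 classes
    return np.mean(preds != np.ravel(y_true))

# Hypothetical model outputs and labels, for illustration only
logits = np.array([[-2.0], [0.5], [3.0], [-0.1]])
y_true = np.array([0, 1, 0, 0])
print(misclassification_rate(logits, y_true))  # 0.25 (only the third example is wrong)

# Without flattening, comparing (4, 1) with (4,) silently broadcasts to (4, 4)
print(((logits >= 0).astype(int) != y_true).shape)  # (4, 4)
```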
Neural Network Classification
- Built the same neural network architectures as for regression
- Configured them for classification:
  - Used a linear activation in the output layer
  - Applied binary cross-entropy loss with `from_logits=True`
  - Used the sigmoid function to convert outputs to probabilities
  - Applied a threshold (0.5) to make binary predictions
- Selected the best model based on CV error
- Reported the final training, CV, and test classification errors
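A linear output layer combined with `from_logits=True` lets the loss be computed directly from logits, which is numerically safer than applying the sigmoid first. The NumPy sketch below illustrates the idea but is not TensorFlow's source. The stable expression is the one TensorFlow documents for `tf.nn.sigmoid_cross_entropy_with_logits`, and the function names are hypothetical.

```python
import numpy as np

def bce_from_logits(z, y):
    """Stable binary cross-entropy computed directly from logits z."""
    # Algebraically equal to -[y*log(sigmoid(z)) + (1 - y)*log(1 - sigmoid(z))]
    return np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))

def bce_from_probs(z, y):
    """Naive version: apply the sigmoid first, then take logs."""
    p = 1.0 / (1.0 + np.exp(-z))
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

z = np.array([-3.0, 0.0, 2.5])
y = np.array([0.0, 1.0, 1.0])
print(np.allclose(bce_from_logits(z, y), bce_from_probs(z, y)))  # True

# For a confident logit, sigmoid(40) rounds to exactly 1.0, so log(1 - p) is -inf
with np.errstate(divide="ignore"):
    print(bce_from_probs(np.array([40.0]), np.array([0.0])))  # [inf]
print(bce_from_logits(np.array([40.0]), np.array([0.0])))     # [40.]
```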
Key Takeaways
- Three-way splitting is crucial for model selection and honest performance estimation
- Feature scaling should always use training set statistics
- Polynomial features can dramatically improve linear models for non-linear data
- Model selection should be based on cross-validation performance, not training performance
- Generalization error should be estimated using a separate test set that wasn’t used for model decisions
The systematic approach to model evaluation and selection demonstrated in this lab provides a solid foundation for developing models that generalize well to new data. By properly separating training, validation, and testing data, you can confidently select model architectures and report honest performance metrics.