
ReLU Lab

Definition
  • ReLU (Rectified Linear Unit) is defined as:
  • a = max(0,z)
  • Provides a continuous linear relationship for positive inputs, together with an ‘off’ range where the output is zero
  • This ‘off’ range is what makes ReLU a non-linear activation
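For reference, a minimal NumPy sketch of this definition (the `relu` helper name is just illustrative):

```python
import numpy as np

def relu(z):
    """ReLU activation: a = max(0, z), applied element-wise."""
    return np.maximum(0, z)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))  # [0.  0.  0.  1.5 3. ]
```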

Piecewise Linear Functions

  • Functions composed of linear pieces
  • The slope is constant within each linear piece
  • The slope changes abruptly at transition points
  • At transition points, a new linear function is added
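For concreteness, here is a sketch of such a function with three segments (the slopes and transition points are illustrative, not the lab's exact target):

```python
import numpy as np

def piecewise_target(x):
    """A hypothetical 3-segment piecewise linear function: the slope changes
    abruptly at x = 1 and x = 2, but the function itself stays continuous."""
    return np.where(x < 1, 1.0 * x,                    # segment 1: slope 1
           np.where(x < 2, 1.0 + 2.0 * (x - 1),        # segment 2: slope 2
                           3.0 + 0.5 * (x - 2)))       # segment 3: slope 0.5

print(piecewise_target(np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0])))
# [0.   0.5  1.   2.   3.   3.25 3.5 ]
```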

Role of Non-Linear Activation

  • Responsible for disabling a unit's contribution before or after its transition point
  • Allows network to model complex functions by “stitching together” linear segments
  • Enables selective activation of different parts of the network
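To make this selective activation concrete, here is a single ReLU unit with a hypothetical weight and bias (not values taken from the lab): it contributes nothing until the input passes its transition point, then adds a linear ramp.

```python
import numpy as np

w, b = 1.0, -1.0                  # hypothetical unit: switches on at x = 1
x = np.linspace(0.0, 3.0, 7)
print(np.maximum(0, w * x + b))   # [0.  0.  0.  0.5 1.  1.5 2. ]
```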

Lab Exercise: Modeling Piecewise Linear Functions

  • First layer: 3 units (each responsible for one segment of the function)
  • Unit 0: Pre-programmed and fixed to map the first segment
  • Units 1 & 2: Need weight/bias adjustments to model 2nd and 3rd segments
  • Output unit: Fixed to sum the outputs of the first layer
  • Use sliders to modify weights and biases to match the target
  • Start with w₁ and b₁, leaving w₂ and b₂ at zero until the 2nd segment is matched
  • Clicking rather than sliding provides quicker adjustment
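For intuition about what the sliders control, here is a minimal NumPy sketch of the architecture described above. The weight and bias values are placeholders chosen so each unit switches on at x = 0, 1, and 2; they are not the lab's fixed Unit 0 parameters or the target solution.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def three_unit_network(x, w, b):
    """First layer: 3 ReLU units; output unit: fixed sum of their activations.
    x is a 1-D array of inputs; w and b are length-3 arrays of the first-layer
    weights and biases (the quantities the lab's sliders adjust)."""
    a1 = relu(np.outer(x, w) + b)   # shape (len(x), 3): one column per unit
    return a1.sum(axis=1)           # fixed output unit simply sums the columns

w = np.array([1.0, 1.0, 1.0])       # illustrative slopes added by each unit
b = np.array([0.0, -1.0, -2.0])     # units switch on at x = 0, 1, 2
x = np.linspace(0.0, 3.0, 7)
print(three_unit_network(x, w, b))  # slope increases by 1 at each transition
```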

Understanding How ReLU Enables Piecewise Functions

Key Insight
Unit 0
  • Responsible for the first segment, on the interval [0,1]
  • ReLU cuts off this unit's contribution after the interval [0,1]
  • Critical feature: prevents Unit 0 from interfering with the following segments

Unit 1
  • Responsible for the 2nd segment
  • ReLU keeps this unit inactive (output zero) until x > 1
  • Since Unit 0 no longer contributes there, w₁ alone sets the target line's slope
  • The bias b₁ must keep the pre-activation negative (so the output stays zero) until x reaches 1
  • Note: this unit's contribution extends into the 3rd segment

Unit 2
  • Responsible for the 3rd segment
  • ReLU zeros this unit's output until x reaches 2
  • w₂ must be set so that the slopes of Units 1 and 2 sum to the desired slope
  • The bias b₂ must keep the pre-activation negative until x reaches 2
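As a quick numerical check of the bias condition described above (assuming, for illustration, w₁ = 2 and b₁ = −w₁ · 1 so the pre-activation crosses zero exactly at x = 1):

```python
import numpy as np

w1 = 2.0                           # illustrative slope, not the lab's value
b1 = -w1 * 1.0                     # pre-activation w1*x + b1 is negative for x < 1
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
print(np.maximum(0, w1 * x + b1))  # [0. 0. 0. 1. 2.] -> unit is off until x = 1
```

Under the same assumption (a positive w₂), Unit 2's bias would be −2·w₂, keeping that unit off until x reaches 2.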

ReLU’s non-linear behavior provides neural networks with the critical ability to selectively activate different parts of the network depending on the input. This capability allows networks to model complex functions by combining simpler linear segments, creating piecewise linear approximations of a target function.