Alternatives to the Sigmoid Activation

  • We’ve been using the sigmoid activation function in all nodes (hidden layers and output)
  • Built neural networks by taking logistic regression units and stringing them together
  • Using other activation functions can make neural networks much more powerful
  • Example: a potential buyer’s awareness of a product isn’t necessarily binary (aware or not aware)
  • Potential buyers can be:
    • A little aware
    • Somewhat aware
    • Extremely aware
    • Or awareness “could have gone completely viral”
  • Rather than modeling awareness as:
    • A binary number (0 or 1)
    • A probability between 0 and 1
  • Awareness could instead be any non-negative number (from 0 up to very large values)

Common Choice: ReLU

  • Previously: activations were calculated with the sigmoid function, limited to values between 0 and 1
  • To allow activation to take larger positive values, we use a different function
  • ReLU function looks like:
    • When z < 0: g(z) = 0
    • When z ≥ 0: g(z) = z (straight 45° line to the right of 0)
  • Mathematical equation: g(z) = max(0, z) (see the sketch just after this list)
  • Name: ReLU stands for “Rectified Linear Unit”
    • Most people in deep learning simply say “ReLU”
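
As a quick check that the piecewise description above matches the compact formula g(z) = max(0, z), here is a minimal NumPy sketch (function and variable names are just illustrative):

```python
import numpy as np

def relu(z):
    # ReLU: g(z) = max(0, z), applied element-wise
    return np.maximum(0, z)

z = np.array([-3.0, -0.5, 0.0, 1.5, 4.0])
print(relu(z))                    # [0., 0., 0., 1.5, 4.]

# Equivalent piecewise form: 0 where z < 0, z where z >= 0
print(np.where(z < 0, 0.0, z))    # same output
```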

Sigmoid

  • Function: g(z) = 1 / (1 + e^(-z)) (see the sketch after this list)
  • Output range: (0, 1)
  • Common in binary classification output layers
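
For reference, a tiny NumPy version of the sigmoid, showing that its outputs stay strictly between 0 and 1 (names and sample values are illustrative):

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z)); output always lies in the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(z))   # approx [0.000045, 0.269, 0.5, 0.731, 0.999955]
```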

ReLU

  • Function: g(z) = max(0, z)
  • Output range: [0, ∞)
  • Helps mitigate the vanishing-gradient problem caused by saturating activations such as sigmoid (illustrated numerically after this list)
  • Most common in hidden layers
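
To make the vanishing-gradient point concrete: the sigmoid’s derivative, g(z) * (1 - g(z)), is at most 0.25 and shrinks toward 0 as |z| grows, while ReLU’s derivative is exactly 1 for any positive z. A small numeric sketch (values chosen only for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([0.0, 2.0, 5.0, 10.0])

# Sigmoid derivative g(z) * (1 - g(z)): saturates toward 0 as z grows
sig_grad = sigmoid(z) * (1.0 - sigmoid(z))
print(sig_grad)    # approx [0.25, 0.105, 0.0066, 0.000045]

# ReLU derivative: 1 for z > 0 (0 for z <= 0), so it never shrinks for positive z
relu_grad = np.where(z > 0, 1.0, 0.0)
print(relu_grad)   # [0., 1., 1., 1.]
```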

Linear

  • Function: g(z) = z
  • Output range: (-∞, ∞)
  • Sometimes referred to as “not using any activation function”
  • Since a = g(z) = z = w·x + b, it is as if there were no g at all (see the check after this list)
  • These three (sigmoid, ReLU, linear) are by far the most commonly used activation functions
  • Later in the course: introduction to the softmax activation function
  • With these activation functions, you can build a rich variety of powerful neural networks
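
A quick check of the “linear activation is like no activation” point: with g(z) = z, the unit’s output is just w·x + b. A minimal NumPy sketch (the weight, input, and bias values are made up for illustration):

```python
import numpy as np

w = np.array([0.5, -1.2, 2.0])   # illustrative weights
x = np.array([1.0, 3.0, 0.5])    # illustrative input
b = 0.75                         # illustrative bias

z = np.dot(w, x) + b             # z = w·x + b

def g_linear(z):
    return z                     # "linear" / identity activation

a = g_linear(z)
print(a == z)                    # True: the activation leaves z unchanged
print(a)                         # 0.5*1.0 - 1.2*3.0 + 2.0*0.5 + 0.75 = -1.35
```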

Activation functions are critical components that determine how neural networks process information. While sigmoid functions were initially used due to their connection with logistic regression, alternatives like ReLU enable models to capture more complex patterns by allowing for unbounded positive activations.
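
In a framework such as TensorFlow’s Keras API (an assumption here; these notes don’t name a specific library), the activation is chosen per layer by name, typically ReLU for hidden layers and sigmoid or linear for the output layer. A minimal sketch with illustrative layer sizes:

```python
from tensorflow import keras
from tensorflow.keras.layers import Dense

model = keras.Sequential([
    Dense(units=25, activation='relu'),     # hidden layer: ReLU
    Dense(units=15, activation='relu'),     # hidden layer: ReLU
    Dense(units=1, activation='sigmoid'),   # output layer: sigmoid for binary classification
])

# For an output that can be any real number (e.g., a regression target),
# the last layer could instead use activation='linear'.
```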