Alternatives to the Sigmoid Activation

  • We’ve been using the sigmoid activation function in all nodes (hidden layers and output)
  • Built neural networks by taking logistic regression units and stringing them together
  • Using other activation functions can make neural networks much more powerful
  • Example: a potential buyer’s awareness of a product isn’t necessarily binary (aware or not aware)
  • Potential buyers can be:
    • A little aware
    • Somewhat aware
    • Extremely aware
    • Or awareness “could have gone completely viral”
  • Rather than modeling awareness as:
    • A binary number (0 or 1)
    • A probability between 0 and 1
  • Awareness could instead be any non-negative number (from 0 up to very large values)

Common Choice: ReLU

  • Previously: activations were calculated with the sigmoid function, limited to values between 0 and 1
  • To allow activation to take larger positive values, we use a different function
  • ReLU function looks like:
    • When z < 0: g(z) = 0
    • When z ≥ 0: g(z) = z (straight 45° line to the right of 0)
  • Mathematical equation: g(z) = max(0, z) (see the sketch just after this list)
  • Name: ReLU stands for “Rectified Linear Unit”
    • Most people in deep learning simply say “ReLU”
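
As a quick check that the piecewise description above matches the compact formula g(z) = max(0, z), here is a minimal NumPy sketch (function and variable names are just illustrative):

```python
import numpy as np

def relu(z):
    # ReLU: g(z) = max(0, z), applied element-wise
    return np.maximum(0, z)

z = np.array([-3.0, -0.5, 0.0, 1.5, 4.0])
print(relu(z))                    # [0., 0., 0., 1.5, 4.]

# Equivalent piecewise form: 0 where z < 0, z where z >= 0
print(np.where(z < 0, 0.0, z))    # same output
```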

Sigmoid

  • Function: g(z) = 1 / (1 + e^(-z)) (see the sketch after this list)
  • Output range: (0, 1)
  • Common in binary classification output layers
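
For reference, a tiny NumPy version of the sigmoid, showing that its outputs stay strictly between 0 and 1 (names and sample values are illustrative):

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z)); output always lies in the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(z))   # approx [0.000045, 0.269, 0.5, 0.731, 0.999955]
```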

ReLU

  • Function: g(z) = max(0, z)
  • Output range: [0, ∞)
  • Helps mitigate the vanishing-gradient problem caused by saturating activations such as sigmoid (illustrated numerically after this list)
  • Most common in hidden layers
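
To make the vanishing-gradient point concrete: the sigmoid’s derivative, g(z) * (1 - g(z)), is at most 0.25 and shrinks toward 0 as |z| grows, while ReLU’s derivative is exactly 1 for any positive z. A small numeric sketch (values chosen only for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([0.0, 2.0, 5.0, 10.0])

# Sigmoid derivative g(z) * (1 - g(z)): saturates toward 0 as z grows
sig_grad = sigmoid(z) * (1.0 - sigmoid(z))
print(sig_grad)    # approx [0.25, 0.105, 0.0066, 0.000045]

# ReLU derivative: 1 for z > 0 (0 for z <= 0), so it never shrinks for positive z
relu_grad = np.where(z > 0, 1.0, 0.0)
print(relu_grad)   # [0., 1., 1., 1.]
```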

Linear

  • Function: g(z) = z
  • Output range: (-∞, ∞)
  • Sometimes referred to as “not using any activation function”
  • Since a = g(z) = z = w·x + b, it is as if there were no g at all (see the check after this list)
  • These three (sigmoid, ReLU, linear) are by far the most commonly used activation functions
  • Later in the course: introduction to the softmax activation function
  • With these activation functions, you can build a rich variety of powerful neural networks
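
A quick check of the “linear activation is like no activation” point: with g(z) = z, the unit’s output is just w·x + b. A minimal NumPy sketch (the weight, input, and bias values are made up for illustration):

```python
import numpy as np

w = np.array([0.5, -1.2, 2.0])   # illustrative weights
x = np.array([1.0, 3.0, 0.5])    # illustrative input
b = 0.75                         # illustrative bias

z = np.dot(w, x) + b             # z = w·x + b

def g_linear(z):
    return z                     # "linear" / identity activation

a = g_linear(z)
print(a == z)                    # True: the activation leaves z unchanged
print(a)                         # 0.5*1.0 - 1.2*3.0 + 2.0*0.5 + 0.75 = -1.35
```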

Activation functions are critical components that determine how neural networks process information. While sigmoid functions were initially used due to their connection with logistic regression, alternatives like ReLU enable models to capture more complex patterns by allowing for unbounded positive activations.
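
In a framework such as TensorFlow’s Keras API (an assumption here; these notes don’t name a specific library), the activation is chosen per layer by name, typically ReLU for hidden layers and sigmoid or linear for the output layer. A minimal sketch with illustrative layer sizes:

```python
from tensorflow import keras
from tensorflow.keras.layers import Dense

model = keras.Sequential([
    Dense(units=25, activation='relu'),     # hidden layer: ReLU
    Dense(units=15, activation='relu'),     # hidden layer: ReLU
    Dense(units=1, activation='sigmoid'),   # output layer: sigmoid for binary classification
])

# For an output that can be any real number (e.g., a regression target),
# the last layer could instead use activation='linear'.
```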