Choosing Activation Functions for Neural Networks

Choosing Output Layer Activation Functions

  • Different neurons in a neural network can use different activation functions
  • For the output layer, there’s often one natural choice depending on what the target label (y) is

Binary Classification

  • When y is either 0 or 1
  • Use sigmoid activation
  • Neural network learns to predict probability that y equals 1
  • Similar to logistic regression

Regression (Positive or Negative Values)

  • When y can be positive or negative
  • Example: predicting stock price changes
  • Use linear activation function
  • Allows output to take on positive or negative values

Non-negative Regression

  • When y can only take non-negative values
  • Example: predicting house prices
  • Use ReLU activation function
  • Only outputs zero or positive values
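
To make this concrete, here is a minimal Keras-style sketch (not code from the original notes; the single-unit output layers are placeholders) showing how the target type maps to the output-layer activation string:

```python
from tensorflow.keras.layers import Dense

# The target label y determines the natural output activation:
out_binary      = Dense(1, activation='sigmoid')  # y is 0 or 1 -> predicted probability that y = 1
out_regression  = Dense(1, activation='linear')   # y may be negative or positive (e.g., stock price change)
out_nonnegative = Dense(1, activation='relu')     # y >= 0 only (e.g., house price)
```
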
Industry Standard

  • ReLU is by far the most common choice for hidden layers
  • Evolution from sigmoid to ReLU:
    • Early neural networks used sigmoid functions
    • Modern practice heavily favors ReLU
    • Sigmoid now rarely used (except for binary classification output)
Why ReLU is preferred over sigmoid for hidden layers:

  1. Computational Efficiency:
    • ReLU only requires computing max(0, z)
    • Sigmoid requires computing an exponential and a division: 1 / (1 + e^(-z))
  2. Better Gradient Flow (the more important reason):
    • ReLU is flat in only one region of its graph (the left side, where z < 0)
    • Sigmoid is flat in two regions (both extremes)
    • Flat regions produce near-zero gradients, which makes gradient descent slow
    • ReLU therefore lets neural networks learn faster
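
The gradient-flow point is easy to check numerically. The sketch below (plain NumPy, added here for illustration) evaluates the derivative of each activation and shows that sigmoid's gradient collapses toward zero at both extremes, while ReLU's gradient is zero only for negative inputs:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)            # near zero when z is very negative OR very positive

def relu_grad(z):
    return (z > 0).astype(float)    # zero only on the left side (z <= 0)

z = np.array([-10.0, -1.0, 1.0, 10.0])
print(sigmoid_grad(z))  # [~0.00005, 0.197, 0.197, ~0.00005] -> flat at both extremes
print(relu_grad(z))     # [0.0, 0.0, 1.0, 1.0]               -> flat only for negative z
```
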
  • For hidden layers:
    • activation='relu' (recommended default)
  • For the output layer:
    • Binary classification: activation='sigmoid'
    • Regression (positive or negative values): activation='linear'
    • Non-negative outputs: activation='relu'
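
Putting both rules together, a binary classifier built with these defaults might look like the following sketch (hypothetical layer sizes; not a model from the original notes):

```python
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(25, activation='relu'),     # hidden layer: ReLU as the default choice
    Dense(15, activation='relu'),     # hidden layer: ReLU as the default choice
    Dense(1,  activation='sigmoid'),  # output layer: sigmoid because y is 0 or 1
])

model.compile(loss=tf.keras.losses.BinaryCrossentropy(), optimizer='adam')
```

For a regression target that can be negative, the last layer would instead use activation='linear'; for a non-negative target, activation='relu'.
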

Advanced

  • Research literature mentions other activation functions:
    • tanh (hyperbolic tangent)
    • LeakyReLU
    • Swish
  • New activation functions emerge periodically
  • They sometimes perform slightly better in specific cases
  • Example: “I’ve used the LeakyReLU activation function a few times in my work, and sometimes it works a little bit better than the ReLU”
  • For most applications, sigmoid/ReLU/linear are sufficient
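
If you do want to experiment with one of these alternatives, recent Keras/TensorFlow versions expose most of them directly; a minimal sketch (hypothetical layer sizes):

```python
from tensorflow.keras.layers import Dense, LeakyReLU

# tanh is available as a built-in activation string, just like 'relu'
hidden_tanh = Dense(25, activation='tanh')

# LeakyReLU ships as its own layer; apply it after a Dense layer with no activation
hidden_leaky = [Dense(25), LeakyReLU()]
```

Recent TensorFlow versions also accept activation='swish' as a string; as noted above, these alternatives are optional, and the sigmoid/ReLU/linear trio covers most applications.
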

Choosing the right activation function is essential for neural network performance. For output layers, select based on your prediction target type (binary, unbounded, or non-negative). For hidden layers, ReLU is the standard choice due to its computational efficiency and better gradient properties.