Intuition
- For an example whose true class is j, the loss is the negative log of the probability assigned to that class: -log(aⱼ), which incentivizes high confidence in the correct class
- When aⱼ approaches 1, the loss approaches 0
- When aⱼ is small, the loss becomes large
- This pushes the model to assign high probability to the correct class (see the numerical sketch after this list)
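A quick numerical check of this behavior, assuming the loss for the true class j is the cross-entropy term -log(aⱼ); the probability values below are made up purely for illustration:

```python
import numpy as np

# Probability the model assigns to the correct class, from very unsure to very confident.
a_correct = np.array([0.01, 0.1, 0.5, 0.9, 0.99])

# Cross-entropy loss for the correct class j: -log(a_j).
loss = -np.log(a_correct)

for a, l in zip(a_correct, loss):
    print(f"a_j = {a:4.2f}  ->  loss = {l:.3f}")
# a_j near 1 gives a loss near 0; a_j near 0 makes the loss blow up,
# which is exactly what pushes the model toward high confidence in the correct class.
```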
For binary classification (y ∈ {0, 1}), logistic regression works as follows (sketched in code after the steps):
Calculate z = w·x + b
Compute a = g(z) using the sigmoid function
Interpret a as P(y=1|x)
P(y=0|x) = 1 - P(y=1|x) = 1 - a
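A minimal sketch of these steps; the weights, bias, and feature vector below are made-up numbers for illustration:

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters and input.
w = np.array([0.5, -1.2])
b = 0.3
x = np.array([2.0, 1.0])

z = np.dot(w, x) + b              # z = w·x + b
a = sigmoid(z)                    # a = g(z)

print(f"P(y=1|x) = {a:.3f}")      # interpret a as P(y=1|x)
print(f"P(y=0|x) = {1 - a:.3f}")  # P(y=0|x) = 1 - a
```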
Alternative view (to set up for softmax):
a₁ = P(y=1|x) = sigmoid(z)
a₂ = P(y=0|x) = 1 - a₁
a₁ + a₂ = 1 (probabilities must sum to 1)
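One way to see why this two-output view sets up softmax: a two-class softmax over the scores [z, 0] reproduces the sigmoid probabilities exactly. A small check, using an arbitrary made-up score z:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 0.7  # any score; a made-up value for illustration

# Logistic-regression view: two probabilities from one score.
a1 = sigmoid(z)   # P(y=1|x)
a2 = 1.0 - a1     # P(y=0|x)

# Two-class softmax over the scores [z, 0] gives the same numbers,
# since e^z / (e^z + e^0) = 1 / (1 + e^(-z)) = sigmoid(z).
scores = np.array([z, 0.0])
probs = np.exp(scores) / np.exp(scores).sum()

print(a1, a2)                         # 0.668..., 0.331...
print(probs)                          # [0.668..., 0.331...]
print(np.isclose(probs.sum(), 1.0))   # probabilities sum to 1
```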
For y ∈ {1, 2, 3, 4} (a four-class example):
z₁ = w₁·x + b₁, …, z₄ = w₄·x + b₄
aⱼ = e^(zⱼ) / (e^(z₁) + e^(z₂) + e^(z₃) + e^(z₄)) = P(y=j|x)
For y ∈ {1, 2, …, n} (general case):
zⱼ = wⱼ·x + bⱼ for j = 1, …, n
aⱼ = e^(zⱼ) / (e^(z₁) + … + e^(zₙ)) = P(y=j|x)
a₁ + a₂ + … + aₙ = 1 (probabilities must sum to 1)
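A minimal softmax sketch for the general case. The per-class scores below are made-up numbers, and subtracting the maximum score is only a standard numerical-stability trick; it does not change the result of the formula:

```python
import numpy as np

def softmax(z):
    # a_j = e^(z_j) / sum_k e^(z_k); shifting by max(z) avoids overflow
    # in exp() without changing the ratios.
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1, -1.0])  # hypothetical per-class scores z_1..z_4
a = softmax(z)

print(a)                  # a_j = P(y=j|x) for each class
print(a.sum())            # 1.0 -- the probabilities sum to 1
print(np.argmax(a) + 1)   # most probable class label (1-indexed to match y ∈ {1,…,n})
```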
Important Note
Softmax regression extends logistic regression to handle multiple classes by computing a separate score for each class and then converting these scores to probabilities using the softmax function. The probabilities sum to 1, and the cost function encourages the model to assign high probability to the correct class. This forms the foundation for multiclass classification in neural networks.
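Putting the pieces together, a toy end-to-end sketch of one forward pass and its loss (the weights, biases, input, and true label are all made-up numbers; this is not a full training loop):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical parameters for n = 3 classes and 2 input features.
W = np.array([[ 0.2, -0.5],
              [ 1.0,  0.3],
              [-0.4,  0.8]])    # one weight vector w_j per class (rows)
b = np.array([0.1, 0.0, -0.2])  # one bias b_j per class
x = np.array([1.5, -0.7])       # input features
y = 2                           # true class label, y ∈ {1, 2, 3}

z = W @ x + b                   # one score z_j = w_j·x + b_j per class
a = softmax(z)                  # a_j = P(y=j|x), sums to 1

loss = -np.log(a[y - 1])        # cross-entropy: -log(probability of the true class)
print(a, a.sum(), loss)
```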