SparseCategoricalCrossentropy
- Expects target to be an integer corresponding to the index
- Example: For 10 potential classes, y would be between 0 and 9
In both softmax regression and neural networks with softmax outputs:
N outputs are generated
One output is selected as the predicted category
Vector z is generated by a linear function then passed to softmax
Softmax function:
Converts z into a probability distribution
Each output will be between 0 and 1
All outputs sum to 1
Larger inputs correspond to larger output probabilities
Mathematical formula:
a_j = e^(z_j) / Σ(e^(z_k)) for k=1 to N
Vector form interpretation:
Output a(x) is a vector of probabilities, one per class: a(x) = [P(y=1|x), P(y=2|x), ..., P(y=N|x)]
import numpy as np

def my_softmax(z):
    ez = np.exp(z)           # element-wise exponential
    sm = ez / np.sum(ez)     # normalize so the outputs sum to 1
    return sm
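A quick check of my_softmax on an example vector (the values below are illustrative):

z = np.array([1., 2., 3., 4.])   # example logits
a = my_softmax(z)
print(a)             # approx [0.032 0.087 0.237 0.644] -- larger z gives larger probability
print(np.sum(a))     # 1.0 -- the outputs form a probability distribution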
Cross-entropy loss function:
L(a,y) = -log(a_y) where y is the target category
Only the probability of the correct class contributes to the loss
Complete cost function (over all examples):
J(w,b) = -(1/m) Σ_{i=1}^{m} Σ_{j=1}^{N} 1{y^(i)==j} log( e^(z_j^(i)) / Σ_{k=1}^{N} e^(z_k^(i)) )
1{y^(i)==j} is the indicator function: 1 when example i's label is class j, 0 otherwise
m is number of examples
N is number of outputs
This is the average of all losses
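A minimal NumPy sketch of this cost, assuming a matrix of logits Z of shape (m, N) and integer labels y (both made up here); the indicator in the double sum reduces to picking out each example's probability for its true class:

import numpy as np

def softmax_cost(Z, y):
    # Z: (m, N) logits, y: (m,) integer class labels
    ez = np.exp(Z - np.max(Z, axis=1, keepdims=True))   # shift for numerical stability
    A = ez / np.sum(ez, axis=1, keepdims=True)          # row-wise softmax probabilities
    m = Z.shape[0]
    # only the probability of the correct class contributes to each loss term
    return -np.mean(np.log(A[np.arange(m), y]))

Z = np.array([[2.0, 1.0, 0.1],
              [0.5, 2.5, 0.3]])     # made-up logits
y = np.array([0, 1])                # made-up integer targets
print(softmax_cost(Z, y))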
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(25, activation='relu'),
    Dense(15, activation='relu'),
    Dense(4, activation='softmax')    # softmax activation here
])
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(0.001),
)
preferred_model = Sequential([
    Dense(25, activation='relu'),
    Dense(15, activation='relu'),
    Dense(4, activation='linear')     # no softmax activation here
])
preferred_model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),  # note the from_logits=True
    optimizer=tf.keras.optimizers.Adam(0.001),
)
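Either version is then trained as usual; a minimal sketch, assuming training inputs X_train and integer labels y_train (y_train is not shown elsewhere in these notes):

preferred_model.fit(X_train, y_train, epochs=10)   # y_train holds integer class indices (assumed)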
In the preferred model, outputs are not probabilities
Values can range from large negative to large positive numbers
These raw outputs are called “logits”
To get probabilities during prediction:
Pass outputs through tf.nn.softmax()
p_preferred = preferred_model.predict(X_train)
sm_preferred = tf.nn.softmax(p_preferred).numpy()
for i in range(5):
    print(f"{p_preferred[i]}, category: {np.argmax(p_preferred[i])}")
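Because softmax is monotonic, argmax over the raw logits picks the same category as argmax over the probabilities; converting with tf.nn.softmax only matters when actual probabilities are needed. A quick check (sketch):

for i in range(5):
    assert np.argmax(p_preferred[i]) == np.argmax(sm_preferred[i])      # same predicted class
    print(f"{sm_preferred[i]}, sums to {np.sum(sm_preferred[i]):.3f}")  # each row sums to ~1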
SparseCategoricalCrossentropy: expects the target as an integer class index (e.g. 0 to 9 for 10 classes)
CategoricalCrossentropy: expects the target as a one-hot encoded vector (e.g. [0, 0, 1, 0])
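A minimal sketch of the difference, using a made-up 4-class prediction; both calls should return the same value, about -log(0.7) ≈ 0.357:

import tensorflow as tf

y_prob = tf.constant([[0.1, 0.7, 0.1, 0.1]])           # made-up predicted probabilities
sparse_loss = tf.keras.losses.SparseCategoricalCrossentropy()(
    tf.constant([1]), y_prob)                           # target as an integer index
onehot_loss = tf.keras.losses.CategoricalCrossentropy()(
    tf.constant([[0., 1., 0., 0.]]), y_prob)            # same target, one-hot encoded
print(sparse_loss.numpy(), onehot_loss.numpy())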
The softmax function transforms linear outputs into a probability distribution, enabling neural networks to perform multiclass classification. While the standard implementation puts softmax in the output layer, the preferred implementation uses linear activation with from_logits=True for numerical stability. Unlike other activation functions, softmax spans multiple outputs, making it uniquely suited for classification problems with multiple possible categories.