Pablo Rodriguez

Derivative

Optional Content
  • TensorFlow automatically computes outputs and cost functions, and uses backpropagation to compute derivatives
  • Backpropagation is a key algorithm that computes derivatives of the cost function
  • This video explains the foundations of derivatives (using basic calculus)

Understanding Derivatives with a Simple Example

Simplified Cost Function
  • Using simplified cost function: J(w) = w²
  • When w = 3, J(w) = 9
  • If w increases by a tiny amount (ε = 0.001):
    • w becomes 3.001
    • J(w) becomes 9.006001
    • J(w) increases by approximately 6 × 0.001
  • If ε = 0.002:
    • w becomes 3.002
    • J(w) becomes 9.012004
    • J(w) increases by approximately 6 × 0.002
  • The smaller ε is, the more accurate this 6:1 ratio becomes
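As a quick check (not part of the video, just a sketch), the ratio of the change in J to ε can be computed directly. As ε shrinks, the ratio approaches the derivative, 6:

```python
# Finite-difference check for J(w) = w**2 at w = 3.
# As epsilon shrinks, delta_J / epsilon approaches the derivative, 6.
def J(w):
    return w ** 2

w = 3.0
for epsilon in (0.002, 0.001, 0.0001):
    delta_J = J(w + epsilon) - J(w)
    print(epsilon, delta_J / epsilon)  # ratio gets closer to 6
```

The printed ratios (6.002, 6.001, 6.0001) show exactly the pattern described above: the smaller ε, the closer the ratio is to 6.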

Derivative Definition

When w goes up by a tiny amount ε, causing J(w) to go up by k × ε, we say the derivative of J(w) with respect to w equals k.

  • In gradient descent, we update parameters using: w_j = w_j - α × ∂J/∂w_j
  • If derivative is small: small update to w_j
  • If derivative is large: big change to w_j
  • This makes sense because:
    • Small derivative means changing w doesn’t affect J much
    • Large derivative means even a small change to w significantly impacts J
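A minimal sketch of this update rule (my illustration, assuming J(w) = w² so dJ/dw = 2w) shows the behavior described above: steps are big when the derivative is large and shrink automatically as w nears the minimum:

```python
# Gradient descent on J(w) = w**2, whose derivative is 2*w.
# Far from the minimum the derivative is large, so updates are big;
# near the minimum the derivative shrinks, so updates shrink too.
alpha = 0.1  # learning rate
w = 3.0
for step in range(5):
    grad = 2 * w          # dJ/dw at the current w
    w = w - alpha * grad  # the update rule: w = w - alpha * dJ/dw
    print(step, w)
```

Each step multiplies w by (1 - 2α) = 0.8, so w moves quickly at first and more slowly as it approaches 0, the minimizer of w².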

w = 3

  • J(w) = w² = 9
  • If w increases by 0.001
  • J increases by ~0.006 (6 × ε)
  • Derivative = 6

w = 2

  • J(w) = w² = 4
  • If w increases by 0.001
  • J increases by ~0.004 (4 × ε)
  • Derivative = 4

w = -3

  • J(w) = w² = 9
  • If w increases by 0.001
  • J decreases by ~0.006 (-6 × ε)
  • Derivative = -6
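The three cases above follow a single pattern: the derivative of w² is 2w. A short numerical check (my sketch, using the same ε = 0.001) recovers all three values:

```python
# Finite-difference estimates of d/dw (w**2) at the three points above.
# Each estimate lands close to 2*w: 6, 4, and -6 respectively.
epsilon = 0.001
for w in (3.0, 2.0, -3.0):
    numeric = ((w + epsilon) ** 2 - w ** 2) / epsilon
    print(w, numeric, 2 * w)  # numeric estimate vs exact derivative
```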

Practical Tools
  • SymPy is a Python library that can compute derivatives symbolically
  • Examples using w = 2:
Computing derivatives with SymPy
import sympy as sp

w = sp.symbols('w')

# Example 1: J = w²
J = w**2
dJ_dw = sp.diff(J, w)  # Returns 2*w
dJ_dw.subs(w, 2)       # Returns 4
Function | Derivative Formula | Value at w=2
w²       | 2w                 | 4
w³       | 3w²                | 12
w        | 1                  | 1
1/w      | -1/w²              | -1/4
  • For J(w) = w³:

    • J(2.001) ≈ 8.012…
    • Increased by ~0.012 (12 × 0.001)
    • Confirms derivative = 12
  • For J(w) = w:

    • J(2.001) = 2.001
    • Increased by exactly 0.001 (1 × 0.001)
    • Confirms derivative = 1
  • For J(w) = 1/w:

    • J(2.001) ≈ 0.4998 (decreased from 0.5)
    • Decreased by ~0.00025 (0.25 × 0.001)
    • Confirms derivative = -1/4
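The whole table can also be verified symbolically. This sketch loops SymPy over the four functions and prints each derivative formula and its value at w = 2:

```python
import sympy as sp

# Reproduce the derivative table with symbolic differentiation:
# for each J, print J, dJ/dw, and the value of dJ/dw at w = 2.
w = sp.symbols('w')
for J in (w**2, w**3, w, 1/w):
    dJ_dw = sp.diff(J, w)
    print(J, '->', dJ_dw, '=', dJ_dw.subs(w, 2), 'at w = 2')
```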

Mathematical Notation
  • For functions of a single variable:

    • d/dw of J(w)
  • For functions of multiple variables:

    • ∂J/∂w_i (partial derivative)
  • Shorthand notations:
    • ∂J/∂w_i
    • dJ/dw_i

The derivative quantifies how sensitive a function is to small changes in its input. In neural networks, derivatives guide parameter updates during training: we move each parameter in the direction that most efficiently reduces the cost function. The next video will explore how derivatives work in neural networks through computation graphs.