4  Module 3: Multivariable Calculus - Differentiation

Syllabus: Multivariable Calculus - Differentiation. Concept of limit and continuity of functions of two variables; partial derivatives of first and higher order; implicit partial differentiation; local linear approximations; chain rule for derivatives and partial derivatives; relative maxima and minima of functions of two variables (finding relative extrema only).


4.1 A New Dimension

So far, our world has been one of lines, planes, and vectors—the “flat” world of linear algebra. Now, we venture into the “curvy” world of calculus, but in higher dimensions.

In single-variable calculus, you studied functions \(y = f(x)\), whose graphs are curves in a 2D plane. Now, we’ll explore functions of two variables, \(z = f(x, y)\). Their graphs are surfaces in 3D space.

Think of it like this: \(x\) and \(y\) are your coordinates on a map (east-west and north-south), and \(z\) is your altitude. The function \(f(x, y)\) describes a landscape. Our goal is to understand this landscape: how steep is it? Which way is uphill? Where are the peaks and valleys?

Let’s start by looking at a landscape.

Code
import numpy as np
import plotly.graph_objects as go

# Define the function that describes our "landscape"
def f(x, y):
    return (x**2 + 3*y**2) * np.exp(1 - x**2 - y**2)

# Create a grid of (x,y) points
x_vals = np.linspace(-2.5, 2.5, 100)
y_vals = np.linspace(-2.5, 2.5, 100)
X, Y = np.meshgrid(x_vals, y_vals)

# Calculate the z-value (altitude) for each point
Z = f(X, Y)

# Create the interactive 3D surface plot
fig = go.Figure(data=[go.Surface(z=Z, x=X, y=Y)])

fig.update_layout(
    title='The "Landscape" of a Function of Two Variables',
    scene=dict(
        xaxis_title='x-axis',
        yaxis_title='y-axis',
        zaxis_title='z-axis (Altitude)'
    ),
    width=800, height=600,
    autosize=False
)

fig.show()
Figure 4.1: An interactive plot of the surface \(z = f(x,y)\). Our goal is to analyze its features.

Looking at this plot, we can see peaks, a valley at the center, and ridges. How can we find these features mathematically?

4.2 Partial Derivatives: The Slope in One Direction

How do we measure the “slope” of a surface? The problem is that the slope depends on which direction you’re facing!

The simplest approach is to hold one variable constant.

  1. Imagine you are standing on the surface and decide to walk only in the x-direction (due east). The slope you experience is the partial derivative with respect to x, written as \(\frac{\partial f}{\partial x}\) or \(f_x\).
  2. Alternatively, if you walk only in the y-direction (due north), the slope is the partial derivative with respect to y, written as \(\frac{\partial f}{\partial y}\) or \(f_y\).

To calculate \(\frac{\partial f}{\partial x}\), treat \(y\) as a constant and differentiate with respect to \(x\) as usual. Let’s use SymPy to do this for a simpler function, \(f(x, y) = x^2 e^{-y}\).

Code
import sympy as sp

# Define x and y as symbolic variables
x, y = sp.symbols('x y')

# Define a simpler function symbolically
f_sym = x**2 * sp.exp(-y)

print(f"Our function is f(x, y) = {f_sym}")

# Calculate the partial derivative with respect to x (treat y as a constant)
fx = sp.diff(f_sym, x)
print(f"The partial derivative ∂f/∂x is: {fx}")

# Calculate the partial derivative with respect to y (treat x as a constant)
fy = sp.diff(f_sym, y)
print(f"The partial derivative ∂f/∂y is: {fy}")

# We can also find second-order derivatives
fxx = sp.diff(fx, x)
fxy = sp.diff(fx, y)
print(f"\nThe second-order partial f_xx is: {fxx}")
print(f"The mixed partial f_xy is: {fxy}")
Our function is f(x, y) = x**2*exp(-y)
The partial derivative ∂f/∂x is: 2*x*exp(-y)
The partial derivative ∂f/∂y is: -x**2*exp(-y)

The second-order partial f_xx is: 2*exp(-y)
The mixed partial f_xy is: -2*x*exp(-y)
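A partial derivative can also be estimated numerically, which makes “hold the other variable constant” concrete: nudge only \(x\) (or only \(y\)) and measure the change. The sketch below uses a central difference with step `h = 1e-6` at the point \((1.5, 0.5)\); the step size, the sample point, and the helper names `partial_x` and `partial_y` are illustrative assumptions.

```python
import math


def f(x, y):
    """f(x, y) = x^2 * e^(-y), the function differentiated above."""
    return x**2 * math.exp(-y)


def partial_x(func, x, y, h=1e-6):
    """Central-difference estimate of df/dx: only x moves, y is held fixed."""
    return (func(x + h, y) - func(x - h, y)) / (2 * h)


def partial_y(func, x, y, h=1e-6):
    """Central-difference estimate of df/dy: only y moves, x is held fixed."""
    return (func(x, y + h) - func(x, y - h)) / (2 * h)


x0, y0 = 1.5, 0.5  # sample point chosen for illustration

fx_numeric = partial_x(f, x0, y0)
fy_numeric = partial_y(f, x0, y0)
fx_exact = 2 * x0 * math.exp(-y0)    # SymPy result: 2*x*exp(-y)
fy_exact = -x0**2 * math.exp(-y0)    # SymPy result: -x**2*exp(-y)

print(f"f_x: numeric {fx_numeric:.6f}, exact {fx_exact:.6f}")
print(f"f_y: numeric {fy_numeric:.6f}, exact {fy_exact:.6f}")
```

The numerical and symbolic values agree to about six decimal places, which is a quick sanity check for any hand-derived partial derivative.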

4.3 Tutorial: Basic Applications of Partial Differentiation in Engineering

4.3.1 Problem 1: Power dissipation in a resistive circuit

In a resistive heating circuit, the power dissipated in a resistor depends on both the current through it and its resistance, according to:

\[ P = I^2 R \]

where \(I\) is the current and \(R\) is the resistance. The current in the circuit is \(2~\mathrm{A}\) and the resistance is \(5~\Omega\).

Tasks:

  1. Determine the sensitivity of the power with respect to current and resistance by finding \(\dfrac{\partial P}{\partial I}\) and \(\dfrac{\partial P}{\partial R}\).
  2. Estimate the approximate change in power when the current increases by \(0.1~\mathrm{A}\) and the resistance decreases by \(0.2~\Omega\) due to heating.

Solution:

\[ \frac{\partial P}{\partial I} = 2 I R, \quad \frac{\partial P}{\partial R} = I^2 \]

At \((I, R) = (2,5)\):

\[ \frac{\partial P}{\partial I} = 20~\mathrm{W/A}, \quad \frac{\partial P}{\partial R} = 4~\mathrm{W/\Omega} \]

Approximate change in power:

\[ \Delta P \approx \frac{\partial P}{\partial I}\,\Delta I + \frac{\partial P}{\partial R}\,\Delta R = 20(0.1) + 4(-0.2) = 1.2~\mathrm{W} \]

Interpretation: the current increase contributes \(+2.0~\mathrm{W}\) and the resistance decrease contributes \(-0.8~\mathrm{W}\), so the current effect dominates and the power rises by about \(1.2~\mathrm{W}\).
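The linear estimate can be compared with the exact change by evaluating \(I^2 R\) at the old and new operating points. A minimal Python sketch (the helper name `i_squared_r` is ours, not from the problem):

```python
def i_squared_r(current, resistance):
    """The quantity I^2 R from Problem 1 (current in A, resistance in ohms)."""
    return current**2 * resistance


I0, R0 = 2.0, 5.0      # operating point
dI, dR = 0.1, -0.2     # current rises by 0.1 A, resistance falls by 0.2 ohm

partial_I = 2 * I0 * R0   # d(I^2 R)/dI = 2IR
partial_R = I0**2         # d(I^2 R)/dR = I^2

linear_change = partial_I * dI + partial_R * dR
exact_change = i_squared_r(I0 + dI, R0 + dR) - i_squared_r(I0, R0)

print(f"linear estimate: {linear_change:.4f}")
print(f"exact change:    {exact_change:.4f}")
```

The gap of 0.032 comes almost entirely from the second-order terms \(R(\Delta I)^2 + 2I\,\Delta I\,\Delta R = 0.05 - 0.08 = -0.03\), which the linear approximation ignores.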

4.3.2 Problem 2: Temperature distribution on a microchip surface

In an electronic processor, heat dissipation on the chip surface is modeled by

\[ T(x, y) = 200 e^{-0.01(x^2 + y^2)} \]

where \(T\) (in °C) denotes the temperature at coordinates \((x, y)\) measured in millimeters from the chip’s center.

Tasks:

  1. Find the rate of change of temperature at point \((4,3)\) along the \(x\)-direction.
  2. Determine the maximum rate of increase of temperature and the direction in which it occurs.

Solution:

\[ T_x = -4x\, e^{-0.01(x^2 + y^2)}, \quad T_y = -4y\, e^{-0.01(x^2 + y^2)} \]

At \((4,3)\):

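\[ T_x = -16 e^{-0.25} \approx -12.46~^\circ\mathrm{C/mm}, \quad T_y = -12 e^{-0.25} \approx -9.35~^\circ\mathrm{C/mm} \]

So along the \(x\)-direction, the temperature falls at about \(12.46~^\circ\mathrm{C}\) per millimeter.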

The gradient:

\[ \nabla T = (-16 e^{-0.25}, -12 e^{-0.25}) \]

Magnitude of maximum rate of increase:

\[ \|\nabla T\| = \sqrt{16^2 + 12^2}\, e^{-0.25} = 20 e^{-0.25} \approx 15.58~^\circ\mathrm{C/mm} \]

Direction toward maximum increase: \((-\frac{4}{5}, -\frac{3}{5})\) (toward the chip center).
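The numbers above are easy to confirm numerically. This sketch evaluates the analytic gradient at \((4, 3)\), normalizes it to get the direction of steepest increase, and cross-checks \(T_x\) with a central difference (the step \(h = 10^{-6}\) is an arbitrary choice).

```python
import math


def temperature(x, y):
    """T(x, y) = 200 exp(-0.01 (x^2 + y^2)) in deg C, with x and y in mm."""
    return 200 * math.exp(-0.01 * (x**2 + y**2))


def gradient_T(x, y):
    """Analytic gradient from the solution: (-4x e^{...}, -4y e^{...})."""
    decay = math.exp(-0.01 * (x**2 + y**2))
    return (-4 * x * decay, -4 * y * decay)


Tx, Ty = gradient_T(4, 3)
rate_max = math.hypot(Tx, Ty)                    # ||grad T||
direction = (Tx / rate_max, Ty / rate_max)       # unit vector of steepest increase

# Cross-check T_x with a central difference in x (y held at 3)
h = 1e-6
Tx_fd = (temperature(4 + h, 3) - temperature(4 - h, 3)) / (2 * h)

print(f"T_x at (4, 3)        : {Tx:.4f} C/mm (finite difference {Tx_fd:.4f})")
print(f"max rate of increase : {rate_max:.4f} C/mm")
print(f"unit direction       : ({direction[0]:.2f}, {direction[1]:.2f})")
```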

4.3.3 Problem 3: Electrostatic potential and field intensity

The electrostatic potential \(V\) at a point \((x, y, z)\) near a charge \(q\) is expressed as

\[ V(x, y, z) = \frac{kq}{\sqrt{x^2 + y^2 + z^2}} \]

where \(k\) is a constant of proportionality.

Tasks:

  1. Compute \(\dfrac{\partial V}{\partial x}\), \(\dfrac{\partial V}{\partial y}\), and \(\dfrac{\partial V}{\partial z}\).
  2. Derive the expression for the electric field vector \(\vec{E} = -\nabla V\) and discuss its physical direction.

Solution:

\[ \frac{\partial V}{\partial x} = -\frac{kq\, x}{(x^2 + y^2 + z^2)^{3/2}}, \quad \frac{\partial V}{\partial y} = -\frac{kq\, y}{(x^2 + y^2 + z^2)^{3/2}}, \quad \frac{\partial V}{\partial z} = -\frac{kq\, z}{(x^2 + y^2 + z^2)^{3/2}} \]

Electric field vector:

\[ \vec{E} = \frac{kq (x, y, z)}{(x^2 + y^2 + z^2)^{3/2}} \]

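Direction: radially outward (away from the charge) for a positive charge \(q > 0\), and radially inward for \(q < 0\). Its magnitude is \(\|\vec{E}\| = \dfrac{k|q|}{x^2 + y^2 + z^2}\), the inverse-square law.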
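To check that \(\vec{E} = -\nabla V\) really gives the formula above, this sketch compares it with a central-difference gradient of \(V\). It assumes \(kq = 1\) (only the geometry matters for the check) and the sample point \((1, 2, 2)\), where \(r = 3\).

```python
import math

KQ = 1.0  # assumed value of k*q for the check


def potential(x, y, z):
    """V = kq / sqrt(x^2 + y^2 + z^2)."""
    return KQ / math.sqrt(x**2 + y**2 + z**2)


def field_formula(x, y, z):
    """E = kq (x, y, z) / (x^2 + y^2 + z^2)^(3/2), from the solution above."""
    r3 = (x**2 + y**2 + z**2) ** 1.5
    return (KQ * x / r3, KQ * y / r3, KQ * z / r3)


def field_numeric(x, y, z, h=1e-6):
    """E = -grad V, each component estimated by a central difference."""
    ex = -(potential(x + h, y, z) - potential(x - h, y, z)) / (2 * h)
    ey = -(potential(x, y + h, z) - potential(x, y - h, z)) / (2 * h)
    ez = -(potential(x, y, z + h) - potential(x, y, z - h)) / (2 * h)
    return (ex, ey, ez)


point = (1.0, 2.0, 2.0)   # r = 3
E_formula = field_formula(*point)
E_numeric = field_numeric(*point)

print("formula:", E_formula)
print("numeric:", E_numeric)
print("|E| =", math.hypot(*E_formula), "vs kq/r^2 =", KQ / 9)
```

The field points along \((1, 2, 2)\), away from the origin, and its magnitude \(1/9 = kq/r^2\) shows the inverse-square behavior.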

4.3.4 Problem 4: Gradient descent in a learning model

In a neural network, the cost function for a single data instance \((x_1, x_2, y)\) is defined by

\[ J(w_1, w_2) = (w_1 x_1 + w_2 x_2 - y)^2 \]

where \(w_1\) and \(w_2\) are the model parameters.

Tasks:

  1. Find \(\dfrac{\partial J}{\partial w_1}\) and \(\dfrac{\partial J}{\partial w_2}\).
  2. Explain the significance of these derivatives in adjusting the weights using gradient descent.

Solution:

\[ \frac{\partial J}{\partial w_1} = 2 (w_1 x_1 + w_2 x_2 - y) x_1, \quad \frac{\partial J}{\partial w_2} = 2 (w_1 x_1 + w_2 x_2 - y) x_2 \]

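Each partial derivative measures how sensitive the squared error is to one weight. Its sign tells us which way to move that weight, and its size is proportional to both the prediction error \(w_1 x_1 + w_2 x_2 - y\) and the corresponding input. Gradient descent moves every weight a small step against its derivative:

\[ w_i \leftarrow w_i - \eta \frac{\partial J}{\partial w_i} \]

where \(\eta > 0\) is the learning rate. For a small enough \(\eta\), each step lowers \(J\).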
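A minimal gradient-descent loop makes the update rule concrete. The training instance \((x_1, x_2, y) = (1, 2, 3)\), the zero starting weights, the learning rate \(\eta = 0.05\) and the 100 iterations are all assumptions chosen for illustration.

```python
# Hypothetical single training instance and settings (illustrative only)
x1, x2, y = 1.0, 2.0, 3.0
w1, w2 = 0.0, 0.0      # initial weights
eta = 0.05             # learning rate


def cost(w1, w2):
    """J(w1, w2) = (w1*x1 + w2*x2 - y)^2."""
    return (w1 * x1 + w2 * x2 - y) ** 2


for step in range(100):
    residual = w1 * x1 + w2 * x2 - y
    grad_w1 = 2 * residual * x1    # dJ/dw1
    grad_w2 = 2 * residual * x2    # dJ/dw2
    # Update both weights simultaneously using the same residual
    w1, w2 = w1 - eta * grad_w1, w2 - eta * grad_w2
    if step < 3:
        print(f"step {step}: w = ({w1:.4f}, {w2:.4f}), J = {cost(w1, w2):.6f}")

print(f"final: w = ({w1:.6f}, {w2:.6f}), J = {cost(w1, w2):.2e}")
```

Here \(1 - 2\eta(x_1^2 + x_2^2) = 0.5\), so each step halves the residual and the weights converge to \((0.6, 1.2)\), where \(J = 0\). A learning rate above \(1/(x_1^2 + x_2^2) = 0.2\) would make the residual grow instead.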

4.3.5 Problem 6: CPU performance sensitivity analysis

A simplified performance model for a CPU relates processing time \(t\) (ms) to clock frequency \(f\) (GHz) and memory load \(m\) (GB) as

\[ t(f, m) = \frac{1000 m}{f - 0.1 m} \]

Tasks:

  1. Find \(\dfrac{\partial t}{\partial f}\) and \(\dfrac{\partial t}{\partial m}\).
  2. Interpret these derivatives as sensitivity measures of CPU performance with respect to \(f\) and \(m\).

Solution:

\[ \frac{\partial t}{\partial f} = -\frac{1000 m}{(f - 0.1 m)^2}, \quad \frac{\partial t}{\partial m} = \frac{1000 f}{(f - 0.1 m)^2} \]

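For \(m > 0\) and \(f > 0.1 m\) (the region where the model is meaningful), \(\partial t/\partial f < 0\) and \(\partial t/\partial m > 0\): increasing \(f\) reduces processing time, while increasing \(m\) increases it. Both sensitivities scale with \((f - 0.1 m)^{-2}\), so they grow sharply as \(f\) approaches \(0.1 m\).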
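The two formulas can be checked against finite differences at a sample operating point. The point \(f = 3~\mathrm{GHz}\), \(m = 4~\mathrm{GB}\) is hypothetical, chosen only so that \(f > 0.1m\).

```python
def processing_time(f, m):
    """t(f, m) = 1000 m / (f - 0.1 m) in ms; valid for f > 0.1 m."""
    return 1000 * m / (f - 0.1 * m)


def sensitivities(f, m):
    """Return (dt/df, dt/dm) from the analytic formulas."""
    denom = (f - 0.1 * m) ** 2
    return -1000 * m / denom, 1000 * f / denom


f0, m0 = 3.0, 4.0            # hypothetical operating point
t_f, t_m = sensitivities(f0, m0)

h = 1e-6
t_f_fd = (processing_time(f0 + h, m0) - processing_time(f0 - h, m0)) / (2 * h)
t_m_fd = (processing_time(f0, m0 + h) - processing_time(f0, m0 - h)) / (2 * h)

print(f"t     = {processing_time(f0, m0):.2f} ms")
print(f"dt/df = {t_f:.2f} ms/GHz (finite difference {t_f_fd:.2f})")
print(f"dt/dm = {t_m:.2f} ms/GB  (finite difference {t_m_fd:.2f})")
```

At this point, a first-order estimate says raising the clock by 0.1 GHz saves about 59 ms, while adding 0.1 GB of memory load costs about 44 ms.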

4.3.6 Problem 7

If \(f(x, y, z) = \ln(\tan x + \tan y + \tan z)\), show that

\[ \sin 2x\, f_x + \sin 2y\, f_y + \sin 2z\, f_z = 2 \]

Solution:

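\[ f_x = \frac{\sec^2 x}{\tan x + \tan y + \tan z}, \quad \sin 2x\, f_x = \frac{2 \sin x \cos x \sec^2 x}{\tan x + \tan y + \tan z} = \frac{2 \tan x}{\tan x + \tan y + \tan z} \]

Similarly, \(\sin 2y\, f_y = \dfrac{2 \tan y}{\tan x + \tan y + \tan z}\) and \(\sin 2z\, f_z = \dfrac{2 \tan z}{\tan x + \tan y + \tan z}\). Adding all three terms:

\[ \sin 2x\, f_x + \sin 2y\, f_y + \sin 2z\, f_z = \frac{2(\tan x + \tan y + \tan z)}{\tan x + \tan y + \tan z} = 2 \]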

4.3.7 Problem 8

If \(g(x, y, z) = \ln(\cot x + \cot y + \cot z)\), prove that

\[ \sin 2x\, g_x + \sin 2y\, g_y + \sin 2z\, g_z = -2 \]

Solution:

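\[ g_x = \frac{-\csc^2 x}{\cot x + \cot y + \cot z}, \quad \sin 2x\, g_x = -\frac{2 \sin x \cos x \csc^2 x}{\cot x + \cot y + \cot z} = -\frac{2 \cot x}{\cot x + \cot y + \cot z} \]

Summing the analogous terms in \(y\) and \(z\) gives \(-\dfrac{2(\cot x + \cot y + \cot z)}{\cot x + \cot y + \cot z} = -2\).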

4.3.8 Problem 9

If \(h(x, y, z) = \ln(\tan x \tan y \tan z)\), show that

\[ \sin 2x\, h_x + \sin 2y\, h_y + \sin 2z\, h_z = 6 \]

Solution:

\[ h_x = \frac{\sec^2 x}{\tan x} = \frac{1}{\sin x \cos x} = \frac{2}{\sin 2x} \Rightarrow \sin 2x\, h_x = 2 \]

Similarly, \(\sin 2y\, h_y = \sin 2z\, h_z = 2\), so the sum is \(6\).
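All three identities can be spot-checked numerically by estimating each partial derivative with a central difference. The sample point \((x, y, z) = (0.3, 0.5, 0.7)\) is an arbitrary choice inside \((0, \pi/2)\), where \(\tan\) and \(\cot\) are positive, so every logarithm is defined.

```python
import math


def central_partial(func, point, i, h=1e-6):
    """Central-difference estimate of the partial derivative w.r.t. argument i."""
    up, down = list(point), list(point)
    up[i] += h
    down[i] -= h
    return (func(*up) - func(*down)) / (2 * h)


def weighted_sum(func, point):
    """sin(2x) f_x + sin(2y) f_y + sin(2z) f_z evaluated at point."""
    return sum(math.sin(2 * point[i]) * central_partial(func, point, i)
               for i in range(3))


def f_tan_sum(x, y, z):       # Problem 7
    return math.log(math.tan(x) + math.tan(y) + math.tan(z))


def g_cot_sum(x, y, z):       # Problem 8 (cot = 1 / tan)
    return math.log(1 / math.tan(x) + 1 / math.tan(y) + 1 / math.tan(z))


def h_tan_product(x, y, z):   # Problem 9
    return math.log(math.tan(x) * math.tan(y) * math.tan(z))


point = (0.3, 0.5, 0.7)
for name, func, expected in [("Problem 7", f_tan_sum, 2),
                             ("Problem 8", g_cot_sum, -2),
                             ("Problem 9", h_tan_product, 6)]:
    print(f"{name}: {weighted_sum(func, point):.6f} (expected {expected})")
```

A numerical check at one point does not prove an identity, but it quickly catches sign slips such as a missing minus in \(g_x\).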

4.4 The Gradient and Linear Approximation

The two partial derivatives tell us the slope in the cardinal directions. But what if we want to know the slope in any direction? We can package our partial derivatives into a single, powerful object: the gradient vector.

Definition: The Gradient

The gradient of \(f(x,y)\) is the vector: \[ \nabla f = \begin{bmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{bmatrix} = f_x \mathbf{i} + f_y \mathbf{j} \]

The gradient is not just a container. It has a beautiful geometric meaning:

  1. Direction: The gradient vector \(\nabla f\) at a point \((x_0, y_0)\) points in the direction of the steepest ascent on the surface. It’s the “uphill” direction.
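  2. Magnitude: The magnitude of the gradient, \(\|\nabla f\|\), is the slope in that steepest direction. More generally, the slope in the direction of a unit vector \(\mathbf{u}\) is the directional derivative \(D_{\mathbf{u}} f = \nabla f \cdot \mathbf{u}\), which is largest when \(\mathbf{u}\) points along \(\nabla f\).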

This leads to the idea of local linear approximation. Just as a smooth curve looks like its tangent line up close, a smooth surface looks like its tangent plane up close. The gradient helps us define this plane.
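The claim that the gradient points uphill can be checked numerically. The sketch below uses the cone \(f(x, y) = \sqrt{x^2 + y^2}\) at \((3, 4)\) as an assumed example, estimates the slope in 360 directions (one per degree) with a central difference, and compares the steepest one with \(\|\nabla f\|\).

```python
import math


def cone(x, y):
    """f(x, y) = sqrt(x^2 + y^2)."""
    return math.sqrt(x**2 + y**2)


def directional_slope(func, x, y, ux, uy, h=1e-6):
    """Central-difference slope of func at (x, y) along the unit vector (ux, uy)."""
    return (func(x + h * ux, y + h * uy) - func(x - h * ux, y - h * uy)) / (2 * h)


x0, y0 = 3.0, 4.0
grad = (x0 / cone(x0, y0), y0 / cone(x0, y0))   # (f_x, f_y) = (0.6, 0.8)
grad_norm = math.hypot(*grad)

# Try one direction per degree and keep the steepest one.
samples = []
for deg in range(360):
    angle = math.radians(deg)
    slope = directional_slope(cone, x0, y0, math.cos(angle), math.sin(angle))
    samples.append((slope, deg))
best_slope, best_deg = max(samples)

print(f"||grad f||             = {grad_norm:.4f}")
print(f"steepest sampled slope = {best_slope:.4f} at {best_deg} degrees")
print(f"gradient direction     = {math.degrees(math.atan2(grad[1], grad[0])):.2f} degrees")
```

The steepest sampled slope matches \(\|\nabla f\| = 1\) and occurs at 53°, the sampled angle closest to the gradient’s direction \(\arctan(4/3) \approx 53.13^\circ\). Along the \(x\)-axis the slope is \(\nabla f \cdot (1, 0) = 0.6\).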

4.5 Local Linear Approximation: The Tangent Plane

This is a central idea that connects everything together. Recall the Madhava-Taylor series from single-variable calculus: its first-order truncation about a point \(x=a\) is the tangent line \[ L(x) = f(a) + f'(a)(x-a) \] This is the local linear approximation. The idea is powerful: if you zoom in far enough on a smooth (differentiable) curve, it looks like a straight line.

We will now extend this to two dimensions. If you zoom in far enough on any smooth surface, it looks like a flat plane. This is the tangent plane.

Definition: Local Linear Approximation

The local linear approximation of a function \(f(x,y)\) at a point \((a,b)\) is given by: \[ L(x, y) = f(a, b) + f_x(a, b)(x-a) + f_y(a, b)(y-b) \] The graph of this function, \(z = L(x,y)\), is the tangent plane to the surface \(z = f(x,y)\) at the point \((a, b, f(a,b))\).

This formula is a beautiful extension of the 1D case. It says the approximate height near \((a,b)\) is the starting height \(f(a,b)\), plus the change due to moving in x (slope in x times distance in x), plus the change due to moving in y (slope in y times distance in y).

4.5.1 Problem and Application: Estimating Values

Let’s find the tangent plane for the function \(f(x,y) = \sqrt{x^2 + y^2}\) (a cone) at the point \((3, 4)\) and use it to approximate \(f(3.01, 3.99)\).

Code
import sympy as sp

# Define symbols and the function
x, y = sp.symbols('x y')
f = sp.sqrt(x**2 + y**2)
a, b = 3, 4

# 1. Find the value of the function at (a,b)
f_val = f.subs([(x, a), (y, b)])
print(f"The value f({a},{b}) is: {f_val}")

# 2. Find the partial derivatives
fx = sp.diff(f, x)
fy = sp.diff(f, y)
print(f"∂f/∂x = {fx}")
print(f"∂f/∂y = {fy}")

# 3. Find the slope values at (a,b)
fx_val = fx.subs([(x, a), (y, b)])
fy_val = fy.subs([(x, a), (y, b)])
print(f"\nThe slope fx({a},{b}) is: {fx_val}")
print(f"The slope fy({a},{b}) is: {fy_val}")

# 4. Assemble the linear approximation L(x,y)
L = f_val + fx_val * (x - a) + fy_val * (y - b)
print(f"\nThe Tangent Plane equation is: z = {sp.simplify(L)}")

# 5. Use L to approximate f(3.01, 3.99)
approx_val = L.subs([(x, 3.01), (y, 3.99)])
print(f"\nThe approximate value of f(3.01, 3.99) is: {approx_val}")

# 6. Compare with the true value
true_val = f.subs([(x, 3.01), (y, 3.99)])
print(f"The true value is: {true_val.evalf()}")
print(f"The approximation is excellent!")
The value f(3,4) is: 5
∂f/∂x = x/sqrt(x**2 + y**2)
∂f/∂y = y/sqrt(x**2 + y**2)

The slope fx(3,4) is: 3/5
The slope fy(3,4) is: 4/5

The Tangent Plane equation is: z = 3*x/5 + 4*y/5

The approximate value of f(3.01, 3.99) is: 4.99800000000000
The true value is: 4.99801960780468
The approximation is excellent!
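The error of the tangent-plane estimate shrinks much faster than the step. The sketch below moves from \((3, 4)\) by \((h, -h)\) (the same direction as the \((3.01, 3.99)\) example) for shrinking \(h\) and prints the error divided by \(h^2\); a roughly constant ratio means the error behaves like \(h^2\).

```python
import math


def f(x, y):
    """The cone f(x, y) = sqrt(x^2 + y^2)."""
    return math.sqrt(x**2 + y**2)


def L(x, y):
    """Tangent plane at (3, 4): L(x, y) = 5 + (3/5)(x - 3) + (4/5)(y - 4)."""
    return 5 + 0.6 * (x - 3) + 0.8 * (y - 4)


ratios = []
for h in (0.1, 0.01, 0.001):
    x, y = 3 + h, 4 - h
    err = abs(f(x, y) - L(x, y))
    ratios.append(err / h**2)
    print(f"h = {h:<6} error = {err:.3e}   error / h^2 = {err / h**2:.4f}")
```

The ratio settles near 0.196, consistent with the gap of about 0.0000196 between the approximate and true values at \(h = 0.01\).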

4.5.2 Visualization: Surface and Tangent Plane

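Seeing is believing. Let’s plot the cone and its tangent plane at \((3, 4, 5)\). Notice how the plane “kisses” the surface at the point of tangency. Because a cone is made of straight lines through its apex, this particular plane actually touches the cone along the entire ray through the origin and \((3, 4, 5)\): on that ray, \(L(3s, 4s) = 5s = f(3s, 4s)\) for \(s > 0\).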

Code
import numpy as np
import plotly.graph_objects as go

# Define the surface function
def f_np(x, y):
    return np.sqrt(x**2 + y**2)

# Create grid for the surface plot
x_surf = np.linspace(0, 6, 50)
y_surf = np.linspace(0, 8, 50)
X_surf, Y_surf = np.meshgrid(x_surf, y_surf)
Z_surf = f_np(X_surf, Y_surf)

# Tangent plane: L(x,y) = 5 + (3/5)(x-3) + (4/5)(y-4)
def L_np(x, y):
    return 5 + (3/5)*(x - 3) + (4/5)*(y - 4)

X_plane, Y_plane = np.meshgrid(np.linspace(1, 5, 10), np.linspace(2, 6, 10))
Z_plane = L_np(X_plane, Y_plane)

# Create the plot
fig = go.Figure()

# Add the surface
fig.add_trace(go.Surface(z=Z_surf, x=X_surf, y=Y_surf, opacity=0.8, name='f(x,y)'))

# Add the tangent plane
fig.add_trace(go.Surface(z=Z_plane, x=X_plane, y=Y_plane,
                         colorscale='Reds', showscale=False, name='Tangent Plane'))

# Add the point of tangency (3,4,5)
fig.add_trace(go.Scatter3d(
    x=[3], y=[4], z=[5],
    mode='markers',
    marker=dict(size=8, color='black'),
    name='Point (3,4,5)'
))

fig.update_layout(title='Surface and its Tangent Plane',
                  width=800, height=600, autosize=False)
fig.show()
Figure 4.2: The tangent plane (red) provides a linear approximation to the surface (blue) at the point of tangency.

4.6 The Chain Rule: Derivatives on a Path

What if you’re not standing still, but walking along a path on the map? Suppose your path is given by \((x(t), y(t))\). Your altitude is then \(z = f(x(t), y(t))\). How fast is your altitude changing with respect to time, \(t\)?

The multivariable chain rule gives the answer: \[ \frac{dz}{dt} = \frac{\partial f}{\partial x} \frac{dx}{dt} + \frac{\partial f}{\partial y} \frac{dy}{dt} \]

This formula has a beautiful, compact form using the gradient and the velocity vector of your path, \(\mathbf{r}'(t) = \begin{bmatrix} dx/dt \\ dy/dt \end{bmatrix}\): \[ \frac{dz}{dt} = \nabla f \cdot \mathbf{r}'(t) \] The rate of change of your altitude is the dot product of the “steepest uphill” vector and your velocity vector. This is a perfect example of how linear algebra and calculus work together.
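The chain-rule formula can be checked numerically on a path of our own choosing. The sketch below assumes the hypothetical surface \(f(x, y) = x^2 y\) and the unit-circle path \(x = \cos t,\ y = \sin t\), computes \(\nabla f \cdot \mathbf{r}'(t)\) at \(t = 0.7\), and compares it with a central difference of \(z(t) = f(x(t), y(t))\).

```python
import math


def f(x, y):
    """Hypothetical surface f(x, y) = x^2 * y."""
    return x**2 * y


def path(t):
    """Unit-circle path r(t) = (cos t, sin t)."""
    return math.cos(t), math.sin(t)


def velocity(t):
    """r'(t) = (-sin t, cos t)."""
    return -math.sin(t), math.cos(t)


t0 = 0.7
x, y = path(t0)
grad_f = (2 * x * y, x**2)                      # (f_x, f_y) for f = x^2 y
vx, vy = velocity(t0)
dz_dt_chain = grad_f[0] * vx + grad_f[1] * vy   # grad f . r'(t)

h = 1e-6
dz_dt_numeric = (f(*path(t0 + h)) - f(*path(t0 - h))) / (2 * h)

print(f"chain rule : {dz_dt_chain:.8f}")
print(f"numerical  : {dz_dt_numeric:.8f}")
```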

4.7 Tutorial 6: Applications of Chain Rule

4.7.1 Problem 1: Voltage Variation in a Temperature-Dependent Circuit

A resistive circuit has voltage \(V = I \cdot R\), where current and resistance vary with temperature \(T\):

\[ I(T) = 5\sqrt{T}, \quad R(T) = 2T + 3 \]

Using the chain rule, find \(\dfrac{dV}{dT}\) and interpret how voltage changes as temperature increases. Verify by substitution.

Chain-rule solution:

  1. Intermediate variable: \(V(I(T),R(T))\)
  2. Partial derivatives:

\[ \frac{\partial V}{\partial I} = R, \quad \frac{\partial V}{\partial R} = I \]

  3. Derivatives of intermediate variables:

\[ \frac{dI}{dT} = \frac{5}{2} T^{-1/2}, \quad \frac{dR}{dT} = 2 \]

  4. Apply chain rule:

\[ \frac{dV}{dT} = \frac{\partial V}{\partial I}\frac{dI}{dT} + \frac{\partial V}{\partial R}\frac{dR}{dT} = R \cdot \frac{5}{2}T^{-1/2} + I \cdot 2 \]

  5. Substitute \(I(T)\) and \(R(T)\):

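\[ \frac{dV}{dT} = (2T+3) \cdot \frac{5}{2}T^{-1/2} + 5\sqrt{T} \cdot 2 = 15\sqrt{T} +\frac{15}{2\sqrt{T}} \]

Interpretation: both terms are positive for \(T > 0\), so the voltage increases as the temperature rises.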

Verification by substitution:

\[ V(T) = 5\sqrt{T}(2T+3) \implies \frac{dV}{dT} = 15T^{1/2} + \frac{15}{2T^{1/2}} \]

Python SymPy code:

Code
import sympy as sp

T = sp.symbols('T', positive=True)
I = 5*sp.sqrt(T)
R = 2*T + 3
V = I*R

# Direct derivative w.r.t T
dV_dT = sp.diff(V, T)
dV_dT

\(\displaystyle 10 \sqrt{T} + \frac{5 \left(2 T + 3\right)}{2 \sqrt{T}}\)

Alternate method (explicit chain rule, treating \(I\) and \(R\) as independent symbols):

Code
import sympy as sp

T = sp.symbols('T', positive=True)
I, R = sp.symbols('I R', real=True)  # treat as independent
V = I*R

# Partial derivatives
dV_dI = sp.diff(V, I)   # R
dV_dR = sp.diff(V, R)   # I

# Substitute expressions for I and R
I_expr = 5*sp.sqrt(T)
R_expr = 2*T + 3

# Derivatives of I and R w.r.t T
dI_dT = sp.diff(I_expr, T)
dR_dT = sp.diff(R_expr, T)

# Chain rule
dV_dT = dV_dI.subs(I,I_expr).subs(R,R_expr)*dI_dT + dV_dR.subs(I,I_expr)*dR_dT
sp.simplify(dV_dT)

\(\displaystyle \frac{15 \left(2 T + 1\right)}{2 \sqrt{T}}\)

4.7.2 Problem 2: Power Output in a Time-Varying Transistor

In a transistor circuit, the instantaneous power delivered to the load is given by \(P = V^2 / R\), where \(V\) is the voltage across the load and \(R\) is the load resistance. Suppose the voltage varies with time due to a decaying input signal, and the resistance slowly changes due to heating effects:

\[ V(t) = 10 e^{-0.02 t} \text{ volts}, \quad R(t) = 4 + 0.1 t \ \Omega \]

Compute the rate of change of power \(\dfrac{dP}{dt}\) at any time \(t\) using the multivariable chain rule, and verify the result by direct differentiation.

Solution (Chain-Rule Method):

  1. Treat \(P\) as a function of two variables: \(P(V,R)\).

\[ P(V,R) = \frac{V^2}{R} \]

  2. Compute the partial derivatives:

\[ \frac{\partial P}{\partial V} = \frac{2V}{R}, \quad \frac{\partial P}{\partial R} = -\frac{V^2}{R^2} \]

  3. Compute derivatives of intermediate variables w.r.t. \(t\):

\[ \frac{dV}{dt} = \frac{d}{dt}(10 e^{-0.02 t}) = -0.2 e^{-0.02 t}, \quad \frac{dR}{dt} = \frac{d}{dt}(4 + 0.1 t) = 0.1 \]

  4. Apply the multivariable chain rule:

\[ \frac{dP}{dt} = \frac{\partial P}{\partial V} \frac{dV}{dt} + \frac{\partial P}{\partial R} \frac{dR}{dt} \]

Substitute the values:

\[ \frac{dP}{dt} = \frac{2V}{R} \cdot (-0.2 e^{-0.02 t}) + \left(-\frac{V^2}{R^2}\right) \cdot 0.1 \]

  5. Substitute \(V(t)\) and \(R(t)\):

\[ \frac{dP}{dt} = \frac{2 \cdot 10 e^{-0.02 t}}{4 + 0.1 t} \cdot (-0.2 e^{-0.02 t}) - \frac{(10 e^{-0.02 t})^2}{(4 + 0.1 t)^2} \cdot 0.1 \]

Simplify:

\[ \frac{dP}{dt} = -\frac{4 e^{-0.04 t}}{4 + 0.1 t} - \frac{10 e^{-0.04 t}}{(4 + 0.1 t)^2} = -\frac{(26 + 0.4 t)\, e^{-0.04 t}}{(4 + 0.1 t)^2} \]

This is the rate of change of power at any time \(t\).

Verification by direct differentiation:

Directly compute:

\[ P(t) = \frac{(10 e^{-0.02 t})^2}{4 + 0.1 t} = \frac{100 e^{-0.04 t}}{4 + 0.1 t} \]

Differentiating w.r.t \(t\) gives exactly the same expression:

\[ \frac{dP}{dt} = -\frac{4 e^{-0.04 t}}{4 + 0.1 t} - \frac{10 e^{-0.04 t}}{(4 + 0.1 t)^2} = -\frac{(26 + 0.4 t)\, e^{-0.04 t}}{(4 + 0.1 t)^2} \]

Python code:

Code
import sympy as sp

t = sp.symbols('t', real=True)
# Define independent symbols for chain rule
V_sym, R_sym = sp.symbols('V R', real=True)
P = V_sym**2 / R_sym

# Partial derivatives
dP_dV = sp.diff(P, V_sym)   # 2*V/R
dP_dR = sp.diff(P, R_sym)   # -V^2 / R^2

# Expressions for V(t) and R(t)
V_expr = 10*sp.exp(-0.02*t)
R_expr = 4 + 0.1*t

# Derivatives of intermediate variables
dV_dt = sp.diff(V_expr, t)
dR_dt = sp.diff(R_expr, t)

# Chain rule
dP_dt = dP_dV.subs({V_sym:V_expr, R_sym:R_expr})*dV_dt + dP_dR.subs({V_sym:V_expr, R_sym:R_expr})*dR_dt
dP_dt_simplified = sp.simplify(dP_dt)
dP_dt_simplified

\(\displaystyle \frac{\left(- 0.0025 t - 0.1625\right) e^{- 0.04 t}}{6.25 \cdot 10^{-5} t^{2} + 0.005 t + 0.1}\)

Multiplying the numerator and denominator of this output by 160 rewrites it as \(-\dfrac{(26 + 0.4t)\,e^{-0.04t}}{(4 + 0.1t)^2} = -\dfrac{4e^{-0.04t}}{4 + 0.1t} - \dfrac{10e^{-0.04t}}{(4 + 0.1t)^2}\), the same rate in a more readable form.

4.7.3 Problem 3: Capacitance Sensitivity in a Temperature-Dependent Capacitor

A parallel-plate capacitor has capacitance

\[ C = \varepsilon \frac{A}{d}, \]

where \(A\) is the plate area, \(d\) is the separation, and \(\varepsilon\) is the permittivity. The capacitor is subject to temperature variations that change the radius of the circular plates and the separation:

\[ A = \pi r^2, \quad r = 2 + 0.01 T, \quad d = 1 + 0.005 T \]

Compute the rate of change of capacitance \(\dfrac{dC}{dT}\) using the multivariable chain rule and verify by direct differentiation.

Solution (Chain-Rule Method):

  1. Treat \(C\) as a function of two variables: \(C(A(T), d(T))\).

\[ C(A,d) = \frac{\varepsilon A}{d} \]

  2. Compute the partial derivatives:

\[ \frac{\partial C}{\partial A} = \frac{\varepsilon}{d}, \quad \frac{\partial C}{\partial d} = -\frac{\varepsilon A}{d^2} \]

  3. Compute derivatives of intermediate variables w.r.t \(T\):

\[ \frac{dA}{dT} = \frac{d}{dT} (\pi r^2) = 2\pi r \frac{dr}{dT} = 2 \pi (2+0.01T)(0.01) = 0.02 \pi (2+0.01T) \]

\[ \frac{dd}{dT} = \frac{d}{dT}(1 + 0.005 T) = 0.005 \]

  4. Apply the multivariable chain rule:

\[ \frac{dC}{dT} = \frac{\partial C}{\partial A} \frac{dA}{dT} + \frac{\partial C}{\partial d} \frac{dd}{dT} \]

Substitute the partial derivatives:

\[ \frac{dC}{dT} = \frac{\varepsilon}{d} \cdot 0.02 \pi (2+0.01T) - \frac{\varepsilon A}{d^2} \cdot 0.005 \]

  5. Substitute \(A = \pi r^2 = \pi (2 + 0.01 T)^2\) and \(d = 1 + 0.005 T\):

\[ \frac{dC}{dT} = \frac{\varepsilon \cdot 0.02 \pi (2+0.01T)}{1+0.005T} - \frac{\varepsilon \pi (2+0.01T)^2 \cdot 0.005}{(1+0.005T)^2} \]

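This gives the rate of change of capacitance at any temperature \(T\). It simplifies further: since \(r = 2 + 0.01T = 2(1 + 0.005T) = 2d\), we have \(A = 4\pi d^2\), so the first term equals \(0.04\pi\varepsilon\) and the second equals \(0.02\pi\varepsilon\). Hence \(\dfrac{dC}{dT} = 0.02\pi\varepsilon\), a constant, which is exactly what SymPy returns below.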

Verification by direct differentiation:

Directly differentiate:

\[ C(T) = \frac{\varepsilon \pi (2+0.01 T)^2}{1 + 0.005 T} \]

w.r.t \(T\) to get exactly the same expression as above.

Python SymPy code (chain-rule implementation):

Code
import sympy as sp

T = sp.symbols('T', real=True)
eps = sp.symbols('eps', real=True)

# Define intermediate variables as symbols for chain rule
A_sym, d_sym = sp.symbols('A d', real=True)
C = eps * A_sym / d_sym

# Partial derivatives
dC_dA = sp.diff(C, A_sym)   # eps / d
dC_dd = sp.diff(C, d_sym)   # -eps*A / d^2

# Expressions for A(T) and d(T)
r_expr = 2 + 0.01*T
A_expr = sp.pi * r_expr**2
d_expr = 1 + 0.005*T

# Derivatives of intermediate variables
dA_dT = sp.diff(A_expr, T)
dd_dT = sp.diff(d_expr, T)

# Chain rule
dC_dT = dC_dA.subs({A_sym:A_expr, d_sym:d_expr})*dA_dT + dC_dd.subs({A_sym:A_expr, d_sym:d_expr})*dd_dT
dC_dT_simplified = sp.simplify(dC_dT)
dC_dT_simplified

\(\displaystyle 0.02 \pi eps\)

4.7.4 Problem 4: Neural Network Weight Sensitivity

Consider a simple neuron in a feedforward neural network with two inputs \(x_1\) and \(x_2\), weights \(w_1\) and \(w_2\), and bias \(b\). The output of the neuron is

\[ z = \tanh(u), \quad u = w_1 x_1 + w_2 x_2 + b \]

Compute the sensitivity of the output with respect to the weights, i.e., \(\frac{\partial z}{\partial w_1}\) and \(\frac{\partial z}{\partial w_2}\), using the multivariable chain rule. Verify the results using Python’s SymPy library.

Solution (Chain-Rule Method):

  1. Treat \(z\) as a function of \(u\): \(z = \tanh(u)\).
  2. Compute the derivative of \(z\) with respect to \(u\):

\[ \frac{dz}{du} = 1 - \tanh^2(u) \]

  3. Compute the derivatives of \(u\) with respect to the weights:

\[ \frac{\partial u}{\partial w_1} = x_1, \quad \frac{\partial u}{\partial w_2} = x_2 \]

  4. Apply the chain rule:

\[ \frac{\partial z}{\partial w_1} = \frac{dz}{du} \cdot \frac{\partial u}{\partial w_1} = (1 - \tanh^2(u)) \cdot x_1 \]

\[ \frac{\partial z}{\partial w_2} = \frac{dz}{du} \cdot \frac{\partial u}{\partial w_2} = (1 - \tanh^2(u)) \cdot x_2 \]

Verification using Python SymPy:

Code
import sympy as sp

# Define symbols
w1, w2, x1, x2, b = sp.symbols('w1 w2 x1 x2 b', real=True)

# Define u and z
u = w1*x1 + w2*x2 + b
z = sp.tanh(u)

# Compute derivatives using chain rule automatically
dz_dw1 = sp.diff(z, w1)
dz_dw2 = sp.diff(z, w2)

# Simplify
dz_dw1_simpl = sp.simplify(dz_dw1)
dz_dw2_simpl = sp.simplify(dz_dw2)

dz_dw1_simpl, dz_dw2_simpl
(-x1*tanh(b + w1*x1 + w2*x2)**2 + x1, -x2*tanh(b + w1*x1 + w2*x2)**2 + x2)
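A common way to test hand-derived derivatives like these is gradient checking: compare the chain-rule formula with a finite-difference estimate. The input, weight and bias values below are arbitrary assumptions chosen only for the check.

```python
import math

# Arbitrary values for the check (assumptions, not from a trained network)
x1, x2 = 0.5, -1.2
w1, w2, b = 0.8, 0.3, 0.1


def neuron(w1, w2):
    """z = tanh(w1*x1 + w2*x2 + b) as a function of the two weights."""
    return math.tanh(w1 * x1 + w2 * x2 + b)


u = w1 * x1 + w2 * x2 + b
dz_du = 1 - math.tanh(u) ** 2         # derivative of tanh
dz_dw1 = dz_du * x1                   # chain rule: dz/du * du/dw1
dz_dw2 = dz_du * x2                   # chain rule: dz/du * du/dw2

h = 1e-6
dz_dw1_fd = (neuron(w1 + h, w2) - neuron(w1 - h, w2)) / (2 * h)
dz_dw2_fd = (neuron(w1, w2 + h) - neuron(w1, w2 - h)) / (2 * h)

print(f"dz/dw1: chain rule {dz_dw1:.8f}, finite difference {dz_dw1_fd:.8f}")
print(f"dz/dw2: chain rule {dz_dw2:.8f}, finite difference {dz_dw2_fd:.8f}")
```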

4.7.5 Problem 5: Temperature in a Polar Sensor Grid

A sensor grid measures temperature at points \((x, y)\), where the temperature depends on position as

\[ T = x^2 + y^2 \]

Suppose the sensors are arranged in a polar coordinate system:

\[ x = r \cos\theta, \quad y = r \sin\theta \]

Compute the rate of change of temperature with respect to the radial distance \(r\) and the angular position \(\theta\), i.e., \(\frac{\partial T}{\partial r}\) and \(\frac{\partial T}{\partial \theta}\), using the multivariable chain rule. Verify the results using Python SymPy.

Solution (Chain-Rule Method):

  1. Treat \(T\) as a function of \(x\) and \(y\):

\[ T(x,y) = x^2 + y^2 \]

Partial derivatives:

\[ \frac{\partial T}{\partial x} = 2x, \quad \frac{\partial T}{\partial y} = 2y \]

  2. Compute derivatives of \(x\) and \(y\) with respect to \(r\) and \(\theta\):

\[ \frac{\partial x}{\partial r} = \cos\theta, \quad \frac{\partial x}{\partial \theta} = -r \sin\theta \]

\[ \frac{\partial y}{\partial r} = \sin\theta, \quad \frac{\partial y}{\partial \theta} = r \cos\theta \]

  3. Apply the multivariable chain rule:

\[ \frac{\partial T}{\partial r} = \frac{\partial T}{\partial x} \frac{\partial x}{\partial r} + \frac{\partial T}{\partial y} \frac{\partial y}{\partial r} = 2x \cos\theta + 2y \sin\theta \]

\[ \frac{\partial T}{\partial \theta} = \frac{\partial T}{\partial x} \frac{\partial x}{\partial \theta} + \frac{\partial T}{\partial y} \frac{\partial y}{\partial \theta} = 2x(-r \sin\theta) + 2y(r \cos\theta) \]

  4. Substitute \(x = r \cos\theta\), \(y = r \sin\theta\):

\[ \frac{\partial T}{\partial r} = 2r (\cos^2\theta + \sin^2\theta) = 2r \]

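\[ \frac{\partial T}{\partial \theta} = 2r\cos\theta\,(-r\sin\theta) + 2r\sin\theta\,(r\cos\theta) = 0 \]

This is expected: \(T = x^2 + y^2 = r^2\) depends only on the distance from the origin, so moving around a circle of fixed radius does not change the temperature.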

Verification using Python SymPy:

Code
import sympy as sp

# Define symbols
r, theta = sp.symbols('r theta', real=True)

# Define coordinate transformations
x = r * sp.cos(theta)
y = r * sp.sin(theta)

# Temperature function
T = x**2 + y**2

# Partial derivatives w.r.t r and theta
dT_dr = sp.diff(T, r)
dT_dtheta = sp.diff(T, theta)

dT_dr_simpl = sp.simplify(dT_dr)
dT_dtheta_simpl = sp.simplify(dT_dtheta)

dT_dr_simpl, dT_dtheta_simpl
(2*r, 0)

4.7.6 Problem 6: Dynamic System Response Over Time

A dynamic system has an output that depends multiplicatively on two time-varying inputs. The instantaneous output is

\[ z = e^{\,x y}, \]

where the inputs themselves vary with time as

\[ x(t) = 2t^2,\qquad y(t) = 3t + 1. \]

Use the multivariable chain rule to compute the total derivative \(\dfrac{dz}{dt}\) (i.e. the rate of change of the output with respect to time). Then verify the result by substituting \(x(t),y(t)\) into \(z\) and differentiating directly. Evaluate the rate at \(t=1\) and interpret the result.

Solution (Chain-Rule Method):

  1. Identify intermediate variables: \(z = f(x,y)\) with \(f(x,y)=e^{xy}\), and \(x=x(t),\ y=y(t)\).

  2. Compute the partial derivatives of \(z\) w.r.t the intermediate variables:

\[ \frac{\partial z}{\partial x} = \frac{\partial}{\partial x} e^{xy} = y\,e^{xy}, \qquad \frac{\partial z}{\partial y} = \frac{\partial}{\partial y} e^{xy} = x\,e^{xy}. \]

  3. Compute time-derivatives of the intermediate variables:

\[ \frac{dx}{dt} = \frac{d}{dt}(2t^2) = 4t, \qquad \frac{dy}{dt} = \frac{d}{dt}(3t+1) = 3. \]

  4. Apply the multivariable chain rule (total derivative):

\[ \frac{dz}{dt} = \frac{\partial z}{\partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt}. \]

Substitute the partials and time-derivatives:

\[ \frac{dz}{dt} = \bigl(y e^{xy}\bigr)(4t) + \bigl(x e^{xy}\bigr)(3) = e^{xy}\bigl(4t y + 3x\bigr). \]

  5. Evaluate at \(t=1\). First compute \(x(1)=2\cdot1^2 = 2\) and \(y(1)=3\cdot1+1 = 4\). Thus

\[ \frac{dz}{dt}\Big|_{t=1} = e^{(2)(4)}\bigl(4\cdot1\cdot4 + 3\cdot2\bigr) = e^{8}(16 + 6) = 22 e^{8}. \]

Interpretation: At \(t=1\) the output is increasing very rapidly: the instantaneous rate is \(22e^8 \approx 6.56 \times 10^4\) output units per unit time, reflecting strong sensitivity because \(z\) is exponential in the product \(xy\).

Verification by substitution (direct differentiation):

Substitute \(x(t)\) and \(y(t)\) into \(z\):

\[ z(t) = e^{(2t^2)(3t+1)} = e^{6t^3 + 2t^2}. \]

Differentiate directly:

\[ \frac{dz}{dt} = e^{6t^3 + 2t^2}\cdot(18t^2 + 4t). \]

Check algebraically that

\[ 18t^2 + 4t \equiv 4t(3t+1) + 3(2t^2) = 4t y + 3x, \]

so the direct differentiation result matches the chain-rule result. Evaluating at \(t=1\) gives \(22e^8\) as before.

Python (SymPy) verification code:

Code
import sympy as sp

# symbol
t = sp.symbols('t', real=True)

# define x(t), y(t)
x_expr = 2*t**2
y_expr = 3*t + 1

# Method A: explicit chain-rule via partials (use x,y as symbols then substitute)
x_sym, y_sym = sp.symbols('x_sym y_sym')
z_sym = sp.exp(x_sym*y_sym)

# partials
z_x = sp.diff(z_sym, x_sym)   # y * exp(xy)
z_y = sp.diff(z_sym, y_sym)   # x * exp(xy)

# substitute x(t), y(t) and multiply by dx/dt, dy/dt
dzdt_chain = (z_x.subs({x_sym: x_expr, y_sym: y_expr}) * sp.diff(x_expr, t) +
              z_y.subs({x_sym: x_expr, y_sym: y_expr}) * sp.diff(y_expr, t))
dzdt_chain_simpl = sp.simplify(dzdt_chain)

# evaluate at t=1
dzdt_chain_at_1 = dzdt_chain_simpl.subs(t, 1)

# Method B: substitute then differentiate
z_sub = sp.exp(x_expr * y_expr)
dzdt_sub = sp.diff(z_sub, t)
dzdt_sub_simpl = sp.simplify(dzdt_sub)
dzdt_sub_at_1 = dzdt_sub_simpl.subs(t, 1)

dzdt_chain_simpl, dzdt_chain_at_1, dzdt_sub_simpl, dzdt_sub_at_1
(2*t*(9*t + 2)*exp(2*t**2*(3*t + 1)),
 22*exp(8),
 2*t*(9*t + 2)*exp(2*t**2*(3*t + 1)),
 22*exp(8))

4.8 The Main Event: Finding Maxima and Minima

Now we can answer the big question: how do we find the peaks and valleys of our landscape?

At the very top of a peak or the bottom of a valley, the ground is perfectly flat. The slope in every direction is zero. This means both partial derivatives must be zero.

Critical Points

A point \((a, b)\) is a critical point of \(f(x,y)\) if the gradient at that point is the zero vector: \[ \nabla f(a, b) = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \quad \text{which means} \quad f_x(a, b) = 0 \text{ and } f_y(a, b) = 0 \]

But a flat spot isn’t always a peak or a valley. It could also be a saddle point, like the middle of a Pringles chip—it’s a minimum in one direction and a maximum in another.

To classify these critical points, we need a multivariable version of the Second Derivative Test. This test involves a quantity called the Discriminant (\(D\)), which is built from the second-order partial derivatives.

The Second Derivative Test

First, find all critical points by solving \(\nabla f = 0\). Then, for each critical point \((a, b)\), calculate the second partial derivatives (\(f_{xx}, f_{yy}, f_{xy}\)) at that point.

Define the Discriminant \(D = f_{xx}(a,b) f_{yy}(a,b) - [f_{xy}(a,b)]^2\).

  1. If \(D > 0\) and \(f_{xx}(a,b) > 0\), then \(f\) has a local minimum at \((a, b)\).
  2. If \(D > 0\) and \(f_{xx}(a,b) < 0\), then \(f\) has a local maximum at \((a, b)\).
  3. If \(D < 0\), then \(f\) has a saddle point at \((a, b)\).
  4. If \(D = 0\), the test is inconclusive.

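Note that the Discriminant is just the determinant of the Hessian matrix, a beautiful connection back to linear algebra! For functions with continuous second partial derivatives, \(f_{xy} = f_{yx}\), so \[ H = \begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{bmatrix} \implies D = \det(H) = f_{xx} f_{yy} - f_{xy}^2 \]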
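The four cases of the test translate directly into a small helper. The function name `classify_critical_point` is hypothetical; it takes the three second partials already evaluated at a critical point.

```python
def classify_critical_point(fxx, fyy, fxy):
    """Apply the second derivative test at a critical point.

    fxx, fyy, fxy are the second partial derivatives evaluated at that point.
    """
    D = fxx * fyy - fxy**2    # determinant of the Hessian [[fxx, fxy], [fxy, fyy]]
    if D > 0 and fxx > 0:
        return "local minimum"
    if D > 0 and fxx < 0:
        return "local maximum"
    if D < 0:
        return "saddle point"
    return "inconclusive"     # D == 0


# Each function below has a critical point at the origin.
print(classify_critical_point(2, 2, 0))     # x^2 + y^2    -> local minimum
print(classify_critical_point(-2, -2, 0))   # -(x^2 + y^2) -> local maximum
print(classify_critical_point(2, -2, 0))    # x^2 - y^2    -> saddle point
print(classify_critical_point(0, 0, 0))     # x^4 + y^4    -> inconclusive
```

The last case shows why \(D = 0\) settles nothing: \(x^4 + y^4\) has a minimum at the origin, yet \(x^4 - y^4\), whose second partials also all vanish there, has neither a maximum nor a minimum.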

4.8.1 Example: Finding the Extrema of Our Landscape

Let’s use SymPy to find and classify all the critical points of the function \(f(x, y) = (x^2 + 3y^2) e^{1 - x^2 - y^2}\) we plotted at the beginning.

Code
import sympy as sp

# Define symbols and the function
x, y = sp.symbols('x y')
f = (x**2 + 3*y**2) * sp.exp(1 - x**2 - y**2)

# 1. Find the partial derivatives
fx = sp.diff(f, x)
fy = sp.diff(f, y)

# 2. Find the critical points by solving ∇f = 0.
# Both partials share the factor exp(1 - x^2 - y^2), which is never zero, so
# ∇f = 0 reduces to 2x(1 - x^2 - 3y^2) = 0 and 2y(3 - x^2 - 3y^2) = 0.
# Working through the cases gives five critical points:
critical_points = [
    (0, 0),
    (1, 0),
    (-1, 0),
    (0, 1),
    (0, -1)
]
print(f"The critical points are: {critical_points}\n")


# 3. Calculate second-order partial derivatives
fxx = sp.diff(fx, x)
fyy = sp.diff(fy, y)
fxy = sp.diff(fx, y)

# 4. Create the Discriminant D
D = fxx * fyy - fxy**2

# 5. Classify each critical point
for p in critical_points:
    px, py = p
    # Substitute the point's coordinates into D and fxx
    D_val = D.subs([(x, px), (y, py)])
    fxx_val = fxx.subs([(x, px), (y, py)])
    
    print(f"--- Analyzing point {p} ---")
    print(f"  D = {D_val:.2f}, f_xx = {fxx_val:.2f}")

    if D_val > 0 and fxx_val > 0:
        print("  Result: Local Minimum")
    elif D_val > 0 and fxx_val < 0:
        print("  Result: Local Maximum")
    elif D_val < 0:
        print("  Result: Saddle Point")
    else:
        print("  Result: Test is inconclusive")
The critical points are: [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]

--- Analyzing point (0, 0) ---
  D = 88.67, f_xx = 5.44
  Result: Local Minimum
--- Analyzing point (1, 0) ---
  D = -16.00, f_xx = -4.00
  Result: Saddle Point
--- Analyzing point (-1, 0) ---
  D = -16.00, f_xx = -4.00
  Result: Saddle Point
--- Analyzing point (0, 1) ---
  D = 48.00, f_xx = -4.00
  Result: Local Maximum
--- Analyzing point (0, -1) ---
  D = 48.00, f_xx = -4.00
  Result: Local Maximum

The results match what we see in the 3D plot perfectly! The origin is a local minimum, the points on the y-axis are local maxima (the two highest peaks), and the points on the x-axis are saddle points.
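A heuristic cross-check needs no derivatives. Compare \(f\) at each critical point with its values on a small circle around it. Every sample higher suggests a minimum, every sample lower suggests a maximum, and a mix suggests a saddle. This is a sketch, not a proof: the radius `r = 1e-2`, the 36 sample angles and the helper name `probe` are arbitrary assumptions, and a coarse circle can miss subtle behavior.

```python
import numpy as np

def f(x, y):
    return (x**2 + 3*y**2) * np.exp(1 - x**2 - y**2)

def probe(a, b, r=1e-2, n=36):
    """Hypothetical heuristic: compare f(a, b) with f on a small circle of radius r."""
    theta = np.linspace(0, 2 * np.pi, n, endpoint=False)
    diffs = f(a + r * np.cos(theta), b + r * np.sin(theta)) - f(a, b)
    if np.all(diffs > 0):
        return "local minimum"   # every nearby sample is higher
    if np.all(diffs < 0):
        return "local maximum"   # every nearby sample is lower
    return "saddle point"        # some samples higher, some lower

critical_points = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
results = {p: probe(*p) for p in critical_points}
for p, kind in results.items():
    print(p, kind)
```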

4.9 Tutorial 7: Local Extrema and the Second Partial Derivative Test

4.9.1 Problem 1- Power in a resistor (filter design)

The instantaneous power dissipated in a resistor is modeled by \[ P(V,R)=\frac{V^{2}}{R+10}, \] where \(V\) is the input voltage (volts) and \(R\) is the resistance (ohms). Find all stationary points and classify them using the second partial derivative test. Give a physical interpretation.

Solution:

First partial derivatives

\[ P_V=\frac{\partial P}{\partial V}=\frac{2V}{R+10},\qquad P_R=\frac{\partial P}{\partial R}=-\frac{V^{2}}{(R+10)^{2}}. \]

Stationary points require \(P_V=0\) and \(P_R=0\) simultaneously.

  • From \(P_V=0\) we get \(V=0\).

  • Substitute \(V=0\) into \(P_R\): \(P_R(0,R)=0\) (for all \(R\)).

Conclusion: The set of stationary points is the entire line \(V=0\) (for every \(R \ne -10\); physically \(R \ge 0\)).

Second partial derivatives

\[ P_{VV}=\frac{2}{R+10},\qquad P_{RR}=\frac{2V^{2}}{(R+10)^{3}},\qquad P_{VR}=P_{RV}=-\frac{2V}{(R+10)^{2}}. \]

Evaluate at any stationary point \((V=0,R)\): \[ P_{VV}\big|_{V=0}=\frac{2}{R+10}>0,\quad P_{RR}\big|_{V=0}=0,\quad P_{VR}\big|_{V=0}=0. \]

Hessian determinant (Discriminant): \[ D = P_{VV}P_{RR}-P_{VR}^2 = \frac{2}{R+10}\cdot 0 - 0 = 0, \] so the second derivative test is inconclusive (because \(D=0\)).

Further reasoning / interpretation

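Compute \(P\) along \(V=0\): \(P(0,R)=0\). Since \(P(V,R)=V^2/(R+10)\ge 0\) for every \(V\) and every \(R > -10\) (in particular for every physical \(R \ge 0\)), \(P=0\) is the global minimum value. The line \(V=0\) is therefore a flat valley of minima. In the \(V\)-direction the function is convex (\(P_{VV}>0\)), while along \(R\) it is flat at \(V=0\). Because these minima are not isolated, it is no surprise that the second derivative test returns \(D=0\). Physically, with no applied voltage the resistor dissipates no power, whatever its resistance.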

SymPy verification

Code
import sympy as sp
V, R = sp.symbols('V R', real=True)
P = V**2/(R+10)
# first partials
dPV = sp.diff(P, V)
dPR = sp.diff(P, R)
# stationary set
crit = sp.solve([sp.Eq(dPV,0), sp.Eq(dPR,0)], [V,R], dict=True)  # will show V=0 with free R
# second partials
PVV = sp.diff(P, V, 2)
PRR = sp.diff(P, R, 2)
PVR = sp.diff(P, V, 1, R, 1)
dPV, dPR, crit, PVV, PRR, PVR
(2*V/(R + 10),
 -V**2/(R + 10)**2,
 [{V: 0}],
 2/(R + 10),
 2*V**2/(R + 10)**3,
 -2*V/(R + 10)**2)
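The verification above stops at the partial derivatives. The sketch below also forms the discriminant and checks the nonnegativity argument numerically. The symbolic part shows that \(D\) is zero at every point, not only on \(V=0\), because \(P_{VV}P_{RR}\) and \(P_{VR}^2\) both equal \(4V^2/(R+10)^4\). The grid ranges (\(|V| \le 5\), \(0 \le R \le 100\)) are arbitrary assumptions.

```python
import numpy as np
import sympy as sp

V, R = sp.symbols('V R', real=True)
P = V**2 / (R + 10)

# Discriminant from the Hessian, as a function of (V, R)
H = sp.hessian(P, (V, R))
D = sp.simplify(H.det())
print(D)   # 0: P_VV*P_RR and P_VR**2 cancel exactly

# Numerical check that P >= 0 for physical resistances (ranges chosen arbitrarily)
V_vals = np.arange(-50, 51) / 10          # -5.0 ... 5.0, includes V = 0 exactly
R_vals = np.linspace(0, 100, 101)
Vg, Rg = np.meshgrid(V_vals, R_vals)      # rows index R, columns index V
Pg = Vg**2 / (Rg + 10)
print(Pg.min())                           # 0.0
print(np.all(Pg[:, V_vals == 0] == 0))    # True: the minimum is attained along V = 0
```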

4.9.2 Problem 2 — Carrier mobility in a thin film

Carrier mobility in a semiconductor thin film is modeled as \[ \mu(x,y)=5 - x^{2} - 2y^{2} + xy, \] where \(x\) and \(y\) are nondimensional process parameters (e.g., doping concentration factor and annealing index). Find the stationary points and classify them using the second partial derivative test. Discuss the engineering interpretation for the point with highest mobility.

Solution

First partial derivatives

Compute the partial derivatives: \[ \mu_x = \frac{\partial \mu}{\partial x} = -2x + y, \qquad \mu_y = \frac{\partial \mu}{\partial y} = -4y + x. \]

Set \(\mu_x=0\) and \(\mu_y=0\) to find stationary points: \[ \begin{aligned} -2x + y = 0 &\Rightarrow y = 2x, \\ -4y + x = 0 &\Rightarrow x = 4y. \end{aligned} \]

Substitute \(y = 2x\) into \(x = 4y\):

\[ x = 4(2x) = 8x \Rightarrow 7x=0 \Rightarrow x=0 \Rightarrow y = 0. \]

Stationary point: \((x,y) = (0,0)\).

Second partial derivatives

Compute the second partial derivatives: \[ \mu_{xx} = -2, \quad \mu_{yy} = -4, \quad \mu_{xy} = 1. \]

Hessian determinant: \[ D = \mu_{xx}\mu_{yy} - (\mu_{xy})^2 = (-2)(-4) - 1^2 = 8 - 1 = 7 > 0. \]

Since \(D>0\) and \(\mu_{xx}=-2<0\), the stationary point \((0,0)\) is a local maximum.

Value at maximum: \(\mu(0,0) = 5\).

Interpretation

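The model predicts that the highest carrier mobility occurs at the nominal process parameters \((x,y)=(0,0)\). Because \(\mu\) is quadratic, its Hessian is the same at every point. With \(D>0\) and \(\mu_{xx}<0\), \(\mu\) is therefore concave, so this local maximum is also the global maximum of the model. Any deviation from these parameters, in any direction, reduces mobility.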

SymPy verification

Code
import sympy as sp

# Define symbols
x, y = sp.symbols('x y', real=True)

# Define function
mu = 5 - x**2 - 2*y**2 + x*y

# Compute first partial derivatives
dmu_dx = sp.diff(mu, x)
dmu_dy = sp.diff(mu, y)

# Solve for stationary points
crit = sp.solve([sp.Eq(dmu_dx,0), sp.Eq(dmu_dy,0)], [x,y])

# Compute Hessian
H = sp.hessian(mu, (x,y))

# Evaluate determinant at stationary point
D = sp.det(H).subs({x:0, y:0})

dmu_dx, dmu_dy, crit, H, D
(-2*x + y,
 x - 4*y,
 {x: 0, y: 0},
 Matrix([
 [-2,  1],
 [ 1, -4]]),
 7)
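A brute-force cross-check evaluates \(\mu\) on a grid of process parameters and picks the best grid point. The grid (spacing 0.01 over \([-2, 2]^2\)) is an arbitrary choice. A grid search only confirms the result on the sampled region, but it agrees with the calculus.

```python
import numpy as np

# Evaluate the mobility model on a grid of process parameters (range chosen arbitrarily)
vals = np.arange(-200, 201) / 100          # -2.00, -1.99, ..., 2.00 (includes exactly 0)
X, Y = np.meshgrid(vals, vals)
mu = 5 - X**2 - 2*Y**2 + X*Y

# Locate the grid point with the highest mobility
i, j = np.unravel_index(np.argmax(mu), mu.shape)
print(X[i, j], Y[i, j], mu[i, j])          # 0.0 0.0 5.0
```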

4.9.3 Problem 3 — Cooling fin efficiency

The thermal efficiency of a cooling fin is modeled by \[ \eta(x,y) = 80 - 4x^{2} - 2y^{2} + 3xy, \] where \(x\) and \(y\) are nondimensional geometric parameters (e.g., normalized fin spacing and thickness). Determine the stationary point that optimizes efficiency and classify it using the second partial derivative test. Interpret the result in terms of fin design.

Solution

First partial derivatives

Compute the first partial derivatives: \[ \eta_x = \frac{\partial \eta}{\partial x} = -8x + 3y, \qquad \eta_y = \frac{\partial \eta}{\partial y} = -4y + 3x. \]

Set \(\eta_x=0\) and \(\eta_y=0\) to find stationary points:

\[ \begin{aligned} -8x + 3y = 0 &\Rightarrow y = \frac{8}{3}x, \\ -4y + 3x = 0 &\Rightarrow x = \frac{4}{3}y. \end{aligned} \]

Substitute \(y=\frac{8}{3}x\) into \(x = \frac{4}{3}y\):

\[ x = \frac{4}{3}\cdot \frac{8}{3}x = \frac{32}{9}x \Rightarrow \frac{32}{9}x - x = 0 \Rightarrow \frac{23}{9}x = 0 \Rightarrow x=0. \]

Hence \(y = \frac{8}{3}\cdot 0 = 0\).

Stationary point: \((x,y) = (0,0)\).

Second partial derivatives

Compute second partial derivatives: \[ \eta_{xx} = -8, \quad \eta_{yy} = -4, \quad \eta_{xy} = 3. \]

Hessian determinant: \[ D = \eta_{xx}\eta_{yy} - (\eta_{xy})^2 = (-8)(-4) - 3^2 = 32 - 9 = 23 > 0. \]

Since \(D>0\) and \(\eta_{xx}=-8<0\), the stationary point \((0,0)\) is a local maximum.

Value at maximum: \(\eta(0,0) = 80\).

Interpretation

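The model predicts that the maximum thermal efficiency occurs at the nominal fin geometry \((x,y)=(0,0)\). The Hessian of this quadratic is constant with \(D>0\) and \(\eta_{xx}<0\), so \(\eta\) is concave and this is also the global maximum of the model: deviations in spacing or thickness in any direction reduce efficiency. Within the range where the model is valid, the nominal geometry is therefore the design target.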

SymPy verification

Code
import sympy as sp

# Define symbols
x, y = sp.symbols('x y', real=True)

# Define function
eta = 80 - 4*x**2 - 2*y**2 + 3*x*y

# First partial derivatives
deta_dx = sp.diff(eta, x)
deta_dy = sp.diff(eta, y)

# Solve for stationary points
crit = sp.solve([sp.Eq(deta_dx,0), sp.Eq(deta_dy,0)], [x,y])

# Hessian matrix
H = sp.hessian(eta, (x,y))

# Evaluate determinant at stationary point
D = sp.det(H).subs({x:0, y:0})

deta_dx, deta_dy, crit, H, D
(-8*x + 3*y,
 3*x - 4*y,
 {x: 0, y: 0},
 Matrix([
 [-8,  3],
 [ 3, -4]]),
 23)
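For a constant Hessian, another way to read the test is through eigenvalues. If both eigenvalues are negative, the Hessian is negative definite at every point, so \(\eta\) is concave. Their product is \(\det(H) = D\). This NumPy sketch, which is not part of the original solution, checks both facts. The exact eigenvalues are \(-6 \pm \sqrt{13}\).

```python
import numpy as np

# Hessian of eta (constant, because eta is quadratic)
H = np.array([[-8.0, 3.0],
              [3.0, -4.0]])

eigenvalues = np.linalg.eigvalsh(H)   # symmetric matrix -> real eigenvalues, ascending
print(eigenvalues)                    # both negative: about -9.61 and -2.39
print(np.prod(eigenvalues))           # equals det(H) = D = 23, up to rounding
```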

4.9.4 Problem 4 — Execution time in resource allocation

The estimated execution time of a program is modeled by \[ T(x,y) = x^{3} + y^{3} - 6xy + 20, \] where \(x\) represents CPU allocation and \(y\) represents memory bandwidth (both nondimensional). Find all stationary points, classify them using the second partial derivative test, and identify the point corresponding to minimal execution time.

Solution:

First partial derivatives

Compute the first partial derivatives: \[ T_x = \frac{\partial T}{\partial x} = 3x^{2} - 6y, \qquad T_y = \frac{\partial T}{\partial y} = 3y^{2} - 6x. \]

Set \(T_x=0\) and \(T_y=0\) to find stationary points:

\[ \begin{aligned} 3x^2 - 6y = 0 &\Rightarrow y = \frac{1}{2} x^2, \\ 3y^2 - 6x = 0 &\Rightarrow x = \frac{1}{2} y^2. \end{aligned} \]

Substitute \(y=\frac{1}{2}x^2\) into \(x=\frac{1}{2}y^2\):

\[ x = \frac{1}{2}\left(\frac{1}{2}x^2\right)^2 = \frac{1}{8} x^4. \]

Solve:

  • If \(x=0\), then \(y = \frac{1}{2} \cdot 0^2 = 0\), giving the stationary point \((0,0)\).
  • If \(x\ne 0\), divide both sides by \(x\): \(1 = \frac{1}{8} x^3 \Rightarrow x^3 = 8 \Rightarrow x=2\) (the only real cube root). Then \(y = \frac{1}{2} (2)^2 = 2\), giving \((2,2)\).

Stationary points: \((0,0)\) and \((2,2)\).
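The same roots can be found numerically. Writing \(x = \frac{1}{8}x^4\) as \(\frac{1}{8}x^4 - x = 0\), `np.roots` returns all four roots. Two of them form a complex-conjugate pair from \(x^3 = 8\), and the sketch discards those before recovering \(y = \frac{1}{2}x^2\). The tolerance `1e-9` used to decide "real" is an arbitrary choice.

```python
import numpy as np

# Stationary x-values satisfy x^4/8 - x = 0.
# np.roots takes coefficients from the highest power down: (1/8)x^4 + 0x^3 + 0x^2 - x + 0
roots = np.roots([1/8, 0, 0, -1, 0])

# Keep only the real roots (the other two are a complex-conjugate pair)
real_x = np.sort(roots[np.abs(roots.imag) < 1e-9].real)
points = [(float(xv), float(xv**2 / 2)) for xv in real_x]   # y = x^2 / 2
print(points)  # approximately [(0.0, 0.0), (2.0, 2.0)]
```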

Second partial derivatives

Compute the second partial derivatives: \[ T_{xx} = 6x, \quad T_{yy} = 6y, \quad T_{xy} = -6. \]

Hessian determinant: \[ D = T_{xx}T_{yy} - (T_{xy})^2. \]

  • At \((0,0)\): \(T_{xx}=0\), \(T_{yy}=0\), \(T_{xy}=-6\), so
    \(D = 0\cdot0 - (-6)^2 = -36 < 0 \Rightarrow\) saddle point.

  • At \((2,2)\): \(T_{xx}=12\), \(T_{yy}=12\), \(T_{xy}=-6\), so
    \(D = 12\cdot12 - (-6)^2 = 144 - 36 = 108 > 0\) and \(T_{xx}=12>0 \Rightarrow\) local minimum.

Value at \((2,2)\): \[ T(2,2) = 2^3 + 2^3 - 6\cdot2\cdot2 + 20 = 8 + 8 - 24 + 20 = 12. \]

Interpretation

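  • \((0,0)\) is a saddle point. Whether execution time rises or falls depends on the direction of the change: it falls along \(x=y\) and rises along \(x=-y\). This point is therefore neither a best nor a worst allocation.
  • \((2,2)\) is a local minimum: allocating CPU and memory according to \((2,2)\) gives the lowest execution time among nearby allocations, \(T=12\).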
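The saddle behavior at the origin can be seen directly. Along \(x=y=t\), \(T = 2t^3 - 6t^2 + 20\), which dips below \(T(0,0)=20\) for small \(t \ne 0\). Along \(x=-y=t\), \(T = 6t^2 + 20\), which rises above it. The step `t = 0.1` in this sketch is an arbitrary small value.

```python
def T(x, y):
    return x**3 + y**3 - 6*x*y + 20

t = 0.1  # a small step away from the origin (value chosen arbitrarily)

# Along the line x = y, T = 2t^3 - 6t^2 + 20 drops below T(0, 0) = 20 ...
print(T(t, t), T(-t, -t))
# ... while along x = -y, T = 6t^2 + 20 rises above it
print(T(t, -t), T(-t, t))
```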

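The minimum is only local over the whole plane. Along \(x=y=t\), \(T = 2t^3 - 6t^2 + 20 \to -\infty\) as \(t \to -\infty\), although negative allocations have no physical meaning. Restricted to nonnegative allocations \(x, y \ge 0\), \(T \ge 20\) on the axes and \(T\) grows without bound far from the origin. So \((2,2)\), with \(T=12\), is the best allocation in that region. Engineers can use this kind of analysis to choose resource allocations that minimize runtime.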
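The same minimum can also be reached without solving \(\nabla T = 0\), by repeatedly stepping against the gradient. This technique is gradient descent, and it is not part of the original tutorial. In this sketch the step size 0.05, the 200 iterations and the starting point \((1,1)\) are arbitrary assumptions. A start far into the negative quadrant can run off toward \(-\infty\), because \(T\) is unbounded below there.

```python
import numpy as np

def grad_T(p):
    x, y = p
    return np.array([3*x**2 - 6*y, 3*y**2 - 6*x])   # (T_x, T_y)

def gradient_descent(start, step=0.05, iterations=200):
    """Repeatedly move a small step downhill, i.e. against the gradient."""
    p = np.array(start, dtype=float)
    for _ in range(iterations):
        p = p - step * grad_T(p)
    return p

p_min = gradient_descent([1.0, 1.0])
print(p_min)   # close to [2, 2], the local minimum found above
```

Near \((2,2)\) the Hessian has eigenvalues 6 and 18, so any step size below \(2/18\) keeps the iteration from overshooting there. That is why a small value such as 0.05 was picked.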

SymPy verification

Code
import sympy as sp

# Define symbols
x, y = sp.symbols('x y', real=True)

# Define function
Tfun = x**3 + y**3 - 6*x*y + 20

# First partial derivatives
dTx = sp.diff(Tfun, x)
dTy = sp.diff(Tfun, y)

# Solve for stationary points
crit = sp.solve([sp.Eq(dTx,0), sp.Eq(dTy,0)], [x,y])

# Hessian matrix
H = sp.hessian(Tfun, (x,y))

# Evaluate determinant and classify each stationary point
results = [(pt, sp.simplify(sp.det(H).subs({x:pt[0], y:pt[1]})), sp.simplify(H[0,0].subs({x:pt[0], y:pt[1]}))) for pt in crit]

crit, H, results
([(0, 0), (2, 2)],
 Matrix([
 [6*x,  -6],
 [ -6, 6*y]]),
 [((0, 0), -36, 0), ((2, 2), 108, 12)])

4.10 Module III Summary

  • We’ve moved from 2D curves to 3D surfaces, or “landscapes.”
  • Partial derivatives (\(f_x, f_y\)) are the slopes in the cardinal directions.
  • The gradient vector (\(\nabla f = [f_x, f_y]\)) packages these slopes and points in the direction of steepest ascent. It is the key to understanding the local geometry of a surface.
  • To find potential maxima and minima (critical points), we find where the landscape is flat by solving \(\nabla f = 0\).
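  • The Second Derivative Test, using the determinant of the Hessian matrix, classifies these critical points as local maxima, local minima, or saddle points whenever \(D \ne 0\). When \(D = 0\) the test is inconclusive.
  • Finding extrema is called optimization, and it underlies how modern machine-learning models are trained. In practice their loss functions are minimized numerically by following the negative gradient (gradient descent) rather than by solving \(\nabla f = 0\) directly.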