Logistic Regression Model
The logistic regression model estimates the probability of the label \(y\) given the input \(x\) by:
\[
\mathbb{P}(y = 1 \mid x) = \sigma(w^\top x + b)
\]
Here:
- \(x \in \mathbb{R}^d\) is the input vector, \(y \in \{0, 1\}\) is the binary label
- \(w \in \mathbb{R}^d\) is the weight vector, \(b \in \mathbb{R}\) is the bias (intercept)
- \(\sigma(z) = \dfrac{1}{1 + e^{-z}} = \dfrac{e^z}{1 + e^z}\) is the sigmoid function
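For example, with purely illustrative values \(w = (2, -3)^\top\), \(b = 0.5\), and input \(x = (1, 1)^\top\):
\[
z = w^\top x + b = 2 - 3 + 0.5 = -0.5,
\qquad
\mathbb{P}(y = 1 \mid x) = \sigma(-0.5) = \frac{1}{1 + e^{0.5}} \approx 0.38,
\]
so the model assigns this input a probability of roughly 38% of belonging to the positive class.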
Binary Cross-Entropy Loss
The loss function we minimize is the binary cross-entropy:
\[
L(w, b) = -\frac{1}{n} \sum_{i=1}^n \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]
\]
Here:
- \(n\) is the number of training examples
- \(y_i\) is the true label for the \(i\)-th sample
- \(\hat{y}_i = \sigma(w^\top x_i + b)\) is the predicted probability
Intuition
- If \(y_i = \hat{y}_i = 1\) for all \(i\), the loss is 0 — perfect prediction.
- If \(y_i = \hat{y}_i = 0\) for all \(i\), the loss is also 0.
- If \(y_i = 1\) and \(\hat{y}_i \to 0\), the loss grows without bound: the worst-case prediction.
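To make this concrete, here is a minimal sketch (illustrative numbers only) that evaluates the per-sample loss for a confident correct, an uncertain, and a confident wrong prediction of a positive example:
import numpy as np
for y_true, y_hat in [(1, 0.99), (1, 0.5), (1, 0.01)]:
    loss_i = -(y_true * np.log(y_hat) + (1 - y_true) * np.log(1 - y_hat))
    print(f"y={y_true}, y_hat={y_hat:.2f}, loss={loss_i:.3f}")   # ~0.010, ~0.693, ~4.605
The loss grows without bound as \(\hat{y}_i \to 0\) while \(y_i = 1\), which is why confidently wrong predictions are penalized so heavily.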
Deriving the Gradient
We start with the individual loss for each sample:
\[
L_i = -\left[ y_i \log(\sigma(z_i)) + (1 - y_i) \log(1 - \sigma(z_i))
\right],
\]
where \(z_i = w^\top x_i + b\). Using the identity \(\frac{d\sigma(z)}{dz} = \sigma(z)(1 - \sigma(z))\), we obtain the derivative of \(L_i\) with respect to \(z_i\):
\[\begin{align*}
\frac{\partial L_i}{\partial z_i}
&= -\left[ y_i \frac{1}{\sigma(z_i)} \frac{\partial \sigma(z_i)}{\partial z_i} + (1 - y_i) \frac{-1}{1 - \sigma(z_i)} \frac{\partial \sigma(z_i)}{\partial z_i} \right]\\
&= -\left[ y_i \bigl(1 - \sigma(z_i)\bigr) - (1 - y_i) \sigma(z_i) \right]\\
&= \sigma(z_i) - y_i\\
&= \hat{y}_i - y_i.
\end{align*}\]
By the chain rule, we can compute the gradient of the loss w.r.t. the weights \(w\) and bias \(b\): \[\begin{align*}
\frac{\partial L_i}{\partial w} &= \frac{\partial L_i}{\partial z_i} \cdot \frac{\partial z_i}{\partial w} = (\hat y_i - y_i) x_i,\\
\frac{\partial L_i}{\partial b} &= \frac{\partial L_i}{\partial z_i} \cdot \frac{\partial z_i}{\partial b} = (\hat y_i - y_i).
\end{align*}\]
Averaging over all \(n\) samples, the gradients of the full loss \(L(w, b)\) become:
\[
\frac{\partial L}{\partial w} = \frac{1}{n} \sum_{i=1}^n (\hat{y}_i - y_i)\, x_i
\quad \text{ and } \quad
\frac{\partial L}{\partial b} = \frac{1}{n} \sum_{i=1}^n (\hat{y}_i - y_i).
\]
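These formulas are straightforward to verify numerically. The sketch below (toy data and parameter values made up purely for the check, separate from the training code in the next steps) compares the analytic gradients with central finite differences:
import numpy as np
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
# Toy problem: 5 samples, 2 features, arbitrary parameters
rng = np.random.default_rng(0)
X_toy = rng.standard_normal((5, 2))
y_toy = np.array([0., 1., 1., 0., 1.])
w_toy, b_toy = np.array([0.3, -0.7]), 0.1
def bce_toy(w, b):
    p = sigmoid(X_toy @ w + b)
    return -np.mean(y_toy * np.log(p) + (1 - y_toy) * np.log(1 - p))
# Analytic gradients from the formulas above
p = sigmoid(X_toy @ w_toy + b_toy)
dw = X_toy.T @ (p - y_toy) / len(y_toy)
db = np.mean(p - y_toy)
# Central finite differences
eps = 1e-6
dw_num = np.array([(bce_toy(w_toy + eps * e, b_toy) - bce_toy(w_toy - eps * e, b_toy)) / (2 * eps)
                   for e in np.eye(2)])
db_num = (bce_toy(w_toy, b_toy + eps) - bce_toy(w_toy, b_toy - eps)) / (2 * eps)
print(np.allclose(dw, dw_num), np.isclose(db, db_num))
Both checks should print True, confirming that the analytic gradients agree with the numerical ones up to floating-point error.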
Step 1: Data Generation
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(0)
n, d = 1000, 2
X = np.random.randn(n, d)              # features drawn from a standard normal
true_w = np.array([2.0, -3.0])         # ground-truth weights used to generate labels
bias = 0.5                             # ground-truth bias
logits = X @ true_w + bias
probs = 1 / (1 + np.exp(-logits))
y = (probs > 0.5).astype(np.float32)   # label 1 where the true probability exceeds 0.5
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='bwr', edgecolor='k')
plt.title("Synthetic Binary Classification Data")
plt.xlabel("x1")
plt.ylabel("x2")
plt.grid(True)
plt.show()
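Because the labels come from thresholding the probabilities at 0.5 (equivalently, taking the sign of the logits), they are a deterministic linear function of \(x\), so the two classes are linearly separable. A quick sanity check, using the variables defined above (a small addition, not in the original snippet):
print("positive fraction:", y.mean())                                   # roughly balanced classes
print("labels match sign of logits:", bool(np.all(y == (logits > 0))))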
Step 2: Logistic Regression with NumPy
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
def binary_cross_entropy(y_true, y_pred):
    eps = 1e-7  # small constant to avoid log(0)
    return -np.mean(y_true * np.log(y_pred + eps) + (1 - y_true) * np.log(1 - y_pred + eps))
# Initialize parameters and run full-batch gradient descent
w = np.zeros(2)
b = 0.0
lr = 0.1
for epoch in range(100):
    # Forward pass: predicted probabilities
    z = X @ w + b
    y_pred = sigmoid(z)
    loss = binary_cross_entropy(y, y_pred)
    # Gradients from the derivation above: dL/dz = y_hat - y
    dz = y_pred - y
    dw = X.T @ dz / n
    db = np.mean(dz)
    # Gradient descent update
    w -= lr * dw
    b -= lr * db
    if epoch % 10 == 0:
        print(f"Epoch {epoch}, Loss: {loss:.4f}")
# Plotting the decision boundary
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
x_vals = np.linspace(x_min, x_max, 100)
# Solve for x2 on the decision boundary line: w1*x1 + w2*x2 + b = 0
# => x2 = -(w1*x1 + b)/w2
y_vals = -(w[0] * x_vals + b) / w[1]
# Plot
plt.figure(figsize=(6, 4))
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='bwr', edgecolor='k')
plt.plot(x_vals, y_vals, 'k--', label='Decision Boundary')
plt.title("Logistic Regression Decision Boundary")
plt.xlabel("x1")
plt.ylabel("x2")
plt.legend()
plt.grid(True)
plt.show()
Epoch 0, Loss: 0.6931
Epoch 10, Loss: 0.5710
Epoch 20, Loss: 0.4932
Epoch 30, Loss: 0.4403
Epoch 40, Loss: 0.4019
Epoch 50, Loss: 0.3726
Epoch 60, Loss: 0.3495
Epoch 70, Loss: 0.3305
Epoch 80, Loss: 0.3147
Epoch 90, Loss: 0.3013
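As a quick follow-up (not part of the original code, and assuming the variables from Steps 1 and 2 are still in scope), the learned parameters can be used to compute training accuracy and compared with the generating parameters:
train_acc = ((sigmoid(X @ w + b) >= 0.5).astype(np.float32) == y).mean()
print("training accuracy:", train_acc)
print("learned w:", w, "learned b:", b)
print("true w:   ", true_w, "true b:   ", bias)
Since scaling \((w, b)\) by any positive constant leaves the decision boundary \(w^\top x + b = 0\) unchanged, expect the learned parameters to match the true ones in direction rather than in magnitude.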
Step 3: Logistic Regression with PyTorch
import torch
from torch import nn
X_tensor = torch.tensor(X, dtype=torch.float32)
y_tensor = torch.tensor(y, dtype=torch.float32).view(-1, 1)
model = nn.Sequential(
    nn.Linear(d, 1),
    nn.Sigmoid()
)
loss_fn = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
for epoch in range(100):
    model.train()  # set model to training mode
    # step 1: make prediction (forward pass)
    y_pred = model(X_tensor)
    # step 2: compute loss
    loss = loss_fn(y_pred, y_tensor)
    # step 3: clear old and compute current gradients
    optimizer.zero_grad()  # clear old gradients
    loss.backward()        # compute current gradients
    # step 4: update parameters
    optimizer.step()
    if epoch % 10 == 0:
        print(f"Epoch {epoch}, Loss: {loss.item():.4f}")
Epoch 0, Loss: 0.6921
Epoch 10, Loss: 0.5684
Epoch 20, Loss: 0.4893
Epoch 30, Loss: 0.4356
Epoch 40, Loss: 0.3970
Epoch 50, Loss: 0.3678
Epoch 60, Loss: 0.3449
Epoch 70, Loss: 0.3263
Epoch 80, Loss: 0.3108
Epoch 90, Loss: 0.2977
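To compare the two implementations (a small addition, not in the original, assuming model from Step 3 and w, b from Step 2 are still in scope), the learned parameters can be read off the nn.Linear layer inside the Sequential model:
linear = model[0]                                   # the nn.Linear layer
w_torch = linear.weight.detach().numpy().ravel()    # shape (2,)
b_torch = linear.bias.item()
print("PyTorch w:", w_torch, "b:", b_torch)
print("NumPy w:  ", w, "b:", b)
The two runs start from different initializations (zeros in NumPy, PyTorch's default for nn.Linear), so the numbers will differ somewhat, but both should define roughly the same decision boundary.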