optim.SGD

class lucid.optim.SGD(params: Iterable[Parameter], lr: float = 0.001, momentum: float = 0.0, weight_decay: float = 0.0)

The SGD class implements the Stochastic Gradient Descent optimization algorithm with optional momentum and weight decay. It inherits from the abstract Optimizer base class and provides a straightforward method for updating model parameters based on their gradients.

Class Signature

class SGD(optim.Optimizer):
    def __init__(
        self,
        params: Iterable[nn.Parameter],
        lr: float = 1e-3,
        momentum: float = 0.0,
        weight_decay: float = 0.0,
    ) -> None

Parameters

  • Parameters (`params`): An iterable of `nn.Parameter` objects to optimize, typically obtained from `model.parameters()`.

  • Learning Rate (`lr`): Controls the step size of each parameter update. A higher learning rate can speed up training but may cause instability, while a lower learning rate gives more stable, if slower, convergence.

  • Momentum (`momentum`): Accelerates SGD in the relevant direction and dampens oscillations. Momentum values typically range between 0.0 (no momentum) and 1.0.

  • Weight Decay (`weight_decay`): Adds an L2 regularization term that penalizes large weights to help prevent overfitting. A short construction sketch follows this list.
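For illustration only, the snippet below shows how these arguments map onto the constructor; `model` is assumed to be any `nn.Module` whose parameters are registered.

import lucid.optim as optim

# Plain SGD: each step moves against the gradient, scaled by the learning rate
optimizer = optim.SGD(model.parameters(), lr=1e-2)

# SGD with momentum and L2 regularization via weight decay
optimizer = optim.SGD(model.parameters(), lr=1e-2, momentum=0.9, weight_decay=1e-4)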

Algorithm

The Stochastic Gradient Descent (SGD) algorithm updates the model parameters according to the following formulas; a plain-Python sketch of the update appears after the symbol definitions below.

\[
\begin{aligned}
v_{t} &= \text{momentum} \times v_{t-1} + \text{gradient}_{t} + \text{weight\_decay} \times \theta_{t} \\
\theta_{t+1} &= \theta_{t} - \text{lr} \times v_{t}
\end{aligned}
\]

Where:

  • \(\theta_{t}\) are the parameters at iteration \(t\).

  • \(\text{gradient}_{t}\) is the gradient of the loss with respect to \(\theta_{t}\).

  • \(v_{t}\) is the velocity (momentum buffer) at iteration \(t\).

  • \(\text{lr}\) is the learning rate.

  • \(\text{momentum}\) is the momentum factor.

  • \(\text{weight_decay}\) is the weight decay coefficient.
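
As a point of reference only, the update rule can be sketched in plain Python with NumPy. None of the names below (`sgd_step`, `velocity`, `grad`) are part of the lucid API; they simply mirror the symbols above.

import numpy as np

lr, momentum, weight_decay = 0.01, 0.9, 1e-4   # illustrative hyperparameters

theta = np.array([1.0, 2.0, 3.0])   # parameters theta_t
velocity = np.zeros_like(theta)     # momentum buffer v_0

def sgd_step(theta, velocity, grad):
    # v_t = momentum * v_{t-1} + gradient_t + weight_decay * theta_t
    velocity = momentum * velocity + grad + weight_decay * theta
    # theta_{t+1} = theta_t - lr * v_t
    theta = theta - lr * velocity
    return theta, velocity

grad = np.array([0.1, -0.2, 0.3])   # gradient of the loss w.r.t. theta_t
theta, velocity = sgd_step(theta, velocity, grad)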

Examples

Using the SGD Optimizer

import lucid.optim as optim
import lucid.nn as nn

# Define a simple model
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.param = nn.Parameter([1.0, 2.0, 3.0])

    def forward(self, x):
        return x * self.param

# Initialize model and SGD optimizer
model = MyModel()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)

# Dummy training loop; `data` and `compute_loss` are placeholders assumed to be
# defined elsewhere.
for input, target in data:
    optimizer.zero_grad()                  # reset accumulated gradients
    output = model(input)                  # forward pass
    loss = compute_loss(output, target)    # compute the training loss
    loss.backward()                        # backpropagate gradients
    optimizer.step()                       # apply the SGD update

Inspecting Optimizer State

Use the state_dict() and load_state_dict() methods to save and load the optimizer state.

# Save state
optimizer_state = optimizer.state_dict()

# Load state
optimizer.load_state_dict(optimizer_state)
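
As a minimal sketch, assuming the returned state is an ordinary picklable Python object, it can also be persisted to disk with the standard pickle module; this is an illustration, not a lucid-specific API.

import pickle

# Save the optimizer state to disk (illustrative; any serializer could be used)
with open("optimizer_state.pkl", "wb") as f:
    pickle.dump(optimizer.state_dict(), f)

# Restore it later into a freshly constructed optimizer
with open("optimizer_state.pkl", "rb") as f:
    optimizer.load_state_dict(pickle.load(f))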

See also

  • lucid.optim.Optimizer - Abstract base class for all optimizers.

  • lucid.nn.Module - Base class for all neural network modules.

  • lucid.Parameter - Represents a parameter in a neural network module.