
backward

backward(tensors: Tensor | list[Tensor], grad_tensors: list[Tensor] | None = None, retain_graph: bool = False, create_graph: bool = False, inputs: list[Tensor] | None = None)

Compute gradients of tensors w.r.t. the leaf variables in their graph.

Top-level entry point that triggers reverse-mode automatic differentiation across the computation graph rooted at tensors. For every leaf tensor x reachable from tensors whose requires_grad is True, this function accumulates $\partial \mathcal{L} / \partial x$ into x.grad, where $\mathcal{L}$ is the (possibly weighted) sum of the root tensors.

The chain rule is applied edge-by-edge during a walk of the graph in reverse topological order, so each node's vector-Jacobian product is evaluated exactly once.
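The reverse topological walk can be sketched in plain Python for scalar values. This is a toy model written for illustration, not lucid's implementation: the `Var` class and `toy_backward` function are hypothetical names, and only addition and multiplication are supported.

```python
class Var:
    """Toy scalar node: a value, an accumulated grad, and its parents."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # Each entry is (parent_node, local derivative d(self)/d(parent)).
        self.parents = parents

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def toy_backward(root):
    # Build a topological order (parents before children) by DFS post-order.
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(root)
    root.grad = 1.0  # seed for a scalar root
    # Walk in reverse so each node's cotangent is complete before it is
    # pushed along its outgoing edges; every node is processed once.
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local


x = Var(3.0)
y = x * x + x      # dy/dx = 2x + 1
toy_backward(y)
print(x.grad)      # 7.0
```

Note that x is reached along three edges (twice through the product, once through the sum), and the three contributions are summed at the leaf.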

Parameters

tensors: Tensor or list of Tensor
Root tensors at which the backward pass starts. When more than one root is supplied, each receives its own seed and the contributions are summed at every shared leaf.
grad_tensors: list of Tensor or None = None
Seed cotangent vectors, one per root tensor and matching the corresponding root's shape. Required when a root is non-scalar. Defaults to ones_like(t) for each root, which is correct for scalar losses.
retain_graph: bool = False
If True, the intermediate saved tensors are not freed after the backward pass, so the same graph can be traversed again. Necessary when calling backward multiple times on overlapping graphs or when create_graph is also True.
create_graph: bool = False
If True, the operations performed during backward are themselves recorded in the graph, enabling higher-order differentiation (e.g. Hessian-vector products, meta-learning). This increases memory usage.
inputs: list of Tensor or None = None
Reserved for the future ability to restrict gradient accumulation to a specified subset of leaves. Currently unused.
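To make the role of the grad_tensors seed concrete, here is a small plain-Python sketch of a vector-Jacobian product for the elementwise map t = x * x. It does not call lucid; `square_vjp` is a hypothetical helper, and lists of floats stand in for tensors.

```python
def square_vjp(x, seed):
    """VJP of the elementwise map t = x * x: returns seed^T (dt/dx).

    The Jacobian is diagonal with entries 2 * x[k], so the product
    reduces to an elementwise multiply.
    """
    return [s * 2.0 * xk for s, xk in zip(seed, x)]


x = [1.0, 2.0, 3.0]

# Seed of ones: the gradient of sum(t) with respect to x.
grad_sum = square_vjp(x, [1.0, 1.0, 1.0])

# Non-uniform seed: the gradient of 0.5*t[0] + 0.0*t[1] + 2.0*t[2].
grad_weighted = square_vjp(x, [0.5, 0.0, 2.0])

print(grad_sum)       # [2.0, 4.0, 6.0]
print(grad_weighted)  # [1.0, 0.0, 12.0]
```

The seed therefore selects which weighted sum of the root's entries is being differentiated, which is why it must match the root's shape.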

Returns

None

Gradients are accumulated in-place onto each leaf tensor's .grad attribute. Existing .grad values are added to, not overwritten — call Tensor.zero_grad (or the optimizer's zero_grad) between successive backward passes if accumulation is undesired.
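The accumulate-versus-overwrite behaviour can be illustrated with a toy leaf in plain Python. The `Leaf` class and `toy_backward_square` are hypothetical stand-ins, and resetting .grad to None in `zero_grad` is an assumption made for this sketch; lucid may reset to zeros instead.

```python
class Leaf:
    """Toy leaf: .grad starts as None and is added to on every pass."""

    def __init__(self, value):
        self.value = value
        self.grad = None

    def accumulate(self, g):
        # First contribution initialises .grad; later ones are added.
        self.grad = g if self.grad is None else self.grad + g

    def zero_grad(self):
        # Assumption: clearing means resetting to None.
        self.grad = None


def toy_backward_square(leaf):
    # Backward pass for y = x * x with seed 1: dy/dx = 2x.
    leaf.accumulate(2.0 * leaf.value)


x = Leaf(3.0)
toy_backward_square(x)
toy_backward_square(x)
after_two_passes = x.grad   # 12.0: the two passes were summed

x.zero_grad()
toy_backward_square(x)
after_reset = x.grad        # 6.0: a single fresh pass
```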

Notes

Reverse-mode AD computes

$$\frac{\partial \mathcal{L}}{\partial x} = \sum_{i} \left( \frac{\partial \mathcal{L}}{\partial t_i} \right)^{\!\top} \frac{\partial t_i}{\partial x},$$

propagating cotangents $\bar t_i = \partial \mathcal{L} / \partial t_i$ from the roots through each saved op contract until every reachable leaf has received its contribution.
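As a worked instance of this formula, take the single scalar root from the Examples section, $t_1 = \sum_k x_k^2$, so that $\mathcal{L} = t_1$ and the default seed is $\bar t_1 = 1$:

```latex
\bar t_1 = \frac{\partial \mathcal{L}}{\partial t_1} = 1,
\qquad
\frac{\partial t_1}{\partial x_k} = 2 x_k,
\qquad
\frac{\partial \mathcal{L}}{\partial x_k} = \bar t_1 \cdot 2 x_k = 2 x_k .
```

For $x = (1, 2, 3)$ this gives $(2, 4, 6)$, which agrees with the x.grad value shown in the Examples section.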

Memory/compute trade-off:

  • retain_graph=False (default) is the cheapest mode — once the walk finishes, every saved tensor is freed.
  • retain_graph=True, create_graph=False keeps activations so the same graph can be traversed again.
  • create_graph=True additionally records the backward ops in a new graph, doubling memory in the worst case but enabling $\nabla^2 \mathcal{L}$ and beyond.
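The freeing behaviour behind the first two modes can be modelled with a toy op in plain Python. `SquareOp` is a hypothetical class, and raising RuntimeError on a freed graph is an assumption of this sketch; this page does not state how lucid reacts to a second pass over a freed graph.

```python
class SquareOp:
    """Toy op y = x * x that saves its input for the backward pass."""

    def __init__(self, x):
        self.saved_x = x

    def backward(self, seed, retain_graph=False):
        if self.saved_x is None:
            # Assumed behaviour for this sketch only.
            raise RuntimeError("saved tensors were freed by an earlier pass")
        grad = seed * 2.0 * self.saved_x
        if not retain_graph:
            self.saved_x = None  # free the saved activation
        return grad


op = SquareOp(3.0)
first = op.backward(1.0, retain_graph=True)  # saved value is kept
second = op.backward(1.0)                    # saved value is freed afterwards

try:
    op.backward(1.0)
    third_failed = False
except RuntimeError:
    third_failed = True

print(first, second, third_failed)  # 6.0 6.0 True
```

Keeping the saved value costs memory for as long as the graph lives, which is the trade-off the list above describes.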

Examples

>>> import lucid
>>> from lucid.autograd import backward
>>> x = lucid.tensor([1.0, 2.0, 3.0], requires_grad=True)
>>> y = (x * x).sum()
>>> backward(y)
>>> x.grad
Tensor([2., 4., 6.])