backward
backward(tensors: Tensor | list[Tensor], grad_tensors: list[Tensor] | None = None, retain_graph: bool = False, create_graph: bool = False, inputs: list[Tensor] | None = None) → None
Compute gradients of tensors w.r.t. the leaf variables in their graph.
Top-level entry point that triggers reverse-mode automatic
differentiation across the computation graph rooted at
tensors. For every leaf tensor x reachable from
tensors whose requires_grad is True, this function
accumulates $\partial L / \partial x$ into
x.grad, where $L$ is the (possibly weighted)
sum of the root tensors.
The chain rule is applied edge by edge during a walk of the graph in reverse topological order, so each intermediate vector-Jacobian product is evaluated exactly once.
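The reverse topological walk can be illustrated with a minimal scalar sketch. The Node class and the helpers mul, add, topo_order, and toy_backward are hypothetical names made up for this illustration and are not part of Lucid; they only show how visiting nodes in reverse topological order lets each node push its cotangent to its parents exactly once.

```python
class Node:
    """Toy scalar graph node (hypothetical; not Lucid's Tensor)."""

    def __init__(self, value, parents=(), local_grads=()):
        self.value = value
        self.parents = parents          # nodes this one was computed from
        self.local_grads = local_grads  # d(self)/d(parent), one per parent
        self.grad = 0.0                 # accumulated cotangent


def mul(a, b):
    # d(a*b)/da = b and d(a*b)/db = a
    return Node(a.value * b.value, (a, b), (b.value, a.value))


def add(a, b):
    # d(a+b)/da = d(a+b)/db = 1
    return Node(a.value + b.value, (a, b), (1.0, 1.0))


def topo_order(root):
    """Return the nodes so that every parent precedes its consumers."""
    order, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for parent in node.parents:
            visit(parent)
        order.append(node)

    visit(root)
    return order


def toy_backward(root):
    root.grad = 1.0  # seed: the scalar analogue of ones_like(root)
    for node in reversed(topo_order(root)):
        # When a node is reached, all of its consumers have already
        # added their contributions to node.grad.
        for parent, local in zip(node.parents, node.local_grads):
            parent.grad += node.grad * local


x = Node(3.0)
w = Node(2.0)
y = add(mul(x, x), mul(w, x))  # y = x*x + w*x
toy_backward(y)
print(x.grad, w.grad)  # 8.0 3.0
```

Here x feeds both multiplications, so x.grad collects three contributions (two from x*x, one from w*x) that sum to 2x + w = 8, while w.grad is x = 3.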
Parameters
tensors : Tensor or list of Tensor
    Root tensor(s) of the graph to differentiate.
grad_tensors : list of Tensor or None = None
    If None, uses ones_like(t) for each root, which
    is correct for scalar losses.
retain_graph : bool = False
    If True, the intermediate saved tensors are not freed after
    the backward pass, so the same graph can be traversed again.
    Necessary when calling backward multiple times on
    overlapping graphs or when create_graph is also True.
create_graph : bool = False
    If True, the operations performed during backward are
    themselves recorded in the graph, enabling higher-order
    differentiation (e.g. Hessian-vector products, meta-learning).
    Increases memory usage.
inputs : list of Tensor or None = None
Returns
None
    Gradients are accumulated in place onto each leaf tensor's
    .grad attribute. Existing .grad values are added to,
    not overwritten; call Tensor.zero_grad (or the
    optimizer's zero_grad) between successive backward passes
    if accumulation is undesired.
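The accumulate-rather-than-overwrite behaviour can be mimicked in a few lines of plain Python. This is only a sketch: Leaf and toy_backward_square are hypothetical stand-ins, not Lucid objects, and the manual reset merely plays the role that zero_grad has in the text above.

```python
class Leaf:
    """Toy leaf with a value and an accumulated gradient (hypothetical)."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0


def toy_backward_square(leaf):
    # The gradient of leaf.value ** 2 is 2 * leaf.value.
    # It is added to the stored gradient, not assigned over it.
    leaf.grad += 2.0 * leaf.value


x = Leaf(3.0)

toy_backward_square(x)
first = x.grad    # 6.0

toy_backward_square(x)
second = x.grad   # 12.0: the second pass was added to the first

x.grad = 0.0      # manual reset before the next pass
toy_backward_square(x)
third = x.grad    # 6.0 again

print(first, second, third)
```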
Notes
Reverse-mode AD computes $\partial L / \partial x$, the gradient of the summed roots $L$ with respect to each leaf $x$, by
propagating cotangents from the roots through each saved op contract until every reachable leaf has received its contribution.
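As a worked instance, take the function from the Examples section, $y = \sum_i x_i^2$, and name the intermediate product $u_i = x_i^2$. The bar notation for cotangents is an assumption made here for illustration, not notation taken from the library.

```latex
\begin{aligned}
\bar{y}   &= 1 \\
\bar{u}_i &= \bar{y}\,\frac{\partial y}{\partial u_i} = 1 \\
\bar{x}_i &= \bar{u}_i\,\frac{\partial u_i}{\partial x_i} = 2x_i
\end{aligned}
```

At $x = (1, 2, 3)$ this gives $\bar{x} = (2, 4, 6)$, which matches the x.grad printed in the Examples section.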
Memory/compute trade-off:
- retain_graph=False (default) is the cheapest mode: once the walk finishes, every saved tensor is freed.
- retain_graph=True, create_graph=False keeps activations so the same graph can be traversed again.
- create_graph=True additionally records the backward ops in a new graph, doubling memory in the worst case but enabling second-order derivatives and beyond.
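The Hessian-vector product mentioned under create_graph shows why the backward ops must themselves be recorded: it is a gradient of a gradient. In the sketch below, $L$ is a scalar loss, $H$ its Hessian, and $v$ a constant vector; these symbols are notation assumed here for illustration.

```latex
H v \;=\; \nabla_x \!\left( (\nabla_x L)^{\top} v \right),
\qquad H = \nabla_x^2 L
```

For $L = \sum_i x_i^2$ the first backward pass gives $\nabla_x L = 2x$, so $(\nabla_x L)^{\top} v = 2\,x^{\top} v$; differentiating that scalar again gives $2v$, consistent with $H = 2I$.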
Examples
>>> import lucid
>>> from lucid.autograd import backward
>>> x = lucid.tensor([1.0, 2.0, 3.0], requires_grad=True)
>>> y = (x * x).sum()
>>> backward(y)
>>> x.grad
Tensor([2., 4., 6.])