grad

grad(outputs: Tensor | list[Tensor], inputs: Tensor | list[Tensor], grad_outputs: list[Tensor] | None = None, retain_graph: bool | None = None, create_graph: bool = False, only_inputs: bool = True, allow_unused: bool = False) -> tuple[Tensor | None, ...]

Compute gradients of outputs w.r.t. inputs, returning them as a tuple.

This is the "functional" gradient interface: one call returns the partial derivatives without touching .grad on any leaf tensor. It is useful for higher-order differentiation, gradient-based meta-learning, or any pattern where the gradients feed a new computation rather than update parameters in place.

Parameters

outputs: Tensor or list of Tensor
Output tensors to differentiate. Each must have requires_grad set in the graph that produced it.
inputs: Tensor or list of Tensor
Input tensors w.r.t. which gradients are requested. Each must be a leaf (or non-leaf with requires_grad=True if you want grads flowing into intermediate nodes).
grad_outputs: list of Tensor = None
Seed gradients $\partial \mathcal{L} / \partial \text{outputs}$ for non-scalar outputs. If omitted, outputs is expected to be scalar and an implicit ones_like seed is used.
retain_graph: bool = None
Keep the autograd graph alive after this call so additional backward passes are possible. If left as None, it takes the value of create_graph.
create_graph: bool = False
If True, build the autograd graph of the gradient itself so the returned tensors are differentiable. Used by gradgradcheck and other higher-order recipes.
only_inputs: bool = True
Reserved for reference-framework compatibility; gradients are always restricted to the requested inputs.
allow_unused: bool = False
If True, return None for any inputs entry that lies outside the computation graph of outputs. If False, such an entry raises an error.

Returns

tuple[Tensor or None, ...]

One gradient per element of inputs, in the same order. Entries are None only when allow_unused=True and the input is disconnected from outputs.
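
The return contract above (one entry per input, in order, with None only for disconnected inputs under allow_unused=True) can be illustrated with a toy scalar reverse-mode sketch in plain Python. This is not lucid's implementation: `Var`, `toy_grad`, and the choice of `RuntimeError` are hypothetical names and assumptions made for illustration only.

```python
class Var:
    """Toy scalar node: a value plus (parent, local derivative) pairs."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (Var, d(self)/d(parent))

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def toy_grad(output, inputs, allow_unused=False):
    """Return d(output)/d(input) per input as a tuple; nothing is stored on the nodes."""
    order, seen = [], set()

    def visit(node):
        # Post-order DFS: parents are appended before the node that uses them.
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)

    # Reverse post-order processes every node after all of its consumers.
    adjoint = {id(output): 1.0}
    for node in reversed(order):
        for parent, local in node.parents:
            adjoint[id(parent)] = adjoint.get(id(parent), 0.0) + adjoint[id(node)] * local

    results = []
    for inp in inputs:
        if id(inp) in seen:
            results.append(adjoint[id(inp)])
        elif allow_unused:
            results.append(None)  # disconnected input, tolerated
        else:
            # Assumption: the exception type is chosen for this toy only.
            raise RuntimeError("input is not part of the graph of output")
    return tuple(results)


x, w, unused = Var(3.0), Var(2.0), Var(5.0)
y = x * x + x * w                      # dy/dx = 2x + w, dy/dw = x
grads = toy_grad(y, [x, w, unused], allow_unused=True)
print(grads)
```

The nodes carry no .grad attribute at all, which mirrors the point that the functional interface hands gradients back to the caller instead of accumulating them.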

Notes

Mathematically, grad computes the vector-Jacobian product

$$\frac{\partial}{\partial \mathbf{x}} \left(\sum_k \text{grad\_outputs}_k \cdot \text{outputs}_k\right)$$

via one reverse-mode pass, where $\mathbf{x}$ stands for inputs. Unlike Tensor.backward, it does not accumulate into .grad: the existing .grad values of leaf tensors are preserved across the call. For chained gradient computations (Hessian-vector products, MAML inner loops, etc.) this is the right primitive.
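
As a sanity check on the vector-Jacobian product, the sketch below uses plain Python lists rather than lucid tensors. It assumes the elementwise function f(x) = x * x, whose Jacobian is diag(2x), and compares the closed-form product against central differences of the seeded scalar. The helper names `vjp_analytic` and `vjp_numeric` are hypothetical.

```python
def f(x):
    """Elementwise square, the same function as z = x * x."""
    return [xi * xi for xi in x]


def vjp_analytic(x, seed):
    # The Jacobian of f is diag(2 * x), so seed^T J is 2 * x * seed elementwise.
    return [2.0 * xi * si for xi, si in zip(x, seed)]


def vjp_numeric(x, seed, eps=1e-6):
    """Central differences of the scalar sum_k seed_k * f(x)_k."""

    def scalar(xs):
        return sum(s * o for s, o in zip(seed, f(xs)))

    grads = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += eps
        lo[i] -= eps
        grads.append((scalar(hi) - scalar(lo)) / (2 * eps))
    return grads


x = [1.0, -2.0, 0.5]
seed = [1.0, 0.5, -3.0]
analytic = vjp_analytic(x, seed)
numeric = vjp_numeric(x, seed)
print(analytic)
print(numeric)
```

With a seed of all ones the result reduces to 2 * x, which is what the scalar `.sum()` case produces.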

Examples

Scalar output — no seed needed:
>>> import lucid
>>> from lucid.autograd import grad
>>> x = lucid.randn(3, requires_grad=True)
>>> y = (x * x).sum()
>>> (gx,) = grad(y, [x])
>>> gx                         # equals 2 * x
Tensor([...])
Vector output — explicit seed:
>>> z = x * x
>>> seed = lucid.ones_like(z)
>>> (gx,) = grad(z, [x], grad_outputs=[seed])
Higher-order with create_graph=True:
>>> y = (x ** 3).sum()
>>> (g,) = grad(y, [x], create_graph=True)
>>> (gg,) = grad(g.sum(), [x])    # second derivative: 6x
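
The higher-order example can be checked by hand. The derivation below is a worked sketch using component indices $i$, $j$ that do not appear in the original; $g$ and $gg$ correspond to the variables `g` and `gg`.

```latex
\begin{aligned}
y &= \sum_i x_i^3, \\
g_j &= \frac{\partial y}{\partial x_j} = 3 x_j^2, \\
\mathit{gg}_j &= \frac{\partial}{\partial x_j} \sum_i g_i
             = \frac{\partial}{\partial x_j} \sum_i 3 x_i^2
             = 6 x_j .
\end{aligned}
```

Differentiating `g.sum()` yields the Hessian applied to a vector of ones, i.e. its row sums. Here the Hessian is diagonal, so that coincides with the elementwise second derivative 6x; for functions that couple the components of x, the two would differ.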