grad

→ tuple[Tensor or None, ...]

grad(outputs: Tensor | list[Tensor], inputs: Tensor | list[Tensor], grad_outputs: list[Tensor] | None = None, retain_graph: bool | None = None, create_graph: bool = False, only_inputs: bool = True, allow_unused: bool = False)

Compute gradients of outputs w.r.t. inputs, returning them as a tuple.

This is the "functional" gradient interface: a single call returns the partial derivatives without writing to .grad on any leaf tensor. It is useful for higher-order differentiation, gradient-based meta-learning, or any pattern where the gradients feed into a new computation instead of an in-place parameter update.

Parameters

outputs: Tensor or list of Tensor

Output tensors to differentiate. Each must require gradients, i.e. be the result of a computation involving at least one tensor with requires_grad=True.

inputs: Tensor or list of Tensor

Input tensors with respect to which gradients are requested. Each must be a leaf tensor, or a non-leaf tensor with requires_grad=True when gradients with respect to intermediate nodes are wanted.

grad_outputs: list of Tensor = None

Seed gradients $\partial \mathcal{L} / \partial \text{outputs}$ for non-scalar outputs. If omitted, outputs is expected to be scalar and an implicit ones_like seed is used.

retain_graph: bool = None

Keep the autograd graph alive after this call so that additional backward passes are possible. Defaults to the value of create_graph.

create_graph: bool = False

If True, build the autograd graph of the gradient computation itself so that the returned tensors are differentiable. Used by gradgradcheck and other higher-order recipes.

only_inputs: bool = True

Reserved for reference-framework compatibility; gradients are always restricted to the requested inputs.

allow_unused: bool = False

If True, return None for any entry of inputs that is not part of the computation graph of outputs. Otherwise, raise an error.

Returns

tuple[Tensor or None, ...]

One gradient per element of inputs, in the same order. Entries are None only when allow_unused=True and the input is disconnected from outputs.
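The return contract and the allow_unused behavior described above can be modeled with a toy scalar reverse-mode autodiff. This is a minimal sketch in plain Python, not lucid's implementation: Var and toy_grad are hypothetical names introduced here for illustration, and only addition and multiplication are supported.

```python
# Toy scalar reverse-mode autodiff; an illustration only, not lucid's code.
# `Var` and `toy_grad` are hypothetical names invented for this sketch.


class Var:
    """A scalar node that records its parents and local partial derivatives."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent Var, d(self)/d(parent))
        self.grad = None        # a backward()-style API would write here

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def toy_grad(output, inputs, allow_unused=False):
    """Return d(output)/d(input) for each input as a tuple; never touch .grad."""
    # Post-order DFS: every node is listed after all of its parents.
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)

    # Reverse sweep: push each node's adjoint to its parents (chain rule).
    adjoint = {id(output): 1.0}  # implicit seed of 1 for a scalar output
    for node in reversed(order):
        for parent, local in node.parents:
            adjoint[id(parent)] = (adjoint.get(id(parent), 0.0)
                                   + adjoint[id(node)] * local)

    results = []
    for inp in inputs:
        if id(inp) in adjoint:
            results.append(adjoint[id(inp)])
        elif allow_unused:
            results.append(None)  # input is disconnected from output
        else:
            raise RuntimeError("an input is not used in the graph of output")
    return tuple(results)


x, w, unused = Var(3.0), Var(2.0), Var(5.0)
y = x * x + w * x  # y = x^2 + w*x
gx, gw, gu = toy_grad(y, [x, w, unused], allow_unused=True)
print(gx, gw, gu)  # dy/dx = 2x + w, dy/dw = x, and None for the unused input
```

The results come back in the order of inputs, the disconnected input maps to None only because allow_unused=True was passed, and the .grad attribute of every node is left as it was.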

Notes

Mathematically, grad computes the vector-Jacobian product

$$\frac{\partial}{\partial \mathbf{x}} \left(\sum_k \text{grad\_outputs}_k \cdot \text{outputs}_k\right)$$

via one reverse-mode pass, where $\mathbf{x}$ denotes inputs and grad_outputs is held constant. Unlike Tensor.backward, it does not accumulate into .grad: the existing .grad values of leaf tensors are preserved across the call. This makes it the right primitive for chained gradient computations such as Hessian-vector products and MAML inner loops.
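The vector-Jacobian identity above can be checked numerically without any autograd library. The sketch below is an illustration in plain Python under stated assumptions: the function f (an elementwise square, mirroring z = x * x), the helper names, and the sample values are all chosen here and do not come from lucid.

```python
# Finite-difference check of the vector-Jacobian product identity.
# All names and values are illustrative assumptions, not part of lucid.


def f(x):
    """Vector-valued function: elementwise square, like z = x * x."""
    return [xi * xi for xi in x]


def weighted_sum(x, seed):
    """Scalar sum_k seed_k * f(x)_k; its gradient in x is the VJP."""
    return sum(s * fk for s, fk in zip(seed, f(x)))


def numerical_vjp(x, seed, eps=1e-6):
    """Central-difference gradient of weighted_sum with respect to each x_i."""
    grads = []
    for i in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[i] += eps
        lo[i] -= eps
        diff = weighted_sum(hi, seed) - weighted_sum(lo, seed)
        grads.append(diff / (2 * eps))
    return grads


x = [1.0, -2.0, 0.5]
seed = [1.0, 0.5, 2.0]  # plays the role of grad_outputs

# The Jacobian of f is diag(2 * x), so seed^T J has entries 2 * x_i * seed_i.
analytic = [2 * xi * si for xi, si in zip(x, seed)]
numeric = numerical_vjp(x, seed)
print(analytic)
print(numeric)
```

The two lists should agree up to floating-point error. A reverse-mode pass obtains the same vector in a single sweep instead of one function evaluation pair per input element.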

Examples

Scalar output, no seed needed:

>>> import lucid
>>> from lucid.autograd import grad
>>> x = lucid.randn(3, requires_grad=True)
>>> y = (x * x).sum()
>>> (gx,) = grad(y, [x])
>>> gx                         # equals 2 * x
Tensor([...])

Vector output, explicit seed:

>>> z = x * x
>>> seed = lucid.ones_like(z)
>>> (gx,) = grad(z, [x], grad_outputs=[seed])    # with a ones seed, again 2 * x

Higher-order with create_graph=True:

>>> y = (x ** 3).sum()
>>> (g,) = grad(y, [x], create_graph=True)
>>> (gg,) = grad(g.sum(), [x])    # second derivative: 6x
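
The higher-order example can be verified by hand. The derivation below follows the two grad calls term by term; the symbols s (for g.sum()) and v are introduced here for illustration only.

```latex
\begin{aligned}
y &= \sum_i x_i^3, \\
g_i &= \frac{\partial y}{\partial x_i} = 3 x_i^2, \\
s &= \sum_i g_i = \sum_i 3 x_i^2, \\
\mathrm{gg}_i &= \frac{\partial s}{\partial x_i} = 6 x_i .
\end{aligned}
```

Summing g before the second call is the special case v = 1 of a Hessian-vector product: differentiating the weighted sum $\sum_i g_i v_i$ for a fixed vector $v$ would give $(Hv)_i = 6 x_i v_i$, since the Hessian of this particular y is diagonal.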
