Tensor
Tensor(data: np.ndarray | list[object] | int | float | bool | Tensor, dtype: _dtype_cls | _C_engine.Dtype | str | None = None, device: _device_cls | str | None = None, requires_grad: bool = False)
Multi-dimensional array with automatic differentiation support.
The central data structure of the Lucid framework. Wraps a C++ TensorImpl
via composition. Tensors live on either cpu (Apple Accelerate) or
metal (Apple Metal GPU) devices.
Parameters
data: array_like
dtype: lucid.dtype = None
    Defaults to lucid.float32.
device: str or lucid.device = None
    Target device ("cpu" or "metal"). Defaults to the global default.
requires_grad: bool = False
    If True, operations on this tensor are recorded for autograd. Default is False.
Examples
>>> import lucid
>>> x = lucid.Tensor([[1.0, 2.0], [3.0, 4.0]])
>>> x.shape
(2, 2)
>>> x.dtype
lucid.float32
Methods (88)
__init__
→ Tensor
__init__(data: np.ndarray | list[object] | int | float | bool | Tensor, dtype: _dtype_cls | _C_engine.Dtype | str | None = None, device: _device_cls | str | None = None, requires_grad: bool = False)
Construct a Tensor from Python data, a NumPy array, or another Tensor.
The data is funnelled through lucid._factories.converters._to_impl,
the canonical bridge between the outside world and Lucid's C++
TensorImpl. Python scalars become 0-d tensors; nested lists are
recursively flattened with shape inference; NumPy arrays cross the
single sanctioned host-to-engine boundary; and existing Tensor
sources are rewrapped (optionally with cast and/or device transfer).
Parameters
data: ndarray | list | int | float | bool | Tensor
    Source data, converted by lucid._factories.converters; another Tensor is shallow-copied to a new Tensor with possibly different dtype / device / requires_grad.
dtype: dtype | str | None = None
    If None, inferred from data.
device: device | str | None = None
    Target device ("cpu" or "metal"). If None, defaults to the source device or "cpu" for host data.
requires_grad: bool = False
Returns
Tensor
    A freshly constructed tensor whose storage is owned by the engine.
Notes
The constructor is one of the six sanctioned host-to-engine bridge boundaries (rule H4). NumPy arrays cross the bridge exactly once here; afterwards the data is owned by Lucid's engine. The resulting layout is C-contiguous row-major, with byte strides
$$\text{stride}_k = s \prod_{j=k+1}^{d-1} n_j, \qquad k = 0, \ldots, d-1,$$
where $s$ is the element size in bytes and $n_0, \ldots, n_{d-1}$ are the dimension sizes.
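The shape inference and row-major flattening of nested lists described for the constructor can be sketched in pure Python. This is an illustration only, not Lucid's _to_impl: the helper names infer_shape and flatten are hypothetical, and raising on ragged input is an assumption.

```python
def infer_shape(data):
    """Return the shape of a rectangular nested list (a scalar has shape ())."""
    if not isinstance(data, (list, tuple)):
        return ()
    if len(data) == 0:
        return (0,)
    inner = [infer_shape(item) for item in data]
    # Assumption: ragged input is rejected rather than padded.
    if any(shape != inner[0] for shape in inner):
        raise ValueError("ragged nested list: inner shapes differ")
    return (len(data),) + inner[0]


def flatten(data):
    """Flatten a nested list into one row-major list of scalars."""
    if not isinstance(data, (list, tuple)):
        return [data]
    flat = []
    for item in data:
        flat.extend(flatten(item))
    return flat


nested = [[1.0, 2.0], [3.0, 4.0]]
print(infer_shape(nested))  # (2, 2)
print(flatten(nested))      # [1.0, 2.0, 3.0, 4.0]
print(infer_shape(3.14))    # ()
```

The flat list plus the inferred shape is all a C-contiguous buffer needs; the strides follow from the shape.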
Examples
>>> import lucid
>>> lucid.Tensor([1.0, 2.0, 3.0])
tensor([1., 2., 3.])
>>> lucid.Tensor([[1, 2], [3, 4]], dtype=lucid.int64).shape
(2, 2)
>>> x = lucid.Tensor(3.14, requires_grad=True)
>>> x.requires_grad
True
impl
→ lucid._C.engine.TensorImpl
impl: _C_engine.TensorImpl
Access the underlying C++ TensorImpl object.
Provides read-only access to the engine-side tensor that backs this Python wrapper. Used by ops and the autograd engine to fetch the native handle without going through Python-level conversions.
Returns
lucid._C.engine.TensorImpl
    The engine tensor object. Its lifetime is tied to self; do not retain references beyond self's lifetime.
Notes
This is an internal accessor exposed for advanced use cases such as
writing custom ops that bind directly against the engine. Public
APIs should compose existing Tensor operations instead.
Logically, this is the identity projection from the Python wrapper onto the engine object it wraps.
Examples
>>> import lucid
>>> x = lucid.zeros(3)
>>> type(x.impl).__name__
'TensorImpl'
shape
→ tuple[int, ...]
shape: tuple[int, ...]
Shape of the tensor as a tuple of integers.
Each element gives the size of the corresponding dimension, with
shape[0] being the outermost (batch) dimension. A scalar tensor
(0-d) has shape == ().
Returns
tuple[int, ...]
    Immutable tuple of dimension sizes.
Notes
For a tensor with $N$ elements arranged in $d$ dimensions $(n_0, \ldots, n_{d-1})$, the total element count satisfies:
$$N = \prod_{k=0}^{d-1} n_k.$$
Examples
>>> import lucid
>>> x = lucid.zeros(3, 4, 5)
>>> x.shape
(3, 4, 5)
>>> lucid.tensor(1.0).shape
()
dtype
→ lucid.dtype
dtype: _dtype_cls
Data type of the tensor elements.
Reflects the numeric format used to store each element in memory. Lucid supports the following dtypes:
- lucid.float32 (default for floating-point)
- lucid.float64
- lucid.float16
- lucid.bfloat16
- lucid.int8, lucid.int16, lucid.int32, lucid.int64
- lucid.bool_
- lucid.complex64
Returns
lucid.dtype
    The element type of this tensor.
Notes
The memory footprint of a tensor is $N \cdot s$ bytes, where $N$ is the total number of elements and $s$ is dtype.itemsize.
Examples
>>> import lucid
>>> lucid.zeros(3).dtype
lucid.float32
>>> lucid.zeros(3, dtype=lucid.int64).dtype
lucid.int64
device
→ lucid.device
device: _device_cls
Device on which this tensor is stored.
Lucid tensors reside on one of two devices:
- cpu: Apple Accelerate (vDSP / vForce / BLAS / LAPACK).
- metal: Apple Metal GPU via the MLX backend.
Returns
lucid.device
    The device object for this tensor.
Notes
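On Apple Silicon the CPU and GPU share the same physical DRAM (unified memory architecture). Moving a tensor between devices changes the logical dispatch target, not the physical bytes, unless the tensor is in a non-shared Metal buffer. The dispatch routing rule is
$$\text{backend}(x) = \begin{cases} \text{Accelerate} & \text{if } x.\text{device} = \text{cpu}, \\ \text{MLX} & \text{if } x.\text{device} = \text{metal}. \end{cases}$$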
Examples
>>> import lucid
>>> x = lucid.zeros(3)
>>> x.device
device(type='cpu')
>>> x.metal().device
device(type='metal')
ndim
→ int
ndim: int
Number of dimensions (rank) of the tensor.
Equivalent to len(tensor.shape). A scalar tensor has ndim == 0,
a vector has ndim == 1, a matrix has ndim == 2, and so on.
Returns
int
    The number of dimensions.
Notes
For a shape tuple $(n_0, \ldots, n_{d-1})$ the rank is simply
$$\text{ndim} = d.$$
Examples
>>> import lucid
>>> lucid.tensor(3.14).ndim
0
>>> lucid.zeros(5).ndim
1
>>> lucid.zeros(2, 3).ndim
2
T
→ Tensor
T: Self
Tensor with all dimensions reversed.
For a 2-D tensor this is the standard matrix transpose. For tensors
with more than two dimensions, the axis order is fully reversed:
axis i maps to axis ndim - 1 - i.
Returns
Tensor
    A view (or copy) with reversed dimension order.
Notes
For a tensor of shape $(n_0, n_1, \ldots, n_{d-1})$, the transposed tensor has shape $(n_{d-1}, \ldots, n_1, n_0)$.
This differs from mT, which only transposes the final two axes.
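The difference between the two transposes is easiest to see on shapes alone. The sketch below works on plain tuples; t_shape and mt_shape are hypothetical helper names, and rejecting inputs with fewer than two dimensions in mt_shape is an assumption rather than documented Lucid behaviour.

```python
def t_shape(shape):
    """Shape after T: every axis reversed."""
    return tuple(reversed(shape))


def mt_shape(shape):
    """Shape after mT: only the last two axes swapped."""
    if len(shape) < 2:
        # Assumption: a matrix transpose needs at least two axes.
        raise ValueError("mT needs at least two dimensions")
    return tuple(shape[:-2]) + (shape[-1], shape[-2])


print(t_shape((2, 3, 4)))   # (4, 3, 2)
print(mt_shape((2, 3, 4)))  # (2, 4, 3)
print(t_shape((2, 3)) == mt_shape((2, 3)))  # True: the two agree for matrices
```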
Examples
>>> import lucid
>>> x = lucid.arange(6).reshape(2, 3)
>>> x.shape
(2, 3)
>>> x.T.shape
(3, 2)
>>> lucid.arange(24).reshape(2, 3, 4).T.shape
(4, 3, 2)
mT
→ Tensor
mT: Self
Tensor with the last two dimensions transposed.
Equivalent to calling lucid.swapaxes(x, -2, -1). Useful for
batched linear-algebra operations where the batch dimensions should
remain untouched.
Returns
Tensor
    A view (or copy) with axes -2 and -1 swapped.
Notes
For a tensor of shape $(\ldots, m, n)$ the result has shape $(\ldots, n, m)$. The leading batch dimensions are unchanged.
Examples
>>> import lucid
>>> x = lucid.zeros(4, 3, 5)
>>> x.mT.shape
(4, 5, 3)
is_metal
→ bool
is_metal: bool
Whether this tensor resides on the Metal (GPU) device.
On Apple Silicon the GPU backend is Apple Metal accessed through the MLX library. CPU tensors use Apple Accelerate instead.
Returns
bool
    True if the tensor is on the Metal device, False for CPU.
Notes
Equivalent to the predicate $\text{device}(x) = \text{metal}$. On Apple Silicon the Metal backend dispatches via MLX, which lazily builds a graph and evaluates on the GPU when results are observed.
Examples
>>> import lucid
>>> x = lucid.zeros(3)
>>> x.is_metal
False
>>> x.metal().is_metal
True
is_shared
→ bool
is_shared: bool
True when backed by a Metal MTLResourceStorageModeShared buffer.
Shared-memory tensors live in Apple Silicon unified DRAM and are
simultaneously accessible from CPU and GPU without a memcpy.
Create them with lucid.metal.shared_tensor() or promote an
existing tensor with lucid.metal.to_shared().
Returns
bool
    True if the tensor's storage uses Metal's shared storage mode, False otherwise (including all pure-CPU tensors and private-mode Metal buffers).
Notes
On Apple Silicon, shared storage permits zero-copy hand-off between CPU and GPU code paths because both engines see the same physical DRAM page. The trade-off is that Metal's GPU caches cannot be used as aggressively as with private storage; for compute-bound GPU kernels, private mode is often preferable. Predicate: $\text{is\_shared}(x) \iff \text{storage\_mode}(x) = \texttt{MTLResourceStorageModeShared}$.
Examples
>>> import lucid
>>> x = lucid.zeros(3)
>>> x.is_shared
False
>>> y = lucid.metal.shared_tensor((3,))
>>> y.is_shared
True
is_leaf
→ bool
is_leaf: bool
Whether this tensor is a leaf in the autograd computation graph.
A tensor is a leaf if it was created directly by the user (not as the
result of an operation) or if it does not require gradients. Only
leaf tensors accumulate gradients into their .grad attribute during
backward.
Returns
bool
    True for leaf tensors, False for intermediate results.
Notes
In the computation graph, leaf nodes are the "inputs" to the
forward pass. All tensors created with requires_grad=False
are leaves by definition. Tensors created with
requires_grad=True directly by the user are also leaves.
Tensors produced by differentiable operations on
requires_grad=True inputs are not leaves — they are
intermediate nodes and their .grad is not retained unless
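retain_grad is called. Predicate:
$$\text{is\_leaf}(x) \iff \neg\,\text{requires\_grad}(x) \;\lor\; \text{grad\_fn}(x) = \text{None}.$$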
Examples
>>> import lucid
>>> x = lucid.tensor([1.0, 2.0], requires_grad=True)
>>> x.is_leaf
True
>>> y = x * 2 # result of an op — not a leaf
>>> y.is_leaf
False
requires_grad
→ bool
requires_grad: bool
Whether gradient computation is enabled for this tensor.
When True, operations involving this tensor are recorded in the
computation graph so that backward can propagate gradients
back through them.
Setting this attribute on a leaf tensor adds it to or removes it from the autograd graph. Setting requires_grad = True on an intermediate (non-leaf) tensor raises a RuntimeError because those tensors are not user-controlled inputs.
Returns
bool
    True if gradients are tracked for this tensor.
Notes
Tensors with requires_grad=True become nodes in the autograd DAG.
Operations consuming them produce non-leaf tensors that also require
gradients (transitive closure of the flag along the forward graph).
Examples
>>> import lucid
>>> x = lucid.tensor([1.0, 2.0])
>>> x.requires_grad
False
>>> x.requires_grad = True
>>> x.requires_grad
True
dim
→ int
dim()
Return the number of dimensions (rank) of the tensor.
This is identical to the ndim property. It exists as a
method for API compatibility with code that calls tensor.dim().
Returns
int
    Number of dimensions.
Notes
Equivalent to len(tensor.shape); matches the conventional rank $d$ such that $\text{shape} \in \mathbb{N}^{d}$.
Examples
>>> import lucid
>>> lucid.zeros(2, 3, 4).dim()
3
>>> lucid.tensor(1.0).dim()
0
size
→ int
size(dim: int | None = None)
Return the size of a specific dimension, or the full shape tuple.
Parameters
dim: int = None
Returns
int
    Size of the requested dimension (when dim is given).
Notes
The full shape and the element count are related by
$$\text{numel} = \prod_{k=0}^{d-1} \text{size}(k).$$
>>> import lucid
>>> x = lucid.zeros(3, 4, 5)
>>> x.size()
(3, 4, 5)
>>> x.size(0)
3
>>> x.size(-1)
5
is_contiguous
→ bool
is_contiguous()
Return True if the tensor's data is stored contiguously in memory.
A contiguous tensor stores its elements in a single unbroken block of memory in C (row-major) order — i.e. the stride of each dimension equals the product of all later dimension sizes times the element size.
Non-contiguous tensors can arise from operations such as slicing,
transposing, or permuting axes. Many C++ kernel paths require
contiguous input; call contiguous to get a contiguous copy
when needed.
Returns
bool
    True if the tensor uses a single contiguous memory block in C-order, False otherwise.
Notes
A tensor of shape $(n_0, \ldots, n_{d-1})$ is contiguous iff its strides satisfy the C-order recurrence
$$\text{stride}[d-1] = 1, \qquad \text{stride}[k] = \text{stride}[k+1] \cdot n_{k+1}$$
(in element units). Many Accelerate/MLX kernels require contiguous inputs; if the tensor is not contiguous, call contiguous first.
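A contiguity check in element units can be sketched in a few lines of pure Python. This is a simplified illustration, not the engine's implementation: the helper names are hypothetical, and the usual relaxation for size-1 dimensions is deliberately ignored.

```python
def contiguous_strides(shape):
    """Element strides of a C-contiguous (row-major) tensor of this shape."""
    strides = [1] * len(shape)
    # Walk from the second-to-last axis outwards, multiplying by the
    # size of the axis immediately to the right.
    for k in range(len(shape) - 2, -1, -1):
        strides[k] = strides[k + 1] * shape[k + 1]
    return tuple(strides)


def is_c_contiguous(shape, strides):
    """True when the given element strides match the row-major ones."""
    return tuple(strides) == contiguous_strides(shape)


print(contiguous_strides((3, 4)))       # (4, 1)
print(is_c_contiguous((3, 4), (4, 1)))  # True
# Shape (4, 3) with strides (1, 4) is a transposed view of a (3, 4) buffer.
print(is_c_contiguous((4, 3), (1, 4)))  # False
```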
Examples
>>> import lucid
>>> x = lucid.zeros(3, 4)
>>> x.is_contiguous()
True
>>> x.T.is_contiguous() # transpose is not contiguous
False
grad
→ Tensor or None
grad: Self | None
Accumulated gradient tensor, or None if not yet computed.
After calling backward, this attribute holds the gradient of the scalar loss $L$ with respect to this tensor, i.e. $\partial L / \partial x$, where $x$ is this tensor.
Gradients are accumulated (added) across multiple backward
calls. Zero the gradient before each optimisation step with
tensor.grad = None or tensor.grad.zero_(), or use an optimiser
that calls zero_grad() automatically.
Only leaf tensors (those created directly by user code with
requires_grad=True) populate this attribute by default.
Non-leaf (intermediate) tensors discard their gradient after the
backward pass unless retain_grad was called.
Returns
Tensor or None
    The accumulated gradient, or None if backward has not been called yet or if this tensor does not require gradients.
Notes
The gradient has the same shape as the tensor itself:
$$\text{shape}\!\left(\frac{\partial L}{\partial x}\right) = \text{shape}(x).$$
Examples
>>> import lucid
>>> x = lucid.tensor([1.0, 2.0, 3.0], requires_grad=True)
>>> y = (x ** 2).sum()
>>> y.backward()
>>> x.grad # d(sum(x^2))/dx = 2x
tensor([2., 4., 6.])
grad_fn
→ lucid._C.engine.Node or None
grad_fn: _C_engine.Node | None
The autograd graph node that created this tensor, or None.
Every tensor produced by a differentiable operation holds a reference
to the C++ Node (gradient function) that can propagate gradients
back through that operation. The graph is a directed acyclic graph
(DAG) of such nodes; backward traverses it in reverse
topological order.
Leaf tensors (created directly, not via ops) always have
grad_fn = None.
Returns
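lucid._C.engine.Node or None
    The gradient-computing node, or None for leaf tensors or tensors that do not require gradients.
Notes
The graph forms the chain-rule factorisation used by backward:
$$\frac{\partial L}{\partial x} = \left(\frac{\partial y}{\partial x}\right)^{\top} \frac{\partial L}{\partial y},$$
where $y = f(x)$ and grad_fn carries the Jacobian-vector product $J_f(x)^{\top}\, \partial L / \partial y$.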
Examples
>>> import lucid
>>> x = lucid.tensor([1.0, 2.0], requires_grad=True)
>>> x.grad_fn is None # leaf — no grad_fn
True
>>> y = x * 3
>>> y.grad_fn is None # result of an op — has a grad_fn
False
requires_grad_
→ Tensor
requires_grad_(requires_grad: bool = True)
Enable or disable gradient tracking for this tensor, in-place.
Unlike the requires_grad property setter, this method returns self so it can be chained inline:
    x = lucid.randn(3, 4).requires_grad_(True)
Parameters
requires_grad: bool = True
    Default is True.
Returns
Tensor
    self with the updated requires_grad flag.
Notes
In-place flag flip: the underlying storage is preserved, but the TensorImpl is replaced with one whose autograd flag is set to requires_grad.
Only valid on leaf tensors; non-leaf tensors inherit the flag
from their producing op and cannot be flipped on directly.
Examples
>>> import lucid
>>> x = lucid.zeros(3)
>>> x.requires_grad_(True).requires_grad
True
retain_grad
→ None
retain_grad()
Retain the gradient on this non-leaf tensor after backward.
By default, gradients are only stored on leaf tensors. Intermediate
(non-leaf) results in the computation graph have their .grad
discarded after the backward pass to save memory. Calling
retain_grad() on an intermediate tensor before the forward pass
instructs the engine to keep that gradient so it can be inspected
afterwards.
Notes
This method must be called before the forward computation whose gradient you want to inspect. Calling it after backward has no effect. Conceptually, it retains $\partial L / \partial y$ for the intermediate node $y$ instead of discarding it after its parents have consumed it.
Examples
>>> import lucid
>>> x = lucid.tensor([1.0, 2.0, 3.0], requires_grad=True)
>>> y = x * 2 # intermediate — grad normally discarded
>>> y.retain_grad()
>>> y.sum().backward()
>>> y.grad # now available: d(sum(y))/dy = [1., 1., 1.]
tensor([1., 1., 1.])
register_hook
→ RemovableHandle
register_hook(hook: Callable[[Tensor], Tensor | None])
Register a hook that fires when this tensor's gradient is computed.
The hook receives the accumulated gradient tensor. If it returns a non-None lucid.Tensor, that value replaces the gradient.
Parameters
hook: callable
    hook(grad: Tensor) -> Tensor | None
Returns
RemovableHandle
    Call .remove() to de-register the hook, or use it as a context manager.
Notes
- For leaf tensors the hook fires after backward accumulates the gradient, which is the common use case (gradient clipping, logging).
- For non-leaf tensors, call retain_grad before the forward pass so the gradient is preserved and available when hooks fire.
- The hook must be registered before the forward computation for non-leaf tensors; for leaf tensors any timing works.
Chain-rule effect: if the hook returns a tensor $g'$, the engine substitutes $g \leftarrow g'$ before continuing backward propagation.
A returned None leaves the gradient untouched.
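The substitution rule for hook return values can be sketched with plain lists standing in for gradient tensors. apply_hooks is a hypothetical helper, and running several hooks in registration order is an assumption; only the "non-None replaces, None keeps" rule comes from the text above.

```python
def apply_hooks(grad, hooks):
    """Run each hook on the gradient; a non-None result replaces it."""
    for hook in hooks:
        result = hook(grad)
        if result is not None:
            grad = result
    return grad


seen = []
hooks = [
    lambda g: seen.append(list(g)),   # list.append returns None: observe only
    lambda g: [2.0 * v for v in g],   # returns a value: replaces the gradient
]
print(apply_hooks([1.0, 1.0, 1.0], hooks))  # [2.0, 2.0, 2.0]
print(seen)                                 # [[1.0, 1.0, 1.0]]
```

A logging hook written as a lambda around list.append is therefore safe: it returns None and leaves the gradient as it was.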
Examples
>>> import lucid
>>> x = lucid.tensor([1.0, 2.0, 3.0], requires_grad=True)
>>> grads = []
>>> h = x.register_hook(lambda g: grads.append(g.clone()))
>>> (x * 2).sum().backward()
>>> grads[0] # tensor([2., 2., 2.])
>>> h.remove() # de-register
backward
→ None
backward(gradient: Tensor | None = None, retain_graph: bool = False, create_graph: bool = False)
Compute gradients via reverse-mode automatic differentiation.
Traverses the computation graph in reverse topological order, applying the chain rule at each node to accumulate $\partial L / \partial x$ into the .grad attribute of every leaf tensor $x$ that has requires_grad=True.
Parameters
gradient: Tensor = None
    Upstream gradient with respect to self. Must have the same shape as self.
    - For scalar tensors (numel() == 1) this may be omitted; Lucid uses an implicit seed of 1.0.
    - For non-scalar tensors it is required. Pass the upstream gradient explicitly (e.g. when differentiating through a loss that is itself a vector).
retain_graph: bool = False
    If False (default), intermediate tensors and gradient functions stored in the computation graph are freed immediately after the backward pass to reclaim memory. Set to True if you need to call backward() again on the same graph (e.g. to compute multiple gradient signals or to inspect intermediate values).
create_graph: bool = False
    If True, the backward pass itself is differentiable, enabling gradients of gradients. Not yet fully supported; accepted for API compatibility.
Raises
RuntimeError
    If self has more than one element and gradient is not provided.
RuntimeError
    If gradient.shape != self.shape.
Notes
The chain rule and reverse-mode AD
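Given a scalar loss $L$ and a sequence of operations $x_k = f_k(x_{k-1})$ for $k = 1, \ldots, n$ with $x_n = L$, the chain rule gives:
$$\frac{\partial L}{\partial x_0} = J_1^{\top} J_2^{\top} \cdots J_n^{\top}\, \frac{\partial L}{\partial L},$$
where $J_k = \partial x_k / \partial x_{k-1}$ is the Jacobian of the $k$-th operation. Reverse-mode AD (backpropagation) evaluates this product right-to-left, starting from the scalar output with seed gradient $\partial L / \partial L = 1$, so it computes gradients for all inputs in a single backward pass: $O(1)$ passes regardless of the number of parameters.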
Gradient accumulation
Gradients are added to tensor.grad rather than overwritten.
This is intentional: it supports patterns like accumulated gradient
steps. Zero gradients explicitly before each optimisation step:
    for param in model.parameters():
        param.grad = None  # or param.grad.zero_()
Metal flush
On Metal devices Lucid calls TensorImpl::eval() before the
backward pass. This forces MLX to evaluate the forward graph eagerly
so that the backward kernel sees concrete values rather than deferred
MLX computations. In practice this yields roughly 2× faster
backward passes for typical model sizes.
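The reverse sweep and the accumulate-into-grad behaviour can be illustrated with a toy scalar autodiff in pure Python. This is a teaching sketch under stated assumptions, not Lucid's engine: the Var class and its fields are hypothetical, it handles scalars only, and it never frees the graph, so every call behaves as if retain_graph=True.

```python
class Var:
    """Scalar node in a tiny reverse-mode autodiff graph (illustration only)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent Var, local derivative)
        self.grad = 0.0

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def backward(self, seed=1.0):
        # Post-order DFS gives a topological order; sweep it in reverse.
        order, visited = [], set()

        def visit(node):
            if id(node) in visited:
                return
            visited.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

        visit(self)
        upstream = {id(node): 0.0 for node in order}
        upstream[id(self)] = seed
        for node in reversed(order):
            if not node.parents:
                node.grad += upstream[id(node)]  # leaves accumulate
            for parent, local in node.parents:
                upstream[id(parent)] += upstream[id(node)] * local


x = Var(2.0)
y = x * x * x   # y = x^3, so dy/dx = 3 * x^2 = 12
y.backward()
print(x.grad)   # 12.0
y.backward()    # a second pass adds to the stored gradient
print(x.grad)   # 24.0
x.grad = 0.0    # reset before the next optimisation step
```

The second call doubling the stored value is the same accumulation shown in the retain_graph example, which is why gradients are cleared between optimisation steps.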
Examples
Scalar output — no explicit gradient seed needed:
>>> import lucid
>>> x = lucid.tensor([1.0, 2.0, 3.0], requires_grad=True)
>>> loss = (x ** 2).sum() # scalar
>>> loss.backward()
>>> x.grad # d(sum(x^2))/dx = 2x
tensor([2., 4., 6.])
Non-scalar output — must supply a gradient seed:
>>> import lucid
>>> x = lucid.tensor([1.0, 2.0], requires_grad=True)
>>> y = x * 3 # shape (2,) — not a scalar
>>> y.backward(lucid.ones(2))
>>> x.grad # d(3x)/dx = 3 for each element
tensor([3., 3.])
Multiple backward passes with retain_graph=True:
>>> import lucid
>>> x = lucid.tensor([2.0], requires_grad=True)
>>> y = x ** 3
>>> y.backward(retain_graph=True) # first pass
>>> y.backward() # second pass — accumulates
>>> x.grad # 3*x^2 + 3*x^2 = 2 * 12 = 24
tensor([24.])
clamp_min_
→ Tensor
clamp_min_(min: float)
Raise all elements below min to min, in-place.
Shorthand for clamp_(min=min).
Parameters
min: float
    Elements below min are set to min.
Returns
Tensor
    self after clamping.
Notes
Pointwise rectification:
$$x_i \leftarrow \max(x_i, m), \qquad m = \text{min}.$$
The non-smooth point at $x_i = m$ has subgradient $[0, 1]$; autograd selects 1 for $x_i > m$ and 0 otherwise.
Examples
>>> import lucid
>>> x = lucid.tensor([-1.0, 0.0, 1.0])
>>> x.clamp_min_(0.0)
tensor([0., 0., 1.])
clamp_max_
→ Tensor
clamp_max_(max: float)
Lower all elements above max to max, in-place.
Shorthand for clamp_(max=max).
Parameters
max: float
    Elements above max are set to max.
Returns
Tensor
    self after clamping.
Notes
Pointwise saturation:
$$x_i \leftarrow \min(x_i, M), \qquad M = \text{max}.$$
The gradient is the indicator $\mathbb{1}[x_i < M]$.
Examples
>>> import lucid
>>> x = lucid.tensor([0.5, 1.5, 2.5])
>>> x.clamp_max_(2.0)
tensor([0.5, 1.5, 2. ])
item
→ float or int or bool
item()
Return the value of a single-element tensor as a Python scalar.
Delegates to the engine's TensorImpl::item which performs the
single-element extraction (including IEEE-754 binary16 to float
decoding) without going through numpy.
Returns
float or int or bool
    The unboxed Python scalar. Floating-point dtypes (including float16 / bfloat16) return float; integer dtypes return int; bool_ returns bool.
Raises
RuntimeError
    If the tensor does not contain exactly one element (numel() != 1).
Notes
Triggers a device-to-host synchronisation when called on a Metal
tensor — the value cannot be inspected until any pending MLX
computation has completed. Avoid calling in tight loops on GPU
tensors; prefer tensor.cpu().tolist() for batch extraction.
Defined only when $\text{numel}(x) = 1$. item
is one of the sanctioned engine-to-host bridge points (rule H4)
and detaches from autograd: the returned Python scalar carries no
gradient information.
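A float32 element widened to a Python float (binary64) keeps the float32 rounding error, which is why 3.14 does not come back as 3.14. The effect can be reproduced with the standard library alone; as_float32 is a hypothetical helper used only for illustration.

```python
import struct


def as_float32(value):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


print(as_float32(3.14))        # 3.140000104904175
print(as_float32(0.5) == 0.5)  # True: 0.5 is exactly representable
```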
Examples
>>> import lucid
>>> lucid.tensor(3.14).item()
3.140000104904175
>>> lucid.tensor(7, dtype=lucid.int64).item()
7
numpy
→ numpy.ndarray
numpy()
Return the tensor as a NumPy array (CPU only).
Imports numpy lazily — the rest of Lucid stays numpy-free unless
the user explicitly bridges through this method. When numpy is
not installed, raises an ImportError pointing at
pip install lucid[numpy].
Returns
numpy.ndarray
    A NumPy view (or copy) of the tensor's data. Shape and dtype mirror the source; the array lives in host memory.
Raises
ImportError
    If numpy is not installed.
RuntimeError
Notes
This is one of the sanctioned bridge points between Lucid and the
outside world (see project rule H4). The returned ndarray
does not participate in autograd; downstream NumPy operations
will not produce gradients.
Layout is C-contiguous; for a tensor of shape $(n_0, \ldots, n_{d-1})$ and itemsize $s$, the NumPy strides equal
$$\text{strides}[k] = s \prod_{j=k+1}^{d-1} n_j.$$
Examples
>>> import lucid
>>> x = lucid.tensor([[1.0, 2.0], [3.0, 4.0]])
>>> arr = x.numpy()
>>> arr.shape
(2, 2)
>>> arr.dtype
dtype('float32')
tolist
→ list or int or float or bool or complex
tolist()
Return the tensor contents as a nested Python list (or scalar).
Converts the tensor to a standard Python object:
- 0-d tensor → a Python scalar (int, float, bool, or complex).
- 1-d tensor → a flat list.
- N-d tensor → a nested list of depth N.
Returns
list or int or float or bool or complex
    Nested Python representation of the tensor data.
Notes
Numpy-free for every supported dtype (F16 / F32 / F64 / I8 / I16 /
I32 / I64 / Bool / C64). Delegates to the engine-side
TensorImpl.tolist(), which mirrors item()'s dtype dispatch
but walks the full shape recursively and yields Python complex
for C64 leaves. Forces a device-to-host synchronisation for
Metal tensors via the underlying to_bytes() snapshot.
Autograd information is dropped — the returned Python objects are
pure value copies.
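The recursive walk over the shape can be sketched on a flat row-major buffer. This is a pure-Python illustration, not the engine-side TensorImpl.tolist(); to_nested is a hypothetical name and the buffer is an ordinary list.

```python
def to_nested(flat, shape):
    """Rebuild a nested list from a flat row-major buffer and a shape."""
    if len(shape) == 0:
        return flat[0]               # 0-d tensor: a bare scalar
    if len(shape) == 1:
        return list(flat[:shape[0]])
    inner = 1                        # elements per slice along axis 0
    for n in shape[1:]:
        inner *= n
    return [to_nested(flat[i * inner:(i + 1) * inner], shape[1:])
            for i in range(shape[0])]


print(to_nested([1, 2, 3, 4], (2, 2)))  # [[1, 2], [3, 4]]
print(to_nested([7], ()))               # 7
```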
Examples
>>> import lucid
>>> lucid.tensor([[1, 2], [3, 4]]).tolist()
[[1, 2], [3, 4]]
>>> lucid.tensor(3.14).tolist()
3.140000104904175
unfold
→ Tensor
unfold(dimension: int, size: int, step: int)
Return a view with an extra dimension containing sliding-window slices.
Extracts non-overlapping or overlapping windows of length size
along dimension, advancing by step elements between windows.
The output has one extra trailing dimension compared to the input.
Parameters
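dimension: int
size: int
step: int
Returns
Tensor
    Tensor of shape $(n_0, \ldots, n_{\text{dim}-1}, W, n_{\text{dim}+1}, \ldots, n_{d-1}, \text{size})$, where $W = \lfloor (n - \text{size}) / \text{step} \rfloor + 1$ and $n$ is the original size along dimension.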
Notes
Unfold is the fundamental primitive behind 1-D convolution and
sliding-window aggregations. The windows may overlap when
step < size.
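For a 1-D input the window count and contents are easy to check by hand. The sketch below uses a plain list in place of a tensor; unfold_1d is a hypothetical helper, and rejecting a window longer than the input is an assumption.

```python
def unfold_1d(values, size, step):
    """Return sliding windows of length `size`, advancing by `step`."""
    n = len(values)
    if size > n or step < 1:
        # Assumption: invalid window parameters are rejected.
        raise ValueError("need size <= len(values) and step >= 1")
    count = (n - size) // step + 1   # number of complete windows
    return [values[i * step:i * step + size] for i in range(count)]


windows = unfold_1d(list(range(8)), size=3, step=2)
print(windows)       # [[0, 1, 2], [2, 3, 4], [4, 5, 6]]
print(len(windows))  # 3
```

With step < size consecutive windows share elements; with step == size they tile the input without overlap.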
Examples
>>> import lucid
>>> x = lucid.arange(8, dtype=lucid.float32)
>>> x.unfold(0, size=3, step=2).shape
(3, 3)
>>> # windows: [0,1,2], [2,3,4], [4,5,6]
data
→ Tensor
data: Self
The tensor's underlying data, detached from gradient tracking.
Returns a view of the same storage as self but with
requires_grad=False and no grad_fn. Assigning to
tensor.data replaces the underlying storage in-place without
affecting the autograd graph — useful for in-place weight updates
that should not be tracked.
Returns
Tensor
    A non-differentiable view of the same data.
Notes
Aliases the same storage as self with the gradient flag
cleared. Mathematically the same identity as detach —
equal values, zero Jacobian — but unlike detach writes
through data propagate back to self's underlying buffer
without participating in autograd. Prefer detach for new
code; data is retained for API compatibility.
Examples
>>> import lucid
>>> x = lucid.tensor([1.0, 2.0], requires_grad=True)
>>> x.data.requires_grad
False
>>> x.data
tensor([1., 2.])
new_empty
→ Tensor
new_empty(size: int = (), dtype: _dtype_cls | None = None, device: _device_cls | str | None = None, requires_grad: bool = False)
Return an uninitialized tensor of the given shape.
The returned tensor inherits this tensor's dtype and device
unless overridden. The contents are undefined — do not read
values without first writing them.
Parameters
*size: int = ()
dtype: lucid.dtype = None
    Defaults to self.dtype.
device: str or lucid.device = None
    Defaults to self.device.
requires_grad: bool = False
    Default is False.
Returns
Tensor
    Uninitialized tensor of shape size.
Notes
Allocates $s \prod_{k} n_k$ bytes of uninitialised memory, where $n_k$ are the requested dimension sizes and $s$ is dtype.itemsize. Faster than
new_zeros because the engine skips the zero-fill kernel.
Must be followed by a write before any read.
Examples
>>> import lucid
>>> x = lucid.zeros(2, 3, dtype=lucid.float64)
>>> y = x.new_empty(4, 5)
>>> y.shape, y.dtype
((4, 5), lucid.float64)
new_zeros
→ Tensor
new_zeros(size: int = (), dtype: _dtype_cls | None = None, device: _device_cls | str | None = None, requires_grad: bool = False)
Return a zero-filled tensor of the given shape.
The returned tensor inherits this tensor's dtype and device
unless overridden. All elements are initialised to 0.
Parameters
*size: int = ()
dtype: lucid.dtype = None
    Defaults to self.dtype.
device: str or lucid.device = None
    Defaults to self.device.
requires_grad: bool = False
    Default is False.
Returns
Tensor
    Zero tensor of shape size.
Notes
Initialises every element to the additive identity $0$. The
zero-fill is delegated to a fused engine kernel — Accelerate
catlas_*set on CPU and MLX broadcast-fill on Metal.
Examples
>>> import lucid
>>> x = lucid.ones(2, dtype=lucid.int32)
>>> x.new_zeros(3, 3)
tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])
new_ones
→ Tensor
new_ones(size: int = (), dtype: _dtype_cls | None = None, device: _device_cls | str | None = None, requires_grad: bool = False)
Return an all-ones tensor of the given shape.
The returned tensor inherits this tensor's dtype and device
unless overridden. All elements are initialised to 1.
Parameters
*size: int = ()
dtype: lucid.dtype = None
    Defaults to self.dtype.
device: str or lucid.device = None
    Defaults to self.device.
requires_grad: bool = False
    Default is False.
Returns
Tensor
    All-ones tensor of shape size.
Notes
Initialises every element to the multiplicative identity $1$.
For arbitrary fill values use new_full.
Examples
>>> import lucid
>>> x = lucid.zeros(2, dtype=lucid.float16)
>>> x.new_ones(2, 4)
tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.]])
new_full
→ Tensor
new_full(size: tuple[int, ...], fill_value: float, dtype: _dtype_cls | None = None, device: _device_cls | str | None = None, requires_grad: bool = False)
Return a tensor of the given shape filled with a constant value.
The returned tensor inherits this tensor's dtype and device
unless overridden. Every element is set to fill_value.
Parameters
size: tuple[int, ...]
fill_value: float
dtype: lucid.dtype = None
    Defaults to self.dtype.
device: str or lucid.device = None
    Defaults to self.device.
requires_grad: bool = False
    Default is False.
Returns
Tensor
    Constant-filled tensor of shape size.
Notes
Constant tensor $c \cdot \mathbf{1}$, where $c$ is fill_value and $\mathbf{1}$ is the all-ones tensor of the given shape. fill_value is promoted to dtype before the broadcast fill.
Examples
>>> import lucid
>>> x = lucid.zeros(1)
>>> x.new_full((2, 3), fill_value=7.0)
tensor([[7., 7., 7.],
        [7., 7., 7.]])
new_tensor
→ Tensor
new_tensor(data: np.ndarray | list[object] | int | float | bool | Tensor, dtype: _dtype_cls | None = None, device: _device_cls | str | None = None, requires_grad: bool = False)
Return a new tensor constructed from data, inheriting dtype/device.
Creates a fresh tensor from the provided data, using this tensor's
dtype and device as defaults. The data is always copied;
the result does not share storage with data even if data is
already a Tensor.
Parameters
data: array_like or Tensor
dtype: lucid.dtype = None
    Defaults to self.dtype.
device: str or lucid.device = None
    Defaults to self.device.
requires_grad: bool = False
    Default is False.
Returns
Tensor
    New tensor containing a copy of data.
Notes
Always copies — never aliases data's storage even when data
is already a Tensor. Routes through the same
_to_impl bridge boundary as the public constructor (rule H4).
Examples
>>> import lucid
>>> proto = lucid.zeros(1, dtype=lucid.float64)
>>> t = proto.new_tensor([[1, 2], [3, 4]])
>>> t.dtype
lucid.float64
element_size
→ int
element_size()
Return the size in bytes of a single element.
Determined entirely by dtype. Common values:
- float32 / int32 → 4 bytes
- float64 / int64 → 8 bytes
- float16 / bfloat16 / int16 → 2 bytes
- int8 / bool_ → 1 byte
- complex64 → 8 bytes (two 32-bit floats)
Returns
int
    Number of bytes per element.
Notes
The total memory footprint of the tensor is:
$$\text{nbytes} = \text{numel} \times \text{element\_size}.$$
Examples
>>> import lucid
>>> lucid.zeros(3, dtype=lucid.float32).element_size()
4
>>> lucid.zeros(3, dtype=lucid.float64).element_size()
8
itemsize
→ int
itemsize: int
Bytes per element; alias for element_size.
Provided as a property (rather than a method) for NumPy-style
attribute access: tensor.itemsize instead of
tensor.element_size().
Returns
int
    Number of bytes per element.
Notes
Total footprint of the tensor satisfies $\text{nbytes} = \text{numel} \cdot \text{itemsize}$.
Examples
>>> import lucid
>>> lucid.zeros(5, dtype=lucid.int16).itemsize
2
nbytes
→ int
nbytes: int
Total number of bytes occupied by the tensor's data buffer.
Equals numel() * element_size(). Useful for estimating memory
usage and for setting buffer sizes when interoperating with C or
Metal shaders.
Returns
int
    Total byte count of the data storage.
Notes
$$\text{nbytes} = s \prod_{k=0}^{d-1} n_k,$$
where $n_k$ are the dimension sizes and $s$ is the element size in bytes.
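The byte count can be checked by hand from the shape and the per-dtype element sizes listed under element_size. The sketch below is illustrative only: the ITEMSIZE table and the nbytes helper are hypothetical stand-ins keyed by dtype name, not Lucid objects.

```python
import math

# Bytes per element, as listed in the element_size entry.
ITEMSIZE = {
    "float16": 2, "bfloat16": 2, "float32": 4, "float64": 8,
    "int8": 1, "int16": 2, "int32": 4, "int64": 8,
    "bool_": 1, "complex64": 8,
}


def nbytes(shape, dtype_name):
    """Total buffer size in bytes: element count times element size."""
    return math.prod(shape) * ITEMSIZE[dtype_name]


print(nbytes((4, 4), "float32"))  # 64
print(nbytes((4, 4), "float64"))  # 128
print(nbytes((), "int64"))        # 8: a 0-d tensor holds one element
```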
Examples
>>> import lucid
>>> lucid.zeros(4, 4, dtype=lucid.float32).nbytes
64
>>> lucid.zeros(4, 4, dtype=lucid.float64).nbytes
128
stride
→ tuple[int, ...] or int
stride(dim: int | None = None)
Return the strides of the tensor in element counts.
Parameters
dim: int = None
Returns
tuple[int, ...] or int
    Element-count strides (same semantics as the reference framework).
Notes
For a C-contiguous tensor of shape $(n_0, \ldots, n_{d-1})$ the element strides satisfy the row-major recurrence
$$\text{stride}[d-1] = 1, \qquad \text{stride}[k] = \text{stride}[k+1] \cdot n_{k+1}.$$
Non-contiguous tensors (e.g. transposed or sliced views) may have arbitrary strides; the address of element $[i_0, \ldots, i_{d-1}]$ is $\text{base} + \sum_k i_k \cdot \text{stride}[k]$.
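The address formula is a dot product between the index and the strides. A minimal sketch in element units, with base taken as 0; element_offset is a hypothetical helper name.

```python
def element_offset(index, strides):
    """Offset (in elements) of `index` from the start of the buffer."""
    return sum(i * s for i, s in zip(index, strides))


# Contiguous (3, 4) tensor: strides (4, 1).
print(element_offset((2, 3), (4, 1)))  # 11
print(element_offset((1, 0), (4, 1)))  # 4
# The same buffer viewed as its (4, 3) transpose has strides (1, 4);
# view element [3, 2] is original element [2, 3].
print(element_offset((3, 2), (1, 4)))  # 11
```

Multiplying the result by the element size gives the byte offset.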
Examples
>>> import lucid
>>> lucid.zeros(3, 4).stride()
(4, 1)
>>> lucid.zeros(3, 4).stride(0)
4
data_ptr
→ int
data_ptr()
Return the address of the first element as an integer.
On Apple Silicon the tensor lives in unified memory; this method
returns a best-effort identifier derived from the storage object.
Use numpy + ndarray.ctypes.data for interop that
requires the actual pointer.
Returns
int
    A stable, process-unique integer suitable for aliasing checks (e.g. "do these two tensors share storage?"). Not guaranteed to be the raw memory address.
Notes
On Apple Silicon CPU and GPU share unified DRAM, so the same $\text{data\_ptr}(x)$ identifies a buffer addressable from both backends. The value is stable for the lifetime of self; equality is the canonical aliasing predicate $\text{alias}(x, y) \iff \text{data\_ptr}(x) = \text{data\_ptr}(y)$.
Examples
>>> import lucid
>>> x = lucid.zeros(3)
>>> y = x
>>> x.data_ptr() == y.data_ptr()
True
>>> z = lucid.zeros(3)
>>> x.data_ptr() != z.data_ptr()
True
storage_offset
→ int
storage_offset()
Return the offset (in elements) of the first element in storage.
Contiguous tensors always return 0. Non-contiguous view tensors
may return a non-zero offset in frameworks that support strided
views; Lucid currently represents all tensors as contiguous so
this always returns 0.
Returns
int
    The element offset into the underlying storage where self begins. Always 0 in the current Lucid implementation.
Notes
In Lucid every tensor owns a fresh contiguous storage, so the offset is identically zero: $\text{storage\_offset}(x) = 0$. Frameworks that support sub-views over a shared buffer use this for pointer arithmetic; for Lucid it is provided purely for API compatibility.
Examples
>>> import lucid
>>> lucid.zeros(3, 4).storage_offset()
0H
→TensorH: TensorConjugate (Hermitian) transpose of the last two axes.
For real-valued tensors this is identical to mT. For
complex tensors the elements are conjugated before transposing,
producing the Hermitian adjoint familiar
from linear algebra: :math:`A^{H} = \overline{A}^{\top}`, i.e. :math:`(A^{H})_{ij} = \overline{A_{ji}}`.
Returns
TensorA tensor of shape :math:`(\ldots, n, m)` for an input of
shape :math:`(\ldots, m, n)`. Real inputs round-trip through
mT; complex inputs are first conjugated by
lucid._ops.composite.conj.
Notes
Equivalent to x.conj().mT for complex tensors. For batched
linear algebra the leading dimensions are untouched.
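As a minimal sketch of the conjugate-then-transpose rule, the following uses nested Python lists of `complex` values rather than Lucid tensors; `hermitian` is a hypothetical helper, not part of the Lucid API.

```python
def hermitian(matrix):
    """Conjugate transpose of a 2-D nested list of Python numbers."""
    rows, cols = len(matrix), len(matrix[0])
    # Entry (j, i) of the result is the conjugate of entry (i, j) of the input.
    return [[complex(matrix[i][j]).conjugate() for i in range(rows)]
            for j in range(cols)]


a = [[1 + 2j, 3 - 1j, 0j],
     [4j, 5 + 0j, 2 + 2j]]
a_h = hermitian(a)  # shape (2, 3) becomes (3, 2)
```

Applying `hermitian` twice returns the original matrix, which is a quick sanity check on any implementation.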
Examples
>>> import lucid
>>> x = lucid.arange(6).reshape(2, 3)
>>> x.H.shape
(3, 2)type
→strtype(dtype: str | None = None)Return or cast the tensor type using legacy type strings.
- t.type(): return a string like 'lucid.FloatTensor'.
- t.type('lucid.DoubleTensor'): cast and return the new tensor.
Supported type strings: FloatTensor, DoubleTensor,
HalfTensor, IntTensor, LongTensor, BoolTensor,
ShortTensor, ByteTensor.
Parameters
dtypestr= NoneNone, returns the type string
of self; otherwise casts to the corresponding dtype.Returns
strThe legacy type label when dtype is None. When dtype is given, the cast tensor is returned instead.
Raises
TypeErrordtype is not one of the supported legacy strings.Notes
Provided purely for API compatibility with code that uses legacy
type strings. New code should use dtype for inspection and
to for casting — both avoid stringly-typed dispatch.
Examples
>>> import lucid
>>> lucid.zeros(3).type()
'lucid.FloatTensor'
>>> lucid.zeros(3).type('lucid.LongTensor').dtype
lucid.int64get_device
→intget_device()Return the device index.
Returns 0 for Metal (GPU) tensors and -1 for CPU tensors,
following the convention adopted by the reference framework.
Returns
int0 for Metal-resident tensors; -1 for CPU-resident tensors.
Notes
Indicator-style encoding:
:math:`\text{get\_device}(x) = 0` if :math:`x` is Metal-resident and :math:`-1` otherwise.
Lucid only supports a single Metal device on Apple Silicon, so a
non-negative index is always 0.
Examples
>>> import lucid
>>> lucid.zeros(3).get_device()
-1
>>> lucid.zeros(3).metal().get_device()
0pin_memory
→Tensorpin_memory(device: object = None)Return self — pinned memory is a no-op on Apple Silicon.
Apple Silicon uses unified memory: CPU and GPU share the same physical DRAM, so the "page-lock host memory to accelerate host-to-device DMA" concept from discrete-GPU frameworks does not apply. This method exists for API compatibility and simply returns the tensor unchanged.
Parameters
deviceobject= NoneReturns
Tensorself, untouched.
Notes
See also is_pinned (always False) and
share_memory_ (also a no-op). The function is the
identity: :math:`\text{pin\_memory}(x) = x`.
Examples
>>> import lucid
>>> x = lucid.zeros(3)
>>> x.pin_memory() is x
Trueis_pinned
→boolis_pinned(device: object = None)Return False — pinned memory is not applicable on Apple Silicon.
Apple Silicon's unified-memory architecture makes the distinction
between "pageable" and "page-locked" host memory irrelevant: CPU
and GPU already see the same DRAM. This predicate is provided for
API compatibility and always reports False.
Parameters
deviceobject= NoneReturns
boolAlways False.
Notes
Identically false: :math:`\text{is\_pinned}(x) = \text{False}`. Unified memory makes the concept moot: all host buffers are already DMA-accessible to the GPU without page-locking.
Examples
>>> import lucid
>>> lucid.zeros(3).is_pinned()
Falseis_cuda
→boolis_cuda: boolReturn False — Lucid does not target NVIDIA GPUs.
Lucid is Apple-Silicon-exclusive: the GPU stream is MLX-on-Metal,
not NVIDIA's discrete GPU stack. Use is_metal to detect
GPU-resident tensors. This property exists purely for API
compatibility with code paths that probe for the legacy attribute.
Returns
boolAlways False.
Notes
Identically false: :math:`\text{is\_cuda}(x) = \text{False}`.
Use is_metal to query GPU residency on Apple Silicon.
Examples
>>> import lucid
>>> lucid.zeros(3).is_cuda
Falsereshape_as
→Tensorreshape_as(other: Tensor)Return a tensor with the same data reshaped to other.shape.
Convenience wrapper around reshape that takes the target
shape from another tensor instead of as a tuple. Element count
must agree: :math:`\prod_i s_i = \prod_j t_j`, where :math:`s` is self.shape and :math:`t` is other.shape.
Parameters
otherTensorother.shape is
consulted; the values and dtype of other are ignored.Returns
TensorA view (or copy if storage layout requires) of self with
shape other.shape.
Raises
RuntimeErrorself.numel() != other.numel().Notes
Reshape preserves element order under row-major (C) traversal;
the element at flat index :math:`k`
in self becomes the element at the same flat index in the
result. A view is returned when the source is contiguous;
otherwise a contiguous copy is materialised first.
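The order-preserving property can be sketched with plain lists. This is not Lucid code; `reshape_flat` is a hypothetical helper, and it raises `ValueError` where the documented method raises `RuntimeError`.

```python
from math import prod


def reshape_flat(flat, shape):
    """Rebuild a nested list of `shape` from a flat list in row-major order."""
    if prod(shape) != len(flat):
        raise ValueError("element counts differ")
    if not shape:
        return flat[0]  # 0-d case: a single element
    step = prod(shape[1:])  # elements covered by one step along axis 0
    return [reshape_flat(flat[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]


nested = reshape_flat(list(range(12)), (3, 4))
# Flat index k lands at nested[k // 4][k % 4].
```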
Examples
>>> import lucid
>>> x = lucid.arange(12)
>>> proto = lucid.zeros(3, 4)
>>> x.reshape_as(proto).shape
(3, 4)untyped_storage
→Tensor._UntypedStorageuntyped_storage()Return a minimal storage view of the underlying data buffer.
The returned object exposes data_ptr, size,
and nbytes — the subset needed for common introspection
patterns (aliasing checks, memory-footprint accounting, debug
printing).
Returns
Tensor._UntypedStorageLightweight storage handle whose __len__ and size
report the byte count of the buffer.
Notes
Lucid does not currently expose a fully-featured Storage type;
untyped_storage is the introspection-only minimum. Mutating
through this handle is not supported. The reported size satisfies
:math:`\text{size} = \text{numel} \times \text{bytes per element}`.
Examples
>>> import lucid
>>> x = lucid.zeros(4, dtype=lucid.float32)
>>> s = x.untyped_storage()
>>> s.size()
16
>>> len(s)
16share_memory_
→Tensorshare_memory_()Mark storage as shareable across processes — a no-op on Apple Silicon.
On platforms with separate CPU and GPU address spaces this method
moves the storage into a shared-memory segment so that worker
processes (e.g. DataLoader workers) can read it without copying.
Apple Silicon's unified-memory architecture makes this unnecessary:
all tensors are already addressable from every process that holds
a reference to the buffer. Returns self for chaining.
Returns
Tensorself, unchanged.
Notes
Provided for API compatibility. See also is_pinned and
pin_memory, which are no-ops for the same reason. The
operation is the identity:
:math:`\text{share\_memory\_}(x) = x`.
Examples
>>> import lucid
>>> x = lucid.zeros(3)
>>> x.share_memory_() is x
Truefill_
→Tensorfill_(value: float)Fill the tensor with a scalar value in-place.
Mutates the tensor's storage so every element becomes value.
Implemented by materialising a constant tensor with
_C_engine.full and copying it into self's storage; the
original TensorImpl (and thus identity) is preserved.
Parameters
valuefloatReturns
Tensorself (in-place); useful for method chaining.
Notes
In-place operations bypass autograd's view tracking for
performance. Calling fill_ on a leaf tensor with
requires_grad=True may raise a runtime error from the
autograd engine.
Mathematically the result is the constant tensor
:math:`c \cdot \mathbf{1}` with the same shape as self, where :math:`c` is value;
every entry satisfies :math:`x_i = c`.
Examples
>>> import lucid
>>> x = lucid.empty(3)
>>> _ = x.fill_(0.5)
>>> x.tolist()
[0.5, 0.5, 0.5]copy_
→Tensorcopy_(other: Self)Copy data from other into self in-place.
Overwrites self's storage with other's values. Broadcasting
is permitted: other may have a shape that broadcasts to
self.shape. The dtype of self is preserved; other
values are cast as necessary.
Parameters
otherTensorReturns
Tensorself after the copy.
Notes
Unlike clone, copy_ does not allocate a new tensor —
only self's storage is written. The autograd graph is not
extended by this operation. Element-wise:
:math:`\text{self}_i \leftarrow \text{other}_i`, with standard right-aligned broadcasting from other.shape to
self.shape.
Examples
>>> import lucid
>>> a = lucid.zeros(3)
>>> b = lucid.tensor([1.0, 2.0, 3.0])
>>> _ = a.copy_(b)
>>> a.tolist()
[1.0, 2.0, 3.0]expand_as
→Tensorexpand_as(other: Self)Broadcast self to match other.shape without copying data.
Convenience wrapper around broadcast_to that takes the target
shape from another tensor. The expansion is view-like: the
underlying storage is not duplicated, and the broadcast dimensions
are realised by zero-stride entries in the resulting tensor.
Parameters
otherTensorother.shape is
consulted.Returns
TensorA view of self with shape other.shape.
Notes
Broadcasting rules follow the standard right-aligned semantics:
each dimension of self.shape must either equal the
corresponding entry of other.shape or be 1. Size-1 axes
are stretched by setting the corresponding stride to zero, so the
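resulting view aliases the source storage. Formally, for each axis
:math:`k`, either :math:`s_k = t_k` or :math:`s_k = 1` (with :math:`s` = self.shape and :math:`t` = other.shape), with stride :math:`0` on stretched axes.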
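The zero-stride trick can be illustrated without any tensor library. The sketch below is an assumption-laden model of a strided view over a flat Python list; `expand_strides` is a hypothetical helper and says nothing about how Lucid stores strides internally.

```python
def expand_strides(shape, strides, target):
    """Strides of a broadcast view of `shape` expanded to `target`."""
    pad = len(target) - len(shape)
    if pad < 0:
        raise ValueError("target has fewer dimensions than the source")
    # Right-align: missing leading axes behave like size-1 axes.
    shape = (1,) * pad + tuple(shape)
    strides = (0,) * pad + tuple(strides)
    out = []
    for size, stride, want in zip(shape, strides, target):
        if size == want:
            out.append(stride)
        elif size == 1:
            out.append(0)  # stretched axis: every index reads the same element
        else:
            raise ValueError(f"cannot expand size {size} to {want}")
    return tuple(out)


storage = [1.0, 2.0, 3.0]  # backing buffer of a (1, 3) tensor
view_strides = expand_strides((1, 3), (3, 1), (4, 3))
rows = [[storage[i * view_strides[0] + j * view_strides[1]] for j in range(3)]
        for i in range(4)]
```

All four rows read the same three stored elements, so no data is duplicated.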
Examples
>>> import lucid
>>> x = lucid.tensor([1.0, 2.0, 3.0]).reshape(1, 3)
>>> proto = lucid.zeros(4, 3)
>>> x.expand_as(proto).shape
(4, 3)view_as
→Tensorview_as(other: Self)Reinterpret strides to match other.shape.
Convenience wrapper around reshape that adopts the shape
of another tensor. When the source is contiguous the result is a
true view (zero copy); otherwise the engine may have to
materialise a contiguous copy first.
Parameters
otherTensorother.shape is
consulted.Returns
TensorA reshaped view (or copy) of self with shape other.shape.
Raises
RuntimeErrorself.numel() != other.numel().Notes
For a contiguous tensor the new strides are computed as
:math:`\text{stride}[k] = \prod_{j > k} s_j` (with :math:`s` = other.shape), so the layout is row-major with no data motion.
Examples
>>> import lucid
>>> x = lucid.arange(6)
>>> proto = lucid.zeros(2, 3)
>>> x.view_as(proto).shape
(2, 3)type_as
→Tensortype_as(other: Self)Cast self to the dtype of other.
Convenience wrapper around to that adopts the dtype of
another tensor. Useful when two tensors must have matching
precision before a fused op.
Parameters
otherTensordtype will be adopted.Returns
Tensorself cast to other.dtype. If the dtype already
matches, the call is a no-op.
Notes
Values are cast element-wise: :math:`y_i = \text{cast}_T(x_i)` where :math:`T = \text{other.dtype}`. Casts between floating-point types preserve gradients; integer→float→integer round-trips lose information in the integer truncation step.
Examples
>>> import lucid
>>> x = lucid.zeros(3, dtype=lucid.int32)
>>> y = lucid.zeros(1, dtype=lucid.float64)
>>> x.type_as(y).dtype
lucid.float64zero_
→Tensorzero_()Fill the tensor with zeros in-place.
Mutates self's storage so every element becomes 0,
without allocating a new tensor. Equivalent to self.fill_(0.0)
but routed through a multiply-by-zero kernel for clarity.
Returns
Tensorself, after zeroing.
Notes
As with fill_, in-place mutation bypasses autograd's
view tracking. Calling zero_ on a leaf tensor with
requires_grad=True may raise a runtime error from the
autograd engine.
Element-wise: :math:`x_i \leftarrow 0` for every position. The result is the additive identity of the tensor algebra at the same shape and dtype.
Examples
>>> import lucid
>>> x = lucid.ones(3)
>>> _ = x.zero_()
>>> x.tolist()
[0.0, 0.0, 0.0]to
→Tensorto(device: DeviceLike | DTypeLike | Tensor | None = None, dtype: DTypeLike | None = None, copy: _bool = False)Move and/or cast tensor.
Overloads:
- .to(device) → device conversion
- .to(dtype) → dtype conversion
- .to(device, dtype) → both
- .to(other_tensor) → match other's device & dtype
- .to(device=, dtype=, copy=)
Notes
When the source already matches the target device and dtype and
copy=False, the call is a no-op and the same Tensor instance is
returned without allocation; copy=True opts out of this shortcut.
Apple Silicon's unified memory means
device transfers between CPU and Metal relabel the dispatch target but
share physical DRAM when the underlying buffer is a SharedStorage.
Examples
>>> import lucid
>>> x = lucid.zeros(3)
>>> x.to(lucid.float64).dtype
lucid.float64
>>> x.to("metal").device
device(type='metal')metal
→Tensormetal()Move this tensor to Apple Metal GPU.
Notes
No-op when the tensor is already on Metal. Under Apple Silicon's unified memory architecture the physical DRAM is shared with the CPU; this call only updates the dispatch target so subsequent ops run through the MLX/Metal backend.
Examples
>>> import lucid
>>> x = lucid.zeros(3)
>>> x.metal().device
device(type='metal')cpu
→Tensorcpu()Move this tensor to CPU.
Notes
No-op when the tensor is already on CPU. On Apple Silicon both CPU and Metal share the same physical DRAM (unified memory); this call relabels the dispatch target so subsequent ops route through Apple Accelerate (vDSP / vForce / BLAS / LAPACK).
Examples
>>> import lucid
>>> x = lucid.zeros(3).metal()
>>> x.cpu().device
device(type='cpu')float
→Tensorfloat()Cast to float32.
Notes
Returns self unchanged when the source dtype is already
float32. Otherwise performs a copy-cast via the engine's
astype op; the device is preserved.
Examples
>>> import lucid
>>> lucid.zeros(3, dtype=lucid.int64).float().dtype
lucid.float32double
→Tensordouble()Cast to float64.
Notes
Returns self unchanged when the source dtype is already
float64. Otherwise allocates a new tensor with widened
precision; device is preserved.
Examples
>>> import lucid
>>> lucid.zeros(3).double().dtype
lucid.float64half
→Tensorhalf()Cast to float16.
Notes
Returns self unchanged when the source dtype is already
float16. Lossy narrowing from float32/float64 overflows to
inf for magnitudes above ~65504. Under IEEE 754, magnitudes below
~6.1e-5 fall into the subnormal range and lose precision, and
magnitudes below ~3e-8 round to 0.
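The float16 range limits can be probed with Python's `struct` module, which implements IEEE 754 binary16 via the `"e"` format. Under that format, values below the smallest normal (~6.1e-5) are stored as subnormals with reduced precision, and only values below roughly 3e-8 round to zero. This is a sketch of the number format, not of Lucid's cast kernel; `to_half` is a hypothetical helper, and its saturation to `inf` models what a narrowing cast is assumed to do.

```python
import struct


def to_half(x):
    """Round a Python float to IEEE 754 binary16 and back."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        # struct rejects finite values beyond the half range;
        # model the cast as saturating to +/-inf instead.
        return float("inf") if x > 0 else float("-inf")


largest = to_half(65504.0)    # largest finite half, stored exactly
overflow = to_half(70000.0)   # beyond the finite range
subnormal = to_half(3e-5)     # below the smallest normal, still nonzero
flushed = to_half(1e-8)       # below half the smallest subnormal
```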
Examples
>>> import lucid
>>> lucid.zeros(3).half().dtype
lucid.float16int
→Tensorint()Cast to int32.
Notes
Returns self unchanged when the source dtype is already
int32. Casts from floating point truncate toward zero;
values outside [-2**31, 2**31 - 1] produce undefined results.
Examples
>>> import lucid
>>> lucid.tensor([1.7, 2.3]).int().dtype
lucid.int32long
→Tensorlong()Cast to int64.
Notes
Returns self unchanged when the source dtype is already
int64. Casts from floating point truncate toward zero.
Typically used for index tensors (e.g. gather / scatter).
Examples
>>> import lucid
>>> lucid.tensor([1.7, 2.3]).long().dtype
lucid.int64bool
→Tensorbool()Cast to bool.
Notes
Returns self unchanged when the source dtype is already
bool. Otherwise each element is reduced to True if
nonzero and False if zero.
Examples
>>> import lucid
>>> lucid.tensor([0.0, 1.5, -2.0]).bool()
Tensor([False, True, True])__len__
→int__len__()Return the size of the first dimension (shape[0]).
Implements the standard Python len() protocol. Mirrors
sequence semantics: the length is the number of top-level elements
(rows) the tensor iterates over.
Returns
intself.shape[0] — the size of the outermost dimension.
Raises
TypeErrorIf the tensor is 0-dimensional, matching Python's behaviour for len() on
scalar objects.
Notes
Mirrors NumPy/sequence semantics: :math:`\text{len}(x) = s_0`.
Iterating with __iter__ yields exactly :math:`s_0` slices.
Examples
>>> import lucid
>>> len(lucid.zeros(5, 3))
5
>>> len(lucid.zeros(7))
7__bool__
→bool__bool__()Convert a single-element tensor to a Python bool.
Implements the truthiness protocol used by if t:, while t:,
and bool(t). Only defined for single-element tensors because
the truth value of a multi-element tensor is ambiguous: "is any
element true?" (.any()) versus "are all elements true?"
(.all()) are different reductions.
Returns
boolThe unboxed value of the single element interpreted as a
Python bool (non-zero numerics map to True).
Raises
RuntimeErrornumel() != 1. Use any / all explicitly
on multi-element tensors.Notes
This matches the convention adopted by every mainstream tensor
framework. The error guards against subtle bugs such as
if pred_tensor: silently truncating to the first element.
Mathematically: defined only when :math:`\prod_i s_i = 1`. The
single element :math:`x` maps to :math:`\text{bool}(x) = (x \neq 0)`.
Examples
>>> import lucid
>>> bool(lucid.tensor(1.0))
True
>>> bool(lucid.tensor(0))
False
>>> bool(lucid.tensor([1.0, 0.0]))
Traceback (most recent call last):
...
RuntimeError: Boolean value of Tensor with more than one element is ambiguous__repr__
→str__repr__()Return a string representation suitable for debugging.
Delegates to the formatted printer in lucid._tensor._repr,
which renders shape, dtype, and (truncated) element values in a
readable layout.
Returns
strMulti-line repr showing element values with appropriate precision, plus shape and dtype metadata when non-default.
Notes
Display only — does not participate in autograd and is one of
the bridge boundaries permitted by rule H4 (_repr.py).
For Metal-resident tensors this forces a host synchronisation to
materialise element values for printing.
Examples
>>> import lucid
>>> repr(lucid.zeros(2, 3))
'tensor([[0., 0., 0.],\n [0., 0., 0.]])'__iter__
→Iterator[Self]__iter__()Iterate over the first dimension of the tensor.
Yields successive slices self[i] for i in range(shape[0]),
so iterating an :math:`n`-d tensor produces :math:`(n-1)`-d slices.
Implements the standard Python iterator protocol.
Raises
TypeErrorIf the tensor is 0-dimensional (there is no first dimension to iterate over).
Notes
Yields :math:`s_0` slices each of shape :math:`(s_1, \ldots, s_{d-1})`.
For very large leading axes prefer chunked iteration via
lucid.split to amortise per-slice overhead.
Examples
>>> import lucid
>>> for row in lucid.arange(6).reshape(3, 2):
... print(row.tolist())
[0, 1]
[2, 3]
[4, 5]__add__
→Tensor__add__(other: TensorOrScalar)Element-wise addition: self + other with broadcasting and dtype promotion.
Forwards to the engine add op. The right operand may be another
Tensor (broadcast-compatible) or a Python scalar (auto-promoted to
the tensor's dtype and broadcast across the shape).
Parameters
otherTensor or scalarReturns
TensorSum with shape broadcast(self.shape, other.shape) and dtype
promote_types(self.dtype, other.dtype).
Notes
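Math: :math:`\text{out}_i = \text{self}_i + \text{other}_i`.
Mixed-dtype operands are promoted to their common type before the add. Both operands receive gradient flow with unit Jacobian (:math:`\partial \text{out} / \partial \text{self} = 1`, :math:`\partial \text{out} / \partial \text{other} = 1`).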
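The output shape of a broadcast binary op follows the right-aligned rule. A minimal pure-Python sketch is given here; `broadcast_shape` is a hypothetical helper and not a Lucid function.

```python
def broadcast_shape(a, b):
    """Right-aligned broadcast of two shapes, as used by element-wise ops."""
    out = []
    for i in range(1, max(len(a), len(b)) + 1):
        # Missing leading axes behave like size-1 axes.
        x = a[-i] if i <= len(a) else 1
        y = b[-i] if i <= len(b) else 1
        if x != y and 1 not in (x, y):
            raise ValueError(f"shapes {a} and {b} are not broadcast-compatible")
        out.append(y if x == 1 else x)
    return tuple(reversed(out))


scalar_case = broadcast_shape((3,), ())      # tensor + Python scalar
column_row = broadcast_shape((4, 1), (3,))   # column against a row
```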
Examples
>>> import lucid
>>> a = lucid.tensor([1.0, 2.0, 3.0])
>>> b = lucid.tensor([10.0, 20.0, 30.0])
>>> a + b
Tensor([11., 22., 33.])
>>> a + 5
Tensor([6., 7., 8.])__radd__
→Tensor__radd__(other: TensorOrScalar)Reflected addition: other + self.
Triggered when the left operand does not implement __add__ with
a Tensor right operand (typically when other is a plain Python
scalar). Addition is commutative so the result equals
__add__.
Parameters
otherTensor or scalarReturns
TensorSum with broadcast shape and promoted dtype.
Notes
Math: :math:`\text{out}_i = \text{other}_i + \text{self}_i`.
Order is preserved for symmetry with __rsub__ even though
addition commutes. Dtype promotion and broadcasting follow the
same rules as __add__.
Examples
>>> import lucid
>>> a = lucid.tensor([1.0, 2.0, 3.0])
>>> 5 + a
Tensor([6., 7., 8.])__iadd__
→Tensor__iadd__(other: TensorOrScalar)In-place addition: self += other.
Mutates self._impl to hold the sum and returns self so the
expression can be chained. Useful for memory-tight loops; for
leaves that participate in autograd prefer the out-of-place
__add__.
Parameters
otherTensor or scalarReturns
Tensorself after mutation.
Notes
Math (in place): :math:`\text{self}_i \leftarrow \text{self}_i + \text{other}_i`.
Performing an in-place add on a leaf tensor with
requires_grad=True is typically rejected by autograd because it
would destroy the value needed for the backward pass. Use the
non-in-place form there.
Examples
>>> import lucid
>>> a = lucid.tensor([1.0, 2.0, 3.0])
>>> a += 10
>>> a
Tensor([11., 12., 13.])__sub__
→Tensor__sub__(other: TensorOrScalar)Element-wise subtraction: self - other with broadcasting.
Parameters
otherTensor or scalarReturns
TensorDifference with broadcast shape and promoted dtype.
Notes
Math: :math:`\text{out}_i = \text{self}_i - \text{other}_i`.
Dtype promotion follows the standard kind/width rules. Gradients
flow with Jacobian +1 for self and -1 for other.
Examples
>>> import lucid
>>> a = lucid.tensor([10.0, 20.0, 30.0])
>>> b = lucid.tensor([1.0, 2.0, 3.0])
>>> a - b
Tensor([9., 18., 27.])
>>> a - 1
Tensor([9., 19., 29.])__rsub__
→Tensor__rsub__(other: TensorOrScalar)Reflected subtraction: other - self.
Triggered when the left operand does not implement __sub__ with
a Tensor right operand. Subtraction is not commutative, so the
order of operands matters.
Parameters
otherTensor or scalarReturns
TensorDifference with broadcast shape and promoted dtype.
Notes
Math: :math:`\text{out}_i = \text{other}_i - \text{self}_i`.
Jacobians are +1 for other and -1 for self.
Examples
>>> import lucid
>>> a = lucid.tensor([1.0, 2.0, 3.0])
>>> 10 - a
Tensor([9., 8., 7.])__mul__
→Tensor__mul__(other: TensorOrScalar)Element-wise multiplication: self * other (Hadamard product).
Parameters
otherTensor or scalarReturns
TensorProduct with broadcast shape and promoted dtype.
Notes
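Math: :math:`\text{out}_i = \text{self}_i \cdot \text{other}_i`.
This is the element-wise product; for matrix multiplication
use __matmul__ (@ operator). Jacobians are
:math:`\partial \text{out} / \partial \text{self} = \text{other}` and
:math:`\partial \text{out} / \partial \text{other} = \text{self}`.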
Examples
>>> import lucid
>>> a = lucid.tensor([1.0, 2.0, 3.0])
>>> b = lucid.tensor([4.0, 5.0, 6.0])
>>> a * b
Tensor([4., 10., 18.])
>>> a * 2
Tensor([2., 4., 6.])__rmul__
→Tensor__rmul__(other: TensorOrScalar)Reflected multiplication: other * self.
Triggered when the left operand does not implement __mul__
with a Tensor right operand (e.g. plain scalar on the left).
Multiplication is commutative, so the result equals
__mul__.
Parameters
otherTensor or scalarReturns
TensorProduct with broadcast shape and promoted dtype.
Notes
Math: :math:`\text{out}_i = \text{other}_i \cdot \text{self}_i`.
Examples
>>> import lucid
>>> a = lucid.tensor([1.0, 2.0, 3.0])
>>> 3 * a
Tensor([3., 6., 9.])__truediv__
→Tensor__truediv__(other: TensorOrScalar)Element-wise true division: self / other.
Always produces a floating-point result. Integer operands are promoted to float before the division.
Parameters
otherTensor or scalarReturns
TensorQuotient with broadcast shape and a float dtype.
Notes
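Math: :math:`\text{out}_i = \text{self}_i / \text{other}_i`.
Division by zero follows IEEE 754: positive / 0 -> +inf,
negative / 0 -> -inf, 0 / 0 -> nan. Jacobians are
:math:`\partial \text{out} / \partial \text{self} = 1 / \text{other}` and
:math:`\partial \text{out} / \partial \text{other} = -\text{self} / \text{other}^2`.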
Examples
>>> import lucid
>>> a = lucid.tensor([10.0, 20.0, 30.0])
>>> b = lucid.tensor([2.0, 4.0, 5.0])
>>> a / b
Tensor([5., 5., 6.])
>>> a / 10
Tensor([1., 2., 3.])__floordiv__
→Tensor__floordiv__(other: TensorOrScalar)Element-wise floor division: self // other.
Computes the largest integer less than or equal to the true quotient, broadcast element-wise.
Parameters
otherTensor or scalarReturns
TensorFloor-divided result with broadcast shape and promoted dtype.
Notes
Math: :math:`\text{out}_i = \lfloor \text{self}_i / \text{other}_i \rfloor`.
For integer operands the result is integer; for float operands the result is float but takes integral values. Floor division by zero follows the engine's convention (typically raises or produces NaN/inf depending on dtype). Not differentiable at integer-quotient boundaries.
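Floor division differs from truncation only when the true quotient is negative. The sketch below uses Python's `math` module to show the difference; `floor_div` is a hypothetical helper that mirrors the definition above and is not Lucid code.

```python
import math


def floor_div(a, b):
    """Largest integer less than or equal to the true quotient a / b."""
    return math.floor(a / b)


positive = floor_div(7.0, 2.0)        # quotient 3.5
negative = floor_div(-7.0, 2.0)       # quotient -3.5 rounds down, away from zero
truncated = math.trunc(-7.0 / 2.0)    # truncation rounds toward zero instead
```

Python's own `//` operator uses the same floor convention, so `-7 // 2` agrees with `floor_div(-7.0, 2.0)`.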
Examples
>>> import lucid
>>> a = lucid.tensor([7.0, 8.0, 9.0])
>>> a // 2
Tensor([3., 4., 4.])__pow__
→Tensor__pow__(other: TensorOrScalar)Element-wise exponentiation: self ** other.
Raises self to the other power element-by-element with
broadcasting and dtype promotion.
Parameters
otherTensor or scalarReturns
TensorPowered result with broadcast shape and promoted dtype.
Notes
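Math: :math:`\text{out}_i = \text{self}_i^{\,\text{other}_i}`.
Gradients are :math:`\partial \text{out} / \partial \text{self} = \text{other} \cdot \text{self}^{\text{other} - 1}` and :math:`\partial \text{out} / \partial \text{other} = \text{self}^{\text{other}} \ln \text{self}`. Negative base with non-integer exponent produces NaN under IEEE 754 unless complex dtype is used.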
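The standard power-rule derivatives, d(x^y)/dx = y x^(y-1) and d(x^y)/dy = x^y ln x, can be checked numerically with a central finite difference. This is a plain-Python sanity check on scalars, not a use of Lucid's autograd; `central_diff` is a hypothetical helper.

```python
import math


def central_diff(f, x, eps=1e-6):
    """Symmetric finite-difference estimate of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


base, exponent = 2.0, 3.0

# Numerical derivatives with respect to each operand.
num_d_base = central_diff(lambda t: t ** exponent, base)
num_d_exp = central_diff(lambda t: base ** t, exponent)

# Closed forms from the power rule.
ana_d_base = exponent * base ** (exponent - 1)
ana_d_exp = base ** exponent * math.log(base)
```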
Examples
>>> import lucid
>>> a = lucid.tensor([1.0, 2.0, 3.0])
>>> a ** 2
Tensor([1., 4., 9.])
>>> a ** lucid.tensor([3.0, 2.0, 1.0])
Tensor([1., 4., 3.])__matmul__
→Tensor__matmul__(other: Tensor)Matrix multiplication: self @ other with batched semantics.
Supports 2-D (matrix) and N-D (batched) inputs. For 1-D vectors the usual NumPy/BLAS dot-product semantics apply (1-D @ 1-D is scalar, 1-D @ 2-D is a row vector, 2-D @ 1-D is a column vector). For N-D operands the trailing two dimensions are matrix-multiplied while leading dimensions broadcast.
Parameters
otherTensorself's last
dimension.Returns
TensorMatrix product with shape determined by broadcasting batch dims and contracting the inner dimension.
Notes
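Math (2-D case): :math:`C_{ij} = \sum_k A_{ik} B_{kj}`.
For N-D inputs the same contraction is applied to the last two axes: :math:`C_{\ldots ij} = \sum_k A_{\ldots ik} B_{\ldots kj}`.
The backward pass multiplies the upstream gradient by the transpose of the other operand: :math:`\partial L / \partial A = (\partial L / \partial C)\, B^{\top}` and :math:`\partial L / \partial B = A^{\top} (\partial L / \partial C)`.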
Use __mul__ (*) for element-wise multiplication.
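The 2-D contraction and the usual matmul gradient identities, dL/dA = (dL/dC) Bᵀ and dL/dB = Aᵀ (dL/dC), can be sketched on nested lists. This is an illustration in plain Python, not the engine's kernel; `matmul` and `transpose` are hypothetical helpers.

```python
def matmul(a, b):
    """2-D matrix product: C[i][j] = sum_k A[i][k] * B[k][j]."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    return [[sum(row[k] * b[k][j] for k in range(inner))
             for j in range(len(b[0]))] for row in a]


def transpose(m):
    """Swap the two axes of a nested-list matrix."""
    return [list(col) for col in zip(*m)]


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = matmul(A, B)

# For the scalar loss L = sum(C), dL/dC is all ones, so
# dL/dA = ones @ B^T and dL/dB = A^T @ ones.
ones = [[1.0, 1.0], [1.0, 1.0]]
grad_A = matmul(ones, transpose(B))
grad_B = matmul(transpose(A), ones)
```

Each entry of `grad_A` is a row sum of `B`, and each entry of `grad_B` is a column sum of `A`, as expected for a sum-of-entries loss.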
Examples
>>> import lucid
>>> A = lucid.tensor([[1.0, 2.0], [3.0, 4.0]])
>>> B = lucid.tensor([[5.0, 6.0], [7.0, 8.0]])
>>> A @ B
Tensor([[19., 22.], [43., 50.]])__neg__
→Tensor__neg__()Unary negation: -self.
Returns
TensorElement-wise negation with the same shape and dtype as
self.
Notes
Math: :math:`\text{out}_i = -\text{self}_i`.
Defined for all signed numeric dtypes. Applied to an unsigned
integer dtype the engine raises TypeError because the result
would not be representable. Jacobian is -1.
Examples
>>> import lucid
>>> a = lucid.tensor([1.0, -2.0, 3.0])
>>> -a
Tensor([-1., 2., -3.])__abs__
→Tensor__abs__()Element-wise absolute value: abs(self).
Returns
TensorMagnitude tensor with the same shape as self. For complex
input the result has the corresponding real float dtype.
Notes
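Math: :math:`\text{out}_i = |\text{self}_i|`.
For complex input :math:`|a + bi| = \sqrt{a^2 + b^2}`.
The derivative is :math:`\operatorname{sign}(\text{self}_i)` and is
undefined at zero (the engine returns 0 there).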
Examples
>>> import lucid
>>> a = lucid.tensor([-1.0, 2.0, -3.0])
>>> abs(a)
Tensor([1., 2., 3.])__eq__
→Tensor__eq__(other: object)Element-wise equality comparison: self == other.
Parameters
otherTensor or scalarself.Returns
TensorBoolean tensor with broadcast shape; True where the
corresponding elements match.
Notes
Math: :math:`\text{out}_i = \mathbf{1}[\text{self}_i = \text{other}_i]`.
Comparison with NaN follows IEEE 754 — NaN == NaN is always
False. Not differentiable; the output tensor never carries
gradient.
Returns an element-wise boolean Tensor when other is a Tensor
or numeric scalar (broadcast against self). For arguments
that are neither (e.g. None, lists, arbitrary objects),
Python's == falls back to object identity and yields a plain
bool rather than a Tensor.
Examples
>>> import lucid
>>> a = lucid.tensor([1.0, 2.0, 3.0])
>>> b = lucid.tensor([1.0, 5.0, 3.0])
>>> a == b
Tensor([True, False, True])__ne__
→Tensor__ne__(other: object)Element-wise inequality comparison: self != other.
Parameters
otherTensor or scalarself.Returns
TensorBoolean tensor with broadcast shape.
Notes
Math: :math:`\text{out}_i = \mathbf{1}[\text{self}_i \neq \text{other}_i]`.
Under IEEE 754, NaN != NaN evaluates to True — this is the
only comparison operator that is "true" against NaN. Not
differentiable.
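The NaN rules are properties of IEEE 754 itself and can be observed with ordinary Python floats. The snippet below is independent of Lucid and only illustrates the comparison semantics the element-wise operators are described as following.

```python
nan = float("nan")

# Every ordered or equality comparison against NaN is False,
# except "!=", which is True.
comparisons = {
    "==": nan == nan,
    "!=": nan != nan,
    "<": nan < 1.0,
    "<=": nan <= 1.0,
    ">": nan > 1.0,
    ">=": nan >= 1.0,
}
```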
Examples
>>> import lucid
>>> a = lucid.tensor([1.0, 2.0, 3.0])
>>> b = lucid.tensor([1.0, 5.0, 3.0])
>>> a != b
Tensor([False, True, False])__lt__
→Tensor__lt__(other: TensorOrScalar)Element-wise less-than comparison: self < other.
Parameters
otherTensor or scalarself.Returns
TensorBoolean tensor with broadcast shape.
Notes
Math: :math:`\text{out}_i = \mathbf{1}[\text{self}_i < \text{other}_i]`.
Any comparison involving NaN returns False per IEEE 754. Not
differentiable.
Examples
>>> import lucid
>>> a = lucid.tensor([1.0, 2.0, 3.0])
>>> a < 2.5
Tensor([True, True, False])__le__
→Tensor__le__(other: TensorOrScalar)Element-wise less-than-or-equal comparison: self <= other.
Parameters
otherTensor or scalarself.Returns
TensorBoolean tensor with broadcast shape.
Notes
Math: :math:`\text{out}_i = \mathbf{1}[\text{self}_i \le \text{other}_i]`.
Any comparison involving NaN returns False per IEEE 754. Not
differentiable.
Examples
>>> import lucid
>>> a = lucid.tensor([1.0, 2.0, 3.0])
>>> a <= 2.0
Tensor([True, True, False])__gt__
→Tensor__gt__(other: TensorOrScalar)Element-wise greater-than comparison: self > other.
Parameters
otherTensor or scalarself.Returns
TensorBoolean tensor with broadcast shape.
Notes
Math: :math:`\text{out}_i = \mathbf{1}[\text{self}_i > \text{other}_i]`.
Any comparison involving NaN returns False per IEEE 754. Not
differentiable.
Examples
>>> import lucid
>>> a = lucid.tensor([1.0, 2.0, 3.0])
>>> a > 1.5
Tensor([False, True, True])__ge__
→Tensor__ge__(other: TensorOrScalar)Element-wise greater-than-or-equal comparison: self >= other.
Parameters
otherTensor or scalarself.Returns
TensorBoolean tensor with broadcast shape.
Notes
Math: :math:`\text{out}_i = \mathbf{1}[\text{self}_i \ge \text{other}_i]`.
Any comparison involving NaN returns False per IEEE 754. Not
differentiable.
Examples
>>> import lucid
>>> a = lucid.tensor([1.0, 2.0, 3.0])
>>> a >= 2.0
Tensor([False, True, True])__getitem__
→Tensor__getitem__(idx: _IndexType)Tensor indexing: self[idx].
Dispatches to lucid._tensor._indexing._getitem, which
supports both basic and advanced indexing semantics.
Parameters
idxint, slice, None, Ellipsis, Tensor, or tuple of these
- int: pick a single position along an axis (reduces rank by 1).
- slice: Python start:stop:step window (preserves rank).
- None: insert a new axis of length 1.
- ... (Ellipsis): fill in remaining axes with full slices.
- integer Tensor: gather along an axis (advanced indexing).
- boolean Tensor — mask selection (advanced indexing).
- tuple — combine the above across multiple axes.
Returns
TensorView or gather result. Basic indexing produces a view that
shares storage with self; advanced indexing copies.
Notes
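Math (basic case, single axis): for a slice start:stop:step, :math:`\text{out}_j = \text{self}_{\text{start} + j \cdot \text{step}}`.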
Advanced indexing follows NumPy's gather semantics: integer tensors are broadcast against one another and used to fetch elements; boolean masks select a flat 1-D subset along the masked axes. Indexing participates in autograd via a sparse scatter in the backward pass.
Examples
>>> import lucid
>>> a = lucid.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
>>> a[0]
Tensor([1., 2., 3.])
>>> a[:, 1]
Tensor([2., 5.])
>>> a[a > 3]
Tensor([4., 5., 6.])__setitem__
→None__setitem__(idx: _IndexType, value: TensorOrScalar)In-place indexed assignment: self[idx] = value.
Dispatches to lucid._tensor._indexing._setitem. Supports
the same index forms as __getitem__.
Parameters
idxint, slice, None, Ellipsis, Tensor, or tuple of these__getitem__ for the full
list.valueTensor or scalarself.dtype.Returns
NoneThe assignment mutates self in place.
Notes
Math (basic case): :math:`\text{self}[\text{idx}] \leftarrow \text{value}`.
The mutated region is disconnected from the autograd graph for
the previous values — subsequent backward passes will see only
the new values. Assigning into a leaf tensor with
requires_grad=True raises because it would invalidate saved
activations.
Examples
>>> import lucid
>>> a = lucid.tensor([1.0, 2.0, 3.0, 4.0])
>>> a[1:3] = 0
>>> a
Tensor([1., 0., 0., 4.])