class Bilinear extends Module

Bilinear(in1_features: int, in2_features: int, out_features: int, bias: bool = True, device: DeviceLike = None, dtype: DTypeLike = None)


Apply a bilinear transformation to a pair of input tensors.

For each output unit $k$ the layer computes

$$y_k = \mathbf{x}_1^{\top} \mathbf{W}_{k,:,:}\, \mathbf{x}_2 + b_k, \qquad k = 1, \dots, d_{\text{out}}$$

where $\mathbf{W} \in \mathbb{R}^{d_{\text{out}} \times d_{\text{in}_1} \times d_{\text{in}_2}}$ is the weight tensor and $\mathbf{b} \in \mathbb{R}^{d_{\text{out}}}$ is the optional bias.
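The formula above can be checked by hand with a small reference computation. The following is a minimal sketch in plain Python (nested lists, no lucid dependency); the function name `bilinear` and the sample values are hypothetical and chosen only for illustration, and this is not the library's implementation.

```python
def bilinear(x1, x2, weight, bias=None):
    """Compute y_k = x1^T W[k] x2 + b_k for one pair of vectors (plain lists)."""
    out = []
    for k, w_k in enumerate(weight):          # one d_in1 x d_in2 matrix per output unit
        acc = 0.0
        for i, x1_i in enumerate(x1):
            for j, x2_j in enumerate(x2):
                acc += x1_i * w_k[i][j] * x2_j
        if bias is not None:                   # bias is optional, as in the formula
            acc += bias[k]
        out.append(acc)
    return out


# Illustrative sizes: d_in1 = 2, d_in2 = 3, d_out = 2
weight = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]],
]
bias = [0.5, -1.0]
x1 = [1.0, 2.0]
x2 = [3.0, 4.0, 5.0]

y = bilinear(x1, x2, weight, bias)
print(y)  # [11.5, 28.0]
```

Each output unit has its own matrix `weight[k]`, so the cost of one evaluation grows with the product of all three feature sizes.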

Parameters

in1_features : int

Dimensionality of the first input ($d_{\text{in}_1}$).

in2_features : int

Dimensionality of the second input ($d_{\text{in}_2}$).

out_features : int

Number of output units ($d_{\text{out}}$).

bias : bool = True

If True (default), add a learnable bias term $\mathbf{b}$.

device : DeviceLike = None

Device for initial parameters.

dtype : DTypeLike = None

Dtype for initial parameters.

Attributes

weight : Parameter

Weight tensor of shape (out_features, in1_features, in2_features). Each slice weight[k] is a matrix that mixes the two input spaces for the $k$-th output unit. Initialized by uniform sampling over $\left[-\tfrac{1}{\sqrt{d_{\text{in}_1}}},\; \tfrac{1}{\sqrt{d_{\text{in}_1}}}\right]$.

bias : Parameter or None

Bias vector of shape (out_features,). None when bias=False.
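To make the weight shape and sampling range concrete, here is a minimal sketch in plain Python. The helper names `init_bilinear_weight` and `num_parameters` are hypothetical, the seed argument is an illustration device, and bias initialization is left out because it is not specified here. It only mirrors the stated range and shapes, not the library's actual initialization code.

```python
import math
import random


def init_bilinear_weight(in1_features, in2_features, out_features, seed=0):
    """Sample a nested-list weight of shape (out, in1, in2) from U(-bound, bound)."""
    bound = 1.0 / math.sqrt(in1_features)     # range stated for the weight attribute
    rng = random.Random(seed)                  # local RNG so the sketch is reproducible
    weight = [
        [[rng.uniform(-bound, bound) for _ in range(in2_features)]
         for _ in range(in1_features)]
        for _ in range(out_features)
    ]
    return weight, bound


def num_parameters(in1_features, in2_features, out_features, bias=True):
    """Count learnable scalars: the weight tensor plus the optional bias vector."""
    return out_features * in1_features * in2_features + (out_features if bias else 0)


weight, bound = init_bilinear_weight(4, 3, 2)
print(bound)                          # 0.5
print(num_parameters(128, 256, 32))   # 1048608
```

The parameter count shows why large feature sizes get expensive quickly: the weight alone holds out_features × in1_features × in2_features values.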

Notes

Input 1 (x1): $(\ast,\, d_{\text{in}_1})$.
Input 2 (x2): $(\ast,\, d_{\text{in}_2})$.
Output: $(\ast,\, d_{\text{out}})$.
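The shape rule can be written down as a small checker. This is a sketch under one labeled assumption: the leading dimensions ($\ast$) of the two inputs are taken to match exactly, with no broadcasting, which is not stated here. The function name `bilinear_output_shape` is hypothetical.

```python
def bilinear_output_shape(x1_shape, x2_shape, in1_features, in2_features, out_features):
    """Return the output shape for inputs of shape (*, d_in1) and (*, d_in2)."""
    x1_shape, x2_shape = tuple(x1_shape), tuple(x2_shape)
    if x1_shape[-1] != in1_features:
        raise ValueError(f"x1 last dim {x1_shape[-1]} != in1_features {in1_features}")
    if x2_shape[-1] != in2_features:
        raise ValueError(f"x2 last dim {x2_shape[-1]} != in2_features {in2_features}")
    # Assumption: leading dimensions must be identical (no broadcasting).
    if x1_shape[:-1] != x2_shape[:-1]:
        raise ValueError("leading dimensions of x1 and x2 differ")
    return x1_shape[:-1] + (out_features,)


print(bilinear_output_shape((8, 64), (8, 64), 64, 64, 1))        # (8, 1)
print(bilinear_output_shape((4, 128), (4, 256), 128, 256, 32))   # (4, 32)
```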

Bilinear captures multiplicative interactions between two feature vectors that a plain Linear layer cannot express. Typical use cases include:

Attention scoring — compute compatibility between query and key vectors using a learned interaction matrix instead of the dot product.
Relation networks — model pair-wise relationships in graph neural networks or visual question answering.
Similarity scoring — learn an asymmetric distance metric between two embeddings (e.g. in contrastive or metric-learning settings).
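Two properties behind these use cases can be verified numerically: the dot product is the special case of a bilinear form whose interaction matrix is the identity, and a general matrix gives a score that changes when the arguments are swapped. The sketch below uses plain Python lists; `bilinear_score` and the sample vectors are hypothetical illustration names, not part of lucid.

```python
def bilinear_score(x1, x2, w):
    """Scalar bilinear form x1^T w x2 for plain Python lists."""
    return sum(
        x1[i] * w[i][j] * x2[j]
        for i in range(len(x1))
        for j in range(len(x2))
    )


q = [1.0, 2.0, 3.0]
k = [4.0, 5.0, 6.0]

# Identity interaction matrix: the score reduces to the ordinary dot product.
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
dot = sum(a * b for a, b in zip(q, k))
print(bilinear_score(q, k, identity), dot)  # 32.0 32.0

# A non-symmetric matrix makes the score depend on argument order.
asym = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
print(bilinear_score(q, k, asym))  # 5.0
print(bilinear_score(k, q, asym))  # 8.0
```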

Examples

Scoring query–key compatibility in an attention mechanism:
>>> import lucid
>>> import lucid.nn as nn
>>> attn_score = nn.Bilinear(64, 64, 1)
>>> q = lucid.randn(8, 64)   # queries
>>> k = lucid.randn(8, 64)   # keys
>>> scores = attn_score(q, k)
>>> scores.shape
(8, 1)
Multi-output relation layer with different input spaces:
>>> rel = nn.Bilinear(128, 256, 32)
>>> x1 = lucid.randn(4, 128)
>>> x2 = lucid.randn(4, 256)
>>> rel(x1, x2).shape
(4, 32)

Used by 1

lucid.nn.modules

Constructors

__init__ → None

__init__(in1_features: int, in2_features: int, out_features: int, bias: bool = True, device: DeviceLike = None, dtype: DTypeLike = None)


Initialise the Bilinear module. See the class docstring for parameter semantics.

Instance methods

extra_repr → str

extra_repr()


Return a string representation of the layer's configuration.

forward → Tensor

forward(x1: Tensor, x2: Tensor)


Apply the bilinear transformation to a pair of input tensors.

Parameters

x1 : Tensor

First input tensor of shape $(\ast,\, d_{\text{in}_1})$.

x2 : Tensor

Second input tensor of shape $(\ast,\, d_{\text{in}_2})$.

Returns

Tensor

Output tensor of shape $(\ast,\, d_{\text{out}})$.

reset_parameters → None

reset_parameters()


Initialize with Kaiming uniform.
