class

Bilinear

extends Module
Bilinear(in1_features: int, in2_features: int, out_features: int, bias: bool = True, device: DeviceLike = None, dtype: DTypeLike = None)

Apply a bilinear transformation to a pair of input tensors.

For each output unit $k$ the layer computes

$$y_k = \mathbf{x}_1^{\top} \mathbf{W}_{k,:,:}\, \mathbf{x}_2 + b_k, \qquad k = 1, \dots, d_{\text{out}}$$

where $\mathbf{W} \in \mathbb{R}^{d_{\text{out}} \times d_{\text{in}_1} \times d_{\text{in}_2}}$ is the weight tensor and $\mathbf{b} \in \mathbb{R}^{d_{\text{out}}}$ is the optional bias.
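The formula can be spelled out as a double sum over both input dimensions. The following is a minimal pure-Python sketch of that computation for a single pair of vectors; `bilinear_single` is a hypothetical helper written for illustration and is not part of the library.

```python
def bilinear_single(x1, x2, weight, bias=None):
    """Reference computation of y_k = x1^T W[k] x2 + b_k for one input pair.

    x1: list of d_in1 floats; x2: list of d_in2 floats;
    weight: nested list of shape (d_out, d_in1, d_in2);
    bias: list of d_out floats, or None for no bias.
    """
    out = []
    for k, w_k in enumerate(weight):
        # Double sum: sum_i sum_j x1[i] * W[k][i][j] * x2[j]
        y_k = sum(
            x1[i] * w_k[i][j] * x2[j]
            for i in range(len(x1))
            for j in range(len(x2))
        )
        if bias is not None:
            y_k += bias[k]
        out.append(y_k)
    return out


# Tiny example with d_in1 = 2, d_in2 = 3, d_out = 2 (values chosen arbitrarily).
x1 = [1.0, 2.0]
x2 = [1.0, 0.0, -1.0]
weight = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # W[0]
    [[0.0, 0.0, 2.0], [1.0, 1.0, 1.0]],  # W[1]
]
bias = [0.5, -0.5]
y = bilinear_single(x1, x2, weight, bias)  # one value per output unit
```

Each output unit has its own matrix `W[k]`, so the cost per input pair grows with the product of all three feature sizes.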

Parameters

in1_features: int
Dimensionality of the first input ($d_{\text{in}_1}$).
in2_features: int
Dimensionality of the second input ($d_{\text{in}_2}$).
out_features: int
Number of output units ($d_{\text{out}}$).
bias: bool = True
If True (default), add a learnable bias term $\mathbf{b}$.
device: DeviceLike = None
Device for initial parameters.
dtype: DTypeLike = None
Dtype for initial parameters.

Attributes

weight: Parameter
Weight tensor of shape (out_features, in1_features, in2_features). Each slice weight[k] is a matrix that mixes the two input spaces for the $k$-th output unit. Initialized with uniform sampling over $\left[-\tfrac{1}{\sqrt{d_{\text{in}_1}}},\; \tfrac{1}{\sqrt{d_{\text{in}_1}}}\right]$.
bias: Parameter or None
Bias vector of shape (out_features,). None when bias=False.
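To get a feel for the sizes involved, the sketch below derives the parameter count and the uniform bound from the shapes stated above. `bilinear_param_stats` is a hypothetical helper, and the sampling uses Python's `random` module purely for illustration; it is not the library's initializer.

```python
import math
import random


def bilinear_param_stats(in1_features, in2_features, out_features, bias=True):
    """Return (total parameter count, uniform init bound) for the documented shapes."""
    n_weight = out_features * in1_features * in2_features  # weight: (out, in1, in2)
    n_bias = out_features if bias else 0                    # bias: (out,) or absent
    bound = 1.0 / math.sqrt(in1_features)                   # bound stated for the weight
    return n_weight + n_bias, bound


n_params, bound = bilinear_param_stats(128, 256, 32)

# Illustrative draws from U(-bound, bound); assumption: stand-in for the real initializer.
samples = [random.uniform(-bound, bound) for _ in range(5)]
```

Because the weight is a rank-3 tensor, the parameter count scales with `in1_features * in2_features * out_features`, which is worth checking before choosing large feature sizes.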

Notes

  • Input 1 (x1): $(\ast,\, d_{\text{in}_1})$.
  • Input 2 (x2): $(\ast,\, d_{\text{in}_2})$.
  • Output: $(\ast,\, d_{\text{out}})$.
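The shape rules above can be sketched as a small checker. `bilinear_output_shape` is a hypothetical helper, and it assumes that the leading dimensions of both inputs must match exactly; whether the layer also accepts broadcastable leading dimensions is not stated here.

```python
def bilinear_output_shape(x1_shape, x2_shape, in1_features, in2_features, out_features):
    """Infer the output shape for inputs shaped (*, d_in1) and (*, d_in2)."""
    if x1_shape[-1] != in1_features:
        raise ValueError(f"x1 last dim {x1_shape[-1]} != in1_features {in1_features}")
    if x2_shape[-1] != in2_features:
        raise ValueError(f"x2 last dim {x2_shape[-1]} != in2_features {in2_features}")
    # Assumption: leading dimensions must be identical (no broadcasting).
    if tuple(x1_shape[:-1]) != tuple(x2_shape[:-1]):
        raise ValueError("x1 and x2 must share the same leading dimensions")
    # The last dimension is replaced by the number of output units.
    return tuple(x1_shape[:-1]) + (out_features,)


shape_a = bilinear_output_shape((8, 64), (8, 64), 64, 64, 1)
shape_b = bilinear_output_shape((4, 128), (4, 256), 128, 256, 32)
shape_c = bilinear_output_shape((2, 5, 64), (2, 5, 64), 64, 64, 1)
```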

Bilinear captures multiplicative interactions between two feature vectors that a plain Linear layer cannot express. Typical use cases include:

  • Attention scoring — compute compatibility between query and key vectors using a learned interaction matrix instead of the dot product.
  • Relation networks — model pair-wise relationships in graph neural networks or visual question answering.
  • Similarity scoring — learn an asymmetric distance metric between two embeddings (e.g. in contrastive or metric-learning settings).
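Two properties behind these use cases can be checked by hand: with the identity as interaction matrix the bilinear score reduces to the plain dot product, and with a non-symmetric matrix the score depends on argument order. The sketch below uses pure Python and a hypothetical helper `bilinear_score`; the matrices are chosen for illustration rather than learned.

```python
def bilinear_score(x1, x2, w):
    """Scalar bilinear form x1^T w x2 for a single (d_in1, d_in2) matrix w."""
    return sum(
        x1[i] * w[i][j] * x2[j]
        for i in range(len(x1))
        for j in range(len(x2))
    )


q = [1.0, 2.0, 3.0]
k = [4.0, 5.0, 6.0]

# Identity interaction matrix: the score equals the ordinary dot product.
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
dot = sum(a * b for a, b in zip(q, k))
identity_score = bilinear_score(q, k, identity)

# Non-symmetric matrix: only the (0, 1) entry is non-zero, so the score
# picks out first_arg[0] * second_arg[1] and swapping the arguments changes it.
w = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
forward_score = bilinear_score(q, k, w)  # q[0] * k[1]
reverse_score = bilinear_score(k, q, w)  # k[0] * q[1]
```

A learned weight sits between these extremes: it can recover dot-product scoring, but it can also weight individual feature pairs differently.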

Examples

Scoring query–key compatibility in an attention mechanism:
>>> import lucid
>>> import lucid.nn as nn
>>> attn_score = nn.Bilinear(64, 64, 1)
>>> q = lucid.randn(8, 64)   # queries
>>> k = lucid.randn(8, 64)   # keys
>>> scores = attn_score(q, k)
>>> scores.shape
(8, 1)
Multi-output relation layer with different input spaces:
>>> rel = nn.Bilinear(128, 256, 32)
>>> x1 = lucid.randn(4, 128)
>>> x2 = lucid.randn(4, 256)
>>> rel(x1, x2).shape
(4, 32)

Methods (4)

dunder

__init__

None
__init__(in1_features: int, in2_features: int, out_features: int, bias: bool = True, device: DeviceLike = None, dtype: DTypeLike = None)

Initialize the Bilinear module. See the class docstring for parameter semantics.

fn

reset_parameters

None
reset_parameters()

Initialize with Kaiming uniform.

fn

forward

Tensor
forward(x1: Tensor, x2: Tensor)

Apply the bilinear transformation to a pair of input tensors.

Parameters

x1: Tensor
First input tensor of shape $(\ast,\, \text{in1\_features})$.
x2: Tensor
Second input tensor of shape $(\ast,\, \text{in2\_features})$.

Returns

Tensor

Output tensor of shape $(\ast,\, \text{out\_features})$.

fn

extra_repr

str
extra_repr()

Return a string representation of the layer's configuration.