class Mish extends Module

Mish()


Implementing kernel

C++ class MishBackward

Mish activation function.

Applies element-wise:

$$\text{Mish}(x) = x \cdot \tanh\!\bigl(\text{Softplus}(x)\bigr) = x \cdot \tanh\!\bigl(\ln(1 + e^x)\bigr)$$
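The formula can be checked by hand with a scalar reference implementation. The following is a minimal sketch using only the standard library; the helper names `softplus` and `mish` are illustrative and are not part of the lucid API, and the code makes no claim about how the lucid kernel is written.

```python
import math


def softplus(x: float) -> float:
    # Overflow-safe form of ln(1 + e^x): max(x, 0) + log1p(exp(-|x|)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    # Mish(x) = x * tanh(Softplus(x))
    return x * math.tanh(softplus(x))


for v in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"mish({v:+.1f}) = {mish(v):.4f}")
```

Writing softplus as `max(x, 0) + log1p(exp(-|x|))` keeps the exponential argument non-positive, so large positive inputs do not overflow.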

Mish is smooth, non-monotonic, and unbounded above while bounded below: it reaches a minimum of roughly -0.31 near x ≈ -1.19 and approaches zero for large negative inputs. Unlike ReLU, it preserves small negative values. It has been reported to outperform Swish/SiLU in some object detection settings and is the backbone activation used in YOLOv4.
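The lower bound can be located numerically. This sketch scans a fine grid with a standalone scalar `mish` helper (a hypothetical name, not a lucid function) and reports where the function is smallest; it illustrates the property and is not a proof.

```python
import math


def mish(x: float) -> float:
    # Overflow-safe softplus, then x * tanh(softplus(x)).
    sp = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return x * math.tanh(sp)


# Scan [-10, 0] in steps of 0.001 and keep the grid point with the smallest output.
xs = [-10.0 + i * 1e-3 for i in range(10001)]
x_min = min(xs, key=mish)
print(f"argmin ~ {x_min:.3f}, min ~ {mish(x_min):.4f}")

# Far into the negative range the output decays back towards zero.
print(f"mish(-20) = {mish(-20.0):.3e}")
```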

Notes

Input: $(*)$ — any shape.
Output: $(*)$ — same shape as input.

Mish requires computing both a softplus and a tanh, making it slightly more expensive than ReLU or SiLU. The smooth gradient landscape can aid optimisation in very deep networks.
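The cost and smoothness remarks are easier to see from the derivative. By the product and chain rules, $\frac{d}{dx}\text{Mish}(x) = \tanh(s) + x\,(1 - \tanh^2(s))\,\sigma(x)$ with $s = \text{Softplus}(x)$ and $\sigma$ the logistic sigmoid. The sketch below checks this closed form against a central finite difference. All function names are illustrative, and it is not the MishBackward kernel.

```python
import math


def softplus(x: float) -> float:
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x: float) -> float:
    # Branch on sign so exp() is only ever called on a non-positive argument.
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def mish(x: float) -> float:
    return x * math.tanh(softplus(x))


def mish_grad(x: float) -> float:
    # d/dx [x * tanh(s)] = tanh(s) + x * (1 - tanh(s)^2) * sigmoid(x)
    t = math.tanh(softplus(x))
    return t + x * (1.0 - t * t) * sigmoid(x)


h = 1e-6
for x in (-3.0, -1.0, 0.0, 0.5, 4.0):
    numeric = (mish(x + h) - mish(x - h)) / (2.0 * h)
    print(f"x={x:+.1f}  analytic={mish_grad(x):.6f}  numeric={numeric:.6f}")
```

The backward pass reuses the tanh of the softplus from the forward pass and needs one extra sigmoid, which is where the additional cost relative to ReLU comes from.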

Examples

>>> import lucid
>>> import lucid.nn as nn
>>> m = nn.Mish()
>>> x = lucid.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
>>> m(x)
tensor([-0.2525, -0.3034,  0.    ,  0.8651,  1.9440])
>>> # Drop-in for SiLU in detection backbones
>>> x = lucid.randn(4, 128, 7, 7)
>>> out = m(x)
>>> out.shape
(4, 128, 7, 7)

Used by 1

lucid.nn.modules

Instance methods

forward(x: Tensor) → Tensor


Apply the activation function element-wise.

Parameters

x: Tensor

Input tensor of arbitrary shape.

Returns

Tensor

Output tensor of the same shape as input.
