class Mish extends Module
Mish()
Mish activation function.
Applies element-wise:

$$\text{Mish}(x) = x \cdot \tanh(\text{softplus}(x)) = x \cdot \tanh\left(\ln(1 + e^{x})\right)$$
Mish is smooth, non-monotonic, unbounded above, and bounded below: it approaches zero for large negative inputs and has a minimum of about -0.31. Unlike ReLU, it preserves small negative values. It has been reported to outperform Swish/SiLU in some object detection settings, such as the YOLOv4 backbone.
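As a rough illustration, Mish is commonly defined as x · tanh(softplus(x)), and the sketch below computes it with plain NumPy rather than lucid. The helper names `softplus` and `mish` are hypothetical and not part of the lucid API. The numerically stable softplus form is a choice made for this example and is not necessarily how lucid implements it.

```python
import numpy as np

def softplus(x):
    # Stable form of ln(1 + e^x): max(x, 0) + log1p(exp(-|x|)).
    # Avoids overflow in exp for large positive x.
    x = np.asarray(x, dtype=np.float64)
    return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

def mish(x):
    # Element-wise Mish: x * tanh(softplus(x)). Works for any input shape.
    x = np.asarray(x, dtype=np.float64)
    return x * np.tanh(softplus(x))

values = mish([-2.0, -1.0, 0.0, 1.0, 2.0])
print(np.round(values, 4))

# Small negative inputs stay negative instead of being clamped to zero,
# and the function never drops below roughly -0.31.
grid = np.linspace(-20.0, 5.0, 10001)
print(mish(grid).min())
```

Because the function is applied element-wise, the output always has the same shape as the input, whatever the number of dimensions.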
Notes
- Input: any shape.
- Output: same shape as input.
Mish requires computing both a softplus and a tanh, making it slightly more expensive than ReLU or SiLU. The smooth gradient landscape can aid optimisation in very deep networks.
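To make the remark about the smooth gradient concrete, the sketch below derives the Mish derivative with the product and chain rules, then checks it against a central finite difference. It uses plain NumPy. The function names (`softplus`, `sigmoid`, `mish`, `mish_grad`) are hypothetical helpers for this example, not lucid functions, and lucid's own backward pass may be organised differently.

```python
import numpy as np

def softplus(x):
    # Stable ln(1 + e^x).
    return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

def sigmoid(x):
    # Derivative of softplus; written via tanh to avoid overflow in exp.
    return 0.5 * (1.0 + np.tanh(0.5 * x))

def mish(x):
    return x * np.tanh(softplus(x))

def mish_grad(x):
    # d/dx [x * tanh(sp(x))] = tanh(sp) + x * (1 - tanh(sp)^2) * sigmoid(x)
    t = np.tanh(softplus(x))
    return t + x * (1.0 - t * t) * sigmoid(x)

x = np.linspace(-5.0, 5.0, 11)
eps = 1e-6
# Central finite difference as an independent check of the analytic form.
numeric = (mish(x + eps) - mish(x - eps)) / (2.0 * eps)
max_err = np.max(np.abs(numeric - mish_grad(x)))
print(max_err)
```

The derivative is continuous everywhere and equals tanh(ln 2) = 0.6 at x = 0, in contrast to the jump in ReLU's derivative at the origin. The extra cost is also visible here: each element needs an exponential, a logarithm, and a tanh in the forward pass.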
Examples
>>> import lucid
>>> import lucid.nn as nn
>>> m = nn.Mish()
>>> x = lucid.tensor([-2.0, -1.0, 0.0, 1.0, 2.0])
>>> m(x)
tensor([-0.2525, -0.3034,  0.0000,  0.8651,  1.9440])
>>> # Drop-in for SiLU in detection backbones
>>> x = lucid.randn(4, 128, 7, 7)
>>> out = m(x)
>>> out.shape
(4, 128, 7, 7)
Methods (1)
forward(x: Tensor) → Tensor
Apply the activation function element-wise.
Parameters
x (Tensor): Input tensor of arbitrary shape.
Returns
Tensor: Output tensor of the same shape as input.