nn.Mish

class lucid.nn.Mish

The Mish module applies the Mish activation function element-wise to the input tensor. The Mish function is defined as:

\[\text{Mish}(\mathbf{x}) = \mathbf{x} \cdot \tanh(\ln(1 + e^{\mathbf{x}}))\]

Where \(\tanh(\cdot)\) is the hyperbolic tangent function and \(\ln(1 + e^{\mathbf{x}})\) is the softplus function. Mish is a smooth, non-monotonic activation function that has demonstrated promising performance in deep neural networks.
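
For reference, the same computation can be written in a few lines of plain NumPy. The snippet below is a minimal standalone sketch for illustration only and does not reflect Lucid's internal implementation:

>>> import numpy as np
>>> def mish(x):
...     sp = np.logaddexp(0.0, x)  # softplus(x) = ln(1 + e^x), computed stably
...     return x * np.tanh(sp)
...
>>> round(float(mish(1.0)), 4)
0.8651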

Class Signature

class lucid.nn.Mish()

Forward Calculation

The Mish module performs the following operation:

\[\mathbf{y} = \mathbf{x} \cdot \tanh(\ln(1 + e^{\mathbf{x}}))\]

Where:

  • \(\mathbf{x}\) is the input tensor.

  • \(\mathbf{y}\) is the output tensor, calculated as the element-wise product of the input and the hyperbolic tangent of its softplus.
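
For example, evaluating the formula at the scalar input \(x = 2\), which also appears in the first example below:

\[\text{Mish}(2) = 2 \cdot \tanh\big(\ln(1 + e^{2})\big) = 2 \cdot \tanh(2.1269\ldots) \approx 1.9440\]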

Backward Gradient Calculation

During backpropagation, the derivative of Mish with respect to the input is:

\[\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \tanh(\text{sp}) + \mathbf{x} \cdot \sigma(\mathbf{x}) \cdot (1 - \tanh^2(\text{sp}))\]

Where:

  • \(\text{sp} = \ln(1 + e^{\mathbf{x}})\) is the softplus of the input.

  • \(\sigma(\mathbf{x}) = \frac{1}{1 + e^{-\mathbf{x}}}\) is the sigmoid function.

  • The first term comes from differentiating the \(\mathbf{x}\) factor directly; the second applies the chain rule through \(\tanh\) and softplus, using \(\frac{\partial\, \text{sp}}{\partial \mathbf{x}} = \sigma(\mathbf{x})\).

The Lucid autodiff engine uses this formula internally for accurate and efficient gradient computation.
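
As a quick sanity check of the formula above, the analytic derivative can be compared against a central finite-difference estimate using plain NumPy. This is a standalone sketch and is not part of the Lucid API:

>>> import numpy as np
>>> def mish(x):
...     return x * np.tanh(np.logaddexp(0.0, x))  # x * tanh(softplus(x))
...
>>> def mish_grad(x):
...     sp = np.logaddexp(0.0, x)        # softplus(x)
...     t = np.tanh(sp)
...     sig = 1.0 / (1.0 + np.exp(-x))   # sigmoid(x)
...     return t + x * sig * (1.0 - t ** 2)
...
>>> x, eps = 0.7, 1e-6
>>> numeric = (mish(x + eps) - mish(x - eps)) / (2 * eps)
>>> bool(abs(numeric - mish_grad(x)) < 1e-6)
True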

Examples

Applying `Mish` to a single input tensor:

>>> import lucid
>>> import lucid.nn as nn
>>> input_tensor = lucid.Tensor([[-1.0, 2.0, -0.5, 3.0]], requires_grad=True)  # Shape: (1, 4)
>>> mish = nn.Mish()
>>> output = mish(input_tensor)
>>> print(output)
Tensor([[-0.3034016, 1.9439592, -0.2207438, 2.9865277]], grad=None)

# Backpropagation
>>> output.backward()
>>> print(input_tensor.grad)
[[...]]  # Gradients with respect to input_tensor

Using `Mish` within a simple custom module:

>>> import lucid
>>> import lucid.nn as nn
>>> class SimpleMishModel(nn.Module):
...     def __init__(self):
...         super(SimpleMishModel, self).__init__()
...         self.mish = nn.Mish()
...
...     def forward(self, x):
...         return self.mish(x)
...
>>> model = SimpleMishModel()
>>> input_data = lucid.Tensor([[-2.0, 0.5, 1.5, -0.3]], requires_grad=True)  # Shape: (1, 4)
>>> output = model(input_data)
>>> print(output)
Tensor([[-0.2525015, 0.3752452, 1.4033783, -0.1511332]], grad=None)

# Backpropagation
>>> output.backward()
>>> print(input_data.grad)
[[...]]  # Gradients with respect to input_data

Integrating `Mish` into a neural network model:

>>> import lucid
>>> import lucid.nn as nn
>>> class NeuralNetwork(nn.Module):
...     def __init__(self):
...         super(NeuralNetwork, self).__init__()
...         self.fc1 = nn.Linear(in_features=3, out_features=5)
...         self.mish = nn.Mish()
...         self.fc2 = nn.Linear(in_features=5, out_features=2)
...
...     def forward(self, x):
...         x = self.fc1(x)
...         x = self.mish(x)
...         x = self.fc2(x)
...         return x
...
>>> model = NeuralNetwork()
>>> input_data = lucid.Tensor([[0.5, -1.2, 3.3]], requires_grad=True)  # Shape: (1, 3)
>>> output = model(input_data)
>>> print(output)
Tensor([[...]], grad=None)  # Output tensor after passing through the model

# Backpropagation
>>> output.backward()
>>> print(input_data.grad)
[[...]]  # Gradients with respect to input_data