nn.Dropout

class lucid.nn.Dropout(p: float = 0.5)

The Dropout module applies dropout regularization to the input tensor.

Dropout is a regularization technique used to prevent overfitting in neural networks by randomly zeroing out a subset of activations during training. This encourages the network to learn more robust features that are not reliant on specific activations.

Class Signature

class lucid.nn.Dropout(p: float = 0.5) -> None

Parameters

  • p (float, optional): The probability of an element being zeroed. Must be between 0 and 1. Default is 0.5.

Attributes

  • mask (Tensor or None): A binary mask tensor of the same shape as the input, where each element is 0 with probability p and 1 otherwise. This mask is used to zero out elements during the forward pass in training mode.
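The mask can be inspected after a forward pass in training mode. A minimal sketch, assuming the attribute is populated exactly as described above and that the module is in training mode by default (as in the Examples section below):

>>> import lucid.nn as nn
>>> from lucid import Tensor
>>> dropout = nn.Dropout(p=0.5)
>>> output = dropout(Tensor([[1.0, 2.0, 3.0, 4.0]]))
>>> print(dropout.mask)
# A binary tensor of shape (1, 4); each element is 0 with probability 0.5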

Forward Calculation

The Dropout module performs the following operations:

During Training:

  1. Mask Generation:

    \[\mathbf{mask} \sim \text{Bernoulli}(1 - p)\]

    Each element of the mask tensor is sampled independently from a Bernoulli distribution with probability 1 - p of being 1.

  2. Applying Dropout:

    \[\mathbf{y} = \frac{\mathbf{x} \odot \mathbf{mask}}{1 - p}\]

    Where:

    • \(\mathbf{x}\) is the input tensor.

    • \(\mathbf{mask}\) is the binary mask tensor.

    • \(\odot\) denotes element-wise multiplication.

    • The division by \(1 - p\) rescales the surviving activations so that their expected value matches that of the input without dropout (inverted dropout).

During Evaluation:

\[\mathbf{y} = \mathbf{x}\]

Dropout is not applied during evaluation; the input is passed through unchanged.
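The forward logic above can be summarized in a short standalone sketch. This is a conceptual NumPy illustration of inverted dropout, not lucid's actual implementation; the function name and signature are purely illustrative:

import numpy as np

def dropout_forward(x: np.ndarray, p: float = 0.5, training: bool = True):
    # Conceptual inverted-dropout forward pass (illustrative only).
    if not training or p == 0.0:
        # Evaluation mode (or p = 0): the input passes through unchanged.
        return x, None
    # Each mask element is 1 with probability 1 - p and 0 with probability p.
    mask = (np.random.rand(*x.shape) >= p).astype(x.dtype)
    # Scale the surviving activations by 1 / (1 - p) to preserve the expectation.
    return x * mask / (1.0 - p), mask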

Backward Gradient Calculation

During backpropagation, the gradient of the loss with respect to the input tensor is computed as follows:

During Training:

\[\frac{\partial \mathcal{L}}{\partial \mathbf{x}} = \frac{\mathbf{mask}}{1 - p} \odot \frac{\partial \mathcal{L}}{\partial \mathbf{y}}\]

During Evaluation:

\[\frac{\partial \mathcal{L}}{\partial \mathbf{x}} = \frac{\partial \mathcal{L}}{\partial \mathbf{y}}\]

Where:

  • \(\mathcal{L}\) is the loss function.

  • \(\mathbf{mask}\) is the binary mask tensor applied during the forward pass.

  • \(\frac{\partial \mathcal{L}}{\partial \mathbf{y}}\) is the gradient of the loss with respect to the output.

These gradients ensure that only the non-dropped elements contribute to the weight updates, maintaining the robustness introduced by Dropout.
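As a concrete worked example, take the input from the Examples section below with \(p = 0.5\), suppose the sampled mask keeps every element except the second, and let the upstream gradient be all ones. The training-mode rule above then gives:

\[\mathbf{mask} = [1, 0, 1, 1], \quad \frac{\partial \mathcal{L}}{\partial \mathbf{y}} = [1, 1, 1, 1] \;\Rightarrow\; \frac{\partial \mathcal{L}}{\partial \mathbf{x}} = \frac{[1, 0, 1, 1]}{1 - 0.5} \odot [1, 1, 1, 1] = [2, 0, 2, 2]\]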

Examples

Using `Dropout` with a simple input tensor:

>>> import lucid.nn as nn
>>> from lucid import Tensor
>>> input_tensor = Tensor([[1.0, 2.0, 3.0, 4.0]], requires_grad=True)  # Shape: (1, 4)
>>> dropout = nn.Dropout(p=0.5)
>>> output = dropout(input_tensor)  # Shape: (1, 4)
>>> print(output)
Tensor([[2.0, 0.0, 6.0, 8.0]], grad=None)  # One possible output: dropped elements are zeroed, kept elements are scaled by 1 / (1 - p) = 2

# Backpropagation
>>> output.backward()
>>> print(input_tensor.grad)
[[2.0, 0.0, 2.0, 2.0]]  # Gradient equals mask / (1 - p): zero where dropout was applied, 2.0 elsewhere

Using `Dropout` within a simple neural network:

>>> import lucid.nn as nn
>>> from lucid import Tensor
>>> class DropoutModel(nn.Module):
...     def __init__(self):
...         super(DropoutModel, self).__init__()
...         self.linear = nn.Linear(in_features=4, out_features=2)
...         self.dropout = nn.Dropout(p=0.5)
...
...     def forward(self, x):
...         x = self.linear(x)
...         x = self.dropout(x)
...         return x
...
>>> model = DropoutModel()
>>> input_data = Tensor([[1.0, 2.0, 3.0, 4.0]], requires_grad=True)  # Shape: (1, 4)
>>> output = model(input_data)
>>> print(output)
Tensor([[..., ...]], grad=None)  # Example output of shape (1, 2) after passing through the model

# Backpropagation
>>> output.backward()
>>> print(input_data.grad)
# Gradients with respect to input_data, propagated back through the dropout mask and the linear layer