Fast R-CNN¶
Tags: ConvNet, Two-Stage Detector, Object Detection
- class lucid.models.FastRCNN(backbone: Module, feat_channels: int, num_classes: int, pool_size: tuple[int, int] = (7, 7), hidden_dim: int = 4096, bbox_reg_means: tuple[float, ...] = (0.0, 0.0, 0.0, 0.0), bbox_reg_stds: tuple[float, ...] = (0.1, 0.1, 0.2, 0.2), dropout: float = 0.5, proposal_generator: Callable[..., Tensor] | None = None)¶
FastRCNN implements the Fast Region-based Convolutional Neural Network architecture for object detection, building upon the R-CNN approach by introducing a more efficient detection pipeline. It replaces per-region feature extraction with RoI pooling and integrates classification and bounding box regression into a single forward pass.

Class Signature¶
class FastRCNN(nn.Module):
    def __init__(
        self,
        backbone: nn.Module,
        feat_channels: int,
        num_classes: int,
        pool_size: tuple[int, int] = (7, 7),
        hidden_dim: int = 4096,
        bbox_reg_means: tuple[float, ...] = (0.0, 0.0, 0.0, 0.0),
        bbox_reg_stds: tuple[float, ...] = (0.1, 0.1, 0.2, 0.2),
        dropout: float = 0.5,
        proposal_generator: Callable[..., Tensor] | None = None,
    )
Parameters¶
backbone (nn.Module): Convolutional feature extractor applied over the entire image once.
feat_channels (int): Number of output channels from the backbone feature map.
num_classes (int): Number of object categories (excluding background).
pool_size (tuple[int, int], optional): Output size of the spatial pooling operation (typically RoIAlign or RoIPool). Default is (7, 7).
hidden_dim (int, optional): Number of hidden units in the fully connected layers after pooling. Default is 4096.
bbox_reg_means (tuple[float, ...], optional): Normalization means for bounding box regression targets. Default is (0.0, 0.0, 0.0, 0.0).
bbox_reg_stds (tuple[float, ...], optional): Normalization stds for bounding box regression targets. Default is (0.1, 0.1, 0.2, 0.2).
dropout (float, optional): Dropout probability used in the fully connected layers. Default is 0.5.
proposal_generator (Callable[..., Tensor] | None, optional): Custom region proposal function. If None, precomputed proposals are used. A sketch of a custom callable follows this list.
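A minimal sketch of plugging in a custom proposal_generator. The callable's exact argument list and the expected proposal format are not documented here, so the hypothetical generator below simply ignores whatever the model forwards to it and returns a fixed set of (x1, y1, x2, y2) boxes; that box layout, the lucid.Tensor constructor, and the single-conv stand-in backbone are illustrative assumptions, not part of the FastRCNN API.

import lucid
import lucid.nn as nn
import lucid.models as models


def fixed_proposals(*args, **kwargs):
    # Hypothetical generator: ignore the forwarded arguments and return
    # a fixed set of (x1, y1, x2, y2) boxes as a Tensor (assumed format).
    return lucid.Tensor(
        [
            [0.0, 0.0, 128.0, 128.0],
            [64.0, 64.0, 256.0, 256.0],
        ]
    )


# A single conv layer stands in for a real backbone in this sketch.
backbone = nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1)
model = models.FastRCNN(
    backbone=backbone,
    feat_channels=64,
    num_classes=4,
    proposal_generator=fixed_proposals,
)

In the original Fast R-CNN setup such proposals were precomputed (e.g. by selective search), which corresponds to the default None behavior.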
Architecture¶
Fast R-CNN improves over the original R-CNN by computing the CNN feature map once per image and classifying object proposals directly on this shared map:
1. Full-Image Feature Map: the input image is passed through the backbone once to extract a dense feature map.
2. Region of Interest (RoI) Pooling: region proposals are projected onto the feature map and pooled to a fixed spatial size (defined by pool_size).
3. Two-Stream Head: each pooled region passes through shared fully connected layers; one stream performs classification over num_classes, while the other regresses per-class bounding box adjustments.
4. Bounding Box Normalization: regression outputs are scaled using bbox_reg_means and bbox_reg_stds before being applied to the proposals (see the decoding sketch after this list).
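The following is a minimal sketch of the standard Fast R-CNN box decoding under the common (dx, dy, dw, dh) parameterization; the exact decoding inside lucid's implementation may differ in detail, so treat this as an illustration of how bbox_reg_means and bbox_reg_stds enter the computation.

import math


def decode_box(proposal, deltas, means=(0.0, 0.0, 0.0, 0.0), stds=(0.1, 0.1, 0.2, 0.2)):
    """Apply normalized (dx, dy, dw, dh) deltas to an (x1, y1, x2, y2) proposal."""
    # Undo target normalization: delta = raw * std + mean
    dx, dy, dw, dh = (d * s + m for d, s, m in zip(deltas, stds, means))

    x1, y1, x2, y2 = proposal
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h

    # Shift the center and rescale the size.
    new_cx, new_cy = cx + dx * w, cy + dy * h
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)

    return (
        new_cx - 0.5 * new_w,
        new_cy - 0.5 * new_h,
        new_cx + 0.5 * new_w,
        new_cy + 0.5 * new_h,
    )


# A zero delta leaves the proposal unchanged.
print(decode_box((10.0, 10.0, 50.0, 30.0), (0.0, 0.0, 0.0, 0.0)))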
Examples¶
Basic Usage
import lucid
import lucid.nn as nn
import lucid.models as models
import lucid.random


class ToyBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)


# Instantiate the model on top of the toy backbone
backbone = ToyBackbone()
model = models.FastRCNN(backbone, feat_channels=64, num_classes=4)

# Dummy input
image = lucid.random.randn(1, 3, 512, 512)
output = model.predict(image)

print(output["boxes"].shape, output["scores"].shape, output["labels"].shape)
Explanation
Fast R-CNN accelerates inference by removing redundant computation. A single backbone pass generates features, which are then reused for each region proposal. RoI pooling ensures a fixed-size input for classification and regression heads.
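As a rough illustration of the shared computation (continuing from the Basic Usage example above), ToyBackbone downsamples the image by a total stride of 4, so the 512x512 input yields a single 128x128 feature map that every proposal reuses; the projection arithmetic below is illustrative and not lifted from the implementation.

# Shape arithmetic only; the projection and pooling happen inside FastRCNN.
feat = backbone(image)                   # reuses backbone/image from the example above
print(feat.shape)                        # (1, 64, 128, 128): two stride-2 convs => stride 4

proposal = (40, 80, 360, 240)            # (x1, y1, x2, y2) in image coordinates
fm_box = tuple(v / 4 for v in proposal)  # same box on the feature map
print(fm_box)                            # (10.0, 20.0, 90.0, 60.0); pooled to pool_size = (7, 7)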
Custom Configuration
import lucid
import lucid.nn as nn
import lucid.models as models
import lucid.random


class MiniBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )

    def forward(self, x):
        return self.conv(x)


backbone = MiniBackbone()
model = models.FastRCNN(
    backbone=backbone,
    feat_channels=32,
    num_classes=3,
    pool_size=(5, 5),
    hidden_dim=1024,
    dropout=0.3,
)

image = lucid.random.randn(1, 3, 256, 256)
output = model.predict(image)

print(output["boxes"].shape, output["scores"].shape, output["labels"].shape)