yolo_v1_tiny¶
ConvNet · One-Stage Detector · Object Detection
The yolo_v1_tiny function constructs a lightweight variant of the YOLO-v1 object detector. It reduces the depth and width of the convolutional backbone to speed up inference while retaining the single-stage detection strategy.
Total Parameters: 236,720,462 (ConvNet + FC)
Function Signature¶
@register_model
def yolo_v1_tiny(num_classes: int = 20, **kwargs) -> YOLO_V1
Parameters¶
num_classes (int, optional): Number of object classes to detect. Default is 20 (PASCAL VOC).
kwargs (dict, optional): Additional arguments to override defaults in YOLO_V1, such as:
split_size (int): Grid size for dividing the input image (default: 7).
num_boxes (int): Number of bounding boxes per grid cell (default: 2).
lambda_coord (float): Weight for coordinate loss (default: 5.0).
lambda_noobj (float): Weight for no-object confidence loss (default: 0.5).
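For instance, a model for a hypothetical 3-class dataset; by the output formula under Training Notes, the final depth becomes 5 * 2 + 3 = 13 with the default grid and box count:
import lucid
from lucid.models import yolo_v1_tiny
# Hypothetical 3-class detection task
model = yolo_v1_tiny(num_classes=3)
preds = model(lucid.rand(1, 3, 448, 448))
print(preds.shape)  # (1, 7, 7, 13)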
Returns¶
YOLO_V1: An instance of the YOLO-v1-tiny model ready for training or inference.
Examples¶
Basic Usage
import lucid
from lucid.models import yolo_v1_tiny
# Create YOLO-v1-tiny model with 20 target classes
model = yolo_v1_tiny(num_classes=20)
# Input: batch of images with shape (N, 3, 448, 448)
x = lucid.rand(8, 3, 448, 448)
# Output: tensor of shape (N, 7, 7, 30) for VOC (20 classes, 2 boxes)
preds = model(x)
print(preds.shape) # (8, 7, 7, 30)
Training Notes¶
The output shape of the model is:
(N, S, S, 5 * B + C)
Where:
S is the grid size (split_size, default: 7),
B is the number of boxes per cell (num_boxes, default: 2),
C is the number of object classes (num_classes, e.g., 20 for VOC).
Per grid cell, this last dimension packs (a slicing sketch follows this list):
B bounding boxes (x, y, w, h, conf),
C class probabilities.
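As a concrete sketch of unpacking the last dimension. The ordering of the box tuples and class scores within the 5 * B + C channels is an assumption here (implementations differ; some place the C class scores first), as is NumPy-style slicing on Lucid tensors:
import lucid
from lucid.models import yolo_v1_tiny
S, B, C = 7, 2, 20
model = yolo_v1_tiny(num_classes=C)
preds = model(lucid.rand(1, 3, 448, 448))  # (1, S, S, 5 * B + C)
# Assumed layout: B (x, y, w, h, conf) tuples first, then C class scores
boxes = preds[..., : 5 * B]    # (1, 7, 7, 10) -> two boxes per cell
classes = preds[..., 5 * B :]  # (1, 7, 7, 20) -> class scores per cell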
Use the get_loss method of the returned model to compute the training loss against ground-truth targets in the same format (a sketch follows the warning below).
Tip
You can override architectural options like split_size, num_boxes, or loss coefficients via **kwargs to create variants of the YOLO-v1-tiny model.
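For example (a hedged sketch; the keyword names are those listed under Parameters above):
from lucid.models import yolo_v1_tiny
# Variant that penalizes coordinate errors more heavily
model = yolo_v1_tiny(
    num_classes=20,
    split_size=7,
    num_boxes=2,
    lambda_coord=10.0,  # doubles the default coordinate-loss weight of 5.0
    lambda_noobj=0.5,
)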
Warning
Make sure the ground truth targets fed into the loss function match the required shape (N, S, S, 5 * B + C), with coordinates normalized to the grid and confidence + class vectors properly set.
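A minimal end-to-end sketch of computing the loss. The exact get_loss signature is an assumption here (predictions first, targets second), and the random targets below merely stand in for a properly encoded label tensor:
import lucid
from lucid.models import yolo_v1_tiny
N, S, B, C = 8, 7, 2, 20
model = yolo_v1_tiny(num_classes=C)
x = lucid.rand(N, 3, 448, 448)
preds = model(x)  # (N, S, S, 5 * B + C)
# Stand-in targets with the required (N, S, S, 5 * B + C) shape; a real
# pipeline would encode grid-normalized boxes, confidences, and one-hot
# class vectors here instead of random values.
targets = lucid.rand(N, S, S, 5 * B + C)
loss = model.get_loss(preds, targets)  # assumed call signature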
Architectural Differences¶
YOLO-v1 (original): Uses a deeper ConvNet with 24 convolutional layers followed by 2 fully connected layers, giving stronger feature extraction at the cost of heavier computation.
YOLO-v1-tiny: Replaces the backbone with a smaller ConvNet that has fewer convolutional layers and narrower channel sizes, reducing model size and computation while sacrificing some accuracy.
In practice, yolo_v1_tiny trades off detection performance for real-time speed on resource-limited devices.