Object detection

R-CNN

ConvNet, Two-Stage Detector, Object Detection

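R-CNN (Region-based CNN) detects objects by first generating region proposals with Selective Search, then warping each proposal to a fixed size and classifying it with a CNN whose weights are shared across all regions. The pipeline combines region warping, feature extraction, and per-region classification, followed by Non-Maximum Suppression to remove duplicate detections.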

Girshick, Ross, et al. “Rich feature hierarchies for accurate object detection and semantic segmentation.” Proceedings of the IEEE conference on computer vision and pattern recognition (2014): 580-587.

| Name | Model | Input Shape |
|---|---|---|
| R-CNN | RCNN | \((N,C_{in},H,W)\) |
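
The Non-Maximum Suppression step mentioned in the R-CNN description can be sketched in a few lines. This is a minimal, dependency-free illustration of greedy NMS, not the library's own implementation; the function names `iou` and `nms`, the `(x1, y1, x2, y2)` box format, and the 0.5 threshold are assumptions chosen for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one separate box (hypothetical data).
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box is dropped because its IoU with the higher-scoring first box is about 0.82, above the threshold. In practice NMS is usually applied per class.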

Fast R-CNN

ConvNet, Two-Stage Detector, Object Detection

Fast R-CNN improves upon R-CNN by computing the feature map once for the entire image, then pooling features from proposed regions using RoI Pooling. It unifies classification and bounding box regression into a single network with a shared backbone.

Girshick, Ross. “Fast R-CNN.” Proceedings of the IEEE international conference on computer vision (2015): 1440-1448.

| Name | Model | Input Shape |
|---|---|---|
| Fast R-CNN | FastRCNN | \((N,C_{in},H,W)\) |
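
To make the RoI Pooling idea from the Fast R-CNN description concrete, the sketch below max-pools a rectangular region of a single-channel feature map into a fixed-size grid. It is an illustration only: the function name `roi_max_pool`, the exclusive-end RoI convention, and the floor/ceil bin quantisation are assumptions for this example, and real implementations operate on batched multi-channel tensors.

```python
def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool roi = (x1, y1, x2, y2) into an output_size grid.

    Coordinates are feature-map cell indices with exclusive ends, and the
    RoI is assumed to be at least 1x1 cells.
    """
    x1, y1, x2, y2 = roi
    out_h, out_w = output_size

    def bin_edges(start, length, n_bins, k):
        # Bin k spans [floor(k*length/n), ceil((k+1)*length/n)) from start.
        lo = start + (k * length) // n_bins
        hi = start - (-((k + 1) * length) // n_bins)
        return lo, hi

    pooled = []
    for i in range(out_h):
        ys, ye = bin_edges(y1, y2 - y1, out_h, i)
        row = []
        for j in range(out_w):
            xs, xe = bin_edges(x1, x2 - x1, out_w, j)
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 feature map.
fm = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(roi_max_pool(fm, (0, 0, 4, 4)))  # [[6, 8], [14, 16]]
print(roi_max_pool(fm, (1, 1, 4, 4)))  # [[11, 12], [15, 16]]
```

Whatever the size of the proposed region, the output has the same shape, which is what lets a single fully connected head process every proposal.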

Faster R-CNN

ConvNet, Two-Stage Detector, Object Detection

Faster R-CNN builds on Fast R-CNN by introducing a Region Proposal Network (RPN) that shares convolutional features with the detection head, enabling end-to-end training and near real-time inference.

Ren, Shaoqing, et al. “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.” IEEE Transactions on Pattern Analysis and Machine Intelligence (2017).

| Name | Model | Input Shape | Parameter Count |
|---|---|---|---|
| Faster R-CNN | FasterRCNN | \((N,C_{in},H,W)\) | \(-\) |
| Faster R-CNN ResNet-50 FPN | faster_rcnn_resnet_50_fpn | \((N,3,H,W)\) | 43,515,902 |
| Faster R-CNN ResNet-101 FPN | faster_rcnn_resnet_101_fpn | \((N,3,H,W)\) | 62,508,030 |
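
An RPN scores and refines a fixed set of reference boxes (anchors) placed at every feature-map location. The sketch below generates such anchors using the three scales and three aspect ratios reported in the Faster R-CNN paper. The function name `generate_anchors`, the stride of 16, and the cell-centre placement are assumptions for illustration and may differ from what the models listed above use.

```python
import math


def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (x1, y1, x2, y2) in image pixels.

    Each anchor has area scale**2 and height/width equal to ratio.
    """
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Assumed convention: anchors are centred on each stride cell.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors


anchors = generate_anchors(feat_h=2, feat_w=3)
print(len(anchors))  # 2 * 3 * 9 = 54
print(anchors[1])    # (-56.0, -56.0, 72.0, 72.0)
```

For each anchor the RPN predicts an objectness score and four box offsets; anchors that extend past the image border, like the one printed above, are typically clipped or ignored during training.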

YOLO

ConvNet, One-Stage Detector, Object Detection

YOLO is a one-stage object detector that frames detection as a single regression problem, predicting bounding boxes and class probabilities directly from the full image in one forward pass. This design makes real-time detection possible, at the cost of some localization accuracy compared with two-stage detectors.

Redmon, Joseph, et al. “You Only Look Once: Unified, Real-Time Object Detection.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016).

| Name | Model | Input Shape | Parameter Count |
|---|---|---|---|
| YOLO-v1 | yolo_v1 | \((N,3,448,448)\) | 271,716,734 |
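
The "single regression problem" framing can be illustrated with the grid encoding from the YOLO paper: the image is split into an S x S grid and each cell predicts B boxes plus C class probabilities. The sketch below computes the resulting output size and converts one cell-relative box prediction to pixel coordinates. The helper names `output_size` and `decode_box` are hypothetical, and the defaults (S=7, B=2, C=20, 448-pixel input) follow the paper's PASCAL VOC setup rather than any confirmed detail of `yolo_v1`.

```python
def output_size(S=7, B=2, C=20):
    """Number of values predicted per image: S*S cells, each with
    B boxes of (x, y, w, h, confidence) plus C class probabilities."""
    return S * S * (B * 5 + C)


def decode_box(cell_row, cell_col, pred, S=7, image_size=448):
    """Convert one predicted box to (x1, y1, x2, y2) in pixels.

    pred = (x, y, w, h, conf): x and y are offsets inside the grid cell
    in [0, 1]; w and h are fractions of the whole image.
    """
    x, y, w, h, conf = pred
    cx = (cell_col + x) / S * image_size
    cy = (cell_row + y) / S * image_size
    bw, bh = w * image_size, h * image_size
    return (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2), conf


print(output_size())  # 1470
box, conf = decode_box(3, 3, (0.5, 0.5, 0.25, 0.5, 0.9))
print(box, conf)      # (168.0, 112.0, 280.0, 336.0) 0.9
```

At inference time each box confidence is multiplied by the cell's class probabilities to give class-specific scores, and overlapping detections are then filtered with Non-Maximum Suppression.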

To be implemented…🔮