Object detection
R-CNN
ConvNet Two-Stage Detector Object Detection
R-CNN (Region-based CNN) detects objects by first generating region proposals with Selective Search and then classifying each proposal with a shared CNN. Each proposal is warped to a fixed input size, passed through the CNN for feature extraction, and scored per class. Overlapping detections are then pruned with Non-Maximum Suppression (NMS).
Girshick, Ross, et al. “Rich feature hierarchies for accurate object detection and semantic segmentation.” Proceedings of the IEEE conference on computer vision and pattern recognition (2014): 580-587.
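The final pruning step can be sketched as greedy NMS in plain Python. This is a minimal illustration, not this library's implementation. The `(x1, y1, x2, y2)` box format, the helper names `iou` and `nms`, and the default threshold of 0.3 are assumptions. The R-CNN paper applies NMS to each class independently with a learned threshold.

```python
from typing import List, Sequence, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) in continuous pixel coordinates
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.3) -> List[int]:
    """Greedy NMS for ONE class (hypothetical helper): indices of kept boxes, best first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)  # highest-scoring box still in play
        keep.append(best)
        # Drop every remaining box that overlaps the kept one above the threshold
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # → [0, 2]
```

The second box overlaps the first with an IoU of 81/119 ≈ 0.68, so it is suppressed, while the disjoint third box survives.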
Name | Model | Input Shape |
---|---|---|
R-CNN | | \((N,C_{in},H,W)\) |
Fast R-CNN
ConvNet Two-Stage Detector Object Detection
Fast R-CNN improves upon R-CNN by computing the convolutional feature map once for the entire image, then extracting a fixed-size feature for each proposed region with RoI Pooling. Region proposals still come from an external method such as Selective Search. Classification and bounding box regression are unified into a single network on top of the shared backbone.
Girshick, Ross. “Fast R-CNN.” Proceedings of the IEEE international conference on computer vision (2015): 1440-1448.
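RoI Pooling divides each projected region into a fixed grid of bins and max-pools each bin, so every proposal yields a feature of the same size (the paper uses 7 × 7 for VGG16). The sketch below works on a single-channel feature map in plain Python. The helper name `roi_max_pool`, the round-half-up projection, and the floor/ceil bin boundaries are assumptions, not this library's API.

```python
import math
from typing import List, Tuple

# Single-channel feature map indexed as fmap[y][x]; real feature maps have C channels
FeatureMap = List[List[float]]


def roi_max_pool(
    fmap: FeatureMap,
    roi: Tuple[float, float, float, float],  # (x1, y1, x2, y2) in input-image pixels
    output_size: Tuple[int, int] = (2, 2),   # (pooled_h, pooled_w)
    spatial_scale: float = 1.0,              # e.g. 1/16 for a backbone with stride 16
) -> FeatureMap:
    """Max-pool one RoI into a fixed output_size grid (hypothetical helper)."""
    height, width = len(fmap), len(fmap[0])
    ph, pw = output_size
    # Project the RoI onto the feature map, rounding half up to whole cells (inclusive ends)
    x1, y1, x2, y2 = (math.floor(c * spatial_scale + 0.5) for c in roi)
    roi_w, roi_h = max(x2 - x1 + 1, 1), max(y2 - y1 + 1, 1)
    bin_w, bin_h = roi_w / pw, roi_h / ph

    pooled: FeatureMap = []
    for i in range(ph):
        # Bin rows: floor the start, ceil the end, then clip to the map
        ys = min(max(y1 + math.floor(i * bin_h), 0), height)
        ye = min(max(y1 + math.ceil((i + 1) * bin_h), 0), height)
        row = []
        for j in range(pw):
            xs = min(max(x1 + math.floor(j * bin_w), 0), width)
            xe = min(max(x1 + math.ceil((j + 1) * bin_w), 0), width)
            cells = [fmap[y][x] for y in range(ys, ye) for x in range(xs, xe)]
            row.append(max(cells) if cells else 0.0)  # empty bins pool to 0
        pooled.append(row)
    return pooled


fmap = [[float(4 * y + x) for x in range(4)] for y in range(4)]
print(roi_max_pool(fmap, (0, 0, 3, 3)))  # → [[5.0, 7.0], [13.0, 15.0]]
```

Because the output size is fixed regardless of the RoI's extent, the pooled features can feed the same fully connected classification and regression layers for every proposal.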
Name | Model | Input Shape |
---|---|---|
Fast R-CNN | | \((N,C_{in},H,W)\) |
Faster R-CNN
ConvNet Two-Stage Detector Object Detection
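Faster R-CNN builds on Fast R-CNN by replacing the external proposal method with a Region Proposal Network (RPN) that shares convolutional features with the detection head. Because proposals reuse those shared features, they are nearly cost-free, which makes the detector trainable end to end and fast enough for near-real-time inference (about 5 fps with a VGG-16 backbone on a GPU in the original paper).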
Ren, Shaoqing et al. “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.” IEEE Transactions on Pattern Analysis and Machine Intelligence (2017).
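The RPN predicts objectness scores and box offsets relative to a set of reference boxes, called anchors, placed at every feature-map position. The paper uses 3 scales (box areas of \(128^2\), \(256^2\), \(512^2\) pixels) and 3 aspect ratios (1:1, 1:2, 2:1), giving 9 anchors per position with an effective stride of 16 for VGG-16. Offsets follow \(t_x = (x - x_a)/w_a\), \(t_y = (y - y_a)/h_a\), \(t_w = \log(w/w_a)\), \(t_h = \log(h/h_a)\). The sketch below generates anchors and inverts that parameterization. The names `make_anchors` and `decode` and the cell-centre placement are assumptions, not this library's API.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in input-image pixels


def make_anchors(
    feat_h: int,
    feat_w: int,
    stride: int = 16,
    scales: Sequence[float] = (128, 256, 512),
    ratios: Sequence[float] = (0.5, 1.0, 2.0),  # aspect ratio taken as height / width
) -> List[Box]:
    """Place len(scales) * len(ratios) anchors at every feature-map cell (hypothetical helper)."""
    anchors: List[Box] = []
    for y in range(feat_h):
        for x in range(feat_w):
            # Assumed convention: anchors are centred on the middle of each stride x stride cell
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for s in scales:
                for r in ratios:
                    # Keep the area at s * s while setting h / w = r
                    w, h = s / math.sqrt(r), s * math.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def decode(anchor: Box, deltas: Tuple[float, float, float, float]) -> Box:
    """Turn predicted offsets (tx, ty, tw, th) into a box, relative to its anchor."""
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    acx, acy = anchor[0] + aw / 2, anchor[1] + ah / 2
    tx, ty, tw, th = deltas
    # Centre shifts are scaled by anchor size; width and height are scaled in log space
    cx, cy = acx + tx * aw, acy + ty * ah
    w, h = aw * math.exp(tw), ah * math.exp(th)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


anchors = make_anchors(feat_h=2, feat_w=3)
print(len(anchors))  # → 54  (6 cells x 9 anchors)
print(decode(anchors[4], (0.0, 0.0, 0.0, 0.0)) == anchors[4])  # → True
```

Predicting offsets relative to anchors, rather than absolute coordinates, lets a single convolutional head cover objects of several sizes and shapes at each position.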
Name | Model | Input Shape | Parameter Count |
---|---|---|---|
Faster R-CNN | | \((N,C_{in},H,W)\) | \(-\) |
Faster R-CNN ResNet-50 FPN | | \((N,3,H,W)\) | 43,515,902 |
Faster R-CNN ResNet-101 FPN | | \((N,3,H,W)\) | 62,508,030 |
YOLO
ConvNet One-Stage Detector Object Detection
YOLO is a one-stage object detector that frames detection as a single regression problem, directly predicting bounding boxes and class probabilities from the full image in a single forward pass. This makes it fast enough for real-time detection (45 fps for the base model in the original paper), at the cost of more localization errors than two-stage detectors such as Fast R-CNN.
Redmon, Joseph et al. “You Only Look Once: Unified, Real-Time Object Detection.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016).
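In YOLO-v1 the image is divided into an \(S \times S\) grid; each cell predicts \(B\) boxes, each with \((x, y, w, h)\) and a confidence, plus \(C\) class probabilities. The paper uses \(S = 7\), \(B = 2\), \(C = 20\) on PASCAL VOC, giving a \(7 \times 7 \times 30\) output, and has the network predict the square roots of box width and height. The sketch below decodes such a grid into scored boxes. The per-cell memory layout, the helper name `decode_yolo_v1`, and the 0.2 score cut-off are assumptions, not this library's API. The resulting boxes would still go through NMS, as in the paper.

```python
from typing import List, Sequence, Tuple

# (x1, y1, x2, y2, class_id, score) in input-image pixels
Detection = Tuple[float, float, float, float, int, float]


def decode_yolo_v1(
    grid: Sequence[Sequence[Sequence[float]]],  # S x S cells, each holding B * 5 + C values
    num_boxes: int = 2,            # B
    num_classes: int = 20,         # C
    img_size: float = 448.0,
    score_threshold: float = 0.2,  # illustrative cut-off, an assumption
) -> List[Detection]:
    """Decode a YOLO-v1-style grid into scored boxes (hypothetical helper, pre-NMS)."""
    S = len(grid)
    cell = img_size / S
    detections: List[Detection] = []
    for row in range(S):
        for col in range(S):
            vec = grid[row][col]
            # Assumed per-cell layout: B x (x, y, sqrt_w, sqrt_h, conf), then C class probabilities
            class_probs = vec[num_boxes * 5 : num_boxes * 5 + num_classes]
            cls = max(range(num_classes), key=lambda c: class_probs[c])
            for b in range(num_boxes):
                x, y, sqrt_w, sqrt_h, conf = vec[b * 5 : b * 5 + 5]
                # (x, y) are offsets inside the cell; w and h are fractions of the whole image
                cx, cy = (col + x) * cell, (row + y) * cell
                w, h = sqrt_w ** 2 * img_size, sqrt_h ** 2 * img_size
                # Class-specific confidence = Pr(class | object) * box confidence
                score = class_probs[cls] * conf
                if score >= score_threshold:
                    detections.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, cls, score))
    return detections


S, B, C = 7, 2, 20
grid = [[[0.0] * (B * 5 + C) for _ in range(S)] for _ in range(S)]
grid[3][3][0:5] = [0.5, 0.5, 0.5, 0.5, 0.8]  # first box of the centre cell
grid[3][3][B * 5 + 11] = 0.9                 # probability of class 11 in that cell
print(decode_yolo_v1(grid))  # one detection: box (168.0, 168.0, 280.0, 280.0), class 11, score about 0.72
```

Since every box comes out of one tensor in one pass, decoding is a cheap loop over the grid, which is part of why the detector is fast.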
Name | Model | Input Shape | Parameter Count |
---|---|---|---|
YOLO-v1 | | \((N,3,448,448)\) | 271,716,734 |
To be implemented…🔮