Vision ModelsΒΆ
Task |
Explanation |
Docs |
|---|---|---|
Image Classification |
Assign one or more semantic labels to an input image by learning discriminative visual features. Typical outputs are a top-1 class prediction and class probabilities (top-k), and this task is commonly used for recognition benchmarks and as a pretrained backbone for downstream vision tasks. |
|
Object Detection |
Detect and localize multiple objects in an image by predicting bounding boxes together with class labels and confidence scores for each instance. Unlike image classification, detection must answer both what is present and where it appears, making it suitable for scene understanding and real-world perception pipelines. |
|
Image Segmentation |
Perform dense visual understanding by assigning predictions at the pixel level. In instance segmentation, each object instance receives both a category label and its own binary mask, enabling fine-grained scene parsing beyond bounding boxes. |