Category

Computer Vision

Object detection, image segmentation, OCR, face recognition, depth estimation, and pose detection models — including YOLO26, SAM 3, and RT-DETR for real-time visual AI applications.

2 APIs · 4 AI Models
Most Popular In: Object Detection, Image Classification, OCR
Auth Breakdown: API Key 100%
Notable Developers: Meta AI (SAM 3), Ultralytics (YOLO26), Google (Vision API), Microsoft Azure Vision, Roboflow
Updated Jun 12, 2026
Curated by FreeAPIHub editors
Topics: Object Detection, Image Segmentation, OCR & Document AI, Facial Recognition, Pose Estimation, Depth Estimation

Imagga API

🔥 Hot
Computer Vision

Imagga is an image-recognition API for automatic tagging, categorisation, smart cropping, colour extraction and content moderation. Send an image and get descriptive tags with confidence scores as JSON.

Freemium · API Key

Clarifai API

🔥 Hot
Computer Vision

Clarifai is an AI platform with a unified API for computer vision, natural language and generative models. Send an image, video or text to a model and get predictions - labels, detections, embeddings or generated output - as JSON.

Freemium · API Key

Segment Anything

🔥 Hot
by Meta AI

Segment Anything (SAM) is Meta's foundation model for image segmentation. Given a point, box or mask prompt, it cuts out any object in any image zero-shot — no per-class training — making it a universal segmentation tool.

Apache 2.0 · ViT-B 91M / ViT-L 30

YOLOv5

🔥 Hot
by Ultralytics

YOLOv5 is a fast, popular real-time object-detection model from Ultralytics. Built in PyTorch with sizes from nano to extra-large, it balances speed and accuracy and is easy to train, export and deploy on edge or server.

GPL v3 · 1.9M (n) – 86.7M (x)

Detectron2

🔥 Hot
by Meta AI (FAIR)

Detectron2 is Meta's open-source library for object detection and segmentation. It provides fast, production-ready implementations of models like Faster R-CNN, Mask R-CNN and RetinaNet, plus a model zoo of pretrained weights.

Apache 2.0 · Library (varies by m

DeepLabV3+

🔥 Hot
by Google AI

DeepLabV3+ is Google's semantic image segmentation model that labels every pixel of an image. It combines atrous (dilated) convolutions and an encoder–decoder with ASPP for sharp object boundaries.

Apache 2.0 · ~41M (Xception backb
Showing 6 of 6 resources

At a glance

Compare the top Computer Vision APIs

API            Access     Auth      Formats      Rating
Imagga API     Freemium   API Key   REST, JSON
Clarifai API   Freemium   API Key   REST, JSON
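
The Imagga entry describes sending an image and receiving tags with confidence scores as JSON. A minimal sketch of that round trip follows. It assumes the `https://api.imagga.com/v2/tags` endpoint, HTTP Basic authentication with an API key and secret, and a `result.tags` response shape; all three should be checked against Imagga's current documentation. `build_tag_request`, `top_tags`, and the `SAMPLE` payload are hypothetical names and data used for illustration, and nothing is sent over the network.

```python
import base64
import json
import urllib.parse
import urllib.request

# Assumed endpoint; verify against Imagga's current documentation.
IMAGGA_TAGS_URL = "https://api.imagga.com/v2/tags"


def build_tag_request(image_url, api_key, api_secret):
    """Build (but do not send) a GET request asking for tags of a public image URL."""
    query = urllib.parse.urlencode({"image_url": image_url})
    token = base64.b64encode(f"{api_key}:{api_secret}".encode()).decode()
    return urllib.request.Request(
        f"{IMAGGA_TAGS_URL}?{query}",
        headers={"Authorization": f"Basic {token}"},
    )


def top_tags(payload, min_confidence=30.0):
    """Return (tag, confidence) pairs at or above a threshold, highest first.

    Assumes the shape {"result": {"tags": [{"confidence": float, "tag": {"en": str}}]}}.
    """
    tags = payload.get("result", {}).get("tags", [])
    pairs = [
        (t["tag"]["en"], t["confidence"])
        for t in tags
        if t["confidence"] >= min_confidence
    ]
    return sorted(pairs, key=lambda p: p[1], reverse=True)


# Hypothetical response body, included so the sketch runs offline.
SAMPLE = json.loads("""
{"result": {"tags": [
  {"confidence": 61.4, "tag": {"en": "mountain"}},
  {"confidence": 12.0, "tag": {"en": "cloud"}},
  {"confidence": 88.2, "tag": {"en": "landscape"}}
]}}
""")

if __name__ == "__main__":
    print(top_tags(SAMPLE))
```

To call the live service, the request object could be passed to `urllib.request.urlopen` and the body decoded with `json.load`; the threshold of 30 is an arbitrary choice for the sketch.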

About this category

Computer Vision — developer guide

What Are Computer Vision Models?

Computer vision models give machines the ability to interpret and understand the visual world. They identify objects, understand scenes, read text, detect faces, estimate depth, and track motion in images and video streams. Applications powered by these models range from quality control cameras on factory floors to medical image analysis systems that flag anomalies in radiology scans. The field advanced rapidly in 2025–2026 with SAM 3 (Meta, November 2025) for universal segmentation and YOLO26 (Ultralytics, September 2025) as a unified five-task detection framework.

Core Computer Vision Tasks

  • Object detection — locate and classify multiple objects in an image with bounding boxes
  • Image segmentation — assign every pixel to an object class or a specific instance
  • OCR — extract printed and handwritten text from documents, signs, and receipts
  • Facial recognition — identify or verify individuals from face images or video
  • Pose estimation — detect human body keypoints for fitness, gaming, and animation
  • Depth estimation — infer 3D structure from a 2D image for AR and robotics
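
To make the bounding-box output of object detection concrete, here is a small sketch of two operations commonly applied to detector output: intersection over union (IoU) between two boxes, and greedy non-maximum suppression (NMS) to drop near-duplicate boxes. The `(x1, y1, x2, y2)` box format, the `(box, score, label)` tuple layout, and the sample detections are assumptions chosen for illustration; individual models and libraries use their own formats, and some detectors do not need an NMS step at all.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy non-maximum suppression over (box, score, label) tuples.

    Visits detections from highest to lowest score and keeps one only if it
    does not overlap an already kept box of the same label by more than
    iou_threshold.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        box, _, label = det
        if all(k[2] != label or iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept


# Hypothetical detector output for one image.
detections = [
    ((10, 10, 50, 50), 0.90, "cat"),
    ((12, 12, 52, 52), 0.75, "cat"),      # near-duplicate of the first box
    ((100, 100, 140, 160), 0.80, "dog"),
]

if __name__ == "__main__":
    for box, score, label in nms(detections):
        print(label, score, box)
```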

Current State-of-the-Art Models

YOLO26 (Ultralytics, September 2025) unifies detection, segmentation, classification, pose estimation, and oriented bounding boxes in one architecture, which makes it a strong candidate for real-time edge deployment. SAM 3 (Meta, November 2025) enables promptable concept segmentation with memory-based tracking, which suits interactive annotation tools. RF-DETR (Roboflow, March 2025) targets state-of-the-art real-time detection with simpler training pipelines. CLIP remains a standard choice for zero-shot image classification and image-text retrieval. EasyOCR and PaddleOCR are free, open-source options for text extraction, each covering 80+ languages.
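
Zero-shot classification in the style of CLIP can be sketched as follows: an image embedding is compared with one text embedding per candidate label, and a softmax over the scaled cosine similarities gives label probabilities. This is an illustration of the scoring step only, not CLIP's actual code. The three-dimensional vectors, the prompt strings, the function names, and the scale factor of 100 are all assumptions; a real model would produce the embeddings with its image and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(image_embedding, text_embeddings, scale=100.0):
    """Softmax over scaled cosine similarities; returns {label: probability}."""
    logits = {
        label: scale * cosine(image_embedding, emb)
        for label, emb in text_embeddings.items()
    }
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - peak) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical embeddings; real encoders output hundreds of dimensions.
image_embedding = [0.9, 0.1, 0.2]
text_embeddings = {
    "a photo of a dog": [1.0, 0.0, 0.1],
    "a photo of a cat": [0.0, 1.0, 0.1],
    "a photo of a car": [0.1, 0.1, 1.0],
}

if __name__ == "__main__":
    probs = zero_shot_classify(image_embedding, text_embeddings)
    print(max(probs, key=probs.get))
```

Because the labels are supplied as text at inference time, the label set can be changed without retraining, which is what "zero-shot" refers to here.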