Computer Vision

Object detection, image segmentation, OCR, face recognition, depth estimation, and pose detection models — including YOLO26, SAM 3, and RT-DETR for real-time visual AI applications.

2 APIs · 4 AI Models

APIs (2)

Imagga API

🔥 Hot

Computer Vision

Imagga is an image-recognition API for automatic tagging, categorisation, smart cropping, colour extraction and content moderation. Send an image and get descriptive tags with confidence scores as JSON.

Freemium · API Key
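
As a rough illustration of consuming tag results like these, the sketch below filters tags by confidence and sorts them. The payload layout (`result.tags[].tag.en` with a `confidence` on a 0 to 100 scale), the sample values, and the `top_tags` helper are all assumptions made for this example; check the provider's API reference for the actual response format.

```python
import json

# Hypothetical sample payload. The field layout is an assumption for
# illustration, not a documented contract.
SAMPLE_RESPONSE = """
{
  "result": {
    "tags": [
      {"confidence": 92.5, "tag": {"en": "mountain"}},
      {"confidence": 71.0, "tag": {"en": "snow"}},
      {"confidence": 18.3, "tag": {"en": "beach"}}
    ]
  }
}
"""


def top_tags(raw_json, min_confidence=50.0):
    """Return (tag, confidence) pairs at or above min_confidence, highest first."""
    payload = json.loads(raw_json)
    tags = payload.get("result", {}).get("tags", [])
    kept = [
        (item["tag"]["en"], item["confidence"])
        for item in tags
        if item["confidence"] >= min_confidence
    ]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in top_tags(SAMPLE_RESPONSE):
        print(f"{name}: {score:.1f}")
```

Thresholding on confidence is a common way to keep only tags that are reliable enough for search or moderation; the right cut-off depends on the application.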

Clarifai API

🔥 Hot

Computer Vision

Clarifai is an AI platform with a unified API for computer vision, natural language and generative models. Send an image, video or text to a model and get predictions - labels, detections, embeddings or generated output - as JSON.

Freemium · API Key

AI Models (4)

Segment Anything

🔥 Hot

by Meta AI

Segment Anything (SAM) is Meta's foundation model for image segmentation. Given a point, box or mask prompt, it cuts out any object in any image zero-shot — no per-class training — making it a universal segmentation tool.

Apache 2.0 · ViT-B 91M / ViT-L 30…

YOLOv5

🔥 Hot

by Ultralytics

YOLOv5 is a fast, popular real-time object-detection model from Ultralytics. Built in PyTorch with sizes from nano to extra-large, it balances speed and accuracy and is easy to train, export and deploy on edge or server.

GPL v3 · 1.9M (n) – 86.7M (x)

Detectron2

🔥 Hot

by Meta AI (FAIR)

Detectron2 is Meta's open-source library for object detection and segmentation. It provides fast, production-ready implementations of models like Faster R-CNN, Mask R-CNN and RetinaNet, plus a model zoo of pretrained weights.

Apache 2.0 · Library (varies by m…

DeepLabV3+

🔥 Hot

by Google AI

DeepLabV3+ is Google's semantic image segmentation model that labels every pixel of an image. It combines atrous (dilated) convolutions and an encoder–decoder with ASPP for sharp object boundaries.

Apache 2.0 · ~41M (Xception backb…
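
To make the idea of atrous (dilated) convolution concrete, here is a minimal 1-D sketch in plain Python. It is a toy, not DeepLabV3+ code: the real model applies 2-D dilated convolutions to feature maps, and ASPP runs several dilation rates in parallel. The function name `atrous_conv1d` and the sample signal are made up for this example.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """Valid 1-D convolution (cross-correlation form) with dilation `rate`.

    Kernel taps are spaced `rate` samples apart, so the receptive field
    grows to (len(kernel) - 1) * rate + 1 without adding any weights.
    """
    span = (len(kernel) - 1) * rate + 1
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(w * signal[start + i * rate] for i, w in enumerate(kernel)))
    return out


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]  # simple difference filter

dense = atrous_conv1d(signal, kernel, rate=1)    # each output sees 3 samples
dilated = atrous_conv1d(signal, kernel, rate=2)  # each output sees 5 samples

print(dense)
print(dilated)
```

With the same three weights, the dilated version compares samples that are four positions apart instead of two, which is how dilation widens context without increasing the parameter count.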

At a glance

Compare the top Computer Vision APIs

API           Access    Auth     Formats     Rating
Imagga API    Freemium  API Key  REST, JSON  -
Clarifai API  Freemium  API Key  REST, JSON  -

Computer Vision — developer guide

What Are Computer Vision Models?

Computer vision models give machines the ability to interpret and understand the visual world. They identify objects, understand scenes, read text, detect faces, estimate depth, and track motion in images and video streams. Applications powered by these models range from quality control cameras on factory floors to medical image analysis systems that flag anomalies in radiology scans. The field advanced rapidly in 2025–2026 with SAM 3 (Meta, November 2025) for universal segmentation and YOLO26 (Ultralytics, September 2025) as a unified five-task detection framework.

Core Computer Vision Tasks

Object detection — locate and classify multiple objects in an image with bounding boxes
Image segmentation — assign every pixel to an object class or a specific instance
OCR — extract printed and handwritten text from documents, signs, and receipts
Facial recognition — identify or verify individuals from face images or video
Pose estimation — detect human body keypoints for fitness, gaming, and animation
Depth estimation — infer 3D structure from a 2D image for AR and robotics
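
For object detection, predicted bounding boxes are usually compared with ground-truth boxes using Intersection over Union (IoU). The sketch below shows the calculation, assuming boxes are given as `(x_min, y_min, x_max, y_max)` tuples; the function name and the box format are choices made for this example, and other formats such as centre plus width and height are also common.

```python
def iou(box_a, box_b):
    """Intersection over Union for boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; zero if the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0, 0, 2, 2)
    ground_truth = (1, 1, 3, 3)
    print(f"IoU = {iou(predicted, ground_truth):.3f}")
```

A detection is typically counted as correct when its IoU with a ground-truth box exceeds a chosen threshold, with 0.5 being a frequent choice.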

Current State-of-the-Art Models

YOLO26 (Ultralytics, September 2025) unifies detection, segmentation, classification, pose estimation, and oriented bounding boxes in one efficient architecture, which makes it a strong candidate for real-time edge deployment. SAM 3 (Meta, November 2025) supports promptable concept segmentation with memory-based tracking and suits interactive annotation tools. RF-DETR (Roboflow, March 2025) provides state-of-the-art real-time detection with a simpler training pipeline. CLIP remains a standard choice for zero-shot image classification and image-text retrieval. EasyOCR and PaddleOCR are widely used free, open-source options for text extraction across 80+ languages.
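
Zero-shot classification in the CLIP style works by embedding the image and a text prompt for each candidate label into the same vector space, then picking the label whose embedding is most similar to the image. The sketch below shows only that scoring step with hand-written 3-dimensional vectors; the embeddings, the prompts, the `zero_shot_classify` helper, and the temperature of 0.07 are all assumptions for illustration. In practice the vectors would come from a pretrained image and text encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def zero_shot_classify(image_vec, label_vecs, temperature=0.07):
    """Turn image-to-label cosine similarities into a softmax distribution."""
    scaled = {label: cosine(image_vec, vec) / temperature
              for label, vec in label_vecs.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {label: math.exp(score - peak) for label, score in scaled.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


# Toy embeddings, made up for this example.
LABEL_VECS = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
image_vec = [0.8, 0.2, 0.1]

probs = zero_shot_classify(image_vec, LABEL_VECS)
best = max(probs, key=probs.get)
print(best)
```

Because labels are supplied as text at inference time, new classes can be added by writing new prompts rather than retraining the model.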
