You've got a folder of 3,000 photos and no idea what's in any of them. Or maybe you're building an app that needs to read text out of receipts. Either way, you need image recognition — and you don't want to train your own model from scratch. This Google Cloud Vision API tutorial walks you through the whole thing in Python and JavaScript, with code that runs the first time you paste it.
Google Cloud Vision gives you label detection, OCR, face detection, logo detection, and more — all through a single REST endpoint. The free tier covers 1,000 units per feature per month, which is plenty for hobby projects and prototypes.
By the end of this post, you'll have a working script that takes any image URL and returns what's inside it, along with any text the API can read. We'll cover the gotchas too — the ones the official docs gloss over.
What Is the Google Cloud Vision API?
Google Cloud Vision is a pre-trained machine learning service that analyzes images. You send it a picture, it sends back structured JSON describing what it sees. No model training, no GPU, no PhD required.
It supports several detection types in one request:
- LABEL_DETECTION — identifies objects, scenes, and activities (dog, beach, wedding)
- TEXT_DETECTION — extracts text from images (this is the OCR part)
- FACE_DETECTION — finds faces and their emotions (no identity matching, just detection)
- LOGO_DETECTION — spots brand logos
- SAFE_SEARCH_DETECTION — flags adult, violent, or medical content
One important quirk: each feature you request counts as a separate billable unit. If you ask for labels and text on one image, that's two units, not one. The free tier gives you 1,000 units per feature per month — so 1,000 label requests AND 1,000 text requests per month, free. After that it's around $1.50 per 1,000 units.
Why Use This Free Image Recognition API?
- No model training needed — Google trained it on billions of images already
- Free tier is generous — 1,000 units per feature per month with no credit card needed beyond GCP signup
- One endpoint, many features — labels, OCR, faces all from the same call
- Works with URLs or base64 — no need to upload files anywhere
- Production-grade accuracy — same engine that powers Google Photos search
The trade-off: you do need a Google Cloud account and an API key. It's not zero-setup like Open-Meteo. But once it's wired up, it stays wired up.
Step-by-Step Setup
Before any code, you need an API key. Here's the shortest path:
- Go to console.cloud.google.com and create a new project (call it whatever)
- Open the API Library, search for "Cloud Vision API", click Enable
- Go to Credentials, click "Create Credentials" → "API key"
- Copy the key. Restrict it to the Vision API only — that limits the damage if it leaks (see the snippet below for keeping it out of your source code too)
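A quick aside on that last step: hardcoding the key works for a tutorial, but it's safer to read it from an environment variable. A minimal sketch — the variable name `GCP_VISION_API_KEY` is just an example, not anything Google requires:

```python
import os

# Read the key from the environment instead of pasting it into source.
# GCP_VISION_API_KEY is an arbitrary example name.
API_KEY = os.environ.get("GCP_VISION_API_KEY", "YOUR_API_KEY_HERE")
```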
Requirements for the Python side:
```bash
pip install requests
```
That's it. We're hitting the REST endpoint directly with requests instead of the official client library. Why? The client library pulls in dozens of dependencies and pushes you into service-account JSON files. For a beginner tutorial, the REST approach is cleaner — one HTTP call, one key, done.
For JavaScript you need Node.js 18 or newer (for built-in fetch). No npm install needed at all.
Python Example: Basic Label Detection
Let's start with the simplest possible call. Send an image URL, get back a list of labels.
```python
import requests

# Replace with your actual API key from Google Cloud Console
API_KEY = "YOUR_API_KEY_HERE"
ENDPOINT = f"https://vision.googleapis.com/v1/images:annotate?key={API_KEY}"

# Public image URL — a dog on a beach
image_url = "https://images.unsplash.com/photo-1517849845537-4d257902454a"

# Build the request body — Vision API expects this exact shape
payload = {
    "requests": [
        {
            "image": {"source": {"imageUri": image_url}},
            "features": [{"type": "LABEL_DETECTION", "maxResults": 5}]
        }
    ]
}

response = requests.post(ENDPOINT, json=payload)
response.raise_for_status()
data = response.json()

labels = data["responses"][0]["labelAnnotations"]
for label in labels:
    print(f"{label['description']} — {label['score']:.2%}")
```
A few things worth pointing out. The endpoint is `images:annotate` — that colon is intentional, not a typo. The body is always wrapped in a `requests` array even when you're sending one image. And `maxResults: 5` caps how many labels come back. The default is 10, which is usually too noisy.
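For orientation, a successful label-only response comes back shaped roughly like this (the field names are real; the values here are made up for illustration):

```json
{
  "responses": [
    {
      "labelAnnotations": [
        { "description": "Dog", "score": 0.97, "topicality": 0.97 },
        { "description": "Beach", "score": 0.92, "topicality": 0.90 }
      ]
    }
  ]
}
```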
Python Example: Practical Multi-Feature Script with Error Handling
The basic version works, but it falls apart the moment something goes wrong. Here's a version that asks for labels AND text in one call, handles errors properly, and prints clean output. This is closer to what you'd actually ship.
```python
import requests
from requests.exceptions import HTTPError, Timeout, RequestException

API_KEY = "YOUR_API_KEY_HERE"
ENDPOINT = f"https://vision.googleapis.com/v1/images:annotate?key={API_KEY}"

# Vision API limit: max 16 images per batch request
# Free tier: 1,000 units per feature per month
MAX_IMAGES_PER_REQUEST = 16

def analyze_image(image_url, max_labels=5):
    """Send one image to Vision API, return labels and any detected text."""
    payload = {
        "requests": [
            {
                "image": {"source": {"imageUri": image_url}},
                "features": [
                    {"type": "LABEL_DETECTION", "maxResults": max_labels},
                    {"type": "TEXT_DETECTION"}
                ]
            }
        ]
    }

    try:
        response = requests.post(ENDPOINT, json=payload, timeout=15)
        response.raise_for_status()
    except HTTPError as e:
        # 400 usually means a bad image URL, 403 means the key is wrong or quota hit
        print(f"HTTP error: {e.response.status_code} — {e.response.text[:200]}")
        return None
    except Timeout:
        print("Request timed out. Vision API is usually fast — check your connection.")
        return None
    except RequestException as e:
        print(f"Network error: {e}")
        return None

    data = response.json()
    # Guard against a missing or empty 'responses' array
    result = (data.get("responses") or [{}])[0]

    # The API returns an 'error' field inside the response on per-image failures
    if "error" in result:
        print(f"Vision API error: {result['error'].get('message')}")
        return None

    labels = result.get("labelAnnotations", [])
    text_blocks = result.get("textAnnotations", [])

    # First textAnnotation contains the full extracted text
    full_text = text_blocks[0]["description"] if text_blocks else ""

    return {
        "labels": [(lbl["description"], lbl["score"]) for lbl in labels],
        "text": full_text.strip()
    }

if __name__ == "__main__":
    url = "https://images.unsplash.com/photo-1485827404703-89b55fcc595e"
    result = analyze_image(url)
    if result:
        print("=== Labels ===")
        for name, score in result["labels"]:
            print(f"  {name:30s} {score:.1%}")
        print("\n=== Extracted Text ===")
        print(result["text"] if result["text"] else "  (no text found)")
```
This label detection flow is doing real work: we're requesting two features in one call, checking for both HTTP errors and the inner error object Vision sometimes returns inside a 200 response, and treating missing fields as empty instead of crashing on them.
That last part — checking for error inside a 200 response — is the part that trips most people up. The Vision API will return HTTP 200 even when individual images fail. You have to look inside the JSON.
Sample Python Output
```text
=== Labels ===
  Robot                          96.4%
  Technology                     89.1%
  Machine                        85.7%
  Toy                            72.3%
  Animation                      68.0%

=== Extracted Text ===
  (no text found)
```
JavaScript Example: Vision API with Fetch and Error Handling
Same logic, JavaScript flavor. Works in Node.js 18+ or any modern browser. The endpoint and body shape are identical — only the syntax changes.
```javascript
// Node.js 18+ or any modern browser
const API_KEY = "YOUR_API_KEY_HERE";
const ENDPOINT = `https://vision.googleapis.com/v1/images:annotate?key=${API_KEY}`;

// Vision API caps batch requests at 16 images
// Free tier: 1,000 units per feature per month

async function analyzeImage(imageUrl, maxLabels = 5) {
  const payload = {
    requests: [
      {
        image: { source: { imageUri: imageUrl } },
        features: [
          { type: "LABEL_DETECTION", maxResults: maxLabels },
          { type: "TEXT_DETECTION" }
        ]
      }
    ]
  };

  try {
    const response = await fetch(ENDPOINT, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload)
    });

    if (!response.ok) {
      // 400 = bad image URL, 403 = bad key or quota exceeded
      throw new Error(`Request failed — HTTP ${response.status}`);
    }

    const data = await response.json();
    const result = data.responses?.[0] ?? {};

    // Per-image errors can appear inside a 200 response — check explicitly
    if (result.error) {
      console.error("Vision API error:", result.error.message);
      return null;
    }

    const labels = result.labelAnnotations ?? [];
    const textBlocks = result.textAnnotations ?? [];
    const fullText = textBlocks[0]?.description?.trim() ?? "";

    return {
      labels: labels.map(l => ({ name: l.description, score: l.score })),
      text: fullText
    };
  } catch (error) {
    console.error("Fetch failed:", error.message);
    return null;
  }
}

// Run it
const url = "https://images.unsplash.com/photo-1485827404703-89b55fcc595e";
analyzeImage(url).then(result => {
  if (!result) return;
  console.log("=== Labels ===");
  result.labels.forEach(l => {
    console.log(`  ${l.name} — ${(l.score * 100).toFixed(1)}%`);
  });
  console.log("\n=== Extracted Text ===");
  console.log(result.text || "  (no text found)");
});
```
Sample Console Output
```text
=== Labels ===
  Robot — 96.4%
  Technology — 89.1%
  Machine — 85.7%
  Toy — 72.3%
  Animation — 68.0%

=== Extracted Text ===
  (no text found)
```
Understanding the Output
The Vision API response is nested deeper than most. Here's what each piece means:
- `responses` — array, one entry per image you sent (always check index 0 for single requests)
- `labelAnnotations` — list of detected objects/concepts, sorted by confidence
- `labelAnnotations[].description` — the human-readable label (e.g., "Dog")
- `labelAnnotations[].score` — confidence from 0.0 to 1.0 (0.85 = 85% sure)
- `labelAnnotations[].topicality` — how central this concept is to the image
- `textAnnotations` — list of detected text regions; index 0 has the full text, the rest are word-by-word
- `error` — present only when something went wrong on a per-image basis
One thing the docs don't make obvious: `textAnnotations[0].description` contains every word the API found, joined with newlines. The other entries break it down word by word with bounding boxes. For most use cases, you only need index 0.
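If you do need the word-level breakdown, say to highlight where each word sits in the image, the entries from index 1 onward carry `boundingPoly` vertices. A minimal sketch, assuming `result` is the raw per-image response dict (not the simplified dict `analyze_image` returns):

```python
# Walk the word-level entries (index 1 onward) and print each word
# with its bounding-box corners. The API omits x or y when the value
# is 0, hence the .get() defaults.
for word in result.get("textAnnotations", [])[1:]:
    corners = [(v.get("x", 0), v.get("y", 0))
               for v in word["boundingPoly"]["vertices"]]
    print(f"{word['description']!r} at {corners}")
```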
Error Handling: What Breaks and Why
Here are the errors you'll hit in your first hour:
- 403 PERMISSION_DENIED — your API key isn't enabled for the Vision API. Go back to the API Library and click Enable.
- 400 Bad image data — the image URL is unreachable, behind auth, or not actually an image. Try opening it in a browser first.
- 429 RESOURCE_EXHAUSTED — you've blown through the 1,000 free units for that feature this month. Wait or pay.
- Empty `labelAnnotations` — the image is too small, too blurry, or genuinely contains nothing recognizable. Not an error, just an empty result.
- 200 with inner error — Vision returned success at the HTTP level but failed on this specific image. Always check `result['error']` before trusting the data.
The last one bites everyone. You'll write code that checks `response.ok`, get back a 200, and then crash on `KeyError: 'labelAnnotations'`. The Python and JavaScript examples above both handle this — copy that pattern.
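For the 429 case specifically, a short retry with exponential backoff rides out rate blips (it won't resurrect a blown monthly quota, though). A rough sketch, reusing the `ENDPOINT` constant from the scripts above:

```python
import time
import requests

def post_with_backoff(payload, max_retries=3):
    """Retry on HTTP 429 with exponential backoff; return anything else as-is."""
    for attempt in range(max_retries):
        response = requests.post(ENDPOINT, json=payload, timeout=15)
        if response.status_code != 429:
            return response  # success, or an error that retrying won't fix
        wait = 2 ** attempt  # 1s, 2s, 4s
        print(f"Rate limited, retrying in {wait}s...")
        time.sleep(wait)
    return response
```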
Real-World Use Cases
A few places where this Google Cloud Vision API setup actually pays off:
- Photo library auto-tagging — run label detection on every uploaded image and store the tags. Now your users can search "sunset" or "dog" without manually tagging anything.
- Receipt and invoice OCR — text detection turns photographed receipts into searchable text. This is the free OCR workflow expense apps use.
- Content moderation — SAFE_SEARCH_DETECTION flags adult or violent uploads before they hit your platform. Cheaper than human moderation for the obvious cases.
- Accessibility alt-text — auto-generate image descriptions for screen readers. Combine the top 3 labels into a sentence and you've got decent alt text (sketch below).
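Here's one way that last idea can look: a tiny helper that turns the `labels` list from `analyze_image` into a sentence. The phrasing rule is my own choice, not anything the API dictates:

```python
def alt_text_from_labels(labels, max_labels=3):
    """Build rough alt text from (name, score) tuples, e.g. from analyze_image."""
    names = [name for name, _ in labels[:max_labels]]
    if not names:
        return "Image"
    if len(names) == 1:
        return f"Image of {names[0]}"
    return "Image of " + ", ".join(names[:-1]) + " and " + names[-1]

# alt_text_from_labels([("Dog", 0.97), ("Beach", 0.92), ("Sand", 0.85)])
# -> "Image of Dog, Beach and Sand"
```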
Vision API vs. Other Free Image Recognition Options
| Service | Free Tier | OCR Quality | Setup Time |
|---|---|---|---|
| Google Cloud Vision | 1,000 units/feature/month | Excellent (90+ languages) | 10 minutes (GCP signup) |
| AWS Rekognition | 5,000 images/month (first 12 months only) | Good (English-focused) | 15 minutes (AWS + IAM) |
| Azure Computer Vision | 5,000 transactions/month | Excellent (164 languages) | 10 minutes (Azure signup) |
| Tesseract (self-hosted) | Unlimited (free) | Decent (depends on tuning) | 30+ minutes (install + config) |
If you're just getting started with computer vision APIs, Google's free tier and clean REST endpoint make it the easiest place to start.
FAQ
Do I need a credit card to use the Google Cloud Vision free tier?
Yes, Google requires a credit card to activate any Cloud project, even for the free tier. But you won't be charged until you exceed 1,000 units per feature per month, and you can set billing alerts at $1 to make sure you never get surprised.
Is Google Cloud Vision really free?
The first 1,000 units per feature per month are free, forever — not just for 12 months. After that, it's $1.50 per 1,000 units for most features. For prototypes and side projects, you'll almost never hit the cap.
Can I send local image files instead of URLs?
Yes. Read the file in binary, base64-encode it, and send it in the `image.content` field instead of `image.source.imageUri`. The rest of the request stays the same. URLs are simpler when the image is already public.
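A minimal sketch of the local-file version; the filename is just an example:

```python
import base64

# Read a local image and base64-encode it for the 'content' field
with open("receipt.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "requests": [
        {
            "image": {"content": encoded},  # replaces source.imageUri
            "features": [{"type": "TEXT_DETECTION"}]
        }
    ]
}
# POST this to the same images:annotate endpoint as before
```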
How accurate is the label detection?
For common objects (dogs, cars, food, landmarks), accuracy is excellent — usually 90%+ on the top label. For niche or fine-grained categories (specific dog breeds, rare plants), it's hit or miss. Always check the confidence score before trusting a result.
What's the difference between TEXT_DETECTION and DOCUMENT_TEXT_DETECTION?
`TEXT_DETECTION` is tuned for short text in natural scenes — street signs, product labels, screenshots. `DOCUMENT_TEXT_DETECTION` is built for dense pages of text like scanned PDFs or book pages. If you're processing receipts or documents, use the second one — it preserves layout much better.
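Swapping is a one-line change to the `features` array. Worth knowing: dense-document results also arrive in a `fullTextAnnotation` field with page/block/paragraph structure, which is where the layout preservation lives:

```python
# Same request shape, dense-document OCR instead of scene text
payload = {
    "requests": [
        {
            "image": {"source": {"imageUri": image_url}},
            "features": [{"type": "DOCUMENT_TEXT_DETECTION"}]
        }
    ]
}
# Full text: responses[0]["fullTextAnnotation"]["text"]
```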
Why does my Vision API request hang or time out?
Usually it's the image URL, not the API. If the URL points to a slow server or a huge file, Vision waits for it to download before processing. Set a request timeout (15 seconds is reasonable) and consider hosting images somewhere fast like a CDN.
Conclusion
You now have a working image recognition pipeline that handles labels, OCR, and errors properly in both Python and JavaScript. The same pattern extends to face detection, logo detection, and safe-search — just swap the features array.
The next logical step is batching. The Vision API accepts up to 16 images per request, which cuts your latency dramatically when you're processing a folder of photos. Loop through your files, build the batch payload, and parse the response array in order.
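A sketch of that batch payload, assuming a list of public image URLs:

```python
def build_batch_payload(image_urls, max_labels=5):
    """One annotate request for up to 16 images (the API's documented cap)."""
    if len(image_urls) > 16:
        raise ValueError("Vision API allows at most 16 images per request")
    return {
        "requests": [
            {
                "image": {"source": {"imageUri": url}},
                "features": [{"type": "LABEL_DETECTION", "maxResults": max_labels}]
            }
            for url in image_urls
        ]
    }

# Responses come back in send order: data["responses"][i] matches image_urls[i]
```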
Want more no-auth APIs to pair with this one? Browse the Free API Hub directory for free APIs that work without a credit card.