open source

CogAgent

Provided by:

Tsinghua University

• Framework: PyTorch

CogAgent is an open-source visual language model for GUI agents, developed by Tsinghua University. It combines text and image understanding to interpret screenshots and reason about graphical interfaces. Built with PyTorch and licensed under Apache 2.0, CogAgent lets researchers and developers build systems that understand screens and automate on-screen workflows.


• Released: February 14, 2024

• Last Checked: Jul 20, 2025

• Version: 1.0

Capabilities

Screen Understanding
Workflow Automation

Performance Benchmarks

ScreenQA: 88.7%

GUI Understanding: 92.3%

Technical Specifications

Parameter Count: N/A

Training & Dataset

Dataset Used

Web screenshots, app UIs

Related AI Models

Discover similar AI models that might interest you


Model • open source

Granite 3.3

IBM

Granite 3.3 is IBM’s latest open-source multimodal AI model, offering advanced reasoning, speech-to-text, and document understanding capabilities. Trained on diverse datasets, it excels in enterprise applications requiring high accuracy and efficiency. Available under Apache 2.0 license.

Natural Language Processing, automation, enterprise

Model • open source

CLIP

OpenAI

CLIP (Contrastive Language–Image Pretraining) is an open-source multimodal model developed by OpenAI that learns visual concepts from natural language supervision. Built with PyTorch and released under the MIT license, it enables powerful image and text embeddings for applications such as zero-shot classification, semantic search, and cross-modal retrieval. It remains actively used in research and AI product development.

Multimodal, image-text embedding, Multimodal AI

Model • open source

DeepSeek-VL

DeepSeek AI

DeepSeek-VL is an open-source multimodal AI model that integrates vision and language processing to enable tasks like image captioning, semantic search, and cross-modal retrieval. Developed using PyTorch under the MIT license, it is suitable for building AI systems that need joint understanding of visual and textual data.

Multimodal, Multimodal AI
