Emu2-Chat is a conversational AI model designed for engaging, context-aware chat interactions, optimized for natural language understanding and for generating human-like responses across a range of domains.
https://baaivision.github.io/emu2/

DeepSeek-VL is a cutting-edge open-source multimodal AI model that integrates vision and language processing to enable tasks like image captioning, semantic search, and cross-modal retrieval.
https://github.com/deepseek-ai/DeepSeek-VL

CLIP (Contrastive Language–Image Pretraining) is an open-source multimodal model developed by OpenAI that learns visual concepts from natural language supervision.
https://openai.com/research/clip

LLaVA-NeXT is a next-generation multimodal large language model developed by the University of Wisconsin–Madison, building upon the LLaVA framework. It excels in visual perception and language understanding.
https://llava-vl.github.io/

Chameleon 7B is a multimodal foundation model developed by Meta AI that unifies text, image, and code understanding within a single early-fusion transformer architecture.
https://huggingface.co/facebook/chameleon-7b
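Several of the models above (CLIP most directly) score image–text pairs by comparing embeddings from separate encoders. The sketch below illustrates that contrastive scoring step with toy numpy vectors; the random embeddings and the temperature value are stand-ins, not real CLIP weights or settings.

```python
import numpy as np

# Toy stand-ins for CLIP-style embeddings: in a real model these would come
# from the image and text encoders; here they are just random vectors.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=(2, 512))   # 2 images
text_emb = rng.normal(size=(3, 512))    # 3 candidate captions

# L2-normalize so the dot product equals cosine similarity.
image_emb /= np.linalg.norm(image_emb, axis=1, keepdims=True)
text_emb /= np.linalg.norm(text_emb, axis=1, keepdims=True)

# Similarity logits, scaled by a temperature (CLIP learns this scale).
temperature = 100.0
logits = temperature * image_emb @ text_emb.T   # shape (2, 3)

# Softmax over captions gives, per image, a probability for each caption.
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
print(probs.shape)  # (2, 3); each row sums to 1
```

At training time, CLIP pushes the diagonal of such a similarity matrix (matched image–caption pairs) up and everything else down; at inference, the same matrix supports zero-shot classification and cross-modal retrieval.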