open source

Chameleon 7B

Provided by: Meta AI
Framework: Unknown

Chameleon 7B is a multimodal foundation model developed by Meta AI that unifies text, image, and code understanding within a single early-fusion transformer architecture. Designed for cross-modal reasoning, it scores 83.4% on ScienceQA and 58.7% on MathVista, showing strong performance in visual question answering, mathematical reasoning, and code understanding. Because images are tokenized and interleaved with text in one shared token sequence, the model can reason jointly over visual and textual context rather than fusing the modalities late. This open-source model supports tasks such as captioning, visual comprehension, document reasoning, and multimodal problem-solving, making it a useful tool for AI research and enterprise applications.
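
The sketch below illustrates, under stated assumptions, how a checkpoint like this could be loaded for image-plus-text inference through the Hugging Face transformers library. The checkpoint id facebook/chameleon-7b, the local image path, and the generation settings are illustrative assumptions for the example, not details taken from this listing.

# Minimal inference sketch (assumptions: a transformers version with Chameleon
# support, the hub checkpoint "facebook/chameleon-7b", a local file "chart.png").
import torch
from PIL import Image
from transformers import ChameleonProcessor, ChameleonForConditionalGeneration

model_id = "facebook/chameleon-7b"  # assumed checkpoint id
processor = ChameleonProcessor.from_pretrained(model_id)
model = ChameleonForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Early fusion: the <image> placeholder marks where the image tokens are
# interleaved with the text tokens in a single sequence.
image = Image.open("chart.png")
prompt = "What trend does this chart show?<image>"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    model.device, dtype=torch.bfloat16
)

output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output_ids[0], skip_special_tokens=True))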

Model Performance Statistics

Views: 0
Released: April 10, 2025
Last Checked: Aug 19, 2025
Version: 1.0

Capabilities
  • Multimodal reasoning
  • Code generation
  • Visual QA
Performance Benchmarks
MathVista: 58.7%
ScienceQA: 83.4%
Technical Specifications
Parameter Count: 7B
Training & Dataset

Dataset Used: Multimodal instruction datasets

Related AI Models

Discover similar AI models that might interest you

Model · Open Source

Jais 30B

G42 & Cerebras

Jais 30B is an open-source large language model developed by G42 and Cerebras to advance Arabic and bilingual NLP research. Trained on over 116 billion Arabic and English tokens, it reaches 83.4% on the Arabic MMLU benchmark and supports cross-lingual reasoning, translation, and text generation. Jais 30B uses a tokenizer specialized for Arabic script, improving morphological coverage and producing more natural context flow. Its bilingual training and cultural adaptation make it one of the strongest Arabic-English models available to developers, researchers, and AI startups building regional NLP solutions.

Natural Language Processing · ai-models · llm
0 views
Model · Open Source

LLaVA-NeXT

University of Wisconsin-Madison

LLaVA-NeXT is a next-generation multimodal large language model developed by the University of Wisconsin–Madison, building upon the LLaVA (Large Language and Vision Assistant) framework. It combines visual perception and language understanding to interpret and reason over text, images, and charts. Powered by open LLMs such as Mistral and Llama 3, LLaVA-NeXT supports visual question answering, document parsing, chart interpretation, and multimodal dialogue. The model introduces improved visual grounding, faster inference, and enhanced multimodal alignment, achieving state-of-the-art results across multiple vision-language benchmarks. It is widely used in research and enterprise applications for AI assistants that see, read, and reason.
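
As a hedged illustration, the snippet below sketches how a LLaVA-NeXT checkpoint might be queried through the Hugging Face transformers library. The checkpoint id llava-hf/llava-v1.6-mistral-7b-hf, the image path, and the prompt are assumptions for the example, not details from this listing.

# Sketch only (assumptions: a transformers version with LLaVA-NeXT support,
# the hub checkpoint "llava-hf/llava-v1.6-mistral-7b-hf", a local "invoice.png").
import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

model_id = "llava-hf/llava-v1.6-mistral-7b-hf"  # assumed checkpoint id
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Build a chat-style prompt; the processor's chat template inserts the
# image placeholder at the correct position in the text.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Summarize this document."},
        ],
    }
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

image = Image.open("invoice.png")
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output_ids[0], skip_special_tokens=True))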

Multimodal · ai-models · vision language AI
0 views
Model · Open Source

Emu2-Chat

Beijing Academy of Artificial Intelligence (BAAI)

Emu2-Chat is a multimodal conversational model designed for engaging, context-aware chat interactions. It is optimized for natural language understanding and for generating human-like responses across domains, making it well suited to chatbots, virtual assistants, and customer-support automation.

Multimodal · conversational
94 views