Emu2-Chat is a conversational AI model designed for engaging, context-aware chat interactions, optimized for natural language understanding and for generating human-like responses across a range of domains.
https://baaivision.github.io/emu2/

DeepSeek-VL is a cutting-edge open-source multimodal AI model that integrates vision and language processing to enable tasks like image captioning, semantic search, and cross-modal retrieval.
https://github.com/deepseek-ai/DeepSeek-VL

CLIP (Contrastive Language–Image Pretraining) is an open-source multimodal model developed by OpenAI that learns visual concepts from natural language supervision.
https://openai.com/research/clip

LLaVA-NeXT is a next-generation multimodal large language model developed by the University of Wisconsin–Madison, building upon the LLaVA framework. It excels in visual perception and language understanding.
https://llava-vl.github.io/

Chameleon 7B is a multimodal foundation model developed by Meta AI that unifies text, image, and code understanding within a single early-fusion transformer architecture.
https://huggingface.co/facebook/chameleon-7b
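Several of the models above (CLIP most directly) score image–text pairs by comparing embeddings from separate encoders. The sketch below illustrates that contrastive scoring step with toy numpy vectors; the random embeddings and the temperature value are stand-ins, not real CLIP weights or settings.

```python
import numpy as np

# Toy stand-ins for CLIP-style embeddings: in a real model these would come
# from the image and text encoders; here they are just random vectors.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=(2, 512))   # 2 images
text_emb = rng.normal(size=(3, 512))    # 3 candidate captions

# L2-normalize so the dot product equals cosine similarity.
image_emb /= np.linalg.norm(image_emb, axis=1, keepdims=True)
text_emb /= np.linalg.norm(text_emb, axis=1, keepdims=True)

# Similarity logits, scaled by a temperature (CLIP learns this scale).
temperature = 100.0
logits = temperature * image_emb @ text_emb.T   # shape (2, 3)

# Softmax over captions gives, per image, a probability for each caption.
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
print(probs.shape)  # (2, 3); each row sums to 1
```

At training time, CLIP pushes the diagonal of such a similarity matrix (matched image–caption pairs) up and everything else down; at inference, the same matrix supports zero-shot classification and cross-modal retrieval.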