Explore AI Models
Discover, compare, and integrate cutting-edge AI models for your projects
84 AI models found
MusicGen
MusicGen is a single-stage autoregressive transformer model from Meta AI, released as part of the AudioCraft library, that generates music from text descriptions or melody prompts.
https://github.com/facebookresearch/audiocraft
Detectron2
Detectron2 is a powerful open-source computer vision library developed by Meta AI (Facebook AI Research) that excels in object detection, instance segmentation, and keypoint detection tasks.
https://github.com/facebookresearch/detectron2
OpenVoice
OpenVoice V2 is a cutting-edge open-source voice cloning and speech synthesis model focused on delivering high-fidelity voice outputs with emotional and stylistic flexibility.
https://research.myshell.ai/open-voice
SeamlessM4T v2
SeamlessM4T v2 is Meta AI’s multilingual, multimodal translation model, covering speech-to-speech, speech-to-text, text-to-speech, and text-to-text translation across roughly 100 languages.
https://ai.meta.com/research/seamless-communication/
Mixtral 8x22B
Mixtral 8x22B is a cutting‑edge open‑source Mixture‑of‑Experts LLM by Mistral AI, featuring 141B total parameters and 39B active parameters, optimized for multilingual reasoning, math, and coding tasks.
https://mistral.ai/news/mixtral-of-experts/
Emu2-Chat
Emu2-Chat is the instruction-tuned conversational variant of BAAI's Emu2 generative multimodal model, designed for context-aware chat over combined image and text inputs.
https://baaivision.github.io/emu2/
T5
T5 (Text-to-Text Transfer Transformer) is Google’s powerful open-source model that converts all NLP problems into a text-to-text format, enabling flexible language understanding and generation.
https://github.com/google-research/text-to-text-transfer-transformer
DeepSeek-VL
DeepSeek-VL is a cutting-edge open-source multimodal AI model that integrates vision and language processing to enable tasks like image captioning, semantic search, and cross-modal retrieval.
https://github.com/deepseek-ai/DeepSeek-VL
VITS
VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) is a speech synthesis model introduced by researchers at Kakao Enterprise and KAIST. It combines a conditional variational autoencoder, normalizing flows, and adversarial training to generate natural-sounding speech directly from text.
https://arxiv.org/abs/2106.06103
Bloom
Bloom is an open-source multilingual transformer model developed by BigScience, designed for a variety of natural language processing tasks across multiple languages.
https://bigscience.huggingface.co/
wav2vec 2.0
wav2vec 2.0 is a self-supervised speech representation learning model developed by Meta AI. It learns from unlabeled audio, substantially reducing the amount of labeled data needed for automatic speech recognition (ASR).
https://github.com/facebookresearch/fairseq/tree/main/examples/wav2vec