open source

TensorRT-LLM

Provided by: NVIDIA
Framework: C++/Python

TensorRT-LLM is an open-source library by NVIDIA that delivers highly optimized inference for large language models. It leverages TensorRT and CUDA to accelerate transformer-based models, enabling efficient deployment across GPUs with minimal latency. Built for developers aiming to scale LLMs efficiently.
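For orientation, here is a minimal sketch of running generation through the library's high-level Python API. It assumes the `LLM`/`SamplingParams` interface shipped in recent TensorRT-LLM releases (newer than the 0.8 version listed below); the checkpoint name and sampling values are illustrative placeholders, not part of this page.

```python
# Minimal sketch, assuming the high-level LLM API from recent TensorRT-LLM
# releases; checkpoint name and sampling values are illustrative only.
from tensorrt_llm import LLM, SamplingParams

def main():
    # The TensorRT engine is compiled for the local GPU when the model loads.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
    sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
    for output in llm.generate(["What does TensorRT-LLM do?"], sampling):
        print(output.outputs[0].text)

if __name__ == "__main__":
    main()
```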

Model Performance Statistics

Views: 13
Released: October 10, 2023
Last Checked: July 20, 2025
Version: 0.8

Capabilities
  • Inference Acceleration
  • Quantization (see the sketch below)
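Quantization is configured when the engine is built. The sketch below assumes the `QuantConfig`/`QuantAlgo` helpers exposed by the newer LLM API; exact module paths and supported algorithms vary between releases, so treat the names as assumptions rather than this page's documented interface.

```python
# Hedged sketch: FP8 quantization via QuantConfig in the LLM API.
# Module path and enum names are assumptions and may differ in older
# releases such as the 0.8 version listed on this page; FP8 also
# requires an FP8-capable GPU (e.g. Hopper).
from tensorrt_llm import LLM
from tensorrt_llm.llmapi import QuantConfig, QuantAlgo

quant = QuantConfig(quant_algo=QuantAlgo.FP8)    # FP8 weights/activations
llm = LLM(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # illustrative checkpoint
    quant_config=quant,
)
print(llm.generate(["Hello from a quantized engine"])[0].outputs[0].text)
```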
Performance Benchmarks
Latency: 35 ms/token
Throughput: 2.4x faster than vLLM
Technical Specifications
Parameter Count: N/A

Training & Dataset

Dataset Used: N/A

Related AI Models

Discover similar AI models that might interest you

Model · open source

MLC-LLM

CMU, SAMPL, OctoML

MLC-LLM is a universal, open-source framework developed by the Apache TVM community, CMU, and the SAMPL lab. It deploys large language models on a wide range of devices, including iPhones, Android devices, WebAssembly runtimes, and GPUs, enabling efficient and fast inference anywhere.

Scientific AI · ai · llm
Views: 13
Model · free

Bloom

BigScience

Bloom is an open-source multilingual transformer model developed by BigScience, designed for a variety of natural language processing tasks across multiple languages.

Natural Language Processing · llm
Views: 47
Model · open source

ControlNet

lllyasviel

Conditional image generation with additional control signals.

Generative Models · ai · image-generation
Views: 33