TensorRT-LLM (open source)
Provided by: NVIDIA
Framework: C++/Python

TensorRT-LLM is an open-source library from NVIDIA that delivers highly optimized inference for large language models. It builds on TensorRT and CUDA to accelerate transformer-based models, enabling efficient, low-latency deployment on NVIDIA GPUs. It is built for developers who need to scale LLM serving efficiently.

Model Performance Statistics
- Released: October 10, 2023
- Last Checked: Jul 20, 2025
- Version: 0.8
Capabilities
- Inference Acceleration
- Quantization
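To illustrate the quantization capability, here is a minimal sketch of symmetric per-tensor INT8 weight quantization, the general technique behind reduced-precision inference. This is an illustrative example, not TensorRT-LLM's actual implementation; function names and the sample weights are invented for demonstration.

```python
# Illustrative sketch of symmetric INT8 quantization (not TensorRT-LLM code).

def quantize_int8(weights):
    """Map float weights to int8 values using a single per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0  # largest magnitude maps to 127
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.02, -1.27, 0.5, 0.003]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)
```

Storing 8-bit integers plus one float scale instead of full-precision weights cuts memory traffic, which is usually the bottleneck in LLM inference; the trade-off is the small rounding error visible in `recovered`.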
Performance Benchmarks
- Latency: 35 ms/token
- Throughput: 2.4× faster than vLLM
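As a quick sanity check on the latency figure above, a per-token latency can be converted to a per-sequence generation rate (the helper function here is illustrative; the vLLM comparison is the page's claim, not re-measured):

```python
def tokens_per_second(ms_per_token):
    """Convert per-token latency in milliseconds to tokens generated per second."""
    return 1000.0 / ms_per_token

# 35 ms/token corresponds to roughly 28.6 tokens/s for a single sequence.
print(round(tokens_per_second(35), 1))
```

Note this is single-sequence decode speed; aggregate throughput under batching is typically much higher.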
Technical Specifications
- Parameter Count: N/A
Training & Dataset
- Dataset Used: N/A