TensorRT-LLM (open source)
Provided by: NVIDIA
Framework: C++/Python

TensorRT-LLM is an open-source library from NVIDIA that delivers highly optimized inference for large language models. It builds on TensorRT and CUDA to accelerate transformer-based models, enabling efficient, low-latency deployment on NVIDIA GPUs. It is built for developers who need to scale LLM serving efficiently.

Model Performance Statistics
- Released: October 10, 2023
- Last Checked: Jul 20, 2025
- Version: 0.8
Capabilities
- Inference Acceleration
- Quantization
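To illustrate the quantization capability, here is a minimal sketch of symmetric per-tensor INT8 weight quantization, the general technique behind reduced-precision inference. This is an illustrative example, not TensorRT-LLM's actual implementation; function names and the sample weights are invented for demonstration.

```python
# Illustrative sketch of symmetric INT8 quantization (not TensorRT-LLM code).

def quantize_int8(weights):
    """Map float weights to int8 values using a single per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0  # largest magnitude maps to 127
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.02, -1.27, 0.5, 0.003]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)
```

Storing 8-bit integers plus one float scale instead of full-precision weights cuts memory traffic, which is usually the bottleneck in LLM inference; the trade-off is the small rounding error visible in `recovered`.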
Performance Benchmarks
- Latency: 35 ms/token
- Throughput: 2.4× faster than vLLM
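As a quick sanity check on the latency figure above, a per-token latency can be converted to a per-sequence generation rate (the helper function here is illustrative; the vLLM comparison is the page's claim, not re-measured):

```python
def tokens_per_second(ms_per_token):
    """Convert per-token latency in milliseconds to tokens generated per second."""
    return 1000.0 / ms_per_token

# 35 ms/token corresponds to roughly 28.6 tokens/s for a single sequence.
print(round(tokens_per_second(35), 1))
```

Note this is single-sequence decode speed; aggregate throughput under batching is typically much higher.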
Technical Specifications
- Parameter Count: N/A
Training & Dataset
- Dataset Used: N/A