| Name | Last commit | Date |
| --- | --- | --- |
| `baichuan` | Update TensorRT-LLM (#2820) | 2025-02-25 21:21:49 +08:00 |
| `bert` | doc: fix path after examples migration (#3814) | 2025-04-24 02:36:45 +08:00 |
| `bloom` | Update TensorRT-LLM | 2024-08-20 18:55:15 +08:00 |
| `chatglm` | Update TensorRT-LLM (#2820) | 2025-02-25 21:21:49 +08:00 |
| `clip` | Update (#2978) | 2025-03-23 16:39:35 +08:00 |
| `cogvlm` | Update TensorRT-LLM (#2562) | 2024-12-11 00:31:05 -08:00 |
| `commandr` | Update TensorRT-LLM (#2562) | 2024-12-11 00:31:05 -08:00 |
| `dbrx` | Update TensorRT-LLM (#1793) | 2024-06-18 18:18:23 +08:00 |
| `deepseek_v1` | Update TensorRT-LLM (#2755) | 2025-02-11 03:01:00 +00:00 |
| `deepseek_v2` | Update TensorRT-LLM (#2783) | 2025-02-13 18:40:22 +08:00 |
| `dit` | Support RingAttention in the BertAttention plugin and the DiT model (#3661) | 2025-05-09 08:06:54 +08:00 |
| `eagle` | [refactor] Simplification of Speculative decoding configs (#5639) | 2025-07-10 11:37:30 -04:00 |
| `enc_dec` | fix: nvbugs/5075538: fix cross attention mask when decoder input len > 1 (#3585) | 2025-04-16 08:31:33 +08:00 |
| `falcon` | Update TensorRT-LLM (#2562) | 2024-12-11 00:31:05 -08:00 |
| `gemma` | [5305318] fix: Fix the accuracy issue when reduce_fusion is enabled for GEMMA model. (#5801) | 2025-07-08 19:51:05 +08:00 |
| `gpt` | feat: Add support for fp8 rowwise quantization (#4876) | 2025-06-14 06:37:48 -07:00 |
| `gptj` | Update TensorRT-LLM (#2562) | 2024-12-11 00:31:05 -08:00 |
| `gptneox` | Update TensorRT-LLM (#1891) | 2024-07-04 14:37:19 +08:00 |
| `grok` | Update TensorRT-LLM (#2562) | 2024-12-11 00:31:05 -08:00 |
| `llama` | feat: Support Mistral Small 3.1 24B VLM in TRT workflow (#4183) | 2025-05-14 03:47:22 +08:00 |
| `mamba` | Update TensorRT-LLM (#2755) | 2025-02-11 03:01:00 +00:00 |
| `medusa` | [refactor] Simplification of Speculative decoding configs (#5639) | 2025-07-10 11:37:30 -04:00 |
| `mllama` | chore: remove usernames from comments (#3291) | 2025-04-05 13:44:28 +08:00 |
| `mmdit_sd3` | Update TensorRT-LLM (#2849) | 2025-03-04 18:44:00 +08:00 |
| `mpt` | Update TensorRT-LLM (#1763) | 2024-06-11 16:59:02 +08:00 |
| `multimodal_encoders` | Update (#2978) | 2025-03-23 16:39:35 +08:00 |
| `nemotron_nas` | test(perf): Add some Llama-3_3-Nemotron-Super-49B-v1 integration-perf-tests (TRT flow, trtllm-bench) (#4128) | 2025-05-19 12:00:48 -07:00 |
| `opt` | Add initial EAGLE-3 implementation (#3035) | 2025-03-29 22:31:24 +08:00 |
| `phi` | Update (#2978) | 2025-03-23 16:39:35 +08:00 |
| `phi3` | fix: Unable to load phi4-model with tp_size>1 (#5962) | 2025-07-16 11:39:41 +08:00 |
| `qwen` | [FIX] fix bugs caused by None attention_bias during Qwen3 model convert engine (#6344) | 2025-07-30 07:13:44 +08:00 |
| `recurrentgemma` | Update TensorRT-LLM (#2755) | 2025-02-11 03:01:00 +00:00 |
| `redrafter` | [refactor] Simplification of Speculative decoding configs (#5639) | 2025-07-10 11:37:30 -04:00 |
| `stdit` | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| `unet` | chore: remove usernames from comments (#3291) | 2025-04-05 13:44:28 +08:00 |
| `__init__.py` | [feat] Add TensorRT-Engine Qwen3 (dense) model support (#5650) | 2025-07-10 10:26:06 +08:00 |
| `automodel.py` | [nvbug/5387226] chore: add propogation for trust_remote_code to AutoConfig (#6001) | 2025-07-16 16:05:38 +08:00 |
| `convert_utils.py` | feat: adding multimodal (only image for now) support in trtllm-bench (#3490) | 2025-04-18 07:06:16 +08:00 |
| `generation_mixin.py` | fix: [nvbugs/5287097] Align PP layer distribution between pytorch and TRT flow. (#4399) | 2025-05-19 14:25:36 -07:00 |
| `model_weights_loader.py` | Add support for Phi-4-mini (#2990) | 2025-04-02 08:34:39 +08:00 |
| `modeling_utils.py` | [feat] Auto-enable ngram with concurrency <= 32. (#6232) | 2025-07-31 18:45:51 -04:00 |