Commit Graph

224 Commits

Author SHA1 Message Date
William Zhang
34c1e9c341
[None][feat] Skip prefetching consolidated safetensors when appropriate (#7225)
* Why?

Some models (e.g., anything produced by Mistral) can have both sharded
safetensors and a consolidated safetensors file in the same checkpoint
directory. In such cases, prefetching both into memory wastes both time
and memory.

* What?

This commit skips consolidated safetensors files when they are not the
only safetensors files present in the checkpoint directory (a sketch of
the skip condition follows this entry).

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-26 09:40:17 -07:00
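
The skip condition in #7225 amounts to a directory filter. Below is a minimal sketch, assuming a `consolidated*` file-name convention; the helper and the patterns are illustrative, not TensorRT-LLM's actual code.

```python
from pathlib import Path


def files_to_prefetch(checkpoint_dir: str) -> list[Path]:
    """Return the safetensors files worth prefetching."""
    files = sorted(Path(checkpoint_dir).glob("*.safetensors"))
    # Assumed Mistral-style naming: a consolidated copy may sit next to
    # sharded files such as model-00001-of-000xx.safetensors.
    consolidated = [f for f in files if f.name.startswith("consolidated")]
    # Skip consolidated files only when sharded files are also present;
    # if a consolidated file is the only safetensors file, keep it.
    if consolidated and len(consolidated) < len(files):
        return [f for f in files if f not in consolidated]
    return files
```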
Yuxian Qiu
2fb16ad328
[None][fix] fix log_once usage (#7210)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-08-26 19:13:03 +08:00
Wanli Jiang
036c3dd0ea
[TRTLLM-6825][fix] Update lora for phi4-mm (#7149)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-08-23 20:57:00 +08:00
Dom Brown
3f2eb4d2e8
[https://nvbugs/5461712][fix] Disable deep_gemm for Qwen3 due to accuracy issues (#7170)
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-08-23 05:26:12 -04:00
Jin Li
69846c6586
[https://nvbugs/5427801][fix] Torch compile support for Llama4 and Ea… (#6978)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-20 15:06:56 +08:00
Liao Lanyu
d9b9b5d053
[TRTLLM-6835][fix] Fix potential hang caused by python multiprocessing when prefetching weights (#6927)
Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com>
2025-08-18 10:20:09 +08:00
Mike Iovine
9e02f6b9f4
[https://nvbugs/5455836][fix] Fix llama 4 FP4 (#6911)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-08-15 10:09:09 -04:00
brb-nv
a00ca11673
[None][chore] Add docs for Gemma3 VLMs (#6880)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-14 18:23:32 -07:00
Yukun He
d62b9c0ed7
[None][fix] Complete the last missing allreduce op in Llama3/4. (#6850)
Under some circumstances, the allreduce op of the last decoder layer was missing for Llama3 and Llama4; the toy layer sketched after this entry shows where the reduction belongs.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-08-15 09:07:09 +08:00
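
To see why the missing reduction matters, consider a toy tensor-parallel layer; this sketch is illustrative and does not mirror TRT-LLM's fusion code.

```python
import torch
import torch.distributed as dist


class ToyDecoderLayer(torch.nn.Module):
    """Toy tensor-parallel decoder layer, not TRT-LLM's implementation."""

    def __init__(self, hidden_size: int) -> None:
        super().__init__()
        # Under tensor parallelism each rank holds a slice of the MLP,
        # so its output is only a partial sum.
        self.mlp = torch.nn.Linear(hidden_size, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        partial = self.mlp(x)
        # This reduction must run for every layer, including the last
        # one; the bug described above amounted to it being skipped for
        # the final decoder layer on some paths.
        if dist.is_initialized():
            dist.all_reduce(partial)
        return x + partial
```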
Anurag Mukkara
a8618b2d14
[None][fix] Revert phi4-mm aggregate mode (#6907)
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
2025-08-14 15:45:45 -04:00
2ez4bz
7ebb770dce
[None][fix] Fix batching bug in Mistral3 model (#6841)
Prior to this commit, the image-batching logic failed whenever multiple
requests carrying images landed in the same batch (a sketch of the
batching follows this entry).

This commit fixes the logic and adds unit tests that were verified to
fail before the fix.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-14 02:15:44 -04:00
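
The failure mode is easiest to see with variable image counts per request. A hypothetical sketch of the batching, assuming all images share one preprocessed shape:

```python
import torch


def batch_request_images(
    per_request_images: list[list[torch.Tensor]],
) -> torch.Tensor:
    """Concatenate images from several requests into one batch."""
    # Requests may carry different numbers of images, so flatten the
    # per-request lists instead of stacking request-by-request.
    flat = [img for images in per_request_images for img in images]
    return torch.stack(flat, dim=0)


# Two requests: two images and one image, respectively.
batch = batch_request_images(
    [[torch.rand(3, 64, 64), torch.rand(3, 64, 64)], [torch.rand(3, 64, 64)]]
)
assert batch.shape == (3, 3, 64, 64)
```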
Wanli Jiang
b4167cce68
[TRTLLM-6308][feat] Support Aggregate mode for phi4-mm (#6820)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-08-13 21:45:22 -07:00
2ez4bz
ccb62ef97e
[TRTLLM-5252][feat] Add fp8 support for Mistral Small 3.1 (#6731)
This commit adds partial FP8 support to Mistral Small 3.1 by:

* disabling quantization for the vision sub-model, since `modelopt` does
  not support quantizing it (yet); a sketch of the exclusion follows
  this entry.
* extending the existing accuracy tests to use a modelopt-produced FP8
  checkpoint.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-13 21:25:55 -04:00
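
Excluding a sub-model from quantization can be done with `modelopt`'s wildcard quantizer config; the `"*vision*"` pattern below is an assumption about the module names, not the commit's actual code.

```python
import copy

import modelopt.torch.quantization as mtq

# Start from the stock FP8 recipe and disable every quantizer whose
# module name matches the (assumed) vision sub-model pattern, so only
# the language model runs in FP8.
config = copy.deepcopy(mtq.FP8_DEFAULT_CFG)
config["quant_cfg"]["*vision*"] = {"enable": False}

# Calibration then proceeds as usual, e.g.:
# model = mtq.quantize(model, config, forward_loop)
```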
2ez4bz
efd0a51508
[TRTLLM-5252][fix] Propagate mapping to intermediate layers (#6611) (#6765)
This commit propagates the parallel mapping to intermediate layers,
enabling tensor parallelism (among other things) in them; a sketch of
the propagation follows this entry.

It also fixes issues with the pixtral TP unit test and adds the test to
a test list.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-11 10:13:10 -07:00
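
The fix boils down to threading the mapping through constructors instead of letting sublayers default to a single-GPU layout. A sketch with stand-in classes; `Mapping`, `VisionBlock`, and `VisionModel` here are illustrative, not TRT-LLM's.

```python
from dataclasses import dataclass


@dataclass
class Mapping:
    """Stand-in for TRT-LLM's parallel mapping; fields are illustrative."""

    tp_size: int = 1
    tp_rank: int = 0


class VisionBlock:
    def __init__(self, hidden_size: int, mapping: Mapping) -> None:
        # Shard the intermediate projection across tp_size ranks; a block
        # built without the mapping would silently fall back to a
        # single-GPU layout.
        self.shard_width = hidden_size // mapping.tp_size
        self.mapping = mapping


class VisionModel:
    def __init__(self, num_layers: int, hidden_size: int, mapping: Mapping) -> None:
        # Propagate the same mapping into every intermediate layer.
        self.blocks = [
            VisionBlock(hidden_size, mapping) for _ in range(num_layers)
        ]
```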
brb-nv
4adde41632
[TRTLLM-6656][chore] Validate FP8 support for Gemma3 (#6678)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-07 13:14:04 -04:00
Izzy Putterman
7e0158b583
Qwen3: Fix eagle hidden states (#6199)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-08-06 17:05:18 -04:00
brb-nv
9a01934dbf
[None][feat] Switch to internal version of MMProjector in Gemma3 (#6572)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-05 21:48:23 -04:00
Yechan Kim
c17f4984e2
[None][feat] Refactor Llava-Next (#6478)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-05 17:53:53 -07:00
danielafrimi
ed801ff74b
[None][fix] Remove expand configuration from mamba2 mixer (#6521)
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
2025-08-05 04:18:25 -04:00
Haohang Huang
c9eebcb454
[TRTLLM-6674][feat] (Breaking Change) Hopper SWA non-cyclic kernels + KV reuse + Spec Dec (#6379)
Signed-off-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
Signed-off-by: symphonylyh <31998628+symphonylyh@users.noreply.github.com>
2025-08-05 07:47:41 +00:00
brb-nv
87e4e9f468
[None][chore] Add unit test for Gemma3 lora (#6560)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-04 04:56:57 -04:00
Yechan Kim
ee6ab5be96
chore: add EXAONE4 accuracy test (#6397)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-04 10:14:16 +08:00
Jinyang Yuan
df90202b51
[fix] Fix DeepSeek w4a8 weight loading (#6498)
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-08-04 10:12:06 +08:00
brb-nv
7447d6ed85
[TRTLLM-6657][feat] Add LoRA support for Gemma3 (#6371)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-01 09:19:54 -04:00
Zongfei Jing
7bb0a78631
Deepseek R1 FP8 Support on Blackwell (#6486)
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-08-01 10:26:28 +08:00
brb-nv
2eca0d5925
fix: Fix poor generation with FP8 Gemma3 1B checkpoint (#6499)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-31 17:18:23 -07:00
amitz-nv
1ee7a08d2b
[5830][feat] Improve LoRA cache memory control (#6220)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-07-31 09:26:38 +03:00
Vadim Gimpelson
25cd4f215e
[PERF] Move calculation of Qwen2-VL's rotary_cos_sin to LLM worker process (#6004)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@centml.ai>
2025-07-31 09:35:24 +09:00
Wanli Jiang
9632dba02e
feat: TRTLLM-6450 update long rope for phi3.5/phi4-mini/phi4-mm (#6353)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-30 09:20:16 -07:00
NVShreyas
e67f4da9b5
[Perf]: Add residual, norm for nemotron_nas models (#6455)
Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>
2025-07-30 09:10:38 -07:00
Chang Liu
b4065d8ca6
[TRTLLM-6654][feat] Add support for external multimodal embeddings (#6263)
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
2025-07-30 10:00:15 -04:00
tomeras91
a2514d93fc
[nvbug 5380101][fix] Fix nemotronNAS loading for TP>1 (#6447)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-07-30 07:22:32 -04:00
peaceh-nv
5b420ad267
Rename layer to comply with deepseek (#6393)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-07-30 10:00:48 +08:00
Yechan Kim
d6eb8e2366
fix: support mixture of text & multimodal prompts (#6345)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-07-30 08:52:31 +08:00
nv-guomingz
49044733e1
chore: delete useless gitkeep files. (#6400)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-28 11:38:30 -04:00
Yukun He
93a0fd0a23
[TRTLLM-6445] feat: Enable AllReduce-associated fusion patterns in Llama3/4. (#6205)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-07-28 09:36:26 +08:00
ameynaik-hub
1e5e71aa42
Mtp optimizations round1 (#5689)
Signed-off-by: Amey Naik <212485788+ameynaik-hub@users.noreply.github.com>
Co-authored-by: Kefeng-Duan <176893526+Kefeng-Duan@users.noreply.github.com>
2025-07-25 13:48:27 -04:00
bhsueh_NV
7b6aadc800
[Fix][nvbug 5401163][nvbug 5404726][Qwen3] Fix MoE bug on tp > 1 with the trtllm moe backend (#6235)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-07-24 21:47:37 +08:00
Yechan Kim
83c3ed128b
chore: set default device to cpu on Multimodal models (#5994)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-07-22 21:45:31 -07:00
Venky
9538c8d0e5
Add basic Nemo Ckpt Lora Loading in pytorch flow (#6019)
2025-07-22 19:42:45 -07:00
2ez4bz
ab7434ac62
[feat] Enable TP and batching for PixtralVisionModel / Mistral3VLM (#6152)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-07-22 11:06:41 -07:00
John Calderon
b7c8a672da
[Issue 6193] Fix gemma3vl weight loader (#6233)
Signed-off-by: John Calderon <johncalesp@gmail.com>
2025-07-22 10:32:18 -07:00
Yi Zhang
eb7d0f84b5
[nvbugs/5368410][fix] Disable moe allreduce for multi node (#5918)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-07-22 12:48:00 +08:00
Chang Liu
7381f1dba7
[TRTLLM-5059][feat] Add KV cache reuse support for multimodal models (#5444)
Only Qwen is supported in this PR.
2025-07-21 16:11:58 -07:00
brb-nv
a433ebad2b
enh: Lift expectation of single image per sample in Gemma3 VLM (#6195)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-21 08:43:07 +08:00
xiaoqi
28858c8711
feat(eagle3): support qwen3 dense model (#5879)
Signed-off-by: xq25478 <xq25478@qq.com>
2025-07-19 01:24:32 +08:00
Bo Li
07e8813984
feat: Remove padding in attention DP. (#6064)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-07-18 23:30:34 +08:00
2ez4bz
8480c120b1
[fix] Fix Mistral3VLM weight-loading & enable in pre-merge (#6105)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-07-17 11:04:17 -07:00
Shiyu Li
6e1aee6fd6
[fix] Performance Optimization for MNNVL TwoShot Kernel (#5934)
Signed-off-by: Shiyu Li <shili@nvidia.com>
Co-authored-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-07-17 10:49:51 +08:00
Wanli Jiang
2d2b8bae32
feat: TRTLLM-5574 Add phi-4-multimodal pytorch-backend support (#5644)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-17 06:30:58 +08:00