Commit Graph

2334 Commits

Author SHA1 Message Date
Frank
2cc59aacb3
[None][fix] Correct reporting of torch_dtype for ModelConfig class. (#6800)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-08-14 22:46:20 -04:00
Yunfan Fan
11d08c33af
[None][fix] Fix responsibility boundary between the assert and tllmException files (#6723)
Signed-off-by: fanyunfan <2569548856@qq.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-08-15 10:34:49 +08:00
JunyiXu-nv
70e352a6f7
[https://nvbugs/5437106][fix] Add L4 Scout benchmarking WAR option in deploy guide (#6829)
Signed-off-by: Junyi Xu <junyix@nvidia.com>
2025-08-15 08:53:13 +08:00
Perkz Zheng
11d89a3732
[https://nvbugs/5394685][fix] using static scheduler 2CTA MLA as WAR for an accuracy issue (#6896)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-08-15 08:51:04 +08:00
hlu1
5346eb7bc5
[None][doc] Update gpt-oss doc on MoE support matrix (#6908)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-08-15 08:50:31 +08:00
Aurelien Chartier
b13a5a99b2
[None][chore] Add tests for non-existent and completed request cancellation (#6840)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-08-14 15:57:01 -07:00
qianbiao
5c2f0fd03d
[None] [feat] Add Tencent HunYuanMoEV1 model support (#5521)
Signed-off-by: sorenwu <sorenwu@tencent.com>
Co-authored-by: sorenwu <sorenwu@tencent.com>
Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>
2025-08-15 06:56:44 +08:00
Raayan Dhar
8b237b943b
[https://nvbugs/5441714][chore] remove skip on disagg n-gram test (#6872)
Signed-off-by: raayandhar <rdhar@nvidia.com>
2025-08-14 15:45:00 -07:00
Mike Iovine
078e907b16
[https://nvbugs/5455651][fix] Make ngram use XQA attention on Blackwell (#6873)
Signed-off-by: Michael Iovine <miovine@nvidia.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
Signed-off-by: Mike Iovine <mike.iovine7@gmail.com>
2025-08-14 18:36:19 -04:00
Bo Li
26f413ad90
[https://nvbugs/5450262][fix] Fix unsupported alltoall use case (#6882)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-08-14 17:46:54 -04:00
Matthias Jouanneaux
69574ad730
[TRTLLM-5966][feat] Helix: extend mapping to support different CP types (#6816)
Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>
2025-08-14 09:00:02 -07:00
Emma Qiao
96339c69a9
[None][infra] Waive failed cases on main (#6902)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-14 23:59:44 +08:00
Jiagan Cheng
afb116f703
[None][fix] Fix python-only build that uses TRTLLM_USE_PRECOMPILED (#6825)
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
2025-08-14 23:26:35 +08:00
kris1025
4aed7a7d19
[TRTLLM-6853][feat] refactor deepseekv3 model (#6698)
Signed-off-by: linquanh <linquanh@nvidia.com>
2025-08-14 11:03:17 -04:00
Pengbo Wang @ NVIDIA
ffc976ceaf
[https://nvbugs/5445466][fix] fix deepseek r1 hang by not enabling mnnvl by default (#6860)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-08-14 22:36:56 +08:00
Shi Xiaowei
1095dfd03c
[None][fix] BREAKING CHANGE: Mismatch between docs and actual commands (#6323) 2025-08-14 03:48:57 -04:00
chenfeiz0326
5cd8c0f6cc
[None][test] Add perf-sweep scripts (#6738)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-08-14 14:04:47 +08:00
Tao Li @ NVIDIA
345d3d3524
[None][doc] update moe support matrix for DS R1 (#6883)
Signed-off-by: taoli <litaotju@users.noreply.github.com>
Co-authored-by: taoli <litaotju@users.noreply.github.com>
2025-08-14 13:55:11 +08:00
NVJiangShao
a700646132
[None][fix] Add FP4 all2all unitest and fix a bug for module WideEPMoE (#6784)
Signed-off-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com>
2025-08-14 13:35:37 +08:00
Yan Chunwei
0132c1db84
[https://nvbugs/5427043][fix] request length exceeds max_num_tokens (#6821)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-14 13:31:12 +08:00
Bo Deng
d8acca495b
[TRTLLM-6675][infra] Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/6623 (#6735)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-14 04:36:38 +00:00
jmydurant
4200fa46d1
[None][feat] Add support for Hopper MLA chunked prefill (#6655)
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-08-14 10:39:26 +08:00
Zhenhua Wang
868c5d166e
[None][chore] fix markdown format for the deployment guide (#6879)
Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>
2025-08-13 22:19:11 -04:00
Izzy Putterman
ef53de8eef
[None][feat] Add test for speculative rejection sampler (2-model) (#6542)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-08-13 22:09:35 -04:00
Linda
eb4ed18a63
[None][fix] max_num_sequences argument in nanobind (#6862)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-08-13 19:16:17 -04:00
Mike Iovine
7cba883932
[https://nvbugs/5410399][chore] Unwaive mtp llmapi test (#6833)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-08-13 17:38:45 -04:00
Perkz Zheng
58f7783ea4
[https://nvbugs/5394685][fix] the bug with spec-decoding + SWA && an accuracy issue related to 2CTA MLA (#6834)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-08-13 13:55:56 -07:00
Tin-Yin Lai
6c52bb07ff
[https://nvbugs/5302040][feat] Add whisper support (Bert Attention on SM100 and GPTAttention for cross attention on SM100) (#5527)
Signed-off-by: tinyinl <tinyinl@nvidia.com>
2025-08-13 11:19:13 -07:00
danielafrimi
bda42f8c3a
[None][feat] Support running heterogeneous model execution for Nemotron-H (#6866)
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
2025-08-13 19:51:19 +03:00
Emma Qiao
c7e6145409
[None][infra] Waive failed cases on main (#6863)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-13 09:50:14 -04:00
Anthony Chang
2198587b35
[https://nvbugs/5378031] [feat] Hopper W4A8 MoE supports ModelOpt ckpt for PyT backend (#6200)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-08-13 21:24:40 +08:00
Zhenhua Wang
8416d7fea8
[https://nvbugs/5412885][doc] Add the workaround doc for H200 OOM (#6853)
Signed-off-by: Zhenhua Wang <4936589+zhenhuaw-me@users.noreply.github.com>
2025-08-13 19:51:38 +08:00
Perkz Zheng
0fad6029f7
[TRTLLM-7093][fix] the perf regression to cvt_fp4 kernels (#6851)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-08-13 19:13:40 +08:00
Shi Xiaowei
fe7dda834d
[TRTLLM-7030][fix] Refactor the example doc of dist-serving (#6766)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-08-13 17:39:27 +08:00
Yukun He
bc5f766e0e
[TRTLLM-4501][feat] AutoTuner tuning config refactor and valid tactic generalization. (#6545)
* Generalize the definition of tactics so that users can implement more customizable tactic types, making the configurations clearer for each kernel run. 
* Allow the user not to specify the `gen_tuning_buckets` or the `map_to_tuning_buckets` function.
* Other code refactoring.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-08-13 16:25:22 +08:00
Void
1d80df0955
[None][feat] DeepEP LL combine FP4 (#6822)
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2025-08-13 04:20:21 -04:00
Zhou Yuxin
50e5e725e9
[https://nvbugs/5412456][fix] Fix an illegal instruction was encountered (#6776)
Signed-off-by: Zhou Yuxin <yuxinz@nvidia.com>
2025-08-13 15:45:59 +08:00
Aurelien Chartier
2e0081b53e
[#6530][fix] Fix script when using calibration tensors from modelopt (#6803)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-08-12 20:41:10 -07:00
Mike Iovine
f68e03e646
[https://nvbugs/5452167][fix] Fix ngram padding issue (#6837)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-08-13 11:23:16 +08:00
Yechan Kim
12102e2d48
[TRTLLM-6772][feat] Multimodal benchmark_serving support (#6622)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-12 19:34:02 -07:00
Fanrong Li
1bbc0e323b
[None][fix] Pre-allocate workspaces for DeepGEMM MoE to avoid frequent cudaFree/cudaMalloc (#6811)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-08-13 10:27:57 +08:00
Kaiyu Xie
47806f09d9
feat: Support custom repo_dir for SLURM script (#6546)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: xxi <xxi@nvidia.com>
2025-08-12 22:06:59 -04:00
rakib-hasan
2923eb88a1
[None][fix] Refactoring input prep to allow out-of-tree models (#6497)
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
2025-08-12 20:29:10 -04:00
dongxuy04
bd9a6dd9ab
[TRTLLM-7008][fix] fix wideEP weights loading and args (#6789)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-08-12 19:14:20 -04:00
Robin Kobus
45c7518032
[None][refactor] Simplify decoder state initialization (#6559)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-08-12 21:44:41 +02:00
Robin Kobus
dd11e08d26
[#6187][feat] add LayerNorm module (#6625)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-08-12 21:43:30 +02:00
nvchenghaoz
81f0ded1c4
[None][feat] Add GPT OSS support for AutoDeploy (#6641)
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
2025-08-12 14:03:22 -04:00
Jhao-Ting Chen
a060e12041
[https://nvbugs/5438869][fix] Set nvfp4 expert w1 w3 weight scale to the same value if they're not (#6656)
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-08-12 20:47:10 +08:00
xinhe-nv
e35fca4272
[TRTQA-2920][chore] improve hang tests (#6781)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-12 18:26:51 +08:00
QI JUN
8845e0f065
[None][fix] fix ci (#6814) 2025-08-12 02:21:50 -07:00