Commit Graph

2354 Commits

Author SHA1 Message Date
Izzy Putterman
f6ff0e3311
[None][fix] Skip Topk if 0 (#6934)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-08-16 02:17:36 -04:00
Daniel Cámpora
53312eeebd
[TRTLLM-7157][feat] BREAKING CHANGE Introduce sampler_type, detect sampler according to options (#6831)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-08-16 00:27:24 -04:00
Yiqing Yan
ec3d9f8052
[None][chore] Bump version to 1.1.0rc1 (#6953)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-16 10:32:47 +08:00
brb-nv
9505727d31
[https://nvbugs/5401114][fix] Unwaive Gemma3 tests (#6952)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-15 16:35:02 -07:00
Yuening Li
1f8ae2b2db
[TRTLLM-5863][feat] Support MoE INT8 Weight-Only-Quantization in PyTorch Workflow (#6629)
Signed-off-by: Yuening Li <62227368+yueningl@users.noreply.github.com>
2025-08-15 17:15:49 -04:00
dongfengy
0ad0b967bb
[None][fix] Make TP working for Triton MOE (in additional to EP we are using) (#6722)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-08-15 16:58:42 -04:00
ajrasane
4162d2d746
[None][test] Add accuracy evaluation for AutoDeploy (#6764)
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-15 13:46:09 -04:00
yifeizhang-c
4127d77678
[https://nvbugs/5394392][fix] Enlarge scheduler capacity under disagg bs == 1 (#6537)
Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
2025-08-15 09:52:06 -07:00
Perkz Zheng
6037fe3716
[https://nvbugs/5394685][fix] proper fix for the accuracy issue in 2CTA MLA kernels (#6941)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-08-15 23:29:36 +08:00
liji-nv
18ccd053d3
[https://nvbugs/5427801][fix] Torch compile support for Llama4 and Ea… (#6858)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-15 11:14:20 -04:00
tomeras91
f7dbc1435a
[None] [chore] Mamba cache in separate file (#6796)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-08-15 13:42:51 +03:00
Xianjie Qiao
c2fe8b03a2
[https://nvbugs/5405041][fix] Update wide-ep doc (#6933)
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
2025-08-15 05:32:32 -04:00
peaceh-nv
1c1d5d2495
[https://nvbugs/5451373][fix] : Fix the accuracy issue when using FP8 context MLA (#6881)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-08-15 16:53:56 +08:00
Zhenhua Wang
fadb5e75dd
[None][chore] add a EditorConfig config (#6897)
Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>
2025-08-15 03:54:37 -04:00
xinhe-nv
b23fdfc62f
[None][chore] Add failed cases into waives.txt (#6914)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-08-15 14:00:16 +08:00
jmydurant
8e252256f5
[None][doc] Modify the description for mla chunked context (#6929)
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-08-15 12:52:26 +08:00
Yanchao Lu
3a987891d8
[TRTLLM-7141][infra] Use repo mirrors to avoid intermittent network failures (#6836)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-08-15 11:16:07 +08:00
Bo Deng
e54ba75dac
[None][fix] Update tests to use standardized uppercase backend identifiers (#6921)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-15 11:14:15 +08:00
Wanli Jiang
9a133e9b41
[https://nvbugs/5415862][fix] Update cublas as 12.9.1 and cuda memory alignment as 256 (#6501)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-08-15 11:10:59 +08:00
Bo Li
15aabc1540
[None][fix] Fix perfect router. (#6797)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-08-14 20:09:08 -07:00
Frank
2cc59aacb3
[None][fix] Correct reporting of torch_dtype for ModelConfig class. (#6800)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-08-14 22:46:20 -04:00
Yunfan Fan
11d08c33af
[None][fix] Fix responsibility boundary between the assert and tllmException files (#6723)
Signed-off-by: fanyunfan <2569548856@qq.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-08-15 10:34:49 +08:00
JunyiXu-nv
70e352a6f7
[https://nvbugs/5437106][fix] Add L4 Scout benchmarking WAR option in deploy guide (#6829)
Signed-off-by: Junyi Xu <junyix@nvidia.com>
2025-08-15 08:53:13 +08:00
Perkz Zheng
11d89a3732
[https://nvbugs/5394685][fix] using static scheduler 2CTA MLA as WAR for an accuracy issue (#6896)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-08-15 08:51:04 +08:00
hlu1
5346eb7bc5
[None][doc] Update gpt-oss doc on MoE support matrix (#6908)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-08-15 08:50:31 +08:00
Aurelien Chartier
b13a5a99b2
[None][chore] Add tests for non-existent and completed request cancellation (#6840)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-08-14 15:57:01 -07:00
qianbiao
5c2f0fd03d
[None] [feat] Add Tencent HunYuanMoEV1 model support (#5521)
Signed-off-by: sorenwu <sorenwu@tencent.com>
Co-authored-by: sorenwu <sorenwu@tencent.com>
Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>
2025-08-15 06:56:44 +08:00
Raayan Dhar
8b237b943b
[https://nvbugs/5441714][chore] remove skip on disagg n-gram test (#6872)
Signed-off-by: raayandhar <rdhar@nvidia.com>
2025-08-14 15:45:00 -07:00
Mike Iovine
078e907b16
[https://nvbugs/5455651][fix] Make ngram use XQA attention on Blackwell (#6873)
Signed-off-by: Michael Iovine <miovine@nvidia.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
Signed-off-by: Mike Iovine <mike.iovine7@gmail.com>
2025-08-14 18:36:19 -04:00
Bo Li
26f413ad90
[https://nvbugs/5450262][fix] Fix unsupported alltoall use case (#6882)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-08-14 17:46:54 -04:00
Matthias Jouanneaux
69574ad730
[TRTLLM-5966][feat] Helix: extend mapping to support different CP types (#6816)
Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>
2025-08-14 09:00:02 -07:00
Emma Qiao
96339c69a9
[None][infra] Waive failed cases on main (#6902)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-14 23:59:44 +08:00
Jiagan Cheng
afb116f703
[None][fix] Fix python-only build that uses TRTLLM_USE_PRECOMPILED (#6825)
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
2025-08-14 23:26:35 +08:00
kris1025
4aed7a7d19
[TRTLLM-6853][feat] refactor deepseekv3 model (#6698)
Signed-off-by: linquanh <linquanh@nvidia.com>
2025-08-14 11:03:17 -04:00
Pengbo Wang @ NVIDIA
ffc976ceaf
[https://nvbugs/5445466][fix] fix deepseek r1 hang by not enabling mnnvl by default (#6860)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-08-14 22:36:56 +08:00
Shi Xiaowei
1095dfd03c
[None][fix] BREAKING CHANGE: Mismatch between docs and actual commands (#6323)
2025-08-14 03:48:57 -04:00
chenfeiz0326
5cd8c0f6cc
[None][test] Add perf-sweep scripts (#6738)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-08-14 14:04:47 +08:00
Tao Li @ NVIDIA
345d3d3524
[None][doc] update moe support matrix for DS R1 (#6883)
Signed-off-by: taoli <litaotju@users.noreply.github.com>
Co-authored-by: taoli <litaotju@users.noreply.github.com>
2025-08-14 13:55:11 +08:00
NVJiangShao
a700646132
[None][fix] Add FP4 all2all unitest and fix a bug for module WideEPMoE (#6784)
Signed-off-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com>
2025-08-14 13:35:37 +08:00
Yan Chunwei
0132c1db84
[https://nvbugs/5427043][fix] request length exceeds max_num_tokens (#6821)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-14 13:31:12 +08:00
Bo Deng
d8acca495b
[TRTLLM-6675][infra] Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/6623 (#6735)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-14 04:36:38 +00:00
jmydurant
4200fa46d1
[None][feat] Add support for Hopper MLA chunked prefill (#6655)
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-08-14 10:39:26 +08:00
Zhenhua Wang
868c5d166e
[None][chore] fix markdown format for the deployment guide (#6879)
Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>
2025-08-13 22:19:11 -04:00
Izzy Putterman
ef53de8eef
[None][feat] Add test for speculative rejection sampler (2-model) (#6542)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-08-13 22:09:35 -04:00
Linda
eb4ed18a63
[None][fix] max_num_sequences argument in nanobind (#6862)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-08-13 19:16:17 -04:00
Mike Iovine
7cba883932
[https://nvbugs/5410399][chore] Unwaive mtp llmapi test (#6833)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-08-13 17:38:45 -04:00
Perkz Zheng
58f7783ea4
[https://nvbugs/5394685][fix] the bug with spec-decoding + SWA && an accuracy issue related to 2CTA MLA (#6834)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-08-13 13:55:56 -07:00
Tin-Yin Lai
6c52bb07ff
[https://nvbugs/5302040][feat] Add whisper support (Bert Attention on SM100 and GPTAttention for cross attention on SM100) (#5527)
Signed-off-by: tinyinl <tinyinl@nvidia.com>
2025-08-13 11:19:13 -07:00
danielafrimi
bda42f8c3a
[None][feat] Support running heterogeneous model execution for Nemotron-H (#6866)
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
2025-08-13 19:51:19 +03:00
Emma Qiao
c7e6145409
[None][infra] Waive failed cases on main (#6863)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-13 09:50:14 -04:00