Wanli Jiang
9a133e9b41
[ https://nvbugs/5415862 ][fix] Update cublas as 12.9.1 and cuda memory alignment as 256 ( #6501 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-08-15 11:10:59 +08:00
Bo Li
15aabc1540
[None][fix] Fix perfect router. ( #6797 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-08-14 20:09:08 -07:00
Frank
2cc59aacb3
[None][fix] Correct reporting of torch_dtype for ModelConfig class. ( #6800 )
...
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-08-14 22:46:20 -04:00
Yunfan Fan
11d08c33af
[None][fix] Fix responsibility boundary between the assert and tllmException files ( #6723 )
...
Signed-off-by: fanyunfan <2569548856@qq.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-08-15 10:34:49 +08:00
JunyiXu-nv
70e352a6f7
[ https://nvbugs/5437106 ][fix] Add L4 Scout benchmarking WAR option in deploy guide ( #6829 )
...
Signed-off-by: Junyi Xu <junyix@nvidia.com>
2025-08-15 08:53:13 +08:00
Perkz Zheng
11d89a3732
[ https://nvbugs/5394685 ][fix] using static scheduler 2CTA MLA as WAR for an accuracy issue ( #6896 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-08-15 08:51:04 +08:00
hlu1
5346eb7bc5
[None][doc] Update gpt-oss doc on MoE support matrix ( #6908 )
...
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-08-15 08:50:31 +08:00
Aurelien Chartier
b13a5a99b2
[None][chore] Add tests for non-existent and completed request cancellation ( #6840 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-08-14 15:57:01 -07:00
qianbiao
5c2f0fd03d
[None] [feat] Add Tencent HunYuanMoEV1 model support ( #5521 )
...
Signed-off-by: sorenwu <sorenwu@tencent.com>
Co-authored-by: sorenwu <sorenwu@tencent.com>
Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>
2025-08-15 06:56:44 +08:00
Raayan Dhar
8b237b943b
[ https://nvbugs/5441714 ][chore] remove skip on disagg n-gram test ( #6872 )
...
Signed-off-by: raayandhar <rdhar@nvidia.com>
2025-08-14 15:45:00 -07:00
Mike Iovine
078e907b16
[ https://nvbugs/5455651 ][fix] Make ngram use XQA attention on Blackwell ( #6873 )
...
Signed-off-by: Michael Iovine <miovine@nvidia.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
Signed-off-by: Mike Iovine <mike.iovine7@gmail.com>
2025-08-14 18:36:19 -04:00
Bo Li
26f413ad90
[ https://nvbugs/5450262 ][fix] Fix unsupported alltoall use case ( #6882 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-08-14 17:46:54 -04:00
Matthias Jouanneaux
69574ad730
[TRTLLM-5966][feat] Helix: extend mapping to support different CP types ( #6816 )
...
Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>
2025-08-14 09:00:02 -07:00
Emma Qiao
96339c69a9
[None][infra] Waive failed cases on main ( #6902 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-14 23:59:44 +08:00
Jiagan Cheng
afb116f703
[None][fix] Fix python-only build that uses TRTLLM_USE_PRECOMPILED ( #6825 )
...
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
2025-08-14 23:26:35 +08:00
kris1025
4aed7a7d19
[TRTLLM-6853][feat] refactor deepseekv3 model ( #6698 )
...
Signed-off-by: linquanh <linquanh@nvidia.com>
2025-08-14 11:03:17 -04:00
Pengbo Wang @ NVIDIA
ffc976ceaf
[ https://nvbugs/5445466 ][fix] fix deepseek r1 hang by not enabling mnnvl by default ( #6860 )
...
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-08-14 22:36:56 +08:00
Shi Xiaowei
1095dfd03c
[None][fix] BREAKING CHANGE: Mismatch between docs and actual commands ( #6323 )
2025-08-14 03:48:57 -04:00
chenfeiz0326
5cd8c0f6cc
[None][test] Add perf-sweep scripts ( #6738 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-08-14 14:04:47 +08:00
Tao Li @ NVIDIA
345d3d3524
[None][doc] update moe support matrix for DS R1 ( #6883 )
...
Signed-off-by: taoli <litaotju@users.noreply.github.com>
Co-authored-by: taoli <litaotju@users.noreply.github.com>
2025-08-14 13:55:11 +08:00
NVJiangShao
a700646132
[None][fix] Add FP4 all2all unitest and fix a bug for module WideEPMoE ( #6784 )
...
Signed-off-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com>
2025-08-14 13:35:37 +08:00
Yan Chunwei
0132c1db84
[ https://nvbugs/5427043 ][fix] request length exceeds max_num_tokens ( #6821 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-14 13:31:12 +08:00
Bo Deng
d8acca495b
[TRTLLM-6675][infra] Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/6623 ( #6735 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-14 04:36:38 +00:00
jmydurant
4200fa46d1
[None][feat] Add support for Hopper MLA chunked prefill ( #6655 )
...
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-08-14 10:39:26 +08:00
Zhenhua Wang
868c5d166e
[None][chore] fix markdown format for the deployment guide ( #6879 )
...
Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>
2025-08-13 22:19:11 -04:00
Izzy Putterman
ef53de8eef
[None][feat] Add test for speculative rejection sampler (2-model) ( #6542 )
...
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-08-13 22:09:35 -04:00
Linda
eb4ed18a63
[None][fix] max_num_sequences argument in nanobind ( #6862 )
...
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-08-13 19:16:17 -04:00
Mike Iovine
7cba883932
[ https://nvbugs/5410399 ][chore] Unwaive mtp llmapi test ( #6833 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-08-13 17:38:45 -04:00
Perkz Zheng
58f7783ea4
[ https://nvbugs/5394685 ][fix] the bug with spec-decoding + SWA && an accuracy issue related to 2CTA MLA ( #6834 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-08-13 13:55:56 -07:00
Tin-Yin Lai
6c52bb07ff
[ https://nvbugs/5302040 ][feat] Add whisper support (Bert Attention on SM100 and GPTAttention for cross attention on SM100) ( #5527 )
...
Signed-off-by: tinyinl <tinyinl@nvidia.com>
2025-08-13 11:19:13 -07:00
danielafrimi
bda42f8c3a
[None][feat] Support running heterogeneous model execution for Nemotron-H ( #6866 )
...
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
2025-08-13 19:51:19 +03:00
Emma Qiao
c7e6145409
[None][infra] Waive failed cases on main ( #6863 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-13 09:50:14 -04:00
Anthony Chang
2198587b35
[ https://nvbugs/5378031 ] [feat] Hopper W4A8 MoE supports ModelOpt ckpt for PyT backend ( #6200 )
...
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-08-13 21:24:40 +08:00
Zhenhua Wang
8416d7fea8
[ https://nvbugs/5412885 ][doc] Add the workaround doc for H200 OOM ( #6853 )
...
Signed-off-by: Zhenhua Wang <4936589+zhenhuaw-me@users.noreply.github.com>
2025-08-13 19:51:38 +08:00
Perkz Zheng
0fad6029f7
[TRTLLM-7093][fix] the perf regression to cvt_fp4 kernels ( #6851 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-08-13 19:13:40 +08:00
Shi Xiaowei
fe7dda834d
[TRTLLM-7030][fix] Refactor the example doc of dist-serving ( #6766 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-08-13 17:39:27 +08:00
Yukun He
bc5f766e0e
[TRTLLM-4501][feat] AutoTuner tuning config refactor and valid tactic generalization. ( #6545 )
...
* Generalize the definition of tactics so that users can implement more customizable tactic types, making the configurations clearer for each kernel run.
* Allow the user not to specify the `gen_tuning_buckets` or the `map_to_tuning_buckets` function.
* Other code refactoring.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-08-13 16:25:22 +08:00
Void
1d80df0955
[None][feat] DeepEP LL combine FP4 ( #6822 )
...
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2025-08-13 04:20:21 -04:00
Zhou Yuxin
50e5e725e9
[ https://nvbugs/5412456 ][fix] Fix an illegal instruction was encountered ( #6776 )
...
Signed-off-by: Zhou Yuxin <yuxinz@nvidia.com>
2025-08-13 15:45:59 +08:00
Aurelien Chartier
2e0081b53e
[ #6530 ][fix] Fix script when using calibration tensors from modelopt ( #6803 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-08-12 20:41:10 -07:00
Mike Iovine
f68e03e646
[ https://nvbugs/5452167 ][fix] Fix ngram padding issue ( #6837 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-08-13 11:23:16 +08:00
Yechan Kim
12102e2d48
[TRTLLM-6772][feat] Multimodal benchmark_serving support ( #6622 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-12 19:34:02 -07:00
Fanrong Li
1bbc0e323b
[None][fix] Pre-allocate workspaces for DeepGEMM MoE to avoid frequent cudaFree/cudaMalloc ( #6811 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-08-13 10:27:57 +08:00
Kaiyu Xie
47806f09d9
feat: Support custom repo_dir for SLURM script ( #6546 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: xxi <xxi@nvidia.com>
2025-08-12 22:06:59 -04:00
rakib-hasan
2923eb88a1
[None][fix] Refactoring input prep to allow out-of-tree models ( #6497 )
...
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
2025-08-12 20:29:10 -04:00
dongxuy04
bd9a6dd9ab
[TRTLLM-7008][fix] fix wideEP weights loading and args ( #6789 )
...
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-08-12 19:14:20 -04:00
Robin Kobus
45c7518032
[None][refactor] Simplify decoder state initialization ( #6559 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-08-12 21:44:41 +02:00
Robin Kobus
dd11e08d26
[ #6187 ][feat] add LayerNorm module ( #6625 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-08-12 21:43:30 +02:00
nvchenghaoz
81f0ded1c4
[None][feat] Add GPT OSS support for AutoDeploy ( #6641 )
...
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
2025-08-12 14:03:22 -04:00
Jhao-Ting Chen
a060e12041
[ https://nvbugs/5438869 ][fix] Set nvfp4 expert w1 w3 weight scale to the same value if they're not ( #6656 )
...
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-08-12 20:47:10 +08:00