Bala Marimuthu
1c065fbb3e
[ #11109 ][feat] AutoDeploy: GLM 4.7 Flash Improvements ( #11414 )
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Signed-off-by: Balamurugan Marimuthu <246387390+bmarimuthu-nv@users.noreply.github.com>
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Signed-off-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Co-authored-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
2026-02-17 08:43:59 -05:00
TensorRT LLM
fedd7178d1
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-02-17 03:10:50 +00:00
jthomson04
2450188808
[None][fix] Better error message for mismatched MPI world size ( #11294 )
...
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2026-02-16 15:37:49 -08:00
Yanchao Lu
cc4511997a
[None][revert] - Revert "[TRTLLM-9108][feat] refactor MoE unit tests: add unified ConfigurableMoE test framework" ( #11532 )
2026-02-16 21:23:12 +08:00
mpikulski
08c7103fc4
[TRTLLM-10030][test] ensure that TorchSampler does not sync ( #11508 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-16 13:10:40 +01:00
TensorRT LLM
d72f8098fe
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-02-16 03:04:44 +00:00
Suyog Gupta
f3d784c6f6
[ #10345 ][perf] Enable multi-stream MOE for super. Also adds multi-stream MLA attn ( #11520 )
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2026-02-15 15:07:56 -08:00
tcherckez-nvidia
fcb7bea07f
[ #11455 ][bug] Use the torch_dtype set by ModelOpt ( #11525 )
...
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2026-02-15 19:37:59 +02:00
Yi Zhang
361ff36784
[None][feat] Use new index api, add block scale support, fix max_seq_len estimation, add flash mla support ( #11334 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>
2026-02-15 21:40:54 +08:00
yingguo-trt
59b6bee7e6
[None][chore] Fix slurm job name ( #11265 )
...
Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-15 19:57:03 +08:00
Ivy Zhang
17e6062690
[ https://nvbugs/5821433 ][fix] complete WAR for popen in QA env ( #11214 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-15 19:57:03 +08:00
Pengbo Wang
2b4ef3a014
[ https://nvbugs/5815025 ][fix] Fix spec-dec mode flag and related cpp requirements ( #10996 )
...
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-15 19:57:03 +08:00
Yechan Kim
ebd859cf61
[ https://nvbugs/5854419 ][fix] Fix Qwen3-VL-Dense/MoE accuracy drop ( #11134 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-15 19:57:03 +08:00
Mike Iovine
435ea36977
[None][chore] Add warning about 2-model MTP deprecation ( #11043 )
...
Signed-off-by: Mike Iovine <miovine@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-15 19:57:03 +08:00
Emma Qiao
5e47e6970b
[None][infra] Waive failed cases for release branch on 02/02 ( #11182 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-15 19:57:03 +08:00
Pengyun Lin
592988ebdb
[ https://nvbugs/5819444 ][fix] Unwaive gpt-oss test ( #10927 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-15 19:57:03 +08:00
xinhe-nv
80708ba231
[ https://nvbugs/5787904 ][fix] update mig tests ( #11014 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-15 19:57:03 +08:00
dominicshanshan
d8e7c61ea9
[ https://nvbugs/5823465 ][fix] Add CUTEDSL moe backend for deepseek r1 nvfp4 checkpoint in stress test ( #10920 )
...
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-15 19:57:03 +08:00
dhansen-nvidia
80235e53cf
[None][feat] Add documentation on configuring CPU affinity in TRT-LLM ( #10678 )
...
Signed-off-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>
Signed-off-by: dhansen-nvidia <218031328+dhansen-nvidia@users.noreply.github.com>
Co-authored-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-15 19:57:03 +08:00
Ziyi Xiong
5d73194ffb
[ https://nvbugs/5829830 ][fix] Declare the var in the correct scope ( #11066 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-15 19:57:03 +08:00
Patrice Castonguay
d9f787a8d2
[None][doc] Hardware support update ( #10719 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-15 19:57:03 +08:00
Yukun He
ed404f9298
[TRTLLM-10851][feat] Add line_profiler tool for host overhead analysis. ( #11232 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-02-15 16:18:10 +08:00
Chang Liu
b003355050
[None][doc] Add doc for TRTLLM AIGV initial release ( #11489 )
...
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
2026-02-14 22:16:23 -08:00
TensorRT LLM
144188c2c4
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-02-15 03:11:51 +00:00
Chuang Zhu
0a9ddf8c17
[ https://nvbugs/5880261 ][fix] fix cacheTransceiver ( #11409 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2026-02-15 10:40:44 +08:00
Thor Johnsen
29e44dd749
[None][fix] Add cacheSaltID property to BlockKey serialization code ( #11457 )
...
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
2026-02-14 10:22:35 +08:00
Balaram Buddharaju
2989bf5b39
[None][feat] Add new helix kernels for MNNVL-based codepath ( #11433 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-02-14 09:39:24 +08:00
William Zhang
4debf153d8
[ #11170 ][fix] Fix for mm placeholder counts ( #11461 )
...
* Why?
As reported in #11170 , when a single request contained multiple
messages and only a subset of those messages included multimodal data,
the previous logic incorrectly added placeholder tokens to subsequent
messages that did not contain such data.
* What?
This commit fixes that issue and adds unit tests that would have
caught it.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-02-14 09:12:03 +08:00
Suyog Gupta
b4e9669d2c
[None][chore] Optimize MOE export by tracing with reduced experts and expanding graph ( #11504 )
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2026-02-13 16:59:30 -08:00
tburt-nv
f164669c04
[None][chore] Adjust waive to avoid sm parsing ( #11518 )
...
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2026-02-13 17:38:40 -05:00
Chang Liu
26901e4aa0
[TRTLLM-10612][feat] Initial support of AIGV models in TRTLLM ( #11462 )
...
Signed-off-by: Chang Liu (Enterprise Products) <liuc@nvidia.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>
Co-authored-by: Freddy Qi <junq@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Zhenhua Wang <zhenhuaw@nvidia.com>
2026-02-14 06:11:11 +08:00
Pamela Peng
19a3031ecb
[TRTLLM-10329][feat] Fix weight loading for Nemotron 3 models on DGX Spark ( #11405 )
...
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
2026-02-13 15:29:41 -05:00
dpitman-nvda
052fe2f7f6
[None][chore] Update allowlist 2026-02-13 ( #11512 )
...
Signed-off-by: Derek Pitman <dpitman@nvidia.com>
2026-02-14 01:28:26 +08:00
mpikulski
37c53425c1
[TRTLLM-10030][chore] improve assert in sampler ( #11475 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-13 21:54:28 +08:00
Venky
b67dcd8fef
[None][docs] enable Deepwiki docs ( #11492 )
...
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2026-02-13 20:25:08 +08:00
Lizhi Zhou
6837e73219
[ https://nvbugs/5847284 ][fix] fix cuda oom error ( #11219 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-02-13 19:04:33 +08:00
mpikulski
0ee757e03a
[TRTLLM-10030][chore] use weakref in atexit handler ( #11476 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-13 18:02:29 +08:00
yuanjingx87
ca499d600d
[None][infra] Waive failed test in Post-Merge ( #11491 )
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2026-02-12 22:57:17 -08:00
Gal Hubara-Agam
d0e7ba102e
[ #11455 ][fix] Fallback to triton_ssm for nvfp4 quantization ( #11456 )
...
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2026-02-13 07:38:37 +02:00
Balaram Buddharaju
db35119c7c
[None][chore] Waive test blocking pre-merge ( #11498 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-02-12 20:08:14 -08:00
xxi
2565f0f4e4
[TRTLLM-9108][feat] refactor MoE unit tests: add unified ConfigurableMoE test framework ( #11437 )
...
Signed-off-by: xxi <xxi@nvidia.com>
2026-02-13 11:05:38 +08:00
dpitman-nvda
45d3792245
[TRTINFRA-7648][chore] Add SECURITY.md file to TensorRT-LLM GitHub ( #11484 )
...
Signed-off-by: Derek Pitman <dpitman@nvidia.com>
2026-02-12 20:46:28 -05:00
Iman Tabrizian
dd74f90914
[ https://nvbugs/5887893 ][fix] Make NVML work with older CUDA driver versions ( #11465 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2026-02-12 18:06:47 -05:00
Ludwig Schneider
5130cbd73e
[None][fix] Pre-Allocation for Auto-Tuning NCCL_SYMMETRIC ( #11326 )
...
Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>
2026-02-12 14:31:51 -08:00
Balaram Buddharaju
9c2d23c2e5
[ https://nvbugs/5888410 ][fix] Enable warmup for Helix CP ( #11460 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-02-12 14:24:51 -08:00
tburt-nv
07cd3d4ff2
[None][chore] Bump version to 1.3.0rc4 ( #11485 )
...
Signed-off-by: Tyler Burt <tburt@nvidia.com>
2026-02-12 16:55:23 -05:00
Yukun He
cb1d8d130f
[TRTLLM-10791][feat] TorchSampler general host time optimization ( #11141 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-02-12 18:05:58 +01:00
Pamela Peng
4b2b1d146b
[ https://nvbugs/5810935 ][test] unwaive RTX 6000 pro tests ( #11452 )
...
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
2026-02-12 11:17:45 -05:00
Wanli Jiang
421eb9e39c
[None][feat] Optimize NemotronH model with elementwise and nvfp4 fusion ( #11273 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2026-02-12 09:25:31 -05:00
xinhe-nv
ef7830d137
[None][chore] Add failed cases into waives.txt ( #11447 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2026-02-12 07:47:25 -05:00