TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-18 16:55:08 +08:00

Author	SHA1	Message	Date
Balaram Buddharaju	c64bc14719	[None][chore] Waive moe fp4 test (#11558 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2026-02-17 15:39:18 -08:00
Balaram Buddharaju	957f803dd2	[None][chore] Waive failing pre-merge test (#11551 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2026-02-18 03:51:07 +08:00
Bala Marimuthu	6157f30b06	[#11318 ][infra] AutoDeploy: Add fused rope kernel - triton_rope_on_interleaved_qk_inputs (#11327 ) Signed-off-by: Balamurugan Marimuthu <246387390+bmarimuthu-nv@users.noreply.github.com>	2026-02-18 02:24:18 +08:00
Bala Marimuthu	1c065fbb3e	[#11109 ][feat] AutoDeploy: GLM 4.7 Flash Improvements (#11414 ) Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com> Signed-off-by: Balamurugan Marimuthu <246387390+bmarimuthu-nv@users.noreply.github.com> Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com> Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com> Signed-off-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com> Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com> Co-authored-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com> Co-authored-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>	2026-02-17 08:43:59 -05:00
Yanchao Lu	cc4511997a	[None][revert] - Revert "[TRTLLM-9108][feat] refactor MoE unit tests: add unified ConfigurableMoE test framework" (#11532 )	2026-02-16 21:23:12 +08:00
mpikulski	08c7103fc4	[TRTLLM-10030][test] ensure that TorchSampler does not sync (#11508 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2026-02-16 13:10:40 +01:00
Suyog Gupta	f3d784c6f6	[#10345 ][perf] Enable multi-stream MOE for super. Also adds multi-stream MLA attn (#11520 ) Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>	2026-02-15 15:07:56 -08:00
Yi Zhang	361ff36784	[None][feat] Use new index api, add block scale support, fix max_seq_len estimation, add flash mla support (#11334 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>	2026-02-15 21:40:54 +08:00
yingguo-trt	59b6bee7e6	[None][chore] Fix slurm job name (#11265 ) Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-15 19:57:03 +08:00
Ivy Zhang	17e6062690	[https://nvbugs/5821433 ][fix] complete WAR for popen in QA env (#11214 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-15 19:57:03 +08:00
Pengbo Wang	2b4ef3a014	[https://nvbugs/5815025 ][fix] Fix spec-dec mode flag and related cpp requirements (#10996 ) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-15 19:57:03 +08:00
Emma Qiao	5e47e6970b	[None][infra] Waive failed cases for release branch on 02/02 (#11182 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-15 19:57:03 +08:00
Pengyun Lin	592988ebdb	[https://nvbugs/5819444 ][fix] Unwaive gpt-oss test (#10927 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-15 19:57:03 +08:00
xinhe-nv	80708ba231	[https://nvbugs/5787904 ][fix] update mig tests (#11014 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-15 19:57:03 +08:00
dominicshanshan	d8e7c61ea9	[https://nvbugs/5823465 ][fix] Add CUTEDSL moe backend for deepseek r1 nvfp4 checkpoint in stress test (#10920 ) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-15 19:57:03 +08:00
Yukun He	ed404f9298	[TRTLLM-10851][feat] Add line_profiler tool for host overhead analysis. (#11232 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2026-02-15 16:18:10 +08:00
Chuang Zhu	0a9ddf8c17	[https://nvbugs/5880261 ][fix] fix cacheTransceiver (#11409 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2026-02-15 10:40:44 +08:00
Balaram Buddharaju	2989bf5b39	[None][feat] Add new helix kernels for MNNVL-based codepath (#11433 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2026-02-14 09:39:24 +08:00
William Zhang	4debf153d8	[#11170 ][fix] Fix for mm placeholder counts (#11461 ) * Why? As reported by #11170, when a single request contains multiple messages, and only a subset of those messages include multimodal data, the previous logic incorrectly adds placeholder tokens to subsequent messages that do not contain such data. * What? This commit fixes this issue, and adds unit tests that would have caught this. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2026-02-14 09:12:03 +08:00
Suyog Gupta	b4e9669d2c	[None][chore] Optimize MOE export by tracing with reduced experts and expanding graph (#11504 ) Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>	2026-02-13 16:59:30 -08:00
tburt-nv	f164669c04	[None][chore] Adjust waive to avoid sm parsing (#11518 ) Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>	2026-02-13 17:38:40 -05:00
Chang Liu	26901e4aa0	[TRTLLM-10612][feat] Initial support of AIGV models in TRTLLM (#11462 ) Signed-off-by: Chang Liu (Enterprise Products) <liuc@nvidia.com> Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com> Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com> Co-authored-by: Freddy Qi <junq@nvidia.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Zhenhua Wang <zhenhuaw@nvidia.com>	2026-02-14 06:11:11 +08:00
Lizhi Zhou	6837e73219	[https://nvbugs/5847284 ][fix] fix cuda oom error (#11219 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2026-02-13 19:04:33 +08:00
yuanjingx87	ca499d600d	[None][infra] Waive failed test in Post-Merge (#11491 ) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>	2026-02-12 22:57:17 -08:00
Balaram Buddharaju	db35119c7c	[None][chore] Waive test blocking pre-merge (#11498 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2026-02-12 20:08:14 -08:00
xxi	2565f0f4e4	[TRTLLM-9108][feat] refactor MoE unit tests: add unified ConfigurableMoE test framework (#11437 ) Signed-off-by: xxi <xxi@nvidia.com>	2026-02-13 11:05:38 +08:00
Yukun He	cb1d8d130f	[TRTLLM-10791][feat] TorchSampler general host time optimization (#11141 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2026-02-12 18:05:58 +01:00
Pamela Peng	4b2b1d146b	[https://nvbugs/5810935 ][test] unwaive RTX 6000 pro tests (#11452 ) Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>	2026-02-12 11:17:45 -05:00
Wanli Jiang	421eb9e39c	[None][feat] Optimize NemotronH model with elementwise and nvfp4 fusion (#11273 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2026-02-12 09:25:31 -05:00
xinhe-nv	ef7830d137	[None][chore] Add failed cases into waives.txt (#11447 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2026-02-12 07:47:25 -05:00
JennyLiu	11d79aa875	[https://nvbugs/5832481 ][test] Add gpt-oss-120b-Eagle3-throughput case on DGX-Spark (#11419 ) Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com> Co-authored-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>	2026-02-12 05:33:39 -05:00
Tailing Yuan	31cdbdfd72	[https://nvbugs/5808500 ][chore] Move DeepEPLowLatency tests to machines that support IBGDA with GPU handles (#11178 ) Signed-off-by: Tailing Yuan <yuantailing@gmail.com>	2026-02-12 03:58:01 -05:00
mpikulski	d0f3c412ff	[TRTLLM-10030][chore] refactor finish reasons tests (#11445 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2026-02-12 08:32:50 +01:00
xinhe-nv	3c1323442b	[None][chore] Add failed cases into waives.txt (#11451 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2026-02-12 02:31:34 -05:00
Simeng Liu	12085536df	[TRTLLM-10487][feat] Add user-provided UUID support for multimodal KV cache identification. (#11075 ) Signed-off-by: SimengLiu-nv <simengl@nvidia.com>	2026-02-12 00:48:47 -05:00
Perkz Zheng	e0b11d6ea0	[https://nvbugs/5804923 ][none] unwaive test (#11005 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2026-02-12 13:26:28 +08:00
William Zhang	ca9537e17c	[TRTLLM-10858][feat] Multi-image support for EPD disagg (#11264 ) * Why? Prior to this commit, we only supported a single multimodal input for E/P/D disaggregated serving. * What? This commit does a minor refactor of the multimodal embedding handles that cross process boundaries to enable this. Existing unit tests are updated accordingly to test this. The `RequestOutput` has its `mm_embedding_handle` replaced in favor of `disaggregated_params`, addressing a previous TODO. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2026-02-11 20:50:00 -08:00
xinhe-nv	42648734b8	[None][chore] Add failed cases into waives.txt (#11392 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com> Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>	2026-02-11 21:52:29 -05:00
Liao Lanyu	58165d5394	[None][chore] Introduce an abstract WaitingQueue interface to decouple the request scheduling logic from specific queue implementations (#11330 ) Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com> Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com> Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>	2026-02-12 09:18:24 +08:00
Emma Qiao	8ebd6056fa	[None][infra] Waive failed cases for main on 2/11 (#11441 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-02-11 15:25:52 +08:00
Bo Li	5ea6888dda	[https://nvbugs/5810940 ][fix] Update lm_eval to 4.9.10 and re-enable Skip Softmax Attention tests on CI. (#11176 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com> Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>	2026-02-11 00:54:40 -05:00
peihengh	a982554190	[https://nvbugs/5868038 ][fix] Gracefully terminate disagg serving servers to prevent leftover subprocess warnings (#11395 ) Signed-off-by: peihu-nv <259410613+peihu-nv@users.noreply.github.com>	2026-02-10 22:41:37 -05:00
Iman Tabrizian	7d992972b2	[TRTLLM-10273][feat] Move MambaCacheManager from Python to C++ (#10540 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2026-02-10 07:20:56 -08:00
Yiqing Yan	cf02456613	[TRTLLM-9711][infra] Fix the testcase name in timeout xml (#9781 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2026-02-10 18:50:42 +08:00
xinhe-nv	c7689df152	[None][chore] Add failed cases into waives.txt (#11396 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com> Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>	2026-02-10 05:50:16 -05:00
xinhe-nv	6e0659dc4d	[None][chore] Add failed cases into waives.txt (#11363 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com> Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>	2026-02-10 05:48:33 -05:00
dominicshanshan	2a4e70b4a9	[None][chore] Unwaive tests after last MI (#11400 ) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-02-10 17:12:39 +08:00
Emma Qiao	8a74ccc57e	[None][infra] Waive failed cases for main branch on 02/10 (#11413 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-02-10 03:21:59 -05:00
Yuxian Qiu	5f4df89109	[None][feat] Fully non-blocking pipeline parallelism executor loop. (#10349 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-02-10 15:43:28 +08:00
shuyixiong	c3cdc93211	[TRTLLM-9771][feat] Make update_weights compatible with CUDA Graph (#11267 ) Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>	2026-02-10 01:12:49 -05:00


2869 Commits