Bala Marimuthu
6157f30b06
[ #11318 ][infra] AutoDeploy: Add fused rope kernel - triton_rope_on_interleaved_qk_inputs ( #11327 )
...
Signed-off-by: Balamurugan Marimuthu <246387390+bmarimuthu-nv@users.noreply.github.com>
2026-02-18 02:24:18 +08:00
Bala Marimuthu
1c065fbb3e
[ #11109 ][feat] AutoDeploy: GLM 4.7 Flash Improvements ( #11414 )
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Signed-off-by: Balamurugan Marimuthu <246387390+bmarimuthu-nv@users.noreply.github.com>
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Signed-off-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Co-authored-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
2026-02-17 08:43:59 -05:00
jthomson04
2450188808
[None][fix] Better error message for mismatched MPI world size ( #11294 )
...
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2026-02-16 15:37:49 -08:00
Yanchao Lu
cc4511997a
[None][revert] - Revert "[TRTLLM-9108][feat] refactor MoE unit tests: add unified ConfigurableMoE test framework" ( #11532 )
2026-02-16 21:23:12 +08:00
Suyog Gupta
f3d784c6f6
[ #10345 ][perf] Enable multi-stream MOE for super. Also adds multi-stream MLA attn ( #11520 )
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2026-02-15 15:07:56 -08:00
tcherckez-nvidia
fcb7bea07f
[ #11455 ][bug] Use the torch_dtype set by ModelOpt ( #11525 )
...
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2026-02-15 19:37:59 +02:00
Yi Zhang
361ff36784
[None][feat] Use new index api, add block scale support, fix max_seq_len estimation, add flash mla support ( #11334 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>
2026-02-15 21:40:54 +08:00
Pengbo Wang
2b4ef3a014
[ https://nvbugs/5815025 ][fix] Fix spec-dec mode flag and related cpp requirements ( #10996 )
...
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-15 19:57:03 +08:00
Yechan Kim
ebd859cf61
[ https://nvbugs/5854419 ][fix] Fix Qwen3-VL-Dense/MoE accuracy drop ( #11134 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-15 19:57:03 +08:00
Mike Iovine
435ea36977
[None][chore] Add warning about 2-model MTP deprecation ( #11043 )
...
Signed-off-by: Mike Iovine <miovine@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-15 19:57:03 +08:00
Yukun He
ed404f9298
[TRTLLM-10851][feat] Add line_profiler tool for host overhead analysis. ( #11232 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-02-15 16:18:10 +08:00
Balaram Buddharaju
2989bf5b39
[None][feat] Add new helix kernels for MNNVL-based codepath ( #11433 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-02-14 09:39:24 +08:00
William Zhang
4debf153d8
[ #11170 ][fix] Fix for mm placeholder counts ( #11461 )
...
* Why?
As reported by #11170 , when a single request contained multiple
messages and only a subset of those messages included multimodal data,
the previous logic incorrectly added placeholder tokens to subsequent
messages that did not contain such data.
* What?
This commit fixes the issue and adds unit tests that would have
caught it.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-02-14 09:12:03 +08:00
Suyog Gupta
b4e9669d2c
[None][chore] Optimize MOE export by tracing with reduced experts and expanding graph ( #11504 )
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2026-02-13 16:59:30 -08:00
Chang Liu
26901e4aa0
[TRTLLM-10612][feat] Initial support of AIGV models in TRTLLM ( #11462 )
...
Signed-off-by: Chang Liu (Enterprise Products) <liuc@nvidia.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>
Co-authored-by: Freddy Qi <junq@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Zhenhua Wang <zhenhuaw@nvidia.com>
2026-02-14 06:11:11 +08:00
Pamela Peng
19a3031ecb
[TRTLLM-10329][feat] Fix weight loading for Nemotron 3 models on DGX Spark ( #11405 )
...
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
2026-02-13 15:29:41 -05:00
mpikulski
37c53425c1
[TRTLLM-10030][chore] improve assert in sampler ( #11475 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-13 21:54:28 +08:00
mpikulski
0ee757e03a
[TRTLLM-10030][chore] use weakref in atexit handler ( #11476 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-13 18:02:29 +08:00
Gal Hubara-Agam
d0e7ba102e
[ #11455 ][fix] Fallback to triton_ssm for nvfp4 quantization ( #11456 )
...
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2026-02-13 07:38:37 +02:00
xxi
2565f0f4e4
[TRTLLM-9108][feat] refactor MoE unit tests: add unified ConfigurableMoE test framework ( #11437 )
...
Signed-off-by: xxi <xxi@nvidia.com>
2026-02-13 11:05:38 +08:00
Ludwig Schneider
5130cbd73e
[None][fix] Pre-Allocation for Auto-Tuning NCCL_SYMMETRIC ( #11326 )
...
Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>
2026-02-12 14:31:51 -08:00
Balaram Buddharaju
9c2d23c2e5
[ https://nvbugs/5888410 ][fix] Enable warmup for Helix CP ( #11460 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-02-12 14:24:51 -08:00
tburt-nv
07cd3d4ff2
[None][chore] Bump version to 1.3.0rc4 ( #11485 )
...
Signed-off-by: Tyler Burt <tburt@nvidia.com>
2026-02-12 16:55:23 -05:00
Yukun He
cb1d8d130f
[TRTLLM-10791][feat] TorchSampler general host time optimization ( #11141 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-02-12 18:05:58 +01:00
Wanli Jiang
421eb9e39c
[None][feat] Optimize NemotronH model with elementwise and nvfp4 fusion ( #11273 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2026-02-12 09:25:31 -05:00
Lizhi Zhou
219195688c
[None][chore] fix a bug in PR11336 ( #11439 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-02-12 14:34:14 +08:00
Simeng Liu
12085536df
[TRTLLM-10487][feat] Add user-provided UUID support for multimodal KV cache identification. ( #11075 )
...
Signed-off-by: SimengLiu-nv <simengl@nvidia.com>
2026-02-12 00:48:47 -05:00
William Zhang
ca9537e17c
[TRTLLM-10858][feat] Multi-image support for EPD disagg ( #11264 )
...
* Why?
Prior to this commit, we only supported a single multimodal input for
E/P/D disaggregated serving.
* What?
This commit does a minor refactor of the multimodal embedding handles
that cross process boundaries to enable this, and updates the existing
unit tests accordingly.
The `RequestOutput` has its `mm_embedding_handle` replaced in favor of
`disaggregated_params`, addressing a previous TODO.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-02-11 20:50:00 -08:00
Liao Lanyu
58165d5394
[None][chore] Introducing an abstract WaitingQueue interface to decouple the request scheduling logic from specific queue implementations ( #11330 )
...
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2026-02-12 09:18:24 +08:00
Harris Nover
2d5ebb3fe8
[None][chore] Merge residual+hidden into layer norm at the end of each NemotronH MTP, and remove a % operation ( #11406 )
...
Signed-off-by: Harris Nover <249353502+hnover-nv@users.noreply.github.com>
2026-02-11 12:01:36 -05:00
Robin Kobus
7a103035be
[None][fix] Remove overlap scheduler adjustment for max sequence length in create_py_executor function ( #9229 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2026-02-11 08:46:25 -08:00
Guoming Zhang
c47ff4da43
[None][feat] Remove the hard code for activation type definition in T… ( #11164 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2026-02-11 21:50:45 +08:00
Yihan Wang
e8b860965b
[None][feat] Initial PR for trtllm-gen attention backend ( #10784 )
...
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
2026-02-11 17:16:52 +08:00
Bo Li
5ea6888dda
[ https://nvbugs/5810940 ][fix] Update lm_eval to 4.9.10 and re-enable Skip Softmax Attention tests on CI. ( #11176 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2026-02-11 00:54:40 -05:00
Taylor Yeonbok Lee
860054c859
[ #11203 ][feat] AutoDeploy: Refactor node caching and improve engine build time ( #11250 )
...
Signed-off-by: Taylor Yeonbok Lee <249374542+taylor-yb-lee@users.noreply.github.com>
2026-02-10 13:35:44 -08:00
mpikulski
411fa9ff87
[TRTLLM-10030][perf] pin host memory and batch sampler setup in beam search ( #11390 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-10 16:48:36 +01:00
Iman Tabrizian
7d992972b2
[TRTLLM-10273][feat] Move MambaCacheManager from Python to C++ ( #10540 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2026-02-10 07:20:56 -08:00
Leslie Fang
d6e49542bd
[ https://nvbugs/5848377 ][fix] fix deepeplowlatency with trtllm moe backend running fp8 DS_R1 ( #11266 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
Signed-off-by: Leslie Fang <leslief@nvidia.com>
Co-authored-by: Tailing Yuan <yuantailing@gmail.com>
2026-02-10 20:09:00 +08:00
chenfeiz0326
eac56b793e
[ https://nvbugs/5853720 ][fix] Disable cutedsl argmax kernel to fix perf regression ( #11403 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2026-02-10 18:10:38 +08:00
mpikulski
adc0d82500
[ https://nvbugs/5791242 ][chore] remove obsolete code ( #11388 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-10 10:55:29 +01:00
Yuxian Qiu
5f4df89109
[None][feat] Fully non-blocking pipeline parallelism executor loop. ( #10349 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-02-10 15:43:28 +08:00
shuyixiong
c3cdc93211
[TRTLLM-9771][feat] Make update_weights compatible with CUDA Graph ( #11267 )
...
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
2026-02-10 01:12:49 -05:00
Jonas Li
8b2dc57823
[None][chore] Mass merge commits from release/1.2.0rc6.post1 branch ( #11384 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Co-authored-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
2026-02-10 14:00:42 +08:00
Lucas Liebenwein
a2fb5afecf
[ #11032 ][feat] MLA revisited and GLM 4.7 Flash support ( #11324 )
2026-02-09 23:26:51 -05:00
Yuan Tong
4fc3644705
[None][fix] Avoid reserved filename on Windows ( #11382 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2026-02-10 11:22:59 +08:00
Yuxian Qiu
af68c29d3d
[None][chore] Reduce attention module repeated warnings. ( #11335 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-02-10 08:58:21 +08:00
Ziyi Xiong
e76b634251
[TRTLLM-10321][feat] Support different KV cache layout for one-model spec dec ( #10502 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2026-02-10 05:16:02 +08:00
Bala Marimuthu
4a743338c3
[None][infra] AutoDeploy: Dump graph IR after every transform ( #11045 )
...
Signed-off-by: Balamurugan Marimuthu <246387390+bmarimuthu-nv@users.noreply.github.com>
2026-02-09 10:43:44 -08:00
Lizhi Zhou
e719721a60
[TRTLLM-10866][feat] implement disaggregated harmony chat ( #11336 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-02-09 12:09:03 -05:00
Guiju Zhang
c37531c3f7
[TRTLLM-10669][fix] Fix Eagle3 draft model weight loading for throughput checkpoint ( #11010 )
...
Signed-off-by: Guiju Zhang <7135567+cascade812@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00