TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-28 22:56:13 +08:00

Author	SHA1	Message	Date
Patrice Castonguay	8a751a0e56	[None][chore] Remove is_disaggregated param in executor request queue (#9049 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-11-12 13:37:15 -05:00
Fanrong Li	780d4f9dc5	[None][feat] Add MTP>1 support for DS-v3.2 (#9045 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-11-12 09:56:12 -08:00
Neta Zmora	53491ffdb1	[#9023 ][feat] reduce AD graph optimization time for non-participating passes (#9024 ) Shorten AD graph optimization by 30% (measured on Nemotron-6). A bug in the transformation interface marked all passes as not clean, regardless of what the transformation reported. Fix how the optimization passes report the results of their actions: many passes reported that the graph is not clean even when they didn't participate in the optimization, and each graph cleaning invocation can take several seconds. Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>	2025-11-12 09:05:53 -08:00
Chang Liu	0b81173efa	[TRTLLM-9259][perf] Use torch.compile to fuse copy + layernorm within the LayerNorm module (#9052 ) Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>	2025-11-11 18:11:00 -08:00
Lucas Liebenwein	aca56097cb	[None][fix] AutoDeploy: update nano3 accuracy test (#9061 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-11-11 12:26:31 -08:00
QI JUN	524754b6fd	[TRTLLM-8521][chore] remove circular dependency between model engine and cuda graph runner (#7572 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-11-11 10:13:45 -08:00
Chenghao Zhang	ec9cf715a2	[None][feat] AutoDeploy: Perf improvement for mamba layers (#8991 ) Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com> Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com> Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>	2025-11-11 08:27:07 -08:00
Wanli Jiang	ebdd1cc8e0	[TRTLLM-8119][feat] Update doc/tests/chat_template for nano-v2-vlm (#8840 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-11-11 07:48:23 -08:00
mpikulski	20fd305bb6	[None][fix] type annotation (#9071 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-11-11 07:20:20 -08:00
mpikulski	b151de4a8f	[TRTLLM-8377][test] unit tests for TorchSampler batched sampling (#9012 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com> Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-11-11 07:16:42 -08:00
Guoming Zhang	b894dc2d70	[None][fix] Display the GPU memory information in GiB unit. (#9070 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-11-11 06:24:59 -08:00
mpikulski	979b3ae9ce	[TRTLLM-7723][feat] sampling using FlashInfer.sampling (#8581 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-11-11 03:21:19 -08:00
Yuxian Qiu	7aeac97e4e	[https://nvbugs/5622938 ][fix] Use async send_requests_to_next_pp. (#9041 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-11-11 14:19:44 +08:00
Lucas Liebenwein	6bf4e59267	[#8763 ][feature] AutoDeploy: configurable dtype for caching (#8812 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-11-10 22:17:14 -08:00
Chang Liu	7ceb5e5ab6	[TRTLLM-9198][perf] Add torch.compile + multi-stream support for k-cache scatter and weight scaling (#8988 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com> Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-11-11 12:33:30 +08:00
Frida Hou	f40e1f7496	[https://nvbugs/5625972 ][fix] Add context manager to fix FakeTensorProp (#9047 ) Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>	2025-11-10 16:25:58 -08:00
mpikulski	edc91ba819	[None][fix] Improve type annotations on ResourceManager.get_resource_manager (#9013 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-11-10 15:06:16 +01:00
ChristinaZ	2e7769d1e8	[None][feat] Add customized topk and related unit tests for DSA (#8882 ) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-11-10 03:35:35 -08:00
bhsueh_NV	e8d4a56dd0	[None][fix] fix eagle3 accuracy issue on sm120 (#8944 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-11-10 14:02:03 +08:00
Fanrong Li	a7033a9193	[TRTLLM-9001][feat] add TP support for DeepSeek-V3.2 (#8943 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-11-10 12:16:01 +08:00
Chang Liu	7081f254cf	[None][perf] Add custom indexer k cache scatter op (#8960 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-11-07 11:24:26 -08:00
Patrice Castonguay	d8ea0b967f	[None][fix] Moving transfer timeout test to test_llm_pytorch, fixing broken kv transfer timeout (#8892 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-11-07 07:33:51 -08:00
mpikulski	5ef65872a3	[None][fix] type annotations in fuse_input_embeds (#8976 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-11-07 09:04:08 +01:00
Stefan Niebler	326a201473	[https://nvbugs/5508536 ][fix] Take Over (#8627 ): Reintroduce: Move stop_criteria to sample_async (#7041 ) (#8794 ) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com> Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2025-11-07 09:01:15 +01:00
QI JUN	1c6e490894	[TRTLLM-9065][chore] remove PyTorchConfig completely (#8856 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-11-06 22:37:03 -08:00
Eran Geva	990e674b71	[None][fix] Switch AD AllReduce strategy to NCCL (#8979 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2025-11-07 06:49:44 +02:00
xiweny	ee20e679a9	[https://nvbugs/5636986 ][fix] Fix DeepGemmMoe get_buffer calls (#8939 ) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com> Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-11-06 19:57:19 -08:00
Cao Dong	b53961e972	[None][feat] Return logprobs incrementally in torch backend (#8785 ) Signed-off-by: Dong Cao <docao@nvidia.com>	2025-11-07 10:23:39 +08:00
Chang Liu	1c19fd6868	[https://nvbugspro.nvidia.com/bug/5637012 ][fix] Bugfix when config is None for MLA (#8978 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-11-07 09:37:19 +08:00
jthomson04	fcae852cef	[None][fix] Fix KV cache clearing with KV Connector API (#8750 ) Signed-off-by: jthomson04 <jwillthomson19@gmail.com>	2025-11-06 14:28:27 -08:00
Chenghao Zhang	1a78e7a3d6	[None][feat] AutoDeploy: Support Latent MOE for Nemotron (#8955 ) Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>	2025-11-06 12:40:19 -08:00
Chenghao Zhang	ddf2d010e2	[TRTLLM-8814][feat] AutoDeploy: Use TRTLLM kernels for FP8 linear (#8820 ) Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com> Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com> Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-11-06 11:00:10 -08:00
yunruis	51545560da	[TRTLLM-8803][feat] Add rope and uk-bgemm overlap for mla generation (#8495 ) Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>	2025-11-06 17:39:57 +08:00
JadoTu	6bbb43f2b9	[None][feat] Add qwen3-next nvfp4 support (#8526 ) Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>	2025-11-06 09:45:44 +08:00
Frida Hou	fb7f9831d3	[#8924 ][fix] Fix AutoDeploy pattern matcher for torch 2.9 (#8920 ) Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>	2025-11-05 13:29:20 -08:00
Lucas Liebenwein	b181568d6f	[TRTLLM-8201][feat] Nemotron H MoE Sharding (#8744 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com> Co-authored-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com> Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>	2025-11-05 12:35:29 -08:00
Chang Liu	e57d83c5dc	[TRTLLM-8768][chore] Fuse QK down_proj with indexer K + weight_proj for FP4 ckpt (#8771 )	2025-11-05 07:57:09 -08:00
Yukun He	b9e5315dfb	[https://nvbugs/5623960 ][fix] Fix the logger once key issue and further compress log in AutoTuner. (#8873 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-11-05 15:25:43 +08:00
Shiyu Li	eeb56c2848	[None][feat] MNNVLAllreduce Kernel Refactor (#8018 ) Signed-off-by: Shiyu Li <timlee0212@outlook.com> Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-11-05 08:49:47 +08:00
Frida Hou	11ded113cd	[#8389 ][fix] Update group attention matching to first map to custom torch attention (#8638 ) Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>	2025-11-04 12:00:43 -08:00
shuyixiong	70e4d72ffa	[TRTLLM-8511][feat] Add update_weights and sleep_wakeup support for rl integration (#8302 ) Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com> Co-authored-by: Liwei Ma <liweim@nvidia.com> Co-authored-by: Jonas Yang CN <joyang@nvidia.com>	2025-11-04 10:19:24 -08:00
Bo Li	e4bf29bc66	[None][feat] Integrate MnnvlThroughput into TRTLLM MoE. (#8728 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-11-04 21:36:29 +08:00
Cao Dong	dddfcdd3bf	[None][fix] Fix bug of undefined py_topk_logprobs_vals (#8789 ) Signed-off-by: Dong Cao <docao@nvidia.com>	2025-11-04 19:32:59 +08:00
CarstyYou	4296c9553d	[TRTLLM-1234][feat] Add fp8 blockscaled Gemm for sm120 (#8844 ) Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>	2025-11-04 18:10:36 +08:00
danielafrimi	2b58dba0f6	[https://nvbugs/5524714 ][fix] Fix TP sharding of fused-QKV weight scales in W4A16 AWQ (#8432 ) Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-11-04 16:42:31 +08:00
Patrice Castonguay	65c138108e	[https://nvbugs/5552889 ][fix] fix: Prevent empty batch when using attention DP with disagg (#8372 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-11-04 16:42:31 +08:00
xiweny	fcac2022e2	[https://nvbugs/5565565 ] [fix] fp8 wideep support sm103 (#8228 ) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-11-04 16:42:31 +08:00
Yechan Kim	67208f1512	[None][fix] InputProcessor config naming convention fix (#8705 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-11-03 22:29:21 -08:00
HuiGao-NV	97674c3114	[TRTLLM-8690][feat] add more tensors to share buffers (#8691 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-11-03 21:08:01 -08:00
Yan Chunwei	ed297d7c2e	[None][chore] Optimize perf for the RPC executor and add some profile utilities to llm-api (#8415 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-11-03 17:59:49 -08:00

1109 Commits