TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-18 08:45:05 +08:00

Author	SHA1	Message	Date
JunyiXu-nv	af899d2fe7	[TRTLLM-9860][doc] Add docs and examples for Responses API (#9946 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-14 21:46:13 -08:00
Ziyi Xiong	f2aee0db03	[TRTLLM-9854][feat] Optimize the host overhead of _sample_async (#9935 ) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-12-15 13:28:54 +08:00
Fanrong Li	8f144d9282	[TRTLLM-9416][feat] Skip DS-v3.2 indexer MQA and Top-K for short sequences. (#9524 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-12-15 12:42:25 +08:00
xxi	f5696df285	[TRTLLM-8961][feat] ConfigurableMoE support DeepGemm (#9858 )	2025-12-15 10:47:15 +08:00
dominicshanshan	4bf42f8fa8	[https://nvbugs/5580297 ][fix] Skip capture request error test from Ray stage (#9947 ) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-12-15 10:03:16 +08:00
Simeng Liu	f21e2b3329	[TRTLLM-9601][feat] Expose mmKeys for multimodal to integrate with dynamo. (#9604 ) Signed-off-by: SimengLiu-nv <simengl@nvidia.com>	2025-12-15 08:42:30 +08:00
nvxuanyuc	a5a37227d6	[None][feat] Fused kernels (qknormrope + moe routing) and two-model MTP support for glm4moe (#9852 ) Signed-off-by: Xuanyu Chen <xuanyuc@nvidia.com>	2025-12-14 10:47:24 +08:00
Yan Chunwei	85406f9dda	[https://nvbugs/5720482 ][fix] Fix test rpc streaming (#9902 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-12-13 01:14:43 -08:00
shuyixiong	8cbf2d958c	[TRTLLM-9738][chore] Guard accuracy with nccl allreduce strategy (#9793 ) Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>	2025-12-13 01:02:11 -08:00
Balaram Buddharaju	461446045e	[TRTLLM-9493][feat] Add helixPostProcessNative kernel for cp_dim=2 (#9924 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-12 16:49:25 -08:00
Chuang Zhu	4cc4cbe926	[https://nvbugs/5716787 ][fix] terminate nixl running when exiting (#9785 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> Co-authored-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-12-12 11:15:02 -05:00
JunyiXu-nv	2fec53dfa5	[TRTLLM-9637][feat] Support tool parser for Kimi K2 (#9830 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-12 23:32:39 +08:00
Yihan Wang	9df4dad3b6	[None][fix] Introduce inline namespace to avoid symbol collision (#9541 ) Signed-off-by: Yihan Wang <yihwang@nvidia.com>	2025-12-12 23:32:15 +08:00
Balaram Buddharaju	af315d8ef1	[TRTLLM-5972][chore] Load balance decode token KV cache with helix parallelism (#9757 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-12 22:29:05 +08:00
Lucas Liebenwein	e767fc649a	[None][feat] AutoDeploy: prepare_metadata revisited (#9764 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-12-12 20:14:14 +08:00
Venky	fd1270b9ab	[TRTC-43] [feat] Add config db and docs (#9420 ) Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com> Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com> Co-authored-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>	2025-12-12 04:00:03 +08:00
Simeng Liu	24f92721f2	[https://nvbugs/5597647 ][ci] Unwaive fixed tests. (#9812 ) Signed-off-by: SimengLiu-nv <simengl@nvidia.com>	2025-12-12 02:29:30 +08:00
Erin	89dabf5aa1	[TRTLLM-9736][feat] AsyncLLM and verl integ (#9353 ) Signed-off-by: Liwei Ma <liweim@nvidia.com> Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Co-authored-by: Liwei Ma <liweim@nvidia.com> Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Co-authored-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-12-11 09:33:25 -08:00
JadoTu	02edb19f43	[None] [feat] add eos_token_id in generation_config to sampling params (#9514 ) Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>	2025-12-12 00:52:03 +08:00
xxi	488d38f88d	[TRTLLM-8959][feat] ConfigurableMoE support CUTLASS (#9772 )	2025-12-12 00:22:13 +08:00
Yan Chunwei	04a39a4e2b	[None][chore] enable test_ipc.py (#9865 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-12-11 17:47:14 +08:00
Zongfei Jing	c76b428e2e	[TRTLLM-9685] [feat] Add gather fc1 kernel by cuteDSL (#9618 ) Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>	2025-12-11 16:21:32 +08:00
JunyiXu-nv	454e7e59e5	[https://nvbugs/5718004 ][fix] Add warmup for cancellation test (#9860 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-11 12:20:33 +08:00
Yukun He	072f236002	[None][fix] Fully resolve the tactic recovery issues in AutoTuner serialized cache (#9835 ) Restrict tactic types to those compatible with AutoTuner cache serialization and deserialization. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-12-10 20:41:04 +08:00
zhanghaotong	36c9e7cfe6	[None][chore] Add unittest for otlp tracing (#8716 ) Signed-off-by: zhanghaotong <zhanghaotong.zht@antgroup.com> Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-12-09 18:34:08 -08:00
dhansen-nvidia	2d33ae94d5	[https://nvbugs/5508301 ][feat] Move D->H copies to a worker thread whe… (#8463 ) Signed-off-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com> Signed-off-by: dhansen-nvidia <218031328+dhansen-nvidia@users.noreply.github.com> Co-authored-by: Dan Hansen <1+dhansen-nvidia@users.noreply.github.com>	2025-12-09 18:51:31 -05:00
Mike Iovine	07c76a5fac	[None][feat] Make 2-model spec dec use the 1-model kernels (Hopper) (#8810 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-12-09 11:06:31 -05:00
Dom Brown	3156f2e852	[https://nvbugs/5575841 ] [fix] Nvbug 5575841: Remove additional test waivers for TestMoEFP4 (#9788 ) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-12-09 13:37:55 +00:00
Stefan Niebler	d600b9f851	[TRTLLM-6756][feat] Update BeamSearch for TorchSampler (#9660 ) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>	2025-12-09 10:44:01 +01:00
Robin Kobus	76f49c903b	[None][fix] Additional model outputs for pipeline parallelism (#9794 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-12-09 10:41:22 +01:00
JunyiXu-nv	f521f6d910	[None][fix] Fix unterminated process issue for RemoteOpenAIServer (#9490 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-12-09 11:15:40 +08:00
Jiagan Cheng	4a3a66b124	[https://nvbugs/5677746 ][fix] Use first PP rank's schedule result in other PP ranks to fix PP hang (#9659 ) Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>	2025-12-08 18:43:52 -08:00
Chenghao Zhang	75f5446d67	[#9753 ][feat] AutoDeploy: Implement add rms_norm fusion (#9754 ) Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>	2025-12-08 14:24:27 -08:00
Eran Geva	23cf72b0f8	[#8921 ][feat] Added symetric memory AllReduce strategy (#8919 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2025-12-08 13:12:56 -08:00
Yibin Li	faabc1a387	[TRTLLM-7967][chore] Add more tests (#9415 ) Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>	2025-12-08 11:57:32 -08:00
Frank	f6df9eb2a6	[TRTLLM-9089][chore] Port prepare_dataset into trtllm-bench (#9250 )	2025-12-08 10:37:40 -08:00
Li Min	a422d70be6	[None][chore] Enable tvm_ffi for cute dsl nvfp4_gemm to reduce host overhead. (#9690 ) Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>	2025-12-08 13:28:11 +08:00
xxi	8e27ce7084	[TRTLLM-9603][feat] Enable ConfigurableMoE test in the CI (#9645 )	2025-12-08 10:19:40 +08:00
Ludwig Schneider	41ce14ab04	[None][feat] Enable NCCL_SYMMETRIC as default fallback for AllReduce (#9314 ) Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>	2025-12-07 09:43:26 -08:00
Yan Chunwei	e4c707845f	[None][fix] enable hmac in RPC (#9745 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-12-07 08:24:46 +08:00
Jonas Li	2645a78f34	[TRTLLM-9660][feat] Convert cuteDSL GEMM to opt-in feature (#9682 ) Signed-off-by: Jonas Li <6110159+longlee0622@users.noreply.github.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-12-06 02:24:51 -08:00
Enwei Zhu	7cd5a67e25	[TRTLLM-9372][feat] Enable CuteDSL MoE with Large EP (#9592 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-12-05 22:08:52 -08:00
Robin Kobus	eb0b426e5d	[None][refactor] Improve request processing function in sampler (#9671 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-12-05 16:41:49 +01:00
Robin Kobus	faf682b8bc	[TRTLLM-7136][feat] Update load_weights method to include mapping parameter in checkpoint loaders (#9583 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-12-05 16:07:20 +01:00
gramnarayan	74df9b180b	[#9602 ][feat] AutoDeploy: Support TRTLLM Sampler (#9641 ) Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>	2025-12-04 19:24:11 -08:00
Lizhi Zhou	0d0a16fff4	[TRTLLM-8920][feat] decouple disagg service from fastapi (#8714 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-12-05 10:44:16 +08:00
Anthony Chang	60cdca3740	[None][fix] Recover TRTLLM MoE Perf for DEP (#9562 ) Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>	2025-12-04 22:10:25 +08:00
Jin Li	e5d4305c04	[https://nvbugs/5467531 ][fix] Unwaive fused_moe all to all test with … (#9617 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-12-04 18:17:24 +08:00
Yan Chunwei	05058f5e2a	[None][ci] unwaive tests (#9651 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-12-04 15:06:07 +08:00
tcherckez-nvidia	f9aa86dbdd	[#8733 ][feat] Add Llama4 MoE handling to AutoDeploy (#9556 ) Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com> Signed-off-by: tcherckez-nvidia <127761168+tcherckez-nvidia@users.noreply.github.com> Co-authored-by: Neta Zmora <nzmora@nvidia.com>	2025-12-04 08:03:33 +02:00

1 2 3 4 5 ...

1020 Commits