TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-05 02:31:33 +08:00

Author	SHA1	Message	Date
jthomson04	cf88da7eca	[None][feat] KV Connector Support for MTP (#10932 ) Signed-off-by: jthomson04 <jwillthomson19@gmail.com> Co-authored-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2026-01-23 18:58:26 -05:00
Taylor Yeonbok Lee	1fbbb1f3cd	[None][feat] AutoDeploy: Enhance memory consumption for MoE fusion transform (#10772 ) Signed-off-by: Taylor Yeonbok Lee <249374542+taylor-yb-lee@users.noreply.github.com>	2026-01-23 15:22:54 -08:00
Jin Li	b560598c79	[https://nvbugs/5707359 ][fix] Unwaive the test that due to flashinfer… (#10570 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2026-01-23 13:09:04 -05:00
yuanjingx87	f4b52d3b78	[None][infra] Regenerate out dated lock file (#10940 ) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>	2026-01-23 09:21:03 -08:00
Yihan Wang	1d68fab49c	[https://nvbugs/5814215 ][fix] Unwaive test_trtllm_flashinfer_symbol_collision.py::test_flashinfer_fused_moe_matches_torch_moe (#10930 ) Signed-off-by: Yihan Wang <yihwang@nvidia.com>	2026-01-24 01:09:18 +08:00
Yan Chunwei	54768f3f2c	[None][chore] refine placement group in ray executor (#10235 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2026-01-23 19:31:20 +08:00
Yihan Wang	43f2b51e94	[https://nvbugs/5833795 ][chore] Waive test test_e2e.py::test_ptp_quickstart_advanced[GPT-OSS-120B-gpt_oss/gpt-oss-120b] (#10953 ) Signed-off-by: Yihan Wang <yihwang@nvidia.com>	2026-01-23 06:04:57 -05:00
Emma Qiao	ae114ec7cf	[None][infra] Waive a failed case in pre-merge stage (#10948 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-01-23 04:40:17 -05:00
zackyoray	51c7a06da6	[None][feat] Upgrade NIXL to v0.9.0 (#10896 ) Signed-off-by: Yoray Zack <62789610+zackyoray@users.noreply.github.com>	2026-01-23 15:58:53 +08:00
Stanley Sun	0f7192c7fe	[None][test] Remove unused test list (#10916 ) Signed-off-by: Stanley Sun <stsun@nvidia.com>	2026-01-23 10:24:06 +08:00
Leslie Fang	31d04dfa12	[TRTLLM-9108][feat] Add test configurable moe module multi gpu (#10699 ) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2026-01-23 10:16:58 +08:00
yuanjingx87	ea928f62af	[None][infra] Update CI allowlist (#10936 ) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>	2026-01-22 14:22:27 -08:00
Lucas Liebenwein	d793bd973d	[https://nvbugs/5688721 ][fix] unwaive NemotronH accuracy test (#10852 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-22 16:23:28 -05:00
William Zhang	2146c23786	[#9306 ][refactor] Refactor AutoDeployConfig into LlmArgs (#10613 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2026-01-22 16:02:49 -05:00
Grzegorz Kwasniewski	d8e6e22060	[https://nvbugs/5819002 ][fix] fix sharding tests (#10775 ) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>	2026-01-22 20:02:48 +01:00
Yi Zhang	d43be7b65e	[None][fix] Avoid Double update for previous batch (#9888 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2026-01-22 13:15:06 -05:00
Shi Xiaowei	944c304bbb	[TRTLLM-9527][feat] Python transceiver components (step 2) (#10494 ) Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2026-01-22 10:14:50 -08:00
Shi Xiaowei	9adef4eb28	[TRTLLM-9527][doc] Add NIXL as a Python attribution (step 4) (#10910 ) Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2026-01-22 10:09:55 -08:00
Venky	b3146d095d	[TRTC-122][feat] Eagle3 Specdec UX improvements (#10124 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2026-01-22 07:24:11 -08:00
Yan Chunwei	30ffa58b54	[https://nvbugs/5783876 ][fix] fix hmac launch (#10434 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2026-01-22 23:20:53 +08:00
Bo Deng	a218cf02fd	[https://nvbugs/5768068 ][chore] improve disagg acc tests (#10833 ) Signed-off-by: Bo Deng <deemod@nvidia.com>	2026-01-22 09:45:35 -05:00
Pengyun Lin	5e34112b27	[TRTLLM-10388][feat] Support logprobs for Completions API (#10809 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2026-01-22 21:25:24 +08:00
彭晋韬(jtao peng)	9beb971827	[None][fix] Update RMSNorm custom op plumbing (#10843 ) Signed-off-by: jintaop <jintaop@nvidia.com>	2026-01-22 21:03:22 +08:00
Jiayu Chang	1dc49b266e	[https://nvbugs/5322131 ][feat] Multi-LoRA serving with CUDA Graph (#8279 ) Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>	2026-01-22 14:01:18 +01:00
Yihan Wang	cdb9ffd0ab	[https://nvbugs/5741304 ][chore] Update flashinfer-python to 0.6.1 (#10872 ) Signed-off-by: Yihan Wang	2026-01-22 19:29:16 +08:00
tcherckez-nvidia	128d4ac5be	[None][chore] NVFP4 MoE - Move weights transformation to fusion phase… (#10803 ) Signed-off-by: Tal Cherckez <tcherckez@nvl72070-T11.cm.cluster> Signed-off-by: Tal Cherckez <tcherckez@nvl72039-T03.cm.cluster> Signed-off-by: Tal Cherckez <tcherckez@nvl72098-T11.cm.cluster> Signed-off-by: tcherckez-nvidia <127761168+tcherckez-nvidia@users.noreply.github.com> Co-authored-by: Tal Cherckez <tcherckez@nvl72070-T11.cm.cluster> Co-authored-by: Tal Cherckez <tcherckez@nvl72039-T03.cm.cluster> Co-authored-by: Tal Cherckez <tcherckez@nvl72098-T11.cm.cluster>	2026-01-22 13:08:05 +02:00
Yiqing Yan	0243abee22	[None][chore] Bump version to 1.3.0rc1 (#10923 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2026-01-22 18:45:40 +08:00
Enwei Zhu	0b3092e144	[None][ci] Fix test list llm_spark_func.txt (#10921 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2026-01-22 04:23:03 -05:00
tcherckez-nvidia	6e72aff866	[#10838 ][fix] Add missing dist strategy param. fix typo for ad_logger… (#10892 ) Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>	2026-01-22 10:38:31 +02:00
Bo Li	9ce0511d86	[https://nvbugs/5811159 ][fix] Unwaive bug 5811159. (#10903 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2026-01-22 16:28:11 +08:00
Pengbo Wang	9462d90ec7	[None][feat] Add KV cache cleanup (#7439 ) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>	2026-01-22 15:14:17 +08:00
shuyixiong	fd2af8d58a	[TRTLLM-9771][feat] Support partial update weight for fp8 (#10456 ) Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com> Signed-off-by: shuyixiong <219646547+shuyixiong@users.noreply.github.com>	2026-01-22 14:46:05 +08:00
Wanli Jiang	ff0775408d	[None][fix] Fix waived tests for Nemotron-h models (#10758 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2026-01-22 14:17:50 +08:00
Enwei Zhu	be4a431ffd	[TRTLLM-10154][feat] Enable guided decoding with reasoning parsers (#10890 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2026-01-22 14:14:28 +08:00
Taylor Yeonbok Lee	895bb94b3d	[#8241 ][feat] Support model_kwargs for pytorch backend (#10351 ) Signed-off-by: Taylor Yeonbok Lee <249374542+taylor-yb-lee@users.noreply.github.com>	2026-01-21 20:51:38 -08:00
Yechan Kim	70caa779a4	[None][feat] K-EXAONE MTP support (#10796 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2026-01-22 13:43:00 +09:00
JennyLiu	415739711f	[None][chore] Add DGX-Spark VLM accuracy and perf spec dec cases (#10804 ) Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com> Signed-off-by: JennyLiu <141791095+JennyLiu-nv@users.noreply.github.com> Co-authored-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>	2026-01-22 12:38:17 +08:00
Lizhi Zhou	f3a41c8d94	[TRTLLM-10059][feat] Use global unique id as disagg request id (#10187 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2026-01-21 22:52:34 -05:00
Daniil	0434db5bf7	[None][feat] GLM-4.5-Air support (#10653 ) Signed-off-by: Daniil Kulko <kulkodaniil@gmail.com>	2026-01-22 11:42:09 +08:00
TensorRT LLM	bd56b4e1e3	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2026-01-22 03:24:57 +00:00
Yuxian Qiu	c2a9e66dff	[https://nvbugs/5784543 ][chore] unwaive test. (#10835 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-01-22 11:17:28 +08:00
dongxuy04	635cbf01ba	[https://nvbugs/5816267 ][fix] Remove weight tensor holder to release memory earlier (#10876 ) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2026-01-21 16:42:52 -08:00
yuanjingx87	5450485bec	[None][infra] Fix sonarQube job hang by create jenkins homd folder if not exist (#10830 ) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>	2026-01-21 11:45:19 -08:00
Guiju Zhang	8cf8fbbe16	[TRTLLM-10325][feat] Refactor speculative decoding workers (#10768 ) Signed-off-by: Guiju Zhang <7135567+cascade812@users.noreply.github.com>	2026-01-21 13:05:29 -05:00
kris1025	f91ea37a13	[None][chore] unwaive qwen3 235B accuracy test (#10493 ) Signed-off-by: linquanh <linquanh@nvidia.com>	2026-01-21 17:52:04 +08:00
Yukun He	bf7303c7f1	[https://nvbugs/5636916 ][fix] Cherry-pick #10654 : Fix accuracy issue of TWO-SHOT AllReduce kernel (#10841 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2026-01-21 17:25:40 +08:00
Emma Qiao	165dd360b9	[None][infra] Waive failed cases for main branch on 01/21 (#10882 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-01-21 04:24:05 -05:00
xxi	9feebb3a27	[None][chore] switch to ConfigurableMoE as the default path (#10792 ) Signed-off-by: xxi <xxi@nvidia.com>	2026-01-21 15:57:38 +08:00
Yukun He	a4152c80f6	[https://nvbugs/5814253 ][fix] unwaive test_autotuner_distributed_strategy tests (#10793 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2026-01-21 15:37:11 +08:00
HuiGao-NV	1592dfab6d	[https://nvbugs/5740377 ][fix] Lock resource to fix potential access to released data (#10827 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2026-01-21 14:17:29 +08:00

4797 Commits