TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-05 18:51:38 +08:00

Author	SHA1	Message	Date
Taylor Yeonbok Lee	304dc6f3c0	[None][chore] Print memory usage before/after accuracy test in CI (#11155) Signed-off-by: Taylor Yeonbok Lee <249374542+taylor-yb-lee@users.noreply.github.com>	2026-02-03 00:23:14 -05:00
Yi Zhang	0306c0f12c	[TRTLLM-9766][feat] Integration of the KVCacheManager V2 to TRTLLM Runtime (#10659) Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>	2026-02-02 14:29:02 +08:00
Guoming Zhang	6bace84167	[TRTLLM-10398][feat] Enable TRTLLM moe backend for Nemotron Super (#10791) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2026-01-31 13:48:25 +08:00
JennyLiu	6506d63466	[None][test] Add DGX-Spark VLM gemm3-12b bfp16/fp4/fp8 accuracy and perf cases (#11096) Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com> Co-authored-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>	2026-01-30 00:38:19 -05:00
Chenghao Zhang	e033929221	[None][feat] AutoDeploy: Flashinfer kernels bringup (#10867) Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>	2026-01-29 14:59:29 -08:00
Mike Iovine	0ad87895f5	[https://nvbugs/5836592][fix] Fix qwen3 eagle test (#11030) Signed-off-by: Mike Iovine <miovine@nvidia.com>	2026-01-29 14:49:08 -08:00
Balaram Buddharaju	c7a86f89de	[TRTLLM-10264][feat] Support attention DP + Helix CP (#10477) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2026-01-29 02:57:13 -05:00
Anish Shanbhag	24ac86c485	[https://nvbugs/5761391][fix] Include triton-kernels as a packaged dependency (#10471) Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>	2026-01-28 19:56:32 -08:00
Grzegorz Kwasniewski	38bcee189c	[TRTLLM-10362][feat] Added Mamba and MLA layers to the sharding tests (#10364) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com> Signed-off-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>	2026-01-28 10:34:10 +01:00
Lucas Liebenwein	ff3a494f5c	[#10013][feat] AutoDeploy: native cache manager integration (#10635) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-27 11:23:22 -05:00
zhhuang-nv	ca9f70f78c	[https://nvbugs/5612438][fix] Add timeout for SeedOSS test (#8683) Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>	2026-01-27 15:22:21 +08:00
sunnyqgg	ff0dd6076e	[TRTLLM-10062][feat] Enable MTP for Nemotron Super (#10754) Signed-off-by: qgai <qgai@nvidia.com>	2026-01-26 11:23:26 -05:00
Lucas Liebenwein	00f341be49	[#8982][feat] AutoDeploy attention dp support (#10728) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-26 09:43:33 -05:00
Tian Zheng	5efee01da1	[None][feat] Add Skip Softmax MLA kernels for Blackwell and Fix an accuracy bug of NVFP4 KV (#10813) Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>	2026-01-26 16:46:33 +08:00
yingguo-trt	c8f1745a6e	[https://nvbugs/5661741][feat] Add 250K-token NVFP4 MoE + PDL regression tests (#10911) Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>	2026-01-26 01:48:29 -05:00
Ivy Zhang	bcd2dc490c	[None][test] Update case for release (#10811) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-25 18:12:21 +08:00
Ivy Zhang	4ebc1b1596	[None][test] Update test case for release (#10763) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-25 18:12:21 +08:00
Kaiyu Xie	da967d0bd7	[TRTLLM-10334] [feat] Support overlap scheduler for disagg ctx instances (#10755) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2026-01-23 22:29:37 -05:00
Taylor Yeonbok Lee	1fbbb1f3cd	[None][feat] AutoDeploy: Enhance memory consumption for MoE fusion transform (#10772) Signed-off-by: Taylor Yeonbok Lee <249374542+taylor-yb-lee@users.noreply.github.com>	2026-01-23 15:22:54 -08:00
Venky	b3146d095d	[TRTC-122][feat] Eagle3 Specdec UX improvements (#10124) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2026-01-22 07:24:11 -08:00
Bo Deng	a218cf02fd	[https://nvbugs/5768068][chore] improve disagg acc tests (#10833) Signed-off-by: Bo Deng <deemod@nvidia.com>	2026-01-22 09:45:35 -05:00
tcherckez-nvidia	128d4ac5be	[None][chore] NVFP4 MoE - Move weights transformation to fusion phase… (#10803) Signed-off-by: Tal Cherckez <tcherckez@nvl72070-T11.cm.cluster> Signed-off-by: Tal Cherckez <tcherckez@nvl72039-T03.cm.cluster> Signed-off-by: Tal Cherckez <tcherckez@nvl72098-T11.cm.cluster> Signed-off-by: tcherckez-nvidia <127761168+tcherckez-nvidia@users.noreply.github.com> Co-authored-by: Tal Cherckez <tcherckez@nvl72070-T11.cm.cluster> Co-authored-by: Tal Cherckez <tcherckez@nvl72039-T03.cm.cluster> Co-authored-by: Tal Cherckez <tcherckez@nvl72098-T11.cm.cluster>	2026-01-22 13:08:05 +02:00
Wanli Jiang	ff0775408d	[None][fix] Fix waived tests for Nemotron-h models (#10758) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2026-01-22 14:17:50 +08:00
JennyLiu	415739711f	[None][chore] Add DGX-Spark VLM accuracy and perf spec dec cases (#10804) Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com> Signed-off-by: JennyLiu <141791095+JennyLiu-nv@users.noreply.github.com> Co-authored-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>	2026-01-22 12:38:17 +08:00
Daniil	0434db5bf7	[None][feat] GLM-4.5-Air support (#10653) Signed-off-by: Daniil Kulko <kulkodaniil@gmail.com>	2026-01-22 11:42:09 +08:00
kris1025	f91ea37a13	[None][chore] unwaive qwen3 235B accuracy test (#10493) Signed-off-by: linquanh <linquanh@nvidia.com>	2026-01-21 17:52:04 +08:00
Gal Hubara-Agam	e61c942d1f	[#10707][fix] AutoDeploy: Super accuracy test fixes (#10717) Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com> Signed-off-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>	2026-01-20 18:16:13 +02:00
benzh-2025	4c8468c5d3	[None][fix] default disable gemm+allreduce fusion (#10656)	2026-01-20 12:31:17 +08:00
Shi Xiaowei	442d2e8a15	[None][test] adjust the dis-agg test timeout threshold (#10800) Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2026-01-19 17:02:00 +08:00
Chuang Zhu	4f04532ce7	[https://nvbugs/5769890][fix] enable system memory to transfer active message in NIXL ucx (#10602) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2026-01-19 09:20:12 +08:00
Lucas Liebenwein	b64052539d	[https://nvbugs/5769712][fix] fix timeout in AutoDeploy llama accuracy test (#10461) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-18 13:20:55 -05:00
Chenghao Zhang	0b748d5bba	[None][chore] update flashinfer to 0.6.0 (#10522) Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>	2026-01-16 16:22:06 -05:00
Chenghao Zhang	b6acd96616	[None][fix] AutoDeploy: Fix the nvfp4 fused_moe (#10727) Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>	2026-01-16 12:04:40 -08:00
Tian Zheng	cfebfbb505	[https://nvbugs/5783509][fix] Fix a hang issue when enabling skip softmax on Blackwell (#10490) Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>	2026-01-16 18:59:54 +08:00
Chuang Zhu	7e2cbc0756	[https://nvbugs/5598674][fix] enable partial reuse in gemma and gpt oss test (#10559) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2026-01-16 10:26:15 +08:00
Anish Shanbhag	faa80e73fd	[None][feat] Auto download speculative models from HF for pytorch backend, add speculative_model field alias (#10099) Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>	2026-01-14 21:06:07 -08:00
Wanli Jiang	73d1840c12	[TRTLLM-10245][feat] Add accuracy tests for super v3 fp8 model (#10482) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2026-01-15 10:07:02 +08:00
彭晋韬(jtao peng)	211c44b951	[None][feat] Adding torch ext API for FusedAddRMSNormQuant kernel (#9905) Signed-off-by: jintaop <jintaop@nvidia.com>	2026-01-15 07:29:15 +08:00
Bo Li	582dec5bb5	[https://nvbugs/5774869][infra] Use 2 GPUs to test skip softmax attention on H100. (#10420) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2026-01-14 07:03:01 -05:00
jmydurant	e7882d5c74	[None][feat] MiniMax M2 support (#10532) Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>	2026-01-14 17:38:58 +08:00
xinhe-nv	07d9390e9b	[None][test] add test into qa test list (#10627) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2026-01-13 22:43:00 -05:00
Balaram Buddharaju	ccdfa43a6e	[https://nvbugs/5791900][fix] Fix HelixCpMnnvlMemory init with PP (#10533) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2026-01-13 15:48:42 -05:00
Guoming Zhang	bdaee87895	[TRTLLM-10060][feat] Enable attention dp for Nemotron Super v3. (#10347) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2026-01-13 17:13:55 +08:00
Suyog Gupta	a1385243e1	[#10580][fix] re-enable NemotronH MOE MMLU test (#10594) Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>	2026-01-12 09:26:07 -08:00
Wanli Jiang	11da7e3605	[None][fix] Solve pillow version conflict (#10537) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2026-01-12 04:05:54 -05:00
William Zhang	ff7eb93f31	[https://nvbugs/5669097][tests] Add MMMU test for mistral small (#10530) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2026-01-09 16:09:28 -08:00
Yechan Kim	7295af68ba	[None][fix] Enable AttentionDP on Qwen3-VL and fix test (#10435) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2026-01-10 00:13:26 +09:00
Jie Li	627d306df9	[None][chore] remove some model support; add device constraint (#10563) Signed-off-by: Jie Li <lijie@nvidia.com>	2026-01-09 09:36:23 -05:00
JadoTu	4c498bfe58	[TRTLLM-9676][fix] Fix mamba_cache_manager when enabling cuda_graph_padding and let test cover this case (#9873) Signed-off-by: JadoTu <107457950+JadoTu@users.noreply.github.com>	2026-01-09 14:50:16 +08:00
bhsueh_NV	bea61bb17d	[None][fix] Mistral large 3 few code refine (#10405) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2026-01-08 06:38:49 -05:00

503 Commits