TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-13 22:18:36 +08:00

Author	SHA1	Message	Date
chenfeiz0326	a65b0d4efa	[None][fix] Decrease Pre Merge Perf Tests (#10390) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2026-01-04 12:21:34 -05:00
Yanchao Lu	c4f27fa4c0	[None][ci] Some tweaks for the CI pipeline (#10359) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2026-01-04 11:10:47 -05:00
dongfengy	afc533193d	[None][feat] Support nvfp4 for gptoss (#8956) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>	2026-01-04 08:57:44 -05:00
Jaedeok Kim	a4dcc6a711	[TRTLLM-10171][fix] Correct attention handling in ModelConfig and KVCacheManager (#10330) Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com>	2026-01-04 06:07:30 -05:00
Yuxian Qiu	6ba04eba06	[https://nvbugs/5748683][fix] Use get_free_port_in_ci to avoid port conflict. (#10392) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-01-04 19:04:58 +08:00
TensorRT LLM	71b4a8aa60	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2026-01-04 03:08:01 +00:00
yuanjingx87	5bd37ce41e	[None][infra] add retry logic to get slurm sbatch job log when ssh dropped (#9167) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>	2026-01-04 10:11:37 +08:00
Grzegorz Kwasniewski	0d1f5ad7a2	[TRTLLM-10358][feat] Added proper rescaling of FP4 weights (#10378) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>	2026-01-03 16:26:16 -05:00
Yanchao Lu	c0b3c2b919	[None][ci] Remove an invalid test waive Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2026-01-03 23:34:13 +08:00
Ludwig Schneider	59045a0e41	[None][fix] [fix] Make NCCL resource manager destructor exception-safe (#10166) Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>	2026-01-03 10:25:05 -05:00
Emma Qiao	865992b86b	[None][infra] Waive failed cases on 1/3 (#10391) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-01-03 05:54:09 -05:00
Bo Deng	9e7b50aefb	[TRTLLM-9752][fix] WAR: Disable PDL for quant kernels to fix accuracy issues (#10285) Signed-off-by: Bo Deng <deemod@nvidia.com>	2026-01-03 14:34:55 +08:00
TensorRT LLM	45ffbf1f21	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2026-01-03 03:07:50 +00:00
Lucas Liebenwein	937f8f78a1	[None][doc] promote AutoDeploy to beta feature in docs (#10372) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-02 18:46:31 -05:00
Izzy Putterman	bdf6953ddc	[None][feat] Eagle: MLA Based Eagle (#9677) Signed-off-by: Izzy Putterman <iputterman@nvidia.com>	2026-01-02 13:45:07 -05:00
Gal Hubara-Agam	f3dd6da080	[#10056][chore] AutoDeploy: Enable Nemo SuperV3 accuracy test (#10308) Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>	2026-01-02 11:20:19 +02:00
chenfeiz0326	5e0e48144f	[None][fix] Minor updates on Perf Test System (#10375) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2026-01-02 17:17:42 +08:00
TensorRT LLM	098251648d	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2026-01-02 03:11:08 +00:00
fredricz-20070104	f631b25c85	[None][test] Unified slurm extra args management and session collection logic (#10332) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com> Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com> Co-authored-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>	2026-01-01 21:10:51 -05:00
Balaram Buddharaju	4a1b742aa0	[TRTLLM-9467][fix] Fix PP+CP combination with helix parallelism (#10312) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2026-01-01 13:42:53 -05:00
Gal Hubara-Agam	5845951538	[#10056][fix] AutoDeploy: Handle deletion of nested params in sharding (#10376) Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>	2026-01-01 08:11:11 -05:00
tcherckez-nvidia	4868772ad7	[None][feat] Add export data to build and run script for AD (#10299) Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>	2026-01-01 04:54:47 -05:00
Balaram Buddharaju	9f5b750a93	[None][chore] Waive tests blocking pre-merge 12/31 (#10373) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2026-01-01 03:00:24 -05:00
Balaram Buddharaju	0b75340223	[https://nvbugs/5744427][fix] Make Gemma3 multimodal test fp8 (#10368) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2026-01-01 01:11:34 -05:00
TensorRT LLM	edbcff0257	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2026-01-01 03:08:31 +00:00
Yuxian Qiu	ff836d4f41	[https://nvbugs/5740359][chore] Unwaive tests. (#10260) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-01-01 09:53:34 +08:00
Lucas Liebenwein	1bbe71b3ed	[#10244][feat] AutoDeploy: separate prefill/decode in flashinfer (#10252) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-12-31 17:01:24 -05:00
Mike Iovine	9085021aa4	[None][feat] Implement sampling for MTP 1-model (#10019) Signed-off-by: Mike Iovine <miovine@nvidia.com>	2025-12-31 13:48:34 -05:00
Simeng Liu	84d107b2f0	[https://nvbugs/5717993][fix] Add execution_stream across PyExecutor, KVCacheManager, PeftCacheManager to ensure proper CUDA stream synchronization between KV cache transfer operations and model forward kernels. (#10060) Signed-off-by: SimengLiu-nv <simengl@nvidia.com>	2025-12-31 09:22:54 -08:00
xinhe-nv	0d2e2718ce	[None][chore] Add failed cases into waives.txt (#10354) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-12-31 09:30:22 -05:00
chenfeiz0326	a23c6f1092	[TRTLLM-9834][feat] Transfer to TRTLLM-INFRA Database and Fail post-merge tests if regression (#10282) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-12-31 21:44:59 +08:00
tcherckez-nvidia	464847c6be	[#9717][chore] Standardize MoE weights interface (#10295) Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>	2025-12-31 07:37:18 -05:00
Jin Li	ef1d4a40b5	[https://nvbugs/5727475][fix] Avoid use property with setter in nn.Mo… (#10212) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-12-31 06:21:36 -05:00
Emma Qiao	d944430f96	[None][infra] Waive failed cases on 12/31 (#10353) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-12-31 17:39:49 +08:00
Necofish	73870ae4ad	[None][feat] support Qwen3-VL dense model in pytorch backend (#9060) Signed-off-by: Nekofish-L <liuxiangyang@mail.ustc.edu.cn>	2025-12-31 17:54:26 +09:00
xinhe-nv	827d12caaf	[https://nvbugs/5558516][test] add disaggregated stress test (#9354) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-12-31 16:47:36 +08:00
Yuxian Qiu	910a633066	[https://nvbugs/5774869][chore] waive tests. (#10356) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-12-31 03:00:52 -05:00
Yiqing Yan	fdc03684cc	[TRTLLM-10016][infra] Use SlurmPatition attribute time as timeout threshold (#10254) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-12-31 15:02:24 +08:00
Pengyun Lin	fad000589d	[None][chore] Unify DS tool parser names (#10239) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-12-31 14:40:07 +08:00
xinhe-nv	1e9c153b4c	[None][fix] disable thread leak check for kimi (#10337) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-12-31 01:31:37 -05:00
xinhe-nv	6c1abf2d45	[None][chore] Add failed cases into waives.txt (#10344) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-12-31 00:11:54 -05:00
TensorRT LLM	ed3a3097a4	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2025-12-31 03:11:56 +00:00
Jin Li	34c2fd50a9	[https://nvbugs/5707359][fix] Unwaive OOM case that should be fixed by #9446 (#10334) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-12-31 10:41:39 +08:00
Yuxian Qiu	1f3afb8e6f	[None][feat] Implement send_object for TorchDist. (#10213) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-12-31 10:40:52 +08:00
Yuxian Qiu	ec8a388c25	[https://nvbugs/5769890][fix] Import get_free_port. (#10341) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-12-31 09:47:27 +08:00
Eran Geva	74832a1895	[https://nvbugs/5766986][fix] fixed the shard_all_unprocessed default value to align with the default.yml (#10271) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2025-12-30 08:54:13 -05:00
Bo Li	1f0365da36	[None][infra] Add LongBenchV1 to trtllm-eval. (#10265) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2025-12-30 21:39:34 +08:00
Emma Qiao	6732c76414	[None][infra] Waive failed cases for main on 12/30 (#10338) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-30 05:17:43 -05:00
Emma Qiao	fb05cd769a	[None][infra] Enable single-gpu CI on spark (#9304) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Emma Qiao <qqiao@nvidia.com> Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-12-30 17:22:14 +08:00
Emma Qiao	cce7247815	[https://nvbugs/5594703][infra] Unwaive the failed case to test (#10275) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-12-30 16:38:54 +08:00

4454 Commits