TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-04 02:02:01 +08:00

Author	SHA1	Message	Date
heyuhhh	e3f27e06c7	[None][chore] Waive star attention unittests (#10439 ) Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>	2026-01-16 10:12:32 +08:00
Yuxian Qiu	ef838cc852	[https://nvbugs/5701445 ][chore] isolate test. (#10444 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-01-16 10:04:12 +08:00
Lucas Liebenwein	49c6f73554	[None][bug] AutoDeploy: fix regression in kv cache resize memory estimation (#10726 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-16 09:52:03 +08:00
Iman Tabrizian	5ad8cf6d5e	[https://nvbugs/5738168 ][fix] unwaive test accuracy/test_disaggregated_serving.py::TestDeepSeekV32Exp::test_auto_dtype[False] (#10584 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2026-01-16 06:04:45 +08:00
Thor Johnsen	0998a7bf20	[https://nvbugs/5721661 ][fix] Prevent out-of-bounds read (#9879 ) Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>	2026-01-15 10:51:40 -06:00
heyuhhh	dfac07c045	[None][feat] Support to export data in trtllm-eval (#10075 ) Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>	2026-01-15 23:27:08 +08:00
forrestl	43b9db3364	[None][doc] doc updates (#10711 ) Signed-off-by: forrestl	2026-01-15 21:46:49 +08:00
Lizhi Zhou	93db0d5e18	[TRTLLM-9942][feat] new request states and kvcache transceiver APIs in generation-first disagg (#10406 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2026-01-15 19:18:21 +08:00
Jun Yang	3bc17e1aa3	[None][doc] doc updates (#10704 ) Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>	2026-01-15 19:05:26 +08:00
Lizhi Zhou	ff277b591e	[https://nvbugs/5791830 ][fix] fix pp loop hang caused by i-sending new requests (#10665 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2026-01-15 16:33:55 +08:00
yufeiwu-nv	cd55fb4551	[None][test] Remove NIM test (#10657 ) Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>	2026-01-15 16:30:47 +08:00
Pengbo Wang	683515b1bd	[None][feat] Use XQA JIT impl by default and mitigate perf loss with sliding window (#10335 ) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>	2026-01-15 15:47:00 +08:00
Perkz Zheng	71ccc07d2b	[None][feat] update trtllm-gen to support groupsTokensHeadsQ (#10261 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com> Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2026-01-15 02:24:25 -05:00
Ludwig Schneider	e12a7119cf	[https://nvbugs/5741392 ][fix] [chore] Remove test exemptions from waivers tile (#10517 ) Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>	2026-01-14 22:07:52 -08:00
Yiqing Yan	f4ace99218	[None][chore] Bump version to 1.3.0rc0 (#10681 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2026-01-15 13:55:44 +08:00
ruodil	22240e43eb	[None][test] store per user output and per gpu output metric in csv file (#10658 ) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>	2026-01-15 00:51:08 -05:00
Emma Qiao	7b3b6f1161	[None][infra] Waive failed tests on main 01/15 (#10683 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-01-15 13:40:37 +08:00
Anish Shanbhag	faa80e73fd	[None][feat] Auto download speculative models from HF for pytorch backend, add speculative_model field alias (#10099 ) Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>	2026-01-14 21:06:07 -08:00
Lucas Liebenwein	62050b2381	[None][infra] separate AutoDeploy tests into own stages (#10634 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-14 23:05:26 -05:00
Void	f7de285a82	[None][fix] add quantization check for DeepEP LL low precision combine in new moe comm api (#10072 ) Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>	2026-01-14 22:15:29 -05:00
TensorRT LLM	482b7b8837	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2026-01-15 03:10:09 +00:00
Lucas Liebenwein	15b43e8a14	[https://nvbugs/5777041 ][fix] fix AutoDeploy ep sharding test (#10460 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-14 21:53:56 -05:00
Dom Brown	94c7b69048	[https://nvbugs/5630196 ] [fix] Prevent flaky failures in C++ test_e2e.py by using local cached datasets for benchmarking (#10638 ) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2026-01-14 21:39:55 -05:00
Wanli Jiang	73d1840c12	[TRTLLM-10245][feat] Add accuracy tests for super v3 fp8 model (#10482 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2026-01-15 10:07:02 +08:00
dominicshanshan	0f2d61b8c6	[https://nvbugs/5766952 ][fix] Fix AIPerf issue. (#10666 ) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-15 09:54:34 +08:00
bhsueh_NV	5f9fc50233	[https://nvbugs/5800725 ][infra] Update waives.txt (#10625 )	2026-01-15 09:08:07 +08:00
彭晋韬(jtao peng)	211c44b951	[None][feat] Adding torch ext API for FusedAddRMSNormQuant kernel (#9905 ) Signed-off-by: jintaop <jintaop@nvidia.com>	2026-01-15 07:29:15 +08:00
TensorRT LLM	968db53194	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2026-01-14 22:18:53 +00:00
Tzu-Ling Kan	c99faaed06	[#9760 ][fix] Use RequestError for validation errors to prevent engine shutdown (#9761 ) Signed-off-by: tzulingk@nvidia.com <tzulingk@nvidia.com>	2026-01-14 10:22:36 -05:00
Emma Qiao	01083b56bf	[TRTLLM-9849][infra] Update dependencies to 25.12 (#9818 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Signed-off-by: Emma Qiao <qqiao@nvidia.com> Signed-off-by: xxi <xxi@nvidia.com> Signed-off-by: xxi <95731198+xxi-nv@users.noreply.github.com> Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com> Co-authored-by: xxi <xxi@nvidia.com> Co-authored-by: xxi <95731198+xxi-nv@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2026-01-14 21:54:04 +08:00
Emma Qiao	35c24424f6	[None][infra] Waive failed cases in post-merge on 01/14 (#10668 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-01-14 21:39:32 +08:00
HuiGao-NV	b10704428d	[https://nvbugs/5787566 ][fix] Only keep a limited number of performance statistic data (#10569 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2026-01-14 07:53:01 -05:00
Bo Li	582dec5bb5	[https://nvbugs/5774869 ][infra] Use 2 GPUs to test skip softmax attention on H100. (#10420 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2026-01-14 07:03:01 -05:00
shuyixiong	babd5ecacc	[https://nvbugs/5760740 ][fix] Enable ray tests (#10272 ) Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>	2026-01-14 19:25:46 +08:00
Kyungmin Lee	25148d3fee	[None][feat] Support new Transformers RoPE configuration format (#10636 ) Signed-off-by: lkm2835 <lkm2835@gmail.com>	2026-01-14 19:41:27 +09:00
xxi	e9817461ba	[None][chore] improve the readability of log for cutlass can only sup… (#10630 ) Signed-off-by: xxi <xxi@nvidia.com>	2026-01-14 05:33:45 -05:00
xxi	d8862505b9	[None][chore] enable EPLB for DEEPGEMM (#10617 ) Signed-off-by: xxi <xxi@nvidia.com>	2026-01-14 05:28:08 -05:00
xinhe-nv	272688c663	[None][fix] fix L0 issues (#10670 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2026-01-14 18:09:40 +08:00
jmydurant	e7882d5c74	[None][feat] MiniMax M2 support (#10532 ) Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>	2026-01-14 17:38:58 +08:00
mpikulski	052c36ddd2	[TRTLLM-9522][feat] support image_embeds in OpenAI API (#9715 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2026-01-14 10:31:03 +01:00
Bo Li	487287a412	[None][chore] Update test name MNNVL->NVLinkTwoSided. (#9672 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2026-01-14 04:29:57 -05:00
Zhenhuan Chen	287f6c2e0f	[None][test] add log_samples and output_path for trtllm_eval (#10629 ) Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>	2026-01-14 16:01:38 +08:00
QI JUN	c4da4fd462	[https://nvbugs/5637220 ][ci] unwaive TestQwen3_235B_A22B::test_nvfp4[latency_moe_trtllm_attention_dp] (#9870 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2026-01-14 15:41:14 +08:00
Yukun He	15281de799	[None][fix] Reduce host overhead for unified nvfp4 gemm tuning path. (#10503 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2026-01-14 14:26:18 +08:00
Yuxian Qiu	39cefd6125	[None][refactor] Unify the usage of MPIDist and TorchDist. (#10380 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-01-14 14:05:47 +08:00
xxi	f841b43cde	[None][chore] waive the CI failure (#10655 ) Signed-off-by: xxi <xxi@nvidia.com>	2026-01-14 13:59:15 +08:00
JennyLiu	92ae490410	[None][test] Spark - Change testlist name and perf yml format (#10626 ) Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com> Co-authored-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>	2026-01-13 23:07:11 -05:00
xinhe-nv	07d9390e9b	[None][test] add test into qa test list (#10627 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2026-01-13 22:43:00 -05:00
tburt-nv	b65c515314	[None][chore] update allowlist 2026-01-13 (#10645 ) Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>	2026-01-13 22:23:03 -05:00
TensorRT LLM	dd22324675	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2026-01-14 03:07:57 +00:00

4667 Commits