TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-13 22:18:36 +08:00

Author	SHA1	Message	Date
TensorRT LLM	9bab773cb5	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2025-12-13 00:25:40 +00:00
Yanchao Lu	48b7b5d8b7	[None][chore] Upgrade starlette and FastAPI (#9319) (#9904) Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com> Co-authored-by: tburt-nv <195370667+tburt-nv@users.noreply.github.com>	2025-12-11 23:27:19 +08:00
QI JUN	6624cc293a	[None][doc] remove nano-vl-v2 model support in release notes (#9887) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-12-10 18:09:54 -08:00
QI JUN	67ffa90d62	[https://nvbugs/5729847][doc] fix broken links to modelopt (#9868) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-12-10 02:57:11 -08:00
Zhanrui Sun	8550abf142	[TRTLLM-9811][infra] Update urllib3 version >= 2.6.0 to fix high vulnerability issue (#9824) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-12-10 11:20:17 +08:00
QI JUN	df8d2310c8	[None][doc] Update release notes (#9739) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com> Co-authored-by: Laikh Tewari <laikhtewari1@gmail.com>	2025-12-09 18:46:55 -08:00
Zac Patel	cfaa13a98a	[IB-1920][doc] Update Perf_Overview.md with Benchmarking Results for Release 1.1 (#9723) Signed-off-by: Zachary Patel <22306219+zbpatel@users.noreply.github.com>	2025-12-09 15:36:13 -08:00
TensorRT LLM	16a7d18c4e	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2025-12-08 22:09:55 +00:00
ChristinaZ	9aa479599e	[https://nvbugs/5537738][fix] Add fp8 post-quant allgather support to release 1.1 (#8322) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-12-06 09:57:37 +08:00
xiweny	9a421d0a00	[https://nvbugs/5503138] [fix] Remove compile warnings (#9733) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-12-05 10:29:20 -08:00
Zhanrui Sun	6f5e8a3576	[TRTLLM-9124][infra] Modify the requirement of tensorrt from 10.13.0 to 10.13.3 (#9128) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-12-05 00:06:35 -08:00
xiweny	82e5a4cad8	[TRTLLM-4629][doc] Add B300 & GB300 in documents (#9663) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-12-03 20:06:50 +08:00
ruodil	0a6fd1de1b	[https://nvbugs/5652552][fix] cherry-pick add printing for llm args (#9206) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>	2025-12-03 16:49:06 +08:00
Iman Tabrizian	ac5aa63b11	[TRTLLM-9082][doc] Address Dynamo Example feedback (#9619) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-12-02 17:25:55 +08:00
yuanjingx87	cc124a9b92	[None][infra] add attribution files for release/1.1 (#9495) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>	2025-12-02 00:19:59 -08:00
Kaiyu Xie	6f7804ff50	[TRTLLM-9090] [doc] Update online benchmarking docs (#9611) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-12-02 15:58:43 +08:00
Yibin Li	307afac716	[None][chore] cherry-pick: Design diagram review process change (#9596) Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>	2025-12-01 21:13:14 -08:00
QI JUN	ec223a11c9	[TRTLLM-9092][doc] link to modelopt checkpoints in quick start guide (#9571) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-12-01 10:51:31 +08:00
QI JUN	534609b5a6	[TRTLLM-9093][doc] update hyper links in overview (#9568) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-12-01 10:43:03 +08:00
Yan Chunwei	2ab8e58ede	[TRTLLM-9075][doc] refine the slurm examples (#9548) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-12-01 09:33:21 +08:00
Emma Qiao	221e4bbbb0	[None][infra] Waive failed tests for release branch on 11/30 (#9553) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-11-30 13:35:23 +08:00
Yiqing Yan	6339e76b6d	[None][infra] Updated Linux installation guide (#9485) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-11-29 22:55:47 +08:00
Enwei Zhu	a65ee3d045	[https://nvbugs/5687820][fix] Remove self.abort() in DetokenizedGenerationResult (#9450) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-11-27 14:16:04 +08:00
Enwei Zhu	4263108ebe	[TRTLLM-9157][doc] Guided decoding doc improvement (#9359) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-11-27 14:14:43 +08:00
QI JUN	267c850792	[TRTLLM-9086][doc] Clean up TODOs in documentation (#9292) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-11-27 14:13:00 +08:00
Pengyun Lin	41c903d6a7	[None][doc] VDR 1.0 trtllm-serve doc enhancement (#9443) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-11-27 13:08:26 +08:00
Yan Chunwei	eb7c6d9301	[TRTLLM-9160][doc] add doc to llm_runtime.py (#9482) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-11-27 10:10:17 +08:00
Yukun He	816e4d73b1	[https://nvbugs/5676748][fix] Cherry-pick #9336: Fix mismatched nvfp4 gemm sf shape. (#9437) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-11-26 11:57:54 +08:00
TensorRT LLM	dbb58bac25	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2025-11-24 18:23:53 +00:00
jthomson04	b9d92380da	[TRTLLM-9199][docs] KV Connector Docs (#9325) Signed-off-by: jthomson04 <jwillthomson19@gmail.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-11-24 18:07:50 +01:00
Jin Li	0339255103	[https://nvbugs/5545522][fix] Correct Cutlass with PDL support (#9335) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-11-22 09:05:13 -08:00
Iman Tabrizian	4180417b8c	[https://nvbugs/5601682][fix] Fix cacheTransceiver hang (#9311) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-11-20 15:19:23 -08:00
JunyiXu-nv	838df92e21	[https://nvbugs/5670793][fix] Solve trtllm-serve launch_disaggregated… (#9324) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-11-20 19:31:35 +08:00
dominicshanshan	2cde4e41da	[https://nvbugs/5648685][fix] Fix openAI server waiting time to avoid large model weight loading out time (#9254) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-11-19 09:46:02 +08:00
QI JUN	a49fdb36df	[TRTLLM-9092][doc] Add a pre-quantized example in quick start guide (#9223) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-11-18 17:36:01 -08:00
sunnyqgg	35b176ae78	[https://nvbugs/5461796][fix] Unwaive and extend time for test_llmapi_speculative_decoding_mtp (#9092) Signed-off-by: qgai <qgai@nvidia.com>	2025-11-18 19:20:07 +08:00
Chuang Zhu	1c4c737206	[https://nvbugs/5582133][fix] unwaive nixl test (#9244) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-11-18 13:07:30 +08:00
Wanli Jiang	6640aed0c2	[None][fix] Bypass key-word matching for multimodal tests (#9170) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-11-18 10:33:07 +08:00
sunnyqgg	55a9771ff0	[https://nvbugs/5649826][fix] Unwaive test test_llm_commandr_plus_4gpus_summary (#9201) Signed-off-by: qgai <qgai@nvidia.com>	2025-11-16 23:11:44 -08:00
Shunkangz	fd0e2e4e79	[TRTLLM-9159][doc] Add KV Connector docs (#9043) Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-11-17 10:44:49 +08:00
brb-nv	6d28e6c3a6	[https://nvbugs/5568836][fix] Skip keyword matching for Gemma3 e2e test (#9158) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-11-14 02:18:24 -08:00
Kaiyu Xie	e5c1cd41cd	[None] [fix] Disable UCC as WAR to MPI allgather issue before NGC PyTorch 25.12 upgrade (#9127) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-11-14 01:18:04 -08:00
Leslie Fang	d43036e3fd	[https://nvbugs/5652552][fix] Log the llm args (#9119) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-11-14 12:02:41 +08:00
Chang Liu	4661820d05	[TRTLLM-7971][doc] Doc update for multimodal in v1.1 (#9015) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-11-13 14:58:14 -08:00
Michal Guzek	8e9409ce04	[https://nvbugs/5628204][fix] Stop token IDs - fast path optimization for single stop token IDs only (#9014) Signed-off-by: Michal Guzek <mguzek@nvidia.com> Signed-off-by: Michal Guzek <moraxu@users.noreply.github.com>	2025-11-13 14:17:20 +01:00
Chuang Zhu	12fa81c679	[https://nvbugs/5628952][fix] avoid cudaFree overlap with cuda graph (#8903) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-11-12 09:08:05 +01:00
peaceh-nv	f1d02b5664	[https://nvbugs/5570575][fix] : Use less kv cache memory on SM120 (#9054) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>	2025-11-11 15:42:08 +08:00
Vincent Zhang	08f8f96cbd	[https://nvbugs/5284463][fix] fix ada fp8 group gemm lacks shared memory (#9044) Signed-off-by: Vincent Zhang <vinczhang@nvidia.com>	2025-11-11 13:00:47 +08:00
Lizhi Zhou	0649b77d16	[https://nvbugs/5608743][chore] unwaive test (#8994) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-11-10 05:59:29 -08:00
Zhanrui Sun	7ff0b13de3	[TRTLLM-9080][infra] upgrade tritonserver DLFW 25.10 (#8877) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-11-09 22:36:56 -08:00


3174 Commits