TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Enwei Zhu	598e88594c	[https://nvbugs/5568951][fix] Fix guided decoding disagg tests (#8311) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-10-13 18:55:28 +08:00
Zhanrui Sun	02080e199d	[https://nvbugs/5563653][infra] reduce docker image layers (#8250) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-10-13 01:38:27 -07:00
Chuang Zhu	ad0e91a174	[https://nvbugs/5546202][fix] Fix concurrent bug for NIXL cache transceiver (#8147) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-10-13 09:40:56 +02:00
xiweny	6545d541bb	[https://nvbugs/5532789] [doc] Add documents about CUDA 12.9 (#8192) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-10-13 00:35:36 -07:00
Yechan Kim	745cf55ff3	[https://nvbugs/5550722][fix] Fix image load (#8093) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-10-13 14:12:39 +08:00
Yechan Kim	3d3d49434a	[https://nvbugs/5547434][fix] Fix Qwen2.5-VL device_path error (#8057) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-10-13 14:12:27 +08:00
Ivy Zhang	6a42a9649b	[None][chore] Update test configs for release (#8224) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-10-13 14:07:33 +08:00
Liao Lanyu	8f2e48a981	[https://nvbugs/5522746][fix] unwaive tests caused by node issues after rebooting (#8268) Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com> Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>	2025-10-13 13:31:52 +08:00
Ivy Zhang	bcf9cb1f58	[TRTLLM-8246][test] add multimodal kvcache+chunked_prefil cases in to QA test list (#8212) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-10-13 11:38:38 +08:00
Ivy Zhang	bca5e29387	[None][chore] Update constaintfor release (#8211) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-10-13 11:14:24 +08:00
brb-nv	04bded7c40	[None][chore] Waive test failing on pre-merge CI (#8295) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-10-12 16:54:56 -07:00
Emma Qiao	d857cd47a0	[None][infra] Update and waive failed tests for release branch (#8291) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-10-12 21:51:54 +08:00
Zhanrui Sun	4c36bba2ec	[None][infra] Remove WAR code for GH200 node (#8267) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-10-11 20:40:16 -07:00
Yan Chunwei	4ebc443fa9	[https://nvbugs/5565590][fix] test_request_perf_metrics_draft (#8257) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-10-12 10:01:20 +08:00
Yan Chunwei	7771669651	[https://nvbugs/5532023][fix] unwaive GenerationExecutor tests (#8251) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-10-11 10:43:04 +08:00
Patrice Castonguay	2e787d73ea	[https://nvbugs/5538098][fix] Checking connection to etcd server in unit test (#8269) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-10-10 14:31:36 -07:00
Zhanrui Sun	f72058264f	[None][fix] cherry-pick !8217 pin flashinfer-python version (#8217) (#8252) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-10-09 23:48:21 -07:00
xxi	ea640a186b	[https://nvbugs/5550283][fix] update test case to call post quantization explicitly due to code refactor (#8188) Signed-off-by: xxi <xxi@nvidia.com>	2025-10-09 09:41:47 +08:00
brb-nv	a9a0969de7	[None][chore] Waive tests failing on release/1.1 post merge (#8185) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-10-08 09:59:50 -07:00
Yukun He	1ca84e1a25	[https://nvbugs/5536131][fix] Fix illegal access issue when scale is not provided in Llama3/4. (#7960) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-10-07 23:47:00 -07:00
xxi	647080e3d5	[https://nvbugs/5550283][fix] update to the latest MoE API (#8169) Signed-off-by: xxi <xxi@nvidia.com>	2025-10-07 21:12:20 +08:00
xiweny	72144a40d2	[https://nvbugs/5541494] [fix] Fix missing sm100f/103a kernels and add tests (#8098) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-10-07 08:27:55 +08:00
Jin Li	b4e6a1648b	[https://nvbugs/5451280][fix] Reduce memory fraction problem by warmu… (#7999) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-10-03 18:14:13 -07:00
Jin Li	ef8e2173d4	[None][ci] Waive failing tests on release/1.1 (#8088) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-09-30 04:10:22 -04:00
Zheyu Fu	e87c89c03f	[https://nvbugs/5548098][fix] Fix flakey unit test for dynamic spec decode (#8078) Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>	2025-09-30 15:36:32 +08:00
Enwei Zhu	a64d9b69e5	[None][fix] Fix chunked prefill state of draft request (#8067) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-30 09:51:21 +08:00
Guoming Zhang	0c47925600	[None][doc] Refine perf overview.md and correct the error link in per… (#8036) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-09-28 16:14:31 +08:00
Yiqing Yan	4d5465a575	[None][chore] Bump version to 1.1.0 (#7942) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-09-26 13:17:36 +08:00
sunnyqgg	2e5850c28a	[TRTLLM-7330][feat] Eagle3 cuda graph support for the first draft model inference (#7363) Signed-off-by: qgai <qgai@nvidia.com>	2025-09-26 11:28:05 +08:00
Chuang Zhu	f98fa0cf8b	[None][feat] Optimize kv cache transfer TEP (#7613) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-09-25 20:20:04 -07:00
QI JUN	4c0f8482f1	[None][ci] Waive test_mm_encoder_standalone.py::test_multi_request_batch_chat[llava-v1.6-mistral-7b-hf] (#8010) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-26 11:07:54 +08:00
Yuan Tong	fae83c387b	[#6102][fix] support non-system python installation (#7763) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>	2025-09-26 10:16:15 +08:00
Enwei Zhu	d650320de4	[None][infra] Improve the failure message for accuracy test suite (#7994) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-26 10:04:47 +08:00
Yiqing Yan	108248ece1	[TRTLLM-7999][infra] Add B300/GB300 single gpu test (#7951) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-09-26 09:59:11 +08:00
Yanchao Lu	7e2521a7f0	[None][chore] Some clean-ups for CUDA 13.0 dependencies (#7979) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-09-26 08:46:11 +08:00
dongfengy	1eb653146a	[https://nvbugs/5525951][fix] Clarify that PP is not supported for GPTOSS (#7911) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>	2025-09-25 12:54:18 -07:00
QI JUN	1529a6f22d	[None][chore] extract weights loading related logic to model loader (#7579) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-25 10:19:22 -07:00
Emma Qiao	2dc93c6371	[None][infra] Waive failed tests on main (#8001) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-25 08:13:39 -07:00
WeiHaocheng	4b0570a0d6	[None][doc] Add acknowledgements in scaffolding tech blog (#7983) Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>	2025-09-25 08:07:13 -07:00
xxi	57ff5f4c0d	[None][fix] fix a bug in wideEp use DeepEP with num_chunks > 1 (#7954) Signed-off-by: xxi <xxi@nvidia.com>	2025-09-25 07:53:42 -07:00
Matthias Jouanneaux	eda1467061	[TRTLLM-5966][feat] Helix: add alltoall op (#6815) Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>	2025-09-25 07:18:29 -07:00
PeganovAnton	396c0ea677	[None][chore] relax version constraints on fastapi (#7935) Signed-off-by: Anton Peganov <apeganov@nvidia.com> Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-09-25 21:58:53 +08:00
Yueh-Ting (eop) Chen	c5012423f5	[None][chore] Remove developer name in comment (#7981) Signed-off-by: eopXD <yuehtingc@nvidia.com>	2025-09-25 06:43:38 -07:00
Yan Chunwei	40c6103ef8	[None][doc] add Llama PP known issue to release note (#7959) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-09-25 21:02:35 +08:00
Guoming Zhang	663ce3a4de	[None][doc] fix invalid links in perf benchmarking. (#7933) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-25 21:02:35 +08:00
Guoming Zhang	202bed4574	[None][chroe] Rename TensorRT-LLM to TensorRT LLM for source code. (#7851) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-25 21:02:35 +08:00
QI JUN	961418908c	[https://nvbugs/5531963][fix] cherry pick #7725 (#7907) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-25 21:02:35 +08:00
Yan Chunwei	5999fab146	[https://nvbugs/5427043][fix] cherrypick: request length exceeds max_num_tokens (#7718) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-25 21:02:35 +08:00
Yan Chunwei	cb466a846d	[None][fix] api stability bug in status label (#7861) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-25 21:02:35 +08:00
Yan Chunwei	9d48898def	[None][doc] add stable label to all the un-labelled arguments in LLM class (#7863) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-25 21:02:35 +08:00

3014 Commits