TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Yi Zhang	a69bd2a6fa	[https://nvbugs/5550409 ][fix] Disable torch compile in piecewise attention part to Avoid host overhead (#8708 ) Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>	2025-10-29 18:12:58 +08:00
Zheng Duan	d626d13d37	[https://nvbugs/5607238 ][test] fix working dir in disagg worker test (#8648 ) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-10-29 16:13:52 +08:00
Pengyun Lin	2aade46d18	[TRTLLM-8214][feat] Support Qwen3 tool parser (#8216 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-10-29 15:48:29 +08:00
Yiteng Niu	741183917c	[None][infra] update ci allow list 2025/10/29 (#8749 ) Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>	2025-10-29 15:34:44 +08:00
Faraz	585733f113	[None][fix] add readme copy to wheel stage to avoid setup.py failure (#8736 ) Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>	2025-10-29 14:43:03 +08:00
dongxuy04	00eaf5f883	[None][feat] add flag for EPLB to force using GDRCopy (#8650 ) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-10-29 13:33:26 +08:00
Stefan Niebler	19ca7b15c7	[https://nvbugs/5593199 ][test] Enhance beam search tests deterministic dummy model (#8625 ) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>	2025-10-29 06:12:22 +01:00
Chang Liu	5f737b8dbe	[None][perf] Use fp8 quant kernel in DS3.2 indexer module (#8701 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-10-29 12:45:09 +08:00
Cheng Hang	15c293a90b	[None][feat] Enable nvfp4 cuda core for sm120 (#8620 ) Signed-off-by: Cheng Hang <chang@nvidia.com>	2025-10-29 12:39:03 +08:00
Yechan Kim	bc26f4ce7c	[https://nvbugs/5549829 ][fix] Qwen2.5-VL TP > 1 + Quantized weight load fix (#8680 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-10-29 13:38:42 +09:00
xinhe-nv	7ba98a6b20	[None][chore] Add failed cases into waives.txt (#8684 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-10-28 20:30:01 -07:00
Yan Chunwei	f2faf2809f	[None][ci] waive test_rpc.py temporarily (#8743 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>	2025-10-28 19:20:27 -07:00
Zheng Duan	fea5bfbda7	[None][feat] add detailed KV cache transfer time breakdown (#8521 ) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-10-29 10:11:09 +08:00
ruodil	f444fe2deb	[None][test] fix a typo in perf test sampler config (#8726 ) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>	2025-10-29 09:53:53 +08:00
Chuang Zhu	b828b6445b	[https://nvbugs/5612529 ][fix] Fix transferAgent_test (#8710 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-10-29 09:14:34 +08:00
Yechan Kim	cf8a1d2ef9	[https://nvbugs/5596377 ][fix] Fix mm dummy calculation (#8498 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-10-29 09:45:21 +09:00
Lizhi Zhou	24167d00eb	[TRTLLM-8431][doc] update public doc and example, add etcd auto-scaling tests (#8602 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-10-28 17:04:53 -07:00
Kaiyu Xie	227c288441	[TRTLLM-8827] [feat] Enable low precision alltoall for Cutlass and TRTLLMGen backends (#8675 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-10-29 07:56:48 +08:00
Mike Iovine	00161b315f	[https://nvbugs/5549111 ][fix] Fix 2-model overlap scheduler accuracy on very long prompts (#8076 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Michael Iovine <miovine@nvidia.com>	2025-10-28 14:55:34 -07:00
dongfengy	083f3637f1	[https://nvbugs/5596343 ][test] Update test waive to get back some coverage (#8702 ) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com> Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>	2025-10-28 14:05:48 -07:00
Lucas Liebenwein	0ee71d95ec	[https://nvbugs/5606166 ][fix] AutoDeploy: use tuples for cudagraph shape lookup (#8658 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-10-28 10:52:43 -07:00
Anish Shanbhag	a09b38a862	[TRTLLM-8684][chore] Migrate BuildConfig to Pydantic, add a Python wrapper for KVCacheType enum (#8330 ) Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>	2025-10-28 09:17:26 -07:00
William Zhang	cdc9e5e645	[None][fix] Properly raise error for nemotron H models (#8697 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-10-28 08:59:42 -07:00
dongfengy	5a01f382c1	[https://nvbugs/5575913 ][fix] Use separate thresholds for 120b/20b gptoss (#8664 ) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com> Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>	2025-10-28 10:35:07 -04:00
Robin Kobus	e8e2b0697a	[None][chore] Revert "[TRTLLM-7835][test] add default sample config for perf test (#8523 )" (#8725 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-10-28 14:23:38 +01:00
Eran Geva	e051a05e6c	[#8694 ][fix] fix AutoDeploy cuda memory access failure in nvidia/NVIDIA-Nemotron-Nano-31B-A3-v3 (#8696 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2025-10-28 13:21:43 +02:00
dongxuy04	b37a8a9a74	[None][fix] fix EPLB init hang (#8649 ) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-10-28 05:22:49 -04:00
ruodil	6b9b73ee27	[https://nvbugs/5564465 ][test] ensure deepseek_v3_lite isl + osl < max_seq_len (#8565 ) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>	2025-10-28 15:25:52 +08:00
ruodil	bf72eb045e	[TRTLLM-7835][test] add default sample config for perf test (#8523 ) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>	2025-10-28 02:22:47 -04:00
yufeiwu-nv	0e36484fba	[None][test] Add gpt_oss_20b Model to Sanity Perf Test (#8265 )	2025-10-28 13:36:28 +08:00
Erin	a966644a71	[None][fix] Change Ray submit() to use async RPC (#8636 ) Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>	2025-10-28 00:56:13 -04:00
Sai Kiran Polisetty	08134cbca0	[https://nvbugs/5556475 ] [fix] Fix the `tensorrt_llm_bls` model to correctly return the outputs for `num_input_tokens` and `num_output_tokens` (#8150 ) Signed-off-by: Sai Kiran Polisetty <spolisetty@nvidia.com>	2025-10-27 21:06:28 -07:00
Aurelien Chartier	0a02f5f25d	[None][chore] Use a cached model path for Ray integration test (#8660 ) Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>	2025-10-27 19:16:06 -07:00
HuiGao-NV	49974eed75	[None][chore] ISOLATE some cases (#8690 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-10-27 22:10:44 -04:00
chenfeiz0326	f5265a087b	[None][infra] Minor Update on Perf Sanity Testdb Files (#8607 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-10-28 09:54:48 +08:00
gramnarayan	88b0fbc8ff	[#8245 ][feat] Autodeploy: Guided Decoding Support (#8551 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com> Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-10-28 09:29:57 +08:00
Yechan Kim	a6017f6266	[https://nvbugs/5608723 ][fix] Use local data on multimodal tests and unwaive tests (#8673 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-10-28 09:20:02 +09:00
Emma Qiao	73a5479b26	[None][infra] Skip failed tests for main 10/27 (#8686 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-10-28 08:04:30 +08:00
Aurelien Chartier	1401a3c09c	[None][feat] Add FP8 rowwise GEMMs for B200 (#8332 ) Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>	2025-10-27 16:33:14 -04:00
Bo Li	9c4432f8a4	[TRTLLM-7318][feat] MnnvlThroughput AlltoAll implementation. (#7499 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-10-27 13:23:06 -04:00
nvxuanyuc	d1398c05e6	[None][feat] Support ignored prompt length for penalties via new sampling config parameter (#8127 ) Signed-off-by: Xuanyu Chen <xuanyuc@nvidia.com>	2025-10-27 13:12:31 -04:00
Chenghao Zhang	b9b2802599	[None][feat] Autodeploy: Update the ssm to use slice (#8667 ) Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>	2025-10-27 09:45:20 -07:00
mpikulski	7c8ba71b49	[TRTLLM-8832][feat] fully async _select_generated_logits with tests (#8628 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-10-27 16:15:32 +01:00
QI JUN	4fd58137a1	[TRTLLM-8933][chore] remove unused update_executor_config function (#8678 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-10-27 10:00:47 -04:00
Kaiyu Xie	c9b08790c2	[None] [test] Add MNNVL AlltoAll tests to pre-merge (#8601 )	2025-10-27 21:39:44 +08:00
Chao Ni	0019d99e6d	[None][test] Add longbench v2 for long context evaluation (#8604 ) Signed-off-by: mni <125171826+baize97@users.noreply.github.com>	2025-10-27 20:01:14 +08:00
zhanghaotong	1026069a2b	[None][feat] Add opentelemetry tracing (#5897 ) Signed-off-by: Zhang Haotong <zhanghaotong.zht@antgroup.com> Signed-off-by: zhanghaotong <zhanghaotong.zht@antgroup.com> Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> Co-authored-by: Zhang Haotong <zhanghaotong.zht@alibaba-inc.com> Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-10-27 18:51:07 +08:00
Jie Li	ce0d76135d	[https://nvbugs/5546507 ][fix] skip TRT-Flow test case due to CMake Error in building (#8677 ) Signed-off-by: Jie Li <lijie@nvidia.com>	2025-10-27 05:11:47 -04:00
Robin Kobus	990b0c0c47	[TRTLLM-7159][docs] Add documentation for additional outputs (#8325 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-10-27 09:52:04 +01:00
xinhe-nv	8090c9641c	[TRTLLM-8638][fix] Add failed cases into waives.txt (#8672 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-10-27 03:20:46 -04:00


3389 Commits