TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Pengyun Lin	e86d6db9ec	[https://nvbugs/5575829 ][fix] Unwaive gpt-oss test (#8576 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-10-22 07:31:56 -04:00
Emma Qiao	09349ccbfe	[None][infra] Waive failed tests for release 10/22 (#8574 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-10-22 04:41:00 -04:00
Bo Deng	9e30f14da8	[https://nvbugs/5565549 ][fix] unwaive test_disaggregated_spec_dec_bat… (#8500 ) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-10-22 14:59:59 +08:00
Jin Li	6631791c60	[https://nvbugs/5546510 ][fix] Move torch.cuda.Stream out of torch com… (#8494 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-10-22 11:21:58 +08:00
Guoming Zhang	a519c2c43c	[https://nvbugs/5504095 ][fix] Unwaive test_user_specify_workspace case. (#8316 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-10-22 09:31:24 +08:00
Simeng Liu	1375b9f074	[https://nvbugs/5515753 ][ci] Add NCCL_DEBUG=INFO flag to collect more info with CI failure. (#8440 ) Signed-off-by: Simeng Liu <simengl@nvidia.com>	2025-10-21 18:12:05 -07:00
JunyiXu-nv	0acdecb2c3	[https://nvbugs/5569713 ][fix] Disable fp8 deep gemm for EXAONE-4.0-32B-FP8 (#8429 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-10-21 12:37:56 -04:00
mpikulski	f256eb9063	[TRTLLM-8650][fix] beam search request validation (#8433 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-10-21 10:50:27 +02:00
Emma Qiao	2b0a10e4d5	[None][infra] Waive tests for release 1021 (#8522 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-10-21 03:21:00 -04:00
Yuxian Qiu	4faa5150ab	[https://nvbugs/5569081 ][fix] Upgrade fmha_v2. (cherry-pick from https://github.com/NVIDIA/TensorRT-LLM/pull/8364 ) (#8499 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-10-21 12:32:13 +08:00
Pengbo Wang	8ce2dc5cb7	[https://nvbugs/5501820 ][fix] Add requirements for numba-cuda version to WAR mem corruption (#7992 ) (#8414 ) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>	2025-10-20 09:01:08 +02:00
bhsueh_NV	14d0f5d683	[https://nvbugs/5516666 ][fix] cherry-pick PR 8130 to unwaive the Qwen3 CI (#8444 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-10-19 23:14:10 -04:00
Ivy Zhang	f904348cd6	[TRTLLM-8580][test] save runtime report periodically (#8312 ) (#8455 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-10-20 10:54:24 +08:00
danielafrimi	a0b7fe9e36	[https://nvbugs/5524714 ][fix] Fix TP sharding of fused-QKV weight scales in W4A16 AWQ (#8432 ) Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com>	2025-10-19 15:27:23 +03:00
xiweny	af2450c266	[https://nvbugs/5565565 ] [fix] Remove waiver (#8450 ) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-10-17 01:13:01 -07:00
Yukun He	437a3fc642	[None][chore] Remove duplicate log outputs in test_perf.py (#8418 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-10-17 14:11:32 +08:00
Yan Chunwei	995b93bc38	[https://nvbugs/5437384 ][test] fix trtllm-llmapi-launch multi tests with single launch (#8397 ) Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-10-16 21:14:43 -07:00
Iman Tabrizian	82430f84dc	[None][bug] Set NCCL_GRAPH_REGISTER to false to avoid hang (#8409 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-10-16 07:40:51 -07:00
ruodil	20c2de4924	[None][test] cherry-pick: add test-model-suites in integration conftest.py (#8388 ) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-10-15 23:26:32 -07:00
Yukun He	fd4311e6a3	[TRTLLM-8129][feat] Allreduce tuning and benchmark script revising (#7870 ) Because we have encountered some perf regression due to using a one-shot kernel instead of NCCL on A100/H100, it will be beneficial if we can have a solid benchmarking of allreduce Op and analyze the data collected from it. Implemented new AllreduceOp heuristic: - Added Linear programming-based heuristic implementation. - Added LUT-based heuristic implementation and corresponding code generation script. AllreduceOp minor fixing: - Fixed a minor issue in AllreduceOp, that the strategy can not be overridden when ONESHOT or TWOSHOT is set. - Fixed a minor TWOSHOT kernel perf issue. - Cleaned up Dispatching code in AllReduceOp. This PR will fix the perf gaps reported in: https://nvbugspro.nvidia.com/bug/5517023 For Deepseek-R1, it shows a performance gain of about 3-4% in concurrency levels of 256 and 512. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-10-16 14:15:25 +08:00
Ziyi Xiong	4ad7ef1497	[https://nvbugs/5534705 ][fix] Skip unnecessary CUDA graph capture (#8… (#8344 ) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-10-16 10:27:19 +08:00
Zhenhuan Chen	838958c631	[https://nvbugs/5545522 ][fix] move PREEXIT in UB kernels to fix accuracy issue (#8318 ) Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>	2025-10-16 09:50:43 +08:00
Patrice Castonguay	7862372ee2	[https://nvbugs/5552889 ][fix] fix: Prevent empty batch when using attention DP with disagg (#8372 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-10-16 09:11:04 +08:00
amitz-nv	27c6c8466b	[https://nvbugs/5510879 ][fix] Fix pytorch & TRT-python flows fused LoRA adapter modules weight split with TP>1 (#8313 ) Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>	2025-10-15 08:24:02 -07:00
amitz-nv	e5476a6b2a	[https://nvbugs/5521949 ][fix] Update FP8 model with BF16 LoRA test, fix test_bielik_11b_v2_2_instruct_multi_lora (#8324 ) Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>	2025-10-15 05:48:38 -07:00
Ivy Zhang	4751bdbcb6	[None][chore] Update nim test list (#8356 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-10-15 02:04:20 -07:00
Emma Qiao	988f93790f	[None][infra] Waive failed tests in release post-merge 10/15 (#8386 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-10-15 16:06:08 +08:00
Stanley Sun	cce97e6e15	[TRTLLM-8113][test] Add pytorch workflow e2e tests with pp enabled (#8357 ) Signed-off-by: Stanley Sun <stsun@nvidia.com>	2025-10-15 15:09:21 +08:00
xiweny	d5b79268e7	[https://nvbugs/5565565 ] [fix] fp8 wideep support sm103 (#8228 ) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-10-15 10:17:08 +08:00
Jin Li	4bac6b337e	[https://nvbugs/5537348 ][fix] Use device tensor index for MTP (#8062 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-10-14 05:51:45 -07:00
Yiqing Yan	7b5ba7ca66	[https://nvbugs/5565541 ][fix] Add timeout threshold for H100 FMHA test (#8354 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-10-14 01:23:08 -07:00
bhsueh_NV	66aa88739b	[https://nvbugs/5574556 ][fix] fix bug of Qwen3_235B_A22B::test_fp8 CI (#8351 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-10-14 15:26:15 +08:00
Ziyi Xiong	9ecc6db5b4	[https://nvbugs/5537878 ][fix] Reserve an extra slot for padded batch … (#8231 ) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-10-13 23:34:22 -07:00
Lizhi Zhou	553ff3402a	[https://nvbugs/5550671 ][fix] fix disagg-serving multinodes test failure (#8307 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-10-14 08:01:00 +02:00
Chuang Zhu	6a73f079fe	[https://nvbugs/5465642 ][fix] Increase server timeout to wait weight loading (#8297 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-10-14 07:55:31 +02:00
Jin Li	3860a674d5	[https://nvbugs/5543770 ][fix] Update to Cutlass v4.2.1 (#8055 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-10-13 22:39:25 -07:00
yuanjingx87	e065ff21d2	[None][infra] cherry pick numexpr fix to release/1.1 (#8333 ) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>	2025-10-13 21:20:09 -07:00
Lizhi Zhou	2c44e8198a	[https://nvbugs/5470769 ][chore] unwaive test for PR7338 (#8258 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-10-14 11:17:03 +08:00
William Zhang	dc052b663f	[https://nvbugs/5565530 ][fix] Unwaive test (#8273 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-10-13 17:59:32 +02:00
Patrice Castonguay	fd7a11e11d	[https://nvbugs/5534837 ][fix] Fix KV cache split on long context (#8247 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Co-authored-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-10-13 11:48:49 -04:00
Enwei Zhu	598e88594c	[https://nvbugs/5568951 ][fix] Fix guided decoding disagg tests (#8311 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-10-13 18:55:28 +08:00
Zhanrui Sun	02080e199d	[https://nvbugs/5563653 ][infra] reduce docker image layers (#8250 ) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-10-13 01:38:27 -07:00
Chuang Zhu	ad0e91a174	[https://nvbugs/5546202 ][fix] Fix concurrent bug for NIXL cache transceiver (#8147 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-10-13 09:40:56 +02:00
xiweny	6545d541bb	[https://nvbugs/5532789 ] [doc] Add documents about CUDA 12.9 (#8192 ) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-10-13 00:35:36 -07:00
Yechan Kim	745cf55ff3	[https://nvbugs/5550722 ][fix] Fix image load (#8093 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-10-13 14:12:39 +08:00
Yechan Kim	3d3d49434a	[https://nvbugs/5547434 ][fix] Fix Qwen2.5-VL device_path error (#8057 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-10-13 14:12:27 +08:00
Ivy Zhang	6a42a9649b	[None][chore] Update test configs for release (#8224 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-10-13 14:07:33 +08:00
Liao Lanyu	8f2e48a981	[https://nvbugs/5522746 ][fix] unwaive tests caused by node issues after rebooting (#8268 ) Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com> Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>	2025-10-13 13:31:52 +08:00
Ivy Zhang	bcf9cb1f58	[TRTLLM-8246][test] add multimodal kvcache+chunked_prefil cases into QA test list (#8212 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-10-13 11:38:38 +08:00
Ivy Zhang	bca5e29387	[None][chore] Update constraint for release (#8211 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-10-13 11:14:24 +08:00


3054 Commits