Ivy Zhang
f904348cd6
[TRTLLM-8580][test] save runtime report periodically ( #8312 ) ( #8455 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-10-20 10:54:24 +08:00
danielafrimi
a0b7fe9e36
[ https://nvbugs/5524714 ][fix] Fix TP sharding of fused-QKV weight scales in W4A16 AWQ ( #8432 )
...
Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com>
2025-10-19 15:27:23 +03:00
xiweny
af2450c266
[ https://nvbugs/5565565 ] [fix] Remove waiver ( #8450 )
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-10-17 01:13:01 -07:00
Yukun He
437a3fc642
[None][chore] Remove duplicate log outputs in test_perf.py ( #8418 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-10-17 14:11:32 +08:00
Yan Chunwei
995b93bc38
[ https://nvbugs/5437384 ][test] fix trtllm-llmapi-launch multi tests with single launch ( #8397 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-10-16 21:14:43 -07:00
Iman Tabrizian
82430f84dc
[None][bug] Set NCCL_GRAPH_REGISTER to false to avoid hang ( #8409 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-10-16 07:40:51 -07:00
ruodil
20c2de4924
[None][test] cherry-pick: add test-model-suites in integration conftest.py ( #8388 )
...
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-10-15 23:26:32 -07:00
Yukun He
fd4311e6a3
[TRTLLM-8129][feat] Allreduce tuning and benchmark script revising ( #7870 )
...
Because we encountered performance regressions from using a one-shot kernel instead of NCCL on A100/H100, it is beneficial to benchmark the AllreduceOp thoroughly and analyze the collected data.
Implemented new AllreduceOp heuristics:
- Added a linear-programming-based heuristic implementation.
- Added a LUT-based heuristic implementation and the corresponding code-generation script.
AllreduceOp minor fixes:
- Fixed a minor issue in AllreduceOp where the strategy could not be overridden when ONESHOT or TWOSHOT was set.
- Fixed a minor TWOSHOT kernel performance issue.
- Cleaned up the dispatching code in AllReduceOp.
This PR fixes the performance gaps reported in:
https://nvbugspro.nvidia.com/bug/5517023
For DeepSeek-R1, it shows a performance gain of about 3-4% at concurrency levels of 256 and 512.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-10-16 14:15:25 +08:00
Ziyi Xiong
4ad7ef1497
[ https://nvbugs/5534705 ][fix] Skip unnecessary CUDA graph capture (#8… ( #8344 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-10-16 10:27:19 +08:00
Zhenhuan Chen
838958c631
[ https://nvbugs/5545522 ][fix] move PREEXIT in UB kernels to fix accuracy issue ( #8318 )
...
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2025-10-16 09:50:43 +08:00
Patrice Castonguay
7862372ee2
[ https://nvbugs/5552889 ][fix] fix: Prevent empty batch when using attention DP with disagg ( #8372 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-10-16 09:11:04 +08:00
amitz-nv
27c6c8466b
[ https://nvbugs/5510879 ][fix] Fix pytorch & TRT-python flows fused LoRA adapter modules weight split with TP>1 ( #8313 )
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-10-15 08:24:02 -07:00
amitz-nv
e5476a6b2a
[ https://nvbugs/5521949 ][fix] Update FP8 model with BF16 LoRA test, fix test_bielik_11b_v2_2_instruct_multi_lora ( #8324 )
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-10-15 05:48:38 -07:00
Ivy Zhang
4751bdbcb6
[None][chore] Update nim test list ( #8356 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-10-15 02:04:20 -07:00
Emma Qiao
988f93790f
[None][infra] Waive failed tests in release post-merge 10/15 ( #8386 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-15 16:06:08 +08:00
Stanley Sun
cce97e6e15
[TRTLLM-8113][test] Add pytorch workflow e2e tests with pp enabled ( #8357 )
...
Signed-off-by: Stanley Sun <stsun@nvidia.com>
2025-10-15 15:09:21 +08:00
xiweny
d5b79268e7
[ https://nvbugs/5565565 ] [fix] fp8 wideep support sm103 ( #8228 )
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-10-15 10:17:08 +08:00
Jin Li
4bac6b337e
[ https://nvbugs/5537348 ][fix] Use device tensor index for MTP ( #8062 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-10-14 05:51:45 -07:00
Yiqing Yan
7b5ba7ca66
[ https://nvbugs/5565541 ][fix] Add timeout threshold for H100 FMHA test ( #8354 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-10-14 01:23:08 -07:00
bhsueh_NV
66aa88739b
[ https://nvbugs/5574556 ][fix] fix bug of Qwen3_235B_A22B::test_fp8 CI ( #8351 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-10-14 15:26:15 +08:00
Ziyi Xiong
9ecc6db5b4
[ https://nvbugs/5537878 ][fix] Reserve an extra slot for padded batch … ( #8231 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-10-13 23:34:22 -07:00
Lizhi Zhou
553ff3402a
[ https://nvbugs/5550671 ][fix] fix disagg-serving multinodes test failure ( #8307 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-14 08:01:00 +02:00
Chuang Zhu
6a73f079fe
[ https://nvbugs/5465642 ][fix] Increase server timeout to wait weight loading ( #8297 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-10-14 07:55:31 +02:00
Jin Li
3860a674d5
[ https://nvbugs/5543770 ][fix] Update to Cutlass v4.2.1 ( #8055 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-10-13 22:39:25 -07:00
yuanjingx87
e065ff21d2
[None][infra] cherry pick numexpr fix to release/1.1 ( #8333 )
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-10-13 21:20:09 -07:00
Lizhi Zhou
2c44e8198a
[ https://nvbugs/5470769 ][chore] unwaive test for PR7338 ( #8258 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-14 11:17:03 +08:00
William Zhang
dc052b663f
[ https://nvbugs/5565530 ][fix] Unwaive test ( #8273 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-10-13 17:59:32 +02:00
Patrice Castonguay
fd7a11e11d
[ https://nvbugs/5534837 ][fix] Fix KV cache split on long context ( #8247 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-10-13 11:48:49 -04:00
Enwei Zhu
598e88594c
[ https://nvbugs/5568951 ][fix] Fix guided decoding disagg tests ( #8311 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-10-13 18:55:28 +08:00
Zhanrui Sun
02080e199d
[ https://nvbugs/5563653 ][infra] reduce docker image layers ( #8250 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-10-13 01:38:27 -07:00
Chuang Zhu
ad0e91a174
[ https://nvbugs/5546202 ][fix] Fix concurrent bug for NIXL cache transceiver ( #8147 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-10-13 09:40:56 +02:00
xiweny
6545d541bb
[ https://nvbugs/5532789 ] [doc] Add documents about CUDA 12.9 ( #8192 )
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-10-13 00:35:36 -07:00
Yechan Kim
745cf55ff3
[ https://nvbugs/5550722 ][fix] Fix image load ( #8093 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-10-13 14:12:39 +08:00
Yechan Kim
3d3d49434a
[ https://nvbugs/5547434 ][fix] Fix Qwen2.5-VL device_path error ( #8057 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-10-13 14:12:27 +08:00
Ivy Zhang
6a42a9649b
[None][chore] Update test configs for release ( #8224 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-10-13 14:07:33 +08:00
Liao Lanyu
8f2e48a981
[ https://nvbugs/5522746 ][fix] unwaive tests caused by node issues after rebooting ( #8268 )
...
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-10-13 13:31:52 +08:00
Ivy Zhang
bcf9cb1f58
[TRTLLM-8246][test] add multimodal kvcache+chunked_prefill cases into QA test list ( #8212 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-10-13 11:38:38 +08:00
Ivy Zhang
bca5e29387
[None][chore] Update constraint for release ( #8211 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-10-13 11:14:24 +08:00
brb-nv
04bded7c40
[None][chore] Waive test failing on pre-merge CI ( #8295 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-10-12 16:54:56 -07:00
Emma Qiao
d857cd47a0
[None][infra] Update and waive failed tests for release branch ( #8291 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-12 21:51:54 +08:00
Zhanrui Sun
4c36bba2ec
[None][infra] Remove WAR code for GH200 node ( #8267 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-10-11 20:40:16 -07:00
Yan Chunwei
4ebc443fa9
[ https://nvbugs/5565590 ][fix] test_request_perf_metrics_draft ( #8257 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-10-12 10:01:20 +08:00
Yan Chunwei
7771669651
[ https://nvbugs/5532023 ][fix] unwaive GenerationExecutor tests ( #8251 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-10-11 10:43:04 +08:00
Patrice Castonguay
2e787d73ea
[ https://nvbugs/5538098 ][fix] Checking connection to etcd server in unit test ( #8269 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-10-10 14:31:36 -07:00
Zhanrui Sun
f72058264f
[None][fix] cherry-pick !8217 pin flashinfer-python version ( #8217 ) ( #8252 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-10-09 23:48:21 -07:00
xxi
ea640a186b
[ https://nvbugs/5550283 ][fix] update test case to call post quantization explicitly due to code refactor ( #8188 )
...
Signed-off-by: xxi <xxi@nvidia.com>
2025-10-09 09:41:47 +08:00
brb-nv
a9a0969de7
[None][chore] Waive tests failing on release/1.1 post merge ( #8185 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-10-08 09:59:50 -07:00
Yukun He
1ca84e1a25
[ https://nvbugs/5536131 ][fix] Fix illegal access issue when scale is not provided in Llama3/4. ( #7960 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-10-07 23:47:00 -07:00
xxi
647080e3d5
[ https://nvbugs/5550283 ][fix] update to the latest MoE API ( #8169 )
...
Signed-off-by: xxi <xxi@nvidia.com>
2025-10-07 21:12:20 +08:00
xiweny
72144a40d2
[ https://nvbugs/5541494 ] [fix] Fix missing sm100f/103a kernels and add tests ( #8098 )
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-10-07 08:27:55 +08:00