HuiGao-NV
4e6a492aa3
[None][chore] Isolate several intermittent cases (#8408)
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-10-15 23:48:31 -07:00
Yan Chunwei
42ab473bb0
[https://nvbugs/5583261][ci] waive test_fetch_responses_streaming_sync (#8407)
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-10-15 23:19:31 -07:00
Min Yu
0a0159fdd8
[https://nvbugs/5378031] [feat] W4A8 AWQ MoE supports Per Expert Pre-quant Scale Factor for PyT backend (#7286)
...
Signed-off-by: Min Yu <171526537+yumin066@users.noreply.github.com>
2025-10-16 11:07:48 +08:00
xiweny
4143887370
[https://nvbugs/5541494] [fix] Remove waivers (#8353)
...
Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-10-15 19:10:35 -07:00
Yan Chunwei
206cf31705
[https://nvbugs/5560921][fix] GenerationExecutor RPC (#8209)
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-10-16 09:05:22 +08:00
Chuang Zhu
40d129a415
[None][fix] Fix cache buffer size for window (#8320)
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-10-16 09:01:11 +08:00
dongfengy
7a0aa64973
[None][fix] Refactor triton paddings (#6980)
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>
2025-10-15 12:59:01 -07:00
Yukun He
56c20665a9
[TRTLLM-4501][feat] Add input tensor pre-hook function API for the tuning process. (#6924)
...
Some tunable ops require a more realistic data distribution, for instance, a shape-associated tensor. Thus, a customizable pre-hook function can be declared in the tuning config to modify the input tensor before the tuning process.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-10-15 21:18:11 +08:00
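The entry above describes letting a tuning config declare a pre-hook that rewrites an op's input tensors before profiling, so the tuner sees a realistic, shape-associated data distribution. The sketch below is illustrative only: the `TuningConfig` dataclass, its `pre_hook` field, and the `profile_op` loop are assumptions standing in for the real API added in #6924, not the PR's actual code.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional

import torch


@dataclass
class TuningConfig:
    # Hypothetical config field: a callable applied to each input tensor before profiling.
    pre_hook: Optional[Callable[[torch.Tensor], torch.Tensor]] = None


def shape_associated_fill(x: torch.Tensor) -> torch.Tensor:
    # Example pre-hook: fill the tensor with values derived from its own shape, so the
    # tuner profiles a shape-dependent data distribution instead of uninitialized data.
    n = x.numel()
    values = torch.arange(n, dtype=torch.float32, device=x.device) / max(n, 1)
    return values.to(x.dtype).reshape(x.shape)


def profile_op(op: Callable, inputs: List[torch.Tensor], config: TuningConfig,
               iters: int = 10) -> float:
    # Toy tuner loop: rewrite the inputs via the declared pre-hook, then time the op.
    if config.pre_hook is not None:
        inputs = [config.pre_hook(t) for t in inputs]
    start = time.perf_counter()
    for _ in range(iters):
        op(*inputs)
    return (time.perf_counter() - start) / iters


# Usage: tune a matmul against shape-associated inputs rather than arbitrary memory.
cfg = TuningConfig(pre_hook=shape_associated_fill)
a, b = torch.empty(64, 128), torch.empty(128, 32)
latency_s = profile_op(torch.matmul, [a, b], cfg)
```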
mpikulski
0510b34588
[TRTLLM-8551][feat] add cache_salt in LLM.generate and refactor test_return_logits.py (#8317)
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-15 02:53:57 -07:00
QI JUN
1a1c9a29ab
[None][ci] move all llama4 test cases to post merge (#8387)
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-15 16:36:37 +08:00
mpikulski
93a4b7f1b6
[None][chore] update torch_dtype -> dtype in 'transformers' (#8263)
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-15 17:09:30 +09:00
QI JUN
616d1df7a0
[None][chore] set the default value of max_num_tokens explicitly (#8208)
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-14 23:03:02 -07:00
sychen52
6a6124dcb5
[OMNIML-2336][feat] w4a8 nvfp4 fp8 exports scale factor properly (#8180)
...
Signed-off-by: Shiyang Chen <shiychen@nvidia.com>
Co-authored-by: Shiyang Chen <shiychen@omniml-a6.nvidia.com>
2025-10-15 13:41:27 +08:00
Jin Li
206a9930df
[https://nvbugs/5547435][fix] Fix a merge conflict (#8365)
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-10-15 10:43:10 +08:00
Emma Qiao
493da020c1
[TRTLLM-7351][infra] Add isolate marker for L0 (#7497)
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-10-14 16:58:14 -07:00
dongfengy
9d855f47ad
[None][fix] Remove outdated test waives for GPTOSS (#8183)
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-10-14 16:20:38 -07:00
Lizhi Zhou
22471ecc67
[TRTLLM-7846][feat] implement etcd storage for disagg cluster (#8210)
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-14 16:48:41 -04:00
Michal Guzek
1cdb0b62c3
[https://nvbugs/5563469][fix] Temporarily disable test_nemotron_nano_8b_lora_torch in L0 due to Torch non-determinism (#8206)
...
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
2025-10-14 17:55:28 +02:00
shuyixiong
6776caaad1
[TRTLLM-8507][fix] Fix ray resource cleanup and error handling in LoRA test (#8175)
...
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-10-14 23:46:30 +08:00
Fanrong Li
0d20a8fd61
[TRTLLM-8536][feat] Add the sparse attention framework and one use case--RocketKV support (#8086)
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
Co-authored-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2025-10-14 08:23:16 -07:00
Yan Chunwei
86be06bda4
[None][ci] waive several rpc tests (#8349)
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-10-14 03:12:49 -07:00
William Zhang
72d65d079a
[https://nvbugs/5542878][fix] Unwaive test (#8027)
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-10-14 07:58:07 +02:00
xinhe-nv
371fcb0338
[TRTLLM-8366][feat] add kimi multi-node case (#8025)
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-13 21:36:03 -07:00
Yuxian Qiu
3450fe9944
[None][fix] Fix dummy load format for key models. (#7993)
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-10-14 11:18:39 +08:00
Lucas Liebenwein
22aa4ac08c
[None][feat] AutoDeploy: VLMs with subgraphs + cudagraph/compile (#8203)
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-13 17:34:09 -07:00
Zheyu Fu
bac665e650
[TRTLLM-7412][feat] Turn off spec decode when the rolling average acceptance length drops below threshold. (#7283)
...
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
2025-10-13 15:51:14 -07:00
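The TRTLLM-7412 entry above gates speculative decoding on a rolling average of the accepted draft length. A minimal sketch of that kind of gate follows; the class name, window size, and threshold are illustrative assumptions, not the actual implementation from #7283.

```python
from collections import deque


class SpecDecodeGate:
    """Track a rolling average of accepted draft-token lengths and disable
    speculative decoding once the average drops below a threshold (hypothetical helper)."""

    def __init__(self, window: int = 128, min_avg_accept_len: float = 1.5):
        self.history = deque(maxlen=window)        # recent per-step accepted lengths
        self.min_avg_accept_len = min_avg_accept_len
        self.spec_decode_enabled = True

    def record(self, accepted_len: int) -> None:
        # Called once per decoding step with the number of draft tokens accepted.
        self.history.append(accepted_len)
        if len(self.history) == self.history.maxlen:
            avg = sum(self.history) / len(self.history)
            if avg < self.min_avg_accept_len:
                self.spec_decode_enabled = False   # fall back to normal decoding


# Usage: feed per-step acceptance lengths; check the flag before drafting again.
gate = SpecDecodeGate(window=4, min_avg_accept_len=2.0)
for step_accept in [3, 1, 1, 1]:
    gate.record(step_accept)
print(gate.spec_decode_enabled)  # False: rolling average 1.5 < 2.0
```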
Robin Kobus
db8c63b9b1
[TRTLLM-4517] [feat] Additional model outputs (#7206)
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-10-13 15:33:18 +02:00
amitz-nv
bbae7a05f0
[https://nvbugs/5521949][fix] Replace test_codellama_fp8_with_bf16_lora with test_llama_3_1_8b_fp8_with_bf16_lora (#8199)
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-10-13 06:01:55 -07:00
Po-Han Huang (NVIDIA)
6fc6f70a68
[https://nvbugs/5441729][test] Fix test_modeling_llama_min_latency.py failures (#7478)
...
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
2025-10-13 15:35:02 +08:00
xinhe-nv
9fe63dd8db
[None][chore] Add failed cases into waives.txt (#8290)
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-10-13 00:07:00 -07:00
Leslie Fang
8d1b068b1a
[TRTLLM-8477][chore] Replace KvCacheConfigCpp with KvCacheConfig inside PyExecutor (#8259)
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-10-13 14:55:36 +08:00
xinhe-nv
72fcff1044
[None][fix] add timeout for llama4 (#8254)
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-12 21:04:20 -07:00
Guoming Zhang
989c25fcba
[None][doc] Add qwen3-next doc into deployment guide and test case into L0. (#8288)
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Faradawn Yang <faradawny@gmail.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-10-13 10:25:45 +08:00
amitz-nv
fac47e2826
[https://nvbugs/5510879][fix] Fix pytorch & TRT-python flows fused LoRA adapter modules weight split with TP>1 (#8063)
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-10-12 12:29:52 -07:00
Eran Geva
a1ed03fe8a
[None][fix] AD test_trtllm_bench to use small model config and skip loading weights (#8149)
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-10-12 18:30:20 +03:00
Emma Qiao
fdbeea51d3
[None][infra] Skip failed cases for main branch (#8293)
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-12 08:04:09 -07:00
kris1025
a7ea544dbe
[TRTLLM-7384][feat] enable rejection sampling for CDL (#7731)
...
Signed-off-by: linquanh <linquanh@nvidia.com>
2025-10-12 20:38:48 +08:00
brb-nv
56a539cd37
[None][chore] Waive failing pre-merge test on main (#8282)
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-10-10 23:52:05 -07:00
Yilin Fan
2695d70d42
[None][feat] Add request timing breakdown option in benchmark_serving (#8128)
...
Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>
2025-10-10 09:24:54 -07:00
xinhe-nv
2655995a09
[None][fix] add gc for test fixture (#8220)
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-10 02:50:25 -07:00
bhsueh_NV
d3059dbd8a
[https://nvbugs/5547416][fix] unwaive no_cache test (#8213)
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-10-10 01:50:13 -07:00
xinhe-nv
b555f1ff98
[None][chore] Add failed cases into waives.txt (#8229)
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-09 23:45:28 -07:00
xinhe-nv
e8c9bae37e
[None][chore] Remove closed bugs (#8151)
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-10 16:39:40 +11:00
Pengbo Wang
7da4b05289
[https://nvbugs/5501820][fix] Add requirements for numba-cuda version to WAR mem corruption (#7992)
...
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2025-10-10 10:18:27 +08:00
Emma Qiao
ccd949ea5b
[None][infra] Waive failed tests on main 10/09 (#8230)
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-09 22:46:07 +08:00
amitz-nv
d560054e1b
[None][chore] Restore asserts in pytorch flow LoRA tests (#8227)
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-10-09 17:10:38 +03:00
bhsueh_NV
27677a36f5
[https://nvbugs/5516666][fix] unwaive some Qwen3 CI tests (#8130)
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-10-09 09:44:58 +08:00
Lizhi Zhou
fdf29ab8fa
[TRTLLM-7846][feat] HTTP disagg-cluster management implementation (#7869)
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-10-09 09:44:01 +08:00
QI JUN
6884d06aed
[None][ci] move some llama4 test cases to pre merge (#8189)
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-08 18:34:08 -07:00
Liao Lanyu
ed8e00ad4a
[https://nvbugs/5522746][fix] unwaive tests caused by node issues after rebooting (#8193)
...
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-10-09 08:45:56 +08:00