QI JUN
39248320d4
[None][feat] add an example of KV cache host offloading ( #7767 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-17 13:51:15 +08:00
Zhenhuan Chen
6983e8a00d
[ https://nvbugs/5517260 ][fix] move scaffolding contrib module's import to subdirectory ( #7758 )
...
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
2025-09-17 11:36:33 +08:00
QI JUN
bd7aad4988
[None][ci] waive test_llm_gemma_1gpu_summary_vswa ( #7781 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-17 10:48:31 +08:00
Lucas Liebenwein
4c3dc89f84
[None][chore] AutoDeploy: clean up of model unit test configuration ( #7742 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-09-17 10:42:01 +08:00
Kaiyu Xie
62042a9733
[TRTLLM-6741] [feat] enable LM tp for MTP, under attention dp case (cherry-pick #7128 ) ( #7571 )
...
Signed-off-by: Cheng Hang <chang@nvidia.com>
Co-authored-by: Cheng Hang <chang@nvidia.com>
2025-09-17 09:41:32 +08:00
Yukun He
6313c9799c
[ https://nvbugs/5488582 ][fix] Cherry-pick 7495: Avoid unexpected Triton recompilation in DG fused_moe ( #7708 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-09-17 09:00:28 +08:00
Shiyu Li
8bdbb48264
[ https://nvbugs/5489015 ][fix] Support communicator split in MNNVL allreduce and fix the binding issues. ( #7387 )
...
Signed-off-by: Shiyu Li <shili@nvidia.com>
2025-09-17 07:43:20 +08:00
Iman Tabrizian
a91453de34
[None][waive] Waive tests ( #7775 )
...
Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
2025-09-16 19:42:32 -04:00
HuiGao-NV
a49cfb3e68
[ https://nvbugs/5516666 ][fix] cherrypick fix to the CUDA graph warmup issue when using speculative decoding ( #7737 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
Co-authored-by: Signed-off-by: Hui Gao <huig@nvidia.com>
2025-09-17 06:24:20 +08:00
yuanjingx87
eeb89a167c
[None][infra] Add nightly pipeline to generate lock files ( #5798 )
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-09-16 15:00:03 -07:00
yuanjingx87
88d9d77912
[None][infra] Update CI allowlist 2025-09-16 ( #7773 )
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-09-16 14:55:30 -07:00
Chang Liu
98f533453a
[TRTLLM-7398][doc] Add doc for KV cache salting support ( #7772 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-09-16 14:49:14 -07:00
yuanjingx87
0f30d7dd6f
[None][infra] add nspect allow list for false positive secrets ( #5797 )
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-09-16 13:23:01 -07:00
Aurelien Chartier
471723bce1
[None][chore] Remove unused get_quant_scales methods ( #7687 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-09-16 12:56:11 -07:00
Lucas Liebenwein
9befd1a72f
[None][chore] AutoDeploy: neat disablement of transforms in pipeline ( #7736 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-09-16 23:31:48 +08:00
Iman Tabrizian
6ce0624208
[TRTLLM-8044][refactor] Rename data -> cache for cacheTransceiver ( #7659 )
2025-09-16 08:43:56 -04:00
bhsueh_NV
8226ef23dc
Revert "[None][feat] support attention dp for qwen3 dense model" ( #7765 )
2025-09-16 19:09:04 +08:00
xinhe-nv
e7c1569456
[None][chore] Add failed cases into waives.txt ( #7746 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-16 18:43:40 +08:00
Ziyi Xiong
905bb26bbd
[ https://nvbugs/5471106 ][fix] Remove the waivers ( #7711 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-09-16 17:43:39 +08:00
xinhe-nv
c6ab2072b5
[None][fix] waive hang tests on main ( #7720 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-16 17:05:15 +08:00
xinhe-nv
1fbea497ff
[TRTLLM-7070][feat] add gpt-oss serve benchmark tests ( #7638 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-16 16:39:31 +08:00
amitz-nv
750d15bfaa
[ https://nvbugs/5503529 ][fix] Change test_llmapi_example_multilora to get adapters path from cmd line to avoid downloading from HF ( #7740 )
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-09-16 16:35:13 +08:00
Kaiyu Xie
6eef19297f
[None] [chore] cherry pick changes on slurm scripts from release/1.1.0rc2 ( #7750 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-09-16 16:07:13 +08:00
Li Min
b278d06481
[TRTLLM-6898][feat] Add Cute DSL nvfp4 linear op ( #7632 )
...
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
2025-09-16 14:25:26 +08:00
Guoming Zhang
085271eceb
[None][doc] Clean the doc folder and move the outdated docs into lega… ( #7729 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-09-16 11:43:19 +08:00
Bo Li
3f4e160cba
[None][chore] Fix error when running trtllm-bench without cuda graph. ( #7725 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-09-15 20:30:23 -07:00
Void
103b554734
[None][fix] Ensure that the W4A8 custom input scale remains aligned across all ranks ( #7614 )
...
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2025-09-16 11:04:26 +08:00
xinhe-nv
cf55927064
[None][chore] Add failed cases into waives.txt ( #7735 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-16 10:58:06 +08:00
Yanchao Lu
e5cead1eb9
[TRTLLM-6295][test] Exit as early as possible and propagate exit status correctly for multi-node testing ( #7739 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-16 09:59:18 +08:00
xiweny
c076a02b38
[TRTLLM-4629] [feat] Add support of CUDA13 and sm103 devices ( #7568 )
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Signed-off-by: Daniel Stokes <dastokes@nvidia.com>
Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com>
Signed-off-by: Xiwen Yu <xiweny@nvidia.com>
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com>
Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Co-authored-by: Daniel Stokes <dastokes@nvidia.com>
Co-authored-by: Zhanrui Sun <zhanruis@nvidia.com>
Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-09-16 09:56:18 +08:00
Shi Xiaowei
809c4d20c0
[None][doc] Fix the link in the doc ( #7713 )
...
Signed-off-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com>
2025-09-16 09:50:25 +08:00
Necofish
96f11b10ae
[None][feat] support attention dp for qwen3 dense model ( #7618 )
...
Signed-off-by: Nekofish-L <liuxiangyang@mail.ustc.edu.cn>
2025-09-16 09:33:22 +08:00
QI JUN
44d5ccfdd9
[None][ci] move qwen3 tests from GB200 to B200 ( #7733 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-16 08:12:28 +08:00
Ziyi Xiong
536e8776cd
[TRTLLM-6668][feat] Enable overlap scheduler for two-model spec decoding ( #7651 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-09-16 07:33:44 +08:00
Lucas Liebenwein
857c0b45be
[None][infra] AutoDeploy: codeowners for autodeploy unit tests ( #7743 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-09-15 11:20:12 -07:00
Izzy Putterman
8097be7e9c
[None][feat] Eagle, use last hidden post norm ( #7546 )
...
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-09-15 12:23:57 -04:00
Yanchao Lu
0c9430e5a5
[None][ci] Test waives for the main branch 09/15 ( #7709 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-15 22:13:56 +08:00
jmydurant
7deefb3d2b
[TRTLLM-7192][feat] optimize MLA chunked prefill && support fp8 mla chunked prefill ( #7477 )
...
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-09-15 21:43:49 +08:00
Zheng Duan
24fc1f9acf
[None][fix] using arrival time in llmapi when creating LlmRequest in pytorch workflow ( #7553 )
...
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-09-15 07:26:01 -04:00
Wanli Jiang
e080294725
[TRTLLM-7918][feat] Revert "Support kvcache reuse for phi4mm ( #7563 )" ( #7722 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-09-15 17:19:44 +08:00
ixlmar
965a3dab90
[None][test] add test for min_tokens ( #7678 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-09-15 08:59:23 +01:00
Wanli Jiang
fc9f4c9295
[TRTLLM-7918][feat] Support kvcache reuse for phi4mm ( #7563 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-09-15 15:47:00 +08:00
HuiGao-NV
335c007df8
[None][chore] move some cases from post-merge to pre-merge to detect errors in early stage ( #7699 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-09-15 15:37:58 +08:00
DylanChen-NV
d5df0af017
[ https://nvbugs/5467981 ][fix] Fix Qwen2.5-VL fails with cuda graph padding ( #7122 )
...
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
2025-09-15 15:02:34 +08:00
Ivy Zhang
ddfe0320b3
[TRTLLM-7279][test] add accuracy test for deepseek-r1 with chunked_prefill ( #7365 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-09-15 13:38:52 +08:00
JunyiXu-nv
a2c45d82c3
[None][chore] Enable multiple postprocess workers tests for chat completions api ( #7602 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-09-15 12:16:44 +08:00
xinhe-nv
b69e3e9f99
[None][chore] Add failed cases into waives.txt ( #7682 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-09-15 11:44:52 +08:00
Chang Liu
47e37755a3
[TRTLLM-6903][feat] Support chunked prefill for multimodal models ( #6843 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-09-14 20:10:10 -07:00
Perkz Zheng
1b29c2e731
[None][feat] support gpt-oss with fp8 kv cache ( #7612 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-09-15 02:17:37 +08:00
Yanchao Lu
70aa4e28c1
[None][ci] Test waives for the main branch 09/14 ( #7698 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-14 23:48:04 +08:00