Commit Graph

4570 Commits

Author SHA1 Message Date
Ziyi Xiong
d8b5aeb061
[https://nvbugs/5652062][fix] Rewind kv_cache and reset draft tokens (#10160)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-12-25 09:13:51 -05:00
ZhichenJiang
46e4af5688
[TRTLLM-9831][perf] Enable 2CTA with autotune for CuteDSL MoE and Grouped GEMM optimizations (#10201)
Signed-off-by: zhichen jiang <zhichenj@NVIDIA.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-25 09:04:20 -05:00
Lizhi Zhou
fe12faef81
[https://nvbugs/5752516][chore] unwaive test; fix port conflicts in CI (#10152)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-12-25 08:16:09 -05:00
Iman Tabrizian
cd5cd60ee4
[None][infra] Move install_boost from install_triton.sh to install_base.sh (#10055)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-12-25 08:09:55 -05:00
Zhenhuan Chen
8462cf6c96
[TRTLLM-9578][feat] make PDL enabled by default (#9695)
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2025-12-25 07:15:24 -05:00
Jatin Gangani
97b38ac403
[None][doc] Update IFB performance guide & GPTOSS deployment guide (#10283)
Signed-off-by: Jatin Gangani <jgangani@dc2-container-xterm-014.prd.it.nvidia.com>
Co-authored-by: Jatin Gangani <jgangani@dc2-container-xterm-014.prd.it.nvidia.com>
2025-12-25 05:52:04 -05:00
Emma Qiao
0ecdb69b93
[None][infra] Waive failed tests for main on 12/25 (#10298)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-25 05:22:39 -05:00
Xianjie Qiao
53b81783b1
[None][fix] Fix pageable H2D memcopy issue on GB200 (#10289)
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
2025-12-25 18:15:57 +08:00
Jie Li
83e02ee335
[None][chore] Remove NIM TRT-Backend Test Lists (#10232)
Signed-off-by: Jie Li <lijie@nvidia.com>
2025-12-25 04:01:51 -05:00
Enwei Zhu
182b3eb633
[None][ci] Waive TestLlama3_1_8B::test_auto_dtype[False-2] for timeout (#10293)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-25 02:35:18 -05:00
Gabriel Wu
1d01214ff0
[None][feat] Drop non-deepgemm fp8 block scale gemm (#10256)
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>
2025-12-25 14:52:52 +08:00
xinhe-nv
4ae6f6a46c
[None][chore] Add failed cases into waives.txt (#10249)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-12-25 01:26:21 -05:00
heyuhhh
7395ca93b6
[None][doc] Add Sparse Attention feature doc (#9648)
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-12-25 00:26:18 -05:00
Venky
c059e6caa1
[TRTC-121][feat] Add recipe selector UI to complement the recipe database (#10125)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-12-24 23:56:54 -05:00
gramnarayan
a9eb5afc9f
[#9241][feat] AutoDeploy: Support Eagle3 Speculative Decoding (#9869)
Supports the two-model flow without the overlap scheduler or chain drafter; the draft model runs in the PyTorch backend (see the sketch after this entry).

Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
2025-12-24 23:30:42 -05:00
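A minimal sketch of the two-model draft-and-verify loop this feature refers to, assuming greedy acceptance, batch size 1, and models that return logits of shape [batch, seq, vocab]; `speculative_step`, `draft_model`, and `target_model` are illustrative placeholders, not AutoDeploy or TensorRT-LLM APIs.

```python
# Hypothetical sketch of two-model speculative decoding (Eagle3-style draft/verify),
# not the TensorRT-LLM implementation: draft k tokens, verify them in one target pass.
import torch

@torch.no_grad()
def speculative_step(draft_model, target_model, tokens: torch.Tensor, k: int = 3):
    """Propose k draft tokens with the small model, then verify them in a
    single forward pass of the target model (batch size 1, greedy acceptance)."""
    draft = tokens
    for _ in range(k):                                  # autoregressive drafting
        logits = draft_model(draft)[:, -1]              # [1, vocab] for the last position
        draft = torch.cat([draft, logits.argmax(-1, keepdim=True)], dim=-1)
    proposed = draft[:, tokens.shape[1]:]               # the k proposed tokens
    target_logits = target_model(draft)                 # one verification pass
    # Keep proposals only while they match the target model's greedy choice.
    preds = target_logits[:, tokens.shape[1] - 1:-1].argmax(-1)
    accept_len = int((preds == proposed).int().cumprod(-1).sum())
    return torch.cat([tokens, proposed[:, :accept_len]], dim=-1)
```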
TensorRT LLM
1f8ed71d5f
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-25 03:08:38 +00:00
Emma Qiao
16fd781e42
[TRTLLM-9862][infra] Move single-gpu tests on rtxpro6000d to pre-merge (#9897)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-24 21:45:33 -05:00
Ziyi Xiong
43178590d1
[TRTLLM-10143][feat] Reuse previous draft requests if possible (#10263)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-12-24 17:48:38 -08:00
Neta Zmora
c4b36d31ff
[#10137][feat] AutoDeploy FP8 MoE refactor (#10138)
The trtllm (Cutlass) FP8 MoE operator performs the W3+W1 fusion (concat) during inference; this change moves the fusion to model-optimization time.

The Cutlass MoE kernel is used through a trtllm torch operator. Its implementation uses two FC operations (fc1 and fc2), while the canonical MoE API defines three GEMM operations and their associated weights (W1, W2, W3). So when switching from the torch.moe op to the trtllm.moe op, the terminology also changes from w1, w2, w3 to fc1 and fc2 (see the sketch after this entry).

Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-12-24 18:58:10 +02:00
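A minimal sketch of moving the W3+W1 fusion to model-optimization time, assuming a plain PyTorch representation of the expert weights; `fuse_gate_up` and the [W3; W1] stacking order are illustrative assumptions, not the trtllm operator's actual layout.

```python
# Hypothetical offline fusion of the gate (W1) and up (W3) projections into one
# fc1 weight, so the kernel runs a single fused GEMM instead of concatenating
# at inference time. Names and layout are assumptions for illustration only.
import torch

def fuse_gate_up(w1: torch.Tensor, w3: torch.Tensor) -> torch.Tensor:
    """Stack one expert's W3 and W1 along the output dimension to form fc1."""
    # Canonical MoE: h = W2 @ (act(W1 @ x) * (W3 @ x)); fc2 corresponds to W2.
    return torch.cat([w3, w1], dim=0)

# Per-expert fusion applied once at model-optimization time.
num_experts, hidden, inter = 8, 512, 1024
w1 = torch.randn(num_experts, inter, hidden, dtype=torch.float16)
w3 = torch.randn(num_experts, inter, hidden, dtype=torch.float16)
fc1_weight = torch.stack([fuse_gate_up(w1[e], w3[e]) for e in range(num_experts)])
assert fc1_weight.shape == (num_experts, 2 * inter, hidden)
```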
Necofish
8614cd3439
[None][fix] Resolve GPU memory imbalance in concurrent weight loading (#6472)
Signed-off-by: Necofish <liuxiangyang@mail.ustc.edu.cn>
Signed-off-by: Nekofish-L <liuxiangyang@mail.ustc.edu.cn>
Signed-off-by: Jie Li <lijie@nvidia.com>
Co-authored-by: Jie Li <lijie@nvidia.com>
2025-12-24 09:43:09 -05:00
Suyog Gupta
e2891a6c77
[#10052][feat] AutoDeploy enable cudagraphs for flashinfer BatchDecode (#10193)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-12-24 05:55:09 -08:00
Stanley Sun
ddac4d7379
[None][test] Add disag-serving auto scaling qa test (#10262)
Signed-off-by: Stanley Sun <stsun@nvidia.com>
2025-12-24 08:43:47 -05:00
Yiqing Yan
69152c4e7c
[None][infra] Check GB200 coherent GPU mapping (#10253)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-12-24 17:12:36 +08:00
tcherckez-nvidia
56ef97e06e
[#10246][feature] Move AD dashboard to use cudagraph compile backend (#10267)
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2025-12-24 11:09:59 +02:00
Jonas Li
ecea71ca7a
[None][chore] Update tinygemm kernel name (#10248)
Signed-off-by: Jonas Li <6110159+longlee0622@users.noreply.github.com>
2025-12-24 02:33:25 -05:00
shuyixiong
f4f0fe85e9
[TRTLLM-9737][chore] Add rl perf reproduce script and enhance the robustness of Ray tests (#9939)
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
2025-12-24 15:27:01 +08:00
xinhe-nv
534700ecd9
[None][chore] Add failed cases into waives.txt (#10240)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-12-24 02:21:50 -05:00
Yukun He
595daa5089
[TRTLLM-9615][feat] Support synchronization through PP ranks in the distributed tuning system (#10011)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-24 15:03:10 +08:00
Fanrong Li
156f6453dc
[TRTLLM-9798][feat] Change to use new DeepGEMM MQA sm100 kernel for MTP-3 (#10226)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-12-24 14:39:12 +08:00
zackyoray
f6c3bc16b9
[None][docs] Add NIXL-Libfabric Usage to Documentation (#10205)
Signed-off-by: Yoray Zack <62789610+zackyoray@users.noreply.github.com>
2025-12-23 23:05:40 -05:00
Emma Qiao
7b84e48e0f
[None][infra] Waive failed cases on 12/24 (#10257)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-23 22:49:57 -05:00
TensorRT LLM
68cf5c7924
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-24 03:08:16 +00:00
xinhe-nv
fc1f77eafc
[None][chore] Add failed cases into waives.txt (#10204)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
2025-12-24 10:37:23 +08:00
Balaram Buddharaju
8c1cfc872b
[TRTLLM-9493][feat] Custom AllToAll for helix parallelism (#9986)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-23 18:14:30 -08:00
Jhao-Ting Chen
92d90fa29a
[None][feat] Expose enable_trt_overlap in Triton_backend, bringing 1.05x OTPS (#10018)
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-12-23 11:41:31 -06:00
Grzegorz Kwasniewski
0027a01ad5
[https://nvbugs/5680312][fix] Updated test waiving (#9630)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-12-23 09:38:12 -08:00
Grzegorz Kwasniewski
06900a7f19
[TRTLLM-9565][fix] Fix deepseek sharding (#9984)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-12-23 10:28:14 -05:00
Emma Qiao
984c20e0b2
[None][infra] Waive failed cases on 12/23 (#10236)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-12-23 08:48:54 -05:00
dongfengy
e284d0bf80
[None][infra] Waive flaky unittest/executor/test_rpc_proxy.py and unittest/executor/test_rpc_worker.py tests (#10209)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-12-23 07:43:13 -05:00
tcherckez-nvidia
64bb1a5155
[None][chore] Update AD coverage to use torch-cudagraph (#10233)
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2025-12-23 07:20:32 -05:00
Roey Azran
8408c40d8b
[https://nvbugs/5702786][fix] Fix race conditions in KV cache communication during unexpected termination (#10076)
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
2025-12-23 14:09:51 +02:00
Xianjie Qiao
871c6b435c
[None][feat] Skip batch_tokenize_prompts in CustomDataset (#10214)
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
2025-12-23 17:40:57 +08:00
Yukun He
522f1d2bc3
[https://nvbugs/5764627][chore] waive the time-out test (#10222)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-23 16:36:06 +08:00
Balaram Buddharaju
f2e00a75de
[None][chore] Remove helix test from rtx test list (#10224)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-23 03:07:37 -05:00
Shiyu Li
3ddc9d2b48
[https://nvbugs/5729697][fix] MNNVL Allreduce: use CUDA runtime instead of Macro to get SM version. (#10062)
Signed-off-by: Shiyu Li <shili@nvidia.com>
2025-12-23 16:07:07 +08:00
chenfeiz0326
48c875f8ea
[None][fix] Add OpenSearch URL in slurm_launch.sh for Multinode Perf Sanity Test (#9990)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-12-23 16:02:38 +08:00
Bo Li
cc1323be24
[None][fix] Fix the bug for top_k=10 in NVLinkOneSided AlltoAll. (#10197)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-23 02:13:37 -05:00
Yiqing Yan
59b05dc0a8
[None][chore] Bump version to 1.2.0rc7 (#10216)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-12-23 15:07:47 +08:00
Chuang Zhu
53db3b2612
[https://nvbugs/5741884][fix] unwaive disagg sampler (#10189)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-12-23 14:38:07 +08:00
xinhe-nv
77b591f73b
[None][chore] Add failed cases into waives.txt (#10177)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Jie Li <lijie@nvidia.com>
Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
Co-authored-by: Jie Li <lijie@nvidia.com>
Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>
2025-12-23 13:43:50 +08:00