4140 Commits

Author SHA1 Message Date
Yuxian Qiu
cd4e639536
[None][feat] Async pp send. (#9952)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-12-13 00:52:30 +08:00
Chuang Zhu
4cc4cbe926
[https://nvbugs/5716787][fix] terminate nixl running when exiting (#9785)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Co-authored-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-12-12 11:15:02 -05:00
Chuang Zhu
9c59c9f920
[https://nvbugs/5643787][fix] remove the war path for notify to itself (#9834)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-12-12 11:10:05 -05:00
JunyiXu-nv
2fec53dfa5
[TRTLLM-9637][feat] Support tool parser for Kimi K2 (#9830)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-12 23:32:39 +08:00
Yihan Wang
9df4dad3b6
[None][fix] Introduce inline namespace to avoid symbol collision (#9541)
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
2025-12-12 23:32:15 +08:00
Balaram Buddharaju
af315d8ef1
[TRTLLM-5972][chore] Load balance decode token KV cache with helix parallelism (#9757)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-12 22:29:05 +08:00
zackyoray
d5b9ad91c9
[None][feat] Upgrade NIXL to v0.8.0 (#9707)
Signed-off-by: Yoray Zack <62789610+zackyoray@users.noreply.github.com>
Signed-off-by: zackyoray
Signed-off-by: Bo Deng
Co-authored-by: Bo Deng
2025-12-12 20:21:10 +08:00
Lucas Liebenwein
e767fc649a
[None][feat] AutoDeploy: prepare_metadata revisited (#9764)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-12-12 20:14:14 +08:00
Yukun He
a6263a127f
[None][chore] Degrade log level in cublas fp4 runner when using default configs (#9951)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-12 18:53:54 +08:00
ruodil
9b3e5e90ee
[None][test] fix a typo in model name in script (#9867)
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-12-12 17:35:55 +08:00
chenfeiz0326
61745f034a
[https://nvbugs/5727481][ci] Fix Port Conflict in Perf-Sanity CI Test (#9896)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-12-12 17:16:50 +08:00
kris1025
2fc94e5dd7
[None][chore] unwaive qwen3 accuracy test (#9895)
Signed-off-by: linquanh <linquanh@nvidia.com>
2025-12-12 16:30:09 +08:00
yufeiwu-nv
fd3d3a553d
[None][chore] Modify python ipc_util to align with C++ path (#9894)
Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-12-12 15:55:22 +08:00
Yihan Wang
711016c799
[https://nvbugs/5736923][infra] Waive timeout disaggregated/test_auto_scaling[http-round_robin] test (#9942)
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
2025-12-12 15:15:13 +08:00
yuanjingx87
eeb03f314a
[None][infra] Replace the deprecated github token (#9915)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-12-11 22:46:14 -08:00
Yifei Wang
9d1f2a9925
[#6425][fix] address CUDA stream sync issue in ModelRunnerCPP (#6426)
Signed-off-by: yifei.w <yifei.w@bytedance.com>
2025-12-12 13:33:22 +08:00
Ivy Zhang
fded6c393d
[TRTLLM-9262][test] add groupgemm ada case for rcca (#9833)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-12-12 13:23:33 +08:00
Kaiyu Xie
110820bb15
[TRTLLM-9792] [feat] Support multiple instances on single node for slurm scripts (#9900)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-12-12 12:12:08 +08:00
Chuang Zhu
bd441e9822
[None][infra] revert ucx to 1.19 (#9936)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-12-12 11:37:19 +08:00
Yiteng Niu
3e39afea9a
[None][infra] update nspect version for api change (#9899)
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
2025-12-12 11:27:42 +08:00
dominicshanshan
093465ed29
[https://nvbugs/5599176][fix] Unwaive fixed test for Ray (#9861)
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-12-12 11:24:05 +08:00
TensorRT LLM
0132769c22
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-12-12 03:20:43 +00:00
Yiqing Yan
5065b60cd1
[None][infra] Fix mergeWaiveList stage (#9892)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-12-12 11:19:42 +08:00
xinhe-nv
e8efeb765d
[TRTLLM-9717][fix] fix multi nodes tests cases (#9736)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-12-12 10:14:23 +08:00
Chuang Zhu
4670e0c297
[None][infra] update ucx to 1.20 (#9786)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-12-12 09:49:46 +08:00
JunyiXu-nv
710c592d7c
[https://nvbugs/5727517][fix] Preserve ip:port for disagg (#9859)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-12 09:45:34 +08:00
Kanghwan
98c68c195b
[None][infra] Ignore comments from bots and CI accounts (#9929)
Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
2025-12-12 09:20:51 +08:00
jthomson04
4f6d4da035
[None][perf] Fix TPOT when min_tokens set (#9862)
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2025-12-11 13:55:31 -08:00
Kanghwan
95d928f071
[None][infra] Add workflow to auto-label 'waiting for feedback' on team comments (#9886)
Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
2025-12-12 05:43:30 +08:00
Venky
fd1270b9ab
[TRTC-43] [feat] Add config db and docs (#9420)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-12-12 04:00:03 +08:00
Simeng Liu
24f92721f2
[https://nvbugs/5597647][ci] Unwaive fixed tests. (#9812)
Signed-off-by: SimengLiu-nv <simengl@nvidia.com>
2025-12-12 02:29:30 +08:00
Erin
89dabf5aa1
[TRTLLM-9736][feat] AsyncLLM and verl integ (#9353)
Signed-off-by: Liwei Ma <liweim@nvidia.com>
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Liwei Ma <liweim@nvidia.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-12-11 09:33:25 -08:00
JadoTu
02edb19f43
[None] [feat] add eos_token_id in generation_config to sampling params (#9514)
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
2025-12-12 00:52:03 +08:00
xxi
488d38f88d
[TRTLLM-8959][feat] ConfigurableMoE support CUTLASS (#9772)
2025-12-12 00:22:13 +08:00
Fanrong Li
af2849cc7a
[None][doc] Add DeepSeek-V3.2 to the supported models (#9893)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-12-11 18:04:48 +08:00
Yan Chunwei
04a39a4e2b
[None][chore] enable test_ipc.py (#9865)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-12-11 17:47:14 +08:00
Zongfei Jing
c76b428e2e
[TRTLLM-9685] [feat] Add gather fc1 kernel by cuteDSL (#9618)
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-12-11 16:21:32 +08:00
ChristinaZ
b8a5159fad
[None][feat] Enable PDL for indexer topK (#9843)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-12-11 14:31:23 +08:00
Kanghwan
d147ad053e
[#2730][fix] Fix circular import bug in medusa/weight.py (#9866)
Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
2025-12-11 13:51:08 +08:00
JunyiXu-nv
454e7e59e5
[https://nvbugs/5718004][fix] Add warmup for cancellation test (#9860)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-11 12:20:33 +08:00
Ziyi Xiong
81222c3670
[None] Fix warning when capturing CUDA graph (#9746)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-12-10 19:22:38 -08:00
Bo Deng
c1d53ee43d
[https://nvbugs/5582258][fix] unwaive (#9650)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-12-10 19:18:30 -08:00
fredricz-20070104
341cb1a12c
[None][chore] Add GB300 support since it does not support segment (#9731)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-12-10 18:36:55 -08:00
Patrice Castonguay
2c0293c612
[https://nvbugs/5601682][fix] Unwaiving disagg test (#9627)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-12-10 13:42:26 -05:00
Tian Zheng
ece3a8748f
[None][doc] Update doc for NVFP4 KV cache (#9475)
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2025-12-10 06:20:12 -08:00
cheshirekow
2f030312a8
[TRTLLM-9228][infra] Verify thirdparty C++ process (#9367)
Signed-off-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>
Co-authored-by: Josh Bialkowski <1309820+cheshirekow@users.noreply.github.com>
2025-12-10 21:01:19 +08:00
Yiqing Yan
1c11cae54d
[None][chore] bump version to 1.2.0rc6 (#9874)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-12-10 04:53:26 -08:00
Yukun He
072f236002
[None][fix] Fully resolve the tactic recovery issues in AutoTuner serialized cache (#9835)
Restrict tactic types to those compatible with AutoTuner cache serialization and deserialization.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-10 20:41:04 +08:00
Matt Lefebvre
df1adfbb50
[TRTINFRA-7328][infra] - Move half B200 tests to lbd (#9853)
Signed-off-by: Matt Lefebvre <mlefebvre@nvidia.com>
2025-12-10 04:24:30 -08:00
Brian K. Ryu
8cec2da375
[None][feat] Port fp4 quantization kernel optimization from FlashInfer (#9854)
Signed-off-by: Brian Ryu <bryu@nvidia.com>
Co-authored-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>
2025-12-10 13:13:48 +01:00