Commit Graph

3079 Commits

Author SHA1 Message Date
Jonas Yang CN
88ea2c4ee9
[TRTLLM-7349][feat] Adding new orchestrator type -- ray (#7520)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-10-04 08:12:24 +08:00
Lucas Liebenwein
9d098e3142
[None][feat] AutoDeploy: graph/module inputs with kwargs instead of args (#8137)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-03 16:53:42 -07:00
Lucas Liebenwein
2c454e8003
[None][feat] AutoDeploy: Nemotron-H accuracy test (#8133)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-03 15:39:03 -07:00
Michal Guzek
38da871db3
[TRTLLM-6496][feat] Add LoRa Torch tests for the latest NIM model list (#6806)
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
2025-10-03 12:10:48 -07:00
Mike Iovine
ca8291133a
[None][fix] Fix MTP 2-model (#8115)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-10-03 10:13:50 -07:00
Lucas Liebenwein
aaf2c3c2e5
[None][feat] AutoDeploy: compiler backends based on nn.Module (#8126)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-03 12:14:21 -04:00
Ziyi Xiong
7bc2d9e993
[https://nvbugs/5537878][fix] Reserve an extra slot for padded batch (#7998)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-10-03 08:42:52 -07:00
Suyog Gupta
d8215241d8
[None][feat] AutoDeploy add autotuning when capturing cudagraphs (#8120)
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-10-03 08:33:21 -07:00
Aurelien Chartier
9db4366903
[None][fix] Fix Qwen3 FP8 per-tensor when requesting TRTLLM-GEN MoE backend (#8075)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-10-03 07:52:52 -07:00
Lucas Liebenwein
5faa5e9dd8
[None][feat] AutoDeploy: dive deeper into token generation bugs + enable_block_reuse (#8108)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-03 04:57:26 -07:00
Robin Kobus
e2f69c5c23
[None] [refactor] Minor cleanup and improvements (#7619)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-10-03 11:40:06 +02:00
Erin
ba3dbb6c94
[https://nvbugs/5548098][fix] Fix flakey unit test for dynamic spec d… (#8129)
2025-10-02 22:58:37 -07:00
Nikita Korobov
9b3d7cc3e6
[None][feat] Update TRT-LLM Gen MoE kernels (#7970)
Signed-off-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>
2025-10-03 09:22:45 +08:00
Yilin Fan
01423ac183
[None][feat] perf_metrics endpoint functionality improvement (#8005)
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
Signed-off-by: nv-yilinf <206948969+nv-yilinf@users.noreply.github.com>
2025-10-02 17:43:25 -07:00
Grzegorz Kwasniewski
a5b59fd31d
[TRTLLM-6342][bug] Patched incorrect starcoder tp config (#8118)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-10-02 18:41:59 -04:00
Eran Geva
4136942436
[#7588][fix] fixed the kv cache size parsing in test_perf.py AD backend (#8092)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-10-02 15:55:31 -04:00
Patrice Castonguay
08a47918cf
[None][chore] Adding install_tensorrt.sh script to pip wheel (#8116)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-10-02 15:47:12 -04:00
Daniel Cámpora
ab433b7228
[None][fix] Fix access to new tokens in sampler. (#7958)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-10-02 15:41:21 -04:00
Patrice Castonguay
fefa7d8fa3
[None][feat] Support for cancelling requests with disaggregation (#8114)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-10-02 11:04:26 -07:00
dongfengy
6568e565db
[TRTLLM-7775][feat] Integrate tinygemm2 for gpt-oss (#7916)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-10-02 10:47:04 -07:00
yifeizhang-c
34d158b6da
[TRTLLM-6589][feat] Support CUDA graph for DeepEP (#7514)
Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
2025-10-02 10:13:24 -07:00
Erin
293637e0a1
[https://nvbugs/5556020][chore] waive test_eagle3 (#8119)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-10-02 05:33:21 -04:00
mpikulski
fc7f78c400
[TRTLLM-8269][test] do not explicitly pass temperature=0 to select greedy sampling (#8110)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-02 10:20:32 +02:00
Eran Geva
32c7f8c36f
[#7588][feat] lock gpu clocks in test_perf.py to reliably detect perf regressions (#8099)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-10-02 11:18:10 +03:00
Chang Liu
726ac07cc0
[https://nvbugs/5549081][fix] Fix device id assignment for some vision models (#8070)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
2025-10-01 23:28:05 -04:00
brb-nv
bd3d0ad233
[TRTLLM-7733][feat] Executor changes to support helix parallelism (#7972)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-10-01 22:13:03 -04:00
Izzy Putterman
1ad7bc4c78
[None][feat] Draft: Save state first pass (#7012)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-10-01 18:40:55 -04:00
Bo Deng
e107749a69
[None][fix] fix patchelf version issue (#8112)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-10-01 16:39:22 -04:00
Frida Hou
de99e23696
[#5860][feat] Add ModelOPT INT4 awq fake quant support in AutoDeploy (#7770)
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-10-01 13:13:45 -07:00
Yibin Li
d7581bb551
[TRTLLM-8031][feat] Add chunked return_generation_logits logic (#7831)
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-10-01 12:47:07 -04:00
Grzegorz Kwasniewski
6fd225833c
[TRTLLM-6342][bug] Fix shape propagation after TP sharding (#7912)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-10-01 11:15:46 -04:00
sychen52
ba8abeab10
[OMNIML-2336][feat] add W4A8 NVFP4 FP8 fused moe (#7968)
Signed-off-by: Shiyang Chen <shiychen@nvidia.com>
2025-10-01 02:39:33 -04:00
Patrice Castonguay
b77f19f4ff
[https://nvbugs/5434320][fix] fix: Unwaiving disagg pp tests (#8069)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-10-01 00:33:59 -04:00
Emma Qiao
b1e3fef8aa
[None][infra] Skip failed tests in post-merge for main (#8102)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-01 10:12:10 +08:00
Yechan Kim
e9e4632e44
[None][doc] Add more description on EXAONE usage (#8089)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-09-30 21:32:43 -04:00
peaceh-nv
808e556c79
[None][fix] : Fix OOM issue when dp padding is enabled (#8052)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-10-01 09:10:00 +08:00
brb-nv
84aa3c981e
[None][chore] Waive failing MNNVL alltoall multi-gpu test (#8106)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-09-30 20:05:42 -04:00
mpikulski
ee5ae49337
[TRTLLM-8269][fix] Revert "do not explicitly pass temperature=0 to select greedy sampling" (#8103)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-09-30 16:53:49 -04:00
Guoming Zhang
b4be0d2e4c
[None][chore] Refine qwen3-next implementation. (#8064)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-09-30 15:05:13 -04:00
Iman Tabrizian
c510b67fa0
[https://nvbugs/5547414][fix] avoid downloading Tiny llama from HF (#8071)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-09-30 13:47:59 -04:00
Yiqing Yan
1560cca227
[None][chore] Bump version to 1.2.0rc1 (#8097)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-09-30 06:00:25 -04:00
Yechan Kim
948b8b9569
[None][fix] Fix CUDA graph for Qwen2.5-VL (#8047)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-09-30 14:40:03 +08:00
xinhe-nv
1dba9fa89e
[TRTLLM-6239][feat] add test cases into QA test list (#8081)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-30 00:23:45 -04:00
Kaiyu Xie
b0cb9ca50e
[None] [test] Add MNNVL AlltoAll tests to pre-merge (#7466)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
Co-authored-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-09-29 23:12:24 -04:00
Lucas Liebenwein
dcfd3ef81c
[#4593][feat] AutoDeploy: Linear Attention Support (SSM + causal_conv + Bamba + Nemotron-H) (#8068)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-09-29 22:41:06 -04:00
Cao Dong
62010c0ab7
[None][feat] Return topk logprobs in torch backend (#7976)
Signed-off-by: Cao Dong <87467313+dcaox@users.noreply.github.com>
2025-09-30 09:32:37 +08:00
Cheng Hang
cdce68c3e0
[TRTLLM-6741][fix] Add heuristics for lm head tp size when enable_lm_head_tp_in_adp=True (#7891)
Signed-off-by: Cheng Hang <chang@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-30 09:24:35 +08:00
Patrice Castonguay
6396cb9208
[https://nvbugs/5538098][fix] Checking connection to etcd server in unit test (#8006)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-09-29 20:53:32 -04:00
Chang Liu
334e2cab0d
[https://nvbugs/5542867][fix] Fix the non-determinism issue in the mm_encoder test (#8033)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-09-29 09:45:16 -07:00
amitz-nv
e5f9b6aaa0
[None][fix] Fix TRT-python multi LoRA TP=2 test arguments (#8059)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-09-29 12:20:04 -04:00