Commit Graph

1480 Commits

Author SHA1 Message Date
Daniel Cámpora
fdf1c47d1d
[TRTLLM-4995][feat] TRTLLM Sampler log probs support (#4836)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-06-11 08:18:13 +02:00
Enwei Zhu
00991d1520
chore: Merge remaining changes from feat/large-ep branch to main (#5039)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
Co-authored-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
Co-authored-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Co-authored-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
2025-06-11 13:47:43 +08:00
Yiqing Yan
0a9f105931
Waive L0 tests (#5111)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-11 11:53:15 +08:00
Zhanrui Sun
035b048a65
infra: Add timeout and retry for wget in docker image build (#5035)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-06-11 10:37:13 +08:00
ChristinaZ
273c6b9355
[https://nvbugspro.nvidia.com/bug/5332927][fix] Fix the bug in the routing unit test (#5065)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-06-11 09:44:35 +08:00
Zheng Duan
580a92521e
test: conditional disagg and cache aware balancing for deepseek v3 (#4522)
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
2025-06-11 09:44:29 +08:00
Bo Li
1b79041f5d
fix: XQA is not enabled when history_length < kMinHistoryTokensPerBlock. (#4264)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-06-11 09:38:10 +08:00
Mike Iovine
fcd71921f1
[fix] Unwaive test_llama_eagle3 (#5042)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-06-10 18:11:07 -04:00
Izzy Putterman
6cb2b7d370
CI: Allow run (#5101)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-06-11 06:03:38 +08:00
Jinyang Yuan
194a708d83
[fix] Fix test_attention_mla (#5084)
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-06-10 14:20:11 -07:00
Linda
50f576172b
doc: add info about stop words appearing in output (#4956)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-06-10 22:38:33 +02:00
nvpohanh
7b210ae9c3
test: add unit tests for Llama4 min_latency code (#4980)
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
2025-06-10 12:10:26 -07:00
Lucas Liebenwein
7ddc4d6282
[AutoDeploy] Merge Feature Branch Week 3 (#5054)
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-06-11 00:20:43 +08:00
Tracin
6c91f1c7ac
Mxfp8xmxfp4 quant mode(#4978)
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-10 22:01:37 +08:00
liji-nv
f6a49a9343
[CI] waive failing L0 test (#5089)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-06-10 20:40:44 +08:00
Zongfei Jing
6d1f2d0fd7
[TRTLLM-3927] [feat] Finalize + Allreduce + add + rmsnorm fusion (#4756)
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-06-10 19:55:16 +08:00
Yuxian Qiu
08dc369a4d
fix: pytorch_backend_config is deprecated in update_llm_args_with_extra_dict. (#4890)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-06-10 18:40:29 +08:00
Aurelien Chartier
dcf72c6ad3
chore: cleanup GDS Cmake interface (#4928)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-06-10 17:25:43 +08:00
Yiqing Yan
8ec8e4559d
Waive L0 test (#5077)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-10 16:23:49 +08:00
tomeras91
f121f13ddf
[nvbug 5325284][fix] Increase Nemotron-H warmup request robustness (#4954)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-06-10 11:09:37 +03:00
Yiqing Yan
fdfc711261
Waive L0 test (#5067)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-10 15:40:57 +08:00
dongxuy04
7137cc8f67
fix cuda driver link issue with driver version less than 12.3 (#5025)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-06-10 15:27:39 +08:00
Xiaowei Wang
ec6b1821c7
[fix] Fix W4A8 weight loading error in WInt4AFP8FusedMoEMethod (#5026)
Signed-off-by: Xiaowei Wang <100599594+xiaoweiw-nv@users.noreply.github.com>
2025-06-10 15:09:06 +08:00
QI JUN
12ffdcbf53
CI: waive test_ad_build_small_multi (#5071)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-10 14:54:05 +08:00
Simeng Liu
86959ef1e4
chore: Waive CI failure. (#5069)
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-06-10 14:04:10 +08:00
pcastonguay
87c56ab024
perf: Removing initializing ptuning buffers to zero (#4915)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-06-09 21:57:21 -04:00
Stanley Sun
74b0e71ef4
test: add more disaggregated serving tests into QA testlist (#5036)
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-06-10 09:24:53 +08:00
Daniel Cámpora
d68b8180d3
feat: port MakeDecodingBatchInputOutput to python in TRTLLMSampler (#4828)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-06-10 07:28:34 +08:00
pcastonguay
6d4d179cac
[TRTLLM-5518] doc: Adding disaggregated serving section to models doc (#4877)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-06-09 17:19:02 -04:00
tburt-nv
e2bd01fa18
[https://nvbugs/5332927] Waive new tests (#5051)
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2025-06-10 05:17:54 +08:00
Chang Liu
f70815c945
[TRTLLM-5007][feat] Add multimodal hashing support (image hashing) (#4145)
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>
2025-06-10 01:59:56 +08:00
Yukun He
5097c86168
chore: Change cutlass version back to 4.0 (#5041)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-06-09 22:57:05 +08:00
Yuxian Qiu
e79527d195
chore: Refine weight prefetching. (#4893)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-06-09 21:24:16 +08:00
pcastonguay
5b84fd9201
[nvbug 5283506] fix: Fix spec decode triton test (#4845)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-06-09 08:40:17 -04:00
Mike Iovine
f4d9c87c51
[nvbug/5314469][feat] Include the executor's max batch size in CUDA g… (#4843)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-06-09 08:31:35 -04:00
Yukun He
137fe35539
fix: Fix warmup phase batch size out of range. (#4986)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-09 19:19:16 +08:00
Yuxian Qiu
88480197da
ci: [nvbugs/5280806] Unwaive unittests/_torch. (#4951)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-06-09 19:04:11 +08:00
Dom Brown
9c012d5bf8
[TRTLLM-5589] feat: Integrate TRT-LLM Gen FP8 Batched GEMM with Pytorch workflow kernel autotuner (#4872)
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-06-09 11:02:48 +01:00
liji-nv
1d4f748773
[fix] Fix illegal mem access and possible accuracy lose. Cherry-pick … (#5017)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-06-09 17:50:57 +08:00
ChristinaZ
f45aff2b7d
Add customized renormalized moe routing kernel for moe cutlass backend (#4955)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-06-09 17:38:50 +08:00
Bo Li
c104388d37
chore: Refactor apply_rope. (#4918)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>
2025-06-09 16:51:59 +08:00
Yiqing Yan
6b17dff2f1
Waive L0 test (#5024)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-09 16:03:15 +08:00
Chuang Zhu
9a874760c1
Kv cache transfer support duplicate heads (#4929)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-06-09 14:11:19 +08:00
Chuang Zhu
947571c311
Fix buffer count (#5007)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-06-09 14:01:13 +08:00
Yan Chunwei
f4bfb8e49d
ci: unwaive llmapi launch test (#4991)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-09 13:25:43 +08:00
Daniel Stokes
3a4851b7c3
feat: Add Mixture of Experts FP8xMXFP4 support (#4750)
Signed-off-by: Daniel Stokes <40156487+djns99@users.noreply.github.com>
2025-06-09 13:25:04 +08:00
amitz-nv
77e8d739f1
[TRTLLM-4987][feat] Support generation logits in TRTLLMSampler (#4819)
2025-06-09 06:30:01 +03:00
Yechan Kim
8b4104d34a
feat: add HyperCLOVAX-SEED-Vision support in refactored way (#4799)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-06-09 11:04:04 +08:00
Julien Demouth
bb79ba7c35
Edits for tech blog 4 (#5006)
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
Co-authored-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
2025-06-09 09:38:41 +08:00
nv-guomingz
78472339b3
fix:https://nvbugs/5324252 (#4925)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-09 01:15:45 +08:00