Jinyang Yuan
5339d367ce
[perf] Reduce the workspace size of FP4 activation scales for MoE ( #4303 )
...
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-05-30 09:03:52 +08:00
hlu1
3093c747b7
[Architecture] Redesign Linear module ( #4721 )
...
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>
Co-authored-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>
2025-05-29 16:05:46 -07:00
Yilin Fan
31bb650298
Cherry pick feat/llama4 to main ( #4739 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
Co-authored-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-05-30 05:28:40 +08:00
Robin Kobus
79a94a28f9
refactor: unique_ptr instead of shared_ptr ( #4697 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-05-29 22:49:35 +02:00
Mike Iovine
2d61174a9b
[feat] Support RULER + chunked prefill in lm-eval-harness ( #4592 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-05-30 04:14:00 +08:00
Jhao-Ting Chen
fcadce9f8d
[fix] Eagle-2 LLMAPI pybind argument fix. ( #3967 )
...
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
2025-05-29 12:23:25 -07:00
QI JUN
255779a91d
Chore: fuse _merge_requests method into _fetch_new_requests method ( #4689 )
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-05-29 17:47:44 +08:00
yuanjingx87
2c48ff5898
[feat] add b200 support via slurm ( #4709 )
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-05-29 14:49:46 +08:00
Yan Chunwei
33a9ba55f5
fix: test trtllm-bench mgmn ( #4613 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-29 14:43:47 +08:00
ruodil
500aca4f44
test: remove perf test l40s/l20 oom test cases and unwaive tests ( #4755 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-05-29 13:58:47 +08:00
Zhanrui Sun
7b2b657198
infra: [TRTLLM-5247][TRTLLM-5248][TRTLLM-5249] Refactor docker build image groovy and support NGC images ( #4294 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-29 11:23:29 +08:00
QI JUN
058f83e47b
CI: move post-merge multi GPU test of PyTorch backend to H200 ( #4733 )
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-29 11:15:56 +08:00
Yiqing Yan
7f29a70f53
Waive L0 test ( #4748 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-29 11:05:27 +08:00
Yan Chunwei
ac17142495
chore: rename ExecutorBindingsWorker/Proxy ( #4716 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-29 10:32:35 +08:00
yuanjingx87
2307e91122
[fix] add back rtx6000pro tests ( #4679 )
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-05-29 09:22:52 +08:00
Arthur Rasmusson
812b1abf86
feature: KV Cache GPUDirect Storage ( #3209 )
...
Signed-off-by: Arthur Rasmusson <47877520+arthurrasmusson@users.noreply.github.com.>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Co-authored-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-05-28 23:27:43 +00:00
Erin
820c39041f
chore: [nvbug_5273941] unwaive test_llm_loading_from_ckpt_for_tp2 ( #4725 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-05-29 06:54:32 +08:00
Yuxian Qiu
bf691b3d28
feat: support packed weights in vanilla moe ( #4719 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-05-29 06:24:24 +08:00
Aurelien Chartier
6cf1e4d0a9
chore: add -f to pkill calls ( #4711 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-05-29 02:54:31 +08:00
Robin Kobus
12763779c4
chore: Clean up cpp runtime ( #4449 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-05-28 16:32:59 +02:00
Ivy Zhang
ed3c67e34a
tests: [ https://nvbugspro.nvidia.com/bug/5289908 ] run maverick bf16 on blackwell ( #4722 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-28 22:05:51 +08:00
xinhe-nv
93283484c2
test: [CI] Add failed cases into waives.txt ( #4688 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-28 22:04:35 +08:00
Aurelien Chartier
6b96f09ff2
chore: remove extra paths to find binaries ( #4706 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-05-28 06:51:42 -07:00
Yan Chunwei
5506f60037
chore [BREAKING CHANGE]: Flatten PyTorchConfig knobs into TorchLlmArgs ( #4603 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-28 18:43:04 +08:00
ixlmar
fbe4db207d
feat: forward exceptions to Python and catch OOMs ( #4497 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-05-28 11:58:10 +02:00
Yiqing Yan
66828015a0
Fix rerun step ( #4715 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-28 17:01:05 +08:00
Iman Tabrizian
c875184f78
Add missing serialization classes ( #4642 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-28 16:40:23 +08:00
amirkl94
fbec0c3552
Release 0.20 to main ( #4577 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: moraxu <mguzek@nvidia.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: stnie <82932102+stnie@users.noreply.github.com>
Co-authored-by: Simeng Liu <109828133+SimengLiu-nv@users.noreply.github.com>
Co-authored-by: Faraz <58580514+farazkh80@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-05-28 16:25:33 +08:00
Kaiyu Xie
b800adc65c
Fix: hang on disagg when MNNVL two-shot AllReduce is enabled ( #4678 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-05-28 13:03:53 +08:00
Martin Marciniszyn Mehringer
f3fba4cc63
doc: Document the docker release image on NGC ( #4705 )
...
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
2025-05-28 12:00:53 +08:00
Pengyun Lin
971d16a2ee
[TRTLLM-1658][feat] Enable multiple response in trtllm-serve for TRT backend ( #4623 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-05-28 11:36:44 +08:00
Bo Li
9c4b8f66b4
feat: Integration of Fused QKNorm+RoPE. ( #4611 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-05-28 11:20:45 +08:00
Shunkangz
6493401986
Fix handle cancel request for attentionDP ( #4648 )
...
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-05-28 11:04:02 +08:00
Yuxian Qiu
5700a4ffcd
feat: Add vanilla MOE. ( #4682 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-05-28 10:44:14 +08:00
Martin Marciniszyn Mehringer
06eba1e061
Update the description for NGC docker images ( #4671 ) ( #4702 )
...
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
2025-05-28 00:44:34 +08:00
yunruis
29ac4c20e0
fix: fix dsr1 min lat cga ar rate drop(0.2) ( #4561 )
...
Signed-off-by: yunruis <yunruis@nvidia.com>
2025-05-27 21:59:57 +08:00
Yuxian Qiu
e538b0d95e
refactor: extract and reuse filter_weights. ( #4681 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-05-27 19:48:01 +08:00
xinhe-nv
bb3d998eb1
test: [CI] remove closed bugs ( #4638 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-27 18:07:59 +08:00
Perkz Zheng
40a7161f4f
fix: fmha_v2 compilation ( #4659 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-05-27 17:39:39 +08:00
Lucas Liebenwein
5cdd6bb10f
[AutoDeploy] Increased Model Coverage Mass Migration Week 1 ( #4468 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: sugunav14 <178320438+sugunav14@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-05-27 16:43:15 +08:00
Yiqing Yan
f6c50293d2
[Infra][TRTLLM-3929] Rerun failure tests ( #3264 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-27 16:13:23 +08:00
Yuan Tong
5cb4f9be33
feat: improve build_wheel.py venv handling ( #4525 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-05-27 15:17:14 +08:00
Yiqing Yan
92a7984945
Waive L0 tests ( #4686 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-27 15:07:02 +08:00
QI JUN
1582361400
Chore: only pad one dummy request for attention dp scenario ( #4664 )
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-05-27 14:56:22 +08:00
Yanchao Lu
d6e1b71388
[Test] - Correct waive the Slurm test stage ( #4677 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-27 12:35:43 +08:00
Tracin
268171bc66
[NVBUG 5301980] Fix fp4 gemm padding. ( #4662 )
...
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-05-27 11:30:53 +08:00
xinhe-nv
59f7622281
test: rcca https://nvbugs/5223130 ( #4510 )
...
* add rcca tests
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
* skip tests on blackwell
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
---------
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-27 09:59:47 +08:00
qsang-nv
157fe62965
fix fmha v2 tests ( #4661 )
...
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
2025-05-27 09:47:01 +08:00
Yanchao Lu
258d782b0a
[Test] - Waive RTX Pro 6000 Slurm testing ( #4672 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-27 01:17:22 +08:00
Chuang Zhu
4318037ca3
fix disagg config params ( #4646 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-05-26 23:28:52 +08:00