tomeras91
|
f121f13ddf
|
[nvbug 5325284][fix] Increase Nemotron-H warmup request robustness (#4954)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
|
2025-06-10 11:09:37 +03:00 |
|
Yiqing Yan
|
fdfc711261
|
Waive L0 test (#5067)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-06-10 15:40:57 +08:00 |
|
QI JUN
|
12ffdcbf53
|
CI: waive test_ad_build_small_multi (#5071)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
|
2025-06-10 14:54:05 +08:00 |
|
Simeng Liu
|
86959ef1e4
|
chore: Waive CI failure. (#5069)
Signed-off-by: Simeng Liu <simengl@nvidia.com>
|
2025-06-10 14:04:10 +08:00 |
|
Stanley Sun
|
74b0e71ef4
|
test: add more disaggregated serving tests into QA testlist (#5036)
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
|
2025-06-10 09:24:53 +08:00 |
|
tburt-nv
|
e2bd01fa18
|
[https://nvbugs/5332927] Waive new tests (#5051)
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
|
2025-06-10 05:17:54 +08:00 |
|
Chang Liu
|
f70815c945
|
[TRTLLM-5007][feat] Add multimodal hashing support (image hashing) (#4145)
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>
|
2025-06-10 01:59:56 +08:00 |
|
Yuxian Qiu
|
e79527d195
|
chore: Refine weight prefetching. (#4893)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
|
2025-06-09 21:24:16 +08:00 |
|
pcastonguay
|
5b84fd9201
|
[nvbug 5283506] fix: Fix spec decode triton test (#4845)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
|
2025-06-09 08:40:17 -04:00 |
|
Mike Iovine
|
f4d9c87c51
|
[nvbug/5314469][feat] Include the executor's max batch size in CUDA g… (#4843)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
|
2025-06-09 08:31:35 -04:00 |
|
Yukun He
|
137fe35539
|
fix: Fix warmup phase batch size out of range. (#4986)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
|
2025-06-09 19:19:16 +08:00 |
|
Yuxian Qiu
|
88480197da
|
ci: [nvbugs/5280806] Unwaive unittests/_torch. (#4951)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
|
2025-06-09 19:04:11 +08:00 |
|
Dom Brown
|
9c012d5bf8
|
[TRTLLM-5589] feat: Integrate TRT-LLM Gen FP8 Batched GEMM with Pytorch workflow kernel autotuner (#4872)
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
|
2025-06-09 11:02:48 +01:00 |
|
liji-nv
|
1d4f748773
|
[fix] Fix illegal mem access and possible accuracy loss. Cherry-pick … (#5017)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
|
2025-06-09 17:50:57 +08:00 |
|
ChristinaZ
|
f45aff2b7d
|
Add customized renormalized moe routing kernel for moe cutlass backend (#4955)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
|
2025-06-09 17:38:50 +08:00 |
|
Yiqing Yan
|
6b17dff2f1
|
Waive L0 test (#5024)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-06-09 16:03:15 +08:00 |
|
Yan Chunwei
|
f4bfb8e49d
|
ci: unwaive llmapi launch test (#4991)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-06-09 13:25:43 +08:00 |
|
amitz-nv
|
77e8d739f1
|
[TRTLLM-4987][feat] Support generation logits in TRTLLMSampler (#4819)
|
2025-06-09 06:30:01 +03:00 |
|
Yechan Kim
|
8b4104d34a
|
feat: add HyperCLOVAX-SEED-Vision support in refactored way (#4799)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
|
2025-06-09 11:04:04 +08:00 |
|
nv-guomingz
|
78472339b3
|
fix:https://nvbugs/5324252 (#4925)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-06-09 01:15:45 +08:00 |
|
Omer Ullman Argov
|
8731f5f14f
|
chore: Mass integration of release/0.20 (#4898)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Hui Gao <huig@nvidia.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: moraxu <mguzek@nvidia.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: HuiGao-NV <huig@nvidia.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
Co-authored-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Faraz <58580514+farazkh80@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
|
2025-06-08 23:26:26 +08:00 |
|
Mike Iovine
|
ec0d984656
|
[nvbug/5280806][fix] Fix 2 model spec decode flow (#4807)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
|
2025-06-08 07:40:02 -04:00 |
|
Yanchao Lu
|
9e05613679
|
[Infra] - Update JNLP container config (#5008)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-06-08 16:44:09 +08:00 |
|
dongxuy04
|
1e369658f1
|
feat: large-scale EP(part 6: Online EP load balancer integration for GB200 nvfp4) (#4818)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Co-authored-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
|
2025-06-08 10:25:18 +08:00 |
|
QI JUN
|
5ee0de7f2a
|
Resubmit #4894 (#4969)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
|
2025-06-08 04:42:15 +08:00 |
|
Ivy Zhang
|
7dce328ad6
|
[TRTLLM-5692][tests] Add speculative decoding test cases on torch flow (#4940)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Ruodi Lu <ruodil@nvidia.com>
Co-authored-by: Ruodi Lu <ruodil@nvidia.com>
|
2025-06-07 11:18:32 +08:00 |
|
nv-guomingz
|
0c7dd660d8
|
fix:https://nvbugs/5324248 (#4973)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-06-07 04:14:07 +08:00 |
|
Fanrong Li
|
75d020cf07
|
fix: fix cuda graph padding for spec decoding (#4853)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
|
2025-06-06 22:21:42 +08:00 |
|
Anthony Chang
|
eeb555e37b
|
chore: memoize weight shuffle index to speed up weight preproc in moe_backend=TRTLLM (#4826)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
|
2025-06-06 16:13:54 +08:00 |
|
QI JUN
|
1b963c17c0
|
CI: waive test_llm_multi_node_with_postproc (#4977)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
|
2025-06-06 14:19:56 +08:00 |
|
xinhe-nv
|
564472168e
|
test: [CI] Add failed cases into waives.txt (#4966)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
|
2025-06-06 10:30:15 +08:00 |
|
QI JUN
|
ec50684d80
|
Revert "fix a bug of global cuda graph dummy request" (#4970)
|
2025-06-06 08:54:45 +08:00 |
|
QI JUN
|
154f7cc40a
|
fix a bug of global cuda graph dummy request (#4894)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
|
2025-06-05 19:47:40 +08:00 |
|
Yiqing Yan
|
7e921c78b5
|
Waive L0 tests (#4953)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-06-05 19:36:48 +08:00 |
|
Shunkangz
|
3eae58ca36
|
Add disaggregated unittest (#4899)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
|
2025-06-05 19:14:31 +08:00 |
|
ixlmar
|
a1526356aa
|
[TRTLLM-5630] restore free_gpu_memory_fraction=0.9 in tests (#4859)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
|
2025-06-05 10:46:29 +01:00 |
|
QI JUN
|
b8c5e3892b
|
Revert "fix: build_config in TorchLlmArgs and avoid invalid args" (#4949)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
|
2025-06-05 17:43:30 +08:00 |
|
QI JUN
|
d5a8079eb6
|
Revert "[infra] Unwaive unittests/_torch" (#4950)
|
2025-06-05 17:21:07 +08:00 |
|
Lucas Liebenwein
|
743fb0a159
|
[AutoDeploy] _AutoDeployLlmArgs as primary config object (#4891)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
|
2025-06-05 17:20:55 +08:00 |
|
QI JUN
|
91e8d43d66
|
CI: waive test_llm_get_queued_stats (#4945)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
|
2025-06-05 16:44:56 +08:00 |
|
xinhe-nv
|
1c3091c63b
|
tests: [TRTQA-2906] add benchmark serving tests (#4901)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-06-05 14:33:03 +08:00 |
|
Netanel Haber
|
ddbaa5ef80
|
Only pass fast_build=true to non-pytorch backend (#4920)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
|
2025-06-05 13:30:17 +08:00 |
|
Yiqing Yan
|
9ceef983c0
|
Waive L0 tests (#4927)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-06-05 11:09:01 +08:00 |
|
xinhe-nv
|
50a74a1daa
|
tests: fix 5273697 (#4685)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
|
2025-06-05 10:39:21 +08:00 |
|
Shiyu Li
|
b0d287c9b7
|
[TRTLLM-4647][fix] Fix the no fusion allreduce hanging (#4594)
Signed-off-by: Shiyu Li <shili@nvidia.com>
|
2025-06-04 18:26:13 -07:00 |
|
Mike Iovine
|
8433091630
|
[infra] Unwaive unittests/_torch (#4919)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
|
2025-06-05 08:49:37 +08:00 |
|
Lucas Liebenwein
|
f9d45e03a4
|
[AutoDeploy] deprecate CI post-merge tests and keep them for local testing (#4892)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
|
2025-06-05 08:27:17 +08:00 |
|
Yan Chunwei
|
8e0d96fcc6
|
fix: LLM invalid arg in a test (#4922)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-06-05 08:00:32 +08:00 |
|
Yuxian Qiu
|
6b3242654e
|
fix: Fix broken vanilla moe since FusedMoE refactor. (#4897)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
|
2025-06-05 03:56:41 +08:00 |
|
Yi Zhang
|
1fca654bfd
|
tests: Update gb200 test case (#4754)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
|
2025-06-04 18:49:20 +08:00 |
|