4fef14da56 | Yiqing Yan | 2025-06-30 11:12:26 +08:00 | Deduplicate waive list (#5546)
    Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
578430e64c | nv-guomingz | 2025-06-30 11:05:40 +08:00 | [TRTLLM-5530][BREAKING CHANGE]: enhance the llm args pytorch config part 1 (cuda_graph_config) (#5014)
    Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2780fc27a7 | Omer Ullman Argov | 2025-06-30 05:29:54 +03:00 | [ci] remove MMLU if followed by GSM8K (#5578)
    Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
64db7d27f6 | Cheng Hang | 2025-06-30 10:20:16 +08:00 | [feat] Optimizations on weight-only batched gemv kernel (#5420)
    Signed-off-by: Cheng Hang <chang@nvidia.com>
94dc97ab10 | Omer Ullman Argov | 2025-06-29 17:23:12 +03:00 | [feat][test] reuse MPI pool executor across tests (#5566)
    Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
a1c1c6b504 | tomeras91 | 2025-06-29 15:56:23 +03:00 | [CI] reduce mamba2 ssm test parameterization (#5571)
    Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
70e34a3291 | Talor Abramovich | 2025-06-29 12:46:30 +00:00 | [TRTLLM-5831][feat] Add LoRA support for pytorch backend in trtllm-serve (#5376)
    Signed-off-by: Talor Abramovich <talora@nvidia.com>
a985c0b7e6 | amirkl94 | 2025-06-29 09:44:47 +03:00 | tests: Move stress tests to be Post-Merge only (#5166)
    Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
9db769ee62 | Emma Qiao | 2025-06-29 11:06:14 +08:00 | [Infra] - Add import pytest (#5565)
    Signed-off-by: qqiao <qqiao@nvidia.com>
619709fc33 | Lucas Liebenwein | 2025-06-29 03:52:14 +08:00 | [AutoDeploy] merge feat/ad-2025-06-13 (#5556)
    Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
6021a439ab | Li Min | 2025-06-27 15:48:33 -07:00 | Make moe permute and final as custom op (#5412)
    Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
26b953e29a | Iman Tabrizian | 2025-06-28 04:39:39 +08:00 | [nvbugs/5309940] Add support for input output token counts (#5445)
    Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
5437075def | Darragh Hanley | 2025-06-28 02:33:10 +08:00 | ReDrafter support for Qwen (#4875)
    Signed-off-by: darraghdog <darragh.hanley@gmail.com>
    Signed-off-by: Darragh Hanley <darragh.hanley@gmail.com>
    Co-authored-by: rakib-hasan <rhasan@nvidia.com>
833c0dea4a | Aurelien Chartier | 2025-06-27 17:03:05 +02:00 | [TRTLLM-6104] feat: add request_perf_metrics to LLMAPI (#5497)
    Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
56cdfe5c6c | wili | 2025-06-27 23:00:17 +08:00 | [TRTLLM-5000][feat] NGrams V2 (#4569)
    Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
    Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
6fc1c6fd7b | Omer Ullman Argov | 2025-06-27 20:34:44 +08:00 | [fix][ci] correct unittests test prefix (#5547)
    Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
7f1893f54c | Enwei Zhu | 2025-06-27 19:16:07 +08:00 | ci: waive flaky test test_llama_eagle3 (#5548)
    Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
980030c816 | Emma Qiao | 2025-06-27 13:55:49 +08:00 | [Infra] - Waive failed case in post-merge (#5536)
    Signed-off-by: qqiao <qqiao@nvidia.com>
49af791f66 | Iman Tabrizian | 2025-06-27 11:19:52 +08:00 | Add testing for trtllm-llmapi-launch with tritonserver (#5528)
    Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
a3494bebec | xinhe-nv | 2025-06-27 10:13:22 +08:00 | tests: waive failed tests on main (#5512)
    Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
    Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
0f3bd7800e | Yibin Li | 2025-06-27 09:58:41 +08:00 | [TRTLLM-4971]: Use safe deserialization in ParallelConfig (#4630)
    Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
aa6e015ef8 | Frank | 2025-06-26 17:05:43 -07:00 | Update trtllm-bench to support new Pytorch default. (#5491)
    Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
8836990bde | jmydurant | 2025-06-26 22:18:08 +08:00 | [TRTLLM-3602][feat] support nvfp4 model and fp8 kv cache for MLA chunked prefill (Blackwell) (#5475)
    Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
8dfa31c71d | Robin Kobus | 2025-06-26 19:45:52 +08:00 | refactor: remove batch_manager::KvCacheConfig and use executor::KvCacheConfig instead (#5384)
    Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
6bae76d7ca | Omer Ullman Argov | 2025-06-26 14:31:38 +03:00 | [fix][ci] move torch tests to run under torch stage (#5473)
    Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
1633bd2bef | Omer Ullman Argov | 2025-06-26 19:27:32 +08:00 | [CI] move flashinfer llama tests to post merge (#5506)
    Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
ff2dd72df4 | xinhe-nv | 2025-06-26 14:53:55 +08:00 | tests: waive tests (#5458)
    Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
    Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
1bab9000a6 | Bo Li | 2025-06-26 14:03:56 +08:00 | perf: Optimize swizzle_sf, unswizzle_sf, reswizzle_sf (#5318)
    Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
490d2e5819 | dongxuy04 | 2025-06-25 22:25:13 -07:00 | feat: large-scale EP (part 8: Online EP load balancer integration for PCIe fp8) (#5226)
    Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
32d1573c43 | Emma Qiao | 2025-06-26 11:31:39 +08:00 | [Infra] - Add timeout setting for long tests found in post-merge (#5501)
    Signed-off-by: qqiao <qqiao@nvidia.com>
d9b75f83fd | Venky | 2025-06-25 20:17:12 -07:00 | [CI] Waive test_fp8_block_scales_4gpus[ep4-mtp_nextn=0-fp8kv=True-attention_dp=True-cuda_graph=True-overlap_scheduler=True-torch_compile=False] (#5494)
    Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
578dbc8d9a | jmydurant | 2025-06-26 09:01:00 +08:00 | feat: chunked prefill for MLA (Blackwell) (#4651)
    Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
74ae15a26b | HuiGao-NV | 2025-06-26 08:03:44 +08:00 | CI: enable test cases on single device type (#5484)
    Signed-off-by: Hui Gao <huig@nvidia.com>
feaf789342 | QI JUN | 2025-06-26 07:18:20 +08:00 | CI: reduce BF16 test cases in B200 (#5482)
    Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
bdc8dfebc3 | Omer Ullman Argov | 2025-06-26 00:13:47 +03:00 | [fix][ci] dont build wheel for cpp tests (#5443)
    Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
61bb71fd1b | Omer Ullman Argov | 2025-06-25 23:42:26 +03:00 | [fix][test] remove test in global scope (#5470)
    Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
3a2c4ca77b | QI JUN | 2025-06-26 04:32:46 +08:00 | chore: split _build_model method for TorchLlm and TrtLlm (#5418)
    Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
    Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
205c97a4ae | Daniel Cámpora | 2025-06-25 17:41:36 +02:00 | [TRTLLM-5974][feat] Support disaggregated serving in TRTLLM Sampler (#5328)
    Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
    Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
    Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
314f15f0a7 | HuiGao-NV | 2025-06-25 22:24:26 +08:00 | Fix: fix nvbug 5356427 (#5464)
    Signed-off-by: Hui Gao <huig@nvidia.com>
cc3c2b3be2 | HuiGao-NV | 2025-06-25 21:38:14 +08:00 | Move 3 disaggregated cases from 4 GPUs devices to 1 GPU device (#5457)
    Signed-off-by: Hui Gao <huig@nvidia.com>
d6ada5ffce | Kaiyu Xie | 2025-06-25 20:37:24 +08:00 | [nvbug/5354956] fix: unexpected keyword argument 'streaming' (#5436)
    Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2901c5a5bc | QI JUN | 2025-06-25 16:44:42 +08:00 | CI: waive test_ad_build_small_multi (#5471)
    Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
3ca2f6ac51 | Netanel Haber | 2025-06-25 15:52:06 +08:00 | start OAIServer with max_beam_width=1 for TorchSampler (#5427)
    Signed-off-by: Netanel Haber <nhaber@nvidia.com>
fc7a81ceb0 | Enwei Zhu | 2025-06-25 14:12:56 +08:00 | test: Add LLGuidance test and refine guided decoding (#5348)
    Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
76da7fed86 | Enwei Zhu | 2025-06-25 13:14:40 +08:00 | fix (NvBug 5354925): Fix static EPLB (#5411)
    Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
da98e03747 | HuiGao-NV | 2025-06-25 12:31:58 +08:00 | tests: Set kv cache free memory fraction in test case (#5433)
    Signed-off-by: Hui Gao <huig@nvidia.com>
d5354897c0 | Shunkangz | 2025-06-25 09:50:04 +08:00 | feat: Dynamically remove servers in PD (#5270)
    Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
5cffb7e0ec | Lucas Liebenwein | 2025-06-25 09:30:13 +08:00 | [AutoDeploy] Merge feat/ad_2025_06_13 feature branch (#5454)
    Signed-off-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
    Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
    Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
    Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
    Co-authored-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
    Co-authored-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
    Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
241f921800 | QI JUN | 2025-06-25 09:14:44 +08:00 | waive test_moe.py::test_moe_fp8[autotune] (#5455)
    Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
699520082b | dongxuy04 | 2025-06-25 07:58:13 +08:00 | Add MTP support for Online EPLB (#5213)
    Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>