Commit Graph

804 Commits

Author SHA1 Message Date
Kaiyu Xie
155b19e6b0 feat: Return context response immediately when stream_interval > 1 (#5836)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-07-10 22:50:57 -07:00
Yegor
5b7007a69d [feat] Detokenize option in /v1/completions request (#5382)
Signed-off-by: Yegor <75512761+Wokzy@users.noreply.github.com>
Signed-off-by: Yegor Yershov <yegor6741@gmail.com>
2025-07-10 22:50:36 -07:00
jmydurant
8836990bde
[TRTLLM-3602][feat] support nvfp4 model and fp8 kv cache for MLA chunked prefill (Blackwell) (#5475)
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-06-26 22:18:08 +08:00
Robin Kobus
8dfa31c71d
refactor: remove batch_manager::KvCacheConfig and use executor::KvCacheConfig instead (#5384)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-06-26 19:45:52 +08:00
Omer Ullman Argov
6bae76d7ca
[fix][ci] move torch tests to run under torch stage (#5473)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-26 14:31:38 +03:00
Omer Ullman Argov
1633bd2bef
[CI] move flashinfer llama tests to post merge (#5506)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-26 19:27:32 +08:00
xinhe-nv
ff2dd72df4
tests: waive tests (#5458)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-26 14:53:55 +08:00
Bo Li
1bab9000a6
perf: Optimize swizzle_sf, unswizzle_sf, reswizzle_sf (#5318)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-06-26 14:03:56 +08:00
dongxuy04
490d2e5819
feat: large-scale EP(part 8: Online EP load balancer integration for PCIe fp8) (#5226)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-06-25 22:25:13 -07:00
Emma Qiao
32d1573c43
[Infra] - Add timeout setting for long tests found in post-merge (#5501)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-06-26 11:31:39 +08:00
Venky
d9b75f83fd
[CI] Waive test_fp8_block_scales_4gpus[ep4-mtp_nextn=0-fp8kv=True-attention_dp=True-cuda_graph=True-overlap_scheduler=True-torch_compile=False] (#5494)
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2025-06-25 20:17:12 -07:00
jmydurant
578dbc8d9a
feat: chunked prefill for MLA (Blackwell) (#4651)
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-06-26 09:01:00 +08:00
HuiGao-NV
74ae15a26b
CI: enable test cases on single device type (#5484)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-06-26 08:03:44 +08:00
QI JUN
feaf789342
CI: reduce BF16 test cases in B200 (#5482)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-06-26 07:18:20 +08:00
Omer Ullman Argov
bdc8dfebc3
[fix][ci] dont build wheel for cpp tests (#5443)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-26 00:13:47 +03:00
Omer Ullman Argov
61bb71fd1b
[fix][test] remove test in global scope (#5470)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-25 23:42:26 +03:00
QI JUN
3a2c4ca77b
chore: split _build_model method for TorchLlm and TrtLlm (#5418)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-06-26 04:32:46 +08:00
Daniel Cámpora
205c97a4ae
[TRTLLM-5974][feat] Support disaggregated serving in TRTLLM Sampler (#5328)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-06-25 17:41:36 +02:00
HuiGao-NV
314f15f0a7
Fix: fix nvbug 5356427 (#5464)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-06-25 22:24:26 +08:00
HuiGao-NV
cc3c2b3be2
Move 3 disaggregated cases from 4 GPUs devices to 1 GPU device (#5457)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-06-25 21:38:14 +08:00
Kaiyu Xie
d6ada5ffce
[nvbug/5354956] fix: unexpected keyword argument 'streaming' (#5436)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-06-25 20:37:24 +08:00
QI JUN
2901c5a5bc
CI: waive test_ad_build_small_multi (#5471)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-25 16:44:42 +08:00
Netanel Haber
3ca2f6ac51
start OAIServer with max_beam_width=1 for TorchSampler (#5427)
Signed-off-by: Netanel Haber <nhaber@nvidia.com>
2025-06-25 15:52:06 +08:00
Enwei Zhu
fc7a81ceb0
test: Add LLGuidance test and refine guided decoding (#5348)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-25 14:12:56 +08:00
Enwei Zhu
76da7fed86
fix (NvBug 5354925): Fix static EPLB (#5411)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-25 13:14:40 +08:00
HuiGao-NV
da98e03747
tests: Set kv cache free memory fraction in test case (#5433)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-06-25 12:31:58 +08:00
Shunkangz
d5354897c0
feat: Dynamically remove servers in PD (#5270)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-06-25 09:50:04 +08:00
Lucas Liebenwein
5cffb7e0ec
[AutoDeploy] Merge feat/ad_2025_06_13 feature branch (#5454)
Signed-off-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-06-25 09:30:13 +08:00
QI JUN
241f921800
waive test_moe.py::test_moe_fp8[autotune] (#5455)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-25 09:14:44 +08:00
dongxuy04
699520082b
Add MTP support for Online EPLB (#5213)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-06-25 07:58:13 +08:00
Iman Tabrizian
846bbf1edc
Fix test Pytorch model engine (#5416)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-06-24 11:09:27 -07:00
HuiGao-NV
35a92f6bab
Add debug hook to support dump tensor data and add new debug functions easily (#5182)
Signed-off-by: Hui Gao
2025-06-24 17:45:28 +08:00
Emma Qiao
475272046a
[Infra] - Waive failed tests in post-merge and increase some timeout setting (#5424)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-06-24 17:19:31 +08:00
xinhe-nv
658fb5b54e
tests: update benchmark test lists (#5365)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-06-24 15:23:38 +08:00
xinhe-nv
4b32a3f1a7
test: [CI] remove closed bugs (#5400)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-06-24 13:39:57 +08:00
Robin Kobus
b3045c44b9
refactor: remove TrtGptModelOptionalParams (#5165)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-06-20 10:31:40 +02:00
Fanrong Li
5d4ab47d5b
fix: refactor and fix mtp vanilla (#4762)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-20 05:23:39 +08:00
Yan Chunwei
9bd42ecf9b
[TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default (#5312)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-20 03:01:10 +08:00
Kaiyu Xie
7246fd75d1
feat: Support stream_interval (#5284)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-06-19 21:57:10 +08:00
Enwei Zhu
bca758fce1
fix: Fix DS-R1 nvfp4 test case naming (#5361)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-19 15:50:43 +08:00
Emma Qiao
493f268b1c
[Infra]Fix l0_sanity_check.yml which also has gb202 and gb203 (#5360)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-06-19 15:05:57 +08:00
hlu1
b558232ce1
Refactor CutlassFusedMoE (#5344)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-06-19 00:04:07 -07:00
ruodil
e22e884b02
test: amend test case name in perf cluster test (#5356)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-06-19 14:50:12 +08:00
ruodil
21ce9b6749
test: add qwen3 cases (#5302)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-19 14:38:36 +08:00
amitz-nv
1753202b61
[TRTLLM-5825][fix] Fix torch LoRA TP (#5338)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-06-19 09:12:00 +03:00
Emma Qiao
7f68de3e3f
Refactor test timeout for individual long case (#4757)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-06-19 13:52:11 +08:00
bhsueh_NV
dce8620013
chore: enable moe_backend on Qwen3 test (#5230)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-06-19 13:40:45 +08:00
xinhe-nv
e5400eeae0
tests: add ds r1 tp4 test (#5197)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-06-19 12:48:33 +08:00
Yiqing Yan
da576bcafa
Waive L0 test (#5349)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-19 12:01:11 +08:00
Fanrong Li
6c3210a8be
[test] add nvfp4 DeepSeek-V3-Lite-mtp tests (#5125)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-19 09:48:22 +08:00