Commit Graph

843 Commits

Author SHA1 Message Date
jmydurant
578dbc8d9a
feat: chunked prefill for MLA (Blackwell) (#4651)
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-06-26 09:01:00 +08:00
HuiGao-NV
74ae15a26b
CI: enable test cases on single device type (#5484)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-06-26 08:03:44 +08:00
QI JUN
feaf789342
CI: reduce BF16 test cases in B200 (#5482)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-06-26 07:18:20 +08:00
Omer Ullman Argov
bdc8dfebc3
[fix][ci] dont build wheel for cpp tests (#5443)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-26 00:13:47 +03:00
Omer Ullman Argov
61bb71fd1b
[fix][test] remove test in global scope (#5470)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-25 23:42:26 +03:00
QI JUN
3a2c4ca77b
chore: split _build_model method for TorchLlm and TrtLlm (#5418)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-06-26 04:32:46 +08:00
Daniel Cámpora
205c97a4ae
[TRTLLM-5974][feat] Support disaggregated serving in TRTLLM Sampler (#5328)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-06-25 17:41:36 +02:00
HuiGao-NV
314f15f0a7
Fix: fix nvbug 5356427 (#5464)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-06-25 22:24:26 +08:00
HuiGao-NV
cc3c2b3be2
Move 3 disaggregated cases from 4 GPUs devices to 1 GPU device (#5457)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-06-25 21:38:14 +08:00
Kaiyu Xie
d6ada5ffce
[nvbug/5354956] fix: unexpected keyword argument 'streaming' (#5436)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-06-25 20:37:24 +08:00
QI JUN
2901c5a5bc
CI: waive test_ad_build_small_multi (#5471)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-25 16:44:42 +08:00
Netanel Haber
3ca2f6ac51
start OAIServer with max_beam_width=1 for TorchSampler (#5427)
Signed-off-by: Netanel Haber <nhaber@nvidia.com>
2025-06-25 15:52:06 +08:00
Enwei Zhu
fc7a81ceb0
test: Add LLGuidance test and refine guided decoding (#5348)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-25 14:12:56 +08:00
Enwei Zhu
76da7fed86
fix (NvBug 5354925): Fix static EPLB (#5411)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-25 13:14:40 +08:00
HuiGao-NV
da98e03747
tests: Set kv cache free memory fraction in test case (#5433)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-06-25 12:31:58 +08:00
Shunkangz
d5354897c0
feat: Dynamically remove servers in PD (#5270)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-06-25 09:50:04 +08:00
Lucas Liebenwein
5cffb7e0ec
[AutoDeploy] Merge feat/ad_2025_06_13 feature branch (#5454)
Signed-off-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-06-25 09:30:13 +08:00
QI JUN
241f921800
waive test_moe.py::test_moe_fp8[autotune] (#5455)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-25 09:14:44 +08:00
dongxuy04
699520082b
Add MTP support for Online EPLB (#5213)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-06-25 07:58:13 +08:00
Iman Tabrizian
846bbf1edc
Fix test Pytorch model engine (#5416)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-06-24 11:09:27 -07:00
HuiGao-NV
35a92f6bab
Add debug hook to support dump tensor data and add new debug functions easily (#5182)
Signed-off-by: Hui Gao
2025-06-24 17:45:28 +08:00
Emma Qiao
475272046a
[Infra] - Waive failed tests in post-merge and increase some timeout setting (#5424)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-06-24 17:19:31 +08:00
xinhe-nv
658fb5b54e
tests: update benchmark test lists (#5365)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-06-24 15:23:38 +08:00
xinhe-nv
4b32a3f1a7
test: [CI] remove closed bugs (#5400)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-06-24 13:39:57 +08:00
Robin Kobus
b3045c44b9
refactor: remove TrtGptModelOptionalParams (#5165)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-06-20 10:31:40 +02:00
Fanrong Li
5d4ab47d5b
fix: refactor and fix mtp vanilla (#4762)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-20 05:23:39 +08:00
Yan Chunwei
9bd42ecf9b
[TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default (#5312)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-20 03:01:10 +08:00
Kaiyu Xie
7246fd75d1
feat: Support stream_interval (#5284)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-06-19 21:57:10 +08:00
Enwei Zhu
bca758fce1
fix: Fix DS-R1 nvfp4 test case naming (#5361)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-19 15:50:43 +08:00
Emma Qiao
493f268b1c
[Infra]Fix l0_sanity_check.yml which also has gb202 and gb203 (#5360)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-06-19 15:05:57 +08:00
hlu1
b558232ce1
Refactor CutlassFusedMoE (#5344)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-06-19 00:04:07 -07:00
ruodil
e22e884b02
test: amend test case name in perf cluster test (#5356)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-06-19 14:50:12 +08:00
ruodil
21ce9b6749
test: add qwen3 cases (#5302)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-19 14:38:36 +08:00
amitz-nv
1753202b61
[TRTLLM-5825][fix] Fix torch LoRA TP (#5338)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-06-19 09:12:00 +03:00
Emma Qiao
7f68de3e3f
Refactor test timeout for individual long case (#4757)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-06-19 13:52:11 +08:00
bhsueh_NV
dce8620013
chore: enable moe_backend on Qwen3 test (#5230)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-06-19 13:40:45 +08:00
xinhe-nv
e5400eeae0
tests: add ds r1 tp4 test (#5197)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-06-19 12:48:33 +08:00
Yiqing Yan
da576bcafa
Waive L0 test (#5349)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-19 12:01:11 +08:00
Fanrong Li
6c3210a8be
[test] add nvfp4 DeepSeek-V3-Lite-mtp tests (#5125)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-19 09:48:22 +08:00
nv-guomingz
6a388b105a
chore: remove torch_compile prefix for TorchCompileConfig field members (#5261)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-19 09:21:51 +08:00
Yan Chunwei
3946e798db
fix[nvbug5298640]: trtllm-llmapi-launch multiple LLM instances (#4727)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-19 06:13:53 +08:00
Omer Ullman Argov
0b6d005ef6
[fix][test] clear cuda cache before unittests automatically (#5121)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-19 00:36:53 +03:00
Aurelien Chartier
d25f93c07f
chore: skip test_llm_gpt2_medium_fp8 for fp8_pc_pt + quant_lm_head (#5293)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-06-18 11:13:12 -07:00
Omer Ullman Argov
5010f8719d
[fix][test] remove duplicate test runs (#5241)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-19 01:59:54 +08:00
Omer Ullman Argov
a28a152001
[fix][test] remove some cpp test cases from h100 (#5335)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-18 20:40:26 +03:00
yuanjingx87
a1c5704055
[feat] Multi-node CI testing support via Slurm (#4771)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
Signed-off-by: yuanjingx87 <197832395+yuanjingx87@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-06-19 01:11:12 +08:00
Iman Tabrizian
e5ee5c5352
Unwaive disaggregated serving accuracy tests (#5095)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
2025-06-19 00:41:15 +08:00
HuiGao-NV
d13d2f460d
Remove duplicated test cases (#5323)
Signed-off-by: Hui Gao <huig@nvidia.com>
Signed-off-by: Hui Gao†<huig@nvidia.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-18 21:20:20 +08:00
Emma Qiao
b29ac5b561
[Infra] Update 5080 and 5090 case condition due to the driver update (#5317)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-06-18 20:01:36 +08:00
xinhe-nv
610a49f117
tests: add multi nodes tests (#5196)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-06-18 18:08:04 +08:00
Yi Zhang
375dd0b971
Waive L0 (#5311)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-06-18 16:40:41 +08:00
Yuan Tong
f599ee63c1
test: correct unittest rerun behavior (#5273)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-06-18 16:37:19 +08:00
Robin Kobus
38547b92f3
refactor: Introduce ResourceManagerType enum for resource management (#5246)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-06-18 09:55:59 +02:00
Wanli Jiang
3a02489e86
[TRTLLM-5758] test: Add Bielik-11B-v2.2 Model Support (#5159)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-06-18 15:12:49 +08:00
QI JUN
9ea7bb67a4
CI: fix TensorRT H200 tests (#5301)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-18 14:40:57 +08:00
ruodil
3b5d916250
test: cherry-pick deepseek rcca cases in main branch (#5307)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-18 14:26:26 +08:00
Yiqing Yan
8f67e3604d
Waive L0 tests (#5308)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-18 12:43:45 +08:00
Omer Ullman Argov
f501ce57b1
[fix][test] move deepseek single gpu tests to post merge (#5280)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-18 06:59:39 +03:00
dominicshanshan
3c0fecbf42
CI: extend model weights load time for dsv3 in stress test. (#5275)
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-06-18 11:51:48 +08:00
Ivy Zhang
41cfcaa964
test: update qa test list (#5305)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-06-18 11:29:11 +08:00
Emma Qiao
ff32caf4d7
[Infra] - Update dependencies with NGC PyTorch 25.05 and TRT 10.11 (#4885)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-06-17 23:48:34 +08:00
QI JUN
f899c4d294
Re-implement LlmResponse in Python to reduce host overhead of pybind (#5224)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-17 21:28:09 +08:00
Yanchao Lu
f4cdbfcdf0
None - Some clean-ups for the automation pipeline (#5245)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-06-17 21:08:24 +08:00
Dom Brown
44fb3c1673
[TRTLLM-5770] feat: Integrate TRT-LLM Gen FP8 block scale MoE with Pytorch workflow kernel autotuner (#5207)
- Adds a new Python custom op (fp8_block_scale_moe_runner) and a FP8BlockScaleMoERunner class for autotuning.
- Updates C++ MoE and batched GEMM kernels to accept a configIndex for workspace sizing and execution.
- Extends the unit test to run both autotuned and non-autotuned code paths.

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-06-17 21:01:56 +08:00
amirkl94
8451a87742
chore: Mass integration of release/0.20 (#5082)
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Erin <14718778+hchings@users.noreply.github.com>
Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-06-17 14:32:02 +03:00
liji-nv
13eef642e6
[feat] Piecewise cuda graph support for MLA (#4467)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-06-17 18:58:38 +08:00
QI JUN
ccd9adbe33
CI: move multi-gpu test cases of tensorrt backend to h200 (#5272)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-17 17:37:37 +08:00
Ivy Zhang
2ad8758ecc
[TRTLLM-5786][https://nvbugspro.nvidia.com/bug/5310520][test] Add QA test cases (#5073)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-06-17 17:14:01 +08:00
QI JUN
517c1ecf72
move some test cases of TensorRT backend back (#5232)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-17 17:03:11 +08:00
qsang-nv
134cb66a53
fix mla test (#5240)
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
2025-06-17 15:26:25 +08:00
xinhe-nv
a49ad790b3
test: [CI] remove closed bugs (#5218)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-17 13:13:23 +08:00
QI JUN
546274d40e
fix ci (#5259)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-17 12:03:09 +08:00
ruodil
bb2348372c
test: add more pytorch cases in perf test (#5237)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-06-17 11:11:28 +08:00
Mike Iovine
c53bc19f5e
[infra] Make test_chunked_prefill faster (#5248)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-06-17 04:19:47 +08:00
Simeng Liu
5c18160d27
chore: Waive CI failure. (#5252)
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-06-16 20:47:05 +02:00
Izzy Putterman
e607768e45
Speculation: Draft Target in new FW (#4558)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-06-17 02:26:08 +08:00
Yilin Fan
dd29063538
[feat] Add llm args to tune python gc threshold (#5141)
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
2025-06-16 17:45:22 +08:00
Ivy Zhang
64b7f04fdc
[test] split nemotron test cases from examples_test_list (#5238)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-06-16 16:36:33 +08:00
xinhe-nv
802f22cd12
test: [CI] Add failed cases into waives.txt (#5221)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-06-16 16:11:53 +08:00
Yiqing Yan
8445416c39
Waive L0 tests (#5233)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-16 15:19:03 +08:00
Anthony Chang
4f9fa9f21d
feat: MoE trtllm backend kernel update (#5183)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-06-16 14:46:13 +08:00
Wanli Jiang
0acf23185e
[Stress test] Add DeepSeek-R1 stress test (#5033)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-06-16 11:54:31 +08:00
Tracin
ef3fdc8051
feat: Add w4a8_mxfp4_fp8 quantization recipe. (#4867)
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-06-16 11:30:57 +08:00
Yi Zhang
9b616db13b
test: Add fixture to skip tests based on MPI world size (#5028)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-06-16 11:25:01 +08:00
ruodil
2848e012ae
test: add llama4 models for perf test (#5187)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-16 11:24:35 +08:00
ruodil
3d22f27063
test: add more cases for llama_v3.3/3.1 70b fp8 and set enable_attention_dp to false to non-deepseek models (#5155)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-06-16 11:23:20 +08:00
Enwei Zhu
babdd9ce06
test: Add json_mode_eval for guided decoding evaluation (#5179)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-16 10:03:55 +08:00
Yan Chunwei
c84e41fd9d
fix: build_config in TorchLlmArgs and avoid arbitrary args (#4972)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-15 17:51:56 -07:00
amitz-nv
109c426077
Enable trtllm-bench to run LoRA and add basic e2e perf testing capability for LoRA in PyT flow (#5130) 2025-06-15 18:54:04 +03:00
Omer Ullman Argov
4eade3ae33
[fix][test] Speedup Nemotron NAS unittests (#5202)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-15 11:26:03 +03:00
Kaiyu Xie
dce1dcc4f9
feat: Support post_proc for bench (#5122)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-06-15 13:02:38 +08:00
ixlmar
e055af1bc9
chore: improve disagg test failure detection (#4738)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-06-15 01:28:26 +08:00
Aurelien Chartier
1389f5a4d3
feat: Add support for fp8 rowwise quantization (#4876)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
Co-authored-by: aikitoria <151776613+aikitoria@users.noreply.github.com>
2025-06-14 06:37:48 -07:00
Tailing Yuan
0b60da2c45
feat: large-scale EP(part 7: DeepEP integration) (#4792)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-06-14 19:12:38 +08:00
yunruis
b99c5ce8c1
Feat/ds r1 min latency opt round3, add router gemm, fused a gemm, PDL (#4560)
Signed-off-by: yunruis <yunruis@nvidia.com>
Signed-off-by: kduan <176893526+Kefeng-Duan@users.noreply.github.com>
Signed-off-by: Kefeng-Duan <176893526+Kefeng-Duan@users.noreply.github.com>
Co-authored-by: kduan <176893526+Kefeng-Duan@users.noreply.github.com>
2025-06-14 17:36:22 +08:00
nv-guomingz
3b7b5a5ad5
refactor [BREAKING CHANGE]: enhance the llm args pytorch config part 3(torch_compile_config) (#5032)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-14 14:23:13 +08:00
Enwei Zhu
5f2785fb90
fix: Fix waive list (#5205)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-13 23:33:23 +08:00
Mike Iovine
25aa3881d7
[nvbug/5319281][fix] Stop drafting when we hit the draft model's max seq len (#4879)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-06-13 11:06:36 -04:00
QI JUN
952f33dcad
CI: move all test cases of TensorRT backend into post merge (#5186)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-13 20:48:48 +08:00
xinhe-nv
30d9d0fa71
test: [CI] Add failed cases into waives.txt (#5178)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-13 16:38:51 +08:00