amirkl94
|
8451a87742
|
chore: Mass integration of release/0.20 (#5082)
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Erin <14718778+hchings@users.noreply.github.com>
Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-06-17 14:32:02 +03:00 |
|
liji-nv
|
13eef642e6
|
[feat] Piecewise cuda graph support for MLA (#4467)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
|
2025-06-17 18:58:38 +08:00 |
|
Ivy Zhang
|
2ad8758ecc
|
[TRTLLM-5786][https://nvbugspro.nvidia.com/bug/5310520][test] Add QA test cases (#5073)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-06-17 17:14:01 +08:00 |
|
QI JUN
|
546274d40e
|
fix ci (#5259)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
|
2025-06-17 12:03:09 +08:00 |
|
Mike Iovine
|
c53bc19f5e
|
[infra] Make test_chunked_prefill faster (#5248)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
|
2025-06-17 04:19:47 +08:00 |
|
Izzy Putterman
|
e607768e45
|
Speculation: Draft Target in new FW (#4558)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
|
2025-06-17 02:26:08 +08:00 |
|
Wanli Jiang
|
0acf23185e
|
[Stress test] Add DeepSeek-R1 stress test (#5033)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
|
2025-06-16 11:54:31 +08:00 |
|
Yi Zhang
|
9b616db13b
|
test: Add fixture to skip tests based on MPI world size (#5028)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
|
2025-06-16 11:25:01 +08:00 |
|
ruodil
|
2848e012ae
|
test: add llama4 models for perf test (#5187)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-06-16 11:24:35 +08:00 |
|
ruodil
|
3d22f27063
|
test: add more cases for llama_v3.3/3.1 70b fp8 and set enable_attention_dp to false to non-deepseek models (#5155)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
|
2025-06-16 11:23:20 +08:00 |
|
Enwei Zhu
|
babdd9ce06
|
test: Add json_mode_eval for guided decoding evaluation (#5179)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-06-16 10:03:55 +08:00 |
|
Yan Chunwei
|
c84e41fd9d
|
fix: build_config in TorchLlmArgs and avoid arbitrary args (#4972)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-06-15 17:51:56 -07:00 |
|
amitz-nv
|
109c426077
|
Enable trtllm-bench to run LoRA and add basic e2e perf testing capability for LoRA in PyT flow (#5130)
|
2025-06-15 18:54:04 +03:00 |
|
Kaiyu Xie
|
dce1dcc4f9
|
feat: Support post_proc for bench (#5122)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-06-15 13:02:38 +08:00 |
|
ixlmar
|
e055af1bc9
|
chore: improve disagg test failure detection (#4738)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
|
2025-06-15 01:28:26 +08:00 |
|
Aurelien Chartier
|
1389f5a4d3
|
feat: Add support for fp8 rowwise quantization (#4876)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
Co-authored-by: aikitoria <151776613+aikitoria@users.noreply.github.com>
|
2025-06-14 06:37:48 -07:00 |
|
nv-guomingz
|
3b7b5a5ad5
|
refactor [BREAKING CHANGE]: enhance the llm args pytorch config part 3(torch_compile_config) (#5032)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-06-14 14:23:13 +08:00 |
|
Zheng Duan
|
4d0a5ad384
|
chore: gracefully exit disagg process in tests; better startup and logging (#5109)
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
|
2025-06-13 14:03:55 +08:00 |
|
Iman Tabrizian
|
01bd4c00b4
|
Add two MTP disaggregated test (#4546)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
|
2025-06-13 12:17:45 +08:00 |
|
xinhe-nv
|
d9be419f45
|
tests: update tests for b200 (#5180)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-06-13 11:25:33 +08:00 |
|
ruodil
|
fa582cbe9a
|
test: add more cases for rtx_pro_6000_se and add option kv_cache_dtype in perf test (#5083)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
|
2025-06-13 11:09:15 +08:00 |
|
Fanrong Li
|
38a907aaca
|
[TRTLLM-5278][feat] Add attention dp support to MTP relaxed acceptance (#5119)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
|
2025-06-13 08:58:44 +08:00 |
|
Omer Ullman Argov
|
655bce0b19
|
[fix][test] report individual unittests results to jenkins (#5116)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
|
2025-06-13 01:52:09 +08:00 |
|
nv-guomingz
|
cf35a079f9
|
fix:https://nvbugs/5298661 (#5022)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-06-12 20:41:44 +08:00 |
|
Shi Xiaowei
|
88cba5f354
|
test: waive the NIXL related tests (#5153)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
|
2025-06-12 17:02:27 +08:00 |
|
Fanrong Li
|
4d070d3862
|
chore: fix typo in tests (#5092)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
|
2025-06-12 15:11:26 +08:00 |
|
Michal Guzek
|
53983ad273
|
[TRTLLM-4932] Add Llama-3.1-Nemotron-Nano-8B-v1-FP8 accuracy tests (#4933)
Signed-off-by: moraxu <mguzek@nvidia.com>
|
2025-06-12 15:06:28 +08:00 |
|
ruodil
|
d021cc5126
|
test: set enable_attention_dp to False for non-deepseek models and add more cases for llama_v3.1/3.3 70b fp8 models (#5149)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-06-12 14:59:16 +08:00 |
|
bhsueh_NV
|
505678a286
|
update the free_gpu_mem_fraction for H100 qwen3 qa test (#5114)
Signed-off-by: root <root@eos0274.eos.clusters.nvidia.com>
Co-authored-by: root <root@eos0274.eos.clusters.nvidia.com>
|
2025-06-12 14:40:57 +08:00 |
|
Michal Guzek
|
0daa70999a
|
Fix Llama-3_3-Nemotron-Super-49B-v1 FP8 accuracy threshold configs (#4961)
Signed-off-by: moraxu <mguzek@nvidia.com>
|
2025-06-12 14:32:04 +08:00 |
|
Venky
|
c3b2eb6dab
|
test(perf): Add remaining Llama-Nemotron perftests (nano, super, ultra) + extras ✨ (#5066)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
|
2025-06-12 14:19:15 +08:00 |
|
xinhe-nv
|
11b94feff8
|
test: skip disaggregated tests on arm (#5070)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
|
2025-06-11 17:00:10 +08:00 |
|
ruodil
|
56abae0835
|
test: add more llama_v3.3_70b cases in perf test (#4979)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-06-11 15:44:22 +08:00 |
|
Zheng Duan
|
580a92521e
|
test: conditional disagg and cache aware balancing for deepseek v3 (#4522)
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
|
2025-06-11 09:44:29 +08:00 |
|
Stanley Sun
|
74b0e71ef4
|
test: add more disaggregated serving tests into QA testlist (#5036)
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
|
2025-06-10 09:24:53 +08:00 |
|
liji-nv
|
1d4f748773
|
[fix] Fix illegal mem access and possible accuracy lose. Cherry-pick … (#5017)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
|
2025-06-09 17:50:57 +08:00 |
|
Yechan Kim
|
8b4104d34a
|
feat: add HyperCLOVAX-SEED-Vision support in refactored way (#4799)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
|
2025-06-09 11:04:04 +08:00 |
|
Omer Ullman Argov
|
8731f5f14f
|
chore: Mass integration of release/0.20 (#4898)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Hui Gao <huig@nvidia.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: moraxu <mguzek@nvidia.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: HuiGao-NV <huig@nvidia.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
Co-authored-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Faraz <58580514+farazkh80@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
|
2025-06-08 23:26:26 +08:00 |
|
QI JUN
|
5ee0de7f2a
|
Resubmit #4894 (#4969)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
|
2025-06-08 04:42:15 +08:00 |
|
Ivy Zhang
|
7dce328ad6
|
[TRTLLM-5692][tests] Add speculative decoding test cases on torch flow (#4940)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Ruodi Lu <ruodil@nvidia.com>
Co-authored-by: Ruodi Lu <ruodil@nvidia.com>
|
2025-06-07 11:18:32 +08:00 |
|
Fanrong Li
|
75d020cf07
|
fix: fix cuda graph padding for spec decoding (#4853)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
|
2025-06-06 22:21:42 +08:00 |
|
Anthony Chang
|
eeb555e37b
|
chore: memoize weight shuffle index to speed up weight preproc in moe_backend=TRTLLM (#4826)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
|
2025-06-06 16:13:54 +08:00 |
|
QI JUN
|
ec50684d80
|
Revert "fix a bug of global cuda graph dummy request" (#4970)
|
2025-06-06 08:54:45 +08:00 |
|
QI JUN
|
154f7cc40a
|
fix a bug of global cuda graph dummy request (#4894)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
|
2025-06-05 19:47:40 +08:00 |
|
ixlmar
|
a1526356aa
|
[TRTLLM-5630] restore free_gpu_memory_fraction=0.9 in tests (#4859)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
|
2025-06-05 10:46:29 +01:00 |
|
QI JUN
|
b8c5e3892b
|
Revert "fix: build_config in TorchLlmArgs and avoid invalid args" (#4949)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
|
2025-06-05 17:43:30 +08:00 |
|
xinhe-nv
|
1c3091c63b
|
tests: [TRTQA-2906] add benchmark serving tests (#4901)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
|
2025-06-05 14:33:03 +08:00 |
|
xinhe-nv
|
50a74a1daa
|
tests: fix 5273697 (#4685)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
|
2025-06-05 10:39:21 +08:00 |
|
Yi Zhang
|
1fca654bfd
|
tests: Update gb200 test case (#4754)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
|
2025-06-04 18:49:20 +08:00 |
|
Yan Chunwei
|
ac20159d32
|
fix: build_config in TorchLlmArgs and avoid invalid args (#4600)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-06-04 13:17:29 +08:00 |
|