DylanChen-NV
|
74dca0aa7b
|
[NVBUG-5304516/5319741]Qwen2.5VL FP8 support (#5029)
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
|
2025-07-09 23:16:42 +08:00 |
|
xavier-nvidia
|
b6013da198
|
Fix GEMM+AR fusion on blackwell (#5563)
Signed-off-by: xsimmons <xsimmons@nvidia.com>
|
2025-07-09 08:48:47 +08:00 |
|
Yan Chunwei
|
e50d95c40d
|
chore [TRTLLM-6161]: add LLM speculative decoding example (#5706)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-07-09 07:33:11 +08:00 |
|
Raayan Dhar
|
e3268a4221
|
[TRTLLM-5847][feat] Support n-gram speculative decoding with disagg (#5732)
Signed-off-by: raayandhar <rdhar@nvidia.com>
|
2025-07-08 09:39:58 -04:00 |
|
liji-nv
|
95978e3044
|
[fix] https://nvbugs/5333654 Unwaive to check ci status and improve torch compile multi-gpu coverage (#5700)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
|
2025-07-08 12:42:15 +08:00 |
|
Yanchao Lu
|
2013034948
|
[Test] - Waive or fix few known test failures (#5769)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-07-06 21:14:16 +08:00 |
|
Stefan Niebler
|
d1112aac37
|
[TRTLLM-3442] feat: added beam search support to the PyTorch Workflow (#5333)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
|
2025-07-05 01:35:13 +09:00 |
|
Chuang Zhu
|
ffc0b8f5da
|
Cache transceiver support VSWA (#5505)
Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Co-authored-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
|
2025-07-05 01:18:42 +09:00 |
|
Yi Zhang
|
73d30a23c7
|
test: add more tests for GB200 with 8 GPUs/2 nodes in L0 tests (#5397)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
|
2025-07-04 13:14:13 +08:00 |
|
Zheng Duan
|
cb9f596dbe
|
[nvbug 5300551] test: increase block count in eviction test (#5465)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
|
2025-07-04 13:14:13 +08:00 |
|
Netanel Haber
|
f91379b7e8
|
delete duplicate eagle3 and ngram tests (#5711)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
|
2025-07-03 15:47:26 +03:00 |
|
Omer Ullman Argov
|
c72856188c
|
[ci] small multigpu speedups (#5643)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
|
2025-07-03 08:06:10 -04:00 |
|
Emma Qiao
|
31699cbeb1
|
[Infra] - Set default timeout to 1hr and remove some specific settings (#5667)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-07-02 08:37:54 -04:00 |
|
Kaiyu Xie
|
f9a455651b
|
perf: Use tokenizers API to optimize incremental detokenization perf (#5574)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-07-01 09:35:25 -04:00 |
|
brb-nv
|
4ef60d5fbb
|
nvbugs-5331031; nvbugs-5344203 - address intermittent issues with Mistral Small multimodal for BS=8 (#5453)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
|
2025-07-01 20:12:55 +08:00 |
|
Yan Chunwei
|
a5eff139f1
|
[TRTLLM-5277] chore: refine llmapi examples for 1.0 (part1) (#5431)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
|
2025-07-01 19:06:41 +08:00 |
|
Emma Qiao
|
65c2b93284
|
[Infra] - Add some timeout and unwaive a test which dev fixed (#5631)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-07-01 05:01:32 -04:00 |
|
Omer Ullman Argov
|
42134b8b84
|
[ci] move eagle1 and medusa tests to post-merge (#5604)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
|
2025-06-30 19:32:28 +08:00 |
|
Talor Abramovich
|
70e34a3291
|
[TRTLLM-5831][feat] Add LoRA support for pytorch backend in trtllm-serve (#5376)
Signed-off-by: Talor Abramovich <talora@nvidia.com>
|
2025-06-29 12:46:30 +00:00 |
|
amirkl94
|
a985c0b7e6
|
tests: Move stress tests to be Post-Merge only (#5166)
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
|
2025-06-29 09:44:47 +03:00 |
|
Iman Tabrizian
|
26b953e29a
|
[nvbugs/5309940] Add support for input output token counts (#5445)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
|
2025-06-28 04:39:39 +08:00 |
|
wili
|
56cdfe5c6c
|
[TRTLLM-5000][feat] NGrams V2 (#4569)
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
|
2025-06-27 23:00:17 +08:00 |
|
Iman Tabrizian
|
49af791f66
|
Add testing for trtllm-llmapi-launch with tritonserver (#5528)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
|
2025-06-27 11:19:52 +08:00 |
|
Frank
|
aa6e015ef8
|
Update trtllm-bench to support new Pytorch default. (#5491)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
|
2025-06-26 17:05:43 -07:00 |
|
jmydurant
|
8836990bde
|
[TRTLLM-3602][feat] support nvfp4 model and fp8 kv cache for MLA chunked prefill (Blackwell) (#5475)
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
|
2025-06-26 22:18:08 +08:00 |
|
Omer Ullman Argov
|
6bae76d7ca
|
[fix][ci] move torch tests to run under torch stage (#5473)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
|
2025-06-26 14:31:38 +03:00 |
|
Omer Ullman Argov
|
1633bd2bef
|
[CI] move flashinfer llama tests to post merge (#5506)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
|
2025-06-26 19:27:32 +08:00 |
|
Emma Qiao
|
32d1573c43
|
[Infra] - Add timeout setting for long tests found in post-merge (#5501)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-06-26 11:31:39 +08:00 |
|
jmydurant
|
578dbc8d9a
|
feat: chunked prefill for MLA (Blackwell) (#4651)
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
|
2025-06-26 09:01:00 +08:00 |
|
HuiGao-NV
|
74ae15a26b
|
CI: enable test cases on single device type (#5484)
Signed-off-by: Hui Gao <huig@nvidia.com>
|
2025-06-26 08:03:44 +08:00 |
|
QI JUN
|
feaf789342
|
CI: reduce BF16 test cases in B200 (#5482)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-06-26 07:18:20 +08:00 |
|
HuiGao-NV
|
cc3c2b3be2
|
Move 3 disaggregated cases from 4 GPUs devices to 1 GPU device (#5457)
Signed-off-by: Hui Gao <huig@nvidia.com>
|
2025-06-25 21:38:14 +08:00 |
|
Enwei Zhu
|
fc7a81ceb0
|
test: Add LLGuidance test and refine guided decoding (#5348)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-06-25 14:12:56 +08:00 |
|
dongxuy04
|
699520082b
|
Add MTP support for Online EPLB (#5213)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
|
2025-06-25 07:58:13 +08:00 |
|
Emma Qiao
|
475272046a
|
[Infra] - Waive failed tests in post-merge and increase some timeout setting (#5424)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-06-24 17:19:31 +08:00 |
|
Fanrong Li
|
5d4ab47d5b
|
fix: refactor and fix mtp vanilla (#4762)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
|
2025-06-20 05:23:39 +08:00 |
|
Kaiyu Xie
|
7246fd75d1
|
feat: Support stream_interval (#5284)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-06-19 21:57:10 +08:00 |
|
Enwei Zhu
|
bca758fce1
|
fix: Fix DS-R1 nvfp4 test case naming (#5361)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-06-19 15:50:43 +08:00 |
|
Emma Qiao
|
493f268b1c
|
[Infra]Fix l0_sanity_check.yml which also has gb202 and gb203 (#5360)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-06-19 15:05:57 +08:00 |
|
amitz-nv
|
1753202b61
|
[TRTLLM-5825][fix] Fix torch LoRA TP (#5338)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
|
2025-06-19 09:12:00 +03:00 |
|
Emma Qiao
|
7f68de3e3f
|
Refactor test timeout for individual long case (#4757)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-06-19 13:52:11 +08:00 |
|
bhsueh_NV
|
dce8620013
|
chore: enable moe_backend on Qwen3 test (#5230)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
|
2025-06-19 13:40:45 +08:00 |
|
Fanrong Li
|
6c3210a8be
|
[test] add nvfp4 DeepSeek-V3-Lite-mtp tests (#5125)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
|
2025-06-19 09:48:22 +08:00 |
|
Omer Ullman Argov
|
5010f8719d
|
[fix][test] remove duplicate test runs (#5241)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
|
2025-06-19 01:59:54 +08:00 |
|
Omer Ullman Argov
|
a28a152001
|
[fix][test] remove some cpp test cases from h100 (#5335)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
|
2025-06-18 20:40:26 +03:00 |
|
yuanjingx87
|
a1c5704055
|
[feat] Multi-node CI testing support via Slurm (#4771)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
Signed-off-by: yuanjingx87 <197832395+yuanjingx87@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-06-19 01:11:12 +08:00 |
|
HuiGao-NV
|
d13d2f460d
|
Remove duplicated test cases (#5323)
Signed-off-by: Hui Gao <huig@nvidia.com>
Signed-off-by: Hui Gao <huig@nvidia.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
|
2025-06-18 21:20:20 +08:00 |
|
Emma Qiao
|
b29ac5b561
|
[Infra] Update 5080 and 5090 case condition due to the driver update (#5317)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-06-18 20:01:36 +08:00 |
|
Omer Ullman Argov
|
f501ce57b1
|
[fix][test] move deepseek single gpu tests to post merge (#5280)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
|
2025-06-18 06:59:39 +03:00 |
|
Yanchao Lu
|
f4cdbfcdf0
|
None - Some clean-ups for the automation pipeline (#5245)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-06-17 21:08:24 +08:00 |
|