pcastonguay
ddd704f39c
fix: Fix queued req stats for release/0.20 ( #4806 )
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-06-02 08:32:24 -04:00
brb-nv
7a2cd255bc
fix: Skip dummy medusa/eagle tests when WORLD_SIZE env variable is missing ( #4786 )
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-06-02 02:21:24 -07:00
QI JUN
555118f783
[ https://nvbugs/5303634 ] skip evaluating empty batch_input_ids in summarize.py ( #4676 )
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-02 16:16:05 +08:00
Yan Chunwei
55170ec83a
fix: llmapi-launch add trtllm-bench test with engine building (#4… ( #4550 )
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-01 08:38:01 +08:00
Iman Tabrizian
00e0837e5c
Remove disaggregated cuda graph waived test ( #4707 )
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-31 07:24:00 +08:00
Yanchao Lu
86779213db
[Docs] - Add date and commit info ( #4448 ) ( #4752 )
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-30 15:58:49 +08:00
Yiqing Yan
830d68d101
Waive l0 tests ( #4795 )
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-30 15:56:58 +08:00
Ivy Zhang
9980e73afa
tests: waive failed case ( #4785 )
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-30 11:24:25 +08:00
xinhe-nv
1bc3dfa490
tests: fix 5250460 ( #4751 )
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-30 10:13:45 +08:00
Iman Tabrizian
de0613bd83
[nvbugs/5297821] Fix llama4 disaggregated serving accuracy tests ( #4743 )
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-29 12:55:17 -07:00
Anurag Mukkara
cdde37b779
[nvbugs/5302709] fix: Use HF vision tower for llava-next on A100 ( #4747 )
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
2025-05-29 11:27:07 -07:00
Pamela Peng
52465216f4
[ https://nvbugs/5295389 ][fix]fix moe fp4 on sm120 ( #4624 )
Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
2025-05-29 09:50:47 -07:00
Stanley Sun
040fef709a
test: remove large bs as it will oom ( #4726 )
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-05-29 14:31:57 +08:00
yuanjingx87
55254bdfc4
[fix] Add back RTX6000Pro post-merge tests ( #4744 )
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-05-29 13:17:34 +08:00
ruodil
5c235de80d
test: remove perf test l40s/l20 oom test cases and unwaive tests ( #4720 )
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-05-29 12:47:52 +08:00
wili
9acf19d069
[ https://nvbugspro.nvidia.com/bug/5236935 ][Fix] Fix document of using Draft-Target-Model (DTM) speculative decoding in Triton Server ( #4731 )
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
2025-05-29 10:38:34 +08:00
nv-guomingz
bc7e53c9ef
fix: https://nvbugs/5214239 ( #4718 )
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-05-29 09:36:31 +08:00
Iman Tabrizian
f57cd1b1a9
Remove V1 batching tests ( #4703 )
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-29 05:57:57 +08:00
Bo Li
6567453d3e
fix: [ https://nvbugspro.nvidia.com/bug/5286795 ] Unwaive tests for bug-5286795. ( #4724 )
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-05-29 00:51:23 +08:00
Yukun He
2deba0dca5
fix: Fix AutoTuner warmup request generation. ( #4670 )
The current warmup phase creates only one request, which is insufficient to cover max_num_tokens. Revise the warmup phase to generate a batch of requests that together cover max_num_tokens, eliminating potential fallback cases.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-05-29 00:25:57 +08:00
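The warmup change above can be illustrated with a minimal sketch, assuming that a single request is capped at some per-request length (here a hypothetical `max_seq_len`) and that the goal is a batch whose total token count reaches `max_num_tokens`. `make_warmup_batch` is a hypothetical helper, not the AutoTuner's actual API.

```python
def make_warmup_batch(max_num_tokens: int, max_seq_len: int) -> list[int]:
    """Hypothetical helper: split max_num_tokens into request lengths,
    each no longer than max_seq_len, so the batch covers the full budget."""
    if max_num_tokens <= 0 or max_seq_len <= 0:
        raise ValueError("token budgets must be positive")
    full, remainder = divmod(max_num_tokens, max_seq_len)
    lengths = [max_seq_len] * full
    if remainder:
        lengths.append(remainder)  # last request takes the leftover tokens
    return lengths


# A single request covers at most max_seq_len tokens...
single = min(8192, 2048)
# ...while a batch of requests covers the whole max_num_tokens budget.
batch = make_warmup_batch(8192, 2048)
print(single, batch, sum(batch))  # → 2048 [2048, 2048, 2048, 2048] 8192
```

Splitting into full-length requests plus one remainder keeps every request within the assumed per-request cap while the batch as a whole exercises the largest token count the tuner may see.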
nv-guomingz
1555478c1b
fix: https://nvbugs/5305692 update invalid links in doc. ( #4698 )
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-05-28 17:24:55 +08:00
Venky
1a989a8189
[cherry-pick] test(perf): Pt.2 Add Llama-3_3-Nemotron-Super-49B-v1 integration-perf-tests (cpp) ( #4499 ) ( #4588 )
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2025-05-28 15:48:01 +08:00
Venky
b4e598da27
[cherry-pick] test(perf): Add Llama-3_1-Nemotron-Ultra-253B-v1 perf tests (cpp) ( #4446 ) ( #4590 )
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-05-28 14:17:24 +08:00
Venky
42e622a3b9
[cherry-pick] test(perf): Add remaining Phi-4-mini-instruct perf tests ( #4443 ) ( #4589 )
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-28 14:17:18 +08:00
brb-nv
fc3c2f7f7c
fix: Mistral Small vision encoder with BS>1 ( #4713 )
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-05-28 12:49:28 +08:00
HuiGao-NV
1bfc7d4c29
fix: [nvbug5300494] Use runtime total gpu memory to calculate kv cache memory and log more memory information ( #4660 )
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-05-28 10:00:19 +08:00
Yuxian Qiu
87b50a5736
fix: [nvbugs/5289912][nvbugs/5232406] use thread pool for multi-thread weight loading in fused moe. ( #4699 )
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-05-28 08:13:06 +08:00
Yiqing Yan
6df8620577
[TRTLLM-5326] - Fix test coverage report generation ( #4691 )
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-27 18:24:29 +08:00
Ivy Zhang
fbe48df361
tests: waive and unwaive QA test cases ( #4644 )
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-27 15:19:45 +08:00
Yan Chunwei
10119412ef
fix[nvbug/5286515]: trtllm-llmapi-launch on single node single gpu ( #4529 )
fix[nvbug/5286515]: trtllm-llmapi-launch on single node single gpu (#4428 )
2025-05-27 15:19:04 +08:00
Yanchao Lu
cbb6a264be
[Test] - Correctly waive the Slurm test stage ( #4680 )
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-27 13:34:49 +08:00
Martin Marciniszyn Mehringer
8eafe83c37
Update the description for NGC docker images ( #4671 )
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
2025-05-27 10:57:39 +08:00
Michal Guzek
24153c068e
[TRTLLM-4932] Add QA accuracy tests for NIM-prioritized models ( #4242 )
* Add tests
Signed-off-by: moraxu <mguzek@nvidia.com>
* Add tests v2
Signed-off-by: moraxu <mguzek@nvidia.com>
* Add fixes
Signed-off-by: moraxu <mguzek@nvidia.com>
* Skip fp8 test for Ultra
Signed-off-by: moraxu <mguzek@nvidia.com>
* Add tests for Phi
Signed-off-by: moraxu <mguzek@nvidia.com>
* Skip tests for Phi
Signed-off-by: moraxu <mguzek@nvidia.com>
* Skip tests for Phi - fix
Signed-off-by: moraxu <mguzek@nvidia.com>
* Skip tests for Phi - comment out acc refs
Signed-off-by: moraxu <mguzek@nvidia.com>
* Add more test granularity
Signed-off-by: moraxu <mguzek@nvidia.com>
* Fix examples_test_list.txt
Signed-off-by: moraxu <mguzek@nvidia.com>
* Update test list file
Signed-off-by: moraxu <mguzek@nvidia.com>
* Update yaml files
Signed-off-by: moraxu <mguzek@nvidia.com>
* Address review comments
Signed-off-by: moraxu <mguzek@nvidia.com>
* Remove MMLU tests
Signed-off-by: moraxu <mguzek@nvidia.com>
* Add remaining models
Signed-off-by: moraxu <mguzek@nvidia.com>
---------
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-05-24 19:17:21 +08:00
Jinyang Yuan
f9a9a1af2e
[fix] Fix Llama4 allgather error due to None tensor ( #4511 )
* [fix] Fix Llama4 allgather error due to None tensor
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
* Refactor modifications
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
* Minor modification
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
* Minor fix
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
---------
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-05-24 19:12:12 +08:00
Iman Tabrizian
ad4d947b24
Add missing rcca folder ( #4591 )
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-24 03:28:10 +08:00
Michal Guzek
2a2d7ebf2e
[fix] Incorrect mocker argument for a CLI accuracy test in Llama-3.3-70B-Instruct ( #4604 )
Fix mocker argument
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-05-23 12:18:37 -07:00
Michal Guzek
d2e6af2fe4
[TRTLLM-4932] Add CLI accuracy tests for Llama-3_3-Nemotron-Super-49B-v1 and LLM API FP8 variant ( #4375 )
* Add CLI TestNemotronSuper acc tests
Signed-off-by: moraxu <mguzek@nvidia.com>
* Update mmlu.yaml
Signed-off-by: moraxu <mguzek@nvidia.com>
* Update yaml files
Signed-off-by: moraxu <mguzek@nvidia.com>
* Skip FP8 test in CLI
Signed-off-by: moraxu <mguzek@nvidia.com>
* Address reviews
Signed-off-by: moraxu <mguzek@nvidia.com>
* Address review comments
Signed-off-by: moraxu <mguzek@nvidia.com>
---------
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-05-23 12:17:23 -07:00
Faraz
53008d3ee8
[TRTLLM-4618][feat] Add remaining NVFP4 Nemotron Super 49B test on RTX6000 Pro (SM120) ( #4548 )
added nvfp4 nemotron for qa testing on RTX 6000
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2025-05-23 10:42:32 -07:00
Simeng Liu
630b7907a0
[CI] Waive known errors with test TestDeepSeekV3Lite::test_fp8_block_scales_4gpus ( #4627 )
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-05-23 10:33:44 -07:00
Robin Kobus
7c1565a2b6
[nvbugs/5274894] fix: Sort requests for functional correctness and performance ( #4608 )
* Revert "[nvbugs/5274894] fix: Moving finished context requests to generation (#4576 )"
This reverts commit d39bcb6b40 .
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* fix: Sort requests for functional correctness and performance
- Moved sorting related logic to a dedicated function for better clarity and maintainability.
- Enhanced sorting logic to separate finished context requests from ongoing ones before sorting by Lora task ID.
- Updated function documentation to reflect the sorting behavior and its purpose.
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
---------
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-05-23 15:08:54 +02:00
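The sorting behavior described in this fix (separate finished context requests from ongoing ones, then sort by LoRA task ID) can be sketched as follows. The `Request` class, its fields, the placement of ongoing requests before finished ones, and the handling of requests without a LoRA task are all assumptions for illustration, not the actual scheduler code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:  # hypothetical stand-in for a scheduled request
    req_id: int
    finished_context: bool        # context phase completed in this step
    lora_task_id: Optional[int]   # None when no LoRA adapter is used


def lora_key(req: Request) -> tuple:
    # Requests without a LoRA task sort before those with one.
    return (req.lora_task_id is not None, req.lora_task_id or 0)


def sort_requests(requests: list) -> list:
    # Separate the two groups first so sorting never interleaves them,
    # then sort each group by LoRA task ID (sorted() is stable).
    ongoing = [r for r in requests if not r.finished_context]
    finished = [r for r in requests if r.finished_context]
    return sorted(ongoing, key=lora_key) + sorted(finished, key=lora_key)
```

Partitioning before sorting keeps the finished/ongoing boundary intact, which a single global sort by LoRA task ID would not guarantee.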
stnie
21af6f77dc
ci: waive testcase [NVBUG 5297821] ( #4616 )
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2025-05-23 20:54:42 +08:00
Barry Kang
9e15c035a7
Update internal cutlass kernels commit id ( #4619 )
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
2025-05-23 20:07:41 +08:00
Barry Kang
26793e3569
[ https://nvbugs/5289907 ][fix] Restore per-channel pre-quant ( #4545 )
* Restore per-channel pre-quant
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
* Update TRT test script
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
* Fix pre-commit
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
---------
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
2025-05-23 19:46:53 +08:00
Yukun He
d7701ea6d8
[5180961] chore: Unwaive test for Qwen model. ( #4524 )
* Unwaive test for Qwen model.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
* update.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
---------
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-05-23 13:28:08 +08:00
ruodil
2ce14357ff
test: fix for perf sanity test and skip fp8 deepseek blackwell cases ( #4598 )
fix for sanity test and skip fp8 deepseek blackwell cases
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
2025-05-23 11:13:14 +08:00
Venky
d15ceae62e
test(perf): Extend the Llama-Nemotron-Nano-8B perf-integration-tests (pyt) ( #4407 )
* extend pyt nano tests perf coverage
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
* explicitly set maxnt for some cases
This is because the test harness defaults to no prefill chunking, which means the specified isl is the true context length.
When maxnt is not explicitly specified in the test harness, the `maxnt` passed down to `trtllm-bench` is 2048.
This means trtllm-bench gets conflicting inputs when isl > 2048 but maxnt = 2048; hence maxnt is overridden to be consistent with isl in such cases.
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
---------
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2025-05-23 08:44:37 +08:00
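The maxnt override reasoning above can be expressed as a small sketch. The default of 2048 comes from the commit message; the function name `resolve_max_num_tokens` and the choice to reject an explicit maxnt smaller than isl are hypothetical, chosen to show why, without prefill chunking, maxnt must be at least isl.

```python
from typing import Optional

DEFAULT_MAXNT = 2048  # default maxnt passed to trtllm-bench per the commit message


def resolve_max_num_tokens(isl: int, maxnt: Optional[int] = None) -> int:
    """Hypothetical helper: pick a maxnt consistent with isl when prefill
    chunking is off, i.e. the whole context must fit in one forward pass."""
    if maxnt is None:
        # Override the default when the context would not fit.
        return max(DEFAULT_MAXNT, isl)
    if maxnt < isl:
        raise ValueError(f"maxnt={maxnt} < isl={isl} without prefill chunking")
    return maxnt
```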
Yukun He
dd79631b77
[5234029][5226211] chore: Unwaive multimodal tests for Qwen model. ( #4519 )
Unwaive multimodal tests for Qwen models.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-05-23 08:04:56 +08:00
Robin Kobus
d39bcb6b40
[nvbugs/5274894] fix: Moving finished context requests to generation ( #4576 )
fix: Moving finished context requests to generation
- Unfinished chunked context requests appear at the end of the context requests vector.
- Replaced std::find_if with std::partition to find the correct position to move finished context requests to generation.
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-05-22 17:49:40 +02:00
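The difference between locating a boundary with `std::find_if` and creating one with `std::partition` can be illustrated in Python. This is an analogy only: requests are modeled as hypothetical `(name, finished)` tuples, and the partition here is stable (closer to `std::stable_partition`), whereas `std::partition` does not preserve relative order.

```python
def find_first_unfinished(reqs: list) -> int:
    # Analogue of std::find_if: index of the first unfinished request.
    # This is only a valid boundary if every finished request precedes it.
    return next((i for i, (_, done) in enumerate(reqs) if not done), len(reqs))


def partition_finished(reqs: list) -> tuple:
    # Analogue of std::partition: move all finished requests to the front
    # and return the boundary index, regardless of the initial order.
    finished = [r for r in reqs if r[1]]
    unfinished = [r for r in reqs if not r[1]]
    return finished + unfinished, len(finished)
```

With interleaved input such as `[("a", True), ("b", False), ("c", True)]`, `find_first_unfinished` returns 1 and would miss the finished request `c`, while `partition_finished` returns a boundary of 2 with both finished requests in front.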
ruodil
3d083b69be
test: waive hanging cases for perf test ( #4563 )
waive hanging cases
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
2025-05-22 21:09:12 +08:00
Yukun He
21ada0a961
[5141290][5273694][5260696] fix: Fix mrope argument missing issue in the summary tasks for Qwen model. ( #4432 )
Fixed the missing mrope argument in the summary tasks for Qwen models and re-enabled the fixed tests.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-05-22 17:45:59 +08:00