Yiqing Yan
b1ce7f0765
Waive L0 test ( #4862 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-03 18:37:21 +08:00
Yiqing Yan
95e6ad579d
Waive L0 test ( #4857 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-03 15:58:26 +08:00
Yechan Kim
565abb6887
fix: [nvbugs/5298600] fix illegal memory access on mrope_position_deltas ( #4830 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-06-03 14:56:50 +08:00
Fanrong Li
6e46e13523
Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/4379 ( #4833 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-03 12:30:01 +08:00
Fanrong Li
82d918b93e
Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/4536 ( #4834 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-03 12:29:54 +08:00
Yanchao Lu
36116f09f6
[Infra] - Better utilize multi-GPU CI resources ( #4850 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-06-03 12:25:20 +08:00
ruodil
7c47714a39
test: shorten reqs in con:1 cases and add streaming cases, add l2 perf test ( #4796 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-03 10:20:55 +08:00
Stanley Sun
b58556e2d9
test: remove invalid triton integration test cases ( #4801 )
...
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-03 09:39:23 +08:00
Michal Guzek
4e68be2da7
[TRTLLM-4932] Remove moe-related arguments from Llama-3_1-Nemotron-Ultra-253B-v1 CLI accuracy test ( #4808 )
...
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-06-02 12:16:28 -07:00
Faraz
10d5af06e0
[NVBUG-5291971] JIT path for XQA ( #4675 )
...
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2025-06-02 16:24:59 +02:00
pcastonguay
ddd704f39c
fix: Fix queued req stats for release/0.20 ( #4806 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-06-02 08:32:24 -04:00
brb-nv
7a2cd255bc
fix: Skip dummy medusa/eagle tests when WORLD_SIZE env variable is missing ( #4786 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-06-02 02:21:24 -07:00
QI JUN
555118f783
[ https://nvbugs/5303634 ] skip evaluating empty batch_input_ids in summarize.py ( #4676 )
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-02 16:16:05 +08:00
Yan Chunwei
55170ec83a
fix: llmapi-launch add trtllm-bench test with engine building (#4… ( #4550 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-01 08:38:01 +08:00
Iman Tabrizian
00e0837e5c
Remove disaggregated cuda graph waived test ( #4707 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-31 07:24:00 +08:00
Yanchao Lu
86779213db
[Docs] - Add date and commit info ( #4448 ) ( #4752 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-30 15:58:49 +08:00
Yiqing Yan
830d68d101
Waive l0 tests ( #4795 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-30 15:56:58 +08:00
Ivy Zhang
9980e73afa
tests: waive failed case ( #4785 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-30 11:24:25 +08:00
xinhe-nv
1bc3dfa490
tests: fix 5250460 ( #4751 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-30 10:13:45 +08:00
Iman Tabrizian
de0613bd83
[nvbugs/5297821] Fix llama4 disaggregated serving accuracy tests ( #4743 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-29 12:55:17 -07:00
Anurag Mukkara
cdde37b779
[nvbugs/5302709] fix: Use HF vision tower for llava-next on A100 ( #4747 )
...
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
2025-05-29 11:27:07 -07:00
Pamela Peng
52465216f4
[ https://nvbugs/5295389 ][fix] fix moe fp4 on sm120 ( #4624 )
...
Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
2025-05-29 09:50:47 -07:00
Stanley Sun
040fef709a
test: remove large bs as it will oom ( #4726 )
...
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-05-29 14:31:57 +08:00
yuanjingx87
55254bdfc4
[fix] Add back RTX6000Pro post-merge tests ( #4744 )
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-05-29 13:17:34 +08:00
ruodil
5c235de80d
test: remove perf test l40s/l20 oom test cases and unwaive tests ( #4720 )
...
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-05-29 12:47:52 +08:00
wili
9acf19d069
[ https://nvbugspro.nvidia.com/bug/5236935 ][Fix] Fix document of using Draft-Target-Model (DTM) speculative decoding in Triton Server ( #4731 )
...
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
2025-05-29 10:38:34 +08:00
nv-guomingz
bc7e53c9ef
fix: https://nvbugs/5214239 ( #4718 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-05-29 09:36:31 +08:00
Iman Tabrizian
f57cd1b1a9
Remove V1 batching tests ( #4703 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-29 05:57:57 +08:00
Bo Li
6567453d3e
fix: [ https://nvbugspro.nvidia.com/bug/5286795 ] Unwaive tests for bug-5286795. ( #4724 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-05-29 00:51:23 +08:00
Yukun He
2deba0dca5
fix: Fix AutoTuner warmup request generating. ( #4670 )
...
The current warmup phase creates a single request, which is not enough to cover max_num_tokens. Revise the warmup phase to generate a batch of requests that covers max_num_tokens, eliminating potential fallback cases.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-05-29 00:25:57 +08:00
nv-guomingz
1555478c1b
fix: https://nvbugs/5305692 update invalid links in doc. ( #4698 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-05-28 17:24:55 +08:00
Venky
1a989a8189
[cherry-pick] test(perf): Pt.2 Add Llama-3_3-Nemotron-Super-49B-v1 integration-perf-tests (cpp) ( #4499 ) ( #4588 )
...
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2025-05-28 15:48:01 +08:00
Venky
b4e598da27
[cherry-pick] test(perf): Add Llama-3_1-Nemotron-Ultra-253B-v1 perf tests (cpp) ( #4446 ) ( #4590 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-05-28 14:17:24 +08:00
Venky
42e622a3b9
[cherry-pick] test(perf): Add remaining Phi-4-mini-instruct perf tests ( #4443 ) ( #4589 )
...
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-28 14:17:18 +08:00
brb-nv
fc3c2f7f7c
fix: Mistral Small vision encoder with BS>1 ( #4713 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-05-28 12:49:28 +08:00
HuiGao-NV
1bfc7d4c29
fix: [nvbug5300494] Use runtime total gpu memory to calculate kv cache memory and log more memory information ( #4660 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-05-28 10:00:19 +08:00
Yuxian Qiu
87b50a5736
fix: [nvbugs/5289912][nvbugs/5232406] use thread pool for multi-thread weight loading in fused moe. ( #4699 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-05-28 08:13:06 +08:00
Yiqing Yan
6df8620577
[TRTLLM-5326] - Fix test coverage report generation ( #4691 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-27 18:24:29 +08:00
Ivy Zhang
fbe48df361
tests: waive and unwaive QA test cases ( #4644 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-27 15:19:45 +08:00
Yan Chunwei
10119412ef
fix[nvbug/5286515]: trtllm-llmapi-launch on single node single gpu ( #4529 )
...
fix[nvbug/5286515]: trtllm-llmapi-launch on single node single gpu (#4428)
2025-05-27 15:19:04 +08:00
Yanchao Lu
cbb6a264be
[Test] - Correctly waive the Slurm test stage ( #4680 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-27 13:34:49 +08:00
Martin Marciniszyn Mehringer
8eafe83c37
Update the description for NGC docker images ( #4671 )
...
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
2025-05-27 10:57:39 +08:00
Michal Guzek
24153c068e
[TRTLLM-4932] Add QA accuracy tests for NIM-prioritized models ( #4242 )
...
* Add tests
Signed-off-by: moraxu <mguzek@nvidia.com>
* Add tests v2
Signed-off-by: moraxu <mguzek@nvidia.com>
* Add fixes
Signed-off-by: moraxu <mguzek@nvidia.com>
* Skip fp8 test for Ultra
Signed-off-by: moraxu <mguzek@nvidia.com>
* Add tests for Phi
Signed-off-by: moraxu <mguzek@nvidia.com>
* Skip tests for Phi
Signed-off-by: moraxu <mguzek@nvidia.com>
* Skip tests for Phi - fix
Signed-off-by: moraxu <mguzek@nvidia.com>
* Skip tests for Phi - comment out acc refs
Signed-off-by: moraxu <mguzek@nvidia.com>
* Add more test granularity
Signed-off-by: moraxu <mguzek@nvidia.com>
* Fix examples_test_list.txt
Signed-off-by: moraxu <mguzek@nvidia.com>
* Update test list file
Signed-off-by: moraxu <mguzek@nvidia.com>
* Update yaml files
Signed-off-by: moraxu <mguzek@nvidia.com>
* Address review comments
Signed-off-by: moraxu <mguzek@nvidia.com>
* Remove MMLU tests
Signed-off-by: moraxu <mguzek@nvidia.com>
* Add remaining models
Signed-off-by: moraxu <mguzek@nvidia.com>
---------
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-05-24 19:17:21 +08:00
Jinyang Yuan
f9a9a1af2e
[fix] Fix Llama4 allgather error due to None tensor ( #4511 )
...
* [fix] Fix Llama4 allgather error due to None tensor
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
* Refactor modifications
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
* Minor modification
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
* Minor fix
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
---------
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-05-24 19:12:12 +08:00
Iman Tabrizian
ad4d947b24
Add missing rcca folder ( #4591 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-24 03:28:10 +08:00
Michal Guzek
2a2d7ebf2e
[fix] Incorrect mocker argument for a CLI accuracy test in Llama-3.3-70B-Instruct ( #4604 )
...
Fix mocker argument
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-05-23 12:18:37 -07:00
Michal Guzek
d2e6af2fe4
[TRTLLM-4932] Add CLI accuracy tests for Llama-3_3-Nemotron-Super-49B-v1 and LLM API FP8 variant ( #4375 )
...
* Add CLI TestNemotronSuper acc tests
Signed-off-by: moraxu <mguzek@nvidia.com>
* Update mmlu.yaml
Signed-off-by: moraxu <mguzek@nvidia.com>
* Update yaml files
Signed-off-by: moraxu <mguzek@nvidia.com>
* Skip FP8 test in CLI
Signed-off-by: moraxu <mguzek@nvidia.com>
* Address reviews
Signed-off-by: moraxu <mguzek@nvidia.com>
* Address review comments
Signed-off-by: moraxu <mguzek@nvidia.com>
---------
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-05-23 12:17:23 -07:00
Faraz
53008d3ee8
[TRTLLM-4618][feat] Add remaining NVFP4 Nemotron Super 49B test on RTX6000 Pro (SM120) ( #4548 )
...
Added NVFP4 Nemotron for QA testing on RTX 6000
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2025-05-23 10:42:32 -07:00
Simeng Liu
630b7907a0
[CI] Waive known errors with test TestDeepSeekV3Lite::test_fp8_block_scales_4gpus ( #4627 )
...
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-05-23 10:33:44 -07:00
Robin Kobus
7c1565a2b6
[nvbugs/5274894] fix: Sort requests for functional correctness and performance ( #4608 )
...
* Revert "[nvbugs/5274894] fix: Moving finished context requests to generation (#4576)"
This reverts commit d39bcb6b40.
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* fix: Sort requests for functional correctness and performance
- Moved sorting related logic to a dedicated function for better clarity and maintainability.
- Enhanced sorting logic to separate finished context requests from ongoing ones before sorting by LoRA task ID.
- Updated function documentation to reflect the sorting behavior and its purpose.
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
---------
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-05-23 15:08:54 +02:00