Yan Chunwei
55170ec83a
fix: llmapi-launch add add trtllm-bench test with engine building (#4… ( #4550 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-01 08:38:01 +08:00
Iman Tabrizian
00e0837e5c
Remove disaggregated cuda graph waived test ( #4707 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-31 07:24:00 +08:00
Yiqing Yan
830d68d101
Waive l0 tests ( #4795 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-30 15:56:58 +08:00
Ivy Zhang
9980e73afa
tests: waive failed case ( #4785 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-30 11:24:25 +08:00
xinhe-nv
1bc3dfa490
tests: fix 5250460 ( #4751 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-30 10:13:45 +08:00
Stanley Sun
040fef709a
test: remove large bs as it will oom ( #4726 )
...
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-05-29 14:31:57 +08:00
ruodil
5c235de80d
test: remove perf test l40s/l20 oom test cases and unwaive tests ( #4720 )
...
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-05-29 12:47:52 +08:00
nv-guomingz
bc7e53c9ef
fix: https://nvbugs/5214239 ( #4718 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-05-29 09:36:31 +08:00
Bo Li
6567453d3e
fix: [ https://nvbugspro.nvidia.com/bug/5286795 ] Unwaive tests for bug-5286795. ( #4724 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-05-29 00:51:23 +08:00
Venky
1a989a8189
[cherry-pick] test(perf): Pt.2 Add Llama-3_3-Nemotron-Super-49B-v1 integration-perf-tests (cpp) ( #4499 ) ( #4588 )
...
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2025-05-28 15:48:01 +08:00
Venky
b4e598da27
[cherry-pick] test(perf): Add Llama-3_1-Nemotron-Ultra-253B-v1 perf tests (cpp) ( #4446 ) ( #4590 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-05-28 14:17:24 +08:00
Venky
42e622a3b9
[cherry-pick] test(perf): Add remaining Phi-4-mini-instruct perf tests ( #4443 ) ( #4589 )
...
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-28 14:17:18 +08:00
brb-nv
fc3c2f7f7c
fix: Mistral Small vision encoder with BS>1 ( #4713 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-05-28 12:49:28 +08:00
Yuxian Qiu
87b50a5736
fix: [nvbugs/5289912][nvbugs/5232406] use thread pool for multi-thread weight loading in fused moe. ( #4699 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-05-28 08:13:06 +08:00
Ivy Zhang
fbe48df361
tests: waive and unwaive QA test cases ( #4644 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-27 15:19:45 +08:00
Michal Guzek
24153c068e
[TRTLLM-4932] Add QA accuracy tests for NIM-prioritized models ( #4242 )
...
* Add tests
Signed-off-by: moraxu <mguzek@nvidia.com>
* Add tests v2
Signed-off-by: moraxu <mguzek@nvidia.com>
* Add fixes
Signed-off-by: moraxu <mguzek@nvidia.com>
* Skip fp8 test for Ultra
Signed-off-by: moraxu <mguzek@nvidia.com>
* Add tests for Phi
Signed-off-by: moraxu <mguzek@nvidia.com>
* Skip tests for Phi
Signed-off-by: moraxu <mguzek@nvidia.com>
* Skip tests for Phi - fix
Signed-off-by: moraxu <mguzek@nvidia.com>
* Skip tests for Phi - comment out acc refs
Signed-off-by: moraxu <mguzek@nvidia.com>
* Add more test granularity
Signed-off-by: moraxu <mguzek@nvidia.com>
* Fix examples_test_list.txt
Signed-off-by: moraxu <mguzek@nvidia.com>
* Update test list file
Signed-off-by: moraxu <mguzek@nvidia.com>
* Update yaml files
Signed-off-by: moraxu <mguzek@nvidia.com>
* Address review comments
Signed-off-by: moraxu <mguzek@nvidia.com>
* Remove MMLU tests
Signed-off-by: moraxu <mguzek@nvidia.com>
* Add remaining models
Signed-off-by: moraxu <mguzek@nvidia.com>
---------
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-05-24 19:17:21 +08:00
Jinyang Yuan
f9a9a1af2e
[fix] Fix Llama4 allgather error due to None tensor ( #4511 )
...
* [fix] Fix Llama4 allgather error due to None tensor
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
* Refactor modifications
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
* Minor modification
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
* Minor fix
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
---------
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-05-24 19:12:12 +08:00
Michal Guzek
d2e6af2fe4
[TRTLLM-4932] Add CLI accuracy tests for Llama-3_3-Nemotron-Super-49B-v1 and LLM API FP8 variant ( #4375 )
...
* Add CLI TestNemotronSuper acc tests
Signed-off-by: moraxu <mguzek@nvidia.com>
* Update mmlu.yaml
Signed-off-by: moraxu <mguzek@nvidia.com>
* Update yaml files
Signed-off-by: moraxu <mguzek@nvidia.com>
* Skip FP8 test in CLI
Signed-off-by: moraxu <mguzek@nvidia.com>
* Address reviews
Signed-off-by: moraxu <mguzek@nvidia.com>
* Address review comments
Signed-off-by: moraxu <mguzek@nvidia.com>
---------
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-05-23 12:17:23 -07:00
Faraz
53008d3ee8
[TR[TLLM-4618][feat] Add remaining NVFP4 Nemotron Super 49B test on RTX6000 Pro (SM120) ( #4548 )
...
added nvfp4 nemotron for qa testing on RTX 6000
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2025-05-23 10:42:32 -07:00
Simeng Liu
630b7907a0
[CI] Waive known errors with test TestDeepSeekV3Lite::test_fp8_block_scales_4gpus ( #4627 )
...
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-05-23 10:33:44 -07:00
Yukun He
d7701ea6d8
[5180961] chore: Unwaive test for Qwen model. ( #4524 )
...
* Unwaive test for Qwen model.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
* update.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
---------
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-05-23 13:28:08 +08:00
ruodil
2ce14357ff
test: fix for perf sanity test and skip fp8 deepseek blackwell cases ( #4598 )
...
fix for sanity test and skip fp8 deepseek blackwell cases
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
2025-05-23 11:13:14 +08:00
Venky
d15ceae62e
test(perf): Extend the Llama-Nemotron-Nano-8B perf-integration-tests (pyt) ( #4407 )
...
* extend pyt nano tests perf coverage
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
* explicitly set maxnt for some cases
This is because the test harness default to no prefill chunking, that means the isl specified is the true context.
When explicitly unspecified in the test harness, the `maxnt` passed down to `trtllm-bench` is 2048.
This means trtllm-bench gets conflicting inputs when isl>2048 but maxnt=2048; hence overriding maxnt to be consistent with isl for such cases.
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
---------
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2025-05-23 08:44:37 +08:00
Yukun He
dd79631b77
[5234029][5226211] chore: Unwaive multimodal tests for Qwen model. ( #4519 )
...
Unwaive multimodal tests for Qwen models.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-05-23 08:04:56 +08:00
ruodil
3d083b69be
test: waive hanging cases for perf test ( #4563 )
...
waive hanging cases
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
2025-05-22 21:09:12 +08:00
Yukun He
21ada0a961
[5141290][5273694][5260696] fix: Fix mrope argument missing issue in the summary tasks for Qwen model. ( #4432 )
...
Fixed the mrope argument missing issue in the summary tasks for Qwen models.
And re-enabled the fixed tests.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-05-22 17:45:59 +08:00
ruodil
ce6a32997b
test: add failed case in waive list and fix some test script issue for perf test ( #4528 )
...
add failed case in waive list and fix some test script issue
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
2025-05-21 16:36:32 +08:00
Ivy Zhang
e977c75300
tests: update api change from decoder to sampler in test ( #4479 )
...
update
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-21 14:22:18 +08:00
ruodil
b5edf13b33
test: update test filter in perf test yml file to select cases by gpu name and add cases for RTX 6000 pro ( #4282 )
...
* add cases for rtx_pro_6000 and update test filter
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
* amend a typo in model llama_v3.1_405b_instruct fp4 and add more cases for rtx pro 6000 and waive_list
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
---------
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-20 10:58:05 +08:00
Michal Guzek
0a342a42f7
[TRTLLM-4932] Add CLI accuracy tests for Llama-3.3-70B-Instruct and LLM API BF16 variant ( #4362 )
...
* Add CLI TestLlama3_3_70BInstruct acc tests
Signed-off-by: moraxu <mguzek@nvidia.com>
* Add tests to qa lists
Signed-off-by: moraxu <mguzek@nvidia.com>
* Add comment
Signed-off-by: moraxu <mguzek@nvidia.com>
* Fix test names
Signed-off-by: moraxu <mguzek@nvidia.com>
* Update yaml files
Signed-off-by: moraxu <mguzek@nvidia.com>
* Update cli file
Signed-off-by: moraxu <mguzek@nvidia.com>
---------
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-05-20 09:48:14 +08:00
xinhe-nv
402385588d
test: [CI] Add failed cases into waives.txt ( #4429 )
...
* update waive list
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
* update waive id
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
* update waive list
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
* update waive list
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
---------
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-20 09:43:55 +08:00
Yuxian Qiu
c8e062bfd3
fix: [nvbugs/5287097] Align PP layer distribution between pytorch and TRT flow. ( #4399 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
Co-authored-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-05-19 14:25:36 -07:00
Venky
bb02d86b54
test(perf): Add some Llama-3_3-Nemotron-Super-49B-v1 integration-perf-tests (TRT flow, trtllm-bench) ( #4128 )
...
* changes to run llama-v3.3-nemotron-super-49b
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
* yapf
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
* address review comments pt 1
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
* re-add cpp super tests
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
---------
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2025-05-19 12:00:48 -07:00
Faraz
7656af1b57
[TRTLLM-4618][feat] Fix cutlass MoE GEMM fallback failure on FP8 + add e2e test for Mixtral 8x7B FP8 on RTX6000 Pro (SM120) ( #4335 )
...
* add mixtral7x8b fp8 test with fixed cutlass fp8 moe gemm
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
* update cutlass versions
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
* added internal cutlass with fix and docker update
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
* added mixtral to pro 6000
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
---------
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2025-05-19 08:56:21 -07:00
liji-nv
58e405624a
[ https://nvbugs/5123103 ][fix] Fix torch compile for DeepSeekV3 ( #3952 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-05-19 22:12:25 +08:00
Iman Tabrizian
c6074c47da
Add llama4 disagg accuracy tests ( #4336 )
...
* Add llama4 disagg accuracy tests
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
* Make it async and add GSM8K benchmark
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
---------
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-19 21:55:08 +08:00
Dom Brown
c45f414bbf
Test: Improve model re-use in C++ DGX tests for CI stability ( #4263 )
...
* Fix padded vocab size for Llama
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
* Refactor multi GPU llama executor tests, and reuse the built model engines
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
* Fix test list typo
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
* WIP
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
* Further WIP
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
* WIP
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
* Update test lists and readme
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
* Try parametrize for asymmetric
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
* Parametrize + skip unsupported combinations
Signed-off-by: domb <3886319+DomBrown@users.noreply.github.com>
* Update test list
Signed-off-by: domb <3886319+DomBrown@users.noreply.github.com>
* Reduce environment duplicated code
Signed-off-by: domb <3886319+DomBrown@users.noreply.github.com>
---------
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Signed-off-by: domb <3886319+DomBrown@users.noreply.github.com>
2025-05-19 14:20:21 +01:00
Yan Chunwei
5b1c88de8d
chore: cleanup perf_evaluator code ( #3833 )
...
* chore: cleanup perf_evaluator code
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* up
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
---------
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-19 13:21:36 +08:00
Ivy Zhang
58d2508b89
tests: Add test cases for rcca cases ( #4347 )
...
* add qwen2_0_5_instruct cp4 test case
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* add qwen2.5 fp8 kvcache test case
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* add ds distill qwen cpp runner test case
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* trial
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
---------
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-19 12:06:43 +08:00
Ivy Zhang
c4a0d768b5
tests: add qa test mentioned in docs ( #4357 )
...
* add nemotron-h and llama_70b cases
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* trial
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* add llm decoder quick_start case
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* update nemotron-h test case
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* add qwen3 quickstart test
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* add trtllm_decoder accuracy test
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* remove quickstart test for llm_decoder
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* fix import error
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* nemotronh fp8 trial
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* fix name
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* remove nemotronh-fp8
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
---------
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-19 10:06:51 +08:00
Faraz
791c209006
[TRTLLM-4618][feat] Add Nemotron Super 49B FP8 test on RTX6000 Pro (SM120) ( #4363 )
...
* added nemotron 49b fp8 for B40 release
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
* add tests to QA list
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
* pre-commit changes
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
---------
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2025-05-19 09:30:24 +08:00
Iman Tabrizian
7de90a66bc
Remove vila test ( #4376 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-19 09:02:39 +08:00
Yanchao Lu
0d7269e2a7
[Infra][Docs] - Some clean-up for the CI pipeline and docs ( #4419 )
...
* [Docs] - Some clean-up for the docs
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
* [Infra] - Some clean-up for the CI pipeline
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
---------
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-19 00:07:45 +08:00
shaharmor98
27afcb9928
add changes for fp8, nemotron-nas, API ( #4180 )
...
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-05-18 23:27:25 +08:00
Venky
fb663b637a
Extend the Llama-Nemotron-Nano-8B perf-integration-tests (cpp) ( #4195 )
...
* add ll-nm-nano tests that map to nim requirements
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
* prune some pytorch cases (fp8)
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
* removing pyt backend test changes
- When validating the pytorch tests with the isl/osl/conc/quant settings (that is done for cpp backend too), seeing hangs that need further debugging.
- Therefore don't want to block this PR, hence removing them.
- Seeing
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
---------
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2025-05-17 22:46:21 +08:00
Yuxian Qiu
cc1bba1686
test: Waive tests for nvbugs/5286795. ( #4409 )
...
* Waive tests for nvbugs/5286795.
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
* Apply suggestions from code review
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
---------
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-17 19:41:05 +08:00
Jinyang Yuan
b618e1f55b
perf: Eliminate the need for attention DP padding when possible ( #3439 )
...
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
Co-authored-by: raccoonliukai <raccoonliu@tencent.com>
2025-05-17 13:30:55 +08:00
liji-nv
fb437ed709
[CI] waive accuracy/test_cli_flow.py::TestTinyLlama1_1BChat::test_pp4 ( #4397 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-05-16 20:18:07 +08:00
Daniel Cámpora
df19430629
chore: Mass Integration 0.19 ( #4255 )
...
* fix: Fix/fused moe 0.19 (#3799 )
* fix bug of stream init
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix bug
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
---------
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix: Add pre-download of checkpoint before benchmark. (#3772 )
* Add pre-download of checkpoint before benchmark.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* Add missing remote code flag.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* Move from_pretrained to throughput benchmark.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* Move download and use snapshot_download.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* Removed trusted flag.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* Fix benchmark command in iteration log test.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
---------
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* [https://nvbugspro.nvidia.com/bug/5241495 ][fix] CUDA Graph padding with overlap scheduler (#3839 )
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fuse
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* TRTLLM-4875 feat: Add version switcher to doc (#3871 )
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
* waive a test (#3897 )
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* docs:fix https://nvbugs/5244616 by removing new invalid links. (#3939 )
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
* fix: remote mpi session abort (#3884 )
* fix remote mpi session
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* fix
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
---------
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* skip fp8 gemm for pre-hopper (#3931 )
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* [https://nvbugspro.nvidia.com/bug/5247148 ][fix] Attention DP with overlap scheduler (#3975 )
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* update multigpu list
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix namings
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* Doc: Fix H200 DeepSeek R1 perf doc (#4006 )
* fix doc
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
* update perf number
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
---------
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
* Fix the perf regression caused by insufficient cache warmup. (#4042 )
Force tuning up to 8192 sequence length for NVFP4 linear op. Also, make this runtime-selectable with UB enabled.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
* doc: Update 0.19.0 release notes (#3976 )
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
* Optimize the AutoTuner cache access code to reduce host code overhead. (#4060 )
The NVFP4 Linear op is very sensitive to the host overhead.
This PR introduces customizable `find_nearest_profile` and `get_cache_key_specifc`, which allow users to override the default method for generating the cache key.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
* Update switcher (#4098 )
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
* doc: update release notes (#4108 )
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
* docs:update 0.19 doc. (#4120 )
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
* docs:add torch flow supported model list. (#4129 )
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
* doc: Release V0.19 Perf Overview Update (#4166 )
Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com>
* Fix readme of autodeploy.
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
* Update tensorrt_llm/_torch/pyexecutor/llm_request.py
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
* Revert mgmn worker node.
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
* Change to disable_overlap_scheduler.
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
---------
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com>
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>
Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Co-authored-by: Zac Patel <22306219+zbpatel@users.noreply.github.com>
2025-05-16 10:53:25 +02:00
xinhe-nv
500b43e90c
test: [CI] remove closed bugs ( #4345 )
...
update waive list
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-16 13:47:42 +08:00