yuxianq
4f8afe4cc6
feat: [nvbugs/5261055][nvbugs/5170160] non-invasive pipeline parallelism ( #4034 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-05-16 04:16:53 +08:00
Venky
adb0839a33
test(perf): Add Phi-4-mini-instruct to perf tests ( #4267 )
...
* add phi-4-mini-instruct
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
* trim tests
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
---------
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-05-15 21:27:03 +08:00
yuxianq
0e87fcc228
refactor: use x is None instead of x == None. ( #4244 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-05-15 20:00:04 +08:00
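For context on the refactor above, here is a minimal sketch of why `x is None` is generally preferred over `x == None` in Python. The `AlwaysEqual` class is hypothetical and is not taken from the repository; it only illustrates that `==` dispatches to a user-defined `__eq__`, while `is` checks identity and cannot be overridden.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()

# `==` calls AlwaysEqual.__eq__, so the comparison with None succeeds
# even though obj is clearly not None.
eq_result = (obj == None)  # noqa: E711 (intentional, for illustration)

# `is` compares object identity and ignores __eq__ entirely.
is_result = (obj is None)

print(f"obj == None -> {eq_result}")
print(f"obj is None -> {is_result}")
```

The identity check also avoids surprises with objects that overload `==` to return something other than a single boolean.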
Yanchao Lu
5ce1102a02
Revert "[test] add qa test mentioned in docs" ( #4355 )
...
Revert "[test] add qa test mentioned in docs (#4248 )"
This reverts commit b0ce1371ee .
2025-05-15 18:47:30 +08:00
zhhuang-nv
d6b741ddfe
[fix] test_no_kv_cache_reuse for overlap_scheduler ( #4350 )
...
fix test_no_kv_cache_reuse for overlap_scheduler
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
2025-05-15 16:43:53 +08:00
xinhe-nv
14bfb5e0d6
test: FIX test_ptp_quickstart_advanced_deepseek_v3_2nodes_8gpus ( #4283 )
...
* update test_ptp_quickstart_advanced_deepseek_v3_2nodes_8gpus
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
* skip llava-v1.6-mistral-7b-hf-vision-trtllm on L40S
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
---------
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-15 15:57:44 +08:00
zhhuang-nv
97bc680cd8
feat: support kv cache reuse for MLA ( #3571 )
...
* support kv cache reuse for MLA
load compressed_kv and k_pe and do up-projection
use 192/128 head size MLA context kernel
support Blackwell and Hopper now
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* add CI test
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* fix: set k_pe head_num to 1 for kernel 2 and kernel 2V2
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
* resolve comments
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* use GPTJ style RoPE for MLA
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* fix rebase error and some docs
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* fix kv_lens
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* tiny fix
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* fix torch compile
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* fix: use normal device memory instead of pinned memory for unit test
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
* fix L0 tests
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* fix torch compile after rebase
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* resolve comments
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* resolve comments again
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
---------
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
Signed-off-by: zhhuang-nv <145532724+zhhuang-nv@users.noreply.github.com>
Co-authored-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-05-15 15:22:21 +08:00
Kaiyu Xie
b4e5df0ee0
Breaking change: perf: Enable scheduling overlap by default ( #4174 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-05-15 14:27:36 +08:00
dominicshanshan
404fbe9b32
[ https://nvbugs/5277113 ][fix]genai-perf API change stress test ( #4300 )
...
* fix bug 5277113.
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
* fix bug 5277113 and 5278517.
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
---------
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-05-15 14:12:34 +08:00
Ivy Zhang
b0ce1371ee
[test] add qa test mentioned in docs ( #4248 )
...
* add nemotron-h and llama_70b cases
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* trial
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* add llm decoder quick_start case
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* update nemotron-h test case
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* add qwen3 quickstart test
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* add trtllm_decoder accuracy test
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* remove quickstart test for llm_decoder
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
---------
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-15 13:37:11 +08:00
hlu1
3ea42e7519
[test] Reorganize TestDeepSeekR1::test_nvfp4_8gpus ( #4346 )
...
Reorganize TestDeepSeekR1::test_nvfp4_8gpus
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
Co-authored-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-05-15 13:09:13 +08:00
Mike Iovine
f9adac3dea
[feat] Enable chunked context for flashinfer ( #4132 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-05-15 10:59:38 +08:00
Robin Kobus
d31fefde2c
[TRTLLM-5171] chore: Remove GptSession/V1 from TRT workflow ( #4092 )
...
* chore: Remove GptSession/V1 from TRT workflow
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove stateful decoders
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove GptSession buffers
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove GptSession utils
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove GptSession kernels
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove V1 GPT models from tests
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove gptSessionBenchmark from scripts and docs
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove gptSession IO classes
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove GptSession from test lists
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove GptSession from docs
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove useless encoder test
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove mActualBatchSize from DecoderState
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove static batching from ExecutorTest
- Updated `validateContextLogits` and `validateGenerationLogits` functions to remove the `batchingType` parameter.
- Adjusted related test functions to reflect the changes in parameter lists.
- Cleaned up the instantiation of test cases to eliminate unnecessary batchingType references.
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
---------
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-05-14 23:10:04 +02:00
Faraz
42de79d49e
test: Added tests for Llama3.1-70B-BF16 on SM120 ( #4198 )
...
* Added tests for Llama3.1-70B-BF16 on SM120
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
* solve conflicts add more tests
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
---------
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2025-05-14 11:57:49 -04:00
Kaiyu Xie
6c45586c51
chore: Remove deprecated Python runtime benchmark ( #4171 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-05-14 18:41:05 +08:00
HuiGao-NV
f4059c6e2e
Add test case for kv memory estimation ( #4158 )
...
* Add test case for kv memory estimation
* Dump running log into file and parse kv cache memory size from file
* Set bigger peak memory size for mixed precision case and test_ptp_quickstart_advanced_eagle3 case
* Revert change to usage of fraction
* use context manager to guard temp files
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-05-14 18:39:25 +08:00
DylanChen-NV
206f82115d
[bug/5247505] fix: CP accuracy on Blackwell ( #4188 )
...
* fix xqa params for cp
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
* add test
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
* add test
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
* try adding B200 multi gpu test
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
* add accuracy tests for cp
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
---------
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
2025-05-14 17:40:50 +08:00
Anurag Mukkara
b15f57763d
tests: PyTorch multimodal using keyword match ( #4215 )
...
* keyword accuracy check for pytorch multimodal
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
* Change keywords for some prompts
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
* Delete full text answers
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
* Cleanup debug code
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
---------
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
2025-05-14 17:18:43 +08:00
bhsueh_NV
1a9298bc66
CI: add fp8/fp4 ci on Qwen3-30B-A3B ( #4266 )
...
add fp8/fp4 ci on Qwen3-30B-A3B
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-05-14 14:38:04 +08:00
brb-nv
8280c3d4f2
feat: Support Gemma3-1b-it in Pytorch workflow ( #3999 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-05-14 14:02:44 +08:00
Yi Zhang
86ae506b9d
[fix] Enable pp tests ( #3978 )
...
Fix misrebase issue
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-05-14 10:51:20 +08:00
brb-nv
1ef117688c
test: Validate FP8 and LoRA for Gemma3 ( #3670 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-05-13 17:28:02 -07:00
brb-nv
cd5b3d21a0
feat: Support Mistral Small 3.1 24B VLM in TRT workflow ( #4183 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-05-14 03:47:22 +08:00
ruodil
d555fe2530
test: fix for perf test script issue ( #4230 )
...
fix for perf test script issue
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-13 10:29:20 +08:00
xinhe-nv
0cebc16139
test: [CI] Add failed cases into waives.txt ( #4205 )
...
waive tests
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-13 10:22:42 +08:00
Enwei Zhu
035d915fea
[TRTLLM-5081] [test] Align parametrize_with_ids to the pytest behavior ( #4090 )
...
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* normalize mtp_nextn
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* update test_durations
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-13 07:41:51 +08:00
wili
eba3623a54
Feat: Variable-Beam-Width-Search (VBWS) part4 ( #3979 )
...
* feat/vbws-part4-v1.8: rebase
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* feat/vbws-part4-v1.9: fix incorrect output when using short output length
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* v1.9.1: remove useless variables
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* v1.9.2:fix incorrect output when using short output length
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* v1.9.3: rebase
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* v1.9.4: rebase
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* v1.9.5: remove API change
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
---------
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
2025-05-12 22:32:29 +02:00
Zheng Duan
c9e2a963e0
feat: add kv cache aware router ( #3831 )
...
* kv cache aware router
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
* add tests
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
* router config
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
* eviction test
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
add test
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
* eviction detect in worker test
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
* move worker tests to single gpu
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
* reduce memory fraction
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
* fix partial block
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
---------
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
2025-05-12 07:23:57 -04:00
Yixin Dong
c90ebadd84
feat: Support the Structural Tag in guided decoding ( #4066 )
...
* finish
Signed-off-by: Ubospica <ubospica@gmail.com>
* update
Signed-off-by: Ubospica <ubospica@gmail.com>
* update
Signed-off-by: Ubospica <ubospica@gmail.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* exc overlap scheduler
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* add test
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix api ref
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Ubospica <ubospica@gmail.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-12 17:24:50 +08:00
Ivy Zhang
ee92edf2b4
[ https://nvbugspro.nvidia.com/bug/5270564 ][test] skip pre-hopper for llama4 ( #4211 )
...
skip pre-hopper for llama4
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-12 15:27:15 +08:00
ruodil
9c03a7ab74
test: add llama_3.2_1B model and fix for test lora script issue ( #4139 )
...
* test: add llama_v3.1_8b_fp8 model, llama_v3.1_405b model and llama_nemotron_49b model in perf test, and modify original llama models dtype from float16 to bfloat16 according to README.md
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
* add llama_3.2_1B model and fix for lora script issue
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
---------
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
2025-05-12 14:51:59 +08:00
xinhe-nv
849d9c343c
tests: https://nvbugs/5219534 remove failed tests from test list ( #4113 )
...
remove unsupported tests
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-12 14:13:40 +08:00
Enwei Zhu
7db368c72c
test: Remove CNN Dailymail tasks in favor of GSM8K ( #4187 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-10 09:02:07 +08:00
Dom Brown
2d0f93a054
Refactor: Restructure C++ tests for better modularisation of non-shared code ( #4027 )
...
* Refactor: Restructure C++ tests for better modularisation of non-shared code
Start cleanup of pytest code for C++ tests
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Clean up names and remove references to test_cpp.py
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
WIP
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Move multi-GPU code
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Update doc and try un-waiving
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
* Update multi GPU file check
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
* Address minor multi-GPU setup bug
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
---------
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-05-09 19:16:51 +01:00
Tracin
446f62bbab
chore: Deprecate evaltool ( #4173 )
...
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-05-09 20:31:53 +08:00
ruodil
bf5b2a2e0a
test: amend regex match for perf throughput ( #4186 )
...
amend regex match for perf throughput
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
2025-05-09 17:33:25 +08:00
ruodil
5ce5b81281
test: amend default pytorch extra-llm-api-config.yml in perf test ( #4176 )
...
* amend default pytorch extra-llm-api-config.yml
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
* add print info to separate cases in output log
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
---------
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
2025-05-09 16:46:48 +08:00
xinhe-nv
1d26a3fd7c
test: skip tests on b200 ( #3913 )
...
* skip tests on b200
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
* skip phi-3-128k
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
---------
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-09 14:51:55 +08:00
Bo Li
e3cf3fd15f
test: Add fp8kv to DS-v3-lite integration tests. ( #3950 )
...
* Add fp8 kv cache tests to DSV3-Lite integration tests.
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
* Refactor. Make fp8kv parallel to attention_dp, overlap_scheduler and cuda_graph.
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
* Update gsm8k.
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
* Update CI list.
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
* Update TestDeepSeekR1.
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
* Fix test list.
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
* Need quant_config besides pytorch_config.
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
* Update waive list (bug 5239087).
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
* Update waive list.
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
* Correct test name.
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
* Update waive list.
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
---------
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Bo Li <bobboli0202@gmail.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-09 13:35:04 +08:00
Ivy Zhang
c91d03fa0a
test: move mistral / mixtral test cases in QA test list into the new accuracy test suite ( #3440 )
...
* add mistral-7b-v0.1 torch flow test case
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* rearrange mistral
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* rearrange mixtral case
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* remove api function test
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* move mistral nemo cases
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* move mixtral cases
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* update threshold
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* fix failure
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* fix name
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* fix failure cases
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* update list
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* update threshold
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* remove awq llmapi test
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* adjust threshold
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* fix ci
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* fix partial comments
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* fix path
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* update thres
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* update
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* remove duplicate test case
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* fix ci
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
---------
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-09 13:32:02 +08:00
Ivy Zhang
c2d4c2adb6
[ https://nvbugspro.nvidia.com/bug/5260676 ]test: skip fp8 quantization case for pre-ada ( #4095 )
...
skip pre ada
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-09 13:30:16 +08:00
Enwei Zhu
74df12bbaa
[TRTLLM-4480][doc] Documentation for new accuracy test suite and trtllm-eval ( #3946 )
...
* fix formula
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* update doc
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* 1st version
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* polish
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-08 19:35:23 +08:00
Ivy Zhang
7666bec7c4
[TRTQA-2861][test]: add nemotron and llama4 cases into qa test ( #4053 )
...
* add MMLU, GPQADiamond check for llama-4 models
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* add nemotron cases
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* add online quant test cases
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* remove trt flow cases
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* update threshold
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* adjust parallelism strategy
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* fix fail
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* update sanity list
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* fix comment
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* skip nemotron-h test case
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
---------
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-08 18:10:41 +08:00
Ivy Zhang
d7c51c953b
test: add INTEGRATION_TEST env var to speed up integration test ( #3618 )
...
add INTEGRATION_TEST env var
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-08 10:44:50 +08:00
ruodil
4d0e462723
tests: skip writing prepare_dataset output to logs, and add llama_v3.1_8b_fp8, llama_v3.3_70b_fp8, llama_v3.1_405b_fp4 models ( #3864 )
...
* tests: skip writing prepare_dataset output to logs
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
* test: add llama_v3.1_8b_fp8 model, llama_v3.1_405b model and llama_nemotron_49b model in perf test, and modify original llama models dtype from float16 to bfloat16 according to README.md
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
---------
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-07 13:56:35 +08:00
Yan Chunwei
0c26059703
chore: Cleanup deprecated APIs from LLM-API (part 1/2) ( #3732 )
...
* beam_width and max_new_token
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* remove beam_width
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* remove min_length
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* remove return_num_sequences
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
---------
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-07 13:20:25 +08:00
Venky
62fea1e885
test(perf): Add Llama-3.1-Nemotron-8B-v1 to perf tests ( #3822 )
...
* **Model:** Llama-3.1-Nemotron-Nano-8B-v1
* **Precision:** float16
* **Environment:**
* GPUs: 1 H100 PCIe
* Driver: 570.86.15
* **Test String:** `llama_v3.1_nemotron_nano_8b-bench-pytorch-float16-input_output_len:128,128`
* **Request Throughput:** 81.86 req/sec
* **Total Token Throughput:** 20956.44 tokens/sec
* **Average Request Latency:** 5895.24 ms
* **Test String:** `llama_v3.1_nemotron_nano_8b-bench-pytorch-float16-input_output_len:2000,2000`
* **Request Throughput:** 1.45 req/sec
* **Total Token Throughput:** 5783.92 tokens/sec
* **Average Request Latency:** 211541.08 ms
* **Test String:** `llama_v3.1_nemotron_nano_8b-bench-float16-maxbs:128-input_output_len:128,128`
* **Request Throughput:** 52.75 req/sec
* **Total Token Throughput:** 13505.00 tokens/sec
* **Average Request Latency:** 5705.50 ms
* **Test String:** `llama_v3.1_nemotron_nano_8b-bench-float16-maxbs:128-input_output_len:2000,2000`
* **Request Throughput:** 1.41 req/sec
* **Total Token Throughput:** 5630.76 tokens/sec
* **Average Request Latency:** 217139.59 ms
Signed-off-by: Venky Ganesh <gvenkatarama@nvidia.com>
2025-05-06 17:17:55 -07:00
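As a rough sanity check on the figures reported above, the sketch below divides total token throughput by request throughput to estimate tokens per request. It assumes (this is not stated in the commit) that "Total Token Throughput" counts input plus output tokens, so the ratio should land near the sum of the two values in `input_output_len`. The short labels in `runs` are shorthand for the four test strings, not names from the test suite.

```python
# (request throughput in req/sec, total token throughput in tokens/sec,
#  assumed tokens per request = input_len + output_len)
runs = {
    "pytorch 128,128": (81.86, 20956.44, 128 + 128),
    "pytorch 2000,2000": (1.45, 5783.92, 2000 + 2000),
    "maxbs:128 128,128": (52.75, 13505.00, 128 + 128),
    "maxbs:128 2000,2000": (1.41, 5630.76, 2000 + 2000),
}


def tokens_per_request(req_per_sec: float, tok_per_sec: float) -> float:
    """Estimate tokens handled per request from the two throughput figures."""
    return tok_per_sec / req_per_sec


relative_errors = {}
for name, (rps, tps, expected) in runs.items():
    estimate = tokens_per_request(rps, tps)
    relative_errors[name] = abs(estimate - expected) / expected
    print(f"{name}: ~{estimate:.1f} tokens/request (expected about {expected})")
```

Under that assumption all four runs agree with the expected token counts to within about one percent; the small gaps are consistent with the request throughput being rounded to two decimals.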
dominicshanshan
3ac6637005
fix: trtllm-serve hang in stress test and ds v3 stress parameter update ( #3836 )
...
* Remove stdout pipe for genai-perf and make stress time as public parameter.
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
* Update llmRequest based on comment.
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
* launch process function refactor.
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
---------
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-05-06 16:52:30 +08:00
bhsueh_NV
5c0f554b9e
doc: update qwen3 document ( #4073 )
...
* update qwen3 document
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* remove useless codes
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
---------
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-05-06 08:42:51 +08:00
bhsueh_NV
e053cb651b
Fix: fix bug of qwen3 moe ( #4058 )
...
* fix bug of qwen3 moe
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* update threshold
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
---------
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-05-06 08:20:15 +08:00