Commit Graph

134 Commits

Author SHA1 Message Date
Enwei Zhu
74df12bbaa
[TRTLLM-4480][doc] Documentation for new accuracy test suite and trtllm-eval (#3946)
* fix formula

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* update doc

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* 1st version

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* polish

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

---------

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-08 19:35:23 +08:00
Ivy Zhang
7666bec7c4
[TRTQA-2861][test]: add nemotron and llama4 cases into qa test (#4053)
* add MMLU, GPQADiamond check for llama-4 models

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* add nemotron cases

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* add online quant test cases

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* remove trt flow cases

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* update threshold

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* adjust parallelism strategy

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix fail

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* update sanity list

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix comment

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* skip nemotron-h test case

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

---------

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-08 18:10:41 +08:00
Ivy Zhang
d7c51c953b
test: add INTEGRATION_TEST env var to speed up integration test (#3618)
add INTEGRATION_TEST env var

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-08 10:44:50 +08:00
ruodil
4d0e462723
tests: skip writing prepare_dataset output to logs, and add llama_v3.1_8b_fp8, llama_v3.3_70b_fp8, llama_v3.1_405b_fp4 models (#3864)
* tests: skip writing prepare_dataset output to logs

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>

* test: add llama_v3.1_8b_fp8 model, llama_v3.1_405b model and llama_nemotron_49b model in perf test, and modify original llama models dtype from float16 to bfloat16 according to README.md

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>

---------

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-07 13:56:35 +08:00
Yan Chunwei
0c26059703
chore: Cleanup deprecated APIs from LLM-API (part 1/2) (#3732)
* beam_width and max_new_token

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* remove beam_width

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* remove min_length

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* remove return_num_sequences

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

---------

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-07 13:20:25 +08:00
Venky
62fea1e885
test(perf): Add Llama-3.1-Nemotron-8B-v1 to perf tests (#3822)
*   **Model:** Llama-3.1-Nemotron-Nano-8B-v1
*   **Precision:** float16
*   **Environment:**
    *   GPUs: 1 H100 PCIe
    *   Driver: 570.86.15

*   **Test String:** `llama_v3.1_nemotron_nano_8b-bench-pytorch-float16-input_output_len:128,128`
*   **Request Throughput:** 81.86 req/sec
*   **Total Token Throughput:** 20956.44 tokens/sec
*   **Average Request Latency:** 5895.24 ms

*   **Test String:** `llama_v3.1_nemotron_nano_8b-bench-pytorch-float16-input_output_len:2000,2000`
*   **Request Throughput:** 1.45 req/sec
*   **Total Token Throughput:** 5783.92 tokens/sec
*   **Average Request Latency:** 211541.08 ms

*   **Test String:** `llama_v3.1_nemotron_nano_8b-bench-float16-maxbs:128-input_output_len:128,128`
*   **Request Throughput:** 52.75 req/sec
*   **Total Token Throughput:** 13505.00 tokens/sec
*   **Average Request Latency:** 5705.50 ms

*   **Test String:** `llama_v3.1_nemotron_nano_8b-bench-float16-maxbs:128-input_output_len:2000,2000`
*   **Request Throughput:** 1.41 req/sec
*   **Total Token Throughput:** 5630.76 tokens/sec
*   **Average Request Latency:** 217139.59 ms

Signed-off-by: Venky Ganesh <gvenkatarama@nvidia.com>
2025-05-06 17:17:55 -07:00
dominicshanshan
3ac6637005
fix: trtllm-serve hang in stress test and ds v3 stress parameter update (#3836)
* Remove stdout pipe for genai-perf and make stress time as public parameter.

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

* Update llmRequest based on comment.

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

* launch process function refactor.

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

---------

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-05-06 16:52:30 +08:00
bhsueh_NV
5c0f554b9e
doc: update qwen3 document (#4073)
* update qwen3 document

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

* remove useless codes

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

---------

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-05-06 08:42:51 +08:00
bhsueh_NV
e053cb651b
Fix: fix bug of qwen3 moe (#4058)
* fix bug of qwen3 moe

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

* update threshold

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

---------

Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-05-06 08:20:15 +08:00
pansicheng
e84dc6b3c7
feat: add deepseek-r1 reasoning parser to trtllm-serve (#3354)
* add deepseek-r1 reasoning parser

Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com>

* fix test

Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>

---------

Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com>
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-05-06 08:13:04 +08:00
Iman Tabrizian
85867d76dd
test: Add disaggregated serving accuracy tests (#4036)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-05 08:56:59 -07:00
Yanchao Lu
5ee38ad92a
[Test]: Clean up stale waives (#4062)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-05 22:13:12 +08:00
Yukun He
aa38e28cfa
fix: [nvbug/5241627] Fix AllReduce kernel hang issue when both tp and pp are enabled. (#3988)
* Fix AllReduce kernel hang issue when both tp and pp are enabled.
Allocate one workspace for each pp rank to avoid potential race.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>

* update waive list

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>

---------

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-05-05 11:33:25 +08:00
Yan Chunwei
bc0cf41592
chore: refactor llmapi e2e tests (#3803)
* refactor llmapi e2e tests

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* fix

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

---------

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-05 07:37:24 +08:00
Emma Qiao
2692daad2e
infra: Remove the WAR for test items incompletely (#3313)
* Remove the WAR for test items incompleted

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Complete test item manually

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix another test definition file

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Complete test name

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix some other test names

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix another test name after rebase

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Update name for waived case name, too

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix name for multi-gpu tests

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix test name after rebase

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix another test name

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix typo

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix test name after rebase

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix other qa tests

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix tests name after rebase

Signed-off-by: qqiao <qqiao@nvidia.com>

* Fix name after rebase

Signed-off-by: qqiao <qqiao@nvidia.com>

* Correct test names in waive.txt

Signed-off-by: qqiao <qqiao@nvidia.com>

* Add new test_durations file

Signed-off-by: qqiao <qqiao@nvidia.com>

* Fix names after rebase

Signed-off-by: qqiao <qqiao@nvidia.com>

* Update test duration to latest

Signed-off-by: qqiao <qqiao@nvidia.com>

---------

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-05-04 11:31:59 +08:00
Robin Kobus
403370af62
refactor: Move ModelSpec to core library (#3980)
* refactor: Move ModelSpec from tests to core library

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* refactor: Move ModelSpec from runtime to separate dir

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* refactor: Use new bindings path and clean up

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Updated licenses

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Remove script_dir from path

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

---------

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-05-04 01:39:09 +08:00
bhsueh_NV
561ee44737
add ci and doc for qwen3 (#4022)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-05-02 14:13:38 +08:00
nv-guomingz
dc344b6a4f
fix:https://nvbugs/5246733 (#3989)
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
2025-05-01 22:52:31 +08:00
YueWeng
b1621e8d4e
feat: add relaxed acceptance for DS (#3865)
* add relaxed acceptance for DS R1

Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>

* clean and update docs

Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>

* fix

Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>

* Modified based on review

Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>

* fix mtp manager issue

Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>

---------

Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-05-01 21:50:36 +08:00
Dom Brown
b40f351b7a
[TRTLLM-4460] test: Use Llama 3.2 1B for Llama C++ tests (#3206)
* Squash of dev commits

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

* Add timer + waive test with suspected GptSession bug

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>

* Respond to reviewer comments

Signed-off-by: domb <3886319+DomBrown@users.noreply.github.com>

---------

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Signed-off-by: domb <3886319+DomBrown@users.noreply.github.com>
2025-05-01 05:31:08 +08:00
yuxianq
f568cbb671
chore: Remove duplicated get_sm_version. (#3935)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-04-30 11:43:53 +08:00
QI JUN
99929e724b
ci: skip pipeline parallelism test of pytorch flow (#3947)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-04-30 01:00:16 +08:00
Pamela Peng
c8649ce3aa
skip blackwell tests for sm120 (#3815)
Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
2025-04-29 09:53:35 -07:00
Dom Brown
8709fe8b53
chore: bump version to 0.19.0 (#3598) (#3841)
test: add test cases for 0.19 release (#3608)

* fix test name

* add quickstart test for nemotron-ultra

* add rcca multi-node test case for deepseek-v3

* add rcca info

---------

squash (#3642)

fix: nvbugs/5187237: fix deterministic mode crash (#3448)

* nvbugs/5187237 nvbugs/5112075: fix deterministic mode error

* remove waive

* Revert "remove waive"

This reverts commit 0bf5486d19906d692bfb7a6262333c296b0087ac.

* revert ar fusion

---------

update fp8 doc (#3647)

tests: change qa perf test to trtllm-bench (#3619)

fix: FP8 quantized lm_head (NvBug 5214229) (#3567)

infra: Add PR approval protection for the release branch (#3634)

fix: nvbugs/5231298: pytorch allreduce issue (#3673)

Fix: nvbugs/5222698 variable not defined (#3630)

* Fix: nvbugs/5222698 variable not defined

* Tidy code

---------

test:sync waives.txt from main branch by disabling test_perf/gpt_350m-cppmanager case (#3685)

test:restore fp8 kv cache testing for L0 (#3671)

doc: Update DeepSeek perf docs (#3693)

* Update DeepSeek perf docs

* update

* Apply suggestions from code review

---------

tests: waive test_llm_multi_node (#3664)

fix: update test_user_buffers_mm_add_prologue atol (#3711)

Fix: cherry-pick hmac encryption from main branch (#3635)

* security fix cherry-pick changes from main

* fix hmac in remote mpi session (#3649)

---------

Un-waive DS-V3-Lite tests. (#3621)

fix: FP8 kv accuracy (#3675)

* fix FP8 kv accuracy

* update doc

---------

Fix script options for engines. (#3622)

unwaive multi-node test (#3721)

chore : Split more tests out of gpt tests (#3524) (#3674)

doc:add torch examples link into torch backend documentation (#3749)

test: Get Eagle tests working (#3593) (#3722)

Waive L0 test (#3756)

waive failed case in perf test, change default max_batch_size to 512 and write config.json to output log (#3656)

Update ds v3 parameters in stress test. (#3676)

waive gemma on L20 (#3766)

https://nvbugs/5141291: Fix convert.py script for Qwen model. (#3758)

Include Qwen2VLDecoderLayer in the smooth_qwen2_model function.

fix: PP4 fixes and cleanup (#3688)

remove benchmark test list (#3643)

skip disagg deepseek test if sm!=90 (#3720)

test: skip failed cases on B200 (#3710)

* add skip condition to tests

* fix error

---------

test: [nvbug: 5234494] skip_pre_ada for fp8 cases (#3718)

* skip_pre_ada for fp8 cases

* update

* update after rebase

---------

add know issue to deepseek doc. (#3800)

Fix ModelOpt Mixtral AWQ OOM (#3714) (#3761)

Waive L0 tests (#3826)

fix: Reduce memory usage in fused moe op associated with AutoTuning and fix moe fallback issue. (#3793)

* Reduce memory usage in fused moe op associated with AutoTuning.
* Replace pre-defined bucket size strategy with a generating function based on the tune_max_num_tokens.
* Add free_memory logic of workspace in min_latency_mode fused moe path.

* Fix fused_moe fallback issue. (#3652)

min_latency_mode is only set to False during warmup phase. Thus when it becomes true during inference, all tactics fall back to the default one and thus cause perf regression.

---------

[doc] Better document for Draft-Target-Model (DTM) speculative decoding (#3797)

Fix pre-commit

Fix again

Address some review comments for the MI

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-04-29 16:57:22 +08:00
xiweny
f84dd8f815
test: add deepseek v3 & r1 cases (#3528)
* test: add deepseek v3 & r1 cases

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-04-28 23:37:26 +08:00
Zhenhuan Chen
ad15e45f07
[TRTLLM-4638][feat] add best of n support with reward model in scaffolding (#3807)
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
2025-04-28 17:15:33 +08:00
xinhe-nv
e20b67e9fd
update waives & tests (#3887)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-04-28 14:29:35 +08:00
Dom Brown
7ff9fd345c
Test: Split C++ unit tests for CI granularity (#3868)
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-04-25 13:30:58 -07:00
Yuan Tong
57944206ba
feat: return logits in PyTorch flow (#3221)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-04-24 16:56:03 -07:00
QI JUN
991939a0f4
chore: increase A30 for cpp test (#3811)
* increase A30 for cpp test

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* enable parallel run test for gpt_executor

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* clean

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* decrease freeGpuMemoryFraction of cpp tests

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

---------

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-04-24 16:34:39 -07:00
Enwei Zhu
777c40e5fa
[https://nvbugspro.nvidia.com/bug/5238599][fix] Normalize example path in accuracy tests
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-24 10:09:59 +08:00
xinhe-nv
476d7003f8
test: [CI] Add failed cases into waives.txt (#3777)
* update waive list

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

* update waives.txt

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

---------

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-04-24 09:36:05 +08:00
Mike Iovine
bc5fe7800d
[chore] Fix KV cache block reuse flag name in quickstart_advanced (#3781)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-04-24 06:02:47 +08:00
Kaiyu Xie
dfbcb543ce
doc: fix path after examples migration (#3814)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-04-24 02:36:45 +08:00
Enwei Zhu
a51b3cf7a6
[TRTLLM-4763][test] Accuracy test improvement (Part 3.6): Deprecate mmlu_llmapi.py (#3802)
* cleanup mmlu_llmapi.py

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* polish

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

---------

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-23 23:05:13 +08:00
Enwei Zhu
8f2b2eaf83
test: Add DeepSeek-V3-Lite GSM8K tests (#3771)
* tmp

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* update ref

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* update waives

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

---------

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-23 16:54:48 +08:00
Yechan Kim
11d35656bf
fix: nvbugs/5234029 fix Qwen2.5-VL image test (#3726)
* fix: nvbugs/5234029 fix Qwen2.5-VL image test case by adding more answer candidate

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* remove qwen2.5_vl from waive list

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

---------

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-04-23 14:09:39 +08:00
xinhe-nv
80d8fdefd6
add test_mistral_large_hidden_vocab_size tests (#3716)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-04-23 13:40:11 +08:00
ruodil
9223000765
waive failed case in perf test, change default max_batch_size to 512 and write config.json to output log (#3657)
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-04-22 14:51:45 +08:00
Yi Zhang
98966cb45e
test: Unwaive Llama 3.1 with torch compile test (#3475)
* Fix log info

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

* Revert "test: Waive torch compile tests (#3471)"

This reverts commit 410f56357e.

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

* Update test_llm_api_pytorch.py

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>

---------

Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-04-22 10:41:56 +08:00
Enwei Zhu
3fa19ffa4e
test [TRTLLM-4477,TRTLLM-4481]: Accuracy test improvement (Part 3.5): Support GSM8K and GPQA (#3483)
* add gsm8k

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix gsm8k

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* add gpqa

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* conditional import lm_eval

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* gpqa in lm_eval

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* system prompt

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* shuffle

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* update AA prompt and regex

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* revert AA prompt and regex

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* integration to tests

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* add DS-R1

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix and clean

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* update tests

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* update

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* clean up

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* free_gpu_memory_fraction=0.8

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

---------

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-22 07:38:16 +08:00
Barry Kang
d87b009d8d
Fix ModelOpt Mixtral AWQ OOM (#3714)
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
2025-04-21 19:14:14 +08:00
Zheng Duan
ae48abefc1
bind block key and hasher (#3712)
2025-04-21 18:50:57 +08:00
Iman Tabrizian
af04b6f6aa
bug: Fix hang bug when context server doesn't have enough capacity for KV Cache (#3095)
* Fix hang bug when KV cache is low

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>

* Review comments

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>

* Fix attentiondp typo

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>

* Add CI test for this case

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>

* fix: Fix the insertion order for responder futures

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

* fix: Fix disagg CPP

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

---------

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-04-21 15:16:55 +08:00
Stanley Sun
852dd0c1be
test: add llama3.2 ptp test case (#3363)
* add llama3.2 ptp test case

Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>

* update test list

Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>

---------

Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-04-21 15:15:45 +08:00
Emma Qiao
48db263d9a
infra: Add test list name check (#3097)
* Add steps to check test names

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Correct test-db command

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Switch to use a trt-llm image

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Update go path

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Correct go path

Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Move the test list check to test ci

Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Correct file path

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix path again

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix get path

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix typo

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Skip test list check for ARM

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix expression

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Change back unrelated file

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Correct qa test names

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Remove a stage

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Update jenkins/L0_Test.groovy

Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Move some steps to a python script

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix script path

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Split commands and debug

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix typo

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix typo

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Also correct case name in waives list

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Move check script to another folder

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Update qa list after rebase

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix rebase

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Remove the perf tests under QA

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Some tests already fixed after rebase to TOT

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

---------

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-04-20 23:02:16 +08:00
QI JUN
d51ae53940
move the rest models into examples/models/core directory (#3555)
* move rest models to examples/models/core directory

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* update multimodal readme

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix example path

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix ci

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix ci

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix cpp test

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix tensorrt test

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix ci

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix ci

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix ci

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix ci

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix ci

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix ci

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix ci

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix ci

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix ci

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix ci

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix ci

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix ci

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix ci

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix ci

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix ci

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix ci

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix ci

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix ci

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix ci

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix ci

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

---------

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-04-19 20:48:59 -07:00
brb-nv
c35d2a7532
test: Get Eagle tests working (#3593)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-04-20 00:50:57 +08:00
nv-guomingz
e70961f541
test:update waives.txt for nvbug 5219532 (#3672)
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
2025-04-19 18:57:39 +08:00
Iman Tabrizian
61ee983488
fix: Fix disaggregated load balance test (#3689)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-04-19 10:40:40 +08:00