Commit Graph

73 Commits

Author SHA1 Message Date
Ivy Zhang
c91d03fa0a
test: move mistral / mixtral test cases in QA test list into the new accuracy test suite (#3440)
* add mistral-7b-v0.1 torch flow test case

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* rearrange mistral

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* rearrange mixtral case

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* remove api function test

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* move mistral nemo cases

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* move mixtral cases

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* update threshold

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix failure

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix name

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix failure cases

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* update list

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* update threshold

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* remove awq llmapi test

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* adjust threshold

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix ci

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix partial comments

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix path

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* update thres

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* update

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* remove duplicate test case

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

* fix ci

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

---------

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-09 13:32:02 +08:00
yuanjingx87
6e1d2a1320
feat: Add Slurm support and enable RTX Pro 6000 testing pipeline in CI (#4019)
* Add slurm support with RTXPro6000 PostMerge Tests

Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>

* remove H100 post merge test from testing

Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>

---------

Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-05-08 15:15:36 +08:00
dominicshanshan
3ac6637005
fix: trtllm-serve hang in stress test and ds v3 stress parameter update (#3836)
* Remove stdout pipe for genai-perf and make stress time as public parameter.

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

* Update llmRequest based on comment.

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

* launch process function refactor.

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

---------

Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-05-06 16:52:30 +08:00
pansicheng
e84dc6b3c7
feat: add deepseek-r1 reasoning parser to trtllm-serve (#3354)
* add deepseek-r1 reasoning parser

Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com>

* fix test

Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>

---------

Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com>
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-05-06 08:13:04 +08:00
Iman Tabrizian
85867d76dd
test: Add disaggregated serving accuracy tests (#4036)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-05 08:56:59 -07:00
Yan Chunwei
bc0cf41592
chore: refactor llmapi e2e tests (#3803)
* refactor llmapi e2e tests

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* fix

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

---------

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-05 07:37:24 +08:00
Emma Qiao
2692daad2e
infra: Remove the WAR for test items incompletely (#3313)
* Remove the WAR for test items incompleted

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Complete test item manually

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix another test definition file

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Complete test name

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix some other test names

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix another test name after rebase

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Update name for waived case name, too

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix name for multi-gpu tests

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix test name after rebase

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix another test name

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix typo

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix test name after rebase

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix other qa tests

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Fix tests name after rebase

Signed-off-by: qqiao <qqiao@nvidia.com>

* Fix name after rebase

Signed-off-by: qqiao <qqiao@nvidia.com>

* Correct test names in waive.txt

Signed-off-by: qqiao <qqiao@nvidia.com>

* Add new test_durations file

Signed-off-by: qqiao <qqiao@nvidia.com>

* Fix names after rebase

Signed-off-by: qqiao <qqiao@nvidia.com>

* Update test duration to latest

Signed-off-by: qqiao <qqiao@nvidia.com>

---------

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-05-04 11:31:59 +08:00
Mike Iovine
906cddffb0
[infra] Improve llama4 parallelism test coverage (#3821)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-05-02 16:15:04 -04:00
bhsueh_NV
561ee44737
add ci and doc for qwen3 (#4022)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-05-02 14:13:38 +08:00
Dom Brown
8709fe8b53
chore: bump version to 0.19.0 (#3598) (#3841)
test: add test cases for 0.19 release (#3608)

* fix test name

* add quickstart test for nemotron-ultra

* add rcca multi-node test case for deepseek-v3

* add rcca info

---------

squash (#3642)

fix: nvbugs/5187237: fix deterministic mode crash (#3448)

* nvbugs/5187237 nvbugs/5112075: fix deterministic mode error

* remove waive

* Revert "remove waive"

This reverts commit 0bf5486d19906d692bfb7a6262333c296b0087ac.

* revert ar fusion

---------

update fp8 doc (#3647)

tests: change qa perf test to trtllm-bench (#3619)

fix: FP8 quantized lm_head (NvBug 5214229) (#3567)

infra: Add PR approval protection for the release branch (#3634)

fix: nvbugs/5231298: pytorch allreduce issue (#3673)

Fix: nvbugs/5222698 variable not defined (#3630)

* Fix: nvbugs/5222698 variable not defined

* Tidy code

---------

test:sync waives.txt from main branch by disabling test_perf/gpt_350m-cppmanager case (#3685)

test:restore fp8 kv cache testing for L0 (#3671)

doc: Update DeepSeek perf docs (#3693)

* Update DeepSeek perf docs

* update

* Apply suggestions from code review

---------

tests: waive test_llm_multi_node (#3664)

fix: update test_user_buffers_mm_add_prologue atol (#3711)

Fix: cherry-pick hmac encryption from main branch (#3635)

* security fix cherry-pick changes from main

* fix hmac in remote mpi session (#3649)

---------

Un-waive DS-V3-Lite tests. (#3621)

fix: FP8 kv accuracy (#3675)

* fix FP8 kv accuracy

* update doc

---------

Fix script options for engines. (#3622)

unwaive multi-node test (#3721)

chore : Split more tests out of gpt tests (#3524) (#3674)

doc:add torch examples link into torch backend documentation (#3749)

test: Get Eagle tests working (#3593) (#3722)

Waive L0 test (#3756)

waive failed case in perf test, change default max_batch_size to 512 and write config.json to output log (#3656)

Update ds v3 parameters in stress test. (#3676)

waive gemma on L20 (#3766)

https://nvbugs/5141291: Fix convert.py script for Qwen model. (#3758)

Include Qwen2VLDecoderLayer in the smooth_qwen2_model function.

fix: PP4 fixes and cleanup (#3688)

remove benchmark test list (#3643)

skip disagg deepseek test if sm!=90 (#3720)

test: skip failed cases on B200 (#3710)

* add skip condition to tests

* fix error

---------

test: [nvbug: 5234494] skip_pre_ada for fp8 cases (#3718)

* skip_pre_ada for fp8 cases

* update

* update after rebase

---------

add known issue to deepseek doc. (#3800)

Fix ModelOpt Mixtral AWQ OOM (#3714) (#3761)

Waive L0 tests (#3826)

fix: Reduce memory usage in fused moe op associated with AutoTuning and fix moe fallback issue. (#3793)

* Reduce memory usage in fused moe op associated with AutoTuning.
* Replace pre-defined bucket size strategy with a generating function based on the tune_max_num_tokens.
* Add free_memory logic of workspace in min_latency_mode fused moe path.

* Fix fused_moe fallback issue. (#3652)

min_latency_mode is only set to False during warmup phase. Thus when it becomes true during inference, all tactics fall back to the default one and thus cause perf regression.

---------

[doc] Better document for Draft-Target-Model (DTM) speculative decoding (#3797)

Fix pre-commit

Fix again

Address some review comments for the MI

Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-04-29 16:57:22 +08:00
QI JUN
c381380ecc
increase H100 CI nodes for PyTorch only pipelines (#3927)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-04-29 10:58:43 +08:00
Jinyang Yuan
dafc28fb85
fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863)
2025-04-29 09:09:43 +08:00
xiweny
f84dd8f815
test: add deepseek v3 & r1 cases (#3528)
* test: add deepseek v3 & r1 cases

Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-04-28 23:37:26 +08:00
Dom Brown
7ff9fd345c
Test: Split C++ unit tests for CI granularity (#3868)
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-04-25 13:30:58 -07:00
QI JUN
991939a0f4
chore: increase A30 for cpp test (#3811)
* increase A30 for cpp test

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* enable parallel run test for gpt_executor

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* clean

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* decrease freeGpuMemoryFraction of cpp tests

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

---------

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-04-24 16:34:39 -07:00
Zhanrui Sun
bfc4e55ded
infra: [TRTLLM-4417]Support auto trigger special test stage for special file change (#3478)
* infra: Support auto trigger special test stage for special file change

Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>

* Fix review

Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>

* Fix review

Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>

---------

Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-04-23 20:32:19 +08:00
xinhe-nv
80d8fdefd6
add test_mistral_large_hidden_vocab_size tests (#3716)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-04-23 13:40:11 +08:00
QI JUN
257abfbc51
move pytorch tests of LLM API into separate test files (#3745)
* move pytorch tests of LLM API into separate test files

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* polish

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix ci

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix ci

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix ci

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* update

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* clean

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

---------

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-04-22 14:36:59 -07:00
Emma Qiao
442386d302
infra: Add test stages for sm120 (#3533)
* Add test stages for sm120

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Update chip name and config name

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Split tests to gb202 and gb203

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Don't flash driver for rtx-5090

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Skip the failed cases

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Change the test stage names

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>

* Reduce 5080 jobs and add back gpu list which doesn't support dynamic driver flashing

Signed-off-by: qqiao <qqiao@nvidia.com>

* Skip failed case on gb202

Signed-off-by: qqiao <qqiao@nvidia.com>

* Fix condition to dynamic driver flashing

Signed-off-by: qqiao <qqiao@nvidia.com>

---------

Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-04-23 01:26:12 +08:00
Iman Tabrizian
af04b6f6aa
bug: Fix hang bug when context server doesn't have enough capacity for KV Cache (#3095)
* Fix hang bug when KV cache is low

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>

* Review comments

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>

* Fix attentiondp typo

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>

* Add CI test for this case

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>

* fix: Fix the insertion order for responder futures

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

* fix: Fix disagg CPP

Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>

---------

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-04-21 15:16:55 +08:00
Yechan Kim
5460d18b10
feat: trtllm-serve multimodal support (#3590)
* feat: trtllm-serve multimodal support

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* remove disable argument

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* remove disable

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* add and separate tests and move the doc

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* remove block_resue arg from serve.py

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

---------

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
2025-04-19 05:01:28 +08:00
pcastonguay
ae5671644a
feat: Disaggregated router class (#3584)
* Add draft scheduler class

Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>

* Refactor the design

Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>

* feat: Introduce router class for disaggregated server

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

* Add unit tests for router class

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

* Adding tests for disagg_utils

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

* Fixing missing import

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

* Fixing disagg integration tests

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

* Addressing MR review comments

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

---------

Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-04-19 00:34:12 +08:00
QI JUN
b9fce42717
enable test_ptp_quickstart_advanced_mixed_precision (#3667)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-04-18 05:06:24 -07:00
Zheng Duan
bce7ea8c38
test: add kv cache event tests for disagg workers (#3602)
2025-04-18 18:30:19 +08:00
peaceh-nv
88cff61fa1
chore : Split more tests out of gpt tests (#3524)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-04-18 12:04:57 +08:00
dongfengy
b71a0f76b4
test: Add llama 4 to ci (#3520)
* Add llama 4 to ci

Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>

* Only test trtllm

Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>

* Disable maverick

Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>

---------

Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-04-18 11:25:52 +08:00
Netanel Haber
3c52ac098f
feat: allocate minimal blocks per window size (#3028)
* implement variable window attention by breaking the block manager into window block managers per window size

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* revert isCyclic to be true if the min attention window is reached, not per window size

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* add explanatory comment to mCyclicThreshold

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* load correct gemma config

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* don't shadow inputLength in addSequence - it should remain the function scope input length between window size loop iterations

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* fix KVCacheManagerVariableWindowAttentionWithReuseTest for multiple window block managers

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* if TYPE_CHECKING

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* set temp_attention_window_inputs to None explicitly

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* set temp_attention_window_inputs to None explicitly

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* pass dtype as well

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* test_gemma variable sliding window attention

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* allot a fraction of primary/secondaryBlocks to different window size heaps, depending on the window size's total contribution to the kvcache size (i.e., including all layers)

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* remove || mEnableBlockReuse which erroneously triggers beamsearch code for cyclic variable attention window code

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* turn off request delaying for MaxUtil

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* make comments better

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* windowSizesTotalSum using std::accumulate

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* fix error handling of forwardAsync - forwardAsync catch-all catch cleanup code that runs terminateRequest can also fail and must be caught

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* fix comments

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* remove assert that kills disagg tests, since it isn't necessary

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* fix corrupted expression: 'isNewTask && (peftCacheManager ?' -> '(isNewTask && peftCacheManager) ?' which caused boolean algebra. Main is correct

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* add Gemma3 to SUPPORTED_HF_ARCHITECTURES

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* support Gemma3

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* finally fix test_gemma - always spread at least {} into generate_summary_cmd, never None

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* finally fix test_gemma - always spread at least {} into generate_summary_cmd, never None

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* fix kvfactor field for deepseek

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* fix comment

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* fix gemma-3 entries in testlist to include vswa

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* only quantize gemma2 VSWA

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* remove misleading comment

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* fix test_gemma

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* fix test_gemma

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* fix test_gemma

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* in sendRequestInfo, fromOldAllocatedBlockIds->fromOldAllocatedBlockIds, like in main

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

* fix: disable KV cache reuse if using attention sink (#3021)

* fix: disable KV cache reuse if using attention sink

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* fix: disable KV cache reuse if sink bubble

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* add comment

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

---------

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

---------

Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-04-17 16:04:57 +08:00
Zheng Duan
b0cb963199
test: torch-flow conditional disagg test (#3410)
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
2025-04-15 10:54:14 +08:00
nv-guomingz
b32ae7ac92
test:add fp8_kv_cache functionality test case. (#3457)
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
2025-04-15 09:16:46 +08:00
Iman Tabrizian
bad55e99bb
test: Add MTP + overlap + Attention DP disaggregated test (#3542)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-04-15 07:46:03 +08:00
brb-nv
44090a5388
Add support for Phi-4-MM (#3296)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-04-14 14:24:10 +08:00
Yiqing Yan
19d296b4b2
chore: add dgx_h200 tests (#3451)
* add dgx_h200 tests

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

* test

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

* fix pre-commit

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

* fix

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

* fix

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

* change bsl branch

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

* fix

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

* change multi gpu related file list

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>

---------

Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-04-14 11:20:55 +08:00
dominicshanshan
5d3180be82
feat: Add stress test for TRT-LLM (#3250)
Signed-off-by: Wangshanshan <dominicw@nvidia.com>
2025-04-13 10:24:25 +08:00
pcastonguay
145a126a28
chore: Unwaive DS + overlap disagg test (#3339)
* chore: Unwaive DS + overlap disagg test

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

* Fixing pre-commit

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

* Fixing pre-commit

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

---------

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-04-12 13:33:38 -04:00
Enwei Zhu
cf9ceea890
test: Add DeepSeek-V3-Lite PP=4 cases (#3454)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-12 00:09:12 +08:00
QI JUN
16ca45747b
always trigger multi gpu test to protect modeling_llama.py and modeling_deepseekv3.py (#3434)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-04-11 13:19:23 +08:00
Iman Tabrizian
d7f45e50c6
test: disable attention DP tests for single GPU (#3395)
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
2025-04-11 01:38:17 +08:00
amitz-nv
a6a2ae6cc1
chore: Rename nvsmall to nemotron nas (#3447)
* Rename nvsmall to nemotron NAS

* Revert nvsmall to nemotron_nas rename in paths in tests that access llm_models_root/nvsmall/tests

* Add NemotronNAS to pytorch supported models table

Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-04-10 23:16:52 +08:00
wm2012011492
af05749e90
feat: add qwen2 moe to torch flow; fix wrong imported KvCacheConfig in gpqa… (#3369)
* add qwen2 moe to torch flow; fix wrong imported KvCacheConfig in gpqa_llmapi.py

Signed-off-by: mengw <12670782+wm2012011492@users.noreply.github.com>

* fix coding style

Signed-off-by: mengw <12670782+wm2012011492@users.noreply.github.com>

* add unittest

Signed-off-by: mengw <12670782+wm2012011492@users.noreply.github.com>

---------

Signed-off-by: mengw <12670782+wm2012011492@users.noreply.github.com>
Co-authored-by: mengw <12670782+wm2012011492@users.noreply.github.com>
2025-04-10 22:45:57 +08:00
brb-nv
c59abae436
feat: Add Gemma3 text-only model support (#3247)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-04-10 12:34:58 +08:00
peaceh-nv
215fb20567
chore : split GptExecutor tests out of gpt tests to reduce single test time (#3412)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-04-10 09:08:15 +08:00
Yechan Kim
943218b54a
feat: Add Qwen2.5-VL and refactor Qwen2-VL (#3156)
* feat: Add Qwen2.5-VL and refactor Qwen2-VL

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* fix yapf and codespell

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* add test

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* fix test_e2e

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* generalize get_rope_index

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* fix qwen2.5-vl on README

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* fix test

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

* fix image test

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

---------

Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
2025-04-10 04:09:03 +08:00
Iman Tabrizian
8401722245
test: Add single gpu disaggregated tests (#3295)
* test: Add single gpu disaggregated tests

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>

* Add deepseek with overlap tests

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>

* Use updated prompt

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>

* Move test to disaggregated folder

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>

---------

Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
2025-04-09 09:34:45 +08:00
pcastonguay
02f446a9ff
chore: Adding DS V3-lite tests with overlap + cuda graph (#3342)
* chore: Adding DS V3-lite tests with overlap + cuda graph

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

* Fixing pre-commit

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

---------

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-04-08 09:36:09 -04:00
Chuang Zhu
cdb0906be4
disagg test single h100 (#3353)
2025-04-08 17:45:35 +08:00
Enwei Zhu
8ee019f8c4
test: Accuracy test improvement (Part 3.4): Move LLaMA tests (#3350)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-08 15:07:57 +08:00
MinaHuai
31422e7e46
add tp=2 ci test for vision encoder (#3319)
Signed-off-by: mhuai <mhuai@nvidia.com>
2025-04-07 21:46:08 -07:00
Enwei Zhu
ba019a43d6
test: Accuracy test improvement (Part 3.3): Move DeepSeek tests (#3260)
add skip

fix

fix

update

update test list

fix qa list

move bf16 to postmerge

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-08 07:19:04 +08:00
YueWeng
aab6214801
test: fix conflicting test names (#3316)
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-04-07 20:10:01 +08:00
QI JUN
a2fad51011
chore: waive a timeout multi-GPU test case (#3310)
* debug CI timeout issue

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* waive timeout case

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

---------

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-04-07 14:04:54 +08:00