h-guo18
55fed1873c
[None][chore] AutoDeploy: cleanup old inference optimizer configs ( #8039 )
...
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-17 15:55:57 -04:00
Eran Geva
4136942436
[ #7588 ][fix] fixed the kv cache size parsing in test_perf.py AD backend ( #8092 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-10-02 15:55:31 -04:00
Eran Geva
32c7f8c36f
[ #7588 ][feat] lock gpu clocks in test_perf.py to reliably detect perf regressions ( #8099 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-10-02 11:18:10 +03:00
Eran Geva
9cea6bfb30
[ #7288 ][feat] Added AutoDeploy backend support to test_perf.py ( #7588 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-09-28 21:21:27 -07:00
Guoming Zhang
202bed4574
[None][chroe] Rename TensorRT-LLM to TensorRT LLM for source code. ( #7851 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Guoming Zhang
9f0f52249e
[None][doc] Rename TensorRT-LLM to TensorRT LLM for homepage and the … ( #7850 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
ruodil
c5453103d6
[None][test] add deepseek r1/v3 model with chunked prefill cases ( #7124 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-09-19 11:12:53 +08:00
fredricz-20070104
fc4e6d3702
[TRTLLM-7183][test] Feature fix model issue for disagg serving ( #7785 )
...
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-09-19 10:12:55 +08:00
ruodil
e6073b3911
[None][test] add gpt oss model for trtllm perf test ( #7328 )
...
Signed-off-by: Ruodi Lu <ruodil@nvidia.com>
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-09-17 15:23:21 +08:00
Bo Deng
bf57829acf
[TRTLLM-7871][infra] Extend test_perf.py to add disagg-serving perf tests. ( #7503 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-09-10 17:35:51 +08:00
dominicshanshan
c9dca69e1b
[None][chore] Mass integration of release/1.0 - 3rd ( #7519 )
...
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
Signed-off-by: Hui Gao <huig@nvidia.com>
Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Co-authored-by: Nave Assaf <55059536+Naveassaf@users.noreply.github.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Co-authored-by: yifeizhang-c <219273404+yifeizhang-c@users.noreply.github.com>
Co-authored-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
Co-authored-by: Erin <14718778+hchings@users.noreply.github.com>
Co-authored-by: chenfeiz0326 <chenfeiz@nvidia.com>
Co-authored-by: ChristinaZ <83400082+ChristinaZ@users.noreply.github.com>
Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: HuiGao-NV <huig@nvidia.com>
Co-authored-by: milesial <milesial@users.noreply.github.com>
Co-authored-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Co-authored-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Linda <57756729+Linda-Stadter@users.noreply.github.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com>
Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Co-authored-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-09-08 14:03:04 +08:00
Emma Qiao
aea8ac1649
[TRTLLM-5950][infra] Removing remaining turtle keywords from the code base ( #7086 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-07 14:26:18 +08:00
peaceh-nv
f4dc1ed39c
[ https://nvbugs/5449218 ][fix] Fix KvCacheConfig error in test_perf ( #6937 )
...
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
ruodil
b845eb7a3a
[None][test] add kv cache size in bench metric and fix failed cases ( #7160 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-26 10:10:02 +08:00
Wanli Jiang
07c711eb1f
[TRTLLM-6825][fix] Update lora for phi4-mm ( #6817 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-08-21 22:00:04 -04:00
dominicshanshan
6f245ec78b
[None][chore] Mass integration of release/1.0 ( #6864 )
...
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: raayandhar <rdhar@nvidia.com>
Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: 2ez4bz <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Raayan Dhar <58057652+raayandhar@users.noreply.github.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-08-22 09:25:15 +08:00
ruodil
22f45a0e19
[TRTLLM-5252][test] add for mistral_small_3.1_24b perf test ( #6685 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-07 22:57:04 -04:00
ruodil
f30398470d
[None][chore] update readme for perf release test ( #6664 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-07 10:00:45 +10:00
ruodil
907c180eb2
[None][test] align kv_frac in perf test with perflab and add more cases for 4 gpus GB200 ( #6632 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-06 02:25:57 -04:00
ruodil
7625845365
test: add README_release_test.md for perf test ( #6443 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-05 02:07:42 -04:00
ruodil
8d82ccca63
test: modify max_lora_rank of phi4_multimodal to 320 ( #6474 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-04 12:20:22 +10:00
brb-nv
7447d6ed85
[TRTLLM-6657][feat] Add LoRA support for Gemma3 ( #6371 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-01 09:19:54 -04:00
Venky
ad5742b105
[fix] Update get_trtllm_bench_build_command to handle batch size and tokens ( #6313 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-08-01 00:08:09 -04:00
Venky
ab40369053
[fix] Move kv_cache_free_gpu_mem_fraction arg to benchmark command in tests ( #6463 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-30 10:53:43 +10:00
ruodil
e11255e9d0
test:[nvbug 5415268] add kv_cache_free_gpu_mem_fraction param and llama4 rcca cases ( #6430 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-07-29 15:52:45 +10:00
ruodil
03632a679f
test: organize perf cases and add missing perflab cases in qa test list ( #6283 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-07-28 20:33:32 +10:00
ruodil
6a3c9f8061
test: add phi-4 multimodel and bielik-11b-v2.2 models for perf test ( #5826 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-21 11:29:19 +10:00
Yan Chunwei
a02606a9e2
[TRTLLM-5530][BREAKING CHANGE] refactor: unify KvCacheConfig in LLM class for pytorch backend ( #5752 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-16 16:42:59 +08:00
ruodil
2a147c4d01
test: add llama_v3.3_70b_cases in perf test ( #6035 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-07-15 17:53:59 +10:00
ruodil
2504aa552e
test: add recursive updating pytorch config and change MOE backend format in perf test ( #6046 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-07-15 17:53:15 +10:00
nv-guomingz
4e4d18826f
chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… ( #6003 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-15 15:50:03 +09:00
ruodil
278a1a7df3
test: fix some test failure and add llama_nemotron models in perf sanity test, add more torch cases ( #5693 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
Venky
e27215ca03
test: Validate and add accuracy& perf tests for Ministral-8B-Instruct[-FP8](pytorch only) ( #5654 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-07-08 18:16:21 -07:00
Pamela Peng
da8c7372d4
[TRTLLM-5366][feat]Add support for sm121 ( #5524 )
...
Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Initial CI run failed a single step A30-CPP-3 due to timeout. Rerunning that step succeeded.
2025-07-08 14:27:00 -07:00
nv-guomingz
c8fa08da5c
doc: update cuda_graph_config usage part in DS R1 docs ( #5796 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-07-08 16:54:46 +09:00
nv-guomingz
0be41b6524
Revert "chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie…" ( #5818 )
2025-07-08 13:15:30 +09:00
nv-guomingz
5a8173c121
chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… ( #5795 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-08 08:52:36 +08:00
ruodil
ded203d8aa
test: set enable_attention_dp=True in default deepseek settings ( #5461 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-01 20:12:55 +08:00
nv-guomingz
6e48ac25a6
chore: remove cuda_graph_ prefix from cuda_graph_config filed members. ( #5585 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-30 12:23:14 -04:00
nv-guomingz
578430e64c
[TRTLLM-5530][BREAKING CHANGE]: enhance the llm args pytorch config part 1(cuda_graph_config) ( #5014 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-30 11:05:40 +08:00
Frank
aa6e015ef8
Update trtllm-bench to support new Pytorch default. ( #5491 )
...
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-06-26 17:05:43 -07:00
ruodil
21ce9b6749
test: add qwen3 cases ( #5302 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-19 14:38:36 +08:00
ruodil
3b5d916250
test: cherry-pick deepseek rcca cases in main branch ( #5307 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-18 14:26:26 +08:00
ruodil
2848e012ae
test: add llama4 models for perf test ( #5187 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-16 11:24:35 +08:00
ruodil
3d22f27063
test: add more cases for llama_v3.3/3.1 70b fp8 and set enable_attention_dp to false to non-deepseek models ( #5155 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-06-16 11:23:20 +08:00
amitz-nv
109c426077
Enable trtllm-bench to run LoRA and add basic e2e perf testing capability for LoRA in PyT flow ( #5130 )
2025-06-15 18:54:04 +03:00
ruodil
fa582cbe9a
test: add more cases for rtx_pro_6000_se and add option kv_cache_dtype in perf test ( #5083 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-06-13 11:09:15 +08:00
ruodil
d021cc5126
test: set enable_attention_dp to False for non-deepseek models and add more cases for llama_v3.1/3.3 70b fp8 models ( #5149 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-12 14:59:16 +08:00
Venky
c3b2eb6dab
test(perf): Add remaining Llama-Nemotron perftests (nano, super, ultra) + extras ✨ ( #5066 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2025-06-12 14:19:15 +08:00
ruodil
56abae0835
test: add more llama_v3.3_70b cases in perf test ( #4979 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-11 15:44:22 +08:00