TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-28 06:33:15 +08:00

Author	SHA1	Message	Date
Zongfei Jing	bb17649517	test: Add UT for moe trtllmgen (#4258 ) * Add ut for moe trtllmgen Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> * Update tests/unittest/_torch/modeling/test_modeling_deepseek.py Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com> Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> --------- Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>	2025-05-14 15:22:58 +08:00
bhsueh_NV	1a9298bc66	CI: add fp8/fp4 ci on Qwen3-30B-A3B (#4266 ) add fp8/fp4 ci on Qwen3-30B-A3B Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-05-14 14:38:04 +08:00
brb-nv	8280c3d4f2	feat: Support Gemma3-1b-it in Pytorch workflow (#3999 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-05-14 14:02:44 +08:00
brb-nv	cd5b3d21a0	feat: Support Mistral Small 3.1 24B VLM in TRT workflow (#4183 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-05-14 03:47:22 +08:00
Enwei Zhu	035d915fea	[TRTLLM-5081] [test] Align parametrize_with_ids to the pytest behavior (#4090 ) * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * normalize mtp_nextn Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * update test_durations Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-13 07:41:51 +08:00
wili	eba3623a54	Feat: Variable-Beam-Width-Search (VBWS) part4 (#3979 ) * feat/vbws-part4-v1.8: rebase Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * feat/vbws-part4-v1.9: fix incorrect output when using short output length Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * v1.9.1: remove useless variables Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * v1.9.2:fix incorrect output when using short output length Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * v1.9.3: rebase Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * v1.9.4: rebase Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> * v1.9.5: remove API change Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> --------- Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>	2025-05-12 22:32:29 +02:00
Zheng Duan	c9e2a963e0	feat: add kv cache aware router (#3831 ) * kv cache aware router Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * add tests Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * router config Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * eviction test Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> add test Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * eviction detect in worker test Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * move worker tests to single gpu Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * reduce memory fraction Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> * fix partial block Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com> --------- Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>	2025-05-12 07:23:57 -04:00
Yixin Dong	c90ebadd84	feat: Support the Structural Tag in guided decoding (#4066 ) * finish Signed-off-by: Ubospica <ubospica@gmail.com> * update Signed-off-by: Ubospica <ubospica@gmail.com> * update Signed-off-by: Ubospica <ubospica@gmail.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * exc overlap scheduler Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * add test Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix api ref Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Ubospica <ubospica@gmail.com> Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-12 17:24:50 +08:00
Yechan Kim	3e9bda3a09	[feat] Support HyperCLOVAX-SEED-Text language part (#3902 ) * feat: support HyperCLOVAX-SEED-Text language part Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * add Pytorch flow and remove test file Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * revert summarize Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * fix summarize Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * remove from pytorch example Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> --------- Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-05-12 16:05:14 +08:00
Dom Brown	2d0f93a054	Refactor: Restructure C++ tests for better modularisation of non-shared code (#4027 ) * Refactor: Restructure C++ tests for better modularisation of non-shared code Start cleanup of pytest code for C++ tests Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> Clean up names and remove references to test_cpp.py Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> WIP Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> Move multi-GPU code Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> Update doc and try un-waiving Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> * Update multi GPU file check Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> * Address minor multi-GPU setup bug Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> --------- Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-05-09 19:16:51 +01:00
Mike Iovine	4b8ba7ad61	[fix][nvbug/5244009] Fix llama 4 test lists/scout accuracy issue (#4069 ) [fix] Fix llama 4 test lists Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-05-09 22:45:14 +08:00
Bo Li	e3cf3fd15f	test: Add fp8kv to DS-v3-lite integration tests. (#3950 ) * Add fp8 kv cache tests to DSV3-Lite integration tests. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Refactor. Make fp8kv parallel to attention_dp, overlap_scheduler and cuda_graph. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Update gsm8k. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Update CI list. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Update TestDeepSeekR1. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Fix test list. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Need quant_config besides pytorch_config. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Update waive list (bug 5239087). Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Update waive list. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Correct test name. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> * Update waive list. Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> --------- Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Signed-off-by: Bo Li <bobboli0202@gmail.com> Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-05-09 13:35:04 +08:00
Ivy Zhang	c91d03fa0a	test: move mistral / mixtral test cases in QA test list into the new accuracy test suite (#3440 ) * add mistral-7b-v0.1 torch flow test case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * rearrange mistral Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * rearrange mixtral case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * remove api function test Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * move mistral nemo cases Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * move mixtral cases Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update threshold Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix failure Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix name Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix failure cases Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update list Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update threshold Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * remove awq llmapi test Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * adjust threshold Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix ci Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix partial comments Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix path Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update thres Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * update Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * remove duplicate test case Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix ci Signed-off-by: Ivy Zhang 
<25222398+crazydemo@users.noreply.github.com> --------- Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-05-09 13:32:02 +08:00
yuanjingx87	6e1d2a1320	feat: Add Slurm support and enable RTX Pro 6000 testing pipeline in CI (#4019 ) * Add slurm support with RTXPro6000 PostMerge Tests Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com> * remove H100 post merge test from testing Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com> --------- Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>	2025-05-08 15:15:36 +08:00
dominicshanshan	3ac6637005	fix: trtllm-serve hang in stress test and ds v3 stress parameter update (#3836 ) * Remove stdout pipe for genai-perf and make stress time as public parameter. Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> * Update llmRequest based on comment. Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> * launch process function refactor. Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com> --------- Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-05-06 16:52:30 +08:00
pansicheng	e84dc6b3c7	feat: add deepseek-r1 reasoning parser to trtllm-serve (#3354 ) * add deepseek-r1 reasoning parser Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com> * fix test Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com> --------- Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com> Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com> Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-05-06 08:13:04 +08:00
Iman Tabrizian	85867d76dd	test: Add disaggregated serving accuracy tests (#4036 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-05-05 08:56:59 -07:00
Yan Chunwei	bc0cf41592	chore: refactor llmapi e2e tests (#3803 ) * refactor llmapi e2e tests Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> * fix Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> --------- Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-05-05 07:37:24 +08:00
Emma Qiao	2692daad2e	infra: Remove the WAR for test items incompletely (#3313 ) * Remove the WAR for test items incompleted Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Complete test item manually Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix another test definition file Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Complete test name Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix some other test names Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix another test name after rebase Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Update name for waived case name, too Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix name for multi-gpu tests Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix test name after rebase Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix another test name Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix typo Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix test name after rebase Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix other qa tests Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix tests name after rebase Signed-off-by: qqiao <qqiao@nvidia.com> * Fix name after rebase Signed-off-by: qqiao <qqiao@nvidia.com> * Correct test names in waive.txt Signed-off-by: qqiao <qqiao@nvidia.com> * Add new test_durations file Signed-off-by: qqiao <qqiao@nvidia.com> * Fix names after rebase Signed-off-by: qqiao <qqiao@nvidia.com> * Update test duration to latest Signed-off-by: qqiao <qqiao@nvidia.com> --------- Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> Signed-off-by: qqiao <qqiao@nvidia.com>	2025-05-04 11:31:59 +08:00
Mike Iovine	906cddffb0	[infra] Improve llama4 parallelism test coverage (#3821 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-05-02 16:15:04 -04:00
bhsueh_NV	561ee44737	add ci and doc for qwen3 (#4022 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-05-02 14:13:38 +08:00
Dom Brown	8709fe8b53	chore: bump version to 0.19.0 (#3598 ) (#3841 ) test: add test cases for 0.19 release (#3608) * fix test name * add quickstart test for nemotron-ultra * add rcca multi-node test case for deepseek-v3 * add rcca info --------- squash (#3642) fix: nvbugs/5187237: fix deterministic mode crash (#3448) * nvbugs/5187237 nvbugs/5112075: fix deterministic mode error * remove waive * Revert "remove waive" This reverts commit 0bf5486d19906d692bfb7a6262333c296b0087ac. * revert ar fusion --------- update fp8 doc (#3647) tests: change qa perf test to trtllm-bench (#3619) fix: FP8 quantized lm_head (NvBug 5214229) (#3567) infra: Add PR approval protection for the release branch (#3634) fix: nvbugs/5231298: pytorch allreduce issue (#3673) Fix: nvbugs/5222698 variable not defined (#3630) * Fix: nvbugs/5222698 variable not defined * Tidy code --------- test:sync waives.txt from main branch by disabling test_perf/gpt_350m-cppmanager case (#3685) test:restore fp8 kv cache testing for L0 (#3671) doc: Update DeepSeek perf docs (#3693) * Update DeepSeek perf docs * update * Apply suggestions from code review --------- tests: waive test_llm_multi_node (#3664) fix: update test_user_buffers_mm_add_prologue atol (#3711) Fix: cherry-pick hmac encryption from main branch (#3635) * security fix cherry-pick changes from main * fix hmac in remote mpi session (#3649) --------- Un-waive DS-V3-Lite tests. (#3621) fix: FP8 kv accuracy (#3675) * fix FP8 kv accuracy * update doc --------- Fix script options for engines. (#3622) unwaive multi-node test (#3721) chore : Split more tests out of gpt tests (#3524) (#3674) doc:add torch examples link into torch backend documentation (#3749) test: Get Eagle tests working (#3593) (#3722) Waive L0 test (#3756) waive failed case in perf test, change default max_batch_size to 512 and write config.json to output log (#3656) Update ds v3 parameters in stress test. 
(#3676) waive gemma on L20 (#3766) https://nvbugs/5141291: Fix convert.py script for Qwen model. (#3758) Include Qwen2VLDecoderLayer in the smooth_qwen2_model function. fix: PP4 fixes and cleanup (#3688) remove benchmark test list (#3643) skip disagg deepseek test if sm!=90 (#3720) test: skip failed cases on B200 (#3710) * add skip condition to tests * fix error --------- test: [nvbug: 5234494] skip_pre_ada for fp8 cases (#3718) * skip_pre_ada for fp8 cases * update * update after rebase --------- add know issue to deepseek doc. (#3800) Fix ModelOpt Mixtral AWQ OOM (#3714) (#3761) Waive L0 tests (#3826) fix: Reduce memory usage in fused moe op associated with AutoTuning and fix moe fallback issue. (#3793) * Reduce memory usage in fused moe op associated with AutoTuning. * Replace pre-defined bucket size strategy with a generating function based on the tune_max_num_tokens. * Add free_memory logic of workspace in min_latency_mode fused moe path. * Fix fused_moe fallback issue. (#3652) min_latency_mode is only set to False during warmup phase. Thus when it becomes true during inference, all tactics fall back to the default one and thus cause perf regression. --------- [doc] Better document for Draft-Target-Model (DTM) speculative decoding (#3797) Fix pre-commit Fix again Address some review comments for the MI Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-04-29 16:57:22 +08:00
QI JUN	c381380ecc	increase H100 CI nodes for PyTorch only pipelines (#3927 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-29 10:58:43 +08:00
Jinyang Yuan	dafc28fb85	fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863 )	2025-04-29 09:09:43 +08:00
xiweny	f84dd8f815	test: add deepseek v3 & r1 cases (#3528 ) * test: add deepseek v3 & r1 cases Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-04-28 23:37:26 +08:00
Dom Brown	7ff9fd345c	Test: Split C++ unit tests for CI granularity (#3868 ) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-04-25 13:30:58 -07:00
QI JUN	991939a0f4	chore: increase A30 for cpp test (#3811 ) * increase A30 for cpp test Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * enable parallel run test for gpt_executor Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * clean Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * decrease freeGpuMemoryFraction of cpp tests Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> --------- Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-24 16:34:39 -07:00
Zhanrui Sun	bfc4e55ded	infra: [TRTLLM-4417]Support auto trigger special test stage for special file change (#3478 ) * infra: Support auto trigger special test stage for special file change Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Fix review Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Fix review Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> --------- Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-04-23 20:32:19 +08:00
xinhe-nv	80d8fdefd6	add test_mistral_large_hidden_vocab_size tests (#3716 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-04-23 13:40:11 +08:00
QI JUN	257abfbc51	move pytorch tests of LLM API into separate test files (#3745 ) * move pytorch tests of LLM API into separate test files Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * polish Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * update Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * clean Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> --------- Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-22 14:36:59 -07:00
Emma Qiao	442386d302	infra: Add test stages for sm120 (#3533 ) * Add test stages for sm120 Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Update chip name and config name Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Split tests to gb202 and gb203 Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Don't flash driver for rtx-5090 Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Skip the failed cases Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Change the test stage names Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Reduce 5080 jobs and add back gpu list which doesn't support dynamic driver flashing Signed-off-by: qqiao <qqiao@nvidia.com> * Skip failed case on gb202 Signed-off-by: qqiao <qqiao@nvidia.com> * Fix condition to dynamic driver flashing Signed-off-by: qqiao <qqiao@nvidia.com> --------- Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> Signed-off-by: qqiao <qqiao@nvidia.com>	2025-04-23 01:26:12 +08:00
Iman Tabrizian	af04b6f6aa	bug: Fix hang bug when context server doesn't have enough capacity for KV Cache (#3095 ) * Fix hang bug when KV cache is low Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> * Review comments Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> * Fix attentiondp typo Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> * Add CI test for this case Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> * fix: Fix the insertion order for responder futures Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> * fix: Fix disagg CPP Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> --------- Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-04-21 15:16:55 +08:00
Yechan Kim	5460d18b10	feat: trtllm-serve multimodal support (#3590 ) * feat: trtllm-serve multimodal support Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * remove disable argument Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * remove disable Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * add and separate tests and move the doc Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * remove block_resue arg from serve.py Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> --------- Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>	2025-04-19 05:01:28 +08:00
pcastonguay	ae5671644a	feat: Disaggregated router class (#3584 ) * Add draft scheduler class Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> * Refactor the design Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> * feat: Introduce router class for disaggregated server Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Add unit tests for router class Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Adding tests for disagg_utils Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Fixing missing import Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Fixing disagg integration tests Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Addressing MR review comments Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> --------- Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-04-19 00:34:12 +08:00
QI JUN	b9fce42717	enable test_ptp_quickstart_advanced_mixed_precision (#3667 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-18 05:06:24 -07:00
Zheng Duan	bce7ea8c38	test: add kv cache event tests for disagg workers (#3602 )	2025-04-18 18:30:19 +08:00
peaceh-nv	88cff61fa1	chore : Split more tests out of gpt tests (#3524 ) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>	2025-04-18 12:04:57 +08:00
dongfengy	b71a0f76b4	test: Add llama 4 to ci (#3520 ) * Add llama 4 to ci Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com> * Only test trtllm Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com> * Disable marverick Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com> --------- Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>	2025-04-18 11:25:52 +08:00
Netanel Haber	3c52ac098f	feat: allocate minimal blocks per window size (#3028 ) * implement variable window attention by breaking the block manager into window block managers per window size Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * revert isCyclic to be true if the min attention window is reached, not per window size Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * add explanatory comment to mCyclicThreshold Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * load correct gemma config Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * don't shadow inputLength in addSequence - it should remain the function scope input length between window size loop iterations Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix KVCacheManagerVariableWindowAttentionWithReuseTest for multiple window block managers Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * if TYPE_CHECKING Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * set temp_attention_window_inputs to None explicitly Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * set temp_attention_window_inputs to None explicitly Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * pass dtype as well Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * test_gemma variable sliding window attention Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * allot a fraction of primary/secondaryBlocks to different window size heaps, depending on the window size's total contribution to the kvcache size (i.e., including all layers) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * remove \|\| mEnableBlockReuse which erroneously triggers beamsearch code for cyclic variable attention 
window code Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * turn off request delaying for MaxUtil Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * make comments better Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * windowSizesTotalSum using std::accumulate Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix error handling of forwardAsync - forwardAsync catch-all catch cleanup code that runs terminateRequest can also fail and must be caught Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix comments Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * remove assert that kills disagg tests, since it isn't necessary Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix corrupted expression: 'isNewTask && (peftCacheManager ?' -> '(isNewTask && peftCacheManager) ?' which caused boolean algebra. 
Main is correct Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * add Gemma3 to SUPPORTED_HF_ARCHITECTURES Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * support Gemma3 Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * finally fix test_gemma - always spread at least {} into generate_summary_cmd, never None Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * finally fix test_gemma - always spread at least {} into generate_summary_cmd, never None Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix kvfactor field for deepseek Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix comment Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix gemma-3 entries in testlist to include vswa Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * only quantize gemma2 VSWA Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> remove misleading comment Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> fix test_gemma Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix test_gemma Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix test_gemma Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * in sendRequestInfo, fromOldAllocatedBlockIds->fromOldAllocatedBlockIds, like in main Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix: disable KV cache reuse if using attention sink (#3021) * fix: disable KV cache reuse if using attention sink Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * fix: disable KV cache reuse if sink bubble Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * add comment Signed-off-by: Robin Kobus 
<19427718+Funatiq@users.noreply.github.com> --------- Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> --------- Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-04-17 16:04:57 +08:00
Zheng Duan	b0cb963199	test: torch-flow conditional disagg test (#3410 ) Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>	2025-04-15 10:54:14 +08:00
nv-guomingz	b32ae7ac92	test:add fp8_kv_cache functionality test case. (#3457 ) Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>	2025-04-15 09:16:46 +08:00
Iman Tabrizian	bad55e99bb	test: Add MTP + overlap + Attention DP disaggregated test (#3542 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-04-15 07:46:03 +08:00
brb-nv	44090a5388	Add support for Phi-4-MM (#3296 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-04-14 14:24:10 +08:00
Yiqing Yan	19d296b4b2	chore: add dgx_h200 tests (#3451 ) * add dgx_h200 tests Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> * test Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> * fix pre-commit Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> * fix Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> * fix Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> * change bsl branch Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> * fix Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> * change multi gpu related file list Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> --------- Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-04-14 11:20:55 +08:00
dominicshanshan	5d3180be82	feat: Add stress test for TRT-LLM (#3250 ) Signed-off-by: Wangshanshan <dominicw@nvidia.com>	2025-04-13 10:24:25 +08:00
pcastonguay	145a126a28	chore: Unwaive DS + overlap disagg test (#3339 ) * chore: Unwaive DS + overlap disagg test Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Fixing pre-commit Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Fixing pre-commit Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> --------- Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-04-12 13:33:38 -04:00
Enwei Zhu	cf9ceea890	test: Add DeepSeek-V3-Lite PP=4 cases (#3454 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-04-12 00:09:12 +08:00
QI JUN	16ca45747b	always trigger multi gpu test to protect modeling_llama.py and modeling_deepseekv3.py (#3434 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-11 13:19:23 +08:00
Iman Tabrizian	d7f45e50c6	test: disable attention DP tests for single GPU (#3395 ) Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>	2025-04-11 01:38:17 +08:00
amitz-nv	a6a2ae6cc1	chore: Rename nvsmall to nemotron nas (#3447 ) * Rename nvsmall to nemotron NAS * Revert nvsmall to nemotron_nas rename in paths in tests that access llm_models_root/nvsmall/tests * Add NemotronNAS to pytorch supported models table Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>	2025-04-10 23:16:52 +08:00


85 Commits