TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-23 04:03:22 +08:00

Author	SHA1	Message	Date
Chuang Zhu	1ada3c9800	unwaive disagg tests (#3925 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-04-30 16:44:00 +08:00
xinhe-nv	a31afcf3a9	update waive list (#3890 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-04-30 11:07:48 +08:00
Dom Brown	8709fe8b53	chore: bump version to 0.19.0 (#3598 ) (#3841 ) test: add test cases for 0.19 release (#3608) * fix test name * add quickstart test for nemotron-ultra * add rcca multi-node test case for deepseek-v3 * add rcca info --------- squash (#3642) fix: nvbugs/5187237: fix deterministic mode crash (#3448) * nvbugs/5187237 nvbugs/5112075: fix deterministic mode error * remove waive * Revert "remove waive" This reverts commit 0bf5486d19906d692bfb7a6262333c296b0087ac. * revert ar fusion --------- update fp8 doc (#3647) tests: change qa perf test to trtllm-bench (#3619) fix: FP8 quantized lm_head (NvBug 5214229) (#3567) infra: Add PR approval protection for the release branch (#3634) fix: nvbugs/5231298: pytorch allreduce issue (#3673) Fix: nvbugs/5222698 variable not defined (#3630) * Fix: nvbugs/5222698 variable not defined * Tidy code --------- test:sync waives.txt from main branch by disabling test_perf/gpt_350m-cppmanager case (#3685) test:restore fp8 kv cache testing for L0 (#3671) doc: Update DeepSeek perf docs (#3693) * Update DeepSeek perf docs * update * Apply suggestions from code review --------- tests: waive test_llm_multi_node (#3664) fix: update test_user_buffers_mm_add_prologue atol (#3711) Fix: cherry-pick hmac encryption from main branch (#3635) * security fix cherry-pick changes from main * fix hmac in remote mpi session (#3649) --------- Un-waive DS-V3-Lite tests. (#3621) fix: FP8 kv accuracy (#3675) * fix FP8 kv accuracy * update doc --------- Fix script options for engines. (#3622) unwaive multi-node test (#3721) chore : Split more tests out of gpt tests (#3524) (#3674) doc:add torch examples link into torch backend documentation (#3749) test: Get Eagle tests working (#3593) (#3722) Waive L0 test (#3756) waive failed case in perf test, change default max_batch_size to 512 and write config.json to output log (#3656) Update ds v3 parameters in stress test. 
(#3676) waive gemma on L20 (#3766) https://nvbugs/5141291: Fix convert.py script for Qwen model. (#3758) Include Qwen2VLDecoderLayer in the smooth_qwen2_model function. fix: PP4 fixes and cleanup (#3688) remove benchmark test list (#3643) skip disagg deepseek test if sm!=90 (#3720) test: skip failed cases on B200 (#3710) * add skip condition to tests * fix error --------- test: [nvbug: 5234494] skip_pre_ada for fp8 cases (#3718) * skip_pre_ada for fp8 cases * update * update after rebase --------- add know issue to deepseek doc. (#3800) Fix ModelOpt Mixtral AWQ OOM (#3714) (#3761) Waive L0 tests (#3826) fix: Reduce memory usage in fused moe op associated with AutoTuning and fix moe fallback issue. (#3793) * Reduce memory usage in fused moe op associated with AutoTuning. * Replace pre-defined bucket size strategy with a generating function based on the tune_max_num_tokens. * Add free_memory logic of workspace in min_latency_mode fused moe path. * Fix fused_moe fallback issue. (#3652) min_latency_mode is only set to False during warmup phase. Thus when it becomes true during inference, all tactics fall back to the default one and thus cause perf regression. --------- [doc] Better document for Draft-Target-Model (DTM) speculative decoding (#3797) Fix pre-commit Fix again Address some review comments for the MI Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-04-29 16:57:22 +08:00
QI JUN	c381380ecc	increase H100 CI nodes for PyTorch only pipelines (#3927 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-29 10:58:43 +08:00
Jinyang Yuan	dafc28fb85	fix: Fix FMHA-based MLA in the generation phase and add MLA unit test (#3863 )	2025-04-29 09:09:43 +08:00
xiweny	f84dd8f815	test: add deepseek v3 & r1 cases (#3528 ) * test: add deepseek v3 & r1 cases Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-04-28 23:37:26 +08:00
xinhe-nv	82a8e43557	test: [CI] Add failed cases into waives.txt (#3867 ) * update waive list Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> * update waives Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> --------- Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Larry <197874197+LarryXFly@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-04-28 14:32:48 +08:00
xinhe-nv	e20b67e9fd	update waives & tests (#3887 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-04-28 14:29:35 +08:00
Yanchao Lu	068c72ebf8	Test: waive intermittent test hang (#3894 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2025-04-28 08:53:20 +08:00
Iman Tabrizian	74cc9e26ff	infra: install Triton in the base image (#3759 ) * infra: install Triton in the base image Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> * install Triton from the base image Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> * update base image Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> * Address review comments Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> * update base image Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> * waive test Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> --------- Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-04-28 07:36:30 +08:00
Dom Brown	7ff9fd345c	Test: Split C++ unit tests for CI granularity (#3868 ) Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-04-25 13:30:58 -07:00
Yiqing Yan	238fefc659	[infra] Waive L0 tests (#3853 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-04-25 17:32:21 +08:00
QI JUN	991939a0f4	chore: increase A30 for cpp test (#3811 ) * increase A30 for cpp test Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * enable parallel run test for gpt_executor Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * clean Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * decrease freeGpuMemoryFraction of cpp tests Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> --------- Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-24 16:34:39 -07:00
xinhe-nv	476d7003f8	test: [CI] Add failed cases into waives.txt (#3777 ) * update waive list Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> * update waives.txt Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> --------- Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-04-24 09:36:05 +08:00
Zhanrui Sun	bfc4e55ded	infra: [TRTLLM-4417]Support auto trigger special test stage for special file change (#3478 ) * infra: Support auto trigger special test stage for special file change Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Fix review Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> * Fix review Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> --------- Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-04-23 20:32:19 +08:00
Enwei Zhu	8f2b2eaf83	test: Add DeepSeek-V3-Lite GSM8K tests (#3771 ) * tmp Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * update ref Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * update waives Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-04-23 16:54:48 +08:00
xinhe-nv	b82d72bc37	update waive list (#3696 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-04-23 14:18:57 +08:00
Yechan Kim	11d35656bf	fix: nvbugs/5234029 fix Qwen2.5-VL image test (#3726 ) * fix: nvbugs/5234029 fix Qwen2.5-VL image test case by adding more answer candidate Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * remove qwen2.5_vl from waive list Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> --------- Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-04-23 14:09:39 +08:00
xinhe-nv	80d8fdefd6	add test_mistral_large_hidden_vocab_size tests (#3716 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-04-23 13:40:11 +08:00
Yiqing Yan	cc161dd83d	Waive L0 tests (#3784 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-04-23 11:22:11 +08:00
QI JUN	257abfbc51	move pytorch tests of LLM API into separate test files (#3745 ) * move pytorch tests of LLM API into separate test files Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * polish Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * update Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * clean Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> --------- Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-22 14:36:59 -07:00
Emma Qiao	442386d302	infra: Add test stages for sm120 (#3533 ) * Add test stages for sm120 Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Update chip name and config name Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Split tests to gb202 and gb203 Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Don't flash driver for rtx-5090 Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Skip the failed cases Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Change the test stage names Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Reduce 5080 jobs and add back gpu list which doesn't support dynamic driver flashing Signed-off-by: qqiao <qqiao@nvidia.com> * Skip failed case on gb202 Signed-off-by: qqiao <qqiao@nvidia.com> * Fix condition to dynamic driver flashing Signed-off-by: qqiao <qqiao@nvidia.com> --------- Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> Signed-off-by: qqiao <qqiao@nvidia.com>	2025-04-23 01:26:12 +08:00
Ivy Zhang	47d2f16bb8	waive gemma on L20 (#3767 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-04-22 17:52:49 +08:00
ruodil	9223000765	waive failed case in perf test, change default max_batch_size to 512 and write config.json to output log (#3657 ) Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> Signed-off-by: Larry <197874197+LarryXFly@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-04-22 14:51:45 +08:00
xinhe-nv	ba216341f4	update waive list (#3683 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-04-22 11:09:41 +08:00
Enwei Zhu	3fa19ffa4e	test [TRTLLM-4477,TRTLLM-4481]: Accuracy test improvement (Part 3.5): Support GSM8K and GPQA (#3483 ) * add gsm8k Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix gsm8k Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * add gpqa Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * conditional import lm_eval Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * gpqa in lm_eval Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * system prompt Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * shuffle Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * update AA prompt and regex Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * revert AA prompt and regex Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * integration to tests Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * add DS-R1 Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix and clean Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * update tests Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * update Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * clean up Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * free_gpu_memory_fraction=0.8 Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-04-22 07:38:16 +08:00
Barry Kang	d87b009d8d	Fix ModelOpt Mixtral AWQ OOM (#3714 ) Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>	2025-04-21 19:14:14 +08:00
Iman Tabrizian	af04b6f6aa	bug: Fix hang bug when context server doesn't have enough capacity for KV Cache (#3095 ) * Fix hang bug when KV cache is low Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> * Review comments Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> * Fix attentiondp typo Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> * Add CI test for this case Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> * fix: Fix the insertion order for responder futures Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> * fix: Fix disagg CPP Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> --------- Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-04-21 15:16:55 +08:00
Stanley Sun	852dd0c1be	test: add llama3.2 ptp test case (#3363 ) * add llama3.2 ptp test case Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> * update test list Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> --------- Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-04-21 15:15:45 +08:00
Yiqing Yan	6f7f262779	Waive L0 tests (#3709 ) * Waive L0 tests Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> * the test is fixed in PR 3711 Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> --------- Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-04-21 11:24:00 +08:00
Emma Qiao	48db263d9a	infra: Add test list name check (#3097 ) * Add steps to check test names Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Correct test-db command Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Switch to use a trt-llm image Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Update go path Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Correct go path Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Move the test list check to test ci Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Correct file path Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix path again Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix get path Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix typo Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Skip test list check for ARM Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix expression Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Change back unrelated file Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Correct qa test names Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Remove a stage Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Update jenkins/L0_Test.groovy Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Emma Qiao <qqiao@nvidia.com> Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Move some steps to a python script Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix script path Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Split commands and debug Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix typo Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix typo Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Also correct case name in waives list Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Move check script to another folder Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Update qa list after rebase Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix rebase Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Remove the perf tests under QA 
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Some tests already fixed after rebase to TOT Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> --------- Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Emma Qiao <qqiao@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-04-20 23:02:16 +08:00
brb-nv	c35d2a7532	test: Get Eagle tests working (#3593 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-04-20 00:50:57 +08:00
nv-guomingz	e70961f541	test:update waives.txt for nvbug 5219532 (#3672 ) Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com> Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>	2025-04-19 18:57:39 +08:00
Iman Tabrizian	61ee983488	fix: Fix disaggregated load balance test (#3689 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-04-19 10:40:40 +08:00
Iman Tabrizian	a2f190f306	chore: Waive disaggregated load balance (#3687 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-04-18 16:04:33 -07:00
Yechan Kim	5460d18b10	feat: trtllm-serve multimodal support (#3590 ) * feat: trtllm-serve multimodal support Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * remove disable argument Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * remove disable Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * add and separate tests and move the doc Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * remove block_resue arg from serve.py Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> --------- Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>	2025-04-19 05:01:28 +08:00
pcastonguay	ae5671644a	feat: Disaggregated router class (#3584 ) * Add draft scheduler class Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> * Refactor the design Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> * feat: Introduce router class for disaggregated server Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Add unit tests for router class Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Adding tests for disagg_utils Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Fixing missing import Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Fixing disagg integration tests Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Addressing MR review comments Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> --------- Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-04-19 00:34:12 +08:00
QI JUN	b9fce42717	enable test_ptp_quickstart_advanced_mixed_precision (#3667 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-18 05:06:24 -07:00
Zheng Duan	bce7ea8c38	test: add kv cache event tests for disagg workers (#3602 )	2025-04-18 18:30:19 +08:00
peaceh-nv	88cff61fa1	chore : Split more tests out of gpt tests (#3524 ) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>	2025-04-18 12:04:57 +08:00
dongfengy	b71a0f76b4	test: Add llama 4 to ci (#3520 ) * Add llama 4 to ci Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com> * Only test trtllm Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com> * Disable marverick Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com> --------- Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>	2025-04-18 11:25:52 +08:00
Ivy Zhang	ad19ca3cbf	remove benchmark test list (#3644 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-04-17 16:23:41 +08:00
Netanel Haber	3c52ac098f	feat: allocate minimal blocks per window size (#3028 ) * implement variable window attention by breaking the block manager into window block managers per window size Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * revert isCyclic to be true if the min attention window is reached, not per window size Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * add explanatory comment to mCyclicThreshold Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * load correct gemma config Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * don't shadow inputLength in addSequence - it should remain the function scope input length between window size loop iterations Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix KVCacheManagerVariableWindowAttentionWithReuseTest for multiple window block managers Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * if TYPE_CHECKING Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * set temp_attention_window_inputs to None explicitly Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * set temp_attention_window_inputs to None explicitly Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * pass dtype as well Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * test_gemma variable sliding window attention Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * allot a fraction of primary/secondaryBlocks to different window size heaps, depending on the window size's total contribution to the kvcache size (i.e., including all layers) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * remove \|\| mEnableBlockReuse which erroneously triggers beamsearch code for cyclic variable attention 
window code Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * turn off request delaying for MaxUtil Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * make comments better Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * windowSizesTotalSum using std::accumulate Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix error handling of forwardAsync - forwardAsync catch-all catch cleanup code that runs terminateRequest can also fail and must be caught Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix comments Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * remove assert that kills disagg tests, since it isn't necessary Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix corrupted expression: 'isNewTask && (peftCacheManager ?' -> '(isNewTask && peftCacheManager) ?' which caused boolean algebra. 
Main is correct Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * add Gemma3 to SUPPORTED_HF_ARCHITECTURES Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * support Gemma3 Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * finally fix test_gemma - always spread at least {} into generate_summary_cmd, never None Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * finally fix test_gemma - always spread at least {} into generate_summary_cmd, never None Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix kvfactor field for deepseek Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix comment Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix gemma-3 entries in testlist to include vswa Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * only quantize gemma2 VSWA Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> remove misleading comment Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> fix test_gemma Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix test_gemma Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix test_gemma Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * in sendRequestInfo, fromOldAllocatedBlockIds->fromOldAllocatedBlockIds, like in main Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix: disable KV cache reuse if using attention sink (#3021) * fix: disable KV cache reuse if using attention sink Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * fix: disable KV cache reuse if sink bubble Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * add comment Signed-off-by: Robin Kobus 
<19427718+Funatiq@users.noreply.github.com> --------- Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> --------- Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-04-17 16:04:57 +08:00
Yiqing Yan	1c6f3debbb	Waive L0 tests (#3651 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-04-17 15:13:56 +08:00
xinhe-nv	b82a4e8d01	test: [CI] Add failed cases into waives.txt (#3627 ) * update waive list Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> * fix waives Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> --------- Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-04-17 14:45:41 +08:00
Ivy Zhang	b2fb0fe843	test: add quickstart test for nemotron-ultra (#3596 ) * add quickstart test for nemotron-ultra Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix test name Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> --------- Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-04-17 11:16:41 +08:00
ruodil	5e2ebebe76	tests: change qa perf test to trtllm-bench (#3189 ) Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-04-17 09:53:32 +08:00
QI JUN	ab29348db2	waive test_llm_phi_quantization_1gpu (#3603 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-16 13:33:46 +08:00
Daniel Cámpora	41ce5440fe	chore: Mass integration of release/0.18 (#3421 ) * [Infra][TRTLLM-4063] - Branch out for the TRT-LLM v0.18.0 release Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com> (cherry picked from commit de90312020e51c22ba5e75b3502c7ee90c059265) * [Infra][TRTLLM-3652] - Update dependencies to TRT 10.9 / CUDA 12.8.1 / DLFW 25.03(Internal) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> (cherry picked from commit 58db1340ef7db22f1910f878d220a92be5b830d1) * [None][Doc] - Update docs for v0.18.0 Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> (cherry picked from commit d23e75bc95619ce3b116213d55319272888e0c88) * [Infra] - Fix or WAR issues in the package sanity check stages Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> (cherry picked from commit e874e2b127515c52ba10c8df1cc2631627f74ffe) * [https://nvbugs/5173454] [https://nvbugs/5173432] [https://nvbugs/5175863] fix chatglm tokenizer and tmp model path Signed-off-by: Yuki Huang <yukih@nvidia.com> (cherry picked from commit 731811d4e182d70a66193d646152cb71dfafe83a) * cherry-pick 'test: Updat cluster and multi node test lists and trtllm-bench' test to fix perf drop issue Signed-off-by: Ruodi Lu <ruodil@nvidia.com> (cherry picked from commit 5214616283fbc15ae98871a1d84c78d8e1f2e6e8) * Revert "Merge branch 'user/yukih/fix_5173454_5173432' into 'release/0.18'" Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> (cherry picked from commit 8d34831cb2b81ee2dfa8021b68e7158b33789a5f) * [Infra]Restrict setuptools version to avoid sasb pip install issue Signed-off-by: Emma Qiao <qqiao@nvidia.com> (cherry picked from commit 1e60ad29e0dafec0e295bedb5d89b716a02a707c) * [https://nvbugs/5173454] [https://nvbugs/5173432] [https://nvbugs/5175863] fix chatglm tokenizer and tmp model path Signed-off-by: Yuki Huang <yukih@nvidia.com> (cherry picked from commit 3ed8164e5bfea1d5aa2039b5408439fd6cf59dac) * WAR for bug 5173448 Signed-off-by: Thor Johnsen <tjohnsen@nvidia.com> (cherry picked from commit 
b6528b2ba15322b6c6a4c81a8b74c04d4973de4f) * [Infra][TRTLLM-3652] - Update dependencies to CUDA 12.8.1 / DLFW 25.03 Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> (cherry picked from commit 6560983d132d9d257ee15849664eb055e94adaa9) * [Docs] - Doc changes for v0.18.0 Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> (cherry picked from commit 26769b61218a947c8f9d070f73b63d576fcc20c4) * [Doc] - Doc change for v0.18.0 Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> (cherry picked from commit 4b3b5ed6bfbc2300e3775fe75456083faad7b235) * [Infra] update version to 0.18.1 Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com> (cherry picked from commit 59e8326c75639275837d34de8e140358737a3365) * Add back nemotron file. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Fix recurrentgemma reqs. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Adding WAR for bug 5173448. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Formatting. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Remove duplicated file. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Update examples/prompt_lookup/requirements.txt Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com> Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com> * Remove glm-4-9b from model dir in chatglm test. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Remove indent change. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com> * Revert changes on l0_test.groovy. 
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Update dev images Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> * Remove duplicated import. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Fix custom op Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> * Fix flashinfer & vanilla backend Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> * Skip problematic case. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> * Skip problematic test_moe_w4a8_1_14336_4096_8_bfloat16_True_False case. Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> --------- Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com> Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> Co-authored-by: Zhanrui Sun <zhanruis@nvidia.com> Co-authored-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Co-authored-by: Yuki Huang <yukih@nvidia.com> Co-authored-by: Ruodi Lu <ruodil@nvidia.com> Co-authored-by: Emma Qiao <qqiao@nvidia.com> Co-authored-by: Thor Johnsen <tjohnsen@nvidia.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com> Co-authored-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>	2025-04-16 10:03:29 +08:00
xiweny	da47d5f27e	fix: nvbugs/5075538: fix cross attention mask when decoder input len > 1 (#3585 ) * fix: nvbugs/5075538: fix cross attention mask when decoder input len > 1 Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com> * remove waiver Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com> --------- Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-04-16 08:31:33 +08:00

131 Commits