TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
QI JUN	257abfbc51	move pytorch tests of LLM API into separate test files (#3745 ) * move pytorch tests of LLM API into separate test files Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * polish Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * update Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * clean Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> --------- Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-22 14:36:59 -07:00
Lucas Liebenwein	06b914e0f9	feat: [AutoDeploy] generalizing cudagraph to multiple dynamic inputs (#3589 ) * generalizing cudagraph to multiple dynamic inputs Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> * fix for failing test Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> --------- Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-04-23 03:38:51 +08:00
Emma Qiao	442386d302	infra: Add test stages for sm120 (#3533 ) * Add test stages for sm120 Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Update chip name and config name Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Split tests to gb202 and gb203 Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Don't flash driver for rtx-5090 Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Skip the failed cases Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Change the test stage names Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Reduce 5080 jobs and add back gpu list which doesn't support dynamic driver flashing Signed-off-by: qqiao <qqiao@nvidia.com> * Skip failed case on gb202 Signed-off-by: qqiao <qqiao@nvidia.com> * Fix condition to dynamic driver flashing Signed-off-by: qqiao <qqiao@nvidia.com> --------- Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> Signed-off-by: qqiao <qqiao@nvidia.com>	2025-04-23 01:26:12 +08:00
Yukun He	0ae7017342	Unify two versions of AllReduce custom op (#3032 ) * Rewrite unit test for unified allreduce op. Removing the legacy unit test. * Revise formats, fusion_op bindings. Put all tensors as optional inputs. * Move the MoeAllreduceOp to a separate custom op. * Move all the fusion patterns to the new version of the AllReduce fusion kernel. Remove the AllReduce strategy config. Revise the AllReduce strategies and fusion pattern definitions. * Add more TODOs, fixing minor bugs, and remove legacy code. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-04-22 21:58:42 +08:00
Ivy Zhang	47d2f16bb8	waive gemma on L20 (#3767) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-04-22 17:52:49 +08:00
ruodil	9223000765	waive failed case in perf test, change default max_batch_size to 512 and write config.json to output log (#3657 ) Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> Signed-off-by: Larry <197874197+LarryXFly@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-04-22 14:51:45 +08:00
xinhe-nv	ba216341f4	update waive list (#3683) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-04-22 11:09:41 +08:00
Yi Zhang	98966cb45e	test: Unwaive Llama 3.1 with torch compile test (#3475 ) * Fix log info Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> * Revert "test: Waive torch compile tests (#3471)" This reverts commit `410f56357e`. Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> * Update test_llm_api_pytorch.py Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> --------- Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-04-22 10:41:56 +08:00
Enwei Zhu	3fa19ffa4e	test [TRTLLM-4477,TRTLLM-4481]: Accuracy test improvement (Part 3.5): Support GSM8K and GPQA (#3483 ) * add gsm8k Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix gsm8k Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * add gpqa Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * conditional import lm_eval Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * gpqa in lm_eval Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * system prompt Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * shuffle Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * update AA prompt and regex Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * revert AA prompt and regex Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * integration to tests Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * add DS-R1 Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix and clean Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * update tests Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * update Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * clean up Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * free_gpu_memory_fraction=0.8 Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> * fix Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> --------- Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-04-22 07:38:16 +08:00
Yan Chunwei	231b39015c	unwaive multi_node test (#3715) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-04-21 21:26:07 +08:00
Barry Kang	d87b009d8d	Fix ModelOpt Mixtral AWQ OOM (#3714) Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>	2025-04-21 19:14:14 +08:00
Zheng Duan	ae48abefc1	bind block key and hasher (#3712)	2025-04-21 18:50:57 +08:00
Iman Tabrizian	af04b6f6aa	bug: Fix hang bug when context server doesn't have enough capacity for KV Cache (#3095 ) * Fix hang bug when KV cache is low Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> * Review comments Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> * Fix attentiondp typo Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> * Add CI test for this case Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> * fix: Fix the insertion order for responder futures Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> * fix: Fix disagg CPP Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> --------- Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-04-21 15:16:55 +08:00
Stanley Sun	852dd0c1be	test: add llama3.2 ptp test case (#3363 ) * add llama3.2 ptp test case Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> * update test list Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com> --------- Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-04-21 15:15:45 +08:00
Zhenhuan Chen	2672f13d77	test: fix cublas_scaled_mm with aligned workspace size (#3600) Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>	2025-04-21 14:51:42 +08:00
yuxianq	faef37782a	fix: Remove ParallelConfig. (#3678) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-04-21 14:14:08 +08:00
liji-nv	a51f7559a3	fix: update test_user_buffers_mm_add_prologue atol (#3711) (#3713) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-04-21 11:24:20 +08:00
Yiqing Yan	6f7f262779	Waive L0 tests (#3709 ) * Waive L0 tests Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> * the test is fixed in PR 3711 Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> --------- Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-04-21 11:24:00 +08:00
hlu1	31624b079a	feat: [Deepseek] Add trtllm-gen MOE FP4 MOE backend (#3387 ) * Add TRT-LLM Gen MOE to Deepseek fix fused moe rebase bug. Fix atol in test_fp4_gemm_quantize.py fix fused moe rebase bug. Fix FusedMoe. Disable 2nd routing kernel preexit Bump routing reduction to fp32 Disable PDL for fc1 [DEBUG] Lift token limit to 16k [Bugfix] Token limit to 16k + fp32 routing + tanh Make fp8 tileN 8 Fix FP8 MoE + Remove redundent temp output for FP4 [FP8-only] Avoid wasting CTAs for activation kernel fix: unblock FP8 weightloading with trtllm-gen Remove max_token limit for trtllm-gen path perf: avoid type-conversion and fill_ from aten Minor fix Signed-off-by: Hao Lu <haolu@nvidia.com> * Fix rebase issues Signed-off-by: Hao Lu <haolu@nvidia.com> * Fix compile issue Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> * CI clean Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> --------- Signed-off-by: Hao Lu <haolu@nvidia.com> Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> Co-authored-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>	2025-04-21 10:01:33 +08:00
Emma Qiao	48db263d9a	infra: Add test list name check (#3097 ) * Add steps to check test names Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Correct test-db command Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Switch to use a trt-llm image Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Update go path Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Correct go path Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Move the test list check to test ci Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Correct file path Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix path again Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix get path Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix typo Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Skip test list check for ARM Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix expression Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Change back unrelated file Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Correct qa test names Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Remove a stage Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Update jenkins/L0_Test.groovy Co-authored-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Emma Qiao <qqiao@nvidia.com> Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Move some steps to a python script Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix script path Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Split commands and debug Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix typo Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix typo Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Also correct case name in waives list Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Move check script to another folder Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Update qa list after rebase Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Fix rebase Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Remove the perf tests under QA 
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Some tests already fixed after rebase to TOT Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> --------- Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Emma Qiao <qqiao@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-04-20 23:02:16 +08:00
QI JUN	d51ae53940	move the reset models into `examples/models/core` directory (#3555 ) * move rest models to examples/models/core directory Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * update multimodal readme Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix example path Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix cpp test Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix tensorrt test Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: 
junq <22017000+QiJune@users.noreply.github.com> * fix ci Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> --------- Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-19 20:48:59 -07:00
brb-nv	c35d2a7532	test: Get Eagle tests working (#3593) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-04-20 00:50:57 +08:00
nv-guomingz	e70961f541	test:update waives.txt for nvbug 5219532 (#3672) Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com> Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>	2025-04-19 18:57:39 +08:00
Iman Tabrizian	61ee983488	fix: Fix disaggregated load balance test (#3689) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-04-19 10:40:40 +08:00
hlu1	c861b6cf17	Clean up modeling_deepseek.py (#3640) Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>	2025-04-18 17:54:33 -07:00
Iman Tabrizian	a2f190f306	chore: Waive disaggregated load balance (#3687) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-04-18 16:04:33 -07:00
Yechan Kim	5460d18b10	feat: trtllm-serve multimodal support (#3590 ) * feat: trtllm-serve multimodal support Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * remove disable argument Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * remove disable Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * add and separate tests and move the doc Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> * remove block_resue arg from serve.py Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> --------- Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>	2025-04-19 05:01:28 +08:00
pcastonguay	ae5671644a	feat: Disaggregated router class (#3584 ) * Add draft scheduler class Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> * Refactor the design Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> * feat: Introduce router class for disaggregated server Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Add unit tests for router class Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Adding tests for disagg_utils Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Fixing missing import Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Fixing disagg integration tests Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> * Addressing MR review comments Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> --------- Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-04-19 00:34:12 +08:00
QI JUN	b9fce42717	enable test_ptp_quickstart_advanced_mixed_precision (#3667) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-18 05:06:24 -07:00
Zheng Duan	bce7ea8c38	test: add kv cache event tests for disagg workers (#3602)	2025-04-18 18:30:19 +08:00
Yan Chunwei	2a09826ec4	fix hmac in remote mpi session (#3649) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>	2025-04-18 17:47:51 +08:00
HuiGao-NV	d3608d6818	Remove dummy forward path (#3669)	2025-04-18 16:17:50 +08:00
Dom Brown	dbd9a83b0d	feat: Integrate GPUDirect Storage (GDS) into Executor API (#3582) * feat: Integrate GPUDirect Storage (GDS) into Executor API Squash of several dev commits Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>	2025-04-18 15:59:21 +08:00
Erin	4fedf0be5c	unwaive test for nvbug_5150466 (#3552) Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>	2025-04-18 15:15:58 +08:00
Emma Qiao	2f48985b9c	infra: Add step to generate new duration file (#3298 ) * Add step to generate new duration file Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Install python in earlier step Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Clone repo and add debug info Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Remove debug info and only generate duration for post-merge Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Test for the new duration file Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Update the duration file format Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> * Move generate_duration.py to scripts folder and add try-catch avoiding any broken Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com> --------- Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>	2025-04-18 12:56:31 +08:00
peaceh-nv	88cff61fa1	chore: Split more tests out of gpt tests (#3524) Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>	2025-04-18 12:04:57 +08:00
dongfengy	b71a0f76b4	test: Add llama 4 to ci (#3520 ) * Add llama 4 to ci Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com> * Only test trtllm Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com> * Disable marverick Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com> --------- Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>	2025-04-18 11:25:52 +08:00
Iman Tabrizian	fc88d67675	chore: Refactor test_disaggregated.py (#3154 ) * Refactor test_disaggregated.py Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> * Address review comments Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> * Remove waived tests Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> * fix: Fix streaming endpoint chat completions Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com> --------- Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com> Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-04-18 11:04:06 +08:00
rakib-hasan	ff3b741045	feat: adding multimodal (only image for now) support in trtllm-bench (#3490 ) * feat: adding multimodal (only image for now) support in trtllm-bench Signed-off-by: Rakib Hasan <rhasan@nvidia.com> * fix: add in load_dataset() calls to maintain the v2.19.2 behavior Signed-off-by: Rakib Hasan <rhasan@nvidia.com> * re-adding prompt_token_ids and using that for prompt_len Signed-off-by: Rakib Hasan <rhasan@nvidia.com> * updating the datasets version in examples as well Signed-off-by: Rakib Hasan <rhasan@nvidia.com> * api changes are not needed Signed-off-by: Rakib Hasan <rhasan@nvidia.com> * moving datasets requirement and removing a missed api change Signed-off-by: Rakib Hasan <rhasan@nvidia.com> * addressing review comments Signed-off-by: Rakib Hasan <rhasan@nvidia.com> * refactoring the quickstart example Signed-off-by: Rakib Hasan <rhasan@nvidia.com> --------- Signed-off-by: Rakib Hasan <rhasan@nvidia.com>	2025-04-18 07:06:16 +08:00
QI JUN	91660939fd	tests: waive test_llm_multi_node (#3664) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-18 01:59:16 +08:00
Ivy Zhang	ad19ca3cbf	remove benchmark test list (#3644) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-04-17 16:23:41 +08:00
Netanel Haber	3c52ac098f	feat: allocate minimal blocks per window size (#3028 ) * implement variable window attention by breaking the block manager into window block managers per window size Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * revert isCyclic to be true if the min attention window is reached, not per window size Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * add explanatory comment to mCyclicThreshold Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * load correct gemma config Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * don't shadow inputLength in addSequence - it should remain the function scope input length between window size loop iterations Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix KVCacheManagerVariableWindowAttentionWithReuseTest for multiple window block managers Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * if TYPE_CHECKING Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * set temp_attention_window_inputs to None explicitly Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * set temp_attention_window_inputs to None explicitly Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * pass dtype as well Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * test_gemma variable sliding window attention Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * allot a fraction of primary/secondaryBlocks to different window size heaps, depending on the window size's total contribution to the kvcache size (i.e., including all layers) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * remove \|\| mEnableBlockReuse which erroneously triggers beamsearch code for cyclic variable attention 
window code Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * turn off request delaying for MaxUtil Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * make comments better Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * windowSizesTotalSum using std::accumulate Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix error handling of forwardAsync - forwardAsync catch-all catch cleanup code that runs terminateRequest can also fail and must be caught Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix comments Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * remove assert that kills disagg tests, since it isn't necessary Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix corrupted expression: 'isNewTask && (peftCacheManager ?' -> '(isNewTask && peftCacheManager) ?' which caused boolean algebra. 
Main is correct Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * add Gemma3 to SUPPORTED_HF_ARCHITECTURES Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * support Gemma3 Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * finally fix test_gemma - always spread at least {} into generate_summary_cmd, never None Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * finally fix test_gemma - always spread at least {} into generate_summary_cmd, never None Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix kvfactor field for deepseek Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix comment Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix gemma-3 entries in testlist to include vswa Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * only quantize gemma2 VSWA Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> remove misleading comment Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> fix test_gemma Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix test_gemma Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix test_gemma Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * in sendRequestInfo, fromOldAllocatedBlockIds->fromOldAllocatedBlockIds, like in main Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> * fix: disable KV cache reuse if using attention sink (#3021) * fix: disable KV cache reuse if using attention sink Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * fix: disable KV cache reuse if sink bubble Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> * add comment Signed-off-by: Robin Kobus 
<19427718+Funatiq@users.noreply.github.com> --------- Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> --------- Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com> Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-04-17 16:04:57 +08:00
Yiqing Yan	1c6f3debbb	Waive L0 tests (#3651) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-04-17 15:13:56 +08:00
xinhe-nv	b82a4e8d01	test: [CI] Add failed cases into waives.txt (#3627 ) * update waive list Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> * fix waives Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> --------- Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-04-17 14:45:41 +08:00
danielafrimi	0f084d9566	added loraOp into lora layer + test for mlp and comparison to lora plugin (#3455 ) Loraop integration into torch modules Signed-off-by: Ubuntu <dafrimi@nvidia.com>	2025-04-17 12:48:27 +08:00
Ivy Zhang	b2fb0fe843	test: add quickstart test for nemotron-ultra (#3596 ) * add quickstart test for nemotron-ultra Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> * fix test name Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> --------- Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-04-17 11:16:41 +08:00
ruodil	5e2ebebe76	tests: change qa perf test to trtllm-bench (#3189) Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-04-17 09:53:32 +08:00
Chuang Zhu	f4ddc304f2	disable ib for ucx test (#3613) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2025-04-17 06:43:57 +08:00
QI JUN	57cafe7f9b	waive test_fp8_scaled_mm (#3637) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-04-16 15:07:30 -07:00
Luis Vega	0bda1f9780	feat: Nemotron-H model support (#3430 ) * added files for nemotron-h Signed-off-by: Luis Vega <lvega@nvidia.com> * use try/except to import RMSNorm Signed-off-by: Luis Vega <lvega@nvidia.com> --------- Signed-off-by: Luis Vega <lvega@nvidia.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-04-16 14:05:56 -07:00

283 Commits