TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-01 08:41:13 +08:00

Author	SHA1	Message	Date
Xiwen Yu	38ef850552	Merge remote-tracking branch 'gitlab/main' into user/xiweny/merge_0901 Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-09-01 11:46:44 +08:00
Bo Deng	3805f615da	[https://nvbugs/5453949 ][infra] unwaive test_llama_eagle3 Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-08-31 18:29:39 -07:00
Jiagan Cheng	8d5a7ea5b3	[https://nvbugs/5443053 ][fix] Disable finalize fusion when Lora is used Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>	2025-08-31 18:28:09 -07:00
Tian Zheng	e257cb3533	[None][feat] Support NVFP4 KV Cache (#6244 ) Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>	2025-09-01 09:24:52 +08:00
xinhe-nv	5f939b9121	[None][chore] Add failed cases into waives.txt (#7342 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-30 00:49:14 -04:00
Emma Qiao	15ec2b855d	[None][infra] Waive failed tests on main branch 08/29 (#7370 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-29 10:28:20 -04:00
Pengbo Wang @ NVIDIA	62459d533d	[None][chore] Update pre-merge test to add DeepSeek/LLaMA and gpt-oss (#7192 ) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com> Signed-off-by: Pengbo Wang @ NVIDIA <221450789+pengbowang-nv@users.noreply.github.com> Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>	2025-08-29 17:03:46 +08:00
fredricz-20070104	091b67ad2f	[TRTLLM-7280][test] Add beam search CudaGraph + Overlap Scheduler tests (#7326 ) Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>	2025-08-29 02:16:22 -04:00
Yiqing Yan	3c06303542	[TRTLLM-7755][infra] Add DGX_B300 and GB300 tests in CI Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-08-28 22:45:00 -07:00
Chang Liu	31b0f0fb0c	[https://nvbugs/5445466 ][fix] Eliminate race when loading HF dynamic modules (#7268 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-08-29 12:36:30 +08:00
Richard Huo	ce580ce4f5	[None][feat] KV Cache Connector API (#7228 ) Signed-off-by: jthomson04 <jwillthomson19@gmail.com> Signed-off-by: richardhuo-nv <rihuo@nvidia.com> Co-authored-by: jthomson04 <jwillthomson19@gmail.com> Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com> Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>	2025-08-28 23:09:27 -04:00
aalanwyr	085dc19bfa	[TRTLLM-6646][test] NIM migration to TRT-LLM LLMAPI : Add QWQ-32b torch test (#7284 ) Signed-off-by: Yaran Wu <28771492+aalanwyr@users.noreply.github.com>	2025-08-28 23:09:11 -04:00
Yuan Tong	ccb800f909	[TRTLLM-7457][ci] Update unittest parallel config (#7297 ) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>	2025-08-29 09:28:04 +08:00
Emma Qiao	1e644fa28a	[None][infra] Waive failed tests on main branch 08/26 (#7346 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-29 00:24:08 +08:00
Neta Zmora	08f935681d	[https://nvbugs/5474453 ][fix] fix path to tested model (#7272 ) Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>	2025-08-28 08:01:48 -04:00
Zongfei Jing	53163bf1df	[TRTLLM-6876][feat] Add low precision all2all for mnnvl (#7155 ) Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>	2025-08-28 18:26:16 +08:00
QI JUN	ae89163368	[None][ci] skip TestGPTOSS (#7333 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-28 05:01:49 -04:00
William Zhang	4541655e5f	[https://nvbugs/5430124 ][ci] Unwaive Mistral 3.1 Small tests (#7274 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-08-28 00:03:32 -04:00
QI JUN	39c9ffda5a	[None][ci] fix test list name (#7321 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-27 22:33:22 -04:00
Pengyun Lin	c1e7fb9042	[TRTLLM-7207][feat] Chat completions API for gpt-oss (#7261 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-08-28 10:22:06 +08:00
bhsueh_NV	9d345b31c0	[https://nvbugs/5453727 ][fix] unwaive qwen3 CI tests (#7293 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-08-27 22:58:59 +08:00
Eran Geva	462169bfc9	[https://nvbugs/5458798 ][fix] AD perf test outliers handling, tightened threshold, re-enabled in CI, fixed mem threshold (#7189 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2025-08-27 07:57:46 -07:00
QI JUN	d09add5ede	[None][ci] parallelize unit tests of auto deploy in B200 (#7291 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-27 22:32:11 +08:00
Emma Qiao	8dc62ffac4	[None][infra] Waive failed tests on main (#7300 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-27 09:53:33 -04:00
xinhe-nv	f082e4857c	[TRTLLM-7250][fix] waive failed cases (#7292 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-27 18:04:46 +08:00
nvamyt	dbd4f21687	[None][fix] Update maxnt of llama_v3.2_1b bench (#7279 ) Signed-off-by: nvamyt <amyt@nvidia.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-27 16:56:28 +08:00
bhsueh_NV	f167b1fd99	[https://nvbugs/5453727 ][fix] Fix bug of how GPT-OSS setup the parameters in CI (#7151 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-08-27 15:26:10 +08:00
QI JUN	e08c7cf17b	[None][ci] remove test_llm_api_autodeploy from B200 test db (#7282 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-27 03:12:30 -04:00
dongxuy04	abdb2735be	[None][fix] Fix possible hang issue in WideEP and move some tests to pre-merge (#7262 ) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-08-27 01:39:24 -04:00
Yuan Tong	6c7813e821	[TRTLLM-7457][ci] Update & cleanup unittest parallel config (#7254 ) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-08-27 00:45:58 -04:00
Zhenhuan Chen	d0d8903a7f	[TRTLLM-6960][fix] replace flasky scaled_mm test with more stable config (#7089 ) Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>	2025-08-26 20:58:33 -07:00
Shunkangz	ff4047414b	[None][opt] Balance the request based on number of tokens in AttentionDP (#7183 ) Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-08-27 11:16:12 +08:00
Zhou Yuxin	ccb6aadea8	[https://nvbugs/5412456 ][fix] Remove from waives.txt (#7248 ) Signed-off-by: Zhou Yuxin <yuxinz@nvidia.com>	2025-08-27 10:05:53 +08:00
Jin Li	028235404b	[TRTLLM-6633][feat] Padding for piecewise cudagraph (#6750 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-08-26 18:31:33 -04:00
Fridah-nv	0f947c64cb	[None][doc] Update autodeploy README.md, deprecate lm_eval in examples folder (#7233 ) Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>	2025-08-26 10:47:57 -07:00
Void	040f4c70d3	[None][perf] Accelerate global scale calculations for deepEP fp4 combine (#7126 ) Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>	2025-08-27 00:13:13 +08:00
QI JUN	baef70e67e	[None][ci] move qwen3 tests from b200 to gb200 (#7257 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-26 11:50:53 -04:00
xinhe-nv	80043affb5	[None][chore] Add failed cases into waives.txt (#7251 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-26 17:13:44 +08:00
amitz-nv	23ed0c892d	[https://nvbugs/5477332 ][fix] Relax atol in test_mamba2_chunk_scan_combined_prefill_chunking (#7215 ) Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>	2025-08-26 10:48:58 +03:00
Zheng Duan	cf50ba2980	[TRTLLM-6549][feat] add perf metrics endpoint to openai server and openai disagg server (#6985 ) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-08-26 15:34:44 +08:00
Zheng Duan	1a929a1490	[https://nvbugs/5457504 ][fix] fix kv cache event test in disaggregated worker tests (#7028 ) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-08-26 14:25:10 +08:00
nvamyt	d8bd8843fc	[None][test] Update qwen3 timeout to 60 minutes (#7200 ) Signed-off-by: nvamyt <amyt@nvidia.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-26 14:18:42 +08:00
qixiang-99	b165f8bc97	fix/improve kvcache allocation in PyTorch runtime (#5933 ) Signed-off-by: qixiang-99 <203170375+qixiang-99@users.noreply.github.com>	2025-08-26 12:40:22 +08:00
William Zhang	92576488d3	[None][feat] Skip prefetching consolidated safetensors when appropriate (#7013 ) * Why? Some models (e.g. anything produced by Mistral) can have both sharded safetensors and a consolidated safetensor in the same checkpoint directory. In such cases, prefetching both to memory is a waste of time, and memory. * What? This commit skips over consolidated safetensors when they are not the only safetensor file present in the checkpoint directory Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-08-25 23:56:21 -04:00
Leslie Fang	20922b7d1f	[None][chore] Create PyExecutor from TorchLlmArgs Part 1 (#7105 ) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-08-26 10:42:01 +08:00
Xiwen Yu	ab7febd4d8	Merge commit '31979aefacbf80d2742c98ef30385db162788c84' into feat/b300_cu13 Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>	2025-08-26 10:31:35 +08:00
ruodil	b845eb7a3a	[None][test] add kv cache size in bench metric and fix failed cases (#7160 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-26 10:10:02 +08:00
Grzegorz Kwasniewski	2101d46d68	[TRTLLM-6342][feat] TP Sharding read from the model config (#6972 ) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com> Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>	2025-08-25 15:41:27 -07:00
chenfeiz0326	6a44e5b9d1	[https://nvbugs/5440241 ][fix] Fix 70B GSM8K Accuracy drop (#6967 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-08-25 22:09:30 +08:00
Emma Qiao	200db3b809	[None][infra] Waive failed tests on main branch (#7201 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-25 09:04:37 -04:00

1363 Commits