TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
QI JUN	39c9ffda5a	[None][ci] fix test list name (#7321 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-27 22:33:22 -04:00
Pengyun Lin	c1e7fb9042	[TRTLLM-7207][feat] Chat completions API for gpt-oss (#7261 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2025-08-28 10:22:06 +08:00
Venky	f30768e70d	[TRTLLM-6822][infra] Add PR-Checklist github action and modify PR template (#6029 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-08-27 18:45:23 -07:00
Kaiyu Xie	8a619be828	[None] [chore] Make disagg example compatible with recommended usage (#7121 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-08-27 23:57:46 +08:00
Martin Marciniszyn Mehringer	7cfa475e05	[None][fix] Remove the wheel from intermediate docker storage (#7175 ) Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>	2025-08-27 11:32:17 -04:00
bhsueh_NV	9d345b31c0	[https://nvbugs/5453727 ][fix] unwaive qwen3 CI tests (#7293 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-08-27 22:58:59 +08:00
Eran Geva	462169bfc9	[https://nvbugs/5458798 ][fix] AD perf test outliers handling, tightened threshold, re-enabled in CI, fixed mem threshold (#7189 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2025-08-27 07:57:46 -07:00
QI JUN	d09add5ede	[None][ci] parallelize unit tests of auto deploy in B200 (#7291 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-27 22:32:11 +08:00
Emma Qiao	8dc62ffac4	[None][infra] Waive failed tests on main (#7300 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-27 09:53:33 -04:00
xinhe-nv	f082e4857c	[TRTLLM-7250][fix] waive failed cases (#7292 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-08-27 18:04:46 +08:00
Mike Iovine	8b216135f0	[None][refactor] Move draft token padding out of Drafter (#7134 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-08-27 11:07:50 +02:00
nvamyt	dbd4f21687	[None][fix] Update maxnt of llama_v3.2_1b bench (#7279 ) Signed-off-by: nvamyt <amyt@nvidia.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-27 16:56:28 +08:00
bhsueh_NV	f167b1fd99	[https://nvbugs/5453727 ][fix] Fix bug of how GPT-OSS setup the parameters in CI (#7151 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-08-27 15:26:10 +08:00
QI JUN	e08c7cf17b	[None][ci] remove test_llm_api_autodeploy from B200 test db (#7282 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-27 03:12:30 -04:00
dongxuy04	abdb2735be	[None][fix] Fix possible hang issue in WideEP and move some tests to pre-merge (#7262 ) Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>	2025-08-27 01:39:24 -04:00
Yukun He	bed5bc9f2e	[None][chore] Wrap the swiglu into custom op to avoid redundant device copy. (#7021 ) A redundant D2D copy is observed when enabling torch.compile for the Llama model due to the swiglu triton kernel, which brings perf overhead. Use a custom op to wrap the swiglu op to avoid this overhead. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-08-27 13:02:10 +08:00
Raayan Dhar	82bd1871ea	[None][chore] update disagg readme and scripts for pipeline parallelism (#6875 ) Signed-off-by: raayandhar <rdhar@nvidia.com>	2025-08-27 00:53:57 -04:00
Yuan Tong	6c7813e821	[TRTLLM-7457][ci] Update & cleanup unittest parallel config (#7254 ) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>	2025-08-27 00:45:58 -04:00
Iman Tabrizian	bc84758626	[None][feat] Add logging for OAI disagg server (#7232 )	2025-08-26 21:02:03 -07:00
Zhenhuan Chen	d0d8903a7f	[TRTLLM-6960][fix] replace flasky scaled_mm test with more stable config (#7089 ) Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>	2025-08-26 20:58:33 -07:00
Shunkangz	ff4047414b	[None][opt] Balance the request based on number of tokens in AttentionDP (#7183 ) Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co> Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-08-27 11:16:12 +08:00
Fanrong Li	e12868bc00	[None][fix] Remove and fuse some element-wise ops in the ds-r1-fp8 model (#7238 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-08-27 10:35:38 +08:00
Zhou Yuxin	ccb6aadea8	[https://nvbugs/5412456 ][fix] Remove from waives.txt (#7248 ) Signed-off-by: Zhou Yuxin <yuxinz@nvidia.com>	2025-08-27 10:05:53 +08:00
Jin Li	028235404b	[TRTLLM-6633][feat] Padding for piecewise cudagraph (#6750 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-08-26 18:31:33 -04:00
Iman Tabrizian	87d1d3ab06	[None][update] Update disagg code owners (#7266 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-08-26 14:36:29 -04:00
Fridah-nv	0f947c64cb	[None][doc] Update autodeploy README.md, deprecate lm_eval in examples folder (#7233 ) Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>	2025-08-26 10:47:57 -07:00
Frank	78ecfbb4a4	[None][fix] Fix data type of KV Cache percentage in bench. (#7230 ) Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>	2025-08-26 12:28:09 -04:00
Void	040f4c70d3	[None][perf] Accelerate global scale calculations for deepEP fp4 combine (#7126 ) Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>	2025-08-27 00:13:13 +08:00
QI JUN	baef70e67e	[None][ci] move qwen3 tests from b200 to gb200 (#7257 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-26 11:50:53 -04:00
Maurits de Groot	2d0c9b383f	[None][fix] Updated blog9_Deploying_GPT_OSS_on_TRTLLM (#7260 ) Signed-off-by: Maurits de Groot <63357890+Maurits-de-Groot@users.noreply.github.com>	2025-08-26 11:26:19 -04:00
xinhe-nv	80043affb5	[None][chore] Add failed cases into waives.txt (#7251 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-26 17:13:44 +08:00
Emma Qiao	a142c0c4de	[None][infra] Add retry 3 times if ssh cluster failed (#6859 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Emma Qiao <qqiao@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-08-26 05:11:50 -04:00
Zhou Yuxin	f01101f687	[None][feat] Hopper Fp8 context mla (#7116 ) Signed-off-by: Yuxin <yuxinz@nvidia.com>	2025-08-26 17:10:20 +08:00
amitz-nv	23ed0c892d	[https://nvbugs/5477332 ][fix] Relax atol in test_mamba2_chunk_scan_combined_prefill_chunking (#7215 ) Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>	2025-08-26 10:48:58 +03:00
Guoming Zhang	bf377d0b8e	[None][doc] Display tech blog for nvidia.github.io domain. (#7241 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-08-26 15:36:28 +08:00
Zheng Duan	cf50ba2980	[TRTLLM-6549][feat] add perf metrics endpoint to openai server and openai disagg server (#6985 ) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-08-26 15:34:44 +08:00
Zheng Duan	1a929a1490	[https://nvbugs/5457504 ][fix] fix kv cache event test in disaggregated worker tests (#7028 ) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-08-26 14:25:10 +08:00
nvamyt	d8bd8843fc	[None][test] Update qwen3 timeout to 60 minutes (#7200 ) Signed-off-by: nvamyt <amyt@nvidia.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-26 14:18:42 +08:00
yuanjingx87	bbc1478627	[None][chore] Update CI allowlist 2025-08-25 (#7229 ) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>	2025-08-25 22:53:48 -07:00
qixiang-99	b165f8bc97	fix/improve kvcache allocation in PyTorch runtime (#5933 ) Signed-off-by: qixiang-99 <203170375+qixiang-99@users.noreply.github.com>	2025-08-26 12:40:22 +08:00
William Zhang	92576488d3	[None][feat] Skip prefetching consolidated safetensors when appropriate (#7013 ) * Why? Some models (e.g. anything produced by Mistral) can have both sharded safetensors and a consolidated safetensor in the same checkpoint directory. In such cases, prefetching both to memory is a waste of time, and memory. * What? This commit skips over consolidated safetensors when they are not the only safetensor file present in the checkpoint directory Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-08-25 23:56:21 -04:00
Zheng Duan	4f84a45899	[https://nvbugs/5452463 ][doc] update disagg doc about UCX_MAX_RNDV_RAILS (#7205 ) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-08-25 22:42:42 -04:00
Leslie Fang	20922b7d1f	[None][chore] Create PyExecutor from TorchLlmArgs Part 1 (#7105 ) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-08-26 10:42:01 +08:00
ruodil	b845eb7a3a	[None][test] add kv cache size in bench metric and fix failed cases (#7160 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-08-26 10:10:02 +08:00
Leslie Fang	9df15b2104	[None][doc] update feature_combination_matrix doc (#6691 ) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-08-26 08:25:31 +08:00
Grzegorz Kwasniewski	2101d46d68	[TRTLLM-6342][feat] TP Sharding read from the model config (#6972 ) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com> Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>	2025-08-25 15:41:27 -07:00
Lucas Liebenwein	97d550b4ba	[None] [AutoDeploy] canonicalize_graph before shape prop for consistent state_dict (#7223 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2025-08-25 16:59:57 -04:00
Bo Li	bf1b958f1a	[TRTLLM-7319][perf] Fuse slicing into MoE. (#6728 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Signed-off-by: Sergey Klevtsov <sklevtsov@nvidia.com> Co-authored-by: Sergey Klevtsov <sklevtsov@nvidia.com>	2025-08-25 16:52:30 -04:00
Daniel Cámpora	e8e7e52892	[None][chore] Refactored the handle logits pp communication (#7154 ) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-08-25 16:14:08 -04:00
Frank	788fc62d23	[None][fix] Update to pull LLM from a central location. (#6458 ) Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>	2025-08-25 13:07:29 -07:00

2519 Commits