TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Ivy Zhang	71524a1a48	[https://nvbugs/5419066 ][fix] Use trt flow LLM (#6467 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-08-01 03:33:07 -04:00
Zero Zeng	48768fd720	fix: Fix missing key (#6471 ) Signed-off-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>	2025-08-01 14:25:58 +08:00
Kaiyu Xie	aee35e2dbd	chore: Make example SLURM scripts more parameterized (#6511 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-08-01 12:53:15 +08:00
Robin Kobus	d3c14682f0	refactor: Remove unused buffers and bindings from sampler (#6484 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-08-01 00:43:03 -04:00
Venky	ad5742b105	[fix] Update get_trtllm_bench_build_command to handle batch size and tokens (#6313 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-08-01 00:08:09 -04:00
Yiteng Niu	4472f11bb7	[TRTLLM-6364][infra] Validate for PR titles to ensure they follow the required format (#6278 ) Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-08-01 11:26:05 +08:00
Yao Yao	942e080415	[fix] Fix missing fields in xqa kernel cache key (#6282 ) Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>	2025-08-01 10:41:26 +08:00
Jaedeok Kim	fbee279909	fix: remove duplicate layer multiplication in KV cache size calculation (#6481 ) Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com>	2025-07-31 22:34:34 -04:00
Zongfei Jing	7bb0a78631	Deepseek R1 FP8 Support on Blackwell (#6486 ) Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com> Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com> Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com> Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-08-01 10:26:28 +08:00
Venky	8c165fd27a	[TRTLLM-6611][feat] Add warnings and stricter validation to LoraManager adapter loading (#6453 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2025-07-31 22:22:51 -04:00
Yukun He	00059de380	chore: Improve the AutoTuner log information. (#6368 ) * Change the fallback alert from DEBUG to WARNING level and only do it once. * Add debug information for profiling cache right after the warmup phase. * Change the level of exception message during tactic profiling from ERROR to WARNING level. All exception details are pushed to the DEBUG level. * Other trivial refinements and cleanups. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-08-01 09:19:52 +08:00
brb-nv	2eca0d5925	fix: Fix poor generation with FP8 Gemma3 1B checkpoint (#6499 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-31 17:18:23 -07:00
Simeng Liu	8cf3faa26a	[feat] Auto-enable ngram with concurrency <= 32. (#6232 ) Signed-off-by: Simeng Liu <simengl@nvidia.com> Signed-off-by: Mike Iovine <miovine@nvidia.com> Signed-off-by: Mike Iovine <mike.iovine7@gmail.com> Co-authored-by: Mike Iovine <miovine@nvidia.com> Co-authored-by: Mike Iovine <mike.iovine7@gmail.com>	2025-07-31 18:45:51 -04:00
Ziyi Xiong	8062e0fe7c	[TRTLLM-6392][feat] Support turning on/off spec decoding dynamically (#6363 ) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-07-31 15:31:39 -04:00
Michal Guzek	b8719fe96d	[nvbug/5374773] chore: Update nanobind with fail_fast_on_attention_window_too_large changes (#6491 ) Signed-off-by: Michal Guzek <mguzek@nvidia.com>	2025-07-31 20:25:29 +01:00
tomeras91	6d5da9f7c2	[https://nvbugs/5404046 ][fix] Fix Nemotron-H flaky CUDA graph / overlap scheduler test (#6485 ) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2025-07-31 21:35:10 +03:00
shaharmor98	0c42f54a39	Bugfix/fix nemotron nas lora support (#6380 ) Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>	2025-07-31 13:39:35 -04:00
Emma Qiao	baece56758	[None][infra] Pin the version for triton to 3.3.1 (#6508 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-31 19:25:15 +08:00
Yiqing Yan	d38c26bb78	[Infra][TRTLLM-5633] - Fix merge waive list (#6504 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-07-31 14:57:51 +08:00
amitz-nv	1ee7a08d2b	[5830][feat] Improve LoRA cache memory control (#6220 ) Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>	2025-07-31 09:26:38 +03:00
Venky	83e97659aa	[infra] Remove auto_assign_reviewers option from .coderabbit.yaml (#6490 ) Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>	2025-07-30 23:07:21 -07:00
Wanli Jiang	fcd5706615	doc: add bielik model to support-matrix (#6480 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-07-31 00:48:53 -04:00
Faraz	8e84df74b5	Fix e2e test failure for RTX6000 Pro (#6420 ) Signed-off-by: list <58580514+farazkh80@users.noreply.github.com> Signed-off-by: Faraz <58580514+farazkh80@users.noreply.github.com>	2025-07-30 23:32:44 -04:00
xinhe-nv	ca534e4798	test: add accuracy reference (#6479 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-31 12:27:29 +10:00
dongjiyingdjy	17e0d0fb1a	fix: fix illegal memory access (#6437 ) Signed-off-by: Jiying Dong <87510204+dongjiyingdjy@users.noreply.github.com>	2025-07-31 10:01:34 +08:00
Enwei Zhu	4b299cb77e	feat: Support structural tag in C++ runtime and upgrade xgrammar to 0.1.21 (#6408 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-07-31 09:53:52 +08:00
bhsueh_NV	ae3a5fc918	[doc][ci][Qwen3][nvbugs 5374145] Add Qwen3 235B eagle3 CI (#6477 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-07-31 09:37:23 +08:00
Yechan Kim	83621e4b80	doc: update multimodal models on support-matrix.md (#6431 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-07-31 08:50:18 +08:00
Vadim Gimpelson	25cd4f215e	[PERF] Move calculation Qwen2-VL's rotary_cos_sin to LLM worker process (#6004 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@centml.ai>	2025-07-31 09:35:24 +09:00
brb-nv	0e16d1f070	test: Add time logging for lora tests (#6466 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-30 14:02:43 -07:00
shaharmor98	f9cf683e39	add propagation of trust_remote_code to OpenAIServer (#6446 ) Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>	2025-07-30 15:25:41 -04:00
Anurag Mukkara	fac186e3b5	[nvbug/5409417] Unwaive llava test case (#6460 ) Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>	2025-07-30 14:38:47 -04:00
brb-nv	f6287e4498	Unwaive Gemma2 LoRA test on H100 (#6461 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-30 12:56:12 -04:00
Bo Deng	24e7f4eece	[nvbug/5410296][fix] Fix OOM in Llama 4 disagg-serve tests (#6439 ) Signed-off-by: Bo Deng <deemod@nvidia.com>	2025-07-31 00:41:37 +08:00
Wanli Jiang	9632dba02e	feat: TRTLLM-6450 update long rope for phi3.5/phi4-mini/phi4-mm (#6353 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-07-30 09:20:16 -07:00
NVShreyas	e67f4da9b5	[Perf]: Add residual, norm for nemotron_nas models (#6455 ) Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>	2025-07-30 09:10:38 -07:00
pcastonguay	0f083b9daf	fix: Unwaive triton cpp test [nvbug 5401088] (#6412 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>	2025-07-30 11:25:18 -04:00
nv-guomingz	03e38c9087	chore: update trtllm-serve usage doc by removing backend parameter when it use torch as backend. (#6419 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-07-30 11:11:06 -04:00
Chang Liu	b4065d8ca6	[TRTLLM-6654][feat] Add support for external multimodal embeddings (#6263 ) Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>	2025-07-30 10:00:15 -04:00
pcastonguay	e7ae5e2824	feat: Add support for disaggregation with pp with pytorch backend (#6369 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Signed-off-by: raayandhar <rdhar@nvidia.com> Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com> Signed-off-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com> Co-authored-by: raayandhar <rdhar@nvidia.com> Co-authored-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-07-30 09:42:13 -04:00
tomeras91	a2514d93fc	[nvbug 5380101][fix] Fix nemotronNAS loading for TP>1 (#6447 ) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2025-07-30 07:22:32 -04:00
Leslie Fang	d980928c96	[doc] update the doc of feature combination matrix (#6441 ) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2025-07-30 18:48:49 +08:00
Yiqing Yan	0cf2f6f154	[TRTLLM-5633] - Merge current waive list with the TOT waive list (#5198 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-07-30 17:50:05 +08:00
Yechan Kim	22b29df38c	[nvbugs/5414909] fix: Qwen2-VL keyword on L20 (#6427 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-07-30 17:29:55 +08:00
xinhe-nv	d9ab3fd35e	tests: add TestNemotronH cuda graph tests (#6390 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-07-30 18:45:58 +10:00
nv-guomingz	a5540acfce	chore: add trtllm-serve json schema example into doc. (#6418 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-07-30 04:33:08 -04:00
QI JUN	2fe9cc0889	chore: remove draft_model_engine from init parameter list of PyExecutor (#6325 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-07-30 03:31:49 -04:00
QI JUN	1f39a11af0	chore: clean code of PyExecutor (#6445 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-07-30 02:11:43 -04:00
2ez4bz	d6eed1b624	[fix] Switch placement of image placeholder for mistral 3.1 (#6435 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-07-30 14:10:36 +08:00
Jinyang Yuan	a427f5bece	[fix] Fix wide EP when using DeepEP with online EPLB (#6429 ) Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>	2025-07-30 00:13:18 -04:00

2102 Commits