TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-14 23:14:02 +08:00

Author	SHA1	Message	Date
sunnyqgg	ea3e0eea51	[TRTLLM-7954][feat] Target model KV cache rellocation (#8421) Signed-off-by: qgai <qgai@nvidia.com>	2025-10-23 09:36:50 +08:00
Yechan Kim	4230639370	[https://nvbugs/5550722][fix] Fix image load (#8093) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-10-16 22:46:19 +08:00
mpikulski	93a4b7f1b6	[None][chore] update torch_dtype -> dtype in 'transformers' (#8263) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-10-15 17:09:30 +09:00
Yuxian Qiu	3450fe9944	[None][fix] Fix dummy load format for key models. (#7993) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-10-14 11:18:39 +08:00
Po-Han Huang (NVIDIA)	6fc6f70a68	[https://nvbugs/5441729][test] Fix test_modeling_llama_min_latency.py failures (#7478) Signed-off-by: Po-Han Huang <pohanh@nvidia.com>	2025-10-13 15:35:02 +08:00
Yibin Li	d7581bb551	[TRTLLM-8031][feat] Add chunked return_generation_logits logic (#7831) Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>	2025-10-01 12:47:07 -04:00
Emma Qiao	b1e3fef8aa	[None][infra] Skip failed tests in post-merge for main (#8102) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-10-01 10:12:10 +08:00
sunnyqgg	2e5850c28a	[TRTLLM-7330][feat] Eagle3 cuda graph support for the first draft model inference (#7363) Signed-off-by: qgai <qgai@nvidia.com>	2025-09-26 11:28:05 +08:00
Wanli Jiang	2a30f11d63	[None][chore] Upgrade transformers to 4.56.0 (#7523) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com> Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-22 22:20:16 +08:00
Yechan Kim	f77aca9f2c	[TRTLLM-7385][feat] Optimize Qwen2/2.5-VL performance (#7250) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-09-22 03:40:02 -07:00
Emma Qiao	c4abca323e	[None][infra] Waive failed tests on main (#7812) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-09-17 23:44:36 +08:00
William Zhang	2614d71994	[TRTLLM-7410][feat] Enable KV cache reuse and chunked prefill for mistral3.1 (#7628) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-09-17 08:11:16 -07:00
xiweny	c076a02b38	[TRTLLM-4629] [feat] Add support of CUDA13 and sm103 devices (#7568) Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com> Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com> Signed-off-by: Daniel Stokes <dastokes@nvidia.com> Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com> Signed-off-by: Xiwen Yu <xiweny@nvidia.com> Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com> Signed-off-by: Yiqing Yan <yiqingy@nvidia.com> Signed-off-by: Bo Deng <deemod@nvidia.com> Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com> Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com> Co-authored-by: Daniel Stokes <dastokes@nvidia.com> Co-authored-by: Zhanrui Sun <zhanruis@nvidia.com> Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com> Co-authored-by: Yiqing Yan <yiqingy@nvidia.com> Co-authored-by: Bo Deng <deemod@nvidia.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-09-16 09:56:18 +08:00
QI JUN	ff3704897b	[None][ci] remove unnecessary test_modeling_deepseek.py (#7542) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-09-04 20:05:27 -07:00
2ez4bz	cf0c47ca2d	[None][fix] Fix batching bug in Mistral3 model (#6841) Prior to this commit, if multiple requests with images were in the same batch, the batching logic for the images would fail. This commit fixes it, and adds unit tests for it that were verified to fail prior to the fix. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-01 11:02:31 +08:00
QI JUN	bea5e07fb7	[None][refactor] refactor the CUDA graph runner to manage all CUDA graphs (#6846) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-08-25 20:52:05 +08:00
tomeras91	c232ba8157	[TRTLLM-4921][feat] Enable chunked prefill for Nemotron-H (#6334) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com> Signed-off-by: tomeras91 <57313761+tomeras91@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-08-22 12:15:20 -04:00
tomeras91	f0bfb49219	[https://nvbugs/5458874][fix] Fix Nemotron-H flaky CUDA graph / overlap scheduler test (#6996) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2025-08-19 15:45:06 +03:00
Emma Qiao	cc6d763824	[None][infra]Waive failed cases in main branch (#6951) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-08-17 14:27:59 +03:00
Daniel Cámpora	53312eeebd	[TRTLLM-7157][feat] BREAKING CHANGE Introduce sampler_type, detect sampler according to options (#6831) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-08-16 00:27:24 -04:00
shaharmor98	14b36e07d7	[TRTLLM-6174][feat] Enable FP32 mamba ssm cache (#6574) Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>	2025-08-10 16:27:51 -04:00
2ez4bz	064eb7a70f	[TRTLLM-5252][fix] Propagate mapping to intermediate layers (#6611) This commit propagates the mapping to intermediate layers to enable tensor parallelism (amongst other things) in them. It also fixes issues with a unit test for TP for pixtral, and adds it to a test list. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-08-08 01:50:36 -04:00
Daniel Cámpora	efca359b66	[TRTLLM-6785][feat] BREAKING CHANGE Enable TRTLLM sampler by default (#6216) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-08-07 22:19:37 -04:00
hlu1	8207d5fd39	[None] [feat] Add model gpt-oss (#6645) Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>	2025-08-07 03:04:18 -04:00
brb-nv	6135f75f87	[None][chore] Update Gemma3 closeness check to mitigate flakiness (#6591) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-08-04 10:10:58 -04:00
Yechan Kim	ee6ab5be96	chore: add EXAONE4 accuracy test (#6397) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-08-04 10:14:16 +08:00
tomeras91	6d5da9f7c2	[https://nvbugs/5404046][fix] Fix Nemotron-H flaky CUDA graph / overlap scheduler test (#6485) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2025-07-31 21:35:10 +03:00
2ez4bz	ab7434ac62	[feat] Enable TP and batching for PixtralVisionModel / Mistral3VLM (#6152) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-07-22 11:06:41 -07:00
Emma Qiao	e41507a253	[Infra] - Waive failed cases on recent post-merge (#6212) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-21 21:00:18 +08:00
brb-nv	ca9bc5727e	fix: Flush stale `PlanParams` with custom attention mask (#6163) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-21 09:55:09 +08:00
Wanli Jiang	2d2b8bae32	feat: TRTLLM-5574 Add phi-4-multimodal pytorch-backend support (#5644) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-07-17 06:30:58 +08:00
shaharmor98	e0836f9ca9	[TRTLLM-5493] Add core infrastructure to enable loading of custom checkpoint formats (#5372) Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>	2025-07-17 00:50:30 +08:00
Yan Chunwei	a02606a9e2	[TRTLLM-5530][BREAKING CHANGE] refactor: unify KvCacheConfig in LLM class for pytorch backend (#5752) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-16 16:42:59 +08:00
nv-guomingz	4e4d18826f	chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… (#6003) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-07-15 15:50:03 +09:00
brb-nv	f5f5be9e94	enh: Bidirectional mask with multiple images for Gemma3 (#5976) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-14 22:39:18 +08:00
brb-nv	1a2d96919c	feat: Update Gemma3 Vision Encoder (#5973) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-14 22:38:10 +08:00
brb-nv	0385f89abc	test: Fix Gemma3 unit tests due to transformers upgrade (#5921) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-10 17:24:10 -07:00
2ez4bz	c19840235d	[fix] Fix mistral unit tests due to transformers upgrade (#5904) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-07-10 10:45:27 -07:00
brb-nv	3209b31665	feat: Custom masking utils for Gemma3 VLM (#5853) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-10 06:18:04 +09:00
2ez4bz	87fe44fd29	feat(models): Mistral3.1 VLM pytorch backend support (#5529) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2025-07-09 13:17:40 -07:00
Wanli Jiang	3f7cedec7c	Update transformers to 4.53.0 (#5747) Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com> Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-07-09 09:32:24 -07:00
Erin	e277766f0d	chores: merge examples for v1.0 doc (#5736) Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>	2025-07-08 21:00:42 -07:00
brb-nv	2bd09ed2d4	fix: Skip rope scaling for local layers in Gemma3 VLM (#5857) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-09 10:10:33 +08:00
brb-nv	cdaa6abce7	fix: Investigate Gemma3 1B decoder output discrepancy (#5564) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-04 13:14:13 +08:00
tomeras91	7dbecf7272	[TRTLLM-4923][feat] Enable CUDA graphs for Nemotron-H (#5646) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2025-07-03 11:07:51 +03:00
Omer Ullman Argov	1db63c2546	[fix] speedup modeling unittests (#5579) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-30 06:30:45 +03:00
nv-guomingz	578430e64c	[TRTLLM-5530][BREAKING CHANGE]: enhance the llm args pytorch config part 1(cuda_graph_config) (#5014) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-06-30 11:05:40 +08:00
Yan Chunwei	9bd42ecf9b	[TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default (#5312) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-06-20 03:01:10 +08:00
Omer Ullman Argov	4eade3ae33	[fix][test] Speedup Nemotron NAS unittests (#5202) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-15 11:26:03 +03:00
ixlmar	e055af1bc9	chore: improve disagg test failure detection (#4738) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-06-15 01:28:26 +08:00


77 Commits