TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Void	7992869798	perf: better heuristic for allreduce (#5432 ) Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>	2025-07-01 22:56:06 -04:00
HuiGao-NV	10c50515c2	fix: Add back allreduce_strategy parameter into TorchLlmArgs (#5637 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-07-02 09:49:20 +08:00
Perkz Zheng	ba2ab5098b	[Bug] attention DP doesn't work with embedding TP (#5642 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-07-02 08:57:46 +08:00
Aurelien Chartier	efef911f5e	fix: add missing self. from PR #5346 (#5653 ) Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>	2025-07-01 20:38:55 -04:00
Po-Wei (Vincent)	1341ffdfaa	[TRTLLM-5644][infra] Update the community action to more appropriate api (#4883 ) Don't need blossom-ci Signed-off-by: Po-Wei Wang (Vincent) <poweiw@nvidia.com>	2025-07-01 14:44:16 -07:00
Aurelien Chartier	fa95e402a5	feat: add LLmArgs option to force using dynamic quantization (#5346 ) Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>	2025-07-01 12:16:09 -07:00
liji-nv	c345f5876c	[feat] Support torch compile for attention dp (#5086 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-07-01 13:48:52 -04:00
Kaiyu Xie	f9a455651b	perf: Use tokenizers API to optimize incremental detokenization perf (#5574 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-07-01 09:35:25 -04:00
Robin Kobus	d68fa728d8	refactor: Clean up DecodingInput and DecodingOutput (#5617 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-07-01 14:31:42 +02:00
Yan Chunwei	3bc703d450	ci: unwaive llmapi launch test (#5281 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-01 20:12:55 +08:00
Emma Qiao	178fc3f655	[Infra][release/0.21] - waive failed tests (#5537 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-01 20:12:55 +08:00
ixlmar	48eee338bf	fix: constrain grepping in docker/Makefile (#5493 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-07-01 20:12:55 +08:00
ixlmar	4b3f2dbb45	fix: fix regression in LOCAL_USER (#5517 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-07-01 20:12:55 +08:00
Anurag Mukkara	93edfea2b8	[nvbug/5354825] Fix nougat test image url (#5496 ) Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>	2025-07-01 20:12:55 +08:00
Yan Chunwei	ee7fcbf20e	[nvbug 5273941] fix: broken cyclic reference detect (#5417 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-01 20:12:55 +08:00
Martin Marciniszyn Mehringer	be5ddb0533	Fix permission for local user issues in NGC docker container. (#5373 ) Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>	2025-07-01 20:12:55 +08:00
ruodil	ded203d8aa	test: set enable_attention_dp=True in default deepseek settings (#5461 ) Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com> Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>	2025-07-01 20:12:55 +08:00
Wanli Jiang	3789ba1d37	feat: TRTLLM-5941 Upgrade xgrammar to 0.1.18 (#5364 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2025-07-01 20:12:55 +08:00
brb-nv	4ef60d5fbb	nvbugs-5331031; nvbugs-5344203 - address intermittent issues with Mistral Small multimodal for BS=8 (#5453 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-01 20:12:55 +08:00
Ivy Zhang	61213e3562	tests: fix typos in qa test (#5421 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>	2025-07-01 20:12:55 +08:00
Martin Marciniszyn Mehringer	872610a048	doc: cherry pick #5334 (#5368 ) Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>	2025-07-01 20:12:55 +08:00
Yan Chunwei	a5eff139f1	[TRTLLM-5277] chore: refine llmapi examples for 1.0 (part1) (#5431 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com> Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>	2025-07-01 19:06:41 +08:00
杨凯旋	61c5a53642	[#5403 ][perf] Conditionally enable SWAP AB for speculative decoding (#5404 ) Signed-off-by: zoheth <z0heth@outlook.com> Co-authored-by: Yao Yao <lowsfer@users.noreply.github.com>	2025-07-01 18:32:37 +08:00
Emma Qiao	65c2b93284	[Infra] - Add some timeout and unwaive a test which dev fixed (#5631 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-01 05:01:32 -04:00
Pamela Peng	071ad758c4	[https://nvbugs/5318059 ][test] Unwaive test (#5624 ) Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>	2025-07-01 04:54:44 -04:00
Robin Kobus	5f77d212ef	test: Reduce number of C++ test cases (#5437 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-07-01 09:40:49 +02:00
danielafrimi	7a617ad1fe	feat: W4A16 GEMM (#4232 ) Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>	2025-07-01 10:36:05 +03:00
xinhe-nv	19c56f0374	test: [CI] Add failed cases into waives.txt (#5582 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-07-01 14:57:03 +08:00
Vivian Chen	34212e2e36	[TRTLLM-6104] feat: add request_perf_metrics to triton LLMAPI backend (#5554 ) Signed-off-by: Vivian Chen <140748220+xuanzic@users.noreply.github.com>	2025-06-30 21:34:42 -07:00
Stanley Sun	7135b27284	rcca: test default kv_cache_reuse option for pytorch multimodal (#5544 ) Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>	2025-07-01 12:12:48 +08:00
xinhe-nv	a8cf611baa	test: [CI] Add failed cases into waives.txt (#5569 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-07-01 11:02:56 +08:00
xinhe-nv	9b17b29b6e	test: [CI] remove closed bugs (#5572 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2025-07-01 10:15:43 +08:00
QI JUN	82547f733d	add feature support matrix for PyTorch backend (#5037 ) Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com> Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>	2025-07-01 10:09:54 +08:00
Erin	8caaf6871d	chores: [TRTLLM-6072] 1.0 LLMAPI doc updates (#5629 ) Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>	2025-06-30 21:58:45 -04:00
Yi Zhang	7cf1209a19	[fix]: Fix main test skip issue (#5503 ) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2025-06-30 21:39:49 -04:00
Netanel Haber	6ee94c7ac8	Reintroduce with perf fixes: feature: unify new_tokens format sample state to trtllm samper tokens format (#5513 ) `58a8a8f` - these changes were previously merged to main here. `6aef149` - the changes were temporarily reverted in main, due to a significant perf regression in models using the TorchSampler (observed by @byshiue). This PR is meant to re-merge these changes along with a fix to prevent the regression. The first commit of this PR is actually just the reverted revert - filter it out of the changes to see previously unmerged changes. Signed-off-by: Netanel Haber <nhaber@nvidia.com>	2025-06-30 11:58:59 -07:00
Wei-Ming Chen	f28cd3056e	feat: AutoDeploy fp8 quantization support for bmm (#3849 ) Signed-off-by: Wei-Ming Chen <17592131+meenchen@users.noreply.github.com>	2025-06-30 12:36:34 -04:00
nv-guomingz	6e48ac25a6	chore: remove cuda_graph_ prefix from cuda_graph_config filed members. (#5585 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-06-30 12:23:14 -04:00
Li Min	16fc99391f	refactor: [TRTLLM-6150] Refactor moe permute and finalize op by removing duplicated code (#5557 ) Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>	2025-06-30 08:48:04 -07:00
Omer Ullman Argov	3b19634a5c	[fix][ci] missing class names in post-merge test reports (#5603 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-30 22:13:29 +08:00
Yan Chunwei	98a7c24062	chore [TRTLLM-6009]: remove ptuning knobs from TorchLlmArgs (#5595 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-06-30 20:40:23 +08:00
Omer Ullman Argov	42134b8b84	[ci] move eagle1 and medusa tests to post-merge (#5604 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-30 19:32:28 +08:00
ixlmar	38a39772ce	[TRTLLM-5989, TRTLLM-5991, TRTLLM-5993] doc: Update container instructions (#5490 ) (#5605 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-06-30 13:27:49 +02:00
Emma Qiao	b8a568d3c6	[Infra][main] Cherry-pick from release/0.21: Update nccl to 2.27.5 (#5539 ) (#5587 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-06-30 18:12:08 +08:00
Robin Kobus	9bdc5951f8	refactor: decoder state setup (#5093 ) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-06-30 11:09:43 +02:00
Fanrong Li	6cbc9a5297	[nvbug/5354946][fix] Fix mtp vanilla draft inputs (#5568 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-30 15:59:12 +08:00
Kaiyu Xie	2ce200fbbb	doc: Minor update to DeepSeek R1 best practice (#5600 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-06-30 15:49:06 +08:00
WeiHaocheng	42a9385d02	[TRTLLM-5331] perf: Replace allgaher with AllToAllPrepare (#5570 ) Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>	2025-06-30 13:06:09 +08:00
dongjiyingdjy	852b79053d	feat : support duplicate_kv_weight for qwen3 blockwise scale (#5459 ) Signed-off-by: Jiying Dong <87510204+dongjiyingdjy@users.noreply.github.com>	2025-06-30 11:49:22 +08:00
Omer Ullman Argov	1db63c2546	[fix] speedup modeling unittests (#5579 ) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-06-30 06:30:45 +03:00

1609 Commits