TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
Stefan Niebler	d1112aac37	[TRTLLM-3442] feat: added beam search support to the PyTorch Workflow (#5333) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>	2025-07-05 01:35:13 +09:00
Chuang Zhu	ffc0b8f5da	Cache transceiver support VSWA (#5505) Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com> Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com> Co-authored-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2025-07-05 01:18:42 +09:00
HuiGao-NV	3ed3bbcb5d	Fix: pass allreduce strategy to pytorchConfig (#5746) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-07-04 21:32:13 +09:00
Yiqing Yan	7f3ea058f0	[Infra] - Waive L0 flaky test (#5759) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-07-04 19:25:12 +09:00
Shunkangz	32339d1b20	Raise shut down error for each request (#4936) Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-07-04 18:58:24 +09:00
ixlmar	471bf0b4fc	fix: check file exists in dev container script (#5755) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-07-04 10:29:17 +02:00
xinhe-nv	3869b969a6	test: [CI] Add failed cases into waives.txt (#5718) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-04 17:24:48 +09:00
Faraz	81c0764012	Cherry pick "[NVBUG:5355009] Modify check for fuse_fp4_quant on SM120 (#5724) Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com> Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>	2025-07-04 16:53:20 +09:00
Robin Kobus	07f9cf1519	fix: Improve chunking test and skip empty kernel calls (#5710) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-07-04 09:08:15 +02:00
Yiqing Yan	b8fef809ae	[Infra] - Waive L0 test (#5748) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-07-04 15:04:49 +08:00
Tailing Yuan	e134a52e07	Perf: reduce DeepEPLowLatency memory and time (#5712) Signed-off-by: Tailing Yuan <yuantailing@gmail.com>	2025-07-04 14:46:28 +08:00
nv-guomingz	c434147366	chore: update doc by replacing use_cuda_graph with cuda_graph_config (#5680) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-07-04 15:39:15 +09:00
Yuan Tong	32b244af38	feat: reduce unnecessary kernel generation (#5476) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>	2025-07-04 14:37:49 +08:00
Shunkangz	a79d8c9f5e	Fix none response in PD (#5422) Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>	2025-07-04 14:25:10 +08:00
Netanel Haber	134b2383ff	[fix: nvbugs/5355493] Correctly clamp max sequence len to max attention window (#5720) Signed-off-by: Netanel Haber <nhaber@nvidia.com>	2025-07-04 08:16:25 +02:00
Linda	94f0252b46	Doc: Update invalid hugging face URLs (#5683) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2025-07-04 13:14:13 +08:00
Emma Qiao	a0135c0f6f	[Infra] - Waive failed cases on release/0.21 (#5674) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-04 13:14:13 +08:00
brb-nv	cdaa6abce7	fix: Investigate Gemma3 1B decoder output discrepancy (#5564) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-07-04 13:14:13 +08:00
Frank	819ae903de	[https://nvbugspro.nvidia.com/bug/5351333][fix] Update to chunking calculation. (#5625) Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>	2025-07-04 13:14:13 +08:00
Kaiyu Xie	ab488a5a5d	doc: Fix outdated config in DeepSeek best perf practice doc (#5638) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-07-04 13:14:13 +08:00
Yi Zhang	73d30a23c7	test: add more tests for GB200 with 8 GPUs/2 nodes in L0 tests (#5397) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-07-04 13:14:13 +08:00
Zheng Duan	cb9f596dbe	[nvbug 5300551] test: increase block count in eviction test (#5465) Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>	2025-07-04 13:14:13 +08:00
nv-guomingz	d0b3d2ac65	fix:https://nvbugs/5362398 (#5609) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-07-04 13:14:13 +08:00
Yan Chunwei	77288d3671	fix [nvbug5351244]: test_mpi_session submit sync/async (#5608) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-04 13:14:13 +08:00
xinhe-nv	7f837b6e8b	tests: waive failures on main (#5704) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2025-07-04 12:39:12 +09:00
Venky	4762e0b244	Waive tests : test_openai_lora, test_trtllm_serve_lora_example and test_openai_chat_structural_tag_example (#5740) Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>	2025-07-04 11:01:08 +09:00
Clay	7a319524da	feat: support more parameters in openai worker of scaffolding (#5115) Signed-off-by: Clay <ccs96307@gmail.com>	2025-07-04 09:35:34 +08:00
Lucas Liebenwein	24ac9b5f69	[AutoDeploy] merge feat/ad-2025-06-29 (#5737) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com> Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com> Co-authored-by: Neta Zmora <nzmora@nvidia.com> Co-authored-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>	2025-07-04 10:21:18 +09:00
Netanel Haber	aa72d39b72	MTP and derivatives: Align sample state with trtllm sampler sample state (#5675) This PR moves MTPSampler and derivatives to use the universal seq_slot indexing for sampling. This is the last piece of the puzzle: After this, all of the samplers will use this format. See: `6ee94c7` Signed-off-by: Netanel Haber <nhaber@nvidia.com>	2025-07-03 19:55:48 +02:00
Po-Wei (Vincent)	0566fa1697	[None][infra] Update the auto-community label action to be triggered every hour (#5658) Signed-off-by: Po-Wei Wang (Vincent) <poweiw@nvidia.com>	2025-07-03 09:56:30 -07:00
Zhenhuan Chen	528ff52ef4	[https://nvbugs/5365714] fix(scaffolding): use default LLM rather than trt backend LLM (#5705) Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>	2025-07-03 23:54:20 +09:00
Rashid Kaleem	2b0c87e613	[ModelLoad] Concurrent load model (#5291) Signed-off-by: Rashid K <rkaleem@nvidia.com> Co-authored-by: Zhihan Jiang <68881590+nvzhihanj@users.noreply.github.com>	2025-07-03 22:18:04 +08:00
nv-guomingz	8dad22cbe7	chore: refine the default value by using pydantic default instead of … (#5695) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-07-03 22:41:29 +09:00
Robin Kobus	1a3bd140ed	chore: Remove unused isFullContextRequest method (#5666) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-07-03 15:08:09 +02:00
Netanel Haber	f91379b7e8	delete duplicate eagle3 and ngram tests (#5711) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2025-07-03 15:47:26 +03:00
Omer Ullman Argov	c72856188c	[ci] small multigpu speedups (#5643) Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>	2025-07-03 08:06:10 -04:00
WeiHaocheng	dccbfc8b1e	fix: Set init value for moe expert id (#5660) Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>	2025-07-03 07:05:31 -04:00
Emma Qiao	530897388c	[Infra] - Waive a failed case on main (#5702) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-03 06:09:27 -04:00
Yiqing Yan	de0b522dfd	[Infra] - Fix test stage check for the package sanity check stage (#5694) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-07-03 16:39:46 +08:00
tomeras91	7dbecf7272	[TRTLLM-4923][feat] Enable CUDA graphs for Nemotron-H (#5646) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2025-07-03 11:07:51 +03:00
Yiqing Yan	3c9dd5cd66	chore: bump version to 1.0.0rc2 (#5645) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2025-07-03 12:35:28 +08:00
Emma Qiao	2a5fdebf10	[Infra] - Waive failed tests for main 0702 (#5671) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-02 22:05:07 -04:00
Enwei Zhu	3a46cf275b	fix: Fix missing arg to alltoall_prepare_maybe_dispatch (#5669) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-07-02 21:41:55 -04:00
Fridah-nv	afef5127f0	feat:[AutoDeploy] E2E build example for llama4 VLM (#3922) Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>	2025-07-02 19:29:34 -04:00
ixlmar	04fa6c0cfc	[TRTLLM-6143] feat: Improve dev container tagging (#5551) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2025-07-02 14:56:34 +02:00
Emma Qiao	31699cbeb1	[Infra] - Set default timeout to 1hr and remove some specific settings (#5667) Signed-off-by: qqiao <qqiao@nvidia.com>	2025-07-02 08:37:54 -04:00
Jhao-Ting Chen	77082cde38	[https://nvbugspro.nvidia.com/bug/5329655] [feat] Pytorch path add spec dec param to attention op (#5146) Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>	2025-07-02 04:54:43 -04:00
Robin Kobus	4cd8543d8c	[TRTLLM-1316] refactor: Remove unnecessary pipeline parallelism logic from postProcessRequest (#5489) Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>	2025-07-02 10:13:31 +02:00
qixiang-99	ca7b6ec8d8	Feat/pytorch vswa kvcachemanager (#5151) Signed-off-by: qixiang-99 <203170375+qixiang-99@users.noreply.github.com>	2025-07-02 15:58:00 +08:00
Yan Chunwei	2d69b55fe8	chore: enhance yaml loading arbitrary options in LlmArgs (#5610) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-02 14:21:37 +08:00

1662 Commits