TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-13 22:18:36 +08:00

Author	SHA1	Message	Date
Void	7d16f3a28b	[https://nvbugs/5788127 ][fix] Use uint64_t as the dtype of lamport_buffer_size to avoid overflow (#10499 ) Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>	2026-01-13 17:16:22 +08:00
Zhenhuan Chen	7c3bb8534d	[None][chore] Revert "[None][fix] change allreduce workspace dtype to torch.int64 t… (#9538 ) Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>	2025-11-28 16:45:23 +08:00
Zhenhuan Chen	e47927e847	[None][fix] change allreduce workspace dtype to torch.int64 to avoid overflow (#9479 ) Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>	2025-11-27 17:08:41 +08:00
Eran Geva	afc52d7b93	[https://nvbugs/5647400 ] [fix] Enlarged the AllReduce workspace size to 64MB. Added AllReduce strategy to AD config. (#9145 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2025-11-25 10:56:07 -08:00
Anish Shanbhag	5ff4f88be6	[TRTLLM-8683][chore] Migrate PluginConfig to Pydantic (#8277 ) Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>	2025-10-17 16:13:22 -04:00
Guoming Zhang	202bed4574	[None][chroe] Rename TensorRT-LLM to TensorRT LLM for source code. (#7851 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-25 21:02:35 +08:00
Aurelien Chartier	1389f5a4d3	feat: Add support for fp8 rowwise quantization (#4876 ) Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com> Co-authored-by: aikitoria <151776613+aikitoria@users.noreply.github.com>	2025-06-14 06:37:48 -07:00
Yilin Fan	31bb650298	Cherry pick feat/llama4 to main (#4739 ) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com> Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com> Co-authored-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-05-30 05:28:40 +08:00
kanghui0204	6f3922f318	feat: Low Precision Allreduce for PCIe based GPU (#4344 ) This PR adds a customized allreduce to TensorRT-LLM. The new allreduce is used for communication on PCIe-based GPUs via low-precision quantization, which can accelerate the PCIe allreduce process. Signed-off-by: Hui Kang <hkang@nvidia.com> Co-authored-by: Hui Kang <hkang@nvidia.com>	2025-05-20 06:53:46 +08:00
QI JUN	498ce8a056	Revert "feat: Low Precision Allreduce for PCIe based GPU" (#4340 ) Revert "feat: Low Precision Allreduce for PCIe based GPU (#3851)" This reverts commit `5e634dd1bd`.	2025-05-15 09:52:39 +08:00
kanghui0204	5e634dd1bd	feat: Low Precision Allreduce for PCIe based GPU (#3851 ) This PR adds a customized allreduce to TensorRT-LLM. The new allreduce is used for communication on PCIe-based GPUs via low-precision quantization, which can accelerate the PCIe allreduce process. Signed-off-by: Hui Kang <hkang@nvidia.com> Co-authored-by: Hui Kang <hkang@nvidia.com>	2025-05-14 16:45:43 +08:00
Yukun He	aa38e28cfa	fix: [nvbug/5241627] Fix AllReduce kernel hang issue when both tp and pp are enabled. (#3988 ) * Fix AllReduce kernel hang issue when both tp and pp are enabled. Allocate one workspace for each pp rank to avoid potential race. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> * update waive list Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> --------- Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-05-05 11:33:25 +08:00
Dom Brown	8709fe8b53	chore: bump version to 0.19.0 (#3598 ) (#3841 ) test: add test cases for 0.19 release (#3608) * fix test name * add quickstart test for nemotron-ultra * add rcca multi-node test case for deepseek-v3 * add rcca info --------- squash (#3642) fix: nvbugs/5187237: fix deterministic mode crash (#3448) * nvbugs/5187237 nvbugs/5112075: fix deterministic mode error * remove waive * Revert "remove waive" This reverts commit 0bf5486d19906d692bfb7a6262333c296b0087ac. * revert ar fusion --------- update fp8 doc (#3647) tests: change qa perf test to trtllm-bench (#3619) fix: FP8 quantized lm_head (NvBug 5214229) (#3567) infra: Add PR approval protection for the release branch (#3634) fix: nvbugs/5231298: pytorch allreduce issue (#3673) Fix: nvbugs/5222698 variable not defined (#3630) * Fix: nvbugs/5222698 variable not defined * Tidy code --------- test:sync waives.txt from main branch by disabling test_perf/gpt_350m-cppmanager case (#3685) test:restore fp8 kv cache testing for L0 (#3671) doc: Update DeepSeek perf docs (#3693) * Update DeepSeek perf docs * update * Apply suggestions from code review --------- tests: waive test_llm_multi_node (#3664) fix: update test_user_buffers_mm_add_prologue atol (#3711) Fix: cherry-pick hmac encryption from main branch (#3635) * security fix cherry-pick changes from main * fix hmac in remote mpi session (#3649) --------- Un-waive DS-V3-Lite tests. (#3621) fix: FP8 kv accuracy (#3675) * fix FP8 kv accuracy * update doc --------- Fix script options for engines. (#3622) unwaive multi-node test (#3721) chore : Split more tests out of gpt tests (#3524) (#3674) doc:add torch examples link into torch backend documentation (#3749) test: Get Eagle tests working (#3593) (#3722) Waive L0 test (#3756) waive failed case in perf test, change default max_batch_size to 512 and write config.json to output log (#3656) Update ds v3 parameters in stress test. (#3676) waive gemma on L20 (#3766) https://nvbugs/5141291: Fix convert.py script for Qwen model. (#3758) Include Qwen2VLDecoderLayer in the smooth_qwen2_model function. fix: PP4 fixes and cleanup (#3688) remove benchmark test list (#3643) skip disagg deepseek test if sm!=90 (#3720) test: skip failed cases on B200 (#3710) * add skip condition to tests * fix error --------- test: [nvbug: 5234494] skip_pre_ada for fp8 cases (#3718) * skip_pre_ada for fp8 cases * update * update after rebase --------- add know issue to deepseek doc. (#3800) Fix ModelOpt Mixtral AWQ OOM (#3714) (#3761) Waive L0 tests (#3826) fix: Reduce memory usage in fused moe op associated with AutoTuning and fix moe fallback issue. (#3793) * Reduce memory usage in fused moe op associated with AutoTuning. * Replace pre-defined bucket size strategy with a generating function based on the tune_max_num_tokens. * Add free_memory logic of workspace in min_latency_mode fused moe path. * Fix fused_moe fallback issue. (#3652) min_latency_mode is only set to False during warmup phase. Thus when it becomes true during inference, all tactics fall back to the default one and thus cause perf regression. --------- [doc] Better document for Draft-Target-Model (DTM) speculative decoding (#3797) Fix pre-commit Fix again Address some review comments for the MI Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com> Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>	2025-04-29 16:57:22 +08:00
Kaiyu Xie	3aa6b11d13	Update TensorRT-LLM (#2936 ) * Update TensorRT-LLM --------- Co-authored-by: changcui <cuichang147@gmail.com>	2025-03-18 21:25:19 +08:00
Kaiyu Xie	9b931c0f63	Update TensorRT-LLM (#2873 )	2025-03-11 21:13:42 +08:00
Kaiyu Xie	e88da961c5	Update TensorRT-LLM (#2783 )	2025-02-13 18:40:22 +08:00
Dan Blanaru	16d2467ea8	Update TensorRT-LLM (#2755 ) * Update TensorRT-LLM --------- Co-authored-by: Denis Kayshev <topenkoff@gmail.com> Co-authored-by: akhoroshev <arthoroshev@gmail.com> Co-authored-by: Patrick Reiter Horn <patrick.horn@gmail.com> Update	2025-02-11 03:01:00 +00:00
Kaiyu Xie	aaacc9bd68	Update TensorRT-LLM (#2562 ) * Update TensorRT-LLM --------- Co-authored-by: Starrick Liu <73152103+StarrickLiu@users.noreply.github.com>	2024-12-11 00:31:05 -08:00
石晓伟	548b5b7310	Update TensorRT-LLM (#2532 ) * blossom-ci.yml: run vulnerability scan on blossom * open source efb18c1256f8c9c3d47b7d0c740b83e5d5ebe0ec --------- Co-authored-by: niukuo <6831097+niukuo@users.noreply.github.com> Co-authored-by: pei0033 <59505847+pei0033@users.noreply.github.com> Co-authored-by: Kyungmin Lee <30465912+lkm2835@users.noreply.github.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2024-12-04 21:16:56 +08:00
Kaiyu Xie	c629546ce4	Update TensorRT-LLM (#2436 )	2024-11-12 15:27:49 +08:00
Kaiyu Xie	b7868dd1bd	Update TensorRT-LLM (#2413 )	2024-11-05 16:27:06 +08:00
Kaiyu Xie	f14d1d433c	Update TensorRT-LLM (#2389 ) * Update TensorRT-LLM --------- Co-authored-by: Alessio Netti <netti.alessio@gmail.com>	2024-10-29 22:24:38 +08:00
Kaiyu Xie	1730a587d8	Update TensorRT-LLM (#2363 ) * Update TensorRT-LLM --------- Co-authored-by: tonylek <137782967+tonylek@users.noreply.github.com>	2024-10-22 20:27:35 +08:00
Kaiyu Xie	75057cd036	Update TensorRT-LLM (#2333 ) * Update TensorRT-LLM --------- Co-authored-by: Puneesh Khanna <puneesh.khanna@tii.ae> Co-authored-by: Ethan Zhang <26497102+ethnzhng@users.noreply.github.com>	2024-10-15 15:28:40 +08:00
Kaiyu Xie	8681b3a4c0	open source 4dbf696ae9b74a26829d120b67ab8443d70c8e58 (#2297 ) * Update TensorRT-LLM --------- Co-authored-by: Bhuvanesh Sridharan <bhuvanesh.sridharan@sprinklr.com> Co-authored-by: Qingquan Song <ustcsqq@gmail.com>	2024-10-08 12:19:19 +02:00
Kaiyu Xie	31ac30e928	Update TensorRT-LLM (#2215 ) * Update TensorRT-LLM --------- Co-authored-by: Sherlock Xu <65327072+Sherlock113@users.noreply.github.com>	2024-09-10 18:21:22 +08:00
Kaiyu Xie	78f5c2936b	Update TensorRT-LLM (#2184 )	2024-09-03 12:14:23 +02:00
石晓伟	32ed92e449	Update TensorRT-LLM Co-authored-by: Rong Zhou <130957722+ReginaZh@users.noreply.github.com> Co-authored-by: Onur Galoglu <33498883+ogaloglu@users.noreply.github.com> Co-authored-by: Fabian Joswig <fjosw@users.noreply.github.com>	2024-08-20 18:55:15 +08:00
Kaiyu Xie	74b324f667	Update TensorRT-LLM (#2110 )	2024-08-13 22:34:33 +08:00
Kaiyu Xie	be9cd719f7	Update TensorRT-LLM (#2094 ) * Update TensorRT-LLM --------- Co-authored-by: akhoroshev <arthoroshev@gmail.com> Co-authored-by: Fabian Joswig <fjosw@users.noreply.github.com> Co-authored-by: Tayef Shah <tayefshah@gmail.com> Co-authored-by: lfz941 <linfanzai941@gmail.com>	2024-08-07 16:44:43 +08:00
Kaiyu Xie	a681853d38	Update TensorRT-LLM (#2053 )	2024-07-30 21:25:01 +08:00
Kaiyu Xie	bca9a33b02	Update TensorRT-LLM (#2008 ) * Update TensorRT-LLM --------- Co-authored-by: Timur Abishev <abishev.timur@gmail.com> Co-authored-by: MahmoudAshraf97 <hassouna97.ma@gmail.com> Co-authored-by: Saeyoon Oh <saeyoon.oh@furiosa.ai> Co-authored-by: hattizai <hattizai@gmail.com>	2024-07-23 23:05:09 +08:00
Kaiyu Xie	9dbc5b38ba	Update TensorRT-LLM (#1891 ) * Update TensorRT-LLM --------- Co-authored-by: Marks101 <markus.schnoes@gmx.de> Co-authored-by: lkm2835 <lkm2835@gmail.com>	2024-07-04 14:37:19 +08:00
Kaiyu Xie	db4edea1e1	Update TensorRT-LLM (#1763 ) * Update TensorRT-LLM --------- Co-authored-by: Kota Tsuyuzaki <bloodeagle40234@gmail.com> Co-authored-by: Pzzzzz <hello-cd.plus@hotmail.com> Co-authored-by: Patrick Reiter Horn <patrick.horn@gmail.com>	2024-06-11 16:59:02 +08:00
Kaiyu Xie	b777bd6475	Update TensorRT-LLM (#1725 ) * Update TensorRT-LLM --------- Co-authored-by: RunningLeon <mnsheng@yeah.net> Co-authored-by: Tlntin <TlntinDeng01@Gmail.com> Co-authored-by: ZHENG, Zhen <zhengzhen.z@qq.com> Co-authored-by: Pham Van Ngoan <ngoanpham1196@gmail.com> Co-authored-by: Nathan Price <nathan@abridge.com> Co-authored-by: Tushar Goel <tushar.goel.ml@gmail.com> Co-authored-by: Mati <132419219+matichon-vultureprime@users.noreply.github.com>	2024-06-04 20:26:32 +08:00
Kaiyu Xie	f430a4b447	Update TensorRT-LLM (#1688 ) * Update TensorRT-LLM --------- Co-authored-by: IbrahimAmin <ibrahimamin532@gmail.com> Co-authored-by: Fabian Joswig <fjosw@users.noreply.github.com> Co-authored-by: Pzzzzz <hello-cd.plus@hotmail.com> Co-authored-by: CoderHam <hemant@cohere.com> Co-authored-by: Konstantin Lopuhin <kostia.lopuhin@gmail.com>	2024-05-28 20:07:49 +08:00
Kaiyu Xie	bf0a5afc92	Update TensorRT-LLM (#1598 ) * Update TensorRT-LLM	2024-05-14 16:43:41 +08:00
Kaiyu Xie	89ba1b1a67	Update TensorRT-LLM (#1554 )	2024-05-07 23:34:28 +08:00
Kaiyu Xie	71d8d4d3dc	Update TensorRT-LLM (#1455 )	2024-04-16 19:40:08 +08:00
Kaiyu Xie	035b99e0d0	Update TensorRT-LLM (#1427 ) * Update TensorRT-LLM --------- Co-authored-by: meghagarwal <16129366+megha95@users.noreply.github.com>	2024-04-09 17:03:34 +08:00
石晓伟	850b6fa1e7	Update TensorRT-LLM (#1358 ) Co-authored-by: Kaiyu <26294424+kaiyux@users.noreply.github.com>	2024-03-26 20:47:14 +08:00
Kaiyu Xie	66ca3378c6	Update TensorRT-LLM (#1315 )	2024-03-19 17:36:42 +08:00
Kaiyu Xie	4bb65f216f	Update TensorRT-LLM (#1274 ) * Update TensorRT-LLM --------- Co-authored-by: meghagarwal <16129366+megha95@users.noreply.github.com> Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2024-03-12 18:15:52 +08:00
Kaiyu Xie	728cc0044b	Update TensorRT-LLM (#1233 ) * Update TensorRT-LLM --------- Co-authored-by: Morgan Funtowicz <funtowiczmo@gmail.com> Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2024-03-05 18:32:53 +08:00
Kaiyu Xie	0f041b7b57	Update TensorRT-LLM (#1098 ) * Update TensorRT-LLM * update submodule * Remove unused binaries	2024-02-18 15:48:08 +08:00
Kaiyu Xie	e06f537e08	Update TensorRT-LLM (#1019 ) * Update TensorRT-LLM --------- Co-authored-by: erenup <ping.nie@pku.edu.cn> Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2024-01-31 21:55:32 +08:00
Kaiyu Xie	b57221b764	Update TensorRT-LLM (#941 ) * Update TensorRT-LLM --------- Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2024-01-23 23:22:35 +08:00
Kaiyu Xie	c89653021e	Update TensorRT-LLM (20240116) (#891 ) * Update TensorRT-LLM --------- Co-authored-by: Eddie-Wang1120 <81598289+Eddie-Wang1120@users.noreply.github.com> Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2024-01-16 20:03:11 +08:00
Kaiyu Xie	d879430b04	Update TensorRT-LLM (#846 ) * Update TensorRT-LLM --------- Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2024-01-09 21:03:35 +08:00
Kaiyu Xie	deaae40bd7	Update TensorRT-LLM (#787 ) * Update TensorRT-LLM --------- Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2024-01-02 17:54:32 +08:00

1 2

58 Commits