Yuan Tong
a2f271c8e0
[TRTLLM-4406][feat] LLM sleep & wakeup Part 1: virtual device memory ( #5034 )
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-08-04 13:51:01 +08:00
xavier-nvidia
b6013da198
Fix GEMM+AR fusion on blackwell ( #5563 )
Signed-off-by: xsimmons <xsimmons@nvidia.com>
2025-07-09 08:48:47 +08:00
Yuan Tong
32b244af38
feat: reduce unnecessary kernel generation ( #5476 )
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-07-04 14:37:49 +08:00
dongxuy04
490d2e5819
feat: large-scale EP(part 8: Online EP load balancer integration for PCIe fp8) ( #5226 )
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-06-25 22:25:13 -07:00
dongxuy04
7137cc8f67
fix cuda driver link issue with driver version less than 12.3 ( #5025 )
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-06-10 15:27:39 +08:00
dongxuy04
1e369658f1
feat: large-scale EP(part 6: Online EP load balancer integration for GB200 nvfp4) ( #4818 )
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Co-authored-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-06-08 10:25:18 +08:00
Zongfei Jing
dbaddb3a29
Adding two-shot allreduce kernel and mnnvl multicasting buffer ( #4216 )
* Adding two-shot allreduce kernel and mnnvl multicasting buffer
Signed-off-by: Shiyu Li <shili@nvidia.com>
Adding comments
Signed-off-by: Shiyu Li <shili@nvidia.com>
Add unittest of the twoshot kernel.
Signed-off-by: Shiyu Li <shili@nvidia.com>
Update dispatch logic
Signed-off-by: Shiyu Li <shili@nvidia.com>
Use cpu barrier instead of GPU at init
Signed-off-by: Shiyu Li <shili@nvidia.com>
Merge dispatch logic fix
Signed-off-by: Shiyu Li <shili@nvidia.com>
Update the kernel to use GPU-managed buffer
Signed-off-by: Shiyu Li <shili@nvidia.com>
* Refine
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Clean code
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Fix compile error
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Fix issue
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Clean up
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Simplify AllReduce interface
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Rename
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Fix warning
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Tidy code
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Rename
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Fix compile error
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Refine
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Skip ut for no_fusion
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Refine
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
---------
Signed-off-by: Shiyu Li <shili@nvidia.com>
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
Co-authored-by: Shiyu Li <shili@nvidia.com>
2025-05-22 03:42:36 +08:00
dongxuy04
21aff2e313
feat: large-scale EP(part 2: MoE Load Balancer - core utilities) ( #4384 )
* first commit of cpp moe loadbalance code
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
* add python bindings for moe load balance
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
* add python wrapper, ut and bug fixes
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
* add binding for layerId and update binding test
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
* add host tensor sharing and ut
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
---------
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-05-20 17:53:48 +08:00
Robin Kobus
d31fefde2c
[TRTLLM-5171] chore: Remove GptSession/V1 from TRT workflow ( #4092 )
* chore: Remove GptSession/V1 from TRT workflow
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove stateful decoders
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove GptSession buffers
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove GptSession utils
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove GptSession kernels
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove V1 GPT models from tests
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove gptSessionBenchmark from scripts and docs
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove gptSession IO classes
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove GptSession from test lists
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove GptSession from docs
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove useless encoder test
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove mActualBatchSize from DecoderState
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove static batching from ExecutorTest
- Updated `validateContextLogits` and `validateGenerationLogits` functions to remove the `batchingType` parameter.
- Adjusted related test functions to reflect the changes in parameter lists.
- Cleaned up the instantiation of test cases to eliminate unnecessary batchingType references.
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
---------
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-05-14 23:10:04 +02:00
Dom Brown
dbd9a83b0d
feat: Integrate GPUDirect Storage (GDS) into Executor API ( #3582 )
* feat: Integrate GPUDirect Storage (GDS) into Executor API
Squash of several dev commits
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-04-18 15:59:21 +08:00
Kaiyu Xie
2631f21089
Update ( #2978 )
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-03-23 16:39:35 +08:00
Kaiyu Xie
3aa6b11d13
Update TensorRT-LLM ( #2936 )
* Update TensorRT-LLM
---------
Co-authored-by: changcui <cuichang147@gmail.com>
2025-03-18 21:25:19 +08:00
Kaiyu Xie
2ea17cdad2
Update TensorRT-LLM ( #2792 )
* Update TensorRT-LLM
---------
Co-authored-by: jlee <jungmoolee@clika.io>
2025-02-18 21:27:39 +08:00
Kaiyu Xie
e88da961c5
Update TensorRT-LLM ( #2783 )
2025-02-13 18:40:22 +08:00
Dan Blanaru
16d2467ea8
Update TensorRT-LLM ( #2755 )
* Update TensorRT-LLM
---------
Co-authored-by: Denis Kayshev <topenkoff@gmail.com>
Co-authored-by: akhoroshev <arthoroshev@gmail.com>
Co-authored-by: Patrick Reiter Horn <patrick.horn@gmail.com>
Update
2025-02-11 03:01:00 +00:00
Kaiyu Xie
aaacc9bd68
Update TensorRT-LLM ( #2562 )
* Update TensorRT-LLM
---------
Co-authored-by: Starrick Liu <73152103+StarrickLiu@users.noreply.github.com>
2024-12-11 00:31:05 -08:00
Kaiyu Xie
b7868dd1bd
Update TensorRT-LLM ( #2413 )
2024-11-05 16:27:06 +08:00
Kaiyu Xie
f14d1d433c
Update TensorRT-LLM ( #2389 )
* Update TensorRT-LLM
---------
Co-authored-by: Alessio Netti <netti.alessio@gmail.com>
2024-10-29 22:24:38 +08:00
Kaiyu Xie
78f5c2936b
Update TensorRT-LLM ( #2184 )
2024-09-03 12:14:23 +02:00
石晓伟
b8fc6633ba
Update TensorRT-LLM ( #2156 )
Co-authored-by: Bruno Magalhaes <bruno.magalhaes@synthesia.io>
2024-08-27 18:20:59 +08:00
石晓伟
32ed92e449
Update TensorRT-LLM
Co-authored-by: Rong Zhou <130957722+ReginaZh@users.noreply.github.com>
Co-authored-by: Onur Galoglu <33498883+ogaloglu@users.noreply.github.com>
Co-authored-by: Fabian Joswig <fjosw@users.noreply.github.com>
2024-08-20 18:55:15 +08:00
Kaiyu Xie
be9cd719f7
Update TensorRT-LLM ( #2094 )
* Update TensorRT-LLM
---------
Co-authored-by: akhoroshev <arthoroshev@gmail.com>
Co-authored-by: Fabian Joswig <fjosw@users.noreply.github.com>
Co-authored-by: Tayef Shah <tayefshah@gmail.com>
Co-authored-by: lfz941 <linfanzai941@gmail.com>
2024-08-07 16:44:43 +08:00
石晓伟
2a115dae84
Update TensorRT-LLM ( #1793 )
Co-authored-by: DreamGenX <x@dreamgen.com>
Co-authored-by: Ace-RR <78812427+Ace-RR@users.noreply.github.com>
Co-authored-by: bprus <39293131+bprus@users.noreply.github.com>
Co-authored-by: janpetrov <janpetrov@icloud.com>
2024-06-18 18:18:23 +08:00
Kaiyu Xie
bf0a5afc92
Update TensorRT-LLM ( #1598 )
* Update TensorRT-LLM
2024-05-14 16:43:41 +08:00
Kaiyu Xie
89ba1b1a67
Update TensorRT-LLM ( #1554 )
2024-05-07 23:34:28 +08:00
石晓伟
850b6fa1e7
Update TensorRT-LLM ( #1358 )
Co-authored-by: Kaiyu <26294424+kaiyux@users.noreply.github.com>
2024-03-26 20:47:14 +08:00
Kaiyu Xie
66ca3378c6
Update TensorRT-LLM ( #1315 )
2024-03-19 17:36:42 +08:00
Kaiyu Xie
728cc0044b
Update TensorRT-LLM ( #1233 )
* Update TensorRT-LLM
---------
Co-authored-by: Morgan Funtowicz <funtowiczmo@gmail.com>
Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2024-03-05 18:32:53 +08:00
Kaiyu Xie
0f041b7b57
Update TensorRT-LLM ( #1098 )
* Update TensorRT-LLM
* update submodule
* Remove unused binaries
2024-02-18 15:48:08 +08:00
Kaiyu Xie
0ab9d17a59
Update TensorRT-LLM ( #1055 )
* Update TensorRT-LLM
---------
Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2024-02-06 18:38:07 +08:00
Kaiyu Xie
d879430b04
Update TensorRT-LLM ( #846 )
* Update TensorRT-LLM
---------
Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2024-01-09 21:03:35 +08:00
Kaiyu Xie
deaae40bd7
Update TensorRT-LLM ( #787 )
* Update TensorRT-LLM
---------
Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2024-01-02 17:54:32 +08:00
Kaiyu Xie
a75618df24
Update TensorRT-LLM ( #667 )
* Update TensorRT-LLM
---------
Co-authored-by: 0xymoro <jerrymeng100@gmail.com>
Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2023-12-15 22:14:51 +08:00
Kaiyu Xie
f044eb8d94
Update TensorRT-LLM ( #302 )
* Update TensorRT-LLM
---------
Co-authored-by: wangruohui <12756472+wangruohui@users.noreply.github.com>
2023-11-07 19:51:58 +08:00
Kaiyu Xie
d8b408e6dc
Update TensorRT-LLM ( #148 )
* Update TensorRT-LLM
---------
Co-authored-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2023-10-27 12:10:00 +08:00
Kaiyu Xie
75b6210ff4
Kaiyu/update main ( #5 )
* Update
* Update
2023-10-18 22:38:53 +08:00
Kevin Xie
027cd518e3
Update
2023-10-10 23:22:17 -07:00
Kevin Xie
6e9e318e91
Update code
2023-09-28 09:00:05 -07:00
Kaiyu Xie
23bc5b7c49
Initial commit
2023-09-20 00:29:41 -07:00