TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Zhenhuan Chen 838958c631 [https://nvbugs/5545522][fix] move PREEXIT in UB kernels to fix accuracy issue (#8318) Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>		2025-10-16 09:50:43 +08:00
CMakeLists.txt	feat: reduce unnecessary kernel generation (#5476)	2025-07-04 14:37:49 +08:00
ipcsocket.cpp	Update TensorRT-LLM (#2532)	2024-12-04 21:16:56 +08:00
ipcsocket.h	Update TensorRT-LLM (#2532)	2024-12-04 21:16:56 +08:00
ub_allocator.cpp	[None][feat] Add NCCL Symmetric Integration for All Reduce (#4500)	2025-08-07 17:28:14 -07:00
ub_allocator.h	[#6798][fix] fix compilation error in ub_allocator in single device build (#6874)	2025-09-09 07:13:53 -04:00
ub_interface.cpp	[None][feat] Add NCCL Symmetric Integration for All Reduce (#4500)	2025-08-07 17:28:14 -07:00
ub_interface.h	feat: Introduce UB allocator for pytorch flow (#3257)	2025-04-08 18:39:49 +08:00
userbuffers-host.cpp	Update TensorRT-LLM (#2792)	2025-02-18 21:27:39 +08:00
userbuffers.cu	[https://nvbugs/5545522][fix] move PREEXIT in UB kernels to fix accuracy issue (#8318)	2025-10-16 09:50:43 +08:00
userbuffers.h	None - Add one-shot version for UB AR NORM FP16/BF16 (#2995)	2025-03-31 11:16:03 +08:00
userbuffersManager.cpp	[None][feat] Add NCCL Symmetric Integration for All Reduce (#4500)	2025-08-07 17:28:14 -07:00
userbuffersManager.h	[None][feat] Add NCCL Symmetric Integration for All Reduce (#4500)	2025-08-07 17:28:14 -07:00
utils.h	Update TensorRT-LLM (#2783)	2025-02-13 18:40:22 +08:00