TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-23 12:12:39 +08:00


liji-nv b168adba70 feat: Add NVFP4 UB pattern optimization pass in torch compile (#3371)	2025-04-11 21:25:29 +08:00

* feat: Add NVFP4 UB pattern optimization pass in torch compile
* Add an additional flag for UB fp4 pattern to avoid inverse the scale
* Add NVFP4 related UB patterns
  Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
* Update atol, some points fails for B200 umbriel.
  Signed-off-by: liji-nv <59594262+liji-nv@users.noreply.github.com>

---------

Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: liji-nv <59594262+liji-nv@users.noreply.github.com>
CMakeLists.txt	feat: Introduce UB allocator for pytorch flow (#3257)	2025-04-08 18:39:49 +08:00
ipcsocket.cpp	Update TensorRT-LLM (#2532)	2024-12-04 21:16:56 +08:00
ipcsocket.h	Update TensorRT-LLM (#2532)	2024-12-04 21:16:56 +08:00
ub_allocator.cpp	feat: Introduce UB allocator for pytorch flow (#3257)	2025-04-08 18:39:49 +08:00
ub_allocator.h	feat: Introduce UB allocator for pytorch flow (#3257)	2025-04-08 18:39:49 +08:00
ub_interface.cpp	feat: Introduce UB allocator for pytorch flow (#3257)	2025-04-08 18:39:49 +08:00
ub_interface.h	feat: Introduce UB allocator for pytorch flow (#3257)	2025-04-08 18:39:49 +08:00
userbuffers-host.cpp	Update TensorRT-LLM (#2792)	2025-02-18 21:27:39 +08:00
userbuffers.cu	feat: Add NVFP4 UB pattern optimization pass in torch compile (#3371)	2025-04-11 21:25:29 +08:00
userbuffers.h	None - Add one-shot version for UB AR NORM FP16/BF16 (#2995)	2025-03-31 11:16:03 +08:00
userbuffersManager.cpp	feat: Introduce UB allocator for pytorch flow (#3257)	2025-04-08 18:39:49 +08:00
userbuffersManager.h	feat: Introduce UB allocator for pytorch flow (#3257)	2025-04-08 18:39:49 +08:00
utils.h	Update TensorRT-LLM (#2783)	2025-02-13 18:40:22 +08:00