TensorRT-LLMs/cpp/tensorrt_llm/kernels/userbuffers
liji-nv b168adba70
feat: Add NVFP4 UB pattern optimization pass in torch compile (#3371)
* Add an additional flag for UB fp4 pattern to avoid inverting the scale
* Add NVFP4 related UB patterns

Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>

* Update atol; some points fail for B200 umbriel.

Signed-off-by: liji-nv <59594262+liji-nv@users.noreply.github.com>

2025-04-11 21:25:29 +08:00
CMakeLists.txt feat: Introduce UB allocator for pytorch flow (#3257) 2025-04-08 18:39:49 +08:00
ipcsocket.cpp Update TensorRT-LLM (#2532) 2024-12-04 21:16:56 +08:00
ipcsocket.h Update TensorRT-LLM (#2532) 2024-12-04 21:16:56 +08:00
ub_allocator.cpp feat: Introduce UB allocator for pytorch flow (#3257) 2025-04-08 18:39:49 +08:00
ub_allocator.h feat: Introduce UB allocator for pytorch flow (#3257) 2025-04-08 18:39:49 +08:00
ub_interface.cpp feat: Introduce UB allocator for pytorch flow (#3257) 2025-04-08 18:39:49 +08:00
ub_interface.h feat: Introduce UB allocator for pytorch flow (#3257) 2025-04-08 18:39:49 +08:00
userbuffers-host.cpp Update TensorRT-LLM (#2792) 2025-02-18 21:27:39 +08:00
userbuffers.cu feat: Add NVFP4 UB pattern optimization pass in torch compile (#3371) 2025-04-11 21:25:29 +08:00
userbuffers.h None - Add one-shot version for UB AR NORM FP16/BF16 (#2995) 2025-03-31 11:16:03 +08:00
userbuffersManager.cpp feat: Introduce UB allocator for pytorch flow (#3257) 2025-04-08 18:39:49 +08:00
userbuffersManager.h feat: Introduce UB allocator for pytorch flow (#3257) 2025-04-08 18:39:49 +08:00
utils.h Update TensorRT-LLM (#2783) 2025-02-13 18:40:22 +08:00