[None][doc] add Llama PP known issue to release note (#7959)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
commit 93392cefd2
parent ae8270b713
@@ -178,6 +178,7 @@ TensorRT LLM 1.0 brings 2 major changes: the PyTorch-based architecture is now s
### Known Issues
- When using disaggregated serving with pipeline parallelism and KV cache reuse, a hang can occur. This will be fixed in a future release; in the meantime, disabling KV cache reuse avoids the hang.
- Running multi-node cases where each node has just a single GPU is known to fail. This will be addressed in a future release.
- For the Llama 3.x and Llama 4 models, there is a known issue with pipeline parallelism when using FP8 and NVFP4 weights. As a workaround, set the environment variable `TRTLLM_LLAMA_EAGER_FUSION_DISABLED=1` before launching, as shown in the sketch below.
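
A minimal sketch of applying the workaround from a shell, assuming the server is started with `trtllm-serve`; the model name and launch options below are illustrative placeholders, not part of the release note:

```bash
# Workaround for the Llama 3.x / Llama 4 pipeline-parallelism issue with FP8/NVFP4 weights:
# disable eager fusion via the environment before starting the server.
# The model name is a placeholder; substitute your own checkpoint and launch options.
export TRTLLM_LLAMA_EAGER_FUSION_DISABLED=1
trtllm-serve meta-llama/Llama-3.1-8B-Instruct
```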
## TensorRT-LLM Release 0.21.0