TensorRT-LLMs/tests/integration/test_lists
Yukun He aa38e28cfa
fix: [nvbug/5241627] Fix AllReduce kernel hang issue when both tp and pp are enabled. (#3988)
* Fix AllReduce kernel hang issue when both tp and pp are enabled.
Allocate one workspace for each pp rank to avoid potential race.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>

* update waive list

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>

---------

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-05-05 11:33:25 +08:00
dev Update (#2978) 2025-03-23 16:39:35 +08:00
qa chore: refactor llmapi e2e tests (#3803) 2025-05-05 07:37:24 +08:00
test-db chore: refactor llmapi e2e tests (#3803) 2025-05-05 07:37:24 +08:00
waives.txt fix: [nvbug/5241627] Fix AllReduce kernel hang issue when both tp and pp are enabled. (#3988) 2025-05-05 11:33:25 +08:00