[https://nvbugs/5688388][fix] fix: Reducing num request in disagg test to speed up (#9598)

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Patrice Castonguay 2025-12-02 12:48:53 -05:00 committed by GitHub
parent a560ba5546
commit 3991aa9c72
GPG Key ID: B5690EEEBB952194


@@ -351,8 +351,8 @@ def test_disaggregated_llama_context_capacity(model, enable_cuda_graph,
     max_tokens = 25
     requests = []
-    # Send 256 requests to make sure the context worker is saturated
-    for _ in range(256):
+    # Send 32 requests to make sure the context worker is saturated
+    for _ in range(32):
         requests.append(
             (prompt, SamplingParams(max_tokens=1, ignore_eos=True),
              DisaggregatedParams(request_type="context_only")))
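The reasoning behind the change can be illustrated with a toy model. This is a sketch, not TensorRT-LLM code. As long as the number of context-only requests exceeds what the context worker can admit at once, the worker stays saturated, and every extra request only adds scheduling rounds to the test. The capacity value `CAPACITY = 8` and the helper names `is_saturated` and `context_batches` are hypothetical. The real limit depends on the worker configuration used by the test, which this diff does not show.

```python
import math

# Hypothetical per-round admission limit of the context worker.
# The real value depends on the test's worker configuration (not shown in the diff).
CAPACITY = 8


def is_saturated(num_requests: int, capacity: int) -> bool:
    """True when more requests arrive than the worker can admit at once,
    so some of them must wait. This is the condition the test comment relies on."""
    return num_requests > capacity


def context_batches(num_requests: int, capacity: int) -> int:
    """Number of rounds a worker admitting at most `capacity` requests
    per round needs to drain the queue (toy model, one round per batch)."""
    return math.ceil(num_requests / capacity)


if __name__ == "__main__":
    for n in (256, 32):
        print(n, is_saturated(n, CAPACITY), context_batches(n, CAPACITY))
```

Under this assumption, both 256 and 32 requests saturate the worker, but 32 requests drain in 4 rounds instead of 32. Any request count above the worker's capacity keeps the saturation property the test checks, so the smaller count only shortens the run.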