ruodil
cbcc55e073
test: remove duplicate cases in perf sanity test (#5870)
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-07-09 15:36:22 +10:00
Yi Zhang
39ad6023b7
doc: Update gb200 doc (#5840)
...
Signed-off-by: yizhan <187001205+yizhang-nv@users.noreply.github.com>
2025-07-08 20:22:06 +09:00
Bo Li
6d7a2cb1c5
fix: [https://nvbugs/5351130][https://nvbugs/5333654] Unwaive for bug 5351130 and 5333654. (#5821)
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-07-08 18:12:48 +08:00
QI JUN
f8b4077654
[nvbugs/5326453] Avoid nesting NCCL grouping in allgather OP (#5789)
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-07-08 15:39:27 +09:00
Bo Li
6062dc675f
fix: [https://nvbugspro.nvidia.com/bug/5345215] Unwaive for bug 5345215. (#5606)
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-07-08 13:11:08 +09:00
Perkz Zheng
5a50e2b26b
[https://nvbugspro.nvidia.com/bug/5355054] fallback to cubins for fp8 fmha kernels on Ada. (#5779)
...
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: qsang-nv <200703406+qsang-nv@users.noreply.github.com>
2025-07-08 10:35:38 +08:00
Yan Chunwei
97f4c9e24f
[nvbug5266240] chore: unwaive test_llm_with_dummy_weights (#5744)
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-07 22:13:34 +08:00
Pengyun Lin
0a0ac7b5dc
[nvbug 5304752][fix] enhance _check_arguments to filter illegal requests for pytorch backend (#5541)
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-07-07 19:26:13 +08:00
QI JUN
d47ac4e3e5
cherry pick #5416 (#5776)
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-07-07 17:19:38 +08:00
QI JUN
4fa9284612
[nvbug/5302638][nvbugs/5310314] fix _handle_cancelled_requests (#5532)
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-07-07 16:51:24 +08:00
Martin Marciniszyn Mehringer
06f83277b2
Fix docker cache mount (#5763)
...
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
2025-07-07 00:18:55 -07:00
QI JUN
3a58db88c8
fix _pad_attention_dp_dummy_request (#5583)
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-07-07 14:13:54 +08:00
Pengyun Lin
7524c77e1e
[nvbug 5004744][fix] rewrite completion API to avoid repetitive tokens (#5201)
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-07-07 15:06:49 +09:00
brb-nv
9106b5d9a5
fix: Skip rope scaling for local layers in Gemma3 VLM (#5773)
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-07 13:36:23 +08:00
ruodil
6103466de2
test: fix some test failure and add llama_nemotron models in perf sanity test, add more torch cases (#5693)
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-07 13:11:41 +10:00
Yanchao Lu
aa4d0f04eb
[Infra] - Always use x86 image for the Jenkins agent (#5756)
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-06 10:25:20 +08:00
Iman Tabrizian
518915b5c6
[nvbug/5337601][fix] Fix disagg + speculative decoding (#5558)
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Co-authored-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-07-04 12:52:35 -04:00
Yi Zhang
5ac92bb8ff
[nvbugs/5336321][fix] Enable attention dp = False test case, Fix TRTLLM Gen Moe workspace allocation (#5463)
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Signed-off-by: yizhan <187001205+yizhang-nv@users.noreply.github.com>
2025-07-04 23:23:41 +09:00
Yiqing Yan
3e44db11c9
[Infra][nvbugs/5370968] - Unwaive l0 test (#5750)
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-04 15:27:53 +08:00
Yukun He
b0354ef43c
[5321981] fix: Fix the Llama3.1 405B hanging issue. (#5698)
...
Correct the output shape of the fusedLayerNormPlugin.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-07-04 12:29:19 +08:00
Yi Zhang
53394e0030
test: Move some of the test from post merge to pre-merge, update dgx b200 test case (#5640)
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-07-04 13:26:53 +09:00
brb-nv
2b66fe8fbd
[nvbug/5341178][fix] Fix OOM in Llama 4 accuracy test (#5735)
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-04 10:55:34 +08:00
Dom Brown
2aacdba1e4
[TRTLLM-6100] fix: Nvbug 5356427: autotuned TRTLLM Gen fp8 block scale MoE illegal memory access (#5676)
...
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-07-04 10:38:08 +08:00
Faraz
8a8d2e9901
[NVBUG:5355009] Modify check for fuse_fp4_quant on SM120 (#5651)
...
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
2025-07-03 22:08:15 +09:00
Linda
14f938e510
Doc: Update invalid hugging face URLs (#5683)
...
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-07-03 09:37:01 +02:00
Emma Qiao
2f9d0619c3
[Infra] - Waive failed cases on release/0.21 (#5674)
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-02 22:23:54 -04:00
brb-nv
a3c0cf02ce
fix: Investigate Gemma3 1B decoder output discrepancy (#5564)
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-03 09:55:25 +08:00
Frank
92d3a2d0e0
[https://nvbugspro.nvidia.com/bug/5351333][fix] Update to chunking calculation. (#5625)
...
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-07-02 17:48:02 +08:00
bhsueh_NV
d5606b062a
fix: [https://nvbugs/5355219] Fix bug of Qwen3 235B CI on dgx_gb200 (#5602)
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-07-02 10:07:01 +08:00
Kaiyu Xie
682b164b9b
doc: Fix outdated config in DeepSeek best perf practice doc (#5638)
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-07-01 04:58:50 -04:00
Yi Zhang
aa0b9278d2
test: add more tests for GB200 with 8 GPUs/2 nodes in L0 tests (#5397)
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-07-01 01:06:47 -04:00
Zheng Duan
1824c44004
[nvbug 5300551] test: increase block count in eviction test (#5465)
...
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-07-01 10:48:25 +08:00
nv-guomingz
9fe1dd6be1
fix: https://nvbugs/5362398 (#5609)
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-30 13:29:40 -04:00
Yan Chunwei
d6c81bad97
fix [nvbug5351244]: test_mpi_session submit sync/async (#5608)
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-01 00:48:59 +08:00
Emma Qiao
647e070ed6
[Infra][release/0.21]Update nccl to 2.27.5 (#5539)
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-06-29 20:50:15 +08:00
Venky
4fc0666daa
[cherry-pick] [CI] Waive test_fp8_block_scales_4gpus[ep4-mtp_nextn=0-fp8kv=True-attention_dp=True-cuda_graph=True-overlap_scheduler=True-torch_compile=False] (#5553)
...
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2025-06-28 01:15:04 +08:00
ixlmar
abb7357f25
[TRTLLM-5989, TRTLLM-5991, TRTLLM-5993] doc: Update container instructions (#5490)
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-06-27 07:09:41 -07:00
Yan Chunwei
b78ad754c8
ci: unwaive llmapi launch test (#5281)
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-27 14:10:45 +08:00
Emma Qiao
e2054bb2aa
[Infra][release/0.21] - waive failed tests (#5537)
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-06-27 13:58:13 +08:00
ixlmar
312fd47f84
fix: constrain grepping in docker/Makefile (#5493)
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-06-26 13:44:40 +02:00
Kaiyu Xie
30a2a8b81c
doc: Fix benchmark cmd in disagg scripts (#5516)
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-06-26 17:23:24 +08:00
ixlmar
a811077f90
fix: fix regression in LOCAL_USER (#5517)
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-06-26 11:10:55 +02:00
Anurag Mukkara
c2799d0465
[nvbug/5354825] Fix nougat test image url (#5496)
...
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
2025-06-26 10:10:18 +08:00
Yan Chunwei
87ead4ecbe
[nvbug 5273941] fix: broken cyclic reference detect (#5417)
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-26 07:35:35 +08:00
Martin Marciniszyn Mehringer
fc64f139e4
Fix permission for local user issues in NGC docker container. (#5373)
...
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
2025-06-25 14:10:20 +02:00
Emma Qiao
b6d23d58c4
[Infra] - Waive failed tests on release/0.21 (#5477)
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-06-25 19:01:55 +08:00
HuiGao-NV
5cd87bee41
tests: Set kv cache free memory fraction in test case (#5462)
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-06-25 16:27:46 +08:00
ruodil
5e50fcc51b
test: set enable_attention_dp=True in default deepseek settings (#5461)
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-25 14:21:14 +08:00
Wanli Jiang
af5839303d
feat: TRTLLM-5941 Upgrade xgrammar to 0.1.18 (#5364)
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-06-25 14:10:50 +08:00
brb-nv
32f50ded17
nvbugs-5331031; nvbugs-5344203 - address intermittent issues with Mistral Small multimodal for BS=8 (#5453)
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-06-25 11:45:14 +08:00