Yanchao Lu
b4b1185af3
[ https://nvbugs/5450855 ][fix] Cherry pick #6700 and #6702 from main ( #6808 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-12 18:11:47 +08:00
Yanchao Lu
751d5f175c
[None][infra] Pin the version for triton to 3.3.1 ( #6508 ) ( #6519 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
2025-08-01 15:04:38 +08:00
Zac Patel
3bf405f6c3
[doc] Update perf_overview.md for release 0.21 ( #6270 )
...
Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com>
2025-07-31 12:13:38 +08:00
Perkz Zheng
92397476d3
[ https://nvbugspro.nvidia.com/bug/5415268 ] fix illegal smem access with chunked attention ( #6401 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
2025-07-30 11:33:22 +08:00
QI JUN
418892e270
doc: update release notes ( #6438 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-07-29 04:22:28 -04:00
Ivy Zhang
94de3c11b0
tests: Add llama4 functional cases ( #6392 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-07-29 17:49:43 +10:00
brb-nv
eb157accac
test: Relax Gemma3 unit test thresholds ( #6016 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-28 21:24:34 +08:00
QI JUN
44f6db8c1c
doc: update release notes ( #6324 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-07-28 16:05:50 +08:00
Mike Iovine
a55c631dab
[fix] Cherry pick "[TRTLLM-6262] Fix Llama4 Scout FP4 crash issue" ( #6267 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Passing all test stages, failing test seems like an infra issue.
2025-07-22 19:00:37 -07:00
QI JUN
1209001bac
doc: update known issues ( #6247 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-07-22 17:27:31 +08:00
Pengyun Lin
ab4e178bef
[fix]: Revert commit 388b491 ( #6143 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-07-18 17:38:13 +08:00
bhsueh_NV
9323de6e37
[Doc][Qwen3] update qwen3 into support-matrix ( #6161 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-07-18 11:23:30 +08:00
Yanchao Lu
eeca3ad084
[None][infra] Cherry-pick #6128 and #6130 from main branch ( #6151 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
2025-07-18 11:02:11 +08:00
pcastonguay
4d0bcbcb2d
fix: Fix triton backend build [nvbug 5396469] ( #6098 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-07-16 16:30:16 -04:00
QI JUN
f6db521e95
add release notes for 0.21 release ( #6049 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-16 16:54:14 +08:00
Zhanrui Sun
bce13bb436
Cherry Pick: PR #6076 ( #6088 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
2025-07-16 14:29:45 +08:00
Yiqing Yan
69a15c8c74
[None] - Waive L0 tests ( #6082 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-16 13:14:16 +08:00
nv-guomingz
63f4a7ad32
[TRTLLM-6495] doc: add disclaimer for 3rd party software installation. ( #6039 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-15 13:33:03 +08:00
Iman Tabrizian
2e7da20934
[fix] Release slots with spec decode + disagg ( #5975 )
...
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
2025-07-14 16:15:03 -07:00
Yi Zhang
332a65b837
[nvbugs/5368410][fix] Disable moe allreduce for multi node ( #5918 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-07-14 10:06:29 +08:00
Fanrong Li
bed78a2575
fix: fix index out of bounds error in spec decoding ( #5954 )
2025-07-14 09:41:27 +08:00
Fanrong Li
4905cac8fd
[nvbugs/5333742] fix MTP illegal memory access in cuda graph warmup ( #5947 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-07-12 21:55:44 +08:00
Zheng Duan
e831673f80
fix: timeout and broken pipe in disagg and worker tests ( #5827 )
...
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-07-11 12:42:47 +08:00
Nikita Korobov
aeea5b3a56
fix: [5328141] increase tolerance for test_fp8_block_scale_gemm ( #5849 )
...
Signed-off-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>
2025-07-10 15:44:19 +02:00
Yan Chunwei
bfa917ff9b
fix [nvbug/5351244]: address remote mpi session submit ( #5664 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-10 21:22:41 +09:00
amirkl94
8429c8b139
chore: Port leftover 0.20 ( #5907 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Yingge He <yinggeh@nvidia.com>
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Yingge He <157551214+yinggeh@users.noreply.github.com>
Co-authored-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: zpatel <22306219+zbpatel@users.noreply.github.com>
2025-07-10 13:48:12 +02:00
Bo Li
8b7422c5b7
fix: [nvbugs/5351130] Adjust DSV3-Lite tests free_gpu_memory_fraction to 0.75 to prevent OOM on CI. ( #5896 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-07-10 19:16:38 +08:00
amirkl94
cd7aeec061
tests: Fix lora perf test ( #5875 )
...
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
2025-07-10 10:56:46 +02:00
brb-nv
ff9aabb038
test: Add Gemma3 unit tests to CI in release/0.21 ( #5899 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-10 09:47:49 +02:00
Zhenhuan Chen
d9e265d5e7
[ https://nvbugs/5355316 ] fix: update torch.compile option to fix triton store_cubin error ( #5865 )
...
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
2025-07-10 12:16:57 +09:00
Netanel Haber
ce048eccd3
cherry-pick: [fix: nvbugs/5355493] Correctly clamp max sequence len to max attention window ( #5874 )
...
Signed-off-by: Netanel Haber <nhaber@nvidia.com>
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
2025-07-09 19:11:17 +02:00
Robin Kobus
fd94d3cbf5
[nvbugs/5345391] fix: chunked prefill + overlap scheduling ( #5761 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-07-09 17:59:45 +02:00
Pengyun Lin
2e21e3421f
[nvbug 5327706][fix] fix mgmn postprocess error ( #5835 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-07-09 17:08:03 +08:00
ruodil
cbcc55e073
test: remove duplicate cases in perf sanity test ( #5870 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-07-09 15:36:22 +10:00
Yi Zhang
39ad6023b7
doc: Update gb200 doc ( #5840 )
...
Signed-off-by: yizhan <187001205+yizhang-nv@users.noreply.github.com>
2025-07-08 20:22:06 +09:00
Bo Li
6d7a2cb1c5
fix: [ https://nvbugs/5351130 ][ https://nvbugs/5333654 ] Unwaive for bug 5351130 and 5333654. ( #5821 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-07-08 18:12:48 +08:00
QI JUN
f8b4077654
[nvbugs/5326453] Avoid nesting NCCL grouping in allgather OP ( #5789 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-07-08 15:39:27 +09:00
Bo Li
6062dc675f
fix: [ https://nvbugspro.nvidia.com/bug/5345215 ] Unwaive for bug 5345215. ( #5606 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-07-08 13:11:08 +09:00
Perkz Zheng
5a50e2b26b
[ https://nvbugspro.nvidia.com/bug/5355054 ] fallback to cubins for fp8 fmha kernels on Ada. ( #5779 )
...
Signed-off-by: Qidi Sang <200703406+qsang-nv@users.noreply.github.com>
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: qsang-nv <200703406+qsang-nv@users.noreply.github.com>
2025-07-08 10:35:38 +08:00
Yan Chunwei
97f4c9e24f
[nvbug5266240] chore: unwaive test_llm_with_dummy_weights ( #5744 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-07 22:13:34 +08:00
Pengyun Lin
0a0ac7b5dc
[nvbug 5304752][fix] enhance _check_arguments to filter illegal requests for pytorch backend ( #5541 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-07-07 19:26:13 +08:00
QI JUN
d47ac4e3e5
cherry pick #5416 ( #5776 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-07-07 17:19:38 +08:00
QI JUN
4fa9284612
[nvbug/5302638][nvbugs/5310314] fix _handle_cancelled_requests ( #5532 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-07-07 16:51:24 +08:00
Martin Marciniszyn Mehringer
06f83277b2
Fix docker cache mount ( #5763 )
...
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
2025-07-07 00:18:55 -07:00
QI JUN
3a58db88c8
fix _pad_attention_dp_dummy_request ( #5583 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-07-07 14:13:54 +08:00
Pengyun Lin
7524c77e1e
[nvbug 5004744][fix] rewrite completion API to avoid repetitive tokens ( #5201 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-07-07 15:06:49 +09:00
brb-nv
9106b5d9a5
fix: Skip rope scaling for local layers in Gemma3 VLM ( #5773 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-07 13:36:23 +08:00
ruodil
6103466de2
test: fix some test failure and add llama_nemotron models in perf sanity test, add more torch cases ( #5693 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-07 13:11:41 +10:00
Yanchao Lu
aa4d0f04eb
[Infra] - Always use x86 image for the Jenkins agent ( #5756 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-06 10:25:20 +08:00
Iman Tabrizian
518915b5c6
[nvbug/5337601][fix] Fix disagg + speculative decoding ( #5558 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Co-authored-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-07-04 12:52:35 -04:00