JieXin Liang
664bf95892
[fix] improve fp4_block_scale_moe_runner type check (#5681)
Signed-off-by: JieXin Liang <Alcanderian@users.noreply.github.com>
Co-authored-by: ChristinaZ <83400082+ChristinaZ@users.noreply.github.com>
2025-07-08 14:32:14 +09:00
liji-nv
95978e3044
[fix] https://nvbugs/5333654 Unwaive to check ci status and improve torch compile multi-gpu coverage (#5700)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-07-08 12:42:15 +08:00
nv-guomingz
0be41b6524
Revert "chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie…" (#5818)
2025-07-08 13:15:30 +09:00
Yechan Kim
5bc3a15f10
feat: add MultimodalParams & putting all multimodal params into it and refactor HyperCLOVAX & Qwen2/2.5-VL (#5522)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-07-07 18:03:12 -07:00
nv-guomingz
5a8173c121
chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… (#5795)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-08 08:52:36 +08:00
davidclark-nv
a1235ee978
[feat] Adds optional module cache for TRT-LLM Gen Gemm interfaces (#5743)
Signed-off-by: David Clark <215764518+davidclark-nv@users.noreply.github.com>
Co-authored-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>
2025-07-07 13:34:55 -07:00
Omer Ullman Argov
1191555cce
[ci] speedup fused moe tests (#5726)
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-07-07 18:03:15 +03:00
Robin Kobus
30a19fcf7c
[TRTLLM-6291] feat: Add user-provided speculative decoding support (#5204)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-07-07 16:30:43 +02:00
Tailing Yuan
85b4a6808d
Refactor: move DeepEP from Docker images to wheel building (#5534)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-07-07 22:57:03 +09:00
Daniel Cámpora
1260e2f33f
feat: Optimize TRTLLM Sampler perf single beam single step (#5550)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-07-07 15:44:47 +02:00
DylanChen-NV
5ca2b9bb15
[TRTLLM-5812][feat] support FP8 row-wise dense GEMM in torch flow (#5615)
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
2025-07-07 18:04:57 +08:00
Yi Zhang
ed1b3c884a
fix: Adjust free GPU memory fraction in KvCacheConfig for DeepSeek R1 tests (#5774)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-07-07 18:38:54 +09:00
Yan Chunwei
dfce61f4b9
[TRTLLM-5530][BREAKING CHANGE] refactor: LLM arglist rename mixed_sampler to enable_mixed_sampler (#5751)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-07 17:05:14 +08:00
xinhe-nv
ded38ebdbd
test: [CI] remove closed bugs (#5770)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-07 18:06:07 +10:00
ChristinaZ
12d8c7d129
Refactor the topk parallelization part for the routing kernels (#5567)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-07-07 15:53:25 +08:00
Bo Li
9db2e9ee47
fix: [nvbug/5368507] Fix test_generate_with_seed CI failure. (#5772)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-07-07 14:58:32 +08:00
Zheng Duan
de10774c2e
chore: log stack trace on error in openai server (#5749)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-07-07 14:54:36 +08:00
Yanchao Lu
092e0eb86a
[Infra] - Fix a syntax issue in the image check (#5775)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-07 11:19:59 +09:00
bhsueh_NV
85e934a7fe
[Doc] update the document of qwen3 and cuda_graph usage (#5703)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-07-07 09:44:25 +08:00
Daniel Stokes
ec6c7dff1a
feat: Add support for MXFP8xMXFP4 in pytorch (#5535)
Signed-off-by: Daniel Stokes <40156487+djns99@users.noreply.github.com>
2025-07-06 15:32:06 -07:00
Yiteng Niu
66f299a205
[TRTLLM-5878] add stage for image registration to nspect (#5699)
Signed-off-by: Yiteng Niu <6831097+niukuo@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-06 23:52:54 +08:00
Yanchao Lu
2013034948
[Test] - Waive or fix few known test failures (#5769)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-06 21:14:16 +08:00
Robin Kobus
ae27261094
refactor: decoding inputs (#5679)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-07-06 08:21:02 +02:00
Yanchao Lu
d95ae1378b
[Infra] - Always use x86 image for the Jenkins agent and few clean-ups (#5753)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-06 10:25:57 +08:00
Julien Debache
6bddaf6df6
chore: Improve documentation of Kv_block_array (#5765)
Signed-off-by: Julien Debache <julien.debache@hotmail.com>
2025-07-05 22:25:27 +02:00
Xianjie Qiao
b1976c2add
Add wide-ep benchmarking scripts (#5760)
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
Signed-off-by: Xianjie Qiao <5410381+qiaoxj07@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-07-05 19:29:39 +08:00
Xianjie Qiao
089fd55eda
Add dummy all_reduce for kernel breakdown (#5745)
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
2025-07-05 13:08:58 +09:00
jthomson04
1b588f8390
feat: KV events for sliding window attention (#5580)
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2025-07-05 06:05:20 +08:00
Frank
d61893dc77
[fix] Update to properly set cuda graphs in trtllm-bench overrides. (#5634)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-07-05 05:19:16 +09:00
Stefan Niebler
d1112aac37
[TRTLLM-3442] feat: added beam search support to the PyTorch Workflow (#5333)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2025-07-05 01:35:13 +09:00
Chuang Zhu
ffc0b8f5da
Cache transceiver support VSWA (#5505)
Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Co-authored-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-07-05 01:18:42 +09:00
HuiGao-NV
3ed3bbcb5d
Fix: pass allreduce strategy to pytorchConfig (#5746)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-07-04 21:32:13 +09:00
Yiqing Yan
7f3ea058f0
[Infra] - Waive L0 flaky test (#5759)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-04 19:25:12 +09:00
Shunkangz
32339d1b20
Raise shut down error for each request (#4936)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-07-04 18:58:24 +09:00
ixlmar
471bf0b4fc
fix: check file exists in dev container script (#5755)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-07-04 10:29:17 +02:00
xinhe-nv
3869b969a6
test: [CI] Add failed cases into waives.txt (#5718)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-04 17:24:48 +09:00
Faraz
81c0764012
Cherry pick "[NVBUG:5355009] Modify check for fuse_fp4_quant on SM120 (#5724)
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
2025-07-04 16:53:20 +09:00
Robin Kobus
07f9cf1519
fix: Improve chunking test and skip empty kernel calls (#5710)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-07-04 09:08:15 +02:00
Yiqing Yan
b8fef809ae
[Infra] - Waive L0 test (#5748)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-04 15:04:49 +08:00
Tailing Yuan
e134a52e07
Perf: reduce DeepEPLowLatency memory and time (#5712)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-07-04 14:46:28 +08:00
nv-guomingz
c434147366
chore: update doc by replacing use_cuda_graph with cuda_graph_config (#5680)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-04 15:39:15 +09:00
Yuan Tong
32b244af38
feat: reduce unnecessary kernel generation (#5476)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-07-04 14:37:49 +08:00
Shunkangz
a79d8c9f5e
Fix none response in PD (#5422)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-07-04 14:25:10 +08:00
Netanel Haber
134b2383ff
[fix: nvbugs/5355493] Correctly clamp max sequence len to max attention window (#5720)
Signed-off-by: Netanel Haber <nhaber@nvidia.com>
2025-07-04 08:16:25 +02:00
Linda
94f0252b46
Doc: Update invalid hugging face URLs (#5683)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-07-04 13:14:13 +08:00
Emma Qiao
a0135c0f6f
[Infra] - Waive failed cases on release/0.21 (#5674)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-04 13:14:13 +08:00
brb-nv
cdaa6abce7
fix: Investigate Gemma3 1B decoder output discrepancy (#5564)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-04 13:14:13 +08:00
Frank
819ae903de
[https://nvbugspro.nvidia.com/bug/5351333][fix] Update to chunking calculation. (#5625)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-07-04 13:14:13 +08:00
Kaiyu Xie
ab488a5a5d
doc: Fix outdated config in DeepSeek best perf practice doc (#5638)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-07-04 13:14:13 +08:00
Yi Zhang
73d30a23c7
test: add more tests for GB200 with 8 GPUs/2 nodes in L0 tests (#5397)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-07-04 13:14:13 +08:00