Chenghao Zhang | 0b748d5bba | [None][chore] update flashinfer to 0.6.0 (#10522) | 2026-01-16 16:22:06 -05:00
  Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>

Chenghao Zhang | b6acd96616 | [None][fix] AutoDeploy: Fix the nvfp4 fused_moe (#10727) | 2026-01-16 12:04:40 -08:00
  Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>

Tian Zheng | cfebfbb505 | [https://nvbugs/5783509][fix] Fix a hang issue when enabling skip softmax on Blackwell (#10490) | 2026-01-16 18:59:54 +08:00
  Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>

Chuang Zhu | 7e2cbc0756 | [https://nvbugs/5598674][fix] enable partial reuse in gemma and gpt oss test (#10559) | 2026-01-16 10:26:15 +08:00
  Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

Anish Shanbhag | faa80e73fd | [None][feat] Auto download speculative models from HF for pytorch backend, add speculative_model field alias (#10099) | 2026-01-14 21:06:07 -08:00
  Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>

Wanli Jiang | 73d1840c12 | [TRTLLM-10245][feat] Add accuracy tests for super v3 fp8 model (#10482) | 2026-01-15 10:07:02 +08:00
  Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

jtao peng | 211c44b951 | [None][feat] Adding torch ext API for FusedAddRMSNormQuant kernel (#9905) | 2026-01-15 07:29:15 +08:00
  Signed-off-by: jintaop <jintaop@nvidia.com>

Bo Li | 582dec5bb5 | [https://nvbugs/5774869][infra] Use 2 GPUs to test skip softmax attention on H100. (#10420) | 2026-01-14 07:03:01 -05:00
  Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

jmydurant | e7882d5c74 | [None][feat] MiniMax M2 support (#10532) | 2026-01-14 17:38:58 +08:00
  Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>

xinhe-nv | 07d9390e9b | [None][test] add test into qa test list (#10627) | 2026-01-13 22:43:00 -05:00
  Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>

Balaram Buddharaju | ccdfa43a6e | [https://nvbugs/5791900][fix] Fix HelixCpMnnvlMemory init with PP (#10533) | 2026-01-13 15:48:42 -05:00
  Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>

Guoming Zhang | bdaee87895 | [TRTLLM-10060][feat] Enable attention dp for Nemotron Super v3. (#10347) | 2026-01-13 17:13:55 +08:00
  Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

Suyog Gupta | a1385243e1 | [#10580][fix] re-enable NemotronH MOE MMLU test (#10594) | 2026-01-12 09:26:07 -08:00
  Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>

Wanli Jiang | 11da7e3605 | [None][fix] Solve pillow version conflict (#10537) | 2026-01-12 04:05:54 -05:00
  Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

William Zhang | ff7eb93f31 | [https://nvbugs/5669097][tests] Add MMMU test for mistral small (#10530) | 2026-01-09 16:09:28 -08:00
  Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>

Yechan Kim | 7295af68ba | [None][fix] Enable AttentionDP on Qwen3-VL and fix test (#10435) | 2026-01-10 00:13:26 +09:00
  Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

Jie Li | 627d306df9 | [None][chore] remove some model support; add device constraint (#10563) | 2026-01-09 09:36:23 -05:00
  Signed-off-by: Jie Li <lijie@nvidia.com>

JadoTu | 4c498bfe58 | [TRTLLM-9676][fix] Fix mamba_cache_manager when enabling cuda_graph_padding and let test cover this case (#9873) | 2026-01-09 14:50:16 +08:00
  Signed-off-by: JadoTu <107457950+JadoTu@users.noreply.github.com>

bhsueh_NV | bea61bb17d | [None][fix] Mistral large 3 few code refine (#10405) | 2026-01-08 06:38:49 -05:00
  Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

Fanrong Li | a34aa63685 | [https://nvbugs/5767223][feat] add pp support for DeepSeek-v3.2 (#10449) | 2026-01-07 12:29:51 +08:00
  Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>

Ivy Zhang | 4a1b2e23b3 | [https://nvbugs/5698434][test] add qwen3-4b accuracy test case (#10382) | 2026-01-06 21:56:34 -05:00
  Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

Ivy Zhang | 1e828587e5 | [TRTLLM-9896][test] add vswa test cases coverage (#10146) | 2026-01-06 02:02:29 -05:00
  Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

Ivy Zhang | 22a1d31a27 | [None][test] update test case constraint (#10381) | 2026-01-06 12:28:59 +08:00
  Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

Mike Iovine | 91ff46d418 | [https://nvbugs/5745152][fix] Unwaive gpt oss spec decode test (#10370) | 2026-01-05 16:06:58 -05:00
  Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>

Mike Iovine | 7a2dab8e85 | [https://nvbugs/5695984][fix] Unwaive llama3 eagle test (#10092) | 2026-01-05 16:03:35 -05:00
  Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>

Gal Hubara-Agam | e98c27ee4f | [TRTLLM-10053][feat] AutoDeploy: Add Super v3 config file, improve test runtime (#10397) | 2026-01-05 18:17:27 +02:00
  Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>

Balaram Buddharaju | a792c23dcf | [TRTLLM-9465][fix] Swap TP-CP grouping order (#10350) | 2026-01-05 20:08:03 +08:00
  Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>

xinhe-nv | b1733d56f6 | [TRTLLM-9381][test] add disag-serving kimi k2 thinking tests (#10357) | 2026-01-05 05:15:52 -05:00
  Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>

Fanrong Li | 4931c5eb3a | [None][feat] update deepgemm to the DeepGEMM/nv_dev branch (#9898) | 2026-01-05 16:43:42 +08:00
  Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>

HuiGao-NV | 2f768b76f8 | [https://nvbugs/5715568][fix] Force release torch memory when LLM is destroyed (#10314) | 2026-01-05 15:30:18 +08:00
  Signed-off-by: Hui Gao <huig@nvidia.com>

Fanrong Li | b5a1e10bc0 | [https://nvbugs/5779534][fix] fix buffer reuse for CUDA graph attention metadata (#10393) | 2026-01-05 09:43:44 +08:00
  Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>

Wanli Jiang | da0830670a | [TRTLLM-10065][feat] Add accuracy tests for super-v3 with multiple-gpus (#10234) | 2026-01-05 09:41:49 +08:00
  Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>

dongfengy | afc533193d | [None][feat] Support nvfp4 for gptoss (#8956) | 2026-01-04 08:57:44 -05:00
  Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>

Gal Hubara-Agam | f3dd6da080 | [#10056][chore] AutoDeploy: Enable Nemo SuperV3 accuracy test (#10308) | 2026-01-02 11:20:19 +02:00
  Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>

Balaram Buddharaju | 4a1b742aa0 | [TRTLLM-9467][fix] Fix PP+CP combination with helix parallelism (#10312) | 2026-01-01 13:42:53 -05:00
  Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>

Balaram Buddharaju | 0b75340223 | [https://nvbugs/5744427][fix] Make Gemma3 multimodal test fp8 (#10368) | 2026-01-01 01:11:34 -05:00
  Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>

Lucas Liebenwein | 1bbe71b3ed | [#10244][feat] AutoDeploy: separate prefill/decode in flashinfer (#10252) | 2025-12-31 17:01:24 -05:00
  Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

xinhe-nv | 1e9c153b4c | [None][fix] disable thread leak check for kimi (#10337) | 2025-12-31 01:31:37 -05:00
  Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>

Bo Li | 1f0365da36 | [None][infra] Add LongBenchV1 to trtllm-eval. (#10265) | 2025-12-30 21:39:34 +08:00
  Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>

xinhe-nv | 3e0344a53d | [None][chore] Add failed cases into waives.txt (#10301) | 2025-12-30 14:04:28 +08:00
  Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
  Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>

Yueh-Ting (eop) Chen | 9cee32ab39 | [https://nvbugs/5625990][fix] Respect VSWA scheme when doing block store for reuse and load block for reuse in KV cache manager (#10183) | 2025-12-29 14:29:14 +08:00
  Signed-off-by: eopXD <yuehtingc@nvidia.com>

Jin Li | c04563657e | [TRTLLM-7735][feat] Attention NVFP4 out support for torch compile (#9740) | 2025-12-27 00:07:20 +08:00
  Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>

dongfengy | bfc591994c | [https://nvbugs/5745152][fix] Fix some GPTOSS test setups (#10085) | 2025-12-26 17:52:40 +08:00
  Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>

bhsueh_NV | db3430f589 | [None][feat] Support VLM part for Mistral Large 3 (#10188) | 2025-12-25 11:20:58 -05:00
  Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

Balaram Buddharaju | 8c1cfc872b | [TRTLLM-9493][feat] Custom AllToAll for helix parallelism (#9986) | 2025-12-23 18:14:30 -08:00
  Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>

Perkz Zheng | c87f1a6b39 | [https://nvbugs/5503479][fix] update trtllm-gen kernels to address few bugs (#10089) | 2025-12-22 04:45:33 -05:00
  Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>

Chuang Zhu | 914dd39127 | [None][fix] disable cuda ipc on device without nvlink (L40s) for disagg test (#9735) | 2025-12-22 09:29:24 +08:00
  Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

Balaram Buddharaju | 5266475014 | [None][feat] Cudagraph updates for helix parallelism (#10141) | 2025-12-21 15:21:52 -05:00
  Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>

bhsueh_NV | cd4b4f43fa | [None][feat] Support Eagle3 on Mistral Large3 (#9971) | 2025-12-21 10:25:45 -05:00
  Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>

Balaram Buddharaju | dcd3f7b5ea | [https://nvbugs/5744427][fix] Fix accuracy test OOM (#10173) | 2025-12-21 02:03:38 -05:00
  Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>