Enwei Zhu
|
7c4777a571
|
[TRTLLM-9286][feat] Integration of CuteDSL NVFP4 grouped GEMM (#8880)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-11-18 17:40:12 -08:00 |
|
Bo Deng
|
34f845bf69
|
[TRTLLM-9287][infra] Use NIXL backend for accuracy tests (#9247)
Signed-off-by: Bo Deng <deemod@nvidia.com>
|
2025-11-18 14:46:20 -08:00 |
|
Ivy Zhang
|
ca41a71f92
|
[TRTLLM-8948][test] Add long bench case (#9165)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-11-18 04:41:48 -08:00 |
|
Tri Dao
|
fc088e642c
|
[None][feat] Support Glm4MoeForCausalLM (#8256)
Signed-off-by: Tri Dao <daominhtri0503@gmail.com>
Co-authored-by: Xuanyu Chen <xuanyuc@nvidia.com>
|
2025-11-18 09:43:21 +08:00 |
|
Chang Liu
|
bed4e95e9f
|
[https://nvbugs/5629887][fix] Add missing device count guard for DSv32 multiGPU tests (#9159)
|
2025-11-14 07:52:23 -08:00 |
|
Zhenhuan Chen
|
943b05e2d3
|
[TRTLLM-9179][feat] add pp_partition to customize each rank's layer number (#9003)
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
|
2025-11-13 10:34:17 +08:00 |
|
dongxuy04
|
9241ccaf27
|
[None][feat] Enable EPLB for trtllm-gen and cutlass backend (#8886)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
|
2025-11-12 12:30:27 -08:00 |
|
Fanrong Li
|
780d4f9dc5
|
[None][feat] Add MTP>1 support for DS-v3.2 (#9045)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
|
2025-11-12 09:56:12 -08:00 |
|
Iman Tabrizian
|
cdde15b275
|
[TRTLLM-8540][feat] Add support for disagg in DSv3.2 (#8735)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
|
2025-11-12 08:21:11 -08:00 |
|
Lucas Liebenwein
|
aca56097cb
|
[None][fix] AutoDeploy: update nano3 accuracy test (#9061)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
|
2025-11-11 12:26:31 -08:00 |
|
Wanli Jiang
|
ebdd1cc8e0
|
[TRTLLM-8119][feat] Update doc/tests/chat_template for nano-v2-vlm (#8840)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
|
2025-11-11 07:48:23 -08:00 |
|
Yechan Kim
|
0938a3ad2a
|
[https://nvbugs/5644187][fix] Llava-Next MMMU bugfix and Phi4 test bugfix (#9034)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
|
2025-11-11 10:24:31 +09:00 |
|
Fanrong Li
|
a7033a9193
|
[TRTLLM-9001][feat] add TP support for DeepSeek-V3.2 (#8943)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
|
2025-11-10 12:16:01 +08:00 |
|
QI JUN
|
1c6e490894
|
[TRTLLM-9065][chore] remove PyTorchConfig completely (#8856)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-11-06 22:37:03 -08:00 |
|
Fanrong Li
|
d246f62868
|
[https://nvbugs/5630345] [chore] skip deepseek-v3.2 fp8 kv tests on pre-Blackwell architectures (#8973)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
|
2025-11-06 03:41:37 -08:00 |
|
Fanrong Li
|
c2feed798a
|
[https://nvbugs/5630345][chore] unwaive DS-v32 nvfp4 and fp8 tests (#8887)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
|
2025-11-05 03:49:23 -08:00 |
|
Chuang Zhu
|
595f78078c
|
[https://nvbugs/5624367][fix] Fix disagg GPT-OSS test (#8870)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
|
2025-11-05 01:47:09 -08:00 |
|
xiweny
|
cae468cc8e
|
[https://nvbugs/5596343] [test] Waive flaky GPT-OSS cases (#8904)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
|
2025-11-04 03:00:00 -08:00 |
|
xiweny
|
fcac2022e2
|
[https://nvbugs/5565565] [fix] fp8 wideep support sm103 (#8228)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
|
2025-11-04 16:42:31 +08:00 |
|
Yueh-Ting (eop) Chen
|
bd1c9c0af4
|
[https://nvbugs/5625990][chore] Add test coverage for current incapability of the KV cache manager (#8829)
Signed-off-by: eopXD <yuehtingc@nvidia.com>
|
2025-11-04 16:35:45 +08:00 |
|
Mike Iovine
|
5e6f1bcd24
|
[TRTLLM-8979][test] Improve qwen3 spec dec test coverage (#8767)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
|
2025-11-03 10:12:10 -08:00 |
|
Yechan Kim
|
f48968b6cc
|
[TRTLLM-6928][fix] Refactor multimodal unittest (#8453)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
|
2025-11-03 06:01:07 -08:00 |
|
Fanrong Li
|
e9f78c687a
|
[https://nvbugs/5625962][chore] unwaive DS-v32-fp4 tests (#8853)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
|
2025-11-03 00:34:52 -08:00 |
|
dongfengy
|
6d6797c792
|
[None][test] Enhance GPT-OSS CI with GPQA Diamond and additional Spec Decoding Test (#8661)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
|
2025-11-02 16:44:02 -08:00 |
|
Fanrong Li
|
f0dc746738
|
[TRTLLM-8541][feat] Add trtllm-gen sparse MLA kernels to support per-Tensor FP8 KV Cache (#8692)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: Tracin <10434017+Tracin@users.noreply.github.com>
|
2025-10-31 14:38:31 -07:00 |
|
Yuxian Qiu
|
025d2926df
|
[https://nvbugs/5599515][fix] Fix PP bubbles. (#8687)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
|
2025-10-31 10:13:56 +08:00 |
|
Mike Iovine
|
b87448b009
|
[TRTLLM-8978][test] Remove llama 4 spec dec tests (#8766)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
|
2025-10-30 15:47:04 -04:00 |
|
HuiGao-NV
|
ae57738bae
|
[https://nvbugs/5547414][fix] Use cached models (#8755)
Signed-off-by: Hui Gao <huig@nvidia.com>
|
2025-10-29 19:10:10 -07:00 |
|
Iman Tabrizian
|
ae6875fe10
|
[TRTLLM-8976][feat] Move indexer-k-cache to KVCacheManager (#8699)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
|
2025-10-29 08:04:26 -07:00 |
|
Chang Liu
|
81eb861df0
|
[None][chore] Enable GPQA in CI for DeepSeek V3.2 (#8712)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
|
2025-10-29 04:22:22 -07:00 |
|
dongfengy
|
083f3637f1
|
[https://nvbugs/5596343][test] Update test waive to get back some coverage (#8702)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
|
2025-10-28 14:05:48 -07:00 |
|
Anish Shanbhag
|
a09b38a862
|
[TRTLLM-8684][chore] Migrate BuildConfig to Pydantic, add a Python wrapper for KVCacheType enum (#8330)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
|
2025-10-28 09:17:26 -07:00 |
|
dongfengy
|
5a01f382c1
|
[https://nvbugs/5575913][fix] Use separate thresholds for 120b/20b gptoss (#8664)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
|
2025-10-28 10:35:07 -04:00 |
|
Bo Li
|
9c4432f8a4
|
[TRTLLM-7318][feat] MnnvlThroughput AlltoAll implementation. (#7499)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
|
2025-10-27 13:23:06 -04:00 |
|
xinhe-nv
|
0ac5cbcac4
|
[None][chore] Add failed cases into waives.txt (#8669)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-10-27 02:36:28 -04:00 |
|
Chenghao Zhang
|
a6d20f6f9b
|
[None][feat] AutoDeploy: Add FP8 MOE for Nemotron (#8599)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
|
2025-10-25 15:26:45 -04:00 |
|
Simeng Liu
|
2b27810198
|
[https://nvbugs/5494718][fix] Fix Single GPU Multi-node issue and OOM on DGX Spark (#8514)
Signed-off-by: Simeng Liu <simengl@nvidia.com>
|
2025-10-24 19:09:07 -07:00 |
|
Chang Liu
|
e47c787dd7
|
[TRTLLM-8535][feat] Support DeepSeek V3.2 with FP8 + BF16 KV cache/NVFP4 + BF16 KV cache (#8405)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
|
2025-10-24 13:40:41 -04:00 |
|
Chuang Zhu
|
2420918e5b
|
[TRTLLM-7078][chore] optimal kvcache transfer for VWSA (#7952)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
|
2025-10-24 08:58:16 -04:00 |
|
xinhe-nv
|
2aaedd08cd
|
[TRTLLM-8638][fix] fix test issues (#8557)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-10-24 02:16:55 -04:00 |
|
xinhe-nv
|
59375e8bed
|
[TRTLLM-8638][fix] Add failed cases into waives.txt (#8590)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
|
2025-10-24 00:02:42 -04:00 |
|
Anthony Chang
|
8a3b870e09
|
[None][feat] Update TRTLLM MoE MxFP4 cubins; autotune tileN (#8156)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
|
2025-10-23 09:14:18 +08:00 |
|
sunnyqgg
|
90080e0e09
|
[https://nvbugs/5556020][fix] test_disaggregated_serving.py::TestLlama3_1_8BInstruct::test_eagle3 dimension mismatch (#8517)
Signed-off-by: qgai <qgai@nvidia.com>
|
2025-10-22 09:58:22 +08:00 |
|
Chenghao Zhang
|
bac9e8c2ad
|
[None][feat] AutoDeploy: Add Nemotron MOE support for AutoDeploy (#8469)
|
2025-10-21 15:32:01 -07:00 |
|
Suyog Gupta
|
7050b1ea49
|
[#8272][feat] Enable chunked prefill for SSMs in AutoDeploy (#8477)
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
|
2025-10-20 15:31:52 -07:00 |
|
Pamela Peng
|
b818a912d7
|
[https://nvbugs/5540752][fix] Support quantized Phi4 MM models (#8190)
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
|
2025-10-20 06:36:09 -04:00 |
|
Lucas Liebenwein
|
41169fb20c
|
[None][feat] AutoDeploy: chunked prefill support (#8158)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
|
2025-10-18 00:47:35 -07:00 |
|
h-guo18
|
55fed1873c
|
[None][chore] AutoDeploy: cleanup old inference optimizer configs (#8039)
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
|
2025-10-17 15:55:57 -04:00 |
|
xinhe-nv
|
bc833d3de3
|
[TRTLLM-8638][fix] add waives tests (#8445)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-10-17 03:37:53 -07:00 |
|
zhhuang-nv
|
7a2bab93f0
|
[None][test] Add post merge test for Seed-OSS-36B-Instruct (#8321)
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
|
2025-10-17 02:30:33 -07:00 |
|