Commit Graph

3356 Commits

Author SHA1 Message Date
HuiGao-NV
49974eed75
[None][chore] ISOLATE some cases (#8690)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-10-27 22:10:44 -04:00
chenfeiz0326
f5265a087b
[None][infra] Minor Update on Perf Sanity Testdb Files (#8607)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-10-28 09:54:48 +08:00
gramnarayan
88b0fbc8ff
[#8245][feat] Autodeploy: Guided Decoding Support (#8551)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-28 09:29:57 +08:00
Yechan Kim
a6017f6266
[https://nvbugs/5608723][fix] Use local data on multimodal tests and unwaive tests (#8673)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-10-28 09:20:02 +09:00
Emma Qiao
73a5479b26
[None][infra] Skip failed tests for main 10/27 (#8686)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-28 08:04:30 +08:00
Aurelien Chartier
1401a3c09c
[None][feat] Add FP8 rowwise GEMMs for B200 (#8332)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-10-27 16:33:14 -04:00
Bo Li
9c4432f8a4
[TRTLLM-7318][feat] MnnvlThroughput AlltoAll implementation. (#7499)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-10-27 13:23:06 -04:00
nvxuanyuc
d1398c05e6
[None][feat] Support ignored prompt length for penalties via new sampling config parameter (#8127)
Signed-off-by: Xuanyu Chen <xuanyuc@nvidia.com>
2025-10-27 13:12:31 -04:00
Chenghao Zhang
b9b2802599
[None][feat] Autodeploy: Update the ssm to use slice (#8667)
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
2025-10-27 09:45:20 -07:00
mpikulski
7c8ba71b49
[TRTLLM-8832][feat] fully async _select_generated_logits with tests (#8628)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-27 16:15:32 +01:00
QI JUN
4fd58137a1
[TRTLLM-8933][chore] remove unused update_executor_config function (#8678)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-27 10:00:47 -04:00
Kaiyu Xie
c9b08790c2
[None] [test] Add MNNVL AlltoAll tests to pre-merge (#8601)
2025-10-27 21:39:44 +08:00
Chao Ni
0019d99e6d
[None][test] Add longbench v2 for long context evaluation (#8604)
Signed-off-by: mni <125171826+baize97@users.noreply.github.com>
2025-10-27 20:01:14 +08:00
zhanghaotong
1026069a2b
[None][feat] Add opentelemetry tracing (#5897)
Signed-off-by: Zhang Haotong <zhanghaotong.zht@antgroup.com>
Signed-off-by: zhanghaotong <zhanghaotong.zht@antgroup.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Zhang Haotong <zhanghaotong.zht@alibaba-inc.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-10-27 18:51:07 +08:00
Jie Li
ce0d76135d
[https://nvbugs/5546507][fix] skip TRT-Flow test case due to CMake Error in building (#8677)
Signed-off-by: Jie Li <lijie@nvidia.com>
2025-10-27 05:11:47 -04:00
Robin Kobus
990b0c0c47
[TRTLLM-7159][docs] Add documentation for additional outputs (#8325)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-10-27 09:52:04 +01:00
xinhe-nv
8090c9641c
[TRTLLM-8638][fix] Add failed cases into waives.txt (#8672)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-10-27 03:20:46 -04:00
Yanchao Lu
1614624beb
[None][docs] Update Python wheel's short-/long-descriptions (#8676)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-10-27 14:58:49 +08:00
xinhe-nv
0ac5cbcac4
[None][chore] Add failed cases into waives.txt (#8669)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-27 02:36:28 -04:00
Tailing Yuan
858d6437c1
[None][fix] Fix ModelConfig.from_pretrained get quant config file (#8647)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-10-27 11:02:24 +08:00
QI JUN
cc5b8b6d28
[None][ci] move some time-consuming benchmark test cases to post merge (#8641)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-26 22:47:17 -04:00
Jinyang Yuan
0a0f93d4a8
[None][fix] Fix the performance issue of FP8 blockwise grouped GEMM when using attention DP (#8501)
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-10-27 10:18:19 +08:00
Emma Qiao
e0728ba8a7
[None][infra] Waive failed case on main 10/26 (#8668)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-26 22:02:32 +08:00
Chenghao Zhang
a6d20f6f9b
[None][feat] AutoDeploy: Add FP8 MOE for Nemotron (#8599)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-10-25 15:26:45 -04:00
Wanli Jiang
95be56e56b
[TRTLLM-8238][feat] Add EVS support for nano-v2-vlm (#8024)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Co-authored-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
2025-10-25 05:43:27 -04:00
Simeng Liu
2b27810198
[https://nvbugs/5494718][fix] Fix Single GPU Multi-node issue and OOM on DGX Spark (#8514)
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-10-24 19:09:07 -07:00
Erin
812bc8c954
[TRTLLM-8513][feat] Add back worker extension (#8482)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-10-24 20:30:28 -04:00
jthomson04
02081e2390
[None][feat] Support KV Connector with Disagg Prefill Worker (#8246)
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2025-10-24 11:09:06 -07:00
Chang Liu
e47c787dd7
[TRTLLM-8535][feat] Support DeepSeek V3.2 with FP8 + BF16 KV cache/NVFP4 + BF16 KV cache (#8405)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-10-24 13:40:41 -04:00
Yechan Kim
2d86d6be40
[TRTLLM-8737][feat] Support media_io_kwargs on trtllm-serve (#8528)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-10-24 12:53:40 -04:00
Aurelien Chartier
cdf0403c64
[None][feat] Pass KvCacheRetentionConfig to torch LlmRequest (#8634)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-10-24 06:44:34 -07:00
Chuang Zhu
2420918e5b
[TRTLLM-7078][chore] optimal kvcache transfer for VWSA (#7952)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-10-24 08:58:16 -04:00
Suyog Gupta
f512ddaeef
[None][feat] add skip condition in AutoDeploy's triton fused moe kernel (#8632)
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-10-24 08:46:17 -04:00
Yiqing Yan
602b059180
[None][chore] Disable GB300 stages due to nodes will be offline temporarily (#8643)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-10-24 05:32:05 -04:00
Emma Qiao
35e35db422
[None][infra] Waive tests on main and remove lines which missed in MI (#8639)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2025-10-24 02:49:23 -04:00
xinhe-nv
2aaedd08cd
[TRTLLM-8638][fix] fix test issues (#8557)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-24 02:16:55 -04:00
xinhe-nv
9a9d647292
[None][chore] Add failed cases into waives.txt (#8630)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-10-24 02:11:03 -04:00
ruodil
07a957e5cb
[None][test] remove redunctant runtime backend in perf test (#8358)
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-10-24 02:01:34 -04:00
Stanley Sun
6b793d5c3d
[TRTLLM-8738][test] Add end-to-end trtllm-serve negative tests (#8580)
Signed-off-by: Stanley Sun <stsun@nvidia.com>
2025-10-24 13:23:47 +08:00
yuanjingx87
e7ad5e4d6a
[None][infra] enable lfs for generateLockFile pipeline (#8547)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-10-24 12:59:27 +08:00
xinhe-nv
59375e8bed
[TRTLLM-8638][fix] Add failed cases into waives.txt (#8590)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-10-24 00:02:42 -04:00
xinhe-nv
95d39e6e76
[TRTLLM-8638][fix] Add failed cases into waives.txt (#8588)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-23 23:08:52 -04:00
Wanli Jiang
f448043d88
[None][feat] Support base64 video input (#8458)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-10-24 10:23:13 +08:00
Zheng Duan
e666a704f5
[None][doc] add visualization of perf metrics in time breakdown tool doc (#8530)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-10-23 22:09:21 -04:00
QI JUN
6ee1c87595
[TRTLLM-8817][chore] Set default value of KvCacheConfig.free_gpu_memory_fraction explicitly (#8561)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-24 08:55:49 +08:00
h-guo18
23920223ab
[#4585][feat] Replace unified attention before export (#8303)
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
2025-10-23 18:02:04 -04:00
Aurelien Chartier
32e1ad68e1
[None][chore] Cleanup GDS code (#8475)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-10-23 12:36:31 -07:00
QI JUN
cc81028547
[TRTLLM-8812][chore] Limit the scope of pybind based CacheTransceiverConfig (#8558)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-23 10:32:09 -04:00
Emma Qiao
ee21ea3e91
[None][infra] Disable rtxpro6000 stages due to nodes will be offline (#8613)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-23 10:24:05 -04:00
Emma Qiao
7c1bca4563
[None][infra] Fix slurm exitcode (#8585)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2025-10-23 09:46:00 -04:00