HuiGao-NV
49974eed75
[None][chore] ISOLATE some cases ( #8690 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-10-27 22:10:44 -04:00
chenfeiz0326
f5265a087b
[None][infra] Minor Update on Perf Sanity Testdb Files ( #8607 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-10-28 09:54:48 +08:00
gramnarayan
88b0fbc8ff
[ #8245 ][feat] Autodeploy: Guided Decoding Support ( #8551 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-28 09:29:57 +08:00
Yechan Kim
a6017f6266
[ https://nvbugs/5608723 ][fix] Use local data on multimodal tests and unwaive tests ( #8673 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-10-28 09:20:02 +09:00
Emma Qiao
73a5479b26
[None][infra] Skip failed tests for main 10/27 ( #8686 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-28 08:04:30 +08:00
Aurelien Chartier
1401a3c09c
[None][feat] Add FP8 rowwise GEMMs for B200 ( #8332 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-10-27 16:33:14 -04:00
Bo Li
9c4432f8a4
[TRTLLM-7318][feat] MnnvlThroughput AlltoAll implementation. ( #7499 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-10-27 13:23:06 -04:00
nvxuanyuc
d1398c05e6
[None][feat] Support ignored prompt length for penalties via new sampling config parameter ( #8127 )
...
Signed-off-by: Xuanyu Chen <xuanyuc@nvidia.com>
2025-10-27 13:12:31 -04:00
Chenghao Zhang
b9b2802599
[None][feat] Autodeploy: Update the ssm to use slice ( #8667 )
...
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
2025-10-27 09:45:20 -07:00
mpikulski
7c8ba71b49
[TRTLLM-8832][feat] fully async _select_generated_logits with tests ( #8628 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-10-27 16:15:32 +01:00
QI JUN
4fd58137a1
[TRTLLM-8933][chore] remove unused update_executor_config function ( #8678 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-27 10:00:47 -04:00
Kaiyu Xie
c9b08790c2
[None] [test] Add MNNVL AlltoAll tests to pre-merge ( #8601 )
2025-10-27 21:39:44 +08:00
Chao Ni
0019d99e6d
[None][test] Add longbench v2 for long context evaluation ( #8604 )
...
Signed-off-by: mni <125171826+baize97@users.noreply.github.com>
2025-10-27 20:01:14 +08:00
zhanghaotong
1026069a2b
[None][feat] Add opentelemetry tracing ( #5897 )
...
Signed-off-by: Zhang Haotong <zhanghaotong.zht@antgroup.com>
Signed-off-by: zhanghaotong <zhanghaotong.zht@antgroup.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Zhang Haotong <zhanghaotong.zht@alibaba-inc.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-10-27 18:51:07 +08:00
Jie Li
ce0d76135d
[ https://nvbugs/5546507 ][fix] skip TRT-Flow test case due to CMake Error in building ( #8677 )
...
Signed-off-by: Jie Li <lijie@nvidia.com>
2025-10-27 05:11:47 -04:00
Robin Kobus
990b0c0c47
[TRTLLM-7159][docs] Add documentation for additional outputs ( #8325 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-10-27 09:52:04 +01:00
xinhe-nv
8090c9641c
[TRTLLM-8638][fix] Add failed cases into waives.txt ( #8672 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-10-27 03:20:46 -04:00
Yanchao Lu
1614624beb
[None][docs] Update Python wheel's short-/long-descriptions ( #8676 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-10-27 14:58:49 +08:00
xinhe-nv
0ac5cbcac4
[None][chore] Add failed cases into waives.txt ( #8669 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-27 02:36:28 -04:00
Tailing Yuan
858d6437c1
[None][fix] Fix ModelConfig.from_pretrained get quant config file ( #8647 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-10-27 11:02:24 +08:00
QI JUN
cc5b8b6d28
[None][ci] move some time-consuming benchmark test cases to post merge ( #8641 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-26 22:47:17 -04:00
Jinyang Yuan
0a0f93d4a8
[None][fix] Fix the performance issue of FP8 blockwise grouped GEMM when using attention DP ( #8501 )
...
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-10-27 10:18:19 +08:00
Emma Qiao
e0728ba8a7
[None][infra] Waive failed case on main 10/26 ( #8668 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-26 22:02:32 +08:00
Chenghao Zhang
a6d20f6f9b
[None][feat] AutoDeploy: Add FP8 MOE for Nemotron ( #8599 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-10-25 15:26:45 -04:00
Wanli Jiang
95be56e56b
[TRTLLM-8238][feat] Add EVS support for nano-v2-vlm ( #8024 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Co-authored-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
2025-10-25 05:43:27 -04:00
Simeng Liu
2b27810198
[ https://nvbugs/5494718 ][fix] Fix Single GPU Multi-node issue and OOM on DGX Spark ( #8514 )
...
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-10-24 19:09:07 -07:00
Erin
812bc8c954
[TRTLLM-8513][feat] Add back worker extension ( #8482 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-10-24 20:30:28 -04:00
jthomson04
02081e2390
[None][feat] Support KV Connector with Disagg Prefill Worker ( #8246 )
...
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2025-10-24 11:09:06 -07:00
Chang Liu
e47c787dd7
[TRTLLM-8535][feat] Support DeepSeek V3.2 with FP8 + BF16 KV cache/NVFP4 + BF16 KV cache ( #8405 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-10-24 13:40:41 -04:00
Yechan Kim
2d86d6be40
[TRTLLM-8737][feat] Support media_io_kwargs on trtllm-serve ( #8528 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-10-24 12:53:40 -04:00
Aurelien Chartier
cdf0403c64
[None][feat] Pass KvCacheRetentionConfig to torch LlmRequest ( #8634 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-10-24 06:44:34 -07:00
Chuang Zhu
2420918e5b
[TRTLLM-7078][chore] optimal kvcache transfer for VWSA ( #7952 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-10-24 08:58:16 -04:00
Suyog Gupta
f512ddaeef
[None][feat] add skip condition in AutoDeploy's triton fused moe kernel ( #8632 )
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-10-24 08:46:17 -04:00
Yiqing Yan
602b059180
[None][chore] Disable GB300 stages due to nodes will be offline temporarily ( #8643 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-10-24 05:32:05 -04:00
Emma Qiao
35e35db422
[None][infra] Waive tests on main and remove lines which missed in MI ( #8639 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2025-10-24 02:49:23 -04:00
xinhe-nv
2aaedd08cd
[TRTLLM-8638][fix] fix test issues ( #8557 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-24 02:16:55 -04:00
xinhe-nv
9a9d647292
[None][chore] Add failed cases into waives.txt ( #8630 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-10-24 02:11:03 -04:00
ruodil
07a957e5cb
[None][test] remove redunctant runtime backend in perf test ( #8358 )
...
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-10-24 02:01:34 -04:00
Stanley Sun
6b793d5c3d
[TRTLLM-8738][test] Add end-to-end trtllm-serve negative tests ( #8580 )
...
Signed-off-by: Stanley Sun <stsun@nvidia.com>
2025-10-24 13:23:47 +08:00
yuanjingx87
e7ad5e4d6a
[None][infra] enable lfs for generateLockFile pipeline ( #8547 )
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-10-24 12:59:27 +08:00
xinhe-nv
59375e8bed
[TRTLLM-8638][fix] Add failed cases into waives.txt ( #8590 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-10-24 00:02:42 -04:00
xinhe-nv
95d39e6e76
[TRTLLM-8638][fix] Add failed cases into waives.txt ( #8588 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-10-23 23:08:52 -04:00
Wanli Jiang
f448043d88
[None][feat] Support base64 video input ( #8458 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-10-24 10:23:13 +08:00
Zheng Duan
e666a704f5
[None][doc] add visualization of perf metrics in time breakdown tool doc ( #8530 )
...
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-10-23 22:09:21 -04:00
QI JUN
6ee1c87595
[TRTLLM-8817][chore] Set default value of KvCacheConfig.free_gpu_memory_fraction explicitly ( #8561 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-24 08:55:49 +08:00
h-guo18
23920223ab
[ #4585 ][feat] Replace unified attention before export ( #8303 )
...
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
2025-10-23 18:02:04 -04:00
Aurelien Chartier
32e1ad68e1
[None][chore] Cleanup GDS code ( #8475 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-10-23 12:36:31 -07:00
QI JUN
cc81028547
[TRTLLM-8812][chore] Limit the scope of pybind based CacheTransceiverConfig ( #8558 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-23 10:32:09 -04:00
Emma Qiao
ee21ea3e91
[None][infra] Disable rtxpro6000 stages due to nodes will be offline ( #8613 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-10-23 10:24:05 -04:00
Emma Qiao
7c1bca4563
[None][infra] Fix slurm exitcode ( #8585 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2025-10-23 09:46:00 -04:00