jmydurant
|
7deefb3d2b
|
[TRTLLM-7192][feat] optimize MLA chunked prefill && support fp8 mla chunked prefill (#7477)
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
|
2025-09-15 21:43:49 +08:00 |
|
Zheng Duan
|
24fc1f9acf
|
[None][fix] using arrival time in llmapi when creating LlmRequest in pytorch workflow (#7553)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
|
2025-09-15 07:26:01 -04:00 |
|
Wanli Jiang
|
e080294725
|
[TRTLLM-7918][feat] Revert "Support kvcache reuse for phi4mm (#7563)" (#7722)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
|
2025-09-15 17:19:44 +08:00 |
|
ixlmar
|
965a3dab90
|
[None][test] add test for min_tokens (#7678)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
|
2025-09-15 08:59:23 +01:00 |
|
Wanli Jiang
|
fc9f4c9295
|
[TRTLLM-7918][feat] Support kvcache reuse for phi4mm (#7563)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
|
2025-09-15 15:47:00 +08:00 |
|
HuiGao-NV
|
335c007df8
|
[None][chore] move some cases from post-merge to pre-merge to detect errors in early stage (#7699)
Signed-off-by: Hui Gao <huig@nvidia.com>
|
2025-09-15 15:37:58 +08:00 |
|
DylanChen-NV
|
d5df0af017
|
[https://nvbugs/5467981][fix] Fix Qwen2.5-VL fails with cuda graph padding (#7122)
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
|
2025-09-15 15:02:34 +08:00 |
|
Ivy Zhang
|
ddfe0320b3
|
[TRTLLM-7279][test] add accuracy test for deepseek-r1 with chunked_prefill (#7365)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-09-15 13:38:52 +08:00 |
|
JunyiXu-nv
|
a2c45d82c3
|
[None][chore] Enable multiple postprocess workers tests for chat completions api (#7602)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
|
2025-09-15 12:16:44 +08:00 |
|
xinhe-nv
|
b69e3e9f99
|
[None][chore] Add failed cases into waives.txt (#7682)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
|
2025-09-15 11:44:52 +08:00 |
|
Chang Liu
|
47e37755a3
|
[TRTLLM-6903][feat] Support chunked prefill for multimodal models (#6843)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
|
2025-09-14 20:10:10 -07:00 |
|
Perkz Zheng
|
1b29c2e731
|
[None][feat] support gpt-oss with fp8 kv cache (#7612)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
|
2025-09-15 02:17:37 +08:00 |
|
Yanchao Lu
|
70aa4e28c1
|
[None][ci] Test waives for the main branch 09/14 (#7698)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-09-14 23:48:04 +08:00 |
|
Yanchao Lu
|
89fc136972
|
[None][ci] Some improvements for Slurm CI (#7689)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-09-14 16:56:32 +08:00 |
|
Zhanrui Sun
|
1f43854496
|
[TRTLLM-6791][infra] Add check for uploading stage name and avoid overriding test result tar file (#6742)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-09-13 01:15:33 +08:00 |
|
Zhanrui Sun
|
7d73a89ad0
|
[TRTLLM-7169][infra] Fix Slurm multi-node test showing "Submit Test Results" in the test name (#6856)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
|
2025-09-12 18:46:19 +08:00 |
|
Pengyun Lin
|
c2bc39af63
|
[TRTLLM-1302][feat] Topk logprobs for TRT backend and top1 logprob for PyT backend (#6097)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
|
2025-09-12 15:32:34 +08:00 |
|
Guoming Zhang
|
ef676fc71f
|
[https://nvbugs/5513192][fix] Add the missing param for kv_cache_tran… (#7679)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-09-11 19:00:16 +08:00 |
|
Chang Liu
|
3a9847eb84
|
[https://nvbugs/5498165][fix] fix permission error for config file lock (#7656)
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
|
2025-09-11 10:36:51 +08:00 |
|
Fan - Yunfan
|
e3117731b3
|
[None][fix] Fix the incorrect header file import in dataType.h (#7133)
Signed-off-by: fanyunfan <2569548856@qq.com>
Co-authored-by: fanyunfan <2569658856@qq.com>
Co-authored-by: Yunfan Fan <46273019+fyf2016@users.noreply.github.com>
Co-authored-by: Kanghwan <861393+karljang@users.noreply.github.com>
|
2025-09-11 08:59:04 +08:00 |
|
QI JUN
|
656f229b58
|
[None][ci] move some test cases from l40s to a30 (#7684)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-09-11 07:22:34 +08:00 |
|
Kanghwan
|
aa152ce8cf
|
[None][infra] Adjust labeling llm prompt for bug issues (#7385)
Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
|
2025-09-11 05:10:31 +08:00 |
|
Emma Qiao
|
9986070044
|
[None][infra] Waive failed cases on main 0910 (#7676)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-09-11 01:43:29 +08:00 |
|
Dom Brown
|
fc9d426589
|
[https://nvbugs/5505402] [fix] Disable deep_gemm for Qwen3 QKNormRoPEAttention and Linear layers due to accuracy issues (#7616)
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
|
2025-09-10 18:30:48 +01:00 |
|
v-shobhit
|
0652514c6d
|
[None][feat] Use a shell context to install dependencies (#7383)
Signed-off-by: Shobhit Verma <shobhitv@nvidia.com>
Signed-off-by: v-shobhit <161510941+v-shobhit@users.noreply.github.com>
Co-authored-by: Zhihan Jiang <68881590+nvzhihanj@users.noreply.github.com>
|
2025-09-10 09:57:37 -07:00 |
|
nvamyt
|
222e01662c
|
[https://nvbugs/5488212][waive] Waive failed tests for L20 (#7664)
Signed-off-by: nvamyt <amyt@nvidia.com>
|
2025-09-10 22:32:15 +08:00 |
|
Leslie Fang
|
d219a4f225
|
[None][chore] remove executor config in kv cache creator (#7526)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
|
2025-09-10 21:14:44 +08:00 |
|
Linda
|
a4312ba743
|
[https://nvbugs/5477359][fix] Nanobind: Allow none types for fields in result (#7672)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
|
2025-09-10 14:13:46 +01:00 |
|
xinhe-nv
|
207c5258c4
|
[https://nvbugs/5494698][fix] skip gemma3 27b on blackwell (#7505)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-09-10 21:09:27 +08:00 |
|
Bo Deng
|
bf57829acf
|
[TRTLLM-7871][infra] Extend test_perf.py to add disagg-serving perf tests. (#7503)
Signed-off-by: Bo Deng <deemod@nvidia.com>
|
2025-09-10 17:35:51 +08:00 |
|
Yiqing Yan
|
76c5e1a12f
|
[None][infra] Bump version to 1.1.0rc5 (#7668)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-09-10 16:06:54 +08:00 |
|
Kanghwan
|
758c22f832
|
[#7208][fix] Fix config type of MedusaConfig (#7320)
Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
|
2025-09-09 23:25:17 -07:00 |
|
Frida Hou
|
bbb5ae3349
|
[#5861][autodeploy] Refactor: Quantization Transforms with Inheritance (#7227)
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
|
2025-09-10 13:00:06 +08:00 |
|
Zheyu Fu
|
c353ff342e
|
[None][feat] Make the should_use_spec_decode logic a bit smarter (#7112)
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
|
2025-09-10 12:53:59 +08:00 |
|
Chuang Zhu
|
f412f5c4b0
|
[None][fix] UCX zmq ip support ipv6 (#7530)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
|
2025-09-10 10:24:41 +08:00 |
|
fredricz-20070104
|
ef620f3579
|
[https://nvbugs/5410687][test] Add deepseek r1-w4afp8 quickstart (#7645)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
|
2025-09-10 10:21:01 +08:00 |
|
Guoming Zhang
|
beefd6413e
|
[None][fix] fix post-merge issue raised by #5488 (#7655)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-09-10 09:26:27 +08:00 |
|
Chang Liu
|
faa2f46554
|
[TRTLLM-5059][feat] Enable KV-cache reuse and add E2E tests for llava-next (#7349)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
|
2025-09-09 14:51:36 -04:00 |
|
Jin Li
|
d49374bc45
|
[TRTLLM-7408][feat] Wrap MOE with custom op. (#7277)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
|
2025-09-09 12:18:56 -04:00 |
|
QI JUN
|
a0e1604898
|
[None][ci] add DGX_H100-2_GPUs-PyTorch-Others-1 pipeline (#7629)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-09-09 11:06:32 -04:00 |
|
Linda
|
0566df672d
|
[TRTLLM-6707][fix] nanobind fix for executor exit call (#7565)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
|
2025-09-09 14:56:04 +01:00 |
|
Richard Huo
|
dcd110cfac
|
[None][chore] add TorchLlmArgs to the connector api (#7493)
Signed-off-by: richardhuo-nv <rihuo@nvidia.com>
|
2025-09-09 09:05:59 -04:00 |
|
NVJiangShao
|
cc7593987b
|
[https://nvbugs/5434424][fix] A quick fix for the wrong output issue of SM89 blocked scaling batched GEMM when the input tensor is non-contiguous. (#7615)
Signed-off-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com>
|
2025-09-09 08:58:15 -04:00 |
|
William Tambellini
|
a6ed0d17d6
|
[#6798][fix] fix compilation error in ub_allocator in single device build (#6874)
Signed-off-by: William Tambellini <wtambellini@sdl.com>
|
2025-09-09 07:13:53 -04:00 |
|
Liao Lanyu
|
af403848d7
|
[https://nvbugs/5445466][fix] unwaive DS R1 test cases with bug already fixed (#7429)
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
|
2025-09-09 17:25:49 +08:00 |
|
Perkz Zheng
|
da6cb541a2
|
[None][feat] Optimize MLA kernels with separate reduction kernels (#7597)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
|
2025-09-09 16:58:44 +08:00 |
|
tomeras91
|
6e712dd1cc
|
[None][fix] enable NvFP4/FP8 quantization for Nemotron-H architecture (#7589)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
|
2025-09-09 11:42:22 +03:00 |
|
Linda
|
9cb5410067
|
[https://nvbugs/5454559][fix] handle bias term in fuse_gate_mlp (#7449)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
|
2025-09-09 10:26:17 +02:00 |
|
xinhe-nv
|
8a52015f50
|
[None][chore] Remove closed bugs (#7591)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-09-09 04:08:42 -04:00 |
|
Guoming Zhang
|
62b564ac3c
|
[None][fix] add the missing import raised by #7607 (#7639)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-09-09 03:42:42 -04:00 |
|