amitz-nv
|
750d15bfaa
|
[https://nvbugs/5503529][fix] Change test_llmapi_example_multilora to get adapters path from cmd line to avoid downloading from HF (#7740)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
|
2025-09-16 16:35:13 +08:00 |
|
Kaiyu Xie
|
6eef19297f
|
[None] [chore] cherry pick changes on slurm scripts from release/1.1.0rc2 (#7750)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-09-16 16:07:13 +08:00 |
|
Li Min
|
b278d06481
|
[TRTLLM-6898][feat] Add Cute DSL nvfp4 linear op (#7632)
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
|
2025-09-16 14:25:26 +08:00 |
|
Guoming Zhang
|
085271eceb
|
[None][doc] Clean the doc folder and move the outdated docs into lega… (#7729)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-09-16 11:43:19 +08:00 |
|
Bo Li
|
3f4e160cba
|
[None][chore] Fix error when running trtllm-bench without cuda graph. (#7725)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
|
2025-09-15 20:30:23 -07:00 |
|
Void
|
103b554734
|
[None][fix] Ensure that the W4A8 custom input scale remains aligned across all ranks (#7614)
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
|
2025-09-16 11:04:26 +08:00 |
|
xinhe-nv
|
cf55927064
|
[None][chore] Add failed cases into waives.txt (#7735)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-09-16 10:58:06 +08:00 |
|
Yanchao Lu
|
e5cead1eb9
|
[TRTLLM-6295][test] Exit as early as possible and propagate exit status correctly for multi-node testing (#7739)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-09-16 09:59:18 +08:00 |
|
xiweny
|
c076a02b38
|
[TRTLLM-4629] [feat] Add support of CUDA13 and sm103 devices (#7568)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Signed-off-by: Daniel Stokes <dastokes@nvidia.com>
Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com>
Signed-off-by: Xiwen Yu <xiweny@nvidia.com>
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com>
Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Co-authored-by: Daniel Stokes <dastokes@nvidia.com>
Co-authored-by: Zhanrui Sun <zhanruis@nvidia.com>
Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
|
2025-09-16 09:56:18 +08:00 |
|
Shi Xiaowei
|
809c4d20c0
|
[None][doc] Fix the link in the doc (#7713)
Signed-off-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com>
|
2025-09-16 09:50:25 +08:00 |
|
Necofish
|
96f11b10ae
|
[None][feat] support attention dp for qwen3 dense model (#7618)
Signed-off-by: Nekofish-L <liuxiangyang@mail.ustc.edu.cn>
|
2025-09-16 09:33:22 +08:00 |
|
QI JUN
|
44d5ccfdd9
|
[None][ci] move qwen3 tests from GB200 to B200 (#7733)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-09-16 08:12:28 +08:00 |
|
Ziyi Xiong
|
536e8776cd
|
[TRTLLM-6668][feat] Enable overlap scheduler for two-model spec decoding (#7651)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
|
2025-09-16 07:33:44 +08:00 |
|
Lucas Liebenwein
|
857c0b45be
|
[None][infra] AutoDeploy: codeowners for autodeploy unit tests (#7743)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
|
2025-09-15 11:20:12 -07:00 |
|
Izzy Putterman
|
8097be7e9c
|
[None][feat] Eagle, use last hidden post norm (#7546)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
|
2025-09-15 12:23:57 -04:00 |
|
Yanchao Lu
|
0c9430e5a5
|
[None][ci] Test waives for the main branch 09/15 (#7709)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-09-15 22:13:56 +08:00 |
|
jmydurant
|
7deefb3d2b
|
[TRTLLM-7192][feat] optimize MLA chunked prefill && support fp8 mla chunked prefill (#7477)
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
|
2025-09-15 21:43:49 +08:00 |
|
Zheng Duan
|
24fc1f9acf
|
[None][fix] using arrival time in llmapi when creating LlmRequest in pytorch workflow (#7553)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
|
2025-09-15 07:26:01 -04:00 |
|
Wanli Jiang
|
e080294725
|
[TRTLLM-7918][feat] Revert "Support kvcache reuse for phi4mm (#7563)" (#7722)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
|
2025-09-15 17:19:44 +08:00 |
|
ixlmar
|
965a3dab90
|
[None][test] add test for min_tokens (#7678)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
|
2025-09-15 08:59:23 +01:00 |
|
Wanli Jiang
|
fc9f4c9295
|
[TRTLLM-7918][feat] Support kvcache reuse for phi4mm (#7563)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
|
2025-09-15 15:47:00 +08:00 |
|
HuiGao-NV
|
335c007df8
|
[None][chore] move some cases from post-merge to pre-merge to detect errors in early stage (#7699)
Signed-off-by: Hui Gao <huig@nvidia.com>
|
2025-09-15 15:37:58 +08:00 |
|
DylanChen-NV
|
d5df0af017
|
[https://nvbugs/5467981][fix] Fix Qwen2.5-VL fails with cuda graph padding (#7122)
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
|
2025-09-15 15:02:34 +08:00 |
|
Ivy Zhang
|
ddfe0320b3
|
[TRTLLM-7279][test] add accuracy test for deepseek-r1 with chunked_prefill (#7365)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-09-15 13:38:52 +08:00 |
|
JunyiXu-nv
|
a2c45d82c3
|
[None][chore] Enable multiple postprocess workers tests for chat completions api (#7602)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
|
2025-09-15 12:16:44 +08:00 |
|
xinhe-nv
|
b69e3e9f99
|
[None][chore] Add failed cases into waives.txt (#7682)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
|
2025-09-15 11:44:52 +08:00 |
|
Chang Liu
|
47e37755a3
|
[TRTLLM-6903][feat] Support chunked prefill for multimodal models (#6843)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
|
2025-09-14 20:10:10 -07:00 |
|
Perkz Zheng
|
1b29c2e731
|
[None][feat] support gpt-oss with fp8 kv cache (#7612)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
|
2025-09-15 02:17:37 +08:00 |
|
Yanchao Lu
|
70aa4e28c1
|
[None][ci] Test waives for the main branch 09/14 (#7698)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-09-14 23:48:04 +08:00 |
|
Yanchao Lu
|
89fc136972
|
[None][ci] Some improvements for Slurm CI (#7689)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-09-14 16:56:32 +08:00 |
|
Zhanrui Sun
|
1f43854496
|
[TRTLLM-6791][infra] Add check for uploading stage name and avoid overriding test result tar file (#6742)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-09-13 01:15:33 +08:00 |
|
Zhanrui Sun
|
7d73a89ad0
|
[TRTLLM-7169][infra] Fix Slurm multi-node test showing "Submit Test Results" in the test name (#6856)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
|
2025-09-12 18:46:19 +08:00 |
|
Pengyun Lin
|
c2bc39af63
|
[TRTLLM-1302][feat] Topk logprobs for TRT backend and top1 logprob for PyT backend (#6097)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
|
2025-09-12 15:32:34 +08:00 |
|
Guoming Zhang
|
ef676fc71f
|
[https://nvbugs/5513192][fix] Add the missing param for kv_cache_tran… (#7679)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-09-11 19:00:16 +08:00 |
|
Chang Liu
|
3a9847eb84
|
[https://nvbugs/5498165][fix] fix permission error for config file lock (#7656)
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
|
2025-09-11 10:36:51 +08:00 |
|
Fan - Yunfan
|
e3117731b3
|
[None][fix] Fix the incorrect header file import in dataType.h (#7133)
Signed-off-by: fanyunfan <2569548856@qq.com>
Co-authored-by: fanyunfan <2569658856@qq.com>
Co-authored-by: Yunfan Fan <46273019+fyf2016@users.noreply.github.com>
Co-authored-by: Kanghwan <861393+karljang@users.noreply.github.com>
|
2025-09-11 08:59:04 +08:00 |
|
QI JUN
|
656f229b58
|
[None][ci] move some test cases from l40s to a30 (#7684)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-09-11 07:22:34 +08:00 |
|
Kanghwan
|
aa152ce8cf
|
[None][infra] Adjust labeling llm prompt for bug issues (#7385)
Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
|
2025-09-11 05:10:31 +08:00 |
|
Emma Qiao
|
9986070044
|
[None][infra] Waive failed cases on main 0910 (#7676)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-09-11 01:43:29 +08:00 |
|
Dom Brown
|
fc9d426589
|
[https://nvbugs/5505402] [fix] Disable deep_gemm for Qwen3 QKNormRoPEAttention and Linear layers due to accuracy issues (#7616)
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
|
2025-09-10 18:30:48 +01:00 |
|
v-shobhit
|
0652514c6d
|
[None][feat] Use a shell context to install dependencies (#7383)
Signed-off-by: Shobhit Verma <shobhitv@nvidia.com>
Signed-off-by: v-shobhit <161510941+v-shobhit@users.noreply.github.com>
Co-authored-by: Zhihan Jiang <68881590+nvzhihanj@users.noreply.github.com>
|
2025-09-10 09:57:37 -07:00 |
|
nvamyt
|
222e01662c
|
[https://nvbugs/5488212][waive] Waive failed tests for L20 (#7664)
Signed-off-by: nvamyt <amyt@nvidia.com>
|
2025-09-10 22:32:15 +08:00 |
|
Leslie Fang
|
d219a4f225
|
[None][chore] remove executor config in kv cache creator (#7526)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
|
2025-09-10 21:14:44 +08:00 |
|
Linda
|
a4312ba743
|
[https://nvbugs/5477359][fix] Nanobind: Allow none types for fields in result (#7672)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
|
2025-09-10 14:13:46 +01:00 |
|
xinhe-nv
|
207c5258c4
|
[https://nvbugs/5494698][fix] skip gemma3 27b on blackwell (#7505)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-09-10 21:09:27 +08:00 |
|
Bo Deng
|
bf57829acf
|
[TRTLLM-7871][infra] Extend test_perf.py to add disagg-serving perf tests. (#7503)
Signed-off-by: Bo Deng <deemod@nvidia.com>
|
2025-09-10 17:35:51 +08:00 |
|
Yiqing Yan
|
76c5e1a12f
|
[None][infra] Bump version to 1.1.0rc5 (#7668)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-09-10 16:06:54 +08:00 |
|
Kanghwan
|
758c22f832
|
[#7208][fix] Fix config type of MedusaConfig (#7320)
Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
|
2025-09-09 23:25:17 -07:00 |
|
Frida Hou
|
bbb5ae3349
|
[#5861][autodeploy] Refactor: Quantization Transforms with Inheritance (#7227)
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
|
2025-09-10 13:00:06 +08:00 |
|
Zheyu Fu
|
c353ff342e
|
[None][feat] Make the should_use_spec_decode logic a bit smarter (#7112)
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
|
2025-09-10 12:53:59 +08:00 |
|