Commit Graph

1675 Commits

Author SHA1 Message Date
ruodil
e6073b3911
[None][test] add gpt oss model for trtllm perf test (#7328)
Signed-off-by: Ruodi Lu <ruodil@nvidia.com>
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-09-17 15:23:21 +08:00
xinhe-nv
7801d0992b
[None][chore] Remove closed bugs (#7697)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-09-17 15:14:09 +08:00
QI JUN
d3e680b3c3
[None][ci] waive test_llama_eagle3[True-FLASHINFER-False-False-False-False-True] (#7788)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-17 15:12:55 +08:00
Fanrong Li
523a17d990
[https://nvbugs/5485325][fix] Cherry-pick #7373: fix the CUDA graph warmup issue when using speculative decoding (#7734)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-09-17 13:57:39 +08:00
QI JUN
bd7aad4988
[None][ci] waive test_llm_gemma_1gpu_summary_vswa (#7781)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-17 10:48:31 +08:00
Lucas Liebenwein
4c3dc89f84
[None][chore] AutoDeploy: clean up of model unit test configuration (#7742)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-09-17 10:42:01 +08:00
Kaiyu Xie
62042a9733
[TRTLLM-6741] [feat] enable LM tp for MTP, under attention dp case (cherry-pick #7128) (#7571)
Signed-off-by: Cheng Hang <chang@nvidia.com>
Co-authored-by: Cheng Hang <chang@nvidia.com>
2025-09-17 09:41:32 +08:00
Iman Tabrizian
a91453de34
[None][waive] Waive tests (#7775)
Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
2025-09-16 19:42:32 -04:00
HuiGao-NV
a49cfb3e68
[https://nvbugs/5516666][fix] cherrypick fix to the CUDA graph warmup issue when using speculative decoding (#7737)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
Co-authored-by: Signed-off-by: Hui Gao <huig@nvidia.com>
2025-09-17 06:24:20 +08:00
xinhe-nv
e7c1569456
[None][chore] Add failed cases into waives.txt (#7746)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-16 18:43:40 +08:00
Ziyi Xiong
905bb26bbd
[https://nvbugs/5471106][fix] Remove the waivers (#7711)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-09-16 17:43:39 +08:00
xinhe-nv
c6ab2072b5
[None][fix] waive hang tests on main (#7720)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-16 17:05:15 +08:00
xinhe-nv
1fbea497ff
[TRTLLM-7070][feat] add gpt-oss serve benchmark tests (#7638)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-16 16:39:31 +08:00
amitz-nv
750d15bfaa
[https://nvbugs/5503529][fix] Change test_llmapi_example_multilora to get adapters path from cmd line to avoid downloading from HF (#7740)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-09-16 16:35:13 +08:00
Li Min
b278d06481
[TRTLLM-6898][feat] Add Cute DSL nvfp4 linear op (#7632)
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
2025-09-16 14:25:26 +08:00
xinhe-nv
cf55927064
[None][chore] Add failed cases into waives.txt (#7735)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-16 10:58:06 +08:00
xiweny
c076a02b38
[TRTLLM-4629] [feat] Add support of CUDA13 and sm103 devices (#7568)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Signed-off-by: Daniel Stokes <dastokes@nvidia.com>
Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com>
Signed-off-by: Xiwen Yu <xiweny@nvidia.com>
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com>
Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Co-authored-by: Daniel Stokes <dastokes@nvidia.com>
Co-authored-by: Zhanrui Sun <zhanruis@nvidia.com>
Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-09-16 09:56:18 +08:00
QI JUN
44d5ccfdd9
[None][ci] move qwen3 tests from GB200 to B200 (#7733)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-16 08:12:28 +08:00
Ziyi Xiong
536e8776cd
[TRTLLM-6668][feat] Enable overlap scheduler for two-model spec decoding (#7651)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-09-16 07:33:44 +08:00
Yanchao Lu
0c9430e5a5
[None][ci] Test waives for the main branch 09/15 (#7709)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-15 22:13:56 +08:00
jmydurant
7deefb3d2b
[TRTLLM-7192][feat] optimize MLA chunked prefill && support fp8 mla chunked prefill (#7477)
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-09-15 21:43:49 +08:00
ixlmar
965a3dab90
[None][test] add test for min_tokens (#7678)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-09-15 08:59:23 +01:00
HuiGao-NV
335c007df8
[None][chore] move some cases from post-merge to pre-merge to detect errors in early stage (#7699)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-09-15 15:37:58 +08:00
Ivy Zhang
ddfe0320b3
[TRTLLM-7279][test] add accuracy test for deepseek-r1 with chunked_prefill (#7365)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-09-15 13:38:52 +08:00
JunyiXu-nv
a2c45d82c3
[None][chore] Enable multiple postprocess workers tests for chat completions api (#7602)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-09-15 12:16:44 +08:00
xinhe-nv
b69e3e9f99
[None][chore] Add failed cases into waives.txt (#7682)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-09-15 11:44:52 +08:00
Chang Liu
47e37755a3
[TRTLLM-6903][feat] Support chunked prefill for multimodal models (#6843)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-09-14 20:10:10 -07:00
Perkz Zheng
1b29c2e731
[None][feat] support gpt-oss with fp8 kv cache (#7612)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-09-15 02:17:37 +08:00
Yanchao Lu
70aa4e28c1
[None][ci] Test waives for the main branch 09/14 (#7698)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-14 23:48:04 +08:00
Pengyun Lin
c2bc39af63
[TRTLLM-1302][feat] Topk logprobs for TRT backend and top1 logprob for PyT backend (#6097)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-09-12 15:32:34 +08:00
Guoming Zhang
ef676fc71f
[https://nvbugs/5513192][fix] Add the missing param for kv_cache_tran… (#7679)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-09-11 19:00:16 +08:00
QI JUN
656f229b58
[None][ci] move some test cases from l40s to a30 (#7684)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-11 07:22:34 +08:00
Emma Qiao
9986070044
[None][infra] Waive failed cases on main 0910 (#7676)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-11 01:43:29 +08:00
Dom Brown
fc9d426589
[https://nvbugs/5505402] [fix] Disable deep_gemm for Qwen3 QKNormRoPEAttention and Linear layers due to accuracy issues (#7616)
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-09-10 18:30:48 +01:00
nvamyt
222e01662c
[https://nvbugs/5488212][waive] Waive failed tests for L20 (#7664)
Signed-off-by: nvamyt <amyt@nvidia.com>
2025-09-10 22:32:15 +08:00
xinhe-nv
207c5258c4
[https://nvbugs/5494698][fix] skip gemma3 27b on blackwell (#7505)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-10 21:09:27 +08:00
Bo Deng
bf57829acf
[TRTLLM-7871][infra] Extend test_perf.py to add disagg-serving perf tests. (#7503)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-09-10 17:35:51 +08:00
Frida Hou
bbb5ae3349
[#5861][autodeploy] Refactor: Quantization Transforms with Inheritance (#7227)
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-09-10 13:00:06 +08:00
Zheyu Fu
c353ff342e
[None][feat] Make the should_use_spec_decode logic a bit smarter (#7112)
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
2025-09-10 12:53:59 +08:00
fredricz-20070104
ef620f3579
[https://nvbugs/5410687][test] Add deepseek r1-w4afp8 quickstart (#7645)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-09-10 10:21:01 +08:00
Guoming Zhang
beefd6413e
[None][fix] fix post-merge issue raised by #5488 (#7655)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-09-10 09:26:27 +08:00
Chang Liu
faa2f46554
[TRTLLM-5059][feat] Enable KV-cache reuse and add E2E tests for llava-next (#7349)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-09-09 14:51:36 -04:00
Jin Li
d49374bc45
[TRTLLM-7408][feat] Wrap MOE with custom op. (#7277)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-09-09 12:18:56 -04:00
QI JUN
a0e1604898
[None][ci] add DGX_H100-2_GPUs-PyTorch-Others-1 pipeline (#7629)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-09 11:06:32 -04:00
Liao Lanyu
af403848d7
[https://nvbugs/5445466][fix] unwaive DS R1 test cases with bug already fixed (#7429)
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-09-09 17:25:49 +08:00
Perkz Zheng
da6cb541a2
[None][feat] Optimize MLA kernels with separate reduction kernels (#7597)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-09-09 16:58:44 +08:00
xinhe-nv
8a52015f50
[None][chore] Remove closed bugs (#7591)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-09 04:08:42 -04:00
William Zhang
c53d1814a7
[None][feat] Extend VLM factory and add Mistral3 factory (#7583)
This commit:

* extends existing factory interfaces to enable Mistral3 in AutoDeploy.
* adds a Mistral3 VLM factory.
* adds various model patches for pixtral (the vision model) and mistral3
  to make the VLM export compliant.
* adjusts checkpoint loading code to take possible parameter name
  conversions into account.
* fixes a sampling bug (the `end_id` needs to be take into account when
  sampling, but it is not included in the stop words' token IDs).

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-09-09 02:47:18 -04:00
Yiqing Yan
5c616da2fd
[TRTLLM-5877][infra] Add fmha tests and auto trigger rules (#6050)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-09 11:33:09 +08:00
Wanli Jiang
1e0669d27a
[https://nvbugs/5453709][fix] Remove transformers version limit in Qwen2VL (#7152)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-09-09 10:38:20 +08:00
Iman Tabrizian
d96c54d8ae
[None][test] Skip eagle3 test (#7627)
Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
2025-09-08 17:23:53 -04:00
dongfengy
fdd5bd49fc
[https://nvbugs/5481080][fix] Fix GPTOSS W4A16 reference (#7323)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-09-08 13:59:28 -07:00
Chuang Zhu
77657a1c12
[TRTLLM-7361][feat] KV cache transfer for uneven pp (#7117)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-09-08 13:37:46 -04:00
Eran Geva
5f2a42b3df
[TRTLLM-6142][feat] AutoDeploy: set torch recompile_limit based on cuda_graph_batch_sizes and refactored (#7219)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-09-08 08:45:58 -04:00
Chang Liu
4a1e13897f
[None][feat] Update multimodal utility get_num_tokens_per_image for better generalization (#7544)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-09-08 07:42:46 -04:00
bhsueh_NV
219e95569a
[https://nvbugs/5506683][fix] adjust the CI (#7604)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-09-08 15:41:41 +08:00
dominicshanshan
c9dca69e1b
[None][chore] Mass integration of release/1.0 - 3rd (#7519)
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
Signed-off-by: Hui Gao <huig@nvidia.com>
Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Co-authored-by: Nave Assaf <55059536+Naveassaf@users.noreply.github.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Co-authored-by: yifeizhang-c <219273404+yifeizhang-c@users.noreply.github.com>
Co-authored-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
Co-authored-by: Erin <14718778+hchings@users.noreply.github.com>
Co-authored-by: chenfeiz0326 <chenfeiz@nvidia.com>
Co-authored-by: ChristinaZ <83400082+ChristinaZ@users.noreply.github.com>
Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: HuiGao-NV <huig@nvidia.com>
Co-authored-by: milesial <milesial@users.noreply.github.com>
Co-authored-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Co-authored-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Linda <57756729+Linda-Stadter@users.noreply.github.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com>
Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Co-authored-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-09-08 14:03:04 +08:00
JunyiXu-nv
504bb7ffa9
[TRTLLM-7779][feat] Support multiple postprocess workers for chat completions API (#7508)
Signed-off-by: Junyi Xu 
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-09-08 11:11:35 +08:00
Raayan Dhar
8f3121ac81
[None][fix] chore: fixing the math on asymmetric tp+pp tests (#7098)
Signed-off-by: raayandhar <rdhar@nvidia.com>
2025-09-07 14:27:46 -04:00
Netanel Haber
0fee8cd028
[TRTLLM-7153] [feat] Move stop_criteria to sample_async (#7041)
Signed-off-by: Netanel Haber <nhaber@nvidia.com>
2025-09-07 17:36:49 +03:00
Raayan Dhar
bae9560e62
[https://nvbugs/5448767][fix] sync termination of requests across PP ranks (#7455)
Signed-off-by: raayandhar <rdhar@nvidia.com>
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Co-authored-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-09-07 08:45:49 -04:00
Emma Qiao
aea8ac1649
[TRTLLM-5950][infra] Removing remaining turtle keywords from the code base (#7086)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-07 14:26:18 +08:00
Mike Iovine
45390402fc
[https://nvbugs/5502352][fix] Fix 2-model CDL path (#7543)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-09-06 23:53:27 -04:00
Chang Liu
99b98f1374
[TRTLLM-7440][fix] Split fused_input_embed to separate out host sync (#7280)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-09-06 23:11:39 -04:00
Chang Liu
23500b55c3
[TRTLLM-7398][feat] Support KV cache salting for secure KV cache reuse (#7106)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
2025-09-06 17:58:32 -04:00
QI JUN
12ecb864c2
[None][chore] share input_ids buffers among different cuda graphs (#7236)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-06 17:49:42 -04:00
dominicshanshan
9a97f0a3b7
[None][ci] Waive qwen3 test for accuracy bug in https://nvbugs/5505402 (#7585)
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-06 21:29:16 +08:00
QI JUN
525bb806a9
[None][ci] move some test cases of DGX H100 to post merge (#7569)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-06 01:03:38 -04:00
QI JUN
b8183cac2b
[None][ci] Revert "[https://nvbugs/5461761][fix] Remove the waiver (#7476)" (#7584) 2025-09-05 22:02:09 -07:00
Lucas Liebenwein
74105a45d9
[#6120][feat] AutoDeploy: flexible args for sequence interface + AD multi-modal input processor + llama4 VLM example (#7221)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-09-05 22:10:48 -04:00
peaceh-nv
25389c9fe2
[https://nvbugs/5453806][unwaive] Unwaive fp8 kvcache attention test (#7243)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-09-05 12:13:57 -04:00
Emma Qiao
d8ec546b73
[None][infra] Waive failed tests on main branch 0905 (#7564)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-05 22:46:46 +08:00
Ziyi Xiong
79e0296ca0
[https://nvbugs/5461761][fix] Remove the waiver (#7476)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-09-05 15:29:54 +08:00
xinhe-nv
8e3962d278
[TRTLLM-6642][feat] add gptoss 20g tests (#7361)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-05 02:20:28 -04:00
xinhe-nv
b3ba3d98d2
[None][chore] Remove closed bugs (#7408)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-09-05 02:11:16 -04:00
QI JUN
ff3704897b
[None][ci] remove unnecessary test_modeling_deepseek.py (#7542)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-04 20:05:27 -07:00
Jin Li
2189a2f3ff
[https://nvbugs/5483615][fix] Remove unnecessary assertion to let mai… (#7441)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-09-05 10:56:21 +08:00
Shunkangz
bddf183e15
[None][feat] Add Request specific exception (#6931)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-09-04 18:43:42 -04:00
Chang Liu
08a0e06621
[TRTLLM-7410][feat] Support hashing and KV cache reuse for videos (#7360)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
2025-09-04 14:39:23 -04:00
Yuxian Qiu
48a5270868
[https://nvbugs/5492485][fix] Use offline dataset from llm-models instead. (#7435)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-09-04 09:58:16 -07:00
sychen52
98a1bffb7c
[OMNIML-2336][feat] Add NVFP4 x FP8 (#6809)
Signed-off-by: Shiyang Chen <shiychen@nvidia.com>
2025-09-04 09:03:38 -07:00
Enwei Zhu
1745102e72
[TRTLLM-7027][feat] Fuse d2t to logitsBitmaskKernel and fix a race condition in one-model spec (#7481)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-09-04 23:30:14 +08:00
Izzy Putterman
26b133f3a7
[None][feat] MultiLayer Eagle (#7234)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-09-04 10:49:13 -04:00
Ivy Zhang
b46e0ae5d4
[None][test] update nim and full test list (#7468)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-09-04 09:06:01 -04:00
QI JUN
d38b8e3dd9
[None][ci] set TORCHINDUCTOR_COMPILE_THREADS for thop/parallel tests (#7489)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-04 06:04:51 -07:00
kris1025
cce9556858
[https://nvbugs/5485886][fix] Fix resource free of Eagle3ResourceManager (#7437)
Signed-off-by: linquanh <linquanh@nvidia.com>
2025-09-04 17:38:13 +08:00
Grzegorz Kwasniewski
3755f8ab7d
[TRTLLM-6342][fix] Fixed triggering BMM sharding (#7389)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-09-04 02:01:27 -04:00
Jin Li
2a2dfe273b
[https://nvbugs/5485102][fix] Correctly set stride for piecewise outp… (#7442)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-09-04 10:48:15 +08:00
Stanley Sun
db8eb0a447
[TRTLLM-7876][test] Test trtllm-serve with --extra_llm_api_options (#7492)
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-09-04 10:34:38 +08:00
Lizhi Zhou
d97c1e6bd9
[https://nvbugs/5470769][fix] fix disagg-serving accuracy test case (#7338)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-09-04 09:11:01 +08:00
Enwei Zhu
5ff3a65b23
[TRTLLM-7028][feat] Enable guided decoding with speculative decoding (part 2: one-model engine) (#6948)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-03 15:16:11 -07:00
Lizhi Zhou
7c73c2ff4b
[https://nvbugs/5485593][fix] improve accuracy/test_disaggregated_serving.py (#7366)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-09-03 09:38:53 -04:00
Stanley Sun
cebbf48b74
[TRTLLM-7363][test] Add 8-GPU test cases for RTX6000 (#7083)
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-09-03 08:36:52 -04:00
Mike Iovine
79d93f9419
[https://nvbugs/5488141][fix] Unwaive llama3 test_eagle3 (#7486)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-09-03 14:10:40 +08:00
Wanli Jiang
4223a9aada
[TRTLLM-7261][feat] Support phi-4 model in pytorch backend (#7371)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-09-03 10:27:42 +08:00
Daniel Stokes
109f27265c
[None][perf] Add MOE support for dynamic cluster shapes and custom epilogue schedules (#6126)
Signed-off-by: djns99 <40156487+djns99@users.noreply.github.com>
2025-09-02 21:54:43 -04:00
Eran Geva
75c1bb6389
[https://nvbugs/5458798][fix] Disabled test_trtllm_bench_backend_comparison due to timeout (#7397)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-09-02 11:21:42 -07:00
Simeng Liu
bcc55bcdf3
[https://nvbugs/5470782][fix] Add specific test names for test_deepseek.py (#7318)
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-09-02 10:31:40 -07:00
Emma Qiao
aae5d22bfe
[None][infra] Waive failed tests on main branch 0902 (#7482)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-02 10:16:49 -04:00
peaceh-nv
90479c50fb
[https://nvbugs/5453992][unwaive] Unwaive llama quickstart test (#7242)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-09-02 20:28:32 +08:00
JunyiXu-nv
eefe5f2093
[TRTLLM-7208][feat] Implement basic functionalities for Responses API (#7341)
Signed-off-by: Junyi Xu <junyix@nvidia.com>
2025-09-02 07:08:22 -04:00
HuiGao-NV
7279297717
[None][infra] waive test case failed on post-merge (#7471)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-09-02 06:20:08 -04:00
aalanwyr
c3c95736a1
[TRTLLM-6643][feat] Add DeepSeek-v3-0324 e2e torch test (#7413)
Signed-off-by: Yaran Wu <28771492+aalanwyr@users.noreply.github.com>
2025-09-02 17:21:27 +08:00
Ivy Zhang
3799e5d460
[None][test] auto reuse torch empty cache on qa test (#7421)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-09-02 04:44:47 -04:00
Yan Chunwei
f90375f37c
[https://nvbugs/5476580][fix] unwaive test_nvfp4_4gpus (#7454)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-09-02 04:17:14 -04:00
Mike Iovine
b3c57a7042
[TRTLLM-7353][feat] Implement capturable drafting loops for speculation (#7100)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-09-01 14:37:44 -04:00
Emma Qiao
01dfd3af1b
[None][infra] Waive failed case on main 0901 (#7447)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-01 23:27:24 +08:00
bhsueh_NV
16e9d1121c
[https://nvbugs/5481087][fix] fix bug of ci when we use mocker (#7332)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-09-01 16:22:45 +08:00
nvamyt
efaefca2c8
[None][test] Update case that not support passing quantization fp8 for pytorch backend (#7302)
Signed-off-by: nvamyt <amyt@nvidia.com>
2025-09-01 12:59:21 +08:00
Yiqing Yan
21291f3d8e [None][chore] Remove duplicate test waives (#6999)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Emma Qiao
09bca7ca82 [None][infra] Waive failed tests for release branch 0818 (#6993)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
peaceh-nv
f4dc1ed39c [https://nvbugs/5449218][fix] Fix KvCacheConfig error in test_perf (#6937)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Ivy Zhang
29cdcdb56a [None][fix] update skip config (#6891)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Guoming Zhang
d5bc5cd4f2 [https://nvbugs/5375646][fix] update waives.txt for nvbug 5375646 (#6847)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
William Zhang
d15dcdc4ae [https://nvbugs/5448525][fix] Mistral Small 3.1 accuracy tests (#6909)
This commit lowers the GPU memory allocated for KV cache in accuracy
tests, and adjusts a threshold for Mistral Small 3.1 24B for FP8.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Yan Chunwei
ac07418968 [None][ci] unwaive test_ptp_star_attention_example (#6943)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
xinhe-nv
b4d41d6604 [TRTLLM-7048][feat] add benchmark TRT flow test for MIG (#6884)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Yan Chunwei
612c26be22 [None][doc] add legacy section for tensorrt engine (#6724)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
2ez4bz
cf0c47ca2d [None][fix] Fix batching bug in Mistral3 model (#6841)
Prior to this commit, if multiple requests with images were in the same
batch, the batching logic for the images would fail.

This commit fixes it, and adds unit tests for it that were verified to
fail prior to the fix.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
2ez4bz
2480aedb73 [TRTLLM-5252][feat] Add fp8 support for Mistral Small 3.1 (#6731)
This commit adds some level of FP8 support to Mistral Small 3.1 by:

* disabling quantization for the vision sub-model since `modelopt` does
  support quantizing it (yet).
* extending existing accuracy tests to use a modelopt produced FP8
  checkpoint.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Guoming Zhang
3e99744201 [https://nvbugs/5375594][fix] fix oom issue on structural_tag test case (#6838)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Ivy Zhang
deba2885c1 [None][fix] fix Llama3 eagle3 test case OOM (#6832)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
xinhe-nv
7841ea6255 [None][chore] waive GB300 known issues (#6812)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Ivy Zhang
c7147d25dc [TRTLLM-6975][test] Add multi-turn test cases for VLM models (#6749)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Tian Zheng
e257cb3533
[None][feat] Support NVFP4 KV Cache (#6244)
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2025-09-01 09:24:52 +08:00
xinhe-nv
5f939b9121
[None][chore] Add failed cases into waives.txt (#7342)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-30 00:49:14 -04:00
Emma Qiao
15ec2b855d
[None][infra] Waive failed tests on main branch 08/29 (#7370)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-29 10:28:20 -04:00
Pengbo Wang @ NVIDIA
62459d533d
[None][chore] Update pre-merge test to add DeepSeek/LLaMA and gpt-oss (#7192)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
Signed-off-by: Pengbo Wang @ NVIDIA <221450789+pengbowang-nv@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-08-29 17:03:46 +08:00
fredricz-20070104
091b67ad2f
[TRTLLM-7280][test] Add beam search CudaGraph + Overlap Scheduler tests (#7326)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-08-29 02:16:22 -04:00
Chang Liu
31b0f0fb0c
[https://nvbugs/5445466][fix] Eliminate race when loading HF dynamic modules (#7268)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-08-29 12:36:30 +08:00
Richard Huo
ce580ce4f5
[None][feat] KV Cache Connector API (#7228)
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Signed-off-by: richardhuo-nv <rihuo@nvidia.com>
Co-authored-by: jthomson04 <jwillthomson19@gmail.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
2025-08-28 23:09:27 -04:00
aalanwyr
085dc19bfa
[TRTLLM-6646][test] NIM migration to TRT-LLM LLMAPI : Add QWQ-32b torch test (#7284)
Signed-off-by: Yaran Wu <28771492+aalanwyr@users.noreply.github.com>
2025-08-28 23:09:11 -04:00
Yuan Tong
ccb800f909
[TRTLLM-7457][ci] Update unittest parallel config (#7297)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-08-29 09:28:04 +08:00
Emma Qiao
1e644fa28a
[None][infra] Waive failed tests on main branch 08/26 (#7346)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-29 00:24:08 +08:00
Neta Zmora
08f935681d
[https://nvbugs/5474453][fix] fix path to tested model (#7272)
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-08-28 08:01:48 -04:00
Zongfei Jing
53163bf1df
[TRTLLM-6876][feat] Add low precision all2all for mnnvl (#7155)
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-08-28 18:26:16 +08:00
QI JUN
ae89163368
[None][ci] skip TestGPTOSS (#7333)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-28 05:01:49 -04:00
William Zhang
4541655e5f
[https://nvbugs/5430124][ci] Unwaive Mistral 3.1 Small tests (#7274)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-28 00:03:32 -04:00
QI JUN
39c9ffda5a
[None][ci] fix test list name (#7321)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-27 22:33:22 -04:00
Pengyun Lin
c1e7fb9042
[TRTLLM-7207][feat] Chat completions API for gpt-oss (#7261)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-08-28 10:22:06 +08:00
bhsueh_NV
9d345b31c0
[https://nvbugs/5453727][fix] unwaive qwen3 CI tests (#7293)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-08-27 22:58:59 +08:00
Eran Geva
462169bfc9
[https://nvbugs/5458798][fix] AD perf test outliers handling, tightened threshold, re-enabled in CI, fixed mem threshold (#7189)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-08-27 07:57:46 -07:00
QI JUN
d09add5ede
[None][ci] parallelize unit tests of auto deploy in B200 (#7291)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-27 22:32:11 +08:00
Emma Qiao
8dc62ffac4
[None][infra] Waive failed tests on main (#7300)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-27 09:53:33 -04:00
xinhe-nv
f082e4857c
[TRTLLM-7250][fix] waive failed cases (#7292)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-27 18:04:46 +08:00
nvamyt
dbd4f21687
[None][fix] Update maxnt of llama_v3.2_1b bench (#7279)
Signed-off-by: nvamyt <amyt@nvidia.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-27 16:56:28 +08:00
bhsueh_NV
f167b1fd99
[https://nvbugs/5453727][fix] Fix bug of how GPT-OSS setup the parameters in CI (#7151)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-08-27 15:26:10 +08:00
QI JUN
e08c7cf17b
[None][ci] remove test_llm_api_autodeploy from B200 test db (#7282)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-27 03:12:30 -04:00
dongxuy04
abdb2735be
[None][fix] Fix possible hang issue in WideEP and move some tests to pre-merge (#7262)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-08-27 01:39:24 -04:00
Yuan Tong
6c7813e821
[TRTLLM-7457][ci] Update & cleanup unittest parallel config (#7254)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-08-27 00:45:58 -04:00
Zhenhuan Chen
d0d8903a7f
[TRTLLM-6960][fix] replace flasky scaled_mm test with more stable config (#7089)
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
2025-08-26 20:58:33 -07:00
Shunkangz
ff4047414b
[None][opt] Balance the request based on number of tokens in AttentionDP (#7183)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-08-27 11:16:12 +08:00
Zhou Yuxin
ccb6aadea8
[https://nvbugs/5412456][fix] Remove from waives.txt (#7248)
Signed-off-by: Zhou Yuxin <yuxinz@nvidia.com>
2025-08-27 10:05:53 +08:00
Jin Li
028235404b
[TRTLLM-6633][feat] Padding for piecewise cudagraph (#6750)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-26 18:31:33 -04:00
Fridah-nv
0f947c64cb
[None][doc] Update autodeploy README.md, deprecate lm_eval in examples folder (#7233)
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-08-26 10:47:57 -07:00
Void
040f4c70d3
[None][perf] Accelerate global scale calculations for deepEP fp4 combine (#7126)
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2025-08-27 00:13:13 +08:00
QI JUN
baef70e67e
[None][ci] move qwen3 tests from b200 to gb200 (#7257)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-26 11:50:53 -04:00
xinhe-nv
80043affb5
[None][chore] Add failed cases into waives.txt (#7251)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-26 17:13:44 +08:00
amitz-nv
23ed0c892d
[https://nvbugs/5477332][fix] Relax atol in test_mamba2_chunk_scan_combined_prefill_chunking (#7215)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-08-26 10:48:58 +03:00
Zheng Duan
cf50ba2980
[TRTLLM-6549][feat] add perf metrics endpoint to openai server and openai disagg server (#6985)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-08-26 15:34:44 +08:00
Zheng Duan
1a929a1490
[https://nvbugs/5457504][fix] fix kv cache event test in disaggregated worker tests (#7028)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-08-26 14:25:10 +08:00
nvamyt
d8bd8843fc
[None][test] Update qwen3 timeout to 60 minutes (#7200)
Signed-off-by: nvamyt <amyt@nvidia.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-26 14:18:42 +08:00
qixiang-99
b165f8bc97
fix/improve kvcache allocation in PyTorch runtime (#5933)
Signed-off-by: qixiang-99 <203170375+qixiang-99@users.noreply.github.com>
2025-08-26 12:40:22 +08:00
William Zhang
92576488d3
[None][feat] Skip prefetching consolidated safetensors when appropriate (#7013)
* Why?

Some models (e.g. anything produced by Mistral) can have both sharded
safetensors and a consolidated safetensor in the same checkpoint
directory. In such cases, prefetching both to memory is a waste of time,
and memory.

* What?

This commit skips over consolidated safetensors when they are not the
only safetensor file present in the checkpoint directory

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-25 23:56:21 -04:00
Leslie Fang
20922b7d1f
[None][chore] Create PyExecutor from TorchLlmArgs Part 1 (#7105)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-26 10:42:01 +08:00
ruodil
b845eb7a3a
[None][test] add kv cache size in bench metric and fix failed cases (#7160)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-26 10:10:02 +08:00
Grzegorz Kwasniewski
2101d46d68
[TRTLLM-6342][feat] TP Sharding read from the model config (#6972)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-25 15:41:27 -07:00
chenfeiz0326
6a44e5b9d1
[https://nvbugs/5440241][fix] Fix 70B GSM8K Accuracy drop (#6967)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-08-25 22:09:30 +08:00
Emma Qiao
200db3b809
[None][infra] Waive failed tests on main branch (#7201)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-25 09:04:37 -04:00
QI JUN
bea5e07fb7
[None][refactor] refactor the CUDA graph runner to manage all CUDA graphs (#6846)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-25 20:52:05 +08:00
amitz-nv
a1e03af0f4
[TRTLLM-7346][fix] Improve performance of PyTorchModelEngine._get_lora_params_from_requests (#7033)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-08-25 10:37:40 +03:00
Ivy Zhang
f61b74f796
[None][test] add l20 specific qa test list (#7067)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-25 12:44:08 +08:00
QI JUN
630e67b845
[None][ci] waive test_mamba2_chunk_scan_combined_prefill_chunking[seqlens1-8] (#7194)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-24 23:52:59 -04:00
Yukun He
9c5b464fe0
[None][feat] Apply AutoTuner to fp8_block_scale_deep_gemm to trigger JIT ahead of time. (#7113)
Because deep_gemm.gp8_gemm_nt will trigger many JIT processes during the inference phase, we need to sweep these shapes ahead of time. Apply the AutoTuner framework to achieve this and retain the potential capability to tune the swap_ab flag.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-08-25 10:48:31 +08:00
Bo Deng
c038fb3ef4
[None][chore] cherry-pick 6940 (#7097)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-25 10:28:45 +08:00
xinhe-nv
3ba9afcc7b
[None][feat] add gpt-osss tests to sanity list (#7158)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-25 10:22:07 +08:00
Bo Deng
6e131602b2
[TRTLLM-7096][infra] Testing cache transmission functionality in Python (#7025)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-25 09:47:39 +08:00
Yiqing Yan
486bc763c3
[None][infra] Split DGX_B200 stage into multiple parts and pre-/post-merge (#7074)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-24 21:09:04 -04:00
Robin Kobus
31979aefac
[None] [ci] Reorganize CMake and Python integration test infrastructure for C++ tests (#6754)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-08-24 20:53:17 +02:00
ajrasane
068056677f
[None][chore] Enable auto deploy accuracy test in CI (#7179)
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-24 08:42:30 -07:00
Yanchao Lu
ec35481b0a
[None][infra] Prepare for single GPU GB200 test pipeline (#7073)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-24 21:46:39 +08:00
dongxuy04
19a0ea363b
[TRTLLM-6743][feat] Optimize and refactor alltoall in WideEP (#6973)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
Signed-off-by: Dongxu Yang <dongxuy@nvidia.com>
Co-authored-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-08-24 08:15:29 -04:00
Iman Tabrizian
96ff82e77a
[None][fix] Waive test (#7185)
Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
2025-08-24 10:45:11 +08:00
Izzy Putterman
b36460d7b5
[None][feat] Deepseek: Start Eagle work (#6210)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
Co-authored-by: Mike Iovine <miovine@nvidia.com>
2025-08-22 12:57:17 -04:00
tomeras91
c232ba8157
[TRTLLM-4921][feat] Enable chunked prefill for Nemotron-H (#6334)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
Signed-off-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-08-22 12:15:20 -04:00
Suyog Gupta
e3de5758a3
[#7136][feat] trtllm-serve + autodeploy integration (#7141)
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-22 08:30:53 -07:00
QI JUN
1388e84793
[None][ci] move all B200 TensorRT test cases to post merge (#7165)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-22 06:47:23 -04:00
xinhe-nv
b8b2bd4a0a
[TRTLLM-7245][feat] add test_multi_nodes_eval tests (#7108)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-22 17:17:27 +08:00
Linda
898f37faa0
[None][feat] Enable nanobind as the default binding library (#6608)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-08-22 09:48:41 +02:00
Daniel Cámpora
099f081e03
[TRTLLM-7155][feat] Unify sampler handle logits implementation. (#6867)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-08-22 08:09:30 +02:00
xinhe-nv
4017f7cd6b
[None][chore] Add failed cases into waives.txt (#7109)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-22 10:39:25 +08:00
Wanli Jiang
07c711eb1f
[TRTLLM-6825][fix] Update lora for phi4-mm (#6817)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-08-21 22:00:04 -04:00
dominicshanshan
6f245ec78b
[None][chore] Mass integration of release/1.0 (#6864)
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: raayandhar <rdhar@nvidia.com>
Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: 2ez4bz <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Raayan Dhar <58057652+raayandhar@users.noreply.github.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-08-22 09:25:15 +08:00
Daniel Stokes
f7c597ec40
[None][perf] Make finalize fusion part of the tactic selection logic (#6915)
Signed-off-by: djns99 <40156487+djns99@users.noreply.github.com>
2025-08-21 14:08:03 -07:00
Fridah-nv
e18dacc931
[#4403][refactor] Move fusion, kvcache, and compile to modular inference optimizer (#7057)
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Co-authored-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
2025-08-21 10:30:36 -07:00
Emma Qiao
344bc4575d
[None][infra] Waive failed case for main branch (#7129)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-22 00:08:55 +08:00
Dimitrios Bariamis
f49dafe0da
[https://nvbugs/5394409][feat] Support Mistral Small 3.1 multimodal in Triton Backend (#6714)
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Signed-off-by: Dimitrios Bariamis <dbari@users.noreply.github.com>
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
2025-08-21 18:08:38 +02:00
bhsueh_NV
ba0a86e0bb
[https://nvbugs/5437405][fix] qwen3 235b eagle3 ci (#7000)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-08-21 01:17:32 -04:00
xinhe-nv
21f4434404
[None][chore] waive failed cases on H100 (#7084)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-21 11:15:23 +08:00
Chang Liu
75b8a90816
[None][fix] Fix llama4 multimodal by skipping request validation (#6957)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-08-20 21:58:53 -04:00
Yechan Kim
0893afae3d
[TRTLLM-6771][feat] Support MMMU for multimodal models (#6828)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-21 08:54:12 +08:00
bhsueh_NV
73d2daa386
[https://nvbugs/5457489][fix] unwaive some tests (#6991)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-08-21 08:49:57 +08:00
QI JUN
a918de710a
[None][ci] move some tests of b200 to post merge (#7093)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-20 19:43:40 -04:00
Emma Qiao
f84dd64250
[None][infra] Waive failed tests on main branch 8/20 (#7092)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-20 06:33:44 -04:00
Robin Kobus
b95cab2a7c
[None][ci] move unittests to sub-directories (#6635)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-08-20 05:42:22 -04:00
Iman Tabrizian
e27088421e
[None][infra] "[TRTLLM-6960][fix] enable scaled_mm tests (#6936)" (#7059)
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
2025-08-20 01:45:09 -04:00
xinhe-nv
9e71b4fda4
[TRTLLM-7205][feat] add llama4 tp4 tests (#6989)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-20 13:22:05 +08:00
Leslie Fang
3f6a9267f1
[None][infra] update feature_combination_matrix of disaggregated and chunked prefill (#6661)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-20 13:14:34 +08:00
Chang Liu
ce53832610
[TRTLLM-7326][feat] Add standalone multimodal encoder (#6743)
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-08-19 21:42:50 -07:00
Ivy Zhang
fc85e3db1c
[None][fix] fix llmapi import error (#7030)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-19 22:58:13 -04:00
Bo Deng
30da5d3cc4
[None][chore] unwaive test_disaggregated_genbs1 (#6944)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-20 09:57:35 +08:00
Yanchao Lu
d26a5a93ad
[https://nvbugs/5451296][bug] Cherry-pick #7017 from release/1.0 branch (#7043)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
2025-08-19 11:25:05 -04:00
pcastonguay
e07fcc3a22
[https://nvbugs/5444937][chore] Fixing KV events tests (#7004)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-08-19 11:18:04 -04:00
zhhuang-nv
7e135d2ea7
[None][feat] Use Separate QKV Input Layout for Context MLA (#6538)
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
2025-08-19 22:04:48 +08:00
Emma Qiao
8f95f35503
[None][infra] Waive failed tests on main (#7037)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-19 09:31:07 -04:00
Yiqing Yan
07506bccbe
[None][chore] Remove duplicate test waives (#7044)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-19 21:04:31 +08:00
Fanrong Li
655d0f48d0
[https://nvbugs/5455140][fix] unwaive DSR1-fp4 throughput_tp8 (#7022)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-08-19 20:48:05 +08:00
tomeras91
f0bfb49219
[https://nvbugs/5458874][fix] Fix Nemotron-H flaky CUDA graph / overlap scheduler test (#6996)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-08-19 15:45:06 +03:00
xinhe-nv
2c86cee38c
[None][chore] Remove closed bugs (#6969)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-19 16:01:33 +08:00
Shunkangz
54ec2c1af1
[None][opt] Add batch wait timeout in fetching requests (#6923)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-08-19 03:50:08 -04:00
Eran Geva
636c622bb8
[https://nvbugs/5458798][fix] Relaxed test threshold, added documentation (#6997)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-19 00:24:03 -07:00
Ivy Zhang
bff5fdf6df
[TRTLLM-6541][test] Add NIM Related Cases Part 1 (#6684)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-19 13:59:14 +08:00
William Zhang
daa2a65d37
[https://nvbugs/5454875][ci] Unwaive Mistral Small 3.1 test (#7011)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-19 00:32:14 -04:00
fredricz-20070104
e90280a84d
[TRTLLM-6541][test] Add NIM Related Cases [StarCoder2_7B] and [Codestral_22B_V01] (#6939)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-08-19 00:13:04 -04:00
Fanrong Li
816a120af6
[TRTLLM-6991][chore] add DeepSeek-R1 FP8 accuracy tests on Blackwell (#6710)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-08-19 00:03:03 -04:00
Zhenhuan Chen
2bb90ba002
[TRTLLM-6960][fix] enable scaled_mm tests (#6936)
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
2025-08-19 10:18:04 +08:00
Yi Zhang
a15af879ec
[None][refactor] Refactor Torch Compile Backend, MoeLoadBalancer and warmup Logic (#6615)
Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-08-19 09:58:44 +08:00
Lizhi Zhou
71e28eab36
[TRTLLM-7014][chore] Add accuracy test for ctx and gen workers with different models (#6741)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-08-19 09:58:22 +08:00
Wanli Jiang
dabebb2c7a
[https://nvbugs/5371480][fix] Enable test_phi3_small_8k (#6938)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-08-19 09:42:35 +08:00
Leslie Fang
e76e5c640f
[None][infra] Enable accuracy test for mtp and chunked prefill (#6314)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-19 07:42:52 +08:00
Yiqing Yan
1ce23545fc
[None][chore] Remove duplicate test waives (#6998)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-18 21:15:49 +08:00
Emma Qiao
69ff32f9b1
[None][infra] Waive failed tests on main 0818 (#6992)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-18 20:34:52 +08:00
Shi Xiaowei
5ec15b98f0
[TRTLLM-7030][fix] uppercase def value in pd-config (#6981)
Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-08-18 02:33:23 -04:00
Leslie Fang
ce0b13ea02
[None][infra] update feature_combination_matrix of disaggregated and Eagle3 (#6945)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-18 09:18:17 +08:00
Naveassaf
d6322f70b7
[https://nvbugs/5451028][fix] Constrain NemotronSuper test parameters to prevent OOMs (#6970)
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
2025-08-17 13:38:36 -04:00
amitz-nv
3a49b47081
[https://nvbugs/5390853][fix] Fix _test_openai_lora.py - disable cuda graph (#6965)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-08-17 16:56:16 +03:00
Emma Qiao
cc6d763824
[None][infra]Waive failed cases in main branch (#6951)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-17 14:27:59 +03:00
bhsueh_NV
85cbd0263b
[None][feat] Support Yarn on Qwen3 (#6785)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-08-17 07:21:29 +08:00
Daniel Cámpora
53312eeebd
[TRTLLM-7157][feat] BREAKING CHANGE Introduce sampler_type, detect sampler according to options (#6831)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-08-16 00:27:24 -04:00
brb-nv
9505727d31
[https://nvbugs/5401114][fix] Unwaive Gemma3 tests (#6952)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-15 16:35:02 -07:00
Yuening Li
1f8ae2b2db
[TRTLLM-5863][feat] Support MoE INT8 Weight-Only-Quantization in PyTorch Workflow (#6629)
Signed-off-by: Yuening Li <62227368+yueningl@users.noreply.github.com>
2025-08-15 17:15:49 -04:00
dongfengy
0ad0b967bb
[None][fix] Make TP working for Triton MOE (in additional to EP we are using) (#6722)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-08-15 16:58:42 -04:00
ajrasane
4162d2d746
[None][test] Add accuracy evaluation for AutoDeploy (#6764)
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-15 13:46:09 -04:00
yifeizhang-c
4127d77678
[https://nvbugs/5394392][fix] Enlarge scheduler capacity under disagg bs == 1 (#6537)
Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
2025-08-15 09:52:06 -07:00
liji-nv
18ccd053d3
[https://nvbugs/5427801][fix] Torch compile support for Llama4 and Ea… (#6858)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-15 11:14:20 -04:00
peaceh-nv
1c1d5d2495
[https://nvbugs/5451373][fix] : Fix the accuracy issue when using FP8 context MLA (#6881)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-08-15 16:53:56 +08:00
xinhe-nv
b23fdfc62f
[None][chore] Add failed cases into waives.txt (#6914)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-08-15 14:00:16 +08:00
Yanchao Lu
3a987891d8
[TRTLLM-7141][infra] Use repo mirrors to avoid intermittent network failures (#6836)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-08-15 11:16:07 +08:00
Bo Deng
e54ba75dac
[None][fix] Update tests to use standardized uppercase backend identifiers (#6921)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-15 11:14:15 +08:00
Frank
2cc59aacb3
[None][fix] Correct reporting of torch_dtype for ModelConfig class. (#6800)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-08-14 22:46:20 -04:00