Commit Graph

2375 Commits

Author SHA1 Message Date
Yi Zhang
a15af879ec
[None][refactor] Refactor Torch Compile Backend, MoeLoadBalancer and warmup Logic (#6615)
Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-08-19 09:58:44 +08:00
Lizhi Zhou
71e28eab36
[TRTLLM-7014][chore] Add accuracy test for ctx and gen workers with different models (#6741)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-08-19 09:58:22 +08:00
Wanli Jiang
dabebb2c7a
[https://nvbugs/5371480][fix] Enable test_phi3_small_8k (#6938)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-08-19 09:42:35 +08:00
Fridah-nv
97ba0eb879
[None][autodeploy] Doc: fix link path in trtllm bench doc (#7007)
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-08-19 08:43:28 +08:00
Leslie Fang
e76e5c640f
[None][infra] Enable accuracy test for mtp and chunked prefill (#6314)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-19 07:42:52 +08:00
Daniel Cámpora
d16af87d03
[TRTLLM-7158][feat] Introduce sampler options in trtllm bench (#6855)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-08-18 18:10:05 -04:00
Yanchao Lu
d1d17dbeba
[None][infra] Cherry-pick #6836 from main branch and improve SSH connection (#6971) (#7005)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-08-19 01:35:30 +08:00
Martin Marciniszyn Mehringer
425dad01fd
[None][fix] Clean up linking to CUDA stub libraries in build_wheel.py (#6823)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Co-authored-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-08-18 11:20:51 -04:00
Yiqing Yan
1ce23545fc
[None][chore] Remove duplicate test waives (#6998)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-18 21:15:49 +08:00
Emma Qiao
69ff32f9b1
[None][infra] Waive failed tests on main 0818 (#6992)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-18 20:34:52 +08:00
ChristinaZ
55f4f2d80c
[None] [fix] Fix the macro name (#6983)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-08-18 03:08:32 -04:00
Shi Xiaowei
5ec15b98f0
[TRTLLM-7030][fix] uppercase def value in pd-config (#6981)
Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-08-18 02:33:23 -04:00
Kaiyu Xie
e88cb92f24
[None] [feat] Support accurate device iter time (#6906)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-08-18 13:47:14 +08:00
Bo Li
8b05b5d801
[None][doc] Update gpt oss doc (#6954)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-08-18 01:27:30 -04:00
Leslie Fang
ce0b13ea02
[None][infra] update feature_combination_matrix of disaggregated and Eagle3 (#6945)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-18 09:18:17 +08:00
Naveassaf
d6322f70b7
[https://nvbugs/5451028][fix] Constrain NemotronSuper test parameters to prevent OOMs (#6970)
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
2025-08-17 13:38:36 -04:00
amitz-nv
3a49b47081
[https://nvbugs/5390853][fix] Fix _test_openai_lora.py - disable cuda graph (#6965)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-08-17 16:56:16 +03:00
Emma Qiao
cc6d763824
[None][infra]Waive failed cases in main branch (#6951)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-17 14:27:59 +03:00
ChristinaZ
1e72721e8c
[None][feat] Add single block version renormalized routing kernel (#6756)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-08-17 13:47:13 +08:00
bhsueh_NV
85cbd0263b
[None][feat] Support Yarn on Qwen3 (#6785)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-08-17 07:21:29 +08:00
Fan - Yunfan
22d59a6f61
[None][fix] Using RAII to automatically manage the allocation and release of va_list for potential resource leak (#6758)
Signed-off-by: fanyunfan <2569548856@qq.com>
Co-authored-by: fanyunfan <2569658856@qq.com>
Co-authored-by: Yunfan Fan <46273019+fyf2016@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-08-16 15:19:19 +08:00
Izzy Putterman
f6ff0e3311
[None][fix] Skip Topk if 0 (#6934)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-08-16 02:17:36 -04:00
Daniel Cámpora
53312eeebd
[TRTLLM-7157][feat] BREAKING CHANGE Introduce sampler_type, detect sampler according to options (#6831)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-08-16 00:27:24 -04:00
Yiqing Yan
ec3d9f8052
[None][chore] Bump version to 1.1.0rc1 (#6953)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-16 10:32:47 +08:00
brb-nv
9505727d31
[https://nvbugs/5401114][fix] Unwaive Gemma3 tests (#6952)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-15 16:35:02 -07:00
Yuening Li
1f8ae2b2db
[TRTLLM-5863][feat] Support MoE INT8 Weight-Only-Quantization in PyTorch Workflow (#6629)
Signed-off-by: Yuening Li <62227368+yueningl@users.noreply.github.com>
2025-08-15 17:15:49 -04:00
dongfengy
0ad0b967bb
[None][fix] Make TP working for Triton MOE (in additional to EP we are using) (#6722)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-08-15 16:58:42 -04:00
ajrasane
4162d2d746
[None][test] Add accuracy evaluation for AutoDeploy (#6764)
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-15 13:46:09 -04:00
yifeizhang-c
4127d77678
[https://nvbugs/5394392][fix] Enlarge scheduler capacity under disagg bs == 1 (#6537)
Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
2025-08-15 09:52:06 -07:00
Perkz Zheng
6037fe3716
[https://nvbugs/5394685][fix] proper fix for the accuracy issue in 2CTA MLA kernels (#6941)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-08-15 23:29:36 +08:00
liji-nv
18ccd053d3
[https://nvbugs/5427801][fix] Torch compile support for Llama4 and Ea… (#6858)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-15 11:14:20 -04:00
tomeras91
f7dbc1435a
[None] [chore] Mamba cache in separate file (#6796)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-08-15 13:42:51 +03:00
Xianjie Qiao
c2fe8b03a2
[https://nvbugs/5405041][fix] Update wide-ep doc (#6933)
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
2025-08-15 05:32:32 -04:00
peaceh-nv
1c1d5d2495
[https://nvbugs/5451373][fix] : Fix the accuracy issue when using FP8 context MLA (#6881)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-08-15 16:53:56 +08:00
Zhenhua Wang
fadb5e75dd
[None][chore] add a EditorConfig config (#6897)
Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>
2025-08-15 03:54:37 -04:00
xinhe-nv
b23fdfc62f
[None][chore] Add failed cases into waives.txt (#6914)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-08-15 14:00:16 +08:00
jmydurant
8e252256f5
[None][doc] Modify the description for mla chunked context (#6929)
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-08-15 12:52:26 +08:00
Yanchao Lu
3a987891d8
[TRTLLM-7141][infra] Use repo mirrors to avoid intermittent network failures (#6836)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-08-15 11:16:07 +08:00
Bo Deng
e54ba75dac
[None][fix] Update tests to use standardized uppercase backend identifiers (#6921)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-15 11:14:15 +08:00
Wanli Jiang
9a133e9b41
[https://nvbugs/5415862][fix] Update cublas as 12.9.1 and cuda memory alignment as 256 (#6501)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-08-15 11:10:59 +08:00
Bo Li
15aabc1540
[None][fix] Fix perfect router. (#6797)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-08-14 20:09:08 -07:00
Frank
2cc59aacb3
[None][fix] Correct reporting of torch_dtype for ModelConfig class. (#6800)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-08-14 22:46:20 -04:00
Yunfan Fan
11d08c33af
[None][fix] Fix responsibility boundary between the assert and tllmException files (#6723)
Signed-off-by: fanyunfan <2569548856@qq.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-08-15 10:34:49 +08:00
JunyiXu-nv
70e352a6f7
[https://nvbugs/5437106][fix] Add L4 Scout benchmarking WAR option in deploy guide (#6829)
Signed-off-by: Junyi Xu <junyix@nvidia.com>
2025-08-15 08:53:13 +08:00
Perkz Zheng
11d89a3732
[https://nvbugs/5394685][fix] using static scheduler 2CTA MLA as WAR for an accuracy issue (#6896)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-08-15 08:51:04 +08:00
hlu1
5346eb7bc5
[None][doc] Update gpt-oss doc on MoE support matrix (#6908)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-08-15 08:50:31 +08:00
Aurelien Chartier
b13a5a99b2
[None][chore] Add tests for non-existent and completed request cancellation (#6840)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-08-14 15:57:01 -07:00
qianbiao
5c2f0fd03d
[None] [feat] Add Tencent HunYuanMoEV1 model support (#5521)
Signed-off-by: sorenwu <sorenwu@tencent.com>
Co-authored-by: sorenwu <sorenwu@tencent.com>
Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>
2025-08-15 06:56:44 +08:00
Raayan Dhar
8b237b943b
[https://nvbugs/5441714][chore] remove skip on disagg n-gram test (#6872)
Signed-off-by: raayandhar <rdhar@nvidia.com>
2025-08-14 15:45:00 -07:00
Mike Iovine
078e907b16
[https://nvbugs/5455651][fix] Make ngram use XQA attention on Blackwell (#6873)
Signed-off-by: Michael Iovine <miovine@nvidia.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
Signed-off-by: Mike Iovine <mike.iovine7@gmail.com>
2025-08-14 18:36:19 -04:00