Zhanrui Sun
ee37589c8c
infra: update DLFW 25.08 GA, triton 25.08 GA
Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com>
2025-08-27 20:17:56 -07:00
Xiwen Yu
9ad68de159
Merge branch 'user/xiweny/update_cutlass_4.2' into 'feat/b300_cu13'
update cutlass and DeepGEMM
See merge request ftp/tekit!9678
Signed-off-by: Xiwen Yu <xiweny@nvidia.com>
2025-08-26 22:43:39 -07:00
Xiwen Yu
b1c6f6a568
update cutlass and DeepGEMM
Signed-off-by: Xiwen Yu <xiweny@nvidia.com>
2025-08-26 22:43:39 -07:00
Xiwen Yu
ab7febd4d8
Merge commit '31979aefacbf80d2742c98ef30385db162788c84' into feat/b300_cu13
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-08-26 10:31:35 +08:00
Xiwen Yu
66b1d8d66d
Update flashinfer
2025-08-24 22:18:32 -07:00
Xiwen Yu
80ea0628d7
fix cubins
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-08-25 11:41:25 +08:00
Robin Kobus
31979aefac
[None] [ci] Reorganize CMake and Python integration test infrastructure for C++ tests (#6754)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-08-24 20:53:17 +02:00
ajrasane
068056677f
[None][chore] Enable auto deploy accuracy test in CI (#7179)
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-24 08:42:30 -07:00
Yanchao Lu
ec35481b0a
[None][infra] Prepare for single GPU GB200 test pipeline (#7073)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-24 21:46:39 +08:00
dongfengy
48155f52bf
[TRTLLM-7321][doc] Refine GPT-OSS doc (#7180)
Signed-off-by: Dongfeng Yu
2025-08-24 08:53:53 -04:00
dongxuy04
19a0ea363b
[TRTLLM-6743][feat] Optimize and refactor alltoall in WideEP (#6973)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
Signed-off-by: Dongxu Yang <dongxuy@nvidia.com>
Co-authored-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-08-24 08:15:29 -04:00
amitz-nv
35e0ae484a
[https://nvbugs/5467232][fix] Fix load_torch_hf_lora to override lora_config.trtllm_modules_to_hf_modules with default only when it has no value (#7132)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-08-24 15:00:24 +03:00
Iman Tabrizian
96ff82e77a
[None][fix] Waive test (#7185)
Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
2025-08-24 10:45:11 +08:00
Xiwen Yu
90a9bc463d
fix build error
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-08-23 23:03:32 +08:00
Xiwen Yu
808059da34
Merge remote-tracking branch 'gitlab/main' into user/xiweny/merge_main_0819
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-08-23 16:13:30 +08:00
Xiwen Yu
fa8b52ed33
fix more sm version check
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-08-23 15:17:59 +08:00
Xiwen Yu
b7cc06cd6a
disable merge waive list stage
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-08-23 15:17:57 +08:00
Xiwen Yu
f4de8840ec
Merge remote-tracking branch 'gitlab/main' into user/xiweny/merge_main_0819
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-08-23 15:17:48 +08:00
Xiwen Yu
5391191d7f
update tg cubins (temp ver)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-08-23 15:14:38 +08:00
Grace Ho
3d54a1a521
[None] [feat] nsys profile output kernel classifier (#7020)
Signed-off-by: Grace Ho <grho@nvidia.com>
2025-08-23 00:57:37 -04:00
Frank
81fd468fec
[None][fix] Correct KV cache percentage report out. (#7102)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-08-22 10:28:57 -07:00
Izzy Putterman
b36460d7b5
[None][feat] Deepseek: Start Eagle work (#6210)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
Co-authored-by: Mike Iovine <miovine@nvidia.com>
2025-08-22 12:57:17 -04:00
Robin Kobus
37543a9ad7
[None][refactor] Simplify decoder state initialization for speculative decoding (#6869)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-08-22 18:44:17 +02:00
tomeras91
c232ba8157
[TRTLLM-4921][feat] Enable chunked prefill for Nemotron-H (#6334)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
Signed-off-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-08-22 12:15:20 -04:00
Suyog Gupta
e3de5758a3
[#7136][feat] trtllm-serve + autodeploy integration (#7141)
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-22 08:30:53 -07:00
Yiqing Yan
907bc22fcb
[None][chore] Bump version to 1.1.0rc2 (#7167)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-22 22:02:28 +08:00
QI JUN
1388e84793
[None][ci] move all B200 TensorRT test cases to post merge (#7165)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-22 06:47:23 -04:00
xinhe-nv
b8b2bd4a0a
[TRTLLM-7245][feat] add test_multi_nodes_eval tests (#7108)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-22 17:17:27 +08:00
dongfengy
d94cc3fa3c
[TRTLLM-7321][doc] Add GPT-OSS Deployment Guide into official doc site (#7143)
Signed-off-by: Dongfeng Yu
2025-08-22 16:17:01 +08:00
Linda
898f37faa0
[None][feat] Enable nanobind as the default binding library (#6608)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-08-22 09:48:41 +02:00
Emma Qiao
a49cf684f8
[TRTLLM-5801][infra] Add more RTX Pro 6000 test stages (#5126)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-22 03:12:02 -04:00
Daniel Cámpora
099f081e03
[TRTLLM-7155][feat] Unify sampler handle logits implementation. (#6867)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-08-22 08:09:30 +02:00
Yukun He
983dd7e57c
[None][fix] Fix mm_placholder_counts extraction issue. (#7118)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-08-22 12:28:30 +08:00
xinhe-nv
4017f7cd6b
[None][chore] Add failed cases into waives.txt (#7109)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-22 10:39:25 +08:00
Wanli Jiang
07c711eb1f
[TRTLLM-6825][fix] Update lora for phi4-mm (#6817)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-08-21 22:00:04 -04:00
Suyog Gupta
c5036cb536
[None][docs] update stale link for AutoDeploy (#7135)
2025-08-21 18:41:44 -07:00
dominicshanshan
6f245ec78b
[None][chore] Mass integration of release/1.0 (#6864)
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: raayandhar <rdhar@nvidia.com>
Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: 2ez4bz <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Raayan Dhar <58057652+raayandhar@users.noreply.github.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-08-22 09:25:15 +08:00
Daniel Stokes
f7c597ec40
[None][perf] Make finalize fusion part of the tactic selection logic (#6915)
Signed-off-by: djns99 <40156487+djns99@users.noreply.github.com>
2025-08-21 14:08:03 -07:00
Fridah-nv
e18dacc931
[#4403][refactor] Move fusion, kvcache, and compile to modular inference optimizer (#7057)
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Co-authored-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
2025-08-21 10:30:36 -07:00
Emma Qiao
344bc4575d
[None][infra] Waive failed case for main branch (#7129)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-22 00:08:55 +08:00
Dimitrios Bariamis
f49dafe0da
[https://nvbugs/5394409][feat] Support Mistral Small 3.1 multimodal in Triton Backend (#6714)
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Signed-off-by: Dimitrios Bariamis <dbari@users.noreply.github.com>
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
2025-08-21 18:08:38 +02:00
brb-nv
9a2b44d0f2
[None][chore] No-op changes to support context parallelism in disaggregated serving later (#7063)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-21 08:21:27 -07:00
Yuan Tong
90bfc8cc29
[https://nvbugs/5453827][fix] Fix RPATH of th_common shared library to find pip-installed NCCL (#6984)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-08-21 17:58:30 +08:00
ChristinaZ
c7269ea93a
[https://nvbugs/5392414] [fix] Add customized default routing method (#6818)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-08-21 16:58:41 +08:00
Farshad Ghodsian
2d40e8750b
[None][doc] Update gpt-oss deployment guide to latest release image (#7101)
Signed-off-by: Farshad Ghodsian <47931571+farshadghodsian@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-08-21 02:33:07 -04:00
bhsueh_NV
ba0a86e0bb
[https://nvbugs/5437405][fix] qwen3 235b eagle3 ci (#7000)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-08-21 01:17:32 -04:00
Fridah-nv
647a52698a
[https://nvbugs/5443039][fix] Fix AutoDeploy pattern matcher for torch 2.8 (#7076)
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-08-21 01:14:51 -04:00
Yao Yao
cbcea33279
[fix]: use safeInitRowMax instead of fp32_lowest to avoid NaN (#7087)
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
2025-08-20 22:12:21 -07:00
xinhe-nv
21f4434404
[None][chore] waive failed cases on H100 (#7084)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-21 11:15:23 +08:00
Fan - Yunfan
41ff4901ee
[None][fix] Fix const modifier inconsistency in log function declaration/implementation (#6679)
Signed-off-by: fanyunfan <2569548856@qq.com>
Co-authored-by: fanyunfan <2569658856@qq.com>
Co-authored-by: Yunfan Fan <46273019+fyf2016@users.noreply.github.com>
2025-08-21 11:08:11 +08:00