chenfeiz0326
6a44e5b9d1
[ https://nvbugs/5440241 ][fix] Fix 70B GSM8K Accuracy drop ( #6967 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-08-25 22:09:30 +08:00
Emma Qiao
200db3b809
[None][infra] Waive failed tests on main branch ( #7201 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-25 09:04:37 -04:00
QI JUN
bea5e07fb7
[None][refactor] refactor the CUDA graph runner to manage all CUDA graphs ( #6846 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-25 20:52:05 +08:00
shaharmor98
b32e00e9fd
[None][chore] remove CLI support for mamba cache dtype setting ( #7119 )
...
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-08-25 08:08:51 -04:00
amitz-nv
a1e03af0f4
[TRTLLM-7346][fix] Improve performance of PyTorchModelEngine._get_lora_params_from_requests ( #7033 )
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-08-25 10:37:40 +03:00
Enwei Zhu
be6d92f09f
[None][fix] Fix MoE load balancer config loading ( #7150 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-08-25 01:42:54 -04:00
Ivy Zhang
f61b74f796
[None][test] add l20 specific qa test list ( #7067 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-25 12:44:08 +08:00
QI JUN
630e67b845
[None][ci] waive test_mamba2_chunk_scan_combined_prefill_chunking[seqlens1-8] ( #7194 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-24 23:52:59 -04:00
Yukun He
9c5b464fe0
[None][feat] Apply AutoTuner to fp8_block_scale_deep_gemm to trigger JIT ahead of time. ( #7113 )
...
Because deep_gemm.gp8_gemm_nt will trigger many JIT processes during the inference phase, we need to sweep these shapes ahead of time. Apply the AutoTuner framework to achieve this and retain the potential capability to tune the swap_ab flag.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-08-25 10:48:31 +08:00
Bo Deng
c038fb3ef4
[None][chore] cherry-pick 6940 ( #7097 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-25 10:28:45 +08:00
xinhe-nv
3ba9afcc7b
[None][feat] add gpt-osss tests to sanity list ( #7158 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-25 10:22:07 +08:00
Bo Deng
6e131602b2
[TRTLLM-7096][infra] Testing cache transmission functionality in Python ( #7025 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-25 09:47:39 +08:00
Yiqing Yan
486bc763c3
[None][infra] Split DGX_B200 stage into multiple parts and pre-/post-merge ( #7074 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-24 21:09:04 -04:00
Robin Kobus
31979aefac
[None] [ci] Reorganize CMake and Python integration test infrastructure for C++ tests ( #6754 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-08-24 20:53:17 +02:00
ajrasane
068056677f
[None][chore] Enable auto deploy accuracy test in CI ( #7179 )
...
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-24 08:42:30 -07:00
Yanchao Lu
ec35481b0a
[None][infra] Prepare for single GPU GB200 test pipeline ( #7073 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-24 21:46:39 +08:00
dongfengy
48155f52bf
[TRTLLM-7321][doc] Refine GPT-OSS doc ( #7180 )
...
Signed-off-by: Dongfeng Yu
2025-08-24 08:53:53 -04:00
dongxuy04
19a0ea363b
[TRTLLM-6743][feat] Optimize and refactor alltoall in WideEP ( #6973 )
...
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
Signed-off-by: Dongxu Yang <dongxuy@nvidia.com>
Co-authored-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-08-24 08:15:29 -04:00
amitz-nv
35e0ae484a
[ https://nvbugs/5467232 ][fix] Fix load_torch_hf_lora to override lora_config.trtllm_modules_to_hf_modules with default only when it has no value ( #7132 )
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-08-24 15:00:24 +03:00
Iman Tabrizian
96ff82e77a
[None][fix] Waive test ( #7185 )
...
Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
2025-08-24 10:45:11 +08:00
Grace Ho
3d54a1a521
[None] [feat] nsys profile output kernel classifier ( #7020 )
...
Signed-off-by: Grace Ho <grho@nvidia.com>
2025-08-23 00:57:37 -04:00
Frank
81fd468fec
[None][fix] Correct KV cache percentage report out. ( #7102 )
...
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-08-22 10:28:57 -07:00
Izzy Putterman
b36460d7b5
[None][feat] Deepseek: Start Eagle work ( #6210 )
...
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
Co-authored-by: Mike Iovine <miovine@nvidia.com>
2025-08-22 12:57:17 -04:00
Robin Kobus
37543a9ad7
[None][refactor] Simplify decoder state initialization for speculative decoding ( #6869 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-08-22 18:44:17 +02:00
tomeras91
c232ba8157
[TRTLLM-4921][feat] Enable chunked prefill for Nemotron-H ( #6334 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
Signed-off-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-08-22 12:15:20 -04:00
Suyog Gupta
e3de5758a3
[ #7136 ][feat] trtllm-serve + autodeploy integration ( #7141 )
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-22 08:30:53 -07:00
Yiqing Yan
907bc22fcb
[None][chore] Bump version to 1.1.0rc2 ( #7167 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-22 22:02:28 +08:00
QI JUN
1388e84793
[None][ci] move all B200 TensorRT test cases to post merge ( #7165 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-22 06:47:23 -04:00
xinhe-nv
b8b2bd4a0a
[TRTLLM-7245][feat] add test_multi_nodes_eval tests ( #7108 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-22 17:17:27 +08:00
dongfengy
d94cc3fa3c
[TRTLLM-7321][doc] Add GPT-OSS Deployment Guide into official doc site ( #7143 )
...
Signed-off-by: Dongfeng Yu
2025-08-22 16:17:01 +08:00
Linda
898f37faa0
[None][feat] Enable nanobind as the default binding library ( #6608 )
...
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-08-22 09:48:41 +02:00
Emma Qiao
a49cf684f8
[TRTLLM-5801][infra] Add more RTX Pro 6000 test stages ( #5126 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-22 03:12:02 -04:00
Daniel Cámpora
099f081e03
[TRTLLM-7155][feat] Unify sampler handle logits implementation. ( #6867 )
...
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-08-22 08:09:30 +02:00
Yukun He
983dd7e57c
[None][fix] Fix mm_placholder_counts extraction issue. ( #7118 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-08-22 12:28:30 +08:00
xinhe-nv
4017f7cd6b
[None][chore] Add failed cases into waives.txt ( #7109 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-22 10:39:25 +08:00
Wanli Jiang
07c711eb1f
[TRTLLM-6825][fix] Update lora for phi4-mm ( #6817 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-08-21 22:00:04 -04:00
Suyog Gupta
c5036cb536
[None][docs] update stale link for AutoDeploy ( #7135 )
2025-08-21 18:41:44 -07:00
dominicshanshan
6f245ec78b
[None][chore] Mass integration of release/1.0 ( #6864 )
...
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: raayandhar <rdhar@nvidia.com>
Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: 2ez4bz <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Raayan Dhar <58057652+raayandhar@users.noreply.github.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-08-22 09:25:15 +08:00
Daniel Stokes
f7c597ec40
[None][perf] Make finalize fusion part of the tactic selection logic ( #6915 )
...
Signed-off-by: djns99 <40156487+djns99@users.noreply.github.com>
2025-08-21 14:08:03 -07:00
Fridah-nv
e18dacc931
[ #4403 ][refactor] Move fusion, kvcache, and compile to modular inference optimizer ( #7057 )
...
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Co-authored-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
2025-08-21 10:30:36 -07:00
Emma Qiao
344bc4575d
[None][infra] Waive failed case for main branch ( #7129 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-22 00:08:55 +08:00
Dimitrios Bariamis
f49dafe0da
[ https://nvbugs/5394409 ][feat] Support Mistral Small 3.1 multimodal in Triton Backend ( #6714 )
...
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Signed-off-by: Dimitrios Bariamis <dbari@users.noreply.github.com>
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
2025-08-21 18:08:38 +02:00
brb-nv
9a2b44d0f2
[None][chore] No-op changes to support context parallelism in disaggregated serving later ( #7063 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-21 08:21:27 -07:00
Yuan Tong
90bfc8cc29
[ https://nvbugs/5453827 ][fix] Fix RPATH of th_common shared library to find pip-installed NCCL ( #6984 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-08-21 17:58:30 +08:00
ChristinaZ
c7269ea93a
[ https://nvbugs/5392414 ] [fix] Add customized default routing method ( #6818 )
...
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-08-21 16:58:41 +08:00
Farshad Ghodsian
2d40e8750b
[None][doc] Update gpt-oss deployment guide to latest release image ( #7101 )
...
Signed-off-by: Farshad Ghodsian <47931571+farshadghodsian@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-08-21 02:33:07 -04:00
bhsueh_NV
ba0a86e0bb
[ https://nvbugs/5437405 ][fix] qwen3 235b eagle3 ci ( #7000 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-08-21 01:17:32 -04:00
Fridah-nv
647a52698a
[ https://nvbugs/5443039 ][fix] Fix AutoDeploy pattern matcher for torch 2.8 ( #7076 )
...
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-08-21 01:14:51 -04:00
Yao Yao
cbcea33279
[fix]: use safeInitRowMax instead of fp32_lowest to avoid NaN ( #7087 )
...
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
2025-08-20 22:12:21 -07:00
xinhe-nv
21f4434404
[None][chore] waive failed cases on H100 ( #7084 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-21 11:15:23 +08:00