Xiwen Yu
62a78973a8
Merge remote-tracking branch 'origin/main' into user/xiweny/merge_0901
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-02 10:12:30 +08:00
Leslie Fang
e81c50dbd2
[None][chore] Use llm args in create_py_executor (#7239)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-09-01 16:27:55 -07:00
Tian Zheng
1b9c4cc2f7
[None][fix] Fix nanobind failure (#7425)
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2025-09-01 17:26:40 -04:00
jiahanc
9f2dc3069d
[None] [doc] Update DeepSeek example doc (#7358)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-09-01 14:43:58 -04:00
Mike Iovine
b3c57a7042
[TRTLLM-7353][feat] Implement capturable drafting loops for speculation (#7100)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-09-01 14:37:44 -04:00
Emma Qiao
01dfd3af1b
[None][infra] Waive failed case on main 0901 (#7447)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-01 23:27:24 +08:00
bhsueh_NV
16e9d1121c
[https://nvbugs/5481087][fix] fix bug of ci when we use mocker (#7332)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-09-01 16:22:45 +08:00
yuanjingx87
2b286ae613
[None][infra] Disable GB200-PyTorch-1 due to OOM issue (#7386)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-09-01 01:56:31 -04:00
nvamyt
efaefca2c8
[None][test] Update case that not support passing quantization fp8 for pytorch backend (#7302)
Signed-off-by: nvamyt <amyt@nvidia.com>
2025-09-01 12:59:21 +08:00
Xiwen Yu
38ef850552
Merge remote-tracking branch 'gitlab/main' into user/xiweny/merge_0901
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-01 11:46:44 +08:00
Dimitrios Bariamis
b0558c73fc
[None][fix] Fix build of tritonbuild/tritonrelease image (#7003)
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Dimitrios Bariamis
44cc308e6a
[https://nvbugs/5474037][fix] Fix building tritonbuild/tritonrelease images (#7157)
Signed-off-by: Dimitrios Bariamis <dbari@users.noreply.github.com>
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
QI JUN
ed4087a295
[https://nvbugs/5374016][fix] improve error message (#6893)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Aurelien Chartier
93e623b455
[https://nvbugs/5449155][fix] Fix DeepSeek R1 weight loading for TP16 (#6913)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Yiqing Yan
21291f3d8e
[None][chore] Remove duplicate test waives (#6999)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Emma Qiao
09bca7ca82
[None][infra] Waive failed tests for release branch 0818 (#6993)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
peaceh-nv
f4dc1ed39c
[https://nvbugs/5449218][fix] Fix KvCacheConfig error in test_perf (#6937)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Ivy Zhang
29cdcdb56a
[None][fix] update skip config (#6891)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Guoming Zhang
d5bc5cd4f2
[https://nvbugs/5375646][fix] update waives.txt for nvbug 5375646 (#6847)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
William Zhang
d15dcdc4ae
[https://nvbugs/5448525][fix] Mistral Small 3.1 accuracy tests (#6909)
This commit lowers the GPU memory allocated for KV cache in accuracy
tests, and adjusts a threshold for Mistral Small 3.1 24B for FP8.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Liao Lanyu
704fca4178
[TRTLLM-6835][fix] Fix potential hang caused by python multiprocessing when prefetching weights (#6927)
Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Yilin Fan
261ffacfa4
[https://nvbugs/5412562][feat] Allocate MoE workspace only when necessary (release/1.0 retargeted) (#6955)
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Venky
093a03796f
[None][infra] update CODEOWNERS for release (#6905)
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Mike Iovine
de55763f13
[https://nvbugs/5455836][fix] Fix llama 4 FP4 (#6911)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Yan Chunwei
ac07418968
[None][ci] unwaive test_ptp_star_attention_example (#6943)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Iman Tabrizian
665a1a7c36
[https://nvbugs/5451434][fix] Fix triton docker build (#6898)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
xinhe-nv
b4d41d6604
[TRTLLM-7048][feat] add benchmark TRT flow test for MIG (#6884)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Yan Chunwei
612c26be22
[None][doc] add legacy section for tensorrt engine (#6724)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
brb-nv
0253036a4e
[None][chore] Add docs for Gemma3 VLMs (#6880)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Yukun He
e106045fda
[None][fix] Complete the last missing allreduce op in Llama3/4. (#6850)
The allreduce op of the last decoder layer is missing in some circumstances for the models Llama3 and Llama4.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Anurag Mukkara
b821883b25
[None][fix] Revert phi4-mm aggregate mode (#6907)
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
2ez4bz
cf0c47ca2d
[None][fix] Fix batching bug in Mistral3 model (#6841)
Prior to this commit, if multiple requests with images were in the same
batch, the batching logic for the images would fail.
This commit fixes it, and adds unit tests for it that were verified to
fail prior to the fix.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Yiqing Yan
3aeee19f9c
[None][infra] Setup the code review rule on the release branch (#6725)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
2ez4bz
2480aedb73
[TRTLLM-5252][feat] Add fp8 support for Mistral Small 3.1 (#6731)
This commit adds some level of FP8 support to Mistral Small 3.1 by:
* disabling quantization for the vision sub-model since `modelopt` does not
support quantizing it (yet).
* extending existing accuracy tests to use a modelopt produced FP8
checkpoint.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Guoming Zhang
3e99744201
[https://nvbugs/5375594][fix] fix oom issue on structural_tag test case (#6838)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Ivy Zhang
deba2885c1
[None][fix] fix Llama3 eagle3 test case OOM (#6832)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
xinhe-nv
7841ea6255
[None][chore] waive GB300 known issues (#6812)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Ivy Zhang
c7147d25dc
[TRTLLM-6975][test] Add multi-turn test cases for VLM models (#6749)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Yanchao Lu
c5148f52d5
[None][ci] Some improvements for Slurm CI setup (#7407)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-01 10:57:36 +08:00
Xiwen Yu
14154ec1d3
disable sm103 moe kernel
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-01 10:47:58 +08:00
Xiwen Yu
a765ee4d21
Merge branch 'feat/b300_cu13-latest' into 'feat/b300_cu13'
[https://nvbugs/5453949][infra] unwaive test_llama_eagle3
See merge request ftp/tekit!9684
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-31 18:29:56 -07:00
Bo Deng
3805f615da
[https://nvbugs/5453949][infra] unwaive test_llama_eagle3
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-31 18:29:39 -07:00
Xiwen Yu
3cc2591a45
Merge branch 'dev-jiaganc-fix-b300-moe-lora' into 'feat/b300_cu13'
[https://nvbugs/5443053][fix] Disable finalize fusion when Lora is used
See merge request ftp/tekit!9692
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
2025-08-31 18:28:09 -07:00
Jiagan Cheng
8d5a7ea5b3
[https://nvbugs/5443053][fix] Disable finalize fusion when Lora is used
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
2025-08-31 18:28:09 -07:00
Tian Zheng
e257cb3533
[None][feat] Support NVFP4 KV Cache (#6244)
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2025-09-01 09:24:52 +08:00
Zongfei Jing
a7ed26dd8b
[TRTLLM-6747][feat] Merge add sparse exp and shared exp into local reduction (#7369)
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-08-31 21:20:00 -04:00
Yiqing Yan
ec595a8e29
[None][chore] Bump version to 1.1.0rc2 (#7394)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-31 10:20:38 +08:00
Xiwen Yu
0fb835d7c2
fix cutlass moe not falling back
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-08-30 14:50:54 +08:00
xinhe-nv
5f939b9121
[None][chore] Add failed cases into waives.txt (#7342)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-30 00:49:14 -04:00
Robin Kobus
e09c025ffb
[None] [fix] store blog 10 media via lfs (#7375)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-08-30 10:17:53 +08:00