xinhe-nv
1dba9fa89e
[TRTLLM-6239][feat] add test cases into QA test list ( #8081 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-30 00:23:45 -04:00
Kaiyu Xie
b0cb9ca50e
[None] [test] Add MNNVL AlltoAll tests to pre-merge ( #7466 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
Co-authored-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-09-29 23:12:24 -04:00
Lucas Liebenwein
dcfd3ef81c
[ #4593 ][feat] AutoDeploy: Linear Attention Support (SSM + causal_conv + Bamba + Nemotron-H) ( #8068 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-09-29 22:41:06 -04:00
Cao Dong
62010c0ab7
[None][feat] Return topk logprobs in torch backend ( #7976 )
...
Signed-off-by: Cao Dong <87467313+dcaox@users.noreply.github.com>
2025-09-30 09:32:37 +08:00
Cheng Hang
cdce68c3e0
[TRTLLM-6741][fix] Add heuristics for lm head tp size when enable_lm_head_tp_in_adp=True ( #7891 )
...
Signed-off-by: Cheng Hang <chang@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-30 09:24:35 +08:00
Patrice Castonguay
6396cb9208
[ https://nvbugs/5538098 ][fix] Checking connection to etcd server in unit test ( #8006 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-09-29 20:53:32 -04:00
Chang Liu
334e2cab0d
[ https://nvbugs/5542867 ][fix] Fix the non-determinism issue in the mm_encoder test ( #8033 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-09-29 09:45:16 -07:00
amitz-nv
e5f9b6aaa0
[None][fix] Fix TRT-python multi LoRA TP=2 test arguments ( #8059 )
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-09-29 12:20:04 -04:00
mpikulski
31a1a5ff80
[TRTLLM-8269][test] do not explicitly pass temperature=0 to select greedy sampling ( #7909 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-09-29 14:52:18 +01:00
xiweny
48e779ae8c
[ https://nvbugs/5541494 ] [fix] add back missing sm100f bmm kernels ( #8051 )
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-29 05:35:44 -04:00
yufeiwu-nv
3ba6727a68
[None][test] Update get_sysinfo.py to avoid UnboundLocalError ( #7982 )
...
Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
2025-09-29 05:14:38 -04:00
Gal Hubara-Agam
b2095aa074
[ #4674 ][bugfix] AutoDeploy Fix memory leak in fuse_moe ( #7844 )
...
Delete the unstacked weights immediately to save GPU memory. Cleanup occurs automatically after the transformation, but for large models we would run out of memory during the transformation itself.
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2025-09-29 11:01:07 +03:00
xinhe-nv
20e6cd39f1
[None][chore] Add failed cases into waives.txt ( #8043 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-29 03:37:39 -04:00
Emma Qiao
ce381d6813
[None][infra] Waive failed cases for main on 0929 ( #8053 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-29 02:46:02 -04:00
HuiGao-NV
7ac932d45e
[ https://nvbugs/5532087 ][CI] Enable test case ( #8029 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-09-29 01:46:28 -04:00
Ivy Zhang
1e2e851db8
[None][chore] update test case constraint ( #8020 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-09-29 13:25:09 +08:00
Eran Geva
9cea6bfb30
[ #7288 ][feat] Added AutoDeploy backend support to test_perf.py ( #7588 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-09-28 21:21:27 -07:00
Ivy Zhang
0ecafd84da
[None][chore] Update chunked prefill test case configs ( #7868 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-09-29 10:37:34 +08:00
Yukun He
28b9a81c58
[TRTLLM-4500][feat] Add serialization/deserialization options for AutoTuner profiling cache ( #7738 )
...
To make the AutoTuner profiling cache deterministic, serialization and deserialization are introduced to store the cache on disk in JSON format. Use TLLM_AUTOTUNER_CACHE_PATH to indicate the path where the cache file should be stored.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-09-29 07:40:51 +08:00
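As a rough illustration of the JSON cache round trip described in the AutoTuner commit above, here is a minimal sketch. Only the name TLLM_AUTOTUNER_CACHE_PATH comes from the commit message. The cache layout (a dict keyed by operation name and shape) and the helpers save_cache, load_cache, and cache_path_from_env are hypothetical and are not the TensorRT LLM API.

```python
import json
import os
import tempfile

# Name taken from the commit message; everything else in this sketch is hypothetical.
CACHE_ENV_VAR = "TLLM_AUTOTUNER_CACHE_PATH"


def save_cache(cache, path):
    """Write a {(op_name, shape): tactic} dict to disk as JSON."""
    # JSON object keys must be strings, so store the entries as a sorted list.
    # Sorting keeps the file content stable from run to run.
    entries = [
        {"op": op, "shape": list(shape), "tactic": tactic}
        for (op, shape), tactic in sorted(cache.items())
    ]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=2)


def load_cache(path):
    """Read the JSON file back into the same dict layout."""
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)
    return {(e["op"], tuple(e["shape"])): e["tactic"] for e in entries}


def cache_path_from_env():
    """Return the configured cache path, or None when the variable is unset or empty."""
    return os.environ.get(CACHE_ENV_VAR) or None
```

Storing the entries as a sorted list, not as a JSON object, sidesteps the string-only key restriction and gives a byte-identical file for the same cache contents, which is the property a reproducible profiling cache needs.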
Emma Qiao
2be05cbd6e
[None][infra] Skip failed test for main branch on 9/28 ( #8040 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-28 07:00:55 -04:00
ChristinaZ
95eac2cda7
[ https://nvbugs/5537738 ][fix] Add fp8 post-quant allgather support ( #8008 )
...
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-09-28 15:32:45 +08:00
Iman Tabrizian
33282351a2
[TRTLLM-6106][feat] Add support for KVCache transfer from KVCache reuse path ( #6348 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-09-27 19:29:30 -04:00
Frida Hou
a36b48bcab
[ #5860 ][autodeploy] GPT-OSS MXFP4 support ( #7451 )
...
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-09-26 15:36:06 -07:00
Jhao-Ting Chen
c33f43e13a
[ https://nvbugs/5518713 ][fix] Trtllm-gen moe backend for blockwise fp8 ckpt (Qwen3-235B-A22B-FP8) ( #7856 )
...
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-09-26 14:29:32 -07:00
Emma Qiao
c8bef27ebb
[None][infra] Waive failed cases in post-merge 2305 ( #8019 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-26 10:20:12 -07:00
YueWeng
a4243f0da5
[TRTLLM-6393][feat] add static tree sampling and verification ( #7161 )
...
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-09-26 13:16:16 -04:00
xinhe-nv
ba6ab62bd1
[None][chore] Add failed cases into waives.txt ( #8004 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-09-26 00:41:02 -07:00
xinhe-nv
f32f5730b2
[None][chore] Add failed cases into waives.txt ( #7986 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-09-25 23:50:09 -07:00
Lucas Liebenwein
3a96d75a3c
[ https://nvbugs/5527956 ][fix] AutoDeploy: fix IMA due to outdated metadata ( #8002 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-09-25 22:05:55 -07:00
sunnyqgg
2e5850c28a
[TRTLLM-7330][feat] Eagle3 cuda graph support for the first draft model inference ( #7363 )
...
Signed-off-by: qgai <qgai@nvidia.com>
2025-09-26 11:28:05 +08:00
QI JUN
4c0f8482f1
[None][ci] Waive test_mm_encoder_standalone.py::test_multi_request_batch_chat[llava-v1.6-mistral-7b-hf] ( #8010 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-26 11:07:54 +08:00
Enwei Zhu
d650320de4
[None][infra] Improve the failure message for accuracy test suite ( #7994 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-26 10:04:47 +08:00
Yiqing Yan
108248ece1
[TRTLLM-7999][infra] Add B300/GB300 single gpu test ( #7951 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-09-26 09:59:11 +08:00
QI JUN
1529a6f22d
[None][chore] extract weights loading related logic to model loader ( #7579 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-25 10:19:22 -07:00
Emma Qiao
2dc93c6371
[None][infra] Waive failed tests on main ( #8001 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-25 08:13:39 -07:00
xxi
57ff5f4c0d
[None][fix] fix a bug in wideEp use DeepEP with num_chunks > 1 ( #7954 )
...
Signed-off-by: xxi <xxi@nvidia.com>
2025-09-25 07:53:42 -07:00
Matthias Jouanneaux
eda1467061
[TRTLLM-5966][feat] Helix: add alltoall op ( #6815 )
...
Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>
2025-09-25 07:18:29 -07:00
Guoming Zhang
202bed4574
[None][chroe] Rename TensorRT-LLM to TensorRT LLM for source code. ( #7851 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Yan Chunwei
5999fab146
[ https://nvbugs/5427043 ][fix] cherrypick: request length exceeds max_num_tokens ( #7718 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Yan Chunwei
cb466a846d
[None][fix] api stability bug in status label ( #7861 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Guoming Zhang
9f0f52249e
[None][doc] Rename TensorRT-LLM to TensorRT LLM for homepage and the … ( #7850 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Yan Chunwei
5342c607cd
[ https://nvbugs/5516710 ][fix] fix Llama 3.3 TP PP case ( #7717 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
xinhe-nv
e30d9aced9
[ https://nvbugs/4955671 ][fix] update test list ( #7980 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-25 02:58:09 -07:00
Chuang Zhu
791e73edf6
[ https://nvbugs/5536141 ][fix] fix_disagg_single_gpu_test ( #7990 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-09-25 02:07:22 -07:00
Emma Qiao
cb53261aaf
[None][infra] Unwaive some tests since dev already have a PR to collect more info ( #7984 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-25 01:03:13 -07:00
fredricz-20070104
0945403174
[TRTLLM-6541][test] Add NIM perf test cases ( #7924 )
...
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-09-25 13:15:26 +08:00
Iman Tabrizian
be7e51727e
[ https://nvbugs/5456485 ][bug] unwaive triton test ( #7966 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-09-24 17:02:55 -07:00
Iman Tabrizian
da30d496b0
[None][fix] Revert "[None][feat] Return topk logprobs in torch backend ( #7756 )" ( #7969 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-09-24 15:36:38 -07:00
sychen52
5a65af24cd
[OMNIML-2336][feat] Add NVFP4 x FP8 moe kernels ( #7821 )
...
Signed-off-by: Shiyang Chen <shiychen@nvidia.com>
2025-09-24 12:14:35 -07:00
Mike Iovine
42c2ec3239
[ https://nvbugs/5473781 ][fix] Fix llama 4 FP8 for PP>1 ( #7220 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-09-24 12:16:27 -04:00