Ivy Zhang
0ecafd84da
[None][chore] Update chunked prefill test case configs ( #7868 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-09-29 10:37:34 +08:00
Emma Qiao
2be05cbd6e
[None][infra] Skip failed test for main branch on 9/28 ( #8040 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-28 07:00:55 -04:00
ChristinaZ
95eac2cda7
[ https://nvbugs/5537738 ][fix] Add fp8 post-quant allgather support ( #8008 )
...
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-09-28 15:32:45 +08:00
Iman Tabrizian
33282351a2
[TRTLLM-6106][feat] Add support for KVCache transfer from KVCache reuse path ( #6348 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-09-27 19:29:30 -04:00
Jhao-Ting Chen
c33f43e13a
[ https://nvbugs/5518713 ][fix] Trtllm-gen moe backend for blockwise fp8 ckpt (Qwen3-235B-A22B-FP8) ( #7856 )
...
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-09-26 14:29:32 -07:00
Emma Qiao
c8bef27ebb
[None][infra] Waive failed cases in post-merge 2305 ( #8019 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-26 10:20:12 -07:00
xinhe-nv
ba6ab62bd1
[None][chore] Add failed cases into waives.txt ( #8004 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-09-26 00:41:02 -07:00
xinhe-nv
f32f5730b2
[None][chore] Add failed cases into waives.txt ( #7986 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-09-25 23:50:09 -07:00
Lucas Liebenwein
3a96d75a3c
[ https://nvbugs/5527956 ][fix] AutoDeploy: fix IMA due to outdated metadata ( #8002 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-09-25 22:05:55 -07:00
Enwei Zhu
d650320de4
[None][infra] Improve the failure message for accuracy test suite ( #7994 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-26 10:04:47 +08:00
Yiqing Yan
108248ece1
[TRTLLM-7999][infra] Add B300/GB300 single gpu test ( #7951 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-09-26 09:59:11 +08:00
Emma Qiao
2dc93c6371
[None][infra] Waive failed tests on main ( #8001 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-25 08:13:39 -07:00
Guoming Zhang
202bed4574
[None][chroe] Rename TensorRT-LLM to TensorRT LLM for source code. ( #7851 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Guoming Zhang
9f0f52249e
[None][doc] Rename TensorRT-LLM to TensorRT LLM for homepage and the … ( #7850 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Yan Chunwei
5342c607cd
[ https://nvbugs/5516710 ][fix] fix Llama 3.3 TP PP case ( #7717 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
xinhe-nv
e30d9aced9
[ https://nvbugs/4955671 ][fix] update test list ( #7980 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-25 02:58:09 -07:00
Chuang Zhu
791e73edf6
[ https://nvbugs/5536141 ][fix] fix_disagg_single_gpu_test ( #7990 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-09-25 02:07:22 -07:00
Emma Qiao
cb53261aaf
[None][infra] Unwaive some tests since dev already have a PR to collect more info ( #7984 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-25 01:03:13 -07:00
fredricz-20070104
0945403174
[TRTLLM-6541][test] Add NIM perf test cases ( #7924 )
...
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-09-25 13:15:26 +08:00
Iman Tabrizian
be7e51727e
[ https://nvbugs/5456485 ][bug] unwaive triton test ( #7966 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-09-24 17:02:55 -07:00
Mike Iovine
42c2ec3239
[ https://nvbugs/5473781 ][fix] Fix llama 4 FP8 for PP>1 ( #7220 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-09-24 12:16:27 -04:00
Pamela Peng
b1dc84b4a3
[TRTLLM-7399][test] Add DS-R1/Qwen3 test cases for RTX 6000 ( #7662 )
...
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-09-24 11:40:26 -04:00
HuiGao-NV
c8bda4b3a9
[None][ci] Waive some intermittent failures ( #7955 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-09-24 19:00:38 +08:00
Enwei Zhu
a1a57e83b8
[TRTLLM-5235][feat] Enable regex and EBNF grammar in trtllm-serve ( #7925 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-24 18:30:23 +08:00
xinhe-nv
b8bfa63197
[None][chore] add test_w4_1gpu[True-True-cutlass-fp8] & TestKimiK2::test_fp8_blocks… ( #7944 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-24 03:25:17 -07:00
QI JUN
18ff1e31b8
[None][ci] remove duplicate test cases ( #7956 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-24 17:47:22 +08:00
yufeiwu-nv
f323b74d42
[None][test] Update llm_models_root to improve path handling on BareMetal environment ( #7876 )
...
Signed-off-by: yufeiwu <230315618+yufeiwu-nv@users.noreply.github.com>
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-09-24 17:35:57 +08:00
HuiGao-NV
29e63d3bc2
[ https://nvbugs/5532248 ][fix] Fix fused_moe OOM ( #7931 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-09-24 02:22:38 -07:00
QI JUN
946ffcd2eb
[None][ci] optimize test cases of dgx b200 ( #7948 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-24 00:39:45 -07:00
xinhe-nv
62563760fb
[None][chore] update chunked prefill cases ( #7921 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-24 15:14:49 +08:00
Pengbo Wang
b890d7fea4
[None][infra] Skip failed test for nvbugs 5537738 ( #7946 )
...
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2025-09-23 23:48:50 -07:00
Yueh-Ting (eop) Chen
cf100933cc
[TRTLLM-6341][feature] Support SWA KV cache reuse ( #6768 )
...
This merge request attempts to support more SWA KV cache functionality
inside the KV cache manager. Before this merge request, the KV cache for
sliding window attention (SWA) only holds "window size" number of blocks
and reuse them in a cyclic manner. We will not be able to utilize more
GPU memory with this design, leading to a limited max batch size
throughput. Additionally, we will not be able to support KV cache reuse
with this design.
In this MR, we change such behavior to let the manager write blocks in
a linear manner. With a linear block writing behavior, as the attention
window moves on, the out-of-window (OOW) blocks will be detached. Right
now for the sake of a correct feature first, we directly offload the
OOW block from the primary block pool (GPU memory) to the secondary
block pool (host memory). We will improve this in the future by
delegating the block movement to the eviction policy.
KV cache reuse for SWA is not developed in this merge request and will
be amended in a follow-up merge request.
Writing the blocks linearly, the maximum number of blocks allocated for
a sequence(`GenerationRequest`) is the "max sequence length" specified.
The `GenerationRequest` that stores the cache block bookkeeping
structure will now keep "max sequence length" tokens of blocks.
Given the above, main changes are (more context in the MR):
- Remove "cyclic" concept under the kv cache manager, such concept
originally guards the block reuse under kv cache manager.
- Add detach mechanism and have it under `KVCacheManager::addToken`.
Please note that detach is still guarded off for SWA when reuse
is enabled. A follow-up merge request will proceed to improve this.
- Enforce "max sequence length" to be a non-optional parameter to
the `KVCacheManager`/`BlockManager`
- Let all window size resource pool get identical proportion of memory
- Fix free memory calculation under `resource_manager.py`
Signed-off-by: eopXD <yuehtingc@nvidia.com>
Co-authored-by: Tomer Asida <tasida@nvidia.com>
2025-09-24 14:28:24 +08:00
Lizhi Zhou
e4f1f90202
[ https://nvbugs/5477404 ][chore] unwaive test_disaggregated_single_gpu.py::test_disaggregated_llama_context_capacity ( #7857 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-09-24 10:31:35 +08:00
Lizhi Zhou
7550251988
[TRTLLM-7182][test] add multi-nodes test for disagg-serving ( #7470 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-09-24 08:31:56 +08:00
Zheng Duan
e3c1a9409f
[TRTLLM-6549][fix] add kv cache time output back ( #7798 )
...
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-09-23 14:12:42 -04:00
Yanchao Lu
6a36349964
[None][test] Waive another intermittent OOM test ( #7930 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-23 22:34:09 +08:00
ruodil
05bec3bf0f
[None][test] rename llm_perf_full to llm_perf_core and add missing cases ( #7899 )
...
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-09-22 23:04:34 -07:00
Pengbo Wang
a4b4ed4535
[None][fix] Fix and add test for TRTLLM MoE backend ( #7755 )
...
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2025-09-23 11:26:25 +08:00
Pengbo Wang
08cc7a041f
[ https://nvbugs/5355128 ][fix] Add missing wgmma intrinsic for starcoder ( #7643 )
...
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2025-09-23 10:38:58 +08:00
yunruis
126cd707e3
[None][opt] Add batch waiting when scheduling ( #7416 )
...
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-09-23 10:27:37 +08:00
Enwei Zhu
59f57598a7
[ https://nvbugs/5504086 ][fix] Fix MTP vanilla ( #7904 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-23 08:38:28 +08:00
Jin Li
b5391b4ac6
[ https://nvbugs/5516665 ][fix] Fix CUTLASS moe fake impl errors ( #7714 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-09-22 11:08:39 -07:00
Emma Qiao
324301ccba
[None][infra] Skip failed test for nvbugs 5532023 ( #7905 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-22 03:49:44 -07:00
Yechan Kim
f77aca9f2c
[TRTLLM-7385][feat] Optimize Qwen2/2.5-VL performance ( #7250 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-09-22 03:40:02 -07:00
Bo Deng
8cf95681e6
[TRTLLM-7989][infra] Bundle UCX and NIXL libs in the TRTLLM python package ( #7766 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-09-22 16:43:35 +08:00
Emma Qiao
d330d0005c
[None][infra] Waive a failed case on main ( #7901 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-22 00:37:01 -07:00
xinhe-nv
9c1b75e978
[TRTLLM-7070][feat] add gpt-oss chunked prefill tests ( #7779 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-22 00:12:43 -07:00
Wanli Jiang
f5bfd68a50
[ https://nvbugs/5509024 ][fix] Print full parsed outputs and update keywords for multimodal model ( #7670 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Yi Zhang
f9c9c3f50a
[ https://nvbugs/5355219 ][fix] Fix trtllm moe backend test config and Qwen3 MoE multi node ( #7724 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Ivy Zhang
022bc96fb6
[ https://nvbugs/5512734 ][fix] Update kv cache config for maverick ( #7710 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
bhsueh_NV
ef557f880b
[ https://nvbugs/5437405 ][fix] cherry-pick PR 7000 (qwen3 235b eagle3 ci) ( #7702 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Yanchao Lu
5c8b022d1e
[None][ci] Test waives for the release/1.0 branch 09/15 ( #7700 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
peaceh-nv
541b7fda89
[ https://nvbugs/5503423 ][waive] Waive Llama3.1-70B-FP8 test on RTX PRO 6000 ( #7603 )
...
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Yuxian Qiu
2d46dda6a7
[ https://nvbugs/5448754 ][fix] Download HF model for all nodes. ( #6824 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Lizhi Zhou
293d9fb612
[ https://nvbugs/5448767 ][fix] disable kv cache reuse for disagg pp>1 tests ( #7354 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
peaceh-nv
9dc7316b7f
[ https://nvbugs/5512556 ][unwaive] Unwaive DeepSeek PP tests ( #7828 )
...
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-09-22 10:26:30 +08:00
dongxuy04
9eb8084ca9
[TRTLLM-7008][fix] cherrypick to main Add automatic shared memory delete if already exist ( #7727 )
...
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-09-21 11:01:51 -07:00
Mike Iovine
8030b540ac
[ https://nvbugs/5522462 ][fix] Fix FP8 scout illegal memory access ( #7845 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-09-19 10:30:37 -04:00
pcastonguay
fbe325ce57
[ https://nvbugs/5471108 ][chore] Unwaiving disagg acc test ( #7686 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-09-19 08:56:09 -04:00
Yuxian Qiu
7d28acdbf0
[ https://nvbugs/5522332 ][fix] Pin numpy version for Gemma. (cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/7783 ) ( #7797 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-09-19 18:50:40 +08:00
xinhe-nv
efb763402f
[None][chore] Add failed cases into waives.txt ( #7841 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-19 17:59:47 +08:00
Ivy Zhang
0ac51487f4
[None][chore] remove cli cases for rtx6k ( #7833 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-09-19 16:33:59 +08:00
Ivy Zhang
6b33bcced2
[None][test] Add accuracy benchmark in stress test ( #7561 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-09-19 16:09:46 +08:00
dominicshanshan
451475e0dc
[None][ci] Waive llama3 auto dtype test bug in https://nvbugs/5527956 . ( #7853 )
...
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-19 14:54:59 +08:00
Emma Qiao
ea079fa530
[None][infra] Waive failed tests in post-merge ( #7859 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-19 14:16:12 +08:00
ruodil
c5453103d6
[None][test] add deepseek r1/v3 model with chunked prefill cases ( #7124 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-09-19 11:12:53 +08:00
fredricz-20070104
fc4e6d3702
[TRTLLM-7183][test] Feature fix model issue for disagg serving ( #7785 )
...
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-09-19 10:12:55 +08:00
Yuxian Qiu
d6ebcf7c4a
[TRTLLM-6994][feat] FP8 Context MLA integration (Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/6059 from release/1.1.0rc2) ( #7610 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-09-19 09:40:49 +08:00
QI JUN
7646da2d85
[None][ci] set TORCHINDUCTOR_COMPILE_THREADS correctly ( #7800 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-19 07:19:50 +08:00
sunnyqgg
80dd8fe197
[TRTLLM-6746][feat] Enable two-model spec dec for MTP Eagle ( #7001 )
...
Signed-off-by: qgai <qgai@nvidia.com>
2025-09-18 12:05:36 -04:00
xinhe-nv
d3a907131a
[ https://nvbugs/5519462 ][fix] Add failed cases into waives.txt ( #7817 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-18 20:01:06 +08:00
Wanli Jiang
fe104dc20d
[TRTLLM-7918][feat] Support kvcache reuse and chunk prefill for phi4mm ( #7723 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-09-18 17:37:16 +08:00
xinhe-nv
d909f80379
[TRTLLM-7250][fix] Add failed cases into waives.txt ( #7807 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-09-18 17:13:07 +08:00
Wanli Jiang
a7ca0fff54
[TRTLLM-6577][feat] Support nano_v2_vlm in pytorch backend ( #7207 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-09-18 16:26:20 +08:00
dongfengy
2ae08bd1b8
[ https://nvbugs/5519530 ][fix] Fix gptoss 2-gpu test ( #7819 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-09-18 16:01:53 +08:00
xinhe-nv
236f71ea05
[None][chore] Add failed cases into waives.txt ( #7801 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-09-18 14:48:16 +08:00
Ivy Zhang
26d50eb539
[TRTLLM-8070][test] add generation logits case for llama3 ( #7759 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-09-18 13:33:16 +08:00
Yan Chunwei
327e5e5eed
[None][ci] restore unwaive list ( #7802 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-09-18 10:50:34 +08:00
Netanel Haber
a5cfc8368f
[ https://nvbugs/5508536 ][fix] Revert #7041 : Move stop_criteria to sample_async ( #7041 ) ( #7796 )
...
Signed-off-by: Netanel Haber <nhaber@nvidia.com>
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Mike Iovine <miovine@nvidia.com>
2025-09-17 21:27:01 -04:00
yunruis
7c03eb9ea2
[ https://nvbugs/5516661 ][fix] Drop waive case 5516661 ( #7791 )
...
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
2025-09-18 08:55:32 +08:00
Emma Qiao
c4abca323e
[None][infra] Waive failed tests on main ( #7812 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-17 23:44:36 +08:00
William Zhang
2614d71994
[TRTLLM-7410][feat] Enable KV cache reuse and chunked prefill for mistral3.1 ( #7628 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-09-17 08:11:16 -07:00
xinhe-nv
f918302b3a
[TRTLLM-7250][fix] waive block tests ( #7782 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-17 15:31:03 +08:00
ruodil
e6073b3911
[None][test] add gpt oss model for trtllm perf test ( #7328 )
...
Signed-off-by: Ruodi Lu <ruodil@nvidia.com>
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-09-17 15:23:21 +08:00
xinhe-nv
7801d0992b
[None][chore] Remove closed bugs ( #7697 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-09-17 15:14:09 +08:00
Fanrong Li
523a17d990
[ https://nvbugs/5485325 ][fix] Cherry-pick #7373 : fix the CUDA graph warmup issue when using speculative decoding ( #7734 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-09-17 13:57:39 +08:00
QI JUN
bd7aad4988
[None][ci] waive test_llm_gemma_1gpu_summary_vswa ( #7781 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-17 10:48:31 +08:00
HuiGao-NV
a49cfb3e68
[ https://nvbugs/5516666 ][fix] cherrypick fix to the CUDA graph warmup issue when using speculative decoding ( #7737 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
Co-authored-by: Signed-off-by: Hui Gao <huig@nvidia.com>
2025-09-17 06:24:20 +08:00
xinhe-nv
e7c1569456
[None][chore] Add failed cases into waives.txt ( #7746 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-16 18:43:40 +08:00
Ziyi Xiong
905bb26bbd
[ https://nvbugs/5471106 ][fix] Remove the waivers ( #7711 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-09-16 17:43:39 +08:00
xinhe-nv
c6ab2072b5
[None][fix] waive hang tests on main ( #7720 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-16 17:05:15 +08:00
xinhe-nv
1fbea497ff
[TRTLLM-7070][feat] add gpt-oss serve benchmark tests ( #7638 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-16 16:39:31 +08:00
amitz-nv
750d15bfaa
[ https://nvbugs/5503529 ][fix] Change test_llmapi_example_multilora to get adapters path from cmd line to avoid downloading from HF ( #7740 )
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-09-16 16:35:13 +08:00
xinhe-nv
cf55927064
[None][chore] Add failed cases into waives.txt ( #7735 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-16 10:58:06 +08:00
xiweny
c076a02b38
[TRTLLM-4629] [feat] Add support of CUDA13 and sm103 devices ( #7568 )
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Signed-off-by: Daniel Stokes <dastokes@nvidia.com>
Signed-off-by: Zhanrui Sun <zhanruis@nvidia.com>
Signed-off-by: Xiwen Yu <xiweny@nvidia.com>
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: xiweny <13230610+VALLIS-NERIA@users.noreply.github.com>
Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Co-authored-by: Daniel Stokes <dastokes@nvidia.com>
Co-authored-by: Zhanrui Sun <zhanruis@nvidia.com>
Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-09-16 09:56:18 +08:00
QI JUN
44d5ccfdd9
[None][ci] move qwen3 tests from GB200 to B200 ( #7733 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-16 08:12:28 +08:00
Yanchao Lu
0c9430e5a5
[None][ci] Test waives for the main branch 09/15 ( #7709 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-15 22:13:56 +08:00
jmydurant
7deefb3d2b
[TRTLLM-7192][feat] optimize MLA chunked prefill && support fp8 mla chunked prefill ( #7477 )
...
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-09-15 21:43:49 +08:00
HuiGao-NV
335c007df8
[None][chore] move some cases from post-merge to pre-merge to detect errors in early stage ( #7699 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-09-15 15:37:58 +08:00
Ivy Zhang
ddfe0320b3
[TRTLLM-7279][test] add accuracy test for deepseek-r1 with chunked_prefill ( #7365 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-09-15 13:38:52 +08:00
xinhe-nv
b69e3e9f99
[None][chore] Add failed cases into waives.txt ( #7682 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-09-15 11:44:52 +08:00
Chang Liu
47e37755a3
[TRTLLM-6903][feat] Support chunked prefill for multimodal models ( #6843 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-09-14 20:10:10 -07:00
Perkz Zheng
1b29c2e731
[None][feat] support gpt-oss with fp8 kv cache ( #7612 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-09-15 02:17:37 +08:00
Yanchao Lu
70aa4e28c1
[None][ci] Test waives for the main branch 09/14 ( #7698 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-14 23:48:04 +08:00
Pengyun Lin
c2bc39af63
[TRTLLM-1302][feat] Topk logprobs for TRT backend and top1 logprob for PyT backend ( #6097 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-09-12 15:32:34 +08:00
QI JUN
656f229b58
[None][ci] move some test cases from l40s to a30 ( #7684 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-11 07:22:34 +08:00
Emma Qiao
9986070044
[None][infra] Waive failed cases on main 0910 ( #7676 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-11 01:43:29 +08:00
Dom Brown
fc9d426589
[ https://nvbugs/5505402 ] [fix] Disable deep_gemm for Qwen3 QKNormRoPEAttention and Linear layers due to accuracy issues ( #7616 )
...
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-09-10 18:30:48 +01:00
nvamyt
222e01662c
[ https://nvbugs/5488212 ][waive] Waive failed tests for L20 ( #7664 )
...
Signed-off-by: nvamyt <amyt@nvidia.com>
2025-09-10 22:32:15 +08:00
xinhe-nv
207c5258c4
[ https://nvbugs/5494698 ][fix] skip gemma3 27b on blackwell ( #7505 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-10 21:09:27 +08:00
Bo Deng
bf57829acf
[TRTLLM-7871][infra] Extend test_perf.py to add disagg-serving perf tests. ( #7503 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-09-10 17:35:51 +08:00
fredricz-20070104
ef620f3579
[ https://nvbugs/5410687 ][test] Add deepseek r1-w4afp8 quickstart ( #7645 )
...
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-09-10 10:21:01 +08:00
Chang Liu
faa2f46554
[TRTLLM-5059][feat] Enable KV-cache reuse and add E2E tests for llava-next ( #7349 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-09-09 14:51:36 -04:00
QI JUN
a0e1604898
[None][ci] add DGX_H100-2_GPUs-PyTorch-Others-1 pipeline ( #7629 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-09 11:06:32 -04:00
Liao Lanyu
af403848d7
[ https://nvbugs/5445466 ][fix] unwaive DS R1 test cases with bug already fixed ( #7429 )
...
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-09-09 17:25:49 +08:00
Perkz Zheng
da6cb541a2
[None][feat] Optimize MLA kernels with separate reduction kernels ( #7597 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-09-09 16:58:44 +08:00
xinhe-nv
8a52015f50
[None][chore] Remove closed bugs ( #7591 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-09 04:08:42 -04:00
Yiqing Yan
5c616da2fd
[TRTLLM-5877][infra] Add fmha tests and auto trigger rules ( #6050 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-09 11:33:09 +08:00
Wanli Jiang
1e0669d27a
[ https://nvbugs/5453709 ][fix] Remove transformers version limit in Qwen2VL ( #7152 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-09-09 10:38:20 +08:00
Iman Tabrizian
d96c54d8ae
[None][test] Skip eagle3 test ( #7627 )
...
Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
2025-09-08 17:23:53 -04:00
dongfengy
fdd5bd49fc
[ https://nvbugs/5481080 ][fix] Fix GPTOSS W4A16 reference ( #7323 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-09-08 13:59:28 -07:00
Chuang Zhu
77657a1c12
[TRTLLM-7361][feat] KV cache transfer for uneven pp ( #7117 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-09-08 13:37:46 -04:00
Eran Geva
5f2a42b3df
[TRTLLM-6142][feat] AutoDeploy: set torch recompile_limit based on cuda_graph_batch_sizes and refactored ( #7219 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-09-08 08:45:58 -04:00
bhsueh_NV
219e95569a
[ https://nvbugs/5506683 ][fix] adjust the CI ( #7604 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-09-08 15:41:41 +08:00
dominicshanshan
c9dca69e1b
[None][chore] Mass integration of release/1.0 - 3rd ( #7519 )
...
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
Signed-off-by: Hui Gao <huig@nvidia.com>
Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Co-authored-by: Nave Assaf <55059536+Naveassaf@users.noreply.github.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Co-authored-by: yifeizhang-c <219273404+yifeizhang-c@users.noreply.github.com>
Co-authored-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
Co-authored-by: Erin <14718778+hchings@users.noreply.github.com>
Co-authored-by: chenfeiz0326 <chenfeiz@nvidia.com>
Co-authored-by: ChristinaZ <83400082+ChristinaZ@users.noreply.github.com>
Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: HuiGao-NV <huig@nvidia.com>
Co-authored-by: milesial <milesial@users.noreply.github.com>
Co-authored-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Co-authored-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Linda <57756729+Linda-Stadter@users.noreply.github.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com>
Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Co-authored-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-09-08 14:03:04 +08:00
Raayan Dhar
8f3121ac81
[None][fix] chore: fixing the math on asymmetric tp+pp tests ( #7098 )
...
Signed-off-by: raayandhar <rdhar@nvidia.com>
2025-09-07 14:27:46 -04:00
Raayan Dhar
bae9560e62
[ https://nvbugs/5448767 ][fix] sync termination of requests across PP ranks ( #7455 )
...
Signed-off-by: raayandhar <rdhar@nvidia.com>
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Co-authored-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-09-07 08:45:49 -04:00
Emma Qiao
aea8ac1649
[TRTLLM-5950][infra] Removing remaining turtle keywords from the code base ( #7086 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-07 14:26:18 +08:00
Chang Liu
23500b55c3
[TRTLLM-7398][feat] Support KV cache salting for secure KV cache reuse ( #7106 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
2025-09-06 17:58:32 -04:00
dominicshanshan
9a97f0a3b7
[None][ci] Waive qwen3 test for accuracy bug in https://nvbugs/5505402 ( #7585 )
...
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-06 21:29:16 +08:00
QI JUN
525bb806a9
[None][ci] move some test cases of DGX H100 to post merge ( #7569 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-06 01:03:38 -04:00
Emma Qiao
d8ec546b73
[None][infra] Waive failed tests on main branch 0905 ( #7564 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-05 22:46:46 +08:00
xinhe-nv
8e3962d278
[TRTLLM-6642][feat] add gptoss 20g tests ( #7361 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-05 02:20:28 -04:00
xinhe-nv
b3ba3d98d2
[None][chore] Remove closed bugs ( #7408 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-09-05 02:11:16 -04:00
QI JUN
ff3704897b
[None][ci] remove unnecessary test_modeling_deepseek.py ( #7542 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-04 20:05:27 -07:00
Jin Li
2189a2f3ff
[ https://nvbugs/5483615 ][fix] Remove unnecessary assertion to let mai… ( #7441 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-09-05 10:56:21 +08:00
Yuxian Qiu
48a5270868
[ https://nvbugs/5492485 ][fix] Use offline dataset from llm-models instead. ( #7435 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-09-04 09:58:16 -07:00
Ivy Zhang
b46e0ae5d4
[None][test] update nim and full test list ( #7468 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-09-04 09:06:01 -04:00
QI JUN
d38b8e3dd9
[None][ci] set TORCHINDUCTOR_COMPILE_THREADS for thop/parallel tests ( #7489 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-04 06:04:51 -07:00
kris1025
cce9556858
[ https://nvbugs/5485886 ][fix] Fix resource free of Eagle3ResourceManager ( #7437 )
...
Signed-off-by: linquanh <linquanh@nvidia.com>
2025-09-04 17:38:13 +08:00
Jin Li
2a2dfe273b
[ https://nvbugs/5485102 ][fix] Correctly set stride for piecewise outp… ( #7442 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-09-04 10:48:15 +08:00
Stanley Sun
db8eb0a447
[TRTLLM-7876][test] Test trtllm-serve with --extra_llm_api_options ( #7492 )
...
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-09-04 10:34:38 +08:00
Lizhi Zhou
d97c1e6bd9
[ https://nvbugs/5470769 ][fix] fix disagg-serving accuracy test case ( #7338 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-09-04 09:11:01 +08:00
Enwei Zhu
5ff3a65b23
[TRTLLM-7028][feat] Enable guided decoding with speculative decoding (part 2: one-model engine) ( #6948 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-03 15:16:11 -07:00
Lizhi Zhou
7c73c2ff4b
[ https://nvbugs/5485593 ][fix] improve accuracy/test_disaggregated_serving.py ( #7366 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-09-03 09:38:53 -04:00
Stanley Sun
cebbf48b74
[TRTLLM-7363][test] Add 8-GPU test cases for RTX6000 ( #7083 )
...
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-09-03 08:36:52 -04:00
Mike Iovine
79d93f9419
[ https://nvbugs/5488141 ][fix] Unwaive llama3 test_eagle3 ( #7486 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-09-03 14:10:40 +08:00
Wanli Jiang
4223a9aada
[TRTLLM-7261][feat] Support phi-4 model in pytorch backend ( #7371 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-09-03 10:27:42 +08:00
Simeng Liu
bcc55bcdf3
[ https://nvbugs/5470782 ][fix] Add specific test names for test_deepseek.py ( #7318 )
...
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-09-02 10:31:40 -07:00
Emma Qiao
aae5d22bfe
[None][infra] Waive failed tests on main branch 0902 ( #7482 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-02 10:16:49 -04:00
peaceh-nv
90479c50fb
[ https://nvbugs/5453992 ][unwaive] Unwaive llama quickstart test ( #7242 )
...
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-09-02 20:28:32 +08:00
JunyiXu-nv
eefe5f2093
[TRTLLM-7208][feat] Implement basic functionalities for Responses API ( #7341 )
...
Signed-off-by: Junyi Xu <junyix@nvidia.com>
2025-09-02 07:08:22 -04:00
HuiGao-NV
7279297717
[None][infra] waive test case failed on post-merge ( #7471 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-09-02 06:20:08 -04:00
aalanwyr
c3c95736a1
[TRTLLM-6643][feat] Add DeepSeek-v3-0324 e2e torch test ( #7413 )
...
Signed-off-by: Yaran Wu <28771492+aalanwyr@users.noreply.github.com>
2025-09-02 17:21:27 +08:00
Ivy Zhang
3799e5d460
[None][test] auto reuse torch empty cache on qa test ( #7421 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-09-02 04:44:47 -04:00
Yan Chunwei
f90375f37c
[ https://nvbugs/5476580 ][fix] unwaive test_nvfp4_4gpus ( #7454 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-09-02 04:17:14 -04:00
Emma Qiao
01dfd3af1b
[None][infra] Waive failed case on main 0901 ( #7447 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-01 23:27:24 +08:00
bhsueh_NV
16e9d1121c
[ https://nvbugs/5481087 ][fix] fix bug of ci when we use mocker ( #7332 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-09-01 16:22:45 +08:00
nvamyt
efaefca2c8
[None][test] Update case that not support passing quantization fp8 for pytorch backend ( #7302 )
...
Signed-off-by: nvamyt <amyt@nvidia.com>
2025-09-01 12:59:21 +08:00
Yiqing Yan
21291f3d8e
[None][chore] Remove duplicate test waives ( #6999 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Emma Qiao
09bca7ca82
[None][infra] Waive failed tests for release branch 0818 ( #6993 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
peaceh-nv
f4dc1ed39c
[ https://nvbugs/5449218 ][fix] Fix KvCacheConfig error in test_perf ( #6937 )
...
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Ivy Zhang
29cdcdb56a
[None][fix] update skip config ( #6891 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Guoming Zhang
d5bc5cd4f2
[ https://nvbugs/5375646 ][fix] update waives.txt for nvbug 5375646 ( #6847 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
William Zhang
d15dcdc4ae
[ https://nvbugs/5448525 ][fix] Mistral Small 3.1 accuracy tests ( #6909 )
...
This commit lowers the GPU memory allocated for KV cache in accuracy
tests, and adjusts a threshold for Mistral Small 3.1 24B for FP8.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Yan Chunwei
ac07418968
[None][ci] unwaive test_ptp_star_attention_example ( #6943 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
xinhe-nv
b4d41d6604
[TRTLLM-7048][feat] add benchmark TRT flow test for MIG ( #6884 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Yan Chunwei
612c26be22
[None][doc] add legacy section for tensorrt engine ( #6724 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
2ez4bz
2480aedb73
[TRTLLM-5252][feat] Add fp8 support for Mistral Small 3.1 ( #6731 )
...
This commit adds some level of FP8 support to Mistral Small 3.1 by:
* disabling quantization for the vision sub-model since `modelopt` does
support quantizing it (yet).
* extending existing accuracy tests to use a modelopt produced FP8
checkpoint.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Guoming Zhang
3e99744201
[ https://nvbugs/5375594 ][fix] fix oom issue on structural_tag test case ( #6838 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Ivy Zhang
deba2885c1
[None][fix] fix Llama3 eagle3 test case OOM ( #6832 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
xinhe-nv
7841ea6255
[None][chore] waive GB300 known issues ( #6812 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Ivy Zhang
c7147d25dc
[TRTLLM-6975][test] Add multi-turn test cases for VLM models ( #6749 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Tian Zheng
e257cb3533
[None][feat] Support NVFP4 KV Cache ( #6244 )
...
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2025-09-01 09:24:52 +08:00
xinhe-nv
5f939b9121
[None][chore] Add failed cases into waives.txt ( #7342 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-30 00:49:14 -04:00
Emma Qiao
15ec2b855d
[None][infra] Waive failed tests on main branch 08/29 ( #7370 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-29 10:28:20 -04:00
Pengbo Wang @ NVIDIA
62459d533d
[None][chore] Update pre-merge test to add DeepSeek/LLaMA and gpt-oss ( #7192 )
...
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
Signed-off-by: Pengbo Wang @ NVIDIA <221450789+pengbowang-nv@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-08-29 17:03:46 +08:00
fredricz-20070104
091b67ad2f
[TRTLLM-7280][test] Add beam search CudaGraph + Overlap Scheduler tests ( #7326 )
...
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-08-29 02:16:22 -04:00
Chang Liu
31b0f0fb0c
[ https://nvbugs/5445466 ][fix] Eliminate race when loading HF dynamic modules ( #7268 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-08-29 12:36:30 +08:00
Richard Huo
ce580ce4f5
[None][feat] KV Cache Connector API ( #7228 )
...
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Signed-off-by: richardhuo-nv <rihuo@nvidia.com>
Co-authored-by: jthomson04 <jwillthomson19@gmail.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
2025-08-28 23:09:27 -04:00
aalanwyr
085dc19bfa
[TRTLLM-6646][test] NIM migration to TRT-LLM LLMAPI : Add QWQ-32b torch test ( #7284 )
...
Signed-off-by: Yaran Wu <28771492+aalanwyr@users.noreply.github.com>
2025-08-28 23:09:11 -04:00
Yuan Tong
ccb800f909
[TRTLLM-7457][ci] Update unittest parallel config ( #7297 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-08-29 09:28:04 +08:00
Emma Qiao
1e644fa28a
[None][infra] Waive failed tests on main branch 08/26 ( #7346 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-29 00:24:08 +08:00
QI JUN
ae89163368
[None][ci] skip TestGPTOSS ( #7333 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-28 05:01:49 -04:00
William Zhang
4541655e5f
[ https://nvbugs/5430124 ][ci] Unwaive Mistral 3.1 Small tests ( #7274 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-28 00:03:32 -04:00
QI JUN
39c9ffda5a
[None][ci] fix test list name ( #7321 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-27 22:33:22 -04:00
Pengyun Lin
c1e7fb9042
[TRTLLM-7207][feat] Chat completions API for gpt-oss ( #7261 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-08-28 10:22:06 +08:00
bhsueh_NV
9d345b31c0
[ https://nvbugs/5453727 ][fix] unwaive qwen3 CI tests ( #7293 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-08-27 22:58:59 +08:00
QI JUN
d09add5ede
[None][ci] parallelize unit tests of auto deploy in B200 ( #7291 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-27 22:32:11 +08:00
Emma Qiao
8dc62ffac4
[None][infra] Waive failed tests on main ( #7300 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-27 09:53:33 -04:00
xinhe-nv
f082e4857c
[TRTLLM-7250][fix] waive failed cases ( #7292 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-27 18:04:46 +08:00
nvamyt
dbd4f21687
[None][fix] Update maxnt of llama_v3.2_1b bench ( #7279 )
...
Signed-off-by: nvamyt <amyt@nvidia.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-27 16:56:28 +08:00
bhsueh_NV
f167b1fd99
[ https://nvbugs/5453727 ][fix] Fix bug of how GPT-OSS setup the parameters in CI ( #7151 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-08-27 15:26:10 +08:00
QI JUN
e08c7cf17b
[None][ci] remove test_llm_api_autodeploy from B200 test db ( #7282 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-27 03:12:30 -04:00
dongxuy04
abdb2735be
[None][fix] Fix possible hang issue in WideEP and move some tests to pre-merge ( #7262 )
...
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-08-27 01:39:24 -04:00
Yuan Tong
6c7813e821
[TRTLLM-7457][ci] Update & cleanup unittest parallel config ( #7254 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-08-27 00:45:58 -04:00
Zhou Yuxin
ccb6aadea8
[ https://nvbugs/5412456 ][fix] Remove from waives.txt ( #7248 )
...
Signed-off-by: Zhou Yuxin <yuxinz@nvidia.com>
2025-08-27 10:05:53 +08:00
Jin Li
028235404b
[TRTLLM-6633][feat] Padding for piecewise cudagraph ( #6750 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-26 18:31:33 -04:00
QI JUN
baef70e67e
[None][ci] move qwen3 tests from b200 to gb200 ( #7257 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-26 11:50:53 -04:00
xinhe-nv
80043affb5
[None][chore] Add failed cases into waives.txt ( #7251 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-26 17:13:44 +08:00
Zheng Duan
cf50ba2980
[TRTLLM-6549][feat] add perf metrics endpoint to openai server and openai disagg server ( #6985 )
...
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-08-26 15:34:44 +08:00
Zheng Duan
1a929a1490
[ https://nvbugs/5457504 ][fix] fix kv cache event test in disaggregated worker tests ( #7028 )
...
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-08-26 14:25:10 +08:00
nvamyt
d8bd8843fc
[None][test] Update qwen3 timeout to 60 minutes ( #7200 )
...
Signed-off-by: nvamyt <amyt@nvidia.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-26 14:18:42 +08:00
qixiang-99
b165f8bc97
fix/improve kvcache allocation in PyTorch runtime ( #5933 )
...
Signed-off-by: qixiang-99 <203170375+qixiang-99@users.noreply.github.com>
2025-08-26 12:40:22 +08:00
William Zhang
92576488d3
[None][feat] Skip prefetching consolidated safetensors when appropriate ( #7013 )
...
* Why?
Some models (e.g. anything produced by Mistral) can have both sharded
safetensors and a consolidated safetensor in the same checkpoint
directory. In such cases, prefetching both to memory is a waste of time,
and memory.
* What?
This commit skips over consolidated safetensors when they are not the
only safetensor file present in the checkpoint directory
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-25 23:56:21 -04:00
ruodil
b845eb7a3a
[None][test] add kv cache size in bench metric and fix failed cases ( #7160 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-26 10:10:02 +08:00
chenfeiz0326
6a44e5b9d1
[ https://nvbugs/5440241 ][fix] Fix 70B GSM8K Accuracy drop ( #6967 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-08-25 22:09:30 +08:00
Emma Qiao
200db3b809
[None][infra] Waive failed tests on main branch ( #7201 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-25 09:04:37 -04:00
Ivy Zhang
f61b74f796
[None][test] add l20 specific qa test list ( #7067 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-25 12:44:08 +08:00
Bo Deng
c038fb3ef4
[None][chore] cherry-pick 6940 ( #7097 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-25 10:28:45 +08:00
xinhe-nv
3ba9afcc7b
[None][feat] add gpt-osss tests to sanity list ( #7158 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-25 10:22:07 +08:00
Yiqing Yan
486bc763c3
[None][infra] Split DGX_B200 stage into multiple parts and pre-/post-merge ( #7074 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-24 21:09:04 -04:00
Robin Kobus
31979aefac
[None] [ci] Reorganize CMake and Python integration test infrastructure for C++ tests ( #6754 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-08-24 20:53:17 +02:00
ajrasane
068056677f
[None][chore] Enable auto deploy accuracy test in CI ( #7179 )
...
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-24 08:42:30 -07:00
Yanchao Lu
ec35481b0a
[None][infra] Prepare for single GPU GB200 test pipeline ( #7073 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-24 21:46:39 +08:00
dongxuy04
19a0ea363b
[TRTLLM-6743][feat] Optimize and refactor alltoall in WideEP ( #6973 )
...
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
Signed-off-by: Dongxu Yang <dongxuy@nvidia.com>
Co-authored-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-08-24 08:15:29 -04:00
Iman Tabrizian
96ff82e77a
[None][fix] Waive test ( #7185 )
...
Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
2025-08-24 10:45:11 +08:00
Suyog Gupta
e3de5758a3
[ #7136 ][feat] trtllm-serve + autodeploy integration ( #7141 )
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-22 08:30:53 -07:00
QI JUN
1388e84793
[None][ci] move all B200 TensorRT test cases to post merge ( #7165 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-22 06:47:23 -04:00
xinhe-nv
b8b2bd4a0a
[TRTLLM-7245][feat] add test_multi_nodes_eval tests ( #7108 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-22 17:17:27 +08:00
Linda
898f37faa0
[None][feat] Enable nanobind as the default binding library ( #6608 )
...
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-08-22 09:48:41 +02:00
Daniel Cámpora
099f081e03
[TRTLLM-7155][feat] Unify sampler handle logits implementation. ( #6867 )
...
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-08-22 08:09:30 +02:00
xinhe-nv
4017f7cd6b
[None][chore] Add failed cases into waives.txt ( #7109 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-22 10:39:25 +08:00
Wanli Jiang
07c711eb1f
[TRTLLM-6825][fix] Update lora for phi4-mm ( #6817 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-08-21 22:00:04 -04:00
dominicshanshan
6f245ec78b
[None][chore] Mass integration of release/1.0 ( #6864 )
...
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: raayandhar <rdhar@nvidia.com>
Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: 2ez4bz <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Raayan Dhar <58057652+raayandhar@users.noreply.github.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-08-22 09:25:15 +08:00
Emma Qiao
344bc4575d
[None][infra] Waive failed case for main branch ( #7129 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-22 00:08:55 +08:00
Dimitrios Bariamis
f49dafe0da
[ https://nvbugs/5394409 ][feat] Support Mistral Small 3.1 multimodal in Triton Backend ( #6714 )
...
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Signed-off-by: Dimitrios Bariamis <dbari@users.noreply.github.com>
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
2025-08-21 18:08:38 +02:00
bhsueh_NV
ba0a86e0bb
[ https://nvbugs/5437405 ][fix] qwen3 235b eagle3 ci ( #7000 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-08-21 01:17:32 -04:00
xinhe-nv
21f4434404
[None][chore] waive failed cases on H100 ( #7084 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-21 11:15:23 +08:00
Yechan Kim
0893afae3d
[TRTLLM-6771][feat] Support MMMU for multimodal models ( #6828 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-21 08:54:12 +08:00
bhsueh_NV
73d2daa386
[ https://nvbugs/5457489 ][fix] unwaive some tests ( #6991 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-08-21 08:49:57 +08:00
QI JUN
a918de710a
[None][ci] move some tests of b200 to post merge ( #7093 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-20 19:43:40 -04:00
Emma Qiao
f84dd64250
[None][infra] Waive failed tests on main branch 8/20 ( #7092 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-20 06:33:44 -04:00
Robin Kobus
b95cab2a7c
[None][ci] move unittests to sub-directories ( #6635 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-08-20 05:42:22 -04:00
xinhe-nv
9e71b4fda4
[TRTLLM-7205][feat] add llama4 tp4 tests ( #6989 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-20 13:22:05 +08:00
Leslie Fang
3f6a9267f1
[None][infra] update feature_combination_matrix of disaggregated and chunked prefill ( #6661 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-20 13:14:34 +08:00
Chang Liu
ce53832610
[TRTLLM-7326][feat] Add standalone multimodal encoder ( #6743 )
...
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-08-19 21:42:50 -07:00
Ivy Zhang
fc85e3db1c
[None][fix] fix llmapi import error ( #7030 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-19 22:58:13 -04:00
Bo Deng
30da5d3cc4
[None][chore] unwaive test_disaggregated_genbs1 ( #6944 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-20 09:57:35 +08:00
Emma Qiao
8f95f35503
[None][infra] Waive failed tests on main ( #7037 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-19 09:31:07 -04:00
Yiqing Yan
07506bccbe
[None][chore] Remove duplicate test waives ( #7044 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-19 21:04:31 +08:00
Fanrong Li
655d0f48d0
[ https://nvbugs/5455140 ][fix] unwaive DSR1-fp4 throughput_tp8 ( #7022 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-08-19 20:48:05 +08:00
xinhe-nv
2c86cee38c
[None][chore] Remove closed bugs ( #6969 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-19 16:01:33 +08:00
Ivy Zhang
bff5fdf6df
[TRTLLM-6541][test] Add NIM Related Cases Part 1 ( #6684 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-19 13:59:14 +08:00
William Zhang
daa2a65d37
[ https://nvbugs/5454875 ][ci] Unwaive Mistral Small 3.1 test ( #7011 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-19 00:32:14 -04:00
fredricz-20070104
e90280a84d
[TRTLLM-6541][test] Add NIM Related Cases [StarCoder2_7B] and [Codestral_22B_V01] ( #6939 )
...
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-08-19 00:13:04 -04:00
Fanrong Li
816a120af6
[TRTLLM-6991][chore] add DeepSeek-R1 FP8 accuracy tests on Blackwell ( #6710 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-08-19 00:03:03 -04:00
Lizhi Zhou
71e28eab36
[TRTLLM-7014][chore] Add accuracy test for ctx and gen workers with different models ( #6741 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-08-19 09:58:22 +08:00
Leslie Fang
e76e5c640f
[None][infra] Enable accuracy test for mtp and chunked prefill ( #6314 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-19 07:42:52 +08:00
Yiqing Yan
1ce23545fc
[None][chore] Remove duplicate test waives ( #6998 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-18 21:15:49 +08:00
Emma Qiao
69ff32f9b1
[None][infra] Waive failed tests on main 0818 ( #6992 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-18 20:34:52 +08:00
Shi Xiaowei
5ec15b98f0
[TRTLLM-7030][fix] uppercase def value in pd-config ( #6981 )
...
Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-08-18 02:33:23 -04:00
Leslie Fang
ce0b13ea02
[None][infra] update feature_combination_matrix of disaggregated and Eagle3 ( #6945 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-18 09:18:17 +08:00
Naveassaf
d6322f70b7
[ https://nvbugs/5451028 ][fix] Constrain NemotronSuper test parameters to prevent OOMs ( #6970 )
...
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
2025-08-17 13:38:36 -04:00
Emma Qiao
cc6d763824
[None][infra]Waive failed cases in main branch ( #6951 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-17 14:27:59 +03:00
Daniel Cámpora
53312eeebd
[TRTLLM-7157][feat] BREAKING CHANGE Introduce sampler_type, detect sampler according to options ( #6831 )
...
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-08-16 00:27:24 -04:00
brb-nv
9505727d31
[ https://nvbugs/5401114 ][fix] Unwaive Gemma3 tests ( #6952 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-15 16:35:02 -07:00
dongfengy
0ad0b967bb
[None][fix] Make TP working for Triton MOE (in additional to EP we are using) ( #6722 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-08-15 16:58:42 -04:00
ajrasane
4162d2d746
[None][test] Add accuracy evaluation for AutoDeploy ( #6764 )
...
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-15 13:46:09 -04:00
yifeizhang-c
4127d77678
[ https://nvbugs/5394392 ][fix] Enlarge scheduler capacity under disagg bs == 1 ( #6537 )
...
Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
2025-08-15 09:52:06 -07:00
liji-nv
18ccd053d3
[ https://nvbugs/5427801 ][fix] Torch compile support for Llama4 and Ea… ( #6858 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-15 11:14:20 -04:00
xinhe-nv
b23fdfc62f
[None][chore] Add failed cases into waives.txt ( #6914 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-08-15 14:00:16 +08:00
Yanchao Lu
3a987891d8
[TRTLLM-7141][infra] Use repo mirrors to avoid intermittent network failures ( #6836 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-08-15 11:16:07 +08:00
Bo Deng
e54ba75dac
[None][fix] Update tests to use standardized uppercase backend identifiers ( #6921 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-15 11:14:15 +08:00
Frank
2cc59aacb3
[None][fix] Correct reporting of torch_dtype for ModelConfig class. ( #6800 )
...
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-08-14 22:46:20 -04:00
Aurelien Chartier
b13a5a99b2
[None][chore] Add tests for non-existent and completed request cancellation ( #6840 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-08-14 15:57:01 -07:00
Raayan Dhar
8b237b943b
[ https://nvbugs/5441714 ][chore] remove skip on disagg n-gram test ( #6872 )
...
Signed-off-by: raayandhar <rdhar@nvidia.com>
2025-08-14 15:45:00 -07:00
Bo Li
26f413ad90
[ https://nvbugs/5450262 ][fix] Fix unsupported alltoall use case ( #6882 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-08-14 17:46:54 -04:00
Emma Qiao
96339c69a9
[None][infra] Waive failed cases on main ( #6902 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-14 23:59:44 +08:00
Pengbo Wang @ NVIDIA
ffc976ceaf
[ https://nvbugs/5445466 ][fix] fix deepseek r1 hang by not enabling mnnvl by default ( #6860 )
...
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-08-14 22:36:56 +08:00
Shi Xiaowei
1095dfd03c
[None][fix] BREAKING CHANGE: Mismatch between docs and actual commands ( #6323 )
2025-08-14 03:48:57 -04:00
NVJiangShao
a700646132
[None][fix] Add FP4 all2all unitest and fix a bug for module WideEPMoE ( #6784 )
...
Signed-off-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com>
2025-08-14 13:35:37 +08:00
Bo Deng
d8acca495b
[TRTLLM-6675][infra] Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/6623 ( #6735 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-14 04:36:38 +00:00
jmydurant
4200fa46d1
[None][feat] Add support for Hopper MLA chunked prefill ( #6655 )
...
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-08-14 10:39:26 +08:00
Mike Iovine
7cba883932
[ https://nvbugs/5410399 ][chore] Unwaive mtp llmapi test ( #6833 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-08-13 17:38:45 -04:00
Emma Qiao
c7e6145409
[None][infra] Waive failed cases on main ( #6863 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-13 09:50:14 -04:00
Anthony Chang
2198587b35
[ https://nvbugs/5378031 ] [feat] Hopper W4A8 MoE supports ModelOpt ckpt for PyT backend ( #6200 )
...
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-08-13 21:24:40 +08:00
Yechan Kim
12102e2d48
[TRTLLM-6772][feat] Multimodal benchmark_serving support ( #6622 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-12 19:34:02 -07:00
xinhe-nv
e35fca4272
[TRTQA-2920][chore] improve hang tests ( #6781 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-12 18:26:51 +08:00
Enwei Zhu
7c686ba8de
[TRTLLM-2285][feat] Enable guided decoding with CUDA graph padding and draft model chunked prefill ( #6774 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-08-12 09:30:06 +08:00
Ziyi Xiong
b4fcd5f592
[ https://nvbugs/5441438 ][fix] Set correct draft length for the cuda graph dummy request ( #6701 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-08-12 09:28:47 +08:00
Chang Liu
be9dd4713c
[ https://nvbugs/5385987 ][fix] Fix Qwen2 quantization issue by pinning transformers version ( #6673 )
...
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-08-11 17:16:49 -07:00
Aurelien Chartier
56bfc3a6d2
[None][chore] Find LLM_ROOT and LLM_BACKEND_ROOT dynamically ( #6763 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-08-11 15:18:19 -07:00
Emma Qiao
5145e9d40e
[None][infra] Unwaive an updated case to test ( #6791 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-11 06:47:33 -04:00
Emma Qiao
d6ad4a9d5b
[None][infra] Waive failed tests on main 0811 ( #6778 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-11 03:16:25 -04:00
xinhe-nv
9c358c26e4
[None][chore] remove closed bugs ( #6772 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-11 14:39:58 +08:00
Eran Geva
b3e8fa2960
[None][test] Test trtllm-bench AD vs, PT BEs on H100 single gpu ( #6487 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Co-authored-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2025-08-11 08:33:13 +03:00
Tracin
49bcaa4e95
Add gpt-oss GSM8K test. ( #6732 )
...
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-08-10 22:45:43 -04:00
Chuang Zhu
c566a8d2a2
[None][fix] fix same pp disagg ( #6730 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-08-10 22:45:15 -04:00
Bo Deng
767879ef85
[ https://nvbugs/5431127 ][fix] Run test_disaggregated_deepseek_v3_lite_fp8_nixl[DeepSeek-V3-Lite-fp8] only on hopper ( #6736 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-11 10:05:10 +08:00
Emma Qiao
ee19ca5e58
[None][infra] Waive test main 0808 ( #6751 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-09 23:54:07 -04:00
Ye Zhang
bcf5ec0c9a
[None][feat] Core Metrics Implementation ( #5785 )
...
Signed-off-by: Ye Zhang <zhysishu@gmail.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-08-09 02:48:53 -04:00
Leslie Fang
294e0d3dab
[ https://nvbugs/5436461 ][infra] Adjust free_gpu_memory_fraction of test_eagle3 to prevent OOM on CI ( #6631 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-08 15:30:47 +08:00
Li Min
d913955952
[TRTLLM-6898][feat] make fused_moe_cute_dsl work on blackwell ( #6616 )
...
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
2025-08-08 15:03:48 +08:00
ruodil
b15d6fb145
[None][test] fix yml condition error under qa folder ( #6734 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-08 15:59:01 +10:00
2ez4bz
064eb7a70f
[TRTLLM-5252][fix] Propagate mapping to intermediate layers ( #6611 )
...
This commit propagates the mapping to intermediate layers to enable
tensor parallelism (amongst other things) in them.
It also fixes issues with a unit test for TP for pixtral, and adds it to a
test list.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-08 01:50:36 -04:00
Enwei Zhu
aee828d98a
[TRTLLM-6854][feat] Enable guided decoding with disagg serving ( #6704 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-08-08 12:10:36 +08:00
ruodil
22f45a0e19
[TRTLLM-5252][test] add for mistral_small_3.1_24b perf test ( #6685 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-07 22:57:04 -04:00
xinhe-nv
88ced50ca7
[TRTQA-2920][fix] Add failed cases into waives.txt ( #6719 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-08-08 12:54:13 +10:00
Daniel Cámpora
efca359b66
[TRTLLM-6785][feat] BREAKING CHANGE Enable TRTLLM sampler by default ( #6216 )
...
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-08-07 22:19:37 -04:00
Yuan Tong
db8dc97b7b
[None][fix] Migrate to new cuda binding package name ( #6700 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-08-07 16:29:55 -04:00
Raayan Dhar
4055b764db
[None][fix] disagg ctx pp4 + gen pp4 integ test ( #6489 )
...
Signed-off-by: raayandhar <rdhar@nvidia.com>
Signed-off-by: Raayan Dhar <58057652+raayandhar@users.noreply.github.com>
2025-08-07 11:18:02 -04:00
pcastonguay
453a06e6ab
[TRTLLM-6881][feat] Include attention dp rank info with KV cache events ( #6563 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-08-07 14:17:07 +02:00
Enwei Zhu
1b9781e8e7
[TRTLLM-6409][feat] Enable guided decoding with speculative decoding (part 1: two-model engine) ( #6300 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-08-07 05:53:48 -04:00
xinhe-nv
0a467b00cc
[ https://nvbugs/5409414 ][fix] fix Not registered specs ( #6660 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-07 17:55:53 +10:00
hlu1
8207d5fd39
[None] [feat] Add model gpt-oss ( #6645 )
...
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-08-07 03:04:18 -04:00
ruodil
6c1f7d8b91
[None][test] correct test-db context for perf yaml file ( #6686 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-07 02:47:10 -04:00
YueWeng
157ea77549
[ https://nvbugs/5375966 ][chore] Unwaive test_disaggregated_deepseek_v3_lite_fp8_attention_dp_one ( #6658 )
...
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-08-07 10:25:17 +08:00
ruodil
780d7507f9
[None][test] remove trt backend cases in release perf test and move NIM cases to llm_perf_nim.yml ( #6662 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-07 10:02:13 +10:00
ruodil
f30398470d
[None][chore] update readme for perf release test ( #6664 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-07 10:00:45 +10:00
Yan Chunwei
5eae3184fa
[None][chore] add missing tests to test list ( #6590 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-06 22:12:27 +08:00
Yechan Kim
1aed7511fe
[ https://nvbugs/5430124 ][fix] Mistral mixture_text_image test case fix ( #6648 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-06 06:58:58 -07:00
Iman Tabrizian
13ecb4aced
[ https://nvbugs/5328160 ][fix] Unwaive disaggregated serving tests ( #6644 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-08-06 09:08:29 -04:00
ruodil
907c180eb2
[None][test] align kv_frac in perf test with perflab and add more cases for 4 gpus GB200 ( #6632 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-06 02:25:57 -04:00
ruodil
0bd99b5d6d
[TRTLLM-6764][test] add new feature cases in cluster(B200/GB200) and sanity test ( #6650 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-06 01:45:13 -04:00
yunruis
3ff4f503ad
[None][opt] ADP schedule balance optimization ( #6061 )
...
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
2025-08-06 09:38:02 +08:00
Yechan Kim
c17f4984e2
[None][feat] Refactor Llava-Next ( #6478 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-05 17:53:53 -07:00
ixlmar
1ebceb790d
[TRTLLM-5508][feat] check input tokens + improve error handling ( #5170 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-08-05 18:27:43 +01:00
liji-nv
dcbfa7e509
[ https://nvbugs/5252313 ][fix] Fix torch compile + MTP ( #6554 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-05 10:31:29 -04:00
Venky
61da2daeb4
[TRTLLM-6761][refactor] Replace LogitBiasLogitsProcessor with embedding bias tensor system ( #6464 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-08-05 07:14:24 -07:00
Emma Qiao
78a75c2990
[None][Infra] - Split gb200 stages for each test ( #6594 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-05 07:10:00 -04:00
xinhe-nv
c32584125e
[TRTQA-2920][fix] Add failed cases into waives.txt ( #6600 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-05 20:12:55 +10:00
Pengbo Wang @ NVIDIA
c289880afb
[None][fix] fix kimi k2 serving and add test for Kimi-K2 ( #6589 )
...
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2025-08-05 18:05:33 +08:00
Ivy Zhang
08ed9d7305
[None][doc] add introduction doc on qa test ( #6535 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-05 17:02:17 +08:00
Ivy Zhang
d101a6cebc
[ https://nvbugs/5410279 ][test] resubmit timeout refactor ( #6337 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-05 16:39:25 +08:00
Haohang Huang
c9eebcb454
[TRTLLM-6674][feat] (Breaking Change) Hopper SWA non-cyclic kernels + KV reuse + Spec Dec ( #6379 )
...
Signed-off-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
Signed-off-by: symphonylyh <31998628+symphonylyh@users.noreply.github.com>
2025-08-05 07:47:41 +00:00
Leslie Fang
164acfa31e
[None][infra] Skip test_eagle3 test with device memory check ( #6617 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-05 02:36:03 -04:00
ruodil
7625845365
test: add README_release_test.md for perf test ( #6443 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-05 02:07:42 -04:00
xinhe-nv
a178cea324
[TRTLLM-6856][feat] add disaggregated serving tests to QA list ( #6536 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-05 12:47:53 +10:00
xinhe-nv
fe3d607c4b
[TRTQA-2920][fix] Add failed cases into waives.txt ( #6581 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-05 12:41:23 +10:00
Ivy Zhang
f3651adea8
[None][test] update invalid test name ( #6596 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-04 08:01:05 -04:00
Emma Qiao
5d8a5a0cb8
[None][Infra]Waive failed case in post-merge on main ( #6602 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-04 19:39:44 +08:00
brb-nv
87e4e9f468
[None][chore] Add unit test for Gemma3 lora ( #6560 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-04 04:56:57 -04:00
Pengyun Lin
a15e33351d
[None][fix] Revert commit 48ddc3d & add test for disagg server with different max_num_tokens ( #6259 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-08-04 15:09:51 +08:00
xinhe-nv
a54972e463
[None][fix] remove closed bugs ( #6576 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-04 15:52:11 +10:00
Leslie Fang
a60190836c
[None][infra] Enable accuracy test for eagle3 and chunked prefill ( #6386 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-04 01:45:24 -04:00
ruodil
6459725bf9
test: move ministral_8b_fp8 to fp8_specific gpu list(exclude Ampere) ( #6533 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-04 15:22:39 +10:00
Ivy Zhang
5eefdf2c75
tests: Add llama4 functional cases ( #6392 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-04 11:19:58 +08:00
ruodil
8d82ccca63
test: modify max_lora_rank of phi4_multimodal to 320 ( #6474 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-04 12:20:22 +10:00
Yechan Kim
ee6ab5be96
chore: add EXAONE4 accuracy test ( #6397 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-04 10:14:16 +08:00
Ivy Zhang
7547a7d0a2
[TRTLLM-6473][test] add speculative decoding and ep load balance cases into QA test list ( #6436 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-03 22:11:26 -04:00
Yiqing Yan
3f7abf87bc
[TRTLLM-6224][infra] Upgrade dependencies to DLFW 25.06 and CUDA 12.9.1 ( #5678 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-03 11:18:59 +08:00
Jhao-Ting Chen
4da5cfc511
[None][infra] add eagle3 one model accuracy tests ( #6264 )
...
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-08-02 16:07:46 -07:00
Lizhi Zhou
6f34f3489b
[TRTLLM-6357][test] Add accuracy tests for Qwen3 ( #6177 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-08-01 13:33:34 -04:00
xinhe-nv
263c6c0ad0
test: skip post blackwell ( #6357 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-01 13:10:14 -04:00
Emma Qiao
16febefee0
[None][Infra] - Skip failed tests in post-merge ( #6558 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-01 22:21:23 +08:00
brb-nv
7447d6ed85
[TRTLLM-6657][feat] Add LoRA support for Gemma3 ( #6371 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-01 09:19:54 -04:00
liji-nv
1daa8c3232
[ https://nvbugs/5340941 ][ https://nvbugs/5375785 ] - fix: Wrap attentio… ( #6355 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-01 07:38:06 -04:00
xinhe-nv
fca0d37798
[None][fix] update nemotron nas tests free_gpu_memory_fraction=0.8 ( #6552 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-01 20:27:22 +10:00
chenfeiz0326
ba5bdbb138
[None][chore] Disable add special tokens for Llama3.3 70B ( #6482 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-08-01 17:03:27 +08:00
Yukun He
90856bf97d
[ https://nvbugs/5419069 ][fix] Fix the mismatched layer name components. ( #6417 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-08-01 16:32:39 +08:00
Ivy Zhang
71524a1a48
[ https://nvbugs/5419066 ][fix] Use trt flow LLM ( #6467 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-01 03:33:07 -04:00
Venky
ad5742b105
[fix] Update get_trtllm_bench_build_command to handle batch size and tokens ( #6313 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-08-01 00:08:09 -04:00
brb-nv
2eca0d5925
fix: Fix poor generation with FP8 Gemma3 1B checkpoint ( #6499 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-31 17:18:23 -07:00
Simeng Liu
8cf3faa26a
[feat] Auto-enable ngram with concurrency <= 32. ( #6232 )
...
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
Signed-off-by: Mike Iovine <mike.iovine7@gmail.com>
Co-authored-by: Mike Iovine <miovine@nvidia.com>
Co-authored-by: Mike Iovine <mike.iovine7@gmail.com>
2025-07-31 18:45:51 -04:00
Ziyi Xiong
8062e0fe7c
[TRTLLM-6392][feat] Support turning on/off spec decoding dynamically ( #6363 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-07-31 15:31:39 -04:00
Faraz
8e84df74b5
Fix e2e test failure for RTX6000 Pro ( #6420 )
...
Signed-off-by: list <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: Faraz <58580514+farazkh80@users.noreply.github.com>
2025-07-30 23:32:44 -04:00
xinhe-nv
ca534e4798
test: add accuracy reference ( #6479 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-31 12:27:29 +10:00
bhsueh_NV
ae3a5fc918
[doc][ci][Qwen3][nvbugs 5374145] Add Qwen3 235B eagle3 CI ( #6477 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-07-31 09:37:23 +08:00
brb-nv
0e16d1f070
test: Add time logging for lora tests ( #6466 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-30 14:02:43 -07:00
Anurag Mukkara
fac186e3b5
[nvbug/5409417] Unwaive llava test case ( #6460 )
...
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
2025-07-30 14:38:47 -04:00
brb-nv
f6287e4498
Unwaive Gemma2 LoRA test on H100 ( #6461 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-30 12:56:12 -04:00
Bo Deng
24e7f4eece
[nvbug/5410296][fix] Fix OOM in Llama 4 disagg-serve tests ( #6439 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-07-31 00:41:37 +08:00
Wanli Jiang
9632dba02e
feat: TRTLLM-6450 update long rope for phi3.5/phi4-mini/phi4-mm ( #6353 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-30 09:20:16 -07:00
pcastonguay
0f083b9daf
fix: Unwaive triton cpp test [nvbug 5401088] ( #6412 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-07-30 11:25:18 -04:00
nv-guomingz
03e38c9087
chore: update trtllm-serve usage doc by removing backend parameter when it use torch as backend. ( #6419 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-30 11:11:06 -04:00
pcastonguay
e7ae5e2824
feat: Add support for disaggregation with pp with pytorch backend ( #6369 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: raayandhar <rdhar@nvidia.com>
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: raayandhar <rdhar@nvidia.com>
Co-authored-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-07-30 09:42:13 -04:00
tomeras91
a2514d93fc
[nvbug 5380101][fix] Fix nemotronNAS loading for TP>1 ( #6447 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-07-30 07:22:32 -04:00
Yechan Kim
22b29df38c
[nvbugs/5414909] fix: Qwen2-VL keyword on L20 ( #6427 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-07-30 17:29:55 +08:00
xinhe-nv
d9ab3fd35e
tests: add TestNemotronH cuda graph tests ( #6390 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-30 18:45:58 +10:00
2ez4bz
d6eed1b624
[fix] Switch placement of image placeholder for mistral 3.1 ( #6435 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-07-30 14:10:36 +08:00
xinhe-nv
c00d6763b2
test: [CI] Add failed cases into waives.txt ( #6457 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-07-30 12:36:58 +10:00
Venky
ab40369053
[fix] Move kv_cache_free_gpu_mem_fraction arg to benchmark command in tests ( #6463 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-30 10:53:43 +10:00
Yechan Kim
d6eb8e2366
fix: support mixture of text & multimodal prompts ( #6345 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-07-30 08:52:31 +08:00
xinhe-nv
f1086e7d4f
test: [CI] remove closed bugs ( #6381 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-29 19:01:23 +10:00
xinhe-nv
4fbb344caf
test: [CI] Add failed cases into waives.txt ( #6423 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-29 19:00:30 +10:00
Yukun He
0eee2e2850
[5385981] fix: Update the usage of VisionAttention init API. ( #6413 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-07-29 16:41:48 +08:00
ruodil
e11255e9d0
test:[nvbug 5415268] add kv_cache_free_gpu_mem_fraction param and llama4 rcca cases ( #6430 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-07-29 15:52:45 +10:00
Michal Guzek
2573bb729d
feat: Add Phi-4-Mini-Instruct in Pytorch backend for LLM API accuracy tests ( #6303 )
...
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-07-28 14:02:14 -07:00
2ez4bz
cdca541148
[test] Unwaive mistral3.1 small E2E test ( #6352 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-07-28 14:37:42 -04:00
2ez4bz
60e4d3a9d4
[test] Add accuracy regression test for Mistral3.1 ( #6322 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-07-28 09:41:44 -07:00
ruodil
03632a679f
test: organize perf cases and add missing perflab cases in qa test list ( #6283 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-07-28 20:33:32 +10:00
xinhe-nv
971be1fe86
test: waive failed cases ( #6394 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-28 20:31:43 +10:00
Ivy Zhang
2945817cae
[nvbug/5409414, 5355707] tests: adjust batchsize and decoding name ( #6292 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-07-28 15:33:30 +08:00
Emma Qiao
b3ca159787
[Infa] - waive failed cases and fix a typo ( #6384 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-28 02:06:57 -04:00
Chang Liu
dc757799e1
[nvbugs/5401156][fix] Avoid import all models when import trtllm._common ( #6266 )
2025-07-27 23:29:21 -04:00
Yan Chunwei
908f49a4ad
[nvbug/5320234] fix: test_trtllm_bench_llmapi_launch ( #6359 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-28 09:01:10 +08:00
Iman Tabrizian
c35c78ff58
[fix][nvbugs/5390810] Improve the check for disaggregated serving test ( #6301 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-25 12:47:01 -07:00
nv-guomingz
b8d4cb8beb
feat: Support JSON Schema in OpenAI-Compatible API ( #6321 )
...
Signed-off-by: noiji <52301388+noiji@users.noreply.github.com>
2025-07-25 12:55:56 -04:00
xiaoqi
a0aecf0476
[feat]: support logit_bias ( #5354 )
...
Signed-off-by: xq25478 <xq25478@qq.com>
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: hexiao.xq <hexiao.xq@antgroup.com>
Co-authored-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: hexiao.xq <hexiao.xq@antgroup.com>
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-07-25 09:37:41 +00:00
xinhe-nv
470544cf17
test: [CI] Add failed cases into waives.txt ( #6333 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-25 17:18:06 +10:00
xinhe-nv
6268a60ab3
tests: add test_chunked_prefill for llama4 ( #5549 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-24 23:02:00 -04:00
xinhe-nv
2dcfa90e99
test: skip llama3.3 70b test on cg4 ( #6293 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-24 19:29:56 -07:00
bhsueh_NV
7b6aadc800
[Fix][nvbug 5401163][nvbug 5404726][Qwen3] Fix bug of MoE on tp > 1 with trtllm moe backend ( #6235 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-07-24 21:47:37 +08:00
Emma Qiao
0cc1f8c03d
[Infra] - Wiave failed tests in post-merge ( #6331 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-24 21:18:06 +08:00
Ivy Zhang
f290108cd8
tests: only get timeout value from pytest marker ( #6287 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-07-24 20:51:02 +08:00
Iman Tabrizian
5fceaa6153
Revert "tests: add timeout_manager to tensorrt flow test cases ( #5942 )" ( #6309 )
2025-07-23 23:58:10 -04:00
Emma Qiao
82d03ca979
[Infra] - Increase unittest execution time since some test exceeds 1600 ( #6277 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-24 10:02:28 +08:00
Iman Tabrizian
7740bfa31d
Waive tests ( #6312 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-23 18:15:07 -07:00
Emma Qiao
cb737a5fcd
[Infra] - Skip failed cases ( #6299 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-23 21:26:31 +08:00
xinhe-nv
2b0fa24175
test: [CI] Add failed cases into waives.txt ( #6289 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-07-23 19:04:21 +10:00
YueWeng
ed62a06eef
[nvbug/5322354] fix PD + MTP + overlap scheduler accuracy issue ( #6136 )
...
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-07-23 14:53:37 +08:00
Yechan Kim
83c3ed128b
chore: set default device to cpu on Multimodal models ( #5994 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-07-22 21:45:31 -07:00
Iman Tabrizian
bc2fb29c5e
[nvbugs/5401261][fix] Fix Triton backend disaggregated serving support ( #6224 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-23 05:27:16 +08:00
John Calderon
b7c8a672da
[Issue 6193] Fix gemma3vl weight loader ( #6233 )
...
Signed-off-by: John Calderon <johncalesp@gmail.com>
2025-07-22 10:32:18 -07:00
Stanley Sun
04f2d4b2eb
test: update test list for RTX6KD ( #6213 )
...
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-07-22 18:55:24 +08:00
pcastonguay
310bdd9830
fix: Fix triton backend build [nvbug 5396469] ( #6098 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-07-22 12:48:00 +08:00
Yi Zhang
eb7d0f84b5
[nvbugs/5368410][fix] Disable moe allreduce for multi node ( #5918 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-07-22 12:48:00 +08:00
Yan Chunwei
f194b65f3e
fix [nvbug/5351244]: address remote mpi session submit ( #5664 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-22 12:48:00 +08:00
Bo Li
537757e669
fix: [nvbugs/5351130] Adjust DSV3-Lite tests free_gpu_memory_fraction to 0.75 to prevent OOM on CI. ( #5896 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-07-22 12:48:00 +08:00
2ez4bz
37d0b68442
[fix] Fix flaky mistral E2E test ( #6230 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-07-22 11:55:28 +08:00
Ivy Zhang
eb5cb5b642
tests: add timeout_manager to tensorrt flow test cases ( #5942 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-07-22 10:23:41 +08:00
Simeng Liu
4a0951f85c
[Chore] Replace MODEL_CACHE_DIR with LLM_MODELS_ROOT and unwaive triton_server/test_triton.py::test_gpt_ib[gpt-ib] ( #5859 )
...
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-07-21 15:46:37 -07:00
Mike Iovine
9645814bdf
[chore] Clean up quickstart_advanced.py ( #6021 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-07-21 15:00:59 -04:00
Yi Zhang
f9b0a911fb
test: Enable GB200 torch compile multi gpu tests ( #6145 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-07-21 22:17:13 +08:00
Pengyun Lin
9832bef07d
[BREAKING CHANGE]: change default backend to PyTorch in trtllm-serve ( #5717 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-07-21 21:09:43 +08:00
Emma Qiao
e41507a253
[Infra] - Waive failed cases on recent post-merge ( #6212 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-21 21:00:18 +08:00
liji-nv
3e0fb60e50
[TRTLLM-4279] feat: Multistream initial support for torch compile flow ( #5847 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-07-21 19:10:22 +08:00
Linda
3efad2e58c
feat: nanobind bindings ( #6185 )
...
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-07-21 08:56:57 +01:00
xinhe-nv
b46fd41026
test: [CI] remove closed bugs ( #6201 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-07-21 15:40:30 +08:00
ruodil
6a3c9f8061
test: add phi-4 multimodel and bielik-11b-v2.2 models for perf test ( #5826 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-21 11:29:19 +10:00
bhsueh_NV
2e14c8f443
[Fix][Chore][Qwen3] fix bug of using fp4 on sm120 ( #6065 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-07-20 10:25:25 +08:00
Ziyi Xiong
66030ef815
[TRTLLM-6452][feat]: Two-model engine KV cache reuse support ( #6133 )
...
Signed-off-by: ziyixiong-nv <fxiong@nvidia.com>
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-07-19 13:17:15 +08:00
wili
82d3587bb8
[refactor] Unify name of NGram speculative decoding ( #5937 )
...
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
2025-07-19 12:59:57 +08:00
xiaoqi
28858c8711
feat(eagle3):support qwen3 dense model ( #5879 )
...
Signed-off-by: xq25478 <xq25478@qq.com>
2025-07-19 01:24:32 +08:00
Bo Deng
2c6fa145ee
[TRTLLM-6471] Infra: unwaive nixl tests and some disagg-serve tests ( #6095 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-07-19 00:48:44 +08:00
Erin
9522cde464
fix: NVBug 5385576 py_batch_idx issue ( #6153 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-07-18 22:36:43 +08:00
Emma Qiao
77acb4f753
[Infra] - Waive failed tests in post-merge ( #6176 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-18 17:34:34 +08:00
Chuang Zhu
c0e416535e
fix single_disagg_test ( #6166 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-07-18 13:18:37 +08:00
Zhenhuan Chen
992b273045
[ https://nvbugs/5387375 ] fix(scaffolding): fix scaffolding aime test in test_e2e ( #6140 )
...
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
2025-07-18 10:34:37 +08:00
Iman Tabrizian
b75e53ab69
Revert "feat: nanobind bindings ( #5961 )" ( #6160 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-18 10:12:54 +08:00
2ez4bz
8480c120b1
[fix] Fix Mistral3VLM weight-loading & enable in pre-merge ( #6105 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-07-17 11:04:17 -07:00
Linda
5bff317abf
feat: nanobind bindings ( #5961 )
...
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-07-17 22:42:52 +08:00
Stanley Sun
9518e14f69
test: fix PytestUnknownMarkWarning: Unknown pytest.mark.timeout ( #6115 )
...
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-07-17 20:55:04 +10:00
Yi Zhang
a718486900
fix: Fix DeepSeek R1 CI ( #6129 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-07-17 18:24:49 +08:00
Erin
de60ae47e3
chores: unwaive a few tests for v1.0 ( #6107 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-07-17 17:59:51 +08:00
Enwei Zhu
21efb50068
[TRTLLM-6406] feat: Enable guided decoding with overlap scheduler ( #6000 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-07-17 17:46:10 +08:00
Chuang Zhu
44c70c88f9
chore:[BREAKING CHANGE] use cacheTransceiverConfig as knobs for disagg service ( #5234 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-07-17 17:42:07 +08:00
Iman Tabrizian
d4d21a106e
[fix] Release slots with spec decode + disagg ( #5975 ) ( #6032 )
...
Signed-off-by: Iman Tabrizian <itabrizian@nvidia.com>
Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-17 12:58:18 +08:00
chenfeiz0326
fe070a0168
test: Update Llama4 Scout FP4 & FP8 accuracy tests ( #5901 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-07-17 09:41:18 +08:00
Wanli Jiang
2d2b8bae32
feat: TRTLLM-5574 Add phi-4-multimodal pytorch-backend support ( #5644 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-17 06:30:58 +08:00
qixiang-99
e09e409dfb
Fix: Enhance ModelConfig for kv cache size calculations ( #5868 )
...
Signed-off-by: qixiang-99 <203170375+qixiang-99@users.noreply.github.com>
2025-07-16 14:41:31 -07:00
Emma Qiao
e30d7bec38
[Infra] - Waive failed cases in post-merge on main ( #6096 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-16 22:41:18 +08:00
Yan Chunwei
a02606a9e2
[TRTLLM-5530][BREAKING CHANGE] refactor: unify KvCacheConfig in LLM class for pytorch backend ( #5752 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-16 16:42:59 +08:00
Ivy Zhang
dda91b5117
tests: add QA test cases ( #5959 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-07-16 16:14:25 +08:00
Ivy Zhang
763012a88a
[nvbug/5359218][tests] add test llm api test case on lookahead with chunked prefill ( #6051 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-07-16 16:04:08 +08:00
peaceh-nv
f5f31beee1
feat: Add deepseek-lite tests for RTX pro 6000 ( #5903 )
...
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-07-16 15:51:45 +08:00
Zheng Duan
385af53a4d
[nvbug/5347489][nvbug/5388036] increase timeout in disagg worker test ( #6041 )
...
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-07-16 13:52:13 +08:00
Wanli Jiang
8679a058a3
fix: Unable to load phi4-model with tp_size>1 ( #5962 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-07-16 11:39:41 +08:00
Aurelien Chartier
6a47cac981
feat: Add support for Triton request cancellation ( #5898 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-07-15 20:52:43 -04:00
danielafrimi
edab7532dd
feat/add latency support for trtllm bench ( #3730 )
...
Signed-off-by: Ubuntu <dafrimi@nvidia.com>
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
Signed-off-by: Frank <3429989+FrankD412@users.noreply.github.com>
Co-authored-by: Daniel Afrimi <dafrimi@nvidia.com>
Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com>
2025-07-15 13:13:49 -07:00
brb-nv
9214ac662a
test: Add regression tests for Gemma3 VLM ( #6033 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-15 11:37:56 -07:00
Fanrong Li
7a1af1c738
Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/5947 ( #5989 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-07-16 01:33:12 +09:00
MinaHuai
9ebc3ab9c4
[nvbugs/5385972][nvbugs/5387423][Fix] Minor fix for llava_next/llava_onevision ( #5998 )
...
Signed-off-by: Mina Huai <121143971+MinaHuai@users.noreply.github.com>
2025-07-15 10:01:35 -04:00
ruodil
2a147c4d01
test: add llama_v3.3_70b_cases in perf test ( #6035 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-07-15 17:53:59 +10:00
ruodil
2504aa552e
test: add recursive updating pytorch config and change MOE backend format in perf test ( #6046 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-07-15 17:53:15 +10:00
nv-guomingz
4e4d18826f
chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… ( #6003 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-15 15:50:03 +09:00
Yiqing Yan
6b35afaf1b
[Infra][TRTLLM-6013] - Fix stage name in single stage test rerun report ( #5672 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-15 12:27:21 +09:00
ixlmar
f225f5cd2e
[nvbugs-5318143] fix: restrict PyTorch memory usage to avoid OOMs ( #5964 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-07-15 06:49:42 +08:00
Iman Tabrizian
c4ee535afb
[fix] fix eagle3 two model disaggregated serving test ( #6014 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-15 04:26:04 +09:00
brb-nv
1a2d96919c
feat: Update Gemma3 Vision Encoder ( #5973 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-14 22:38:10 +08:00
Zhanrui Sun
3a0ef73414
infra: [TRTLLM-6242] install cuda-toolkit to fix sanity check ( #5709 )
...
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-07-14 18:52:13 +09:00
Zhenhuan Chen
30608a5e6d
[ https://nvbugs/5355316 ] fix: update torch.compile option to fix triton store_cubin error ( #5865 )
...
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
2025-07-14 17:17:30 +08:00
Robin Kobus
5a61d64b5b
[nvbugs/5345391] fix: chunked prefill + overlap scheduling ( #5761 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
ruodil
347520494b
test: remove duplicate cases in perf sanity test ( #5870 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
Bo Li
6d79559f3e
fix: [ https://nvbugs/5351130 ][ https://nvbugs/5333654 ] Unwaive for bug 5351130 and 5333654. ( #5821 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
Bo Li
2991cf4b80
fix: [ https://nvbugspro.nvidia.com/bug/5345215 ] Unwaive for bug 5345215. ( #5606 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
Pengyun Lin
6992616c1f
[nvbug 5004744][fix] rewrite completion API to avoid repetitive tokens ( #5201 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
ruodil
278a1a7df3
test: fix some test failure and add llama_nemotron models in perf sanity test, add more torch cases ( #5693 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
Iman Tabrizian
c8874a7f94
[nvbug/5337601][fix] Fix disagg + speculative decoding ( #5558 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Co-authored-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
Yi Zhang
9cc4e5d50e
[nvbugs/5336321][fix] Enable attention dp = False test case, Fix TRTLLM Gen Moe workspace allocation ( #5463 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Signed-off-by: yizhan <187001205+yizhang-nv@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
Yi Zhang
e5e87ecf34
test: Move some of the test from post merge to pre-merge, update dgx b200 test case ( #5640 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
brb-nv
869e88304a
[nvbug/5341178][fix] Fix OOM in Llama 4 accuracy test ( #5735 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-14 17:17:30 +08:00
dominicshanshan
c9e7f831dc
Breaking change: perf: [TRTLLM-4662] Enable cuda graph by default ( #5480 )
...
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-07-14 16:42:23 +08:00
Yan Chunwei
9c673e9707
[TRTLLM-6160] chore: add sampling examples for pytorch ( #5951 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-14 15:28:32 +09:00
Yan Chunwei
c30eead09f
[TRTLLM-6164][TRTLLM-6165] chore: add runtime example for pytorch ( #5956 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-14 14:09:39 +08:00
Mike Iovine
8950223f6f
[fix] Remove SpecConfig and fix thread leak issues ( #5931 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-07-12 21:03:24 +09:00
Chang Liu
308776442a
[nvbug/5308432] fix: extend triton exit time for test_llava ( #5971 )
...
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-07-12 12:56:37 +09:00
Thor Johnsen
041f1fa513
[TRTLLM-6264] Fix flaky test_e2e.py::test_openai_lora ( #5885 )
...
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
2025-07-11 16:20:41 -07:00
xinhe-nv
509363d858
tests: update sanity tests & fix tests ( #5906 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-11 19:48:19 +10:00
brb-nv
0385f89abc
test: Fix Gemma3 unit tests due to transformers upgrade ( #5921 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-10 17:24:10 -07:00
2ez4bz
c19840235d
[fix] Fix mistral unit tests due to transformers upgrade ( #5904 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-07-10 10:45:27 -07:00
wili
2e3cf42e03
[refactor] Simplification of Speculative decoding configs ( #5639 )
...
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
2025-07-10 11:37:30 -04:00
Yiqing Yan
3aa53ec36c
[None] - Waive L0 tests ( #5915 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-10 18:33:17 +08:00
Enwei Zhu
055c4a9fe6
[NvBug 5370718, 5371538] fix: Fix incremental detokenization ( #5825 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-07-10 16:30:00 +08:00
Anthony Chang
7d21b55b5a
[feat] Add TRTLLM MoE nvfp4 cubins for mid-high concurrency; attention_dp for TRTLLM MoE ( #5723 )
...
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-07-10 14:06:50 +08:00
Yan Chunwei
07f6da763d
[TRTLLM-5530] chore: rename LLM.autotuner_enabled to enable_autotuner ( #5876 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-10 11:31:35 +08:00
peaceh-nv
76c3a12bcb
[fix] WAR to fix the illegal memory access issue in moe gemm on SM120 ( #5636 )
...
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-07-10 09:20:30 +08:00
2ez4bz
87fe44fd29
feat(models): Mistral3.1 VLM pytorch backend support ( #5529 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-07-09 13:17:40 -07:00
DylanChen-NV
74dca0aa7b
[NVBUG-5304516/5319741]Qwen2.5VL FP8 support ( #5029 )
...
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
2025-07-09 23:16:42 +08:00
Erin
e277766f0d
chores: merge examples for v1.0 doc ( #5736 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-07-08 21:00:42 -07:00
Bo Li
9d894bc0cb
fix: [ https://nvbugspro.nvidia.com/bug/5375656 ] Unwaive for bug 5375656. ( #5842 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-07-09 10:17:05 +08:00
Venky
e27215ca03
test: Validate and add accuracy& perf tests for Ministral-8B-Instruct[-FP8](pytorch only) ( #5654 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-07-08 18:16:21 -07:00
xavier-nvidia
b6013da198
Fix GEMM+AR fusion on blackwell ( #5563 )
...
Signed-off-by: xsimmons <xsimmons@nvidia.com>
2025-07-09 08:48:47 +08:00
Yan Chunwei
e50d95c40d
chore [TRTLLM-6161]: add LLM speculative decoding example ( #5706 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-09 07:33:11 +08:00
Pamela Peng
da8c7372d4
[TRTLLM-5366][feat]Add support for sm121 ( #5524 )
...
Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Initial CI run failed a single step A30-CPP-3 due to timeout. Rerunning that step succeeded.
2025-07-08 14:27:00 -07:00
Chang Liu
08a3dfeb2b
[nvbug/5308432] unwaive test: post-merge-triton_backend-test_llava ( #5814 )
2025-07-08 09:53:11 -07:00
Raayan Dhar
e3268a4221
[TRTLLM-5847][feat] Support n-gram speculative decoding with disagg ( #5732 )
...
Signed-off-by: raayandhar <rdhar@nvidia.com>
2025-07-08 09:39:58 -04:00
xinhe-nv
89bbb230cc
tests: waive failed cases on main ( #5781 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-08 19:44:12 +10:00
nv-guomingz
c8fa08da5c
doc: update cuda_graph_config usage part in DS R1 docs ( #5796 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-07-08 16:54:46 +09:00
liji-nv
95978e3044
[fix] https://nvbugs/5333654 Unwaive to check ci status and improve torch compile multi-gpu coverage ( #5700 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-07-08 12:42:15 +08:00
nv-guomingz
0be41b6524
Revert "chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie…" ( #5818 )
2025-07-08 13:15:30 +09:00
nv-guomingz
5a8173c121
chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… ( #5795 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-08 08:52:36 +08:00
Robin Kobus
30a19fcf7c
[TRTLLM-6291] feat: Add user-provided speculative decoding support ( #5204 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-07-07 16:30:43 +02:00
Yi Zhang
ed1b3c884a
fix: Adjust free GPU memory fraction in KvCacheConfig for DeepSeek R1 tests ( #5774 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-07-07 18:38:54 +09:00
xinhe-nv
ded38ebdbd
test: [CI] remove closed bugs ( #5770 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-07 18:06:07 +10:00
Yanchao Lu
2013034948
[Test] - Waive or fix few known test failures ( #5769 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-07-06 21:14:16 +08:00
Stefan Niebler
d1112aac37
[TRTLLM-3442] feat: added beam search support to the PyTorch Workflow ( #5333 )
...
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2025-07-05 01:35:13 +09:00
Chuang Zhu
ffc0b8f5da
Cache transceiver support VSWA ( #5505 )
...
Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Co-authored-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-07-05 01:18:42 +09:00
Yiqing Yan
7f3ea058f0
[Infra] - Waive L0 flaky test ( #5759 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-04 19:25:12 +09:00
xinhe-nv
3869b969a6
test: [CI] Add failed cases into waives.txt ( #5718 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-04 17:24:48 +09:00
Faraz
81c0764012
Cherry pick "[NVBUG:5355009] Modify check for fuse_fp4_quant on SM120 ( #5724 )
...
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
2025-07-04 16:53:20 +09:00
Yiqing Yan
b8fef809ae
[Infra] - Waive L0 test ( #5748 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-04 15:04:49 +08:00
Yuan Tong
32b244af38
feat: reduce unnecessary kernel generation ( #5476 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-07-04 14:37:49 +08:00
brb-nv
cdaa6abce7
fix: Investigate Gemma3 1B decoder output discrepancy ( #5564 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-04 13:14:13 +08:00
Yi Zhang
73d30a23c7
test: add more tests for GB200 with 8 GPUs/2 nodes in L0 tests ( #5397 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-07-04 13:14:13 +08:00
Zheng Duan
cb9f596dbe
[nvbug 5300551] test: increase block count in eviction test ( #5465 )
...
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-07-04 13:14:13 +08:00
xinhe-nv
7f837b6e8b
tests: waive failures on main ( #5704 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-07-04 12:39:12 +09:00
Venky
4762e0b244
Waive tests : test_openai_lora, test_trtllm_serve_lora_example and test_openai_chat_structural_tag_example ( #5740 )
...
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2025-07-04 11:01:08 +09:00
Netanel Haber
f91379b7e8
delete duplicate eagle3 and ngram tests ( #5711 )
...
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
2025-07-03 15:47:26 +03:00
Omer Ullman Argov
c72856188c
[ci] small multigpu speedups ( #5643 )
...
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-07-03 08:06:10 -04:00
Emma Qiao
530897388c
[Infra] - Waive a failed case on main ( #5702 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-03 06:09:27 -04:00
Emma Qiao
2a5fdebf10
[Infra] - Waive failed tests for main 0702 ( #5671 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-02 22:05:07 -04:00
Emma Qiao
31699cbeb1
[Infra] - Set default timeout to 1hr and remove some specific settings ( #5667 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-02 08:37:54 -04:00
qixiang-99
ca7b6ec8d8
Feat/pytorch vswa kvcachemanager ( #5151 )
...
Signed-off-by: qixiang-99 <203170375+qixiang-99@users.noreply.github.com>
2025-07-02 15:58:00 +08:00
liji-nv
c345f5876c
[feat] Support torch compile for attention dp ( #5086 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-07-01 13:48:52 -04:00
Kaiyu Xie
f9a455651b
perf: Use tokenizers API to optimize incremental detokenization perf ( #5574 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-07-01 09:35:25 -04:00
Yan Chunwei
3bc703d450
ci: unwaive llmapi launch test ( #5281 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-01 20:12:55 +08:00
ruodil
ded203d8aa
test: set enable_attention_dp=True in default deepseek settings ( #5461 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-07-01 20:12:55 +08:00
brb-nv
4ef60d5fbb
nvbugs-5331031; nvbugs-5344203 - address intermittent issues with Mistral Small multimodal for BS=8 ( #5453 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-07-01 20:12:55 +08:00
Ivy Zhang
61213e3562
tests: fix typos in qa test ( #5421 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-07-01 20:12:55 +08:00
Yan Chunwei
a5eff139f1
[TRTLLM-5277] chore: refine llmapi examples for 1.0 (part1) ( #5431 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-07-01 19:06:41 +08:00
Emma Qiao
65c2b93284
[Infra] - Add some timeout and unwaive a test which dev fixed ( #5631 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-07-01 05:01:32 -04:00
Pamela Peng
071ad758c4
[ https://nvbugs/5318059 ][test] Unwaive test ( #5624 )
...
Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
2025-07-01 04:54:44 -04:00
Robin Kobus
5f77d212ef
test: Reduce number of C++ test cases ( #5437 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-07-01 09:40:49 +02:00
xinhe-nv
19c56f0374
test: [CI] Add failed cases into waives.txt ( #5582 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-07-01 14:57:03 +08:00
Stanley Sun
7135b27284
rcca: test default kv_cache_reuse option for pytorch multimodal ( #5544 )
...
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-07-01 12:12:48 +08:00
xinhe-nv
a8cf611baa
test: [CI] Add failed cases into waives.txt ( #5569 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-07-01 11:02:56 +08:00
xinhe-nv
9b17b29b6e
test: [CI] remove closed bugs ( #5572 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-07-01 10:15:43 +08:00
Yi Zhang
7cf1209a19
[fix]: Fix main test skip issue ( #5503 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-06-30 21:39:49 -04:00
nv-guomingz
6e48ac25a6
chore: remove cuda_graph_ prefix from cuda_graph_config filed members. ( #5585 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-30 12:23:14 -04:00
Omer Ullman Argov
42134b8b84
[ci] move eagle1 and medusa tests to post-merge ( #5604 )
...
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-30 19:32:28 +08:00
Fanrong Li
6cbc9a5297
[nvbug/5354946][fix] Fix mtp vanilla draft inputs ( #5568 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-30 15:59:12 +08:00
Yiqing Yan
4fef14da56
Deduplicate waive list ( #5546 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-30 11:12:26 +08:00
nv-guomingz
578430e64c
[TRTLLM-5530][BREAKING CHANGE]: enhance the llm args pytorch config part 1(cuda_graph_config) ( #5014 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-30 11:05:40 +08:00
Omer Ullman Argov
2780fc27a7
[ci] remove MMLU if followed by GSM8K ( #5578 )
...
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-30 05:29:54 +03:00
Talor Abramovich
70e34a3291
[TRTLLM-5831][feat] Add LoRA support for pytorch backend in trtllm-serve ( #5376 )
...
Signed-off-by: Talor Abramovich <talora@nvidia.com>
2025-06-29 12:46:30 +00:00
amirkl94
a985c0b7e6
tests: Move stress tests to be Post-Merge only ( #5166 )
...
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
2025-06-29 09:44:47 +03:00
Li Min
6021a439ab
Make moe permute and final as custom op ( #5412 )
...
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
2025-06-27 15:48:33 -07:00
Iman Tabrizian
26b953e29a
[nvbugs/5309940] Add support for input output token counts ( #5445 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-06-28 04:39:39 +08:00
Darragh Hanley
5437075def
ReDrafter support for Qwen ( #4875 )
...
Signed-off-by: darraghdog <darragh.hanley@gmail.com>
Signed-off-by: Darragh Hanley <darragh.hanley@gmail.com>
Co-authored-by: rakib-hasan <rhasan@nvidia.com>
2025-06-28 02:33:10 +08:00
wili
56cdfe5c6c
[TRTLLM-5000][feat] NGrams V2 ( #4569 )
...
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
2025-06-27 23:00:17 +08:00
Omer Ullman Argov
6fc1c6fd7b
[fix][ci] correct unittests test prefix ( #5547 )
...
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-27 20:34:44 +08:00
Iman Tabrizian
49af791f66
Add testing for trtllm-llmapi-launch with tritonserver ( #5528 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-06-27 11:19:52 +08:00
xinhe-nv
a3494bebec
tests: waive failed tests on main ( #5512 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-27 10:13:22 +08:00
Frank
aa6e015ef8
Update trtllm-bench to support new Pytorch default. ( #5491 )
...
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-06-26 17:05:43 -07:00
jmydurant
8836990bde
[TRTLLM-3602][feat] support nvfp4 model and fp8 kv cache for MLA chunked prefill (Blackwell) ( #5475 )
...
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-06-26 22:18:08 +08:00
Omer Ullman Argov
6bae76d7ca
[fix][ci] move torch tests to run under torch stage ( #5473 )
...
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-26 14:31:38 +03:00
Omer Ullman Argov
1633bd2bef
[CI] move flashinfer llama tests to post merge ( #5506 )
...
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-26 19:27:32 +08:00
xinhe-nv
ff2dd72df4
tests: waive tests ( #5458 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-26 14:53:55 +08:00
Emma Qiao
32d1573c43
[Infra] - Add timeout setting for long tests found in post-merge ( #5501 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-06-26 11:31:39 +08:00
Venky
d9b75f83fd
[CI] Waive test_fp8_block_scales_4gpus[ep4-mtp_nextn=0-fp8kv=True-attention_dp=True-cuda_graph=True-overlap_scheduler=True-torch_compile=False] ( #5494 )
...
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2025-06-25 20:17:12 -07:00
jmydurant
578dbc8d9a
feat: chunked prefill for MLA (Blackwell) ( #4651 )
...
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-06-26 09:01:00 +08:00
HuiGao-NV
74ae15a26b
CI: enable test cases on single device type ( #5484 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-06-26 08:03:44 +08:00
QI JUN
feaf789342
CI: reduce BF16 test cases in B200 ( #5482 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-06-26 07:18:20 +08:00
Omer Ullman Argov
bdc8dfebc3
[fix][ci] dont build wheel for cpp tests ( #5443 )
...
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-26 00:13:47 +03:00
Daniel Cámpora
205c97a4ae
[TRTLLM-5974][feat] Support disaggregated serving in TRTLLM Sampler ( #5328 )
...
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-06-25 17:41:36 +02:00
HuiGao-NV
cc3c2b3be2
Move 3 disaggregated cases from 4 GPUs devices to 1 GPU device ( #5457 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-06-25 21:38:14 +08:00
Kaiyu Xie
d6ada5ffce
[nvbug/5354956] fix: unexpected keyword argument 'streaming' ( #5436 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-06-25 20:37:24 +08:00
Netanel Haber
3ca2f6ac51
start OAIServer with max_beam_width=1 for TorchSampler ( #5427 )
...
Signed-off-by: Netanel Haber <nhaber@nvidia.com>
2025-06-25 15:52:06 +08:00
Enwei Zhu
fc7a81ceb0
test: Add LLGuidance test and refine guided decoding ( #5348 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-25 14:12:56 +08:00
Enwei Zhu
76da7fed86
fix (NvBug 5354925): Fix static EPLB ( #5411 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-25 13:14:40 +08:00
HuiGao-NV
da98e03747
tests: Set kv cache free memory fraction in test case ( #5433 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-06-25 12:31:58 +08:00
dongxuy04
699520082b
Add MTP support for Online EPLB ( #5213 )
...
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-06-25 07:58:13 +08:00
Emma Qiao
475272046a
[Infra] - Waive failed tests in post-merge and increase some timeout setting ( #5424 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-06-24 17:19:31 +08:00
xinhe-nv
658fb5b54e
tests: update benchmark test lists ( #5365 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-06-24 15:23:38 +08:00
xinhe-nv
4b32a3f1a7
test: [CI] remove closed bugs ( #5400 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-06-24 13:39:57 +08:00
Fanrong Li
5d4ab47d5b
fix: refactor and fix mtp vanilla ( #4762 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-20 05:23:39 +08:00
Yan Chunwei
9bd42ecf9b
[TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default ( #5312 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-20 03:01:10 +08:00
Kaiyu Xie
7246fd75d1
feat: Support stream_interval ( #5284 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-06-19 21:57:10 +08:00
Enwei Zhu
bca758fce1
fix: Fix DS-R1 nvfp4 test case naming ( #5361 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-19 15:50:43 +08:00
Emma Qiao
493f268b1c
[Infra]Fix l0_sanity_check.yml which also has gb202 and gb203 ( #5360 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-06-19 15:05:57 +08:00
ruodil
e22e884b02
test: amend test case name in perf cluster test ( #5356 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-06-19 14:50:12 +08:00
ruodil
21ce9b6749
test: add qwen3 cases ( #5302 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-19 14:38:36 +08:00
amitz-nv
1753202b61
[TRTLLM-5825][fix] Fix torch LoRA TP ( #5338 )
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-06-19 09:12:00 +03:00
Emma Qiao
7f68de3e3f
Refactor test timeout for individual long case ( #4757 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-06-19 13:52:11 +08:00
bhsueh_NV
dce8620013
chore: enable moe_backend on Qwen3 test ( #5230 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-06-19 13:40:45 +08:00
xinhe-nv
e5400eeae0
tests: add ds r1 tp4 test ( #5197 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-06-19 12:48:33 +08:00
Yiqing Yan
da576bcafa
Waive L0 test ( #5349 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-19 12:01:11 +08:00
Fanrong Li
6c3210a8be
[test] add nvfp4 DeepSeek-V3-Lite-mtp tests ( #5125 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-19 09:48:22 +08:00
nv-guomingz
6a388b105a
chore: remove torch_compile prefix for TorchCompileConfig field members ( #5261 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-19 09:21:51 +08:00
Aurelien Chartier
d25f93c07f
chore: skip test_llm_gpt2_medium_fp8 for fp8_pc_pt + quant_lm_head ( #5293 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-06-18 11:13:12 -07:00
Omer Ullman Argov
5010f8719d
[fix][test] remove duplicate test runs ( #5241 )
...
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-19 01:59:54 +08:00
Omer Ullman Argov
a28a152001
[fix][test] remove some cpp test cases from h100 ( #5335 )
...
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-18 20:40:26 +03:00
yuanjingx87
a1c5704055
[feat] Multi-node CI testing support via Slurm ( #4771 )
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
Signed-off-by: yuanjingx87 <197832395+yuanjingx87@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-06-19 01:11:12 +08:00
Iman Tabrizian
e5ee5c5352
Unwaive disaggregated serving accuracy tests ( #5095 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
2025-06-19 00:41:15 +08:00
HuiGao-NV
d13d2f460d
Remove duplicated test cases ( #5323 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
Signed-off-by: Hui Gaoâ <huig@nvidia.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-18 21:20:20 +08:00
Emma Qiao
b29ac5b561
[Infra] Update 5080 and 5090 case condition due to the driver update ( #5317 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-06-18 20:01:36 +08:00
xinhe-nv
610a49f117
tests: add multi nodes tests ( #5196 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-06-18 18:08:04 +08:00
Yi Zhang
375dd0b971
Waive L0 ( #5311 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-06-18 16:40:41 +08:00
Yuan Tong
f599ee63c1
test: correct unittest rerun behavior ( #5273 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-06-18 16:37:19 +08:00
Wanli Jiang
3a02489e86
[TRTLLM-5758] test: Add Bielik-11B-v2.2 Model Support ( #5159 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-06-18 15:12:49 +08:00
QI JUN
9ea7bb67a4
CI: fix TensorRT H200 tests ( #5301 )
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-18 14:40:57 +08:00
ruodil
3b5d916250
test: cherry-pick deepseek rcca cases in main branch ( #5307 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-18 14:26:26 +08:00
Yiqing Yan
8f67e3604d
Waive L0 tests ( #5308 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-18 12:43:45 +08:00
Omer Ullman Argov
f501ce57b1
[fix][test] move deepseek single gpu tests to post merge ( #5280 )
...
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-18 06:59:39 +03:00
dominicshanshan
3c0fecbf42
CI: extend model weights load time for dsv3 in stress test. ( #5275 )
...
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-06-18 11:51:48 +08:00
Ivy Zhang
41cfcaa964
test: update qa test list ( #5305 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-06-18 11:29:11 +08:00
Emma Qiao
ff32caf4d7
[Infra] - Update dependencies with NGC PyTorch 25.05 and TRT 10.11 ( #4885 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-06-17 23:48:34 +08:00
Yanchao Lu
f4cdbfcdf0
None - Some clean-ups for the automation pipeline ( #5245 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-06-17 21:08:24 +08:00
amirkl94
8451a87742
chore: Mass integration of release/0.20 ( #5082 )
...
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Erin <14718778+hchings@users.noreply.github.com>
Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-06-17 14:32:02 +03:00
liji-nv
13eef642e6
[feat] Piecewise cuda graph support for MLA ( #4467 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-06-17 18:58:38 +08:00
QI JUN
ccd9adbe33
CI: move multi-gpu test cases of tensorrt backend to h200 ( #5272 )
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-17 17:37:37 +08:00
Ivy Zhang
2ad8758ecc
[TRTLLM-5786][ https://nvbugspro.nvidia.com/bug/5310520 ][test] Add QA test cases ( #5073 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-06-17 17:14:01 +08:00
QI JUN
517c1ecf72
move some test cases of TensorRT backend back ( #5232 )
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-17 17:03:11 +08:00
xinhe-nv
a49ad790b3
test: [CI] remove closed bugs ( #5218 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-17 13:13:23 +08:00
QI JUN
546274d40e
fix ci ( #5259 )
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-17 12:03:09 +08:00
ruodil
bb2348372c
test: add more pytorch cases in perf test ( #5237 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-06-17 11:11:28 +08:00
Mike Iovine
c53bc19f5e
[infra] Make test_chunked_prefill faster ( #5248 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-06-17 04:19:47 +08:00
Simeng Liu
5c18160d27
chore: Waive CI failure. ( #5252 )
...
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-06-16 20:47:05 +02:00
Izzy Putterman
e607768e45
Speculation: Draft Target in new FW ( #4558 )
...
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-06-17 02:26:08 +08:00
Ivy Zhang
64b7f04fdc
[test] split nemotron test cases from examples_test_list ( #5238 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-06-16 16:36:33 +08:00
xinhe-nv
802f22cd12
test: [CI] Add failed cases into waives.txt ( #5221 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-06-16 16:11:53 +08:00
Yiqing Yan
8445416c39
Waive L0 tests ( #5233 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-16 15:19:03 +08:00
Wanli Jiang
0acf23185e
[Stress test] Add DeepSeek-R1 stress test ( #5033 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-06-16 11:54:31 +08:00
Yi Zhang
9b616db13b
test: Add fixture to skip tests based on MPI world size ( #5028 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-06-16 11:25:01 +08:00
ruodil
2848e012ae
test: add llama4 models for perf test ( #5187 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-16 11:24:35 +08:00
ruodil
3d22f27063
test: add more cases for llama_v3.3/3.1 70b fp8 and set enable_attention_dp to false to non-deepseek models ( #5155 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-06-16 11:23:20 +08:00
Enwei Zhu
babdd9ce06
test: Add json_mode_eval for guided decoding evaluation ( #5179 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-16 10:03:55 +08:00
Yan Chunwei
c84e41fd9d
fix: build_config in TorchLlmArgs and avoid arbitrary args ( #4972 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-15 17:51:56 -07:00
amitz-nv
109c426077
Enable trtllm-bench to run LoRA and add basic e2e perf testing capability for LoRA in PyT flow ( #5130 )
2025-06-15 18:54:04 +03:00
Kaiyu Xie
dce1dcc4f9
feat: Support post_proc for bench ( #5122 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-06-15 13:02:38 +08:00
ixlmar
e055af1bc9
chore: improve disagg test failure detection ( #4738 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-06-15 01:28:26 +08:00
Aurelien Chartier
1389f5a4d3
feat: Add support for fp8 rowwise quantization ( #4876 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
Co-authored-by: aikitoria <151776613+aikitoria@users.noreply.github.com>
2025-06-14 06:37:48 -07:00
Tailing Yuan
0b60da2c45
feat: large-scale EP(part 7: DeepEP integration) ( #4792 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-06-14 19:12:38 +08:00
nv-guomingz
3b7b5a5ad5
refactor [BREAKING CHANGE]: enhance the llm args pytorch config part 3(torch_compile_config) ( #5032 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-14 14:23:13 +08:00
Enwei Zhu
5f2785fb90
fix: Fix waive list ( #5205 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-13 23:33:23 +08:00
QI JUN
952f33dcad
CI: move all test cases of TensorRT backend into post merge ( #5186 )
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-13 20:48:48 +08:00
xinhe-nv
30d9d0fa71
test: [CI] Add failed cases into waives.txt ( #5178 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-13 16:38:51 +08:00
Zheng Duan
4d0a5ad384
chore: gracefully exit disagg process in tests; better startup and logging ( #5109 )
...
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
2025-06-13 14:03:55 +08:00
Ivy Zhang
28cd536bd6
[test] Update timeout params in QA test list ( #5124 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-06-13 13:40:03 +08:00
Iman Tabrizian
01bd4c00b4
Add two MTP disaggregated test ( #4546 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-06-13 12:17:45 +08:00
xinhe-nv
d9be419f45
tests: update tests for b200 ( #5180 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-13 11:25:33 +08:00
ruodil
fa582cbe9a
test: add more cases for rtx_pro_6000_se and add option kv_cache_dtype in perf test ( #5083 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-06-13 11:09:15 +08:00
Fanrong Li
38a907aaca
[TRTLLM-5278][feat] Add attention dp support to MTP relaxed acceptance ( #5119 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-13 08:58:44 +08:00
Omer Ullman Argov
655bce0b19
[fix][test] report individual unittests results to jenkins ( #5116 )
...
Signed-off-by: Omer Ullman Argov <118735753+omera-nv@users.noreply.github.com>
2025-06-13 01:52:09 +08:00
nv-guomingz
cf35a079f9
fix: https://nvbugs/5298661 ( #5022 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-12 20:41:44 +08:00
Shi Xiaowei
88cba5f354
test: waive the NIXL related tests ( #5153 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-06-12 17:02:27 +08:00
Fanrong Li
4d070d3862
chore: fix typo in tests ( #5092 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-12 15:11:26 +08:00
Michal Guzek
53983ad273
[TRTLLM-4932] Add Llama-3.1-Nemotron-Nano-8B-v1-FP8 accuracy tests ( #4933 )
...
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-06-12 15:06:28 +08:00
ruodil
d021cc5126
test: set enable_attention_dp to False for non-deepseek models and add more cases for llama_v3.1/3.3 70b fp8 models ( #5149 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-12 14:59:16 +08:00
bhsueh_NV
505678a286
update the free_gpu_mem_fraction for H100 qwen3 qa test ( #5114 )
...
Signed-off-by: root <root@eos0274.eos.clusters.nvidia.com>
Co-authored-by: root <root@eos0274.eos.clusters.nvidia.com>
2025-06-12 14:40:57 +08:00
Michal Guzek
0daa70999a
Fix Llama-3_3-Nemotron-Super-49B-v1 FP8 accuracy threshold configs ( #4961 )
...
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-06-12 14:32:04 +08:00
Venky
c3b2eb6dab
test(perf): Add remaining Llama-Nemotron perftests (nano, super, ultra) + extras ✨ ( #5066 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2025-06-12 14:19:15 +08:00
xinhe-nv
11b94feff8
test: skip disaggregated tests on arm ( #5070 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-06-11 17:00:10 +08:00
ruodil
56abae0835
test: add more llama_v3.3_70b cases in perf test ( #4979 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-11 15:44:22 +08:00
Yiqing Yan
0a9f105931
Waive L0 tests ( #5111 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-11 11:53:15 +08:00
Zheng Duan
580a92521e
test: conditional disagg and cache aware balancing for deepseek v3 ( #4522 )
...
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
2025-06-11 09:44:29 +08:00
liji-nv
f6a49a9343
[CI] waive failing L0 test ( #5089 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-06-10 20:40:44 +08:00
Yiqing Yan
8ec8e4559d
Waive L0 test ( #5077 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-10 16:23:49 +08:00
Yiqing Yan
fdfc711261
Waive L0 test ( #5067 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-10 15:40:57 +08:00
Stanley Sun
74b0e71ef4
test: add more disaggregated serving tests into QA testlist ( #5036 )
...
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-06-10 09:24:53 +08:00
pcastonguay
5b84fd9201
[nvbug 5283506] fix: Fix spec decode triton test ( #4845 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-06-09 08:40:17 -04:00
Yukun He
137fe35539
fix: Fix warmup phase batch size out of range. ( #4986 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-09 19:19:16 +08:00
Yuxian Qiu
88480197da
ci: [nvbugs/5280806] Unwaive unittests/_torch. ( #4951 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-06-09 19:04:11 +08:00
liji-nv
1d4f748773
[fix] Fix illegal mem access and possible accuracy lose. Cherry-pick … ( #5017 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-06-09 17:50:57 +08:00
Yiqing Yan
6b17dff2f1
Waive L0 test ( #5024 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-09 16:03:15 +08:00
Yan Chunwei
f4bfb8e49d
ci: unwaive llmapi launch test ( #4991 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-09 13:25:43 +08:00
Yechan Kim
8b4104d34a
feat: add HyperCLOVAX-SEED-Vision support in refactored way ( #4799 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-06-09 11:04:04 +08:00
Omer Ullman Argov
8731f5f14f
chore: Mass integration of release/0.20 ( #4898 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Hui Gao <huig@nvidia.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: moraxu <mguzek@nvidia.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: HuiGao-NV <huig@nvidia.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
Co-authored-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Faraz <58580514+farazkh80@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
2025-06-08 23:26:26 +08:00
Mike Iovine
ec0d984656
[nvbug/5280806][fix] Fix 2 model spec decode flow ( #4807 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-06-08 07:40:02 -04:00
Yanchao Lu
9e05613679
[Infra] - Update JNLP container config ( #5008 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-06-08 16:44:09 +08:00
QI JUN
5ee0de7f2a
Resubmit #4894 ( #4969 )
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-08 04:42:15 +08:00
Ivy Zhang
7dce328ad6
[TRTLLM-5692][tests] Add speculative decoding test cases on torch flow ( #4940 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Ruodi Lu <ruodil@nvidia.com>
Co-authored-by: Ruodi Lu <ruodil@nvidia.com>
2025-06-07 11:18:32 +08:00
Fanrong Li
75d020cf07
fix: fix cuda graph padding for spec decoding ( #4853 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-06 22:21:42 +08:00
Anthony Chang
eeb555e37b
chore: memoize weight shuffle index to speed up weight preproc in moe_backend=TRTLLM ( #4826 )
...
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-06-06 16:13:54 +08:00
xinhe-nv
564472168e
test: [CI] Add failed cases into waives.txt ( #4966 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-06-06 10:30:15 +08:00
QI JUN
ec50684d80
Revert "fix a bug of global cuda graph dummy request" ( #4970 )
2025-06-06 08:54:45 +08:00
QI JUN
154f7cc40a
fix a bug of global cuda graph dummy request ( #4894 )
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-05 19:47:40 +08:00
Yiqing Yan
7e921c78b5
Waive L0 tests ( #4953 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-05 19:36:48 +08:00
Shunkangz
3eae58ca36
Add disaggregated unittest ( #4899 )
...
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-06-05 19:14:31 +08:00
ixlmar
a1526356aa
[TRTLLM-5630] restore free_gpu_memory_fraction=0.9 in tests ( #4859 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-06-05 10:46:29 +01:00
QI JUN
b8c5e3892b
Revert "fix: build_config in TorchLlmArgs and avoid invalid args" ( #4949 )
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-05 17:43:30 +08:00
QI JUN
d5a8079eb6
Revert "[infra] Unwaive unittests/_torch" ( #4950 )
2025-06-05 17:21:07 +08:00
xinhe-nv
1c3091c63b
tests: [TRTQA-2906] add benchmark serving tests ( #4901 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-05 14:33:03 +08:00
Yiqing Yan
9ceef983c0
Waive L0 tests ( #4927 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-05 11:09:01 +08:00
xinhe-nv
50a74a1daa
tests: fix 5273697 ( #4685 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-06-05 10:39:21 +08:00
Mike Iovine
8433091630
[infra] Unwaive unittests/_torch ( #4919 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-06-05 08:49:37 +08:00
Lucas Liebenwein
f9d45e03a4
[AutoDeploy] deprecate CI post-merge tests and keep them for local testing ( #4892 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-06-05 08:27:17 +08:00
Yi Zhang
1fca654bfd
tests: Update gb200 test case ( #4754 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-06-04 18:49:20 +08:00
Yan Chunwei
ac20159d32
fix: build_config in TorchLlmArgs and avoid invalid args ( #4600 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-04 13:17:29 +08:00
Shi Xiaowei
b13f8c9cba
Fix: NVBug 5302895 ( #4835 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-06-04 09:31:39 +08:00
Nikita Korobov
8043d7a03c
feat: update DeepSeek FP8 TRT-LLM Gen cubins ( #4643 )
...
Signed-off-by: Nikita Korobov <nkorobov@nvidia.com>
2025-06-03 14:07:54 -07:00
rakib-hasan
d0eb47d33a
[TRTLLM-5053] Refactoring and Unifying the Multimodal input preparation ( #4506 )
...
* refactoring the multimodal input prep
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* adding out-of-tree override option
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* adding exceptional case for llava-next
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* fixing typo
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* addressing review comments, adding placement option, handling tokenizer variations
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* addressing pytest-asyncio behavior change
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
---------
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
2025-06-03 12:02:07 -07:00
Simeng Liu
2384655c3a
chore: Waive examples/test_mistral.py::test_llm_mistral_v1_1gpu. ( #4873 )
...
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-06-03 14:45:14 -04:00
Shunkangz
ae9a6cf24f
feat: Add integration of etcd ( #3738 )
...
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Signed-off-by: BatshevaBlack <132911331+BatshevaBlack@users.noreply.github.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Batsheva Black <bblack@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: BatshevaBlack <132911331+BatshevaBlack@users.noreply.github.com>
2025-06-03 20:01:44 +08:00
Robin Kobus
b9263a8e10
fix: max_num_sequences calculation with overlap scheduling ( #4532 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Co-authored-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-06-03 09:31:22 +02:00
Iman Tabrizian
141467d4b6
Add pre-merge Triton backend tests ( #4842 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-06-03 00:47:58 -04:00
ruodil
fa93eeee84
shorten reqs in con:1 cases and add streaming cases, and add l2 perf … ( #4849 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-03 12:28:13 +08:00
Ivy Zhang
8686868531
tests: [TRTQA-2905] improve timeout report for qa test cases ( #4753 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-06-03 12:27:27 +08:00
Robin Kobus
e34a1beb72
[nvbugs/5303555] ci: unwaive test_fp8_block_scales_cuda_graph_padding ( #4735 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-06-03 10:40:43 +08:00
Fanrong Li
380a5d1690
[ https://nvbugs/5271281 ][fix] fix a pd+mtp accuracy issue ( #4536 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-03 10:03:34 +08:00
Fanrong Li
13f68338d2
fix: [ https://nvbugspro.nvidia.com/bug/5273945 ] Unwaive tests for bug-5273945 ( #4832 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-02 22:01:57 +08:00
Yanchao Lu
8166649d03
[Infra] - Minor clean-up and test Ubuntu mirrors ( #4829 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-02 20:18:20 +08:00
Fanrong Li
7d356efc7d
fix: fix accuracy and illegal memory access issues when using mtp + attention dp ( #4379 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-02 00:35:52 +08:00
amirkl94
8039ef45d3
CI: Performance regression tests update ( #3531 )
2025-06-01 09:47:55 +03:00
Emma Qiao
202813f054
Check test names in waive list ( #4292 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-06-01 14:39:30 +08:00
Dom Brown
338d6e9f95
[nvbug 5305210] fix: Resolve nvbug 5305210 ( #4759 )
...
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-05-31 19:21:06 +08:00
Yan Chunwei
93c0632ee4
opt: the perormance for dist-agg streaming generation ( #4214 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-31 17:40:32 +08:00
Emma Qiao
c945e92fdb
[Infra]Remove some old keyword ( #4552 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-05-31 13:50:45 +08:00
Zheng Duan
54200ee8ac
fix: random fail of cache router test ( #4597 )
...
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
2025-05-30 16:28:19 +08:00
xinhe-nv
53794b26f8
test: skip test_llm_hf_gemma_quantization_1gpu_vswa on A100 ( #4779 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-30 15:12:12 +08:00
Yilin Fan
31bb650298
Cherry pick feat/llama4 to main ( #4739 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
Co-authored-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-05-30 05:28:40 +08:00
Jhao-Ting Chen
fcadce9f8d
[fix] Eagle-2 LLMAPI pybind argument fix. ( #3967 )
...
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
2025-05-29 12:23:25 -07:00
yuanjingx87
2c48ff5898
[feat] add b200 support via slurm ( #4709 )
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-05-29 14:49:46 +08:00
Yan Chunwei
33a9ba55f5
fix: test trtllm-bench mgmn ( #4613 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-29 14:43:47 +08:00
ruodil
500aca4f44
test: remove perf test l40s/l20 oom test cases and unwaive tests ( #4755 )
...
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-05-29 13:58:47 +08:00
QI JUN
058f83e47b
CI: move post-merge multi GPU test of PyTorch backend to H200 ( #4733 )
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-29 11:15:56 +08:00
Aurelien Chartier
6cf1e4d0a9
chore: add -f to pkill calls ( #4711 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-05-29 02:54:31 +08:00
Ivy Zhang
ed3c67e34a
tests: [ https://nvbugspro.nvidia.com/bug/5289908 ] run maverick bf16 on blackwell ( #4722 )
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-28 22:05:51 +08:00
xinhe-nv
93283484c2
test: [CI] Add failed cases into waives.txt ( #4688 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-28 22:04:35 +08:00
Yan Chunwei
5506f60037
chore [BREAKING CHANGE]: Flatten PyTorchConfig knobs into TorchLlmArgs ( #4603 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-28 18:43:04 +08:00
amirkl94
fbec0c3552
Release 0.20 to main ( #4577 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: moraxu <mguzek@nvidia.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: stnie <82932102+stnie@users.noreply.github.com>
Co-authored-by: Simeng Liu <109828133+SimengLiu-nv@users.noreply.github.com>
Co-authored-by: Faraz <58580514+farazkh80@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-05-28 16:25:33 +08:00
xinhe-nv
bb3d998eb1
test: [CI] remove closed bugs ( #4638 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-27 18:07:59 +08:00
Yiqing Yan
f6c50293d2
[Infra][TRTLLM-3929] Rerun failure tests ( #3264 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-27 16:13:23 +08:00
Yiqing Yan
92a7984945
Waive L0 tests ( #4686 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-27 15:07:02 +08:00
xinhe-nv
59f7622281
test: rcca https://nvbugs/5223130 ( #4510 )
...
* add rcca tests
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
* skip tests on blackwell
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
---------
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-27 09:59:47 +08:00
yuanjingx87
732d92ff62
[Infra] - Multi-GPU testing support with Slurm ( #4454 )
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-26 19:44:19 +08:00
Enwei Zhu
88190faa34
feat: large-scale EP(part 4: Static EP load balancer integration) ( #4615 )
...
* MoeLoadBalancerConfig
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* MoeLoadBalancer integration
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* config file
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* test
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* test
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-26 18:25:11 +08:00
Emma Qiao
6f626af386
[TRTLLM-4535][infra]: Add marker TIMEOUT for test level ( #3905 )
...
* Add marker for TIMEOUT
Signed-off-by: qqiao <qqiao@nvidia.com>
* Remove workspace after tests
Signed-off-by: qqiao <qqiao@nvidia.com>
* Add missed property
Signed-off-by: qqiao <qqiao@nvidia.com>
* Add some debug info
Signed-off-by: qqiao <qqiao@nvidia.com>
* Fix errors
Signed-off-by: qqiao <qqiao@nvidia.com>
* Testing
Signed-off-by: qqiao <qqiao@nvidia.com>
* Special process for unittests
Signed-off-by: qqiao <qqiao@nvidia.com>
* Move special proecessing unittests to test generating stage
Signed-off-by: qqiao <qqiao@nvidia.com>
* Process for the whole test list
Signed-off-by: qqiao <qqiao@nvidia.com>
* Test more
Signed-off-by: qqiao <qqiao@nvidia.com>
* Add another test case
Signed-off-by: qqiao <qqiao@nvidia.com>
* Change back the setting for testing
Signed-off-by: qqiao <qqiao@nvidia.com>
* Revert another config file
Signed-off-by: qqiao <qqiao@nvidia.com>
* Add descriptionf or timeout in test readme
Signed-off-by: qqiao <qqiao@nvidia.com>
---------
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-05-25 23:30:40 -07:00
Yiqing Yan
2fee408536
Waive L0 tests ( #4645 )
...
* Waive L0 tests
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
* Apply suggestions from code review
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
---------
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-26 11:05:01 +08:00
hlu1
4a236d107d
[Fix][Deepseek] Fix bugs in TestDeepSeekR1 ( #4413 )
...
[Deepseek] Fix bugs in TestDeepSeekR1
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>
Co-authored-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>
2025-05-24 09:52:57 +08:00
Yanchao Lu
20c15fc04f
Fix invalid testcase name ( #4626 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-24 00:40:00 +08:00
Robin Kobus
15a59e57f6
[nvbugs/5301492] ci: waive test_workers_kv_cache_aware_router ( #4617 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-05-23 20:14:28 +08:00
zhhuang-nv
8452775db8
[TRTLLM-5070][feat] Support FP8 KV Cache Reuse for MLA ( #4535 )
...
* optimize kv cache reuse workflow for MLA
write kv cache first and only call up-projection GEMM once
relax contiguous requirements of k/v for setting paged kv cache
return two contiguous tensors when loading MLA KV Cache
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* support fp8 kv cache for MLA kv cache reuse
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* resolve comments
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
---------
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
2025-05-23 19:47:50 +08:00
Anthony Chang
bbea2647b1
Qwen3 supports TRTLLM FP4 MoE backend ( #4530 )
...
* MoE TRTLLM backend for Qwen3
Signed-off-by: Anthony Chang <anchengc@nvidia.com>
* add extra moe_backend to test
Signed-off-by: Anthony Chang <anchengc@nvidia.com>
* address comments
Signed-off-by: Anthony Chang <anchengc@nvidia.com>
* conditionally compile kernels on newer archs
Signed-off-by: Anthony Chang <anchengc@nvidia.com>
* missing positional arg
Signed-off-by: Anthony Chang <anchengc@nvidia.com>
* Update the routing kernels
Signed-off-by: Christina Zhang <christinaz@nvidia.com>
* Revise usage of TLLM_LOG_ERROR
Signed-off-by: Christina Zhang <christinaz@nvidia.com>
* Add unit test for Qwen3 moe (trtllm_gen backend)
Signed-off-by: Christina Zhang <christinaz@nvidia.com>
* improve weight processing speed of moe_backend=TRTLLM; roughly 2x
Signed-off-by: Anthony Chang <anchengc@nvidia.com>
* tidy and minor fix
Signed-off-by: Anthony Chang <anchengc@nvidia.com>
* temporarily disable accuracy test that has known issue
Signed-off-by: Anthony Chang <anchengc@nvidia.com>
---------
Signed-off-by: Anthony Chang <anchengc@nvidia.com>
Signed-off-by: Christina Zhang <christinaz@nvidia.com>
Co-authored-by: Christina Zhang <christinaz@nvidia.com>
2025-05-23 18:31:08 +08:00
Enwei Zhu
d7443b6068
[ https://nvbugspro.nvidia.com/bug/5181262 ] [test] Unwaive Mistral Nemo test ( #4515 )
...
unwaive
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-23 10:14:00 +08:00
pcastonguay
d7d455e7ea
[feat][TRTLLM-5018] Dis serving python runtime trt backend ( #4243 )
...
* feat: Enabling dis serving with TRT backend with Python runtime
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing formatting
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing disagg mtp test
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
---------
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-05-22 22:01:06 -04:00
Mike Iovine
14fc48ada7
[nvbug/5285881][fix] Fix chunked prefill + overlap scheduler ( #4402 )
...
[fix] Fix chunked prefill + overlap scheduler
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-05-23 04:38:22 +08:00
Venky
c713eb5799
test(perf): Add Llama-3_1-Nemotron-Ultra-253B-v1 perf tests (cpp) ( #4446 )
...
ultra
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-05-22 13:07:33 -07:00
xinhe-nv
22c01d5b21
test: [CI] Add failed cases into waives.txt ( #4549 )
...
* update waive list
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
* fix test issues
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
---------
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-22 17:18:53 +08:00
ruodil
1a45890dae
test: waive hanging cases for perf test ( #4562 )
...
waive hanging cases
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
2025-05-22 15:50:05 +08:00
HuiGao-NV
bc9f1dbede
fix[nvbug-5228840]: Remove test cases of feature not supported anymore ( #3972 )
...
* Remove waived cases
* Remove test cases of not supported feature
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-05-22 11:18:58 +08:00
Aurelien Chartier
f491244c84
feat: add dataset support for benchmark_core_model with LLMAPI ( #4457 )
...
* feat: add dataset support for benchmark_core_model with LLMAPI
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-05-21 19:18:43 -07:00
Michal Guzek
9033dd987d
[TRTLLM-4932] Add CLI accuracy tests for Phi-4-mini-instruct ( #4415 )
...
Add phi-4-mini CLI acc test
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-05-22 09:56:48 +08:00
Chuang Zhu
44cfd757b2
Agent interface impl for NIXL ( #4125 )
...
* agentConnection
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
recv
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
agentState
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
NIXL interfaces
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
update cmakelists
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
nixl improve
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
remove cppzmq
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
fix
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
transferAgent remove register
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
work for cache Test
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
reduce sleep time
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
fix test
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
intergarte
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
nixl env
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
fix rebase error
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
cpp test
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
stash for send metaData
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
loadRemoteMD after fetchRemoteMD
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
workaround for mixed gen and context
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
test_env
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
avoid port conflict in test
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* format
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* use std::string
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* typo
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* fix transferAgentTest
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
---------
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-05-22 09:09:41 +08:00
Aurelien Chartier
1681e9fd1e
chore: remove extra PYTHONPATH ( #4453 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-05-21 17:38:01 -07:00
Dom Brown
1cffa99792
test: Split test_simple into mpi_utils and cache transceiver tests for DGX ( #4451 )
...
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-05-22 04:26:21 +08:00
Venky
0a8461d54c
test(perf): Pt.2 Add Llama-3_3-Nemotron-Super-49B-v1 integration-perf-tests (cpp) ( #4499 )
...
add low concurrency perf tests
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2025-05-21 10:46:48 -07:00
xinhe-nv
407ef08662
tests: add qwene fp4 tests into QA test list & update sanity test list ( #4478 )
...
* update sanity test list
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
* update test list
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
---------
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-21 16:52:02 +08:00
ruodil
83f1933f0c
test: add failed case in waive list and fix some test script issue for perf test ( #4527 )
...
add failed case in waive list and fix some test script issue
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
2025-05-21 16:37:25 +08:00
ruodil
3d9a2b5eb7
test: remove enable_overlap_schedule in pytorch config and set enable_chunked prefill to be true for isl>2048 cases ( #4285 )
...
1.remove enable_overlap_schedule in pytorch config
2.rename model_yaml_config.py to pytorch_model_config.py and set enable_chunked_prefill to be true for cases with isl>2048
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-21 14:26:56 +08:00
QI JUN
15317ece5a
CI: waive test_fp8_block_scales_4gpus of deepseek v3 lite ( #4520 )
...
waive test_fp8_block_scales_4gpus of deepseek v3 lite
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-05-21 13:19:43 +08:00
xinhe-nv
750f412b8f
tests: add llama 3.3 70b 2 nodes tests ( #4391 )
...
* add llama 3.3 70b 2 nodes tests
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
* remove enable_overlap_scheduler parameter
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
---------
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-21 12:42:45 +08:00
Chuang Zhu
ab5bea957d
unwaive some disagg test ( #4476 )
...
* unwaive some disagg test
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
* pytest.mark.skip_less_device(4)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
---------
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-05-21 11:45:11 +08:00
QI JUN
2372589689
Chore: waive torch compile test cases of deepseek v3 lite ( #4508 )
...
waive torch compile test cases of deepseek v3 lite
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-05-21 10:43:31 +08:00
Shi Xiaowei
3d62727303
test: NIXL single process test ( #4486 )
2025-05-21 10:41:46 +08:00
Yan Chunwei
9199793848
fix: llmapi-launch add add trtllm-bench test with engine building ( #4091 )
...
* add trtllm-bench mgmn test
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-21 10:18:01 +08:00
Zheng Duan
77a0189554
feat: conditional disaggregation in disagg server ( #3974 )
2025-05-21 09:57:46 +08:00
Venky
9a8c3ece22
test(perf): Add remaining Phi-4-mini-instruct perf tests ( #4443 )
...
add remaining 2 phi cpp perf tests
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-21 09:26:12 +08:00
xinhe-nv
19c6e68bec
test: [CI] remove closed bugs ( #4417 )
...
* waives closed bugs
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
* update waives
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
---------
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-21 09:13:25 +08:00
Rohan Varma
3d940e77f0
[TRTLLM-5273]feat/Use full attention mask if Llama3 is used as encoder and fix EarlyStopDecoder unsqueeze bug ( #4290 )
...
* add bidirectional support and fix EarlyStopDecoder unsqueeze to be compatible with LogitsStorage
Signed-off-by: Rohan Varma <rohanv@nvidia.com>
* run pre-commit
Signed-off-by: Rohan Varma <rohanv@nvidia.com>
* instead of bidirectional flag use ModelConfig.is_generation
Signed-off-by: Rohan Varma <rohanv@nvidia.com>
* fix unit test to extract logits from correct dim
Signed-off-by: Rohan Varma <rohanv@nvidia.com>
---------
Signed-off-by: Rohan Varma <rohanv@nvidia.com>
2025-05-20 10:15:36 -07:00
bhsueh_NV
ec4190fb71
infra: Add qwen3 235B tests into QA ( #4483 )
...
* add qwen3 qa test
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* add qwen3 test into qa list
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
---------
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-05-20 17:37:09 +08:00
ruodil
b5edf13b33
test: update test filter in perf test yml file to select cases by gpu name and add cases for RTX 6000 pro ( #4282 )
...
* add cases for rtx_pro_6000 and update test filter
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
* amend a typo in model llama_v3.1_405b_instruct fp4 and add more cases for rtx pro 6000 and waive_list
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
---------
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-20 10:58:05 +08:00
Michal Guzek
0a342a42f7
[TRTLLM-4932] Add CLI accuracy tests for Llama-3.3-70B-Instruct and LLM API BF16 variant ( #4362 )
...
* Add CLI TestLlama3_3_70BInstruct acc tests
Signed-off-by: moraxu <mguzek@nvidia.com>
* Add tests to qa lists
Signed-off-by: moraxu <mguzek@nvidia.com>
* Add comment
Signed-off-by: moraxu <mguzek@nvidia.com>
* Fix test names
Signed-off-by: moraxu <mguzek@nvidia.com>
* Update yaml files
Signed-off-by: moraxu <mguzek@nvidia.com>
* Update cli file
Signed-off-by: moraxu <mguzek@nvidia.com>
---------
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-05-20 09:48:14 +08:00
xinhe-nv
402385588d
test: [CI] Add failed cases into waives.txt ( #4429 )
...
* update waive list
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
* update waive id
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
* update waive list
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
* update waive list
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
---------
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-20 09:43:55 +08:00
Yuxian Qiu
c8e062bfd3
fix: [nvbugs/5287097] Align PP layer distribution between pytorch and TRT flow. ( #4399 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
Co-authored-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-05-19 14:25:36 -07:00
Venky
bb02d86b54
test(perf): Add some Llama-3_3-Nemotron-Super-49B-v1 integration-perf-tests (TRT flow, trtllm-bench) ( #4128 )
...
* changes to run llama-v3.3-nemotron-super-49b
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
* yapf
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
* address review comments pt 1
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
* re-add cpp super tests
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
---------
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2025-05-19 12:00:48 -07:00
Faraz
7656af1b57
[TRTLLM-4618][feat] Fix cutlass MoE GEMM fallback failure on FP8 + add e2e test for Mixtral 8x7B FP8 on RTX6000 Pro (SM120) ( #4335 )
...
* add mixtral7x8b fp8 test with fixed cutlass fp8 moe gemm
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
* update cutlass versions
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
* added internal cutlass with fix and docker update
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
* added mixtral to pro 6000
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
---------
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2025-05-19 08:56:21 -07:00
liji-nv
58e405624a
[ https://nvbugs/5123103 ][fix] Fix torch compile for DeepSeekV3 ( #3952 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-05-19 22:12:25 +08:00
Iman Tabrizian
c6074c47da
Add llama4 disagg accuracy tests ( #4336 )
...
* Add llama4 disagg accuracy tests
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
* Make it async and add GSM8K benchmark
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
---------
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-19 21:55:08 +08:00
Shi Xiaowei
001704cc6a
fix: temp disable the problem test ( #4445 )
...
Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-05-19 21:54:32 +08:00
Dom Brown
c45f414bbf
Test: Improve model re-use in C++ DGX tests for CI stability ( #4263 )
...
* Fix padded vocab size for Llama
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
* Refactor multi GPU llama executor tests, and reuse the built model engines
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
* Fix test list typo
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
* WIP
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
* Further WIP
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
* WIP
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
* Update test lists and readme
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
* Try parametrize for asymmetric
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
* Parametrize + skip unsupported combinations
Signed-off-by: domb <3886319+DomBrown@users.noreply.github.com>
* Update test list
Signed-off-by: domb <3886319+DomBrown@users.noreply.github.com>
* Reduce environment duplicated code
Signed-off-by: domb <3886319+DomBrown@users.noreply.github.com>
---------
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Signed-off-by: domb <3886319+DomBrown@users.noreply.github.com>
2025-05-19 14:20:21 +01:00
Shi Xiaowei
df2798e0c3
feat: NIXL interface integration ( #3934 )
...
NIXL interfaces
Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-05-19 18:18:22 +08:00
Kaiyu Xie
a43914619f
fix: wrong argument name enable_overlap_scheduler ( #4433 )
...
Fix wrong argument
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-05-19 15:02:22 +08:00
Yan Chunwei
5b1c88de8d
chore: cleanup perf_evaluator code ( #3833 )
...
* chore: cleanup perf_evaluator code
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* up
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
---------
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-19 13:21:36 +08:00
Ivy Zhang
58d2508b89
tests: Add test cases for rcca cases ( #4347 )
...
* add qwen2_0_5_instruct cp4 test case
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* add qwen2.5 fp8 kvcache test case
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* add ds distill qwen cpp runner test case
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* trial
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
---------
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-19 12:06:43 +08:00
Ivy Zhang
c4a0d768b5
tests: add qa test mentioned in docs ( #4357 )
...
* add nemotron-h and llama_70b cases
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* trial
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* add llm decoder quick_start case
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* update nemotron-h test case
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* add qwen3 quickstart test
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* add trtllm_decoder accuracy test
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* remove quickstart test for llm_decoder
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* fix import error
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* nemotronh fp8 trial
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* fix name
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* remove nemotronh-fp8
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
---------
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-19 10:06:51 +08:00
Faraz
791c209006
[TRTLLM-4618][feat] Add Nemotron Super 49B FP8 test on RTX6000 Pro (SM120) ( #4363 )
...
* added nemotron 49b fp8 for B40 release
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
* add tests to QA list
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
* pre-commit changes
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
---------
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2025-05-19 09:30:24 +08:00
Iman Tabrizian
7de90a66bc
Remove vila test ( #4376 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-19 09:02:39 +08:00
Yanchao Lu
0d7269e2a7
[Infra][Docs] - Some clean-up for the CI pipeline and docs ( #4419 )
...
* [Docs] - Some clean-up for the docs
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
* [Infra] - Some clean-up for the CI pipeline
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
---------
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-19 00:07:45 +08:00
shaharmor98
27afcb9928
add changes for fp8, nemotron-nas, API ( #4180 )
...
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-05-18 23:27:25 +08:00
Venky
fb663b637a
Extend the Llama-Nemotron-Nano-8B perf-integration-tests (cpp) ( #4195 )
...
* add ll-nm-nano tests that map to nim requirements
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
* prune some pytorch cases (fp8)
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
* removing pyt backend test changes
- When validating the pytorch tests with the isl/osl/conc/quant settings (that is done for cpp backend too), seeing hangs that need further debugging.
- Therefore don't want to block this PR, hence removing them.
- Seeing
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
---------
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2025-05-17 22:46:21 +08:00
Yuxian Qiu
cc1bba1686
test: Waive tests for nvbugs/5286795. ( #4409 )
...
* Waive tests for nvbugs/5286795.
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
* Apply suggestions from code review
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
---------
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-17 19:41:05 +08:00
Jinyang Yuan
b618e1f55b
perf: Eliminate the need for attention DP padding when possible ( #3439 )
...
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
Co-authored-by: raccoonliukai <raccoonliu@tencent.com>
2025-05-17 13:30:55 +08:00
hlu1
befb93cbff
[Deepseek] Add accuracy test references for fp8 kvcache ( #4374 )
...
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>
Co-authored-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>
2025-05-17 11:23:00 +08:00
liji-nv
fb437ed709
[CI] waive accuracy/test_cli_flow.py::TestTinyLlama1_1BChat::test_pp4 ( #4397 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-05-16 20:18:07 +08:00
Emma Qiao
27bdd0c82d
[TRTLLM-4886][infra]Try another timeout opt to exit test thread directly instead of gracefully ( #4341 )
...
* Try another timeout opt to kill test thread
Signed-off-by: qqiao <qqiao@nvidia.com>
* Return true when try to delete non-existing result file
Signed-off-by: qqiao <qqiao@nvidia.com>
* quick test for the result file
Signed-off-by: qqiao <qqiao@nvidia.com>
* Change back the global timeout setting
Signed-off-by: qqiao <qqiao@nvidia.com>
* Try to kill test in internal pytest
Signed-off-by: qqiao <qqiao@nvidia.com>
---------
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-05-16 17:56:40 +08:00
Daniel Cámpora
df19430629
chore: Mass Integration 0.19 ( #4255 )
...
* fix: Fix/fused moe 0.19 (#3799 )
* fix bug of stream init
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix bug
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
---------
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
* fix: Add pre-download of checkpoint before benchmark. (#3772 )
* Add pre-download of checkpoint before benchmark.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* Add missing remote code flag.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* Move from_pretrained to throughput benchmark.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* Move download and use snapshot_download.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* Removed trusted flag.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* Fix benchmark command in iteration log test.
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
---------
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
* [https://nvbugspro.nvidia.com/bug/5241495 ][fix] CUDA Graph padding with overlap scheduler (#3839 )
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fuse
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* TRTLLM-4875 feat: Add version switcher to doc (#3871 )
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
* waive a test (#3897 )
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* docs:fix https://nvbugs/5244616 by removing new invalid links. (#3939 )
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
* fix: remote mpi session abort (#3884 )
* fix remote mpi session
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* fix
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
---------
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* skip fp8 gemm for pre-hopper (#3931 )
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* [https://nvbugspro.nvidia.com/bug/5247148 ][fix] Attention DP with overlap scheduler (#3975 )
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* update multigpu list
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix namings
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* Doc: Fix H200 DeepSeek R1 perf doc (#4006 )
* fix doc
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
* update perf number
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
---------
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
* Fix the perf regression caused by insufficient cache warmup. (#4042 )
Force tuning up to 8192 sequence length for NVFP4 linear op. Also, make this runtime-selectable with UB enabled.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
* doc: Update 0.19.0 release notes (#3976 )
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
* Optimize the AutoTuner cache access code to reduce host code overhead. (#4060 )
The NVFP4 Linear op is very sensitive to the host overhead.
This PR introduces customizable `find_nearest_profile` and `get_cache_key_specifc`, which allow users to override the default method for generating the cache key.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
* Update switcher (#4098 )
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
* doc: update release notes (#4108 )
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
* docs:update 0.19 doc. (#4120 )
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
* docs:add torch flow supported model list. (#4129 )
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
* doc: Release V0.19 Perf Overview Update (#4166 )
Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com>
* Fix readme of autodeploy.
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
* Update tensorrt_llm/_torch/pyexecutor/llm_request.py
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
* Revert mgmn worker node.
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
* Change to disable_overlap_scheduler.
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
---------
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com>
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
Co-authored-by: bhsueh_NV <11360707+byshiue@users.noreply.github.com>
Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: nv-guomingz <37257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Co-authored-by: Zac Patel <22306219+zbpatel@users.noreply.github.com>
2025-05-16 10:53:25 +02:00
HuiGao-NV
d5578b37fc
Change the method to calculate kv memory size in tests ( #4332 )
...
* Change the method to calculate kv memory size in tests
* Set larger peak memory size to llama case
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-05-16 15:35:40 +08:00
xinhe-nv
500b43e90c
test: [CI] remove closed bugs ( #4345 )
...
update waive list
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-16 13:47:42 +08:00
Stanley Sun
11aa50d1ea
test: add kv cache aware test cases to qa test list ( #4257 )
...
add kv cache_aware test cases
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-05-16 12:47:01 +08:00
QI JUN
c4cd403af9
[CI] waive test_chunked_prefill test cases ( #4380 )
...
waive test_chunked_prefill
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-05-16 10:27:20 +08:00
Iman Tabrizian
4c7191af67
Move Triton backend to TRT-LLM main ( #3549 )
...
* Move TRT-LLM backend repo to TRT-LLM repo
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
* Address review comments
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
* debug ci
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
* Update triton backend
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
* Fixes after update
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
---------
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-16 07:15:23 +08:00
yuxianq
4f8afe4cc6
feat: [nvbugs/5261055][nvbugs/5170160] non-invasive pipeline parallelism ( #4034 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-05-16 04:16:53 +08:00
Venky
adb0839a33
test(perf): Add Phi-4-mini-instruct to perf tests ( #4267 )
...
* add phi-4-mini-instruct
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
* trim tests
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
---------
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-05-15 21:27:03 +08:00
yuxianq
0e87fcc228
refactor: use x is None instead of x == None. ( #4244 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-05-15 20:00:04 +08:00
Yanchao Lu
5ce1102a02
Revert "[test] add qa test mentioned in docs" ( #4355 )
...
Revert "[test] add qa test mentioned in docs (#4248 )"
This reverts commit b0ce1371ee .
2025-05-15 18:47:30 +08:00
Stanley Sun
9d3e05486b
test: add qa test list for rtx5090 and rtx_pro_6000 ( #4254 )
...
* add test list for rtx5090 and rtx_pro_6000
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
* add 2gpu llama70b test cases
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
* remove duplicate and invalid test cases
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
* add 2gpus test cases
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
---------
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
2025-05-15 17:57:31 +08:00
zhhuang-nv
d6b741ddfe
[fix] test_no_kv_cache_reuse for overlap_scheduler ( #4350 )
...
fix test_no_kv_cache_reuse for overlap_scheduler
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
2025-05-15 16:43:53 +08:00
xinhe-nv
14bfb5e0d6
test: FIX test_ptp_quickstart_advanced_deepseek_v3_2nodes_8gpus ( #4283 )
...
* update test_ptp_quickstart_advanced_deepseek_v3_2nodes_8gpus
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
* skip llava-v1.6-mistral-7b-hf-vision-trtllm on L40S
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
---------
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-15 15:57:44 +08:00
zhhuang-nv
97bc680cd8
feat: support kv cache reuse for MLA ( #3571 )
...
* support kv cache reuse for MLA
load compressed_kv and k_pe and do up-projection
use 192/128 head size MLA context kernel
support Blackwell and Hopper now
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* add CI test
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* fix: set k_pe head_num to 1 for kernel 2 and kernel 2V2
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
* resolve comments
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* use GPTJ style RoPE for MLA
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* fix rebase error and some docs
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* fix kv_lens
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* tiny fix
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* fix torch compile
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* fix: use normal device memory instead of pinned memory for unit test
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
* fix L0 tests
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* fix torch compile after rebase
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* resolve comments
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
* resolve comments again
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
---------
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
Signed-off-by: zhhuang-nv <145532724+zhhuang-nv@users.noreply.github.com>
Co-authored-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-05-15 15:22:21 +08:00
Kaiyu Xie
b4e5df0ee0
Breaking change: perf: Enable scheduling overlap by default ( #4174 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-05-15 14:27:36 +08:00
dominicshanshan
404fbe9b32
[ https://nvbugs/5277113 ][fix]genai-perf API change stress test ( #4300 )
...
* fix bug 5277113.
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
* fix bug 5277113 and 5278517.
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
---------
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-05-15 14:12:34 +08:00
Ivy Zhang
b0ce1371ee
[test] add qa test mentioned in docs ( #4248 )
...
* add nemotron-h and llama_70b cases
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* trial
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* add llm decoder quick_start case
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* update nemotron-h test case
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* add qwen3 quickstart test
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* add trtllm_decoder accuracy test
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
* remove quickstart test for llm_decoder
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
---------
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-15 13:37:11 +08:00
hlu1
3ea42e7519
[test] Reorganize TestDeepSeekR1::test_nvfp4_8gpus ( #4346 )
...
Reorganize TestDeepSeekR1::test_nvfp4_8gpus
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>
Co-authored-by: Hao Lu <14827759+hlu1@users.noreply.github.com@users.noreply.github.com>
2025-05-15 13:09:13 +08:00
Mike Iovine
f9adac3dea
[feat] Enable chunked context for flashinfer ( #4132 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-05-15 10:59:38 +08:00
Robin Kobus
d31fefde2c
[TRTLLM-5171] chore: Remove GptSession/V1 from TRT workflow ( #4092 )
...
* chore: Remove GptSession/V1 from TRT workflow
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove stateful decoders
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove GptSession buffers
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove GptSession utils
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove GptSession kernels
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove V1 GPT models from tests
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove gptSessionBenchmark from scripts and docs
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove gptSession IO classes
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove GptSession from test lists
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove GptSession from docs
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove useless encoder test
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove mActualBatchSize from DecoderState
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
* chore: Remove static batching from ExecutorTest
- Updated `validateContextLogits` and `validateGenerationLogits` functions to remove the `batchingType` parameter.
- Adjusted related test functions to reflect the changes in parameter lists.
- Cleaned up the instantiation of test cases to eliminate unnecessary batchingType references.
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
---------
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-05-14 23:10:04 +02:00
Faraz
42de79d49e
test: Added tests for Llama3.1-70B-BF16 on SM120 ( #4198 )
...
* Added tests for Llama3.1-70B-BF16 on SM120
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
* solve conflicts add more tests
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
---------
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2025-05-14 11:57:49 -04:00
Yanchao Lu
504f4bf779
[Infra] - Update the upstream PyTorch dependency to 2.7.0 ( #4235 )
...
[Infra][TRTLLM-4941] - Update the upstream PyTorch dependency to 2.7.0
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-14 22:28:13 +08:00
Kaiyu Xie
6c45586c51
chore: Remove deprecated Python runtime benchmark ( #4171 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-05-14 18:41:05 +08:00
HuiGao-NV
f4059c6e2e
Add test case for kv memory estimation ( #4158 )
...
* Add test case for kv memory estimation
* Dump running log into file and parse kv cache memory size from file
* Set bigger peak memory size for mixed percision case and test_ptp_quickstart_advanced_eagle3 case
* Revert change to usage of fraction
* use context manager to guard temp files
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-05-14 18:39:25 +08:00
xinhe-nv
f2bfe2f84f
test: [CI] remove closed bugs ( #4207 )
...
update waive list
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-14 17:59:05 +08:00
DylanChen-NV
206f82115d
[bug/5247505] fix: CP accuracy on Blackwell ( #4188 )
...
* fix xqa params for cp
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
* add test
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
* add test
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
* try adding B200 multi gpu test
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
* add accuracy tests for cp
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
---------
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
2025-05-14 17:40:50 +08:00
Anurag Mukkara
b15f57763d
tests: PyTorch multimodal using keyword match ( #4215 )
...
* keyword accuracy check for pytorch multimodal
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
* Change keywords for some prompts
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
* Delete full text answers
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
* Cleanup debug code
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
---------
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
2025-05-14 17:18:43 +08:00
Yiqing Yan
a66a02a75a
[Infra] Waive L0 test ( #4295 )
...
Waive L0 test
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-14 16:38:33 +08:00
Zongfei Jing
bb17649517
test: Add UT for moe trtllmgen ( #4258 )
...
* Add ut for moe trtllmgen
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
* Update tests/unittest/_torch/modeling/test_modeling_deepseek.py
Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
---------
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>
2025-05-14 15:22:58 +08:00
bhsueh_NV
1a9298bc66
CI: add fp8/fp4 ci on Qwen3-30B-A3B ( #4266 )
...
add fp8/fp4 ci on Qwen3-30B-A3B
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-05-14 14:38:04 +08:00
brb-nv
8280c3d4f2
feat: Support Gemma3-1b-it in Pytorch workflow ( #3999 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-05-14 14:02:44 +08:00
Yi Zhang
86ae506b9d
[fix] Enable pp tests ( #3978 )
...
Fix misrebase issue
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2025-05-14 10:51:20 +08:00
brb-nv
1ef117688c
test: Validate FP8 and LoRA for Gemma3 ( #3670 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-05-13 17:28:02 -07:00
Iman Tabrizian
f408de2d99
Waive disagg kv cache load balancer test ( #4276 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-14 06:03:24 +08:00
brb-nv
cd5b3d21a0
feat: Support Mistral Small 3.1 24B VLM in TRT workflow ( #4183 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-05-14 03:47:22 +08:00
Yiqing Yan
290649b6aa
[Infra] Waive L0 test ( #4269 )
...
Waive L0 test
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-13 23:06:13 +08:00
Yiqing Yan
bfa16a63d4
[Infra] Waive L0 test ( #4268 )
...
Waive L0 test
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-13 22:43:17 +08:00
dominicshanshan
44d6adfb68
Waive stress test. ( #4262 )
...
* Waive stress test.
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: dominicshanshan <30051912+dominicshanshan@users.noreply.github.com>
---------
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: dominicshanshan <30051912+dominicshanshan@users.noreply.github.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-13 21:01:57 +08:00
Enwei Zhu
8f68d56cc1
[ https://nvbugs/5220763 ] [test] Unwaive Mixtral FP8 TP2 test ( #4252 )
...
unwaive
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-13 15:55:33 +08:00
Yiqing Yan
fda8b0277a
[Infra][TRTLLM-4374] Upgrade TRT 10.10.0 GA, CUDA 12.9 GA and DLFW 25.04 ( #4049 )
...
* [TRTLLM-4374] Upgrade TRT 10.10.0 GA, CUDA 12.9 GA and DLFW 25.04
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
* fix review
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
* update images
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
* Update jenkins/L0_Test.groovy
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
* update image name
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
---------
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-13 14:59:12 +08:00
ruodil
d555fe2530
test: fix for perf test script issue ( #4230 )
...
fix for perf test script issue
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-13 10:29:20 +08:00
xinhe-nv
0cebc16139
test: [CI] Add failed cases into waives.txt ( #4205 )
...
waive tests
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-13 10:22:42 +08:00
xinhe-nv
7ebae4dcaa
test: [CI] Add failed cases into waives.txt ( #4203 )
...
* update waive list
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
* update waives
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
---------
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-13 10:08:02 +08:00
Enwei Zhu
035d915fea
[TRTLLM-5081] [test] Align parametrize_with_ids to the pytest behavior ( #4090 )
...
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* normalize mtp_nextn
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* update test_durations
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-13 07:41:51 +08:00
wili
eba3623a54
Feat: Variable-Beam-Width-Search (VBWS) part4 ( #3979 )
...
* feat/vbws-part4-v1.8: rebase
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* feat/vbws-part4-v1.9: fix incorrect output when using short output length
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* v1.9.1: remove useless variables
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* v1.9.2:fix incorrect output when using short output length
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* v1.9.3: rebase
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* v1.9.4: rebase
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* v1.9.5: remove API change
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
---------
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
2025-05-12 22:32:29 +02:00
Enwei Zhu
c31ca1688c
[ https://nvbugs/5214229 ] [fix] Unwaive lm_head quantization case ( #4222 )
...
unwaive
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-12 20:23:06 +08:00
Zheng Duan
c9e2a963e0
feat: add kv cache aware router ( #3831 )
...
* kv cache aware router
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
* add tests
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
* router config
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
* eviction test
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
add test
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
* eviction detect in worker test
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
* move worker tests to single gpu
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
* reduce memory fraction
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
* fix partial block
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
---------
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
2025-05-12 07:23:57 -04:00
Yixin Dong
c90ebadd84
feat: Support the Structural Tag in guided decoding ( #4066 )
...
* finish
Signed-off-by: Ubospica <ubospica@gmail.com>
* update
Signed-off-by: Ubospica <ubospica@gmail.com>
* update
Signed-off-by: Ubospica <ubospica@gmail.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* exc overlap scheduler
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* add test
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix api ref
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Ubospica <ubospica@gmail.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-12 17:24:50 +08:00
Yechan Kim
3e9bda3a09
[feat] Support HyperCLOVAX-SEED-Text language part ( #3902 )
...
* feat: support HyperCLOVAX-SEED-Text language part
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* add Pytorch flow and remove test file
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* revert summarize
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* fix summarize
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
* remove from pytorch example
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
---------
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-05-12 16:05:14 +08:00
Ivy Zhang
ee92edf2b4
[ https://nvbugspro.nvidia.com/bug/5270564 ][test] skip per-hopper for llama4 ( #4211 )
...
skip per-hopper for llama4
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-12 15:27:15 +08:00
ruodil
9c03a7ab74
test: add llama_3.2_1B model and fix for test lora script issue ( #4139 )
...
* test: add llama_v3.1_8b_fp8 model, llama_v3.1_405b model and llama_nemotron_49b model in perf test, and modify original llama models dtype from float16 to bfloat16 according to README.md
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
* add llama_3.2_1B model and fix for lora script issue
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
---------
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
2025-05-12 14:51:59 +08:00
xinhe-nv
849d9c343c
tests: https://nvbugs/5219534 remove failed tests from test list ( #4113 )
...
remove unsupported tests
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-12 14:13:40 +08:00
Yiqing Yan
3c54e84e47
[Infra] Waive L0 test ( #4212 )
...
Waive L0 test
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-12 11:37:49 +08:00
QI JUN
f021afa241
[CI] waive two multi-gpu test cases ( #4206 )
...
waive two multi-gpu test cases
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-05-12 08:04:48 +08:00
Enwei Zhu
7db368c72c
test: Remove CNN Dailymail tasks in favor of GSM8K ( #4187 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-05-10 09:02:07 +08:00
Dom Brown
2d0f93a054
Refactor: Restructure C++ tests for better modularisation of non-shared code ( #4027 )
...
* Refactor: Restructure C++ tests for better modularisation of non-shared code
Start cleanup of pytest code for C++ tests
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Clean up names and remove references to test_cpp.py
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
WIP
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Move multi-GPU code
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Update doc and try un-waiving
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
* Update multi GPU file check
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
* Address minor multi-GPU setup bug
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
---------
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-05-09 19:16:51 +01:00
Mike Iovine
4b8ba7ad61
[fix][nvbug/5244009] Fix llama 4 test lists/scout accuracy issue ( #4069 )
...
[fix] Fix llama 4 test lists
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-05-09 22:45:14 +08:00
Tracin
446f62bbab
chore: Deprecate evaltool ( #4173 )
...
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-05-09 20:31:53 +08:00
ruodil
bf5b2a2e0a
test: amend regex match for perf throughput ( #4186 )
...
amend regex match for perf throughput
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
2025-05-09 17:33:25 +08:00
xinhe-nv
9082411a50
test: [CI] Add failed cases into waives.txt ( #4165 )
...
wavie oom tests
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-09 16:56:30 +08:00
ruodil
5ce5b81281
test: amend default pytorch extra-llm-api-config.yml in perf test ( #4176 )
...
* amend default pytorch extra-llm-api-config.yml
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
* add print info to separate cases in output log
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
---------
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
2025-05-09 16:46:48 +08:00
xinhe-nv
1d26a3fd7c
test: skip tests on b200 ( #3913 )
...
* skip tests on b200
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
* skip phi-3-128k
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
---------
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-09 14:51:55 +08:00