1238 Commits

Author SHA1 Message Date
HuiGao-NV
5206f1ce47
[https://nvbugs/5474169][fix] seq_len mismatch between kv cache manager and graph attn metadata (#7606)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-09-09 08:32:31 +08:00
Yanchao Lu
275a09d0a2
Revert "[https://nvbugs/5461761][fix] Remove the waiver (#7427)"
This reverts commit 4612906b67.
2025-09-06 18:11:34 +08:00
Ziyi Xiong
4612906b67
[https://nvbugs/5461761][fix] Remove the waiver (#7427)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-09-04 11:34:25 +08:00
Yan Chunwei
ad80819ef0
[https://nvbugs/5351244][fix] test_mpi_session (#7501)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-09-04 10:10:43 +08:00
dongxuy04
9eecdf2ee9
[TRTLLM-7008][fix] cherrypick fix to 1.0 Add automatic shared memory delete if already exist (#7433)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-09-02 11:23:53 +08:00
Emma Qiao
991b83af81
[None][infra] Waive failed tests on release branch 0901 (#7448)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-01 23:24:51 +08:00
Yuxian Qiu
559762f185
[https://nvbugs/5448754][fix] Download HF model for all nodes. (#6824)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-09-01 16:00:43 +08:00
Lizhi Zhou
7e4dad4dbb
[https://nvbugs/5448767][fix] disable kv cache reuse for disagg pp>1 tests (#7354)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-08-29 09:33:16 +02:00
amitz-nv
66f0657716
[TRTLLM-7346][fix] Improve performance of PyTorchModelEngine._get_lora_params_from_requests (#7203)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-08-28 16:06:32 +08:00
Wanli Jiang
4ae40cbacf
[https://nvbugs/5480415][fix] Fix phi4mm multi-gpu test (#7275)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
2025-08-27 22:24:19 -04:00
Iman Tabrizian
91c4af3f01
[https://nvbugs/5434320][bug] Fix disagg pp bug (#7099)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-08-27 13:38:01 -04:00
brb-nv
4b1898e82e
[https://nvbugs/5480550][fix] Increase timeout for Gemma3 27B test (#7271)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-27 08:45:05 -07:00
brb-nv
201fd257cc
[https://nvbugs/5478151][fix] Add missing spec for Llama-3.3 70B (#7267)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-27 09:56:58 +08:00
William Zhang
b6eba85dfc
[https://nvbugs/5430125][ci] Unwaive test case for mistral 3.1 small (#7265)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-26 17:32:02 -04:00
William Zhang
34c1e9c341
[None][feat] Skip prefetching consolidated safetensors when appropriate (#7225)
* Why?

Some models (e.g. anything produced by Mistral) can have both sharded
safetensors and a consolidated safetensor in the same checkpoint
directory. In such cases, prefetching both into memory is a waste of time
and memory.

* What?

This commit skips over consolidated safetensors when they are not the
only safetensor file present in the checkpoint directory.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-26 09:40:17 -07:00
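The skip rule described in #7225 can be sketched as a small filter over the files in a checkpoint directory. This is a minimal illustration, not the code from that commit: the helper names are hypothetical, and it assumes that consolidated files can be recognized by a `consolidated` filename prefix (as in Mistral's `consolidated.safetensors`).

```python
from typing import List


def is_consolidated(filename: str) -> bool:
    # Assumption (hypothetical rule): consolidated checkpoints are named
    # with a "consolidated" prefix, e.g. "consolidated.safetensors".
    return filename.startswith("consolidated")


def select_safetensors_to_prefetch(filenames: List[str]) -> List[str]:
    """Hypothetical helper: return the .safetensors files worth prefetching."""
    safetensors = sorted(f for f in filenames if f.endswith(".safetensors"))
    sharded = [f for f in safetensors if not is_consolidated(f)]
    # Skip consolidated files only when other safetensors files exist;
    # if consolidated files are all there is, they still have to be loaded.
    if sharded:
        return sharded
    return safetensors


if __name__ == "__main__":
    mixed = [
        "consolidated.safetensors",
        "model-00001-of-00002.safetensors",
        "model-00002-of-00002.safetensors",
        "params.json",
    ]
    print(select_safetensors_to_prefetch(mixed))
    print(select_safetensors_to_prefetch(["consolidated.safetensors", "params.json"]))
```

Filtering on names before any I/O keeps the decision cheap: only the files that survive the filter are read into memory.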
Emma Qiao
7409d56053
[None][infra] Waive failed cases for release/1.0 (#7258)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-26 19:50:28 +08:00
Linda
b394c51c99
[https://nvbugs/5409416][fix] test_openai_multi_chat_example (#7174)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-08-26 11:33:58 +02:00
Ivy Zhang
1f7a1645d6
[None][fix] update skip case (#7193)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-26 12:31:48 +08:00
ruodil
ebbbacf81c
[None][test] add kv cache size in bench metric and fix failed cases (#7211)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-08-26 10:09:22 +08:00
pcastonguay
9d2b181e7d
[https://nvbugs/5470840][fix] Disaggregated unit test MPI Init handling (#7139)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-08-25 19:48:38 -04:00
Shi Xiaowei
d010b2043a
[TRTLLM-7030][fix] BREAKING CHANGE: Mismatch between docs and actual commands (#7191)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-08-25 20:21:43 +08:00
Wanli Jiang
036c3dd0ea
[TRTLLM-6825][fix] Update lora for phi4-mm (#7149)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-08-23 20:57:00 +08:00
Dom Brown
3f2eb4d2e8
[https://nvbugs/5461712] [fix] Disable deep_gemm for Qwen3 due to accuracy issues (#7170)
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-08-23 05:26:12 -04:00
Michal Guzek
7ea53ff516
[https://nvbugs/5433545][fix] TestPhi4MiniInstruct::test_auto_dtype - Use max_seq_len=4096 to fallback to the short RoPE factor (#6895)
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
2025-08-22 10:28:09 -07:00
Shi Xiaowei
3ee8523829
[https://nvbugs/5450074][fix] Reduce the device memory requirements for testing (#6990)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-08-22 17:33:30 +08:00
HuiGao-NV
253af9f9af
[https://nvbugs/5410391][bug] Support to share device buffers in attention meta (#6557)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-08-22 13:19:27 +08:00
Pamela Peng
1e5a6be55d
[https://nvbugs/5448442][fix] Skip trtllm moe backend for sm120 (#7010)
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
2025-08-21 13:34:07 -04:00
Emma Qiao
441edf1eeb
[None][infra] Skip failed tests for release branch (#7130)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-22 00:09:51 +08:00
Venky
9eac744d72
[https://nvbugs/5464088] [fix] dequantize fp8 activation input to lora forward; update perf test config (#7014)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-08-21 08:28:54 -04:00
Yan Chunwei
e77ec061db
[https://nvbugs/5451296][fix] zmq nonblock bug with retry (#7019)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-21 08:34:46 +08:00
chenfeiz0326
5acf213a15
[https://nvbugs/5440241][fix] Fix 70B GSM8K Accuracy drop (#7075)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-08-20 18:11:00 -04:00
yifeizhang-c
5959d72d74
[https://nvbugs/5394392][fix] Enlarge scheduler capacity under disagg bs == 1 (#6975)
Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
2025-08-20 16:32:27 +08:00
Jin Li
69846c6586
[https://nvbugs/5427801][fix] Torch compile support for Llama4 and Ea… (#6978)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-20 15:06:56 +08:00
Bo Deng
df00c81aea
[https://nvbugs/5448437][fix] fix some nixl tests (#6940)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-20 14:19:48 +08:00
Yan Chunwei
fae43e7b46
[None][doc] add status labels to LLM class's api reference (#6899)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-19 21:50:04 -04:00
Emma Qiao
c4535e6c3a
[None][infra] Waive failed tests for release branch (#7036)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-19 20:42:47 +08:00
Iman Tabrizian
520117ece0
[https://nvbugs/5451296][bug] Fix a thread leak in test_llm_args.py (#7017)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-08-19 07:19:53 -04:00
brb-nv
da91256503
[None][chore] Waive E2E GB200 tests for Gemma3 27B (#6916)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-08-19 05:19:34 -04:00
Yechan Kim
d6c2a6a81f
[https://nvbugs/5448579][fix] EXAONE-4.0 accuracy test bugfix (#6888)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-19 09:29:32 +02:00
Nave Assaf
d4dd5b4f4d
[https://nvbugs/5451028][fix] Constrain NemotronSuper test parameters… (#6987)
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
2025-08-19 09:19:50 +02:00
William Zhang
790a105563
[https://nvbugs/5462007][ci] Unwaive Mistral Small 3.1 FP8 test (#7008)
The error was fixed by #6909.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-18 19:50:03 -04:00
Yiqing Yan
28c30e1bf8
[None][chore] Remove duplicate test waives (#6999)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-18 22:04:43 +08:00
Emma Qiao
2992e9cd58
[None][infra] Waive failed tests for release branch 0818 (#6993)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-18 20:31:50 +08:00
peaceh-nv
28526fe2b1
[https://nvbugs/5449218][fix] Fix KvCacheConfig error in test_perf (#6937)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-08-18 15:58:53 +08:00
Ivy Zhang
055fdd9e31
[None][fix] update skip config (#6891)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-18 13:50:46 +08:00
Guoming Zhang
96bda14fbd
[https://nvbugs/5375646][fix] update waives.txt for nvbug 5375646 (#6847)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-08-17 23:22:01 -04:00
William Zhang
c16aff5e3f
[https://nvbugs/5448525][fix] Mistral Small 3.1 accuracy tests (#6909)
This commit lowers the GPU memory allocated for KV cache in accuracy
tests, and adjusts a threshold for Mistral Small 3.1 24B for FP8.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-18 11:17:37 +08:00
Yan Chunwei
6d65b63b8d
[None][ci] unwaive test_ptp_star_attention_example (#6943)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-15 05:33:25 -04:00
xinhe-nv
c03ea1ba2d
[TRTLLM-7048][feat] add benchmark TRT flow test for MIG (#6884)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-08-15 14:01:05 +08:00
Yan Chunwei
54ffc6a250
[None][doc] add legacy section for tensorrt engine (#6724)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-08-15 11:08:38 +08:00