Commit Graph

1363 Commits

Author SHA1 Message Date
Xiwen Yu
38ef850552
Merge remote-tracking branch 'gitlab/main' into user/xiweny/merge_0901
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-09-01 11:46:44 +08:00
Bo Deng
3805f615da
[https://nvbugs/5453949][infra] unwaive test_llama_eagle3
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-31 18:29:39 -07:00
Jiagan Cheng
8d5a7ea5b3
[https://nvbugs/5443053][fix] Disable finalize fusion when Lora is used
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
2025-08-31 18:28:09 -07:00
Tian Zheng
e257cb3533
[None][feat] Support NVFP4 KV Cache (#6244)
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2025-09-01 09:24:52 +08:00
xinhe-nv
5f939b9121
[None][chore] Add failed cases into waives.txt (#7342)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-30 00:49:14 -04:00
Emma Qiao
15ec2b855d
[None][infra] Waive failed tests on main branch 08/29 (#7370)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-29 10:28:20 -04:00
Pengbo Wang @ NVIDIA
62459d533d
[None][chore] Update pre-merge test to add DeepSeek/LLaMA and gpt-oss (#7192)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
Signed-off-by: Pengbo Wang @ NVIDIA <221450789+pengbowang-nv@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-08-29 17:03:46 +08:00
fredricz-20070104
091b67ad2f
[TRTLLM-7280][test] Add beam search CudaGraph + Overlap Scheduler tests (#7326)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-08-29 02:16:22 -04:00
Yiqing Yan
3c06303542
[TRTLLM-7755][infra] Add DGX_B300 and GB300 tests in CI
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-08-28 22:45:00 -07:00
Chang Liu
31b0f0fb0c
[https://nvbugs/5445466][fix] Eliminate race when loading HF dynamic modules (#7268)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-08-29 12:36:30 +08:00
Richard Huo
ce580ce4f5
[None][feat] KV Cache Connector API (#7228)
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Signed-off-by: richardhuo-nv <rihuo@nvidia.com>
Co-authored-by: jthomson04 <jwillthomson19@gmail.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
2025-08-28 23:09:27 -04:00
aalanwyr
085dc19bfa
[TRTLLM-6646][test] NIM migration to TRT-LLM LLMAPI : Add QWQ-32b torch test (#7284)
Signed-off-by: Yaran Wu <28771492+aalanwyr@users.noreply.github.com>
2025-08-28 23:09:11 -04:00
Yuan Tong
ccb800f909
[TRTLLM-7457][ci] Update unittest parallel config (#7297)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-08-29 09:28:04 +08:00
Emma Qiao
1e644fa28a
[None][infra] Waive failed tests on main branch 08/26 (#7346)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-29 00:24:08 +08:00
Neta Zmora
08f935681d
[https://nvbugs/5474453][fix] fix path to tested model (#7272)
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-08-28 08:01:48 -04:00
Zongfei Jing
53163bf1df
[TRTLLM-6876][feat] Add low precision all2all for mnnvl (#7155)
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-08-28 18:26:16 +08:00
QI JUN
ae89163368
[None][ci] skip TestGPTOSS (#7333)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-28 05:01:49 -04:00
William Zhang
4541655e5f
[https://nvbugs/5430124][ci] Unwaive Mistral 3.1 Small tests (#7274)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-28 00:03:32 -04:00
QI JUN
39c9ffda5a
[None][ci] fix test list name (#7321)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-27 22:33:22 -04:00
Pengyun Lin
c1e7fb9042
[TRTLLM-7207][feat] Chat completions API for gpt-oss (#7261)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-08-28 10:22:06 +08:00
bhsueh_NV
9d345b31c0
[https://nvbugs/5453727][fix] unwaive qwen3 CI tests (#7293)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-08-27 22:58:59 +08:00
Eran Geva
462169bfc9
[https://nvbugs/5458798][fix] AD perf test outliers handling, tightened threshold, re-enabled in CI, fixed mem threshold (#7189)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-08-27 07:57:46 -07:00
QI JUN
d09add5ede
[None][ci] parallelize unit tests of auto deploy in B200 (#7291)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-27 22:32:11 +08:00
Emma Qiao
8dc62ffac4
[None][infra] Waive failed tests on main (#7300)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-27 09:53:33 -04:00
xinhe-nv
f082e4857c
[TRTLLM-7250][fix] waive failed cases (#7292)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-08-27 18:04:46 +08:00
nvamyt
dbd4f21687
[None][fix] Update maxnt of llama_v3.2_1b bench (#7279)
Signed-off-by: nvamyt <amyt@nvidia.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-27 16:56:28 +08:00
bhsueh_NV
f167b1fd99
[https://nvbugs/5453727][fix] Fix bug of how GPT-OSS setup the parameters in CI (#7151)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-08-27 15:26:10 +08:00
QI JUN
e08c7cf17b
[None][ci] remove test_llm_api_autodeploy from B200 test db (#7282)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-27 03:12:30 -04:00
dongxuy04
abdb2735be
[None][fix] Fix possible hang issue in WideEP and move some tests to pre-merge (#7262)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-08-27 01:39:24 -04:00
Yuan Tong
6c7813e821
[TRTLLM-7457][ci] Update & cleanup unittest parallel config (#7254)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-08-27 00:45:58 -04:00
Zhenhuan Chen
d0d8903a7f
[TRTLLM-6960][fix] replace flaky scaled_mm test with more stable config (#7089)
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
2025-08-26 20:58:33 -07:00
Shunkangz
ff4047414b
[None][opt] Balance the request based on number of tokens in AttentionDP (#7183)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-08-27 11:16:12 +08:00
Zhou Yuxin
ccb6aadea8
[https://nvbugs/5412456][fix] Remove from waives.txt (#7248)
Signed-off-by: Zhou Yuxin <yuxinz@nvidia.com>
2025-08-27 10:05:53 +08:00
Jin Li
028235404b
[TRTLLM-6633][feat] Padding for piecewise cudagraph (#6750)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-26 18:31:33 -04:00
Fridah-nv
0f947c64cb
[None][doc] Update autodeploy README.md, deprecate lm_eval in examples folder (#7233)
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-08-26 10:47:57 -07:00
Void
040f4c70d3
[None][perf] Accelerate global scale calculations for deepEP fp4 combine (#7126)
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2025-08-27 00:13:13 +08:00
QI JUN
baef70e67e
[None][ci] move qwen3 tests from b200 to gb200 (#7257)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-26 11:50:53 -04:00
xinhe-nv
80043affb5
[None][chore] Add failed cases into waives.txt (#7251)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-26 17:13:44 +08:00
amitz-nv
23ed0c892d
[https://nvbugs/5477332][fix] Relax atol in test_mamba2_chunk_scan_combined_prefill_chunking (#7215)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-08-26 10:48:58 +03:00
Zheng Duan
cf50ba2980
[TRTLLM-6549][feat] add perf metrics endpoint to openai server and openai disagg server (#6985)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-08-26 15:34:44 +08:00
Zheng Duan
1a929a1490
[https://nvbugs/5457504][fix] fix kv cache event test in disaggregated worker tests (#7028)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-08-26 14:25:10 +08:00
nvamyt
d8bd8843fc
[None][test] Update qwen3 timeout to 60 minutes (#7200)
Signed-off-by: nvamyt <amyt@nvidia.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-26 14:18:42 +08:00
qixiang-99
b165f8bc97
fix/improve kvcache allocation in PyTorch runtime (#5933)
Signed-off-by: qixiang-99 <203170375+qixiang-99@users.noreply.github.com>
2025-08-26 12:40:22 +08:00
William Zhang
92576488d3
[None][feat] Skip prefetching consolidated safetensors when appropriate (#7013)
* Why?

Some models (e.g. anything produced by Mistral) can have both sharded
safetensors and a consolidated safetensor in the same checkpoint
directory. In such cases, prefetching both to memory is a waste of time
and memory.

* What?

This commit skips over consolidated safetensors when they are not the
only safetensor file present in the checkpoint directory.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-25 23:56:21 -04:00
Leslie Fang
20922b7d1f
[None][chore] Create PyExecutor from TorchLlmArgs Part 1 (#7105)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-26 10:42:01 +08:00
Xiwen Yu
ab7febd4d8
Merge commit '31979aefacbf80d2742c98ef30385db162788c84' into feat/b300_cu13
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-08-26 10:31:35 +08:00
ruodil
b845eb7a3a
[None][test] add kv cache size in bench metric and fix failed cases (#7160)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-26 10:10:02 +08:00
Grzegorz Kwasniewski
2101d46d68
[TRTLLM-6342][feat] TP Sharding read from the model config (#6972)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-25 15:41:27 -07:00
chenfeiz0326
6a44e5b9d1
[https://nvbugs/5440241][fix] Fix 70B GSM8K Accuracy drop (#6967)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-08-25 22:09:30 +08:00
Emma Qiao
200db3b809
[None][infra] Waive failed tests on main branch (#7201)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-25 09:04:37 -04:00