Commit Graph

2498 Commits

Author · SHA1 · Message · Date
Fanrong Li
e12868bc00
[None][fix] Remove and fuse some element-wise ops in the ds-r1-fp8 model (#7238)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-08-27 10:35:38 +08:00
Zhou Yuxin
ccb6aadea8
[https://nvbugs/5412456][fix] Remove from waives.txt (#7248)
Signed-off-by: Zhou Yuxin <yuxinz@nvidia.com>
2025-08-27 10:05:53 +08:00
Jin Li
028235404b
[TRTLLM-6633][feat] Padding for piecewise cudagraph (#6750)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-08-26 18:31:33 -04:00
Iman Tabrizian
87d1d3ab06
[None][update] Update disagg code owners (#7266)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-08-26 14:36:29 -04:00
Fridah-nv
0f947c64cb
[None][doc] Update autodeploy README.md, deprecate lm_eval in examples folder (#7233)
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-08-26 10:47:57 -07:00
Frank
78ecfbb4a4
[None][fix] Fix data type of KV Cache percentage in bench. (#7230)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-08-26 12:28:09 -04:00
Void
040f4c70d3
[None][perf] Accelerate global scale calculations for deepEP fp4 combine (#7126)
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2025-08-27 00:13:13 +08:00
QI JUN
baef70e67e
[None][ci] move qwen3 tests from b200 to gb200 (#7257)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-26 11:50:53 -04:00
Maurits de Groot
2d0c9b383f
[None][fix] Updated blog9_Deploying_GPT_OSS_on_TRTLLM (#7260)
Signed-off-by: Maurits de Groot <63357890+Maurits-de-Groot@users.noreply.github.com>
2025-08-26 11:26:19 -04:00
xinhe-nv
80043affb5
[None][chore] Add failed cases into waives.txt (#7251)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-26 17:13:44 +08:00
Emma Qiao
a142c0c4de
[None][infra] Add retry 3 times if ssh cluster failed (#6859)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-26 05:11:50 -04:00
Zhou Yuxin
f01101f687
[None][feat] Hopper Fp8 context mla (#7116)
Signed-off-by: Yuxin <yuxinz@nvidia.com>
2025-08-26 17:10:20 +08:00
amitz-nv
23ed0c892d
[https://nvbugs/5477332][fix] Relax atol in test_mamba2_chunk_scan_combined_prefill_chunking (#7215)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-08-26 10:48:58 +03:00
Guoming Zhang
bf377d0b8e
[None][doc] Display tech blog for nvidia.github.io domain. (#7241)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-08-26 15:36:28 +08:00
Zheng Duan
cf50ba2980
[TRTLLM-6549][feat] add perf metrics endpoint to openai server and openai disagg server (#6985)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-08-26 15:34:44 +08:00
Zheng Duan
1a929a1490
[https://nvbugs/5457504][fix] fix kv cache event test in disaggregated worker tests (#7028)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-08-26 14:25:10 +08:00
nvamyt
d8bd8843fc
[None][test] Update qwen3 timeout to 60 minutes (#7200)
Signed-off-by: nvamyt <amyt@nvidia.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-26 14:18:42 +08:00
yuanjingx87
bbc1478627
[None][chore] Update CI allowlist 2025-08-25 (#7229)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-08-25 22:53:48 -07:00
qixiang-99
b165f8bc97
fix/improve kvcache allocation in PyTorch runtime (#5933)
Signed-off-by: qixiang-99 <203170375+qixiang-99@users.noreply.github.com>
2025-08-26 12:40:22 +08:00
William Zhang
92576488d3
[None][feat] Skip prefetching consolidated safetensors when appropriate (#7013)
* Why?

Some models (e.g., anything produced by Mistral) can have both sharded
safetensors and a consolidated safetensors file in the same checkpoint
directory. In such cases, prefetching both wastes both time and memory.

* What?

This commit skips consolidated safetensors when they are not the only
safetensors file present in the checkpoint directory (a minimal sketch
of this check follows this entry).

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-08-25 23:56:21 -04:00
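The selection logic described in the commit body above can be illustrated with a short, self-contained sketch. The function name `filter_prefetch_files` and the filename-based "consolidated" check are illustrative assumptions, not the actual TensorRT-LLM implementation:

```python
from pathlib import Path
from typing import List


def filter_prefetch_files(checkpoint_dir: str) -> List[Path]:
    """Return the safetensors files worth prefetching.

    If a checkpoint contains both sharded safetensors and a consolidated
    one, the consolidated file is skipped; if the consolidated file is
    the only safetensors file present, it is kept.
    """
    files = sorted(Path(checkpoint_dir).glob("*.safetensors"))
    consolidated = [f for f in files if "consolidated" in f.name]
    sharded = [f for f in files if "consolidated" not in f.name]
    # Skip consolidated files only when other safetensors shards exist.
    return sharded if sharded else consolidated
```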
Zheng Duan
4f84a45899
[https://nvbugs/5452463][doc] update disagg doc about UCX_MAX_RNDV_RAILS (#7205)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-08-25 22:42:42 -04:00
Leslie Fang
20922b7d1f
[None][chore] Create PyExecutor from TorchLlmArgs Part 1 (#7105)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-26 10:42:01 +08:00
ruodil
b845eb7a3a
[None][test] add kv cache size in bench metric and fix failed cases (#7160)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-26 10:10:02 +08:00
Leslie Fang
9df15b2104
[None][doc] update feature_combination_matrix doc (#6691)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-26 08:25:31 +08:00
Grzegorz Kwasniewski
2101d46d68
[TRTLLM-6342][feat] TP Sharding read from the model config (#6972)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-25 15:41:27 -07:00
Lucas Liebenwein
97d550b4ba
[None] [AutoDeploy] canonicalize_graph before shape prop for consistent state_dict (#7223)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-08-25 16:59:57 -04:00
Bo Li
bf1b958f1a
[TRTLLM-7319][perf] Fuse slicing into MoE. (#6728)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Sergey Klevtsov <sklevtsov@nvidia.com>
Co-authored-by: Sergey Klevtsov <sklevtsov@nvidia.com>
2025-08-25 16:52:30 -04:00
Daniel Cámpora
e8e7e52892
[None][chore] Refactored the handle logits pp communication (#7154)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-08-25 16:14:08 -04:00
Frank
788fc62d23
[None][fix] Update to pull LLM from a central location. (#6458)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-08-25 13:07:29 -07:00
chenfeiz0326
6a44e5b9d1
[https://nvbugs/5440241][fix] Fix 70B GSM8K Accuracy drop (#6967)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-08-25 22:09:30 +08:00
Emma Qiao
200db3b809
[None][infra] Waive failed tests on main branch (#7201)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-08-25 09:04:37 -04:00
QI JUN
bea5e07fb7
[None][refactor] refactor the CUDA graph runner to manage all CUDA graphs (#6846)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-25 20:52:05 +08:00
shaharmor98
b32e00e9fd
[None][chore] remove CLI support for mamba cache dtype setting (#7119)
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-08-25 08:08:51 -04:00
amitz-nv
a1e03af0f4
[TRTLLM-7346][fix] Improve performance of PyTorchModelEngine._get_lora_params_from_requests (#7033)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-08-25 10:37:40 +03:00
Enwei Zhu
be6d92f09f
[None][fix] Fix MoE load balancer config loading (#7150)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-08-25 01:42:54 -04:00
Ivy Zhang
f61b74f796
[None][test] add l20 specific qa test list (#7067)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-08-25 12:44:08 +08:00
QI JUN
630e67b845
[None][ci] waive test_mamba2_chunk_scan_combined_prefill_chunking[seqlens1-8] (#7194)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-08-24 23:52:59 -04:00
Yukun He
9c5b464fe0
[None][feat] Apply AutoTuner to fp8_block_scale_deep_gemm to trigger JIT ahead of time. (#7113)
Because deep_gemm.fp8_gemm_nt triggers many JIT compilation processes during the inference phase, the relevant GEMM shapes need to be swept ahead of time. The AutoTuner framework is applied to achieve this, and it retains the potential capability to tune the swap_ab flag later.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-08-25 10:48:31 +08:00
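The warm-up idea behind this commit, sweeping candidate GEMM shapes once so that per-shape JIT compilation happens before serving rather than on the first real request, can be sketched generically. The helper name `warm_up_fp8_gemm`, the `run_gemm` callable, and the example shape lists below are hypothetical placeholders, not the actual AutoTuner or deep_gemm API:

```python
import itertools


def warm_up_fp8_gemm(run_gemm, m_values, n_values, k_values):
    """Sweep candidate GEMM shapes once so any JIT compilation happens
    ahead of time instead of during inference.

    `run_gemm` is a placeholder callable standing in for the FP8
    block-scale deep_gemm kernel; in TensorRT-LLM this sweep is driven
    by the AutoTuner framework.
    """
    for m, n, k in itertools.product(m_values, n_values, k_values):
        # The first call per shape triggers (and caches) the JIT build.
        run_gemm(m, n, k)


# Example usage (hypothetical shapes): cover the token-count range seen at
# inference for fixed weight dimensions.
# warm_up_fp8_gemm(my_kernel, m_values=[1, 8, 64, 512, 2048],
#                  n_values=[4096], k_values=[7168])
```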
Bo Deng
c038fb3ef4
[None][chore] cherry-pick 6940 (#7097)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-25 10:28:45 +08:00
xinhe-nv
3ba9afcc7b
[None][feat] add gpt-oss tests to sanity list (#7158)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-08-25 10:22:07 +08:00
Bo Deng
6e131602b2
[TRTLLM-7096][infra] Testing cache transmission functionality in Python (#7025)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-08-25 09:47:39 +08:00
Yiqing Yan
486bc763c3
[None][infra] Split DGX_B200 stage into multiple parts and pre-/post-merge (#7074)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-24 21:09:04 -04:00
Robin Kobus
31979aefac
[None] [ci] Reorganize CMake and Python integration test infrastructure for C++ tests (#6754)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-08-24 20:53:17 +02:00
ajrasane
068056677f
[None][chore] Enable auto deploy accuracy test in CI (#7179)
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-24 08:42:30 -07:00
Yanchao Lu
ec35481b0a
[None][infra] Prepare for single GPU GB200 test pipeline (#7073)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-08-24 21:46:39 +08:00
dongfengy
48155f52bf
[TRTLLM-7321][doc] Refine GPT-OSS doc (#7180)
Signed-off-by: Dongfeng Yu
2025-08-24 08:53:53 -04:00
dongxuy04
19a0ea363b
[TRTLLM-6743][feat] Optimize and refactor alltoall in WideEP (#6973)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
Signed-off-by: Dongxu Yang <dongxuy@nvidia.com>
Co-authored-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-08-24 08:15:29 -04:00
amitz-nv
35e0ae484a
[https://nvbugs/5467232][fix] Fix load_torch_hf_lora to override lora_config.trtllm_modules_to_hf_modules with default only when it has no value (#7132)
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-08-24 15:00:24 +03:00
Iman Tabrizian
96ff82e77a
[None][fix] Waive test (#7185)
Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
2025-08-24 10:45:11 +08:00
Grace Ho
3d54a1a521
[None] [feat] nsys profile output kernel classifier (#7020)
Signed-off-by: Grace Ho <grho@nvidia.com>
2025-08-23 00:57:37 -04:00