TensorRT LLM
9135d580bf
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-19 03:25:00 +00:00

jellysnack
99ba723e20
[None][fix] logits device and shape issues in dynamic draft path (#9079)
Signed-off-by: jellysnack <oleg.jellysnack@gmail.com>
2025-11-18 19:22:47 -08:00

Ivy Zhang
782dfca7e8
[TRTLLM-9050][test] add llama4 disagg case to cover kv cache overflow error (#9172)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-11-18 18:26:32 -08:00

Grzegorz Kwasniewski
7905d6c0da
[#9098][feat] Simple sharding latent experts (#9099)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-11-18 21:14:22 -05:00

ChristinaZ
fbf6c16cd2
[None][fix] Update the default invalid value for deepseek mode of routing (#9222)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-11-19 10:14:06 +08:00

Grzegorz Kwasniewski
92f86a50d4
[#9137][feat] Factory sharding as default (#9144)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-11-18 21:12:03 -05:00

Patrice Castonguay
9b0f45298f
[None][feat] Have ability to cancel disagg request if KV cache resource are exhausted (#9155)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-11-18 20:59:17 -05:00

xinhe-nv
35658eab55
[None][chore] Add failed cases into waives.txt (#9193)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-11-18 17:47:55 -08:00

Enwei Zhu
7c4777a571
[TRTLLM-9286][feat] Integration of CuteDSL NVFP4 grouped GEMM (#8880)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-11-18 17:40:12 -08:00

Lizhi Zhou
c789000a62
[https://nvbugs/5649010][fix] increase status-checking interval to avoid instability (#9203)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-11-19 08:55:42 +08:00

Bo Deng
34f845bf69
[TRTLLM-9287][infra] Use NIXL backend for accuracy tests (#9247)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-11-18 14:46:20 -08:00

Ajinkya Rasane
8d7cda2318
[None][chore] Update the Flux autodeploy example (#8434)
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-11-18 14:16:04 -08:00

Ziyi Xiong
7c4344b92e
[https://nvbugs/5590408][fix] Exclude num of draft tokens from mMaxSeqLenKv (#9210)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-11-18 15:41:56 -05:00

Eran Geva
3ac11a6180
[#9152][fix] AutoDeploy fused_allreduce_residual_rmsnorm to support demollm mode (#9197)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-11-18 22:15:29 +02:00

Chenghao Zhang
f0b68e4c66
[None][feat] AutoDeploy: Perf improvement for small batch size (#9163)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-18 12:11:12 -08:00

Nikita Korobov
fe569f0594
[None][feat] bias for FP4 TRT-LLM Gen MoE (#9220)
Signed-off-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>
2025-11-18 09:59:47 -08:00

mpikulski
04fb481da3
[TRTLLM-9295][fix] restore greedy sampling in _test_openai_chat_guided_decoding (#9178)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-18 09:41:59 -08:00

Gal Hubara-Agam
36d3d8f608
[None][chore] Print device info in trtllm-bench report (#8584)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2025-11-18 09:00:10 -08:00

Kaiyu Xie
d076aa44d3
[None] [tests] Unwaive wide ep related tests (#9204)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-11-18 08:54:46 -08:00

Zheyu Fu
c4e02d7f04
[TRTLLM-8136][feat] Dynamic draft length in spec decode (stage 1). (#8194)
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
2025-11-18 11:13:39 -05:00

Ivy Zhang
160b361588
[TRTLLM-8949][test] Add rcca test case for eagle3 consistency check (#9088)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-11-18 05:55:00 -08:00

Robin Kobus
9913dc25ae
[None][refactor] decoding inputs, part 2 (#5799)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-18 14:38:51 +01:00

Ivy Zhang
ca41a71f92
[TRTLLM-8948][test] Add long bench case (#9165)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-11-18 04:41:48 -08:00

Chang Liu
8e001dd195
[None][fix] DeepSeek V3.2 indexer RoPE fix (#9232)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-11-18 20:35:27 +08:00

Lizhi Zhou
07343bb11c
[None][chore] fix a deepseekv3 error when debug mode is on (#9217)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2025-11-18 01:14:32 -08:00

ruodil
82480346aa
[https://nvbugs/5652552][fix] add printing for llm args (#9205)
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
2025-11-17 23:58:36 -08:00

Zero Zeng
43896af1b1
[None][chore] benchmark refactor (#9207)
Signed-off-by: Zero Zeng <38289304+zerollzeng@users.noreply.github.com>
2025-11-17 23:29:28 -08:00

Stanley Sun
96cfdd8a72
[None][chore] Change trt-server to trtlllm-server in opentelemetry readme (#9173)
Signed-off-by: Stanley Sun <stsun@nvidia.com>
Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>
2025-11-17 22:02:24 -08:00

Gal Hubara-Agam
5e5300898b
[#8732][feat] Add ReLU2 to TRTLLM Cutlass MoE BF16 kernels (#9191)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2025-11-17 20:30:00 -08:00

TensorRT LLM
fd9916424f
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-18 03:23:16 +00:00

Tri Dao
fc088e642c
[None][feat] Support Glm4MoeForCausalLM (#8256)
Signed-off-by: Tri Dao <daominhtri0503@gmail.com>
Co-authored-by: Xuanyu Chen <xuanyuc@nvidia.com>
2025-11-18 09:43:21 +08:00

QI JUN
c3376fa114
[None][ci] split speculative test case into several small cases (#9209)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-17 17:02:25 -08:00

Lucas Liebenwein
6d0a8edbbb
[None][chore] local imports for AutoDeploy in serve and bench (#9199)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-11-18 08:14:32 +08:00

zackyoray
e3c9a97075
[None][feat] Add TRTLLM_NIXL_KVCACHE_BACKEND environment variable for NIXL backend selection (#9075)
Signed-off-by: Yoray Zack <62789610+zackyoray@users.noreply.github.com>
2025-11-17 15:39:55 -08:00

TensorRT LLM
2d6289b4b4
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2025-11-17 22:26:06 +00:00

yuanjingx87
ec36a3af7e
[None][infra] Fix lock file generation script (#9180)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-11-17 11:53:56 -08:00

Matt Lefebvre
470d777744
[TRTINFRA-7280][infra] Support enroot/pyxis clusters in multi-node SLURM and enable oci-hsg GB200 in post-merge (#9117)
Signed-off-by: Matt Lefebvre <mlefebvre@nvidia.com>
2025-11-17 10:59:30 -08:00

Robin Kobus
df41f220a2
[TRTLLM-8831][feat] Enable early exit with overlap scheduler (#8587)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-17 18:07:13 +01:00

Mike Iovine
6151a4c9d6
[None][feat] Add simple optimizations for MTP 2-model (#9176)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-11-17 10:05:39 -05:00

Yiqing Yan
24f5cd7493
[TRTLLM-8000][infra] Catch error in merge waive list stage (#7289)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-11-17 13:28:50 +08:00

Kaiyu Xie
04be5a704e
[None] [fix] Fix missing ActivationType issue (#9171)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-11-17 10:43:25 +08:00

Anthony Chang
86cfb3ea7e
[None][feat] Update TRTLLM MoE cubins; reduce mxfp4 weight padding requirement; tighten TMA bound (#9025)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-11-17 10:04:29 +08:00

Jinyang Yuan
6dc70aa0e5
[https://nvbugs/5613089][fix] Fix the rank to access all_rank_chunk_size_list when chunked MoE is used (#8723)
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-11-17 10:01:08 +08:00

Emma Qiao
d16b1a84c5
[None][infra] Waive a failed case in pre-merge stage 11/16 (#9192)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-11-17 09:36:56 +08:00

sunnyqgg
7862b15a65
[TRTLLM-8778][feat] Add tree attention support for blackwell arch (#8975)
Signed-off-by: qgai <qgai@nvidia.com>
2025-11-17 09:01:53 +08:00

Guoming Zhang
e0f69657c7
[None][fix] Update the attention layers counting for Qwen3-next. (#9072)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-11-16 11:52:56 -08:00

Emma Qiao
2854f0cf3d
[None][infra] Waive failed tests for main branch 11/15 (#9187)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
2025-11-16 01:48:25 -08:00

brb-nv
63237494db
[None][chore] Waive failing tests blocking pre-merge (#9189)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-11-16 01:06:03 -08:00

JadoTu
3cde84581d
[None][fix] Make the sliced nvfp4 output contiguous (#9123)
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
2025-11-15 20:00:54 +08:00

Thor Johnsen
64cd91ae0a
[None][infra] Add trt-llm-kv-cache-manager-devs as code owner for appropriate files (#9182)
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
2025-11-15 16:46:14 +08:00