Guoming Zhang
|
8fed8ee066
|
[None][doc] add blackwell information into support matrix (#6740)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-22 14:28:38 +08:00 |
|
Yan Chunwei
|
2ffc33921f
|
[https://nvbugs/5416501][doc] add known issues to llmapi doc (#7560)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Ryan McCormick <mccormick.codes@gmail.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-22 14:28:38 +08:00 |
|
Simeng Liu
|
99995846b3
|
[https://nvbugs/5470782][chore] Remove the skip statement in 1.0 rele… (#7573)
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-22 14:28:38 +08:00 |
|
peaceh-nv
|
541b7fda89
|
[https://nvbugs/5503423][waive] Waive Llama3.1-70B-FP8 test on RTX PRO 6000 (#7603)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-22 14:28:38 +08:00 |
|
HuiGao-NV
|
af34c9713a
|
[https://nvbugs/5474169][fix] seq_len mismatch between kv cache manager and graph attn metadata (#7606)
Signed-off-by: Hui Gao <huig@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-22 14:28:38 +08:00 |
|
Yukun He
|
3cc16c2438
|
[https://nvbugs/5496960][fix] Fix Gemma model forward. (#7509)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-22 14:28:38 +08:00 |
|
Yan Chunwei
|
afca2fcbe0
|
[https://nvbugs/5351244][fix] test_mpi_session (#7501)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-22 14:28:38 +08:00 |
|
Yuxian Qiu
|
2d46dda6a7
|
[https://nvbugs/5448754][fix] Download HF model for all nodes. (#6824)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-22 14:28:38 +08:00 |
|
HuiGao-NV
|
123f5cbbf0
|
[https://nvbugs/5474169][fix] Adjust max seq len for kvcache for memory estimation (#7391)
Signed-off-by: Hui Gao <huig@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-22 14:28:38 +08:00 |
|
Lizhi Zhou
|
293d9fb612
|
[https://nvbugs/5448767][fix] disable kv cache reuse for disagg pp>1 tests (#7354)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-22 14:28:38 +08:00 |
|
Bo Li
|
a15f08db3d
|
[https://nvbugs/5467548][fix] DeepSeek illegal memory access. (#7298)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-22 14:28:38 +08:00 |
|
Barry Kang
|
8484aa9858
|
[None][fix] Fix DeepGEMM commit (#7875)
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
|
2025-09-22 13:52:50 +08:00 |
|
Stefan Niebler
|
8aead224fb
|
[https://nvbugs/5513423][fix] Correctly respect min_tokens in PyTorch Workflow (#7808)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Co-authored-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
|
2025-09-21 22:15:18 -07:00 |
|
peaceh-nv
|
9dc7316b7f
|
[https://nvbugs/5512556][unwaive] Unwaive DeepSeek PP tests (#7828)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
|
2025-09-22 10:26:30 +08:00 |
|
dongxuy04
|
b057fc9593
|
[None][fix] cherrypick to main: Fix possible mpi broadcast and gather issue on large object (#7854)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
|
2025-09-22 10:17:23 +08:00 |
|
Enwei Zhu
|
639d4109a7
|
[None][fix] Disable torch.compile for CapturableGuidedDecoder (#7871)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-09-22 10:04:30 +08:00 |
|
dongxuy04
|
9eb8084ca9
|
[TRTLLM-7008][fix] cherrypick to main Add automatic shared memory delete if already exist (#7727)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
|
2025-09-21 11:01:51 -07:00 |
|
xiweny
|
822cb0115b
|
[TRTLLM-6286][perf] Add NoSmem epilogue schedule and dynamic cluster shape for sm10x group gemm (#7757)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
Signed-off-by: djns99 <40156487+djns99@users.noreply.github.com>
Co-authored-by: djns99 <40156487+djns99@users.noreply.github.com>
|
2025-09-21 11:38:17 +08:00 |
|
Ziyi Xiong
|
897c4dd23b
|
[https://nvbugs/5517404][fix] Use the correct cuda graph for dynamic spec dec (#7728)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
|
2025-09-21 08:20:48 +08:00 |
|
Yan Chunwei
|
4509d97780
|
[TRTLLM-8188][chore] refactor GenerationExecutorWorker with WorkerBase for better code reusing (#7840)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
|
2025-09-20 06:24:22 -07:00 |
|
brb-nv
|
e10a027a03
|
[TRTLLM-7731][feat] KV cache transmission in disagg with CP on gen side (#7624)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
|
2025-09-20 06:15:26 -07:00 |
|
Enwei Zhu
|
e943a39cbd
|
[None][doc] Update tech blog12 (#7884)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-09-20 18:15:39 +08:00 |
|
Chang Liu
|
2e317a7db6
|
[https://nvbugs/5520490][fix] Fix intermittent test failures by avoiding external web data pulls (#7879)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
|
2025-09-19 17:24:13 -07:00 |
|
Grzegorz Kwasniewski
|
8adaf0bb78
|
[TRTLLM-6342][feat] Support for partial sharding from factory (#7393)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Signed-off-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
|
2025-09-19 09:07:42 -07:00 |
|
Kanghwan
|
8fcd11515d
|
[#7704][chore] Enable MathJax to fix formulas in documentation (#7744)
Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
|
2025-09-19 08:42:26 -07:00 |
|
Mike Iovine
|
8030b540ac
|
[https://nvbugs/5522462][fix] Fix FP8 scout illegal memory access (#7845)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
|
2025-09-19 10:30:37 -04:00 |
|
pcastonguay
|
fbe325ce57
|
[https://nvbugs/5471108][chore] Unwaiving disagg acc test (#7686)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
|
2025-09-19 08:56:09 -04:00 |
|
Matthias Jouanneaux
|
1be7faef37
|
[TRTLLM-5966][feat] Helix: add custom position ids to MLA kernels (#6904)
Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
|
2025-09-19 20:55:32 +08:00 |
|
Yuxian Qiu
|
7d28acdbf0
|
[https://nvbugs/5522332][fix] Pin numpy version for Gemma. (cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/7783) (#7797)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
|
2025-09-19 18:50:40 +08:00 |
|
Enwei Zhu
|
c8cc16d38d
|
[None][doc] Tech blog: Combining Guided Decoding and Speculative Decoding: Making CPU and GPU Cooperate Seamlessly (#7864)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-09-19 18:38:12 +08:00 |
|
Liao Lanyu
|
18095a7cb8
|
[https://nvbugs/5503440][fix] Fix potential hang due to wrong type of ZMQ socket and protocol for worker_init_status_queue (#7646)
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
|
2025-09-19 18:13:33 +08:00 |
|
xinhe-nv
|
efb763402f
|
[None][chore] Add failed cases into waives.txt (#7841)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-09-19 17:59:47 +08:00 |
|
Gabriel Wu
|
0e72e8f7e6
|
[None][feat] Support EPLB in Qwen3 MoE (#7443)
Signed-off-by: Gabriel Wu <13583761+lucifer1004@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
|
2025-09-19 16:45:35 +08:00 |
|
Ivy Zhang
|
0ac51487f4
|
[None][chore] remove cli cases for rtx6k (#7833)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-09-19 16:33:59 +08:00 |
|
Ivy Zhang
|
6b33bcced2
|
[None][test] Add accuracy benchmark in stress test (#7561)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
|
2025-09-19 16:09:46 +08:00 |
|
dominicshanshan
|
451475e0dc
|
[None][ci] Waive llama3 auto dtype test bug in https://nvbugs/5527956. (#7853)
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-09-19 14:54:59 +08:00 |
|
Emma Qiao
|
ea079fa530
|
[None][infra] Waive failed tests in post-merge (#7859)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-09-19 14:16:12 +08:00 |
|
Kyungmin Lee
|
6fcc0540f0
|
[None][fix] fix load_model_on_cpu on qwen/convert_checkpoint.py (#2382)
Signed-off-by: lkm2835 <lkm2835@gmail.com>
Co-authored-by: Kanghwan <861393+karljang@users.noreply.github.com>
|
2025-09-18 21:54:26 -07:00 |
|
QI JUN
|
f1b362faac
|
[None][chore] polish error message in cute_dsl_utils.py (#7852)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-09-19 12:05:11 +08:00 |
|
ruodil
|
c5453103d6
|
[None][test] add deepseek r1/v3 model with chunked prefill cases (#7124)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
|
2025-09-19 11:12:53 +08:00 |
|
HuiGao-NV
|
a6370fd143
|
[https://nvbugs/5481434][feat] cherry-pick fix to reuse pytorch memory segments occupied by cudagraph (#7747)
Signed-off-by: Hui Gao <huig@nvidia.com>
|
2025-09-19 10:25:21 +08:00 |
|
fredricz-20070104
|
fc4e6d3702
|
[TRTLLM-7183][test] Feature fix model issue for disagg serving (#7785)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
|
2025-09-19 10:12:55 +08:00 |
|
Chuang Zhu
|
c98b9468af
|
[None][fix] get Local IP by connect remote (#7719)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
|
2025-09-19 10:01:03 +08:00 |
|
xiweny
|
423e5f6a3c
|
[TRTLLM-6286][feat] Update CUTLASS to 4.2 and enable SM103 group gemm (#7832)
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
|
2025-09-19 09:50:54 +08:00 |
|
Yuxian Qiu
|
d6ebcf7c4a
|
[TRTLLM-6994][feat] FP8 Context MLA integration (Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/6059 from release/1.1.0rc2) (#7610)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
|
2025-09-19 09:40:49 +08:00 |
|
Ziyi Xiong
|
420f0fbcf5
|
[https://nvbugs/5522851][fix] Correct the logic to update kv_lens_cuda (#7790)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
|
2025-09-19 08:11:29 +08:00 |
|
QI JUN
|
7646da2d85
|
[None][ci] set TORCHINDUCTOR_COMPILE_THREADS correctly (#7800)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-09-19 07:19:50 +08:00 |
|
sunnyqgg
|
80dd8fe197
|
[TRTLLM-6746][feat] Enable two-model spec dec for MTP Eagle (#7001)
Signed-off-by: qgai <qgai@nvidia.com>
|
2025-09-18 12:05:36 -04:00 |
|
dongfengy
|
026f22eb50
|
[None][doc] Cherry-pick deployment guide update from 1.1.0rc2 branch to main branch (#7774)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
|
2025-09-18 22:50:26 +08:00 |
|
Li Min
|
d921fc3352
|
[TRTLLM-6898][feat] Add swapab, tileN64, cga sync support for cute dsl nvfp4 gemm (#7764)
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
|
2025-09-18 21:20:04 +08:00 |
|