YueWeng
a4243f0da5
[TRTLLM-6393][feat] add static tree sampling and verification ( #7161 )
...
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-09-26 13:16:16 -04:00
HuiGao-NV
f4d3be4bbc
[None][feat] Add a standalone buffer cache class and reuse buffers between cudagraph and no-graph flow ( #7669 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-09-26 07:28:06 -07:00
Tailing Yuan
b11ee868c5
[ https://nvbugs/5495789 ][feat] Optionally disable server GC and worker GC ( #7995 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-09-26 21:39:24 +08:00
Martin Marciniszyn Mehringer
6dc50ebcdd
[None][chore] Require NVIDIA developers to use their full name or NVIDIA account in GitHub profiles ( #8022 )
...
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
2025-09-26 21:16:58 +08:00
WeiHaocheng
35edad37f9
[None][doc] Add scaffolding tech blog to cover ( #8021 )
...
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-09-26 02:22:11 -07:00
xinhe-nv
ba6ab62bd1
[None][chore] Add failed cases into waives.txt ( #8004 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-09-26 00:41:02 -07:00
xinhe-nv
f32f5730b2
[None][chore] Add failed cases into waives.txt ( #7986 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-09-25 23:50:09 -07:00
Yueh-Ting (eop) Chen
2db22fb4e5
[None][feature] Add environment variable to adjust block pool allocation ratio under kv cache manager ( #7923 )
...
By default, each window size receives an equal share of memory
(see the else case). Setting TRTLLM_WINDOW_SIZE_SHARES overrides
this and assigns each window size its own share. For example, with
window sizes [512, 32768], TRTLLM_WINDOW_SIZE_SHARES=0.4,0.6
allocates 40% of the memory to window size 512 and 60% to window
size 32768.
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2025-09-26 14:09:01 +08:00
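The share override described in #7923 above can be sketched as follows. This is only an illustration of the behavior stated in the commit message, not the code from the commit: the function name `window_size_shares`, the pairing of shares with window sizes by position, and the two validation checks (matching count, shares summing to 1) are all assumptions.

```python
import math
import os
from typing import Dict, List, Optional


def window_size_shares(window_sizes: List[int],
                       env_value: Optional[str] = None) -> Dict[int, float]:
    """Map each window size to its fraction of the KV cache memory.

    Hypothetical helper: the name and the validation rules are assumptions,
    not taken from the TensorRT LLM source.
    """
    if env_value is None:
        # Variable name as given in the commit message.
        env_value = os.environ.get("TRTLLM_WINDOW_SIZE_SHARES")

    if not env_value:
        # Default behavior: every window size gets an equal share.
        equal = 1.0 / len(window_sizes)
        return {w: equal for w in window_sizes}

    shares = [float(s) for s in env_value.split(",")]

    # Assumed validation: one share per window size, summing to 1.
    if len(shares) != len(window_sizes):
        raise ValueError("expected one share per window size")
    if not math.isclose(sum(shares), 1.0, rel_tol=1e-6):
        raise ValueError("shares must sum to 1.0")

    # Assumed pairing: the i-th share belongs to the i-th window size.
    return dict(zip(window_sizes, shares))


if __name__ == "__main__":
    print(window_size_shares([512, 32768], "0.4,0.6"))
```

Passing the value explicitly, as in the last line, mirrors the example in the commit message; with no value set, the helper falls back to equal shares.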
HuiGao-NV
a9965d84e0
[None][chore] Report NCCL error message but not OOM when NCCL error happens ( #8009 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-09-25 23:07:32 -07:00
peaceh-nv
55ce70060e
[ https://nvbugs/5451740 ][fix] Add DP padding back on SM120 ( #7965 )
...
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-09-26 13:59:54 +08:00
Lucas Liebenwein
3a96d75a3c
[ https://nvbugs/5527956 ][fix] AutoDeploy: fix IMA due to outdated metadata ( #8002 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-09-25 22:05:55 -07:00
sunnyqgg
2e5850c28a
[TRTLLM-7330][feat] Eagle3 cuda graph support for the first draft model inference ( #7363 )
...
Signed-off-by: qgai <qgai@nvidia.com>
2025-09-26 11:28:05 +08:00
Chuang Zhu
f98fa0cf8b
[None][feat] Optimize kv cache transfer TEP ( #7613 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-09-25 20:20:04 -07:00
QI JUN
4c0f8482f1
[None][ci] Waive test_mm_encoder_standalone.py::test_multi_request_batch_chat[llava-v1.6-mistral-7b-hf] ( #8010 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-26 11:07:54 +08:00
Yuan Tong
fae83c387b
[ #6102 ][fix] support non-system python installation ( #7763 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-09-26 10:16:15 +08:00
Enwei Zhu
d650320de4
[None][infra] Improve the failure message for accuracy test suite ( #7994 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-26 10:04:47 +08:00
Yiqing Yan
108248ece1
[TRTLLM-7999][infra] Add B300/GB300 single gpu test ( #7951 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-09-26 09:59:11 +08:00
Yanchao Lu
7e2521a7f0
[None][chore] Some clean-ups for CUDA 13.0 dependencies ( #7979 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-26 08:46:11 +08:00
dongfengy
1eb653146a
[ https://nvbugs/5525951 ][fix] Clarify that PP is not supported for GPTOSS ( #7911 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-09-25 12:54:18 -07:00
QI JUN
1529a6f22d
[None][chore] extract weights loading related logic to model loader ( #7579 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-25 10:19:22 -07:00
Emma Qiao
2dc93c6371
[None][infra] Waive failed tests on main ( #8001 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-25 08:13:39 -07:00
WeiHaocheng
4b0570a0d6
[None][doc] Add acknowledgements in scaffolding tech blog ( #7983 )
...
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-09-25 08:07:13 -07:00
xxi
57ff5f4c0d
[None][fix] fix a bug in wideEp use DeepEP with num_chunks > 1 ( #7954 )
...
Signed-off-by: xxi <xxi@nvidia.com>
2025-09-25 07:53:42 -07:00
Matthias Jouanneaux
eda1467061
[TRTLLM-5966][feat] Helix: add alltoall op ( #6815 )
...
Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>
2025-09-25 07:18:29 -07:00
PeganovAnton
396c0ea677
[None][chore] relax version constraints on fastapi ( #7935 )
...
Signed-off-by: Anton Peganov <apeganov@nvidia.com>
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-09-25 21:58:53 +08:00
Yueh-Ting (eop) Chen
c5012423f5
[None][chore] Remove developer name in comment ( #7981 )
...
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2025-09-25 06:43:38 -07:00
Yan Chunwei
40c6103ef8
[None][doc] add Llama PP known issue to release note ( #7959 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Guoming Zhang
663ce3a4de
[None][doc] fix invalid links in perf benchmarking. ( #7933 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Guoming Zhang
202bed4574
[None][chore] Rename TensorRT-LLM to TensorRT LLM for source code. ( #7851 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
QI JUN
961418908c
[ https://nvbugs/5531963 ][fix] cherry pick #7725 ( #7907 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Yan Chunwei
5999fab146
[ https://nvbugs/5427043 ][fix] cherrypick: request length exceeds max_num_tokens ( #7718 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Yan Chunwei
cb466a846d
[None][fix] api stability bug in status label ( #7861 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Yan Chunwei
9d48898def
[None][doc] add stable label to all the un-labelled arguments in LLM class ( #7863 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Zac Patel
c38d4cf6a6
[None][doc] Update Perf-Overview.md for release/1.0 ( #7848 )
...
Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Yan Chunwei
57c098956e
[None][doc] add a guide for modifying APIs ( #7866 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Guoming Zhang
9f0f52249e
[None][doc] Rename TensorRT-LLM to TensorRT LLM for homepage and the … ( #7850 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Guoming Zhang
5ecc8d0ee2
[None][doc] Replace the main in the examples' link with commit id. ( #7837 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Yan Chunwei
5342c607cd
[ https://nvbugs/5516710 ][fix] fix Llama 3.3 TP PP case ( #7717 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Tao Li @ NVIDIA
44d7c3b245
[ https://nvbugs/1234567 ][fix] Revert https://github.com/NVIDIA/TensorRT-LLM/pull/7768/files ( #7813 )
...
Signed-off-by: Tao Li
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Guoming Zhang
4a09be40f0
[None][doc] Update docker cmd in quick start guide and trtllm-serve … ( #7787 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
xinhe-nv
e30d9aced9
[ https://nvbugs/4955671 ][fix] update test list ( #7980 )
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-25 02:58:09 -07:00
Chuang Zhu
791e73edf6
[ https://nvbugs/5536141 ][fix] fix_disagg_single_gpu_test ( #7990 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-09-25 02:07:22 -07:00
Jinyang Yuan
b622cde5d5
[None][perf] Fix the tactic sorting in TrtllmGenBatchedGemmRunner::getValidConfigIndices ( #7419 )
...
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-09-25 10:27:57 +02:00
Emma Qiao
cb53261aaf
[None][infra] Unwaive some tests since dev already have a PR to collect more info ( #7984 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-25 01:03:13 -07:00
Wanli Jiang
22b45ff9c7
[TRTLLM-7758][feat] Phi4-mm image modality inference optimization ( #7918 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-09-25 15:58:29 +08:00
WeiHaocheng
259cc66c34
[None][doc] scaffolding tech blog part one ( #7835 )
...
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
Signed-off-by: zheyuf <zheyuf@NVIDIA.com>
Co-authored-by: zheyuf <zheyuf@NVIDIA.com>
2025-09-25 14:41:59 +08:00
fredricz-20070104
0945403174
[TRTLLM-6541][test] Add NIM perf test cases ( #7924 )
...
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-09-25 13:15:26 +08:00
Guoming Zhang
bb6067176f
[None][chore] Update the cuda and tensorrt version in homepage icons. ( #7963 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-09-24 19:20:04 -07:00
Aurelien Chartier
98726a3bed
[None][chore] Update trtllm-bench documentation on setting FP8 KV cache ( #7885 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-09-25 09:28:53 +08:00
Void
336c2ef540
[None][feat] DeepEP LL fp8 dispatch/combine ( #7927 )
...
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2025-09-25 09:20:24 +08:00