Commit Graph

2989 Commits

Author SHA1 Message Date
Enwei Zhu
a64d9b69e5
[None][fix] Fix chunked prefill state of draft request (#8067)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-30 09:51:21 +08:00
Guoming Zhang
0c47925600
[None][doc] Refine perf overview.md and correct the error link in per… (#8036)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-09-28 16:14:31 +08:00
Yiqing Yan
4d5465a575
[None][chore] Bump version to 1.1.0 (#7942)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-09-26 13:17:36 +08:00
sunnyqgg
2e5850c28a
[TRTLLM-7330][feat] Eagle3 cuda graph support for the first draft model inference (#7363)
Signed-off-by: qgai <qgai@nvidia.com>
2025-09-26 11:28:05 +08:00
Chuang Zhu
f98fa0cf8b
[None][feat] Optimize kv cache transfer TEP (#7613)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-09-25 20:20:04 -07:00
QI JUN
4c0f8482f1
[None][ci] Waive test_mm_encoder_standalone.py::test_multi_request_batch_chat[llava-v1.6-mistral-7b-hf] (#8010)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-26 11:07:54 +08:00
Yuan Tong
fae83c387b
[#6102][fix] support non-system python installation (#7763)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-09-26 10:16:15 +08:00
Enwei Zhu
d650320de4
[None][infra] Improve the failure message for accuracy test suite (#7994)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-26 10:04:47 +08:00
Yiqing Yan
108248ece1
[TRTLLM-7999][infra] Add B300/GB300 single gpu test (#7951)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-09-26 09:59:11 +08:00
Yanchao Lu
7e2521a7f0
[None][chore] Some clean-ups for CUDA 13.0 dependencies (#7979)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-26 08:46:11 +08:00
dongfengy
1eb653146a
[https://nvbugs/5525951][fix] Clarify that PP is not supported for GPTOSS (#7911)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-09-25 12:54:18 -07:00
QI JUN
1529a6f22d
[None][chore] extract weights loading related logic to model loader (#7579)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-25 10:19:22 -07:00
Emma Qiao
2dc93c6371
[None][infra] Waive failed tests on main (#8001)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-25 08:13:39 -07:00
WeiHaocheng
4b0570a0d6
[None][doc] Add acknowledgements in scaffolding tech blog (#7983)
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-09-25 08:07:13 -07:00
xxi
57ff5f4c0d
[None][fix] fix a bug in wideEp use DeepEP with num_chunks > 1 (#7954)
Signed-off-by: xxi <xxi@nvidia.com>
2025-09-25 07:53:42 -07:00
Matthias Jouanneaux
eda1467061
[TRTLLM-5966][feat] Helix: add alltoall op (#6815)
Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>
2025-09-25 07:18:29 -07:00
PeganovAnton
396c0ea677
[None][chore] relax version constraints on fastapi (#7935)
Signed-off-by: Anton Peganov <apeganov@nvidia.com>
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-09-25 21:58:53 +08:00
Yueh-Ting (eop) Chen
c5012423f5
[None][chore] Remove developer name in comment (#7981)
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2025-09-25 06:43:38 -07:00
Yan Chunwei
40c6103ef8 [None][doc] add Llama PP known issue to release note (#7959)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Guoming Zhang
663ce3a4de [None][doc] fix invalid links in perf benchmarking. (#7933)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Guoming Zhang
202bed4574 [None][chroe] Rename TensorRT-LLM to TensorRT LLM for source code. (#7851)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
QI JUN
961418908c [https://nvbugs/5531963][fix] cherry pick #7725 (#7907)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Yan Chunwei
5999fab146 [https://nvbugs/5427043][fix] cherrypick: request length exceeds max_num_tokens (#7718)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Yan Chunwei
cb466a846d [None][fix] api stability bug in status label (#7861)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Yan Chunwei
9d48898def [None][doc] add stable label to all the un-labelled arguments in LLM class (#7863)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Zac Patel
c38d4cf6a6 [None][doc] Update Perf-Overview.md for release/1.0 (#7848)
Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Yan Chunwei
57c098956e [None][doc] add a guide for modifying APIs (#7866)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Guoming Zhang
9f0f52249e [None][doc] Rename TensorRT-LLM to TensorRT LLM for homepage and the … (#7850)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Guoming Zhang
5ecc8d0ee2 [None][doc] Replace the main in the examples' link with commit id. (#7837)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Yan Chunwei
5342c607cd [https://nvbugs/5516710][fix] fix Llama 3.3 TP PP case (#7717)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Tao Li @ NVIDIA
44d7c3b245 [https://nvbugs/1234567][fix] Revert https://github.com/NVIDIA/TensorRT-LLM/pull/7768/files (#7813)
Signed-off-by: Tao Li
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Guoming Zhang
4a09be40f0 [None][doc] Update docker cmd in quick start guide and trtllm-serve … (#7787)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
xinhe-nv
e30d9aced9
[https://nvbugs/4955671][fix] update test list (#7980)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-25 02:58:09 -07:00
Chuang Zhu
791e73edf6
[https://nvbugs/5536141][fix] fix_disagg_single_gpu_test (#7990)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-09-25 02:07:22 -07:00
Jinyang Yuan
b622cde5d5
[None][perf] Fix the tactic sorting in TrtllmGenBatchedGemmRunner::getValidConfigIndices (#7419)
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-09-25 10:27:57 +02:00
Emma Qiao
cb53261aaf
[None][infra] Unwaive some tests since dev already have a PR to collect more info (#7984)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-25 01:03:13 -07:00
Wanli Jiang
22b45ff9c7
[TRTLLM-7758][feat] Phi4-mm image modality inference optimization (#7918)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-09-25 15:58:29 +08:00
WeiHaocheng
259cc66c34
[None][doc] scaffolding tech blog part one (#7835)
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
Signed-off-by: zheyuf <zheyuf@NVIDIA.com>
Co-authored-by: zheyuf <zheyuf@NVIDIA.com>
2025-09-25 14:41:59 +08:00
fredricz-20070104
0945403174
[TRTLLM-6541][test] Add NIM perf test cases (#7924)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-09-25 13:15:26 +08:00
Guoming Zhang
bb6067176f
[None][chroe] Update the cuda and tensorrt version in homepage icons. (#7963)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-09-24 19:20:04 -07:00
Aurelien Chartier
98726a3bed
[None][chore] Update trtllm-bench documentation on setting FP8 KV cache (#7885)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-09-25 09:28:53 +08:00
Void
336c2ef540
[None][feat] DeepEP LL fp8 dispatch/combine (#7927)
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2025-09-25 09:20:24 +08:00
Iman Tabrizian
be7e51727e
[https://nvbugs/5456485][bug] unwaive triton test (#7966)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-09-24 17:02:55 -07:00
Leslie Fang
342014069e
[None][chore] Validate features combination (#7630)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-09-25 08:01:13 +08:00
Iman Tabrizian
da30d496b0
[None][fix] Revert "[None][feat] Return topk logprobs in torch backend (#7756)" (#7969)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-09-24 15:36:38 -07:00
sychen52
5a65af24cd
[OMNIML-2336][feat] Add NVFP4 x FP8 moe kernels (#7821)
Signed-off-by: Shiyang Chen <shiychen@nvidia.com>
2025-09-24 12:14:35 -07:00
Iman Tabrizian
6d45cd163e
[None][bug] Fix transformers version for Triton backend (#7964)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-09-24 12:55:52 -04:00
Mike Iovine
42c2ec3239
[https://nvbugs/5473781][fix] Fix llama 4 FP8 for PP>1 (#7220)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-09-24 12:16:27 -04:00
Pamela Peng
b1dc84b4a3
[TRTLLM-7399][test] Add DS-R1/Qwen3 test cases for RTX 6000 (#7662)
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-09-24 11:40:26 -04:00
Yuxian Qiu
48fda86c56
[None][fix] Fix dummy load format for DeepSeek. (#7874)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-09-24 23:03:16 +08:00