Enwei Zhu
|
a64d9b69e5
|
[None][fix] Fix chunked prefill state of draft request (#8067)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-09-30 09:51:21 +08:00 |
|
Guoming Zhang
|
0c47925600
|
[None][doc] Refine perf overview.md and correct the error link in per… (#8036)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-09-28 16:14:31 +08:00 |
|
Yiqing Yan
|
4d5465a575
|
[None][chore] Bump version to 1.1.0 (#7942)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-09-26 13:17:36 +08:00 |
|
sunnyqgg
|
2e5850c28a
|
[TRTLLM-7330][feat] Eagle3 cuda graph support for the first draft model inference (#7363)
Signed-off-by: qgai <qgai@nvidia.com>
|
2025-09-26 11:28:05 +08:00 |
|
Chuang Zhu
|
f98fa0cf8b
|
[None][feat] Optimize kv cache transfer TEP (#7613)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
|
2025-09-25 20:20:04 -07:00 |
|
QI JUN
|
4c0f8482f1
|
[None][ci] Waive test_mm_encoder_standalone.py::test_multi_request_batch_chat[llava-v1.6-mistral-7b-hf] (#8010)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-09-26 11:07:54 +08:00 |
|
Yuan Tong
|
fae83c387b
|
[#6102][fix] support non-system python installation (#7763)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
|
2025-09-26 10:16:15 +08:00 |
|
Enwei Zhu
|
d650320de4
|
[None][infra] Improve the failure message for accuracy test suite (#7994)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-09-26 10:04:47 +08:00 |
|
Yiqing Yan
|
108248ece1
|
[TRTLLM-7999][infra] Add B300/GB300 single gpu test (#7951)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
|
2025-09-26 09:59:11 +08:00 |
|
Yanchao Lu
|
7e2521a7f0
|
[None][chore] Some clean-ups for CUDA 13.0 dependencies (#7979)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
|
2025-09-26 08:46:11 +08:00 |
|
dongfengy
|
1eb653146a
|
[https://nvbugs/5525951][fix] Clarify that PP is not supported for GPTOSS (#7911)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
|
2025-09-25 12:54:18 -07:00 |
|
QI JUN
|
1529a6f22d
|
[None][chore] extract weights loading related logic to model loader (#7579)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-09-25 10:19:22 -07:00 |
|
Emma Qiao
|
2dc93c6371
|
[None][infra] Waive failed tests on main (#8001)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-09-25 08:13:39 -07:00 |
|
WeiHaocheng
|
4b0570a0d6
|
[None][doc] Add acknowledgements in scaffolding tech blog (#7983)
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
|
2025-09-25 08:07:13 -07:00 |
|
xxi
|
57ff5f4c0d
|
[None][fix] fix a bug in wideEp use DeepEP with num_chunks > 1 (#7954)
Signed-off-by: xxi <xxi@nvidia.com>
|
2025-09-25 07:53:42 -07:00 |
|
Matthias Jouanneaux
|
eda1467061
|
[TRTLLM-5966][feat] Helix: add alltoall op (#6815)
Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>
|
2025-09-25 07:18:29 -07:00 |
|
PeganovAnton
|
396c0ea677
|
[None][chore] relax version constraints on fastapi (#7935)
Signed-off-by: Anton Peganov <apeganov@nvidia.com>
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
|
2025-09-25 21:58:53 +08:00 |
|
Yueh-Ting (eop) Chen
|
c5012423f5
|
[None][chore] Remove developer name in comment (#7981)
Signed-off-by: eopXD <yuehtingc@nvidia.com>
|
2025-09-25 06:43:38 -07:00 |
|
Yan Chunwei
|
40c6103ef8
|
[None][doc] add Llama PP known issue to release note (#7959)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
|
2025-09-25 21:02:35 +08:00 |
|
Guoming Zhang
|
663ce3a4de
|
[None][doc] fix invalid links in perf benchmarking. (#7933)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-25 21:02:35 +08:00 |
|
Guoming Zhang
|
202bed4574
|
[None][chore] Rename TensorRT-LLM to TensorRT LLM for source code. (#7851)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-25 21:02:35 +08:00 |
|
QI JUN
|
961418908c
|
[https://nvbugs/5531963][fix] cherry pick #7725 (#7907)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-25 21:02:35 +08:00 |
|
Yan Chunwei
|
5999fab146
|
[https://nvbugs/5427043][fix] cherrypick: request length exceeds max_num_tokens (#7718)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-25 21:02:35 +08:00 |
|
Yan Chunwei
|
cb466a846d
|
[None][fix] api stability bug in status label (#7861)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-25 21:02:35 +08:00 |
|
Yan Chunwei
|
9d48898def
|
[None][doc] add stable label to all the un-labelled arguments in LLM class (#7863)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-25 21:02:35 +08:00 |
|
Zac Patel
|
c38d4cf6a6
|
[None][doc] Update Perf-Overview.md for release/1.0 (#7848)
Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-25 21:02:35 +08:00 |
|
Yan Chunwei
|
57c098956e
|
[None][doc] add a guide for modifying APIs (#7866)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-25 21:02:35 +08:00 |
|
Guoming Zhang
|
9f0f52249e
|
[None][doc] Rename TensorRT-LLM to TensorRT LLM for homepage and the … (#7850)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-25 21:02:35 +08:00 |
|
Guoming Zhang
|
5ecc8d0ee2
|
[None][doc] Replace the main in the examples' link with commit id. (#7837)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-25 21:02:35 +08:00 |
|
Yan Chunwei
|
5342c607cd
|
[https://nvbugs/5516710][fix] fix Llama 3.3 TP PP case (#7717)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-25 21:02:35 +08:00 |
|
Tao Li @ NVIDIA
|
44d7c3b245
|
[https://nvbugs/1234567][fix] Revert https://github.com/NVIDIA/TensorRT-LLM/pull/7768/files (#7813)
Signed-off-by: Tao Li
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-25 21:02:35 +08:00 |
|
Guoming Zhang
|
4a09be40f0
|
[None][doc] Update docker cmd in quick start guide and trtllm-serve … (#7787)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
|
2025-09-25 21:02:35 +08:00 |
|
xinhe-nv
|
e30d9aced9
|
[https://nvbugs/4955671][fix] update test list (#7980)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
|
2025-09-25 02:58:09 -07:00 |
|
Chuang Zhu
|
791e73edf6
|
[https://nvbugs/5536141][fix] fix_disagg_single_gpu_test (#7990)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
|
2025-09-25 02:07:22 -07:00 |
|
Jinyang Yuan
|
b622cde5d5
|
[None][perf] Fix the tactic sorting in TrtllmGenBatchedGemmRunner::getValidConfigIndices (#7419)
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
|
2025-09-25 10:27:57 +02:00 |
|
Emma Qiao
|
cb53261aaf
|
[None][infra] Unwaive some tests since dev already have a PR to collect more info (#7984)
Signed-off-by: qqiao <qqiao@nvidia.com>
|
2025-09-25 01:03:13 -07:00 |
|
Wanli Jiang
|
22b45ff9c7
|
[TRTLLM-7758][feat] Phi4-mm image modality inference optimization (#7918)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
|
2025-09-25 15:58:29 +08:00 |
|
WeiHaocheng
|
259cc66c34
|
[None][doc] scaffolding tech blog part one (#7835)
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
Signed-off-by: zheyuf <zheyuf@NVIDIA.com>
Co-authored-by: zheyuf <zheyuf@NVIDIA.com>
|
2025-09-25 14:41:59 +08:00 |
|
fredricz-20070104
|
0945403174
|
[TRTLLM-6541][test] Add NIM perf test cases (#7924)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
|
2025-09-25 13:15:26 +08:00 |
|
Guoming Zhang
|
bb6067176f
|
[None][chore] Update the cuda and tensorrt version in homepage icons. (#7963)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-09-24 19:20:04 -07:00 |
|
Aurelien Chartier
|
98726a3bed
|
[None][chore] Update trtllm-bench documentation on setting FP8 KV cache (#7885)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
|
2025-09-25 09:28:53 +08:00 |
|
Void
|
336c2ef540
|
[None][feat] DeepEP LL fp8 dispatch/combine (#7927)
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
|
2025-09-25 09:20:24 +08:00 |
|
Iman Tabrizian
|
be7e51727e
|
[https://nvbugs/5456485][bug] unwaive triton test (#7966)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
|
2025-09-24 17:02:55 -07:00 |
|
Leslie Fang
|
342014069e
|
[None][chore] Validate features combination (#7630)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
|
2025-09-25 08:01:13 +08:00 |
|
Iman Tabrizian
|
da30d496b0
|
[None][fix] Revert "[None][feat] Return topk logprobs in torch backend (#7756)" (#7969)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
|
2025-09-24 15:36:38 -07:00 |
|
sychen52
|
5a65af24cd
|
[OMNIML-2336][feat] Add NVFP4 x FP8 moe kernels (#7821)
Signed-off-by: Shiyang Chen <shiychen@nvidia.com>
|
2025-09-24 12:14:35 -07:00 |
|
Iman Tabrizian
|
6d45cd163e
|
[None][bug] Fix transformers version for Triton backend (#7964)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
|
2025-09-24 12:55:52 -04:00 |
|
Mike Iovine
|
42c2ec3239
|
[https://nvbugs/5473781][fix] Fix llama 4 FP8 for PP>1 (#7220)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
|
2025-09-24 12:16:27 -04:00 |
|
Pamela Peng
|
b1dc84b4a3
|
[TRTLLM-7399][test] Add DS-R1/Qwen3 test cases for RTX 6000 (#7662)
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
|
2025-09-24 11:40:26 -04:00 |
|
Yuxian Qiu
|
48fda86c56
|
[None][fix] Fix dummy load format for DeepSeek. (#7874)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
|
2025-09-24 23:03:16 +08:00 |
|