Yan Chunwei
271a981f1f
[None][doc] Add LLM-API API change principle ( #8350 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-11-03 01:47:15 -08:00
dongfengy
6424f7e55f
[None][doc] Clarify the perf best practice and supported hardware for gptoss ( #8665 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>
2025-10-31 10:11:59 -07:00
Yi Zhang
496b419791
[None][doc] Add doc for torch.compile & piecewise cuda graph ( #8527 )
...
Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>
2025-10-29 21:15:46 -07:00
Sharan Chetlur
a2e964d9a8
[None][doc] Minor doc update to disagg-serving ( #8768 )
...
Signed-off-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
2025-10-29 17:38:06 -07:00
Robin Kobus
990b0c0c47
[TRTLLM-7159][docs] Add documentation for additional outputs ( #8325 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-10-27 09:52:04 +01:00
Zheng Duan
e666a704f5
[None][doc] add visualization of perf metrics in time breakdown tool doc ( #8530 )
...
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-10-23 22:09:21 -04:00
yunruis
8c9fda4b85
[None][doc] Paragraph adjustment and fix statistic ( #8568 )
...
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
2025-10-22 03:26:09 -04:00
Shi Xiaowei
50149ac2bd
[None][doc] Fix the incorrect doc figure ( #8536 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-10-22 10:08:55 +08:00
Shi Xiaowei
a0024f4d34
[None][doc] Facilitates the integration of the transfer agent ( #7867 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-10-21 20:06:24 +08:00
Yueh-Ting (eop) Chen
85088dce05
[None][chore] Update feature combination matrix for SWA kv cache reuse ( #8529 )
...
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2025-10-21 04:41:44 -04:00
Yechan Kim
85d5aa7763
[None][feat] Support kv_cahce_reuse for HyperCLOVAX-Vision model ( #7789 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-10-21 11:11:24 +09:00
Bo Deng
dd25595ae8
[TRTLLM-7964][infra] Set nixl to default cache transceiver backend ( #7926 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2025-10-19 19:24:43 +08:00
h-guo18
55fed1873c
[None][chore] AutoDeploy: cleanup old inference optimizer configs ( #8039 )
...
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-17 15:55:57 -04:00
Leslie Fang
023e515d33
[None][chore] Combine two documents of feature combination matrix ( #8442 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-10-17 14:31:33 +08:00
xiweny
89d03d7668
[ https://nvbugs/5532789 ] [doc] Add documents about CUDA 12.9 ( #8411 )
...
Signed-off-by: Xiwen Yu <13230610+VALLIS-NERIA@users.noreply.github.com>
2025-10-16 00:05:17 -07:00
Erin
f4e7738f65
[None][doc] Ray orchestrator initial doc ( #8373 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-10-14 21:17:57 -07:00
Kaiyu Xie
c822c117ce
[None] [docs] Update TPOT/ITL docs ( #8378 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-10-14 20:50:54 -07:00
Kaiyu Xie
040103ab56
[None] [blog] Scaling Expert Parallelism in TensorRT LLM (Part 3: Pushing the Performance Boundary) ( #8323 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-10-13 06:37:17 -07:00
Guoming Zhang
989c25fcba
[None][doc] Add qwen3-next doc into deployment guid and test case into L0. ( #8288 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Faradawn Yang <faradawny@gmail.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-10-13 10:25:45 +08:00
Guoming Zhang
656d73087e
[None][doc] Fix several invalid ref links in deployment guide sections. ( #8287 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-10-13 10:22:32 +08:00
Guoming Zhang
a193867f8f
[None][doc] Refine deployment guide by renaming TRT-LLM to TensorRT L… ( #8214 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-10-09 17:11:24 +08:00
Jonas Yang CN
88ea2c4ee9
[TRTLLM-7349][feat] Adding new orchestrator type -- ray ( #7520 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-10-04 08:12:24 +08:00
WeiHaocheng
563e588e56
[None][doc] Scaffolding tech blog fix a typo ( #8042 )
...
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-09-28 10:29:01 -04:00
Guoming Zhang
51aefd1bac
[None][doc] Refine perf overview.md and correct the error link in per… ( #8035 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-09-28 16:14:42 +08:00
Chuang Zhu
f98fa0cf8b
[None][feat] Optimize kv cache transfer TEP ( #7613 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-09-25 20:20:04 -07:00
Yanchao Lu
7e2521a7f0
[None][chore] Some clean-ups for CUDA 13.0 dependencies ( #7979 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-26 08:46:11 +08:00
WeiHaocheng
4b0570a0d6
[None][doc] Add acknowledgements in scaffolding tech blog ( #7983 )
...
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
2025-09-25 08:07:13 -07:00
Yan Chunwei
40c6103ef8
[None][doc] add Llama PP known issue to release note ( #7959 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Guoming Zhang
663ce3a4de
[None][doc] fix invalid links in perf benchmarking. ( #7933 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Zac Patel
c38d4cf6a6
[None][doc] Update Perf-Overview.md for release/1.0 ( #7848 )
...
Signed-off-by: zpatel <22306219+zbpatel@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Yan Chunwei
57c098956e
[None][doc] add a guide for modifying APIs ( #7866 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Guoming Zhang
9f0f52249e
[None][doc] Rename TensorRT-LLM to TensorRT LLM for homepage and the … ( #7850 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Guoming Zhang
5ecc8d0ee2
[None][doc] Replace the main in the examples' link with commit id. ( #7837 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Guoming Zhang
4a09be40f0
[None][doc] Update docker cmd in quick start guide and trtllm-serve … ( #7787 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
WeiHaocheng
259cc66c34
[None][doc] scaffolding tech blog part one ( #7835 )
...
Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>
Signed-off-by: zheyuf <zheyuf@NVIDIA.com>
Co-authored-by: zheyuf <zheyuf@NVIDIA.com>
2025-09-25 14:41:59 +08:00
Aurelien Chartier
98726a3bed
[None][chore] Update trtllm-bench documentation on setting FP8 KV cache ( #7885 )
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-09-25 09:28:53 +08:00
Leslie Fang
342014069e
[None][chore] Validate features combination ( #7630 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-09-25 08:01:13 +08:00
xxi
d471655242
[TRTLLM-7831][feat] Cherry-pick from #7423 Support fp8 block wide ep cherry pick ( #7712 )
2025-09-23 08:41:38 +08:00
Guoming Zhang
edbe270198
[TRTLLM-7958][doc] add 1.0 release notes ( #7605 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Co-authored-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Yan Chunwei
ba2864a2c6
[None][doc] Enhance api reference doc by labeling stable APIs ( #7751 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Guoming Zhang
e8a3e21b87
[ https://nvbugs/5519525 ][fix] fix doc invalid link for bug 5519525 ( #7753 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Guoming Zhang
bc7b50334c
[None][doc] Add labels description note into llm api section ( #7696 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Guoming Zhang
ab915fb333
[None][doc] Use hash id for external link ( #7641 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Guoming Zhang
5c54173054
[None][doc] Fix a invalid link and a typo. ( #7634 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Guoming Zhang
8fed8ee066
[None][doc] add blackwell information into support matrix ( #6740 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Yan Chunwei
2ffc33921f
[ https://nvbugs/5416501 ][doc] add known issues to llmapi doc ( #7560 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Ryan McCormick <mccormick.codes@gmail.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Enwei Zhu
e943a39cbd
[None][doc] Update tech blog12 ( #7884 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-20 18:15:39 +08:00
Kanghwan
8fcd11515d
[ #7704 ][chore] Enable MathJax to fix formulas in documentation ( #7744 )
...
Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
2025-09-19 08:42:26 -07:00
Enwei Zhu
c8cc16d38d
[None][doc] Tech blog: Combining Guided Decoding and Speculative Decoding: Making CPU and GPU Cooperate Seamlessly ( #7864 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-19 18:38:12 +08:00
dongfengy
026f22eb50
[None][doc] Cherry-pick deployment guide update from 1.1.0rc2 branch to main branch ( #7774 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-09-18 22:50:26 +08:00
Wanli Jiang
fe104dc20d
[TRTLLM-7918][feat] Support kvcache reuse and chunk prefill for phi4mm ( #7723 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-09-18 17:37:16 +08:00
Wanli Jiang
a7ca0fff54
[TRTLLM-6577][feat] Support nano_v2_vlm in pytorch backend ( #7207 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-09-18 16:26:20 +08:00
William Zhang
2614d71994
[TRTLLM-7410][feat] Enable KV cache reuse and chunked prefill for mistral3.1 ( #7628 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-09-17 08:11:16 -07:00
QI JUN
39248320d4
[None][feat] add an example of KV cache host offloading ( #7767 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-17 13:51:15 +08:00
Chang Liu
98f533453a
[TRTLLM-7398][doc] Add doc for KV cache salting support ( #7772 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-09-16 14:49:14 -07:00
Guoming Zhang
085271eceb
[None][doc] Clean the doc folder and move the outdated docs into lega… ( #7729 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-09-16 11:43:19 +08:00
Shi Xiaowei
809c4d20c0
[None][doc] Fix the link in the doc ( #7713 )
...
Signed-off-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com>
2025-09-16 09:50:25 +08:00
Wanli Jiang
e080294725
[TRTLLM-7918][feat] Revert "Support kvcache reuse for phi4mm ( #7563 )" ( #7722 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-09-15 17:19:44 +08:00
Wanli Jiang
fc9f4c9295
[TRTLLM-7918][feat] Support kvcache reuse for phi4mm ( #7563 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-09-15 15:47:00 +08:00
Chang Liu
47e37755a3
[TRTLLM-6903][feat] Support chunked prefill for multimodal models ( #6843 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-09-14 20:10:10 -07:00
v-shobhit
0652514c6d
[None][feat] Use a shell context to install dependancies ( #7383 )
...
Signed-off-by: Shobhit Verma <shobhitv@nvidia.com>
Signed-off-by: v-shobhit <161510941+v-shobhit@users.noreply.github.com>
Co-authored-by: Zhihan Jiang <68881590+nvzhihanj@users.noreply.github.com>
2025-09-10 09:57:37 -07:00
Chang Liu
faa2f46554
[TRTLLM-5059][feat] Enable KV-cache reuse and add E2E tests for llava-next ( #7349 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-09-09 14:51:36 -04:00
Guoming Zhang
7f3f658d5f
[None][doc] Rename TensorRT-LLM to TensorRT LLM. ( #7554 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-09 12:16:03 +08:00
Guoming Zhang
35dac55716
[None][doc] Update kvcache part ( #7549 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-09 12:16:03 +08:00
Guoming Zhang
f53fb4c803
[TRTLLM-5930][doc] 1.0 Documentation. ( #6696 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-09 12:16:03 +08:00
dominicshanshan
c9dca69e1b
[None][chore] Mass integration of release/1.0 - 3rd ( #7519 )
...
Signed-off-by: Nave Assaf <nassaf@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Bo Deng <deemod@nvidia.com>
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
Signed-off-by: Hui Gao <huig@nvidia.com>
Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
Co-authored-by: Nave Assaf <55059536+Naveassaf@users.noreply.github.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Bo Deng <deemod@nvidia.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Co-authored-by: yifeizhang-c <219273404+yifeizhang-c@users.noreply.github.com>
Co-authored-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
Co-authored-by: Erin <14718778+hchings@users.noreply.github.com>
Co-authored-by: chenfeiz0326 <chenfeiz@nvidia.com>
Co-authored-by: ChristinaZ <83400082+ChristinaZ@users.noreply.github.com>
Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: HuiGao-NV <huig@nvidia.com>
Co-authored-by: milesial <milesial@users.noreply.github.com>
Co-authored-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
Co-authored-by: pcastonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Linda <57756729+Linda-Stadter@users.noreply.github.com>
Co-authored-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Jiagan Cheng <jiaganc@nvidia.com>
Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>
Co-authored-by: Dom Brown <3886319+DomBrown@users.noreply.github.com>
2025-09-08 14:03:04 +08:00
binghanc
14ee43e254
[None][docs] refine docs for accuracy evaluation of gpt-oss models ( #7252 )
...
Signed-off-by: 176802681+binghanc@users.noreply.github.com
2025-09-08 09:56:23 +08:00
Enwei Zhu
5ff3a65b23
[TRTLLM-7028][feat] Enable guided decoding with speculative decoding (part 2: one-model engine) ( #6948 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-09-03 15:16:11 -07:00
Izzy Putterman
f156221c27
[None][doc] add GPT OSS Eagle3 blog ( #7140 )
...
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-09-03 12:28:01 -04:00
Wanli Jiang
4223a9aada
[TRTLLM-7261][feat] Support phi-4 model in pytorch backend ( #7371 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-09-03 10:27:42 +08:00
Yan Chunwei
612c26be22
[None][doc] add legacy section for tensorrt engine ( #6724 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Robin Kobus
e09c025ffb
[None] [fix] store blog 10 media via lfs ( #7375 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-08-30 10:17:53 +08:00
yunruis
f617b03bfc
[None][fix] fix doc formula ( #7367 )
...
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
2025-08-29 04:48:10 -04:00
dongfengy
367ff88a5e
[None][feat] Refactor llama4 for multimodal encoder IFB ( #6844 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-08-28 13:22:19 -07:00
yunruis
c4f823319b
[None][doc] add adp balance blog ( #7213 )
...
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
Co-authored-by: Kefeng-Duan <176893526+Kefeng-Duan@users.noreply.github.com>
2025-08-28 11:19:34 -04:00
Maurits de Groot
2d0c9b383f
[None][fix] Updated blog9_Deploying_GPT_OSS_on_TRTLLM ( #7260 )
...
Signed-off-by: Maurits de Groot <63357890+Maurits-de-Groot@users.noreply.github.com>
2025-08-26 11:26:19 -04:00
Guoming Zhang
bf377d0b8e
[None][doc] Display tech blog for nvidia.github.io domain. ( #7241 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-08-26 15:36:28 +08:00
Zheng Duan
4f84a45899
[ https://nvbugs/5452463 ][doc] update disagg doc about UCX_MAX_RNDV_RAILS ( #7205 )
...
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-08-25 22:42:42 -04:00
Leslie Fang
9df15b2104
[None][doc] update feature_combination_matrix doc ( #6691 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-26 08:25:31 +08:00
dongfengy
48155f52bf
[TRTLLM-7321][doc] Refine GPT-OSS doc ( #7180 )
...
Signed-off-by: Dongfeng Yu
2025-08-24 08:53:53 -04:00
Suyog Gupta
e3de5758a3
[ #7136 ][feat] trtllm-serve + autodeploy integration ( #7141 )
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-22 08:30:53 -07:00
dongfengy
d94cc3fa3c
[TRTLLM-7321][doc] Add GPT-OSS Deployment Guide into official doc site ( #7143 )
...
Signed-off-by: Dongfeng Yu
2025-08-22 16:17:01 +08:00
Farshad Ghodsian
2d40e8750b
[None][doc] Update gpt-oss deployment guide to latest release image ( #7101 )
...
Signed-off-by: Farshad Ghodsian <47931571+farshadghodsian@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-08-21 02:33:07 -04:00
Leslie Fang
3f6a9267f1
[None][infra] update feature_combination_matrix of disaggregated and chunked prefill ( #6661 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-20 13:14:34 +08:00
Kaiyu Xie
9a74ee9dae
[None] [doc] Add more documents for large scale EP ( #7029 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-08-19 19:04:39 +08:00
Fridah-nv
97ba0eb879
[None][autodeploy] Doc: fix link path in trtllm bench doc ( #7007 )
...
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-08-19 08:43:28 +08:00
Leslie Fang
e76e5c640f
[None][infra] Enable accuracy test for mtp and chunked prefill ( #6314 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-19 07:42:52 +08:00
Bo Li
8b05b5d801
[None][doc] Update gpt oss doc ( #6954 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-08-18 01:27:30 -04:00
Leslie Fang
ce0b13ea02
[None][infra] update feature_combination_matrix of disaggregated and Eagle3 ( #6945 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-08-18 09:18:17 +08:00
Daniel Cámpora
53312eeebd
[TRTLLM-7157][feat] BREAKING CHANGE Introduce sampler_type, detect sampler according to options ( #6831 )
...
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-08-16 00:27:24 -04:00
jmydurant
8e252256f5
[None][doc] Modify the description for mla chunked context ( #6929 )
...
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-08-15 12:52:26 +08:00
JunyiXu-nv
70e352a6f7
[ https://nvbugs/5437106 ][fix] Add L4 Scout benchmarking WAR option in deploy guide ( #6829 )
...
Signed-off-by: Junyi Xu <junyix@nvidia.com>
2025-08-15 08:53:13 +08:00
Tao Li @ NVIDIA
345d3d3524
[None][doc] update moe support matrix for DS R1 ( #6883 )
...
Signed-off-by: taoli <litaotju@users.noreply.github.com>
Co-authored-by: taoli <litaotju@users.noreply.github.com>
2025-08-14 13:55:11 +08:00
Zhenhua Wang
868c5d166e
[None][chore] fix markdown format for the deployment guide ( #6879 )
...
Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>
2025-08-13 22:19:11 -04:00
Zhenhua Wang
8416d7fea8
[ https://nvbugs/5412885 ][doc] Add the workaround doc for H200 OOM ( #6853 )
...
Signed-off-by: Zhenhua Wang <4936589+zhenhuaw-me@users.noreply.github.com>
2025-08-13 19:51:38 +08:00
Shi Xiaowei
fe7dda834d
[TRTLLM-7030][fix] Refactor the example doc of dist-serving ( #6766 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-08-13 17:39:27 +08:00
Yechan Kim
12102e2d48
[TRTLLM-6772][feat] Multimodal benchmark_serving support ( #6622 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-12 19:34:02 -07:00
rakib-hasan
7ab8112450
[None][fix] Refactoring to avoid circular import when importing torch models ( #6720 )
...
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
2025-08-11 18:00:42 -04:00
shaharmor98
b6baa9ed9b
[TRTLLM-6823][doc] Add checkpoint refactor docs ( #6592 )
...
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-08-10 19:47:39 -04:00
Fridah-nv
cc0f4c87d4
[None][doc] Move AutoDeploy README.md to torch docs ( #6528 )
...
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-08 19:11:45 -04:00