TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Author	SHA1	Message	Date
yunruis	8c9fda4b85	[None][doc] Paragraph adjustment and fix statistic (#8568 ) Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>	2025-10-22 03:26:09 -04:00
Kaiyu Xie	040103ab56	[None] [blog] Scaling Expert Parallelism in TensorRT LLM (Part 3: Pushing the Performance Boundary) (#8323 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-10-13 06:37:17 -07:00
WeiHaocheng	563e588e56	[None][doc] Scaffolding tech blog fix a typo (#8042 ) Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>	2025-09-28 10:29:01 -04:00
WeiHaocheng	4b0570a0d6	[None][doc] Add acknowledgements in scaffolding tech blog (#7983 ) Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>	2025-09-25 08:07:13 -07:00
WeiHaocheng	259cc66c34	[None][doc] scaffolding tech blog part one (#7835 ) Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com> Signed-off-by: zheyuf <zheyuf@NVIDIA.com> Co-authored-by: zheyuf <zheyuf@NVIDIA.com>	2025-09-25 14:41:59 +08:00
Enwei Zhu	e943a39cbd	[None][doc] Update tech blog12 (#7884 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-20 18:15:39 +08:00
Kanghwan	8fcd11515d	[#7704 ][chore] Enable MathJax to fix formulas in documentation (#7744 ) Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>	2025-09-19 08:42:26 -07:00
Enwei Zhu	c8cc16d38d	[None][doc] Tech blog: Combining Guided Decoding and Speculative Decoding: Making CPU and GPU Cooperate Seamlessly (#7864 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-09-19 18:38:12 +08:00
Guoming Zhang	7f3f658d5f	[None][doc] Rename TensorRT-LLM to TensorRT LLM. (#7554 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-09 12:16:03 +08:00
Guoming Zhang	35dac55716	[None][doc] Update kvcache part (#7549 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-09 12:16:03 +08:00
Guoming Zhang	f53fb4c803	[TRTLLM-5930][doc] 1.0 Documentation. (#6696 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-09 12:16:03 +08:00
Izzy Putterman	f156221c27	[None][doc] add GPT OSS Eagle3 blog (#7140 ) Signed-off-by: Izzy Putterman <iputterman@nvidia.com>	2025-09-03 12:28:01 -04:00
yunruis	f617b03bfc	[None][fix] fix doc formula (#7367 ) Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>	2025-08-29 04:48:10 -04:00
yunruis	c4f823319b	[None][doc] add adp balance blog (#7213 ) Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com> Co-authored-by: Kefeng-Duan <176893526+Kefeng-Duan@users.noreply.github.com>	2025-08-28 11:19:34 -04:00
Maurits de Groot	2d0c9b383f	[None][fix] Updated blog9_Deploying_GPT_OSS_on_TRTLLM (#7260 ) Signed-off-by: Maurits de Groot <63357890+Maurits-de-Groot@users.noreply.github.com>	2025-08-26 11:26:19 -04:00
Guoming Zhang	bf377d0b8e	[None][doc] Display tech blog for nvidia.github.io domain. (#7241 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-08-26 15:36:28 +08:00
Farshad Ghodsian	2d40e8750b	[None][doc] Update gpt-oss deployment guide to latest release image (#7101 ) Signed-off-by: Farshad Ghodsian <47931571+farshadghodsian@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-08-21 02:33:07 -04:00
Bo Li	8b05b5d801	[None][doc] Update gpt oss doc (#6954 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2025-08-18 01:27:30 -04:00
Shi Xiaowei	fe7dda834d	[TRTLLM-7030][fix] Refactor the example doc of dist-serving (#6766 ) Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2025-08-13 17:39:27 +08:00
Andrew Chen	4ecda91ecc	[https://nvbugs/5423962 ][fix] Address broken links (#6531 )	2025-08-07 16:00:05 -04:00
Guoming Zhang	3036d49071	[None][doc] Unify the tech blogs naming. (#6649 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-08-06 01:45:40 -04:00
Farshad Ghodsian	6af1514dc3	[None][doc] Adding GPT-OSS Deployment Guide documentation (#6637 ) Signed-off-by: Farshad Ghodsian <47931571+farshadghodsian@users.noreply.github.com> Co-authored-by: Sharan Chetlur <116769508+schetlur-nv@users.noreply.github.com>	2025-08-05 19:19:48 +02:00
Enwei Zhu	899b74c357	[None][doc] Fix blog4 typo (#6612 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-08-05 10:20:37 +08:00
Kaiyu Xie	147ad69368	[None][doc] blog: Scaling Expert Parallelism in TensorRT-LLM (Part 2: Performance Status and Optimization) (#6547 ) Signed-off-by: Kaiyu XIe <26294424+kaiyux@users.noreply.github.com>	2025-08-01 16:46:15 +08:00
nv-guomingz	03e38c9087	chore: update trtllm-serve usage doc by removing backend parameter when it use torch as backend. (#6419 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-07-30 11:11:06 -04:00
Kaiyu Xie	e58afa510e	doc: Add README for wide EP (#6356 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-07-29 00:36:12 -04:00
Simeng Liu	7bff341553	[doc] Add NGram tech blog (#6311 ) Signed-off-by: Simeng Liu <simengl@nvidia.com>	2025-07-25 10:26:33 -07:00
Kaiyu Xie	f08286c679	doc: Refactor documents and examples of disaggregated serving and wide ep (#6054 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-07-23 09:20:57 +08:00
Raayan Dhar	5234502717	[nvbug/5361223] doc: Update Llama4 deployment guide: update config & note concurrency (#6222 ) Signed-off-by: raayandhar <rdhar@nvidia.com>	2025-07-22 11:28:23 -07:00
nv-guomingz	b4c7e8c9a5	doc: remove cuda_graph_config: {} from doc since cuda_graph enabled b… (#6150 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-07-21 10:49:29 +08:00
nv-guomingz	4e4d18826f	chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… (#6003 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-07-15 15:50:03 +09:00
Shi Xiaowei	f4e0425a7b	doc: update the link of the diagram (#5953 ) Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2025-07-11 18:02:22 +09:00
Shi Xiaowei	37293e4dfd	blog: add qwen3 disagg perf metrics (#5822 )	2025-07-11 16:41:45 +09:00
wili	2e3cf42e03	[refactor] Simplification of Speculative decoding configs (#5639 ) Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com> Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>	2025-07-10 11:37:30 -04:00
Yan Chunwei	07f6da763d	[TRTLLM-5530] chore: rename LLM.autotuner_enabled to enable_autotuner (#5876 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-07-10 11:31:35 +08:00
Erin	e277766f0d	chores: merge examples for v1.0 doc (#5736 ) Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>	2025-07-08 21:00:42 -07:00
jiahanc	607bf4c395	Doc: Add llama4 Maverick eagle3 and max-throughput and low_latency benchmark guide (#5810 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2025-07-09 10:10:02 +09:00
nv-guomingz	c8fa08da5c	doc: update cuda_graph_config usage part in DS R1 docs (#5796 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com> Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2025-07-08 16:54:46 +09:00
nv-guomingz	0be41b6524	Revert "chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie…" (#5818 )	2025-07-08 13:15:30 +09:00
nv-guomingz	5a8173c121	chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… (#5795 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-07-08 08:52:36 +08:00
nv-guomingz	c434147366	chore: update doc by replacing use_cuda_graph with cuda_graph_config (#5680 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2025-07-04 15:39:15 +09:00
Fanrong Li	ebadc13086	[doc] update mtp documents (#5387 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-06-21 16:05:52 +08:00
Shi Xiaowei	1e35be5840	doc: subsequent modifications of blog 5 (#5366 ) Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2025-06-19 18:23:13 +08:00
Shi Xiaowei	9a53e58a58	blog: Disaggregated Serving in TensorRT-LLM (#5353 ) Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2025-06-19 18:02:15 +08:00
Julien Demouth	bb79ba7c35	Edits for tech blog 4 (#5006 ) Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com> Co-authored-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>	2025-06-09 09:38:41 +08:00
juney-nvidia	a761cc2f8d	doc: refinement based on Julien's feedbacks (#4967 ) Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>	2025-06-06 08:56:14 +08:00
Kaiyu Xie	5a5427f86e	blog: Scaling Expert Parallelism in TensorRT-LLM (Part 1: Design and Implementation of Large-scale EP) (#4958 ) Signed-off-by: juney-nvidia <143764042+juney-nvidia@users.noreply.github.com> Co-authored-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com> Co-authored-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>	2025-06-05 22:24:04 +08:00
juney-nvidia	49f2f1f8eb	Expose new tech blog about DSR1 throughput optimization to the main R… (#4803 ) Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>	2025-05-30 20:44:12 +08:00
Tao Li @ NVIDIA	3b7120d60e	DeepSeek R1 throughut optimization tech blog for Blackwell GPUs (#4791 ) Signed-off-by: Tao Li	2025-05-30 18:54:19 +08:00
Yan Chunwei	5506f60037	chore [BREAKING CHANGE]: Flatten PyTorchConfig knobs into TorchLlmArgs (#4603 ) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2025-05-28 18:43:04 +08:00

56 Commits