Commit Graph

80 Commits

Author SHA1 Message Date
Kanghwan
f58a183c6e
[None][chore] Fix formatting error in Gemma3 readme (#7352)
Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
2025-09-03 01:15:37 +08:00
jiahanc
9f2dc3069d
[None] [doc] Update DeepSeek example doc (#7358)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-09-01 14:43:58 -04:00
brb-nv
0253036a4e [None][chore] Add docs for Gemma3 VLMs (#6880)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-01 11:02:31 +08:00
Pengyun Lin
c1e7fb9042
[TRTLLM-7207][feat] Chat completions API for gpt-oss (#7261)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-08-28 10:22:06 +08:00
zhhuang-nv
7e135d2ea7
[None][feat] Use Separate QKV Input Layout for Context MLA (#6538)
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
2025-08-19 22:04:48 +08:00
jmydurant
8e252256f5
[None][doc] Modify the description for mla chunked context (#6929)
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-08-15 12:52:26 +08:00
hlu1
5346eb7bc5
[None][doc] Update gpt-oss doc on MoE support matrix (#6908)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-08-15 08:50:31 +08:00
Chang Liu
be9dd4713c
[https://nvbugs/5385987][fix] Fix Qwen2 quantization issue by pinning transformers version (#6673)
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-08-11 17:16:49 -07:00
Liao Lanyu
a2e9153cb0
[None][doc] Add K2 tool calling examples (#6667)
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-08-11 16:25:41 +08:00
Yibin Li
97787883c3
[TRTLLM-6420][feat] add support for Eclairv2 model - cherry-pick changes and minor fix (#6493)
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-08-08 21:40:48 -04:00
Guoming Zhang
0223de0727
[None][doc] Add deployment guide section for VDR task (#6669)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-08-07 10:30:47 -04:00
hlu1
8207d5fd39
[None] [feat] Add model gpt-oss (#6645)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-08-07 03:04:18 -04:00
Guoming Zhang
f7f46a5017
doc: remove the outdated features which marked as Experimental (#5995)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-08-06 22:01:42 -04:00
Yibin Li
2a946859a7
[None][fix] Upgrade dependencies version to avoid security vulnerability (#6506)
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-08-06 14:21:03 -07:00
chenfeiz0326
a16ba6445c
[None][doc] Create deployment guide for Llama4 Scout FP8 and NVFP4 (#6550)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-08-06 22:15:24 +08:00
Yuxian Qiu
3a71ddfe09
[TRTLLM-6859][doc] Add DeepSeek R1 deployment guide. (#6579)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-08-06 22:13:54 +08:00
jiahanc
3170039e36
[None][doc] Add llama4 hybrid guide (#6640)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-08-06 01:25:38 -04:00
bhsueh_NV
ae3a5fc918
[doc][ci][Qwen3][nvbugs 5374145] Add Qwen3 235B eagle3 CI (#6477)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-07-31 09:37:23 +08:00
nv-guomingz
03e38c9087
chore: update trtllm-serve usage doc by removing backend parameter when it use torch as backend. (#6419)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-30 11:11:06 -04:00
nv-guomingz
7231134996
doc: remove backend parameter for trtllm-bench when backend is set to… (#6428)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-29 11:01:21 -04:00
Liana Koleva
96d004d800
doc: fix invalid link in llama 4 example documentation (#6340)
Signed-off-by: Liana Koleva <43767763+lianakoleva@users.noreply.github.com>
2025-07-26 11:27:10 -04:00
nv-guomingz
31d3eff24b
doc: fix invalid links related with llm api example (#6317)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-24 00:46:51 -04:00
Mike Iovine
9645814bdf
[chore] Clean up quickstart_advanced.py (#6021)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-07-21 15:00:59 -04:00
Linda
3efad2e58c
feat: nanobind bindings (#6185)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-07-21 08:56:57 +01:00
nv-guomingz
b4c7e8c9a5
doc: remove cuda_graph_config: {} from doc since cuda_graph enabled b… (#6150)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-21 10:49:29 +08:00
bhsueh_NV
2e14c8f443
[Fix][Chore][Qwen3] fix bug of using fp4 on sm120 (#6065)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-07-20 10:25:25 +08:00
Iman Tabrizian
b75e53ab69
Revert "feat: nanobind bindings (#5961)" (#6160)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-18 10:12:54 +08:00
Linda
5bff317abf
feat: nanobind bindings (#5961)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-07-17 22:42:52 +08:00
nv-guomingz
4e4d18826f
chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… (#6003)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-15 15:50:03 +09:00
jiahanc
24dfd4cd0b
Doc: Update llama-3.3-70B guide (#6028)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-07-15 11:37:26 +09:00
Yechan Kim
2320f12321
doc: update EXAONE 4.0 news (#6034)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-07-15 10:26:51 +09:00
Yechan Kim
63139fdcff
feat: EXAONE4.0 support (#5696)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-07-14 22:28:10 +09:00
Iman Tabrizian
c32c9e2fad
doc: Add instructions for running gemma in disaggregated serving (#5922)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-10 10:21:19 -07:00
Erin
e277766f0d
chores: merge examples for v1.0 doc (#5736)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-07-08 21:00:42 -07:00
jiahanc
c24eb67054
Doc: fix link in llama4 Maverick example (#5864)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-07-09 11:09:58 +09:00
jiahanc
607bf4c395
Doc: Add llama4 Maverick eagle3 and max-throughput and low_latency benchmark guide (#5810)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-07-09 10:10:02 +09:00
nv-guomingz
0be41b6524
Revert "chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie…" (#5818) 2025-07-08 13:15:30 +09:00
nv-guomingz
5a8173c121
chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… (#5795)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-08 08:52:36 +08:00
DylanChen-NV
5ca2b9bb15
[TRTLLM-5812][feat] support FP8 row-wise dense GEMM in torch flow (#5615)
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
2025-07-07 18:04:57 +08:00
bhsueh_NV
85e934a7fe
[Doc] update the document of qwen3 and cuda_graph usage (#5703)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-07-07 09:44:25 +08:00
nv-guomingz
c434147366
chore: update doc by replacing use_cuda_graph with cuda_graph_config (#5680)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-04 15:39:15 +09:00
Linda
94f0252b46 Doc: Update invalid hugging face URLs (#5683)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-07-04 13:14:13 +08:00
nv-guomingz
6e48ac25a6
chore: remove cuda_graph_ prefix from cuda_graph_config filed members. (#5585)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-30 12:23:14 -04:00
jmydurant
8836990bde
[TRTLLM-3602][feat] support nvfp4 model and fp8 kv cache for MLA chunked prefill (Blackwell) (#5475)
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-06-26 22:18:08 +08:00
Fanrong Li
ebadc13086
[doc] update mtp documents (#5387)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-21 16:05:52 +08:00
amirkl94
8451a87742
chore: Mass integration of release/0.20 (#5082)
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Erin <14718778+hchings@users.noreply.github.com>
Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-06-17 14:32:02 +03:00
bhsueh_NV
6a6b9d2594
doc: add document of benchmarking for Qwen3 (#5158)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-06-17 16:18:55 +08:00
Fanrong Li
38a907aaca
[TRTLLM-5278][feat] Add attention dp support to MTP relaxed acceptance (#5119)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-13 08:58:44 +08:00
pcastonguay
6d4d179cac
[TRTLLM-5518] doc: Adding disaggregated serving section to models doc (#4877)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-06-09 17:19:02 -04:00
Omer Ullman Argov
8731f5f14f
chore: Mass integration of release/0.20 (#4898)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Hui Gao <huig@nvidia.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Signed-off-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: moraxu <mguzek@nvidia.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: HuiGao-NV <huig@nvidia.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Co-authored-by: Pamela Peng <179191831+pamelap-nvidia@users.noreply.github.com>
Co-authored-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
Co-authored-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: Faraz <58580514+farazkh80@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
2025-06-08 23:26:26 +08:00