Iman Tabrizian
c32c9e2fad
doc: Add instructions for running gemma in disaggregated serving (#5922)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-10 10:21:19 -07:00
wili
2e3cf42e03
[refactor] Simplification of Speculative decoding configs (#5639)
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
2025-07-10 11:37:30 -04:00
Yan Chunwei
07f6da763d
[TRTLLM-5530] chore: rename LLM.autotuner_enabled to enable_autotuner (#5876)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-10 11:31:35 +08:00
Erin
e277766f0d
chores: merge examples for v1.0 doc (#5736)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-07-08 21:00:42 -07:00
jiahanc
c24eb67054
Doc: fix link in llama4 Maverick example (#5864)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-07-09 11:09:58 +09:00
jiahanc
607bf4c395
Doc: Add llama4 Maverick eagle3 and max-throughput and low_latency benchmark guide (#5810)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-07-09 10:10:02 +09:00
Yan Chunwei
e50d95c40d
chore [TRTLLM-6161]: add LLM speculative decoding example (#5706)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-09 07:33:11 +08:00
Yiqing Yan
5203a0f6df
chore: bump version to 1.0.0rc3 (#5819)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-08 16:04:40 +09:00
Zhenhuan Chen
dee6644ed9
feat(scaffolding): add streaming scaffolding_llm.generate_async support (#5345)
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
2025-07-08 15:08:40 +09:00
nv-guomingz
0be41b6524
Revert "chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie…" (#5818)
2025-07-08 13:15:30 +09:00
nv-guomingz
5a8173c121
chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… (#5795)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-08 08:52:36 +08:00
DylanChen-NV
5ca2b9bb15
[TRTLLM-5812][feat] support FP8 row-wise dense GEMM in torch flow (#5615)
Signed-off-by: Dylan Chen <191843203+DylanChen-NV@users.noreply.github.com>
2025-07-07 18:04:57 +08:00
bhsueh_NV
85e934a7fe
[Doc] update the document of qwen3 and cuda_graph usage (#5703)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-07-07 09:44:25 +08:00
Xianjie Qiao
b1976c2add
Add wide-ep benchmarking scripts (#5760)
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
Signed-off-by: Xianjie Qiao <5410381+qiaoxj07@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-07-05 19:29:39 +08:00
Stefan Niebler
d1112aac37
[TRTLLM-3442] feat: added beam search support to the PyTorch Workflow (#5333)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2025-07-05 01:35:13 +09:00
nv-guomingz
c434147366
chore: update doc by replacing use_cuda_graph with cuda_graph_config (#5680)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-07-04 15:39:15 +09:00
Linda
94f0252b46
Doc: Update invalid hugging face URLs (#5683)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-07-04 13:14:13 +08:00
Lucas Liebenwein
24ac9b5f69
[AutoDeploy] merge feat/ad-2025-06-29 (#5737)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: Neta Zmora <nzmora@nvidia.com>
Co-authored-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2025-07-04 10:21:18 +09:00
Yiqing Yan
3c9dd5cd66
chore: bump version to 1.0.0rc2 (#5645)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-07-03 12:35:28 +08:00
Shunkangz
3e75320fe8
Add pd dynamic scaling readme (#5540)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.com>
2025-07-02 02:18:51 -04:00
Yan Chunwei
a5eff139f1
[TRTLLM-5277] chore: refine llmapi examples for 1.0 (part1) (#5431)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-07-01 19:06:41 +08:00
nv-guomingz
6e48ac25a6
chore: remove cuda_graph_ prefix from cuda_graph_config filed members. (#5585)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-30 12:23:14 -04:00
dongjiyingdjy
852b79053d
feat : support duplicate_kv_weight for qwen3 blockwise scale (#5459)
Signed-off-by: Jiying Dong <87510204+dongjiyingdjy@users.noreply.github.com>
2025-06-30 11:49:22 +08:00
nv-guomingz
578430e64c
[TRTLLM-5530][BREAKING CHANGE]: enhance the llm args pytorch config part 1(cuda_graph_config) (#5014)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-30 11:05:40 +08:00
Talor Abramovich
70e34a3291
[TRTLLM-5831][feat] Add LoRA support for pytorch backend in trtllm-serve (#5376)
Signed-off-by: Talor Abramovich <talora@nvidia.com>
2025-06-29 12:46:30 +00:00
Lucas Liebenwein
619709fc33
[AutoDeploy] merge feat/ad-2025-06-13 (#5556)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-06-29 03:52:14 +08:00
Darragh Hanley
5437075def
ReDrafter support for Qwen (#4875)
Signed-off-by: darraghdog <darragh.hanley@gmail.com>
Signed-off-by: Darragh Hanley <darragh.hanley@gmail.com>
Co-authored-by: rakib-hasan <rhasan@nvidia.com>
2025-06-28 02:33:10 +08:00
wili
56cdfe5c6c
[TRTLLM-5000][feat] NGrams V2 (#4569)
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
2025-06-27 23:00:17 +08:00
jmydurant
8836990bde
[TRTLLM-3602][feat] support nvfp4 model and fp8 kv cache for MLA chunked prefill (Blackwell) (#5475)
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2025-06-26 22:18:08 +08:00
QI JUN
3a2c4ca77b
chore: split _build_model method for TorchLlm and TrtLlm (#5418)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-06-26 04:32:46 +08:00
Yiqing Yan
f3cfe86dd1
chore: bump version to 1.0.0rc1 (#5460)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-25 16:21:34 +08:00
Enwei Zhu
fc7a81ceb0
test: Add LLGuidance test and refine guided decoding (#5348)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-25 14:12:56 +08:00
Enwei Zhu
76da7fed86
fix (NvBug 5354925): Fix static EPLB (#5411)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-25 13:14:40 +08:00
Fanrong Li
ebadc13086
[doc] update mtp documents (#5387)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-21 16:05:52 +08:00
Yan Chunwei
9bd42ecf9b
[TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default (#5312)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-20 03:01:10 +08:00
Yiqing Yan
dedce8ab0e
chore: bump version to 1.0.0rc0 (#5326)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-06-19 12:02:28 +08:00
nv-guomingz
6a388b105a
chore: remove torch_compile prefix for TorchCompileConfig field members (#5261)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-19 09:21:51 +08:00
Zhanrui Sun
516bd4dc05
chore: bump version to 0.21.0rc3 (#5309)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-06-18 15:59:53 +08:00
amirkl94
8451a87742
chore: Mass integration of release/0.20 (#5082)
Signed-off-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Stanley Sun <190317771+StanleySun639@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Erin <14718778+hchings@users.noreply.github.com>
Co-authored-by: Frank <3429989+FrankD412@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Yechan Kim <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-06-17 14:32:02 +03:00
bhsueh_NV
6a6b9d2594
doc: add document of benchmarking for Qwen3 (#5158)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-06-17 16:18:55 +08:00
Izzy Putterman
e607768e45
Speculation: Draft Target in new FW (#4558)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-06-17 02:26:08 +08:00
Yan Chunwei
c84e41fd9d
fix: build_config in TorchLlmArgs and avoid arbitrary args (#4972)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-15 17:51:56 -07:00
Aurelien Chartier
1389f5a4d3
feat: Add support for fp8 rowwise quantization (#4876)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
Co-authored-by: aikitoria <151776613+aikitoria@users.noreply.github.com>
2025-06-14 06:37:48 -07:00
Daniel Cámpora
dec326ba7d
[fix] Reenable test return logits (#5160)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-06-13 06:07:22 +02:00
Yibin Li
b79eb34bfe
[fix]: Fall back to HMAC to Avoid IPC Serialization Churn (#5074)
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-06-13 11:37:50 +08:00
Fanrong Li
38a907aaca
[TRTLLM-5278][feat] Add attention dp support to MTP relaxed acceptance (#5119)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-06-13 08:58:44 +08:00
HuiGao-NV
43192379af
Use backend to replace macro to control enablement of MNNVL all reduce (#4635)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-06-12 11:22:49 +08:00
Zhanrui Sun
e2863a3159
chore: bump version to 0.21.0rc2 (#5112)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-06-11 15:08:14 +08:00
Enwei Zhu
00991d1520
chore: Merge remaining changes from feat/large-ep branch to main (#5039)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
Signed-off-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
Co-authored-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
Co-authored-by: ShiXiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
Co-authored-by: Jun Yang <143764042+juney-nvidia@users.noreply.github.com>
2025-06-11 13:47:43 +08:00
Lucas Liebenwein
7ddc4d6282
[AutoDeploy] Merge Feature Branch Week 3 (#5054)
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
2025-06-11 00:20:43 +08:00