QI JUN
616d1df7a0
[None][chore] set the default value of max_num_tokens explicitly ( #8208 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-10-14 23:03:02 -07:00
Izzy Putterman
1ad7bc4c78
[None][feat] Draft: Save state first pass ( #7012 )
...
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-10-01 18:40:55 -04:00
Pengyun Lin
c2bc39af63
[TRTLLM-1302][feat] Topk logprobs for TRT backend and top1 logprob for PyT backend ( #6097 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-09-12 15:32:34 +08:00
Chang Liu
ce53832610
[TRTLLM-7326][feat] Add standalone multimodal encoder ( #6743 )
...
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-08-19 21:42:50 -07:00
Simeng Liu
8cf3faa26a
[feat] Auto-enable ngram with concurrency <= 32. ( #6232 )
...
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
Signed-off-by: Mike Iovine <mike.iovine7@gmail.com>
Co-authored-by: Mike Iovine <miovine@nvidia.com>
Co-authored-by: Mike Iovine <mike.iovine7@gmail.com>
2025-07-31 18:45:51 -04:00
Yan Chunwei
45d441e60c
[TRTLLM-5061] chore: add status tags to LLM API reference ( #5707 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-28 15:57:07 +08:00
Yan Chunwei
a02606a9e2
[TRTLLM-5530][BREAKING CHANGE] refactor: unify KvCacheConfig in LLM class for pytorch backend ( #5752 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-16 16:42:59 +08:00
Robin Kobus
30a19fcf7c
[TRTLLM-6291] feat: Add user-provided speculative decoding support ( #5204 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-07-07 16:30:43 +02:00
Yan Chunwei
98a7c24062
chore [TRTLLM-6009]: remove ptuning knobs from TorchLlmArgs ( #5595 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-30 20:40:23 +08:00
Enwei Zhu
fc7a81ceb0
test: Add LLGuidance test and refine guided decoding ( #5348 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-25 14:12:56 +08:00
Yan Chunwei
9bd42ecf9b
[TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default ( #5312 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-20 03:01:10 +08:00
Izzy Putterman
e607768e45
Speculation: Draft Target in new FW ( #4558 )
...
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2025-06-17 02:26:08 +08:00
Yilin Fan
dd29063538
[feat] Add llm args to tune python gc threshold ( #5141 )
...
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
2025-06-16 17:45:22 +08:00
Yan Chunwei
c84e41fd9d
fix: build_config in TorchLlmArgs and avoid arbitrary args ( #4972 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-15 17:51:56 -07:00
HuiGao-NV
dfeeaf6746
Move allreduce_strategy from committed api to reference ( #5147 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-06-12 21:00:20 +08:00
HuiGao-NV
43192379af
Use backend to replace macro to control enablement of MNNVL all reduce ( #4635 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-06-12 11:22:49 +08:00
QI JUN
b8c5e3892b
Revert "fix: build_config in TorchLlmArgs and avoid invalid args" ( #4949 )
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-05 17:43:30 +08:00
Yan Chunwei
ac20159d32
fix: build_config in TorchLlmArgs and avoid invalid args ( #4600 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-04 13:17:29 +08:00
Yan Chunwei
5506f60037
chore [BREAKING CHANGE]: Flatten PyTorchConfig knobs into TorchLlmArgs ( #4603 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-28 18:43:04 +08:00
Thor Johnsen
5d438be59a
[TRTLLM-5000][feat] Pytorch implementation of ngram drafter ( #3936 )
...
* v1.5
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
v1.5.4 Add back draft_overhead to spec dec stats
Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
* v1.5.5: fix CI error
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* v1.6: fix CI error 8196 > 8192
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* Address reviewer concerns
Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
* Address reviewer concerns
Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
* precommit run
Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
* v2.0: Address reviewer concerns
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* v2.1: add fix from wili
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
* Revert changes that require use of TypeAlias because that requires python version >= 3.10
Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
---------
Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
2025-05-21 10:40:00 +08:00
shaharmor98
27afcb9928
add changes for fp8, nemotron-nas, API ( #4180 )
...
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-05-18 23:27:25 +08:00
Erin
cba1793cda
cleanup logprob params ( #4039 )
...
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-05-07 00:50:16 +08:00
Erin
8fe7bdeacf
feat: LogitsProcessor in PyTorch backend ( #3145 )
...
* support lp in pytorch backend
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
* fix tp
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
---------
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-05-01 14:15:30 -07:00
Erin
83f37614ef
feat: Support Top-K logprobs and prompt_logprobs in LLMAPI ( #3388 )
...
* support return logprob in llmapi
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
update and add test
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
stability test
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
* revert removal of old flag
Signed-off-by: Erin Ho <erinh@nvidia.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
---------
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Erin Ho <erinh@nvidia.com>
2025-05-01 12:47:14 -04:00
Yan Chunwei
b21cfcfed1
chore: refactor the LlmArgs with Pydantic and migrate remaining pybinding configs to python ( #3025 )
...
* make LlmArgs Pydantic
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* amending doc
fix api_stability
fix tests
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* restore yaml groups
refine StackTrace
singleton
clean tests
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* fix trtllm-bench
fix pytorch
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* fix serve distagg
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* fix
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
---------
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-04-05 13:31:48 +08:00
wili
34e63d07e6
feat: Variable-Beam-Width-Search (VBWS) Part2 ( #3133 )
...
* feat: Variable-Beam-Width-Search Part2
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat: Variable-Beam-Width-Search Part2
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat: Variable-Beam-Width-Search Part2, fix CPP tests
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat: Variable-Beam-Width-Search Part3, simplify CPP tests
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat: Variable-Beam-Width-Search Part4, move beam_width_array param
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat: Variable-Beam-Width-Search, fix CI error
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat: Variable-Beam-Width-Search part2
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat: Variable-Beam-Width-Search part2
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat: Variable-Beam-Width-Search part2, fix pre-commit
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
* feat: Variable-Beam-Width-Search part2, fix review
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
---------
Signed-off-by: wili-65535 <wili-65535@user.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@user.noreply.github.com>
2025-04-02 12:31:28 +08:00
wili
3e035f2219
v1.2 ( #3082 )
...
Signed-off-by: wili <wili@nvidia.com>
2025-03-26 23:31:29 +08:00
Enwei Zhu
224469b096
test: [TRTLLM-4334] Create 1.0 criteria scope from API stability references ( #3069 )
...
* committed APIs validation
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* clean name
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* separate
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* add TODOs
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix naming
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-03-26 18:14:35 +08:00