Pengyun Lin | 388b4919b8 | 2025-07-14 17:17:30 +08:00
[nvbug 5304752][fix] enhance _check_arguments to filter illegal requests for pytorch backend (#5541)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>

dominicshanshan | c9e7f831dc | 2025-07-14 16:42:23 +08:00
Breaking change: perf: [TRTLLM-4662] Enable cuda graph by default (#5480)
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>

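With #5480, CUDA graphs are enabled by default on the PyTorch backend. Below is a minimal sketch of tuning or opting out via cuda_graph_config; the CudaGraphConfig class and its field names (batch_sizes, enable_padding) are assumptions inferred from the cuda_graph_config entries further down this log, not verified API.

```python
# Sketch only: CudaGraphConfig and its fields (batch_sizes, enable_padding)
# are assumed from the cuda_graph_config entries in this log.
from tensorrt_llm import LLM
from tensorrt_llm.llmapi import CudaGraphConfig

# Tune CUDA graph capture explicitly...
llm = LLM(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    cuda_graph_config=CudaGraphConfig(batch_sizes=[1, 2, 4], enable_padding=True),
)

# ...or opt out of the new default by passing None (assumed to restore the
# pre-#5480 behaviour of running without CUDA graphs).
llm_no_graphs = LLM(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    cuda_graph_config=None,
)
```
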
wili | cfcb97af0e | 2025-07-14 14:33:39 +09:00
[BUG5388075][fix] Fix error in post-merge-tests (#5949)
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>

Mike Iovine | 8950223f6f | 2025-07-12 21:03:24 +09:00
[fix] Remove SpecConfig and fix thread leak issues (#5931)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>

wili | 2e3cf42e03 | 2025-07-10 11:37:30 -04:00
[refactor] Simplification of Speculative decoding configs (#5639)
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>

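#5639 and #5931 above replace SpecConfig with per-algorithm speculative-decoding configs, and #5204/#4558 below add draft-target support. A rough sketch of wiring one such config through speculative_config; the DraftTargetDecodingConfig name and its fields are illustrative assumptions, not confirmed by this log.

```python
# Sketch only: DraftTargetDecodingConfig and its field names are illustrative
# assumptions; the entries above only confirm that speculative decoding is now
# configured through dedicated config objects rather than SpecConfig.
from tensorrt_llm import LLM
from tensorrt_llm.llmapi import DraftTargetDecodingConfig

spec_config = DraftTargetDecodingConfig(
    max_draft_len=4,                              # assumed field name
    speculative_model_dir="/models/draft-model",  # assumed field name
)

llm = LLM(
    model="/models/target-model",
    speculative_config=spec_config,  # assumed kwarg on the LLM entry point
)
```
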
Enwei Zhu | 055c4a9fe6 | 2025-07-10 16:30:00 +08:00
[NvBug 5370718, 5371538] fix: Fix incremental detokenization (#5825)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

Yan Chunwei | 07f6da763d | 2025-07-10 11:31:35 +08:00
[TRTLLM-5530] chore: rename LLM.autotuner_enabled to enable_autotuner (#5876)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

nv-guomingz | 0be41b6524 | 2025-07-08 13:15:30 +09:00
Revert "chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie…" (#5818)

Yechan Kim | 5bc3a15f10 | 2025-07-07 18:03:12 -07:00
feat: add MultimodalParams & putting all multimodal params into it and refactor HyperCLOVAX & Qwen2/2.5-VL (#5522)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>

nv-guomingz | 5a8173c121 | 2025-07-08 08:52:36 +08:00
chore: [Breaking Change] Rename cuda_graph_config padding_enabled fie… (#5795)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

Robin Kobus | 30a19fcf7c | 2025-07-07 16:30:43 +02:00
[TRTLLM-6291] feat: Add user-provided speculative decoding support (#5204)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

Yan Chunwei | dfce61f4b9 | 2025-07-07 17:05:14 +08:00
[TRTLLM-5530][BREAKING CHANGE] refactor: LLM arglist rename mixed_sampler to enable_mixed_sampler (#5751)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

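Two of the renames above change LLM constructor knobs in place: autotuner_enabled becomes enable_autotuner (#5876) and mixed_sampler becomes enable_mixed_sampler (#5751). A minimal migration sketch, assuming both remain plain boolean keyword arguments on LLM:

```python
# Sketch of the renamed boolean knobs; both new names appear in the log above,
# but treating them as direct LLM constructor kwargs is an assumption.
from tensorrt_llm import LLM

# Before (#5876 / #5751):
# llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct", autotuner_enabled=False, mixed_sampler=True)

# After:
llm = LLM(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    enable_autotuner=False,
    enable_mixed_sampler=True,
)
```
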
HuiGao-NV | 3ed3bbcb5d | 2025-07-04 21:32:13 +09:00
Fix: pass allreduce strategy to pytorchConfig (#5746)
Signed-off-by: Hui Gao <huig@nvidia.com>

Shunkangz | 32339d1b20 | 2025-07-04 18:58:24 +09:00
Raise shut down error for each request (#4936)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>

nv-guomingz | 8dad22cbe7 | 2025-07-03 22:41:29 +09:00
chore: refine the default value by using pydantic default instead of … (#5695)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

Yan Chunwei | 2d69b55fe8 | 2025-07-02 14:21:37 +08:00
chore: enhance yaml loading arbitrary options in LlmArgs (#5610)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

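#5610 lets LlmArgs pick up arbitrary extra options loaded from YAML. A sketch of that flow using update_llm_args_with_extra_dict, a helper named later in this log (#4890); its import path and the YAML keys shown here are assumptions.

```python
# Sketch only: update_llm_args_with_extra_dict is named in this log (#4890);
# its import path and the accepted YAML keys are assumptions.
import yaml

from tensorrt_llm import LLM
from tensorrt_llm.llmapi.llm_utils import update_llm_args_with_extra_dict

llm_args = {"model": "Qwen/Qwen2.5-0.5B-Instruct"}

# extra_llm_api_options.yaml might carry backend knobs such as
# cuda_graph_config or stream_interval (both referenced elsewhere in this log).
with open("extra_llm_api_options.yaml") as f:
    extra = yaml.safe_load(f)

llm_args = update_llm_args_with_extra_dict(llm_args, extra)
llm = LLM(**llm_args)
```
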
HuiGao-NV | 10c50515c2 | 2025-07-02 09:49:20 +08:00
fix: Add back allreduce_strategy parameter into TorchLlmArgs (#5637)
Signed-off-by: Hui Gao <huig@nvidia.com>

Aurelien Chartier | efef911f5e | 2025-07-01 20:38:55 -04:00
fix: add missing self. from PR #5346 (#5653)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>

Aurelien Chartier | fa95e402a5 | 2025-07-01 12:16:09 -07:00
feat: add LLmArgs option to force using dynamic quantization (#5346)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>

Kaiyu Xie | f9a455651b | 2025-07-01 09:35:25 -04:00
perf: Use tokenizers API to optimize incremental detokenization perf (#5574)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

nv-guomingz | 6e48ac25a6 | 2025-06-30 12:23:14 -04:00
chore: remove cuda_graph_ prefix from cuda_graph_config field members. (#5585)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

Yan Chunwei | 98a7c24062 | 2025-06-30 20:40:23 +08:00
chore [TRTLLM-6009]: remove ptuning knobs from TorchLlmArgs (#5595)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

nv-guomingz | 578430e64c | 2025-06-30 11:05:40 +08:00
[TRTLLM-5530][BREAKING CHANGE]: enhance the llm args pytorch config part 1(cuda_graph_config) (#5014)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

Lucas Liebenwein | 619709fc33 | 2025-06-29 03:52:14 +08:00
[AutoDeploy] merge feat/ad-2025-06-13 (#5556)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

wili | 56cdfe5c6c | 2025-06-27 23:00:17 +08:00
[TRTLLM-5000][feat] NGrams V2 (#4569)
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>

Robin Kobus | 8dfa31c71d | 2025-06-26 19:45:52 +08:00
refactor: remove batch_manager::KvCacheConfig and use executor::KvCacheConfig instead (#5384)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

QI JUN | 3a2c4ca77b | 2025-06-26 04:32:46 +08:00
chore: split _build_model method for TorchLlm and TrtLlm (#5418)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

Enwei Zhu | fc7a81ceb0 | 2025-06-25 14:12:56 +08:00
test: Add LLGuidance test and refine guided decoding (#5348)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

Shunkangz | d5354897c0 | 2025-06-25 09:50:04 +08:00
feat: Dynamically remove servers in PD (#5270)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>

QI JUN | d93a5e04b5 | 2025-06-24 22:27:32 +08:00
Chore: remove unused variables (#5314)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>

Fanrong Li | 5d4ab47d5b | 2025-06-20 05:23:39 +08:00
fix: refactor and fix mtp vanilla (#4762)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>

Yan Chunwei | 9bd42ecf9b | 2025-06-20 03:01:10 +08:00
[TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default (#5312)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

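After #5312, constructing LLM without selecting a backend gives the PyTorch backend. Basic usage is unchanged; a small sketch (the model name and sampling values are placeholders):

```python
from tensorrt_llm import LLM, SamplingParams

# With #5312, a plain LLM(...) now builds the PyTorch-backend LLM by default;
# no backend flag is needed for the common path.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

outputs = llm.generate(
    ["The capital of France is"],
    SamplingParams(max_tokens=32, temperature=0.0),
)
print(outputs[0].outputs[0].text)
```
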
Kaiyu Xie | 7246fd75d1 | 2025-06-19 21:57:10 +08:00
feat: Support stream_interval (#5284)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

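#5284 introduces stream_interval, which controls how often partial results are surfaced when streaming. A sketch that treats it as an LLM constructor argument, which is an assumption for illustration:

```python
# Sketch only: stream_interval is named in #5284; treating it as an LLM
# constructor kwarg is an assumption for illustration.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    stream_interval=4,  # surface streamed deltas every 4 tokens instead of every token
)

# Non-streaming calls are unaffected; stream_interval only changes how often
# partial results are returned when streaming is requested.
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```
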
nv-guomingz | 6a388b105a | 2025-06-19 09:21:51 +08:00
chore: remove torch_compile prefix for TorchCompileConfig field members (#5261)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

Yan Chunwei | 3946e798db | 2025-06-19 06:13:53 +08:00
fix[nvbug5298640]: trtllm-llmapi-launch multiple LLM instances (#4727)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

jellysnack | 0623ffe3bc | 2025-06-18 19:33:34 +08:00
feat: Add LLGuidance Support for PyTorch Backend (#5214)
Signed-off-by: jellysnack <oleg.jellysnack@gmail.com>
Signed-off-by: jellysnack <158609015+jellysnack@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

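#5214 (with the #5348 tests above) adds LLGuidance as a guided-decoding backend for the PyTorch path. A sketch of schema-constrained generation; the "llguidance" backend string, the GuidedDecodingParams import path, and its json field are assumptions based on these entries:

```python
# Sketch only: "llguidance" as a guided_decoding_backend value and the exact
# GuidedDecodingParams import path and fields are assumptions.
from tensorrt_llm import LLM, SamplingParams
from tensorrt_llm.sampling_params import GuidedDecodingParams

llm = LLM(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    guided_decoding_backend="llguidance",
)

schema = '{"type": "object", "properties": {"answer": {"type": "string"}}, "required": ["answer"]}'

outputs = llm.generate(
    ["Reply as JSON: what is the capital of France?"],
    SamplingParams(
        max_tokens=64,
        guided_decoding=GuidedDecodingParams(json=schema),
    ),
)
print(outputs[0].outputs[0].text)
```
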
Yan Chunwei | 724e495254 | 2025-06-18 14:01:25 +08:00
chore: partition LLM class into TorchLLM and TrtLLM (#4900)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

Izzy Putterman | e607768e45 | 2025-06-17 02:26:08 +08:00
Speculation: Draft Target in new FW (#4558)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>

Yilin Fan | dd29063538 | 2025-06-16 17:45:22 +08:00
[feat] Add llm args to tune python gc threshold (#5141)
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>

Yan Chunwei | c84e41fd9d | 2025-06-15 17:51:56 -07:00
fix: build_config in TorchLlmArgs and avoid arbitrary args (#4972)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

amitz-nv | 109c426077 | 2025-06-15 18:54:04 +03:00
Enable trtllm-bench to run LoRA and add basic e2e perf testing capability for LoRA in PyT flow (#5130)

ixlmar | e055af1bc9 | 2025-06-15 01:28:26 +08:00
chore: improve disagg test failure detection (#4738)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>

nv-guomingz | 3b7b5a5ad5 | 2025-06-14 14:23:13 +08:00
refactor [BREAKING CHANGE]: enhance the llm args pytorch config part 3(torch_compile_config) (#5032)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

nv-guomingz | b959618579 | 2025-06-13 16:34:24 +08:00
refactor [BREAKING CHANGE]: remove the redundant use_kv_cache field from PytorchConfig (#5031)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>

Lucas Liebenwein | 7ddc4d6282 | 2025-06-11 00:20:43 +08:00
[AutoDeploy] Merge Feature Branch Week 3 (#5054)
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>

Yuxian Qiu | 08dc369a4d | 2025-06-10 18:40:29 +08:00
fix: pytorch_backend_config is deprecated in update_llm_args_with_extra_dict. (#4890)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>

Chang Liu | f70815c945 | 2025-06-10 01:59:56 +08:00
[TRTLLM-5007][feat] Add multimodal hashing support (image hashing) (#4145)
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>

QI JUN | 5ee0de7f2a | 2025-06-08 04:42:15 +08:00
Resubmit #4894 (#4969)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>

QI JUN | bfa877a22e | 2025-06-05 21:06:55 +08:00
Fix: fix autodeploy (#4957)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>

QI JUN | b8c5e3892b | 2025-06-05 17:43:30 +08:00
Revert "fix: build_config in TorchLlmArgs and avoid invalid args" (#4949)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>