Shunkangz
|
32339d1b20
|
Raise shut down error for each request (#4936)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
|
2025-07-04 18:58:24 +09:00 |
|
nv-guomingz
|
8dad22cbe7
|
chore: refine the default value by using pydantic default instead of … (#5695)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-07-03 22:41:29 +09:00 |
|
Yan Chunwei
|
2d69b55fe8
|
chore: enhance yaml loading arbitrary options in LlmArgs (#5610)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-07-02 14:21:37 +08:00 |
|
HuiGao-NV
|
10c50515c2
|
fix: Add back allreduce_strategy parameter into TorchLlmArgs (#5637)
Signed-off-by: Hui Gao <huig@nvidia.com>
|
2025-07-02 09:49:20 +08:00 |
|
Aurelien Chartier
|
efef911f5e
|
fix: add missing self. from PR #5346 (#5653)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
|
2025-07-01 20:38:55 -04:00 |
|
Aurelien Chartier
|
fa95e402a5
|
feat: add LLmArgs option to force using dynamic quantization (#5346)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
|
2025-07-01 12:16:09 -07:00 |
|
Kaiyu Xie
|
f9a455651b
|
perf: Use tokenizers API to optimize incremental detokenization perf (#5574)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-07-01 09:35:25 -04:00 |
|
nv-guomingz
|
6e48ac25a6
|
chore: remove cuda_graph_ prefix from cuda_graph_config field members. (#5585)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-06-30 12:23:14 -04:00 |
|
Yan Chunwei
|
98a7c24062
|
chore [TRTLLM-6009]: remove ptuning knobs from TorchLlmArgs (#5595)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-06-30 20:40:23 +08:00 |
|
nv-guomingz
|
578430e64c
|
[TRTLLM-5530][BREAKING CHANGE]: enhance the llm args pytorch config part 1(cuda_graph_config) (#5014)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-06-30 11:05:40 +08:00 |
|
Lucas Liebenwein
|
619709fc33
|
[AutoDeploy] merge feat/ad-2025-06-13 (#5556)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
|
2025-06-29 03:52:14 +08:00 |
|
wili
|
56cdfe5c6c
|
[TRTLLM-5000][feat] NGrams V2 (#4569)
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
|
2025-06-27 23:00:17 +08:00 |
|
Robin Kobus
|
8dfa31c71d
|
refactor: remove batch_manager::KvCacheConfig and use executor::KvCacheConfig instead (#5384)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
|
2025-06-26 19:45:52 +08:00 |
|
QI JUN
|
3a2c4ca77b
|
chore: split _build_model method for TorchLlm and TrtLlm (#5418)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
|
2025-06-26 04:32:46 +08:00 |
|
Enwei Zhu
|
fc7a81ceb0
|
test: Add LLGuidance test and refine guided decoding (#5348)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-06-25 14:12:56 +08:00 |
|
Shunkangz
|
d5354897c0
|
feat: Dynamically remove servers in PD (#5270)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
|
2025-06-25 09:50:04 +08:00 |
|
QI JUN
|
d93a5e04b5
|
Chore: remove unused variables (#5314)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
|
2025-06-24 22:27:32 +08:00 |
|
Fanrong Li
|
5d4ab47d5b
|
fix: refactor and fix mtp vanilla (#4762)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
|
2025-06-20 05:23:39 +08:00 |
|
Yan Chunwei
|
9bd42ecf9b
|
[TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default (#5312)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-06-20 03:01:10 +08:00 |
|
Kaiyu Xie
|
7246fd75d1
|
feat: Support stream_interval (#5284)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
|
2025-06-19 21:57:10 +08:00 |
|
nv-guomingz
|
6a388b105a
|
chore: remove torch_compile prefix for TorchCompileConfig field members (#5261)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-06-19 09:21:51 +08:00 |
|
Yan Chunwei
|
3946e798db
|
fix[nvbug5298640]: trtllm-llmapi-launch multiple LLM instances (#4727)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-06-19 06:13:53 +08:00 |
|
jellysnack
|
0623ffe3bc
|
feat: Add LLGuidance Support for PyTorch Backend (#5214)
Signed-off-by: jellysnack <oleg.jellysnack@gmail.com>
Signed-off-by: jellysnack <158609015+jellysnack@users.noreply.github.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-06-18 19:33:34 +08:00 |
|
Yan Chunwei
|
724e495254
|
chore: partition LLM class into TorchLLM and TrtLLM (#4900)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-06-18 14:01:25 +08:00 |
|
Izzy Putterman
|
e607768e45
|
Speculation: Draft Target in new FW (#4558)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
|
2025-06-17 02:26:08 +08:00 |
|
Yilin Fan
|
dd29063538
|
[feat] Add llm args to tune python gc threshold (#5141)
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
|
2025-06-16 17:45:22 +08:00 |
|
Yan Chunwei
|
c84e41fd9d
|
fix: build_config in TorchLlmArgs and avoid arbitrary args (#4972)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-06-15 17:51:56 -07:00 |
|
amitz-nv
|
109c426077
|
Enable trtllm-bench to run LoRA and add basic e2e perf testing capability for LoRA in PyT flow (#5130)
|
2025-06-15 18:54:04 +03:00 |
|
ixlmar
|
e055af1bc9
|
chore: improve disagg test failure detection (#4738)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
|
2025-06-15 01:28:26 +08:00 |
|
nv-guomingz
|
3b7b5a5ad5
|
refactor [BREAKING CHANGE]: enhance the llm args pytorch config part 3(torch_compile_config) (#5032)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-06-14 14:23:13 +08:00 |
|
nv-guomingz
|
b959618579
|
refactor [BREAKING CHANGE]: remove the redundant use_kv_cache field from PytorchConfig (#5031)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-06-13 16:34:24 +08:00 |
|
Lucas Liebenwein
|
7ddc4d6282
|
[AutoDeploy] Merge Feature Branch Week 3 (#5054)
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
|
2025-06-11 00:20:43 +08:00 |
|
Yuxian Qiu
|
08dc369a4d
|
fix: pytorch_backend_config is deprecated in update_llm_args_with_extra_dict. (#4890)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
|
2025-06-10 18:40:29 +08:00 |
|
Chang Liu
|
f70815c945
|
[TRTLLM-5007][feat] Add multimodal hashing support (image hashing) (#4145)
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>
|
2025-06-10 01:59:56 +08:00 |
|
QI JUN
|
5ee0de7f2a
|
Resubmit #4894 (#4969)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
|
2025-06-08 04:42:15 +08:00 |
|
QI JUN
|
bfa877a22e
|
Fix: fix autodeploy (#4957)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
|
2025-06-05 21:06:55 +08:00 |
|
QI JUN
|
b8c5e3892b
|
Revert "fix: build_config in TorchLlmArgs and avoid invalid args" (#4949)
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
|
2025-06-05 17:43:30 +08:00 |
|
Lucas Liebenwein
|
743fb0a159
|
[AutoDeploy] _AutoDeployLlmArgs as primary config object (#4891)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
|
2025-06-05 17:20:55 +08:00 |
|
Yi Zhang
|
1fca654bfd
|
tests: Update gb200 test case (#4754)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
|
2025-06-04 18:49:20 +08:00 |
|
Yan Chunwei
|
ac20159d32
|
fix: build_config in TorchLlmArgs and avoid invalid args (#4600)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-06-04 13:17:29 +08:00 |
|
Shunkangz
|
ae9a6cf24f
|
feat: Add integration of etcd (#3738)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Signed-off-by: BatshevaBlack <132911331+BatshevaBlack@users.noreply.github.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Batsheva Black <bblack@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: BatshevaBlack <132911331+BatshevaBlack@users.noreply.github.com>
|
2025-06-03 20:01:44 +08:00 |
|
Enwei Zhu
|
0087bd27ba
|
[fix] Fix SamplingParams check on n and best_of (#4655)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-06-01 09:11:55 +08:00 |
|
Enwei Zhu
|
25dde49c28
|
fix: EP load balancer with MTP layer and route offset by EP rank (#4767)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-06-01 00:07:44 +08:00 |
|
Yilin Fan
|
31bb650298
|
Cherry pick feat/llama4 to main (#4739)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
Co-authored-by: Chenfei Zhang <chenfeiz@nvidia.com>
|
2025-05-30 05:28:40 +08:00 |
|
Yan Chunwei
|
5506f60037
|
chore [BREAKING CHANGE]: Flatten PyTorchConfig knobs into TorchLlmArgs (#4603)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-05-28 18:43:04 +08:00 |
|
amirkl94
|
fbec0c3552
|
Release 0.20 to main (#4577)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: moraxu <mguzek@nvidia.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Co-authored-by: Venky <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: ruodil <200874449+ruodil@users.noreply.github.com>
Co-authored-by: stnie <82932102+stnie@users.noreply.github.com>
Co-authored-by: Simeng Liu <109828133+SimengLiu-nv@users.noreply.github.com>
Co-authored-by: Faraz <58580514+farazkh80@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
Co-authored-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
|
2025-05-28 16:25:33 +08:00 |
|
Shunkangz
|
fd27f89df6
|
fix: Remove duplicate tokenization in generation server (#4492)
* Add nvtx
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
* Add draft change
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
* Refactor and add support of chat
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
---------
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
|
2025-05-26 16:43:07 +08:00 |
|
pcastonguay
|
d7d455e7ea
|
[feat][TRTLLM-5018] Dis serving python runtime trt backend (#4243)
* feat: Enabling dis serving with TRT backend with Python runtime
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing formatting
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing disagg mtp test
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
---------
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
|
2025-05-22 22:01:06 -04:00 |
|
Chuang Zhu
|
3410508020
|
cache_transceiver_config (#4556)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
|
2025-05-22 13:59:51 +08:00 |
|
Yan Chunwei
|
4798d088d9
|
chore: Partition LlmArgs into TorchLlmArgs and TrtLlmArgs (#3823)
* partition LlmArgs
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* update backend
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
---------
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
|
2025-05-22 09:40:56 +08:00 |
|