Lucas Liebenwein
5faa5e9dd8
[None][feat] AutoDeploy: dive deeper into token generation bugs + enable_block_reuse ( #8108 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-10-03 04:57:26 -07:00
Lucas Liebenwein
dcfd3ef81c
[ #4593 ][feat] AutoDeploy: Linear Attention Support (SSM + causal_conv + Bamba + Nemotron-H) ( #8068 )
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Co-authored-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-09-29 22:41:06 -04:00
Tailing Yuan
b11ee868c5
[ https://nvbugs/5495789 ][feat] Optionally disable server GC and worker GC ( #7995 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-09-26 21:39:24 +08:00
Guoming Zhang
202bed4574
[None][chroe] Rename TensorRT-LLM to TensorRT LLM for source code. ( #7851 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Yuan Tong
f050b8d871
[None][fix] refine backend option handling for commands ( #7829 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-09-24 10:54:33 +08:00
Tailing Yuan
740340dd17
[ https://nvbugs/5522847 ][fix] Disable GC on disagg server and client ( #7858 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-09-23 09:16:55 +08:00
Wanli Jiang
a7ca0fff54
[TRTLLM-6577][feat] Support nano_v2_vlm in pytorch backend ( #7207 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-09-18 16:26:20 +08:00
Iman Tabrizian
bc84758626
[None][feat] Add logging for OAI disagg server ( #7232 )
2025-08-26 21:02:03 -07:00
Zheng Duan
cf50ba2980
[TRTLLM-6549][feat] add perf metrics endpoint to openai server and openai disagg server ( #6985 )
...
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2025-08-26 15:34:44 +08:00
shaharmor98
b32e00e9fd
[None][chore] remove CLI support for mamba cache dtype setting ( #7119 )
...
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-08-25 08:08:51 -04:00
Suyog Gupta
e3de5758a3
[ #7136 ][feat] trtllm-serve + autodeploy integration ( #7141 )
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-08-22 08:30:53 -07:00
Yechan Kim
0893afae3d
[TRTLLM-6771][feat] Support MMMU for multimodal models ( #6828 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-08-21 08:54:12 +08:00
Chang Liu
ce53832610
[TRTLLM-7326][feat] Add standalone multimodal encoder ( #6743 )
...
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-08-19 21:42:50 -07:00
rakib-hasan
7ab8112450
[None][fix] Refactoring to avoid circular import when importing torch models ( #6720 )
...
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
2025-08-11 18:00:42 -04:00
shaharmor98
14b36e07d7
[TRTLLM-6174][feat] Enable FP32 mamba ssm cache ( #6574 )
...
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-08-10 16:27:51 -04:00
Haohang Huang
c9eebcb454
[TRTLLM-6674][feat] (Breaking Change) Hopper SWA non-cyclic kernels + KV reuse + Spec Dec ( #6379 )
...
Signed-off-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
Signed-off-by: symphonylyh <31998628+symphonylyh@users.noreply.github.com>
2025-08-05 07:47:41 +00:00
Michal Guzek
08d57123f9
[nvbug/5374773] chore: Add a runtime flag to enable fail fast when attn window is too large to fit at least one sequence in KV cache ( #5974 )
...
Signed-off-by: moraxu <mguzek@nvidia.com>
2025-07-25 18:10:40 -04:00
Pengyun Lin
9832bef07d
[BREAKING CHANGE]: change default backend to PyTorch in trtllm-serve ( #5717 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-07-21 21:09:43 +08:00
Linda
3efad2e58c
feat: nanobind bindings ( #6185 )
...
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-07-21 08:56:57 +01:00
Rashid Kaleem
152e2df43b
[Disaggregated] Add retry knobs and handling ( #5808 )
...
Signed-off-by: Rashid Kaleem <4079439+arekay@users.noreply.github.com>
Signed-off-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com>
Co-authored-by: Shi Xiaowei <39303645+Shixiaowei02@users.noreply.github.com>
2025-07-19 07:27:59 +08:00
Iman Tabrizian
b75e53ab69
Revert "feat: nanobind bindings ( #5961 )" ( #6160 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-07-18 10:12:54 +08:00
Linda
5bff317abf
feat: nanobind bindings ( #5961 )
...
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2025-07-17 22:42:52 +08:00
Chuang Zhu
44c70c88f9
chore:[BREAKING CHANGE] use cacheTransceiverConfig as knobs for disagg service ( #5234 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-07-17 17:42:07 +08:00
Yan Chunwei
9bd42ecf9b
[TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default ( #5312 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-20 03:01:10 +08:00
Enwei Zhu
babdd9ce06
test: Add json_mode_eval for guided decoding evaluation ( #5179 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-06-16 10:03:55 +08:00
Yan Chunwei
c84e41fd9d
fix: build_config in TorchLlmArgs and avoid arbitrary args ( #4972 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-15 17:51:56 -07:00
nv-guomingz
58d4ca2385
fix:remove duplicated trust_remote_code knob from trtllm-serve ( #5143 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-06-12 19:48:24 +08:00
Yechan Kim
8b4104d34a
feat: add HyperCLOVAX-SEED-Vision support in refactored way ( #4799 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-06-09 11:04:04 +08:00
Shunkangz
3eae58ca36
Add disaggregated unittest ( #4899 )
...
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-06-05 19:14:31 +08:00
QI JUN
b8c5e3892b
Revert "fix: build_config in TorchLlmArgs and avoid invalid args" ( #4949 )
...
Signed-off-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-06-05 17:43:30 +08:00
Yan Chunwei
ac20159d32
fix: build_config in TorchLlmArgs and avoid invalid args ( #4600 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-06-04 13:17:29 +08:00
Shunkangz
ae9a6cf24f
feat: Add integration of etcd ( #3738 )
...
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Signed-off-by: BatshevaBlack <132911331+BatshevaBlack@users.noreply.github.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Batsheva Black <bblack@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: BatshevaBlack <132911331+BatshevaBlack@users.noreply.github.com>
2025-06-03 20:01:44 +08:00
Yan Chunwei
93c0632ee4
opt: the perormance for dist-agg streaming generation ( #4214 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-31 17:40:32 +08:00
Yan Chunwei
5506f60037
chore [BREAKING CHANGE]: Flatten PyTorchConfig knobs into TorchLlmArgs ( #4603 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-28 18:43:04 +08:00
Chuang Zhu
4318037ca3
fix disagg config params ( #4646 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-05-26 23:28:52 +08:00
Zheng Duan
77a0189554
feat: conditional disaggregation in disagg server ( #3974 )
2025-05-21 09:57:46 +08:00
Kaiyu Xie
b4e5df0ee0
Breaking change: perf: Enable scheduling overlap by default ( #4174 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-05-15 14:27:36 +08:00
Zheng Duan
c9e2a963e0
feat: add kv cache aware router ( #3831 )
...
* kv cache aware router
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
* add tests
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
* router config
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
* eviction test
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
add test
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
* eviction detect in worker test
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
* move worker tests to single gpu
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
* reduce memory fraction
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
* fix partial block
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
---------
Signed-off-by: Zheng Duan <200704041+zhengd-nv@users.noreply.github.com>
2025-05-12 07:23:57 -04:00
pcastonguay
836c142e1b
[feat] Allow overriding cli args with yaml file in trtllm-serve ( #4164 )
...
feat: Allow overriding cli args with yaml file in trtllm-serve
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-05-08 21:19:05 -04:00
pansicheng
e84dc6b3c7
feat: add deepseek-r1 reasoning parser to trtllm-serve ( #3354 )
...
* add deepseek-r1 reasoning parser
Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com>
* fix test
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
---------
Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com>
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Co-authored-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-05-06 08:13:04 +08:00
Zongfei Jing
1e5af736ea
Add smart router for moe ( #3641 )
...
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-04-23 12:21:59 +08:00
Xianjie Qiao
ba4131f176
Add log_level for disaggregated_mpi_worker ( #3765 )
...
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
2025-04-22 09:14:46 -07:00
Enwei Zhu
3fa19ffa4e
test [TRTLLM-4477,TRTLLM-4481]: Accuracy test improvement (Part 3.5): Support GSM8K and GPQA ( #3483 )
...
* add gsm8k
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix gsm8k
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* add gpqa
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* conditional import lm_eval
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* gpqa in lm_eval
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* system prompt
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* shuffle
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* update AA prompt and regex
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* revert AA prompt and regex
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* integration to tests
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* add DS-R1
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix and clean
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* update tests
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* update
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* clean up
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* free_gpu_memory_fraction=0.8
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-22 07:38:16 +08:00
pcastonguay
ae5671644a
feat: Disaggregated router class ( #3584 )
...
* Add draft scheduler class
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
* Refactor the design
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
* feat: Introduce router class for disaggregated server
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Add unit tests for router class
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Adding tests for disagg_utils
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing missing import
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing disagg integration tests
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Addressing MR review comments
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
---------
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-04-19 00:34:12 +08:00
Pengyun Lin
60e02a3684
Use llm.tokenizer in OpenAIServer ( #3199 )
...
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-04-08 14:55:02 +08:00
Yan Chunwei
b21cfcfed1
chore: refactor the LlmArgs with Pydantic and migrate remaining pybinding configs to python ( #3025 )
...
* make LlmArgs Pydantic
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* amending doc
fix api_stability
fix tests
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* restore yaml groups
refine StackTrace
singleton
clean tests
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* fix trtllm-bench
fix pytorch
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* fix serve distagg
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
* fix
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
---------
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-04-05 13:31:48 +08:00
pcastonguay
b763051ba4
chore: Refactor disaggregated serving scripts ( #3073 )
...
* chore: Refactor to reduce duplicated code in disagg server, reuse trtllm-serve
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Updating README, removing launch script
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Fixing integration tests
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
* Adding scripts to populate urls section of disagg config based on SLURM env vars
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
---------
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-04-03 14:55:05 -04:00
Enwei Zhu
b2f69db507
test: Accuracy test improvement (Part 3.1): Extend accuracy test suite with LLM API and initial implementation of trtllm-eval ( #3167 )
...
* add eval_llmapi
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
tmp commit
port to CLI tool
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
move
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
setup llmapi
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
fix spec_dec_algo
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
_update_from_hf_quant_config
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
migrate test_pytorch.py
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
fix fp8 block scales
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
fix fp8 rowwise
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
adj alpha
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
move test_pytorch.py cases
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
move
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
rename test_accuracy.py to test_cli.py
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
clean
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix cnn_dailymail
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* renaming to cli flow
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* rename MMLU
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* rename
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* add error
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
* fix
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
---------
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-01 22:20:29 +08:00
Kaiyu Xie
2631f21089
Update ( #2978 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-03-23 16:39:35 +08:00
Kaiyu Xie
77d7fe1eb2
Update TensorRT-LLM ( #2849 )
...
* Update TensorRT-LLM
---------
Co-authored-by: aotman <chenhangatm@gmail.com>
2025-03-04 18:44:00 +08:00