Commit Graph

48 Commits

Author SHA1 Message Date
Yan Chunwei
ad4226d946
fix: trtllm-bench build trt engine on slurm (#3825)
* add submit_sync to RemoteMpiSessionClient

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

add barrier

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

fix comment

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

disable test

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* fix

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

---------

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-04-27 22:26:23 +08:00
Chuang Zhu
e2318756ed
cacheTransceiver buffer manager (#3798)
* cacheTransceiver buffer manager

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* fix args

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* cpp kvCacheManager

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

* format

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>

---------

Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-04-27 11:48:15 +08:00
milesial
362a8272f8
feat: llama4 input processor (#3383)
Signed-off-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>
Signed-off-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
Co-authored-by: Alexandre Milesi <30204471+milesial@users.noreply.github.com>
Co-authored-by: Haohang Huang <31998628+symphonylyh@users.noreply.github.com>
2025-04-25 16:47:14 -07:00
shaharmor98
49262a62a5
add passing E2E LoRA flow (#3788)
add passing E2E LoRA flow (#3788)

Signed-off-by: Shahar Mor <smor@nvidia.com>
2025-04-23 18:38:06 +03:00
Zongfei Jing
1e5af736ea
Add smart router for moe (#3641)
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-04-23 12:21:59 +08:00
shaharmor98
5fff8f0935
Add running E2E LoRA flow (#3648)
* add passing E2E LoRA flow

Signed-off-by: Shahar Mor <smor@nvidia.com>

* add experimental feature

Signed-off-by: Shahar Mor <smor@nvidia.com>

* fix llma_args definition

Signed-off-by: Shahar Mor <smor@nvidia.com>

* decreased manually size of max loras to address OOM

Signed-off-by: Shahar Mor <smor@nvidia.com>

---------

Signed-off-by: Shahar Mor <smor@nvidia.com>
2025-04-23 11:19:41 +08:00
Enwei Zhu
3fa19ffa4e
test [TRTLLM-4477,TRTLLM-4481]: Accuracy test improvement (Part 3.5): Support GSM8K and GPQA (#3483)
* add gsm8k

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix gsm8k

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* add gpqa

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* conditional import lm_eval

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* gpqa in lm_eval

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* system prompt

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* shuffle

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* update AA prompt and regex

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* revert AA prompt and regex

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* integration to tests

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* add DS-R1

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix and clean

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* update tests

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* update

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* clean up

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* free_gpu_memory_fraction=0.8

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

---------

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-22 07:38:16 +08:00
pcastonguay
ae5671644a
feat: Disaggregated router class (#3584)
* Add draft scheduler class

Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>

* Refactor the design

Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>

* feat: Introduce router class for disaggregated server

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

* Add unit tests for router class

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

* Adding tests for disagg_utils

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

* Fixing missing import

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

* Fixing disagg integration tests

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

* Addressing MR review comments

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

---------

Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-04-19 00:34:12 +08:00
Yan Chunwei
2a09826ec4
fix hmac in remote mpi session (#3649)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-04-18 17:47:51 +08:00
Enwei Zhu
44da0e8d60
fix: LLM API _hf_model_dir for non-cached case (#3562)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-16 10:39:34 +08:00
Kaiyu Xie
e037d3e99b
chore: Unify Python NVTX call (#3450)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-04-15 23:25:36 +08:00
bhsueh_NV
3aa37e6b72
fix bug (#3570)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-04-15 16:50:22 +08:00
Yuan Tong
668a0335e4
fix: Proper error bubbling for PyExecutor (#3321)
* fix: Proper error bubbling for PyExecutor
* fix: Proper shutdown
* fix: multi gpu proper shutdown

Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-04-15 14:49:46 +08:00
dongjiyingdjy
2fb1d65d43
fix: fix max_seq_len in executor_config (#3487)
Signed-off-by: Jiying Dong <87510204+dongjiyingdjy@users.noreply.github.com>
2025-04-14 15:13:29 +08:00
Yan Chunwei
b37c5c0a4d
make LLM-API slurm examples executable (#3402)
Signed-off-by: chunweiy <328693+Superjomn@users.noreply.github.com>
2025-04-13 21:42:45 +08:00
Yan Chunwei
74850c61e9
fix: switch ZMQ from file socket to tcp socket in RemoteMpiCommSession (#3462)
* switch ZMQ from file socket to tcp

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* fix comment

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

---------

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-04-13 09:15:55 +08:00
Enwei Zhu
cf9ceea890
test: Add DeepSeek-V3-Lite PP=4 cases (#3454)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-12 00:09:12 +08:00
yuxianq
7b03350527
Add thread leak check and fix thread/memory leak issues. (#3270)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-04-08 19:03:18 +08:00
Yan Chunwei
deb876ecdb
clean up trtllm-llmapi-launch logs (#3358)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-04-08 16:00:59 +08:00
tburt-nv
7a659885e3
chore: remove usernames from comments (#3291)
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2025-04-05 13:44:28 +08:00
Yan Chunwei
b21cfcfed1
chore: refactor the LlmArgs with Pydantic and migrate remaining pybinding configs to python (#3025)
* make LlmArgs Pydantic

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* amending doc

fix api_stability

fix tests

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* restore yaml groups

refine StackTrace

singleton

clean tests

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* fix trtllm-bench

fix pytorch

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* fix serve distagg

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* fix

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

---------

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-04-05 13:31:48 +08:00
pcastonguay
b763051ba4
chore: Refactor disaggregated serving scripts (#3073)
* chore: Refactor to reduce duplicated code in disagg server, reuse trtllm-serve

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

* Updating README, removing launch script

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

* Fixing integration tests

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

* Adding scripts to populate urls section of disagg config based on SLURM env vars

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>

---------

Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-04-03 14:55:05 -04:00
Enwei Zhu
b2f69db507
test: Accuracy test improvement (Part 3.1): Extend accuracy test suite with LLM API and initial implementation of trtllm-eval (#3167)
* add eval_llmapi

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

tmp commit

port to CLI tool

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

move

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

setup llmapi

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

fix spec_dec_algo

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

_update_from_hf_quant_config

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

migrate test_pytorch.py

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

fix fp8 block scales

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

fix fp8 rowwise

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

adj alpha

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

move test_pytorch.py cases

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

move

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

rename test_accuracy.py to test_cli.py

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

clean

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix cnn_dailymail

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* renaming to cli flow

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* rename MMLU

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* rename

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* add error

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

---------

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-04-01 22:20:29 +08:00
Shunkangz
dda7354d1a
Refactor return of first gen token in PD (#2986)
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-04-01 12:28:27 +08:00
Yan Chunwei
794f61c997
fix: fix single-node cannot quit issue on slurm (#3140)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-03-31 10:15:27 +08:00
Mike Iovine
5416966ddb
Add initial EAGLE-3 implementation (#3035)
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-03-29 22:31:24 +08:00
Erin
c75d7cd684
move BuildConfig functional args to llmargs (#3036)
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-03-29 02:20:18 +08:00
Enwei Zhu
224469b096
test: [TRTLLM-4334] Create 1.0 criteria scope from API stability references (#3069)
* committed APIs validation

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* clean name

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* separate

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* add TODOs

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix naming

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

* fix

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>

---------

Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-03-26 18:14:35 +08:00
Yan Chunwei
531b98ed62
feat: Add several pure python configs to LlmArgs (#2997)
* add SchedulerConfig
* add PeftCacheConfig
2025-03-24 16:16:17 +08:00
Kaiyu Xie
2631f21089
Update (#2978)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-03-23 16:39:35 +08:00
Kaiyu Xie
3aa6b11d13
Update TensorRT-LLM (#2936)
* Update TensorRT-LLM

---------

Co-authored-by: changcui <cuichang147@gmail.com>
2025-03-18 21:25:19 +08:00
Kaiyu Xie
9b931c0f63
Update TensorRT-LLM (#2873) 2025-03-11 21:13:42 +08:00
Kaiyu Xie
77d7fe1eb2
Update TensorRT-LLM (#2849)
* Update TensorRT-LLM

---------

Co-authored-by: aotman <chenhangatm@gmail.com>
2025-03-04 18:44:00 +08:00
Kaiyu Xie
ab5b19e027
Update TensorRT-LLM (#2820) 2025-02-25 21:21:49 +08:00
Kaiyu Xie
2ea17cdad2
Update TensorRT-LLM (#2792)
* Update TensorRT-LLM

---------

Co-authored-by: jlee <jungmoolee@clika.io>
2025-02-18 21:27:39 +08:00
Kaiyu Xie
e88da961c5
Update TensorRT-LLM (#2783) 2025-02-13 18:40:22 +08:00
Dan Blanaru
16d2467ea8 Update TensorRT-LLM (#2755)
* Update TensorRT-LLM

---------

Co-authored-by: Denis Kayshev <topenkoff@gmail.com>
Co-authored-by: akhoroshev <arthoroshev@gmail.com>
Co-authored-by: Patrick Reiter Horn <patrick.horn@gmail.com>

Update
2025-02-11 03:01:00 +00:00
Denis Kayshev
d93a2dde84
Fix kwarg name (#2691) 2025-01-20 12:18:26 +08:00
Kaiyu Xie
be17881062
Update TensorRT-LLM (#2582) 2024-12-16 21:50:47 -08:00
Kaiyu Xie
aaacc9bd68
Update TensorRT-LLM (#2562)
* Update TensorRT-LLM

---------

Co-authored-by: Starrick Liu <73152103+StarrickLiu@users.noreply.github.com>
2024-12-11 00:31:05 -08:00
石晓伟
548b5b7310
Update TensorRT-LLM (#2532)
* blossom-ci.yml: run vulnerability scan on blossom

* open source efb18c1256f8c9c3d47b7d0c740b83e5d5ebe0ec

---------

Co-authored-by: niukuo <6831097+niukuo@users.noreply.github.com>
Co-authored-by: pei0033 <59505847+pei0033@users.noreply.github.com>
Co-authored-by: Kyungmin Lee <30465912+lkm2835@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2024-12-04 21:16:56 +08:00
Kaiyu Xie
385626572d
Update TensorRT-LLM (#2502)
* Update TensorRT-LLM

---------

Co-authored-by: 岑灿 <yunyi.hyy@alibaba-inc.com>
2024-11-26 16:51:34 +08:00
Kaiyu Xie
535c9cc673
Update TensorRT-LLM (#2460) 2024-11-19 18:30:34 +08:00
Kaiyu Xie
c629546ce4
Update TensorRT-LLM (#2436) 2024-11-12 15:27:49 +08:00
Kaiyu Xie
b7868dd1bd
Update TensorRT-LLM (#2413) 2024-11-05 16:27:06 +08:00
Kaiyu Xie
f14d1d433c
Update TensorRT-LLM (#2389)
* Update TensorRT-LLM

---------

Co-authored-by: Alessio Netti <netti.alessio@gmail.com>
2024-10-29 22:24:38 +08:00
Kaiyu Xie
1730a587d8
Update TensorRT-LLM (#2363)
* Update TensorRT-LLM

---------

Co-authored-by: tonylek <137782967+tonylek@users.noreply.github.com>
2024-10-22 20:27:35 +08:00
Kaiyu Xie
75057cd036
Update TensorRT-LLM (#2333)
* Update TensorRT-LLM

---------

Co-authored-by: Puneesh Khanna <puneesh.khanna@tii.ae>
Co-authored-by: Ethan Zhang <26497102+ethnzhng@users.noreply.github.com>
2024-10-15 15:28:40 +08:00