Commit Graph

26 Commits

Author SHA1 Message Date
QI JUN
1529a6f22d
[None][chore] extract weights loading related logic to model loader (#7579)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-25 10:19:22 -07:00
Guoming Zhang
202bed4574 [None][chroe] Rename TensorRT-LLM to TensorRT LLM for source code. (#7851)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-25 21:02:35 +08:00
Frank
78ecfbb4a4
[None][fix] Fix data type of KV Cache percentage in bench. (#7230)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-08-26 12:28:09 -04:00
Frank
81fd468fec
[None][fix] Correct KV cache percentage report out. (#7102)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-08-22 10:28:57 -07:00
Frank
2cc59aacb3
[None][fix] Correct reporting of torch_dtype for ModelConfig class. (#6800)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-08-14 22:46:20 -04:00
hlu1
8207d5fd39
[None] [feat] Add model gpt-oss (#6645)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-08-07 03:04:18 -04:00
kris1025
6a3a921284
[TRTLLM-6685][feat] Add speculative metrics for trt llm bench (#6476)
Signed-off-by: linquanh <linquanh@nvidia.com>
2025-08-04 15:22:57 -07:00
Yan Chunwei
a02606a9e2
[TRTLLM-5530][BREAKING CHANGE] refactor: unify KvCacheConfig in LLM class for pytorch backend (#5752)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-07-16 16:42:59 +08:00
Frank
aa4eebe973
[enhance] Add the ability to write a request timeline. (#5258)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
Signed-off-by: Frank <3429989+FrankD412@users.noreply.github.com>
2025-07-10 17:15:30 -07:00
Lucas Liebenwein
743fb0a159
[AutoDeploy] _AutoDeployLlmArgs as primary config object (#4891)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-06-05 17:20:55 +08:00
Yilin Fan
31bb650298
Cherry pick feat/llama4 to main (#4739)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
Co-authored-by: Chenfei Zhang <chenfeiz@nvidia.com>
2025-05-30 05:28:40 +08:00
Yan Chunwei
5506f60037
chore [BREAKING CHANGE]: Flatten PyTorchConfig knobs into TorchLlmArgs (#4603)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-05-28 18:43:04 +08:00
Frank
57afbf6b79
Fix incorrect conversion. (#4112)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-05-09 06:34:52 +08:00
Frank
cf15efa15e
[TRTLLM-4883][fix]: Update output speed calculation. (#3923)
* Update gen tps calculation.

Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>

* Add back output speed for comparison.

Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>

* Fix issue with f-string.

Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>

* Fix some spacing.

Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>

* Replace output speed with per-request genphase tput.

Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>

* Add gen TPS breakdown.

Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>

* Update some tagging.

Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>

---------

Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-04-29 11:04:12 +08:00
dongxuy04
16535991b2
feat: Add MNNVL MoE A2A support (#3504)
* add MNNVL memory mapping support

Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>

* add more MPI environment for trtllm-llmapi-launch

Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>

* add MoE communication and prepare kernels

Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>

* add MNNVL AlltoAll support for DeepSeekV3

Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>

* add output dump for throughput benchmark

Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>

* support dynamic kernel launch grid

Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>

* address review comments

Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>

* address review comments #2

Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>

---------

Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-04-25 17:29:08 +08:00
Frank
5a6cb2b985
fix: Correct reporting of text dtype for Llama 4 (#3494)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-04-18 00:07:49 +08:00
Yan Chunwei
b21cfcfed1
chore: refactor the LlmArgs with Pydantic and migrate remaining pybinding configs to python (#3025)
* make LlmArgs Pydantic

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* amending doc

fix api_stability

fix tests

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* restore yaml groups

refine StackTrace

singleton

clean tests

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* fix trtllm-bench

fix pytorch

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* fix serve distagg

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

* fix

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>

---------

Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-04-05 13:31:48 +08:00
Frank
f8a4cc0629
perf: Add total token throughput metric. (#3212)
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-04-05 13:17:59 +08:00
Fanrong Li
1fe64b90be
fix: fix the acceptance rate of pytorch workflow in trtllm-bench (#3240)
* fix acceptance rate of pytorch workflow.
* revert the RequestOutput API change.

---------

Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-04-03 15:12:24 +08:00
Suyog Gupta
047f2b234d
perf: [AutoDeploy] Enable AutoDeploy as a backend in trtllm-bench (#3041)
* Enable AutoDeploy as a backend in trtllm-bench

Signed-off-by: Suyog Gupta <suyogg@nvidia.com>

* update how caches are resized

Signed-off-by: Suyog Gupta <suyogg@nvidia.com>

* fix: files permission from 100755 to 100644

Signed-off-by: Suyog Gupta <suyogg@nvidia.com>

* some comments

Signed-off-by: Suyog Gupta <suyogg@nvidia.com>

* lint

Signed-off-by: Suyog Gupta <suyogg@nvidia.com>

* lint

Signed-off-by: Suyog Gupta <suyogg@nvidia.com>

* lint

Signed-off-by: Suyog Gupta <suyogg@nvidia.com>

* lint

Signed-off-by: Suyog Gupta <suyogg@nvidia.com>

* Fix function name

Signed-off-by: Suyog Gupta <suyogg@nvidia.com>

* refactor

Signed-off-by: Suyog Gupta <suyogg@nvidia.com>

* Remove spurious change

Signed-off-by: Suyog Gupta <suyogg@nvidia.com>

* Add cursor generated doc strings

Signed-off-by: Suyog Gupta <suyogg@nvidia.com>

* re-enable ad test

Signed-off-by: Suyog Gupta <suyogg@nvidia.com>

* some perf cleanup

Signed-off-by: Suyog Gupta <suyogg@nvidia.com>

* debug ci

Signed-off-by: Suyog Gupta <suyogg@nvidia.com>

* ensure that overlap scheduler is enabled

Signed-off-by: Suyog Gupta <suyogg@nvidia.com>

* Reorder the tests

Signed-off-by: Suyog Gupta <suyogg@nvidia.com>

---------

Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-03-26 14:33:14 -07:00
Kaiyu Xie
3aa6b11d13
Update TensorRT-LLM (#2936)
* Update TensorRT-LLM

---------

Co-authored-by: changcui <cuichang147@gmail.com>
2025-03-18 21:25:19 +08:00
Kaiyu Xie
77d7fe1eb2
Update TensorRT-LLM (#2849)
* Update TensorRT-LLM

---------

Co-authored-by: aotman <chenhangatm@gmail.com>
2025-03-04 18:44:00 +08:00
Kaiyu Xie
e88da961c5
Update TensorRT-LLM (#2783) 2025-02-13 18:40:22 +08:00
Dan Blanaru
16d2467ea8 Update TensorRT-LLM (#2755)
* Update TensorRT-LLM

---------

Co-authored-by: Denis Kayshev <topenkoff@gmail.com>
Co-authored-by: akhoroshev <arthoroshev@gmail.com>
Co-authored-by: Patrick Reiter Horn <patrick.horn@gmail.com>

Update
2025-02-11 03:01:00 +00:00
Kaiyu Xie
aaacc9bd68
Update TensorRT-LLM (#2562)
* Update TensorRT-LLM

---------

Co-authored-by: Starrick Liu <73152103+StarrickLiu@users.noreply.github.com>
2024-12-11 00:31:05 -08:00
Kaiyu Xie
385626572d
Update TensorRT-LLM (#2502)
* Update TensorRT-LLM

---------

Co-authored-by: 岑灿 <yunyi.hyy@alibaba-inc.com>
2024-11-26 16:51:34 +08:00