Zongfei Jing
1e5af736ea
Add smart router for moe ( #3641 )
...
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2025-04-23 12:21:59 +08:00
Kaiyu Xie
a32389b4cd
fix: Remove unnecessary max call ( #3574 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-04-22 10:33:50 +08:00
rakib-hasan
ff3b741045
feat: adding multimodal (only image for now) support in trtllm-bench ( #3490 )
...
* feat: adding multimodal (only image for now) support in trtllm-bench
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* fix: add in load_dataset() calls to maintain the v2.19.2 behavior
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* re-adding prompt_token_ids and using that for prompt_len
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* updating the datasets version in examples as well
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* api changes are not needed
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* moving datasets requirement and removing a missed api change
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* addressing review comments
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
* refactoring the quickstart example
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
---------
Signed-off-by: Rakib Hasan <rhasan@nvidia.com>
2025-04-18 07:06:16 +08:00
Fanrong Li
1fe64b90be
fix: fix the acceptance rate of pytorch workflow in trtllm-bench ( #3240 )
...
* fix acceptance rate of pytorch workflow.
* revert the RequestOutput API change.
---------
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-04-03 15:12:24 +08:00
Frank
8bb3eea285
perf: Readd iteration logging for trtllm-bench. ( #3039 )
...
Signed-off-by: Frank Di Natale <3429989+FrankD412@users.noreply.github.com>
2025-04-01 08:13:09 +08:00
Yan Chunwei
82edd90350
fix gpus_per_node in trtllm-bench when world_size < device_count ( #3007 )
...
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
2025-03-27 09:31:40 +08:00
Suyog Gupta
047f2b234d
perf: [AutoDeploy] Enable AutoDeploy as a backend in trtllm-bench ( #3041 )
...
* Enable AutoDeploy as a backend in trtllm-bench
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* update how caches are resized
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* fix: files permission from 100755 to 100644
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* some comments
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* lint
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* lint
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* lint
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* lint
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* Fix function name
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* refactor
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* Remove spurious change
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* Add cursor generated doc strings
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* re-enable ad test
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* some perf cleanup
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* debug ci
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* ensure that overlap scheduler is enabled
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
* Reorder the tests
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
---------
Signed-off-by: Suyog Gupta <suyogg@nvidia.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-03-26 14:33:14 -07:00
Kaiyu Xie
77d7fe1eb2
Update TensorRT-LLM ( #2849 )
...
* Update TensorRT-LLM
---------
Co-authored-by: aotman <chenhangatm@gmail.com>
2025-03-04 18:44:00 +08:00
Kaiyu Xie
e88da961c5
Update TensorRT-LLM ( #2783 )
2025-02-13 18:40:22 +08:00
Dan Blanaru
16d2467ea8
Update TensorRT-LLM ( #2755 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Denis Kayshev <topenkoff@gmail.com>
Co-authored-by: akhoroshev <arthoroshev@gmail.com>
Co-authored-by: Patrick Reiter Horn <patrick.horn@gmail.com>
Update
2025-02-11 03:01:00 +00:00
Kaiyu Xie
aaacc9bd68
Update TensorRT-LLM ( #2562 )
...
* Update TensorRT-LLM
---------
Co-authored-by: Starrick Liu <73152103+StarrickLiu@users.noreply.github.com>
2024-12-11 00:31:05 -08:00
石晓伟
548b5b7310
Update TensorRT-LLM ( #2532 )
...
* blossom-ci.yml: run vulnerability scan on blossom
* open source efb18c1256f8c9c3d47b7d0c740b83e5d5ebe0ec
---------
Co-authored-by: niukuo <6831097+niukuo@users.noreply.github.com>
Co-authored-by: pei0033 <59505847+pei0033@users.noreply.github.com>
Co-authored-by: Kyungmin Lee <30465912+lkm2835@users.noreply.github.com>
Co-authored-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2024-12-04 21:16:56 +08:00
Kaiyu Xie
385626572d
Update TensorRT-LLM ( #2502 )
...
* Update TensorRT-LLM
---------
Co-authored-by: 岑灿 <yunyi.hyy@alibaba-inc.com>
2024-11-26 16:51:34 +08:00
Kaiyu Xie
535c9cc673
Update TensorRT-LLM ( #2460 )
2024-11-19 18:30:34 +08:00