yuxianq
|
fd8ded2b2b
|
feat: Support cos_sin_cache in all cases. (#3517)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
|
2025-04-16 13:48:44 +08:00 |
|
shaharmor98
|
ede7058544
|
Feat/ Integrate peftCacheManager in PyExecutor creation (#3372)
* integrate peftCacheManager in PyExecutor creation
Signed-off-by: Shahar Mor <smor@nvidia.com>
|
2025-04-15 15:14:43 +08:00 |
|
Mike Iovine
|
5bdf997963
|
Add Llama 4 (#3302)
Signed-off-by: Mike Iovine <miovine@nvidia.com>
|
2025-04-09 03:35:21 +08:00 |
|
Jinyang Yuan
|
992d513bc6
|
feat: Optionally split MoE inputs into chunks to reduce GPU memory usage (#3104)
Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
Co-authored-by: raccoonliukai <raccoonliu@tencent.com>
|
2025-04-01 16:07:02 +08:00 |
|
dongjiyingdjy
|
22ff81b047
|
fix:fix illeagel memory access when mtp >= 2 (#3006)
* fix - fix illeagel memory access when mtp > 2
---------
Signed-off-by: Jiying Dong <87510204+dongjiyingdjy@users.noreply.github.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
|
2025-04-01 13:36:45 +08:00 |
|
Kaiyu Xie
|
3aa6b11d13
|
Update TensorRT-LLM (#2936)
* Update TensorRT-LLM
---------
Co-authored-by: changcui <cuichang147@gmail.com>
|
2025-03-18 21:25:19 +08:00 |
|
Kaiyu Xie
|
77d7fe1eb2
|
Update TensorRT-LLM (#2849)
* Update TensorRT-LLM
---------
Co-authored-by: aotman <chenhangatm@gmail.com>
|
2025-03-04 18:44:00 +08:00 |
|
Kaiyu Xie
|
ab5b19e027
|
Update TensorRT-LLM (#2820)
|
2025-02-25 21:21:49 +08:00 |
|
Kaiyu Xie
|
2ea17cdad2
|
Update TensorRT-LLM (#2792)
* Update TensorRT-LLM
---------
Co-authored-by: jlee <jungmoolee@clika.io>
|
2025-02-18 21:27:39 +08:00 |
|
Kaiyu Xie
|
e88da961c5
|
Update TensorRT-LLM (#2783)
|
2025-02-13 18:40:22 +08:00 |
|
Dan Blanaru
|
16d2467ea8
|
Update TensorRT-LLM (#2755)
* Update TensorRT-LLM
---------
Co-authored-by: Denis Kayshev <topenkoff@gmail.com>
Co-authored-by: akhoroshev <arthoroshev@gmail.com>
Co-authored-by: Patrick Reiter Horn <patrick.horn@gmail.com>
Update
|
2025-02-11 03:01:00 +00:00 |
|