Aurelien Chartier
|
fa95e402a5
|
feat: add LLmArgs option to force using dynamic quantization (#5346)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
|
2025-07-01 12:16:09 -07:00 |
|
HuiGao-NV
|
43192379af
|
Use backend to replace macro to control enablement of MNNVL all reduce (#4635)
Signed-off-by: Hui Gao <huig@nvidia.com>
|
2025-06-12 11:22:49 +08:00 |
|
shaharmor98
|
7d94c9561f
|
feat: support multi lora adapters and TP (#3885)
* support multi lora, tp
Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
|
2025-05-08 23:45:45 +08:00 |
|
hlu1
|
cd2bcdc1a9
|
Fix create_weights in attention (#3692)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
|
2025-04-24 07:30:00 +08:00 |
|
hlu1
|
b6bae33453
|
Clean up linear.py, mlp.py, gated_mlp.py (#3553)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
|
2025-04-16 12:21:44 -07:00 |
|
QI JUN
|
d167cbd5bb
|
refactor: remove ParallelConfig in tensorrt_llm._torch.distributed module (#3370)
* remove tensorrt_llm._torch.distributed.ParallelConfig
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* clean
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix embedding test
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix comments
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* polish
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* fix ci
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
* rebase
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
---------
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>
|
2025-04-11 15:34:20 -07:00 |
|
danielafrimi
|
47f5cf6c0d
|
lora_tests (#3201)
LoRA tests and layers
Signed-off-by: Ubuntu <dafrimi@nvidia.com>
Co-authored-by: Ubuntu <dafrimi@nvidia.com>
|
2025-04-09 18:06:52 +03:00 |
|
Kaiyu Xie
|
3aa6b11d13
|
Update TensorRT-LLM (#2936)
* Update TensorRT-LLM
---------
Co-authored-by: changcui <cuichang147@gmail.com>
|
2025-03-18 21:25:19 +08:00 |
|
Kaiyu Xie
|
2ea17cdad2
|
Update TensorRT-LLM (#2792)
* Update TensorRT-LLM
---------
Co-authored-by: jlee <jungmoolee@clika.io>
|
2025-02-18 21:27:39 +08:00 |
|
Kaiyu Xie
|
e88da961c5
|
Update TensorRT-LLM (#2783)
|
2025-02-13 18:40:22 +08:00 |
|
Dan Blanaru
|
16d2467ea8
|
Update TensorRT-LLM (#2755)
* Update TensorRT-LLM
---------
Co-authored-by: Denis Kayshev <topenkoff@gmail.com>
Co-authored-by: akhoroshev <arthoroshev@gmail.com>
Co-authored-by: Patrick Reiter Horn <patrick.horn@gmail.com>
Update
|
2025-02-11 03:01:00 +00:00 |
|