Commit Graph

11 Commits

Author SHA1 Message Date
Aurelien Chartier
fa95e402a5
feat: add LLmArgs option to force using dynamic quantization (#5346)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-07-01 12:16:09 -07:00
HuiGao-NV
43192379af
Use backend to replace macro to control enablement of MNNVL all reduce (#4635)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-06-12 11:22:49 +08:00
shaharmor98
7d94c9561f
feat: support multi lora adapters and TP (#3885)
* support multi lora, tp

Signed-off-by: Shahar Mor <17088876+shaharmor98@users.noreply.github.com>
2025-05-08 23:45:45 +08:00
hlu1
cd2bcdc1a9
Fix create_weights in attention (#3692)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-04-24 07:30:00 +08:00
hlu1
b6bae33453
Clean up linear.py, mlp.py, gated_mlp.py (#3553)
Signed-off-by: Hao Lu <14827759+hlu1@users.noreply.github.com>
2025-04-16 12:21:44 -07:00
QI JUN
d167cbd5bb
refactor: remove ParallelConfig in tensorrt_llm._torch.distributed module (#3370)
* remove tensorrt_llm._torch.distributed.ParallelConfig

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix ci

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix ci

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* clean

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix embedding test

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix comments

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* polish

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* fix ci

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

* rebase

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>

---------

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Co-authored-by: hlu1 <14827759+hlu1@users.noreply.github.com>
2025-04-11 15:34:20 -07:00
danielafrimi
47f5cf6c0d
lora_tests (#3201)
LoRA tests and layers

Signed-off-by: Ubuntu <dafrimi@nvidia.com>
Co-authored-by: Ubuntu <dafrimi@nvidia.com>
2025-04-09 18:06:52 +03:00
Kaiyu Xie
3aa6b11d13
Update TensorRT-LLM (#2936)
* Update TensorRT-LLM

---------

Co-authored-by: changcui <cuichang147@gmail.com>
2025-03-18 21:25:19 +08:00
Kaiyu Xie
2ea17cdad2
Update TensorRT-LLM (#2792)
* Update TensorRT-LLM

---------

Co-authored-by: jlee <jungmoolee@clika.io>
2025-02-18 21:27:39 +08:00
Kaiyu Xie
e88da961c5
Update TensorRT-LLM (#2783) 2025-02-13 18:40:22 +08:00
Dan Blanaru
16d2467ea8 Update TensorRT-LLM (#2755)
* Update TensorRT-LLM

---------

Co-authored-by: Denis Kayshev <topenkoff@gmail.com>
Co-authored-by: akhoroshev <arthoroshev@gmail.com>
Co-authored-by: Patrick Reiter Horn <patrick.horn@gmail.com>

Update
2025-02-11 03:01:00 +00:00