wili
|
2e3cf42e03
|
[refactor] Simplification of Speculative decoding configs (#5639)
Signed-off-by: wili-65535 <wili-65535@users.noreply.github.com>
Co-authored-by: wili-65535 <wili-65535@users.noreply.github.com>
|
2025-07-10 11:37:30 -04:00 |
|
Darragh Hanley
|
5437075def
|
ReDrafter support for Qwen (#4875)
Signed-off-by: darraghdog <darragh.hanley@gmail.com>
Signed-off-by: Darragh Hanley <darragh.hanley@gmail.com>
Co-authored-by: rakib-hasan <rhasan@nvidia.com>
|
2025-06-28 02:33:10 +08:00 |
|
Ivan Sorokin
|
d40fce474a
|
fix: redrafter sampling (#3278)
* Fix redrafter sampling
Signed-off-by: Ivan Sorokin <isorokin@nvidia.com>
* Rename redrafter beam search var
Signed-off-by: Ivan Sorokin <isorokin@nvidia.com>
* Remove _beam_search_candidates_v0
Signed-off-by: Ivan Sorokin <isorokin@nvidia.com>
* Remove unused import
Signed-off-by: Ivan Sorokin <isorokin@nvidia.com>
---------
Signed-off-by: Ivan Sorokin <isorokin@nvidia.com>
|
2025-04-08 07:49:32 +08:00 |
|
Kaiyu Xie
|
c629546ce4
|
Update TensorRT-LLM (#2436)
|
2024-11-12 15:27:49 +08:00 |
|
Kaiyu Xie
|
8681b3a4c0
|
open source 4dbf696ae9b74a26829d120b67ab8443d70c8e58 (#2297)
* Update TensorRT-LLM
---------
Co-authored-by: Bhuvanesh Sridharan <bhuvanesh.sridharan@sprinklr.com>
Co-authored-by: Qingquan Song <ustcsqq@gmail.com>
|
2024-10-08 12:19:19 +02:00 |
|
Kaiyu Xie
|
74b324f667
|
Update TensorRT-LLM (#2110)
|
2024-08-13 22:34:33 +08:00 |
|
Kaiyu Xie
|
2d234357c6
|
Update TensorRT-LLM (#1954)
* Update TensorRT-LLM
---------
Co-authored-by: Altair-Alpha <62340011+Altair-Alpha@users.noreply.github.com>
|
2024-07-16 15:30:25 +08:00 |
|