Yilin Fan
|
31bb650298
|
Cherry pick feat/llama4 to main (#4739)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
Signed-off-by: Yilin Fan <206948969+nv-yilinf@users.noreply.github.com>
Co-authored-by: Chenfei Zhang <chenfeiz@nvidia.com>
|
2025-05-30 05:28:40 +08:00 |
|
Netanel Haber
|
9cd8148f28
|
API Breaking Change + Readability: "decoder"->"sampler" (#4121)
* *decoder*->*sampler*; new_tensors_device: dict[str, torch.Tensor] -> device: SampleStateTensors
* **Breaking Change**: this changes public interfaces. Main changes:
* PyTorchConfig (consumed via LLM(pytorch_backend_config)): the configuration parameters mixed_decoder and enable_trtllm_decoder are renamed to mixed_sampler and enable_trtllm_sampler.
* Command-line argument --enable_trtllm_decoder becomes --enable_trtllm_sampler in examples/pytorch/quickstart_advanced.py.
---------
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
|
2025-05-16 23:52:25 +08:00 |
|
Mike Iovine
|
e534bf09cc
|
[fix] Fix flashinfer + speculation issues (#3686)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
|
2025-04-28 14:34:22 -04:00 |
|
Mike Iovine
|
41a6c98544
|
Support CUDA graphs for EAGLE3 (#3176)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
|
2025-04-17 04:53:50 +08:00 |
|
Yuan Tong
|
d4c0423cdb
|
refactor: collect executor and decoder states into dataclass (#3234)
* fix: Properly propagate errors in PyExecutor
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
|
2025-04-15 16:31:45 +08:00 |
|
tburt-nv
|
7a659885e3
|
chore: remove usernames from comments (#3291)
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
|
2025-04-05 13:44:28 +08:00 |
|
Mike Iovine
|
5416966ddb
|
Add initial EAGLE-3 implementation (#3035)
Signed-off-by: Mike Iovine <miovine@nvidia.com>
|
2025-03-29 22:31:24 +08:00 |
|