| Name | Last commit message | Last commit date |
| --- | --- | --- |
| attention_backend/ | feat: Introduce feature properties for attention backend. (#3659) | 2025-04-19 12:37:27 +08:00 |
| auto_deploy/ | feat: [AutoDeploy] generalizing cudagraph to multiple dynamic inputs (#3589) | 2025-04-23 03:38:51 +08:00 |
| compilation/ | Unify two versions of AllReduce custom op (#3032) | 2025-04-22 21:58:42 +08:00 |
| custom_ops/ | Add smart router for moe (#3641) | 2025-04-23 12:21:59 +08:00 |
| distributed/ | Unify two versions of AllReduce custom op (#3032) | 2025-04-22 21:58:42 +08:00 |
| models/ | Add running E2E LoRA flow (#3648) | 2025-04-23 11:19:41 +08:00 |
| modules/ | Add smart router for moe (#3641) | 2025-04-23 12:21:59 +08:00 |
| peft/ | Add running E2E LoRA flow (#3648) | 2025-04-23 11:19:41 +08:00 |
| pyexecutor/ | Add running E2E LoRA flow (#3648) | 2025-04-23 11:19:41 +08:00 |
| speculative/ | fix: remove the unnecessary metadata changes in mtp. (#3787) | 2025-04-23 16:01:28 +08:00 |
| __init__.py | Update TensorRT-LLM (#2755) | 2025-02-11 03:01:00 +00:00 |
| autotuner.py | feat: Apply the new torch-flow compatible AutoTuner to both Fused MoE and NVFP4 Linear operators. (#3151) | 2025-04-08 14:28:36 +08:00 |
| llm.py | test: [TRTLLM-4334] Create 1.0 criteria scope from API stability references (#3069) | 2025-03-26 18:14:35 +08:00 |
| metadata.py | feat: no-cache attention in PyTorch workflow (#3085) | 2025-04-05 01:54:32 +08:00 |
| model_config.py | feat: [Deepseek] Add trtllm-gen MOE FP4 MOE backend (#3387) | 2025-04-21 10:01:33 +08:00 |
| pipeline_interface.py | Clean up modeling_deepseek.py (#3640) | 2025-04-18 17:54:33 -07:00 |
| utils.py | Cache sin cos in model instead of global LRU cache. (#3378) | 2025-04-14 11:19:09 +08:00 |