| Name | Last commit | Date |
| --- | --- | --- |
| attention_backend | Fix create_weights in attention (#3692) | 2025-04-24 07:30:00 +08:00 |
| auto_deploy | feat: [AutoDeploy] generalizing cudagraph to multiple dynamic inputs (#3589) | 2025-04-23 03:38:51 +08:00 |
| compilation | Unify two versions of AllReduce custom op (#3032) | 2025-04-22 21:58:42 +08:00 |
| custom_ops | Add smart router for moe (#3641) | 2025-04-23 12:21:59 +08:00 |
| distributed | fix: Fix C++ decoder synchronization in PyTorch (#3106) | 2025-04-23 23:55:27 +08:00 |
| models | feat: return logits in PyTorch flow (#3221) | 2025-04-24 16:56:03 -07:00 |
| modules | Fix create_weights in attention (#3692) | 2025-04-24 07:30:00 +08:00 |
| peft | add passing E2E LoRA flow (#3788) | 2025-04-23 18:38:06 +03:00 |
| pyexecutor | feat: return logits in PyTorch flow (#3221) | 2025-04-24 16:56:03 -07:00 |
| speculative | feat: return logits in PyTorch flow (#3221) | 2025-04-24 16:56:03 -07:00 |
| __init__.py | Update TensorRT-LLM (#2755) | 2025-02-11 03:01:00 +00:00 |
| autotuner.py | feat: Apply the new torch-flow compatible AutoTuner to both Fused MoE and NVFP4 Linear operators. (#3151) | 2025-04-08 14:28:36 +08:00 |
| llm.py | test: [TRTLLM-4334] Create 1.0 criteria scope from API stability references (#3069) | 2025-03-26 18:14:35 +08:00 |
| metadata.py | feat: no-cache attention in PyTorch workflow (#3085) | 2025-04-05 01:54:32 +08:00 |
| model_config.py | Fix create_weights in attention (#3692) | 2025-04-24 07:30:00 +08:00 |
| pipeline_interface.py | Clean up modeling_deepseek.py (#3640) | 2025-04-18 17:54:33 -07:00 |
| utils.py | Cache sin cos in model instead of global LRU cache. (#3378) | 2025-04-14 11:19:09 +08:00 |