| Name | Last commit | Last commit date |
|------|-------------|------------------|
| `attention_backend/` | [https://nvbugspro.nvidia.com/bug/5329655] [feat] Pytorch path add spec dec param to attention op (#5146) | 2025-07-02 04:54:43 -04:00 |
| `auto_deploy/` | [feat] Support torch compile for attention dp (#5086) | 2025-07-01 13:48:52 -04:00 |
| `compilation/` | [feat] Piecewise cuda graph support for MLA (#4467) | 2025-06-17 18:58:38 +08:00 |
| `custom_ops/` | [https://nvbugspro.nvidia.com/bug/5329655] [feat] Pytorch path add spec dec param to attention op (#5146) | 2025-07-02 04:54:43 -04:00 |
| `debug/` | Add debug hook to support dump tensor data and add new debug functions easily (#5182) | 2025-06-24 17:45:28 +08:00 |
| `distributed/` | [feat] Support torch compile for attention dp (#5086) | 2025-07-01 13:48:52 -04:00 |
| `models/` | [ModelLoad] Concurrent load model (#5291) | 2025-07-03 22:18:04 +08:00 |
| `modules/` | [ModelLoad] Concurrent load model (#5291) | 2025-07-03 22:18:04 +08:00 |
| `peft/` | feat: support multi lora adapters and TP (#3885) | 2025-05-08 23:45:45 +08:00 |
| `pyexecutor/` | MTP and derivatives: Align sample state with trtllm sampler sample state (#5675) | 2025-07-03 19:55:48 +02:00 |
| `speculative/` | MTP and derivatives: Align sample state with trtllm sampler sample state (#5675) | 2025-07-03 19:55:48 +02:00 |
| `__init__.py` | [TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default (#5312) | 2025-06-20 03:01:10 +08:00 |
| `autotuner.py` | [TRTLLM-5770] feat: Integrate TRT-LLM Gen FP8 block scale MoE with Pytorch workflow kernel autotuner (#5207) | 2025-06-17 21:01:56 +08:00 |
| `expert_statistic.py` | Add MTP support for Online EPLB (#5213) | 2025-06-25 07:58:13 +08:00 |
| `llm.py` | [TRTLLM-5208][BREAKING CHANGE] chore: make pytorch LLM the default (#5312) | 2025-06-20 03:01:10 +08:00 |
| `metadata.py` | feat: no-cache attention in PyTorch workflow (#3085) | 2025-04-05 01:54:32 +08:00 |
| `model_config.py` | Feat/pytorch vswa kvcachemanager (#5151) | 2025-07-02 15:58:00 +08:00 |
| `utils.py` | [feat] Support torch compile for attention dp (#5086) | 2025-07-01 13:48:52 -04:00 |