Mirror of https://github.com/NVIDIA/TensorRT-LLM.git (synced 2026-02-08 12:12:33 +08:00)
* Why?

  There were a couple of issues with the recently merged custom model injection for AutoDeploy and the reference implementation of nemotron H:
  - `d_mlp` was left in despite being mathematically always null (which could lead to runtime issues during sharding).
  - The custom model mapping was inherited by child factories.

* What?

  This commit fixes these issues and refactors the key of the custom implementation to be based on the name of the configuration class as well.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
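As a rough illustration of the second point, the sketch below shows one way to key custom model implementations by the configuration class name in a module-level registry, rather than storing the mapping on a factory class where subclasses would inherit it. All names here (`register_custom_model`, `ModelFactory`, `NemotronHConfig`, the registry dict) are hypothetical and do not reflect the actual TensorRT-LLM / AutoDeploy API.

```python
# Hypothetical sketch: register custom model builders under the *name* of the
# configuration class, instead of as an attribute on a factory class that
# child factories would inherit. Names are illustrative only.
from typing import Callable, Dict, Type

# Module-level registry keyed by configuration class name.
_CUSTOM_MODEL_REGISTRY: Dict[str, Callable] = {}


def register_custom_model(config_cls: Type) -> Callable:
    """Decorator registering a custom model builder under config_cls.__name__."""

    def wrap(builder: Callable) -> Callable:
        _CUSTOM_MODEL_REGISTRY[config_cls.__name__] = builder
        return builder

    return wrap


class NemotronHConfig:
    """Stand-in for an HF-style config class (illustrative only)."""
    model_type = "nemotron_h"


@register_custom_model(NemotronHConfig)
def build_nemotron_h(config: NemotronHConfig):
    # A real implementation would construct the reference Nemotron-H module here.
    return f"custom model for {type(config).__name__}"


class ModelFactory:
    """Base factory: custom implementations are looked up by the config's class name."""

    def build(self, config):
        builder = _CUSTOM_MODEL_REGISTRY.get(type(config).__name__)
        if builder is not None:
            return builder(config)
        return "default (auto) model"


class ChildFactory(ModelFactory):
    # No mapping is stored on the factory class itself, so a child factory does not
    # silently inherit custom behavior; only explicitly registered configs match.
    pass


if __name__ == "__main__":
    print(ModelFactory().build(NemotronHConfig()))  # -> custom model for NemotronHConfig
    print(ChildFactory().build(object()))           # -> default (auto) model
```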
Directory contents:

- attention
- auto_deploy
- compilation
- debugger
- executor
- misc
- modeling
- models/checkpoints/hf
- modules
- multi_gpu
- multi_gpu_modeling
- multimodal
- ray_orchestrator
- sampler
- speculative
- thop
- helpers.py
- pattern_watcher.py
- test_connector.py