| Name | Last commit | Date |
| --- | --- | --- |
| _torch | feat: Add pp support for hybrid attn/mamba model (#4358) | 2025-05-19 14:47:45 +08:00 |
| auto_parallel | fix: Fix NVLink version decoding. (#3996) | 2025-05-06 13:56:50 +08:00 |
| bench | chore: Mass Integration 0.19 (#4255) | 2025-05-16 10:53:25 +02:00 |
| commands | Breaking change: perf: Enable scheduling overlap by default (#4174) | 2025-05-15 14:27:36 +08:00 |
| evaluate | [TRTLLM-4480][doc] Documentation for new accuracy test suite and trtllm-eval (#3946) | 2025-05-08 19:35:23 +08:00 |
| executor | [https://nvbugspro.nvidia.com/bug/5243740][fix] deduce default max_tokens for trtllm-serve (#4265) | 2025-05-19 00:34:40 +08:00 |
| inputs | [TRTLLM-5054][fix] Removing repeated loading of input processor (#4161) | 2025-05-16 08:04:58 +08:00 |
| layers | refactor: use x is None instead of x == None. (#4244) | 2025-05-15 20:00:04 +08:00 |
| llmapi | chore: cleanup perf_evaluator code (#3833) | 2025-05-19 13:21:36 +08:00 |
| models | fix:https://nvbugs/5234033 enable starcoder trt-flow with transforme… (#3909) | 2025-05-15 11:16:45 +08:00 |
| plugin | Revert "feat: Low Precision Allreduce for PCIe based GPU" (#4340) | 2025-05-15 09:52:39 +08:00 |
| quantization | chore: bump version to 0.19.0 (#3598) (#3841) | 2025-04-29 16:57:22 +08:00 |
| runtime | feat: [nvbugs/5261055][nvbugs/5170160] non-invasive pipeline parallelism (#4034) | 2025-05-16 04:16:53 +08:00 |
| scaffolding | API Breaking Change + Readability: "decoder"->"sampler" (#4121) | 2025-05-16 23:52:25 +08:00 |
| serve | [https://nvbugspro.nvidia.com/bug/5243740][fix] deduce default max_tokens for trtllm-serve (#4265) | 2025-05-19 00:34:40 +08:00 |
| tools | feat: Support Mistral Small 3.1 24B VLM in TRT workflow (#4183) | 2025-05-14 03:47:22 +08:00 |
| __init__.py | fix: revert https://github.com/NVIDIA/TensorRT-LLM/pull/3858 (#3928) | 2025-04-29 11:26:13 +08:00 |
| _common.py | Update (#2978) | 2025-03-23 16:39:35 +08:00 |
| _dlpack_utils.py | feat: Add MNNVL MoE A2A support (#3504) | 2025-04-25 17:29:08 +08:00 |
| _ipc_utils.py | fix: Proper error bubbling for PyExecutor (#3321) | 2025-04-15 14:49:46 +08:00 |
| _mnnvl_utils.py | fix: Remove real size allocation (#4396) | 2025-05-18 19:13:22 +08:00 |
| _utils.py | feat: [nvbugs/5261055][nvbugs/5170160] non-invasive pipeline parallelism (#4034) | 2025-05-16 04:16:53 +08:00 |
| builder.py | chore: remove usernames from comments (#3291) | 2025-04-05 13:44:28 +08:00 |
| disaggregated_params.py | Update TensorRT-LLM (#2936) | 2025-03-18 21:25:19 +08:00 |
| functional.py | refactor: use x is None instead of x == None. (#4244) | 2025-05-15 20:00:04 +08:00 |
| graph_rewriting.py | Update TensorRT-LLM (#2755) | 2025-02-11 03:01:00 +00:00 |
| logger.py | Update TensorRT-LLM (#2873) | 2025-03-11 21:13:42 +08:00 |
| lora_manager.py | add changes for fp8, nemotron-nas, API (#4180) | 2025-05-18 23:27:25 +08:00 |
| mapping.py | feat: [nvbugs/5261055][nvbugs/5170160] non-invasive pipeline parallelism (#4034) | 2025-05-16 04:16:53 +08:00 |
| module.py | Update (#2978) | 2025-03-23 16:39:35 +08:00 |
| network.py | chore: remove usernames from comments (#3291) | 2025-04-05 13:44:28 +08:00 |
| parameter.py | fix:https://nvbugs/5234033 enable starcoder trt-flow with transforme… (#3909) | 2025-05-15 11:16:45 +08:00 |
| profiler.py | test [TRTLLM-4477,TRTLLM-4481]: Accuracy test improvement (Part 3.5): Support GSM8K and GPQA (#3483) | 2025-04-22 07:38:16 +08:00 |
| prompt_adapter_manager.py | Update TensorRT-LLM (#2333) | 2024-10-15 15:28:40 +08:00 |
| python_plugin.py | refactor: use x is None instead of x == None. (#4244) | 2025-05-15 20:00:04 +08:00 |
| sampling_params.py | feat: Support the Structural Tag in guided decoding (#4066) | 2025-05-12 17:24:50 +08:00 |
| top_model_mixin.py | Update TensorRT-LLM (#2053) | 2024-07-30 21:25:01 +08:00 |
| version.py | chore: bump version to 0.20.0rc3 (#4261) | 2025-05-14 10:15:25 +08:00 |