TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-12 14:03:48 +08:00

Yuxian Qiu 3b3069b390 [https://nvbugs/5747930 ][fix] Use offline tokenizer for whisper models. (#10121 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>		2025-12-20 09:42:07 +08:00
apps
auto_deploy	[#9640 ][feat] Migrate model registry to v2.0 format with composable configs (#9836 )	2025-12-19 05:30:02 -08:00
bindings/executor
configs	[None][fix] enable KV cache reuse for config database (#10094 )	2025-12-19 15:16:56 -08:00
cpp/executor	[TRTLLM-9197][infra] Move thirdparty stuff to it's own listfile (#8986 )	2025-11-20 16:44:23 -08:00
cpp_library
disaggregated	[TRTC-102][docs] `--extra_llm_api_options`->`--config` in docs/examples/tests (#10005 )	2025-12-19 13:48:43 -05:00
dora
draft_target_model
eagle
infinitebench
language_adapter
layer_wise_benchmarks	[TRTLLM-9615][feat] Implement a distributed tuning system (#9621 )	2025-12-15 21:08:53 +08:00
llm-api	[TRTC-102][docs] `--extra_llm_api_options`->`--config` in docs/examples/tests (#10005 )	2025-12-19 13:48:43 -05:00
llm-eval/lm-eval-harness
longbench	[None] [feat] Optimize the algorithm part of RocketKV (#9333 )	2025-12-01 09:04:09 +08:00
lookahead
medusa	[OMNIML-3036][doc] Re-branding TensorRT-Model-Optimizer as Nvidia Model-Optimizer (#9679 )	2025-12-07 07:14:05 -08:00
models	[TRTC-102][docs] `--extra_llm_api_options`->`--config` in docs/examples/tests (#10005 )	2025-12-19 13:48:43 -05:00
ngram
openai_triton
opentelemetry	[None][chore] Change trt-server to trtlllm-server in opentelemetry readme (#9173 )	2025-11-17 22:02:24 -08:00
python_plugin
quantization	[OMNIML-3036][doc] Re-branding TensorRT-Model-Optimizer as Nvidia Model-Optimizer (#9679 )	2025-12-07 07:14:05 -08:00
ray_orchestrator	[TRTC-102][docs] `--extra_llm_api_options`->`--config` in docs/examples/tests (#10005 )	2025-12-19 13:48:43 -05:00
redrafter
sample_weight_stripping	[None][chore] Weekly mass integration of release/1.1 -- rebase (#9522 )	2025-11-29 21:48:48 +08:00
scaffolding
serve	[TRTC-102][docs] `--extra_llm_api_options`->`--config` in docs/examples/tests (#10005 )	2025-12-19 13:48:43 -05:00
sparse_attention	[TRTC-102][docs] `--extra_llm_api_options`->`--config` in docs/examples/tests (#10005 )	2025-12-19 13:48:43 -05:00
trtllm-eval
wide_ep	[TRTC-102][docs] `--extra_llm_api_options`->`--config` in docs/examples/tests (#10005 )	2025-12-19 13:48:43 -05:00
__init__.py	[TRTC-102][docs] `--extra_llm_api_options`->`--config` in docs/examples/tests (#10005 )	2025-12-19 13:48:43 -05:00
constraints.txt	[None][chore] bump version to 1.2.0rc6 (#9874 )	2025-12-10 04:53:26 -08:00
eval_long_context.py
generate_checkpoint_config.py
generate_xgrammar_tokenizer_info.py
hf_lora_convert.py
mmlu.py	[https://nvbugs/4141427 ][chore] Add more details to LICENSE file (#9881 )	2025-12-13 08:35:31 +08:00
run.py
summarize.py	[None][feat] Support ignored prompt length for penalties via new sampling config parameter (#8127 )	2025-10-27 13:12:31 -04:00
utils.py	[https://nvbugs/5747930 ][fix] Use offline tokenizer for whisper models. (#10121 )	2025-12-20 09:42:07 +08:00