TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00


amirkl94 e04f6a1b9b fix: Fix p-tuning test bug (#3326)	2025-04-08 17:14:00 +08:00
* fix: Fix p-tuning test bug
* A change in the vocab_size calculation for T5Tokenizer, introduced in transformers version 4.34, caused addition of incorrect vtokens for ptuning. In general, instead of adding tokens which are outside the vocabulary, tokens inside the vocabulary were added.
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
apps	doc: refactor trtllm-serve examples and doc (#3187)	2025-04-04 11:40:43 +08:00
auto_deploy	chore: remove usernames from comments (#3291)	2025-04-05 13:44:28 +08:00
bert	Update TensorRT-LLM (#2582)	2024-12-16 21:50:47 -08:00
bindings/executor	Update TensorRT-LLM (#2582)	2024-12-16 21:50:47 -08:00
commandr	Update TensorRT-LLM (#2849)	2025-03-04 18:44:00 +08:00
cpp/executor	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
cpp_library	Update TensorRT-LLM (#1274)	2024-03-12 18:15:52 +08:00
deepseek_v3	feat: enable DeepGEMM by default (#3341)	2025-04-08 13:58:57 +08:00
disaggregated	update readme for disaggregated (#3323)	2025-04-07 21:29:15 +08:00
dora	Update TensorRT-LLM (#2755)	2025-02-11 03:01:00 +00:00
draft_target_model	Update TensorRT-LLM (#2849)	2025-03-04 18:44:00 +08:00
eagle	Update TensorRT-LLM (#2849)	2025-03-04 18:44:00 +08:00
enc_dec	doc: use alert formatting (#3153)	2025-03-31 07:30:52 +08:00
exaone	Add EXAONE-Deep (#3054)	2025-03-26 14:24:04 +08:00
gemma	Update (#2978)	2025-03-23 16:39:35 +08:00
glm-4-9b	chore: clean some ci of qa test (#3083)	2025-03-31 14:30:41 +08:00
gpt	fix: GPT-Next convert failure (#3220)	2025-04-02 17:14:39 +08:00
granite	Update TensorRT-LLM (#2755)	2025-02-11 03:01:00 +00:00
infinitebench	Update TensorRT-LLM (#1725)	2024-06-04 20:26:32 +08:00
internlm2	Update TensorRT-LLM (#2755)	2025-02-11 03:01:00 +00:00
language_adapter	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
llama	fix: fix for cp > kvHeadNum (#3002)	2025-03-26 12:39:02 +08:00
llm-api	feat: use cudaMalloc to allocate kvCache (#3303)	2025-04-08 10:59:14 +08:00
llm-eval/lm-eval-harness	Update (#2978)	2025-03-23 16:39:35 +08:00
lookahead	Update TensorRT-LLM (#2849)	2025-03-04 18:44:00 +08:00
mamba	Update TensorRT-LLM (#2849)	2025-03-04 18:44:00 +08:00
medusa	Update TensorRT-LLM (#2849)	2025-03-04 18:44:00 +08:00
mixtral	Update TensorRT-LLM (#2849)	2025-03-04 18:44:00 +08:00
mllama	Update TensorRT-LLM (#2582)	2024-12-16 21:50:47 -08:00
models/contrib	chore: clean some ci of qa test (#3083)	2025-03-31 14:30:41 +08:00
multimodal	test: add random image test for llama-3.2-11b-vision (#3055)	2025-03-26 15:38:16 +08:00
nemotron	Update TensorRT-LLM (#2849)	2025-03-04 18:44:00 +08:00
nemotron_nas	Update TensorRT-LLM (#2562)	2024-12-11 00:31:05 -08:00
openai_triton	Update TensorRT-LLM (#2792)	2025-02-18 21:27:39 +08:00
phi	Add support for Phi-4-mini (#2990)	2025-04-02 08:34:39 +08:00
prompt_lookup	Update TensorRT-LLM (#2873)	2025-03-11 21:13:42 +08:00
python_plugin	Update TensorRT-LLM (#2755)	2025-02-11 03:01:00 +00:00
pytorch	chore: refactor the LlmArgs with Pydantic and migrate remaining pybinding configs to python (#3025)	2025-04-05 13:31:48 +08:00
quantization	Update README.md (#2862)	2025-03-24 13:46:09 +08:00
qwen	fix: fix for cp > kvHeadNum (#3002)	2025-03-26 12:39:02 +08:00
qwen2audio	chore: Handle qwen2audio inputs ids expansion during processing (#3080)	2025-03-26 15:00:27 +08:00
qwenvl	Update TensorRT-LLM (#2849)	2025-03-04 18:44:00 +08:00
recurrentgemma	Fix .gitmodules (#2852)	2025-03-04 22:34:09 +08:00
redrafter	Update TensorRT-LLM (#2849)	2025-03-04 18:44:00 +08:00
sample_weight_stripping	Update (#2978)	2025-03-23 16:39:35 +08:00
scaffolding	doc: add a directory for scaffolding contributors (#3224)	2025-04-02 16:08:00 +08:00
serve	doc: refactor trtllm-serve examples and doc (#3187)	2025-04-04 11:40:43 +08:00
vit	Update (#2978)	2025-03-23 16:39:35 +08:00
whisper	Update TensorRT-LLM (#2849)	2025-03-04 18:44:00 +08:00
constraints.txt	chore: bump version to 0.19.0.dev2025040800 (#3171)	2025-04-02 08:21:55 +08:00
eval_long_context.py	test: Accuracy test improvement (Part 2): Incorporate mmlu to accuracy test suite (#2982)	2025-03-25 07:34:10 +08:00
generate_checkpoint_config.py	Update TensorRT-LLM (#2562)	2024-12-11 00:31:05 -08:00
generate_xgrammar_tokenizer_info.py	Update TensorRT-LLM (#2783)	2025-02-13 18:40:22 +08:00
gpqa_llmapi.py	test: Add gpqa tests for DeepSeek models (#3063)	2025-03-27 19:47:06 +08:00
hf_lora_convert.py	Update TensorRT-LLM (#2755)	2025-02-11 03:01:00 +00:00
mmlu_llmapi.py	test: Accuracy test improvement (Part 2): Incorporate mmlu to accuracy test suite (#2982)	2025-03-25 07:34:10 +08:00
mmlu.py	test: Accuracy test improvement (Part 2): Incorporate mmlu to accuracy test suite (#2982)	2025-03-25 07:34:10 +08:00
run.py	fix: Fix p-tuning test bug (#3326)	2025-04-08 17:14:00 +08:00
summarize.py	test: Accuracy test improvement (Part 3.1): Extend accuracy test suite with LLM API and initial implementation of `trtllm-eval` (#3167)	2025-04-01 22:20:29 +08:00
utils.py	Update TensorRT-LLM (#2849)	2025-03-04 18:44:00 +08:00