llama.cpp

Mirror of https://github.com/ggml-org/llama.cpp.git (synced 2026-07-04 18:20:21 +00:00)

Author	SHA1	Message	Date
Andrei Betlen	a89427908d	Add custom kq scaling from Gemma2Attention	2024-06-29 10:17:33 -04:00
Andrei Betlen	6f2464e3dd	Merge branch 'add-gemma2-soft-capping' of github.com:ggerganov/llama.cpp into add-gemma2-soft-capping	2024-06-29 01:11:17 -04:00
Andrei Betlen	bb7159927d	Add default value for attention and final logit softcap value	2024-06-29 01:10:55 -04:00
Andrei	3a2471811f	Update src/llama.cpp Co-authored-by: slaren <slarengh@gmail.com>	2024-06-28 16:07:47 -04:00
Andrei Betlen	f4424c150f	Disable flash attention for Gemma2	2024-06-28 16:00:20 -04:00
Andrei Betlen	4d3f17b4ac	Add attention and final logit softcapping.	2024-06-28 15:42:19 -04:00
Xuan Son Nguyen	26a39bbd6b	Add MiniCPM, Deepseek V2 chat template + clean up `llama_chat_apply_template_internal` (#8172) * tmp_contains * minicpm chat template * add DeepSeek Lite template * change deepseek-lite to deepseek2 * correct code comment * correct code from master branch	2024-06-28 15:11:44 +02:00
pculliton	e57dc62057	llama: Add support for Gemma2ForCausalLM (#8156) * Inference support for Gemma 2 model family * Update convert-hf-to-gguf.py, constants, and tensor mappings * cleanup * format fix * Fix special token vocab bug * Don't add space prefix * fix deleted lines * Update src/llama.cpp Co-authored-by: slaren <slarengh@gmail.com> * Add model type names * Add control vector * Fix model type identification --------- Co-authored-by: Andrei Betlen <abetlen@gmail.com> Co-authored-by: slaren <slarengh@gmail.com>	2024-06-27 21:00:43 -07:00
Sigbjørn Skjæret	6030c61281	Add Qwen2MoE 57B-A14B model identifier (#8158) * Add Qwen2MoE 57B-A14B * Add Qwen2MoE 57B-A14B	2024-06-27 16:27:41 +02:00
kustaaya	f675b20a3b	Added support for Viking pre-tokenizer (#8135) Co-authored-by: kustaaya <kustaaya@protonmail.com>	2024-06-27 10:58:54 +02:00
Sigbjørn Skjæret	911e35bb8b	llama : fix CodeLlama FIM token checks (#8144) * account for space prefix character * use find instead	2024-06-27 10:46:41 +03:00
Georgi Gerganov	f3f65429c4	llama : reorganize source code + improve CMake (#8006) * scripts : update sync [no ci] * files : relocate [no ci] * ci : disable kompute build [no ci] * cmake : fixes [no ci] * server : fix mingw build ggml-ci * cmake : minor [no ci] * cmake : link math library [no ci] * cmake : build normal ggml library (not object library) [no ci] * cmake : fix kompute build ggml-ci * make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE ggml-ci * move public backend headers to the public include directory (#8122) * move public backend headers to the public include directory * nix test * spm : fix metal header --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * scripts : fix sync paths [no ci] * scripts : sync ggml-blas.h [no ci] --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-06-26 18:33:02 +03:00

12 Commits