Commit Graph

  • 1dc04b2dee ggml : adjust is_first_call init value (#10193) b4036 Georgi Gerganov 2024-11-06 11:20:10 +02:00
  • a1eaf6a960 metal : add quantized FA support (#10149) Georgi Gerganov 2024-11-06 10:24:23 +02:00
  • c5d8bb5a81 leave only basic functions for SYCL CI fix_sycl_ci Meng, Hengyu 2024-11-06 07:47:50 +00:00
  • b8deef0ec0 llama : add <|tool_call|> formatting to Granite template (#10177) b4034 Gabe Goodhart 2024-11-05 05:23:04 -07:00
  • a9e8a9a030 ggml : fix arch check in bf16_to_fp32 (#10164) b4033 Diego Devesa 2024-11-04 23:17:01 +01:00
  • 3407364776 Q6_K AVX improvements (#10118) b4032 Eve 2024-11-04 22:06:31 +00:00
  • b4e9c5998d convert : fix flake8 lint Francis Couture-Harpin 2024-11-04 15:26:15 -05:00
  • 8d8f065743 Merge branch 'master' into compilade/mamba2 Francis Couture-Harpin 2024-11-04 14:30:18 -05:00
  • d5a409e57f ggml : fix gelu tables initialization (#10172) Diego Devesa 2024-11-04 20:06:58 +01:00
  • 3bc7103d2e ggml : avoid multiply by D in GGML_OP_SSM_SCAN Francis Couture-Harpin 2024-11-04 11:36:37 -05:00
  • 401558b7ba ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (#10167) Diego Devesa 2024-11-04 17:34:08 +01:00
  • 9e0ecfb697 server : clarify /slots endpoint, add is_processing (#10162) Xuan Son Nguyen 2024-11-04 16:33:29 +01:00
  • 6a066b9978 fix build break on arm64 linux (#10166) snadampal 2024-11-04 09:08:33 -06:00
  • ea02c753eb cuda : clear error after changing peer access (#10153) b4027 Diego Devesa 2024-11-04 13:10:23 +01:00
  • 05697f670b metal : simplify f16 and f32 dequant kernels (#0) b4026 Georgi Gerganov 2024-11-04 13:49:34 +02:00
  • f8e58135cf metal : move dequantize templates to beginning of MSL source (#0) b4025 Georgi Gerganov 2024-11-04 13:43:32 +02:00
  • 329ed914c9 CANN: adjust backend registry refactor. (#10158) b4024 leo-pony 2024-11-04 19:08:22 +08:00
  • ce027adfb3 sync : ggml b4023 Georgi Gerganov 2024-11-04 10:33:37 +02:00
  • 284e5b0275 cmake : make it possible linking ggml as external lib (ggml/1003) Yuri Khrustalev 2024-11-02 05:09:12 -04:00
  • e2292aaa17 metal : fix minor string leaks (ggml/1004) Plamen Minev 2024-11-01 16:55:10 +02:00
  • 9f40989351 ggml : move CPU backend to a separate file (#10144) b4020 Diego Devesa 2024-11-03 19:34:08 +01:00
  • 08828a6d7d metal : minor fixup in FA kernel (#10143) b4019 Georgi Gerganov 2024-11-03 15:18:40 +02:00
  • 1839f69130 flake.lock: Update (#10146) Georgi Gerganov 2024-11-03 15:14:15 +02:00
  • 9830b6923b Add apple arm to presets (#10134) Christian Köhnenkamp 2024-11-02 23:35:31 +01:00
  • 42cadc74bd server : fix slot selection by lru (#10126) b4016 sasha0552 2024-11-02 16:34:56 +00:00
  • 45950415ed server : fix endpoint checks (#10135) b4015 Georgi Gerganov 2024-11-02 18:34:00 +02:00
  • 4fc8673d09 llama-bench : skip repeated values in consecutive lines sl/llama-bench-headers slaren 2024-11-02 15:37:33 +01:00
  • 1926d6e39d llama : adjust default context size + print warnings (#10136) b4014 Georgi Gerganov 2024-11-02 15:18:56 +02:00
  • b634f8a26f simple-chat : only add bos on first prompt (#10129) b4013 Diego Devesa 2024-11-02 13:08:53 +01:00
  • 7554aa4655 convert-lora : make --base optional (#10110) Xuan Son Nguyen 2024-11-02 12:53:17 +01:00
  • 20e12112fd llama : suggest reduce ctx size when kv init fails sl/aligned-alloc-no-abort slaren 2024-11-02 00:55:19 +01:00
  • bf60f27cda ggml : do not abort when ggml_aligned_malloc fails slaren 2024-11-02 00:54:16 +01:00
  • a6744e43e8 llama : add simple-chat example (#10124) b4011 Diego Devesa 2024-11-01 23:50:59 +01:00
  • e991e3127f llama : use smart pointers for ggml resources (#10117) b4010 Diego Devesa 2024-11-01 23:48:26 +01:00
  • 418f5eef26 vulkan : improve ggml_vk_create_buffer error handling (#9898) b4009 Shupei Fan 2024-11-02 02:33:14 +08:00
  • ba6f62eb79 readme : update hot topics Georgi Gerganov 2024-11-01 17:31:51 +02:00
  • 7d16e1bc8c Merge branch 'master' into compilade/mamba2 Francis Couture-Harpin 2024-11-01 11:12:18 -04:00
  • d865d1478c server : fix smart selection of available slot (#10120) b4007 sasha0552 2024-11-01 13:33:14 +00:00
  • 1804adb0cf ggml : remove ggml_scratch (#10121) b4006 Georgi Gerganov 2024-11-01 12:58:45 +02:00
  • 815fe72adc sync : ggml b4005 Georgi Gerganov 2024-11-01 10:28:24 +02:00
  • f221d56220 ggml : alloc ggml_contexts on the heap (whisper/2525) Georgi Gerganov 2024-11-01 10:23:05 +02:00
  • e597e50794 build: fix build error in Windows env with OneAPI setup (#10107) b4003 Zhenwei Jin 2024-11-01 11:09:59 +08:00
  • 85679d37f3 llama : improve output buffer type selection (#10098) b4002 Diego Devesa 2024-11-01 00:49:53 +01:00
  • 1e9f94994e quantize : fix --keep-split (#10114) b4001 Diego Devesa 2024-11-01 00:45:34 +01:00
  • c02e5ab2a6 llama : fix buffer checks for mamba and rwk (#10111) b4000 Diego Devesa 2024-10-31 22:54:23 +01:00
  • ab3d71f97f loader: refactor tensor weights storage (#9935) b3999 Zhenwei Jin 2024-11-01 02:50:39 +08:00
  • 0a683e8088 server : include scheme when printing URL (#10106) b3998 Kevin Gibbons 2024-10-31 06:02:35 -07:00
  • dea5e86051 ggml : check tensor name lengths in gguf files (#10100) b3997 Diego Devesa 2024-10-31 11:40:59 +01:00
  • 1329c0a75e kompute: add mul_mat_q4_k shader (#10097) b3996 Sergio López 2024-10-31 10:09:52 +01:00
  • afc4a7de65 llama : enable flash attn automatically when supported sl/auto-flash-attn slaren 2024-10-30 23:30:04 +01:00
  • 61408e7fad kompute: add backend registry / device interfaces (#10045) b3995 Sergio López 2024-10-30 17:01:52 +01:00
  • b9e02e8184 ggml : fix memory leaks when loading invalid gguf files (#10094) b3994 Diego Devesa 2024-10-30 14:51:21 +01:00
  • 6763f713bb readme : more lora detail in main example readme (#10064) Rich Dougherty 2024-10-31 01:22:39 +13:00
  • 79a2bc042d convert : more detailed convert lora usage docs (#10065) Rich Dougherty 2024-10-31 01:22:21 +13:00
  • fc83a9e584 ggml : add Q4_0_8_8 RISC-V GEMV and GEMM kernels (#10029) b3991 xctan 2024-10-30 15:00:40 +08:00
  • c5b0f4b5d9 llama : refactor model loader with backend registry (#10026) b3990 Diego Devesa 2024-10-30 02:01:23 +01:00
  • 8f275a7c45 ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend in the MobileVLM model. (#9763) b3989 Changyeon Kim 2024-10-29 17:52:56 +09:00
  • 8d8ff71536 llama : remove Tail-Free sampling (#10071) b3988 Georgi Gerganov 2024-10-29 10:42:05 +02:00
  • 61715d5cc8 llama : Add IBM granite template (#10013) b3987 arch-btw 2024-10-28 10:45:33 -07:00
  • 07028f9d74 flake.lock: Update (#10063) Georgi Gerganov 2024-10-28 17:41:24 +02:00
  • 524afeec9d musa: workaround for Guilty Lockup in cleaning src0 (#10042) b3985 R0CKSTAR 2024-10-28 17:02:48 +08:00
  • 8125e6cbfc server : don't overfill the batch during infill (#10018) b3984 Georgi Gerganov 2024-10-28 08:49:32 +02:00
  • 8841ce3f43 llama : switch KQ multiplication to F32 precision by default (#10015) b3983 Georgi Gerganov 2024-10-27 20:59:58 +02:00
  • cc2983d375 sync : ggml b3982 Georgi Gerganov 2024-10-26 10:34:08 +03:00
  • 8c60a8a462 increase cuda_cpy block size (ggml/996) bssrdf 2024-10-23 14:34:00 -04:00
  • 9e4a2563ea scripts : fix amx sync [no ci] Georgi Gerganov 2024-10-26 10:33:31 +03:00
  • 668750357e metal : support permuted matrix multiplications (#10033) Georgi Gerganov 2024-10-25 22:26:15 +03:00
  • ff252ea48e llama : add DRY sampler (#9702) b3978 wwoodsTM 2024-10-25 10:07:34 -06:00
  • d80fb71f8b llama: string_split fix (#10022) b3977 Michael Podvitskiy 2024-10-25 17:57:54 +02:00
  • c263ca767b remove wrong assert in norm WA for permute(0,1,3,2) mul_mat ggml-ci Meng, Hengyu 2024-10-25 07:41:48 +00:00
  • 2f8bd2b901 llamafile : extend sgemm.cpp support for Q5_0 models (#10010) b3976 Srihari-mcw 2024-10-25 12:57:41 +05:30
  • bc5ba007b2 server : check that the prompt fits in the slot's context (#10030) b3975 Georgi Gerganov 2024-10-25 10:13:46 +03:00
  • 958367bf53 server : refactor slot input data, move tokenizer to HTTP thread (#10023) b3974 Xuan Son Nguyen 2024-10-24 21:51:22 +02:00
  • 40f2555797 ci : fix cmake flags for SYCL Georgi Gerganov 2024-10-24 21:23:33 +03:00
  • 167a515651 CUDA: fix insufficient buffer clearing for MMQ (#10032) b3972 Johannes Gäßler 2024-10-24 14:40:23 +02:00
  • c39665f589 CUDA: fix MMQ for non-contiguous src0, add tests (#10021) b3971 Johannes Gäßler 2024-10-24 11:09:36 +02:00
  • 0a1c750c80 server : samplers accept the prompt correctly (#10019) b3970 wwoodsTM 2024-10-23 13:27:51 -06:00
  • 190a37d797 sync : ggml b3969 Georgi Gerganov 2024-10-23 17:23:55 +03:00
  • 2d3aba9ee8 llama.vim : bump generation time limit to 3s [no ci] Georgi Gerganov 2024-10-23 17:16:56 +03:00
  • 80273a306d CUDA: fix 1D im2col, add tests (ggml/993) b3967 Johannes Gäßler 2024-10-18 09:24:44 +02:00
  • c19af0acb1 ggml : remove redundant set of contexts used field (ggml/978) Daniel Bevenius 2024-10-16 20:10:01 +02:00
  • ac113a0fee llama.vim : add classic vim support (#9995) b3965 Michael Coppola 2024-10-23 07:09:26 -04:00
  • 4c9388fb96 metal : add POOL2D and fix IM2COL (#9943) b3964 Jun Hee Yoo 2024-10-23 19:33:45 +09:00
  • 873279b159 flake.lock: Update github-actions[bot] 2024-10-20 00:22:59 +00:00
  • c8c07d658a llama : fix empty batch causing llama_batch_allocr to crash (#9966) b3962 Xuan Son Nguyen 2024-10-22 16:59:02 +02:00
  • 19d900a756 llama : rename batch to ubatch (#9950) b3961 Daniel Bevenius 2024-10-22 15:31:06 +02:00
  • 11d47057a5 Rwkv chat template fix (#10001) b3960 Molly Sophia 2024-10-22 21:22:26 +08:00
  • c421ac072d lora : warn user if new token is added in the adapter (#9948) b3959 Xuan Son Nguyen 2024-10-22 13:08:41 +02:00
  • 4ff7fe1fb3 llama : add chat template for RWKV-World + fix EOT (#9968) b3958 Molly Sophia 2024-10-22 18:33:37 +08:00
  • 6b8447352d [CANN] Adapt to dynamically loadable backends mechanism (#9970) b3957 leo-pony 2024-10-22 16:16:01 +08:00
  • 674804a996 arg : fix typo in embeddings argument help [no ci] (#9994) Daniel Bevenius 2024-10-22 09:40:02 +02:00
  • e94a138d64 llama.vim : fix info text display [no ci] (#9787) Georgi Gerganov 2024-10-22 00:35:25 +03:00
  • e01c67affe llama.vim : move info to the right of screen [no ci] (#9787) Georgi Gerganov 2024-10-21 22:52:22 +03:00
  • 994cfb1acb readme : update UI list (#9972) Asghar Ghorbani 2024-10-21 20:20:59 +02:00
  • 94008cc760 arg : fix attention non-causal arg value hint (#9985) b3952 Daniel Bevenius 2024-10-21 20:12:52 +02:00
  • dbd5f2f573 llama.vim : plugin for Neovim (#9787) Georgi Gerganov 2024-10-21 20:25:02 +03:00
  • f594bc80ba ggml : add asserts for type conversion in fattn kernels (#9971) b3950 Georgi Gerganov 2024-10-21 16:20:46 +03:00
  • d5ebd79c76 rpc : pack only RPC structs (#9959) b3949 Radoslav Gerganov 2024-10-21 13:35:40 +03:00
  • 55e47786e3 llama : default sampling changes + greedy update (#9897) b3948 Georgi Gerganov 2024-10-21 09:46:40 +03:00
  • bc21975084 speculative : fix handling of some input params (#9963) b3947 Georgi Gerganov 2024-10-21 09:37:12 +03:00