Commit Graph

  • d0f5deebf8 readme : update UI list (#6503) Hoang Nguyen 2024-04-05 11:39:43 -07:00
  • 87e21bbacd bench : make n_batch and n_ubatch configurable in Batched bench (#6500) b2613 Ting Sun 2024-04-06 01:34:53 +07:00
  • 1b496a745c [SYCL] Fixed minor bug when enabling FP16 for non intel targets (#6464) b2612 Ouadie EL FAROUKI 2024-04-05 14:35:06 +01:00
  • 89961dea87 Merge branch 'master' into gg/flash-attn Georgi Gerganov 2024-04-05 09:44:12 +03:00
  • a37696d4f1 speculative : more robust tokenizer comparison ceb/bert-tokenizer-fixes Jared Van Bortel 2024-04-04 18:25:19 -04:00
  • 92591c125f examples : rely on new behavior of add_special Jared Van Bortel 2024-04-04 18:12:33 -04:00
  • d1a1b614cd spm : fix special_add_bos default Jared Van Bortel 2024-04-04 17:54:46 -04:00
  • 45983e3a47 convert : remove now-unused ignore_nonllama parameter Jared Van Bortel 2024-04-04 17:44:58 -04:00
  • 909f6be291 convert scripts : fix python 3.8 compatibility Jared Van Bortel 2024-04-04 17:14:46 -04:00
  • 6a9d3c0911 convert : fix Tensor type annotations Jared Van Bortel 2024-04-04 17:07:07 -04:00
  • 0d052cbe39 Merge branch 'master' into ceb/bert-tokenizer-fixes Jared Van Bortel 2024-04-04 16:02:31 -04:00
  • 8803582721 llama : handle added special tokens like HF does Jared Van Bortel 2024-03-27 16:59:49 -04:00
  • 748fc8baa3 convert-hf-to-gguf : fix BERT abuse of LlamaHfVocab Jared Van Bortel 2024-03-27 16:13:09 -04:00
  • a307375c02 readme : add Dot to UI list (#6487) b2611 alexpinel 2024-04-04 18:22:50 +01:00
  • b660a5729e readme : fix typo (#6481) Jun Jie 2024-04-05 01:16:37 +08:00
  • 0a1d889e27 server: add cURL support to server Dockerfiles (#6474) Ed Lepedus 2024-04-04 17:31:22 +01:00
  • 7dda1b727e ci: exempt master branch workflows from getting cancelled (#6486) b2608 Minsoo Cheong 2024-04-05 01:30:53 +09:00
  • c666ba26c3 build CI: Name artifacts (#6482) Ewout ter Hoeven 2024-04-04 17:08:55 +02:00
  • 2e66913e5f server: allow penalizing repetition of newlines on server webpage (#6431) Shakhar Dasgupta 2024-04-04 11:03:00 -04:00
  • 8120efee1d ci: bench fix concurrency for workflow trigger dispatch with sha1 (#6478) Pierrick Hymbert 2024-04-04 16:59:04 +02:00
  • 8db1e4d45f llama : use std::find for seq_nodes in llama_rs_cache Francis Couture-Harpin 2024-04-04 10:46:43 -04:00
  • a74401f0e5 Correct README link (#6458) limitedAtonement 2024-04-04 10:30:02 -04:00
  • 7a2c92637a ci: bench: add more ftype, fix triggers and bot comment (#6466) Pierrick Hymbert 2024-04-04 11:57:58 +02:00
  • 4bcd6b959c common: remove duplicate check for curl (#6471) Daniel Bevenius 2024-04-04 09:49:21 +02:00
  • 9b84ae1806 examples : add GBNF validator program (#5948) Clint Herron 2024-04-04 03:44:28 -04:00
  • 4399f13fb9 server : remove obsolete --memory-f32 option Georgi Gerganov 2024-04-04 09:34:58 +03:00
  • 1a43c7254e server : add option to disable KV offload (#6468) Xiao-Yong Jin 2024-04-04 01:33:48 -05:00
  • 72d73af651 convert : fix for lint error complaining of bare except (#6470) Clint Herron 2024-04-04 02:32:53 -04:00
  • 271104c65c wip: llama : separate recurrent states from the KV cache Francis Couture-Harpin 2024-04-03 11:07:16 -04:00
  • 5fb1574c81 A few small fixes to server's README docs (#6428) Fattire 2024-04-03 13:22:57 -07:00
  • 60cdf40cc3 server : handle exception on wrong type in request (#6452) JH23X 2024-04-03 20:09:52 +02:00
  • bb43cf7e9d llama : add SEA-LION support (#6448) bryanSwk 2024-04-04 02:05:10 +08:00
  • 9f62c0173d ci : update checkout, setup-python and upload-artifact to latest (#6456) Ewout ter Hoeven 2024-04-03 20:01:13 +02:00
  • 5d4f12e462 server: add cURL support to server.Dockerfile (#6461) Ed Lepedus 2024-04-03 18:56:37 +01:00
  • 154d4ee39c readme : add feature-rich rust bindings (#6465) Francisco Melo 2024-04-03 18:53:37 +01:00
  • e69945d953 security : create policy (#6354) Joyce 2024-04-03 14:48:07 -03:00
  • db214fa578 Missing tokenizer.model error during gguf conversion (#6443) b2590 Abhishek Gopinath K 2024-04-03 21:12:52 +05:30
  • 1ff4d9f3d6 Add OpenChat, Alpaca, Vicuna chat templates (#6397) b2589 kaizau 2024-04-03 23:24:31 +08:00
  • 076b08649e readme : update hot topics Georgi Gerganov 2024-04-03 16:11:15 +03:00
  • 08a0c02060 ggml : mul_mat_id use the same tensor for all the experts (#6387) slaren 2024-04-03 15:07:05 +02:00
  • 52604860f9 [SYCL] Disable iqx on windows as WA (#6435) b2586 Meng, Hengyu 2024-04-03 10:34:40 +08:00
  • ee19a4ab7e fix KV cache padding, NaN from INFINITY (#6438) Johannes Gäßler 2024-04-02 17:26:22 +02:00
  • c63dfdf765 fix cmake build Johannes Gäßler 2024-04-02 11:58:59 +02:00
  • bb0d51accd fix excessive KQ_b loads Johannes Gäßler 2024-04-02 11:13:46 +02:00
  • e1ecd3b129 fix compile warnings Johannes Gäßler 2024-04-02 10:27:34 +02:00
  • 3f777acf06 Multiple parallel blocks for batch size 1 Johannes Gäßler 2024-04-01 16:41:56 +02:00
  • 68d793bee8 no ncols == 64 Johannes Gäßler 2024-04-01 15:54:50 +02:00
  • cca6d027a3 4 warps, 256 stride for all D Johannes Gäßler 2024-03-31 18:39:02 +02:00
  • 269374ed81 adjust kernel selection logic Johannes Gäßler 2024-03-31 16:01:27 +02:00
  • 81da919864 no vec for hs, no hs==256 ncols==32 for Volta Johannes Gäßler 2024-03-30 10:34:09 +01:00
  • d59ac670bf 16 cols for Phi-2 Johannes Gäßler 2024-03-30 09:19:19 +01:00
  • 75aa7b4b18 CUDA: faster FlashAttention, kernel for bs == 1 Johannes Gäßler 2024-03-29 23:02:39 +01:00
  • f87f7b8986 flake.lock: Update (#6402) b2585 Georgi Gerganov 2024-04-01 19:05:57 +03:00
  • 33a5244806 compare-llama-bench.py: fix long hexsha args (#6424) Johannes Gäßler 2024-04-01 13:30:43 +02:00
  • 226e819371 ci: server: verify deps are coherent with the commit (#6409) Pierrick Hymbert 2024-04-01 12:36:40 +02:00
  • c50a82ce0f readme : update hot topics b2582 Georgi Gerganov 2024-03-31 11:56:30 +03:00
  • 805d705032 license : add AUTHORS Georgi Gerganov 2024-03-31 10:17:36 +03:00
  • 37e7854c10 ci: bench: fix Resource not accessible by integration on PR event (#6393) b2581 Pierrick Hymbert 2024-03-30 11:36:07 +01:00
  • c342d070c6 Fedora build update (#6388) Mohammadreza Hendiani 2024-03-30 01:29:56 +03:30
  • f7fc5f6c6f split: allow --split-max-size option (#6343) b2579 Xuan Son Nguyen 2024-03-29 22:34:44 +01:00
  • ba0c7c70ab Vulkan k-quant mmq and ggml-backend offload functionality (#6155) b2578 0cc4m 2024-03-29 17:29:21 +01:00
  • d48ccf3ad4 sync : ggml (#6351) Georgi Gerganov 2024-03-29 17:45:46 +02:00
  • 069574775c [Model] Add support for xverse (#6301) b2576 hxer7963 2024-03-29 21:37:03 +08:00
  • cfde806eb9 ci : fix BGE wget (#6383) Georgi Gerganov 2024-03-29 14:34:28 +02:00
  • b910287954 readme : add project (#6356) zhouwg 2024-03-29 15:33:46 +08:00
  • 8093987090 cmake : add explicit metal version options (#6370) b2573 Matt Clayton 2024-03-29 03:27:42 -04:00
  • 057400a3fd llama : remove redundant reshape in build_kv_store (#6369) Daniel Bevenius 2024-03-29 08:23:22 +01:00
  • b75c38166c convert : allow conversion of Mistral HF models (#6144) Pedro Cuenca 2024-03-29 08:15:00 +01:00
  • bfe7dafc9c readme : add notice for UI list b2570 Georgi Gerganov 2024-03-28 22:56:03 +02:00
  • 4c190ba676 cuda : reduce registers gg/flash-attn-a Georgi Gerganov 2024-03-28 21:17:08 +02:00
  • 5dd355fe26 cuda : bump nwarps by 1 Georgi Gerganov 2024-03-28 20:21:09 +02:00
  • 08e69c5008 cuda : adapt soft_max to F16 mask and pos Georgi Gerganov 2024-03-28 19:40:11 +02:00
  • 3e318e764f Merge branch 'master' into gg/flash-attn Georgi Gerganov 2024-03-28 19:32:51 +02:00
  • 57c03b78b6 metal : improve perf via smaller int registers Georgi Gerganov 2024-03-28 19:29:06 +02:00
  • 5106ef482c [SYCL] Revisited & updated SYCL build documentation (#6141) Ouadie EL FAROUKI 2024-03-28 16:01:47 +00:00
  • be55134a53 convert : refactor vocab selection logic (#6355) b2568 Jared Van Bortel 2024-03-28 11:44:36 -04:00
  • 66ba560256 llava : fix MobileVLM (#6364) b2567 Ziang Wu 2024-03-28 22:33:10 +08:00
  • 0308f5e3d7 llama : fix command-r inference when omitting outputs (#6367) b2566 compilade 2024-03-28 08:05:54 -04:00
  • 28cb9a09c4 ci: bench: fix master not schedule, fix commit status failed on external repo (#6365) Pierrick Hymbert 2024-03-28 11:27:56 +01:00
  • 64b7d85891 llama : fix command-r inference compilade/fix-command-r Francis Couture-Harpin 2024-03-28 06:22:24 -04:00
  • cfc4d75df6 doc: fix outdated default value of batch size (#6336) Ting Sun 2024-03-28 16:51:06 +08:00
  • 6902cb7f2e server : stop gracefully on SIGTERM (#6348) b2563 Eric Zhang 2024-03-28 16:50:48 +08:00
  • d2d8f38996 nix: removed unnessesary indentation hutli 2024-03-27 19:17:30 +01:00
  • d39b308eaf nix: moved blas availability check to package inputs so it is still overridable hutli 2024-03-27 19:14:28 +01:00
  • c873976649 using blas.meta.available to check host platform hutli 2024-03-27 18:10:08 +01:00
  • dbb03e2b9c only using explicit blas if hostPlatform is allowed hutli 2024-03-27 17:25:05 +01:00
  • e9f17dc3bf nix: .#windows: proper cross-compilation set-up Someone Serge 2024-03-26 16:22:42 +00:00
  • 22a462cc1f nix: package: don't introduce the dependency on python Someone Serge 2024-03-26 16:22:07 +00:00
  • f6a0f5c642 nix: .#widnows: init hutli 2024-02-15 14:25:04 +01:00
  • d0e2f6416b doc: fix typo in MobileVLM-README.md (#6181) Ziang Wu 2024-03-28 12:03:30 +08:00
  • 25f4a613c4 [SYCL] fix set main gpu crash (#6339) b2554 Neo Zhang Jianyu 2024-03-28 08:55:24 +08:00
  • a016026a3a server: continuous performance monitoring and PR comment (#6283) Pierrick Hymbert 2024-03-27 20:26:49 +01:00
  • 53c7ec53d5 nix: ci: dont test cuda and rocm (for now) Someone Serge 2024-03-27 16:17:46 +00:00
  • e5b89a441a ggml : fix bounds checking of zero size views (#6347) b2551 slaren 2024-03-27 15:07:50 +01:00
  • 3a0345970e make : whitespace Georgi Gerganov 2024-03-27 15:02:49 +02:00
  • 1e13987fba embedding : show full embedding for single prompt (#6342) howlger 2024-03-27 12:15:44 +01:00
  • 6be02b5969 cuda : fix build gg/flash-attn-wip Georgi Gerganov 2024-03-27 10:31:52 +02:00
  • 013721df2b Merge branch 'master' into gg/flash-attn Georgi Gerganov 2024-03-27 10:24:09 +02:00
  • e82f9e2b83 [SYCL] Fix batched impl for NVidia GPU (#6164) b2548 AidanBeltonS 2024-03-27 08:16:40 +00:00
  • cbc8343619 Make IQ1_M work for QK_K = 64 (#6327) Kawrakow 2024-03-27 08:44:27 +01:00