Commit Graph

  • 5be6c803fa llama : remove token functions with context args in favor of model (#3720) b1416 Marcus Dunn 2023-10-23 12:40:03 -07:00
  • c13fcfbfc0 cuda : batched cuBLAS GEMMs for src0 F16 and src1 F32 (attention ops) Georgi Gerganov 2023-10-23 20:37:04 +03:00
  • 84d4ca0e47 cuda : minor indentation Georgi Gerganov 2023-10-23 20:36:50 +03:00
  • 8d8d54f834 ggml : skip nops in compute_forward Georgi Gerganov 2023-10-23 20:36:32 +03:00
  • 6a30bf3e51 batched : add NGL arg Georgi Gerganov 2023-10-23 20:36:12 +03:00
  • 8fb1be642e cmake : add helper for faster CUDA builds Georgi Gerganov 2023-10-23 20:35:19 +03:00
  • b9bb4cbe86 Separate bug and enhancement template + no default title upd-issue-templates M. Yusuf Sarıgöz 2023-10-23 18:59:11 +03:00
  • 6336701c93 Fix baichuan convert script not detecing model (#3739) Galunid 2023-10-23 17:47:03 +02:00
  • 96981f37b1 make : add optional CUDA_NATIVE_ARCH (#2482) b1414 Alex 2023-10-22 15:56:53 -04:00
  • 438c2ca830 server : parallel decoding and multimodal (#3677) b1413 Georgi Gerganov 2023-10-22 22:53:08 +03:00
  • c0f4d54870 server : add comment about changing slot_state to bool server-rev Georgi Gerganov 2023-10-22 22:24:39 +03:00
  • 9e70cc0322 Add test for MPT tokenization (#3728) b1412 goerch 2023-10-22 21:21:42 +02:00
  • 83e1490187 server : fix slot reuse Georgi Gerganov 2023-10-22 21:57:23 +03:00
  • 5a42a5f8e8 readme : remove unsupported node.js library (#3703) Ian Scrivener 2023-10-23 05:16:43 +11:00
  • a5e7dbd614 llama : validate special token ids are in range when loading GGUF model (#3635) b1410 Kerfuffle 2023-10-22 12:14:56 -06:00
  • d3956aea53 main : escape prompt for cfg_negative_prompt and consecutive inputs in main with interactive (#3623) b1409 vvhg1 2023-10-22 20:09:51 +02:00
  • 8fe7ca4875 server : apply fix from #3722 Georgi Gerganov 2023-10-22 21:05:45 +03:00
  • 00ae55b388 server : hide ctx_sampling->prev behind API (#3696) Georgi Gerganov 2023-10-22 20:09:01 +03:00
  • 3d6a687f1d Update readme to document multimodal in server M. Yusuf Sarıgöz 2023-10-22 20:03:35 +03:00
  • dd1af2ed35 server : minor style Georgi Gerganov 2023-10-22 19:52:38 +03:00
  • a4d69d8b81 Merge branch 'server-rev' of https://github.com//ggerganov/llama.cpp into server-rev M. Yusuf Sarıgöz 2023-10-22 19:49:48 +03:00
  • 2679c432d5 Update readme to document multimodal in server M. Yusuf Sarıgöz 2023-10-22 19:49:33 +03:00
  • a8063171bd server : completion requests remember slot_id Georgi Gerganov 2023-10-22 19:34:48 +03:00
  • f305d6434f editorconfig : new line in index.html Georgi Gerganov 2023-10-22 19:10:30 +03:00
  • 5359fb9267 Do not save/load image_data to localStorage M. Yusuf Sarıgöz 2023-10-22 19:08:09 +03:00
  • f67d971344 server : bug fix for prompt caching Georgi Gerganov 2023-10-22 17:52:59 +03:00
  • 569ebf11cf server : refactor ctx_sampling init + n_ctx + names Georgi Gerganov 2023-10-22 16:57:05 +03:00
  • ef18f4d579 server : fix crash in Debug on macOS (I have no idea why this fixes it!?) Georgi Gerganov 2023-10-22 16:55:40 +03:00
  • 197a0a9e23 server : fix switch fallthrough Georgi Gerganov 2023-10-22 16:55:05 +03:00
  • 715f384a6b clip : link to ggml, not to llama Georgi Gerganov 2023-10-22 16:52:12 +03:00
  • 4b4ab722ab make : silence stb warnings Georgi Gerganov 2023-10-22 16:51:59 +03:00
  • 176993c871 Merge branch 'master' into server-rev Georgi Gerganov 2023-10-22 15:04:16 +03:00
  • cb79f8a2d8 llama : add SKIP_KQ_KQV option perf-study Georgi Gerganov 2023-10-22 09:58:29 +03:00
  • ed9fde7a1e ggml : skip nops Georgi Gerganov 2023-10-22 09:55:37 +03:00
  • 2471d56a2e llama : profiling the attention compute Georgi Gerganov 2023-10-22 09:22:54 +03:00
  • 22c69a2794 batched : add len CLI argument b1408 Georgi Gerganov 2023-10-22 08:37:20 +03:00
  • 2eb4c11ec5 fix image load + view image in chat FSSRepo 2023-10-21 14:34:19 -04:00
  • 17b23eb9cb server : fix multibyte handle in partial response (#3706) Jhen-Jie Hong 2023-10-21 19:58:03 +08:00
  • 465219b914 CLBlast: Add outer loops over src0 for broadcasting in mulmat b1407 shibe2 2023-10-12 16:01:23 +04:00
  • d1031cf49c sampling : refactor init to use llama_sampling_params (#3696) b1406 Georgi Gerganov 2023-10-20 21:07:23 +03:00
  • 778c070d1b server : logs + minor code style Georgi Gerganov 2023-10-20 20:44:51 +03:00
  • 5d540e80d1 server : no need for atomic int - already using mutex Georgi Gerganov 2023-10-20 20:44:29 +03:00
  • 113dd60005 server : bach has to be allocated for n_parallel sequences Georgi Gerganov 2023-10-20 20:42:45 +03:00
  • 6b2437e32d added thread safe pipeline FSSRepo 2023-10-20 12:07:32 -04:00
  • 56ba00b923 sampling : hide prev behind API and apply #3661 sampling-refactor Georgi Gerganov 2023-10-20 18:26:20 +03:00
  • 7e2b5fb1dd sampling : add llama_sampling_print helper Georgi Gerganov 2023-10-20 18:02:50 +03:00
  • b526561583 sampling : rename penalty params + reduce size of "prev" vector Georgi Gerganov 2023-10-20 17:47:13 +03:00
  • 84ed48b473 examples : remove embd-input and gptneox-wip Georgi Gerganov 2023-10-20 17:08:32 +03:00
  • 6e6587656f llama : combine repetition, frequency and presence penalties in 1 call Georgi Gerganov 2023-10-20 17:05:46 +03:00
  • cd1e937821 sampling : refactor init to use llama_sampling_params Georgi Gerganov 2023-10-20 14:58:20 +03:00
  • 8cf19d60dc gguf : support big endian platform (#3552) b1405 Qin Yue Chen 2023-10-20 06:19:40 -05:00
  • a0edf73bda server : fix uninitialized sampling context (close #3685) b1404 Georgi Gerganov 2023-10-20 13:06:10 +03:00
  • f439e506e8 ggml : fix rope + llama minor optimizations (#3560) b1403 Herman Semenov 2023-10-20 10:02:12 +00:00
  • e78f3ef24a convert : restore compat with old Falcon models (#3680) cebtenzzre 2023-10-20 01:32:08 -04:00
  • f3b25e4043 multimodal : add BakLLaVA conversion support (#3682) M. Yusuf Sarıgöz 2023-10-19 19:40:41 +03:00
  • 60abea9798 llava : avoid segfault in case of non-existent mmproj file (#3674) b1400 M. Yusuf Sarıgöz 2023-10-19 16:59:11 +03:00
  • 325d1793f7 server : minor sync Georgi Gerganov 2023-10-19 15:03:24 +03:00
  • 9740824ba5 server : snake case Georgi Gerganov 2023-10-19 14:44:37 +03:00
  • e3a2c3fe32 server : use refs + use llama_batch_clear() Georgi Gerganov 2023-10-19 14:44:04 +03:00
  • 3d5929e8ee server : bug fix in ingest_images Georgi Gerganov 2023-10-19 14:43:19 +03:00
  • a8c981b734 server : remove beam-search functionality Georgi Gerganov 2023-10-19 14:10:37 +03:00
  • 654e0a1fe0 server : coding-style normalization (part 2) Georgi Gerganov 2023-10-19 14:09:45 +03:00
  • e44ed60187 server : coding-style normalization Georgi Gerganov 2023-10-19 13:37:39 +03:00
  • ab2fc00224 latest changes of sampling API FSSRepo 2023-10-18 16:57:48 -04:00
  • 8540568c48 Merge branch 'master' of https://github.com/ggerganov/llama.cpp FSSRepo 2023-10-18 16:55:26 -04:00
  • 7196c4e08a new sampling API FSSRepo 2023-10-18 16:50:09 -04:00
  • 004797f6ac readme : update hot topics Georgi Gerganov 2023-10-18 21:44:43 +03:00
  • 4e82b2ea3f speculative : bug fixes b1398 Georgi Gerganov 2023-10-18 18:49:40 +03:00
  • 0e89203b51 speculative : add tree-based sampling example (#3624) b1397 Georgi Gerganov 2023-10-18 16:21:57 +03:00
  • 84b8f2b060 Merge branch 'ggerganov:master' into master Steward Garcia 2023-10-18 08:43:17 -04:00
  • c67fe68e41 metal : implement q5_0 and q5_1 kernels (#3648) b1396 Jhen-Jie Hong 2023-10-18 07:21:48 -05:00
  • 1117d06607 opencl : fix element-wise multiplication (#3656) b1395 shibe2 2023-10-18 16:09:22 +04:00
  • ad2727d091 Merge branch 'master' into speculative-tree speculative-tree Georgi Gerganov 2023-10-18 10:38:03 +03:00
  • 35fd37430f fix zig build FSSRepo 2023-10-17 18:04:26 -04:00
  • c02c52efb5 fix multiple clients FSSRepo 2023-10-17 17:54:56 -04:00
  • d2b1fac6c7 fix make bui;d errors FSSRepo 2023-10-17 17:18:56 -04:00
  • ed0c11cb83 multimodal support enabled by default FSSRepo 2023-10-17 16:58:20 -04:00
  • 6c277eaab5 update api like OpenAI FSSRepo 2023-10-17 16:53:38 -04:00
  • 58f8ae9bfe readme change FSSRepo 2023-10-17 16:32:19 -04:00
  • fa0f22f14f Merge remote-tracking branch 'upstream/master' FSSRepo 2023-10-17 16:31:33 -04:00
  • cb33f43a2a fix embeddings when using CUDA (#3657) b1394 slaren 2023-10-17 22:24:50 +02:00
  • aa2268f4cd sync README.md changes FSSRepo 2023-10-17 16:21:05 -04:00
  • e1675d133c llama : avoid fprintf in favor of LLAMA_LOG (#3538) b1393 Georgi Gerganov 2023-10-17 22:34:26 +03:00
  • 8402566a7c readme : update hot-topics & models, detail windows release in usage (#3615) BarfingLemurs 2023-10-17 14:13:21 -04:00
  • 40e5ce054f CLBlast: Fix temporary buffer size for f16 conversion (wsize) b1391 shibe2 2023-10-11 21:30:06 +04:00
  • a5e8c1d8c7 train-text-from-scratch : fix assert failure in ggml-alloc (#3618) b1390 slaren 2023-10-17 19:00:58 +02:00
  • e74c705e15 editorconfig : remove trailing spaces Georgi Gerganov 2023-10-17 19:52:53 +03:00
  • 3ad1e3f1a1 server : documentation of JSON return value of /completion endpoint (#3632) coezbek 2023-10-17 18:51:02 +02:00
  • bd9451ca2a Merge branch 'master' into speculative-tree Georgi Gerganov 2023-10-17 19:31:40 +03:00
  • 1142013da4 save-load-state : fix example + add ci test (#3655) b1387 Georgi Gerganov 2023-10-17 19:12:46 +03:00
  • 5fe268a4d9 readme : add Aquila2 links (#3610) ldwang 2023-10-17 23:52:33 +08:00
  • 1a159553f9 tokenizer : special token handling (#3538) b1385 staviq 2023-10-17 17:11:01 +02:00
  • 010c52ec59 Merge branch 'master' into speculative-tree Georgi Gerganov 2023-10-17 17:24:11 +03:00
  • e6dd81f0bc speculative : fix the n_drafted fix + p constants Georgi Gerganov 2023-10-17 17:04:31 +03:00
  • f07cd35da4 speculative : fix off-by-one for n_drafted Georgi Gerganov 2023-10-17 11:40:09 +03:00
  • 281ef73c25 k-quants : fix quantization ranges (#3646) b1384 Georgi Gerganov 2023-10-17 09:19:28 +03:00
  • 940efa95fe llava : fix tokenization to not add bos between image embeddings and user prompt (#3645) b1383 Georgi Gerganov 2023-10-16 23:58:00 +03:00
  • 4d1804330e fix llava implementation FSSRepo 2023-10-16 16:31:17 -04:00
  • d7eca255d7 context shift fixed FSSRepo 2023-10-16 14:43:10 -04:00
  • 2d9f11db28 fixed premature end due stop word FSSRepo 2023-10-16 12:36:05 -04:00