Commit Graph

  • ffd0624be2 Remove this debug code. Adam Treat 2023-10-30 11:38:21 -04:00
  • a5eb001eab Revert the prompt processing on gpu for now. Adam Treat 2023-10-27 18:32:51 -04:00
  • e006d377dd Scale the workgroup count down to allow correct generation for falcon with AMD radeon cards with lower workgroup count limit Adam Treat 2023-10-27 18:32:29 -04:00
  • 89b71278ff llama : decide to disable Vulkan before loading tensors (#7) cebtenzzre 2023-10-27 19:04:26 -04:00
  • 1c17010188 vulkan : fix missing break in matmul selection (#9) cebtenzzre 2023-10-23 12:22:27 -04:00
  • 74ddf0f17d Fix synchronization problem for AMD Radeon with amdvlk driver or windows drivers. Does not have any performance or fidelity effect on other gpu/driver combos I've tested. Adam Treat 2023-10-27 12:05:24 -04:00
  • 8d9efbf97a Lower the workgroup count for some shaders by providing a loop that processes four floats at a time. Adam Treat 2023-10-26 11:48:36 -04:00
  • 752f7ebd61 Remove unused push constant that was giving validation errors. Adam Treat 2023-10-26 13:01:40 -04:00
  • 8400015337 Don't try an allocation on a heap that is smaller than the size we require. Adam Treat 2023-10-26 13:00:53 -04:00
  • cbc0d1af79 kompute : make scripts executable cebtenzzre 2023-10-23 11:46:26 -04:00
  • 21841d3163 kompute : enable kp_logger and make it static (#8) cebtenzzre 2023-10-16 16:51:41 -04:00
  • cc05a602d6 use mat*vec shaders for mat*mat Aaron Miller 2023-10-16 10:00:25 -07:00
  • c1fd64548d attempted speedups 2 Aaron Miller 2023-10-13 13:14:36 -07:00
  • 9bc52ebae3 attempted speedups Aaron Miller 2023-10-13 11:10:02 -07:00
  • 8dc79ac380 clean up vulkan/cpu switch Aaron Miller 2023-10-12 11:46:30 -07:00
  • cd0257ed0d q4_1 mat*mat Aaron Miller 2023-10-12 11:22:31 -07:00
  • 4809890d80 rm commented dbg print Aaron Miller 2023-10-12 10:23:09 -07:00
  • b78a94bc6d q6k mm works Aaron Miller 2023-10-11 17:10:42 -07:00
  • d5741c07a5 use op param epsilon for norms Aaron Miller 2023-10-11 18:40:07 -07:00
  • 3327d84a7f perf: use bigger threadgroups in mm Aaron Miller 2023-10-11 16:02:53 -07:00
  • 46385ee0d5 misc vulkan cleanup Aaron Miller 2023-10-10 21:38:18 -07:00
  • f0cd38b9ad add mat*mat ops Aaron Miller 2023-10-10 21:37:07 -07:00
  • 09d83f0401 Delete TODO now that we have q8_0. Adam Treat 2023-10-05 10:52:04 -04:00
  • 8564f79036 falcon h2d + reenable vulkan Aaron Miller 2023-10-04 21:03:27 -07:00
  • 020b1745a0 vulkan: implement neox mode for rope Aaron Miller 2023-10-04 23:36:24 -07:00
  • ff4212d20f q8 mat*vec Aaron Miller 2023-10-04 21:02:17 -07:00
  • 9db90cbe12 f16 mv broadcasting fix (gqa fix) Aaron Miller 2023-10-04 21:49:55 -07:00
  • 3d850db767 kompute : remove Q6_K from list of supported quant types Cebtenzzre 2023-10-04 16:19:19 -04:00
  • 24a4a5956a kompute : only try to use Vulkan for LLaMA itself Cebtenzzre 2023-10-04 16:16:04 -04:00
  • bc4b5ed1cb Fixes for subgroup size to bring AMD and NVIDIA in line with each other for all kernels. Adam Treat 2023-10-04 14:24:35 -04:00
  • de589ced7c Change this back to be in agreement with metal and our previous softmax kernel. Adam Treat 2023-10-03 13:30:23 -04:00
  • 6ac39752bf Fix up the upstream CMakeLists.txt so we can build just llama.cpp with our branch. Adam Treat 2023-10-03 12:40:24 -04:00
  • 32289aa447 Fixes for norm. Adam Treat 2023-10-02 21:00:48 -04:00
  • 06d4b21598 Fix offset into the qh and now we have working vulkan accelerated for gguf'd llama. Adam Treat 2023-10-02 11:30:10 -04:00
  • f1c9bc1821 Add q6_k getrows and mul*vec kernel. Adam Treat 2023-10-02 09:05:22 -04:00
  • 4b223ec432 Refactor getrows to use common code and get ready for q6_k. Adam Treat 2023-10-02 09:04:02 -04:00
  • 5509f74318 Minor cleanup. Adam Treat 2023-10-02 09:01:45 -04:00
  • 601905e75e Move the subgroups and printf into common. Adam Treat 2023-10-02 09:00:55 -04:00
  • 93306f16d0 Consolidate code for mat x vec kernels and use subgroups more extensively. Adam Treat 2023-09-29 10:02:22 -04:00
  • 77135a3bf5 Add a common boilerplate code via include and elim copy pasta Adam Treat 2023-09-21 13:00:10 -04:00
  • 9e4f8b4acc Upload immediately to device. Adam Treat 2023-09-26 11:58:39 -04:00
  • 6b6c73a9e3 kompute : don't fail build because of -Warray-bounds Cebtenzzre 2023-09-26 10:35:05 -04:00
  • 1b1416d7b7 Support for gguf. Adam Treat 2023-09-21 12:39:33 -04:00
  • d9b33fe95b metal : round up to 16 to fix MTLDebugComputeCommandEncoder assertion (#3938) b1483 Peter Sugihara 2023-11-03 12:18:18 -07:00
  • 5ba3746171 ggml-metal: fix yarn rope (#3937) Xiao-Yong Jin 2023-11-03 13:00:31 -05:00
  • abb77e7319 ggml-cuda : move row numbers to x grid dim in mmv kernels (#3921) b1481 slaren 2023-11-03 12:13:09 +01:00
  • 8f961abdc4 speculative : change default p_accept to 0.5 + CLI args (#3919) Georgi Gerganov 2023-11-03 09:41:17 +02:00
  • 05816027d6 common : YAYF (yet another YARN fix) (#3925) Georgi Gerganov 2023-11-03 09:24:00 +02:00
  • 3fdbe6b66b llama : change yarn_ext_factor placeholder to -1 (#3922) cebtenzzre 2023-11-03 02:31:58 -04:00
  • 629f917cd6 cuda : add ROCM aliases for CUDA pool stuff (#3918) b1477 Kerfuffle 2023-11-02 13:58:22 -06:00
  • 51b2fc11f7 cmake : fix relative path to git submodule index (#3915) b1476 Andrei 2023-11-02 15:40:31 -04:00
  • 224e7d5b14 readme : add notice about #3912 Georgi Gerganov 2023-11-02 20:44:12 +02:00
  • c7743fe1c1 cuda : fix const ptrs warning causing ROCm build issues (#3913) b1474 Georgi Gerganov 2023-11-02 20:32:11 +02:00
  • d6069051de cuda : use CUDA memory pool with async memory allocation/deallocation when available (#3903) b1473 Oleksii Maryshchenko 2023-11-02 18:10:39 +01:00
  • 4ff1046d75 gguf : print error for GGUFv1 files (#3908) b1472 Georgi Gerganov 2023-11-02 16:22:30 +02:00
  • 21958bb393 cmake : disable LLAMA_NATIVE by default (#3906) b1471 slaren 2023-11-02 13:10:33 +01:00
  • 2756c4fbff gguf : remove special-case code for GGUFv1 (#3901) b1470 Georgi Gerganov 2023-11-02 11:20:21 +02:00
  • 1efae9b7dc llm : prevent from 1-D tensors being GPU split (#3697) b1469 Georgi Gerganov 2023-11-02 09:54:18 +02:00
  • b12fa0d1c1 build : link against build info instead of compiling against it (#3879) b1468 cebtenzzre 2023-11-02 02:50:16 -04:00
  • 4d719a6d4e cuda : check if this fixes Pascal card regression (#3882) b1467 Georgi Gerganov 2023-11-02 08:35:10 +02:00
  • 183b3fac6c metal : fix build errors and kernel sig after #2268 (#3898) b1466 Georgi Gerganov 2023-11-02 08:33:37 +02:00
  • 2fffa0d61f cuda : fix RoPE after #2268 (#3897) b1465 cebtenzzre 2023-11-02 01:49:44 -04:00
  • 0eb332a10f llama : fix llama_context_default_params after #2268 (#3893) b1464 cebtenzzre 2023-11-01 19:29:14 -04:00
  • d02e98cde0 ggml-cuda : compute ptrs for cublasGemmBatchedEx in a kernel (#3891) b1463 slaren 2023-11-01 23:10:09 +01:00
  • 898aeca90a llama : implement YaRN RoPE scaling (#2268) b1462 cebtenzzre 2023-11-01 18:04:33 -04:00
  • c43c2da8af llm : fix llm_build_kqv taking unused tensor (benign, #3837) b1461 Georgi Gerganov 2023-11-01 23:08:30 +02:00
  • 523e49b111 llm : fix falcon norm after refactoring (#3837) b1460 Georgi Gerganov 2023-11-01 23:00:50 +02:00
  • e16b9fa4ba metal : multi-simd softmax (#3710) b1459 Georgi Gerganov 2023-11-01 21:25:00 +02:00
  • 46868a499e metal : multi-simd softmax metal-soft-max Georgi Gerganov 2023-10-21 13:18:26 +03:00
  • ff8f9a88da common : minor (#3715) b1458 Georgi Gerganov 2023-11-01 21:15:55 +02:00
  • 50337961a6 llm : add llm_build_context (#3881) b1457 Georgi Gerganov 2023-11-01 20:11:02 +02:00
  • a8796f9609 llm : cleanup + comments llm-build-context Georgi Gerganov 2023-11-01 20:08:02 +02:00
  • 0e40806c1c common : allow caller to handle help/argument exceptions (#3715) b1456 bandoti 2023-11-01 14:42:01 -03:00
  • 78186f4009 llm : restore the non-graph llm_build_ functional API Georgi Gerganov 2023-11-01 15:25:50 +02:00
  • a2758d08e4 log : make generating separate log files optional (#3787) b1455 staviq 2023-11-01 15:18:27 +01:00
  • e75dfdd31b sampling : null grammar field after reset (#3885) b1454 l3utterfly 2023-11-01 21:40:43 +08:00
  • 9a3b4f6c86 ggml : fix UNUSED macro (#3762) b1453 Georgi Gerganov 2023-11-01 13:50:45 +02:00
  • 73bdcb395e finetune : add -ngl parameter (#3762) Andrew Godfrey 2023-11-01 04:49:04 -07:00
  • f0e209324a scripts : add server-llm.sh (#3868) Georgi Gerganov 2023-11-01 11:29:07 +02:00
  • ca190bca8e server : re-enable completion and embedded at the same time (#3876) b1450 Adrian Hesketh 2023-11-01 09:28:28 +00:00
  • 995ee0919f llm : deduce norm eps based on type + explicit max_alibi_bias, clamp_kqv Georgi Gerganov 2023-11-01 11:19:58 +02:00
  • 9284aa6a70 llm : add llm_build_context Georgi Gerganov 2023-11-01 08:51:43 +02:00
  • 7420bef83e wip wip wip llm-reuse-constants Georgi Gerganov 2023-11-01 08:51:43 +02:00
  • 71e3718abd llama : refactor graph build code (#3837) b1449 Georgi Gerganov 2023-11-01 08:04:02 +02:00
  • 238657db23 samplers : Min-P sampler implementation [alternative to Top P/Top K] (#3841) b1448 kalomaze 2023-10-31 14:44:49 -05:00
  • afb3929279 Merge branch 'master' into llama-refactor llama-refactor Georgi Gerganov 2023-10-31 20:35:31 +02:00
  • 07178c98e1 flake.nix: fix for rocm 5.7 (#3853) Tungsten842 2023-10-31 18:24:03 +01:00
  • 5baefef497 llama : add llm_build helper functions (#3848) Georgi Gerganov 2023-10-31 19:23:12 +02:00
  • 29fe516913 wip test-mmv Georgi Gerganov 2023-10-31 18:36:37 +02:00
  • dab42893c9 scripts : working curl pipe deploy Georgi Gerganov 2023-10-31 17:03:56 +02:00
  • 7923b70cb8 llama : add llm_build_inp_embd helper llama-refactor-norm Georgi Gerganov 2023-10-31 16:43:08 +02:00
  • 2073347e3b llama : remove extra ; + deduplicate gate_b logic Georgi Gerganov 2023-10-31 16:28:09 +02:00
  • f3947e1e02 scripts : rename to server-llm.sh Georgi Gerganov 2023-10-31 13:58:18 +02:00
  • 2f719c876d scripts : add deploy-server.sh Georgi Gerganov 2023-10-31 11:29:23 +02:00
  • fc5a26aade llama : enable warning about not offloaded tensors Georgi Gerganov 2023-10-31 08:57:10 +02:00
  • 0bfdcdd0f8 llama : normalize tensor names Georgi Gerganov 2023-10-31 08:46:34 +02:00
  • 6669cd8329 llama : update offload functions for KQ tensors Georgi Gerganov 2023-10-31 08:24:07 +02:00
  • 2926ef63b1 llama : fix input allocation logic Georgi Gerganov 2023-10-31 08:23:43 +02:00
  • 207b51900e ggml : move FP16 <-> FP32 code to ggml-impl.h (#3861) b1446 Georgi Gerganov 2023-10-30 19:19:15 +02:00
  • 4b3cb98d46 ggml-impl : move extern "C" to start of file ggml-impl Georgi Gerganov 2023-10-30 19:05:58 +02:00