Update app.py

#144

by gghfez - opened Dec 31, 2024

base: refs/heads/main ← from: refs/pr/144

app.py +21 -15

@@ -228,45 +228,51 @@ def process_model(model_id, q_method, use_imatrix, imatrix_q_method, private_rep
                 # {new_repo_id}
                 This model was converted to GGUF format from [`{model_id}`](https://huggingface.co/{model_id}) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
                 Refer to the [original model card](https://huggingface.co/{model_id}) for more details on the model.
                 ## Use with llama.cpp
                 Install llama.cpp through brew (works on Mac and Linux)
                 ```bash
                 brew install llama.cpp
                 ```
                 Invoke the llama.cpp server or the CLI.
                 ### CLI:
                 ```bash
                 llama-cli --hf-repo {new_repo_id} --hf-file {quantized_gguf_name} -p "The meaning to life and the universe is"
                 ```
                 ### Server:
                 ```bash
                 llama-server --hf-repo {new_repo_id} --hf-file {quantized_gguf_name} -c 2048
                 ```
                 Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo as well.
                 Step 1: Clone llama.cpp from GitHub.
-                ```
                 git clone https://github.com/ggerganov/llama.cpp
                 ```
-                Step 2: Move into the llama.cpp folder and build it with `LLAMA_CURL=1` flag along with other hardware-specific flags (for ex: LLAMA_CUDA=1 for Nvidia GPUs on Linux).
-                ```
-                cd llama.cpp && LLAMA_CURL=1 make
                 ```
-                Step 3: Run inference through the main binary.
-                ```
-                ./llama-cli --hf-repo {new_repo_id} --hf-file {quantized_gguf_name} -p "The meaning to life and the universe is"
                 ```
-                or
                 ```
-                ./llama-server --hf-repo {new_repo_id} --hf-file {quantized_gguf_name} -c 2048
                 ```
                 """
             )

                 # {new_repo_id}
                 This model was converted to GGUF format from [`{model_id}`](https://huggingface.co/{model_id}) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
                 Refer to the [original model card](https://huggingface.co/{model_id}) for more details on the model.
                 ## Use with llama.cpp
                 Install llama.cpp through brew (works on Mac and Linux)
                 ```bash
                 brew install llama.cpp
                 ```
                 Invoke the llama.cpp server or the CLI.
                 ### CLI:
                 ```bash
                 llama-cli --hf-repo {new_repo_id} --hf-file {quantized_gguf_name} -p "The meaning to life and the universe is"
                 ```
                 ### Server:
                 ```bash
                 llama-server --hf-repo {new_repo_id} --hf-file {quantized_gguf_name} -c 2048
                 ```
                 Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo as well.
                 Step 1: Clone llama.cpp from GitHub.
+                ```bash
                 git clone https://github.com/ggerganov/llama.cpp
+                cd llama.cpp
                 ```
+                Step 2: Build using CMake. For CPU-only use:
+                ```bash
+                cmake -B build
+                cmake --build build --config Release
                 ```
+                For CUDA support on Linux/Windows:
+                ```bash
+                cmake -B build -DGGML_CUDA=ON
+                cmake --build build --config Release
                 ```
+                Step 3: Run inference through the binary (from the llama.cpp folder):
+                ```bash
+                ./build/bin/llama-cli --hf-repo {new_repo_id} --hf-file {quantized_gguf_name} -p "The meaning to life and the universe is"
                 ```
+                or
+                ```bash
+                ./build/bin/llama-server --hf-repo {new_repo_id} --hf-file {quantized_gguf_name} -c 2048
                 ```
                 """
             )
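
For reference, the CMake-based steps on the new side of this diff can be sketched as a small Python helper that assembles the same command lines. This is only an illustration: `build_commands` and `cli_command` are hypothetical names and are not part of `app.py`. The flags (`-DGGML_CUDA=ON`, `--config Release`, `--hf-repo`, `--hf-file`, `-p`) and the `./build/bin/` path are copied from the diff above.

```python
import shlex


def build_commands(use_cuda: bool = False, build_dir: str = "build") -> list[str]:
    """Return the configure and build commands from Step 2 of the new text."""
    configure = ["cmake", "-B", build_dir]
    if use_cuda:
        # Flag copied from the CUDA variant shown in the diff.
        configure.append("-DGGML_CUDA=ON")
    build = ["cmake", "--build", build_dir, "--config", "Release"]
    return [shlex.join(configure), shlex.join(build)]


def cli_command(repo_id: str, gguf_name: str, prompt: str, build_dir: str = "build") -> str:
    """Return the Step 3 llama-cli invocation, to be run from the llama.cpp folder."""
    args = [
        f"./{build_dir}/bin/llama-cli",
        "--hf-repo", repo_id,
        "--hf-file", gguf_name,
        "-p", prompt,
    ]
    # shlex.join quotes any argument containing spaces, such as the prompt.
    return shlex.join(args)
```

Calling `build_commands(use_cuda=True)` yields the CUDA configure line followed by the same build line as the CPU-only case, which mirrors how the two variants in the diff differ by a single flag.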