ollama
Ollama is an open-source app that lets you create, run, and share large language models (LLMs) locally through a CLI on macOS and Linux. You can also install ollama on Windows, but Unix support is better (as of 11/11/24).
Despite the name, ollama itself is not a Meta product; it's an independent open-source project that is best known for running Meta's Llama family of models (and say what you will about Meta, they've released a lot of great open source work, like React.js). Either way, nobody gets their grubby paws on your data: all LLMs executed with ollama are local-only and private.
System Requirements
LLMs require a lot of power to run. Unlike hosted services such as ChatGPT, ollama runs models on your own machine, and the models in its library are typically quantized (weights stored at reduced precision), which shrinks their size enough that they can run on (powerful enough) home devices.
Warning
Just because the models ollama runs are lighter than their full-precision counterparts does not mean they are "light." While ollama can run on a regular CPU, it's much better to have a dedicated graphics card with at least 6GB of VRAM. The "heavier" the model you wish to use with ollama, the more system resources you will need.
Ollama's GitHub has a page listing supported GPUs, so you can quickly check whether yours is supported and ollama will run without issue, or whether you'll have to struggle and optimize to get it working on your device.
I was not able to find an official source for system requirements, but the table below is often cited as the minimum requirements to run an ollama local server:
Resource | Minimum Required | Notes |
---|---|---|
OS | Linux: Ubuntu 18.04 or later, macOS: macOS 11 Big Sur or later | There is technically Windows support, but ollama runs best on Unix. |
RAM | 8GB for running 3B models, 16GB for running 7B models, 32GB for running 13B models | The more the better. Models are loaded into RAM, and very large models (like dolphin-mixtral, which is ~26GB) will crash without sufficient memory. |
Storage | 12GB for installing ollama and the base models; additional space required for storing model data, depending on the models you use | The more storage the better, especially if you plan to experiment with a lot of different models. These things are big. |
CPU | Any modern CPU with at least 4 cores is recommended; for running 13B models, a CPU with at least 8 cores is recommended | ollama is less efficient running on a CPU than on a GPU, so make sure you have a decently powerful CPU if going this route. |
GPU (optional) | Guide to help you pick a compatible GPU | A GPU is not required for running ollama, but it can significantly improve performance, especially with larger models and any custom models you build. |
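If you're not sure what your machine has, a few standard commands will tell you. The examples below assume Linux, and the last one assumes an NVIDIA card with drivers installed:

```bash
# Total and available RAM
free -h

# Free disk space (individual models can take many GB each)
df -h ~

# Number of CPU cores
nproc

# Name and VRAM of an NVIDIA GPU (requires NVIDIA drivers)
nvidia-smi --query-gpu=name,memory.total --format=csv
```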
Installing ollama
- You can download ollama right from their website.
- On Linux, you can simply use this command:
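At the time of writing, the Linux install is a one-liner that downloads and runs Ollama's official install script (it's worth checking ollama.com for the current command before piping anything into your shell):

```bash
curl -fsSL https://ollama.com/install.sh | sh
```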
Choosing a model
You can browse available models on ollama's website. You can install multiple models side-by-side and switch between them at will; how many models you can keep downloaded is really only limited by the size of your disk.
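Two built-in subcommands are handy for keeping tabs on disk usage as your collection grows (the model name below is just an example):

```bash
# List every model you've downloaded, along with its size on disk
ollama list

# Remove a model you no longer need to reclaim the space
ollama rm mistral
```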
Once you've installed ollama, you can download a model by running:
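```bash
# Replace <model-name> with any model from ollama's library
ollama pull <model-name>
```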
For example, to get started with the llama3.2 model (current as of 11/11/24), you can run:
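```bash
# Downloads the llama3.2 model weights to your local machine
ollama pull llama3.2
```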
To run your new model, open your CLI and run the command below (the model will be downloaded first if you have not already run ollama pull):
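```bash
# Starts an interactive session; pulls the model first if it isn't already local
ollama run llama3.2
```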
Ollama control script
This Bash script can help manage ollama (a sketch of such a script follows the list below). The script accepts the following arguments:
- `start`: Start the ollama service & server
- `stop`: Stop the ollama server & stop/disable the service
- `install`: Install the ollama server if you have not already
- `upgrade`: Upgrading ollama is as simple as re-running the install script; this script takes care of that for you if you use this argument
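The original script isn't reproduced here, so treat the following as a minimal sketch of what such a wrapper could look like. It assumes the systemd service named ollama that the official Linux installer creates, and it reuses the official install one-liner for both install and upgrade:

```bash
#!/usr/bin/env bash
# ollamactl.sh (hypothetical name): tiny wrapper around common ollama admin tasks
set -euo pipefail

case "${1:-}" in
  start)
    # Enable and start the systemd service, which runs the ollama server in the background
    sudo systemctl enable --now ollama
    ;;
  stop)
    # Stop the running server and keep the service from starting at the next boot
    sudo systemctl stop ollama
    sudo systemctl disable ollama
    ;;
  install|upgrade)
    # Re-running the official install script installs or upgrades ollama in place
    curl -fsSL https://ollama.com/install.sh | sh
    ;;
  *)
    echo "Usage: $0 {start|stop|install|upgrade}" >&2
    exit 1
    ;;
esac
```

Save it somewhere on your PATH, make it executable with chmod +x, and call it with one of the four arguments.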