This is a short guide to hosting your own language models on a Mac Mini. I’ll cover the handful of configuration steps required to spin things up so you can run inference, completely privately, on local hardware.
Let me clue you in on a little secret that it took me weeks of banging my head against my keyboard to discover: In order to run a LaunchDaemon at boot with no users logged in, FileVault must be disabled. I must have restarted my Mac Mini 20+ times, tweaking my configuration each time, wondering why there were no errors in the logs and why on earth my Ollama service absolutely refused to start on boot, even as root!
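If you’re not sure whether FileVault is on, you can check (and disable it) from the terminal with the built-in fdesetup tool; disabling also works through System Settings.

# Check whether FileVault is currently enabled
fdesetup status
# Disable it (prompts for an authorized user's credentials; decryption runs in the background)
sudo fdesetup disable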
Hosting your own LLM drives down costs for workloads that only need a simple model, lets you process sensitive personal and financial data without it ever leaving your network, and breaks your reliance on APIs that may not always handle your data in the manner the vendor suggests - a noble exercise in decentralization.
To begin, create a new user called llama; you don’t want to run this service as your own user account or as root. I did this in System Settings.
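If you’d rather do it from the terminal, something like the following should work; sysadminctl ships with macOS, and the full name and password here are just placeholders to replace.

# Create a standard (non-admin) service account named llama
sudo sysadminctl -addUser llama -fullName "Ollama Service" -password 'change-me'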
Once you’ve got brew, install Ollama and create an operating directory for your model files.
brew install ollama
sudo mkdir /opt/ollama
sudo chown llama:staff /opt/ollama
# As llama user
mkdir /opt/ollama/models
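Before wiring up the daemon, it’s worth confirming that the binary and the models directory are where the plist below will expect them:

# Homebrew's default install path on Apple Silicon
ls -l /opt/homebrew/bin/ollama
# The llama user should own both directories
ls -ld /opt/ollama /opt/ollama/models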
Next, create /Library/LaunchDaemons/com.ollama.server.plist (you’ll need sudo to write there) and add:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<!-- This runs Ollama locally on your M4 Mac Mini -->
<dict>
    <!-- Run the daemon as the dedicated llama user -->
    <key>UserName</key>
    <string>llama</string>
    <key>Label</key>
    <string>com.ollama.server</string>
    <key>WorkingDirectory</key>
    <string>/opt/ollama</string>
    <key>ProgramArguments</key>
    <array>
        <string>/opt/homebrew/bin/ollama</string>
        <string>serve</string>
    </array>
    <!-- Run at boot -->
    <key>RunAtLoad</key>
    <true/>
    <!-- Restart if it crashes -->
    <key>KeepAlive</key>
    <true/>
    <!-- Environment variables for the Ollama server -->
    <key>EnvironmentVariables</key>
    <dict>
        <!-- Bind to all interfaces -->
        <key>OLLAMA_HOST</key>
        <string>0.0.0.0</string>
        <!-- For local-only privacy: <string>127.0.0.1</string> -->
        <key>HOME</key>
        <string>/Users/llama</string>
        <!-- Parallel requests -->
        <key>OLLAMA_NUM_PARALLEL</key>
        <string>16</string>
        <key>OLLAMA_LOG_LEVEL</key>
        <string>DEBUG</string>
        <!-- Context window maximum -->
        <key>OLLAMA_CONTEXT_LENGTH</key>
        <string>16384</string>
        <!-- Max request queue -->
        <key>OLLAMA_MAX_QUEUE</key>
        <string>1024</string>
        <!-- Maximum loaded models -->
        <key>OLLAMA_MAX_LOADED_MODELS</key>
        <string>3</string>
        <!-- Keep loaded models in memory indefinitely -->
        <key>OLLAMA_KEEP_ALIVE</key>
        <string>-1</string>
        <!-- Models path -->
        <key>OLLAMA_MODELS</key>
        <string>/opt/ollama/models</string>
    </dict>
    <!-- Log files (optional) -->
    <key>StandardOutPath</key>
    <string>/tmp/ollama.out.log</string>
    <key>StandardErrorPath</key>
    <string>/tmp/ollama.err.log</string>
</dict>
</plist>
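Before loading it, a quick lint catches any typos; launchd gives famously unhelpful errors for malformed plists.

# Validate the plist syntax before handing it to launchd
plutil -lint /Library/LaunchDaemons/com.ollama.server.plist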
Finally, load (or reload) the plist with this script.
#!/bin/bash
# Define variables for clarity
PLIST="/Library/LaunchDaemons/com.ollama.server.plist"
SERVICE="system/com.ollama.server"
OLLAMA_BIN="/opt/homebrew/bin/ollama"
echo "--- STARTING DEBUG RELOAD ---"
# Check if Ollama binary exists
if [ ! -f "$OLLAMA_BIN" ]; then
  echo "ERROR: Ollama binary not found at $OLLAMA_BIN"
  exit 1
fi
# Set Permissions
echo "Setting permissions..."
sudo chown root:wheel "$PLIST"
sudo chmod 644 "$PLIST"
# Unload
echo "Attempting to unload..."
sudo launchctl bootout "$SERVICE" 2>/dev/null || echo "Service was not running (this is fine)"
# Load
echo "Bootstrapping service..."
sudo launchctl bootstrap system "$PLIST"
# Check Status
echo "--- STATUS CHECK ---"
sleep 2 # Wait for a moment while the service starts
sudo launchctl list | grep ollama
echo "Check status code (2nd column):"
echo " 0 = Running perfectly"
echo " 78 = Binary not found (Check ProgramArguments path)"
echo " 1 = General Error (Likely missing HOME variable or permission issue)"
echo " - = Not loaded at all"
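If the service still won’t come up, launchd’s detailed view and the log files declared in the plist are the first places to look:

# Detailed launchd state: last exit status, PID, environment, and more
sudo launchctl print system/com.ollama.server
# Follow the logs from StandardOutPath / StandardErrorPath
tail -f /tmp/ollama.out.log /tmp/ollama.err.log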
Change your computer’s name to something short and descriptive; the curl examples below assume it resolves as mini.local.
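You can set this in System Settings, or from the terminal with scutil (mini here is just the example name used below):

# Set the name the machine advertises on the local network (reachable as mini.local)
sudo scutil --set LocalHostName mini
sudo scutil --set ComputerName mini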
Fantastic - that’s all! After rebooting, you should be able to go to another computer on the network and run the following curl commands:
# Pull a Model
curl http://mini.local:11434/api/pull -d '{
  "model": "gemma3:latest"
}'
# Generate a Response
curl http://mini.local:11434/api/generate -d '{
  "model": "gemma3:latest",
  "prompt": "Just reply `hello` with no other content, please.",
  "stream": false
}'
{"model":"gemma3:latest",
"response":"hello\n",
"created_at":"2025-12-20T17:56:14.06673Z",
"done":true, "done_reason":"stop",
"context":[105,2364,107,11896,15148,2165,23391,236929,607,951,1032,3004,236764,5091,236761,106,107,105,4368,107,23391,107],
"total_duration":1569100833, "load_duration":149052000,
"prompt_eval_count":21, "prompt_eval_duration":1352561458,
"eval_count":3, "eval_duration":61480499}
# Check Model Status
curl http://mini.local:11434/api/ps
{"models":[
{"name":"gemma3:latest","model":"gemma3:latest",
"size":12473293248,
"details":{"parent_model":"","format":"gguf","family":"gemma3","families":["gemma3"],
"parameter_size":"4.3B","quantization_level":"Q4_K_M"},
"expires_at":"2318-04-01T11:43:30.924156807-06:00",
"size_vram":12473293248,
"context_length":16384}
]}
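You can also list every model stored under /opt/ollama/models with the tags endpoint:

# List locally downloaded models
curl http://mini.local:11434/api/tags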
Nice. We’re off to the races.
Check the Ollama HTTP API documentation for the full set of available endpoints.
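The chat endpoint, for example, takes a messages array instead of a bare prompt, which is what most chat client libraries expect:

# Multi-turn chat request against the same model
curl http://mini.local:11434/api/chat -d '{
  "model": "gemma3:latest",
  "messages": [
    { "role": "user", "content": "Say hello in one word." }
  ],
  "stream": false
}'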
Note that this setup is not production-safe: you should reverse-proxy this port with NGINX and guard it with an API key - but there are plenty of guides on how to do that.
Happy hacking!