Docker Model Runner Tutorial and Cheatsheet: Mac, Windows and Linux Support

Whether you're building generative AI applications, experimenting with machine learning workflows, or integrating AI into your software development lifecycle, Docker Model Runner provides a consistent, secure, and efficient way to work with AI models locally.

Docker Model Runner is a feature that allows you to run and interact with AI models directly from the command line or through the Docker Desktop GUI. It enables you to pull models from Docker Hub, run them locally, and interact with them using OpenAI-compatible APIs. Models are cached locally after the first use for faster access, and are loaded into memory only at runtime to optimize resource usage. Docker Model Runner supports packaging and publishing models as OCI Artifacts, and can be integrated with tools like Docker Compose and Testcontainers for multi-container and automated testing workflows.

What can you do with Docker Model Runner?

With Docker Model Runner, you can:

  • Pull and run AI models locally: Download models from Docker Hub and run them on your machine.
  • Interact with models via CLI or GUI: Use the command line or Docker Desktop’s Models tab to run models and interact with them in chat mode or by submitting prompts.
  • Use OpenAI-compatible APIs: Interact with models using standard OpenAI API endpoints, making integration with existing AI applications straightforward.
  • Package and publish models: Package GGUF files as OCI Artifacts and publish them to any container registry, including Docker Hub.
  • Integrate with Docker Compose and Testcontainers: Run AI models as part of multi-container applications or automated test environments.
  • Manage and view logs: List, inspect, and view logs for your local models to help with troubleshooting and monitoring.

Models are cached locally after the first use for faster access, and are loaded into memory only when needed to optimize resource usage.

Platform Support

Platform              | Status  | GPU Support | Version
macOS (Apple Silicon) | ✅ Full | Metal       | 4.40+
Windows               | ✅ Full | NVIDIA CUDA | 4.41+
Linux (Docker CE)     | ✅ Full | NVIDIA      | Latest

Here are more details for each platform.

macOS

  • Status: ✅ Fully Supported (Since Docker Desktop 4.40)
  • Requirements:
    • macOS with Apple Silicon (M1, M2, M3, M4)
    • Docker Desktop 4.40 or later
    • 8GB+ RAM recommended
  • GPU Support: Native Metal acceleration via host-based execution

Windows

  • Status: ✅ Supported (Since Docker Desktop 4.41)
  • Requirements:
    • Windows 10/11 (x64 or ARM64)
    • Docker Desktop 4.41 or later
    • For GPU: NVIDIA GPU with compatible drivers
    • 8GB+ RAM recommended
  • GPU Support: NVIDIA CUDA acceleration (optional, requires additional setup)

Linux

  • Status: ✅ Supported (Docker CE/Community Edition)
  • Requirements:
    • Linux distribution with Docker Engine
    • Docker CE (Community Edition)
    • 8GB+ RAM recommended
    • For GPU: NVIDIA GPU with Container Toolkit
  • Note: Docker Model Runner is currently supported through Docker Desktop on macOS and Windows, and through Docker CE on Linux (including WSL2)

Does Model Runner use a GPU?

Docker Model Runner can use GPU acceleration. On Windows, GPU-backed inference is supported if you have an NVIDIA GPU and enable the relevant setting in Docker Desktop. When you enable Docker Model Runner in Docker Desktop, you will see an option to turn on GPU-backed inference if your hardware supports it.

Let's look at how to get started with Docker Model Runner on the Windows platform.

Using Docker Model Runner on a Windows x86 system

  • Install the latest version of Docker Desktop (4.42+).
  • This walkthrough uses an x86 system running Windows 11 Pro.
  • Enable Docker Model Runner under Settings and confirm it is turned on.

The "Enable host-side TCP support" feature allows Docker Model Runner to additionally accept connections on the host OS on the specified TCP port (default: 12434) rather than only through the host Docker socket (/var/run/docker.sock). You can change this to another port if needed, particularly if 12434 is already in use by another application. We will see its usage later in the docs.

  • Click “Docker Hub” and search for your preferred AI Model.
  • Select the model and start chatting with the AI model by adding your prompt

There are two ways to enable Model Runner: through the CLI or through the Docker Dashboard. You have already seen how to use Model Runner from the Docker Dashboard.

Let's look at how to get started with Model Runner using PowerShell:

Using PowerShell

If you want to enable Model Runner using CLI, here's a quick way:

Enabling the Model Runner

docker desktop enable model-runner

Run the following command to see the usage of Model Runner CLI:

docker model --help
Usage:  docker model COMMAND

Docker Model Runner

Commands:
  inspect     Display detailed information on one model
  list        List the available models that can be run with the Docker Model Runner
  pull        Download a model
  rm          Remove a model downloaded from Docker Hub
  run         Run a model with the Docker Model Runner
  status      Check if the Docker Model Runner is running
  version     Show the Docker Model Runner version

Run 'docker model COMMAND --help' for more information on a command.

Check whether the Model Runner is running

docker model status
Docker Model Runner is running

List the available models

docker model ls
MODEL  PARAMETERS  QUANTIZATION  ARCHITECTURE  FORMAT  MODEL ID  CREATED  SIZE

The response shows an empty list.

Let’s go ahead and download the model from the Docker Hub.

Download a model

docker model pull ai/llama3.2:1B-Q8_0

All these models are hosted on https://hub.docker.com/u/ai:

  • ai/gemma3
  • ai/llama3.2
  • ai/qwq
  • ai/mistral-nemo
  • ai/mistral
  • ai/phi4
  • ai/qwen2.5
  • ai/deepseek-r1-distill-llama (distill means it’s not the actual RL-ed deepseek, it’s a llama trained on DeepSeek-R1 inputs/outputs)
  • and many more

List the Model

docker model ls
MODEL                PARAMETERS  QUANTIZATION  ARCHITECTURE  MODEL ID      CREATED       SIZE
ai/llama3.2:1B-Q8_0  1.24 B      Q8_0          llama         a15c3117eeeb  20 hours ago  1.22 GiB
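
Inspect the model

You can also view the model's metadata using the inspect subcommand listed in the CLI help above:

docker model inspect ai/llama3.2:1B-Q8_0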

Use the following command to send a single message:

docker model run ai/llama3.2:1B-Q8_0 "Hi"
Hello! How can I help you today?

Run the Model in interactive mode

docker model run ai/llama3.2:1B-Q8_0
Interactive chat mode started. Type '/bye' to exit.
> why is water blue?
Water appears blue because ...

Remove the model

docker model rm ai/llama3.2:1B-Q8_0
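
If you want to confirm that the disk space was reclaimed, the df subcommand from the cheatsheet below reports how much space your local models occupy:

docker model df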

All of the commands and steps listed above work on macOS too.

Using Docker Model Runner on Linux (Ubuntu, Red Hat, Fedora)

Prerequisites

  • Linux system (Ubuntu/Debian recommended)
  • Docker Engine installed
  • At least 4GB RAM
  • Internet connection for downloading models

Let's get started!

Step 1: Install Docker Model Runner

First, update your system and install the Docker Model Runner plugin:

sudo apt-get update
sudo apt-get install docker-model-plugin

If you're using Fedora or Red Hat, run the following commands:

sudo dnf update
sudo dnf install docker-model-plugin

Step 2: Verify Installation

Check that Docker Model Runner is properly installed:

docker model version

You should see version information for Docker Model Runner. The docker model command is now available alongside your regular Docker commands.

Step 3: Run Your First AI Model

Now let's deploy an AI model using Docker Model Runner:

docker model run ai/smollm2

This command will:

  • Pull the SmolLM2 model (270 MB in size, 360 million parameters) from Docker Hub; it is well suited to chat assistants, text extraction, rewriting, and summarization tasks
  • Start a model server
  • Make the model available via the API on port 12434

Start chatting with the model. Once done, you can exit by typing /bye.

Let's verify the model is running:

docker model ps
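
As a quick sanity check, you can also call the OpenAI-compatible API directly from the host. This is a minimal sketch assuming TCP access on the default port 12434 is enabled (the API Access section below covers the other access methods):

curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [
      {"role": "user", "content": "Say hello in one sentence."}
    ]
  }'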

Supported Model Format

Docker Model Runner supports models in GGUF (GPT-Generated Unified Format), a lightweight binary file format designed for efficient local inference. It bundles model weights, tokenizer, and metadata in a single file, making it ideal for packaging and distributing LLMs in containerized environments.

You can also directly package a model file in GGUF format as an OCI Artifact and publish it to Docker Hub.

# Download a model file in GGUF format, e.g. from HuggingFace
$ curl -L -o model.gguf https://huggingface.co/TheBloke/Mistral-7B-v0.1-GGUF/resolve/main/mistral-7b-v0.1.Q4_K_M.gguf

# Package it as OCI Artifact and push it to Docker Hub
$ docker model package --gguf "$(pwd)/model.gguf" --push myorg/mistral-7b-v0.1:Q4_K_M
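
Once published, the packaged model can be pulled and run like any other model. A short sketch, reusing the hypothetical myorg/mistral-7b-v0.1:Q4_K_M tag from the example above:

# Pull the packaged model back from Docker Hub and run it
docker model pull myorg/mistral-7b-v0.1:Q4_K_M
docker model run myorg/mistral-7b-v0.1:Q4_K_M "Summarize what GGUF is in one sentence."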

Does Model Runner work with Docker Compose?

That's a great question.

Yes, Docker Model Runner works with Docker Compose. Starting from Docker Compose v2.35.0 and Docker Desktop 4.41, you can integrate Model Runner into your Compose applications.

Compose introduces a new service type called provider, which allows you to declare model dependencies using the model type. This lets you define and run AI-powered applications alongside your other services in a multi-container setup. During docker compose up, Model Runner automatically pulls and runs the specified model, and passes connection details to dependent services via environment variables.

For example, you can declare a model provider service in your docker-compose.yml.

Here's a demo project called aiwatch - AI Model Management and Observability powered by Docker Model Runner. This project showcases a complete Generative AI interface that includes:

  • React/TypeScript frontend with a responsive chat UI
  • Go backend server for API handling
  • Integration with Docker's Model Runner to run Llama 3.2 locally
  • Comprehensive observability with metrics, logging, and tracing
  • llama.cpp metrics integration directly in the UI

The full source code is available in the ajeetraina/aiwatch repository on GitHub. Its docker-compose.yml looks like this:

services:
  backend:
    env_file: 'backend.env'
    build:
      context: .
      target: backend
    ports:
      - '8080:8080'
      - '9090:9090'  # Metrics port
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock  # Add Docker socket access
    healthcheck:
      test: ['CMD', 'wget', '-qO-', 'http://localhost:8080/health']
      interval: 3s
      timeout: 3s
      retries: 3
    networks:
      - app-network
    depends_on:
      - llm

  frontend:
    build:
      context: ./frontend
    ports:
      - '3000:3000'
    depends_on:
      backend:
        condition: service_healthy
    networks:
      - app-network

  prometheus:
    image: prom/prometheus:v2.45.0
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--web.enable-lifecycle'
    ports:
      - '9091:9090'
    networks:
      - app-network

  grafana:
    image: grafana/grafana:10.1.0
    volumes:
      - ./grafana/provisioning:/etc/grafana/provisioning
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_USERS_ALLOW_SIGN_UP=false
      - GF_SERVER_DOMAIN=localhost
    ports:
      - '3001:3000'
    depends_on:
      - prometheus
    networks:
      - app-network

  jaeger:
    image: jaegertracing/all-in-one:1.46
    environment:
      - COLLECTOR_ZIPKIN_HOST_PORT=:9411
    ports:
      - '16686:16686'  # UI
      - '4317:4317'    # OTLP gRPC
      - '4318:4318'    # OTLP HTTP
    networks:
      - app-network

  # New LLM service using Docker Compose's model provider
  llm:
    provider:
      type: model
      options:
        model: ${LLM_MODEL_NAME:-ai/llama3.2:1B-Q8_0}

volumes:
  grafana-data:

networks:
  app-network:
    driver: bridge
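
With this file in place, one command brings up the whole stack: Compose pulls the model declared under the llm provider service and passes its connection details to the backend. A sketch of the typical workflow (LLM_MODEL_NAME is the variable referenced in the Compose file above):

# Start the stack; the llm provider pulls ai/llama3.2:1B-Q8_0 automatically
docker compose up --build -d

# Confirm the model is loaded and serving requests
docker model ps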

API Access

  1. From Host (HTTP)

# Enable TCP access (if not done during setup)
docker desktop enable model-runner --tcp 12434

# Make API calls
curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello! How are you?"}
    ]
  }'
  2. From Containers

# Models accessible via special DNS name
curl http://model-runner.docker.internal/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
  3. Via Docker Socket (Unix)

curl --unix-socket /var/run/docker.sock \
  localhost/exp/vDD4.40/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [
      {"role": "user", "content": "Tell me a joke"}
    ]
  }'

📋 Core Commands

Model Management

Command | Description | Example
docker model pull <model> | Download model from Docker Hub | docker model pull ai/smollm2
docker model list | List local models | docker model list
docker model rm <model> | Remove local model | docker model rm ai/smollm2
docker model status | Check Model Runner status | docker model status
docker model inspect <model> | Show model details | docker model inspect ai/smollm2
docker model df | Show disk usage | docker model df

Model Execution

Command | Description | Example
docker model run <model> "<prompt>" | One-time prompt | docker model run ai/smollm2 "Hello"
docker model run <model> | Interactive chat mode | docker model run ai/smollm2
docker model run -it <model> | Interactive mode (explicit) | docker model run -it ai/smollm2

Model Publishing

Command | Description | Example
docker model push <namespace>/<model> | Push model to registry | docker model push myorg/my-model
docker model tag <source> <target> | Tag model | docker model tag ai/smollm2 my-model:latest
docker model package --gguf <file> --push <target> | Package GGUF as OCI artifact | docker model package --gguf ./model.gguf --push myorg/my-model

Debugging & Monitoring

Command | Description | Example
docker model logs | View logs | docker model logs
docker model logs -f | Follow logs in real time | docker model logs -f
docker model logs --no-engines | Exclude engine logs | docker model logs --no-engines

Runner Management (Docker CE)

Command | Description | Example
docker model install-runner | Install runner explicitly | docker model install-runner
docker model uninstall-runner | Remove runner | docker model uninstall-runner

🌐 API Endpoints

Base URLs

Access Method | URL | Notes
From containers | http://model-runner.docker.internal/ | Special DNS name
From host (TCP) | http://localhost:12434/ | Requires TCP enabled
From host (Unix) | Unix socket /var/run/docker.sock | Prefix paths with /exp/vDD4.40

Docker Model Management APIs

Method | Endpoint | Description
POST | /models/create | Create/pull model
GET | /models | List models
GET | /models/{namespace}/{name} | Get model info
DELETE | /models/{namespace}/{name} | Delete model

OpenAI-Compatible APIs

Method | Endpoint | Description
GET | /engines/llama.cpp/v1/models | List available models
GET | /engines/llama.cpp/v1/models/{namespace}/{name} | Get specific model
POST | /engines/llama.cpp/v1/chat/completions | Chat completions
POST | /engines/llama.cpp/v1/completions | Text completions
POST | /engines/llama.cpp/v1/embeddings | Generate embeddings

💡 Tip: You can omit llama.cpp from paths: /engines/v1/chat/completions
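
For example, here is a short sketch of calling the text-completions endpoint listed above from the host, assuming TCP access on port 12434 and the ai/smollm2 model pulled earlier:

curl http://localhost:12434/engines/llama.cpp/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "prompt": "Docker Model Runner is",
    "max_tokens": 50
  }'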

Further Readings:

  • Run Open WebUI with Docker Model Runner Guide: learn to run Open WebUI using Docker Model Runner.
  • Docker Model Runner: The Missing Piece for Your GenAI Development Workflow (Collabnix)
  • Which Model to Choose with Docker Model Runner? (Collabnix): a breakdown of the available models (ai/smollm2, ai/llama3.2, ai/llama3.3, ai/gemma3, ai/phi4, ai/mistral, ai/mistral-nemo, ai/qwen2.5, ai/deepseek-r1-distill-llama) and their recommended use cases.
  • Testcontainers Tutorial: Docker Model Runner Guide: learn how to use Testcontainers with Docker Model Runner.