Running Vision Models Locally with Docker Model Runner: A Complete Tutorial
Want to run AI vision models that can analyze images, extract text, and answer questions about photos, all on your own machine, without sending data to external APIs? Docker Model Runner makes it straightforward to run multimodal AI models locally, giving you complete control over your data while using the same OpenAI-compatible API you already know.
Let me show you how to get started with vision-capable models using Docker Model Runner, from pulling your first multimodal model to making real API calls that analyze images.
What is Docker Model Runner?
Docker Model Runner is Docker's native solution for running AI models locally, integrated directly into Docker Desktop. It brings local AI inference into your Docker workflow, using the same familiar concepts—registries, tags, versioning—that you already use for containers.
Here's what makes it clever: Models don't run inside containers. Instead, Docker Model Runner:
- Uses an Inference Server API endpoint through Docker Desktop
- Runs llama.cpp as a native host process for direct GPU access
- Loads models on-demand when you make API calls
- Automatically unloads models after 5 minutes of inactivity
No containers to manage, no docker run commands before making API calls. Models are stored as OCI artifacts in registries, giving you version control and distribution for AI models just like you have for container images.
The architecture in a nutshell
Docker Model Runner has three main components:
- model-runner - The backend that manages and runs models (native process for performance)
- model-cli - Command-line tool for pulling and managing models
- model-spec - Specification for packaging models as OCI artifacts
The inference engine is llama.cpp, which exposes an OpenAI-compatible API. This means if you've worked with OpenAI's API before, you already know most of what you need.
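Because the API is OpenAI-compatible, existing client libraries work against it once you override the base URL. Here's a minimal sketch using the official openai Python package (assuming you've installed it with pip install openai, and using the small text model we'll pull in the next section); the api_key value is a throwaway, since the local endpoint doesn't check it:
from openai import OpenAI

# Point the OpenAI client at Docker Model Runner's local endpoint.
# The api_key is a dummy value: the local server ignores it, but the
# client library requires something to be set.
client = OpenAI(
    base_url="http://localhost:12434/engines/llama.cpp/v1",
    api_key="not-needed",
)
response = client.chat.completions.create(
    model="ai/smollm2:360M-Q4_K_M",
    messages=[{"role": "user", "content": "Explain Docker in simple terms"}],
)
print(response.choices[0].message.content)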
Getting started: Your first model
If you have a recent version of Docker Desktop, you already have Model Runner; if the commands below fail, enable it in Docker Desktop's settings. Let's verify:
docker model --help
Pull a text model first
Let's start with a simple text model to get comfortable:
# Pull a small, fast model
docker model pull ai/smollm2:360M-Q4_K_M
# List your models
docker model ls
The model name breaks down as:
- ai/ - Docker's namespace for official models
- smollm2 - Model family
- 360M - 360 million parameters
- Q4_K_M - 4-bit quantization (smaller, faster)
Make your first API call
The API listens on localhost:12434 (if the port isn't reachable, enable host-side TCP support for Model Runner in Docker Desktop's settings):
curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "ai/smollm2:360M-Q4_K_M",
"messages": [
{
"role": "user",
"content": "Explain Docker in simple terms"
}
]
}'
No API keys, no authentication: everything runs locally. The response follows OpenAI's familiar format:
{
"id": "chatcmpl-...",
"model": "ai/smollm2:360M-Q4_K_M",
"choices": [{
"message": {
"role": "assistant",
"content": "Docker is a platform that lets you package..."
}
}],
"usage": {
"prompt_tokens": 15,
"completion_tokens": 87,
"total_tokens": 102
}
}
Moving to multimodal: Vision-capable models
Now let's explore Docker Model Runner's multimodal capabilities. Vision-capable models can analyze images, extract text, identify objects, and answer questions about visual content.
Pulling a vision model
Several models in the ai/ namespace support vision:
# Google's Gemma 3 (good balance of speed and quality)
docker model pull ai/gemma3:4B-Q4_K_M
# Meta's Llama 3.2 Vision (larger, more capable)
docker model pull ai/llama3.2-vision:11B-Q4_K_M
Identifying vision-capable models
Look for these model families that support multimodal input:
- gemma3 - Google's vision-capable models (the 4B, 12B, and 27B variants accept images)
- llama3.2-vision - Meta's multimodal models (11B, 90B)
- llava - Popular open-source vision models
You can also pull GGUF models directly from Hugging Face:
docker model pull hf.co/bartowski/llava-v1.6-mistral-7b-GGUF
Working with images: The base64 format
To send images to vision models, you'll encode them as base64 data URIs. This keeps everything in a single JSON payload and works seamlessly with the OpenAI-compatible API format.
Quick encoding methods
Command line (Linux/Mac):
echo "data:image/jpeg;base64,$(base64 -i photo.jpg)"
On Linux, use base64 -w 0 photo.jpg instead of base64 -i so the output isn't line-wrapped.
Python:
import base64
def encode_image(image_path):
with open(image_path, 'rb') as f:
encoded = base64.b64encode(f.read()).decode('utf-8')
ext = image_path.lower().split('.')[-1]
mime = f'image/{"jpeg" if ext == "jpg" else ext}'
return f"data:{mime};base64,{encoded}"
data_uri = encode_image('photo.jpg')
JavaScript/Node.js:
const fs = require('fs');
const path = require('path');
function encodeImage(imagePath) {
const buffer = fs.readFileSync(imagePath);
const base64 = buffer.toString('base64');
const ext = path.extname(imagePath).slice(1).toLowerCase();
const mime = `image/${ext === 'jpg' ? 'jpeg' : ext}`;
return `data:${mime};base64,${base64}`;
}
Your first vision API call
Here's a complete example that sends both text and an image:
curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "ai/gemma3:4B-Q4_K_M",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What do you see in this image?"
},
{
"type": "image_url",
"image_url": {
"url": "data:image/jpeg;base64,/9j/4AAQSkZJRg..."
}
}
]
}
]
}'
The key difference from text-only requests is the content array format:
- Text prompts use {"type": "text", "text": "..."}
- Images use {"type": "image_url", "image_url": {"url": "data:..."}}
You can mix multiple text segments and images in the same request, as sketched below.
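If you're building requests from Python instead of curl, the same content array is just a list of dicts. A rough sketch that interleaves text and an image, reusing the encode_image helper from the Python snippet above (the file name and prompt are only examples):
import requests

# Two text parts surrounding one image part, all in a single user message.
payload = {
    "model": "ai/gemma3:4B-Q4_K_M",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Here is a screenshot of an error dialog:"},
            {"type": "image_url", "image_url": {"url": encode_image("error.png")}},
            {"type": "text", "text": "Suggest what I should check first."},
        ],
    }],
}
response = requests.post(
    "http://localhost:12434/engines/llama.cpp/v1/chat/completions",
    json=payload,
    timeout=120,
)
print(response.json()["choices"][0]["message"]["content"])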
Practical use cases
1. Image description and analysis
curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "ai/gemma3:4B-Q4_K_M",
"messages": [{
"role": "user",
"content": [
{
"type": "text",
"text": "Describe this image in detail"
},
{
"type": "image_url",
"image_url": {"url": "data:image/jpeg;base64,[BASE64]"}
}
]
}]
}'
2. Text extraction (OCR)
curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "ai/gemma3:4B-Q4_K_M",
"messages": [{
"role": "user",
"content": [
{
"type": "text",
"text": "Extract all text from this image"
},
{
"type": "image_url",
"image_url": {"url": "data:image/png;base64,[BASE64]"}
}
]
}]
}'
3. Document and chart analysis
curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "ai/llama3.2-vision:11B-Q4_K_M",
"messages": [{
"role": "user",
"content": [
{
"type": "text",
"text": "Analyze this chart and summarize the key trends"
},
{
"type": "image_url",
"image_url": {"url": "data:image/png;base64,[BASE64]"}
}
]
}]
}'
4. Comparing multiple images
curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "ai/gemma3:4B-Q4_K_M",
"messages": [{
"role": "user",
"content": [
{
"type": "text",
"text": "Compare these two images and describe the differences"
},
{
"type": "image_url",
"image_url": {"url": "data:image/jpeg;base64,[BASE64_1]"}
},
{
"type": "image_url",
"image_url": {"url": "data:image/jpeg;base64,[BASE64_2]"}
}
]
}]
}'
Building a Python wrapper
Let's create a reusable class for easier integration:
import base64
import requests
from pathlib import Path
class DockerModelRunner:
def __init__(self,
base_url="http://localhost:12434",
model="ai/gemma3:4B-Q4_K_M"):
self.base_url = base_url
self.model = model
self.endpoint = f"{base_url}/engines/llama.cpp/v1/chat/completions"
def encode_image(self, image_path):
"""Encode image to base64 data URI"""
path = Path(image_path)
with open(path, 'rb') as f:
encoded = base64.b64encode(f.read()).decode('utf-8')
ext = path.suffix.lower()[1:]
mime = f'image/{"jpeg" if ext == "jpg" else ext}'
return f"data:{mime};base64,{encoded}"
def analyze_image(self, image_path, prompt="Describe this image"):
"""Analyze an image with a custom prompt"""
data_uri = self.encode_image(image_path)
payload = {
"model": self.model,
"messages": [{
"role": "user",
"content": [
{"type": "text", "text": prompt},
{"type": "image_url", "image_url": {"url": data_uri}}
]
}]
}
response = requests.post(
self.endpoint,
headers={"Content-Type": "application/json"},
json=payload,
timeout=60
)
response.raise_for_status()
result = response.json()
return result['choices'][0]['message']['content']
def chat(self, message):
"""Simple text chat"""
payload = {
"model": self.model,
"messages": [{
"role": "user",
"content": message
}]
}
response = requests.post(
self.endpoint,
headers={"Content-Type": "application/json"},
json=payload,
timeout=60
)
response.raise_for_status()
result = response.json()
return result['choices'][0]['message']['content']
# Usage example
if __name__ == "__main__":
runner = DockerModelRunner()
# Text chat
print("=== Text Chat ===")
response = runner.chat("What is containerization?")
print(response)
# Vision analysis
print("\n=== Vision Analysis ===")
response = runner.analyze_image(
"diagram.png",
"Explain what this architecture diagram shows"
)
print(response)
Tips for best results
1. Choose the right model size
| Use Case | Recommended Model | Why |
|---|---|---|
| Quick prototyping / general purpose | ai/gemma3:4B-Q4_K_M | Smallest vision-capable Gemma 3; good balance of speed and quality |
| High accuracy | ai/llama3.2-vision:11B-Q4_K_M | Better understanding, slower |
| Production quality | ai/gemma3:27B-Q4_K_M | Best results |
2. Optimize your images
Before encoding, consider resizing large images:
# Using ImageMagick
convert large-image.jpg -resize 1024x1024 optimized.jpg
# Using Python/Pillow
from PIL import Image
img = Image.open('large-image.jpg')
img.thumbnail((1024, 1024))
img.save('optimized.jpg')
Smaller images mean faster encoding, smaller payloads, and quicker inference.
3. Understand the API URLs
The API endpoint changes based on context:
- From the host machine: http://localhost:12434
- From a Docker container: http://model-runner.docker.internal:12434
- With Docker Compose: use the service name
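One way to keep the same code working in both environments is to read the base URL from an environment variable and fall back to localhost. A small sketch; the MODEL_RUNNER_URL variable name is just an example, not something Docker sets for you:
import os
import requests

# Default to the host-side port; inside a container you'd set e.g.
#   MODEL_RUNNER_URL=http://model-runner.docker.internal:12434
BASE_URL = os.environ.get("MODEL_RUNNER_URL", "http://localhost:12434")
ENDPOINT = f"{BASE_URL}/engines/llama.cpp/v1/chat/completions"

def ask(prompt, model="ai/gemma3:4B-Q4_K_M"):
    """Send a plain text prompt to whichever endpoint is configured."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    response = requests.post(ENDPOINT, json=payload, timeout=60)
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]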
4. Performance expectations
- First API call loads the model (may take a few seconds)
- Subsequent calls are fast
- Models auto-unload after 5 minutes of inactivity
- Vision requests are slower than text-only due to image processing
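You can see the lazy loading for yourself by timing a cold call against a warm one. A rough sketch (exact timings depend on model size and hardware):
import time
import requests

ENDPOINT = "http://localhost:12434/engines/llama.cpp/v1/chat/completions"
payload = {
    "model": "ai/gemma3:4B-Q4_K_M",
    "messages": [{"role": "user", "content": "Say hello in five words."}],
}
# The first request triggers the model load, so expect extra seconds there.
for label in ("cold", "warm"):
    start = time.perf_counter()
    requests.post(ENDPOINT, json=payload, timeout=300).raise_for_status()
    print(f"{label} call: {time.perf_counter() - start:.1f}s")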
5. Supported image formats
Common formats work well:
- JPEG/JPG: data:image/jpeg;base64,...
- PNG: data:image/png;base64,...
- WebP: data:image/webp;base64,...
- GIF: data:image/gif;base64,...
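Rather than hard-coding the MIME prefix, you can derive it from the file name with Python's standard mimetypes module. A small sketch:
import base64
import mimetypes

def to_data_uri(image_path):
    """Build a data URI, inferring the MIME type from the file extension."""
    mime, _ = mimetypes.guess_type(image_path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Unsupported or unknown image type: {image_path}")
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"

print(to_data_uri("photo.png")[:30])  # data:image/png;base64,...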
Integrating with Docker Compose
You can declare model dependencies in your Compose file:
services:
app:
build: .
ports:
- "3000:3000"
models:
- vision-model
environment:
- MODEL_API_URL=http://model-runner.docker.internal:12434
models:
vision-model:
model: ai/gemma3:4B-Q4_K_M
context_size: 4096
This ensures the model is available before your application starts.
Troubleshooting common issues
"Model not found"
# Verify the model is pulled
docker model ls
# Pull it if missing
docker model pull ai/gemma3:4B-Q4_K_M
Connection refused
- Verify Docker Desktop is running
- Check that Model Runner is enabled in Docker Desktop settings, including host-side TCP support (the default port is 12434)
Timeout errors
- Try a smaller model
- Reduce image size before encoding
- Ensure sufficient system resources (RAM, GPU memory)
Unexpected responses
- Verify the model supports vision (use gemma3 or llama3.2-vision)
- Check that base64 encoding is complete and properly formatted
- Ensure MIME type matches image format
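A quick sanity check before sending the request catches most of these problems, for example verifying that the payload is a well-formed, decodable image data URI. A rough sketch, usable with any of the encode_image helpers above:
import base64
import re

def check_data_uri(data_uri):
    """Raise if the data URI is malformed or its base64 payload won't decode."""
    match = re.match(r"^data:(image/[\w.+-]+);base64,(.+)$", data_uri, re.DOTALL)
    if not match:
        raise ValueError("Expected a data:image/...;base64,... URI")
    mime, payload = match.groups()
    # validate=True rejects characters outside the base64 alphabet (e.g. stray newlines)
    base64.b64decode(payload, validate=True)
    return mime

print(check_data_uri(encode_image("photo.jpg")))  # e.g. image/jpeg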
Real-world example: Building an image analyzer
Here's a complete Flask app that uses Docker Model Runner for image analysis:
from flask import Flask, request, jsonify
import base64
import requests
from io import BytesIO
from PIL import Image
app = Flask(__name__)
MODEL_API = "http://localhost:12434/engines/llama.cpp/v1/chat/completions"
MODEL_NAME = "ai/gemma3:4B-Q4_K_M"
def encode_image(image_file):
"""Encode uploaded image to base64"""
image = Image.open(image_file)
# Optimize size
image.thumbnail((1024, 1024))
# JPEG can't store an alpha channel, so normalize to RGB first
image = image.convert('RGB')
# Convert to bytes
buffer = BytesIO()
image.save(buffer, format='JPEG')
encoded = base64.b64encode(buffer.getvalue()).decode('utf-8')
return f"data:image/jpeg;base64,{encoded}"
@app.route('/analyze', methods=['POST'])
def analyze():
if 'image' not in request.files:
return jsonify({"error": "No image provided"}), 400
image_file = request.files['image']
prompt = request.form.get('prompt', 'Describe this image')
# Encode image
data_uri = encode_image(image_file)
# Call Docker Model Runner
payload = {
"model": MODEL_NAME,
"messages": [{
"role": "user",
"content": [
{"type": "text", "text": prompt},
{"type": "image_url", "image_url": {"url": data_uri}}
]
}]
}
response = requests.post(MODEL_API, json=payload, timeout=60)
result = response.json()
return jsonify({
"analysis": result['choices'][0]['message']['content'],
"model": MODEL_NAME,
"tokens_used": result['usage']['total_tokens']
})
if __name__ == '__main__':
app.run(debug=True, port=5000)
Test it:
curl -X POST http://localhost:5000/analyze \
-F "image=@photo.jpg" \
-F "prompt=What objects do you see in this image?"
Why choose Docker Model Runner?
- Complete privacy: Your images and data never leave your machine
- No API costs: Run as many requests as you want, no usage fees
- Docker integration: Models work seamlessly with your containerized apps
- Familiar API: OpenAI-compatible format means minimal learning curve
- Registry benefits: Version control, tagging, and distribution for AI models
- Flexible deployment: Works on Docker Desktop, servers, and CI/CD pipelines
Next steps
Now that you understand the basics:
- Experiment with different models - Try various sizes and families to find the right balance
- Build practical applications - Image captioning, document analysis, accessibility tools
- Integrate with your workflow - Add vision capabilities to existing Docker applications
- Optimize performance - Tune image sizes and model selection for your use case
- Explore the ecosystem - Check out models on Docker Hub and Hugging Face
Resources
- Official documentation: https://docs.docker.com/ai/model-runner/
- API reference: https://docs.docker.com/ai/model-runner/api-reference/
- Docker Hub AI models: https://hub.docker.com/u/ai
- Hugging Face GGUF models: https://huggingface.co/models?library=gguf
Have questions or want to share what you've built with Docker Model Runner? We'd love to hear from you!
Ready to run AI vision models locally? Get started with Docker Model Runner today at docs.docker.com/ai/model-runner/