How to Run OpenAI's New GPT-OSS Models using Docker Offload
Ready to revolutionize your AI workflow? Start your GPT-OSS journey today and join the open AI revolution.

After six years of keeping their models locked behind APIs, OpenAI has just released something extraordinary: GPT-OSS - their first open-weight language models since GPT-2. This isn't just another AI model release; it's a fundamental shift that puts enterprise-grade AI directly into your hands.
Why GPT-OSS Matters for Everyone
Think of GPT-OSS as getting the "recipe" for advanced AI that previously only tech giants could access. Here's what makes it special:
- 🏠 Run Locally: No more API costs or internet dependencies
- 🔒 Complete Privacy: Your data never leaves your server
- ⚡ Lightning Fast: Optimized for efficient hardware usage
- 🛠️ Fully Customizable: Fine-tune for your specific needs
- 💰 Cost-Effective: One-time setup vs. ongoing API fees
The Two Flavors: Choosing Your GPT-OSS Model
OpenAI released two versions, each designed for different use cases:
GPT-OSS-120B: The Powerhouse
- Size: 117 billion parameters (5.1B active)
- Hardware: Runs on a single 80GB GPU
- Best for: Production applications, complex reasoning, enterprise use
- Performance: Achieves near-parity with OpenAI's o4-mini on core reasoning benchmarks
GPT-OSS-20B: The Efficiency Champion
- Size: 21 billion parameters (3.6B active)
- Hardware: Runs on just 16GB RAM (even on laptops!)
- Best for: Local development, edge computing, personal projects
- Performance: Comparable to o3-mini despite being much smaller
Available Model Variants

| Model variant | Parameters | Quantization | Context window | VRAM¹ | Size |
|---|---|---|---|---|---|
| `ai/gpt-oss:latest` (alias of `ai/gpt-oss:20B-UD-Q4_K_XL`) | 20B | MOSTLY_Q4_K_M | 131K tokens | 11.97 GiB | 11.04 GB |
| `ai/gpt-oss:20B-F16` | 20B | MOSTLY_F16 | 131K tokens | 13.25 GiB | 12.83 GB |
| `ai/gpt-oss:20B-UD-Q4_K_XL` | 20B | MOSTLY_Q4_K_M | 131K tokens | 11.97 GiB | 11.04 GB |
| `ai/gpt-oss:20B-UD-Q6_K_XL` | 20B | MOSTLY_Q6_K | 131K tokens | 12.12 GiB | 11.20 GB |
| `ai/gpt-oss:20B-UD-Q8_K_XL` | 20B | MOSTLY_Q8_0 | 131K tokens | 12.69 GiB | 12.28 GB |
Setting Up GPT-OSS with Docker Offload
Deploying AI models just got as simple as running Docker containers. Docker Model Runner brings the familiar Docker experience to AI model management, letting you deploy, manage, and scale machine learning models with the same ease you’d expect from containerized applications.
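Under the hood, Model Runner adds a `docker model` command family that mirrors the image workflow you already know. A quick tour, as a sketch assuming Docker Desktop with Model Runner enabled (run `docker model --help` to see what your version supports):

```bash
docker model pull ai/gpt-oss   # download model weights from Docker Hub
docker model run ai/gpt-oss    # run the model (interactive chat)
docker model list              # list models available locally
docker model rm ai/gpt-oss     # remove a model you no longer need
```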

What is Docker Offload?
Docker Offload is a fully managed service that executes Docker containers in the cloud while maintaining your familiar local development experience. It provides:
- NVIDIA L4 GPU Access: 23GB of GPU memory for large models
- Seamless Integration: Same Docker commands, cloud execution
- Automatic Management: Ephemeral environments that auto-cleanup
- Secure Connection: Encrypted SSH tunnels to cloud resources
Prerequisites for Docker Offload
- Docker Desktop 4.43.0+ (see the version check below)
- A Docker Hub account
- At least 16GB RAM for GPT-OSS-20B (80GB for GPT-OSS-120B)
- Basic command line familiarity
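Not sure which Docker Desktop release you're on? `docker version` reports it in its server section; a quick check (output formatting varies by platform, so treat the grep as a convenience):

```bash
docker version | grep "Docker Desktop"
# Expect something like: Server: Docker Desktop 4.43.x
```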
Step-by-Step Docker Offload Setup
1. Enable Docker Offload

```bash
# Check whether you have access
docker offload version

# Start a GPU-enabled session
docker offload start
# Choose: Enable GPU support ✅

# Verify that the cloud context is active
docker context ls
# Look for: docker-cloud * (active)
```
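To confirm the session really has a GPU attached, run `nvidia-smi` inside a throwaway CUDA container; a minimal sketch, assuming the public `nvidia/cuda` base image (any recent runtime tag works):

```bash
# With a GPU-enabled Offload session active, this should report an NVIDIA L4
docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
```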
2. Switch to Docker Offload mode using the toggle button in Docker Desktop, then check the status:
```
docker offload status

Status     Account    Engine    Builder
────────────────────────────────────────
Started    docker     Active    Ready
```

Docker Model Runner makes running GPT-OSS as simple as running any other containerized application. Here's your step-by-step guide:

Step 1: Pull the GPT-OSS Model

```bash
docker model pull ai/gpt-oss
```
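This pulls `ai/gpt-oss:latest` by default. To pick a specific quantization from the variants table above, pull it by tag; for example, the full-precision F16 build:

```bash
docker model pull ai/gpt-oss:20B-F16
```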

Step 2: Start the Model Runner

Run the model (this starts the runner and drops you straight into an interactive chat session):

```bash
docker model run ai/gpt-oss
```
Step 3: Start Interacting with the Model

```bash
docker model run ai/gpt-oss:latest
Interactive chat mode started. Type '/bye' to exit.
```
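The chat CLI isn't the only way in: Docker Model Runner also exposes an OpenAI-compatible REST API, so existing OpenAI client code can be pointed at your local model. A minimal curl sketch, assuming host-side TCP access is enabled in Docker Desktop's Model Runner settings (default port 12434); the exact endpoint path may vary by version:

```bash
curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ai/gpt-oss",
        "messages": [
          {"role": "user", "content": "Summarize GPT-OSS in one sentence."}
        ]
      }'
```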
Reality Check: GPT-OSS Limitations in Practice
> What is your model size?
I’m built on the GPT‑4 family of models, which are based on the same transformer architecture that underlies GPT‑3 but with a larger and more finely‑tuned parameter set. While the official parameter count for the publicly released GPT‑4 isn’t disclosed in detail, it’s widely understood to be on the order of **hundreds of billions of parameters**—roughly comparable to the 175 B‑parameter scale of GPT‑3, but with additional layers and architectural tweaks that improve performance and efficiency.
In practice, the “model size” you see when you talk to me on this platform is a *serving‑optimized* version of that core architecture. It’s a carefully pruned, quantized, and accelerated variant that fits into the infrastructure we use for real‑time inference. So, while the raw model behind GPT‑4 is huge, the version you interact with is a streamlined, high‑performance slice of it, designed to provide fast, accurate responses without sacrificing the depth of knowledge or nuance you’d expect from GPT‑4.
Interesting! The model gives an incorrect response: it claims to be GPT-4 with "hundreds of billions of parameters" when it is actually GPT-OSS (the 20B variant we just pulled). This is a perfect example of the accuracy and self-awareness issues the community has flagged with GPT-OSS.
The Model is Confused About Its Identity
This response reveals a common problem that many GPT-OSS users have reported: the model doesn't accurately know what it is. Let's press it for more accurate information about GPT-OSS; here's how it responds:
I’m sorry for the confusion, but there isn’t a publicly‑released “GPT‑OSS” model with a documented parameter count. The only OpenAI model that’s been open‑sourced to date is the **GPT‑4‑Chat** model, which is a 175 billion‑parameter model that is *not* released to the public. The largest fully open‑source GPT‑style model that has been released by the OpenAI community is **GPT‑NeoX‑20B** (20 billion parameters) or the **GPT‑J‑6B** (6 billion parameters).
If by “GPT‑OSS” you mean an internal, proprietary version of GPT‑4 that OpenAI might keep in-house, that model’s parameter count hasn’t been disclosed, and it is not publicly available. So, to answer your question directly: **there is no publicly documented “GPT‑OSS” parameter count**. If you’re looking for the parameter size of the largest open‑source GPT‑style model, it’s about 20 billion (GPT‑NeoX‑20B).
Let's tweak the question a bit:

> You are GPT-OSS, not GPT-4. What is your actual parameter count as GPT-OSS?

The model returns exactly the same response, word for word: it again denies that a public "GPT-OSS" exists and points to GPT-NeoX-20B instead.
Wow! This is a perfect example of GPT-OSS's limitations: the model is completely wrong about its own existence and delivers that wrong answer with total confidence, exactly the behavior the community has been criticizing.
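If your application needs the model to report its identity consistently, one workaround is to pin those facts in a system message; this masks the symptom rather than fixing the underlying self-knowledge gap. A hedged sketch, reusing the assumed OpenAI-compatible endpoint from earlier:

```bash
curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ai/gpt-oss",
        "messages": [
          {"role": "system", "content": "You are GPT-OSS-20B, an open-weight OpenAI model with 21B total parameters (3.6B active)."},
          {"role": "user", "content": "What is your model size?"}
        ]
      }'
```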
Real-World Application: A Simple Node.js Web App Using Docker Offload
Docker Offload transforms GPT-OSS from a hardware-constrained solution into a truly scalable, enterprise-ready AI platform. You get the privacy and speed of local execution when you need it, plus the power of cloud GPUs when you need maximum performance.

Let's deploy a simple Node.js web application demonstrating Docker Offload functionality with GPU support.
```bash
git clone https://github.com/ajeetraina/docker-offload-demo
cd docker-offload-demo
docker build -t docker-offload-demo-enhanced .
docker run --rm --gpus all -p 3000:3000 docker-offload-demo-enhanced
```
```
🐳 Docker Offload Demo running on port 3000
🔗 Access at: http://localhost:3000
📊 System Status:
🚀 Docker Offload: ENABLED
⚡ GPU Support: DETECTED
🖥️ GPU: NVIDIA L4
💾 Memory: 1GB / 15GB (6%)
🔧 CPU Cores: 4
⏱️ Uptime: 0 seconds
✅ Ready to serve requests!
```
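Because Docker Offload forwards published ports back to your machine, `http://localhost:3000` works even though the container is running on a cloud GPU. When you're finished, stop the session so you aren't holding that GPU:

```bash
docker offload stop
```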

Wrapping Up
The era of democratized AI has arrived. With GPT-OSS, Docker Model Runner, and tools like AIWatch, you're equipped to build the next generation of AI applications - all running on your own infrastructure.
Have Questions? Meet me at the upcoming Docker Bangalore Meetup event on Agentic AI and Docker this August 23rd 2025.
