How I Built an Auto-Curator Agent That Uses Nemotron to Manage Its Own Awesome List

How I Built an Auto-Curator Agent That Uses Nemotron to Manage Its Own Awesome List
NVIDIA Nemotron ~ Open and efficient multimodal models for agentic AI.
💡
"The model is so capable, we use it to maintain the very repository that tracks its own ecosystem."

I maintain awesome-nvidia-nemotron ~ a curated list of everything in the NVIDIA Nemotron ecosystem: models, papers, tutorials, tools, companies, videos, and more. Keeping it fresh is a full-time job by itself. New models drop on HuggingFace. Papers land on arXiv overnight. Blog posts pop up across dev.to, Medium, and NVIDIA's own blog every few days.

So I did what any reasonable developer would do: I automated it. With Nemotron itself.

This post walks through how I built a multi-agent system using Docker Agent and NVIDIA Nemotron that wakes up every morning at 9am, searches the web for new Nemotron content, validates it, and opens a GitHub Pull Request to update the repo — all without me touching a thing.


The Architecture: Four Agents, One YAML

The entire system is defined in a single YAML file. No Python. No Node.js. Just declarative configuration that Docker Agent interprets and executes.

The system has four specialized agents:

root (coordinator)
 ├── discoverer  →  searches web, HuggingFace, arXiv, GitHub
 ├── validator   →  checks links, quality, relevance
 └── publisher   →  creates branch, updates README, opens PR

The root agent receives a daily prompt, delegates to each sub-agent in sequence, and orchestrates the full pipeline. Each agent has a focused job and its own toolset.

Why Multi-Agent?

A single agent trying to do everything — search, validate, format, commit — gets confused and produces poor results. Splitting responsibilities means each agent can be optimized for its task. The discoverer uses web search tools. The validator fetches URLs and checks content. The publisher uses GitHub MCP tools to write files and open PRs. Clean separation, better results.


The Model: Nemotron Curating Nemotron

This is the part I find most compelling about this project. The curator is powered by NVIDIA Nemotron ~ the very model family the repository is about.

providers:
  nvidia:
    api_type: openai_chatcompletions
    base_url: https://integrate.api.nvidia.com/v1
    token_key: NVIDIA_API_KEY

models:
  smart:
    provider: nvidia
    model: nvidia/llama-3.1-nemotron-ultra-253b-v1
    max_tokens: 8192

  fast:
    provider: nvidia
    model: nvidia/llama-3.1-nemotron-nano-8b-v1
    max_tokens: 4096

The root, discoverer, and publisher agents use the Nemotron Ultra 253B for deep reasoning and quality judgments. The validator uses Nemotron Nano 8B for fast, cheap URL checking and relevance scoring.

The NVIDIA API at integrate.api.nvidia.com/v1 is OpenAI-compatible, which means Docker Agent can talk to it natively. You get free credits by signing up at build.nvidia.com.


The Full Agent YAML

Here's the complete auto-curator-agent/nemotron-curator.yaml:

#!/usr/bin/env -S docker agent run

providers:
  nvidia:
    api_type: openai_chatcompletions
    base_url: https://integrate.api.nvidia.com/v1
    token_key: NVIDIA_API_KEY

models:
  smart:
    provider: nvidia
    model: nvidia/llama-3.1-nemotron-ultra-253b-v1
    max_tokens: 8192

  fast:
    provider: nvidia
    model: nvidia/llama-3.1-nemotron-nano-8b-v1
    max_tokens: 4096

agents:
  root:
    model: smart
    skills: true
    description: |
      Coordinates maintenance of the awesome-nvidia-nemotron list.
    instruction: |
      You are the curator of the awesome-nvidia-nemotron list at:
      https://github.com/ajeetraina/awesome-nvidia-nemotron

      ## Workflow
      1. Delegate to `discoverer` to find new blogs, papers, models, repos, videos
      2. Delegate to `validator` to check each URL is live and relevant
      3. Delegate to `publisher` to create a branch and PR with additions

      ## Quality Standards
      - Only include resources specifically about NVIDIA Nemotron
      - Prefer official sources, arXiv, active GitHub repos, well-known tech blogs
      - Each entry must match the existing markdown table format in its section

    sub_agents: [discoverer, validator, publisher]
    toolsets:
      - type: filesystem
      - type: think
      - type: todo
      - type: mcp
        ref: docker:github
        env:
          GITHUB_PERSONAL_ACCESS_TOKEN: $GITHUB_PERSONAL_ACCESS_TOKEN

  discoverer:
    model: smart
    skills: true
    description: |
      Searches for new Nemotron resources across the web, HuggingFace, arXiv, and GitHub.
    instruction: |
      Search for new content about NVIDIA Nemotron:

      - HuggingFace: new models under nvidia/ org with "nemotron" in name
      - arXiv: papers with "Nemotron" in title or abstract
      - Blogs: nvidia.com/blog, developer.nvidia.com, huggingface.co/blog, dev.to, medium.com
      - GitHub: repos with "nemotron" in name/description, recent activity
      - Videos: YouTube talks, GTC sessions, tutorial walkthroughs

      Always check the current README.md to avoid duplicates.

    toolsets:
      - type: mcp
        ref: docker:duckduckgo
      - type: fetch
      - type: filesystem
      - type: mcp
        ref: docker:github
        env:
          GITHUB_PERSONAL_ACCESS_TOKEN: $GITHUB_PERSONAL_ACCESS_TOKEN

  validator:
    model: fast
    skills: true
    description: |
      Validates URLs and checks content quality for the awesome list.
    instruction: |
      For each resource:
      1. Fetch the URL — check it returns HTTP 200
      2. Verify content is specifically about NVIDIA Nemotron
      3. Check it's current and not deprecated

      Report: ✅ VALID / ⚠️ WARNING / ❌ BROKEN / 🚫 REJECTED

    toolsets:
      - type: filesystem
      - type: fetch
      - type: think
      - type: mcp
        ref: docker:github
        env:
          GITHUB_PERSONAL_ACCESS_TOKEN: $GITHUB_PERSONAL_ACCESS_TOKEN

  publisher:
    model: smart
    skills: true
    description: |
      Creates GitHub branches and PRs to update the awesome list README.
    instruction: |
      1. Read current README.md from ajeetraina/awesome-nvidia-nemotron
      2. Create branch: curator/add-resources-YYYY-MM-DD
      3. Insert new entries at end of correct sections, preserving all formatting
      4. Open PR with clear title and list of changes

      Never delete existing entries. Match the existing table format exactly.

    toolsets:
      - type: filesystem
      - type: think
      - type: mcp
        ref: docker:github
        env:
          GITHUB_PERSONAL_ACCESS_TOKEN: $GITHUB_PERSONAL_ACCESS_TOKEN

The GitHub Actions Workflow

The agent runs automatically every morning via GitHub Actions:

# .github/workflows/nemotron-auto-curator.yml
name: Nemotron Auto Curator

on:
  schedule:
    - cron: "0 9 * * *"   # Every morning at 9am UTC

  workflow_dispatch:
    inputs:
      prompt:
        description: "Curator prompt to run"
        required: false
        default: "Find new Nemotron resources from the last week and create a PR adding them"

jobs:
  curate:
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: write

    steps:
      - uses: actions/checkout@v4

      - name: Install Docker Agent
        run: |
          curl -L -o /tmp/docker-agent \
            https://github.com/docker/docker-agent/releases/latest/download/docker-agent-linux-amd64
          mkdir -p ~/.docker/cli-plugins
          mv /tmp/docker-agent ~/.docker/cli-plugins/docker-agent
          chmod +x ~/.docker/cli-plugins/docker-agent

      - name: Run Nemotron Curator Agent
        env:
          NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
          GITHUB_PERSONAL_ACCESS_TOKEN: ${{ secrets.GH_PAT }}
        run: |
          docker agent run ./auto-curator-agent/nemotron-curator.yaml \
            "Find new Nemotron resources published in the last 24 hours and create a PR"

Where the Files Live

awesome-nvidia-nemotron/
├── .github/
│   └── workflows/
│       └── nemotron-auto-curator.yml   ← Runs every morning at 9am UTC
└── auto-curator-agent/
    └── nemotron-curator.yaml           ← The Docker Agent config

Secrets You Need

Two secrets in your repo's Settings → Secrets → Actions:

Secret Name What it is
NVIDIA_API_KEY From build.nvidia.com — free credits on sign-up
GH_PAT GitHub Personal Access Token with repo + pull-requests: write scopes
Why GH_PAT and not GITHUB_TOKEN? GitHub Actions blocks any secret name starting with GITHUB_. So we store it as GH_PAT and map it to GITHUB_PERSONAL_ACCESS_TOKEN at runtime in the workflow env block. This is the variable Docker Agent's GitHub MCP toolset expects.

Lessons Learned the Hard Way

Building this was not without friction. Here's everything I ran into so you don't have to:

1. api-inference.huggingface.co is gone HuggingFace deprecated their old inference endpoint. The new one is router.huggingface.co. But even then...

2. Llama-3.1-Nemotron-70B-Instruct-HF has inference: false on HuggingFace This model requires 2× A100 80GB GPUs to self-host and is not available for serverless inference via the HuggingFace router. Use NVIDIA's own API at integrate.api.nvidia.com/v1 instead — it's OpenAI-compatible and has a generous free tier.

3. Docker Agent needs GITHUB_PERSONAL_ACCESS_TOKEN in the agent YAML too Even after setting it in the workflow env, each toolset block that uses docker:github needs it passed explicitly:

toolsets:
  - type: mcp
    ref: docker:github
    env:
      GITHUB_PERSONAL_ACCESS_TOKEN: $GITHUB_PERSONAL_ACCESS_TOKEN

Running It Manually

You can also run it locally with:

export NVIDIA_API_KEY=your_key_here
export GITHUB_PERSONAL_ACCESS_TOKEN=your_pat_here

docker agent run ./auto-curator-agent/nemotron-curator.yaml \
  "Find new Nemotron models released on HuggingFace this month"

Or for other tasks:

# Check for broken links
docker agent run ./auto-curator-agent/nemotron-curator.yaml \
  "Check all links in the awesome list for broken URLs"

# Hunt for new papers
docker agent run ./auto-curator-agent/nemotron-curator.yaml \
  "Search for new arXiv papers citing Nemotron published this week"

Why This Matters Beyond the Demo

The meta-story here is compelling, but there's a real engineering point underneath it:

Agentic AI changes what's possible for open source maintenance. Most awesome lists go stale because no one has time to curate them manually. With Docker Agent + a capable reasoning model, you can turn a YAML file into a fully autonomous maintenance pipeline that runs on a schedule, integrates with GitHub natively, and produces PRs that a human just needs to review and merge.

The Nemotron angle makes this particularly fitting. The model is designed for agentic tasks — tool-calling, multi-step reasoning, following complex instructions. Curation is exactly that kind of task: search, reason about relevance, format output, take action.

It's also a great template for any awesome list. Swap out the instructions, point it at a different repo, change the search terms — and you have an autonomous curator for any topic.


What's Next

A few things I'm planning to add:

  • Weekly digest mode — instead of daily PRs, batch a week of findings into one PR with a summary
  • Duplicate detection — smarter deduplication using embeddings rather than exact URL matching
  • Section scoring — the agent rates how well-covered each section is and prioritizes gaps
  • Slack notification — ping the Collabnix Slack when a PR is opened

Try It Yourself

The full source is in the repo:

If you maintain an awesome list and want to set this up for your own repo, drop a message in the Collabnix Slack — happy to help.