Complete Step-by-Step Tutorial: GPU-Accelerated Dental AI Training on NVIDIA Jetson AGX Thor

Complete Step-by-Step Tutorial: GPU-Accelerated Dental AI Training on NVIDIA Jetson AGX Thor

Robotics is at an inflection point. We're witnessing a fundamental shift from single-purpose, fixed-function robots to generalist machines that can adapt, reason, and perform diverse tasks across unpredictable environments. This transformation demands something unprecedented: the ability to run massive generative AI models—large language models (LLMs), vision language models (VLMs), and vision language action (VLA) models—directly on the robot, at the edge, in real-time.

Think about what a modern humanoid robot or autonomous machine needs to accomplish: process streams from multiple cameras and sensors simultaneously, understand natural language commands, reason about complex environments, plan multi-step actions, and execute precise movements—all with sub-200ms latency. Previous edge AI platforms simply couldn't deliver this level of performance. Running a 7B or 30B parameter model locally while simultaneously handling sensor fusion and real-time control was a computational impossibility within a reasonable power envelope.

NVIDIA Jetson AGX Thor changes everything. It's not just an incremental upgrade—it's a purpose-built supercomputer for physical AI, designed from the ground up to be the "brain" for next-generation robots, autonomous machines, and intelligent edge systems.

NVIDIA Jetson AGX Thor is the ultimate platform for physical AI and robotics. Powered by the NVIDIA Blackwell GPU architecture, it delivers unprecedented AI compute at the edge. The Developer Kit combines the T5000 module with extensive I/O capabilities including 4×25GbE networking via QSFP28, 1TB NVMe storage, Wi-Fi 6E, and comprehensive connectivity for sensors and actuators.

Who Should Follow This Tutorial?

This tutorial is designed for:

  • Robotics Engineers building humanoid robots, AMRs, or manipulators
  • AI/ML Developers deploying generative AI models at the edge
  • Embedded Systems Engineers working on real-time multi-sensor applications
  • Researchers exploring physical AI and embodied intelligence

Prerequisites

  • Familiarity with Linux/Ubuntu environments
  • Basic understanding of Docker containers
  • Experience with AI/ML frameworks (PyTorch, TensorRT)
  • Hardware: Jetson AGX Thor Developer Kit

From Theory to Practice: Your First Physical AI Project

While Jetson Thor powers humanoid robots and autonomous machines, the same GPU-accelerated pipeline applies to any computer vision task. In this hands-on tutorial, we'll train a dental AI model — demonstrating the complete workflow you'd use for any edge AI project.

📚 Table of Contents

  1. Prerequisites Verification
  2. Docker Configuration for Jetson
  3. Repository Setup
  4. Dataset Preparation
  5. Auto-Annotation with SAM
  6. Dataset Splitting
  7. GPU Training

1. Prerequisites Verification

Step 1.1: Check NVIDIA Driver and GPU

nvidia-smi

What this does: Displays GPU information, driver version, and CUDA version.



+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.00                 Driver Version: 580.00         CUDA Version: 13.0     |
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
|   0  NVIDIA Thor                    Off |   00000000:01:00.0 Off |                  N/A |

Why: Confirms your Jetson has the GPU drivers installed and the GPU is detected.


Step 1.2: Check Docker Installation

docker --version

What this does: Shows Docker version.

Expected output: Docker version 24.x.x or similar

Why: Docker is required to run GPU-accelerated containers.


Step 1.3: Test GPU Access in Docker

sudo docker run --rm -it nvcr.io/nvidia/pytorch:25.08-py3 python3 -c "
import torch
print('PyTorch version:', torch.__version__)
print('CUDA available:', torch.cuda.is_available())
if torch.cuda.is_available():
    print('GPU name:', torch.cuda.get_device_name(0))
"
```

**What this does:** 
- Pulls NVIDIA's PyTorch container (if not already present)
- Runs a Python script inside the container
- Checks if PyTorch can access the GPU

**Expected output:**
```
PyTorch version: 2.8.0a0+34c6371d24.nv25.08
CUDA available: True
GPU name: NVIDIA Thor

Why: Verifies that Docker can properly access the GPU through NVIDIA Container Toolkit.


2. Docker Configuration for Jetson

Step 2.1: Check Current Docker Configuration

cat /etc/docker/daemon.json

What this does: Displays Docker daemon configuration.

Initial output might show:

{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}

Why: We need to check if nvidia runtime is configured before modifying.


Step 2.2: Add NVIDIA as Default Runtime

sudo apt install -y jq

What this does: Installs jq, a JSON processor tool.

Why: We need jq to safely edit JSON configuration files.


sudo jq '. + {"default-runtime": "nvidia"}' /etc/docker/daemon.json | \
  sudo tee /etc/docker/daemon.json.tmp && \
  sudo mv /etc/docker/daemon.json.tmp /etc/docker/daemon.json

What this does:

  • Reads current daemon.json
  • Adds "default-runtime": "nvidia" to it
  • Saves to temporary file
  • Replaces original file

Why: Setting nvidia as the default runtime means we don't need to specify --runtime=nvidia or --gpus all every time we run a container.


Step 2.3: Verify Configuration

cat /etc/docker/daemon.json

Expected output:

{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    },
    "default-runtime": "nvidia"
}

Why: Confirms the configuration was updated correctly.


Step 2.4: Restart Docker Service

sudo systemctl daemon-reload && sudo systemctl restart docker

What this does:

  • daemon-reload: Reloads systemd configuration
  • restart docker: Restarts Docker daemon to apply new configuration

Why: Docker needs to be restarted to pick up the new configuration.


Step 2.5: Add User to Docker Group (Optional)

sudo usermod -aG docker $USER
newgrp docker

What this does:

  • usermod -aG: Adds current user to docker group
  • newgrp: Activates the new group membership

Why: Allows running docker commands without sudo. Makes workflow smoother.

Note: You may need to logout/login for this to take full effect.


3. Repository Setup

Step 3.1: Clone Repository

git clone https://github.com/ajeetraina/dentescope-ai-complete.git
cd dentescope-ai-complete

What this does:

  • Downloads the DenteScope AI repository from GitHub
  • Changes directory into the cloned repo

Why: This repository contains training scripts, data preparation tools, and project structure.


Step 3.2: Switch to GPU Testing Branch

git checkout gpu-testing

What this does: Switches to the gpu-testing branch.

Why: This branch has GPU-specific configurations and Dockerfiles optimized for GPU training.


Step 3.3: Verify Repository Structure

ls -la
```

**Expected output:**
```
data/
backend/
frontend/
train_tooth_model.py
docker-compose.gpu-training.yml
Dockerfile.gpu-training
...

Why: Confirms you're in the right directory with all necessary files.


Step 3.4: Check Data Directory

ls -la data/
```

**Expected output:**
```
drwxrwxr-x 6 ajeetraina ajeetraina  4096 Nov  1 20:38 .
drwxrwxr-x 9 ajeetraina ajeetraina  4096 Nov  1 18:59 ..
drwxrwxr-x 2 ajeetraina ajeetraina 12288 Nov  1 17:39 raw
drwxrwxr-x 4 ajeetraina ajeetraina  4096 Nov  1 20:38 test
drwxrwxr-x 4 ajeetraina ajeetraina  4096 Nov  1 20:38 train
drwxrwxr-x 4 ajeetraina ajeetraina  4096 Nov  1 20:38 val

Why: Shows you have data directories set up. The raw folder contains your original dental X-ray images.


4. Dataset Preparation

Step 4.1: Count Raw Images

ls data/raw/*.jpg data/raw/*.png 2>/dev/null | wc -l

What this does: Counts all .jpg and .png files in data/raw/

Expected output: 73 (or similar number)

Why: Confirms how many images you have to work with.


Step 4.2: Create data.yaml Configuration

cat > data/data.yaml << 'EOF'
# DenteScope AI Dataset Configuration
path: /workspace/data
train: train/images
val: val/images
test: test/images

nc: 1
names: ['tooth']
EOF

What this does: Creates a YAML configuration file for YOLOv8 training.

Explanation of fields:

  • path: Root directory (inside Docker container)
  • train/val/test: Subdirectories with images
  • nc: Number of classes (1 = only "tooth" class)
  • names: List of class names

Why: YOLOv8 requires this configuration file to know where data is located and what classes to detect.


Step 4.3: Verify data.yaml

cat data/data.yaml

What this does: Displays the contents of the file you just created.

Why: Confirms the configuration was written correctly.


5. Auto-Annotation with SAM

Step 5.1: Install Dependencies for Testing

pip3 install segment-anything opencv-python pillow --break-system-packages

What this does: Installs Python packages needed for SAM (Segment Anything Model).

Package explanations:

  • segment-anything: Meta's segmentation AI model
  • opencv-python: Image processing library
  • pillow: Python image library
  • --break-system-packages: Required flag on Jetson to install packages

Why: These libraries are needed to run SAM for auto-annotation.


Step 5.2: Create Quick Test Script

cat > auto_annotate_teeth_sam_quick.py << 'EOF'
#!/usr/bin/env python3
"""
Quick test: Annotate first 5 images with SAM
"""
from pathlib import Path
import cv2
import sys

print("=" * 60)
print("Quick Tooth Detection Test (5 images)")
print("=" * 60)

try:
    from segment_anything import sam_model_registry, SamAutomaticMaskGenerator
except ImportError:
    print("\n📦 Installing segment-anything...")
    import subprocess
    subprocess.run([sys.executable, "-m", "pip", "install", "-q", "segment-anything", "--break-system-packages"], check=True)
    from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

import torch

# Download SAM model
model_path = "sam_vit_b_01ec64.pth"
if not Path(model_path).exists():
    print(f"\n📥 Downloading SAM model (~375MB)...")
    import urllib.request
    url = "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth"
    
    def progress(block, block_size, total):
        percent = min(block * block_size * 100 / total, 100)
        sys.stdout.write(f"\r   Progress: {percent:.1f}%")
        sys.stdout.flush()
    
    urllib.request.urlretrieve(url, model_path, progress)
    print("\n✓ Download complete")

# Load SAM
print("\n🤖 Loading SAM model...")
sam = sam_model_registry["vit_b"](checkpoint=model_path)

if torch.cuda.is_available():
    sam = sam.to("cuda")
    print(f"✓ Using GPU: {torch.cuda.get_device_name(0)}")
else:
    print("⚠️  Using CPU (will be slower)")

# Configure for teeth
mask_generator = SamAutomaticMaskGenerator(
    model=sam,
    points_per_side=48,
    pred_iou_thresh=0.88,
    stability_score_thresh=0.90,
    min_mask_region_area=50,
)

# Process FIRST 5 images only
raw_dir = Path("data/raw")
labels_dir = Path("data/raw/labels")
labels_dir.mkdir(exist_ok=True)

all_images = list(raw_dir.glob("*.jpg")) + list(raw_dir.glob("*.png")) + list(raw_dir.glob("*.JPG"))
images = all_images[:5]  # ONLY FIRST 5

print(f"\n📊 Testing on {len(images)} images (out of {len(all_images)} total)")
print("🦷 Detecting individual teeth...\n")

for idx, img_path in enumerate(images, 1):
    print(f"[{idx}/{len(images)}] {img_path.name}")
    
    try:
        image = cv2.imread(str(img_path))
        if image is None:
            print("   ⚠️  Could not read image")
            continue
            
        image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        h, w = image.shape[:2]
        
        print(f"   Image size: {w}x{h}")
        print(f"   Generating masks (this takes 2-4 minutes)...")
        
        masks = mask_generator.generate(image_rgb)
        
        print(f"   Found {len(masks)} total masks")
        
        # Filter for tooth-sized objects (1-5% of image area)
        min_area = (w * h) * 0.005
        max_area = (w * h) * 0.05
        
        teeth_masks = []
        for mask in masks:
            area = mask['area']
            if min_area < area < max_area:
                x, y, w_box, h_box = mask['bbox']
                aspect_ratio = w_box / h_box if h_box > 0 else 0
                
                # Teeth aspect ratio: 0.3 to 3.0
                if 0.3 < aspect_ratio < 3.0:
                    teeth_masks.append(mask)
        
        print(f"   Filtered to {len(teeth_masks)} teeth")
        
        # Save annotations
        label_path = labels_dir / f"{img_path.stem}.txt"
        
        with open(label_path, 'w') as f:
            for mask in teeth_masks:
                x, y, w_box, h_box = mask['bbox']
                
                x_center = (x + w_box / 2) / w
                y_center = (y + h_box / 2) / h
                width = w_box / w
                height = h_box / h
                
                f.write(f"0 {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}\n")
        
        print(f"   ✓ Saved {len(teeth_masks)} tooth annotations")
        
    except Exception as e:
        print(f"   ❌ Error: {e}")
    
    print()

print("=" * 60)
print("✅ Test annotation complete!")
print("=" * 60)
print("\n📋 Next: Review results")
print(f"   Check: data/raw/labels/")
print(f"\n   If results look good, run full annotation:")
print(f"   python3 auto_annotate_teeth_sam.py")
EOF

chmod +x auto_annotate_teeth_sam_quick.py

What this does: Creates a Python script that:

  1. Downloads SAM model (375MB, one-time)
  2. Loads SAM with CPU/GPU
  3. Processes first 5 images to test
  4. Detects individual teeth
  5. Saves annotations in YOLO format

Script explanation:

  • Mask generation: SAM generates many masks (segments)
  • Filtering: We filter for tooth-sized objects (0.5-5% of image area)
  • Aspect ratio check: Teeth have reasonable width/height ratios (0.3-3.0)
  • YOLO format: class x_center y_center width height (all normalized 0-1)

Why: Testing on 5 images first ensures the annotation quality is good before processing all 73 images.


Step 5.3: Run Quick Test (CPU)

python3 auto_annotate_teeth_sam_quick.py
```

**What this does:** Runs the test annotation on 5 images.

**Expected timeline:** ~10-20 minutes on CPU (2-4 min per image)

**Expected output:**
```
============================================================
Quick Tooth Detection Test (5 images)
============================================================
📥 Downloading SAM model (~375MB)...
   Progress: 100.0%
✓ Download complete

🤖 Loading SAM model...
⚠️  Using CPU (will be slower)

📊 Testing on 5 images (out of 73 total)
🦷 Detecting individual teeth...

[1/5] IMAGE_NAME.jpg
   Image size: 1662x952
   Generating masks (this takes 2-4 minutes)...
   Found 57 total masks
   Filtered to 12 teeth
   ✓ Saved 12 tooth annotations

[2/5] ...

Why: Validates that SAM can successfully detect individual teeth before committing to processing all images.


Step 5.4: Check Test Results

ls data/raw/labels/*.txt | wc -l

What this does: Counts generated label files.

Expected output: 5 (or number of images processed)

Why: Confirms labels were created.


head -5 data/raw/labels/*.txt | head -15
```

**What this does:** Shows first 5 lines of label files.

**Expected output:**
```
==> data/raw/labels/IMAGE1.txt <==
0 0.492796 0.479584 0.059870 0.050508
0 0.234567 0.345678 0.045678 0.056789
0 0.345678 0.456789 0.056789 0.067890
...

Format explanation:

  • 0: Class ID (0 = tooth)
  • 0.492796: X-center (normalized, 0-1)
  • 0.479584: Y-center (normalized, 0-1)
  • 0.059870: Width (normalized, 0-1)
  • 0.050508: Height (normalized, 0-1)

Why: Verifies the annotation format is correct.


Step 5.5: Create GPU Docker Annotation Script

cat > docker_sam_annotate.sh << 'EOF'
#!/bin/bash

sudo docker run --rm -it \
  --ipc=host \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  -v $(pwd)/data:/workspace/data \
  -v $(pwd)/sam_vit_b_01ec64.pth:/workspace/sam_vit_b_01ec64.pth:ro \
  nvcr.io/nvidia/pytorch:25.08-py3 \
  bash -c "
  
  echo '=========================================='
  echo 'Installing dependencies...'
  echo '=========================================='
  apt-get update -qq && apt-get install -y -qq libgl1 libglib2.0-0 && \
  pip install -q 'numpy<2.0' opencv-python-headless segment-anything
  
  echo ''
  echo '=========================================='
  echo 'SAM Individual Tooth Detection - GPU'
  echo '=========================================='
  
  python3 << 'PYTHON_EOF'
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator
from pathlib import Path
import cv2
import torch

print(f'PyTorch: {torch.__version__}')
print(f'CUDA: {torch.cuda.is_available()}')
if torch.cuda.is_available():
    print(f'GPU: {torch.cuda.get_device_name(0)}')

# Load SAM with GPU
print('Loading SAM model...')
sam = sam_model_registry['vit_b'](checkpoint='/workspace/sam_vit_b_01ec64.pth')
sam = sam.to('cuda')
print('✓ SAM loaded on GPU')

# Configure mask generator
mask_generator = SamAutomaticMaskGenerator(
    model=sam,
    points_per_side=48,
    pred_iou_thresh=0.88,
    stability_score_thresh=0.90,
    min_mask_region_area=50,
)

# Process all images
raw_dir = Path('/workspace/data/raw')
labels_dir = Path('/workspace/data/raw/labels')
labels_dir.mkdir(exist_ok=True)

images = list(raw_dir.glob('*.jpg')) + list(raw_dir.glob('*.png')) + list(raw_dir.glob('*.JPG'))
print(f'\n📊 Processing {len(images)} images\n')

total_teeth = 0

for idx, img_path in enumerate(images, 1):
    print(f'[{idx}/{len(images)}] {img_path.name}')
    
    try:
        image = cv2.imread(str(img_path))
        if image is None:
            print('   ⚠️  Could not read image')
            continue
        
        image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        h, w = image.shape[:2]
        
        print(f'   Generating masks...')
        masks = mask_generator.generate(image_rgb)
        
        # Filter for tooth-sized objects
        min_area = (w * h) * 0.005
        max_area = (w * h) * 0.05
        
        teeth_masks = []
        for mask in masks:
            area = mask['area']
            if min_area < area < max_area:
                x, y, w_box, h_box = mask['bbox']
                aspect_ratio = w_box / h_box if h_box > 0 else 0
                if 0.3 < aspect_ratio < 3.0:
                    teeth_masks.append(mask)
        
        print(f'   Found {len(teeth_masks)} teeth')
        total_teeth += len(teeth_masks)
        
        # Save annotations
        label_path = labels_dir / f'{img_path.stem}.txt'
        with open(label_path, 'w') as f:
            for mask in teeth_masks:
                x, y, w_box, h_box = mask['bbox']
                x_center = (x + w_box / 2) / w
                y_center = (y + h_box / 2) / h
                width = w_box / w
                height = h_box / h
                f.write(f'0 {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}\n')
        
        print(f'   ✓ Saved')
        
    except Exception as e:
        print(f'   ❌ Error: {e}')

print(f'\n✅ Complete! Total teeth: {total_teeth}')
print(f'   Average: {total_teeth/len(images):.1f} teeth/image')
PYTHON_EOF
"
EOF

chmod +x docker_sam_annotate.sh

What this does: Creates a shell script that:

  1. Runs NVIDIA PyTorch container with GPU access
  2. Mounts data directory and SAM model file
  3. Installs dependencies (with NumPy < 2.0 for compatibility)
  4. Runs SAM annotation on all 73 images using GPU

Docker flags explained:

  • --rm: Remove container after exit
  • -it: Interactive terminal
  • --ipc=host: Shared memory for better performance
  • --ulimit memlock=-1: Unlimited locked memory
  • --ulimit stack=67108864: Larger stack size
  • -v: Mount volumes (host:container)
  • :ro: Read-only mount

Why Docker? Local Python has CPU-only PyTorch. Docker container has GPU-enabled PyTorch.


Step 5.6: Make Script Executable

chmod +x docker_sam_annotate.sh

What this does: Adds execute permission to the script.

Why: Allows running the script directly with ./docker_sam_annotate.sh


Step 5.7: Run Full GPU Annotation

./docker_sam_annotate.sh
```

**What this does:** Processes all 73 images with SAM using GPU.

**Expected timeline:** ~30-60 minutes (vs 2-6 hours on CPU)

**Expected output:**
```
==========================================
Installing dependencies...
==========================================
[apt and pip installation logs]

==========================================
SAM Individual Tooth Detection - GPU
==========================================
PyTorch: 2.8.0a0+34c6371d24.nv25.08
CUDA: True
GPU: NVIDIA Thor
Loading SAM model...
✓ SAM loaded on GPU

📊 Processing 73 images

[1/73] IMAGE_NAME.jpg
   Generating masks...
   Found 12 teeth
   ✓ Saved

[2/73] ...

Why: GPU acceleration makes processing 73 images feasible in a reasonable time.


Step 5.8: Monitor GPU Usage (Optional, separate terminal)

watch -n 1 nvidia-smi
```

**What this does:** Refreshes GPU status every 1 second.

**Expected to see:**
```
| GPU  Name                 | GPU-Util  |
|   0  NVIDIA Thor          |     93%   |
| Processes:                            |
|   10795      C   python3        0MiB  |

Explanation:

  • GPU-Util: 93%: GPU is working hard (good!)
  • C: Compute mode (not graphics)
  • python3: SAM annotation process

Why: Confirms GPU is being utilized effectively.


Step 5.9: Monitor Progress (Optional, separate terminal)

watch -n 5 'echo "Progress: $(ls data/raw/labels/*.txt 2>/dev/null | wc -l)/73 images annotated"'
```

**What this does:** Counts completed label files every 5 seconds.

**Expected output:**
```
Progress: 15/73 images annotated

Why: Shows real-time progress without cluttering the annotation output.


6. Dataset Splitting

Step 6.1: Create Dataset Preparation Script

cat > prepare_dataset.py << 'EOF'
#!/usr/bin/env python3
"""
Split raw dental images into train/val/test sets
"""
import os
import shutil
from pathlib import Path
import random

# Set random seed for reproducibility
random.seed(42)

# Paths
raw_dir = Path('data/raw')
train_dir = Path('data/train/images')
val_dir = Path('data/val/images')
test_dir = Path('data/test/images')

train_labels_dir = Path('data/train/labels')
val_labels_dir = Path('data/val/labels')
test_labels_dir = Path('data/test/labels')

# Create directories
for dir_path in [train_dir, val_dir, test_dir, train_labels_dir, val_labels_dir, test_labels_dir]:
    dir_path.mkdir(parents=True, exist_ok=True)

# Get all images from raw
image_extensions = ['.jpg', '.jpeg', '.png', '.bmp', '.webp']
all_images = []

for ext in image_extensions:
    all_images.extend(list(raw_dir.glob(f'*{ext}')))
    all_images.extend(list(raw_dir.glob(f'*{ext.upper()}')))

print(f"Found {len(all_images)} images in {raw_dir}")

if len(all_images) == 0:
    print("❌ No images found in data/raw/")
    exit(1)

# Shuffle images
random.shuffle(all_images)

# Split: 70% train, 20% val, 10% test
train_split = int(0.7 * len(all_images))
val_split = int(0.9 * len(all_images))

train_images = all_images[:train_split]
val_images = all_images[train_split:val_split]
test_images = all_images[val_split:]

print(f"\n📊 Dataset Split:")
print(f"   Train: {len(train_images)} images")
print(f"   Val:   {len(val_images)} images")
print(f"   Test:  {len(test_images)} images")

# Copy images and labels
print("\n📁 Copying images and labels...")

def copy_image_and_label(img_path, dest_img_dir, dest_label_dir):
    # Copy image
    shutil.copy2(img_path, dest_img_dir / img_path.name)
    
    # Copy corresponding label if exists
    label_path = raw_dir / 'labels' / f"{img_path.stem}.txt"
    if label_path.exists():
        shutil.copy2(label_path, dest_label_dir / f"{img_path.stem}.txt")

for img in train_images:
    copy_image_and_label(img, train_dir, train_labels_dir)

for img in val_images:
    copy_image_and_label(img, val_dir, val_labels_dir)

for img in test_images:
    copy_image_and_label(img, test_dir, test_labels_dir)

print("\n✅ Dataset preparation complete!")
print(f"   Train: {train_dir}")
print(f"   Val:   {val_dir}")
print(f"   Test:  {test_dir}")
EOF

chmod +x prepare_dataset.py

What this does: Creates a script that:

  1. Finds all images in data/raw/
  2. Randomly shuffles them (with fixed seed for reproducibility)
  3. Splits 70% train / 20% validation / 10% test
  4. Copies images and corresponding labels to respective directories

Split ratios explained:

  • Train (70%): Used to train the model
  • Validation (20%): Used to tune hyperparameters and check overfitting
  • Test (10%): Final evaluation (never seen during training)

Why: Machine learning requires separate train/val/test sets to properly evaluate model performance.


Step 6.2: Run Dataset Preparation

python3 prepare_dataset.py
```

**What this does:** Executes the splitting script.

**Expected output:**
```
Found 73 images in data/raw

📊 Dataset Split:
   Train: 51 images
   Val:   14 images
   Test:  8 images

📁 Copying images and labels...

✅ Dataset preparation complete!
   Train: data/train/images
   Val:   data/val/images
   Test:  data/test/images

Why: Organizes data into the structure YOLOv8 expects.


Step 6.3: Verify Split

echo "=== Dataset Verification ==="
echo "Train images: $(ls data/train/images | wc -l)"
echo "Train labels: $(ls data/train/labels | wc -l)"
echo ""
echo "Val images: $(ls data/val/images | wc -l)"
echo "Val labels: $(ls data/val/labels | wc -l)"
echo ""
echo "Test images: $(ls data/test/images | wc -l)"
echo "Test labels: $(ls data/test/labels | wc -l)"
```

**What this does:** Counts images and labels in each split.

**Expected output:**
```
=== Dataset Verification ===
Train images: 51
Train labels: 51

Val images: 14
Val labels: 14

Test images: 8
Test labels: 8

Why: Confirms images and labels were copied correctly and counts match.


7. GPU Training

Step 7.1: Create GPU Training Script

cat > train_docker.sh << 'EOF'
#!/bin/bash

sudo docker run --rm -it \
  --ipc=host \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  -v $(pwd)/data:/workspace/data \
  -v $(pwd)/runs:/workspace/runs \
  nvcr.io/nvidia/pytorch:25.08-py3 \
  bash -c "
  
  echo '=========================================='
  echo 'Installing system dependencies...'
  echo '=========================================='
  apt-get update -qq && \
  apt-get install -y -qq libgl1 libglib2.0-0 libsm6 libxext6 libxrender1 libgomp1 && \
  rm -rf /var/lib/apt/lists/*
  
  echo ''
  echo '=========================================='
  echo 'Installing Ultralytics YOLOv8...'
  echo '=========================================='
  pip install -q --no-cache-dir 'numpy<2.0' opencv-python-headless ultralytics
  
  echo ''
  echo '=========================================='
  echo 'DenteScope AI - GPU Training on Jetson'
  echo '=========================================='
  
  python3 << 'PYTHON_EOF'
from ultralytics import YOLO
import torch

print(f'PyTorch version: {torch.__version__}')
print(f'CUDA available: {torch.cuda.is_available()}')
print(f'GPU: {torch.cuda.get_device_name(0)}')
print()

# Load model
print('📦 Loading YOLOv8n model...')
model = YOLO('yolov8n.pt')

# Train
print('🚀 Starting training...')
print()

results = model.train(
    data='/workspace/data/data.yaml',
    epochs=100,
    imgsz=640,
    batch=8,
    device=0,
    project='/workspace/runs/train',
    name='dentescope_jetson',
    patience=50,
    save=True,
    save_period=10,
    cache=True,
    verbose=True,
    plots=True,
    
    # Data augmentation
    hsv_h=0.015,
    hsv_s=0.7,
    hsv_v=0.4,
    degrees=10,
    translate=0.1,
    scale=0.5,
    flipud=0.5,
    fliplr=0.5,
    mosaic=1.0,
)

print()
print('========================================')
print('✅ Training completed!')
print('========================================')
print('Best model: /workspace/runs/train/dentescope_jetson/weights/best.pt')
print('Results on host: runs/train/dentescope_jetson/')
PYTHON_EOF
"
EOF

chmod +x train_docker.sh

What this does: Creates training script that:

  1. Runs PyTorch container with GPU
  2. Installs YOLOv8 and dependencies
  3. Trains YOLOv8n model on dental dataset
  4. Saves checkpoints and results

Training parameters explained:

  • data: Path to data.yaml
  • epochs=100: Number of training iterations through dataset
  • imgsz=640: Input image size (640x640 pixels)
  • batch=8: Process 8 images simultaneously (limited by GPU memory)
  • device=0: Use GPU 0
  • patience=50: Early stopping after 50 epochs without improvement
  • save_period=10: Save checkpoint every 10 epochs
  • cache=True: Cache images in memory for faster training

Data augmentation parameters:

  • hsv_h/s/v: Color jittering
  • degrees: Random rotation (±10°)
  • translate: Random translation (±10%)
  • scale: Random scaling (±50%)
  • flipud/fliplr: Random vertical/horizontal flips
  • mosaic: Mix 4 images together

Why: Data augmentation prevents overfitting and makes model more robust.


Step 7.2: Start Training

./train_docker.sh
```

**What this does:** Starts GPU-accelerated training.

**Expected timeline:** ~15-25 minutes for 100 epochs on Jetson AGX Thor

**Expected output:**
```
==========================================
Installing system dependencies...
==========================================
[installation logs]

==========================================
DenteScope AI - GPU Training on Jetson
==========================================
PyTorch version: 2.8.0a0+34c6371d24.nv25.08
CUDA available: True
GPU: NVIDIA Thor

📦 Loading YOLOv8n model...
Downloading yolov8n.pt...

🚀 Starting training...

Ultralytics 8.3.223 🚀 Python-3.12.3 torch-2.8.0 CUDA:0 (NVIDIA Thor, 61440MiB)

                   from  n    params  module
  0                  -1  1       464  ultralytics.nn.modules.Conv
  ...
  
Model summary: 225 layers, 3,011,043 parameters

Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
  1/100      2.5G     1.2345     0.5678     0.8901         32        640
  2/100      2.5G     1.1234     0.5123     0.8456         32        640
  3/100      2.5G     1.0987     0.4891     0.8234         32        640
  ...

Output columns explained:

  • Epoch: Current training epoch
  • GPU_mem: GPU memory used
  • box_loss: Bounding box prediction error
  • cls_loss: Classification error
  • dfl_loss: Distribution focal loss
  • Instances: Number of object instances
  • Size: Input image size

Good training indicators:

  • ✅ Losses decreasing over time
  • ✅ GPU utilization ~90-95%
  • ✅ No out-of-memory errors

Why: Training teaches the model to detect teeth in dental X-rays.


Step 7.3: Monitor Training (Optional, separate terminal)

watch -n 1 nvidia-smi
```

**What this does:** Monitors GPU usage during training.

**Expected to see:**
```
| GPU-Util: 90-95% |
| Memory-Usage: 2-3GB / 61GB |

Why: Confirms GPU is being fully utilized.


watch -n 5 'tail -10 runs/train/dentescope_jetson/results.csv 2>/dev/null || echo "Training starting..."'

What this does: Shows last 10 lines of training results.

Why: Real-time view of training metrics without interfering with main process.

.9189     0.9947      1.025        201        640: 71% ━━━━━━━━╸─── 5/7 5     84/100      2.68G     0.9409      1.008      1.034         55        640: 100% ━━━━━━━━━━━━ 7/7      84/100      2.68G     0.9409      1.008      1.034         55        640: 100% ━━━━━━━━━━━━ 7/7 10.7it/s 0.7s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 9.9it/s 0.1s
                   all         14        150      0.529       0.58      0.554      0.413

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
     85/100      2.69G     0.8856      1.039      1.026        132        640: 14% ━╸────────── 1/7 1     85/100      2.69G     0.8867      1.008      1.013        163        640: 43% ━━━━━─────── 3/7 4     85/100      2.69G     0.9149      1.024      1.027        134        640: 71% ━━━━━━━━╸─── 5/7 6     85/100      2.71G     0.8955      1.031      1.018         61        640: 86% ━━━━━━━━━━── 6/7 7     85/100      2.71G     0.8955      1.031      1.018         61        640: 100% ━━━━━━━━━━━━ 7/7      85/100      2.71G     0.8955      1.031      1.018         61        640: 100% ━━━━━━━━━━━━ 7/7 10.5it/s 0.7s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 10.3it/s 0.1s
                   all         14        150      0.555      0.547      0.546      0.407

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
     86/100      2.71G     0.8888      1.085      1.027        105        640: 14% ━╸────────── 1/7 1     86/100      2.71G     0.9285      1.066      1.029        111        640: 43% ━━━━━─────── 3/7 4     86/100      2.71G     0.9164      1.056      1.014        241        640: 71% ━━━━━━━━╸─── 5/7 6     86/100      2.72G     0.9252      1.054      1.022         49        640: 100% ━━━━━━━━━━━━ 7/7      86/100      2.72G     0.9252      1.054      1.022         49        640: 100% ━━━━━━━━━━━━ 7/7 11.2it/s 0.6s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 10.9it/s 0.1s
                   all         14        150      0.555      0.547      0.546      0.407

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
     87/100      2.72G     0.8686     0.9936      1.013        165        640: 0% ──────────── 0/7  0     87/100      2.72G     0.8974       1.01      1.029        150        640: 29% ━━━───────── 2/7 3     87/100      2.72G     0.9202       1.03      1.019        211        640: 57% ━━━━━━╸───── 4/7 5     87/100      2.74G     0.8795      1.017      1.004         58        640: 86% ━━━━━━━━━━── 6/7 7     87/100      2.74G     0.8795      1.017      1.004         58        640: 100% ━━━━━━━━━━━━ 7/7      87/100      2.74G     0.8795      1.017      1.004         58        640: 100% ━━━━━━━━━━━━ 7/7 10.6it/s 0.7s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 9.6it/s 0.1s
                   all         14        150      0.571      0.533      0.535        0.4

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
     88/100      2.74G     0.8919      1.059      1.055        143        640: 14% ━╸────────── 1/7 1     88/100      2.74G     0.9199      1.051      1.036        245        640: 43% ━━━━━─────── 3/7 4     88/100      2.74G     0.9169      1.043      1.019        203        640: 71% ━━━━━━━━╸─── 5/7 6     88/100      2.76G     0.9338      1.105      1.042         25        640: 100% ━━━━━━━━━━━━ 7/7      88/100      2.76G     0.9338      1.105      1.042         25        640: 100% ━━━━━━━━━━━━ 7/7 10.4it/s 0.7s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 10.8it/s 0.1s
                   all         14        150      0.558      0.553      0.532      0.396

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
     89/100      2.76G      0.955      1.113      1.064        149        640: 14% ━╸────────── 1/7 1     89/100      2.76G     0.9671      1.115      1.053        253        640: 29% ━━━───────── 2/7 3     89/100      2.76G     0.9729      1.071      1.041        170        640: 57% ━━━━━━╸───── 4/7 5     89/100      2.78G     0.9973      1.063      1.041         77        640: 86% ━━━━━━━━━━── 6/7 7     89/100      2.78G     0.9973      1.063      1.041         77        640: 100% ━━━━━━━━━━━━ 7/7      89/100      2.78G     0.9973      1.063      1.041         77        640: 100% ━━━━━━━━━━━━ 7/7 10.8it/s 0.6s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 11.2it/s 0.1s
                   all         14        150      0.569      0.554      0.531      0.391

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
     90/100      2.78G     0.8196     0.9363     0.9939        165        640: 14% ━╸────────── 1/7 1     90/100      2.78G     0.8204     0.9468     0.9969        180        640: 43% ━━━━━─────── 3/7 3     90/100      2.78G     0.8391     0.9464     0.9972        150        640: 71% ━━━━━━━━╸─── 5/7 6     90/100      2.79G     0.8519     0.9654     0.9958         84        640: 100% ━━━━━━━━━━━━ 7/7      90/100      2.79G     0.8519     0.9654     0.9958         84        640: 100% ━━━━━━━━━━━━ 7/7 10.9it/s 0.6s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 13.3it/s 0.1s
                   all         14        150      0.518      0.607       0.53      0.392
Closing dataloader mosaic

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
     91/100      2.79G     0.9964      1.251      1.081        117        640: 0% ──────────── 0/7  0     91/100      2.79G      1.101      1.745      1.109         76        640: 14% ━╸────────── 1/7 1     91/100      2.79G      1.036      1.596      1.078         91        640: 29% ━━━───────── 2/7 3     91/100      2.79G     0.9907      1.486      1.064         90        640: 43% ━━━━━─────── 3/7 4     91/100      2.79G     0.9799      1.412      1.058        107        640: 57% ━━━━━━╸───── 4/7 5     91/100      2.81G     0.9478       1.39      1.049         32        640: 86% ━━━━━━━━━━── 6/7 7     91/100      2.81G     0.9478       1.39      1.049         32        640: 100% ━━━━━━━━━━━━ 7/7      91/100      2.81G     0.9478       1.39      1.049         32        640: 100% ━━━━━━━━━━━━ 7/7 7.4it/s 1.0s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 9.7it/s 0.1s
                   all         14        150      0.494       0.62      0.532      0.395

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
     92/100      2.81G     0.9275      1.328      1.017         75        640: 14% ━╸────────── 1/7 1     92/100      2.81G     0.8948      1.275     0.9928         93        640: 43% ━━━━━─────── 3/7 4     92/100      2.81G     0.8945      1.268     0.9978         93        640: 71% ━━━━━━━━╸─── 5/7 5     92/100      2.82G     0.8883      1.235     0.9909         36        640: 100% ━━━━━━━━━━━━ 7/7      92/100      2.82G     0.8883      1.235     0.9909         36        640: 100% ━━━━━━━━━━━━ 7/7 10.7it/s 0.7s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 11.8it/s 0.1s
                   all         14        150      0.526      0.577      0.526      0.391

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
     93/100      2.82G     0.9053      1.115       1.01        112        640: 14% ━╸────────── 1/7 1     93/100      2.82G     0.8766      1.157      1.004         85        640: 43% ━━━━━─────── 3/7 4     93/100      2.82G     0.8833      1.198      1.002        110        640: 71% ━━━━━━━━╸─── 5/7 6     93/100      2.85G     0.9193      1.239       1.04         30        640: 86% ━━━━━━━━━━── 6/7 7     93/100      2.85G     0.9193      1.239       1.04         30        640: 100% ━━━━━━━━━━━━ 7/7      93/100      2.85G     0.9193      1.239       1.04         30        640: 100% ━━━━━━━━━━━━ 7/7 10.8it/s 0.6s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 11.5it/s 0.1s
                   all         14        150      0.505      0.607      0.524      0.388

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
     94/100      2.85G     0.8874      1.174      1.013         77        640: 14% ━╸────────── 1/7 1     94/100      2.85G     0.9222      1.222      1.025         82        640: 43% ━━━━━─────── 3/7 4     94/100      2.85G     0.9233      1.185      1.024         97        640: 71% ━━━━━━━━╸─── 5/7 6     94/100      2.86G     0.9516      1.191       1.04         36        640: 100% ━━━━━━━━━━━━ 7/7      94/100      2.86G     0.9516      1.191       1.04         36        640: 100% ━━━━━━━━━━━━ 7/7 11.1it/s 0.6s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 11.9it/s 0.1s
                   all         14        150      0.505      0.607      0.524      0.388

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
     95/100      2.86G     0.8706      1.133      1.011         90        640: 0% ──────────── 0/7  0     95/100      2.86G     0.8552      1.086     0.9936        103        640: 29% ━━━───────── 2/7 3     95/100      2.86G      0.889      1.134      1.006         99        640: 57% ━━━━━━╸───── 4/7 5     95/100      2.88G     0.8926      1.181      1.005         31        640: 86% ━━━━━━━━━━── 6/7 8     95/100      2.88G     0.8926      1.181      1.005         31        640: 100% ━━━━━━━━━━━━ 7/7      95/100      2.88G     0.8926      1.181      1.005         31        640: 100% ━━━━━━━━━━━━ 7/7 11.7it/s 0.6s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 13.5it/s 0.1s
                   all         14        150       0.47      0.651      0.528      0.388

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
     96/100      2.88G     0.8935      1.141      1.016        100        640: 14% ━╸────────── 1/7 1     96/100      2.88G     0.8789      1.107      1.003         93        640: 43% ━━━━━─────── 3/7 4     96/100      2.88G     0.8826      1.095      1.008         85        640: 71% ━━━━━━━━╸─── 5/7 6     96/100       2.9G     0.8684      1.132      1.003         29        640: 100% ━━━━━━━━━━━━ 7/7      96/100       2.9G     0.8684      1.132      1.003         29        640: 100% ━━━━━━━━━━━━ 7/7 12.0it/s 0.6s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 13.4it/s 0.1s
                   all         14        150       0.49       0.66      0.526      0.383

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
     97/100       2.9G      0.952      1.167      1.023         99        640: 14% ━╸────────── 1/7 1     97/100       2.9G     0.9863      1.214      1.036         89        640: 29% ━━━───────── 2/7 4     97/100       2.9G     0.9287      1.162      1.023        101        640: 57% ━━━━━━╸───── 4/7 6     97/100      2.91G     0.9173      1.157      1.012         50        640: 86% ━━━━━━━━━━── 6/7 7     97/100      2.91G     0.9173      1.157      1.012         50        640: 100% ━━━━━━━━━━━━ 7/7      97/100      2.91G     0.9173      1.157      1.012         50        640: 100% ━━━━━━━━━━━━ 7/7 11.3it/s 0.6s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 10.5it/s 0.1s
                   all         14        150      0.479        0.6      0.522      0.376

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
     98/100      2.91G     0.8209      1.071     0.9889         80        640: 14% ━╸────────── 1/7 1     98/100      2.91G     0.8357      1.098     0.9942         82        640: 43% ━━━━━─────── 3/7 4     98/100      2.91G     0.8574      1.075     0.9979        127        640: 71% ━━━━━━━━╸─── 5/7 6     98/100      2.94G     0.8656      1.104      1.004         21        640: 100% ━━━━━━━━━━━━ 7/7      98/100      2.94G     0.8656      1.104      1.004         21        640: 100% ━━━━━━━━━━━━ 7/7 11.2it/s 0.6s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 14.0it/s 0.1s
                   all         14        150      0.517      0.572      0.516       0.37

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
     99/100      2.94G     0.8145       1.04     0.9632         89        640: 14% ━╸────────── 1/7 1     99/100      2.94G     0.8568      1.138      1.005         89        640: 43% ━━━━━─────── 3/7 4     99/100      2.94G     0.8736      1.173      1.004         91        640: 57% ━━━━━━╸───── 4/7 5     99/100      2.95G     0.8638      1.168     0.9982         25        640: 86% ━━━━━━━━━━── 6/7 7     99/100      2.95G     0.8638      1.168     0.9982         25        640: 100% ━━━━━━━━━━━━ 7/7      99/100      2.95G     0.8638      1.168     0.9982         25        640: 100% ━━━━━━━━━━━━ 7/7 11.0it/s 0.6s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 11.3it/s 0.1s
                   all         14        150      0.522       0.54      0.516       0.37

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
    100/100      2.95G     0.9162      1.079      1.009         91        640: 14% ━╸────────── 1/7 1    100/100      2.95G     0.8915      1.054      1.004        104        640: 43% ━━━━━─────── 3/7 4    100/100      2.95G     0.8902      1.081       1.01         75        640: 71% ━━━━━━━━╸─── 5/7 6    100/100      2.96G     0.8973      1.084      1.005         41        640: 100% ━━━━━━━━━━━━ 7/7     100/100      2.96G     0.8973      1.084      1.005         41        640: 100% ━━━━━━━━━━━━ 7/7 11.0it/s 0.6s
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 12.3it/s 0.1s
                   all         14        150      0.502      0.547      0.515      0.373

100 epochs completed in 0.026 hours.
Optimizer stripped from /workspace/runs/train/dentescope_jetson2/weights/last.pt, 6.2MB
Optimizer stripped from /workspace/runs/train/dentescope_jetson2/weights/best.pt, 6.2MB

Validating /workspace/runs/train/dentescope_jetson2/weights/best.pt...
Ultralytics 8.3.223 🚀 Python-3.12.3 torch-2.8.0a0+34c6371d24.nv25.08 CUDA:0 (NVIDIA Thor, 125772MiB)
Model summary (fused): 72 layers, 3,005,843 parameters, 0 gradients, 8.1 GFLOPs
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100% ━━━━━━━━━━━━ 1/1 15.4it/s 0.1s
                   all         14        150       0.47      0.733      0.596      0.436
Speed: 0.1ms preprocess, 1.2ms inference, 0.0ms loss, 1.1ms postprocess per image
Results saved to /workspace/runs/train/dentescope_jetson2

========================================
✅ Training completed!
========================================
Best model: /workspace/runs/train/dentescope_jetson/weights/best.pt
Results on host: runs/train/dentescope_jetson/
ajeetraina@ajeetraina:~/1nov/dentescope-ai-complete$

🎉 TRAINING COMPLETED SUCCESSFULLY!

Yes, it was INCREDIBLY fast! Only 1.6 minutes (0.026 hours) for 100 epochs! 🚀

📊 Training Results Summary

Performance Metrics:

✅ Precision:  47.0%  (When model says "tooth", it's correct 47% of the time)
✅ Recall:     73.3%  (Model finds 73.3% of all teeth in images)
✅ mAP@50:     59.6%  (Overall accuracy at 50% confidence threshold)
✅ mAP@50-95:  43.6%  (Strict accuracy across multiple thresholds)

Inference Speed:

  • 1.2ms per image - Super fast! Can process ~833 images/second 🔥

Why So Fast?

  1. Small dataset: 51 training images (vs thousands typically)
  2. YOLOv8n: Nano model - lightweight and optimized
  3. Powerful GPU: Jetson AGX Thor @ 93-95% utilization
  4. Cached images: Images loaded into memory (faster I/O)

🎯 Your Trained Model

Location: runs/train/dentescope_jetson2/weights/best.pt

This model can now detect individual teeth in dental panoramic X-rays!


🧪 Test Your Model

Step 1: View Training Results

# List all generated files
ls -lh runs/train/dentescope_jetson2/

# View training curves
ls runs/train/dentescope_jetson2/*.png

Key files:

  • results.png - Training/validation curves
  • confusion_matrix.png - Classification accuracy
  • val_batch0_pred.jpg - Sample predictions

Step 2: Run Inference on Test Images

sudo docker run --rm -it \
  -v $(pwd)/data:/workspace/data \
  -v $(pwd)/runs:/workspace/runs \
  nvcr.io/nvidia/pytorch:25.08-py3 \
  bash -c "
  pip install -q ultralytics && \
  python3 -c \"
from ultralytics import YOLO

# Load your trained model
model = YOLO('/workspace/runs/train/dentescope_jetson2/weights/best.pt')

# Run inference on test images
results = model.predict(
    source='/workspace/data/test/images',
    save=True,
    conf=0.25,
    project='/workspace/runs/predict',
    name='test_results'
)

print('✅ Predictions saved!')
print('   Results: /workspace/runs/predict/test_results')
\""

Step 3: View Predictions

# Check prediction results
ls runs/predict/test_results/

# Count predictions
echo "Predicted on $(ls runs/predict/test_results/*.jpg 2>/dev/null | wc -l) test images"
```

The predicted images will show **bounding boxes around each detected tooth**! 🦷

---

## 📈 Understanding Your Results

### **Is 59.6% mAP Good?**

For a **first training run** with:
- Only 51 training images
- Auto-generated annotations (SAM)
- 100 epochs
- No hyperparameter tuning

**59.6% mAP@50 is actually pretty good!** 👍

### **How to Improve:**

1. **More training data** (200-500 images would be better)
2. **Manual annotation review** (fix SAM mistakes)
3. **More epochs** (try 200-300)
4. **Larger model** (try YOLOv8s or YOLOv8m)
5. **Hyperparameter tuning** (learning rate, augmentation)

---

## 🎯 What You've Accomplished Today

✅ **Verified GPU access** on Jetson AGX Thor  
✅ **Configured Docker** for GPU acceleration  
✅ **Auto-annotated 73 images** with SAM (860 individual teeth)  
✅ **Split dataset** (51 train / 14 val / 8 test)  
✅ **Trained YOLOv8n model** in just 1.6 minutes  
✅ **Achieved 59.6% mAP** - ready for deployment!  

**Total time from start to trained model: ~1 hour!** 🚀

---

## 📂 Final File Structure
```
runs/train/dentescope_jetson2/
├── weights/
│   ├── best.pt          ← Your trained model! 
│   └── last.pt
├── results.csv          ← Training metrics
├── results.png          ← Training curves
├── confusion_matrix.png
├── F1_curve.png
├── P_curve.png
├── R_curve.png
└── PR_curve.png

Problem: Docker container ran out of shared memory! The default is only 64MB, but PyTorch needs much more for DataLoader workers.

🔧 Fix: Restart Docker with More Shared Memory

Exit the container and restart with increased shared memory:

# Exit current container
exit

# Restart with 8GB shared memory
sudo docker run -it --rm \
  --runtime nvidia \
  --gpus all \
  --shm-size=8g \
  -v $(pwd):/workspace \
  -w /workspace \
  nvcr.io/nvidia/pytorch:25.08-py3 \
  bash

Step 7.4: After Training Completes

ls -lh runs/train/dentescope_jetson/weights/
```

**What this does:** Lists saved model weights.

**Expected output:**
```
-rw-r--r-- 1 user user 6.2M Nov  1 22:00 best.pt
-rw-r--r-- 1 user user 6.2M Nov  1 22:00 last.pt

Files explained:

  • best.pt: Model with best validation performance
  • last.pt: Model from last training epoch

Why: These are your trained models ready for inference.


Step 7.5: View Training Results

ls runs/train/dentescope_jetson/
```

**Expected files:**
```
weights/
results.csv
results.png
confusion_matrix.png
F1_curve.png
P_curve.png
R_curve.png
PR_curve.png
val_batch0_pred.jpg

Key files:

  • results.csv: Numerical training metrics
  • results.png: Training curves (loss, mAP, etc.)
  • confusion_matrix.png: Classification accuracy
  • *_curve.png: Various performance metrics
  • val_batch0_pred.jpg: Sample predictions on validation set

Why: These visualizations help understand model performance.


Step 7.6: Test Inference

sudo docker run --rm -it \
  -v $(pwd)/data:/workspace/data \
  -v $(pwd)/runs:/workspace/runs \
  nvcr.io/nvidia/pytorch:25.08-py3 \
  bash -c "
  pip install -q ultralytics && \
  python3 -c \"
from ultralytics import YOLO

# Load trained model
model = YOLO('/workspace/runs/train/dentescope_jetson/weights/best.pt')

# Run inference on test images
results = model.predict(
    source='/workspace/data/test/images',
    save=True,
    conf=0.25,
    project='/workspace/runs/predict',
    name='test_results'
)

print('✅ Predictions saved to: /workspace/runs/predict/test_results')
\""
```

**What this does:** Runs inference on test images using trained model.

**Parameters:**
- `conf=0.25`: Confidence threshold (only show detections >25% confidence)
- `save=True`: Save annotated images

**Expected output:**
```
✅ Predictions saved to: /workspace/runs/predict/test_results

Why: Validates that trained model can detect teeth in new, unseen images.


Step 7.7: View Predictions

ls runs/predict/test_results/
```

**What this does:** Lists prediction results.

**Expected:** Annotated images showing detected teeth with bounding boxes.

**Why:** Visual confirmation that model works correctly.

---

## 📊 Performance Comparison

| Task | CPU Time | GPU Time | Speedup |
|------|----------|----------|---------|
| **SAM Annotation (73 images)** | 2-6 hours | 30-60 minutes | **5-10x** |
| **YOLOv8n Training (100 epochs)** | 4-6 hours | 15-25 minutes | **14-16x** |

---

## 🎯 Final Results

After completing all steps, you have:

1. ✅ **Annotated Dataset**: 73 dental X-rays with individual tooth bounding boxes
2. ✅ **Split Dataset**: 51 train / 14 val / 8 test images
3. ✅ **Trained Model**: YOLOv8n model (`best.pt`) trained on GPU
4. ✅ **Performance Metrics**: Training curves, confusion matrix, mAP scores
5. ✅ **Test Predictions**: Model tested on unseen images

---

## 🔑 Key Files Summary
```
dentescope-ai-complete/
├── data/
│   ├── raw/
│   │   ├── *.jpg                          # Original images
│   │   └── labels/*.txt                   # SAM-generated annotations
│   ├── train/
│   │   ├── images/*.jpg                   # Training images
│   │   └── labels/*.txt                   # Training labels
│   ├── val/
│   │   ├── images/*.jpg                   # Validation images
│   │   └── labels/*.txt                   # Validation labels
│   ├── test/
│   │   ├── images/*.jpg                   # Test images
│   │   └── labels/*.txt                   # Test labels
│   └── data.yaml                          # YOLOv8 config
├── runs/
│   ├── train/dentescope_jetson/
│   │   ├── weights/best.pt                # Trained model
│   │   └── *.png                          # Training visualizations
│   └── predict/test_results/              # Inference results
├── sam_vit_b_01ec64.pth                   # SAM model weights
├── docker_sam_annotate.sh                 # GPU annotation script
├── train_docker.sh                        # GPU training script
└── prepare_dataset.py                     # Dataset splitting script

💡 Next Steps

  1. Improve annotations: Manually review and correct SAM annotations
  2. More training data: Collect more dental X-rays
  3. Fine-tuning: Adjust hyperparameters for better performance
  4. Deploy model: Integrate into web application
  5. Multi-class detection: Train to detect different tooth types (molars, incisors, etc.)

The complete project is open source and available at: https://github.com/ajeetraina/dentescope-ai-complete

Try the live demo: https://huggingface.co/spaces/ajeetsraina/dentescope-ai

Congratulations! 🎉 You've successfully set up GPU-accelerated dental AI training on Jetson AGX Thor, from raw X-rays to a trained YOLOv8 model! 🦷🚀