Kubernetes Annoyances for DevOps: A Deep Dive into Real-World Pain Points
If you’ve ever spent hours debugging a misconfigured probe or chasing down a rogue config drift, you’re not alone. This post is a no-fluff breakdown of the top Kubernetes headaches every DevOps engineer encounters, paired with proven fixes, pro tips, and real-world examples to save your sanity.

Kubernetes has revolutionized container orchestration, but let's be honest—it's not all smooth sailing. After years of wrestling with K8s in production environments, every DevOps engineer has a collection of war stories about seemingly simple tasks that turned into multi-hour debugging sessions. This post explores the most common Kubernetes annoyances that keep DevOps teams up at night, along with practical solutions and workarounds.
1. The YAML Verbosity Nightmare
The Problem: Kubernetes YAML manifests are notoriously verbose. A simple application deployment can require hundreds of lines of YAML across multiple files, making them error-prone and difficult to maintain.
Example of the Pain:
apiVersion: apps/v1
kind: Deployment
metadata:
name: simple-app
namespace: production
labels:
app: simple-app
version: v1.0.0
environment: production
component: backend
spec:
replicas: 3
selector:
matchLabels:
app: simple-app
template:
metadata:
labels:
app: simple-app
version: v1.0.0
environment: production
component: backend
spec:
containers:
- name: app
image: myregistry/simple-app:v1.0.0
ports:
- containerPort: 8080
name: http
env:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: app-secrets
key: database-url
- name: REDIS_URL
valueFrom:
configMapKeyRef:
name: app-config
key: redis-url
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
name: simple-app-service
namespace: production
spec:
selector:
app: simple-app
ports:
- port: 80
targetPort: 8080
name: http
type: ClusterIP
This seemingly "simple" application deployment requires 80+ lines of YAML just to run a basic web service. Notice the massive amount of repetition—labels are duplicated across metadata sections, and configuration references are scattered throughout. The verbosity makes it error-prone; a single mismatched label in the selector will break the deployment entirely.
The real pain comes when you need to maintain this across multiple environments. Each environment requires its own copy with slight variations, leading to configuration drift and deployment inconsistencies. Small changes like updating the image tag require careful editing across multiple sections, and forgetting to update the version label means your monitoring and rollback strategies break silently.
Solution: Use templating tools like Helm or Kustomize to reduce repetition:
# values.yaml for Helm
app:
name: simple-app
namespace: production
image:
repository: myregistry/simple-app
tag: v1.0.0
replicas: 3
labels:
version: v1.0.0
environment: production
component: backend
ports:
- name: http
containerPort: 8080
servicePort: 80
env:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: app-secrets
key: database-url
- name: REDIS_URL
valueFrom:
configMapKeyRef:
name: app-config
key: redis-url
resources:
requests:
memory: 256Mi
cpu: 250m
limits:
memory: 512Mi
cpu: 500m
probes:
liveness:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
readiness:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
service:
type: ClusterIP
ports:
- port: 80
targetPort: 8080
name: http
This Helm values file demonstrates the power of templating—the same application configuration that took 80+ lines of verbose YAML can now be expressed in just 35-40 lines of meaningful configuration. The template engine handles all the repetitive boilerplate, label consistency, and cross-references automatically.
The beauty of this approach is environment-specific overrides become trivial. You can have a base values.yaml and then create values-production.yaml, values-staging.yaml files that only specify the differences. This eliminates configuration drift and makes promoting applications between environments much safer and more predictable.
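As a minimal sketch (the file name, release name, and chart path are illustrative), a production override file only carries the values that differ from the base, and later -f files win over earlier ones:

# values-production.yaml (hypothetical override file)
app:
  replicas: 5              # scale up for production traffic
  image:
    tag: v1.0.1            # promote a newer build
  labels:
    environment: production

# Rendered/applied with something like:
#   helm upgrade --install simple-app ./chart -f values.yaml -f values-production.yaml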
Kustomize takes a different approach than Helm - instead of templating, it uses overlays and patches to reduce repetition. Here's how the same example would look:
Base Configuration
base/deployment.yaml (simplified base):
apiVersion: apps/v1
kind: Deployment
metadata:
name: simple-app
labels:
app: simple-app
spec:
replicas: 3
selector:
matchLabels:
app: simple-app
template:
metadata:
labels:
app: simple-app
spec:
containers:
- name: app
image: myregistry/simple-app:latest
ports:
- containerPort: 8080
name: http
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
base/service.yaml:
apiVersion: v1
kind: Service
metadata:
name: simple-app-service
spec:
selector:
app: simple-app
ports:
- port: 80
targetPort: 8080
name: http
type: ClusterIP
base/kustomization.yaml:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- deployment.yaml
- service.yaml
commonLabels:
app: simple-app
images:
- name: myregistry/simple-app
newTag: v1.0.0
Environment Overlays
overlays/production/kustomization.yaml:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: production
resources:
- ../../base
commonLabels:
environment: production
component: backend
version: v1.0.0
patches:
- target:
kind: Deployment
name: simple-app
patch: |-
- op: add
path: /spec/template/spec/containers/0/env
value:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: app-secrets
key: database-url
- name: REDIS_URL
valueFrom:
configMapKeyRef:
name: app-config
key: redis-url
- op: replace
path: /spec/replicas
value: 5
images:
- name: myregistry/simple-app
newTag: v1.0.0-abc123
overlays/staging/kustomization.yaml:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: staging
resources:
- ../../base
commonLabels:
environment: staging
component: backend
version: v1.0.0
patches:
- target:
kind: Deployment
name: simple-app
patch: |-
- op: add
path: /spec/template/spec/containers/0/env
value:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: staging-app-secrets
key: database-url
- name: REDIS_URL
valueFrom:
configMapKeyRef:
name: staging-app-config
key: redis-url
- op: replace
path: /spec/replicas
value: 2
images:
- name: myregistry/simple-app
newTag: v1.0.0-staging-def456
Key Differences: Kustomize vs Helm
| Aspect | Kustomize | Helm |
| --- | --- | --- |
| Approach | Overlays & patches | Templating |
| Base files | Valid YAML | Template files |
| Complexity | Simpler, more declarative | More powerful, more complex |
| Environment differences | Patches/overlays | Different values files |
| Learning curve | Gentler | Steeper |
Usage
# Build for production
kustomize build overlays/production
# Apply directly
kubectl apply -k overlays/production
# Build for staging
kustomize build overlays/staging
Advantages of Kustomize
- No templating language - uses standard YAML
- Base files are always valid - can be applied directly
- Simpler mental model - patches are easier to understand than templates
- Built into kubectl - no additional tools needed
- Better for smaller variations between environments
When to Choose What
- Use Kustomize when: You have mostly similar configurations with small environment-specific differences
- Use Helm when: You need complex templating, package management, or significantly different configurations per environment
Kustomize would actually be more verbose than the Helm values.yaml (since you still need the base YAML files), but it's more transparent and easier to debug since everything remains valid Kubernetes YAML.
2. Resource Limits: The Guessing Game
The Problem: Setting appropriate CPU and memory limits feels like throwing darts blindfolded. Set them too low, and your pods get OOMKilled or throttled into oblivion. Set them too high, and you're burning money on wasted cluster resources. Most teams resort to cargo-cult configurations copied from tutorials, leading to production surprises.
The Pain in Action
# This looks reasonable, right? Think again!
resources:
requests:
memory: "128Mi"
cpu: "100m"
limits:
memory: "256Mi"
cpu: "200m"
This resource configuration is the kind of "safe" conservative guess that seems reasonable in isolation but becomes a disaster in production. The 128Mi memory request might work for a trivial demo app, but any real application—especially Java or Node.js workloads—will immediately hit the 256Mi limit and get terminated. The 200m CPU limit (0.2 cores) will cause severe throttling the moment your application receives any meaningful traffic, because the kernel caps the container at that slice no matter how idle the node is.
Here's what actually happens when this "conservative" configuration meets reality:
# Your pod gets OOMKilled during the first real request
$ kubectl describe pod my-app-7d4c8b5f6-xyz12
Containers:
  app:
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
    Restart Count:  3
Events:
  Type     Reason   Age   From     Message
  ----     ------   ---   ----     -------
  Warning  BackOff  2m    kubelet  Back-off restarting failed container
# CPU throttling causes response time spikes (but pod stays alive)
$ kubectl top pods
NAME CPU(cores) MEMORY(bytes)
my-app-7d4c8b5f6-abc34 200m 180Mi # Pinned at the 200m CPU limit, requests timing out
The insidious part about CPU limits is that they don't kill your pod—they just make it painfully slow. Your application appears "healthy" to basic monitoring, but response times spike from 50ms to 2000ms because the kernel is throttling CPU cycles. Users experience timeouts while your monitoring shows the pod as "running."
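One way to make that silent throttling visible is an alert on cAdvisor's CFS throttling counters; a sketch in the same Prometheus rule style used later in this post (the 25% threshold is an arbitrary starting point, not a recommendation):

# Alert when a container spends a large share of its CPU periods throttled
- alert: HighCPUThrottling
  expr: |
    sum by (namespace, pod) (rate(container_cpu_cfs_throttled_periods_total{container!=""}[5m]))
      /
    sum by (namespace, pod) (rate(container_cpu_cfs_periods_total{container!=""}[5m])) > 0.25
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Pod {{ $labels.pod }} is heavily CPU throttled"
    description: "More than 25% of CPU periods were throttled over the last 10 minutes"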
Understanding the Quality of Service Impact
When you set resource requests and limits, Kubernetes assigns a Quality of Service (QoS) class that determines how your pod behaves under resource pressure:
# Creates "Burstable" QoS - good for most applications
resources:
requests: # Guaranteed baseline - used for scheduling
memory: "512Mi"
cpu: "250m"
limits: # Maximum allowed - prevents resource hogging
memory: "1Gi"
cpu: "1000m"
# vs. "Guaranteed" QoS - requests = limits (very restrictive)
resources:
requests:
memory: "512Mi"
cpu: "250m"
limits:
memory: "512Mi" # Same as requests
cpu: "250m" # Same as requests - will throttle constantly
The "Guaranteed" QoS class might seem safer, but it actually creates more problems. When requests equal limits, your application can never burst above baseline capacity, causing artificial performance bottlenecks during normal traffic variations. The "Burstable" QoS gives you a guaranteed foundation with headroom for real-world usage patterns.
A Data-Driven Approach
Instead of guessing, start with monitoring and iterate based on actual behavior:
# Science-based resource allocation
resources:
requests:
memory: "512Mi" # Based on 95th percentile + 50% buffer
cpu: "250m" # Steady-state usage from monitoring
limits:
memory: "1536Mi" # 3x requests for traffic spikes + GC headroom
cpu: "1500m" # Allow bursting to 1.5 cores for peak loads
This configuration reflects a monitoring-driven approach where each number has a story:
- Memory request (512Mi): Derived from observing actual memory usage patterns over at least a week, taking the 95th percentile and adding 50% buffer for growth
- Memory limit (1536Mi): 3x the request provides substantial headroom for garbage collection cycles and traffic spikes while preventing runaway processes
- CPU request (250m): Based on steady-state CPU usage during normal operation, ensuring consistent performance
- CPU limit (1500m): Allows bursting to handle traffic spikes, background tasks, and initialization overhead
Essential Monitoring for Resource Right-Sizing
Deploy these monitoring tools before you deploy your application:
# Vertical Pod Autoscaler for intelligent recommendations
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: my-app-vpa
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: my-app
updatePolicy:
updateMode: "Off" # Recommend only, don't auto-update in production
resourcePolicy:
containerPolicies:
- containerName: app
minAllowed:
cpu: 100m
memory: 256Mi
maxAllowed:
cpu: 2000m
memory: 4Gi
controlledResources: ["cpu", "memory"]
---
# Prometheus monitoring for resource usage patterns
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: resource-usage-monitor
spec:
selector:
matchLabels:
app: my-app
endpoints:
- port: metrics
interval: 30s
path: /metrics
The VPA is your secret weapon for eliminating guesswork. Set it to "Off" mode to get recommendations without the risk of automatic pod restarts in production. Run this for at least a week to capture different traffic patterns—weekday peaks, weekend lulls, batch processing windows, and any monthly/quarterly cycles specific to your application.
Monitor these key Prometheus metrics to understand your resource patterns:
# Memory usage 95th percentile over the last week
quantile_over_time(0.95, container_memory_working_set_bytes{pod=~"my-app-.*"}[7d])
# CPU usage patterns to identify steady state vs. spikes
rate(container_cpu_usage_seconds_total{pod=~"my-app-.*"}[5m])
# Memory growth rate to predict future needs
deriv(container_memory_working_set_bytes{pod=~"my-app-.*"}[1h])
Real-World Resource Sizing Rules
Based on application type, here are starting points that work better than random guessing:
Java Applications:
resources:
requests:
memory: "1Gi" # JVM heap + non-heap overhead
cpu: "500m" # Account for JIT compilation
limits:
memory: "2Gi" # GC headroom + safety buffer
cpu: "2000m" # Allow JIT and GC bursting
Node.js Applications:
resources:
requests:
memory: "512Mi" # V8 heap + application state
cpu: "250m" # Single-threaded baseline
limits:
memory: "1Gi" # Event loop and buffer growth
cpu: "1000m" # I/O and async task bursting
Go/Rust Applications:
resources:
requests:
memory: "256Mi" # Compiled binaries are efficient
cpu: "100m" # Low overhead baseline
limits:
memory: "512Mi" # Conservative limit for safety
cpu: "500m" # Allow for concurrent operations
The Hidden Cost of Getting It Wrong
Resource misconfigurations don't just affect individual applications—they cascade through your entire cluster:
Under-allocation consequences:
- OOMKills during traffic spikes create service degradation
- CPU throttling causes response time variability and timeouts
- Cascading failures as healthy pods can't handle redirected traffic from failed pods
- False alerts and monitoring noise from resource-related failures
Over-allocation consequences:
- Cluster resource waste leading to unnecessary infrastructure costs
- Reduced pod density requiring more nodes than necessary
- Poor bin-packing efficiency in the scheduler
- Higher blast radius during node failures due to fewer pods per node
Pro Tips for Production Success
1. Start Conservative, Then Optimize: Begin with generous limits based on application type, then use monitoring data to optimize downward. It's easier to reduce limits than to debug OOMKilled pods during a production incident.
2. Set CPU Limits Carefully: Unlike memory limits (which get the container OOMKilled when exceeded), CPU limits only throttle performance. Consider setting high CPU limits or no CPU limits at all if you trust your applications and have good monitoring, as in the sketch below.
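A hedged middle ground keeps the memory limit (memory is incompressible) but lets CPU burst; the numbers here are placeholders, not recommendations:

resources:
  requests:
    memory: "512Mi"
    cpu: "250m"        # drives scheduling and CPU weight
  limits:
    memory: "1Gi"      # hard cap; exceeding it means an OOMKill
    # no cpu limit: the container may burst into otherwise idle CPU on the node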
3. Use Resource Quotas for Safety:
apiVersion: v1
kind: ResourceQuota
metadata:
name: compute-quota
namespace: production
spec:
hard:
requests.cpu: "50"
requests.memory: 100Gi
limits.cpu: "100"
limits.memory: 200Gi
4. Monitor Resource Efficiency: Track the ratio of actual usage to requested resources. If your applications consistently use less than 50% of requested resources, you're over-allocating and wasting money.
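A rough way to track that ratio, assuming kube-state-metrics and cAdvisor metrics are scraped (metric names vary slightly between kube-state-metrics versions), is a recording rule like this sketch:

# CPU actually used vs. CPU requested, per namespace
- record: namespace:cpu_request_utilization:ratio
  expr: |
    sum by (namespace) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))
      /
    sum by (namespace) (kube_pod_container_resource_requests{resource="cpu"})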
The key insight is that resource limits aren't a one-time configuration—they're part of an ongoing optimization cycle. Start with monitoring, make data-driven decisions, and continuously refine based on actual usage patterns. Your future self (and your infrastructure budget) will thank you.
3. ConfigMap and Secret Management Hell
The Problem: Configuration management in Kubernetes starts simple but quickly becomes a maintenance nightmare. What begins as a few environment-specific ConfigMaps evolves into dozens of scattered configuration files with duplicated values, inconsistent formatting, and no clear source of truth. Add secrets into the mix, and you're juggling sensitive data across multiple environments with no automated rotation or centralized management.
The Mess You Inevitably Create
Here's how most teams start—and why it doesn't scale:
# production-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config-prod
namespace: production
data:
database.host: "prod-db.company.com"
database.port: "5432"
database.name: "myapp_production"
redis.host: "prod-redis.company.com"
redis.port: "6379"
api.timeout: "30s"
api.retries: "3"
log.level: "info"
feature.new_ui: "true"
feature.beta_analytics: "false"
---
# production-secrets.yaml
apiVersion: v1
kind: Secret
metadata:
name: app-secrets-prod
namespace: production
type: Opaque
data:
database.password: cHJvZC1wYXNzd29yZA== # base64 encoded
redis.password: cmVkaXMtcHJvZC1wYXNzd29yZA==
api.key: YWJjZGVmZ2hpams=
jwt.secret: c3VwZXItc2VjcmV0LWp3dC1rZXk=
---
# staging-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config-staging
namespace: staging
data:
database.host: "staging-db.company.com"
database.port: "5432" # Duplicated!
database.name: "myapp_staging"
redis.host: "staging-redis.company.com"
redis.port: "6379" # Duplicated!
api.timeout: "30s" # Duplicated!
api.retries: "3" # Duplicated!
log.level: "debug" # Different from prod
feature.new_ui: "true" # Duplicated!
feature.beta_analytics: "true" # Different from prod
---
# And 8 more environments with 90% duplicated configuration...
This approach seems reasonable at first—separate configs per environment provide clear isolation. But notice the massive duplication: database.port, redis.port, api.timeout, and api.retries are identical across environments. When you need to change the API timeout globally, you'll need to update it in every single environment file, inevitably missing one and creating mysterious production issues.
The real horror emerges during incident response. When production behaves differently than staging, you'll spend precious time hunting through multiple files trying to spot the configuration differences. The secrets are particularly problematic—those base64 encoded values give no hint about their actual content, rotation dates, or which systems depend on them.
The Secret Sprawl Problem
As your application grows, secret management becomes exponentially more complex:
# What you end up managing manually
$ kubectl get secrets -A | grep app
production app-secrets-prod Opaque 4 23d
staging app-secrets-staging Opaque 4 15d
development app-secrets-dev Opaque 4 45d # Outdated!
qa app-secrets-qa Opaque 3 8d # Missing one secret!
demo app-secrets-demo Opaque 4 67d # Ancient passwords!
# No way to tell which secrets are current or which need rotation
$ kubectl get secret app-secrets-prod -o yaml
# Shows base64 gibberish with no metadata about source or age
Each environment requires manual secret creation and updates. When the database password changes, you'll need to manually update 5+ Kubernetes secrets, inevitably forgetting one environment. There's no audit trail, no automated rotation, and no way to verify that secrets are current across all environments.
Configuration Drift: The Silent Killer
Configuration drift happens gradually and invisibly:
# Week 1: Emergency hotfix applied only to production
data:
api.timeout: "60s" # Increased for Black Friday traffic
# Week 3: New feature flag added only to staging for testing
data:
feature.advanced_search: "true"
# Week 5: Security patch requires new API key format
data:
api.key: "new-format-key-prod-only"
# Result: No two environments have the same configuration
# Bugs appear in production that were never seen in testing
This drift is insidious because each change seems reasonable in isolation, but collectively they make your environments incompatible. Features work in staging but fail in production. Security policies differ between environments. Performance characteristics become unpredictable because timeout and retry configurations have diverged.
Solution 1: External Secrets Operator (Recommended)
The External Secrets Operator transforms configuration management from a manual process into an automated, centralized system:
# First, set up the secret store connection
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
name: vault-backend
namespace: production
spec:
provider:
vault:
server: "https://vault.company.com"
path: "secret"
version: "v2"
auth:
kubernetes:
mountPath: "kubernetes"
role: "myapp-production"
caBundle: |
-----BEGIN CERTIFICATE-----
MIIDXTCCAkWgAwIBAgIJAKoK/heBjcOuMA0GCSqGSIb3DQEBBQUAMEUxCzAJBgNV
... (your CA certificate)
-----END CERTIFICATE-----
---
# Define how to fetch and sync secrets
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: app-secrets
namespace: production
spec:
refreshInterval: 1h # Automatically sync every hour
secretStoreRef:
name: vault-backend
kind: SecretStore
target:
name: app-secrets
creationPolicy: Owner
template:
type: Opaque
metadata:
labels:
app: myapp
managed-by: external-secrets
data:
# Transform vault data into application format
DATABASE_URL: "postgresql://{{ .database_user }}:{{ .database_password }}@{{ .database_host }}:5432/myapp_production"
REDIS_URL: "redis://:{{ .redis_password }}@{{ .redis_host }}:6379"
API_KEY: "{{ .api_key }}"
JWT_SECRET: "{{ .jwt_secret }}"
data:
# Map vault paths to secret keys
- secretKey: database_user
remoteRef:
key: myapp/production/database
property: username
- secretKey: database_password
remoteRef:
key: myapp/production/database
property: password
- secretKey: database_host
remoteRef:
key: myapp/production/database
property: host
- secretKey: redis_password
remoteRef:
key: myapp/production/redis
property: password
- secretKey: redis_host
remoteRef:
key: myapp/production/redis
property: host
- secretKey: api_key
remoteRef:
key: myapp/production/api
property: key
- secretKey: jwt_secret
remoteRef:
key: myapp/production/jwt
property: secret
This configuration eliminates secret sprawl entirely. Your secrets live in a centralized vault with proper access controls, audit logging, and rotation policies. The External Secrets Operator automatically syncs changes to your Kubernetes clusters, ensuring all environments stay current. The template feature transforms raw vault data into application-ready environment variables.
Multi-Environment Setup
# staging-external-secret.yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: app-secrets
namespace: staging
spec:
refreshInterval: 1h
secretStoreRef:
name: vault-backend
kind: SecretStore
target:
name: app-secrets
creationPolicy: Owner
template:
type: Opaque
data:
DATABASE_URL: "postgresql://{{ .database_user }}:{{ .database_password }}@{{ .database_host }}:5432/myapp_staging"
REDIS_URL: "redis://:{{ .redis_password }}@{{ .redis_host }}:6379"
API_KEY: "{{ .api_key }}"
JWT_SECRET: "{{ .jwt_secret }}"
data:
# Same structure, different vault paths
- secretKey: database_user
remoteRef:
key: myapp/staging/database # Environment-specific path
property: username
- secretKey: database_password
remoteRef:
key: myapp/staging/database
property: password
# ... rest of mappings use staging paths
Solution 2: Kustomize for Configuration Management
For non-sensitive configuration, Kustomize provides excellent deduplication:
# base/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config
data:
# Common configuration that doesn't change between environments
database.port: "5432"
redis.port: "6379"
api.timeout: "30s"
api.retries: "3"
cache.ttl: "300s"
batch.size: "100"
---
# base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- configmap.yaml
# Default values that can be overridden
configMapGenerator:
- name: app-config-env
literals:
- log.level=info
- feature.new_ui=false
- feature.beta_analytics=false
---
# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: production
resources:
- ../../base
# Override only what's different in production
configMapGenerator:
- name: app-config-env
behavior: replace
literals:
- database.host=prod-db.company.com
- database.name=myapp_production
- redis.host=prod-redis.company.com
- log.level=info
- feature.new_ui=true
- feature.beta_analytics=false
---
# overlays/staging/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: staging
resources:
- ../../base
configMapGenerator:
- name: app-config-env
behavior: replace
literals:
- database.host=staging-db.company.com
- database.name=myapp_staging
- redis.host=staging-redis.company.com
- log.level=debug
- feature.new_ui=true
- feature.beta_analytics=true # Test new features in staging
This approach maintains the common configuration in a single base file while allowing environment-specific overrides. When you need to change the global API timeout, you update it once in the base configuration, and all environments inherit the change.
Solution 3: Helm for Complex Configuration Templates
For applications with complex configuration relationships, Helm provides powerful templating:
# values.yaml (common defaults)
app:
name: myapp
database:
port: 5432
poolSize: 10
timeout: 30s
redis:
port: 6379
maxConnections: 100
api:
timeout: 30s
retries: 3
rateLimit: 1000
features:
newUI: false
betaAnalytics: false
advancedSearch: false
logging:
level: info
format: json
---
# values-production.yaml (production overrides)
database:
host: prod-db.company.com
name: myapp_production
poolSize: 20 # Higher pool for production traffic
redis:
host: prod-redis.company.com
maxConnections: 200
features:
newUI: true
logging:
level: warn # Reduce log noise in production
---
# templates/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: {{ include "myapp.fullname" . }}-config
labels:
{{- include "myapp.labels" . | nindent 4 }}
data:
database.host: {{ .Values.database.host | quote }}
database.port: {{ .Values.database.port | quote }}
database.name: {{ .Values.database.name | quote }}
database.pool-size: {{ .Values.database.poolSize | quote }}
database.timeout: {{ .Values.database.timeout | quote }}
redis.host: {{ .Values.redis.host | quote }}
redis.port: {{ .Values.redis.port | quote }}
redis.max-connections: {{ .Values.redis.maxConnections | quote }}
api.timeout: {{ .Values.api.timeout | quote }}
api.retries: {{ .Values.api.retries | quote }}
api.rate-limit: {{ .Values.api.rateLimit | quote }}
{{- range $key, $value := .Values.features }}
feature.{{ $key | kebabcase }}: {{ $value | quote }}
{{- end }}
log.level: {{ .Values.logging.level | quote }}
log.format: {{ .Values.logging.format | quote }}
Best Practices for Production Configuration Management
1. Separate Concerns:
# Different ConfigMaps for different purposes
apiVersion: v1
kind: ConfigMap
metadata:
name: app-database-config
data:
host: "prod-db.company.com"
port: "5432"
pool-size: "20"
---
apiVersion: v1
kind: ConfigMap
metadata:
name: app-feature-flags
data:
new-ui: "true"
beta-analytics: "false"
advanced-search: "true"
---
apiVersion: v1
kind: ConfigMap
metadata:
name: app-logging-config
data:
level: "info"
format: "json"
retention: "30d"
2. Use Immutable ConfigMaps for Safety:
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config-v1-2-3 # Version in the name
immutable: true # Prevents accidental changes
data:
version: "1.2.3"
config.yaml: |
database:
timeout: 30s
api:
retries: 3
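Because an immutable ConfigMap can never be edited in place, rolling out a change means creating a new versioned ConfigMap and updating the reference in the workload; a minimal sketch of the consuming side (the Deployment fragment and names are illustrative):

# Deployment fragment referencing the versioned, immutable ConfigMap
spec:
  template:
    spec:
      containers:
      - name: app
        envFrom:
        - configMapRef:
            name: app-config-v1-2-3   # bump this name to roll out new configuration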
3. Implement Configuration Validation:
# Init container to validate configuration
spec:
initContainers:
- name: config-validator
image: myapp:latest
command: ["/app/validate-config"]
env:
- name: CONFIG_FILE
value: "/config/app.yaml"
volumeMounts:
- name: config
mountPath: /config
containers:
- name: app
# ... main container config
4. Monitor Configuration Drift:
# Use labels to track configuration versions
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config
labels:
config-version: "v1.2.3"
config-hash: "abc123def456"
last-updated: "2025-08-12"
updated-by: "deploy-pipeline"
data:
# ... configuration data
Debugging Configuration Issues
Essential kubectl commands for config troubleshooting:
# Compare configurations across environments
kubectl get configmap app-config -n production -o yaml > prod-config.yaml
kubectl get configmap app-config -n staging -o yaml > staging-config.yaml
diff prod-config.yaml staging-config.yaml
# Check secret status and sync health
kubectl get externalsecrets -A
kubectl describe externalsecret app-secrets -n production
# Validate pod environment variables
kubectl exec -it my-app-pod -- env | grep DATABASE
kubectl exec -it my-app-pod -- cat /etc/config/app.yaml
# Check configuration mount points
kubectl describe pod my-app-pod | grep -A 10 "Mounts:"
Monitor configuration with Prometheus:
# Alert on configuration sync failures
- alert: ExternalSecretSyncFailure
expr: external_secrets_sync_calls_error > 0
for: 5m
labels:
severity: warning
annotations:
summary: "External secret sync failed"
description: "Secret {{ $labels.name }} in namespace {{ $labels.namespace }} failed to sync"
# Alert on configuration drift
- alert: ConfigurationDrift
expr: |
count by (configmap) (
kube_configmap_info{configmap=~"app-config-.*"}
) > 1
for: 0m
labels:
severity: critical
annotations:
summary: "Configuration drift detected"
description: "Multiple versions of configuration detected across environments"
The key to successful configuration management in Kubernetes is treating it as a software engineering problem, not an operational afterthought. Use version control, automated validation, centralized secret management, and monitoring to transform configuration from a source of production surprises into a reliable, auditable system.
4. Networking: The Black Box of Pain
The Problem: Kubernetes networking is where simple concepts collide with complex reality. What should be straightforward—"make this service talk to that service"—becomes a maze of DNS resolution, iptables rules, CNI plugins, service meshes, and network policies. When networking breaks, debugging feels like performing surgery blindfolded while the entire application stack is on fire.
The Networking Stack from Hell
Kubernetes networking involves multiple layers that can each fail independently:
# The journey of a simple HTTP request
Pod A → CNI Plugin → Node iptables → kube-proxy → Service → Endpoints →
Target Pod → CNI Plugin → Node iptables → Target Container
# Each hop can fail with cryptic errors:
# DNS: "name or service not known"
# iptables: "connection refused"
# CNI: "network unreachable"
# Service: "no endpoints available"
The complexity multiplies when you add service meshes, ingress controllers, network policies, and multiple availability zones. A single misconfigured network policy can silently block traffic, while a CNI plugin issue can make pods unreachable despite appearing healthy.
Common Networking Nightmares
1. The "Connection Refused" Mystery
# This looks like it should work, but...
$ kubectl exec -it pod-a -- curl service-b.namespace.svc.cluster.local
curl: (7) Failed to connect to service-b.namespace.svc.cluster.local port 80: Connection refused
# The service exists and looks correct
$ kubectl get svc service-b -n namespace
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service-b ClusterIP 10.96.15.234 <none> 80/TCP 5m
# But the endpoints are empty!
$ kubectl get endpoints service-b -n namespace
NAME ENDPOINTS AGE
service-b <none> 5m
This is the classic "service exists but has no endpoints" problem. The service is correctly configured, DNS resolves properly, but there are no pods matching the service selector. This often happens when:
- Pod labels don't exactly match service selectors
- Pods are stuck in pending/failed state
- Readiness probes are failing, preventing endpoint registration
- Namespace isolation prevents pod discovery
2. DNS Resolution Hell
# DNS works for some services but not others
$ kubectl exec -it pod-a -- nslookup kubernetes.default
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
$ kubectl exec -it pod-a -- nslookup service-b.namespace.svc.cluster.local
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
nslookup: can't resolve 'service-b.namespace.svc.cluster.local': Name or service not known
DNS issues are particularly frustrating because they're intermittent and environment-specific. Common causes include:
- CoreDNS configuration issues
- Service not properly registered with DNS
- Network policies blocking DNS traffic
- DNS caching issues in applications
- Incorrect search domain configuration
3. Cross-Namespace Communication Failures
# This works within the same namespace
$ kubectl exec -it pod-a -n app-ns -- curl service-b
HTTP/1.1 200 OK
# But fails across namespaces
$ kubectl exec -it pod-a -n app-ns -- curl service-b.other-ns.svc.cluster.local
curl: (7) Failed to connect to service-b.other-ns.svc.cluster.local port 80: Connection timed out
This is often caused by network policies that default-deny cross-namespace traffic, but the error messages give no indication of policy violations.
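If a default-deny policy is the culprit, the fix is an explicit allow rule in the destination namespace; a sketch, assuming the pods in other-ns carry app: service-b and the source namespace is labeled name: app-ns:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-app-ns
  namespace: other-ns
spec:
  podSelector:
    matchLabels:
      app: service-b
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: app-ns        # the namespace must actually carry this label
    ports:
    - protocol: TCP
      port: 80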
The Ultimate Networking Debug Toolkit
Deploy this comprehensive debug environment to diagnose networking issues:
# Advanced network debugging pod with all the tools
apiVersion: v1
kind: Pod
metadata:
name: network-debug-swiss-army-knife
labels:
app: network-debug
spec:
containers:
- name: debug
image: nicolaka/netshoot:latest
command: ["sleep", "infinity"]
securityContext:
capabilities:
add: ["NET_ADMIN", "NET_RAW"]
privileged: true # For advanced debugging only
env:
- name: PS1
value: "netdebug:\\w# "
resources:
requests:
memory: "256Mi"
cpu: "100m"
limits:
memory: "512Mi"
cpu: "500m"
hostNetwork: false # Set to true to debug host networking
dnsPolicy: ClusterFirst
---
# Debug service to test service discovery
apiVersion: v1
kind: Service
metadata:
name: network-debug-service
spec:
selector:
app: network-debug
ports:
- port: 80
targetPort: 8080
name: http
type: ClusterIP
---
# NetworkPolicy to test policy restrictions (optional)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: debug-policy
spec:
podSelector:
matchLabels:
app: network-debug
policyTypes:
- Ingress
- Egress
ingress:
- from: [] # Allow all ingress for debugging
egress:
- to: [] # Allow all egress for debugging
Essential Debug Commands
# Get into the debug pod
kubectl exec -it network-debug-swiss-army-knife -- bash
# === DNS Debugging ===
# Test basic DNS resolution
nslookup kubernetes.default
dig kubernetes.default.svc.cluster.local
# Test specific service resolution
nslookup service-b.namespace.svc.cluster.local
dig +short service-b.namespace.svc.cluster.local
# Check DNS search domains
cat /etc/resolv.conf
# Should show search domains like:
# search default.svc.cluster.local svc.cluster.local cluster.local
# Test DNS server directly
dig @10.96.0.10 service-b.namespace.svc.cluster.local
# === Connectivity Testing ===
# Test basic connectivity to specific IP
ping 10.96.15.234
# Test port connectivity
telnet service-b.namespace.svc.cluster.local 80
nc -zv service-b.namespace.svc.cluster.local 80
# Test HTTP connectivity with detailed output
curl -v http://service-b.namespace.svc.cluster.local/health
curl -I --connect-timeout 5 --max-time 10 http://service-b.namespace.svc.cluster.local
# === Network Policy Debugging ===
# Check if traffic is being dropped by policies
# (requires NET_ADMIN capability)
tcpdump -i any -n host 10.96.15.234
# Monitor connection attempts
ss -tuln | grep :80
netstat -an | grep :80
# === Route and Interface Analysis ===
# Check routing table
ip route show
route -n
# Check network interfaces
ip addr show
ifconfig
# Check iptables rules (if accessible)
iptables -t nat -L | grep -i service-b
iptables -t filter -L | grep -i service-b
# === Service Mesh Debugging (if using Istio/Linkerd) ===
# Check proxy configuration
curl localhost:15000/config_dump # Envoy admin interface
curl localhost:15000/clusters # Upstream clusters
curl localhost:15000/listeners # Listener configuration
# === Advanced Packet Analysis ===
# Capture packets to/from specific service
tcpdump -i any -w /tmp/capture.pcap host service-b.namespace.svc.cluster.local
# Then analyze with: wireshark /tmp/capture.pcap
# Test MTU and packet fragmentation
ping -M do -s 1472 service-b.namespace.svc.cluster.local
tracepath service-b.namespace.svc.cluster.local
Systematic Network Troubleshooting Process
Phase 1: Basic Connectivity
# 1. Verify the target service exists and has endpoints
kubectl get svc service-b -n namespace -o wide
kubectl get endpoints service-b -n namespace
# 2. If no endpoints, check pod status and labels
kubectl get pods -n namespace -l app=service-b
kubectl describe pods -n namespace -l app=service-b
# 3. Verify service selector matches pod labels
kubectl get svc service-b -n namespace -o yaml | grep -A 5 selector
kubectl get pods -n namespace --show-labels | grep service-b
Phase 2: DNS Resolution
# 4. Test DNS from the source pod
kubectl exec -it source-pod -- nslookup service-b.namespace.svc.cluster.local
# 5. If DNS fails, check CoreDNS
kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl logs -n kube-system -l k8s-app=kube-dns
# 6. Check DNS configuration in source pod
kubectl exec -it source-pod -- cat /etc/resolv.conf
Phase 3: Network Policy Analysis
# 7. Check for network policies affecting traffic
kubectl get networkpolicy -A
kubectl describe networkpolicy -n namespace
# 8. Test with temporary permissive policy
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-all-debug
namespace: namespace
spec:
  podSelector: {}
  policyTypes: ["Ingress", "Egress"]
  ingress:
  - {}    # empty rule = allow all ingress
  egress:
  - {}    # empty rule = allow all egress
EOF
Phase 4: Deep Packet Analysis
# 9. Capture traffic on source and destination nodes
# On source node:
sudo tcpdump -i any -n 'host SERVICE_IP and port 80'
# On destination node:
sudo tcpdump -i any -n 'port 80'
# 10. Check iptables rules on nodes
sudo iptables -t nat -L KUBE-SERVICES | grep service-b
sudo iptables -t nat -L -n | grep KUBE-SEP | grep POD_IP   # SEP chains DNAT to endpoint (pod) IPs, not the service IP
Common Network Policy Gotchas
Network policies are often the culprit in mysterious connection failures:
# This policy looks permissive but blocks everything by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: sneaky-deny-all
namespace: production
spec:
podSelector: {} # Applies to all pods in namespace
policyTypes:
- Ingress # Specifying this without rules = deny all ingress
- Egress # Specifying this without rules = deny all egress
# No ingress or egress rules = deny everything!
Better approach with explicit rules:
# Explicit network policy with clear intent
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: app-network-policy
namespace: production
spec:
podSelector:
matchLabels:
app: my-app
policyTypes:
- Ingress
- Egress
ingress:
# Allow traffic from pods with specific labels
- from:
- podSelector:
matchLabels:
app: frontend
- namespaceSelector:
matchLabels:
name: ingress-system
ports:
- protocol: TCP
port: 8080
egress:
# Allow DNS resolution
- to: []
ports:
- protocol: UDP
port: 53
# Allow database access
- to:
- podSelector:
matchLabels:
app: database
ports:
- protocol: TCP
port: 5432
# Allow external API calls
- to: []
ports:
- protocol: TCP
port: 443
Service Mesh Networking Complications
When using service meshes like Istio or Linkerd, debugging becomes even more complex:
# Check if sidecars are properly injected
kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].name}{"\n"}{end}'
# Verify mesh configuration
kubectl get virtualservices,destinationrules,gateways -A
# Check sidecar proxy logs
kubectl logs pod-name -c istio-proxy
kubectl logs pod-name -c linkerd-proxy
# Test connectivity bypassing the mesh
kubectl exec -it source-pod -c app-container -- curl destination-service
kubectl exec -it source-pod -c istio-proxy -- curl destination-service
Performance and Latency Issues
Network performance problems often manifest as timeouts or slow responses:
# Test network latency between pods
kubectl exec -it source-pod -- ping -c 10 destination-pod-ip
# Measure HTTP response times
kubectl exec -it source-pod -- curl -w "@curl-format.txt" -o /dev/null -s http://service-b.namespace.svc.cluster.local
# curl-format.txt contents:
# time_namelookup: %{time_namelookup}\n
# time_connect: %{time_connect}\n
# time_appconnect: %{time_appconnect}\n
# time_pretransfer: %{time_pretransfer}\n
# time_redirect: %{time_redirect}\n
# time_starttransfer: %{time_starttransfer}\n
# ----------\n
# time_total: %{time_total}\n
# Test bandwidth between pods
kubectl exec -it source-pod -- iperf3 -c destination-pod-ip
Monitoring Network Health
Set up comprehensive network monitoring to catch issues before they become incidents:
# Prometheus NetworkPolicy monitoring
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: network-policy-monitor
spec:
selector:
matchLabels:
app: network-policy-exporter
endpoints:
- port: metrics
---
# Alert on DNS resolution failures
groups:
- name: networking
rules:
- alert: DNSResolutionFailure
expr: |
increase(coredns_dns_response_rcode_count_total{rcode!="NOERROR"}[5m]) > 10
for: 2m
labels:
severity: warning
annotations:
summary: "High DNS resolution failure rate"
description: "DNS failures have increased in the last 5 minutes"
- alert: ServiceEndpointDown
expr: |
up{job="kubernetes-services"} == 0
for: 1m
labels:
severity: critical
annotations:
summary: "Service endpoint is down"
description: "Service {{ $labels.service }} in namespace {{ $labels.namespace }} has no healthy endpoints"
Quick Reference: Network Debugging Checklist
When networking fails, work through this checklist systematically:
- ✅ Service Discovery
  - Does the service exist? (kubectl get svc)
  - Does it have endpoints? (kubectl get endpoints)
  - Are pod labels correct? (kubectl get pods --show-labels)
- ✅ DNS Resolution
  - Can you resolve the service name? (nslookup)
  - Is CoreDNS running? (kubectl get pods -n kube-system)
  - Check DNS config (cat /etc/resolv.conf)
- ✅ Network Connectivity
  - Can you ping the service IP? (ping)
  - Can you connect to the port? (telnet/nc)
  - Check routing (ip route)
- ✅ Network Policies
  - Are there policies blocking traffic? (kubectl get netpol)
  - Test with a permissive policy temporarily
  - Check policy logs if available
- ✅ Node-Level Issues
  - Check iptables rules (iptables -t nat -L)
  - Verify CNI plugin health
  - Check kube-proxy logs
- ✅ Application-Level Issues
  - Is the app actually listening? (netstat -an)
  - Check application logs
  - Verify health check endpoints
The key to mastering Kubernetes networking is understanding that it's not one system but a collection of interconnected components. When troubleshooting, start with the basics (service existence, DNS, connectivity) before diving into complex packet analysis. Most networking issues are actually configuration problems disguised as mysterious connection failures.
5. RBAC: When Security Becomes a Maze
The Problem: Kubernetes RBAC is powerful but incredibly verbose. Creating proper permissions often requires deep knowledge of API groups and resource types that aren't obvious.
RBAC That Doesn't Work:
# This looks right but won't work for deployments
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: app-deployer
rules:
- apiGroups: [""]
resources: ["pods"]
verbs: ["get", "list", "create", "update", "delete"]
This RBAC configuration is a classic trap that catches many Kubernetes newcomers. It looks logical—if you want to deploy applications, you need permissions on pods, right? Wrong! This Role will allow you to manually create individual pods, but it won't let you create Deployments, which are what you actually use in production. Deployments belong to the "apps" API group, not the core API group (indicated by the empty string).
The frustration here is that kubectl will give you cryptic permission denied errors when you try to apply your deployment manifests, and the error messages don't clearly explain the API group mismatch. You'll spend time debugging your YAML syntax when the real issue is insufficient RBAC permissions for the resources you're actually trying to create.
What You Actually Need:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: app-deployer
rules:
# For deployments (not in core API group!)
- apiGroups: ["apps"]
resources: ["deployments", "replicasets"]
verbs: ["get", "list", "create", "update", "patch", "delete"]
# For pods (core API group)
- apiGroups: [""]
resources: ["pods", "pods/log", "pods/status"]
verbs: ["get", "list", "watch"]
# For services
- apiGroups: [""]
resources: ["services"]
verbs: ["get", "list", "create", "update", "patch"]
# For configmaps and secrets
- apiGroups: [""]
resources: ["configmaps", "secrets"]
verbs: ["get", "list", "create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: app-deployer-binding
subjects:
- kind: ServiceAccount
name: app-deployer
namespace: default
roleRef:
kind: Role
name: app-deployer
apiGroup: rbac.authorization.k8s.io
This comprehensive RBAC configuration demonstrates the complexity of Kubernetes permissions. Notice how Deployments and ReplicaSets require the "apps" API group, while pods, services, ConfigMaps, and Secrets use the core API group (empty string). The granular sub-resources like "pods/log" and "pods/status" are necessary for debugging and monitoring workflows.
The verb selection is also crucial—"patch" is required for most CI/CD tools that use strategic merge patches for updates, while "watch" is needed for controllers and operators that need to monitor resource changes. The RoleBinding ties everything together, but note that this only grants permissions within the namespace where the RoleBinding exists. For cluster-wide permissions, you'd need ClusterRole and ClusterRoleBinding instead.
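For reference, the cluster-wide equivalent only changes the kind, but it grants the same verbs in every namespace, so scope it deliberately; a sketch with intentionally read-only verbs:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: app-viewer
rules:
- apiGroups: ["apps"]
  resources: ["deployments", "replicasets"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: app-viewer-binding
subjects:
- kind: ServiceAccount
  name: app-deployer
  namespace: default
roleRef:
  kind: ClusterRole
  name: app-viewer
  apiGroup: rbac.authorization.k8s.io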
Pro Tip: Use kubectl auth can-i to test permissions:
# Test if service account can create deployments
kubectl auth can-i create deployments --as=system:serviceaccount:default:app-deployer
# Test specific resource
kubectl auth can-i get pods --as=system:serviceaccount:default:app-deployer -n production
6. Persistent Volume Provisioning Nightmares
The Problem: Persistent volumes often fail to provision correctly, leaving your stateful applications in pending state with cryptic error messages.
The Frustrating Experience:
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
app-data Pending fast-ssd 5m
$ kubectl describe pvc app-data
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ProvisioningFailed 3m persistentvolume-controller Failed to provision volume with StorageClass "fast-ssd": rpc error: code = ResourceExhausted desc = Insufficient quota
Better PVC with Explicit Configuration:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: app-data
annotations:
# Record the provisioner expected to handle this claim
volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
spec:
accessModes:
- ReadWriteOnce
storageClassName: gp3-encrypted # Use specific, tested storage class
resources:
requests:
storage: 20Gi
# Add selector for specific requirements
selector:
matchLabels:
environment: production
This PVC configuration takes a defensive approach to storage provisioning by being explicit about every critical setting. The storage-provisioner annotation documents which provisioner is expected to handle the claim (on modern clusters the control plane normally sets this itself), and the explicit storage class "gp3-encrypted" avoids relying on a default storage class that might change unexpectedly.
The selector adds a further layer of control, but only when binding to pre-provisioned PersistentVolumes: it ensures production workloads bind exclusively to volumes labeled for production use, which helps prevent accidental data leakage between environments and supports compliance requirements. Be aware that most dynamic provisioners will not satisfy a claim that specifies a selector, so drop it if you rely purely on dynamic provisioning. The 20Gi size should be based on actual data growth projections, not wishful thinking—undersized volumes in production lead to emergency midnight expansion procedures.
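To avoid the midnight expansion scenario entirely, make sure the storage class permits online expansion; a sketch of what gp3-encrypted might look like, assuming the AWS EBS CSI driver:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted
provisioner: ebs.csi.aws.com          # assumes the AWS EBS CSI driver is installed
parameters:
  type: gp3
  encrypted: "true"
allowVolumeExpansion: true            # lets you grow a bound PVC by editing its storage request
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain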
Debug PV Issues:
# Check storage classes
kubectl get storageclass
# Check PV provisioner logs
kubectl logs -n kube-system -l app=ebs-csi-controller
# Describe the storage class
kubectl describe storageclass gp3-encrypted
7. Pod Scheduling Mysteries
The Problem: Pods get stuck in "Pending" state, and the scheduler's decisions often seem arbitrary. Node affinity, taints, and tolerations create a complex web of constraints.
When Your Pod Won't Schedule:
# This pod will never schedule on most clusters
apiVersion: v1
kind: Pod
metadata:
name: impossible-pod
spec:
containers:
- name: app
image: nginx
resources:
requests:
memory: "64Gi" # More memory than any node has
cpu: "32" # More CPU than available
nodeSelector:
disktype: ssd
gpu: "true"
zone: us-west-1a
tolerations: [] # Can't tolerate any taints
This pod specification is a perfect example of overly restrictive scheduling constraints that doom your workload to perpetual pending status. The resource requests alone would require a node with 64GB available memory and 32 CPU cores, which is rare even in large clusters. The nodeSelector compounds the problem by requiring specific labels that might not exist on any nodes.
The empty tolerations array is particularly problematic because most production clusters use taints to reserve certain nodes for specific workloads or to mark nodes during maintenance. Without appropriate tolerations, your pod can't schedule on tainted nodes, severely limiting placement options. This configuration teaches us that overly specific requirements often result in unschedulable workloads.
Better Scheduling Configuration:
apiVersion: v1
kind: Pod
metadata:
name: well-scheduled-pod
spec:
containers:
- name: app
image: nginx
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
# Use affinity instead of nodeSelector for flexibility
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
preference:
matchExpressions:
- key: disktype
operator: In
values: ["ssd"]
- weight: 50
preference:
matchExpressions:
- key: zone
operator: In
values: ["us-west-1a", "us-west-1b"]
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values: ["nginx"]
topologyKey: kubernetes.io/hostname
tolerations:
- key: "high-memory"
operator: "Equal"
value: "true"
effect: "NoSchedule"
This configuration demonstrates sophisticated scheduling that balances preferences with flexibility. Instead of hard requirements through nodeSelector, it uses weighted preferences that guide the scheduler without creating impossible constraints. The scheduler will prefer SSD nodes and specific zones but can still place the pod elsewhere if needed.
The podAntiAffinity ensures high availability by preferring to schedule pods on different nodes, reducing the blast radius of node failures. The tolerations allow scheduling on specialized nodes when necessary. This approach provides intelligent placement while maintaining scheduling flexibility—your pods get better placement when possible but remain schedulable under all conditions.
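A related tool worth knowing is topologySpreadConstraints, which states "spread these pods across zones or nodes" more directly than anti-affinity; a pod-spec fragment as a sketch:

# Spread replicas evenly across zones without ever blocking scheduling
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      app: nginx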
Debug Scheduling Issues:
# Check why pod isn't scheduling
kubectl describe pod impossible-pod
# Check node resources
kubectl describe nodes | grep -A 5 "Allocated resources"
# Check scheduler logs
kubectl logs -n kube-system -l component=kube-scheduler
8. Rolling Updates Gone Wrong
The Problem: Rolling updates can fail spectacularly, leaving your application in a mixed state with old and new versions running simultaneously, often breaking functionality.
Deployment That Will Cause Problems:
apiVersion: apps/v1
kind: Deployment
metadata:
name: risky-app
spec:
replicas: 10
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 50% # Too aggressive
maxSurge: 100% # Will double resource usage
template:
spec:
containers:
- name: app
image: myapp:latest # Never use 'latest' in production!
readinessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 5 # Too short for complex apps
periodSeconds: 5
failureThreshold: 3 # Too few failures allowed
This deployment configuration is a masterclass in how not to handle rolling updates. The 50% maxUnavailable setting means half your capacity disappears during updates, potentially causing service degradation or outages during deployment windows. The 100% maxSurge doubles your resource consumption temporarily, which can overwhelm cluster capacity and cause resource contention.
The "latest" image tag is particularly dangerous because it makes deployments non-deterministic—you never know exactly which version you're deploying, and rollbacks become impossible. The aggressive readiness probe settings will mark pods as ready before they're actually prepared to handle traffic, leading to failed requests during the update process. This configuration prioritizes speed over reliability, which is exactly backward for production deployments.
Safer Rolling Update Strategy:
apiVersion: apps/v1
kind: Deployment
metadata:
name: safe-app
spec:
replicas: 10
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1 # Conservative approach
maxSurge: 2 # Controlled resource increase
template:
metadata:
annotations:
# Force pod restart on config changes
config/hash: "{{ .Values.configHash }}"
spec:
containers:
- name: app
image: myapp:v1.2.3-abc123 # Specific, immutable tag
readinessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 30 # Allow app to fully start
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 6 # More tolerant of transient failures
successThreshold: 1
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 60 # Don't kill pods too early
periodSeconds: 30
timeoutSeconds: 10
failureThreshold: 3
# Set a reasonable progress deadline
progressDeadlineSeconds: 600
This deployment configuration prioritizes reliability and predictability over deployment speed. The conservative maxUnavailable setting of 1 ensures you maintain 90% capacity throughout the update process, while the controlled maxSurge of 2 limits resource overhead. The immutable image tag with version and commit hash enables precise rollbacks and eliminates deployment ambiguity.
The generous probe timeouts and failure thresholds accommodate real-world application startup patterns and temporary health check failures during deployments. The config/hash annotation ensures pods restart when configuration changes, preventing stale configuration issues. The progressDeadlineSeconds provides a safety net for stuck deployments, automatically failing deployments that can't complete within a reasonable timeframe.
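Pair a conservative rollout with a PodDisruptionBudget so that voluntary disruptions (node drains, cluster upgrades) respect the same capacity floor; a minimal sketch, assuming the Deployment's pods carry app: safe-app:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: safe-app-pdb
spec:
  minAvailable: 8            # with 10 replicas, never drop below 8 during drains
  selector:
    matchLabels:
      app: safe-app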
9. The Log Aggregation Struggle
The Problem: Debugging issues in Kubernetes often requires correlating logs across multiple pods, but the built-in logging is limited and painful to use.
The Pain of Basic Logging:
# Trying to debug across multiple pods
kubectl logs deployment/my-app --previous
kubectl logs -l app=my-app --tail=100
kubectl logs my-app-7d4c8b5f6-xyz12 -c sidecar-container
# Logs are truncated, timestamps are inconsistent, no correlation IDs
Better Logging Setup:
# Structured logging configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: logging-config
data:
  log4j2.xml: |
    <?xml version="1.0" encoding="UTF-8"?>
    <Configuration>
      <Appenders>
        <Console name="Console" target="SYSTEM_OUT">
          <JSONLayout compact="true" eventEol="true">
            <KeyValuePair key="timestamp" value="${date:yyyy-MM-dd'T'HH:mm:ss.SSSZ}"/>
            <KeyValuePair key="level" value="${level}"/>
            <KeyValuePair key="thread" value="${thread}"/>
            <KeyValuePair key="logger" value="${logger}"/>
            <KeyValuePair key="pod" value="${env:HOSTNAME}"/>
            <KeyValuePair key="namespace" value="${env:POD_NAMESPACE}"/>
            <KeyValuePair key="service" value="${env:SERVICE_NAME}"/>
          </JSONLayout>
        </Console>
      </Appenders>
      <Loggers>
        <Root level="info">
          <AppenderRef ref="Console"/>
        </Root>
      </Loggers>
    </Configuration>
---
# Fluentd DaemonSet for log collection
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-elasticsearch
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: fluentd-elasticsearch
  template:
    metadata:
      labels:
        name: fluentd-elasticsearch
    spec:
      containers:
      - name: fluentd-elasticsearch
        image: quay.io/fluentd_elasticsearch/fluentd:v3.1.0
        env:
        - name: FLUENTD_SYSTEMD_CONF
          value: disable
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: config-volume
          mountPath: /etc/fluent/config.d
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: config-volume
        configMap:
          name: fluentd-config
This logging configuration transforms the chaotic world of Kubernetes logs into a structured, queryable system. The JSON-formatted log output includes critical metadata like pod name, namespace, and service name that makes correlation across distributed systems possible. Instead of hunting through multiple pod logs manually, you can now query for all logs from a specific service or namespace in your log aggregation system.
The Fluentd DaemonSet automatically collects logs from every node in your cluster, eliminating the need to configure log forwarding for each application by hand. The read-only mounts of /var/lib/docker/containers and /var/log ensure comprehensive log collection without interfering with node operations (on containerd-based clusters, container logs live under /var/log/pods instead). This setup provides the foundation for effective observability: when incidents occur, you can quickly filter and correlate logs across your entire application stack rather than playing detective with kubectl logs commands.
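One gap worth flagging: the log4j2 configuration above reads POD_NAMESPACE and SERVICE_NAME from the environment, but Kubernetes only sets HOSTNAME automatically. A minimal sketch of the container env entries you would add, using the downward API for the namespace and a plain value for the service name (my-app is just a placeholder):
env:
- name: POD_NAMESPACE
  valueFrom:
    fieldRef:
      fieldPath: metadata.namespace
- name: SERVICE_NAME
  value: my-app   # no downward API field for this; set it per workload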
10. Namespace Isolation That Isn't
The Problem: Namespaces provide logical separation but don't enforce actual isolation by default. Resources can still communicate across namespaces, and RBAC permissions can accidentally grant too much access.
Namespace "Isolation" That Doesn't Work:
# This creates namespaces but no real isolation
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
---
apiVersion: v1
kind: Namespace
metadata:
  name: team-b
# Pods in team-a can still reach services in team-b!
This configuration demonstrates one of Kubernetes' most misleading features: namespaces provide logical organization but zero network isolation by default. Many teams assume that creating separate namespaces automatically isolates their workloads, only to discover during security audits that services can freely communicate across namespace boundaries. A pod in team-a can reach any service in team-b through its cluster DNS name (for example, api.team-b.svc.cluster.local), potentially accessing sensitive data or services.
This false sense of security is particularly dangerous in multi-tenant environments where different teams or applications share the same cluster. Without network policies, a compromised pod in one namespace can move laterally into other namespaces, bypassing application-level security measures entirely.
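You can demonstrate the problem in seconds. A quick sketch, assuming a service named api exists in team-b and listens on port 80:
# Launch a throwaway pod in team-a and call a team-b service directly
kubectl run curl-test -n team-a --rm -it --restart=Never \
  --image=curlimages/curl -- curl -s http://api.team-b.svc.cluster.local
# With no NetworkPolicy in place, the request succeeds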
Proper Namespace Isolation:
# Network policies for actual isolation
# Note: the namespaceSelector rules below assume each namespace carries a
# name: <namespace> label (alternatively, match on kubernetes.io/metadata.name)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: team-a
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
  ingress:
  # Only accept traffic from pods in the same namespace
  - from:
    - podSelector: {}
  egress:
  # Allow DNS
  - to: []
    ports:
    - protocol: UDP
      port: 53
  # Allow within namespace
  - to:
    - namespaceSelector:
        matchLabels:
          name: team-a
  # Allow to shared services
  - to:
    - namespaceSelector:
        matchLabels:
          name: shared-services
    ports:
    - protocol: TCP
      port: 80
---
# Resource quotas to prevent resource hogging
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    persistentvolumeclaims: "10"
    pods: "50"
    services: "10"
    secrets: "10"
    configmaps: "10"
---
# Limit ranges for individual containers
apiVersion: v1
kind: LimitRange
metadata:
  name: team-a-limits
  namespace: team-a
spec:
  limits:
  - type: Container
    default:
      cpu: 500m
      memory: 512Mi
    defaultRequest:
      cpu: 100m
      memory: 128Mi
    max:
      cpu: "2"
      memory: 4Gi
    min:
      cpu: 50m
      memory: 64Mi
This comprehensive namespace isolation setup provides real security and resource boundaries. The NetworkPolicy implements a default-deny posture: ingress is only accepted from pods in the same namespace, and egress is limited to essential traffic like DNS resolution plus controlled access to specific namespaces. Notice how the egress rules explicitly allow DNS (port 53) and intra-namespace communication, while carefully controlling access to shared services through namespace labels. Keep in mind that NetworkPolicies are only enforced if your CNI plugin supports them (Calico, Cilium, and similar); on a cluster without policy enforcement they are silently ignored.
The ResourceQuota prevents any team from monopolizing cluster resources, while the LimitRange ensures individual containers can't exceed reasonable bounds or deploy without resource specifications. Together, these three resources create true multi-tenancy—teams are isolated from each other's network traffic and resource consumption, while still allowing controlled sharing of common services. This approach scales to hundreds of teams while maintaining security and operational sanity.
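One operational detail: the namespaceSelector rules in the policy match on a name label, which namespaces do not carry by default (unless you switch the selectors to the built-in kubernetes.io/metadata.name label). A quick sketch of the labeling step and a sanity check:
kubectl label namespace team-a name=team-a
kubectl label namespace shared-services name=shared-services
# Confirm which pods the policy selects and what it permits
kubectl describe networkpolicy default-deny -n team-a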
Survival Tips for Kubernetes in Production
1. Always Use Resource Requests and Limits
Never deploy without setting these. Start conservative and adjust based on monitoring data.
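As a rough starting point for a small web service, something like the following per container is reasonable; the numbers are placeholders to adjust against real usage data:
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 256Mi
Requests drive scheduling and quota accounting; limits cap what a runaway container can consume.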
2. Implement Proper Health Checks
# Comprehensive health check setup
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 30
  timeoutSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 3
  successThreshold: 1
This health check configuration demonstrates the crucial distinction between liveness and readiness probes that many teams get wrong. The liveness probe uses the /healthz endpoint with conservative timing: a 60-second initial delay and 30-second intervals prevent premature pod termination during startup or temporary issues, and the failure threshold requires three consecutive failures (roughly 90 seconds at a 30-second period) before a restart is triggered.
The readiness probe uses a separate /ready endpoint with more aggressive timing since it only controls traffic routing, not pod lifecycle. The key insight is that these probes serve different purposes: readiness determines if a pod should receive traffic, while liveness determines if a pod should be restarted. Getting this right prevents cascading failures during deployments and ensures smooth traffic management during pod lifecycle events.
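A quick way to see the readiness side in action during a rollout: a Service's Endpoints only list pods that are currently passing their readiness probe, so a not-ready pod simply receives no traffic. A sketch, assuming a Service named my-app and a pod name you substitute in:
# Only pods whose readiness probe is passing appear here
kubectl get endpoints my-app
# Probe failures surface as Unhealthy events on the pod
kubectl describe pod <pod-name> | grep -i unhealthy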
3. Use Immutable Image Tags
Never use the "latest" tag in production. Use semantic versioning with commit hashes:
image: myregistry/myapp:v1.2.3-git-abc123def
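If you want to go one step further, pin by image digest, which stays immutable even if someone force-pushes the tag. A sketch with a placeholder digest, plus one way to look up the digest a running pod actually resolved:
image: myregistry/myapp@sha256:<digest-of-the-v1.2.3-build>
# The imageID of a running container includes the resolved digest
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].imageID}'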
4. Set Up Monitoring and Alerting Early
Deploy Prometheus, Grafana, and AlertManager before you need them:
# ServiceMonitor for Prometheus
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: app-metrics
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
This ServiceMonitor configuration automates the discovery of application metrics by Prometheus, eliminating manual scrape-config maintenance as your application estate grows. The label selector automatically picks up any Service labeled app: my-app, making metrics collection self-service for development teams. The 30-second scrape interval balances monitoring granularity with resource consumption.
The real power of this approach is that it makes monitoring a deployment-time decision rather than an operational afterthought. When teams deploy applications with proper metrics endpoints and ServiceMonitor configurations, they automatically get observability without involving the platform team. This scales monitoring operations across hundreds of services while maintaining consistency and reducing operational overhead.
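Scraping is only half the story; the alerting side can be shipped the same declarative way. A minimal sketch of a PrometheusRule to pair with the ServiceMonitor above; the metric name, threshold, and labels are assumptions to adapt, and depending on your setup the resource may also need labels matching the operator's ruleSelector:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: app-alerts
spec:
  groups:
  - name: my-app.rules
    rules:
    - alert: HighErrorRate
      expr: sum(rate(http_requests_total{app="my-app",status=~"5.."}[5m])) > 1
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "my-app error rate above 1/s for 10 minutes"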
5. Plan for Disaster Recovery
Always have a backup strategy for persistent data and a tested restore procedure.
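As one hedged sketch of what "a backup strategy" can look like in practice, assuming Velero is installed in the cluster (names and schedule are illustrative):
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: production-daily
  namespace: velero
spec:
  schedule: "0 2 * * *"   # every night at 02:00
  template:
    includedNamespaces:
    - production
    ttl: 720h0m0s         # keep backups for 30 days
Whatever tooling you choose, the restore procedure matters more than the backup job, so rehearse restores into a scratch namespace or cluster regularly.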
Conclusion
Kubernetes is powerful but complex. These annoyances are the price we pay for flexibility and scalability. The key to managing Kubernetes successfully is to:
- Start simple and add complexity gradually
- Monitor everything from day one
- Automate repetitive tasks with proper tooling
- Document your decisions and configurations
- Test failure scenarios regularly
Remember, every experienced Kubernetes operator has been through these pain points. The difference between a novice and an expert isn't avoiding these issues—it's knowing how to debug and fix them quickly when they inevitably occur.
The most important lesson? When something goes wrong in Kubernetes (and it will), take a systematic approach to debugging. Check the basics first: resource availability, networking, RBAC permissions, and pod logs. Most issues fall into these categories, and having a systematic troubleshooting process will save you hours of frustration.
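A starting checklist in command form, roughly matching those categories (namespace, pod, and service account names are placeholders):
# Recent events, newest last: scheduling failures, OOMKills, probe failures
kubectl get events -n <namespace> --sort-by=.lastTimestamp
# Pod status, restart reasons, and current/previous logs
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> --previous
# RBAC: can this service account actually do what the app needs?
kubectl auth can-i get secrets --as=system:serviceaccount:<namespace>:<sa-name>
# Node-level resource availability (requires metrics-server)
kubectl top nodes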
Happy kubectl-ing! 🚀