Kubernetes Security Hardening: A DevSecOps Engineer's Playbook
Securing Kubernetes clusters from day zero to production
Kubernetes security isn't an afterthought—it should be built into every layer of your cluster from day one. After securing dozens of production K8s environments, here's my battle-tested approach to hardening clusters.
The Kubernetes Security Model Reality Check
Most organizations deploy Kubernetes with defaults that prioritize convenience over security. That's a mistake that will bite you later. Let's fix that from the ground up.
Security Layers in Kubernetes
Think of K8s security like an onion:
- Cluster Infrastructure (nodes, network, etcd)
- Kubernetes API (RBAC, admission controllers)
- Workload Security (pods, containers, images)
- Runtime Security (monitoring, incident response)
Cluster Infrastructure Hardening
Node Security Configuration
Start with hardened node images and proper configuration:
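# CIS Kubernetes Benchmark automated checks
# Release assets carry the version in their file name; 0.6.17 is only an example,
# pick the current one from the kube-bench releases page
KUBE_BENCH_VERSION=0.6.17
curl -fsSL "https://github.com/aquasecurity/kube-bench/releases/download/v${KUBE_BENCH_VERSION}/kube-bench_${KUBE_BENCH_VERSION}_linux_amd64.tar.gz" | tar xz
./kube-bench --config-dir cfg/ --config cfg/config.yaml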
# Network security - disable unnecessary services
systemctl disable --now cups
systemctl disable --now bluetooth
systemctl disable --now avahi-daemon
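# Kernel settings: the first two are prerequisites for pod networking, not hardening;
# kptr_restrict hides kernel pointers from userspace
modprobe br_netfilter   # the bridge-nf-call-* keys exist only once this module is loaded
cat <<'EOF' > /etc/sysctl.d/99-kubernetes.conf
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
kernel.kptr_restrict = 2
EOF
sysctl --system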
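Settings like these tend to drift after node image updates, so it can help to check them rather than trust the provisioning script. Here is a minimal sketch, assuming you feed it text in the `key = value` form that `sysctl -a` prints; the `EXPECTED` table and both function names are hypothetical helpers, not part of any tool mentioned here.

```python
# Hypothetical helper: compare desired kernel settings with sysctl-style text.
EXPECTED = {
    "net.ipv4.ip_forward": "1",
    "net.bridge.bridge-nf-call-iptables": "1",
    "kernel.kptr_restrict": "2",
}


def parse_sysctl(text: str) -> dict:
    """Parse 'key = value' lines, skipping blank lines and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def sysctl_drift(actual_text: str, expected: dict = EXPECTED) -> dict:
    """Return {key: (expected, actual)} for each setting that differs or is missing."""
    actual = parse_sysctl(actual_text)
    return {
        key: (wanted, actual.get(key))
        for key, wanted in expected.items()
        if actual.get(key) != wanted
    }
```

An empty result means the node matches the expected table; anything else names the key along with the wanted and observed values.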
etcd Security Best Practices
Protect the brain of your cluster:
# etcd TLS configuration
apiVersion: v1
kind: Pod
metadata:
  name: etcd
spec:
  containers:
  - name: etcd
    image: registry.k8s.io/etcd:3.5.1-0
    command:
    - etcd
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --client-cert-auth=true
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --peer-client-cert-auth=true
    - --auto-tls=false
    - --peer-auto-tls=false
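A quick way to keep these flags from regressing is to lint the etcd command line in CI. The sketch below is one way to do that, assuming the container `command` list has already been loaded from the manifest; the function names and the two lookup tables are hypothetical.

```python
# Flags that must point at a certificate, key, or CA file.
PATH_FLAGS = [
    "--cert-file", "--key-file", "--trusted-ca-file",
    "--peer-cert-file", "--peer-key-file", "--peer-trusted-ca-file",
]
# Boolean flags mapped to the value we want. etcd defaults all four to false.
BOOL_FLAGS = {
    "--client-cert-auth": "true",
    "--peer-client-cert-auth": "true",
    "--auto-tls": "false",
    "--peer-auto-tls": "false",
}


def parse_flags(command: list) -> dict:
    """Turn ['--a=b', '--c'] into {'--a': 'b', '--c': ''}."""
    flags = {}
    for arg in command:
        if arg.startswith("--"):
            name, _, value = arg.partition("=")
            flags[name] = value
    return flags


def etcd_tls_findings(command: list) -> list:
    """List human-readable problems with the TLS-related etcd flags."""
    flags = parse_flags(command)
    findings = [f"{name} is not set" for name in PATH_FLAGS if not flags.get(name)]
    for name, wanted in BOOL_FLAGS.items():
        # Missing flag falls back to the default "false"; a bare flag means "true".
        value = flags.get(name, "false") or "true"
        if value != wanted:
            findings.append(f"{name} should be {wanted}, got {value}")
    return findings
```

An empty list means every TLS flag is in the expected state. The check only looks at flag values; it does not verify that the referenced files exist or that the certificates are valid.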
API Server Hardening
Robust RBAC Configuration
Implement least privilege access from day one:
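# Example: Developer role scoped to a single namespace
# Note: read access to secrets and the pods/exec subresource are both powerful grants;
# drop them from this role if developers do not need them day to day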
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: development
  name: developer
rules:
- apiGroups: [""]
  resources: ["pods", "services", "configmaps", "secrets"]
  verbs: ["get", "list", "create", "update", "patch", "delete"]
- apiGroups: ["apps"]
  resources: ["deployments", "replicasets"]
  verbs: ["get", "list", "create", "update", "patch", "delete"]
- apiGroups: [""]
  resources: ["pods/exec", "pods/portforward"]
  verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developer-binding
  namespace: development
subjects:
- kind: User
  name: jane.developer
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer
  apiGroup: rbac.authorization.k8s.io
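Least privilege is easier to hold onto when something flags the risky grants for you. Here is a rough sketch that scans a Role (already parsed into a dict) for wildcards and for a short list of sensitive resources. The `SENSITIVE` set is my own assumption about what deserves a second look; extend it to suit your environment.

```python
# Assumption: resources worth a manual review whenever a Role grants them.
SENSITIVE = {"secrets", "pods/exec", "pods/portforward"}


def risky_rules(role: dict) -> list:
    """Return one finding per wildcard or sensitive-resource grant in a Role."""
    findings = []
    for index, rule in enumerate(role.get("rules", [])):
        resources = set(rule.get("resources", []))
        verbs = set(rule.get("verbs", []))
        if "*" in resources or "*" in verbs:
            findings.append(f"rule {index}: wildcard in resources or verbs")
        for resource in sorted(resources & SENSITIVE):
            findings.append(f"rule {index}: {resource} ({', '.join(sorted(verbs))})")
    return findings
```

Run against the developer Role above, this would report the `secrets` grant and both pod subresources, which is a useful prompt to ask whether each one is really needed.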
Admission Controllers Configuration
Enable security-focused admission controllers:
# API server configuration
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
spec:
  containers:
  - name: kube-apiserver
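    image: registry.k8s.io/kube-apiserver:v1.28.0
    command:
    - kube-apiserver
    # PodSecurityPolicy was removed in v1.25 and SecurityContextDeny is deprecated;
    # PodSecurity (Pod Security Admission) takes their place
    - --enable-admission-plugins=NodeRestriction,ResourceQuota,LimitRanger,PodSecurity,AlwaysPullImages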
    - --audit-log-path=/var/log/audit.log
    - --audit-log-maxage=30
    - --audit-log-maxbackup=10
    - --audit-log-maxsize=100
    - --audit-policy-file=/etc/kubernetes/audit-policy.yaml
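Admission plugin lists go stale across upgrades, and an unknown plugin name can keep the API server from starting. A small pre-upgrade check along these lines can catch that early. This is a sketch: the `RETIRED_PLUGINS` table is deliberately short and reflects only what I am sure of, so treat it as a starting point and confirm against the release notes for your version.

```python
# Assumption: a hand-maintained table of plugins that should no longer be listed.
RETIRED_PLUGINS = {
    "PodSecurityPolicy": "removed in Kubernetes v1.25; use Pod Security Admission",
    "SecurityContextDeny": "deprecated upstream; check the release notes for your version",
}


def admission_plugins(command: list) -> list:
    """Extract plugin names from the --enable-admission-plugins flag."""
    prefix = "--enable-admission-plugins="
    for arg in command:
        if arg.startswith(prefix):
            return [p.strip() for p in arg[len(prefix):].split(",") if p.strip()]
    return []


def retired_plugins(command: list) -> dict:
    """Map each retired plugin found on the command line to an explanation."""
    return {
        name: RETIRED_PLUGINS[name]
        for name in admission_plugins(command)
        if name in RETIRED_PLUGINS
    }
```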
Comprehensive Audit Policy
Track everything that matters:
# /etc/kubernetes/audit-policy.yaml
# Rules are evaluated top to bottom and the first match wins,
# so specific rules come before broad ones
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# Log secret operations at Metadata only, so secret values never reach the audit log
- level: Metadata
  resources:
  - group: ""
    resources: ["secrets"]
# Log RBAC changes
- level: RequestResponse
  resources:
  - group: "rbac.authorization.k8s.io"
# Log pod exec and portforward
- level: Request
  resources:
  - group: ""
    resources: ["pods/exec", "pods/portforward"]
# Log writes in system namespaces at Metadata level
- level: Metadata
  namespaces: ["kube-system", "kube-public"]
  verbs: ["create", "update", "patch", "delete"]
# Log everything else at Metadata level
- level: Metadata
  omitStages:
  - RequestReceived
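Audit rules are matched in order and the first matching rule decides the level, which makes rule ordering easy to get wrong. The toy evaluator below illustrates that behavior. It is a simplified model under my own assumptions (it only understands `namespaces`, `verbs`, and `resources`, and ignores users, wildcards, and subresource matching), so do not mistake it for the API server's real matcher.

```python
def rule_matches(rule: dict, request: dict) -> bool:
    """Check one simplified audit rule against one simplified request."""
    if "namespaces" in rule and request.get("namespace") not in rule["namespaces"]:
        return False
    if "verbs" in rule and request.get("verb") not in rule["verbs"]:
        return False
    if "resources" in rule:
        for entry in rule["resources"]:
            if entry.get("group", "") != request.get("group", ""):
                continue
            names = entry.get("resources")
            # An entry without a resources list covers the whole API group.
            if not names or request.get("resource") in names:
                return True
        return False
    return True


def audit_level(rules: list, request: dict) -> str:
    """Return the level of the first matching rule, or "None" if nothing matches."""
    for rule in rules:
        if rule_matches(rule, request):
            return rule["level"]
    return "None"
```

Feeding it a `rolebindings` create in `kube-system` shows the effect: with a broad namespace rule listed first the request is logged at `Metadata`, and with the RBAC rule first it is logged at `RequestResponse`.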
Pod Security Standards Implementation
PodSecurityPolicy was removed in Kubernetes v1.25. Replace it with Pod Security Admission, which enforces the Pod Security Standards through namespace labels:
# Namespace with restricted security profile
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
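Because enforcement is opt-in per namespace, it is worth auditing which namespaces actually carry these labels. A minimal sketch follows, assuming each namespace object has been loaded into a dict (for example from `kubectl get ns -o json`). The `pss_gaps` name and the default minimum of `baseline` are my own choices.

```python
# The three Pod Security Standards levels, ordered from least to most strict.
LEVELS = ["privileged", "baseline", "restricted"]
PREFIX = "pod-security.kubernetes.io/"


def pss_gaps(namespace: dict, minimum: str = "baseline") -> list:
    """Describe what is missing or too weak in a namespace's Pod Security labels."""
    labels = namespace.get("metadata", {}).get("labels") or {}
    gaps = []
    enforce = labels.get(PREFIX + "enforce")
    if enforce is None:
        gaps.append("no enforce label")
    elif enforce not in LEVELS:
        gaps.append(f"unknown enforce level {enforce!r}")
    elif LEVELS.index(enforce) < LEVELS.index(minimum):
        gaps.append(f"enforce level {enforce} is below {minimum}")
    for mode in ("audit", "warn"):
        if PREFIX + mode not in labels:
            gaps.append(f"no {mode} label")
    return gaps
```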
Secure Pod Configuration Template
apiVersion: v1
kind: Pod
metadata:
  name: secure-app
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001
    runAsGroup: 10001
    fsGroup: 10001
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: myapp:v1.0.0
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      runAsNonRoot: true
      runAsUser: 10001
      capabilities:
        drop:
        - ALL
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
    volumeMounts:
    - name: tmp
      mountPath: /tmp
    - name: cache
      mountPath: /app/cache
  volumes:
  - name: tmp
    emptyDir: {}
  - name: cache
    emptyDir: {}
Network Security Implementation
NetworkPolicies for Microsegmentation
Implement zero-trust networking:
# Default deny-all policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
---
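# Allow specific communication patterns
# This covers the ingress side only: under default-deny egress, web-frontend
# also needs an egress rule to api-server and to cluster DNS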
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-to-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api-server
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: web-frontend
    ports:
    - protocol: TCP
      port: 8080
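A default-deny policy only helps in the namespaces that have one. The sketch below reports namespaces that lack default-deny coverage in either direction, assuming the NetworkPolicy objects are already parsed into dicts. It is a simplification: it treats only an empty `podSelector: {}` as selecting all pods, and the function names are hypothetical.

```python
def denied_directions(policy: dict) -> set:
    """Return the directions a policy denies by default for every pod in its namespace."""
    spec = policy.get("spec", {})
    if spec.get("podSelector", {}) != {}:
        return set()  # applies to a subset of pods only
    types = spec.get("policyTypes")
    if types is None:
        # Without policyTypes, Ingress always applies; Egress only if egress rules exist.
        types = ["Ingress"] + (["Egress"] if "egress" in spec else [])
    # A listed direction with no rules means nothing is allowed in that direction.
    return {t for t in types if not spec.get(t.lower())}


def namespaces_without_default_deny(policies: list, namespaces: list) -> list:
    """List namespaces that are not default-deny for both Ingress and Egress."""
    covered = {}
    for policy in policies:
        namespace = policy["metadata"]["namespace"]
        covered.setdefault(namespace, set()).update(denied_directions(policy))
    return sorted(
        ns for ns in namespaces if covered.get(ns, set()) != {"Ingress", "Egress"}
    )
```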
Service Mesh Security with Istio
Implement mTLS and fine-grained access control:
# Enable strict mTLS
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT
---
# Authorization policy
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: api-access
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-server
  action: ALLOW
  rules:
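  # from and to sit in one rule so both must match; as separate list
  # entries they would be ORed and either one alone would allow the request
  - from:
    - source:
        principals: ["cluster.local/ns/production/sa/web-frontend"]
    to:
    - operation:
        methods: ["GET", "POST"]
        paths: ["/api/v1/*"]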
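In an Istio ALLOW policy, a request passes if any entry under `rules` matches, while the fields inside a single entry must all match. That makes the placement of `from` and `to` significant. The toy model below shows the difference. It uses a flattened rule shape of my own (`principals`, `methods`, `paths`) and only exact or trailing-`*` path matching, so it illustrates the semantics and makes no attempt to reimplement Istio.

```python
def path_matches(pattern: str, path: str) -> bool:
    """Exact match, or prefix match when the pattern ends with '*'."""
    if pattern.endswith("*"):
        return path.startswith(pattern[:-1])
    return pattern == path


def rule_allows(rule: dict, request: dict) -> bool:
    """Every field present in the rule must match (AND)."""
    if "principals" in rule and request["principal"] not in rule["principals"]:
        return False
    if "methods" in rule and request["method"] not in rule["methods"]:
        return False
    if "paths" in rule and not any(path_matches(p, request["path"]) for p in rule["paths"]):
        return False
    return True


def policy_allows(rules: list, request: dict) -> bool:
    """Any matching rule allows the request (OR)."""
    return any(rule_allows(rule, request) for rule in rules)
```

With the conditions split across two rules, a workload from another namespace can still call `GET /api/v1/users`, and the frontend can call any method on any path. With both conditions in one rule, only the frontend calling the listed methods and paths gets through.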
Container and Image Security
Image Security Scanning Pipeline
Integrate security scanning into your CI/CD:
#!/bin/bash
# Image security scanning script
set -euo pipefail
IMAGE_NAME="${1:?usage: $0 <image>}"
SEVERITY_THRESHOLD="HIGH"
# Trivy scanning: a non-zero exit means findings at or above the threshold
if ! trivy image --severity "${SEVERITY_THRESHOLD},CRITICAL" --exit-code 1 "${IMAGE_NAME}"; then
    echo "Image failed security scan with ${SEVERITY_THRESHOLD} or CRITICAL vulnerabilities" >&2
    exit 1
fi
# Cosign image signature verification
if ! cosign verify --key cosign.pub "${IMAGE_NAME}"; then
    echo "Image signature verification failed" >&2
    exit 1
fi
echo "Image passed security checks"
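If you want more control than a single exit code, Trivy can also write its findings as JSON (`--format json`) and the pipeline can apply its own gate. Here is a minimal sketch that assumes the report follows Trivy's JSON layout, with a top-level `Results` list whose entries may carry a `Vulnerabilities` list. The function names and the default blocking set are my own.

```python
import json
from collections import Counter


def severity_counts(report_json: str) -> Counter:
    """Count vulnerabilities per severity across all targets in a Trivy JSON report."""
    report = json.loads(report_json)
    counts = Counter()
    for result in report.get("Results") or []:
        # Targets with no findings may omit the key or set it to null.
        for vuln in result.get("Vulnerabilities") or []:
            counts[vuln.get("Severity", "UNKNOWN")] += 1
    return counts


def passes_gate(report_json: str, blocking=("HIGH", "CRITICAL")) -> bool:
    """Return True when the report has no findings at a blocking severity."""
    counts = severity_counts(report_json)
    return sum(counts[severity] for severity in blocking) == 0
```

Keeping the counts separate from the gate decision makes it easy to publish the numbers as a build artifact even when the gate passes.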
Distroless Container Best Practices
Use minimal base images:
# Multi-stage build with distroless final image
FROM golang:1.19-alpine AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -a -ldflags '-extldflags "-static"' -o main .
FROM gcr.io/distroless/static-debian11
COPY --from=builder /app/main /
EXPOSE 8080
USER 10001
ENTRYPOINT ["/main"]
Runtime Security Monitoring
Falco Rules for Runtime Threat Detection
Deploy Falco for runtime security monitoring:
# Custom Falco rules
- rule: Unexpected K8s NodePort Connection
  desc: Detect attempts to connect to K8s NodePort services
  condition: >
    (inbound_outbound) and
    fd.sport >= 30000 and fd.sport <= 32767 and
    not proc.name in (kube-proxy, kubelet)
  output: >
    Unexpected K8s NodePort connection
    (connection=%fd.name sport=%fd.sport dport=%fd.dport 
     proc=%proc.name command=%proc.cmdline)
  priority: WARNING
- rule: Detect crypto mining
  desc: Detect cryptocurrency mining activities
  condition: >
    spawned_process and
    (proc.name in (xmrig, minergate, ccminer, cgminer) or
     proc.cmdline contains "stratum+tcp" or
     proc.cmdline contains "mining.pool")
  output: >
    Crypto mining process detected
    (user=%user.name command=%proc.cmdline)
  priority: CRITICAL
OPA Gatekeeper Policies
Implement policy-as-code with Gatekeeper:
# Require security context
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8srequiredsecuritycontext
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredSecurityContext
      validation:
        openAPIV3Schema:
          type: object
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredsecuritycontext
        
        violation[{"msg": msg}] {
            container := input.review.object.spec.containers[_]
            not container.securityContext.runAsNonRoot
            msg := "Container must run as non-root user"
        }
        
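        violation[{"msg": msg}] {
            container := input.review.object.spec.containers[_]
            # "!= false" is undefined when the field is missing, which would let
            # containers with no securityContext through; negate the equality instead
            not container.securityContext.allowPrivilegeEscalation == false
            msg := "Container must not allow privilege escalation"
        }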
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredSecurityContext
metadata:
  name: must-have-security-context
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    namespaces: ["production", "staging"]
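Missing fields are the classic trap in admission policies: a container with no `securityContext` at all should fail both checks, not slip past them. One way to pin down the intended behavior before writing Rego is a small reference model with test cases. The Python sketch below is such a model, written to my reading of the two checks; it is not generated from the Rego, and it has no way to exercise Gatekeeper itself.

```python
def violations(pod: dict) -> list:
    """Return a message per container that fails the intended policy."""
    messages = []
    for container in pod.get("spec", {}).get("containers", []):
        ctx = container.get("securityContext") or {}
        # Only an explicit True passes; missing or False both fail.
        if ctx.get("runAsNonRoot") is not True:
            messages.append(f"{container['name']}: must run as non-root user")
        # Only an explicit False passes; missing or True both fail.
        if ctx.get("allowPrivilegeEscalation") is not False:
            messages.append(f"{container['name']}: must not allow privilege escalation")
    return messages
```

The same three cases (fields set correctly, fields set incorrectly, fields absent) make good inputs for testing the real ConstraintTemplate.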
Secrets Management Strategy
External Secrets Operator Configuration
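Keep the source of truth for secrets in an external manager, never in manifests or Git. Note that External Secrets Operator still syncs each value into a regular Kubernetes Secret, so keep etcd encryption at rest enabled as well: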
# External Secrets Operator with AWS Secrets Manager
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets-manager
  namespace: production
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-west-2
      auth:
        secretRef:
          accessKeyIDSecretRef:
            name: awssm-secret
            key: access-key
          secretAccessKeySecretRef:
            name: awssm-secret
            key: secret-access-key
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: database-credentials
  namespace: production
spec:
  refreshInterval: 1h  # 15s would poll AWS Secrets Manager constantly; match this to your rotation schedule
  secretStoreRef:
    name: aws-secrets-manager
    kind: SecretStore
  target:
    name: db-credentials
    creationPolicy: Owner
  data:
  - secretKey: username
    remoteRef:
      key: prod/database
      property: username
  - secretKey: password
    remoteRef:
      key: prod/database
      property: password
Automated Security Testing
Kubernetes Security Testing Script
#!/usr/bin/env python3
"""
Kubernetes Security Assessment Script
"""
import subprocess
import json
import sys
from typing import List, Dict
class K8sSecurityChecker:
    def __init__(self):
        self.results = []
    
    def run_kube_bench(self) -> Dict:
        """Run CIS Kubernetes Benchmark checks"""
        try:
            result = subprocess.run(
                ['kube-bench', '--json'],
                capture_output=True,
                text=True,
                check=True
            )
            return json.loads(result.stdout)
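        except (subprocess.CalledProcessError, FileNotFoundError, json.JSONDecodeError) as exc:
            # Covers a failing run, a missing binary, and unparseable output
            return {"error": f"kube-bench failed: {exc}"}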
    
    def check_rbac_permissions(self) -> List[Dict]:
        """Check for overly permissive RBAC"""
        dangerous_permissions = []
        
        # Check for cluster-admin bindings
        try:
            result = subprocess.run([
                'kubectl', 'get', 'clusterrolebindings', 
                '-o', 'json'
            ], capture_output=True, text=True, check=True)
            
            bindings = json.loads(result.stdout)
            for binding in bindings['items']:
                if binding['roleRef']['name'] == 'cluster-admin':
                    dangerous_permissions.append({
                        'type': 'cluster-admin-binding',
                        'name': binding['metadata']['name'],
                        'subjects': binding.get('subjects', [])
                    })
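        except (subprocess.CalledProcessError, FileNotFoundError) as exc:
            # Surface the failure instead of silently reporting zero findings
            print(f"RBAC check failed: {exc}", file=sys.stderr)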
            
        return dangerous_permissions
    
    def check_pod_security_standards(self) -> List[Dict]:
        """Check Pod Security Standards compliance"""
        violations = []
        
        try:
            result = subprocess.run([
                'kubectl', 'get', 'pods', '--all-namespaces',
                '-o', 'json'
            ], capture_output=True, text=True, check=True)
            
            pods = json.loads(result.stdout)
            for pod in pods['items']:
                issues = self._analyze_pod_security(pod)
                if issues:
                    violations.append({
                        'pod': f"{pod['metadata']['namespace']}/{pod['metadata']['name']}",
                        'issues': issues
                    })
                    
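        except (subprocess.CalledProcessError, FileNotFoundError) as exc:
            # Surface the failure instead of silently reporting zero violations
            print(f"Pod security check failed: {exc}", file=sys.stderr)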
            
        return violations
    
    def _analyze_pod_security(self, pod: Dict) -> List[str]:
        """Analyze individual pod for security issues"""
        issues = []
        spec = pod.get('spec', {})
        
        pod_ctx = spec.get('securityContext') or {}

        # Check containers, including init containers
        for container in spec.get('initContainers', []) + spec.get('containers', []):
            sec_ctx = container.get('securityContext') or {}

            # A container-level runAsNonRoot overrides the pod-level value
            if not sec_ctx.get('runAsNonRoot', pod_ctx.get('runAsNonRoot')):
                issues.append(f"Container {container['name']} may be running as root")

            if sec_ctx.get('privileged'):
                issues.append(f"Container {container['name']} is privileged")

            # The field defaults to true when it is not set
            if sec_ctx.get('allowPrivilegeEscalation', True):
                issues.append(f"Container {container['name']} allows privilege escalation")
        
        return issues
    
    def generate_report(self) -> str:
        """Generate comprehensive security report"""
        print("Running Kubernetes Security Assessment...")
        
        # Run checks
        cis_results = self.run_kube_bench()
        rbac_issues = self.check_rbac_permissions()
        pod_violations = self.check_pod_security_standards()
        
        report = f"""
Kubernetes Security Assessment Report
=====================================
CIS Benchmark Results:
{json.dumps(cis_results, indent=2)}
RBAC Issues Found: {len(rbac_issues)}
{json.dumps(rbac_issues, indent=2)}
Pod Security Violations: {len(pod_violations)}
{json.dumps(pod_violations, indent=2)}
Recommendations:
- Review and remediate CIS benchmark failures
- Implement least-privilege RBAC policies
- Enable Pod Security Standards
- Regular security scanning and monitoring
"""
        
        return report
if __name__ == "__main__":
    checker = K8sSecurityChecker()
    report = checker.generate_report()
    print(report)
    
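    # Exit with error if kube-bench reported a failed check or a pod is privileged.
    # Match exact phrases: a bare "privileged" also appears in CIS check descriptions.
    if '"status": "FAIL"' in report or "is privileged" in report:
        sys.exit(1)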
Production Deployment Checklist
Pre-Deployment Security Validation
#!/bin/bash
# Pre-deployment security checklist
echo "🔒 Running Kubernetes Security Pre-Deployment Checks..."
# 1. List pods that do not set runAsNonRoot at the pod level
kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\t"}{.spec.securityContext.runAsNonRoot}{"\n"}{end}' | awk -F'\t' '$2 != "true"'
# 2. Validate network policies exist
kubectl get networkpolicies --all-namespaces
# 3. List pods where no container sets resource limits
kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\t"}{.spec.containers[*].resources.limits}{"\n"}{end}' | awk -F'\t' '$2 == ""'
# 4. Verify image signatures
for image in $(kubectl get pods --all-namespaces -o jsonpath='{.items[*].spec.containers[*].image}' | tr ' ' '\n' | sort -u); do
    echo "Checking signature for $image"
    cosign verify --key cosign.pub "$image" || echo "❌ No valid signature for $image"
done
# 5. Review Gatekeeper audit results (violations are recorded on each constraint's status)
kubectl get constraints
echo "✅ Security checks completed"
Monitoring and Incident Response
Security Metrics to Track
# Prometheus alerting rules
# Label sets vary by version and exporter; check each expression against your own /metrics output
groups:
- name: kubernetes-security
  rules:
  - alert: UnauthorizedAPIAccess
    # apiserver_request_total splits pods/exec into resource and subresource labels
    expr: sum(increase(apiserver_request_total{resource="pods",subresource="exec"}[5m])) > 0
    labels:
      severity: critical
    annotations:
      summary: "Pod exec detected; confirm it was authorized"
  - alert: PrivilegedPodCreated
    # Assumption: stock kube-state-metrics does not export a privileged label,
    # so this needs a custom exporter or relabeling that adds one
    expr: count(kube_pod_container_info{container_security_context_privileged="true"}) > 0
    labels:
      severity: high
    annotations:
      summary: "Privileged pod created"
  - alert: FailedRBACCheck
    expr: sum(increase(apiserver_request_total{verb="POST",resource="rolebindings",code!~"2.."}[5m])) > 3
    labels:
      severity: warning
    annotations:
      summary: "Multiple failed RBAC operations detected"
Wrapping Up
Kubernetes security isn't a one-time setup—it's an ongoing process. Start with these fundamentals:
- Harden the infrastructure before deploying workloads
- Implement defense in depth across all layers
- Automate security testing in your CI/CD pipeline
- Monitor continuously and respond quickly to threats
- Keep learning as the threat landscape evolves
Remember: security is a journey, not a destination. The key is building security into your processes from the beginning rather than bolting it on later.