Kubernetes Security Hardening: A DevSecOps Engineer's Playbook
Kubernetes security isn't an afterthought; it has to be built into every layer of your cluster from day one. Having secured dozens of production K8s environments, I've distilled my battle-tested approach to hardening clusters into this playbook.
The Kubernetes Security Model Reality Check
Most organizations deploy Kubernetes with defaults that prioritize convenience over security. That's a mistake that will bite you later. Let's fix that from the ground up.
Security Layers in Kubernetes
Think of K8s security like an onion:
- Cluster Infrastructure (nodes, network, etcd)
- Kubernetes API (RBAC, admission controllers)
- Workload Security (pods, containers, images)
- Runtime Security (monitoring, incident response)
Cluster Infrastructure Hardening
Node Security Configuration
Start with hardened node images and proper configuration:
```bash
# CIS Kubernetes Benchmark automated checks
curl -sSL https://github.com/aquasecurity/kube-bench/releases/latest/download/kube-bench_linux_amd64.tar.gz | tar xz
./kube-bench --config-dir cfg/ --config cfg/config.yaml

# Network security - disable unnecessary services
systemctl disable --now cups
systemctl disable --now bluetooth
systemctl disable --now avahi-daemon

# Kernel settings: ip_forward and bridge-nf-call-iptables are required by
# Kubernetes networking; kptr_restrict hides kernel pointers from userspace
echo 'net.ipv4.ip_forward = 1' >> /etc/sysctl.conf
echo 'net.bridge.bridge-nf-call-iptables = 1' >> /etc/sysctl.conf
echo 'kernel.kptr_restrict = 2' >> /etc/sysctl.conf
sysctl -p
```
etcd Security Best Practices
Protect the brain of your cluster:
```yaml
# etcd TLS configuration
apiVersion: v1
kind: Pod
metadata:
  name: etcd
spec:
  containers:
  - name: etcd
    image: registry.k8s.io/etcd:3.5.1-0
    command:
    - etcd
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --client-cert-auth=true
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --peer-client-cert-auth=true
    - --auto-tls=false
    - --peer-auto-tls=false
```
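Once TLS is wired up, a CI check can assert that the static-pod manifest actually carries these flags. Here's a minimal sketch in Python; the flag names match the manifest above, and loading the YAML and extracting the `command` list is left to the caller:

```python
# Flags every etcd static-pod manifest should carry to enforce mutual TLS.
REQUIRED_FLAGS = {
    "--client-cert-auth=true",
    "--peer-client-cert-auth=true",
    "--auto-tls=false",
    "--peer-auto-tls=false",
}
# Flags whose value varies per cluster, so we only check the prefix.
REQUIRED_PREFIXES = ("--cert-file=", "--key-file=", "--trusted-ca-file=")

def missing_etcd_tls_flags(command):
    """Return the TLS-related flags absent from an etcd command list."""
    missing = [f for f in sorted(REQUIRED_FLAGS) if f not in command]
    missing += [p + "..." for p in REQUIRED_PREFIXES
                if not any(arg.startswith(p) for arg in command)]
    return missing
```

Wire it into CI so an empty return value is the pass condition; anything else fails the pipeline with the list of missing flags.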
API Server Hardening
Robust RBAC Configuration
Implement least privilege access from day one:
```yaml
# Example: Developer role with limited permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: development
  name: developer
rules:
- apiGroups: [""]
  resources: ["pods", "services", "configmaps", "secrets"]
  verbs: ["get", "list", "create", "update", "patch", "delete"]
- apiGroups: ["apps"]
  resources: ["deployments", "replicasets"]
  verbs: ["get", "list", "create", "update", "patch", "delete"]
- apiGroups: [""]
  resources: ["pods/exec", "pods/portforward"]
  verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developer-binding
  namespace: development
subjects:
- kind: User
  name: jane.developer
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer
  apiGroup: rbac.authorization.k8s.io
```
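Scoped roles like this tend to accumulate `*` wildcards over time. A pre-merge check along these lines (a sketch operating on the parsed `rules` list of a Role or ClusterRole) keeps grants explicit:

```python
def overly_broad_rules(rules):
    """Flag RBAC rules that use '*' in verbs, resources, or apiGroups."""
    findings = []
    for i, rule in enumerate(rules):
        for field in ("verbs", "resources", "apiGroups"):
            if "*" in rule.get(field, []):
                findings.append(f"rule[{i}]: wildcard in {field}")
    return findings
```

Run it over every RBAC manifest in the repo before `kubectl apply`; any finding should block the merge and force a reviewer to justify the wildcard.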
Admission Controllers Configuration
Enable security-focused admission controllers:
```yaml
# API server configuration
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
spec:
  containers:
  - name: kube-apiserver
    image: registry.k8s.io/kube-apiserver:v1.28.0
    command:
    - kube-apiserver
    # PodSecurityPolicy was removed in v1.25; the built-in PodSecurity
    # admission plugin (enabled by default) replaces it.
    - --enable-admission-plugins=NodeRestriction,ResourceQuota,LimitRanger,AlwaysPullImages
    - --audit-log-path=/var/log/audit.log
    - --audit-log-maxage=30
    - --audit-log-maxbackup=10
    - --audit-log-maxsize=100
    - --audit-policy-file=/etc/kubernetes/audit-policy.yaml
```
Comprehensive Audit Policy
Track everything that matters:
```yaml
# /etc/kubernetes/audit-policy.yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# Log all security-sensitive operations at Metadata level
- level: Metadata
  namespaces: ["kube-system", "kube-public"]
  verbs: ["create", "update", "patch", "delete"]
# Log all secret operations
- level: RequestResponse
  resources:
  - group: ""
    resources: ["secrets"]
# Log RBAC changes
- level: RequestResponse
  resources:
  - group: "rbac.authorization.k8s.io"
# Log pod exec and portforward
- level: Request
  resources:
  - group: ""
    resources: ["pods/exec", "pods/portforward"]
# Log everything else at Metadata level
- level: Metadata
  omitStages:
  - RequestReceived
```
Pod Security Standards Implementation
PodSecurityPolicy was deprecated in v1.21 and removed in v1.25. Replace it with Pod Security Standards, enforced via namespace labels:
```yaml
# Namespace with restricted security profile
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```
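Pod Security admission only enforces where these labels exist, so it's worth failing CI when a namespace is missing them. A small sketch, taking namespace objects as parsed dicts (e.g. from `kubectl get ns -o json`):

```python
# The three Pod Security admission modes set via namespace labels.
PSS_MODES = ("enforce", "audit", "warn")

def unlabeled_namespaces(namespaces, level="restricted"):
    """Return names of namespaces missing any pod-security mode label."""
    missing = []
    for ns in namespaces:
        labels = ns.get("metadata", {}).get("labels") or {}
        if any(labels.get(f"pod-security.kubernetes.io/{m}") != level
               for m in PSS_MODES):
            missing.append(ns["metadata"]["name"])
    return missing
```

In practice you'd exempt system namespaces like `kube-system` from the check; the sketch treats every namespace uniformly.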
Secure Pod Configuration Template
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: secure-app
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001
    runAsGroup: 10001
    fsGroup: 10001
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: myapp:v1.0.0
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      runAsNonRoot: true
      runAsUser: 10001
      capabilities:
        drop:
        - ALL
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
    volumeMounts:
    - name: tmp
      mountPath: /tmp
    - name: cache
      mountPath: /app/cache
  volumes:
  - name: tmp
    emptyDir: {}
  - name: cache
    emptyDir: {}
```
Network Security Implementation
NetworkPolicies for Microsegmentation
Implement zero-trust networking:
```yaml
# Default deny-all policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
---
# Allow specific communication patterns
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-to-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api-server
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: web-frontend
    ports:
    - protocol: TCP
      port: 8080
```
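Reasoning about which pods a policy pair actually selects is error-prone. The toy evaluator below mirrors how the default-deny and allow policies above combine; it's a sketch that handles only `matchLabels` equality within one namespace, ignoring `matchExpressions`, ports, and namespace selectors:

```python
def selector_matches(match_labels, pod_labels):
    """True if every key/value in matchLabels is present on the pod."""
    return all(pod_labels.get(k) == v for k, v in match_labels.items())

def ingress_allowed(policies, dst_labels, src_labels):
    """Once any Ingress policy selects the destination pod, traffic is
    denied unless some selecting policy has a 'from' rule matching the
    source pod's labels."""
    selected = [p for p in policies
                if "Ingress" in p["policyTypes"]
                and selector_matches(p["podSelector"].get("matchLabels", {}),
                                     dst_labels)]
    if not selected:
        return True  # no policy selects the pod: Kubernetes default-allows
    for policy in selected:
        for rule in policy.get("ingress", []):
            for peer in rule.get("from", []):
                if selector_matches(peer["podSelector"]["matchLabels"],
                                    src_labels):
                    return True
    return False
```

Note the asymmetry this makes visible: the deny-all policy (empty `podSelector`) selects every pod, so any pod without a matching allow rule loses all ingress, not just the labeled ones.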
Service Mesh Security with Istio
Implement mTLS and fine-grained access control:
```yaml
# Enable strict mTLS
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT
---
# Authorization policy: only the web-frontend service account may call the API
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: api-access
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-server
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/production/sa/web-frontend"]
    to:
    - operation:
        methods: ["GET", "POST"]
        paths: ["/api/v1/*"]
```
Container and Image Security
Image Security Scanning Pipeline
Integrate security scanning into your CI/CD:
```bash
#!/bin/bash
# Image security scanning script
IMAGE_NAME=$1
SEVERITY_THRESHOLD="HIGH"

# Trivy scanning
trivy image --severity "${SEVERITY_THRESHOLD},CRITICAL" --exit-code 1 "${IMAGE_NAME}"
if [ $? -ne 0 ]; then
    echo "Image failed security scan with ${SEVERITY_THRESHOLD} or CRITICAL vulnerabilities"
    exit 1
fi

# Cosign image signing verification
cosign verify --key cosign.pub "${IMAGE_NAME}"
if [ $? -ne 0 ]; then
    echo "Image signature verification failed"
    exit 1
fi

echo "Image passed security checks"
```
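An alternative to gating on Trivy's exit code is parsing its JSON report (`trivy image --format json -o report.json IMAGE`) and counting findings yourself, which lets you report per-severity totals in the pipeline log. A sketch, assuming the `Results`/`Vulnerabilities` layout Trivy currently emits (pin your Trivy version and verify the schema):

```python
def count_by_severity(report, severities=("HIGH", "CRITICAL")):
    """Count vulnerabilities in a Trivy JSON report at the given severities."""
    counts = {s: 0 for s in severities}
    for result in report.get("Results", []):
        # Trivy emits null instead of [] when a target has no findings
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in counts:
                counts[vuln["Severity"]] += 1
    return counts
```

Fail the build when any count is nonzero, and log the full breakdown so developers see what tripped the gate.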
Distroless Container Best Practices
Use minimal base images:
```dockerfile
# Multi-stage build with distroless final image
FROM golang:1.19-alpine AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -a -ldflags '-extldflags "-static"' -o main .

FROM gcr.io/distroless/static-debian11
COPY --from=builder /app/main /
EXPOSE 8080
USER 10001
ENTRYPOINT ["/main"]
```
Runtime Security Monitoring
Falco Rules for Runtime Threat Detection
Deploy Falco for runtime security monitoring:
```yaml
# Custom Falco rules
- rule: Unexpected K8s NodePort Connection
  desc: Detect attempts to connect to K8s NodePort services
  condition: >
    (inbound_outbound) and fd.sport >= 30000 and fd.sport <= 32767
    and not proc.name in (kube-proxy, kubelet)
  output: >
    Unexpected K8s NodePort connection (connection=%fd.name sport=%fd.sport
    dport=%fd.dport proc=%proc.name command=%proc.cmdline)
  priority: WARNING

- rule: Detect crypto mining
  desc: Detect cryptocurrency mining activities
  condition: >
    spawned_process and
    (proc.name in (xmrig, minergate, ccminer, cgminer) or
     proc.cmdline contains "stratum+tcp" or
     proc.cmdline contains "mining.pool")
  output: >
    Crypto mining process detected (user=%user.name command=%proc.cmdline)
  priority: CRITICAL
```
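With `json_output: true` in falco.yaml, Falco emits one JSON event per line, which makes triage scriptable. A minimal filter that keeps only page-worthy events might look like this (a sketch; the `priority` and `output` field names follow Falco's JSON event format):

```python
import json

# Falco priorities that should wake someone up, per its severity ladder.
ALERT_PRIORITIES = {"EMERGENCY", "ALERT", "CRITICAL"}

def high_priority_events(lines):
    """Yield parsed Falco events whose priority warrants immediate action."""
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON log noise interleaved in the stream
        if event.get("priority", "").upper() in ALERT_PRIORITIES:
            yield event
```

Pipe Falco's stdout (or its log file) through this and forward the surviving events to your pager or SIEM; WARNING-level noise like the NodePort rule stays in the log for later review.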
OPA Gatekeeper Policies
Implement policy-as-code with Gatekeeper:
```yaml
# Require security context
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8srequiredsecuritycontext
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredSecurityContext
      validation:
        openAPIV3Schema:
          type: object
  targets:
  - target: admission.k8s.gatekeeper.sh
    rego: |
      package k8srequiredsecuritycontext

      violation[{"msg": msg}] {
        container := input.review.object.spec.containers[_]
        not container.securityContext.runAsNonRoot
        msg := "Container must run as non-root user"
      }

      violation[{"msg": msg}] {
        container := input.review.object.spec.containers[_]
        container.securityContext.allowPrivilegeEscalation != false
        msg := "Container must not allow privilege escalation"
      }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredSecurityContext
metadata:
  name: must-have-security-context
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
    namespaces: ["production", "staging"]
```
Secrets Management Strategy
External Secrets Operator Configuration
Keep plaintext secrets out of Git and your manifests by sourcing them from an external secrets manager, syncing into the cluster only what workloads actually need:
```yaml
# External Secrets Operator with AWS Secrets Manager
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secrets-manager
  namespace: production
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-west-2
      auth:
        secretRef:
          accessKeyIDSecretRef:
            name: awssm-secret
            key: access-key
          secretAccessKeySecretRef:
            name: awssm-secret
            key: secret-access-key
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: database-credentials
  namespace: production
spec:
  refreshInterval: 15s
  secretStoreRef:
    name: aws-secrets-manager
    kind: SecretStore
  target:
    name: db-credentials
    creationPolicy: Owner
  data:
  - secretKey: username
    remoteRef:
      key: prod/database
      property: username
  - secretKey: password
    remoteRef:
      key: prod/database
      property: password
```
Automated Security Testing
Kubernetes Security Testing Script
```python
#!/usr/bin/env python3
"""
Kubernetes Security Assessment Script
"""
import subprocess
import json
import sys
from typing import List, Dict


class K8sSecurityChecker:
    def __init__(self):
        self.results = []

    def run_kube_bench(self) -> Dict:
        """Run CIS Kubernetes Benchmark checks"""
        try:
            result = subprocess.run(
                ['kube-bench', '--json'],
                capture_output=True, text=True, check=True
            )
            return json.loads(result.stdout)
        except subprocess.CalledProcessError:
            return {"error": "kube-bench failed"}

    def check_rbac_permissions(self) -> List[Dict]:
        """Check for overly permissive RBAC"""
        dangerous_permissions = []
        # Check for cluster-admin bindings
        try:
            result = subprocess.run([
                'kubectl', 'get', 'clusterrolebindings', '-o', 'json'
            ], capture_output=True, text=True, check=True)
            bindings = json.loads(result.stdout)
            for binding in bindings['items']:
                if binding['roleRef']['name'] == 'cluster-admin':
                    dangerous_permissions.append({
                        'type': 'cluster-admin-binding',
                        'name': binding['metadata']['name'],
                        'subjects': binding.get('subjects', [])
                    })
        except subprocess.CalledProcessError:
            pass
        return dangerous_permissions

    def check_pod_security_standards(self) -> List[Dict]:
        """Check Pod Security Standards compliance"""
        violations = []
        try:
            result = subprocess.run([
                'kubectl', 'get', 'pods', '--all-namespaces', '-o', 'json'
            ], capture_output=True, text=True, check=True)
            pods = json.loads(result.stdout)
            for pod in pods['items']:
                issues = self._analyze_pod_security(pod)
                if issues:
                    violations.append({
                        'pod': f"{pod['metadata']['namespace']}/{pod['metadata']['name']}",
                        'issues': issues
                    })
        except subprocess.CalledProcessError:
            pass
        return violations

    def _analyze_pod_security(self, pod: Dict) -> List[str]:
        """Analyze individual pod for security issues"""
        issues = []
        spec = pod.get('spec', {})
        # Check if running as root
        if not spec.get('securityContext', {}).get('runAsNonRoot'):
            issues.append("Pod may be running as root")
        # Check containers
        for container in spec.get('containers', []):
            sec_ctx = container.get('securityContext', {})
            if sec_ctx.get('privileged'):
                issues.append(f"Container {container['name']} is privileged")
            if sec_ctx.get('allowPrivilegeEscalation', True):
                issues.append(f"Container {container['name']} allows privilege escalation")
        return issues

    def generate_report(self) -> str:
        """Generate comprehensive security report"""
        print("Running Kubernetes Security Assessment...")
        # Run checks
        cis_results = self.run_kube_bench()
        rbac_issues = self.check_rbac_permissions()
        pod_violations = self.check_pod_security_standards()
        report = f"""
Kubernetes Security Assessment Report
=====================================

CIS Benchmark Results:
{json.dumps(cis_results, indent=2)}

RBAC Issues Found: {len(rbac_issues)}
{json.dumps(rbac_issues, indent=2)}

Pod Security Violations: {len(pod_violations)}
{json.dumps(pod_violations, indent=2)}

Recommendations:
- Review and remediate CIS benchmark failures
- Implement least-privilege RBAC policies
- Enable Pod Security Standards
- Regular security scanning and monitoring
"""
        return report


if __name__ == "__main__":
    checker = K8sSecurityChecker()
    report = checker.generate_report()
    print(report)
    # Exit with error if critical issues found
    if "FAIL" in report or "privileged" in report:
        sys.exit(1)
```
Production Deployment Checklist
Pre-Deployment Security Validation
```bash
#!/bin/bash
# Pre-deployment security checklist
echo "🔒 Running Kubernetes Security Pre-Deployment Checks..."

# 1. Check for security contexts
kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.securityContext.runAsNonRoot}{"\n"}{end}' | grep -v "true"

# 2. Validate network policies exist
kubectl get networkpolicies --all-namespaces

# 3. Check for resource limits
kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].resources.limits}{"\n"}{end}' | grep -v "map"

# 4. Verify image signatures
for image in $(kubectl get pods --all-namespaces -o jsonpath='{.items[*].spec.containers[*].image}' | tr ' ' '\n' | sort -u); do
    echo "Checking signature for $image"
    cosign verify --key cosign.pub "$image" || echo "❌ No valid signature for $image"
done

# 5. Run security policy checks
gatekeeper-policy-manager audit

echo "✅ Security checks completed"
```
Monitoring and Incident Response
Security Metrics to Track
```yaml
# Prometheus monitoring rules
groups:
- name: kubernetes-security
  rules:
  - alert: UnauthorizedAPIAccess
    expr: increase(apiserver_audit_total{verb="create",objectRef_resource="pods/exec"}[5m]) > 0
    labels:
      severity: critical
    annotations:
      summary: "Unauthorized pod exec detected"
  - alert: PrivilegedPodCreated
    expr: increase(kube_pod_container_info{container_security_context_privileged="true"}[5m]) > 0
    labels:
      severity: high
    annotations:
      summary: "Privileged pod created"
  - alert: FailedRBACCheck
    expr: increase(apiserver_audit_total{verb="create",objectRef_resource="rolebindings",response_code!~"2.."}[5m]) > 3
    labels:
      severity: warning
    annotations:
      summary: "Multiple failed RBAC operations detected"
```
Wrapping Up
Kubernetes security isn't a one-time setup—it's an ongoing process. Start with these fundamentals:
- Harden the infrastructure before deploying workloads
- Implement defense in depth across all layers
- Automate security testing in your CI/CD pipeline
- Monitor continuously and respond quickly to threats
- Keep learning as the threat landscape evolves
Remember: security is a journey, not a destination. The key is building security into your processes from the beginning rather than bolting it on later.
Looking for more DevSecOps insights? Subscribe to my newsletter for weekly deep-dives into cloud security, automation, and real-world war stories from the trenches.