Incident Response Playbook: Lessons from Real Cyber Attacks

After responding to dozens of security incidents—from ransomware attacks to APT campaigns—I've learned that good incident response isn't just about having the right tools. It's about having the right processes, mindset, and ability to stay calm under pressure.

The Reality of Incident Response

The First 30 Minutes Are Critical

When the SOC calls you at 3 AM with "We think we have a breach," what you do in the next 30 minutes often determines whether you're dealing with a minor incident or a company-ending catastrophe.

Here's my battle-tested approach:

#!/bin/bash
# Immediate triage script I keep ready (the shebang must be the first line)
echo "=== INCIDENT RESPONSE TRIAGE ==="
echo "Time: $(date)"
echo "Analyst: $USER"

# Quick system status
echo -e "\n=== SYSTEM STATUS ==="
uptime
df -h
free -m

# Check for obvious signs of compromise
echo -e "\n=== PROCESS ANALYSIS ==="
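ps aux | grep -E "(crypto|miner|xmrig|monero)" | grep -v grep
# Count established connections (-l would list only listening sockets, so never ESTABLISHED)
echo "Established connections: $(netstat -tunp 2>/dev/null | grep -c ESTABLISHED)"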

# Check recent logins
echo -e "\n=== RECENT LOGINS ==="
last | head -10

# Look for new files in common attack locations
echo -e "\n=== RECENT FILE CHANGES ==="
find /tmp /var/tmp /dev/shm -type f -mtime -1 2>/dev/null | head -20
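The `find` line above lists anything touched in the last day. A Python sketch of the same check can also flag files that carry an execute bit, since a fresh executable in `/tmp` or `/dev/shm` is a stronger signal than a stray temp file. The directories and 24-hour window mirror the script; treating the execute bit as the key signal is an assumption.

```python
import os
import stat
import time

def recent_files(dirs=("/tmp", "/var/tmp", "/dev/shm"), max_age_hours=24):
    """Return (path, is_executable) for regular files modified within the window."""
    cutoff = time.time() - max_age_hours * 3600
    hits = []
    for top in dirs:
        for dirpath, _dirnames, filenames in os.walk(top):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    st = os.lstat(path)  # lstat: do not follow symlinks
                except OSError:
                    continue  # file vanished or is unreadable
                if stat.S_ISREG(st.st_mode) and st.st_mtime >= cutoff:
                    executable = bool(st.st_mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
                    hits.append((path, executable))
    # Executables first, then alphabetical
    return sorted(hits, key=lambda h: (not h[1], h[0]))

if __name__ == "__main__":
    for path, executable in recent_files():
        print(("[EXEC] " if executable else "       ") + path)
```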

The NIST Framework in Practice

Preparation: Building Your Arsenal

Don't wait for an incident to start building your toolkit. Here's what I keep ready:

#!/usr/bin/env python3
"""
Incident Response Toolkit
Quick deployment script for IR tools (the apt-get installs need root)
"""
import shlex
import shutil
import subprocess

class IRToolkit:
    def __init__(self):
        # tool -> (description, executables that prove it is installed).
        # Several packages ship binaries whose names differ from the package name.
        self.tools = {
            'volatility': ('Memory analysis', ['vol', 'vol.py']),
            'plaso': ('Timeline analysis', ['log2timeline.py', 'log2timeline']),
            'bulk_extractor': ('Evidence extraction', ['bulk_extractor']),
            'sleuthkit': ('File system analysis', ['fls', 'mmls']),
            'yara': ('Malware detection', ['yara']),
            'clamav': ('Antivirus scanning', ['clamscan'])
        }
    
    def deploy_tools(self):
        """Deploy essential IR tools quickly"""
        print("🚀 Deploying Incident Response Toolkit...")
        
        for tool, (description, binaries) in self.tools.items():
            if self.is_tool_available(binaries):
                print(f"✅ {tool} - {description}")
            else:
                print(f"❌ {tool} - Installing...")
                self.install_tool(tool)
    
    def is_tool_available(self, binaries):
        # shutil.which searches PATH without spawning a `which` process
        return any(shutil.which(binary) for binary in binaries)
    
    def install_tool(self, tool):
        # Package names vary by distribution and repository; adjust as needed
        install_commands = {
            'volatility': 'pip3 install volatility3',
            'plaso': 'apt-get install -y plaso-tools',
            'bulk_extractor': 'apt-get install -y bulk-extractor',
            'sleuthkit': 'apt-get install -y sleuthkit',
            'yara': 'apt-get install -y yara',
            'clamav': 'apt-get install -y clamav clamav-daemon'
        }
        
        if tool in install_commands:
            # Split into an argument list so no shell is involved
            result = subprocess.run(shlex.split(install_commands[tool]))
            if result.returncode != 0:
                print(f"⚠️  Installing {tool} failed (exit code {result.returncode})")

if __name__ == "__main__":
    toolkit = IRToolkit()
    toolkit.deploy_tools()

Detection and Analysis: Finding the Needle

Memory Analysis Workflow

When you suspect active malware, memory is your best friend: injected code, fileless payloads and live network connections may exist only in RAM.

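# Memory acquisition and analysis workflow (Volatility 3)
# `vol` is the command installed by `pip3 install volatility3`; a git checkout uses vol.py

# 1. Acquire memory on the suspect Windows host with a dedicated tool such as
#    WinPmem, DumpIt or Magnet RAM Capture, e.g. (binary name varies by release):
#      winpmem_mini_x64_rc2.exe memory.dump
#    dd from /dev/mem does not work here: /dev/mem exists only on Linux and modern
#    kernels restrict it. For Linux hosts, use LiME or AVML instead.

# 2. Identify the OS build (Volatility 3 selects symbol tables automatically; there is no --profile)
vol -f /case/memory.dump windows.info

# 3. Hunt for malicious processes
vol -f /case/memory.dump windows.pslist
vol -f /case/memory.dump windows.pstree

# 4. Check network connections
vol -f /case/memory.dump windows.netscan

# 5. Look for code injection
vol -f /case/memory.dump windows.malfind
# hollowfind is a Volatility 2 community plugin; recent Volatility 3 releases
# ship windows.hollowprocesses for process hollowing instead
vol -f /case/memory.dump windows.hollowprocesses

# 6. Extract suspicious processes (-o sets the output directory for dumped files)
vol -f /case/memory.dump -o /case/extracted/ windows.memmap --pid 1234 --dump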
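In the case study below, the EDR flagged `powershell.exe -> cmd.exe -> cipher.exe`. Once `windows.pslist` output is exported (Volatility 3 can render JSON with `-r json`), a short script can flag parent/child pairs worth a closer look. This is a sketch: the row keys `PID`, `PPID` and `ImageFileName` match Volatility 3's pslist columns as far as I know, and the `SUSPICIOUS_PAIRS` set is a hypothetical starting point, not a vetted detection rule set. Windows truncates `ImageFileName` to 15 characters, so long names need prefix matching.

```python
# Hypothetical rule list: (parent, child) image names that deserve a second look
SUSPICIOUS_PAIRS = {
    ("winword.exe", "powershell.exe"),
    ("excel.exe", "powershell.exe"),
    ("outlook.exe", "cmd.exe"),
    ("powershell.exe", "cmd.exe"),
    ("cmd.exe", "cipher.exe"),
}

def flag_process_pairs(rows):
    """Return (ppid, parent, pid, child) for rows whose parent/child pair is suspicious.

    `rows` is a list of dicts with PID, PPID and ImageFileName keys,
    e.g. parsed from `vol -r json -f ... windows.pslist` output.
    """
    names = {row["PID"]: row["ImageFileName"].lower() for row in rows}
    findings = []
    for row in rows:
        parent = names.get(row["PPID"])
        child = row["ImageFileName"].lower()
        if parent and (parent, child) in SUSPICIOUS_PAIRS:
            findings.append((row["PPID"], parent, row["PID"], child))
    return findings

# Illustrative rows shaped like the process tree from the case study
sample = [
    {"PID": 4, "PPID": 0, "ImageFileName": "System"},
    {"PID": 3100, "PPID": 2000, "ImageFileName": "powershell.exe"},
    {"PID": 3200, "PPID": 3100, "ImageFileName": "cmd.exe"},
    {"PID": 3300, "PPID": 3200, "ImageFileName": "cipher.exe"},
]
for ppid, parent, pid, child in flag_process_pairs(sample):
    print(f"{parent} ({ppid}) -> {child} ({pid})")
```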

Log Analysis Automation

Time is critical during incidents. Here's my log analysis automation:

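import re
from collections import Counter
from datetime import datetime, timedelta, timezone
from urllib.parse import unquote

class LogAnalyzer:
    def __init__(self, log_file):
        self.log_file = log_file
        # Heuristic patterns, matched against the URL-decoded log line
        self.suspicious_patterns = {
            'sql_injection': r'(union|select|drop|insert|update|delete).*from',
            'xss_attempt': r'<script|javascript:|vbscript:|onload|onerror',
            'directory_traversal': r'\.\.[\\/]',
            'command_injection': r'[;&|`]\s*(cat|ls|pwd|whoami|id|uname)',
            # Match 401/403 only in the status field, right after the quoted request,
            # so byte counts or timestamps containing "401" do not trigger it
            'brute_force': r'"\s(401|403)\s',
        }
    
    def analyze_timeframe(self, hours_back=24):
        """Analyze logs for the last N hours"""
        # Log timestamps carry a UTC offset (%z), so the cutoff must be timezone-aware;
        # comparing naive and aware datetimes raises TypeError
        cutoff_time = datetime.now(timezone.utc) - timedelta(hours=hours_back)
        suspicious_events = []
        
        with open(self.log_file, 'r', errors='replace') as f:
            for line in f:
                event = self.parse_log_line(line)
                if event and event['timestamp'] > cutoff_time:
                    # Decode %xx escapes so payloads like %3Cscript%3E still match
                    threats = self.check_threats(unquote(event['message']))
                    if threats:
                        event['threats'] = threats
                        suspicious_events.append(event)
        
        return suspicious_events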
    
    def parse_log_line(self, line):
        """Parse common log formats"""
        # Apache/Nginx combined log format
        pattern = r'(\S+) \S+ \S+ \[([\w:/]+\s[+\-]\d{4})\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+|-) "([^"]*)" "([^"]*)"'
        match = re.match(pattern, line)
        
        if match:
            return {
                'ip': match.group(1),
                'timestamp': datetime.strptime(match.group(2), '%d/%b/%Y:%H:%M:%S %z'),
                'method': match.group(3),
                'uri': match.group(4),
                'status': int(match.group(6)),
                'size': match.group(7),
                'referer': match.group(8),
                'user_agent': match.group(9),
                'message': line
            }
        return None
    
    def check_threats(self, message):
        """Check for threat indicators"""
        threats = []
        for threat_name, pattern in self.suspicious_patterns.items():
            if re.search(pattern, message, re.IGNORECASE):
                threats.append(threat_name)
        return threats
    
    def generate_ioc_report(self, events):
        """Generate IOC report from suspicious events"""
        ips = Counter([event['ip'] for event in events])
        user_agents = Counter([event['user_agent'] for event in events])
        threats = Counter([threat for event in events for threat in event.get('threats', [])])
        
        report = f"""
INCIDENT RESPONSE - IOC ANALYSIS
===============================
Analysis Time: {datetime.now()}
Events Analyzed: {len(events)}

TOP SUSPICIOUS IPs:
{self.format_counter(ips, 10)}

TOP THREAT TYPES:
{self.format_counter(threats, 5)}

TOP SUSPICIOUS USER AGENTS:
{self.format_counter(user_agents, 5)}

RECOMMENDATIONS:
- Block top suspicious IPs at firewall level
- Investigate source of threat patterns
- Review authentication logs for these IPs
- Check for lateral movement from these sources
"""
        return report
    
    def format_counter(self, counter, top_n):
        """Format counter for reporting"""
        result = ""
        for item, count in counter.most_common(top_n):
            result += f"  {item}: {count}\n"
        return result
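Matching `401`/`403` line by line tells you that failures happened, not that someone is guessing passwords. A common refinement is to count failures per source IP inside a sliding time window and flag IPs that cross a threshold. A sketch, assuming events shaped like the `parse_log_line` output above (`ip`, `timestamp`, `status`); the 10-failures-in-5-minutes threshold is an arbitrary example to tune.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

def brute_force_ips(events, threshold=10, window=timedelta(minutes=5)):
    """Return IPs with at least `threshold` 401/403 responses inside any `window`."""
    recent = defaultdict(deque)  # ip -> timestamps of failures still inside the window
    flagged = set()
    for event in sorted(events, key=lambda e: e["timestamp"]):
        if event["status"] not in (401, 403):
            continue
        times = recent[event["ip"]]
        times.append(event["timestamp"])
        # Drop failures that have slid out of the window
        while times[0] < event["timestamp"] - window:
            times.popleft()
        if len(times) >= threshold:
            flagged.add(event["ip"])
    return flagged

# Illustrative data: one fast guesser, one slow and steady client
start = datetime(2023, 11, 15, 1, 0, 0)
events = [{"ip": "203.0.113.7", "timestamp": start + timedelta(seconds=10 * i), "status": 401}
          for i in range(12)]
events += [{"ip": "198.51.100.2", "timestamp": start + timedelta(minutes=10 * i), "status": 401}
           for i in range(12)]
print(brute_force_ips(events))  # → {'203.0.113.7'}
```

A per-IP window catches fast guessing but not low-and-slow attacks spread across many addresses; counting failures per targeted account is the usual complement.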

Containment: Stop the Bleeding

Network Isolation Script

When you need to isolate a compromised system quickly:

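#!/bin/bash
# Emergency network isolation script
# Usage: ./isolate.sh <hostname_or_ip> [c2_domain]
# Run as root on the firewall/router that forwards the target's traffic.

TARGET=$1
C2_DOMAIN=$2                  # Optional: C2 domain to sinkhole in DNS
ISOLATION_VLAN=999            # Quarantine VLAN
SWITCH_IP=${SWITCH_IP:-}      # Placeholder: switch management address (switch method only)
SWITCH_PORT=${SWITCH_PORT:-}  # Placeholder: bridge port index of the target (switch method only)

if [ -z "$TARGET" ]; then
    echo "Usage: $0 <hostname_or_ip> [c2_domain]"
    exit 1
fi

# The management network must be supplied by the environment
: "${MANAGEMENT_SUBNET:?Set MANAGEMENT_SUBNET (e.g. 10.0.99.0/24)}"

if [ "$EUID" -ne 0 ]; then
    echo "Run as root: iptables, rndc and the log file require it."
    exit 1
fi

echo "🚨 EMERGENCY ISOLATION INITIATED FOR: $TARGET"
echo "Time: $(date)"

# Log the isolation action
echo "$(date) - ISOLATION: $TARGET isolated by $USER" >> /var/log/incident-response.log

# Method 1: Switch-based isolation (if you have management access)
isolate_via_switch() {
    echo "Attempting switch-based isolation..."
    
    # Look up the MAC address (NR>1 skips the header line of `arp -n`)
    MAC=$(arp -n "$TARGET" | awk 'NR>1 {print $3}')
    
    if [ -n "$MAC" ] && [ -n "$SWITCH_IP" ] && [ -n "$SWITCH_PORT" ]; then
        echo "MAC Address found: $MAC"
        
        # Set the port's PVID (Q-BRIDGE-MIB dot1qPvid, an Unsigned32) to the quarantine VLAN.
        # Mapping the MAC to SWITCH_PORT is vendor-specific; look it up on the switch first.
        # (Replace with your actual switch management commands and community string)
        if snmpset -v2c -c private "$SWITCH_IP" "1.3.6.1.2.1.17.7.1.4.5.1.1.$SWITCH_PORT" u "$ISOLATION_VLAN"; then
            echo "✅ Moved $TARGET to quarantine VLAN $ISOLATION_VLAN"
        fi
    else
        echo "⚠️  Skipping switch isolation: need MAC, SWITCH_IP and SWITCH_PORT"
    fi
}

# Method 2: Firewall rule isolation (only affects traffic routed through this host)
isolate_via_firewall() {
    echo "Creating firewall isolation rules..."
    
    # Block all traffic to/from the target
    iptables -I FORWARD -s "$TARGET" -j DROP
    iptables -I FORWARD -d "$TARGET" -j DROP
    
    # Allow only SSH from the management subnet to the target, plus its replies.
    # -I inserts at the top of the chain, so these land above the DROP rules.
    iptables -I FORWARD -s "$MANAGEMENT_SUBNET" -d "$TARGET" -p tcp --dport 22 -j ACCEPT
    iptables -I FORWARD -s "$TARGET" -d "$MANAGEMENT_SUBNET" -p tcp --sport 22 \
        -m conntrack --ctstate ESTABLISHED -j ACCEPT
    
    echo "✅ Firewall isolation rules applied"
}

# Method 3: DNS sinkhole for the malware's C2 domain
# Assumes /etc/bind/blackhole.zone is loaded by BIND as a response policy zone (RPZ)
isolate_dns() {
    echo "Sinkholing C2 domain $1..."
    
    # In an RPZ, "CNAME ." makes the resolver answer NXDOMAIN for the name
    echo "$1 CNAME ." >> /etc/bind/blackhole.zone
    echo "*.$1 CNAME ." >> /etc/bind/blackhole.zone
    # Also bump the zone's SOA serial if secondaries must pick up the change
    rndc reload
    
    echo "✅ DNS sinkhole updated"
}

# Execute isolation methods
isolate_via_firewall
# isolate_via_switch   # enable once SWITCH_IP and SWITCH_PORT are set
if [ -n "$C2_DOMAIN" ]; then
    isolate_dns "$C2_DOMAIN"
fi

# Notify incident response team
echo "📧 Notifying incident response team..."
echo "URGENT: System $TARGET has been isolated due to security incident. Isolation completed at $(date)" | \
    mail -s "SECURITY INCIDENT: System Isolated" ir-team@company.com

echo "🔒 Isolation complete. System $TARGET is quarantined."
echo "📋 Next steps:"
echo "  1. Preserve system for forensic analysis"
echo "  2. Begin malware analysis"
echo "  3. Check for lateral movement"
echo "  4. Update incident documentation"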
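Because isolation rules are disruptive, it can help to generate the commands in a dry run and have a second analyst review them before anything executes. A sketch of that idea: `ipaddress` rejects malformed input before it can reach a shell, and the function returns FORWARD rules that drop the host's traffic while keeping SSH from a management subnet. The function name and the management subnet are illustrative assumptions.

```python
import ipaddress
import shlex

def isolation_commands(target, management_subnet):
    """Build (but do not run) iptables commands that isolate `target`."""
    host = ipaddress.ip_address(target)            # ValueError on anything but a bare IP
    mgmt = ipaddress.ip_network(management_subnet)  # e.g. "10.0.99.0/24"
    if host.version != mgmt.version:
        raise ValueError("target and management subnet must be the same IP version")
    binary = "iptables" if host.version == 4 else "ip6tables"
    t, m = str(host), str(mgmt)
    rules = [
        # Each rule is inserted with -I at the top of FORWARD, so the two ACCEPT
        # rules, added last, are evaluated before the DROP rules
        ["-s", t, "-j", "DROP"],
        ["-d", t, "-j", "DROP"],
        ["-s", m, "-d", t, "-p", "tcp", "--dport", "22", "-j", "ACCEPT"],
        ["-s", t, "-d", m, "-p", "tcp", "--sport", "22",
         "-m", "conntrack", "--ctstate", "ESTABLISHED", "-j", "ACCEPT"],
    ]
    return [shlex.join([binary, "-I", "FORWARD", *rule]) for rule in rules]

for cmd in isolation_commands("192.0.2.15", "10.0.99.0/24"):
    print(cmd)
```

After review, each line can be executed in order, for example with `subprocess.run(shlex.split(cmd), check=True)`.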

Case Study: Ransomware Response

The 2 AM Call

Last year, our monitoring system detected file encryption activity across multiple servers. Here's how we responded:

Initial Detection

# The alert that woke me up:
# "High volume of file modifications detected across 15 servers"

# First response - check what's happening
for server in $(cat affected_servers.txt); do
    echo "Checking $server..."
    ssh $server "find /home /opt /var -name '*.locked' -o -name '*.encrypted' | wc -l"
done

# Output showed hundreds of encrypted files - definitely ransomware
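Counting `*.locked` and `*.encrypted` files only works when the ransomware uses a known extension. A complementary check, sketched here, is byte entropy: encrypted data looks close to random (near 8 bits per byte), while most documents and text do not. Compressed formats (ZIP-based Office files, JPEG) also score high, so treat this as a triage signal only; the 7.5 threshold and the 64 KiB sample size are assumptions to tune.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Bits of entropy per byte: 0.0 for constant data, up to 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    # sum of p * log2(1/p) over every byte value that occurs
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())

def looks_encrypted(path, threshold=7.5, sample_size=64 * 1024):
    """Heuristic: read the start of a file and compare its entropy with the threshold."""
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    return shannon_entropy(sample) >= threshold

print(shannon_entropy(b"aaaa"))            # → 0.0
print(shannon_entropy(bytes(range(256))))  # → 8.0
```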

Immediate Actions

Document everything - Started incident log immediately
Isolate affected systems - Used automated isolation script
Preserve evidence - Created disk images of key systems
Activate incident response team - Called in the cavalry
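Starting the incident log immediately only pays off if the log can be trusted later. One lightweight approach, sketched here as an assumption rather than what we used, is an append-only log in which every entry stores the SHA-256 of the previous entry, so editing or deleting an earlier line breaks the chain. The JSON-lines layout and field names are illustrative; note that the newest entry is only protected once another one is appended.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _entry_hash(entry):
    # Hash a canonical JSON encoding so key order cannot change the digest
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()

def append_entry(path, analyst, action):
    """Append one UTC-timestamped entry chained to the previous one."""
    prev = GENESIS
    try:
        with open(path) as f:
            lines = f.read().splitlines()
        if lines:
            prev = _entry_hash(json.loads(lines[-1]))
    except FileNotFoundError:
        pass  # first entry of a new log
    entry = {
        "time": datetime.now(timezone.utc).isoformat(),
        "analyst": analyst,
        "action": action,
        "prev": prev,
    }
    with open(path, "a") as f:
        f.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry

def verify_log(path):
    """Return True if every entry's `prev` matches the hash of the entry before it."""
    prev = GENESIS
    with open(path) as f:
        for line in f:
            entry = json.loads(line)
            if entry["prev"] != prev:
                return False
            prev = _entry_hash(entry)
    return True
```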

Investigation Timeline

# Timeline reconstruction script
import json
from datetime import datetime

timeline_events = [
    {
        "time": "2023-11-15 01:47:22",
        "source": "SIEM",
        "event": "Unusual PowerShell execution detected on WEB01",
        "severity": "medium"
    },
    {
        "time": "2023-11-15 01:52:11", 
        "source": "EDR",
        "event": "Suspicious process tree: powershell.exe -> cmd.exe -> cipher.exe",
        "severity": "high"
    },
    {
        "time": "2023-11-15 02:15:33",
        "source": "File Monitor", 
        "event": "Mass file encryption started on FILE01",
        "severity": "critical"
    },
    {
        "time": "2023-11-15 02:17:45",
        "source": "Network Monitor",
        "event": "Outbound connection to known C2 server 185.159.157.13",
        "severity": "critical"
    }
]

# Analysis showed the attack progression:
# 1. Initial compromise via phishing email
# 2. PowerShell-based reconnaissance 
# 3. Credential theft using Mimikatz
# 4. Lateral movement to file servers
# 5. Ransomware deployment across network
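The list above stores the events but does not yet reconstruct anything. Sorting them and computing elapsed time between stages gives the numbers a write-up usually needs. A small sketch using the four events from the timeline (copied so the snippet runs on its own); it prints each event with its offset from the first alert and the gap since the previous event.

```python
from datetime import datetime

events = [
    ("2023-11-15 01:47:22", "SIEM", "Unusual PowerShell execution detected on WEB01"),
    ("2023-11-15 01:52:11", "EDR", "Suspicious process tree: powershell.exe -> cmd.exe -> cipher.exe"),
    ("2023-11-15 02:15:33", "File Monitor", "Mass file encryption started on FILE01"),
    ("2023-11-15 02:17:45", "Network Monitor", "Outbound connection to known C2 server 185.159.157.13"),
]

def with_offsets(events, fmt="%Y-%m-%d %H:%M:%S"):
    """Yield (timestamp, offset from first event, gap from previous event, source, text)."""
    parsed = sorted((datetime.strptime(t, fmt), src, text) for t, src, text in events)
    first = prev = parsed[0][0]
    for ts, src, text in parsed:
        yield ts, ts - first, ts - prev, src, text
        prev = ts

for ts, offset, gap, src, text in with_offsets(events):
    print(f"{ts:%H:%M:%S}  +{offset}  (gap {gap})  {src}: {text}")
```

On this data, mass encryption began 28 minutes 11 seconds after the first SIEM alert.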

Recovery and Lessons Learned

What Worked

Automated isolation limited the blast radius
Offline backups enabled complete recovery
Incident response playbook kept team focused
Regular tabletop exercises prepared the team

What Could Have Been Better

Earlier detection - initial compromise went unnoticed for 6 hours
Better segmentation - lateral movement was too easy
Faster communication - took too long to notify leadership

Building Your Incident Response Program

Essential Documentation

Create these templates before you need them:

# INCIDENT RESPONSE CHECKLIST

## Initial Response (First 30 minutes)
- [ ] Document incident start time
- [ ] Identify incident commander
- [ ] Assess initial scope and impact
- [ ] Begin evidence preservation
- [ ] Notify key stakeholders

## Investigation Phase
- [ ] Collect and analyze logs
- [ ] Perform memory analysis
- [ ] Document timeline of events
- [ ] Identify attack vectors
- [ ] Assess data impact

## Containment and Eradication
- [ ] Isolate affected systems
- [ ] Block malicious indicators
- [ ] Patch vulnerabilities
- [ ] Update security controls

## Recovery
- [ ] Verify system integrity
- [ ] Restore from clean backups
- [ ] Implement additional monitoring
- [ ] Validate security posture

## Post-Incident
- [ ] Complete incident report
- [ ] Conduct lessons learned session
- [ ] Update playbooks
- [ ] Implement improvements
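Because the checklist is plain Markdown, its progress can be reported mechanically during status updates. A sketch that counts completed `- [x]` items and lists open `- [ ]` items per `##` section; it assumes only the layout of the template above.

```python
import re

ITEM = re.compile(r"^\s*-\s\[( |x|X)\]\s+(.*)$")

def checklist_status(markdown):
    """Return {section: {"done": int, "open": [item, ...]}} for a Markdown checklist."""
    status = {}
    section = None
    for line in markdown.splitlines():
        if line.startswith("## "):
            section = line[3:].strip()
            status[section] = {"done": 0, "open": []}
            continue
        match = ITEM.match(line)
        if match and section is not None:
            if match.group(1) in ("x", "X"):
                status[section]["done"] += 1
            else:
                status[section]["open"].append(match.group(2).strip())
    return status

sample = """\
## Containment
- [x] Isolate affected systems
- [ ] Block malicious indicators
"""
for name, s in checklist_status(sample).items():
    total = s["done"] + len(s["open"])
    print(f"{name}: {s['done']}/{total} done; open: {', '.join(s['open'])}")
# → Containment: 1/2 done; open: Block malicious indicators
```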

Automation Scripts Library

Keep these ready for rapid deployment:

#!/usr/bin/env python3
"""
Incident Response Automation Suite
"""
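import argparse
import hashlib
import json
import os
import subprocess
from datetime import datetime
from pathlib import Path

class IncidentAutomation:
    def __init__(self):
        self.case_dir = Path(f"/cases/{datetime.now().strftime('%Y%m%d_%H%M%S')}")
        self.case_dir.mkdir(parents=True, exist_ok=True)
    
    def collect_evidence(self, target_host):
        """Automated evidence collection over SSH"""
        print(f"🔍 Collecting evidence from {target_host}")
        
        evidence_items = [
            'ps aux',
            'netstat -tunap',  # all sockets, not only listening ones
            'ls -la /tmp',
            'find /home -type f -mtime -1',
            'last -10',
            'crontab -l',
            'tail -n 100 /var/log/auth.log'
        ]
        
        for item in evidence_items:
            output_file = self.case_dir / f"{item.replace(' ', '_').replace('/', '_')}.txt"
            # Argument list: no local shell, so the host name cannot inject commands.
            # ssh still runs `item` through the remote user's shell.
            with open(output_file, 'w') as out:
                subprocess.run(['ssh', target_host, item],
                               stdout=out, stderr=subprocess.STDOUT)
    
    @staticmethod
    def hash_file(path, algorithms):
        """Return hex digests of a file for each algorithm, or [] if unreadable"""
        hashers = [hashlib.new(a) for a in algorithms]
        try:
            with open(path, 'rb') as f:
                for chunk in iter(lambda: f.read(1 << 20), b''):
                    for h in hashers:
                        h.update(chunk)
        except OSError:
            return []
        return [h.hexdigest() for h in hashers]
    
    def hunt_iocs(self, ioc_file, root='/'):
        """Hunt for indicators of compromise on the local host"""
        print(f"🎯 Hunting IOCs from {ioc_file}")
        
        with open(ioc_file) as f:
            iocs = json.load(f)
        
        results = {}
        
        # Hunt for file hashes in a single filesystem pass instead of re-hashing
        # every file once per IOC. MD5 is 32 hex characters, SHA-256 is 64.
        wanted = {h.lower() for h in iocs.get('hashes', [])}
        algorithms = [name for name, size in (('md5', 32), ('sha256', 64))
                      if any(len(h) == size for h in wanted)]
        skip = {'/proc', '/sys', '/dev', '/run'}  # pseudo-filesystems
        if algorithms:
            for dirpath, dirnames, filenames in os.walk(root):
                dirnames[:] = [d for d in dirnames if os.path.join(dirpath, d) not in skip]
                for name in filenames:
                    path = os.path.join(dirpath, name)
                    if os.path.islink(path) or not os.path.isfile(path):
                        continue
                    for digest in self.hash_file(path, algorithms):
                        if digest in wanted:
                            results.setdefault(digest, []).append(path)
        
        # Hunt for IP addresses in logs
        # (-F: literal match, so dots are not wildcards; -w: whole address; -I: skip binaries)
        for ip in iocs.get('ips', []):
            result = subprocess.run(['grep', '-rIFw', ip, '/var/log/'],
                                    capture_output=True, text=True)
            if result.stdout:
                results[ip] = result.stdout.strip()
        
        return results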
    
    def generate_report(self, findings):
        """Generate incident response report"""
        report_file = self.case_dir / "incident_report.md"
        
        report_content = f"""
# INCIDENT RESPONSE REPORT

**Case ID**: {self.case_dir.name}
**Date**: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
**Analyst**: {os.environ.get('USER', 'Unknown')}

## Executive Summary
[To be filled by analyst]

## Technical Findings
{json.dumps(findings, indent=2)}

## Timeline
[Reconstruct based on evidence]

## Impact Assessment
[Document business impact]

## Recommendations
[Provide remediation steps]

## Appendices
- Evidence location: {self.case_dir}
- Tools used: [List tools]
- IOCs discovered: [List IOCs]
"""
        
        with open(report_file, 'w') as f:
            f.write(report_content)
        
        print(f"📄 Report generated: {report_file}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Incident Response Automation')
    parser.add_argument('--collect', help='Collect evidence from host')
    parser.add_argument('--hunt', help='Hunt IOCs from file')
    parser.add_argument('--report', action='store_true', help='Generate report')
    
    args = parser.parse_args()
    ir = IncidentAutomation()
    
    findings = {}
    
    if args.collect:
        ir.collect_evidence(args.collect)
    
    if args.hunt:
        findings = ir.hunt_iocs(args.hunt)
    
    if args.report:
        ir.generate_report(findings)
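The collection step above writes raw output files but does not record their hashes. A common chain-of-custody practice is to hash every evidence file right after collection and keep the manifest with the case (ideally with a copy stored elsewhere). A sketch, assuming a case directory like the one `IncidentAutomation` creates; the manifest file name is an arbitrary choice.

```python
import hashlib
import json
from pathlib import Path

MANIFEST_NAME = "manifest.sha256.json"  # illustrative file name

def sha256_file(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large images do not fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(case_dir):
    """Hash every file under case_dir (except the manifest) and save {relative path: sha256}."""
    case_dir = Path(case_dir)
    manifest = {
        p.relative_to(case_dir).as_posix(): sha256_file(p)
        for p in sorted(case_dir.rglob("*"))
        if p.is_file() and p.name != MANIFEST_NAME
    }
    (case_dir / MANIFEST_NAME).write_text(json.dumps(manifest, indent=2))
    return manifest

def verify_manifest(case_dir):
    """Return the relative paths that are missing or whose hash no longer matches."""
    case_dir = Path(case_dir)
    manifest = json.loads((case_dir / MANIFEST_NAME).read_text())
    return [rel for rel, digest in manifest.items()
            if not (case_dir / rel).is_file() or sha256_file(case_dir / rel) != digest]
```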

Communication During Crisis

Stakeholder Communication Templates

# Automated notification system
from datetime import datetime

class IncidentNotifications:
    def __init__(self):
        self.severity_levels = {
            'LOW': {'escalation_time': 4, 'notify': ['ir_team']},
            'MEDIUM': {'escalation_time': 2, 'notify': ['ir_team', 'security_manager']},
            'HIGH': {'escalation_time': 1, 'notify': ['ir_team', 'security_manager', 'ciso']},
            'CRITICAL': {'escalation_time': 0.5, 'notify': ['all_hands']}
        }
    
    def send_initial_alert(self, severity, description):
        message = f"""
🚨 SECURITY INCIDENT ALERT

Severity: {severity}
Time: {datetime.now()}
Description: {description}

Initial Response:
- Incident response team activated
- Investigation in progress
- Systems being secured

Next Update: Within {self.severity_levels[severity]['escalation_time']} hours

Incident Commander: [Name]
Contact: [Phone/Email]
"""
        
        # Send to appropriate stakeholders
        recipients = self.severity_levels[severity]['notify']
        self.send_notification(message, recipients)
    
    def send_update(self, incident_id, status, findings):
        message = f"""
📊 INCIDENT UPDATE - {incident_id}

Status: {status}
Time: {datetime.now()}

Current Findings:
{findings}

Actions Taken:
- [List completed actions]

Next Steps:
- [List planned actions]

Estimated Resolution: [Timeline]
"""
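        self.send_notification(message, ['all_stakeholders'])

    def send_notification(self, message, recipients):
        """Placeholder delivery: wire this to your email, chat or paging system"""
        for recipient in recipients:
            print(f"--> {recipient}\n{message}")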
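The `escalation_time` values double as update deadlines, and working them out by hand at 3 AM invites mistakes (a message reading "Within 0.5 hours" is also easy to misread). A sketch of the calculation, reusing the hours from `severity_levels`; expressing the deadline in UTC is an assumption, since the messages above use local `datetime.now()`.

```python
from datetime import datetime, timedelta, timezone

# Same escalation windows (in hours) as IncidentNotifications.severity_levels
UPDATE_HOURS = {"LOW": 4, "MEDIUM": 2, "HIGH": 1, "CRITICAL": 0.5}

def next_update_due(severity, now=None):
    """Return the UTC datetime by which the next stakeholder update is due."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now + timedelta(hours=UPDATE_HOURS[severity.upper()])

def describe_window(severity):
    """Human-friendly window, e.g. '30 minutes' instead of '0.5 hours'."""
    minutes = int(UPDATE_HOURS[severity.upper()] * 60)
    if minutes % 60:
        return f"{minutes} minutes"
    hours = minutes // 60
    return f"{hours} hour" + ("s" if hours != 1 else "")

start = datetime(2023, 11, 15, 2, 0, tzinfo=timezone.utc)
print(next_update_due("CRITICAL", start).strftime("%H:%M UTC"))  # → 02:30 UTC
print(describe_window("CRITICAL"))                                # → 30 minutes
```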

Measuring Incident Response Effectiveness

Key Metrics to Track

class IncidentMetrics:
    def __init__(self):
        self.metrics = {}
    
    @staticmethod
    def _average(incidents, start_key, end_key, unit_seconds):
        """Average (end - start) over incidents that have both timestamps.

        Returns None when no incident has both, instead of dividing by zero.
        """
        durations = [
            (incident[end_key] - incident[start_key]).total_seconds()
            for incident in incidents
            if incident.get(start_key) and incident.get(end_key)
        ]
        if not durations:
            return None
        # Divide by the incidents actually measured, not by all incidents
        return sum(durations) / len(durations) / unit_seconds
    
    def calculate_mttr(self, incidents):
        """Mean Time to Recovery (detection to resolution), in hours"""
        return self._average(incidents, 'detected_time', 'resolved_time', 3600)
    
    def calculate_mttd(self, incidents):
        """Mean Time to Detection (occurrence to detection), in hours"""
        return self._average(incidents, 'occurred_time', 'detected_time', 3600)
    
    def calculate_containment_time(self, incidents):
        """Average time to contain incidents (detection to containment), in minutes"""
        return self._average(incidents, 'detected_time', 'contained_time', 60)
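Means are easily skewed by a single long-running incident, so it can help to report the median and maximum alongside MTTR. A sketch using the standard `statistics` module, assuming the same incident dictionaries (`detected_time` and `resolved_time` as `datetime` values); the sample incidents are made up for illustration.

```python
import statistics
from datetime import datetime, timedelta

def resolution_hours(incidents):
    """Hours from detection to resolution for incidents that have been resolved."""
    return [
        (i["resolved_time"] - i["detected_time"]).total_seconds() / 3600
        for i in incidents
        if i.get("detected_time") and i.get("resolved_time")
    ]

def summarize(hours):
    """Return mean, median and max (None for each if there is no data)."""
    if not hours:
        return {"mean": None, "median": None, "max": None}
    return {"mean": statistics.mean(hours), "median": statistics.median(hours), "max": max(hours)}

# Illustrative data only: three quick incidents and one long outlier
t0 = datetime(2024, 1, 1)
incidents = [{"detected_time": t0, "resolved_time": t0 + timedelta(hours=h)} for h in (2, 3, 4, 47)]
incidents.append({"detected_time": t0, "resolved_time": None})  # still open: excluded
print(summarize(resolution_hours(incidents)))
# → {'mean': 14.0, 'median': 3.5, 'max': 47.0}
```

Here one 47-hour incident pulls the mean to 14 hours while the median stays at 3.5, which is closer to the typical experience.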

Wrapping Up

Effective incident response isn't about having perfect tools or processes—it's about being prepared, staying calm under pressure, and learning from every incident.

Key Success Factors

Preparation beats perfection - Have playbooks ready
Speed matters - First 30 minutes are critical
Document everything - Evidence and decisions
Communicate clearly - Keep stakeholders informed
Learn and improve - Every incident teaches something

Building Your IR Capability

Start with these foundational elements:

Written playbooks for common scenarios
Automated tools for evidence collection
Communication templates for different audiences
Regular exercises to test your response
Continuous improvement based on lessons learned

Remember: The best incident response is the one you never have to use because your prevention worked. But when prevention fails, having a solid IR capability can mean the difference between a minor incident and a company-ending breach.

Want to dive deeper into incident response and digital forensics? Subscribe to my newsletter for weekly case studies, tools, and techniques from real incidents.