Patrick Beane
Infrastructure & Security Engineer | SRE | Cloud-Native Platforms
I design and operate a self-directed production infrastructure platform focused on security automation, reliability engineering, observability, vulnerability management, and recoverability.
The environment spans Kubernetes, Linux, multi-cloud infrastructure, identity controls, threat detection, backup verification, and public operational dashboards. My goal is to build systems that are secure by default, observable in production, and recoverable under failure.
Production Infrastructure Overview
This environment blends production operations, security research, and continuous infrastructure improvement. Services are distributed across cloud and self-hosted nodes, with each node scoped to a specific operational role to reduce blast radius and simplify ownership.
| Node | Primary Role | Function |
|---|---|---|
| Argus | Security telemetry and failover automation | Threat detection, event correlation, and node-health automation |
| Triton | Observability and internal tooling | Prometheus, Grafana, Authelia, code-server, CrowdSec bouncers |
| Ares | Kubernetes and source control | Gitea, PostgreSQL, Valkey, CI runners, Kubernetes control plane |
| Zephyrus | Container hosting | Docker workloads and service hosting |
| Iris | Edge services | NGINX/PHP ingress and public-facing services |
| Vault | Secrets and identity-adjacent services | Vaultwarden and protected internal services |
| Apollo | Threat intelligence dashboard | Flask-based analytics and reporting |
| Hermes | Public API frontend | Public API and frontend service layer |
| Hades | Public API backend | Backend service support for public APIs |
| Zeus | Monitoring and metrics | Centralized observability and service-health tracking |
Infrastructure Strategy
- Compute layer: Heterogeneous self-hosted and cloud infrastructure scoped by workload type
- Edge layer: Cloud and VPS ingress for public services and low-latency routing
- Security telemetry: Multi-node detection and mitigation workflows using CrowdSec and custom automation
- Observability: Centralized monitoring with Prometheus, Grafana, Netdata, exporters, and public dashboards
- Resilience: Automated health checks, DNS failover, backup verification, and role-scoped service design
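The health-check and DNS failover item above can be illustrated with a small state machine. This is a minimal sketch and not the platform's actual automation: the `FailoverState` type, the `next_state` function, and both thresholds are hypothetical names and values chosen for illustration. It shows one common pattern, which is to require several consecutive failed probes before failing over and several consecutive healthy probes before failing back, so that a single flapping check does not move traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverState:
    """Which target currently receives traffic, plus probe counters."""
    active: str = "primary"   # "primary" or "secondary"
    failures: int = 0         # consecutive failed probes while on primary
    successes: int = 0        # consecutive healthy probes while failed over


def next_state(state, primary_healthy, fail_threshold=3, recover_threshold=5):
    """Return the state after one probe of the primary target.

    The thresholds are illustrative assumptions, not documented values.
    """
    if state.active == "primary":
        if primary_healthy:
            # Any healthy probe resets the failure streak.
            return FailoverState("primary")
        failures = state.failures + 1
        if failures >= fail_threshold:
            # Enough consecutive failures: point traffic at the secondary.
            return FailoverState("secondary")
        return FailoverState("primary", failures=failures)

    # Currently failed over: wait for a stable primary before failing back.
    if not primary_healthy:
        return FailoverState("secondary")
    successes = state.successes + 1
    if successes >= recover_threshold:
        return FailoverState("primary")
    return FailoverState("secondary", successes=successes)
```

In a real deployment the transition to `"secondary"` would be the point where a DNS record is updated through the provider's API. That call is deliberately left out here because its details depend on the provider.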
Security Detection and Response
Security controls are integrated directly into the platform rather than handled as one-off manual checks.
Current detection and response patterns include:
- Telemetry ingestion from 7 active nodes
- CrowdSec-based detection and mitigation
- MITRE ATT&CK mapping for selected security events
- Escalation logic for high-confidence indicators
- Watchlist and retention policies based on event confidence
- High-severity event notifications through Discord
- Runtime visibility through public and private dashboards
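The escalation, retention, and notification items above could be combined in a single triage step. The sketch below is an assumption-heavy illustration rather than the production logic: the `triage` function, the `Decision` type, the confidence thresholds, and the retention periods are all hypothetical. The two mapping entries use real CrowdSec scenario names and real ATT&CK technique IDs (T1110 Brute Force, T1595 Active Scanning), but which scenarios are actually mapped on this platform is not stated in this document.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative mapping from detection scenario to MITRE ATT&CK technique.
ATTACK_MAP = {
    "crowdsecurity/ssh-bf": "T1110",        # Brute Force
    "crowdsecurity/http-probing": "T1595",  # Active Scanning
}


@dataclass(frozen=True)
class Decision:
    technique: Optional[str]  # ATT&CK ID, or None when the scenario is unmapped
    escalate: bool            # promote to the high-confidence path
    notify: bool              # send a high-severity notification
    retention_days: int       # how long the event stays on the watchlist


def triage(scenario: str, confidence: float, severity: str) -> Decision:
    """Derive handling for one event. All thresholds are assumed values."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")

    escalate = confidence >= 0.9
    notify = escalate and severity == "high"

    # Higher-confidence events are retained longer.
    if confidence >= 0.9:
        retention_days = 90
    elif confidence >= 0.5:
        retention_days = 30
    else:
        retention_days = 7

    return Decision(ATTACK_MAP.get(scenario), escalate, notify, retention_days)
```

Keeping the decision a pure function of the event makes it straightforward to unit test and to replay against historical telemetry when thresholds change.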
Technical Stack
Languages and frameworks: Python, Bash, JavaScript (React, Node.js)
Infrastructure: Kubernetes, Docker, Caddy, NGINX, Linux
Security: CrowdSec, Trivy, Authelia, OIDC, MFA, Fail2Ban, Vaultwarden
Cloud and Networking: AWS, GCP, Oracle Cloud, Vultr, Cloudflare, DNS automation
Observability: Prometheus, Grafana, Netdata, Blackbox Exporter, Node Exporter, cAdvisor
Backups: Borgmatic, encrypted offsite backups, restore verification
CI/CD and Source Control: Git, GitHub Actions, Gitea, container image scanning
Infrastructure as Code: Terraform
Operational Metrics
Current platform highlights:
- 10-node distributed infrastructure environment
- 7-node security telemetry and detection footprint
- 127,877 lines of custom code across infrastructure, security, and automation projects
- 1,455 commits since January 1 across active repositories
- Automated failover between AWS and peer infrastructure
- Public dashboards for uptime, vulnerabilities, backups, threat telemetry, and service health
- Multiple daily encrypted Borgmatic snapshots shipped offsite
- Recurring backup verification and restore-oriented operational workflows
- Nightly metrics refresh via Gitea Actions and tokei
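As a rough illustration of how a nightly job might turn tokei output into a lines-of-code figure, the sketch below sums the `code` field across languages in a JSON report. It assumes the report is a JSON object keyed by language name, with each entry carrying a `code` count, and that some tokei versions also include an aggregate `Total` entry; the sample data and function names are hypothetical, and the real pipeline may work differently.

```python
import json

# Hypothetical sample shaped like a tokei JSON report (fields trimmed).
SAMPLE_REPORT = """
{
  "Python": {"code": 120, "comments": 10, "blanks": 5},
  "Shell":  {"code": 30,  "comments": 4,  "blanks": 2},
  "Total":  {"code": 150, "comments": 14, "blanks": 7}
}
"""


def code_lines_by_language(report: dict) -> dict:
    """Map each language to its code-line count, skipping the aggregate entry."""
    return {
        language: stats["code"]
        for language, stats in report.items()
        if language != "Total" and isinstance(stats, dict) and "code" in stats
    }


def total_code_lines(report: dict) -> int:
    """Sum code lines across languages without double counting 'Total'."""
    return sum(code_lines_by_language(report).values())


report = json.loads(SAMPLE_REPORT)
print(total_code_lines(report))
```

Excluding the aggregate entry explicitly keeps the total correct whether or not the installed tokei version emits one.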
Deployment Patterns
- Reverse proxy: Caddy and NGINX, with Cloudflare where applicable
- Observability: Prometheus, Grafana, Node Exporter, Blackbox Exporter, cAdvisor, Netdata
- Access control: Authelia, OIDC, MFA, TLS hardening, and protected reverse-proxy routes
- Lifecycle management: Controlled container updates and service monitoring
- Service isolation: Nodes scoped by role to reduce blast radius and simplify recovery
- Backup strategy: Encrypted offsite backups with recurring verification
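One simple form of recurring backup verification is a freshness check: confirm that the newest archive is recent enough to match the intended snapshot cadence. The sketch below is illustrative only. The `newest_backup_is_fresh` function and the 12-hour window are assumptions, and it takes archive timestamps as plain `datetime` values rather than reading them from Borgmatic, so no tool output format is implied. A freshness check does not replace an actual restore test.

```python
from datetime import datetime, timedelta, timezone


def newest_backup_is_fresh(archive_times, now, max_age=timedelta(hours=12)):
    """Return True when the most recent archive is no older than max_age.

    archive_times: iterable of timezone-aware datetimes, one per archive.
    now: timezone-aware datetime to compare against (injected for testing).
    """
    archive_times = list(archive_times)
    if not archive_times:
        # No archives at all is treated as a failed check.
        return False
    return now - max(archive_times) <= max_age


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
archives = [
    datetime(2024, 1, 1, 18, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc),
]
print(newest_backup_is_fresh(archives, now))
```

A check like this is a natural candidate for a dashboard signal or alert, since a stale newest archive usually means the backup job itself has stopped running.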
Selected Public Projects
- Portfolio: beane.me
- Threat Decisions and Telemetry: threats.beane.me
- Threat Intelligence and Analytics: intel.beane.me
- Vulnerability Scanning and Trends: vuln.beane.me
- Backup and Restore Verification: backups.beane.me
- Threat Decision Observability: observe.beane.me
- Health and Failover Dashboard: health.beane.me
- Source Control: git.beane.me
- Terraform Threat Modeling: tfstride.beane.me
Engineering Philosophy
Production systems should be observable, automated, recoverable, and secure from the start.
I focus on infrastructure that explains itself: clear telemetry, deterministic automation, evidence-backed security findings, documented recovery paths, and controls that improve reliability without slowing delivery.