# Patrick Beane
**Infrastructure & Security Engineer | SRE | Cloud-Native Platforms**

I design and operate a self-directed production infrastructure platform focused on security automation, reliability engineering, observability, vulnerability management, and recoverability.

The environment spans Kubernetes, Linux, multi-cloud infrastructure, identity controls, threat detection, backup verification, and public operational dashboards. My goal is to build systems that are secure-by-default, observable in production, and recoverable under failure.

---
## Production Infrastructure Overview
This environment blends production operations, security research, and continuous infrastructure improvement. Services are distributed across cloud and self-hosted nodes, with each node scoped to a specific operational role to reduce blast radius and simplify ownership.
| Node | Primary Role | Function |
| :--- | :--- | :--- |
| **Argus** | Security telemetry and failover automation | Threat detection, event correlation, and node-health automation |
| **Triton** | Observability and internal tooling | Prometheus, Grafana, Authelia, code-server, CrowdSec bouncers |
| **Ares** | Kubernetes and source control | Gitea, PostgreSQL, Valkey, CI runners, Kubernetes control plane |
| **Zephyrus** | Container hosting | Docker workloads and service hosting |
| **Iris** | Edge services | NGINX/PHP ingress and public-facing services |
| **Vault** | Secrets and identity-adjacent services | Vaultwarden and protected internal services |
| **Apollo** | Threat intelligence dashboard | Flask-based analytics and reporting |
| **Hermes** | Public API frontend | Public API and frontend service layer |
| **Hades** | Public API backend | Backend service support for public APIs |
| **Zeus** | Monitoring and metrics | Centralized observability and service-health tracking |
---
## Infrastructure Strategy
- **Compute layer:** Heterogeneous self-hosted and cloud infrastructure scoped by workload type
- **Edge layer:** Cloud and VPS ingress for public services and low-latency routing
- **Security telemetry:** Multi-node detection and mitigation workflows using CrowdSec and custom automation
- **Observability:** Centralized monitoring with Prometheus, Grafana, Netdata, exporters, and public dashboards
- **Resilience:** Automated health checks, DNS failover, backup verification, and role-scoped service design
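
As an illustration of how automated health checks can drive DNS failover, here is a minimal sketch of the decision logic only. The class name, the thresholds, and the `"primary"`/`"backup"` labels are assumptions made for this example, not the platform's actual implementation. Requiring several consecutive failures before failing over, and several consecutive successes before failing back, keeps a single flapping probe from moving traffic.

```python
from dataclasses import dataclass


@dataclass
class FailoverState:
    """Tracks consecutive health-check results for one primary endpoint."""

    fail_threshold: int = 3     # hypothetical: consecutive failures before failing over
    recover_threshold: int = 2  # hypothetical: consecutive successes before failing back
    failures: int = 0
    successes: int = 0
    on_backup: bool = False

    def observe(self, healthy: bool) -> str:
        """Record one probe result and return the target DNS should point at."""
        if healthy:
            self.successes += 1
            self.failures = 0
            if self.on_backup and self.successes >= self.recover_threshold:
                self.on_backup = False
        else:
            self.failures += 1
            self.successes = 0
            if not self.on_backup and self.failures >= self.fail_threshold:
                self.on_backup = True
        return "backup" if self.on_backup else "primary"


if __name__ == "__main__":
    state = FailoverState()
    for probe in [False, False, False, True, True]:
        print(probe, "->", state.observe(probe))
```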
---
## Security Detection and Response
Security controls are integrated directly into the platform rather than handled as one-off manual checks.
Current detection and response patterns include:
- Telemetry ingestion from 7 active nodes
- CrowdSec-based detection and mitigation
- MITRE ATT&CK mapping for selected security events
- Escalation logic for high-confidence indicators
- Watchlist and retention policies based on event confidence
- High-severity event notifications through Discord
- Runtime visibility through public and private dashboards
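
The confidence-based escalation, watchlist, and retention items above could be modeled roughly as follows. This is a sketch under stated assumptions: the thresholds, retention windows, and the `triage` and `Decision` names are hypothetical and chosen only to show the shape of the logic.

```python
from dataclasses import dataclass
from datetime import timedelta

# Hypothetical thresholds, chosen only for illustration.
ESCALATE_AT = 0.9
WATCHLIST_AT = 0.5


@dataclass(frozen=True)
class Decision:
    action: str           # "escalate", "watchlist", or "log"
    retention: timedelta  # how long the event record is kept
    notify: bool          # whether a high-severity notification is sent


def triage(confidence: float) -> Decision:
    """Map an event confidence score in [0, 1] to an action and retention window."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= ESCALATE_AT:
        return Decision("escalate", timedelta(days=90), notify=True)
    if confidence >= WATCHLIST_AT:
        return Decision("watchlist", timedelta(days=30), notify=False)
    return Decision("log", timedelta(days=7), notify=False)


if __name__ == "__main__":
    for score in (0.95, 0.6, 0.1):
        print(score, triage(score))
```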
---
## Technical Stack
- **Languages and frameworks:** Python, Bash, JavaScript, React, Node.js
- **Infrastructure:** Kubernetes, Docker, Caddy, NGINX, Linux
- **Security:** CrowdSec, Trivy, Authelia, OIDC, MFA, Fail2Ban, Vaultwarden
- **Cloud and Networking:** AWS, GCP, Oracle Cloud, Vultr, Cloudflare, DNS automation
- **Observability:** Prometheus, Grafana, Netdata, Blackbox Exporter, Node Exporter, cAdvisor
- **Backups:** Borgmatic, encrypted offsite backups, restore verification
- **CI/CD and Source Control:** Git, GitHub Actions, Gitea, container image scanning
- **Infrastructure as Code:** Terraform
---
## Operational Metrics
Current platform highlights:
- 10-node distributed infrastructure environment
- 7-node security telemetry and detection footprint
- `REPLACE_ME_LOC` lines of custom code across infrastructure, security, and automation projects
- `REPLACE_ME_COMMITS` commits since January 1 across active repositories
- Automated failover between AWS and peer infrastructure
- Public dashboards for uptime, vulnerabilities, backups, threat telemetry, and service health
- Multiple daily encrypted Borgmatic snapshots shipped offsite
- Recurring backup verification and restore-oriented operational workflows
- Nightly metrics refresh via Gitea Actions and `tokei`
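
The `REPLACE_ME_LOC` and `REPLACE_ME_COMMITS` placeholders above are filled in by the nightly refresh. A minimal sketch of that substitution step might look like the following. The function names are hypothetical, and the report shape is an assumption based on `tokei --output json` (a mapping of language name to an object with a `code` count, possibly alongside an aggregate `Total` entry). The final check fails loudly if any `REPLACE_ME_` marker survives rendering.

```python
def total_code_lines(tokei_report: dict) -> int:
    """Sum the "code" field per language, skipping an aggregate "Total" entry.

    Assumes a mapping of language name to an object carrying a "code" count.
    """
    return sum(
        stats["code"]
        for language, stats in tokei_report.items()
        if language != "Total"
    )


def render_readme(template: str, loc: int, commits: int) -> str:
    """Replace both placeholders with thousands-separated numbers."""
    rendered = template.replace("REPLACE_ME_LOC", f"{loc:,}")
    rendered = rendered.replace("REPLACE_ME_COMMITS", f"{commits:,}")
    if "REPLACE_ME_" in rendered:
        raise ValueError("template still contains an unfilled placeholder")
    return rendered


if __name__ == "__main__":
    # Illustrative numbers only, not real metrics.
    report = {"Python": {"code": 1200}, "Shell": {"code": 345}}
    line = "- `REPLACE_ME_LOC` lines of custom code, `REPLACE_ME_COMMITS` commits"
    print(render_readme(line, total_code_lines(report), 87))
```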
---
## Activity
![Commit heatmap](public/heatmap.svg)

---
## Deployment Patterns
- **Reverse proxy:** Caddy and NGINX, with Cloudflare where applicable
- **Observability:** Prometheus, Grafana, Node Exporter, Blackbox Exporter, cAdvisor, Netdata
- **Access control:** Authelia, OIDC, MFA, TLS hardening, and protected reverse-proxy routes
- **Lifecycle management:** Controlled container updates and service monitoring
- **Service isolation:** Nodes scoped by role to reduce blast radius and simplify recovery
- **Backup strategy:** Encrypted offsite backups with recurring verification
---
## Selected Public Projects
- **Portfolio:** [beane.me](https://beane.me)
- **Threat Decisions and Telemetry:** [threats.beane.me](https://threats.beane.me)
- **Threat Intelligence and Analytics:** [intel.beane.me](https://intel.beane.me)
- **Vulnerability Scanning and Trends:** [vuln.beane.me](https://vuln.beane.me)
- **Backup and Restore Verification:** [backups.beane.me](https://backups.beane.me)
- **Threat Decision Observability:** [observe.beane.me](https://observe.beane.me)
- **Health and Failover Dashboard:** [health.beane.me](https://health.beane.me)
- **Source Control:** [git.beane.me](https://git.beane.me)
- **Terraform Threat Modeling:** [tfstride.beane.me](https://tfstride.beane.me)
---
## Engineering Philosophy
Production systems should be observable, automated, recoverable, and secure from the start.
I focus on infrastructure that explains itself: clear telemetry, deterministic automation, evidence-backed security findings, documented recovery paths, and controls that improve reliability without slowing delivery.