docs(profile): refresh Gitea infrastructure profile

2026-05-27 10:27:14 -04:00
parent 95646eb281
commit 6fcdaa9f84
2 changed files with 152 additions and 166 deletions
@@ -1,126 +1,119 @@
-# 🛡️ Patrick Beane
+# Patrick Beane

-**SRE | Security Engineer | Self-Hosted Infra & Detection**
+**Infrastructure & Security Engineer | SRE | Cloud-Native Platforms**

-I design and operate **security-first, self-hosted infrastructure** focused on detection, resilience, and sovereignty.  
-My lab functions as a live production environment where threat intelligence, automation, and reliability engineering intersect.
+I design and operate a self-directed production infrastructure platform focused on security automation, reliability engineering, observability, vulnerability management, and recoverability.
+
+The environment spans Kubernetes, Linux, multi-cloud infrastructure, identity controls, threat detection, backup verification, and public operational dashboards. My goal is to build systems that are secure by default, observable in production, and recoverable under failure.

 ---

-## 🛰 The Fleet (10 Nodes)
+## Production Infrastructure Overview

-> This environment blends production, research, and continuous experimentation.
-> Availability and controls are intentionally tuned per node role.
+This environment blends production operations, security research, and continuous infrastructure improvement. Services are distributed across cloud and self-hosted nodes, with each node scoped to a specific operational role to reduce blast radius and simplify ownership.

-| Node | Role | Specs | Status |
-| :--- | :--- | :--- | :--- |
-| **Argus** | SIEM / Brain / node-health Failover | Xeon E5-2660v2 (1 core) | 🟢 Online |
-| **Triton** | High Performance Compute | EPYC 9634 (8 cores) | 🟢 Online |
-| **Ares** | Gitea / Kubernetes Management Node (MicroK8s) | Ryzen 9 9950X (8 cores) | 🟢 Online |
-| **Zephyrus** | Container Host | Ryzen 9 7950X (4 cores) | 🟢 Online |
-| **Iris** | NGINX / PHP Edge | Vultr | 🟢 Online |
-| **Vault** | Secrets Management | GCP (Vaultwarden) | 🟢 Online |
-| **Apollo** | Intel Dashboard (Flask) | AWS | 🟢 Online |
-| **Hermes** | Public API (Frontend) | Oracle Cloud | 🟢 Online |
-| **Hades** | Public API (Backend) | Oracle Cloud | 🟢 Online |
-| **Zeus** | Monitoring / Metrics NOC | Xeon Gold 6150 (1 core) | 🟢 Online |
+| Node | Primary Role | Function |
+| :--- | :--- | :--- |
+| **Argus** | Security telemetry and failover automation | Threat detection, event correlation, and node-health automation |
+| **Triton** | Observability and internal tooling | Prometheus, Grafana, Authelia, code-server, CrowdSec bouncers |
+| **Ares** | Kubernetes and source control | Gitea, PostgreSQL, Valkey, CI runners, Kubernetes control plane |
+| **Zephyrus** | Container hosting | Docker workloads and service hosting |
+| **Iris** | Edge services | NGINX/PHP ingress and public-facing services |
+| **Vault** | Secrets and identity-adjacent services | Vaultwarden and protected internal services |
+| **Apollo** | Threat intelligence dashboard | Flask-based analytics and reporting |
+| **Hermes** | Public API frontend | Public API and frontend service layer |
+| **Hades** | Public API backend | Backend service support for public APIs |
+| **Zeus** | Monitoring and metrics | Centralized observability and service-health tracking |

 ---

-## 🌐 Infrastructure Strategy
+## Infrastructure Strategy

- **Compute Layer:** Zen 5 (9950X), Zen 4 (7950X), EPYC 9634 for sustained workloads.
- **Edge Layer:** Oracle Cloud & Vultr for low-latency public ingress.
- **Sentinel Layer:** **Argus SIEM** correlating telemetry and enforcing distributed decisions across nodes.
- **Observability:** Zeus as the centralized NOC and metrics authority.
+- **Compute layer:** Heterogeneous self-hosted and cloud infrastructure scoped by workload type
+- **Edge layer:** Cloud and VPS ingress for public services and low-latency routing
+- **Security telemetry:** Multi-node detection and mitigation workflows using CrowdSec and custom automation
+- **Observability:** Centralized monitoring with Prometheus, Grafana, Netdata, exporters, and public dashboards
+- **Resilience:** Automated health checks, DNS failover, backup verification, and role-scoped service design

 ---

-## 🛡️ Detection & Response Lifecycle
+## Security Detection and Response

- **Triage:** Telemetry ingested from 7 active nodes into the Argus engine.
- **Escalation:** Post-exploitation indicators (e.g. webshells) trigger immediate `PERM_BAN`.
- **Retention:** 
-  - 24 hours for lower confidence scenarios  
-  - 14 days for high-confidence IOCs  
-  - 7 days for offender watchlist
- **Notification:** High-severity events dynamically pushed to Discord.
+Security controls are integrated directly into the platform rather than handled as one-off manual checks.
+
+Current detection and response patterns include:
+
+- Telemetry ingestion from 7 active nodes
+- CrowdSec-based detection and mitigation
+- MITRE ATT&CK mapping for selected security events
+- Escalation logic for high-confidence indicators
+- Watchlist and retention policies based on event confidence
+- High-severity event notifications through Discord
+- Runtime visibility through public and private dashboards

 ---

-## 🛠 The Arsenal
+## Technical Stack

-**Languages:** Python (Flask, Gunicorn), Bash, JavaScript (React, Node.js)  
-**Infrastructure:** Kubernetes (K8s), Docker, Caddy, NGINX  
-**Security:** Argus (Custom SIEM), CrowdSec, Trivy, SQLite, Vaultwarden  
-**Observability:** Prometheus, Blackbox Exporter, Node Exporter      
-**Backups:** Borgmatic, Rsync.net (Encrypted Offsite)
+**Languages:** Python, Bash, JavaScript (React, Node.js)  
+**Infrastructure:** Kubernetes, Docker, Caddy, NGINX, Linux  
+**Security:** CrowdSec, Trivy, Authelia, OIDC, MFA, Fail2Ban, Vaultwarden  
+**Cloud and Networking:** AWS, GCP, Oracle Cloud, Vultr, Cloudflare, DNS automation  
+**Observability:** Prometheus, Grafana, Netdata, Blackbox Exporter, Node Exporter, cAdvisor  
+**Backups:** Borgmatic, encrypted offsite backups, restore verification  
+**CI/CD and Source Control:** Git, GitHub Actions, Gitea, container image scanning  
+**Infrastructure as Code:** Terraform  

 ---

-### 🧠 Supporting Tooling & Concepts
+## Operational Metrics

-Actively used across this environment or in adjacent projects:
-
- **Security & Identity:** Fail2Ban, MITRE ATT&CK mapping, OIDC, Authelia, MFA, TLS hardening
- **Infrastructure & Cloud:** Linux (Debian/Ubuntu), Terraform, AWS, GCP, Oracle Cloud, Vultr
- **CI / Ops:** Git, GitHub Actions, container image scanning  
- **Observability (Extended):** Grafana, Netdata  
+Current platform highlights:

+- 10-node distributed infrastructure environment
+- 7-node security telemetry and detection footprint
+- `REPLACE_ME_LOC` lines of custom code across infrastructure, security, and automation projects
+- `REPLACE_ME_COMMITS` commits since January 1 across active repositories
+- Automated failover between AWS and peer infrastructure
+- Public dashboards for uptime, vulnerabilities, backups, threat telemetry, and service health
+- Multiple daily encrypted Borgmatic snapshots shipped offsite
+- Recurring backup verification and restore-oriented operational workflows

 ---

-## ⚡ Efficiency Metrics
-
- **Codebase Growth:** `REPLACE_ME_LOC` lines of custom code across all our repositories
- **Commit Velocity:** `REPLACE_ME_COMMITS` commits since Jan 1
- **Ares:** Ryzen 9 9950X sustaining ~0.06 load avg while running Gitea and a Kubernetes control plane
- **Resilience:** Automated failover between AWS and peer nodes
-
-## 📈 Activity
+## Activity

 ![Commit heatmap](public/heatmap.svg)

 ---

-### 🧩 Deployment Patterns
- **Reverse Proxy:** Caddy/NGINX (Cloudflare where applicable)
- **Observability:** Prometheus + Node Exporter + cAdvisor
- **Lifecycle:** Watchtower for controlled auto-updates
- **Access Control:** Authelia where exposed
- **Management:** Portainer (loopback-bound where possible)
+## Deployment Patterns

-> Nodes are intentionally heterogeneous.  
-> Each host is scoped to its role to reduce blast radius and cognitive load.
+- **Reverse proxy:** Caddy and NGINX, with Cloudflare where applicable
+- **Observability:** Prometheus, Grafana, Node Exporter, Blackbox Exporter, cAdvisor, Netdata
+- **Access control:** Authelia, OIDC, MFA, TLS hardening, and protected reverse-proxy routes
+- **Lifecycle management:** Controlled container updates and service monitoring
+- **Service isolation:** Nodes scoped by role to reduce blast radius and simplify recovery
+- **Backup strategy:** Encrypted offsite backups with recurring verification

 ---

-#### 📍 Triton
-Primary high-density services node running:
- Prometheus + Grafana
- Code-server
- Authelia
- Trilium
- CrowdSec bouncers
+## Selected Public Projects

-Optimized for sustained workloads and observability aggregation.
-
---
-
-### 🔗 Live Projects
- **Threat Decisions & Telemetry:** `threats.beane.me`
- **Threat Intelligence & Analytics:** `intel.beane.me`
- **Vulnerability Scanning (Trivy):** `vuln.beane.me`
- **Backups & Restore Verification:** `backups.beane.me`
+- **Portfolio:** `beane.me`
+- **Threat Decisions and Telemetry:** `threats.beane.me`
+- **Threat Intelligence and Analytics:** `intel.beane.me`
+- **Vulnerability Scanning and Trends:** `vuln.beane.me`
+- **Backup and Restore Verification:** `backups.beane.me`
 - **Threat Decision Observability:** `observe.beane.me`
- **Source Control (Gitea + K8s):** `git.beane.me`
+- **Health and Failover Dashboard:** `health.beane.me`
+- **Source Control:** `git.beane.me`
+- **Terraform Threat Modeling:** `tfstride.beane.me`

 ---

-## 🚜 Resource Management
+## Engineering Philosophy

- **Compute Density:** Kubernetes control plane with Postgres and CI workloads on Zen 5 hardware
- **Sovereignty:** All code, telemetry, and backups remain self-hosted
- **Backups:** Multiple daily encrypted Borgmatic snapshots shipped offsite
+Production systems should be observable, automated, recoverable, and secure from the start.

-> *"Production systems should be observable, automated, recoverable, and secure from the start."*
+I focus on infrastructure that explains itself: clear telemetry, deterministic automation, evidence-backed security findings, documented recovery paths, and controls that improve reliability without slowing delivery.