docs(profile): refresh Gitea infrastructure profile

2026-05-27 10:27:14 -04:00
parent 95646eb281
commit 6fcdaa9f84
2 changed files with 152 additions and 166 deletions
@@ -1,126 +1,119 @@
-# 🛡️ Patrick Beane
+# Patrick Beane
-**SRE | Security Engineer | Self-Hosted Infra & Detection**
+**Infrastructure & Security Engineer | SRE | Cloud-Native Platforms**
-I design and operate **security-first, self-hosted infrastructure** focused on detection, resilience, and sovereignty.  
+I design and operate a self-directed production infrastructure platform focused on security automation, reliability engineering, observability, vulnerability management, and recoverability.
-My lab functions as a live production environment where threat intelligence, automation, and reliability engineering intersect.
+
 The environment spans Kubernetes, Linux, multi-cloud infrastructure, identity controls, threat detection, backup verification, and public operational dashboards. My goal is to build systems that are secure-by-default, observable in production, and recoverable under failure.
 ---
-## 🛰 The Fleet (10 Nodes)
+## Production Infrastructure Overview
-> This environment blends production, research, and continuous experimentation.
+This environment blends production operations, security research, and continuous infrastructure improvement. Services are distributed across cloud and self-hosted nodes, with each node scoped to a specific operational role to reduce blast radius and simplify ownership.
 > Availability and controls are intentionally tuned per node role.
-| Node | Role | Specs | Status |
+| Node | Primary Role | Function |
-| :--- | :--- | :--- | :--- |
+| :--- | :--- | :--- |
-| **Argus** | SIEM / Brain / node-health Failover | Xeon E5-2660v2 (1 core) | 🟢 Online |
+| **Argus** | Security telemetry and failover automation | Threat detection, event correlation, and node-health automation |
-| **Triton** | High Performance Compute | EPYC 9634 (8 cores) | 🟢 Online |
+| **Triton** | Observability and internal tooling | Prometheus, Grafana, Authelia, code-server, CrowdSec bouncers |
-| **Ares** | Gitea / Kubernetes Management Node (MicroK8s) | Ryzen 9 9950X (8 cores) | 🟢 Online |
+| **Ares** | Kubernetes and source control | Gitea, PostgreSQL, Valkey, CI runners, Kubernetes control plane |
-| **Zephyrus** | Container Host | Ryzen 9 7950X (4 cores) | 🟢 Online |
+| **Zephyrus** | Container hosting | Docker workloads and service hosting |
-| **Iris** | NGINX / PHP Edge | Vultr | 🟢 Online |
+| **Iris** | Edge services | NGINX/PHP ingress and public-facing services |
-| **Vault** | Secrets Management | GCP (Vaultwarden) | 🟢 Online |
+| **Vault** | Secrets and identity-adjacent services | Vaultwarden and protected internal services |
-| **Apollo** | Intel Dashboard (Flask) | AWS | 🟢 Online |
+| **Apollo** | Threat intelligence dashboard | Flask-based analytics and reporting |
-| **Hermes** | Public API (Frontend) | Oracle Cloud | 🟢 Online |
+| **Hermes** | Public API frontend | Public API and frontend service layer |
-| **Hades** | Public API (Backend) | Oracle Cloud | 🟢 Online |
+| **Hades** | Public API backend | Backend service support for public APIs |
-| **Zeus** | Monitoring / Metrics NOC | Xeon Gold 6150 (1 core) | 🟢 Online |
+| **Zeus** | Monitoring and metrics | Centralized observability and service-health tracking |
 ---
-## 🌐 Infrastructure Strategy
+## Infrastructure Strategy
-- **Compute Layer:** Zen 5 (9950X), Zen 4 (7950X), EPYC 9634 for sustained workloads.
+- **Compute layer:** Heterogeneous self-hosted and cloud infrastructure scoped by workload type
-- **Edge Layer:** Oracle Cloud & Vultr for low-latency public ingress.
+- **Edge layer:** Cloud and VPS ingress for public services and low-latency routing
-- **Sentinel Layer:** **Argus SIEM** correlating telemetry and enforcing distributed decisions across nodes.
+- **Security telemetry:** Multi-node detection and mitigation workflows using CrowdSec and custom automation
-- **Observability:** Zeus as the centralized NOC and metrics authority.
+- **Observability:** Centralized monitoring with Prometheus, Grafana, Netdata, exporters, and public dashboards
 - **Resilience:** Automated health checks, DNS failover, backup verification, and role-scoped service design
 ---
-## 🛡️ Detection & Response Lifecycle
+## Security Detection and Response
-- **Triage:** Telemetry ingested from 7 active nodes into the Argus engine.
+Security controls are integrated directly into the platform rather than handled as one-off manual checks.
-- **Escalation:** Post-exploitation indicators (e.g. webshells) trigger immediate `PERM_BAN`.
+
-- **Retention:**
+Current detection and response patterns include:
-  - 24 hours for lower confidence scenarios  
+
-  - 14 days for high-confidence IOCs  
+- Telemetry ingestion from 7 active nodes
-  - 7 days for offender watchlist
+- CrowdSec-based detection and mitigation
-- **Notification:** High-severity events dynamically pushed to Discord.
+- MITRE ATT&CK mapping for selected security events
 - Escalation logic for high-confidence indicators
 - Watchlist and retention policies based on event confidence
 - High-severity event notifications through Discord
 - Runtime visibility through public and private dashboards
 ---
-## 🛠 The Arsenal
+## Technical Stack
-**Languages:** Python (Flask, Gunicorn), Bash, JavaScript (React, Node.js)  
+**Languages:** Python, Bash, JavaScript, React, Node.js  
-**Infrastructure:** Kubernetes (K8s), Docker, Caddy, NGINX  
+**Infrastructure:** Kubernetes, Docker, Caddy, NGINX, Linux  
-**Security:** Argus (Custom SIEM), CrowdSec, Trivy, SQLite, Vaultwarden  
+**Security:** CrowdSec, Trivy, Authelia, OIDC, MFA, Fail2Ban, Vaultwarden  
-**Observability:** Prometheus, Blackbox Exporter, Node Exporter      
+**Cloud and Networking:** AWS, GCP, Oracle Cloud, Vultr, Cloudflare, DNS automation  
-**Backups:** Borgmatic, Rsync.net (Encrypted Offsite)
+**Observability:** Prometheus, Grafana, Netdata, Blackbox Exporter, Node Exporter, cAdvisor  
 **Backups:** Borgmatic, encrypted offsite backups, restore verification  
 **CI/CD and Source Control:** Git, GitHub Actions, Gitea, container image scanning  
 **Infrastructure as Code:** Terraform  
 ---
-### 🧠 Supporting Tooling & Concepts
+## Operational Metrics
-Actively used across this environment or in adjacent projects:
+Current platform highlights:
 - **Security & Identity:** Fail2Ban, MITRE ATT&CK mapping, OIDC, Authelia, MFA, TLS hardening
 - **Infrastructure & Cloud:** Linux (Debian/Ubuntu), Terraform, AWS, GCP, Oracle Cloud, Vultr
 - **CI / Ops:** Git, GitHub Actions, container image scanning  
 - **Observability (Extended):** Grafana, Netdata  
 - 10-node distributed infrastructure environment
 - 7-node security telemetry and detection footprint
 - `107096` lines of custom code across infrastructure, security, and automation projects
 - `1366` commits since January 1 across active repositories
 - Automated failover between AWS and peer infrastructure
 - Public dashboards for uptime, vulnerabilities, backups, threat telemetry, and service health
 - Multiple daily encrypted Borgmatic snapshots shipped offsite
 - Recurring backup verification and restore-oriented operational workflows
 ---
-## ⚡ Efficiency Metrics
+## Activity
 - **Codebase Growth:** `107096` lines of custom code across all our repositories
 - **Commit Velocity:** `1366` commits since Jan 1
 - **Ares:** Ryzen 9 9950X sustaining ~0.06 load avg while running Gitea and a Kubernetes control plane
 - **Resilience:** Automated failover between AWS and peer nodes
 ## 📈 Activity
 ![Commit heatmap](public/heatmap.svg)
 ---
-### 🧩 Deployment Patterns
+## Deployment Patterns
 - **Reverse Proxy:** Caddy/NGINX (Cloudflare where applicable)
 - **Observability:** Prometheus + Node Exporter + cAdvisor
 - **Lifecycle:** Watchtower for controlled auto-updates
 - **Access Control:** Authelia where exposed
 - **Management:** Portainer (loopback-bound where possible)
-> Nodes are intentionally heterogeneous.  
+- **Reverse proxy:** Caddy and NGINX, with Cloudflare where applicable
-> Each host is scoped to its role to reduce blast radius and cognitive load.
+- **Observability:** Prometheus, Grafana, Node Exporter, Blackbox Exporter, cAdvisor, Netdata
 - **Access control:** Authelia, OIDC, MFA, TLS hardening, and protected reverse-proxy routes
 - **Lifecycle management:** Controlled container updates and service monitoring
 - **Service isolation:** Nodes scoped by role to reduce blast radius and simplify recovery
 - **Backup strategy:** Encrypted offsite backups with recurring verification
 ---
-#### 📍 Triton
+## Selected Public Projects
 Primary high-density services node running:
 - Prometheus + Grafana
 - Code-server
 - Authelia
 - Trilium
 - CrowdSec bouncers
-Optimized for sustained workloads and observability aggregation.
+- **Portfolio:** `beane.me`
-
+- **Threat Decisions and Telemetry:** `threats.beane.me`
----
+- **Threat Intelligence and Analytics:** `intel.beane.me`
-
+- **Vulnerability Scanning and Trends:** `vuln.beane.me`
-### 🔗 Live Projects
+- **Backup and Restore Verification:** `backups.beane.me`
 - **Threat Decisions & Telemetry:** `threats.beane.me`
 - **Threat Intelligence & Analytics:** `intel.beane.me`
 - **Vulnerability Scanning (Trivy):** `vuln.beane.me`
 - **Backups & Restore Verification:** `backups.beane.me`
 - **Threat Decision Observability:** `observe.beane.me`
-- **Source Control (Gitea + K8s):** `git.beane.me`
+- **Health and Failover Dashboard:** `health.beane.me`
 - **Source Control:** `git.beane.me`
 - **Terraform Threat Modeling:** `tfstride.beane.me`
 ---
-## 🚜 Resource Management
+## Engineering Philosophy
-- **Compute Density:** Kubernetes control plane with Postgres and CI workloads on Zen 5 hardware
+Production systems should be observable, automated, recoverable, and secure from the start.
 - **Sovereignty:** All code, telemetry, and backups remain self-hosted
 - **Backups:** Multiple daily encrypted Borgmatic snapshots shipped offsite
-> *"Production systems should be observable, automated, recoverable, and secure from the start."*
+I focus on infrastructure that explains itself: clear telemetry, deterministic automation, evidence-backed security findings, documented recovery paths, and controls that improve reliability without slowing delivery.
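Note on the hunk above: the removed Retention and Escalation bullets (24 hours, 7 days, 14 days, and an immediate `PERM_BAN`) amount to a mapping from event confidence to decision lifetime, which the new text summarizes as "retention policies based on event confidence". A minimal Python sketch of that mapping follows. The durations come from the removed lines; the category names and the `expires_at` helper are hypothetical and are not taken from the Argus codebase, which this commit does not show.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Durations taken from the removed Retention bullets; the category names
# are hypothetical labels, not identifiers from the actual project.
RETENTION = {
    "low_confidence": timedelta(hours=24),
    "offender_watchlist": timedelta(days=7),
    "high_confidence_ioc": timedelta(days=14),
}


def expires_at(category: str, seen: datetime) -> Optional[datetime]:
    """Return when a decision lapses, or None for a permanent ban."""
    if category == "post_exploitation":
        # The removed Escalation bullet describes an immediate PERM_BAN
        # for post-exploitation indicators such as webshells.
        return None
    return seen + RETENTION[category]


seen = datetime(2026, 5, 27, 14, 27, tzinfo=timezone.utc)
ioc_expiry = expires_at("high_confidence_ioc", seen)
print(ioc_expiry)
```

Returning `None` for the permanent case keeps "never expires" distinct from any finite duration, so a caller cannot mistake it for a very long ban.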
@@ -1,126 +1,119 @@
-# 🛡️ Patrick Beane
+# Patrick Beane
-**SRE | Security Engineer | Self-Hosted Infra & Detection**
+**Infrastructure & Security Engineer | SRE | Cloud-Native Platforms**
-I design and operate **security-first, self-hosted infrastructure** focused on detection, resilience, and sovereignty.  
+I design and operate a self-directed production infrastructure platform focused on security automation, reliability engineering, observability, vulnerability management, and recoverability.
-My lab functions as a live production environment where threat intelligence, automation, and reliability engineering intersect.
+
 The environment spans Kubernetes, Linux, multi-cloud infrastructure, identity controls, threat detection, backup verification, and public operational dashboards. My goal is to build systems that are secure-by-default, observable in production, and recoverable under failure.
 ---
-## 🛰 The Fleet (10 Nodes)
+## Production Infrastructure Overview
-> This environment blends production, research, and continuous experimentation.
+This environment blends production operations, security research, and continuous infrastructure improvement. Services are distributed across cloud and self-hosted nodes, with each node scoped to a specific operational role to reduce blast radius and simplify ownership.
 > Availability and controls are intentionally tuned per node role.
-| Node | Role | Specs | Status |
+| Node | Primary Role | Function |
-| :--- | :--- | :--- | :--- |
+| :--- | :--- | :--- |
-| **Argus** | SIEM / Brain / node-health Failover | Xeon E5-2660v2 (1 core) | 🟢 Online |
+| **Argus** | Security telemetry and failover automation | Threat detection, event correlation, and node-health automation |
-| **Triton** | High Performance Compute | EPYC 9634 (8 cores) | 🟢 Online |
+| **Triton** | Observability and internal tooling | Prometheus, Grafana, Authelia, code-server, CrowdSec bouncers |
-| **Ares** | Gitea / Kubernetes Management Node (MicroK8s) | Ryzen 9 9950X (8 cores) | 🟢 Online |
+| **Ares** | Kubernetes and source control | Gitea, PostgreSQL, Valkey, CI runners, Kubernetes control plane |
-| **Zephyrus** | Container Host | Ryzen 9 7950X (4 cores) | 🟢 Online |
+| **Zephyrus** | Container hosting | Docker workloads and service hosting |
-| **Iris** | NGINX / PHP Edge | Vultr | 🟢 Online |
+| **Iris** | Edge services | NGINX/PHP ingress and public-facing services |
-| **Vault** | Secrets Management | GCP (Vaultwarden) | 🟢 Online |
+| **Vault** | Secrets and identity-adjacent services | Vaultwarden and protected internal services |
-| **Apollo** | Intel Dashboard (Flask) | AWS | 🟢 Online |
+| **Apollo** | Threat intelligence dashboard | Flask-based analytics and reporting |
-| **Hermes** | Public API (Frontend) | Oracle Cloud | 🟢 Online |
+| **Hermes** | Public API frontend | Public API and frontend service layer |
-| **Hades** | Public API (Backend) | Oracle Cloud | 🟢 Online |
+| **Hades** | Public API backend | Backend service support for public APIs |
-| **Zeus** | Monitoring / Metrics NOC | Xeon Gold 6150 (1 core) | 🟢 Online |
+| **Zeus** | Monitoring and metrics | Centralized observability and service-health tracking |
 ---
-## 🌐 Infrastructure Strategy
+## Infrastructure Strategy
-- **Compute Layer:** Zen 5 (9950X), Zen 4 (7950X), EPYC 9634 for sustained workloads.
+- **Compute layer:** Heterogeneous self-hosted and cloud infrastructure scoped by workload type
-- **Edge Layer:** Oracle Cloud & Vultr for low-latency public ingress.
+- **Edge layer:** Cloud and VPS ingress for public services and low-latency routing
-- **Sentinel Layer:** **Argus SIEM** correlating telemetry and enforcing distributed decisions across nodes.
+- **Security telemetry:** Multi-node detection and mitigation workflows using CrowdSec and custom automation
-- **Observability:** Zeus as the centralized NOC and metrics authority.
+- **Observability:** Centralized monitoring with Prometheus, Grafana, Netdata, exporters, and public dashboards
 - **Resilience:** Automated health checks, DNS failover, backup verification, and role-scoped service design
 ---
-## 🛡️ Detection & Response Lifecycle
+## Security Detection and Response
-- **Triage:** Telemetry ingested from 7 active nodes into the Argus engine.
+Security controls are integrated directly into the platform rather than handled as one-off manual checks.
-- **Escalation:** Post-exploitation indicators (e.g. webshells) trigger immediate `PERM_BAN`.
+
-- **Retention:**
+Current detection and response patterns include:
-  - 24 hours for lower confidence scenarios  
+
-  - 14 days for high-confidence IOCs  
+- Telemetry ingestion from 7 active nodes
-  - 7 days for offender watchlist
+- CrowdSec-based detection and mitigation
-- **Notification:** High-severity events dynamically pushed to Discord.
+- MITRE ATT&CK mapping for selected security events
 - Escalation logic for high-confidence indicators
 - Watchlist and retention policies based on event confidence
 - High-severity event notifications through Discord
 - Runtime visibility through public and private dashboards
 ---
-## 🛠 The Arsenal
+## Technical Stack
-**Languages:** Python (Flask, Gunicorn), Bash, JavaScript (React, Node.js)  
+**Languages:** Python, Bash, JavaScript, React, Node.js  
-**Infrastructure:** Kubernetes (K8s), Docker, Caddy, NGINX  
+**Infrastructure:** Kubernetes, Docker, Caddy, NGINX, Linux  
-**Security:** Argus (Custom SIEM), CrowdSec, Trivy, SQLite, Vaultwarden  
+**Security:** CrowdSec, Trivy, Authelia, OIDC, MFA, Fail2Ban, Vaultwarden  
-**Observability:** Prometheus, Blackbox Exporter, Node Exporter      
+**Cloud and Networking:** AWS, GCP, Oracle Cloud, Vultr, Cloudflare, DNS automation  
-**Backups:** Borgmatic, Rsync.net (Encrypted Offsite)
+**Observability:** Prometheus, Grafana, Netdata, Blackbox Exporter, Node Exporter, cAdvisor  
 **Backups:** Borgmatic, encrypted offsite backups, restore verification  
 **CI/CD and Source Control:** Git, GitHub Actions, Gitea, container image scanning  
 **Infrastructure as Code:** Terraform  
 ---
-### 🧠 Supporting Tooling & Concepts
+## Operational Metrics
-Actively used across this environment or in adjacent projects:
+Current platform highlights:
 - **Security & Identity:** Fail2Ban, MITRE ATT&CK mapping, OIDC, Authelia, MFA, TLS hardening
 - **Infrastructure & Cloud:** Linux (Debian/Ubuntu), Terraform, AWS, GCP, Oracle Cloud, Vultr
 - **CI / Ops:** Git, GitHub Actions, container image scanning  
 - **Observability (Extended):** Grafana, Netdata  
 - 10-node distributed infrastructure environment
 - 7-node security telemetry and detection footprint
 - `REPLACE_ME_LOC` lines of custom code across infrastructure, security, and automation projects
 - `REPLACE_ME_COMMITS` commits since January 1 across active repositories
 - Automated failover between AWS and peer infrastructure
 - Public dashboards for uptime, vulnerabilities, backups, threat telemetry, and service health
 - Multiple daily encrypted Borgmatic snapshots shipped offsite
 - Recurring backup verification and restore-oriented operational workflows
 ---
-## ⚡ Efficiency Metrics
+## Activity
 - **Codebase Growth:** `REPLACE_ME_LOC` lines of custom code across all our repositories
 - **Commit Velocity:** `REPLACE_ME_COMMITS` commits since Jan 1
 - **Ares:** Ryzen 9 9950X sustaining ~0.06 load avg while running Gitea and a Kubernetes control plane
 - **Resilience:** Automated failover between AWS and peer nodes
 ## 📈 Activity
 ![Commit heatmap](public/heatmap.svg)
 ---
-### 🧩 Deployment Patterns
+## Deployment Patterns
 - **Reverse Proxy:** Caddy/NGINX (Cloudflare where applicable)
 - **Observability:** Prometheus + Node Exporter + cAdvisor
 - **Lifecycle:** Watchtower for controlled auto-updates
 - **Access Control:** Authelia where exposed
 - **Management:** Portainer (loopback-bound where possible)
-> Nodes are intentionally heterogeneous.  
+- **Reverse proxy:** Caddy and NGINX, with Cloudflare where applicable
-> Each host is scoped to its role to reduce blast radius and cognitive load.
+- **Observability:** Prometheus, Grafana, Node Exporter, Blackbox Exporter, cAdvisor, Netdata
 - **Access control:** Authelia, OIDC, MFA, TLS hardening, and protected reverse-proxy routes
 - **Lifecycle management:** Controlled container updates and service monitoring
 - **Service isolation:** Nodes scoped by role to reduce blast radius and simplify recovery
 - **Backup strategy:** Encrypted offsite backups with recurring verification
 ---
-#### 📍 Triton
+## Selected Public Projects
 Primary high-density services node running:
 - Prometheus + Grafana
 - Code-server
 - Authelia
 - Trilium
 - CrowdSec bouncers
-Optimized for sustained workloads and observability aggregation.
+- **Portfolio:** `beane.me`
-
+- **Threat Decisions and Telemetry:** `threats.beane.me`
----
+- **Threat Intelligence and Analytics:** `intel.beane.me`
-
+- **Vulnerability Scanning and Trends:** `vuln.beane.me`
-### 🔗 Live Projects
+- **Backup and Restore Verification:** `backups.beane.me`
 - **Threat Decisions & Telemetry:** `threats.beane.me`
 - **Threat Intelligence & Analytics:** `intel.beane.me`
 - **Vulnerability Scanning (Trivy):** `vuln.beane.me`
 - **Backups & Restore Verification:** `backups.beane.me`
 - **Threat Decision Observability:** `observe.beane.me`
-- **Source Control (Gitea + K8s):** `git.beane.me`
+- **Health and Failover Dashboard:** `health.beane.me`
 - **Source Control:** `git.beane.me`
 - **Terraform Threat Modeling:** `tfstride.beane.me`
 ---
-## 🚜 Resource Management
+## Engineering Philosophy
-- **Compute Density:** Kubernetes control plane with Postgres and CI workloads on Zen 5 hardware
+Production systems should be observable, automated, recoverable, and secure from the start.
 - **Sovereignty:** All code, telemetry, and backups remain self-hosted
 - **Backups:** Multiple daily encrypted Borgmatic snapshots shipped offsite
-> *"Production systems should be observable, automated, recoverable, and secure from the start."*
+I focus on infrastructure that explains itself: clear telemetry, deterministic automation, evidence-backed security findings, documented recovery paths, and controls that improve reliability without slowing delivery.
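Note on the two files: the second file in this commit carries `REPLACE_ME_LOC` and `REPLACE_ME_COMMITS` where the first carries `107096` and `1366`, which suggests the published profile is rendered from a template. The commit does not show how that rendering is done. The sketch below assumes plain token substitution in Python; `render_profile` and the placeholder pattern are hypothetical names introduced only for illustration.

```python
import re

# Hypothetical placeholder pattern: the commit only shows the two tokens
# REPLACE_ME_LOC and REPLACE_ME_COMMITS, not how they are filled in.
PLACEHOLDER = re.compile(r"REPLACE_ME_[A-Z_]+")


def render_profile(template: str, values: dict[str, str]) -> str:
    """Replace every REPLACE_ME_* token, failing loudly on unknown tokens."""

    def substitute(match: re.Match) -> str:
        token = match.group(0)
        if token not in values:
            raise KeyError(f"no value supplied for {token}")
        return values[token]

    return PLACEHOLDER.sub(substitute, template)


template_line = "- `REPLACE_ME_COMMITS` commits since January 1 across active repositories"
rendered_line = render_profile(
    template_line,
    {"REPLACE_ME_LOC": "107096", "REPLACE_ME_COMMITS": "1366"},
)
print(rendered_line)
```

Raising on an unknown token, rather than leaving it in place, is a deliberate choice in this sketch: it keeps a stray `REPLACE_ME_*` string from reaching the published profile.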