adr discussing the choice of kubernetes distro

2025-12-08 15:42:55 +01:00 · bd103e17fe
commit bd103e17fe
parent 8c64107789
1 changed files with 38 additions and 0 deletions
--- a/docs/ADRs/adr002.md
+++ b/docs/ADRs/adr002.md
@@ -0,0 +1,38 @@
 ---
 status: draft
 date: 2025-12-05
 author: Marco De Luca and Tobias Brunner
 ---
 # ADR 002 Kubernetes Distribution
 ## Context and Problem Statement
 TODO
 ## Considered Options
 We evaluated two options for running Kubernetes on each cloud service provider (CSP): using the CSP's managed Kubernetes offering, or deploying our own Kubernetes distribution (e.g., Talos Linux) on top of the CSP's compute layer. We explicitly decided against OpenShift and OKE, mainly because of their high cost, added complexity, and the bloat they introduce for a platform that only needs to run hosted AppCat workloads. Lower infrastructure cost directly benefits Servala and allows us to offer more competitive pricing.
 For Servala, consistency across CSPs and predictable behavior for AppCat are essential. Managed Kubernetes offerings differ significantly between providers, which fragments the platform and makes AppCat development, testing, and support more difficult. A bring-your-own (BYO) Kubernetes approach gives us full control over versions, components, and security defaults, enabling a standardized setup across all CSPs.
 The table below summarizes the main differences we identified.
 | BYO Kubernetes                                                                                                    | Cloud Kubernetes                                                                                   |
 | ----------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------- |
 | :lucide-check: We have a lot already in place (concept)                                                           | :lucide-check: Works out of the box, pre-initialized                                               |
 | :lucide-check: Standardized infrastructure across CSPs                                                            | :lucide-check: Potentially better integrated                                                       |
 | :lucide-check: Easier for AppCat to develop and test                                                              | :lucide-check: Pre-defined upgrade paths                                                           |
 | :lucide-check: Full control over versions, upgrade cadence and feature gates                                      | :lucide-check: Support from CSP                                                                    |
 | :lucide-check: Freedom of choice of cluster components                                                            | :lucide-x: Limited flexibility                                                                     |
 | :lucide-check: Potentially better security                                                                        | :lucide-x: Inconsistency across CSPs (different k8s flavors, versions, CRDs, API feature gates)    |
 | :lucide-check: Predictable cluster behavior across CSPs                                                           | :lucide-x: Harder for AppCat to test on different environments                                     |
 | :lucide-check: Easier to implement in a GitOps-first pattern                                                      | :lucide-x: Opinionated software and constraints                                                    |
 | :lucide-check: Potentially cheaper: cost scales with raw compute rather than per-cluster service fees             | :lucide-x: Unpredictable behavior (e.g., noisy neighbors)                                          |
 | :lucide-check: Streamlined support and troubleshooting model                                                      |                                                                                                    |
 | :lucide-x: We have to deal with the infrastructure ourselves                                                      |                                                                                                    |
 | :lucide-x: We have to manage the whole stack                                                                      |                                                                                                    |
 ## Decision
 ### Consequences