Add architecture overview #4

Merged
marco.deluca merged 3 commits from overall-architecture into main 2026-01-09 15:54:57 +00:00
Member
No description provided.
marco.deluca changed title from WIP: Add architecture overview to Add architecture overview 2026-01-08 08:27:43 +00:00
tobru force-pushed overall-architecture from 6dc8deec03 to b171f10143 2026-01-08 09:08:23 +00:00 Compare
tobru left a comment
Owner

Do we need to add some notes on how Project Syn plays its part?

@ -0,0 +20,4 @@
| Cluster Type | Purpose |
|--------------|---------|
| **Management Cluster** | Hosts the Servala Portal, centralized monitoring, and alerting infrastructure |
| **Workload Clusters** | Run customer workloads and AppCat service instances |
Owner

> Run customer workloads and AppCat service instances

"Run customer service instances and AppCat control-plane"

marco.deluca marked this conversation as resolved
@ -0,0 +22,4 @@
| **Management Cluster** | Hosts the Servala Portal, centralized monitoring, and alerting infrastructure |
| **Workload Clusters** | Run customer workloads and AppCat service instances |
All clusters run Talos Linux and are deployed using Terraform/OpenTofu across multiple cloud providers. The architecture is designed for secure, private access with no public Kubernetes API exposure.
Owner

Should we already mention the fact that there might be CSPs offering "Managed Talos", like Hidora/Hikube, where we would potentially use their service instead of bringing our own?
Author
Member

I have added a little note which states that we are open to it if the CSP offers a managed Kubernetes based on Talos

marco.deluca marked this conversation as resolved
@ -0,0 +24,4 @@
All clusters run Talos Linux and are deployed using Terraform/OpenTofu across multiple cloud providers. The architecture is designed for secure, private access with no public Kubernetes API exposure.
All clusters run a highly available control plane consisting of 3 master nodes. Worker nodes are scaled manually based on capacity needs.
Owner

> Worker nodes are scaled manually based on capacity needs.

"Worker nodes are currently scaled manually based on capacity needs."

marco.deluca marked this conversation as resolved
@ -0,0 +30,4 @@
AppCat is the core component of the Servala service catalog, built on [Crossplane](https://www.crossplane.io/). It runs on every cluster and enables the provisioning and management of cloud-native services for customers.
For detailed information on AppCat's architecture and capabilities, see the [AppCat documentation](https://docs.appcat.ch/index.html).
Owner

Maybe also link to https://kb.vshn.ch/app-catalog/ which is more technical and about the architecture

Author
Member

done

marco.deluca marked this conversation as resolved
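For context on what "provisioning and management of cloud-native services" means in Crossplane terms: customers request a service through a claim, and Crossplane reconciles it into the underlying resources. The sketch below is purely illustrative, uses a made-up `example.org` API group, and is not the actual AppCat API; the real claim kinds and fields are in the AppCat documentation linked in the diff.

```yaml
# Illustrative Crossplane-style claim only; group, kind, and parameters are
# invented for this sketch and do not reflect the real AppCat APIs.
apiVersion: example.org/v1
kind: PostgreSQLInstance
metadata:
  name: my-database
  namespace: customer-namespace
spec:
  parameters:
    size: small                    # hypothetical parameter defined by the XRD schema
  writeConnectionSecretToRef:
    name: my-database-connection   # Crossplane writes connection credentials to this Secret
```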
@ -0,0 +38,4 @@
### CIDR Allocation Strategy
To enable potential future mesh connectivity, each cluster receives non-overlapping network ranges:
Owner

Do you want to elaborate here on what constraints these prefix lengths will impose, i.e. what that means in terms of max nodes, pods, or services per cluster and the max number of clusters supported?
Author
Member

Added what these CIDRs imply

marco.deluca marked this conversation as resolved
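The capacity question raised above can be illustrated with a back-of-the-envelope example. The prefixes below are placeholders, not the values from the reviewed document; the point is only how prefix lengths translate into node, pod, service, and cluster limits.

```yaml
# Hypothetical allocation for one workload cluster, for illustration only.
cluster: c-servala-example-1
podCIDR: 10.64.0.0/16      # one /24 per node -> up to 256 nodes, ~110 pods/node (kubelet default)
serviceCIDR: 10.96.0.0/20  # about 4094 usable ClusterIP addresses
nodeCIDR: 10.0.1.0/24      # up to 254 node addresses on the machine network
# If each cluster gets one /16 pod range carved out of a 10.64.0.0/10 block,
# 64 clusters fit before pod ranges would start to overlap.
```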
@ -0,0 +87,4 @@
### Audit Logging
Kubernetes API server audit logging is enabled on all clusters to track who did what and when. Audit logs are collected centrally alongside other cluster logs.
Owner

I think we need to track this in a Jira issue so that we do not forget to set it up =)

Author
Member

done, it's in our board

marco.deluca marked this conversation as resolved
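The diff only states that API server audit logging is enabled; the policy itself is not part of this change. As a minimal sketch, assuming the standard `audit.k8s.io/v1` policy format, a "who did what and when" setup can look like this:

```yaml
# Minimal sketch of an audit policy; the actual policy on the clusters may differ.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Skip noisy, low-value read traffic on events.
  - level: None
    verbs: ["get", "list", "watch"]
    resources:
      - group: ""
        resources: ["events"]
  # Record who did what and when (metadata only, no request/response bodies) for everything else.
  - level: Metadata
```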
@ -0,0 +95,4 @@
## Naming Conventions
### Cluster Names
Owner

Can you elaborate where this cluster name is used? I think this is a Project Syn specific thing? Why do we need this `c-servala` prefix?
Author
Member

I just stuck with it since it makes it a bit easier for switching between the syn related stuff and my infra configs. I personally don't mind the prefix at all.

Owner

I'm OK with keeping the prefix. I'd still love to have a sentence added what this cluster name is used for, because for example it differs from the DNS names.

tobru marked this conversation as resolved
@ -0,0 +127,4 @@
| Group | Pattern | Examples |
|-------|---------|----------|
| Control plane | `master-[ID]` | `master-904e`, `master-8beb`, `master-dcb8` |
| Workers | `worker-[ID]` | `worker-e852` |
Owner

Do we care about the worker type in the naming scheme? For example, on Cloudscale we might have plus and flex workers.

Author
Member

My intention with this was to keep the worker naming scheme the same across the cluster no matter what compute flavour it uses. I am using node labels to distinguish this, for example: `node.kubernetes.io/instance-type: plus-16-4`
tobru marked this conversation as resolved
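To make the node-label approach mentioned above concrete, a workload can be pinned to a compute flavour with a `nodeSelector` on that label. Only the label key and value come from the comment; the deployment itself is invented for this example.

```yaml
# Illustrative only: schedule a workload onto "plus-16-4" workers via the
# instance-type label; names and image are made up.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-workload
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-workload
  template:
    metadata:
      labels:
        app: example-workload
    spec:
      nodeSelector:
        node.kubernetes.io/instance-type: plus-16-4
      containers:
        - name: app
          image: ghcr.io/example/app:latest
```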
@ -0,0 +129,4 @@
| Control plane | `master-[ID]` | `master-904e`, `master-8beb`, `master-dcb8` |
| Workers | `worker-[ID]` | `worker-e852` |
## Cluster Provisioning
Owner

I think this section belongs in a separate document. This is not architecture; it is more of a "how to" or runbook.
marco.deluca marked this conversation as resolved
@ -0,0 +276,4 @@
## Image Management
Container images are pulled from public registries (e.g., ghcr.io for AppCat components). [Spegel](https://github.com/spegel-org/spegel) is deployed on each cluster to provide peer-to-peer image sharing between nodes, reducing external registry pulls and improving pull performance.
Owner

Yay! Is this part of Talos or do we install / manage it?

Author
Member

it's something we slap on top

marco.deluca marked this conversation as resolved
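Since Spegel is installed on top of Talos rather than shipped with it, the nodes' containerd mirrors have to point at the node-local Spegel registry. A rough sketch of the Talos machine-config fragment this typically involves; the local port is an assumption, the real value comes from the Spegel deployment:

```yaml
# Sketch only: route ghcr.io pulls through the node-local Spegel mirror first.
# The port is a placeholder, not a confirmed Spegel default.
machine:
  registries:
    mirrors:
      ghcr.io:
        endpoints:
          - http://127.0.0.1:29999   # node-local Spegel endpoint (assumed port)
          - https://ghcr.io          # fall back to the upstream registry
```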
@ -0,0 +285,4 @@
| Scenario | Solution |
|----------|----------|
| CSP provides block storage | Use native CSP storage with appropriate CSI driver |
| CSP lacks storage options | Deploy Rook Ceph for software-defined storage |
Owner

I would go one step further: The CSP _must_ support CSI, otherwise we can't work with them. I don't think we should run Rook tbh. This is a qualification step for the CSP: No CSI? No Servala.
Author
Member

I am more than fine with this statement. Initially I only included it in the document because I was told that, for example, the performance of Exoscale's block storage was not great, but it has been a while since that was assessed. I still need to find out, but I think it'll be good enough.
Yes, I'll discard the Rook part. That makes things a lot easier for us.
tobru marked this conversation as resolved
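To make the "No CSI? No Servala." qualification concrete: on a qualifying CSP, block storage is exposed through its CSI driver via a StorageClass. The provisioner name below is a placeholder; each CSP ships its own driver.

```yaml
# Illustrative StorageClass; replace the placeholder provisioner with the
# CSP's actual CSI driver name.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csp-block-storage
provisioner: csi.example-csp.com
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```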
marco.deluca force-pushed overall-architecture from b171f10143 to 38e3b6e05a 2026-01-09 12:50:42 +00:00 Compare
marco.deluca force-pushed overall-architecture from 38e3b6e05a to de5fba5e2a 2026-01-09 12:52:50 +00:00 Compare