Add architecture overview #4

Merged
marco.deluca merged 3 commits from overall-architecture into main 2026-01-09 15:54:57 +00:00
Member
No description provided.
marco.deluca changed title from WIP: Add architecture overview to Add architecture overview 2026-01-08 08:27:43 +00:00
tobru force-pushed overall-architecture from 6dc8deec03 to b171f10143 2026-01-08 09:08:23 +00:00 Compare
tobru left a comment
Owner

Do we need to add some notes on how Project Syn plays its part?

@ -0,0 +20,4 @@
| Cluster Type | Purpose |
|--------------|---------|
| **Management Cluster** | Hosts the Servala Portal, centralized monitoring, and alerting infrastructure |
| **Workload Clusters** | Run customer workloads and AppCat service instances |
Owner

> Run customer workloads and AppCat service instances

"Run customer service instances and AppCat control-plane"

marco.deluca marked this conversation as resolved
@ -0,0 +22,4 @@
| **Management Cluster** | Hosts the Servala Portal, centralized monitoring, and alerting infrastructure |
| **Workload Clusters** | Run customer workloads and AppCat service instances |
All clusters run Talos Linux and are deployed using Terraform/OpenTofu across multiple cloud providers. The architecture is designed for secure, private access with no public Kubernetes API exposure.
Owner

Should we already mention the fact that there might be CSPs offering "Managed Talos", like Hidora/Hikube, where we would potentially use their service instead of bringing our own?
Author
Member

I have added a little note which states that we are open to it if the CSP offers a managed Kubernetes based on Talos

marco.deluca marked this conversation as resolved
@ -0,0 +24,4 @@
All clusters run Talos Linux and are deployed using Terraform/OpenTofu across multiple cloud providers. The architecture is designed for secure, private access with no public Kubernetes API exposure.
All clusters run a highly available control plane consisting of 3 master nodes. Worker nodes are scaled manually based on capacity needs.
Owner

> Worker nodes are scaled manually based on capacity needs.

"Worker nodes are currently scaled manually based on capacity needs."

marco.deluca marked this conversation as resolved
@ -0,0 +30,4 @@
AppCat is the core component of the Servala service catalog, built on [Crossplane](https://www.crossplane.io/). It runs on every cluster and enables the provisioning and management of cloud-native services for customers.
For detailed information on AppCat's architecture and capabilities, see the [AppCat documentation](https://docs.appcat.ch/index.html).
Owner

Maybe also link to https://kb.vshn.ch/app-catalog/ which is more technical and about the architecture

Author
Member

done

marco.deluca marked this conversation as resolved
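For context on what "provisioning and management of cloud-native services" means in Crossplane terms: customers request a service through a claim, and Crossplane reconciles it into the underlying resources. The sketch below is purely illustrative, uses a made-up `example.org` API group, and is not the actual AppCat API; the real claim kinds and fields are in the AppCat documentation linked in the diff.

```yaml
# Illustrative Crossplane-style claim only; group, kind, and parameters are
# invented for this sketch and do not reflect the real AppCat APIs.
apiVersion: example.org/v1
kind: PostgreSQLInstance
metadata:
  name: my-database
  namespace: customer-namespace
spec:
  parameters:
    size: small                    # hypothetical parameter defined by the XRD schema
  writeConnectionSecretToRef:
    name: my-database-connection   # Crossplane writes connection credentials to this Secret
```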
@ -0,0 +38,4 @@
### CIDR Allocation Strategy
To enable potential future mesh connectivity, each cluster receives non-overlapping network ranges:
Owner

Do you want to elaborate here on what constraints these prefix lengths will impose, i.e. what that means in terms of max nodes, pods, or services per cluster and the max number of clusters supported?
Author
Member

Added what these CIDRs imply

marco.deluca marked this conversation as resolved
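The capacity question raised above can be illustrated with a back-of-the-envelope example. The prefixes below are placeholders, not the values from the reviewed document; the point is only how prefix lengths translate into node, pod, service, and cluster limits.

```yaml
# Hypothetical allocation for one workload cluster, for illustration only.
cluster: c-servala-example-1
podCIDR: 10.64.0.0/16      # one /24 per node -> up to 256 nodes, ~110 pods/node (kubelet default)
serviceCIDR: 10.96.0.0/20  # about 4094 usable ClusterIP addresses
nodeCIDR: 10.0.1.0/24      # up to 254 node addresses on the machine network
# If each cluster gets one /16 pod range carved out of a 10.64.0.0/10 block,
# 64 clusters fit before pod ranges would start to overlap.
```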
@ -0,0 +87,4 @@
### Audit Logging
Kubernetes API server audit logging is enabled on all clusters to track who did what and when. Audit logs are collected centrally alongside other cluster logs.
Owner

I think we need to track this in a Jira issue so that we do not forget to set it up =)

Author
Member

done, it's in our board

marco.deluca marked this conversation as resolved
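The diff only states that API server audit logging is enabled; the policy itself is not part of this change. As a minimal sketch, assuming the standard `audit.k8s.io/v1` policy format, a "who did what and when" setup can look like this:

```yaml
# Minimal sketch of an audit policy; the actual policy on the clusters may differ.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Skip noisy, low-value read traffic on events.
  - level: None
    verbs: ["get", "list", "watch"]
    resources:
      - group: ""
        resources: ["events"]
  # Record who did what and when (metadata only, no request/response bodies) for everything else.
  - level: Metadata
```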
@ -0,0 +95,4 @@
## Naming Conventions
### Cluster Names
Owner

Can you elaborate where this cluster name is used? I think this is a Project Syn specific thing? Why do we need this `c-servala` prefix?
Author
Member

I just stuck with it since it makes it a bit easier for switching between the syn related stuff and my infra configs. I personally don't mind the prefix at all.

Owner

I'm OK with keeping the prefix. I'd still love to have a sentence added what this cluster name is used for, because for example it differs from the DNS names.

tobru marked this conversation as resolved
@ -0,0 +127,4 @@
| Group | Pattern | Examples |
|-------|---------|----------|
| Control plane | `master-[ID]` | `master-904e`, `master-8beb`, `master-dcb8` |
| Workers | `worker-[ID]` | `worker-e852` |
Owner

Do we care about the worker type in the naming scheme? For example, on Cloudscale we might have plus and flex workers.

Author
Member

My intention with this was to keep the worker naming scheme the same across the cluster no matter what compute flavour it uses. I am using node labels to distinguish this, for example: `node.kubernetes.io/instance-type: plus-16-4`
tobru marked this conversation as resolved
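To make the node-label approach mentioned above concrete, a workload can be pinned to a compute flavour with a `nodeSelector` on that label. Only the label key and value come from the comment; the deployment itself is invented for this example.

```yaml
# Illustrative only: schedule a workload onto "plus-16-4" workers via the
# instance-type label; names and image are made up.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-workload
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-workload
  template:
    metadata:
      labels:
        app: example-workload
    spec:
      nodeSelector:
        node.kubernetes.io/instance-type: plus-16-4
      containers:
        - name: app
          image: ghcr.io/example/app:latest
```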
@ -0,0 +129,4 @@
| Control plane | `master-[ID]` | `master-904e`, `master-8beb`, `master-dcb8` |
| Workers | `worker-[ID]` | `worker-e852` |
## Cluster Provisioning
Owner

I think this section belongs in a separate document. This is not architecture; it is more of a "how to" or runbook.
marco.deluca marked this conversation as resolved
@ -0,0 +276,4 @@
## Image Management
Container images are pulled from public registries (e.g., ghcr.io for AppCat components). [Spegel](https://github.com/spegel-org/spegel) is deployed on each cluster to provide peer-to-peer image sharing between nodes, reducing external registry pulls and improving pull performance.
Owner

Yay! Is this part of Talos or do we install / manage it?

Author
Member

it's something we slap on top

marco.deluca marked this conversation as resolved
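Since Spegel is installed on top of Talos rather than shipped with it, the nodes' containerd mirrors have to point at the node-local Spegel registry. A rough sketch of the Talos machine-config fragment this typically involves; the local port is an assumption, the real value comes from the Spegel deployment:

```yaml
# Sketch only: route ghcr.io pulls through the node-local Spegel mirror first.
# The port is a placeholder, not a confirmed Spegel default.
machine:
  registries:
    mirrors:
      ghcr.io:
        endpoints:
          - http://127.0.0.1:29999   # node-local Spegel endpoint (assumed port)
          - https://ghcr.io          # fall back to the upstream registry
```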
@ -0,0 +285,4 @@
| Scenario | Solution |
|----------|----------|
| CSP provides block storage | Use native CSP storage with appropriate CSI driver |
| CSP lacks storage options | Deploy Rook Ceph for software-defined storage |
Owner

I would go one step further: The CSP _must_ support CSI, otherwise we can't work with them. I don't think we should run Rook tbh. This is a qualification step for the CSP: No CSI? No Servala.
Author
Member

I am more than fine with this statement. Initially I only included it in the document because I was told that, for example, the performance of Exoscale's block storage was not great, but it has been a while since that was assessed. I still need to find out, but I think it'll be good enough.
Yes, I'll discard the Rook part. That makes things a lot easier for us.
tobru marked this conversation as resolved
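To make the "No CSI? No Servala." qualification concrete: on a qualifying CSP, block storage is exposed through its CSI driver via a StorageClass. The provisioner name below is a placeholder; each CSP ships its own driver.

```yaml
# Illustrative StorageClass; replace the placeholder provisioner with the
# CSP's actual CSI driver name.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csp-block-storage
provisioner: csi.example-csp.com
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```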
marco.deluca force-pushed overall-architecture from b171f10143 to 38e3b6e05a 2026-01-09 12:50:42 +00:00 Compare
marco.deluca force-pushed overall-architecture from 38e3b6e05a to de5fba5e2a 2026-01-09 12:52:50 +00:00 Compare