ADR 003 Cluster API Access #3

Merged
tobru merged 11 commits from adr/cluster-api-access into main 2026-01-08 09:06:55 +00:00
Owner
No description provided.
tobru force-pushed adr/cluster-api-access from ab3fae0e96 to 8be49c4932 2025-12-23 08:32:06 +00:00
tobru changed title from WIP: ADR 003 Cluster API Access to ADR 003 Cluster API Access 2025-12-23 10:21:50 +00:00
@ -0,0 +73,4 @@
## Decision
We go with **Netbird**.
Member

I agree that Netbird seems to be the most viable solution for our use cases.
I have tested Netbird quite extensively. It is a good alternative to Tailscale, and I like that we could potentially self-host the control server as well.
The Netbird Kubernetes Operator works, but the installation process was rather confusing because the documentation did not match the actual Helm chart. Still, I managed to get it working with the Helm chart.

I tried a different approach, which I personally quite like.
I installed the Netbird Talos extension on each node. On boot, the nodes self-registered and were automatically placed in the cluster group. I then manually created a network, added the Kubernetes API as a resource, and used all nodes as routing peers for the VIP I had configured for the Kubernetes API. This gives us fault-tolerant access to the Kubernetes API at both the Netbird and the Kubernetes level. Using Netbird's policy engine, I then allowed the clients group to access the Kubernetes API on port 6443, as well as node-level access on port 50000 (the Talos machine API). As a result, I don't need an initial bastion host or a temporary LBaaS to bootstrap the Talos cluster: I can boot the machines with their machine configurations via cloud-init (Netbird is already installed at this stage and bootstraps itself automatically), and then I only have to run `talosctl bootstrap`.
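To make the access model in the comment above easier to reason about, here is a minimal Python sketch of it. This is a hypothetical model, not the Netbird API or operator: the class and function names, the resource name `kubernetes-api` and the VIP `10.0.0.10` are made up for illustration. Only the `clients` and `cluster` groups and the ports 6443 (Kubernetes API) and 50000 (Talos machine API) come from the comment.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical model of the described setup (peers, groups, a network resource
# with routing peers, and policies). NOT the Netbird API; names are illustrative.


@dataclass
class Peer:
    name: str
    groups: set[str]
    online: bool = True


@dataclass
class NetworkResource:
    name: str
    address: str               # the VIP configured for the Kubernetes API (hypothetical value below)
    routing_peers: list[Peer]  # every Talos node acts as a routing peer


@dataclass
class Policy:
    source_group: str          # e.g. "clients"
    destination: str           # a resource name or a peer group
    ports: frozenset[int]


def select_routing_peer(resource: NetworkResource) -> Optional[Peer]:
    """Return the first online routing peer; any healthy node can forward to the VIP."""
    for peer in resource.routing_peers:
        if peer.online:
            return peer
    return None


def is_allowed(client: Peer, destination: str, port: int, policies: list[Policy]) -> bool:
    """Allow a connection if any policy matches the client's group, the destination and the port."""
    return any(
        p.source_group in client.groups and p.destination == destination and port in p.ports
        for p in policies
    )


# Three Talos nodes that self-registered into the "cluster" group.
nodes = [Peer(f"talos-{i}", {"cluster"}) for i in range(1, 4)]
kube_api = NetworkResource("kubernetes-api", "10.0.0.10", nodes)
policies = [
    Policy("clients", "kubernetes-api", frozenset({6443})),  # Kubernetes API
    Policy("clients", "cluster", frozenset({50000})),        # Talos machine API
]
laptop = Peer("laptop", {"clients"})

print(is_allowed(laptop, "kubernetes-api", 6443, policies))   # True
print(is_allowed(laptop, "kubernetes-api", 50000, policies))  # False
nodes[0].online = False  # simulate a failed node
print(select_routing_peer(kube_api).name)                     # talos-2
```

The point of using every node as a routing peer is visible in `select_routing_peer`: losing a single node does not cut off access to the VIP, while the Kubernetes side keeps the VIP itself highly available.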

I agree that Netbird seems to be the most viable solution for our use cases. I have tested Netbird quite extensively. It poses a good alternative to Tailscale. I like the fact that we could potentially self-host the control server too. The Netbird Kubernetes Operator works, but the installation process was rather confusing since the documentation did not reflect the actual Helm charts... However, I managed to get it working with the Helm chart. I tried a different approach, which I personally quite like. I installed the Netbird Talos Extension on each of the nodes. Upon booting, they self-registered and were placed in the cluster group automatically. I then created a network manually, created the Kubernetes API resource, and used all of the nodes as routing peers to access the VIP I had configured for the Kubernetes API. This allows us to access the Kubernetes API in a fault-tolerant way at both the Netbird and Kubernetes levels. Through Netbird's policy engine, I was then able to allow the clients group to access the Kubernetes API on port 6443, as well as allowing node-level access on port 50000, which is the Talos machine API. Consequently, I don't require an initial bastion host or a temporary LBaaS to bootstrap the Talos cluster. I can boot the machines with their machine configurations via CloudInit (NetBird is already installed and getting bootstrapped automatically at this stage). Then I only have to run `talosctl bootstrap`.
marco.deluca marked this conversation as resolved
@ -0,0 +83,4 @@
- There is no closed-source or enterprise edition; it can be fully [self-hosted](https://netbird.io/pricing#on-prem).
- Netbird GmbH, the company behind it, is [based in Berlin, Germany](https://netbird.io/imprint).
- [Pricing of the SaaS version](https://netbird.io/pricing) is moderate: it starts at $5 per user per month with a good feature set (including SSO).
- Kubernetes integration is well done via the [Kubernetes operator](https://github.com/netbirdio/kubernetes-operator).
Member

It's worth mentioning that the operator only comes into play when we want to give our customers service-level access. I was able to solve all infrastructure-related access problems without it.

It's worth mentioning that the operator only comes into play when we want to give our customers service-level access. I was able to solve all the infrastructure-related access problems without it
marco.deluca marked this conversation as resolved
@ -0,0 +96,4 @@
We'll have to decide whether we use the SaaS-hosted Netbird control plane or go fully self-hosted.
If we use the SaaS version, we need to perform additional due diligence on the product and factor its costs into our product.
Some differences between the self-hosted and SaaS versions are documented in [Self-hosted vs. Cloud-hosted Netbird](https://docs.netbird.io/selfhosted/self-hosted-vs-cloud-netbird#features).
It is not clear whether these are license or software restrictions. Currently, Netbird SaaS is hosted on AWS, which is an issue for us.
Member

For now, I think it's OK to use their offering, but in the long term we should definitely evaluate the self-hosted version, since AWS contradicts our principles.

For now I think it's OK to use their offering but in the long term we should definitely evaluate the self-hosted version since AWS contradicts our principles
marco.deluca marked this conversation as resolved
@ -0,0 +1,66 @@
# Kubernetes API Access
```
┌──────────────────────────────────────────────────────────────────────────────┐
```
Member

Depending on my comment above, and if you like the proposed solution, we can adjust this diagram slightly.

Depending on the comment I made above and if you like the proposed solution, we can adjust this diagram slightly
marco.deluca marked this conversation as resolved
Member

Also, I forgot to mention: the number of machines that can be registered under their managed plan is quite limited, which is another reason why self-hosting could be a good option.

Also I forgot to mention; The number of machines that can be registered under their managed plan is quite limited... Another reason why self-hosting could be a good option.
Member

@tobru I have updated the ADR with my findings and added more context on how that solution would be implemented. I have also removed the diagram for now; I am working on the overall architecture doc, which will include the updated access model.

@tobru I have updated the ADR with my findings and added some more context how that solution would be implemented. Furthermore I have removed the diagram for now, I am working on the overall architecture doc, which will include the updated access model.
tobru merged commit 8dcbb600cb into main 2026-01-08 09:06:55 +00:00
No reviewers
No milestone
No project
No assignees
2 participants
Due date

No due date set.

Dependencies

No dependencies set.

Reference
servala/documentation!3