ADR 003 Cluster API Access #3
servala/documentation!3
No description provided.
ab3fae0e96 to 8be49c4932
WIP: ADR 003 Cluster API Access to ADR 003 Cluster API Access

@ -0,0 +73,4 @@
## Decision
We go with **Netbird**.

I agree that Netbird seems to be the most viable solution for our use cases.
I have tested Netbird quite extensively. It is a good alternative to Tailscale, and I like that we could potentially self-host the control server too.
The Netbird Kubernetes Operator works, but the installation was rather confusing because the documentation did not match the actual Helm charts. In the end, I did get it working with the Helm chart.
I tried a different approach, which I personally quite like.
I installed the Netbird Talos extension on each of the nodes. On boot, the nodes self-registered and were placed in the cluster group automatically. I then manually created a network, created the Kubernetes API resource, and used all of the nodes as routing peers to reach the VIP I had configured for the Kubernetes API. This gives us fault-tolerant access to the Kubernetes API at both the Netbird and the Kubernetes level.

Through Netbird's policy engine, I then allowed the clients group to access the Kubernetes API on port 6443, and to access the nodes on port 50000, which is the Talos machine API.

Consequently, I don't need an initial bastion host or a temporary LBaaS to bootstrap the Talos cluster. I can boot the machines with their machine configurations via cloud-init (Netbird is already installed and bootstraps automatically at this stage). Then I only have to run `talosctl bootstrap`.

@ -0,0 +83,4 @@
- There is no closed source / enterprise version of it, it's possible to fully [self-host it](https://netbird.io/pricing#on-prem).
- Netbird GmbH, the company behind it, is [based in Berlin, Germany](https://netbird.io/imprint).
- [Pricing of the SaaS version](https://netbird.io/pricing) is moderate, starts at $5 user per month with a good feature set (including SSO).
- Integration in Kubernetes is well done with the [Kubernetes operator](https://github.com/netbirdio/kubernetes-operator).

It's worth mentioning that the operator only comes into play when we want to give our customers service-level access. I was able to solve all the infrastructure-related access problems without it.
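For anyone who wants to verify the infrastructure-level access described above from a machine in the clients group, here is a minimal sketch of a reachability check. It only uses the two ports named in this thread (6443 for the Kubernetes API, 50000 for the Talos machine API); the function names and any addresses passed in are hypothetical and not part of the ADR. Note that the Kubernetes API is expected to answer only after `talosctl bootstrap` has run, while the Talos API should already be reachable before that.

```python
import socket

# Ports named in the review comment; everything else here is illustrative.
KUBERNETES_API_PORT = 6443
TALOS_API_PORT = 50000


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unroutable addresses.
        return False


def check_cluster_access(vip: str, nodes: list[str]) -> dict[str, bool]:
    """Check the Kubernetes API on the VIP and the Talos API on every node."""
    results = {
        f"{vip}:{KUBERNETES_API_PORT}": port_is_reachable(vip, KUBERNETES_API_PORT)
    }
    for node in nodes:
        results[f"{node}:{TALOS_API_PORT}"] = port_is_reachable(node, TALOS_API_PORT)
    return results
```

Calling `check_cluster_access(vip, nodes)` with the Kubernetes API VIP and the node addresses returns a mapping of `host:port` to a boolean, which makes it easy to see whether a Netbird policy or a routing peer is the missing piece.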
@ -0,0 +96,4 @@
We'll have to decide if we use the SaaS-hosted control-plane of Netbird or if we go all-in self-hosted.
When we use the SaaS version, we have to do additional due diligence of the product and we need to factor in the costs in the product.
There are some differences between self-hosted and SaaS version documented: [Self-hosted vs. Cloud-hosted Netbird](https://docs.netbird.io/selfhosted/self-hosted-vs-cloud-netbird#features).
It's not clear if this is a license or software restriction. Right now, Netbird SaaS is hosted at AWS which is an issue to us.

For now I think it's OK to use their offering, but in the long term we should definitely evaluate the self-hosted version, since AWS contradicts our principles.
@ -0,0 +1,66 @@
# Kubernetes API Access
[ASCII diagram]

Depending on the comment I made above and whether you like the proposed solution, we can adjust this diagram slightly.
Also, I forgot to mention: the number of machines that can be registered under their managed plan is quite limited. That is another reason why self-hosting could be a good option.
@tobru I have updated the ADR with my findings and added some more context on how that solution would be implemented. Furthermore, I have removed the diagram for now. I am working on the overall architecture doc, which will include the updated access model.