commit 0cb8bc2e85e73a45b281260398c13d48dfaa3095bd6a007b4e3948a39ae7efe0
Author: mue
Date: Sun Dec 29 17:02:26 2024 +0100

    Initial commit

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..adf7655
--- /dev/null
+++ b/README.md
@@ -0,0 +1,90 @@
+# k8sbc
+
+## Original description of the 38C3 planning session
+We want to build a Kubernetes cluster on distributed single-board computers (SBCs).
+The idea is a self-hosted / home-hosted system with nodes distributed among the homes
+of friends or people one trusts.
+
+Nodes should be easily bootstrappable, which is why we want to start with
+one kind of SBC, so that we only need to provide a single system image
+for that specific hardware. Nodes are connected with WireGuard.
+
+This is not a hands-on workshop but a self-organized planning session.
+
+## Communication / Collaboration
+
+Check out the 'kickoff' repo and add your nick and mail address, and we will
+create a Gitea account for you:
+
+- mue mue-k8sbc@e2m.io
+- steigr
+- juwi
+- hybris
+
+Mailing list?
+Matrix room?
+
+## Notes from 38C3
+We had a first planning session at 38C3 discussing some of the project aspects,
+especially latency issues with Kubernetes / etcd and distributed storage solutions.
+Furthermore, we discussed how to handle the cluster IP / entry point, ingresses,
+and how nodes discover each other and update their (dynamic) IP addresses.
+
+Running the control plane on a cloud provider (e.g. Hetzner) would ease things but is also less interesting.
+It is more challenging to also run the control plane within our SBC cluster.
+
+WireGuard seems to be the preferred technology to establish the 'node interconnect'.
+Alternatives are Tailscale and Tinc.
+
+### Node discovery and Ingress
+TODO: Split this topic into 'node discovery' (for nodes joining the cluster)
+and 'ingress' (service discovery inside the cluster)
+
+DNS-only solution: nodes update their address in the public DNS server
+
+- [Cilium](https://cilium.io/)
+- [cloud-controller-manager](https://kubernetes.io/docs/concepts/architecture/cloud-controller)
+- [Tor hidden service](https://community.torproject.org/onion-services/setup/)
+
+### Storage
+An S3-compatible layer provided by Ceph is one option.
+[Ceph Stretch](https://docs.ceph.com/en/latest/rados/operations/stretch-mode/)
+needs 3 zones (an arbiter zone versus data zones) with a maximum of 700ms latency.
+Ceph stretch mode specifically caters for higher latencies,
+because its use case is to span several geographically separated data centers.
+Possibly [Rook](https://rook.io/docs/rook/v1.9/ceph-storage.html) as provider.
+
+- [OpenEBS](https://openebs.io/)
+- [Longhorn](https://longhorn.io/)
+
+### Problems
+- etcd latency (see the sketch below)
+- Network bandwidth and latency are not predictable for the individual participants' connections
+- Low bandwidth, not symmetric with regard to upload and download
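+
+As a starting point for the etcd latency problem, here is a minimal sketch of etcd's
+timing knobs: the heartbeat interval and election timeout can be raised well above
+their defaults (100 ms / 1000 ms) for high-latency links. The concrete values below
+are assumptions for illustration, not measurements from our connections:
+
+```sh
+# Hypothetical timing values for etcd members talking to each other over WireGuard
+# across residential links; the remaining member/cluster flags are omitted here.
+etcd --heartbeat-interval=500 --election-timeout=5000
+```
+
+How these flags get set depends on the distribution we pick (kubeadm and Talos both
+allow passing extra arguments to etcd), which we still need to evaluate.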
+
+### What we did _not_ discuss
+What kind of SBC to use or support.
+The more RAM the better. Many SBCs are low on RAM, typically offering only up to 4GB,
+and while SBCs with much more RAM exist, they might not be as well supported by Linux
+or as widely available as the more common ones.
+An interesting candidate seems to be the [Radxa Rock 3-5](https://wiki.radxa.com/Rock3) with up to 32GB of RAM.
+
+While there is no immediate or inherent reason to restrict this choice,
+it is easiest to commit to one kind of SBC and provide one type of system image to users.
+On the other hand, the people with initial interest in the project already own SBCs
+to dedicate to the cluster, and those people are generally capable of creating
+and / or running their own system image for their SBC.
+
+## What we have
+### Hardware
+- Pine64
+- Pine ROCKPro64
+- Raspberry Pi
+
+### Software
+- Talos on Pine64
+
+## Next steps
+- Please add your name, mail address, notes and thoughts to this README
+- Decide on and set up communication infrastructure
+- Decide on a cluster endpoint / entry point solution
+- Explore Talos more in-depth (see the command sketch below)
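+
+To make the last point concrete, here is a rough sketch of the standard talosctl
+bootstrap flow. The cluster name, endpoint and node IP below are placeholders /
+assumptions; the interesting open questions for us are how this interacts with the
+WireGuard interconnect, dynamic IPs and the cluster entry point decision above.
+
+```sh
+# 1. Generate machine configs; the endpoint is our yet-to-be-decided cluster entry point.
+talosctl gen config k8sbc https://cluster.example.org:6443
+
+# 2. Apply the config to a node still in maintenance mode (e.g. a Pine64 reachable over WireGuard).
+talosctl apply-config --insecure --nodes 10.0.0.2 --file controlplane.yaml
+
+# 3. Bootstrap etcd on the first control plane node and fetch a kubeconfig.
+talosctl --talosconfig ./talosconfig --endpoints 10.0.0.2 --nodes 10.0.0.2 bootstrap
+talosctl --talosconfig ./talosconfig --endpoints 10.0.0.2 --nodes 10.0.0.2 kubeconfig .
+```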