Toward Building a Kubernetes Control Plane for the Edge

Unleashing the power of kcp

10 min read · Mar 3, 2023

By Paolo Dettori, Jun Duan, Andy Anderson

Photo by Greg Rakozy on Unsplash

These are exciting times for Cloud Native technologies. A vibrant ecosystem has grown around Kubernetes in the Cloud Native Computing Foundation (CNCF). A lot of innovation has also been happening in Edge Computing, driven by the increasing amount of data generated at the edge and the availability of powerful but affordable System on a Chip (SoC) computing platforms. As Kubernetes gains adoption at the edge, thanks to a growing number of projects addressing this space such as k3s, k0s and MicroShift, managing the lifecycle of edge applications can greatly benefit from the best practices and technologies of the CNCF ecosystem. However, new challenges arise when dealing with a large number of clusters, disconnected operation and limited resource availability at the edge.

This blog is part of a series of posts from the KCP-Edge community on challenges related to multi-cloud and edge. You can learn more about these challenges, and find posts from other community members, starting with Andy Anderson’s post Navigating the Edge: Overcoming Multi-Cloud Challenges in Edge Computing.

A new approach to building control planes

In the past few years, new trends have emerged in the CNCF community: managing Kubernetes clusters as cattle rather than pets, and the rise of projects focusing on managing resources other than containers, such as Crossplane. These trends have motivated the evolution of Kubernetes-based technologies aimed at raising the level of abstraction above container management. In this blog, we will take a look at kcp, a new project based on Kubernetes API machinery that provides a set of interesting new features for building a highly scalable control plane for Kubernetes-style APIs.

What does kcp bring to the table? First of all, it provides a minimal Kubernetes API server. This is quite useful on its own, as we may just want the API server for the extensibility it provides for defining new APIs as Custom Resource Definitions (CRDs). But where kcp really shines is in its ability to virtualize the API server and provide many independent “clusters” known as workspaces. Each workspace behaves like the Kubernetes we know and love; we can use the kubectl CLI or other Kubernetes tools to interact with it, create namespaces and install CRDs. Workspaces are very lightweight, and creating one feels much like creating a namespace. There are other useful features as well, such as exporting and importing APIs across workspaces.

In short, kcp provides a set of building blocks that we can use to build a control plane for our needs. For example, we can use kcp as a Kubernetes API front-end for a set of clusters: users interact with kcp workspaces to submit workloads, and we can write placement policies to assign each workload to a cluster (out of many) running behind the kcp control plane. As a matter of fact, this is the first service that has been built with the kcp building blocks: Transparent Multi-Cluster (TMC).
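To make the workspace experience concrete, here is a minimal sketch of what it looks like from the command line (the workspace name team-a is purely illustrative; a full walkthrough follows in the next section):

$ kubectl ws create team-a --enter   # create a workspace and switch into it
$ kubectl create ns demo             # namespaces live inside the workspace
$ kubectl ws ..                      # move back up to the parent workspace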

Kicking the tires

Let’s first have a look at kcp and TMC. You will need kubectl and a local cluster such as kind. Download the latest kcp release (v0.11.0 at the time of this writing) for your platform and copy both the kcp binary and the kubectl plugins to a location in your $PATH.
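For example, on Linux amd64 the download could look like the following; the archive names and layout shown here are assumptions based on the v0.11.0 release, so check the release page for the assets matching your platform:

$ mkdir -p /tmp/kcp-install
$ curl -L -o /tmp/kcp.tar.gz https://github.com/kcp-dev/kcp/releases/download/v0.11.0/kcp_0.11.0_linux_amd64.tar.gz
$ curl -L -o /tmp/kcp-plugins.tar.gz https://github.com/kcp-dev/kcp/releases/download/v0.11.0/kubectl-kcp-plugin_0.11.0_linux_amd64.tar.gz
$ tar -xzf /tmp/kcp.tar.gz -C /tmp/kcp-install
$ tar -xzf /tmp/kcp-plugins.tar.gz -C /tmp/kcp-install
$ sudo cp /tmp/kcp-install/bin/* /usr/local/bin/

With the binaries in place, open a shell and create a kind cluster to host a workload: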

$ kind create cluster --name red

Now we are ready to start kcp:

$ mkdir ${HOME}/kcp
$ cd ${HOME}/kcp
$ kcp start

Open another shell and set KUBECONFIG to point to the kcp admin kubeconfig:

$ export KUBECONFIG=${HOME}/kcp/.kcp/admin.kubeconfig

Let’s now interact with kcp workspaces. We can use the ws plugin to check on our current workspace:

$ kubectl ws .
Current workspace is "root".

Let’s create a new workspace named infra under the root workspace. This workspace will hold information about our clusters, as I will explain in more detail later.

$ kubectl ws create infra --type universal --enter
Workspace "infra" (type root:universal) created. Waiting for it to be ready...
Workspace "infra" (type root:universal) is ready to use.
Current workspace is "root:infra" (type root:universal).

A workspace looks a lot like a cluster; we can list and create new namespaces under a workspace, without affecting other workspaces:

$ kubectl create ns ns1
namespace/ns1 created
$ kubectl get ns
NAME      STATUS   AGE
default   Active   4m20s
ns1       Active   8s
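If you want to convince yourself of this isolation, hop up to the parent workspace and list the namespaces there; ns1 will not appear. Just remember to come back to the infra workspace afterwards, since the next steps run there:

$ kubectl ws ..
$ kubectl get ns
$ kubectl ws root:infra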

Let’s now register our kind cluster with TMC. First we issue a command to generate the installation manifest for the TMC agent to install on the cluster. The agent is called the “syncer”, as it syncs resources from the workspace to the cluster.

$ VERSION=$(kubectl version -o yaml | grep "+kcp" | awk -v FS=kcp- '{print $2}')
$ kubectl kcp workload sync red --syncer-image ghcr.io/kcp-dev/kcp/syncer:${VERSION} -o ${HOME}/syncer-red.yaml

Note that the image tag for the syncer must match the kcp VERSION.
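As a quick sanity check, you can print the extracted value; with the release used in this walkthrough it should look something like this:

$ echo ${VERSION}
v0.11.0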

We can now install the syncer on the kind cluster. We are assuming that the kubeconfig file for the kind cluster is at the default path ${HOME}/.kube/config and that the current context is set to the kind cluster.

$ KUBECONFIG=${HOME}/.kube/config kubectl apply -f ${HOME}/syncer-red.yaml

We can verify that the syncer has started with the command:

$ KUBECONFIG=${HOME}/.kube/config kubectl get pods -A | grep syncer
kcp-syncer-red-13taz88g kcp-syncer-red-13taz88g-7dffc78558-tpxl8 1/1 Running 0 35s
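Optionally, on the kcp side (we are still in the root:infra workspace), we can inspect the SyncTarget that the sync command created; its status conditions indicate when the syncer has connected. This is just a quick check, and the exact condition names may differ between kcp releases:

$ kubectl get synctarget red -o yaml | grep -A 4 "conditions:"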

kcp provides a nice model for separation of responsibilities. Platform engineers can focus on allocating infrastructure in Ops-managed workspaces (the infra workspace in our example), while developers consume infrastructure in workspaces dedicated to workload management. Let’s create a new workload workspace under root:

$ kubectl ws root
Current workspace is "root".
$ kubectl ws create dev1 --type universal --enter
Workspace "dev1" (type root:universal) created. Waiting for it to be ready...
Workspace "dev1" (type root:universal) is ready to use.
Current workspace is "root:dev1" (type root:universal).

Finally, we need to create a binding to the workload APIs exported by the infra workspace, together with a default placement for the kind cluster defined there:

$ kubectl kcp bind compute root:infra
placement placement-35awluoi created.
Placement "placement-35awluoi" is ready.

And here is the coolest part: we can now create a deployment in the kcp workspace, and it gets automatically scheduled on the kind cluster. From a user perspective, this looks just like interacting with a regular Kubernetes cluster.

$ kubectl create deployment --image=gcr.io/kuar-demo/kuard-amd64:blue --port=8080 kuard
deployment.apps/kuard created

But, since the kcp-based control plane acts as a front-end for the users, we can make decisions there on where workloads should run and even migrate workloads from one cluster to another.

We can check the status of the deployment on kcp and verify it is actually running on the kind cluster:

$ kubectl get deployment
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
kuard   1/1     1            1           4m44s
$ KUBECONFIG=${HOME}/.kube/config kubectl get deployment -A | grep kuard
kcp-2o0aifot36ba kuard 1/1 1 1 4m46s
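If you want to see the kuard application itself, you can port-forward to it on the kind cluster and open http://localhost:8080 in a browser; note that the kcp-... namespace name is generated, so substitute the one from the output above:

$ KUBECONFIG=${HOME}/.kube/config kubectl port-forward -n kcp-2o0aifot36ba deployment/kuard 8080:8080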

Deploying to multiple clusters

So far we have deployed to one cluster. What happens if we have more clusters where we want to deploy workloads? Let’s create another cluster with kind.

$ KUBECONFIG=${HOME}/.kube/config kind create cluster --name blue

To register our new cluster with TMC, we again run the command that generates the syncer installation manifest from the infra workspace, making sure to use the name of the new cluster (blue):

$ kubectl ws root:infra
Current workspace is "root:infra".
$ VERSION=$(kubectl version -o yaml | grep "+kcp" | awk -v FS=kcp- '{print $2}')
$ kubectl kcp workload sync blue --syncer-image ghcr.io/kcp-dev/kcp/syncer:${VERSION} -o ${HOME}/syncer-blue.yaml

As before, let’s install the syncer on the new cluster and check that it starts correctly. Note that we also need to set the kubectl context to the new cluster before running commands against it.

$ KUBECONFIG=${HOME}/.kube/config kubectl config use-context kind-blue
Switched to context "kind-blue".
$ KUBECONFIG=${HOME}/.kube/config kubectl apply -f ${HOME}/syncer-blue.yaml
$ KUBECONFIG=${HOME}/.kube/config kubectl get pods -A | grep syncer
kcp-syncer-blue-23ltr8wx kcp-syncer-blue-23ltr8wx-6b7c48fb86-np9jb 1/1 Running 0 117s

Great! We now have one more cluster. But this raises a question: how do we control what goes where when multiple clusters are registered with kcp?

Looking under the hood

To answer that question, we first need to understand how Transparent Multi-Cluster scheduling works in kcp. We will cover the basics here; for more details you can refer to the kcp documentation on Placement, Locations and Scheduling.

Fig.1: Transparent Multi-Cluster Scheduling in kcp

As illustrated in Fig.1, when we generate a syncer install manifest, the kcp workload sync command also creates a SyncTarget and a Location resource in the workspace where we run the command. A SyncTarget represents a physical cluster registered with kcp, and contains information such as the kinds of resources being synced and the health of the cluster. SyncTargets are grouped into Locations using label selectors. There can be multiple Locations in a workspace, but when we run the kcp workload sync command for the first time it creates a default Location for the first SyncTarget, and subsequent runs add the new SyncTargets to that same default Location.

But how can we tell what goes where? That is the job of the Placement resource. A Placement binds namespaces to Locations using label selectors; note that the Placement can live in a different workspace from the one hosting the Locations and SyncTargets. The syncer can be configured to sync specific kinds of resources (syncable resources). All syncable resources in a bound namespace are synced to one of the clusters belonging to one of the Locations bound to the namespace. For each bound namespace, the TMC scheduler first selects one Location, then one SyncTarget within it, and then labels the namespace and each syncable object in it with a label identifying the SyncTarget. Each syncer then syncs to its own physical cluster all syncable resources carrying a label matching its SyncTarget.

That was a lot to take in, but armed with that knowledge we can look at our setup and better understand what is going on.
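We can actually observe this labeling in our running example. Back in the root:dev1 workspace, the default namespace (where we created the kuard deployment) carries a state label keyed by the chosen SyncTarget; in v0.11 these labels are in the state.workload.kcp.io family, and the exact key is a hash that will differ in your environment:

$ kubectl ws root:dev1
$ kubectl get ns default -o jsonpath='{.metadata.labels}{"\n"}'
$ kubectl ws root:infra

Note the switch back to root:infra at the end, since the next commands inspect the SyncTargets and Locations there.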

Let’s first check the SyncTargets:

$ kubectl get synctargets
NAME   AGE
blue   55m
red    59m

As expected, we have one SyncTarget for the blue cluster and one for the red cluster. We also have a default Location grouping the two SyncTargets together:

$ kubectl get locations
NAME      RESOURCE      AVAILABLE   INSTANCES   LABELS   AGE
default   synctargets   2           2                    60m

Finally, we have a default placement that selects and binds all namespaces in the dev1 workspace to all locations in the infra workspace. The placement lives in the dev1 workspace, where the bind command created it, so we switch there with kubectl ws root:dev1 before listing it:

$ kubectl get placements
NAME                 AGE
placement-9vzlsinu   70m

The main Transparent Multi-Cluster use case addresses the scenario where there is a pool of clusters and workloads need to be spread across the clusters in the pool. TMC also provides automatic failover: workloads are automatically migrated from unhealthy clusters to healthy ones. For these reasons, TMC is focused on one-to-any scheduling of workloads. But in edge computing we are interested in one-to-many scheduling, where we have a fleet of clusters and we want to deliver the same application (possibly with a different configuration for each cluster) to the whole fleet or to subsets of it.

One-to-many distribution

Now that we understand better how TMC works, we can look at how to modify our simple example to deliver the same workload to both clusters. Since the TMC scheduler selects only one SyncTarget within each Location, we need two Locations, each selecting only one SyncTarget. And since a Placement selects only one Location, we also need two Placements, one for each Location.

Let’s start by deleting the default location in the infra workspace (switch back there first with kubectl ws root:infra):

$ kubectl delete location default

Now we label the SyncTargets as follows:

$ kubectl label synctarget red cluster=red
$ kubectl label synctarget blue cluster=blue
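We can double-check the result with the standard --show-labels flag; both SyncTargets should now carry their cluster label in addition to the labels added by the sync command:

$ kubectl get synctargets --show-labels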

Then, we create the blue and red locations as follows:

$ cat <<EOF | kubectl apply -f -
apiVersion: scheduling.kcp.io/v1alpha1
kind: Location
metadata:
  name: blue
  labels:
    cluster: blue
spec:
  instanceSelector:
    matchLabels:
      cluster: blue
  resource:
    group: workload.kcp.io
    resource: synctargets
    version: v1alpha1
EOF
$ cat <<EOF | kubectl apply -f -
apiVersion: scheduling.kcp.io/v1alpha1
kind: Location
metadata:
  name: red
  labels:
    cluster: red
spec:
  instanceSelector:
    matchLabels:
      cluster: red
  resource:
    group: workload.kcp.io
    resource: synctargets
    version: v1alpha1
EOF

Finally, let’s switch back to the dev1 workspace to delete the old placement:

$ kubectl ws root:dev1
Current workspace is "root:dev1".
$ kubectl delete placements --all
placement.scheduling.kcp.io "placement-1plx456f" deleted

and add new red and blue placements:

$ cat <<EOF | kubectl apply -f -
apiVersion: scheduling.kcp.io/v1alpha1
kind: Placement
metadata:
  name: placement-blue
spec:
  locationResource:
    group: workload.kcp.io
    resource: synctargets
    version: v1alpha1
  locationSelectors:
  - matchLabels:
      cluster: blue
  locationWorkspace: root:infra
  namespaceSelector: {}
EOF
$ cat <<EOF | kubectl apply -f -
apiVersion: scheduling.kcp.io/v1alpha1
kind: Placement
metadata:
  name: placement-red
spec:
  locationResource:
    group: workload.kcp.io
    resource: synctargets
    version: v1alpha1
  locationSelectors:
  - matchLabels:
      cluster: red
  locationWorkspace: root:infra
  namespaceSelector: {}
EOF
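Before moving on, it is worth confirming that both new placements exist; still in the dev1 workspace, you should see placement-blue and placement-red listed:

$ kubectl get placements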

We can now verify that each location in infra has only one SyncTarget:

$ kubectl ws root:infra
Current workspace is "root:infra".
$ kubectl get locations
NAME   RESOURCE      AVAILABLE   INSTANCES   LABELS   AGE
blue   synctargets   1           1                    15m
red    synctargets   1           1                    13m

And that the deployment has been applied to both clusters:

$ KUBECONFIG=${HOME}/.kube/config kubectl config use-context kind-blue
Switched to context "kind-blue".
$ KUBECONFIG=${HOME}/.kube/config kubectl get deployments -A | grep kuard
kcp-1w2uy53g7j03 kuard 1/1 1 1 5m2s

$ KUBECONFIG=${HOME}/.kube/config kubectl config use-context kind-red
Switched to context "kind-red".
$ KUBECONFIG=${HOME}/.kube/config kubectl get deployments -A | grep kuard
kcp-1e1duzmjz5lp kuard 1/1 1 1 5m49s

In conclusion, we managed to achieve a one-to-many deployment, but it is a laborious process that would quickly become unmanageable for a large number of clusters. Furthermore, there are other issues we have not even discussed, such as getting a summary of the status of the deployments across the different clusters. For edge scenarios, there are even more requirements to consider, such as the ability to customize deployments for different clusters, the ability to tolerate disconnected operation, and so on.

Conclusions

The kcp project provides a set of innovative and useful features for building control planes that abstract clusters away from users. But, as I have shown in this blog, additional features need to be built to address edge computing scenarios. To learn more about technical challenges at the edge, you may read the post Seven Ways to Stub Your Toes on The Edge by Mike Spreitzer. Recently, together with the kcp community, we started a new edge multi-cluster project (kcp-edge) focused specifically on these topics. You can find more information and ways to get in touch with our community by visiting kubestellar.io.
