The open source Operator Framework is a toolkit for managing Kubernetes-native applications. It helps developers build solutions that simplify tasks such as installing, configuring, managing, and packaging applications on Kubernetes and Red Hat OpenShift. It also provides a client for performing CRUD actions, that is, operations to create, read, update, and delete data on these platforms.

By using operators, it's possible not only to provide all expected resources but also to manage them dynamically, programmatically, and at runtime. For example, if someone accidentally changes a configuration or removes a resource, the operator can fix it without any human intervention. We'll take a look at Operators and the Operator SDK in this article.

Note: As a prerequisite for this content, it's essential to follow the steps outlined in the Getting Started guide.

APIs

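In the Getting Started guide, the following command adds a new API:

operator-sdk add api --api-version=cache.example.com/v1alpha1 --kind=Memcached

Here, the group is cache.example.com, the version is v1alpha1, and the Kind is Memcached.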

Consequently, by using the Operator SDK tool, we can create the APIs and objects that represent our solutions on these platforms. The Getting Started tutorial adds only a single Kind of resource; however, a project can have as many Kinds as needed (1…N). Basically, a CRD (Custom Resource Definition) is the definition of a customized object, and a CR (Custom Resource) is an instance of it.
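To make the difference between a definition and an instance concrete, the following is a minimal sketch of how a Kind might be modeled in Go. The Size field mirrors the memcached.Spec.Size value used later in this article; the ObjectMeta stand-in, the Nodes status field, and the NewMemcached helper are assumptions made for illustration. A real project would embed metav1.TypeMeta and metav1.ObjectMeta from k8s.io/apimachinery instead of the simplified types shown here.

```go
package main

import "fmt"

// ObjectMeta is a simplified stand-in (assumption) for Kubernetes object metadata.
type ObjectMeta struct {
	Name      string
	Namespace string
}

// MemcachedSpec holds the desired state declared by the user.
type MemcachedSpec struct {
	Size int32 // desired number of replicas
}

// MemcachedStatus holds the observed state (hypothetical field).
type MemcachedStatus struct {
	Nodes []string
}

// Memcached is the Go type behind the Kind; the CRD describes this schema to the cluster.
type Memcached struct {
	APIVersion string
	Kind       string
	ObjectMeta
	Spec   MemcachedSpec
	Status MemcachedStatus
}

// NewMemcached (hypothetical helper) builds one CR, that is, one instance of the Kind.
func NewMemcached(name, namespace string, size int32) Memcached {
	return Memcached{
		APIVersion: "cache.example.com/v1alpha1",
		Kind:       "Memcached",
		ObjectMeta: ObjectMeta{Name: name, Namespace: namespace},
		Spec:       MemcachedSpec{Size: size},
	}
}

func main() {
	cr := NewMemcached("example-memcached", "default", 3)
	fmt.Printf("%s %s/%s size=%d\n", cr.Kind, cr.Namespace, cr.Name, cr.Spec.Size)
}
```

Many CRs of the same Kind can exist side by side, each with its own name and its own Spec values.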

Project

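$ operator-sdk build user/image:tag

Replace the REPLACE_IMAGE placeholder in deploy/operator.yaml with the image built above, and then create the operator on the cluster:

$ kubectl create -f deploy/operator.yaml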

Demonstrating the idea

Let's think about the classic scenario where the goal is to have an application and its database running on Kubernetes. One object could represent the App, and another one could represent the DB. By having one CRD to describe the App and another one for the DB, we preserve concepts such as encapsulation, the single responsibility principle, and cohesion. Violating these concepts could cause unexpected side effects, such as code that is hard to extend, reuse, or maintain, just to mention a few.

In conclusion, the App CRD will have its own controller, and so will the DB CRD. Suppose that a Deployment and a Service are required for the application to run; in this example, the App's controller provides these resources. Similarly, the DB's controller holds the business logic for its own objects.

In this way, for each CRD, one controller should be produced according to the design set by the controller-runtime.
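The one-controller-per-CRD idea can be sketched as follows. This is an illustrative model only: the Request and Reconciler types are simplified stand-ins for the controller-runtime ones, and AppReconciler, DatabaseReconciler, and Dispatch are hypothetical names that do not come from the Getting Started guide.

```go
package main

import "fmt"

// Request identifies the object to reconcile (simplified stand-in).
type Request struct {
	Namespace string
	Name      string
}

// Reconciler is a simplified stand-in for the controller-runtime interface.
type Reconciler interface {
	Reconcile(req Request) error
}

// AppReconciler holds the business logic for the App Kind only.
type AppReconciler struct{ log []string }

func (r *AppReconciler) Reconcile(req Request) error {
	r.log = append(r.log, "ensure Deployment and Service for "+req.Name)
	return nil
}

// DatabaseReconciler holds the business logic for the Database Kind only.
type DatabaseReconciler struct{ log []string }

func (r *DatabaseReconciler) Reconcile(req Request) error {
	r.log = append(r.log, "ensure database resources for "+req.Name)
	return nil
}

// Dispatch routes an event for a Kind to the single controller registered for it.
func Dispatch(controllers map[string]Reconciler, kind string, req Request) error {
	c, ok := controllers[kind]
	if !ok {
		return fmt.Errorf("no controller registered for kind %q", kind)
	}
	return c.Reconcile(req)
}

func main() {
	app := &AppReconciler{}
	db := &DatabaseReconciler{}
	controllers := map[string]Reconciler{"App": app, "Database": db}

	err := Dispatch(controllers, "App", Request{Namespace: "default", Name: "shop"})
	fmt.Println(err, app.log, db.log)
}
```

Keeping each Kind's logic in its own type means a change to the database handling does not touch the App controller, which is the cohesion argument made above.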

Controller main functions

Reconcile()

The reconcile function is responsible for synchronizing the resources with their specifications according to the business logic implemented for them. It works like a loop: it is called again and again until every condition checked in its implementation is satisfied. The following is pseudo-code with an example that clarifies it.

reconcile App {

   // Check if a Deployment for the app exists; if not, create one
   // If there is an error, return it so that the reconcile starts again
   if err != nil {
       return reconcile.Result{}, err
   }

   // Check if a Service for the app exists; if not, create one
   // If there is an error, return it so that the reconcile starts again
   if err != nil {
       return reconcile.Result{}, err
   }

   // Look for the Database CR
   // Check the Database Deployment's replicas size
   // If deployment.replicas != cr.size, then update it
   // Then, go to the beginning of the reconcile
   if deployment.replicas != cr.size {
       return reconcile.Result{Requeue: true}, nil
   }
   ...

   // If it reaches the end of the function, then:
   // all was done successfully and the reconcile can stop
   return reconcile.Result{}, nil

}
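The loop behavior can be tried without a cluster. The sketch below is a toy model under stated assumptions: Result is a stand-in for reconcile.Result, cluster is a hypothetical in-memory state, and run imitates a work queue by calling the function again whenever it returns an error or Requeue: true. Requeuing after each creation is a choice made for this sketch, and the real controller-runtime queue also applies rate limiting, which is left out here.

```go
package main

import (
	"errors"
	"fmt"
)

// Result is a simplified stand-in for reconcile.Result.
type Result struct{ Requeue bool }

// cluster is a toy in-memory state used instead of a real API server (assumption).
type cluster struct {
	hasDeployment bool
	hasService    bool
	dbReplicas    int32
}

// reconcileApp makes at most one change per call and then asks to be called again.
func reconcileApp(c *cluster, desiredDBSize int32) (Result, error) {
	if desiredDBSize < 0 {
		return Result{}, errors.New("size must not be negative")
	}
	if !c.hasDeployment {
		c.hasDeployment = true // "create" the Deployment
		return Result{Requeue: true}, nil
	}
	if !c.hasService {
		c.hasService = true // "create" the Service
		return Result{Requeue: true}, nil
	}
	if c.dbReplicas != desiredDBSize {
		c.dbReplicas = desiredDBSize // "update" the replicas
		return Result{Requeue: true}, nil
	}
	// Nothing left to do: the reconcile can stop.
	return Result{}, nil
}

// run calls reconcileApp until it stops asking for a retry, up to maxRuns calls.
func run(c *cluster, size int32, maxRuns int) (int, error) {
	var lastErr error
	for i := 1; i <= maxRuns; i++ {
		res, err := reconcileApp(c, size)
		lastErr = err
		if err == nil && !res.Requeue {
			return i, nil
		}
	}
	if lastErr == nil {
		lastErr = errors.New("did not converge")
	}
	return maxRuns, lastErr
}

func main() {
	c := &cluster{}
	runs, err := run(c, 3, 10)
	fmt.Println(runs, err, *c)
}
```

Because every call re-checks the state from the top, the function can be interrupted or repeated at any point without leaving the resources half configured.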

The Reconcile can return one of the following:

  • To restart the Reconcile with an error:
return reconcile.Result{}, err
  • To restart the Reconcile without an error:
return reconcile.Result{Requeue: true}, nil
  • To stop the Reconcile:
return reconcile.Result{}, nil

Watch()

The watches are responsible for "watching" the objects and triggering the Reconcile. The Operator SDK tool generates a Watch function for each primary resource (CRD). Here is an example:

// Watch for changes to primary resource Memcached
err = c.Watch(&source.Kind{Type: &cachev1alpha1.Memcached{}}, &handler.EnqueueRequestForObject{})
if err != nil {
    return err
}

Following the Getting Started guide, you will also implement a watch for each secondary object managed by the primary resource, as shown below.

// Watch for changes to the secondary resources Deployment and Service and requeue the owner Memcached
err = c.Watch(&source.Kind{Type: &appsv1.Deployment{}}, &handler.EnqueueRequestForOwner{
    IsController: true,
    OwnerType:    &cachev1alpha1.Memcached{},
})
if err != nil {
    return err
}

err = c.Watch(&source.Kind{Type: &corev1.Service{}}, &handler.EnqueueRequestForOwner{
    IsController: true,
    OwnerType:    &cachev1alpha1.Memcached{},
})
if err != nil {
    return err
}
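The idea behind EnqueueRequestForOwner can be modeled in a few lines: an event on a secondary object is translated into a request for the primary resource that owns it. The sketch below is a simplified illustration, not the controller-runtime implementation; OwnerReference, Object, Request, and requestsForOwner are stand-ins defined only for this example.

```go
package main

import "fmt"

// OwnerReference is a simplified stand-in for the Kubernetes owner reference.
type OwnerReference struct {
	Kind       string
	Name       string
	Controller bool // true when this owner is the managing controller
}

// Object is a toy secondary resource, such as a Deployment or a Service.
type Object struct {
	Kind      string
	Namespace string
	Name      string
	Owners    []OwnerReference
}

// Request names the primary resource that should be reconciled.
type Request struct {
	Namespace string
	Name      string
}

// requestsForOwner maps an event on obj to requests for its owners of the given Kind.
// When isController is true, only the owner marked as controller is considered.
func requestsForOwner(obj Object, ownerKind string, isController bool) []Request {
	var reqs []Request
	for _, ref := range obj.Owners {
		if ref.Kind != ownerKind {
			continue
		}
		if isController && !ref.Controller {
			continue
		}
		reqs = append(reqs, Request{Namespace: obj.Namespace, Name: ref.Name})
	}
	return reqs
}

func main() {
	dep := Object{
		Kind:      "Deployment",
		Namespace: "default",
		Name:      "example-memcached",
		Owners:    []OwnerReference{{Kind: "Memcached", Name: "example-memcached", Controller: true}},
	}
	fmt.Println(requestsForOwner(dep, "Memcached", true))
}
```

In this model, a change to the Deployment results in a Reconcile of the Memcached object that owns it, while objects without a matching owner produce no request.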

The following code ensures that the number of Memcached replicas running on the cluster matches the size declared in the CR.

// Ensure the deployment size is the same as the spec
size := memcached.Spec.Size
if *deployment.Spec.Replicas != size {
    deployment.Spec.Replicas = &size
    err = r.client.Update(context.TODO(), deployment)
    if err != nil {
        reqLogger.Error(err, "Failed to update Deployment.", "Deployment.Namespace", deployment.Namespace, "Deployment.Name", deployment.Name)
        return reconcile.Result{}, err
    }
}

After that, you can check that the above code works with the following steps.

  1. Scale the Memcached Deployment up or down manually.
  2. Check that the number of replicas returns to the original size because of the above code.

Note: The above steps will only work if you have followed the guide and every step finished successfully.
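The scale-and-check steps above can also be reasoned about without a cluster. The following sketch imitates them in memory: deployment is a toy stand-in for the real Deployment type, ensureSize is a hypothetical helper that mirrors the size check, and the nil check is an added precaution that the snippet in this article does not have.

```go
package main

import "fmt"

// deployment is a toy stand-in for the Deployment's replica setting (assumption).
type deployment struct {
	Replicas *int32
}

// ensureSize sets the replicas to size when they differ and reports whether it changed anything.
func ensureSize(d *deployment, size int32) bool {
	if d.Replicas == nil || *d.Replicas != size {
		d.Replicas = &size
		return true
	}
	return false
}

func main() {
	size := int32(3) // value taken from the CR spec
	d := &deployment{}
	ensureSize(d, size) // first reconcile sets the size

	manual := int32(5)
	d.Replicas = &manual // step 1: someone scales the Deployment by hand

	changed := ensureSize(d, size) // step 2: the next reconcile restores the size
	fmt.Println(changed, *d.Replicas)
}
```

On a real cluster, the second call corresponds to the Reconcile triggered by the watch on the Deployment.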

Last updated: March 29, 2023