Operator SDK monitoring with Prometheus

Prometheus is an open-source systems monitoring and alerting toolkit. Below is an overview of the helpers in Operator SDK that help set up metrics in the generated operator.

Metrics in Operator SDK

General metrics

The CreateMetricsService(ctx context.Context, cfg *rest.Config, servicePorts []v1.ServicePort) (*v1.Service, error) function exposes general metrics about the running program. These metrics are inherited from controller-runtime. To understand which metrics are exposed, read the metrics package doc of controller-runtime. The function creates a Service object with the metrics port exposed, which can then be accessed by Prometheus. The Service object is garbage collected when the leader pod’s root owner is deleted.

By default, the metrics are served on 0.0.0.0:8383/metrics. To modify the port the metrics are exposed on, change the var metricsPort int32 = 8383 variable in the cmd/manager/main.go file of the generated operator.

Usage:

    import (
        "context"
        "fmt"

        "github.com/operator-framework/operator-sdk/pkg/metrics"
        "k8s.io/api/core/v1"
        "k8s.io/apimachinery/pkg/util/intstr"
        "sigs.k8s.io/controller-runtime/pkg/manager"
    )

    func main() {

        ...

        // Change the variables below to serve metrics on a different host or port.
        var metricsHost = "0.0.0.0"
        var metricsPort int32 = 8383

        // Pass metrics address to controller-runtime manager
        mgr, err := manager.New(cfg, manager.Options{
            Namespace:          namespace,
            MetricsBindAddress: fmt.Sprintf("%s:%d", metricsHost, metricsPort),
        })

        ...

        // Add any other metrics ports you want to expose to the slice below.
        servicePorts := []v1.ServicePort{
            {
                Port:       metricsPort,
                Name:       metrics.OperatorPortName,
                Protocol:   v1.ProtocolTCP,
                TargetPort: intstr.IntOrString{Type: intstr.Int, IntVal: metricsPort},
            },
        }

        // Create Service object to expose the metrics port.
        _, err = metrics.CreateMetricsService(context.TODO(), cfg, servicePorts)
        if err != nil {
            // handle error
        }

        ...

    }

Note: The above example is already present in cmd/manager/main.go in all the operators generated with Operator SDK from v0.5.0 onwards.

Garbage collection

The metrics Service is garbage collected when the resource used to deploy the operator (e.g. a Deployment) is deleted. This resource is determined when the metrics Service is created; at that time, an owner reference to the resource is added to the Service.

In Kubernetes clusters where OwnerReferencesPermissionEnforcement is enabled (on by default in all OpenShift clusters), the operator's role requires a <RESOURCE-KIND>/finalizers rule. When an operator is created with the Operator SDK, this rule is added automatically, under the assumption that a Deployment object is used to create the operator pods. If the operator is deployed another way, replace the - deployments/finalizers entry in the deploy/role.yaml file with the matching resource. Example rule from a deploy/role.yaml file for deploying the operator with a StatefulSet:

    ...
    - apiGroups:
      - apps
      resourceNames:
      - <STATEFULSET-NAME>
      resources:
      - statefulsets/finalizers
      verbs:
      - update
    ...

Custom resource specific metrics

By default, the operator exposes info metrics based on the number of current instances of the operator’s custom resources in the cluster. It uses kube-state-metrics as a library to generate those metrics. Metrics initialization lives in the serveCRMetrics function in the cmd/manager/main.go file of the operator. Its arguments are the group, version, and kind of a custom resource for which to generate the metrics. The metrics are served on 0.0.0.0:8686/metrics by default. To modify the port the metrics are exposed on, change the operatorMetricsPort variable at the top of the cmd/manager/main.go file in the generated operator.

Expose custom metrics

The operator uses Prometheus to expose a number of metrics by default. To expose custom metrics, register them with the Registry object from the controller-runtime metrics package. An example can be found in the kubebuilder book.