SPDX-License-Identifier: Apache-2.0
Copyright (c) 2020 Intel Corporation

Telemetry Support in OpenNESS

Overview

OpenNESS supports platform and application telemetry through several telemetry projects. This support lets users retrieve information about the platform, the underlying hardware, the cluster, and the deployed applications. The gathered data can be used to visualize metrics and resource consumption, set up alerts for certain events, and inform scheduling decisions. OpenNESS also provides a mechanism for scheduling workloads based on the available telemetry data.

Currently, the support for telemetry is focused on metrics; support for application tracing telemetry is planned in the future.

Architecture

The telemetry components used in OpenNESS are deployed from the Edge Controller as Kubernetes* (K8s) pods. Components for telemetry support include:

  • collectors
  • metric aggregators
  • schedulers
  • monitoring and visualization tools

Depending on the role of the component, it is deployed as either a Deployment or a DaemonSet. Generally, global components receiving inputs from local collectors are deployed as a Deployment type with a single replica set, whereas local collectors running on each host are deployed as DaemonSets. Monitoring and visualization components such as Prometheus* and Grafana*, along with TAS (Telemetry Aware Scheduler), are deployed on the Edge Controller, while other components are generally deployed on Edge Nodes. Local collectors running on Edge Nodes that collect platform metrics are deployed as privileged containers. Communication between telemetry components is secured with TLS, either using the native TLS support of a given component or using a reverse proxy running as a container in the same pod. All the components are deployed as Helm charts.

Figure: Telemetry Architecture

Flavors and configuration

The deployment of telemetry components in OpenNESS is configured through the OpenNESS Experience Kit (OEK). The Grafana dashboard and the PCM (Performance Counter Monitoring) collector are optional: telemetry_grafana_enable is enabled by default and telemetry_pcm_enable is disabled by default. The CollectD collector can be deployed in one of four flavors, each enabling a respective set of plugins and selected with telemetry_flavor:

  • common (default)
  • flexran
  • smartcity
  • corenetwork

Further information on what plugins each flavor enables can be found in the CollectD section. All flags can be changed in ./group_vars/all/10-default.yml for the default configuration or in ./flavors in a configuration for a specific platform flavor.

Telemetry features

This section provides an overview of each of the components supported within OpenNESS and a description of how to use these features.

Prometheus

Prometheus is an open-source, community-driven toolkit for systems monitoring and alerting. The main features include:

  • PromQL query language
  • multi-dimensional, time-series data model
  • support for dashboards and graphs

The main idea behind Prometheus is that it defines a unified metrics data format that can be hosted as part of any application that incorporates a simple web server. The data can then be scraped (downloaded) and processed by Prometheus over a simple HTTP/HTTPS connection.
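
To make the idea of a scrapeable metrics endpoint concrete, the following is a minimal Python sketch, not the implementation of any OpenNESS collector. It renders a few samples in the Prometheus text exposition format and defines a small handler that would serve them at /metrics. The metric names, the sample values, the helper names (render_metrics, MetricsHandler, serve), and port 8000 are all assumptions made for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical sample data; a real collector would read values from the platform.
SAMPLES = [
    ("demo_requests_total", {"method": "get"}, 3),
    ("demo_temperature_celsius", {}, 21.5),
]


def render_metrics(samples):
    """Render (name, labels, value) tuples in the Prometheus text format.

    Simplification: label values are not escaped, so they must not contain
    backslashes, double quotes, or newlines.
    """
    lines = []
    for name, labels, value in samples:
        if labels:
            label_text = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{label_text}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Answer GET /metrics with the rendered samples, anything else with 404."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(SAMPLES).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve /metrics until interrupted. Port 8000 is an arbitrary choice."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint; a Prometheus instance configured with that host and port as a target could then scrape it like any other collector.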

In OpenNESS, Prometheus is deployed as a K8s Deployment with a single pod/replica on the Edge Controller node. It is configured out of the box to scrape all other telemetry endpoints/collectors enabled in OpenNESS and gather data from them. Prometheus is enabled in the OEK by default with the telemetry/prometheus role.

Usage

  1. To connect to the Prometheus dashboard, start a browser on the same network as the OpenNESS cluster and enter the address of the dashboard, where <controller-ip> is the IP address of the Edge Controller:

     From browser:
     http://<controller-ip>:30000
    
  2. To list the targets/endpoints currently scraped by Prometheus, navigate to the status tab and select targets from the drop-down menu. (Figure: Prometheus targets)
  3. To query a specific metric from one of the collectors, navigate to the graph tab, select a metric from the insert-metric-at-cursor list or type the value into the field above, and press execute. (Figure: Prometheus metrics)
  4. To graph the selected metrics, press the graph tab. (Figure: Prometheus graph)
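
The same queries can also be issued without a browser through the Prometheus HTTP API. The sketch below assumes the dashboard address from step 1 (port 30000 over plain HTTP) and uses the /api/v1/query endpoint for an instant query. The helper names and the controller address in the usage note are hypothetical. The built-in `up` metric is used as the example query because it does not depend on any particular collector.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_PORT = 30000  # port used for the Prometheus dashboard in step 1


def build_query_url(controller_ip, promql, port=PROMETHEUS_PORT):
    """Build the instant-query URL for a PromQL expression."""
    query = urllib.parse.urlencode({"query": promql})
    return f"http://{controller_ip}:{port}/api/v1/query?{query}"


def extract_samples(response):
    """Return (labels, value) pairs from an instant-query JSON response.

    Assumes a vector result, which is what a plain metric selector returns.
    """
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error')}")
    return [
        (item["metric"], float(item["value"][1]))
        for item in response["data"]["result"]
    ]


def query(controller_ip, promql):
    """Run an instant query against Prometheus and return its samples."""
    url = build_query_url(controller_ip, promql)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return extract_samples(json.load(resp))
```

For example, query("192.0.2.10", "up") (with a placeholder controller address) would return one entry per scraped target, with a value of 1.0 for targets that are reachable and 0.0 for those that are not.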

Grafana

Grafana is open-source visualization and analytics software. It takes data provided by external sources and displays the relevant data to the user via dashboards. It enables the user to create customized dashboards based on the information the user wants to monitor and allows additional data sources to be provisioned. In OpenNESS, the Grafana pod is deployed on the control plane as a K8s Deployment and is provisioned by default with data from Prometheus. It is enabled by default in the OEK and can be enabled or disabled by changing the telemetry_grafana_enable flag.

Usage

  1. To connect to the Grafana dashboard, start a browser on the same network as the OpenNESS cluster and enter the address of the dashboard, where <controller-ip> is the IP address of the Edge Controller:

     From browser:
     http://<controller-ip>:32000
    
  2. Access the dashboard
    1. Extract the Grafana password by running the following command on the Kubernetes controller:
      kubectl get secrets/grafana -n telemetry -o json | jq -r '.data."admin-password"' | base64 -d
    2. Log in to the dashboard with the admin login and the password from the previous step. (Figure: Grafana login)
  3. To create a new dashboard, navigate to http://<controller-ip>:32000/dashboards. (Figure: Grafana dash)
  4. Navigate to dashboard settings. (Figure: Grafana settings)
  5. Change the name of the dashboard in the General tab and add new variables per the following table from the Variables tab: