SPDX-License-Identifier: Apache-2.0       
Copyright (c) 2020 Intel Corporation

Edge Multi-Cluster Orchestrator (EMCO)

Background

Edge Multi-Cluster Orchestration(EMCO), an OpenNESS Building Block, is a Geo-distributed application orchestrator for Kubernetes*. EMCO operates at a higher level than Kubernetes* and interacts with multiple of edges and clouds running Kubernetes. The main objective of EMCO is automation of the deployment of applications and services across multiple clusters. It acts as a central orchestrator that can manage edge services and network functions across geographically distributed edge clusters from different third parties.

Increasingly we see a requirement of deploying ‘composite applications’ in multiple geographical locations. Some of the catalysts for this change are:

  • Latency - requirements for new low latency application use cases such as AR/VR. Need for ultra low latency response needed in IIOT and other cases. This requires running some parts of the applications on edges close to the user
  • Bandwidth - processing data on edges to avoid costs associated with transporting the data to clouds for processing,
  • Context/Promixity - running some part of the applications on edges near the user that require local context
  • Privacy/Legal - some data can have legal requirements to not leave a geographic location

OpenNESS EMCOFigure 1 - Orchestrate GeoDitributed Edge Applications

NOTE: A ‘composite application’ is a combination of multiple applications with each application packaged as a Helm chart. Based on the deployment intent, various applications of the composite application get deployed at various locations, and get replicated in multiple locations.

Life cycle management of composite applications is complex. Instantiation and terminations of the complex application across multiple K8s clusters (Edges and Clouds), monitoring the status of the complex application deployment, Day 2 operations (Modification of the deployment intent, upgrades etc..) are few complex operations.

Number of K8s clusters (Edges or clouds) could be in tens of thousands, number of complex applications that need to be managed could be in hundreds, number of applications in a complex application could be in tens and number of micro-services in each application of the complex application can be in tens. Moreover, there can be multiple deployments of the same complex applications for different purposes. To reduce the complexity, all these operations are to be automated. There shall be one-click deployment of the complex applications and one simple dashboard to know the status of the complex application deployment at any time. Hence, there is a need for Multi-Edge and Multi-Cloud distributed application orchestrator.

Compared with other multiple-clusters orchestration, EMCO focuses on the following functionalities:

  • Enrolling multiple geographically distributed OpenNESS clusters and third party cloud clusters.
  • Orchestrating composite applications (composed of multiple individual applications) across different clusters.
  • Deploying edge services and network functions on to different nodes spread across different clusters.
  • Monitoring the health of the deployed edge services/network functions across different clusters.
  • Orchestrating edge services and network functions with deployment intents based on compute, acceleration, and storage requirements.
  • Supporting multiple tenants from different enterprises while ensuring confidentiality and full isolation between the tenants.

The following figure shows the topology overview for the OpenNESS EMCO orchestration with edge and multiple clusters. It also shows an example of deploying SmartCity with EMCO. OpenNESS EMCO

Figure 2 - Topology Overview with OpenNESS EMCO

All the managed edge clusters and cloud clusters are connected with the EMCO cluster through the WAN network.

  • The central orchestration (EMCO) cluster can be installed and provisioned by using the OpenNESS Central Orchestrator Flavor.
  • The edge clusters and the cloud cluster can be installed and provisioned by using the OpenNESS Flavor.
  • The composite application - SmartCity is composed of two parts: edge application and cloud (web) application.
    • The edge application executes media processing and analytics on multiple edge clusters to reduce latency.
    • The cloud application is like a web application for additional post-processing, such as calculating statistics and display/visualization on the cloud cluster side.
    • The EMCO user can deploy the SmartCity applications across the clusters. Besides that, EMCO allows the operator to override configurations and profiles to satisfy deployment needs.

This document aims to familiarize the user with EMCO and OpenNESS deployment flavor for EMCO installation and provision, and provide instructions accordingly.

EMCO Introduction

EMCO Terminology

Term Description
AppContext <p>The AppContext is a set of records maintained in the EMCO etcd data store which maintains the collection of resources and clusters associated with a deployable EMCO resource (e.g. Deployment Intent Group)..</p>
Cluster Provider <p>The provider is someone who owns clusters and registers them.</p>
Projects <p>The project resource provides means for a collection of applications to be grouped. Several applications can exist under a specific project. Projects allows for grouping of applications under a common tenant to be defined.</p>
Composite application <p>The composite application is combination of multiple applications. Based on the deployment intent, various applications of the composite application get deployed at various locations. Also, some applications of the composite application get replicated in multiple locations. </p>
Deployment Intent <p>EMCO does not expect the editing of Helm charts provided by application/Network-function vendors by DevOps admins. Any customization and additional K8s resources that need to be present with the application are specified as deployment intents. </p>
Deployment Intent Group <p>The Deployment Intent Group represents an instance of a composite application that can be deployed with a specified composite profile and a specified set of deployment intents which will control the placement and other configuration of the application resources. </p>
Placement <p>EMCO supports to create generic placement intents for a given composite application. Normally, EMCO scheduler calls placement controllers first to figure out the edge/cloud locations for a given application. Finally, it works with ‘resource synchronizer & status collector’ to deploy K8s resources on various Edge/Cloud clusters. </p>

EMCO Architecture

The following diagram depicts a high level overview of the EMCO architecture. OpenNESS EMCO

Figure 3 - EMCO Architecture

  • Cluster Registration Controller registers clusters by cluster owners.
  • Distributed Application Scheduler provides a simplified and extensible placement.
  • Network Configuration Management handles creation/management of virtual and provider networks.
  • Hardware Platform Aware Controller enables scheduling with auto-discovery of platform features/ capabilities.
  • Distributed Cloud Manager presents a single logical cloud from multiple edges.
  • Secure Mesh Controller auto-configures both service mesh (ISTIO) and security policy (NAT, firewall).
  • Secure WAN Controller automates secure overlays across edge groups.
  • Resource Syncronizer manages instantiation of resources to clusters.
  • Monitoring covers distributed application.

Cluster Registration

A microservice exposes RESTful API. User can register cluster providers and clusters of those providers via these APIs. After preparing edge clusters and cloud clusters, which can be any Kubernetes* cluster, user can onboard those clusters to EMCO by creating a cluster provider and then adding clusters to the cluster provider. After cluster providers are created, the KubeConfig files of edge and cloud clusters should be provided to EMCO as part of the multi-part POST call to the Cluster API.

Additionally, after a cluster is created, labels and key value pairs can be added to the cluster via the EMCO API. Clusters can be specified by label when preparing placement intents.

NOTE: The cluster provider is someone who owns clusters and registers them to EMCO. If an Enterprise has clusters, for example from AWS, then the cluster provider for those clusters from AWS is still considered as from that Enterprise. AWS is not the provider. Here, the provider is someone who owns clusters and registers them here. Since AWS does not register their clusters here, AWS is not considered cluster provider in this context.

Distributed Application Scheduler

The distributed application scheduler microservice:

  • Project Management provides multi-tenancy in the application from a user perspective.
  • Composite App Management manages composite apps that are collections of Helm Charts, one per application.
  • Composite Profile Management manages composite profiles that are collections of profile, one per application.
  • Deployment Intent Group Management manages Intents for composite applications.
  • Controller Registration manages placement and action controller registration, priorities etc.
  • Status Notifier framework allows user to get on-demand status updates or notifications on status updates.
  • Scheduler:
    • Placement Controllers: Generic Placement Controller.
    • Action Controllers.
Lifecycle Operations

The Distributed Application Scheduler supports operations on a deployment intent group resource to instantiate the associated composite application with any placement, and action intents performed by the registered placement and action controllers. The basic flow of lifecycle operations on a deployment intent group after all the supporting resources have been created via the APIs are:

  • approve: marks that the deployment intent group has been approved and is ready for instantiation.
  • instantiate: the Distributed Application Scheduler prepares the application resourcs for deployment, and applies placement and action intents before invoking the Resource Synchronizer to deploy them to the intended remote clusters.
  • status: (may be invoked at any step) provides information on the status of the deployment intent group.
  • terminate: terminates the application resources of an instantiated application from all of the clusters to which it was deployed. In some cases, if a remote cluster is intermittently unreachable, the instantiate operation may still retry the instantiate operation for that cluster. The terminate operation will cause the instantiate operation to complete (i.e. fail), before the termination operation is performed.
  • stop: In some cases, if the remote cluster is intermittently unreachable, the Resource Synchronizer will continue retrying an instantiate or terminate operation. The stop operation can be used to force the retry operation to stop, and the instantiate or terminate operation will complete (with a failed status). In the case of terminate, this allows the deployment intent group resource to be deleted via the API, since deletion is prevented until a deployment intent group resource has reached a completed terminate operation status. Refer to EMCO Resource Lifecycle Operations for more details.

Network Configuration Management

The network configuration management (NCM) microservice:

  • Provider Network Management to create provider networks.
  • Virtual Network Management to create dynamic virtual networks.
  • Controller Registration manages network plugin controllers, priorities etc.
  • Status Notifier framework allows user to get on-demand status updates or notifications on status updates.
  • Scheduler with Built in Controller - OVN-for-K8s-NFV Plugin Controller.
Lifecycle Operations

The Network Configuration Management microservice supports operations on the network intents of a cluster resource to instantiate the associated provider and virtual networks that have been defined via the API for the cluster. The basic flow of lifecycle operations on a cluster, after all the supporting network resources have been created via the APIs are:

  • apply: the Network Configuration Management microservice prepares the network resources and invokes the Resource Synchronizer to deploy them to the designated cluster.
  • status: (may be invoked at any step) provides information on the status of the cluster networks.
  • terminate: terminates the network resources from the cluster to which they were deployed. In some cases, if a remote cluster is intermittently unreachable, the Resource Synchronizer may still retry the instantiate operation for that cluster. The terminate operation will cause the instantiate operation to complete (i.e. fail), before the termination operation is performed.
  • stop: In some cases, if the remote cluster is intermittently unreachable, the Resource Synchronizer will continue retrying an instantiate or terminate operation. The stop operation can be used to force the retry operation to stop, and the instantate or terminate operation will be completed (with a failed status). In the case of terminate, this allows the deployment intent group resource to be deleted via the API, since deletion is prevented until a deployment intent group resource has reached a completed terminate operation status.

Distributed Cloud Manager

The Distributed Cloud Manager (DCM) provides the Logical Cloud abstraction and effectively completes the concept of “multi-cloud”. One Logical Cloud is a grouping of one or many clusters, each with their own control plane, specific configurations and geo-location, which get partitioned for a particular EMCO project. This partitioning is made via the creation of distinct, isolated namespaces in each of the Kubernetes* clusters that thus make up the Logical Cloud.

A Logical Cloud is the overall target of a Deployment Intent Group and is a mandatory parameter (the specific applications under it further refine what gets run and in which location). A Logical Cloud must be explicitly created and instantiated before a Deployment Intent Group can be instantiated.

Due to the close relationship with Clusters, which are provided by Cluster Registration (clm) above, it is important to understand the mapping between the two. A Logical Cloud groups many Clusters together but a Cluster may also be grouped by multiple Logical Clouds, effectively turning the cluster multi-tenant. The partitioning/multi-tenancy of a particular Cluster, via the different Logical Clouds, is done today at the namespace level (different Logical Clouds access different namespace names, and the name is consistent across the multiple clusters of the Logical Cloud).

Mapping between Logical Clouds and Clusters

Figure 4 - Mapping between Logical Clouds and Clusters

Lifecycle Operations

Prerequisites to using Logical Clouds:

  • with the project-less Cluster Registration API, create the cluster providers, clusters and optionally cluster labels.
  • with the Distributed Application Scheduler API, create a project which acts as a tenant in EMCO.

The basic flow of lifecycle operations to get a Logical Cloud up and running via the Distributed Cloud Manager API is:

  • Create a Logical Cloud specifying the following attributes:
    • Level: either 1 or 0, depending on whether an admin or a custom/user cloud is sought - more on the differences below.
    • (for Level-1 only) Namespace name - the namespace to use in all of the Clusters of the Logical Cloud.
    • (for Level-1 only) User name - the name of the user that will be authenticating to the Kubernetes* APIs to access the namespaces created.
    • (for Level-1 only) User permissions - permissions that the user specified will have in the namespace specified, in all of the clusters.
  • (for Level-1 only) Create resource quotas and assign them to the Logical Cloud created: this specifies what quotas/limits the user will face in the Logical Cloud, for each of the Clusters.
  • Assign the Clusters previously created with the project-less Cluster Registration API to the newly-created Logical Cloud.
  • Instantiate the Logical Cloud. All of the clusters assigned to the Logical Cloud are automatically set up to join the Logical Cloud. Once this operation is complete, the Distributed Application Scheduler’s lifecycle operations can be followed to deploy applications on top of the Logical Cloud.

Apart from the creation/instantiation of Logical Clouds, the following operations are also available:

  • Terminate a Logical Cloud - this removes all of the Logical Cloud -related resources from all of the respective Clusters.
  • Delete a Logical Cloud - this eliminates all traces of the Logical Cloud in EMCO.
Level-1 Logical Clouds

Logical Clouds were introduced to group and partition clusters in a multi-tenant way and across boundaries, improving flexibility and scalability. A Level-1 Logical Cloud is the default type of Logical Cloud providing just that much. When projects request a Logical Cloud to be created, they provide what permissions are available, resource quotas and clusters that compose it. The Distributed Cloud Manager, alongside the Resource Synchronizer, sets up all the clusters accordingly, with the necessary credentials, namespace/resources, and finally generating the kubeconfig files used to authenticate/reach each of those clusters in the context of the Logical Cloud.

Level-0 Logical Clouds

In some use cases, and in the administrative domains where it makes sense, a project may want to access raw, unmodified, administrator-level clusters. For such cases, no namespaces need to be created and no new users need to be created or authenticated in the API. To solve this, the Distributed Cloud Manager introduces Level-0 Logical Clouds, which offer the same consistent interface as Level-1 Logical Clouds to the Distributed Application Scheduler. Being of type Level-0 means “the lowest-level”, or the administrator level. As such, no changes will be made to the clusters themselves. Instead, the only operation that takes place is the reuse of credentials already provided via the Cluster Registration API for the clusters assigned to the Logical Cloud (instead of generating new credentials, namespace/resources and kubeconfig files).

OVN Action Controller

The OVN Action Controller (ovnaction) microservice is an action controller which may be registered and added to a deployment intent group to apply specific network intents to resources in the composite application. It provides the following functionalities:

  • Network intent APIs which allow specification of network connection intents for resources within applications.
  • On instantiation of a deployment intent group configured to utilize the ovnaction controller, network interface annotations will be added to the pod template of the identified application resources.
  • ovnaction supports specifying interfaces which attach to networks created by the Network Configuration Management microservice.

Traffic Controller

The traffic controller microservice provides a way to create network policy resources across edge clusters. It provides inbound RESTful APIs to create intents to open the traffic from clients, and provides change and delete APIs for update and deletion of traffic intents. Using the information provided through intents, it also creates a network policy resource for each of the application servers on the corresponding edge cluster.

NOTE:For network policy to work, edge cluster must have network policy support using CNI such as calico.

Generic Action Controller

The generic action controller microservice is an action controller which may be registered with the central orchestrator. It can achieve the following usecases:

  • Create a new Kubernetes* object and deploy that along with a specific application which is part of the composite Application. There are two variations here:

    • Default : Apply the new object to every instance of the app in every cluster where the app is deployed.
    • Cluster-Specific : Apply the new object only where the app is deployed to a specific cluster, denoted by a cluster-name or a list of clusters denoted by a cluster-label.
  • Modify an existing Kubernetes* object which may have been deployed using the Helm chart for an app, or may have been newly created by the above mentioned usecase. Modification may correspond to specific fields in the YAML definition of the object.

To achieve both the usecases, the controller exposes RESTful APIs to create, update and delete the following:

  • Resource - Specifies the newly defined object or an existing object.
  • Customization - Specifies the modifications (using JSON Patching) to be applied on the objects.

Resource Synchronizer

This microservice is the one which deploys the resources in edge/cloud clusters. ‘Resource contexts’ created by various microservices are used by this microservice. It takes care of retrying, in case the remote clusters are not reachable temporarily.

Placment and Action Controllers in EMCO

This section illustrates some key aspects of the EMCO controller architecture. Depending on the needs of a composite application, intents that handle specific operations for application resources (e.g. addition, modification, etc.) can be created via the APIs provided by the corresponding controller API. The following diagram shows the sequence of interactions to register controllers with EMCO.

OpenNESS EMCO

Figure 5 - Register placement and action controllers with EMCO

This diagram illustrates the sequence of operations taken to prepare a Deployment Intent Group that utilizes some intents supported by controllers. The desired set of controllers and associated intents are included in the definition of a Deployment Intent Group to satisfy the requirements of a specific deployed instance of a composite application.

OpenNESS EMCO

Figure 6 - Create a Deployment Intent Group

When the Deployment Intent Group is instantiated, the identified set of controllers are invoked in order to perform their specific operations.

OpenNESS EMCO

Figure 7 - Instantiate a Deployment Intent Group

In this initial release of EMCO, a built-in generic placement controller is provided in the orchestrator. As described above, the three provided action controllers are the OVN Action, Traffic and Generic Action controllers.

Status Monitoring and Queries in EMCO

When a resource like a Deployment Intent Group is instantiated, status information about both the deployment and the deployed resources in the cluster are collected and made available for query by the API. The following diagram illustrates the key components involved. For more information about status queries see EMCO Resource Lifecycle Operations.