Version: 6.4.1

Autoscaling Kubernetes

Compute Resources Scaling

The Cortex Helm chart comes bundled with the cluster-autoscaler helm chart (version 3.1.0), which allows automatic scaling of Kubernetes worker nodes based on resource utilization. We highly recommend enabling autoscaling for variable workloads and cost efficiency, for example to bring GPU nodes up and down or to handle periods of heavy utilization. The official documentation for the cluster-autoscaler chart provides more information on how to configure it for supported cloud providers such as AWS and Azure.

NOTE: For Azure, it is currently not possible to have an autoscaling node group with a desired count of 0, so you must keep an active instance deployed for each node group. A change to this requirement is on the Azure roadmap, slated for Q2 2020.

Example configuration

To enable the cluster-autoscaler sub-chart included with your Cortex installation, set the following property in the Cortex Helm overrides file (or optionally on the helm command line using --set):

cluster-autoscaler:
  enabled: true

Any additional configuration for the sub-chart must be provided according to your Kubernetes deployment and cloud provider. A full list of the sub-chart configuration parameters is available here.
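
For example, the autoscaler's behavior can be tuned through the sub-chart's extraArgs value, which passes flags straight through to the cluster-autoscaler binary. Note that all sub-chart values must be nested under the cluster-autoscaler key in the overrides file. The sketch below is illustrative only; the flag values shown are assumptions, not recommendations:

cluster-autoscaler:
  enabled: true
  # Entries under extraArgs become command-line flags on the cluster-autoscaler binary
  extraArgs:
    scale-down-delay-after-add: 10m   # hypothetical tuning value
    scale-down-unneeded-time: 10m     # hypothetical tuning value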

AWS

The example values override packaged with the Fabric6 Helm Chart in cortex5/examples/values-cortex-autoscaler-aws.yaml shows how to enable autoscaling for an EKS deployment, using tags to auto-discover which AWS resources to manage.

Substitute your own values for the AWS access key, the secret key, the region the cluster is deployed in, and the EKS cluster name.
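
A minimal sketch of what such an overrides file can look like, assuming the standard cluster-autoscaler chart parameters for AWS (the packaged example may differ in detail; the region, credentials, and cluster name below are placeholders):

cluster-autoscaler:
  enabled: true
  cloudProvider: aws
  awsRegion: us-east-1                     # region the EKS cluster is deployed in
  awsAccessKeyID: "<YOUR_ACCESS_KEY>"      # placeholder; prefer IAM roles where possible
  awsSecretAccessKey: "<YOUR_SECRET_KEY>"  # placeholder
  autoDiscovery:
    clusterName: my-eks-cluster            # placeholder EKS cluster name used for tag-based auto-discovery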

Kubernetes Metrics Server

This section describes how to install and configure the Kubernetes Metrics Server on your cluster.

Metrics Server is a source of container resource metrics for Kubernetes built-in autoscaling pipelines.

Metrics Server collects resource metrics from Kubelets and exposes them in the Kubernetes apiserver through the Metrics API for use by the Horizontal Pod Autoscaler and the Vertical Pod Autoscaler. The Metrics API can also be accessed via kubectl top, making it easier to debug autoscaling pipelines.

Instructions for installing the Metrics Server can be found in the Bitnami Helm chart repository.
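
One setting to be aware of with the Bitnami chart: the HPA controller reaches the Metrics Server through a registered APIService, and depending on the chart version this registration may be disabled by default. A minimal overrides sketch, assuming the Bitnami chart's apiService parameter (verify against the chart documentation for your version):

apiService:
  create: true   # register the metrics.k8s.io APIService so HPA and kubectl top can query the Metrics API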

Horizontal Pod Autoscaling

When the Kubernetes Metrics Server is deployed, the Fabric Helm chart supports configuring Horizontal Pod Autoscaler rules for each of the Fabric service deployments. The full spec field of the HorizontalPodAutoscaler resource can be extended under cortex.autoscaleRules, either globally or overridden per Fabric service.

Global HPA Settings

The global HPA settings apply to all services except Dex and API. To configure those two services, set the values as "Per Service" settings in values.yaml (see Per Service HPA Settings below).

Example of Global HPA Settings

cortex:
  # cortex.autoscaleRules: a generic HorizontalPodAutoscaler spec to apply to each Cortex service
  autoscaleRules:
    minReplicas: 1
    maxReplicas: 3
    metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 70
      - type: Resource
        resource:
          name: memory
          target:
            type: Utilization
            averageUtilization: 80

Per Service HPA Settings

The Dex and API services must have their HPA rules configured at the per-service level.

Example of Per Service HPA settings

accounts:
  # accounts.autoscaleRules: override the service-specific autoscaleRules (same schema as cortex.autoscaleRules)
  autoscaleRules:
    minReplicas: 1
    maxReplicas: 6
    metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 80
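
Because Dex and API are excluded from the global rules, each needs its own block following the same pattern. A hypothetical sketch for the Dex service, assuming it is keyed as dex in values.yaml (confirm the exact key and sensible replica bounds against your chart's values file):

dex:
  # dex.autoscaleRules: hypothetical per-service rules; same schema as cortex.autoscaleRules
  autoscaleRules:
    minReplicas: 1
    maxReplicas: 2
    metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 75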