Kubernetes

gridscale Managed Kubernetes (GSK) is a secure and fully-managed Kubernetes solution. All you need to do is to configure how powerful you wish your cluster to be. We take care of upgrades and OS maintenance.

gridscale Managed Kubernetes (GSK) fully integrates into our products, offering easy configuration, monitoring, release management and security, enabling you to focus entirely on your business applications. GSK integrates easily with our Load Balancer, Certificates and Storage IaaS for Ingress and persistent volumes respectively.

The GSK cluster is managed by our provisioning stack. Any changes made directly to the worker nodes are transient, as worker nodes can and will be recycled under certain circumstances. Due to this transient nature, storage that needs to persist should be requested as persistent volume claims through Kubernetes.

If you are new to Kubernetes or containers in general, we’d recommend you get familiar with commonly used terminology and go through our line-up of content for you to get started:

  1. gridscale presents: Managed Kubernetes
  2. Kubernetes - All about clusters, pods and kubelets
  3. How to: connect gridscale Kubernetes Cluster and PaaS

Release Support

The GSK offering supports the three latest stable Kubernetes releases. Keep in mind that support for newer Kubernetes versions requires time, as the components that integrate the gridscale platform into Kubernetes must be adapted and a migration path from previous releases provided.

Releases other than the three latest supported GSK releases are deprecated. You will be notified of a GSK release deprecation via a Panel notification four weeks in advance.

Please upgrade your clusters ahead of deprecation (see below on how to perform upgrades). Once deprecated, your cluster is subject to a forced upgrade. With forced upgrades, the correct functioning of your workloads cannot be guaranteed, because minor Kubernetes versions can introduce breaking changes to your workloads.

GSK Updates and Upgrades

Patch Updates

Patch updates contain either a new Kubernetes patch release, GSK-specific changes (such as to the CSI plugin), or both. Patch updates are performed automatically for you at regular intervals. You can also perform patch updates ahead of time via the API.

Release Upgrades

Release upgrades contain a new Kubernetes minor or major release and (optionally) GSK specific changes. Release upgrades are not performed automatically for you. You can perform release upgrades via the panel or the API.

Please consult the upgrade considerations section below for compatibility information between Kubernetes releases.

Example of performing patch updates or release upgrades via API with cURL:

  1. Get your GSK service:
curl 'https://api.gridscale.io/objects/paas/services/<CLUSTER_UUID>' -H 'X-Auth-UserId: <AUTH_USER_UUID>' -H 'X-Auth-Token: <AUTH_TOKEN>'
  2. Get the available Service Templates:
curl 'https://api.gridscale.io/objects/paas/service_templates' -H 'X-Auth-UserId: <AUTH_USER_UUID>' -H 'X-Auth-Token: <AUTH_TOKEN>'
  3. Take the service_template_uuid of your GSK cluster from Step 1 and find the matching template in the list from Step 2.
  4. Find the target service template in the patch_updates attribute of the template from Step 3.
  5. Initiate the GSK update via a service PATCH using the UUID from Step 4:
curl 'https://api.gridscale.io/objects/paas/services/<CLUSTER_UUID>' -X PATCH -H 'Content-Type: application/json' -H 'X-Auth-UserId: <AUTH_USER_UUID>' -H 'X-Auth-Token: <AUTH_TOKEN>' --data-raw '{"service_template_uuid":"<PATCH_UPDATE_SERVICE_TEMPLATE_UUID>"}'
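The template lookup between the two GET calls and the final PATCH can be sketched in a few lines of Python. The exact shape of the API responses is an assumption here (a template mapping keyed by UUID, with a patch_updates list of candidate template UUIDs); adapt the field access to what the API actually returns:

```python
# Sketch of the template lookup: given the service object (Step 1) and the
# available templates (Step 2), find the UUID to send in the PATCH request.
# The response shapes below are assumptions; adapt them to the real API.

def find_patch_update_uuid(service, templates):
    """Return a template UUID from the current template's patch_updates
    attribute, or None if no patch update is available."""
    current_uuid = service["service_template_uuid"]
    current_template = templates[current_uuid]
    candidates = current_template.get("patch_updates") or []
    return candidates[0] if candidates else None

# Hypothetical example data mirroring the two GET calls above:
service = {"service_template_uuid": "aaaa-1111"}
templates = {
    "aaaa-1111": {"release": "1.19", "patch_updates": ["bbbb-2222"]},
    "bbbb-2222": {"release": "1.19", "patch_updates": []},
}
target = find_patch_update_uuid(service, templates)  # "bbbb-2222"
```

The returned UUID is what the final PATCH request sends as service_template_uuid.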

Downtime During Updates and Upgrades

Since we replace and upgrade the Kubernetes master, the Kubernetes API will experience a short interruption during which you won’t be able to change cluster resources.

Under optimal conditions, your workloads won’t experience any downtime as long as at least two worker nodes are present in the cluster and workloads are configured properly with redundancy.

We ensure that only one node is upgraded at a time, and that the workloads will be moved away from the node that is upgraded. This process is designed to be graceful. If workloads prevent graceful draining, however, node shutdown is forced to allow the process to continue.
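To help workloads stay available during such drains, a PodDisruptionBudget can declare how many replicas must remain up during voluntary disruptions. A minimal sketch for a hypothetical deployment labelled app: my-app (on the Kubernetes releases discussed here the API group is policy/v1beta1; from Kubernetes 1.21 it is policy/v1):

```yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 1        # keep at least one replica running during a drain
  selector:
    matchLabels:
      app: my-app
```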

One exception is when the cluster is working at or near its resource limits, in which case you should add one or more nodes to the cluster to make sure no resource limits are exceeded.

Considerations for Upgrades

Patch updates (1.19.10 → 1.19.11) are considered safe by the Kubernetes project.

Release upgrades (1.16.x → 1.17.x) can introduce breaking changes.

Read on to find out how to check whether your workloads (deployments, services, daemonsets, etc.) are still compatible with the Kubernetes release you want to upgrade to.

Official Kubernetes Documentation

You can find the official Kubernetes release notes in the Changelog.

Another helpful resource is the Deprecated API Migration Guide, which lists all API removals by release.

Example:

The extensions/v1beta1 API version of NetworkPolicy is no longer served as of v1.16.

  • Migrate manifests and API clients to use the networking.k8s.io/v1 API version, available since v1.8.
  • All existing persisted objects are accessible via the new API.

Deploy and Test Workload on a Temporary Cluster

The easiest way is just to provision a test cluster with the new release.

Deploy your workloads to the test cluster and check if everything is working as expected.

This way you can make sure your workloads are compatible with the Kubernetes release you want to upgrade to, without impact on live workloads.

Third Party Cluster Linting Tool

There are some third-party tools that can make the transition easier.

Pluto is a tool which helps users find deprecated Kubernetes APIs.

In this example we see two files in our directory that have deprecated apiVersions. Deployment extensions/v1beta1 is no longer available and needs to be replaced with apps/v1. This needs to be fixed prior to a 1.16 upgrade:

pluto detect-files -d kubernetes/testdata

NAME        KIND         VERSION              REPLACEMENT   REMOVED   DEPRECATED
utilities   Deployment   extensions/v1beta1   apps/v1       true      true
utilities   Deployment   extensions/v1beta1   apps/v1       true      true

Head over to the Pluto Documentation to read more about in-depth usage.

Visible and Protected Resources

Resources like servers, storages and networks which make up the cluster can also be viewed via the API or within the Panel for transparency and billing reasons. It is not recommended to change any individual resources, as any successful changes will be reset once the cluster updates or recovers.

Beginning with the 1.19.12-gs0 release, we protect Kubernetes resources from alteration via the Panel and API. This protects users from, for example, accidentally deleting, changing or shutting down resources of a GSK cluster.

Protected resources:

  • Master nodes (server, storage, IPs)
  • Worker nodes (server, storage, IPs)
  • Kubernetes network
  • Storages created by Kubernetes (i.e. persistent volumes)
  • Load Balancers created by Kubernetes (e.g. for Ingress controllers)

If you want to change your worker configuration, you can still do so in the Kubernetes configuration.

Horizontal Pod Autoscaler (HPA)

In order to use the horizontal pod autoscaler (HPA) you need to install the Metrics Server. You can bring your own or just follow the example.

Install Metrics Server

You can install the Metrics Server via Helm. There is a ready-to-use Metrics Server Helm Chart by Bitnami.

Add the Bitnami Metrics Server repository to your Helm installation:

helm repo add bitnami https://charts.bitnami.com/bitnami

Create a values.yaml with this content to configure your Metrics Server:

apiService:
  create: true
extraArgs:
  kubelet-insecure-tls: true
  kubelet-preferred-address-types: InternalIP

Install the Metrics Server Helm Chart:

helm install metrics-server bitnami/metrics-server -f values.yaml

Wait for the Metrics Server to be ready. It might take a minute or two before the first metrics are collected.

Run HPA

In order to exercise the HPA you need to create a deployment and generate some load against it.

Keep in mind that resource requests and limits must be defined on the deployment in order to use the HPA. The Service below exists only to allow the load generator to reach the pods.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  replicas: 1
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: k8s.gcr.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache

Download the example deployment and service and deploy with:

kubectl apply -f php-apache.yaml

Create the HPA for the deployment:

kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
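An HPA can also be declared as a manifest instead of using kubectl autoscale. For example (autoscaling/v2beta2 on the releases discussed here; autoscaling/v2 from Kubernetes 1.23), targeting 50% average CPU utilization:

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50   # scale out when average CPU exceeds 50%
```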

Check the current status of the HPA:

kubectl get hpa

This should look like this:

NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   0%/50%    1         10        1          2d22h

Generate Test Load

Now create an infinite loop that generates load:

kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"

Open a second terminal and check the HPA status:

kubectl get hpa -w

After some time you should see the pods scale:

NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   0%/50%    1         10        1          2d22h
php-apache   Deployment/php-apache   91%/50%   1         10        1          2d22h
php-apache   Deployment/php-apache   91%/50%   1         10        2          2d22h
php-apache   Deployment/php-apache   253%/50%  1         10        2          2d22h
php-apache   Deployment/php-apache   253%/50%  1         10        4          2d22h
php-apache   Deployment/php-apache   253%/50%  1         10        6          2d22h
php-apache   Deployment/php-apache   101%/50%  1         10        6          2d22h
php-apache   Deployment/php-apache   71%/50%   1         10        6          2d22h
php-apache   Deployment/php-apache   71%/50%   1         10        9          2d22h
php-apache   Deployment/php-apache   51%/50%   1         10        9          2d22h

You can also check the deployment itself:

kubectl get deployment php-apache

NAME         READY   UP-TO-DATE   AVAILABLE   AGE
php-apache   9/9     9            9           2d22h

Stop Load and Clean Up

In order to stop the load, hit CTRL+C in the terminal where you started the load generator.

You can verify the scale down with the commands from above:

kubectl get deployment php-apache -w

Delete the example deployment and service:

kubectl delete -f php-apache.yaml

Vertical Scaling

GSK supports vertical scaling, which can be enabled by simply editing the worker node configuration of your Kubernetes cluster in the Panel or via the API. Scaling the cluster will recycle all nodes individually. We do not support editing the resources directly, for example changing the RAM configuration of a worker node via the Servers section of the Panel; any changes made that way will be reverted.

If available for your Kubernetes version, you’ll be able to scale the cluster via the Expert Panel.

The following can be changed:

  • Cores per worker node: k8s_worker_node_cores
  • RAM per worker node: k8s_worker_node_ram
  • Storage per worker node: k8s_worker_node_storage
  • StorageType per worker node: k8s_worker_node_storage_type

Via the API, first make sure the service template's worker node parameters are mutable, i.e. that immutable is set to false, as shown below:

"k8s_worker_node_ram": {
          "max": 192,
          "min": 2,
          "type": "integer",
          "default": 16,
          "required": true,
          "immutable": false,
          "description": "Memory per worker node"
        },
"k8s_worker_node_cores": {
          "max": 32,
          "min": 1,
          "type": "integer",
          "default": 4,
          "required": true,
          "immutable": false,
          "description": "Cores per worker node"
        }

You can then patch your cluster. Always include the full worker node configuration in the patch, even for parameters you don't want to change, such as RAM.

For example:

{
  "parameters": {
      "k8s_worker_node_ram": 4,
      "k8s_worker_node_cores": 2,
      "k8s_worker_node_count": 3,
      "k8s_worker_node_storage": 40,
      "k8s_worker_node_storage_type": "storage"
      }
}
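Since the full parameter set must always be sent, a safe pattern is to read the current parameters from the service object and override only what should change. A small Python sketch (reading the parameters from the GET response in this shape is an assumption):

```python
def build_patch_body(current_params, **changes):
    """Merge the changed worker node parameters into the full current
    set, so the PATCH body always carries the complete configuration."""
    params = dict(current_params)
    params.update(changes)
    return {"parameters": params}

# Current configuration, e.g. taken from the GET response for the service:
current = {
    "k8s_worker_node_ram": 4,
    "k8s_worker_node_cores": 2,
    "k8s_worker_node_count": 3,
    "k8s_worker_node_storage": 40,
    "k8s_worker_node_storage_type": "storage",
}
body = build_patch_body(current, k8s_worker_node_cores=4)
# body now contains all five parameters, with only the core count changed.
```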

Worker Node Storage Types

When creating or editing your cluster, you can choose which of our Distributed Storage clusters to use. If you want to change this later, you can do so via the Expert Panel.

Via the API, first make sure the service template's storage parameters are mutable, i.e. that immutable is set to false, like so:

"k8s_worker_node_storage": {
          "max": 1024,
          "min": 30,
          "type": "integer",
          "default": 30,
          "required": true,
          "immutable": false,
          "description": "Storage (in GiB) per worker node"
        },
"k8s_worker_node_storage_type": {
          "type": "string",
          "allowed": [
            "storage",
            "storage_high",
            "storage_insane"
          ],
          "default": "storage_insane",
          "required": true,
          "immutable": false,
          "description": "Storage type"
        }

You can then patch the full worker node configuration of your cluster; always include every parameter, even those you don't want to change, such as RAM.

For example:

{
  "parameters": {
      "k8s_worker_node_ram": 4,
      "k8s_worker_node_cores": 2,
      "k8s_worker_node_count": 3,
      "k8s_worker_node_storage": 40,
      "k8s_worker_node_storage_type": "storage"
    }
}

Logging

Container logs can be obtained via kubectl. While this is certainly feasible for ad-hoc debugging of single containers, it doesn’t give you the full picture of your application or even the whole cluster.

It is therefore a common practice to ship logs to a centralized log management platform, where they can be transformed and analyzed in one place - giving you that full picture and the means to act on events or trends.

There are multiple ways to get your logs into the log management platform:

  • Your application can directly implement the format your log management platform accepts the logs in, and send them there.
  • Your application can log to stdout and stderr, leaving it to the container engine to store the logs.

It is good practice to use the latter approach: it decouples the application from runtime environment specifics, is non-blocking for the application, and provides a general way to reliably and securely transfer logs, even during temporary unavailability of the log management platform.
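In practice the application only needs to write one line per event to stdout. A minimal Python sketch emitting structured JSON lines (the field names are arbitrary):

```python
import datetime
import json
import sys

def format_record(level, message, **fields):
    """Render one log event as a single JSON line."""
    return json.dumps({
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "level": level,
        "msg": message,
        **fields,
    })

# Writing to stdout is all the application does; the container engine
# stores the line and the log shipper forwards it.
sys.stdout.write(format_record("info", "order processed", order_id=42) + "\n")
```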

Log Shipping

While the container engine technically might be able to ship the logs directly to your log management platform, having the container engine store them locally instead and a third-party component read and ship them has proven to be the more reliable and portable solution.

This third-party component is called a log shipper. In general it can run anywhere, has inputs to read logs from locally and outputs to ship logs to remotely. The log shipper is an application agnostic approach - in the sense that it doesn’t need to be integrated into the applications you run on your cluster in any way. It just needs to support the format the logs are stored in as an input and the format the log management platform accepts the logs in as an output.

Accessing Container Logs

The log driver used by the container engine docker on our managed Kubernetes platform is journald.

journald is part of systemd and designed to store logs safely and handle rotation gracefully to prevent node disks from filling up. journald makes it easy for the shipper to reliably transfer logs, since the shipper only needs to keep track of one event stream.

journald stores logs in /var/log/journal. Log shippers that support journald as an input include Fluent Bit, Filebeat and Vector.

Note: The log shipper needs to keep track of where it left off, so that after a restart or redeployment log shipping doesn't start from the beginning and transfer all logs again. Since the position is node-specific, a local hostPath mount to store it in is recommended.
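For example, a log shipper deployed as a DaemonSet could read the journal and persist its cursor on the node like this (a fragment of a pod spec; all names, paths and the image are purely illustrative):

```yaml
volumes:
- name: journal
  hostPath:
    path: /var/log/journal       # the node's journald logs
- name: cursor
  hostPath:
    path: /var/lib/log-shipper   # persists the read position across restarts
    type: DirectoryOrCreate
containers:
- name: shipper
  image: example.org/log-shipper:latest   # placeholder image
  volumeMounts:
  - name: journal
    mountPath: /var/log/journal
    readOnly: true
  - name: cursor
    mountPath: /var/lib/log-shipper
```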

Load Balancing

Applying a service with the type of Load Balancer will provision a gridscale Load Balancer. Below are some helpful tips on integrating with our Load Balancer as a Service (LBaaS):
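For example, this minimal Service manifest would trigger the provisioning of a gridscale Load Balancer (name, labels and ports are placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  type: LoadBalancer   # provisions a gridscale Load Balancer
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
```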

IP Address Forwarding

The Load Balancer needs to be set to HTTP mode. The client’s IP address is then available in the X-Forwarded-For HTTP header.

Note: When in HTTP mode, HTTPS termination happens at the Load Balancer. Certificates can either be obtained via Let’s Encrypt, or you can upload your own custom certificate.

Configuring Load Balancer Modes

The cloud controller manager (CCM) uses service annotations to configure the LBaaS for a GSK cluster. If the annotation for a specific parameter is not set, the default value for that parameter is configured. This feature is supported on GSK versions 1.16.15-gs2, 1.17.14-gs1, 1.18.12-gs1 and 1.19.4-gs1 and later.

Annotation                                                           Default value
service.beta.kubernetes.io/gs-loadbalancer-mode                      tcp
service.beta.kubernetes.io/gs-loadbalancer-redirect-http-to-https    "false"
service.beta.kubernetes.io/gs-loadbalancer-ssl-domains               nil
service.beta.kubernetes.io/gs-loadbalancer-algorithm                 leastconn
service.beta.kubernetes.io/gs-loadbalancer-https-ports               443
service.beta.kubernetes.io/gs-loadbalancer-custom-certificate-uuids  nil

Examples

The following annotations configure the LBaaS with HTTP mode, Round Robin Algorithm, redirect HTTP to HTTPS, and multiple SSL Domains wherein domains are separated by a comma.

annotations:
    service.beta.kubernetes.io/gs-loadbalancer-mode: http
    service.beta.kubernetes.io/gs-loadbalancer-redirect-http-to-https: "true"
    service.beta.kubernetes.io/gs-loadbalancer-ssl-domains: demo1.test.com,demo2.test.com
    service.beta.kubernetes.io/gs-loadbalancer-algorithm: roundrobin

The following annotations configure the LBaaS with HTTP mode, Round Robin Algorithm, redirect HTTP to HTTPS, a non-standard SSL port 4443, and a custom certificate, wherein certificate UUIDs are separated by a comma.

annotations:
    service.beta.kubernetes.io/gs-loadbalancer-mode: http
    service.beta.kubernetes.io/gs-loadbalancer-redirect-http-to-https: "true"
    service.beta.kubernetes.io/gs-loadbalancer-custom-certificate-uuids: c8b786e7-53ee-427b-8ff6-498f59f58b14
    service.beta.kubernetes.io/gs-loadbalancer-algorithm: roundrobin
    service.beta.kubernetes.io/gs-loadbalancer-https-ports: "4443"

Adding Annotations to an Existing Ingress

You can customize the Load Balancer behaviour behind a specific Ingress by annotating its Service:

kubectl annotate --overwrite svc <INGRESS_NAME> \
"service.beta.kubernetes.io/gs-loadbalancer-mode=http" \
"service.beta.kubernetes.io/gs-loadbalancer-algorithm=roundrobin"

Networking

We use Flannel out-of-the-box, which cannot be currently changed.

Network Policies

Because Flannel is used as the network overlay, our clusters do not support network policies.

Persistent Volumes

We support up to fifteen storages connected to a server. Because the servers and storages which make up a Kubernetes cluster are visible in the Panel, a VNC connection is still possible if network issues arise.

Storage Classes

We offer three storage classes for every GSK cluster, corresponding to the performance classes of our distributed storage.

NAME                      PROVISIONER           RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
block-storage (default)   bs.csi.gridscale.io   Delete          Immediate           true                   68d
block-storage-high        bs.csi.gridscale.io   Delete          Immediate           true                   68d
block-storage-insane      bs.csi.gridscale.io   Delete          Immediate           true                   68d

The reclaim policy “Delete” ensures that deleted persistent volumes (PVs) are also deleted as storages in the Panel and API.
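A PersistentVolumeClaim requesting a volume from one of these classes could look like this (name and size are placeholders):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data
spec:
  accessModes:
  - ReadWriteOnce              # the only supported access mode (see below)
  storageClassName: block-storage
  resources:
    requests:
      storage: 10Gi
```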

Deleting or changing our storage classes, including the default storage class, is not recommended. After updates and upgrades, we will introduce them into the cluster again.

Mount Options

Currently we only support the ReadWriteOnce (RWO) access mode. ReadWriteMany (RWX) storages can be set up yourself with a custom NFS server.

Types of Persistent Volumes

hostPath volumes can be used; however, due to the transient nature of the worker nodes, their contents will be lost when a worker node is recycled.

Ingress Controller

Since no Ingress Controller is installed by default, you can set up the cluster however you like.

Access and Security

All users with write access (or higher) to the project will be able to download the Kubernetes certificate.

PKI Certificate Access

Authentication against the Kubernetes master is based on X.509 client certificates, which can be generated on demand and expire after three days. This can be handled with gscloud, which will automatically renew the certificate for you.

After installation you can set up gscloud with your API keys: gscloud make-config

After setup, register gscloud to fetch and update the certificates automatically using the following command: gscloud kubernetes cluster save-kubeconfig --credential-plugin --cluster <cluster uuid>

Encryption

Data is encrypted at rest, and network traffic is TLS encrypted on the application layer.

Role-based Access Control (RBAC)

GSK supports standard Kubernetes RBAC and can be configured using an identity provider such as Microsoft Active Directory.
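As an illustration of standard Kubernetes RBAC, the following Role and RoleBinding grant a hypothetical user read-only access to pods in one namespace:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: default
  name: read-pods
subjects:
- kind: User
  name: jane               # hypothetical user from your identity provider
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```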

Firewall

GSK controlplane and worker nodes utilize the firewall in the OS to secure cluster-internals from the public network.

This does not restrict you from exposing your workloads to the public network.

Backups

Data that belongs to the controlplane of the cluster (such as etcd) is backed up by gridscale.

Data that comes from within the application needs to be backed up by the user. gridscale Storage snapshots and backups are not supported by GSK at this point and cannot be used for backing up persisted data.

Please employ a solution that runs in the cluster.

Node Pools

Currently we only support one node pool.

Kubernetes Dashboard

The official Kubernetes dashboard is not deployed by default and can be installed with a single command that is mentioned in the Official Kubernetes Documentation.

FAQ

Does gridscale monitor the cluster?

We monitor the overall health of the cluster. We ensure that the cluster is healthy and functional, and we will be paged about abnormal conditions of the cluster. gridscale does not monitor the application(s) deployed within the cluster. Since we don’t know anything about your workloads, we don’t include performance and resource monitoring from our side as part of the standardised gridscale Managed Kubernetes (GSK).

Do cluster components communicate on the Public or the Private Network?

Cluster communication is strictly private. This includes communication between Kubernetes components as well as between pods and/or services. However, as a user you can contact external services. It would therefore technically be possible, though not usual, to communicate with other services on the cluster through the Public Network and Load Balancers, if that service is exposed to the outside and communication is explicitly directed there through public connection details.

Terms and Abbreviations

  • GSK: gridscale Managed Kubernetes.
  • K8s: Numeronym for Kubernetes (“K”, eight letters, “s”).
  • kubectl: A command line tool which functions as a management interface for a K8s cluster.
  • Node: A K8s cluster is made up of a few virtual machines that talk to each other; in this context, each virtual machine is a node. There is a master (we have one master at the moment) and one or more workers.
  • Control Plane: A fancy way of saying “masters of the cluster”. Technically, all programs that run on the master that make the cluster a cluster. For instance, a specialized database or a program that decides which worker should run which software.
  • Deployment: In most cases an app running on K8s. Technically a collection of containers based on a set of templates (images).
  • PV: Persistent Volume. A persistent storage for Kubernetes deployments.
  • PVC: Persistent Volume Claim. When a client (user, customer, an application) needs a PV, they send a PVC to the K8s cluster.
  • Service: A way of accessing your deployment outside of the cluster, tightly related to Load Balancers and Ingresses.
  • Ingress: A special way of exposing a deployment outside of the cluster. Think of it as a kind of Load Balancer.
  • IngressController: This component runs inside the cluster and is responsible for handling requests for an Ingress.
  • RBAC: Role-Based Access Control. Allows you to selectively give different people different access rights to the cluster.
  • Dashboard: A graphical frontend for the cluster API. The user can see their deployments, nodes and a few metrics without using the command line. This is not enabled by default, but can be easily installed into the GSK.