Kubernetes

gridscale Managed Kubernetes (GSK) is a secure and fully-managed Kubernetes solution. All you need to do is to configure how powerful you wish your cluster to be. We take care of upgrades and OS maintenance.

gridscale Managed Kubernetes (GSK) fully integrates into our products, offering easy configuration, monitoring, release management and security, so you can focus fully on your business applications.

GSK easily integrates with our Load Balancer, Certificates and Storage IaaS for Ingress and persistent volumes respectively.

If you are new to Kubernetes or containers in general, we’d recommend you get familiar with commonly used terminology and go through our line-up of content for you to get started:

  1. Set up a gridscale Kubernetes cluster in 5 minutes
  2. Kubernetes - All about clusters, pods and kubelets
  3. How to: connect gridscale Kubernetes Cluster and PaaS
  4. Release notes

You may also want to take a look at the known issues.

Release Support

The GSK offering supports three stable Kubernetes releases (minor/major versions) at any time.

New Kubernetes releases provided by the community are adopted within 6 months of their initial stable release. This adoption window is used to migrate GSK components, assure stable operations and provide a migration path from previous releases.

Releases other than the latest three are deprecated, not available for new clusters and no longer maintained for existing clusters. You are notified of GSK release deprecation in your Cloud Panel four weeks in advance.

Once its release is deprecated, your cluster is subject to auto-upgrade. With auto-upgrades, correct functioning of your workloads cannot be guaranteed, since Kubernetes releases can introduce breaking changes that require preparation on your side.

Please upgrade your clusters proactively ahead of deprecation.

Release notes:

1.27.5-gs0, 1.26.8-gs0, 1.25.13-gs0, and 1.24.17-gs0 (released: 2023-09-06)

Kubernetes Release Notes for 1.27

Kubernetes Release Notes for 1.26

Kubernetes Release Notes for 1.25

Kubernetes Release Notes for 1.24

Bug fixes:

  • Fix issue causing custom storage classes to be deleted during upgrade process. All cluster upgrades are available again.

1.25.12-gs0 (released: 2023-08-28)

Kubernetes Release Notes for 1.25

Improvements:

1.26.7-gs0 (released: 2023-08-28)

Kubernetes Release Notes for 1.26

Improvements:

1.27.4-gs0 (released: 2023-08-23)

Kubernetes Release Notes for 1.27

Improvements:

1.26.5-gs0 (released: 2023-06-14)

Kubernetes Release Notes for 1.26

Improvements:

  • Support rocket (local) storage to provision local PVCs for workloads that require storage with extreme IOPS. How to use the rocket storage with GSK can be found here.

1.25.8-gs0, 1.24.12-gs0, 1.23.17-gs0, and 1.22.17-gs1 (released: 2023-03-30)

Kubernetes Release Notes for 1.25

Kubernetes Release Notes for 1.24

Kubernetes Release Notes for 1.23

Bug fixes:

Improvements:

  • Includes the latest version of the CSI plugin, which adds the following labels to the storage:
    • SuccessfulAttachVolume: means the CSI plugin attached the storage at least once during its lifetime.
    • VolumeToBeDeleted: means the CSI plugin received a delete action from the provisioner to remove the storage (the PVC got deleted). The protection label will be removed and the customer will be able to delete the storages via the API or the panel if they are not removed automatically by the CSI plugin.

1.25.6-gs0 (released: 2023-02-23)

Kubernetes Release Notes for 1.25

Improvements:

  • Cluster private network IP range can now be configured via parameter k8s_cluster_cidr
    • accepts a private /16 CIDR block, which is then broken down into a /19 node block, a /18 service block and a /17 pod block
  • inotify sysctls have been increased
    • fs.inotify.max_user_instances is now 8192 (was: 128)
    • fs.inotify.max_user_watches is now 524288 (was: 327875)
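
To get a feel for the block sizes behind k8s_cluster_cidr, here is a minimal sketch using Python's standard ipaddress module. The example range 192.168.0.0/16 and the order in which the three blocks are placed inside the /16 are assumptions made for illustration; the release note only states the prefix lengths.

```python
import ipaddress


def split_cluster_cidr(cidr: str) -> dict:
    """Carve a private /16 into a /17, a /18 and a /19 block.

    The placement order used here is an assumption for illustration;
    only the prefix lengths come from the release note.
    """
    network = ipaddress.ip_network(cidr)
    if network.prefixlen != 16 or not network.is_private:
        raise ValueError("expected a private /16 CIDR block")
    # Halve the range step by step: /17 for pods, /18 for services, /19 for nodes.
    pod_block, upper_half = network.subnets(new_prefix=17)
    service_block, remainder = upper_half.subnets(new_prefix=18)
    node_block, _spare = remainder.subnets(new_prefix=19)
    return {"pods": pod_block, "services": service_block, "nodes": node_block}


blocks = split_cluster_cidr("192.168.0.0/16")
for name, block in blocks.items():
    print(f"{name:<9}{block}  {block.num_addresses} addresses")
```

The three blocks together use 7/8 of the /16, so one /19 remains unassigned in this layout.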

1.20.15-gs5, 1.21.14-gs4, 1.22.17-gs0, 1.23.15-gs1, and 1.24.9-gs0 (released: 2023-01-19)

Kubernetes Release Notes for 1.24

Kubernetes Release Notes for 1.23

Improvements:

  • Includes the latest version of CSI-plugin, which provisions PVCs (storages) with 0 Reserved blocks. The CSI-plugin tunes the previously provisioned PVCs (storages) by setting Reserved blocks to 0, so the customer does not need to perform any further action.
  • GSK upgrade from 1.23 to 1.24 is enabled

1.24.8-gs0 (released: 2023-01-04)

Kubernetes Release Notes for 1.24

Breaking change

  • Removal of Dockershim

    Dockershim was officially dropped by Kubernetes in 1.24. This means Kubernetes no longer uses Docker as a container runtime. We have now switched to containerd as our container runtime.

    This might require further action from the customer.

    For most customers this should have no impact. However, we encourage you to check the official Kubernetes docs to see whether you are affected: Check whether dockershim removal affects you

  • The logging has been changed to the standard cri log format

    Earlier versions of GSK used Docker's journald log driver, which stored the logs in /var/log/journal. You can read more about the CRI log format.

    If you are using a log shipper, you have to adjust its configuration in order to retrieve logs after the update.

    Now the logs can be found in /var/log/containers/*.log and /var/log/pods/*/*/*.log.

  • The Docker engine socket /var/run/docker.sock is no longer available

    Docker engine was replaced with containerd. You can find more about Kubernetes Containerd Integration.

Improvements:

  • The nodes use Ubuntu 22.04
  • The nodes use containerd as the container runtime instead of Docker engine, as dockershim was removed from Kubernetes 1.24

1.23.15-gs0 and 1.22.16-gs0 (released: 2022-12-16)

Improvements:

  • The node uses Ubuntu 22.04
  • Scheduling coredns evenly on the worker nodes
  • Includes the latest version of csi-plugin

1.21.14-gs2, 1.20.15-gs3, and 1.19.16-gs3 (released: 2022-11-14)

Bug fixes:

  • Fix the upgrade of the csi-plugin for worker nodes, so the customer can collect metrics of storages (PVs).
  • Fix the missing csi-plugin for the surge node, so the workload is rescheduled onto the surge node only when the csi-plugin is ready to handle PVCs.
  • Fix the scale-in of the surge node after the upgrade is done.

1.21.14-gs1, 1.20.15-gs2, 1.19.16-gs2 (released: 2022-09-12)

Bug fixes:

  • Fix the issue of the surge node upgrade, where the new configuration was not saved for further operations such as scale-out/in.

1.21.14-gs0, 1.20.15-gs1, 1.19.16-gs1 (released: 2022-07-19)

Improvements:

  • Upgrade k8s: v1.21.14, v1.20.15, v1.19.16.
  • Support PVC volume usage metrics
  • Support PVC volume health
  • Support surge upgrades to avoid resource shortage during the upgrade

1.21.11-gs0, 1.20.15-gs0, 1.19.16-gs0

Improvements:

  • Upgrade k8s: v1.21.11, v1.20.15, v1.19.16.
  • Avoid Warning FailedScheduling in pods with PVC.
  • Spread pods across nodes evenly via PodTopologySpread.

Bug fixes:

  • Fix an issue where storage could not be deleted when the storage class has reclaimPolicy: Retain.
  • Fix scale-out/in failing if one of the nodes is down.
  • Fix k8s not recursively changing ownership and permissions of the contents of each volume to match the fsGroup specified in a Pod’s securityContext when that volume is mounted.

GSK Updates and Upgrades

Patch Updates

Patch updates contain either a new Kubernetes patch release or GSK specific changes (such as CSI plugin) or both.

The availability of new patch updates is announced via notifications in your Cloud Panel.

Upon availability, you can update your cluster via the Cloud Panel or the API at a time of your choosing.

To guarantee that your cluster is running the latest stable patch update, unpatched clusters will be auto-updated after 3 weeks of patch availability.

Please consult the upgrade considerations section below.

Release Upgrades

Release upgrades contain a new Kubernetes minor or major release and (optionally) GSK specific changes. Release upgrades are not performed automatically for you.

You can perform release upgrades via the Cloud Panel or the API at a time of your choosing.

Please consult the upgrade considerations section below for compatibility information between Kubernetes releases.

Performing Patch Updates and Release Upgrades via the API

  1. Get your GSK service:
curl 'https://api.gridscale.io/objects/paas/services/<CLUSTER_UUID>' -H 'X-Auth-UserId: <AUTH_USER_UUID>' -H 'X-Auth-Token: <AUTH_TOKEN>'
  2. Get the available service templates:
curl 'https://api.gridscale.io/objects/paas/service_templates' -H 'X-Auth-UserId: <AUTH_USER_UUID>' -H 'X-Auth-Token: <AUTH_TOKEN>'

  3. In the response from Step 2, look up the service template whose UUID matches the current service_template_uuid of your GSK cluster found in Step 1.

  4. Find the target service template in the patch_updates attribute of the template from Step 3.

  5. Initiate the GSK update by patching the service with the UUID from Step 4:

curl 'https://api.gridscale.io/objects/paas/services/<CLUSTER_UUID>' -X PATCH -H 'Content-Type: application/json' -H 'X-Auth-UserId: <AUTH_USER_UUID>' -H 'X-Auth-Token: <AUTH_TOKEN>' --data-raw '{"service_template_uuid":"<PATCH_UPDATE_SERVICE_TEMPLATE_UUID>"}'
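
Steps 3 to 5 can be sketched in Python as follows. This is only an illustration of the lookup logic: the dictionaries below are hypothetical, heavily trimmed stand-ins for the API responses, and the assumption that patch_updates is a list of plain template UUIDs is ours. Check the actual responses from Steps 1 and 2 before relying on this shape.

```python
import json


def build_patch_update_body(service: dict, templates: dict) -> str:
    """Return the JSON body for the PATCH request in Step 5."""
    current_uuid = service["service_template_uuid"]       # from Step 1
    current_template = templates[current_uuid]            # Step 3
    targets = current_template.get("patch_updates", [])   # Step 4
    if not targets:
        raise LookupError("no patch update listed for the current template")
    # Assumption: entries are plain template UUIDs; take the first one listed.
    return json.dumps({"service_template_uuid": targets[0]})


# Hypothetical sample data, not real API output.
service = {"service_template_uuid": "11111111-aaaa-4aaa-8aaa-111111111111"}
templates = {
    "11111111-aaaa-4aaa-8aaa-111111111111": {
        "patch_updates": ["22222222-bbbb-4bbb-8bbb-222222222222"],
    },
    "22222222-bbbb-4bbb-8bbb-222222222222": {"patch_updates": []},
}

body = build_patch_update_body(service, templates)
print(body)
```

The printed string corresponds to the value passed to --data-raw in the PATCH request.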

Effect of Updates and Upgrades on Nodes and Workloads

Nodes are considered volatile in the Kubernetes cluster. During updates, upgrades or node recoveries, nodes are not modified - they are replaced.

The process starts by upgrading the master node. Kubernetes API will experience a short interruption during which you won’t be able to change cluster resources. Existing pods will continue to run uninterrupted. New pods can be scheduled once the master node upgrade has completed.

The next step is upgrading all worker nodes. This is a sequential process, where nodes are upgraded one at a time. To avoid resource shortage, surge upgrades are performed by default.

Worker node upgrades drain workloads of the node before taking it down, to allow your pods to be rescheduled gracefully. In case pod disruption policies prevent your workloads from being drained, the process will continue to ensure cluster integrity. Once the node has been drained, it is replaced and joins the cluster again.

Be sure to configure your workloads with redundancies in place, so that they remain available during an upgrade, if continuous operation is a priority for your workload.

Surge Upgrades

With surge upgrades, resource shortage during upgrades is counteracted by adding worker nodes for the time of the upgrade.

If enabled (default is 1 surge node), the configured number of nodes is added to your cluster before the first node is taken down. They are temporary in nature and are removed once the upgrade has succeeded.

Additional costs are generated during surge node lifetime. You can disable surge upgrades in your Cloud Panel or via the API by setting parameter k8s_surge_node_count to 0.

Note: Surge node count is currently limited to either 0 or 1. Support for counts >1 will be added in the future.

Impact on Node Labels

Node labels are not persisted when nodes are replaced. In case you rely on node labels to control where deployments run in your cluster, please look into Affinity and anti-affinity as the preferred approach.

Considerations for Upgrades

Patch updates (1.19.10 → 1.19.11) are considered safe by the Kubernetes project.

Release upgrades (1.16.x → 1.17.x) can introduce breaking changes. The following sections describe several ways to check whether your workloads (deployments, services, daemonsets, etc.) are still compatible with the Kubernetes release you want to upgrade to.

Official Kubernetes Documentation

You can find the official Kubernetes release notes in the Changelog.

Another helpful resource is the Deprecated API Migration Guide, which lists all API removals by release.

Example:

The extensions/v1beta1 API version of NetworkPolicy is no longer served as of v1.16.

  • Migrate manifests and API clients to use the networking.k8s.io/v1 API version, available since v1.8.
  • All existing persisted objects are accessible via the new API

Deploy and Test Workload on a Temporary Cluster

The easiest way is just to provision a test cluster with the new release.

Deploy your workloads to the test cluster and check if everything is working as expected.

This way you can make sure your workloads are compatible with the Kubernetes release you want to upgrade to, without any impact on live workloads.

Third Party Cluster Linting Tool

There are some third party tools which could make the transition easier.

Pluto is a tool which helps users find deprecated Kubernetes APIs.

In this example we see two files in our directory that have deprecated apiVersions. Deployment extensions/v1beta1 is no longer available and needs to be replaced with apps/v1. This will need to be fixed prior to a 1.16 upgrade:

pluto detect-files -d kubernetes/testdata

NAME        KIND         VERSION              REPLACEMENT   REMOVED   DEPRECATED
utilities   Deployment   extensions/v1beta1   apps/v1       true      true
utilities   Deployment   extensions/v1beta1   apps/v1       true      true

Head over to the Pluto Documentation to read more about in-depth usage.
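
The idea behind such a check can be sketched in a few lines of Python. This is not how Pluto works internally; it is a deliberately simple, regex-based illustration that only looks at top-level apiVersion and kind fields, and its lookup table contains just the two removals mentioned on this page. The sample manifest is hypothetical.

```python
import re

# Removals mentioned on this page only; this is not a complete list.
REMOVED_APIS = {
    ("extensions/v1beta1", "NetworkPolicy"): "networking.k8s.io/v1",
    ("extensions/v1beta1", "Deployment"): "apps/v1",
}


def find_removed_apis(manifest_text: str) -> list:
    """Return (kind, apiVersion, replacement) for each affected YAML document."""
    findings = []
    # Multi-document YAML files separate documents with a line containing '---'.
    for document in re.split(r"^---\s*$", manifest_text, flags=re.MULTILINE):
        api = re.search(r"^apiVersion:\s*[\"']?([^\"'\s]+)", document, re.MULTILINE)
        kind = re.search(r"^kind:\s*[\"']?([^\"'\s]+)", document, re.MULTILINE)
        if not api or not kind:
            continue
        key = (api.group(1), kind.group(1))
        if key in REMOVED_APIS:
            findings.append((kind.group(1), api.group(1), REMOVED_APIS[key]))
    return findings


sample = """apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: utilities
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: up-to-date
"""

findings = find_removed_apis(sample)
for kind, version, replacement in findings:
    print(f"{kind}: {version} should be replaced with {replacement}")
```

For real upgrades, a maintained tool with a complete and current list of removals is the safer choice.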

Connect a Kubernetes Cluster to a PaaS service

Requires 1.19.16-gs0, 1.20.15-gs0, 1.21.11-gs0 or higher.

We recently released support for private networks with IPv4 for PaaS services. This feature allows you to access a PaaS service from a Kubernetes cluster as a Kubernetes service, so your application can access the PaaS service without a proxy. Follow these steps:

  • First, you need a GSK cluster. The worker nodes of the GSK cluster will be connected to a private network with IPv4.
  • Determine the private network that the worker nodes are connected to. The name of the private network always consists of the cluster name and the suffix private. For example, if you have a cluster named my-first-gridscale-k8s, the name of the cluster’s private network is my-first-gridscale-k8s-private.
  • Connect your PaaS service to the cluster’s private network that you looked up in the previous step. For both new and existing services you can do so:
    • via the API, where you specify network_uuid in the create or update request’s payload.
    • via the panel, where you can check the “Relate custom private Network” box during creation of the PaaS service or with the edit-icon in the Connections-pane for existing PaaS services. Then select the corresponding network from the dropdown.
  • Create a Kubernetes service by mapping a hostname to the private IP of the PaaS service
    • Determine the PaaS service private IP from the Service Access: for example a Postgres database with the following Service Access:

      • connection-string format:

        postgres://postgres:XXpasswordXX@10.244.0.43:5432
        
      • connection-parameters format:

        username = postgres
        password = XXpasswordXX
        host = 10.244.0.43
        port = 5432
        
    • Create a Kubernetes service as in this example

      kind: "Service"
      apiVersion: "v1"
      metadata:
        name: "paas-postgres"
      spec:
        ports:
          - name: "paas-postgres"
            protocol: "TCP"
            port: 5432
            targetPort: 5432
      
    • After applying the above YAML manifest, you can get the paas-postgres service as follows

      $ kubectl get services paas-postgres
      NAME            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
      paas-postgres   ClusterIP   10.244.69.82   <none>        5432/TCP   2d17h
      
    • Create a Kubernetes Endpoints object for the Kubernetes service. The IP address should be the one from the service access (connection-string or connection-parameters). In this example, the IP address is 10.244.0.43

      kind: "Endpoints"
      apiVersion: "v1"
      metadata:
        name: "paas-postgres"
      subsets:
        - addresses:
            - ip: "10.244.0.43"
          ports:
            - port: 5432
              name: "paas-postgres"

    • After applying the above YAML manifest, you can get the paas-postgres endpoints as follows

      $ kubectl get endpoints paas-postgres
      NAME            ENDPOINTS          AGE
      paas-postgres   10.244.0.43:5432   2d17h
      
    • Create the secret for database access, using the postgres database name, username, and password.

      $ kubectl create secret generic paas-postgres \
          --from-literal=database=postgres \
          --from-literal=username=postgres \
          --from-literal=password=XXpasswordXX
      
    • Now that the service, endpoints, and secret have been created, the application can access the postgres database as a Kubernetes service. Here is an example of how to configure your application to access the postgres database.

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: my-app
        labels:
          app: my-app
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: my-app
        template:
          metadata:
            labels:
              app: my-app
          spec:
            containers:
            - name: my-app
              image: postgres:12-alpine
              imagePullPolicy: Always
              env:
                - name: DATABASE_HOST
                  value: "paas-postgres"
                - name: DATABASE_NAME
                  valueFrom:
                    secretKeyRef:
                      name: paas-postgres
                      key: database
                - name: DATABASE_USER
                  valueFrom:
                    secretKeyRef:
                      name: paas-postgres
                      key: username
                - name: DATABASE_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: paas-postgres
                      key: password
                - name: POSTGRES_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: paas-postgres
                      key: password
              ports:
              - containerPort: 8080
      
    • Show the pods

      $ kubectl get pods
      NAME                      READY   STATUS    RESTARTS   AGE
      my-app-6559f7f88c-fjqtq   1/1     Running   0          10s
      
    • You can access the database from one of the pods

      $ kubectl exec -it my-app-6559f7f88c-fjqtq -- bash
      
    • Connect, describe and list the database

      bash-5.1# PGPASSWORD=$POSTGRES_PASSWORD psql -U $DATABASE_USER -h $DATABASE_HOST
      psql (12.10, server 13.0 (Debian 13.0-1.pgdg100+1))
      WARNING: psql major version 12, server major version 13.
               Some psql features might not work.
      Type "help" for help.
      
      postgres=# \d
                              List of relations
       Schema |               Name                |   Type   |  Owner
      --------+-----------------------------------+----------+----------
       public | auth_group                        | table    | postgres
       public | auth_group_id_seq                 | sequence | postgres
       public | auth_group_permissions            | table    | postgres
       public | auth_group_permissions_id_seq     | sequence | postgres
       public | auth_permission                   | table    | postgres
       public | auth_permission_id_seq            | sequence | postgres
       public | auth_user                         | table    | postgres
       public | auth_user_groups                  | table    | postgres
       public | auth_user_groups_id_seq           | sequence | postgres
       public | auth_user_id_seq                  | sequence | postgres
      
      postgres=# \l
                                       List of databases
         Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges
      -----------+----------+----------+------------+------------+-----------------------
       postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
       template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
                 |          |          |            |            | postgres=CTc/postgres
       template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
                 |          |          |            |            | postgres=CTc/postgres
      (3 rows)
      
      postgres=#
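
If you prefer to derive the Endpoints manifest from the connection string instead of copying the IP address by hand, a minimal Python sketch could look like this. It only uses the standard library and prints the manifest as JSON, which kubectl accepts in the same way as YAML. The function name endpoints_manifest is our own and purely illustrative.

```python
import json
from urllib.parse import urlsplit


def endpoints_manifest(name: str, connection_string: str) -> dict:
    """Build an Endpoints object pointing at the host and port of a connection string."""
    parts = urlsplit(connection_string)
    if parts.hostname is None or parts.port is None:
        raise ValueError("connection string must contain a host and a port")
    return {
        "kind": "Endpoints",
        "apiVersion": "v1",
        # The name must match the name of the Kubernetes service.
        "metadata": {"name": name},
        "subsets": [
            {
                "addresses": [{"ip": parts.hostname}],
                "ports": [{"port": parts.port, "name": name}],
            }
        ],
    }


manifest = endpoints_manifest(
    "paas-postgres", "postgres://postgres:XXpasswordXX@10.244.0.43:5432"
)
print(json.dumps(manifest, indent=2))
```

The output can be saved to a file and applied with kubectl apply -f, just like the YAML manifest shown in the steps above.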
      

Resource Protection

Resources like servers, storages, networks, IP addresses or load balancers, which make up the cluster, are visible to you via the API or within the Cloud Panel for transparency and billing reasons. They are, however, protected from being altered. This not only makes sure that they are not deleted accidentally, but is also vital to stable cluster operations.

Protected Resources:

  • Master Nodes (server, storage, ips)
  • Worker Nodes (server, storage, ips)
  • Kubernetes network
    • You can still attach your own servers or platform services to it, e.g. to access them from inside your cluster.
  • Storages created by Kubernetes (like Persistent Volumes)
  • LoadBalancers created by Kubernetes (like Ingress-Controllers)

If you want to change your worker config you can still do this in the Kubernetes configuration.

Horizontal Pod Autoscaler (HPA)

In order to use the horizontal pod autoscaler (HPA) you need to install the Metrics Server. You can bring your own or just follow the example.

Install Metrics Server

You can install the Metrics Server via Helm. There is a ready-to-use Metrics Server Helm Chart by Bitnami.

Add the Bitnami Metrics Server repository to your Helm installation:

helm repo add bitnami https://charts.bitnami.com/bitnami

Create a values.yaml with this content to configure your Metrics Server:

apiService:
  create: true
extraArgs:
  - --kubelet-insecure-tls=true
  - --kubelet-preferred-address-types=InternalIP

Install the Metrics Server Helm Chart:

helm install metrics-server bitnami/metrics-server -f values.yaml

Wait for the Metrics Server to be ready. It might take a minute or two before the first metrics are collected.

Run HPA

In order to run the HPA you need to create a deployment and generate some load against it.

Keep in mind that you must define resource limits and requests in order to use the HPA. The service only exists to give the load generator access to the deployment.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  replicas: 1
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache

Save the example deployment and service as php-apache.yaml and deploy them with:

kubectl apply -f php-apache.yaml

Create the HPA for the deployment:

kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10

Check the current status of the HPA:

kubectl get hpa

This should look like this:

NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   0%/50%    1         10        1          2d22h

Generate Test Load

Now you create an infinite loop which will generate a load:

kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"

Open a second terminal and check the HPA status:

kubectl get hpa -w

After some time you should see the pods scale:

NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   0%/50%    1         10        1          2d22h
php-apache   Deployment/php-apache   91%/50%   1         10        1          2d22h
php-apache   Deployment/php-apache   91%/50%   1         10        2          2d22h
php-apache   Deployment/php-apache   253%/50%  1         10        2          2d22h
php-apache   Deployment/php-apache   253%/50%  1         10        4          2d22h
php-apache   Deployment/php-apache   253%/50%  1         10        6          2d22h
php-apache   Deployment/php-apache   101%/50%  1         10        6          2d22h
php-apache   Deployment/php-apache   71%/50%   1         10        6          2d22h
php-apache   Deployment/php-apache   71%/50%   1         10        9          2d22h
php-apache   Deployment/php-apache   51%/50%   1         10        9          2d22h
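
The replica counts follow the scaling rule described in the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), bounded by the configured minimum and maximum. The sketch below implements only this basic rule. It deliberately ignores the tolerance, stabilization window and scale-up rate limits that a real cluster applies, which is why the output above climbs in smaller steps (2, 4, 6, 9) than the raw formula would suggest.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, min_replicas: int,
                     max_replicas: int) -> int:
    """Basic HPA rule: scale proportionally to the metric, then clamp to the bounds."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


# 1 replica at 91% CPU with a 50% target
print(desired_replicas(1, 91, 50, 1, 10))
# 2 replicas at 253% CPU: the raw result exceeds the maximum and is clamped
print(desired_replicas(2, 253, 50, 1, 10))
```

Once the load stops and utilization drops towards 0%, the same rule brings the deployment back down to the configured minimum.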

You can also check the deployment itself:

kubectl get deployment php-apache

NAME         READY   UP-TO-DATE   AVAILABLE   AGE
php-apache   9/9     9            9           2d22h

Stop Load and Clean Up

In order to stop the load, hit CTRL+C in the terminal where you started the load generator.

You can verify the scale down with the commands from above:

kubectl get deployment php-apache -w

Delete the example deployment and service:

kubectl delete -f php-apache.yaml

Vertical Scaling

GSK supports vertical scaling, which can be enabled by simply editing the worker node configuration of your Kubernetes cluster in the Cloud Panel or via the API. Scaling the cluster will recycle all nodes sequentially.

The following node resources can be changed:

  • Cores per worker node via parameter k8s_worker_node_cores
  • RAM per worker node via parameter k8s_worker_node_ram
  • Storage per worker node via parameter k8s_worker_node_storage
  • Storage type per worker node via parameter k8s_worker_node_storage_type

You can either change these in your Cloud Panel in the Configuration section, or via API.

To do so via API, you need to patch your cluster’s parameters. Always include all the parameters in the patch, not just the ones you want to change.

For example:

{
  "parameters": {
      "k8s_worker_node_ram": 4,
      "k8s_worker_node_cores": 2,
      "k8s_worker_node_count": 3,
      "k8s_worker_node_storage": 40,
      "k8s_worker_node_storage_type": "storage"
      }
}
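
Because the patch must always contain all parameters, it helps to start from the cluster's current parameters and overlay only the values you want to change. A minimal Python sketch of that merge, using the parameter names from this section and hypothetical current values:

```python
import json


def build_parameters_patch(current_parameters: dict, changes: dict) -> str:
    """Overlay the changes on the full current parameter set and return the JSON body."""
    merged = dict(current_parameters)  # copy, so the input is left untouched
    merged.update(changes)
    return json.dumps({"parameters": merged}, indent=2)


# Hypothetical current configuration of a cluster.
current_parameters = {
    "k8s_worker_node_ram": 4,
    "k8s_worker_node_cores": 2,
    "k8s_worker_node_count": 3,
    "k8s_worker_node_storage": 40,
    "k8s_worker_node_storage_type": "storage",
}

payload = build_parameters_patch(current_parameters, {"k8s_worker_node_ram": 8})
print(payload)
```

The resulting body still lists every parameter, with only k8s_worker_node_ram changed.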

Worker Node Storage Performance Classes

Worker nodes in your cluster use a distributed storage for their operating system. On cluster creation, you choose the performance class for this storage with the parameter k8s_worker_node_storage_type.

The performance class of your worker nodes’ storage is independent of your PersistentVolumes and only affects the OS, kubelet and potential hostPath mounts. A higher performance class can help the node stay responsive when under increased memory pressure.

The performance class of your worker nodes can be changed at any time by editing your cluster. You can do so either in your Cloud Panel in the Configuration section, or via API. Changing the performance class will recycle all nodes sequentially.

To do so via API, you need to patch your cluster’s parameters to update the parameter k8s_worker_node_storage_type. Always include all the parameters in the patch, not just the ones you want to change.

For example:

{
  "parameters": {
      "k8s_worker_node_ram": 4,
      "k8s_worker_node_cores": 2,
      "k8s_worker_node_count": 3,
      "k8s_worker_node_storage": 40,
      "k8s_worker_node_storage_type": "storage"
    }
}

Logging

Container logs can be obtained via kubectl. While this is certainly feasible for ad-hoc debugging of single containers, it doesn’t give you the full picture of your application or even the whole cluster.

It is therefore a common practice to ship logs to a centralized log management platform, where they can be transformed and analyzed in one place - giving you that full picture and the means to act on events or trends.

There are multiple ways to get your logs into the log management platform:

  • Your application can directly implement the format your log management platform accepts the logs in, and send them there.
  • Your application can log to stdout and stderr, leaving it to the container engine to store the logs.

It is good practice to use the latter approach. It decouples the application from runtime environment specifics, is non-blocking for the application, and provides a general way to transfer logs reliably and securely, even when the log management platform is temporarily unavailable.

Log Shipping

While the container engine technically might be able to ship the logs directly to your log management platform, having the container engine store them locally instead and a third-party component read and ship them has proven to be the more reliable and portable solution.

This third-party component is called a log shipper. In general it can run anywhere, has inputs to read logs from locally and outputs to ship logs to remotely. The log shipper is an application agnostic approach - in the sense that it doesn’t need to be integrated into the applications you run on your cluster in any way. It just needs to support the format the logs are stored in as an input and the format the log management platform accepts the logs in as an output.

Accessing Container Logs

GSK 1.24 and higher

The logging format used by the container engine is the CRI logging format.

You can choose any log shipper that supports the CRI logging format.

Logs are stored in /var/log/containers and /var/log/pods.
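
In the CRI logging format, each line consists of a timestamp, the stream name (stdout or stderr), a tag (F for a full line, P for a partial line) and the message, separated by single spaces. The following Python sketch shows how such a line can be taken apart. The sample line is made up for illustration, and a real log shipper additionally reassembles partial lines.

```python
from typing import NamedTuple


class CriLogLine(NamedTuple):
    timestamp: str  # RFC 3339 timestamp written by the container runtime
    stream: str     # "stdout" or "stderr"
    partial: bool   # True if the runtime split a long line (tag "P")
    message: str


def parse_cri_line(line: str) -> CriLogLine:
    """Split one CRI log line into its four fields."""
    parts = line.rstrip("\n").split(" ", 3)
    if len(parts) < 3 or parts[1] not in ("stdout", "stderr"):
        raise ValueError(f"not a CRI log line: {line!r}")
    message = parts[3] if len(parts) == 4 else ""
    # Only the first tag is evaluated here.
    return CriLogLine(parts[0], parts[1], parts[2].split(":")[0] == "P", message)


entry = parse_cri_line(
    "2023-01-04T10:15:30.123456789Z stdout F connection accepted from 10.0.0.7\n"
)
print(entry.stream, entry.message)
```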

GSK 1.23 and lower

The log driver used by the container engine docker on our managed Kubernetes platform is journald.

journald is part of systemd and designed to store logs safely and handle rotation gracefully to prevent node disks from filling up. journald makes it easy for the shipper to reliably transfer logs, since the shipper only needs to keep track of one event stream.

journald stores logs in /var/log/journal. Choose a log shipper that supports journald as an input.

Note: The log shipper needs to keep track of where it left off, so that after a restart or redeployment it does not start from the beginning and transfer all logs again. Since the position is node-specific, a local hostPath mount to store the position in is recommended.
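
The principle of position tracking can be illustrated with a small Python sketch. It uses a plain log file and a byte offset purely for demonstration; journald itself works with cursors rather than byte offsets, and real log shippers ship this logic built in. All file names are hypothetical.

```python
import json
import os
import tempfile


def load_position(position_path: str) -> int:
    """Return the saved byte offset, or 0 if nothing has been saved yet."""
    try:
        with open(position_path) as handle:
            return json.load(handle)["offset"]
    except FileNotFoundError:
        return 0


def save_position(position_path: str, offset: int) -> None:
    """Write the offset to a temporary file and rename it into place atomically."""
    temporary_path = position_path + ".tmp"
    with open(temporary_path, "w") as handle:
        json.dump({"offset": offset}, handle)
    os.replace(temporary_path, position_path)


def read_from(log_path: str, offset: int):
    """Read everything after the given offset; return (lines, new_offset)."""
    with open(log_path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
        return data.decode("utf-8", errors="replace").splitlines(), handle.tell()


with tempfile.TemporaryDirectory() as workdir:
    log_path = os.path.join(workdir, "app.log")
    position_path = os.path.join(workdir, "position.json")

    with open(log_path, "w") as handle:
        handle.write("first event\n")
    first_batch, offset = read_from(log_path, load_position(position_path))
    save_position(position_path, offset)  # save only after the batch was shipped

    with open(log_path, "a") as handle:
        handle.write("second event\n")
    # Simulates a restart: the position is loaded from disk again.
    second_batch, offset = read_from(log_path, load_position(position_path))
    save_position(position_path, offset)

print(first_batch, second_batch)
```

On a cluster, the position file would live on the hostPath mount mentioned in the note above, so it survives a redeployment of the shipper on the same node.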

Load Balancing

Applying a service of type LoadBalancer will provision a gridscale Load Balancer. Below are some helpful tips on integrating with our Load Balancer as a Service (LBaaS):

IP Address Forwarding

The Load Balancer needs to be set to HTTP mode. The client’s IP address is then available in the X-Forwarded-For HTTP header.

Note: In HTTP mode, HTTPS termination happens at the Load Balancer level. For HTTP mode only, certificates are obtained via Let’s Encrypt, or you can upload your own custom certificate.

Configuring Load Balancer Modes

The cloud controller manager (CCM) uses service annotations to configure the LBaaS for a GSK cluster. If the annotation for a specific parameter is not set, the default value for that parameter is used. This feature is supported as of GSK versions 1.16.15-gs2, 1.17.14-gs1, 1.18.12-gs1, and 1.19.4-gs1.

Annotation                                                            Default value
service.beta.kubernetes.io/gs-loadbalancer-mode                       tcp
service.beta.kubernetes.io/gs-loadbalancer-redirect-http-to-https     "false"
service.beta.kubernetes.io/gs-loadbalancer-ssl-domains                nil
service.beta.kubernetes.io/gs-loadbalancer-algorithm                  leastconn
service.beta.kubernetes.io/gs-loadbalancer-https-ports                443
service.beta.kubernetes.io/gs-loadbalancer-custom-certificate-uuids   nil

Examples

  • The following annotations configure the LBaaS with HTTP mode, the Round Robin algorithm, HTTP-to-HTTPS redirection, and multiple SSL domains. The service.beta.kubernetes.io/gs-loadbalancer-ssl-domains annotation allows you to add multiple SSL domains to the load balancer, separated by commas.
annotations:
    service.beta.kubernetes.io/gs-loadbalancer-mode: http
    service.beta.kubernetes.io/gs-loadbalancer-redirect-http-to-https: "true"
    service.beta.kubernetes.io/gs-loadbalancer-ssl-domains: demo1.test.com,demo2.test.com
    service.beta.kubernetes.io/gs-loadbalancer-algorithm: roundrobin
  • The following annotations configure the LBaaS with HTTP mode, the round-robin algorithm, HTTP-to-HTTPS redirection, the non-standard SSL port 4443, and a custom certificate. The service.beta.kubernetes.io/gs-loadbalancer-custom-certificate-uuids annotation attaches already uploaded custom certificates to the Load Balancer and takes a comma-separated list of certificate UUIDs. First upload the custom certificate via the panel or API. Then use the UUID of the uploaded certificate, for example c8b786e7-53ee-427b-8ff6-498f59f58b14, as the value of the annotation.
annotations:
    service.beta.kubernetes.io/gs-loadbalancer-mode: http
    service.beta.kubernetes.io/gs-loadbalancer-redirect-http-to-https: "true"
    service.beta.kubernetes.io/gs-loadbalancer-custom-certificate-uuids: c8b786e7-53ee-427b-8ff6-498f59f58b14
    service.beta.kubernetes.io/gs-loadbalancer-algorithm: roundrobin
    service.beta.kubernetes.io/gs-loadbalancer-https-ports: "4443"
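
For context, the annotations above belong in the metadata of a Service of type LoadBalancer. The following is a minimal sketch of such a Service; the name my-app, the selector, and the backend port 8080 are assumptions for illustration, and it assumes the application speaks plain HTTP because HTTPS is terminated at the Load Balancer in HTTP mode.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app                       # hypothetical Service name
  annotations:
    service.beta.kubernetes.io/gs-loadbalancer-mode: http
    service.beta.kubernetes.io/gs-loadbalancer-redirect-http-to-https: "true"
    service.beta.kubernetes.io/gs-loadbalancer-ssl-domains: demo1.test.com
spec:
  type: LoadBalancer                 # triggers provisioning of a gridscale Load Balancer
  selector:
    app: my-app                      # hypothetical pod label
  ports:
    - name: http
      port: 80
      targetPort: 8080               # assumed plain-HTTP port of the application
    - name: https
      port: 443                      # matches the default gs-loadbalancer-https-ports value
      targetPort: 8080
```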

Adding Annotations to an Existing Ingress

You can customize the Load Balancer of an existing Ingress by annotating the Service that exposes it:

kubectl annotate --overwrite svc <INGRESS_NAME> \
"service.beta.kubernetes.io/gs-loadbalancer-mode=http" \
"service.beta.kubernetes.io/gs-loadbalancer-algorithm=roundrobin"

Networking

GSK uses Flannel as its network overlay out of the box. This cannot currently be changed.

Network Policies

Because Flannel is used as the network overlay, GSK clusters do not support Network Policies.

Persistent Volumes

We differentiate between Persistent Volumes that are based on block devices and those that are based on network filesystems.

Block Device Persistent Volumes

Block device based Persistent Volumes use distributed storages that are directly attached to your GSK nodes.

Since they are block devices with plain, non-clustered filesystems (ext4 by default), they can only be attached to a single node at a time and thus can only be used by pods that run on that node. This corresponds to the ReadWriteOnce (RWO) access mode.

Their strength is performance.

Storage Classes

Block device based Persistent Volumes give you the raw performance of the Distributed Storage. You can find a storage class for each of its performance classes.

NAME                      PROVISIONER           RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
block-storage (default)   bs.csi.gridscale.io   Delete          Immediate           true                   68d
block-storage-high        bs.csi.gridscale.io   Delete          Immediate           true                   68d
block-storage-insane      bs.csi.gridscale.io   Delete          Immediate           true                   68d
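
A claim against one of these storage classes might look like the following sketch. The claim name and the requested size are placeholders; omitting storageClassName would select the default class, block-storage.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-block-pvc                 # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce                  # block device volumes attach to a single node at a time
  storageClassName: block-storage-high
  resources:
    requests:
      storage: 10Gi                  # placeholder size
```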

Reclaim Policy

Reclaim policy Delete makes sure that deleting Persistent Volumes (PV) will also delete the corresponding Distributed Storage.

Deleting or changing the preconfigured storage classes to modify this behaviour is not recommended, because your changes are reverted with every upgrade.

Instead, create your own storage classes that use the same provisioner.
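
As a sketch, a custom storage class that keeps the Distributed Storage when the Persistent Volume is deleted could look as follows. The class name is hypothetical. The preconfigured classes may define a parameters section that is not shown in this document; inspect one with kubectl get storageclass block-storage -o yaml and copy any parameters you need.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: block-storage-retain         # hypothetical name, must differ from the preconfigured classes
provisioner: bs.csi.gridscale.io     # same provisioner as the preconfigured classes
reclaimPolicy: Retain                # keep the underlying storage when the PV is deleted
volumeBindingMode: Immediate
allowVolumeExpansion: true
# parameters: copy from a preconfigured class if it defines any (assumption, not shown here)
```

With Retain, cleaning up the retained storage becomes your responsibility.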

Limitations

Block device based Persistent Volumes are subject to Distributed Storage and Server limitations. Currently, up to 15 storages (and therefore Persistent Volumes) can be attached to a single GSK node at a time. Attaching takes a few seconds per storage/PV.

Network Filesystem Persistent Volumes via GridFs

Requires 1.19.16-gs0, 1.20.15-gs0, 1.21.11-gs0 or higher.

Network Filesystem based Persistent Volumes use GridFs to store data. GridFs is an NFS-compatible network filesystem. It grows with your data, you only pay for the volume you actually use, and your data can be accessed read-write by any number of GSK nodes at a time. This corresponds to the ReadWriteMany (RWX) and ReadOnlyMany (ROX) access modes.

Its strengths are scalability and being read-write accessible from all your GSK nodes.

Set up GridFs based Persistent Volumes

Because GridFs is NFS-compatible, access is achieved through the NFS CSI driver for Kubernetes.

  1. Create a new GridFs instance or use an existing one.
  2. Follow the first three steps of Connect a Kubernetes Cluster to a PaaS service to make sure your GridFs is connected to your GSK cluster.
  3. Install the NFS CSI driver for Kubernetes as described here.
helm repo add csi-driver-nfs https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
helm install csi-driver-nfs csi-driver-nfs/csi-driver-nfs --namespace kube-system --version v4.0.0
  4. Create a storage class that uses the NFS CSI driver as the provisioner and your GridFs as the NFS server.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gridfs-<PAAS_SERVICE_UUID OF YOUR GRIDFS>
provisioner: nfs.csi.k8s.io
parameters:
  server: <IP ADDRESS OF YOUR GRIDFS>
  share: /
reclaimPolicy: Delete
volumeBindingMode: Immediate
mountOptions:
  - nfsvers=4.1
  5. Use that storage class for your PVCs.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-first-gridfs-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: gridfs-<PAAS_SERVICE_UUID OF YOUR GRIDFS>
  6. The NFS CSI driver creates a directory for this PVC under the share path configured in the storage class and makes it available as a new PersistentVolume.
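
To illustrate how a workload would consume the claim from the steps above, here is a minimal sketch of a pod that mounts my-first-gridfs-pvc. The pod name, image, and mount path are assumptions chosen for the example.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gridfs-demo                  # hypothetical pod name
spec:
  containers:
    - name: app
      image: busybox:1.36            # any image works; busybox keeps the example small
      # Write a file to the shared volume, then keep the pod running
      command: ["sh", "-c", "echo hello > /data/hello.txt && sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data           # assumed mount path inside the container
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: my-first-gridfs-pvc
```

Because the claim uses ReadWriteMany, further pods on other nodes can mount the same claim in the same way.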

Limitations

A single GridFs instance can hold any number of PVCs for Network Filesystem based Persistent Volumes.

Host Path Persistent Volumes

Aside from block device based and network filesystem based Persistent Volumes, hostPath Persistent Volumes can be used for node-local storage.

Please note:

  • Due to the transient nature of the Kubernetes nodes, hostPath Persistent Volumes are lost whenever the node is recycled, for example during updates, upgrades, or node recovery.
  • Use of hostPath Persistent Volumes can fill up node-local storage and affect health of the node.
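
For reference, a hostPath Persistent Volume is declared roughly as in the sketch below. The name, the path /mnt/scratch, the size, and the storage class name manual are placeholders; given the caveats above, treat such a volume as scratch space only.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: node-local-scratch           # hypothetical name
spec:
  capacity:
    storage: 1Gi                     # informational; hostPath does not enforce this size
  accessModes:
    - ReadWriteOnce
  storageClassName: manual           # hypothetical class name used to match a PVC
  hostPath:
    path: /mnt/scratch               # assumed directory on the node
    type: DirectoryOrCreate
```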

Persistent Volumes are not automatically deleted

A PersistentVolume is created automatically when a PersistentVolumeClaim is requested. However, it is not deleted automatically when you delete the GSK cluster. This behaviour protects your persistent volumes against data loss.

There are two ways to delete the persistent volumes:

  1. After deleting the cluster, it’s also possible to delete the persistent volumes from the Cloud Panel.
  2. Before deleting the cluster, you should delete the related deployments that use the PersistentVolume and the PersistentVolumeClaim from the cluster.

Ingress Controller

Your cluster does not come with an ingress controller preinstalled. You can install the ingress controller of your choice as described in ingress-controllers.

Access and Security

All users with write access (or higher) to the project are able to download the Kubernetes certificate.

PKI Certificate Access

Authentication against the Kubernetes master is based on X.509 client certificates, which you can generate and which expire after three days. They can be used with gscloud, which automatically renews the certificate for you.

After installation of gscloud, set it up with your API token as described here. Then use gscloud to fetch and maintain your kubeconfig as described here.

Encryption

Data is encrypted at rest, and network traffic is TLS encrypted on the application layer.

Role-based Access Control (RBAC)

GSK supports standard Kubernetes RBAC.
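
As a sketch of standard Kubernetes RBAC, the following Role grants read-only access to pods in one namespace, and the RoleBinding assigns it to a user. The namespace, the names, and the user jane are hypothetical; how user names map to your cluster credentials is not covered here.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: pod-reader                   # hypothetical role name
rules:
  - apiGroups: [""]                  # "" refers to the core API group
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: default
  name: read-pods                    # hypothetical binding name
subjects:
  - kind: User
    name: jane                       # hypothetical user
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```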

Firewall

GSK control plane and worker nodes use the OS-level firewall to shield cluster internals from the public network.

This does not restrict you from exposing your workloads to the public network.

Backups

Data that belongs to the control plane of the cluster (such as etcd) is backed up by gridscale.

Data that comes from within the application needs to be backed up by the user. gridscale Storage snapshots and backups are not supported by GSK at this point. They cannot be used for backing up persisted data.

Please employ a solution that runs in the cluster.

Node Pools

Currently, we only support one node pool.

Kubernetes Dashboard

The official Kubernetes Dashboard is not deployed by default. It can be installed with a single command, as described in the official Kubernetes documentation.

Known issues

Storage instances are not deleted from gridscale panel

To prevent this issue, please do NOT delete the PVs (Persistent Volumes) before the storage instances are deleted completely from the panel. If you already have some storage instances dangling in the panel, please contact us to remove them.

Cannot delete k8s cluster when there are other PaaS/servers connected to the cluster’s private network

The issue can be solved by either attaching the PaaS/servers to other networks or removing the PaaS/servers.

Node labels do not persist

Nodes in a Kubernetes cluster are volatile and can be replaced at any time, for example during updates, upgrades, or node recovery. When they are, replacement nodes do not inherit their labels.

If you control scheduling of your pods with nodeSelector and node labels, please consider migrating to Affinity and anti-affinity.
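
One way to avoid custom node labels is pod anti-affinity, which relies only on pod labels and the built-in kubernetes.io/hostname node label. The sketch below spreads two replicas across different nodes; the Deployment name, the app label, and the image are assumptions for illustration.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                          # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        podAntiAffinity:
          # Never schedule two pods labelled app=web onto the same node
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: web
              topologyKey: kubernetes.io/hostname   # well-known label set on every node
      containers:
        - name: web
          image: nginx:1.25          # placeholder image
```

With the required rule, the cluster needs at least as many schedulable nodes as replicas; otherwise the surplus pods stay pending.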

FAQ

Does gridscale monitor the cluster?

We monitor the overall health of the cluster. We ensure that the cluster is healthy and functional, and we are paged about abnormal cluster conditions.

gridscale does not monitor the application(s) that are deployed within the cluster. Since we don’t know anything about your workloads, we don’t include performance and resource monitoring from our side as part of the standardised gridscale Managed Kubernetes (GSK).

Do cluster components communicate on the Public or the Private Network?

Cluster communication is strictly private. This includes communication between Kubernetes components, but also communication between pods and/or services.

However, as a user you can contact external services.

It is therefore technically possible, though unusual, to communicate with another service in the cluster through the Public Network and Load Balancers, provided that service is exposed to the outside and communication is explicitly directed to its public connection details.

A specific tool that I want to use with my cluster is not working. What shall I do?

Please check whether your tool supports the Kubernetes version of your cluster. If it does not, have a look in the Cloud Panel, where you can update your cluster to a new patch version (e.g. 1.24.8 to 1.24.9) or replace your cluster with a more up-to-date one.

I cannot see PVC usage in Grafana. What shall I do?

Please ensure that the volume has been mounted long enough and that the query interval in Grafana is short enough to catch all metrics.

Terms and Abbreviations

  • GSK: gridscale Managed Kubernetes.
  • K8s: Kubernetes (K-ubernete-s; the 8 stands for the eight letters between K and s).
  • kubectl: A command line tool which functions as a management interface for a K8s cluster.
  • Node: A K8s cluster is made of a few virtual machines that talk to each other. In this context, a virtual machine is a node. A cluster consists of a master (currently one) and one or more workers.
  • Control Plane: A fancy way of saying “masters of the cluster”. Technically, all programs that run on the master that make the cluster a cluster. For instance, a specialized database or a program that decides which worker should run which software.
  • Deployment: In most cases an app running on K8s. Technically a collection of containers based on a set of templates (images).
  • PV: Persistent Volume. A persistent storage for Kubernetes deployments.
  • PVC: Persistent Volume Claim. When a client (user, customer, an application) needs a PV, they send a PVC to the K8s cluster.
  • Service: A way of accessing your deployment from outside the cluster, closely related to Load Balancers and Ingresses.
  • Ingress: A special way of exposing a deployment outside the cluster. Think of it as a kind of Load Balancer.
  • IngressController: This component runs inside the cluster and is responsible for handling requests for an Ingress.
  • RBAC: Role-Based Access Control. Allows you to selectively give different people different access rights to the cluster.
  • Dashboard: A graphical frontend for the cluster API. The user can see their deployments, nodes and a few metrics without using the command line. This is not enabled by default, but can be easily installed into the GSK.