Kubernetes
gridscale Managed Kubernetes (GSK) is a secure and fully-managed Kubernetes solution. All you need to do is to configure how powerful you wish your cluster to be. We take care of upgrades and OS maintenance.
gridscale Managed Kubernetes (GSK) fully integrates into our products, offering easy configuration, monitoring, release management and security, so you can focus on your business applications.
GSK easily integrates with our Load Balancer, Certificates and Storage IaaS for Ingress and persistent volumes respectively.
If you are new to Kubernetes or containers in general, we’d recommend you get familiar with commonly used terminology and go through our line-up of content for you to get started:
- Set up a gridscale Kubernetes cluster in 5 minutes
- Kubernetes - All about clusters, pods and kubelets
- How to: connect gridscale Kubernetes Cluster and PaaS
- Release notes
You also may want to take a look at known issues.
Release Support
The GSK offering supports three stable Kubernetes releases (minor/major versions) at any time.
New Kubernetes releases provided by the community are adopted within 6 months of their initial stable release. This adoption window is used to migrate GSK components, assure stable operations and provide a migration path from previous releases.
Releases other than the latest three are deprecated, not available for new clusters and no longer maintained for existing clusters. You are notified of GSK release deprecation in your Cloud Panel four weeks in advance.
Once deprecated, your cluster is subject to auto-upgrade. With auto-upgrades, correct functioning of your workloads cannot be guaranteed, since Kubernetes releases do introduce breaking changes and preparations on your side should be made.
Please upgrade your clusters proactively ahead of deprecation.
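The support window described above can be sketched in a few lines of Python. This is only an illustration: the function name and the sample version list are hypothetical, and the list says nothing about which releases are currently offered.

```python
def supported_minor_releases(available_versions, window=3):
    """Return the newest `window` minor releases, newest first."""
    minors = set()
    for version in available_versions:
        # "1.27.5-gs0" -> ("1", "27"); the patch and -gs suffix are ignored.
        major, minor = version.split(".")[:2]
        minors.add((int(major), int(minor)))
    newest = sorted(minors, reverse=True)[:window]
    return [f"{major}.{minor}" for major, minor in newest]


# Sample input only (hypothetical list of GSK versions).
sample = ["1.24.17-gs0", "1.25.13-gs0", "1.26.8-gs0", "1.27.5-gs0", "1.27.4-gs0"]
print(supported_minor_releases(sample))
```

Any cluster whose minor release is not in the returned list would fall under the deprecation and auto-upgrade rules above.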
Release notes:
1.27.5-gs0, 1.26.8-gs0, 1.25.13-gs0, and 1.24.17-gs0 (released: 2023-09-06)
Kubernetes Release Notes for 1.27
Kubernetes Release Notes for 1.26
Kubernetes Release Notes for 1.25
Kubernetes Release Notes for 1.24
Bug fixes:
- Fix issue causing custom storage classes to be deleted during upgrade process. All cluster upgrades are available again.
1.25.12-gs0 (released: 2023-08-28)
Kubernetes Release Notes for 1.25
Improvements:
- Container storage interface (CSI) plugin supports G and Gi as a storage size unit.
- Cloud controller manager (CCM) supports ProxyProtocol. You can check the proxy protocol for the load balancer provisioned in the GSK cluster.
1.26.7-gs0 (released: 2023-08-28)
Kubernetes Release Notes for 1.26
Improvements:
- Container storage interface (CSI) plugin supports G and Gi as a storage size unit.
- Cloud controller manager (CCM) supports ProxyProtocol. You can check the proxy protocol for the load balancer provisioned in the GSK cluster.
1.27.4-gs0 (released: 2023-08-23)
Kubernetes Release Notes for 1.27
Improvements:
- Container storage interface (CSI) plugin supports G and Gi as a storage size unit.
- Cloud controller manager (CCM) supports ProxyProtocol. You can check the proxy protocol for the load balancer provisioned in the GSK cluster.
1.26.5-gs0 (released: 2023-06-14)
Kubernetes Release Notes for 1.26
Improvements:
- Support rocket (local) storage to provision local PVCs for workloads that require storage with extreme IOPS. How to use the rocket storage with GSK can be found here.
1.25.8-gs0, 1.24.12-gs0, 1.23.17-gs0, and 1.22.17-gs1 (released: 2023-03-30)
Kubernetes Release Notes for 1.25
Kubernetes Release Notes for 1.24
Kubernetes Release Notes for 1.23
Bug fixes:
- Fix issue where pods sometimes could not reach other pods, by replacing the old flannel v0.20.2 deployment with a new flannel v0.20.2 deployment.
Improvements:
- Includes the latest version of CSI-plugin, which adds the following labels to the storage:
- SuccessfulAttachVolume: means the CSI-plugin attached the storage at least once during its lifetime.
- VolumeToBeDeleted: means the CSI-plugin received a delete action from the provisioner to remove the storage (the PVC got deleted). The protection label will be removed and the customer will be able to delete the storages via the API or the panel if they are not removed automatically by the CSI plugin.
1.25.6-gs0 (released 2023-02-23)
Kubernetes Release Notes for 1.25
Improvements:
- Cluster private network IP range can now be configured via parameter k8s_cluster_cidr
  - accepts a private /16 CIDR block, which is then broken down into a /19 node block, a /18 service block and a /17 pod block
- inotify sysctls have been increased
  - fs.inotify.max_user_instances is now 8192 (was: 128)
  - fs.inotify.max_user_watches is now 524288 (was: 327875)
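To illustrate how a /16 can be broken down into a /19, a /18 and a /17 block, here is a minimal Python sketch using the standard ipaddress module. The block sizes come from the release note above; where each block is placed inside the /16 is an assumption made for illustration, and the function name is hypothetical.

```python
import ipaddress


def split_cluster_cidr(cidr):
    """Split a private /16 into node (/19), service (/18) and pod (/17) blocks."""
    network = ipaddress.ip_network(cidr)
    if network.prefixlen != 16 or not network.is_private:
        raise ValueError("expected a private /16 CIDR block")
    # Placement is assumed, not documented: first /19, second /18, second /17.
    node_block = next(network.subnets(new_prefix=19))
    service_block = list(network.subnets(new_prefix=18))[1]
    pod_block = list(network.subnets(new_prefix=17))[1]
    return {"node": node_block, "service": service_block, "pod": pod_block}


blocks = split_cluster_cidr("10.244.0.0/16")
for name, block in blocks.items():
    print(f"{name}: {block} ({block.num_addresses} addresses)")
```

With this layout the three blocks do not overlap, and one /19 plus one /18 of the /16 remain unused.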
1.20.15-gs5, 1.21.14-gs4, 1.22.17-gs0, 1.23.15-gs1, and 1.24.9-gs0 (released: 2023-01-19)
Kubernetes Release Notes for 1.24
Kubernetes Release Notes for 1.23
Improvements:
- Includes the latest version of CSI-plugin, which provisions PVCs (storages) with 0 Reserved blocks. The CSI-plugin tunes the previously provisioned PVCs (storages) by setting Reserved blocks to 0, so the customer does not need to perform any further action.
- GSK upgrade from 1.23 to 1.24 is enabled
1.24.8-gs0 (released: 2023-01-04)
Kubernetes Release Notes for 1.24
Breaking change
Removal of Dockershim
Dockershim is officially dropped by Kubernetes in 1.24. This means Kubernetes no longer uses Docker as a container runtime. We have now switched to containerd as our container runtime.
This might require further action from the customer.
For most customers this should have no impact. However, we encourage you to check whether you are affected in the official K8s docs: Check whether dockershim removal affects you
The logging has been changed to the standard CRI log format
The earlier versions of GSK used the journald log driver by Docker. In the earlier versions, journald stored the logs in /var/log/journal. You can find more about the CRI log format.
If you are using a log shipper, you have to adjust the log shipper's config in order to retrieve logs after the update.
Now the logs can be found in /var/log/containers/*.log and /var/log/pods/*/*/*.log.
No more access to the Docker engine socket /var/run/docker.sock
Docker engine was replaced with containerd. You can find more about Kubernetes Containerd Integration.
Improvements:
- The nodes use Ubuntu 22.04
- The nodes use containerd as the container runtime instead of Docker engine, as dockershim was removed from Kubernetes 1.24
1.23.15-gs0 and 1.22.16-gs0 (released: 2022-12-16)
Improvements:
- The node uses Ubuntu 22.04
- Scheduling coredns evenly on the worker nodes
- Includes the latest version of csi-plugin
1.21.14-gs2, 1.20.15-gs3, and 1.19.16-gs3 (released: 2022-11-14)
Bug fixes:
- Fix the upgrade of csi-plugin for worker nodes, so the customer can collect metrics of storages (PVs).
- Fix the missing csi-plugin for the surge node, so the workload will be re-scheduled onto the surge node when the csi-plugin is ready to handle PVCs
- Fix the scale-in of the surge node after the upgrade is done
1.21.14-gs1, 1.20.15-gs2, 1.19.16-gs2 (released: 2022-09-12)
Bug fixes:
- Fix the issue of the surge node upgrade, where the new configuration was not saved for further operations such as scale-out/in.
1.21.14-gs0, 1.20.15-gs1, 1.19.16-gs1 (released: 2022-07-19)
Improvements:
- Upgrade k8s: v1.21.14, v1.20.15, v1.19.16.
- Support PVC volume usage metrics
- Support PVC volume health
- Support surge upgrades to avoid resource shortage during the upgrade
1.21.11-gs0, 1.20.15-gs0, 1.19.16-gs0
Improvements:
- Upgrade k8s: v1.21.11, v1.20.15, v1.19.16.
- Avoid Warning FailedScheduling in pods with PVC.
- Spread pods across nodes evenly via PodTopologySpread.
Bug fixes:
- Fix issue where storage could not be deleted when the storageclass has reclaimPolicy: Retain.
- Fix scale out/in failing if one of the nodes is down.
- Fix k8s not recursively changing ownership and permissions for the contents of each volume to match the fsGroup specified in a Pod’s securityContext when that volume is mounted.
GSK Updates and Upgrades
Patch Updates
Patch updates contain either a new Kubernetes patch release or GSK specific changes (such as CSI plugin) or both.
Availability of new patch updates is announced via notifications in your Cloud Panel.
Upon availability, you can update your cluster via the Cloud Panel or the API at a time of your choosing.
To guarantee that your cluster is running the latest stable patch update, unpatched clusters will be auto-updated after 3 weeks of patch availability.
Please consult the upgrade considerations section below.
Release Upgrades
Release upgrades contain a new Kubernetes minor or major release and (optionally) GSK specific changes. Release upgrades are not performed automatically for you.
You can perform release upgrades via the Cloud Panel or the API at a time of your choosing.
Please consult the upgrade considerations section below for compatibility information between Kubernetes releases.
Performing Patch Updates and Release Upgrades via the API
1. Get your GSK service:
curl 'https://api.gridscale.io/objects/paas/services/<CLUSTER_UUID>' -H 'X-Auth-UserId: <AUTH_USER_UUID>' -H 'X-Auth-Token: <AUTH_TOKEN>'
2. Get the available Service Templates:
curl 'https://api.gridscale.io/objects/paas/service_templates' -H 'X-Auth-UserId: <AUTH_USER_UUID>' -H 'X-Auth-Token: <AUTH_TOKEN>'
3. Take the current service_template_uuid from Step 2, which corresponds to your GSK cluster found in Step 1.
4. Find the target service template from the patch_updates attribute from Step 3.
5. Initiate the GSK update via a service PATCH using the UUID from Step 4:
curl 'https://api.gridscale.io/objects/paas/services/<CLUSTER_UUID>' -X PATCH -H 'Content-Type: application/json' -H 'X-Auth-UserId: <AUTH_USER_UUID>' -H 'X-Auth-Token: <AUTH_TOKEN>' --data-raw '{"service_template_uuid":"<PATCH_UPDATE_SERVICE_TEMPLATE_UUID>"}'
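If you prefer to script the final PATCH call, the following Python sketch builds the same request as the curl command above using only the standard library. The URL, headers and payload key are taken from that command; the function name and the placeholder UUIDs are hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request

API_BASE = "https://api.gridscale.io/objects/paas/services"


def build_update_request(cluster_uuid, target_template_uuid, user_uuid, token):
    """Build (but do not send) the PATCH request that starts a GSK update."""
    payload = json.dumps({"service_template_uuid": target_template_uuid})
    return urllib.request.Request(
        url=f"{API_BASE}/{cluster_uuid}",
        data=payload.encode("utf-8"),
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "X-Auth-UserId": user_uuid,
            "X-Auth-Token": token,
        },
    )


# Placeholder values for illustration only.
request = build_update_request(
    cluster_uuid="00000000-0000-0000-0000-000000000001",
    target_template_uuid="00000000-0000-0000-0000-000000000002",
    user_uuid="00000000-0000-0000-0000-000000000003",
    token="example-token",
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes it easy to inspect the payload before triggering an update on a live cluster.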
Effect of Updates and Upgrades on Nodes and Workloads
Nodes are considered volatile in the Kubernetes cluster. During updates, upgrades or node recoveries, nodes are not modified - they are replaced.
The process starts by upgrading the master node. Kubernetes API will experience a short interruption during which you won’t be able to change cluster resources. Existing pods will continue to run uninterrupted. New pods can be scheduled once the master node upgrade has completed.
The next step is upgrading all worker nodes. This is a sequential process, where nodes are upgraded one at a time. To avoid resource shortage, surge upgrades are performed by default.
Worker node upgrades drain workloads of the node before taking it down, to allow your pods to be rescheduled gracefully. In case pod disruption policies prevent your workloads from being drained, the process will continue to ensure cluster integrity. Once the node has been drained, it is replaced and joins the cluster again.
Be sure to configure your workloads with redundancies in place, so that they remain available during an upgrade, if continuous operation is a priority for your workload.
Surge Upgrades
With surge upgrades, resource shortage during upgrades is counteracted by adding worker nodes for the time of the upgrade.
If enabled (default is 1 surge node), the configured number of nodes is added to your cluster before the first node is taken down. These nodes are temporary and are removed once the upgrade has succeeded.
Additional costs are generated during surge node lifetime. You can disable surge upgrades in your Cloud Panel or via the API by setting parameter k8s_surge_node_count to 0.
Note: Surge node count is currently limited to either 0 or 1. Support for counts >1 will be added in the future.
Impact on Node Labels
Node labels are not persisted when nodes are replaced. In case you rely on node labels to control where deployments run in your cluster, please look into Affinity and anti-affinity as the preferred approach.
Considerations for Upgrades
Patch updates (1.19.10 → 1.19.11) are considered safe by the Kubernetes project.
Release upgrades (1.16.x → 1.17.x) can introduce breaking changes. Read on to find out how to check if your workloads (deployments, services, daemonsets, etc.) are still compatible with the Kubernetes release you want to upgrade to.
Official Kubernetes Documentation
You can find the official Kubernetes release notes in the Changelog.
Another helpful resource is the Deprecated API Migration Guide, which lists all API removals by release.
Example:
The extensions/v1beta1 API version of NetworkPolicy is no longer served as of v1.16.
- Migrate manifests and API clients to use the networking.k8s.io/v1 API version, available since v1.8.
- All existing persisted objects are accessible via the new API
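As a rough illustration of such a check, the Python sketch below scans manifest text for removed API versions. It is deliberately simplistic: the lookup table holds only the two removals mentioned in this document, the function name is hypothetical, and it matches top-level apiVersion and kind lines with regular expressions instead of parsing YAML properly.

```python
import re

# Only the removals named in this document; a real check needs the full guide.
REMOVED_APIS = {
    ("extensions/v1beta1", "NetworkPolicy"): "networking.k8s.io/v1",
    ("extensions/v1beta1", "Deployment"): "apps/v1",
}


def find_removed_apis(manifest_text):
    """Return (kind, removed apiVersion, replacement) for each affected object."""
    findings = []
    # Multi-document YAML files separate objects with a line containing "---".
    for document in re.split(r"^---[ \t]*$", manifest_text, flags=re.MULTILINE):
        api_version = re.search(r"^apiVersion:\s*(\S+)", document, flags=re.MULTILINE)
        kind = re.search(r"^kind:\s*(\S+)", document, flags=re.MULTILINE)
        if not (api_version and kind):
            continue
        key = (api_version.group(1).strip("\"'"), kind.group(1).strip("\"'"))
        if key in REMOVED_APIS:
            findings.append((key[1], key[0], REMOVED_APIS[key]))
    return findings


manifests = """apiVersion: extensions/v1beta1
kind: NetworkPolicy
metadata:
  name: deny-all
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
"""
print(find_removed_apis(manifests))
```

For anything beyond a quick sanity check, a dedicated tool that knows every removal per release is the better choice.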
Deploy and Test Workload on a Temporary Cluster
The easiest way is just to provision a test cluster with the new release.
Deploy your workloads to the test cluster and check if everything is working as expected.
This way you can make sure your workloads are compatible with the kubernetes release you want to upgrade to without impact on live workloads.
Third Party Cluster Linting Tool
There are some third party tools which could make the transition easier.
Pluto is a tool which helps users find deprecated Kubernetes APIs.
In this example we see two files in our directory that have deprecated apiVersions.
Deployment extensions/v1beta1 is no longer available and needs to be replaced with apps/v1.
This will need to be fixed prior to a 1.16 upgrade:
pluto detect-files -d kubernetes/testdata
NAME KIND VERSION REPLACEMENT REMOVED DEPRECATED
utilities Deployment extensions/v1beta1 apps/v1 true true
utilities Deployment extensions/v1beta1 apps/v1 true true
Head over to the Pluto Documentation to read more about in-depth usage.
Connect a Kubernetes Cluster to a PaaS service
Requires 1.19.16-gs0, 1.20.15-gs0, 1.21.11-gs0 or higher.
We recently released support for private networks with IPv4 for PaaS services. This feature allows you to access a PaaS service from a Kubernetes cluster as a Kubernetes service, so your application can access the PaaS service without a proxy. Follow these steps:
- First, you need a GSK cluster. The worker nodes of the GSK cluster will be connected to a private network with IPv4.
- Determine the private network that the worker nodes are connected to. The name of the private network always consists of the cluster name and the suffix private. For example, if you have a cluster named my-first-gridscale-k8s, the name of the cluster’s private network is my-first-gridscale-k8s-private.
- Connect your PaaS service to the cluster’s private network that you looked up in the previous step. For both new and existing services you can do so
  - via the API, where you specify network_uuid in the create or update request’s payload.
  - via the panel, where you can check the “Relate custom private Network” box during creation of the PaaS service or with the edit-icon in the Connections-pane for existing PaaS services. Then select the corresponding network from the dropdown.
- Create a Kubernetes service by mapping a hostname to the PaaS service private IP
Determine the PaaS service private IP from the Service Access: for example a Postgres database with the following Service Access:
connection-string format:
postgres://postgres:XXpasswordXX@10.244.0.43:5432
connection-parameters format:
username = postgres
password = XXpasswordXX
host = 10.244.0.43
port = 5432
Create a Kubernetes service as in this example:
kind: "Service"
apiVersion: "v1"
metadata:
  name: "paas-postgres"
spec:
  ports:
    - name: "paas-postgres"
      protocol: "TCP"
      port: 5432
      targetPort: 5432
After applying the above YAML manifest, you can get the paas-postgres service as follows:
$ kubectl get services paas-postgres
NAME            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
paas-postgres   ClusterIP   10.244.69.82   <none>        5432/TCP   2d17h
Create a Kubernetes Endpoints object for the Kubernetes service. The IP address should be the one from the service access (connection-string or connection-parameters). In this example, the IP address is 10.244.0.43:
kind: "Endpoints"
apiVersion: "v1"
metadata:
  name: "paas-postgres"
subsets:
  - addresses:
      - ip: "10.244.0.43"
    ports:
      - port: 5432
        name: "paas-postgres"
After applying the above YAML manifest, you can get the paas-postgres endpoints as follows:
$ kubectl get endpoints paas-postgres
NAME            ENDPOINTS          AGE
paas-postgres   10.244.0.43:5432   2d17h
Create the secret for database access, using the postgres database, username, and password:
$ kubectl create secret generic paas-postgres \
    --from-literal=database=postgres \
    --from-literal=username=postgres \
    --from-literal=password=XXpasswordXX
Now that the service, endpoints, and secret have been created, the application can access the postgres database as a Kubernetes service. Here is an example of how to configure your application to access the postgres database:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  labels:
    app: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: postgres:12-alpine
          imagePullPolicy: Always
          env:
            - name: DATABASE_HOST
              value: "paas-postgres"
            - name: DATABASE_NAME
              valueFrom:
                secretKeyRef:
                  name: paas-postgres
                  key: database
            - name: DATABASE_USER
              valueFrom:
                secretKeyRef:
                  name: paas-postgres
                  key: username
            - name: DATABASE_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: paas-postgres
                  key: password
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: paas-postgres
                  key: password
          ports:
            - containerPort: 8080
Show the pods:
$ kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
my-app-6559f7f88c-fjqtq   1/1     Running   0          10s
You can access the database from one of the pods:
$ kubectl exec -it my-app-6559f7f88c-fjqtq -- bash
Connect, describe and list the database:
bash-5.1# PGPASSWORD=$POSTGRES_PASSWORD psql -U $DATABASE_USER -h $DATABASE_HOST
psql (12.10, server 13.0 (Debian 13.0-1.pgdg100+1))
WARNING: psql major version 12, server major version 13.
         Some psql features might not work.
Type "help" for help.

postgres=# \d
                     List of relations
 Schema |             Name              |   Type   |  Owner
--------+-------------------------------+----------+----------
 public | auth_group                    | table    | postgres
 public | auth_group_id_seq             | sequence | postgres
 public | auth_group_permissions        | table    | postgres
 public | auth_group_permissions_id_seq | sequence | postgres
 public | auth_permission               | table    | postgres
 public | auth_permission_id_seq        | sequence | postgres
 public | auth_user                     | table    | postgres
 public | auth_user_groups              | table    | postgres
 public | auth_user_groups_id_seq       | sequence | postgres
 public | auth_user_id_seq              | sequence | postgres

postgres=# \l
                              List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges
-----------+----------+----------+------------+------------+-----------------------
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
(3 rows)

postgres=#
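The Service and Endpoints objects from the walkthrough can also be derived directly from the connection string, which avoids copying the IP and port by hand. The following Python sketch shows one way to do that; the function name is hypothetical, and the output is JSON, which kubectl apply -f accepts just like YAML.

```python
import json
from urllib.parse import urlsplit


def paas_service_manifests(name, connection_string):
    """Build Service and Endpoints objects pointing at a PaaS private IP."""
    parts = urlsplit(connection_string)
    if parts.hostname is None or parts.port is None:
        raise ValueError("connection string must contain host and port")
    service = {
        "kind": "Service",
        "apiVersion": "v1",
        "metadata": {"name": name},
        "spec": {"ports": [{"name": name, "protocol": "TCP",
                            "port": parts.port, "targetPort": parts.port}]},
    }
    # The Endpoints object must carry the same name as the Service.
    endpoints = {
        "kind": "Endpoints",
        "apiVersion": "v1",
        "metadata": {"name": name},
        "subsets": [{"addresses": [{"ip": parts.hostname}],
                     "ports": [{"port": parts.port, "name": name}]}],
    }
    return service, endpoints


service, endpoints = paas_service_manifests(
    "paas-postgres", "postgres://postgres:XXpasswordXX@10.244.0.43:5432"
)
print(json.dumps({"kind": "List", "apiVersion": "v1",
                  "items": [service, endpoints]}, indent=2))
```

The password in the connection string is parsed but intentionally not written into either object; it belongs in the secret created above.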
Resource Protection
Resources like servers, storages, networks, ip addresses or load balancers, which make up the cluster, are visible to you via API or within the Cloud Panel for transparency and billing reasons. They are, however, protected from being altered. This not only makes sure that they are not deleted accidentally, but is also vital to stable cluster operations.
Protected Resources:
- Master Nodes (server, storage, ips)
- Worker Nodes (server, storage, ips)
- Kubernetes network
- You can still attach your own servers or platform services to it, e.g. to access them from inside your cluster.
- Storages created by Kubernetes (like Persistent Volumes)
- LoadBalancers created by Kubernetes (like Ingress-Controllers)
If you want to change your worker config you can still do this in the Kubernetes configuration.
Horizontal Pod Autoscaler (HPA)
In order to use the horizontal pod autoscaler (HPA) you need to install the Metrics Server. You can bring your own or just follow the example.
Install Metrics Server
You can install the Metrics Server via Helm. There is a ready-to-use Metrics Server Helm Chart by Bitnami.
Add the Bitnami Metrics Server repository to your Helm installation:
helm repo add bitnami https://charts.bitnami.com/bitnami
Create a values.yaml with this content to configure your Metrics Server:
apiService:
  create: true
extraArgs:
  - --kubelet-insecure-tls=true
  - --kubelet-preferred-address-types=InternalIP
Install the Metrics Server Helm Chart:
helm install metrics-server bitnami/metrics-server -f values.yaml
Wait for the Metrics Server to be ready. It might take a minute or two before the first metrics are collected.
Run HPA
In order to run the HPA you need to create a deployment and generate some load against it.
Keep in mind that you must define resource limits and requests in order to use the HPA. The service only exists to give the load generator access to the deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  replicas: 1
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
        - name: php-apache
          image: k8s.gcr.io/hpa-example
          ports:
            - containerPort: 80
          resources:
            limits:
              cpu: 500m
            requests:
              cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
    - port: 80
  selector:
    run: php-apache
Download the example deployment and service and deploy with:
kubectl apply -f php-apache.yaml
Create the HPA for the deployment:
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
Check the current status of the HPA:
kubectl get hpa
This should look like this:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
php-apache Deployment/php-apache 0%/50% 1 10 1 2d22h
Generate Test Load
Now you create an infinite loop which will generate a load:
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
Open a second terminal and check the HPA status:
kubectl get hpa -w
After some time you should see the pods scale:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
php-apache Deployment/php-apache 0%/50% 1 10 1 2d22h
php-apache Deployment/php-apache 91%/50% 1 10 1 2d22h
php-apache Deployment/php-apache 91%/50% 1 10 2 2d22h
php-apache Deployment/php-apache 253%/50% 1 10 2 2d22h
php-apache Deployment/php-apache 253%/50% 1 10 4 2d22h
php-apache Deployment/php-apache 253%/50% 1 10 6 2d22h
php-apache Deployment/php-apache 101%/50% 1 10 6 2d22h
php-apache Deployment/php-apache 71%/50% 1 10 6 2d22h
php-apache Deployment/php-apache 71%/50% 1 10 9 2d22h
php-apache Deployment/php-apache 51%/50% 1 10 9 2d22h
You can also check the deployment itself:
kubectl get deployment php-apache
NAME READY UP-TO-DATE AVAILABLE AGE
php-apache 9/9 9 9 2d22h
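The replica counts in the output follow from the basic HPA rule: desired replicas = ceil(current replicas × current metric / target metric), clamped to the configured minimum and maximum. The Python sketch below applies that rule to values from the output above. It is a simplification: the real controller also applies a tolerance and limits how fast it scales, which is why the observed steps are smaller than the raw formula suggests. The function name is hypothetical.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas):
    """Core HPA scaling rule, without tolerance or scaling-rate limits."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


# 1 replica at 91% CPU against a 50% target.
print(desired_replicas(1, 91, 50, min_replicas=1, max_replicas=10))
# 2 replicas at 253% CPU: the raw result exceeds the maximum and is clamped.
print(desired_replicas(2, 253, 50, min_replicas=1, max_replicas=10))
```

The first call corresponds to the step from 1 to 2 replicas in the output; the second shows why the deployment keeps growing towards the configured maximum while the load persists.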
Stop Load and Clean Up
In order to stop the load, hit CTRL+C in the terminal where you started the load generator.
You can verify the scale down with the commands from above:
kubectl get deployment php-apache -w
Delete the example deployment and service:
kubectl delete -f php-apache.yaml
Vertical Scaling
GSK supports vertical scaling, which can be enabled by simply editing the worker node configuration of your Kubernetes cluster in the Cloud Panel or via the API. Scaling the cluster will recycle all nodes sequentially.
The following node resources can be changed:
- Cores per worker node via parameter k8s_worker_node_cores
- RAM per worker node via parameter k8s_worker_node_ram
- Storage per worker node via parameter k8s_worker_node_storage
- Storage type per worker node via parameter k8s_worker_node_storage_type
You can either change these in your Cloud Panel in the Configuration section, or via API.
To do so via API, you need to patch your cluster’s parameters. Always include all the parameters in the patch, not just the ones you want to change.
For example:
{
"parameters": {
"k8s_worker_node_ram": 4,
"k8s_worker_node_cores": 2,
"k8s_worker_node_count": 3,
"k8s_worker_node_storage": 40,
"k8s_worker_node_storage_type": "storage"
}
}
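Because the patch must always carry the full parameter set, it helps to build it by merging your changes into the cluster's current parameters instead of writing it by hand. A minimal Python sketch, assuming you have already fetched the current parameters from the API; the function name is hypothetical and the values are the ones from the example above.

```python
import json


def build_parameters_patch(current_parameters, **changes):
    """Merge changes into the current parameters and return the full patch body."""
    unknown = set(changes) - set(current_parameters)
    if unknown:
        # Guard against typos in parameter names.
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    return {"parameters": {**current_parameters, **changes}}


current = {
    "k8s_worker_node_ram": 4,
    "k8s_worker_node_cores": 2,
    "k8s_worker_node_count": 3,
    "k8s_worker_node_storage": 40,
    "k8s_worker_node_storage_type": "storage",
}
patch = build_parameters_patch(current, k8s_worker_node_ram=8, k8s_worker_node_cores=4)
print(json.dumps(patch, indent=2))
```

The resulting body still contains the unchanged node count, storage size and storage type, so nothing is dropped by accident.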
Worker Node Storage Performance Classes
Worker nodes in your cluster use a distributed storage for their operating system. On cluster creation, you choose the performance class for this storage with the parameter k8s_worker_node_storage_type.
The performance class of your worker nodes’ storage is independent of your PersistentVolumes and only affects the OS, kubelet and potential hostPath mounts. A higher performance class can help the node stay responsive when under increased memory pressure.
The performance class of your worker nodes can be changed at any time by editing your cluster. You can do so either in your Cloud Panel in the Configuration section, or via API. Changing the performance class will recycle all nodes sequentially.
To do so via API, you need to patch your cluster’s parameters to update the parameter k8s_worker_node_storage_type. Always include all the parameters in the patch, not just the ones you want to change.
For example:
{
"parameters": {
"k8s_worker_node_ram": 4,
"k8s_worker_node_cores": 2,
"k8s_worker_node_count": 3,
"k8s_worker_node_storage": 40,
"k8s_worker_node_storage_type": "storage"
}
}
Logging
Container logs can be obtained via kubectl. While this is certainly feasible for ad-hoc debugging of single containers, it doesn’t give you the full picture of your application or even the whole cluster.
It is therefore a common practice to ship logs to a centralized log management platform, where they can be transformed and analyzed in one place - giving you that full picture and the means to act on events or trends.
There are multiple ways to get your logs into the log management platform:
- Your application can directly implement the format your log management platform accepts the logs in, and send them there.
- Your application can log to stdout and stderr, leaving it to the container engine to store the logs.
The latter approach is good practice. It decouples the application from runtime environment specifics, is non-blocking for the application and provides a general way to transfer logs reliably and securely, even when the log management platform is temporarily unavailable.
Log Shipping
While the container engine technically might be able to ship the logs directly to your log management platform, having the container engine store them locally instead and a third-party component read and ship them has proven to be the more reliable and portable solution.
This third-party component is called a log shipper. In general it can run anywhere, has inputs to read logs from locally and outputs to ship logs to remotely. The log shipper is an application agnostic approach - in the sense that it doesn’t need to be integrated into the applications you run on your cluster in any way. It just needs to support the format the logs are stored in as an input and the format the log management platform accepts the logs in as an output.
Accessing Container Logs
GSK 1.24 and higher
The logging format used by the container engine is the CRI logging format.
You can choose any log shipper that supports the CRI logging format.
Logs are stored in /var/log/containers and /var/log/pods.
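To give an idea of what a log shipper has to handle, here is a minimal Python sketch that parses a single line in the CRI logging format. Each line consists of an RFC 3339 timestamp, the stream name (stdout or stderr), a tag (F for a full line, P for a partial one) and the message, separated by single spaces. The class and function names are hypothetical, and the sample line is made up.

```python
from typing import NamedTuple


class CriLogLine(NamedTuple):
    timestamp: str
    stream: str
    tag: str
    message: str


def parse_cri_line(line):
    """Split one CRI log line into timestamp, stream, tag and message."""
    fields = line.rstrip("\n").split(" ", 3)
    if len(fields) != 4:
        raise ValueError(f"not a CRI log line: {line!r}")
    timestamp, stream, tag, message = fields
    if stream not in ("stdout", "stderr") or tag not in ("F", "P"):
        raise ValueError(f"unexpected stream or tag in: {line!r}")
    return CriLogLine(timestamp, stream, tag, message)


entry = parse_cri_line(
    "2016-10-06T00:17:09.669794202Z stdout F The content of the log entry 1\n"
)
print(entry.stream, entry.message)
```

Splitting at most three times keeps spaces inside the message intact. Established log shippers already implement this parsing, so a sketch like this is mainly useful for understanding or debugging the files.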
GSK 1.23 and lower
The log driver used by the container engine Docker on our managed Kubernetes platform is journald.
journald is part of systemd and designed to store logs safely and handle rotation gracefully to prevent node disks from filling up. journald makes it easy for the shipper to reliably transfer logs, since the shipper only needs to keep track of one event stream.
journald stores logs in /var/log/journal. Choose a log shipper that supports journald as an input.
Note:
The log shipper needs to keep track of where it left off, so that after a restart or redeployment, log shipping doesn’t start from the beginning and transfer all logs again. Since the position is node-specific, a local hostPath mount to store the position in is recommended.
Load Balancing
Applying a service of type LoadBalancer will provision a gridscale Load Balancer. Below are some helpful tips on integrating with our Load Balancer as a Service (LBaaS):
IP Address Forwarding
The Load Balancer needs to be set to HTTP mode.
The client’s IP address is then available in the X-Forwarded-For HTTP header.
Note: When in HTTP mode, HTTPS termination happens at the Load Balancer level. In HTTP mode only, certificates are obtained via Let’s Encrypt, or you can upload your own custom certificate.
Configuring Load Balancer Modes
The cloud controller manager (CCM) uses service annotations to configure the LBaaS for a GSK cluster. If the annotation for a specific parameter is not set, the default value for that parameter is used. This feature is supported in GSK versions 1.18.12-gs1, 1.19.4-gs1, 1.17.14-gs1, and 1.16.15-gs2 and later.
Annotation | Default value |
---|---|
service.beta.kubernetes.io/gs-loadbalancer-mode | tcp |
service.beta.kubernetes.io/gs-loadbalancer-redirect-http-to-https | “false” |
service.beta.kubernetes.io/gs-loadbalancer-ssl-domains | nil |
service.beta.kubernetes.io/gs-loadbalancer-algorithm | leastconn |
service.beta.kubernetes.io/gs-loadbalancer-https-ports | 443 |
service.beta.kubernetes.io/gs-loadbalancer-custom-certificate-uuids | nil |
Examples
- The following annotations configure the LBaaS with HTTP mode, Round Robin Algorithm, redirect HTTP to HTTPS, and multiple SSL Domains wherein domains are separated by a comma. The service.beta.kubernetes.io/gs-loadbalancer-ssl-domains annotation allows you to add multiple SSL Domains to the load balancer.
annotations:
  service.beta.kubernetes.io/gs-loadbalancer-mode: http
  service.beta.kubernetes.io/gs-loadbalancer-redirect-http-to-https: "true"
  service.beta.kubernetes.io/gs-loadbalancer-ssl-domains: demo1.test.com,demo2.test.com
  service.beta.kubernetes.io/gs-loadbalancer-algorithm: roundrobin
- The following annotations configure the LBaaS with HTTP mode, Round Robin Algorithm, redirect HTTP to HTTPS, a non-standard SSL port 4443, and a custom certificate wherein certificate UUIDs are separated by a comma. The service.beta.kubernetes.io/gs-loadbalancer-custom-certificate-uuids annotation allows you to attach already uploaded custom certificates to the load balancer. First upload the custom certificate via the panel or API. Then use the UUID of the uploaded custom certificate, for example c8b786e7-53ee-427b-8ff6-498f59f58b14, with the service.beta.kubernetes.io/gs-loadbalancer-custom-certificate-uuids annotation.
annotations:
  service.beta.kubernetes.io/gs-loadbalancer-mode: http
  service.beta.kubernetes.io/gs-loadbalancer-redirect-http-to-https: "true"
  service.beta.kubernetes.io/gs-loadbalancer-custom-certificate-uuids: c8b786e7-53ee-427b-8ff6-498f59f58b14
  service.beta.kubernetes.io/gs-loadbalancer-algorithm: roundrobin
  service.beta.kubernetes.io/gs-loadbalancer-https-ports: "4443"
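To see which settings result from a given set of annotations, the defaulting rule can be sketched in Python. The defaults are copied from the table above; the function name is hypothetical, and this only mirrors the documented behaviour, not the CCM's actual implementation.

```python
PREFIX = "service.beta.kubernetes.io/gs-loadbalancer-"

# Defaults from the annotation table; None stands for "nil".
DEFAULTS = {
    "mode": "tcp",
    "redirect-http-to-https": "false",
    "ssl-domains": None,
    "algorithm": "leastconn",
    "https-ports": "443",
    "custom-certificate-uuids": None,
}


def effective_lb_settings(annotations):
    """Overlay gs-loadbalancer annotations on the documented defaults."""
    settings = dict(DEFAULTS)
    for key, value in annotations.items():
        if not key.startswith(PREFIX):
            continue  # annotations for other components are ignored
        short_name = key[len(PREFIX):]
        if short_name not in DEFAULTS:
            raise KeyError(f"unknown load balancer annotation: {key}")
        settings[short_name] = value
    return settings


annotations = {
    PREFIX + "mode": "http",
    PREFIX + "redirect-http-to-https": "true",
    PREFIX + "ssl-domains": "demo1.test.com,demo2.test.com",
    PREFIX + "algorithm": "roundrobin",
}
settings = effective_lb_settings(annotations)
print(settings)
```

In this example the HTTPS port is not annotated, so it stays at the default of 443, and no custom certificate is attached.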
Adding Annotations to an Existing Ingress
You can customize the behaviour of specific Ingress objects using annotations:
kubectl annotate --overwrite svc <INGRESS_NAME> \
"service.beta.kubernetes.io/gs-loadbalancer-mode=http" \
"service.beta.kubernetes.io/gs-loadbalancer-algorithm=roundrobin"
Networking
We use Flannel out of the box, which currently cannot be changed.
Network Policies
Because Flannel is used as the network overlay, our clusters do not support network policies.
Persistent Volumes
We differentiate between Persistent Volumes that are based on block devices and those that are based on network filesystems.
Block Device Persistent Volumes
Block device based Persistent Volumes use Distributed Storage volumes that are attached directly to your GSK nodes.
Since they are block devices with plain, non-clustered filesystems (ext4 by default), they can only be attached to a single node at a time and can therefore only be used by pods that run on that node (ReadWriteOnce (RWO) access mode).
Their strength is performance.
Storage Classes
Block device based Persistent Volumes give you the raw performance of the Distributed Storage. You can find a storage class for each of its performance classes.
NAME                      PROVISIONER           RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
block-storage (default)   bs.csi.gridscale.io   Delete          Immediate           true                   68d
block-storage-high        bs.csi.gridscale.io   Delete          Immediate           true                   68d
block-storage-insane      bs.csi.gridscale.io   Delete          Immediate           true                   68d
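As a rough sketch, a PersistentVolumeClaim can request one of these storage classes by name and a pod can then mount it. The resource names, image, mount path and size below are hypothetical placeholders. Only the storage class name and the ReadWriteOnce access mode come from this document.

```yaml
# Sketch only: names, image and size are placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-block-pvc
spec:
  accessModes:
    - ReadWriteOnce                  # block device volumes attach to one node at a time
  storageClassName: block-storage-high
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: my-block-pod
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data           # the volume appears here inside the container
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: my-block-pvc
```

Omitting storageClassName would fall back to the class marked as default.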
Reclaim Policy
The reclaim policy Delete makes sure that deleting a Persistent Volume (PV) also deletes the corresponding Distributed Storage.
Deleting or changing the preconfigured storage classes to modify this behaviour is not recommended. Your changes will be reverted with every upgrade.
Instead, create your own storage classes that use the same provisioner.
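A custom storage class along these lines might look as follows. This is a sketch under assumptions: the class name is hypothetical, and any `parameters` the preconfigured classes carry (for example to select a performance class) are not documented here, so inspect an existing class with `kubectl get storageclass block-storage -o yaml` and copy what you need.

```yaml
# Sketch only: the name is a placeholder; parameters are intentionally omitted
# because they are not documented here. Copy them from a preconfigured class.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: block-storage-retain
provisioner: bs.csi.gridscale.io     # same provisioner as the preconfigured classes
reclaimPolicy: Retain                # keep the underlying storage when the PV is deleted
volumeBindingMode: Immediate
allowVolumeExpansion: true
```

With Retain, released volumes are no longer cleaned up automatically, so you have to remove them yourself once they are no longer needed.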
Limitations
Block device based Persistent Volumes are subject to Distributed Storage and Server limitations. Currently, up to 15 storages, and therefore up to 15 Persistent Volumes, can be attached to a single GSK node at a time. The attach process takes a few seconds per Storage/PV.
Network Filesystem Persistent Volumes via GridFs
Requires 1.19.16-gs0, 1.20.15-gs0, 1.21.11-gs0 or higher.
Network Filesystem based Persistent Volumes use GridFs to store data. GridFs is an NFS-compatible network filesystem. It grows with your data, you only pay for the volume you actually use, and your data can be accessed read-write by any number of GSK nodes at a time (ReadWriteMany (RWX) and ReadOnlyMany (ROX) access modes).
Its strengths are scalability and being read-write accessible from all your GSK nodes.
Set up GridFs based Persistent Volumes
Because GridFs is NFS-compatible, it is accessed through the NFS CSI driver for Kubernetes.
- Create a new GridFs instance or use an existing one.
- Follow the first three steps of Connect a Kubernetes Cluster to a PaaS service to make sure your GridFs is connected to your GSK cluster.
- Install the NFS CSI driver for Kubernetes as described here.
helm repo add csi-driver-nfs https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
helm install csi-driver-nfs csi-driver-nfs/csi-driver-nfs --namespace kube-system --version v4.0.0
- Create a storage class that uses the NFS CSI driver as the provisioner and your GridFs as the NFS server.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gridfs-<PAAS_SERVICE_UUID OF YOUR GRIDFS>
provisioner: nfs.csi.k8s.io
parameters:
  server: <IP ADDRESS OF YOUR GRIDFS>
  share: /
reclaimPolicy: Delete
volumeBindingMode: Immediate
mountOptions:
  - nfsvers=4.1
- Use that storage class for your PVCs.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-first-gridfs-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: gridfs-<PAAS_SERVICE_UUID OF YOUR GRIDFS>
- The NFS CSI driver creates a directory for this PVC under the share path configured in the storage class and makes it available as a new PersistentVolume.
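To illustrate the ReadWriteMany access mode, the following sketch mounts the claim my-first-gridfs-pvc from the steps above into a Deployment with two replicas, which may be scheduled on different nodes. The Deployment name, labels, image and mount path are hypothetical placeholders.

```yaml
# Sketch only: everything except the claim name is a placeholder.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-gridfs-app
spec:
  replicas: 2                        # both replicas share the same volume
  selector:
    matchLabels:
      app: my-gridfs-app
  template:
    metadata:
      labels:
        app: my-gridfs-app
    spec:
      containers:
        - name: app
          image: busybox:1.36
          command: ["sh", "-c", "sleep 3600"]
          volumeMounts:
            - name: shared
              mountPath: /shared     # same data is visible in every replica
      volumes:
        - name: shared
          persistentVolumeClaim:
            claimName: my-first-gridfs-pvc
```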
Limitations
Network Filesystem based Persistent Volumes via GridFs can hold any number of PVCs in a single GridFs instance.
Host Path Persistent Volumes
Aside from block device based and network filesystem based Persistent Volumes, hostPath Persistent Volumes can be used for node-local storage.
Please note:
- Due to the transient nature of the Kubernetes nodes, hostPath Persistent Volumes are lost whenever the node is recycled (e.g. during updates, upgrades or node recovery).
- Use of hostPath Persistent Volumes can fill up node-local storage and affect the health of the node.
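For reference, a statically defined hostPath Persistent Volume and a matching claim could be sketched as follows. The names, the storage class name "manual", the size and the node path /mnt/data are hypothetical, and whether that path is suitable on a GSK node is an assumption you need to check yourself.

```yaml
# Sketch only: names, class name, size and path are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-hostpath-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: manual           # arbitrary name used to pair PV and PVC
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /mnt/data                  # directory on the node, lost when the node is recycled
    type: DirectoryOrCreate
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-hostpath-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: manual           # must match the PV above
  resources:
    requests:
      storage: 1Gi
```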
Persistent Volumes are not automatically deleted
The PersistentVolume is created automatically when a PersistentVolumeClaim is requested. But it’s not automatically deleted after you delete the GSK cluster. This behaviour prevents data loss of your persistent volumes.
There are two ways to delete the persistent volumes:
- Before deleting the cluster, delete the deployments that use the PersistentVolume, as well as the PersistentVolumeClaim, from the cluster.
- After deleting the cluster, delete the persistent volumes from the Cloud Panel.
Ingress Controller
Your cluster does not come with an ingress controller preinstalled. You can install the ingress controller of your choice as described in ingress-controllers.
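Once an ingress controller is installed, workloads are typically exposed through Ingress resources like the sketch below. The ingress class name "nginx", the host and the backend Service are assumptions for illustration. Use the class name that your chosen controller actually registers.

```yaml
# Sketch only: class name, host and backend Service are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
spec:
  ingressClassName: nginx            # assumed; depends on the controller you installed
  rules:
    - host: demo1.test.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app         # existing Service in the same namespace
                port:
                  number: 80
```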
Access and Security
All users with write access (or higher) to the project can download the Kubernetes certificate.
PKI Certificate Access
Authentication against the Kubernetes master is based on X.509 client certificates. These certificates are generated on request and expire after three days. If you use gscloud, it renews the certificate for you automatically.
After installation of gscloud, set it up with your API token as described here. Then use gscloud to fetch and maintain your kubeconfig as described here.
Encryption
Data is encrypted at rest, and network traffic is TLS encrypted on the application layer.
Role-based Access Control (RBAC)
GSK supports standard Kubernetes RBAC.
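As a minimal sketch of standard Kubernetes RBAC, the following manifests grant a ServiceAccount read-only access to pods in a single namespace. All names and the namespace are hypothetical placeholders, and nothing here is specific to GSK.

```yaml
# Sketch only: names and namespace are placeholders.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pod-reader-sa
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: default
rules:
  - apiGroups: [""]                  # "" is the core API group
    resources: ["pods"]
    verbs: ["get", "list", "watch"]  # read-only access
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: default
subjects:
  - kind: ServiceAccount
    name: pod-reader-sa
    namespace: default
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```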
Firewall
GSK control plane and worker nodes use the firewall in the OS to secure cluster internals from the public network.
This does not restrict you from exposing your workloads to the public network.
Backups
Data that belongs to the control plane of the cluster (such as etcd) is backed up by gridscale.
Data that comes from within the application needs to be backed up by the user. gridscale Storage snapshots and backups are not supported by GSK at this point. They cannot be used for backing up persisted data.
Please employ a solution that runs in the cluster.
Node Pools
Currently, we only support one node pool.
Kubernetes Dashboard
The official Kubernetes dashboard is not deployed by default and can be installed with a single command that is mentioned in the Official Kubernetes Documentation.
Known issues
Storage instances are not deleted from gridscale panel
To prevent this issue, please do NOT delete the PVs (Persistent Volumes) before the storage instances are deleted completely from the panel. If you already have some storage instances dangling in the panel, please contact us to remove them.
Cannot delete k8s cluster when there are other PaaS/servers connected to the cluster’s private network
The issue can be solved by either attaching the PaaS/servers to other networks or removing the PaaS/servers.
Node labels do not persist
Nodes in a Kubernetes cluster are volatile and can be replaced at any time, e.g. during updates, upgrades or node recovery. When they are replaced, the replacement nodes do not inherit the labels of the old nodes.
If you control scheduling of your pods with nodeSelector and node labels, please consider migrating to Affinity and anti-affinity.
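One way to avoid depending on custom node labels is pod anti-affinity keyed on the well-known kubernetes.io/hostname label, which Kubernetes sets on every node itself. The sketch below asks the scheduler to prefer placing replicas on different nodes. The Deployment name, labels and image are hypothetical placeholders.

```yaml
# Sketch only: name, labels and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAntiAffinity:
          # Soft rule: prefer nodes that do not already run a pod with app=my-app.
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchLabels:
                    app: my-app
      containers:
        - name: app
          image: nginx:1.25
```

The rule selects on pod labels, which live in the Deployment template and are therefore unaffected by node replacement.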
FAQ
Does gridscale monitor the cluster?
We monitor the overall health of the cluster. We ensure that the cluster is healthy and functional, and we are paged about abnormal cluster conditions.
gridscale does not monitor the application(s) that are deployed within the cluster. Since we don’t know anything about your workloads, we don’t include performance and resource monitoring from our side as part of the standardised gridscale Managed Kubernetes (GSK).
Do cluster components communicate on the Public or the Private Network?
Cluster communication is strictly private. This includes communication between Kubernetes components, but also communication between pods and/or services.
However, as a user you can contact external services.
It is therefore technically possible, though unusual, to reach another service in the cluster through the Public Network and a Load Balancer, provided that the service is exposed to the outside and you explicitly address it through its public connection details.
A specific tool that I want to use with my cluster is not working. What shall I do?
Please check whether your tool supports the Kubernetes version of your cluster. If your cluster version is not supported, have a look in the Cloud Panel, where you can update your cluster to a new patch version (e.g. 1.24.8 to 1.24.9) or replace your cluster with a more up-to-date one.
I cannot see PVC usage in Grafana. What shall I do?
Please ensure that the volume has been mounted for long enough and that the query interval in Grafana is short enough to catch all metrics.
Terms and Abbreviations
- GSK: gridscale Managed Kubernetes.
- K8s: Abbreviation of Kubernetes (K, followed by eight letters, followed by s).
- kubectl: A command line tool which functions as a management interface for a K8s cluster.
- Node: A K8s cluster is made of a few virtual machines that talk to each other. In this context, a virtual machine is a node. A cluster consists of a master (currently one) and one or more workers.
- Control Plane: A fancy way of saying “masters of the cluster”. Technically, all programs that run on the master that make the cluster a cluster. For instance, a specialized database or a program that decides which worker should run which software.
- Deployment: In most cases an app running on K8s. Technically a collection of containers based on a set of templates (images).
- PV: Persistent Volume. A persistent storage for Kubernetes deployments.
- PVC: Persistent Volume Claim. When a client (user, customer, an application) needs a PV, they send a PVC to the K8s cluster.
- Service: A way of accessing your deployment outside of the cluster, tightly related to Load Balancers and Ingresses.
- Ingress: A special way of exposing a deployment outside of the cluster. Think of it as a kind of Load Balancer.
- IngressController: This component runs inside the cluster and is responsible for handling requests for an Ingress.
- RBAC: Role-Based Access Control. Allows you to selectively give different people different access rights to the cluster.
- Dashboard: A graphical frontend for the cluster API. The user can see their deployments, nodes and a few metrics without using the command line. This is not enabled by default, but can be easily installed into the GSK.