Managing Enterprise Kubernetes: Roles and Responsibilities

Written by Neil Cresswell, CEO | March 24, 2022

A question we often get asked is "who is responsible for what inside Kubernetes?" Whilst the answer really depends on your organizational structure and how segmented your teams are, I will describe the most common structure we see.

In general, there are four roles that span the operational responsibilities of managing Kubernetes: the Infrastructure Team (for on-premises deployments), the Cloud/Platform Team, the DevOps Team, and Developers.

The infrastructure team only exists if you are using on-premises hardware, as someone needs to manage the physical server/storage/networking equipment.

The Cloud/Platform team is responsible for the creation, upgrading, and scaling of Kubernetes / Docker Clusters. It is also responsible for triaging any performance or availability issues that arise with the platform (but not the applications). Fundamentally, it holds the internal OLA to ensure the system delivers acceptable SLAs.

The Cloud/Platform team is responsible for:

creating any automation for the deployment and configuration of the clusters (Infrastructure as Code),
ensuring the platform complies with internal security requirements/policies (such as authentication, activity logging, policy agents, and role-based access control).
As the holders of the "cluster-admin" privilege, they also install any required DevOps tooling, such as Portainer and any other observability, logging, and access tooling.
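To make the "cluster-admin" point a little more concrete, here is a minimal sketch of how that split in privileges might be expressed as Kubernetes RBAC bindings, generated from Python. The group names ("platform-team", "devops-team"), the binding names, and the "shop-prod" namespace are hypothetical, and giving the DevOps team the built-in "edit" role in a single namespace is an assumption for illustration, not a recommendation from this post.

```python
import json

RBAC_API_GROUP = "rbac.authorization.k8s.io"


def group_binding(name, cluster_role, group, namespace=None):
    """Build a manifest that grants a ClusterRole to a group.

    Without a namespace this is a ClusterRoleBinding (cluster-wide);
    with a namespace it is a RoleBinding scoped to that namespace.
    """
    binding = {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "RoleBinding" if namespace else "ClusterRoleBinding",
        "metadata": {"name": name},
        "roleRef": {
            "apiGroup": RBAC_API_GROUP,
            "kind": "ClusterRole",
            "name": cluster_role,
        },
        "subjects": [
            {"apiGroup": RBAC_API_GROUP, "kind": "Group", "name": group}
        ],
    }
    if namespace:
        binding["metadata"]["namespace"] = namespace
    return binding


# Hypothetical names: the platform team holds cluster-admin everywhere,
# the DevOps team can only edit workloads in one application namespace.
platform_binding = group_binding(
    "platform-team-admin", "cluster-admin", "platform-team"
)
devops_binding = group_binding(
    "devops-team-edit", "edit", "devops-team", namespace="shop-prod"
)

if __name__ == "__main__":
    manifests = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [platform_binding, devops_binding],
    }
    print(json.dumps(manifests, indent=2))
```

The script only prints JSON manifests; it does not talk to a cluster. How groups are populated depends on whichever authentication provider your platform team has configured.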

Combined, the Infrastructure Team and the Cloud/Platform Team are commonly known as the "Ops" team.

The DevOps team is responsible for

getting (and keeping) applications running in production and non-production environments. This often includes writing (or at least assisting with) the Dockerfiles that create container images from developer code.
They create the application deployment manifests, and are the people who configure the CI/CD automation pipelines that ensure the application is built and deployed as expected.
They are generally the team that is on call to support any issues with the applications in production. They need to be able to triage application performance and availability issues, and so are consumers of the observability and logging tools that run in the cluster.

The Dev team is responsible for

writing the application code and testing that the code works locally in their development environments,
creating Dockerfiles and local deployment manifests (Compose files). They are also consumers of the CI/CD pipelines through automated image builds for their committed code.
supporting their applications in production.

Combined, the Dev and DevOps teams are often known as the "Development" team.

For larger organisations, there is also likely an SRE team, whose sole focus is to improve system reliability through continuous improvements, such as recommending adjustments to deployment configurations, implementing more reliable rolling update policies, and monitoring load distribution and multi-geo deployments. When an SRE team is in play, they are the team ultimately responsible for application performance and availability.

For smaller organisations, it's common for the Infrastructure Team, the Cloud/Platform Team, and the DevOps team to be one and the same group of people.

So, this is the structure we see most often in organizations today... but is it a panacea? Only time will tell. One thing is for certain, though: where your organization obtains differentiated value from its digital assets (customer-facing software), you must have your Devs focussed on improving software, not running the platform that the apps run on.

Thoughts?

Neil
