Restricting cluster-admin permissions
by Marcus Noble on Feb 22, 2022
This post originally appeared on https://marcusnoble.co.uk/.
If you've managed multi-user/multi-tenant Kubernetes clusters, then there's a good chance you've come across RBAC (Role-Based Access Control). RBAC provides a strong method of providing permissions to users, groups, or service accounts within a cluster. These permissions can either be cluster-wide, with ClusterRole
, or namespace scoped, with Role
. Roles can be combined to build up all the rules stating what the associated entity is allowed to perform. These rules are additive, starting from a base of no permissions to do anything, building up what is allowed to be performed and there's no syntax to take away a permission that is granted by another rule.
Generally, and by default, operators of the cluster are assigned to the cluster-admin
ClusterRole. This gives the user access and permission to do all operations on all resources in the cluster.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: cluster-admin
labels:
kubernetes.io/bootstrapping: rbac-defaults
rules:
- apiGroups:
- '*'
resources:
- '*'
verbs:
- '*'
- nonResourceURLs:
- '*'
verbs:
- '*'
There's very good reason for this, an admin generally needs to be able to setup and manage the cluster, including the ability to define and assign roles. But what if we need to block an action performed by cluster admins? We can't do it with RBAC, it only allows for adding of permissions, not taking them away.
Well, recently at Giant Swarm, we had an issue in one of our CLIs used by cluster admins that incorrectly deleted more resources than intended. This was causing issues authenticating with the cluster, and we needed to get a fix in place fast. The problem was we had no control over when people would update their CLI even once we'd released the fix.
We couldn't hot-fix this by updating RBAC rules as we couldn't subtract the specific permission, but what we could do was leverage an admission controller to block the request to the API.
At Giant Swarm, we're pretty big users of Kyverno (which deploys an admission controller) and use it for a lot of validation and defaulting of resources in all our clusters. Not only can Kyverno validate and block based on a resource's values but it can also validate the operation being performed, and by who.
With this functionality available, we were able to deploy a cluster policy that would block cluster-admin (and anyone else) from performing the specific delete all action that was causing us issues.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
name: block-bulk-certconfigs-delete
annotations:
policies.kyverno.io/title: Block bulk certconfig deletes
policies.kyverno.io/subject: CertConfigs
policies.kyverno.io/description: >-
A bug in an old kubectl-gs version causes all certconfigs to
be deleted on performing a login, this policy blocks that
action while still allowing the single delete.
spec:
validationFailureAction: enforce
background: false
rules:
- name: block-bulk-certconfigs-delete
match:
any:
- resources:
kinds:
- CertConfig
preconditions:
any:
- key: ""
operator: Equals
value: ""
validate:
message: "Your current kubectl-gs version contains a critical bug, please update to the latest version using `kubectl gs selfupdate`"
deny:
conditions:
- key: ""
operator: In
value:
- DELETE
There are a few things going on here so I'm going to explain each bit individually.
First up, we specify the Kind of resource we want this policy to apply to, in our case, that's CertConfig
but it can be anything within the cluster. It's also possible to target specific groups and API versions if needed.
match:
any:
- resources:
kinds:
- CertConfig
Next up, we're setting a precondition that allows us to do some filtering based on the details of the request. In this instance, we're only interested in requests that don't have a name specified (this is the case when operating on a list of resources rather than a single resource) which allows us to target only those 'delete all' requests.
preconditions:
any:
- key: ""
operator: Equals
value: ""
Finally, we have the actual validation rule. We're specifying a deny
rule that'll block requests (matching the previous match
and preconditions
) with the DELETE
operation. We're also able to define a message that'll be returned to the client as the reason why the admission controller rejected the request. We're using this to inform users about the bug and encouraging them to upgrade.
validate:
message: "Your current kubectl-gs version contains a critical bug, please update to the latest version using `kubectl gs selfupdate`"
deny:
conditions:
- key: ""
operator: In
value:
- DELETE
With this single policy deployed to our clusters we've been able to block the bug in our CLI, even when the user performing the action has cluster-admin level permissions.
Other use cases
While we needed this to work around a bug in our client, there are other situations where this approach could be useful.
- A policy that prevents deletion of any resource that has a
do-not-delete: "true"
annotation on it to prevent accidental deletion of critical resources (such as persistent volumes or secrets). - A policy that prevents fetching the details of secrets in a specific namespace while still allowing in every other namespace (including those that may not yet exist).
- A policy that blocks deletes, updates or patches from everyone except a specific user that can be used to prevent others 'cleaning up' resources that you may be trying to debug.
So, if you haven't already, I recommend taking a look at Kyverno and especially taking a look at the example Policies that they have for an idea of what can be done.
You May Also Like
These Related Stories
Bridging the gap: Kubernetes on VMware Cloud Director
While the general trend of the last decade has been to move workloads to the cloud, along with major companies jumping on the bandwagon to offer their …
Manage Kubernetes Secrets using AWS Secrets Manager
Kubernetes has a built-in feature for secrets management called a Secret. The Secret object is convenient to use, but does not support storing or retr …
Part 4: Operations and the Cloud-Native Stack in Action
An in-depth series on how to easily get centralized logging, better security, performance metrics, and authentication using a Kubernetes-based platfor …