Configuring Gloo API Gateway on a Google Kubernetes Engine (GKE) Private Cluster

Google Kubernetes Engine (GKE) is one of the most popular options for running Kubernetes in production on a public cloud. In the GKE model, one or more master nodes run in a Google-managed project, while the worker nodes run in a customer-managed project.

GKE clusters can be zonal, multi-zonal, or regional, all of which affect the availability and cost of the cluster. Each worker node receives a public IP address, which Google uses for communication between the master nodes and the worker nodes. Giving worker nodes public IP addresses may be undesirable for certain workloads and industries, so Google created private clusters to address that situation.

Using private clusters solves the public IP address problem, but it also introduces new challenges for the Kubernetes administrator. In this post we will look at how you can deploy a private cluster in GKE and install Gloo in gateway mode to manage traffic for services running on the cluster.

GKE private cluster overview

The master nodes in a GKE cluster run inside a Google managed project and VPC. On a normal cluster, the master nodes are able to communicate with the worker nodes using the public IP addresses allocated to the worker nodes. In a private cluster, the worker nodes are only assigned private IP addresses in the RFC 1918 space. The master nodes can no longer use public IP addresses to talk to the worker nodes.

Google’s solution was to create a VPC network peering connection between the VPC hosting the master nodes and the VPC hosting the worker nodes. VPC network peering creates a direct connection between two VPCs, enabling communication between them over private IP addresses. The peering connection has a default deny-all firewall rule restricting traffic between the two VPCs, so Google creates firewall rules to allow traffic from the master nodes to the worker nodes.
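If you're curious, you can see this peering for yourself once a private cluster exists by listing the peerings on the cluster's VPC. The example below assumes the default network, which is what we use later in this post:

gcloud compute networks peerings list --network default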

The nature of the private cluster configuration creates two potential issues.

The worker nodes have no public IP address, which means they cannot access internet-based resources. Any container image registry not hosted on GCP or a privately connected network will be inaccessible.

The firewall rules prevent the API server on the master nodes from communicating directly with any of the pods. That creates an issue for services like Linkerd and Gloo gateway, which rely on API server-to-pod communication. Fortunately, we can address both of these issues with a few tweaks.

Creating a GKE private cluster

Let’s first create a private GKE cluster so we can see the process in action. If you’d like to follow along, you’ll need a Google Cloud Platform account and access to the gcloud CLI, either locally or through the Cloud Shell in the Console. Once you have the CLI ready, select the project and region you want to use.

Now we are going to set a few variables for use in later commands. Switch out the values for what makes sense in your project. We are going to use the default VPC and the us-central1-c zone.

# Network to use for the worker nodes
NETWORK=default

# Subnet name to create for the worker nodes
SUBNET_NAME=private-gke-cluster

# Network CIDR for the master nodes (should be /28)
MASTER_NETWORK=192.168.42.0/28

# Name of the cluster
CLUSTER_NAME=private-cluster-0

# Region for the cluster
REGION_NAME=us-central1

# Zone for the cluster
ZONE_NAME=us-central1-c

In this command we are going to create a private cluster using a new subnet created in the default VPC of the currently selected project and region.

gcloud config set compute/zone $ZONE_NAME

gcloud container clusters create $CLUSTER_NAME \
    --create-subnetwork name=$SUBNET_NAME \
    --no-enable-master-authorized-networks \
    --enable-ip-alias \
    --enable-private-nodes \
    --master-ipv4-cidr $MASTER_NETWORK \
    --no-enable-basic-auth \
    --no-issue-client-certificate

Note that this example does not restrict access to the API server on the master node(s). In a production scenario you would either configure authorized networks for the master node(s) or specify the --enable-private-endpoint argument, which limits access to private networks only.
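For example, to lock the API server down to a known range after the fact, you could enable master authorized networks on the existing cluster. The CIDR below is only a placeholder; substitute the range you actually connect from:

gcloud container clusters update $CLUSTER_NAME \
    --enable-master-authorized-networks \
    --master-authorized-networks 203.0.113.0/24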

The creation of the cluster also creates a few firewall rules as shown below:

Name                      Target               Source                       Traffic                 Allow/Deny
gke-cluster-name-all      gke-cluster-id-node  10.8.0.0/14                  all                     Allow
gke-cluster-name-master   gke-cluster-id-node  192.168.42.0/28              tcp:443,10250           Allow
gke-cluster-name-vms      gke-cluster-id-node  10.128.0.0/9, 10.74.60.0/22  tcp:all, udp:all, icmp  Allow

The gke-cluster-name-all rule allows all traffic from the 10.8.0.0/14 address range, which is the range being used by pods launched on the worker nodes. The gke-cluster-name-vms rule allows all udp, tcp, and icmp traffic from any subnet on the default VPC and the additional subnet range of 10.74.60.0/22 created for the services running on the Kubernetes cluster. The gke-cluster-name-master firewall rule is what allows the master node(s) on the peered VPC (using 192.168.42.0/28) to talk to the worker nodes on tcp ports 443 and 10250.

Any traffic not allowed by these rules will be denied.
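You can inspect these rules yourself by filtering the firewall list on the cluster name. The generated rule names also include a cluster ID, so a substring match is the easiest approach:

gcloud compute firewall-rules list --filter "name~$CLUSTER_NAME"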

Connect to the cluster

Now that we have our private GKE cluster, let’s go ahead and get connected to it. The simplest way is to run the following command:

PROJECT_ID=$(gcloud config get-value project)

gcloud container clusters get-credentials $CLUSTER_NAME --zone $ZONE_NAME --project $PROJECT_ID 

The credentials are merged into our current context for kubectl. In order to install Gloo, we need to be an admin user. Run the following command to add yourself to the cluster-admin role.

kubectl create clusterrolebinding cluster-admin-binding \
    --clusterrole cluster-admin \
    --user $(gcloud config get-value account)
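A quick way to confirm the binding took effect is to ask the API server whether your account can now do anything in the cluster, which should return yes:

kubectl auth can-i '*' '*' --all-namespaces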

Now it’s time to install Gloo.

Installing Gloo

We already have a great set of guides around installing Gloo using glooctl, but the short version can be achieved by running the following commands to install glooctl and then install Gloo as a gateway:

curl -sL https://run.solo.io/gloo/install | sh
export PATH=$HOME/.gloo/bin:$PATH

glooctl install gateway

Here is the output you will see.

Creating namespace gloo-system... Done.
Starting Gloo installation...

The installation will then hang until you cancel it. One of the first things the Gloo installer does is create a Kubernetes job that sets up the certificates for the installation, using the container image quay.io/solo-io/certgen hosted on quay.io. If we cancel the installation and take a look at the existing pods, we can see that the gateway-certgen pod is stuck in an ImagePullBackOff state.

kubectl get pod -n gloo-system -w
NAME                  READY  STATUS            RESTARTS AGE
gateway-certgen-5vjrg 0/1    ErrImagePull      0        23s
gateway-certgen-5vjrg 0/1    ImagePullBackOff  0        30s
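To confirm that the failure really is an image pull problem, describe the pod and check the events for the pull error (substitute your pod's generated suffix):

kubectl describe pod gateway-certgen-5vjrg -n gloo-system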

Essentially, it means that the image could not be pulled from the repository for some reason. In our case, the reason is simple. The worker nodes in the private cluster have no access to the internet because they do not have a public IP address or a NAT gateway. The simplest solution is to enable Cloud NAT for the VPC.

There are two steps to this process: create a Cloud Router for the region where you want to set up Cloud NAT (assuming you don’t already have one) and then create the Cloud NAT associated with the VPC. Here are the commands to create the Cloud Router and Cloud NAT:

gcloud compute routers create cloud-rtr1 --project=$PROJECT_ID \
    --region=$REGION_NAME --network=$NETWORK

gcloud compute routers nats create nat-gw1 \
    --router-region $REGION_NAME \
    --router cloud-rtr1 \
    --nat-all-subnet-ip-ranges \
    --auto-allocate-nat-external-ips
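You can verify the NAT gateway configuration before retrying the installation:

gcloud compute routers nats describe nat-gw1 \
    --router cloud-rtr1 \
    --router-region $REGION_NAME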

Now let’s run the glooctl installation again:

glooctl install gateway
Unable to check if namespace gloo-system exists. Continuing...
Starting Gloo installation...

Gloo failed to install! Detailed logs available at ~/.gloo/debug.log.
Error: installing gloo in gateway mode: cannot re-use a name that is still in use

Ouch. Our previous installation attempt left resources behind, and they are blocking the new install, so we need to do a little cleanup. Fortunately we can run the uninstall command to clean up the previous attempt.

glooctl uninstall --all
Uninstalling Gloo…
Removing Gloo system components from namespace gloo-system…
Removing Gloo CRDs...
Removing namespace gloo-system... Done.

Gloo was successfully uninstalled.

Great. Let’s try this installation one more time.

glooctl install gateway
Creating namespace gloo-system... Done.
Starting Gloo installation…

Gloo was successfully installed!

Voila! Gloo gateway has been successfully installed, and we should have no further issues pulling images for other Kubernetes deployments. Speaking of which, let’s try to deploy the Pet Store app and use Gloo Gateway to front-end it, just like in our Hello World example.
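Before we do, it's worth a quick sanity check on the installation. Recent versions of glooctl include a check command that verifies the Gloo deployments and resources:

glooctl check
kubectl get pods -n gloo-system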

Creating Routing and Virtual Services

We have a working installation of Gloo gateway in our GKE private cluster. Now we are going to deploy the Pet Store application and attempt to create a Route and Virtual Service to the application.

You can install the Pet Store application by running the following command:

kubectl apply -f https://raw.githubusercontent.com/solo-io/gloo/v1.2.9/example/petstore/petstore.yaml

This will create the Pet Store application deployment, and Gloo will automatically create an Upstream for the application.
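You can confirm that the pod is running and that discovery created the Upstream (named for its namespace, service, and port) before adding any routes:

kubectl get pods -n default
glooctl get upstreams | grep petstore

Now let's add a route for the application to our Gloo Gateway: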

glooctl add route \
  --path-exact /all-pets \
  --dest-name default-petstore-8080 \
  --prefix-rewrite /api/pets
Error: creating kube resource default: Timeout: request did not complete within requested timeout 30s

Oh no! The command timed out. Why did this happen? The short answer is that the admission controller on the master node has a webhook that talks to the Gateway pod to validate the configuration; you can find more information in the official Gloo docs. The webhook tries to reach the pod on port 8443, but the firewall rules only allow TCP ports 443 and 10250 from the master node to the worker nodes.
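If you want to see the webhook for yourself, you can list the validating webhook configurations in the cluster and inspect the one Gloo registered (its exact name varies by Gloo version) to find the service and port it targets:

kubectl get validatingwebhookconfigurations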

Once again the solution is simple: we can either alter the existing firewall rule to add a port or create a new firewall rule. We think it’s best to add a new rule for clarity’s sake. We will do that by collecting some information and then creating the rule.

For starters, we need the source IP address range of the master node(s). We already know that it will be 192.168.42.0/28 based on how we provisioned the cluster.

Next we need to know the network tag being used for the worker nodes. We can find that by running a gcloud command to look at the firewall rules associated with the cluster we created. Shout-out to this Linkerd post that figured out the JSON parsing logic.

NETWORK_TARGET_TAG=$(gcloud compute firewall-rules list \
  --filter network=$NETWORK --format json \
  | jq ".[] | select(.name | contains(\"$CLUSTER_NAME\"))" \
  | jq -r '.targetTags[0]' | head -1)
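It's worth echoing the variable to make sure a tag was actually captured before creating the rule:

echo $NETWORK_TARGET_TAG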

Now we can create the firewall rule to allow TCP traffic on port 8443.

gcloud compute firewall-rules create gke-to-gloo-gateway \
  --network "$NETWORK" \
  --allow "tcp:8443" \
  --source-ranges "$MASTER_NETWORK" \
  --target-tags "$NETWORK_TARGET_TAG" \
  --priority 1000 \
  --description "Allow traffic on port 8443 for gloo gateway communication"
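You can double-check the new rule before retrying the route command:

gcloud compute firewall-rules describe gke-to-gloo-gateway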

Running the route create command again will succeed.

glooctl add route \
  --path-exact /all-pets \
  --dest-name default-petstore-8080 \
  --prefix-rewrite /api/pets
+-----------------+--------------+---------+------+---------+-----------------+-----------------------------------+
| VIRTUAL SERVICE | DISPLAY NAME | DOMAINS | SSL  | STATUS  | LISTENERPLUGINS |              ROUTES               |
+-----------------+--------------+---------+------+---------+-----------------+-----------------------------------+
| default         |              | *       | none | Pending |                 | /all-pets                         |
|                 |              |         |      |         |                 | gloo-system.default-petstore-8080 |
|                 |              |         |      |         |                 | (upstream)                        |
+-----------------+--------------+---------+------+---------+-----------------+-----------------------------------+

And we can test the gateway by using curl, just like in the Hello World example:

curl $(glooctl proxy url)/all-pets
[{"id":1,"name":"Dog","status":"available"},{"id":2,"name":"Cat","status":"pending"}]

Conclusion

In this post we showed how to add the Cloud NAT feature to provide internet access for your worker nodes, and how to add a firewall rule to enable admission validation. Private clusters on GKE provide additional security and protection for your Kubernetes cluster, but they also require additional considerations when it comes to networking.

Aside from what we reviewed in the post, consideration must be given to the primary and secondary IP address ranges used for the cluster. Additional firewall rules may be necessary as you deploy new components in the cluster that integrate with the API server on the master node(s). Before you deploy a new application to your private cluster, review its requirements for networking and master node communication.