Configuring Gloo API Gateway on a Google Kubernetes Engine (GKE) Private Cluster

Ned Bellavance | April 21, 2020

Google Kubernetes Engine (GKE) is one of the most popular options out there for running Kubernetes in production on a public cloud provider. The model of GKE is to have one or more master nodes running in a Google managed project, and worker nodes running in a customer-managed project.

GKE clusters can be zonal, multi-zonal, or regional, all of which affect the availability and cost of the cluster. Each worker node receives a public IP address, and Google uses that public IP address to communicate between the master nodes and the worker nodes. Giving a public IP address to worker nodes may be undesirable for certain workloads and industries. Google created private clusters to address such a situation.

Using private clusters solves the public IP address problem, but it also introduces new challenges for the Kubernetes administrator. In this post we will look at how you can deploy a private cluster in GKE and install Gloo in gateway mode to manage traffic for services running on the cluster.

GKE private cluster overview

The master nodes in a GKE cluster run inside a Google managed project and VPC. On a normal cluster, the master nodes are able to communicate with the worker nodes using the public IP addresses allocated to the worker nodes. In a private cluster, the worker nodes are only assigned private IP addresses in the RFC 1918 space. The master nodes can no longer use public IP addresses to talk to the worker nodes.

Google’s solution was to create a VPC network peering connection between the VPC hosting the master nodes and the VPC hosting the worker nodes. A VPC network peering connection creates a direct link between two VPCs, enabling communication between them over private IP addresses. The peering connection has a default deny-all firewall rule restricting traffic between the two VPCs, so Google must create firewall rules between the two VPCs to allow traffic from the master nodes to the worker nodes.

The nature of the private cluster configuration creates two potential issues.

The worker nodes have no public IP address, which means they cannot access internet-based resources. Any container image registry not hosted on GCP or a privately connected network will be inaccessible.

The firewall rules restrict the API server on the master nodes from communicating directly with any of the pods. That creates an issue for services like Linkerd and Gloo gateway, which rely on API-server-to-pod communication. Fortunately, we can address both of these issues with a few tweaks.

Creating a GKE private cluster

Let’s first create a private GKE cluster so we can see the process in action. If you’d like to follow along, you’ll need a Google Cloud Platform account and access to the gcloud CLI locally or through the Cloud Shell in the Console. Once you have the CLI ready, select the project and region you want to use.

Now we are going to set a few variables for use in later commands. Switch out the values for what makes sense in your project. We are going to use the default VPC and the us-central1-c zone.
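For example, something along these lines; the project ID and cluster name below are placeholders, so substitute your own values:

```
# Placeholder values; adjust to match your own project and naming
gcloud config set project my-project-id
export PROJECT_ID=$(gcloud config get-value project)
export CLUSTER_NAME=gloo-private-cluster
export REGION=us-central1
export ZONE=us-central1-c
gcloud config set compute/zone $ZONE
```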

In this command we are going to create a private cluster using a new subnet created in the default VPC of the currently selected project and region.
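A command along the following lines does that. The subnet name is illustrative, and the master CIDR of 192.168.42.0/28 matches the range referenced later in this post:

```
# Sketch of a private cluster creation command; names and ranges are assumptions
gcloud container clusters create $CLUSTER_NAME \
    --zone $ZONE \
    --network default \
    --create-subnetwork name=gloo-subnet \
    --enable-ip-alias \
    --enable-private-nodes \
    --master-ipv4-cidr 192.168.42.0/28 \
    --no-enable-master-authorized-networks
```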

Access to the API server on the master node(s) is not being restricted. In a production scenario you would either configure authorized networks for the master node(s) or specify the --enable-private-endpoint argument, which limits access to private networks only.

The creation of the cluster also creates a few firewall rules as shown below:

Name                    | Target              | Source                      | Traffic                | Allow/Deny
gke-cluster-name-all    | gke-cluster-id-node | 10.8.0.0/14                 | all                    | Allow
gke-cluster-name-master | gke-cluster-id-node | 192.168.42.0/28             | tcp:443,10250          | Allow
gke-cluster-name-vms    | gke-cluster-id-node | 10.128.0.0/9, 10.74.60.0/22 | tcp:all, udp:all, icmp | Allow


The gke-cluster-name-all rule allows all traffic from the 10.8.0.0/14 address range, which is the range being used by pods launched on the worker nodes. The gke-cluster-name-vms rule allows all udp, tcp, and icmp traffic from any subnet on the default VPC and the additional subnet range of 10.74.60.0/22 created for the services running on the Kubernetes cluster. The gke-cluster-name-master firewall rule is what allows the master node(s) on the peered VPC (using 192.168.42.0/28) to talk to the worker nodes on tcp ports 443 and 10250.

Any traffic not allowed by these rules will be denied.

Connect to the cluster

Now that we have our private GKE cluster, let’s go ahead and get connected to it. The simplest way is to run the following command:
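Assuming the variables set earlier are still in place:

```
# Merge the cluster credentials into the current kubeconfig
gcloud container clusters get-credentials $CLUSTER_NAME --zone $ZONE
```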

The credentials are merged into our current context for kubectl. In order to install Gloo, we need to be an admin user. Run the following command to add yourself to the cluster-admin role.
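A minimal sketch; the binding name is arbitrary:

```
# Bind your gcloud account to the cluster-admin ClusterRole
kubectl create clusterrolebinding cluster-admin-binding \
    --clusterrole=cluster-admin \
    --user=$(gcloud config get-value account)
```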

Now it’s time to install Gloo.

Installing Gloo

We already have a great set of guides around installing Gloo using glooctl, but the short version can be achieved by running the following commands to install glooctl and then install Gloo as a gateway:
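Roughly, that looks like the following; the install script URL is the one published in the Gloo docs, but verify it against the current documentation for your version:

```
# Download and install glooctl, then add it to the PATH
curl -sL https://run.solo.io/gloo/install | sh
export PATH=$HOME/.gloo/bin:$PATH

# Install Gloo in gateway mode
glooctl install gateway
```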

The installation starts normally but then hangs until you cancel it. One of the first things the Gloo installation does is create a Kubernetes job that sets up the certificates for the installation, using the container image quay.io/solo-io/certgen hosted on quay.io. If we cancel the installation and take a look at the existing pods, we can see that the gateway-certgen-xxxxx pod is stuck in an ImagePullBackOff state.
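Gloo installs into the gloo-system namespace by default, so the pods can be inspected with:

```
kubectl get pods -n gloo-system
```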

Essentially, it means that the image could not be pulled from the repository for some reason. In our case, the reason is simple. The worker nodes in the private cluster have no access to the internet because they do not have a public IP address or a NAT gateway. The simplest solution is to enable Cloud NAT for the VPC.

There are two steps to this process: create a Cloud Router for the region where you want to set up Cloud NAT (assuming you don’t already have one) and then create the Cloud NAT associated with the VPC. Here are the commands to create the Cloud Router and Cloud NAT:
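Something like the following, where the router and NAT configuration names are illustrative:

```
# Create a Cloud Router in the region (name is illustrative)
gcloud compute routers create gloo-nat-router \
    --network default \
    --region $REGION

# Create a Cloud NAT config on that router covering all subnet ranges
gcloud compute routers nats create gloo-nat-config \
    --router gloo-nat-router \
    --region $REGION \
    --auto-allocate-nat-external-ips \
    --nat-all-subnet-ip-ranges
```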

Now let’s run the glooctl installation again:

Ouch, our previous installation attempt failed. We need to do a little cleanup. Fortunately we can run the uninstall command to clean up the previous attempt.
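The uninstall command removes the resources left behind by the partial installation:

```
glooctl uninstall
```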

Great. Let’s try this installation one more time.

Voila! Gloo gateway has been successfully installed, and we should have no further issues pulling images for other Kubernetes deployments. Speaking of which, let’s try to deploy the Pet Store app and use Gloo Gateway to front-end it, just like in our Hello World example.

Creating Routing and Virtual Services

We have a working installation of Gloo gateway in our GKE private cluster. Now we are going to deploy the Pet Store application and attempt to create a Route and Virtual Service to the application.

You can install the Pet Store application by running the following command:
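The manifest below is the one referenced in the Gloo Hello World guide; the exact URL may vary by Gloo version, so treat the path as an assumption and check the docs:

```
# Deploy the Pet Store sample app (manifest path assumed from the Gloo Hello World guide)
kubectl apply -f https://raw.githubusercontent.com/solo-io/gloo/master/example/petstore/petstore.yaml
```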

This will create the Pet Store application deployment and Gloo will automatically create an Upstream for the application. Let’s add a route to the application to our Gloo Gateway:
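Assuming discovery has created an Upstream named default-petstore-8080 (Gloo names discovered Upstreams namespace-service-port), the route command from the Hello World example looks like this:

```
# Route /all-pets on the gateway to /api/pets on the Pet Store upstream
glooctl add route \
    --path-exact /all-pets \
    --dest-name default-petstore-8080 \
    --prefix-rewrite /api/pets
```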

Oh no! The command timed out. Why did this happen? The short answer is that the admission controller on the master node has a validating webhook that talks to the Gateway pod to validate the configuration; you can find more information in the official Gloo docs. The webhook tries to reach the pod on port 8443, but the firewall rules only allow TCP ports 443 and 10250 from the master node to the worker nodes.

Once again the solution is simple: we can either alter the existing firewall rule to add a port or create a new firewall rule. We think it’s best to add a new firewall rule for clarity’s sake. We will do that by collecting some information and then creating the rule.

For starters, we need the source IP address range of the master node(s). We already know that it will be 192.168.42.0/28 based on how we provisioned the cluster.

Next we need to know the network tag being used for the worker nodes. We can find that by running a gcloud command to look at the firewall rules associated with the cluster we created. Shout-out to this Linkerd post that figured out the JSON parsing logic.
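A sketch of that lookup, assuming jq is installed; the filter pattern is illustrative:

```
# Grab the target tag from one of the firewall rules GKE created for the cluster
export NETWORK_TAG=$(gcloud compute firewall-rules list \
    --filter "name~gke-$CLUSTER_NAME" \
    --format json | jq -r '.[0].targetTags[0]')
echo $NETWORK_TAG
```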

Now we can create the firewall rule to allow TCP traffic on port 8443.
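For example (the rule name is illustrative; the source range is the master CIDR we used when creating the cluster):

```
# Allow the master range to reach the validation webhook on TCP 8443
gcloud compute firewall-rules create gke-gloo-webhook-8443 \
    --network default \
    --allow tcp:8443 \
    --source-ranges 192.168.42.0/28 \
    --target-tags $NETWORK_TAG
```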

Running the route create command again will succeed.

And we can test the gateway by using curl, just like in the Hello World example:
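Using glooctl to look up the proxy URL:

```
# Fetch the /all-pets route through the Gloo gateway proxy
curl $(glooctl proxy url)/all-pets
```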

Conclusion

In this post we showed how to add the Cloud NAT feature to provide internet access for your worker nodes, and how to add a firewall rule to enable admission validation. Private clusters on GKE provide additional security and protection for your Kubernetes cluster, but they also require additional considerations when it comes to networking.

Aside from what we reviewed in the post, consideration must be given to the primary and secondary IP address ranges used for the cluster. Additional firewall rules may be necessary as you deploy new components in the cluster that integrate with the API server on the master node(s). Before you deploy a new application to your private cluster, review its requirements for networking and master node communication.
