Many of you have chosen Red Hat OpenShift to orchestrate Kubernetes containers on-premises. At the same time, most of you are also adopting (or about to adopt) a service mesh to connect your containerized applications, and Istio is becoming the de facto industry standard for service mesh management. Open source Istio has many useful features, including service-to-service communications with mutual Transport Layer Security (mTLS), enabling canary deployments of new software builds, and providing telemetry data for observability. As your reliance on Istio within OpenShift increases, you'll very quickly realize that to avoid unplanned interruptions you should run Istio on multiple clusters and set up cross cluster communications and failover. These features increase reliability and reduce your risk.
Gloo Mesh is a management plane that simplifies operations and workflows of service mesh installations across multiple clusters and deployment footprints, building on the strengths of Istio. With Gloo Mesh, you can install, discover, and operate a service-mesh deployment across your enterprise, deployed on premises or in the cloud, even across heterogeneous service-mesh implementations. In this blog, I'll explain how to deploy Istio (1.9) on multiple OpenShift (4.6.22) clusters on IBM Cloud and how to leverage Gloo Mesh for:
- mTLS between pods running on different clusters
- Locality-based failover
- Global access control
- Global observability
Preparation
First of all, we need a few OpenShift clusters, three in fact. We'll deploy the management plane, Gloo Mesh, on one of these clusters and Istio on the other two clusters. Be aware that Gloo Mesh could also be deployed on one of these Istio clusters, but here we'll deploy it on a separate cluster to show that it doesn't depend on Istio.
Let's deploy these three OpenShift clusters using IBM Cloud. It's easy, and our cost can be covered by free credits from IBM for this exercise, though this approach would run the same anywhere. For our example, three worker nodes (b3c.4x16: 4 vCPUs, 16 GB RAM) per cluster is more than enough. When the OpenShift clusters are ready, let's rename the Kubernetes contexts to `mgmt`, `cluster1`, and `cluster2`.
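The context names created by the IBM Cloud CLI are usually longer, so a quick way to rename them is shown below (a minimal sketch; replace the placeholder names with the contexts created for your clusters):
# rename the generated kubeconfig contexts to the short names used throughout this post
kubectl config rename-context <original-mgmt-context> mgmt
kubectl config rename-context <original-cluster1-context> cluster1
kubectl config rename-context <original-cluster2-context> cluster2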
How to deploy Istio on OpenShift
There are a few specific things we have to do to deploy Istio on OpenShift, but they are well documented here. By default, OpenShift doesn't allow containers running with user ID 0. Let's enable containers running with UID 0 for Istio's service accounts by running the commands below:
oc --context cluster1 adm policy add-scc-to-group anyuid system:serviceaccounts:istio-system
oc --context cluster1 adm policy add-scc-to-group anyuid system:serviceaccounts:istio-operator
oc --context cluster2 adm policy add-scc-to-group anyuid system:serviceaccounts:istio-system
oc --context cluster2 adm policy add-scc-to-group anyuid system:serviceaccounts:istio-operator
Note that the second command isn't in the Istio documentation, but it is needed to deploy Istio using the Operator approach. We can deploy Istio on `cluster1` using the following yaml:
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istiocontrolplane-default
  namespace: istio-system
spec:
  profile: openshift
  meshConfig:
    accessLogFile: /dev/stdout
    enableAutoMtls: true
    defaultConfig:
      envoyMetricsService:
        address: enterprise-agent.gloo-mesh:9977
      envoyAccessLogService:
        address: enterprise-agent.gloo-mesh:9977
      proxyMetadata:
        ISTIO_META_DNS_CAPTURE: "true"
        ISTIO_META_DNS_AUTO_ALLOCATE: "true"
        GLOO_MESH_CLUSTER_NAME: cluster1
  values:
    global:
      meshID: mesh1
      multiCluster:
        clusterName: cluster1
      trustDomain: cluster1
      network: network1
      meshNetworks:
        network1:
          endpoints:
          - fromRegistry: cluster1
          gateways:
          - registryServiceName: istio-ingressgateway.istio-system.svc.cluster.local
            port: 443
        vm-network:
  components:
    ingressGateways:
    - name: istio-ingressgateway
      label:
        topology.istio.io/network: network1
      enabled: true
      k8s:
        env:
          # sni-dnat adds the clusters required for AUTO_PASSTHROUGH mode
          - name: ISTIO_META_ROUTER_MODE
            value: "sni-dnat"
          # traffic through this gateway should be routed inside the network
          - name: ISTIO_META_REQUESTED_NETWORK_VIEW
            value: network1
        service:
          ports:
            - name: http2
              port: 80
              targetPort: 8080
            - name: https
              port: 443
              targetPort: 8443
            - name: tcp-status-port
              port: 15021
              targetPort: 15021
            - name: tls
              port: 15443
              targetPort: 15443
            - name: tcp-istiod
              port: 15012
              targetPort: 15012
            - name: tcp-webhook
              port: 15017
              targetPort: 15017
    pilot:
      k8s:
        env:
        - name: PILOT_SKIP_VALIDATE_TRUST_DOMAIN
          value: "true"
Some of the values aren't mandatory, but we will use the same yaml to demonstrate other features, like virtual machine (VM) integration (stay tuned, we'll probably write another blog on that topic soon!).
Notice a few values:
- the `profile` value is set to `openshift`.
- the `envoyMetricsService` and `envoyAccessLogService` values, which allow Gloo Mesh to consolidate metrics and access logs globally.
- the `trustDomain` value, which ensures a unique identity for each pod globally (as long as each workload uses its own Kubernetes service account).
Let's use the same yaml to deploy Istio on `cluster2`, but replace `cluster1` with `cluster2` and `network1` with `network2`. Note that we use different network values because we don't have a flat network (the pods from different clusters can't communicate directly). Everything described in this blog would also work with a flat network.
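One way to apply these manifests with the Operator approach is sketched below, assuming the two files are saved as istio-cluster1.yaml and istio-cluster2.yaml and that istioctl 1.9 is installed locally (adapt this to however you normally drive the operator):
# install the Istio operator, then hand it the IstioOperator manifest on each cluster
istioctl --context cluster1 operator init
oc --context cluster1 create namespace istio-system
oc --context cluster1 apply -f istio-cluster1.yaml
istioctl --context cluster2 operator init
oc --context cluster2 create namespace istio-system
oc --context cluster2 apply -f istio-cluster2.yaml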
After installation is complete, we need to expose an OpenShift route for the ingress gateway on each cluster:
oc --context cluster1 -n istio-system expose svc/istio-ingressgateway --port=http2
oc --context cluster2 -n istio-system expose svc/istio-ingressgateway --port=http2
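You can optionally verify that the routes were created and note their hostnames:
oc --context cluster1 -n istio-system get route istio-ingressgateway
oc --context cluster2 -n istio-system get route istio-ingressgateway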
Gloo Mesh deployment
Let's install Gloo Mesh using the Helm chart and the following options:
helm install gloo-mesh-enterprise gloo-mesh-enterprise/gloo-mesh-enterprise \
--namespace gloo-mesh --kube-context mgmt \
--set licenseKey=${GLOO_MESH_LICENSE_KEY} \
--set gloo-mesh-ui.GlooMeshDashboard.apiserver.floatingUserId=true
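This assumes the gloo-mesh namespace already exists and the gloo-mesh-enterprise Helm repository has been added (see the Gloo Mesh documentation for the repository URL). A quick sanity check is to make sure all the management plane pods come up:
kubectl --context mgmt -n gloo-mesh get pods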
By default, the `kubernetes-admin` user is granted the Gloo Mesh admin role, but when deploying OpenShift on IBM Cloud, the user has a different name. Let's use the following snippet to update the role binding:
cat > rolebinding-patch.yaml <<EOF
spec:
  roleRef:
    name: admin-role
    namespace: gloo-mesh
  subjects:
  - kind: User
    name: $(kubectl --context mgmt get user -o jsonpath='{.items[0].metadata.name}')
EOF
kubectl --context mgmt -n gloo-mesh patch rolebindings.rbac.enterprise.mesh.gloo.solo.io admin-role-binding --type=merge --patch "$(cat rolebinding-patch.yaml)"
Istio cluster registration
To register the Istio clusters, we need to find the external IP of the Gloo Mesh service:
SVC=$(kubectl --context mgmt -n gloo-mesh get svc enterprise-networking -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
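On some environments, the load balancer in front of this service publishes a hostname rather than an IP address; in that case, read the hostname field instead (an alternative sketch, not needed if the command above already returns an address):
SVC=$(kubectl --context mgmt -n gloo-mesh get svc enterprise-networking -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')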
Now, we can register the Istio clusters using `meshctl`:
meshctl cluster register --mgmt-context=mgmt --remote-context=cluster1 --relay-server-address=$SVC:9900 enterprise cluster1 --cluster-domain cluster.local
meshctl cluster register --mgmt-context=mgmt --remote-context=cluster2 --relay-server-address=$SVC:9900 enterprise cluster2 --cluster-domain cluster.local
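After registration, the clusters should appear as KubernetesCluster objects on the management cluster, and a Gloo Mesh agent should be running on each registered cluster; a quick check, assuming default resource names:
kubectl --context mgmt -n gloo-mesh get kubernetesclusters
kubectl --context cluster1 -n gloo-mesh get pods
kubectl --context cluster2 -n gloo-mesh get pods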
Bookinfo application deployment
The Istio sidecar injected into each application pod runs with user ID 1337, which is not allowed by default in OpenShift. To allow this user ID to be used, execute the following commands:
oc --context cluster1 adm policy add-scc-to-group privileged system:serviceaccounts:default
oc --context cluster1 adm policy add-scc-to-group anyuid system:serviceaccounts:default
CNI on OpenShift is managed by Multus, and it requires a NetworkAttachmentDefinition to be present in the application namespace in order to invoke the istio-cni plugin. We do that by executing the following command:
cat <<EOF | oc --context cluster1 -n default create -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: istio-cni
EOF
Now, we can deploy the `bookinfo` application on `cluster1`:
kubectl --context cluster1 label namespace default istio-injection=enabled
# deploy bookinfo application components for all versions less than v3
kubectl --context cluster1 apply -f https://raw.githubusercontent.com/istio/istio/1.8.2/samples/bookinfo/platform/kube/bookinfo.yaml -l 'app,version notin (v3)'
# deploy all bookinfo service accounts
kubectl --context cluster1 apply -f https://raw.githubusercontent.com/istio/istio/1.8.2/samples/bookinfo/platform/kube/bookinfo.yaml -l 'account'
# configure ingress gateway to access bookinfo
kubectl --context cluster1 apply -f https://raw.githubusercontent.com/istio/istio/1.8.2/samples/bookinfo/networking/bookinfo-gateway.yaml
As you can see, we deployed everything but version `v3` of the `reviews` service. We can follow the same steps to deploy the `bookinfo` application on the second cluster, but this time including version `v3` (a sketch of those commands follows).
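For reference, here is a sketch of the equivalent cluster2 commands (the same SCC changes, NetworkAttachmentDefinition, injection label, and bookinfo manifests, this time without filtering out v3):
oc --context cluster2 adm policy add-scc-to-group privileged system:serviceaccounts:default
oc --context cluster2 adm policy add-scc-to-group anyuid system:serviceaccounts:default
cat <<EOF | oc --context cluster2 -n default create -f -
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: istio-cni
EOF
kubectl --context cluster2 label namespace default istio-injection=enabled
# deploy all bookinfo components and service accounts, including reviews v3
kubectl --context cluster2 apply -f https://raw.githubusercontent.com/istio/istio/1.8.2/samples/bookinfo/platform/kube/bookinfo.yaml
# configure ingress gateway to access bookinfo
kubectl --context cluster2 apply -f https://raw.githubusercontent.com/istio/istio/1.8.2/samples/bookinfo/networking/bookinfo-gateway.yaml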
Here is the current situation: Gloo Mesh running on the management cluster, and Istio with the bookinfo application running on cluster1 and cluster2 (with reviews v3 only on cluster2).
Mesh federation
Gloo Mesh makes it very easy to federate the different Istio clusters. We simply need to create a Virtual Mesh using the following yaml:
apiVersion: networking.mesh.gloo.solo.io/v1
kind: VirtualMesh
metadata:
  name: virtual-mesh
  namespace: gloo-mesh
spec:
  mtlsConfig:
    autoRestartPods: true
    shared:
      rootCertificateAuthority:
        generated: {}
  federation: {}
  globalAccessPolicy: ENABLED
  meshes:
  - name: istiod-istio-system-cluster1
    namespace: gloo-mesh
  - name: istiod-istio-system-cluster2
    namespace: gloo-mesh
It triggers the creation of a root certificate (but we could have provided our own) and the generation of intermediate CA certificates for the different Istio clusters. Basically, it automates what you would otherwise have to do manually by following the Istio documentation. The second thing you get when you create a Virtual Mesh is workload discovery. Gloo Mesh discovers all the services running on one cluster and makes the other clusters aware of them using `ServiceEntries`. For example, Gloo Mesh has created the following `ServiceEntries` on `cluster1`.
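They can be listed with a standard kubectl query; the listing below was captured with a command along these lines:
kubectl --context cluster1 get serviceentries -A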
NAMESPACE NAME HOSTS LOCATION RESOLUTION AGE
istio-system details.default.svc.cluster2.global [details.default.svc.cluster2.global] MESH_INTERNAL DNS 8m12s
istio-system istio-ingressgateway.istio-system.svc.cluster2.global [istio-ingressgateway.istio-system.svc.cluster2.global] MESH_INTERNAL DNS 8m12s
istio-system productpage.default.svc.cluster2.global [productpage.default.svc.cluster2.global] MESH_INTERNAL DNS 8m12s
istio-system ratings.default.svc.cluster2.global [ratings.default.svc.cluster2.global] MESH_INTERNAL DNS 8m12s
istio-system reviews.default.svc.cluster2.global [reviews.default.svc.cluster2.global] MESH_INTERNAL DNS 8m12s
As you see, `cluster1` is now aware of the services running on `cluster2`. Let's have a look at one of these `ServiceEntries`:
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  labels:
    cluster.multicluster.solo.io: ""
    owner.networking.mesh.gloo.solo.io: gloo-mesh
    relay-agent: cluster1
  name: reviews.default.svc.cluster2.global
  namespace: istio-system
spec:
  addresses:
  - x.x.x.x
  endpoints:
  - address: y.y.y.y
    labels:
      cluster: cluster2
    ports:
      http: 15443
  hosts:
  - reviews.default.svc.cluster2.global
  location: MESH_INTERNAL
  ports:
  - name: http
    number: 9080
    protocol: HTTP
  resolution: DNS
You can see that Gloo Mesh has assigned a unique IP address and built an endpoint entry to specify how to reach this service (using the ingress gateway of `cluster2` in this case).
Note that there's a native multi-cluster discovery mechanism in Istio called Endpoint Discovery Service (EDS), but it has some limitations, namely:
- it doesn't create `ServiceEntries`, so users don't have any visibility into which services have been discovered
- each Istio cluster needs to discover the services of all the other clusters, so it generates more load
- each Istio cluster discovers services by communicating with the Kubernetes API servers of the other clusters, so you need to share the Kubernetes credentials of all the clusters with all the clusters, which is a security concern
- if a Kubernetes API server isn't available, `istiod` can't start
The Gloo Mesh discovery mechanism doesn't have any of these limitations. The services are discovered by a local agent running on each cluster which passes the information to the management plane through a secure gRPC channel.
Global Access Control
Perhaps you've noticed that we created the Virtual Mesh with the `globalAccessPolicy` option enabled. When doing so, Gloo Mesh creates Istio `AuthorizationPolicies` on all the Istio clusters to forbid all service-to-service communication by default. We can then create Gloo Mesh `AccessPolicies` to define which services are allowed to communicate with each other (globally). Here is an example that allows the `productpage` service on `cluster1` to communicate with the `details` and `reviews` services on any cluster:
apiVersion: networking.mesh.gloo.solo.io/v1
kind: AccessPolicy
metadata:
  namespace: gloo-mesh
  name: productpage
spec:
  sourceSelector:
  - kubeServiceAccountRefs:
      serviceAccounts:
      - name: bookinfo-productpage
        namespace: default
        clusterName: cluster1
  destinationSelector:
  - kubeServiceMatcher:
      namespaces:
      - default
      labels:
        service: details
  - kubeServiceMatcher:
      namespaces:
      - default
      labels:
        service: reviews
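Like all Gloo Mesh resources, this policy is created on the management cluster; a minimal sketch, assuming the yaml above is saved as productpage-policy.yaml:
kubectl --context mgmt apply -f productpage-policy.yaml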
Gloo Mesh will translate this `AccessPolicy` into the following `AuthorizationPolicies` on each cluster:
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  labels:
    cluster.multicluster.solo.io: ""
    owner.networking.mesh.gloo.solo.io: gloo-mesh
    relay-agent: cluster1
  name: details
  namespace: default
spec:
  rules:
  - from:
    - source:
        principals:
        - cluster1/ns/default/sa/bookinfo-productpage
  selector:
    matchLabels:
      app: details
and
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  labels:
    cluster.multicluster.solo.io: ""
    owner.networking.mesh.gloo.solo.io: gloo-mesh
    relay-agent: cluster1
  name: reviews
  namespace: default
spec:
  rules:
  - from:
    - source:
        principals:
        - cluster1/ns/default/sa/bookinfo-productpage
  selector:
    matchLabels:
      app: reviews
Gloo Mesh also shows which currently running services match the criteria of the policy.
Cross-cluster communication
To allow a service on one cluster to communicate with a service on another cluster, you would normally need to create several Istio objects (such as `VirtualServices` and `DestinationRules`). Gloo Mesh makes this easier with `TrafficPolicies`. Here is an example that defines a traffic shift between different versions of a service running in different clusters:
apiVersion: networking.mesh.gloo.solo.io/v1
kind: TrafficPolicy
metadata:
  namespace: gloo-mesh
  name: simple
spec:
  sourceSelector:
  - kubeWorkloadMatcher:
      namespaces:
      - default
  destinationSelector:
  - kubeServiceRefs:
      services:
      - clusterName: cluster1
        name: reviews
        namespace: default
  policy:
    trafficShift:
      destinations:
      - kubeService:
          clusterName: cluster2
          name: reviews
          namespace: default
          subset:
            version: v3
        weight: 75
      - kubeService:
          clusterName: cluster1
          name: reviews
          namespace: default
          subset:
            version: v1
        weight: 15
      - kubeService:
          clusterName: cluster1
          name: reviews
          namespace: default
          subset:
            version: v2
        weight: 10
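As with the AccessPolicy, this TrafficPolicy is applied on the management cluster, and Gloo Mesh translates it into the corresponding Istio VirtualServices and DestinationRules on the workload clusters. A quick way to apply it and see the result (a sketch, assuming the yaml above is saved as reviews-shift.yaml):
kubectl --context mgmt apply -f reviews-shift.yaml
kubectl --context cluster1 -n default get virtualservices,destinationrules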
Very easy, no? Gloo Mesh can also be used to define locality-based failover. The example below defines a new hostname called `reviews.global` which will be available on any cluster. This allows a service to communicate with the local `reviews` service if it's available, or to automatically use a remote instance if it's not (going to the next zone first, then the next region):
apiVersion: networking.enterprise.mesh.gloo.solo.io/v1beta1
kind: VirtualDestination
metadata:
  name: reviews-global
  namespace: gloo-mesh
spec:
  hostname: reviews.global
  port:
    number: 9080
    protocol: http
  localized:
    outlierDetection:
      consecutiveErrors: 1
      maxEjectionPercent: 100
      interval: 5s
      baseEjectionTime: 120s
    destinationSelectors:
    - kubeServiceMatcher:
        labels:
          app: reviews
  virtualMesh:
    name: virtual-mesh
    namespace: gloo-mesh
A `TrafficPolicy` can then be created to use the `VirtualDestination` in a transparent way (without explicitly sending requests to the `reviews.global` hostname), as sketched below.
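Here is a sketch of such a TrafficPolicy. The exact schema (in particular the virtualDestination reference under trafficShift) depends on your Gloo Mesh version, so treat this as an illustration and check the VirtualDestination documentation for the version you run:
apiVersion: networking.mesh.gloo.solo.io/v1
kind: TrafficPolicy
metadata:
  namespace: gloo-mesh
  name: reviews-failover
spec:
  sourceSelector:
  - kubeWorkloadMatcher:
      namespaces:
      - default
  destinationSelector:
  - kubeServiceRefs:
      services:
      - clusterName: cluster1
        name: reviews
        namespace: default
  policy:
    trafficShift:
      destinations:
      # requests addressed to the local reviews service are routed to the
      # locality-aware reviews-global VirtualDestination instead
      - virtualDestination:
          name: reviews-global
          namespace: gloo-mesh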
Global observability
As explained at the beginning of the post, we deployed Istio so that all the Envoy proxies send their metrics to the local Gloo Mesh agent, which then passes this data to the Gloo Mesh management plane. We can then configure a Prometheus instance to scrape the metrics from Gloo Mesh and visualize them using Grafana or Kiali. We'll also be adding the ability to see visualizations directly in the Gloo Mesh admin dashboard in the near future. We can also gather access logs globally on demand.
Let's say we have an issue with the `reviews` service on all the clusters and want to understand what's going on. We can start gathering all the corresponding access logs by creating an `AccessLogRecord`:
apiVersion: observability.enterprise.mesh.gloo.solo.io/v1
kind: AccessLogRecord
metadata:
  name: access-log-reviews
  namespace: gloo-mesh
spec:
  workloadSelectors:
  - kubeWorkloadMatcher:
      namespaces:
      - default
      labels:
        app: reviews
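This resource, too, is applied on the management cluster; a minimal sketch, assuming the yaml above is saved as access-log-reviews.yaml:
kubectl --context mgmt apply -f access-log-reviews.yaml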
Then, we can send a request to the Gloo Mesh endpoint to get access logs like below:
{
"result": {
"workloadRef": {
"name": "reviews-v2",
"namespace": "default",
"clusterName": "cluster1"
},
"httpAccessLog": {
"commonProperties": {
"downstreamRemoteAddress": {
"socketAddress": {
"address": "10.102.158.19",
"portValue": 47198
}
},
"downstreamLocalAddress": {
"socketAddress": {
"address": "10.102.158.25",
"portValue": 9080
}
},
"tlsProperties": {
"tlsVersion": "TLSv1_2",
"tlsCipherSuite": 49200,
"tlsSniHostname": "outbound_.9080_._.reviews.default.svc.cluster.local",
"localCertificateProperties": {
"subjectAltName": [
{
"uri": "spiffe://cluster1/ns/default/sa/bookinfo-reviews"
}
]
},
"peerCertificateProperties": {
"subjectAltName": [
{
"uri": "spiffe://cluster1/ns/default/sa/bookinfo-productpage"
}
]
}
},
"startTime": "2021-03-21T17:33:46.182478Z",
"timeToLastRxByte": "0.000062572s",
"timeToFirstUpstreamTxByte": "0.000428530s",
"timeToLastUpstreamTxByte": "0.000436843s",
"timeToFirstUpstreamRxByte": "0.040638581s",
"timeToLastUpstreamRxByte": "0.040692768s",
"timeToFirstDownstreamTxByte": "0.040671495s",
"timeToLastDownstreamTxByte": "0.040708877s",
"upstreamRemoteAddress": {
"socketAddress": {
"address": "127.0.0.1",
"portValue": 9080
}
},
"upstreamLocalAddress": {
"socketAddress": {
"address": "127.0.0.1",
"portValue": 43078
}
},
"upstreamCluster": "inbound|9080||",
"metadata": {
"filterMetadata": {
"istio_authn": {
"request.auth.principal": "cluster1/ns/default/sa/bookinfo-productpage",
"source.namespace": "default",
"source.principal": "cluster1/ns/default/sa/bookinfo-productpage",
"source.user": "cluster1/ns/default/sa/bookinfo-productpage"
}
}
},
"routeName": "default",
"downstreamDirectRemoteAddress": {
"socketAddress": {
"address": "10.102.158.19",
"portValue": 47198
}
}
},
"protocolVersion": "HTTP11",
"request": {
"requestMethod": "GET",
"scheme": "http",
"authority": "reviews:9080",
"path": "/reviews/0",
"userAgent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36",
"requestId": "b0522245-d300-46a4-bfd3-727fcfa42efd",
"requestHeadersBytes": "644"
},
"response": {
"responseCode": 200,
"responseHeadersBytes": "1340",
"responseBodyBytes": "379",
"responseCodeDetails": "via_upstream"
}
}
}
}
As you can see, it's far more detailed than traditional access logs. We can get information about the source and remote IP addresses, the identity of the source and remote services, performance metrics and obviously the traditional HTTP headers.
As shown here, setting up an Istio multi-cluster deployment on Red Hat OpenShift is easy with Gloo Mesh. You can run the latest Istio version, get enterprise support from Solo.io, and operate reliably everywhere. High availability is an important piece of business continuity, so make sure your applications are covered with redundancies and failover. Observability is just as important, so you can monitor what is happening and respond quickly to any issues. We hope this blog helped you understand how to gain more resiliency for your applications.
You might also be interested in reading some of our documentation on multi-cluster meshes or troubleshooting. Or request a demo from our Gloo Mesh product page today!
And much more
We can't cover all the Gloo Mesh features in this blog - there are too many! For example, Gloo Mesh comes with fine-grained role-based access control (RBAC), which allows you to define who can create what kind of policies with what kind of content. It also allows you to declaratively deploy Web Assembly filters. Keep exploring our blog for more information on other topics, and let us know if you have any questions!