Using service mesh to support government standards for zero trust architecture
Organizations no longer rely on a monolithic application system – instead, service mesh provides secure access to critical workloads when and where it’s needed. With service mesh, government agencies can take a security-first approach, allowing teams to build zero trust into the core of their system configurations.
In this post, we’ll explore the basics of service mesh architecture, including approaches for traffic management and observability, and show how to configure a secure, resilient platform for your mission-critical workload.
Understanding service mesh and service mesh with Istio
As organizations modernize their application stacks with ephemeral systems, a deployed workload is no longer guaranteed to stay running, which creates a dynamic environment.
For these applications to do anything valuable, they need to communicate with each other and external systems. That’s all done over a network. In the past, developers would build that infrastructure. Now, a service mesh allows for externalizing those cross-cutting concerns.
A service mesh is a dedicated infrastructure layer that you add alongside your applications. It lets you transparently add capabilities without writing them into your own code, such as:
- Secure service-to-service communication
- Traffic management
- Policy-based access control
Istio, in particular, is an open source service mesh project that deploys a sidecar proxy next to every application instance, which intercepts traffic in and out to achieve:
- L7 application identity and encryption in transit
- Per-request policy and controls
- Service discovery, load balancing, and resiliency
- Operational telemetry: metrics, logs, and traces
Istio then controls the proxies centrally with declarative configuration and dynamic updates.
Istio Ambient Mesh is a new open source contribution to the Istio project that defines a new sidecar-less data plane, with the outcomes of simplifying operations, reducing costs, and improving performance.
Managing traffic, achieving observability and security with service mesh
Among the capabilities of service mesh, controlling traffic is a huge benefit. The service mesh controls traffic flow and requests between services using:
- Circuit breakers
- Canary rollouts
- A/B testing
- Staged rollouts
- Percentage-based traffic splits
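To make the last item concrete, here is a minimal Python sketch of a percentage-based traffic split of the kind used in a canary rollout. It is an illustration only, not Istio's implementation or API: the function names `pick_version` and `route` and the version labels `v1` and `v2` are hypothetical. In a real mesh, the split is declared in configuration and applied by the proxies.

```python
import random


def pick_version(weights, point):
    """Return the version whose cumulative weight range contains `point`.

    `weights` maps a version label to an integer percentage; the
    percentages must sum to 100. `point` is an integer in [0, 100).
    """
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    upper = 0
    for version, weight in weights.items():
        upper += weight
        if point < upper:
            return version
    raise ValueError("point must be in [0, 100)")


def route(weights, rng=random):
    """Pick a version for one request using a random point in [0, 100)."""
    return pick_version(weights, rng.randrange(100))


# Hypothetical canary: 90% of requests go to v1, 10% to the new v2.
canary_weights = {"v1": 90, "v2": 10}
chosen = route(canary_weights)
```

Shifting more traffic to the canary then becomes a change to the weights rather than a change to either application.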
Another important capability of the service mesh is collecting telemetry data as requests travel between applications over the network, without forcing the applications to do this themselves. That data includes:
- Metrics such as health, traffic flow, and error reports (Kiali)
- Logging and scraping metrics (Prometheus and Grafana)
- Distributed tracing, including monitoring transactions (Jaeger)
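The idea of collecting telemetry without touching application code can be sketched in a few lines of Python. This is a simplified stand-in for what a sidecar proxy does, not how Envoy or Istio is implemented: `Telemetry`, `with_telemetry`, and the sample handler are hypothetical names chosen for illustration.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Telemetry:
    """Counters a proxy might keep for one service."""
    requests: int = 0
    errors: int = 0
    latencies_ms: list = field(default_factory=list)


def with_telemetry(handler, telemetry, clock=time.perf_counter):
    """Wrap `handler` so every call is counted and timed.

    The handler itself is unchanged and unaware of the wrapper.
    """
    def proxied(request):
        start = clock()
        telemetry.requests += 1
        try:
            return handler(request)
        except Exception:
            telemetry.errors += 1
            raise
        finally:
            # Record latency for successful and failed requests alike.
            telemetry.latencies_ms.append((clock() - start) * 1000.0)
    return proxied


def sample_handler(request):
    """Hypothetical application logic with no telemetry code in it."""
    if request == "bad":
        raise ValueError("cannot process request")
    return request.upper()


stats = Telemetry()
proxied_handler = with_telemetry(sample_handler, stats)
```

Because the measurement lives in the wrapper, the same counters are available for every service, whatever language or framework it was written in.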
At the ground level, existing controls, as well as future ones, can be applied to zero trust and service mesh. Keeping configuration simple and expressed as code is key, as is the ability to push updates out dynamically through the mesh.
Users can build zero trust from the ground up through these configurations, allowing for enforcement of connectivity – what can talk to what – and creating the ability to replicate instead of building one-off systems. Service mesh provides valuable flexibility.
Enforcing NIST SP 800-204 series with service mesh
A service mesh like Istio is a platform component that allows users to start rolling out cross-cutting features and changes across the organization and allows for centralized control with distributed enforcement. Central teams can manage policy on behalf of their organization.
NIST SP 800-204A deals with the best practices and reference architectures and implementations for securing microservices. The special publication provides guidance around having:
- Mutual TLS (mTLS) between all services
- All ingress and egress traffic flowing through policy-controlled proxies that are dynamically configured, separately from application changes
- Policy attached to externally consumed resources with explicit intent
- PKI and certificate rotation
- A cryptographic workload identity
These are not necessarily skill sets that developers and architects naturally have, but a service mesh provides a single place to define that guidance as policy and then have it enforced anywhere.
In practice, small, focused teams can take on concerns, like a central security team that takes over PKI certificates and encryption in transit and can own that for all the teams that are coming to a platform. That results in a few employees with a specific skill set who can apply a policy across a large developer population and a large application or service population.
An organization’s platform ends up being any of these enforcement points.
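To show what mutual TLS asks of a server, here is a minimal sketch using Python's standard `ssl` module. In a mesh, the sidecar proxy handles this on the application's behalf, so this is not how Istio configures mTLS. The function name `build_mtls_server_context` and its file-path parameters are hypothetical; the `ssl` calls themselves are standard library.

```python
import ssl


def build_mtls_server_context(cert_file=None, key_file=None, ca_file=None):
    """Build a server-side TLS context that requires a client certificate.

    cert_file and key_file identify this workload; ca_file is the
    certificate authority trusted to sign peer workload certificates.
    All three are optional here only so the sketch runs without files.
    """
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # The "mutual" part: handshakes without a valid client cert are rejected.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


context = build_mtls_server_context()
```

Certificate issuance and rotation are left out of the sketch. Those are the tasks a central security team can own through the mesh, so that individual application teams do not have to.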
NIST SP 800-204A talks about the ways to secure the application, including the attributes and control points to author policy on. The key idea is to be able to look at an application and its traffic and determine:
- Do I want to look at upstream services and downstream services?
- Do I want to look at the identity of the service?
- Do I want to look at characteristics of the request, like headers or other information that’s being passed?
- Do I want to look at end user identities, like a JSON Web Token (JWT) and the claims that are contained within it?
With this information, users can start to centralize the authoring of the policy, then enforce those policies wherever applications are running – providing strong assurances that these policies are in effect.
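The questions above map naturally onto a policy check. The Python sketch below evaluates a request against centrally authored rules using the service identity, request characteristics, and end-user claims. It is a simplified model, not Istio's authorization API: the `Request` and `Rule` types, the `is_allowed` function, and the example identities are all hypothetical, and the JWT claims are assumed to have been verified already.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Request:
    source_identity: str  # identity of the calling workload
    method: str
    path: str
    headers: dict = field(default_factory=dict)
    jwt_claims: dict = field(default_factory=dict)  # assumed pre-verified


@dataclass(frozen=True)
class Rule:
    allowed_sources: frozenset
    methods: frozenset
    path_prefix: str
    required_headers: dict = field(default_factory=dict)
    required_claims: dict = field(default_factory=dict)

    def matches(self, req):
        """True only if every attribute of the request satisfies the rule."""
        return (
            req.source_identity in self.allowed_sources
            and req.method in self.methods
            and req.path.startswith(self.path_prefix)
            and all(req.headers.get(k) == v
                    for k, v in self.required_headers.items())
            and all(req.jwt_claims.get(k) == v
                    for k, v in self.required_claims.items())
        )


def is_allowed(req, rules):
    """Default deny: a request passes only if at least one rule matches."""
    return any(rule.matches(req) for rule in rules)


# Hypothetical policy: only the frontend may read orders, for admin users.
FRONTEND = "spiffe://cluster.local/ns/web/sa/frontend"
rules = [
    Rule(
        allowed_sources=frozenset({FRONTEND}),
        methods=frozenset({"GET"}),
        path_prefix="/orders",
        required_claims={"role": "admin"},
    )
]
```

The default-deny choice reflects the zero trust stance: anything not explicitly allowed by a rule is refused, wherever the check runs.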
Service mesh forms a security kernel. With Envoy as the policy enforcement point (PEP), users can move security concerns out of the application and into the mesh. The result is a small, focused set of code that handles security capabilities. In a distributed system, there’s now a single place to focus scrutiny, auditing, and assessment to ensure safety and security.
Success with service mesh
Achieving success with service mesh means there’s an alignment and dialogue with those who own the application and the platform itself. To do this at scale, with significant automation, it’s critical to find the right semantics to expose these capabilities to the application owners so they can focus on other priorities.
It’s also important to build automated processes to roll out policy from the start.
Challenges with service mesh
Complexity starts to increase when looking at multiple clusters, and potentially spreading those across multiple clouds or a hybrid or legacy infrastructure scenario. That’s where automation and the toolchains that drive these come into play.
Defining the semantics for how applications are described, what their intents are, and what they can do is also critical. It should be treated as part of an assembly line that makes the process as simple as possible for developers.
Example: Platform One
Platform One, which is part of the Department of Defense Enterprise DevSecOps Initiative (DSOP), provides tooling, automation, and the compute platform to do much of this work for its development teams that are building cloud native services.
The paper describing this effort uses the term “baked in” frequently. Istio is one of the key ingredients in the post-deploy, runtime security layer that allows for that “baked in” feature.
How Gloo Platform can help
Solo.io provides a government-ready zero trust architecture (ZTA) built on federal government cybersecurity requirements, including FIPS 140-2 and NIST SP 800-204A and SP 800-207.
Gloo Platform uses best-practice patterns and architecture for building secure microservice applications. High-level abstractions let teams use GitOps-style approaches to ensure proper inspection and adherence to corporate standards and policies. Industry-standard observability tooling lets operations teams monitor application networking metrics using open source and/or enterprise monitoring software.
Built around zero trust architecture, Istio, Gloo Gateway, and Gloo Mesh provide government organizations and system integrators with the centralized command and control required for FedRAMP certification.
Liked this blog post? Watch our presentation on DevSecOps for Government Systems with Defense Unicorns and Tetrate for an expert discussion.