What is a service mesh?

A service mesh enables you to manage communications between individual services within a microservices architecture. It decouples network logic from the business logic of each microservice, ensuring you can implement and manage networking and communication consistently across the entire system. 

A service mesh is an infrastructure layer that includes network proxies deployed alongside each service instance, collectively known as the data plane. In addition, it has a control plane that can configure and manage these proxies at large scale.

The need for a service mesh architecture

Service mesh architectures arose as a solution to many of the problems associated with microservices. Many development teams have moved away from monolithic application development to microservices architectures. This shift splits the application from a monolithic unit into a collection of autonomous services. The challenge is finding an efficient way for these microservices to communicate with each other.

In a microservices architecture, application performance depends on services working together quickly and efficiently to share data and provide functionality. For example, a web-based email application might consist of a login service that handles user registration and authentication, a UI service that displays the web-based interface, and a database that stores emails and contacts. These three services must communicate perfectly with each other, and any lapse in communication will result in a poor user experience. 

Microservices communicate through APIs, so it is important to find a good solution for service discovery and routing. Developers also need to ensure that communication is secure. A firewall protects applications from external attacks, but a microservices architecture has a flat, open network, and if any one service is compromised, attackers could gain access to the entire system.

Before the advent of service mesh, traffic routing was handled by load balancers. However, load balancers are complex to deploy, costly, and difficult to operate in a microservices environment. The service mesh was envisioned as a solution to all of these problems. Service mesh solutions:

  • Provide a centralized control plane for the network layer of a microservices application
  • Integrate with all services through a proxy
  • Are easier to configure and scale than load balancers
  • Enable central control over routing rules without requiring any changes to services
  • Implement networking and communication logic at the platform layer, rather than requiring it to be built into each individual microservice

The 4 components of a service mesh architecture

Sidecar Proxies

A service mesh architecture adds an extra hop to every call, because calls to a service must go through a proxy. To minimize additional latency, proxies run on the same machine (virtual or physical), or on the same pod (in Kubernetes clusters), as the microservice. This allows the proxy to communicate with the service quickly via localhost. This model is called a “sidecar” deployment, and therefore service mesh proxies are known as “sidecar proxies”.
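The extra hop can be pictured with a small in-process model. This is only a sketch under stated assumptions: `SidecarProxy`, `login_service`, and the call log are hypothetical names chosen for illustration, and a plain function call stands in for the localhost connection a real sidecar would use.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical types for illustration; not the API of any real service mesh.
Handler = Callable[[str], str]


@dataclass
class SidecarProxy:
    """Stands in for a proxy deployed next to one service instance."""
    service_name: str
    local_service: Handler  # reached via localhost in a real deployment
    call_log: List[str] = field(default_factory=list)

    def handle_inbound(self, request: str) -> str:
        # The extra hop: every inbound call passes through the proxy first.
        self.call_log.append(f"inbound to {self.service_name}: {request}")
        return self.local_service(request)


def login_service(request: str) -> str:
    """Hypothetical business logic, unaware that a proxy sits in front of it."""
    return f"login ok for {request}"


login_proxy = SidecarProxy("login", login_service)
response = login_proxy.handle_inbound("alice")
print(response)  # login ok for alice
```

Note that `login_service` contains no networking concerns at all; everything the proxy does happens outside the service code.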

Node Proxies

Like a sidecar proxy, a node proxy adds an extra hop to every call, because calls to a service must go through a proxy. To minimize additional latency, the proxy runs on the same machine (virtual or physical) as the microservice, but it is deployed once per node rather than once per service instance. This model is called a “node-level proxy” deployment, and it was introduced in 2022 as part of Istio Ambient Mesh.

The Data Plane

In a service mesh architecture, the data plane refers to a network of proxies deployed together with individual microservices. Sidecar proxies are deployed with each instance of a service that needs to communicate with other services. All service calls go through these proxies, which perform authentication, authorization, encryption, rate limiting, and load balancing, handle service discovery, and enable logging and tracing.
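To make a few of these proxy responsibilities concrete, here is a minimal sketch of authorization, rate limiting, service discovery, and round-robin load balancing in one place. The class `OutboundProxy`, its parameters, and the addresses are all hypothetical, and the rate limit is a deliberately crude fixed call budget rather than what a production proxy would use.

```python
import itertools
from typing import Dict, List, Set


class OutboundProxy:
    """Hypothetical proxy that applies policy before forwarding a call."""

    def __init__(self, endpoints: Dict[str, List[str]],
                 allowed_callers: Set[str], max_calls: int) -> None:
        # Service discovery: service name -> known instance addresses.
        self._cycles = {name: itertools.cycle(addrs)
                        for name, addrs in endpoints.items()}
        self._allowed = allowed_callers
        self._remaining = max_calls  # crude rate limit: a fixed call budget

    def route(self, caller: str, service: str) -> str:
        """Return the instance address the call should be forwarded to."""
        if caller not in self._allowed:
            raise PermissionError(f"{caller} is not authorized")
        if self._remaining <= 0:
            raise RuntimeError("rate limit exceeded")
        self._remaining -= 1
        # Round-robin load balancing across the discovered instances.
        return next(self._cycles[service])


db_proxy = OutboundProxy(
    endpoints={"db": ["10.0.0.1:5432", "10.0.0.2:5432"]},
    allowed_callers={"ui"},
    max_calls=3,
)
picks = [db_proxy.route("ui", "db") for _ in range(3)]
print(picks)  # ['10.0.0.1:5432', '10.0.0.2:5432', '10.0.0.1:5432']
```

A fourth call from `ui` would raise `RuntimeError` because the budget is exhausted, and any call from a caller other than `ui` raises `PermissionError`.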

The Control Plane

In a microservices-based architecture of hundreds of services, each service must be scaled on demand and might have a large number of instances. Altogether there might be hundreds or thousands of service instances in the entire microservices application, each with its own sidecar proxy. This is where the control plane comes in.

The control plane of a service mesh architecture provides an interface where users can define policies to configure a proxy’s behavior in the data plane, and propagate this configuration to all proxies. This requires that each sidecar proxy connects to the control plane, registers itself, and receives configuration details.
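The register-then-receive flow described above can be sketched as follows. This is an illustrative model only: `ControlPlane`, `Proxy`, and the policy key `"mtls"` are hypothetical, and direct method calls stand in for the network connection a real proxy would keep open to the control plane.

```python
from typing import Any, Dict, List


class Proxy:
    """Hypothetical data-plane proxy that holds the config it was given."""

    def __init__(self, proxy_id: str) -> None:
        self.proxy_id = proxy_id
        self.config: Dict[str, Any] = {}

    def apply(self, config: Dict[str, Any]) -> None:
        # Keep a copy so later control-plane edits arrive only via a push.
        self.config = dict(config)


class ControlPlane:
    """Hypothetical control plane: one policy store, many registered proxies."""

    def __init__(self) -> None:
        self._proxies: List[Proxy] = []
        self._policy: Dict[str, Any] = {}

    def register(self, proxy: Proxy) -> None:
        self._proxies.append(proxy)
        proxy.apply(self._policy)  # a new proxy receives the current config

    def set_policy(self, key: str, value: Any) -> None:
        self._policy[key] = value
        for proxy in self._proxies:  # propagate the change to every proxy
            proxy.apply(self._policy)


control_plane = ControlPlane()
first, second = Proxy("login-1"), Proxy("ui-1")
control_plane.register(first)
control_plane.register(second)
control_plane.set_policy("mtls", "strict")

late = Proxy("ui-2")  # e.g. an instance added when the service scales up
control_plane.register(late)
print(first.config, second.config, late.config)
```

All three proxies end up with the same policy, including the one that registered after the policy was defined, which is the property that makes central management workable at thousands of instances.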

Learn more about the primary features of a service mesh in our detailed guide to service mesh technology (coming soon)

For an example of a popular service mesh platform, read our guide to Istio.

Service mesh architecture: design considerations

A service mesh might seem like an ideal solution for various aspects of designing and implementing microservice systems, but there are some caveats.

Processing Overhead

Service meshes use a proxy to route invocations between microservices, often via a load balancer. They also track invocations and encrypt traffic in transit. Encryption adds little processing overhead to any single call, but the aggregate burden of encryption across all services increases resource consumption and latency.

Analysis based on scalability and performance metrics can help determine if a given use case causes significant overhead. 
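A back-of-the-envelope calculation can serve as a starting point for such an analysis. The function below is a sketch with assumed inputs: the per-proxy cost of 500 microseconds is a made-up figure, not a measurement, and the default of two proxies per call assumes a sidecar on both the calling and the receiving side.

```python
def added_latency_us(calls_in_chain: int, per_proxy_us: int,
                     proxies_per_call: int = 2) -> int:
    """Estimate the latency (in microseconds) that proxies add to one request.

    calls_in_chain:   service-to-service calls made to serve the request
    per_proxy_us:     assumed processing cost of one proxy traversal
    proxies_per_call: assumed proxies crossed per call (caller and callee side)
    """
    return calls_in_chain * proxies_per_call * per_proxy_us


# Hypothetical request that fans out into 5 sequential service calls.
estimate = added_latency_us(calls_in_chain=5, per_proxy_us=500)
print(estimate)  # 5000
```

Real figures should come from load tests of the actual workload; the point of the arithmetic is only that a small per-call cost multiplies along a call chain.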

Configuration Complexity

Setting up service mesh configurations involves complex design tasks. The admin must know the service mesh’s general configuration options and how to compose the right configuration for each application, so that the resulting configuration matches the system’s requirements.

Validating and Testing Configurations

Once the service mesh is configured, it is important to validate the configuration, and to do so repeatedly throughout the CI/CD pipeline, because configurations change often.

After validating service mesh configurations, organizations should test them to ensure that each configuration behaves as intended when microservices are invoked.
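A validation step of this kind might look like the sketch below, which could run as one check in a pipeline. The config shape is an assumption made for illustration: each route maps destination services to integer traffic weights that should total 100. It is not the schema of any particular service mesh, and `validate_routes` is a hypothetical helper.

```python
from typing import Dict, List, Set


def validate_routes(routes: Dict[str, Dict[str, int]],
                    known_services: Set[str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []
    for route_name, weights in routes.items():
        # Every destination must be a service that actually exists.
        unknown = sorted(set(weights) - known_services)
        if unknown:
            errors.append(f"{route_name}: unknown destinations {unknown}")
        # Traffic weights must account for exactly 100% of requests.
        total = sum(weights.values())
        if total != 100:
            errors.append(f"{route_name}: weights sum to {total}, expected 100")
    return errors


services = {"login-v1", "login-v2", "ui"}
good = {"login": {"login-v1": 90, "login-v2": 10}}
bad = {"login": {"login-v1": 90, "login-v3": 5}}

print(validate_routes(good, services))  # []
for problem in validate_routes(bad, services):
    print(problem)
```

Returning a list of problems rather than raising on the first one lets a pipeline report every issue in a single run.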

Reviewing Configurations

The control plane does not always ensure the service mesh system is secure and reliable. After configuring and testing the service mesh, a verification process helps prevent issues such as insecure, undetected invocations.

Any change to a microservice, such as an addition or update, could also impact how the mesh behaves. The change might not be significant enough for the configuration to register, even if it affects communication. There should be a review process for each change to the service mesh configuration to ensure it covers all updates.

Service meshes don’t address all security concerns affecting an enterprise – they only address the aspects relating to communication between services. Additional security measures at the infrastructure level (e.g., network controls, firewalls) require separate tools and processes.

Control Plane Changes

Service mesh systems usually change over time, with new updates to improve performance and scalability, add features and functionality, or apply patches and bug fixes. Regression tests are important during updates to the service mesh control plane – they help ensure the system updates do not introduce negative changes to a service mesh’s behavior.

Emergence of Envoy and Istio

Solo.io provides Gloo Mesh, an enterprise service mesh based on Istio and Envoy and part of the integrated Gloo Platform. Gloo Mesh Enterprise delivers connectivity, security, observability, and reliability for Kubernetes, VMs, and microservices, spanning single-cluster to multi-cluster and hybrid environments, plus production support for Istio. According to the 2022 GigaOm Service Mesh Radar report, “Solo.io Gloo Mesh continues to be the leading Istio-based service mesh, incorporating built-in best practices for extensibility and security and simplified, centralized Istio and Envoy lifecycle management.”