Istio as an example of when NOT to do Microservices

I’ve been pretty invested in helping organizations with their cloud-native journeys for the last five years. Modernizing and improving a team (and eventually an organization’s) velocity to deliver software-based technology is heavily influenced by its people, process and eventual technology decisions. A microservices approach may be appropriate when the culmination of an application’s architecture has become a bottleneck (as a result of the various people/process/tech factors) for making changes and “going faster”, but it is not the only approach.

Microservices is not THE “utopian application architecture”.

I’ve written in the past how I didn’t think many teams would be able to pull it off, how there are “hard parts” to getting it working, and even heads up about some technology that might be beneficial to your efforts in the long run. FWIW I even wrote a book on the topic.

Staying away from microservices is likely the best place to start, though many organizations are way passed that today.

You’ve already gone down the path of microservices

If you do go down the path of microservices, be honest with yourself and the organization when it is not working. Correcting course may be the right move for the success of your product.

Honesty about when it is not working

Despite best intentions, it may be the right choice to go back to a monolith once you’ve started with microservices, even if it was for the right reasons. “It is okay” to go back to a monolith if your assumptions or the context around your decisions have changed.

In the Istio community, which builds a service mesh for microservices communication, the implementation of the control plane will be gradually changing from a microservices approach to a more monolithic approach. Louis Ryan, who’s the Principal Engineer and architect on Google’s API infrastructure, gave a talk back at the Istio meetup at KubeConNA 2019 detailing the motivations as well as outlining the case in a design doc. Starting in Istio 1.5 (expected mid-February 2020), we should begin to see the effect of the istiod approach where functionality previously assigned to various microservices deployments will be coalesced into a single daemon.

Istio is used to help solve difficult application-networking challenges introduced by a microservices/cloud architecture, so why would Istio itself move away from a microservices architecture? The most straight-forward answer is:

The complexity of the microservices approach proved to not deliver the value or goals it intended. On the contrary, it worked against those goals.

For the Istio project, it looks like a monolithic approach would better contribute to those goals. Let’s take a closer look.

Istio implemented as microservices

Istio is a open-source service mesh, which is architected similar to other service-mesh implementations with a control plane and a data plane. The data plane consists of the proxies that live with each application instance and is in the request path. The control plane lives outside of the request path and is used to administer and control the behavior of the data plane.

Historically, Istio’s control plane was implemented as separately deployable services that did the following:

Pilot – the core data-plane config (xDS) server
Galley – configuration watching, validation, forwarding
Injector – responsible for auto-injecting the data plane and setting up bootstrap
Citadel – certificate signing, secret generation, integration with CAs, etc
Telemetry – a “mixer” component responsible for aggregating and syndicating telemetry to various backends
Policy – a request-path “mixer” component responsible for enforcing policy

These services would be driven by a set of operator-defined configuration and coordinate to eventually serve and direct the data plane.

Microservices benefits

Microservices can enable an organization to go faster by reducing friction to make changes to the system. With a microservices architecture, each service would likely be operated independently (each with their own team) and have its own release cadence/lifecycle independent from others. This would give the developers and operators parallel tracks to move faster without the locking/synchronization/coordination of making changes which can serve to slow down deployments and feature changes.

Another reason why a service may be further broken down is its usage patterns and scaling properties. For a simple example, a service that has heavy reads and writes may benefit from separating out reads from writes because reads may be more memory intensive (maybe needing more cache space to make the reads super fast) while writes may be more storage or network intensive. You could optimize the read part of a service on machines/quotas that allow it to scale independently (more memory) then have the write part of the service on other machines that have SSD or optimized EBS/SAN, etc.

Some other reasons why you might split an app into services:

security concerns
domain grouping
different language optimizations
criticality of service

The #1 tradeoff when going to a microservices architecture is complexity. When you go from one thing (monolith) to a bunch of little things communicating with each other (to optimize for a particular concern), you significantly increase complexity both in the architecture as well as the infrastructure necessary to run things.

This may be a necessary tradeoff insofar you realize the benefits. If not, it’s best you evaluate your assumptions and correct course. That’s what’s happening with Istio right now.

Correcting course

The first thing to understand is who is developing and operating your services architecture. In the Istio community, there are different components in the project as can be seen with the different community working groups. On the other hand, the persona downloading and operating an Istio installation is not as deconstructed. In fact, the observation so far is that a single group (or even single person) operates the Istio control plane. In some ways, the Istio control plane as a set of microservices would work well if run as a larger SaaS, but in its current adoption that doesn’t seem to be the case.

The second thing to understand is how is a release done? Can the services be released independently? The answer for Istio was “theoretically yes” in practice this doesn’t seem to be the case. When a new version of Istio is released, you need to upgrade/deploy all of the control-plane components.

Lastly, in the Istio case, you could ask “aren’t there other scaling variables and security concerns that are different for the various components?” And an honest answer would realize that there isn’t really. Taken directly from the Istio design doc for istiod:

This is not the case in Istio today for the majority of components however — control plane costs are dominated by a single feature (serving XDS). Every other control plane feature has a marginal cost by comparison, therefore there is little value to separation.

For security, all of the control-plane services had the same level of privilege:

This is not the case today as the powers exercised by the Mutating Webhook, Envoy Bootstrap & Pilot are similar to those of Citadel in many respects and therefore exploits of them have near equivalent damage

As the subtext in the Istiod design doc states “Complexity is the root of all evil or: How I Learned to Stop Worrying and Love the Monolith”.

istiod is an incarnation of a monolith which supports all of the functionality of the previous releases with significantly reduced complexity. Note that the services that made up the previous control planes are still implemented as sub-modules within the project (complete with boundary and contracts, etc) but the operational experience is improved. An operator now needs to worry about running and upgrading a single binary vs a collection of them.

For Istio going to a monolithic control plane, a significant amount of complexity can be reduced which never fully paid off:

Installation/upgrade becomes easier with only a single deployable service
Configuration complexity is reduced, as configuration to orchestrate services is no longer needed
Easier to debug issues (look in one place vs all the places)
Increase efficiency/reduce overhead of transmissions, sharing of caches, etc

See the Istiod design doc for more detail.

Also as a side note: you can check out a demo I did of this istiodapproach which should appear in Istio 1.5. Just note, it’s demo’d on a super alpha build of Istio, so not all polished like it will be 🙂

Conclusion

I’m happy to see the Istio community continue to improve its usability and operability characteristics. Going to a monolithic deployment of the Istio control plane makes a lot of sense for the project. Is that something that makes sense for your project? Is that something you would consider if it did? Are you measuring the value-to-complexity ratio for your microservices architecture (and associated infrastructure) in such a way that you’d be able to determine the time to change approaches?

Reach out to me @christianposta on twitter if you have thoughts you’d like to share. A good follow up post would be something along the lines of a decision table or key indicators of when you should be tempted to change course once you’ve decided to go down the path of microservices et. al. Hit me up if you’d like to contribute to that.

Istio as an example of when NOT to do Microservices

You’ve already gone down the path of microservices

Honesty about when it is not working

Istio implemented as microservices

Microservices benefits

Correcting course

Conclusion

Featured content

Fortifying Your Cloud Native Connectivity Security Posture with Solo and Ambient Mesh

Migrating from Sidecars to Ambient Mesh - Risks, Challenges, and Benefits

Overhaul of Agent Gateway supporting A2A, MCP, and Kubernetes Gateway API

How Ambient Mesh Delivers Advanced Resource and Cost Savings

Getting Started with Ambient Mesh: From 0 to 100 mph

Agent Discovery, Naming, and Resolution - the Missing Pieces to A2A

Part Two: MCP Authorization The Hard Way

Part One: MCP Authorization The Hard Way

Agent Identity and Access Management - Can SPIFFE Work?

Deep Dive into llm-d and Distributed Inference

Gloo Mesh 2.8 simplifies service mesh operations with new enhanced user experience across multi-cluster environments.

Gloo Gateway 1.19 accelerates context-rich, real-time AI apps with Gateway API

llm-d: Distributed Inference Serving on Kubernetes

AI Reliability Engineering For More Dependable Humans

Kubernetes Identity the Right Way with SPIRE and Ambient

Optimizing GenAI in Production: High-Value Use Cases for AI Gateways

Solo.io Recognized as a Visionary in the 2024 Gartner® Magic Quadrant™ for API Management for the SECOND year in a row.

Guardians of the Governance: GenAI Gateway Guidance with GitOps and Gloo

Istio Ambient Waypoint Proxy explained

Hands-On with the Kubernetes Gateway API and Envoy Proxy: A Tutorial with GitOps and Gloo Gateway

Istio and the State of DevOps: Enhancing Key Metrics

What is an AI Gateway and its role in AI Applications?

Best practices for secure Istio deployment with Gloo Mesh Core

Gloo Mesh 2.6: Istio's Ambient mode now ready for production

HTTP Observability Without Compromises

Advance your knowledge of service mesh tech with Solo.io Academy certifications

Service Mesh for the developer workflow, a series

Challenges of adopting service mesh in enterprise organizations

Service Mesh in the Real World #2 — Ingress Traffic Control

Service Mesh in the Real World Video Series – Episode # 1: Egress Traffic

Service Mesh the easy way with AWS App Mesh and SuperGloo

Webinar Recap: Intro to Service Mesh Hub and SMI

D-TECK Uses Solo.io Gloo Gateway and Google Cloud to Help Businesses Make Better HR Decisions

Minimize the blast radius of changes with Solo.io Gloo Gateway and Weaveworks Flagger

Announcing Service Mesh Interface (SMI) Support and Collaboration

Service Mesh Interface (SMI) and our Vision for the Community and Ecosystem

The need for a standard, service mesh API

SuperGloo to the Rescue! Making it easier to write extensions for Service Mesh

Introducing The Service Mesh Hub -everything you need for your service mesh

Kubernetes Ingress Past, Present, and Future

Solo.io Streamlines Service Mesh and Serverless Adoption for Enterprises in Google Cloud

Ingenico

ParkMobile

Vonage

Domino’s Pizza

Gloo Mesh Feature Comparison

Service Mesh for Developers, Part 1: Exploring the Power of Observability and OpenTelemetry

Service Mesh at Scale

Compare Capabilities of the Top Service Mesh Platforms

Compare Capabilities of the Top API Gateways

Establishing zero trust security for modern cloud architectures

Unlocking the Power of Your API Gateway

API Gateways: Productivity, Resilience, and Security for Next-Generation Cloud Applications

Driving Business Value with Istio

Service Mesh Vendor Comparison

Istio Then & Now

4 Reasons Why You Need an AI Gateway

Gloo Gateway vs. Kong

Gloo Gateway vs. Apigee

3 Reasons You Need an API Gateway for Microservices Apps

Ambient Mesh Lab: EnvoyFilter Support

Ambient Mesh Lab: SPIRE integration with Gloo Mesh in Istio Ambient Mode

Ambient Mesh Lab: Introduction to ztunnel in Ambient Mesh

Solo Academy Course: Service Mesh Basics

Solo Academy Course: Istio Basics

Solo Academy Course: Envoy Basics

Solo Academy Course: API Gateway Basics

Solo Academy Course: Get Started with Istio Service Mesh

Solo Academy Course: Introduction to Envoy Proxy

Solo Academy Course: Deploying Istio for Production

Kgateway Lab: Integrating kgateway with Istio at Ingress

Kgateway Lab: Kgateway as a Waypoint