Service Mesh for Developers, Part 3: From DevOps to Platform Engineering

Welcome to the final part of our three-article journey on making app development even better through observability and supercharged testing tactics. In the first article we dove into the cool world of observability, fixing complex app issues like a pro within the service mesh. In the second article we cracked open the door to testing in production, using observability to ensure your app’s reliability is top-notch.

In this final article, we’re putting the pieces together to understand how all these practices blend into the big picture of app development. We’re digging into how they work together, ensuring a smooth connection between development and operations. Most excitingly, we’re showing you how observability and fancy testing don’t just level up DevOps but also take platform engineering to new heights. Stay tuned!

Development and Operations

From The Phoenix Project, the novel that led to The DevOps Handbook, which is often considered the bible of DevOps:

“You win when you protect the organization without putting meaningless work into the IT system. And you win even more when you can take meaningless work out of the IT system.” (Chapter 21)

The book builds on Eliyahu Goldratt’s Theory of Constraints, which rests on the idea that a chain is only as strong as its weakest link. In business operations, where each process is a series of steps, the slowest step sets the pace of the whole flow.
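
To make the weakest-link idea concrete, here is a minimal Python sketch. The stage names and daily rates are made up for illustration; the only point is that a sequential flow can never move faster than its slowest step.

```python
def pipeline_throughput(stage_rates):
    """Return (bottleneck_stage, rate) for a strictly sequential process.

    stage_rates maps a stage name to how many changes per day that
    stage can handle. The whole flow is capped by the slowest stage.
    """
    if not stage_rates:
        raise ValueError("at least one stage is required")
    bottleneck = min(stage_rates, key=stage_rates.get)
    return bottleneck, stage_rates[bottleneck]


# Hypothetical delivery flow, in changes per day (illustrative numbers only).
stages = {"code": 20, "review": 12, "test": 8, "deploy": 3}

bottleneck, rate = pipeline_throughput(stages)
print(f"bottleneck: {bottleneck}, throughput: {rate} changes/day")
```

Speeding up any stage other than the bottleneck leaves the overall throughput unchanged, which is why the models discussed next all focus on the handoff between writing code and running it.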

To strengthen the chain, the industry adopted various models: DevOps, SRE, and platform engineering.

Now, let’s journey back to the roots: The Process.

In the early days of software development, the Waterfall model was the prevailing approach. This model featured sequential phases, each commencing only upon the completion of its predecessor.

Interestingly, the Waterfall model inadvertently laid the groundwork for a notable division – that between the development and operations teams. During this era, the development team’s primary role was to craft and code software based on provided requirements. Their involvement typically ended once coding was finished. At this point, the operations team took over, managing deployment, ongoing maintenance, and system administration.

These two teams developed distinct working cultures. Development teams were driven by innovation, aiming to introduce novel features and functionalities. In contrast, operations teams were dedicated to ensuring system stability. A valid concern arose: too many rapid changes (innovation) could overwhelm operations, potentially jeopardizing system stability.

This dilemma gave birth to an enduring dynamic: the dev vs. ops debate.

What Is DevOps?

The division that the Waterfall model produced was detrimental to the business, and DevOps emerged in response.

In 2009, the presentation “10+ Deploys per Day: Dev and Ops Cooperation at Flickr” highlighted the perpetual clash between developers and operators, encapsulated by the phrase: “It’s not my code, it’s your machines!”

Patrick Debois identified with this scenario and initiated the “DevOps Days” conference, combining the terms Dev and Ops.

DevOps bridges the gap between these two teams, ushering in a cultural shift within organizations. It thrives on three fundamental pillars:

Enhancing work efficiency from development to production, introducing concepts like CI/CD (Continuous Integration/Continuous Deployment).
Prioritizing continuous feedback, reshaping teams to encompass end-to-end roles (developer, tester, operator), and emphasizing Observability in system components for informed decision-making.
Fostering a culture of continuous learning, enabling experimentation and the discovery of scenarios impossible to predict upfront. This leads to practices like chaos engineering and testing in production.

How Service Mesh Helps to Embrace DevOps

Thanks to its architecture, a service mesh, through implementations such as Istio or Gloo Platform, helps adopters considerably:

Decoupling cross-cutting concerns like observability from the application allows systems under the service mesh to be observable. The final goal of this feature is to have quick feedback on what is happening in your systems. This was tackled in the first post of this series, Service Mesh for Developers, Part 1: Exploring the Power of Observability and OpenTelemetry.
As described in the previous post from this series, Service Mesh for Developers, Part 2: Testing in Production, service mesh facilitates testing in production and also the deployment strategies needed to run several deployments a day (continuous deployment).
Service mesh also lets you define chaos scenarios in a production environment, which enables chaos engineering.
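
As a rough illustration of the chaos engineering point, the sketch below builds an Istio VirtualService that delays some requests and fails others. The `fault.delay` and `fault.abort` fields come from Istio's VirtualService API; the service name `reviews`, the `-chaos` suffix, and the percentages are hypothetical, and the `apiVersion` may need adjusting for your Istio version. The script prints JSON, which `kubectl apply` accepts just like YAML.

```python
import json


def fault_injection_virtual_service(host, delay_percent, delay,
                                    abort_percent, abort_status):
    """Build an Istio VirtualService manifest that injects faults.

    delay_percent / abort_percent are percentages of requests (0-100).
    delay is a duration string such as "2s".
    """
    return {
        "apiVersion": "networking.istio.io/v1beta1",  # assumption: adjust per Istio version
        "kind": "VirtualService",
        "metadata": {"name": f"{host}-chaos"},  # hypothetical naming scheme
        "spec": {
            "hosts": [host],
            "http": [{
                "fault": {
                    # Slow down a share of the requests...
                    "delay": {
                        "percentage": {"value": delay_percent},
                        "fixedDelay": delay,
                    },
                    # ...and fail another share with an HTTP error.
                    "abort": {
                        "percentage": {"value": abort_percent},
                        "httpStatus": abort_status,
                    },
                },
                "route": [{"destination": {"host": host}}],
            }],
        },
    }


# Hypothetical scenario: 10% of calls to "reviews" wait 2s, 5% return 503.
manifest = fault_injection_virtual_service("reviews", 10.0, "2s", 5.0, 503)
print(json.dumps(manifest, indent=2))
```

Assuming the script is saved as `chaos.py` (a made-up name), piping its output into `kubectl apply -f -` would apply the scenario, and deleting the resource would end the experiment. The applications themselves are never modified.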

Is DevOps Succeeding?

In my view, DevOps has indeed succeeded in its mission. It aimed to transform our work model, and it’s safe to say that it has achieved that objective. The groundbreaking concepts introduced by the DevOps community have played a pivotal role in reshaping our workplace culture.

Yet DevOps is a flexible, adaptable journey rather than a set of rigid steps, and that also presents a challenge. Different people interpret DevOps differently, and this diversity can lead some to lose their way along the path.

What Is Site Reliability Engineering (SRE)?

You might think this model came after DevOps, but it did not: the first SRE team started at Google in 2003.

Since there’s no consensus on the SRE role’s exact definition, let’s explore the problems it addresses.

Previously, two teams operated differently; one focused on infrastructure, the other on applications, often causing tension (Dev vs. Ops). SRE bridges this gap by applying a software engineering mindset to system administration. Google effectively merged these two worlds, creating a specific role: Site Reliability Engineer. SRE’s goal is to ensure a “Site” (e.g., an application) works flawlessly in production, guided by SLAs (Service Level Agreements) that define minimum service quality.

SREs develop observability tools, user-friendly dashboards, and efficient supply chains (CI/CD) to meet these agreements, so operators end up working the way developers do.
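
To see what such an agreement means in practice, here is a small worked example in Python. The 99.9% availability target, the 30-day window, and the 12 minutes of recorded downtime are assumptions chosen for illustration, not figures from any real agreement.

```python
def allowed_downtime_minutes(availability_target, window_days):
    """Minutes of downtime an availability target tolerates over a window.

    availability_target is a fraction, e.g. 0.999 for 99.9%.
    """
    if not 0 < availability_target <= 1:
        raise ValueError("availability_target must be in (0, 1]")
    total_minutes = window_days * 24 * 60
    return (1 - availability_target) * total_minutes


def downtime_remaining(availability_target, window_days, downtime_so_far):
    """Minutes of downtime left before the agreement is breached."""
    budget = allowed_downtime_minutes(availability_target, window_days)
    return budget - downtime_so_far


# Hypothetical agreement: 99.9% availability over 30 days.
budget = allowed_downtime_minutes(0.999, 30)
left = downtime_remaining(0.999, 30, downtime_so_far=12)
print(f"allowed downtime: {budget:.1f} min, remaining: {left:.1f} min")
```

With these numbers the service may be down for roughly 43 minutes a month. Measuring how much of that allowance has been used is exactly the kind of question the observability tooling has to answer.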

How Service Mesh Helps to Embrace SRE

The core of the SRE role is to develop tools and processes to measure systems. But telemetry has to be added both to the tools the SRE controls and to the applications they operate.

Telemetry is one of the cross-cutting concerns that a service mesh moves out of the applications and into the platform, so developers no longer have to worry about it.

Service mesh and its implementations like Istio or Gloo Platform help to build measurements on top of the systems that are being deployed in the infrastructure.

Why Is SRE Not Succeeding?

SRE isn’t the ultimate solution because it doesn’t necessarily make life easier for developers. It’s great for keeping services manageable, but it doesn’t completely dismantle the “Wall of Confusion” that DevOps aimed to tear down.

Platform Engineering

The SRE role is about keeping services running to meet business needs. In the DevOps world, this translates to “shifting right,” focusing on the operations side.

However, as software evolves, platforms like Kubernetes have emerged. While they simplify operations, they can widen the gap between developers and operations. Developers want to code without diving into Kubernetes and similar complexities.

So, a developer’s work can be split into two loops:

Inner loop: This covers local activities like coding and unit testing.
Outer loop: These are activities that happen in a close-to-production environment.

Jorge Morales explains the significance of these two loops in Kubernetes development in his post.

Developers code and test on their computers, but these tests aren’t enough. To truly validate functionality, the code must be tested in an environment close to production, such as a Kubernetes cluster.

To streamline this process, a new role is emerging: the platform engineer.

The goal of a platform engineer is to create and maintain a robust and efficient platform for software development and deployment. This platform should empower developers to build, test, and deploy their applications seamlessly while ensuring stability, scalability, and security.

How Service Mesh Helps to Embrace Platform Engineering

Service mesh technologies such as Istio and Gloo Platform, along with their architectural capabilities, greatly simplify the role of a platform engineer.

Think of this role as a facilitator for developers. Service mesh’s ability to manage traffic to and from applications (workloads) is a boon for this task.

Both Gloo Platform and Istio enable the creation of isolated “spaces” (referred to as workspaces in Gloo Platform). Platform engineers can configure these spaces, providing developers with a controlled environment for testing, as we discussed in the previous post.

However, testing in production is just one aspect of their capabilities. The real game-changer for developers is the ability to develop within a remote environment.

Today, tools like Gefyra, Telepresence, and DevPod offer this experience and work seamlessly with service mesh. By adding them to your toolkit, you empower developers to work directly within remote environments, bridging the gap between the inner loop and the outer loop and significantly improving the developer experience.

Service mesh implementations like Istio and Gloo Platform, thanks to their robust control over application networking, enable developers to navigate complex platforms like Kubernetes while ensuring that development adheres to the same rules applied in a production environment: think authentication, authorization, security, and more.

The Big Picture

What’s the big picture? Observability and clever testing aren’t just DevOps game-changers; they’re also turbocharging platform engineering. We’re breaking down the barriers between development and operations.

Service mesh, especially Istio and Gloo Platform, is Solo.io’s secret sauce. They’re not just about testing in production; they’re about developing remotely and following production-level rules. We’re talking authentication, authorization, and top-notch security.

So, gear up for the future of app development: it’s looking brighter than ever!

For more discussion on this topic, don’t hesitate to contact me.
