Platform Engineering Essential Tools

Platform engineering is a vast field that covers many aspects of software development. No single tool can handle all of these, so companies often combine multiple tools to provide developers with everything they need to work efficiently and deliver products faster. The goal is to remove obstacles and streamline the process of getting products from development to production.

From here:

To here:

The first picture shows a common problem in software development. Developers and operators work differently and pursue different goals, which can make the handoff between the two teams bumpy. This gap is known as the “Wall of Confusion,” and it makes the entire development process less efficient.

With the widespread adoption of Kubernetes as the standard platform for microservices architectures, the “Wall of Confusion” problem gets worse. Developers typically work locally, but their code must be integrated and tested in complex environments that they may not have the expertise to manage effectively.

The second picture shows a common way of developing software today. The inner loop represents the developer’s local environment, where they’re comfortable and have the expertise to work. The outer loop represents the deployment environment, which may be complex and involve platforms like Kubernetes or cloud-based systems. Developers may not have the same level of expertise in these environments, which can lead to challenges during deployment and testing.

Platform engineering is about making it easier for developers to get their code into production. Instead of having to learn complex infrastructure details, developers can focus on writing code and let platform engineers build a platform that takes care of the rest.

To enjoy the benefits of platform engineering, here is your list of tools:

Istio Service Mesh

Istio is a service mesh platform that helps both developers and operators.

For developers, Istio simplifies common tasks like integrating certificates, managing observability, and implementing authorization and authentication.

For operators, Istio makes it easier to manage security, routing, and other infrastructure components across multiple applications.

In short, Istio bridges the gap between developers and operators, helping them work together more efficiently.

Remember, by controlling all traffic, you can effectively control your entire infrastructure.

A service mesh acts as an intermediary between applications and the underlying network, but it doesn’t directly manage the network itself. That’s where Container Network Interface (CNI) tools like Cilium come in.


Cilium is a popular CNI plugin used by major cloud providers like Microsoft and Google. It is built on eBPF, a Linux kernel technology that runs sandboxed programs inside the kernel, which lets Cilium process network traffic more efficiently.

eBPF offers two key advantages:

  • It can bypass parts of the traditional kernel network stack, reducing processing overhead and increasing throughput.
  • It enables fine-grained observability at the packet level, allowing for deeper insights into network traffic patterns.


Infrastructure as Code (IaC) is a way to manage and provision your infrastructure using code. This means that you can describe the desired state of your infrastructure in a file, and IaC tools will automatically take action to make your infrastructure match that state.

IaC is a key part of DevOps, and it can help you automate your infrastructure provisioning, configuration, and management.
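To make the "desired state" idea concrete, here is a minimal sketch of the comparison an IaC tool performs before acting. It is a toy illustration, not how Terraform or OpenTofu are implemented: the resource names, the attribute dictionaries, and the `plan` function are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical resource model: resource name -> attributes.
State = Dict[str, Dict[str, str]]


def plan(desired: State, actual: State) -> List[Tuple[str, str]]:
    """Return the (action, resource) steps that would make `actual` match `desired`."""
    steps: List[Tuple[str, str]] = []
    # Resources we want: create them if missing, update them if they drifted.
    for name in sorted(desired):
        if name not in actual:
            steps.append(("create", name))
        elif actual[name] != desired[name]:
            steps.append(("update", name))
    # Resources that exist but are no longer described in code get removed.
    for name in sorted(actual):
        if name not in desired:
            steps.append(("delete", name))
    return steps


# Made-up example states, for illustration only.
desired = {
    "network": {"cidr": "10.0.0.0/16"},
    "cluster": {"version": "1.30", "nodes": "3"},
}
actual = {
    "cluster": {"version": "1.29", "nodes": "3"},
    "legacy-vm": {"size": "small"},
}

for action, name in plan(desired, actual):
    print(f"{action}: {name}")
```

Running the plan again once the infrastructure matches the file yields no steps, which is what makes this approach safe to repeat.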

Terraform is a widely used IaC tool for managing cloud and on-premises infrastructure.

In August 2023, HashiCorp, the company that developed Terraform, changed the tool’s license from the open-source Mozilla Public License (MPL 2.0) to the Business Source License (BSL). This led to concerns about the openness and neutrality of the tool.

In response, the community created a fork of Terraform called OpenTofu. OpenTofu is a community-driven project that is committed to neutrality.

OpenTofu is a good option for organizations that are looking for a free and open-source IaC tool.


Continuous Integration (CI) and Continuous Delivery/Deployment (CD) are essential practices in DevOps.

Traditional CI pipelines were designed for monolithic applications, but they became too complex for microservices architectures.

Tools like Spinnaker emerged to address this challenge. While Spinnaker has lost ground in Kubernetes-centric setups, other tools have emerged. ArgoCD is a newer CD tool specifically designed for the Kubernetes platform. It automates the deployment and management of Kubernetes applications by keeping the cluster in sync with the desired state declared in Git.

By integrating ArgoCD with Istio, developers can more easily apply deployment strategies like Blue/Green or Canary (typically with the help of a progressive delivery controller such as Argo Rollouts), making the transition to new application versions smoother and more reliable.
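As a rough sketch of what a Canary split looks like at the Istio level, the snippet below builds a VirtualService manifest that sends a percentage of traffic to a new version. The service name `reviews`, the subset names `stable` and `canary`, and the helper function are assumptions made for illustration; the subsets would also need to be defined in a matching DestinationRule, which is not shown.

```python
import json
from typing import Any, Dict


def canary_virtual_service(host: str, canary_percent: int) -> Dict[str, Any]:
    """Build an Istio VirtualService that splits traffic between two subsets."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": f"{host}-canary"},  # hypothetical naming scheme
        "spec": {
            "hosts": [host],
            "http": [
                {
                    "route": [
                        {
                            # Current version keeps the remaining share of traffic.
                            "destination": {"host": host, "subset": "stable"},
                            "weight": 100 - canary_percent,
                        },
                        {
                            # New version receives only the canary share.
                            "destination": {"host": host, "subset": "canary"},
                            "weight": canary_percent,
                        },
                    ]
                }
            ],
        },
    }


# JSON is valid YAML, so the output can be saved to a file and applied with kubectl.
print(json.dumps(canary_virtual_service("reviews", 10), indent=2))
```

A delivery tool would then raise `canary_percent` step by step while watching the metrics, and roll back by setting it to 0.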

GitLab CI / GitHub Actions

Continuous Integration (CI) tools have evolved to become more versatile and powerful. They can now be used to automate a wide range of tasks, not just code builds.

Tools like GitLab CI and GitHub Actions allow users to create pipelines that are triggered by events such as pull requests, merge requests, or issue creation.

This makes CI platforms a valuable tool for platform engineers, as they can be used to orchestrate complex and dependent workflows.
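To illustrate what "dependent workflows" means in practice, here is a small sketch that orders jobs by their dependencies, similar in spirit to the `needs` keyword in GitLab CI and GitHub Actions. The job names and the `stages` helper are hypothetical, and this is not how either platform schedules jobs internally.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical pipeline: job name -> jobs that must finish first.
pipeline: Dict[str, Set[str]] = {
    "lint": set(),
    "unit-tests": {"lint"},
    "build-image": {"unit-tests"},
    "provision-env": {"lint"},
    "deploy": {"build-image", "provision-env"},
}


def stages(jobs: Dict[str, Set[str]]) -> List[List[str]]:
    """Group jobs into stages; jobs within one stage can run in parallel."""
    sorter = TopologicalSorter(jobs)
    sorter.prepare()  # raises graphlib.CycleError if the jobs depend on each other in a loop
    result: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every job whose dependencies are done
        result.append(ready)
        sorter.done(*ready)
    return result


for number, jobs in enumerate(stages(pipeline), start=1):
    print(f"stage {number}: {', '.join(jobs)}")
```

Here `provision-env` and `unit-tests` land in the same stage because both only wait for `lint`, while `deploy` waits for everything else.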


One of the challenges for development teams is that they only learn how their code behaves in production after it has already been deployed. This makes for a slow and inefficient feedback loop, as it can take time to identify and fix bugs that only appear in production.

To address this challenge, we can use a tool like Istio Service Mesh to enable full control of traffic. This allows us to separate live traffic from traffic used for development, testing, and pre-production environments.

By doing this, we can test our code in production without affecting live users, and we can identify and fix bugs much more quickly.

Test In Production:


Using Istio, we can create isolated environments within our infrastructure to deploy work in progress (WIP) workloads. This allows us to manage our infrastructure more efficiently and avoid impacting production environments.
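One way to picture this separation is header-based routing: requests carrying a marker header go to the WIP workload, and everything else stays on the live one. The sketch below builds such an Istio VirtualService. The header name `x-workload-track`, the subset names `wip` and `live`, and the service name `checkout` are assumptions for illustration, and the subsets would need a matching DestinationRule.

```python
import json
from typing import Any, Dict


def wip_virtual_service(host: str, header: str, value: str) -> Dict[str, Any]:
    """Route requests with a marker header to the WIP subset, all others to live."""
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": f"{host}-wip"},  # hypothetical naming scheme
        "spec": {
            "hosts": [host],
            "http": [
                {
                    # Rules are evaluated in order, so the header match comes first.
                    "match": [{"headers": {header: {"exact": value}}}],
                    "route": [{"destination": {"host": host, "subset": "wip"}}],
                },
                {
                    # No match block: this is the default route for live traffic.
                    "route": [{"destination": {"host": host, "subset": "live"}}],
                },
            ],
        },
    }


print(json.dumps(wip_virtual_service("checkout", "x-workload-track", "wip"), indent=2))
```

With a rule like this in place, only requests that explicitly set the header ever reach the WIP workload, so live users are unaffected.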

Once we’ve deployed WIP workloads, we can use tools like DevPod, Telepresence, or Gefyra to bridge the traffic between our local development environment and the remote production environment.

This way, we can debug issues in our local environment as if we were interacting with the production environment. This helps us catch and fix bugs early in the development process, so that our code is more stable before it goes live.

Develop In Production:

Grafana, Prometheus, Loki, Tempo

We’ve already discussed how Istio provides centralized traffic management and unified infrastructure control. We’ve also mentioned that Istio offers telemetry data, while Cilium enables packet-level observability.

Now, let’s focus on how we can visualize all the events happening in our infrastructure, including workloads, in a human-friendly way.

By using visualizations, we can provide developers with a comprehensive understanding of their infrastructure. They can quickly identify and troubleshoot issues using the three pillars of observability (logs, traces, and metrics), assess the health of the underlying infrastructure, and make informed decisions based on the data they see.

For example, visualizations can help developers track application performance in different geographic regions. This can be valuable for identifying performance bottlenecks and improving the user experience for users in specific locations.
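The kind of aggregation behind such a per-region panel can be sketched in a few lines. This is only an illustration of the idea, not a Grafana or Prometheus query: the region names, the latency samples, and the 200 ms threshold are made up.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple


def median_latency_by_region(samples: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Group (region, latency in ms) samples and return the median per region."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for region, latency_ms in samples:
        grouped[region].append(latency_ms)
    return {region: median(values) for region, values in grouped.items()}


def slow_regions(samples: Iterable[Tuple[str, float]], threshold_ms: float) -> List[str]:
    """Return the regions whose median latency exceeds the threshold."""
    medians = median_latency_by_region(samples)
    return sorted(region for region, value in medians.items() if value > threshold_ms)


# Made-up samples, for illustration only.
samples = [
    ("eu-west", 120.0), ("eu-west", 135.0), ("eu-west", 110.0),
    ("us-east", 80.0), ("us-east", 95.0), ("us-east", 90.0),
    ("ap-south", 310.0), ("ap-south", 280.0), ("ap-south", 295.0),
]

print(median_latency_by_region(samples))
print(slow_regions(samples, threshold_ms=200.0))
```

In a real setup the samples would come from the metrics backend, and the dashboard would draw the same breakdown over time instead of printing it.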

Remote Development Environment

Remote development platforms like Codespaces, Gitpod, and Cloud9 IDE are becoming increasingly popular because they allow developers to work together without the hassle of setting up complex local environments.

By enabling developers to work remotely, we can give them access to all the powerful capabilities that cloud platforms offer.

This means that developers don’t need to waste time and energy trying to simulate cloud environments on their local machines.

For example, if our application is going to be deployed behind firewalls, load balancers, or content delivery networks (CDNs), we can develop it directly in the cloud and take advantage of the same capabilities that will be used in production.

Remote development also makes it easier for developers to collaborate with each other. Never has remote pair programming been so simple.

Internal Developer Portal

All developer platforms should have a portal to improve the developer experience and help manage the collective APIs that serve as the foundation for the tools mentioned in this article. An internal developer portal is a platform for managing and deploying software development workflows. It provides a centralized location for developers to access and manage their code, pipelines, and deployments. It also makes it easy to collaborate on code, share documentation, and track issues.

Bonus: AI (Artificial Intelligence)

It is clear that AI tools like Copilot can help teams work more efficiently. Considering the potential of AI, it is reasonable to use them to support developers.

Let’s see how a developer can benefit from using AI as part of the platform stack:

  • Copilot can suggest code completions as you type, saving you time and effort.
  • Copilot can review your code for potential errors and suggest improvements, helping you to catch bugs before they go live.
  • Copilot can generate documentation for your code, which can make it easier for other developers to understand and use.

The goal of AI tools is not to replace developers but to assist them in their work. AI tools can help with coding, testing, and documentation, which can free up developers to focus on more creative and strategic tasks. This can help teams to work more efficiently and produce better software.

Putting These Tools Together

These tools are like different pieces of a puzzle. Each one covers a specific aspect of platform engineering. When you put them together, your teams can work more efficiently and quickly, which will help your business succeed.

If you’re not sure how to put all these pieces together, I can help. Reach out here.