Hoot [Episode 2]: Linkerd

February 12, 2020

Hoot is a live video series hosted roughly every two weeks where engineers unbox various cloud-native technologies, dig into features of new releases, and answer your questions live. To kick off the launch of Hoot, we start with a series, "Get to Know Service Mesh," since service mesh is the latest buzzword in our ecosystem. The questions that come up most often include: What is it? Why do I need it? And which one should I choose?


Get to Know Service Mesh

To help explain service mesh, this series will explore the different service meshes, explain each one's architectural approach and unique capabilities, contrast them, and provide guidance on how to navigate the landscape when choosing a mesh (or meshes) for your applications.

Covered in this series:

Episode 2 Replay: Get to know Linkerd 

Speaker: Rick Ducott, Director of Engineering

Transcript

Alright. Well, hello, everyone. Welcome to the second Hoot. The current focus of the Hoot video series is unboxing and getting to know service meshes. If you followed along with us last time in the Hoot series, we covered Istio with Christian, our field CTO. Today I'll be covering Linkerd, and in two weeks we'll do another one of these, unboxing Consul Connect. And so we're working our way through the meshes.

And now we'll dig into Linkerd. I don't want to focus too much conceptually on what a service mesh is, but just to ground the conversation, because I'll be using these phrases a lot: the service mesh paradigm is how people are deploying software now, especially with orchestrators like Kubernetes that make it really easy. The paradigm calls for deploying a sidecar proxy next to each of your actual services. This enables you to decouple some of the cross-cutting features related to routing, security, and observability from your application, and to expose the configuration of those features in a common, user-friendly way.

All of those proxies make up the data plane, and then in the service mesh world there's a control plane that pushes configuration to those proxies. That is conceptually what we're talking about here. Now let's go through and focus on Linkerd, one of the service mesh implementations.

So this is the Linkerd website, and at the top they give a really good summary of the focus of the Linkerd service mesh: they call it the ultralight service mesh for Kubernetes, and specifically, it gives you observability, reliability, and security without requiring code changes. This is a pretty good, concise explanation of where Linkerd's focus is: making it really usable out of the box. It's very simple to enable mTLS and to start getting really interesting kinds of tools to observe your application.

And that's really the focus of Linkerd. If you watched the Hoot video last time about Istio, it has a much broader range of features and a much more complex configuration API, and there are trade-offs for that. There are advantages to using a service mesh that provides those broader capabilities, and there are advantages to using something focused on mTLS and observability, especially if that's your core use case.

A good bit of background on Linkerd, if you're interested, is a podcast episode with William Morgan (who leads Buoyant) on the Software Engineering Daily podcast, talking about where Linkerd sits and really emphasizing that in the long run there are going to be a lot of different service mesh implementations, some of which are really good fits for particular cloud providers. For example, AWS has the App Mesh implementation, and that might be the right choice for folks running workloads exclusively on AWS and EKS. But that might not be the right choice if you're on a different cloud provider, or if you're specifically looking for features that are better served by a different service mesh. So it's an interesting angle, and I think as we go through the onboarding for the product, we'll really see how they've been emphasizing these things from the jump.

Before we dive into the actual installation and demos, I want to talk a little bit about the architecture. In particular, the question many people start with when it comes to gateways or service meshes is: what is the proxy implementation? As we said before, we're going to be injecting the sidecar proxy next to every service in our application. In the case of Linkerd, they have an open source proxy that they've built alongside the control plane, called linkerd2-proxy.

I'll open that up. So this is an open source proxy implementation written in Rust. Linkerd essentially took the approach of "let's build our own proxy with the features that we care about built in," and they've been working on this for years. It's pretty mature at this point; there have been dozens of contributors and more than a hundred releases.

It is worth noting, though, that there are other control plane implementations that are instead based on Envoy, an open source proxy pioneered by Lyft. For example, Istio uses Envoy, which probably has a larger feature surface area, but it requires sending configuration in Envoy's own language. One advantage of going with something like Envoy, as opposed to rolling your own, is that you can pick up improvements from the community. For instance, the Envoy community is really excited about WebAssembly as a way to further customize the behavior of the proxy.

You obviously wouldn't get that with Linkerd, but as we said before, that's not really the focus here, and for the use cases that Linkerd supports out of the box today, their proxy is more than sufficient: very performant and pretty easy to use.

So the data plane consists of all these proxy instances next to the application, and then of course there's a control plane: a set of components that manages those proxies, sends them configuration, and powers the tooling around them. Let's get installed, and then we can take a look at what those components actually consist of. I'm switching over to my terminal. I just deployed minikube, a fresh installation, so that we can start essentially from a blank slate.

The first thing we'll do in terms of getting deployed is install the command line tool. One of the most common ways to interface with Linkerd, especially for onboarding and the initial install, is through this tool. It's built on top of the same APIs that the web dashboard is based on, and it provides really useful functionality in the command line, like tap and top and other features that we'll get into in a little bit.

At Solo.io we were really inspired by this kind of one-liner for easy install: anyone can just run the single command, and it downloads a script, runs it, makes sure the command line tool is in the expected location, prompts you to add it to your path, and tells you how to get fully up and running with Linkerd. I really like this very simple, clear experience for the user.
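For reference, the one-liner from the Linkerd getting-started docs looks like this (as of the 2.6-era docs; check linkerd.io for the current instructions):

```sh
# Download and run the CLI install script; it places the linkerd binary
# under ~/.linkerd2/bin and prompts you to add that directory to your PATH.
curl -sL https://run.linkerd.io/install | sh
export PATH=$PATH:$HOME/.linkerd2/bin
```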

So I've already done this; I have linkerd in my path. We'll do a quick version check. Okay, so now I'm running the latest stable release, and I currently don't have the server, the control plane, installed on my cluster. If we look at the cluster, this is a vanilla minikube; it was set up eight minutes ago. So let's go ahead and do the first thing, which is a pre-check. This is really nice. It's a set of really quick but fairly comprehensive checks: it makes sure you have Kubernetes running and can query it, and that it's the right version; it checks the RBAC of your current user profile, or context, making sure that you can actually create the resources that need to be created; and it makes sure that there aren't conflicting resources that already exist. This is really great. It also makes sure that your tooling is up to date.
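Concretely, those two steps are just:

```sh
# Confirm the CLI version; the server version shows as unavailable
# until the control plane is installed on the cluster.
linkerd version

# Validate the cluster before installing: Kubernetes version, RBAC for
# the current context, conflicting resources, and CLI freshness.
linkerd check --pre
```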

It's probably a good time to mention that there are different release channels for Linkerd. There's the latest stable release, which is intended for mass consumption and for enterprise users who rely on long-term support. The latest stable release was 2.6, and 2.6.1 is the latest patch. I read the announcement this morning; it looks like it took about seven or eight weeks since the last stable release. So they've actually been ramping up their velocity, and the releases are coming out once or twice a quarter.

So we did the pre-check and everything looks good. Let's start the installation process; Linkerd makes it pretty easy. I'm going to first run the install command, and as you can see it spits out a bunch of YAML. This is basically a dry run: what the install command does is produce to standard out all the YAML you need for a normal kubectl apply. We were looking at the Linkerd code a while back, and one of the reasons this was the recommended starting point, as opposed to Helm (though more recently there is Helm support), is that this manifest contains things like the public keys and certificates that were generated during install. So this is a good way to inspect a valid manifest for installing Linkerd. Now we'll pipe that to kubectl apply

to actually perform the installation. As you can see, it's setting up the namespace and all the RBAC (service accounts, roles, and role bindings), and then it's deploying all the control plane components and configuration to the namespace.
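Put together, the install is a sketch like this:

```sh
# Render the control plane manifest to stdout to inspect it first ...
linkerd install > linkerd-install.yaml

# ... or render and apply in one pipeline.
linkerd install | kubectl apply -f -
```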

Looks like everything got created, which just means the Kubernetes resources applied successfully. Let's see what's in the namespace. I deployed this to the standard linkerd namespace, and as we can see, it's initializing. Now let's run check. This time we're not running the pre-check; we're checking the active installation. It's a very useful tool, a quick debugging step to see: are we healthy? In this case the question is "is the control plane healthy?" whereas before it was "is the control plane installable?" There are also commands for checking "is my application that's been injected with the proxies healthy?" So it's a really useful debugging command; I don't need to really think about it, I can just run the command and feel confident that if it succeeds, everything is good to go.
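Those steps, for reference:

```sh
# Watch the control plane pods come up in the linkerd namespace.
kubectl get pods -n linkerd

# Verify the running installation; this reports exactly which
# checks pass or fail against the live control plane.
linkerd check
```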

Also, since the install command and applying its output just created the resources, the check is a good way to feel confident that once it succeeds, the control plane is actually ready and not just in the process of being deployed. So the checks all passed. This means that Linkerd is healthy, good to go, and the control plane is installed.

So let's go back to the architecture and look at what is actually deployed here. As you can see, we have nine pods running. These are all the different components of the control plane. We can also look at these in the UI, but I'm going to talk through, at a high level, what the architecture is here.

Actually, maybe it's useful to do this in context. They've got a really good architecture document. So here, this is the rough architecture of all of the components. As we were saying, there's a separation where the data plane sits down by the application; this bottom arrow is the request path through the application. As you can see, requests go through the proxies, then to the application, and then they continue on. As we go through the demo, we'll talk about how that gets initialized and what this pod configuration actually looks like.

But first of all, in terms of the components: there are two key components that help configure the behavior of the proxy, destination and identity. Destination is basically responsible for all the routing information, making sure the proxy knows, when it sees a service or URL, how to find that resource. Identity is a certificate authority that provides signed certificates to all the proxy instances, which enables that seamless out-of-the-box mTLS behavior between all of the applications you've injected Linkerd into.

There are a lot of other components here, and the next question is: how does my application actually get deployed like this? Often when you're managing an application on Kubernetes, you're managing your own set of deployments, which determine what pods get spun up, what services to expose with DNS, and what config and secrets to deploy. But you don't want to have to go manually augment all of that with the extra configuration so that the proxy containers are there and the iptables rules have been rewritten. So instead there's a proxy injector component that helps with this process. It can be invoked dynamically, through annotations on namespaces or other resources in Kubernetes, so that the proxy is injected as resources are applied; or it can be done with the command line tool, which I'll show in a little bit.
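As a sketch of both styles (my-app is a hypothetical namespace and manifest here):

```sh
# Dynamic injection: annotate a namespace so the proxy injector's
# mutating webhook adds the sidecar to pods as they're created.
kubectl annotate namespace my-app linkerd.io/inject=enabled

# Manual injection: run a manifest through the CLI injector and
# apply the transformed output.
linkerd inject my-app.yaml | kubectl apply -f -
```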

Then there are several other components here. One is the service profile validator: service profiles enable you to add extra data on top of a service, saying things like "here are the endpoints, and the shapes of the requests and responses to those endpoints." The proxy has built-in support for tap, so you can actually stream and watch real-time request and response traffic through a particular workload. There's a Prometheus and Grafana instance for metrics, set up to automatically scrape both the control plane API and all the proxy instances. There's a built-in dashboard that comes with Linkerd and is a really good starting point for monitoring the real-time health of your overall mesh. And finally, there's the actual API component, which is called the controller, and the CLI and the web dashboard are built on top of that. So that's what all of these things are.

Let me open up the dashboard and show you what the control plane looks like from there. So this is pretty cool. It's essentially showing the health of the control plane, so I can see it was successfully installed. Up here there's a recommendation for how to get started, and then down here we've got all the actual components that I was mentioning in terms of what goes into the control plane, and a summary of namespaces and whatnot. It's worth noting that all these control plane components are also meshed, injected with the sidecar, but obviously they serve a different function than your actual application.
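For reference, opening the dashboard is a single command:

```sh
# Port-forwards the dashboard to localhost and opens it in a browser;
# leave it running in the background while exploring.
linkerd dashboard &
```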


So the dashboard command is essentially setting up the right networking locally. In Kubernetes that often means port forwarding, setting it up so that you're serving through localhost; since I'm running off of minikube, it needs to set that up in the background. We also have this graph on the dashboard, out of the box, where you can already see all the traffic that's happening currently. This is just the control plane traffic, so it's not super interesting, but we'll come back to this after we install an actual demo application. Now let's actually install a demo. First, before applying it, I just want to take a look at what this is.


This is the standard Linkerd demo. It's called emojivoto. It's basically got several components: a web UI where you can select emojis as your favorite, and a leaderboard where you can see how the voting is going. It also comes with a little built-in load tool so that there's active traffic happening, and it comes with some bugs, so that you can get a better sense of what it means to debug failing requests or broken services, which is kind of the purpose of deploying Linkerd in the first place. So it's quite a useful demo application for really understanding the basics of the mesh. Notably, there's nothing in this manifest related to the proxies or Linkerd; the word Linkerd isn't anywhere in here.


So as I was saying, there is a component that helps with injecting that additional configuration, typically into your deployment specs, so that the right proxy container gets spun up next to the real container for your application. It also initializes the overall pod with the right certs and mounts and other things necessary for mTLS and for picking up the right configuration. But none of that's here; that's all done by the injector. And that's nice, because a lot of people will manage this deployment in some kind of GitOps pipeline, where you might want the transformation to be part of the continuous delivery stage of the pipeline rather than stored in your repo, or however you're managing your infrastructure config as code.

So let's apply this. Now we've created a new namespace, so we can take a look at that and at the pods in the namespace. It's worth noting that so far I've just deployed the demo application; I haven't done anything related to connecting it with Linkerd.
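Those steps, using the manifest URL from the Linkerd getting-started guide:

```sh
# Apply the emojivoto demo application ...
curl -sL https://run.linkerd.io/emojivoto.yml | kubectl apply -f -

# ... and look at what got created: plain application pods,
# with nothing Linkerd-related in them yet.
kubectl get pods -n emojivoto
```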

There are several common ways that meshes support injection, that onboarding, or as Linkerd calls it, the meshing of pods or workloads in your application. One of those is that you can annotate namespaces, or have some kind of global configuration with admission control, like a mutating webhook or something to that effect. In our case, we're just going to do it manually, even though the application is already deployed. But before we do that, let's take a quick look at what this application is doing.

So I'm going to port forward the web service, which is the actual site. This is the emoji application. Essentially it allows you to pick different emojis, and then there's a leaderboard. As you can see, by the time I logged in here, there had already been a ton of traffic.
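The port-forward itself:

```sh
# Forward the demo's web service to localhost:8080, then browse
# to http://localhost:8080 to vote.
kubectl -n emojivoto port-forward svc/web-svc 8080:80
```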

That's because of the load generator that's running in the background; every second, it's putting a new vote in. It's also worth noting that there are some bugs in here. We'll use the dashboard to start to drill into what those bugs are, but they're built into the demo.

Now I've got my application deployed, it's running, everything's going well, but I need to inject it with the sidecars to get it hooked up to Linkerd. I can see that the namespace is there. The Linkerd dashboard knows about it and knows that there are workloads there, but they aren't meshed. They're not injected, so you can't really do anything meaningful with them in this dashboard at this point.

Let's go fix that. Since the resources are already there, we're going to do this a different way: we're going to read them out, run them through the injector, and then reapply them back to Kubernetes. So first we'll get all the deployments in the emojivoto namespace.

So this is all of the actual deployments; this is just the manifest that we deployed before, so there's no Linkerd info yet. Now pipe that to linkerd inject, and this essentially provides a dry run. It isn't quite a dry run, because I'm seeing both standard error and standard out, but meaningfully, if I run the inject command without piping it to kubectl apply, it just spits out to standard out what the manifests will be. And if we look through it, we can start to see how these deployments are actually going to be modified in order to get hooked up to Linkerd.
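That read-transform-reapply flow is one pipeline (drop the final kubectl apply to just inspect the injected manifests):

```sh
# Read the live deployments, run them through the injector, and
# apply the annotated manifests back to the cluster.
kubectl get deploy -n emojivoto -o yaml \
  | linkerd inject - \
  | kubectl apply -f -
```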

Apologies for my scrolling. Now we've reconfigured these, so let's look at the deployments again. We can now see some annotations here; we've got annotations saying these things should be injected. But what I wanted to see, I'm not seeing here. Let's look at the shape of the deployments. I'm running into a slight problem: I'm not getting what I expect, which is essentially to have several containers added to the emojivoto app. I'm going to go back to the docs and take a look at what I'm doing wrong.

So we did the install, we did the pre-check, we installed Linkerd. Let's see what I was doing wrong. Oh, there, right, I was looking at slightly the wrong thing. So if I look at the pods, we can now see that there are two containers in each pod. And if we look at the actual pod spec, we can see that, first of all, there's an init container in each of these pods, and it's running this container called proxy-init. This is basically setting up your iptables rules so that the networking, transparently to the application, now goes through the proxy.

This has to be done as an init container so that by the time your application is deployed and the container is running, the networking is all set up ahead of time, ensuring you're communicating through the right ports and with the mTLS configuration as intended.

Then, if we look at the containers on this pod: first of all, there's one here which was already there; this is the same container that has always been part of this pod. And then we see a second container, which was also injected, and this container is basically the Linkerd proxy with a bunch of environment variables.

Along with those environment variables, there's a certificate that gets mounted and some other volume mounts, and this is basically all the configuration necessary to run the proxy and to do things related to getting the right certificate and identity configuration. And as we can see, it's running the proxy image we were looking at earlier, at the same version. I was getting confused before because I was looking at the deployments for the sidecars, but I found them in the pod. The deployment has enough information so that when the pod is created, it's created with the appropriate containers.
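A quick way to see this from the terminal (the pod name here is illustrative; yours will differ, and the container names are what you'd expect from Linkerd's injector next to the app's own container):

```sh
kubectl get pods -n emojivoto
# NAME                   READY   STATUS    RESTARTS   AGE
# web-xxxxxxxxxx-xxxxx   2/2     Running   0          5m

# List the init container and the runtime containers in the pod spec:
kubectl get pod web-xxxxxxxxxx-xxxxx -n emojivoto \
  -o jsonpath='{.spec.initContainers[*].name} {.spec.containers[*].name}'
# e.g. linkerd-init web-svc linkerd-proxy
```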

So now we have an injected application. Let's go back to the dashboard and see how it looks now. We can see this updated emojivoto application is now meshed: all four of the workloads running inside of it. This means they're all connected to the mesh and have sidecars. We can look at that namespace and get an overview of what's going on here: we can see the four deployments, and there's this funny kind of animation, so over time we can see the request traffic. In a more complex application, this might interestingly rearrange over time as new services come online.

And generally, you can get some high-level stats across all the workloads in the namespace, and we can drill in on specific ones to get a better sense. So let's first look at this emoji deployment. It's got a 100% success rate, so there's probably not anything interesting here, but we can at least get a sense of who's making requests to this workload. Looks like in this case there are two common requests that the web component is sending it; this is a tap of the actual live request traffic.

So theoretically, we could go and port forward the voting app. Now we can continue to play around; the app's still running just as before, and we can talk to it the same way, so there should be more traffic feeding into our actual dashboard. Now if I go to the voting service, the actual votes are coming in here; in the architecture of this demo application, when I actually make a vote, it comes here. It's worth noting that every once in a while these requests are failing, and I can already see that whenever someone votes for donut, there's a failure; the success rate is actually zero percent in this case.

In fact, if we drill into this, it gives us another view, maybe a better view of the overall architecture of the application. The bot initiates requests through the web component to emulate a user in the browser, and then the web component makes requests to the emoji back end as well as casting actual votes.

So we can see that the bot is making requests to vote, occasionally those are failing, and certain votes, particularly for donut, seem to be failing, whereas everything else seems to be pretty healthy. So right away we can get a sense of where the problems in this application lie, and that really checks out intuitively with our experience from playing around with the application: when you click here, we were getting a 404, but otherwise things were pretty much working.

If we view the leaderboard, we can see that basically everything is working fine, but it appears there's a problem with the vote-donut endpoint, which is always failing. As a result, it also appears to be causing some of the responses back to the bot to be failures.


So that's about the extent of the demo that I wanted to go through, but I did want to touch on a few more really cool things about Linkerd that are worth mentioning, without going too deep for this intro session. As I was saying before, the focus of Linkerd is really on rapid onboarding and ease of use. And as we saw, it was very quick to get things deployed; it was easy, and there were a bunch of checks, so really good guardrails. Once you've got the control plane deployed, there are several ways to actually set up your application to be injected, or meshed. And once you have that, you get all these tools out of the box. The key tooling that is really attractive to people with Linkerd is these tap and top capabilities. We were seeing them geared around the deployments here, but you can really see both aggregate stats about request and response rates and the general health of your pods and workloads, as well as the requests going through them; and then with the tap feature you can actually look at those requests in real time.


You can focus on a specific source and destination or specific paths, and do some interesting analysis on the traffic as it's flying through your application in real time. There's more information here if you want extra details about the requests. Currently, this is all that's exposed, but as you can see, it's already providing a lot of information out of the box.


Before leaving the UI, I should mention that there's also this traffic split configuration option. I'm not going to go through a demo of this now, just for time. But one thing I've seen from working on service mesh stuff is Linkerd's presence in a broader community effort to standardize the APIs for service meshes, called the Service Mesh Interface (SMI). Linkerd was written to support the SMI APIs out of the box, and one of those APIs is traffic splitting, enabling you to essentially reroute traffic for a particular destination to one or more alternative destinations.
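As a rough sketch, an SMI TrafficSplit looks something like this (hypothetical service names; the weights shift 10% of traffic addressed to web-svc over to a canary backend):

```sh
kubectl apply -f - <<EOF
apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
  name: web-split
  namespace: emojivoto
spec:
  service: web-svc        # apex service that clients address
  backends:
  - service: web-svc-v1   # hypothetical current version
    weight: 900m
  - service: web-svc-v2   # hypothetical canary version
    weight: 100m
EOF
```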


You could be using this for progressive delivery, with things like Flagger, where you're essentially slowly shifting traffic over to a new version of a service, or you're rolling out two versions of a service side by side to collect data from your users. So there's a really good UI for that here, and if you look at the most recent release announcement, you'll see a lot more information about that.


But the last thing I wanted to touch on is just really props to the Linkerd people for providing not just a good dashboard for this kind of information, but also good tooling from the command line. For all of these features we've been looking at, we can do the same thing from the command line. So let's first get some stats; or actually, first a check. We can do a check on the emojivoto namespace and instantly get a sense that everything's healthy and connected to Linkerd. We can do a stat on the deployments and get some good aggregate stats; this is all possible because Linkerd deploys Prometheus, so it can maintain these relevant queries and aggregate stats. We can also get live stats: we can run top on the deployments in the namespace and see in real time what requests are occurring.
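The commands in that sequence:

```sh
# Is the injected application healthy and connected to the mesh?
linkerd check --proxy -n emojivoto

# Aggregate success rates, request rates, and latencies per deployment,
# backed by the Prometheus instance Linkerd installs.
linkerd stat deploy -n emojivoto

# A live, top-style view of the requests occurring right now.
linkerd top deploy -n emojivoto
```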


And I've made this too large, so it goes off the screen, but there's a lot more information off to the right if this were reasonably sized: you get the request count and some stats about the performance of those requests, and so forth.


And lastly, same thing with tap. So let's tap a particular workload, the web deployment, and here we get access to the exact traffic. Where top was more of an aggregate per endpoint, this is now an actual record of all of the inputs and outputs for a particular workload.
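For reference:

```sh
# Stream the individual requests and responses flowing through the
# web deployment in real time.
linkerd tap deploy/web -n emojivoto
```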


So as you can see, out of the box, with basically zero configuration, we get really fast onboarding of a new service mesh implementation. Out of the box we're getting mTLS for free, with a certificate authority that's generated unique certificates for each of our sidecars, and once we've onboarded an application, which is ultimately a single command, we're able to start using all these interesting features related to observability: looking at aggregate stats about the namespace or about particular workloads, and then seeing the actual traffic. With Linkerd you can also do things like configure policy, and as I mentioned before, they're starting to move into features like traffic splitting.


So that about covers it for the unboxing of Linkerd. There are a lot more things you can do once you've gotten through the basic getting started, so I would encourage you to really dig into the docs. They're a really good summary of, more broadly, all the features that are available, with some useful user guides for understanding how to use those features, as well as other references. I mentioned the release channels: the standard way we installed gets you the latest stable release, but there are more frequent releases on the edge channel if you want to stay right at the cutting edge.
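If you want to follow the edge channel instead, there's an equivalent install script (the URL may differ; see the Linkerd docs for the current instructions):

```sh
# Install the latest edge-channel CLI rather than the stable release.
curl -sL https://run.linkerd.io/install-edge | sh
```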


And the last thing I'll note is that there's a very active community, and a lot of enterprises looking at service mesh are trying to solve, first and foremost, exactly the things that Linkerd provides. So the community around this is pretty broad: a lot of contributors and a lot of people in Slack. If you want to learn more, I would encourage you to check these things out.