How to turn application logs into useful metrics with Gloo Edge and Grafana
Monitoring your applications is a requirement in the cloud native world, but getting access to useful metrics is often a difficult task. For many teams, metrics become most useful at the end of the development lifecycle, mostly during day-2 operations (post-deployment). At that point, your operations teams can leverage the metrics to monitor your applications and get a better understanding of what is happening.
Yet development teams work at the very beginning of the development lifecycle. To get good and meaningful metrics, some iterations and feedback are required. In reality, developers care most about features, tests, and… log traces! Developers dream of having good log traces because then they can properly debug their applications.
So, how can you improve this experience?
Turning application logs into useful metrics
Promtail is an agent which ships the contents of local logs to a log aggregator. The magic happens when, besides shipping logs, it can also parse them. That allows you to configure transformations which can be translated into meaningful metrics.
To picture this, think of a log trace. For example:
[ERROR] Error trying to connect to the endpoint. Connection reset.
You could parse the log traces which contain the expression [ERROR] and create a counter metric out of them, to be displayed in your favorite monitoring stack, such as Prometheus and Grafana.
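For example, a Promtail pipeline could match every log line containing [ERROR] and increment a counter from it. Here is a minimal sketch (the {app="my-app"} selector and the error_lines metric name are illustrative; it follows the same pipeline structure used later in this article):

pipelineStages:
- match:
    selector: '{app="my-app"}'   # illustrative label selector
    stages:
    - regex:
        # Capture lines that contain [ERROR]
        expression: '.*(?P<error_line>\[ERROR\]).*'
    - metrics:
        error_lines:
          type: Counter
          description: "Total count of log lines containing [ERROR]"
          source: error_line
          config:
            action: inc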
A typical developer scenario for logs and metrics
One of our customers came to us with a question. They had configured OPA (Open Policy Agent) for their passthrough server as described in the documentation. Unfortunately, the OPA service was not running when they deployed the Gloo Edge resource (AuthConfig).
The event threw an error in the ExtAuth pod logs:
gloo-system extauth-67b888f686-2s7cr extauth {"level":"error","ts":" [...] ", "caller":"config/generator.go:114","msg":"Errors encountered while processing new server configuration", "version":"1.9.7","error":"1 error occurred:\n\t* failed to get auth service for auth config with id [gloo-system.passthrough-auth]; this configuration will be ignored: failed to create grpc client manager: context deadline exceeded\n\n", "stacktrace":" [...] "}
The customer was quite interested in having everything covered by metrics. But when they discovered the issue, there were no metrics implemented to catch this scenario. The process to fix this involved opening a ticket and waiting for the engineering team to work on it. This delay could easily be avoided with the technique you will see in the next section.
Using Gloo Edge to build metrics from logs
The goal of this workshop is to catch ExtAuth (one of the Gloo Edge components) log traces so that they can be converted into metrics and exposed by a Grafana dashboard. Let’s walk through a common scenario and solution.
Your architecture will look like this:
Your Admin user is in charge of applying Gloo Edge resources, while your DevOps team is in charge of monitoring the infrastructure. You know that the error log is thrown when you apply an AuthConfig resource. In that resource, you specify the address of the OPA service, which is not reachable.
Set up the environment
For this workshop you need to have:
- A Kubernetes cluster
- Helm
- A Gloo Edge Enterprise license key
Install Gloo Edge with a LICENSE_KEY:
export LICENSE_KEY=<here-your-license>
helm repo add glooe https://storage.googleapis.com/gloo-ee-helm
helm repo update
helm upgrade --install gloo glooe/gloo-ee --namespace gloo-system \
  --create-namespace --version 1.10.1 --set-string license_key=$LICENSE_KEY --devel
First, let’s discover the error. Create an application (e.g. httpbin):
kubectl apply -f https://raw.githubusercontent.com/istio/istio/1.12.0/samples/httpbin/httpbin.yaml
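Optionally, confirm that httpbin is running and that Gloo Edge has discovered an Upstream for it (the exact Upstream name can vary with the service port defined in the manifest):

kubectl get pods -l app=httpbin
kubectl get upstreams -n gloo-system | grep httpbin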
And expose it through Gloo Edge:
kubectl apply -f - <<EOF
apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  name: demo
  namespace: gloo-system
spec:
  virtualHost:
    domains:
    - "*"
    routes:
    - matchers:
      - prefix: /
      options:
        extauth:
          configRef:
            name: passthrough-auth
            namespace: gloo-system
      routeAction:
        single:
          upstream:
            name: default-httpbin-80
            namespace: gloo-system
EOF
Create a file for the AuthConfig manifest so that you can then apply it and delete it easily:
cat << EOF > authconfig.yaml
apiVersion: enterprise.gloo.solo.io/v1
kind: AuthConfig
metadata:
  name: passthrough-auth
  namespace: gloo-system
spec:
  configs:
  - passThroughAuth:
      grpc:
        address: ext-authz.default.svc.cluster.local:9000
        connectionTimeout: 30s
EOF
Now, apply it:
kubectl apply -f authconfig.yaml
Since the service ext-authz.default.svc.cluster.local:9000 referenced in the AuthConfig does not exist yet, an error is thrown in the ExtAuth pod.
Let’s verify it:
kubectl logs -l gloo=extauth -n gloo-system | grep "failed to create grpc client manager"
And you will see the following error:
NOTE: If you do not see the error, wait a bit and try again. Propagating the events can take a bit of time.
gloo-system extauth-67b888f686-2s7cr extauth {"level":"error","ts":" [...] ",
"caller":"config/generator.go:114",
"msg":"Errors encountered while processing new server configuration",
"version":"1.9.7","error":"1 error occurred:\n\t* failed to get auth
service for auth config with id [gloo-system.passthrough-auth]; this configuration
will be ignored: failed to create grpc client manager: context deadline exceeded\n\n",
"stacktrace":" [...] "}
Now, you need to catch this error with Promtail, parse it into a metric, and let Prometheus scrape that metric.
First, let’s delete the AuthConfig which triggered the error, to start from a clean state:
kubectl delete -f authconfig.yaml
Deploy Prometheus and Grafana. Promtail depends on Prometheus because of the ServiceMonitor resource, which is why Promtail must be installed afterwards.
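If the Bitnami and Grafana chart repositories are not yet configured on your machine, add them first (these are the public repository URLs):

helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update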
# Prometheus
helm upgrade --install prometheus -n logging bitnami/kube-prometheus --version 6.6.0 --create-namespace

# Grafana
helm upgrade --install grafana grafana/grafana -n logging --version 6.13.5 --create-namespace -f - <<EOF
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
    - name: Prometheus
      type: prometheus
      uuid: prometheus
      access: proxy
      url: http://prometheus-kube-prometheus-prometheus.logging.svc.cluster.local:9090
      isDefault: true
      editable: true
adminUser: admin
adminPassword: password
service:
  type: LoadBalancer
sidecar:
  dashboards:
    enabled: true
EOF
Finally, you need to install Promtail. Notice the regex in the configuration, which catches the log traces you are targeting:
# Promtail
helm upgrade --install promtail -n logging grafana/promtail --version 3.8.1 -f - <<EOF
extraArgs:
- -dry-run  # This blocks the delivery of logs to Loki
config:
  logLevel: info
  serverPort: 9080  # To expose the metrics
  lokiAddress: "http://this-will-be-ignored-due-to-dry-run"  # To not deviate from the purpose of the article, you do not integrate with Loki
  snippets:
    pipelineStages:
    - match:
        selector: '{app="extauth"}'
        stages:
        - regex:
            # This is the regex
            expression: ".*(?P<passthrough_not_ready>error.*failed to create grpc client manager).*"
        - metrics:
            passthrough_not_ready:
              prefix: 'gloo_'
              type: Gauge
              # type: Counter
              description: "Total count of log traces when a passthrough service is not ready"
              source: passthrough_not_ready
              config:
                action: inc
serviceMonitor:
  enabled: true
  interval: 10s
  scrapeTimeout: 5s
EOF
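Optionally, verify that the chart created the ServiceMonitor for Prometheus and that the Promtail pods are running (the app.kubernetes.io/name=promtail label is the one used by the grafana/promtail chart):

kubectl get servicemonitor -n logging
kubectl get pods -n logging -l app.kubernetes.io/name=promtail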
Once you have Grafana ready, access it through your browser:
kubectl -n logging get svc grafana -o jsonpath='{.status.loadBalancer.ingress[0].*}'
You can use these credentials:
Username: admin
Password: password
Create a dashboard with the following specifications:
- Query:
count(gloo_passthrough_not_ready)
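Note that the metric name is derived from the Promtail configuration above: the gloo_ prefix combined with the passthrough_not_ready capture group defined in the regex stage.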
For the panel configuration, the most relevant part is this:
You now have the whole integration in place.
Set the dashboard to show the last 5 minutes and you should see the panel:
Testing your configuration
Let’s verify that the dashboard shows an error once it occurs.
Apply the AuthConfig resource:
kubectl apply -f authconfig.yaml
Port-forward to access Promtail’s exposed metrics:
kubectl --namespace logging port-forward daemonset/promtail 9080
And check the metrics in your browser: http://127.0.0.1:9080/metrics
You should be able to see your new metric:
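If you prefer the command line (assuming curl is available), you can filter the exposed metrics directly; with the gloo_ prefix configured above, the metric shows up as gloo_passthrough_not_ready:

curl -s http://127.0.0.1:9080/metrics | grep gloo_passthrough_not_ready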
If you go to your new Grafana dashboard you should see the error:
Once you deploy the ext-authz service, there will be no more errors in the logs. After the event is propagated, the dashboard turns to No Error.
kubectl apply -f https://raw.githubusercontent.com/istio/istio/master/samples/extauthz/ext-authz.yaml
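Before checking the dashboard again, you can confirm that the service is up (the app=ext-authz label comes from the Istio sample manifest):

kubectl get pods -l app=ext-authz
kubectl get svc ext-authz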
The creation of an alert to handle this error is left to the reader.
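If you want a head start, here is a minimal PrometheusRule sketch (it assumes the Prometheus Operator CRDs installed by kube-prometheus; the rule name is illustrative, and depending on how your Prometheus instance selects rules you may need to add matching labels):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: gloo-passthrough-alerts  # illustrative name
  namespace: logging
spec:
  groups:
  - name: gloo-extauth
    rules:
    - alert: PassthroughAuthServiceNotReady
      # Fires when the metric derived from the ExtAuth logs is present and non-zero
      expr: count(gloo_passthrough_not_ready) > 0
      for: 1m
      labels:
        severity: warning
      annotations:
        summary: "ExtAuth cannot reach the passthrough auth service"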
Level up for more advanced metrics use cases
The case above is straightforward. A more complex scenario is to leverage Header Transformations and then Enrich Access Logs. The goal is to get that processed data into the logs; once it is there, you can create metrics out of it.
For example, you can perform analytics to understand how your system is used based on the requests it receives, as sketched below.
In an e-commerce system:
- How many times a day is Product A (my-e-commerce.com/products/product-a) visited compared to Product B (my-e-commerce.com/products/product-b)?
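To make this concrete, assume your enriched access logs were turned into a counter with a path label (the http_requests_total metric name here is purely illustrative). You could then compare the two products with queries like:

# Requests to Product A over the last 24 hours
sum(increase(http_requests_total{path="/products/product-a"}[1d]))
# Requests to Product B over the last 24 hours
sum(increase(http_requests_total{path="/products/product-b"}[1d]))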
An analytics example with Istio and an eCommerce website will be published shortly in the official Grafana blog site. Stay tuned.
Thanks to Neeraj Poddar for his feedback!