

Porting eBPF Applications to BumbleBee

Krisztian Fekete | June 21, 2022

The easiest way to develop modern BPF CO-RE programs

Starting out with eBPF can be a daunting task.

There’s still not that much documentation available, and while examples can be found, they often come with little explanation. And when you do find something that looks promising, it might not work on your machine, could be outdated, or may require additional layers to handle distribution or the user-space side.

Fortunately, BPF CO-RE (Compile Once – Run Everywhere) aims to solve the problem of writing modern, portable eBPF programs, so building on it is a solid foundation for future-proof applications.

In this blog, we will cover a detailed example of how to take a BPF CO-RE libbpf script and port it to BumbleBee in order to solve user-space, distribution, and integration challenges.

IO Visor’s libbpf-tools/ collection is a good place to start if you are looking for existing scripts (why reinvent the wheel?), so that is exactly what we will do.

Why port eBPF applications to BumbleBee?

Let’s say you are new to eBPF and you would like to play around with the existing scripts available. Or, you know that an existing script can solve your business problem and you would like to take it into production.

Problems can arise, such as:

  • How would you store and distribute your eBPF script?
  • How would you integrate it into your existing ecosystem?
  • How would you validate the provenance of your scripts and handle security?

By using BumbleBee, you can reuse upstream code, modernize it, and solve distribution and integration problems, all without writing a single line of user-space code. This takes your code to production faster and more safely.

Porting libbpf-tools/oomkill

Catching oomkills is essential for operational reliability, but it’s not a straightforward problem to solve, especially on Kubernetes platforms.

Fortunately, eBPF can help here, so let’s see how we can leverage existing eBPF scripts and port them to BumbleBee.

First, let’s take a look at the oomkill program from the aforementioned libbpf-tools collection, and use it as an example.

We will take the following steps to port the code to BumbleBee:

  1. Examine the source code to get a better understanding of how it works, and what needs to be done
  2. Migrate from perfbuffer to ringbuffer to optimize and modernize our script
  3. Package the code as an OCI image, and make it emit Prometheus metrics

Understanding the original code

In short, this is what the original code does:

  • Defines a structure for our oomkill events
  • Defines our BPF logic, which will be triggered upon oomkill kernel probes
  • Sends these events to user-space
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>
#include <bpf/bpf_tracing.h>
        
#include "oomkill.h"

The vmlinux.h header contains all the Linux kernel type definitions. Thanks to BPF CO-RE, it doesn’t have to be generated from your local kernel, though you can generate it yourself if you wish.

The next header, bpf/bpf_helpers.h, is distributed with libbpf and contains common macros and helper functions, e.g. bpf_trace_printk(), which will come up in the section about debugging.

The bpf/bpf_core_read.h and bpf/bpf_tracing.h headers are also part of libbpf and provide macros and abstractions over low-level calls to make developing BPF programs a more streamlined experience.

We will need to define a structure to represent the event in user-space. This can be found in the last header, oomkill.h.

Note: We will include its content in our final oomkill.c for simplicity’s sake.
#define TASK_COMM_LEN 16
        
struct data_t {
    __u32 fpid;
    __u32 tpid;
    __u64 pages;
    char fcomm[TASK_COMM_LEN];
    char tcomm[TASK_COMM_LEN];
};

We need the process ID (fpid) and the name (fcomm) of the process that caused the oomkill, as well as the same for the process that was actually killed by the oomkiller; these are tpid and tcomm, respectively. In addition, we will also print out how many pages the victim had when it was killed.

Then we need a place to store all these data_t events, before they are sent to the user-space. This can be a map with BPF_MAP_TYPE_PERF_EVENT_ARRAY type, and we will name it events.

struct {
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(key_size, sizeof(u32));
    __uint(value_size, sizeof(u32));
} events SEC(".maps");

The next part is essentially our BPF program. The SEC macro defines what probe/tracepoint should be used to trigger an event. Thanks to libbpf, this is mostly abstracted away with these annotations.

SEC("kprobe/oom_kill_process")
int BPF_KPROBE(oom_kill_process, struct oom_control *oc, const char *message)
{
    struct data_t data;
        
    data.fpid = bpf_get_current_pid_tgid() >> 32;
    data.tpid = BPF_CORE_READ(oc, chosen, tgid);
    data.pages = BPF_CORE_READ(oc, totalpages);
    bpf_get_current_comm(&data.fcomm, sizeof(data.fcomm));
    bpf_probe_read_kernel(&data.tcomm, sizeof(data.tcomm), BPF_CORE_READ(oc, chosen, comm));
    bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &data, sizeof(data));
    return 0;
}

First, we get the ID of the process that was running when the oomkill happened; this is the process running in the context of our kprobe/oom_kill_process.

The bpf_get_current_pid_tgid() helper returns a 64-bit integer with the thread group ID (tgid) in the upper 32 bits and the thread ID in the lower 32 bits. We right-shift by 32 places to keep the tgid, which is what user-space reports as the process ID.

Then, we use the BPF_CORE_READ helper to extract tpid and pages in the context of the oomkill.

bpf_get_current_comm() fills in the current process’s name at the address passed as its first argument (&data.fcomm).

After this, we can read the oomkilled process’s name from the kernel with the bpf_probe_read_kernel() helper, using BPF_CORE_READ to obtain the address of the victim’s comm field from our oomkill context.

Finally, bpf_perf_event_output copies sizeof(data) bytes of struct data_t into the perf buffer, so the data can be consumed by user-space.

The last part of the original program is this line, setting the license to GPL; a GPL-compatible license is required to call many of the BPF helpers.

char LICENSE[] SEC("license") = "GPL";

A few notes on user-space

Besides the kernel-space code, you will need user-space code to load your program and consume its output.

If you take a look at the content of the libbpf-tools/ folder, you will see that every application has at least two files: one for kernel-space (for example, the file we just looked at above), and another one, also with a .c extension, for user-space.

We won’t explain the user-space part in detail here, but it’s important to understand its lifecycle, as one of the advantages of BumbleBee is that it takes care of these tasks so you can focus on building the actual BPF logic.

Without BumbleBee, you would need to implement argument parsing, visualize the data, and handle the lifecycle of the BPF application, which consists of loading and attaching the BPF structures, then freeing up the resources. All this can be tedious, so why not let BumbleBee take care of it for you?

Migrating to ringbuf

Now that we know how the original program is working, we can port it to BumbleBee.

The libbpf-tools version of oomkill uses a PerfBuffer map. BumbleBee supports RingBuffer and HashMap for exchanging data between kernel-space and user-space, so we have to refactor the code.

PerfBuffer can be considered suboptimal compared to RingBuffer, as the latter reduces memory overhead, preserves event ordering, and is more performant.

--- a/libbpf
+++ b/bumblebee
@@ -1,5 +1,5 @@
struct {
-       __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
-       __uint(key_size, sizeof(u32));
-       __uint(value_size, sizeof(u32));
-} events SEC(".maps");
+       __uint(type, BPF_MAP_TYPE_RINGBUF);
+       __uint(max_entries, 1 << 24);
+       __type(value, struct data_t);
+} oomkills SEC(".maps.print");

Here, we are changing the type of the map, getting rid of key_size and value_size, setting max_entries (a multiple of the 4096-byte page size), and defining the struct of our entries.

Additionally, we are changing the name from events to oomkills. While that’s just a minor UX change, it will be relevant when we get to the Prometheus integration.

The print suffix will also make more sense later. It defines how we want to consume the output of our BPF program. In this case, we set it to print, which prints the events to BumbleBee’s terminal UI.

Let’s see how the main function changed:

--- a/libbpf
+++ b/bumblebee
@@ -1,14 +1,21 @@
SEC("kprobe/oom_kill_process")
int BPF_KPROBE(oom_kill_process, struct oom_control *oc, const char *message)
{
-       struct data_t data;
-
-       data.fpid = bpf_get_current_pid_tgid() >> 32;
-       data.tpid = BPF_CORE_READ(oc, chosen, tgid);
-       data.pages = BPF_CORE_READ(oc, totalpages);
-       bpf_get_current_comm(&data.fcomm, sizeof(data.fcomm));
-       bpf_probe_read_kernel(&data.tcomm, sizeof(data.tcomm), BPF_CORE_READ(oc, chosen, comm));
-       bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &data, sizeof(data));
-       return 0;
-}
+        struct data_t *e;
+
+        e = bpf_ringbuf_reserve(&oomkills, sizeof(struct data_t), 0);
+        if (!e) {
+                return 0;
+        }
+
+        e->tpid = bpf_get_current_pid_tgid();
+        bpf_get_current_comm(&e->fcomm, TASK_COMM_LEN);
+
+        e->fpid = bpf_get_current_pid_tgid() >> 32;
+        e->pages = BPF_CORE_READ(oc, totalpages);
+        bpf_probe_read_kernel(&e->tcomm, sizeof(e->tcomm), BPF_CORE_READ(oc, chosen, comm));
+
+        bpf_ringbuf_submit(e, 0);
+
+        return 0;
+}

Here, we are creating a pointer to our data_t struct, so let’s call it e.

With the ringbuf approach, we first reserve space for our event, then submit it to user-space once all fields are populated.

Prometheus integration

BumbleBee can also help generate Prometheus metrics from the events.

If you change the .print suffix to .counter at the end of the ringbuf map’s name, BumbleBee will turn these events into counters and enable you to visualize them or fire alerts on them.

This is why the name of the map matters: it is included in the metric name, e.g. ebpf_solo_io_oomkills in this case.

Packaging the application

After the code is ready, we can save it as oomkill.c, then use BumbleBee’s CLI tool, bee, to build it and save it as an OCI image.

bee build oomkill.c localhost:5000/solo/oomkill:v1

After this, you can run it locally, or distribute it across your Kubernetes nodes as a DaemonSet. It’s up to you!

Additionally, you can verify the provenance of your modules by using cosign; you can find the instructions for this feature here.

Debugging BPF applications

Debugging eBPF programs is not yet a highly evolved area, but you can use bpf_printk() as the BPF equivalent of printf().

This function is part of the bpf helpers header, and defined like this:

#define __bpf_printk(fmt, ...)                     \
({                                                 \
  BPF_PRINTK_FMT_MOD char ____fmt[] = fmt;         \
  bpf_trace_printk(____fmt, sizeof(____fmt),       \
  ##__VA_ARGS__);                                  \
})

You can inject this function at the place where you need the value of a variable, e.g.:

SEC("kprobe/oom_kill_process")
int BPF_KPROBE(oom_kill_process, struct oom_control *oc, const char *message)
{
    ...
    e->tpid = bpf_get_current_pid_tgid();
    bpf_printk("oom_kill_process: setting tpid: %u", e->tpid);
    ...
    bpf_probe_read_kernel(&e->tcomm, sizeof(e->tcomm), BPF_CORE_READ(oc, chosen, comm));
    bpf_printk("oom_kill_process: setting tcomm: %s", e->tcomm);
    ...
}

Then, you can read the values at runtime by reading the trace_pipe:

sudo cat /sys/kernel/debug/tracing/trace_pipe
    mandb-6957    [001] d...   669.470023: bpf_trace_printk: oom_kill_process: setting tpid: 6957
    mandb-6957    [001] d...   669.470028: bpf_trace_printk: oom_kill_process: setting tcomm: coredns
    local-path-prov-5002    [002] d...   669.642077: bpf_trace_printk: oom_kill_process: setting tpid: 5002
    local-path-prov-5002    [002] d...   669.642082: bpf_trace_printk: oom_kill_process: setting tcomm: bash

Learn more about BumbleBee

We are happy to announce that we are actively working on a new eBPF workshop, focusing on developing eBPF applications with BumbleBee! Check out our eBPF and BumbleBee live workshops to learn more from our team of experts and get your eBPF and Cilium certifications.

If you’re interested in learning more about eBPF, head over to bumblebee.io and visit our GitHub Repo to dive into eBPF!

 
