
Choosing the Right Routing in Cilium

The way our applications communicate at scale brings networking challenges (and potential headaches!). Networks are so complex, vast, and far-reaching that it takes specialized protocols to manage, navigate, and influence the paths our applications need to take. This is especially true for large containerized environments spanning multiple geographic regions.

Cilium offers a flexible networking approach for Kubernetes clusters. For routing traffic between nodes, Cilium provides two primary methods: encapsulation (tunneling via VXLAN or Geneve) and direct routing (often called “native routing”). Each method has its own advantages and trade-offs:

Encapsulation

By default, Cilium runs in encapsulation (tunnel) mode, which builds an overlay network using VXLAN (the default protocol) or Geneve. Every node in the cluster can tunnel to every other node, forming a full mesh in which each of the n nodes reaches the other n − 1 directly. As long as the nodes have IP reachability to each other, pod traffic is routed without any further configuration.
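Since tunnel mode is the default, there is usually nothing to configure. If you want to pin the mode and protocol explicitly at install time, the sketch below builds the corresponding Helm `--set` flags. It assumes the Helm value names used by Cilium 1.14 and later (`routingMode`, `tunnelProtocol`); older charts used a single `tunnel` value instead, so check the chart version you deploy. The helper function names are hypothetical.

```python
# Hypothetical helpers: build Helm `--set` flags that select Cilium's
# encapsulation (tunnel) mode. Value names assume the Cilium 1.14+ Helm chart.
VALID_TUNNEL_PROTOCOLS = ("vxlan", "geneve")


def tunnel_mode_values(protocol: str = "vxlan") -> dict[str, str]:
    """Return Helm values that enable tunnel mode with the given protocol."""
    if protocol not in VALID_TUNNEL_PROTOCOLS:
        raise ValueError(f"unsupported tunnel protocol: {protocol!r}")
    return {"routingMode": "tunnel", "tunnelProtocol": protocol}


def to_set_flags(values: dict[str, str]) -> list[str]:
    """Render a values dict as a flat list of `--set key=value` arguments."""
    flags: list[str] = []
    for key, value in values.items():
        flags += ["--set", f"{key}={value}"]
    return flags


if __name__ == "__main__":
    # Prints the flags to append to `helm install cilium ...` for Geneve.
    print(" ".join(to_set_flags(tunnel_mode_values("geneve"))))
```

Keeping the values as a plain dict makes it easy to merge them with the rest of your install options before rendering them as flags.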

This mode offers simpler onboarding, a larger address space for pods (Pod CIDRs are not constrained by the underlying network), automatic configuration as nodes join the cluster, and the ability to carry metadata such as the source security identity inside the encapsulation header. However, because VXLAN and Geneve add headers to every packet, it's highly recommended to enable jumbo frames on the physical network when operating in this mode. That isn't always possible. Without jumbo frames, the MTU available to pods shrinks, and any mismatch between the pod MTU and the network MTU leads to packet fragmentation, more packets on the wire than needed, or drops that can look like tunnels sporadically going offline.
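To see why jumbo frames matter, here is the arithmetic for how much room encapsulation leaves for the inner packet. This is a minimal sketch assuming standard header sizes with no VLAN tags and no Geneve options; Cilium performs its own MTU detection, so treat this only as a way to reason about the numbers.

```python
# Bytes added on top of the inner IP packet by VXLAN/Geneve encapsulation.
# The underlay MTU counts L3 bytes only, so the OUTER Ethernet header is not
# included, but the INNER Ethernet frame travels inside the tunnel payload.
INNER_ETHERNET = 14
UDP = 8
TUNNEL_BASE_HEADER = 8  # VXLAN header, or Geneve base header without options
OUTER_IP = {"ipv4": 20, "ipv6": 40}


def encap_overhead(outer_family: str = "ipv4", geneve_options: int = 0) -> int:
    """Total per-packet encapsulation overhead in bytes."""
    return INNER_ETHERNET + TUNNEL_BASE_HEADER + UDP + OUTER_IP[outer_family] + geneve_options


def pod_mtu(underlay_mtu: int, outer_family: str = "ipv4", geneve_options: int = 0) -> int:
    """Largest inner packet that fits in one underlay packet without fragmenting."""
    return underlay_mtu - encap_overhead(outer_family, geneve_options)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"underlay MTU {mtu} -> pod MTU {pod_mtu(mtu)}")
```

With a standard 1500-byte underlay and an IPv4 outer header, pods are left with 1450 bytes; with 9000-byte jumbo frames they get 8950, comfortably above the 1500 bytes most applications expect.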

Pros:

Overlay simplicity: Tunneling encapsulates cluster traffic in an overlay network, which means you don’t need to worry about the underlying network’s intricacies. This encapsulation abstracts away potential network conflicts and policies.
Uniform addressing: In an encapsulated overlay, you have full control over IP addressing, making it easier to avoid IP conflicts with the underlying network.
Potential for better multi-cluster support: For setups that involve stretching clusters over different data centers or cloud providers, overlay networks can provide a consistent network space.

Cons:

Performance overhead: Encapsulation and decapsulation of packets introduce some overhead, which might lead to slightly reduced network performance compared to native routing.
MTU considerations: Due to encapsulation, the Maximum Transmission Unit (MTU) size of packets can be an issue. Encapsulated packets are larger, and if not managed correctly, this can lead to fragmentation or dropped packets.
Complexity: While the overlay abstracts underlying network complexities, it also introduces its own set of complexities, like managing the overlay, potential tunneling issues, etc.

Direct (Native) Routing

In native routing mode, Cilium relies on the routing capabilities of the network it runs on. Packets from an endpoint that are not addressed to another endpoint on the same node are handed off to the Linux kernel's routing subsystem for forwarding. The underlying network must therefore be able to route the IPs in the Pod CIDRs.
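Concretely, "able to route the Pod CIDRs" means that somewhere in the underlay there must be a route from each node's Pod CIDR to that node's IP. The sketch below generates those routes in `ip route` syntax from a hypothetical mapping of node IPs to Pod CIDRs. In practice these routes might live on an upstream router, be advertised via BGP, or be installed on each node when all nodes share an L2 segment.

```python
import ipaddress


def underlay_routes(pod_cidrs: dict[str, str]) -> list[str]:
    """Build one route per node: its Pod CIDR is reachable via its node IP.

    `pod_cidrs` is a hypothetical input mapping node IP -> Pod CIDR
    allocated to that node. Routes are returned in insertion order.
    """
    routes = []
    for node_ip, cidr in pod_cidrs.items():
        node = ipaddress.ip_address(node_ip)
        net = ipaddress.ip_network(cidr)  # raises ValueError if host bits are set
        if node in net:
            raise ValueError(f"node IP {node} falls inside its own Pod CIDR {net}")
        routes.append(f"ip route add {net} via {node}")
    return routes


if __name__ == "__main__":
    nodes = {"192.168.1.11": "10.0.1.0/24", "192.168.1.12": "10.0.2.0/24"}
    print("\n".join(underlay_routes(nodes)))
```

A shared router would hold all of these routes, while each individual node only needs the routes for the other nodes' Pod CIDRs.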

Pros:

Performance: Native routing generally offers better performance since it avoids the overhead of encapsulation. Traffic flows directly between nodes without the added layer of tunneling.
Simplicity in networking: Without the tunneling layer, the networking stack has fewer layers to traverse, potentially reducing troubleshooting complexity in certain scenarios.
No MTU issues from encapsulation: Since there’s no encapsulation, you avoid the MTU size issues associated with overlay networks.

Cons:

Requires a cooperative underlying network: The underlying network must be aware of the Pod CIDRs, and routes must be propagated appropriately. This might not always be possible, especially in certain cloud environments.
IP addressing conflicts: Without an overlay to provide a separate address space, you must ensure that the Pod CIDRs do not conflict with the underlying network’s other IP ranges.
Potential security concerns: Depending on your environment, using direct routing might expose more of the cluster’s internal traffic patterns to the underlying network, which could be a concern if the environment isn’t fully trusted.
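The addressing-conflict concern is easy to check before rolling out native routing. The sketch below uses Python's standard `ipaddress` module to flag every Pod CIDR that overlaps an existing underlay range, as well as Pod CIDRs that overlap each other. The input lists are hypothetical; in practice you would fill them from your IPAM plan and network inventory.

```python
import ipaddress
from itertools import combinations


def find_conflicts(pod_cidrs: list[str], underlay_ranges: list[str]) -> list[tuple[str, str]]:
    """Return every pair of ranges whose addresses overlap."""
    pods = [ipaddress.ip_network(c) for c in pod_cidrs]
    others = [ipaddress.ip_network(c) for c in underlay_ranges]
    conflicts = []
    # Pod CIDRs must not collide with anything already routed in the underlay.
    for p in pods:
        for o in others:
            if p.version == o.version and p.overlaps(o):
                conflicts.append((str(p), str(o)))
    # Pod CIDRs must not collide with each other either.
    for a, b in combinations(pods, 2):
        if a.version == b.version and a.overlaps(b):
            conflicts.append((str(a), str(b)))
    return conflicts


if __name__ == "__main__":
    print(find_conflicts(["10.0.0.0/16"], ["10.0.128.0/24", "192.168.0.0/16"]))
```

An empty result means the Pod CIDRs can be routed natively without shadowing any existing network.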

Choosing Between Encapsulation and Direct (Native) Routing in Cilium

The choice between encapsulation and native routing depends largely on your environment, requirements, and constraints. If you’re in a cloud environment where you have limited control over the underlying network, or in a situation where overlapping networks are a concern, tunneling might be the more straightforward choice.

On the other hand, if you’re running in an on-premises environment or a dedicated cloud setup where you have control over network routing and desire optimal performance, native routing can be a good fit.
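If native routing fits your environment, the following sketch collects the relevant Helm values in one place. It assumes the Cilium 1.14+ chart: `routingMode=native` disables encapsulation, `ipv4NativeRoutingCIDR` declares the range the underlay can route (traffic to it is not masqueraded), and `autoDirectNodeRoutes` lets Cilium install routes to other nodes' Pod CIDRs itself, which only works when all nodes share an L2 network. The helper name is hypothetical; verify the value names against your chart version.

```python
import ipaddress


def native_mode_values(native_cidr: str, same_l2_segment: bool) -> dict[str, str]:
    """Return Helm values for native routing (assumes Cilium 1.14+ value names)."""
    net = ipaddress.ip_network(native_cidr)
    if net.version != 4:
        raise ValueError("this sketch only handles ipv4NativeRoutingCIDR")
    values = {
        "routingMode": "native",          # no VXLAN/Geneve encapsulation
        "ipv4NativeRoutingCIDR": str(net),  # range the underlay can route natively
    }
    if same_l2_segment:
        # Only safe when every node can reach every other node at L2.
        values["autoDirectNodeRoutes"] = "true"
    return values


if __name__ == "__main__":
    print(native_mode_values("10.0.0.0/8", same_l2_segment=True))
```

The native routing CIDR should cover all Pod CIDRs in the cluster; when nodes span multiple L2 segments, leave `autoDirectNodeRoutes` off and rely on routers or BGP to distribute the Pod CIDR routes instead.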

You should always test in a representative environment to see which method best meets your needs, both from a functional and performance perspective.
