Traffic in ambient mesh: Redirection using iptables and GENEVE tunnels (Part 2)

In my previous post, I explained the role of Istio CNI and mentioned the two mechanisms the Istio CNI plugin supports for redirecting traffic in the ambient mesh.

In this post, I will dive deeper into how redirection using iptables and GENEVE tunnels works.

You can also watch the accompanying video here

 

Redirection using iptables and GENEVE tunnels

The CNI plugin initializes the routing on each node and sets up the iptables and ipset rules. On each node, two virtual interfaces are set up: istioin and istioout.

$ ip link show
…
6: istioin: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/ether ee:9b:3a:7d:20:99 brd ff:ff:ff:ff:ff:ff
7: istioout: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/ether 36:e1:15:c1:b7:fe brd ff:ff:ff:ff:ff:ff
…

As the names suggest, the purpose of these interfaces is to handle inbound (istioin) and outbound (istioout) traffic on the node.

Each of the two interfaces is connected through a GENEVE (Generic Network Virtualization Encapsulation) tunnel to an interface on the ztunnel pod running on the same node.

Figure: GENEVE tunnels

These virtual interfaces, combined with the iptables rules and route tables on the node, ensure that traffic from ambient pods is intercepted and, depending on the direction (inbound or outbound), sent to either istioin or istioout.

The packets sent to those interfaces end up on the pistioin or pistioout interface of the ztunnel pod running on the same node.

We can use the command below to look at the details of istioin or istioout, or any other interface on the host:

$ ip -d addr show istioin
6: istioin: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether ee:9b:3a:7d:20:99 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65485
    geneve id 1000 remote 10.244.1.3 ttl auto dstport 6081 noudpcsum udp6zerocsumrx numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    inet 192.168.126.1/30 brd 192.168.126.3 scope global istioin
       valid_lft forever preferred_lft forever

Note the line starting with geneve, which references the remote IP 10.244.1.3. In this example, that is the IP address of the ztunnel pod on the same node.
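If you want to pull the tunnel ID and the remote IP out of that output in a script, a small awk filter is enough. The sketch below is only an illustration: the function name geneve_info is made up for this post, and it reads the command output from stdin so you can try it without a cluster.

```shell
# Print "<geneve id> <remote ip>" for the geneve line found in the output of
# `ip -d addr show <dev>`, read from stdin.
# geneve_info is a hypothetical helper name, not part of Istio or iproute2.
geneve_info() {
  awk '$1 == "geneve" {
         for (i = 2; i < NF; i++) {
           if ($i == "id")     id = $(i + 1)
           if ($i == "remote") remote = $(i + 1)
         }
         print id, remote
       }'
}

# On a node you would run: ip -d addr show istioin | geneve_info
# Here the function is fed the geneve line from the output above.
geneve_info <<'EOF'
    geneve id 1000 remote 10.244.1.3 ttl auto dstport 6081 noudpcsum udp6zerocsumrx numtxqueues 1 numrxqueues 1
EOF
```

You can then compare the remote IP with the IP of the ztunnel pod scheduled on that node.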

Additionally, as part of Kubernetes networking, a virtual ethernet (veth) interface pair is typically created for each pod running on the node. This veth pair connects the container to the node's network.

Figure: Connecting pods to the node's network using veth pairs

The other changes on the node include the ztunnel-flavored iptables chains: packets are redirected from the standard chains (e.g. PREROUTING, OUTPUT, …) in the existing tables (nat and mangle) to custom ztunnel chains.

Figure: Default iptables chains redirect to custom ztunnel chains  

Additionally, an IP set with the name ztunnel-pods-ips gets created on the node and is used to store the IP addresses of pods that are part of the ambient mesh. 

What is an IP set? IP sets are a Linux kernel framework for storing sets of IP addresses, networks, port numbers, MAC addresses, or combinations of them. A tool called "ipset" is used to configure the IP sets.

Every time a pod gets added, the CNI plugin adds the pod's IP address to the ztunnel-pods-ips IP set on the node. If we go to the node, we can look at the IPs that are part of the IP set by running the ipset list command:

$ ipset list
Name: ztunnel-pods-ips
Type: hash:ip
Revision: 0
Header: family inet hashsize 1024 maxelem 65536
Size in memory: 328
References: 1
Number of entries: 2
Members:
10.244.1.6
10.244.1.5

Note: to install the ipset tool on a Debian-based node, you can run `apt update && apt install ipset`.

The example output above shows two IP addresses (10.244.1.5 and 10.244.1.6) in the IP set on that node. They are the IPs of the pods on the node that are part of the ambient mesh.

Figure: Pods and their IP addresses in the ambient mesh

As the pods are added or removed (i.e. their IP address changes), the CNI plugin keeps the IP set on the node up to date.
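To watch the set change as pods come and go, it helps to reduce the ipset list output to just the member IPs. The snippet below is a minimal sketch: ipset_members is a hypothetical helper name, and it parses the output from stdin so it can be tried with the sample output shown earlier.

```shell
# Print the members of the IP set named in $1, given `ipset list` output on stdin.
# ipset_members is a hypothetical helper name used only for this example.
ipset_members() {
  awk -v name="$1" '
    $1 == "Name:"    { in_set = ($2 == name); in_members = 0; next }
    $1 == "Members:" { in_members = in_set; next }
    in_members && NF { print $1 }
  '
}

# On a node you would run: ipset list | ipset_members ztunnel-pods-ips
ipset_members ztunnel-pods-ips <<'EOF'
Name: ztunnel-pods-ips
Type: hash:ip
Revision: 0
Header: family inet hashsize 1024 maxelem 65536
Size in memory: 328
References: 1
Number of entries: 2
Members:
10.244.1.6
10.244.1.5
EOF
```

If you only need a yes/no answer for a single address, `ipset test ztunnel-pods-ips 10.244.1.5` run on the node reports whether that IP is in the set.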

The iptables rules consult the ztunnel-pods-ips IP set: packet addresses are looked up in the set and the packets are marked accordingly.

For example, packets that originate from an ambient pod are marked with 0x100/0x100 and get redirected through the istioout interface to the ztunnel.
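The 0x100/0x100 notation is a value/mask pair: a packet mark matches when the bits selected by the mask equal the same bits of the value. The sketch below spells that check out with shell arithmetic. The function name fwmark_matches is made up for this example, and the marks passed to it are sample values.

```shell
# Succeed when packet mark $1 matches a rule written as VALUE/MASK in $2,
# i.e. when (mark & mask) == (value & mask).
# fwmark_matches is a hypothetical helper name; the body runs in a subshell
# so its variables do not leak into the caller.
fwmark_matches() (
  mark=$1
  value=${2%/*}   # part before the slash
  mask=${2#*/}    # part after the slash
  [ $(( $mark & $mask )) -eq $(( $value & $mask )) ]
)

fwmark_matches 0x100 0x100/0x100 && echo "0x100 matches 0x100/0x100"
fwmark_matches 0x300 0x100/0x100 && echo "0x300 matches too, because bit 0x100 is set"
fwmark_matches 0x40  0x100/0x100 || echo "0x40 does not match 0x100/0x100"
```

Because only the masked bits are compared, each mark can be tested independently of any other bits set on the packet.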

We can use the ip rule list command to see how the route tables are set up using different marks:

$ ip rule list
0:      from all lookup local
100:    from all fwmark 0x200/0x200 goto 32766
101:    from all fwmark 0x100/0x100 lookup 101
102:    from all fwmark 0x40/0x40 lookup 102
103:    from all lookup 100
32766:  from all lookup main
32767:  from all lookup default

If we continue with the previous example, rule 101 above tells us that any packets marked with 0x100/0x100 (i.e. originating from an ambient pod) should be routed according to the routes in route table 101.

To look at that route table, run the command below:

$ ip route show table 101
default via 192.168.127.2 dev istioout
10.244.2.3 dev veth0cd2d371 scope link

The default route (first line) tells us to send traffic to the IP address 192.168.127.2 (pistioout on ztunnel) via the istioout interface.

The second, more specific route says that any traffic sent directly to 10.244.2.3 (i.e. the IP address of the ztunnel pod) should go through the virtual interface (veth0cd2d371).

The istioout interface (192.168.127.1) is connected through the GENEVE tunnel to the interface with IP 192.168.127.2 on the ztunnel running on the same node. In practice, this means that if we send a request to 192.168.127.2 through the istioout interface, the packets end up on the ztunnel outbound interface called pistioout (note the prefix p), effectively routing the outbound traffic from the pods in the ambient mesh through the ztunnel.
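To make the choice between the two routes in table 101 concrete, here is a simplified sketch of the lookup. It assumes the table contains only single-host routes and a default route, as in the output above, and does not implement general longest-prefix matching. The name pick_route and the destination 10.96.0.10 are made up for the example.

```shell
# Pick the route for destination $1 from `ip route show table 101` output on stdin.
# Simplified: a host route that equals the destination wins, otherwise the
# default route is used. pick_route is a hypothetical helper name.
pick_route() {
  awk -v dst="$1" '
    $1 == dst       { host = $0 }
    $1 == "default" { def = $0 }
    END             { print (host != "" ? host : def) }
  '
}

# The two routes from the output above.
table_101='default via 192.168.127.2 dev istioout
10.244.2.3 dev veth0cd2d371 scope link'

printf '%s\n' "$table_101" | pick_route 10.244.2.3   # traffic addressed to the ztunnel pod itself
printf '%s\n' "$table_101" | pick_route 10.96.0.10   # any other destination (sample address)
```

The first call prints the veth route and the second prints the default route through istioout. On a node, `ip route get` with the `mark` option asks the kernel for the same decision.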

The figure below shows how the packets flow for a request originating from a pod inside the ambient mesh.

Figure: Outbound traffic flow through the ztunnel

In addition to the outbound mark, there are a couple of other packet marks used in the configuration. The table below explains the different rules and marks the rules try to match. 

| Rule | Matches | Description |
| --- | --- | --- |
| 100: from all fwmark 0x200/0x200 goto 32766 | 0x200/0x200 (skip mark) | Skip any traffic marked with 0x200/0x200 (i.e. no need to intercept it) and go directly to the main route table. |
| 101: from all fwmark 0x100/0x100 lookup 101 | 0x100/0x100 (outbound mark) | Outbound traffic from the node gets redirected to istioout. See the example above. |
| 102: from all fwmark 0x40/0x40 lookup 102 | 0x40/0x40 (proxy return mark) | Packets with the proxy return mark (0x40/0x40) go to the ztunnel proxy veth. |

Table: Packet marks and rules on the node
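The rules can also be read as a small decision procedure that checks the marks in priority order. The sketch below is a simplification under a few assumptions: it ignores rule 0 (the local table), it reports only the first rule that matches, and it does not model the kernel moving on to the next rule when the selected table has no matching route. The function name classify_mark is made up for this example.

```shell
# Report which of the rules from `ip rule list` a packet mark hits first.
# classify_mark is a hypothetical helper name; the body runs in a subshell.
classify_mark() (
  mark=$1
  if   [ $(( $mark & 0x200 )) -ne 0 ]; then echo "rule 100: skip, goto 32766 (main table)"
  elif [ $(( $mark & 0x100 )) -ne 0 ]; then echo "rule 101: outbound, lookup table 101"
  elif [ $(( $mark & 0x40 ))  -ne 0 ]; then echo "rule 102: proxy return, lookup table 102"
  else                                      echo "rule 103: unmarked, lookup table 100"
  fi
)

classify_mark 0x200
classify_mark 0x100
classify_mark 0x40
classify_mark 0x0
```

Checking the skip mark first mirrors the rule priorities: a packet that carries 0x200 goes to the main table even if other mark bits are set.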

Conclusion

In this post, we looked at how the Istio CNI plugin sets up redirection using iptables and GENEVE tunnels on the cluster nodes. I also explained the interfaces that are set up on the node and how they connect directly to the interfaces on the ztunnel pods.

In the next post, we'll look into the ztunnel side and explain the configuration that happens there. I will also talk about the waypoint proxies and the second mechanism the Istio CNI plugin uses for redirection: eBPF.

As Solo.io is a co-founder of the Istio ambient sidecar-less architecture and leads the development upstream in the Istio community, we are uniquely positioned to help our customers adopt this architecture for production security and compliance requirements. Please reach out to us to talk with an expert.