Traffic in ambient mesh: Redirection using iptables and GENEVE tunnels (Part 2)
In my previous post, I explained the role of Istio CNI and mentioned the two redirection mechanisms Istio CNI plugin supports for redirecting traffic in the ambient mesh.
In this post, I will dive deeper into how redirection using iptables and GENEVE tunnels works.
You can also watch the accompanying video here.
Redirection using iptables and GENEVE tunnels
The CNI plugin initializes the routing on each node and sets up the iptables and ipset rules. On each node, two virtual interfaces are set up – istioin and istioout:
$ ip link show
…
6: istioin: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/ether ee:9b:3a:7d:20:99 brd ff:ff:ff:ff:ff:ff
7: istioout: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/ether 36:e1:15:c1:b7:fe brd ff:ff:ff:ff:ff:ff
…
As the names suggest, the purpose of these interfaces is to handle inbound (istioin) and outbound (istioout) traffic on the node.
The two interfaces are connected using GENEVE (Generic Network Virtualization Encapsulation) tunnels to the interfaces on the ztunnel pod running on the same node.
These virtual interfaces, combined with the iptables rules and route tables on the node, ensure that traffic from ambient pods is intercepted and, depending on the direction (inbound or outbound), sent either to istioout or istioin.
The packets sent to those interfaces end up on the pistioout or pistioin interfaces of the ztunnel pod running on the same node.
We can use the command below to look at the details of istioin, istioout, or any other interface on the host:
$ ip -d addr show istioin
6: istioin: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether ee:9b:3a:7d:20:99 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65485
    geneve id 1000 remote 10.244.1.3 ttl auto dstport 6081 noudpcsum udp6zerocsumrx numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    inet 192.168.126.1/30 brd 192.168.126.3 scope global istioin
       valid_lft forever preferred_lft forever
Note the geneve id 1000 remote 10.244.1.3 line: the IP 10.244.1.3 in this example corresponds to the IP address of the ztunnel pod running on the same node.
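To make this more concrete, here is a minimal sketch of how a GENEVE interface like istioin could be created by hand with iproute2, using the VNI, remote address, and subnet from the output above; the actual steps the CNI plugin performs may differ.

# Create a GENEVE interface with VNI 1000 whose tunnel endpoint is the ztunnel pod IP
$ ip link add name istioin type geneve id 1000 remote 10.244.1.3
# Assign the tunnel-side address shown in the output above and bring the interface up
$ ip addr add 192.168.126.1/30 dev istioin
$ ip link set istioin up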
Additionally, as part of Kubernetes networking, a virtual ethernet (veth) interface pair is typically created for each pod running on the node. This veth pair connects the pod to the node's network.
Figure: Connecting pods to the node's network using veth pairs
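As a quick illustration of that veth plumbing (the interface and namespace names below are made up, not what the CNI plugin actually uses), a pair could be created like this:

# Create a veth pair: veth-pod stays on the node, eth0-pod will live inside the pod
$ ip link add veth-pod type veth peer name eth0-pod
# Move one end into the pod's network namespace (namespace name is hypothetical)
$ ip link set eth0-pod netns my-pod-netns
# Bring the node-side end up
$ ip link set veth-pod up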
The other changes on the node include the ztunnel-flavored iptables chains that redirect packets from the standard chains (e.g. PREROUTING, OUTPUT, …) in the existing tables (nat and mangle) to custom ztunnel chains.
Figure: Default iptables chains redirect to custom ztunnel chains
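Conceptually, the redirection from a standard chain to a custom chain boils down to rules like the following sketch; the chain names here are illustrative, and the full rule set that Istio CNI installs is more involved.

# Create a custom chain in the mangle table (illustrative name)
$ iptables -t mangle -N ztunnel-PREROUTING
# Make every packet that hits the standard PREROUTING chain traverse the custom chain first
$ iptables -t mangle -A PREROUTING -j ztunnel-PREROUTING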
Additionally, an IP set with the name ztunnel-pods-ips gets created on the node and is used to store the IP addresses of pods that are part of the ambient mesh.
What is an IP set? An IP set is a kernel framework for storing sets of elements such as IP addresses, networks, port numbers, or MAC addresses. The ipset tool is used to create and configure IP sets.
Every time a pod gets added to the ambient mesh, the CNI plugin adds the pod's IP address to the ztunnel-pods-ips IP set on the node. If we go to the node, we can look at the IPs that are part of the IP set by running the ipset list command:
$ ipset list
Name: ztunnel-pods-ips
Type: hash:ip
Revision: 0
Header: family inet hashsize 1024 maxelem 65536
Size in memory: 328
References: 1
Number of entries: 2
Members:
10.244.1.6
10.244.1.5
Note: to install the ipset tool on the node, you can run `apt update && apt install ipset`.
In the example output above, we notice two IP addresses (10.244.1.5 and 10.244.1.6) that are part of the IP set on that node. The two IP addresses correspond to the pods that are part of the ambient mesh on the node.
Figure: Pods and their IP addresses in the ambient mesh
As pods are added or removed (or their IP addresses change), the CNI plugin keeps the IP set on the node up to date.
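That bookkeeping can be approximated with the ipset CLI; the commands below are a sketch (the pod IP is just an example), not the exact calls the plugin makes.

# Create the set if it does not exist yet (-exist makes the command idempotent)
$ ipset create ztunnel-pods-ips hash:ip -exist
# Add a pod that joined the ambient mesh (example IP)
$ ipset add ztunnel-pods-ips 10.244.1.7
# Remove a pod that left the mesh or was deleted
$ ipset del ztunnel-pods-ips 10.244.1.7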
The ztunnel-pods-ips IP set is consulted by the iptables rules: a packet's source IP is looked up in the set and the packet is marked accordingly. For example, packets are marked with the 0x100/0x100 mark if they originate from an ambient pod, and they then get redirected through the istioout interface to the ztunnel.
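A simplified version of such a marking rule is shown below; the real rules live in the custom ztunnel chains and include additional conditions.

# Mark packets whose source IP is in the ztunnel-pods-ips set with the outbound mark
$ iptables -t mangle -A PREROUTING -m set --match-set ztunnel-pods-ips src -j MARK --set-xmark 0x100/0x100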
We can use the ip rule list command to see how the route tables are set up using different marks:
$ ip rule list
0:      from all lookup local
100:    from all fwmark 0x200/0x200 goto 32766
101:    from all fwmark 0x100/0x100 lookup 101
102:    from all fwmark 0x40/0x40 lookup 102
103:    from all lookup 100
32766:  from all lookup main
32767:  from all lookup default
If we continue with the previous example, rule 101 above tells us that any packets marked with 0x100/0x100 (i.e. originating from an ambient pod) should be routed according to the rules in route table 101.
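Such a policy-routing rule could be created with a command along these lines (shown purely to illustrate the mechanism):

# Send packets carrying the outbound mark to route table 101
$ ip rule add priority 101 fwmark 0x100/0x100 lookup 101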
To look at that route table, run the command below:
$ ip route show table 101
default via 192.168.127.2 dev istioout
10.244.2.3 dev veth0cd2d371 scope link
The default route (first line) tells us to send traffic to the IP address 192.168.127.2 (pistioout on the ztunnel) via the istioout interface.
The second, more specific route says that any traffic sent to 10.244.2.3 directly (i.e. the IP address of the ztunnel pod) should go through the virtual ethernet interface (veth0cd2d371).
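For reference, the two routes above could have been added with commands roughly like these (the veth name is specific to this example node):

# Default route for marked traffic: hand it to pistioout via the istioout GENEVE tunnel
$ ip route add table 101 default via 192.168.127.2 dev istioout
# Traffic addressed to the ztunnel pod itself goes straight out its veth
$ ip route add table 101 10.244.2.3/32 dev veth0cd2d371 scope link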
The istioout interface (192.168.127.1) is connected using a GENEVE tunnel to the interface with IP 192.168.127.2 on the ztunnel running on the same node. Practically, this means that if we send a request to 192.168.127.2 through the istioout interface, the packets end up on the ztunnel outbound interface called pistioout (note the p prefix), effectively routing the outbound traffic from the pods in the ambient mesh through the ztunnel.
The figure below shows how the packets flow for a request originating from a pod inside the ambient mesh.
Figure: Outbound traffic flow through the ztunnel
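One way to sanity-check this routing on the node is to ask the kernel which route a packet carrying the outbound mark would take; the destination IP below is arbitrary.

# Simulate a routing decision for a marked packet; it should resolve via istioout using table 101
$ ip route get 10.96.0.10 mark 0x100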
In addition to the outbound mark, there are a couple of other packet marks used in the configuration. The table below explains the different rules and marks the rules try to match.
| Rule | Matches | Description |
| --- | --- | --- |
| `100: from all fwmark 0x200/0x200 goto 32766` | `0x200/0x200` (skip mark) | Skip any traffic marked with `0x200/0x200` (i.e. no need to intercept it) and go directly to the main route table. |
| `101: from all fwmark 0x100/0x100 lookup 101` | `0x100/0x100` (outbound mark) | Outbound traffic from the node gets redirected to `istioout`. See the example above. |
| `102: from all fwmark 0x40/0x40 lookup 102` | `0x40/0x40` (proxy return mark) | Packets with the proxy return mark (`0x40/0x40`) go to the ztunnel proxy veth. |
Table: Packet marks and rules on the node
Conclusion
In this post, we looked at how redirection using iptables and GENEVE tunnels is set up by the Istio CNI plugin on the cluster nodes. I also explained the interfaces that are set up on the node and connect directly to the interfaces on the ztunnel pods.
In the next post, we'll look into the ztunnel side and explain the configuration that happens there. I will also talk about the waypoint proxies and the second mechanism the Istio CNI plugin uses for redirection – eBPF.
As Solo.io co-founded the Istio ambient sidecar-less architecture and leads its development upstream in the Istio community, we are uniquely positioned to help our customers adopt this architecture for production security and compliance requirements. Please reach out to us to talk with an expert.