ARP suppression is a feature for MP-BGP EVPN that reduces ARP flooding on VXLAN networks. Flooding impacts network performance, so it should be kept to a minimum.
IPv4 uses ARP to map an IP address to a MAC address. When a host wants to talk to another host on the same subnet, it sends an ARP request to figure out the remote host’s MAC address. An ARP request is broadcast traffic, so it falls in the category of broadcast, unknown-unicast, and multicast traffic (BUM traffic).
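To make the fields concrete, here is a minimal Python sketch that builds an ARP request frame with the standard-library `struct` and `socket` modules. It only illustrates the Ethernet/ARP layout from RFC 826. It is not how a host's network stack actually builds the frame, and the function name `build_arp_request` is hypothetical. The addresses are those of S1 and S2 in the captures later in this lesson.

```python
import socket
import struct

def build_arp_request(sender_mac: str, sender_ip: str, target_ip: str) -> bytes:
    """Return an Ethernet frame carrying an ARP request (hypothetical helper)."""
    src = bytes.fromhex(sender_mac.replace(":", ""))
    # Ethernet header: broadcast destination, sender as source, EtherType 0x0806 (ARP)
    eth = b"\xff" * 6 + src + struct.pack("!H", 0x0806)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware size (MAC length)
        4,                            # protocol size (IPv4 length)
        1,                            # opcode: request
        src,                          # sender MAC
        socket.inet_aton(sender_ip),  # sender IP
        b"\x00" * 6,                  # target MAC is unknown, so all zeros
        socket.inet_aton(target_ip),  # target IP we want to resolve
    )
    return eth + arp

frame = build_arp_request("00:50:c2:53:40:01", "172.16.12.1", "172.16.12.2")
print(len(frame))  # 42: 14-byte Ethernet header + 28-byte ARP payload
```

The all-ones destination MAC is what makes this frame BUM traffic. Every host in the broadcast domain receives it.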
When using VXLAN, the hosts have no idea that they are connected to a VXLAN network, so they send ARP requests as usual. The VTEPs treat these ARP requests as multi-destination traffic and flood them to all VTEPs within the L2 VNI, using either static ingress replication or a multicast underlay. The ARP reply is a unicast packet. Here is a visualization of ARP on a VXLAN network:
This flooding behavior is inefficient and should be reduced to a minimum.
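To see where the flooding cost comes from, here is a minimal Python sketch of the two replication methods. It assumes a simplified model where a VTEP only needs the list of other VTEPs in the VNI (ingress replication) or the VNI's multicast group (multicast underlay). The mode strings and the function name are hypothetical labels, not NX-OS keywords, and the third VTEP, 4.4.4.4, is hypothetical as well.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

@dataclass(frozen=True)
class OuterPacket:
    src_vtep: str  # outer source IP: the VTEP's loopback
    dst: str       # outer destination IP: a remote VTEP or a multicast group

def replicate_bum(src_vtep: str, vni_vteps: Iterable[str], mode: str,
                  mcast_group: Optional[str] = None) -> List[OuterPacket]:
    """Return the outer packets a VTEP sends for one BUM frame, such as an ARP request."""
    if mode == "ingress-replication":
        # Head-end replication: one unicast copy per remote VTEP in the VNI.
        return [OuterPacket(src_vtep, vtep) for vtep in sorted(vni_vteps) if vtep != src_vtep]
    if mode == "multicast":
        if mcast_group is None:
            raise ValueError("multicast mode needs a group address")
        # A single copy to the group; the underlay replicates it toward each receiver.
        return [OuterPacket(src_vtep, mcast_group)]
    raise ValueError(f"unknown mode: {mode}")

vteps = {"2.2.2.2", "3.3.3.3", "4.4.4.4"}
print(len(replicate_bum("2.2.2.2", vteps, "ingress-replication")))  # 2 copies
print(replicate_bum("2.2.2.2", vteps, "multicast", "239.1.1.1"))
```

Either way, every remote VTEP receives the ARP request, and every host behind those VTEPs in the VNI has to process it.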
So, how does ARP suppression work? There are two pieces to understand:
- Creating and maintaining the ARP suppression cache table
- Dealing with ARP requests from hosts
Let’s take a closer look at both.
ARP Suppression Cache Table
Once you enable ARP suppression, each VTEP creates and maintains an ARP suppression cache table, in which it stores the IP-MAC bindings of the hosts in each VNI.
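Conceptually, the cache table maps a VNI and an IP address to a MAC address. Here is a minimal Python sketch of such a table from LEAF1’s point of view. The class and field names are hypothetical; a real VTEP keeps these bindings in its own software and hardware tables. The bindings for S1 and S2 come from the captures later in this lesson.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class Binding:
    mac: str
    origin: str                     # "local" (connected host) or "remote" (learned via EVPN)
    next_hop: Optional[str] = None  # remote VTEP that advertised the binding, if any

class ArpSuppressionCache:
    """Hypothetical per-VTEP table of IP-MAC bindings, scoped by VNI."""

    def __init__(self) -> None:
        self._table: Dict[Tuple[int, str], Binding] = {}

    def add(self, vni: int, ip: str, binding: Binding) -> None:
        # A newer binding for the same VNI and IP replaces the old one.
        self._table[(vni, ip)] = binding

    def lookup(self, vni: int, ip: str) -> Optional[Binding]:
        return self._table.get((vni, ip))

cache = ArpSuppressionCache()
cache.add(10010, "172.16.12.1", Binding("0050.c253.4001", "local"))
cache.add(10010, "172.16.12.2", Binding("0050.c253.5001", "remote", next_hop="3.3.3.3"))
print(cache.lookup(10010, "172.16.12.2"))
```

Keying on the VNI as well as the IP address keeps bindings from different VNIs apart, even when they reuse the same subnet.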
There are two ways to add MAC and IP bindings to the cache table:
- Local
- Remote
Let’s take a closer look at both options.
Local
VTEPs learn bindings from the ARP packets that their locally connected (downstream) hosts send.
Many hosts are “chatty” and generate traffic as soon as they connect to the network. They might send a Gratuitous ARP (GARP) or Reverse ARP (RARP) packet, or they might immediately send an ARP request for their default gateway, which is usually the VTEP.
When a host immediately generates traffic, the VTEP can quickly learn the host’s MAC and IP address and add them to the ARP suppression cache table.
There are exceptions, though. Some hosts don’t generate any traffic until another host looks for them. We call these silent hosts. The only way to discover their MAC and IP address is when another host sends an ARP request for them: when the silent host answers with an ARP reply, we learn its MAC and IP address. Printers and security devices are typical examples of silent hosts. They can stay connected for hours or days without initiating any traffic.
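Local learning boils down to reading the sender fields of the ARP packets that arrive from connected hosts. The Python sketch below assumes raw Ethernet frames as input and a plain dictionary as the cache. The helper names and the silent host’s address (172.16.12.50, 00:50:c2:53:99:99) are hypothetical. It handles ARP requests, replies, and GARPs, but not RARP, which uses a different EtherType (0x8035).

```python
import socket
import struct
from typing import Dict, Optional, Tuple

ETH_P_ARP = 0x0806

def make_arp(opcode: int, sender_mac: str, sender_ip: str, target_ip: str,
             target_mac: str = "00:00:00:00:00:00") -> bytes:
    """Build an Ethernet/ARP frame (broadcast for requests, unicast for replies)."""
    smac = bytes.fromhex(sender_mac.replace(":", ""))
    tmac = bytes.fromhex(target_mac.replace(":", ""))
    dst = b"\xff" * 6 if opcode == 1 else tmac
    eth = dst + smac + struct.pack("!H", ETH_P_ARP)
    arp = struct.pack("!HHBBH6s4s6s4s", 1, 0x0800, 6, 4, opcode,
                      smac, socket.inet_aton(sender_ip), tmac, socket.inet_aton(target_ip))
    return eth + arp

def snoop_arp(frame: bytes) -> Optional[Tuple[str, str]]:
    """Return (sender_ip, sender_mac) from an Ethernet/ARP frame, or None."""
    if len(frame) < 42 or struct.unpack("!H", frame[12:14])[0] != ETH_P_ARP:
        return None
    if struct.unpack("!HHBB", frame[14:20]) != (1, 0x0800, 6, 4):
        return None  # not Ethernet/IPv4 ARP
    return socket.inet_ntoa(frame[28:32]), frame[22:28].hex(":")

def learn_local(cache: Dict[str, str], frame: bytes) -> None:
    learned = snoop_arp(frame)
    # ARP probes use sender IP 0.0.0.0 and carry no usable binding.
    if learned and learned[0] != "0.0.0.0":
        ip, mac = learned
        cache[ip] = mac

cache: Dict[str, str] = {}
# S1 sends a GARP (sender IP == target IP) when it connects.
learn_local(cache, make_arp(1, "00:50:c2:53:40:01", "172.16.12.1", "172.16.12.1"))
# S1 ARPs for a silent host: only S1's own binding is (re)learned.
learn_local(cache, make_arp(1, "00:50:c2:53:40:01", "172.16.12.1", "172.16.12.50"))
print("172.16.12.50" in cache)  # False
# The silent host's reply finally reveals its binding.
learn_local(cache, make_arp(2, "00:50:c2:53:99:99", "172.16.12.50", "172.16.12.1",
                            target_mac="00:50:c2:53:40:01"))
print(cache)
```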
Remote
The second way to learn MAC and IP bindings and install them in the cache table is from remote VTEPs. When a VTEP learns a host’s MAC and IP address, it installs the binding in its own cache table and also advertises it to the other VTEPs in an MP-BGP EVPN type 2 (MAC/IP advertisement) route. The remote VTEPs then install this binding in their own cache tables.
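On the receiving side, a VTEP turns each received type 2 route into a cache entry. Here is a minimal Python sketch, assuming a simplified route object with only the fields that matter here; the real route, defined in RFC 7432, carries more, such as the route distinguisher and labels. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass(frozen=True)
class Type2Route:
    """Simplified EVPN MAC/IP advertisement route (hypothetical model)."""
    vni: int
    mac: str
    ip: Optional[str]  # None when the route carries only a MAC address
    next_hop: str      # VTEP that originated the route

# (vni, ip) -> (mac, remote VTEP)
RemoteCache = Dict[Tuple[int, str], Tuple[str, str]]

def install_remote(cache: RemoteCache, route: Type2Route, local_vtep: str) -> bool:
    """Install the route's IP-MAC binding; return True if the cache changed."""
    if route.next_hop == local_vtep:
        return False  # our own advertisement; the binding is already local
    if route.ip is None:
        return False  # MAC-only route: no IP-MAC binding to install
    cache[(route.vni, route.ip)] = (route.mac, route.next_hop)
    return True

leaf1_cache: RemoteCache = {}
# LEAF1 (2.2.2.2) receives S2's binding from LEAF2 (3.3.3.3).
install_remote(leaf1_cache, Type2Route(10010, "0050.c253.5001", "172.16.12.2", "3.3.3.3"), "2.2.2.2")
print(leaf1_cache)
```

Note that a type 2 route that advertises only a MAC address gives the VTEP nothing to put in the ARP suppression cache.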
Respond to ARP Requests
Now that you know how the VTEPs fill the cache table, the second part is understanding how they deal with ARP requests from hosts.
Cache table without match
When a host in a VNI sends an ARP request for a host in the same VNI, the VTEP intercepts the request and checks its cache table. When there is no match, the VTEP floods the ARP request to the other VTEPs, just as it would without ARP suppression.
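The lookup a VTEP performs on an intercepted ARP request can be sketched in Python as follows, assuming the cache is a dictionary keyed by VNI and IP address. The `Action` names are hypothetical labels. The `SUPPRESS` branch corresponds to the match case described next.

```python
from enum import Enum
from typing import Dict, Optional, Tuple

class Action(Enum):
    FLOOD = "flood"        # miss: forward the request as BUM traffic
    SUPPRESS = "suppress"  # hit: drop the request and reply locally

def handle_arp_request(cache: Dict[Tuple[int, str], str], vni: int,
                       target_ip: str) -> Tuple[Action, Optional[str]]:
    """Decide what to do with an ARP request for target_ip in the given VNI."""
    mac = cache.get((vni, target_ip))
    if mac is None:
        # No binding yet: fall back to normal flooding to the other VTEPs in the VNI.
        return Action.FLOOD, None
    # Binding found: the VTEP can answer with this MAC instead of flooding.
    return Action.SUPPRESS, mac

cache = {(10010, "172.16.12.2"): "0050.c253.5001"}
print(handle_arp_request(cache, 10010, "172.16.12.3"))  # miss
print(handle_arp_request(cache, 10010, "172.16.12.2"))  # hit
```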
Cache table contains match
When there is a match in the cache table, the VTEP suppresses the ARP request so it won’t be flooded throughout the fabric.
Instead, the VTEP creates an ARP reply on behalf of the destination host and sends it to the host that sent the ARP request. The VTEP acts as an ARP proxy, with one difference: with traditional proxy ARP, the reply contains the router’s MAC address, whereas here the reply contains the destination host’s MAC address, exactly as if it originated from the destination host.
Not flooding that ARP request to other VTEPs saves bandwidth on the underlay network and some CPU cycles on the hosts because they don’t have to process unnecessary ARP requests.
Issues
ARP suppression might sound like a good idea. Who doesn’t want less flooding and a more efficient network? It is also easy to configure. However, there are some issues.
On a typical Ethernet network, ARP requests are never filtered; they are flooded whenever needed. ARP suppression changes this behavior. Some applications use ARP as a keepalive mechanism. With ARP suppression enabled, these keepalives no longer travel end-to-end, and the application may conclude that something is wrong. You might argue that (mis)using ARP for keepalives isn’t the best idea, but in reality, our networks run all kinds of applications.
Issues related to inactive hosts also exist because of the mismatch between the MAC address aging time (often 5 minutes by default) and the ARP aging time (often 4 hours by default). A full explanation is beyond the scope of this lesson.
Whether ARP suppression is enabled by default depends on the vendor. Do not enable this feature without fully understanding the possible complications.
Configuration
Let’s take a look at ARP suppression in action. We’ll do a before-and-after comparison so you can see the difference. The topology is the same one we used in the MP-BGP EVPN L2 VNI lesson: two leaf switches connected to a single spine switch, with one host behind each leaf.
The two hosts, S1 and S2, are Ubuntu Docker containers that we’ll use to generate ARP and ICMP traffic. We use a single L2 VNI so that the hosts can communicate directly within the same subnet. I’m using Cisco NX-OS 9000v 10.3(1) on all switches.
Configurations
Want to take a look for yourself? Here, you will find the startup configuration of each device.
SPINE1
hostname SPINE1
nv overlay evpn
feature ospf
feature bgp
feature pim
ip pim rp-address 1.1.1.1 group-list 224.0.0.0/4
ip pim ssm range 232.0.0.0/8
interface Ethernet1/1
  no switchport
  mac-address 0050.c253.1001
  ip address 192.168.12.1/24
  ip ospf network point-to-point
  ip router ospf 1 area 0.0.0.0
  ip pim sparse-mode
  no shutdown
interface Ethernet1/2
  no switchport
  mac-address 0050.c253.1002
  ip address 192.168.13.1/24
  ip ospf network point-to-point
  ip router ospf 1 area 0.0.0.0
  ip pim sparse-mode
  no shutdown
interface loopback0
  ip address 1.1.1.1/32
  ip router ospf 1 area 0.0.0.0
  ip pim sparse-mode
icam monitor scale
router ospf 1
router bgp 123
  log-neighbor-changes
  neighbor 2.2.2.2
    remote-as 123
    update-source loopback0
    address-family l2vpn evpn
      send-community
      send-community extended
      route-reflector-client
  neighbor 3.3.3.3
    remote-as 123
    update-source loopback0
    address-family l2vpn evpn
      send-community
      send-community extended
      route-reflector-client
LEAF1
hostname LEAF1
nv overlay evpn
feature ospf
feature bgp
feature pim
feature vn-segment-vlan-based
feature nv overlay
ip pim rp-address 1.1.1.1 group-list 224.0.0.0/4
ip pim ssm range 232.0.0.0/8
vlan 10
  vn-segment 10010
interface nve1
  no shutdown
  host-reachability protocol bgp
  source-interface loopback0
  member vni 10010
    mcast-group 239.1.1.1
interface Ethernet1/1
  no switchport
  mac-address 0050.c253.2001
  ip address 192.168.12.2/24
  ip ospf network point-to-point
  ip router ospf 1 area 0.0.0.0
  ip pim sparse-mode
  no shutdown
interface Ethernet1/2
  switchport access vlan 10
interface loopback0
  ip address 2.2.2.2/32
  ip router ospf 1 area 0.0.0.0
  ip pim sparse-mode
router ospf 1
router bgp 123
  log-neighbor-changes
  neighbor 1.1.1.1
    remote-as 123
    update-source loopback0
    address-family l2vpn evpn
      send-community
      send-community extended
LEAF2
hostname LEAF2
nv overlay evpn
feature ospf
feature bgp
feature pim
feature vn-segment-vlan-based
feature nv overlay
ip pim rp-address 1.1.1.1 group-list 224.0.0.0/4
ip pim ssm range 232.0.0.0/8
vlan 10
  vn-segment 10010
interface nve1
  no shutdown
  host-reachability protocol bgp
  source-interface loopback0
  member vni 10010
    mcast-group 239.1.1.1
interface Ethernet1/1
  no switchport
  mac-address 0050.c253.3001
  ip address 192.168.13.3/24
  ip ospf network point-to-point
  ip router ospf 1 area 0.0.0.0
  ip pim sparse-mode
  no shutdown
interface Ethernet1/2
  switchport access vlan 10
interface loopback0
  ip address 3.3.3.3/32
  ip router ospf 1 area 0.0.0.0
  ip pim sparse-mode
router ospf 1
router bgp 123
  log-neighbor-changes
  neighbor 1.1.1.1
    remote-as 123
    update-source loopback0
    address-family l2vpn evpn
      send-community
      send-community extended
Without ARP Suppression
Let’s start with the default behavior where ARP suppression is disabled. I’ll send a ping from S1 to S2:
root@S1:# ping 172.16.12.2 -c 5
PING 172.16.12.2 (172.16.12.2) 56(84) bytes of data.
64 bytes from 172.16.12.2: icmp_seq=1 ttl=64 time=5.52 ms
64 bytes from 172.16.12.2: icmp_seq=2 ttl=64 time=18.7 ms
64 bytes from 172.16.12.2: icmp_seq=3 ttl=64 time=24.8 ms
64 bytes from 172.16.12.2: icmp_seq=4 ttl=64 time=7.16 ms
64 bytes from 172.16.12.2: icmp_seq=5 ttl=64 time=16.6 ms
The ARP request, captured on the link between SPINE1 and LEAF2, looks like this:
Frame 8: 110 bytes on wire (880 bits), 110 bytes captured (880 bits)
Ethernet II, Src: OrionTechnol_53:10:02 (00:50:c2:53:10:02), Dst: IPv4mcast_01:01:01 (01:00:5e:01:01:01)
Internet Protocol Version 4, Src: 2.2.2.2, Dst: 239.1.1.1
User Datagram Protocol, Src Port: 63096, Dst Port: 4789
Virtual eXtensible Local Area Network
Ethernet II, Src: OrionTechnol_53:40:01 (00:50:c2:53:40:01), Dst: Broadcast (ff:ff:ff:ff:ff:ff)
Address Resolution Protocol (request)
    Hardware type: Ethernet (1)
    Protocol type: IPv4 (0x0800)
    Hardware size: 6
    Protocol size: 4
    Opcode: request (1)
    Sender MAC address: OrionTechnol_53:40:01 (00:50:c2:53:40:01)
    Sender IP address: 172.16.12.1
    Target MAC address: Xerox_00:00:00 (00:00:00:00:00:00)
    Target IP address: 172.16.12.2
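This is a broadcast that is flooded to all VTEPs in the VNI. Because we use a multicast underlay, LEAF1 (2.2.2.2) encapsulates it with the multicast group 239.1.1.1 as the outer destination address. Here is the ARP reply: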
Frame 9: 110 bytes on wire (880 bits), 110 bytes captured (880 bits)
Ethernet II, Src: OrionTechnol_53:30:01 (00:50:c2:53:30:01), Dst: OrionTechnol_53:10:02 (00:50:c2:53:10:02)
Internet Protocol Version 4, Src: 3.3.3.3, Dst: 2.2.2.2
User Datagram Protocol, Src Port: 58649, Dst Port: 4789
Virtual eXtensible Local Area Network
Ethernet II, Src: OrionTechnol_53:50:01 (00:50:c2:53:50:01), Dst: OrionTechnol_53:40:01 (00:50:c2:53:40:01)
Address Resolution Protocol (reply)
    Hardware type: Ethernet (1)
    Protocol type: IPv4 (0x0800)
    Hardware size: 6
    Protocol size: 4
    Opcode: reply (2)
    Sender MAC address: OrionTechnol_53:50:01 (00:50:c2:53:50:01)
    Sender IP address: 172.16.12.2
    Target MAC address: OrionTechnol_53:40:01 (00:50:c2:53:40:01)
    Target IP address: 172.16.12.1
This is a unicast packet from S2 to S1, encapsulated by LEAF2 (3.3.3.3) and sent directly to LEAF1 (2.2.2.2).
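Both captures show the same encapsulation: an outer IP header, UDP destination port 4789, and a VXLAN header in front of the original Ethernet frame. The 8-byte VXLAN header is defined in RFC 7348. Here is a minimal Python sketch that packs and unpacks it for VNI 10010 from our configuration; the function names are hypothetical.

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned VXLAN port, as seen in the captures

def vxlan_header(vni: int) -> bytes:
    """Pack an RFC 7348 VXLAN header: flags, 24 reserved bits, 24-bit VNI, 8 reserved bits."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # 0x08 sets the I flag, which marks the VNI field as valid.
    return struct.pack("!B3xI", 0x08, vni << 8)

def parse_vni(header: bytes) -> int:
    """Return the VNI from the first 8 bytes of a VXLAN header."""
    flags, word = struct.unpack("!B3xI", header[:8])
    if not flags & 0x08:
        raise ValueError("I flag not set; VNI is not valid")
    return word >> 8

hdr = vxlan_header(10010)
print(hdr.hex())       # 0800000000271a00
print(parse_vni(hdr))  # 10010
```

The 24-bit VNI field is why VXLAN supports roughly 16 million segments, compared to 4094 usable VLAN IDs.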
VXLAN MP-BGP EVPN ARP with Multicast Underlay
Here is what LEAF1 has advertised in MP-BGP EVPN:
LEAF1# show bgp l2vpn evpn neighbors 1.1.1.1 advertised-routes
Peer 1.1.1.1 routes for address family L2VPN EVPN:
BGP table version is 5, Local Router ID is 2.2.2.2
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup, 2 - best2
   Network            Next Hop            Metric     LocPrf     Weight Path
Route Distinguisher: 2.2.2.2:32777    (L2VNI 10010)
*>l[2]:[0]:[0]:[48]:[0050.c253.4001]:[0]:[0.0.0.0]/216
                      2.2.2.2                           100      32768 i
Route Distinguisher: 3.3.3.3:32777
Above, we see the MAC address of S1 but no IP address: the [0]:[0.0.0.0] part of the route shows an IP address length of 0 and an empty IP address. You see the same thing on LEAF2 for S2:
LEAF2# show bgp l2vpn evpn neighbors 1.1.1.1 advertised-routes
Peer 1.1.1.1 routes for address family L2VPN EVPN:
BGP table version is 5, Local Router ID is 3.3.3.3
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup, 2 - best2
   Network            Next Hop            Metric     LocPrf     Weight Path
Route Distinguisher: 2.2.2.2:32777
Route Distinguisher: 3.3.3.3:32777    (L2VNI 10010)
*>l[2]:[0]:[0]:[48]:[0050.c253.5001]:[0]:[0.0.0.0]/216
                      3.3.3.3                           100      32768 i
We only see the MAC address. This is as expected for our L2 VNI.
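If you want to check many of these routes at once, for example in a script that collects this output, a small parser helps. The Python sketch below extracts the MAC address and the IP address from a type 2 route as NX-OS displays it above. It assumes only the format visible in this output and skips the two leading zero fields; the route with an IP address in the usage example is hypothetical.

```python
import re
from typing import NamedTuple, Optional

class MacIpRoute(NamedTuple):
    mac: str
    ip_len: int
    ip: Optional[str]  # None when the route carries no IP address

# [2]:[x]:[x]:[48]:[mac]:[ip length]:[ip], optionally followed by /prefix-length
_TYPE2 = re.compile(
    r"\[2\]:\[[^\]]*\]:\[[^\]]*\]:\[48\]:\[(?P<mac>[0-9a-f.]+)\]:"
    r"\[(?P<ip_len>\d+)\]:\[(?P<ip>[^\]]+)\](?:/\d+)?"
)

def parse_type2(line: str) -> MacIpRoute:
    """Parse a type 2 route string from 'show bgp l2vpn evpn' style output."""
    m = _TYPE2.search(line)
    if m is None:
        raise ValueError(f"no type 2 route in {line!r}")
    ip_len = int(m.group("ip_len"))
    return MacIpRoute(m.group("mac"), ip_len, m.group("ip") if ip_len else None)

print(parse_type2("*>l[2]:[0]:[0]:[48]:[0050.c253.4001]:[0]:[0.0.0.0]/216"))
print(parse_type2("[2]:[0]:[0]:[48]:[0050.c253.4001]:[32]:[172.16.12.1]"))  # hypothetical
```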
With ARP Suppression
Now, let’s see how ARP suppression works.
First, we need to carve the TCAM (Ternary Content Addressable Memory); otherwise, you can’t enable ARP suppression. TCAM is a special memory type for storing data that requires fast lookups, such as access lists. The TCAM size is limited, so you need to decide which features you need and how much memory you assign to them. TCAM carving means we reallocate TCAM resources for specific features or requirements.
Resources are assigned in slices. A slice is a unit of memory allocation, and it can be 256 or 512 entries.
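Because the total TCAM size is fixed, carving is a zero-sum reallocation: memory you give to one feature has to come from another, in whole slices. The Python sketch below models only that bookkeeping. The region names and sizes are hypothetical and do not reflect the defaults of the NX-OS 9000v or any other platform; sizes are in abstract units.

```python
from typing import Dict

def carve(regions: Dict[str, int], take_from: str, give_to: str,
          size: int, slice_size: int = 256) -> Dict[str, int]:
    """Move `size` units from one TCAM region to another, in whole slices."""
    if size <= 0 or size % slice_size:
        raise ValueError(f"{size} is not a positive multiple of the slice size {slice_size}")
    available = regions.get(take_from, 0)
    if available < size:
        raise ValueError(f"region {take_from!r} only has {available} units")
    carved = dict(regions)  # leave the caller's layout untouched
    carved[take_from] = available - size
    carved[give_to] = carved.get(give_to, 0) + size
    return carved

# Hypothetical layout: free one slice for ARP suppression by shrinking an ACL region.
before = {"acl": 1024, "qos": 512, "arp-suppression": 0}
after = carve(before, "acl", "arp-suppression", 256)
print(after)  # {'acl': 768, 'qos': 512, 'arp-suppression': 256}
print(sum(before.values()) == sum(after.values()))  # True: the total never changes
```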
Whether you need TCAM carving depends on the platform. On the NX-OS 9000v, you do; otherwise, you get this error when you try to enable ARP suppression: