Problem with ARP towards some Wi-Fi Access Points

Post by Peter » 25 Jun 2020, 16:21

This FAQ applies to:
  • cOS Core any version
Question:
We have started having problems with our Wi-Fi Access Points (APs): clients behind them are unable to reach the Firewall, or the Internet through the Firewall. The log contains entries reporting that ARP resolution failed. A log sample:
2020-06-25 08:24:30 +02:00 ARP: id=00300009 rev=1 event=arp_resolution_failed action=remove_entry ipaddr=192.168.41.50 iface=Lan
In the CLI it can look like this:
ARP cache of iface Lan
Reslvng 192.168.41.50 = 00-00-00-00-00-00 Expire=1
"192.168.41.50" is the IP of a client behind a Wi-Fi AP.

Answer:

The log message is fairly straightforward: the Firewall sent an ARP query on the interface facing the Wi-Fi AP (Lan in this example) to find the client behind the AP, but it did not get any reply.

Why does the Firewall ARP for the client IP?

The answer lies in the default behavior of the Firewall (and of many other products, routers etc). It works like this: let’s say that a client with IP 192.168.41.50 tries to connect to 8.8.8.8 using its default gateway 192.168.41.1 (which is the IP of one of the Firewall’s interfaces).

1. The client first requests a DHCP lease and gets the IP 192.168.41.50 with netmask 255.255.255.0 and gateway 192.168.41.1. So far so good.
2. The client wants to talk to 8.8.8.8, which is beyond its default gateway (192.168.41.1).
3. Since the client does not have the MAC address for 192.168.41.1 in its ARP cache, it sends an ARP request to find 192.168.41.1. The Firewall receives the request and replies to the client with its MAC address.
4. The client now knows the MAC address of 192.168.41.1 and sends the packet destined for 8.8.8.8 to that MAC address, since 192.168.41.1 is its default gateway and 8.8.8.8 does not belong to the client's local subnet (192.168.41.0/24).
5. The Firewall picks up the packet and forwards it to the destination server (assuming of course the IP rules allow it). A connection is created in the Firewall that tracks which source IP/port made the request.
6. When the response packet from 8.8.8.8 arrives at the Firewall, the Firewall knows that the packet should be returned to the initiator of the connection (192.168.41.50).
7. Since the Firewall so far does not have the IP 192.168.41.50 in its ARP cache, it does not know the MAC address of the client. So the Firewall sends an ARP request for 192.168.41.50 in order to learn which MAC address it should send the return packet to.
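
The decision the client makes in steps 2 to 4 (resolve the destination directly, or go via the default gateway) can be sketched in a few lines of Python. This is only an illustration of the on-link check using the addresses from the example; the function name next_hop is made up for this sketch and has nothing to do with cOS Core.

```python
import ipaddress

def next_hop(client_ip, netmask, gateway, destination):
    """Return the IP address the client has to ARP for to reach destination."""
    # Derive the client's local subnet from its IP and netmask.
    subnet = ipaddress.ip_interface(f"{client_ip}/{netmask}").network
    dest = ipaddress.ip_address(destination)
    # Destinations inside the local subnet are resolved directly,
    # everything else is handed to the default gateway.
    return str(dest) if dest in subnet else gateway

if __name__ == "__main__":
    # 8.8.8.8 is outside 192.168.41.0/24, so the client ARPs for the gateway.
    print(next_hop("192.168.41.50", "255.255.255.0", "192.168.41.1", "8.8.8.8"))
```

This is why the client only ever asks for the MAC address of 192.168.41.1, never for 8.8.8.8 itself.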

And here is where it (most likely) goes wrong: the client behind the Wi-Fi AP for some reason does not reply to this ARP query. The Firewall then has no way of knowing where it should send the return packet, and the traffic flow breaks. The Firewall generates a log entry saying "ARP resolution failed".
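
If the log is exported as text, a small script can help to see which clients and interfaces are affected. The following is a rough Python sketch, assuming each entry is a single line of key=value pairs like the sample in the question and that values contain no spaces; the function names are made up for this example.

```python
import re

# The log sample from the question, used here as test input.
SAMPLE = ("2020-06-25 08:24:30 +02:00 ARP: id=00300009 rev=1 "
          "event=arp_resolution_failed action=remove_entry "
          "ipaddr=192.168.41.50 iface=Lan")

def parse_fields(line):
    """Return the key=value pairs of one log line as a dict."""
    return dict(re.findall(r"(\w+)=(\S+)", line))

def failed_arp_targets(lines):
    """Count arp_resolution_failed events per (iface, ipaddr)."""
    counts = {}
    for line in lines:
        fields = parse_fields(line)
        if fields.get("event") == "arp_resolution_failed":
            key = (fields.get("iface"), fields.get("ipaddr"))
            counts[key] = counts.get(key, 0) + 1
    return counts

if __name__ == "__main__":
    print(failed_arp_targets([SAMPLE]))
```

If all the failing IP addresses turn out to sit behind the same AP (or the same AP model), that points towards the AP rather than the Firewall.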

Why does the client not respond to ARP?

It is a rather unusual scenario that should reasonably not happen, as it is basic layer 2 communication between the Firewall and the Wi-Fi AP. Maybe there is some sort of “security” function or feature on the AP that prevents incoming ARP requests from being forwarded to the clients behind the AP, but if that is the case it’s a bit… odd.

There is a setting in the Firewall that could be a potential workaround to the problem. The option can be found under “Network->ARP->Advanced Settings” and is called “ARP Requests”; change it from “Drop” to “Accept”.

This change basically means that when the client makes the ARP request in step #3 above, we will add the client's IP/MAC to our own ARP cache as well. Then we would not need to perform step #7, as we already have the client IP/MAC in our ARP cache.
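
The difference between the two modes can be illustrated with a toy model in Python. It is a heavily simplified sketch of the behavior described above and not how cOS Core is implemented; the class name and the client MAC address are made up for the example.

```python
class ToyFirewall:
    """Toy model of the ARP learning behavior described above (not cOS Core code)."""

    def __init__(self, arp_requests="Drop"):
        self.arp_requests = arp_requests  # "Drop" (default) or "Accept"
        self.arp_cache = {}               # maps IP -> MAC

    def on_arp_request(self, sender_ip, sender_mac):
        """Step 3: a client asks for the Firewall's MAC address."""
        if self.arp_requests == "Accept":
            # Learn the sender's IP/MAC from the request itself.
            self.arp_cache[sender_ip] = sender_mac
        # The request is answered in both modes (the reply is not modelled here).

    def must_arp_for(self, ip):
        """Step 7: does the return packet require a new ARP query?"""
        return ip not in self.arp_cache

if __name__ == "__main__":
    for mode in ("Drop", "Accept"):
        fw = ToyFirewall(mode)
        fw.on_arp_request("192.168.41.50", "aa-bb-cc-dd-ee-ff")  # made-up MAC
        print(mode, fw.must_arp_for("192.168.41.50"))
```

With "Drop" the model still has to ARP for the client in step 7, which is the query the AP appears to swallow. With "Accept" the entry is already in the cache and that query is never sent.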

The reason why we do not allow this by default is that we are a Firewall: by default we do not allow anything to add to or modify our ARP cache, since doing so makes the Firewall susceptible to ARP poisoning attacks.

Why has this problem started showing up more and more recently?

We do not know. It could be that some Wi-Fi APs have changed some of their default settings regarding ARP, and/or that a new firmware update has changed them automatically on many installations.

Is this a Clavister specific problem?

We would say no. Why would the Wi-Fi AP not forward our ARP request to the clients? It is one thing to block attempts to modify an existing ARP table, but to simply refuse to answer an ARP query? Very odd behavior.

The best solution is to find an option on the Wi-Fi AP that lets ARP requests through to the clients. Changing the global ARP setting on the Firewall should be considered a temporary workaround.

Useful CLI commands for troubleshooting:
"arp -show <interface>"
To see the status of the ARP cache on a specific interface.

"arpsnoop <interface> -verbose"
To see detailed information about the ARP queries sent and received by the Firewall. Warning: this could spam the console if there is a lot of ARP traffic on the target interface.

"logsnoop -on -pattern=<client IP> -num=20"
Useful to get log output directly on the console. The "-num" flag stops the output after 20 entries, which prevents the user from accidentally spamming the console with log entries.
