Network outage after a system restart or firmware upgrade (identical route problem)

Post by Peter » 21 Mar 2017, 08:06

This FAQ applies to:
  • Clavister cOS Core all versions
Question:

My system has been working fine for weeks or months without any configuration changes, but after a system restart (or firmware upgrade) I lost all Internet access. What happened?

Answer:

A very common scenario we have seen over the years is what we call the "Duplicate" or "Identical" route problem. Below is an example of how the routing table on an affected firewall can look:
Route Lan 192.168.1.0/24 Metric=100
Route Wan 203.0.113.0/24 Metric=100
Route Wan all-nets Gateway=203.0.113.1 Metric=100
Route DMZ all-nets Metric=100
The problem in the routing table above is that there are two all-nets routes (the route towards the Internet, i.e. the default route) with the same metric (100). This means the firewall cannot determine which route it should use and effectively picks one of them arbitrarily. After a system restart or firmware upgrade it may simply choose the DMZ route, and Internet access goes down. This is a fairly sneaky problem, as it can work fine for months if you are "lucky".
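To catch this before a restart does it for you, the routing table can be checked for identical routes. Below is a minimal Python sketch, assuming the routes have been copied out in the simplified text format used in this post (it is not a parser for real cOS Core CLI output); `find_identical_routes` is a hypothetical helper name. It groups routes by network and metric and reports every group that more than one route claims. It compares network strings literally, so "all-nets" and "0.0.0.0/0" would be treated as different networks.

```python
import re
from collections import defaultdict

# Assumption: lines follow the simplified format used in this post, e.g.
#   Route Wan all-nets Gateway=203.0.113.1 Metric=100
ROUTE_RE = re.compile(
    r"^Route\s+(?P<iface>\S+)\s+(?P<network>\S+)"
    r"(?:\s+Gateway=(?P<gateway>\S+))?\s+Metric=(?P<metric>\d+)\s*$"
)


def find_identical_routes(lines):
    """Return {(network, metric): [interfaces]} for every network/metric
    pair that is claimed by more than one route (hypothetical helper)."""
    groups = defaultdict(list)
    for line in lines:
        match = ROUTE_RE.match(line.strip())
        if match is None:
            continue  # skip blank lines and anything not in the expected format
        key = (match["network"], int(match["metric"]))
        groups[key].append(match["iface"])
    # Only keep groups where two or more routes compete for the same slot.
    return {key: ifaces for key, ifaces in groups.items() if len(ifaces) > 1}


if __name__ == "__main__":
    table = """\
Route Lan 192.168.1.0/24 Metric=100
Route Wan 203.0.113.0/24 Metric=100
Route Wan all-nets Gateway=203.0.113.1 Metric=100
Route DMZ all-nets Metric=100
"""
    for (network, metric), ifaces in find_identical_routes(table.splitlines()).items():
        print(f"Identical routes for {network} (Metric={metric}): {', '.join(ifaces)}")
```

Run against the first example table, it prints `Identical routes for all-nets (Metric=100): Wan, DMZ`. Overlapping but non-identical networks (such as 192.168.1.0/24 and all-nets) are deliberately not reported, since the more specific route normally decides those.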

The all-nets route is not the only one that can cause this problem; it can happen on the local network as well, as shown in the routing output below:
Route Lan 192.168.1.0/24 Metric=100
Route DMZ 192.168.1.0/24 Metric=100
Route Wan 203.0.113.0/24 Metric=100
Route Wan all-nets Gateway=203.0.113.1 Metric=100
In the routing output above we have a similar problem, but between Lan and DMZ. If Lan is where all our users are, connectivity could suddenly stop working for every user on the Lan interface after a system restart (or firmware upgrade). In this scenario the logs would show a large number of "Default_Access_Rule" entries for the Lan users, because once the DMZ route is chosen, traffic from 192.168.1.0/24 arriving on Lan no longer matches the interface the firewall expects that network on.
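Why a tie is a problem can be illustrated with a simplified route lookup model. The sketch below assumes the usual selection rule (most specific matching network first, then lowest metric) and models "all-nets" as 0.0.0.0/0; it is an illustration, not cOS Core's actual implementation, and `candidate_interfaces` is a hypothetical name. For any address in 192.168.1.0/24 the lookup ends with two equally good candidates, Lan and DMZ, so which one the firewall uses is decided by something other than the configuration.

```python
import ipaddress

# Simplified model of the second routing table: (interface, network, metric).
# Assumption: "all-nets" is represented as 0.0.0.0/0.
ROUTES = [
    ("Lan", "192.168.1.0/24", 100),
    ("DMZ", "192.168.1.0/24", 100),
    ("Wan", "203.0.113.0/24", 100),
    ("Wan", "0.0.0.0/0", 100),
]


def candidate_interfaces(address, routes):
    """Return the interfaces of all routes that tie for the best match,
    using longest prefix first and lowest metric second (hypothetical helper)."""
    addr = ipaddress.ip_address(address)
    matching = [
        (iface, ipaddress.ip_network(net), metric)
        for iface, net, metric in routes
        if addr in ipaddress.ip_network(net)
    ]
    if not matching:
        return []
    # Step 1: keep only the most specific (longest prefix) matches.
    longest = max(net.prefixlen for _, net, _ in matching)
    most_specific = [r for r in matching if r[1].prefixlen == longest]
    # Step 2: among those, keep only the lowest metric.
    lowest = min(metric for _, _, metric in most_specific)
    # More than one result means the configuration does not decide the route.
    return [iface for iface, _, metric in most_specific if metric == lowest]


if __name__ == "__main__":
    print(candidate_interfaces("192.168.1.10", ROUTES))
    # Lowering the Lan metric to 90 breaks the tie in favour of Lan.
    fixed = [("Lan", "192.168.1.0/24", 90)] + ROUTES[1:]
    print(candidate_interfaces("192.168.1.10", fixed))
```

This prints `['Lan', 'DMZ']` for the original table and `['Lan']` after lowering the Lan metric, which mirrors the first of the fixes suggested in this FAQ. The same tie applies to the reverse lookup on a Lan user's source address, which is a plausible way to picture the "Default_Access_Rule" drops described above.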

The solution to this problem is fairly easy: either lower the metric on the route you want to use as primary, remove the duplicate route, or change the network on the duplicate route to something else.

Note-1: A restart or firmware upgrade is not the only possible trigger. If a configuration change causes the routing table to be updated, it can trigger the problem as well, since cOS Core then needs to repopulate the routing table.
Note-2: Duplicate routes are not necessarily a problem; they can be a valid configuration if, for instance, Route Load Balancing is used and you want an equal load distribution.
Note-3: Information about the "Default_Access_Rule" log entry can be found here.