Route failover on IPsec tunnels (10.x)

Peter
Posts: 659
Joined: 10 Apr 2008, 14:14
Location: Clavister HQ - Örnsköldsvik

Post by Peter » 25 Aug 2010, 13:59

This How-to applies to:
  • Clavister Security Gateway 8.x, 9.x and 10.x.
Problem:

I have two installations at two different locations where route failover needs to work together with IPsec.
  • Scenario1: An SGW with one ISP has an IPsec tunnel towards an SGW with two ISPs.
I want to be able to use Route Failover on the SGW with two ISPs. If the primary ISP/route goes down, it should be able to establish the IPsec tunnel using the secondary/backup ISP.
  • Scenario2: An SGW with two ISPs has an IPsec tunnel towards an SGW which also has two ISPs.
Similar scenario but with more redundancy: if the primary ISP on either side goes down, it should establish and use the secondary/backup IPsec tunnel.

Solution:
  • Scenario1.
One-ISP.png
Since we do not have two ISPs on both sides, there is no need to use two IPsec tunnels. We simply set up the RFO (Route Failover) towards the ISPs only and then configure the IPsec tunnel to accept this. It is possible to use two IPsec tunnels in this scenario, but it only adds another layer of complexity that is not really needed.
  • Base configuration:
Site-A (One ISP)

Code: Select all

Lannet=10.10.10.0/24
Wannet=80.80.80.0/24
IP_Wan=80.80.80.10
Routing Table(Site-A):

Code: Select all

Route Lan Lannet
Route Wan Wannet
Route IPSecTunnel SiteBLannet
Route Wan All-nets Gw-World
Site-B (Two ISPs)

Code: Select all

Lannet=192.168.200.0/24
Wannet=90.90.90.0/24
Wannet2=100.100.100.0/24
IP_Wan=90.90.90.10
IP_Wan2=100.100.100.10
Routing Table(Site-B):

Code: Select all

Route Lan Lannet
Route Wan Wannet
Route Wan2 Wannet2
Route IPSecTunnel SiteALannet
Route Wan All-nets Gw-World
Route Wan2 All-nets Gw-World2
The routing table on Site-B is incorrect in its current state because we have two “identical” all-nets routes. Without a metric definition the SGW is unable to determine which ISP it should use when, e.g., surfing the web. Sometimes it may use Wan and sometimes Wan2, so this is not a good setup for the moment.

Since we want redundancy we first need to set up the RFO, which is fairly simple in this particular scenario. We edit the routing table on Site-B to look like this:

Code: Select all

Route Lan Lannet
Route Wan Wannet
Route Wan2 Wannet2
Route IPSecTunnel SiteALannet
Route Wan All-nets Gw-World Metric=10 Monitor=Yes
Route Wan2 All-nets Gw-World2 Metric=20
What you choose to monitor here is up to you: it can be ARP, Link or HostMonitor, depending on the requirement.

As you may notice, we have not configured any monitoring on the IPsec tunnel, because we do not need to: there is only one IPsec tunnel in THIS scenario. When the primary ISP fails, the all-nets route fails over to the secondary ISP. The monitoring that is enabled by default on the IPsec tunnel (DPD, Dead Peer Detection) will detect that the tunnel is no longer alive and try to establish it anew. When it performs a route lookup for the remote gateway it finds a matching route on Wan2, as the Wan all-nets route is now disabled, and the tunnel is then established via ISP2.
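The route lookup described above can be sketched in Python. This is only an illustrative model, not CorePlus code: the `Route` class, the `enabled` flag and the `lookup` function are assumptions made for the example, and the model only compares metrics between routes that cover the destination.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass
class Route:
    interface: str
    network: str          # destination network in CIDR form
    gateway: str
    metric: int
    enabled: bool = True  # assumed flag: cleared when the route monitor fails


def lookup(routes, destination):
    """Return the enabled route with the lowest metric covering destination."""
    dest = ip_address(destination)
    candidates = [r for r in routes if r.enabled and dest in ip_network(r.network)]
    return min(candidates, key=lambda r: r.metric, default=None)


# The two all-nets routes on Site-B (Scenario1).
site_b = [
    Route("Wan", "0.0.0.0/0", "Gw-World", metric=10),
    Route("Wan2", "0.0.0.0/0", "Gw-World2", metric=20),
]

site_a_gateway = "80.80.80.10"            # IP_Wan on Site-A
before = lookup(site_b, site_a_gateway)   # primary ISP alive
site_b[0].enabled = False                 # monitor declares the Wan route down
after = lookup(site_b, site_a_gateway)    # tunnel is re-established via Wan2
print(before.interface, "->", after.interface)
```

With the primary route disabled, the same lookup for the remote gateway returns the Wan2 route, which is why no monitoring on the tunnel itself is needed here.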

So are we done? No, there are some more things that need to be taken into account. One of them is the IPsec tunnel configuration on Site-A. We need to configure this tunnel to accept tunnel negotiations from two different IPs. Normally you define an IPsec tunnel between two statically defined IPs. In this scenario we do not know if the tunnel negotiation will arrive from 90.90.90.10 (Wan on Site-B) or 100.100.100.10 (Wan2 on Site-B).

So on Site-A we create a new host group that contains both these IPs and use it as Remote Gateway on the IPsec tunnel configured at Site-A. By doing this we will accept tunnel negotiations from both public IPs that exist on Site-B.

There is however a drawback: you can never initiate the tunnel from Site-A. It must always be initiated from Site-B. So if you have Keep-Alive configured on Site-A’s IPsec tunnel, remove it. This IPsec RFO scenario is also not as fast as normal interface failovers (or multiple ISPs at both sites) since DPD is not that fast in declaring a tunnel as down. Usually the failover takes around 60-100 seconds in this scenario.

Lastly, you need to specify a Local ID on the IPsec tunnel on Site-B so that the tunnel is identified as the "same" one. If you do not, the tunnel will be established after a failover but no traffic will pass through it until the connection (Conn) has been re-established. E.g. if you are constantly sending a ping through the tunnel you need to stop the ping, wait a few seconds and then start it again. Setting a Local ID solves that problem. The value can be pretty much anything, as it is only used as a tunnel "identifier".
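As a rough illustration of what the host group achieves, the check on Site-A can be thought of as a membership test on the source address of the negotiation. The names `SITE_B_GATEWAYS` and `accepts_negotiation` are made up for this sketch and do not correspond to any CorePlus object or command.

```python
from ipaddress import ip_address

# Hypothetical stand-in for the host group used as Remote Gateway on Site-A.
SITE_B_GATEWAYS = {
    ip_address("90.90.90.10"),    # Wan on Site-B
    ip_address("100.100.100.10"), # Wan2 on Site-B
}


def accepts_negotiation(source_ip, remote_gateways=SITE_B_GATEWAYS):
    """True if a tunnel negotiation from source_ip matches the host group."""
    return ip_address(source_ip) in remote_gateways


for candidate in ("90.90.90.10", "100.100.100.10", "90.90.90.11"):
    print(candidate, accepts_negotiation(candidate))
```

A tunnel defined against a single static remote gateway would only match one of the two Site-B addresses, which is why the group is needed.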
  • Scenario2.
In this scenario we have two ISPs on both sides. This means we can use two IPsec tunnels, as we now have two static IPs to connect to on both sides.
Two-ISP.png
  • Base configuration:
Site-A (Two ISPs)

Code: Select all

Lannet=10.10.10.0/24
IP_Lan=10.10.10.1
Wannet=70.70.70.0/24
Wannet2=80.80.80.0/24
IP_Wan=70.70.70.10
IP_Wan2=80.80.80.10
Routing Table(Site-A):

Code: Select all

Route Lan Lannet
Route Wan Wannet
Route Wan2 Wannet2
Route IPSecTunnel1 SiteBLannet
Route IPSecTunnel2 SiteBLannet
Route Wan All-nets Gw-World Metric=10 Monitor=Yes
Route Wan2 All-nets Gw-World2 Metric=20
Site-B (Two ISPs)

Code: Select all

Lannet=192.168.200.0/24
IP_Lan=192.168.200.1
Wannet=90.90.90.0/24
Wannet2=100.100.100.0/24
IP_Wan=90.90.90.10
IP_Wan2=100.100.100.10

Routing Table (Site-B):

Code: Select all

Route Lan Lannet
Route Wan Wannet
Route Wan2 Wannet2
Route IPSecTunnel1 SiteALannet
Route IPSecTunnel2 SiteALannet
Route Wan All-nets Gw-World Metric=10 Monitor=Yes
Route Wan2 All-nets Gw-World2 Metric=20
The above setup works fine for the standard RFO scenario (physical interface), but the IPsec tunnel RFO still needs some modifications. First of all we need to add metrics to the tunnel routes, so on both sites we add Metric=10 for the primary and Metric=20 for the secondary tunnel. We also need to monitor the primary tunnel. Since this is an IPsec tunnel we cannot use ARP or LinkState; the only method we can use is HostMonitor. What host to monitor is up to you, but in this example we use the internal interface IP on the other side of the IPsec tunnel.

Note: While it's possible to select e.g. Link for the monitored tunnel, monitoring an IPsec tunnel using link does not work. The link will always be reported as OK. So it's a kind of false response and could give the impression that it actually works.

So Site-A’s primary IPsec tunnel route monitors the IP 192.168.200.1 and Site-B’s primary IPsec tunnel route monitors the IP 10.10.10.1.

If the primary ISP goes down, the monitor on the primary IPsec tunnel route also stops receiving replies to its HostMon queries. So the primary all-nets route fails over to the secondary ISP, and the primary IPsec tunnel route fails over to the secondary IPsec tunnel route, which is also an IPsec tunnel but towards a different remote gateway.

Note: The primary and secondary tunnel configuration is identical in terms of local and remote network. The only thing that is different is the remote gateway. The primary tunnel goes towards the ISP-1 IP and the secondary goes towards the ISP-2 IP on the remote side.

Note2: Keep-alive must not be active on the secondary/backup IPsec tunnel. Otherwise it will cause delays similar to Scenario1: the failover will take 60+ seconds instead of ~10 seconds (possibly even less). It is possible to have both tunnels up at the same time by single-host routing the remote endpoints on Wan and Wan2, but we will not go into details about that specific setup in this How-To.
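The behaviour of a host monitor on the primary tunnel route can be sketched as follows. This is a simplified model under stated assumptions: the `HostMonitor` class and the threshold of three missed replies are hypothetical and are not the actual CorePlus implementation or its default values.

```python
class HostMonitor:
    """Declares a host down after max_missed consecutive missed replies."""

    def __init__(self, host, max_missed=3):  # threshold is a hypothetical value
        self.host = host
        self.max_missed = max_missed
        self.missed = 0

    def record(self, reply_received):
        """Feed one poll result; return True while the host counts as reachable."""
        self.missed = 0 if reply_received else self.missed + 1
        return self.missed < self.max_missed


def active_tunnel(primary_up):
    # The Metric=10 route is preferred while its monitor reports the host as up;
    # otherwise the Metric=20 route on the secondary tunnel is used.
    return "IPSecTunnel1" if primary_up else "IPSecTunnel2"


monitor = HostMonitor("192.168.200.1")  # Site-A monitors Site-B's internal IP
polls = [True, True, False, False, False, False]  # primary ISP fails after two polls
history = [active_tunnel(monitor.record(reply)) for reply in polls]
print(history)
```

In this model the tunnel route only moves to IPSecTunnel2 once the third consecutive reply is missed, which mirrors why a host-monitored failover is not instantaneous.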

So far so good, the failover scenario works and the secondary tunnel will take over if the primary goes down. But what happens when the primary ISP comes back up?

When the primary ISP comes back the monitoring functions on the physical interfaces towards this ISP will notice this and will declare the primary route as alive again and move traffic from the secondary ISP back to the primary.

But this causes a problem for the IPsec monitoring. We monitor a host on the other side of the IPsec tunnel, and monitoring packets are ONLY sent out on the route the monitor is defined on. This means that when the primary ISP is back up the primary tunnel will be established again, but the primary route can never be declared as up: the monitoring packets are received (on the other side of the tunnel) on a disabled route, so they are dropped by “default_access_rule”.

In order to solve this we define the source address of the monitor traffic on the primary route. This may sound a bit cryptic but if we update the routing table on Site-A to reflect this it will look like this:

Code: Select all

Route Lan Lannet
Route Wan Wannet
Route Wan2 Wannet2
Route IPSecTunnel1 SiteBLannet Metric=10 Monitor=Yes
Route IPSecTunnel1 MonitorSource
Route IPSecTunnel2 SiteBLannet Metric=20
Route Wan All-nets Gw-World Metric=10 Monitor=Yes
Route Wan2 All-nets Gw-World2 Metric=20
Here the MonitorSource object is the IP 192.168.200.1, which is Site-B’s internal interface IP. A similar single-host route needs to be defined in Site-B’s routing table, where the MonitorSource object is 10.10.10.1.

So what does this mean?

This means that once a failover has occurred, the HostMon defined on the primary route constantly sends monitoring packets to the defined host. When the primary tunnel is back up, these packets start to arrive on the primary tunnel. Since our routing principle is “smallest route first”, they match the “Route IPSecTunnel1 MonitorSource” route and are accepted. They are no longer dropped by the “Default_Access_Rule”, and the primary IPsec tunnel route can be declared as up when the primary ISP recovers.

Important note: The IP defined as MonitorSource will always be routed on the primary IPsec tunnel, so no traffic other than the monitor should be sent towards this IP. If something other than the monitor needs to be able to reach this IP, it is better to define and use a new/different IP.
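The effect of the single-host route can be illustrated with a small Python model of Site-A after a failover. It assumes that the access check amounts to a reverse route lookup on the packet's source address ("smallest route first", then lowest metric) and that the packet is accepted only if the result points at the interface it arrived on. The `Route` class and function names are hypothetical, and the metric of 0 on the host route is just a placeholder.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Route:
    interface: str
    network: str
    metric: int = 0
    enabled: bool = True


def best_route(routes, address):
    """Most specific enabled route wins; metric breaks ties between equal prefixes."""
    addr = ip_address(address)
    matches = [r for r in routes if r.enabled and addr in ip_network(r.network)]
    return min(
        matches,
        key=lambda r: (-ip_network(r.network).prefixlen, r.metric),
        default=None,
    )


def accepted(routes, source_ip, arrival_interface):
    """Assumed model of the access check: reverse lookup must match the interface."""
    route = best_route(routes, source_ip)
    return route is not None and route.interface == arrival_interface


# Site-A after failover: the primary tunnel route is disabled by its monitor.
failed_over = [
    Route("IPSecTunnel1", "192.168.200.0/24", metric=10, enabled=False),
    Route("IPSecTunnel2", "192.168.200.0/24", metric=20),
]
# Same table plus "Route IPSecTunnel1 MonitorSource" (192.168.200.1).
with_host_route = failed_over + [Route("IPSecTunnel1", "192.168.200.1/32")]

print(accepted(failed_over, "192.168.200.1", "IPSecTunnel1"))      # dropped
print(accepted(with_host_route, "192.168.200.1", "IPSecTunnel1"))  # accepted
```

Because the host route is never monitored, it stays enabled during the failover and always wins the lookup for that one address.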


Re: Route failover on IPsec tunnels

Post by Peter » 15 Mar 2012, 11:51

Update 2012-03-15:

Changes made in CorePlus version 9.30.04 and up enable an alternative configuration method which is much easier to set up.

In order to describe the alternative method we will use Scenario-2 as base, two ISP's on both sides.

Short description how it should be configured:
    1. The primary IPsec tunnel must have Keep-Alive enabled with manual configuration: both the Source IP and the Destination IP must be set manually, and the remote host must be reachable using ICMP.
    2. The primary tunnel must have a single-host route towards the remote host used by Keep-Alive. It is recommended that this IP is dedicated to monitoring, since it is single-host routed on the primary tunnel only: traffic towards this IP will not fail over if/when the primary tunnel goes down.
    2.1. The reason for this route is to always allow monitoring traffic to arrive on the primary IPsec tunnel. Without it, the primary route will NEVER recover, as the monitoring packets arrive on the wrong/secondary interface.
    3. The primary IPsec tunnel must have DPD (Dead Peer Detection) disabled.
    4. The primary IPsec tunnel route must have monitoring enabled using Link Status ONLY.
An example routing table for Site-B in Scenario 2 would then look something like this:

Code: Select all

Route Lan Lannet
Route Wan Wannet
Route Wan2 Wannet2
Route IPSecTunnel1 SiteAMonitorHost
Route IPSecTunnel1 SiteALannet Metric=10 Monitor=Yes
Route IPSecTunnel2 SiteALannet Metric=20
Route Wan All-nets Gw-World Metric=10 Monitor=Yes
Route Wan2 All-nets Gw-World2 Metric=20
This method is much easier to set up and is the recommended one if you use version 9.30.04 or later.


Re: Route failover on IPsec tunnels (10.x)

Post by Peter » 10 Oct 2013, 15:17

Update 2013-10-10:

A problem was discovered when re-testing the 2 vs 2 ISP scenario: the case where only one of the links is down. It is very unlikely that both ISP links go down at the same time, and in order to handle the situation where the primary is working on e.g. Site-B but not on Site-A, we need to add a new single-host route on both Site-A and Site-B.

The route would look something like this:

Site-A

Code: Select all

Route Wan2 SiteB_Wan2_IP Gw-World2
Site-B

Code: Select all

Route Wan2 SiteA_Wan2_IP Gw-World2
The problem appears when the primary link fails on one side only. The secondary tunnel is then established towards the secondary interface of the other end. If the primary interface route is still working on that end, the incoming negotiation is dropped by "default_access_rule" (i.e. the source IP is not routed on that interface), since the tunnel tries to establish towards an interface whose route has the higher metric (e.g. Wan2).

So in order to solve that particular problem I had to set up ONE single-host route on each side towards the remote gateway IP of its secondary tunnel. That way incoming IKE packets are allowed even if the primary link is still up.

It is very unlikely that both ISP links go down at the same time, so this handles the case where only one goes down on either side.

The drawback is that when the primary link fails on one side, the backup tunnel will always be used; you cannot, for instance, set up the primary tunnel towards the secondary and vice versa.
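A small Python sketch of Site-B may help to show why the single-host route lets the IKE packets in. It assumes that the access check is a reverse route lookup on the source address (most specific route first, then lowest metric) and that the packet must arrive on the interface the lookup returns. The tuple layout, the function names and the metric of 0 on the connected and host routes are assumptions for the example; the addresses come from the Scenario 2 base configuration.

```python
from ipaddress import ip_address, ip_network

# Site-B routes as (interface, network, metric), primary link still up.
ROUTES = [
    ("Wan", "90.90.90.0/24", 0),      # Route Wan Wannet
    ("Wan2", "100.100.100.0/24", 0),  # Route Wan2 Wannet2
    ("Wan", "0.0.0.0/0", 10),         # Route Wan All-nets Gw-World
    ("Wan2", "0.0.0.0/0", 20),        # Route Wan2 All-nets Gw-World2
]
# Route Wan2 SiteA_Wan2_IP Gw-World2, with SiteA_Wan2_IP = 80.80.80.10
HOST_ROUTE = ("Wan2", "80.80.80.10/32", 0)


def expected_interface(routes, source_ip):
    """Interface a reverse lookup on source_ip points at."""
    src = ip_address(source_ip)
    matching = [r for r in routes if src in ip_network(r[1])]
    best = min(matching, key=lambda r: (-ip_network(r[1]).prefixlen, r[2]))
    return best[0]


def ike_allowed(routes, source_ip, arrival_interface):
    """Assumed model: accept only if the packet arrives where its source is routed."""
    return expected_interface(routes, source_ip) == arrival_interface


# Site-A lost its primary ISP and negotiates from its Wan2 IP towards Site-B's Wan2.
print(ike_allowed(ROUTES, "80.80.80.10", "Wan2"))                 # dropped
print(ike_allowed(ROUTES + [HOST_ROUTE], "80.80.80.10", "Wan2"))  # allowed
```

Without the host route the reverse lookup returns Wan (Metric=10), so a packet arriving on Wan2 does not match; the /32 route makes Wan2 the expected interface for that one peer address only.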
