Bidirectional Forwarding Detection (BFD) is a relatively new protocol that can significantly lower the convergence times of dynamic routing protocols. Previously, faster convergence in the LAN was often achieved by tuning protocol hello/keepalive timers; a good example is the Fast Hello setting for OSPF. Lowering routing protocol timers, however, can lead to higher CPU utilization and somewhat unexpected network behavior. For example, in a network running OSPF, BGP and LDP simultaneously, with the latter two depending on the former, if you try to improve convergence by tweaking the timers of all three protocols, it is difficult to predict how the whole thing will react to a flapping link, in terms of both convergence time and hardware resource utilization during the convergence period. Again, I am talking here about the complicated situations, such as when you lose communication over a link without the interfaces going down, maybe in just one direction. When Layer 1 or Layer 2 goes down, that normally triggers protocol events directly and convergence occurs quite quickly.
BFD offers a lightweight mechanism for detecting link communication failures and notifying routing protocols so that they react quickly. Convergence times with BFD are lower than for routing protocols configured with shorter timers (see my example for OSPF below). Of course, as detection times get shorter, sensitivity to link flapping grows, which may be undesirable. BFD also has a dampening mechanism that lets you introduce an exponential delay into communication failure detection.
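For reference, the Fast Hello tuning mentioned above can be sketched as follows, assuming Cisco IOS syntax. The interface name is hypothetical and the multiplier is only an illustrative value. This sets the OSPF dead interval to one second and sends four hellos within it:

```
! Hypothetical interface; hello-multiplier 4 means one hello every 250 ms
interface FastEthernet0/0
 ip ospf dead-interval minimal hello-multiplier 4
```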
For now BFD can work with the following protocols:
- Static Routing
- ATM Pseudowires
Here is a simple topology I used to test BFD operations:
One important thing to understand with BFD is that it works only in conjunction with the protocol it is supposed to notify of failures. If you just configure it on two adjacent interfaces like this:
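A minimal sketch of such an interface-only configuration, assuming Cisco IOS syntax. The interface name, subnet mask and timer values are assumptions for illustration, not necessarily the ones used in this lab; only the 10.0.0.1 and 10.0.0.2 addresses come from the test topology:

```
! R1 (interface name and mask are assumed)
interface FastEthernet0/0
 ip address 10.0.0.1 255.255.255.252
 bfd interval 100 min_rx 100 multiplier 3

! R2 (interface name and mask are assumed)
interface FastEthernet0/0
 ip address 10.0.0.2 255.255.255.252
 bfd interval 100 min_rx 100 multiplier 3
```

Here `interval` and `min_rx` are given in milliseconds, and `multiplier` is the number of consecutive missed packets after which the session is declared down.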
It won’t do anything. No neighborship is going to be established at this stage, and no packets will be sent. What you need to do is enable it for the routing protocol, in our case OSPF:
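As a sketch, again assuming Cisco IOS and a hypothetical OSPF process ID of 1, BFD can be enabled for OSPF either on every OSPF interface at once or on a single interface:

```
! Option 1: enable BFD on all interfaces that run this OSPF process
router ospf 1
 bfd all-interfaces

! Option 2: enable BFD for OSPF on one interface only (name is assumed)
interface FastEthernet0/0
 ip ospf bfd
```

Either form has to be applied on both neighbors, otherwise the BFD session has no peer to come up with.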
And here it is:
BFD neighborship is established:
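On Cisco IOS (an assumption about the platform), the session can be checked with the commands below. Only the commands are listed, not their output, since that depends on the lab:

```
! Summary of BFD sessions and their state
show bfd neighbors
! Per-session timers, registered client protocols and counters
show bfd neighbors details
! Confirm that the OSPF adjacency itself is up
show ip ospf neighbor
```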
Now, breaking the communication between OSPF neighbors (with debug bfd events enabled):
BFD reports the communication failure in 267 ms and signals OSPF to tear down the neighborship, which OSPF does immediately. This gives us a convergence time a lot shorter than with OSPF Fast Hello, where the dead interval cannot go below one second.
In real life you should, of course, check that you have enough hardware resources to run BFD with these parameters, and consider whether you really need such a short convergence time (because of the possibility of false positives).
Looking at a BFD packet capture, you might expect to see some sort of keepalive exchange between the neighbor IP addresses, which in our case are 10.0.0.1 and 10.0.0.2. Here is what you actually see:
Both R1 and R2 are sending UDP messages to their own interface IP addresses. This is another important thing about BFD: its echo function relies on Layer 2 to get these packets across the link. Here are the details of these packets:
R1 sends a BFD echo packet to R2’s MAC address; R2 swaps the source and destination MAC addresses and sends it back out the interface it was received on. R1 gets its own message back and sees that communication across the link is working. The same process happens in the opposite direction, so R2 doesn’t process the BFD data coming from R1 and vice versa. Apart from this, the routers exchange control messages carrying state information. These messages are sent on Layer 3:
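As a rough illustration of that trade-off, and assuming Cisco IOS syntax, a more conservative setting could look like the sketch below. The interface name and the values are hypothetical; with a 300 ms interval and a multiplier of 3, a failure would be declared after roughly 900 ms, which is still under a second but leaves more room for short glitches:

```
! Hypothetical, more conservative BFD timers
interface FastEthernet0/0
 bfd interval 300 min_rx 300 multiplier 3
```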
Here is what the link communication failure detection process looks like: