VRRP troubleshooting case

Got an interesting troubleshooting case. Two Layer-3 switches with VRRP configured on their downlinks:

vrrp

The diagram shows only three downlinks with VRRP; the actual number is around 100.

SW2, the VRRP Backup under normal conditions, was reporting VRRP flaps from time to time, becoming Master and then returning to the Backup state:

%VRRP-6-STATECHANGE: Vl366 Grp 166 state Backup -> Master
%VRRP-6-STATECHANGE: Vl60 Grp 60 state Backup -> Master
%VRRP-6-STATECHANGE: Vl673 Grp 73 state Backup -> Master
%VRRP-6-STATECHANGE: Vl479 Grp 79 state Backup -> Master

%VRRP-6-STATECHANGE: Vl366 Grp 166 state Master -> Backup
%VRRP-6-STATECHANGE: Vl60 Grp 60 state Master -> Backup
%VRRP-6-STATECHANGE: Vl673 Grp 73 state Master -> Backup
%VRRP-6-STATECHANGE: Vl479 Grp 79 state Master -> Backup

The clue that led me to the solution was that almost all the flaps occurred between 9:00 and 18:00. The Layer-3 interfaces had traffic shaping configured:

interface Vlan366
  ip address x.x.x.73 255.255.255.248
  vrrp 166 ip x.x.x.73
  vrrp 166 preempt delay minimum 60
  vrrp 166 priority 101
  service-policy input Limitto2mbps
  service-policy output Limitto2mbps

Looking at the SNMP monitoring system, I found that the timestamps of the %VRRP-6-STATECHANGE syslog messages matched the times when traffic on the given interface reached the shaping limit. What was actually happening was that at this point the policy-map started dropping traffic, and the VRRP messages that SW1 was sending to SW2 were occasionally dropped as well. SW2, missing the expected VRRP messages, declared itself Master, then received the next VRRP message from SW1 and switched back to Backup.
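To make the timing concrete, here is a rough Python sketch of the Backup timer logic as defined for VRRPv2 in RFC 3768 (Master_Down_Interval = 3 x Advertisement_Interval + Skew_Time). The 1-second advertisement interval, the Backup priority of 100 and the arrival timestamps are assumptions for illustration only; the real timers and SW2 priority are not shown in this post.

```python
# Sketch of the VRRPv2 (RFC 3768) Backup timer logic.
# All timestamps below are made-up illustration data, not measurements.

def master_down_interval(priority, advert_interval=1.0):
    """Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time."""
    skew_time = (256 - priority) / 256
    return 3 * advert_interval + skew_time


def backup_transitions(advert_arrivals, priority, end_time, advert_interval=1.0):
    """Return the times at which a Backup would declare itself Master.

    advert_arrivals: sorted times (seconds) at which advertisements from the
    Master actually reached the Backup; dropped ones are simply absent.
    """
    timeout = master_down_interval(priority, advert_interval)
    transitions = []
    last_heard = 0.0
    for arrival in list(advert_arrivals) + [end_time]:
        if arrival - last_heard > timeout:
            # The timer expired before this advertisement showed up.
            transitions.append(last_heard + timeout)
        last_heard = arrival
    return transitions


# Assumed Backup priority of 100 (the protocol default).
all_delivered = [float(t) for t in range(1, 11)]
shaped = [1.0, 2.0, 7.0, 8.0, 9.0, 10.0]  # adverts at t=3..6 were dropped

print(master_down_interval(100))                   # 3.609375
print(backup_transitions(all_delivered, 100, 10))  # []
print(backup_transitions(shaped, 100, 10))         # [5.609375]
```

In the second run the Backup goes Master at about 5.6 s and would fall back to Backup when the advertisement at 7.0 s arrives, which is the same Backup -> Master -> Backup pattern seen in the logs.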

So excluding VRRP (IP protocol 112, sent to its multicast address 224.0.0.18) from traffic shaping by adding a deny statement to the traffic-shaping ACLs

SW1#sh ip access-lists TS-ACL

Extended IP access list TS-ACL
4 deny 112 x.x.x.72 0.0.0.7 host 224.0.0.18
10 permit ip any any (4131183 matches)

solved the problem.
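The matching logic of that ACL can be sketched in a few lines of Python. This is only an illustration of the first-match behaviour; the subnet 192.0.2.72/29 is a hypothetical stand-in for the anonymized x.x.x.72 0.0.0.7, and the function name is mine.

```python
import ipaddress

VRRP_PROTO = 112
VRRP_GROUP = ipaddress.ip_address("224.0.0.18")
# Hypothetical stand-in for the anonymized x.x.x.72 0.0.0.7 (/29) subnet.
LOCAL_SUBNET = ipaddress.ip_network("192.0.2.72/29")


def is_shaped(proto, src, dst):
    """Mimic TS-ACL: first match wins; 'deny' means exempt from shaping."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    # 4 deny 112 <subnet> host 224.0.0.18
    if proto == VRRP_PROTO and src_ip in LOCAL_SUBNET and dst_ip == VRRP_GROUP:
        return False
    # 10 permit ip any any
    return True


print(is_shaped(112, "192.0.2.73", "224.0.0.18"))   # False: VRRP is exempt
print(is_shaped(6, "192.0.2.73", "198.51.100.10"))  # True: ordinary TCP is shaped
```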

Nice thing to keep in mind: next time you do traffic shaping, make sure you don’t cause problems for your control plane traffic.

Some Useful Commands to Troubleshoot IP Routing

To see some general routing information

#show ip route summary
IP routing table name is default (0x0)
IP routing table maximum-paths is 32
Route Source    Networks    Subnets     Replicates  Overhead    Memory (bytes)
connected       0           13          0           1408        3744
static          0           0           0           0           0
application     0           0           0           0           0
ospf 100        0           6           0           768         1752
  Intra-area: 6 Inter-area: 0 External-1: 0 External-2: 0
  NSSA External-1: 0 NSSA External-2: 0
bgp 14234       103         1006        0           106464      319392
  External: 629 Internal: 455 Local: 25
internal        63                                              111864
Total           166         1025        0           108640      436752

This shows how many routes each source contributes and how much memory each routing process consumes.
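If you collect this output regularly, a small script can pull out the memory column for you. The sketch below is a simple parser written for the exact layout shown above (detail lines such as "Intra-area: ..." are left out); the function name and parsing rules are my own assumptions, not a Cisco tool.

```python
# Data rows copied from the "show ip route summary" output above.
SUMMARY = """\
connected       0           13          0           1408        3744
static          0           0           0           0           0
application     0           0           0           0           0
ospf 100        0           6           0           768         1752
bgp 14234       103         1006        0           106464      319392
internal        63                                              111864
Total           166         1025        0           108640      436752
"""


def memory_by_source(text):
    """Map each route source to its last column (memory in bytes)."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[-1].isdigit():
            continue
        # "ospf 100" and "bgp 14234" carry a process/AS number, giving 7 fields.
        name = " ".join(fields[:2]) if len(fields) == 7 else fields[0]
        result[name] = int(fields[-1])
    return result


mem = memory_by_source(SUMMARY)
total = mem.pop("Total")
assert sum(mem.values()) == total  # the per-source rows add up to the Total row
for name, size in sorted(mem.items(), key=lambda kv: -kv[1]):
    print(f"{name:<12} {size:>8} bytes  {100 * size / total:5.1f}%")
```

For the output above, BGP accounts for roughly 73% of the routing table memory.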

To see information on OSPF Area Border Routers and Autonomous System Border Routers

#show ip ospf border-routers

            OSPF Router with ID (10.0.0.2) (Process ID 100)

                Base Topology (MTID 0)

Internal Router Routing Table
Codes: i - Intra-area route, I - Inter-area route

i 10.0.0.3 [10] via 10.0.23.2, Ethernet0/1, ABR/ASBR, Area 0, SPF 9
i 10.0.0.1 [10] via 10.0.12.1, Ethernet0/0, ASBR, Area 1, SPF 5
I 10.0.0.4 [20] via 10.0.23.2, Ethernet0/1, ASBR, Area 0, SPF 92

Basically shows how your router is intended to reach each of the ABRs and ASBRs it knows about.

To see which routing information is originated by your OSPF-enabled router

#show ip ospf database self-originate

OSPF Router with ID (10.0.0.2) (Process ID 100)

Router Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum Link count
10.0.0.2        10.0.0.2        156         0x80000006 0x005D69 1

Summary Net Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum
10.0.12.0       10.0.0.2        111         0x80000001 0x006EA5

Summary ASB Link States (Area 0)

Link ID         ADV Router      Age         Seq#       Checksum
10.0.0.1        10.0.0.2        111         0x80000001 0x00EC2E

Router Link States (Area 1)

Link ID         ADV Router      Age         Seq#       Checksum Link count
10.0.0.2        10.0.0.2        156         0x80000003 0x00746B 1

Summary Net Link States (Area 1)

Link ID         ADV Router      Age         Seq#       Checksum
10.0.23.0       10.0.0.2        111         0x80000001 0x00F414
10.0.34.0       10.0.0.2        111         0x80000001 0x00DF14

Summary ASB Link States (Area 1)

Link ID         ADV Router      Age         Seq#       Checksum
10.0.0.3        10.0.0.2        111         0x80000001 0x00D840
10.0.0.4        10.0.0.2        111         0x80000001 0x0033DA

The output above is taken from a router that is an ABR, so you can see the Summary (Type 3) LSAs that this router generates and sends into Area 1 to advertise the routes belonging to Area 0, and vice versa. You can also see the Summary ASB LSAs that it sends to inform routers in Area 0 about an ASBR it sees in Area 1, and routers in Area 1 about ASBRs in Area 0.

OSPF LSA Types

For many years I’ve been struggling to memorize OSPF LSA types. Finally, I have decided to create a lab that shows all those LSA types that are supported by Cisco in OSPFv2 (IPv4-OSPF).

Here is my lab topology:

Lab Topology

Type 1 – Router LSA

According to the documentation LSA type 1 is sent by any router within the area and contains the information about its directly connected links.

To see it, I start a packet capture on the R1 Fa0/0 interface and reset the OSPF process on R2 to provoke an LSA flood. In theory, I should see two Type 1 LSAs coming from R2: one for link 10.0.2.0/30 and another one for 10.0.1.0/24.

Here is what actually appears:

Screen Shot 12-14-15 at 05.20 PM

Two LSAs: one with a unicast destination and one destined to the “ALL OSPF ROUTERS” multicast address. Both contain the same information:

LSA1Capt1

Screen Shot 12-14-15 at 05.21 PM

The information on both links is passed inside a single Type 1 LSA. Links are identified by their Link IDs. There is a defined algorithm for deriving the Link ID depending on the link type; for the Transit network that appears in this LSA, it is the DR’s IP address. The reason why R2 sends two LSAs probably has to do with Cisco’s implementation of OSPF and the fact that R2 has a broadcast multiaccess segment connected to its Fa0/1 interface.
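For reference, the Link ID rules from RFC 2328 can be written down as a small lookup table. The Python below is just a memory aid; the names are mine, and the example address 10.0.1.2 is the DR address on the R2-R3-R4 segment in this lab.

```python
# Router-LSA link types and the meaning of the Link ID field (RFC 2328).
LINK_ID_MEANING = {
    1: ("point-to-point", "neighbor's Router ID"),
    2: ("transit network", "IP address of the Designated Router"),
    3: ("stub network", "IP network/subnet number"),
    4: ("virtual link", "neighbor's Router ID"),
}


def describe_link(link_type, link_id):
    """Explain what the Link ID of a Router-LSA link entry refers to."""
    kind, meaning = LINK_ID_MEANING[link_type]
    return f"{kind}: Link ID {link_id} is the {meaning}"


print(describe_link(2, "10.0.1.2"))
```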

This is how Type 1 LSA has been sent:

LSA Type 1

Type 2 – Network LSA

A Type 2 LSA is generated by the DR elected for a multiaccess segment to inform its neighbors about the existence of this segment. An example of such a segment is the interconnection between R2, R3 and R4. From R2 we can see that R4 got elected as DR (its router ID is 172.16.2.1):

R2#sh ip ospf neighbor

Neighbor ID     Pri   State           Dead Time   Address         Interface
172.16.1.1        1   FULL/BDR        00:00:37    10.0.1.1        FastEthernet0/1
172.16.2.1        1   FULL/DR         00:00:34    10.0.1.2        FastEthernet0/1

So, putting a capture on R2 Fa0/0 and resetting its OSPF process, we should get a Type 2 LSA from R4 containing information on the R2-R3-R4 segment. Here it is, packed together with some other LSAs:

LSA Type 2

Notice the subnet mask for the segment, DR ID (Advertising Router) and three attached router IDs.

So Type 2 LSA flow is the following:

LSA Type 2


This type of LSA is then re-flooded by all the routers within the area (in this example, R2 will resend it to R1). The Advertising Router field within the LSA retains its original value, because R4, being the DR, is the one responsible for the LSA.

Type 3 – Summary LSA

Type 3 LSAs are sent by an ABR. When a link-state change occurs within one of the areas connected to it, the ABR sends this type of LSA into all its other areas, thus notifying them about the change. For example, when the R2 Fa0/0 link goes down, we can expect R3 to notify R6 about this change with a Type 3 LSA.

Setting capture for Fa0/0 on R6 and shutting down Fa0/0 on R2 we get the following:

Type 3 LSA

Notice the Metric field value. Setting it to 16777215 (FFFFFF hexadecimal, the largest value the 24-bit field can hold) is a way to mark the link as unreachable and signal to other OSPF routers to withdraw its data from their databases.
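A tiny Python check shows where that number comes from. RFC 2328 calls this value LSInfinity; the helper name below is mine.

```python
LS_INFINITY = 0xFFFFFF  # largest value of the 24-bit metric field


def is_unreachable(metric):
    """True if a summary-LSA metric marks the destination as unreachable."""
    if not 0 <= metric <= LS_INFINITY:
        raise ValueError("metric does not fit into 24 bits")
    return metric == LS_INFINITY


print(LS_INFINITY)               # 16777215
print(2 ** 24 - 1)               # 16777215
print(is_unreachable(16777215))  # True
print(is_unreachable(20))        # False
```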

R3 generates Type 3 LSA because it gets notified of the change by Type 1 LSA coming from R2. Here is the flow diagram:

LSA Type 3

Please note that the name “summary LSA” has nothing to do with the route summarization feature. Nothing gets summarized here, at least by default.

Type 4 – ASBR summary LSA

Type 4 LSAs are also sent by an ABR. Using them, the ABR notifies its connected areas about the presence of an ASBR in any of the other areas. In my topology, R3 (ABR) would notify R6, which is in Area 1, about the presence of R1 (ASBR) in Area 0 using an LSA of this type. Later on, R6 will receive external route information from R1, and the Type 5 LSA announcing this route will contain R1’s Router ID, saying that this external network is behind R1. The data R6 obtained from the Type 4 LSA tells it that R1 is behind R3.

What makes a router an ASBR and thus provokes LSA Type 4 generation? It is route redistribution from another routing process or protocol into OSPF on this router. In my topology R1 is an ASBR as it redistributes routes from RIP into OSPF.

So, again, R3, being an ABR, should notify R6 about the presence of R1 using Type 4 LSA. But how does R3 get to know that R1 is an ASBR and not just any common intra-area router? The answer is: R1 notifies everyone within Area 0 of itself being an ASBR by setting a special (E) flag in Type 1 LSAs it sends. Here it is:

Type 1 LSA with E flag set

Wireshark already interprets E flag set as “AS boundary router”.
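The same decoding is easy to do by hand. In the Router-LSA the V, E and B bits sit in the low three bits of the flags byte (RFC 2328); the sketch below, with names of my own choosing, turns that byte into router roles.

```python
# Flag bits of the Router-LSA flags byte (RFC 2328, layout 0 0 0 0 0 V E B).
FLAG_B = 0x01  # area border router (ABR)
FLAG_E = 0x02  # AS boundary router (ASBR)
FLAG_V = 0x04  # endpoint of a virtual link


def router_roles(flags):
    """Decode the Router-LSA flags byte into a list of roles."""
    roles = []
    if flags & FLAG_B:
        roles.append("ABR")
    if flags & FLAG_E:
        roles.append("ASBR")
    if flags & FLAG_V:
        roles.append("virtual link endpoint")
    return roles


print(router_roles(0x02))  # ['ASBR']
print(router_roles(0x03))  # ['ABR', 'ASBR']
print(router_roles(0x00))  # []
```

A router like R1 in this lab, which only redistributes RIP into OSPF, would be expected to send the first value.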

So when R3 receives this LSA it sends Type 4 LSA to notify R6:

Type 4 LSA

Advertising router field contains R3 Router-ID, Link State ID contains R1 (ASBR) Router ID.

Here is what the process looks like:

LSA Type 4

Type 5 – External LSA

LSA Type 5 is used to advertise external routes into all regular OSPF areas. External in this case means not calculated or generated by the OSPF process, but redistributed from other routing protocols, or from static or connected routes. As an example, the route for the 192.168.99.0/24 network should be advertised by R1 using this type of LSA.

Shutting down the R7 Loopback0 interface provokes LSA flooding out of R1 FastEthernet0/0, announcing that network as unreachable. Here is the capture:

LSA Type 5

Type 7 – NSSA External LSA

Like the Type 5 LSA, this LSA type carries external route information. The difference is that a Type 7 LSA is propagated across a Not-So-Stubby Area (NSSA), whereas a Type 5 LSA is not. On the ABR that connects the NSSA to the backbone area, the Type 7 LSA gets translated into a Type 5 LSA.

In my topology R5 should notify R4 about changes with interface Loopback0 (172.16.102.0/24 subnet) using this type of LSA.

Here is a Type 7 LSA sent by R5 when its Loopback0 is shut down:

LSA Type 7

and here is the Type 5 LSA generated by R4 for Area 0, containing information on the same link:

LSA Type 5 generated by ABR, based on the information from received LSA Type 7
LSA Type 7
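To wrap up the LSA types covered in this lab, here is the same information as a small Python lookup table. It is only a memory aid summarizing the sections above; the wording of the originator and scope fields is mine and simplified.

```python
# LSA type -> (name, who originates it, where it is flooded)
LSA_TYPES = {
    1: ("Router LSA", "any router", "own area"),
    2: ("Network LSA", "DR", "own area"),
    3: ("Summary LSA", "ABR", "other attached areas"),
    4: ("ASBR summary LSA", "ABR", "other attached areas"),
    5: ("External LSA", "ASBR", "whole OSPF domain except stub/NSSA areas"),
    7: ("NSSA External LSA", "ASBR inside the NSSA",
        "the NSSA only; translated to Type 5 by the ABR"),
}


def types_originated_by(originator):
    """Return the LSA type numbers generated by the given kind of router."""
    return [t for t, (_, who, _) in LSA_TYPES.items() if who == originator]


print(types_originated_by("ABR"))  # [3, 4]
for lsa_type, (name, who, scope) in LSA_TYPES.items():
    print(f"Type {lsa_type}: {name}, sent by {who}, flooded in {scope}")
```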

 

Drawing VLANs on Visio Network Diagrams

Throughout my career I have been trying to work out a nice way to depict VLANs on network diagrams.

The last idea I came up with looks like this:

VLANs1

Each one of the shapes represents a VLAN. I use the colors together with numbers in order to make shapes easier to read as they appear on the different parts of my diagram.

Here are some VLAN interfaces configured on a Layer-3 switch:

NetDiagP

And here is a dot1q trunk:

VLANs3

Guess what it means when the 31 shape covers the line? Right! – A native VLAN. The three other VLANs whose shapes are above and below the line are tagged.

You may stack the VLAN shapes around the “trunk line” whichever way you want:

VLANs4

Here I colored some VLANs the same yellow color as they all correspond to the wireless segment. Also VLANs 3510-3517 are grouped together because they are all used for the same SSID with VLAN Select.

Simple ASA Access-lists Configuration Technique

The plot

Some day some application or server person comes and wants to deploy some new service on top of your network. The deployment implies installing several new network endpoints (servers, workstations or some other IP-enabled devices) in several network security zones and granting them the necessary access permissions. The person is in a hurry to finish the deployment and test it. You are in a hurry with your day-to-day tasks. The solution vendor’s documentation often doesn’t explain well which network protocols it uses and in which direction the connections between the different solution components are going to be established (I experienced this with a new CCTV solution that was being deployed this week). Or you study the doc, which seems well detailed, and configure everything according to it, but the service just doesn’t work and neither of you can be sure whether it’s a firewall problem or an application-level configuration problem.

Solution?

One common way to resolve this is to temporarily permit all traffic between all the IP addresses involved “just to test it works”. Quite often in such cases the deployment gets successfully finished, everyone is happy, the person who was deploying the new service has to go on with his or her job, you have to go on with your job, so no one has time to correct the permissions you configured for the tests and they stay forever…

Here is a general algorithm I try to use while operating Cisco ASA in order to avoid this:

First configure what vendor asks you to

Study the application vendor documentation. Configure the permissions it asks you to configure. If some points in the documentation are unclear, just skip them. The vendor may say something like “Server A and Network Device B need ports X and Y to be opened between them” without saying which direction the traffic is going to flow, and sometimes without even specifying whether it is going to be TCP or UDP. You don’t want to configure both TCP and UDP in both directions just to make sure. Skip it, and investigate later.

An important thing: configure each pair of Source/Destination hosts/networks and port/protocol as a separate line. Here is an example for some network device and two servers it should be talking to:

Initial permissions configuration

Second: configure an explicit deny

For each pair of Source/Destination hosts or networks configure an explicit deny line below the permit statements configured in a previous step:

Explicit deny statement

Enable logging for these statements, set the Logging Level to the one that is being logged to ASDM (or your Syslog server) and set an interval of 1 second:

Enabling logging for explicit deny

This will be used to collect data on denied traffic.

At the end you will have something like this:

Initial config with explicit deny statements for each Source-Destination pair
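The screenshots above are from ASDM, but the same layout can be produced as CLI lines. Below is a rough Python sketch that generates one permit line per port/protocol plus a logged explicit deny for each Source/Destination pair. The ACL name, the addresses, the ports and logging level 6 are all hypothetical placeholders; check the generated syntax against your ASA version before using anything like it.

```python
def acl_for_pair(acl_name, src, dst, services, log_level=6, interval=1):
    """Build permit lines for one host pair, closed by a logged explicit deny.

    services: list of (protocol, port) tuples taken from the vendor docs.
    """
    lines = [
        f"access-list {acl_name} extended permit {proto} host {src} host {dst} eq {port}"
        for proto, port in services
    ]
    # Explicit deny for the same pair, logged so that denied traffic is visible.
    lines.append(
        f"access-list {acl_name} extended deny ip host {src} host {dst} "
        f"log {log_level} interval {interval}"
    )
    return lines


# Hypothetical network device talking to two hypothetical servers.
config = []
config += acl_for_pair("DEVICES_IN", "192.0.2.10", "198.51.100.20",
                       [("tcp", 443), ("udp", 514)])
config += acl_for_pair("DEVICES_IN", "192.0.2.10", "198.51.100.21",
                       [("tcp", 8080)])
print("\n".join(config))
```

Keeping one line per pair and port makes the hit counters meaningful later on, because each counter then maps to exactly one flow.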

Third: test, check, and correct configuration

Ask the person who is deploying the new service to run the tests while you observe the logs configured for the deny statements. If something fails, check the logs for denied packets and, if there are any, analyze whether this traffic should be permitted or not. If it should, add the necessary permissions above the deny line and do the test again:

Check explicit deny line logs for denied packets

You may need to repeat this step several times while new features of the service that is being deployed are being tested.

Fourth: let it run for some time and then check and correct it again

Everything is up and running. Now let it work for some time and check the hit counters for the ACL lines.

It depends on the particular situation, but commonly if some ACL line wasn’t hit by a single packet during 2-3 months you may consider disabling it.

Also you may disable the explicit deny line at this step.
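If you prefer the CLI to ASDM for this check, the hit counters appear as (hitcnt=N) in the "show access-list" output. The sketch below picks out the lines that were never hit; the sample output is invented for illustration (names, addresses and hashes are hypothetical), so adjust the parsing to what your ASA actually prints.

```python
import re

HITCNT = re.compile(r"\(hitcnt=(\d+)\)")

# Invented sample in the general shape of "show access-list" output.
SAMPLE = """\
access-list DEVICES_IN line 1 extended permit tcp host 192.0.2.10 host 198.51.100.20 eq https (hitcnt=1523) 0x1a2b3c4d
access-list DEVICES_IN line 2 extended permit udp host 192.0.2.10 host 198.51.100.20 eq syslog (hitcnt=0) 0x2b3c4d5e
access-list DEVICES_IN line 3 extended deny ip host 192.0.2.10 host 198.51.100.20 log informational interval 1 (hitcnt=0) 0x3c4d5e6f
"""


def zero_hit_lines(show_output):
    """Return the ACL lines whose hit counter is still zero."""
    unused = []
    for line in show_output.splitlines():
        match = HITCNT.search(line)
        if match and int(match.group(1)) == 0:
            # Drop the counter and hash, keep only the rule itself.
            unused.append(line.split(" (hitcnt=")[0])
    return unused


for rule in zero_hit_lines(SAMPLE):
    print(rule)
```

Here the unused permit line is a candidate for disabling, and a deny line with zero hits suggests nothing is being blocked for that pair any more.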

Fifth: re-arrange the configuration if possible

Ultimately, it is worth grouping the Sources, Destinations and ports/protocols into groups to make the configuration shorter and easier to read:

Final configuration. Allowed ports and protocols grouped into Service Groups

The grouping might be different, for example, you may want to group some ports and protocols based on their role, creating Service Groups such as “NetworkDeviceX-ServerA-Management-Access”.

This way I normally end up with a quite well-organized configuration that permits just what needs to be permitted and serves as a good reference for documenting how the deployed service works from the network point of view.