Tuesday, April 18, 2006

 

Debugging Using Rapidly Incrementing Counters.

In this post I would like to talk about a debugging technique that I developed independently. I have since seen other people use it, but I have never seen it discussed anywhere in writing. I call it debugging by way of rapidly incrementing counters, for reasons that will soon become clear.

As an example, let's say you have NAT configured on a router and you find that pings from a box on the inside to a box on the outside are not working. In such a situation you want to find out quickly how far the packets get before they are dropped. Are they NATed successfully and then dropped because a return route is missing? Or are they being dropped on the NAT router itself? Assume also that the network carries a low level of background traffic, from CDP, keepalives and the like.

To debug this, first visit the routers you suspect of causing the problem and run clear counters on each of them, so that every interface starts from a low count. Next, go to a router on the inside network and start an extended ping to the host on the outside network, specifying a timeout of zero and a large repeat count. This makes the sender transmit packets continuously, without waiting for a reply to each ICMP echo request it has already sent, so the counters on every interface along the packet path start climbing rapidly. You will see the pings fail, but that is expected with a zero timeout; what you are really after is the generated packet stream, not the success or failure of the pings.

Then trace the path, checking the statistics of each interface that should be in the packet path, and find the first interface whose counter is not incrementing rapidly. The packets are being dropped just before that point: on that router itself if it is an outgoing interface, or on the link or the upstream router if it is an incoming interface.
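The path-tracing step can be sketched in code. The helper below is hypothetical (nothing like it ships with IOS): it assumes you have already copied the packet counter of each interface on the expected path into a dictionary, at two points in time while the stream is running, and it returns the first interface whose counter grew by less than a chosen threshold. The (router, interface, direction) tuples and the min_delta value of 100 are illustrative assumptions; the counter values come from the R2 dumps in this post.

```python
from typing import Optional

# Hypothetical hop identifier: (router, interface, direction), where
# direction says whether you read the "in" or "out" packet counter.
Hop = tuple[str, str, str]


def first_stalled_hop(
    path: list[Hop],
    before: dict[Hop, int],
    after: dict[Hop, int],
    min_delta: int,
) -> Optional[Hop]:
    """Return the first hop whose packet counter rose by less than min_delta.

    `path` lists the interfaces the ping stream should traverse, in order.
    `before` and `after` hold each hop's packet counter at two moments while
    the stream is running. Returns None if every hop kept up with the stream.
    """
    for hop in path:
        delta = after[hop] - before[hop]
        if delta < min_delta:
            return hop  # packets stop increasing here
    return None


# Counters from the second and third R2 dumps in this post.
path = [("R2", "Ethernet0", "in"), ("R2", "Serial2", "out")]
before = {path[0]: 271, path[1]: 13}
after = {path[0]: 651, path[1]: 13}
print(first_stalled_hop(path, before, after, min_delta=100))  # ('R2', 'Serial2', 'out')
```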

For instance, here is the extended ping initiated from a router on the inside -

R1#ping 66.66.66.1 repeat 9999999 timeout 0

Type escape sequence to abort.
Sending 9999999, 100-byte ICMP Echos to 66.66.66.1, timeout is 0 seconds:
......................................................................
......................................................................
......................................................................

These packets enter the NAT router on Ethernet0 and, once NATed, should leave it on Serial2. Here are the statistics for those two interfaces on the NAT router, dumped three times with some delay between dumps while the ping stream is running -

R2#sh int e0 stat
Ethernet0
          Switching path    Pkts In   Chars In   Pkts Out  Chars Out
               Processor          1        351          1        360
             Route cache          0          0          0          0
                   Total          1        351          1        360
R2#sh int s 2 stat
Serial2
          Switching path    Pkts In   Chars In   Pkts Out  Chars Out
               Processor          1        335          1        335
             Route cache          0          0          0          0
                   Total          1        335          1        335
R2#
R2#sh int e0 stat
Ethernet0
          Switching path    Pkts In   Chars In   Pkts Out  Chars Out
               Processor        271      31131         13       1200
             Route cache          0          0          0          0
                   Total        271      31131         13       1200
R2#sh int s 2 stat
Serial2
          Switching path    Pkts In   Chars In   Pkts Out  Chars Out
               Processor         17       1295         13       1200
             Route cache          0          0          0          0
                   Total         17       1295         13       1200
R2#
R2#sh int e0 stat
Ethernet0
          Switching path    Pkts In   Chars In   Pkts Out  Chars Out
               Processor        651      74451         28       2250
             Route cache          0          0          0          0
                   Total        651      74451         28       2250
R2#sh int s 2 stat
Serial2
          Switching path    Pkts In   Chars In   Pkts Out  Chars Out
               Processor         32       2195         13       1200
             Route cache          0          0          0          0
                   Total         32       2195         13       1200
R2#

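Looking at the dumps, it is clear that the packets are coming in on the Ethernet interface but are not leaving the NAT router on the serial interface: the Ethernet0 Pkts In count climbs from 1 to 271 to 651, while the Serial2 Pkts Out count stays at 13 between the last two dumps. The next step would be to stop the ping stream and use debug ip nat to dig further into the problem.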
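Reading the deltas off the dumps by eye works fine for two interfaces; for longer paths a small parser can help. The sketch below is a hypothetical Python helper, not a Cisco tool: it assumes the output layout shown above (a Total row followed by Pkts In, Chars In, Pkts Out and Chars Out, in that order) and computes how much each column grew between two dumps of the same interface. Other IOS versions may format this output differently.

```python
def parse_total(stats_output: str) -> dict[str, int]:
    """Extract the Total row from 'show interfaces ... stats' output.

    Assumes the column order used in this post:
    Pkts In, Chars In, Pkts Out, Chars Out.
    """
    for line in stats_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "Total":
            pkts_in, chars_in, pkts_out, chars_out = map(int, fields[1:5])
            return {
                "pkts_in": pkts_in,
                "chars_in": chars_in,
                "pkts_out": pkts_out,
                "chars_out": chars_out,
            }
    raise ValueError("no Total row found in the output")


def counter_delta(earlier: str, later: str) -> dict[str, int]:
    """Per-column increase between two dumps of the same interface."""
    a, b = parse_total(earlier), parse_total(later)
    return {key: b[key] - a[key] for key in a}
```

Fed the second and third Ethernet0 dumps above, counter_delta reports 380 packets in; fed the corresponding Serial2 dumps, it reports 0 packets out, the same conclusion reached by eye.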

One thing to watch out for with this technique: make sure that no per-packet debugging is enabled on any router in the path of this high-rate stream. Otherwise the debug output could overrun the console and might even force you to reload the router.

Note that this technique will not work if background traffic is already high enough that the counter increase caused by the zero-timeout ping stream does not stand out. Like any debugging tool, it supplements rather than replaces the other techniques in your toolbox. It usually helps you isolate the problem quickly to a particular router or interface; from there you need the appropriate debug and show commands to drill down further.
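Whether the stream "stands out" can be made a little more concrete by reading a counter twice over the same interval, once with the ping stream stopped (the baseline) and once with it running. The sketch below is a hypothetical helper with an arbitrary default factor of 10, and the counter values in the example calls are made up for illustration; the right factor depends on your network.

```python
def stream_stands_out(baseline_delta: int, stream_delta: int, factor: float = 10.0) -> bool:
    """Return True if the counter grew at least `factor` times faster with the
    ping stream running than from background traffic alone.

    Both deltas must be measured over the same interval. A zero baseline is
    treated as one packet so the test still works on an otherwise idle link.
    """
    return stream_delta >= factor * max(baseline_delta, 1)


# Hypothetical quiet interface: a few CDP/keepalive packets vs. hundreds of echoes.
print(stream_stands_out(baseline_delta=5, stream_delta=380))      # True
# Hypothetical busy interface: the same stream is lost in the background.
print(stream_stands_out(baseline_delta=2000, stream_delta=2380))  # False
```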

Comments:
Great method of discovery! I will definitely use this method in the future!
 