S. U. Etienam-Umoh
Tackling different problems requires different approaches. Sometimes it makes sense to just try a solution and see
whether it works. Sometimes you can use a similar system as a working model, or you might have to buckle down
and research the problem thoroughly. In this section, you learn about different methods and the circumstances in which
some methods work and others don't. With this knowledge, you can try a variety of approaches for your
environment.
In this course we shall examine the OSI approach to troubleshooting network problems.
Several steps in this process might be repeated. For example, if Step 7 doesn't lead to a solution for the problem,
you probably need to repeat Steps 2 through 7 until you do have a solution. The figure below is a flowchart of the basic
process.
Experience
If It Happened Once, It Will Happen Again
Colleagues Experience
Experience from Manufacturers Technical Support
The World Wide Web
Using a Knowledge Base or Search Engine
Finding Drivers and Updates
Consulting Online Support Services and Newsgroups
Researching Online Periodicals
Network Documentation
Network Diagrams
Internetworking Devices
2. Troubleshooting tools
Many networking problems occur at the Physical layer and include problems with cables, connectors, and NICs.
The first step in troubleshooting these problems is to determine whether the problem lies with the cable or the
computer. One easy way to do this is to connect another computer to the cable. If it functions normally, you can
conclude that the problem is with the original computer. If it exhibits the same symptoms, check the cable first,
and then check the device it connects to, and so forth.
2. Power Fluctuations
One way to eliminate the effects of power fluctuations, especially on servers, is to connect them to
uninterruptible power supplies (UPSs). UPS systems provide battery power to computers so that they can be
shut down without data loss. Some perform shutdowns automatically, thereby eliminating the need for human
intervention when power failures or severe power fluctuations occur.
Because networking technology changes constantly, frequent upgrades of equipment and software, such as the
file server's OS, are necessary. During these upgrades, it's common for some equipment to run on an old OS and
some to run on a new one. When you perform network upgrades, remember three important points:
Ignoring upgrades to new software releases and new hardware can lead to a situation in which a complete
network overhaul is necessary because many upgrades build on top of others.
Test any upgrade before deploying it on your production network.
Don't forget to tell users about upgrades.
If all goes well, the network monitoring and planning you do will ensure that the network performs optimally.
However, you might notice that your network slows down; this problem can happen quickly, or it might be a
gradual deterioration. Whether performance problems are exhibited slowly or suddenly, answering the following
questions should help pinpoint the causes:
What has changed since the last time the network functioned normally?
Has new equipment been added to the network?
Have new applications been added to computers on the network?
Is someone playing electronic games across the network? (You'd be surprised at the amount of traffic
networked games can generate.)
Are there new users on the network? How many?
Could any other new equipment, such as a generator, cause interference near the network?
If new users, added equipment, or newly installed applications seem to degrade network performance, it might
be time to expand your network and add equipment to limit or contain network traffic. Higher-speed backbones,
network partitions, and additional servers and routers are alternatives worth considering when you must
increase capacity to accommodate usage levels that have grown beyond your network's current capabilities.
5. Disaster Recovery
If your network is well documented, recovering from a disaster will be much easier. Disasters can come in many
forms, from a simple disk crash that disables a key server to a fire or flood that devastates your entire workplace.
The procedures for recovery from total devastation are beyond the scope of this book; instead, the following
sections focus on backup procedures and recovery from system failure.
A comprehensive backup program can prevent major data loss. A backup plan is an important part of an overall
disaster recovery plan and should be revised as your needs, data, and applications change. To formulate
your backup plan, review the following guidelines:
Determine what data should be backed up and how often. Some files, such as program executables and OS
files, seldom change and might require backup only weekly or monthly.
Develop a schedule for backing up data that includes the type of backup to perform, how often, and at what
time of day. The next section reviews the most common backup types.
Identify the people responsible for performing backups.
Test your backup system regularly. The person responsible for backups should perform these tests, which
include backing up data and restoring it. After a backup system is in place, conduct periodic tests to ensure
data integrity. A data backup is no use to you if the restore process doesn't work.
Maintain a backup log listing what data was backed up, when the backup took place, who performed the
backup, and what medium was used. Windows Backup maintains a summary of the backup history, as do
third-party backup programs, but these logs can't be accessed if the backup server fails.
Backup Types
Full: A full backup copies all selected files to the selected medium and marks files as backed up; also called
a normal backup.
Incremental: An incremental backup copies all files changed since the last full or incremental backup and
marks files as backed up.
Differential: A differential backup copies all files changed since the last full backup and doesn't mark files as
backed up.
Copy: A copy backup copies selected files to a selected medium without marking files as backed up.
Daily: A daily backup copies all files changed on the day the backup is made and doesn't mark files as backed
up.
Of these five types, full, incremental, and differential backups are most useful as part of a regular backup
schedule.
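The following minimal Python sketch makes the differences between the three schedule-friendly types concrete. It assumes each file carries a single "changed since last marked" flag, modeled on the Windows archive attribute. The `File` record, `modify`, and `run_backup` are hypothetical names and are not part of Windows Backup or any other product.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class File:
    name: str
    archive: bool = True  # stand-in for the archive attribute: True = changed since last marked

def modify(f: File) -> None:
    """Editing a file sets its archive flag again."""
    f.archive = True

def run_backup(files: List[File], kind: str) -> List[str]:
    """Return the names copied by a 'full', 'incremental', or 'differential' backup."""
    if kind == "full":
        selected = list(files)                      # everything, changed or not
    elif kind in ("incremental", "differential"):
        selected = [f for f in files if f.archive]  # only files changed since last mark
    else:
        raise ValueError(f"unknown backup type: {kind}")
    if kind in ("full", "incremental"):
        for f in selected:
            f.archive = False                       # mark as backed up
    # A differential leaves the flags set, so its file set grows until the next full backup.
    return [f.name for f in selected]
```

A differential leaves the flags set, so each differential grows until the next full backup. A restore therefore needs only the last full backup plus the latest differential. An incremental scheme copies less each time, but a restore needs the full backup plus every incremental made since it.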
The System Recovery Options dialog box has the following options:
Startup Repair - Automatically attempts to fix problems that keep Windows from starting.
Examples are corrupted boot configuration files and a damaged Master Boot Record.
System Restore - Restores Windows system state settings to an earlier point in time.
System Image Recovery - Restores one or more disks from a backup image created by Windows Backup.
Windows Memory Diagnostic - Performs a complete memory test.
Command Prompt - Opens a command prompt window, where you have access to a variety of command-line
troubleshooting tools.
Microsoft Windows, Mac OS X, and Linux provide command line ping programs that can be run from the operating
system shell. Computers can be pinged either by IP address or by name.
Open a shell prompt. On Windows, select Start, All Programs, Accessories, and finally Command Prompt, or type cmd
in the Start menu search box and press the Enter key.
Type "ping" followed by a space and the IP address or host name, for example ping google.com (leave out the
http:// prefix).
Press the Enter (or Return) key.
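You may want to run the same check from a script. The command line differs slightly by platform: Windows ping sets the number of echo requests with -n, while Linux and macOS ping use -c. The sketch below builds the right command and runs it with Python's standard `subprocess` module. The `ping_command` and `ping` helper names are hypothetical.

```python
import platform
import subprocess
from typing import List, Optional

def ping_command(host: str, count: int = 4, system: Optional[str] = None) -> List[str]:
    """Build a ping command line: Windows uses -n for the count, Linux and macOS use -c."""
    system = system or platform.system()  # "Windows", "Linux", or "Darwin" (macOS)
    flag = "-n" if system == "Windows" else "-c"
    return ["ping", flag, str(count), host]

def ping(host: str, count: int = 4) -> bool:
    """Run ping and report whether it exited with status 0."""
    result = subprocess.run(ping_command(host, count), capture_output=True, text=True)
    return result.returncode == 0
```

Treat the exit status as a first hint only. Windows ping can exit with 0 even when the only replies are "Destination host unreachable" messages from your own computer.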
For example, if you open up a command window and type in "ping ask-leo.com", you'll see something like this:
[C:\]ping ask-leo.com
There's a lot of information here, and I'm not going to get into all the geeky details, but here are some of the basic
and important things that ping does:
"Pinging ask-leo.com [72.3.133.152]" - Ping only pings IP addresses so the first thing it did when I asked it to
ping "ask-leo.com" is it looked up the corresponding IP address. This is perhaps one of the quickest ways I
know of to determine the IP address associated with a domain. Also, if this look-up fails, you'll know that
there's a typo in the domain name, or the domain name look-up (DNS) is failing for some reason.
"Reply from 72.3.133.152:" - this tells you that the remote server at that IP address replied, obviously. What
that means, though, is that the entire route across the internet, from your machine through routers and
switches and networking equipment and whatever else, worked. As did the return path carrying the server's
reply. If this fails ("timed out"), then something along the connection between you and the server might be
broken, the server might be off line, or the server might not even exist. It's also possible that the server is
explicitly configured not to respond to ping requests.
"time=69ms" - this is the round trip time; the time between sending the "are you there?" and receiving the
"yes I am!". In this case, 69 milliseconds. Since the ping is repeated several times you can see that this time is
fairly consistent, which is good. The time will vary depending on many factors including how close you are to
the remote server, how many routers and other networking equipment are in between you and that server,
and more. In the example above, the ping was from me in the Seattle area to the Ask Leo! server housed in
Texas. A quick test of a ping to a server in Japan resulted in times twice as long.
"Sent = 4, Received = 4" - one of the things that TCP/IP is designed to deal with is packet loss. Ideally, every
packet you send should get to where it's going, but for various reasons that doesn't always happen. As long
as the packets can get there after a retry or two, in normal usage you'd never notice. Ping sends multiple
packets and reports specifically on the success rate, so that you can see if a particular connection is prone to
packet loss.
"Approximate round trip times" - while on average the same kind of packet sent to the same destination
should take roughly the same amount of time, that's also not always the case. Sometimes for reasons as
The graphic below illustrates a typical ping session when a device at the target IP address responds with no network
errors:
Reply from: By default, Microsoft Windows ping sends a series of four messages to the address. The program
outputs a confirmation line for each response message received from the target computer.
Bytes: Each ping request carries 32 bytes of data by default.
Time: Ping reports the amount of time (in milliseconds) between the sending of each request and the receipt of its
response.
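TTL (Time-to-Live): A value between 1 and 255 that each router along the path decrements by one, so it can be used
to estimate how many networks lie between you and the target computer. Windows starts its replies at 128, so a reply
TTL of 128 from a Windows host indicates the device is on the local network, with 0 other networks in between.
Other systems start at different values, commonly 64 or 255.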
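A script can read the reply, time, and TTL fields too. This minimal sketch assumes English-language Windows IPv4 output, where each successful reply line has the form Reply from 72.3.133.152: bytes=32 time=69ms TTL= followed by a number. It counts replies to compute packet loss and the min/avg/max round-trip time. It also estimates the hop count from the reply TTL, assuming the target started from a common default (64, 128, or 255). That estimate is a heuristic, because a host can be configured with any starting TTL. The function names are hypothetical.

```python
import re
from typing import Dict, List

# Matches English-language Windows IPv4 replies, e.g. "... bytes=32 time=69ms TTL=..." or "time<1ms".
_REPLY = re.compile(r"Reply from (\S+): bytes=(\d+) time[=<](\d+)ms TTL=(\d+)")

# Assumed default starting TTLs: 64 (e.g. Linux), 128 (Windows), 255 (many routers).
COMMON_INITIAL_TTLS = (64, 128, 255)

def estimate_hops(reply_ttl: int) -> int:
    """Guess how many routers a reply crossed, assuming the sender used a common default TTL."""
    if not 1 <= reply_ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    initial = next(t for t in COMMON_INITIAL_TTLS if t >= reply_ttl)
    return initial - reply_ttl

def summarize(output: str, sent: int) -> Dict[str, float]:
    """Packet loss, min/avg/max round-trip time (ms), and estimated hops from ping output."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    replies = _REPLY.findall(output)  # list of (address, bytes, time, ttl) strings
    # "time<1ms" is recorded as 1, an upper bound on the real value.
    times: List[int] = [int(r[2]) for r in replies]
    summary: Dict[str, float] = {
        "received": len(times),
        "loss_pct": 100.0 * (sent - len(times)) / sent,
    }
    if times:
        summary["min"] = min(times)
        summary["max"] = max(times)
        summary["avg"] = sum(times) / len(times)
        summary["hops"] = estimate_hops(int(replies[0][3]))
    return summary
```

Reply lines such as "Reply from 192.168.1.20: Destination host unreachable." carry no time or TTL, so this sketch counts them as lost. Windows' own "Received =" total may include them.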
Ping also includes several options (type "ping -?" for a list).
While ping's only job is to send echo requests across the network, you will still find that the ping utility has
multiple options, most if not all of which we can safely ignore. The ping implemented in Windows 7 has 16 separate
options.
Ping Switches
There is a wide range of switches that may be used with the Ping command.
As a rough guide, round-trip times of about 35 to 70 ms to Internet hosts are common. Higher round-trip times (often
called a high ping) produce lag, missing sound, a poor connection, and other connection problems. Traceroute is a
more advanced method of testing the connection.
Resolving Problems
If you can ping an IP host on a different network, it suggests that both hosts have TCP/IP correctly initialized and
configured, and that routing between the networks is also configured correctly.
In cases where you cannot ping a remote host, don't jump to the conclusion that the remote host is unavailable or
misconfigured. It might be, but the problem may also be a configuration issue with the source host, or a
routing-related (or physical connectivity) issue between the two. As a general rule, use the following steps to
determine the source of connectivity issues between your PC and a remote system:
Run ipconfig /all: displays all pertinent information for your IP address configuration, including the IP
address, default gateway, and DNS servers.
Ping the loopback address, 127.0.0.1: this address always refers to your own computer's TCP/IP stack, so the
test does not depend on the Ethernet card (NIC) or the network.
A successful response verifies that the IP protocol is functioning correctly. It doesn't mean the IP address
configuration is correct, however. The following checks do that. If this fails, it generally indicates that TCP/IP
is not properly installed or initialized on your host system.
Ping your own computer (local IP address): verifies the computer's capability to receive ICMP packets. If
you can ping the loopback address but not the local computer's IP address, it's likely the firewall is blocking
ICMP packets or that you have configured your host PC's IP address incorrectly.
Ping the default gateway (router): the default gateway is the address of the router the computer sends
packets to when the destination host is on another network. If you can't ping the default gateway, you won't
be able to send packets outside your local LAN. It may indicate that TCP/IP is not configured correctly on
your local router interface or on your host PC, or that the router interface has not been enabled with the no
shutdown command. If the host you're trying to communicate with is on the same LAN, you can skip this
check.
Ping an outside IP address: verifies whether you can communicate with the target computer by using
ICMP.
Ping a domain name to check DNS: verifies that you can resolve the hostname to the correct IP address. If
this check is unsuccessful, try the next two checks.
Ping DNS servers: a response from one or more DNS servers indicates that the computer can
communicate with a server that can resolve names to IP addresses, but it doesn't indicate that DNS lookups
are working. If you can ping the DNS server, the next check verifies whether the DNS server can perform DNS
lookups.
Use Nslookup: determines whether the DNS server can resolve the name of the host you're trying to
communicate with. If it can't resolve the hostname to an address, try a well-known Internet name to see
whether the problem happens only when looking up the target host's address.
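Each check only means something once the earlier ones pass, so the ping steps in the list above form a "stop at the first failure" loop. In the sketch below, `probe` is any function you supply that returns True when a target answers, for example a wrapper around the system ping command. The helper names and the sample addresses in the usage line are hypothetical. The ipconfig /all, DNS server, and Nslookup steps are left out because they are not simple ping tests.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, str]  # (what the check verifies, address or name to ping)

def checklist(own_ip: str, gateway: str, outside_ip: str, hostname: str) -> List[Step]:
    """The ping checks from the list above, in the order they should be run."""
    return [
        ("loopback: TCP/IP stack", "127.0.0.1"),
        ("local IP address", own_ip),
        ("default gateway", gateway),
        ("outside IP address", outside_ip),
        ("host name via DNS", hostname),
    ]

def first_failure(steps: List[Step], probe: Callable[[str], bool]) -> Optional[Step]:
    """Run each check in order; return the first failing one, or None if all pass."""
    for step in steps:
        if not probe(step[1]):
            return step  # later checks depend on this one, so stop here
    return None
```

Suppose `first_failure(checklist("192.168.1.20", "192.168.1.1", "8.8.8.8", "example.com"), probe)` returns the default gateway step. That points you at the router or at your host's gateway setting rather than at DNS.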
Once all of the hardware connections are checked and verified, any of the following could be the cause of your
networking error(s).
The ping command uses Windows Sockets-style name resolution to resolve a computer name to an IP address, so if
pinging by address succeeds, but pinging by name fails, then the problem lies in address or name resolution, not
network connectivity.
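You can run the same address-versus-name test from Python. `socket.getaddrinfo` asks the operating system's resolver (hosts file, DNS, and so on) to translate a name, much as ping does before it sends anything. The `resolve` helper is a hypothetical name.

```python
import socket
from typing import List, Optional

def resolve(name: str) -> Optional[List[str]]:
    """Return the IPv4 addresses a name resolves to, or None if resolution fails."""
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_INET)
    except socket.gaierror:
        return None  # the resolver could not translate the name
    # Each entry's sockaddr is (address, port); de-duplicate across socket types.
    return sorted({info[4][0] for info in infos})
```

Suppose `resolve("www.bbc.com")` returns None while pinging the site's IP address succeeds. Then look at name resolution (DNS server settings, the hosts file) rather than at cabling or routing.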
Troubleshooting Examples
In some cases, ping requests fail. This happens for any of several reasons:
The graphic below illustrates a typical ping session when the program does not receive any responses from the
target IP address. Each "Reply from" line takes several seconds to appear on the screen as the program waits and
eventually times out. The IP address referenced in each reply line of the output is the address of the pinging (host)
computer.
Though uncommon, it is possible for ping to report a response rate other than 0% (fully unresponsive) or 100% (fully
responsive). This most often occurs when the target system is shutting down (as in the example shown) or starting
up:
Ping programs allow specifying a computer name instead of an IP address. Users normally prefer pinging by name
when targeting a home computer or a Web site.
The graphic above illustrates the results of pinging the BBC's Web site (www.bbc.com). Ping reports the target IP address
and response time in milliseconds. Note that large Web sites like Google utilize many Web server computers
worldwide. Therefore, many different (and valid) IP addresses may be reported by programs when pinging Web
sites.
Many Web sites (including about.com) block ping requests as a network security precaution. The result of pinging
these Web sites varies but generally includes a "Destination net unreachable" error message and no useful
information. IP addresses reported by pinging sites that block ping tend to be those of DNS servers and not the Web
sites themselves.
C:\>ping www.about.com
2. Traceroute
TRACEROUTE is another very helpful utility that operates similarly to ping and also uses the services of the ICMP
protocol. Traceroute, as the name implies, is used to trace the path between the sender and the destination host. It
is a one-way trace, meaning that it traces the route from the source to destination and not the other way around,
which, incidentally, may follow a different path. Traceroute also uses the services of User Datagram Protocol (UDP), in
specific implementations, as the transport layer for a specific reason that we'll go into further on.
Whenever a computer connects to a website, it must travel a path that consists of several points, a little like
connecting the dots between your computer and the website. The signal starts at your local router in your home or
business, then moves out to your ISP, then onto the main networks. From there it may have several junctions until it
gets off the Internet highway at the local network for the website and then to the webserver itself.
A traceroute displays the path that the signal took as it travelled around the Internet to the website. It also displays
times which are the response times that occurred at each stop along the route. If there is a connection problem or
latency connecting to a site, it will show up in these times. You will be able to identify which of the stops (also called
'hops') along the route is the culprit.
Traceroute is run from a command prompt or terminal window. On Windows, press the Windows key, type
Command Prompt, and press Enter to launch one.
To run a traceroute, run the tracert command followed by the address of a website. For example, if you wanted to
run a traceroute on How-To Geek, you'd run the command:
tracert howtogeek.com
You'll gradually see the route take form as your computer receives responses from the routers along the way.
Once the traceroute is run, it generates the report as it goes along the route. Below is a sample traceroute:
As you can see, there are several rows divided into columns on the report. Each row represents a "hop" along the
route. Think of it as a check-in point where the signal gets its next set of directions. Each row is divided into five
columns.
The basic idea is self-explanatory. The first line represents your home router (assuming you're behind a router), the
next lines represent your ISP, and each line further down represents a router that's further away.
RTT1, RTT2, RTT3: This is the round-trip time that it takes for a packet to get to a hop and back to your computer
(in milliseconds). This is often referred to as latency, and is the same number you see when using ping. Traceroute
sends three packets to each hop and displays each time, so you have some idea of how consistent (or inconsistent)
the latency is. If you see a * in some columns, you didn't receive a response, which could indicate packet loss.
Domain Name [IP Address]: The domain name, if available, can often help you see the location of a router. If this
isn't available, only the IP address of the router is displayed.
Hop Number - This is the first column and is simply the number of the hop along the route. In this case, it is the
tenth hop.
RTT Columns - The next three columns display the round trip time (RTT) for your packet to reach that point and
return to your computer. This is listed in milliseconds. There are three columns because the traceroute sends three
separate signal packets. This is to display consistency, or a lack thereof, in the route.
Domain/IP column - The last column has the IP address of the router. If it is available, the domain name may also be
listed. The router name can help you determine where the router is physically located.
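When you compare many traceroute reports, it helps to split each row into these columns. The sketch below assumes the row layout of the Windows tracert command: the hop number, then three RTT columns that read "N ms", "<1 ms", or "*", then the name and/or address. Other traceroute programs format rows differently. The `Hop` and `parse_tracert_line` names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# hop number, exactly three RTT columns, then the rest of the line
_ROW = re.compile(r"^\s*(\d+)\s+((?:(?:<?\d+\s+ms|\*)\s+){3})(.*\S)\s*$")
_RTT = re.compile(r"<?(\d+)\s+ms|\*")

@dataclass
class Hop:
    number: int
    rtts_ms: List[Optional[int]]  # None marks a probe with no response (*)
    host: str                     # name and/or [IP address], or a message

def parse_tracert_line(line: str) -> Optional[Hop]:
    """Parse one row of Windows tracert output; return None for header/footer lines."""
    m = _ROW.match(line)
    if m is None:
        return None
    # "<1 ms" is recorded as 1, an upper bound on the real value.
    rtts = [int(r.group(1)) if r.group(1) else None for r in _RTT.finditer(m.group(2))]
    return Hop(int(m.group(1)), rtts, m.group(3))
```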
The times listed in the RTT columns are the main thing you want to look at when evaluating a traceroute. Consistent
times are what you are looking for. There may be specific hops with increased latency times but they may not
Another Example
From this traceroute, you can see that it took 17 hops to go from onyx.training.verio.net to www.neo.com and that
the round trip time was roughly 72-98 ms (based on the 3 numbers on the last line). Keep in mind that the RTTs
reported are the round trip times from the source host to that router hop. They are not a cumulative sum of the previous
times. Each hop is going to add some time to the path, so you'd expect each hop to take a little bit more time to get
to than the last. Looking at this example, you can see that this is pretty much the case here, except for slight
fluctuations on the order of milliseconds due to network traffic.
Now an important thing to know when using traceroute is what the asterisks/stars mean. If you see traceroute print
out a star instead of a round trip time that means that either your probe packet got dropped, or the reply back to
you for that probe got lost along the way. This is usually referred to as "packet loss," and we will discuss this later.
To understand how to interpret a route, you will need to know a little bit about interpreting reverse DNS. Whenever
a traceroute is done, the program will look up the reverse DNS of each host as it goes and print that information as
part of each line. This can help to give you clues as to each network that a packet goes through when it travels from
you to the final destination. Let's go through an example and show how to interpret it.
In this example, we have tracerouted to www.idsoftware.com from a host within Verio's network. We can now
analyse each hop along the way.
From this traceroute, we can tell that www.idsoftware.com is hosted by ID Software themselves in the Dallas/Fort
Worth metroplex. We also know that ID Software is a customer of savvis.net, who is in turn a customer of Alter.net.
We will look at one more traceroute that shows another example of what you might see.
This last traceroute is to www.heaven.com. This traceroute follows much the same path as the last one up to hop 5.
This website is hosted on the Verio network; however, it is hosted by a customer of Verio's, so it is not Verio's
responsibility, other than maintaining connectivity.
Before we continue, there are a couple of caveats to using traceroute that you should be aware of, so you
don't accidentally misinterpret the results.
The first caveat to be aware of is that sometimes it will look like the last hop on a traceroute dropped a packet, when
it really didn't. This is due to both the fact that this host is the actual final destination of your traceroute probes, and
how certain Operating Systems handle ICMP. (ICMP, Internet Control Message Protocol, is one protocol that
machines on the Internet use to send messages to each other, and the "Your packet died here" message that
traceroute relies on is an ICMP message.) Since the last hop is your destination, instead of that host sending you
back an ICMP message saying "Sorry your packet died here," that host will send back a different ICMP message
saying "Hi, your packet made it here, but this port is unreachable." This is because traceroute purposefully sets the
probe packet's destination to be some large port number that will most likely be unreachable at the destination host
because it wants to receive that "port unreachable" message back. The caveat here has to do with the fact that some
operating systems, such as IOS (which Cisco routers run) and Sun Solaris, purposefully drop ICMP responses like "port
unreachable" if they get too many of them in a short period of time. They do this presumably as a security precaution.
So, if you were to add more delay between probes, you wouldn't see this erroneous packet loss.
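A small simulation shows the mechanism behind these messages. Each probe is sent with a TTL one larger than the last. Every router decrements the TTL, and the router that brings it to zero reports "time exceeded". The destination, once reached, answers "port unreachable" because the probe targets a high UDP port. This models classic UDP traceroute only. Windows tracert sends ICMP echo requests instead, so its destination answers with an echo reply. The path and node names are hypothetical.

```python
from typing import List, Tuple

def simulate_traceroute(path: List[str], max_hops: int = 30) -> List[Tuple[int, str, str]]:
    """Simulate classic UDP traceroute over a fixed path.

    `path` lists the routers in order, with the destination host last.
    Returns (probe TTL, node that answered, ICMP message) for each probe.
    """
    results = []
    for ttl in range(1, max_hops + 1):
        remaining = ttl
        for i, node in enumerate(path):
            if i == len(path) - 1:
                # Destination reached: nothing listens on the high UDP port.
                results.append((ttl, node, "port unreachable"))
                return results
            remaining -= 1  # each router decrements TTL before forwarding
            if remaining == 0:
                # This router discards the probe and reports back.
                results.append((ttl, node, "time exceeded"))
                break
    return results
```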
Another caveat of traceroute is that ICMP, which is the protocol traceroute relies on to get responses from each hop,
is usually the lowest priority protocol. So if one router is really busy it might decide to drop ICMP messages, and you
will see lots of packet loss, but that router might be forwarding on more common, higher priority traffic just fine.
If you see a sudden increase in a hop and it keeps increasing to the destination (if it even gets there), then this
indicates an issue starting at the hop with the increase. This may well cause packet loss where you will even see
asterisks (*) in the report.
If the hop immediately after a long one drops back down, it simply means that the router at the long hop gives a
lower priority to answering traceroute probes and does not have an issue. Patterns like this do not indicate an issue.
If you see a hop jump but remain consistent throughout the rest of the report, this does not indicate an issue.
Seeing reported latency in the first few hops indicates a possible issue on the local network level. You will want to
work with your local network administrator to verify and fix it.
If you have timeouts at the very beginning of the report, say within the first one or two hops, but the rest of the
report runs, do not worry. This is perfectly normal as the device responsible likely does not respond to traceroute
requests.
Timeouts at the end may occur for a number of reasons. Not all of them indicate an issue, however.
The target's firewall may be blocking requests. The target is still most probably reachable with a normal
HTTP request, however. This should not affect normal connection.
The return path may have an issue from the destination point. This would mean the signal is still reaching,
but just not getting the return signal back to your computer. This should not affect normal connection.
Possible connection problem at the target. This will affect the connection.
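The latency rules of thumb above can be roughly encoded as follows. The function takes one latency value per hop, for example the middle of its three RTTs. The 30 ms threshold for what counts as a jump is an arbitrary assumption. "Keeps climbing" is approximated by comparing the last hop with the jump. Treat the output as a prompt for a closer look, not a verdict. The function name is hypothetical.

```python
from typing import List

def classify_jumps(latency_ms: List[float], threshold_ms: float = 30.0) -> List[str]:
    """Apply the traceroute rules of thumb to one latency value per hop (index 0 = hop 1).

    threshold_ms is an arbitrary cutoff for what counts as a "jump".
    """
    findings = []
    for i in range(1, len(latency_ms)):
        if latency_ms[i] - latency_ms[i - 1] <= threshold_ms:
            continue
        hop, rest = i + 1, latency_ms[i + 1:]
        if not rest:
            findings.append(f"hop {hop}: jump at the last hop, not enough data to classify")
        elif rest[0] < latency_ms[i] - threshold_ms:
            findings.append(f"hop {hop}: isolated spike, router likely deprioritizes ICMP")
        elif rest[-1] > latency_ms[i] + threshold_ms:
            findings.append(f"hop {hop}: latency keeps climbing, possible issue starts here")
            break  # everything downstream inherits this delay
        else:
            findings.append(f"hop {hop}: jump then level, not an issue")
    return findings
```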
Once you have found a hop that seems to have an issue, you can identify its location and determine where the issue
lies. It may be within your network, your ISP, somewhere along the route, or within your hosting provider's domain.
The first hop is within your own network. The next hop is your ISP. The last couple of hops are likely within your
hosting provider's domain and control, so if the issue is there, they may be able to fix it for you. If it is anywhere
in between, the issue is simply along the route and is outside both your control and your hosting provider's.
Questions
With a traceroute, I don't believe there is a way to tell with 100% certainty whether a specific hop (ISP, router, etc.) is
blocking ICMP. You may want to ask your ISP's support team directly to be certain.
Thank you for contacting us. A "request time out" message can indicate an issue with the router/switch, or a switch
set not to respond to ping requests.
Performing a Ping & Traceroute may provide additional clues on your connection.
Thank you for the trace example. Following the article above, you can see where the latency begins to affect your
connection. You can search on that IP address to find out where it is located so you know the geographic location of
the issue and possibly the company involved.
An assumption people often make is that this means those hops have problems. It doesn't. Nodes can be
configured to ignore ICMP pings; indeed, before some operating systems were patched against it, this was a
common defence against the "Ping of Death" attack. Others choose to use the technique to hide their network
topology.
Network Boundaries
Identifying network boundaries (where the packet crosses from one network to another) is extremely important to
troubleshooting problems with Traceroute, because these tend to be the points at which administrative policies
change, greatly influencing the final results. For example, when crossing from network A to network B, you may find
that different local-preference values (based on administrative policies like the price of transit from a particular
provider) have completely changed the return path of the ICMP TTL Exceed packet.
Network boundaries also tend to be points where capacity and routing policy are the most constrained (after all, it's
usually harder to work with another network than it is to work within your own network), and are thus more likely to
be problem areas. Identifying the type of relationship between two networks (i.e. transit provider, peer, or
customer) can also be helpful in understanding where the problem is occurring, though often more for political than
technical reasons. Having knowledge about which direction the money is flowing can help reveal which side is
responsible for fixing the problem, or for not maintaining sufficient capacity on a particular link.
Sometimes it is very easy to spot a network boundary just by looking for the DNS change. For example:
4 te1-2-10g.ar3.DCA3.gblx.net (67.17.108.146)
5 sl-st21-ash-8-0-0.sprintlink.net (144.232.18.65)
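A first pass at spotting boundaries by DNS change can be automated by comparing the last two labels of each hop's name. This is a crude assumption that mishandles names under suffixes like co.uk. As the following paragraphs show, it also misses boundaries where one network names the other's interface, and it cannot place a boundary at a hop that has no DNS name. The function names are hypothetical.

```python
import ipaddress
from typing import List, Optional, Tuple

def domain_of(hostname: str) -> Optional[str]:
    """Crude guess at the operator's domain: the last two DNS labels, e.g. gblx.net."""
    try:
        ipaddress.ip_address(hostname)
        return None  # a bare IP address carries no DNS hint
    except ValueError:
        pass
    labels = hostname.rstrip(".").split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else None

def dns_boundaries(hops: List[Tuple[int, str]]) -> List[Tuple[int, str, str]]:
    """Return (hop number, old domain, new domain) wherever the domain changes."""
    changes = []
    previous: Optional[str] = None
    for number, hostname in hops:
        current = domain_of(hostname)
        if current is None:
            continue  # skip hops that have no usable name
        if previous is not None and current != previous:
            changes.append((number, previous, current))
        previous = current
    return changes
```

Run on hops 4 and 5 above, `dns_boundaries` reports a change from gblx.net to sprintlink.net at hop 5.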
Unfortunately, it isn't always that easy. The problem usually stems from the process of assigning the /30 (or /31) IPs
used on the network boundary interface. The convention followed by most networks is that in a provider/customer
relationship the provider supplies the interface IPs, but in a relationship between peers there may be no clear
answer as to which side should provide the /30, or they may simply take turns. The network that supplies the /30 for
the interface maintains control over the DNS for the entire block, which usually means that the only way for the
other party to request a particular DNS value is to send an e-mail and ask for it. This often leads to DNS values such
as:
4 po2-20G.ar5.DCA3.gblx.net (67.16.133.90)
5 cogent-1.ar5.DCA3.gblx.net (64.212.107.90)
In the above example, hop 5 is actually terminated on Cogent's router, but the /30 between these two networks is
being provided by gblx.net. You can further confirm this by looking at the other side of the /30, i.e. the egress
interface that you can't see in Traceroute.
host 64.212.107.89
89.107.212.64.in-addr.arpa domain name pointer te2-3-10GE.ar5.DCA3.gblx.net.
The multiple references to the ar5.DCA3 router are a clear indicator that hop 5 is NOT a Global Crossing router, even
with the cogent hint in DNS. In this particular instance, the interface IPs between these two networks are in
64.212.107.88/30, where 64.212.107.89 is the Global Crossing side, and 64.212.107.90 is the Cogent side.
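Finding "the other side of the /30" is simple address arithmetic. A /30 holds four addresses; the first and last are the network and broadcast addresses, so the two router interfaces are the middle pair. The sketch below uses Python's standard `ipaddress` module. `other_side` and `ptr_name` are hypothetical helper names. The sketch also handles /31 links (RFC 3021), where both addresses are usable.

```python
import ipaddress

def other_side(address: str, prefix_len: int = 30) -> str:
    """Return the far-end address of a point-to-point /30 or /31 link."""
    net = ipaddress.IPv4Network(f"{address}/{prefix_len}", strict=False)
    base = int(net.network_address)
    if prefix_len == 30:
        usable = [base + 1, base + 2]  # network and broadcast addresses are excluded
    elif prefix_len == 31:
        usable = [base, base + 1]      # RFC 3021: both addresses are usable
    else:
        raise ValueError("expected a /30 or /31 point-to-point link")
    ip = int(ipaddress.IPv4Address(address))
    if ip not in usable:
        raise ValueError(f"{address} is not a host address in {net}")
    far = usable[1] if ip == usable[0] else usable[0]
    return str(ipaddress.IPv4Address(far))

def ptr_name(address: str) -> str:
    """The in-addr.arpa name that a reverse DNS (PTR) lookup queries."""
    return ipaddress.IPv4Address(address).reverse_pointer
```

For the example above, `other_side("64.212.107.90")` gives 64.212.107.89, whose PTR record lives at 89.107.212.64.in-addr.arpa. `socket.gethostbyaddr("64.212.107.89")[0]` would perform the same reverse lookup as the host command.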
2 po2-20G.ar4.DCA3.gblx.net (67.16.133.82)
3 192.205.34.109 (192.205.34.109)
4 cr2.wswdc.ip.att.net (12.122.84.46)
In the above Traceroute, is the border between the Global Crossing and AT&T routers at hop 3 or at hop 4? One way you
may be able to identify the boundary is to look at the owner of the IP block. In the example above, the whois tool
shows that the 192.205.34.109 IP is owned by AT&T. Using this information, you know that hop 3 is the border
between these two networks, and AT&T is the one supplying the /30.
Being able to identify network boundaries is critical for troubleshooting, so you know which network to contact
when there is a problem. For example, imagine you saw the following Traceroute:
4 po2-20G.ar5.DCA3.gblx.net (67.16.133.90)
etc, etc, the packet never reaches the destination following this point
Who would you contact about this issue, Global Crossing or Cogent? If you couldn't identify the network boundary,
you might naively assume that you should be contacting Global Crossing about the issue, since gblx.net is the last
thing that shows up in the Traceroute. Unfortunately, they wouldn't be able to help you; all they would be able to
say is "we're successfully handing the packet off to Cogent, you'll have to talk to them to troubleshoot this further."
Being able to identify the network boundary greatly reduces the amount of time it takes to troubleshoot an issue,
and the number of unnecessary tickets you will need to open with another network's NOC.
Some networks will try to make it obvious where their customer demark is, with clear naming schemes like
networkname.customer.alter.net. Other networks are a little more subtle about it, sometimes referencing the ASN
or name of the peer. These are almost always indicators of a network boundary, which should help steer your
troubleshooting efforts.
The latency values reported by Traceroute are based on the following 3 components:
1. The time taken for the probe packet to reach a specific router, plus
2. The time taken for that router to generate an ICMP TTL Exceed packet, plus
3. The time taken for the ICMP TTL Exceed packet to return to the Traceroute source.
Items #1 and #3 are based on actual network conditions, which affect all forwarded traffic in the same way, but item
#2 does not. Any delay caused by the router generating the ICMP TTL Exceed packet will show up in a Traceroute as an
increase in latency, or even as a completely dropped packet.
To understand how routers handle Traceroute packets, you need to understand the basic architecture of a modern
router. Even state-of-the-art hardware-based routers capable of handling terabits of traffic do not handle every
function in hardware. All routers have some software components to handle complex or unusual exception
packets, and the ICMP generation needed by Traceroute falls into this category.
Depending on the design of the router, the Slow Path of ICMP generation may be handled by a dedicated CPU (on
larger routers these are often distributed onto each line-card), or it may share the same CPU as the Control Plane.
Either way, ICMP generation is considered to be one of the lowest priority functions of the router, and is typically
rate-limited to very low values (typically in the 50-300 packets/sec range, depending on the device).
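The rate-limiting of ICMP generation can be pictured as a token bucket. The following is a generic Python sketch, not any vendor's actual implementation; the 100 packets/sec rate (picked from the 50-300 range above) and the burst size of 10 are assumptions. It shows why a burst of Traceroute probes arriving faster than the limit gets only some replies, with the rest appearing as timeouts.

```python
class TokenBucket:
    """Generic token-bucket rate limiter (illustrative only)."""

    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec      # tokens refilled per second
        self.capacity = burst         # maximum tokens the bucket can hold
        self.tokens = float(burst)    # bucket starts full
        self.last = 0.0               # time of the previous refill, in seconds

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True   # router generates the ICMP TTL Exceed reply
        return False      # no reply: Traceroute shows a timeout ("*")


# Assumed limit of 100 replies/sec with a burst of 10; 50 probes arrive within 10 ms.
bucket = TokenBucket(rate_per_sec=100, burst=10)
answered = sum(bucket.allow(now=i * 0.0002) for i in range(50))
print(f"{answered} of 50 probes answered")
```

Only the first ten probes are answered; a second later the bucket has refilled and the router answers again, even though traffic forwarded through the router was never affected.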
When ICMP generation occurs on the same CPU as the Control Plane, you may notice that events like BGP churn or
even heavy CLI use (such as running a computationally intensive show command) will noticeably delay the ICMP
generation process. One of the best-known examples of this type of behaviour is the Cisco BGP Scanner process, which
runs every 60 seconds to perform internal BGP maintenance functions on most IOS based devices. On a router
without a dedicated Data Plane-Slow Path CPU (such as on the popular Cisco 6500 and 7600-series platforms),
Traceroute users will notice latency spikes on the device every 60 seconds as this scanner runs.
It is important to note that this type of latency spike is cosmetic, and only affects Traceroute (by delaying the
returning ICMP TTL Exceed packet). Traffic forwarded through the router via the normal Fast Path method will be
unaffected. An easy way to detect these cosmetic spikes is to look at the subsequent hops in the Traceroute. If there
were a real problem at that router, the increased latency would persist into all of the hops after it.
In the following example, you can see a cosmetic latency spike which is not a real issue. Note that the increased
latency does not continue into future hops.
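The "look at the subsequent hops" rule can be written as a simple check. This is a rough Python sketch under stated assumptions: one representative RTT per hop (for example the minimum of the three probes) and a hypothetical 50 ms threshold for what counts as a spike. A hop is flagged as cosmetic when its latency jumps but the next hop's latency does not stay elevated. The last hop cannot be judged this way, since there is no later hop to compare against.

```python
def cosmetic_spikes(hop_rtts_ms, threshold_ms=50.0):
    """Return 1-based hop numbers whose latency jumps by at least threshold_ms
    over the previous hop, but whose increase is gone at the next hop."""
    flagged = []
    for i in range(1, len(hop_rtts_ms) - 1):
        prev, cur, nxt = hop_rtts_ms[i - 1], hop_rtts_ms[i], hop_rtts_ms[i + 1]
        spiked = cur - prev >= threshold_ms
        persists = nxt - prev >= threshold_ms
        if spiked and not persists:
            flagged.append(i + 1)
    return flagged


# Made-up per-hop RTTs (ms): hop 3 spikes, but hop 4 is back to normal.
print(cosmetic_spikes([1.2, 2.0, 95.0, 3.1, 3.4]))
# Here the extra latency carries on to later hops, so nothing is flagged.
print(cosmetic_spikes([1.2, 2.0, 95.0, 96.0, 97.5]))
```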
Asymmetric Routing
One of the most basic concepts of routing on the Internet is that there is absolutely no guarantee of symmetrical
routing of traffic flowing between the same end-points but in opposite directions. Regular IP forwarding is done by
destination-based routing lookups, and each router can potentially have its own idea about where traffic should be
forwarded.
As we discussed earlier, Traceroute is only capable of showing you the forward path between the source and
destination you are trying to probe, even though latency incurred on the reverse path of the ICMP TTL Exceed
packets is part of the round-trip time calculation process. This means that you must also examine the reverse path
Traceroute before you can be certain that a particular link is responsible for any latency values you observe in a
forward Traceroute.
Asymmetric paths most often start at network boundaries, because this is where administrative policies are most
likely to change. For example, consider the following Traceroute:
This Traceroute shows a 100ms increase in latency between Global Crossing in Ashburn VA and Sprint in Ashburn VA,
and you're trying to figure out why. Obviously distance isn't the cause of the increased latency, since these devices
are both in the same city. It could be congestion between Global Crossing and Sprint, but this isn't guaranteed. After
the packets cross the boundary between Global Crossing and Sprint, the administrative policy is also likely to change.
In this specific example, the reverse path from Sprint to the original Traceroute source travels via a different
network, which happens to have a congested link. Someone looking at only the forward Traceroute would never
know this though, which is why obtaining both forward and reverse Traceroutes is so important to proper
troubleshooting.
But asymmetric paths don't only start at network borders; they can potentially occur at each and every router along
the way. A common example of this is when two networks interconnect with each other in multiple locations. Most
routing between networks on the Internet uses a concept called "hot potato" routing, i.e. the goal is to get the traffic
off of your network and onto the other network as quickly as possible. This means that a Traceroute which spans a large
distance and passes multiple interconnection points can have many different return paths, even between the same
two networks. Consider the following example:
If there was congestion between the blue and grey networks in Chicago, a Traceroute might show the increased
latency for one hop, but by the next hop the packets would no longer be traversing this interface. Alternatively,
there could be congestion on the grey network's backbone between San Jose and Washington DC, which the first
two probes would not encounter because they do not traverse that path. The forward Traceroute would indicate a
severe increase in latency and/or packet loss on the blue network between Chicago and San Jose, even though the
actual cause of the problem lies with the grey network.
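Hot-potato exit selection can be sketched in a few lines of Python. The topology and IGP costs below are hypothetical: the blue network carries the source in Washington DC, the grey network carries the destination in San Jose, and the two interconnect in Washington DC, Chicago, and San Jose. Each network simply hands traffic off at the interconnect that is cheapest to reach from where the traffic currently is.

```python
# Hypothetical topology: internal (IGP) cost from a city to each interconnect
# shared by the blue and grey networks.
INTERCONNECTS = ["Washington DC", "Chicago", "San Jose"]

IGP_COST = {
    "blue": {"Washington DC": {"Washington DC": 1, "Chicago": 10, "San Jose": 30}},
    "grey": {"San Jose": {"Washington DC": 30, "Chicago": 20, "San Jose": 1}},
}


def hot_potato_exit(network: str, current_city: str) -> str:
    """Hand traffic to the other network at the cheapest interconnect
    reachable from where it currently is (the 'hot potato' rule)."""
    costs = IGP_COST[network][current_city]
    return min(INTERCONNECTS, key=lambda city: costs[city])


# Probes start on blue in Washington DC; replies start on grey in San Jose.
print("forward handoff:", hot_potato_exit("blue", "Washington DC"))
print("reverse handoff:", hot_potato_exit("grey", "San Jose"))
```

With these costs, the probes cross to grey immediately in Washington DC and ride grey's backbone west, while the replies cross back in San Jose and ride blue's backbone east, so the two directions share no backbone links at all.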
One possible way to troubleshoot asymmetric reverse paths is with the controlled selection of the source address
where your Traceroute probes originate from. Consider the earlier example of a mysterious increase in latency
between Global Crossing and Sprint in your forward Traceroute. In this example, assume that your network is multi-
homed to Global Crossing and AT&T, and the congestion is actually occurring on the return path from Sprint, which
takes AT&T rather than Global Crossing.
How could you possibly prove that the congestion is between Sprint and AT&T, rather than between Global Crossing
and Sprint? If the interface IPs between your network and another network come out of that other network's IP
space, you can use that interface IP as the source address of your Traceroute to force the return traffic in via that
particular network. In the example above, Sprint's network does not know about these two individual /30s; it only
knows that 10.20.0.0/16 routes to Global Crossing, and 172.16.0.0/16 routes to AT&T. If you run a Traceroute from
your router and force the source address to be 10.20.30.1, Sprint will return the traffic via Global Crossing rather
than via AT&T. If the latency does not persist, you know that the problem is actually with the reverse path.
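The reason forcing the source address works is ordinary longest-prefix-match routing on Sprint's side. The Python sketch below models Sprint's view using only the two /16 routes from the example; 172.16.0.5 is a hypothetical address on the AT&T side, and the table is a deliberate simplification of a real BGP table.

```python
import ipaddress

# Sprint's view in the example: it only knows the two aggregate /16s.
SPRINT_ROUTES = {
    ipaddress.ip_network("10.20.0.0/16"): "Global Crossing",
    ipaddress.ip_network("172.16.0.0/16"): "AT&T",
}


def return_via(source_ip: str) -> str:
    """Which upstream Sprint uses to send replies to source_ip
    (longest-prefix match over the simplified table above)."""
    addr = ipaddress.ip_address(source_ip)
    matches = [net for net in SPRINT_ROUTES if addr in net]
    if not matches:
        raise LookupError(f"no route to {addr}")
    best = max(matches, key=lambda net: net.prefixlen)
    return SPRINT_ROUTES[best]


print(return_via("10.20.30.1"))   # source taken from the Global Crossing /30
print(return_via("172.16.0.5"))   # hypothetical source on the AT&T side
```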
Even if the IP address of the interface does not come out of the other network's IP space, as in the case of a
customer or potentially a peer, you can still use selective control of the Traceroute source address to troubleshoot
the return path. For example, if you source the Traceroute from your loopback address, a peer with multiple
interconnection points could potentially deliver the return traffic via any of those interfaces. If on the other hand
you source the Traceroute from your side of the /30, and the other network carries that /30 within their IGP, you
would be guaranteed that the return path will traverse that same interface.
By default most routers will source their Traceroute probes from the interface IP of the egress interface the
Traceroute probe is routed over. However, some routers allow you to configure a different default source address,
such as always from your loopback interface. Trying the Traceroute from multiple sources can give you different
viewpoints and valuable insight into the return path, even if it is impossible to obtain an actual reverse Traceroute.
In addition to showing you where bottlenecks might exist in an internetwork, Traceroute can confirm your network
design. If you have a complex internetwork with multiple routes to some destinations, this command can show you
which path your packets are taking. Most large internetworks with multiple routes for fault tolerance or load sharing
have a preferred path. Router configuration determines the path packets should take, and Traceroute can verify
whether your network configuration is operating as expected.
For pathping, the default maximum number of hops is 30, and the default wait time before a time-out is 3 seconds. The
default period between pings is 250 milliseconds, and the default number of queries sent to each router along the path is 100.
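These defaults also tell you roughly how long pathping's statistics phase will take. Assuming it spends queries times period on each hop (100 queries at 250 ms = 25 seconds per hop with the defaults), a simple Python calculation is:

```python
def pathping_stats_seconds(hops: int, queries: int = 100, period_ms: int = 250) -> float:
    """Rough duration of pathping's statistics phase, assuming each hop
    receives `queries` pings spaced `period_ms` milliseconds apart."""
    return hops * queries * period_ms / 1000


print(pathping_stats_seconds(5))               # a 5-hop path with the defaults
print(pathping_stats_seconds(5, queries=10))   # fewer queries per hop (pathping -q 10)
```

Reducing the number of queries shortens the wait, at the cost of less reliable loss statistics.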
The following is a typical pathping report. The compiled statistics that follow the hop list indicate packet loss at each
individual router.
If your IP address is 192.168.x.x, 10.x.x.x, or anywhere from 172.16.x.x to 172.31.x.x, then you are receiving a private
(internal) IP address from a router or other device. The IP address that the world sees is that of the router. If you are
receiving a 169.254.x.x address, this is a link-local (APIPA) address that Windows assigns itself when it cannot reach a
DHCP server; it generally means your network connection is not working properly.
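You can check which of these categories an address falls into with Python's standard ipaddress module. The sketch below hard-codes the RFC 1918 private ranges and the 169.254.0.0/16 link-local range; the function name classify and its labels are our own, and other special ranges (such as 127.0.0.0/8) are not distinguished.

```python
import ipaddress

# RFC 1918 private ranges and the IPv4 link-local (APIPA) range.
PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),   # 172.16.0.0 - 172.31.255.255
    ipaddress.ip_network("192.168.0.0/16"),
]
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def classify(addr: str) -> str:
    """Label an IPv4 address as 'link-local', 'private', or 'public'."""
    ip = ipaddress.IPv4Address(addr)
    if ip in LINK_LOCAL:
        return "link-local"  # self-assigned: DHCP most likely failed
    if any(ip in net for net in PRIVATE_RANGES):
        return "private"     # internal address handed out by a router
    return "public"          # anything else (special ranges not distinguished)


for a in ("192.168.1.20", "172.20.0.7", "169.254.10.3", "8.8.8.8"):
    print(a, classify(a))
```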
If you want more detailed information about your network connection, type ipconfig /all at the prompt. Here you
can get the same information as ipconfig with the addition of your MAC (hardware) address, DNS and DHCP server
addresses, IP lease information, etc.
You can use the ipconfig tool to check whether you have a problem with your cable. Windows lists each of the interfaces
and shows whether it is connected; a disconnected interface is reported as "Media disconnected".
A gateway is the device, usually a router, that connects your computer to other networks and the Internet. If you cannot
connect to the gateway, you almost certainly cannot connect to the Internet either. First you need to know the IP
address of the device. You can use ipconfig to give us the information about the default gateway:
Now you can use ping to test whether the gateway responds. Ping reports whether a device with a given IP address is
answering. In plain English, if the gateway replies, we know that our network adapter, the cable, and the router are
connected correctly.
ECT 572 Introduction to Computer Networking
5. Nslookup, Dig, and Host
Nslookup is a great utility designed specifically for troubleshooting Domain Name System (DNS) servers and finding
DNS-related problems. The name stands for "name server lookup", and the tool can be used to query DNS servers
manually for name resolution, to get information about the DNS configuration, to retrieve the DNS records and IP
addresses of a particular network resource, to find the mail servers and name servers (NS) of a domain, and for
general DNS server diagnosis.
It's available on most modern operating systems, including Windows and Unix-like systems such as Linux, and can
easily be run from a command prompt by simply entering the "nslookup" command.
Although it is still available by default on Windows and on many Linux/Unix systems, nslookup was deprecated by ISC
(the maintainers of BIND) and has effectively been replaced by its successors, the dig (Domain Information Groper) and
host utilities. Unlike nslookup, they are not available natively on Windows and must be installed manually. Windows
PowerShell does have a host command, but it is unrelated to DNS. You can install the Windows versions of dig and
host by extracting them from BIND for Windows.
Dig is basically an improved version of nslookup. Host enables quick lookups of DNS server information and is used to
find 1) the IP address of a given domain name and 2) the domain name of a given IP address.
However, because it is still supported on Windows, nslookup remains the appropriate tool for troubleshooting the
following types of problems:
Nslookup has two modes: interactive and non-interactive. Interactive mode allows the user to query name servers
for information about various hosts and domains or to print a list of hosts in a domain. Non-interactive mode is used
to print just the name and requested information for a host or domain.
It shows us the name and IP address of the DNS server it is using.
The first looks up an IP address (4.2.2.2) while the second looks up a domain name (www.yahoo.com).
In the example above, our query returned the IP address of a server hosting the site, but, as can be seen, the
answer is marked Non-authoritative. This is because the answer did not come from a name server that is authoritative
for the queried domain: our chosen DNS server, google-public-dns-a.google.com, is not authoritative for this external
domain, so it resolved the name on our behalf (or answered from its cache).
You can also do a reverse DNS lookup by providing an IP address as the argument to nslookup.
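Behind the scenes, a reverse lookup is a query for a PTR record under the special in-addr.arpa domain, with the octets of the address written in reverse order. The Python sketch below builds that query name two ways, with the standard library's reverse_pointer attribute and by hand; the function names are illustrative.

```python
import ipaddress


def reverse_name(addr: str) -> str:
    """PTR query name via the standard library (works for IPv4 and IPv6)."""
    return ipaddress.ip_address(addr).reverse_pointer


def manual_reverse_name(addr: str) -> str:
    """Same name built by hand for IPv4: reverse the octets, append in-addr.arpa."""
    return ".".join(reversed(addr.split("."))) + ".in-addr.arpa"


print(reverse_name("192.0.2.10"))
print(manual_reverse_name("192.0.2.10"))
```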
Type Description
a IP address
cname Canonical name for an alias
The set type command lets you query a particular type of DNS record. The MX record, for instance, tells senders that
all mail addressed to @redhat.com should be routed to the mail server(s) listed for that domain. For example, if you
wanted to check the MX (mail) records for a particular domain, you would type the following:
set type=mx
You can now perform another lookup on the domain name, and only the MX records will be returned.
For example, the commands
> set type=mx
> google.com
generate the following output:
Non-authoritative answer:
google.com MX preference = 10, mail exchanger = smtp3.google.com
google.com MX preference = 10, mail exchanger = smtp4.google.com
google.com MX preference = 10, mail exchanger = smtp1.google.com
google.com MX preference = 10, mail exchanger = smtp2.google.com
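The four MX lines show that the domain google.com has four MX records. Mail addressed to that domain is sent to
the machine with the lowest preference value (cost). If that machine is down or not accepting mail, the message is sent
to the machine with the next-lowest preference value; when several machines share the same preference, as all four do
here, the sender may choose among them.
Note that names that have MX records do not necessarily have A records.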
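The selection rule just described can be sketched in Python. Records are tried in order of ascending preference, and records with equal preference are shuffled, which is how senders spread load across equally preferred mail servers. The example.com host names are hypothetical.

```python
import random
from itertools import groupby


def mx_delivery_order(records, rng=random):
    """records: list of (preference, host) pairs, as shown by 'set type=mx'.
    Returns hosts in the order a sender would try them."""
    order = []
    by_pref = sorted(records, key=lambda r: r[0])       # lowest preference first
    for _pref, group in groupby(by_pref, key=lambda r: r[0]):
        hosts = [host for _p, host in group]
        rng.shuffle(hosts)                              # spread load among equal preferences
        order.extend(hosts)
    return order


records = [(20, "backup.example.com"), (10, "mx1.example.com"), (10, "mx2.example.com")]
print(mx_delivery_order(records))
```

The backup server is only tried after both preference-10 servers; for google.com above, all four servers would be shuffled together.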
We will not say any more about nslookup because it is deprecated and has been replaced with the dig tool.
6. Netstat
The netstat command displays the TCP/IP protocol statistics and active connections on the computer on which it was
executed. Netstat is particularly useful when you suspect that there may be unauthorized connections to your
computer (such as when a possible malware infection has occurred). Two popular graphical viewers for netstat are
TCPEye and CurrPorts.
Netstat can be used to detect SYN floods that may be affecting a host. If you run a command such as netstat -n -p TCP
and see many connections in the SYN_RECEIVED state (SYN_RECV on Linux), you know that some anomaly, such as a
SYN flood, is occurring.
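If you have saved the output of netstat -n -p TCP, a short Python script can count connections per state. This sketch assumes the four-column Windows layout (Proto, Local Address, Foreign Address, State); the sample lines use documentation addresses and are made up, and the alert threshold of 100 half-open connections is an arbitrary assumption you would tune for your own server. Linux netstat output uses a different column layout and would need a different parser.

```python
from collections import Counter

# Made-up output in the Windows 'netstat -n -p TCP' layout.
SAMPLE = """\
Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    192.0.2.5:80           203.0.113.7:51234      SYN_RECEIVED
  TCP    192.0.2.5:80           203.0.113.8:40112      SYN_RECEIVED
  TCP    192.0.2.5:80           198.51.100.4:60001     ESTABLISHED
  TCP    192.0.2.5:80           203.0.113.9:33321      SYN_RECEIVED
"""


def tcp_state_counts(netstat_text: str) -> Counter:
    """Count TCP connections per state, skipping headers and blank lines."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0] == "TCP":
            counts[fields[3]] += 1
    return counts


def looks_like_syn_flood(counts: Counter, threshold: int = 100) -> bool:
    """Many half-open (SYN_RECEIVED) connections suggest a SYN flood."""
    return counts["SYN_RECEIVED"] >= threshold


counts = tcp_state_counts(SAMPLE)
print(dict(counts))
print("possible SYN flood:", looks_like_syn_flood(counts))
```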
Netstat command switches (Windows)   Description
netstat              Displays all active TCP connections.
netstat -a           Displays a more comprehensive list of active connections and the ports on which the computer is listening (includes UDP).
netstat -b           Displays the executable involved in creating each connection or listening port (requires an elevated, administrator command prompt).
netstat -f           Displays Fully Qualified Domain Names (FQDNs) for foreign addresses. With this option you can check whether your PC is connected to suspicious websites.
netstat -n           Displays active TCP connections; addresses and port numbers are expressed numerically, and no attempt is made to determine host names.
netstat -o           Displays the owning process ID (PID) associated with each connection. You can look up a PID with the Windows Task Manager.
netstat -p [proto]   Displays connection details for only a certain protocol, where [proto] can be TCP, UDP, TCPv6, or UDPv6. With the additional -s option, [proto] can be IP, IPv6, ICMP, ICMPv6, TCP, TCPv6, UDP, or UDPv6.
netstat -s           Displays per-protocol statistics. By default, statistics are shown for IP, IPv6, ICMP, ICMPv6, TCP, TCPv6, UDP, and UDPv6; the -p option may be used to specify a subset of these.
[interval]           Specifies the length of time in seconds to wait before displaying fresh statistics.
Using the table above you can figure out what each of these commands does.
Examples
When you invoke netstat with the -r flag, it displays the kernel routing table. The -n option makes netstat print
addresses as dotted quad IP numbers rather than the symbolic host and network names. This option is especially
useful when you want to avoid address lookups over the network (e.g., to a DNS or NIS server).
# netstat -nr
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
128.178.156.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
0.0.0.0 128.178.156.1 0.0.0.0 UG 0 0 0 eth0
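The routing table printed above can also be read programmatically. The following Python sketch parses those two routes, converts each Genmask into a prefix length, and picks the most specific matching route for a destination; a G flag means the packet is handed to the listed gateway, otherwise the destination is treated as directly connected. The function names are hypothetical, and the parser assumes the Linux netstat -nr column layout shown here.

```python
import ipaddress

# The 'netstat -nr' output shown above.
NETSTAT_NR = """\
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
128.178.156.0   0.0.0.0         255.255.255.0   U         0 0          0 eth0
0.0.0.0         128.178.156.1   0.0.0.0         UG        0 0          0 eth0
"""


def parse_routes(text: str):
    """Return (network, gateway, flags, iface) tuples from netstat -nr text."""
    routes = []
    for line in text.splitlines():
        if not line[:1].isdigit():
            continue  # skip the title and column-header lines
        dest, gateway, genmask, flags, *_metrics, iface = line.split()
        network = ipaddress.ip_network(f"{dest}/{genmask}")  # Genmask -> prefix length
        routes.append((network, gateway, flags, iface))
    return routes


def next_hop(dst: str, routes):
    """Most specific matching route wins; 'G' means 'send via the gateway'."""
    addr = ipaddress.ip_address(dst)
    matches = [r for r in routes if addr in r[0]]
    _network, gateway, flags, iface = max(matches, key=lambda r: r[0].prefixlen)
    return (gateway if "G" in flags else "direct", iface)


routes = parse_routes(NETSTAT_NR)
print(next_hop("128.178.156.42", routes))  # on the local subnet
print(next_hop("8.8.8.8", routes))         # falls through to the default route
```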
The second column of netstat's output shows the gateway to which the routing entry points. If no gateway is used,
an asterisk is printed instead (or 0.0.0.0 when the -n option is used, as in the output above). The third column shows
the "generality" of the route, i.e., the network mask for this route. The fourth column displays the following flags that
describe the route:
7. Arp
The arp command lets you view and manage the Address Resolution Protocol (ARP) cache. In other words, arp
displays and modifies the IP address-to-MAC address translation tables used by the ARP protocol. In order for the
arp command to be meaningful and helpful, you need to first understand the purpose of the Address Resolution
Protocol. As DNS translates between host names and IP addresses, ARP translates between MAC addresses (Layer 2)
and IP addresses (Layer 3). When a host attempts to communicate with another host on the same subnet, it must
first know the destination host's MAC address. If there is no entry in the sending host's ARP cache for the destination
MAC address, ARP sends out a broadcast (to all hosts in the subnet) asking the host with the target IP address to
send back its MAC address. These IP-to-MAC mappings build up in the ARP cache which the arp command lets you
view and modify.
Be aware that the ARP cache is a tempting target for hackers. It can be vulnerable to cache poisoning attacks in
which false entries are inserted into the ARP cache, causing the compromised host to unknowingly send data (often
unencrypted) to the attacker.
Arp command switches (Windows)   Description
arp -a or arp -g     Displays both the IP and MAC addresses in the ARP cache for all network interfaces using ARP.
arp -d [inet_addr]   Deletes the entry for [inet_addr] from the ARP cache (use * in place of [inet_addr] to delete all entries), which causes ARP queries for those hosts to be re-processed. For example, arp -d 10.57.10.32.
arp -N [if_addr]     Displays the ARP entries for the network interface specified by [if_addr]. Used in conjunction with -a or -g. For example: arp -a -N 192.168.20.15, where 192.168.20.15 is the IP address of your (or one of your) network interfaces.
arp -s               Adds a static (permanent) entry to the ARP cache. Static entries can serve as a countermeasure to ARP spoofing attacks. For example, this command adds a static entry: arp -s 157.55.85.212 00-aa-00-62-c6-09.
arp -v               Displays current ARP entries in verbose mode. All invalid entries and entries on the loopback interface will be shown. Used in conjunction with arp -a or arp -g.
[if_addr]            If present, specifies the Internet address of the interface on your computer whose address translation table should be modified. Useful if your computer has multiple network interfaces. If not present, the first applicable interface will be used.
[inet_addr]          Specifies an IP address entry in the ARP cache. Used in conjunction with -a or -g. For example, the command arp -a 192.168.10.20 will query the cache to display the MAC address of host 192.168.10.20.
The Address Resolution Protocol (ARP) is used to resolve IP addresses to MAC addresses. This is important because
on a network, devices find each other using the IP address, but frames delivered on the local network segment must be
addressed to the destination's MAC address.
When a computer wants to send data to another computer on the network, it must know the MAC address of the
destination system. To discover this information, ARP sends out a discovery packet to obtain the MAC address.
When the destination computer is found, it sends its MAC address to the sending computer. The ARP-resolved MAC
addresses are stored temporarily on a computer system in the ARP cache. Inside this ARP cache is a list of matching
MAC and IP addresses. This ARP cache is checked before a discovery packet is sent on to the network to determine if
there is an existing entry.
Entries in the ARP cache are periodically flushed so that the cache doesn't fill up with unused entries. The following
code shows an example of the ARP command with the output from a Windows 2000 system:
As you might notice in the previous code, the type is listed as dynamic. Entries in the ARP cache can be added
statically or dynamically. Static entries are added manually and do not expire. Dynamic entries are added
automatically when the system communicates with another host on the network.
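The caching behaviour described above (dynamic entries that expire, static entries that do not) can be modelled with a small Python class. This is an illustrative sketch only: the class and method names are hypothetical, and the 60-second lifetime in the example is an assumed value, not the timeout any particular operating system uses. Refusing to overwrite a static entry mirrors why static entries help against ARP spoofing.

```python
class ArpCache:
    """Toy ARP cache: IP -> MAC, with expiring dynamic entries (illustrative)."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._entries = {}  # ip -> (mac, kind, expires_at); expires_at is None for static

    def learn(self, ip: str, mac: str, now: float) -> None:
        """Record a dynamic entry, e.g. after receiving an ARP reply."""
        current = self._entries.get(ip)
        if current is not None and current[1] == "static":
            return  # a static entry is never overwritten by ARP traffic
        self._entries[ip] = (mac, "dynamic", now + self.ttl)

    def add_static(self, ip: str, mac: str) -> None:
        """Similar in spirit to 'arp -s ip mac': a permanent entry."""
        self._entries[ip] = (mac, "static", None)

    def lookup(self, ip: str, now: float):
        """Return the cached MAC, or None if an ARP broadcast would be needed."""
        entry = self._entries.get(ip)
        if entry is None:
            return None
        mac, _kind, expires_at = entry
        if expires_at is not None and now >= expires_at:
            del self._entries[ip]  # flush the stale dynamic entry
            return None
        return mac


cache = ArpCache(ttl_seconds=60)  # assumed lifetime, not an OS default
cache.learn("192.168.1.1", "00-11-22-33-44-55", now=0)
print(cache.lookup("192.168.1.1", now=30))   # still cached
print(cache.lookup("192.168.1.1", now=61))   # expired: would trigger a new ARP request
```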
The whois command isn't included with Windows itself, but Microsoft's Windows Sysinternals provides a Whois tool you
can download. This information is also available from many websites that can perform whois lookups for you.
Ok, so now you have some numbers and assessments from the Speed Test application. Here are the basics of how
your internet connection speed was tested and what the download/upload numbers mean.
1. Performance tests
Firewall
Tests if your Local Area Network (LAN) is protected by a firewall.
Ping
Requests are sent to a selected server and the time it takes to get a response is measured. This is a basic test of your
Internet connectivity.
Jitter
A measurement of the variation among successive ping tests. The lower the jitter value, the better: a low value
indicates that there is minimal difference in response time from one ping test to the next.
Packet Loss
This is exactly what it sounds like: the number (usually expressed as a percentage) of packets travelling across a
computer network that fail to reach their destination. Packet loss can cause jitter with streaming technologies,
resulting in inconsistent performance.
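Speed tests do not all compute jitter the same way. One common definition, assumed in the Python sketch below, is the average absolute difference between successive ping times; packet loss is simply the percentage of probes that got no reply. The sample values are made up.

```python
from statistics import mean


def jitter_ms(rtts_ms):
    """Average absolute change between successive ping RTTs (needs >= 2 samples)."""
    diffs = [abs(b - a) for a, b in zip(rtts_ms, rtts_ms[1:])]
    return mean(diffs)


def packet_loss_pct(sent: int, received: int) -> float:
    """Percentage of probes that never got a reply."""
    return 100.0 * (sent - received) / sent


pings = [20, 22, 19, 25]  # made-up RTTs in milliseconds
print(f"jitter: {jitter_ms(pings):.2f} ms")
print(f"loss:   {packet_loss_pct(sent=100, received=97):.1f} %")
```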
Download speed
The number provided here represents the number of Mbps (megabits per second) your connection allows to travel from a
website to your network. When it comes to reading, playing games, viewing video and listening to streaming music on the web,
this is the key number. The download speed is also the number that your Internet service provider (ISP) uses to
distinguish its plans from one another.
If your ISP is doing a good job, the download speed you get from speed test will be close to the one your service
provider associates with your plan.
Upload speed
The number provided here is the number of Mbps your connection allows you to send from your computer to a
website. Because so much online activity is interactive, upload speed matters: it determines
how well you are able to work with web-based applications. ISPs don't pay as much attention to upload speeds in
their marketing, but you should be able to find the expected performance noted somewhere on their website.
As with your download speed, your upload speed should be close to the speed your service provider associates with
your plan.
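Because line speeds are quoted in megabits per second (Mbps) while file sizes are usually given in megabytes (MB), a quick conversion helps set expectations: multiply megabytes by 8 to get megabits, then divide by the speed. The Python sketch below ignores protocol overhead, so real transfers will take somewhat longer.

```python
def transfer_seconds(size_megabytes: float, speed_mbps: float) -> float:
    """Ideal transfer time: MB -> megabits (x 8), divided by megabits per second.
    Ignores protocol overhead and assumes the full line speed is available."""
    return size_megabytes * 8 / speed_mbps


print(transfer_seconds(100, 25))    # a 100 MB file on a 25 Mbps connection
print(transfer_seconds(4500, 10))   # a 4.5 GB video on a 10 Mbps connection
```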
Download Speeds
1-4 Mbps
Generally, this is the lowest level of service available in most areas. Email and most websites will load fine and most
music streaming services will work without interruption. Internet phone services (VOIP) should have no trouble. But
Standard Definition (SD) videos will buffer on occasion.
4-6 Mbps
According to the Federal Communications Commission, this is the minimum speed "generally required for using
today's video rich broadband applications and services." Users at this speed should not have any trouble with
streaming audio or video. Service at this speed will allow some file sharing and should work fine for streaming
Internet TV (IP TV).
6-10 Mbps
For online gamers and heavy video-on-demand, this is the preferred speed. This speed delivers uninterrupted online
gaming and smooth on-demand video as long as only one device is using a high bandwidth service.
10-15 Mbps
Users at this speed say they do notice the increase in speed. Websites drop right into the browser, and your
interaction with web-based applications and cloud services will be much quicker. This speed will help you use more
complex online applications such as remote education services, telemedicine, and high-definition Internet TV.
15-50 Mbps
If you have a number of devices connected to your network and want to use them at the same time without delays,
this may be the speed for you. With the explosion of electronic products that can be connected to the Internet,
50+ Mbps
Speeds like this are not usually seen on home networks. The main reasons for such blazing download speeds are
commercial - video conferencing, real-time data collection and intense remote computing. But again, with the
explosion of web-enabled devices in homes, speed like this may someday become the new normal. Remember, we
used to access the Internet with dial-up modems.