How to Troubleshoot Networks

Things go wrong on our networks for many reasons: humans make errors in their configurations, hardware can fail, software updates may include bugs, and changing traffic patterns might cause congestion. There are different approaches to troubleshooting these problems, and some are more effective than others.

Troubleshooting consists of three steps:

  • Problem
  • Diagnosis
  • Solution

It all starts when someone or something reports a problem. Often this will be a user who calls the helpdesk because something is not working as expected, but you may also find issues through network monitoring (you do monitor your network, right?). The next step is to diagnose the problem, and it’s important to find the root cause. Once you have found the problem, you will implement a (temporary) solution.

Diagnosing the problem is one of the most important steps because we need to find the root cause. Here’s what we do to diagnose the problem:

  • Collect information: Most of the time, a problem report doesn’t give us enough information. Users are very good at reporting “network is down” or “my computer doesn’t work”, but this tells us very little. We need to collect information by asking our users detailed questions or by using network tools.
  • Analyze information: Once we have gathered all the information, we analyze it to see what is wrong. We can compare it to previously collected information or to other devices with similar configurations.
  • Eliminate possible causes: We think about what could cause the problem and rule out the causes that don’t fit the information we collected. This requires thorough knowledge of the network and all the protocols involved.
  • Hypothesize: After eliminating possible causes, you will end up with a few remaining causes that could explain the problem. We select the most likely one.
  • Verify hypothesis: We test our hypothesis to see if we are right or wrong. If we are right, we have a victory…if we are wrong, we test the other possible causes.
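The “analyze information” step often comes down to comparing what you collected with a known-good copy or with a similarly configured device. Here is a minimal Python sketch of that comparison using the standard library’s difflib; the two configuration snippets are made-up examples, not output from a real device, and `config_differences` is a hypothetical helper name.

```python
import difflib
from typing import List

# Made-up example configs: a known-good backup and what the device runs now.
known_good = """interface GigabitEthernet0/1
 switchport mode access
 switchport access vlan 10
 spanning-tree portfast""".splitlines()

running = """interface GigabitEthernet0/1
 switchport mode access
 switchport access vlan 20
 spanning-tree portfast""".splitlines()


def config_differences(baseline: List[str], observed: List[str]) -> List[str]:
    """Return only the lines that differ between two configs.

    '- ' marks a line that exists only in the baseline,
    '+ ' marks a line that exists only in the observed config.
    ndiff's '? ' hint lines and unchanged '  ' lines are dropped.
    """
    return [
        line
        for line in difflib.ndiff(baseline, observed)
        if line.startswith(("- ", "+ "))
    ]


for line in config_differences(known_good, running):
    print(line)
```

In this example the only difference is the access VLAN on the port, which narrows the analysis to one line instead of the whole configuration.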

If you don’t use a structured approach to troubleshooting, you might just “follow your gut feeling” and get confused because you forget what you have and haven’t already tried. A structured approach also makes it easier to work together with other network engineers because you can share the steps you have already gone through.
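To make the “remember what you already tried” point concrete, here is a minimal Python sketch of the hypothesize and verify steps with a shared log. Everything in it is hypothetical: the `Hypothesis` and `TroubleshootingLog` names, the likelihood numbers, and the lambdas, which stand in for real checks you would run on the network.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Hypothesis:
    description: str
    likelihood: float            # your own estimate between 0.0 and 1.0
    test: Callable[[], bool]     # returns True if this cause is confirmed


@dataclass
class TroubleshootingLog:
    # (description, confirmed) for every hypothesis that has been tested
    tried: List[Tuple[str, bool]] = field(default_factory=list)


def verify_hypotheses(
    candidates: List[Hypothesis], log: TroubleshootingLog
) -> Optional[Hypothesis]:
    """Test the most likely cause first and skip anything already in the log."""
    already_tried = {description for description, _ in log.tried}
    for hypothesis in sorted(candidates, key=lambda h: h.likelihood, reverse=True):
        if hypothesis.description in already_tried:
            continue  # already tested earlier; the log holds the result
        confirmed = hypothesis.test()
        log.tried.append((hypothesis.description, confirmed))
        if confirmed:
            return hypothesis
    return None  # every remaining cause was ruled out: collect more information


# Hypothetical example: the lambdas stand in for real checks on the network.
log = TroubleshootingLog()
candidates = [
    Hypothesis("ACL blocking traffic", 0.1, lambda: False),
    Hypothesis("Cable unplugged", 0.6, lambda: False),
    Hypothesis("Wrong access VLAN on switch port", 0.3, lambda: True),
]
root_cause = verify_hypotheses(candidates, log)
print(root_cause.description if root_cause else "no cause found")
print(log.tried)
```

Because the log travels with the ticket, a colleague who picks up the problem can call `verify_hypotheses` again and it will not repeat the cable and VLAN checks.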

Here are the steps in a nice flowchart:

Structured Troubleshooting Approach

We call this the structured troubleshooting approach. However, if you have a lot of experience with the network you are working on, or as you become better at troubleshooting, this approach might be too time-consuming.

Instead of walking through every step of the structured troubleshooting approach, we can jump from the “collect information” step directly to the “hypothesize” step, skipping the “analyze information” and “eliminate possible causes” steps. If you are inexperienced with troubleshooting, it’s best to use the structured troubleshooting approach. As you become better at troubleshooting, you might want to skip some of the steps…we call this the shoot from the hip approach:

Shoot from the Hip Troubleshooting

Here’s the shoot from the hip model; the steps that we skip are in blue. If your instincts are wrong, you won’t lose your life, but you will lose valuable time. If you are right, however, you’ll save a lot of time (or become the new sheriff in town).
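The trade-off can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `EXPERIENCE` table of symptoms and causes is invented, and `verify` stands in for whatever real test you would run. The function tries the experienced guess first and only falls back to checking causes one by one when the instinct turns out to be wrong.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical "experience": symptoms seen before, mapped to the cause
# that was behind them last time.
EXPERIENCE: Dict[str, str] = {
    "one user, no IP address": "DHCP scope exhausted",
    "whole floor offline": "uplink switch down",
}


def shoot_from_the_hip(
    symptom: str,
    verify: Callable[[str], bool],
    all_causes: List[str],
) -> Tuple[Optional[str], str]:
    """Jump straight from the symptom to a hypothesis; fall back if it is wrong."""
    hunch = EXPERIENCE.get(symptom)
    if hunch is not None and verify(hunch):
        return hunch, "shoot from the hip"  # analysis and elimination skipped
    # The instinct was wrong (or there is no experience with this symptom),
    # so time was lost: go back to testing the remaining causes one by one.
    for cause in all_causes:
        if cause != hunch and verify(cause):
            return cause, "structured"
    return None, "structured"


actual_fault = "uplink switch down"  # hypothetical fault for the example
causes = ["cable unplugged", "DHCP scope exhausted", "uplink switch down"]


def verify(cause: str) -> bool:
    return cause == actual_fault


print(shoot_from_the_hip("whole floor offline", verify, causes))
print(shoot_from_the_hip("one user, no IP address", verify, causes))
```

The first call is the lucky case: one check and done. In the second, the hunch (DHCP) is wrong, so the sketch pays for that extra check and then falls back to testing the other causes in order.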

Eliminating possible causes is an important step in the troubleshooting process, and there are several approaches you can use:

  • Top-down.
  • Bottom-up.
  • Divide and conquer.
  • Follow the traffic path.
  • Spot the difference.
  • Replace components.

Let’s walk through the different approaches one-by-one!

Top Down Troubleshooting

Top-down means we start at the top of the OSI model (the application layer) and work our way down to the bottom. The idea is that we check whether the application is working and assume that if a certain layer is working, all the layers below it are working too. For example, if a ping (ICMP) from one computer to another succeeds, you can assume that layers 1, 2 and 3 are operational. The downside of this approach is that you need access to the application that you are troubleshooting.
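As a rough sketch, top-down troubleshooting is a loop from layer 7 toward layer 1 that stops at the first layer that works, relying on the assumption above that a working layer implies working layers below it. The per-layer checks here are hypothetical placeholders; in practice a layer 3 check might be a ping, and a layer 7 check might be opening the application itself.

```python
from typing import Callable, Dict, Optional

OSI_LAYERS = {
    7: "application", 6: "presentation", 5: "session", 4: "transport",
    3: "network", 2: "data link", 1: "physical",
}


def top_down(checks: Dict[int, Callable[[], bool]]) -> Optional[int]:
    """Return the layer that most likely holds the fault, or None if layer 7 works.

    Works from layer 7 downward. Since a working layer is assumed to mean
    every layer below it works, we stop at the first layer that passes:
    the fault is in the layer directly above it.
    """
    last_failed: Optional[int] = None
    for layer in range(7, 0, -1):
        if checks[layer]():
            return last_failed  # None if the application itself works
        last_failed = layer
    return last_failed  # every layer failed: start with the physical layer


# Hypothetical scenario: everything from layer 3 upward fails.
FAULTY_LAYER = 3
checks = {layer: (lambda layer=layer: layer < FAULTY_LAYER) for layer in OSI_LAYERS}
suspect = top_down(checks)
print(suspect, OSI_LAYERS[suspect])
```

Note that in this scenario the loop runs six checks before it finds a working layer, which is why top-down is quickest when the fault sits close to the application.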

Bottom Up Troubleshooting

Bottom-up means we start at the bottom of the OSI model and work our way up. We start with the physical layer, which means we check our cables and connectors. Then we move up to the data link layer to see whether Ethernet is working, spanning-tree is working OK, port security is not causing issues and VLANs are configured properly, and then we move on to the network layer. Here we check our IP addresses, access-lists, routing protocols and so on. This approach is very thorough but also time-consuming. If you are new to troubleshooting, I recommend this method because you will eliminate all the possible causes of problems.
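The bottom-up checklist from this paragraph can be written down as an ordered list and walked from layer 1 upward. This is a minimal Python sketch; the check names are taken from the text above, while the `results` dictionary is a hypothetical stand-in for what you actually observed on the devices.

```python
from typing import Dict, List, Optional, Tuple

# Checklist from the bottom-up description, in the order it is worked through.
BOTTOM_UP_CHECKLIST: List[Tuple[int, List[str]]] = [
    (1, ["cables", "connectors"]),
    (2, ["ethernet", "spanning-tree", "port security", "vlans"]),
    (3, ["ip addresses", "access-lists", "routing protocols"]),
]


def bottom_up(results: Dict[str, bool]) -> Optional[Tuple[int, str]]:
    """Walk the checklist from layer 1 upward and return the first failure.

    `results` maps each check name to True (ok) or False (problem found).
    A check that is missing from `results` has not been verified yet, so
    we stop there instead of silently skipping it.
    """
    for layer, layer_checks in BOTTOM_UP_CHECKLIST:
        for check in layer_checks:
            if not results.get(check, False):
                return layer, check
    return None  # layers 1-3 look fine: continue with the upper layers


# Hypothetical observations: everything is fine except port security.
observed = {check: True for _, layer_checks in BOTTOM_UP_CHECKLIST for check in layer_checks}
observed["port security"] = False
print(bottom_up(observed))
```

Treating an unchecked item as a stop, rather than a pass, mirrors why bottom-up is thorough: nothing above a layer is trusted until that layer has actually been verified.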

Divide and Conquer Troubleshooting

Divide and conquer means we start in the middle of the OSI model.
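One way to read “start in the middle” is as a binary search over the seven layers, again assuming that a working layer means every layer below it works. That binary-search interpretation is our assumption, not something stated here. The sketch simply starts at the numeric midpoint (layer 4); starting with a layer 3 test such as a ping is an equally reasonable choice. `layer_works` is a hypothetical placeholder for a real check.

```python
from typing import Callable, List, Optional, Tuple


def divide_and_conquer(layer_works: Callable[[int], bool]) -> Tuple[Optional[int], List[int]]:
    """Find the lowest failing OSI layer, starting in the middle.

    Assumes results look like ok, ok, ..., fail, fail from layer 1 upward.
    Returns (lowest failing layer or None, layers tested in order).
    """
    low, high = 1, 7          # the lowest failing layer, if any, is in low..high
    first_failure: Optional[int] = None
    tested: List[int] = []
    while low <= high:
        middle = (low + high) // 2   # layer 4 (transport) on the first pass
        tested.append(middle)
        if layer_works(middle):
            low = middle + 1         # this layer and everything below are ok: go up
        else:
            first_failure = middle   # the problem is here or lower: go down
            high = middle - 1
    return first_failure, tested


# Hypothetical fault at layer 3: layers 1 and 2 work, 3 and above do not.
print(divide_and_conquer(lambda layer: layer < 3))
```

Under that assumption, seven layers never need more than three checks, compared with up to seven for a strict top-down or bottom-up walk.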
