TCP Window Size Scaling

TCP (Transmission Control Protocol) is a connection-oriented protocol, which means that it keeps track of how much data has been transmitted. The sender transmits some data and the receiver has to acknowledge it. If the sender doesn’t receive the acknowledgment in time, it retransmits the data.

TCP uses “windowing” which means that a sender will send one or more data segments and the receiver will acknowledge one or all segments. When we start a TCP connection, the hosts will use a receive buffer where we temporarily store data before the application can process it.

When the receiver sends an acknowledgment, it will tell the sender how much data it can transmit before it has to wait for the next acknowledgment. We call this the window size. Basically, the window size indicates how much free space is left in the receive buffer.

Typically the TCP connection will start with a small window size, and every time there is a successful acknowledgment, the window size will increase. Here’s an example:

TCP Window size 1

Above we have two hosts. The host on the left side will send one segment and the host on the right side will send an acknowledgment in return. Since the acknowledgment was successful, the window size will increase:

tcp window size 2

The host on the left side is now sending two segments and the host on the right side will return a single acknowledgment. Everything is working fine so the window size will increase even further:

TCP Window size 4

The host is now sending four segments and the host on the right side responds with a single acknowledgment.

In the example above, the window size keeps increasing as long as the receiver acknowledges all our segments, until the window size hits a certain maximum limit. When the sender doesn’t receive an acknowledgment within a certain time period (the retransmission timeout, which is derived from the measured round-trip time), the window size will be reduced.

When an interface has congestion then it’s possible that IP packets are dropped. To deal with this, TCP has a number of algorithms that deal with congestion control. One of them is called slow start.

Congestion occurs when the interface has to transmit more data than it can handle. Its queue(s) will hit a limit and packets will be dropped.

With TCP slow start, the window size will initially grow exponentially (window size doubles) but once a packet is dropped, the window size will be reduced to one segment. It will then grow exponentially again until the window size is half of what it was when the congestion occurred. At that moment, the window size will grow linearly instead of exponentially.
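The growth pattern described above can be sketched in a few lines of Python. This is a simplified, Tahoe-style model and not a real TCP stack: the function name `simulate_cwnd`, the `max_window` cap and the initial threshold are assumptions made for illustration. Note that the window that grows and shrinks here is the sender's congestion window, counted in segments, which the sender tracks separately from the window advertised by the receiver.

```python
def simulate_cwnd(rounds, loss_rounds, max_window=64):
    """Return the sender's window (in segments) at the start of each round trip.

    Simplified model: a loss resets the window to one segment and sets the
    slow start threshold to half of the window in use when the loss occurred.
    """
    cwnd = 1
    ssthresh = max_window  # assumed initial threshold (hypothetical value)
    history = []
    for rnd in range(rounds):
        history.append(cwnd)
        if rnd in loss_rounds:
            # Congestion: remember half of the current window, restart at 1.
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1
        elif cwnd < ssthresh:
            # Slow start: the window doubles every round trip.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Past the threshold: the window grows by one segment per round trip.
            cwnd = min(cwnd + 1, max_window)
    return history


# A single loss in round 4 (rounds are counted from 0).
print(simulate_cwnd(12, {4}))
# prints [1, 2, 4, 8, 16, 1, 2, 4, 8, 9, 10, 11]
```

In the output you can see the exponential growth up to 16 segments, the drop back to one segment, a second exponential phase up to 8 (half of 16) and linear growth after that.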

When an interface gets congested, it’s possible that all your TCP connections will experience TCP slow start. Packets will be dropped and then all TCP connections will have a small window size. This is called TCP global synchronization. Here’s what it looks like:

TCP Global Synchronization

The orange, blue and green lines are three different TCP connections. These TCP connections start at different times and after a while, the interface gets congested and packets of all TCP connections are dropped. What happens is that the window size of all these TCP connections will drop to one and once the interface congestion is gone, all their window sizes will increase again.

The interface then gets congested again, the window size drops back to one and the story repeats itself. The result of this is that we don’t use all the available bandwidth that our interface has to offer. If you look at the dashed line you can see that the average interface utilization isn’t very high.

To prevent global synchronization we can use RED (Random Early Detection). This is a feature that drops “random” packets from TCP flows based on the number of packets in a queue and, in its weighted form (WRED), the TOS (Type of Service) marking of the packets. When packets are dropped before a queue is full, we can avoid global synchronization.
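To give you an idea of how “dropping before the queue is full” works, here is a minimal sketch of the classic RED drop decision. The function names, the thresholds (20 and 40 packets) and the maximum drop probability (10%) are hypothetical values picked for the example, not defaults of any particular platform. Below the minimum threshold nothing is dropped, between the thresholds the drop probability rises linearly, and at or above the maximum threshold every packet is dropped.

```python
import random


def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Drop probability for a given average queue depth (in packets)."""
    if avg_queue < min_th:
        return 0.0  # queue is short: never drop
    if avg_queue >= max_th:
        return 1.0  # queue is too long: drop everything
    # Linear ramp from 0 at min_th up to max_p just below max_th.
    return max_p * (avg_queue - min_th) / (max_th - min_th)


def should_drop(avg_queue, min_th=20, max_th=40, max_p=0.1, rng=random.random):
    """Randomly decide whether to drop the arriving packet."""
    return rng() < red_drop_probability(avg_queue, min_th, max_th, max_p)


for depth in (10, 25, 30, 39, 40):
    print(depth, red_drop_probability(depth, 20, 40, 0.1))
```

Because only a random subset of the flows loses a packet at any moment, those connections back off at different times instead of all collapsing together.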

The end result will look similar to this:

TCP Global Synchronization Red

When we use RED, our average interface utilization will improve.

Now that you have an idea what the TCP window size is about, let’s take a look at a real example of how the window size is used. We can use Wireshark for this.

Wireshark Captures

To examine the TCP window size I will use two devices:

Host Raspberry Pi

The device on the left side is a modern computer with a gigabit interface. On the right side, we have a small Raspberry Pi which has a FastEthernet interface. The Raspberry Pi is a great little device but its CPU, memory and Ethernet interface are limited. To get an interesting output, I will copy a large file through SSH from my computer to the Raspberry Pi, which will be easily overburdened.

Here’s what happened. Take a look at this picture:

Wireshark IO graphs window size drop

In the graph above you can see the window size that was used during this connection. The file transfer started after about 6 seconds and you can see that the window size increased fast. It went up and down a bit but at around 30 seconds, it totally collapsed. After a few seconds it increased again and I was able to complete the file transfer. Let’s take a closer look at this file transfer, which starts with the three-way handshake:

Wireshark capture TCP SYN ACK Window Size

My fast computer uses 10.56.100.1 and the Raspberry Pi uses 10.56.100.164. Above you can see in the SYN,ACK message that the Raspberry Pi wants to use a window size of 29200. My computer advertises win=65535 with a scale factor of ws=128, so later in the connection its window can be as large as 8388480 bytes (65535 * 128). That is irrelevant for now since we are sending data to the Raspberry Pi.

After a few packets, the window size of the Raspberry Pi looks like this:

Wireshark Capture TCP SYN ACK Window Size Large

Above you can see that the window size has increased to 132480. The window size field in the TCP header is a 16-bit value, so without scaling the largest window size would be 65535 bytes. Nowadays we use a scaling factor so that we can use larger window sizes.
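The scaling arithmetic is easy to check yourself. In the sketch below, the function names are my own; the only numbers used are the ones from the capture. The window scale option carries a shift count (0 to 14), and the 16-bit window field is shifted left by that many bits, which is the same as multiplying by the factor that Wireshark shows as ws.

```python
MAX_WINDOW_FIELD = 0xFFFF  # largest value of the 16-bit window field (65535)
MAX_SHIFT = 14             # largest shift count the window scale option allows


def scale_multiplier(shift_count):
    """Multiplier that Wireshark displays as 'ws' for a given shift count."""
    if not 0 <= shift_count <= MAX_SHIFT:
        raise ValueError("shift count must be between 0 and 14")
    return 1 << shift_count


def effective_window(window_field, shift_count):
    """Window in bytes: the 16-bit header field times the scale multiplier."""
    if not 0 <= window_field <= MAX_WINDOW_FIELD:
        raise ValueError("window field must fit in 16 bits")
    return window_field * scale_multiplier(shift_count)


print(scale_multiplier(7))            # prints 128
print(effective_window(65535, 7))     # prints 8388480
print(132480 > MAX_WINDOW_FIELD)      # prints True
```

The last line shows why the window of 132480 bytes is only possible with scaling: it doesn’t fit in the 16-bit field.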

At around the 10 second mark the window size decreased. Here’s what happened:

Wireshark Capture TCP Window Full

The Raspberry Pi seems to have trouble keeping up and its receive buffer is probably full. It tells the computer to use a window size of 26752 from now on. The computer sends 18 segments of 1460 bytes and one segment of 472 bytes (26752 bytes in total). The last packet shows a “TCP Window Full” message. This is something that Wireshark reports to us: our computer has completely filled the receive buffer of the Raspberry Pi.

The Raspberry Pi catches up a bit, but around the 30 second mark something bad happens. Take a look at the Wireshark capture below:
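If you want to verify the numbers from this capture, the small sketch below splits a window into segments. The function name `segments_for_window` is just something I made up for this example; the 1460-byte segment size and the 26752-byte window are the values seen in the capture.

```python
def segments_for_window(window_bytes, mss=1460):
    """Split a window into full-sized segments plus one smaller last segment."""
    full_segments, remainder = divmod(window_bytes, mss)
    sizes = [mss] * full_segments
    if remainder:
        sizes.append(remainder)  # the final, partially filled segment
    return sizes


sizes = segments_for_window(26752)
print(len(sizes), sizes[-1], sum(sizes))
# prints 19 472 26752
```

So 18 full segments plus one segment of 472 bytes fill the advertised window exactly, which is why Wireshark flags the last one as “TCP Window Full”.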

Wireshark Capture TCP Window Zero

Above you can see that the Raspberry Pi sends an ACK to the computer with a window size of 0. This means that the Raspberry Pi is unable to receive any more data at this moment. The window stays at 0 until the Raspberry Pi advertises a larger window again, so the TCP transmission is paused until the receive buffer has been processed.

Here’s the actual packet:

Wireshark Capture TCP Window Zero Packet

Above you can see that the window size is now 0. Once the receive buffer has been processed, the Raspberry Pi will send an ACK with a new window size:

Wireshark capture window size after zero

The window size is now only 25600 bytes but will grow again. The rest of the transmission went without any hiccups and the file transfer completed.
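The relation between the receive buffer and the advertised window can be modelled with a tiny class. This is only a toy model under my own assumptions: the class `ReceiveBuffer` and its methods are hypothetical, and the capacity of 26752 bytes is borrowed from the capture purely as an example, since the real buffer size of the Raspberry Pi isn’t visible in the capture.

```python
class ReceiveBuffer:
    """Toy receive buffer: the advertised window is the free space left."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.unread = 0  # bytes received but not yet read by the application

    def advertised_window(self):
        return self.capacity - self.unread

    def receive(self, nbytes):
        """Store incoming data; anything beyond the window is not accepted."""
        accepted = min(nbytes, self.advertised_window())
        self.unread += accepted
        return accepted

    def application_read(self, nbytes):
        """The application processes data, which frees up buffer space."""
        consumed = min(nbytes, self.unread)
        self.unread -= consumed
        return consumed


buf = ReceiveBuffer(26752)
buf.receive(26752)
print(buf.advertised_window())   # prints 0
buf.application_read(25600)
print(buf.advertised_window())   # prints 25600
```

As long as the application doesn’t read anything, the window stays at 0. The moment it processes data, the receiver can advertise a window again.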

Conclusion

You have now seen how TCP uses the window size to tell the sender how much data to transmit before it will receive an acknowledgment. I also showed you an example of how the window size is used when the receiver is unable to process its receive buffer in time.

UDP, unlike TCP, is a connectionless protocol and will just keep sending traffic. There is no window size. For this reason you might want to limit your UDP traffic, or you might see starvation of your TCP traffic when there is congestion.

I hope you have enjoyed this lesson. If you have any more questions, feel free to leave a comment in our forum.


Forum Replies

  1. Hello Rene,

    Thank you very much for the lesson.

    I’m still a little bit confused, though. Please correct me if I’m wrong:
    if the command below is configured, the AF drop probability will be considered:
    random-detect dscp-based

    Now, if we have AF21 and AF33, the classes are different but the probability of dropping a packet from AF33 is higher than for AF21, correct?
    What about packets marked AF21 and AF31? What if we have AF21, EF, CS3 and CS4?

    Also, what is the meaning of the fair-queue command? What is the impact of using it in a policy map?

    Thank you,
    Samer Abbas

  2. Hello Samer

    Class 4 has the highest priority, so if you have AF33, it will have a lower drop probability than AF21 for example. But within the same class, the higher the number the higher the drop probability, so AF13 will more likely be dropped compared to AF11. So yes, you are correct.


  3. Hello Ranganna

    When WRED calculates the average queue size, it does so by calculating the actual size of the real queue. Specifically, the average is calculated periodically every few milliseconds. It uses the following formula:

    //cdn-forum.networklessons.com/uploads/default/original/2X/4/49dee3e66a13cca56dab8dce4c14e612f03c090d.png

    • o is the old average calculated the previous time
    • n is the weight factor you configure
    • c is the current queue size

    The maximum size of the physical queue will depend on what kind of interface we’re talking about and what plat


  4. Hello,

    Perhaps a note would be useful informing that the instantaneous queue depth is used for the tail drop (Exponential Weighting Constant chapter).

    Also in the formula for the average queue depth:
    the (instantaneous_old_average) should be changed into (instantaneous - old_average)

    Many thanks,
    Stefanita

  5. Thanks Stefanita. I fixed this and added something about the instantaneous queue depth.

    Rene
