Analysis of Buffer Size in Core Routers

Arthur Dick
December 8, 2005

Abstract

Core routers in use today contain buffers which may be much larger than necessary. This paper will analyse what effect changing the buffer sizes in routers would have on the performance of network protocols. Specifically, we will be focusing on TCP performance, as it is the transfer protocol in use for popular applications such as the web, FTP, email, and peer to peer networks.

Introduction and Motivation

The widely used rule-of-thumb used to determine Internet router buffer size is that we need at least one round trip time worth of buffering[2]. This rule-of-thumb assures that the buffer in a bottleneck link will not go empty when there is a single TCP flow, it does not consider multiple flows.

The fast memory required in core routers is expensive, so reducing the required memory would reduce the cost of these routers. Additionally, if the memory requirements for the buffer in core routers is reduced, an all-optical router becomes feasible[4], as with current technology, a small number of packets can be buffered optically, eliminating the need for an external buffer.

There have been a number of papers [2,3,4,5] written recently discussing the possibility of significantly reducing the amount of buffer space in core routers. However, these studies have only considered TCP Reno flows, which is an algorithm optimized for dealing with one packet loss. In this paper, we will be considering TCP NewReno, which handles multiple packet losses more efficiently.

Background Information

TCP Congestion Control

The TCP congestion control algorithm is designed to allow multiple connections to share a congested links bandwidth. If a TCP sender perceives there is more bandwidth available, it will increase the sending rate. On the other hand, if the sender perceives there is congestion somewhere along it's path, it will reduce the sending rate.

Both TCP Reno and TCP NewReno senders implement the same slow start phase, they will start their congestion window at one and increase the window size by one for each ACK received until the slow start threshold is reached.

The sender will then enter congestion avoidance, which is again identical for both TCP Reno and TCP NewReno. The window size is increased by one every round trip time until a packet loss is detected. A lost packet is detected by the sender upon receipt of triple duplicate ACKs from the receiver.

At this point, TCP will enter the fast retransmit/fast recovery phase, by setting the slow start threshold to half the congestion window, and re-transmitting the lost packet. The congestion window is increased by one for each duplicate ACK received. Congestion avoidance is resumed upon acknowledgement of the lost packet (assuming no other packets were lost.)

TCP NewReno makes a slight modification to TCP Renos fast retransmit/fast recovery phase. When a partial ACK is received by NewReno, the next packet is assumed to be lost as well and is then re-transmitted, remaining in the fast recovery phase. Fast recovery is finished once the last packet sent before fast retransmit was initiated is acknowledged. This behaviour eliminates the multiple window size reductions in Reno.

The Rule-of-Thumb

The rule-of-thumb used to determine the size of buffers in a routers is calculated according to the formula B = RTT * C, where RTT is the average round trip time, and C is the data rate of the link[5]. A buffer of this size is thought to allow the router to run at a high utilization. However, it has been argued that utilization may not necessarily the right metric[2], as most core routers today run well below 100% utilization.

This calculation only takes into account a single TCP Reno flow, whereas backbone routers today carry many more flows. Therefore this single flow is not a realistic model.

Previous Work on Buffer Sizing

In the paper "Sizing Router Buffers"[5], Appenzeller et al determine that when many long-lived TCP flows share the same backbone link they tend to become desynchronized. Therefore there is no significant waiting period for the buffer to drain after a packet is lost, as there would be if there were a single flow or if all the flows were synchronized. Using a probabilistic model, simulations, and measurements with a physical router, they show that buffer sizes can be reduced to C*RTT/sqrt(n) while maintaining a link utilization of 98% or higher. Raina et al argue in the paper "Control Theory for Buffer Sizing"[3] that buffers can be very small if the flows are desynchronized.

In the paper "Buffer Sizes for Core Routers"[2], Wischik and McKeown argue that utilization may not even be the right metric to use, as backbone routers today tend to run significantly below 100%. They show that buffers can nearly be eliminated by sacrificing 5% utilization, and that throughput may actually increase due to the reduced latency and jitter observed with small buffers.

Buffer Size for a Single Long-Lived TCP NewReno Flow


Figure 1: Single flow network topology.


Figure 2: Cycle of TCP NewReno.

We are assuming the topology in figure 1, with a single sender through a bottleneck link; we also assume the sender is in congestion avoidance mode. The sender will increase the window size each round trip until k packets are lost off the tail end of the buffer. We want to ensure there is enough buffer space such that the buffer will not go empty during the fast retransmit and fast recovery stages.

Upon entering fast recovery, the sender will reduce the transmission rate to Wmax/4*RTT, we will assume this rate is constant. The buffer will continue to drain at the constant rate C, so the net draining rate from the buffer is C - Wmax/4*RTT. Therefore, the buffer will drain in a time period of B/(C - Wmax/4*RTT).

Since k packets were lost, the sender will spend a time period of k*RTT in fast recovery. To ensure the congested link remains fully utilized, we want the buffer to just go empty at this point, so B/(C - Wmax/4*RTT) >= k*RTT. Now we can find the necessary relationship between B, C, and RTT to be: B >= k*C*RTT - k*Wmax/4.

At this point, fast recovery is complete, so the sender will resume sending at Wmax/2*RTT. To ensure the congested link remains fully utilized, this sending rate should be equal to C. Therefore Wmax = 2*C*RTT.

Substituting the value for Wmax into the relationship between B, C, and RTT, we find the minimum buffer size which allows the congested link to remain fully utilized is B = k*C*RTT/2.

The rule-of-thumb calculated for TCP Reno gives a buffer size of B = C*RTT for one packet drop, and those calculations can be extended to k packet drops by B = k*C*RTT. Therefore the buffer requirements for NewReno are half those for Reno.

Many Flows Through the Same Bottleneck Link

TCP's congestion control algorithm is designed such that it will fill any buffer[2]. A router will drop packets once the buffer is full causing the sender to reduce their sending rate.

If the buffer size in a router is large, each TCP flow through the router tends to lose packets at the same time, and therefore they all reduce their sending rate at the same time. In this case, the TCP flows are said to be synchronized, and the sum of these flows appear to act as a single amplified flow.

On the other hand, if the buffer size in a router is small, TCP flows tend to lose packets according to a random process. The sum of these flows will not act as a single flow, so there is no need for large buffers to keep the link busy, as there is no point when all the senders enter fast recovery.

Therefore, a router with a small buffer promotes desynchronized TCP flows, and if the TCP flows are desynchronized, a small buffer size is sufficient[3]. This creates a virtuous cycle, as the small buffer tends to induce the conditions favorable for a small buffer.

Simulation

Experiment Methodology

Using the simulation tool ns-2[1], we constructed a network topology which consisted of a single congested link, with a bandwidth of 45Mb/s. We varied the number of long-lived TCP flows through the congested link from 30 to 120, setting the buffer size according to different multiples of the rule B = RTT*C/sqrt(n), the rule suggested in "Sizing Router Buffers"[5]. The average round trip time for the flows was held constant at approximately 0.1 seconds.

We then measured the resulting utilization of the congested link for both TCP Reno and TCP NewReno flows.

Experiment Results

TCP Flows RTT*C/sqrt(n) Buffer Packets Utilization Reno Utilization NewReno
30 0.5x 51 82.1% 90.0%
30 1x 103 89.6% 92.3%
30 2x 205 94.7% 96.6%
30 3x 308 97.7% 98.8%
60 0.5x 36 80.5% 86.2%
60 1x 73 91.9% 94.8%
60 2x 145 92.1% 95.7%
60 3x 218 96.8% 98.5%
90 0.5x 30 82.0% 86.8%
90 1x 59 95.0% 96.9%
90 2x 119 92.0% 96.2%
90 3x 178 90.5% 98.0%
120 0.5x 26 81.4% 88.3%
120 1x 51 95.5% 97.3%
120 2x 103 91.5% 98.2%
120 3x 154 87.6% 97.9%

Figure 3: Comparison of TCP Reno and TCP NewReno buffer requirements.


Figure 4: 30 TCP NewReno flows with a buffer size of 562 packets.


Figure 5: 30 TCP NewReno flows with a buffer size of 51 packets.

The rule-of-thumb in this case tells us we require 562 packets of buffer space. Figure 4 shows when the congested link is over-buffered, the flows tend to become synchronized, and together act as a single flow. Figure 5 shows when B=RTT*C/(2*sqrt(n)), the flows are desynchronized, and we can see in Figure 3 the utilization is high, at 90%.

It seems the buffer requirements for NewReno weren't exactly 1/2 those for Reno as we had predicted. This is possibly because we often only experienced one packet drop per flow. In this case, NewReno and Reno perform identically.

Our results have only considered a single congested link, whereas it is possible a TCP flow would encounter a number of congested links during transmission. We don't believe this would effect our results in a significant way, however this may warrant further investigation.

Conclusion

Buffers in core routers could potentially be significantly reduced by sacrificing a small percentage of utilization. The rule B=RTT*C/sqrt(n) maintains a high utilization of the congested link if we assume Reno flows. The buffer requirements could be further reduced with TCP NewReno.

References

[1]The Network Simulator - ns-2. http://www.isi.edu/nsnam/ns/, accessed October 1, 2005.
[2]Damon Wischik, Nick McKeown. Buffer Sizes for Core Routers. ACM SIGCOMM Computer Communication Review, 35(3):75-78, July 2005.
[3]Gaurav Raina, Don Towsley, Damon Wischik. Control Theory for Buffer Sizing. ACM SIGCOMM Computer Communication Review, 35(3):79-82, July 2005.
[4]Mihaela Enachescu, Yashar Ganjali, Ashish Goel, Nick McKeown, Tim Roughgarden. Routers with Very Small Buffers. ACM SIGCOMM Computer Communication Review, 35(3):83-90, July 2005.
[5]Guido Appenzeller, Isaac Keslassy, Nick McKeown. Sizing Router Buffers. ACM SIGCOMM 2004, 1-12, August 2004.