Research on Multi-stage Switch Fabric and Scheduling Algorithms for High-performance and High-capacity Switches
|School||Southwest Jiaotong University|
|Course||Applied Computer Technology|
|Keywords||packet switch load-balanced switch three-stage Clos network 100%throughput module-level matching|
With ever-increasing application of Internet, audio and video have gradually dominated the application traffic in Internet, subsequently Peer-to-Peer (P2P) computing has emerged as a new application mode after the Browser/Server (B/S) mode. Furthermore, being pushed by the increase in network application types and numbers, and by exponential rise in numbers of Internet users, the volume of network application traffic has drastically multiplied in several orders of magnitude. Such changes in Internet application environment together with dramatic deterioration in network security, have forced subnetworks, especially backbones, and their core equipment to confront with serious challenges. Therefore, people are calling for securer, more reliable, easy to manage, and higher available backbone-subnetworks and for provisioning different users with required quality-of-service (QoS) in data delivery. This can only be done when their core switches have higher capacity (i.e. the product of port number and data rate per port) and better switching performance (in switching rate, transit delay, jitter, etc.).Optic communication techniques, represented by DWDM (Dense Wavelength Division Multiplication), have raised the data rate up to160Gbps per wavelength and provide a sound basis for constructing high-speed networks. Nevertheless, modern switch fabrics remain electronics-oriented for lacking of optical memory and processor in optical domain. Considering that the gate delay and memory access time are in the order of nano-seconds, it is very challenging even in electronic domain to build a switch to cope with over100Gbps lambda rate, let alone the difficulties in engineering fabrication. Indeed, it had to use ten lambda of10Gbps to support the port rate of100Gbps in IEEE802.3ba (2010), instead of using100Gbps lambda rate directly. The contrast between high lambda rate in optics and comparatively long gate delay and memory access time in electronic devices has made it difficult to design switch fabric with high-capacity and high-performance even in E-O (electronics-to-optic) combined switches. This is exactly the general background for this Ph.D. research.In the past, the switch fabrics of network relay devices (i.e., routers and switches) have evolved from bus structures, through shared-memory structures to crossbar-based structures. The single-stage crossbar structure has reached its limit with state-in-the-art electronic devices and fabrication arts, and is unable to meet the requirements for future high-capacity and high-performance switches, even though, crossbars are still playing an important role in modern switches. As a result, multi-stage crossbar switch fabrics emerged, such as Load-Balanced (LB) structures and Clos network structures, to improve the capacity of modern switches, which are the focuses of this thesis.An LB switch utilizes basic switching elements (crossbar or AWGR) to form a special two-stage switch to support high data rate with simple but strict time-division multiplexing (TDM) scheduling scheme for each switching element while maintaining100%throughput. As a penalty, total switching delay is increased and cell out-of-order may occur. The author’s contributions to the LB switch in this thesis include:1) The introduction of an enhanced LB (E-LB) switch to avoid crucial problem inherited from strict TDM, i.e., unutilized time-slots (or "connection wastes") to improve the performance in delay;2) The introduction of a scheduling scheme by combining the Virtual Central Queue (VCQ) structure at the input side and Virtual Input Queue Set (VIQ set) at the output side to ensure no disordered cells, which for the first time reduced the computing complexity of scheduling for the whole switch to O(1) while still maintaining100%throughput and excellent delay performance. In addition, the author also explored the operability of these techniques with crossbar (electronic) and Arrayed Waveguide Grating Router (AWGR)(optical) respectively, as the basic switching element and analyzed their advantages and shortcomings.One critical criticism to LB switches is that it uses too many basic switching elements to increase the switching capacity, compared with other types of multi-stage structures, for example Clos network structure. Thus, Clos network is more popular in industry. The author’s contribution to Clos-network switches in this thesis can be represented in three directions:The author introduced a scheduling algorithm called StablePlus for bufferless Clos-network switches, which has reduced the computing complexity of Karol’s algorithm for the path-allocation to O(1). The complexity of the whole scheduling process of StablePlus and Karol’s Algorithm is O(1). As a result, it can be easily implemented with a hardware design of the author and enables100%throughput in bufferless Clos-network switches. With further optimization to execution process and execution sequence of StablePlus, the scheduling time for the whole switch has been shortened greatly: consequently, it can be deployed in high-speed switches. However, there is a need for new technologies to further expand to ultra-high capacity inherited from bufferless Clos networks, although StablePlus can satisfy the requirements of high-capacity and high-performance to some extent in the existing network environment. The rest of the thesis is focused on buffered Clos networks.Author’s first contribution to buffered Clos networks is the novel concept of "module-level matching". The concept is first applied to MSM (Memory-Space-Memory) Clos networks. This concept turns the traditional "port-level matching" into a macro "module-level matching" so as to greatly reduce the numbers of requests and responses in scheduling and subtly avoid the need for path-allocation machenism in Clos networks. As a result, a scheduler can easily be implemented on a single board or even in a single chip, and this feature is very critical for high-capacity switches, which cannot be achieved with bufferless Clos networks. Another advantage of module-level matching is the ability to avoid cell out-of-order problem, as the "frame" is an integral unit passing through the Clos network. Furthermore, this concept also enables a scheduler to simply apply any matured scheduling algorithm for single-stage input-queued (IQ) switches to Clos-network switches, and100%throughput can be achieved should the scheduling algorithm be stable. To remedy the long latency because of the full frame waiting time under low loads, a static and a dynamic cell dispatching algorithms have been proposed. The simulation results have shown satisfactory improvement in delay performance.It is surprisingly rewarding to apply the "module-level matching" concept to Clos networks with three-stage buffers (i.e., Memory-Memroy-Memory Clos networks). As a result, the MMM Clos-network switch can actually be modeled as a single-stage CICQ (Combined Input and Crosspoint Queued) switch, and all the advantages of CICO switches and their scheduling algorithms can be taken, such as high-performance with simple scheduling and less buffer requirement. Indeed, applying SQUISH, a typical algorithm, in CICQ to MMM Clos has resulted in100%throughput. Better still, the CICQ modeled MMM Clos-network switch has shown excellent performance, especially under very heavy burst loads, and the system throughput remains above95%with limited buffer size, which is not achievable in MSM Clos networks. While under low loads, its delay performance can be close to traditional single-stage switches if static or dynamic dispatch approaches is applied.Finally, this thesis deals with congestion control issues encountered by all multi-stage buffered switches. A combined admission control in input modules and threshold controlled overflow mechanism is proposed and can effectively solve the congestion control and Head-of-Line Blocking problems both for MSM and MMM Clos-network switches with limited buffer size.