A Configurable Networks-on-Chip Router Using Altera FPGA and NIOS2 Embedded Processor

Wen-Chung Tsai #1, You-Jyun Shih #2, Bo-Sheng Lyu #3

# Department of Information and Communication Engineering, Chaoyang University of Technology, Taichung 413, Taiwan, ROC

#1 azongtsai@cyut.edu.tw
#2 asd8730873@yahoo.com.tw
#3 jay604132002@gmail.com

Abstract—In this paper, we introduce a communication IP for System-on-Chip (SoC) designs, named the Configurable Network-on-Chip Router (CNoC-Router), which supports various topology configurations for an on-chip network. With flexible topology configuration and an adaptive routing scheme, CNoC-Router enables a high-performance network with relatively simple control. The design was written as synthesizable Register-Transfer-Level (RTL) code and verified with ModelSim. Furthermore, a prototype based on an Altera FPGA with a NIOS2 embedded processor has been implemented to demonstrate its practicality.

Keywords—Embedded Processor; Field Programmable Gate Array; Networks on Chip; Router; System on Chip

1. INTRODUCTION

Embedded devices usually run just one or a few applications. General-purpose processors are therefore poorly suited to these applications because they are either too slow or too power hungry. Low-power embedded processors combined with hardware accelerators are the preferred choice for most designers. Recent designs for the embedded market use multi-core microprocessors to reduce power consumption. Future embedded systems may use a large number of heterogeneous cores to deliver the best trade-off between performance and power consumption. For these reasons, this paper presents a Configurable Networks-on-Chip Router (CNoC-Router) implemented on an Altera FPGA [1] (Field Programmable Gate Array) and cooperating with a NIOS2 [2] embedded processor to demonstrate its functional correctness and its extensibility to the many-core paradigm.

2. BACKGROUND

From on-chip interconnection networks [3] to the Networks-on-Chip (NoC) paradigm [4], packet switching is an aggressive, long-term approach for nano-scale interconnection networks [5]. To exchange messages among cores, the CNoC-Router was designed to transmit and receive packets through the physical interconnection channels between routers in the network. In contrast to computer networks, CNoC-Router covers the functions of not only the data-link layer but also the network layer, offloading processing overheads (e.g., packet routing procedures and switching controls) from the embedded processors. The following sections introduce the proposed techniques and demonstrate the enhanced network performance, followed by conclusions.

3. ARCHITECTURE

CNoC-Router is a five-port router, as shown in Fig. 1. Port 0 can be attached to a host entity (i.e., a processor), and Port 1 to Port 4 connect to neighbouring routers.

Fig. 1 A five-port router

3.1. Applicable Topologies

With CNoC-Router, various kinds of topologies can be configured; implementation examples are shown in Fig. 2.

![A 3x3 Mesh](image1)
![A Tree with 7 Nodes](image2)

Fig. 2 Topology examples
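
As a minimal sketch of what such a configuration involves, the Python fragment below builds the link list of a mesh like the 3x3 example and assigns each neighbour to one of the four neighbour ports. The row-major router numbering, the port assignment order, and the function names are assumptions made for illustration; they are not taken from the CNoC-Router design.

```python
NEIGHBOUR_PORTS = 4  # Port 1..Port 4; Port 0 is reserved for the host


def mesh_links(rows, cols):
    """Return {router_id: [neighbour ids]} for a rows x cols mesh.

    ASSUMPTION: routers are numbered row by row, starting at 0.
    """
    links = {}
    for r in range(rows):
        for c in range(cols):
            neighbours = []
            if r > 0:
                neighbours.append((r - 1) * cols + c)   # router above
            if c > 0:
                neighbours.append(r * cols + c - 1)     # router to the left
            if c < cols - 1:
                neighbours.append(r * cols + c + 1)     # router to the right
            if r < rows - 1:
                neighbours.append((r + 1) * cols + c)   # router below
            links[r * cols + c] = neighbours
    return links


def assign_ports(links):
    """Map every neighbour to Port 1..4; fail if a router needs more ports."""
    ports = {}
    for router, neighbours in links.items():
        if len(neighbours) > NEIGHBOUR_PORTS:
            raise ValueError(f"router {router} needs more than four neighbour ports")
        # ASSUMPTION: ports are handed out in list order (hypothetical wiring).
        ports[router] = {port: n for port, n in enumerate(neighbours, start=1)}
    return ports


if __name__ == "__main__":
    for router, wiring in assign_ports(mesh_links(3, 3)).items():
        print(router, wiring)
```

Any topology whose routers have at most four neighbours passes the same check, which is the constraint a five-port router places on the network.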

3.2. Router Interface

The Host/Neighbour-Router interfaces contain wrappers for each master and slave controller that translate Host/Neighbour-Router signals into FIFO control signals. Most kinds of processors can be attached to the CNoC-Router with little modification of the interface design. In this paper, the Host interface is fitted to an Altera NIOS2 embedded processor.

![Host or router interface of the router](image3)

Fig. 3 Host or router interface of the router

3.3. Switching Core

The router switching core includes an Arbiter, a Routing Table, Control Registers, RX FIFOs, Interface Controllers, and Multiplexers, as shown in Fig. 4.

![Fig. 4 Switching core](image4)

Fig. 4 Switching core

4. SPECIFICATION

Packet encapsulation and routing are the two essential functions of the designed router. The proposed specifications are introduced as follows.

<table>
<caption>TABLE 1 PACKET FORMAT</caption>
<thead>
<tr>
<th>Bits 31-16</th>
<th>Bits 15-00</th>
</tr>
</thead>
<tbody>
<tr>
<td>Destination Address</td>
<td>Control Code</td>
</tr>
<tr>
<td>Source Address</td>
<td>Data Length</td>
</tr>
<tr>
<td>Destination Offset</td>
<td>Extended Control Code</td>
</tr>
<tr>
<td colspan="2">Data (0 - 65535 flits)</td>
</tr>
<tr>
<td colspan="2">CRC (1 flit)</td>
</tr>
</tbody>
</table>
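
The packet format in Table 1 can be illustrated with a short Python sketch that packs the three header flits, appends the payload, and adds a trailing CRC flit. Several details are assumptions rather than facts from the design: each flit is taken to be one 32-bit word, the CRC is computed with the standard CRC-32 from `zlib` over big-endian bytes (the polynomial actually used by the router is not specified here), and the function names are hypothetical.

```python
import struct
import zlib


def _pack16(high, low):
    """Combine two 16-bit fields into one 32-bit flit (high in bits 31-16)."""
    if not (0 <= high <= 0xFFFF and 0 <= low <= 0xFFFF):
        raise ValueError("header fields must fit in 16 bits")
    return (high << 16) | low


def build_packet(dst, src, payload, ctrl=0, dst_offset=0, ext_ctrl=0):
    """Return a packet as a list of 32-bit flits: 3 header flits, data, CRC."""
    if len(payload) > 0xFFFF:
        raise ValueError("Data Length is a 16-bit field (at most 65535 flits)")
    if any(not 0 <= word <= 0xFFFFFFFF for word in payload):
        raise ValueError("every data flit must fit in 32 bits")
    flits = [
        _pack16(dst, ctrl),             # Destination Address | Control Code
        _pack16(src, len(payload)),     # Source Address      | Data Length
        _pack16(dst_offset, ext_ctrl),  # Destination Offset  | Extended Control Code
    ]
    flits.extend(payload)
    # ASSUMPTION: CRC-32 over header and data, serialized as big-endian words.
    body = struct.pack(f">{len(flits)}I", *flits)
    flits.append(zlib.crc32(body))
    return flits


def parse_header(flits):
    """Split the first three flits back into the Table 1 header fields."""
    return {
        "dst": flits[0] >> 16, "ctrl": flits[0] & 0xFFFF,
        "src": flits[1] >> 16, "length": flits[1] & 0xFFFF,
        "dst_offset": flits[2] >> 16, "ext_ctrl": flits[2] & 0xFFFF,
    }


if __name__ == "__main__":
    packet = build_packet(dst=3, src=0, payload=[0xDEADBEEF, 0x12345678], ctrl=1)
    print([f"{flit:08X}" for flit in packet])
    print(parse_header(packet))
```

Because Data Length occupies 16 bits, the 65535-flit payload limit in Table 1 follows directly from the header layout.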

<table>
<caption>TABLE 2 ROUTER REGISTER</caption>
<thead>
<tr>
<th>Name</th>
<th>Address</th>
</tr>
</thead>
<tbody>
<tr>
<td>Router Add.</td>
<td>0x0000</td>
</tr>
<tr>
<td>Channel Num.</td>
<td>0x0000</td>
</tr>
<tr>
<td>TABDA3–0</td>
<td>0x0001</td>
</tr>
<tr>
<td>TABDA7–4</td>
<td>0x0002</td>
</tr>
<tr>
<td>TABDAb–8</td>
<td>0x0003</td>
</tr>
<tr>
<td>THPRI4–1</td>
<td>0x0004</td>
</tr>
<tr>
<td>Reserved</td>
<td>0x0005 ~ 0xFFFF</td>
</tr>
<tr>
<td>Testing</td>
<td>0xFFFF</td>
</tr>
</tbody>
</table>

4.2. Router Register

Table 2 lists the implemented router registers (32 bits per word). The registers can be programmed directly by the local host (i.e., the attached processor) or via a packet sent from another router.
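
To make the register layout more concrete, the following Python sketch locates a routing-table entry (TABDAx) or a threshold (THPRIx) inside the 32-bit words of Table 2 and updates it with a read-modify-write. The word addresses come from Table 2; the 8-bit field width, the placement of the lowest-numbered entry in the least significant byte, and all function names are assumptions for illustration only.

```python
TABDA_BASE = 0x0001  # TABDA3-0; TABDA7-4 and TABDAb-8 follow at 0x0002 and 0x0003
THPRI_ADDR = 0x0004  # THPRI4-1
FIELD_BITS = 8       # ASSUMPTION: four equal fields per 32-bit register word


def tabda_field(index):
    """Locate routing-table entry `index` (0x0..0xB): (word address, bit shift)."""
    if not 0 <= index <= 0xB:
        raise ValueError("TABDA index must be in 0x0..0xB")
    return TABDA_BASE + index // 4, (index % 4) * FIELD_BITS


def thpri_field(port):
    """Locate the congestion threshold of neighbour port 1..4."""
    if not 1 <= port <= 4:
        raise ValueError("THPRI exists only for Port 1..Port 4")
    return THPRI_ADDR, (port - 1) * FIELD_BITS


def write_field(regs, address, shift, value):
    """Read-modify-write one field in `regs`, a dict modelling the register file."""
    if not 0 <= value < (1 << FIELD_BITS):
        raise ValueError("value does not fit in the field")
    mask = ((1 << FIELD_BITS) - 1) << shift
    regs[address] = (regs.get(address, 0) & ~mask) | (value << shift)
    return regs[address]


if __name__ == "__main__":
    regs = {}
    write_field(regs, *tabda_field(5), 0x21)
    write_field(regs, *thpri_field(4), 0x06)
    print({hex(addr): hex(word) for addr, word in regs.items()})
```

The same address and value pairs could be carried in a configuration packet when a register is programmed remotely, although the encoding of such packets is not described here.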

4.3. Routing Scheme

Each router owns a routing table (cf. TABDAx in Table 2), which keeps the priority of output ports toward the other routers (the table is predefined according to the network topology). In general, the router chooses the first-priority port unless that port is congested. This policy gives a router the ability to choose another path and avoid a region of heavy traffic. In addition, the congestion threshold that triggers a re-route (cf. THPRIx in Table 2) can be adjusted. As Fig. 5 shows, when a packet at Router #0 is destined for Router #3, the router chooses the path via Router #1 instead of Router #2, because the FIFO of Router #1 has more free space than the FIFO of Router #2.
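
The selection policy described above can be sketched in a few lines of Python. This is an illustrative model, not the RTL: congestion is represented as the number of occupied FIFO entries behind each output port, a port counts as congested once its occupancy reaches its threshold, and the fallback used when every candidate port is congested (take the least occupied one) is an assumption, since the text does not specify that case.

```python
def select_output_port(priority_ports, occupancy, thresholds):
    """Choose an output port for one packet.

    priority_ports: ports toward the destination, highest priority first
                    (the role played by a TABDAx entry).
    occupancy:      {port: occupied entries in the FIFO behind that port}.
    thresholds:     {port: occupancy at which the port counts as congested}
                    (the role played by THPRIx).
    """
    for port in priority_ports:
        if occupancy[port] < thresholds[port]:
            return port  # first non-congested port in priority order
    # ASSUMPTION: if all candidates are congested, take the least occupied one.
    return min(priority_ports, key=lambda port: occupancy[port])


if __name__ == "__main__":
    # Scenario in the spirit of Fig. 5, with a hypothetical wiring:
    # Port 1 leads to Router #2 (first priority), Port 2 leads to Router #1.
    priority = [1, 2]
    thresholds = {1: 6, 2: 6}
    print(select_output_port(priority, {1: 7, 2: 2}, thresholds))  # Port 2, via Router #1
    print(select_output_port(priority, {1: 3, 2: 2}, thresholds))  # Port 1, first priority
```

Lowering a threshold makes the router re-route earlier, while raising it keeps traffic on the first-priority path for longer.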

5. PERFORMANCE ANALYSES

In this section, FIFO size, packet size, routing function, and FPGA performance of the implemented CNoC-Router are evaluated.

5.1. Analyses of FIFO and Packet Sizes

Referring to Fig. 6, we adopted a star topology with five routers as the analysis configuration and compared the performance results with a native switching method. The FIFO size can be programmed via a QUE_LENGTH parameter in the CNoC-Router RTL design, and the packet size can be adjusted in the Data_Length field of the proposed packet header (cf. Table 1).

Fig. 6 Configuration of size effect analyses

The first diagram in Fig. 7 shows that a larger FIFO tolerates traffic variations better than a small FIFO. According to the second diagram in Fig. 7, transmissions with larger packets (16 words per packet) achieve superior transit throughput because of lower arbitration transition overheads in the routers.

Fig. 7 Performance analyses of diverse FIFO sizes and packet sizes
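
One way to see why larger packets help is a simple back-of-the-envelope model, sketched below in Python. It assumes that every packet pays a fixed cost of three header flits and one CRC flit (from Table 1) plus a number of arbitration cycles per router, and that one flit moves per cycle. The arbitration cost of two cycles is a placeholder value, not a measured figure, and the model is not the simulation used to produce Fig. 7.

```python
HEADER_FLITS = 3  # three header words in Table 1
CRC_FLITS = 1     # one trailing CRC flit


def payload_efficiency(data_flits, arbitration_cycles=2):
    """Fraction of link cycles that carry payload, one flit per cycle.

    ASSUMPTION: `arbitration_cycles` is a per-packet placeholder cost.
    """
    overhead = HEADER_FLITS + CRC_FLITS + arbitration_cycles
    return data_flits / (data_flits + overhead)


if __name__ == "__main__":
    for size in (1, 4, 16):
        print(f"{size:2d} data flits -> efficiency {payload_efficiency(size):.2f}")
```

Under these assumptions the fixed per-packet cost is amortized over more data flits as the packet grows, which is consistent with the trend reported for 16-word packets.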

5.2. Analyses of Adaptive Routing

In Fig. 8, CPU #0 at Router #0 issues packets to CPU #3 at Router #3. Three paths can be selected. Listed from the shortest to the longest, they are: Path1 (#0 → #1 → #2 → #3), Path2 (#0 → #4 → #5 → #2 → #3), Path3 (#0 →
As the number above Router #3 shows, CPU #3 received 397 packets during the 100 µs simulation time, most of them arriving via the shortest path (Path1).

![Fig. 8 Configuration of adaptive routing analyses](image)

Next, we imposed local traffic jams on Path1 (Router #1 to Router #2) and Path2 (Router #5 to Router #2), as shown in Fig. 9. We found that packets can be rerouted to another passable path (i.e., Path3) to avoid the traffic hot spots. The total number of packets received by CPU #3 decreases only slightly over the same 100 µs simulation time.

![Fig. 9: Routing analyses with local traffic jams](image)

5.3. Analyses of FPGA Performance

The CNoC-Router was designed at the Register-Transfer Level (RTL), verified with ModelSim [6], and implemented on an Altera FPGA [1]. The FPGA performance indexes are listed in Table 3.

<table>
<thead>
<tr>
<th>Tool</th>
<th>Quartus II Version 7.2</th>
</tr>
</thead>
<tbody>
<tr>
<td>Device</td>
<td>Stratix II, EP2S90F1020C5</td>
</tr>
<tr>
<td>Utilization</td>
<td>14 %</td>
</tr>
<tr>
<td>Frequency</td>
<td>76.38 MHz</td>
</tr>
</tbody>
</table>

The maximum operating frequency is 76.38 MHz, which is higher than the FPGA board clock rate of 50 MHz. The delay of the design's critical path could be greatly reduced in an Application Specific Integrated Circuit (ASIC) implementation.
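
As a quick check of the timing margin, the reported maximum frequency can be converted into a minimum clock period and compared with the 50 MHz board clock. The symbols $T_{\min}$ and $T_{\text{clk}}$ are introduced here only for this calculation:

$$
T_{\min} = \frac{1}{f_{\max}} = \frac{1}{76.38\ \text{MHz}} \approx 13.09\ \text{ns}, \qquad
T_{\text{clk}} = \frac{1}{50\ \text{MHz}} = 20\ \text{ns}, \qquad
T_{\text{clk}} - T_{\min} \approx 6.91\ \text{ns}.
$$

The design therefore meets the board clock with roughly 6.9 ns of margin per cycle.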

6. CONCLUSIONS

In this paper, a Configurable Network-on-Chip Router (CNoC-Router) was proposed to support an on-chip, packet-switching-based network infrastructure for the coming many-core system-on-chip designs. CNoC-Router provides packet routing and switching, functions that correspond to the network and data-link layers. Evaluations on a range of traffic patterns showed that CNoC-Router can enhance network performance when the router's configurable control registers are set appropriately. In addition, the designed packet format reserves several fields for further extensions. We believe that the flexible architecture of CNoC-Router will support advanced applications and satisfy diverse communication requirements in next-generation deep-submicron chip designs.

ACKNOWLEDGMENT

This work was supported by the National Science Council, ROC, under grant NSC-102-2218-E-324-001.

REFERENCES


