System-Level Performance Optimization Methodology for SoC Memory Subsystem
|Course|Microelectronics and Solid State Electronics|
|Keywords|System-on-Chip; Co-Simulation; Memory Subsystem; Performance Optimization; Scratch-Pad Memory; External Memory Interface|
Today’s embedded systems are far more sophisticated than those of a decade ago. The widening gap between the high clock frequency of embedded microprocessor cores and the slow access speed of off-chip memory has become a major problem in System-on-Chip (SoC) design. It severely degrades chip performance, especially in multimedia applications. The system-level optimization methodology proposed in this dissertation targets the memory subsystem, which comprises the Scratch-Pad Memory (SPM), the External Memory Interface (EMI), and the off-chip memory. It enables the designer to discover the optimal design parameters and internal organization for the target application during chip design.

A self-built, SystemC-based simulator provides the co-simulation environment for the target application and the cycle-accurate performance evaluation that the methodology relies on. Compared with RTL simulation, the simulator's maximum evaluation error is below 0.02% (two parts in ten thousand), and it runs 800 times faster.

The optimization of the memory subsystem covers three aspects. First, two buffer components, the Buffer Group and the Cache, are introduced into the EMI design based on the physical organization of the off-chip memory. Their detailed architectures are determined by design space exploration using the performance simulator. Second, to overcome the drawbacks of previous data-layout optimization methods for SPM, this dissertation proposes a new method based on a relation matrix. The method partitions the program into a series of nodes using the control flow graph, uses the simulator to estimate each node's performance factor, and constructs a relation matrix that describes the influences among nodes. A refined allocation algorithm then selects the appropriate nodes for the SPM to obtain the maximum performance improvement.
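The abstract does not spell out the refined allocation algorithm. As a rough illustration only, the following Python sketch assumes that each CFG node has a size in bytes and a standalone gain (cycles saved, such as a simulator might estimate), and that the relation matrix stores the extra gain or loss when two nodes are both placed in the SPM. It then fills the SPM greedily by marginal gain per byte. The function names `allocate_spm` and `total_gain`, the greedy rule, and the example numbers are all hypothetical; this is not the dissertation's algorithm.

```python
from typing import Sequence, Set


def allocate_spm(sizes: Sequence[int],
                 gains: Sequence[float],
                 relation: Sequence[Sequence[float]],
                 capacity: int) -> Set[int]:
    """Greedily choose CFG nodes for the SPM (hypothetical heuristic).

    sizes[i]       -- bytes node i occupies in the SPM (must be > 0)
    gains[i]       -- cycles saved if node i alone is placed in the SPM
    relation[i][j] -- extra cycles saved (negative: lost) when nodes i and j
                      are both in the SPM; assumed symmetric, zero diagonal
    """
    n = len(sizes)
    chosen: Set[int] = set()
    free = capacity
    while True:
        best, best_ratio = -1, 0.0
        for i in range(n):
            if i in chosen or sizes[i] > free:
                continue
            # Marginal gain = standalone gain plus interactions with nodes already chosen.
            marginal = gains[i] + sum(relation[i][j] for j in chosen)
            ratio = marginal / sizes[i]
            if ratio > best_ratio:  # only strictly positive gains are accepted
                best, best_ratio = i, ratio
        if best < 0:  # nothing fits, or nothing still improves performance
            return chosen
        chosen.add(best)
        free -= sizes[best]


def total_gain(chosen: Set[int], gains: Sequence[float],
               relation: Sequence[Sequence[float]]) -> float:
    """Estimated cycles saved by a placement, counting each node pair once."""
    nodes = sorted(chosen)
    pairs = sum(relation[a][b] for k, a in enumerate(nodes) for b in nodes[k + 1:])
    return sum(gains[i] for i in nodes) + pairs


if __name__ == "__main__":
    # Hypothetical example: four nodes and a 4096-byte SPM.
    sizes = [1024, 2048, 1024, 512]
    gains = [3000, 5000, 1000, 600]
    relation = [[0, 0, 800, 0],
                [0, 0, 0, -400],
                [800, 0, 0, 0],
                [0, -400, 0, 0]]
    picked = allocate_spm(sizes, gains, relation, 4096)
    print(sorted(picked), total_gain(picked, gains, relation))  # [0, 1, 2] 9800
```

In the example, node 2 is chosen ahead of node 3 because its pairing with node 0 adds 800 cycles, while most of node 3's benefit is cancelled by its negative relation with node 1. Because of these pairwise terms, the exact problem is a quadratic knapsack. A greedy rule like this one is cheap but not optimal in general.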
The method also establishes the relationship between SPM size and optimization capability, which is used to find the best SPM size for the target application during chip design. Finally, analyzing the interactions between the EMI buffer and SPM optimization enables the designer to obtain the optimal parameters for their combination.

Using this system-level optimization methodology, the dissertation arrives at an optimal memory subsystem design for the target application. At a system clock of 50 MHz, the optimal design consists of an EMI buffer and an SPM. The former is a 128-row, two-way associative unified cache with a 4-level write buffer, and the latter is 4096 bytes, giving a total size of 8192 bytes. Experimental results show that this design significantly reduces the penalty of memory access latency and increases the execution speed of the target application by up to a factor of four. The dissertation closes with a summary and points out open problems in memory subsystem design.
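The abstract states that the EMI cache geometry was chosen by design space exploration with the cycle-accurate SystemC simulator. The following Python sketch is a far simpler stand-in that only illustrates why associativity matters. It is a trace-driven hit-rate model with LRU replacement, write-allocate behavior, no write buffer, and no timing. The 16-byte line size, the class name `SetAssociativeCache`, and the two-array access trace are hypothetical assumptions, not values from the dissertation. Under that line-size assumption, each configuration compared below holds 4096 bytes of data.

```python
from collections import OrderedDict
from typing import Iterable, List


class SetAssociativeCache:
    """Trace-driven set-associative cache model with LRU replacement.

    rows      -- number of sets ("rows")
    ways      -- associativity
    line_size -- bytes per line (hypothetical; not given in the abstract)
    Reads and writes are treated alike (write-allocate); no timing is modeled.
    """

    def __init__(self, rows: int, ways: int, line_size: int = 16) -> None:
        self.rows, self.ways, self.line_size = rows, ways, line_size
        self.sets: List[OrderedDict] = [OrderedDict() for _ in range(rows)]
        self.hits = self.misses = 0

    def access(self, addr: int) -> bool:
        line = addr // self.line_size
        index, tag = line % self.rows, line // self.rows
        s = self.sets[index]
        if tag in s:
            s.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(s) >= self.ways:
            s.popitem(last=False)  # evict the least recently used line
        s[tag] = True
        return False

    def run(self, trace: Iterable[int]) -> float:
        """Replay a trace and return the hit rate."""
        for addr in trace:
            self.access(addr)
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def conflict_trace() -> List[int]:
    """Hypothetical trace: two 1 KiB arrays 4 KiB apart, read alternately."""
    trace: List[int] = []
    for i in range(0, 1024, 4):
        trace += [0x0000 + i, 0x1000 + i]
    return trace


if __name__ == "__main__":
    for rows, ways in [(256, 1), (128, 2), (64, 4)]:
        rate = SetAssociativeCache(rows, ways).run(conflict_trace())
        print(f"{rows} rows x {ways} way(s): hit rate {rate:.2f}")
```

With this trace, the direct-mapped configuration thrashes because both arrays map to the same sets, so every access misses. The two-way and four-way configurations keep both arrays resident and miss only once per line. A real exploration would replay traces from the target application and weigh hit rate against cycle-accurate timing and area.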