Research of Programming Analysis and Parallelism Based on Graphics Processing Unit
|School||PLA Information Engineering University|
|Course||Computer Software and Theory|
|Keywords||Graphics Processing Unit; General Computation on GPU; Programming Analysis and Parallelism; CUDA; Cost Model|
High-performance computers are not only an integrated expression of a country's economic and technological strength, but also an important tool for economic growth, technological development, social progress and national security. They have become a strategic high ground. While people pursue cost-effective parallel supercomputer systems, some dedicated computing components deliver powerful parallel computing capability in many specialized areas. The Graphics Processing Unit (GPU), used for image processing and general-purpose computation, is one of them. With the development of microelectronics technology, the GPU far exceeds the general-purpose processor in integration density and data processing capability, and it has become a component of high-performance computer systems.

At present, research on GPU parallelism mainly starts from an existing serial program: a professional who is familiar with the GPU architecture transforms the serial program into a parallel one. However, because of the various costs introduced by the parallel implementation, the resulting parallel program can be less efficient than the serial program, which wastes manpower and financial resources. It is therefore particularly important to analyse the serial program properly and to predict the efficiency of the parallel program on the GPU. This thesis studies how to use the GPU more reasonably and effectively for general-purpose computation. The main research contents and innovations are as follows:

1. The thesis analyses the current status of high-performance computing, points out from several perspectives the difficulties and challenges that traditional high-performance computers face, and studies the hardware architecture and the programming model of the GPU, which form the theoretical foundation of the cost model that follows.

2. The thesis studies data dependence analysis techniques and adopts an improved method to determine more accurately the number of iterations used in calculating the workload of a loop body, which SUIF cannot do when the upper and lower bounds of the loop are not known.

3. To predict the execution efficiency of a parallel program on the GPU, the thesis presents a cost model for GPUs based on the CUDA architecture. The model takes into account several factors, including the cost of data transfer, the cost of device startup and the cost of GPU execution. The model estimates the total time cost of the parallel program on the GPU, which can be used to decide whether GPU acceleration is worthwhile.
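
The decision that the cost model in item 3 supports can be illustrated with a minimal sketch. The abstract names only the three cost factors, so the rest is an assumption made for illustration: the simple additive form, the bandwidth-based transfer estimate, and all function and parameter names are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class GpuCostEstimate:
    """Estimated time components, in seconds, for running a program on the GPU."""
    transfer_s: float   # host <-> device data transfer
    startup_s: float    # device startup overhead
    execution_s: float  # kernel execution on the GPU

    @property
    def total_s(self) -> float:
        # Assumption: the three factors are simply summed.
        return self.transfer_s + self.startup_s + self.execution_s


def estimate_gpu_cost(bytes_moved: float,
                      bandwidth_bytes_per_s: float,
                      startup_s: float,
                      execution_s: float) -> GpuCostEstimate:
    """Build an estimate; transfer time is approximated as size / bandwidth (an assumption)."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    transfer_s = bytes_moved / bandwidth_bytes_per_s
    return GpuCostEstimate(transfer_s, startup_s, execution_s)


def worth_offloading(serial_s: float, estimate: GpuCostEstimate) -> bool:
    """GPU acceleration is considered worthwhile only if the estimated total beats the serial time."""
    return estimate.total_s < serial_s
```

For example, calling `estimate_gpu_cost` with 1 GB of data, a 4 GB/s link, 0.5 s of startup and 0.25 s of execution gives an estimated total of 1.0 s, so `worth_offloading` returns `True` for a serial program that takes 2.0 s and `False` for one that takes 0.5 s. The point of such a model is that a fast kernel alone does not justify porting: transfer and startup overheads can outweigh the gain.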