Designing Algorithms on RP3
Authors: Luigi Brochard, Alex Freau
Journal: Concurrency: Practice and Experience (Wiley, available online 1992)
Volume/Issue: Volume 4, Issue 1
Pages: 79-106
ISSN: 1040-3108
Year: 1992
DOI: 10.1002/cpe.4330040106
Publisher: John Wiley & Sons, Ltd
Data source: Wiley
Abstract:
We study here the behavior of two numerical algorithms (matrix multiplication and a finite difference method) on RP3, a multiprocessor with a three-level memory hierarchy. Using versions of these algorithms that differ in data placement (global, local, global and cacheable, local and cacheable) and in data access (blocked or non-blocked), we study the impact of these parameters on program performance. The performance analysis uses a very accurate monitoring system (VPMC), which records instructions, memory requests, cache requests and cache misses. We also perform a theoretical performance analysis of these programs using a model of computation and communication. Good agreement is found between theoretical and experimental results. In conclusion, we discuss the use of local memory on such a machine and show that it is ineffective given the RP3 speed ratios between cache, local memory and global memory. We also discuss optimal use of the cache and show that the optimum can only be reached under certain cache properties (a private store-in cache with user control of write-back) and that blocked algorithms must be used to reach it. Comparing the programming of shared and distributed memory multiprocessors, we remark that optimized algorithms for shared memory systems use the same blocking techniques used for programming distributed memory systems, leading to a common programmi
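As an illustration of the blocked versus non-blocked data access the abstract refers to, here is a minimal Python sketch of both forms of matrix multiplication. It is not the authors' code and does not model RP3, its memory placement options, or VPMC; the function names and the `block` parameter are assumptions chosen for this example only.

```python
def matmul_naive(a, b):
    """Non-blocked product: each row of a is swept against each column of b."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            s = 0.0
            for k in range(m):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c


def matmul_blocked(a, b, block=2):
    """Blocked product: work proceeds tile by tile, so a small set of
    sub-blocks of a, b and c is reused before moving on.
    `block` is a hypothetical tile size, not a value from the paper."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, block):
        for j0 in range(0, p, block):
            for k0 in range(0, m, block):
                # Multiply one tile of a by one tile of b and
                # accumulate the result into the matching tile of c.
                for i in range(i0, min(i0 + block, n)):
                    for j in range(j0, min(j0 + block, p)):
                        s = c[i][j]
                        for k in range(k0, min(k0 + block, m)):
                            s += a[i][k] * b[k][j]
                        c[i][j] = s
    return c
```

Both functions compute the same product; only the order in which elements are visited differs. The blocked form is the one whose working set can be sized to fit a cache or a local memory, which is the property the abstract ties to optimal cache use.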