NSTL Retrospective Data Service Platform

Concurrency: Practice and Experience

ISSN: 1040-3108 Year: 1992
Current issue: Volume 4, Issue 1

Issues published in 1992:

	Volume 4 issue 1
	Volume 4 issue 2
	Volume 4 issue 3
	Volume 4 issue 4
	Volume 4 issue 5
	Volume 4 issue 6
	Volume 4 issue 7
	Volume 4 issue 8

1.	Performance of a particle‐in‐cell plasma simulation code on the BBN TC2000
	Concurrency: Practice and Experience, Volume 4, Issue 1, 1992, Pages 1-18. Judy E. Sturtevant, Phil M. Campbell, Arthur B. Maccabe
	Abstract: The BBN TC2000 is a multiple instruction, multiple data (MIMD) machine that combines a physically distributed memory with a logically shared memory programming environment using the unique Butterfly switch. Particle‐in‐cell (PIC) plasma simulations model the interaction of charged particles with electric and magnetic fields. This paper describes the implementation of both a 1‐D electrostatic and a 2 1/2‐D electromagnetic PIC (particle‐in‐cell) plasma simulation code on a BBN TC2000. Performance is compared to implementation of the same code on the shared memory Sequent Ba…
	ISSN: 1040-3108 DOI: 10.1002/cpe.4330040102 Publisher: John Wiley & Sons, Ltd Year: 1992 Data source: WILEY
2.	Numerical simulation of three‐dimensional thermal convection on the array processor DAP 510
	Concurrency: Practice and Experience, Volume 4, Issue 1, 1992, Pages 19-35. W. Erhard, M. Schäfer
	Abstract: In this paper we deal with the numerical simulation of time dependent three‐dimensional thermal convection on the array processor DAP 510. Applying finite differences in combination with a pressure correction method to the underlying non‐linear system of partial differential equations, we reduce the numerical solution of the problem to the solution of a sequence of sparse linear systems. Using polynomial preconditioned conjugate gradient methods for the solution of these systems results in a highly parallel algorithm for the simulation of the considered flows on the DAP 510. Using this parallel algorithm, data can be mapped in different ways onto the processor array. Depending on the number of grid points, several methods are shown. Numerical experiments illustrate the capabilities of the proposed algori…
	ISSN: 1040-3108 DOI: 10.1002/cpe.4330040103 Publisher: John Wiley & Sons, Ltd Year: 1992 Data source: WILEY
3.	Fault‐tolerant parallel programming in Argus
	Concurrency: Practice and Experience, Volume 4, Issue 1, 1992, Pages 37-55. Henri E. Bal
	Abstract: Fault tolerance is an issue ignored in most parallel languages. The overhead of making parallel, high‐performance programs resilient to processor crashes is often too high, given the low probability of such events. If parallel systems become more large‐scaled, however, processor failures will become likely, so they should be dealt with. Two approaches to this problem are feasible. First, the system can make programs fault‐tolerant transparently. It can log messages, make checkpoints, and so on. Second, the programmer can write explicit code for handling failures in an application‐specific way. The latter approach is potentially more efficient, but also requires more work from the programmer. In this paper, we intend to get some initial insight into how hard and efficient explicit fault‐tolerant parallel programming is. We do so by implementing four parallel applications in Argus, a language supporting parallelism as well as fault tolerance. Our experiences indicate that the extra effort needed for fault tolerance varies much between different applications. Also, trade‐offs can frequently be made between programming effort and efficiency. One lesson we learned is that fault tolerance should not be added as an afterthought, but is best taken into account from the start. As another result, the ability to integrate transparent and explicit mechanisms for fault tolerance would sometimes be hi…
	ISSN: 1040-3108 DOI: 10.1002/cpe.4330040104 Publisher: John Wiley & Sons, Ltd Year: 1992 Data source: WILEY
4.	Computation and data movement on RP3
	Concurrency: Practice and Experience, Volume 4, Issue 1, 1992, Pages 57-78. Luigi Brochard, Alex Freau
	Abstract: We present in this paper a study of the computation and communication costs on RP3 and on some issues about algorithm designs on a three‐level memory hierarchy multi‐processor. Using very simple algorithms (vector‐add, vector‐sum, saxpy, …), we compare different implementations which differ on data localization (global or local) and data cacheability (cacheable or non‐cacheable). This comparison is done using a performance monitoring system (VPMC) that records instructions, data movement, cache requests and misses. The output of the VPMC was then used as input to an analytical performance model which we used to compute the elemental computation and communication times of every basic algorithm. Regarding cacheability (marking the data cacheable instead of non‐cacheable), we found it worthwhile as long as data are blocked adequately. For our simple 1‐D data structures, a block size equal to a multiple of the cache line size gives the best results. However, considering possible load imbalance, a block size equal to the cache line seems optimal. Regarding localization (copying data from global to local, working on local data instead of global and copying data back), we found it ineffective, at least with the RP3 local and global communication speed r…
	ISSN: 1040-3108 DOI: 10.1002/cpe.4330040105 Publisher: John Wiley & Sons, Ltd Year: 1992 Data source: WILEY
5.	Designing Algorithms on RP3
	Concurrency: Practice and Experience, Volume 4, Issue 1, 1992, Pages 79-106. Luigi Brochard, Alex Freau
	Abstract: We study here the behavior of two numerical algorithms (matrix multiplication and finite difference method) on a three‐level memory hierarchy multi‐processor RP3. Using different versions of these algorithms, which differ on data placement (global, local, global and cacheable, local and cacheable) and on data access (blocked or non‐blocked), we study the impact of these parameters on the performance of the program. This performance analysis is done using a very accurate monitoring system (VPMC) which records instructions, memory requests, cache requests and misses. We perform also a theoretical performance analysis of these programs using a model of computation and communication. Good agreement is found between theoretical and experimental results. As a conclusion we discuss the use of local memory on such a machine and show that it is ineffective with RP3 cache, local and global memory communication speed ratios. We also discuss optimal use of cache and show that the optima can only be realized under some cache properties (private store‐in cache with user control of write‐back) and show that blocked optimal algorithms are to be used to find it. Comparing programming of shared and distributed memory multi‐processors, we remark that optimized algorithms for shared memory systems utilize the same blocking techniques used for programming distributed memory systems, leading to a common programmi…
	ISSN: 1040-3108 DOI: 10.1002/cpe.4330040106 Publisher: John Wiley & Sons, Ltd Year: 1992 Data source: WILEY
6.	Masthead
	Concurrency: Practice and Experience, Volume 4, Issue 1, 1992
	ISSN: 1040-3108 DOI: 10.1002/cpe.4330040101 Publisher: John Wiley & Sons, Ltd Year: 1992 Data source: WILEY

Copyright © 2009 NSTL (National Science and Technology Library)