NSTL Retrospective Data Service Platform

Concurrency: Practice and Experience

ISSN: 1040-3108; Year: 1994
Current issue: Volume 6, Issue 7

1.	PUMMA: Parallel universal matrix multiplication algorithms on distributed memory concurrent computers
	Concurrency: Practice and Experience, Volume 6, Issue 7, 1994, Pages 543-570. Jaeyoung Choi, David W. Walker, Jack J. Dongarra.
	Abstract: The paper describes Parallel Universal Matrix Multiplication Algorithms (PUMMA) on distributed memory concurrent computers. The PUMMA package includes not only the non-transposed matrix multiplication routine C = A ⋅ B, but also transposed multiplication routines C = A^T ⋅ B, C = A ⋅ B^T, and C = A^T ⋅ B^T, for a block cyclic data distribution. The routines perform efficiently for a wide range of processor configurations and block sizes. The PUMMA together provide the same functionality as the Level 3 BLAS routine xGEMM. Details of the parallel implementation of the routines are given, and results are presented for runs on the Intel Touchstone Delta …
	ISSN: 1040-3108; DOI: 10.1002/cpe.4330060702; Publisher: John Wiley & Sons, Ltd; Year: 1994; Data source: WILEY
2.	Matrix multiplication on the Intel Touchstone Delta
	Concurrency: Practice and Experience, Volume 6, Issue 7, 1994, Pages 571-594. Steven Huss-Lederman, Elaine M. Jacobson, Anna Tsao, Guodong Zhang.
	Abstract: Matrix multiplication is a key primitive in block matrix algorithms such as those found in LAPACK. We present results from our study of matrix multiplication algorithms on the Intel Touchstone Delta, a distributed memory message-passing architecture with a two-dimensional mesh topology. We analyze and compare three algorithms and obtain an implementation, BiMMeR, that uses communication primitives highly suited to the Delta and exploits the single node assembly-coded matrix multiplication. Our algorithm is completely general, i.e. able to deal with various data layouts as well as arbitrary mesh aspect ratios and matrix dimensions, and has achieved parallel efficiency of 86%, with overall peak performance in excess of 8 Gflops on 256 nodes for an 8800 × 8800 matrix. We describe BiMMeR's design and implementation and present performance results that demonstrate scalability and robust behavior over varying mesh topo…
	ISSN: 1040-3108; DOI: 10.1002/cpe.4330060703; Publisher: John Wiley & Sons, Ltd; Year: 1994; Data source: WILEY
3.	Determining update latency bounds in Galactica Net
	Concurrency: Practice and Experience, Volume 6, Issue 7, 1994, Pages 595-611. S. Clayton, A. Wilson, R. J. Duckworth, W. Michalson.
	Abstract: The paper looks at the problem of ensuring the performance of real-time applications hosted on Galactica Net, a mesh-based distributed cache coherent shared memory multiprocessing system. A method for determining strict upper bounds on worst case latencies in wormhole routed networks of known or unknown communication patterns is presented. From this, a tool for determining upper bounds for shared memory update latencies is developed, and it is shown that the update latency of Galactica Net is deterministic. The analytical bounds are then compared with maximum latencies observed in simulations of GNet, with which they compare favorably. Finally, it is shown that the tool for determining update latency bounds is useful for comparing differing GNet system configurations in order to minimize update latency bou…
	ISSN: 1040-3108; DOI: 10.1002/cpe.4330060704; Publisher: John Wiley & Sons, Ltd; Year: 1994; Data source: WILEY
4.	Masthead
	Concurrency: Practice and Experience, Volume 6, Issue 7, 1994.
	ISSN: 1040-3108; DOI: 10.1002/cpe.4330060701; Publisher: John Wiley & Sons, Ltd; Year: 1994; Data source: WILEY
