1. Empirical analysis of overheads in cluster environments
Concurrency: Practice and Experience,
Volume 6,
Issue 1,
1994,
Pages 1-32
Brian K. Schmidt, Vaidy S. Sunderam
Abstract: In concurrent computing environments based on heterogeneous processing elements interconnected by general‐purpose networks, several classes of overheads contribute to lowered performance. The most obvious limitations are network throughput and latency, but certain other factors also play a significant role. To gain insight into the nature of these overheads, and to propose strategies to alleviate them, empirical measurements of native communication performance as well as application execution performance were conducted using the PVM network computing system. These experiments and our analyses have identified load imbalance, the parallelism model adopted, communication delay and throughput, and within‐host overheads as the primary factors affecting performance in cluster environments. Interestingly, we find that agenda parallelism and load balancing strategies contribute significantly more to better performance than improved communications or system tuning. Drawing general conclusions on how these inefficiencies may be overcome is inadvisable because of the tremendous variability of many parameters in general‐purpose network environments; we therefore propose several potential approaches, including model selection criteria, partitioning strategies, and software system heuristics, to reduce overheads and enhance performance in network‐based environments.
ISSN:1040-3108
DOI:10.1002/cpe.4330060102
Publisher: John Wiley & Sons, Ltd
Year: 1994
Data source: WILEY
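The abstract credits agenda parallelism and load balancing, more than faster communication, with the largest gains. As a minimal sketch of agenda parallelism in the usual bag-of-tasks sense (an assumption about the paper's usage; this is not PVM code and not the authors' implementation), the Python fragment below lets idle workers pull the next task from a shared queue, so uneven task costs are absorbed dynamically instead of being fixed by a static partition. Threads stand in for cluster hosts, and the names `run_agenda` and `work` are hypothetical.

```python
import queue
import threading
from collections import Counter


def run_agenda(tasks, n_workers, work):
    """Bag-of-tasks (agenda) scheduler: each idle worker pulls the next task.

    Dynamic assignment balances load: a worker that finishes early simply
    takes another task instead of idling after a fixed static split.
    """
    agenda = queue.Queue()
    for t in tasks:
        agenda.put(t)

    results = {}
    taken = Counter()  # number of tasks processed by each worker id
    lock = threading.Lock()

    def worker(wid):
        while True:
            try:
                t = agenda.get_nowait()
            except queue.Empty:
                return  # agenda exhausted: this worker retires
            r = work(t)
            with lock:
                results[t] = r
                taken[wid] += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results, taken


if __name__ == "__main__":
    import time

    def work(n):
        # Hypothetical task whose cost grows with n (uneven load).
        time.sleep(0.001 * n)
        return n * n

    results, taken = run_agenda(range(20), n_workers=4, work=work)
    print(sum(results.values()))  # 2470
    print(dict(taken))
```

With a static split, the worker holding the most expensive tasks would finish last; here the per-worker counts in `taken` simply reflect how quickly each worker drained the shared queue.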

2. Experiments with program unification on the Cray Y‐MP
Concurrency: Practice and Experience,
Volume 6,
Issue 1,
1994,
Pages 33-53
Ling‐Yu Chuang, Vernon Rego, Aditya Mathur
Abstract: Program unification is a technique for source‐to‐source transformation of code for enhanced execution performance on vector and SIMD architectures. This work focuses on simple examples of program unification to explain the methodology and demonstrate its promise as a practical technique for improved performance. Using these examples, we outline two experiments in the simulation domain that benefit from unification, namely Monte Carlo and discrete‐event simulation. Empirical tests of unified code on a Cray Y‐MP multiprocessor show that unification improves execution performance by a factor of roughly 8 for the given applications. The technique is general in that it can be applied to computation‐intensive programs in various data‐parallel applications.
ISSN:1040-3108
DOI:10.1002/cpe.4330060103
Publisher: John Wiley & Sons, Ltd
Year: 1994
Data source: WILEY

3. Solving an advection‐diffusion problem on the Connection Machine
Concurrency: Practice and Experience,
Volume 6,
Issue 1,
1994,
Pages 55-68
Martin Berggren
Abstract: An algorithm is presented that solves a linear advection‐diffusion problem using a least‐squares formulation, with a conjugate gradient method applied to the resulting minimization problem. An implementation in CM‐Fortran on a Thinking Machines CM‐2 is compared with a serial implementation on an IBM RS6000. The maximum speed‐up obtained is a factor of 70. For fine grids, the CPU time scales almost ideally when the number of processors is increased from 409…
ISSN:1040-3108
DOI:10.1002/cpe.4330060104
Publisher: John Wiley & Sons, Ltd
Year: 1994
Data source: WILEY
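The least-squares plus conjugate gradient approach can be illustrated in generic form by CGLS: conjugate gradients applied to the normal equations AᵀA x = Aᵀb without ever forming AᵀA. This is a stand-in under stated assumptions, not the paper's method: the dense list-of-lists matrix, the tolerance, and the helper names (`matvec`, `rmatvec`, `cgls`) are illustrative, and neither the paper's least-squares functional for the advection-diffusion operator nor its CM-Fortran data layout is reproduced. The matrix-vector products and dot products are the kernels one would expect to distribute across processors on a data-parallel machine.

```python
import math


def matvec(A, x):
    # y = A x for a dense matrix stored as a list of rows
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rmatvec(A, y):
    # A^T y computed row by row, without forming the transpose
    out = [0.0] * len(A[0])
    for row, yi in zip(A, y):
        for j, a in enumerate(row):
            out[j] += a * yi
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cgls(A, b, tol=1e-10, max_iter=None):
    """Minimize ||A x - b||_2 by conjugate gradients on A^T A x = A^T b."""
    n = len(A[0])
    max_iter = max_iter or 2 * n
    x = [0.0] * n
    r = list(b)          # residual b - A x, with x = 0 initially
    s = rmatvec(A, r)    # A^T r: negative gradient of the least-squares objective
    p = list(s)          # first search direction
    gamma = dot(s, s)
    for _ in range(max_iter):
        if math.sqrt(gamma) < tol:
            break
        q = matvec(A, p)
        alpha = gamma / dot(q, q)          # exact line search along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = rmatvec(A, r)
        gamma_new = dot(s, s)
        beta = gamma_new / gamma           # keeps directions A^T A-conjugate
        p = [si + beta * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return x
```

For example, `cgls([[1], [1]], [1, 3])` returns `[2.0]`, the least-squares compromise between the inconsistent equations x = 1 and x = 3.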

4. A parallel block row‐action method for solving large sparse linear systems on distributed memory multiprocessors
Concurrency: Practice and Experience,
Volume 6,
Issue 1,
1994,
Pages 69-84
Marco D'Apuzzo, Maria Assunta De Rosa
Abstract: Recently developed block‐iterative versions of some row‐action algorithms for solving general systems of sparse linear equations allow parallelism in the computations when the underlying problem is appropriately decomposed. However, problems associated with the parallel implementation of these algorithms have to be addressed. In this paper we present an implementation on distributed memory multiprocessors of a block version of the Kaczmarz row‐action method. One of the main issues in implementing this method efficiently in a concurrent environment is to develop suitable communication schemes that reduce the amount of communication needed at each iteration. We propose two data distribution strategies, which lead to different computation and communication schemes. To verify and compare the effectiveness of the proposed strategies, numerical experiments have been carried out on a Symult S2010 and a Meiko Computing Surface. The performance evaluation has been done using a scaled efficiency…
ISSN:1040-3108
DOI:10.1002/cpe.4330060105
Publisher: John Wiley & Sons, Ltd
Year: 1994
Data source: WILEY
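For reference, the classical (non-block) Kaczmarz iteration projects the current iterate onto the hyperplane of one equation at a time: x ← x + (b_i − a_i·x)/‖a_i‖² · a_i. The sketch below implements the cyclic, row-by-row version on a small system; it assumes dense Python lists for brevity and does not model the block decomposition or the two data distribution strategies studied in the paper. The function name `kaczmarz` and its `sweeps` parameter are illustrative.

```python
def kaczmarz(A, b, sweeps=100, x0=None):
    """Cyclic Kaczmarz: project x onto each hyperplane a_i . x = b_i in turn."""
    n = len(A[0])
    x = list(x0) if x0 is not None else [0.0] * n
    norms = [sum(a * a for a in row) for row in A]  # ||a_i||^2, computed once
    for _ in range(sweeps):
        for row, bi, nrm in zip(A, b, norms):
            if nrm == 0.0:
                continue  # an all-zero row defines no hyperplane; skip it
            resid = bi - sum(a * xj for a, xj in zip(row, x))
            scale = resid / nrm
            # Orthogonal projection of x onto the i-th hyperplane
            x = [xj + scale * a for xj, a in zip(x, row)]
    return x
```

For the consistent system 2x + y = 3, x + 3y = 4, `kaczmarz([[2, 1], [1, 3]], [3, 4])` converges to approximately `[1.0, 1.0]`. Because each update touches only one row, it is grouping rows into blocks, as in the block-iterative versions the abstract describes, that exposes work which can proceed in parallel.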

5. Masthead
Concurrency: Practice and Experience,
Volume 6,
Issue 1,
1994,
ISSN:1040-3108
DOI:10.1002/cpe.4330060101
Publisher: John Wiley & Sons, Ltd
Year: 1994
Data source: WILEY