|
1. |
Performance of dynamic load balancing algorithms for unstructured mesh calculations |
|
Concurrency: Practice and Experience,
Volume 3,
Issue 5,
1991,
Page 457-481
Roy D. Williams,
Preview
|
PDF (1528KB)
|
|
摘要:
AbstractIf a finite element mesh has a sufficiently regular structure, it is easy to decide in advance how to distribute the mesh among the processors of a distributed‐memory parallel processor, but if the mesh is unstructured the problem becomes much more difficult. The distribution should be made so that each processor has approximately equal work to do, and such that communication overhead is minimized.If the mesh is solution‐adaptive, i.e. the mesh and hence the load‐balancing problem change discretely during execution of the code, then it is most efficient to decide the optimal mesh distribution in parallel. In this paper three parallel algorithms, orthogonal recursive bisection (ORB), eigenvector recursive bisection (ERB) and a simple parallelization of simulated annealing (SA) have been implemented for load balancing a dynamic unstructured triangular mesh on 16 processors of an NCUBE machine.The test problem is a solution‐adaptive Laplace solver, with an initial mesh of 280 elements, refined in seven stages to 5772 elements. We present execution times for the solver resulting from the mesh distributions using the three algorithms, as well as results on imbalance, communication traffic and element migration.The load‐balancing itself is fastest with ORB, but a very long run of SA produces a saving of 21% in the execution time of the Laplace solver. ERB is only a little slower than ORB, and yet produces a mesh distribution whose execution time is 15% faster
ISSN:1040-3108
DOI:10.1002/cpe.4330030502
出版商:John Wiley&Sons, Ltd
年代:1991
数据来源: WILEY
|
2. |
Two approaches to the concurrent implementation of the prime factor algorithm on a hypercube |
|
Concurrency: Practice and Experience,
Volume 3,
Issue 5,
1991,
Page 483-495
G. Aloisio,
E. Lopinto,
N. Veneziani,
G. C. Fox,
J. S. Kim,
Preview
|
PDF (586KB)
|
|
摘要:
AbstractOn sequential computers, the prime factor algorithm (PFA) allows the Computation of the discrete Fourier transform (DFT) with a higher efficiency than the traditional Cooley‐Tukey FFT algorithm (CTA). However, the PFA requires substantial data movement, which poses a challenging problem for distributed‐memory multi‐processor systems. In this paper, two approaches for a concurrent implementation of the PFA on these structures are presented. In the first approach, the concurrent PFA runs on all nodes of the multi‐processor system, which is inefficient on large configurations due to the large communication overhead. A second approach developed to reduce this bottleneck is also presented. These solutions have been benchmarked on Caltech hypercubes, and the performances achieved are reported. In both approaches, thecrystal_routeralgorithm was exploited as a concurrent technique for communicating data amon
ISSN:1040-3108
DOI:10.1002/cpe.4330030503
出版商:John Wiley&Sons, Ltd
年代:1991
数据来源: WILEY
|
3. |
Masthead |
|
Concurrency: Practice and Experience,
Volume 3,
Issue 5,
1991,
Page -
Preview
|
PDF (86KB)
|
|
ISSN:1040-3108
DOI:10.1002/cpe.4330030501
出版商:John Wiley&Sons, Ltd
年代:1991
数据来源: WILEY
|
|