Fault‐tolerant parallel programming in Argus

 

Author: Henri E. Bal

 

Journal: Concurrency: Practice and Experience (Wiley, available online 1992)
Volume/Issue: Volume 4, Issue 1

Pages: 37-55

 

ISSN: 1040-3108

 

Year: 1992

 

DOI: 10.1002/cpe.4330040104

 

Publisher: John Wiley & Sons, Ltd

 

Data source: Wiley

 

Abstract:

Fault tolerance is an issue ignored in most parallel languages. The overhead of making parallel, high‐performance programs resilient to processor crashes is often too high, given the low probability of such events. If parallel systems become more large‐scaled, however, processor failures will become likely, so they should be dealt with. Two approaches to this problem are feasible. First, the system can make programs fault‐tolerant transparently. It can log messages, make checkpoints, and so on. Second, the programmer can write explicit code for handling failures in an application‐specific way. The latter approach is potentially more efficient, but also requires more work from the programmer. In this paper, we intend to get some initial insight into how hard and efficient explicit fault‐tolerant parallel programming is. We do so by implementing four parallel applications in Argus, a language supporting parallelism as well as fault tolerance. Our experiences indicate that the extra effort needed for fault tolerance varies much between different applications. Also, trade‐offs can frequently be made between programming effort and efficiency. One lesson we learned is that fault tolerance should not be added as an afterthought, but is best taken into account from the start. As another result, the ability to integrate transparent and explicit mechanisms for fault tolerance would sometimes be hi

 
