Dynamic fault recovery in mesh‐connected parallel computers

Takashi Yokota, Hideharu Amano, Hideo Aiso

Research output: Article, peer-reviewed


One trend in the development of high‐speed computers is highly parallel processing with a large number of processing elements, a scheme that recent progress in VLSI technology has made increasingly realistic. As the number of processing elements grows, however, so does the problem of coping with faults. Fault‐tolerant computers with multiple redundancy have been developed, but no method has been presented for the parallel computer environment that provides sufficient redundancy against faults, recovers from a fault, and continues the computation without bringing the system down. In general, a fault destroys the completeness of data. In numerical computation, however, some problems have less stringent requirements for data completeness (e.g., the iterative solution of a system of equations). This paper discusses the case where such a problem is solved on a parallel computer with a lattice topology. Three structural types are proposed for dynamic fault recovery during execution, together with their interconnection schemes and recovery methods. Results of an evaluation by simulation are presented.
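
The abstract's remark that iterative solvers tolerate incomplete data can be illustrated with a small sketch. This is not the authors' recovery scheme. It is a sequential Jacobi iteration on a square lattice, and it assumes a fault simply resets the state of a few lattice points to zero. The names `solve`, `jacobi_step`, `fault_at`, and `fault_cells` are hypothetical and chosen only for this illustration.

```python
def make_grid(n):
    """n x n lattice: top boundary held at 1.0, everything else 0.0."""
    grid = [[0.0] * n for _ in range(n)]
    for j in range(n):
        grid[0][j] = 1.0
    return grid


def jacobi_step(grid):
    """One Jacobi sweep: each interior point becomes the mean of its 4 neighbours."""
    n = len(grid)
    new = [row[:] for row in grid]  # boundary rows and columns are copied unchanged
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new


def solve(n=8, iterations=400, fault_at=None, fault_cells=()):
    """Iterate; optionally wipe some interior cells once to mimic a fault."""
    grid = make_grid(n)
    for k in range(iterations):
        if k == fault_at:
            # Assumed fault model: the state of the affected cells is lost
            # and replaced by a default value of 0.0.
            for i, j in fault_cells:
                grid[i][j] = 0.0
        grid = jacobi_step(grid)
    return grid


def max_diff(a, b):
    """Largest absolute difference between two lattices of equal size."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


if __name__ == "__main__":
    clean = solve()
    faulty = solve(fault_at=50, fault_cells=[(3, 3), (3, 4), (4, 3), (4, 4)])
    print("max difference after recovery:", max_diff(clean, faulty))
```

Because the iteration contracts toward the same fixed point from any interior starting values, the run that lost four cells at sweep 50 ends up matching the fault-free run to within rounding. A real machine would also have to reroute communication around the failed element, which this sketch does not model.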

Journal: Systems and Computers in Japan
Publication status: Published - 1986

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Information Systems
  • Hardware and Architecture
  • Computational Theory and Mathematics

