JPS6334492B2

Info

Publication number: JPS6334492B2
Application number: JP58133221A
Authority: JP
Inventors: Haruo Kohama; Seijiro Tajima; Ichigaku Asano
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: NTT Inc
Priority date: 1983-07-21
Filing date: 1983-07-21
Publication date: 1988-07-11
Also published as: JPS6024651A

Description

[Detailed description of the invention]

[Field of application of the invention]

The present invention relates to a fault handling method, and more particularly to a method of handling a hardware fault detected in a processor section of an information processing device.

[Prior art]

FIG. 1 shows an example of the configuration of an information processing device. In the figure, 1a and 1b are processor sections (CPUs), each of which fetches programs from the memory section 2 via the bus 75 and executes them, and also reads data from and writes data to the memory section 2. As necessary, the processor sections 1a and 1b instruct the data channel sections (DCH) 3a and 3b to perform input/output operations via the interfaces 73 and 74, respectively. In response, the data channel sections 3a and 3b control input/output operations between the memory section 2 and the input/output devices 6a and 6b, or between the processor sections 1a and 1b and the input/output devices 6a and 6b. Reference numeral 4 denotes a service processor section (SVP) that performs maintenance and operation control of the information processing device, and 5 denotes an input/output device used to enter instructions to the service processor section 4 and to display information from it.

FIG. 2 shows an example of conventional hardware for fault handling in a processor section of such an information processing device. For convenience, FIG. 2 shows the configuration for the processor section 1a of FIG. 1. Reference numeral 10 denotes a fault detection circuit, 11 an indicator showing whether a retry is possible (it is on when a retry is not possible), 17 an indicator showing that a retry is in progress, 12 an OR circuit, 13 an AND circuit, 14 an interrupt control circuit, 15 an inverter, 16 a retry control circuit, 18 an interface control circuit for the service processor section 4 or the processor section 1b, and 20 a program execution section.

The state of the indicator 11, which shows whether a retry is possible, changes from moment to moment according to the execution stage of the instruction on the processor section 1a (instruction fetch, instruction decode, operation execution, result storage, and so on). When the fault detection circuit 10 detects a fault while the indicator 11 is off (a retry is possible) and the indicator 17 is off (no retry is in progress), the output of the AND circuit 13 is off and the output of the inverter 15 is on. In this case the retry control circuit 16 retries the instruction that was being executed when the fault occurred (the indicator 17 is on during the retry). If the retry succeeds, the retry control circuit 16 notifies the interrupt control circuit 14, which raises an internal machine check interrupt indicating that a fault recovered by a successful retry has occurred, and the program on the processor section 1a records the occurrence of that fault.

On the other hand, when the fault detection circuit 10 detects a fault while the indicator 11 is on (a retry is not possible) or the indicator 17 is on (the retry has failed), the output of the OR circuit 12 is on and the output of the AND circuit 13 is also on. In this case the interrupt control circuit 14 raises an internal machine check interrupt to notify the program on the processor section 1a of the fault, and also sends an external machine check interrupt signal to the processor section 1b via the interface control circuit 18 and the interface 72, so that the program on the processor section 1b is notified as well. In addition, the interface control circuit 18 notifies the service processor section 4 of the fault via the interface 71. The programs on the processor sections 1a and 1b that were notified by interrupt each independently display a message reporting the fault of the processor section 1a on the input/output devices 6a and 6b via the data channel sections 3a and 3b. The service processor section 4, having received the fault notification from the processor section 1a, also displays a message indicating the fault of the processor section 1a on the input/output device 5, independently of the programs on the processor sections 1a and 1b.
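The gating described above for the conventional circuit of FIG. 2 can be summarized in a few lines of Python. This is only an illustrative sketch: FIG. 2 is not reproduced here, so the exact wiring (in particular, that the fault signal feeds the AND circuit 13 and gates the start of the retry) is an assumption, and the names `GateOutputs` and `conventional_fault_logic` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateOutputs:
    or12: bool           # output of OR circuit 12
    and13: bool          # output of AND circuit 13
    inv15: bool          # output of inverter 15
    start_retry: bool    # retry control circuit 16 is triggered
    machine_check: bool  # interrupt control circuit 14 reports an unrecovered fault


def conventional_fault_logic(fault: bool, ind11: bool, ind17: bool) -> GateOutputs:
    """Model the prior-art decision of FIG. 2.

    ind11 is on when a retry is NOT possible; ind17 is on while a retry is in progress.
    Assumed wiring (not confirmed by the figure): AND13 = fault AND OR12, INV15 = NOT AND13.
    """
    or12 = ind11 or ind17
    and13 = fault and or12
    inv15 = not and13
    start_retry = fault and inv15
    return GateOutputs(or12, and13, inv15, start_retry, machine_check=and13)
```

Under these assumptions, a fault with both indicators off starts a retry, while a fault with either indicator on produces the machine check notifications instead. In neither case does this conventional arrangement stop the processor section.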

FIG. 3 is a flowchart showing the hardware operation of the processor section 1a, the operation of the programs on the processor sections 1a and 1b, and the operation of the service processor section 4 when a fault is detected in the processor section 1a.

However, this conventional fault handling method has the following problems.

(1) Even though a fault that cannot be retried, or a fault for which the retry failed, has been detected in a processor section, program execution on that processor section continues. The operation of that program may therefore disturb program processing on other processor sections or destroy files essential to the system.

(2) When a fault that cannot be retried, or a fault for which the retry failed, is detected in a processor section, the program on the faulty processor section or the service processor section sends the operator the same report as when system operation cannot continue, even if another processor section can keep the system running. This confuses the operator.

[Purpose of the invention]

An object of the present invention is to provide a fault handling method that, when a fault that cannot be retried is detected in a processor section or when a retry fails, prevents the fault from disturbing the system and does not confuse the operator.

[Summary of the invention]

In the present invention, the processor section is provided with a stop instruction circuit that stops the operation of the processor section in response to a signal from the fault detection circuit, and the service processor section is provided with a fault handling instruction circuit. When a processor section detects a hardware fault that cannot be retried, or when the retry of a fault fails, program execution on that processor section is stopped so that the fault does not disturb the system. If a program is running on another processor section, fault handling, such as deciding whether to report the fault to the operator, is entrusted to that program. The service processor section reports the fault to the operator only when no processor section is executing a program or when no processor section other than the faulty one exists.

[Embodiments of the invention]

An embodiment of the present invention is described below. Here too, the information processing device is assumed to have the configuration of FIG. 1.

FIG. 4 shows an embodiment of the present invention for the hardware configuration related to fault handling in the processor section 1a of FIG. 1. In FIG. 4, 10 denotes a fault detection circuit, 11 an indicator showing whether a retry is possible, 17 an indicator showing that a retry is in progress, 14 an interrupt control circuit, 16 a retry control circuit, 18 an interface control circuit for the service processor section 4, 19 a stop instruction circuit for the processor section, and 20 a program execution section. FIG. 5 shows an embodiment of the present invention for the hardware configuration related to fault handling in the service processor section 4 of FIG. 1. In FIG. 5, 40 denotes an interface control circuit for the processor sections 1a and 1b, 41 a retry instruction circuit that determines whether a retry is possible and instructs the processor section to retry, 42 a fault handling instruction circuit that checks the state of the other processor sections and instructs another processor section to handle the fault, and 43 a display control circuit that displays faults on the input/output device 5.

When the fault detection circuit 10 of the processor section 1a detects a fault, the interface control circuit 18 notifies the service processor section 4 of the fault, and the stop instruction circuit 19 stops program execution on the processor section 1a.

The fault notification from the processor section 1a is passed to the retry instruction circuit 41 via the interface control circuit 40 of the service processor section 4. On receiving the notification, the retry instruction circuit 41 first reads the internal information of the processor section 1a via the interface control circuit 40 and determines whether a retry on the processor section 1a is possible. If the indicators 11 and 17 are both off, so that a retry on the processor section 1a is possible, the retry instruction circuit 41 instructs the processor section 1a to retry.

The retry instruction from the service processor section 4 is passed to the retry control circuit 16 of the processor section 1a via the interface control circuit 18. On receiving the instruction, the retry control circuit 16 retries the instruction whose execution was interrupted by the fault. If the retry succeeds, it notifies the interrupt control circuit 14, which raises an internal machine check interrupt to the program on the processor section 1a indicating that a fault recovered by a successful retry has occurred. The program on the processor section 1a that accepts the interrupt identifies its cause from the interrupt level and interrupt code, records in the memory section or in an external storage device that a fault recovered by retry occurred in the processor section 1a, and then continues program operation. This record is used during hardware maintenance and diagnosis.

If the processor section 1a fails to retry the instruction after receiving the retry instruction from the service processor section 4, it notifies the service processor section 4 of the retry failure via the interface 71 and returns to the stopped state.
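The retry sequence of the embodiment can be sketched as follows. This is a simplified model rather than the patented circuitry: it collapses the hardware of the processor section 1a and the retry instruction circuit 41 into one function, it assumes that the indicator 17 is cleared after a successful retry (the text does not say so explicitly), and the names `ProcessorModel` and `on_fault_notification` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ProcessorModel:
    """Hypothetical stand-in for processor section 1a."""
    ind11: bool = False   # on: a retry is not possible
    ind17: bool = False   # on: a retry has been attempted
    halted: bool = False
    log: List[str] = field(default_factory=list)


def on_fault_notification(cpu: ProcessorModel, retry: Callable[[], bool]) -> str:
    """Handle one fault notification as retry instruction circuit 41 would.

    `retry` stands for re-executing the interrupted instruction and
    returns True when the retry succeeds.
    """
    cpu.halted = True                      # stop instruction circuit 19
    if cpu.ind11 or cpu.ind17:
        return "hand_over_to_circuit_42"   # not retryable, or the retry already failed
    cpu.ind17 = True                       # retry in progress
    if retry():
        cpu.ind17 = False                  # assumption: cleared on success
        cpu.halted = False
        cpu.log.append("fault recovered by retry")  # recorded by the program on 1a
        return "resumed"
    # Retry failed: 1a notifies the service processor again and stays halted.
    return on_fault_notification(cpu, retry)
```

With `retry` returning True the processor resumes and one log entry is recorded. With `retry` returning False the second notification finds the indicator 17 on, so control passes to the fault handling instruction circuit 42 while the processor stays halted.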

When the service processor section 4 receives a fault notification from the processor section 1a and the fault cannot be retried (the indicator 11 of the processor section 1a is on) or the retry has failed (the indicator 17 of the processor section 1a is on), the retry instruction circuit 41 passes control to the fault handling instruction circuit 42. The processor section 1a remains stopped. The fault handling instruction circuit 42 first checks the state of the other processor section 1b. If the processor section 1b is executing a program, the circuit 42 sets, via the interface control circuit 40 and the interface 71, the fault indication bit corresponding to the processor section 1a in the external machine check interrupt cause indication register (not shown) of the processor section 1b (that is, it scans in a "1"). In this case no fault message is displayed on the input/output device 5 by the display control circuit 43. When the bit in the external machine check interrupt cause indication register is set, the processor section 1b raises an external machine check interrupt, which informs the program on the processor section 1b that a hardware fault has occurred in the processor section 1a. The program on the processor section 1b that accepts the interrupt learns from the interrupt level and interrupt code that a hardware fault has occurred in the other processor section 1a. It then decides, among other things, whether to continue system operation and whether to notify the operator of the fault in the processor section 1a, and, if necessary, displays a message indicating the fault of the processor section 1a on the input/output device 6b via the data channel section 3b.

If the processor section 1b is stopped and is not executing a program, or if the processor section 1b is not connected because the system is a single-processor configuration, the fault handling instruction circuit 42 does not perform the instruction (scan-in) operation on the processor section 1b. Instead, it instructs the display control circuit 43 to display a fault report message on the input/output device 5. The display control circuit 43 then displays a message reporting the fault on the input/output device 5 in accordance with the instruction from the fault handling instruction circuit 42.
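The decision made by the fault handling instruction circuit 42 in the two-processor embodiment can be sketched as below. This is an illustrative model only: `PeerProcessor`, `handle_unrecovered_fault`, the dictionary used for the interrupt cause register, and the message text are all hypothetical, and the list `console` merely stands in for the input/output device 5.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PeerProcessor:
    """Hypothetical model of processor section 1b as seen from the service processor."""
    running: bool
    # External machine check interrupt cause register: one fault bit per other processor.
    mck_cause: Dict[str, bool] = field(default_factory=dict)


def handle_unrecovered_fault(faulty: str, peer: Optional[PeerProcessor],
                             console: List[str]) -> str:
    """Model fault handling instruction circuit 42.

    `peer` is None in a single-processor configuration.
    """
    if peer is not None and peer.running:
        # Scan in "1": setting the bit raises the external machine check
        # interrupt on the peer, whose program decides what to report.
        peer.mck_cause[faulty] = True
        return "delegated_to_peer"   # nothing is displayed on I/O device 5
    # No running peer: display control circuit 43 reports to the operator.
    console.append(f"processor {faulty} fault")
    return "reported_by_svp"
```

The design point this illustrates is that exactly one party reports to the operator: the program on the running processor when one exists, and the service processor otherwise.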

FIG. 6 is a flowchart showing the operation of the processor sections and the service processor section when a fault occurs in the processor section 1a: (a) shows the operation of the service processor section 4, (b) the operation of the processor section 1a, and (c) the operation of the processor section 1b.

In this embodiment, faults are reported to the operator through the input/output devices 5, 6a, and 6b. A fault may also be indicated by sending a message to a remote location over a communication line or by sounding an alarm (a bell or the like).

Another possible method is for the hardware of the processor section 1a to distinguish, when a fault is detected, between a fault that can be retried and one that cannot. The processor section 1a then notifies the service processor section 4 only when a fault that cannot be retried is detected or when a retry fails, and does not notify the service processor section 4 when the retry succeeds.

A further possible method is as follows. When a fault that cannot be retried is detected in the processor section 1a, or when a retry fails, the processor section 1a notifies the service processor section 4 of the fault and also sends an external machine check interrupt request signal directly to the processor section 1b. In this case the service processor section 4 does not perform the scan-in to the processor section 1b (setting the fault indication bit to 1).

If the information processing device contains three or more processor sections, the same handling as in this embodiment is possible by deciding in advance the priority order of the processor sections to which the service processor section reports a fault. When a hardware fault that cannot be retried is detected in any processor section, or when the retry of a hardware fault fails, the fault is notified to the processor section that is executing a program and has the highest priority. Alternatively, the priority decision may be left to the programs, which turn the external machine check interrupt mask on or off, and all other processor sections may be notified when a fault that cannot be retried is detected in a processor section or when a retry fails.
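The first variant for three or more processor sections, a fixed priority order held by the service processor section, can be sketched as a simple selection function. The function name, the processor names such as "1c", and the use of `None` to mean "the service processor reports the fault itself" (by analogy with the two-processor embodiment) are assumptions made for illustration.

```python
from typing import Dict, Optional, Sequence


def select_notification_target(faulty: str, priority: Sequence[str],
                               running: Dict[str, bool]) -> Optional[str]:
    """Pick the processor section that should be notified of the fault.

    `priority` lists processor names from highest to lowest priority.
    Returns the highest-priority processor, other than the faulty one,
    that is executing a program, or None if there is no such processor.
    """
    for name in priority:
        if name != faulty and running.get(name, False):
            return name
    return None
```

For example, with priority order 1a, 1b, 1c and a fault in 1a while only 1c is executing a program, the function selects 1c. If no other processor section is executing a program, it returns `None`.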

[Effect of the invention]

As described above, the present invention provides the following effects.

(1) When a fault that cannot be retried is detected in a processor section, or when a retry fails, program execution on the faulty processor section stops. This prevents adverse effects on program execution in other processor sections and prevents the fault from disturbing the system.

(2) When a program is running on another processor section and system operation can continue, the fault is reported to the operator by the program on that other processor section. The operator therefore receives a fault report that matches the actual operating state of the system and is not confused.

(3) When no program is running on another processor section, or when no other processor section exists, the service processor section reports the fault to the operator. The fault is therefore reliably reported to the operator even when a processor fault has stopped system operation.

[Brief explanation of the drawing]

FIG. 1 is a diagram showing an example of the configuration of an information processing device. FIG. 2 is a diagram showing an example of the hardware configuration of a conventional fault handling method. FIG. 3 is a diagram showing the processing flow of FIG. 2. FIGS. 4 and 5 are diagrams showing an embodiment of the hardware configuration of the fault handling method according to the present invention. FIG. 6 is a diagram showing the processing flow according to the present invention.

1a, 1b: processor sections (CPU); 2: memory section (MEM); 3a, 3b: data channel sections (DCH); 4: service processor section (SVP); 5, 6a, 6b: input/output devices; 10: fault detection circuit; 11: retry possible/impossible indicator; 12: OR circuit; 13: AND circuit; 14: interrupt control circuit; 15: inverter; 16: retry control circuit; 17: retry-in-progress indicator; 18: interface control circuit; 19: stop instruction circuit; 40: interface control circuit; 41: retry instruction circuit; 42: fault handling instruction circuit; 43: display control circuit.

Claims

[Scope of Claims] 1. In an information processing device comprising one or more processor sections that execute programs and a service processor section that performs maintenance and operation control of the processor sections, a fault handling method characterized in that: each processor section has means for stopping program execution on its own processor section and notifying the service processor section of the fault when it detects a hardware fault that cannot be retried or when it fails to retry a hardware fault, and means for notifying an operator of the occurrence of a fault when, during program execution, it receives notification of a fault in another processor section; and the service processor section has means for checking the state of the other processor sections when it receives notification of a fault from a processor section, notifying the fault to another processor section if that processor section is executing a program, and notifying the operator of the occurrence of the fault through the input/output device of the service processor section if no other processor section is executing a program or if no other processor section exists.