JPH0320774B2

Info

Publication number: JPH0320774B2
Application number: JP58056370A
Authority: JP
Inventors: Masaaki Nagao; Yasutaka Oochi
Original assignee: Fujitsu Dai Ichi Communications Software Ltd; Fujitsu Ltd
Current assignee: Fujitsu Dai Ichi Communications Software Ltd; Fujitsu Ltd
Priority date: 1983-03-31
Filing date: 1983-03-31
Publication date: 1991-03-20
Also published as: JPS59194253A

Description

[Detailed Description of the Invention]
(1) Technical Field of the Invention
The present invention relates to a method for determining failures in the devices that make up a system. In particular, it relates to a failure determination method that, where devices are connected to one another, prevents an abnormality in one device from causing other devices to be judged as failed.

(2) Prior Art and Its Problems
Conventional failure determination methods fall into two types. One treats a device as failed as soon as an abnormality is detected in it; the other treats a device as failed when the number of abnormalities that have occurred in it exceeds a fixed value. The former has the drawback that a transient abnormality causes a device to be treated as failed even when it is actually still usable. The latter avoids that drawback, but depending on how the devices are interconnected, an abnormality in one device can cause other devices to be treated as failed.

(3) Object of the Invention
The object of the present invention is to solve the above problem. When determining whether a device in the system has failed, the method judges it failed not only when its own abnormality count exceeds a fixed value, but also when the sum of the abnormality counts of all devices connected below it exceeds a fixed value. This provides a failure determination method that prevents an abnormality in one device from causing another device connected to it to be regarded as abnormal and judged to have failed.

(4) Constitution of the Invention
To achieve the above object, the present invention applies to a system in which a plurality of lower-level devices are connected to a higher-level device and operated hierarchically, each device has a means for counting the number of abnormalities that occur in it, and a device counted at or above a predetermined value by that means is regarded as failed, disconnected from the system, and replaced by switching to an alternative device so that operation continues. In such a system, the invention provides a totaling means that computes the sum of the abnormality counts of the individual devices, and a reference value for that sum above which a failure is assumed. When the value counted by the counting means for a device exceeds the predetermined value for that device, that device is judged to have failed and is disconnected from the system. When the total computed by the totaling means exceeds the reference value for the total, the higher-level device of those devices is judged to have failed and is switched to an alternative device.
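The two failure conditions can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `judge`, the dictionary layout, and the strict "greater than" comparison (following the word "exceeds") are assumptions made for illustration only.

```python
def judge(counts, limits, total_limit):
    """Apply both failure conditions for one higher-level device (sketch).

    counts:      abnormality count of each lower-level device
    limits:      per-device reference value
    total_limit: reference value for the sum of the lower-level counts
    Returns (lower-level devices to disconnect, whether to switch the higher-level device).
    """
    # Condition 1: a device whose own count exceeds its reference value has failed.
    faulty = [dev for dev, n in counts.items() if n > limits[dev]]
    # Condition 2: if the summed count exceeds the total reference value,
    # the higher-level device is judged to have failed and is switched over.
    switch_upper = sum(counts.values()) > total_limit
    return faulty, switch_upper

print(judge({"B": 2, "C": 2, "D": 2}, {"B": 3, "C": 3, "D": 3}, total_limit=5))
# ([], True)
```

Here no single device has exceeded its own limit, yet the summed count does, so only the higher-level device is flagged.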

(5) Embodiments of the Invention
The present invention is described in detail below by way of embodiments. Fig. 1 shows an example system configuration according to the present invention. In Fig. 1 there are a control device X; a device A connected to it and currently in use, together with its alternative device A′; and devices B, C, and D, which are connected to A and A′ and are controlled by the control device X through A or A′. In addition, counters FCNT (Ca, Ca′, Cb, Cc, Cd) of the number of abnormalities that have occurred in each of these devices, and reference values FB (Fa, Fa′, Fb, Fc, Fd, and Fbcd) for judging failure, are provided.

First, the conventional method of identifying a failed device is explained using the control flow of Fig. 2 as an example. When an abnormality occurs in device B and is detected, counter Cb is incremented, and when the abnormality count Cb exceeds the reference value Fb, device B is regarded as failed and disconnected from the system. However, if an abnormality occurs in device A, which is currently in use, and it is detected as abnormalities of devices B, C, and D connected to A, counters Cb, Cc, and Cd are incremented each time an abnormality is detected, and under the conventional method devices B, C, and D are each regarded as failed.
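A rough Python sketch of the conventional per-device rule of Fig. 2 makes the problem concrete. The patent gives no numeric values for Fig. 1, so the reference values of 3 are hypothetical, as is the error sequence that models a fault in A surfacing as errors in B, C, and D in turn.

```python
def conventional(errors, limits):
    """Fig. 2 style judgment (sketch): each device is judged by its own counter only."""
    counts = {dev: 0 for dev in limits}   # Cb, Cc, Cd
    disconnected = []
    for dev in errors:
        if dev in disconnected:
            continue  # assumption: a disconnected device is no longer accessed
        counts[dev] += 1
        if counts[dev] > limits[dev]:     # "exceeds the reference value"
            disconnected.append(dev)
    return disconnected

# A fault in A shows up as an error on every access to B, C, and D in turn.
print(conventional(["B", "C", "D"] * 4, {"B": 3, "C": 3, "D": 3}))
# ['B', 'C', 'D']
```

Although the fault lies in A, all three healthy lower-level devices end up disconnected.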

In the present invention, therefore, the failure reference values FB are set appropriately so that, for example, when the sum of counters Cb, Cc, and Cd exceeds the reference value Fbcd, device A, to which devices B, C, and D are connected, is regarded as failed and disconnected from the system, and operation continues after switching to the alternative device A′. This control flow is shown in Fig. 3. Thus, with the method of the present invention, by setting the values of Fa, Fa′, Fb, Fc, Fd, and Fbcd appropriately, even if an abnormality in device A causes devices B, C, and D connected to it to be regarded as abnormal, device A is judged to have failed before all of devices B, C, and D are disconnected as failed. Operation can then continue by switching to the alternative device A′, which improves the utilization efficiency of the system.
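The Fig. 3 rule can be sketched the same way, adding a running total compared against Fbcd. As before, the values (3 per device, Fbcd = 5) and the error sequence are hypothetical, and the tie-breaking and stopping behaviour noted in the docstring are assumptions, not details given in the patent.

```python
def proposed(errors, limits, total_limit, active="A", spare="A'"):
    """Fig. 3 style judgment (sketch): per-device counters plus a running total.

    Assumptions not stated in the patent: if one error trips both conditions,
    the lower-level device is disconnected first; a disconnected device is not
    accessed again; and after the switch to the spare the fault is gone, so
    the simulation stops there.
    """
    counts = {dev: 0 for dev in limits}   # Cb, Cc, Cd
    total = 0                             # running sum compared against Fbcd
    disconnected = []
    events = []
    for dev in errors:
        if dev in disconnected:
            continue
        counts[dev] += 1
        total += 1
        if counts[dev] > limits[dev]:
            disconnected.append(dev)
            events.append(f"disconnect {dev}")
        elif total > total_limit:
            events.append(f"switch {active} -> {spare}")
            break
    return events

print(proposed(["B", "C", "D"] * 4, {"B": 3, "C": 3, "D": 3}, total_limit=5))
# ["switch A -> A'"]
```

With the same error stream as the conventional case, A is switched to A′ on the sixth error, before any of B, C, or D has exceeded its own limit.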

Figs. 4 and 5 show other example system configurations to which the present invention is applied.

Fig. 4 is an example in which, instead of providing an alternative for the controlled device A as in Fig. 1, the control device itself is duplicated. Fig. 5 is an example in which the lower-level devices are not permanently connected to the higher-level devices; instead, the connections between them can be changed freely by a switch. The method of the present invention has the same effect in these examples as well. Likewise, however the various control counters are held, the same effect is obtained regardless of their configuration or the type of physical medium, as long as a separate reference value is provided for judging a failure of the higher-level device from the sum of the abnormality counts of the lower-level devices.
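Fig. 5 only says that the connections can be changed by a switch; how the totals are kept in that case is not specified. One possible reading, sketched here as an assumption, is to charge each error to whichever higher-level device the lower-level device is attached to at that moment. The class `SwitchedJudge`, its methods, and the device names A1 and A2 are hypothetical.

```python
from collections import defaultdict

class SwitchedJudge:
    """Hypothetical sketch for a Fig. 5 style system with switchable connections."""

    def __init__(self, limits, total_limit):
        self.limits = limits              # per-lower-device reference values
        self.total_limit = total_limit    # reference value for each higher-level sum
        self.attached = {}                # lower device -> current higher-level device
        self.counts = defaultdict(int)    # per-lower-device abnormality counters
        self.totals = defaultdict(int)    # per-higher-level-device sum counters

    def connect(self, lower, upper):
        # The switch changes which higher-level device a lower device uses.
        self.attached[lower] = upper

    def report_error(self, lower):
        upper = self.attached[lower]
        self.counts[lower] += 1
        self.totals[upper] += 1           # charge the error to the current upper device
        if self.counts[lower] > self.limits[lower]:
            return f"disconnect {lower}"
        if self.totals[upper] > self.total_limit:
            return f"switch {upper}"
        return None

judge = SwitchedJudge({"B": 3, "C": 3}, total_limit=4)
judge.connect("B", "A1")
judge.connect("C", "A1")
print([judge.report_error(d) for d in ["B", "C", "B", "C", "B"]])
# [None, None, None, None, 'switch A1']
```

Keeping one total per higher-level device, rather than per fixed group of lower devices, is what lets the judgment follow the switch.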

Next, the failed-device determination method of the present invention is explained using an example in which it is applied to a specific system. Fig. 6 shows the overall system configuration, in which CC is a central processing unit, MM is a main memory, CH is a channel device, and io is an input/output device; each io is controlled from a CC through a CH. MM and CC are each duplicated, and when the device in use is recognized as failed, the system switches to the alternative device and continues operation. MM and CH can each be connected to either CC.

Assume that there are counters C_CH0 and C_CH1 that count the number of abnormalities for CH0 and CH1, respectively, and a counter C_TOTAL that holds their sum. When these values exceed the reference values F_CH0, F_CH1, and F_TOTAL, respectively, a failure of CH0, CH1, or CC is recognized.

Fig. 7 shows how events unfold over time when a failure occurs in CH0 in such a system. In the figure, as time T passes, accesses to CH0 occur, and because CH0 has failed, each is detected as an abnormality. Each time this abnormality is detected, counter C_CH0 is incremented, and when its value exceeds the fixed value F_CH0, CH0 is recognized as failed and disconnected from the system.

That is, at this stage of the failure the switch is made at the level of the channel device CH, before the total count reaches F_TOTAL.
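The Fig. 7 case can be traced with a small Python sketch. The text gives reference values (3, 3, 4) only for the Fig. 8 example; reusing them here is an assumption, as is the name `first_action` and the choice to check the channel counter before the total.

```python
def first_action(errors, limits, total_limit):
    """Return (step, action) for the first judgment made, or None (sketch)."""
    counts = {ch: 0 for ch in limits}   # C_CH0, C_CH1
    total = 0                           # C_TOTAL
    for step, ch in enumerate(errors, start=1):
        counts[ch] += 1
        total += 1
        if counts[ch] > limits[ch]:
            return step, f"disconnect {ch}"
        if total > total_limit:
            return step, "switch CC0 -> CC1"
    return None

# Fig. 7: only accesses to CH0 fail.
print(first_action(["CH0"] * 6, {"CH0": 3, "CH1": 3}, total_limit=4))
# (4, 'disconnect CH0')
```

On the fourth error C_CH0 = 4 exceeds F_CH0 = 3, while C_TOTAL = 4 only equals F_TOTAL, so the channel, not the CC, is blamed. This outcome depends on F_TOTAL not being smaller than F_CH0.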

Next, Fig. 8 shows the case in which the failure is not in a channel but in the central processing unit itself or in the connection between CC₀ and the CHs. As time T passes, accesses to CH0 and CH1 occur, but because the CC itself or the CC-CH connection has failed, CH0 and CH1 are detected as abnormal even though they are normal. As a result, counters C_CH0, C_CH1, and C_TOTAL are incremented.

Now assume that the reference values F_CH0, F_CH1, and F_TOTAL are 3, 3, and 4, respectively.

In the conventional method, because there is no C_TOTAL, CH0 is regarded as failed at time t₀ and CH1 at time t₁, and both are disconnected from the system, leaving the io devices unusable. In the method of the present invention, by contrast, C_TOTAL and F_TOTAL are provided, so at time t_TOTAL the condition C_TOTAL > F_TOTAL holds: CC₀ is recognized as failed and the system switches to the alternative device CC₁, after which normal operation can continue.
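The comparison can be reproduced with a Python sketch under stated assumptions. The text does not give the exact access sequence of Fig. 8, so strictly alternating failed accesses to CH0 and CH1 are assumed, and the step numbers only stand in for t₀, t₁, and t_TOTAL. The function `simulate` is hypothetical.

```python
def simulate(errors, limits, total_limit=None):
    """List (step, action) events; total_limit=None models the conventional method."""
    counts = {ch: 0 for ch in limits}   # C_CH0, C_CH1 of channels still connected
    total = 0                           # C_TOTAL
    events = []
    for step, ch in enumerate(errors, start=1):
        if ch not in counts:
            continue  # assumption: a disconnected channel is no longer accessed
        counts[ch] += 1
        total += 1
        if counts[ch] > limits[ch]:
            del counts[ch]
            events.append((step, f"disconnect {ch}"))
        elif total_limit is not None and total > total_limit:
            events.append((step, "switch CC0 -> CC1"))
            break  # assumption: once CC1 is in use, no further errors occur
    return events

# Fig. 8: the fault in CC0 makes alternating accesses to CH0 and CH1 fail.
errors = ["CH0", "CH1"] * 4
limits = {"CH0": 3, "CH1": 3}
print(simulate(errors, limits))                 # conventional
# [(7, 'disconnect CH0'), (8, 'disconnect CH1')]
print(simulate(errors, limits, total_limit=4))  # with C_TOTAL / F_TOTAL
# [(5, 'switch CC0 -> CC1')]
```

Without the total, both healthy channels are lost. With it, the fifth error pushes C_TOTAL to 5 while C_CH0 is 3 and C_CH1 is 2, so CC₀ is switched out first.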

As described above, with the method of the present invention, when a device fails, the lower-level devices connected to it are prevented from being regarded as abnormal and disconnected; instead, the higher-level device is switched, so processing can continue normally.

(6) Effects of the Invention
According to the present invention, even when an abnormality in one device causes the devices connected below it to be regarded as abnormal, the higher-level device can be disconnected from the system as failed on the basis of the sum of the abnormality counts of the lower-level devices, and operation can continue using an alternative device. This has the effect of increasing the utilization efficiency of the system.

[Brief Description of the Drawings]

Fig. 1 is a system configuration diagram according to the present invention; Fig. 2 is a conventional control flow; Fig. 3 is the control flow of the present invention; Figs. 4 and 5 are configuration diagrams of other systems to which the present invention can be applied; Fig. 6 is a diagram of an embodiment with a specific system configuration to which the present invention is applied; Fig. 7 is a time chart and counter flow diagram for one example of a failure; and Fig. 8 is a time chart and counter flow diagram for another example of a failure.
X: control device; A, A′, B, C, D: devices; FCNT: counters of the number of abnormalities; FB: reference values.

Claims

[Scope of Claims]
1. A failed-device determination method for a system in which a plurality of lower-level devices are connected to a higher-level device and operated hierarchically, each device is provided with means for counting the number of abnormalities that have occurred, and a device whose count exceeds a predetermined value is regarded by said means as failed, disconnected from the system, and replaced by switching to an alternative device so that operation continues, the method being characterized by providing totaling means for computing the total number of abnormalities of the individual devices and a reference value for the total above which a failure is assumed, wherein when a value counted by the counting means for a device exceeds the predetermined value for that device, that device is judged to have failed and is disconnected from the system, and when the total computed by the totaling means exceeds the reference value for the total, the higher-level device of said devices is judged to have failed and is switched to an alternative device.