JP6087673B2

JP6087673B2 - Workaround execution management system and workaround execution management method

Info

Publication number: JP6087673B2
Application number: JP2013050952A
Authority: JP
Inventors: 弘晃京林
Original assignee: Hitachi Systems Ltd
Current assignee: Hitachi Systems Ltd
Priority date: 2013-03-13
Filing date: 2013-03-13
Publication date: 2017-03-01
Anticipated expiration: 2033-03-13
Also published as: JP2014178777A

Description

The present invention relates to a workaround execution management system and a workaround execution management method. When a service desk performs a workaround (primary response) to an incident detected by a failure monitoring device, the invention supports the confirmation of workarounds in an automatic work execution system, such as runbook automation, that executes operating procedures automatically. This reduces the risk in the initial period after such a system is introduced.

Monitored devices (customer computers, virtual machines) generate messages every day that describe matters and problems to be resolved. Recent computer systems use incident management systems that manage these messages as incidents, so that workarounds (primary responses) can be carried out quickly and accurately. Such an incident management system also treats customer inquiries and work requests as incidents and registers them. In this failure-monitoring operation, the service desk handles the incidents that arise daily from monitored devices and other sources. It contacts the relevant parties and performs primary responses, with the aim of minimizing the impact on business.

As customers' computer systems grow larger and more complex, the service desk must perform more workarounds, and their procedures become more complicated. As a countermeasure, automatic work execution systems such as runbook automation have been introduced to automate workarounds.

Patent Document 1 below describes a technique for automating workarounds. In that technique, each client is provided with the following units:
- an incident data extraction unit that, when a defect occurs, extracts incident information including the client's own identification information and time data;
- a checklist processing unit that determines whether a defect has occurred, based on determination condition data indicating predetermined defect determination conditions;
- a restoration command execution unit that, when a defect has occurred, executes a command associated in advance with the determination condition data in order to eliminate the defect;
- a transmission unit that sends the incident data and the command's execution result to a management server.

Minor defects are thereby repaired automatically.

Patent Document 1: JP 2009-181441 A

The technique of Patent Document 1 predefines recovery commands according to the nature of a failure and executes them automatically, which reduces the work performed by the service desk. This works well when established commands are executed in sequence. However, it is difficult to put complicated procedures into execution using techniques such as runbook automation.

In an incident management system, the hardware and software configuration of the computers differs from customer to customer. In particular, several tasks may branch and be executed depending on the result of a command. Even if primary responses for a wide variety of incidents are defined in advance, someone must check manually whether each of them can be executed normally. This confirmation work is burdensome, which makes it difficult to run such procedures with techniques such as runbook automation.

An automated procedure also carries a risk in the initial stage of its use. If no one monitors whether it behaves normally, and no recovery procedure is in place to restore service promptly when an abnormality occurs, quality may temporarily deteriorate. The service desk therefore has to review the automated procedures at regular intervals, which makes operation complicated and burdensome.

An object of the present invention is to provide a workaround execution management system and a workaround execution management method that support the confirmation of workarounds (primary responses) for a plurality of incidents. This reduces the risk in the initial period after an automatic work execution system is introduced.

The present invention defines three stages for a work flow that has completed testing and runs in the production environment:
- a first stage, in which the service desk monitors the operation of the work flow;
- a second stage, for work flows confirmed in the first stage to operate normally at or above a first success rate. In this stage the service desk is notified of both normal and abnormal results, and work flows confirmed to operate normally below the first success rate are lowered to the first stage;
- a third stage, for work flows confirmed in the second stage to operate normally at or above a second success rate. In this stage the service desk is notified only of abnormal results, and work flows confirmed to operate normally below the second success rate are lowered to the second stage.

The invention then executes the following steps for a work flow identified by its flow name:
- a first step of storing, in a work performance management database, performance information including the success rate in each of the first to third stages when the work flow is executed;
- a second step of notifying the service desk of both normal and abnormal results of work flows confirmed in the first stage to operate normally at or above the first success rate, and raising those work flows to the second stage;
- a third step of notifying the service desk of abnormal results of work flows confirmed in the second stage to operate normally at or above the second success rate. This step also raises work flows confirmed to operate normally at or above the first success rate to the third stage, and lowers work flows confirmed to operate normally below the second success rate to the first stage;
- a fourth step of lowering to the second stage any work flow confirmed in the third stage to operate normally below the second success rate.

The workaround execution management system and method according to the present invention support the confirmation of workarounds (primary responses) for a plurality of incidents. They can therefore reduce the risk in the initial period after an automatic work execution system is introduced.

FIG. 1 is a diagram showing a computer system including the workaround execution management system according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining the incident database according to the embodiment.
FIG. 3 is a diagram for explaining the work performance management database according to the embodiment.
FIG. 4 is a diagram showing an application image for explaining the principle of the present invention.
FIG. 5 is a diagram showing an example of the work performance management screen according to the embodiment.
FIG. 6 is a diagram showing an example of the incident list screen according to the embodiment.
FIG. 7 is a flowchart showing the workaround execution management operation according to the embodiment.

An embodiment of a workaround execution management system that applies the workaround execution management method of the present invention is described in detail below with reference to the drawings. The principle of the system is explained first.

[Principle]
In the prior art, the service desk 14 tests a work flow that implements a workaround (primary response) and releases it to the production environment. After that, the work flow also had to be confirmed as effective (running without errors) for every incident in production. For this, the service desk 14 had to monitor continuously, for a certain period, whether the work flow operates normally.

In contrast, the present invention verifies the workaround (primary response), which is the response case assigned to an incident, in stages. It defines three verification stages, Step 1 to Step 3. As shown in the application image of FIG. 4, the verification levels are:
- Step 1: the service desk 14 has finished testing the work flow and released it to the production environment.
- Step 2: following Step 1, the work flow has been executed on real equipment at least a predetermined number of times, with at least a certain success rate (the first success rate).
- Step 3: following Step 2, the work flow has again been executed at least a predetermined number of times, with at least a certain success rate (the second success rate).

The number of verification stages is not limited to three and may be larger. The success rates and/or the predetermined number of executions may also be made changeable from outside, for example from the service desk or an administrator's computer.

In Step 1 the work flow has no execution record yet, so the service desk executes the work flow manually using the automatic work execution system 11. If the work flow fails, the service desk 14 can respond immediately.

In Step 2, the workaround execution management system 8 starts the automatic work execution system 11, which executes the work flow automatically. The service desk 14 is notified of the result by e-mail or similar means, even when the result is normal.

In Step 3, the workaround execution management system 8 starts the automatic work execution system 11, which executes the work flow automatically. The service desk 14 is notified by e-mail or similar means only when the result is abnormal, so almost no service desk effort is required.

If, in Step 2 or Step 3, the work flow does not succeed at or above the required success rate, the current Step level is lowered (Step 2 → Step 1, Step 3 → Step 2).

In summary, the workaround execution management system according to the present invention verifies a work flow that implements a workaround (primary response) against many incidents in the production environment. It defines the following Steps as verification levels for the work flow's operation:
- Step 1: the work flow has been released to the production environment, and the service desk 14 monitors its operation.
- Step 2: following Step 1, the work flow has achieved at least a certain success rate (the first success rate) on real equipment. The service desk 14 is notified of both normal and abnormal execution results.
- Step 3: following Step 2, the work flow has achieved at least a certain success rate (the second success rate). The service desk 14 is notified only when execution is abnormal.

If, in Step 2 or Step 3, a work flow does not succeed at or above the required success rate, its current Step level is lowered. When work is automated, the system can thereby verify the operation of work flows that implement workarounds (primary responses) efficiently, without requiring the service desk 14 to monitor them constantly. In this embodiment, each Step is also called a verification stage.
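The three verification stages can be summarized as a small policy table. The following Python sketch is illustrative only. The enum `Step`, its member names, and the helper functions are hypothetical and are not defined in this patent. They encode who runs the work flow in each Step and when the service desk 14 is told about the result.

```python
from enum import IntEnum


class Step(IntEnum):
    """The three verification stages (names are hypothetical)."""
    MANUAL = 1               # Step 1: desk runs the work flow by hand and watches it
    AUTO_NOTIFY_ALL = 2      # Step 2: automatic run, every result is reported
    AUTO_NOTIFY_FAILURE = 3  # Step 3: automatic run, only abnormal results are reported


def runs_automatically(step: Step) -> bool:
    """Steps 2 and 3 start the automatic work execution system 11."""
    return step >= Step.AUTO_NOTIFY_ALL


def alert_before_run(step: Step) -> bool:
    """Step 1 sounds the patrol lamp 13 so the desk runs the flow manually."""
    return step is Step.MANUAL


def notify_after_run(step: Step, succeeded: bool) -> bool:
    """Whether a finished run is reported to the service desk 14."""
    if step is Step.MANUAL:
        return False  # the desk performed the run itself, so no separate report
    if step is Step.AUTO_NOTIFY_ALL:
        return True   # Step 2 reports normal and abnormal results alike
    return not succeeded  # Step 3 reports abnormal results only


# Usage: print the policy for every stage.
for step in Step:
    print(f"Step {int(step)}: auto={runs_automatically(step)}, "
          f"lamp={alert_before_run(step)}, "
          f"notify_ok={notify_after_run(step, True)}, "
          f"notify_ng={notify_after_run(step, False)}")
```

Using an `IntEnum` keeps the stages ordered, so "raise" and "lower" become simple integer comparisons.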

[Configuration]
FIG. 1 shows a computer system that includes the workaround execution management system according to this embodiment. It comprises a data center 1 and a monitoring center 5:
- The data center 1 houses a failure monitoring device 3 connected to a plurality of monitored devices 2 (customer computers).
- The monitoring center 5 is connected to the data center 1 via a network 4. It manages failure messages from the failure monitoring device 3 as incidents and tracks the progress of primary responses to them.

The monitoring center 5 comprises the following components.
(1) An incident database 7. For each incident ID uniquely assigned to an incident, it stores the occurrence date and time, host name, originating system, customer name, message, response case ID, and flow name of the response case.
(2) An incident management system 6 that automatically registers incidents received from the failure monitoring device 3 via the network 4 in the incident database 7.
(3) A work performance management database 9. For each response case ID uniquely assigned to a response case, it stores the response case and performance information for each verification stage (Step 1 to Step 3), such as success rate, failure rate, number of executions, and number of failures.
(4) An automatic work execution system 11 that executes the processing of each processing step, and a workaround execution management system 8. System 8 is started when an incident is registered in the incident database 7. Based on the contents of the incident database 7 and the work performance management database 9, it decides whether to sound the patrol lamp (signal light) 13 or to start the automatic work execution system 11.
(5) A work performance management screen 10 for checking the execution results (performance information) stored in the work performance management database 9, which the workaround execution management system 8 maintains.
(6) A computer of the service desk 14. The service desk refers to registered incidents and performs primary responses according to the response procedure manual for the response case attached to each incident.
(7) An execution log 12, a database that stores the results of the work flows 15 (the processing steps executed by the automatic work execution system 11).

As in any general computer system, these components are implemented with hardware and software. The hardware includes computers and servers with a CPU, memory, input/output devices, magnetic disk devices, and a display unit, as well as databases. The patrol lamp 13 is not limited to a signal light. It may be any other means, such as e-mail, that can issue a warning to the service desk 14.

The computer system configured in this way operates as follows.
1. When a monitored device 2 generates a failure message (incident message), the failure monitoring device 3 of the data center 1 sends it to the incident management system 6 of the monitoring center 5.
2. On receiving the message, the incident management system 6 automatically registers the incident in the incident database 7.
3. Triggered by this registration, the workaround execution management system 8 decides whether to sound the patrol lamp 13 or to start the automatic work execution system 11. The decision is based on the contents of the incident database 7 and the work performance management database 9, using the processing described later.
4. When the patrol lamp 13 is sounded, the service desk 14 executes the work flow 15 manually using the automatic work execution system 11. If the result is NG (failure), the service desk 14 takes appropriate countermeasures.
5. When the automatic work execution system 11 is started, it executes the work flow 15 automatically and records the execution result (performance information) in the execution log 12.
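The trigger relationship above, where registration in the incident database 7 starts the workaround execution management system 8, can be sketched as a callback that runs on registration. This is a minimal in-memory illustration under stated assumptions. The class `IncidentDatabase`, the function `make_workaround_handler`, the record keys, and the rule that a case with no record starts at Step 1 are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class IncidentDatabase:
    """Hypothetical in-memory stand-in for the incident database 7."""
    rows: Dict[str, dict] = field(default_factory=dict)
    # Callbacks invoked after each registration (the trigger for system 8).
    on_register: List[Callable[[str], None]] = field(default_factory=list)

    def register(self, incident_id: str, record: dict) -> None:
        """What the incident management system 6 does on receiving a message."""
        self.rows[incident_id] = record
        for callback in self.on_register:
            callback(incident_id)


def make_workaround_handler(db: IncidentDatabase,
                            current_step: Dict[str, int],
                            actions: List[str]) -> Callable[[str], None]:
    """Hypothetical handler for system 8: patrol lamp or automatic run."""
    def handle(incident_id: str) -> None:
        case_id = db.rows[incident_id]["case_id"]
        # Assumption: a case with no record yet starts at Step 1.
        step = current_step.get(case_id, 1)
        if step == 1:
            actions.append(f"{incident_id}: sound patrol lamp 13")
        else:
            actions.append(f"{incident_id}: start automatic work execution system 11")
    return handle


# Usage: case C001 is already at Step 2; case C002 has no record yet.
actions: List[str] = []
db = IncidentDatabase()
db.on_register.append(make_workaround_handler(db, {"C001": 2}, actions))
db.register("INC-1", {"case_id": "C001", "flow_name": "restart_service"})
db.register("INC-2", {"case_id": "C002", "flow_name": "clear_temp_files"})
print(actions)
```

A production system would use a real database trigger or message queue rather than in-process callbacks. The callback only shows the ordering: register first, then decide.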

The incident database 7 stores the IncidentTable. As shown in FIG. 2, the table consists of the following items.
(1) Response case ID, numbered automatically by the incident management system for each message received from a monitored device 2.
(2) Date and time when the incident occurred.
(3) Host name of the monitored device on which the incident occurred.
(4) Originating system: the application system running on the monitored device.
(5) Name of the customer who uses the monitored device.
(6) Actual message received from the monitored device.
(7) Response case (flow name) for resolving the failure behind the incident.
(8) Flow name of the work flow 15 executed by the automatic work execution system 11.
(9) Execution status.
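For illustration, one IncidentTable row could be modelled as a Python dataclass with one field per item (1) to (9). The class name, field names, types, and sample values are assumptions. The patent does not specify a schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class IncidentRow:
    """One IncidentTable row; field names and types are assumptions."""
    case_id: str            # (1) ID numbered for each received message
    occurred_at: datetime   # (2) occurrence date and time
    host_name: str          # (3) host name of the monitored device
    system: str             # (4) originating application system
    customer: str           # (5) customer name
    message: str            # (6) actual message received
    response_case: str      # (7) response case (flow name)
    flow_name: str          # (8) flow name of work flow 15
    status: str = ""        # (9) execution status, empty until processing starts


# Hypothetical sample values.
row = IncidentRow(
    case_id="C001",
    occurred_at=datetime(2013, 3, 13, 9, 30),
    host_name="web01",
    system="order-entry",
    customer="Example Corp.",
    message="httpd process not found",
    response_case="Restart web service",
    flow_name="restart_httpd",
)
# Step 704 sets this status when a Step 1 case is handed to the desk.
row.status = "Manual execution (Step 1)"
```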

The work performance management database 9 stores operational performance information for each response case assigned to incidents:
- the case's current Step;
- for each Step, the number of executions, number of successes, number of failures, success rate, and failure rate.
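Below is a hypothetical sketch of one record of the work performance management database 9. Here the failure count and both rates are derived from the execution and success counters rather than stored separately. That is a design choice of this sketch, not something the patent states. The names `CaseRecord` and `StepStats` are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StepStats:
    """Counters for one Step (hypothetical)."""
    executions: int = 0
    successes: int = 0

    @property
    def failures(self) -> int:
        return self.executions - self.successes

    @property
    def success_rate(self) -> float:
        # Percentage; 0.0 before the first execution (an assumption).
        return 100.0 * self.successes / self.executions if self.executions else 0.0

    @property
    def failure_rate(self) -> float:
        return 100.0 * self.failures / self.executions if self.executions else 0.0


@dataclass
class CaseRecord:
    """One response case: its current Step and per-Step counters."""
    case_id: str
    response_case: str
    current_step: int = 1
    stats: Dict[int, StepStats] = field(
        default_factory=lambda: {s: StepStats() for s in (1, 2, 3)})

    def record(self, succeeded: bool) -> None:
        """Add one execution result to the counters of the current Step."""
        st = self.stats[self.current_step]
        st.executions += 1
        st.successes += int(succeeded)


rec = CaseRecord(case_id="C001", response_case="Restart web service")
for ok in (True, True, True, False):
    rec.record(ok)
s = rec.stats[1]
print(s.executions, s.successes, s.failures, s.success_rate, s.failure_rate)
# prints: 4 3 1 75.0 25.0
```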

The work performance management screen 10 displays the verification status of each verification stage (Step). The status is shown in two sections, as in FIG. 5:
- "Current status": for each case ID, the response case, the current Step, and, for each Step, the number of executions, successes, and failures and the success and failure rates (operational performance information).
- "Execution results for the past week": for each incident ID, the response case, the execution category, and the execution result.
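The "Execution results for the past week" section amounts to filtering execution results to a seven-day window. This is a minimal sketch that assumes each result carries a completion timestamp. The class `ExecutionResult`, the function `past_week`, and the example values "manual" and "automatic" for the execution category are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class ExecutionResult:
    incident_id: str
    response_case: str
    mode: str            # execution category; "manual"/"automatic" are assumed values
    succeeded: bool
    finished_at: datetime


def past_week(results: List[ExecutionResult], now: datetime) -> List[ExecutionResult]:
    """Rows for the past-week section, newest first."""
    cutoff = now - timedelta(days=7)
    recent = [r for r in results if cutoff <= r.finished_at <= now]
    return sorted(recent, key=lambda r: r.finished_at, reverse=True)


now = datetime(2013, 3, 13, 12, 0)
rows = [
    ExecutionResult("INC-1", "C001", "manual", True, now - timedelta(days=1)),
    ExecutionResult("INC-2", "C001", "automatic", False, now - timedelta(days=8)),
    ExecutionResult("INC-3", "C002", "automatic", True, now - timedelta(hours=2)),
]
print([r.incident_id for r in past_week(rows, now)])  # ['INC-3', 'INC-1']
```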

As shown in FIG. 6, the incident list screen shows the following for each incident ID:
- the date and time when the incident occurred;
- the host name of the monitored device on which it occurred;
- the originating system (the application system running on that device);
- the name of the customer who uses the device;
- the actual message received from the device;
- the response case (flow name) for resolving the failure;
- the execution status.

[Operation]
The workaround execution management system configured as described above executes the following steps, as shown in FIG. 7.
(1) Step 701: the incident management system 6 starts the workaround execution management system 8 automatically when an incident is registered. System 8 then extracts from the incident database 7 the incident's response case ID and the flow name defined for that response case.
(2) Step 702: obtain the current Step (verification level) from the work performance management database 9, using the response case ID obtained in step 701.
(3) Step 703: determine whether the current verification level obtained in step 702 is Step 1, Step 2, or Step 3.

(4) Step 704: when step 703 determines verification level Step 1, sound the patrol lamp 13 to prompt the service desk 14 to execute the work flow manually. The workaround execution management system 8 also updates the execution status on the incident list screen (FIG. 6) to "Manual execution (Step 1)".
(5) Step 705: wait until the series of work started in step 704 has finished.
(6) Step 706: monitor the execution log 12 for the manual execution by the service desk started in step 704, wait until the work has finished, and obtain the execution result.
(7) Step 707: when step 703 determines Step 2, instruct the workaround execution management system 8 to execute the work flow.
(8) Step 708: wait until the execution started in step 707 is complete.

(9) Step 709: after execution completes, obtain the result and notify the service desk of it, whether the work flow's result is normal or abnormal.
(10) Step 710: when step 703 determines Step 3, instruct the workaround execution management system 8 to execute the work flow.
(11) Step 711: wait until execution is complete.
(12) Step 712: after the execution started in step 710 completes, obtain the result and determine whether it is normal or abnormal.
(13) Step 713: when step 712 determines that the result is abnormal, notify the service desk that an abnormality has occurred.
(14) Step 714: record the execution result for the current Step in the work performance management database 9. This happens after step 706, step 709, or step 713, or when step 712 determines the result to be normal.
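Steps 701 to 714 can be condensed into one handler per incident. The sketch below is an illustration built on several assumptions, not the patent's implementation:
- `WorkaroundManager`, its fields, and the `run_manually` and `run_automatically` callables are hypothetical. The two callables stand in for the service desk 14 and the automatic work execution system 11.
- The waiting in steps 705, 708, and 711 is modelled as a blocking call.
- The status update on the incident list screen in step 704 is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Runner = Callable[[str], bool]  # takes a flow name, returns True on success


@dataclass
class WorkaroundManager:
    """Hypothetical sketch of steps 701 to 714 for one incident."""
    incidents: Dict[str, Tuple[str, str]]  # incident ID -> (case ID, flow name)
    current_step: Dict[str, int]           # case ID -> current Step (database 9)
    history: List[Tuple[str, int, bool]] = field(default_factory=list)
    notices: List[str] = field(default_factory=list)

    def handle(self, incident_id: str,
               run_manually: Runner, run_automatically: Runner) -> bool:
        case_id, flow_name = self.incidents[incident_id]    # step 701
        step = self.current_step.get(case_id, 1)            # steps 702-703
        if step == 1:
            # Step 704: patrol lamp; steps 705-706: wait for the desk's result.
            self.notices.append(f"patrol lamp: run {flow_name} manually")
            ok = run_manually(flow_name)
        else:
            ok = run_automatically(flow_name)               # steps 707-708 / 710-711
            if step == 2:
                self.notices.append(f"{flow_name}: {'OK' if ok else 'NG'}")  # step 709
            elif not ok:
                self.notices.append(f"{flow_name}: NG")     # steps 712-713
        self.history.append((case_id, step, ok))            # step 714
        return ok


def always_ok(flow: str) -> bool:
    return True


def always_ng(flow: str) -> bool:
    return False


mgr = WorkaroundManager(
    incidents={"INC-1": ("C001", "restart_httpd"),
               "INC-2": ("C002", "clear_temp_files"),
               "INC-3": ("C003", "rotate_logs")},
    current_step={"C001": 3, "C003": 2},
)
mgr.handle("INC-1", run_manually=always_ng, run_automatically=always_ok)
mgr.handle("INC-2", run_manually=always_ng, run_automatically=always_ok)
mgr.handle("INC-3", run_manually=always_ok, run_automatically=always_ng)
print(mgr.notices)
```

The Step 3 success (INC-1) produces no notice, which is the whole point of that stage. The Step 1 and Step 2 cases each reach the desk.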

(15) Step 715: obtain the execution results for the current Step from the work performance management database 9. Determine whether the number of executions is 5 or more and the success rate is 80% or more.
(16) Step 716: when step 715 determines that the number of executions is 5 or more and the success rate is 80% or more, raise the "current Step" of the case being processed by one level.
(17) Step 717: when step 715 determines that this condition is not satisfied, obtain the execution results for the current Step and determine whether the failure rate is 21% or more. If it is not, end the processing.
(18) Step 718: when step 717 determines that the failure rate is 21% or more, lower the current Step (verification level) and end the processing.
Through these steps, the work that requires the service desk 14 changes according to the operating record of each work flow, so the scope of automation expands in stages.
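The promotion and demotion rule of steps 715 to 718 can be expressed as a pure function. The thresholds (5 executions, 80% success rate, 21% failure rate) come from the text above. They are kept as constants, in line with the earlier remark that they may be changeable. Two behaviours are assumptions that the patent does not address: clamping the result to Step 1 through Step 3, and leaving a Step with no executions unchanged. `StepStats` and `next_step` are hypothetical names.

```python
from dataclasses import dataclass

MIN_EXECUTIONS = 5            # step 715
PROMOTE_SUCCESS_RATE = 80.0   # percent, step 715
DEMOTE_FAILURE_RATE = 21.0    # percent, step 717
LOWEST_STEP, HIGHEST_STEP = 1, 3  # clamping bounds (assumption)


@dataclass
class StepStats:
    executions: int
    successes: int


def next_step(current: int, stats: StepStats) -> int:
    """Return the Step after applying steps 715 to 718 to the current Step's stats."""
    if stats.executions == 0:
        return current  # assumption: nothing recorded yet, so no change
    success_rate = 100.0 * stats.successes / stats.executions
    failure_rate = 100.0 * (stats.executions - stats.successes) / stats.executions
    if stats.executions >= MIN_EXECUTIONS and success_rate >= PROMOTE_SUCCESS_RATE:
        return min(current + 1, HIGHEST_STEP)   # step 716: raise one level
    if failure_rate >= DEMOTE_FAILURE_RATE:
        return max(current - 1, LOWEST_STEP)    # step 718: lower one level
    return current                              # step 717: leave unchanged


print(next_step(1, StepStats(executions=5, successes=4)))  # 2 (80% over 5 runs)
print(next_step(2, StepStats(executions=4, successes=3)))  # 1 (25% failures)
print(next_step(2, StepStats(executions=4, successes=4)))  # 2 (too few runs to promote)
```

Because the step 717 check is reached whenever the step 715 condition fails, a single early failure (for example 1 of 4 runs) is already enough to demote under this reading.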

For a work flow in the production environment, this workaround execution management system defines three verification levels, applied across many incidents:
- Step 1: the work flow has been released to the production environment, and the service desk 14 monitors its operation.
- Step 2: following Step 1, the work flow has achieved at least a certain success rate (the first success rate) on real equipment. The service desk 14 is notified of both normal and abnormal execution results.
- Step 3: following Step 2, the work flow has achieved at least a certain success rate (the second success rate). The service desk 14 is notified only when execution is abnormal.

In Step 2 or Step 3, the stage of any work flow whose success rate falls below the first or second success rate is lowered. The work flows that implement workarounds (primary responses) for automated work can thereby be verified efficiently.

1 data center, 2 monitored device, 3 failure monitoring device,
4 network, 5 monitoring center, 6 incident management system,
7 incident database, 8 workaround execution management system,
9 work performance management database, 10 work performance management screen,
11 automatic work execution system, 12 execution log, 13 patrol lamp,
14 service desk, 15 work flow

Claims

An incident database that stores a message corresponding to an incident ID uniquely assigned to an incident and a flow name of a response case, an incident management system that registers a received incident in the incident database, and a work flow with the flow name are executed Confirming the work automatic execution system, the work result management database storing the result information including the success rate for each verification stage when the work flow of the flow name is executed, and the result information stored in the work result management database A work-around execution management system that displays a work result management screen and is connected to a service desk computer that controls the execution of the work flow, and manages the execution of the incident response work flow,
a first stage, in which the service desk monitors the operation of a tested workflow in the production environment; a second stage, in which the service desk is notified of both normal and abnormal executions of a workflow whose rate of normal operation in the first stage has been confirmed to be at or above a first success rate, and from which a workflow whose rate of normal operation is confirmed to be below the first success rate is lowered to the first stage; and a third stage, in which the service desk is notified of abnormal executions of a workflow whose rate of normal operation in the second stage has been confirmed to be at or above a second success rate, and from which a workflow whose rate of normal operation is confirmed to be below the second success rate is lowered to the second stage;
the system executing a first step of storing, in the work results management database, performance information including the success rate for each of the first to third stages obtained when the workflow having the flow name is executed;
a second step of notifying the service desk of both normal and abnormal executions of a workflow whose rate of normal operation in the first stage is confirmed to be at or above the first success rate, and raising that workflow to the second stage;
a third step of notifying the service desk of abnormal executions of a workflow whose rate of normal operation in the second stage is confirmed to be at or above the second success rate and raising that workflow to the third stage, and of lowering to the first stage a workflow whose rate of normal operation in the second stage is confirmed to be below the first success rate; and
a fourth step of lowering to the second stage a workflow whose rate of normal operation in the third stage is confirmed to be below the second success rate.
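The first to fourth steps of this claim amount to a per-workflow state machine driven by per-stage success rates. Below is a minimal Python sketch under stated assumptions: the `StageStats`, `Workflow` and `next_stage` names, the workflow name and the 0.9 / 0.98 thresholds are hypothetical, and clearing the counters of the stage a workflow enters is this sketch's own choice, since the claim does not say whether earlier results carry over.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StageStats:
    """Performance information for one verification stage (hypothetical layout)."""
    executions: int = 0
    successes: int = 0

    @property
    def success_rate(self) -> float:
        # Guarded division; reevaluate() never judges a stage with zero runs.
        return self.successes / self.executions if self.executions else 0.0


def next_stage(stage: int, rate: float, first_rate: float, second_rate: float) -> int:
    """Apply the second to fourth steps to one workflow's current stage."""
    if stage == 1:
        # Second step: promote to stage 2 at or above the first success rate.
        return 2 if rate >= first_rate else 1
    if stage == 2:
        if rate >= second_rate:
            return 3  # third step: promote to stage 3
        if rate < first_rate:
            return 1  # third step: lower back to stage 1
        return 2
    # Fourth step: lower stage 3 to stage 2 below the second success rate.
    return 2 if rate < second_rate else 3


@dataclass
class Workflow:
    name: str
    stage: int = 1
    stats: Dict[int, StageStats] = field(
        default_factory=lambda: {1: StageStats(), 2: StageStats(), 3: StageStats()}
    )

    def record(self, succeeded: bool) -> None:
        """First step: store one execution result against the current stage."""
        current = self.stats[self.stage]
        current.executions += 1
        if succeeded:
            current.successes += 1

    def reevaluate(self, first_rate: float, second_rate: float) -> int:
        """Move the workflow to the stage its current success rate calls for."""
        current = self.stats[self.stage]
        if current.executions == 0:
            return self.stage  # nothing observed in this stage yet
        new_stage = next_stage(self.stage, current.success_rate, first_rate, second_rate)
        if new_stage != self.stage:
            # Assumption: each stay in a stage is judged on fresh results.
            self.stats[new_stage] = StageStats()
            self.stage = new_stage
        return self.stage


wf = Workflow("restart-web-service")  # hypothetical workflow name
for ok in [True] * 9 + [False]:
    wf.record(ok)
wf.reevaluate(first_rate=0.9, second_rate=0.98)  # 9/10 runs succeeded: promoted
```

Without a minimum execution count, a single successful run would already give a 100% success rate and trigger promotion; the predetermined number of executions in claim 2 guards against this.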

The workaround execution management system according to claim 1, wherein the performance information in the work results management database includes the number of executions of the workflow in each verification stage, and the second to fourth steps are performed once the number of executions has reached a predetermined number, so that the first and second success rates are judged only over at least that many executions.

The workaround execution management system according to claim 1 or 2, having a function of editing the values of the first and second success rates and/or the predetermined number of executions.
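Claims 2 and 3 add two refinements: success rates are judged only once a predetermined number of executions has accumulated, and the thresholds themselves are editable. A hypothetical Python sketch of both follows; `JudgmentSettings`, `judged_rate`, the default values and the range checks are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JudgmentSettings:
    """Editable values named in claims 2 and 3 (defaults are hypothetical)."""
    first_rate: float = 0.90
    second_rate: float = 0.98
    min_executions: int = 20

    def edit(self, first_rate: Optional[float] = None,
             second_rate: Optional[float] = None,
             min_executions: Optional[int] = None) -> None:
        """Update any subset of the values; nothing changes if a check fails."""
        new_first = self.first_rate if first_rate is None else first_rate
        new_second = self.second_rate if second_rate is None else second_rate
        new_min = self.min_executions if min_executions is None else min_executions
        for name, rate in (("first_rate", new_first), ("second_rate", new_second)):
            if not 0.0 <= rate <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1")
        if new_min < 1:
            raise ValueError("min_executions must be at least 1")
        self.first_rate, self.second_rate, self.min_executions = new_first, new_second, new_min


def judged_rate(executions: int, successes: int,
                settings: JudgmentSettings) -> Optional[float]:
    """Success rate used for stage decisions, or None while too few runs exist.

    None means the second to fourth steps are skipped for now.
    """
    if executions < settings.min_executions:
        return None
    return successes / executions


settings = JudgmentSettings()
settings.edit(min_executions=5)  # e.g. an operator lowers the required count
```

Validating every new value before assigning any of them keeps the settings consistent even when an edit is rejected part-way.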

The workaround execution management system according to any one of claims 1 to 3, further comprising a revolving warning lamp that alerts the service desk computer to an abnormality of a workflow in the third step, wherein the warning is given by means of the revolving lamp.
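In claim 4, an abnormal run of a workflow in the third step both reaches the service desk and lights the revolving lamp. The sketch below passes the notification and lamp drivers in as plain callables, because the patent does not describe how the lamp is controlled; `on_third_stage_result` and the message text are hypothetical.

```python
from typing import Callable


def on_third_stage_result(workflow: str, succeeded: bool,
                          notify_desk: Callable[[str], None],
                          lamp_on: Callable[[], None]) -> bool:
    """Handle one run of a workflow that is already in the third stage.

    Normal runs are silent; an abnormal run is reported to the service desk
    and also signalled with the revolving lamp. Returns True if an alert was raised.
    """
    if succeeded:
        return False
    notify_desk(f"{workflow}: abnormal termination")
    lamp_on()
    return True


messages = []
lamp_events = []
on_third_stage_result("restart-web-service", False,
                      messages.append, lambda: lamp_events.append("on"))
```

In an actual installation, `lamp_on` would wrap whatever control interface the installed lamp provides.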

A workaround execution management method for a computer system that is connected to an incident database that stores, for each incident, a message associated with an incident ID uniquely assigned to the incident together with the flow name of a response case; an incident management system that registers received incidents in the incident database; an automatic work execution system that executes the workflow having the flow name; a work results management database that stores performance information, including the success rate for each verification stage, obtained when the workflow having the flow name is executed; and a service desk computer that displays a work results management screen for checking the performance information stored in the work results management database and that controls execution of the workflow, the computer system managing execution of incident-response workflows, the method comprising, in the computer system:
defining a first stage, in which the service desk monitors the operation of a tested workflow in the production environment; a second stage, in which the service desk is notified of both normal and abnormal executions of a workflow whose rate of normal operation in the first stage has been confirmed to be at or above a first success rate, and from which a workflow whose rate of normal operation is confirmed to be below the first success rate is lowered to the first stage; and a third stage, in which the service desk is notified of abnormal executions of a workflow whose rate of normal operation in the second stage has been confirmed to be at or above a second success rate, and from which a workflow whose rate of normal operation is confirmed to be below the second success rate is lowered to the second stage;
executing a first step of storing, in the work results management database, performance information including the success rate for each of the first to third stages obtained when the workflow having the flow name is executed;
a second step of notifying the service desk of both normal and abnormal executions of a workflow whose rate of normal operation in the first stage is confirmed to be at or above the first success rate, and raising that workflow to the second stage;
a third step of notifying the service desk of abnormal executions of a workflow whose rate of normal operation in the second stage is confirmed to be at or above the second success rate and raising that workflow to the third stage, and of lowering to the first stage a workflow whose rate of normal operation in the second stage is confirmed to be below the first success rate; and
a fourth step of lowering to the second stage a workflow whose rate of normal operation in the third stage is confirmed to be below the second success rate.

The workaround execution management method according to claim 5, wherein the performance information in the work results management database includes the number of executions of the workflow in each verification stage, and the second to fourth steps are executed once the number of executions has reached a predetermined number, so that the first and second success rates are judged only over at least that many executions.

The workaround execution management method according to claim 5 or 6, wherein a function of editing the values of the first and second success rates and/or the predetermined number of executions is executed.

The workaround execution management method according to any one of claims 5 to 7, wherein a revolving warning lamp that alerts the service desk computer to an abnormality of a workflow in the third step is provided, and the warning is given by means of the revolving lamp.