JP7243402B2

JP7243402B2 - DOCUMENT PROCESSING METHOD, DOCUMENT PROCESSING PROGRAM AND INFORMATION PROCESSING DEVICE

Info

Publication number: JP7243402B2
Application number: JP2019075907A
Authority: JP
Inventors: 孝広齊藤
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2019-04-11
Filing date: 2019-04-11
Publication date: 2023-03-22
Anticipated expiration: 2039-04-11
Also published as: JP2020173673A

Description

本発明は、文書処理方法等に関する。 The present invention relates to a document processing method and the like.

フィールドで稼働する製品に障害が発生した場合、保守作業員は、対象製品の修理を行うと共に、障害内容を記載した障害レポート（ＭＲ：Maintenance Report）を作成し、保守管理部門等に報告する。 When a failure occurs in a product operating in the field, a maintenance worker repairs the target product, prepares a failure report (MR) describing the details of the failure, and reports it to the maintenance management department.

保守管理部門では、報告された複数のＭＲに対して分析を行い、たとえば、発生件数の多い障害を検出する。保守管理部門では、検出した発生件数の多い障害内容および対策を文書化し、フィールドに周知させることで、製品に起こり得る障害を未然に防止する。 The maintenance department analyzes a plurality of reported MRs to detect, for example, failures that occur frequently. The maintenance management department documents the content of the most frequently detected failures and countermeasures, and disseminates them to the field to prevent failures that may occur in products.

製品を販売する会社は、上記のような取り組みを繰り返し実行することで、製品の保守性や品質を向上させている。かかる取り組みのサイクルを迅速に行うため、現状では人手に頼っているＭＲの分析作業をＡＩ（artificial intelligence）を用いて効率化することが求められている。 A company that sells products improves the maintainability and quality of its products by repeatedly implementing the above-described measures. In order to speed up the cycle of such efforts, there is a need to use AI (artificial intelligence) to improve the efficiency of MR analysis work, which currently relies on human labor.

これまで、ＭＲの記述内容は、表記揺れや同義語、類義語を含んでおり、同一の障害のＭＲであるか否かの判断を、コンピュータが実行することは難しかった。しかし、単語の分散表現を活用することで、単語をベクトル化し、表記揺れや同義語、類義語を含むＭＲを対応付けることが可能となっている。たとえば、各単語のベクトルを重み付き合成によって文書（ＭＲ）のベクトルを算出し、文書間の類似性を定量化する従来技術もある。 MR descriptions contain spelling variations, synonyms, and near-synonyms, so it has been difficult for a computer to determine whether two MRs describe the same failure. However, distributed representations of words make it possible to vectorize words and to associate MRs that contain spelling variations, synonyms, and near-synonyms. For example, one conventional technique calculates a document (MR) vector as a weighted combination of its word vectors and quantifies the similarity between documents.

また、文書のベクトルを用いて、文書間の類似度を算出し、類似度の高い文書同士を同一のクラスタに分類する従来技術（クラスタリング）がある。たとえば、各ＭＲの類似度を基にして、クラスタリングを行い、所定数以上のＭＲが含まれるクラスタを、発生件数の多い障害に対応するＭＲのクラスタとして見なすことが可能である。 There is also a conventional technique (clustering) that calculates similarity between documents using vectors of documents and classifies documents with high similarity into the same cluster. For example, it is possible to perform clustering based on the similarity of each MR, and regard a cluster containing a predetermined number or more of MRs as a cluster of MRs corresponding to failures that occur frequently.
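The clustering step described above can be sketched as follows. This is a minimal illustration, not the conventional technique itself: the cosine similarity measure, the single-link grouping via union-find, and the names `frequent_clusters`, `sim_threshold`, and `min_size` are all assumptions made for the example.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def frequent_clusters(vectors, sim_threshold=0.9, min_size=3):
    """Group documents whose similarity reaches sim_threshold and keep
    only clusters holding at least min_size documents.

    vectors: dict mapping a document ID to its document vector.
    Grouping is single-link (an assumption made for this sketch).
    """
    parent = {doc_id: doc_id for doc_id in vectors}

    def find(doc_id):
        # Follow parent links to the cluster root, compressing the path.
        while parent[doc_id] != doc_id:
            parent[doc_id] = parent[parent[doc_id]]
            doc_id = parent[doc_id]
        return doc_id

    ids = list(vectors)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= sim_threshold:
                parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for doc_id in ids:
        clusters[find(doc_id)].append(doc_id)
    return [sorted(members) for members in clusters.values()
            if len(members) >= min_size]
```

With `min_size` set to the predetermined number of MRs, each returned cluster would be treated as a candidate for a frequently occurring failure.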

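たとえば、互いに類似度が１となるＭＲのみで構成されるクラスタは、同一障害のクラスタと見なすことができるが、多発障害の検出漏れを抑制するためには、類似度の閾値を１未満にしたほうがよい。 For example, a cluster consisting only of MRs whose mutual similarity is 1 can be regarded as a cluster for the same failure, but to suppress missed detections of frequently occurring failures, it is better to set the similarity threshold below 1.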

特開２００９－１４６３９７号公報 JP 2009-146397 A
特開２０１７－１９４７２７号公報 JP 2017-194727 A
特開２０１８－１０６３９０号公報 JP 2018-106390 A

ＭＲには障害内容を記述した文以外の文が含まれている場合もあり、かかるＭＲに対してクラスタリングを実行すると、共通する障害内容を記述していないＭＲ同士が同一のクラスタに分類され、適切なクラスタが生成されない。 MRs may contain sentences other than those describing the failure content. When clustering is performed on such MRs, MRs that do not describe a common failure are classified into the same cluster, and appropriate clusters are not generated.

ＭＲは、一般的な文書と比べ文字数および文数が非常に少ないという特徴があり、ＭＲに含まれる障害内容と関係のない文の存在が、クラスタリングの結果に影響を与えやすい。 MRs are characterized by having a very small number of characters and sentences compared to general documents, and the existence of sentences unrelated to the content of failures contained in MRs tends to affect the results of clustering.

たとえば、ＭＲ「syn flood攻撃の検知が頻発。対処方法を教えて欲しい。」には、文Ａ「syn flood攻撃の検知が頻発。」と、文Ｂ「対処方法を教えて欲しい。」とを含んでいる。この文Ａ、文Ｂのうち、文Ａは、障害内容を記述した文であり、文Ｂは、障害内容を記述した文ではない。 For example, the MR "Syn flood attacks are frequently detected. Please tell me how to deal with them." contains sentence A, "Syn flood attacks are frequently detected.", and sentence B, "Please tell me how to deal with them." Of these, sentence A describes the failure content and sentence B does not.

ここで、文Ａを含むＭＲの件数よりも、文Ｂを含むＭＲの件数の方が多い場合、障害内容に関わりなく、文Ｂを含むＭＲが一つのクラスタ（第１クラスタ）に分類される。第１クラスタに含まれるＭＲの件数は多くなるが、かかる第１クラスタは、障害内容とは関わりのない文Ｂを共通に持つＭＲのクラスタであるため、かかる第１クラスタを検出すると、誤検知の発生に繋がる。 Here, when more MRs contain sentence B than contain sentence A, the MRs containing sentence B are classified into one cluster (the first cluster) regardless of their failure content. The first cluster contains many MRs, but because it is a cluster of MRs that share sentence B, which is unrelated to the failure content, detecting it leads to false positives.

また、文Ａを含み、文Ｂを含まないＭＲは、文Ｂを含むＭＲとは別のクラスタ（第２クラスタ）に分類される。第２クラスタは、障害内容に対応するＭＲを分類したクラスタであるにもかかわらず、文Ａと文Ｂ両方を含むＭＲは第１クラスタに属してしまうため、第２クラスタに含まれるＭＲの件数が少なくなるので、かかる第２クラスタは検出対象から除外され、検出漏れが発生する。 MRs that contain sentence A but not sentence B are classified into a cluster (the second cluster) separate from the MRs that contain sentence B. Although the second cluster groups MRs by failure content, MRs containing both sentence A and sentence B fall into the first cluster, so the second cluster contains few MRs. The second cluster is therefore excluded from the detection targets, and a missed detection occurs.

このため、ＭＲ等の文書から、障害内容等を記述した所定の文を残しつつ、他の文を文書から除外することが求められている。 For this reason, it is required to exclude other sentences from documents such as MRs, while leaving predetermined sentences describing the details of failures and the like.

１つの側面では、本発明は、障害内容を記述した文を残しつつ、障害内容とは関係のない他の文を文書から除外することができる文書処理方法、文書処理プログラムおよび情報処理装置を提供することを目的とする。 In one aspect, an object of the present invention is to provide a document processing method, a document processing program, and an information processing apparatus that can exclude from a document other sentences unrelated to the failure content while keeping the sentences that describe the failure content.

第１の案では、コンピュータは、以下の処理を実行する。コンピュータは、一文または複数文から構成される複数の文書を取得する。コンピュータは、複数の文書の中から予め設定された条件を満たす一文から構成される第一着目文を特定する。コンピュータは、複数の文書の中から、特定した第一着目文を含む複数文から構成される複数の第一文書を取得する。コンピュータは、取得した複数の第一文書の中から、特定した第一着目文以外の一文から構成される第二着目文を特定する。コンピュータは、複数の文書の中から、第二着目文を含む複数文から構成される複数の第二文書を取得する。コンピュータは、複数の第一文書および複数の第二文書のそれぞれに含まれる同一文書の数と、同一文書以外の文書の数との関係に基づいて、複数の文書の中から第二着目文を除外する。 In a first proposal, a computer executes the following processing. The computer acquires a plurality of documents, each consisting of one or more sentences. From the plurality of documents, the computer identifies a first sentence of interest, which is a single sentence satisfying a preset condition. From the plurality of documents, the computer acquires a plurality of first documents, each consisting of multiple sentences including the identified first sentence of interest. From the acquired first documents, the computer identifies a second sentence of interest, which is a single sentence other than the first sentence of interest. From the plurality of documents, the computer acquires a plurality of second documents, each consisting of multiple sentences including the second sentence of interest. The computer excludes the second sentence of interest from the plurality of documents based on the relationship between the number of identical documents included in both the plurality of first documents and the plurality of second documents and the number of documents other than the identical documents.

障害内容を記述した文を残しつつ、障害内容とは関係のない他の文を文書から除外することができる。 Other sentences unrelated to the content of the failure can be excluded from the document while leaving the sentence describing the content of the failure.

図１は、本実施例に係る情報処理装置の処理を説明するための図である。 FIG. 1 is a diagram for explaining the processing of the information processing apparatus according to the embodiment.
図２は、本実施例に係る情報処理装置の構成を示す機能ブロック図である。 FIG. 2 is a functional block diagram showing the configuration of the information processing apparatus according to this embodiment.
図３は、文書ＤＢのデータ構造の一例を示す図である。 FIG. 3 is a diagram showing an example of the data structure of the document DB.
図４は、セットＳテーブルのデータ構造の一例を示す図である。 FIG. 4 is a diagram showing an example of the data structure of the set S table.
図５は、セットＭテーブルのデータ構造の一例を示す図である。 FIG. 5 is a diagram showing an example of the data structure of the set M table.
図６は、セットＭ’テーブルのデータ構造の一例を示す図である。 FIG. 6 is a diagram showing an example of the data structure of the set M' table.
図７は、判別モデルテーブルのデータ構造の一例を示す図である。 FIG. 7 is a diagram showing an example of the data structure of the discriminant model table.
図８は、除外処理部の処理の一例を説明するための図（１）である。 FIG. 8 is a diagram (1) for explaining an example of the processing of the exclusion processing unit.
図９は、除外処理部の処理の一例を説明するための図（２）である。 FIG. 9 is a diagram (2) for explaining an example of the processing of the exclusion processing unit.
図１０は、除外処理部の処理の一例を説明するための図（３）である。 FIG. 10 is a diagram (3) for explaining an example of the processing of the exclusion processing unit.
図１１は、除外処理部の処理の一例を説明するための図（４）である。 FIG. 11 is a diagram (4) for explaining an example of the processing of the exclusion processing unit.
図１２は、除外処理部の処理の一例を説明するための図（５）である。 FIG. 12 is a diagram (5) for explaining an example of the processing of the exclusion processing unit.
図１３は、検出結果の一例を示す図（１）である。 FIG. 13 is a diagram (1) showing an example of a detection result.
図１４は、検出結果の一例を示す図（２）である。 FIG. 14 is a diagram (2) showing an example of a detection result.
図１５は、本実施例に係る情報処理装置の処理手順を示すフローチャート（１）である。 FIG. 15 is a flowchart (1) showing the processing procedure of the information processing apparatus according to the embodiment.
図１６は、本実施例に係る情報処理装置の処理手順を示すフローチャート（２）である。 FIG. 16 is a flowchart (2) showing the processing procedure of the information processing apparatus according to the embodiment.
図１７は、本実施例に係る情報処理装置と同様の機能を実現するコンピュータのハードウェア構成の一例を示す図である。 FIG. 17 is a diagram showing an example of the hardware configuration of a computer that implements the same functions as the information processing apparatus according to this embodiment.

以下に、本願の開示する文書処理方法、文書処理プログラムおよび情報処理装置の実施例を図面に基づいて詳細に説明する。なお、この実施例によりこの発明が限定されるものではない。 Exemplary embodiments of the document processing method, the document processing program, and the information processing apparatus disclosed in the present application will be described below in detail with reference to the drawings. In addition, this invention is not limited by this Example.

図１は、本実施例に係る情報処理装置の処理を説明するための図である。情報処理装置は、複数の文書（たとえば、ＭＲ）から、障害内容を記述した文１ａを含み、他の文を含まない文書Ｓ１を検出する。 FIG. 1 is a diagram for explaining the processing of the information processing apparatus according to the embodiment. The information processing device detects a document S1 that includes a sentence 1a describing the content of the failure and does not include other sentences from a plurality of documents (for example, MR).

また、情報処理装置は、文１ａと他の文を含む文書Ｍ１，Ｍ２，Ｍ３（および他の文書）と、文１ａを含まないＭ４、Ｍ５（および他の文書）とを検出する。図１において、文１ａ，文２ａは、それぞれ異なる障害内容を記述した文とする。文１ｂ，文１ｃ、他の文は、障害内容が記述された文か否かが不明な文とする。 The information processing device also detects documents M1, M2, and M3 (and other documents) containing sentence 1a and other sentences, and M4 and M5 (and other documents) that do not contain sentence 1a. In FIG. 1, sentences 1a and 2a are sentences describing different fault contents. Sentences 1b, 1c, and other sentences are sentences in which it is unclear whether or not they describe the details of the failure.

同一の文書において、障害内容を記述した文１ａと共起する他の文は、障害内容を記述した文、または、障害内容を記述していない文のいずれかとなる。また、障害内容を記述していない文は、特定の障害内容とは関わりなく、障害内容を記述した文と共起することが多いという特徴がある。逆に言えば、様々な障害を記述する文と共起する文は障害内容を記述していないといえる。 In the same document, other sentences co-occurring with the sentence 1a describing the content of the failure are either sentences describing the content of the failure or sentences not describing the content of the failure. In addition, there is a feature that sentences not describing the content of failure often co-occur with sentences describing the content of failure, regardless of specific content of the failure. Conversely, it can be said that sentences co-occurring with sentences describing various failures do not describe the contents of failures.

ここで、各文書Ｍ１～Ｍ５を、区分１０Ａ，１０Ｂ，１０Ｃに分類する。区分１０Ａは、文１ａと、文１ｂとが共起しており、かつ、文１ｃを含まない文書Ｍ２，Ｍ３（図示しない他の文書）を含む。区分１０Ｂは、文２ａと、文１ｃとが共起している文書Ｍ４，Ｍ５（図示しない他の文書）を含む。区分１０Ｃは、文１ａと、文１ｂと、文１ｃとが共起している文書Ｍ１を含む。 Here, each document M1 to M5 is classified into sections 10A, 10B, and 10C. Section 10A includes documents M2 and M3 (other documents not shown) in which sentence 1a and sentence 1b co-occur and which do not include sentence 1c. Section 10B includes documents M4 and M5 (other documents not shown) in which sentence 2a and sentence 1c co-occur. Section 10C includes document M1 in which sentence 1a, sentence 1b, and sentence 1c co-occur.

ここで、区分１０Ａに含まれる、文書Ｍ２、Ｍ３および図示しない他の文書は、文１ａと、文１ｂとが共起しており「文１ｂは、文１ａの障害内容と関係のある文」と言える。一方、区分１０Ｂにおいて、文１ｃは、文１ａとは異なる障害内容を記述した、文２ａと共起しているため、「文１ｃは、文１ａの障害内容と関係のない文」と言える。このため、情報処理装置は、文書Ｍ１から、文１ｃを除外する。 In documents M2 and M3 and the other documents (not shown) in section 10A, sentence 1a and sentence 1b co-occur, so it can be said that "sentence 1b is a sentence related to the failure content of sentence 1a." In section 10B, on the other hand, sentence 1c co-occurs with sentence 2a, which describes a failure content different from that of sentence 1a, so it can be said that "sentence 1c is a sentence unrelated to the failure content of sentence 1a." The information processing device therefore excludes sentence 1c from document M1.

上記のように、本実施例に係る情報処理装置は、着目した障害内容を記述した文を含む文書を検出し、検出した文書のうち、複数の文を含む文書について、着目した障害内容に関係のある文（障害内容を記述した文）を残す。また、情報処理装置は、着目した障害内容に関係のない文（障害内容を記述していない文）を削除する処理を行う。このように、障害内容を記述した文に関連する文を残し、関連しない文を削除することができるので、クラスタリング処理による障害検出において、誤検知や検出もれを抑止することができる。 As described above, the information processing apparatus according to the present embodiment detects documents containing a sentence that describes the failure content of interest and, for those detected documents that contain multiple sentences, keeps the sentences related to the failure content of interest (sentences describing the failure content). The information processing device also deletes sentences unrelated to the failure content of interest (sentences not describing the failure content). Because sentences related to the sentence describing the failure content are kept and unrelated sentences are deleted, false positives and missed detections can be suppressed in failure detection by clustering.

次に、本実施例に係る情報処理装置の構成の一例について説明する。図２は、本実施例に係る情報処理装置の構成を示す機能ブロック図である。図２に示すように、この情報処理装置１００は、通信部１１０と、入力部１２０と、表示部１３０と、記憶部１４０と、制御部１５０とを有する。 Next, an example of the configuration of the information processing apparatus according to this embodiment will be described. FIG. 2 is a functional block diagram showing the configuration of the information processing apparatus according to this embodiment. As shown in FIG. 2 , this information processing apparatus 100 has a communication section 110 , an input section 120 , a display section 130 , a storage section 140 and a control section 150 .

通信部１１０は、ネットワークを介して外部装置とデータ通信を実行する処理部である。通信部１１０は、通信装置の一例である。後述する制御部１５０は、通信部１１０を介して、データをやり取りする。たとえば、通信部１１０は、障害内容を記述した文書の情報を外部装置から受信する。 The communication unit 110 is a processing unit that performs data communication with an external device via a network. Communication unit 110 is an example of a communication device. A control unit 150 , which will be described later, exchanges data via the communication unit 110 . For example, the communication unit 110 receives information of a document describing the details of the failure from the external device.

入力部１２０は、情報処理装置１００に各種の情報を入力するための入力装置である。たとえば、入力部１２０は、キーボードやマウス、タッチパネル等に対応する。利用者は、入力部１２０を操作して、障害内容を記述した文書の情報を、情報処理装置１００に入力してもよい。 The input unit 120 is an input device for inputting various kinds of information to the information processing device 100 . For example, the input unit 120 corresponds to a keyboard, mouse, touch panel, and the like. The user may operate the input unit 120 to input the information of the document describing the content of the failure to the information processing apparatus 100 .

表示部１３０は、制御部１５０から出力される各種の情報を表示する表示装置である。表示部１３０は、液晶ディスプレイ、タッチパネル等の表示装置に対応する。 The display unit 130 is a display device that displays various information output from the control unit 150 . The display unit 130 corresponds to a display device such as a liquid crystal display and a touch panel.

記憶部１４０は、文書ＤＢ（Data Base）１４１と、セットＳテーブル１４２と、セットＭテーブル１４３と、判別モデルテーブル１４５とを有する。記憶部１４０は、ＲＡＭ（Random Access Memory）、ＲＯＭ（Read Only Memory）、フラッシュメモリ（Flash Memory）などの半導体メモリ素子や、ＨＤＤ（Hard Disk Drive）などの記憶装置に対応する。 The storage unit 140 has a document DB (Data Base) 141 , a set S table 142 , a set M table 143 and a discrimination model table 145 . The storage unit 140 corresponds to semiconductor memory devices such as RAM (Random Access Memory), ROM (Read Only Memory), flash memory, and storage devices such as HDD (Hard Disk Drive).

文書ＤＢ１４１は、障害内容を記述した複数の文書（ＭＲ）の情報を登録するデータベースである。図３は、文書ＤＢのデータ構造の一例を示す図である。図３に示すように、この文書ＤＢ１４１は、文書識別情報と、文書情報とを対応付ける。文書識別情報は、文書を一意に識別する情報である。文書情報は、障害内容を記述した一つの文または複数の文を含む文書の情報である。たとえば、一つの文書情報に含まれる各文は、句読点によって、他の文と区分される。 The document DB 141 is a database for registering information of a plurality of documents (MR) describing failure details. FIG. 3 is a diagram showing an example of the data structure of the document DB. As shown in FIG. 3, the document DB 141 associates document identification information with document information. Document identification information is information that uniquely identifies a document. The document information is document information including one or more sentences describing the content of the failure. For example, each sentence included in one piece of document information is distinguished from other sentences by punctuation marks.
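Splitting document information into sentences at punctuation marks, as described above, could look like the following minimal sketch. The set of sentence-ending marks and the function name `split_sentences` are assumptions for illustration; the source only states that sentences are delimited by punctuation.

```python
import re

# One run of non-terminator characters followed by an optional terminator.
# The terminator set (Japanese and ASCII marks) is an assumption.
_SENTENCE = re.compile(r"[^。．.!?！？]+[。．.!?！？]?")


def split_sentences(document_text):
    """Return the sentences of a document, each keeping its end mark."""
    pieces = (match.strip() for match in _SENTENCE.findall(document_text))
    return [piece for piece in pieces if piece]
```

A splitter this simple would also break at decimal points or abbreviations, so real MR text would likely need a more careful tokenizer.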

図３において、文書識別情報「ＭＲ１」に対応する文書情報は「syn flood攻撃が検知されました。対処方法を教えてください。」である。この文書情報には、文「syn flood攻撃が検知されました。」と、文「対処方法をおしえてください。」とを含む。 In FIG. 3, the document information corresponding to the document identification information "MR1" is "A syn flood attack has been detected. Please tell me how to deal with it.". This document information includes the sentence "A syn flood attack has been detected." and the sentence "Please tell me how to deal with it."

図３において、文書識別情報「ＭＲ１００」に対応する文書情報は「syn flood攻撃が検知されました。」である。この文書情報は、文「syn flood攻撃が検知されました。」を含む。このように、文書情報に一つの文しか含まれない場合、かかる一つの文は、障害内容を記述した文と見なす事ができる。 In FIG. 3, the document information corresponding to the document identification information "MR100" is "A syn flood attack was detected." This document information includes the sentence "A syn flood attack was detected." In this way, when the document information contains only one sentence, the one sentence can be regarded as a sentence describing the details of the failure.

セットＳテーブル１４２は、文書ＤＢ１４１に登録された各文書情報のうち、一つの文を含む文書情報を登録するテーブルである。図４は、セットＳテーブルのデータ構造の一例を示す図である。図４に示すように、このセットＳテーブル１４２は、文書識別情報と、文書情報（一つの文）とを対応付ける。 The set S table 142 is a table for registering document information including one sentence among the document information registered in the document DB 141 . FIG. 4 is a diagram showing an example of the data structure of the set S table. As shown in FIG. 4, this set S table 142 associates document identification information with document information (one sentence).

セットＭテーブル１４３は、文書ＤＢ１４１に登録された各文書情報のうち、複数の文を含む文書情報を登録するテーブルである。図５は、セットＭテーブルのデータ構造の一例を示す図である。図５に示すように、このセットＭテーブル１４３は、文書識別情報と、文書情報（複数の文）とを対応付ける。 The set M table 143 is a table for registering document information including a plurality of sentences among the document information registered in the document DB 141 . FIG. 5 is a diagram showing an example of the data structure of the set M table. As shown in FIG. 5, this set M table 143 associates document identification information with document information (a plurality of sentences).

セットＭ’テーブル１４４は、セットＭテーブル１４３に登録される文書情報を、一文毎に分割した情報を登録するテーブルである。図６は、セットＭ’テーブルのデータ構造の一例を示す図である。図６に示すように、このセットＭ’テーブル１４４は、文書識別情報と、文書サブ識別情報と、削除フラグと、文書情報とを対応付ける。文書識別情報は、図５で説明した文書識別情報に対応する。文書サブ識別情報は、複数文の文書情報に含まれる各文をそれぞれ識別する情報である。削除フラグは、対応する文書情報を削除するか否かを示すフラグである。削除する場合には「オン」となり、削除しない場合には「オフ」となる。削除フラグは、後述する除外処理部１５４に設定される。削除フラグの初期値は「オフ」である。文書情報は、一つの文の情報である。 The set M' table 144 is a table for registering information obtained by dividing the document information registered in the set M table 143 for each sentence. FIG. 6 is a diagram showing an example of the data structure of the set M' table. As shown in FIG. 6, this set M' table 144 associates document identification information, document sub-identification information, deletion flags, and document information. The document identification information corresponds to the document identification information described with reference to FIG. The document sub-identification information is information for identifying each sentence included in document information of multiple sentences. The deletion flag is a flag indicating whether or not to delete the corresponding document information. If it is to be deleted, it is "on", and if it is not to be deleted, it is "off". The deletion flag is set in the exclusion processing unit 154, which will be described later. The initial value of the delete flag is "off". Document information is information of one sentence.
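The record layout of the set M' table can be sketched as a small data structure. The field names, the `"<document ID>-<index>"` format for the document sub-identification information, and the helper `surviving_text` are hypothetical choices for this example, not details given in the source.

```python
from dataclasses import dataclass


@dataclass
class SentenceRecord:
    doc_id: str           # document identification information
    sub_id: str           # document sub-identification information (format assumed)
    text: str             # document information: exactly one sentence
    delete: bool = False  # deletion flag, initially "off"


def build_set_m_prime(set_m):
    """Expand {doc_id: [sentence, ...]} into one record per sentence."""
    records = []
    for doc_id, sentences in set_m.items():
        for index, text in enumerate(sentences, start=1):
            records.append(SentenceRecord(doc_id, f"{doc_id}-{index}", text))
    return records


def surviving_text(records, doc_id):
    """Rejoin the sentences of one document whose deletion flag is off."""
    return "".join(r.text for r in records
                   if r.doc_id == doc_id and not r.delete)
```

Setting `delete = True` on a record and then calling `surviving_text` mirrors how a flagged sentence would drop out of the document before clustering.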

判別モデルテーブル１４５は、対象文書情報と類似する文書であるか否かを判定するモデルを登録するテーブルである。図７は、判別モデルテーブルのデータ構造の一例を示す図である。図７に示すように、この判別モデルテーブル１４５は、対象文書識別情報と、判別モデルとを対応付ける。対象文書識別情報は、判別モデルの対象となった文書情報の文書識別情報、または、文書サブ識別情報（後述する）を示す情報である。判別モデルは、対象文書識別情報の文書と類似する文書を判別するための判別モデルの情報である。 The determination model table 145 is a table for registering a model for determining whether or not a document is similar to the target document information. FIG. 7 is a diagram showing an example of the data structure of a discriminant model table. As shown in FIG. 7, the discriminant model table 145 associates target document identification information with discriminant models. The target document identification information is information indicating the document identification information of the document information that is the target of the discriminant model, or document sub-identification information (described later). The discriminant model is information of a discriminant model for discriminating a document similar to the document of the target document identification information.

図２の説明に戻る。制御部１５０は、取得部１５１と、第一特定部１５２と、第二特定部１５３と、除外処理部１５４と、検出部１５５とを有する。制御部１５０は、ＣＰＵやＭＰＵ（Micro Processing Unit）などによって実現できる。また、制御部１５０は、ＡＳＩＣ（Application Specific Integrated Circuit）やＦＰＧＡ（Field Programmable Gate Array）などのハードワイヤードロジックによっても実現できる。 Returning to the description of FIG. 2, the control unit 150 has an acquisition unit 151, a first identification unit 152, a second identification unit 153, an exclusion processing unit 154, and a detection unit 155. The control unit 150 can be implemented by a CPU, an MPU (Micro Processing Unit), or the like. The control unit 150 can also be implemented by hardwired logic such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

取得部１５１は、ネットワークを介して、外部装置等から、障害内容を記述した文書情報を取得し、取得した文書情報を、文書ＤＢ１４１に登録する。文書情報に対応する文書識別情報は、文書情報に予め設定されていてもよいし、取得部１５１が、文書情報にユニークな文書識別情報を割り当ててもよい。取得部１５１は、入力部１２０を介して、文書情報を取得してもよい。 The acquisition unit 151 acquires document information describing the details of the failure from an external device or the like via the network, and registers the acquired document information in the document DB 141 . The document identification information corresponding to the document information may be preset in the document information, or the acquisition unit 151 may assign unique document identification information to the document information. The acquisition unit 151 may acquire document information via the input unit 120 .

第一特定部１５２は、文書ＤＢ１４１に含まれる複数の文書情報のうち、一文から構成される文書情報を特定し、特定した文書情報および文書識別情報を、セットＳテーブル１４２に登録する。 The first identification unit 152 identifies document information consisting of one sentence from among multiple pieces of document information contained in the document DB 141 , and registers the identified document information and document identification information in the set S table 142 .

また、第一特定部１５２は、セットＳテーブル１４２に登録された各文書情報（文）から一つの文Ｓ１を選択し、下記の処理を実行することにより、文Ｓ１と類似する文を判別する判別モデルを生成する。 Further, the first identifying unit 152 selects one sentence S1 from each document information (sentence) registered in the set S table 142, and executes the following process to determine sentences similar to the sentence S1. Generate a discriminant model.

第一特定部１５２が、判別モデルを生成する処理の一例について説明する。第一特定部１５２は、文Ｓ１と、セットＳテーブル１４２に含まれる各文（文Ｓ１を含む）との類似度をそれぞれ算出し、セットＳテーブル１４２に含まれる各文のうち、類似度上位の文を特定する。 An example of the process by which the first identifying unit 152 generates a discriminant model will be described. The first identifying unit 152 calculates the similarity between sentence S1 and each sentence in the set S table 142 (including sentence S1 itself) and identifies the sentences with the highest similarity among them.

たとえば、第一特定部１５２は、word2vec等と同様にして、文に含まれる各単語のベクトルを算出し、文に含まれる各単語のベクトルを積算することで、文のベクトルを算出する。第一特定部１５２は、文Ｓ１のベクトルと、セットＳテーブル１４２に登録された各文のベクトルとの距離をそれぞれ類似度として算出する。第一特定部１５２は、ベクトル間の距離が近いほど、類似度を大きくする。第一特定部１５２は、類似度の降順に、文をソートし、上位ｎに含まれる文を、類似度上位の文として特定する。 For example, the first identifying unit 152 calculates a vector for each word in a sentence in the same manner as word2vec or the like, and calculates the sentence vector by accumulating (summing) the vectors of the words in the sentence. The first identifying unit 152 calculates the similarity from the distance between the vector of sentence S1 and the vector of each sentence registered in the set S table 142, assigning a higher similarity to a shorter distance. The first identifying unit 152 sorts the sentences in descending order of similarity and identifies the top n sentences as the sentences with the highest similarity.
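The ranking of sentences by vector distance can be sketched as follows. This sketch sums the word vectors to form a sentence vector, which is an assumption about how the vectors are combined. The Euclidean distance, the mapping `1 / (1 + distance)` from distance to similarity, the toy word-vector dictionary, and the function names are likewise illustrative assumptions; a trained word2vec model would supply the real word vectors.

```python
import math


def sentence_vector(words, word_vectors):
    """Sum the vectors of the known words in a sentence."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    for word in words:
        vec = word_vectors.get(word)
        if vec is None:
            continue  # skip out-of-vocabulary words
        total = [t + x for t, x in zip(total, vec)]
    return total


def similarity(u, v):
    """Map Euclidean distance to a similarity in (0, 1]; closer is higher."""
    return 1.0 / (1.0 + math.dist(u, v))


def top_n_similar(target_id, sentence_vecs, n):
    """Return the IDs of the n sentences most similar to the target.

    The target itself is included, as in the description above.
    """
    target = sentence_vecs[target_id]
    ranked = sorted(sentence_vecs,
                    key=lambda sid: similarity(target, sentence_vecs[sid]),
                    reverse=True)
    return ranked[:n]
```

The IDs returned by `top_n_similar` would then serve as the positive examples for the learning step.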

第一特定部１５２は、特定した類似度上位の文を「正例」としたＰＵ（Positive Unlabeled）学習を行い、文Ｓ１に類似する文であるか否かを判別する判別モデルを生成する。第一特定部１５２は、文Ｓ１を識別する対象文書識別情報と、判別モデルの情報とを対応付けて、判別モデルテーブル１４５に登録する。 The first identifying unit 152 performs PU (Positive Unlabeled) learning with the identified high-ranking similarity sentences as “positive examples” and generates a discrimination model for determining whether or not the sentence is similar to the sentence S1. The first identifying unit 152 associates the target document identification information for identifying the sentence S1 with the discriminant model information and registers them in the discriminant model table 145 .

ここで、ＰＵ学習は、訓練データとして、一部の正例のみが与えられている場合に機械学習を行う学習法である。ＰＵ学習により学習される判別モデルは、正負不明のデータに対して正例確率を推定する推定モデルである。また、ＰＵ学習により学習される判別モデルは、算出された正例確率によって重みづけされた判別モデルである。なお、以降の処理においてはこのＰＵ学習を用いているが、類似度が低い文を負例とみなして通常の機械学習方式を用いて判別モデルを構築することもできる。 Here, PU learning is a learning method that performs machine learning when only some positive examples are given as training data. A discriminant model learned by PU learning is an estimation model for estimating the probability of a positive case for data whose positive or negative is unknown. Also, the discriminant model learned by PU learning is a discriminant model weighted by the calculated positive case probability. Although this PU learning is used in the subsequent processing, it is also possible to regard sentences with low similarity as negative examples and construct a discriminant model using a normal machine learning method.

たとえば、第一特定部１５２は、ＰＵ学習を行う場合に、確率変数ｘ、ｙ、ｚを定義する。ここで、ｘ∈Ｒ（実数全体），ｙ∈｛－１，１｝，ｓ∈｛０，１｝とする。ｘは、入力（文のベクトル）、ｙはクラスラベル（負例＝－１，正例＝１）、ｓはデータがラベリングされているか否か（ラベリングされていないｓ＝０，ラベルされている＝１）を示す。 For example, when performing PU learning, the first identifying unit 152 defines random variables x, y, and s (the source text names the third variable z here but uses s in everything that follows). Here, x ∈ R (the real numbers), y ∈ {-1, 1}, and s ∈ {0, 1}. x is the input (the sentence vector), y is the class label (negative example = -1, positive example = 1), and s indicates whether the data is labeled (unlabeled: s = 0, labeled: s = 1).

まず、第一特定部１５２は、ｐ（ｓ＝１｜ｘ）の推定モデルを学習する。上記のように、類似度上位の文には「正例」ラベルが付与され（ｓ＝１）、他の文にはラベルが付与されていない（ｓ＝０）ので、ラベルが付与されているデータ（文のベクトル）は、正例である。このため、ｐ（ｓ＝１｜ｘ）の推定モデルは、「正例らしさの確率」を推定するモデルであると言える。第一特定部１５２は、たとえば、ＮＮ（Neural Network）のパラメータを調整する学習を行うことで、ｐ（ｓ＝１｜ｘ）の推定モデルを学習する。 First, the first identifying unit 152 learns an estimation model of p(s=1|x). As described above, the sentences with the highest similarity are given the "positive example" label (s = 1) and the other sentences are left unlabeled (s = 0), so every labeled data item (sentence vector) is a positive example. The estimation model of p(s=1|x) can therefore be regarded as a model that estimates the "probability of being a positive example". The first identifying unit 152 learns the estimation model of p(s=1|x) by, for example, training that adjusts the parameters of an NN (Neural Network).

続いて、第一特定部１５２は、ｐ（ｓ＝１｜ｘ）の推定モデルの出力を正例らしさと見なして、判別モデルｐ（ｙ＝１｜ｘ）＝ｐ（ｓ＝１｜ｘ）／ｐ（ｓ＝１｜ｙ＝１）を学習する。第一特定部１５２は、ＮＮのパラメータを調整する学習を行うことで、ｐ（ｙ＝１｜ｘ）の判別モデルを学習する。この判別モデルに、文のベクトルを入力すると、文Ｓ１に類似する文である確からしさが出力される。 Subsequently, the first identification unit 152 regards the output of the estimation model of p(s=1|x) as likely to be a positive example, and determines the discriminant model p(y=1|x)=p(s=1|x) /p(s=1|y=1). The first identifying unit 152 learns a discriminant model of p(y=1|x) by performing learning for adjusting parameters of the NN. When a sentence vector is input to this discriminant model, the probability that the sentence is similar to sentence S1 is output.
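The division by p(s=1|y=1) in the relation above can be illustrated numerically. The sketch below assumes that scores from an already trained p(s=1|x) model are available and estimates the constant p(s=1|y=1) as the mean score over labeled positive examples. That estimator is a common choice in the PU-learning literature, not something stated in the source, and the function names and the clipping to 1.0 are additions made for the example.

```python
def estimate_label_frequency(scores_of_labeled_positives):
    """Estimate c = p(s=1|y=1) as the mean of p(s=1|x) over labeled positives.

    Assumption: the scores come from held-out labeled examples.
    """
    scores = list(scores_of_labeled_positives)
    if not scores:
        raise ValueError("at least one labeled positive is required")
    return sum(scores) / len(scores)


def positive_probability(score, c):
    """Convert p(s=1|x) into p(y=1|x) = p(s=1|x) / c, clipped to 1.0."""
    return min(1.0, score / c)
```

For instance, if labeled positives score 0.4 and 0.6 on average under the p(s=1|x) model, c is 0.5, so an unlabeled sentence scoring 0.25 would be assigned a positive probability of 0.5.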

第一特定部１５２は、セットＳテーブル１４２に登録された他の文（Ｓ２～Ｓｎ）についても、文Ｓ１と同様の処理を実行することで、判別モデルを生成し、対象文識別情報と、判別モデルの情報とを対応付けて、判別モデルテーブル１４５に登録する。 The first identification unit 152 generates a discriminant model by executing the same processing as for the sentence S1 for the other sentences (S2 to Sn) registered in the set S table 142, and identifies the target sentence identification information, It is registered in the discriminant model table 145 in association with the discriminant model information.

第二特定部１５３は、文書ＤＢ１４１に含まれる複数の文書情報のうち、複数文から構成される文書情報を特定し、特定した文書情報および文書識別情報を、セットＭテーブル１４３に登録する。 The second identifying unit 153 identifies document information composed of a plurality of sentences among a plurality of pieces of document information contained in the document DB 141 , and registers the identified document information and document identification information in the set M table 143 .
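The division of the document DB into single-sentence and multi-sentence documents can be sketched as follows. The sketch assumes each document has already been split into sentences; the function name `partition_documents` and the dictionary representation of the tables are illustrative choices only.

```python
def partition_documents(document_db):
    """Split {doc_id: [sentence, ...]} into set S and set M.

    Set S holds documents with exactly one sentence (stored as that
    sentence); set M holds documents with two or more sentences.
    Documents with no sentences are ignored.
    """
    set_s, set_m = {}, {}
    for doc_id, sentences in document_db.items():
        if len(sentences) == 1:
            set_s[doc_id] = sentences[0]
        elif len(sentences) > 1:
            set_m[doc_id] = list(sentences)
    return set_s, set_m
```

Using the examples from FIG. 3, MR100 would land in set S and MR1 in set M.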

第二特定部１５３は、セットＭテーブル１４３を基にして、セットＭ’テーブル１４４を生成する。たとえば、第二特定部１５３は、セットＭテーブル１４３のレコードを選択し、選択したレコードに含まれる複数文を、一文毎に分割し、各文に文書サブ識別情報を割り当てる。第二特定部１５３は、文書識別情報と、文書サブ識別情報と、文書情報（一つの文）とを対応付けて、セットＭ’テーブル１４４に登録する。 The second identifying unit 153 generates the set M' table 144 based on the set M table 143. For example, the second identifying unit 153 selects a record of the set M table 143, divides the multiple sentences in the selected record into individual sentences, and assigns document sub-identification information to each sentence. The second identifying unit 153 registers the document identification information, the document sub-identification information, and the document information (one sentence) in the set M' table 144 in association with each other.

第二特定部１５３は、判別モデルテーブル１４５に含まれる各文の判別モデルから、文Ｓ１の判別モデルを取得する。第二特定部１５３は、セットＭ’テーブル１４４に含まれる各文書情報（一つの文）を、文Ｓ１の判別モデルに適用することで、セットＭ’テーブル１４４に含まれる各文書情報のうち、文Ｓ１と類似する文書情報を特定する。 The second identifying unit 153 acquires the discriminant model of the sentence S1 from the discriminant model of each sentence included in the discriminant model table 145 . The second identifying unit 153 applies each piece of document information (one sentence) included in the set M′ table 144 to the discriminant model of the sentence S1, thereby, among the pieces of document information included in the set M′ table 144, Document information similar to sentence S1 is specified.

たとえば、第二特定部１５３は、セットＭ’テーブル１４４に含まれる文書情報のベクトルを、判別モデルに入力し、判別モデルから出力される確からしさの値が、閾値以上である場合に、文書情報が、文Ｓ１と類似する文書情報として特定する。 For example, the second identifying unit 153 inputs the vector of the document information included in the set M′ table 144 to the discriminant model, and if the likelihood value output from the discriminant model is equal to or greater than a threshold, the document information is specified as document information similar to the sentence S1.

第二特定部１５３は、特定した文Ｓ１と類似する文書情報の文書サブ識別情報と、セットＭ’テーブル１４４とを比較して、係る文書サブ識別情報に対応する文書識別情報を特定し、特定した文書識別情報を、リストＬ１（Ｓ１）として生成する。 The second identifying unit 153 compares the document sub-identification information of the document information identified as similar to sentence S1 with the set M' table 144, identifies the document identification information corresponding to that document sub-identification information, and generates the identified document identification information as list L1 (S1).

第二特定部１５３は、文Ｓ２～Ｓｎについても、文Ｓ１と同様の処理を行い、リストＬ１（Ｓ２）～リストＬ１（Ｓｎ）を生成する。 The second identification unit 153 performs the same processing as for sentence S1 on sentences S2 to Sn to generate lists L1(S2) to L1(Sn).
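
The thresholding step that produces list L1 can be sketched as below. The sketch assumes that a discriminant model is any callable returning a likelihood between 0 and 1 for a sentence vector, and that each set M' record already carries a precomputed `vector` field. The function name, the record layout, and the default threshold of 0.5 are hypothetical; the description does not specify them.

```python
def build_similarity_list(model, set_m_prime, threshold=0.5):
    """Return (doc_ids, sub_ids) for sentences the model judges similar.

    model: callable mapping a sentence vector to a likelihood (assumed).
    set_m_prime: records with "doc_id", "sub_id" and "vector" keys.
    """
    doc_ids = []   # plays the role of list L1 (or L2)
    sub_ids = []   # sub-identifiers of the similar sentences
    for record in set_m_prime:
        if model(record["vector"]) >= threshold:
            sub_ids.append(record["sub_id"])
            # Map the sub-identifier back to its document, without duplicates.
            if record["doc_id"] not in doc_ids:
                doc_ids.append(record["doc_id"])
    return doc_ids, sub_ids
```

The same routine applies unchanged when a model for a sentence such as T12 is used to produce a list L2.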

除外処理部１５４は、下記の処理を実行することで、セットＭテーブル１４３の文書情報（複数文）から、障害内容と関係のない文を除外する処理部である。たとえば、除外処理部１５４は、判別モデルを生成する処理、リストＬ２を生成する処理、削除フラグを設定する処理、除外する処理を行う。 The exclusion processing unit 154 is a processing unit that excludes sentences unrelated to the failure content from the document information (plural sentences) in the set M table 143 by executing the following processing. For example, the exclusion processing unit 154 performs a process of generating a discriminant model, a process of generating a list L2, a process of setting a deletion flag, and a process of excluding.

除外処理部１５４が実行する「判別モデルを生成する処理」について説明する。ここでは一例として、リストＬ１（Ｓ１）を用いて説明する。除外処理部１５４は、リストＬ１（Ｓ１）に含まれる文書情報に含まれる複数文のうち、文Ｓ１と類似しない文を選択し、選択した文の判別モデルを生成する。 The “process for generating a discriminant model” executed by the exclusion processing unit 154 will be described. Here, as an example, the list L1 (S1) will be used for explanation. The exclusion processing unit 154 selects sentences that are not similar to sentence S1 from among multiple sentences included in the document information included in list L1 (S1), and generates a discrimination model for the selected sentences.

図８は、除外処理部の処理の一例を説明するための図（１）である。図８に示すように、たとえば、リストＬ１（Ｓ１）には、文書識別情報ＭＲ１、ＭＲ２、・・・、ＭＲ１０が登録されているものとする。また、文書識別情報ＭＲ１の文書には、文書サブ識別情報Ｔ１１，Ｔ１２が含まれる。文書サブ識別情報Ｔ１１の文が、文Ｓ１と類似する文とすると、除外処理部１５４は、文書サブ識別情報Ｔ１２の文を選択し、選択した文の判別モデルを生成する。除外処理部１５４は、リストＬ１（Ｓ１）に含まれる文書識別情報ＭＲ２、・・・ＭＲ１０の文書についても、文Ｓ１と類似しない文を選択し、選択した文の判別モデルを生成する。 FIG. 8 is a diagram (1) for explaining an example of the processing of the exclusion processing unit. As shown in FIG. 8, it is assumed, for example, that document identification information MR1, MR2, ..., MR10 is registered in list L1 (S1). The document with document identification information MR1 includes document sub-identification information T11 and T12. Assuming that the sentence with document sub-identification information T11 is similar to sentence S1, the exclusion processing unit 154 selects the sentence with document sub-identification information T12 and generates a discriminant model for the selected sentence. For the documents with document identification information MR2, ..., MR10 included in list L1 (S1), the exclusion processing unit 154 likewise selects the sentences that are not similar to sentence S1 and generates discriminant models for the selected sentences.

一例として、文書サブ識別情報Ｔ１２の判別モデルを生成する処理の一例について説明する。以下の説明では、文書サブ識別情報Ｔ１２の文書情報を「文Ｔ１２」と表記する。除外処理部１５４は、文Ｔ１２と、セットＭ’データに含まれる各文（文Ｔ１２を含む）との類似度をそれぞれ算出し、セットＭ’テーブルに含まれる各文のうち、類似度上位の文を特定する。除外処理部１５４が、各文の類似度を算出する処理は、第一特定部１５２と同様にして、文のベクトルを用いる。 As an example, processing for generating the discriminant model of document sub-identification information T12 will be described. In the following description, the document information of document sub-identification information T12 is referred to as "sentence T12". The exclusion processing unit 154 calculates the degree of similarity between sentence T12 and each sentence (including sentence T12) included in the set M′ table, and identifies the sentences with the highest similarity among the sentences included in the set M′ table. As with the first identification unit 152, the exclusion processing unit 154 uses sentence vectors to calculate the similarity between sentences.

除外処理部１５４は、特定した類似度上位の文を「正例」としたＰＵ学習を行い、文Ｔ１２に類似する文であるか否かを判別する判別モデルを生成する。除外処理部１５４は、文Ｔ１２を識別する対象文書識別情報と、判別モデルの情報とを対応付けて、判別モデルテーブル１４５に登録する。除外処理部１５４が実行するＰＵ学習は、第一特定部１５２が実行するＰＵ学習と同様である。 The exclusion processing unit 154 performs PU learning using the identified high-similarity sentences as “positive examples” and generates a discrimination model for determining whether or not the sentences are similar to the sentence T12. The exclusion processing unit 154 associates the target document identification information for identifying the sentence T12 with the discriminant model information and registers them in the discriminant model table 145 . The PU learning performed by the exclusion processing unit 154 is the same as the PU learning performed by the first identifying unit 152 .

除外処理部１５４は、リストＬ１（Ｓ１）に含まれる、文Ｓ１と類似しない他の文についても、文Ｔ１２と同様の処理を実行することで、判別モデルを生成し、対象文書識別情報と、判別モデルの情報とを対応付けて、判別モデルテーブル１４５に登録する。 The exclusion processing unit 154 also generates discriminant models for the other sentences in list L1 (S1) that are not similar to sentence S1 by executing the same processing as for sentence T12, and registers the target document identification information and the discriminant model information in the discriminant model table 145 in association with each other.

除外処理部１５４は、リストＬ１（Ｓ２～Ｓｎ）に含まれる、文Ｓ２～Ｓｎと類似しない他の文についても、文Ｔ１２と同様の処理を実行することで、判別モデルを生成し、対象文書識別情報と、判別モデルの情報とを対応付けて、判別モデルテーブル１４５に登録する。 The exclusion processing unit 154 also generates discriminant models for the other sentences in lists L1 (S2 to Sn) that are not similar to sentences S2 to Sn by executing the same processing as for sentence T12, and registers the target document identification information and the discriminant model information in the discriminant model table 145 in association with each other.

続いて、除外処理部１５４が実行する「リストＬ２を生成する処理」について説明する。除外処理部１５４は、各リストＬ１（Ｓ１～Ｓｎ）に対して、複数のリストＬ２を生成する。たとえば、一つのリストＬ１（Ｓ１）に対応するリストＬ２の数は、リストＬ１（Ｓ１）の各文のうち、文Ｓ１と類似しない文の数となる。 Next, the “processing for generating the list L2” executed by the exclusion processing unit 154 will be described. The exclusion processing unit 154 generates a plurality of lists L2 for each list L1 (S1 to Sn). For example, the number of lists L2 corresponding to one list L1 (S1) is the number of sentences not similar to sentence S1 among the sentences of the list L1 (S1).

図９は、除外処理部の処理の一例を説明するための図（２）である。たとえば、リストＬ１（Ｓ１）には、文書識別情報ＭＲ１、ＭＲ２、ＭＲ１０が登録されているものとする。また、文書識別情報ＭＲ１の文書には、文書サブ識別情報Ｔ１１，Ｔ１２が含まれており、文書サブ識別情報Ｔ１１の文は、文Ｓ１に類似しているものとする。 FIG. 9 is a diagram (2) for explaining an example of the processing of the exclusion processing unit; For example, it is assumed that document identification information MR1, MR2, and MR10 are registered in list L1 (S1). It is also assumed that the document with document identification information MR1 includes document sub-identification information T11 and T12, and the sentence with document sub-identification information T11 is similar to sentence S1.

文書識別情報ＭＲ２の文書には、文書サブ識別情報Ｔ２１，Ｔ２２が含まれており、文書サブ識別情報Ｔ２１の文は、文Ｓ１に類似しているものとする。文書識別情報ＭＲ１０の文書には、文書サブ識別情報Ｔ１０１，Ｔ１０２が含まれており、文書サブ識別情報Ｔ１０１の文は、文Ｓ１に類似しているものとする。この場合、除外処理部１５４は、リストＬ１（Ｓ１）に対応するリストＬ２として、文書サブ識別情報Ｔ１２，Ｔ２２，Ｔ１０２に基づく、リストＬ２（Ｔ１２）、リストＬ２（Ｔ２２）、リストＬ２（Ｔ１０２）を生成する。 It is assumed that the document with document identification information MR2 includes document sub-identification information T21 and T22, and that the sentence with document sub-identification information T21 is similar to sentence S1. It is assumed that the document with document identification information MR10 includes document sub-identification information T101 and T102, and that the sentence with document sub-identification information T101 is similar to sentence S1. In this case, the exclusion processing unit 154 generates, as the lists L2 corresponding to list L1 (S1), list L2 (T12), list L2 (T22), and list L2 (T102) based on document sub-identification information T12, T22, and T102.

ここでは一例として、文Ｓ１と類似しない文（Ｔ１２）のリストＬ２（Ｔ１２）を生成する場合について説明する。 Here, as an example, a case of generating a list L2 (T12) of sentences (T12) that are not similar to sentence S1 will be described.

除外処理部１５４は、判別モデルテーブル１４５に含まれる各文の判別モデルから文Ｔ１２の判別モデルを取得する。除外処理部１５４は、セットＭ’テーブル１４４に含まれる各文書情報（一つの文）を、文Ｔ１２の判別モデルに適用することで、セットＭ’テーブル１４４に含まれる各文書情報のうち、文Ｔ１２と類似する文書情報を特定する。 The exclusion processing unit 154 acquires the discriminant model of sentence T12 from the discriminant models of the sentences included in the discriminant model table 145. The exclusion processing unit 154 applies each piece of document information (one sentence) included in the set M′ table 144 to the discriminant model of sentence T12, thereby identifying, among the pieces of document information included in the set M′ table 144, the document information similar to sentence T12.

たとえば、除外処理部１５４は、セットＭ’テーブル１４４に含まれる文書情報のベクトルを、判別モデルに入力し、判別モデルから出力される確からしさの値が、閾値以上である場合に、文書情報が、文Ｔ１２と類似する文書情報として特定する。 For example, the exclusion processing unit 154 inputs the vector of a piece of document information included in the set M′ table 144 to the discriminant model, and if the likelihood value output from the discriminant model is equal to or greater than a threshold, identifies that document information as document information similar to sentence T12.

除外処理部１５４は、特定した文Ｔ１２と類似する文書情報の文書サブ識別情報と、セットＭ’テーブル１４４とを比較して、係る文書サブ識別情報に対応する文書識別情報を特定し、特定した文書識別情報を、リストＬ２（Ｔ１２）として生成する。 The exclusion processing unit 154 compares the document sub-identification information of the document information similar to the identified sentence T12 with the set M' table 144, and identifies the document identification information corresponding to the document sub-identification information. Document identification information is generated as a list L2 (T12).

除外処理部１５４は、リストＬ２（Ｔ１２）を生成する処理と同様にして、文Ｔ２２の判別モデルを用いて、リストＬ２（Ｔ２２）を生成する。除外処理部１５４は、文Ｔ１０２の判別モデルを用いて、リストＬ２（Ｔ１０２）を生成する。 Exclusion processing unit 154 generates list L2 (T22) using the discriminant model of sentence T22 in the same manner as the process for generating list L2 (T12). The exclusion processing unit 154 uses the discriminant model of sentence T102 to generate list L2 (T102).

また、除外処理部１５４は、リストＬ１（Ｓ１）に対する複数のリストＬ２を生成する処理と同様にして、各リストＬ１（Ｓ２～Ｓｎ）に対する、複数のリストＬ２を生成する。 Also, the exclusion processing unit 154 generates a plurality of lists L2 for each list L1 (S2 to Sn) in the same manner as the process for generating a plurality of lists L2 for the list L1 (S1).

続いて、除外処理部１５４が実行する「削除フラグを設定する処理」について説明する。一例として、リストＬ１（Ｓ１）と、リストＬ２（Ｔ１２）とを基にして、文書サブ識別情報Ｔ１２の文を除外するか否かを判定する処理について説明する。たとえば、除外処理部１５４は、リストＬ１（Ｓ１）とリストＬ２（Ｔ１２）とで共通する文書情報の件数が多い場合に、文Ｓ１と文Ｔ１２とが関連し、文Ｔ１２を残すと判定する。 Next, the “deletion flag setting process” executed by the exclusion processing unit 154 will be described. As an example, the process of determining whether or not to exclude the sentence of the document sub-identification information T12 based on the list L1 (S1) and the list L2 (T12) will be described. For example, when the list L1 (S1) and the list L2 (T12) have a large number of pieces of common document information, the exclusion processing unit 154 determines that the sentences S1 and T12 are related and that the sentence T12 should be left.

一方、除外処理部１５４は、リストＬ１（Ｓ１）とリストＬ２（Ｔ１２）とで共通する文書情報の件数が少ない場合に、文Ｓ１と文Ｔ１２とが関連せず、文Ｔ１２を除外すると判定する。除外処理部１５４は、文Ｔ１２を除外すると判定した場合、セットＭ’テーブル１４４の文書サブ識別情報「Ｔ１２」の削除フラグを「オン」に設定する。また、除外処理部１５４は、文書サブ識別情報「Ｔ１２」の判別モデルを基にして、文Ｔ１２に類似する他の文（類似文）をセットＭ’テーブル１４４から検出し、検出した他の文（類似文）に対応する削除フラグを「オン」に設定する。 On the other hand, when the number of pieces of document information common to list L1 (S1) and list L2 (T12) is small, the exclusion processing unit 154 determines that sentence S1 and sentence T12 are not related and that sentence T12 is to be excluded. When the exclusion processing unit 154 determines to exclude sentence T12, it sets the deletion flag of the document sub-identification information "T12" in the set M′ table 144 to "on". Further, the exclusion processing unit 154 detects other sentences (similar sentences) similar to sentence T12 from the set M′ table 144 based on the discriminant model of the document sub-identification information "T12", and sets the deletion flags corresponding to the detected similar sentences to "on".

図１０は、除外処理部の処理の一例を説明するための図（３）である。図１０に示す例では、リストＬ１（Ｓ１）には、文書識別情報「ＭＲ１，ＭＲ２，・・・，ＭＲ１０」の文書情報が登録されているものとする。リストＬ２（Ｔ１２）には、文書識別番号「ＭＲ１，ＭＲ２，ＭＲ３，ＭＲ１１，・・・，ＭＲ２０」の文書情報が含まれているもとする。 FIG. 10 is a diagram (3) for explaining an example of the processing of the exclusion processing unit; In the example shown in FIG. 10, it is assumed that document information of document identification information "MR1, MR2, . . . , MR10" is registered in the list L1 (S1). It is assumed that list L2 (T12) includes document information with document identification numbers "MR1, MR2, MR3, MR11, . . . , MR20".

また、除外処理部１５４は、リストＬ１（Ｓ１）と、リストＬ２（Ｔ１２）とを比較し、表２０Ａの得るものとする。図１０に示すように、リストＬ１（Ｓ１）に含まれ、かつ、リストＬ２（Ｔ１２）に含まれる文書情報の数を「３件」とする。リストＬ１（Ｓ１）に含まれ、かつ、リストＬ２（Ｔ１２）に含まれない文書情報の数を「７件」とする。リストＬ１（Ｓ１）に含まれず、かつ、リストＬ２（Ｔ１２）に含まれる文書情報の数を「１０件」とする。リストＬ１（Ｓ１）に含まれず、かつ、リストＬ２（Ｔ１２）に含まれない文書情報の数を「９８０件」とする。 Also, the exclusion processing unit 154 compares the list L1 (S1) and the list L2 (T12) to obtain Table 20A. As shown in FIG. 10, it is assumed that the number of pieces of document information included in list L1 (S1) and included in list L2 (T12) is "three". It is assumed that the number of pieces of document information included in the list L1 (S1) and not included in the list L2 (T12) is "7". It is assumed that the number of document information not included in the list L1 (S1) and included in the list L2 (T12) is "10". Assume that the number of document information not included in the list L1 (S1) and not included in the list L2 (T12) is "980".

除外処理部１５４は、表２０Ａに対して検定（正確確率検定、カイ二乗検定等）を行い、文Ｓ１と、文Ｔ１２との関連性の有無を判定する。たとえば、表２０Ａに対する検定では、危険度５％で、ｐ値＝１．９６×１０^－４となり、ｐ値の値が閾値未満であり、関連性ありと判定する。この場合、除外処理部１５４は、文Ｔ１２を除外しないと判定する。 The exclusion processing unit 154 performs a test (an exact probability test, a chi-square test, or the like) on Table 20A to determine whether sentence S1 and sentence T12 are related. For example, in the test on Table 20A at a significance level of 5%, the p-value is 1.96 × 10^-4, which is below the threshold, so the sentences are determined to be related. In this case, the exclusion processing unit 154 determines not to exclude sentence T12.

図１１は、除外処理部の処理の一例を説明するための図（４）である。図１１に示す例では、リストＬ１（Ｓ１）には、文書識別情報「ＭＲ１，ＭＲ２，・・・，ＭＲ１０」の文書情報が登録されているものとする。リストＬ２（Ｔ１０２）には、文書識別番号「ＭＲ１，ＭＲ２１，ＭＲ２２，・・・，ＭＲ４０」の文書情報が含まれているもとする。 FIG. 11 is a diagram (4) for explaining an example of the processing of the exclusion processing unit; In the example shown in FIG. 11, it is assumed that document information of document identification information "MR1, MR2, . . . , MR10" is registered in the list L1 (S1). It is assumed that list L2 (T102) includes document information with document identification numbers "MR1, MR21, MR22, . . . , MR40".

また、除外処理部１５４は、リストＬ１（Ｓ１）と、リストＬ２（Ｔ１０２）とを比較し、表２０Ｂを得るものとする。図１１に示すように、リストＬ１（Ｓ１）に含まれ、かつ、リストＬ２（Ｔ１０２）に含まれる文書情報の数を「１件」とする。リストＬ１（Ｓ１）に含まれ、かつ、リストＬ２（Ｔ１０２）に含まれない文書情報の数を「９件」とする。リストＬ１（Ｓ１）に含まれず、かつ、リストＬ２（Ｔ１０２）に含まれる文書情報の数を「２０件」とする。リストＬ１（Ｓ１）に含まれず、かつ、リストＬ２（Ｔ１０２）に含まれない文書情報の数を「９７０件」とする。 Also, the exclusion processing unit 154 compares the list L1 (S1) and the list L2 (T102) to obtain Table 20B. As shown in FIG. 11, it is assumed that the number of pieces of document information included in list L1 (S1) and included in list L2 (T102) is "one". Assume that the number of pieces of document information included in the list L1 (S1) and not included in the list L2 (T102) is "9". It is assumed that the number of document information not included in the list L1 (S1) and included in the list L2 (T102) is "20". Assume that the number of document information not included in the list L1 (S1) and not included in the list L2 (T102) is "970".

除外処理部１５４は、表２０Ｂに対して検定（正確確率検定、カイ二乗検定等）を行い、文Ｓ１と、文Ｔ１０２との関連性の有無を判定する。表２０Ｂに対する検定では、危険度５％で、ｐ値＝０．１９６となり、ｐ値の値が閾値以上であるため、関連性なしと判定する。除外処理部１５４は、セットＭ’テーブル１４４の文書サブ識別情報Ｔ１０２に対応する削除フラグを「オン」に設定する。 The exclusion processing unit 154 performs a test (an exact probability test, a chi-square test, or the like) on Table 20B to determine whether sentence S1 and sentence T102 are related. In the test on Table 20B at a significance level of 5%, the p-value is 0.196, which is equal to or greater than the threshold, so the sentences are determined to be unrelated. The exclusion processing unit 154 sets the deletion flag corresponding to document sub-identification information T102 in the set M' table 144 to "on".

除外処理部１５４は、リストＬ１（Ｓ１～Ｓｎ）と、対応するリストＬ２とを比較して、各文が関連するか否かを判定し、関連しないと判定した文については、削除フラグを「オン」にする処理を繰り返し実行する。 The exclusion processing unit 154 repeatedly compares each list L1 (S1 to Sn) with the corresponding lists L2 to determine whether the sentences are related, and sets the deletion flag to "on" for each sentence determined to be unrelated.
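
The relatedness test on a 2×2 table such as Table 20A or Table 20B can be sketched as follows. The description names only the kind of test, so the choice of a one-sided Fisher exact test, the function names, and the total of 1000 documents (the sum of the four cells in each table) are assumptions made for illustration. Under these assumptions the p-value for Table 20A comes out at roughly 2 × 10^-4, in line with the value quoted above, and the p-value for Table 20B is well above 0.05.

```python
from math import comb


def contingency_table(list_l1, list_l2, total_docs):
    """Count documents in both lists, only L1, only L2, and neither."""
    s1, s2 = set(list_l1), set(list_l2)
    both = len(s1 & s2)
    only_l1 = len(s1 - s2)
    only_l2 = len(s2 - s1)
    neither = total_docs - both - only_l1 - only_l2
    return both, only_l1, only_l2, neither


def fisher_exact_one_sided(both, only_l1, only_l2, neither):
    """P(overlap >= observed) under independence (hypergeometric tail)."""
    total = both + only_l1 + only_l2 + neither
    size_l1 = both + only_l1
    size_l2 = both + only_l2
    denominator = comb(total, size_l1)
    tail = sum(
        comb(size_l2, k) * comb(total - size_l2, size_l1 - k)
        for k in range(both, min(size_l1, size_l2) + 1)
    )
    return tail / denominator


def should_delete(p_value, significance=0.05):
    """A sentence is flagged for deletion when no association is found."""
    return p_value >= significance
```

With list L1 (S1) holding MR1 to MR10 and list L2 (T12) holding MR1 to MR3 and MR11 to MR20, `contingency_table` returns the cells 3, 7, 10, and 980, and `should_delete` returns `False` for the resulting p-value, so sentence T12 is kept.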

続いて、除外処理部１５４が実行する「除外する処理」について説明する。除外処理部１５４は、文書ＤＢ１４１と、セットＭ’テーブル１４４とを比較して、削除フラグが「オン」となる文を、文書ＤＢ１４１から削除する処理を実行する。 Next, the “exclusion processing” executed by the exclusion processing unit 154 will be described. The exclusion processing unit 154 compares the document DB 141 and the set M' table 144, and deletes from the document DB 141 sentences whose deletion flags are "ON".

図１２は、除外処理部の処理の一例を説明するための図（５）である。図１２に示すように、文書サブ識別情報Ｔ１２に対応する削除フラグが「オン」となっているため、除外処理部１５４は、文書識別情報「ＭＲ１」に対応する文書情報から、文書サブ識別情報Ｔ１２に対応する文「対処方法を教えてください。」を削除する。 FIG. 12 is a diagram (5) for explaining an example of the processing of the exclusion processing unit; As shown in FIG. 12, since the deletion flag corresponding to the document sub-identification information T12 is "ON", the exclusion processing unit 154 removes the document sub-identification information from the document information corresponding to the document identification information "MR1". Delete the sentence "Please tell me how to deal with it" corresponding to T12.

除外処理部１５４は、文書サブ識別情報Ｔ２２に対応する削除フラグが「オン」となっているため、文書識別情報「ＭＲ２」に対応する文書情報から、文書サブ識別情報Ｔ２２に対応する文「対処方法が不明です。」を削除する。 Since the deletion flag corresponding to document sub-identification information T22 is "on", the exclusion processing unit 154 deletes the sentence "The countermeasure is unknown." corresponding to document sub-identification information T22 from the document information corresponding to document identification information "MR2".

除外処理部１５４は、他の文書情報についても、削除フラグが「オン」となっている文を削除する処理を繰り返し実行することで、文書ＤＢ１４１を更新する。更新した文書ＤＢ１４１を、文書ＤＢ１４１ａと表記する。ここで、除外処理部１５４は、文書ＤＢ１４１ａを参照し、文書ＤＢ１４１ａに含まれる文書情報のうち、上記の除外する処理により、一つの文となった文書情報を、セットＳテーブル１４２に登録する。 The exclusion processing unit 154 updates the document DB 141 by repeatedly executing the process of deleting sentences with the deletion flag set to "on" for other document information as well. The updated document DB 141 is referred to as document DB 141a. Here, the exclusion processing unit 154 refers to the document DB 141a, and registers, in the set S table 142, the document information that has become one sentence by the above exclusion processing among the document information contained in the document DB 141a.
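
The exclusion step and the subsequent registration in the set S table can be sketched as below. The data layout is assumed for illustration: the document DB is modeled as a mapping from document identifier to a list of sentences, and the set M' records carry a boolean `delete_flag`. The function name `apply_exclusion` is hypothetical.

```python
def apply_exclusion(document_db, set_m_prime):
    """Delete flagged sentences and report documents reduced to one sentence.

    Returns (updated_db, new_single_sentence_docs). The first value plays
    the role of document DB 141a; the second holds the entries that would
    be added to the set S table.
    """
    # Collect the flagged sentences per document.
    flagged = {}
    for record in set_m_prime:
        if record["delete_flag"]:
            flagged.setdefault(record["doc_id"], set()).add(record["sentence"])

    updated_db = {}
    new_single_sentence_docs = {}
    for doc_id, sentences in document_db.items():
        kept = [s for s in sentences if s not in flagged.get(doc_id, set())]
        updated_db[doc_id] = kept
        # Only documents that became single-sentence through deletion qualify.
        if len(sentences) > 1 and len(kept) == 1:
            new_single_sentence_docs[doc_id] = kept[0]
    return updated_db, new_single_sentence_docs
```

Returning a new mapping instead of modifying the input keeps the original document DB available, which matches the distinction the description draws between document DB 141 and document DB 141a.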

図２の説明に戻る。検出部１５５は、文書ＤＢ１４１ａに対してクラスタリングを行うことで、文書ＤＢ１４１ａに含まれる各文書情報を複数のクラスタに分類する。検出部１５５は、複数のクラスタのうち、クラスタに属する文書情報の数が所定数以上となるクラスタを検出する。所定数以上の文書情報が属するクラスタは、障害内容を記述した文書情報といえる。 Returning to the description of FIG. 2. The detection unit 155 classifies each piece of document information included in the document DB 141a into a plurality of clusters by clustering the document DB 141a. The detection unit 155 detects, among the plurality of clusters, the clusters in which the number of pieces of document information belonging to the cluster is equal to or greater than a predetermined number. A cluster to which a predetermined number or more of pieces of document information belong can be said to consist of document information describing failure details.

たとえば、検出部１５５は、文書ＤＢ１４１ａの各文書情報のベクトルを算出する。文書情報に一つの文が含まれている場合には、文書情報のベクトルは、かかる文のベクトルとなる。文書情報に、複数の文が含まれている場合には、各文のベクトルを積算することで、文書情報のベクトルを算出する。検出部１５５は、各文書情報のベクトルの類似度を算出し、類似度が閾値以上となる文書情報が同一のクラスタに属するようにクラスタリングを行う。 For example, the detection unit 155 calculates a vector of each piece of document information in the document DB 141a. If the document information contains one sentence, the vector of the document information is the vector of such sentence. If the document information contains a plurality of sentences, the vector of the document information is calculated by integrating the vectors of each sentence. The detection unit 155 calculates the degree of similarity between the vectors of each piece of document information, and performs clustering so that the pieces of document information whose degree of similarity is greater than or equal to a threshold belong to the same cluster.
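
The clustering and detection performed by the detection unit can be sketched as follows. The description does not fix a similarity measure or a clustering algorithm, so the choices here are assumptions: document vectors are the element-wise sum of sentence vectors, similarity is cosine similarity, and documents are grouped by single-linkage using a union-find structure. All function names are hypothetical.

```python
import math


def document_vector(sentence_vectors):
    """Element-wise sum of the sentence vectors of one document."""
    return [sum(components) for components in zip(*sentence_vectors)]


def cosine_similarity(u, v):
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector is treated as similar to nothing
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def cluster_documents(doc_vectors, threshold):
    """Group documents whose pairwise similarity reaches the threshold."""
    ids = list(doc_vectors)
    parent = {doc_id: doc_id for doc_id in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine_similarity(doc_vectors[a], doc_vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for doc_id in ids:
        clusters.setdefault(find(doc_id), []).append(doc_id)
    return list(clusters.values())


def detect_frequent_clusters(clusters, min_size):
    """Keep only clusters holding at least min_size documents."""
    return [c for c in clusters if len(c) >= min_size]
```

Single-linkage is the simplest reading of "documents whose similarity is at or above a threshold belong to the same cluster"; other algorithms would satisfy the description equally well.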

検出部１５５は、検出結果を表示部１３０に出力して表示させてもよいし、ネットワークを介して、外部装置に通知してもよい。図１３は、検出結果の一例を示す図（１）である。図１３に示すように、検出部１５５は、更新された文書ＤＢ１４１ａをクラスタリングすることで、複数のクラスタ３０Ａ～３０Ｃを生成する。たとえば、検出部１５５は、クラスタ３０Ａに属する文書情報の数が所定数以上の場合には、クラスタ３０Ａを、検出結果として検出する。管理者は、クラスタ３０Ａを参照すると「syn flood（シンフラット）攻撃」に関する文書情報の数が所定数以上であるため「syn flood攻撃」を多発障害として特定することができる。 The detection unit 155 may output the detection result to the display unit 130 for display, or may notify an external device of the result via a network. FIG. 13 is a diagram (1) showing an example of a detection result. As shown in FIG. 13, the detection unit 155 generates a plurality of clusters 30A to 30C by clustering the updated document DB 141a. For example, when the number of pieces of document information belonging to cluster 30A is equal to or greater than a predetermined number, detection unit 155 detects cluster 30A as the detection result. When referring to the cluster 30A, the administrator can specify the "syn flood attack" as a frequent failure because the number of document information items related to the "syn flood attack" is equal to or greater than a predetermined number.

ところで、仮に、更新していない文書ＤＢ１４１に対してそのままクラスタリングを行い、同様にクラスタを検出すると、図１４に示すものとなる。図１４は、検出結果の一例を示す図（２）である。図１４に示すように、仮に、検出部１５５は、文書ＤＢ１４１をクラスタリングすることで、複数のクラスタ３１Ａ～３１Ｃを生成する。たとえば、検出部１５５は、クラスタ３１Ａに属する文書情報の数が所定数以上の場合には、クラスタ３１Ａを、検出結果として検出する。クラスタ３１Ａに含まれる各文書情報は、障害内容に関係のない、文「対処方法を教えてください」の影響により同一のクラスタに属しているため、管理者は、クラスタ３１Ａを参照しても、それぞれの障害の発生件数が所定数に満たないため、多発障害なしと判断してしまう。 Incidentally, if clustering were performed directly on the document DB 141 that has not been updated and clusters were detected in the same way, the result would be as shown in FIG. 14. FIG. 14 is a diagram (2) showing an example of the detection result. As shown in FIG. 14, suppose that the detection unit 155 clusters the document DB 141 to generate a plurality of clusters 31A to 31C. For example, when the number of pieces of document information belonging to cluster 31A is equal to or greater than a predetermined number, the detection unit 155 detects cluster 31A as the detection result. The pieces of document information in cluster 31A belong to the same cluster only because they share the sentence "Please tell me how to deal with it", which is unrelated to the failure content. Consequently, even when the administrator refers to cluster 31A, the number of occurrences of each individual failure falls short of the predetermined number, and the administrator concludes that there is no frequent failure.

次に、本実施例に係る情報処理装置１００の処理手順の一例について説明する。図１５及び図１６は、本実施例に係る情報処理装置の処理手順を示すフローチャートである。図１５に示すように、情報処理装置１００の第一特定部１５２は、文書ＤＢ１４１の各文書情報のうち、１文で構成される文書情報をセットＳテーブル１４２に登録する（ステップＳ１０１）。情報処理装置１００の第二特定部１５３は、文書ＤＢ１４１の各文書情報のうち、複数の文で構成される文書情報を、セットＭテーブル１４３に登録する（ステップＳ１０２）。 Next, an example of the processing procedure of the information processing apparatus 100 according to this embodiment will be described. 15 and 16 are flowcharts showing the processing procedure of the information processing apparatus according to this embodiment. As shown in FIG. 15, the first specifying unit 152 of the information processing apparatus 100 registers document information composed of one sentence in the set S table 142 among the pieces of document information in the document DB 141 (step S101). The second specifying unit 153 of the information processing apparatus 100 registers document information composed of a plurality of sentences among the document information in the document DB 141 in the set M table 143 (step S102).

第二特定部１５３は、セットＭテーブル１４３を基にして、セットＭ’テーブル１４４を生成する（ステップＳ１０３）。第一特定部１５２は、セットＳテーブル１４２から、１文（たとえば、文Ｓ１）を抽出する（ステップＳ１０４）。情報処理装置１００は、セットＳテーブル１４２の全ての文を抽出している場合（抽出に成功しない場合）には（ステップＳ１０５，Ｎｏ）、処理を終了する。 The second identifying unit 153 generates the set M' table 144 based on the set M table 143 (step S103). The first identifying unit 152 extracts one sentence (for example, sentence S1) from the set S table 142 (step S104). When all the sentences in the set S table 142 have been extracted (when the extraction is not successful) (step S105, No), the information processing apparatus 100 ends the process.

一方、第一特定部１５２は、セットＳテーブル１４２の文の抽出に成功した場合には（ステップＳ１０５，Ｙｅｓ）、文の判別モデルを生成する（ステップＳ１０６）。第二特定部１５３は、文の判別モデルをセットＭ’テーブル１４４に適用し、文の類似文を含む複数の文書情報を検出し、リストＬ１に登録する（ステップＳ１０７）。 On the other hand, when the first identification unit 152 succeeds in extracting a sentence from the set S table 142 (step S105, Yes), it generates a discriminant model for the sentence (step S106). The second identification unit 153 applies the discriminant model of the sentence to the set M' table 144, detects the pieces of document information that include sentences similar to the sentence, and registers them in list L1 (step S107).

情報処理装置１００の除外処理部１５４は、リストＬ１の文書情報から、類似文以外の１文（たとえば、Ｔ１２）を抽出し（ステップＳ１０８）、図１６のステップＳ１０９に移行する。 The exclusion processing unit 154 of the information processing apparatus 100 extracts one sentence other than the similar sentences (for example, T12) from the document information of list L1 (step S108), and proceeds to step S109 in FIG. 16.

図１６の説明に移行する。除外処理部１５４は、抽出に成功した場合には（ステップＳ１０９，Ｙｅｓ）、リストＬ１から抽出した文の判別モデルを生成する（ステップＳ１１０）。除外処理部１５４は、リストＬ１から抽出した文の判別モデルをセットＭ’テーブルに適用し、文の類似文を含む文書情報を検出し、リストＬ２に登録する（ステップＳ１１１）。 The description now moves on to FIG. 16. If the extraction is successful (step S109, Yes), the exclusion processing unit 154 generates a discriminant model for the sentence extracted from list L1 (step S110). The exclusion processing unit 154 applies the discriminant model of the sentence extracted from list L1 to the set M' table, detects the document information that includes sentences similar to the sentence, and registers it in list L2 (step S111).

除外処理部１５４は、リストＬ１およびリストＬ２を基にして、抽出した各文（たとえば、文Ｓ１と、文Ｔ１２）との関連の有無を判定する（ステップＳ１１２）。除外処理部１５４は、各文が関連する場合には（ステップＳ１１３，Ｙｅｓ）、ステップＳ１１５に移行する。 The exclusion processing unit 154 determines, based on list L1 and list L2, whether the extracted sentences (for example, sentence S1 and sentence T12) are related (step S112). If the sentences are related (step S113, Yes), the exclusion processing unit 154 proceeds to step S115.

一方、除外処理部１５４は、各文が関連しない場合には（ステップＳ１１３，Ｎｏ）、リストＬ１から抽出した文およびこの文に類似する類似文に対応する削除フラグをオンに設定する（ステップＳ１１４）。 On the other hand, if the sentences are not related (step S113, No), the exclusion processing unit 154 sets to "on" the deletion flags corresponding to the sentence extracted from list L1 and to the sentences similar to it (step S114).

除外処理部１５４は、リストＬ１から、未選択の文を抽出し（ステップＳ１１５）、ステップＳ１０９に移行する。 The exclusion processing unit 154 extracts unselected sentences from the list L1 (step S115), and proceeds to step S109.

ところで、除外処理部１５４は、抽出に成功しない場合には（ステップＳ１０９，Ｎｏ）、文書ＤＢ１４１の各文書情報から、削除フラグがオンとなる文を削除する（ステップＳ１１６）。除外処理部１５４は、削除により１文となって文書情報を、セットＳテーブル１４２に追加し（ステップＳ１１７）、図１５のステップＳ１０４に移行する。 If the extraction is not successful (step S109, No), the exclusion processing unit 154 deletes the sentences whose deletion flags are on from each piece of document information in the document DB 141 (step S116). The exclusion processing unit 154 adds the document information that has been reduced to one sentence by the deletion to the set S table 142 (step S117), and proceeds to step S104 in FIG. 15.

次に、本実施例に係る情報処理装置１００の効果について説明する。情報処理装置１００は、着目した障害内容を記述した文を含む文書を検出し、検出した文書のうち、複数の文を含む文書について、着目した障害内容に関係のある文（障害内容を記述した文）を残す。また、情報処理装置は、着目した障害内容に関係のない文（障害内容を記述していない文）を削除する処理を行う。このように、障害内容を記述した文に関連する文を残し、関連しない文を削除することができるので、クラスタリング処理による障害検出において、誤検知や検出もれを抑止することができる。 Next, the effects of the information processing apparatus 100 according to this embodiment will be described. The information processing apparatus 100 detects a document containing a sentence describing the content of the failure of interest, and among the detected documents, for a document including a plurality of sentences, a sentence related to the content of the failure of interest (a sentence describing the content of the failure) is detected. sentence). In addition, the information processing device performs processing for deleting sentences unrelated to the focused failure content (sentences not describing the failure content). In this way, sentences related to the sentence describing the content of the failure can be left, and sentences not related can be deleted. Therefore, in failure detection by clustering processing, erroneous detection and detection omission can be suppressed.

たとえば、図１３で説明したように、障害内容を記述した文書情報を残し、障害内容を記述していない文書情報を削除することで、類似する障害内容に関連する文書情報をクラスタに分類することができるので、多発障害を特定することが容易となる。図１４で説明したように、仮に、障害内容を記述していない文書情報が残っていると、障害内容を記述していない文書情報を共通に含む文書情報が同一のクラスタに分類されてしまい、多発障害を検出することが難しい。 For example, as described with reference to FIG. 13, leaving the document information that describes the failure content and deleting the document information that does not allows document information related to similar failure content to be classified into the same cluster, which makes it easier to identify frequent failures. As described with reference to FIG. 14, if document information that does not describe the failure content remained, pieces of document information that share such content would be classified into the same cluster, making frequent failures difficult to detect.

また、情報処理装置１００は、文と類似する他の文を判別する場合に、ＰＵ学習を基にして、文の判別モデルを生成し、かかる判別モデルを基にして、類似する他の文を判別する。これによって、文に関する教師データが少ない場合でも、類似する文を判別することができる。 In addition, when discriminating other sentences similar to a sentence, the information processing apparatus 100 generates a discrimination model of the sentence based on PU learning, and based on the discrimination model, discriminates other similar sentences. discriminate. This makes it possible to discriminate similar sentences even when there is little teacher data about sentences.

次に、本実施例に示した情報処理装置１００と同様の機能を実現するコンピュータのハードウェア構成の一例について説明する。図１７は、本実施例に係る情報処理装置と同様の機能を実現するコンピュータのハードウェア構成の一例を示す図である。 Next, an example of a hardware configuration of a computer that implements the same functions as the information processing apparatus 100 shown in this embodiment will be described. FIG. 17 is a diagram showing an example of the hardware configuration of a computer that implements the same functions as the information processing apparatus according to this embodiment.

図１７に示すように、コンピュータ５００は、各種演算処理を実行するＣＰＵ５０１と、ユーザからのデータの入力を受け付ける入力装置５０２と、ディスプレイ５０３とを有する。また、コンピュータ５００は、記憶媒体からプログラム等を読み取る読み取り装置５０４と、有線または無線ネットワークを介して、外部装置等との間でデータの授受を行うインタフェース装置５０５とを有する。コンピュータ５００は、各種情報を一時記憶するＲＡＭ５０６と、ハードディスク装置５０７とを有する。そして、各装置５０１～５０７は、バス５０８に接続される。 As shown in FIG. 17, a computer 500 has a CPU 501 that executes various arithmetic processes, an input device 502 that receives data input from a user, and a display 503 . The computer 500 also has a reading device 504 that reads programs and the like from a storage medium, and an interface device 505 that exchanges data with an external device or the like via a wired or wireless network. The computer 500 has a RAM 506 that temporarily stores various information and a hard disk device 507 . Each device 501 - 507 is then connected to a bus 508 .

ハードディスク装置５０７は、取得プログラム５０７ａ、第一特定プログラム５０７ｂ、第二特定プログラム５０７ｃ、除外処理プログラム５０７ｄ、検出プログラム５０７ｅを有する。ＣＰＵ５０１は、取得プログラム５０７ａ、第一特定プログラム５０７ｂ、第二特定プログラム５０７ｃ、除外処理プログラム５０７ｄ、検出プログラム５０７ｅを読み出してＲＡＭ５０６に展開する。 The hard disk device 507 has an acquisition program 507a, a first identification program 507b, a second identification program 507c, an exclusion processing program 507d, and a detection program 507e. The CPU 501 reads out the acquisition program 507 a , the first identification program 507 b , the second identification program 507 c , the exclusion processing program 507 d and the detection program 507 e and develops them in the RAM 506 .

The acquisition program 507a functions as an acquisition process 506a. The first identification program 507b functions as a first identification process 506b. The second identification program 507c functions as a second identification process 506c. The exclusion processing program 507d functions as an exclusion processing process 506d. The detection program 507e functions as a detection process 506e.

The processing of the acquisition process 506a corresponds to the processing of the acquisition unit 151. The processing of the first identification process 506b corresponds to the processing of the first identification unit 152. The processing of the second identification process 506c corresponds to the processing of the second identification unit 153. The processing of the exclusion processing process 506d corresponds to the processing of the exclusion processing unit 154. The processing of the detection process 506e corresponds to the processing of the detection unit 155.

Note that the programs 507a to 507e need not be stored in the hard disk device 507 from the beginning. For example, each program may be stored on a "portable physical medium" inserted into the computer 500, such as a flexible disk (FD), CD-ROM, DVD, magneto-optical disk, or IC card, and the computer 500 may read out and execute the programs 507a to 507e from that medium.

The following appendices are further disclosed with respect to the embodiments including the examples above.

(Appendix 1) A document processing method executed by a computer, the method comprising:
acquiring a plurality of documents each composed of one or more sentences;
identifying, from among the plurality of documents, a first sentence of interest composed of one sentence that satisfies a preset condition;
acquiring, from among the plurality of documents, a plurality of first documents each composed of a plurality of sentences including the identified first sentence of interest;
identifying, from among the acquired plurality of first documents, a second sentence of interest composed of one sentence other than the identified first sentence of interest;
acquiring, from among the plurality of documents, a plurality of second documents each composed of a plurality of sentences including the second sentence of interest; and
excluding the second sentence of interest from the plurality of documents based on the relationship between the number of identical documents included in both the plurality of first documents and the plurality of second documents and the number of documents other than the identical documents.
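
The steps of Appendix 1 can be sketched roughly as follows. The text only says that exclusion is based on "the relationship" between the two document counts, so the concrete rule used here (exclude a candidate when the documents that contain it without the first sentence of interest outnumber the shared documents) is an assumption, as are the names `docs_containing`, `exclude_noise_sentences`, the `ratio` parameter, exact string matching of sentences, and the sample reports.

```python
def docs_containing(docs, sentence):
    """Indices of multi-sentence documents that contain `sentence`."""
    return {i for i, d in enumerate(docs) if len(d) > 1 and sentence in d}


def exclude_noise_sentences(docs, first_sentence, ratio=1.0):
    """Drop second sentences of interest that mostly occur elsewhere.

    Assumed rule (not stated in the text): a candidate is excluded when
    (documents containing it but not the first sentence) > ratio * (shared documents).
    """
    # First documents: multi-sentence documents containing the first sentence.
    first_docs = docs_containing(docs, first_sentence)
    # Candidate second sentences: every other sentence in the first documents.
    candidates = {s for i in first_docs for s in docs[i] if s != first_sentence}

    excluded = set()
    for cand in candidates:
        # Second documents: multi-sentence documents containing the candidate.
        second_docs = docs_containing(docs, cand)
        shared = len(first_docs & second_docs)   # identical documents
        others = len(second_docs - first_docs)   # documents other than those
        if others > ratio * shared:
            excluded.add(cand)

    cleaned = [[s for s in d if s not in excluded] for d in docs]
    return cleaned, excluded


# Hypothetical maintenance reports, each a list of sentences.
docs = [
    ["fan stopped", "replaced fan", "confirmed normal operation"],
    ["fan stopped", "replaced fan"],
    ["disk error", "replaced disk", "confirmed normal operation"],
    ["power failure", "replaced PSU", "confirmed normal operation"],
    ["screen flicker", "confirmed normal operation"],
]
cleaned, excluded = exclude_noise_sentences(docs, "fan stopped")
```

In this sample, "replaced fan" co-occurs only with the first sentence of interest and is kept, while "confirmed normal operation" appears in three documents unrelated to it and is removed from every document.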

(Appendix 2) The document processing method according to Appendix 1, further comprising classifying the plurality of documents from which the second sentence of interest has been excluded by the excluding into a plurality of clusters based on the similarity between the documents.

(Appendix 3) The document processing method according to Appendix 2, wherein the first sentence of interest is a sentence describing failure content, the method further comprising detecting, based on the number of documents belonging to the plurality of clusters, a cluster related to the sentence describing the failure content.
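
The clustering and detection of Appendices 2 and 3 might look like the sketch below. It substitutes Jaccard similarity over sentence sets and a greedy single-pass grouping for whatever document similarity and clustering method an implementation actually uses, and it assumes that "detecting based on the number of documents" means picking the cluster with the most members. The names `jaccard`, `cluster_documents`, `largest_cluster`, the `threshold` value, and the sample data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two documents viewed as sets of sentences."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_documents(docs, threshold=0.5):
    """Greedy clustering: join the first cluster whose representative
    (its first member) is similar enough, otherwise start a new cluster."""
    clusters = []  # each cluster is a list of document indices
    for i, doc in enumerate(docs):
        for members in clusters:
            if jaccard(doc, docs[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def largest_cluster(clusters):
    """Assumed detection rule: the cluster with the most documents."""
    return max(clusters, key=len)


# Hypothetical documents after noise sentences have been excluded.
docs = [
    ["fan stopped", "replaced fan"],
    ["fan stopped", "replaced fan"],
    ["fan stopped", "replaced fan", "cleaned filter"],
    ["disk error", "replaced disk"],
    ["power failure", "replaced PSU"],
]
clusters = cluster_documents(docs)
frequent = largest_cluster(clusters)
```

With the generic sentences already removed, the three fan-related reports fall into one cluster, which is then reported as the most frequent failure.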

(Appendix 4) The document processing method according to Appendix 1, 2, or 3, further comprising generating a discriminant model that determines whether a sentence is similar to the first sentence of interest, by learning with sentences similar to the first sentence of interest, among the plurality of sentences included in the plurality of documents, as positive examples, wherein the acquiring of the first documents acquires, based on the discriminant model, the plurality of first documents each composed of a plurality of sentences including the identified first sentence of interest from among the plurality of documents.

(Appendix 5) A document processing program that causes a computer to execute a process comprising:
acquiring a plurality of documents each composed of one or more sentences;
identifying, from among the plurality of documents, a first sentence of interest composed of one sentence that satisfies a preset condition;
acquiring, from among the plurality of documents, a plurality of first documents each composed of a plurality of sentences including the identified first sentence of interest;
identifying, from among the acquired plurality of first documents, a second sentence of interest composed of one sentence other than the identified first sentence of interest;
acquiring, from among the plurality of documents, a plurality of second documents each composed of a plurality of sentences including the second sentence of interest; and
excluding the second sentence of interest from the plurality of documents based on the relationship between the number of identical documents included in both the plurality of first documents and the plurality of second documents and the number of documents other than the identical documents.

(Appendix 6) The document processing program according to Appendix 5, wherein the process further comprises classifying the plurality of documents from which the second sentence of interest has been excluded by the excluding into a plurality of clusters based on the similarity between the documents.

(Appendix 7) The document processing program according to Appendix 6, wherein the first sentence of interest is a sentence describing failure content, and the process further comprises detecting, based on the number of documents belonging to the plurality of clusters, a cluster related to the sentence describing the failure content.

(Appendix 8) The document processing program according to Appendix 5, 6, or 7, wherein the process further comprises generating a discriminant model that determines whether a sentence is similar to the first sentence of interest, by learning with sentences similar to the first sentence of interest, among the plurality of sentences included in the plurality of documents, as positive examples, and the acquiring of the first documents acquires, based on the discriminant model, the plurality of first documents each composed of a plurality of sentences including the identified first sentence of interest from among the plurality of documents.

(Appendix 9) An information processing device comprising:
a first identifying unit that acquires a plurality of documents each composed of one or more sentences and identifies, from among the plurality of documents, a first sentence of interest composed of one sentence that satisfies a preset condition;
a second identifying unit that acquires, from among the plurality of documents, a plurality of first documents each composed of a plurality of sentences including the identified first sentence of interest, and identifies, from among the acquired plurality of first documents, a second sentence of interest composed of one sentence other than the identified first sentence of interest; and
an exclusion processing unit that excludes the second sentence of interest from the plurality of documents based on the relationship between the number of identical documents included in both the plurality of first documents and the plurality of second documents and the number of documents other than the identical documents.

(Appendix 10) The information processing device according to Appendix 9, further comprising a detection unit that classifies the plurality of documents from which the second sentence of interest has been excluded by the exclusion processing unit into a plurality of clusters based on the similarity between the documents.

(Appendix 11) The information processing device according to Appendix 10, wherein the first sentence of interest is a sentence describing failure content, and the detection unit further detects, based on the number of documents belonging to the plurality of clusters, a cluster related to the sentence describing the failure content.

(Appendix 12) The information processing device according to Appendix 9, 10, or 11, wherein the first identifying unit further generates a discriminant model that determines whether a sentence is similar to the first sentence of interest, by learning with sentences similar to the first sentence of interest, among a plurality of sentences included in the plurality of documents, as positive examples, and the second identifying unit acquires, based on the discriminant model, the plurality of first documents each composed of a plurality of sentences including the identified first sentence of interest from among the plurality of documents.

100 information processing device
110 communication unit
120 input unit
130 display unit
140 storage unit
141 document DB
142 set S table
143 set M table
144 set M' table
145 discrimination model table
150 control unit
151 acquisition unit
152 first identification unit
153 second identification unit
154 exclusion processing unit
155 detection unit

Claims

1. A computer-implemented document processing method comprising:
acquiring a plurality of documents each composed of one or more sentences;
identifying, from among the plurality of documents, a first sentence of interest composed of one sentence that satisfies a preset condition;
acquiring, from among the plurality of documents, a plurality of first documents each composed of a plurality of sentences including the identified first sentence of interest;
identifying, from among the acquired plurality of first documents, a second sentence of interest composed of one sentence other than the identified first sentence of interest;
acquiring, from among the plurality of documents, a plurality of second documents each composed of a plurality of sentences including the second sentence of interest; and
excluding the second sentence of interest from the plurality of documents based on the relationship between the number of identical documents included in both the plurality of first documents and the plurality of second documents and the number of documents other than the identical documents.

2. The document processing method according to claim 1, further comprising classifying the plurality of documents from which the second sentence of interest has been excluded by the excluding into a plurality of clusters based on the similarity between the documents.

3. The document processing method according to claim 2, wherein the first sentence of interest is a sentence describing failure content, the method further comprising detecting, based on the number of documents belonging to the plurality of clusters, a cluster related to the sentence describing the failure content.

4. The document processing method according to claim 1, 2, or 3, further comprising generating a discriminant model that determines whether a sentence is similar to the first sentence of interest, by learning with sentences similar to the first sentence of interest, among the plurality of sentences included in the plurality of documents, as positive examples, wherein the acquiring of the first documents acquires, based on the discriminant model, the plurality of first documents each composed of a plurality of sentences including the identified first sentence of interest from among the plurality of documents.

5. A document processing program that causes a computer to execute a process comprising:
acquiring a plurality of documents each composed of one or more sentences;
identifying, from among the plurality of documents, a first sentence of interest composed of one sentence that satisfies a preset condition;
acquiring, from among the plurality of documents, a plurality of first documents each composed of a plurality of sentences including the identified first sentence of interest;
identifying, from among the acquired plurality of first documents, a second sentence of interest composed of one sentence other than the identified first sentence of interest;
acquiring, from among the plurality of documents, a plurality of second documents each composed of a plurality of sentences including the second sentence of interest; and
excluding the second sentence of interest from the plurality of documents based on the relationship between the number of identical documents included in both the plurality of first documents and the plurality of second documents and the number of documents other than the identical documents.

6. An information processing device comprising:
a first identifying unit that acquires a plurality of documents each composed of one or more sentences and identifies, from among the plurality of documents, a first sentence of interest composed of one sentence that satisfies a preset condition;
a second identifying unit that acquires, from among the plurality of documents, a plurality of first documents each composed of a plurality of sentences including the identified first sentence of interest, and identifies, from among the acquired plurality of first documents, a second sentence of interest composed of one sentence other than the identified first sentence of interest; and
an exclusion processing unit that acquires, from among the plurality of documents, a plurality of second documents each composed of a plurality of sentences including the second sentence of interest, and excludes the second sentence of interest from the plurality of documents based on the relationship between the number of identical documents included in both the plurality of first documents and the plurality of second documents and the number of documents other than the identical documents.