JP7305077B2

JP7305077B2 - Information processing device, abstract output method, and abstract output program

Info

Publication number: JP7305077B2
Application number: JP2023501746A
Authority: JP
Inventors: 辰彦斉藤; 啓恭伍井
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2021-02-25
Filing date: 2021-02-25
Publication date: 2023-07-07
Anticipated expiration: 2041-02-25
Also published as: JPWO2022180721A1; WO2022180721A1

Description

The present disclosure relates to an information processing device, a summary sentence output method, and a summary sentence output program.

At a call center, an operator summarizes the conversation between a customer and the operator and creates a report. This report-writing work is also called after-call work, and it places a heavy burden on the operator. It is therefore conceivable to summarize the dialogue content automatically. A technique for creating a summary has been proposed (see Patent Literature 1). The summarizing device of Patent Literature 1 creates a summary by combining important sentences extracted from input data. The input data is, for example, data indicating a speech recognition result.

Japanese Patent No. 5562219; International Publication No. WO2022/049668

In the above technique, the summary is created from data indicating the speech recognition result. Consequently, if the speech recognition contains an error, the summary contains that error. In addition, the content of the summary is expressed in spoken language. The summary thus depends on the style of the input data, so a summary produced by the above technique may not be appropriate.

An object of the present disclosure is to output an appropriate summary sentence.

An information processing device according to one aspect of the present disclosure is provided. The information processing device includes: an acquisition unit that acquires text data indicating dialogue content and knowledge information including a plurality of sentences created based on past reports; an analysis unit that analyzes the text data using morphological analysis; an important sentence extraction unit that uses the result of the morphological analysis to extract one sentence, as a first important sentence, from among a plurality of sentences included in the text data; a creation search calculation unit that creates a query based on the first important sentence, searches the knowledge information for sentences obtained by the query, and calculates a score for each of the retrieved sentences by a preset method; a calculation update unit that calculates, based on the first important sentence and each of the retrieved sentences, a plurality of similarities, each being the degree to which the first important sentence and one of the retrieved sentences are similar, and updates the score of each retrieved sentence based on the plurality of similarities; a selection unit that selects, based on the scores of the retrieved sentences, one of the retrieved sentences as a summary sentence; and an output unit that outputs the summary sentence.

According to the present disclosure, an appropriate summary sentence can be output.

FIG. 1 is a block diagram showing the functions of the information processing device according to Embodiment 1.
FIG. 2 is a diagram showing the hardware of the information processing device according to Embodiment 1.
FIG. 3 is a diagram showing an example of the unnecessary word dictionary according to Embodiment 1.
FIG. 4 is a diagram showing an example of the sentence segmentation dictionary according to Embodiment 1.
FIG. 5 is a diagram showing an example of the extraction of important sentences according to Embodiment 1.
FIG. 6 is a diagram showing an example of a query according to Embodiment 1.
FIG. 7 is a diagram showing an example of the knowledge database according to Embodiment 1.
FIG. 8 is a diagram showing an example of search results according to Embodiment 1.
FIG. 9 is a diagram showing an example of updated scores according to Embodiment 1.
FIG. 10 is a flowchart showing an example of processing executed by the information processing device according to Embodiment 1.
FIG. 11 is a block diagram showing the functions of the creation device according to Embodiment 1.
FIG. 12 is a diagram showing an example of the dialogue database according to Embodiment 1.
FIG. 13 is a diagram showing an example of the report database according to Embodiment 1.
FIG. 14 is a diagram showing an example of the deletion of non-sentences according to Embodiment 1.
FIG. 15 is a flowchart showing an example of processing executed by the creation device according to Embodiment 1.
FIG. 16 is a block diagram showing the functions of the information processing device according to Embodiment 2.
FIG. 17 is a diagram showing an example of a case where a category is not estimated according to Embodiment 2.
FIG. 18 is a flowchart showing an example of processing executed by the information processing device according to Embodiment 2.
FIG. 19 is a block diagram showing the functions of the creation device according to Embodiment 2.
FIG. 20 is a flowchart showing an example of processing executed by the creation device according to Embodiment 2.
FIG. 21 is a block diagram showing the functions of the information processing device according to Embodiment 3.
FIG. 22 is a diagram showing an example of auxiliary information according to Embodiment 3.
FIG. 23 is a flowchart showing an example of processing executed by the information processing device according to Embodiment 3.
FIG. 24 is a block diagram showing the functions of the creation device according to Embodiment 3.
FIG. 25 is a flowchart showing an example of processing executed by the creation device according to Embodiment 3.
FIG. 26 is a block diagram showing the functions of the information processing device according to Embodiment 4.
FIG. 27 is a flowchart showing an example of processing executed by the information processing device according to Embodiment 4.

Embodiments will be described below with reference to the drawings. The following embodiments are merely examples, and various modifications are possible within the scope of the present disclosure.

Embodiment 1.
FIG. 1 is a block diagram showing the functions of the information processing device according to Embodiment 1. The information processing device 100 is a device that executes a summary sentence output method. The information processing device 100 may also be called a dialogue summary generation device.

First, the hardware of the information processing device 100 will be described.
FIG. 2 is a diagram showing the hardware of the information processing device according to Embodiment 1. The information processing device 100 has a processor 101, a volatile storage device 102, a nonvolatile storage device 103, and an interface 104.

The processor 101 controls the information processing device 100 as a whole. For example, the processor 101 is a CPU (Central Processing Unit) or an FPGA (Field Programmable Gate Array). The processor 101 may be a multiprocessor. The information processing device 100 may also have a processing circuit. The processing circuit may be a single circuit or a composite circuit.

The volatile storage device 102 is the main storage device of the information processing device 100. For example, the volatile storage device 102 is a RAM (Random Access Memory). The nonvolatile storage device 103 is an auxiliary storage device of the information processing device 100. For example, the nonvolatile storage device 103 is an HDD (Hard Disk Drive) or an SSD (Solid State Drive).

The interface 104 communicates with other devices. The interface 104 may also receive an audio signal representing the speech of the customer and the operator. Furthermore, the interface 104 may receive text data entered by the operator.

Returning to FIG. 1, the functions of the information processing device 100 will be described.
The information processing device 100 has a storage unit 110, an acquisition unit 120, an analysis unit 130, an unnecessary word deletion unit 140, an important sentence extraction unit 150, a creation search calculation unit 160, a calculation update unit 170, a selection unit 180, and an output unit 190.

The storage unit 110 may be implemented as a storage area secured in the volatile storage device 102 or the nonvolatile storage device 103.
Some or all of the acquisition unit 120, the analysis unit 130, the unnecessary word deletion unit 140, the important sentence extraction unit 150, the creation search calculation unit 160, the calculation update unit 170, the selection unit 180, and the output unit 190 may be implemented by the processing circuit. Alternatively, some or all of these units may be implemented as modules of a program executed by the processor 101. The program executed by the processor 101 is also called a summary sentence output program. The summary sentence output program is recorded, for example, on a recording medium.

The storage unit 110 may store an unnecessary word dictionary 111, a sentence segmentation dictionary 112, a word importance model 113, and a knowledge database 114. These are described later.

The acquisition unit 120 acquires text data. For example, the acquisition unit 120 acquires the text data from an external device (for example, a cloud server) or from the storage unit 110. The text data is data indicating dialogue content, such as a dialogue between a customer and an operator or a dialogue between a chatbot and a user. Text data indicating a dialogue between a customer and an operator is generated using speech recognition technology. The text data includes a plurality of sentences indicating the dialogue content.

The acquisition unit 120 acquires the knowledge database 114. For example, the acquisition unit 120 acquires the knowledge database 114 from the storage unit 110 or from an external device.

The analysis unit 130 analyzes the text data using morphological analysis. As a result, the words included in the text data and their parts of speech are extracted. The analysis unit 130 may also use syntactic analysis to analyze the phrases in the text data and the relationships between them.
The analysis unit 130 may analyze the text data after the unnecessary word deletion process described below has been executed.

The unnecessary word deletion unit 140 deletes unnecessary words from the text data by a preset method. For example, the unnecessary word deletion unit 140 deletes unnecessary words using the unnecessary word dictionary 111. An example of the unnecessary word dictionary 111 follows.

FIG. 3 is a diagram showing an example of the unnecessary word dictionary according to Embodiment 1. The unnecessary word dictionary 111 is stored, for example, in the storage unit 110. Fillers such as "あー" ("ah") and "えー" ("er") and fixed phrases such as "お待たせ致しました。" ("Thank you for waiting.") are registered in the unnecessary word dictionary 111. Information indicating the correspondence between words and parts of speech may also be registered in the unnecessary word dictionary 111.

The unnecessary word deletion unit 140 may also delete unnecessary words from the text data using a trained model obtained by machine learning. If the text data contains no unnecessary words, the unnecessary word deletion unit 140 does not execute the process.
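The dictionary-based deletion described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the lists `FILLERS` and `FIXED_PHRASES` and the function `delete_unnecessary_words` are hypothetical names, and plain substring replacement is an assumption (a real system would more likely match on morphemes and parts of speech).

```python
# Hypothetical stand-in for the unnecessary word dictionary 111.
# The entries are the examples named in the text.
FILLERS = ["あー", "えー"]
FIXED_PHRASES = ["お待たせ致しました。"]


def delete_unnecessary_words(text: str) -> str:
    """Return the text with every dictionary entry removed.

    If no entry occurs in the text, the text is returned unchanged.
    """
    # Remove the longer fixed phrases first, then the fillers.
    for entry in FIXED_PHRASES + FILLERS:
        text = text.replace(entry, "")
    return text


cleaned = delete_unnecessary_words("お待たせ致しました。えー、エアコンが動かない。")
print(cleaned)
```

Substring replacement can also strip a filler that happens to occur inside a longer word, which is one reason a morpheme-level match may be preferable in practice.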

Using the result of the morphological analysis, the important sentence extraction unit 150 extracts a preset number of sentences, as important sentences, from among the plurality of sentences included in the text data (for example, the text data from which unnecessary words have been deleted). The preset number may be one, or two or more. When the preset number is one, the extracted important sentence is also called the first important sentence. In the following description, the preset number is two or more.

The extraction of important sentences will now be described in detail. First, the important sentence extraction unit 150 splits the text data into its individual sentences. Possible splitting methods include splitting at the segment boundaries produced by speech recognition, splitting at periods or commas in the text data, splitting using the sentence segmentation dictionary 112, and splitting using a trained model. An example of the sentence segmentation dictionary 112 follows.

FIG. 4 is a diagram showing an example of the sentence segmentation dictionary according to Embodiment 1. The sentence segmentation dictionary 112 is stored, for example, in the storage unit 110. The important sentence extraction unit 150 may split the text data into sentences using the sentence segmentation dictionary 112.
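Of the splitting methods listed above, splitting at periods or commas is the simplest to illustrate. The sketch below is an assumption-laden example: the function name `split_sentences` and its `delimiters` parameter are hypothetical, and the dictionary-based and model-based methods are not shown.

```python
import re


def split_sentences(text: str, delimiters: str = "。") -> list:
    """Split text after each delimiter character, keeping the delimiter."""
    d = re.escape(delimiters)
    # A run of non-delimiter characters, optionally followed by one delimiter.
    return re.findall(f"[^{d}]+[{d}]?", text)


print(split_sentences("エアコンが動かない。リモコンの電池を交換した。"))
print(split_sentences("えー、動かない。", delimiters="。、"))
```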

The important sentence extraction unit 150 calculates an importance for each of the sentences. The calculation for a single sentence is as follows. For example, the important sentence extraction unit 150 calculates the importance of each word in the sentence, as obtained from the morphological analysis, using TF-IDF, Okapi BM25, or the like, and sums the word importances to obtain the importance of the sentence. Alternatively, the important sentence extraction unit 150 may calculate the importance of each word using the word importance model 113, which is a trained model, and then sum the word importances to obtain the importance of the sentence. The important sentence extraction unit 150 may also use the average of the word importances as the importance of the sentence.

The importance of one sentence is calculated in this way. The important sentence extraction unit 150 repeats the calculation for every sentence, which yields a plurality of importances corresponding to the plurality of sentences.

The important sentence extraction unit 150 sorts the sentences in descending order of importance and extracts the preset number of top-ranked sentences as important sentences. A concrete example of the extraction follows.
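The importance calculation and ranking can be sketched as below. Several details are assumptions made only for illustration: sentences are given as already-tokenized word lists, the IDF statistics are computed over the sentences of the dialogue itself with the smoothed form `log(N / df) + 1`, and the names `sentence_importance` and `extract_important` are hypothetical. The text permits other weightings (Okapi BM25 or the trained word importance model 113).

```python
import math
from collections import Counter


def sentence_importance(sentences, average=False):
    """Return one importance value per tokenized sentence.

    Word importance is TF * IDF; the sentence importance is the sum
    (or, if average=True, the mean) of its word importances.
    """
    n = len(sentences)
    # Document frequency: in how many sentences each word occurs.
    df = Counter(w for s in sentences for w in set(s))
    result = []
    for s in sentences:
        tf = Counter(s)
        word_scores = [c * (math.log(n / df[w]) + 1.0) for w, c in tf.items()]
        total = sum(word_scores)
        if average and word_scores:
            total /= len(word_scores)
        result.append(total)
    return result


def extract_important(sentences, k):
    """Return the k sentences with the highest importance, best first."""
    scores = sentence_importance(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in ranked[:k]]


dialogue = [["はい"], ["エアコン", "が", "動く", "ない"], ["はい"]]
print(extract_important(dialogue, 1))
```

Summing favors long sentences, while averaging removes that length bias; which behavior is preferable depends on how verbose the dialogue is.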

FIG. 5 is a diagram showing an example of the extraction of important sentences according to Embodiment 1. FIG. 5 shows the plurality of sentences (that is, the split sentences) included in the text data. The important sentence extraction unit 150 extracts the important sentences from among these sentences.

As described above, the important sentence extraction unit 150 extracts the preset number of top-ranked sentences as important sentences, which removes sentences of low importance. As described later, the summary sentence is selected based on the important sentences, so extracting only the important sentences leads to the selection of an appropriate summary sentence.

The creation search calculation unit 160 creates a query based on the important sentence. The query may be created in units of sentences or in units of words (that is, morphemes). The query may also be created from n-gram chains of sentences or words. An example of a query follows.

FIG. 6 is a diagram showing an example of a query according to Embodiment 1. FIG. 6 shows a query created from an important sentence using 2-gram chains.

The creation search calculation unit 160 may create the query by lexicalizing consecutive words in the important sentence, that is, by joining them into a single vocabulary item. For example, the creation search calculation unit 160 creates the query by lexicalizing a negation word in the important sentence together with the verb immediately before it. FIG. 6 shows "居る＿ない" ("iru_nai"), in which a negation word is joined to the immediately preceding verb.

A numeral and the unit that follows it may each appear in the query as separate terms. For example, if the important sentence is "エアコンを２５℃に設定した" ("set the air conditioner to 25°C"), the query is "エアコン２５ ℃ 設定" ("air conditioner", "25", "°C", "setting"). A search with this query may retrieve sentences with a different meaning, such as "冷蔵庫を２５日に購入" ("purchased a refrigerator on the 25th"). The creation search calculation unit 160 may therefore create the query by lexicalizing a numeral in the important sentence together with the unit that follows it. For example, the lexicalized term is "２５＿℃", and the above query becomes "エアコン２５＿℃ 設定". Treating the numeral and the unit as one vocabulary item in this way prevents sentences with a different meaning from being retrieved.
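The query construction described above, including both lexicalization rules and the 2-gram chain, can be sketched as follows. The input format of `(surface, part_of_speech)` pairs, the part-of-speech labels, the set `CONTENT_POS`, and the function names are all hypothetical; a real morphological analyzer uses its own tag set. The sketch uses ASCII digits and an ASCII underscore for readability.

```python
# Hypothetical part-of-speech labels kept as query terms.
CONTENT_POS = {"noun", "verb", "adjective", "numeral", "unit"}


def lexicalize(tokens):
    """Join verb + negation and numeral + unit into single terms."""
    out = []
    for surface, pos in tokens:
        if out and pos == "negation" and out[-1][1] == "verb":
            out[-1] = (out[-1][0] + "_" + surface, "verb")
        elif out and pos == "unit" and out[-1][1] == "numeral":
            out[-1] = (out[-1][0] + "_" + surface, "noun")
        else:
            out.append((surface, pos))
    return out


def make_query(tokens):
    """Return the content terms of the lexicalized token sequence."""
    return [s for s, pos in lexicalize(tokens) if pos in CONTENT_POS]


def ngram_chain(terms, n=2):
    """Return the n-gram chains of a term list, each joined by a space."""
    return [" ".join(terms[i:i + n]) for i in range(len(terms) - n + 1)]


sentence = [("エアコン", "noun"), ("を", "particle"), ("25", "numeral"),
            ("℃", "unit"), ("に", "particle"), ("設定", "noun"),
            ("した", "auxiliary")]
query = make_query(sentence)
print(query)
print(ngram_chain(query))
```

Because "25" and "℃" become the single term "25_℃", a sentence indexed with "25_日" (the 25th) no longer shares a term with the query.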

The creation search calculation unit 160 searches the knowledge database 114 for sentences obtained by the query. In other words, the creation search calculation unit 160 uses the query to search the knowledge database 114 for sentences whose meaning is close to that of the important sentence. An example of the knowledge database 114 follows.

FIG. 7 is a diagram showing an example of the knowledge database according to Embodiment 1. The knowledge database 114 is also called knowledge information. The knowledge database 114 includes a plurality of sentences created based on past reports. The knowledge database 114 has the items sentence, index registration query, step, and category.

Sentences created based on past reports are registered in the sentence item. A phrase, a plurality of consecutive phrases, or a plurality of consecutive sentences may also be registered in the sentence item. A query is registered in the index registration query item. The name of a task is registered in the step item. The category indicated by the content of the registered sentence is registered in the category item. In this way, a category is associated with each of the sentences registered in the sentence item.
The knowledge database 114 may also be information in graph form.

In this way, the creation search calculation unit 160 uses the query to search the knowledge database 114 for sentences whose meaning is close to that of the important sentence. A plurality of sentences are retrieved as the search result.

The creation search calculation unit 160 may also create the query based on the important sentence and at least one of the sentences immediately preceding and following it in the text data. Because words from the preceding or following sentence are then included in the query, the creation search calculation unit 160 can retrieve sentences related to the important sentence even when the important sentence is short.

The creation search calculation unit 160 may include synonyms of words in the important sentence in the query. For example, when the important sentence contains the word "点かない" ("does not turn on"), the creation search calculation unit 160 includes its synonym "消える" ("goes out") in the query. The creation search calculation unit 160 can obtain synonyms of words in the important sentence using word2vec. Including synonyms in the query allows sentences that contain those synonyms to be retrieved.
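Synonym expansion of a query can be sketched as below. To keep the example self-contained, a hand-written table `SYNONYMS` stands in for the word2vec lookup mentioned in the text; that table, its single entry (taken from the example above), and the function `expand_query` are hypothetical.

```python
# Hypothetical synonym table; the text obtains synonyms with word2vec instead.
SYNONYMS = {"点かない": ["消える"]}


def expand_query(terms, synonyms=None):
    """Return the query terms followed by any synonyms not already present."""
    table = SYNONYMS if synonyms is None else synonyms
    expanded = list(terms)
    for term in terms:
        for synonym in table.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


print(expand_query(["照明", "点かない"]))
```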

The creation search calculation unit 160 may also filter the search targets by part of speech, such as nouns, verbs, adjectives, and adjectival nouns. The creation search calculation unit 160 may calculate sentence-vector similarity using the important sentence and the knowledge database 114.

The creation search calculation unit 160 calculates a score for each of the retrieved sentences by a preset method. One such method is as follows. Suppose the query is "エアコン２５＿℃ 設定" and the retrieved sentences are "エアコンは２５℃" ("the air conditioner is at 25°C") and "エアコンが動かない" ("the air conditioner does not work"). The creation search calculation unit 160 calculates a score of 2 for "エアコンは２５℃" and a score of 1 for "エアコンが動かない". That is, the score is the number of query words that match the retrieved sentence. The creation search calculation unit 160 may instead calculate the score using a scoring method employed by a search engine such as Elasticsearch.
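The match-count scoring in this example can be reproduced with a few lines. The sketch assumes that each knowledge-base sentence is stored with index terms that were lexicalized in the same way as the query; the dictionary layout and the names `match_score` and `search` are hypothetical.

```python
def match_score(query_terms, sentence_terms):
    """Number of distinct query terms that also occur in the sentence."""
    return len(set(query_terms) & set(sentence_terms))


def search(query_terms, knowledge):
    """Return (sentence, score) pairs with score > 0, highest score first.

    knowledge maps each sentence to its list of index terms.
    """
    hits = [(s, match_score(query_terms, terms)) for s, terms in knowledge.items()]
    hits = [(s, score) for s, score in hits if score > 0]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


knowledge = {
    "エアコンは25℃": ["エアコン", "25_℃"],
    "エアコンが動かない": ["エアコン", "動く_ない"],
    "冷蔵庫を25日に購入": ["冷蔵庫", "25_日", "購入"],
}
print(search(["エアコン", "25_℃", "設定"], knowledge))
```

The refrigerator sentence is not retrieved because its index term is "25_日", not "25_℃".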

The creation search calculation unit 160 may include the score of each sentence in the search result. An example of the search result follows.
FIG. 8 is a diagram showing an example of search results according to Embodiment 1. As FIG. 8 shows, each retrieved sentence and its score are output as the search result.

The calculation update unit 170 calculates a plurality of similarities based on the important sentence and each of the retrieved sentences. First, consider the similarity calculated from the important sentence and one of the retrieved sentences. This similarity is the degree to which the important sentence and that one sentence are similar. The plurality of similarities are therefore the degrees to which the important sentence is similar to each of the retrieved sentences.

One conceivable way to calculate the similarity is to use the Jaccard coefficient or a similar measure. However, when the target is spoken dialogue, such a measure is undesirable. The reason is that the Jaccard coefficient becomes smaller as the number of elements in the difference sets grows. Utterances in spoken dialogue are often redundant, so a large difference set obtained by subtracting the set of words in the retrieved sentence from the set of words in the important sentence should be tolerated. On the other hand, when the difference set obtained by subtracting the set of words in the important sentence from the set of words in the retrieved sentence is large, the retrieved sentence may contain extra content that was never uttered, so a penalty should be applied. The calculation update unit 170 therefore calculates the similarity using Equation (1), where I is the set of words included in the important sentence and K is the set of words included in the retrieved sentence.

In this way, the calculation update unit 170 calculates the similarity between the two sets while applying a penalty when the difference set obtained by subtracting the set of words in the important sentence from the set of words in the retrieved sentence is large. This makes it possible to retrieve sentences that do not contain extra, unuttered content while still absorbing redundant utterances.
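The contrast between the Jaccard coefficient and an asymmetric measure can be illustrated as follows. The function `asymmetric_similarity` below is NOT Equation (1) of this publication; it is one simple measure, |I ∩ K| / |K|, chosen here as an assumption because it has the two stated properties: it ignores words that appear only in the important sentence (I minus K) and decreases as words that appear only in the retrieved sentence (K minus I) increase.

```python
def jaccard(i_words, k_words):
    """Jaccard coefficient |I & K| / |I | K|; penalizes both difference sets."""
    i_set, k_set = set(i_words), set(k_words)
    union = i_set | k_set
    return len(i_set & k_set) / len(union) if union else 0.0


def asymmetric_similarity(i_words, k_words):
    """Illustrative stand-in for Equation (1): |I & K| / |K|.

    Extra words in I are tolerated; extra words in K lower the value.
    """
    i_set, k_set = set(i_words), set(k_words)
    return len(i_set & k_set) / len(k_set) if k_set else 0.0


important = ["昨日", "から", "エアコン", "25_℃", "設定", "少し"]  # redundant utterance
concise = ["エアコン", "25_℃", "設定"]
padded = concise + ["冷蔵庫", "購入", "故障"]  # contains unuttered content

print(jaccard(important, concise), asymmetric_similarity(important, concise))
print(jaccard(important, padded), asymmetric_similarity(important, padded))
```

With these inputs the Jaccard coefficient gives the concise sentence only 0.5 because of the redundant words in the utterance, whereas the asymmetric measure gives it 1.0 and still lowers the value for the padded sentence.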

上記したように、算出更新部１７０は、重要文と、当該１つの文とに基づいて、類似度を算出する。同様に、算出更新部１７０は、複数の文のそれぞれに対応する類似度を算出する。これにより、複数の類似度が算出される。算出更新部１７０は、検索された複数の文のうち、上位Ｎ個の文のそれぞれを用いて、類似度を算出してもよい。
算出更新部１７０は、複数の類似度に基づいて、検索された複数の文のそれぞれのスコアを更新する。ここで、更新されたスコアを例示する。As described above, the calculation updating unit 170 calculates the degree of similarity based on the important sentence and the one sentence. Similarly, the calculation updating unit 170 calculates similarities corresponding to each of the plurality of sentences. Thereby, a plurality of degrees of similarity are calculated. The calculation update unit 170 may calculate the degree of similarity using each of the top N sentences among the retrieved sentences.
The calculation updating unit 170 updates the score of each of the retrieved sentences based on the similarities. Here is an example of the updated score.

図９は、実施の形態１の更新されたスコアの例を示す図である。図９のスコアは、類似度を示している。 FIG. 9 is a diagram showing an example of updated scores according to Embodiment 1. The scores in FIG. 9 indicate degrees of similarity.

選択部１８０は、更新された、複数の文のそれぞれのスコアに基づいて、検索された複数の文のうちの１つの文を、要約文として、選択する。例えば、選択部１８０は、最も高いスコアに対応する文を、要約文として、選択する。以下の説明では、最も高いスコアに対応する文が、要約文として、特定されるものとする。
出力部１９０は、要約文を出力する。The selection unit 180 selects one of the plurality of retrieved sentences as a summary sentence based on the updated score of each of the plurality of sentences. For example, the selection unit 180 selects the sentence corresponding to the highest score as the summary sentence. In the following description, the sentence corresponding to the highest score shall be identified as the summary sentence.
The output unit 190 outputs a summary sentence.

次に、情報処理装置１００が実行する処理を、フローチャートを用いて、説明する。
図１０は、実施の形態１の情報処理装置が実行する処理の例を示すフローチャートである。
（ステップＳ１１）取得部１２０は、テキストデータを取得する。
（ステップＳ１２）解析部１３０は、形態素解析を用いて、テキストデータを解析する。
（ステップＳ１３）不要語削除部１４０は、テキストデータの中から不要語を削除する。Next, processing executed by the information processing apparatus 100 will be described using a flowchart.
FIG. 10 is a flowchart illustrating an example of processing executed by the information processing apparatus according to the first embodiment.
(Step S11) The acquisition unit 120 acquires text data.
(Step S12) The analysis unit 130 analyzes the text data using morphological analysis.
(Step S13) The unnecessary word deletion section 140 deletes unnecessary words from the text data.

（ステップＳ１４）重要文抽出部１５０は、テキストデータに含まれている複数の文を分割する。重要文抽出部１５０は、複数の文のそれぞれに対して、重要度を算出する。これにより、複数の文に対応する複数の重要度が算出される。重要文抽出部１５０は、複数の重要度に基づいて、上位の予め設定された数の文を、重要文として抽出する。これにより、複数の重要文が抽出される。 (Step S14) The important sentence extractor 150 divides a plurality of sentences included in the text data. The important sentence extraction unit 150 calculates the importance of each sentence. As a result, multiple degrees of importance corresponding to multiple sentences are calculated. The important sentence extraction unit 150 extracts a preset number of high-ranking sentences as important sentences based on multiple degrees of importance. As a result, multiple important sentences are extracted.

（ステップＳ１５）作成検索算出部１６０は、未処理の重要文があるか否かを判定する。未処理の重要文がある場合、処理は、ステップＳ１６に進む。全ての重要文が処理された場合、処理は、ステップＳ２０に進む。
（ステップＳ１６）作成検索算出部１６０は、未処理の重要文の中から、１つの重要文を選択する。なお、選択された重要文は、第１の重要文と呼んでもよい。(Step S15) The creation search calculation unit 160 determines whether or not there is an unprocessed important sentence. If there is an unprocessed important sentence, the process proceeds to step S16. If all important sentences have been processed, the process proceeds to step S20.
(Step S16) The creation search calculation unit 160 selects one important sentence from the unprocessed important sentences. Note that the selected important sentence may be called a first important sentence.

（ステップＳ１７）作成検索算出部１６０は、重要文に基づいて、クエリを作成し、知識データベース１１４の中から、クエリにより得られる文を検索する。これにより、複数の文が検索される。作成検索算出部１６０は、検索された複数の文のそれぞれのスコアを算出する。これにより、複数の文に対応する複数のスコアが、算出される。 (Step S17) The creation search calculation unit 160 creates a query based on the important sentence, and searches the knowledge database 114 for a sentence obtained by the query. This retrieves multiple sentences. The creation search calculation unit 160 calculates scores for each of the plurality of searched sentences. Thereby, multiple scores corresponding to multiple sentences are calculated.

（ステップＳ１８）算出更新部１７０は、重要文と、上位Ｎ個のスコアに対応する複数の文のそれぞれとに基づいて、複数の類似度を算出する。算出更新部１７０は、複数の類似度に基づいて、上位Ｎ個のスコアに対応する複数の文のそれぞれのスコアを更新する。
（ステップＳ１９）選択部１８０は、最も高いスコアに対応する文を、要約文として、選択する。そして、処理は、ステップＳ１５に進む。
（ステップＳ２０）出力部１９０は、特定された複数の要約文をまとめて、要約テキストとして、出力する。出力部１９０は、報告書形式に作成された要約テキストを出力してもよい。(Step S18) The calculation update unit 170 calculates a plurality of degrees of similarity based on the important sentence and each of the sentences corresponding to the top N scores. The calculation update unit 170 updates the scores of the sentences corresponding to the top N scores based on the similarities.
(Step S19) The selection unit 180 selects the sentence corresponding to the highest score as the summary sentence. Then, the process proceeds to step S15.
(Step S20) The output unit 190 puts together the identified plural abstracts and outputs them as a summary text. The output unit 190 may output a summary text created in a report format.
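
The loop over steps S15 to S20 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: `summarize`, `search`, `similarity`, and `top_n` are hypothetical names, the search and similarity functions are passed in by the caller, and the updated score is assumed to be simply the similarity (in line with the note that the scores in FIG. 9 indicate the similarity). Skipping an important sentence with no search hits is also an assumption.

```python
def summarize(important_sentences, search, similarity, top_n=3):
    """Pick one knowledge-base sentence per important sentence (S15-S20)."""
    summaries = []
    for sentence in important_sentences:              # S15, S16: next unprocessed sentence
        hits = search(sentence)                       # S17: list of (text, score)
        hits = sorted(hits, key=lambda h: h[1], reverse=True)[:top_n]
        if not hits:                                  # assumption: nothing retrieved, skip
            continue
        rescored = [(text, similarity(sentence, text)) for text, _ in hits]  # S18
        best_text, _ = max(rescored, key=lambda h: h[1])                     # S19
        summaries.append(best_text)
    return "\n".join(summaries)                       # S20: combine into a summary text


# Toy usage with a two-sentence knowledge base and a word-overlap similarity.
kb = ["The refrigerator light does not turn on.", "The microwave door is broken."]


def toy_search(query):
    return [(s, 1.0) for s in kb]


def toy_similarity(important, retrieved):
    i_words, k_words = set(important.lower().split()), set(retrieved.lower().split())
    return len(i_words & k_words) / len(k_words)


print(summarize(["um the refrigerator light does not turn on."], toy_search, toy_similarity))
```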

実施の形態１によれば、要約テキストに含まれる要約文は、音声認識結果に基づく文でない。当該要約文は、過去の報告書に基づく文である。そのため、当該要約文には、誤りが含まれていない可能性が高い。また、当該要約文の内容は、話し言葉で表されていない。よって、情報処理装置１００は、適切な要約文を出力することができる。 According to Embodiment 1, the summary sentence included in the summary text is not a sentence based on the speech recognition result. The abstract is based on past reports. Therefore, there is a high possibility that the summary does not contain any error. Also, the content of the abstract is not expressed in spoken language. Therefore, the information processing apparatus 100 can output an appropriate summary sentence.

ここで、作成装置を説明する。作成装置は、単語重要度モデル１１３と知識データベース１１４とを作成する。具体的に、作成装置を説明する。
図１１は、実施の形態１の作成装置の機能を示すブロック図である。作成装置２００は、記憶部２１０、単語重要度学習部２２０、及びデータベース作成部２３０を有する。Here, the production device will be described. The creating device creates a word importance model 113 and a knowledge database 114 . Specifically, the production device will be described.
FIG. 11 is a block diagram showing functions of the creation device according to the first embodiment. The creation device 200 has a storage unit 210 , a word importance learning unit 220 and a database creation unit 230 .

記憶部２１０は、作成装置２００が有する揮発性記憶装置又は不揮発性記憶装置に確保した記憶領域として実現してもよい。
単語重要度学習部２２０及びデータベース作成部２３０の一部又は全部は、作成装置２００が有する処理回路によって実現してもよい。また、単語重要度学習部２２０及びデータベース作成部２３０の一部又は全部は、作成装置２００が有するプロセッサが実行するプログラムのモジュールとして実現してもよい。The storage unit 210 may be realized as a storage area secured in a volatile storage device or a non-volatile storage device of the creation device 200 .
A part or all of the word importance level learning section 220 and the database creation section 230 may be implemented by a processing circuit of the creation device 200 . Also, part or all of the word importance level learning unit 220 and the database creation unit 230 may be realized as modules of programs executed by the processor of the creation device 200 .

記憶部２１０は、対話データベース２１１を記憶する。ここで、対話データベース２１１を例示する。
図１２は、実施の形態１の対話データベースの例を示す図である。対話データベース２１１は、記憶部２１０に格納されている。対話データベース２１１には、過去の対話履歴が登録されている。具体的には、対話データベース２１１は、対話ＩＤ（ｉｄｅｎｔｉｆｉｅｒ）、音声認識結果、カテゴリ、及び受付日時の項目を有する。Storage unit 210 stores dialogue database 211 . Here, the dialog database 211 is illustrated.
FIG. 12 is a diagram showing an example of a dialogue database according to Embodiment 1. The dialogue database 211 is stored in the storage unit 210. Past dialogue histories are registered in the dialogue database 211. Specifically, the dialogue database 211 has items of dialogue ID (identifier), speech recognition result, category, and reception date and time.

対話ＩＤの項目には、識別子が登録される。音声認識結果の項目には、対話内容が登録される。カテゴリの項目には、音声認識結果の項目に登録されている文の内容が示すカテゴリが登録される。受付日時の項目には、対話が行われた日時が登録される。 An identifier is registered in the dialogue ID item. Conversation content is registered in the speech recognition result item. The Category field registers a category indicated by the content of the sentence registered in the Speech Recognition Result field. The date and time when the dialogue was conducted is registered in the item of date and time of reception.

記憶部２１０は、報告書データベース２１２を記憶する。ここで、報告書データベース２１２を例示する。
図１３は、実施の形態１の報告書データベースの例を示す図である。報告書データベース２１２は、記憶部２１０に格納されている。報告書データベース２１２は、過去の報告書に基づいて作成された情報である。報告書データベース２１２は、対話ＩＤ、受付履歴、対応履歴、カテゴリ、及び受付日時の項目を有する。Storage unit 210 stores report database 212 . Here, the report database 212 is illustrated.
FIG. 13 is a diagram illustrating an example of a report database according to Embodiment 1. The report database 212 is stored in the storage unit 210. The report database 212 is information created based on past reports. The report database 212 has items of dialogue ID, reception history, response history, category, and reception date and time.

対話ＩＤの項目には、識別子が登録される。受付履歴の項目には、対話データベース２１１の音声認識結果の項目に登録されている情報の要約文が登録される。対応履歴の項目には、対応内容が登録される。カテゴリの項目には、受付履歴に登録されている内容が示すカテゴリが登録される。受付日時の項目には、対話が行われた日時が登録される。 An identifier is registered in the dialogue ID item. A summary of the information registered in the speech recognition result item of the dialog database 211 is registered in the reception history item. Correspondence contents are registered in the correspondence history item. A category indicated by the content registered in the reception history is registered in the category item. The date and time when the dialogue was conducted is registered in the item of date and time of reception.

単語重要度学習部２２０は、対話データベース２１１を用いて機械学習を行うことにより、単語重要度モデル１１３を作成する。なお、単語重要度モデル１１３は、単語が入力された場合、当該単語の重要度を出力する。 The word importance learning unit 220 creates the word importance model 113 by performing machine learning using the dialogue database 211 . When a word is input, the word importance model 113 outputs the importance of the word.

データベース作成部２３０は、報告書データベース２１２に基づいて、知識データベース１１４を作成する。例えば、データベース作成部２３０は、報告書データベース２１２の受付履歴及び対応履歴の項目に登録されている情報のうち、意味のある文、又は意味のある文節を抽出することで、知識データベース１１４を作成する。 The database creation unit 230 creates the knowledge database 114 based on the report database 212. For example, the database creation unit 230 creates the knowledge database 114 by extracting meaningful sentences or meaningful clauses from the information registered in the reception history and response history items of the report database 212.

データベース作成部２３０は、報告書データベース２１２に登録されている情報のうち、言語として不自然な文（以下、非文という）を削除してもよい。例えば、データベース作成部２３０は、ｎ－ｇｒａｍ尤度等を用いて、文の尤度を算出し、文の尤度に基づいて、非文を削除してもよい。ここで、非文が削除される例を示す。 The database creation unit 230 may delete sentences that are unnatural in terms of language (hereinafter referred to as non-sentences) from the information registered in the report database 212 . For example, the database creating unit 230 may calculate the likelihood of a sentence using an n-gram likelihood or the like, and delete non-sentences based on the likelihood of the sentence. Here is an example where non-sentences are deleted.

図１４は、実施の形態１の非文の削除の例を示す図である。図１４が示すように、データベース作成部２３０は、報告書データベース２１２に登録されている情報の中から、非文を削除する。このように、知識データベース１１４は、過去の報告書の中から非文が削除されることにより作成される。そして、過去の報告書の中から非文が削除されることにより、非文が要約文として選択されることが防止できる。 FIG. 14 is a diagram illustrating an example of non-sentence deletion according to Embodiment 1. As shown in FIG. 14, the database creation unit 230 deletes non-sentences from the information registered in the report database 212. Thus, the knowledge database 114 is created by deleting non-sentences from past reports. Deleting non-sentences from past reports prevents a non-sentence from being selected as a summary sentence.
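
As an illustration of filtering non-sentences by n-gram likelihood, the following Python sketch trains an add-one-smoothed word bigram model and drops sentences whose average log-likelihood falls below a threshold. The text only says "n-gram likelihood or the like", so the choice of bigrams, the smoothing, the whitespace tokenization (Japanese text would need morphological analysis first), the threshold value, and all function names are assumptions.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count context unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams, vocab = Counter(), Counter(), {"<s>", "</s>"}
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens[:-1])              # every token that precedes another
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams, len(vocab)


def avg_log_likelihood(sentence, model):
    """Average add-one-smoothed bigram log-probability of a sentence."""
    unigrams, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    pairs = list(zip(tokens, tokens[1:]))
    total = sum(
        math.log((bigrams[p] + 1) / (unigrams[p[0]] + vocab_size)) for p in pairs
    )
    return total / len(pairs)


def drop_non_sentences(sentences, model, threshold):
    """Keep only sentences whose average log-likelihood reaches the threshold."""
    return [s for s in sentences if avg_log_likelihood(s, model) >= threshold]


model = train_bigram([
    "the light does not turn on",
    "the door does not close",
    "the light is broken",
])
good, bad = "the light does not turn on", "on turn light the not"
print(drop_non_sentences([good, bad], model, threshold=-2.0))
```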

次に、作成装置２００が実行する処理を、フローチャートを用いて、説明する。
図１５は、実施の形態１の作成装置が実行する処理の例を示すフローチャートである。
（ステップＳ２１）単語重要度学習部２２０は、対話データベース２１１の音声認識結果を参照し、形態素解析を用いて、音声認識結果を解析する。
（ステップＳ２２）単語重要度学習部２２０は、ＴＦ－ＩＤＦなどを用いて、解析により得られた単語の重要度を算出する。単語重要度学習部２２０は、単語と重要度との対応関係を示す情報を作成する。Next, processing executed by the creation device 200 will be described using a flowchart.
FIG. 15 is a flowchart illustrating an example of processing executed by the creation device according to Embodiment 1.
(Step S21) The word importance learning unit 220 refers to the speech recognition results in the dialogue database 211 and analyzes the speech recognition results using morphological analysis.
(Step S22) The word importance learning unit 220 uses TF-IDF or the like to calculate the importance of the words obtained by the analysis. The word importance learning section 220 creates information indicating the correspondence between words and importance.

（ステップＳ２３）単語重要度学習部２２０は、全ての音声認識結果に対して処理を行ったか否かを判定する。全ての音声認識結果に対して処理を行った場合、処理は、ステップS２４に進む。未処理の音声認識結果がある場合、処理は、ステップＳ２１に進む。 (Step S23) The word importance level learning unit 220 determines whether or not all speech recognition results have been processed. If all speech recognition results have been processed, the process proceeds to step S24. If there is an unprocessed speech recognition result, the process proceeds to step S21.

このように、ステップＳ２１とステップＳ２２とが繰り返されることで、単語に対応する重要度が変化する。言い換えれば、学習により、単語に対応する重要度が更新される。そして、ステップＳ２１とステップＳ２２とが繰り返されることで得られた、単語と重要度との対応関係を示す情報が、単語重要度モデル１１３になる。 By repeating steps S21 and S22 in this manner, the degree of importance corresponding to a word changes. In other words, learning updates the importance associated with a word. Then, the word importance model 113 is information indicating the correspondence between words and importance obtained by repeating steps S21 and S22.
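
One way to picture the word-to-importance table built in steps S21 to S23 is the following Python sketch. It assumes a plain TF-IDF weighting and keeps, for each word, the maximum TF-IDF value seen across dialogues; the aggregation rule, the function name `build_word_importance`, and the pre-tokenized input are assumptions, since the text only says "TF-IDF or the like".

```python
import math
from collections import Counter


def build_word_importance(dialogues):
    """dialogues: list of word lists, one per speech recognition result."""
    n_docs = len(dialogues)
    doc_freq = Counter()
    for words in dialogues:
        doc_freq.update(set(words))               # document frequency per word

    importance = {}
    for words in dialogues:                       # one pass per dialogue (S21, S22)
        term_freq = Counter(words)
        for word, count in term_freq.items():
            tfidf = (count / len(words)) * math.log(n_docs / doc_freq[word])
            # Assumption: keep the largest value observed so far for each word.
            importance[word] = max(importance.get(word, 0.0), tfidf)
    return importance


table = build_word_importance([
    ["the", "light", "is", "off"],
    ["the", "door", "is", "open"],
    ["the", "light", "blinks"],
])
print(sorted(table.items(), key=lambda kv: kv[1], reverse=True))
```

In this toy input, a word that occurs in every dialogue ("the") ends up with importance zero, while a word specific to one dialogue ("door") ranks high.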

（ステップＳ２４）データベース作成部２３０は、報告書データベース２１２に基づいて、知識データベース１１４を作成する。
これにより、単語重要度モデル１１３と知識データベース１１４とが、作成される。 (Step S24) The database creation unit 230 creates the knowledge database 114 based on the report database 212.
As a result, a word importance model 113 and a knowledge database 114 are created.

実施の形態２．
次に、実施の形態２を説明する。実施の形態２では、実施の形態１と相違する事項を主に説明する。そして、実施の形態２では、実施の形態１と共通する事項の説明を省略する。
図１６は、実施の形態２の情報処理装置の機能を示すブロック図である。情報処理装置１００は、さらに、カテゴリ推定部１９１を有する。Embodiment 2.
Next, Embodiment 2 will be described. In Embodiment 2, mainly matters different from Embodiment 1 will be described. In the second embodiment, descriptions of items common to the first embodiment are omitted.
FIG. 16 is a block diagram showing functions of the information processing apparatus according to the second embodiment. Information processing apparatus 100 further includes category estimating section 191 .

記憶部１１０は、さらに、カテゴリ推定モデル１１５を記憶してもよい。カテゴリ推定モデル１１５は、単語が入力された場合、当該単語に基づいたカテゴリを出力する。言い換えれば、カテゴリ推定モデル１１５は、当該単語に基づいて、カテゴリを推定する。なお、例えば、カテゴリは、機種である。 The storage unit 110 may further store a category estimation model 115. When a word is input, the category estimation model 115 outputs a category based on that word. In other words, the category estimation model 115 estimates a category based on the word. Note that the category is, for example, a product model.

取得部１２０は、カテゴリ推定モデル１１５を取得する。例えば、取得部１２０は、カテゴリ推定モデル１１５を記憶部１１０から取得する。また、例えば、取得部１２０は、カテゴリ推定モデル１１５を外部装置から取得する。
カテゴリ推定部１９１は、形態素解析により得られた単語と、カテゴリ推定モデル１１５とを用いて、対話内容のカテゴリを推定する。Acquisition unit 120 acquires category estimation model 115 . For example, the acquisition unit 120 acquires the category estimation model 115 from the storage unit 110 . Also, for example, the acquisition unit 120 acquires the category estimation model 115 from an external device.
The category estimation unit 191 estimates the category of dialogue content using the words obtained by the morphological analysis and the category estimation model 115 .

ここで、カテゴリが推定されない場合の例を示す。
図１７は、実施の形態２のカテゴリが推定されない場合の例を示す図である。テキストデータが示す対話内容のカテゴリは、冷蔵庫である。カテゴリが推定されない場合、単語“庫内灯”に基づいて、オーブンレンジに関する文が、多く検索される。そこで、作成検索算出部１６０は、推定されたカテゴリ“冷蔵庫”を用いて、“冷蔵庫”に関する文を検索する。これにより、対話内容のカテゴリに関する文のみが、検索される。情報処理装置１００は、対話内容のカテゴリに関する文のみを検索することで、適切な要約文を選択することができる。Here is an example when the category is not inferred.
FIG. 17 is a diagram showing an example in which the category is not estimated according to the second embodiment. The category of dialogue content indicated by the text data is a refrigerator. If the category is not estimated, many sentences related to microwave ovens are retrieved based on the word "inside light". Therefore, the creation search calculation unit 160 searches for sentences related to "refrigerator" using the estimated category "refrigerator". As a result, only sentences relating to the dialogue content category are retrieved. The information processing apparatus 100 can select an appropriate summary sentence by retrieving only sentences related to the category of dialogue content.

図１８は、実施の形態２の情報処理装置が実行する処理の例を示すフローチャートである。図１８の処理は、ステップＳ１３ａが実行される点が図１０の処理と異なる。また、図１８の処理では、ステップＳ１７がステップＳ１７ａに変更される。そこで、図１８では、ステップＳ１３ａ，１７ａを説明する。そして、ステップＳ１３ａ，１７ａ以外の処理の説明は、省略する。 FIG. 18 is a flowchart illustrating an example of processing executed by the information processing apparatus according to the second embodiment. The process of FIG. 18 differs from the process of FIG. 10 in that step S13a is executed. Further, in the process of FIG. 18, step S17 is changed to step S17a. Therefore, steps S13a and S17a are described with reference to FIG. 18, and description of the other steps is omitted.

（ステップＳ１３ａ）カテゴリ推定部１９１は、形態素解析により得られた単語と、カテゴリ推定モデル１１５とを用いて、対話内容のカテゴリを推定する。
（ステップＳ１７ａ）作成検索算出部１６０は、重要文に基づいて、クエリを作成する。作成検索算出部１６０は、推定されたカテゴリとクエリを用いて、知識データベース１１４に対して検索を行う。すなわち、作成検索算出部１６０は、推定されたカテゴリに情報を絞った状態で、クエリを用いて、知識データベース１１４に対して検索を行う。これにより、推定されたカテゴリと関係のある複数の文が検索される。作成検索算出部１６０は、検索された複数の文のそれぞれのスコアを算出する。これにより、複数の文に対応する複数のスコアが、算出される。 (Step S13a) The category estimation unit 191 estimates the category of the dialogue content using the words obtained by the morphological analysis and the category estimation model 115.
(Step S17a) The creation search calculation unit 160 creates a query based on the important sentence. The creation search calculator 160 searches the knowledge database 114 using the estimated category and query. That is, the creation search calculation unit 160 searches the knowledge database 114 using the query while narrowing down the information to the estimated category. This retrieves multiple sentences that are related to the inferred category. The creation search calculation unit 160 calculates scores for each of the plurality of searched sentences. Thereby, multiple scores corresponding to multiple sentences are calculated.
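
Step S17a can be pictured with the following Python sketch, which narrows the knowledge database to the estimated category before scoring. It assumes each knowledge-database entry carries a category label and scores hits by simple word overlap; the entry layout, the scoring, and the name `search_in_category` are hypothetical.

```python
def search_in_category(query_words, category, knowledge_base):
    """Return (text, score) pairs for entries in the given category only."""
    hits = []
    for entry in knowledge_base:
        if entry["category"] != category:         # restrict to the estimated category
            continue
        overlap = len(set(query_words) & set(entry["text"].split()))
        if overlap:
            hits.append((entry["text"], overlap)) # assumption: score = word overlap
    return sorted(hits, key=lambda h: h[1], reverse=True)


knowledge_base = [
    {"text": "interior light of the refrigerator does not turn on", "category": "refrigerator"},
    {"text": "interior light of the microwave oven does not turn on", "category": "microwave oven"},
]
print(search_in_category(["interior", "light"], "refrigerator", knowledge_base))
```

Without the category filter, both entries would match the query words equally, which corresponds to the situation illustrated for FIG. 17.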

実施の形態２によれば、情報処理装置１００は、対話内容のカテゴリに関する文のみを検索することで、適切な要約文を選択することができる。 According to Embodiment 2, the information processing apparatus 100 can select an appropriate summary sentence by searching only sentences related to the category of dialogue content.

ここで、カテゴリ推定モデル１１５は、作成装置２００により、作成される。カテゴリ推定モデル１１５の作成について説明する。
図１９は、実施の形態２の作成装置の機能を示すブロック図である。作成装置２００は、さらに、カテゴリ推定学習部２４０を有する。カテゴリ推定学習部２４０は、カテゴリ推定モデル１１５を作成する。Here, the category estimation model 115 is created by the creation device 200 . Creation of the category estimation model 115 will be described.
FIG. 19 is a block diagram showing functions of the creation device according to the second embodiment. Creation device 200 further includes category estimation learning section 240 . Category inference learning unit 240 creates category inference model 115 .

図２０は、実施の形態２の作成装置が実行する処理の例を示すフローチャートである。図２０の処理は、ステップＳ２４ａが実行される点が図１５の処理と異なる。そこで、図２０では、ステップＳ２４ａを説明する。そして、ステップＳ２４ａ以外の処理の説明は、省略する。 FIG. 20 is a flowchart illustrating an example of processing executed by the creation device according to Embodiment 2. The process of FIG. 20 differs from the process of FIG. 15 in that step S24a is executed. Therefore, step S24a is described with reference to FIG. 20, and description of the other steps is omitted.

（ステップＳ２４ａ）カテゴリ推定学習部２４０は、対話データベース２１１の音声認識結果に対して、形態素解析を行う。カテゴリ推定学習部２４０は、形態素解析により得られた単語と、対話データベース２１１のカテゴリとにおける自己相互情報量を算出し、自己相互情報量に基づいて、単語とカテゴリとの対応関係を示す情報を、カテゴリ推定モデル１１５として、作成する。 (Step S24a) The category estimation learning unit 240 performs morphological analysis on the speech recognition results in the dialogue database 211. The category estimation learning unit 240 calculates the pointwise mutual information (PMI) between the words obtained by the morphological analysis and the categories in the dialogue database 211 and, based on the PMI, creates information indicating the correspondence between words and categories as the category estimation model 115.

また、カテゴリ推定学習部２４０は、報告書データベース２１２の受付履歴に対して、形態素解析を行ってもよい。カテゴリ推定学習部２４０は、形態素解析により得られた単語と、報告書データベース２１２のカテゴリとにおける自己相互情報量を算出し、自己相互情報量に基づいて、単語とカテゴリとの対応関係を示す情報を、カテゴリ推定モデル１１５として、作成する。 Alternatively, the category estimation learning unit 240 may perform morphological analysis on the reception history in the report database 212. In that case, the category estimation learning unit 240 calculates the pointwise mutual information (PMI) between the words obtained by the morphological analysis and the categories in the report database 212 and, based on the PMI, creates information indicating the correspondence between words and categories as the category estimation model 115.
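
A word-to-category table based on pointwise mutual information, PMI(w, c) = log(p(w, c) / (p(w) p(c))), could be built along the lines of the following Python sketch. Probabilities are estimated from document counts. Estimating the category by summing PMI values, treating unseen word and category pairs as 0, and the names `pmi_table` and `estimate_category` are all assumptions made for illustration.

```python
import math
from collections import Counter


def pmi_table(samples):
    """samples: list of (words, category); returns {(word, category): PMI}."""
    n = len(samples)
    word_count, cat_count, joint = Counter(), Counter(), Counter()
    for words, category in samples:
        cat_count[category] += 1
        for word in set(words):                   # count each word once per sample
            word_count[word] += 1
            joint[(word, category)] += 1
    # p(w, c) / (p(w) p(c)) simplifies to joint * n / (word_count * cat_count).
    return {
        (w, c): math.log(k * n / (word_count[w] * cat_count[c]))
        for (w, c), k in joint.items()
    }


def estimate_category(words, table, categories):
    """Assumption: pick the category with the largest summed PMI (unseen pairs count 0)."""
    def total(category):
        return sum(table.get((w, category), 0.0) for w in set(words))
    return max(categories, key=total)


samples = [
    (["freezer", "light"], "refrigerator"),
    (["freezer", "ice"], "refrigerator"),
    (["turntable", "light"], "microwave oven"),
]
table = pmi_table(samples)
print(estimate_category(["freezer", "ice"], table, ["refrigerator", "microwave oven"]))
```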

実施の形態３．
次に、実施の形態３を説明する。実施の形態３では、実施の形態１と相違する事項を主に説明する。そして、実施の形態３では、実施の形態１と共通する事項の説明を省略する。
図２１は、実施の形態３の情報処理装置の機能を示すブロック図である。記憶部１１０は、補助情報１１６を記憶してもよい。補助情報１１６は、クエリの生成を補助する情報である。言い換えれば、補助情報１１６は、クエリを生成する際に用いられる情報である。例えば、補助情報１１６は、特許文献２に記載の３次元情報と同じような情報であると考えてもよい。ここで、補助情報１１６を具体的に示す。 Embodiment 3.
Next, Embodiment 3 will be described. In the third embodiment, mainly matters different from the first embodiment will be described. In the third embodiment, descriptions of matters common to the first embodiment are omitted.
FIG. 21 is a block diagram showing functions of the information processing apparatus according to the third embodiment. The storage unit 110 may store auxiliary information 116 . The auxiliary information 116 is information that aids query generation. In other words, the auxiliary information 116 is information used when generating queries. For example, the auxiliary information 116 may be considered to be information similar to the three-dimensional information described in Patent Document 2. Here, the auxiliary information 116 is specifically shown.

図２２は、実施の形態３の補助情報の例を示す図である。補助情報１１６は、複数の述語である複数の単語のそれぞれと、複数の関係情報のそれぞれとの対応関係を示す情報である。関係情報とは、複数の単語のそれぞれの関係性を示す。 FIG. 22 is a diagram illustrating an example of auxiliary information according to Embodiment 3. The auxiliary information 116 is information indicating the correspondence between each of a plurality of words that are predicates and each of a plurality of pieces of relationship information. The relationship information indicates the relationships among a plurality of words.

例えば、補助情報１１６は、述語ラベルとサブ単語文脈行列の項目を有する。述語ラベルの項目には、述語である単語が登録される。例えば、述語ラベルの項目には、述語である単語“落ちる”が登録される。サブ単語文脈行列の項目には、関係情報が登録される。ここで、関係情報は、２次元のテーブルで表されると考えてもよい。すなわち、関係情報は、２次元情報と考えてもよい。関係情報には、述語である単語が対応付けられる。述語である単語と関係情報との対応関係を示す情報は、３次元情報と考えてもよい。よって、補助情報１１６は、３次元情報と考えてもよい。 For example, the auxiliary information 116 comprises predicate label and subword context matrix entries. A word that is a predicate is registered in the item of the predicate label. For example, in the item of predicate label, the predicate word "fall" is registered. Related information is registered in the items of the sub-word context matrix. Here, it may be considered that the relationship information is represented by a two-dimensional table. That is, the relationship information may be considered as two-dimensional information. A word that is a predicate is associated with the relational information. The information indicating the correspondence between the predicate word and the relational information may be considered as three-dimensional information. Therefore, the auxiliary information 116 may be considered as three-dimensional information.

上述したように、関係情報は、複数の単語のそれぞれの関係性を示す。当該複数の単語とは、動詞、名詞、形容詞などの単語である。図２２の関係情報では、名詞の単語が例示されている。上述の通り、図２２の関係情報には、名詞以外の品詞の単語が含まれてもよい。 As described above, the relationship information indicates relationships between multiple words. The plurality of words are words such as verbs, nouns, and adjectives. The relationship information in FIG. 22 exemplifies noun words. As described above, the relationship information in FIG. 22 may include words of parts of speech other than nouns.

次に、補助情報１１６を具体的に説明する。例えば、述語ラベル“落ちる”には、関係情報が対応付けられている。図２２の関係情報では、“証明”と“照明”との関係性の度合が“１５９”であることが示されている。ここで、“１５９”などの数字は、複数の単語のそれぞれの関係性の度合を示す関係度と呼ぶ。このように、補助情報１１６には、関係度が含まれている。また、関係度は、自己相互情報量と考えてもよい。なお、関係度の上限は、１００に限らない。関係度“１５９”は、予め設定された閾値よりも大きい。よって、図２２の関係情報は、“証明”と“照明”との関係性が強いことを示している。 Next, the auxiliary information 116 will be described specifically. For example, relationship information is associated with the predicate label "落ちる" (ochiru, "to fall" or "to go out"). The relationship information in FIG. 22 shows that the degree of relationship between "証明" (shōmei, "proof") and "照明" (shōmei, "lighting"), which are homophones, is "159". A number such as "159" is called a degree of relationship, which indicates how strongly words are related to each other. Thus, the auxiliary information 116 contains degrees of relationship. The degree of relationship may be regarded as pointwise mutual information. Note that the upper limit of the degree of relationship is not limited to 100. The degree of relationship "159" is greater than a preset threshold. Therefore, the relationship information in FIG. 22 indicates that "証明" and "照明" are strongly related.

取得部１２０は、補助情報１１６を取得する。例えば、取得部１２０は、補助情報１１６を記憶部１１０から取得する。また、例えば、取得部１２０は、補助情報１１６を外部装置から取得する。 Acquisition unit 120 acquires auxiliary information 116 . For example, the acquisition unit 120 acquires the auxiliary information 116 from the storage unit 110 . Also, for example, the acquisition unit 120 acquires the auxiliary information 116 from an external device.

作成検索算出部１６０は、重要文に対して形態素解析を実行することで得られた複数の品詞付単語の中から、述語になれる名詞の単語又は述語の単語を特定する。作成検索算出部１６０は、名詞の単語が述語に変換された単語又は特定された述語の単語と、複数の品詞付単語の中の単語（例えば、第１の単語とも言う。）と、補助情報１１６とに基づいて、当該単語（すなわち、第１の単語）と関係がある単語である関係単語を特定する。作成検索算出部１６０は、重要文と関係単語とに基づいてクエリを生成する。 The creation search calculation unit 160 identifies a noun word or a predicate word that can be a predicate from a plurality of words with parts of speech obtained by executing morphological analysis on the important sentence. The creation search calculation unit 160 generates a word converted from a noun word into a predicate or a specified predicate word, a word among a plurality of words with a part of speech (for example, also referred to as a first word), and auxiliary information. 116 to identify related words that are words that are related to the word (ie, the first word). The creation search calculation unit 160 creates a query based on the important sentence and related words.

例えば、重要文が“証明が落ちる”であるものとする。作成検索算出部１６０は、述語の単語“落ちる”を特定する。作成検索算出部１６０は、述語の単語“落ちる”と、単語“証明”と、補助情報１１６とに基づいて、単語“証明”と関係がある単語“照明”を特定する。作成検索算出部１６０は、重要文“証明が落ちる”と単語“照明”とに基づいてクエリを生成する。例えば、作成検索算出部１６０は、クエリ“証明落ちる照明”を作成する。このように、作成検索算出部１６０は、クエリ拡張によって、クエリを作成する。 For example, assume that the important sentence is "証明が落ちる" ("the proof falls"). The creation search calculation unit 160 identifies the predicate word "落ちる" ("falls"). Based on the predicate word "落ちる", the word "証明" ("proof"), and the auxiliary information 116, the creation search calculation unit 160 identifies the word "照明" ("lighting") as a word related to "証明". The creation search calculation unit 160 generates a query based on the important sentence "証明が落ちる" and the word "照明". For example, the creation search calculation unit 160 creates the query "証明落ちる照明" ("proof falls lighting"). In this way, the creation search calculation unit 160 creates a query by query expansion.

ここで、重要文“証明が落ちる”の“証明”は、“照明”の誤りである。例えば、重要文（すなわち、テキストデータ）が音声認識によって作成された場合、音声認識の誤りによって、重要文“証明が落ちる”が作成される。クエリが“証明が落ちる”に基づいて、作成された場合、“証明”に関係する文が、検索される。“証明”に関係する文に基づいて、選択された要約文は、正確性が低い。そこで、作成検索算出部１６０は、“照明”を含むクエリを作成する。これにより、“照明”に関係する文も、検索される。これにより、情報処理装置１００は、“照明”に関係する文を要約文として、選択できる。 Here, "証明" ("proof") in the important sentence "証明が落ちる" is an error for its homophone "照明" ("lighting"). For example, when the important sentence (that is, the text data) is produced by speech recognition, a recognition error yields the important sentence "証明が落ちる" instead of "照明が落ちる" ("the lights go out"). If the query is created from "証明が落ちる" alone, sentences related to "証明" are retrieved, and a summary sentence selected from them has low accuracy. Therefore, the creation search calculation unit 160 creates a query that includes "照明". As a result, sentences related to "照明" are also retrieved, and the information processing apparatus 100 can select a sentence related to "照明" as the summary sentence.
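
The query expansion described above can be sketched in Python as follows. The auxiliary information is modeled here as a nested dictionary, predicate to word to related word to degree of relationship, which is only one possible encoding of the "three-dimensional information". The value 159 comes from the FIG. 22 example; the threshold of 100, the extra entry with degree 12, and the name `expand_query` are assumptions.

```python
def expand_query(words, predicate, aux, threshold):
    """Append words whose degree of relationship exceeds the threshold."""
    related = []
    row = aux.get(predicate, {})                  # relationship table for this predicate
    for word in words:
        for candidate, degree in row.get(word, {}).items():
            if degree > threshold and candidate not in words and candidate not in related:
                related.append(candidate)
    return words + related


# Hypothetical auxiliary information; only the value 159 is taken from the text.
aux = {"落ちる": {"証明": {"照明": 159, "賞名": 12}}}

query = expand_query(["証明", "落ちる"], "落ちる", aux, threshold=100)
print("".join(query))
```

The expanded query keeps the misrecognized word and adds the strongly related candidate, so sentences about "照明" become retrievable as well.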

図２３は、実施の形態３の情報処理装置が実行する処理の例を示すフローチャートである。図２３の処理では、ステップＳ１７がステップＳ１７ｂに変更される。そこで、図２３では、ステップＳ１７ｂを説明する。そして、ステップＳ１７ｂ以外の処理の説明は、省略する。 FIG. 23 is a flowchart illustrating an example of processing executed by the information processing apparatus according to the third embodiment. In the process of FIG. 23, step S17 is changed to step S17b. Therefore, step S17b is described with reference to FIG. 23, and description of the other steps is omitted.

（ステップＳ１７ｂ）作成検索算出部１６０は、クエリ拡張によって、クエリを作成する。作成検索算出部１６０は、知識データベース１１４の中から、クエリにより得られる文を検索する。これにより、複数の文が検索される。作成検索算出部１６０は、検索された複数の文のそれぞれのスコアを算出する。これにより、複数の文に対応する複数のスコアが、算出される。 (Step S17b) The creation search calculation unit 160 creates a query by query expansion. The creation search calculation unit 160 searches the knowledge database 114 for sentences obtained by the query. This retrieves multiple sentences. The creation search calculation unit 160 calculates scores for each of the plurality of searched sentences. Thereby, multiple scores corresponding to multiple sentences are calculated.

実施の形態３によれば、情報処理装置１００は、正確性の高い要約文を選択できる。
ここで、補助情報１１６は、作成装置２００により、作成される。補助情報１１６の作成について説明する。According to Embodiment 3, the information processing apparatus 100 can select a summary with high accuracy.
Here, the auxiliary information 116 is created by the creation device 200 . Creation of the auxiliary information 116 will be described.

図２４は、実施の形態３の作成装置の機能を示すブロック図である。作成装置２００は、さらに、補助情報作成部２５０を有する。
補助情報作成部２５０は、補助情報１１６を作成する。FIG. 24 is a block diagram showing functions of the creation device according to the third embodiment. The creating device 200 further has an auxiliary information creating section 250 .
Auxiliary information creating unit 250 creates auxiliary information 116 .

図２５は、実施の形態３の作成装置が実行する処理の例を示すフローチャートである。図２５の処理は、ステップＳ２４ｂが実行される点が図１５の処理と異なる。そこで、図２５では、ステップＳ２４ｂを説明する。そして、ステップＳ２４ｂ以外の処理の説明は、省略する。 FIG. 25 is a flowchart illustrating an example of processing executed by the creation device according to Embodiment 3. The process of FIG. 25 differs from the process of FIG. 15 in that step S24b is executed. Therefore, step S24b is described with reference to FIG. 25, and description of the other steps is omitted.

（ステップＳ２４ｂ）例えば、補助情報作成部２５０は、対話データベース２１１に含まれている１つの述語と１つの名詞を抽出する。補助情報作成部２５０は、報告書データベース２１２に含まれている１つの名詞を抽出する。補助情報作成部２５０は、抽出された、対話データベース２１１の述語と名詞と、報告書データベース２１２の名詞とを用いて、自己相互情報量を算出する。補助情報作成部２５０は、自己相互情報量に基づいて、補助情報１１６を作成する。また、対話データベース２１１（詳細には、対話データベース２１１の述語と名詞）に誤りが含まれていても、報告書データベース２１２に基づいて作成された知識データベース１１４内の名詞が検索されることで、当該誤りが回復される。 (Step S24b) For example, the auxiliary information creation unit 250 extracts one predicate and one noun included in the dialogue database 211. The auxiliary information creation unit 250 extracts one noun included in the report database 212. Using the extracted predicate and noun from the dialogue database 211 and the extracted noun from the report database 212, the auxiliary information creation unit 250 calculates the pointwise mutual information (PMI). The auxiliary information creation unit 250 creates the auxiliary information 116 based on the PMI. Even if the dialogue database 211 (more specifically, its predicates and nouns) contains an error, the error is recovered because nouns in the knowledge database 114, which is created from the report database 212, are retrieved.

実施の形態４．
次に、実施の形態４を説明する。実施の形態４では、実施の形態３と相違する事項を主に説明する。そして、実施の形態４では、実施の形態３と共通する事項の説明を省略する。
図２６は、実施の形態４の情報処理装置の機能を示すブロック図である。情報処理装置１００は、さらに、抽出更新部１９２を有する。
取得部１２０は、修正情報を取得する。修正情報は、ユーザに修正された要約文の情報である。
Embodiment 4.
Next, Embodiment 4 will be described. Embodiment 4 mainly describes the matters that differ from Embodiment 3, and omits description of the matters common to Embodiment 3.
FIG. 26 is a block diagram showing the functions of the information processing device according to Embodiment 4. The information processing device 100 further includes an extraction update unit 192.
The acquisition unit 120 acquires correction information. The correction information is information on a summary sentence that has been corrected by the user.

抽出更新部１９２は、出力部１９０が出力した要約テキスト内の要約文と、修正情報とを比較し、差分を抽出する。抽出更新部１９２は、重要文と要約文の差分とに基づいて、重要文の単語と要約文の差分に対応する単語との関係度を補助情報１１６から特定し、特定された関係度を、現状の当該関係度（すなわち、値）よりも低くする。抽出更新部１９２は、重要文と修正情報の差分とに基づいて、重要文の単語と修正情報の差分に対応する単語との関係度を補助情報１１６から特定し、特定された関係度を、現状の当該関係度（すなわち、値）よりも高くする。 The extraction update unit 192 compares the summary sentence in the summary text output by the output unit 190 with the correction information and extracts the difference. Based on the important sentence and the difference on the summary-sentence side, the extraction update unit 192 identifies, from the auxiliary information 116, the degree of relationship between a word in the important sentence and the word corresponding to that difference, and makes the identified degree of relationship lower than its current value. Based on the important sentence and the difference on the correction-information side, the extraction update unit 192 identifies, from the auxiliary information 116, the degree of relationship between a word in the important sentence and the word corresponding to that difference, and makes the identified degree of relationship higher than its current value.

例えば、重要文が“証明が落ちる”であるものとする。当該要約テキストが“賞名が落ちる”であるとする。修正情報が“照明が落ちる”であるとする。当該要約テキストと修正情報との差分は、“賞名”と“照明”である。抽出更新部１９２は、重要文と要約文の差分とに基づいて、重要文の単語“証明”と要約文の差分に対応する単語“賞名”との関係度を補助情報１１６から特定し、特定された関係度を低くする。抽出更新部１９２は、重要文と修正情報の差分とに基づいて、重要文の単語“証明”と修正情報の差分に対応する単語“照明”との関係度を補助情報１１６から特定し、特定された関係度を高くする。
これにより、より正確な単語が、クエリに含まれる。よって、情報処理装置１００は、正確性の高い要約文を選択することができる。
For example, assume that the important sentence is "証明が落ちる" ("the proof falls"), the summary text is "賞名が落ちる" ("the prize name falls"), and the correction information is "照明が落ちる" ("the lighting goes out"). The three nouns are homophones, all read "shōmei". The difference between the summary text and the correction information is "賞名" (prize name) and "照明" (lighting). Based on the important sentence and the difference on the summary-sentence side, the extraction update unit 192 identifies, from the auxiliary information 116, the degree of relationship between the word "証明" (proof) in the important sentence and the word "賞名" (prize name), and lowers the identified degree of relationship. Based on the important sentence and the difference on the correction-information side, the extraction update unit 192 identifies, from the auxiliary information 116, the degree of relationship between the word "証明" (proof) in the important sentence and the word "照明" (lighting), and raises the identified degree of relationship.
As a result, more accurate words are included in the query. The information processing device 100 can therefore select a highly accurate summary sentence.
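The update performed by the extraction update unit 192 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names, the representation of the auxiliary information 116 as a dictionary keyed by word pairs, the fixed step size, and the use of Python's `difflib` to extract the difference are all hypothetical. The description only states that the degree of relationship is made lower or higher than its current value.

```python
import difflib


def diff_words(summary_words, corrected_words):
    """Return (words only in the output summary, words only in the correction)."""
    removed, added = [], []
    matcher = difflib.SequenceMatcher(a=summary_words, b=corrected_words,
                                      autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(summary_words[i1:i2])
        if tag in ("replace", "insert"):
            added.extend(corrected_words[j1:j2])
    return removed, added


def update_auxiliary_info(aux, important_words, summary_words,
                          corrected_words, step=0.1):
    """Lower degrees tied to rejected words and raise those tied to corrections.

    aux maps (important_sentence_word, related_word) to a degree of
    relationship. Only pairs already present in aux are changed, and the
    step size is an assumed value.
    """
    removed, added = diff_words(summary_words, corrected_words)
    for src in important_words:
        for word in removed:
            if (src, word) in aux:
                aux[(src, word)] -= step
        for word in added:
            if (src, word) in aux:
                aux[(src, word)] += step
    return aux


# The example from the description, with sentences already split into words.
aux = {("証明", "賞名"): 0.5, ("証明", "照明"): 0.5}
update_auxiliary_info(
    aux,
    important_words=["証明", "が", "落ちる"],
    summary_words=["賞名", "が", "落ちる"],
    corrected_words=["照明", "が", "落ちる"],
)
```

After the call, the pair ("証明", "賞名") has a lower degree of relationship than before and the pair ("証明", "照明") has a higher one, so a later query built from "証明" would favor "照明".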

図２７は、実施の形態４の情報処理装置が実行する処理の例を示すフローチャートである。図２７の処理では、ステップＳ２０ａが実行される点が図２３の処理と異なる。そこで、図２７では、ステップＳ２０ａを説明する。そして、ステップＳ２０ａ以外の処理の説明は、省略する。 FIG. 27 is a flowchart illustrating an example of processing executed by the information processing device according to Embodiment 4. The process of FIG. 27 differs from the process of FIG. 23 in that step S20a is executed. Therefore, only step S20a is described for FIG. 27, and description of the other steps is omitted.

（ステップＳ２０ａ）抽出更新部１９２は、出力部１９０が出力した要約テキスト内の要約文と、取得部１２０により取得された修正情報とを比較し、差分を抽出する。抽出更新部１９２は、差分に基づいて、補助情報１１６を更新する。 (Step S20a) The extraction update unit 192 compares the summary sentence in the summary text output by the output unit 190 and the correction information acquired by the acquisition unit 120, and extracts the difference. The extraction update unit 192 updates the auxiliary information 116 based on the difference.

実施の形態４によれば、情報処理装置１００は、正確性の高い要約文を選択することができる。 According to Embodiment 4, the information processing device 100 can select a highly accurate summary sentence.

以上に説明した各実施の形態における特徴は、互いに適宜組み合わせることができる。 The features of the embodiments described above can be combined as appropriate.

１００情報処理装置、１０１プロセッサ、１０２揮発性記憶装置、１０３不揮発性記憶装置、１０４インタフェース、１１０記憶部、１１１不要語辞書、１１２文分割辞書、１１３単語重要度モデル、１１４知識データベース、１１５カテゴリ推定モデル、１１６補助情報、１２０取得部、１３０解析部、１４０不要語削除部、１５０重要文抽出部、１６０作成検索算出部、１７０算出更新部、１８０選択部、１９０出力部、１９１カテゴリ推定部、１９２抽出更新部、２００作成装置、２１０記憶部、２１１対話データベース、２１２報告書データベース、２２０単語重要度学習部、２３０データベース作成部、２４０カテゴリ推定学習部、２５０補助情報作成部。 100 information processing device, 101 processor, 102 volatile storage device, 103 nonvolatile storage device, 104 interface, 110 storage unit, 111 unnecessary word dictionary, 112 sentence segmentation dictionary, 113 word importance model, 114 knowledge database, 115 category estimation model, 116 auxiliary information, 120 acquisition unit, 130 analysis unit, 140 unnecessary word deletion unit, 150 important sentence extraction unit, 160 creation search calculation unit, 170 calculation update unit, 180 selection unit, 190 output unit, 191 category estimation unit, 192 extraction update unit, 200 creation device, 210 storage unit, 211 dialogue database, 212 report database, 220 word importance learning unit, 230 database creation unit, 240 category estimation learning unit, 250 auxiliary information creation unit.

Claims

An information processing device comprising:
an acquisition unit that acquires text data indicating the content of a dialogue and knowledge information including a plurality of sentences created based on past reports;
an analysis unit that performs morphological analysis on the text data;
an important sentence extraction unit that extracts, using the result of the morphological analysis, one sentence as a first important sentence from among a plurality of sentences included in the text data;
a creation search calculation unit that creates a query based on the first important sentence, searches the knowledge information for sentences using the query, and calculates a score of each of a plurality of retrieved sentences by a preset method;
a calculation update unit that calculates, based on the first important sentence and each of the plurality of retrieved sentences, a plurality of degrees of similarity, each being a degree of similarity between the first important sentence and one of the plurality of retrieved sentences, and updates the score of each of the plurality of retrieved sentences based on the plurality of degrees of similarity;
a selection unit that selects one sentence from the plurality of retrieved sentences as a summary sentence based on the score of each of the plurality of retrieved sentences; and
an output unit that outputs the summary sentence.

The information processing device according to claim 1, wherein
the important sentence extraction unit extracts a plurality of sentences as a plurality of important sentences from among the plurality of sentences included in the text data, and
the creation search calculation unit creates the query based on the first important sentence among the plurality of important sentences.

The information processing device according to claim 1 or 2, further comprising an unnecessary word deletion unit that deletes unnecessary words from the text data by a preset method.

The information processing device according to any one of claims 1 to 3, wherein the creation search calculation unit creates the query by lexicalizing consecutive words in the first important sentence.

The information processing device according to claim 4, wherein the creation search calculation unit creates the query by lexicalizing a negative word in the first important sentence together with the verb immediately preceding the negative word.

The information processing device according to claim 4 or 5, wherein the creation search calculation unit creates the query by lexicalizing a numeral in the first important sentence together with the unit following the numeral.

The information processing device according to any one of claims 1 to 6, wherein the creation search calculation unit creates the query based on the first important sentence and at least one of the sentence preceding and the sentence following the first important sentence in the text data.

The information processing device according to any one of claims 1 to 7, wherein the creation search calculation unit includes, in the query, synonyms of words included in the first important sentence.

The information processing device according to any one of claims 1 to 8, wherein, when calculating one degree of similarity among the plurality of degrees of similarity, the calculation update unit calculates an inter-set similarity that imposes a penalty when the difference set, obtained by subtracting the set of words included in the first important sentence from the set of words included in one of the plurality of retrieved sentences, has a large number of elements.

The information processing device according to any one of claims 1 to 9, wherein the knowledge information is information created by deleting non-sentences from the past reports.

The information processing device according to any one of claims 1 to 10, further comprising a category estimation unit, wherein
each of the plurality of sentences included in the knowledge information is associated with a category,
the acquisition unit acquires a category estimation model for estimating a category based on words,
the category estimation unit estimates a category of the content of the dialogue using the words obtained by the morphological analysis and the category estimation model, and
the creation search calculation unit searches the knowledge information using the estimated category and the query.

The information processing device according to any one of claims 1 to 11, wherein
the acquisition unit acquires auxiliary information, which is information indicating a correspondence between each of a plurality of words that are a plurality of predicates and each of a plurality of pieces of relation information indicating the relationships of the respective words, and
the creation search calculation unit identifies, from among a plurality of part-of-speech-tagged words obtained by performing morphological analysis on the first important sentence, a noun word that can be converted into a predicate or a predicate word; identifies, based on the word converted into a predicate or the identified predicate word, a first word among the plurality of part-of-speech-tagged words, and the auxiliary information, a related word that is a word related to the first word; and creates the query based on the first important sentence and the related word.

The information processing device according to claim 12, further comprising an extraction update unit, wherein
the auxiliary information includes a degree of relationship indicating the strength of the relationship,
the acquisition unit acquires correction information for the summary sentence, and
the extraction update unit compares the summary sentence with the correction information to extract a difference; identifies from the auxiliary information, based on the first important sentence and the difference on the summary-sentence side, the degree of relationship between a word of the first important sentence and the word corresponding to that difference, and makes the identified degree of relationship lower than its current value; and identifies from the auxiliary information, based on the first important sentence and the difference on the correction-information side, the degree of relationship between a word of the first important sentence and the word corresponding to that difference, and makes the identified degree of relationship higher than its current value.

A summary sentence output method in which an information processing device:
acquires text data indicating the content of a dialogue and knowledge information including a plurality of sentences created based on past reports;
performs morphological analysis on the text data;
extracts, using the result of the morphological analysis, one sentence as a first important sentence from among a plurality of sentences included in the text data;
creates a query based on the first important sentence;
searches the knowledge information for sentences using the query;
calculates a score of each of a plurality of retrieved sentences by a preset method;
calculates, based on the first important sentence and each of the plurality of retrieved sentences, a plurality of degrees of similarity, each being a degree of similarity between the first important sentence and one of the plurality of retrieved sentences;
updates the score of each of the plurality of retrieved sentences based on the plurality of degrees of similarity;
selects one sentence from the plurality of retrieved sentences as a summary sentence based on the score of each of the plurality of retrieved sentences; and
outputs the summary sentence.

A summary sentence output program that causes an information processing device to execute processing of:
acquiring text data indicating the content of a dialogue and knowledge information including a plurality of sentences created based on past reports;
performing morphological analysis on the text data;
extracting, using the result of the morphological analysis, one sentence as a first important sentence from among a plurality of sentences included in the text data;
creating a query based on the first important sentence;
searching the knowledge information for sentences using the query;
calculating a score of each of a plurality of retrieved sentences by a preset method;
calculating, based on the first important sentence and each of the plurality of retrieved sentences, a plurality of degrees of similarity, each being a degree of similarity between the first important sentence and one of the plurality of retrieved sentences;
updating the score of each of the plurality of retrieved sentences based on the plurality of degrees of similarity;
selecting one sentence from the plurality of retrieved sentences as a summary sentence based on the score of each of the plurality of retrieved sentences; and
outputting the summary sentence.