JP7243145B2

JP7243145B2 - Information processing device, information processing system and information processing method

Info

Publication number: JP7243145B2
Application number: JP2018221642A
Authority: JP
Inventors: 史裕手島; 寛小林; 峻横田
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2018-03-19
Filing date: 2018-11-27
Publication date: 2023-03-22
Anticipated expiration: 2038-11-27
Also published as: JP2019164327A

Description

The present invention relates to an information processing device, an information processing system, and an information processing method.

When minutes are typed by hand during a meeting, remarks may be missed or left out, or the note-taker's subjectivity may creep in. Recording the meeting audio and transcribing it later to create minutes, on the other hand, requires a great deal of labor and time. To reduce the labor of creating minutes, meeting support systems are known that perform speech recognition in real time during a meeting and convert the speech to text, thereby supporting the creation of minutes without missed or omitted remarks (for example, Patent Document 1).

However, conventional meeting support systems that use speech recognition require high-performance microphones or microphones with excellent sound-collecting properties in order to improve recognition accuracy, so the cost of setting up the usage environment was high.

In addition, when each meeting participant captures speech with an individual microphone, speech recognition is performed for each microphone, so the meeting support system may acquire the same speaker's utterance more than once. It was also difficult to obtain correct minutes when the same utterance was repeated or when several participants spoke at the same time. Sound-insulating microphones that capture only the speaker's voice, on the other hand, are expensive and complicated devices that make the meeting support system difficult to operate.

The present invention has been made in view of the above points, and its object is to process speech as a single person's utterance even when speech from the same sound source is acquired simultaneously by a plurality of microphones.

To solve the above problem, an information processing apparatus is connected to a plurality of information processing terminals and includes: an acquisition unit that acquires messages, each including text obtained by speech recognition of voice acquired at an information processing terminal and the time at which acquisition of the voice started; a first calculation unit that calculates the similarity between the text included in a first message acquired by the acquisition unit and the text included in a second message that was acquired by the acquisition unit and has a time at or before the time included in the first message; a determination unit that determines whether to record the first message based on the similarity calculated by the first calculation unit; and a second calculation unit that calculates the similarity of the speech waveforms of the voice. When the texts are determined to be similar based on the similarity calculated by the first calculation unit, the second calculation unit calculates the similarity between the speech waveform included in the first message and the speech waveform included in the second message, and the determination unit determines whether to record the first message based on the similarity calculated by the second calculation unit.

Even when speech from the same sound source is acquired simultaneously by a plurality of microphones, it can be processed as a single person's utterance.

FIG. 1 is a diagram showing a configuration example of the information processing system 1 according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining the information processing system 1 according to the embodiment of the present invention.
FIG. 3 is a diagram showing a hardware configuration example of the information processing apparatus 10 according to the embodiment of the present invention.
FIG. 4 is a diagram showing a functional configuration example of the information processing apparatus 10 according to the first embodiment.
FIG. 5 is a diagram for explaining the minimum edit distance algorithm in the first embodiment.
FIG. 6 is a diagram showing an example of a method of calculating the similarity between speech waveforms in the first embodiment.
FIG. 7 is a flowchart for explaining the information processing method according to the first embodiment.
FIG. 8 is a diagram showing a functional configuration example of the information processing apparatus 10 according to the second embodiment.
FIG. 9 is a sequence diagram for explaining example (1) of information processing in the second embodiment.
FIG. 10 is a sequence diagram for explaining example (2) of information processing in the second embodiment.
FIG. 11 is a diagram showing example (1) of conventional information processing.
FIG. 12 is a diagram showing example (1) of information processing in the second embodiment.
FIG. 13 is a diagram showing example (2) of conventional information processing.
FIG. 14 is a diagram showing example (2) of information processing in the second embodiment.

Embodiments of the present invention will be described below with reference to the drawings.

FIG. 1 is a diagram showing a configuration example of the information processing system 1 according to an embodiment of the present invention. As shown in FIG. 1, the information processing system 1 has an information processing apparatus 10 and clients 20. The information processing system 1 is, for example, a meeting support system that takes meeting audio as input and can output minutes in real time.

The information processing apparatus 10 is a computer such as a server, and implements its functions by a program described later. As other examples, the information processing apparatus 10 may be a dedicated conference system device, an electronic whiteboard, a projector, or the like. As shown in FIG. 1, the information processing apparatus 10 receives from each client 20 at least a speech waveform and a text message that is the result of speech recognition of that waveform at the client 20, and transmits the filtered text messages to the clients 20. Alternatively, the information processing apparatus 10 may receive only the speech waveform from each client and perform speech recognition on it itself. The information processing apparatus 10 may also have a function of displaying statistics on the participants and their numbers of utterances as information accompanying the minutes.

The client 20 is a computer such as a PC (Personal Computer) equipped with at least a microphone and a display device. It implements its functions by a program described later and is also referred to as an information processing terminal. The clients 20 include a plurality of information processing terminals. As other examples, the client 20 may be a tablet terminal, a smartphone, a PDA (Personal Digital Assistant), a mobile phone, a wearable PC, a game device, a car navigation terminal, or the like.

FIG. 2 is a diagram for explaining the information processing system 1 according to the embodiment of the present invention. The information processing system 1 can provide a real-time minutes support service. The real-time minutes support service converts remarks made in a meeting into text by speech recognition, generates minutes, and feeds them back to the meeting participants in real time.

As shown in FIG. 2, the information processing system 1 allows remarks to be recorded and edited in a chat format. It can display statistics on the participants and their numbers of utterances. Input by speech recognition is available, and an external engine can be used for the recognition. A check function supports the issuing of minutes. The check function is, for example, a function that marks content judged to be important during the meeting and supports review after the meeting ends. Synchronization with a business chat service configured by the user is also envisioned. At the end of the meeting, a summary based on tagging information can be packed together with the acquired raw data and exported, for example, by e-mail. The tagging information may include keywords, and the summary may be based on remarks containing those keywords.

The screen shown in FIG. 2 is the screen on the client 20 installed for each meeting participant. The screen may also be viewed and commented on from, for example, the smartphone of a user who is out of the office or the PC of a user who is somewhere other than the meeting room.

FIG. 3 is a diagram showing a hardware configuration example of the information processing apparatus 10 according to the embodiment of the present invention. The information processing apparatus 10 shown in FIG. 3 has a processor 11, a storage device 12, an auxiliary storage device 13, an input/output interface 14, a communication interface 15, and so on, which are connected to one another.

The program that implements the processing in the information processing apparatus 10 is stored in the auxiliary storage device 13. The auxiliary storage device 13 holds the installed program together with the necessary data.

The storage device 12 reads the program from the auxiliary storage device 13 and stores it in response to a start instruction from the processor 11. The processor 11 implements the functions of the information processing apparatus 10 by executing the program stored in the storage device 12.

The input/output interface 14 is an interface for connecting to various input/output devices such as a microphone, USB (Universal Serial Bus) devices, hardware keys, a status notification LED, and a liquid crystal display.

The communication interface 15 is a wired or wireless interface for communicating with the clients 20, a mail server, a smartphone, a PC, or the like.

The client 20 may have the same hardware configuration as shown in FIG. 3. The client 20 may further have a microphone connected to the input/output interface 14.

A first embodiment of the present invention will be described below. FIG. 4 is a diagram showing a functional configuration example of the information processing apparatus 10 according to the first embodiment. As shown in FIG. 4, the information processing apparatus 10 has an information receiving unit 101, a message storage unit 102, a message similarity calculation unit 103, a voice storage unit 104, a voice similarity calculation unit 105, and a similar message determination unit 106. Each functional unit of the information processing apparatus 10 is implemented by the processor 11 executing the program loaded from the auxiliary storage device 13 into the storage device 12 shown in FIG. 3.

The information receiving unit 101 is implemented by the communication interface 15 shown in FIG. 3 and receives from each client 20 at least a speech waveform and a text message that is the result of speech recognition of that waveform at the client 20. The speech waveform and the text message are accompanied by the ID of the microphone that captured the voice and information indicating the time at which acquisition of the voice started. The clocks of the clients 20 are synchronized to a predetermined accuracy, and the time at which the voice was acquired may be measured against the synchronized clock. The microphone ID may be information identifying the client 20 or information identifying the microphone itself.

The message storage unit 102 receives text messages from the information receiving unit 101 and stores them. Text messages received from the clients 20 during a predetermined period are accumulated in the message storage unit 102. The message storage unit 102 may store each text message in association with its microphone ID and the time at which acquisition of the voice started.

The message similarity calculation unit 103 determines whether a received text message is similar to the text messages accumulated in the message storage unit 102, using, for example, the minimum edit distance algorithm.

FIG. 5 is a diagram for explaining the minimum edit distance algorithm in the first embodiment. The minimum edit distance algorithm compares the similarity of character strings. As shown in FIG. 5, three operations, "insertion", "deletion", and "substitution", may be applied to one string, and the minimum number of operations needed to make it match another string is defined as the distance between the two. The smaller the distance between two strings, the higher their similarity is judged to be.

The "insertion" shown in FIG. 5 is an example in which the two characters "bun" (ぶん) are inserted into "shinshi" (しんし) to give "shinbunshi" (しんぶんし). Because two characters are inserted, that is, two insertions are performed, the distance between "shinshi" and "shinbunshi" is 2.

The "deletion" shown in FIG. 5 is an example in which the final "shi" (し) of "shinshi" (しんし) is deleted to give "shin" (しん). Because one character is deleted, that is, one deletion is performed, the distance between "shinshi" and "shin" is 1.

The "substitution" shown in FIG. 5 is an example in which the "n" (ん) of "shinshi" (しんし) is replaced with "ka" (か) to give "shikashi" (しかし). Because one character is replaced, that is, one substitution is performed, the distance between "shinshi" and "shikashi" is 1.
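The three examples from FIG. 5 can be checked with a short sketch of the minimum edit distance. This is a standard dynamic-programming formulation written for illustration; the function name `edit_distance` is an assumption and does not come from the patent text.

```python
def edit_distance(source: str, target: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn `source` into `target` (illustrative sketch)."""
    # previous[j] holds the distance between the processed prefix of
    # `source` and the first j characters of `target`.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # deleting all i characters reaches the empty prefix
        for j, t_char in enumerate(target, start=1):
            substitution_cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(edit_distance("しんし", "しんぶんし"))  # insertion example
    print(edit_distance("しんし", "しん"))        # deletion example
    print(edit_distance("しんし", "しかし"))      # substitution example
```

The sketch compares strings character by character, so each kana counts as one unit, which matches how the examples above count operations.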

Returning to FIG. 4, the result of the determination by the message similarity calculation unit 103 as to whether the newly received text message is similar to the accumulated text messages is sent to the similar message determination unit 106. The minimum edit distance algorithm is only one example; the similarity of text messages may be determined by another algorithm.

The voice storage unit 104 receives speech waveforms from the information receiving unit 101 and stores them. Speech waveforms received from the clients 20 during a predetermined period are accumulated in the voice storage unit 104. The voice storage unit 104 may store each speech waveform in association with its microphone ID and the time at which acquisition of the voice started.

The voice similarity calculation unit 105 determines whether a received speech waveform is similar to the speech waveforms accumulated in the voice storage unit 104, for example by performing frequency analysis and using a cross-correlation function.

FIG. 6 is a diagram showing an example of a method of calculating the similarity between speech waveforms in the first embodiment. As shown in FIG. 6, the method transforms the speech waveforms into the frequency domain and calculates the similarity with a cross-correlation function. The technique described below is an example of general frequency analysis, and the calculation of speech waveform similarity is not limited to this algorithm.

The right-hand plot in FIG. 6 is the Fourier transform of the speech waveform in the left-hand plot. For example, if the speech waveforms whose similarity is to be calculated are c(t) and d(t), and their Fourier transforms are C(k) and D(k), the similarity of the waveforms is calculated according to Equation 1. The larger the value X in Equation 1, the higher the similarity.
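Equation 1 itself is not reproduced in this text, so the following is only a sketch of one common way to obtain a cross-correlation score through the frequency domain, not the patent's formula. It assumes equal-length real-valued sample sequences, uses a naive discrete Fourier transform to stay dependency-free, and takes the peak of the circular cross-correlation normalized by the signal energies as the score. The names `dft`, `idft`, and `waveform_similarity` are hypothetical.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform, O(n^2); fine for short examples."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def waveform_similarity(c, d):
    """Peak of the normalized circular cross-correlation of c and d.

    By the cross-correlation theorem, the inverse transform of
    C(k) * conj(D(k)) is the circular cross-correlation of c and d.
    The result lies in [0, 1]; larger means more similar.
    """
    if len(c) != len(d):
        raise ValueError("waveforms must have the same length")
    spectrum_c, spectrum_d = dft(c), dft(d)
    cross = idft([ck * dk.conjugate()
                  for ck, dk in zip(spectrum_c, spectrum_d)])
    energy = math.sqrt(sum(v * v for v in c) * sum(v * v for v in d))
    if energy == 0.0:
        return 0.0  # at least one waveform is silent
    return max(abs(v) for v in cross) / energy
```

Taking the peak over all lags makes the score tolerant of the small arrival-time differences that occur when two microphones capture the same sound source. A production system would more likely use a fast Fourier transform from a numerical library.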

Returning to FIG. 4, the result of the determination by the voice similarity calculation unit 105 as to whether the newly received speech waveform is similar to the accumulated speech waveforms is sent to the similar message determination unit 106. The cross-correlation method described with FIG. 6 is only one example; the similarity of speech waveforms may be determined by another algorithm.

The similar message determination unit 106 transmits the filtered messages to the clients 20 based on the determination results received from the message similarity calculation unit 103 and the voice similarity calculation unit 105. Details are described later.

FIG. 7 is a flowchart for explaining the information processing method according to the first embodiment. FIG. 7 describes the processing performed when the information processing apparatus 10 acquires a new message from a client 20.

In step S11, the information receiving unit 101 acquires a new message from a client 20. The new message includes a speech waveform, a text message that is the result of speech recognition of that waveform at the client 20, the ID of the microphone that captured the voice, and information indicating the voice acquisition start time. The new message may also include an identifier for identifying the message. The new message is the latest message; that is, it has a later voice acquisition start time than any message stored in the message storage unit 102 or the voice storage unit 104.

In the following step S12, the message similarity calculation unit 103 refers to the text messages stored in the message storage unit 102 and determines whether a recent text message is similar to the text message included in the new message (S13). "Recent" here refers to a period set longer than the period during which messages from the same sound source may be acquired by multiple microphones. For example, if the maximum distance between the microphones placed in the meeting room is 20 m, the period may be set to 60 ms or the like, which is longer than the expected maximum delay of 20 m / (340 m/s) ≈ 0.0588 s. Here 340 m/s is an example value for the speed of sound. In this example, the message similarity calculation unit 103 determines the similarity against the text messages of those messages whose voice acquisition start time is within 60 ms of the voice acquisition start time included in the new message.
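The window calculation above can be written out as a small sketch. The constant and function names (`SPEED_OF_SOUND_M_PER_S`, `max_acoustic_delay`, `recent_candidates`) are hypothetical, and times are assumed to be expressed in seconds on the synchronized clock.

```python
SPEED_OF_SOUND_M_PER_S = 340.0  # example value used in the text


def max_acoustic_delay(max_mic_distance_m, speed=SPEED_OF_SOUND_M_PER_S):
    """Longest time sound needs to travel between the two farthest microphones."""
    return max_mic_distance_m / speed


def recent_candidates(new_start_s, stored_starts_s, window_s):
    """Start times of stored messages that fall within the window
    before (or at) the start time of the new message."""
    return [start for start in stored_starts_s
            if 0.0 <= new_start_s - start <= window_s]


if __name__ == "__main__":
    delay = max_acoustic_delay(20.0)
    print(f"maximum delay: {delay:.4f} s")  # about 0.0588 s for 20 m
    window = 0.060  # 60 ms, chosen to be longer than the maximum delay
    print(recent_candidates(10.050, [9.000, 10.000, 10.045], window))
```

Only the stored messages returned by `recent_candidates` would then need a text similarity comparison, which keeps the comparison cost per new message small.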

If a message was acquired recently, the similarity between the text message of the new message and the stored text message is calculated using, for example, the minimum edit distance algorithm described with FIG. 5. The similarity may be determined against more than one text message. The message similarity calculation unit 103 determines whether the text messages are similar based on whether the minimum edit distance between them is less than a predetermined threshold. If they are similar (YES in S13), the process proceeds to step S14; if not (NO in S13), the process proceeds to step S18.

In step S14, the message similarity calculation unit 103 refers to the voice log, stored in the voice storage unit 104, of the message determined to be similar, and determines whether its similarity to the speech waveform included in the new message is high. The similarity between the new speech waveform and the stored speech waveform is calculated using, for example, the method described with FIG. 6. The similarity may be determined against more than one speech waveform. The message similarity calculation unit 103 determines whether the waveforms are similar based on whether the similarity between them is less than a predetermined threshold. If they are similar (YES in S15), the process proceeds to step S16; if not (NO in S15), the process proceeds to step S18.

In step S16, the volume of the past message whose speech waveform was determined in step S15 to be highly similar is compared with the volume of the new message. If the new message is louder (YES in S16), the process proceeds to step S17. If the past message is louder (NO in S16), the process proceeds to step S19.

In step S17, a removal flag is set on the past message. Past messages stored in the message storage unit 102 and the voice storage unit 104 that have the removal flag set are not used in subsequent similarity determinations.

In step S18, all clients 20 are notified of the new message and of any past message on which the removal flag has been set. When a client 20 is notified of a message with the removal flag set, it need not display that message on the screen. In step S18, some or all of the message identifier, microphone ID, and text message of the new message may be sent to the clients 20. For a past message with the removal flag set, only the message identifier and the removal flag may be sent to the clients 20, or some or all of the microphone ID and text message may be sent as well. Alternatively, a past message with the removal flag set need not be sent to the clients 20 at all. The notified new message is recorded in the minutes.

In step S19, a removal flag is set on the new message. A new message with the removal flag set need not be kept in the message storage unit 102 and the voice storage unit 104.

In step S20, all clients 20 are notified of the new message and of any past message on which the removal flag has been set. When a client 20 is notified of a message with the removal flag set, it need not display that message on the screen. In step S20, some or all of the message identifier, microphone ID, and text message of the new message may be sent to the clients 20. For a new message with the removal flag set, only the message identifier and the removal flag may be sent to the clients 20, or some or all of the microphone ID and text message may be sent as well. Alternatively, a new message with the removal flag set need not be sent to the clients 20 at all. A notified new message that does not have the removal flag set is recorded in the minutes.

In the flowchart described with FIG. 7, if a recent message similar to the new message exists in step S13 (YES in S13), the process may instead proceed directly to step S19 and set the removal flag on the new message.

As described above, according to the first embodiment, the information processing apparatus 10 compares the text message of the latest new message acquired from a client 20 with the text messages of previously acquired messages. If a message with a similar text message exists, the information processing apparatus 10 further compares the speech waveforms. If a message with a similar speech waveform exists, the information processing apparatus 10 further compares the volumes of the speech waveforms and records the louder message in the minutes.
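The flow of FIG. 7 (steps S11 to S20) can be sketched roughly as follows. This is a simplified illustration under several assumptions, not the patent's implementation: the `Message` fields, the threshold values, the precomputed `volume` field, and the convention that a waveform score at or above `wave_threshold` means "similar" are all hypothetical, and the waveform comparison is passed in as a function so that any similarity measure can be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Message:
    msg_id: int
    mic_id: str
    start_time: float          # voice acquisition start time, in seconds
    text: str                  # speech recognition result
    waveform: Sequence[float]
    volume: float              # assumed to be precomputed, e.g. an RMS level
    removed: bool = False      # the "removal flag"


def edit_distance(a: str, b: str) -> int:
    """Compact minimum edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def process_new_message(new: Message, stored: List[Message],
                        wave_sim: Callable[[Sequence[float], Sequence[float]], float],
                        window_s: float = 0.060,
                        text_threshold: int = 2,
                        wave_threshold: float = 0.8) -> List[Message]:
    """Flag duplicates and return the messages to notify to the clients."""
    newly_flagged = []
    for past in stored:
        if past.removed:
            continue                       # flagged messages are not compared again
        if new.start_time - past.start_time > window_s:
            continue                       # S12: not a recent message
        if edit_distance(new.text, past.text) >= text_threshold:
            continue                       # S13 NO: texts differ
        if wave_sim(new.waveform, past.waveform) < wave_threshold:
            continue                       # S15 NO: waveforms differ
        if new.volume > past.volume:       # S16 YES
            past.removed = True            # S17: drop the quieter past message
            newly_flagged.append(past)
        else:                              # S16 NO
            new.removed = True             # S19: drop the quieter new message
            break
    stored.append(new)                     # kept for later comparisons
    return [new] + newly_flagged           # S18 / S20: notify the clients
```

Clients receiving the returned list would record in the minutes only the messages whose `removed` flag is not set, so that an utterance picked up by several microphones appears once, from the microphone that captured it loudest.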

In other words, even when speech from the same sound source is acquired simultaneously by a plurality of microphones, it can be processed as a single person's utterance. The readability of the minutes generated by the meeting support system therefore improves, the information processing apparatus 10 can remove the noise messages that arise when the same voice enters the multiple microphones of the clients 20, and expensive hardware such as high-end microphones becomes unnecessary.

Next, a second embodiment of the present invention will be described. The description of the second embodiment covers the points that differ from the first embodiment. Points not specifically mentioned may therefore be the same as in the first embodiment.

FIG. 8 is a diagram showing a functional configuration example of the information processing apparatus 10 according to the second embodiment. As shown in FIG. 8, the information processing apparatus 10 has an information receiving unit 111, a duplicate message candidate storage unit 112, an all-message storage unit 113, a message similarity determination unit 114, and a message filtering unit 115. Each functional unit of the information processing apparatus 10 is implemented by the processor 11 executing the program loaded from the auxiliary storage device 13 into the storage device 12 shown in FIG. 3.

In addition to the functions of the information receiving unit 101 shown in FIG. 4, the information receiving unit 111 has a function of extracting messages that may overlap in the time domain. The information receiving unit 111 transmits the extracted, potentially overlapping messages to the duplicate message candidate storage unit 112. For example, a message is extracted as potentially overlapping when the period from the time acquisition of its voice started to the time acquisition of its voice completed overlaps the corresponding period of another message. The information receiving unit 111 also transmits all messages to the all-message storage unit 113.
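The extraction of potentially overlapping messages can be sketched as an interval-overlap test on the acquisition periods. The `Message` class, its field names, and the choice of treating periods that merely touch at an endpoint as non-overlapping are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Message:
    mic_id: str   # microphone that captured the voice
    text: str     # speech recognition result
    start: float  # time voice acquisition started (seconds)
    end: float    # time voice acquisition completed (seconds)


def periods_overlap(a, b):
    """True when the two acquisition periods share some span of time."""
    return a.start < b.end and b.start < a.end


def duplicate_candidates(messages):
    """Messages whose acquisition period overlaps that of any other message."""
    return [
        m
        for i, m in enumerate(messages)
        if any(periods_overlap(m, other)
               for j, other in enumerate(messages) if i != j)
    ]
```

With three messages acquired over 0 to 2 s, 3 to 6 s, and 4 to 7 s, only the second and third are returned as candidates, which corresponds to the situation of message 2 and message 3 in FIG. 8.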

The duplicate message candidate storage unit 112 receives text messages from the information receiving unit 111 and stores them. Of the text messages received from the clients 20 during a predetermined period, those that may overlap in the time domain are accumulated in the duplicate message candidate storage unit 112. In the example shown in FIG. 8, message 2 and message 3 are messages that may overlap in the time domain. The duplicate message candidate storage unit 112 may record each text message in association with its microphone ID and with the times at which acquisition of the voice started and completed.

The all-message storage unit 113 receives text messages from the information receiving unit 111 and stores them. The text messages received from the clients 20 are accumulated in the all-message storage unit 113. The all-message storage unit 113 may record each text message in association with its microphone ID and with the times at which acquisition of the voice started and completed. The all-message storage unit 113 transmits to all clients the messages whose filter is not set to on.

The message similarity determination unit 114 has the functions of the message similarity calculation unit 103, the voice similarity calculation unit 105, and the similar message determination unit 106 shown in FIG. 4. That is, the message similarity determination unit 114 can determine, based on the text message and the voice data, whether one message is similar to another. The message similarity determination unit 114 determines whether messages acquired from the duplicate message candidate storage unit 112 or the all-message storage unit 113 are similar and transmits the result to the message filtering unit 115.

The message filtering unit 115 has a function of adding a filter to a message based on the result, acquired from the message similarity determination unit 114, of determining whether the messages are similar, and on the result of determining whether the periods in which the messages were acquired overlap. The message filtering unit 115 sends the all-message storage unit 113 a notification to add a filter to a message, and the all-message storage unit 113 turns on the filter for the target message. In the example shown in FIG. 8, the filter for message 3 is set to on, and message 3 is not transmitted to the clients.
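The interaction between the message filtering unit 115 and the all-message storage unit 113 can be sketched as a store that keeps a per-message filter flag. The class name, method names, and message identifiers below are hypothetical and exist only for this illustration.

```python
class AllMessageStore:
    """Hypothetical stand-in for the all-message storage unit 113."""

    def __init__(self):
        self._messages = {}     # message id -> text, in arrival order
        self._filtered = set()  # ids of messages whose filter is on

    def add(self, message_id, text):
        """Store a message received from the information receiving unit."""
        self._messages[message_id] = text

    def set_filter(self, message_id):
        """Turn the filter on, as requested by the message filtering unit."""
        if message_id not in self._messages:
            raise KeyError(message_id)
        self._filtered.add(message_id)

    def messages_to_send(self):
        """Texts of all messages whose filter is not on, in arrival order."""
        return [text for message_id, text in self._messages.items()
                if message_id not in self._filtered]
```

Storing every message and marking duplicates with a flag, rather than deleting them, keeps the complete record available while controlling what is sent to the clients.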

FIG. 9 is a sequence diagram for explaining example (1) of information processing in the second embodiment. In step S21, the client 20A notifies the information processing apparatus 10 that speech recognition has started. The information processing apparatus 10 then records the time at which the client 20A started speech recognition and recognizes that speech recognition is in progress (S22). Next, in step S23, the client 20A notifies the information processing apparatus 10 that speech recognition has ended and sends the message. The information processing apparatus 10 records the time at which the client 20A ended speech recognition and recognizes that speech recognition has ended. The information processing apparatus 10 then sends the message to all the clients 20 (S24).

FIG. 10 is a sequence diagram for explaining example (2) of information processing in the second embodiment. In step S31, the client 20A notifies the information processing apparatus 10 that speech recognition has started. The information processing apparatus 10 then records the time at which the client 20A started speech recognition and recognizes that speech recognition is in progress (S32). In step S33, the client 20B notifies the information processing apparatus 10 that speech recognition has started. The information processing apparatus 10 then records the time at which the client 20B started speech recognition, recognizes that speech recognition is in progress, and detects that the periods during which the client 20A and the client 20B are performing speech recognition overlap (S34).

In step S35, the client 20B notifies the information processing apparatus 10 that speech recognition has ended and sends message B. The information processing apparatus 10 records the time at which the client 20B ended speech recognition, recognizes that speech recognition has ended, and can detect that the overlap between the speech recognition periods of the client 20A and the client 20B has been resolved. In step S36, the client 20A notifies the information processing apparatus 10 that speech recognition has ended and sends message A. The information processing apparatus 10 records the time at which the client 20A ended speech recognition and recognizes that speech recognition has ended. At this point, the information processing apparatus 10 knows the period in which message A was acquired and the period in which message B was acquired. In step S37, the information processing apparatus 10 determines, based on the acquisition periods of message A and message B obtained in steps S32 and S34, whether message A and message B may overlap in the time domain. If message A and message B may overlap, the information processing apparatus 10 performs processing similar to the flowchart shown in FIG. 7 and adds a filter to the message for which the removal flag is set. If there is no possibility that message A and message B overlap, the information processing apparatus 10 adds no filter to message A or message B. In step S38, the information processing apparatus 10 sends the unfiltered messages to all the clients 20 and records them in the minutes.

In the information processing shown in FIG. 10, the number of messages processed is not limited to two; three or more messages may be processed. When three or more messages may have overlapping acquisition periods, processing similar to the flowchart shown in FIG. 7 is performed in step S37, and a filter is added to each message for which the removal flag is set.
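The bookkeeping of start and end notifications in FIG. 9 and FIG. 10 can be sketched as follows. The class and method names are hypothetical, and the batching strategy (holding finished messages until no client is still recognizing, then releasing them as one group) is an assumption of this sketch rather than something the description prescribes.

```python
class RecognitionTracker:
    """Hypothetical tracker for speech recognition start/end notifications."""

    def __init__(self):
        self._started = {}  # client id -> start time, while recognition runs
        self._pending = []  # finished messages held until the overlap clears

    def on_start(self, client_id, time):
        """Record the start notification (as in S21/S22 or S31 to S34)."""
        self._started[client_id] = time

    def on_end(self, client_id, time, text):
        """Record the end notification and message (as in S23, S35, S36).

        Returns None while another client is still recognizing. Otherwise
        returns the group of (client id, start, end, text) tuples whose
        periods form one chain of overlaps.
        """
        start = self._started.pop(client_id)
        self._pending.append((client_id, start, time, text))
        if self._started:
            return None
        group, self._pending = self._pending, []
        return group
```

A group of one corresponds to FIG. 9, where the message can be sent as is. A group of two or more corresponds to step S37, where the messages become candidates for the duplicate check. Because a group is a chain of overlaps, two members of a group of three or more may not overlap each other directly, so a pairwise check of the periods is still needed.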

FIG. 11 is a diagram showing example (1) of conventional information processing. As shown in FIG. 11, a system that identifies an individual's voice with each microphone cannot output an appropriate message based on the speech recognition results when the utterance of the same person is picked up by a plurality of microphones at the same time.

FIG. 12 is a diagram showing example (1) of information processing in the second embodiment. As shown in FIG. 12, messages that may be duplicates are identified by taking into account the time during which speech recognition overlapped. Applying the information processing method shown in FIG. 7 to the potentially duplicate messages prevents the voices picked up by multiple microphones from being mixed up.

FIG. 13 is a diagram showing example (2) of conventional information processing. As shown in FIG. 13, when the similarity of the speech recognition results is not considered and different people speak at the same time, only one utterance is adopted, so the separate utterances cannot each be output as messages.

FIG. 14 is a diagram showing example (2) of information processing in the second embodiment. By considering both the similarity of the speech recognition results and the time during which speech recognition overlapped, a correct message can be output for each speaker even when different people speak at the same time.
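The difference between the situations of FIG. 12 and FIG. 14 can be illustrated by combining the period overlap with the text similarity. This is a simplified sketch: it omits the waveform comparison, uses a precomputed `volume` value per message, and relies on an assumed `difflib`-based similarity with an assumed threshold of 0.8; none of these details come from the description.

```python
from difflib import SequenceMatcher


def overlapping(a, b):
    """True when the acquisition periods of the two messages overlap."""
    return a["start"] < b["end"] and b["start"] < a["end"]


def similar(a, b, threshold=0.8):
    """True when the recognized texts are similar (assumed measure)."""
    return SequenceMatcher(None, a["text"], b["text"]).ratio() >= threshold


def filter_messages(messages):
    """Drop the quieter message of each overlapping, similar pair."""
    filtered = set()
    for i, a in enumerate(messages):
        for j in range(i + 1, len(messages)):
            b = messages[j]
            if overlapping(a, b) and similar(a, b):
                filtered.add(i if a["volume"] < b["volume"] else j)
    return [m for k, m in enumerate(messages) if k not in filtered]
```

When one speaker is picked up by two microphones, the two texts are nearly identical and only the louder message remains. When two people speak at the same time, the texts differ, so both messages are kept even though their periods overlap.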

As described above, according to the second embodiment, the information processing apparatus 10 compares the text of the latest message acquired from each client 20 with the text of previously acquired messages to determine their similarity. If a message with similar text exists, the information processing apparatus 10 further compares the similarity of the voice waveforms. If a message with a similar voice waveform exists, the information processing apparatus 10 further compares the volume of the voice waveforms and records the louder message in the minutes. Furthermore, by filtering messages based on the time during which speech recognition overlapped, a correct message can be output for each speaker even when different people speak at the same time.

In the embodiments of the present invention, the information receiving unit 101 is an example of an acquisition unit. The message similarity calculation unit 103 is an example of a first calculation unit. The similar message determination unit 106 is an example of a determination unit. The voice similarity calculation unit 105 is an example of a second calculation unit. The microphone provided in the client 20 is an example of a voice acquisition unit. The display device provided in the client 20 is an example of a display unit.

Although embodiments of the present invention have been described in detail above, the present invention is not limited to these specific embodiments, and various modifications and changes are possible within the scope of the gist of the present invention described in the claims.

10 Information processing apparatus
11 Processor
12 Storage device
13 Auxiliary storage device
14 Input/output interface
15 Communication interface
20 Information processing terminal (client)
101 Information receiving unit
102 Message storage unit
103 Message similarity calculation unit
104 Voice storage unit
105 Voice similarity calculation unit
106 Similar message determination unit
111 Information receiving unit
112 Duplicate message candidate storage unit
113 All-message storage unit
114 Message similarity determination unit
115 Message filtering unit

Japanese Patent Application Laid-Open No. 2006-301223

Claims

An information processing apparatus connected to a plurality of information processing terminals, comprising:
an acquisition unit configured to acquire a message including text obtained by recognizing voice acquired by an information processing terminal and a time at which acquisition of the voice was started;
a first calculation unit configured to calculate a similarity between text included in a first message acquired by the acquisition unit and text included in a second message acquired by the acquisition unit and having the time earlier than the time included in the first message;
a determination unit configured to determine whether to record the first message based on the similarity calculated by the first calculation unit; and
a second calculation unit configured to calculate a similarity of voice waveforms of the voice,
wherein, when the texts are determined to be similar based on the similarity calculated by the first calculation unit, the second calculation unit calculates a similarity between the voice waveform included in the first message and the voice waveform included in the second message, and
the determination unit determines whether to record the first message based on the similarity calculated by the second calculation unit.

The information processing apparatus according to claim 1, wherein the time included in the first message and the time included in the second message fall within a predetermined period.

The information processing apparatus according to claim 1, wherein, when the texts are determined not to be similar based on the similarity calculated by the first calculation unit, the determination unit determines to record the first message.

The information processing apparatus according to claim 1, wherein, when the voice waveforms are determined not to be similar based on the similarity calculated by the second calculation unit, the determination unit determines to record the first message.

The information processing apparatus described, wherein, when the voice waveforms are determined to be similar based on the similarity calculated by the second calculation unit, volumes are determined based on the voice waveform included in the first message and the voice waveform included in the second message, and when the volume of the first message is larger, the determination unit determines to record the first message and to delete the second message.

The information processing apparatus according to claim 4, wherein, when the voice waveforms are determined to be similar based on the similarity calculated by the second calculation unit, volumes are determined based on the voice waveform included in the first message and the voice waveform included in the second message, and when the volume of the second message is larger, the determination unit determines to record the second message and to delete the first message.

The information processing apparatus according to claim 1, wherein the acquisition unit acquires a time at which acquisition of the voice of the first message ended and a time at which acquisition of the voice of the second message ended, and obtains a period in which the first message was acquired and a period in which the second message was acquired, and
when the period in which the first message was acquired and the period in which the second message was acquired overlap, the determination unit determines whether to record the first message based on the similarity calculated by the first calculation unit.

The information processing apparatus according to claim 1, wherein the acquisition unit acquires a time at which acquisition of the voice of the first message ended and a time at which acquisition of the voice of the second message ended, and obtains a period in which the first message was acquired and a period in which the second message was acquired, and
when the period in which the first message was acquired and the period in which the second message was acquired do not overlap, the determination unit determines to record the first message and the second message.

An information processing system including an information processing apparatus connected to a plurality of information processing terminals,
wherein the information processing apparatus comprises:
an acquisition unit configured to acquire a message including text obtained by recognizing voice acquired by an information processing terminal and a time at which acquisition of the voice was started;
a first calculation unit configured to calculate a similarity between text included in a first message acquired by the acquisition unit and text included in a second message acquired by the acquisition unit and having the time earlier than the time included in the first message;
a determination unit configured to determine whether to record the first message based on the similarity calculated by the first calculation unit and to perform transmission to the information processing terminal; and
a second calculation unit configured to calculate a similarity of voice waveforms of the voice,
wherein, when the texts are determined to be similar based on the similarity calculated by the first calculation unit, the second calculation unit calculates a similarity between the voice waveform included in the first message and the voice waveform included in the second message,
the determination unit determines whether to record the first message based on the similarity calculated by the second calculation unit, and
the information processing terminal comprises:
a voice acquisition unit configured to acquire voice; and
a display unit configured to display text included in the message.

An information processing method executed by an information processing apparatus connected to a plurality of information processing terminals, the method comprising:
an acquisition procedure of acquiring a message including text obtained by recognizing voice acquired by an information processing terminal and a time at which acquisition of the voice was started;
a first calculation procedure of calculating a similarity between text included in a first message acquired by the acquisition procedure and text included in a second message acquired by the acquisition procedure and having the time earlier than the time included in the first message;
a determination procedure of determining whether to record the first message based on the similarity calculated by the first calculation procedure; and
a second calculation procedure of calculating a similarity of voice waveforms of the voice,
wherein, when the texts are determined to be similar based on the similarity calculated by the first calculation procedure, the second calculation procedure calculates a similarity between the voice waveform included in the first message and the voice waveform included in the second message, and
the determination procedure determines whether to record the first message based on the similarity calculated by the second calculation procedure.