JP7728878B2

JP7728878B2 - Information processing device, information processing method, and information processing program

Info

Publication number: JP7728878B2
Application number: JP2023556050A
Authority: JP
Inventors: 義大石原
Original assignee: Pioneer Corp
Current assignee: Pioneer Corp
Priority date: 2021-10-29
Filing date: 2021-10-29
Publication date: 2025-08-25
Anticipated expiration: 2041-10-29
Also published as: WO2023073945A1; JPWO2023073945A1

Description

The present invention relates to an information processing device, an information processing method, and an information processing program.

Voice recognition systems that run on vehicle navigation devices are conventionally known. For example, Patent Document 1 discloses a method for correcting an erroneous recognition result when the voice recognition engine installed in a voice recognition system misrecognizes spoken voice.

Patent Document 1: Japanese Patent Application Laid-Open No. 2004-333703

However, the above conventional technology cannot always ensure that the correct operation is executed in response to a user's voice operation input.

For example, when the above conventional technology detects that the speech recognition engine has misrecognized spoken voice, it reads from a recognized word link DB the words that the user has previously substituted for the misrecognized word and presents them as correct-answer candidates. It also associates the misrecognized word with the correct word entered by the user and newly registers that pair in the recognized word link DB.

In this way, the conventional technology registers words in a dictionary so that, even if the voice recognition engine misrecognizes a word spoken by the user, the engine can afterwards recognize the correct word. This processing does not, however, correctly handle slips of the tongue made by the user.

Therefore, the conventional technology cannot always ensure that the correct operation is executed when the user misspeaks. Furthermore, if the words spoken by the user differ from the words registered for an operation, the voice recognition engine cannot correctly recognize the operation the user intends.

The present invention has been made in view of the above. Its object is to provide an information processing device, an information processing method, and an information processing program that ensure the correct operation is executed in response to a user's voice operation input.

The information processing device according to claim 1 comprises the following units:
a determination unit that, when a second input operation of inputting information by touching a predetermined object is performed after a first utterance is input, determines whether the second input operation is a correction operation for correcting the utterance content. It bases this determination on the utterance content indicated by the first utterance and the operation content indicated by the second input operation;
a linking unit that links the operation content with the utterance content when the determination unit determines that the second input operation is the correction operation; and
an information control unit that performs predetermined control on the utterance content based on the linking result of the linking unit.
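A minimal Python sketch of how the three claimed units could fit together. It assumes the following, none of which the claim specifies:
- the touch operation yields a text string, and keywords have already been extracted from the first utterance;
- a string-similarity ratio (`difflib.SequenceMatcher`) with a hypothetical threshold decides whether the operation corrects the utterance;
- a hypothetical time window limits how long after the utterance a touch still counts as a correction.

All class and parameter names are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Utterance:
    keywords: list[str]  # keywords extracted from the first utterance
    time_s: float        # time the utterance was input, in seconds


@dataclass
class TouchOperation:
    text: str            # text entered by touching a screen (second input operation)
    time_s: float        # time the operation was performed, in seconds


class CorrectionPipeline:
    """Hypothetical sketch of the determination, linking and information control units."""

    def __init__(self, threshold: float = 0.6, window_s: float = 30.0):
        self.threshold = threshold       # assumed minimum similarity for a correction
        self.window_s = window_s         # assumed maximum delay after the utterance
        self.links: dict[str, str] = {}  # misspoken keyword -> correct keyword

    def determine(self, utt: Utterance, op: TouchOperation) -> str | None:
        """Determination unit: return the keyword the operation corrects, or None."""
        if not 0.0 <= op.time_s - utt.time_s <= self.window_s:
            return None  # too late (or before the utterance) to count as a correction
        if op.text in utt.keywords:
            return None  # repeating a spoken keyword confirms it rather than correcting it
        best, best_score = None, 0.0
        for kw in utt.keywords:
            score = SequenceMatcher(None, kw, op.text).ratio()
            if score > best_score:
                best, best_score = kw, score
        return best if best_score >= self.threshold else None

    def link(self, wrong: str, correct: str) -> None:
        """Linking unit: associate the utterance content with the operation content."""
        self.links[wrong] = correct

    def control(self, keywords: list[str]) -> list[str]:
        """Information control unit: rewrite later utterances using the links."""
        return [self.links.get(kw, kw) for kw in keywords]

    def handle(self, utt: Utterance, op: TouchOperation) -> str | None:
        wrong = self.determine(utt, op)
        if wrong is not None:
            self.link(wrong, op.text)
        return wrong
```

For example, suppose the user says a sentence containing "イバラギ" and then types "イバラキ" on the touch panel five seconds later. In that case `handle` links the two keywords, and `control(["イバラギ", "市"])` afterwards yields `["イバラキ", "市"]`.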

The information processing method according to claim 9 is executed by an information processing device and comprises the following steps:
a determination step of, when a second input operation of inputting information by touching a predetermined object is performed after a first utterance is input, determining whether the second input operation is a correction operation for correcting the utterance content. The determination is based on the utterance content indicated by the first utterance and the operation content indicated by the second input operation;
a linking step of linking the operation content with the utterance content when the determination step determines that the second input operation is the correction operation; and
an information control step of performing predetermined control on the utterance content based on the linking result of the linking step.

The information processing program according to claim 10 causes an information processing device to execute the following procedures:
a determination procedure of, when a second input operation of inputting information by touching a predetermined object is performed after a first utterance is input, determining whether the second input operation is a correction operation for correcting the utterance content. The determination is based on the utterance content indicated by the first utterance and the operation content indicated by the second input operation;
a linking procedure of linking the operation content with the utterance content when the determination procedure determines that the second input operation is the correction operation; and
an information control procedure of performing predetermined control on the utterance content based on the linking result of the linking procedure.

FIG. 1 is a diagram illustrating an example of an information processing system according to an embodiment.
FIG. 2 is an explanatory diagram illustrating information processing according to the first embodiment.
FIG. 3 is a diagram illustrating an example configuration of the information processing device according to the first embodiment.
FIG. 4 is a diagram illustrating an example of an utterance information database according to the first embodiment.
FIG. 5 is a diagram illustrating an example of an association information database according to the embodiment.
FIG. 6 is a diagram illustrating an example of a user dictionary database according to the embodiment.
FIG. 7 is a flowchart showing the procedure of information processing according to the first embodiment.
FIG. 8 is an explanatory diagram illustrating information processing according to the second embodiment.
FIG. 9 is a diagram illustrating an example configuration of the information processing device according to the second embodiment.
FIG. 10 is a diagram illustrating an example of an operation information database according to the second embodiment.
FIG. 11 is a flowchart showing the procedure of information processing according to the second embodiment.
FIG. 12 is a hardware configuration diagram illustrating an example of a computer that realizes the functions of the information processing device.

An example of a mode for carrying out the information processing device, information processing method, and information processing program (hereinafter referred to as an "embodiment") is described in detail below with reference to the drawings. This embodiment does not limit the information processing device, information processing method, or information processing program. In the following embodiments, identical parts are given the same reference numerals, and duplicate descriptions are omitted.

[Embodiment]
(Common Features of the Embodiments)
[1. Introduction]
A vehicle navigation device may be equipped with a voice recognition system that recognizes speech input by a user and performs information processing (e.g., route guidance) according to the recognition result. The user may, for example, speak an instruction for the navigation device to perform a specific operation, or say a destination. However, the user may misspeak and input an utterance whose content differs from what was intended. The voice recognition system then executes an operation corresponding to the misspoken content, which is inconvenient for the user.

The present invention was made with these circumstances in mind. Its object is to ensure that the correct operation is executed even when the user misspeaks. To this end, the present invention infers from the user's actions that the user intends to correct a slip of the tongue. Based on the content of an action performed with that intention, it links the misspoken content with the originally intended correct content. Subsequent processing is then executed according to this linking result.

The information processing according to the present invention can be divided into two types:
(a) processing that detects a correction utterance, in which the user tries to correct the misspoken content by voice, and performs linking based on the content of that correction utterance;
(b) processing that detects a correction operation, in which the user tries to correct the misspoken content through an input means other than voice (e.g., manual input), and performs linking based on the content of that correction operation.
In the following, type (a) is described as the first embodiment and type (b) as the second embodiment.

[2. Overview of the System]
Before the first and second embodiments are described in detail, this section describes the configuration of the information processing system, which is common to both. FIG. 1 is a diagram illustrating an information processing system 1 as an example of an information processing system according to the embodiments. The first and second embodiments described below may be realized within the information processing system 1 illustrated in FIG. 1.

As shown in FIG. 1, the information processing system 1 may include a terminal device 10 and an information processing device 100. The terminal device 10 and the information processing device 100 are communicably connected to each other by wire or wirelessly via a network N. The information processing system 1 may include any number of terminal devices 10 and any number of information processing devices 100.

The terminal device 10 may be an in-vehicle device mounted on a vehicle, which is an example of a moving body. FIG. 1 shows an example in which the terminal device 10 is an in-vehicle device of a vehicle VEx. In this example, the terminal device 10 may be a dedicated navigation device that is either built into the vehicle VEx or attached to it.

The terminal device 10 may also be configured to function as the information processing device 100 described below. For example, although FIG. 1 shows the terminal device 10 and the information processing device 100 as separate devices, the two may be integrated into a single information processing device. In that case, some or all of the functions of the information processing device 100 may be implemented in the terminal device 10.

The terminal device 10 may also be a portable terminal device on which an application for a given navigation system is installed, such as a smartphone, tablet terminal, notebook PC, desktop PC, or PDA. In this example, the terminal device 10 may be one used daily by, for example, the driver of the vehicle VEx.

The terminal device 10 may also include a sound collection unit (e.g., a microphone) that collects the user's speech. The terminal device 10 may then transmit speech information indicating the collected speech to the information processing device 100.

The terminal device 10 may also include various sensors, such as a camera, an acceleration sensor, a gyro sensor, a GPS sensor, and an atmospheric pressure sensor. It may transmit the sensor information these sensors detect to the information processing device 100. The vehicle VEx may also have its own sensors, for example for a safe-driving system, and their sensor information may likewise be transmitted to the information processing device 100.

The information processing device SV performs the information processing according to the embodiments. For example, it may do so in accordance with an information processing method realized by the information processing program according to the embodiments.

In the first embodiment, when a second utterance is input after a first utterance, the information processing device SV determines whether the second utterance was input to correct the first utterance content. It makes this determination based on the first utterance content indicated by the first utterance and the second utterance content indicated by the second utterance.

Specifically, the information processing device SV estimates two things: whether the first utterance content is an error caused by a slip of the tongue, and whether the user input the second utterance intending to correct that error. That is, based on the first utterance content indicated by the first utterance and the second utterance content indicated by the second utterance, it estimates whether the user intends to correct the first utterance content with the second utterance content. Based on this estimation result, it then determines whether the second utterance is a correction utterance input to correct the first utterance content.

When the information processing device SV determines that the second utterance is a correction utterance input to correct the first utterance content, it links the first utterance content with the second utterance content. It then performs predetermined control on the first utterance content based on the linking result.

In the second embodiment, a second input operation (e.g., a manual input operation) of inputting information by touching a predetermined object may be performed after a first utterance is input. In that case, the information processing device SV determines whether the second input operation is a correction operation for correcting the utterance content. It makes this determination based on the utterance content indicated by the first utterance and the operation content indicated by the second input operation.

Specifically, the information processing device SV estimates two things: whether the utterance content is an error caused by a slip of the tongue, and whether the user performed the second input operation intending to correct that error. That is, based on the utterance content indicated by the first utterance and the operation content indicated by the second input operation, it estimates whether the user intends to correct the utterance content with the operation content. Based on this estimation result, it then determines whether the second input operation is a correction operation for correcting the utterance content.

When the information processing device SV determines that the second input operation is a correction operation, it links the operation content with the utterance content. It then performs predetermined control on the utterance content based on the linking result.

If the terminal device 10 is regarded as an edge computer that performs edge processing near the user, the information processing device SV may be, for example, a cloud computer that performs processing on the cloud side. In other words, the information processing device SV may be a server device.

The first and second embodiments are each described in detail below. The information processing device SV that performs the information processing of the first embodiment is referred to as the "information processing device 100." The information processing device SV that performs the information processing of the second embodiment is referred to as the "information processing device 200."

In each embodiment, the moving body is described as the vehicle VEx, but the moving body is not limited to a vehicle. The user in each embodiment may be any person who has ridden in the vehicle VEx and given voice input to the terminal device 10. For example, the user may be a person who uses the vehicle VEx daily, i.e., its owner.

(First Embodiment)
[1. Overview of the First Embodiment]
The first embodiment is described below with reference to FIG. 2. FIG. 2 is an explanatory diagram illustrating the information processing according to the first embodiment.

FIG. 2 shows user U1 inputting speech by speaking to the terminal device 10 mounted on vehicle VE1 (an example of the vehicle VEx). More specifically, user U1 is inputting an utterance requesting route guidance to "destination XX in Ibaraki City, Osaka Prefecture."

In this case, each time the terminal device 10 receives an utterance, it transmits speech information indicating that utterance to the information processing device 100. The information processing device 100 thereby acquires the speech information from the terminal device 10 (step S11).

For example, FIG. 2 shows user U1 inputting an utterance VO11 with content C11: "Please give me route guidance to XX in 'Ibaragi' City!" In response to the utterance VO11, the terminal device 10 transmits speech information indicating the utterance content C11 to the information processing device 100, which thereby acquires it.

Suppose that user U1 then realizes they mistakenly said "Ibaragi" when they should have said "Ibaraki." As shown in FIG. 2, user U1 re-inputs an utterance VO12 with content C12: "Please give me route guidance to XX in 'Ibaraki' City!" In response, the terminal device 10 transmits speech information indicating the utterance content C12 to the information processing device 100, which thereby acquires it.

Next, the information processing device 100 uses the temporal order of user U1's utterances to acquire first speech information indicating the first utterance and second speech information indicating the second utterance (step S12). For example, it may identify the earlier utterance as the first utterance and the utterance input after it as the second utterance. It may then extract the first and second speech information from the speech information collected so far via the terminal device 10.
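One simple way to realize step S12 is sketched below. It assumes each received utterance is stored with a timestamp; the `SpeechRecord` type and its fields are hypothetical. The sketch sorts the collected records by time and treats the two most recent ones as the first and second utterances.

```python
from dataclasses import dataclass


@dataclass
class SpeechRecord:
    text: str         # recognized text of one utterance
    timestamp: float  # hypothetical: time the utterance was received, in seconds


def first_and_second(records):
    """Return (first, second): the two most recent utterances in speaking order."""
    if len(records) < 2:
        raise ValueError("at least two utterances are required")
    ordered = sorted(records, key=lambda r: r.timestamp)
    return ordered[-2], ordered[-1]
```

Sorting by timestamp rather than by arrival order keeps the result correct even if records reach the server out of order.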

In the example of FIG. 2, the information processing device 100 identifies the utterance VO11 as the first utterance and thereby acquires the speech information indicating utterance content C11 as the first speech information. Likewise, it identifies the utterance VO12 as the second utterance and acquires the speech information indicating utterance content C12 as the second speech information. Hereinafter, the utterance VO11 may be referred to as the "first utterance VO11" and the utterance VO12 as the "second utterance VO12."

Next, the information processing device 100 performs intention analysis (step S13) to estimate whether the user intends to correct a slip of the tongue. The analysis is based on the similarity between two sets of keywords:
- the first keywords, which constitute the first speech information (first utterance content);
- the second keywords, which constitute the second speech information (second utterance content).
Based on this similarity, the device estimates whether user U1 intends to correct the first utterance content with the second utterance content. The specific method of the intention analysis in step S13 is described later.
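The patent defers the concrete similarity measure. As one plausible choice, the sketch below scores keyword pairs by normalized Levenshtein (edit) distance and picks the most similar pair of differing keywords. Both the function names and the choice of edit distance are assumptions, not the patent's method.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """1.0 for identical strings, 0.0 when every character must change."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def most_similar_pair(first_keywords, second_keywords):
    """Most similar (first, second, score) among pairs of differing keywords."""
    best = (None, None, 0.0)
    for k1 in first_keywords:
        for k2 in second_keywords:
            if k1 == k2:
                continue  # unchanged keywords carry no correction signal
            score = similarity(k1, k2)
            if score > best[2]:
                best = (k1, k2, score)
    return best
```

With this measure, "イバラギ" and "イバラキ" score 0.75, because one of four characters differs. Here the voicing mark on ギ counts as a full substitution. Decomposing both strings with `unicodedata.normalize("NFD", ...)` first would raise the score for such voicing slips to 0.8, if that behavior is wanted.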

In step S13, the information processing device 100 may perform morphological analysis on the text representing the first speech information and extract each word of that text as a first keyword. Similarly, it may perform morphological analysis on the text representing the second speech information and extract each word of that text as a second keyword.
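Real Japanese morphological analysis normally relies on a dedicated analyzer such as MeCab. The sketch below deliberately substitutes a crude, dependency-free heuristic, so that the keyword lists used in the other examples have a concrete source. It splits text into runs of the same script (katakana, hiragana, kanji, alphanumerics) and drops hiragana runs as a rough stand-in for particles and verb endings. This is a placeholder assumption, not the analysis the patent describes.

```python
def script_of(ch):
    """Classify a character by Unicode block; None for punctuation and symbols."""
    cp = ord(ch)
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"  # includes the long-vowel mark ー
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x4E00 <= cp <= 0x9FFF:
        return "kanji"
    if ch.isalnum():
        return "alnum"
    return None


def extract_keywords(text):
    """Split text into same-script runs, keeping every run except hiragana."""
    keywords, run, run_script = [], [], None
    for ch in text:
        script = script_of(ch)
        if script != run_script:
            if run and run_script != "hiragana":
                keywords.append("".join(run))
            run, run_script = [], script
        if script is not None:
            run.append(ch)
    if run and run_script != "hiragana":
        keywords.append("".join(run))
    return keywords
```

For instance, `extract_keywords("イバラギ市の○○までルート案内おねがい！")` yields `["イバラギ", "市", "ルート", "案内"]`. The placeholder ○○ and the exclamation mark are discarded as symbols.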

Next, based on the result of the intention analysis, the information processing device 100 determines whether the second utterance VO12 is a correction utterance input to correct the first utterance content corresponding to the first utterance VO11 (step S14). For example, suppose similarity is recognized between "Ibaragi," one of the first keywords, and "Ibaraki," one of the second keywords. The information processing device 100 can then infer that user U1 intends to correct the first utterance content (first keyword KW11) with the second utterance content (second keyword KW12). It can therefore determine that the second utterance VO12 is a correction utterance input to correct the first utterance content.
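The determination in step S14 could be sketched with a deliberately simple, hypothetical rule. The second utterance counts as a correction only when three conditions hold:
1. it has the same number of keywords as the first;
2. exactly one keyword position differs;
3. the differing keywords are similar enough, measured here with `difflib.SequenceMatcher.ratio()` against an assumed threshold of 0.6.

The rule and threshold are illustrative assumptions, not the criterion the patent defines.

```python
from difflib import SequenceMatcher


def detect_correction(first_keywords, second_keywords, threshold=0.6):
    """Return (wrong, correct) if the second keyword list looks like a self-correction.

    Hypothetical rule: both utterances have the same number of keywords, exactly
    one position differs, and the differing keywords are similar enough.
    """
    if len(first_keywords) != len(second_keywords):
        return None
    diffs = [(a, b) for a, b in zip(first_keywords, second_keywords) if a != b]
    if len(diffs) != 1:
        return None  # identical utterances, or too many changes to be a slip fix
    wrong, correct = diffs[0]
    if SequenceMatcher(None, wrong, correct).ratio() >= threshold:
        return wrong, correct
    return None
```

Requiring exactly one changed keyword separates "the same request, restated with one word fixed" from a genuinely new request such as a different city name.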

When the information processing device 100 determines that the second utterance VO12 is a correction utterance input to correct the first utterance content, it links two keywords (step S15). The second keyword "Ibaraki" is treated as correct information, and the first keyword "Ibaragi" as error information for that correct information. Besides the current example, FIG. 2 shows linking results from user U1's past slips of the tongue, in which "Ibaraki" was misspoken as "Ibaragi" and as "Ibaraku." Such linking results may be managed in the association information database 122 (FIG. 5) using link IDs.
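An in-memory sketch of the association information database follows. Each link record gets a sequential link ID and stores one correct/error pair, so repeated slips appear as separate records. The record layout and method names are assumptions; FIG. 5 is not reproduced here.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Link:
    link_id: int   # association (link) ID
    correct: str   # correct information: the second keyword
    wrong: str     # error information: the first keyword


class AssociationDB:
    """In-memory stand-in for the association information database."""

    def __init__(self):
        self._next_id = count(1)  # issues link IDs 1, 2, 3, ...
        self.links = []           # Link records in insertion order

    def add(self, correct: str, wrong: str) -> Link:
        link = Link(next(self._next_id), correct, wrong)
        self.links.append(link)
        return link

    def wrong_forms_of(self, correct: str) -> list:
        """All recorded misspoken forms of `correct`, including repeats."""
        return [link.wrong for link in self.links if link.correct == correct]
```

Keeping repeats, rather than a set, preserves how often each slip occurred. A later learning step can use that frequency.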

The information processing device 100 then uses pairs of correct information and error information as training data (step S16). From these pairs it learns which of the first keywords indicated by the error information are likely to be confused with the second keyword indicated by the correct information. In the example of FIG. 2, it learns which of the first keywords "Ibaragi" and "Ibaraku" is likely to be confused with the second keyword "Ibaraki."
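The patent does not specify the learning method for step S16. As a stand-in, the sketch below uses simple frequency counting: a misspoken form counts as "likely to be confused" once it has been linked to the correct keyword at least `min_count` times. Both the counting approach and the `min_count` parameter are assumptions.

```python
from collections import Counter


def confusable_keywords(pairs, correct, min_count=2):
    """Return misspoken forms of `correct` seen at least `min_count` times.

    `pairs` is an iterable of (correct, wrong) tuples taken from the link records;
    results are ordered from most to least frequent.
    """
    counts = Counter(wrong for c, wrong in pairs if c == correct)
    return [wrong for wrong, n in counts.most_common() if n >= min_count]
```

With the pairs from FIG. 2 (two "Ibaragi" slips and one "Ibaraku" slip), only "イバラギ" passes the default threshold. A statistical or machine-learned model could replace the counter without changing the interface.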

The information processing device 100 then registers keywords in the dictionary based on the learning result (step S17). For example, it registers in the user dictionary (FIG. 6) a first keyword that is easily confused with a second keyword. As a result, when an utterance containing that first keyword is later input, the first keyword is recognized as the second keyword. FIG. 2 shows the information processing device 100 registering the first keyword "Ibaragi" in the user dictionary. From then on, an utterance containing "Ibaragi" is recognized as containing the second keyword "Ibaraki."
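A minimal sketch of step S17 follows. It assumes that the user dictionary is a mapping from a misspoken keyword to its correct keyword, and that recognition consults this mapping after keyword extraction. Both the data layout and the function names are hypothetical.

```python
def build_user_dictionary(confusables):
    """Map each learned misspoken keyword to the keyword it should be read as.

    `confusables` maps a correct keyword to its learned misspoken forms.
    """
    return {wrong: correct for correct, wrongs in confusables.items() for wrong in wrongs}


def apply_user_dictionary(keywords, user_dict):
    """Replace any registered misspoken keyword by its correct keyword."""
    return [user_dict.get(kw, kw) for kw in keywords]
```

For example, after registering `{"イバラキ": ["イバラギ"]}`, a later utterance whose keywords are `["イバラギ", "市", "目的地"]` would be handled as `["イバラキ", "市", "目的地"]`.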

As explained above with reference to FIG. 2, in the first embodiment, when a second utterance is input after a first utterance, the information processing device 100 determines, based on the first utterance content indicated by the first utterance and the second utterance content indicated by the second utterance, whether the second utterance was input to correct the first utterance content. If the information processing device 100 determines that it was, it links the first utterance content with the second utterance content and, based on the linking result, registers the first utterance content in the user dictionary.

With the information processing according to the first embodiment, the information processing device 100 can perform control so that the correct operation is executed even when the user misspeaks.

2. Configuration of the Information Processing Device
The information processing device 100 according to the first embodiment will now be described with reference to FIG. 3. FIG. 3 is a diagram showing an example configuration of the information processing device 100 according to the first embodiment. As shown in FIG. 3, the information processing device 100 includes a communication unit 110, a storage unit 120, and a control unit 130.

(Regarding the communication unit 110)
The communication unit 110 is realized by, for example, a NIC (Network Interface Card). The communication unit 110 is connected to a network by wire or wirelessly and exchanges information with, for example, the terminal device 10.

(Regarding the storage unit 120)
The storage unit 120 is realized by, for example, a semiconductor memory element such as RAM (Random Access Memory) or flash memory, or a storage device such as a hard disk or an optical disk. The storage unit 120 includes an utterance information database 121, an association information database 122, and a user dictionary database 123.

(Regarding the utterance information database 121)
The utterance information database 121 stores information about utterances input by users. FIG. 4 shows an example of the utterance information database 121 according to the first embodiment. In the example of FIG. 4, the utterance information database 121 has items such as "user ID," "utterance date and time," and "speech information."

"User ID" is identification information identifying the user who input an utterance to the terminal device 10. For example, the information processing device 100 may identify the user who input the utterance from an image captured by a sensor of the terminal device 10 (e.g., a camera) and issue a "user ID" to the identified user.

"Utterance date and time" indicates the date and time at which the utterance was input. FIG. 4 shows an example in which user ID "U1" is associated with "utterance date and time #11," meaning that user U1 input an utterance at utterance date and time #11. By treating the "utterance date and time" as the utterance timing, the information processing device 100 can, for example, distinguish a first utterance, which was input earlier, from a second utterance, which was input after the first utterance.
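Using the utterance date and time to tell the first utterance from the second can be sketched as follows. This is a minimal illustration, assuming each row of FIG. 4 is held as an in-memory record; `UtteranceRecord` and `consecutive_pairs` are hypothetical names, the sample rows are made up, and pairing each utterance with the next one by the same user is an assumed policy that the source does not specify.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class UtteranceRecord:
    # Hypothetical row of the utterance information database 121 (FIG. 4).
    user_id: str
    uttered_at: datetime  # "utterance date and time"
    speech_text: str      # "speech information" (recognized text)


def consecutive_pairs(records, user_id):
    """Return (first utterance, second utterance) pairs for one user.

    Records are sorted by utterance time and each utterance is paired with
    the one that immediately follows it (an assumed pairing policy).
    """
    own = sorted(
        (r for r in records if r.user_id == user_id),
        key=lambda r: r.uttered_at,
    )
    return list(zip(own, own[1:]))


# Illustrative rows; stored out of order on purpose.
records = [
    UtteranceRecord("U1", datetime(2021, 10, 29, 9, 0, 5), "ibaraki no tenki"),
    UtteranceRecord("U1", datetime(2021, 10, 29, 9, 0, 0), "ibaragi no tenki"),
    UtteranceRecord("U2", datetime(2021, 10, 29, 9, 0, 1), "annai chuushi"),
]
pairs = consecutive_pairs(records, "U1")
```

Sorting by timestamp rather than by storage order means the earlier "ibaragi" utterance is treated as the first utterance even though it was stored second.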

"Speech information" indicates the content of the utterance input by the user identified by the "user ID." For example, the "speech information" may be text indicating the content of the utterance, obtained by applying any speech recognition technique to the utterance. The speech recognition processing may be performed by the terminal device 10 or by a speech recognition device not shown in FIG. 1. FIG. 4 shows an example in which user ID "U1," "utterance date and time #11," and "speech information #11" are associated, meaning that speech information #11, indicating the content of the utterance, was obtained from the utterance that user U1 input at utterance date and time #11.

(Regarding the association information database 122)
The association information database 122 links and manages correct answer information and error information. FIG. 5 shows an example of the association information database 122 according to the embodiment. In the example of FIG. 5, the association information database 122 has items such as "user ID," "linking ID," "correct answer information," and "error information."

"User ID" is identification information identifying the user who input an utterance to the terminal device 10, and corresponds to "user ID" in FIG. 4.

The "linking ID" is identification information for managing the "error information" for the "correct answer information," separately for each keyword indicated by the "correct answer information." As shown in FIG. 5, a "linking ID" may be issued for each keyword indicated by the "correct answer information." FIG. 5 shows an example in which user ID "U1," linking ID "H11," and correct answer information "Ibaraki" are associated, meaning that one second keyword, "Ibaraki," obtained as correct answer information from utterances input by user U1, is managed under linking ID "H11."

"Correct answer information" indicates a second keyword contained in a second utterance that was input as correction speech to correct a specific first keyword among the first keywords contained in the first utterance; that is, the correct second keyword that corrects that specific first keyword.

"Error information" indicates the keyword, among the first keywords contained in the first utterance, that is corrected by a second keyword contained in the second utterance input as correction speech.

FIG. 5 shows an example in which user ID "U1," linking ID "H11," correct answer information "Ibaraki," and error information "Ibaragi" are associated. This is the linking result of user U1 mistakenly saying "Ibaragi" when "Ibaraki" was meant, managed under linking ID "H11."

FIG. 5 also shows an example in which user ID "U1," linking ID "H11," correct answer information "Ibaraki," and error information "Ibaraku" are associated. This is the linking result of user U1 mistakenly saying "Ibaraku" when "Ibaraki" was meant, also managed under linking ID "H11."

FIG. 5 also shows an example in which user ID "U1," linking ID "H12," correct answer information "guidance interrupted" (案内中断), and error information "guidance canceled" (案内中止) are associated. This is the linking result of user U1 mistakenly saying "guidance canceled" when "guidance interrupted" was meant, managed under linking ID "H12."
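The table rows above can be reproduced with a small in-memory store. This is a minimal sketch, assuming that one linking ID is issued per (user, correct keyword) pair and that IDs are numbered H11, H12, ... in order of first appearance; `AssociationDB` and `link` are hypothetical names, and the source specifies neither the ID format nor the storage.

```python
from itertools import count


class AssociationDB:
    """Hypothetical in-memory stand-in for the association information database 122."""

    def __init__(self):
        self._numbers = count(11)  # assumed numbering: H11, H12, ...
        self._ids = {}             # (user_id, correct keyword) -> linking ID
        self.rows = []             # (user_id, linking_id, correct, error)

    def link(self, user_id, correct, error):
        """Store one correct/error pair and return the linking ID it is filed under."""
        key = (user_id, correct)
        if key not in self._ids:
            # First time this correct keyword appears for this user: issue a new ID.
            self._ids[key] = f"H{next(self._numbers)}"
        linking_id = self._ids[key]
        self.rows.append((user_id, linking_id, correct, error))
        return linking_id


db = AssociationDB()
db.link("U1", "イバラキ", "イバラギ")
db.link("U1", "イバラキ", "イバラク")
db.link("U1", "案内中断", "案内中止")
```

Both "Ibaraki" rows end up under H11, while "guidance interrupted" receives H12, consistent with one linking ID per correct keyword.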

As explained with reference to FIG. 2, each pair of "correct answer information" and "error information" is used as learning data for learning patterns of keywords, among the first keywords indicated by the "error information," that are likely to be mistaken for the second keyword indicated by the "correct answer information."

(Regarding the user dictionary database 123)
The user dictionary database 123 stores first keywords in association with second keywords so that a first keyword that is easily mistaken for a second keyword is recognized as that second keyword. FIG. 6 shows an example of the user dictionary database 123 according to the embodiment. In the example of FIG. 6, the user dictionary database 123 has items such as "user ID," "spoken keyword," and "recognized keyword."

"User ID" is identification information identifying the user who input an utterance to the terminal device 10, and corresponds to "user ID" in FIGS. 4 and 5.

The "spoken keyword" indicates a first keyword that, as a result of learning with the learning data, has been estimated to tend to be mistaken for the second keyword indicated by the "recognized keyword." The "spoken keyword" also serves as condition information specifying that, when an utterance containing this first keyword is input, the first keyword is to be recognized as the second keyword indicated by the "recognized keyword."

The "recognized keyword" serves as condition information specifying which keyword the first keyword indicated by the "spoken keyword" should correctly be recognized as when an utterance containing that first keyword is input.

FIG. 6 shows an example in which the spoken keyword "Ibaragi" and the recognized keyword "Ibaraki" are associated with user ID "U1." In this example, the first keyword "Ibaragi" and the second keyword "Ibaraki" are registered in association with each other in user U1's user dictionary so that, when user U1 inputs an utterance containing "Ibaragi," it is recognized as "Ibaraki."

FIG. 6 also shows an example in which the spoken keyword "guidance canceled" (案内中止) and the recognized keyword "guidance interrupted" (案内中断) are associated with user ID "U1." In this example, the two keywords are registered in association with each other in user U1's user dictionary so that, when user U1 inputs an utterance containing "guidance canceled," it is recognized as "guidance interrupted."
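At recognition time, applying such a user dictionary amounts to a per-user lookup. This is a minimal sketch, assuming the utterance has already been split into keywords; `USER_DICTIONARY` and `apply_user_dictionary` are hypothetical names, and the entries simply mirror the FIG. 6 examples described above.

```python
# Hypothetical contents of the user dictionary database 123 (FIG. 6):
# user ID -> {spoken keyword: recognized keyword}
USER_DICTIONARY = {
    "U1": {"イバラギ": "イバラキ", "案内中止": "案内中断"},
}


def apply_user_dictionary(user_id, keywords, dictionary=USER_DICTIONARY):
    """Replace each spoken keyword registered for this user with its recognized keyword.

    Keywords without an entry, and users without a dictionary, pass through unchanged.
    """
    entries = dictionary.get(user_id, {})
    return [entries.get(keyword, keyword) for keyword in keywords]


recognized = apply_user_dictionary("U1", ["イバラギ", "ノ", "テンキ"])
```

Because the dictionary is keyed by user ID, the same slip by another user is left untouched unless that user's own history has registered it.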

(Regarding the control unit 130)
Returning to FIG. 3, the control unit 130 is realized by a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or the like executing various programs (e.g., the information processing program according to the embodiment) stored in a storage device inside the information processing device 100, using RAM as a work area. The control unit 130 may also be realized by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

As shown in FIG. 3, the control unit 130 includes an acquisition unit 131, a correction speech determination unit 132, a detection unit 133, a linking unit 134, a learning unit 135, and an information control unit 136, and realizes or executes the information processing functions and operations described below. The internal configuration of the control unit 130 is not limited to that shown in FIG. 3; any configuration that performs the information processing described below may be used. Likewise, the connections between the processing units of the control unit 130 are not limited to those shown in FIG. 3.

(Regarding the acquisition unit 131)
The acquisition unit 131 acquires the various kinds of information used in the information processing according to the first embodiment. The acquisition unit 131 may also output the acquired information to the appropriate processing unit that uses it.

For example, when a user inputs an utterance to the terminal device 10, the acquisition unit 131 acquires speech information indicating the input utterance. If the speech information is generated by the terminal device 10, the acquisition unit 131 may acquire it from the terminal device 10; if it is generated by a speech recognition device (not shown), the acquisition unit 131 may acquire it from that device.

The acquisition unit 131 may also acquire first speech information indicating the first utterance and second speech information indicating the second utterance. For example, based on the order of utterance timings, the acquisition unit 131 may identify the first utterance, which was input earlier, and the second utterance, which was input after the first utterance. In this way, the acquisition unit 131 may acquire the first speech information and the second speech information from the speech information collected so far (the speech information stored in the utterance information database 121).

(Regarding the correction speech determination unit 132)
When a second utterance is input after a first utterance, the correction speech determination unit 132 determines, based on the first utterance content indicated by the first utterance and the second utterance content indicated by the second utterance, whether the second utterance was input to correct the first utterance content.

For example, the correction speech determination unit 132 estimates the user's intention: whether the first utterance content is an error caused by a slip of the tongue and the user input the second utterance to correct that error. In other words, based on the first and second utterance contents, it estimates whether the user intends to correct the first utterance content with the second utterance content. Then, according to the estimation result, it determines whether the second utterance is correction speech input to correct the first utterance content. For example, if the estimation indicates that the user intends such a correction, the correction speech determination unit 132 can determine that the second utterance is correction speech input to correct the first utterance content.

A specific example of this intention analysis follows: estimating whether the user intends to correct the first utterance content with the second utterance content and, according to the estimate, determining whether the second utterance is correction speech input to correct the first utterance content.

For example, the correction speech determination unit 132 may apply morphological analysis to the text indicating the first utterance content (first speech information) and extract each word in the text as a first keyword. Similarly, it may apply morphological analysis to the text indicating the second utterance content (second speech information) and extract each word in the text as a second keyword.

The correction speech determination unit 132 may then detect similarity for each combination of one first keyword and one second keyword and, based on the detected similarity, determine whether the second utterance is correction speech input to correct the first utterance content.

As a first example, the correction speech determination unit 132 may detect similarity in reading. For example, for each combination of a first keyword and a second keyword, it may calculate a similarity score indicating how similar the two keywords in the combination are as character strings (in reading).

As a specific example, the correction speech determination unit 132 may calculate a similarity score between the sequence of vowels in the first keyword and the sequence of vowels in the second keyword. As another example, it may calculate a similarity score between the sequences of consonants in the two keywords.

If any combination has a similarity score exceeding a predetermined value, the correction speech determination unit 132 may determine that the second utterance is correction speech input to correct the first utterance content.
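The reading-similarity check (vowel and consonant sequences, every first/second combination, a predetermined threshold) can be sketched as follows. This assumes keywords have already been converted to romaji, uses Python's `difflib.SequenceMatcher.ratio()` as a stand-in similarity measure, averages the vowel and consonant scores, uses 0.7 as the predetermined value, and skips identical keywords; all of these choices, and the function names, are assumptions rather than details from the source.

```python
from difflib import SequenceMatcher
from itertools import product

VOWELS = set("aeiou")


def vowel_sequence(romaji):
    """Keep only the vowels, e.g. "ibaraki" -> "iaai"."""
    return "".join(c for c in romaji if c in VOWELS)


def consonant_sequence(romaji):
    """Keep only the consonant letters, e.g. "ibaraki" -> "brk"."""
    return "".join(c for c in romaji if c.isalpha() and c not in VOWELS)


def reading_similarity(a, b):
    """Average of vowel-sequence and consonant-sequence similarity, in [0, 1]."""
    v = SequenceMatcher(None, vowel_sequence(a), vowel_sequence(b)).ratio()
    c = SequenceMatcher(None, consonant_sequence(a), consonant_sequence(b)).ratio()
    return (v + c) / 2


def is_correction(first_keywords, second_keywords, threshold=0.7):
    """True if any differing (first, second) keyword pair exceeds the threshold.

    Identical keywords are skipped (an assumption): a word that appears
    unchanged in both utterances was not corrected.
    """
    return any(
        f != s and reading_similarity(f, s) > threshold
        for f, s in product(first_keywords, second_keywords)
    )


score = reading_similarity("ibaragi", "ibaraki")
```

For "ibaragi" and "ibaraki" the vowel sequences are identical ("iaai") and the consonant sequences "brg" and "brk" share two of three letters, giving (1 + 2/3) / 2 = 5/6, which clears the assumed threshold.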

As a second example, the correction speech determination unit 132 may detect semantic similarity. For example, for each combination of a first keyword and a second keyword, it may calculate a similarity score indicating how similar the two keywords in the combination are in meaning.

For example, suppose the first keyword is "Ojiichan no ie" (Grandpa's house) and the second keyword is "Ojiichan-chi," a colloquial contraction with the same meaning. In this case, the correction speech determination unit 132 may calculate a similarity score indicating how similar "Ojiichan no ie" and "Ojiichan-chi" are as character strings (in reading), and then weight that score according to how similar the two are in meaning. As one example, it may calculate a semantic similarity score for "Ojiichan no ie" and "Ojiichan-chi" and use that score as a weight applied to the reading similarity.
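A minimal sketch of this weighting, under stated assumptions: the semantic scores come from a hypothetical hand-written lookup table (a real system might derive them from word embeddings or a thesaurus), the reading similarity is `difflib.SequenceMatcher.ratio()` on romaji, and the semantic score is applied directly as a multiplicative weight, which is one possible reading of the description above. The table values and function names are illustrative only.

```python
from difflib import SequenceMatcher

# Hypothetical semantic-similarity table (illustrative value, not from the source).
SEMANTIC_SIMILARITY = {
    frozenset({"ojiichannoie", "ojiichanchi"}): 0.95,
}


def semantic_similarity(a, b):
    """Look up a semantic score in [0, 1]; unknown pairs score 0.0."""
    if a == b:
        return 1.0
    return SEMANTIC_SIMILARITY.get(frozenset({a, b}), 0.0)


def weighted_similarity(a, b):
    """Reading similarity weighted by semantic similarity."""
    reading = SequenceMatcher(None, a, b).ratio()
    return reading * semantic_similarity(a, b)


near = weighted_similarity("ojiichannoie", "ojiichanchi")   # same meaning
far = weighted_similarity("ojiichannoie", "obaachannoie")   # "Grandma's house"
```

With multiplicative weighting, a pair whose readings overlap but whose meanings differ is suppressed; an additive boost would be an equally plausible alternative design.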

As a third example, the correction speech determination unit 132 may detect similarity in how kanji are read. One example, as described with reference to FIG. 2, is the similarity that arises when a single word written in kanji has multiple readings that are easily confused (e.g., "Ibaraki" and "Ibaragi").

As a fourth example, the correction speech determination unit 132 may detect similarity according to the interval between the input times of utterances. For example, it may determine whether the second utterance is correction speech input to correct the first utterance content based on the similarity between the first keyword and a second keyword contained in a second utterance input within a predetermined time after the first utterance was input. For instance, it may make this determination based on the similarity between a second keyword contained in a second utterance input immediately after the first utterance and a first keyword contained in that first utterance.
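Combining the time window with a keyword similarity score can be sketched as below. The 10-second window, the 0.7 threshold, and the name `is_timed_correction` are hypothetical; the similarity score is assumed to be computed elsewhere (for example, by a reading-similarity measure).

```python
from datetime import datetime, timedelta


def is_timed_correction(first_at, second_at, similarity,
                        window=timedelta(seconds=10), threshold=0.7):
    """Accept the second utterance as a correction only if it follows the
    first within `window` and its keyword similarity exceeds `threshold`."""
    elapsed = second_at - first_at
    return timedelta(0) < elapsed <= window and similarity > threshold


t0 = datetime(2021, 10, 29, 9, 0, 0)
quick = is_timed_correction(t0, t0 + timedelta(seconds=4), 0.83)
late = is_timed_correction(t0, t0 + timedelta(minutes=5), 0.83)
```

Requiring a strictly positive elapsed time also rejects pairs whose order has been swapped.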

The correction speech determination unit 132 may also use, as a condition, the degree to which the above four factors (similarity in reading, similarity in meaning, similarity in kanji readings, and similarity according to the input time interval) are satisfied, and determine whether the second utterance is correction speech input to correct the first utterance content based on the number of times utterances satisfying that condition have been input.

(Regarding the detection unit 133)
The detection unit 133 may detect the input situation in which the second utterance was input. For example, the detection unit 133 can detect the input situation based on sensor information from a sensor of the terminal device 10 or from a sensor of the vehicle VEx.

For example, the detection unit 133 may detect, as the input situation, the number of times the second utterance has been input. As one example, it may detect the number of times the second utterance was input within a predetermined period after the first utterance was input.

The correction speech determination unit 132 may then use the input count detected by the detection unit 133 to estimate, from a perspective other than the similarity described above, whether the user intends to correct the first utterance content with the second utterance content. For example, it may determine whether the second utterance is correction speech based on whether the detected input count exceeds a predetermined number. Specifically, if the number of times the second utterance was input within the predetermined period exceeds the predetermined number, the correction speech determination unit 132 may determine that the second utterances input within that period were input to correct the first utterance content.
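The count-based rule can be sketched as a simple windowed count. The 30-second period, the limit of two inputs, and the name `repeated_within_period` are hypothetical choices for illustration.

```python
from datetime import timedelta


def repeated_within_period(first_at, second_times,
                           period=timedelta(seconds=30), max_count=2):
    """True if more than `max_count` second utterances were input within
    `period` after the first utterance."""
    n = sum(1 for t in second_times if timedelta(0) < t - first_at <= period)
    return n > max_count
```

Only utterances inside the period are counted, so a late repetition does not push the count over the limit.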

When the second utterance is determined to be correction speech input to correct the first utterance content, which second keyword in the second utterance content is treated as correct answer information, and which first keyword in the first utterance content is treated as error information, may be decided based on the similarity between the second keywords and the first keywords.

As another example, the detection unit 133 may detect, as the input situation, the frequency of the second utterance. In this case, the correction speech determination unit 132 may use the frequency detected by the detection unit 133 to estimate, from a perspective other than the similarity described above, whether the user intends to correct the first utterance content with the second utterance content. For example, it may determine whether the second utterance is correction speech based on the tone of the second utterance identified from the frequency. Specifically, if the identified tone indicates a predetermined manner of speaking, the correction speech determination unit 132 may determine that the second utterance is correction speech input to correct the first utterance content.

(Regarding the linking unit 134)
When the second utterance is determined to be correction speech input to correct the first utterance content, the linking unit 134 links the first utterance content with the second utterance content indicated by that second utterance.

For example, when the second utterance is determined to have been input to correct the first utterance content, the linking unit 134 extracts, from the combinations of second keywords in the second utterance content and first keywords in the first utterance content, the combination of a second keyword and a first keyword that were determined to be similar to each other. The linking unit 134 then links the two, treating the second keyword in the extracted combination as correct answer information and the first keyword as error information for that correct answer information.
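Extracting the similar combination and orienting it as (correct, error) can be sketched as follows. This assumes romaji keywords, uses `difflib.SequenceMatcher.ratio()` as the similarity measure with a 0.7 threshold, keeps only the single most similar pair, and skips identical keywords; the function name `extract_link` and these choices are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import product


def extract_link(first_keywords, second_keywords, threshold=0.7):
    """Return (correct, error) for the most similar differing keyword pair,
    or None if no pair exceeds the threshold."""
    best, best_score = None, threshold
    for first, second in product(first_keywords, second_keywords):
        if first == second:
            continue  # unchanged words were not corrected
        score = SequenceMatcher(None, first, second).ratio()
        if score > best_score:
            # Second-utterance keyword = correct answer, first = error.
            best, best_score = (second, first), score
    return best


link = extract_link(["ibaragi", "no", "tenki"], ["ibaraki", "no", "tenki"])
```

The returned tuple is ordered (correct answer information, error information), ready to be stored under a linking ID.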

When the second utterance is determined to be correction speech input to correct the first utterance content, the linking unit 134 may also link a second keyword and a first keyword by treating one of the second keywords contained in the second utterances input within a predetermined period after the first utterance as correct answer information, and the first keyword contained in the first utterance as error information for that correct answer information. For example, the linking unit 134 extracts, from the combinations of second keywords in the second utterance content of the second utterances input within the predetermined period and first keywords in the first utterance content, the combination of a second keyword and a first keyword determined to be similar to each other. The linking unit 134 may then link the two, treating the second keyword in the extracted combination as correct answer information and the first keyword as error information for that correct answer information.

Also, for example, when the second utterance is determined to be correction speech input to correct the first utterance content, the linking unit 134 may link a second keyword and a first keyword by treating the second keyword contained in a second utterance spoken in the tone indicated by a predetermined speech style as correct answer information, and the first keyword contained in the first utterance as error information for that correct answer information. For example, the linking unit 134 may extract, from among the combinations of second keywords contained in a second utterance spoken in that tone and first keywords contained in the first utterance content, each combination of a second keyword and a first keyword that are determined to be similar to each other. The linking unit 134 may then link the second keyword and the first keyword of each extracted combination, treating the second keyword as correct answer information and the first keyword as error information for that correct answer information.

For example, the linking unit 134 may register the linking result, in which the second keyword (correct answer information) is linked with the first keyword (error information for that correct answer information), in the linking information database 122 in association with the linking ID issued for this second keyword.
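Registration with a linking ID per correct keyword could look like the sketch below, which uses Python's built-in `sqlite3` with an in-memory database. The table layout, the `L1`, `L2`, ... ID format, and the rule of reusing one linking ID for all error keywords of the same correct keyword per user are assumptions, not the actual schema of the linking information database 122.

```python
import sqlite3


def open_link_db():
    """Create a hypothetical in-memory table for linking results."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE links (user_id TEXT, link_id TEXT, correct TEXT, error TEXT)"
    )
    return conn


def register_link(conn, user_id, correct, error):
    """Store one (correct, error) link, reusing the linking ID of the correct keyword."""
    row = conn.execute(
        "SELECT link_id FROM links WHERE user_id = ? AND correct = ?",
        (user_id, correct),
    ).fetchone()
    if row is None:
        # Issue a new ID for a correct keyword not seen before (hypothetical format).
        (count,) = conn.execute("SELECT COUNT(DISTINCT link_id) FROM links").fetchone()
        link_id = f"L{count + 1}"
    else:
        link_id = row[0]
    conn.execute(
        "INSERT INTO links VALUES (?, ?, ?, ?)", (user_id, link_id, correct, error)
    )
    return link_id


conn = open_link_db()
print(register_link(conn, "U1", "イバラキ", "イバラギ"))  # L1
print(register_link(conn, "U1", "イバラキ", "イバラク"))  # L1 (same correct keyword)
print(register_link(conn, "U1", "案内中断", "案内中止"))  # L2
```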

(Regarding the learning unit 135)
The learning unit 135 uses the pairs of correct answer information and error information linked by the linking unit 134 as learning data to learn patterns of utterance content, among the utterance content indicated by the error information, that are easily mistaken for the utterance content indicated by the correct answer information. For example, the learning unit 135 uses these pairs as learning data to learn patterns of keywords, among the first keywords indicated by the error information, that are easily mistaken for the second keyword indicated by the correct answer information.
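The patent leaves the learning method open. As a deliberately simple stand-in (not the method of the learning unit 135), the sketch below counts how often each (correct, error) pair has been linked and treats an error keyword as a learned confusable pattern once it reaches a hypothetical minimum count; if one error keyword was linked to several correct keywords, the most frequent one wins.

```python
from collections import Counter


def learn_confusable(pairs, min_count=2):
    """Map each frequently linked error keyword to its most frequent correct keyword.

    pairs: iterable of (correct, error) tuples from the linking step.
    min_count: hypothetical threshold for treating an error as a learned pattern.
    """
    counts = Counter(pairs)
    learned = {}
    # most_common() yields pairs in descending order of frequency.
    for (correct, error), n in counts.most_common():
        if n >= min_count and error not in learned:
            learned[error] = correct
    return learned


history = [("イバラキ", "イバラギ")] * 3 + [("イバラキ", "イバラク")]
print(learn_confusable(history))  # {'イバラギ': 'イバラキ'}
```

A frequency threshold of this kind keeps a one-off slip (here "Ibaraku") out of the dictionary until it recurs.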

(Regarding the information control unit 136)
The information control unit 136 performs predetermined control on the first utterance content based on the linking result produced by the linking unit 134.

For example, based on the relationship between the correct answer information and the error information linked by the linking unit 134, the information control unit 136 registers the error information in the user dictionary (user dictionary database 123) as the correct answer information, so that when speech with the utterance content indicated by the error information is input, that content is recognized as the utterance content indicated by the correct answer information associated with the error information.

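For example, based on the learning result of the learning unit 135, the information control unit 136 registers the error information in the user dictionary as the correct answer information, so that when speech is input whose utterance content, among the utterance content indicated by the error information, is easily mistaken for the utterance content indicated by the correct answer information, the input content is recognized as the utterance content indicated by the correct answer information associated with that error information. For example, the information control unit 136 registers keywords in the user dictionary based on the learning result of the learning unit 135. For example, based on the learning result, the information control unit 136 registers a first keyword in the user dictionary so that, when speech containing a first keyword that is easily mistaken for a second keyword is input, that first keyword is recognized as the second keyword.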
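The effect of the user dictionary at recognition time can be illustrated with a per-user mapping from "utterance keyword" to "recognition keyword" applied to recognized tokens. The dictionary layout and function names are hypothetical, and a post-recognition substitution like this is only one possible way of applying such a dictionary.

```python
def register_entry(user_dicts, user_id, utterance_kw, recognition_kw):
    """Register utterance_kw so that it is recognized as recognition_kw for this user."""
    user_dicts.setdefault(user_id, {})[utterance_kw] = recognition_kw


def apply_user_dictionary(user_dicts, user_id, tokens):
    """Replace each recognized token that has an entry in the user's dictionary."""
    entries = user_dicts.get(user_id, {})
    return [entries.get(token, token) for token in tokens]


user_dicts = {}
register_entry(user_dicts, "U1", "イバラギ", "イバラキ")
print(apply_user_dictionary(user_dicts, "U1", ["イバラギ", "市", "の", "○○"]))
# ['イバラキ', '市', 'の', '○○']
print(apply_user_dictionary(user_dicts, "U2", ["イバラギ", "市"]))
# ['イバラギ', '市'] (no entry for user U2)
```

Keying the dictionary by user ID keeps one user's habitual slips from rewriting another user's speech.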

3. Processing Procedure
Next, the procedure of information processing according to the first embodiment will be described with reference to FIG. 7. FIG. 7 is a flowchart showing this procedure. In the example of FIG. 7, it is assumed that the terminal device 10 transmits utterance information indicating each received utterance to the information processing device 100 every time it receives speech input, and that the information processing device 100 accumulates the speech information transmitted from the terminal device 10 in the utterance information database 121 as needed. FIG. 7 describes the procedure using user U1 of vehicle VE1 as an example.

In this state, the acquisition unit 131 determines whether it is time to perform intention analysis (step S701). For example, the acquisition unit 131 may make this determination based on whether enough speech information to perform intention analysis has been accumulated in the utterance information database 121.

While the acquisition unit 131 determines that it is not yet time to perform intention analysis (step S701; No), it waits until it determines that it is.

On the other hand, when the acquisition unit 131 determines that it is time to perform intention analysis (step S701; Yes), it acquires first speech information indicating the first utterance and second speech information indicating the second utterance based on the temporal order of utterances by user U1 (step S702). For example, based on the order of utterance timing, the acquisition unit 131 identifies the first utterance, which is the utterance input earlier by user U1, and the second utterance, which is the utterance input by user U1 after the first utterance. The acquisition unit 131 then acquires the first speech information and the second speech information from the speech information corresponding to user U1 stored in the utterance information database 121.
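Identifying first and second utterances from the order of utterance timing amounts to sorting a user's records by time and pairing each utterance with the one that follows it. The record fields (`user_id`, `time`, `text`) and the function name are assumptions loosely modeled on the utterance information database 121.

```python
from datetime import datetime


def consecutive_pairs(records, user_id):
    """Return (first, second) pairs of consecutive utterances by one user, in time order."""
    mine = sorted((r for r in records if r["user_id"] == user_id), key=lambda r: r["time"])
    return list(zip(mine, mine[1:]))


records = [
    {"user_id": "U1", "time": datetime(2021, 10, 29, 9, 0, 30), "text": "イバラキ市の○○までルート案内"},
    {"user_id": "U2", "time": datetime(2021, 10, 29, 9, 0, 10), "text": "音楽を再生"},
    {"user_id": "U1", "time": datetime(2021, 10, 29, 9, 0, 0), "text": "イバラギ市の○○までルート案内"},
]
for first, second in consecutive_pairs(records, "U1"):
    print(first["text"], "->", second["text"])
# イバラギ市の○○までルート案内 -> イバラキ市の○○までルート案内
```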

Next, the correction speech determination unit 132 determines whether any pair of first speech information and second speech information remains for which intention analysis has not been completed (step S703). The pair referred to here may be a pair corresponding to a first utterance and a second utterance whose utterance timings are consecutive.

When the correction speech determination unit 132 determines that intention analysis has been completed for all pairs of first and second speech information (step S703; No), it ends the information processing according to the first embodiment at this point.

On the other hand, when the correction speech determination unit 132 determines that intention analysis has not been completed for all pairs (step S703; Yes), it acquires an unprocessed pair of first and second speech information for which intention analysis has not been completed (step S704).

Next, based on the first and second speech information acquired in step S704, the correction speech determination unit 132 estimates the intention of user U1, that is, whether the second utterance was input to correct the first utterance content indicated by the first speech information (step S705). Specifically, based on the first utterance content indicated by the first utterance and the second utterance content indicated by the second utterance, the correction speech determination unit 132 estimates whether user U1 input the second utterance in order to correct the first utterance content with the second utterance content.

For example, the correction speech determination unit 132 performs morphological analysis on the text representing the first utterance content (first speech information) and extracts each word constituting the text as a first keyword. Likewise, it performs morphological analysis on the text representing the second utterance content (second speech information) and extracts each word constituting that text as a second keyword. Based on the similarity between the extracted first and second keywords, the correction speech determination unit 132 then estimates whether user U1 input the second utterance in order to correct the first utterance content with the second utterance content.
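A minimal sketch of the keyword-based intention estimate follows. Real Japanese text would need a morphological analyzer (for example MeCab); to stay self-contained, the sketch assumes the text is already split into words with spaces. The decision rule (most keywords shared, and at least one keyword pair similar but not identical) and both thresholds are illustrative assumptions, not the criterion used by the correction speech determination unit 132.

```python
from difflib import SequenceMatcher


def tokenize(text):
    """Stand-in for morphological analysis: assumes space-separated words."""
    return text.split()


def is_correction(first_text, second_text, word_threshold=0.7, overlap_threshold=0.5):
    """Guess whether second_text restates first_text while fixing a near-miss word."""
    kw1, kw2 = tokenize(first_text), tokenize(second_text)
    union = set(kw1) | set(kw2)
    if not union:
        return False
    # Share of keywords common to both utterances (Jaccard overlap).
    overlap = len(set(kw1) & set(kw2)) / len(union)
    # At least one keyword pair that is similar but not identical.
    near_miss = any(
        a != b and SequenceMatcher(None, a, b).ratio() >= word_threshold
        for a in kw1
        for b in kw2
    )
    return overlap >= overlap_threshold and near_miss


print(is_correction("イバラギ 市 の ○○ まで ルート 案内",
                    "イバラキ 市 の ○○ まで ルート 案内"))  # True
print(is_correction("イバラギ 市 の ○○ まで ルート 案内", "音楽 を 再生"))  # False
```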

Next, based on the result of estimating the intention of user U1 through intention analysis, the correction speech determination unit 132 determines whether the second utterance is correction speech input to correct the first utterance content (step S706).

When the correction speech determination unit 132 determines that the second utterance is not correction speech input to correct the first utterance content (step S706; No), the process returns to step S703 to handle another unprocessed pair of first and second speech information.

On the other hand, when the second utterance is determined to be correction speech input to correct the first utterance content (step S706; Yes), the linking unit 134 performs linking processing that links the first keyword with the second keyword (step S707). For example, the linking unit 134 extracts, from among the combinations of second keywords contained in the second utterance content (second utterance information) and first keywords contained in the first utterance content (first utterance information), each combination of a second keyword and a first keyword that are determined to be similar to each other. The linking unit 134 then links the second keyword and the first keyword of each extracted combination, treating the second keyword as correct answer information and the first keyword as error information for that correct answer information.

The linking unit 134 may also register the linking result of the second and first keywords in the linking information database 122 in association with a pair consisting of the user ID identifying user U1 and the linking ID. As a result, the linking information database 122 shown in Figure 5 is obtained.

Next, using the pairs of correct answer information and error information obtained by the linking processing as learning data, the learning unit 135 learns patterns of keywords, among the first keywords indicated by the error information, that are easily mistaken for the second keyword indicated by the correct answer information (step S708).

Next, the information control unit 136 registers keywords in the dictionary based on the learning result (step S709). For example, so that a first keyword that is easily mistaken for a second keyword is recognized as that second keyword when speech containing it is input, the information control unit 136 registers the first keyword as the "utterance keyword" and the second keyword as the "recognition keyword", associated with each other, in the user dictionary. As a result, the user dictionary database 123 shown in Figure 6 is obtained.

The information control unit 136 then returns the process to step S703. When it is determined that intention analysis has been completed for all pairs of first and second speech information, the information processing according to the first embodiment ends at that point.
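The loop of steps S701 to S709 can be condensed into the sketch below for a single user. It is not the patent's implementation: the readiness check (a minimum record count), the similarity rule, and the frequency-based "learning" are hypothetical stand-ins, and the correction judgment (S705 to S706) is folded into the keyword comparison for brevity.

```python
from collections import Counter
from difflib import SequenceMatcher


def similar(a, b, threshold=0.7):
    """Similar but not identical (hypothetical threshold)."""
    return a != b and SequenceMatcher(None, a, b).ratio() >= threshold


def process_user(utterances, min_records=2):
    """Run a simplified S701-S709 loop and return the resulting user dictionary."""
    user_dict = {}
    # S701: proceed only when enough utterances have accumulated.
    if len(utterances) < min_records:
        return user_dict
    # S702: order by utterance time; consecutive utterances form (first, second) pairs.
    ordered = sorted(utterances, key=lambda u: u["time"])
    pending = list(zip(ordered, ordered[1:]))
    link_counts = Counter()
    while pending:  # S703: unprocessed pairs remain
        first, second = pending.pop(0)  # S704
        # S705-S706: treat the pair as a correction if any keywords nearly match.
        links = [(kw2, kw1) for kw2 in second["keywords"]
                 for kw1 in first["keywords"] if similar(kw1, kw2)]
        if not links:
            continue  # S706: No, back to S703
        link_counts.update(links)  # S707: (correct, error) links
        # S708-S709: keep the most frequent correct keyword per error keyword.
        user_dict = {}
        for (correct, error), _ in link_counts.most_common():
            user_dict.setdefault(error, correct)
    return user_dict


log = [
    {"time": 1, "keywords": ["イバラギ", "市", "案内"]},
    {"time": 2, "keywords": ["イバラキ", "市", "案内"]},
    {"time": 3, "keywords": ["音楽", "再生"]},
]
print(process_user(log))  # {'イバラギ': 'イバラキ'}
```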

Second Embodiment
1. Overview of the Second Embodiment
Next, the second embodiment will be described with reference to FIG. 8. In the first embodiment, linking was performed based on correction speech detected from a user who tries to correct a misspoken utterance by speaking again. The second embodiment differs in that linking is performed based on a correction operation detected from a user who tries to correct a misspoken utterance through an input means other than speech.

In the following, information processing according to the second embodiment will be described with reference to Figure 8, an explanatory diagram of that processing. In the second embodiment, the input means other than speech is a manual input operation in which information is entered by touching a predetermined object (for example, the display panel (display screen) of the terminal device 10).

Figure 8 shows a scene in which user U1 inputs speech by speaking to the terminal device 10 mounted in vehicle VE1 (an example of vehicle VEx) and inputs information by touching the terminal device 10.

For example, every time the terminal device 10 receives speech input, it transmits speech information indicating the received utterance to the information processing device 200. The information processing device 200 thereby acquires the speech information from the terminal device 10 (step S21).

Figure 8 shows a scene in which user U1 inputs speech instructing route guidance to "destination XX located in Ibaraki City, Osaka Prefecture." Specifically, Figure 8 shows an example in which user U1 inputs an utterance VO11 with content C11, "Please give me route guidance to XX in 'Ibaragi' City!" In this example, in response to the input of utterance VO11, the terminal device 10 transmits speech information indicating utterance content C11 to the information processing device 200, which thereby acquires that speech information.

In addition, every time the terminal device 10 receives input through a manual input operation, it may also transmit operation information indicating the content of the input operation to the information processing device 200. The information processing device 200 thereby acquires operation information on the manual input operation from the terminal device 10 (step S22).

Suppose that user U1 then realizes that he or she mistakenly said "Ibaragi" instead of the correct "Ibaraki." In the example of Figure 2 (first embodiment), user U1 re-entered an utterance VO12 with content C12, "Please give me route guidance to XX in 'Ibaraki' City!"

In the example of Figure 8, however, user U1 is unsure about speaking again and considers manual input more reliable. While a navigation screen for route guidance is displayed on the terminal device 10, user U1 performs manual input operation IO12 with operation content C12, typing in the correct destination "Ibaraki." In this example, the terminal device 10 transmits operation information indicating operation content C12 to the information processing device 200 in response to manual input operation IO12, and the information processing device 200 thereby acquires that operation information.

In the example of Figure 8, the operation information indicating operation content C12 may include a keyword indicating the destination "Ibaraki." For this reason, manual input operation IO12 can be regarded as a destination setting operation.

Next, based on the temporal order of the timing of utterances by user U1 and the timing of manual input operations by user U1, the information processing device 200 acquires first speech information indicating the first utterance and second operation information indicating a second input operation, which is the manual input operation (destination setting operation) performed after the first utterance (step S23). For example, based on this temporal order, the information processing device 200 may identify the first utterance, which is the utterance input earlier, and the second input operation, which is the manual input operation performed after the first utterance was input. The information processing device 200 may then acquire the first speech information from the speech information collected so far via the terminal device 10, and the second operation information from the operation information collected so far via the terminal device 10.
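Pairing a first utterance with the manual input operation that follows it can be sketched by comparing utterance and operation timestamps. The rule that the operation must occur before the user's next utterance, the numeric timestamps in seconds, and the field names are assumptions made for illustration.

```python
def pair_speech_with_operation(speech_events, operation_events):
    """Pair each utterance with the first manual operation that follows it.

    Hypothetical rule: the operation must come before the user's next utterance.
    Times are numbers (seconds) so float("inf") can bound the last utterance.
    """
    speeches = sorted(speech_events, key=lambda e: e["time"])
    operations = sorted(operation_events, key=lambda e: e["time"])
    pairs = []
    for i, speech in enumerate(speeches):
        next_time = speeches[i + 1]["time"] if i + 1 < len(speeches) else float("inf")
        following = [op for op in operations if speech["time"] < op["time"] < next_time]
        if following:
            pairs.append((speech, following[0]))
    return pairs


speech = [{"time": 10, "text": "イバラギ市の○○までルート案内"},
          {"time": 100, "text": "音楽を再生"}]
ops = [{"time": 25, "input": "イバラキ"}]  # destination typed on the navigation screen
for s, op in pair_speech_with_operation(speech, ops):
    print(s["text"], "->", op["input"])
# イバラギ市の○○までルート案内 -> イバラキ
```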

In the example of Figure 8, the information processing device 200 recognizes utterance VO11 as the first utterance and thereby acquires the speech information indicating utterance content C11 as the first speech information. It also recognizes manual input operation IO12 as the second input operation and thereby acquires the operation information indicating operation content C12 as the second operation information. Hereinafter, utterance VO11 may be referred to as the "first utterance VO11" and manual input operation IO12 as the "second input operation IO12."

Next, the information processing device 200 performs intention analysis to estimate an intention to correct a misspoken utterance, based on the similarity between the first keywords, which make up the first speech information (first utterance content), and the second keywords, which make up the second operation information (second operation content) (step S24). Specifically, based on this similarity, the information processing device 200 estimates whether user U1 intends to correct the first utterance content with the second operation content. The specific method of intention analysis performed in step S24 will be described later.

In step S24, the information processing device 200 may extract each word constituting the text of the first speech information as a first keyword by performing morphological analysis on that text. Similarly, it may extract each word constituting the text of the second operation information as a second keyword by performing morphological analysis on that text.

Next, based on the result of the intention analysis, the information processing device 200 determines whether the second input operation IO12 is a correction operation entered manually to correct the first utterance content corresponding to the first utterance VO11 (step S25). For example, when similarity is recognized between "Ibaragi," one of the first keywords, and "Ibaraki," one of the second keywords, the information processing device 200 can infer that user U1 intends to correct the first utterance content (first keyword KW11) with the second operation content (second keyword KW12). As a result, the information processing device 200 can determine that the second input operation IO12 is a correction operation entered manually to correct the first utterance content.

When it determines that the second input operation IO12 is a correction operation entered manually to correct the first utterance content, the information processing device 200 links the second keyword "Ibaraki" with the first keyword "Ibaragi," treating "Ibaraki" as correct answer information and "Ibaragi" as error information for that correct answer information (step S26). In addition to the current example, in which the second keyword "Ibaraki" is the correct answer information and the first keyword "Ibaragi" is the error information, Figure 8 shows linking results from past occasions on which user U1 misspoke "Ibaraki" as "Ibaragi" and as "Ibaraku." Such linking results may be managed in the linking information database 122 (Figure 5) using linking IDs.

The information processing device 200 also uses the pairs of correct answer information and error information as learning data to learn which of the first keywords indicated by the error information are easily mistaken for the second keyword indicated by the correct answer information (step S27). In the example of Figure 8, the information processing device 200 learns which of the first keywords "Ibaragi" and "Ibaraku" is easily mistaken for the second keyword "Ibaraki."

The information processing device 200 then registers keywords in the dictionary based on the learning result (step S28). For example, based on the learning result, the information processing device 200 registers a first keyword that is easily mistaken for a second keyword in the user dictionary (Figure 6), so that when speech containing that first keyword is input, it is recognized as the second keyword. Figure 8 shows an example in which the information processing device 200 registers the first keyword "Ibaragi" in the user dictionary so that, when speech containing "Ibaragi" is input, it is recognized as the second keyword "Ibaraki."

As described above with reference to Figure 8, in the second embodiment, when a second input operation is performed after a first utterance is input, the information processing device 200 determines, based on the first utterance content indicated by the first utterance and the second operation content indicated by the second input operation, whether the second input operation is a correction operation entered manually to correct the first utterance content. When it determines that it is, the information processing device 200 links the first utterance content with the second operation content and registers the first utterance content in the user dictionary based on the linking result.

With the information processing according to the second embodiment, the information processing device 200 can perform control so that the correct action is executed even when the user misspeaks.

As another example, when a guidance interrupt button is operated after the first keyword "stop guidance" is spoken, the information processing device 200 can infer that the operation reflects an intention to correct "stop guidance," one of the first keywords, to "interrupt guidance," one of the second keywords, which corresponds to the guidance interrupt button. In this way, the information processing device 200 can infer an intention to correct not only from direct input of a keyword but also from an operation on a button or similar control whose function corresponds to a keyword.
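Inferring a correction from a button rather than typed text needs a table linking each control to the keyword of its function. The button identifier, the table, the similarity measure, and the threshold below are hypothetical; the keywords are the Japanese originals 案内中止 ("stop guidance") and 案内中断 ("interrupt guidance").

```python
from difflib import SequenceMatcher

# Hypothetical mapping from on-screen controls to the keyword of their function.
BUTTON_KEYWORDS = {
    "guidance_interrupt_button": "案内中断",  # "interrupt guidance"
}


def correction_from_button(first_keywords, button_id, threshold=0.7):
    """Return (correct, error) if the button's keyword nearly matches a spoken keyword."""
    correct = BUTTON_KEYWORDS.get(button_id)
    if correct is None:
        return None
    for keyword in first_keywords:
        if keyword != correct and SequenceMatcher(None, keyword, correct).ratio() >= threshold:
            return (correct, keyword)
    return None


print(correction_from_button(["案内中止"], "guidance_interrupt_button"))
# ('案内中断', '案内中止')
```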

2. Configuration of the Information Processing Device
Next, the information processing device 200 according to the second embodiment will be described with reference to FIG. 9. Descriptions of processing units of the information processing device 200 that share reference numerals with those of the information processing device 100 may be omitted. FIG. 9 is a diagram showing a configuration example of the information processing device 200 according to the second embodiment. As shown in FIG. 9, the information processing device 200 has a communication unit 110, a storage unit 220, and a control unit 230.

(Regarding the storage unit 220)
The storage unit 220 is realized by, for example, a semiconductor memory element such as a RAM or flash memory, or a storage device such as a hard disk or optical disk. The storage unit 220 may further include an operation information database 224.

(Regarding the operation information database 224)
The operation information database 224 stores information about manual input operations performed by users. FIG. 10 shows an example of the operation information database 224 according to the second embodiment. In the example of FIG. 10, the operation information database 224 has items such as "user ID," "operation date and time," and "operation information."

"User ID" is identification information that identifies a user who manually inputs information (e.g., information indicating a destination) into the terminal device 10. For example, the information processing device 200 may recognize the user who performed the manual input operation based on an image captured by a sensor (e.g., a camera) of the terminal device 10, and issue a "user ID" to the recognized user.

"Operation date and time" indicates the date and time at which information was entered by a manual input operation. FIG. 10 shows an example in which user ID "U1" is associated with "operation date and time #11," meaning that user U1 entered a destination or other information into the terminal device 10 by a manual input operation at operation date and time #11. For example, by treating the "utterance date and time" in the utterance information database 121 as the utterance timing and the "operation date and time" in FIG. 10 as the operation timing, the information processing device 200 can identify the first utterance, which was input earlier, and the second input operation, which is a manual input operation performed after the first utterance was input.

"Operation information" indicates the content of the manual input operation performed by the user indicated by the "user ID," that is, what information (e.g., which destination) was entered. In other words, the "operation information" may include a keyword indicating the destination entered through a destination setting operation performed by touching the display panel of the terminal device 10. FIG. 10 shows an example in which user ID "U1," "operation date and time #11," and "operation information #11" are associated with one another, meaning that user U1 entered the content of operation information #11 through a manual input operation performed at operation date and time #11.
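
A minimal in-memory model of the three items in FIG. 10 might look like this. The class and field names are hypothetical. A real system would presumably use a persistent store rather than a Python list.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class OperationRecord:
    """One row of a hypothetical operation information database 224 (FIG. 10)."""
    user_id: str            # "user ID", e.g. "U1"
    operated_at: datetime   # "operation date and time"
    operation_info: str     # "operation information", e.g. a destination keyword


class OperationInfoDB:
    """In-memory stand-in for the operation information database 224."""

    def __init__(self) -> None:
        self._records: list[OperationRecord] = []

    def add(self, record: OperationRecord) -> None:
        self._records.append(record)

    def for_user(self, user_id: str) -> list[OperationRecord]:
        # Return one user's records in chronological order of operation.
        return sorted(
            (r for r in self._records if r.user_id == user_id),
            key=lambda r: r.operated_at,
        )
```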

(Regarding the control unit 230)
Returning to FIG. 9, the control unit 230 is implemented by a CPU, an MPU, or the like executing various programs (e.g., the information processing program according to the embodiment) stored in a storage device inside the information processing device 200, using the RAM as a work area. The control unit 230 may also be implemented by an integrated circuit such as an ASIC or FPGA.

As shown in FIG. 9, the control unit 230 may further include a correction operation determination unit 237. This unit is in addition to the acquisition unit 131, the correction voice determination unit 132, the detection unit 133, the linking unit 134, the learning unit 135, and the information control unit 136. The correction operation determination unit 237 implements or executes the information processing functions and operations described below. The internal configuration of the control unit 230 is not limited to that shown in FIG. 9 and may be any other configuration that performs the information processing described below. Likewise, the connections between the processing units of the control unit 230 are not limited to those shown in FIG. 9.

(Regarding the acquisition unit 131)
The acquisition unit 131 acquires various kinds of information used in the information processing according to the second embodiment. The acquisition unit 131 may also output the acquired information to the appropriate processing unit that uses it.

The acquisition unit 131 may also acquire first speech information indicating the first utterance and second operation information indicating the second input operation. For example, the acquisition unit 131 may identify the first utterance, which was input earlier, and the second input operation, which is a manual input operation performed after the first utterance was input. It may do so based on the temporal order of the utterance timing and the manual input operation timing. On that basis, the acquisition unit 131 may acquire the first speech information indicating the first utterance from the speech information collected so far via the terminal device 10. Likewise, it may acquire the second operation information indicating the second input operation from the operation information collected so far via the terminal device 10.
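
The timing-based pairing described above can be sketched as follows, assuming each utterance and operation carries a timestamp. Treating "performed after" as "the very next event in time" is an assumption made for illustration. The function name is hypothetical.

```python
from datetime import datetime


def pair_utterances_with_operations(utterances, operations):
    """Pair each utterance with a manual operation that immediately follows it.

    utterances, operations: lists of (timestamp, content) tuples.
    Returns (first utterance content, second operation content) pairs.
    """
    events = [(t, "speech", c) for t, c in utterances]
    events += [(t, "operation", c) for t, c in operations]
    events.sort(key=lambda e: e[0])  # temporal order of utterance/operation timing
    pairs = []
    for prev, nxt in zip(events, events[1:]):
        # Only an utterance directly followed by a manual operation forms a pair.
        if prev[1] == "speech" and nxt[1] == "operation":
            pairs.append((prev[2], nxt[2]))
    return pairs
```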

(Regarding the correction operation determination unit 237)
Suppose that, after a first utterance has been input, a second input operation is performed to input information by touching a predetermined object. In that case, the correction operation determination unit 237 determines whether the second input operation is a correction operation for correcting the first utterance content. The determination is based on the utterance content indicated by the first utterance (first utterance content) and the operation content indicated by the second input operation (second operation content).

For example, the correction operation determination unit 237 estimates the user's intention, namely whether the first utterance content is erroneous due to a slip of the tongue and whether the user manually input the second operation content in order to correct that error. In other words, based on the first utterance content indicated by the first utterance and the second operation content indicated by the second input operation, the correction operation determination unit 237 estimates whether the user intends to correct the first utterance content with the second operation content. The correction operation determination unit 237 then determines, according to the estimation result, whether the second input operation is a correction operation for correcting the first utterance content. For example, if the estimation result indicates that the user intends to correct the first utterance content with the second input operation, the correction operation determination unit 237 can determine that the second input operation is a correction operation for correcting the first utterance content.

A specific example of this intention analysis is described below. The analysis estimates whether the user intends to correct the first utterance content with the second input operation and, according to the estimation result, determines whether the second input operation is a correction operation for correcting the first utterance content.

For example, the correction operation determination unit 237 may perform morphological analysis on the text representing the first utterance content (first speech information) and extract each word constituting the text as a first keyword. The correction operation determination unit 237 may also extract a word relating to the destination included in the second operation content (second operation information) as a second keyword.

In this case, the correction operation determination unit 237 determines whether the second input operation is a correction operation for correcting the first utterance content, based on the first keywords corresponding to the first utterance content and the second keyword corresponding to the second operation content.

As described above, the second input operation may be a destination setting operation performed following the first utterance. Accordingly, the correction operation determination unit 237 uses the second keyword indicating the destination entered through the second input operation (the destination setting operation) to determine whether the second input operation is a correction operation for correcting the first utterance content.

For example, the correction operation determination unit 237 may determine whether the second input operation is a correction operation for correcting the first utterance content based on the similarity between the first keywords and the second keyword. For example, the correction operation determination unit 237 may detect the similarity for each combination of one first keyword and one second keyword and, based on the detected similarities, determine whether the second input operation is a correction operation for correcting the first utterance content.

When detecting the similarity between a first keyword and a second keyword, the correction operation determination unit 237 may use, for example, the same technique as in the information processing according to the first embodiment. Specifically, the correction operation determination unit 237 may calculate a degree of similarity by detecting similarity in reading (pronunciation), similarity in meaning, similarity in how kanji characters are read, and the like. It may then estimate the user's intention based on the calculated degree of similarity.
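
A sketch of the pairwise check, assuming keywords are compared by their romanized readings. `difflib.SequenceMatcher` is a swapped-in, generic string-similarity measure. It stands in for the reading, meaning, and kanji-reading similarities the text mentions. The 0.6 threshold and the function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import product


def reading_similarity(a: str, b: str) -> float:
    """Similarity of two readings in [0, 1] (stand-in for reading similarity)."""
    return SequenceMatcher(None, a, b).ratio()


def is_correction_operation(first_keywords, second_keywords, threshold=0.6):
    """Return True if any (first, second) keyword pair is similar enough.

    Every combination of one first keyword and one second keyword is scored;
    identical keywords are skipped because they are not a correction.
    """
    scores = [
        reading_similarity(f, s)
        for f, s in product(first_keywords, second_keywords)
        if f != s
    ]
    return max(scores, default=0.0) >= threshold
```

For example, the readings "sakurada" and "sakurai" share the block "sakura", giving a ratio of 2 × 6 / 15 = 0.8.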

The correction operation determination unit 237 may also make this determination using only second input operations performed before a predetermined time has elapsed since the first utterance was input. In that case, it determines whether such a second input operation is a correction operation for correcting the first utterance content based on the similarity with the second keyword entered through it.

Typically, when a user has spoken a destination and then realizes that they misspoke it, the user re-sets the destination by manual input on the terminal device 10 while the vehicle VEx is stopped. A second input operation may be performed between the input of the first utterance and the moment the vehicle VEx starts moving, that is, while the vehicle VEx remains stopped after the first utterance. If the correction operation determination unit 237 detects such an operation, it may determine whether the second input operation is a correction operation for correcting the first utterance content. The determination is based on the similarity with the second keyword entered through that second input operation.
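
The two gating conditions above can be expressed as a simple filter applied before the similarity check. The first is a predetermined time limit, and the second is that the vehicle is still stopped. This is a hypothetical sketch. How the vehicle's start time would be obtained (e.g., from vehicle speed data) is not specified in the text, so it is left as an input.

```python
from datetime import datetime, timedelta
from typing import Optional


def in_correction_window(
    utterance_at: datetime,
    operation_at: datetime,
    window: Optional[timedelta] = None,
    vehicle_started_at: Optional[datetime] = None,
) -> bool:
    """Decide whether a manual operation is eligible for the correction check.

    window: optional "predetermined time" after the first utterance.
    vehicle_started_at: when the vehicle first started moving after the
    utterance, or None if it is still stopped.
    """
    if operation_at < utterance_at:
        return False  # the operation must follow the first utterance
    if window is not None and operation_at - utterance_at > window:
        return False  # too late: outside the predetermined time
    if vehicle_started_at is not None and operation_at >= vehicle_started_at:
        return False  # performed after the vehicle began moving
    return True
```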

(Regarding the linking unit 134)
When the second input operation is determined to be a correction operation for correcting the first utterance content, the linking unit 134 links the first utterance content with the second operation content indicated by the second input operation.

For example, when the second input operation is determined to be a correction operation for correcting the first utterance content, the linking unit 134 examines the combinations of the second keyword indicated by the second operation content and the first keywords included in the first utterance content. From these, it extracts each combination whose second keyword and first keyword have been determined to be similar to each other. The linking unit 134 then links the second keyword and the first keyword, treating the second keyword in the extracted combination as correct answer information and the first keyword in that combination as error information for that correct answer information.
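
The linking step might look like this in Python. It reuses a generic string-similarity measure (`difflib.SequenceMatcher`) as a stand-in for the similarity judgment. The dictionary keys "correct" and "error" and the 0.6 threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher


def link_keywords(first_keywords, second_keywords, threshold=0.6):
    """Link similar (first, second) keyword pairs as error/correct entries."""
    links = []
    for f in first_keywords:
        for s in second_keywords:
            # Identical keywords are not a correction, so skip them.
            if f != s and SequenceMatcher(None, f, s).ratio() >= threshold:
                links.append({"correct": s, "error": f})
    return links
```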

(Regarding the learning unit 135)
The learning unit 135 uses the pairs of correct answer information and error information linked by the linking unit 134 as learning data. From the utterance content indicated by the error information, it learns the patterns that are likely to be mistaken for the operation content indicated by the correct answer information. For example, using those pairs as learning data, the learning unit 135 learns which of the first keywords indicated by the error information are likely to be mistaken for the second keywords indicated by the correct answer information.
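
The text does not specify the learning method. As one very simple stand-in, the sketch below treats an (error, correct) keyword pair as a learned pattern once it has been linked at least a hypothetical minimum number of times. A statistical or machine-learned model could take its place.

```python
from collections import Counter


def learn_confusion_patterns(links, min_count=2):
    """Count (error, correct) pairs and keep those seen at least min_count times.

    links: iterable of {"correct": ..., "error": ...} dicts from the linking step.
    """
    counts = Counter((link["error"], link["correct"]) for link in links)
    return {pair: n for pair, n in counts.items() if n >= min_count}
```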

For example, the information control unit 136 relies on the relationship between the correct answer information and the error information linked by the linking unit 134. Based on it, the information control unit 136 registers the error information in the user dictionary (user dictionary database 123) in association with the correct answer information. As a result, when an utterance with the content indicated by the error information is input, the input utterance content is recognized as the operation content indicated by the associated correct answer information.

For example, based on the learning result of the learning unit 135, the information control unit 136 registers the error information in the user dictionary in association with the correct answer information. Consider an input utterance whose content, among the utterance content indicated by the error information, is likely to be mistaken for the operation content indicated by the correct answer information. After registration, that input utterance content is recognized as the operation content indicated by the associated correct answer information. For example, the information control unit 136 registers keywords in the user dictionary based on the learning result of the learning unit 135. Specifically, the learning result may show that a first keyword is likely to be mistaken for a second keyword. In that case, the information control unit 136 registers the first keyword in the user dictionary, so that when an utterance containing it is input, the first keyword is recognized as the second keyword.
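
A minimal sketch of the registration and its effect at recognition time. It models the user dictionary as a plain Python dict from "utterance keyword" to "recognition keyword", the two fields named in step S809. The function names are hypothetical. A real recognizer would presumably apply such entries inside the speech recognition engine rather than as a post-processing step.

```python
def register_patterns(user_dictionary, patterns):
    """Register each learned (error keyword, correct keyword) pattern.

    The error keyword becomes the key ("utterance keyword") and the correct
    keyword the value ("recognition keyword").
    """
    for error_kw, correct_kw in patterns:
        user_dictionary[error_kw] = correct_kw
    return user_dictionary


def apply_user_dictionary(keywords, user_dictionary):
    """Replace recognized keywords that have a user-dictionary entry."""
    return [user_dictionary.get(k, k) for k in keywords]
```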

[3. Processing procedure]
Next, the procedure of the information processing according to the second embodiment is described with reference to FIG. 11. FIG. 11 is a flowchart showing that procedure. In the example of FIG. 11, each time the terminal device 10 receives an utterance, it is assumed to transmit speech information indicating that utterance to the information processing device 200. The information processing device 200 is likewise assumed to accumulate the speech information transmitted from the terminal device 10 in the utterance information database 121 as needed.

Likewise, in the example of FIG. 11, each time the terminal device 10 receives a manual input operation, it is assumed to transmit operation information indicating the content of that operation to the information processing device 200. The information processing device 200 is assumed to accumulate the operation information transmitted from the terminal device 10 in the operation information database 224 as needed.

The procedure of FIG. 11 is described using user U1 of vehicle VE1 as an example.

In this state, the acquisition unit 131 determines whether it is time to perform intention analysis (step S801). For example, the acquisition unit 131 may make this determination based on two conditions. The first is whether enough speech information for intention analysis has been accumulated in the utterance information database 121. The second is whether enough operation information for intention analysis has been accumulated in the operation information database 224.

While the acquisition unit 131 determines that it is not yet time to perform intention analysis (step S801; No), it waits until it determines that it is.

When the acquisition unit 131 determines that it is time to perform intention analysis (step S801; Yes), it acquires first speech information indicating a first utterance and second operation information indicating a second input operation. This acquisition is based on the temporal order of the utterance timing and the operation timing (step S802). For example, based on that temporal order, the acquisition unit 131 may identify the first utterance, which user U1 input earlier, and the second input operation, which is a manual input operation user U1 performed after inputting the first utterance.

The acquisition unit 131 then acquires the first speech information indicating the first utterance from among the speech information accumulated in the utterance information database 121 that corresponds to user U1. The acquisition unit 131 also acquires the second operation information indicating the second input operation from among the operation information accumulated in the operation information database 224 that corresponds to user U1.

Next, the correction operation determination unit 237 determines whether any pair of first speech information and second operation information remains for which intention analysis has not been completed (step S803). Here, a pair of first speech information and second operation information may be the pair corresponding to a first utterance and a second input operation whose utterance timing and operation timing are consecutive.

If the correction operation determination unit 237 determines that intention analysis has been completed for all pairs of first speech information and second operation information (step S803; No), it ends the information processing according to the second embodiment at this point.

If the correction operation determination unit 237 determines that a pair remains for which intention analysis has not been completed (step S803; Yes), it acquires an unprocessed pair of first speech information and second operation information (step S804).

Next, the correction operation determination unit 237 uses the first speech information and second operation information acquired in step S804 to estimate the intention of user U1 (step S805). That is, it estimates whether the second input operation was performed to correct the first utterance content indicated by the first speech information. Specifically, the estimation is based on the first utterance content indicated by the first utterance and the second operation content indicated by the second input operation. The correction operation determination unit 237 estimates whether user U1 performed the second input operation, indicating the second operation content, in order to correct the first utterance content with that second operation content.

For example, the correction operation determination unit 237 performs morphological analysis on the text representing the first utterance content (first speech information) and extracts each word constituting the text as a first keyword. The correction operation determination unit 237 may also extract a word relating to the destination included in the second operation content (second operation information) as a second keyword. Then, based on the similarity between the extracted first keywords and the second keyword, the correction operation determination unit 237 estimates the intention of user U1. That is, it estimates whether user U1 performed the second input operation to correct the first utterance content with the second operation content.

Next, the correction operation determination unit 237 determines, based on the result of estimating the intention of user U1 through the intention analysis, whether the second input operation is a correction operation for correcting the first utterance content (step S806).

If the correction operation determination unit 237 determines that the second input operation is not a correction operation for correcting the first utterance content (step S806; No), the process returns to step S803 to handle the remaining unprocessed pairs of first speech information and second operation information.

If, on the other hand, the second input operation is determined to be a correction operation for correcting the first utterance content (step S806; Yes), the linking unit 134 performs a linking process that links the first keyword with the second keyword (step S807). For example, the linking unit 134 examines the combinations of the second keyword included in the second operation content (second operation information) and the first keywords included in the first utterance content (first speech information). From these, it extracts each combination whose second keyword and first keyword have been determined to be similar to each other. The linking unit 134 then links the second keyword and the first keyword, treating the second keyword in the extracted combination as correct answer information and the first keyword in that combination as error information for that correct answer information.

Next, the learning unit 135 uses the pairs of correct answer information and error information obtained by the linking process as learning data (step S808). It learns which of the first keywords indicated by the error information are likely to be confused with the second keywords indicated by the correct answer information.

Next, the information control unit 136 registers keywords in the user dictionary based on the learning result (step S809). For example, the learning result may show that a first keyword is easily confused with a second keyword. The information control unit 136 then registers the first keyword as an "utterance keyword" and the second keyword as a "recognition keyword," associated with each other, in the user dictionary. As a result, when an utterance containing that first keyword is input, the first keyword is recognized as the second keyword. The result is a user dictionary database 123 such as the one shown in FIG. 6.

The information control unit 136 then returns the process to step S803. When it is determined that intention analysis has been completed for all pairs of first speech information and second operation information, the information processing according to the second embodiment ends.
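
Steps S803 to S809 can be condensed into a single loop, sketched below. It assumes a generic string-similarity measure and a hypothetical threshold. The step comments map each line to FIG. 11, and a simple link count stands in for the unspecified learning method. All names and constants are illustrative.

```python
from collections import Counter
from difflib import SequenceMatcher

THRESHOLD = 0.6  # hypothetical similarity threshold (S805/S806)
MIN_COUNT = 1    # hypothetical number of links needed to register (S808)


def run_intention_analysis(pairs, user_dictionary):
    """pairs: (first_keywords, second_keywords) for each unprocessed pair."""
    link_counts = Counter()
    for first_keywords, second_keywords in pairs:   # S803/S804: next pair
        matched = [
            (f, s)
            for f in first_keywords
            for s in second_keywords
            if f != s and SequenceMatcher(None, f, s).ratio() >= THRESHOLD
        ]                                           # S805: estimate intent
        if not matched:                             # S806; No
            continue
        link_counts.update(matched)                 # S807: (error, correct) links
        for (error_kw, correct_kw), n in link_counts.items():
            if n >= MIN_COUNT:                      # S808: learned pattern
                user_dictionary[error_kw] = correct_kw  # S809: register
    return user_dictionary                          # S803; No: end
```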

(Other embodiments)
The information processing device 100 (information processing device 200) may be implemented in various forms other than the embodiments described above. Other embodiments of the information processing device 100 (information processing device 200) are therefore described below.

[1. Detecting words that suggest a slip of the tongue]
In the first embodiment above, the acquisition unit 131 acquires the first speech information and the second speech information based on the temporal order of utterance timings. It does so by identifying a first utterance input earlier and a second utterance input after the first utterance.

However, if the acquisition unit 131 detects a word suggesting a slip of the tongue, it may acquire the first speech information and the second speech information by identifying the first utterance and the second utterance based on the timing at which that word was uttered.

For example, a user who realizes they have misspoken may reflexively say something like "I made a mistake!" or "Oh no!", and users tend to input an utterance correcting the mistake immediately afterward.

Accordingly, when the acquisition unit 131 detects a word suggesting a slip of the tongue, such as "I made a mistake!" or "Oh no!", it may identify the utterance input immediately before that word as the first utterance and the utterance input immediately after it as the second utterance.

In addition, when a word suggesting a slip of the tongue is detected, the correction voice determination unit 132 may determine that the utterance input immediately after that word (i.e., the second utterance) is a correction voice input to correct the first utterance content.
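
A sketch of slip-word detection on a sequence of transcribed utterances, assuming the transcripts are already available as text. Simple substring matching against a hypothetical list of trigger phrases stands in for whatever keyword spotting a real system would use. The list holds the original Japanese phrases 「間違えた」 and 「しまった」 plus English glosses.

```python
# Hypothetical trigger phrases: the Japanese originals and English glosses.
SLIP_WORDS = ("間違えた", "しまった", "i made a mistake", "oh no")


def split_on_slip_word(utterances):
    """Return (first utterance, second utterance) around the first slip word.

    utterances: transcripts in chronological order. Returns None if no slip
    word is found or it has no utterance on one side.
    """
    for i, text in enumerate(utterances):
        if any(word in text.lower() for word in SLIP_WORDS):
            if 0 < i < len(utterances) - 1:
                return utterances[i - 1], utterances[i + 1]
            return None
    return None
```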

The acquisition unit 131 may also detect words suggesting a slip of the tongue in the second embodiment. In that case, when the acquisition unit 131 detects such a word, it may acquire the first speech information and the second operation information by identifying the first utterance and the second input operation based on the timing at which the word was uttered.

For example, when the acquisition unit 131 detects a word suggesting a slip of the tongue, such as "I made a mistake!" or "Oh no!", it may recognize the speech input immediately before that word as the first utterance and the manual input operation performed immediately after it as the second input operation.

Likewise, when a word suggesting a slip of the tongue is detected, the correction operation determination unit 237 may determine that the manual input operation performed immediately after that word (i.e., the second input operation) is a correction operation for correcting the content of the first utterance.
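For the second embodiment, a comparable sketch can merge speech and touch events into one time-ordered stream. Beyond the marker rule above, it borrows the keyword-similarity and time-window conditions recited in claims 2 and 4. `difflib.SequenceMatcher` is swapped in as one possible similarity measure. The event model, the threshold, the window length, and the name `find_correction` are all hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

SLIP_MARKERS = {"間違えた", "しまった"}  # example marker words from the text
SIMILARITY_THRESHOLD = 0.5  # hypothetical tuning value
TIME_WINDOW_S = 30.0        # hypothetical "predetermined time" in seconds


@dataclass
class Event:
    kind: str    # "speech" or "touch" (assumed event model)
    text: str    # recognized keyword, or keyword typed on the touch panel
    time: float  # seconds


def find_correction(events):
    """Return (spoken_keyword, typed_keyword) if a touch input looks like a correction."""
    ordered = sorted(events, key=lambda e: e.time)
    for i in range(1, len(ordered) - 1):
        prev, marker, nxt = ordered[i - 1], ordered[i], ordered[i + 1]
        # Require: speech, then a marker word, then a manual (touch) input.
        if marker.kind != "speech" or marker.text not in SLIP_MARKERS:
            continue
        if prev.kind != "speech" or nxt.kind != "touch":
            continue
        # The touch input must follow the first utterance within the time window.
        if nxt.time - prev.time > TIME_WINDOW_S:
            continue
        # Character-level similarity between spoken and typed keywords.
        similarity = SequenceMatcher(None, prev.text, nxt.text).ratio()
        if similarity >= SIMILARITY_THRESHOLD:
            return prev.text, nxt.text
    return None
```

With "横浜駅" spoken, then "しまった", then "新横浜駅" typed 10 s later, the ratio is about 0.86, so the typed input is treated as a correction. An unrelated keyword, or one typed after the window has expired, is rejected.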

[2. Measures to improve registration accuracy]
The second embodiment above showed an example in which the information control unit 136, based on the learning results of the learning unit 135, associates a second keyword included in the second operation content with a first keyword that is included in the second utterance and is likely to be mistaken for the second keyword. The information control unit 136 then registers the two keywords in the user dictionary.
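The registration step can be pictured as a dictionary that maps misspoken keywords to corrected keywords. In the sketch below, simple pair counting stands in for the learning unit 135, because the text does not specify the learning method. `UserDictionary` and `min_count` are hypothetical names.

```python
from collections import Counter


class UserDictionary:
    """Frequency-based stand-in for learning unit 135 plus information control unit 136 (hypothetical)."""

    def __init__(self, min_count: int = 2):
        self.min_count = min_count    # hypothetical: pairs seen this often get registered
        self.pair_counts = Counter()  # (error keyword, correct keyword) -> occurrences
        self.entries = {}             # registered error keyword -> correct keyword

    def observe(self, error_kw: str, correct_kw: str) -> None:
        # Record one linked (error, correct) pair as learning data.
        self.pair_counts[(error_kw, correct_kw)] += 1
        if self.pair_counts[(error_kw, correct_kw)] >= self.min_count:
            self.entries[error_kw] = correct_kw

    def resolve(self, recognized_kw: str) -> str:
        # Map a recognized keyword to its registered correct keyword, if any.
        return self.entries.get(recognized_kw, recognized_kw)
```

Counting is only a placeholder. A lookup table cannot generalize to similar but unseen slips, which a trained model might do. If one error keyword is linked to several correct keywords, the most recently qualified pair wins.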

However, the information control unit 136 may decide whether to associate the second keyword with the first keyword and register them in the user dictionary based on whether the user actually arrived at the destination indicated by the second keyword. For example, the information control unit 136 may use the location information of the user (the user's vehicle VEx) and the destination set by the user (the second keyword) to determine whether the user has arrived at that destination. If it detects the arrival, it may associate the second keyword with the first keyword and register them in the user dictionary.
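The arrival condition could be checked by comparing the vehicle position with the destination coordinates. The sketch below makes several assumptions. Both positions are available as latitude/longitude pairs. The haversine great-circle distance is used. A 100 m radius counts as "arrived". The radius, the function names, and the plain `dict` that stands in for the user dictionary are all hypothetical.

```python
import math

ARRIVAL_RADIUS_M = 100.0     # hypothetical arrival threshold
EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def register_if_arrived(user_dict, first_kw, second_kw, vehicle_pos, dest_pos):
    """Register first_kw -> second_kw only if the vehicle is within ARRIVAL_RADIUS_M of the destination."""
    if haversine_m(*vehicle_pos, *dest_pos) <= ARRIVAL_RADIUS_M:
        user_dict[first_kw] = second_kw
        return True
    return False
```

Deferring registration until arrival means that a destination the user typed but then abandoned never becomes a dictionary entry.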

With the information processing according to the second embodiment described above, the information processing device 200 can improve the accuracy of registration in the user dictionary.

(Other)
[1. Hardware Configuration]
The information processing device 100 according to the first embodiment and the information processing device 200 according to the second embodiment described above are realized, for example, by a computer 1000 configured as shown in Fig. 12. The information processing device 100 is described below as an example. Fig. 12 is a hardware configuration diagram showing an example of a computer that realizes the functions of the information processing device 100. The computer 1000 has a CPU 1100, a RAM 1200, a ROM 1300, an HDD 1400, a communication interface (I/F) 1500, an input/output interface (I/F) 1600, and a media interface (I/F) 1700.

The CPU 1100 operates based on programs stored in the ROM 1300 or the HDD 1400 and controls each component. The ROM 1300 stores a boot program executed by the CPU 1100 when the computer 1000 starts up, programs that depend on the hardware of the computer 1000, and the like.

The HDD 1400 stores programs executed by the CPU 1100, data used by those programs, and the like. The communication interface 1500 receives data from other devices via a predetermined communication network and sends it to the CPU 1100. It also transmits data generated by the CPU 1100 to other devices via the predetermined communication network.

Via the input/output interface 1600, the CPU 1100 controls output devices such as a display and a printer, and input devices such as a keyboard and a mouse. The CPU 1100 acquires data from the input devices, and outputs the data it generates to the output devices, via the input/output interface 1600.

The media interface 1700 reads a program or data stored on a recording medium 1800 and provides it to the CPU 1100 via the RAM 1200. The CPU 1100 loads such a program from the recording medium 1800 onto the RAM 1200 via the media interface 1700 and executes the loaded program. The recording medium 1800 is, for example, one of the following: an optical recording medium such as a DVD (Digital Versatile Disc) or a PD (Phase change rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, or a semiconductor memory.

For example, when the computer 1000 functions as the information processing device 100 according to the first embodiment, the CPU 1100 of the computer 1000 realizes the functions of the control unit 130 by executing a program loaded onto the RAM 1200 (for example, the information processing program according to the embodiment). The CPU 1100 reads these programs from the recording medium 1800 and executes them. As another example, it may acquire these programs from another device via a predetermined communication network.

Similarly, when the computer 1000 functions as the information processing device 200 according to the second embodiment, the CPU 1100 of the computer 1000 realizes the functions of the control unit 230 by executing a program loaded onto the RAM 1200 (for example, the information processing program according to the embodiment). The CPU 1100 reads these programs from the recording medium 1800 and executes them. As another example, it may acquire these programs from another device via a predetermined communication network.

[2. Other]
Among the processes described in the above embodiments, all or part of a process described as performed automatically can also be performed manually. Conversely, all or part of a process described as performed manually can be performed automatically by a known method. In addition, the processing procedures, specific names, and information including various data and parameters shown in the above description and drawings can be changed as desired unless otherwise specified. For example, the various information shown in each drawing is not limited to the illustrated information.

Furthermore, the components of each illustrated device are functional concepts and need not be physically configured as illustrated. That is, the specific form in which each device is distributed or integrated is not limited to the illustrated form. All or part of each device can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.

In addition, the above embodiments can be combined as appropriate as long as their processing contents do not contradict each other.

(Summary)
Some embodiments of the present application have been described above in detail with reference to the drawings, but they are merely examples. The present invention can be implemented in other forms that incorporate various modifications and improvements based on the knowledge of those skilled in the art, including the aspects described in the Disclosure of the Invention section.

In addition, the term "unit" (section, module, unit) used above can be read as "means," "circuit," or the like. For example, the acquisition unit can be read as acquisition means or an acquisition circuit.

Reference Signs List
1 Information processing system
10 Terminal device
100 Information processing device
120 Storage unit
121 Speech information database
122 Linking information database
123 User dictionary database
130 Control unit
131 Acquisition unit
132 Correction voice determination unit
133 Detection unit
134 Linking unit
135 Learning unit
136 Information control unit
200 Information processing device
220 Storage unit
224 Operation information database
230 Control unit
237 Correction operation determination unit

Claims

1. An information processing device comprising:
a voice information acquisition unit that acquires information indicating a voice uttered by a user;
an operation information acquisition unit that acquires information indicating an input operation input by the user by touching a predetermined object;
a determination unit that, when the voice information acquisition unit acquires information indicating a first utterance and then acquires information indicating a word suggesting a slip of the tongue, and the operation information acquisition unit acquires information indicating a second input operation, determines whether the second input operation is a correction operation for correcting the utterance content indicated by the first utterance; and
a registration control unit that, when the determination unit determines that the second input operation is the correction operation, links the operation content indicated by the second input operation with the utterance content indicated by the first utterance and registers them in a predetermined dictionary.

2. The information processing device according to claim 1, wherein the determination unit determines whether the second input operation is a correction operation for correcting the utterance content. The determination is based on a similarity between a first keyword included in the first utterance, as the utterance content indicated by the first utterance, and a second keyword input by the second input operation, as the operation content indicated by the second input operation.

3. The information processing device according to claim 2, wherein the second input operation is a destination setting operation performed after the first utterance, and the determination unit determines whether the second input operation is a correction operation for correcting the utterance content indicated by the first utterance, using, as the second keyword, a keyword indicating the destination input in the destination setting operation.

4. The information processing device according to claim 2, wherein the determination unit determines whether the second input operation is a correction operation for correcting the utterance content indicated by the first utterance. The determination is based on a similarity between the first keyword and the second keyword input by a second input operation performed within a predetermined time after the information indicating the first utterance is acquired.

5. The information processing device according to any one of claims 2 to 4, wherein, when the determination unit determines that the second input operation is the correction operation, the registration control unit links the second keyword with the first keyword, treating the second keyword as correct answer information and the first keyword as error information for the correct answer information.

6. The information processing device according to claim 5, wherein, based on the relationship between the linked correct answer information and error information, the registration control unit associates the error information with the correct answer information and registers them in the predetermined dictionary. The registration is such that, when speech having the utterance content indicated by the error information is acquired, the acquired utterance content is recognized as the operation content indicated by the correct answer information linked to that error information.

7. The information processing device according to claim 6, further comprising a learning unit that, using sets of linked correct answer information and error information as learning data, learns patterns of utterance content indicated by the error information that are likely to be mistaken for the operation content indicated by the correct answer information,
wherein, when utterance content indicated by error information that is likely to be mistaken for the operation content indicated by the correct answer information is input, the registration control unit, based on the learning result of the learning unit, associates the error information with the correct answer information and registers them in the predetermined dictionary so that the input utterance content is recognized as the operation content indicated by the correct answer information associated with that error information.

8. An information processing method executed by an information processing device, the method comprising:
a voice information acquisition step of acquiring information indicating a voice uttered by a user;
an operation information acquisition step of acquiring information indicating an input operation input by the user by touching a predetermined object;
a determination step of determining whether the second input operation is a correction operation for correcting the utterance content indicated by the first utterance, when information indicating a word suggesting a slip of the tongue is acquired after information indicating a first utterance in the voice information acquisition step, and information indicating a second input operation is acquired in the operation information acquisition step; and
a registration control step of, when the second input operation is determined to be the correction operation in the determination step, linking the operation content indicated by the second input operation with the utterance content indicated by the first utterance and registering them in a predetermined dictionary.

9. An information processing program for causing a computer to execute:
a voice information acquisition procedure of acquiring information indicating a voice uttered by a user;
an operation information acquisition procedure of acquiring information indicating an input operation input by the user by touching a predetermined object;
a determination procedure of determining whether the second input operation is a correction operation for correcting the utterance content indicated by the first utterance, when information indicating a word suggesting a slip of the tongue is acquired after information indicating a first utterance in the voice information acquisition procedure, and information indicating a second input operation is acquired in the operation information acquisition procedure; and
a registration control procedure of, when the second input operation is determined to be the correction operation in the determination procedure, linking the operation content indicated by the second input operation with the utterance content indicated by the first utterance and registering them in a predetermined dictionary.