JP6495015B2

JP6495015B2 - Spoken dialogue control device, control method of spoken dialogue control device, and spoken dialogue device

Info

Publication number: JP6495015B2
Application number: JP2015002569A
Authority: JP
Inventors: 暁本村
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2015-01-08
Filing date: 2015-01-08
Publication date: 2019-04-03
Anticipated expiration: 2035-01-08
Also published as: JP2016126294A

Description

The present invention relates to a voice interaction control device for controlling a voice interaction device that responds to a user's utterances.

Voice interaction devices (robots) that converse with a user by responding to the user's utterances with speech or motion have long been widely studied. In a dialogue between a user and a voice interaction device, a certain amount of time elapses between the user's utterance and the device's response to the content of that utterance. If the device does nothing during this time, the user may find communicating with it stressful. As one solution to this problem, Patent Document 1 below, for example, discloses a technique that predicts the waiting time until an answer is received from a server and, if necessary, performs an action that fills the waiting time (a filler action).

Patent Document 1: JP 2014-191030 A (published October 6, 2014)
Patent Document 2: JP 2003-330923 A (published November 21, 2003)

However, with the techniques described in Patent Documents 1 and 2, the response to a given utterance is uniform regardless of whether a filler action was performed. For example, in reply to the user's question "What's the weather today?", the device gives the same response, "It's sunny", whether it answers immediately or answers only after performing a filler action (that is, after taking some time). In other words, the conventional techniques have the problem that the voice interaction device can give only a uniform response even in situations that call for a response different from the usual one, such as when the response took a long time.

The present invention has been made in view of the above problem, and its object is to provide a voice interaction control device and the like that improve the flexibility of communication between a user and a voice interaction device by correcting the response in situations where it should be corrected.

To solve the above problem, a voice interaction control device according to one aspect of the present invention includes: a response generation unit that generates a response, to be executed by a voice interaction device, to a voice uttered by a user; a determination unit that determines whether a correction condition, used to decide whether the response needs to be corrected, has been satisfied during the standby time from when the voice is acquired until the response becomes ready for output; a correction unit that, when the determination unit determines that the correction condition has been satisfied, generates a corrected response by correcting the response generated by the response generation unit; and a response execution unit that causes the voice interaction device to execute the corrected response generated by the correction unit.

Likewise, to solve the above problem, a control method for a voice interaction control device according to one aspect of the present invention includes: a response generation step of generating a response, to be executed by a voice interaction device, to a voice uttered by a user; a determination step of determining whether a correction condition, used to decide whether the response needs to be corrected, has been satisfied during the standby time from when the voice is acquired until the response becomes ready for output; a correction step of generating, when the determination step determines that the correction condition has been satisfied, a corrected response by correcting the response generated in the response generation step; and a response execution step of causing the voice interaction device to execute the corrected response generated in the correction step.

According to one aspect of the present invention, the flexibility of communication between a user and a voice interaction device can be improved.

FIG. 1 is a block diagram showing the configuration of a voice interaction device according to Embodiment 1 of the present invention.
FIG. 2 is a diagram showing the data structure and example data of the response correction table stored in the storage unit of the voice interaction device shown in FIG. 1.
FIG. 3 is a flowchart showing an example of the flow of response execution processing performed by the voice interaction control device shown in FIG. 1.
FIG. 4 is a block diagram showing the configuration of a voice interaction device according to Embodiment 2 of the present invention.
FIG. 5 is a diagram showing the data structure and example data of the call attribute table stored in the storage unit of the voice interaction device shown in FIG. 4.
FIG. 6 is a diagram showing the data structure and example data of the response correction table stored in the storage unit of the voice interaction device shown in FIG. 4.
FIG. 7 is a flowchart showing an example of the flow of response execution processing performed by the voice interaction control device shown in FIG. 4.
FIG. 8 is a block diagram showing the configuration of a voice interaction device according to Embodiment 3 of the present invention.
FIG. 9 is a diagram showing the data structure and example data of the filler action table stored in the storage unit of the voice interaction device shown in FIG. 8.
FIG. 10 is a flowchart showing an example of the flow of response execution processing performed by the voice interaction control device shown in FIG. 8.
FIG. 11 is a flowchart showing an example of the flow of the filler action determination processing in the flowchart shown in FIG. 10.

[Embodiment 1]
An embodiment (Embodiment 1) of the present invention is described below with reference to FIGS. 1 to 4.

First, the voice interaction device 10 according to the present embodiment is described with reference to FIG. 1. FIG. 1 is a block diagram showing the configuration of the voice interaction device 10 according to the present embodiment.

The voice interaction device 10 is a device that converses with a user by responding to the user's utterances with speech or motion. A humanoid robot is one specific example of the voice interaction device 10, but the device is not limited to this. Other specific examples include a mobile terminal with a voice interaction function, such as a smartphone, and a car navigation system with a voice interaction function. As shown in FIG. 1, the voice interaction device 10 includes the voice interaction control device as a control unit 1. The voice interaction device 10 and the voice interaction control device may also be separate bodies. In addition to the control unit 1 (the voice interaction control device), the voice interaction device 10 includes a voice input unit 2, a communication unit 3, a voice output unit 4, a drive unit 5, and a storage unit 6.

The voice input unit 2 is a microphone that acquires the voice uttered by the user. The voice input unit 2 converts the acquired voice into voice data and outputs it to the voice recognition unit 13 described later (indicated by d2 in FIG. 1). The voice input unit 2 also notifies the standby time measurement unit 11, described later, that a voice has been acquired (indicated by d1 in FIG. 1). The communication unit 3 is the means by which the voice interaction device 10 communicates with external devices. Specifically, the communication unit 3 is controlled by the response generation unit 14, described later, and receives the data needed to generate a response from an external device. For example, the communication unit 3 acquires data on tomorrow's weather from a weather forecast server (not shown) that manages weather forecast data, and outputs the data to the response generation unit 14.

The voice output unit 4 is a speaker that outputs voice. Specifically, the voice output unit 4 outputs voice as a response to the voice uttered by the user. The drive unit 5 drives movable parts of the voice interaction device 10 (humanoid robot), such as the head and legs, and is, for example, a servo motor. An actuator other than a servo motor may also be used. Specifically, the drive unit 5 drives the movable parts so that the voice interaction device 10 performs a motion as a response to the voice uttered by the user. When the voice interaction device 10 is a device without movable parts, such as a smartphone, the drive unit 5 may be omitted.

The storage unit 6 stores the various data used by the voice interaction device 10. The storage unit 6 stores at least a response correction table 61, the details of which are described later.

The control unit 1 performs overall control of the units of the voice interaction device 10. The control unit 1 includes a standby time measurement unit 11, a response correction unit 12, a voice recognition unit 13, a response generation unit 14, and a response execution unit 15.

The standby time measurement unit 11 measures the standby time from when the voice uttered by the user is acquired until a response to that voice is generated. Specifically, when the voice input unit 2 notifies the standby time measurement unit 11 that a voice has been acquired, the unit starts measuring time with a timer (not shown). When it is notified by the response generation unit 14 that generation of the response information is complete, it stops the timer. It then outputs the measured time T_a to the response correction unit 12.
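The start and stop behavior of the standby time measurement unit 11 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class name `StandbyTimer`, its method names, and the injectable `clock` parameter are assumptions introduced here for the example.

```python
import time


class StandbyTimer:
    """Sketch of standby time measurement: measures T_a in seconds."""

    def __init__(self, clock=time.monotonic):
        # The injectable clock is a hypothetical design choice that makes
        # the timer easy to test; the text only mentions "a timer".
        self._clock = clock
        self._start = None

    def on_voice_acquired(self):
        """Corresponds to notification d1: start measuring."""
        self._start = self._clock()

    def on_response_generated(self):
        """Corresponds to notification d3: stop measuring and return T_a."""
        if self._start is None:
            raise RuntimeError("no voice acquisition was recorded")
        elapsed = self._clock() - self._start
        self._start = None
        return elapsed
```

A monotonic clock is used here so that the measured interval is unaffected by system clock adjustments.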

The response correction unit 12 corrects the response information generated by the response generation unit 14 to generate a corrected response. The response correction unit 12 includes a correction necessity determination unit 21 (determination unit) and a correction execution unit 22 (correction unit).

The correction necessity determination unit 21 determines whether a correction condition, used to decide whether the response needs to be corrected, has been satisfied during the standby time from when the voice uttered by the user is acquired until a response to that voice becomes ready for output. Specifically, the correction necessity determination unit 21 determines whether the time received from the standby time measurement unit 11 (the time measured by that unit) is greater than a predetermined value, and outputs the determination result to the correction execution unit 22. The predetermined value is a length of time at which generation of the response information is judged to have been prolonged (for example, 3 seconds).

The correction execution unit 22 corrects the response information generated by the response generation unit 14 when the correction condition is determined to have been satisfied. Specifically, when the determination result received from the correction necessity determination unit 21 indicates that the time received from the standby time measurement unit 11 is greater than the predetermined value, the correction execution unit 22 corrects the response information received from the response generation unit 14 using the response correction table 61 stored in the storage unit 6.

The response correction table 61 is now described in detail with reference to FIG. 2. FIG. 2 is a diagram showing the data structure and example data of the response correction table 61 stored in the storage unit 6. The table shown in FIG. 2 is only an example; the data structure and data are not limited to it. The response correction table 61 associates information indicating the time required to generate the response information, that is, the standby time (time information; hereinafter "standby time"), with information indicating the content of an additional response to be added to the response information (correction content information; hereinafter "additional response information"). In other words, the table associates different additional response information with different times measured by the standby time measurement unit 11. The "standby time" column may store information indicating a time range, such as "4 to 7 seconds".

More specifically, the correction execution unit 22 refers to the response correction table 61 and identifies the additional response information associated with the standby time that matches (corresponds to) the time received from the standby time measurement unit 11. It then reads out the identified additional response information and adds it to the received response information, thereby correcting the response information. For example, suppose the correction execution unit 22 has received response information (voice data) representing the spoken response "It's sunny" and the received time is 5 seconds. It then selects either No. 1 or No. 2 of the additional response information shown in FIG. 2 as the additional response information for correcting the response information. When several pieces of additional response information have a matching standby time, as here, the correction execution unit 22 may select one at random. Assume here that the additional response information of No. 1 is selected. The correction execution unit 22 corrects the voice data for uttering "It's sunny" into voice data for uttering "Sorry to keep you waiting. It's sunny". The correction execution unit 22 then outputs the corrected response information (voice data, in this example) to the response execution unit 15. The received time need not match the time stored in the "standby time" column exactly: when the received time falls within a predetermined range that includes a stored standby time, the additional response information associated with that standby time may be added to the response information.

On the other hand, when the determination result received from the correction necessity determination unit 21 indicates that the standby time is less than or equal to the predetermined value, the correction execution unit 22 does not correct the response information received from the response generation unit 14 and outputs it unchanged to the response execution unit 15.
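The threshold check and table lookup described above can be sketched as follows. This is an illustrative sketch under several assumptions: responses are represented as text strings rather than voice data, the table rows and their wording are hypothetical (FIG. 2 is not reproduced here), and the names `RESPONSE_CORRECTION_TABLE`, `needs_correction`, and `correct_response` are introduced only for this example. The 3-second threshold is the example value given in the text.

```python
import random

# Hypothetical contents for response correction table 61. Each row maps a
# standby-time range in seconds (min exclusive, max inclusive) to an
# additional response that is placed before the original response.
RESPONSE_CORRECTION_TABLE = [
    {"no": 1, "min_s": 3.0, "max_s": 7.0, "additional": "Sorry to keep you waiting."},
    {"no": 2, "min_s": 3.0, "max_s": 7.0, "additional": "That took a while."},
    {"no": 3, "min_s": 7.0, "max_s": float("inf"), "additional": "I'm so sorry for the long wait."},
]

THRESHOLD_S = 3.0  # the "predetermined value" (example value from the text)


def needs_correction(t_a, threshold=THRESHOLD_S):
    """Correction condition: the standby time T_a exceeds the threshold."""
    return t_a > threshold


def correct_response(response, t_a, table=RESPONSE_CORRECTION_TABLE, rng=random):
    """Return the response with an additional response prepended if T_a is long."""
    if not needs_correction(t_a):
        return response  # at or below the threshold: pass through unchanged
    candidates = [row for row in table if row["min_s"] < t_a <= row["max_s"]]
    if not candidates:
        return response  # no matching row: leave the response as generated
    chosen = rng.choice(candidates)  # several matches: pick one at random
    return f'{chosen["additional"]} {response}'
```

With these hypothetical rows, a standby time of 5 seconds matches rows No. 1 and No. 2, so either additional response may be chosen, mirroring the example in the text.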

The voice recognition unit 13 performs voice recognition processing on the voice data received from the voice input unit 2. Existing techniques can be used for the voice recognition processing. The voice recognition unit 13 outputs the voice recognition result for the received voice data to the response generation unit 14.

The response generation unit 14 generates response information representing a response to the voice uttered by the user. There are three kinds of response: voice output, motion of a movable part of the voice interaction device 10, and a combination of voice output and motion of a movable part. Existing techniques can be used by the response generation unit 14 to generate the response information. For example, a table (not shown) that associates the content of recognized voice data with response content may be stored in the storage unit 6, and the response information may be generated by referring to that table. When external data, such as information on tomorrow's weather, is needed to generate the response information, the response generation unit 14 controls the communication unit 3 to acquire that data. The response generation unit 14 outputs the generated response information (voice data for voice output, action data for operating a movable part, and so on) to the response correction unit 12 (correction execution unit 22) (indicated by d4 in FIG. 1). The response generation unit 14 also notifies the standby time measurement unit 11 that generation of the response information is complete (indicated by d3 in FIG. 1).

The response execution unit 15 executes a response according to the response information generated by the response generation unit 14 and, where necessary, corrected by the response correction unit 12. Specifically, the response execution unit 15 receives the response information from the response correction unit 12 (correction execution unit 22) and causes the voice interaction device 10 to perform the action indicated by that response information. For example, it controls the voice output unit 4 to output voice, or controls the drive unit 5 to operate a movable part of the voice interaction device 10.

Next, the flow of the response execution processing performed by the control unit 1 is described with reference to FIG. 3. FIG. 3 is a flowchart showing an example of the flow of the response execution processing performed by the control unit 1.

First, the voice input unit 2 waits for voice input (S1). When the voice input unit 2 acquires a voice uttered by the user (YES in S1), it converts the acquired voice into voice data and outputs the voice data to the voice recognition unit 13. The voice input unit 2 also notifies the standby time measurement unit 11 that a voice has been acquired.

Next, on receiving the notification that a voice has been acquired, the standby time measurement unit 11 starts measuring time (S2). The voice recognition unit 13 performs voice recognition processing on the received voice data (S3) and outputs the voice recognition result to the response generation unit 14. The response generation unit 14 then generates response information according to the received voice recognition result (S4, response generation step). The response generation unit 14 notifies the standby time measurement unit 11 that the response information has been generated, and outputs the generated response information to the correction execution unit 22. On receiving the notification from the response generation unit 14, the standby time measurement unit 11 stops measuring time (S5) and outputs the measurement result (the measured time T_a) to the correction necessity determination unit 21.

The correction necessity determination unit 21 determines whether the time T_a received from the standby time measurement unit 11 is greater than the predetermined value (S6, determination step). The correction necessity determination unit 21 then outputs the determination result to the correction execution unit 22.

When the time T_a is determined to be greater than the predetermined value (YES in S6), the correction execution unit 22, on receiving that determination result, identifies the additional response corresponding to the time T_a and corrects the response information (S7, correction step). Specifically, the correction execution unit 22 refers to the response correction table 61 stored in the storage unit 6 and identifies the additional response information associated with the standby time that matches the received time T_a. It then reads out the identified additional response information and adds it to the response information received from the response generation unit 14, thereby correcting the response information. The correction execution unit 22 outputs the corrected response information to the response execution unit 15.

In contrast, when the time T_a is determined to be less than or equal to the predetermined value (NO in S6), the correction execution unit 22, on receiving that determination result, outputs the response information received from the response generation unit 14 to the response execution unit 15 without correcting it. That is, the processing of step S7 described above is skipped.

Finally, the response execution unit 15 causes the voice interaction device 10 to execute the response (S8, response execution step). Specifically, according to the received response information, the response execution unit 15 controls the voice output unit 4 to output voice, or controls the drive unit 5 to operate a movable part of the voice interaction device 10. This completes the response execution processing.
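Steps S2 to S8 of the flow above can be summarized in a short sketch. It assumes the voice has already been acquired (YES in S1), and it takes the recognition, generation, correction, and execution stages as caller-supplied functions, because the text leaves those to existing techniques. The function name `run_response_cycle` and all parameter names are hypothetical.

```python
import time


def run_response_cycle(voice_data, recognize, generate, correct, execute,
                       threshold_s=3.0, clock=time.monotonic):
    """One pass through steps S2 to S8 for already-acquired voice data."""
    start = clock()                        # S2: start measuring time
    text = recognize(voice_data)           # S3: voice recognition
    response = generate(text)              # S4: response generation
    t_a = clock() - start                  # S5: stop measuring, obtain T_a
    if t_a > threshold_s:                  # S6: correction condition met?
        response = correct(response, t_a)  # S7: correct the response
    execute(response)                      # S8: execute the response
    return response
```

When `t_a` does not exceed the threshold, the `correct` callback is never invoked, which corresponds to skipping step S7.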

[Embodiment 2]
Another embodiment (Embodiment 2) of the present invention is described below with reference to FIGS. 4 to 7. For convenience of explanation, members having the same functions as those described in the preceding embodiment are given the same reference numerals, and their description is omitted.

This embodiment describes an example in which, before identifying the additional response information whose stored "standby time" matches the time received from the standby time measurement unit 11, the device first identifies the additional response information whose attribute matches the attribute of the voice (voice data) input from the voice input unit 2.

First, the voice interaction device 10a according to the present embodiment is described with reference to FIG. 4. FIG. 4 is a block diagram showing the configuration of the voice interaction device 10a according to the present embodiment. The voice interaction device 10a differs from the voice interaction device 10 of Embodiment 1 in that it includes a control unit 1a in place of the control unit 1 and a storage unit 6a in place of the storage unit 6. The control unit 1a differs from the control unit 1 of Embodiment 1 in that it includes a response correction unit 12a in place of the response correction unit 12 and a voice recognition unit 13a (voice attribute identification unit) in place of the voice recognition unit 13. The storage unit 6a differs from the storage unit 6 of Embodiment 1 in that it stores a response correction table 61a in place of the response correction table 61. The storage unit 6a additionally stores a call attribute table 62.

The voice recognition unit 13a performs voice recognition processing on the voice data received from the voice input unit 2 and outputs the voice recognition result to the response generation unit 14 (indicated by d5 in FIG. 4). The voice recognition unit 13a also assigns attributes to the voice data. Specifically, after performing voice recognition, the voice recognition unit 13a refers to the call attribute table 62 stored in the storage unit 6a.

The call attribute table 62 is now described in detail with reference to FIG. 5. FIG. 5 is a diagram showing the data structure and example data of the call attribute table 62 stored in the storage unit 6a. The table shown in FIG. 5 is only an example; the data structure and data are not limited to it. As shown in FIG. 5(a), the call attribute table 62 associates voice recognition results with call attributes. In other words, it is a table for identifying the call attributes of voice data from the voice recognition result. The "call" column stores voice recognition results, that is, the voice uttered by the user converted into text data. The "call attribute" column stores call attributes (voice attributes) indicating the category of the content of that voice. The voice recognition unit 13a looks up the voice recognition result in the call attribute table 62 to identify the call attributes of the voice data, and outputs the identified call attributes to the correction execution unit 22a (indicated by d6 in FIG. 4). For example, when voice recognition finds the voice data to be "What's the weather today?", the voice recognition unit 13a outputs "question" and "weather" to the correction execution unit 22a as the call attributes of that voice data.

The call attribute table 62 may instead take the form shown in FIG. 5(b). That is, the "call" column may store keywords contained in the user's utterance, with a call attribute associated with each keyword. In this case, the voice recognition unit 13a identifies every call attribute associated with a keyword contained in the voice data and outputs the identified call attributes to the correction execution unit 22a.
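The two table layouts can be sketched as follows. This is a minimal illustration, not the patented implementation: FIG. 5 is not reproduced in this text, so the table contents, the substring-based keyword matching, and the function names `attributes_exact` and `attributes_by_keyword` are all assumptions made for the example.

```python
# Hypothetical stand-ins for the call attribute table 62.
# Layout (a): whole recognized utterance -> call attributes.
EXACT_TABLE = {
    "What is the weather today?": ["question", "weather"],
}
# Layout (b): keyword -> call attributes.
KEYWORD_TABLE = {
    "what": ["question"],
    "weather": ["weather"],
}


def attributes_exact(recognized_text):
    """Layout (a): look up the full speech recognition result."""
    return list(EXACT_TABLE.get(recognized_text, []))


def attributes_by_keyword(recognized_text):
    """Layout (b): collect the attributes of every keyword found in the text.

    Keyword detection by lowercase substring search is an assumption;
    the source does not say how keywords are detected.
    """
    text = recognized_text.lower()
    found = []
    for keyword, attributes in KEYWORD_TABLE.items():
        if keyword in text:
            for attribute in attributes:
                if attribute not in found:
                    found.append(attribute)
    return found
```

With either layout, the utterance "What is the weather today?" yields the attributes "question" and "weather", which matches the example in the description.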

応答修正部１２ａは、応答生成部１４が生成した応答情報を修正する。応答修正部１２ａは、修正要否判定部２１および修正実行部２２ａを含む。なお、修正要否判定部２１については、実施形態１にて既に説明したため、ここでの説明を省略する。 The response correction unit 12a corrects the response information generated by the response generation unit 14. The response correction unit 12a includes a correction necessity determination unit 21 and a correction execution unit 22a. Since the correction necessity determination unit 21 has already been described in the first embodiment, the description thereof is omitted here.

The correction execution unit 22a corrects the response information generated by the response generation unit 14. Specifically, when the determination result received from the correction necessity determination unit 21 indicates that the time received from the standby time measurement unit 11 is greater than the predetermined value, the correction execution unit 22a corrects the response using the response correction table 61a stored in the storage unit 6.

The response correction table 61a is now described in detail with reference to FIG. 6. FIG. 6 shows the data structure and example data of the response correction table 61a stored in the storage unit 6a. In the response correction table 61a, each pair of standby time and additional response information is further associated with an additional response attribute, which indicates the category of the content of the additional response represented by that additional response information. The "additional response attribute" column stores these attributes. The additional response attribute "question" indicates that the associated additional response is suitable for adding to a response that answers a question. The additional response attribute "all" indicates that the associated additional response is suitable for adding to any response, regardless of its content.

More specifically, when the determination result received from the correction necessity determination unit 21 indicates that the time received from the standby time measurement unit 11 is greater than the predetermined value, the correction execution unit 22a first identifies, in the response correction table 61a, the additional response attributes that match the call attributes received from the voice recognition unit 13a. If an additional response attribute other than "all" matches a call attribute, the correction execution unit 22a identifies only the additional response information associated with that attribute. Next, from the additional response information associated with the identified attribute, it identifies the additional response information associated with the time received from the standby time measurement unit 11. It then reads out that additional response information and adds it to the received response information, thereby correcting the response information.

For example, suppose the correction execution unit 22a has received, from the response generation unit 14, response information (voice data) for uttering "It's sunny", has received the call attributes "question" and "weather" from the voice recognition unit 13a, and has received a time of 5 seconds. It first identifies the additional response information No. 1 and No. 3, whose additional response attribute matches the received call attribute "question". It then selects No. 1, whose associated response generation processing time matches the received time, as the additional response information for correcting the response information. Finally, it corrects the voice data for uttering "It's sunny" into voice data for uttering "Sorry for the wait. It's sunny". In this example, when several call attributes are received, the additional response information whose additional response attribute matches at least one of them is identified. The invention is not limited to this; the additional response information whose additional response attributes match all of the received call attributes exactly may be identified instead.
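The attribute-then-time selection described above can be sketched as follows. FIG. 6 is not reproduced in this text, so the rows, the time ranges, the 3-second threshold, and every name in the code (`Row`, `TABLE`, `select_row`, `correct_response`) are hypothetical. Only the facts that No. 1 and No. 3 carry the attribute "question" and that No. 1 corresponds to a wait of 5 seconds come from the description. Treating the standby time column as a half-open range is also an assumption.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    no: int
    attribute: str      # additional response attribute ("all" matches anything)
    min_wait_s: float   # assumed lower bound of the standby time range
    max_wait_s: float   # assumed upper bound (exclusive)
    prefix: str         # additional response placed before the original response


# Hypothetical stand-in for the response correction table 61a.
TABLE = [
    Row(1, "question", 3.0, 10.0, "Sorry for the wait. "),
    Row(2, "all", 3.0, 10.0, "Thanks for waiting. "),
    Row(3, "question", 10.0, math.inf, "That took a while. "),
    Row(4, "all", 10.0, math.inf, "Sorry, that was slow. "),
]


def select_row(call_attributes, wait_s):
    """Prefer rows whose attribute matches a call attribute; else fall back to "all"."""
    specific = [r for r in TABLE
                if r.attribute != "all" and r.attribute in call_attributes]
    candidates = specific or [r for r in TABLE if r.attribute == "all"]
    for row in candidates:
        if row.min_wait_s <= wait_s < row.max_wait_s:
            return row
    return None


def correct_response(response, call_attributes, wait_s, threshold_s=3.0):
    """Return the response, corrected only when the wait exceeded the threshold."""
    if wait_s <= threshold_s:
        return response
    row = select_row(call_attributes, wait_s)
    return response if row is None else row.prefix + response
```

Under these assumed rows, a 5-second wait with the attributes "question" and "weather" turns "It's sunny" into "Sorry for the wait. It's sunny", mirroring the worked example.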

Next, the flow of the response execution process executed by the control unit 1a is described with reference to FIG. 7. FIG. 7 is a flowchart showing an example of that flow. Only the differences from the response execution process of Embodiment 1 are described here. Specifically, steps S11 to S13, steps S15 to S17, and step S20 are the same as steps S1 to S3, steps S4 to S6, and step S8 of the flowchart in FIG. 3, respectively, so their description is omitted.

音声認識部１３ａは、音声認識処理を行った後、認識した音声の呼びかけ属性を特定する（Ｓ１４）。具体的には、音声認識部１３ａは、音声認識結果を用いて呼びかけ属性テーブル６２を参照し、音声データの呼びかけ属性を特定する。そして、特定した呼びかけ属性を修正実行部２２ａに出力する。 After performing the voice recognition process, the voice recognition unit 13a specifies the call attribute of the recognized voice (S14). Specifically, the voice recognition unit 13a refers to the call attribute table 62 using the voice recognition result, and specifies the call attribute of the voice data. Then, the specified call attribute is output to the correction execution unit 22a.

When the time T_a is determined to be greater than the predetermined value (YES in S17), the correction execution unit 22, on receiving the determination result, identifies the additional response information associated with an additional response attribute that matches the call attribute received from the voice recognition unit 13a (S18). Specifically, the correction execution unit 22 refers to the response correction table 61a stored in the storage unit 6 and identifies the additional response information associated with the matching additional response attribute. From the additional response information so identified, it further identifies the additional response information corresponding to the time T_a and corrects the response (S19). The correction execution unit 22 then outputs the corrected response information to the response execution unit 15.

When the time T_a is determined to be equal to or less than the predetermined value (NO in S17), the correction execution unit 22, on receiving the determination result, outputs the response information received from the response generation unit 14 to the response execution unit 15 without correcting it. That is, steps S18 and S19 are skipped.

This embodiment has described an example in which the voice recognition unit 13a identifies the call attribute of the voice data received from the voice input unit 2. Alternatively, a response generation unit 14a (not shown) may identify a response attribute of the response information it generates. Specifically, in place of the call attribute table 62, the storage unit 6a stores a response attribute table 62a (not shown) that associates response information with response attributes, and the response generation unit 14a uses the generated response information to identify the response attribute from the response attribute table 62a. It then associates the generated response information with the identified response attribute and outputs them to the correction execution unit 22a.

[Embodiment 3]
Still another embodiment of the present invention (Embodiment 3) is described below with reference to FIGS. 8 to 11. For convenience of explanation, members having the same functions as those described in the preceding embodiments are given the same reference numerals, and their description is omitted.

上述した実施形態１および２では、待機時間計測部１１が計測した時間が、所定の値より大きいか否かを判定することで、応答を修正するか否かを決定していた。一方本実施形態では、待機時間、すなわちユーザが発した音声を取得してから、当該音声に対する応答を生成するまでの時間中に、上記音声対話装置に音声の出力および動作の少なくとも一方を行わせると決定したか否か（または実際に行わせたか否か）を判定することで、応答を修正するか否かを決定する例について説明する。 In the first and second embodiments described above, it is determined whether or not the response is to be corrected by determining whether or not the time measured by the standby time measuring unit 11 is greater than a predetermined value. On the other hand, in the present embodiment, during the standby time, that is, the time from when the voice uttered by the user is acquired to when the response to the voice is generated, the voice interaction apparatus performs at least one of voice output and operation. An example will be described in which it is determined whether or not the response is to be corrected by determining whether or not it has been determined (or whether or not it has actually been performed).

First, the voice interaction device 10b according to this embodiment is described with reference to FIG. 8. FIG. 8 is a block diagram showing the configuration of the voice interaction device 10b. Compared with the voice interaction device 10 according to Embodiment 1, the voice interaction device 10b includes a control unit 1b in place of the control unit 1, a voice input unit 2b in place of the voice input unit 2, and a storage unit 6b in place of the storage unit 6. Compared with the control unit 1 of Embodiment 1, the control unit 1b does not include the standby time measurement unit 11, includes a response correction unit 12b in place of the response correction unit 12, and newly includes a standby time prediction unit 16 and a filler operation control unit 17, which controls the operations that fill the standby time. Compared with the storage unit 6 of Embodiment 1, the storage unit 6b additionally stores a filler operation table 63.

The voice input unit 2b converts the acquired voice into voice data and outputs it to the voice recognition unit 13, described later (indicated by d8 in FIG. 8). The voice input unit 2 also outputs at least one of the size (data amount) and the duration (utterance time) of the voice data, hereinafter referred to as voice ancillary information, to the standby time prediction unit 16 (indicated by d7 in FIG. 8).

The standby time prediction unit 16 predicts the standby time T_b from when the voice interaction device 10b acquires the user's utterance until a response to that utterance can be output. Specifically, on receiving the voice ancillary information from the voice input unit 2, the standby time prediction unit 16 first predicts the standby time from the size (data amount) of the voice data. In more detail, it calculates the predicted standby time T_b using the formula "standby time T_b = α × data amount", where α is a predetermined value giving the standby time required per unit of data. The standby time prediction unit 16 outputs the predicted (calculated) standby time T_b to the filler operation determination unit 71 and the correction execution unit 22b, both described later. The standby time prediction unit 16 may instead predict the standby time from the duration of the voice data (the user's utterance time). Specifically, it may calculate T_b using the formula "standby time T_b = β × utterance time", where β is a predetermined value giving the standby time required per unit of utterance time. It may also predict (calculate) the standby time using both the data amount and the utterance time. If the predicted standby time calculated from the data amount differs from the one calculated from the utterance time, the longer (or the shorter) of the two may be adopted, or the average of the two may be calculated and output to the filler operation determination unit 71.
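The prediction formulas can be sketched as below. The description gives only the form T_b = α × data amount and T_b = β × utterance time; the numeric values of α and β, the units, and the function name `predict_wait` are assumptions chosen for illustration.

```python
from statistics import mean

# Hypothetical coefficients; the source only says they are predetermined values.
ALPHA_S_PER_BYTE = 2e-5       # standby seconds per byte of voice data
BETA_S_PER_SPEECH_S = 0.5     # standby seconds per second of utterance


def predict_wait(data_bytes=None, speech_s=None,
                 alpha=ALPHA_S_PER_BYTE, beta=BETA_S_PER_SPEECH_S,
                 combine="max"):
    """Predict the standby time T_b from data amount, utterance time, or both.

    When both inputs are given and the two estimates differ, `combine`
    picks the longer ("max"), the shorter ("min"), or the average ("mean").
    """
    estimates = []
    if data_bytes is not None:
        estimates.append(alpha * data_bytes)
    if speech_s is not None:
        estimates.append(beta * speech_s)
    if not estimates:
        raise ValueError("need data_bytes, speech_s, or both")
    if combine == "max":
        return max(estimates)
    if combine == "min":
        return min(estimates)
    if combine == "mean":
        return mean(estimates)
    raise ValueError(f"unknown combine mode: {combine}")
```

For instance, with these assumed coefficients, 100,000 bytes of voice data gives roughly 2 seconds, and a 6-second utterance gives 3 seconds; the three combine modes then return about 3, 2, and 2.5 seconds respectively.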

The filler operation control unit 17 determines and executes filler operations. It includes a filler operation determination unit 71 and a filler operation execution unit 72.

The filler operation determination unit 71 determines the filler operation to be executed by the voice interaction device 10, based on the predicted standby time T_b predicted by the standby time prediction unit 16. Here, a filler operation is an operation that the voice interaction device 10 is made to execute during the time (standby time) from when the user's utterance is acquired until a response to it can be output. Specifically, the filler operation determination unit 71 uses the filler operation table 63 stored in the storage unit 6 to determine the filler operation according to the predicted standby time T_b and the filler operation time, that is, the time required for the filler operation that the voice interaction device 10b is made to execute during the standby time.

The filler operation table 63 is now described in detail with reference to FIG. 9. FIG. 9 shows the data structure and example data of the filler operation table 63 stored in the storage unit 6b. The table shown in FIG. 9 is only an example, and neither the data structure nor the data is limited to it. The filler operation table 63 associates information indicating a filler operation with the filler operation time, which is the time required for that filler operation. The "filler operation" column stores information on a plurality of operation candidates (hereinafter, filler operation information), each indicating an operation that the voice interaction device 10 can execute. The "type" column stores information indicating whether each filler operation outputs voice ("voice" in FIG. 9), moves a movable part of the voice interaction device 10b ("gesture" in FIG. 9), or does both ("voice + gesture" in FIG. 9). The "filler operation time" column stores the filler operation time.

More specifically, the filler operation determination unit 71 subtracts each filler operation time in the filler operation table 63 from the received predicted standby time T_b to obtain a difference T_c for each piece of filler operation information, where c is the number stored in the "No." column of the filler operation table 63. Next, for each difference T_c, it determines whether T_c is at least 0 and at most a first allowable time X, that is, whether any filler operation information satisfies 0 ≤ T_c ≤ X. The first allowable time X is the length of time for which the voice interaction device 10b may remain idle between executing the filler operation and completing generation of the response. X is a preset value; for example, X = 2 means that up to 2 seconds may elapse between the end of the filler operation and the completion of response generation.

If some filler operation information satisfies 0 ≤ T_c ≤ X, the filler operation determination unit 71 determines the filler operation indicated by that information as the one the voice interaction device 10b is to execute, and outputs the filler operation information to the filler operation execution unit 72. For example, when the predicted standby time T_b is 2 seconds and the first allowable time X = 1, the filler operation information No. 2 and No. 3 shown in FIG. 9 satisfy 0 ≤ T_c ≤ X. The filler operation determination unit 71 therefore reads out the filler operation information No. 2 or No. 3 and outputs it to the filler operation execution unit 72.

If several pieces of filler operation information satisfy 0 ≤ T_c ≤ X, it is preferable to select the one with the smaller difference T_c, so as to shorten the time during which the voice interaction device 10b performs no operation. In the example above, it is preferable to select the filler operation information No. 3, for which T_c is 0. If several pieces of filler operation information have the same T_c, one of them may be selected at random.

On the other hand, if no filler operation information satisfies 0 ≤ T_c ≤ X, the filler operation determination unit 71 determines, for each sign-inverted value -T_c (the difference T_c with its sign reversed), whether -T_c is at least 0 and at most a second allowable time Y, that is, whether any filler operation information satisfies 0 ≤ -T_c ≤ Y. The second allowable time Y is the length of time allowed between the completion of response generation and the completion of the filler operation of the voice interaction device 10b. Y is a preset value; for example, Y = 2 means that up to 2 seconds may elapse between the completion of response generation and the end of the filler operation. The filler operation determination unit 71 may also obtain -T_c directly by subtracting the received predicted standby time T_b from each filler operation time.

If some filler operation information satisfies 0 ≤ -T_c ≤ Y, the filler operation determination unit 71 determines the filler operation indicated by that information as the one the voice interaction device 10b is to execute, and outputs the filler operation information to the filler operation execution unit 72. For example, when the predicted standby time T_b is 1 second and the second allowable time Y = 1, the filler operation information No. 2 and No. 3 shown in FIG. 9 satisfy 0 ≤ -T_c ≤ Y. The filler operation determination unit 71 therefore reads out the filler operation information No. 2 or No. 3 and outputs it to the filler operation execution unit 72.

If several pieces of filler operation information satisfy 0 ≤ -T_c ≤ Y, it is preferable to select the one with the smaller sign-inverted value -T_c, so as to shorten the time during which the voice interaction device 10b performs no operation. In the example above, it is preferable to select the filler operation information No. 2, for which -T_c is 0.

If several pieces of filler operation information have the same sign-inverted value -T_c, one of them may be selected at random.
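The two-stage selection of a single filler operation can be sketched as follows. FIG. 9 is not reproduced in this text, so the table rows are hypothetical: the 1-second and 2-second durations of No. 2 and No. 3 are inferred from the worked examples, while the zero-length "no operation" entry for No. 1, the operation names, and the function name `choose_single` are assumptions.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Filler:
    no: int
    name: str
    kind: str           # "voice", "gesture", "voice + gesture" (or "none", assumed)
    duration_s: float   # filler operation time


# Hypothetical stand-in for the filler operation table 63.
TABLE = [
    Filler(1, "(no operation)", "none", 0.0),
    Filler(2, 'say "Umm"', "voice", 1.0),
    Filler(3, "tilt head", "gesture", 2.0),
]


def choose_single(t_b, x, y, table=TABLE, rng=random):
    """Pick one filler operation for predicted standby time t_b, or None.

    Stage 1 accepts 0 <= T_c <= X, where T_c = t_b - duration (filler ends
    first, idle gap at most X). Stage 2, tried only if stage 1 is empty,
    accepts 0 <= -T_c <= Y (filler overruns the response by at most Y).
    Within a stage the smallest gap wins; ties are broken at random.
    """
    stage1 = [(t_b - f.duration_s, f) for f in table
              if 0 <= t_b - f.duration_s <= x]
    stage2 = [(f.duration_s - t_b, f) for f in table
              if 0 <= f.duration_s - t_b <= y]
    for stage in (stage1, stage2):
        if stage:
            best_gap = min(gap for gap, _ in stage)
            return rng.choice([f for gap, f in stage if gap == best_gap])
    return None
```

With T_b = 2 and X = 1, both No. 2 and No. 3 pass stage 1 and No. 3 is chosen because its gap is 0, as in the description. A `None` result corresponds to the case where several filler operations have to be combined.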

For at least one of the first allowable time X and the second allowable time Y, the same value may be set for all filler operation information, or a different value may be set for each piece of filler operation information. At least one of X and Y may also be set according to at least one of the data amount and the utterance time of the voice data. In that case, the filler operation determination unit 71 determines at least one of X and Y based on the data amount or the utterance time of the voice data received from the standby time prediction unit 16.

On the other hand, if no filler operation information satisfies 0 ≤ -T_c ≤ Y, the filler operation determination unit 71 selects a plurality of pieces of filler operation information. Specifically, from the filler operation information satisfying (filler operation time) ≤ (standby time), it selects the one with the longest filler operation time. It then calculates the remaining time by subtracting the filler operation time of the selected information from the predicted standby time T_b, and further selects filler operation information satisfying (filler operation time) ≤ (remaining time). Next, it sums the filler operation times of the selected pieces of filler operation information and determines whether 0 ≤ T_b - total ≤ X or 0 ≤ -(T_b - total) ≤ Y holds. If either holds, it outputs the selected pieces of filler operation information, each associated with its number in the "No." column, to the filler operation execution unit 72.

If neither holds, the filler operation determination unit 71 subtracts the total from the predicted standby time T_b and further selects filler operation information satisfying (filler operation time) ≤ (the value so obtained). It then again sums the filler operation times of the selected pieces of filler operation information and determines whether 0 ≤ T_b - total ≤ X or 0 ≤ -(T_b - total) ≤ Y holds. The filler operation determination unit 71 repeats this processing until one of the two conditions holds.

If several pieces of filler operation information satisfy 0 ≤ T_c ≤ X or 0 ≤ -T_c ≤ Y, it is preferable to select the one with the smaller T_c or -T_c, so as to shorten the time during which the voice interaction device 10b performs no operation. If several pieces of filler operation information have the same T_c or -T_c, one of them may be selected at random.
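The repeated selection of several filler operations can be sketched as a greedy loop. The description leaves several points open, so the following are assumptions: the longest fitting operation is picked at every step (the source states this only for the first pick), the same operation may be picked more than once, zero-length entries are skipped, and the loop stops if nothing fits any more. The durations and the name `choose_sequence` are hypothetical.

```python
def choose_sequence(t_b, x, y, durations):
    """Greedily pick filler operations until the leftover time is acceptable.

    durations maps a table "No." to its filler operation time in seconds.
    Returns (list of chosen numbers, leftover time T_b - total).
    """
    chosen = []
    remaining = t_b
    while True:
        # Candidates that still fit in the remaining time (zero-length skipped).
        fitting = [(d, no) for no, d in durations.items() if 0 < d <= remaining]
        if not fitting:
            break  # assumption: give up when no operation fits
        d, no = max(fitting)  # longest fitting operation
        chosen.append(no)
        remaining -= d
        # The acceptance test is applied once at least two operations are chosen.
        # Because every pick fits in the remaining time, the Y branch can only
        # hold when the leftover is exactly 0 in this sketch.
        if len(chosen) >= 2 and (0 <= remaining <= x or 0 <= -remaining <= y):
            break
    return chosen, remaining
```

For example, with durations of 1 second for No. 2 and 2 seconds for No. 3, a predicted standby time of 5 seconds and X = Y = 0.5 yields No. 3, No. 3, No. 2 with no leftover time.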

The filler operation execution unit 72 causes the voice dialogue device 10b to execute the filler operation indicated by the filler operation information determined by the filler operation determination unit 71. Specifically, upon receiving filler operation information from the filler operation determination unit 71, the filler operation execution unit 72 causes the voice dialogue device 10b to execute the indicated filler operation, for example by controlling the voice output unit 4 to output voice or by controlling the drive unit 5 to move a movable part of the voice dialogue device 10b. When execution of the filler operation is complete, the filler operation execution unit 72 notifies the response execution unit 15 to that effect. When it receives multiple pieces of filler operation information from the filler operation determination unit 71, the filler operation execution unit 72 determines the order of the filler operations at random and causes the voice dialogue device 10 to execute them in that order. Alternatively, information defining the order may be stored in the storage unit 6b and the order determined from that information, or the operations may be executed in ascending order of the "No." associated with each filler operation. The filler operation control unit 17 also outputs the filler operation information indicating the filler operation to be executed by the voice dialogue device 10b to the correction necessity determination unit 21b. This output may occur when the filler operation determination unit 71 determines the filler operation information, or after the filler operation execution unit 72 has caused the voice dialogue device 10b to execute the filler operation. The flowchart described later assumes the latter: the filler operation information is output to the correction necessity determination unit 21b after the filler operation execution unit 72 has caused the voice dialogue device 10b to execute the filler operation.
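
The three ways of ordering multiple filler operations described above (random order, an order defined by stored information, and ascending "No.") can be sketched as follows. This is an illustrative Python sketch only, not the implementation of the filler operation execution unit 72; the function name `order_fillers`, the `mode` values, and the representation of the stored order as a list of "No." values are assumptions introduced here.

```python
import random


def order_fillers(numbers, mode="random", stored_order=None, rng=random):
    """Return filler operation numbers ("No.") in the order they would run.

    mode="random":    shuffle the operations.
    mode="stored":    follow an order list kept in storage (hypothetical format).
    mode="ascending": run the operations in ascending order of "No.".
    """
    if mode == "random":
        ordered = list(numbers)
        rng.shuffle(ordered)
        return ordered
    if mode == "stored":
        # Position of each "No." in the stored order list decides its rank.
        rank = {no: i for i, no in enumerate(stored_order)}
        return sorted(numbers, key=lambda no: rank[no])
    if mode == "ascending":
        return sorted(numbers)
    raise ValueError(f"unknown mode: {mode}")
```

Passing a seeded `random.Random` instance as `rng` makes the random order reproducible, which may be convenient when testing.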

The response correction unit 12b corrects the response information generated by the response generation unit 14. The response correction unit 12b includes a correction necessity determination unit 21b and a correction execution unit 22b.

The correction necessity determination unit 21b determines whether a correction condition, which is used to decide whether the response needs correcting, has been satisfied during the standby time from acquisition of the voice uttered by the user until the response to that voice becomes outputtable. Specifically, the correction necessity determination unit 21b determines whether the filler operation information received from the filler operation determination unit 71 indicates a filler operation that causes the voice dialogue device 10b to perform at least one of voice output and motion. More specifically, it determines whether the received filler operation information is the No. 1 filler operation information shown in FIG. 9 or some other filler operation information, and outputs the determination result to the correction execution unit 22b.

When the correction condition is determined to be satisfied, the correction execution unit 22b corrects the response information generated by the response generation unit 14. Specifically, when the determination result received from the correction necessity determination unit 21b indicates that the filler operation information received by that unit is other than the No. 1 filler operation information, the correction execution unit 22b refers to the response correction table 61 and identifies the additional response information associated with the predicted standby time T_b received from the standby time prediction unit 16. It then reads the identified additional response information and adds it to the received response information, thereby correcting the response information, and outputs the corrected response information to the response execution unit 15.

Conversely, when the determination result received from the correction necessity determination unit 21b indicates that the filler operation information received by that unit is the No. 1 filler operation information, the correction execution unit 22b outputs the response information received from the response generation unit 14 to the response execution unit 15 without correcting it.
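
The behaviour of the correction execution unit 22b described above can be illustrated with a minimal Python sketch. The table contents, the time ranges, the English wording of the additional responses, and the choice to place the additional response before the original response are all assumptions made for illustration; the actual contents of the response correction table 61 are not reproduced here.

```python
import math

NO_FILLER = 1  # "No. 1": the entry treated as neither voice output nor motion

# Hypothetical stand-in for the response correction table 61:
# (lower bound in seconds, upper bound in seconds, additional response text)
CORRECTION_TABLE = [
    (0.0, 5.0, "Sorry to keep you waiting. "),
    (5.0, math.inf, "Sorry, that took a long time. "),
]


def correct_response(response, filler_no, t_b, table=CORRECTION_TABLE):
    """Return the response, corrected only if a real filler operation was used."""
    if filler_no == NO_FILLER:
        # Correction condition not satisfied: pass the response through as is.
        return response
    for lower, upper, extra in table:
        if lower <= t_b < upper:
            # Add the additional response that matches the predicted standby time.
            return extra + response
    return response
```

For example, `correct_response("It's sunny.", 4, 3.0)` yields the response with the shorter apology prepended, while any call with `filler_no` equal to 1 returns the response unchanged.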

Next, the flow of the response execution process executed by the control unit 1b is described with reference to FIG. 10. FIG. 10 is a flowchart showing an example of the flow of the response execution process executed by the control unit 1b.

First, the voice input unit 2 waits for voice input (S21). Upon acquiring voice uttered by the user (YES in S21), the voice input unit 2 converts the acquired voice into voice data, outputs the voice data to the voice recognition unit 13, and outputs the voice-attached information of the voice data to the standby time prediction unit 16.

Next, the standby time prediction unit 16 predicts the standby time (S22). Specifically, upon receiving the voice data, the standby time prediction unit 16 calculates the predicted standby time T_b using at least one of the data amount and the utterance duration of the voice data. The standby time prediction unit 16 outputs the predicted standby time to the filler operation control unit 17 (filler operation determination unit 71) and to the correction execution unit 22b. The filler operation determination unit 71 then performs the filler operation determination process (S23), the details of which are described later. The filler operation determination unit 71 outputs, to the filler operation execution unit 72, the filler operation information indicating the filler operation it has determined the voice dialogue device 10b should execute. The filler operation execution unit 72 then causes the voice dialogue device 10b to execute the filler operation in accordance with the received filler operation information (S24). When execution of the filler operation is complete, the filler operation execution unit 72 notifies the correction execution unit 22b to that effect.

Meanwhile, the voice recognition unit 13 performs voice recognition processing (S25). Specifically, upon receiving the voice data, the voice recognition unit 13 performs voice recognition processing on the voice data and outputs the voice recognition result to the response generation unit 14. The response generation unit 14 then generates response information (S26). Specifically, the response generation unit 14 generates response information corresponding to the received voice recognition result and outputs it to the correction execution unit 22b.

As shown in FIG. 10, steps S22, S23, and S24 are performed in parallel with steps S25 and S26. Accordingly, when the response correction unit 12b has received only one of the response information and the notification that execution of the filler operation is complete, it waits until it receives the other. Upon receiving both the notification and the response information, the response correction unit 12b (correction necessity determination unit 21b) determines whether at least one of an utterance and a motion was executed as the filler operation (S27). Specifically, the correction necessity determination unit 21b determines whether the received filler operation information is the No. 1 filler operation information shown in FIG. 9 or some other filler operation information, and outputs the determination result to the correction execution unit 22b.

When the filler operation information received by the correction necessity determination unit 21b is determined to be other than the No. 1 filler operation information (YES in S27), the correction execution unit 22b, upon receiving that determination result, identifies the additional response corresponding to the predicted standby time T_b received from the standby time prediction unit 16 and corrects the response information (S28). Specifically, the correction execution unit 22b refers to the response correction table 61 stored in the storage unit 6 and identifies the additional response information associated with the standby time that matches the received predicted standby time T_b. It then reads the identified additional response information and adds it to the response information received from the response generation unit 14, thereby correcting the response information, and outputs the corrected response information to the response execution unit 15.

In contrast, when the filler operation information received by the correction necessity determination unit 21b is determined to be the No. 1 filler operation information (NO in S27), the correction execution unit 22b, upon receiving that determination result, outputs the response information received from the response generation unit 14 to the response execution unit 15 without correcting it. In other words, step S28 described above is skipped.

Finally, the response execution unit 15 causes the voice dialogue device 10b to execute the response (S29). Specifically, in accordance with the received response information, the response execution unit 15 controls the voice output unit 4 to output voice or controls the drive unit 5 to move a movable part of the voice dialogue device 10b. This completes the response execution process.
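
The parallel structure of steps S22 to S29 can be sketched as follows. This is a rough Python illustration under stated assumptions, not the control unit 1b itself: the function `run_response_process` and all of its callback parameters are hypothetical names, and two worker threads from the standard `concurrent.futures` module stand in for whatever concurrency mechanism an actual device would use.

```python
from concurrent.futures import ThreadPoolExecutor


def run_response_process(voice_data, predict, decide, run_filler,
                         recognize, generate, correct, execute):
    """Run the filler branch and the response branch in parallel, then respond."""

    def filler_branch():
        t_b = predict(voice_data)   # S22: predict the standby time
        filler = decide(t_b)        # S23: determine the filler operation
        run_filler(filler)          # S24: execute the filler operation
        return t_b, filler

    def response_branch():
        text = recognize(voice_data)  # S25: voice recognition
        return generate(text)         # S26: generate response information

    with ThreadPoolExecutor(max_workers=2) as pool:
        filler_future = pool.submit(filler_branch)
        response_future = pool.submit(response_branch)
        # Wait for both branches: whichever finishes first waits for the other.
        t_b, filler = filler_future.result()
        response = response_future.result()

    final = correct(response, filler, t_b)  # S27 and S28
    execute(final)                          # S29
    return final
```

Calling `result()` on both futures corresponds to the response correction unit waiting until it holds both the completion notification and the response information.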

Next, the flow of the filler operation determination process executed by the filler operation determination unit 71 is described with reference to FIG. 11. FIG. 11 is a flowchart showing an example of the flow of the filler operation determination process in the flowchart of FIG. 10. In the flowchart of FIG. 11, the filler operation information contained in the filler operation table 63 is assumed to be associated with filler operation times roughly as long as generally expected standby times.

First, upon receiving the predicted standby time T_b predicted by the standby time prediction unit 16, the filler operation determination unit 71 reads the filler operation table 63 and calculates, for each filler operation, the subtraction value T_c obtained by subtracting its filler operation time from the predicted standby time T_b (S31). The filler operation determination unit 71 then refers to the filler operation table 63 using the calculated subtraction values T_c and the first allowable time X, and determines whether any filler operation information satisfies 0 ≤ T_c ≤ X (S32).

When there is filler operation information satisfying 0 ≤ T_c ≤ X (YES in S32), the filler operation determination unit 71 determines the filler operation indicated by one such piece of filler operation information to be the filler operation executed by the voice dialogue device 10b (S33). Specifically, among the filler operation information satisfying 0 ≤ T_c ≤ X, it selects the one with the smaller value of T_c. The filler operation determination unit 71 then outputs the selected filler operation information to the filler operation execution unit 72.

When no filler operation information satisfies 0 ≤ T_c ≤ X (NO in S32), the filler operation determination unit 71 calculates the sign-changed value -T_c from the subtraction value T_c, refers to the filler operation table 63 using -T_c and the second allowable time Y, and determines whether any filler operation information satisfies 0 ≤ -T_c ≤ Y (S34).

When there is filler operation information satisfying 0 ≤ -T_c ≤ Y (YES in S34), the filler operation determination unit 71 determines the filler operation indicated by one such piece of filler operation information to be the filler operation executed by the voice dialogue device 10b (S35). Specifically, among the filler operation information satisfying 0 ≤ -T_c ≤ Y, it selects the one with the smaller value of -T_c.

When no filler operation information satisfies 0 ≤ -T_c ≤ Y (NO in S34), multiple filler operations are combined and used as the filler operation information indicating the filler operations executed by the voice dialogue device 10 (S36). In this case, the filler operation execution unit 72 determines the execution order of the multiple pieces of filler operation information it receives. This completes the filler operation determination process.
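
Steps S31 to S36 can be summarized in a short Python sketch. The `Filler` class, the function names, and the durations in the usage example are hypothetical. In particular, the description does not state how multiple filler operations are combined in S36, so the greedy longest-first combination below is purely an assumption chosen to keep the example small.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Filler:
    no: int          # "No." in the filler operation table
    duration: float  # filler operation time in seconds (illustrative values)


def _pick_smallest(scored, rng):
    """Pick the entry with the smallest score; break ties at random."""
    best = min(score for score, _ in scored)
    return rng.choice([f for score, f in scored if score == best])


def decide_fillers(table, t_b, x, y, rng=random):
    """Return the list of filler operations for predicted standby time t_b."""
    # S31: T_c = T_b - filler operation time, for every entry.
    diffs = [(t_b - f.duration, f) for f in table]

    # S32/S33: the filler ends at most X seconds before the response is ready.
    under = [(t_c, f) for t_c, f in diffs if 0 <= t_c <= x]
    if under:
        return [_pick_smallest(under, rng)]

    # S34/S35: the filler overruns the predicted standby time by at most Y.
    over = [(-t_c, f) for t_c, f in diffs if 0 <= -t_c <= y]
    if over:
        return [_pick_smallest(over, rng)]

    # S36 (assumed strategy): greedily combine the longest fillers that fit.
    chosen, remaining = [], t_b
    for f in sorted(table, key=lambda f: f.duration, reverse=True):
        if 0 < f.duration <= remaining:
            chosen.append(f)
            remaining -= f.duration
    return chosen
```

With a hypothetical table of durations 0, 2, 3, and 5 seconds and X = Y = 1, a predicted standby time of 3.5 seconds selects the 3-second entry through S33, whereas a predicted standby time of 10 seconds falls through to S36 and combines several entries.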

In this embodiment, the determination uses both the first allowable time X and the second allowable time Y; however, the determination may use only one of the first allowable time X and the second allowable time Y.

Also, in this embodiment, when no filler operation information satisfies 0 ≤ -T_c ≤ Y, multiple pieces of filler operation information are combined to obtain filler operation information satisfying 0 ≤ -T_c ≤ Y. Alternatively, the filler operation time associated with a piece of filler operation information may be changed so that the filler operation information satisfies 0 ≤ T_c ≤ X or 0 ≤ -T_c ≤ Y.

Also, when generating the response information takes longer than the predicted standby time T_b predicted by the standby time prediction unit 16, the standby time prediction unit 16 may recalculate the predicted standby time T_b using the voice recognition result produced by the voice recognition unit 13. If the new predicted standby time is longer than the previous predicted standby time, the filler operation determination unit 71 may determine the filler operation again.

Also, when generation of the response information completes earlier than the predicted standby time T_b predicted by the standby time prediction unit 16 and the filler operation determination unit 71 has selected multiple filler operations, the filler operation execution unit 72 may cancel execution of the filler operations scheduled from that point onward. In addition, when generation of the response information completes earlier than the predicted standby time T_b, the filler operation execution unit 72 may shorten the filler operation time of the filler operation being executed.

Also, in this embodiment, when multiple pieces of filler operation information satisfy the condition, the filler operation determination unit 71 selects the one with the smaller subtraction value T_c (or sign-changed value -T_c), but the selection is not limited to this example. For example, the filler operation table 63 may have a column storing history information that indicates the date and time at which each filler operation was last executed, and when multiple pieces of filler operation information satisfy the condition, the one whose history information indicates the older date and time may be selected.

Also, when the type of the filler operation determined by the filler operation determination unit 71 is "voice", the unit may additionally select filler operation information whose filler operation time is shorter than that of the determined filler operation and whose type is "gesture", and output the two pieces of filler operation information together to the filler operation execution unit 72. Likewise, when the type of the determined filler operation is "gesture", the unit may additionally select filler operation information whose filler operation time is shorter than that of the determined filler operation and whose type is "voice", and output the two pieces of filler operation information together to the filler operation execution unit 72. For example, when the filler operation indicated by the No. 7 filler operation information shown in FIG. 9 (type: gesture; performs a "get up" motion) is determined to be the filler operation executed by the voice dialogue device 10b, the filler operation determination unit 71 further determines, for example, the filler operation indicated by the No. 4 filler operation information shown in FIG. 9 (type: voice; utters "Wait a moment") as a filler operation to be executed by the voice dialogue device 10b, and outputs both pieces of filler operation information to the filler operation execution unit 72. On receiving this information, the filler operation execution unit 72 causes the voice dialogue device 10b to perform the "get up" motion while uttering "Wait a moment". This increases the variety of filler operations and helps keep the user from getting bored.
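
The pairing of a voice filler with a shorter gesture filler (or the reverse) can be illustrated as below. This Python sketch is an assumption-laden illustration: the `Filler` fields, the function name `pair_with_other_kind`, and the durations used in any example are hypothetical, and because the description does not say which shorter filler to prefer when several qualify, the sketch arbitrarily takes the longest one that is still shorter than the chosen filler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Filler:
    no: int          # "No." in the filler operation table
    kind: str        # "voice" or "gesture"
    content: str     # what is uttered or which motion is performed
    duration: float  # filler operation time in seconds (illustrative values)


def pair_with_other_kind(chosen, table):
    """Return the chosen filler, plus a shorter filler of the other kind if any."""
    other_kind = "gesture" if chosen.kind == "voice" else "voice"
    candidates = [
        f for f in table
        if f.kind == other_kind and f.duration < chosen.duration
    ]
    if not candidates:
        return [chosen]
    # Assumed preference: the longest candidate that is still shorter.
    partner = max(candidates, key=lambda f: f.duration)
    return [chosen, partner]
```

For instance, if a 3-second "get up" gesture is chosen and a 2-second "Wait a moment" utterance exists in the table, both are returned so they can be performed together.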

Also, in this embodiment, the additional response information is identified solely on the condition that the predicted standby time T_b predicted by the standby time prediction unit 16 matches a time contained in the response correction table 61, but the identification is not limited to this example. For example, the correction execution unit 22b may first select the additional response information whose additional response attribute, described in Embodiment 2, matches the filler operation attribute indicating the category of the filler operation determined to be executed by the voice dialogue device, and then identify, from the selected additional response information, the additional response information for which the predicted standby time T_b matches a time contained in the response correction table 61.

In this case, each piece of filler operation information contained in the filler operation table 63 is associated with a filler operation attribute indicating the category of the content of that filler operation, and the storage unit 6b stores the response correction table 61a described in Embodiment 2 in place of the response correction table 61. The filler operation determination unit 71 identifies the filler operation attribute of the filler operation it has determined the voice dialogue device should execute, and outputs the attribute to the correction execution unit 22b. The correction execution unit 22b selects the additional response information associated with an additional response attribute that matches the received filler operation attribute, and then identifies, from that selection, the additional response information for which the predicted standby time T_b matches a time contained in the response correction table 61a.
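
The two-stage lookup described above (first by attribute, then by time) might look like the following Python sketch. The table rows, attribute names, time ranges, and response texts are invented for illustration and do not reproduce the response correction table 61a; the function name `find_additional_response` is likewise hypothetical.

```python
import math

# Hypothetical stand-in for the response correction table 61a.
TABLE_61A = [
    {"attribute": "apology", "min_s": 0.0, "max_s": 5.0,
     "text": "Sorry for the wait."},
    {"attribute": "apology", "min_s": 5.0, "max_s": math.inf,
     "text": "I am very sorry for the long wait."},
    {"attribute": "thinking", "min_s": 0.0, "max_s": math.inf,
     "text": "I had to think about that one."},
]


def find_additional_response(table, filler_attribute, t_b):
    """Filter rows by attribute, then pick the row whose time range holds t_b."""
    same_attribute = [row for row in table if row["attribute"] == filler_attribute]
    for row in same_attribute:
        if row["min_s"] <= t_b < row["max_s"]:
            return row["text"]
    return None  # no matching additional response
```

Returning `None` when nothing matches is a design choice of this sketch; the description does not say what happens in that case.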

[Modification]
In Embodiment 1 described above, both the recognition of the voice uttered by the user and the generation of response information for that voice are performed by the voice dialogue control device (control unit 1); however, these processes may instead be performed by an external device (external server, not shown) that can communicate with the voice dialogue device 10. In that case, upon acquiring voice, the voice dialogue device 10 converts it into voice data and transmits the voice data to the external device via the communication unit 3. The external device performs the voice recognition and the generation of response information, and transmits the response information to the voice dialogue device 10. This modification is also applicable to Embodiments 2 and 3.

Also, in Embodiment 1 described above, the control unit 1 may include the standby time prediction unit 16 in place of the standby time measurement unit 11, and may predict the standby time instead of measuring it. This modification is also applicable to Embodiment 2.

Also, in Embodiment 1 described above, the response information is corrected using the additional response information stored in the response correction table 61, but correction of the response information is not limited to this example. For example, at least some of the words contained in the voice may be included in the corrected response. Specifically, the correction execution unit 22 may correct the response information by using, as the additional response information, at least a portion of the voice data recognized by the voice recognition unit 13 that has been cut out, edited, or summarized. For example, when the voice uttered by the user, that is, the voice data recognized by the voice recognition unit 13, is "What is the weather today?" (今日の天気はなに？), the correction execution unit 22 receives the voice recognition result from the voice recognition unit 13 and cuts out "Today's weather is" (今日の天気は). The correction execution unit 22 then adds the voice data "Today's weather is" to the voice data (response information) "Sunny" (晴れだよ) received from the response generation unit 14, generating the response information "Today's weather is sunny" (今日の天気は晴れだよ). This modification is also applicable to Embodiments 2 and 3.
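
The weather example above can be reproduced with a very small Python sketch. The description does not specify how the portion to be cut out is chosen, so the rule used here (cut the recognized Japanese text up to and including the first topic particle "は") is a naive assumption made only to reproduce the example; the function name `echo_topic` is hypothetical.

```python
def echo_topic(recognized, response, particle="は"):
    """Prefix the response with the recognized text up to the topic particle."""
    cut = recognized.find(particle)
    if cut == -1:
        # No topic particle found: leave the response unchanged.
        return response
    return recognized[: cut + len(particle)] + response
```

Given the recognized text 今日の天気はなに？ ("What is the weather today?") and the response 晴れだよ ("Sunny"), the sketch produces 今日の天気は晴れだよ ("Today's weather is sunny"). A heuristic this simple would misfire on utterances where "は" is not a topic particle, so a practical system would need proper linguistic analysis.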

The correction condition is not limited to the examples in the embodiments described above. For example, the response may be corrected when communication with the external device becomes impossible, or when information acquired by a sensor (not shown) provided in the voice dialogue device satisfies a predetermined condition. Correction of the response is also not limited to adding additional response information to the response information; for example, the content of the response information itself may be changed. The content indicated by the additional response information may be executed before the response indicated by the response information, or after that response.

[Example of software implementation]
The voice dialogue control devices of the voice dialogue devices 10, 10a, and 10b, namely the control units 1, 1a, and 1b, may be realized by logic circuits (hardware) formed in an integrated circuit (IC chip) or the like, or may be realized by software using a CPU (Central Processing Unit).

In the latter case, the voice dialogue device 10 includes a CPU that executes the instructions of a program, which is software realizing each function; a ROM (Read Only Memory) or a storage device (these are referred to as a "recording medium") on which the program and various data are recorded so as to be readable by a computer (or CPU); a RAM (Random Access Memory) into which the program is loaded; and the like. The object of the present invention is achieved when the computer (or CPU) reads the program from the recording medium and executes it. As the recording medium, a "non-transitory tangible medium" such as a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit can be used. The program may also be supplied to the computer via any transmission medium capable of transmitting it (such as a communication network or a broadcast wave). The present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.

[Summary]
A voice dialogue control device (control unit 1) according to Aspect 1 of the present invention includes: a response generation unit (14) that generates a response, to be executed by a voice dialogue device (10), to a voice uttered by a user; a determination unit (correction necessity determination unit 21) that determines whether a correction condition for deciding whether the response needs correcting has been satisfied during the standby time from acquisition of the voice until the response becomes outputtable; a correction unit (correction execution unit 22) that, when the determination unit determines that the correction condition has been satisfied, generates a corrected response by correcting the response generated by the response generation unit; and a response execution unit (15) that causes the voice dialogue device to execute the corrected response generated by the correction unit.

With the above configuration, the response is corrected when the correction condition is satisfied between acquisition of the voice and the point at which the response becomes ready for output, so the response can be corrected in situations where it should be. For example, when it took time for the response to become ready for output, the response itself can express this (by additionally outputting speech that apologizes for the delay). This improves the flexibility of communication between the user and the voice dialogue device.

The voice dialogue control device according to aspect 2 of the present invention may, in aspect 1, further include a waiting time measurement unit (11) that measures the waiting time, and the determination unit may determine that the correction condition has been satisfied when the waiting time measured by the waiting time measurement unit exceeds a predetermined time.

With the above configuration, the waiting time is measured and the correction condition is determined to be satisfied when the measured waiting time exceeds a predetermined time, so the fact that it took time for the response to become ready for output can be determined accurately. As a result, when it took time for the response to become ready for output, the response can express this.
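The measurement-based determination of aspect 2 can be sketched roughly as follows. This is only an illustrative sketch: the names `WaitTimer`, `correction_needed` and `THRESHOLD_SECONDS`, and the value of 3 seconds, are assumptions made for the example and do not appear in the specification, which only speaks of a "predetermined time".

```python
import time

# Hypothetical "predetermined time"; the specification gives no concrete value.
THRESHOLD_SECONDS = 3.0


class WaitTimer:
    """Measures the time from acquiring the voice until the response is ready."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable clock, so the timer can be tested
        self._start = None

    def start(self):
        """Call when the user's voice has been acquired."""
        self._start = self._clock()

    def elapsed(self):
        """Call when the response has become ready for output."""
        if self._start is None:
            raise RuntimeError("start() must be called first")
        return self._clock() - self._start


def correction_needed(waiting_time, threshold=THRESHOLD_SECONDS):
    """Correction condition of aspect 2: the waiting time exceeds the threshold."""
    return waiting_time > threshold
```

A monotonic clock is used here so that adjustments to the system clock do not distort the measured waiting time.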

In the voice dialogue control device according to aspect 3 of the present invention, in aspect 2, the correction unit may correct the response using correction content information that indicates how the response is to be corrected and that is associated with time information corresponding to the waiting time.

With the above configuration, the response is corrected using the correction content information associated with the time information corresponding to the waiting time, so the response can be corrected according to the length of the waiting time. For example, when the waiting time has become long, speech indicating that the device was thinking for a long time is additionally output. In other words, the device can respond in a manner suited to the length of the waiting time, which prevents the user from feeling stress when communicating with the voice dialogue device.
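One way to picture correction content information associated with time information is a small lookup table. The sketch below is an assumption-laden illustration: the table `CORRECTION_TABLE`, its thresholds and its phrases are all hypothetical, and prepending a phrase is just one possible form of correction.

```python
# Hypothetical table of (lower bound of the waiting time in seconds, phrase).
# Entries are sorted by descending lower bound so the longest band matches first.
CORRECTION_TABLE = [
    (10.0, "Sorry, I had to think about that for a long time. "),
    (3.0, "Sorry to keep you waiting. "),
]


def correct_response(response, waiting_time, table=CORRECTION_TABLE):
    """Prepend the phrase for the first band that the waiting time exceeds."""
    for lower_bound, prefix in table:
        if waiting_time > lower_bound:
            return prefix + response
    # No band matched: the correction condition is not met, so leave as is.
    return response
```

With this table, a response that became ready after 5 seconds gets the short apology, and one that took 12 seconds gets the longer one.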

The voice dialogue control device according to aspect 4 of the present invention may, in aspect 1, further include a waiting time prediction unit (16) that predicts the waiting time, and the determination unit may determine that the correction condition has been satisfied when the predicted waiting time predicted by the waiting time prediction unit exceeds a predetermined time.

With the above configuration, the waiting time is predicted and the correction condition is determined to be satisfied when the predicted waiting time exceeds a predetermined time, so the fact that it takes time for the response to become ready for output can be determined accurately. As a result, when it took time for the response to become ready for output, the response can express this.

In the voice dialogue control device according to aspect 5 of the present invention, in aspect 4, the correction unit may correct the response using correction content information that indicates how the response is to be corrected and that is associated with time information corresponding to the predicted waiting time.

With the above configuration, the response is corrected using the correction content information associated with the time information corresponding to the predicted waiting time, so the response can be corrected according to the length of the predicted waiting time. For example, when the predicted waiting time is long, speech indicating that the device was thinking for a long time is additionally output. In other words, the device can respond in a manner suited to the length of the predicted waiting time, which prevents the user from feeling stress when communicating with the voice dialogue device.

The voice dialogue control device according to aspect 6 of the present invention may, in aspect 3 or 5, further include a voice attribute identification unit (voice recognition unit 13) that identifies a voice attribute indicating the category of the content of the voice. The correction content information may further be associated with a response attribute indicating the category of the correction content, and the correction unit may correct the response using the correction content information associated with the response attribute that corresponds to the voice attribute identified by the voice attribute identification unit.

With the above configuration, the response is corrected using correction content information that is associated with the time information corresponding to the waiting time or the predicted waiting time and that is also associated with the response attribute corresponding to the voice attribute. The voice dialogue device can therefore give a response that has been corrected more appropriately for the voice uttered by the user.
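The two-way association of aspect 6 (time information plus response attribute) could be modelled, for instance, as a dictionary keyed by both. Everything named below (`VOICE_TO_RESPONSE_ATTRIBUTE`, `CORRECTION_CONTENT`, the band names and thresholds, the phrases) is hypothetical and chosen only to make the idea concrete.

```python
# Hypothetical mapping from the voice attribute (category of the utterance)
# to the response attribute (category of the correction content).
VOICE_TO_RESPONSE_ATTRIBUTE = {
    "weather": "weather",
    "schedule": "schedule",
}

# Hypothetical correction content information, keyed by
# (response attribute, time information). Phrases are illustrative only.
CORRECTION_CONTENT = {
    ("weather", "long"): "I checked the forecast carefully. ",
    ("weather", "very_long"): "Sorry, the forecast took a while to arrive. ",
    ("schedule", "long"): "I went through your calendar. ",
}


def time_information(waiting_time):
    """Map a waiting time in seconds to a coarse band (thresholds are assumed)."""
    if waiting_time > 10.0:
        return "very_long"
    if waiting_time > 3.0:
        return "long"
    return "normal"


def correct_response(response, voice_attribute, waiting_time):
    """Prepend the correction content matching both attribute and time band."""
    response_attribute = VOICE_TO_RESPONSE_ATTRIBUTE.get(voice_attribute)
    key = (response_attribute, time_information(waiting_time))
    # When no entry matches, the response is returned unchanged.
    return CORRECTION_CONTENT.get(key, "") + response
```

The same function works with either the measured or the predicted waiting time, since only the resulting time band is used for the lookup.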

The voice dialogue control device according to aspect 7 of the present invention may, in aspect 1, further include a filler operation determination unit (71) that determines a filler operation to be executed by the voice dialogue device during the waiting time, and the determination unit may determine that the correction condition has been satisfied when the filler operation determination unit has decided to make the voice dialogue device perform at least one of speech output and a gesture as the filler operation.

A filler operation fills the waiting time from acquisition of the voice until the response becomes ready for output, so the fact that the voice dialogue device executes at least one of speech output and a motion as that operation means that generating the response takes time. With the above configuration, the correction condition is determined to be satisfied when the voice dialogue device is made to perform at least one of speech output and a gesture as the filler operation, so the fact that generating the response took time can be determined accurately.
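The filler-based determination of aspect 7 can be illustrated with a minimal sketch. The `FillerOperation` type and its fields are hypothetical; the specification only states that the condition is met when speech output and/or a gesture has been chosen as the filler operation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FillerOperation:
    """Hypothetical record of what was decided for the waiting time.

    Both fields left as None models the case where no filler is performed.
    """

    speech: Optional[str] = None   # e.g. "Hmm, let me think..."
    gesture: Optional[str] = None  # e.g. "tilt_head"


def correction_needed(filler: FillerOperation) -> bool:
    """Correction condition of aspect 7: speech and/or a gesture was chosen."""
    return filler.speech is not None or filler.gesture is not None
```

Under this sketch no timer is needed: the decision to perform a filler operation itself serves as the signal that the response was slow.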

The voice dialogue control device according to aspect 8 of the present invention may, in aspect 7, further include a waiting time prediction unit that predicts the waiting time, and the correction unit may correct the response using correction content information that indicates how the response is to be corrected and that is associated with time information corresponding to the predicted waiting time predicted by the waiting time prediction unit.

With the above configuration, the response is corrected using the correction content information associated with the time information corresponding to the predicted waiting time, so the response can be corrected according to the length of the predicted waiting time. In other words, the device can respond in a manner suited to the length of the predicted waiting time, which prevents the user from feeling stress when communicating with the voice dialogue device.

In the voice dialogue control device according to aspect 9 of the present invention, in aspect 8, the correction content information may further be associated with a response attribute indicating the category of the correction content, the filler operation determination unit may identify a filler operation attribute indicating the category of the filler operation it has determined, and the correction unit may correct the response using the correction content information associated with the response attribute that corresponds to the filler operation attribute identified by the filler operation determination unit.

With the above configuration, the response is corrected using correction content information that is associated with the time information corresponding to the waiting time and that is also associated with the response attribute corresponding to the filler operation attribute. The voice dialogue device can therefore give a response whose correction matches the filler operation it executed.

In the voice dialogue control device according to aspect 10 of the present invention, in any one of aspects 1 to 9, the correction unit may include at least part of the words contained in the voice in the corrected response.

When it has taken time from acquisition of the voice uttered by the user until the response is generated, it is desirable for the response to indicate what the voice was about. With the above configuration, at least part of the words contained in the voice is therefore included in the corrected response, which allows communication between the user and the voice dialogue device to proceed smoothly. Including at least part of the words contained in the voice in the corrected response means, for example, correcting the response "It's sunny" (「晴れだよ」) to the voice "What's the weather today?" (「今日の天気はなに？」) into "Today's weather is sunny" (「今日の天気は晴れだよ」) by reusing part of the voice. The response may also be corrected using edited voice content, that is, parts of the words contained in the voice that have been cut out and reassembled. For example, the words 「今日」 ("today") and 「は」 (a topic particle) may be cut out of the voice data for "What's the weather today?" and reassembled into voice data for 「今日は」 ("today"), and the response "It's sunny" may be corrected to "Today it's sunny" (「今日は晴れだよ」).
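As a rough text-level analogue of this idea, the sketch below reuses selected words of the utterance as a prefix of the response. It is an assumption-heavy simplification: the specification describes cutting out and reassembling voice data, whereas this example works on English text, and the function name `include_utterance_words` and the `keep` parameter are hypothetical.

```python
def include_utterance_words(utterance, response, keep):
    """Prepend the words of `utterance` that appear in `keep`, in spoken order.

    `keep` is a hypothetical set of lowercase words worth echoing back; how
    such words would be selected is not specified here.
    """
    # Naive tokenisation: split on whitespace and strip basic punctuation.
    words = [w.strip("?!.,").lower() for w in utterance.split()]
    reused = [w for w in words if w and w in keep]
    if not reused or not response:
        return response
    prefix = " ".join(reused)
    # Capitalise the reused words and lowercase the first letter of the
    # original response so that the result reads as one sentence.
    return prefix[0].upper() + prefix[1:] + ", " + response[0].lower() + response[1:]
```

For example, `include_utterance_words("What is the weather today?", "It's sunny.", {"today"})` yields "Today, it's sunny.". Lowercasing the first letter is a simplification that would be wrong for responses starting with a proper noun or "I".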

The control method of a voice dialogue control device according to aspect 11 of the present invention includes: a response generation step of generating a response, to be executed by a voice dialogue device, to a voice uttered by a user; a determination step of determining whether a correction condition, used to decide whether the response needs to be corrected, has been satisfied during the waiting time from acquisition of the voice until the response becomes ready for output; a correction step of generating, when the determination step determines that the correction condition has been satisfied, a corrected response by correcting the response generated in the response generation step; and a response execution step of causing the voice dialogue device to execute the corrected response generated in the correction step. This control method provides the same effects as the voice dialogue control device according to aspect 1.

The voice dialogue device according to aspect 12 of the present invention may include the voice dialogue control device according to any one of aspects 1 to 10. With this configuration, the voice dialogue device can improve the flexibility of its communication with the user.

The voice dialogue control device according to each aspect of the present invention may be realized by a computer. In that case, a control program for the voice dialogue control device that realizes the device on the computer by causing the computer to operate as each unit (software element) of the voice dialogue control device, and a computer-readable recording medium on which that program is recorded, also fall within the scope of the present invention.

The present invention is not limited to the embodiments described above, and various modifications are possible within the scope of the claims. Embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention. Furthermore, new technical features can be formed by combining the technical means disclosed in the respective embodiments.

The present invention can be used for a voice dialogue control device for controlling a voice dialogue device that responds to a user's utterance.

1, 1a, 1b: control unit (voice dialogue control device); 10, 10a, 10b: voice dialogue device; 11: waiting time measurement unit; 13: voice recognition unit (voice attribute identification unit); 14: response generation unit; 15: response execution unit; 16: waiting time prediction unit; 21, 21b: correction necessity determination unit (determination unit); 22, 22a, 22b: correction execution unit (correction unit); 71: filler operation determination unit; S4: response generation step; S6: determination step; S7: correction step; S8: response execution step

Claims

A voice dialogue control device comprising:
a response generation unit that generates a response, to be executed by a voice dialogue device, to a voice uttered by a user;
a determination unit that determines whether a correction condition, used to decide whether the response needs to be corrected, has been satisfied during a waiting time from acquisition of the voice until the response becomes ready for output;
a correction unit that, when the determination unit determines that the correction condition has been satisfied, generates a corrected response by correcting the response generated by the response generation unit;
a response execution unit that causes the voice dialogue device to execute the corrected response generated by the correction unit; and
a waiting time prediction unit that predicts the waiting time,
wherein the determination unit determines that the correction condition has been satisfied when the predicted waiting time predicted by the waiting time prediction unit exceeds a predetermined time.

The voice dialogue control device according to claim 1, wherein the correction unit corrects the response using correction content information that indicates how the response is to be corrected and that is associated with time information corresponding to the waiting time.

A control method of a voice dialogue control device, the method comprising:
a response generation step of generating a response, to be executed by a voice dialogue device, to a voice uttered by a user;
a determination step of determining whether a correction condition, used to decide whether the response needs to be corrected, has been satisfied during a waiting time from acquisition of the voice until the response becomes ready for output;
a correction step of generating, when the determination step determines that the correction condition has been satisfied, a corrected response by correcting the response generated in the response generation step;
a response execution step of causing the voice dialogue device to execute the corrected response generated in the correction step; and
a step of predicting the waiting time,
wherein, in the determination step, the correction condition is determined to have been satisfied when the predicted waiting time predicted in the step of predicting the waiting time exceeds a predetermined time.

A voice dialogue device comprising the voice dialogue control device according to claim 1 or 2.