JP7159773B2

JP7159773B2 - VOICE OPERATING DEVICE, VOICE OPERATING METHOD, AND VOICE OPERATING SYSTEM

Info

Publication number: JP7159773B2
Application number: JP2018193051A
Authority: JP
Inventors: 浩司竹井
Original assignee: Sumitomo Electric Industries Ltd
Current assignee: Sumitomo Electric Industries Ltd
Priority date: 2018-10-12
Filing date: 2018-10-12
Publication date: 2022-10-25
Anticipated expiration: 2038-10-12
Also published as: JP2020061046A

Description

The present invention relates to a voice operation device, a voice operation method, and a voice operation system.

In recent years, smart speakers that recognize a user's speech and perform device operation, information search, and the like according to the recognition result have become widespread (see, for example, Patent Documents 1 and 2). A smart speaker acquires voice data uttered by the user through a microphone and performs recognition on that voice data. When the smart speaker recognizes an utterance of the word used to activate it, called a hotword (also called a wake word), it transitions from a standby state to an active state in which device operation, information search, and the like are possible. After transitioning to the active state, the smart speaker acts on the recognition result of the voice data uttered by the user, for example by operating a home appliance such as an air conditioner or by performing an information search using the recognition result as a keyword.

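Patent Document 1: Japanese Unexamined Patent Application Publication No. 2017-76117
Patent Document 2: Japanese Unexamined Patent Application Publication (Translation of PCT Application) No. 2016-505888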

A smart speaker is assigned a single fixed word as its hotword. Therefore, when the hotword is played back, for example in a television commercial (CM) for the smart speaker, the smart speaker may react to the audio output from the television and be activated.

It is also conceivable that a malicious third party outside the home remotely operates a playback device capable of audio output (for example, a telephone with a speaker) installed in the same user's home as the smart speaker, and causes that playback device to output audio. In this case, the third party can activate the smart speaker by having the playback device play the hotword, and can then operate home appliances and the like by voice. Thus, when the hotword is fixed, there is a problem that a third party may operate devices in the home or use services by impersonation.

This problem is not limited to smart speakers. It applies equally to other voice operation devices, such as set-top boxes, that are activated by speech recognition of a hotword and subsequently accept voice operations.

The present invention has been made in view of such circumstances, and its object is to provide a voice operation device, a voice operation method, a computer program, and a voice operation system capable of preventing unauthorized voice operation by a third party other than the user of the voice operation device.

(1) To achieve the above object, a voice operation device according to one embodiment of the present invention includes: a display control unit that displays a hotword on a screen; a voice acquisition unit that acquires voice data uttered by a user; an utterance determination unit that determines, based on the voice data, whether the hotword has been uttered; and a voice operation unit that permits voice operation by the user when it is determined that the hotword has been uttered.

(11) A voice operation method according to another embodiment of the present invention includes the steps of: displaying a hotword on a screen; acquiring voice data uttered by a user; determining, based on the voice data, whether the hotword has been uttered; and permitting voice operation by the user when it is determined that the hotword has been uttered.

(12) A computer program according to another embodiment of the present invention causes a computer to function as: a display control unit that displays a hotword on a screen; a voice acquisition unit that acquires voice data uttered by a user; an utterance determination unit that determines, based on the voice data, whether the hotword has been uttered; and a voice operation unit that permits voice operation by the user when it is determined that the hotword has been uttered.

(13) A voice operation system according to another embodiment of the present invention includes: a display device; a display control unit that displays a hotword on a screen of the display device; a voice acquisition unit that acquires voice data uttered by a user; an utterance determination unit that determines, based on the voice data, whether the hotword has been uttered; and a voice operation unit that permits voice operation by the user when it is determined that the hotword has been uttered.

The present invention can also be implemented as a semiconductor integrated circuit that realizes part or all of the voice operation device.

According to the present invention, unauthorized voice operation by a third party other than the user of the voice operation device can be prevented.

FIG. 1 is a diagram showing the configuration of a voice operation system according to an embodiment of the present invention.
FIG. 2 is a flowchart showing an example of the processing procedure of an STB (set-top box) according to the embodiment of the present invention.
FIG. 3 is a diagram showing an example of the operation of the voice operation system according to the embodiment of the present invention.

[Overview of Embodiments of the Present Invention]
First, an overview of embodiments of the present invention is listed and described.
(1) A voice operation device according to one embodiment of the present invention includes: a display control unit that displays a hotword on a screen; a voice acquisition unit that acquires voice data uttered by a user; an utterance determination unit that determines, based on the voice data, whether the hotword has been uttered; and a voice operation unit that permits voice operation by the user when it is determined that the hotword has been uttered.

With this configuration, the hotword is displayed on the screen, and voice operation by the user is permitted when the hotword displayed on the screen is uttered. A third party who cannot see the screen cannot know the hotword and therefore cannot utter it, so voice operation by the third party can be refused. Unauthorized voice operation by a third party other than the user of the voice operation device can thus be prevented.
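The four units of configuration (1) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes speech has already been recognized into a text transcript, and the class name, field names, and example hotword are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VoiceOperationDevice:
    """Toy model of configuration (1); every name here is illustrative."""

    hotword: str
    screen: List[str] = field(default_factory=list)  # stands in for the display
    voice_operation_permitted: bool = False

    def display_hotword(self) -> None:
        # Display control unit: show the current hotword on the screen.
        self.screen.append(self.hotword)

    def on_utterance(self, transcript: str) -> bool:
        # Voice acquisition is assumed to have produced `transcript` already.
        # Utterance determination unit: was the displayed hotword spoken?
        spoken = self.hotword.lower() in transcript.lower()
        # Voice operation unit: permit voice operation only on a match.
        if spoken:
            self.voice_operation_permitted = True
        return spoken
```

A caller would invoke `display_hotword()` once and then feed each recognized utterance to `on_utterance()`; commands are acted on only after `voice_operation_permitted` becomes true.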

(2) Preferably, the screen is located in the same place as the voice operation device.

With this configuration, the hotword is displayed on a screen located in the same place as the voice operation device, and voice operation by the user is permitted when the hotword displayed on the screen is uttered. Therefore, for example, a third party who is not in the user's home, where the voice operation device is installed, cannot see the screen and cannot know the hotword. The third party therefore cannot utter the hotword, so voice operation by the third party can be refused. Unauthorized voice operation by a third party other than the user of the voice operation device can thus be prevented.

(3) More preferably, the display control unit displays the hotword on the screen for a predetermined period, and the utterance determination unit determines whether the hotword has been uttered based on the voice data uttered during the predetermined period.

With this configuration, voice operation can be permitted only when the hotword is uttered while it is displayed on the screen. Voice operation is therefore never permitted on the basis of a different hotword that was displayed on the screen in the past. This further helps prevent unauthorized voice operation by a third party.
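One way to picture configuration (3) is as a time-window check: an utterance counts only if it matches the hotword and falls inside that hotword's display period. The sketch below assumes timestamps in seconds and an exact string match; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisplayedHotword:
    """A hotword together with the period during which it is on screen."""

    word: str
    shown_at: float   # time the word appeared on the screen, in seconds
    duration: float   # length of the display period, in seconds

    def accepts(self, spoken_word: str, spoken_at: float) -> bool:
        # The utterance must fall inside the display period ...
        in_window = self.shown_at <= spoken_at < self.shown_at + self.duration
        # ... and must match the word that was displayed in that period.
        return in_window and spoken_word == self.word
```

Because each displayed hotword carries its own window, a word shown in an earlier period is rejected once that period has ended.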

(4) The display control unit may change the hotword according to a predetermined change condition.

With this configuration, even if the hotword has been leaked to a third party, the hotword is changed according to the predetermined change condition, so unauthorized voice operation by the third party can be prevented. It also reduces the likelihood that the voice operation device is activated in response to, for example, the audio of a television commercial that reads out a fixed hotword.

(5) The display control unit may change the hotword periodically.

With this configuration, the hotword is changed periodically, which further helps prevent unauthorized voice operation by a third party.
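Periodic change as in configuration (5) could be sketched by deriving the active hotword from the current time. The rotation scheme, function name, and parameters below are assumptions made for illustration; the source does not specify how the period is implemented.

```python
from typing import Sequence


def hotword_at(words: Sequence[str], period_s: float, now_s: float) -> str:
    """Return the hotword for time now_s when words rotate every period_s seconds."""
    if not words:
        raise ValueError("at least one hotword is required")
    if period_s <= 0:
        raise ValueError("period_s must be positive")
    slot = int(now_s // period_s)      # index of the current period
    return words[slot % len(words)]    # cycle through the stored words
```

With three stored words and a 60-second period, the first word is active for seconds 0 to 59, the second for 60 to 119, and so on, wrapping around after the third.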

(6) The display control unit may change the hotword based on the determination result of the utterance determination unit.

With this configuration, the hotword can be changed according to the result of determining whether the hotword has been uttered. For example, the hotword can be changed when it is determined that the hotword was not uttered, or when it is determined that it was uttered.

(7) The display control unit may change the hotword based on the number of times it has been determined that the hotword was not uttered.

With this configuration, the hotword can be changed when, for example, it is determined a predetermined number of consecutive times that the hotword was not uttered. Therefore, the hotword can be changed when a third party tries and fails to operate the voice operation device, for example by remotely causing a playback device to output audio. This prevents the voice operation device from being operated illicitly by a third party.
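The consecutive-failure condition in configuration (7) amounts to a small counter. The sketch below is one possible reading: the threshold value and the choice to reset the counter after a change are assumptions, and the class name is hypothetical.

```python
class FailureCounter:
    """Signals a hotword change after `threshold` consecutive failed determinations."""

    def __init__(self, threshold: int) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, hotword_spoken: bool) -> bool:
        """Record one determination result; return True if the hotword should change."""
        if hotword_spoken:
            self.consecutive_failures = 0   # a success breaks the streak
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold:
            self.consecutive_failures = 0   # start counting afresh for the new hotword
            return True
        return False
```

Resetting on success means that occasional misrecognitions by the legitimate user do not accumulate toward a change.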

(8) The display control unit may change the hotword when the voice operation has ended.

With this configuration, the hotword can be changed each time a voice operation by the user ends. This makes it harder for the hotword to be leaked to a third party and prevents the voice operation device from being operated illicitly by a third party.

(9) The display control unit may select a word from among a plurality of words stored in advance in a storage unit, and display the selected word on the screen as the hotword.

With this configuration, a word stored in advance in the storage unit can be used as the hotword, so a speech recognition model capable of recognizing that word can be created before the voice operation device is shipped. This eliminates the need to train the speech recognition model before the utterance determination unit performs its determination by speech recognition.

(10) The display control unit may display a word determined by the user on the screen as the hotword.

With this configuration, a word determined by the user can be used as the hotword. The user can therefore choose the hotword freely, which makes it harder for the hotword to be leaked to a third party.

(11) A voice operation method according to another embodiment of the present invention includes the steps of: displaying a hotword on a screen; acquiring voice data uttered by a user; determining, based on the voice data, whether the hotword has been uttered; and permitting voice operation by the user when it is determined that the hotword has been uttered.

This configuration includes steps corresponding to the characteristic processing units of the voice operation device described above. It can therefore provide the same operation and effects as the voice operation device described above.

(12) A computer program according to another embodiment of the present invention causes a computer to function as: a display control unit that displays a hotword on a screen; a voice acquisition unit that acquires voice data uttered by a user; an utterance determination unit that determines, based on the voice data, whether the hotword has been uttered; and a voice operation unit that permits voice operation by the user when it is determined that the hotword has been uttered.

With this configuration, a computer can be made to function as the voice operation device described above. It can therefore provide the same operation and effects as the voice operation device described above.

(13) A voice operation system according to another embodiment of the present invention includes: a display device; a display control unit that displays a hotword on a screen of the display device; a voice acquisition unit that acquires voice data uttered by a user; an utterance determination unit that determines, based on the voice data, whether the hotword has been uttered; and a voice operation unit that permits voice operation by the user when it is determined that the hotword has been uttered.

This voice operation system includes the voice operation device described above as part of its configuration. It can therefore provide the same operation and effects as the voice operation device described above.

[Details of Embodiments of the Present Invention]
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. Each of the embodiments described below represents a preferred specific example of the present invention. The numerical values, shapes, materials, components, arrangement and connection of components, steps, order of steps, and the like shown in the following embodiments are examples and are not intended to limit the present invention. The present invention is defined by the claims. Accordingly, among the components in the following embodiments, those not recited in the independent claims, which represent the broadest concept of the present invention, are not necessarily required to achieve the object of the present invention but are described as constituting a more preferred form.

The same components are denoted by the same reference numerals. Their functions and names are also the same, so their description is omitted as appropriate.

<Overall Configuration of the Voice Operation System>
FIG. 1 is a diagram showing the configuration of a voice operation system according to an embodiment of the present invention.
Referring to FIG. 1, the voice operation system 1 includes an STB 10 and a display device 20.

The STB 10 functions as the voice operation device. It acquires voice data of speech uttered by the user 40 and, based on the acquired voice data, operates the STB 10 itself or a household appliance connected to the STB 10, such as the air conditioner 30. Voice data of speech uttered by the user 40 includes not only voice data of speech uttered by the user 40 in person but also voice data of speech produced by a device capable of producing speech when the user 40 operates that device. For example, when a user 40 with a speech impairment operates a speech synthesizer, the voice data of the synthesized speech produced by that device is also included in the voice data of speech uttered by the user 40. In other words, voice data of speech uttered by the user 40 means voice data of speech based on the user 40's intention to speak.

The household appliance is not limited to the air conditioner 30 and may be any other appliance connected to the STB 10 by wire or wirelessly. For example, the household appliance may be a lighting fixture that turns its power on or off or adjusts its light level according to instructions from the STB 10.

The voice operation device is also not limited to the STB 10 and may be any other device that can be operated by voice based on voice data uttered by the user 40. For example, the voice operation device may be a smart speaker that searches for information or operates household appliances based on voice data.

The display device 20 is connected to the STB 10 by wire or wirelessly. The display device 20 is, for example, a display connected to the STB 10 by an HDMI (registered trademark) (High-Definition Multimedia Interface) cable. The display device 20 displays video data or image data output from the STB 10 on its screen. The display device 20 may be built into a voice operation device such as the STB 10 or a smart speaker. That is, the voice operation device may have its own display screen. Conversely, the functions of the STB 10, a smart speaker, or the like may be built into the display device 20.

The STB 10 performs control to display, on the display device 20, a hotword that is used to activate the functions of the STB 10 relating to voice operation (the voice operation unit 13 and the playback processing unit 14, described later) and to permit voice operation by the user 40.

The STB 10 includes a voice acquisition unit 11, a speech recognition unit 12, a voice operation unit 13, a playback processing unit 14, a video output unit 15, a hotword display control unit 16, and a storage unit 17.

The voice acquisition unit 11 acquires voice data uttered by the user 40. Specifically, the voice acquisition unit 11 includes a microphone, converts the sound input to the microphone into voice data by A/D (analog-to-digital) conversion, and acquires the converted voice data.

The speech recognition unit 12 functions as the utterance determination unit and determines, based on the voice data acquired by the voice acquisition unit 11, whether the user 40 has uttered the hotword. That is, the speech recognition unit 12 performs speech recognition processing on the voice data and determines whether the voice data contains an utterance of the hotword. The speech recognition unit 12 transmits a determination result signal indicating the determination result to the voice operation unit 13 and the hotword display control unit 16.

Known techniques can be used for the speech recognition processing. For example, speech recognition can be performed using a hidden Markov model or a neural network trained by deep learning.

The speech recognition unit 12 can also recognize words other than the hotword. For example, the speech recognition unit 12 can recognize, from the voice data, words for operating the air conditioner 30 (for example, "30°C" or "power on"). The speech recognition unit 12 transmits the speech recognition result to the voice operation unit 13.

The voice operation unit 13 receives the determination result signal and the speech recognition result from the speech recognition unit 12. When the determination result signal indicates that the hotword has been uttered, the voice operation unit 13 permits voice operation by the user 40 and executes the voice operation indicated by the received speech recognition result.

For example, when the speech recognition result indicates "air conditioner power on", the voice operation unit 13 reads from the storage unit 17 the operation signal, associated with that speech recognition result, for turning on the air conditioner 30, and transmits it to the air conditioner 30. The air conditioner 30 receives the operation signal and turns its power on accordingly.

When the speech recognition result indicates "play the content titled A", the voice operation unit 13 reads from the storage unit 17 the operation signal, associated with that speech recognition result, for instructing playback of the content titled A, and transmits it to the playback processing unit 14.
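The lookup described above, from a speech recognition result to a stored operation signal and its destination, can be sketched as a simple table. The table contents, signal strings, and function name are hypothetical placeholders; the source does not define the format of an operation signal.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for the word-to-signal associations kept in the storage unit.
# Each entry maps a recognized phrase to (destination, operation signal).
OPERATION_SIGNALS: Dict[str, Tuple[str, str]] = {
    "air conditioner power on": ("air_conditioner", "POWER_ON"),
    "play title a": ("playback", "PLAY:Title A"),
}


def dispatch(permitted: bool, recognition_result: str) -> Optional[Tuple[str, str]]:
    """Return (destination, signal) for a recognized phrase, or None if nothing is sent."""
    if not permitted:
        # Voice operation has not been permitted, so no signal is issued.
        return None
    key = recognition_result.strip().lower()
    return OPERATION_SIGNALS.get(key)   # None when the phrase is not registered
```

Gating on `permitted` first mirrors the text: operation signals are read and sent only after the hotword determination has succeeded.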

The playback processing unit 14 performs processing on content, such as playback, stop, fast-forward, and rewind, according to the operation signal from the voice operation unit 13. For example, the playback processing unit 14 reads the content data of title A specified by the operation signal from the storage unit 17 and outputs the read content data to the video output unit 15.

The video output unit 15 receives the content data from the playback processing unit 14 and transmits it to the display device 20, thereby causing the content data to be displayed on the screen of the display device 20.

The hotword display control unit 16 functions as the display control unit and performs control for displaying the hotword on the screen of the display device 20. For example, the hotword display control unit 16 reads a hotword stored in advance in the storage unit 17 and outputs the read hotword to the video output unit 15.

The video output unit 15 receives the hotword from the hotword display control unit 16 and transmits it to the display device 20, thereby causing the hotword to be displayed on the screen of the display device 20.

The storage unit 17 is a storage device for storing various data such as content data and hotwords, and is composed of, for example, a non-volatile memory such as a flash memory or a magnetic storage device such as an HDD (hard disk drive).

One or more hotwords are assumed to be registered in advance in the storage unit 17. A hotword may be one stored in the storage unit 17 by the manufacturer of the STB 10 or the like before the STB 10 is shipped, or one determined by the user 40 and stored in the storage unit 17 after the STB 10 is shipped.

Words for voice operation by the voice operation unit 13 are also registered in advance in the storage unit 17, together with the operation signals for the playback processing unit 14 or the air conditioner 30 associated with those words.

<Processing Flow of the STB 10>
FIG. 2 is a flowchart showing an example of the processing procedure of the STB according to the embodiment of the present invention.

Referring to FIG. 2, the hotword display control unit 16 reads one of the hotwords stored in the storage unit 17 and outputs it to the video output unit 15. The hotwords may be read in random order or in a predetermined order (for example, Japanese syllabary order). The video output unit 15 receives the hotword from the hotword display control unit 16 and transmits it to the display device 20, thereby causing the hotword to be displayed on the screen of the display device 20 (S1). The hotword display control unit 16 displays the hotword on the screen for a predetermined period. For example, it is desirable that the hotword display control unit 16 keep the hotword displayed the whole time the STB 10 is running. Alternatively, the hotword display control unit 16 may display the hotword for a predetermined time (for example, 5 minutes) from when the STB 10 starts up or from when the hotword is changed.
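The two read-out orders mentioned for step S1, random or predetermined, can be sketched as an iterator over the stored hotwords. The function name and parameters are hypothetical, and Python's default string sort is used here only as a stand-in for a predetermined order such as syllabary order.

```python
import itertools
import random
from typing import Iterator, Optional, Sequence


def hotword_order(words: Sequence[str], randomize: bool = False,
                  seed: Optional[int] = None) -> Iterator[str]:
    """Yield hotwords endlessly, either at random or cycling in sorted order."""
    if not words:
        raise ValueError("at least one hotword is required")
    pool = list(words)
    if randomize:
        rng = random.Random(seed)
        # Random order: draw a stored word independently each time.
        return (rng.choice(pool) for _ in itertools.count())
    # Predetermined order: sort once, then repeat the sequence forever.
    return itertools.cycle(sorted(pool))
```

Calling `next()` on the returned iterator each time step S1 runs gives the hotword to display.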

The voice acquisition unit 11 then determines whether voice data uttered by the user 40 has been acquired (S2).

If voice data has been acquired (YES in S2), the speech recognition unit 12 performs speech recognition on the acquired voice data to determine whether it contains an utterance of the hotword displayed on the screen of the display device 20 (S3).

If the voice data contains an utterance of the hotword, that is, if the hotword has been uttered (YES in S3), the voice acquisition unit 11 waits until it acquires voice data uttered by the user 40 (S4).

If voice data has been acquired (YES in S4), the speech recognition unit 12 performs speech recognition on the acquired voice data to determine whether it contains an utterance of a word for voice operation (S5).

If the voice data contains an utterance of a word for voice operation, that is, if a voice operation command has been uttered (YES in S5), the voice operation unit 13 executes the voice operation by reading the operation signal corresponding to the word for voice operation from the storage unit 17 and transmitting it to the reproduction processing unit 14 or the air conditioner 30 (S6). That is, the reproduction processing unit 14 that has received the operation signal may read content from the storage unit 17 based on the operation signal and output it to the video output unit 15. The video output unit 15 acquires the content from the reproduction processing unit 14 and displays it on the screen of the display device 20. Likewise, the air conditioner 30 that has received the operation signal turns its power on or off or changes its set temperature based on the operation signal.

After the voice operation has been executed, the hotword display control unit 16 changes the hotword by reading from the storage unit 17 a hotword different from the one displayed on the screen of the display device 20 (S7). Control then returns to step S1. As a result, the screen of the display device 20 displays a hotword different from the one displayed until then.
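The hotword change in step S7 can be illustrated with a minimal Python sketch. The function name `next_hotword`, its parameters, and the example words other than "lion" are hypothetical and are not part of the embodiment; the sketch only assumes that the storage unit 17 holds a list of words and that the new hotword must differ from the displayed one whenever more than one word is registered.

```python
import random
from typing import Optional, Sequence


def next_hotword(words: Sequence[str], current: Optional[str],
                 rng: random.Random, in_order: bool = False) -> str:
    """Return a hotword that differs from `current` whenever possible (S7)."""
    unique = list(dict.fromkeys(words))  # drop duplicates, keep stored order
    if not unique:
        raise ValueError("no hotwords registered")
    candidates = [w for w in unique if w != current]
    if not candidates:
        # Only one word is registered, so the hotword cannot be changed.
        return unique[0]
    if in_order:
        # Predetermined order: take the word after the current one, wrapping.
        if current in unique:
            return unique[(unique.index(current) + 1) % len(unique)]
        return unique[0]
    # Random order: any registered word other than the current one.
    return rng.choice(candidates)
```

Passing a seeded `random.Random` keeps the random case reproducible in tests; a real device would more plausibly use an unpredictable source, which is a design choice outside the scope of this sketch.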

If voice data has not been acquired (NO in S2), the hotword display control unit 16 determines whether a predetermined hotword change condition is satisfied (S8).

Also, if voice data has been acquired but does not contain an utterance of the hotword, that is, if the hotword has not been uttered (NO in S3), the hotword display control unit 16 determines whether the hotword change condition is satisfied (S8).

Furthermore, if the voice data does not contain an utterance of a word for voice operation, that is, if no voice operation command has been uttered (NO in S5), the hotword display control unit 16 determines whether the hotword change condition is satisfied (S8).

If the change condition is satisfied (YES in S8), the hotword change process (S7) is executed. If the change condition is not satisfied (NO in S8), the hotword display process (S1) is executed without changing the hotword.
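The flow of steps S1 to S8 can be sketched as a small state machine. This is an illustrative Python sketch under stated assumptions, not the implementation of the STB 10: the class `VoiceOperationFlow`, the `should_change` hook standing in for the S8 decision, and the use of recognized text transcripts in place of voice data are all hypothetical.

```python
from typing import Callable, Dict, List, Optional


class VoiceOperationFlow:
    """Illustrative model of steps S1 to S8; all names are hypothetical."""

    def __init__(self, hotwords: List[str], commands: Dict[str, str],
                 should_change: Callable[[], bool]) -> None:
        self.hotwords = hotwords
        self.commands = commands            # command word -> operation signal
        self.should_change = should_change  # stand-in for the S8 decision
        self.index = 0
        self.awaiting_command = False

    @property
    def displayed(self) -> str:
        """S1: the hotword currently shown on the screen."""
        return self.hotwords[self.index]

    def _change_hotword(self) -> None:
        """S7: switch to another hotword (here simply the next in the list)."""
        self.index = (self.index + 1) % len(self.hotwords)

    def _check_change_condition(self) -> None:
        """S8: change the hotword only if the condition holds; back to S1."""
        self.awaiting_command = False
        if self.should_change():
            self._change_hotword()

    def on_audio(self, transcript: Optional[str]) -> Optional[str]:
        """Process one recognition result (None means no voice data).

        Returns the operation signal when a voice operation is executed.
        """
        if not self.awaiting_command:
            # S2 and S3: was voice data acquired, and does it contain the
            # hotword that is displayed right now?
            if transcript is not None and self.displayed in transcript:
                self.awaiting_command = True
            else:
                self._check_change_condition()
            return None
        if transcript is None:
            return None  # S4: keep waiting for voice data
        # S5: does the voice data contain a word for voice operation?
        for word, signal in self.commands.items():
            if word in transcript:
                self.awaiting_command = False
                self._change_hotword()  # S7 follows the operation
                return signal           # S6: operation signal to transmit
        self._check_change_condition()  # NO in S5
        return None
```

For example, with the hotwords `["lion", "tiger"]` and the command table `{"power on": "AC_ON"}`, saying "lion" and then "power on" yields `"AC_ON"`, after which the displayed hotword becomes "tiger" and "lion" is no longer accepted.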

For example, the hotword display control unit 16 determines that the change condition is satisfied if a predetermined time has elapsed since the hotword was last changed, and that it is not satisfied if the predetermined time has not elapsed. This allows the hotword to be changed periodically.

The hotword display control unit 16 may also determine whether the change condition is satisfied based on the hotword determination result signal received from the voice recognition unit 12. For example, the hotword display control unit 16 may determine that the change condition is satisfied when the number of times the hotword was determined not to have been uttered is equal to or greater than a predetermined threshold, and that it is not satisfied when that number is less than the threshold. Alternatively, the hotword display control unit 16 may determine that the change condition is satisfied when the number of consecutive determinations that the hotword was not uttered exceeds a predetermined threshold, and that it is not satisfied when that number is less than the threshold. The hotword display control unit 16 may further use the number of times the hotword was determined to have been uttered to decide whether the change condition is satisfied. These counts are reset to 0 when the change condition is determined to be satisfied.
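The change conditions described above (elapsed time and consecutive failed determinations) can be combined as in the following Python sketch. The class `ChangeCondition` and its parameters are hypothetical. The comparison for the consecutive count uses "greater than or equal to", which is an assumption, since the text leaves the case of equality open; the injectable `clock` is there only to make the sketch testable.

```python
import time
from typing import Callable, Optional


class ChangeCondition:
    """Illustrative S8 check; names and thresholds are hypothetical."""

    def __init__(self, max_age_s: Optional[float] = None,
                 max_consecutive_misses: Optional[int] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_age_s = max_age_s
        self.max_consecutive_misses = max_consecutive_misses
        self.clock = clock
        self.changed_at = clock()   # time of the last hotword change
        self.consecutive_misses = 0

    def record(self, hotword_spoken: bool) -> None:
        """Feed one determination result from the voice recognition unit."""
        if hotword_spoken:
            self.consecutive_misses = 0
        else:
            self.consecutive_misses += 1

    def satisfied(self) -> bool:
        """Return True when the hotword should be changed, then reset."""
        timed_out = (self.max_age_s is not None
                     and self.clock() - self.changed_at >= self.max_age_s)
        # Assumption: equality with the threshold counts as satisfied.
        too_many = (self.max_consecutive_misses is not None
                    and self.consecutive_misses >= self.max_consecutive_misses)
        if timed_out or too_many:
            self.consecutive_misses = 0       # counts are reset to 0
            self.changed_at = self.clock()    # restart the period
            return True
        return False
```

With `max_age_s=300.0` this reproduces the periodic change, and with `max_consecutive_misses=3` the hotword is replaced after three failed attempts in a row, for instance when a third party keeps guessing.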

Note that, in the standby process (S4), if no voice data is input for a certain period of time or longer, the process may proceed to step S8.

If only one hotword is registered in the storage unit 17, the hotword cannot be changed. In that case, the processing of steps S7 and S8 may be omitted.

<Operation example of the voice operation system>
FIG. 3 is a diagram showing an example of the operation of the voice operation system according to the embodiment of the present invention.

As shown in FIG. 3, the STB 10 and the display device 20 that constitute the voice operation system 1 are assumed to be installed in the home of the user 40. The STB 10 and the display device 20 are preferably installed in the same room, but they do not have to be, as long as the STB 10 and the display device 20 can be connected and the STB 10 can acquire the voice data of the user 40.

For example, when the user 40 turns on the STB 10, the hotword "lion" is displayed on the screen of the display device 20. The user 40, who is inside the home, can see the screen of the display device 20. Therefore, when the user 40 utters "lion", the STB 10 can acquire the voice data of "lion". As a result, the STB 10 transitions to a state in which it can accept voice operations.

On the other hand, a third party 50 outside the home cannot see the screen of the display device 20 and therefore cannot learn the hotword "lion". For example, even if the third party 50 can make the smartphone 70 inside the home output audio by remotely operating it from the smartphone 60 outside the home, to which the smartphone 70 is wirelessly connected, the third party 50 still cannot learn the hotword "lion". Consequently, even if the third party 50 has the smartphone 70 output the audio of some word "xxx", the STB 10 cannot be brought into a state in which voice operation is possible. Unauthorized operation by the third party 50 can thus be prevented.

<Effect of Embodiment>
As described above, according to the present embodiment, a hotword is displayed on the screen of the display device 20, and voice operation by the user 40 is permitted when the hotword displayed on the screen is uttered. A third party 50 who cannot see the screen cannot learn the hotword and therefore cannot utter it, so voice operation by the third party 50 can be rejected. Unauthorized voice operation by a third party 50 other than the user 40 of the STB 10 can thus be prevented.

In addition, the hotword display control unit 16 displays the hotword on the screen of the display device 20 for a predetermined period, and the voice recognition unit 12 can determine whether the hotword has been uttered based on voice data uttered during that period. In other words, voice operation can be permitted only when the hotword is uttered while it is displayed on the screen. Voice operation is therefore never permitted on the basis of another hotword that was displayed on the screen in the past. This further prevents unauthorized voice operation by the third party 50.

The STB 10 also changes the hotword according to a predetermined change condition. Therefore, even if the hotword has been exposed to the third party 50, changing the hotword prevents unauthorized voice operation by the third party 50. It also reduces the possibility that the STB 10 reacts to audio such as a TV commercial that reads out a fixed hotword and is activated.

Also, for example, the hotword can be changed when it is determined a predetermined number of times in a row that the hotword has not been uttered. The hotword can therefore be changed when the third party 50 tries and fails to operate the STB 10, for example by remotely causing the smartphone 70 to output audio. This prevents the STB 10 from being operated without authorization by the third party 50.

The hotword can also be changed each time a voice operation by the user 40 ends. This makes the hotword less likely to be exposed to the third party 50 and prevents the STB 10 from being operated without authorization by the third party 50.

The hotword display control unit 16 can also select a word from among a plurality of words stored in advance in the storage unit 17 and display the selected word on the screen of the display device 20 as the hotword. This allows a voice recognition model capable of recognizing those words to be created before the STB 10 is shipped. Consequently, the voice recognition model does not need to be trained before the voice recognition unit 12 performs the determination process by voice recognition.

Alternatively, the storage unit 17 may store a word determined by the user 40, and the hotword display control unit 16 may read that word from the storage unit 17 and display it on the screen of the display device 20. In other words, because a word determined by the user 40 can be used as the hotword, the hotword can be chosen freely, which makes it less likely to be exposed to the third party 50.

[Appendix]
Specifically, the voice operation device typified by the STB 10 described above may be configured as a computer system comprising a microprocessor, ROM (Read Only Memory), RAM (Random Access Memory), an HDD (Hard Disk Drive), a display unit, a keyboard, a mouse, and the like. A computer program is stored in the RAM or the HDD. The voice operation device achieves its functions by the microprocessor operating according to the computer program. Here, the computer program is composed of a combination of a plurality of instruction codes, each indicating a command to the computer, in order to achieve a predetermined function.

Furthermore, some or all of the components constituting the voice operation device may be implemented as a single system LSI. A system LSI is an ultra-multifunctional LSI manufactured by integrating a plurality of components on a single chip; specifically, it is a computer system that includes a microprocessor, ROM, RAM, and the like. A computer program is stored in the RAM. The system LSI achieves its functions by the microprocessor operating according to the computer program.

The present invention may also be a computer program that causes a computer to implement the method described above.

Furthermore, the present invention may be the above computer program recorded on a non-transitory computer-readable recording medium such as an HDD, a CD-ROM, or a semiconductor memory.

The present invention may also transmit the above computer program via a telecommunication line, a wireless or wired communication line, a network typified by the Internet, data broadcasting, or the like.
The voice operation device may also be realized by a plurality of computers.

Some or all of the functions of the voice operation device may also be provided by cloud computing. That is, some or all of the functions of the voice operation device may be realized by a cloud server. For example, in the STB 10, the function of the voice recognition unit 12 may be realized by a cloud server, with the STB 10 transmitting voice data to the cloud server and acquiring the recognition result for that voice data from the cloud server.

The embodiment disclosed here should be considered illustrative in all respects and not restrictive. The scope of the present invention is indicated not by the above description but by the claims, and is intended to include all modifications within the meaning and scope equivalent to the claims.

1 Voice operation system
10 STB (voice operation device)
11 Voice acquisition unit
12 Voice recognition unit (utterance determination unit)
13 Voice operation unit
14 Reproduction processing unit
15 Video output unit
16 Hotword display control unit (display control unit)
17 Storage unit
20 Display device
30 Air conditioner
40 User
50 Third party
60 Smartphone
70 Smartphone

Claims

1. A voice operation device comprising:
a display control unit that displays a hotword on a screen;
a voice acquisition unit that acquires voice data uttered by a user;
an utterance determination unit that determines, based on the voice data, whether the hotword has been uttered; and
a voice operation unit that permits voice operation by the user when it is determined that the hotword has been uttered,
wherein the screen is at the same location as the voice operation device installed in a home.

2. A voice operation device comprising:
a display control unit that displays a hotword on a screen;
a voice acquisition unit that acquires voice data uttered by a user;
an utterance determination unit that determines, based on the voice data, whether the hotword has been uttered; and
a voice operation unit that permits voice operation by the user when it is determined that the hotword has been uttered,
wherein the display control unit randomly changes the hotword according to a predetermined change condition.

3. A voice operation device comprising:
a display control unit that displays a hotword on a screen;
a voice acquisition unit that acquires voice data uttered by a user;
an utterance determination unit that determines, based on the voice data, whether the hotword has been uttered; and
a voice operation unit that permits voice operation by the user when it is determined that the hotword has been uttered,
wherein the display control unit periodically changes the hotword according to a predetermined change condition.

4. The voice operation device according to claim 3, wherein the display control unit changes the hotword based on the determination result of the utterance determination unit.

5. The voice operation device according to claim 4, wherein the display control unit changes the hotword based on the number of times the hotword is determined not to have been uttered.

6. A voice operation device comprising:
a display control unit that displays a hotword on a screen;
a voice acquisition unit that acquires voice data uttered by a user;
an utterance determination unit that determines, based on the voice data, whether the hotword has been uttered; and
a voice operation unit that permits voice operation by the user when it is determined that the hotword has been uttered,
wherein the display control unit changes the hotword according to a predetermined change condition and based on the determination result of the utterance determination unit.

7. A voice operation device comprising:
a display control unit that displays a hotword on a screen;
a voice acquisition unit that acquires voice data uttered by a user;
an utterance determination unit that determines, based on the voice data, whether the hotword has been uttered; and
a voice operation unit that permits voice operation by the user when it is determined that the hotword has been uttered,
wherein the display control unit changes the hotword when the voice operation ends.

8. A voice operation method comprising:
a step of displaying a hotword on a screen;
a step of acquiring voice data uttered by a user;
a step of determining, based on the voice data, whether the hotword has been uttered; and
a step of permitting voice operation by the user when it is determined that the hotword has been uttered,
wherein, in the step of displaying on the screen, the hotword is changed periodically according to a predetermined change condition.

9. A voice operation method comprising:
a step of displaying a hotword on a screen;
a step of acquiring voice data uttered by a user;
a step of determining, based on the voice data, whether the hotword has been uttered; and
a step of permitting voice operation by the user when it is determined that the hotword has been uttered,
wherein, in the step of displaying on the screen, the hotword is changed according to a predetermined change condition and based on the result of determining whether the hotword has been uttered.

10. A voice operation method comprising:
a step of displaying a hotword on a screen;
a step of acquiring voice data uttered by a user;
a step of determining, based on the voice data, whether the hotword has been uttered; and
a step of permitting voice operation by the user when it is determined that the hotword has been uttered,
wherein, in the step of displaying on the screen, the hotword is changed when the voice operation ends.

11. A voice operation system comprising:
a display device; and
the voice operation device according to any one of claims 1 to 7.