JP3301775B2

JP3301775B2 - Voice recognition control device

Info

Publication number: JP3301775B2
Application number: JP08889192A
Authority: JP
Inventors: 正幸飯田
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1992-04-09
Filing date: 1992-04-09
Publication date: 2002-07-15
Anticipated expiration: 2017-07-15
Also published as: JPH05289690A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Industrial Field of Application] The present invention relates to a voice recognition device, and more particularly to a voice recognition control device that controls audio/video equipment by voice recognition.

[0002]

[Prior Art] Control devices based on voice recognition are used as a means of controlling audio/video equipment (AV equipment) such as radios and televisions.

[0003] FIG. 3 is a schematic configuration diagram of such a conventional, general voice recognition control device. The conventional device consists of an AV device (31), which is the controlled unit, and a remote controller (30), which is the remote control unit. The remote controller (30) sends control signals to the AV device (31) via a wireless medium (32).

[0004] In FIG. 3, (301) is a microphone to which voice is input, and (302) is a voice analysis unit that analyzes the acoustic signal input from the microphone (301) and extracts a time series of feature parameters representing the characteristics of the voice. For example, frequency analysis yields spectral parameters that preserve the level information of the acoustic signal.

[0005] (303) is a voice section cutout unit that cuts out the section in which voice is present (the voice section) from the time series of feature parameters obtained from the voice analysis unit (302). (304) is a pattern creation unit that creates an input voice pattern from the feature parameter time series of that voice section, yielding a voice pattern in which the feature pattern is normalized to a specific time series.

[0006] (305) is a standard pattern memory in which the voice patterns of a large number of standard voices are stored in advance as standard voice patterns. When the voice recognition control device of the figure is intended for unspecified speakers, that is, when the speaker is not fixed, the memory stores, for each of the various utterances, a standard voice pattern that captures average voice characteristics common to all speakers.

[0007] (306) is a comparison/determination unit that pattern-matches the input voice pattern obtained from the voice pattern creation unit (304) against each standard voice pattern in the standard voice pattern memory (305), detects the standard voice pattern with the smallest inter-pattern error, and outputs a recognition result signal corresponding to the detected standard voice pattern.
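The matching step of paragraph [0007] can be illustrated with a minimal sketch. The description does not specify the error measure, so the sum of squared differences used here is an assumption, as are the function name `match_pattern`, the command labels, and the toy pattern values. Patterns are assumed to have already been normalized to the same length, as paragraph [0005] describes.

```python
from typing import Dict, Sequence


def match_pattern(input_pattern: Sequence[float],
                  standards: Dict[str, Sequence[float]]) -> str:
    """Return the label of the standard pattern with the smallest error.

    The inter-pattern error is taken to be the sum of squared differences
    (an assumption; the source only says "inter-pattern error").
    """
    def error(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(standards, key=lambda name: error(input_pattern, standards[name]))


# Hypothetical standard patterns, one per recognizable command.
standards = {
    "volume_up": [1.0, 2.0, 3.0],
    "power_off": [3.0, 2.0, 1.0],
}
result = match_pattern([1.0, 2.0, 2.5], standards)
```

Here `result` is the label that would be turned into a recognition result signal and, in turn, into a control signal.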

[0008] (307) is a remote control transmission unit that converts the recognition result signal obtained from the comparison/determination unit (306) into a control signal for the AV device (31) to be controlled, such as a television, and transmits it to the AV device (31). Transmission from the remote control transmission unit (307) is carried out over a wireless medium (32) such as an optical signal (for example, infrared), a radio signal, or a magnetic signal.

[0009] (308) is a main body receiving unit that receives the control signal transmitted from the remote control transmission unit (307) over the wireless medium (32) and passes it to a control unit (309) that controls the AV device main body (310). The AV device main body (310) has an amplifier (311) for producing audio noise such as voice and music from a speaker (312).

[0010] (320) is an operation panel used when operating the remote controller (30) without voice recognition. It has the various buttons and switches needed to control the AV device (31).

[0011] FIGS. 4 and 5 are signal diagrams showing how voice sections are cut out in the conventional voice recognition control device. FIG. 4 shows how a signal (V) consisting only of voice is cut out in a quiet environment, and FIG. 5 shows how a signal (S) consisting of voice and music is cut out. In these figures, (B) is the reference value for cutting out voice sections, and it has a constant value.

[0012] A voice section is usually detected by finding its start and end points from the level and the fluctuation of the input voice signal. The simplest method of this kind uses a comparison means that compares the level of the voice signal with a predetermined threshold and regards the time region in which the level exceeds the threshold as the voice section. With this method, in the example of FIG. 4, the section (tC1 to tC2) in which the voice signal level (V) exceeds the threshold (B) is detected as the voice section in which voice was uttered.
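The fixed-threshold method of paragraph [0012] can be sketched as follows. This is only an illustration: the function name `detect_voice_section`, the frame-indexed representation of the level, and the sample values are assumptions, and the sketch assumes a single utterance, so it reports the first and last frame whose level exceeds the threshold.

```python
from typing import Optional, Sequence, Tuple


def detect_voice_section(levels: Sequence[float],
                         threshold: float) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices where the level exceeds the threshold.

    Returns None when no frame exceeds the threshold.
    """
    above = [i for i, level in enumerate(levels) if level > threshold]
    if not above:
        return None
    return above[0], above[-1]


# Toy voice-only level sequence (V) and constant reference value (B).
V = [0.0, 0.0, 5.0, 6.0, 7.0, 0.0, 0.0]
B = 2.0
section = detect_voice_section(V, B)
```

In the notation of FIG. 4, the returned pair plays the role of (tC1, tC2).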

[0013] In the conventional voice recognition control device described above, however, audio noise such as music from the speaker (312) always enters the microphone (301) together with the voice. The level (S) of the voice signal therefore rises as shown in FIG. 5, and the range detected as the voice section (tB1 to tB2) becomes wider than the actual voice section.

[0014] Thus, with the conventional technique, when noise is mixed into the voice signal it becomes difficult to detect the time region of the voice accurately, and the recognition rate of voice recognition falls.

[0015] As a voice recognition technique for environments in which such audio noise is present, Japanese Unexamined Patent Publication No. H3-233600 describes a technique that raises the accuracy of voice section detection by varying the reference value for cutting out voice sections in accordance with the output level at the source of the audio noise.

[0016] However, when this technique is applied to the conventional device described above, the level information of the audio noise generated by the AV device must be reflected in the voice recognition unit, which requires the voice recognition unit to be integrated with the AV device. If the two are integrated, the S/N ratio deteriorates under the influence of the audio noise generated by the AV device, which lowers the recognition rate. The recognition unit can instead be placed in the remote controller to avoid this, but in that case the level information of the audio noise from the AV device cannot be reflected in the voice recognition unit.

[0017]

[Problems to Be Solved by the Invention] The present invention has been made in view of the conventional drawbacks described above. Its object is to provide a voice recognition control device in which, when a remote control unit that performs voice recognition controls a controlled system such as a television that generates ambient noise such as voice and music, the remote control unit can perform voice recognition without being affected by the audio noise and can control the controlled system based on the recognition result.

[0018]

[Means for Solving the Problems] The controlled system of the voice recognition control device according to the present invention comprises a level signal sending unit that transmits, to the remote control unit, a level signal that follows changes in the output level of the audio noise.

[0019] The remote control unit comprises a level signal receiving unit that receives the level signal transmitted from the controlled system, a cutout reference value setting unit that sets a reference value for cutting out voice sections based on the level signal, a voice analysis unit that analyzes the input voice and extracts a feature parameter time series of the voice, and a voice pattern creation unit that detects a voice region using the cutout reference value and creates a voice pattern from the feature parameter time series within that voice region.

[0020]

[Operation] In the voice recognition control device according to the present invention, the level signal sending unit in the controlled system transmits to the remote control unit a level signal that follows changes in the output level of the audio noise.

[0021] In the remote control unit, the level signal receiving unit receives the level signal transmitted from the controlled system; the cutout reference value setting unit sets a reference value for cutting out voice sections based on the level signal; the voice analysis unit analyzes the input voice and extracts a feature parameter time series of the voice; the voice pattern creation unit detects a voice region using the cutout reference value and creates a voice pattern based on the feature parameter time series within that voice region; the comparison/determination unit compares the voice pattern with each standard pattern in the standard pattern memory to identify the voice pattern; and the control signal sending means sends the controlled system a control signal based on the comparison/determination result.

[0022]

[Embodiment] The voice recognition control device according to the present invention is described below with reference to the drawings.

[0023] FIG. 1 is a schematic configuration diagram of the voice recognition control device according to the present invention. Like the conventional device, it consists of an AV device (11), which is the controlled unit, and a remote controller (10), which is its remote control unit. The remote controller (10) sends control signals to the AV device (11) via a wireless medium (12).

[0024] On the remote controller (10) side of FIG. 1, (101) is a microphone that receives voice and converts it into an acoustic signal, and (102) is a voice analysis unit that analyzes the acoustic signal input from the microphone (101) and extracts a time series of feature parameters representing the characteristics of the voice.

[0025] (103) is a voice section cutout unit that cuts out the section in which voice is present (the voice section) from the time series of feature parameters obtained from the voice analysis unit (102). (104) is a cutout reference value setting unit that sets the cutout reference value for voice sections based on the audio noise level signal sent from the AV device (11). The voice section cutout unit (103) compares the level of the input voice with the cutout reference value, regards the time region in which the level exceeds the cutout reference value as the voice section, and cuts out the voice in that section.

[0026] (105) is a pattern creation unit that creates an input voice pattern from the feature parameter time series of the voice section.

[0027] (106) is a standard pattern memory in which the voice patterns of a large number of standard voices are stored in advance as standard voice patterns. (107) is a comparison/determination unit that pattern-matches the input voice pattern obtained from the voice pattern creation unit (105) against each standard voice pattern in the standard voice pattern memory (106), detects the most similar standard voice pattern, and outputs a recognition result signal corresponding to the detected standard voice pattern.

[0028] (108) is a remote control transmission unit that converts the recognition result signal obtained from the comparison/determination unit (107) into a control signal for the AV device (11) to be controlled, such as a television, and transmits it to the AV device (11).

[0029] (109) is a remote control receiving unit that receives the signal sent from the AV device (11), and (110) is a level signal detection unit that detects, from that signal, the level signal of the audio noise emitted by the AV device.

[0030] On the AV device (11) side of FIG. 1, (111) is the AV device main body, such as a television or audio system; (112) is a control unit that controls the AV device main body (111); and (113) is a main body receiving unit that receives the control signal transmitted from the remote control transmission unit (108) and passes it to the control unit (112).

[0031] (114) is an amplifier for outputting the audio noise generated by the AV device main body (111). The output of the amplifier (114) is emitted as sound from the speaker (115) into the surrounding space and is also sent as a signal to the level signal creation unit (116). The level signal creation unit (116) measures the level of the audio noise signal output from the amplifier (114) and creates a level signal.
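How the level signal creation unit (116) might derive a level signal from the amplifier output can be sketched as below. The description does not say how the level is measured, so the per-frame RMS used here is an assumption, as are the function name `make_level_signal`, the frame length, and the sample values.

```python
import math
from typing import List, Sequence


def make_level_signal(samples: Sequence[float], frame_len: int) -> List[float]:
    """Return one RMS level per non-overlapping frame of amplifier output.

    A trailing partial frame is dropped. RMS is an assumed level measure;
    the source only states that the level of the signal is measured.
    """
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        levels.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return levels


# Toy amplifier output: one loud frame followed by one silent frame.
amp_out = [1.0, -1.0, 1.0, -1.0, 0.0, 0.0, 0.0, 0.0]
level_signal = make_level_signal(amp_out, frame_len=4)
```

Because each value is computed from the most recent frame, the resulting sequence follows the fluctuation of the output level, which is the property the level signal is required to have.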

[0032] (117) is a main body transmission unit that sends the level signal created by the level signal creation unit (116) to the remote controller (10). Transmission from the remote control transmission unit (108) and from the main body transmission unit (117) is carried out over a wireless medium (12) such as an optical signal (for example, infrared), a radio signal, or a magnetic signal.

[0033] (120) is an operation panel used when operating the remote controller (10) without voice recognition. It has the various buttons and switches needed to control the AV device (11).

[0034] FIG. 2 is a signal diagram showing the voice cutout method of the device of the present invention. In FIG. 2, (S) is the level of the voice signal from the microphone (101). It is the voice signal (V) of FIG. 4 with the audio noise level added, and is the same as the signal (S) described earlier for FIG. 5. (B) is the constant voice section cutout reference value, and (A) is the voice section cutout reference value that is varied dynamically according to the audio noise.

[0035] The operation of the voice recognition control device according to the present invention is now described. Assume that sound is being emitted from the speaker (115) of the AV device (11) of this embodiment, so that both the voice intended for control and the sound emitted from the speaker are input to the microphone (101).

[0036] First, on the AV device (11) side, the level signal creation unit (116) receives, via the amplifier (114), the audio noise generated by the AV device main body (111), and a level signal that follows changes in the output level of the audio noise is transmitted from the main body transmission unit (117) to the remote controller (10).

[0037] On the remote controller (10) side, the level signal sent from the AV device (11) passes through the remote control receiving unit (109) and is detected by the level signal detection unit (110). The cutout reference value setting unit (104) sets the cutout reference value with reference to the detected level signal. The cutout reference value can be regarded as a function of the level signal value and can be expressed, for example, as A = c × (level signal value) + B. Here c and B are constants. In particular, B is given an optimal value such that stationary noise input from the microphone (101) is not cut out as voice.
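The example formula A = c × (level signal value) + B of paragraph [0037] can be worked through numerically. The sketch below is illustrative only: the function name `cutout_reference` and the values chosen for c, B, and the level signal are assumptions, since the description gives no concrete numbers.

```python
from typing import List, Sequence


def cutout_reference(level_values: Sequence[float], c: float, b: float) -> List[float]:
    """Apply A = c * (level signal value) + B to each received level value."""
    return [c * level + b for level in level_values]


# Assumed constants: c scales the received noise level, B is the floor that
# keeps stationary microphone noise from being cut out as voice.
c = 0.5
B = 1.0
received_levels = [0.0, 2.0, 4.0]
A = cutout_reference(received_levels, c, B)
```

When the AV device is silent (level 0), A reduces to the constant B, so the behaviour in a quiet environment matches the conventional fixed reference value.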

[0038] When the user begins speaking into the microphone (101), the voice analysis unit (102) analyzes the acoustic signal input from the microphone (101) and extracts a time series of feature parameters representing the characteristics of the voice. Frequency analysis yields spectral parameters that preserve the voice signal level information.

[0039] The voice section cutout unit (103) detects, as the voice section, the section (tA1 to tA2) in which the voice signal level (V) from the microphone (101) exceeds the cutout reference value (A) set by the cutout reference value setting unit (104). Because the voice region is detected using a cutout reference value (A) that varies with the level of the audio noise generated by the AV device (11), the section cut out is closer to the actual voice section (tC1 to tC2) than the section (tB1 to tB2) obtained with the constant cutout reference value (B).
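The difference between the constant reference value (B) and the noise-following reference value (A) can be seen in a small simulation. All numbers below are invented for illustration, the microphone level is assumed to be the simple sum of the voice level and the noise level, and c = 1 is an assumed constant; the helper name `detect_section` is likewise hypothetical.

```python
from typing import Optional, Sequence, Tuple


def detect_section(levels: Sequence[float],
                   references: Sequence[float]) -> Optional[Tuple[int, int]]:
    """Return (first, last) frame index where the level exceeds its reference."""
    above = [i for i, (s, r) in enumerate(zip(levels, references)) if s > r]
    return (above[0], above[-1]) if above else None


voice = [0.0, 0.0, 0.0, 4.0, 5.0, 4.0, 0.0, 0.0]   # operator's voice alone (V)
noise = [2.0, 3.0, 4.0, 3.0, 2.0, 3.0, 4.0, 2.0]   # level signal from the AV device
mic = [v + n for v, n in zip(voice, noise)]         # microphone level (S), assumed additive

B = 1.0
c = 1.0
fixed = [B] * len(mic)                  # constant reference value (B)
adaptive = [c * n + B for n in noise]   # reference value (A) following the noise

actual = detect_section(voice, fixed)        # plays the role of (tC1, tC2)
with_fixed = detect_section(mic, fixed)      # plays the role of (tB1, tB2)
with_adaptive = detect_section(mic, adaptive)  # plays the role of (tA1, tA2)
```

With these toy values the constant reference marks every frame as voice, while the adaptive reference recovers the same frames as the voice-only case.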

[0040] The voice pattern creation unit (105) then creates a voice pattern based on the part of the feature parameter time series obtained from the voice section cutout unit (103) that lies within the voice section. The comparison/determination unit (107) compares the voice pattern with each standard pattern in the standard pattern memory (106) to identify the voice pattern, and a control signal based on the comparison/determination result is sent to the AV device (11) via the remote control transmission unit (108).

[0041] Back on the AV device (11) side, the control unit (112) receives, via the main body receiving unit (113), the control signal transmitted from the remote control transmission unit (108) and controls the AV device main body (111) according to the received control signal.

[0042] The level signal need not be transmitted from the AV device (11) to the remote controller (10) at all times; it may be transmitted only while the remote controller (10) is being operated. In a more preferred form of the voice recognition control device according to the present invention, when the remote controller (10) enters a state in which it starts voice recognition, it sends the AV device (11) a signal requesting the start of level signal transmission, which causes the AV device (11) to start sending the level signal.
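The on-demand transmission described in paragraph [0042] amounts to a small request-driven exchange, sketched below. The class names, the message string `START_LEVEL`, and the direct method call standing in for the wireless medium are all hypothetical; the description defines only the behaviour, not a message format.

```python
from typing import Optional

START_LEVEL_REQUEST = "START_LEVEL"  # hypothetical message name


class AVDevice:
    """Controlled system: sends level values only after being asked to."""

    def __init__(self) -> None:
        self.sending_level = False

    def receive(self, message: str) -> None:
        # Start level signal transmission when the remote controller requests it.
        if message == START_LEVEL_REQUEST:
            self.sending_level = True

    def emit_level(self, measured_level: float) -> Optional[float]:
        # Nothing is transmitted until a start request has arrived.
        return measured_level if self.sending_level else None


class RemoteController:
    """Remote control unit: requests the level signal when recognition starts."""

    def __init__(self, device: AVDevice) -> None:
        self.device = device

    def begin_recognition(self) -> None:
        # The direct call stands in for transmission over the wireless medium.
        self.device.receive(START_LEVEL_REQUEST)


device = AVDevice()
before = device.emit_level(0.5)
RemoteController(device).begin_recognition()
after = device.emit_level(0.5)
```

The source does not describe when transmission stops, so no stop request is modelled here.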

[0043] In this embodiment voice recognition is performed by pattern matching, but the voice recognition control device of the present invention can also be realized with voice recognition methods that use probabilistic information, fuzzy logic, or neural networks.

[0044]

[Effects of the Invention] As described above, according to the present invention, in a voice recognition control device consisting of a controlled system such as an AV device that generates audio noise and a remote control unit that controls it, the controlled system sends level information on the audio noise it generates to the remote control unit. During voice recognition, the remote control unit uses this information to vary the reference level for cutting out voice sections in accordance with the input level of the audio noise, which improves the accuracy of voice section cutout.

[0045] Therefore, even if audio noise such as voice and music generated by the AV device overlaps the operator's voice when the operator speaks, an extreme drop in the recognition rate caused by the sound from the AV device can be prevented.

[Brief description of the drawings]

FIG. 1 is a schematic configuration diagram of a voice recognition control device according to the present invention.

FIG. 2 is a signal diagram showing a voice cutout method of the device of the present invention.

FIG. 3 is a schematic configuration diagram of a conventional voice recognition control device.

FIG. 4 is a signal diagram showing a cutout method of a conventional voice recognition control device.

FIG. 5 is a signal diagram showing a cutout method of a conventional voice recognition control device.

[Explanation of symbols]

10 Remote controller
11 AV device
12 Wireless medium
101 Microphone
103 Voice section cutout unit
104 Cutout reference value setting unit
112 Control unit
116 Level signal creation unit

Continuation of the front page

(51) Int. Cl.⁷ (identification code, FI): H04Q 9/00 301, 311

(58) Fields searched (Int. Cl.⁷, DB name): G10L 11/02, G10L 15/00, G10L 15/04, G10L 15/20, G10L 21/02, H04Q 9/00 301, H04Q 9/00 311

Claims

(57) [Claims]

1. A voice recognition control device comprising a controlled system that generates audio noise and a remote control unit that controls the controlled system based on the result of recognizing input voice, wherein the controlled system comprises a level signal sending unit that transmits, to the remote control unit, a level signal that follows changes in the output level of the audio noise; the remote control unit comprises a level signal receiving unit that receives the level signal transmitted from the controlled system, a cutout reference value setting unit that sets a reference value for cutting out a voice section based on the level signal, a voice analysis unit that analyzes input voice to extract a feature parameter time series of the voice, and a voice pattern creation unit that detects a voice region using the cutout reference value and creates a voice pattern based on the feature parameter time series within the voice region; and, when the remote control unit enters a state in which it starts voice recognition, it sends the controlled system a signal requesting the start of transmission of the level signal, thereby causing transmission of the level signal to start.

2. A voice recognition control device comprising a remote control unit that issues a control signal based on the result of recognizing input voice, and a controlled system controlled by the control signal, wherein the remote control unit comprises: a microphone for inputting voice; a voice analysis unit that analyzes the acoustic signal obtained from the microphone to extract a feature parameter time series of the voice; a level signal receiving unit that receives a level signal transmitted from the controlled system; a cutout reference value setting unit that sets a reference value for cutting out voice based on the level signal; a voice pattern creation unit that detects a voice region using the cutout reference value and creates a voice pattern based on the feature parameter time series within the voice region; a standard pattern memory that stores the voice patterns of a plurality of standard voices in advance as standard patterns; a comparison/determination unit that compares the voice pattern with each standard pattern in the standard pattern memory to identify the voice pattern; and control signal sending means that sends the controlled system a control signal based on the comparison/determination result of the comparison/determination unit; wherein the controlled system comprises: audio noise generating means that generates audio noise; a control signal receiving unit that receives the control signal issued by the control signal sending means; a control unit that controls the controlled system based on the control signal; and a level signal sending unit that transmits, to the level signal receiving unit, a level signal that follows changes in the output level of the audio noise output from the audio noise generating means; and wherein, when the remote control unit enters a state in which it starts voice recognition, it sends the controlled system a signal requesting the start of transmission of the level signal, thereby causing transmission of the level signal to start.

3. The voice recognition control device according to claim 2, further comprising an operation unit for causing the control signal sending means to generate the control signal.