JPH0721759B2

JPH0721759B2 - Speech recognition response device

Info

Publication number: JPH0721759B2
Application number: JP58091809A
Authority: JP
Inventors: 洋一竹林; 英範篠田; 輝彦浮田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1983-05-25
Filing date: 1983-05-25
Publication date: 1995-03-08
Anticipated expiration: 2010-03-08
Also published as: JPS59216242A

Description

[Detailed Description of the Invention]

[Technical Field of the Invention]

The present invention relates to a speech recognition response device used in an information processing system operated by voice input.

[Technical background of the invention and its problems]

In recent years, speech recognition and speech synthesis technologies have advanced remarkably. For example, continuous speech recognition and speaker-independent speech recognition have become possible, as has high-quality speech synthesis using linear predictive coding. Synthesis by rule, which converts text into speech, is also being actively researched and developed.

Using these technologies, attempts are being made to develop systems such as telephone voice response services, which provide various services over the public telephone network, and online transaction systems for banks and similar institutions, and their usefulness is attracting attention. The users of such systems, however, are an unspecified large population: some, such as elderly people and children, are unaccustomed to the system, while others use it many times a day. Nevertheless, in conventional devices the content of the voice response is uniform and its speaking rate is fixed, so dialogue between human and machine has not been smooth. That is, the response is either redundant and irritating, or hard to understand.

[Object of the Invention]

The present invention has been made in view of these circumstances. Its object is to provide a highly practical speech recognition response device that enables natural, smooth dialogue between human and machine and thereby enables effective information processing by voice input.

[Outline of Invention]

The present invention is a speech recognition response device that recognizes input speech and outputs response speech corresponding to the recognition result, characterized by comprising: detecting means for detecting the duration of each word of the input speech; recognizing means for recognizing the words of the input speech; storage means in which information on the standard duration of each word is registered in advance; measuring means for measuring the speaking rate of the input speech using, for a word recognized by the recognizing means, the duration detected by the detecting means and the standard-duration information registered in the storage means; and control means for controlling the speed of the response speech according to the speaking rate measured by the measuring means.

[Effect of the Invention]

According to the present invention, the speed of the response speech is controlled according to the speaking rate of the input speech, so a response can be given in a manner suited to the speaker. For example, a speaker who talks quickly receives a fast response and a speaker who talks slowly receives a slow one, which makes the dialogue between human and machine more natural and smoother.

Furthermore, because the present invention measures the speaking rate of the input speech using the duration of each word of the input speech and the standard duration registered for each word, the speaking rate can be measured accurately even when words with the same number of characters are spoken at different speeds depending on their type or content. The speed of the output response speech can therefore be controlled more appropriately according to the speaking rate of the input speech.
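The advantage of per-word standard durations over a character count can be illustrated with a small sketch. The word names, character counts, and durations below are hypothetical, and the simple ratio used as a relative rate is an assumption made for illustration only; the specification does not give a formula at this point.

```python
# Hypothetical data: two words with the same number of characters whose
# standard (typical) durations nevertheless differ.
CHAR_COUNT = {"word_a": 4, "word_b": 4}
STANDARD_DURATION_S = {"word_a": 0.40, "word_b": 0.60}


def chars_per_second(word: str, duration_s: float) -> float:
    """Rate based only on character count; ignores word-specific timing."""
    return CHAR_COUNT[word] / duration_s


def relative_rate(word: str, duration_s: float) -> float:
    """Rate relative to the word's own standard duration.

    Values above 1 mean faster than standard, below 1 slower.
    (Illustrative assumption, not a formula from the specification.)
    """
    return STANDARD_DURATION_S[word] / duration_s


if __name__ == "__main__":
    # A speaker utters both words at exactly the standard pace.
    observed = {"word_a": 0.40, "word_b": 0.60}
    for word, duration in observed.items():
        print(word, chars_per_second(word, duration), relative_rate(word, duration))
```

In this sketch the character-based rate differs between the two words even though the speaker's pace is unchanged, while the rate relative to each word's standard duration is the same for both.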

[Embodiments of the Invention]

Embodiments of the present invention will now be described with reference to the drawings.

FIG. 1 is a schematic block diagram of the device of the first embodiment. This device treats words as the units of recognition and controls the speed of the voice response according to the speaking rate of those words. The input speech is passed through an analyzer 1, which applies A/D conversion, spectral analysis, and similar processing to convert it into a time series of feature parameters, and the result is stored in a speech pattern memory 2. A speech interval detector 3 uses, for example, the energy information in this feature parameter time series to detect the start and end points of a word in the speech pattern, and the word data portion is thereby extracted. A pattern matching circuit 4 then recognizes the word by matching the speech pattern of this word data against the standard patterns of a plurality of words registered in advance in a word dictionary memory 5. This matching is performed by, for example, a similarity calculation method. The recognition result is supplied to a voice response output unit 6.
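The word extraction and matching steps described above might be sketched as follows. This is a minimal illustration under stated assumptions: a fixed energy threshold stands in for the endpoint detection, cosine similarity stands in for the unspecified similarity calculation, and each pattern is reduced to a fixed-length vector. All function names, the threshold, and the frame period are hypothetical and do not come from the specification.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def detect_word_span(frame_energies: Sequence[float],
                     threshold: float) -> Optional[Tuple[int, int]]:
    """Return (start, end) indices of the first and last frames whose
    energy exceeds the threshold, or None if no frame does."""
    active = [i for i, e in enumerate(frame_energies) if e > threshold]
    if not active:
        return None
    return active[0], active[-1]


def word_duration_s(span: Tuple[int, int], frame_period_s: float) -> float:
    """Word duration Li derived from the detected start and end frames."""
    start, end = span
    return (end - start + 1) * frame_period_s


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recognize(pattern: Sequence[float],
              dictionary: Dict[str, Sequence[float]]) -> str:
    """Return the dictionary word whose standard pattern is most similar."""
    return max(dictionary, key=lambda w: cosine_similarity(pattern, dictionary[w]))
```

A real front end would compare time series of differing lengths rather than fixed vectors, so the matching here is only a stand-in for the kind of similarity calculation the text mentions.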

Meanwhile, the recognition result obtained by the pattern matching circuit 4 is also supplied to a speaking rate measuring unit 7. Using the recognition result Wi and the word duration Li given by the start and end point information, the speaking rate measuring unit 7 looks up the standard duration Ri of the recognized word Wi, which is registered in advance in a word duration memory 8, and calculates the speaking rate v from its mean and variance. This determines, for example, whether the speaking rate v of the input speech is faster or slower than the average standard speaking rate; in other words, whether the speaker talks fast, at a standard pace, or slowly. A voice response speed controller 9 receives this speaking rate information and variably controls the speed of the response speech produced by the voice response output unit 6.
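The text states only that v is calculated from the mean and variance of the standard duration Ri, without giving a formula. One plausible reading, offered here purely as an assumption, is a normalized deviation of the observed duration Li from the stored mean. The memory contents, the sign convention, and the classification threshold below are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DurationStats:
    """Standard duration Ri for one word: mean and variance (assumed layout)."""
    mean_s: float
    variance_s2: float


# Hypothetical contents of the word duration memory.
WORD_DURATION_MEMORY: Dict[str, DurationStats] = {
    "deposit": DurationStats(mean_s=0.60, variance_s2=0.01),
    "balance": DurationStats(mean_s=0.50, variance_s2=0.0064),
}


def speaking_rate(word: str, duration_s: float,
                  memory: Dict[str, DurationStats] = WORD_DURATION_MEMORY) -> float:
    """Assumed measure v: how many standard deviations shorter than the
    mean the observed duration Li is. Positive means faster than standard."""
    stats = memory[word]
    return (stats.mean_s - duration_s) / math.sqrt(stats.variance_s2)


def classify(v: float, threshold: float = 1.0) -> str:
    """Map v to 'fast', 'standard', or 'slow' (threshold is an assumption)."""
    if v > threshold:
        return "fast"
    if v < -threshold:
        return "slow"
    return "standard"
```

For instance, under these assumed statistics a 0.45 s utterance of "deposit" lies 1.5 standard deviations below the mean duration and would be classified as fast.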

In other words, the speaking rate of the input speech is not determined from vowel transitions or pitch over the input speech as a whole, but from the speed of the meaningful portion that depends on linguistic content, namely the word portion. Thus, for example, when only the important words are spoken slowly for emphasis, the speech (words) given in response is likewise controlled to be spoken slowly.

As a result, the voice response output unit 6 outputs the response sentence determined by the recognition result of the input speech at a speed that is variably controlled according to the speaking rate of the input speech. When the response speech is produced by synthesis by rule, the response speed is varied by controlling the rate of change of the various parameters used for the synthesis. When recorded-speech (record-and-edit) synthesis is used, the response speed is controlled by, for example, selecting among sentences or speech segments recorded in advance at different speaking rates.
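The two control strategies just described can be sketched as follows. The tempo factors, the rate class labels, and the recording names are hypothetical, and scaling segment durations is only one simple way to change how quickly synthesis parameters evolve; the specification does not prescribe a particular mechanism.

```python
from typing import Dict, List, Sequence

# Hypothetical tempo factors: above 1 speeds the response up, below 1 slows it.
TEMPO_BY_CLASS: Dict[str, float] = {"fast": 1.25, "standard": 1.0, "slow": 0.8}

# Hypothetical recordings of one response sentence at three speaking rates.
RECORDED_VARIANTS: Dict[str, str] = {
    "fast": "greeting_fast",
    "standard": "greeting_standard",
    "slow": "greeting_slow",
}


def scale_segment_durations(durations_ms: Sequence[float],
                            tempo: float) -> List[float]:
    """Synthesis by rule: shorten or lengthen each segment so that the
    synthesis parameters change faster or slower."""
    if tempo <= 0:
        raise ValueError("tempo must be positive")
    return [d / tempo for d in durations_ms]


def select_recording(variants: Dict[str, str], rate_class: str) -> str:
    """Recorded-speech synthesis: pick the variant recorded at the matching
    rate, falling back to the standard recording."""
    return variants.get(rate_class, variants["standard"])
```

With these assumed values, a speaker classified as fast would hear segments shortened by a factor of 1.25 under synthesis by rule, or the "greeting_fast" recording under recorded-speech synthesis.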

Thus, with the device configured as described above, the voice response is given at a speaking rate matched to that of the speaker. An impatient, fast-talking person can be answered quickly and a relaxed, slow-talking person can be answered at a gentle pace, which makes the dialogue between human and machine more natural and smoother. Overall, this improves the efficiency of information processing by speech recognition and response.

As described above, the present invention detects the speaking rate, which closely reflects the speaker's temperament, and controls the speed of the voice response accordingly, so the dialogue with the speaker becomes more natural. As a result, the invention offers substantial practical benefits, such as eliminating problems like irritating the speaker.

[Brief description of drawings]

FIG. 1 is a schematic block diagram of a speech recognition response device according to one embodiment of the present invention.

1: analyzer; 2: speech pattern memory; 3: speech interval detector; 4: pattern matching circuit; 5: word dictionary memory; 6: voice response output unit; 7: speaking rate measuring unit; 8: word duration memory; 9: voice response speed controller

Continuation of front page

(72) Inventor: Teruhiko Ukita, c/o Tokyo Shibaura Electric Co., Ltd. Research and Development Center, 1 Komukai Toshiba-cho, Saiwai-ku, Kawasaki-shi, Kanagawa

(56) References: JP-A-59-153238 (JP, A); JP-A-57-57375 (JP, A)

Claims

[Claims]

1. A speech recognition response device that recognizes input speech and outputs response speech corresponding to the recognition result, comprising: detecting means for extracting a word portion of the input speech and detecting the duration of each extracted word; recognizing means for recognizing the words of the input speech; storage means in which information on the standard duration of each word of the input speech is registered in advance; measuring means for measuring the speaking rate of the input speech using, for a word recognized by the recognizing means, the duration detected by the detecting means and the standard-duration information registered in the storage means; and control means for controlling the output speed of the response speech according to the speaking rate measured by the measuring means.