JPH0516604B2

Info

Publication number: JPH0516604B2
Application number: JP58017503A
Authority: JP
Inventors: Toshihiko Kurino; Kazuhiro Umemura
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1983-02-07
Filing date: 1983-02-07
Publication date: 1993-03-04
Also published as: JPS59144945A

Description

[Detailed description of the invention]

[Field of Application of the Invention]

The present invention relates to a speech recognition control method for improving the recognition rate of a speech recognition device. The device is used to identify input speech from a user's telephone in a voice response/notification service system that, over a telephone line, either automatically answers a user's request (inquiry) with the desired information by voice or automatically notifies the user by voice of predetermined information that has arisen. Such information includes, for example, deposit balances and incoming deposits in banking.

[Prior Art]

The voice response/notification service system described above was initially offered only to users whose telephones were of the push-button dial type (PB type).
Accordingly, the service request type and similar items were entered into the system by PB signals (combinations of specific frequencies within the voice band) after the user's PB telephone had been connected to the system, and subsequent processing then proceeded.

Recently, there has been growing demand for this kind of service (for example, deposit balance inquiries and deposit notifications in banking) to be offered widely.

However, users' telephones are not necessarily of the PB type. Dial-type (DP type) telephones are in fact more common, and a DP telephone cannot enter the service request type and similar items by PB signals as described above. Moreover, even a user with a PB telephone may prefer to receive the service by voice input rather than by mechanical operation.

In such cases, the input speech from the user's telephone must be identified by a speech recognition device and the service request entered into the system so that the desired service can be provided.

Here the input speech arrives over a telephone line, and a user who is unfamiliar with the speech recognition device (hereinafter a "beginner", as opposed to an "expert" who is accustomed to it) is unsure how to make an input and how to speak. Even so, conventional speech recognition devices of this kind provide, for example, only one type of voice input guidance (input reminder message). As a result, the input speech of beginners in particular tends to be inappropriate or unreliable, so its recognition rate is inevitably low, which in turn lowers the overall recognition rate.

[Object of the Invention]

An object of the present invention is to eliminate the above drawbacks of the prior art and to provide a speech recognition control method that achieves a high recognition rate even for input speech from beginners.

[Summary of the Invention]

In the speech recognition control method according to the present invention, input reminder messages of predetermined types with different contents are set up in advance. For the information response service, the service request is received on the telephone number corresponding to the desired input reminder message type. For the information notification service, the desired input reminder message type is selected and stored in advance for each party to be notified. The speaker is thereby guided to make the required voice input in accordance with the content of the desired input reminder message type.

[Embodiments of the Invention]

An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a block diagram of an embodiment of a speech recognition device using the speech recognition control method according to the present invention, and FIG. 2 is its flowchart.

In the figures, 1A and 1B are telephone lines with different telephone numbers that carry the voice input from the caller (user) and the synthesized voice output from the recognition device.
2 is an input unit that performs gain adjustment, band limiting, and other necessary preprocessing on the input speech signal, converts it to digital form, and handles the functions needed for incoming and outgoing calls on telephone lines 1A and 1B;
3 is an analysis unit that analyzes the input speech on the basis of the digital speech signal and extracts its feature data;
4 is a speech interval detection unit that detects the speech interval of the input speech according to a predetermined threshold;
5 is a speech recognition unit that performs pattern matching (similarity calculation) between the input speech and each standard speech pattern;
6 is a determination unit that ranks the similarities for the input speech on the basis of that result;
7 is a standard speech pattern memory that stores multiple sets of standard speech pattern data for each word to be recognized;
8 is a standard speech pattern selection unit that controls selection from that memory;
9 is a speech synthesis unit used to present and confirm the analysis and recognition results of the input speech, to give voice input instructions, and to make other necessary announcements;
10 is a message selection unit that selects one of the two input reminder messages, for beginners or for experts, to suit the speaker;
11 is a notifier memory holding information on the users;
12 is a control unit that controls the above units and performs other necessary processing; and
13 is a host device that performs the desired service processing based on the speech recognition results.
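The text says only that component 4 detects the speech section of the input "according to a predetermined threshold" and gives no further detail. As a minimal sketch, assuming the input is a list of per-frame energy values and that a frame counts as speech when its energy exceeds the threshold (the function name, the energy representation, and the decision rule are all assumptions made for illustration), the idea could look like this:

```python
# Illustrative sketch only; the energy-per-frame input and the decision
# rule are assumptions, not details given in the patent.
from typing import List, Optional, Tuple


def detect_speech_interval(
    frame_energies: List[float], threshold: float
) -> Optional[Tuple[int, int]]:
    """Return (first, last) indices of frames above the threshold, or None."""
    active = [i for i, energy in enumerate(frame_energies) if energy > threshold]
    if not active:
        return None  # no frame exceeded the threshold: no speech detected
    return active[0], active[-1]
```

A real detector would typically also smooth over short pauses and ignore isolated noise bursts; this sketch deliberately leaves that out.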
The control and operation of the information inquiry (response) service in a service system using this speech recognition device are described with reference to FIG. 1 and FIG. 2a.

Two types of input reminder message are prepared, for example one for beginners and one for experts, and they are assumed to correspond to telephone lines 1A and 1B, respectively. In practice, more types with further contents may be provided.

A user who wants, for example, the information response service dials the telephone number corresponding to telephone line 1A or 1B, depending on whether the user wants the beginner or the expert input reminder message as the instruction for entering the content request code (request code).

The user's telephone is thereby connected to telephone line 1A or 1B, and the input unit 2 detects which line received the call and passes that information to the control unit 12 (process 20A in FIG. 2a).

From that information the control unit 12 determines the input reminder message type to be used from then on (beginner or expert) (process 21A) and instructs the input unit 2, the analysis unit 3, and the speech interval detection unit 4 to prepare for voice input. It also instructs the standard speech pattern selection unit 8 to select, from the standard speech pattern memory 7, all sets of standard speech patterns for the category of words to be recognized at that point (for example, digits, service type names, article names, or place names) (process 22).

When these preparations are complete, the input reminder message (for beginners or for experts) that prompts the speaker for voice input is sent out on telephone line 1A or 1B via the speech synthesis unit 9 (process 23).
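The line-based choice of input reminder message (processes 20A, 21A, and 23) can be pictured with a small sketch. It is only an illustration under assumptions: the dictionary and function names are hypothetical, and the message wording is invented for the example rather than taken from the patent.

```python
# Illustrative sketch only; names and message texts are assumptions,
# not taken from the patent.

# Process 21A: each telephone line is tied to one input reminder message type.
LINE_TO_MESSAGE_TYPE = {"1A": "beginner", "1B": "expert"}

# Hypothetical wording for each message type.
INPUT_REMINDER_MESSAGES = {
    "beginner": "After the tone, please say the request code slowly, one digit at a time.",
    "expert": "Request code, please.",
}


def select_message_type(called_line: str) -> str:
    """Return the message type for the line on which the call arrived."""
    try:
        return LINE_TO_MESSAGE_TYPE[called_line]
    except KeyError:
        raise ValueError(f"unknown telephone line: {called_line!r}") from None


def input_reminder_for(called_line: str) -> str:
    """Return the message to send out via the speech synthesis unit (process 23)."""
    return INPUT_REMINDER_MESSAGES[select_message_type(called_line)]
```

Keeping the line-to-type mapping in a table makes it easy to add further message types, which the description allows, by adding further lines.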
When the speaker then inputs speech over telephone line 1A or 1B (process 24), the input unit 2 converts it to digital form, and the analysis unit 3 analyzes the digital speech signal and extracts its feature data (process 25).

The speech recognition unit 5 performs pattern matching between the feature data of the input speech and the selected standard speech pattern data, and passes the similarity of each standard speech pattern to the input speech to the determination unit 6 (process 26).

The determination unit 6 passes the candidate with the highest similarity (the most likely one) to the control unit 12 as the recognition result (process 26).

In the case of a reject, where even the most likely candidate has a low similarity value and outputting it as the recognition result would be doubtful, the control unit 12 instructs the standard speech pattern selection unit 8 to select the same patterns as before (process 30) and sends a re-input reminder message on telephone line 1A or 1B via the speech synthesis unit 9 (process 31).

If the result is not a reject, the control unit 12 sends a recognition request message on telephone line 1A or 1B via the speech synthesis unit 9 so that the speaker can confirm whether the recognition result is correct (process 28).

The speaker listens to this message, learns whether the input speech was recognized correctly or incorrectly, and inputs the confirmation result over telephone line 1A or 1B (process 29).

If the confirmation shows that the recognition candidate is correct, the control unit 12 sends it to the host device 13 as the recognition result, ends processing for that input speech, and prepares for the next input.

If the confirmation shows a misrecognition, processes 30 and 31 are carried out as in the reject case and repeated until a correct recognition is obtained, at which point the recognition result is sent to the host device 13 in the same way and the series of processes ends.

Examples of the contents of the input reminder messages for beginners and for experts are shown in the table below.
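The prompt, reject, and confirmation cycle of processes 23 to 31 can be summarized as a loop. The following is a sketch under stated assumptions, not the patent's implementation: the reject threshold value, the function names, the callback signatures, and the confirmation wording are all hypothetical.

```python
# Illustrative sketch only; the threshold value, names, and callback
# signatures are assumptions, not taken from the patent.
from typing import Callable, Dict, Tuple

REJECT_THRESHOLD = 0.6  # assumed value; the patent gives no number


def best_candidate(similarities: Dict[str, float]) -> Tuple[str, float]:
    """Pick the word whose similarity ranks highest (the determination step)."""
    word = max(similarities, key=similarities.get)
    return word, similarities[word]


def recognize_one_input(
    get_similarities: Callable[[], Dict[str, float]],
    confirm: Callable[[str], bool],
    send_message: Callable[[str], None],
    first_message: str,
    retry_message: str,
) -> str:
    """Repeat recognition and confirmation until the caller accepts a result."""
    send_message(first_message)  # process 23: input reminder message
    while True:
        # Processes 24 to 26: take input, match patterns, pick the top candidate.
        word, score = best_candidate(get_similarities())
        if score >= REJECT_THRESHOLD:
            send_message(f"Did you say {word}?")  # process 28
            if confirm(word):  # process 29
                return word  # result is handed to the host device
        # Reject or misrecognition: processes 30 and 31, then try again.
        send_message(retry_message)
```

Passing the telephone-side operations in as callbacks keeps the loop itself independent of how speech is captured or synthesized, so the same control flow serves both the beginner and the expert line.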

[Effect of the invention]

As described in detail above, the present invention makes voice input more reliable, especially from beginners, and thereby raises the overall recognition rate of the speech recognition device. This yields a marked effect in improving the reliability and serviceability of this kind of service system and, in turn, in promoting its wider adoption.

[Brief explanation of drawings]

FIG. 1 is a block diagram of an embodiment of a speech recognition device using the speech recognition control method according to the present invention, and FIG. 2 is its flowchart.

1 ... telephone line, 2 ... input unit, 3 ... analysis unit, 4 ... speech interval detection unit, 5 ... speech recognition unit, 6 ... determination unit, 7 ... standard speech pattern memory, 8 ... standard speech pattern selection unit, 9 ... speech synthesis unit, 10 ... message selection unit, 11 ... notifier memory, 12 ... control unit, 13 ... host device.

Claims

[Claims]

1. A speech recognition control method for a voice response service system including a speech recognition device that sends an input reminder message over a telephone line and recognizes the speech input successively in accordance with that message, characterized in that a plurality of telephone lines are provided, one for each of predetermined types of speaker that include at least beginners and experts, each line to be used according to the type to which a speaker using the service system belongs, and in that the system comprises: means for storing, for each telephone line, an input reminder message that prompts the speaker's input; and means for reading out the stored input reminder message corresponding to the telephone line on which a call arrives and sending it to the speaker making that call, whereby the speaker is controlled to make the desired voice input.