JP2845355B2

JP2845355B2 - Signal detection device

Info

Publication number: JP2845355B2
Application number: JP4267225A
Authority: JP
Inventors: 隆小森; 滋片桐
Original assignee: EI TEI AARU SHICHOKAKU KIKO KENKYUSHO KK
Current assignee: EI TEI AARU SHICHOKAKU KIKO KENKYUSHO KK
Priority date: 1992-10-06
Filing date: 1992-10-06
Publication date: 1999-01-13
Anticipated expiration: 2014-01-13
Also published as: JPH06119306A

Abstract

PURPOSE: To provide a new spotting-type pattern recognition method that achieves high recognition performance. CONSTITUTION: A signal is input from a signal input unit 1 and converted by a feature extraction unit 2 into a time series of feature vectors, and a distance is calculated as a value expressing the degree of class membership at each frame of the input continuous pattern sequence. A recognition/learning processing unit 6 decides, from the calculated class distance, whether a pattern of the class model exists: it exists if the class distance is smaller than a threshold 5, and does not exist otherwise. The recognition/learning processing unit 6 also updates the parameters by learning.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a signal detection device and, more specifically, to a signal detection device that detects patterns belonging to a specific class (category) from among patterns (signal sequences) that can be expressed numerically, such as speech, characters, and images.

[0002]

[Prior Art] In pattern recognition based on spotting (hereinafter, the spotting method), a pattern recognizer consists of a previously designed pattern model for each class to be spotted and a class-membership threshold. Given an input continuous pattern sequence, the recognizer uses the class model to measure the degree of membership in each class at each position of the sequence (a time point when the pattern is a temporal signal, a spatial position when the pattern is a spatial signal) and compares the measured membership with the threshold. If the relationship between the measured membership and the threshold satisfies a preset condition, the recognizer outputs the decision that the pattern exists at that position; otherwise it outputs the decision that the pattern does not exist at that position.

[0003] The spotting method is subject to recognition errors common to pattern recognition tasks in general (substitutions) and to errors specific to spotting, namely omissions and insertions. These three types of error are hereinafter collectively called spotting errors.

[0004]

[Problems to Be Solved by the Invention] A conventional pattern recognizer based on the spotting method consists of class models that achieve maximum likelihood or minimum distortion within each class, or class models chosen purely empirically, together with an empirically set membership threshold; moreover, these are designed or set only in a training stage that precedes the recognition stage. Such design methods are inconsistent with the proper design goal of a spotting-based pattern recognizer, which is to minimize the spotting errors described above. Consequently, conventional pattern recognition does not guarantee the optimality of the training result, that is, the minimum spotting error state, and it cannot follow changes in the characteristics of the input continuous pattern sequence or in the recognition environment at the time of input. As a result, high recognition performance (few spotting errors) cannot be obtained.

[0005] A main object of the present invention is therefore to provide a signal detection device capable of high recognition performance.

[0006]

[Means for Solving the Problems] The present invention is a signal detection device that uses the spotting method to determine whether a signal sequence within a sequence of input signals, such as speech, characters, or images, belongs to a class serving as the unit of recognition, and that extracts the desired signal sequence. The device comprises: feature extraction means for converting the input signal sequence into a feature sequence; distance calculation means for calculating, from an arbitrary subsequence of the feature sequence obtained by the feature extraction means and a template of the class, a distance as a value expressing how likely the subsequence is to belong to the class; detection means for deciding that the subsequence belongs to the class if the calculated distance is smaller than a threshold of the class, and that it does not belong otherwise; and updating means for updating the template and the threshold on the basis of a gradient composed of the first-order derivatives, defined with respect to the template and the threshold, of the number of occurrences of two kinds of detection error: deciding that a subsequence does not belong to the class although it does, and deciding that it belongs although it does not.

[0007]

[Operation] The signal detection device according to the present invention converts the input signal sequence into a feature sequence; calculates, from an arbitrary subsequence of the feature sequence and the template of a class, a distance as a value expressing how likely the subsequence is to belong to the class; and detects whether the subsequence belongs to the class according to whether the distance is smaller than the threshold of the class. It then updates the template and the threshold on the basis of a gradient composed of the first-order derivatives, defined with respect to the template and the threshold, of the number of occurrences of two kinds of error: deciding that a subsequence does not belong to the class although it does, and deciding that it belongs although it does not.

[0008]

[Embodiments] An embodiment of a concrete adaptive learning rule for speech pattern recognition based on the spotting method is described below. The optimal-solution search method adopted in this embodiment, the generalized probabilistic descent (GPD) method, can be replaced by any other reasonable optimal-solution search method without changing the definition of the spotting error criterion. In particular, when a batch-type search method such as the steepest descent method is adopted, the resulting learning rule is also of batch type. The embodiment described below concerns a single class; because the spotting method is inherently carried out independently for each class, it extends readily to multi-class tasks with no change to the procedure shown below.

[0009] FIG. 1 is a block diagram of one embodiment of the present invention. In FIG. 1, a speech signal is input from a signal input unit 1 and supplied to a feature extraction unit 2, which converts it into a time series of acoustic feature vectors. Each acoustic feature vector is obtained, for example, as follows: the speech signal is sampled at 12 kHz and converted into a 16-bit digital signal; a 256-point discrete Fourier transform is applied every 5 msec; the spectrum is divided into 16 channels on the mel scale; and the logarithm of the summed output of each channel forms one component of a 16-dimensional vector. For each acoustic feature vector, the vector itself together with the three preceding and the three following vectors, seven consecutive vectors in all, constitutes one frame of the input continuous pattern sequence.
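The front end described in this paragraph can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the Hamming window, the non-overlapping rectangular channels equally spaced on the mel scale, the mel formula, the small floor inside the logarithm, and the repetition of edge vectors when stacking are not specified in the text. The names `feature_vectors` and `stack_context` are hypothetical.

```python
import cmath
import math

SAMPLE_RATE = 12_000   # Hz, from the text
FRAME_SHIFT = 60       # 5 msec at 12 kHz
DFT_SIZE = 256         # 256-point DFT, from the text
N_CHANNELS = 16        # mel-scale channels, from the text
CONTEXT = 3            # vectors on each side, giving 7 stacked vectors


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)   # assumed mel formula


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def channel_edges():
    """DFT-bin boundaries of channels equally spaced in mel (assumed layout)."""
    top = hz_to_mel(SAMPLE_RATE / 2)
    edges = [round(mel_to_hz(top * c / N_CHANNELS) / SAMPLE_RATE * DFT_SIZE)
             for c in range(N_CHANNELS + 1)]
    for c in range(1, len(edges)):                 # keep every channel non-empty
        edges[c] = max(edges[c], edges[c - 1] + 1)
    return edges


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0 .. N/2 (slow but dependency-free)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def feature_vectors(samples):
    """One 16-dimensional log channel-energy vector every 5 msec."""
    edges = channel_edges()
    window = [0.54 - 0.46 * math.cos(2 * math.pi * t / (DFT_SIZE - 1))
              for t in range(DFT_SIZE)]            # Hamming window (assumption)
    vectors = []
    for start in range(0, len(samples) - DFT_SIZE + 1, FRAME_SHIFT):
        frame = [samples[start + t] * window[t] for t in range(DFT_SIZE)]
        power = power_spectrum(frame)
        vectors.append([math.log(sum(power[edges[c]:edges[c + 1]]) + 1e-10)
                        for c in range(N_CHANNELS)])
    return vectors


def stack_context(vectors):
    """Join each vector with its 3 predecessors and 3 successors (edges repeated)."""
    last = len(vectors) - 1
    return [[v for k in range(i - CONTEXT, i + CONTEXT + 1)
             for v in vectors[min(max(k, 0), last)]]
            for i in range(len(vectors))]
```

Each stacked frame then has 7 x 16 = 112 components, which is the frame dimension the distance computations operate on.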

[0010] Next, a distance calculation unit 3 calculates, as a value expressing the degree of class membership at each frame of the input continuous pattern sequence, the distance between the subsequence of the input continuous pattern sequence that ends at that frame and a class model 4 (hereinafter, the class distance). The class model 4 is a set of one or more sequences of vectors having the same dimension as the acoustic feature vectors that make up the input continuous pattern sequence (hereinafter, templates). Let B be the total number of templates in the class model 4, λ_b the template with index b, Λ^* = {λ_b} (b = 1, ..., B) the class model 4 as a set of templates, and x the input continuous pattern sequence. The class distance g_i(x; Λ^*) at the i-th frame of the input continuous pattern sequence is then defined by equation (1) below. That is, equation (1) defines the distance between a subsequence of the input pattern and the class model, namely the class distance.

[0011]

(Equation 1)

[0012] Here, K is a natural number not less than 1 and not greater than B; b(k) is the index of the k-th template in a given permutation of K templates taken from the B templates; and ζ is a positive number. D_i(x; λ_b) is the distance between the subsequence ending at the i-th frame of the input continuous pattern sequence and the template with index b (hereinafter, the cumulative path distance), and ω_k is the weight coefficient for the cumulative path distance to the template with index b(k). Because the temporal length of a speech pattern varies inherently, measuring the distance between such patterns requires normalizing this variation in length. The normalization procedure is therefore realized within the subsidiary distance concepts defined by equations (2) to (4) below. Of these distances, the cumulative path distance is defined by equation (2).

[0013]

(Equation 2)

[0014] Here, H_{ib} is the total number of time-warping (time-axis expansion/contraction) maps permitted between the subsequence ending at the i-th frame of the input continuous pattern sequence and the template with index b; M is a natural number not less than 1 and not greater than H_{ib}; h(m) is the index of the m-th time-warping map in a given permutation of M maps taken from the H_{ib} maps; and ξ is a positive integer. D_{ih}(x; λ_b) is the weighted sum (hereinafter, the path distance), taken over all frames of the template with index b, of the local distances δ_{ihj}(x; λ_b) between each template frame and the frame of the input continuous pattern sequence that corresponds to it under the time-warping map with index h. ρ_m is the weight coefficient for the path distance under the time-warping map with index h(m). The path distance is defined by equation (3).

[0015]

(Equation 3)

[0016] Here, J_b is the total number of frames in the template with index b, and w_{bj} is the weight coefficient of the local distance for the j-th frame of that template. The local distance is defined by equation (4).

[0017]

(Equation 4)

[0018] Here, S is the number of dimensions of a frame; r_{bjs} is the s-th element of the j-th frame of the template with index b; q(i,b,h,j) is the index of the frame of the input continuous pattern sequence that corresponds, under the time-warping map with index h, to the j-th frame of the template with index b; and x_{q(i,b,h,j)s} is the s-th element of the q(i,b,h,j)-th frame of the input continuous pattern sequence.

[0019] Next, a recognition/learning processing unit 6 carries out the spotting decision and the updating of the class model 4 and a threshold 5. FIG. 2 shows the procedure of these processes.

[0020] First, a decision processing unit SP1 in FIG. 2 determines whether a pattern of the class to be spotted exists at each position of the input continuous pattern sequence. If the class distance described above is smaller than the threshold 5, a pattern of the class to be spotted is judged to exist; otherwise it is judged not to exist. Next, an update processing unit SP2 updates the system parameters. For a portion R of the input continuous pattern sequence for which a teacher signal indicates whether a pattern of the class to be spotted exists, a function reflecting the decision as to whether such a pattern exists in R, namely the spotting measure, is defined by equation (5) below.

[0021]

(Equation 5)

[0022] Here, Λ denotes all the parameters of the system, consisting of the class model 4 (Λ^*) and the threshold 5 (z); I_R is the total number of frames contained in the portion R of the input continuous pattern sequence; and μ is a positive number. A negative spotting measure for the portion R reflects the decision that a pattern of the class to be spotted exists in R, and a non-negative spotting measure reflects the decision that none exists.

[0023] Equation (5) can be regarded as implementing the "belongs" / "does not belong" decision by comparing the threshold 5 with an L_p-norm form of distance that is a slight modification of the distance of equation (1). The second term on the right-hand side of equation (5) is this L_p-norm distance. The reason for this complicated form is that computing the distance of equation (1) at only a single time point of the input is undesirable from the standpoint of decision stability; averaging the distance over the sub-interval R improves stability. In equation (5) the result of the comparison is expressed as the difference of two terms. If the difference, that is, the spotting measure d, is negative, the average distance in the second term on the right-hand side is larger than the threshold. A negative d therefore leads to the decision that no pattern of the class exists in R. If d is non-negative (zero or positive), the average distance in the second term is smaller than the threshold. It is then natural to judge that the subsequence is close to the class model, and in this case the pattern of the class is judged to exist. Using training input patterns annotated in advance with information on which subsequences do and do not contain a pattern of the class, the signal detection device according to the present invention aims to optimize both the trainable threshold and the class model so that the decision using equation (5) is made as correctly as possible.
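The decision described in this paragraph can be illustrated with a short sketch. Equation (5) is not reproduced in this text, so the exact form used here is an assumption: the averaged distance is taken as a negative-exponent power mean of the per-frame class distances over R, and the spotting measure is taken as the threshold minus that average, following the sign convention of this paragraph (non-negative means "exists"). The function names are hypothetical.

```python
def averaged_distance(class_distances, mu):
    """L_p-norm-style average of positive class distances over a segment R.

    Assumed form: ((1 / I_R) * sum(g_i ** -mu)) ** (-1 / mu).
    The result lies between min and max of the inputs and approaches
    the minimum as mu grows.
    """
    n = len(class_distances)
    return (sum(g ** -mu for g in class_distances) / n) ** (-1.0 / mu)


def spotting_measure(class_distances, threshold, mu=4.0):
    """Threshold minus averaged distance (sign convention assumed from the text)."""
    return threshold - averaged_distance(class_distances, mu)


def pattern_present(class_distances, threshold, mu=4.0):
    """Decide 'exists' when the spotting measure is zero or positive."""
    return spotting_measure(class_distances, threshold, mu) >= 0.0
```

For example, with class distances 9.0, 2.0, and 8.0 over a three-frame segment, the averaged distance for mu = 4 is dominated by the closest frame, so the segment is accepted with a threshold of 3.0 and rejected with a threshold of 2.0.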

[0024]

(Equation 6)

In equation (6), "if V(R) = T" denotes the case where a pattern of the class to be spotted actually exists in the portion R of the continuous pattern sequence, and "if V(R) = F" denotes the case where no such pattern actually exists in R. The upper row of equation (6) approximates the number of omission errors. The upper row covers the situation where the subsequence does contain a pattern of the class, so an omission error occurs only when the decision is "does not exist". Indeed, the loss function in the upper row takes a value close to 0 (in other words, it evaluates "no error") when the spotting measure of equation (5) takes a large, generally non-negative value (decision "exists"), and a value close to 1 when the spotting measure of equation (5) is negative (decision "does not exist"). In other words, it is constructed to prevent omission errors, which arise from deciding "does not exist" although the pattern is in fact there. The lower row of equation (6) is the opposite case: it counts the insertion errors, in which the decision is "exists" although no pattern of the class is actually present. Because a minus sign is placed in front of α, the loss function in the lower row takes a value close to 1 when the measure of equation (5) takes a large positive value, and a value close to 0 when the measure is negative.
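The behaviour of the two rows of the loss can be checked numerically with a sketch. Equation (6) is not reproduced in this text, so the sigmoid form below, including where β enters, is an assumption chosen only to match the description: the upper row falls from 1 to 0 as the spotting measure d grows, and the lower row, with the sign of α reversed, rises from 0 to 1. The default values are the ones quoted for the experiment; the function name is hypothetical.

```python
import math


def spotting_loss(d, pattern_in_segment, alpha=0.05, beta=10.0,
                  gamma_t=1.0, gamma_f=0.5):
    """Smooth error count for one segment R (assumed sigmoid form).

    pattern_in_segment=True  corresponds to V(R) = T (omission errors),
    pattern_in_segment=False corresponds to V(R) = F (insertion errors).
    """
    sign = 1.0 if pattern_in_segment else -1.0      # minus sign on alpha for V(R) = F
    weight = gamma_t if pattern_in_segment else gamma_f
    u = min(sign * alpha * d + beta, 700.0)         # clamp to avoid overflow in exp
    return weight / (1.0 + math.exp(u))
```

With this form, a segment that contains the pattern contributes almost nothing when d is large and positive and almost gamma_t when d is strongly negative; a segment without the pattern behaves the other way round, weighted by gamma_f.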

[0025] Here, the function V(R) is a teacher signal indicating whether a pattern of the class to be spotted actually exists in the portion R of the input continuous pattern sequence; it takes the value T when one exists and the value F when none exists. α is a positive constant, β is a constant, γ_t is the weight for omission errors, and γ_f is the weight for insertion errors. When the generalized probabilistic descent method is used as the optimal-solution search method, the system parameters are updated according to equation (7). Equation (7) is an update formula for the trainable parameters that follows a theorem proposed in the field of pattern recognition, known as the generalized probabilistic descent method. Suppose that many training patterns, annotated in advance with whether they contain a pattern of the class, are available, that updates according to equation (7) have been repeated, and that the n-th update has just been completed. Each update basically changes the reference patterns and the threshold by only a small amount on the basis of their gradient (first-order derivatives). The actual change is the second term on the right-hand side: subtracting it from the first term, the parameter state Λ(n) at the end of the n-th update, performs the (n+1)-th update and yields Λ(n+1) on the left-hand side. ∇l_R(x(n); Λ(n)) in the second term is the gradient of the loss function, that is, the vector of first-order derivatives of the loss function with the trainable parameters as variables. In this formula only the loss for the decision result registered at the (n+1)-th update is used; in other words, only the evaluation of the spotting decision for a single subsequence is used. To explain the subtraction on the right-hand side of equation (7) precisely: the loss l_R(x(n); Λ(n)) basically approximates the number of errors, so a smaller value is better, and each value l_R(x(n); Λ(n)) should be as close to 0, rather than 1, as possible. The function l_R(x(n); Λ(n)) joins 1 and 0 monotonically, like a smooth slope connecting a plateau at 1 with a lowland at 0. The gradient, that is, the steepness of the slope (∇l_R(x(n); Λ(n))), is therefore computed, and the parameters are changed, or in other words moved, in the downhill direction (−∇l_R(x(n); Λ(n))). As has been proven for the probabilistic descent method, repeating such downhill updates of the loss function eventually reaches a fixed point of the function obtained by averaging the loss over all available samples, that is, the minimum-error state.

[0026]

(Equation 7)

[0027] Here, x(n) and Λ(n) are, respectively, the input continuous pattern sequence and the system parameters at the update made with the n-th input continuous pattern sequence; ε_n is a learning coefficient that takes a small value satisfying the conditions required by the generalized probabilistic descent method; U is a positive definite matrix; and ∇ is the operator that yields the gradient vector of a function. It has been proven by the generalized probabilistic descent method that this update rule realizes a search for the minimum of the spotting error criterion, which is the expected value of the loss function described above.
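Equation (7) itself is not reproduced in this text. Read together, the description of its two right-hand terms and the symbol definitions above suggest the standard generalized probabilistic descent update, which can be written as follows. This is a reconstruction from the surrounding prose, not a copy of the published equation, and the loss is written as l_R as in paragraph [0038].

```latex
\Lambda(n+1) \;=\; \Lambda(n) \;-\; \varepsilon_n \, U \, \nabla l_R\bigl(x(n);\, \Lambda(n)\bigr)
```

The first term is the parameter state after the n-th update, and the second term is the small correction along the gradient of the loss for the single segment presented at that step, scaled by the learning coefficient ε_n and shaped by the positive definite matrix U.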

[0028] Optimization learning methods other than the generalized probabilistic descent method can also be used, and depending on the choice the spotting error criterion need not be defined by a continuous function as above. The definition of the loss function in this example can be simplified to an expression using discontinuous functions, and the learning rule derived from that simplification is a practical one, as shown in the second embodiment described later.

[0029] A result output unit 7 shown in FIG. 1 outputs, on the basis of the recognition result of the recognition/learning processing unit 6, whether a pattern of the class to be spotted exists at each position of the input continuous pattern sequence.

[0030] Closing both switches SW1 and SW2 shown in FIG. 1 allows the class model 4 and the threshold 5 to be updated together; closing only one of them allows only the class model 4 or only the threshold 5 to be updated.
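A single gradient step with the two switches can be sketched as follows. This is a deliberately reduced illustration, not the update equations of the patent: the template is one frame long, the class distance is an assumed squared Euclidean distance, the spotting measure is taken as threshold minus distance, the loss is an assumed sigmoid without the β offset, and the matrix U is taken as the identity. The flags `update_template` and `update_threshold` are hypothetical stand-ins for the two switches.

```python
import math


def gpd_step(template, threshold, frame, present, eps=0.1, alpha=1.0,
             update_template=True, update_threshold=True):
    """One descent step on the smooth spotting loss for a single frame."""
    g = sum((x - r) ** 2 for x, r in zip(frame, template))   # class distance
    d = threshold - g                                        # >= 0 means "exists"
    sign = 1.0 if present else -1.0
    loss = 1.0 / (1.0 + math.exp(sign * alpha * d))          # assumed sigmoid loss
    dl_dd = -sign * alpha * loss * (1.0 - loss)              # d(loss) / d(d)

    new_template = list(template)
    new_threshold = threshold
    if update_template:
        # d(d)/d(r_s) = 2 * (x_s - r_s), so the template moves toward the frame
        # when the pattern is present and away from it otherwise.
        new_template = [r - eps * dl_dd * 2.0 * (x - r)
                        for x, r in zip(frame, template)]
    if update_threshold:
        # d(d)/d(z) = 1, so the threshold rises for a present pattern
        # and falls for an absent one.
        new_threshold = threshold - eps * dl_dd
    return new_template, new_threshold
```

Calling the function with only one flag set changes only the corresponding parameter, which mirrors the choice of closing one switch or both.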

[0031] Next, a second embodiment is described. The second embodiment uses an update rule simplified by an approximation in which the loss function, expressed by continuous functions in the definition of the update rule of the first embodiment, is replaced by the discontinuous function obtained in the limit as μ, ξ, and ζ tend to infinity. In this example the class model 4 and the threshold 5 are updated according to equations (8) and (9) below.

[0032]

(Equation 8)

[0033]

(Equation 9)

[0034] where

[0035]

(Equation 10)

[0036]

(Equation 11)

[0037]

(Equation 12)

[0038] Here, the template indices b are assigned in ascending order of cumulative path distance at the i^*-th frame of the input continuous pattern sequence, and the time-warping map indices m are assigned, for each template, in ascending order of path distance at the i^*-th frame of the input continuous pattern sequence. r_{bjs}(n) and z(n) are, respectively, the s-th element of the j-th frame of the template with index b and the threshold z at the update made with the n-th input continuous pattern sequence, and ν is the derivative of the loss function l_R(d_R(x; Λ)).
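The effect of letting the exponents tend to infinity can be illustrated numerically. The continuous definitions are not reproduced in this text, so the negative-exponent L_p-norm form below is an assumption; it is used only to show why such a form collapses onto its smallest argument, which is consistent with indexing templates and warping maps in ascending order of distance. The name `soft_min` is hypothetical.

```python
def soft_min(values, exponent):
    """Negative-exponent L_p-norm combination of positive distances.

    Assumed form: (sum(v ** -exponent)) ** (-1 / exponent).
    As the exponent grows, the smallest distance dominates the sum
    and the result approaches min(values) from below.
    """
    return sum(v ** -exponent for v in values) ** (-1.0 / exponent)


distances = [3.0, 5.0, 4.0]
for exponent in (1, 4, 16, 50):
    print(exponent, soft_min(distances, exponent))
```

In the limit only the best-matching template and the best warping path contribute, so the smooth combination can be replaced by a plain minimum.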

[0039] In the second embodiment, fast computation methods such as the DP (dynamic programming) method and the A^* search method can be used in the recognition/learning processing unit 6 of FIG. 1, so it is far more efficient and practical than the first embodiment.
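A minimal dynamic-programming sketch shows how the smallest path distance can be obtained for every end frame in one pass. The details are assumptions, since equations (3) and (4) and the set of permitted time-warping maps are not reproduced in this text: the local distance is taken as squared Euclidean, each template frame is used exactly once, successive template frames may map to input frames 0, 1, or 2 steps apart, and the start point is free. The function name is hypothetical.

```python
def cumulative_path_distances(frames, template, weights=None):
    """For each end frame i, the minimum weighted path distance between the
    template and a subsequence of `frames` ending at i (assumed step pattern)."""
    n_template = len(template)
    weights = weights or [1.0] * n_template

    def local(i, j):
        # Assumed local distance: squared Euclidean between frame i and template frame j.
        return sum((x - r) ** 2 for x, r in zip(frames[i], template[j]))

    # First template frame may align with any input frame (free start point).
    prev = [weights[0] * local(i, 0) for i in range(len(frames))]
    for j in range(1, n_template):
        cur = []
        for i in range(len(frames)):
            # Previous template frame aligned 0, 1, or 2 input frames earlier.
            best = min(prev[k] for k in (i, i - 1, i - 2) if k >= 0)
            cur.append(best + weights[j] * local(i, j))
        prev = cur
    return prev


def spot(frames, template, threshold):
    """End-frame indices whose distance is smaller than the threshold."""
    distances = cumulative_path_distances(frames, template)
    return [i for i, dist in enumerate(distances) if dist < threshold]
```

The cost is proportional to the number of input frames times the number of template frames, instead of growing with the number of possible warping maps.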

[0040] Next, the effect of the invention was confirmed in an experiment on spotting the phoneme /k/ in sentences. The speech samples used for training were 50 sentences, spoken by one male speaker, in which the phonemes occur in roughly equal numbers. V(R) = T was assigned to the portion consisting of the final frame of each /k/, as determined by visual inspection, together with the three frames before and the three frames after it; V(R) = F was assigned elsewhere. Training according to the second embodiment of the invention was carried out with K = M = 1, ω_k = ρ_m = 1, w_{bj} = 1, α = 0.05, β = 10.0, γ_t = 1.0, and γ_f = 0.5. With the conventional method of designing the class model and the threshold, there were 33 omission errors and 170 insertion errors against a total of 134 occurrences of /k/. With training according to the second embodiment, these fell to 18 omission errors and 40 insertion errors.
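The reported counts can be put side by side with a few lines of arithmetic. The sketch below only restates the figures quoted above; the derived quantities (total errors, omission rate per /k/ occurrence, and relative reduction) are illustrative and the names are hypothetical.

```python
TOTAL_K = 134  # occurrences of /k/ reported in the experiment

conventional = {"omission": 33, "insertion": 170}
proposed = {"omission": 18, "insertion": 40}


def summarize(errors):
    """Total error count and omission rate relative to the number of /k/ tokens."""
    total = errors["omission"] + errors["insertion"]
    return {"total": total, "omission_rate": errors["omission"] / TOTAL_K}


before = summarize(conventional)
after = summarize(proposed)
reduction = 1.0 - after["total"] / before["total"]
print(before, after, f"relative reduction in total errors: {reduction:.1%}")
```

The total falls from 203 to 58 errors, a reduction of roughly 71 percent, with most of the gain coming from fewer insertion errors.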

[0041]

[Effects of the Invention] As described above, according to the present invention, in a signal detection device using the spotting method, recognition performance can be improved by updating the template and the threshold so as to reduce the number of spotting errors.

[Brief description of the drawings]

[FIG. 1] A schematic block diagram showing an example of a speech recognition system to which one embodiment of the present invention is applied.

[FIG. 2] A diagram showing the processing procedure of the recognition/learning processing unit 6 shown in FIG. 1.

[Explanation of symbols]

1 Signal input unit
2 Feature extraction unit
3 Distance calculation unit
4 Class model
5 Threshold
6 Recognition/learning processing unit
7 Result output unit

Continuation of front page
(72) Inventor: Shigeru Katagiri, c/o ATR Auditory and Visual Perception Research Laboratories, 5 Sanpeidani, Inuidani, Seika-cho, Soraku-gun, Kyoto
(56) References: JP-A-4-125597 (JP, A); JP-A-4-124782 (JP, A); JP-A-4-148384 (JP, A); "Pattern Recognition" by Kyoya Iijima, first edition published November 10, 1973, third edition published March 15, 1979, Corona Publishing Co., Ltd., pp. 225-255

Claims

(57) [Claims]

1. A signal detection device that uses the spotting method to determine whether a signal sequence within a sequence of input signals, such as speech, characters, or images, belongs to a class serving as the unit of recognition, and that extracts a desired signal sequence, the device comprising: feature extraction means for converting the input signal sequence into a feature sequence; distance calculation means for calculating, from an arbitrary subsequence of the feature sequence obtained by the feature extraction means and a template of the class, a distance as a value expressing how likely the subsequence is to belong to the class; detection means for deciding that the subsequence belongs to the class if the distance calculated by the distance calculation means is smaller than a threshold of the class, and that it does not belong to the class otherwise; and updating means for updating the template and the threshold on the basis of a gradient composed of the first-order derivatives, defined with respect to the template and the threshold, of the number of occurrences of errors made by the detection means, namely the error of deciding that the subsequence does not belong to the class although it does, and the error of deciding that the subsequence belongs to the class although it does not.