JP6435133B2

JP6435133B2 - Phoneme segmentation apparatus, speech processing system, phoneme segmentation method, and phoneme segmentation program

Info

Publication number: JP6435133B2
Application number: JP2014163880A
Authority: JP
Inventors: 川上　福司; 福司川上; 雅和木山; 健久岡本
Original assignee: Nippon Sheet Glass Environment Amenity Co Ltd
Current assignee: Nippon Sheet Glass Environment Amenity Co Ltd
Priority date: 2014-08-11
Filing date: 2014-08-11
Publication date: 2018-12-05
Anticipated expiration: 2034-08-11
Also published as: JP2016038552A

Description

The present invention relates to a phoneme segmentation apparatus for segmenting and extracting phonemes from a speech signal, a speech processing system using the phoneme segmentation apparatus, a phoneme segmentation method, a phoneme segmentation program, and a noise measurement apparatus.

In recent years, with the enforcement of the Personal Information Protection Act and similar measures, the need to protect conversational information in banks and offices has grown. As one means of doing so, apart from conventional sound insulation and soundproofing that physically separate spaces, speech privacy systems (devices for keeping speech information confidential) have been proposed that conceal a speech signal with other noise, music, or the like (informational masking) in open-plan offices and similar settings. One known speech privacy system uses the original speech as the masker (see, for example, Patent Document 1).

General electroacoustic SR (Sound Reinforcement) and PA (Public Address) systems are used to improve loudness and clarity. A speech privacy system, by contrast, uses signal processing to change the structure of the speech signal itself in substantially real time. It conceals or blocks only the content of the speech, without greatly changing statistical properties such as the spectrum and energy envelope of the speech signal, so that listeners cannot understand what is being said.

In Patent Document 1, "approximately one peak" of the speech envelope is extracted as one phoneme, and the structure of the speech is changed, for example by rearranging these phonemes, so that the result can be used as a masker (separate speech superimposed on the original speech to conceal its content).

Conventionally, the usual way to extract approximately one peak of the speech envelope is to treat as one phoneme (1 mora) the interval from when the energy envelope of the input speech rises above a threshold until it returns to its original level.

Patent Document 1: JP 2011-123141 A

However, background noise is normally superimposed on input speech picked up by a microphone or the like. The background noise of a typical room or space is nearly constant over short periods but tends to vary considerably over long periods. Therefore, even if the threshold is set based on the background noise level at a certain time, appropriate phoneme segmentation may become impossible as the background noise level changes. Manually adjusting the threshold to follow those changes is also laborious.

The present invention has been made in view of these problems, and its object is to provide a technique that makes it possible to detect the background noise level automatically.

To solve the above problem, a phoneme segmentation apparatus according to one aspect of the present invention includes: a first branching unit that splits into two a sound signal in which a background noise signal is superimposed on a speech signal; a second branching unit that further splits into two one of the sound signals split by the first branching unit; a speech time-constant unit that smooths one of the sound signals split by the second branching unit with a speech time constant of several tens to several hundreds of milliseconds; a background-noise time-constant unit that smooths rising edges of the other sound signal split by the second branching unit with a rise time constant at least 10 times larger than the speech time constant, and smooths falling edges of that signal with a fall time constant substantially equal to the speech time constant; a comparison unit that compares the signal from the speech time-constant unit with the signal from the background-noise time-constant unit; and a gate unit that controls, according to the comparison result of the comparison unit, whether the other sound signal split by the first branching unit is passed or blocked.

Another aspect of the present invention is a speech processing system. The system includes: a sound collection device that picks up the original speech and outputs a sound signal in which a background noise signal is superimposed on a speech signal; the phoneme segmentation apparatus described above, which receives the sound signal from the sound collection device and segments the speech signal into phonemes; a phoneme processing device that applies predetermined processing to the phoneme signals obtained from the phoneme segmentation apparatus; and an output device that outputs the phoneme signals processed by the phoneme processing device into a space as sound.

Still another aspect of the present invention is a phoneme segmentation method. The method includes: a first branching step of splitting into two a sound signal in which a background noise signal is superimposed on a speech signal; a second branching step of splitting into two one of the sound signals split in the first branching step; a first smoothing step of smoothing one of the sound signals split in the second branching step with a speech time constant of several tens to several hundreds of milliseconds; a second smoothing step of smoothing rising edges of the other sound signal split in the second branching step with a rise time constant at least 10 times larger than the speech time constant, and smoothing falling edges of that signal with a fall time constant substantially equal to the speech time constant; a comparison step of comparing the signal computed in the first smoothing step with the signal computed in the second smoothing step; and a pass control step of controlling, according to the comparison result of the comparison step, whether the other sound signal split in the first branching step is passed or blocked.

Still another aspect of the present invention is a phoneme segmentation program. The program causes a computer to execute: a first branching step of splitting into two a sound signal in which a background noise signal is superimposed on a speech signal; a second branching step of splitting into two one of the sound signals split in the first branching step; a first smoothing step of smoothing one of the sound signals split in the second branching step with a speech time constant of several tens to several hundreds of milliseconds; a second smoothing step of smoothing rising edges of the other sound signal split in the second branching step with a rise time constant at least 10 times larger than the speech time constant, and smoothing falling edges of that signal with a fall time constant substantially equal to the speech time constant; a comparison step of comparing the signal computed in the first smoothing step with the signal computed in the second smoothing step; and a pass control step of controlling, according to the comparison result of the comparison step, whether the other sound signal split in the first branching step is passed or blocked.

Still another aspect of the present invention is a noise measurement apparatus. This apparatus measures the level of background noise contained in ambient sound and includes: a sound collection unit that picks up the ambient sound; and a background-noise time-constant unit that smooths falling edges of the sound signal from the sound collection unit with a fall time constant of several tens to several hundreds of milliseconds, and smooths rising edges of that signal with a rise time constant at least 10 times larger than the fall time constant.

Still another aspect of the present invention is also a noise measurement apparatus. This apparatus measures the level of noise contained in ambient sound and includes: a branching unit that splits the ambient sound into two; a speech time-constant unit that smooths one of the sound signals split by the branching unit with a speech time constant of several tens to several hundreds of milliseconds; a background-noise time-constant unit that smooths rising edges of the other sound signal split by the branching unit with a rise time constant at least 10 times larger than the speech time constant, and smooths falling edges of that signal with a fall time constant substantially equal to the speech time constant; and a display unit that displays the signal from the speech time-constant unit and the signal from the background-noise time-constant unit.

Any combination of the above components, and any conversion of the components or expressions of the present invention among apparatuses, methods, systems, computer programs, recording media storing computer programs, and the like, are also valid as aspects of the present invention.

According to the present invention, the background noise level can be detected automatically.

FIG. 1 is a diagram for explaining an example of a conventional phoneme segmentation apparatus.
FIGS. 2(a) to 2(e) are diagrams for explaining the phoneme segmentation processing performed by the phoneme segmentation apparatus shown in FIG. 1.
FIG. 3 is a diagram for explaining a phoneme segmentation apparatus according to an embodiment of the present invention.
FIGS. 4(a) to 4(f) are diagrams for explaining the phoneme segmentation processing performed by the phoneme segmentation apparatus shown in FIG. 3.
FIG. 5 is a diagram for explaining a phoneme segmentation apparatus according to a modification.
FIG. 6 is a diagram for explaining a phoneme segmentation apparatus according to another embodiment of the present invention.
FIG. 7 is a diagram for explaining a phoneme segmentation apparatus according to still another embodiment of the present invention.
FIGS. 8(a) to 8(c) are diagrams for explaining the phoneme segmentation processing performed by the phoneme segmentation apparatus shown in FIG. 7.
FIG. 9 is a diagram for explaining a speech processing system according to still another embodiment of the present invention.
FIG. 10 is a diagram for explaining a noise measurement apparatus according to still another embodiment of the present invention.
FIG. 11 is a diagram for explaining a noise measurement apparatus according to still another embodiment of the present invention.
FIG. 12 is a diagram showing an example of the noise level display produced by the display unit.

The present invention is described below on the basis of preferred embodiments, with reference to the drawings. Identical or equivalent components, members, and processes shown in the drawings are given the same reference numerals, and duplicate descriptions are omitted where appropriate.

Before describing the phoneme segmentation apparatus according to an embodiment of the present invention, an example of a conventional phoneme segmentation apparatus is described.

FIG. 1 is a diagram for explaining an example of a conventional phoneme segmentation apparatus. The phoneme segmentation apparatus 100 shown in FIG. 1 includes a microphone amplifier 104, an absolute-value circuit 106, a time-constant circuit 108, a comparator 110, a variable resistor 112, and a gate circuit 114.

The microphone amplifier 104 is connected to a microphone 102. The microphone 102 picks up the original speech (the maskee) and converts it into a sound signal. This sound signal is speech with background noise superimposed on it. The microphone amplifier 104 amplifies the sound signal from the microphone 102. The sound signal X(t) output from the microphone amplifier 104 is split into two by a branching unit 105; one copy of X(t) is input to the absolute-value circuit 106 and the other is input to the gate circuit 114. The absolute-value circuit 106 outputs the absolute value of the sound signal X(t). The absolute value |X(t)| output from the absolute-value circuit 106 is input to the time-constant circuit 108. The time-constant circuit 108 is a first-order low-pass filter made up of a resistor with resistance R and a capacitor with capacitance C, and its time constant τ = RC is set to about 100 ms. The time-constant circuit 108 smooths the absolute value |X(t)| of the sound signal. This smoothing removes from |X(t)| the components that vary faster than the time constant τ, yielding an envelope signal A(t). The comparator 110 compares the envelope signal A(t) with a predetermined threshold T and opens the gate circuit 114 whenever A(t) ≥ T. The section of the sound signal that passes while the gate circuit 114 is open is thereby segmented and extracted as a phoneme (mora).
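The conventional chain described above (absolute value, RC smoothing, comparison with a fixed threshold, gating) can be sketched in discrete time as follows. This is an illustrative sketch only, not the circuit of FIG. 1: the function names `envelope` and `gate_fixed`, the one-pole digital filter standing in for the analog RC stage, the 1 kHz sampling rate, and the test signal are all assumptions introduced here.

```python
import math

def envelope(signal, fs, tau=0.1):
    """One-pole low-pass of |x|: a discrete stand-in for the RC stage (tau = RC, in seconds)."""
    alpha = 1.0 - math.exp(-1.0 / (fs * tau))
    a, out = 0.0, []
    for x in signal:
        a += alpha * (abs(x) - a)
        out.append(a)
    return out

def gate_fixed(signal, fs, threshold, tau=0.1):
    """Pass a sample while the envelope A(t) >= threshold T; output 0.0 otherwise."""
    env = envelope(signal, fs, tau)
    return [x if a >= threshold else 0.0 for x, a in zip(signal, env)]

# Usage: a 0.3 s, 100 Hz tone burst surrounded by silence, sampled at 1 kHz (hypothetical values)
fs = 1000
burst = [math.sin(2 * math.pi * 100 * n / fs) for n in range(300)]
sig = [0.0] * 300 + burst + [0.0] * 400
out = gate_fixed(sig, fs, threshold=0.2)
```

With these values the gate stays closed during the leading silence and opens some tens of milliseconds into the burst, once the smoothed envelope has climbed past the threshold.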

In the phoneme segmentation apparatus 100, the threshold T is set manually by adjusting the supply voltage +Vc with the variable resistor 112. Setting this threshold T correctly is essential for segmenting speech into phonemes with high accuracy.

FIGS. 2(a) to 2(e) are diagrams for explaining the phoneme segmentation processing performed by the phoneme segmentation apparatus 100 shown in FIG. 1. In each of FIGS. 2(a) to 2(e), the vertical axis represents signal level in arbitrary units and the horizontal axis represents time t. FIG. 2(a) shows the waveform of the sound signal X(t) output from the microphone amplifier 104. FIG. 2(b) shows the waveform of the absolute value |X(t)| output from the absolute-value circuit 106 and the waveform of the envelope signal A(t) output from the time-constant circuit 108. Each "approximately one peak" in the waveform of the envelope signal A(t) corresponds to one phoneme (1 mora).

As shown in FIG. 2(b), the envelope signal A(t) in this example contains six such peaks, that is, phonemes 1 to 6. FIG. 2(b) also shows three levels of the threshold T (thresholds T1 to T3) used by the comparator 110 when segmenting phonemes from the envelope signal A(t). FIGS. 2(c) to 2(e) show the waveforms of the output signal from the gate circuit 114, that is, the segmentation results produced by the phoneme segmentation apparatus 100. These results depend on the threshold T.

FIG. 2(c) shows the segmentation result when the threshold T is set to a threshold T1 that is sufficiently larger than the background noise level. In a method that, like the phoneme segmentation apparatus 100, segments phonemes by comparing the envelope of the sound signal with a threshold, trying to extract phonemes as stably as possible leads to setting the threshold T well above the background noise level. In that case, as shown in FIG. 2(c), low-level phonemes such as phonemes 4 and 6 may be dropped.

FIG. 2(e) shows the segmentation result when the threshold T is set to a small threshold T3 comparable to the background noise level. In this case, as shown in FIG. 2(e), the adjacent phonemes 1 and 2 and the adjacent phonemes 3 to 5 are not separated. When the threshold T is set this low, the boundaries between phonemes become ambiguous, and several phonemes may run together without being separated.

FIG. 2(d) shows the segmentation result when the threshold T is set to the optimum threshold T2, obtained by adding a margin H to the background noise level. In this case, as shown in FIG. 2(d), phonemes 1 to 6 are segmented appropriately. Appropriate phoneme segmentation therefore requires detecting the background noise level accurately and setting the threshold T to a value slightly above it.
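The dependence on the threshold described for FIGS. 2(c) to 2(e) can be reproduced with a toy envelope. The sample values and the thresholds below are hypothetical numbers chosen only to mimic the qualitative behaviour in the text (six peaks over a noise floor of about 0.10); they are not taken from the figures, and `count_segments` is a helper name introduced here.

```python
def count_segments(env, threshold):
    """Count the intervals during which the envelope stays at or above the threshold."""
    count, was_open = 0, False
    for a in env:
        is_open = a >= threshold
        if is_open and not was_open:
            count += 1  # a new gate-open interval starts here
        was_open = is_open
    return count

# Hypothetical envelope: peaks 1..6 with shallow valleys between peaks 1-2 and 3-4-5
env = [0.10, 1.0, 0.14, 0.9, 0.10, 0.8, 0.14, 0.3, 0.14, 0.7, 0.10, 0.25, 0.10]
T1, T2, T3 = 0.5, 0.2, 0.12  # high, "noise floor + margin", and near the noise floor
counts = {name: count_segments(env, t) for name, t in (("T1", T1), ("T2", T2), ("T3", T3))}
print(counts)  # {'T1': 4, 'T2': 6, 'T3': 3}
```

With the high threshold the two small peaks (4 and 6) are lost, with the low threshold peaks 1-2 and 3-5 merge, and only the intermediate threshold recovers all six segments.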

The background noise of a typical room or space is nearly constant over short periods, but over long periods it tends to vary considerably, for example between morning and afternoon or between lunchtime and working hours. Therefore, even if the threshold T is set based on the background noise level at a certain time, appropriate phoneme segmentation may become impossible as the background noise level changes. Manually adjusting the threshold T to follow those changes is also laborious.

Having recognized these problems with conventional phoneme segmentation apparatuses, the present inventors devised a phoneme segmentation method and apparatus that make it possible to perform appropriate phoneme segmentation automatically even when the background noise level varies.

FIG. 3 is a diagram for explaining a phoneme segmentation apparatus 10 according to an embodiment of the present invention. As shown in FIG. 3, the phoneme segmentation apparatus 10 includes a microphone amplifier 14, a band-pass filter 15, a squaring circuit 16, a speech time-constant circuit 18, a background-noise time-constant circuit 20, a speech square-root circuit 22, a background-noise square-root circuit 24, a buffer amplifier 26, an adder 28, a variable resistor 29, a comparator 30, and a gate circuit 32.

The microphone amplifier 14 is connected to a microphone 12. The microphone 12 picks up original speech (the maskee), such as a conversation, and converts it into a sound signal. The microphone amplifier 14 amplifies the sound signal from the microphone 12.

The band-pass filter 15 passes those components of the amplified sound signal from the microphone amplifier 14 that fall within a predetermined passband. The band-pass filter 15 has, for example, a passband corresponding to the average spectrum of adult speech (for example, 100 Hz to 7 kHz, more preferably 250 Hz to 4 kHz). Providing the band-pass filter 15 improves the accuracy of phoneme segmentation and extraction.

The sound signal X(t) output from the band-pass filter 15 is the speech signal x(t) with the background noise signal n(t) of the room (space) superimposed on it. That is, X(t) = x(t) + n(t).

The sound signal X(t) output from the band-pass filter 15 is split into two by a first branching unit 13. One copy of X(t) is input to the squaring circuit 16, and the other is input to the gate circuit 32. As FIG. 3 shows, in this embodiment the band-pass filter 15 is placed upstream of the first branching unit 13, and the squaring circuit 16 is placed between the first branching unit 13 and a second branching unit 17.

The squaring circuit 16 outputs the squared signal X²(t) of the copy of the sound signal X(t) it receives from the first branching unit 13. The sound signal X(t) takes both positive and negative values. Squaring X(t) in the squaring circuit 16 means that only positive values need to be handled afterwards, which simplifies the signal processing. The squared signal X²(t) output from the squaring circuit 16 is split into two by the second branching unit 17. One copy of X²(t) is input to the speech time-constant circuit 18, and the other is input to the background-noise time-constant circuit 20.

The speech time-constant circuit 18 is a first-order low-pass filter made up of a first resistor 34 with resistance R and a first capacitor 36 with capacitance C. One terminal of the first resistor 34 is connected to the squaring circuit 16, and the other terminal is connected to the speech square-root circuit 22. One terminal of the first capacitor 36 is connected to the other terminal of the first resistor 34, and the other terminal of the first capacitor 36 is grounded. The time constant of the speech time-constant circuit 18 (hereinafter the "speech time constant") τ_v = RC is set to a relatively small value of several tens to several hundreds of milliseconds (for example, 125 ms). The speech time-constant circuit 18 smooths (averages) the squared signal X²(t) with the speech time constant τ_v. This smoothing (averaging) removes from X²(t) the components that vary faster than τ_v, yielding the envelope signal of the squared signal X²(t).
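As a rough illustration of this smoothing stage, the sketch below squares each sample and applies a one-pole digital low-pass filter whose time constant plays the role of τ_v = RC. The discrete filter, the function name `smooth_power`, the 8 kHz sampling rate, and the test tone are assumptions made for the example; the embodiment itself describes an analog RC circuit.

```python
import math

def smooth_power(signal, fs, tau_v=0.125):
    """Square each sample, then smooth with a one-pole low-pass of time constant tau_v (seconds)."""
    alpha = 1.0 - math.exp(-1.0 / (fs * tau_v))
    p, out = 0.0, []
    for x in signal:
        p += alpha * (x * x - p)  # first-order step toward the instantaneous power
        out.append(p)
    return out

# Usage: 1 s of a unit-amplitude 500 Hz tone at 8 kHz (hypothetical test signal)
fs = 8000
tone = [math.sin(2 * math.pi * 500 * n / fs) for n in range(fs)]
power = smooth_power(tone, fs)
rms = math.sqrt(power[-1])  # square root of the smoothed power
```

After about eight time constants the smoothed power has settled close to the mean square of the tone (0.5), so the square root of the output is close to the tone's effective value of about 0.707.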

The speech square-root circuit 22, provided downstream of the speech time-constant circuit 18, computes the square root of the signal input from the speech time-constant circuit 18. The signal A(t) output from the speech square-root circuit 22 can be regarded as the envelope of the speech signal x(t), that is, the effective (RMS) value x_rms of the speech signal x(t) (see the formula below). Hereinafter, A(t) is called the "speech envelope signal".
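As a hedged sketch of the relationship described here, and not necessarily the formula of the original publication, assume that the speech time-constant circuit 18 behaves as an ideal first-order RC low-pass filter with impulse response (1/τ_v)·exp(−s/τ_v). Under that assumption, the chain of squaring circuit 16, speech time-constant circuit 18, and speech square-root circuit 22 can be written as:

```latex
A(t) = \sqrt{\frac{1}{\tau_v}\int_{0}^{\infty} X^{2}(t-s)\,e^{-s/\tau_v}\,ds},
\qquad
A(t) \approx x_{\mathrm{rms}} \quad \text{when } |x(t)| \gg |n(t)|
```

The approximation on the right holds only while the speech component dominates the background noise; otherwise A(t) reflects the combined signal X(t) = x(t) + n(t).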

The background-noise time-constant circuit 20 is a first-order low-pass filter configured so that its time constant differs between rising and falling edges of the input signal. It is made up of a second resistor 38 with resistance R', a second capacitor 40 with capacitance C, a diode 42, and a third resistor 44 with resistance R. One terminal of the second resistor 38 is connected to the squaring circuit 16, and the other terminal is connected to the background-noise square-root circuit 24. The cathode of the diode 42 is connected to the squaring circuit 16, and its anode is connected to one terminal of the third resistor 44. The other terminal of the third resistor 44 is connected to the background-noise square-root circuit 24. One terminal of the second capacitor 40 is connected to the other terminals of the second resistor 38 and the third resistor 44, and the other terminal of the second capacitor 40 is grounded.

In the background-noise time-constant circuit 20 configured in this way, rising edges of the input signal are smoothed by a low-pass filter formed by the second resistor 38 and the second capacitor 40, with time constant τ_u = R'C (hereinafter the "rise time constant"). Falling edges of the input signal are smoothed by a low-pass filter formed by the third resistor 44 and the second capacitor 40, with time constant τ_d = RC (hereinafter the "fall time constant").

In the background-noise time-constant circuit 20 according to this embodiment, the rise time constant τ_u is set to a value far larger than the fall time constant τ_d. In other words, the background-noise time-constant circuit 20 has two asymmetric time constants. Specifically, the rise time constant τ_u is set to at least 10 times, and more preferably 100 to 1000 times or more, the fall time constant τ_d. For example, it may be set so that τ_u = R'C ≥ 300τ_d to 3000τ_d. The fall time constant τ_d, on the other hand, is set to the same value as the speech time constant τ_v of the speech time-constant circuit 18; that is, in this embodiment, τ_d = τ_v = RC.
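The asymmetric behaviour can be sketched with a one-pole smoother that switches coefficient depending on whether its input is above or below the current output. This is a simplified digital analogue of the circuit, assuming an ideal diode; the function name `track_noise_power`, the ratio of 300 between the two time constants, and the test sequence are illustrative assumptions.

```python
import math

def track_noise_power(power_in, fs, tau_d=0.125, ratio=300.0):
    """Asymmetric one-pole smoother: slow rise (tau_u = ratio * tau_d), fast fall (tau_d)."""
    if not power_in:
        return []
    a_up = 1.0 - math.exp(-1.0 / (fs * tau_d * ratio))  # coefficient for rising input
    a_down = 1.0 - math.exp(-1.0 / (fs * tau_d))        # coefficient for falling input
    b = power_in[0]  # start at the first sample so the slow rise is not needed to initialise
    out = []
    for p in power_in:
        b += (a_up if p > b else a_down) * (p - b)
        out.append(b)
    return out

# Usage: noise-floor power 0.01, a 0.5 s loud burst of power 1.0, then the noise floor again
fs = 1000
power_in = [0.01] * 1000 + [1.0] * 500 + [0.01] * 1000
tracked = track_noise_power(power_in, fs)
```

During the burst the tracked value creeps up only slightly, because the rise time constant (37.5 s here) is long compared with the burst, and it drops back to the noise floor within a few multiples of τ_d once the burst ends.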

The background-noise square-root circuit 24, provided downstream of the background-noise time-constant circuit 20, computes the square root B(t) of the signal input from the background-noise time-constant circuit 20. Because the rise time constant τ_u of the background-noise time-constant circuit 20 is set far larger than the speech time constant τ_v of the speech time-constant circuit 18, B(t) is almost insensitive to level changes in the speech signal x(t). It stays near the background noise level, which can generally be regarded as nearly constant (for example, the background noise level of a bank lobby or a hospital waiting room), that is, near the lowest level of the sound signal X(t). In other words, B(t) quickly falls to the background noise level through the brief silent gaps (breaks) between phonemes (morae), and overall it always stays at a level equal to the background noise.

In some cases, however, the signal B(t) may rise, albeit slowly, because a loud speaker talks continuously or a short start-of-work bell rings. So that B(t) returns promptly to the original background noise level once such sounds stop, the falling time constant τ_d is given a value different from the rising time constant τ_u, specifically a time constant comparable to the voice time constant τ_v of the voice time constant circuit 18. As a result, the signal B(t) follows slow changes in the background noise of the target space, such as from early morning through the forenoon or from lunchtime into the afternoon, but is almost insensitive to level changes as fast as speech. The signal B(t) varying in this way can be regarded, in contrast to the constantly changing voice envelope signal A(t), as the envelope of the background noise signal n(t), that is, the effective value n_rms of the background noise signal n(t) (see the formula below). Hereinafter, B(t) is referred to as the "background noise envelope signal".
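The asymmetric smoothing described above can be sketched in discrete time as a one-pole low-pass filter whose coefficient depends on whether the input is above or below the current output. This is only an illustrative sketch: the function name `asymmetric_smooth`, the sample rate `fs`, the initial state `y0`, and the exponential coefficient mapping are assumptions, not part of the patent, which describes an analog RC circuit.

```python
import math

def asymmetric_smooth(samples, fs, tau_up, tau_down, y0=0.0):
    """One-pole low-pass with separate rise/fall time constants (in seconds).

    Hypothetical discrete-time stand-in for the background noise time
    constant circuit 20: slow when the input rises, fast when it falls.
    """
    a_up = math.exp(-1.0 / (fs * tau_up))      # coefficient while rising
    a_down = math.exp(-1.0 / (fs * tau_down))  # coefficient while falling
    y = y0
    out = []
    for x in samples:
        a = a_up if x > y else a_down
        y = a * y + (1.0 - a) * x
        out.append(y)
    return out

# Squared-signal power: noise floor, a 200 ms loud burst, noise floor again.
fs = 1000
power = [1.0] * 1000 + [100.0] * 200 + [1.0] * 300
tracked = asymmetric_smooth(power, fs, tau_up=15.0, tau_down=0.05, y0=1.0)
symmetric = asymmetric_smooth(power, fs, tau_up=0.05, tau_down=0.05, y0=1.0)
```

With τ_u = 300τ_d the tracked value barely leaves the noise floor during the burst and returns to it quickly afterwards, whereas the symmetric filter follows the burst almost fully.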

The comparator 30 compares the voice envelope signal A(t) output from the voice square root circuit 22 with the background noise envelope signal B(t) output from the background noise square root circuit 24. In the present embodiment, before being input to the comparator 30, the background noise envelope signal B(t) is amplified by a predetermined gain m using the buffer amplifier 26, and a predetermined offset value h is then added using the adder 28. That is, in the present embodiment the comparator 30 compares the voice envelope signal A(t) with B'(t) = mB(t) + h. Hereinafter, B'(t) = mB(t) + h is referred to as the "threshold signal". The gain m may be selected, for example, in the range m = 1 to 3. The offset value h may be selected, for example, from the range h = 0 to [about 10 times the maximum value expected for B(t)]. Since the background noise level is generally sufficiently lower than the voice level, comparing the voice envelope signal A(t) with a threshold signal B'(t) slightly larger than B(t) in this way allows phoneme division to be performed safely and stably. In a modification, the voice envelope signal A(t) and the background noise envelope signal B(t) may be compared directly.

The comparator 30 outputs a high level to the gate circuit 32 in sections where the voice envelope signal A(t) is equal to or greater than the threshold signal B'(t) (that is, A(t) ≥ B'(t)), and outputs a low level to the gate circuit 32 in sections where the voice envelope signal A(t) is less than the threshold signal B'(t) (that is, A(t) < B'(t)).

The gate circuit 32 controls the passage or non-passage of the other sound signal X(t) branched at the first branching unit 13, according to the comparison result of the comparator 30. That is, when the gate circuit 32 receives a high level from the comparator 30 it opens and passes the sound signal X(t), and when it receives a low level from the comparator 30 it closes and blocks the sound signal X(t). Through this operation, a phoneme signal is output from the gate circuit 32.
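The comparator and gate behaviour can be illustrated with a few lines of code. This is a minimal sketch under stated assumptions: the function name `comparator_and_gate` is hypothetical, the envelopes are given as pre-computed sample lists, and a blocked sample is modelled as 0.0, which the text does not specify.

```python
def comparator_and_gate(x, a_env, b_env, m=2.0, h=0.0):
    """Compare A(t) with B'(t) = m*B(t) + h and gate the sound signal X(t).

    Returns (gate, passed): gate[i] is True for a high comparator level,
    passed[i] is x[i] when the gate is open and 0.0 (assumed) otherwise.
    """
    gate = []
    passed = []
    for xi, a, b in zip(x, a_env, b_env):
        is_open = a >= m * b + h          # comparator 30
        gate.append(is_open)
        passed.append(xi if is_open else 0.0)  # gate circuit 32
    return gate, passed

x = [0.1, -0.8, 0.9, 0.05]
a_env = [0.1, 0.7, 0.7, 0.1]
b_env = [0.05, 0.05, 0.05, 0.05]
gate, passed = comparator_and_gate(x, a_env, b_env, m=2.0, h=0.05)
```

Here the threshold is about 0.15, so only the two middle samples pass.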

FIGS. 4(a) to 4(f) are diagrams for explaining the phoneme division processing performed by the phoneme dividing device 10 shown in FIG. 3. In FIGS. 4(a) to 4(f), the vertical axis represents the signal level v in mV and the horizontal axis represents time t in ms.

FIG. 4(a) shows the waveform of the sound signal X(t) output from the bandpass filter 15. The sound signal X(t) is the voice signal x(t) with the background noise signal n(t) superimposed on it. This sound signal X(t) is branched into two at the first branching unit 13. One branched sound signal X(t) is input to the squaring circuit 16, and the other sound signal X(t) is input to the gate circuit 32.

FIG. 4(b) shows the waveform of the squared signal X²(t) output from the squaring circuit 16. As shown in FIG. 4(b), the squared signal X²(t) contains only positive components. This squared signal X²(t) is branched into two at the second branching unit 17. One branched squared signal X²(t) is input to the voice time constant circuit 18, and the other squared signal X²(t) is input to the background noise time constant circuit 20.

The signal smoothed by the voice time constant circuit 18 has its square root taken by the voice square root circuit 22. This square root is the voice envelope signal A(t). FIG. 4(c) shows the waveform of the voice envelope signal A(t) output from the voice square root circuit 22. As shown in FIG. 4(c), the voice envelope signal A(t) is a positive waveform that varies while roughly following the effective value x_rms of the input original voice.

Meanwhile, the signal smoothed by the background noise time constant circuit 20 has its square root taken by the background noise square root circuit 24. This square root is the background noise envelope signal B(t). FIG. 4(d) shows the waveform of the background noise envelope signal B(t) output from the background noise square root circuit 24. As shown in FIG. 4(d), the background noise envelope signal B(t) hardly follows the input original voice; only in the breaks of the input original voice does it follow the input and drop rapidly to the minimum value, that is, the background noise level. In other words, B(t) is always kept at the background noise level and can be used as the threshold for phoneme division.

The background noise envelope signal B(t) output from the background noise square root circuit 24 is amplified by a factor of m in the buffer amplifier 26, after which the offset value h is added in the adder 28 to give the threshold signal B'(t) = mB(t) + h. FIG. 4(d) shows the waveform of the threshold signal B'(t) in addition to the background noise envelope signal B(t).

FIG. 4(c) shows the threshold signal B'(t) = mB(t) + h in addition to the voice envelope signal A(t). That is, FIG. 4(c) shows the two signals compared by the comparator 30. As shown in FIG. 4(c), intersections between the voice envelope signal A(t) and the threshold signal B'(t) are obtained. FIG. 4(e) shows the output signal of the comparator 30. Among the sections effectively bounded by the intersections of the voice envelope signal A(t) and the threshold signal B'(t), the comparator 30 outputs a high level in sections where A(t) ≥ B'(t) and a low level in sections where A(t) < B'(t).

FIG. 4(f) shows the output signal of the gate circuit 32. The gate circuit 32 passes the sound signal X(t) only while it receives a high level from the comparator 30, and blocks the sound signal X(t) while it receives a low level from the comparator 30. As a result, as shown in FIG. 4(f), phonemes and background noise are clearly separated, and three phonemes are divided and extracted.
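Once the comparator output is available as a sequence of high/low samples, the individual phoneme sections can be recovered by locating each run of high levels. The following is a small sketch of that bookkeeping; the function name `gate_to_segments` and the half-open index convention are assumptions made for illustration.

```python
def gate_to_segments(gate):
    """Turn a per-sample gate signal into (start, end) index pairs.

    Each pair is half-open: samples start..end-1 belong to one
    extracted phoneme section.
    """
    segments = []
    start = None
    for i, is_open in enumerate(gate):
        if is_open and start is None:
            start = i                      # rising edge of the comparator
        elif not is_open and start is not None:
            segments.append((start, i))    # falling edge closes the section
            start = None
    if start is not None:                  # gate still open at end of data
        segments.append((start, len(gate)))
    return segments

gate = [False, True, True, False, False, True, False, True, True, True]
segments = gate_to_segments(gate)
```

For this example gate signal, three sections are found, mirroring the three phonemes in FIG. 4(f).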

The phoneme dividing device 10 according to the present embodiment has been described above. With this phoneme dividing device 10, the background noise level is detected automatically, so the threshold for dividing and extracting phonemes is always kept at an optimum value even if the background noise changes with the time of day. As a result, phoneme division can be performed with higher accuracy than before.

With the phoneme dividing device 10 of the present embodiment, the work of manually adjusting the threshold T to match fluctuations in the background noise level becomes unnecessary, which enables substantial streamlining and labor saving.

FIG. 5 is a diagram for explaining a phoneme dividing device 50 according to a modification. The phoneme dividing device 50 shown in FIG. 5 differs from the phoneme dividing device 10 shown in FIG. 3 in that the bandpass filter 15 is provided between the first branching unit 13 and the squaring circuit 16.

In the phoneme dividing device 50 according to this modification, because the bandpass filter 15 is provided between the first branching unit 13 and the squaring circuit 16, the gate circuit 32 receives a sound signal that has not passed through the bandpass filter. The phoneme dividing device 50 therefore yields a phoneme signal closer to the original voice signal, so sound quality can be improved compared with the phoneme dividing device 10 shown in FIG. 3. Since the sound signal entering the squaring circuit 16 still passes through the bandpass filter 15, the accuracy of phoneme division and extraction is equivalent to that of the phoneme dividing device shown in FIG. 3.

FIG. 6 is a diagram for explaining a phoneme dividing device 60 according to another embodiment of the present invention. The phoneme dividing device 60 shown in FIG. 6 differs from the phoneme dividing device 10 shown in FIG. 3 in that it has an absolute value circuit 62 in place of the squaring circuit.

As shown in FIG. 6, the phoneme dividing device 60 includes a microphone amplifier 14, an absolute value circuit 62, a voice time constant circuit 18, a background noise time constant circuit 20, a buffer amplifier 26, an adder 28, a variable resistor 29, a comparator 30, and a gate circuit 32.

The microphone amplifier 14 is connected to the microphone 12. The microphone 12 picks up the original voice (maskee), such as a conversation, and converts it into a sound signal. The microphone amplifier 14 amplifies the sound signal from the microphone 12. The sound signal X(t) amplified by the microphone amplifier 14 is the voice signal x(t) with the background noise signal n(t) superimposed on it.

The sound signal X(t) output from the microphone amplifier 14 is branched into two at the first branching unit 13. One sound signal X(t) branched at the first branching unit 13 is input to the absolute value circuit 62, and the other sound signal X(t) is input to the gate circuit 32. As in the phoneme dividing device 10 shown in FIG. 3, a bandpass filter may be provided upstream of the first branching unit 13. Alternatively, as in the phoneme dividing device 50 shown in FIG. 5, a bandpass filter may be provided between the first branching unit 13 and the absolute value circuit 62. The absolute value circuit 62 is provided between the first branching unit 13 and the second branching unit 17.

The absolute value circuit 62 outputs the absolute value |X(t)| of the one sound signal X(t) branched at the first branching unit 13. As in the embodiment using the squaring circuit, taking the absolute value of the sound signal X(t) in the absolute value circuit 62 means that only positive values need to be processed, which simplifies signal processing. The absolute value signal |X(t)| output from the absolute value circuit 62 is branched into two at the second branching unit 17. One absolute value signal |X(t)| branched at the second branching unit 17 is input to the voice time constant circuit 18, and the other absolute value signal |X(t)| is input to the background noise time constant circuit 20.

The voice time constant circuit 18 is a first-order low-pass filter composed of a first resistor 34 with resistance R and a second capacitor 36 with capacitance C. The voice time constant circuit 18 smooths (averages) the absolute value signal |X(t)| with a voice time constant τ_v of several tens to several hundreds of ms. The signal A(t) output from the voice time constant circuit 18 can be regarded as the envelope of the voice signal x(t), that is, the effective value x_rms of the voice signal x(t). Hereinafter, A(t) is referred to as the "voice envelope signal".
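As a hedged side note on treating the smoothed absolute value as an effective value: for a pure sinusoid the two quantities differ only by a constant factor. The worked relation below uses a hypothetical amplitude a_0 and angular frequency ω, neither of which appears in the source, and the factor is different for other waveforms such as speech or noise.

```latex
% Assumed test signal: x(t) = a_0 \sin(\omega t), averaged over one period T.
\[
\overline{|x|} = \frac{1}{T}\int_0^T \lvert a_0 \sin(\omega t)\rvert\,dt = \frac{2a_0}{\pi} \approx 0.637\,a_0,
\qquad
x_{\mathrm{rms}} = \sqrt{\frac{1}{T}\int_0^T a_0^2 \sin^2(\omega t)\,dt} = \frac{a_0}{\sqrt{2}} \approx 0.707\,a_0,
\]
\[
\frac{\overline{|x|}}{x_{\mathrm{rms}}} = \frac{2\sqrt{2}}{\pi} \approx 0.900 .
\]
```

Because the same rectification is applied to both the voice branch and the background noise branch, a constant scale factor of this kind can be absorbed into the gain m and the offset h of the threshold signal.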

The background noise time constant circuit 20 is a first-order low-pass filter configured so that its time constant differs between the rise and the fall of the input signal. In the background noise time constant circuit 20, a rise of the input signal is smoothed by a low-pass filter with the rising time constant τ_u = R'C, whereas a fall of the input signal is smoothed by a low-pass filter with the falling time constant τ_d = RC.

In the background noise time constant circuit 20 according to the present embodiment, the rising time constant τ_u is set to a value far larger than the falling time constant τ_d. Specifically, the rising time constant τ_u is set at least 10 times larger than the falling time constant τ_d, and more preferably 100 to 1000 times larger or more. For example, it may be set as τ_u = R'C ≥ 300τ_d to 3000τ_d. The falling time constant τ_d, on the other hand, is set to approximately the same value as the voice time constant τ_v of the voice time constant circuit 18. In the present embodiment, the falling time constant τ_d is equal to the voice time constant (that is, τ_d = τ_v = RC).

The signal B(t) smoothed by the background noise time constant circuit 20 is almost insensitive to level changes of the voice signal x(t) and stays near the background noise level. The signal B(t) can be regarded as the envelope of the background noise signal n(t), that is, the effective value n_rms of the background noise signal n(t). Hereinafter, B(t) is referred to as the "background noise envelope signal".

The comparator 30 compares the voice envelope signal A(t) output from the voice time constant circuit 18 with the background noise envelope signal B(t) output from the background noise time constant circuit 20. In the present embodiment, before being input to the comparator 30, the background noise envelope signal B(t) is amplified by a predetermined gain m using the buffer amplifier 26, and a predetermined offset value h is then added using the adder 28. That is, in the present embodiment the comparator 30 compares the voice envelope signal A(t) with B'(t) = mB(t) + h. Hereinafter, B'(t) = mB(t) + h is referred to as the "threshold signal". The gain m may be selected, for example, in the range m = 1 to 3. The offset value h may be selected, for example, from the range h = 0 to [about 10 times the maximum value expected for B(t)]. In a modification, the voice envelope signal A(t) and the background noise envelope signal B(t) may be compared directly.

The phoneme dividing device 60 according to the present embodiment has been described above. In this phoneme dividing device 60 as well, the background noise level is detected automatically, so the threshold for dividing and extracting phonemes is always kept at an optimum value even if the background noise changes with the time of day. As a result, phoneme division can be performed with higher accuracy than before.

The phoneme dividing device 60 of the present embodiment likewise removes the need to manually adjust the threshold T to match fluctuations in the background noise level, which enables substantial streamlining and labor saving.

FIG. 7 is a diagram for explaining a phoneme dividing device 70 according to yet another embodiment of the present invention. In the embodiments shown in FIGS. 3, 5, and 6, the phoneme division processing is implemented with analog circuits, whereas in the present embodiment shown in FIG. 7 it is implemented in software.

The phoneme dividing device 70 includes a microphone amplifier 14 and a DSP (Digital Signal Processor) board 71. Mounted on the DSP board 71 are an input amplifier 72, an A/D converter 73, a DSP 74, a D/A converter 75, an output amplifier 76, a ROM 77, an SD-RAM 78, an input port 79, and an output port 80.

The microphone amplifier 14 is connected to the microphone 12. The microphone 12 picks up the original voice (maskee), such as a conversation, and converts it into a sound signal. The microphone amplifier 14 amplifies the sound signal from the microphone 12. The sound signal X(t) amplified by the microphone amplifier 14 is input to the input port 79 of the DSP board 71. The sound signal X(t) is an analog signal in which a background noise signal is superimposed on a voice signal. The sound signal X(t) input from the input port 79 is amplified by the input amplifier 72 and then converted into a digital signal by the A/D converter 73. The digital version of the sound signal X(t) output from the A/D converter 73 is input to the DSP 74.

The DSP 74 is connected to the ROM 77, which stores a program for performing phoneme division processing, and to the SD-RAM 78, which stores data being processed by the DSP 74. The DSP 74 reads the phoneme division program from the ROM 77 and performs the phoneme division processing.

The phoneme division program stored in the ROM 77 may be a program that causes the DSP 74 to execute: a first branching step of branching the sound signal X(t) into two; a squaring step of squaring the one sound signal X(t) branched in the first branching step; a second branching step of branching the squared signal X²(t) into two; a first smoothing step of smoothing the one squared signal X²(t) branched in the second branching step with a voice time constant τ_v of several tens to several hundreds of ms; a first square root calculation step of calculating the square root of the signal smoothed in the first smoothing step; a second smoothing step of smoothing the other squared signal X²(t) branched in the second branching step, using for its rise a rising time constant τ_u that is at least 10 times larger than the voice time constant τ_v, and more preferably 100 to 1000 times larger or more, and using for its fall a falling time constant τ_d approximately equal to the voice time constant τ_v; a second square root calculation step of calculating the square root of the signal smoothed in the second smoothing step; a comparison step of comparing the voice envelope signal A(t) calculated in the first square root calculation step with the background noise envelope signal B(t) calculated in the second square root calculation step; and a passage control step of controlling the passage or non-passage of the other sound signal branched in the first branching step according to the comparison result of the comparison step. In the comparison step of this program, the voice envelope signal A(t) may be compared with the threshold signal B'(t) = mB(t) + h. The gain m may be selected, for example, in the range m = 1 to 3. The offset value h may be selected, for example, from the range h = 0 to [about 10 times the maximum value expected for B(t)].
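The sequence of steps above lends itself to a sample-by-sample loop. The sketch below is one possible reading, not the program actually stored in the ROM 77: the class name `PhonemeGate`, the sample rate, the exponential filter coefficients, the parameter values, and the synthetic test signal are all assumptions chosen for illustration, and a blocked sample is modelled as 0.0.

```python
import math

class PhonemeGate:
    """Hypothetical per-sample version of the squaring-based program steps."""

    def __init__(self, fs, tau_v=0.02, ratio=300.0, m=2.0, h=0.05):
        self.k_v = math.exp(-1.0 / (fs * tau_v))          # tau_v (= tau_d)
        self.k_u = math.exp(-1.0 / (fs * tau_v * ratio))  # tau_u = ratio * tau_v
        self.m, self.h = m, h
        self.p_voice = 0.0   # smoothed X^2 on the voice branch
        self.p_noise = 0.0   # smoothed X^2 on the background noise branch

    def process(self, x):
        sq = x * x                                         # squaring step
        self.p_voice = self.k_v * self.p_voice + (1.0 - self.k_v) * sq
        k = self.k_u if sq > self.p_noise else self.k_v    # slow rise, fast fall
        self.p_noise = k * self.p_noise + (1.0 - k) * sq
        a = math.sqrt(self.p_voice)                        # A(t)
        b = math.sqrt(self.p_noise)                        # B(t)
        is_open = a >= self.m * b + self.h                 # A(t) >= B'(t)
        return (x if is_open else 0.0), is_open            # passage control

def demo_signal(fs=8000):
    """Three 200 ms tone bursts over a weak 1 kHz tone standing in for noise."""
    out = []
    for n in range(int(1.5 * fs)):
        t = n / fs
        noise = 0.01 * math.sin(2 * math.pi * 1000 * t)
        in_burst = any(s <= t < s + 0.2 for s in (0.2, 0.6, 1.0))
        voice = math.sin(2 * math.pi * 200 * t) if in_burst else 0.0
        out.append(voice + noise)
    return out

gate = PhonemeGate(fs=8000)
flags = [gate.process(x)[1] for x in demo_signal()]
bursts = sum(1 for prev, cur in zip([False] + flags, flags) if cur and not prev)
```

In this toy setup the offset h does most of the work of keeping the gate closed on noise alone, because feeding the raw squared signal into a fast-fall filter pulls B(t) below the true noise level; real speech and noise would need m and h tuned accordingly.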

Alternatively, the phoneme division program stored in the ROM 77 may be a program that causes the DSP 74 to execute: a first branching step of branching the sound signal X(t) into two; an absolute value calculation step of calculating the absolute value |X(t)| of the one sound signal X(t) branched in the first branching step; a second branching step of branching the absolute value signal |X(t)| from the absolute value calculation step into two; a first smoothing step of smoothing the one absolute value signal |X(t)| branched in the second branching step with a voice time constant τ_v of several tens to several hundreds of ms; a second smoothing step of smoothing the other absolute value signal |X(t)| branched in the second branching step, using for its rise a rising time constant τ_u that is at least 10 times larger than the voice time constant τ_v, and more preferably 100 to 1000 times larger or more, and using for its fall a falling time constant τ_d approximately equal to the voice time constant τ_v; a comparison step of comparing the voice envelope signal A(t) smoothed in the first smoothing step with the background noise envelope signal B(t) smoothed in the second smoothing step; and a passage control step of controlling the passage or non-passage of the other sound signal X(t) branched in the first branching step according to the comparison result of the comparison step. In the comparison step of this program as well, the voice envelope signal A(t) may be compared with the threshold signal B'(t) = mB(t) + h. The gain m may be selected, for example, in the range m = 1 to 3. The offset value h may be selected, for example, from the range h = 0 to [about 10 times the maximum value expected for B(t)].

The digital phoneme signal output from the DSP 74 is converted into an analog signal by the D/A converter 75, amplified by the output amplifier 76, and output from the output port 80.

FIGS. 8(a) to 8(c) are diagrams for explaining the phoneme division processing performed by the phoneme dividing device 70 shown in FIG. 7. In FIGS. 8(a) to 8(c), the vertical axis represents the signal level in arbitrary units and the horizontal axis represents time in arbitrary units.

FIG. 8(a) shows the waveform of the voice envelope signal A(t). FIG. 8(b) shows the waveform of the background noise envelope signal B(t). FIG. 8(c) shows the waveform obtained by comparing the voice envelope signal A(t) with the threshold signal B'(t) = mB(t) + h (that is, A(t) − B'(t)). Here the gain is set to m = 1 and the offset value to h = 200. As shown in FIGS. 8(a) and 8(b), the voice envelope signal A(t) and the background noise envelope signal B(t) can be separated appropriately from the sound signal even when the phoneme division processing is performed in software. Because the background noise level is detected automatically, phonemes can be divided and extracted with high accuracy by comparing the voice envelope signal A(t) with the threshold signal B'(t), as shown in FIG. 8(c). In addition, the phoneme dividing device 70 of the present embodiment removes the need to manually adjust the threshold T to match fluctuations in the background noise level, which enables substantial streamlining and labor saving.

FIG. 9 is a diagram for explaining a voice processing system 90 according to yet another embodiment of the present invention. The voice processing system 90 uses the phoneme dividing device described above to apply predetermined processing to input voice and output the result into a space.

As shown in FIG. 9, the voice processing system 90 includes a microphone 12 as a sound collecting device, a microphone amplifier 14, a phoneme dividing device 92, a phoneme processing device 94, an amplifier 95, and a speaker 96 as an output device. The microphone 12 picks up the original voice and outputs a sound signal in which a background noise signal is superimposed on a voice signal. The microphone amplifier 14 amplifies the sound signal from the microphone 12. The phoneme dividing device 92 receives the amplified sound signal from the microphone amplifier 14 and divides the voice signal into phonemes. Any of the phoneme dividing devices 10, 50, 60, and 70 described above can suitably be used as the phoneme dividing device 92. The phoneme processing device 94 applies predetermined processing to the phoneme signal obtained from the phoneme dividing device 92. Examples of this predetermined processing are described later. The amplifier 95 amplifies the phoneme signal processed by the phoneme processing device 94. The speaker 96 outputs the amplified phoneme signal into the space as sound.

The voice processing system 90 may be, for example, a speech privacy system (a device for keeping spoken information confidential). A speech privacy system uses signal processing to alter the structure of the voice signal itself in substantially real time. It thereby conceals or blocks only the content of the speech, without greatly changing statistical properties of the voice signal such as its spectrum and energy envelope, so that listeners cannot understand what is being said. Unlike conventional sound masking systems, this speech privacy system emits no sound except while the original voice is being produced (during utterance), so it can effectively conceal only the content of the speech without raising the room noise level or the listeners' discomfort. For details of the speech privacy system, see, for example, Patent Document 1 cited above.

スピーチプライバシーシステムにおいては、音素処理装置９４は、音素分割装置９２で分割・抽出された音素（ｍｏｒａ）を再配置、例えば音素の順番を入れ替えたりする。そしてこの再配置された音素信号がスピーカ９６から音として空間に出力される。このスピーカ９６からの音により原音声がマスキングされるため、原音声の内容を受聴者に理解不能とすることができる。 In the speech privacy system, the phoneme processing device 94 rearranges the phonemes (mora) divided and extracted by the phoneme dividing device 92, for example, rearranges the order of the phonemes. The rearranged phoneme signals are output from the speaker 96 as sound to the space. Since the original voice is masked by the sound from the speaker 96, the contents of the original voice can be made unintelligible to the listener.
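The rearrangement of divided phonemes described above can be illustrated with a minimal sketch. It assumes that each phoneme has already been cut out as a list of samples, and it permutes the phonemes within small consecutive groups so that the delay stays bounded. The function name `rearrange_phonemes`, the group size, and the use of a seeded random permutation are hypothetical choices made for illustration; the embodiment only states that the order of the phonemes is changed.

```python
import random

def rearrange_phonemes(segments, group=3, seed=0):
    """Shuffle phoneme segments inside consecutive groups of `group` items.

    segments: list of phonemes, each a list of samples (assumed input format).
    group and seed are illustrative parameters, not values from the patent.
    """
    rng = random.Random(seed)
    out = []
    for i in range(0, len(segments), group):
        block = list(segments[i:i + group])  # copy so the input is untouched
        rng.shuffle(block)                   # reorder phonemes within the block
        out.extend(block)
    return out
```

Because only the order changes, the rearranged signal keeps the same phonemes, and hence roughly the same spectrum and energy statistics, as the original speech.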

あるいは、音声処理システム９０は、携帯電話、無線機、トランシーバなどの通信システムであってもよい。例えば工事現場やガード下、或いは鉄道のホームなどで携帯電話を使う場合、受信側では暗騒音が受信音声に重畳し、会話内容の理解を妨げる。すなわち、聞き取りや文章了解度が低下する。そこで、音素処理装置９４は、音素分割装置９２で分割・抽出された音素間（すなわち、会話の途切れ部分）の出力をゼロ（無音）にする。このように処理された音素信号をスピーカ９６から出力することで騒音低減・通話品質の向上を図ることができる。このような通信システムにおいて、音素分割装置９２の暗騒音用時定数回路における立ち上がり用時定数τ_ｕ＝Ｒ’Ｃは、スピーチプライバシーシステムに用いる場合より小さく設定されることが好ましい。なお、音声部分には暗騒音が依然として重畳しているが、聴覚の補完作用により音声のあるこの部分の暗騒音はほとんど認識されず、聴感的には騒音がほとんど除去されたように認識されるため、聞き取りは大きく改善される。 Alternatively, the speech processing system 90 may be a communication system such as a mobile phone, a radio, or a transceiver. For example, when a mobile phone is used at a construction site, under an elevated railway track, or on a railway platform, background noise is superimposed on the speech heard at the receiving side and hinders understanding of the conversation. That is, listening comprehension and sentence intelligibility decrease. The phoneme processing device 94 therefore sets the output between the phonemes divided and extracted by the phoneme dividing device 92 (that is, during the pauses in the conversation) to zero (silence). Outputting the phoneme signal processed in this way from the speaker 96 reduces noise and improves call quality. In such a communication system, the rising time constant τ_u = R′C of the background noise time constant circuit in the phoneme dividing device 92 is preferably set smaller than when the device is used in a speech privacy system. Background noise remains superimposed on the speech portions, but owing to the complementing effect of hearing, the background noise in those portions is hardly noticed, and the listener perceives the noise as almost entirely removed, so listening comprehension is greatly improved.
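The muting of the output between phonemes can be sketched as a sample-by-sample gate. The sketch assumes that a speech envelope A(t) and a background noise envelope B(t) are already available as lists of the same length as the signal, and that the gate opens while A(t) exceeds an amplified and offset version of B(t), in the manner of the amplifier, adder, comparison unit, and gate unit recited in the claims. The function name `gate_signal` and the default `gain` and `offset` values are hypothetical.

```python
def gate_signal(signal, voice_env, noise_env, gain=2.0, offset=0.0):
    """Pass samples only while the speech envelope exceeds the noise threshold.

    signal, voice_env, noise_env: equal-length lists of floats.
    gain and offset are illustrative values, not taken from the patent.
    """
    out = []
    for s, a, b in zip(signal, voice_env, noise_env):
        threshold = gain * b + offset         # amplified noise level plus offset
        out.append(s if a > threshold else 0.0)  # silence between phonemes
    return out
```

With `gain=2.0`, a sample passes only while the speech envelope is more than twice the tracked background noise level; everything else is replaced by silence.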

あるいは、上述の実施形態に係る音素分割装置は、音声認識機能を内包した車載ナビゲーションシステムに用いられてもよい。上述の音素分割装置から出力される音素信号を音声認識に利用することで、刻々変化する走行騒音の影響を受けることなく音声の認識率を向上させることができる。この場合、音素分割装置の暗騒音用時定数回路における立ち上がり用時定数τ_ｕ＝Ｒ’Ｃは、スピーチプライバシーシステムに用いる場合より小さく設定されることが好ましい。 Alternatively, the phoneme dividing device according to the above-described embodiments may be used in an in-vehicle navigation system that incorporates a speech recognition function. Using the phoneme signal output from the phoneme dividing device for speech recognition makes it possible to improve the speech recognition rate without being affected by constantly changing driving noise. In this case, the rising time constant τ_u = R′C of the background noise time constant circuit in the phoneme dividing device is preferably set smaller than when the device is used in a speech privacy system.

あるいは、上述の実施形態に係る音素分割装置は、半二重通信のVOX（Voice Operating tX; tx=Transmitter）機能に利用されてもよい。音素分割装置から出力される音素信号に基づいて発話の発生を的確に把握することで、確実に送信・受信を切り替えることが可能となる。 Alternatively, the phoneme division device according to the above-described embodiment may be used for a VOX (Voice Operating tX; tx = Transmitter) function of half-duplex communication. By accurately grasping the occurrence of an utterance based on the phoneme signal output from the phoneme dividing device, it is possible to switch between transmission and reception with certainty.

図１０は、本発明のさらに別の実施形態に係る騒音測定装置１２０を説明するための図である。図１０に示す騒音測定装置１２０は、周囲音に含まれる暗騒音のレベルを測定することができる。騒音測定装置１２０は、マイクロホン１２と、マイクアンプ１４と、バンドパスフィルタ１５と、自乗回路１６と、暗騒音用時定数回路２０と、暗騒音用平方根回路２４とを備える。 FIG. 10 is a diagram for explaining a noise measuring device 120 according to still another embodiment of the present invention. The noise measuring device 120 shown in FIG. 10 can measure the level of background noise included in the ambient sound. The noise measuring device 120 includes a microphone 12, a microphone amplifier 14, a band-pass filter 15, a squaring circuit 16, a background noise time constant circuit 20, and a background noise square root circuit 24.

マイクロホン１２は、周囲音を集音して音信号に変換する。マイクアンプ１４は、マイクロホン１２からの音信号を増幅する。バンドパスフィルタ１５は、マイクアンプ１４からの増幅音信号のうち、所定の通過帯域の信号成分を通過させる。バンドパスフィルタ１５から出力される音信号Ｘ（ｔ）は、音声信号ｘ（ｔ）に室（空間）の暗騒音信号ｎ（ｔ）が重畳されたものである。 The microphone 12 collects ambient sound and converts it into a sound signal. The microphone amplifier 14 amplifies the sound signal from the microphone 12. The band-pass filter 15 passes the signal components of the amplified sound signal from the microphone amplifier 14 that lie within a predetermined pass band. The sound signal X(t) output from the band-pass filter 15 is the speech signal x(t) with the background noise signal n(t) of the room (space) superimposed on it.

自乗回路１６は、音信号Ｘ（ｔ）の自乗信号Ｘ^２（ｔ）を出力する。音信号Ｘ（ｔ）には、正負の値が含まれる。自乗回路１６で音信号Ｘ（ｔ）を自乗することで、正の値のみを処理すればよいため、信号処理を容易にすることができる。自乗回路１６は、絶対値回路に置き換えられてもよい。この場合、暗騒音用平方根回路２４は不要となる。 The square circuit 16 outputs a square signal X ² (t) of the sound signal X (t). The sound signal X (t) includes positive and negative values. Since the squaring circuit 16 squares the sound signal X (t), only a positive value needs to be processed, so that signal processing can be facilitated. The square circuit 16 may be replaced with an absolute value circuit. In this case, the background noise square root circuit 24 is not necessary.

本実施形態に係る暗騒音用時定数回路２０において、立ち上がり用時定数τ_ｕは、立ち下がり用時定数τ_ｄよりも非常に大きな値に設定される。具体的には、立ち下がり用時定数τ_ｄは、数１０ｍｓ〜数１００ｍｓ（例えば１２５ｍｓ）の比較的小さい値に設定される。一方、立ち上がり用時定数τ_ｕは、立ち下がり用時定数τ_ｄより少なくとも１０倍以上、より好適には１００倍〜１０００倍以上大きく設定される。例えば、τ_ｕ＝Ｒ’Ｃ≧３００τ_ｄ〜３０００τ_ｄのように設定されてよい。 In the background noise time constant circuit 20 according to the present embodiment, the rising time constant τ_u is set to a value much larger than the falling time constant τ_d. Specifically, the falling time constant τ_d is set to a relatively small value of several tens of milliseconds to several hundreds of milliseconds (for example, 125 ms). The rising time constant τ_u, on the other hand, is set at least 10 times, and more preferably 100 to 1000 times or more, larger than the falling time constant τ_d. For example, it may be set such that τ_u = R′C ≥ 300 τ_d to 3000 τ_d.
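As a rough digital illustration of this asymmetric smoothing, the sketch below applies a one-pole low-pass filter to the squared signal, selecting a slow coefficient while the input rises above the current output and a fast coefficient while it falls below it, and then takes the square root. The embodiment describes an analog RC circuit; the discrete-time form, the function names, the ratio of 300 between τ_u and τ_d, and the initialization from the first sample are assumptions made for illustration only.

```python
import math

def asymmetric_smooth(power, fs, tau_up, tau_down):
    """One-pole smoothing with separate rise and fall time constants.

    power: non-negative samples such as X^2(t); fs: sampling rate in Hz;
    tau_up, tau_down: rising and falling time constants in seconds.
    """
    a_up = math.exp(-1.0 / (fs * tau_up))      # slow coefficient for rises
    a_down = math.exp(-1.0 / (fs * tau_down))  # fast coefficient for falls
    out = []
    y = None
    for x in power:
        if y is None:
            y = x  # assumption: start from the first sample, not from zero
        else:
            a = a_up if x > y else a_down
            y = a * y + (1.0 - a) * x
        out.append(y)
    return out

def background_level(signal, fs, tau_down=0.125, ratio=300.0):
    """Estimate B(t): square, smooth asymmetrically, then take the square root."""
    power = [s * s for s in signal]
    smoothed = asymmetric_smooth(power, fs, ratio * tau_down, tau_down)
    return [math.sqrt(v) for v in smoothed]
```

Because τ_u is so much larger than τ_d, a burst lasting a few hundred milliseconds barely raises the estimate, while the estimate returns to the noise floor quickly once the burst ends.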

暗騒音用時定数回路２０の後段に設けられた暗騒音用平方根回路２４は、暗騒音用時定数回路２０から入力された信号の平方根Ｂ（ｔ）を演算する。上述したように、この信号Ｂ（ｔ）は、暗騒音信号ｎ（ｔ）の包絡線、すなわち暗騒音信号ｎ（ｔ）の実効値ｎ_ｒｍｓ（すなわち暗騒音のレベル）と見なすことができる。 The background noise square root circuit 24 provided at the subsequent stage of the background noise time constant circuit 20 calculates the square root B (t) of the signal input from the background noise time constant circuit 20. As described above, the signal B (t) can be regarded as an envelope of the background noise signal n (t), that is, an effective value n _rms (that is, background noise level) of the background noise signal n (t).

このように、本実施形態に係る騒音測定装置１２０によれば、周囲音に含まれる暗騒音のレベルを測定することができる。本実施形態に係る騒音測定装置１２０は、会話や特定の変動騒音（有意味騒音）がある空間での暗騒音測定に特に有効である。騒音測定装置１２０は、暗騒音用平方根回路２４から出力される信号を表示する表示部を備えてもよい。この場合、暗騒音を視覚的に認識することができる。 Thus, according to the noise measurement apparatus 120 according to the present embodiment, the level of background noise included in the ambient sound can be measured. The noise measuring device 120 according to the present embodiment is particularly effective for measuring background noise in a space where there is conversation or specific variable noise (significant noise). The noise measurement device 120 may include a display unit that displays a signal output from the background noise square root circuit 24. In this case, background noise can be visually recognized.

騒音測定装置１２０は、例えばテレビジョンシステム、車載テレビジョンシステム、カーステレオシステム等の音響システムに用いることができる。騒音測定装置１２０で測定される暗騒音のレベルは、在室者間の会話音声や短時間の間歇的騒音の影響を受けない。従って、この暗騒音のレベルを参照することで、例えば、暗騒音レベルが高い場合にはスピーカーの音量を上げ、暗騒音レベルが低い場合にはスピーカーの音量を下げるといったように、スピーカーの音量を最適に制御することができる。 The noise measuring device 120 can be used for an acoustic system such as a television system, an in-vehicle television system, a car stereo system, and the like. The level of background noise measured by the noise measuring device 120 is not affected by conversational voices between the occupants and intermittent noise for a short time. Therefore, referring to this background noise level, for example, the speaker volume is increased when the background noise level is high, and the speaker volume is decreased when the background noise level is low. It can be controlled optimally.
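The volume control described above could take a form such as the following sketch, which converts the measured background noise level into a decibel value relative to a reference and maps it to a clamped gain offset for the speaker. The function name `speaker_gain_db`, the reference level, the slope, and the clamping limits are all hypothetical; the embodiment only states that the volume is raised when the background noise level is high and lowered when it is low.

```python
import math

def speaker_gain_db(noise_rms, ref_rms=0.01, slope=0.5,
                    min_db=-10.0, max_db=10.0):
    """Map a background noise level B(t) to a speaker gain offset in dB.

    All default parameter values are illustrative assumptions.
    """
    # Noise level in dB relative to the reference; guard against log(0).
    noise_db = 20.0 * math.log10(max(noise_rms, 1e-12) / ref_rms)
    # Raise the gain for loud backgrounds, lower it for quiet ones, within limits.
    return max(min_db, min(max_db, slope * noise_db))
```

Since B(t) is insensitive to conversation and short intermittent noise, a gain derived from it changes slowly and does not pump with each utterance.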

図１１は、本発明のさらに別の実施形態に係る騒音測定装置１３０を説明するための図である。図１１に示す騒音測定装置１３０は、周囲音に含まれる騒音のレベルを測定することができる。騒音測定装置１３０は、マイクロホン１２と、マイクアンプ１４と、バンドパスフィルタ１５と、自乗回路１６と、音声用時定数回路１８と、暗騒音用時定数回路２０と、音声用平方根回路２２と、暗騒音用平方根回路２４と、表示部１３４とを備える。 FIG. 11 is a diagram for explaining a noise measuring device 130 according to still another embodiment of the present invention. The noise measuring device 130 shown in FIG. 11 can measure the level of noise included in the ambient sound. The noise measuring device 130 includes a microphone 12, a microphone amplifier 14, a band-pass filter 15, a squaring circuit 16, a speech time constant circuit 18, a background noise time constant circuit 20, a speech square root circuit 22, a background noise square root circuit 24, and a display unit 134.

自乗回路１６は、音信号Ｘ（ｔ）の自乗信号Ｘ^２（ｔ）を出力する。音信号Ｘ（ｔ）には、正負の値が含まれる。自乗回路１６で音信号Ｘ（ｔ）を自乗することで、正の値のみを処理すればよいため、信号処理を容易にすることができる。自乗回路１６は、絶対値回路に置き換えられてもよい。この場合、音声用平方根回路２２および暗騒音用平方根回路２４は不要となる。 The square circuit 16 outputs a square signal X ² (t) of the sound signal X (t). The sound signal X (t) includes positive and negative values. Since the squaring circuit 16 squares the sound signal X (t), only a positive value needs to be processed, so that signal processing can be facilitated. The square circuit 16 may be replaced with an absolute value circuit. In this case, the voice square root circuit 22 and the background noise square root circuit 24 are not required.

自乗回路１６から出力された自乗信号Ｘ^２（ｔ）は、分岐部１３２で２つに分岐される。分岐部１３２で分岐された一方の自乗信号Ｘ^２（ｔ）は音声用時定数回路１８に入力され、他方の自乗信号Ｘ^２（ｔ）は暗騒音用時定数回路２０に入力される。 The squared signal X²(t) output from the squaring circuit 16 is branched into two by the branching unit 132. One of the branched squared signals X²(t) is input to the speech time constant circuit 18, and the other is input to the background noise time constant circuit 20.

音声用時定数回路１８は、数１０〜数１００ｍｓの音声用時定数τ_ｖを有する一次のローパスフィルタである。音声用時定数回路１８は、入力信号を音声用時定数τ_ｖで平滑化（平均化）する。音声用時定数回路１８から出力される信号Ａ（ｔ）は、音声信号ｘ（ｔ）の包絡線、すなわち音声信号ｘ（ｔ）の実効値ｘ_ｒｍｓ（すなわち音声信号のレベル）と見なすことができる。 The speech time constant circuit 18 is a first-order low-pass filter having a speech time constant τ_v of several tens to several hundreds of milliseconds. The speech time constant circuit 18 smooths (averages) the input signal with the speech time constant τ_v. The signal A(t) output from the speech time constant circuit 18 can be regarded as the envelope of the speech signal x(t), that is, the effective value x_rms of the speech signal x(t) (in other words, the level of the speech signal).

暗騒音用時定数回路２０は、入力信号の立ち上がりと立ち下がりにおいて時定数が異なるように構成された一次のローパスフィルタである。暗騒音用時定数回路２０においては、入力信号の立ち上がりに対しては、立ち上がり用時定数τ_ｕ＝Ｒ’Ｃのローパスフィルタで平滑化が行われる。一方、入力信号の立ち下がりに対しては、立ち下がり用時定数τ_ｄ＝ＲＣのローパスフィルタで平滑化が行われる。音声用時定数回路１８による平滑化処理（平均化処理）により、自乗信号Ｘ^２（ｔ）から音声用時定数τ_ｖよりも速い成分が取り除かれ、自乗信号Ｘ^２（ｔ）の包絡線信号が得られる。 The background noise time constant circuit 20 is a first-order low-pass filter configured so that its time constant differs between the rise and the fall of the input signal. In the background noise time constant circuit 20, a rise of the input signal is smoothed by a low-pass filter having the rising time constant τ_u = R′C, whereas a fall of the input signal is smoothed by a low-pass filter having the falling time constant τ_d = RC. Through the smoothing (averaging) performed by the speech time constant circuit 18, components faster than the speech time constant τ_v are removed from the squared signal X²(t), and the envelope signal of the squared signal X²(t) is obtained.

音声用時定数回路１８の後段に設けられた音声用平方根回路２２は、音声用時定数回路１８から入力された信号の平方根を演算する。この音声用平方根回路２２から出力される信号Ａ（ｔ）は、音声信号ｘ（ｔ）の包絡線、すなわち音声信号ｘ（ｔ）の実効値ｘ_ｒｍｓ（すなわち音声信号のレベル）と見なすことができる。 The speech square root circuit 22, provided downstream of the speech time constant circuit 18, calculates the square root of the signal input from the speech time constant circuit 18. The signal A(t) output from the speech square root circuit 22 can be regarded as the envelope of the speech signal x(t), that is, the effective value x_rms of the speech signal x(t) (in other words, the level of the speech signal).
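The chain of squaring, smoothing with the speech time constant τ_v, and taking the square root can be sketched digitally as follows. The one-pole discretization, the function name `rms_envelope`, and the default τ_v of 50 ms (one value within the stated range of several tens to several hundreds of milliseconds) are assumptions for illustration and do not reproduce the analog circuits of the embodiment.

```python
import math

def rms_envelope(signal, fs, tau_v=0.05):
    """Approximate A(t) as the square root of the low-passed squared signal.

    signal: list of samples X(t); fs: sampling rate in Hz;
    tau_v: speech time constant in seconds (illustrative default).
    """
    a = math.exp(-1.0 / (fs * tau_v))  # one-pole low-pass coefficient
    y = 0.0
    out = []
    for s in signal:
        y = a * y + (1.0 - a) * s * s  # smooth the squared signal
        out.append(math.sqrt(y))       # square root gives an RMS-like level
    return out
```

For a steady sine wave of unit amplitude, the output settles close to the effective value 1/√2 after a few multiples of τ_v.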

表示部１３４は、音声用平方根回路２２からの信号Ａ（ｔ）と、暗騒音用平方根回路２４からの信号Ｂ（ｔ）とを表示する。例えば、表示部１３４は、信号Ａ（ｔ）と、信号Ｂ（ｔ）の両者を区別して二元表示してもよい。信号Ａ（ｔ）は、短時間に変化する会話音声や建設現場の間歇騒音のレベルを表し、信号Ｂ（ｔ）は、暗騒音のレベルを表す。信号Ａ（ｔ）については、Ａ（ｔ）に暗騒音が含まれる。そこで、表示部１３４は、信号Ａ（ｔ）に代えてまたは加えて、以下の数式に従って得られる「暗騒音補正された信号Ａ’（ｔ）」を「正味騒音レベル」として表示させることもできる。 The display unit 134 displays the signal A(t) from the speech square root circuit 22 and the signal B(t) from the background noise square root circuit 24. For example, the display unit 134 may display the signal A(t) and the signal B(t) together while distinguishing one from the other (dual display). The signal A(t) represents the level of conversational speech that changes over a short time, or of intermittent noise such as that at a construction site, and the signal B(t) represents the level of background noise. The signal A(t) itself includes the background noise. The display unit 134 can therefore also display, instead of or in addition to the signal A(t), a "background-noise-corrected signal A′(t)", obtained according to the following formula, as the "net noise level".
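The correction formula itself is not reproduced in this text. Purely as an illustration, and not as the formula of the embodiment, one common way to correct a level for uncorrelated background noise is to subtract the powers, A′ = √(A² − B²), clamped at zero. The sketch below applies that assumed relation; the function name `net_level` is hypothetical.

```python
import math

def net_level(a, b):
    """Background-noise-corrected level under an assumed power subtraction.

    a: total level A(t); b: background noise level B(t), in the same units.
    Assumption: speech and noise are uncorrelated, so their powers add.
    This is not the formula given in the patent.
    """
    return math.sqrt(max(a * a - b * b, 0.0))
```

For example, under this assumption a total level of 5 with a background level of 3 gives a net level of 4, and the result is zero whenever the total does not exceed the background.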

図１２は、表示部１３４による騒音レベル表示の一例を示す。図１２に示す騒音レベル表示例では、音声用平方根回路２２からの信号Ａ（ｔ）が「全騒音」として表示され、暗騒音用平方根回路２４からの信号Ｂ（ｔ）が「暗騒音」として表示され、暗騒音補正された信号Ａ’（ｔ）が「正味騒音」として表示されている。図１２に示すように全騒音、暗騒音、正味騒音のレベルを可視化することで、ユーザは瞬時に各騒音レベルを把握することができる。 FIG. 12 shows an example of the noise level display produced by the display unit 134. In the display example of FIG. 12, the signal A(t) from the speech square root circuit 22 is displayed as "total noise", the signal B(t) from the background noise square root circuit 24 is displayed as "background noise", and the background-noise-corrected signal A′(t) is displayed as "net noise". Visualizing the levels of total noise, background noise, and net noise as shown in FIG. 12 lets the user grasp each noise level at a glance.

以上、実施の形態にもとづき本発明を説明したが、実施の形態は、本発明の原理、応用を示しているにすぎないことはいうまでもなく、実施の形態には、請求の範囲に規定された本発明の思想を逸脱しない範囲において、多くの変形例や配置の変更が可能であることはいうまでもない。 The present invention has been described above based on the embodiments. It goes without saying that the embodiments merely illustrate the principle and applications of the present invention, and that many modifications and changes in arrangement can be made to the embodiments without departing from the spirit of the present invention as defined in the claims.

入力信号に対する包絡線取得、すなわち包絡線検波は、上述の実施形態で説明した自乗平均値の平方根を取る方法や絶対値を平滑化する方法のみならず、ウェーブレット変換やヒルベルト変換、あるいは簡略的にはダイオードなどにより半波整流した結果を平滑化する方法などその他の類似の方法によってなされてもよい。 Envelope acquisition for the input signal, that is, envelope detection, is not limited to the methods described in the above embodiments, namely taking the square root of the mean square value or smoothing the absolute value. It may also be performed by a wavelet transform, a Hilbert transform, or, more simply, by smoothing the result of half-wave rectification with a diode or the like, or by other similar methods.

１０，５０，６０，７０，９２音素分割装置、１２マイクロホン、１３第１分岐部、１４マイクアンプ、１５バンドパスフィルタ、１６自乗回路、１７第２分岐部、１８音声用時定数回路、２０暗騒音用時定数回路、２２音声用平方根回路、２４暗騒音用平方根回路、２６バッファアンプ、２８加算器、３０比較器、３２ゲート回路、６２絶対値回路、７１ＤＳＰボード、７４ＤＳＰ、７６出力アンプ、７７ＲＯＭ、９０音声処理システム、９４音素処理装置、９６スピーカ、１２０，１３０騒音測定装置、１３４表示部。 10, 50, 60, 70, 92 Phoneme splitting device, 12 microphone, 13 first branching unit, 14 microphone amplifier, 15 bandpass filter, 16 square circuit, 17 second branching unit, 18 time constant circuit for voice, 20 dark Time constant circuit for noise, 22 Square root circuit for sound, 24 Square root circuit for background noise, 26 Buffer amplifier, 28 Adder, 30 Comparator, 32 Gate circuit, 62 Absolute value circuit, 71 DSP board, 74 DSP, 76 Output amplifier 77 ROM, 90 voice processing system, 94 phoneme processing device, 96 speaker, 120, 130 noise measuring device, 134 display unit.

Claims

A phoneme dividing device that divides a sound signal, in which a background noise signal is superimposed on a speech signal, into phonemes, each being substantially one peak of an envelope signal based on the sound signal, the device comprising:
a first branching unit that branches the sound signal into two;
a second branching unit that further branches one of the sound signals branched by the first branching unit into two;
a speech time constant unit that smooths one of the sound signals branched by the second branching unit with a speech time constant of several tens to several hundreds of milliseconds;
a background noise time constant unit that smooths the rising edge of the other sound signal branched by the second branching unit with a rising time constant at least 10 times larger than the speech time constant, and smooths the falling edge of the other sound signal with a falling time constant substantially equal to the speech time constant;
a comparison unit that compares the signal from the speech time constant unit with the signal from the background noise time constant unit; and
a gate unit that controls passage or non-passage of the other sound signal branched by the first branching unit according to the comparison result of the comparison unit.

2. The phoneme dividing device according to claim 1, wherein the rising time constant is 100 to 1000 times larger than the speech time constant.

The phoneme dividing device according to claim 1, further comprising:
an amplifier that amplifies the signal from the background noise time constant unit at a predetermined gain; and
an adder that adds a predetermined offset value to the output of the amplifier, wherein the comparison unit compares the signal from the speech time constant unit with the signal from the adder.

The phoneme dividing device according to any one of claims 1 to 3, further comprising a band-pass filter that is provided upstream of the first branching unit and has a pass band corresponding to the average spectrum of speech.

The phoneme dividing device according to any one of claims 1 to 3, further comprising a band-pass filter that is provided between the first branching unit and the second branching unit and has a pass band corresponding to the average spectrum of speech.

The phoneme dividing device according to claim 1, further comprising:
a squaring unit that is provided between the first branching unit and the second branching unit and squares one of the sound signals branched by the first branching unit;
a speech square root calculation unit that is provided downstream of the speech time constant unit and calculates the square root of the signal from the speech time constant unit; and
a background noise square root calculation unit that is provided downstream of the background noise time constant unit and calculates the square root of the signal from the background noise time constant unit.

The phoneme dividing device according to any one of claims 1 to 5, further comprising an absolute value calculation unit that is provided between the first branching unit and the second branching unit and outputs the absolute value of one of the sound signals branched by the first branching unit.

A speech processing system comprising:
a sound collecting device that collects original speech and outputs a sound signal in which a background noise signal is superimposed on a speech signal;
the phoneme dividing device according to any one of claims 1 to 7, which receives the sound signal from the sound collecting device and divides the sound signal into phonemes;
a phoneme processing device that performs predetermined processing on the phoneme signal obtained from the phoneme dividing device; and
an output device that outputs the phoneme signal processed by the phoneme processing device into a space as sound.

A phoneme division method for dividing a sound signal, in which a background noise signal is superimposed on a speech signal, into phonemes, each being substantially one peak of an envelope signal based on the sound signal, the method comprising:
a first branching step of branching the sound signal into two;
a second branching step of further branching one of the sound signals branched in the first branching step into two;
a first smoothing step of smoothing one of the sound signals branched in the second branching step with a speech time constant of several tens to several hundreds of milliseconds;
a second smoothing step of smoothing the rising edge of the other sound signal branched in the second branching step with a rising time constant at least 10 times larger than the speech time constant, and smoothing the falling edge of the other sound signal with a falling time constant substantially equal to the speech time constant;
a comparison step of comparing the signal calculated in the first smoothing step with the signal calculated in the second smoothing step; and
a passage control step of controlling passage or non-passage of the other sound signal branched in the first branching step according to the comparison result of the comparison step.

A phoneme division program for dividing a sound signal, in which a background noise signal is superimposed on a speech signal, into phonemes, each being substantially one peak of an envelope signal based on the sound signal, the program causing a computer to execute:
a first branching step of branching the sound signal into two;
a second branching step of further branching one of the sound signals branched in the first branching step into two;
a first smoothing step of smoothing one of the sound signals branched in the second branching step with a speech time constant of several tens to several hundreds of milliseconds;
a second smoothing step of smoothing the rising edge of the other sound signal branched in the second branching step with a rising time constant at least 10 times larger than the speech time constant, and smoothing the falling edge of the other sound signal with a falling time constant substantially equal to the speech time constant;
a comparison step of comparing the signal calculated in the first smoothing step with the signal calculated in the second smoothing step; and
a passage control step of controlling passage or non-passage of the other sound signal branched in the first branching step according to the comparison result of the comparison step.