JP6765124B2 - Voice processing device, voice processing method, and voice processing program

Info

Publication number: JP6765124B2
Application number: JP2017156486A
Authority: JP
Inventors: 将志道上
Original assignee: ACCELE CORP
Current assignee: ACCELE CORP
Priority date: 2017-08-14
Filing date: 2017-08-14
Publication date: 2020-10-07
Anticipated expiration: 2037-08-14
Also published as: JP2019035839A

Description

The present invention relates to a voice processing device and the like that perform predetermined voice processing on a target signal to generate a processed signal.

ADPCM (adaptive differential pulse code modulation) is a known signal processing technique. Because ADPCM requires only a small circuit when implemented in hardware and can be decoded at high speed, it is an important technology in fields such as amusement, where a plurality of audio signals are reproduced simultaneously.

However, the sound quality reproduced by ADPCM may be inferior to that of other technologies that achieve a comparable compression ratio, for example, transform-coding codecs such as MP3.

To address this, a technique is known that uses a post filter with a gentle cutoff frequency to improve the quality of the reproduced signal, taking into account the tendencies of the frequency spectra of the quantization noise generated by ADPCM and of the original signal to be encoded by ADPCM.

As a related technique, an audio signal processing device is known that includes a quantization processing unit for generating an audio signal formed by a plurality of time-series signals, and noise shaping means for performing noise shaping on the quantization noise generated when the quantization processing unit performs quantization. In the disclosed technique, the noise shaping means includes a post filter having characteristics capable of realizing noise shaping of the quantized audio signal and a pre filter having the inverse characteristics of the post filter; the post filter is provided downstream of the quantization processing unit and the pre filter upstream of it (see, for example, Patent Document 1).

As another related technique, a coefficient setting method for a noise shaping filter that reduces quantization error is known, in which the reciprocal of the power spectrum shape based on the frequency analysis result of a predetermined number of samples is taken, and the filter coefficients are then computed from the autocorrelation coefficients obtained by an inverse orthogonal transform (see, for example, Patent Document 2). This provides a coefficient setting method for a noise shaping filter that allows an ordinary DSP (digital signal processor) to reduce quantization error in real time.

Patent Document 1: Japanese Unexamined Patent Publication No. 2016-213683
Patent Document 2: Japanese Unexamined Patent Publication No. H04-72907

For example, the voice processing technique described above, which uses a post filter to improve the quality of the reproduced signal, improves the characteristics of the high frequency band but does not reduce quantization noise in the low and middle frequency bands.

To reduce quantization noise in the low and middle frequency bands as well as in the high frequency band, one conceivable method is to minimize the energy of the quantization noise. When the energy of the quantization noise is minimized, the spectral envelope of the quantization noise becomes flat over the entire frequency band.

However, even if the spectral envelope of the quantization noise is made flat over the entire frequency band in this way, the quality of the reproduced signal as perceived by humans is not necessarily improved.

Another conceivable method is to apply noise shaping that takes the frequency characteristics of the original signal into account: the spectrum of the quantization noise is reshaped so that the noise is concentrated where the energy of the original signal is strong, which reduces the quantization noise where the energy of the original signal is weak. However, if the intensity of the noise shaping is fixed, the total amount of quantization noise may increase and the quality of the reproduced signal may instead deteriorate.

The present invention has been made in view of the above circumstances, and its object is to provide a technique capable of improving the quality of a reproduced signal.

To achieve the above object, a voice processing device according to a first aspect is a voice processing device having a voice processing unit that performs predetermined voice processing on a target signal to generate a processed signal, the device comprising: a voice processing control unit that changes the value of one or more parameters of the voice processing performed by the voice processing unit to a plurality of values and causes the voice processing unit to execute the voice processing; a difference detection unit that detects, for each of the plurality of parameter values, the difference for each predetermined frequency component between the energy of the target signal and the energy of a reproduced signal generated based on the processed signal; a feature amount calculation unit that, for each of the plurality of parameter values, weights the difference for each frequency component detected by the difference detection unit according to the sensitivity of human hearing to that frequency component and calculates a feature amount based on the weighted differences; and an effective value determination unit that determines, based on the plurality of feature amounts calculated by the feature amount calculation unit for the respective parameter values, an effective value, which is a value of the parameter suitable for use in the voice processing.

The voice processing device may further include an effective signal generation control unit that sets the value of the parameter of the voice processing to the effective value determined by the effective value determination unit, causes the voice processing to be executed, and stores the processed signal obtained by that voice processing in a storage unit as an effective processed signal.

In the voice processing device, the voice processing unit may include a noise shaping unit that performs noise shaping, which changes the frequency characteristics of the quantization noise in the target signal, and the parameter may be a parameter related to the noise shaping performed by the noise shaping unit.

In the voice processing device, the difference detection unit may detect the difference in units of blocks of a predetermined size of the target signal, the feature amount calculation unit may calculate the feature amount in units of the blocks, and the effective value determination unit may determine the effective value in units of the blocks.

The voice processing device may further include an auditory information storage unit that stores information on the sensitivity of human hearing to each frequency component, and the feature amount calculation unit may determine the weights based on the sensitivity information in the auditory information storage unit.

To achieve the above object, a voice processing method according to a second aspect is a voice processing method performed by a voice processing device having a voice processing unit that performs predetermined voice processing on a target signal to generate a processed signal, the method comprising: changing the value of one or more parameters of the voice processing performed by the voice processing unit to a plurality of values and causing the voice processing unit to execute the voice processing; detecting, for each of the plurality of parameter values, the difference for each predetermined frequency component between the energy of the original signal and the energy of a reproduced signal generated based on the processed signal; weighting the detected difference for each frequency component according to the sensitivity of human hearing to that frequency component and calculating a feature amount based on the weighted differences; and determining, based on the plurality of calculated feature amounts, an effective value, which is a value of the parameter suitable for use in the voice processing.

To achieve the above object, a voice processing program according to a third aspect is a voice processing program to be executed by a computer constituting a voice processing device that performs predetermined voice processing on a target signal to generate a processed signal, the program causing the computer to function as: a voice processing control unit that changes the value of one or more parameters of the voice processing performed by a voice processing unit to a plurality of values and causes the voice processing unit to execute the voice processing; a difference detection unit that detects, for each of the plurality of parameter values, the difference for each predetermined frequency component between the energy of the target signal and the energy of a reproduced signal generated based on the processed signal; a feature amount calculation unit that, for each of the plurality of parameter values, weights the difference for each frequency component detected by the difference detection unit according to the sensitivity of human hearing to that frequency component and calculates a feature amount based on the weighted differences; and an effective value determination unit that determines, based on the plurality of feature amounts calculated by the feature amount calculation unit for the respective parameter values, an effective value, which is a value of the parameter suitable for use in the voice processing.

According to the present invention, the quality of the reproduced signal can be improved.

FIG. 1 is a functional block diagram of a voice processing device according to an embodiment.
FIG. 2 is a functional block diagram of a voice processing unit of the voice processing device according to the embodiment.
FIG. 3 is a diagram showing the A curve, which represents the sensitivity of human hearing to frequency, according to the embodiment.
FIG. 4 is a hardware configuration diagram of the voice processing device according to the embodiment.
FIG. 5 is a flowchart of voice generation processing according to the embodiment.
FIG. 6 is a flowchart of feature amount calculation processing according to the embodiment.

An embodiment will be described with reference to the drawings. The embodiment described below does not limit the invention according to the claims, and not all of the elements and combinations of elements described in the embodiment are necessarily essential to the solution provided by the invention.

First, a voice processing device according to an embodiment will be described.

FIG. 1 is a functional block diagram of the voice processing device according to the embodiment. FIG. 2 is a functional block diagram of the voice processing unit of the voice processing device according to the embodiment.

The voice processing device 1 includes a voice processing unit 11, a difference detection unit 12, a feature amount calculation unit 13, an effective value determination unit 14, a voice processing control unit 15 as an example of an effective signal generation control unit, and a storage unit 20 as an example of an auditory information storage unit.

The voice processing unit 11 performs predetermined voice processing (for example, ADPCM encoding) on an original signal s(n), which is an example of the target signal. As shown in FIG. 2, the voice processing unit 11 includes, for example, an adaptive quantization unit 31 as an example of a quantization unit, an inverse adaptive quantization unit 32 as an example of an inverse quantization unit, calculation units 33 and 34, a noise shaping unit 35, a calculation unit 36, an adaptive prediction unit 37, and a calculation unit 38.

The calculation unit 33 outputs a difference signal, which is the difference between the original signal s(n) and the prediction signal output from the adaptive prediction unit 37. The calculation unit 34 adds the difference signal output from the calculation unit 33 and the signal output by the noise shaping unit 35.

The adaptive quantization unit 31 executes adaptive quantization (encoding) on the signal output from the calculation unit 34 (the difference signal after noise shaping) and outputs the processed signal x(n). The inverse adaptive quantization unit 32 executes inverse adaptive quantization (decoding) on the processed signal x(n) output by the adaptive quantization unit 31.

The calculation unit 36 outputs the difference between the signal output from the inverse adaptive quantization unit 32 and the signal output from the calculation unit 34. The signal output from the calculation unit 36 is the quantization noise with respect to the signal input to the adaptive quantization unit 31.

The noise shaping unit 35 takes the signal output from the calculation unit 36 as input and performs noise shaping, which changes the shape of the quantization noise signal.

Here, let H(γz) be the transfer function obtained by introducing, into the transfer function H(z) of the quantization noise (the noise transfer function), a parameter for changing the intensity of the noise shaping (intensity γ, where 0 ≤ γ ≤ 1). This transfer function can be expressed by equation (1). In equation (1), H^―(γz) (in this specification, "H^―" denotes H with "―" placed above it) corresponds to the feedback element of the noise shaping unit 35.

In the present embodiment, the noise transfer function H(γz) is set to the transfer function A(γz), which is obtained by introducing the intensity γ described above into the transfer function A(z) of an autoregressive model of the reproduced signal y(n). Setting H(γz) = A(γz) in this way allows noise shaping with a spectral envelope similar to that of the original signal, so the noise can be expected to be concealed by so-called auditory masking. In addition, as described later, the value of the intensity γ can be set to a value effective for improving the quality of the reproduced signal, which suppresses the total energy of the noise and improves the sound quality of the reproduced signal.

Here, the transfer function A(z) of the autoregressive model will be described.

First, suppose that the reproduced signal y(n) at time n is predicted by a linear sum of the past k samples (k is a predetermined number) and k prediction coefficients αi. Writing the predicted value of the reproduced signal at time n as y^_k(n) (in this specification, "y^" denotes y with "^" placed directly above it), the predicted value is expressed by equation (2).

Writing the difference between the predicted value y^_k(n) and the actual value y(n) as d(n), the difference signal d(n) is expressed by equation (3).

Substituting equation (2) into equation (3) gives the relationship shown in equation (4). Equation (4) is called an autoregressive model.

Applying the z-transform to equation (4) gives the transfer function A(z) of equation (4), as shown in equation (5).
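Equations (2) to (5) are referenced above but are not reproduced in this text. As a reading aid only, a standard linear-prediction form consistent with the description is sketched below. The sign convention for the coefficients α_i is an assumption, and the source's own equations may differ in sign or in whether A(z) denotes the all-pole model or its inverse.

```latex
% Assumed form of the linear prediction described as equation (2)
\hat{y}_k(n) = -\sum_{i=1}^{k} \alpha_i \, y(n-i)

% Assumed form of the difference signal described as equation (3)
d(n) = y(n) - \hat{y}_k(n)

% Substituting the first relation into the second (autoregressive model, equation (4))
y(n) + \sum_{i=1}^{k} \alpha_i \, y(n-i) = d(n)

% z-transform of the autoregressive model
\Bigl(1 + \sum_{i=1}^{k} \alpha_i z^{-i}\Bigr) Y(z) = D(z)
\quad\Longrightarrow\quad
\frac{Y(z)}{D(z)} = \frac{1}{1 + \sum_{i=1}^{k} \alpha_i z^{-i}}
```

Under this convention the model that maps d(n) to y(n) is all-pole, so its magnitude response follows the spectral envelope of the reproduced signal.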

The prediction coefficients αi can be obtained by using the Levinson-Durbin algorithm.
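The Levinson-Durbin recursion is named here without being shown. The following is a minimal Python sketch, assuming that the autocorrelation values r[0] to r[order] of the block are already available and assuming the sign convention in which the prediction is the negated weighted sum of past samples. The function name and argument names are illustrative and do not come from the source.

```python
def levinson_durbin(r, order):
    """Solve the Yule-Walker equations by the Levinson-Durbin recursion.

    r     : autocorrelation values r[0] .. r[order]
    order : number of prediction coefficients k
    Returns (a, err) where a[0] == 1.0, a[1..order] are the prediction
    coefficients (assumed convention: y_hat(n) = -sum(a[i] * y(n - i))),
    and err is the final prediction error energy.
    """
    if len(r) < order + 1:
        raise ValueError("need autocorrelation values r[0] .. r[order]")
    if r[0] <= 0.0:
        raise ValueError("r[0] must be positive")

    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        # Correlation of the current predictor with lag m.
        acc = r[m]
        for i in range(1, m):
            acc += a[i] * r[m - i]
        k = -acc / err  # reflection coefficient for this order

        # Update coefficients 1 .. m from the order m-1 solution.
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + k * a[m - i]
        new_a[m] = k
        a = new_a

        err *= 1.0 - k * k
    return a, err
```

For an autocorrelation sequence of the form r[j] = 0.5 ** j, which corresponds to a first-order autoregressive process, the recursion yields a first coefficient of -0.5 and a second coefficient of zero. The sketch does not guard against a perfectly predictable input, where the error energy reaches zero.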

From H(γz) = A(γz) together with equations (5) and (1), the feedback element H^―(γz) is expressed by equation (6).
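Equation (6) is not reproduced in this text, so the exact form of the feedback element is unknown here. One common way to introduce an intensity parameter into a linear-prediction filter is to scale the i-th coefficient by γ to the power i, so that γ = 0 removes the shaping (flat noise) and γ = 1 applies the full envelope. The Python sketch below illustrates only that assumed convention; the function name is hypothetical and the source may define the scaling differently.

```python
def scale_coefficients(alpha, gamma):
    """Scale prediction coefficients alpha[0..k-1] (alpha_1 .. alpha_k)
    by powers of the intensity gamma.

    Assumed convention: the coefficient of z^-i becomes alpha_i * gamma**i.
    gamma = 0.0 gives all-zero coefficients (no shaping),
    gamma = 1.0 leaves the coefficients unchanged (full shaping).
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must be in the range 0 to 1")
    return [a * gamma ** (i + 1) for i, a in enumerate(alpha)]
```

With this convention, intermediate values of γ smooth the spectral peaks of the shaping filter, which is one way the trade-off between masking and total noise energy described above could be realized.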

The calculation unit 38 outputs the reproduced signal y(n), which is the sum of the signal output by the inverse adaptive quantization unit 32 and the prediction signal output by the adaptive prediction unit 37.

The adaptive prediction unit 37 takes the reproduced signal y(n) as input and outputs a prediction signal that predicts the original signal s(n).
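The signal flow of FIG. 2 described above can be sketched per sample as follows. This is a simplified illustration, not the ADPCM codec of the embodiment: a fixed-step uniform quantizer stands in for the adaptive quantization unit 31 and the inverse adaptive quantization unit 32, the previous reproduced sample stands in for the adaptive prediction unit 37, and the noise shaping unit 35 is modeled as a weighted sum of past quantization noise. All names, the stand-ins, and the sign of the feedback are assumptions.

```python
def encode_block(samples, step, feedback):
    """Run one block through a simplified FIG. 2 style loop.

    samples  : original signal s(n) for the block
    step     : fixed quantizer step size (stand-in for adaptive quantization)
    feedback : weights applied to past quantization noise (stand-in for unit 35)
    Returns (codes, reproduced) for x(n) and y(n).
    """
    codes, reproduced = [], []
    noise_hist = [0.0] * len(feedback)  # most recent noise value first
    prev_y = 0.0
    for s in samples:
        pred = prev_y                    # unit 37 stand-in: previous y(n)
        d = s - pred                     # unit 33: difference signal
        shaped = sum(c * e for c, e in zip(feedback, noise_hist))  # unit 35
        v = d + shaped                   # unit 34: add shaped noise
        code = round(v / step)           # unit 31 stand-in: quantize
        dq = code * step                 # unit 32 stand-in: dequantize
        e = dq - v                       # unit 36: quantization noise
        y = dq + pred                    # unit 38: reproduced signal
        codes.append(code)
        reproduced.append(y)
        noise_hist = ([e] + noise_hist)[:len(feedback)]
        prev_y = y
    return codes, reproduced
```

In this sketch the reconstruction error y(n) - s(n) equals the current quantization noise plus the fed-back noise, so with an empty feedback list the error stays within half a quantizer step.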

Returning to the description of FIG. 1, the storage unit 20 stores original signal data 21, auditory information 22, and effective processed signal data 23.

The original signal data 21 is the original signal s(n), in digital form, that is to be processed by the voice processing unit 11. The original signal s(n) is obtained, for example, by sampling an analog original signal at predetermined intervals and converting the analog value at each point in time into the corresponding digital value.

The auditory information 22 is information on the sensitivity of human hearing to each frequency component (auditory information). Examples of auditory information include information corresponding to a curve that mirrors an equal-loudness contour, which shows how the sound pressure level that humans perceive as equally loud varies with frequency, and information corresponding to the A curve or C curve of JIS C 1509-1:2005.

Here, the A curve will be described.

FIG. 3 is a diagram showing the A curve, which represents the sensitivity of human hearing to frequency, according to the embodiment. In FIG. 3, the horizontal axis is frequency [kHz] and the vertical axis is sound pressure [dB]. On the vertical axis, the sound pressure at a frequency of 1.0 [kHz] is taken as 0 [dB].

As the A curve in FIG. 3 shows, the value that human hearing perceives as the same sound pressure differs with frequency; that is, the sensitivity differs with frequency.

Returning to the description of FIG. 1, the effective processed signal data 23 is the processed signal x(n) generated by the voice processing unit 11 when the intensity γ of the noise shaping unit 35 is set to the effective value determined by the effective value determination unit 14. Decoding the effective processed signal data 23 yields a reproduced signal that can reproduce high-quality sound.

For each of the plurality of values to which the parameter of the noise shaping unit 35 is changed, the difference detection unit 12 takes a predetermined unit (block) of the original signal s(n) as the processing target and calculates two logarithmic power spectra: that of the block (the processing target block), and that of the reproduced signal y(n) obtained by encoding the processing target block of the original signal s(n) with the adaptive quantization unit 31 and decoding it with the inverse adaptive quantization unit 32 (in FIG. 2, the signal output from the calculation unit 38). It then obtains the absolute value of the difference between the two logarithmic power spectra for each frequency component. Specifically, the difference detection unit 12 calculates the logarithmic power spectrum S(f) of the block of the original signal s(n) by equation (7).

The difference detection unit 12 also calculates the logarithmic power spectrum Y(f) of the block of the reproduced signal y(n) by equation (8).

Next, the difference detection unit 12 calculates, by equation (9), the difference power spectrum D(f), which is the absolute value of the difference for each frequency component between the logarithmic power spectrum S(f) of the block of the original signal s(n) and the logarithmic power spectrum Y(f) of the block of the reproduced signal y(n).
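Equations (7) to (9) are not reproduced in this text. The Python sketch below shows one way the described quantities could be computed, assuming that the logarithmic power spectrum is 10·log10 of the squared magnitude of a discrete Fourier transform of the block, with no window function. The scaling, the small constant added before the logarithm, and the function names are assumptions. A direct DFT is used to keep the example free of third-party dependencies.

```python
import cmath
import math


def log_power_spectrum(block, eps=1e-12):
    """Return 10*log10(|DFT|^2 + eps) for bins 0 .. N/2 of the block."""
    n = len(block)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t, x in enumerate(block))
        spectrum.append(10.0 * math.log10(abs(acc) ** 2 + eps))
    return spectrum


def difference_power_spectrum(original, reproduced):
    """Return D(f) = |S(f) - Y(f)| for each frequency bin."""
    if len(original) != len(reproduced):
        raise ValueError("blocks must have the same length")
    s = log_power_spectrum(original)
    y = log_power_spectrum(reproduced)
    return [abs(sf - yf) for sf, yf in zip(s, y)]
```

Because the comparison is made on logarithmic spectra, a reproduced block whose amplitude is ten times that of the original differs by about 20 dB in every bin that carries signal energy, regardless of the absolute level.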

For each of the plurality of values to which the parameter of the noise shaping unit 35 is changed, the feature amount calculation unit 13 multiplies each frequency component of the difference power spectrum D(f) by a weight w(f) based on human hearing and calculates a feature amount (FDD: Frequency Domain Difference) based on all of the resulting values (for example, by adding them all together).

When the auditory information 22 is the correspondence between frequency and sound pressure given by the A curve shown in FIG. 3, the weight w(f) for each frequency component is calculated by equation (10), where C_A(f) is the sound pressure value on the vertical axis for frequency f.

The feature amount calculation unit 13 calculates the feature amount FDD by equation (11).

The feature amount FDD is thus the sum, over all frequencies, of the energy difference between the original signal s(n) and the reproduced signal y(n), weighted according to the characteristics of human hearing. Accordingly, a small FDD indicates that the quantization noise in the reproduced signal y(n) has little effect on human hearing, that is, that the sound quality of the audio produced from the reproduced signal y(n) is good.
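Equations (10) and (11) are not reproduced in this text. As an illustration of the weighted sum described above, the Python sketch below uses the standard analytic approximation of the A-weighting curve for C_A(f) and converts the decibel value to a linear weight with 10 ** (C_A(f) / 20). That conversion is an assumption, as are the function names and the mapping from bin index to frequency; the source may derive w(f) from C_A(f) differently.

```python
import math


def a_weighting_db(f_hz):
    """A-weighting in dB, approximately 0 dB at 1 kHz (standard analytic form)."""
    if f_hz <= 0.0:
        return float("-inf")
    f2 = f_hz * f_hz
    ra = (12194.0 ** 2 * f2 * f2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(ra) + 2.00


def weight(f_hz):
    """Assumed linear weight w(f) derived from the A curve value C_A(f)."""
    if f_hz <= 0.0:
        return 0.0
    return 10.0 ** (a_weighting_db(f_hz) / 20.0)


def fdd(diff_spectrum, sample_rate):
    """Weighted sum of D(f) over bins 0 .. N/2 (feature amount FDD)."""
    n_bins = len(diff_spectrum)
    if n_bins < 2:
        raise ValueError("need at least two frequency bins")
    n = 2 * (n_bins - 1)  # block length implied by the number of bins
    return sum(weight(k * sample_rate / n) * d
               for k, d in enumerate(diff_spectrum))
```

With these weights, a given spectral difference near 1 kHz contributes more to the feature amount than the same difference at 100 Hz, which reflects the lower sensitivity of hearing at low frequencies shown by the A curve.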

Taking each block of the original signal s(n) in turn as the target (processing target block), the voice processing control unit 15 changes the value of one or more parameters of the voice processing performed by the voice processing unit 11 to a plurality of values, causes the voice processing unit 11 to execute the voice processing with each value set, and causes the difference detection unit 12 and the feature amount calculation unit 13 to process the result of each run. In the present embodiment, the voice processing control unit 15 changes, for example, the intensity γ of the noise shaping unit 35 to a plurality of values in the range from 0 to 1 inclusive. When the effective value determination unit 14 has determined the effective value of the parameter, the voice processing control unit 15 sets the one or more parameters of the voice processing performed by the voice processing unit 11 on the processing target block of the original signal s(n) to the effective value corresponding to that block, and stores the processed signal x(n) generated by the voice processing unit 11 in the storage unit 20 as the effective processed signal (effective processed data) of the processing target block.

Based on the feature amounts FDD calculated by the feature amount calculation unit 13 for the respective parameter values set by the voice processing control unit 15, the effective value determination unit 14 determines the value (effective value) of the parameter (intensity γ) that allows voice processing with little degradation. In the present embodiment, the effective value determination unit 14 determines the parameter value that gives the smallest feature amount FDD as the effective value for the processing target block.
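The search described above amounts to evaluating the feature amount for each candidate value and keeping the minimum. A minimal Python sketch follows, assuming that a callable `feature_for` (a hypothetical name) encodes and decodes the block with a given γ and returns the resulting FDD. The candidate grid is also an assumption, since the source only states that γ takes a plurality of values between 0 and 1.

```python
def choose_effective_value(candidates, feature_for):
    """Return (best_gamma, best_fdd) over the candidate parameter values.

    candidates  : iterable of gamma values to try (assumed grid)
    feature_for : callable mapping gamma to the feature amount FDD
                  (hypothetical; stands in for units 11, 12 and 13)
    """
    best_gamma = None
    best_fdd = float("inf")
    for gamma in candidates:
        value = feature_for(gamma)
        if value < best_fdd:  # keep the first minimum encountered
            best_gamma, best_fdd = gamma, value
    if best_gamma is None:
        raise ValueError("no candidate produced a finite feature amount")
    return best_gamma, best_fdd
```

A call such as `choose_effective_value([i / 10 for i in range(11)], feature_for)` would try eleven evenly spaced values per block. A finer grid trades encoding time for a closer approximation of the minimum.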

次に、音声処理装置１のハードウェア構成について詳細に説明する。 Next, the hardware configuration of the voice processing device 1 will be described in detail.

図４は、一実施形態に係る音声処理装置のハードウェア構成図である。 FIG. 4 is a hardware configuration diagram of the voice processing device according to the embodiment.

音声処理装置１は、制御回路１０１と、記憶装置１０２と、リーダライタ１０３と、通信インターフェース（通信Ｉ／Ｆ）１０４と、入出力インターフェース（入出力Ｉ／Ｆ）１０５と、入力装置１０６と、表示装置１０７とを備えるコンピュータにより構成される。制御回路１０１、記憶装置１０２、リーダライタ１０３、通信Ｉ／Ｆ１０４、入出力Ｉ／Ｆ１０５、及び表示装置１０７は、バス１０８を介して接続されている。 The voice processing device 1 includes a control circuit 101, a storage device 102, a reader / writer 103, a communication interface (communication I / F) 104, an input / output interface (input / output I / F) 105, an input device 106, and the like. It is composed of a computer including a display device 107. The control circuit 101, the storage device 102, the reader / writer 103, the communication I / F 104, the input / output I / F 105, and the display device 107 are connected via the bus 108.

制御回路１０１は、例えば、プロセッサであり、音声処理装置１の全体を統括制御する。制御回路１０１は、記憶装置１０２に格納されたプログラムを実行することにより各種処理を実行する。本実施形態では、制御回路１０１は、記憶装置１０２に格納された音声処理プログラムを実行することにより、音声処理部１１、差分検出部１２、特徴量算出部１３、有効値決定部１４、及び音声処理制御部１５を構成する。 The control circuit 101 is, for example, a processor, and controls the entire voice processing device 1. The control circuit 101 executes various processes by executing a program stored in the storage device 102. In the present embodiment, the control circuit 101 executes the voice processing program stored in the storage device 102 to execute the voice processing unit 11, the difference detection unit 12, the feature amount calculation unit 13, the effective value determination unit 14, and the voice. The processing control unit 15 is configured.

The storage device 102 is, for example, an HDD (hard disk drive), an SSD (solid state drive), a RAM, a ROM, or the like, and stores the programs executed by the control circuit 101 (such as the voice processing program) and various information. The storage device 102 constitutes the storage unit 20 shown in FIG. 1.

A recording medium 110 can be attached to and detached from the reader/writer 103, which reads data from and writes data to the recording medium 110. Examples of the recording medium 110 include non-transitory (nonvolatile) recording media such as an SD memory card, an FD (floppy disk: registered trademark), a CD, a DVD, a BD (registered trademark), and a flash memory. In the present embodiment, the voice processing program, the original signal s(n) used for processing, the auditory information, and the like may be stored on the recording medium 110 and read out by the reader/writer 103 for use.

The communication I/F 104 is connected to a network 111 and transmits data to and receives data from other devices connected to the network 111.

Next, the operation of the voice generation process in the voice processing device 1 according to the present embodiment will be described.

FIG. 5 is a flowchart of the voice generation process according to an embodiment.

The voice processing control unit 15 extracts, from the storage unit 20, an unprocessed block (processing target block) within the processing range of the original signal s(n) (step S101). The processing range may be, for example, the entire original signal s(n) or a part of the original signal s(n) designated by the user.

Next, the voice processing control unit 15 sets the intensity γ of the noise shaping unit 35 of the voice processing unit 11 to 0 (step S102), inputs the extracted processing target block to the voice processing unit 11, and causes it to execute the voice processing (step S103).

Next, the voice processing control unit 15 causes the difference detection unit 12 and the feature amount calculation unit 13 to execute a process (feature amount calculation process) that calculates the feature amount FDD from the original signal s(n) used in the voice processing by the voice processing unit 11 and the reproduction signal y(n) output by the voice processing unit 11 (step S104).

Next, the voice processing control unit 15 changes the intensity γ of the voice processing unit 11 (step S105). For example, the voice processing control unit 15 adds a predetermined value (for example, 0.01) to the value of the intensity γ.

Next, the voice processing control unit 15 determines whether the intensity γ is greater than 1 (step S106). If the intensity γ is not greater than 1 (step S106: No), the feature amount FDD for this intensity γ still has to be calculated, so the voice processing control unit 15 returns the process to step S103. If the intensity γ is greater than 1 (step S106: Yes), the intensity γ has been varied over the full range and the necessary feature amounts FDD have been calculated, so the voice processing control unit 15 advances the process to step S107.

In step S107, the effective value determination unit 14 determines, based on the plurality of feature amounts FDD calculated in step S104, the value (effective value) of the intensity γ that can generate a processed signal from which a reproduction signal with little quantization noise is obtained, and the voice processing control unit 15 sets the determined effective value as the intensity γ of the voice processing unit 11. In the present embodiment, the effective value determination unit 14 determines the value of the intensity γ that yields the smallest feature amount FDD as the effective value for the processing target block.

Next, the voice processing control unit 15 inputs the processing target block to the voice processing unit 11, causes it to execute the voice processing, and stores the processed signal x(n) generated by the voice processing unit 11 in the storage unit 20 as effective processed signal data (step S108).

Next, the voice processing control unit 15 determines whether all blocks in the processing range of the original signal s(n) have been processed (step S109). If not all blocks in the processing range have been processed (step S109: No), the voice processing control unit 15 returns the process to step S101 and processes the next block. If all blocks in the processing range have been processed (step S109: Yes), the voice processing control unit 15 ends the voice generation process.
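The flow of steps S101 to S109 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: `process` and `feature` are hypothetical callables standing in for the voice processing unit 11 and the feature amount calculation process of step S104, and `process` is assumed to return the pair (processed signal x, reproduction signal y). The sweep uses an integer step index rather than repeated addition of 0.01 so that floating-point error does not accumulate.

```python
def find_effective_gamma(block, process, feature, step=0.01):
    """Sweep gamma over [0, 1] and return (gamma, FDD) with the smallest FDD."""
    steps = int(round(1.0 / step))
    best_gamma, best_fdd = 0.0, float("inf")
    for i in range(steps + 1):            # S102, S105, S106: gamma = 0, step, ..., 1
        gamma = i * step
        _, y = process(block, gamma)      # S103: run the voice processing
        fdd = feature(block, y)           # S104: feature amount for this gamma
        if fdd < best_fdd:                # S107: keep the gamma with minimum FDD
            best_gamma, best_fdd = gamma, fdd
    return best_gamma, best_fdd


def generate(blocks, process, feature, step=0.01):
    """Process every block with its own effective gamma (S101, S108, S109)."""
    results = []
    for block in blocks:
        gamma, _ = find_effective_gamma(block, process, feature, step)
        x, _ = process(block, gamma)      # S108: final pass with the effective value
        results.append((gamma, x))
    return results
```

With the default step of 0.01, each block is processed 101 times during the sweep and once more for the final pass.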

Next, the feature amount calculation process shown in step S104 of FIG. 5 will be described.

FIG. 6 is a flowchart of the feature amount calculation process according to an embodiment.

The difference detection unit 12 calculates the logarithmic power spectrum S(f) of the original signal s(n) by equation (7) (step S201). The difference detection unit 12 then calculates the logarithmic power spectrum Y(f) of the reproduction signal y(n) by equation (8) (step S202). The difference detection unit 12 then calculates, for each frequency and by equation (9), the difference power spectrum D(f), which is the difference between the absolute values of the logarithmic power spectrum S(f) and the logarithmic power spectrum Y(f) (step S203). Next, the feature amount calculation unit 13 assigns to each difference power spectrum D(f) a weight w(f) that reflects the human auditory characteristic at the corresponding frequency. That is, the feature amount calculation unit 13 multiplies each difference power spectrum D(f) by the corresponding weight w(f) by equation (10) (step S204). Next, the feature amount calculation unit 13 calculates the feature amount FDD by summing, by equation (11), the weighted difference power spectra D(f) over the frequencies (step S205).
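Steps S201 to S205 can be illustrated with the following pure-Python sketch. Equations (7) to (11) are not reproduced in this passage, so several details are assumptions made for illustration only: the spectrum is obtained with a naive DFT over the non-negative frequency bins, the logarithmic power is expressed as 10·log10 of the squared magnitude with a small constant `eps` to avoid log(0), and D(f) is taken as the absolute difference |S(f) − Y(f)|. The weight list `weights` stands in for the auditory information 22.

```python
import cmath
import math


def log_power_spectrum(x, eps=1e-12):
    """Log power (dB) of each DFT bin 0..N/2 of the real signal x (S201/S202)."""
    n = len(x)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(10.0 * math.log10(abs(acc) ** 2 + eps))
    return spectrum


def feature_fdd(s, y, weights):
    """Weighted sum of per-frequency log-spectral differences (S203 to S205)."""
    S = log_power_spectrum(s)
    Y = log_power_spectrum(y)
    D = [abs(a - b) for a, b in zip(S, Y)]            # S203 (assumed form)
    return sum(w * d for w, d in zip(weights, D))     # S204 and S205
```

For example, `feature_fdd(s, s, weights)` is 0 for any signal `s`, and setting a weight to 0 removes that frequency bin from the feature amount, which is how frequencies to which hearing is insensitive contribute less.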

As described above, the voice processing device 1 according to the present embodiment changes the value of one or more parameters of the voice processing to a plurality of values and causes the voice processing unit 11 to execute the voice processing with each value. For each value, it detects the difference between the energy of the original signal s(n) and the energy of the reproduction signal y(n) for each predetermined frequency component, weights the detected difference for each frequency component according to the sensitivity of human hearing to that frequency component, and calculates a feature amount by summing the weighted differences. Based on the plurality of feature amounts calculated by the feature amount calculation unit 13 for the respective parameter values, it then determines the effective value of the parameter, that is, the value suited to generating a processed signal from which a reproduction signal with little quantization error is obtained. As a result, the quality of the sound of the reproduction signal generated from the processed signal produced by the voice processing can be improved.

The present invention is not limited to the above-described embodiment and can be modified and implemented as appropriate without departing from the spirit of the present invention.

For example, in the above embodiment, the feature amount FDD is calculated for each block of the original signal by always varying the value of the intensity γ of the noise shaping unit 35 over a predetermined range (0 or more and 1 or less). However, the present invention is not limited to this. The value of the intensity γ may instead be varied only within a limited range around the effective value of the intensity γ determined for a block a predetermined number of blocks earlier (for example, the immediately preceding block). The effective value of the intensity γ for a block tends to be close to the effective value for blocks that are close in time (for example, the immediately preceding block), so this approach makes it possible to determine the effective value appropriately while reducing the processing load required to determine it.
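The narrowed search described in this modification can be sketched as follows. The function names, the search radius of 0.05, and the use of `None` to signal that no earlier block exists are assumptions for illustration; the source does not specify the width of the limited range.

```python
def candidate_gammas(prev_gamma, radius=0.05, step=0.01, lo=0.0, hi=1.0):
    """Return the gamma values to try for the current block.

    With no previous effective value the full range [lo, hi] is used;
    otherwise only values within `radius` of the previous effective value.
    """
    if prev_gamma is None:
        start, stop = lo, hi
    else:
        start = max(lo, prev_gamma - radius)
        stop = min(hi, prev_gamma + radius)
    first = int(round(start / step))
    last = int(round(stop / step))
    return [i * step for i in range(first, last + 1)]


def pick_effective(candidates, fdd_of):
    """Choose the candidate whose feature amount FDD is smallest."""
    return min(candidates, key=fdd_of)
```

Under these assumptions the number of trial runs per block drops from 101 to at most 11 once a previous effective value is available.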

Further, in the above embodiment, the feature amount is calculated and the effective value of the intensity γ is determined for every block. However, the present invention is not limited to this. For example, the effective value may be determined by varying the intensity γ for only one block out of a plurality of blocks, and the voice processing for all of those blocks may then use that single effective value as the intensity γ. Compared with calculating the feature amount and determining the effective value of the intensity γ for every block, this reduces the processing load and makes it possible to generate, in a short time, a processed signal from which a reproduction signal of relatively good quality is obtained.
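A minimal sketch of this grouping follows. The choice of the first block of each group as the representative block, the fixed `group_size`, and the hypothetical `search` callable (which returns the effective value of γ for one block) are all assumptions; the source only states that one block out of the plurality is used.

```python
def gammas_for_groups(blocks, group_size, search):
    """Assign one effective gamma per group of consecutive blocks.

    `search` is run only on the first block of each group, and its result
    is reused for every block in that group.
    """
    gammas = []
    for start in range(0, len(blocks), group_size):
        group = blocks[start:start + group_size]
        gamma = search(group[0])          # one parameter search per group
        gammas.extend([gamma] * len(group))
    return gammas
```

With a group size of 4, for instance, the parameter search runs on roughly a quarter of the blocks.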

Further, in the above embodiment, the feature amount is calculated while the value of the intensity γ of the noise shaping unit 35 is varied, and the effective value of the intensity γ is determined based on the feature amounts. However, the present invention is not limited to this. The value of another parameter of the voice processing may be varied instead, for example at least one of the following: the cutoff frequency of an LPF (low-pass filter), if the voice processing unit has one; the number of prediction coefficients of the autoregressive model used as the noise transfer function of the noise shaping unit 35; and the block size used for quantization. The feature amount is then calculated at each parameter value, the effective value of that parameter is determined based on the feature amounts, and the effective value is used when generating the effective processed signal.

Further, in the above embodiment, the weights are determined and the feature amount is calculated based on the common auditory information 22. However, the present invention is not limited to this. For example, auditory information may be prepared for each age group (for example, teens, 20s, 30s, and so on) of the users who will listen to the sound of the reproduction signal generated from the processed signal, and the weights may be determined and the feature amount calculated based on that auditory information. In this case, the feature amount may be calculated using the auditory information 22 of the main age group of the listeners, the effective value of the parameter determined, and the processed signal generated accordingly. Alternatively, the feature amount may be calculated for each age group, the effective value of the parameter determined for each age group, and a processed signal suited to each age group generated.

Further, in the above embodiment, information corresponding to the equal-loudness curve is mainly described as an example of the auditory information 22. However, the present invention is not limited to this. For example, instead of or in addition to the information corresponding to the equal-loudness curve, the auditory information may include information corresponding to temporal masking, the phenomenon in which a sound is drowned out by a sound that precedes it in time, or information corresponding to spectral masking, the phenomenon in which a sound is drowned out by another sound occurring at the same time.

Further, in the above embodiment, the voice processing performed by the voice processing unit is ADPCM. However, the present invention is not limited to this, and the voice processing may be, for example, another DPCM or a codec of another scheme (for example, MP3). When the voice processing is MP3, for example, the feature amount FDD may be calculated for each setting of the frequencies of the sounds to be cut in the voice processing (the parameter), using weights according to auditory characteristics (for example, auditory characteristics based on one or more of the equal-loudness curve and information corresponding to masking). A parameter value is then selected that keeps the feature amount FDD at or below a threshold while raising the compression ratio (for example, a value that allows more sound frequencies to be cut). In this way, with MP3, the compression ratio can be raised by cutting sounds that have little or no effect on sound quality, while the sound quality is kept at or above a certain level according to the auditory characteristics.
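The threshold-based selection described for the MP3 case differs from the minimum-FDD rule of the main embodiment, and can be sketched as follows. The function and its arguments are hypothetical: `fdd_of` returns the feature amount FDD for a candidate parameter value, and `compression_of` returns any score that grows with the compression ratio achieved by that value.

```python
def select_parameter(candidates, fdd_of, compression_of, threshold):
    """Among candidates whose FDD is at or below `threshold`,
    return the one with the highest compression score (None if none qualify)."""
    acceptable = [c for c in candidates if fdd_of(c) <= threshold]
    if not acceptable:
        return None
    return max(acceptable, key=compression_of)
```

Returning `None` when no candidate meets the threshold leaves the fallback policy (for example, choosing the candidate with the smallest FDD) to the caller, since the source does not describe that case.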

Further, in the above embodiment, part or all of the processing performed by the control circuit 101 may be performed by a hardware circuit. For example, at least one of the functional units 11 to 15 shown in FIG. 1 may be implemented as a hardware circuit. At least part of each component of the voice processing unit 11 may also be implemented as a hardware circuit.

1 ... voice processing device, 11 ... voice processing unit, 12 ... difference detection unit, 13 ... feature amount calculation unit, 14 ... effective value determination unit, 15 ... voice processing control unit, 20 ... storage unit, 21 ... original signal data, 22 ... auditory information, 23 ... effective processed signal data

Claims

A voice processing device having a voice processing unit that performs predetermined voice processing on a target signal to generate a processed signal, the voice processing device comprising:
a voice processing control unit that changes the value of one or more parameters of the voice processing performed by the voice processing unit to each of a plurality of values and causes the voice processing unit to execute, on the same target signal, the voice processing using each of the changed parameter values;
a difference detection unit that detects, for each of the plurality of values to which the parameter is changed and for each predetermined frequency component, the difference between the energy of the target signal and the energy of a reproduced signal generated based on the processed signal;
a feature amount calculation unit that, for each of the plurality of values to which the parameter is changed, weights the difference for each frequency component detected by the difference detection unit according to the sensitivity of human hearing to that frequency component, and calculates a feature amount based on the weighted differences; and
an effective value determination unit that determines, from among the plurality of parameter values and based on the plurality of feature amounts calculated by the feature amount calculation unit, an effective value, which is a parameter value suitable for use in the voice processing.

The voice processing device according to claim 1, further comprising an effective signal generation control unit that sets the value of the parameter in the voice processing to the effective value determined by the effective value determination unit, causes the voice processing to be executed, and stores the processed signal obtained by that voice processing in a storage unit as an effective processed signal.

The voice processing device according to claim 1 or 2, wherein the voice processing unit includes a noise shaping unit that performs noise shaping that changes the frequency characteristics of the quantization noise in the target signal, and
the parameter is a parameter related to the noise shaping in the noise shaping unit.

The voice processing device according to claim 3, wherein the difference detection unit detects the difference in units of blocks of a predetermined size of the target signal,
the feature amount calculation unit calculates the feature amount in units of the blocks, and
the effective value determination unit determines the effective value in units of the blocks.

The voice processing device according to any one of claims 1 to 4, further comprising an auditory information storage unit that stores information on the sensitivity of human hearing to each frequency component,
wherein the feature amount calculation unit determines the weights based on the sensitivity information in the auditory information storage unit.

A voice processing method performed by a voice processing device having a voice processing unit that performs predetermined voice processing on a target signal to generate a processed signal, the method comprising:
changing the value of one or more parameters of the voice processing performed by the voice processing unit to each of a plurality of values, and causing the voice processing unit to execute, on the same target signal, the voice processing using each of the changed parameter values;
for each of the plurality of values to which the parameter is changed, detecting, for each predetermined frequency component, the difference between the energy of the target signal and the energy of a reproduced signal generated based on the processed signal, weighting the detected difference for each frequency component according to the sensitivity of human hearing to that frequency component, and calculating a feature amount by summing the weighted differences; and determining, based on the calculated plurality of feature amounts and from among the plurality of parameter values, an effective value, which is a parameter value suitable for use in the voice processing.

A voice processing program to be executed by a computer constituting a voice processing device having a voice processing unit that performs predetermined voice processing on a target signal to generate a processed signal,
the voice processing program causing the computer to function as:
a voice processing control unit that changes the value of one or more parameters of the voice processing performed by the voice processing unit to each of a plurality of values and causes the voice processing unit to execute, on the same target signal, the voice processing using each of the changed parameter values;
a difference detection unit that detects, for each of the plurality of values to which the parameter is changed and for each predetermined frequency component, the difference between the energy of the target signal and the energy of a reproduced signal generated based on the processed signal;
a feature amount calculation unit that, for each of the plurality of values to which the parameter is changed, weights the difference for each frequency component detected by the difference detection unit according to the sensitivity of human hearing to that frequency component, and calculates a feature amount based on the weighted differences; and
an effective value determination unit that determines, from among the plurality of parameter values and based on the plurality of feature amounts calculated by the feature amount calculation unit, an effective value, which is a parameter value suitable for use in the voice processing.

A voice processing device comprising:
a processing unit that performs processing including noise shaping on an original signal that is the target of voice processing to generate a reproduced signal;
a detection unit that detects the difference in energy between the original signal and the reproduced signal for each frequency component;
a calculation unit that weights the difference in energy of each frequency component according to the characteristics of hearing and calculates a feature amount by summing the weighted differences in energy of the frequency components;
a control unit that changes a parameter of the noise shaping and causes the calculation unit to calculate the feature amount in each state in which a changed parameter value is set; and
a determination unit that determines, from the plurality of feature amounts calculated in the respective states, the parameter value corresponding to the minimum feature amount or to a feature amount at or below a threshold as the value of the noise shaping parameter to be used for the voice processing.