JP4587941B2 - Speech correction system and adaptive filter used therefor

Info

Publication number: JP4587941B2
Application number: JP2005333680A
Authority: JP
Inventors: 真吾木内; 望齊藤
Original assignee: Alpine Electronics Inc
Current assignee: Alpine Electronics Inc
Priority date: 2005-11-18
Filing date: 2005-11-18
Publication date: 2010-11-24
Anticipated expiration: 2025-11-18
Also published as: JP2007140102A

Description

The present invention relates to a sound correction system and an adaptive filter used in it, and is particularly suited to systems that suppress a specific sound in a mixture of several sounds.

Most recent vehicles carry a range of electronic devices such as an audio device, an air conditioner, and a navigation device. To avoid one-handed driving while these devices are operated, systems that allow them to be operated by voice recognition have also become available. With voice recognition, the driver can operate the devices without taking a hand off the steering wheel, that is, without manually operating a remote controller, operation panel, or similar control.

A voice recognition system usually takes a specific word, phrase, or simple command sentence uttered by the user as input from a microphone, recognizes it as an utterance command, and executes the processing that corresponds to the recognized command. If the user utters a command while the audio device is producing sound, the microphone picks up the audio sound in addition to the speaker's voice. The audio sound acts as noise for voice recognition, so the recognition rate for the speaker's voice drops.

ASC (Audio Sound Cancellation) systems have therefore been provided that suppress only the audio sound in the mixed sound (speaker's voice plus audio sound) picked up by the microphone. An adaptive filter is commonly used to implement an ASC system, and the N-LMS (Normalized LMS) algorithm is commonly used as the adaptation algorithm.

FIG. 5 shows the schematic configuration of a conventional ASC system that uses an adaptive filter. In FIG. 5, 101 is a speaker that outputs audio sound, 102 is a microphone that takes in sound, 103 is an adaptive filter, and 104 is a subtractor. The microphone 102 is provided to capture the speaker's voice for voice recognition, but while the audio device is playing an audio source, the microphone 102 picks up the audio sound output from the speaker 101 as well as the speaker's voice.

When its algorithm is the N-LMS algorithm, the adaptive filter 103 corrects the audio sound x(n), which is input as the reference signal to be controlled, according to (Equation 1-1) to (Equation 1-3) (referred to collectively as (Equation 1) when no distinction is needed). (Equation 1) gives the expressions for the case where the audio sound x(n) has one channel.

In (Equation 1), w(n), α(n), e(n), and x(n) are all matrices; a superscript "T" denotes the transpose and a superscript "-1" denotes the inverse. L is the tap length of the adaptive filter and μ is the step size parameter. The step size parameter sets how large each correction to the filter coefficients is and controls the convergence of the adaptive process. The quantity "α(n) × e(n)" that is multiplied by the step size parameter μ is called the correction term. In this specification, the α(n) contained in the correction term is called the "correction data".
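The typeset equations did not survive in this copy of the text. As a reading aid only, the standard single-channel N-LMS recursion that matches the symbol definitions above can be written as follows. This is a reconstruction under assumptions, not the patent's own equations: the text confirms only that (Equation 1-1) is the coefficient update and that (Equation 1-2) yields α(n) from x(n) alone. The content of (Equation 1-3) cannot be recovered from the text, and the error relation in the last line is taken from the description of the subtractor.

$$
\begin{aligned}
\mathbf{w}(n+1) &= \mathbf{w}(n) + \mu\,\boldsymbol{\alpha}(n)\,e(n) && \text{(cf. Equation 1-1)}\\
\boldsymbol{\alpha}(n) &= \mathbf{x}(n)\left[\mathbf{x}(n)^{T}\mathbf{x}(n)\right]^{-1} && \text{(cf. Equation 1-2)}\\
e(n) &= d(n) - y(n), \qquad y(n) = \mathbf{w}(n)^{T}\mathbf{x}(n)
\end{aligned}
$$

Here $\mathbf{x}(n) = [x(n),\, x(n-1),\, \dots,\, x(n-L+1)]^{T}$ is assumed to be the vector of the most recent $L$ reference samples and $\mathbf{w}(n)$ the $L \times 1$ coefficient vector, so that $\mathbf{x}(n)^{T}\mathbf{x}(n)$ is a $1 \times 1$ matrix (the short-term input power).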

The subtractor 104 computes the error e(n) by subtracting the corrected audio sound y(n) output from the adaptive filter 103 from the mixed sound d(n) of audio sound and speaker's voice picked up by the microphone 102, and thereby extracts only the speaker's voice. The speaker's voice extracted by the subtractor 104 is supplied to a voice recognition engine (not shown), and the processing that corresponds to the utterance command is then executed.

The adaptive filter 103 comprises a reference signal calculation unit 103a, an update filter coefficient calculation unit 103b, and a sound correction filter 103c. The reference signal calculation unit 103a applies (Equation 1-2) to the audio sound x(n) input as the reference signal and calculates the correction data α(n). The update filter coefficient calculation unit 103b evaluates (Equation 1-1) according to the N-LMS algorithm and identifies the filter coefficients w(n) of the sound correction filter 103c so that the power of the error e(n) output from the subtractor 104 is minimized.

The sound correction filter 103c filters the audio sound x(n) to be controlled using the filter coefficients w(n) determined by the update filter coefficient calculation unit 103b. Specifically, it obtains the corrected audio sound y(n) by applying to x(n) the same transfer function as that of the path along which the audio sound travels from the speaker 101 to the microphone 102. The filtered audio sound y(n) output from the sound correction filter 103c is supplied to the subtractor 104, and the error e(n) computed there is fed back to the update filter coefficient calculation unit 103b.
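The conventional loop of FIG. 5 can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the patent's implementation: it uses the standard N-LMS update, a small regularizer `eps` that the text does not mention, and a made-up three-tap echo path with no speaker's voice present, so the residual should fall toward zero. All function and variable names are hypothetical.

```python
import random

def nlms_cancel(x, d, taps, mu=0.5, eps=1e-8):
    """Conventional ASC loop: returns the residual e(n) and the final coefficients."""
    w = [0.0] * taps            # filter coefficients w(n)
    buf = [0.0] * taps          # reference vector x(n), x(n-1), ..., x(n-L+1)
    residual = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        # Role of unit 103a: correction data alpha(n) = x(n) / (x(n)^T x(n)).
        # eps is an added safeguard against division by zero (assumption).
        power = sum(v * v for v in buf) + eps
        alpha = [v / power for v in buf]
        # Role of filter 103c: y(n) = w(n)^T x(n)
        y = sum(wi * vi for wi, vi in zip(w, buf))
        # Role of subtractor 104: e(n) = d(n) - y(n)
        e = dn - y
        # Role of unit 103b: w(n+1) = w(n) + mu * alpha(n) * e(n)
        w = [wi + mu * ai * e for wi, ai in zip(w, alpha)]
        residual.append(e)
    return residual, w

# Synthetic check: the microphone signal is the reference passed through a
# hypothetical speaker-to-microphone path, with no speaker's voice.
rng = random.Random(0)
echo_path = [0.6, -0.3, 0.1]
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
d = [sum(h * x[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
     for n in range(len(x))]
residual, w = nlms_cancel(x, d, taps=len(echo_path))
```

Note that the `power` and `alpha` lines are recomputed for every sample; this per-sample work is what the reference signal calculation unit 103a contributes to the real-time load.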

An ASC system of this kind makes voice recognition possible without muting the audio sound and is indispensable for a comfortable voice HI (Human Interface). However, the properties of audio sound differ from source to source and are non-stationary. When handling signals of this kind, controlling the correction data α(n) calculated by the reference signal calculation unit 103a, together with the N-LMS algorithm of the update filter coefficient calculation unit 103b that uses it, is effective in keeping the performance of the ASC system good.

However, keeping the performance of the ASC system good requires computing the filter coefficients finely (with many taps) at a high sampling frequency, and the amount of computation becomes large. When the audio sound is multi-channel in particular, the amount of computation becomes very large.

A technique has also been proposed in which the required coefficient data is computed in advance and stored in a filter coefficient database, and the coefficient data is read from this database and set as the tap coefficients of a control filter (see, for example, Patent Document 1). Applying the technique described in Patent Document 1 to an ASC system would greatly reduce the amount of computation needed to calculate the filter coefficients.
Patent Document 1: Japanese Patent Laid-Open No. 10-307592

However, the filter coefficients described in Patent Document 1 are intended to create sound field characteristics suited to the vehicle type and are fixed regardless of the content of the input audio sound. In an ASC system, by contrast, the properties of the audio sound x(n) to be controlled differ from source to source and are non-stationary over time, as described above. The environment in which the audio sound x(n) is played also changes from moment to moment. The filter coefficients therefore have to be changed continually to follow the changing properties of the audio sound x(n) and the environment, and the technique described in Patent Document 1 cannot be applied to an ASC system as it stands.

The present invention was made to solve this problem, and its object is to reduce the amount of computation needed to calculate the filter coefficients of the adaptive filter used in an ASC system.

To solve the problem described above, the present invention calculates correction data in advance from the audio sound to be controlled and stores it as a database. While the audio sound to be controlled is actually being played, the correction data corresponding to that audio sound is read from the database and output in synchronization with the time code of the audio sound. The filter coefficients are then obtained by an adaptive algorithm using the correction data output in this way, and the audio sound is filtered using those coefficients.

With the present invention configured in this way, at least the correction data has been calculated in advance and accumulated as a database from which it can be read and used, so there is no need to compute the correction data in real time while the audio sound to be controlled is actually being played. Moreover, the adaptive algorithm is applied using the correction data that was read, so filter coefficients suited to the properties of the audio sound and to the environment at that moment are obtained in real time. Therefore, even when the present invention is applied to an ASC system, appropriate filter coefficients for the adaptive filter can be calculated with a small amount of computation.

An embodiment of the present invention is described below with reference to the drawings. FIG. 1 shows a configuration example of an ASC system that implements the adaptive filter and sound correction system of the present invention. As shown in FIG. 1, the ASC system of this embodiment comprises an in-vehicle user terminal 100 and a data bank device (server device) 200 installed outside the vehicle, and the user terminal 100 and the data bank device 200 can be connected to each other via a communication network.

In the user terminal 100, 1 is a sound source, which is a recording medium storing audio data, such as a CD (Compact Disc), DVD (Digital Versatile Disk), hard disk, semiconductor memory, or MD (Mini Disc). As shown in FIG. 2, the audio data of the sound source 1 is stored per time code, where a time code represents the elapsed time of the audio data. The time code may represent the elapsed time from the start of a track, or the continuous elapsed time from the start of the sound source 1.

2 is an audio source playback unit, such as a CD player, DVD player, MP3 player, or MD player. The audio source playback unit 2 plays back the audio sound (reference signal) x(n) to be controlled from the sound source 1 and outputs it. It also plays back and outputs, from the sound source 1, the time code that represents the elapsed time of the audio sound x(n).

3 is a speaker that outputs the audio sound x(n) played back by the audio source playback unit 2. 4 is a microphone that takes in sound. The microphone 4 is provided to capture the speaker's voice for voice recognition, but while the audio source playback unit 2 is playing the sound source 1, the microphone 4 picks up the audio sound output from the speaker 3 as well as the speaker's voice.

5 is an update filter coefficient calculation unit, 6 is a sound correction filter, and 7 is a subtractor. The update filter coefficient calculation unit 5 and the sound correction filter 6 form part of the adaptive filter of this embodiment. When its algorithm is the N-LMS algorithm, the adaptive filter of this embodiment corrects the audio sound x(n) to be controlled according to the same algorithm as (Equation 1) above.

Using the correction data α(n) acquired by the procedure described later, the update filter coefficient calculation unit 5 evaluates (Equation 1-1), for example according to the N-LMS algorithm, and identifies the filter coefficients w(n) of the sound correction filter 6 so that the power of the error e(n) fed back from the subtractor 7 is minimized. The update filter coefficient calculation unit 5 corresponds to the filter coefficient calculation unit of the present invention.

The sound correction filter 6 corresponds to the filter processing unit of the present invention. It filters the audio sound x(n) to be controlled using the filter coefficients w(n) obtained by the update filter coefficient calculation unit 5. Specifically, it obtains the corrected audio sound y(n) by applying to x(n) the same transfer function as that of the path along which the audio sound travels from the speaker 3 to the microphone 4.

The filtered audio sound y(n) output from the sound correction filter 6 is supplied to the subtractor 7. The subtractor 7 computes the error e(n) by subtracting the corrected audio sound y(n) output from the sound correction filter 6 from the mixed sound d(n) of audio sound and speaker's voice picked up by the microphone 4, and thereby extracts only the speaker's voice. The speaker's voice extracted by the subtractor 7 is supplied to a voice recognition engine (not shown). The error e(n) is also fed back to the update filter coefficient calculation unit 5.

8 is an audio track information extraction unit, which extracts audio track information from the audio sound to be controlled that is recorded on the sound source 1. The audio track information extraction unit 8 corresponds to the identification information extraction unit of the present invention. The extracted audio track information includes information for identifying the audio sound and corresponds to the audio sound identification information of the present invention.

9 is a request transmission unit. It transmits the audio track information extracted by the audio track information extraction unit 8 to the data bank device 200 via the communication network and requests the correction data α(n) corresponding to the audio sound indicated by that audio track information. 10 is a correction data acquisition unit, which acquires the correction data α(n) that the data bank device 200 returns in response to the audio track information sent from the request transmission unit 9.

11 is a correction data output unit, which holds the correction data α(n) acquired by the correction data acquisition unit 10. It obtains from the audio source playback unit 2 the time code representing the elapsed time of the audio sound x(n) to be controlled, and outputs the correction data α(n) to the update filter coefficient calculation unit 5 sequentially, in synchronization with the elapsed time that the time code represents. As described later, the correction data α(n) acquired by the correction data acquisition unit 10 is associated with time codes. Each time the correction data output unit 11 obtains a time code from the audio source playback unit 2, it outputs the correction data α(n) for that time code to the update filter coefficient calculation unit 5. The update filter coefficient calculation unit 5 uses the correction data α(n) output from the correction data output unit 11 to obtain the filter coefficients w(n) of the sound correction filter 6 according to (Equation 1-1) above.
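The playback-time behavior described above can be sketched as follows. This is an illustration under assumptions rather than the patent's implementation: the time code is modeled as a plain sample index, the table that would arrive from the data bank device is produced by a hypothetical helper `alpha_table`, and the standard N-LMS update is used. The point of the sketch is that the real-time loop only looks up α(n) by time code and never computes the input power itself.

```python
import random

def alpha_table(x, taps, eps=1e-8):
    """Offline stand-in for the data bank: maps time code -> alpha(n)."""
    buf, table = [0.0] * taps, {}
    for time_code, xn in enumerate(x):
        buf = [xn] + buf[:-1]
        power = sum(v * v for v in buf) + eps   # eps is an added safeguard
        table[time_code] = [v / power for v in buf]
    return table

def run_asc(x, d, alpha_by_time_code, taps, mu=0.5):
    """Real-time loop: alpha(n) is looked up by time code, not computed."""
    w = [0.0] * taps
    buf = [0.0] * taps
    residual = []
    for time_code, (xn, dn) in enumerate(zip(x, d)):
        buf = [xn] + buf[:-1]
        alpha = alpha_by_time_code[time_code]        # role of output unit 11
        y = sum(wi * vi for wi, vi in zip(w, buf))   # role of filter 6
        e = dn - y                                   # role of subtractor 7
        w = [wi + mu * ai * e for wi, ai in zip(w, alpha)]  # role of unit 5
        residual.append(e)
    return residual, w

# Synthetic check with a hypothetical speaker-to-microphone path.
rng = random.Random(1)
echo_path = [0.5, 0.25, -0.2]
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
d = [sum(h * x[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
     for n in range(len(x))]
table = alpha_table(x, taps=3)          # prepared before playback
residual, w = run_asc(x, d, table, taps=3)
```

The filter still needs the reference vector `buf` to form y(n); only the normalization that produces α(n) is moved out of the real-time path.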

In the data bank device 200, 21 is a correction data DB (database). It stores the correction data α(n) needed to obtain the filter coefficients of the sound correction filter 6, calculated in advance from the audio sound x(n) to be controlled, together with identification information for the audio sound (for example, audio track information). The correction data DB 21 corresponds to the correction data storage unit of the present invention.

As (Equation 1-2) above shows, the expression for the correction data α(n) contains neither the step size parameter μ nor the error e(n), so α(n) can be obtained from the audio sound x(n) alone. The correction data α(n) is therefore determined by x(n) only, and its value does not need to be updated according to the environment in which x(n) is actually played.

In this embodiment, therefore, the sound source 1 is obtained in advance, the correction data α(n) is calculated from the audio sound x(n), and the result is stored in the correction data DB 21. This is preferably done for the audio sound x(n) of many different sound sources 1. So that it can later be determined which audio sound x(n) each set of correction data α(n) was derived from, the correction data α(n) is stored in the correction data DB 21 in association with the identification information of the audio sound x(n).

FIG. 3 shows an example of the correction data α(n) obtained from one audio sound x(n). As shown in FIG. 3, the correction data α(n) is stored in the correction data DB 21 in association with the time codes of the audio sound x(n). That is, the calculation of (Equation 1-2) is applied to the audio data for each time code as shown in FIG. 2, and the resulting correction data α(n) for each time code is stored in the correction data DB 21 in association with that time code.
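The offline preparation described above can be pictured as building a two-level table: track identifier, then time code, then α(n). The sketch below is a minimal illustration with assumed details: the time code is a sample index, the database is an in-memory dictionary, α(n) follows the standard N-LMS normalization, and the names `correction_data_for_track` and `register_track` are hypothetical.

```python
def correction_data_for_track(samples, taps, eps=1e-8):
    """Return {time_code: alpha(n)} for one track, with alpha(n) = x(n) / (x(n)^T x(n))."""
    buf = [0.0] * taps
    table = {}
    for time_code, xn in enumerate(samples):
        buf = [xn] + buf[:-1]                   # x(n), x(n-1), ..., x(n-L+1)
        power = sum(v * v for v in buf) + eps   # eps guards against silence (assumption)
        table[time_code] = [v / power for v in buf]
    return table

def register_track(db, track_id, samples, taps):
    """Store one track's correction data under its identification information."""
    db[track_id] = correction_data_for_track(samples, taps)

correction_db = {}
register_track(correction_db, "disc-A/track-01", [1.0, 2.0], taps=2)
register_track(correction_db, "disc-A/track-02", [0.5, -0.5, 0.25], taps=2)

# Correction data for time code 1 of the first track:
# x(1) = [2.0, 1.0], so alpha(1) is approximately [0.4, 0.2].
alpha = correction_db["disc-A/track-01"][1]
```

Because nothing in this computation depends on μ, e(n), or the cabin acoustics, the table can be built once per track and reused in any vehicle that uses the same tap length.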

22 is a correction data reading unit. When it receives a request for correction data α(n) from the request transmission unit 9, it reads from the correction data DB 21 the correction data α(n) corresponding to the audio sound indicated by the audio track information contained in the request, and transmits the data it read to the user terminal 100 via the communication network.

As described in detail above, in this embodiment the correction data α(n) is calculated in advance from the audio sound to be controlled and stored in a database, and when the audio sound is actually played, the adaptive filter is updated using the correction data α(n) in the database. When the ASC system suppresses only the audio sound, at least the calculation of the correction data α(n) can therefore be omitted, which reduces the amount of computation. Moreover, because the filter coefficients are updated as needed according to the adaptive algorithm rather than fixed, filter coefficients suited to the properties of the audio sound to be controlled and to the environment at that moment can be obtained in real time, and the performance of the ASC system can be kept good.

The embodiment above describes an example in which the user terminal communicates with the external data bank device 200 to acquire the correction data α(n) every time audio data is played from the sound source 1, but the present invention is not limited to this. For example, the user terminal 100 may have a local data storage unit (for example, a hard disk) in which the correction data α(n) first acquired from the data bank device 200 is saved. When the same audio data is played a second or later time, the correction data α(n) may be read from the local data storage unit and used. In this case, the terminal communicates with the external data bank device 200 only when the local data storage unit does not hold the correction data α(n).
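The local-storage variation amounts to a read-through cache in front of the data bank. The sketch below illustrates that idea only; the class name, the in-memory dictionary standing in for the hard disk, and the `fetch_remote` callable standing in for the network request are all hypothetical.

```python
class CorrectionDataCache:
    """Read-through cache: contact the data bank only on a local miss."""

    def __init__(self, fetch_remote):
        self._fetch_remote = fetch_remote   # stand-in for request unit 9 + acquisition unit 10
        self._local = {}                    # stand-in for the local data storage unit
        self.remote_calls = 0

    def get(self, track_id):
        if track_id not in self._local:
            # First playback of this track: ask the external data bank.
            self.remote_calls += 1
            self._local[track_id] = self._fetch_remote(track_id)
        # Second and later playbacks are served locally.
        return self._local[track_id]

# Hypothetical data bank contents: track id -> {time code: alpha(n)}.
data_bank = {"track-01": {0: [1.0, 0.0], 1: [0.4, 0.2]}}

cache = CorrectionDataCache(fetch_remote=lambda track_id: data_bank[track_id])
first = cache.get("track-01")    # goes to the data bank
second = cache.get("track-01")   # served from local storage
```

A persistent implementation would write the table to disk instead of a dictionary, but the control flow (check locally, fall back to the data bank, then save) is the same.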

Instead of providing the data bank device 200 outside the user terminal 100, the user terminal 100 itself may include the correction data DB 21 and the correction data reading unit 22. In that case, the user terminal 100 must also include the reference signal calculation unit 103a of FIG. 5 in order to record the correction data α(n) in the correction data DB 21 the first time, but the reference signal calculation unit 103a has to calculate α(n) only when audio data is played for the first time. That is, by saving in the correction data DB 21 the correction data α(n) first calculated for the audio sound x(n) played by the audio source playback unit 2, the correction data α(n) can be read from the correction data DB 21 and used when the same audio sound x(n) is played a second or later time.

The embodiment above takes as an example the case where the adaptive algorithm of the update filter coefficient calculation unit 5 is the N-LMS algorithm, but the present invention is not limited to this. For example, a projection algorithm or the RLS (Recursive Least Squares) algorithm may be used. With the projection algorithm, the audio sound x(n) input as the reference signal to be controlled is corrected according to (Equation 2).

In (Equation 2), β(n) is the correction data, which is likewise obtained from the audio sound x(n) alone. The correction data β(n) is therefore calculated in advance from x(n) and stored in the correction data DB 21, and when the audio sound is actually played, the adaptive filter is updated using the correction data β(n) in the database.

Similarly, with the RLS algorithm, the audio sound x(n) input as the reference signal to be controlled is corrected according to (Equation 3). In (Equation 3), γ(n) is the correction data, which is likewise obtained from the audio sound x(n) alone. The correction data γ(n) is therefore calculated in advance from x(n) and stored in the correction data DB 21, and when the audio sound is actually played, the adaptive filter is updated using the correction data γ(n) in the database.
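(Equation 2) and (Equation 3) did not survive in this copy of the text. The textbook forms of the two algorithms are given here as a reading aid only; they are assumptions, and the patent's own notation may differ. In both cases the quantity playing the role of the correction data depends only on the reference signal (plus fixed design constants), which is consistent with the statement that β(n) and γ(n) can be calculated in advance.

For an affine projection algorithm of order $p$, with $\mathbf{X}(n) = [\mathbf{x}(n),\, \mathbf{x}(n-1),\, \dots,\, \mathbf{x}(n-p+1)]$ an $L \times p$ matrix and $\mathbf{d}(n)$ the vector of the last $p$ microphone samples:

$$
\begin{aligned}
\mathbf{e}(n) &= \mathbf{d}(n) - \mathbf{X}(n)^{T}\mathbf{w}(n)\\
\boldsymbol{\beta}(n) &= \mathbf{X}(n)\left[\mathbf{X}(n)^{T}\mathbf{X}(n)\right]^{-1}\\
\mathbf{w}(n+1) &= \mathbf{w}(n) + \mu\,\boldsymbol{\beta}(n)\,\mathbf{e}(n)
\end{aligned}
$$

For RLS with forgetting factor $\lambda$ and initial matrix $\mathbf{P}(0) = \delta^{-1}\mathbf{I}$ (both assumed design constants):

$$
\begin{aligned}
\boldsymbol{\gamma}(n) &= \frac{\mathbf{P}(n-1)\,\mathbf{x}(n)}{\lambda + \mathbf{x}(n)^{T}\mathbf{P}(n-1)\,\mathbf{x}(n)}\\
\mathbf{P}(n) &= \lambda^{-1}\left[\mathbf{P}(n-1) - \boldsymbol{\gamma}(n)\,\mathbf{x}(n)^{T}\mathbf{P}(n-1)\right]\\
e(n) &= d(n) - \mathbf{w}(n-1)^{T}\mathbf{x}(n)\\
\mathbf{w}(n) &= \mathbf{w}(n-1) + \boldsymbol{\gamma}(n)\,e(n)
\end{aligned}
$$

In the RLS case the gain vector $\boldsymbol{\gamma}(n)$ and the matrix $\mathbf{P}(n)$ evolve from $\mathbf{x}(n)$, $\lambda$, and $\delta$ alone, so the whole gain sequence for a track could in principle be tabulated per time code, leaving only the last two lines for real-time execution.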

The embodiment above describes an example in which audio track information is used as the identification information for acquiring the correction data α(n) from the correction data DB 21. In addition, the vehicle type information of the vehicle in which the ASC system is installed (corresponding to the tap length identification information of the present invention) may be used. The tap length L of the adaptive filter required to keep the performance of the ASC system good varies with the size of the vehicle in which the adaptive filter is installed: the larger the vehicle, the longer the tap length L should preferably be.

When the ASC system uses vehicle type information in addition to audio track information, it is configured as shown in FIG. 4. In FIG. 4, components with the same functions as those in FIG. 1 carry the same reference numerals. The ASC system of FIG. 4 further includes a vehicle type information acquisition unit 12 (corresponding to the identification information acquisition unit of the present invention) that acquires the vehicle type information of the vehicle in which the audio sound x(n) to be controlled is played. The vehicle type information only needs to indicate at least the size of the vehicle; a size rank is sufficient, and specific dimensions are not required. For example, vehicle type information stored in the internal memory of a navigation device (not shown) can be used.

The vehicle type information acquisition unit 12 acquires the vehicle type information from, for example, a navigation device (not shown) and outputs it to the request transmission unit 9. The request transmission unit 9 transmits the audio track information output from the audio track information extraction unit 8 and the vehicle type information output from the vehicle type information acquisition unit 12 to the data bank device 200 via a communication network, and requests the correction data α(n) identified by that audio track information and vehicle type information.

The correction data DB 21 stores correction data α(n), calculated for different tap lengths L based on the audio sound to be controlled, together with the identification information of the audio sound and the vehicle type information. In the embodiment of FIG. 1 described above, one set of correction data α(n) is obtained from one audio sound x(n) and stored in the correction data DB 21. In the embodiment of FIG. 4, by contrast, multiple sets of correction data α(n) are obtained from one audio sound x(n), one for each tap length L. So that it can later be determined which audio sound x(n) each set of correction data α(n) was obtained from and which vehicle type (which tap length L) it is for, each set is stored in the correction data DB 21 in association with the identification information of the audio sound x(n) and the vehicle type information.

Based on the audio track information and vehicle type information sent from the request transmission unit 9 of the user terminal 100, the correction data reading unit 22 reads the correction data α(n) corresponding to that audio track information and vehicle type information from the correction data storage unit 21, and transmits the read correction data α(n) to the user terminal 100. The correction data acquisition unit 10 acquires the correction data α(n) sent from the correction data reading unit 22 and outputs it to the correction data output unit 11.
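The storage and lookup described for the FIG. 4 configuration can be pictured as a table keyed by two identifiers. The following is a minimal Python sketch under stated assumptions, not the implementation of the invention: the class name `CorrectionDataDB`, the string form of both identifiers, the representation of α(n) as a list of floats, and the sample values are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Key: (audio track information, vehicle type information). Both are
# hypothetical string identifiers; the source does not fix their format.
Key = Tuple[str, str]


class CorrectionDataDB:
    """Toy stand-in for the correction data DB 21 (illustrative only)."""

    def __init__(self) -> None:
        self._table: Dict[Key, List[float]] = {}

    def store(self, track_info: str, vehicle_type: str, alpha: List[float]) -> None:
        # One audio sound may have several entries, one per vehicle type
        # (that is, one per tap length L).
        self._table[(track_info, vehicle_type)] = list(alpha)

    def read(self, track_info: str, vehicle_type: str) -> Optional[List[float]]:
        # Role of the correction data reading unit 22: return the entry that
        # matches both identifiers, or None if nothing is stored for them.
        return self._table.get((track_info, vehicle_type))


db = CorrectionDataDB()
# Placeholder values, not real correction data.
db.store("track-001", "compact", [0.5, 0.4])
db.store("track-001", "large", [0.3, 0.2])
```

In this sketch the same track yields different correction data depending on the vehicle type sent with the request, which mirrors the pairing of audio track information and vehicle type information in the request from the user terminal 100.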

Although vehicle type information is used here, the invention is not limited to this. For example, table information indicating the correspondence between vehicle types and tap lengths may be prepared, and the vehicle type information may be converted into tap length information by referring to this table.
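The table-based conversion mentioned above could look like the following Python sketch. The size ranks and tap length values are hypothetical; the description gives no concrete numbers, only that a larger vehicle calls for a longer tap length L. The function name `tap_length_for` is likewise an assumption.

```python
# Hypothetical size ranks and tap lengths, for illustration only.
TAP_LENGTH_BY_VEHICLE_TYPE = {
    "compact": 256,
    "midsize": 512,
    "large": 1024,
}


def tap_length_for(vehicle_type: str) -> int:
    """Convert vehicle type information into tap length information."""
    try:
        return TAP_LENGTH_BY_VEHICLE_TYPE[vehicle_type]
    except KeyError:
        # An unknown vehicle type is reported rather than silently defaulted.
        raise ValueError(f"unknown vehicle type: {vehicle_type!r}") from None
```

With such a table, the tap length can be used as the lookup identifier in place of the vehicle type.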

The embodiments described above are merely examples of how the present invention may be embodied, and the technical scope of the present invention must not be construed as limited by them. The present invention can be implemented in various forms without departing from its spirit or its main features.

The present invention is useful, for example, in an ASC system that suppresses only the audio sound in mixed sound input from a microphone.

A diagram showing a configuration example of an ASC system implementing the adaptive filter and sound correction system of the present invention.
A diagram showing a schematic structural image of the audio data of a sound source.
A diagram showing a schematic structural image of the correction data stored in the correction data DB of the present embodiment.
A diagram showing another configuration example of an ASC system implementing the adaptive filter and sound correction system of the present invention.
A diagram showing a configuration example of a conventional ASC system.

Explanation of symbols

5 Update filter coefficient calculation unit
6 Sound correction filter
7 Subtractor
8 Audio track information extraction unit
9 Request transmission unit
10 Correction data acquisition unit
11 Correction data output unit
12 Vehicle type information acquisition unit
21 Correction data DB
22 Correction data reading unit
100 User terminal
200 Data bank device

Claims

A sound correction system comprising:
a correction data storage unit that stores correction data needed to obtain the filter coefficients of an adaptive filter, the correction data being calculated based on an audio sound to be controlled, together with identification information of the audio sound;
an identification information extraction unit that extracts the identification information from the audio sound to be controlled;
a correction data acquisition unit that acquires, from the correction data storage unit, the correction data corresponding to the identification information extracted by the identification information extraction unit;
a correction data output unit that obtains a time code representing the elapsed time of the audio sound to be controlled and outputs the correction data acquired by the correction data acquisition unit in synchronization with the elapsed time represented by the time code;
a filter coefficient calculation unit that obtains the filter coefficients of the adaptive filter using the correction data output from the correction data output unit; and
a filter processing unit that performs filtering on the audio sound to be controlled using the filter coefficients obtained by the filter coefficient calculation unit.

The sound correction system according to claim 1, wherein
the correction data storage unit stores correction data calculated for different tap lengths based on the audio sound to be controlled, together with the identification information of the audio sound and identification information of the tap length,
the system further comprises an identification information acquisition unit that acquires the identification information of the tap length corresponding to the vehicle type of the vehicle in which the audio sound to be controlled is played back, and
the correction data acquisition unit acquires, from the correction data storage unit, the correction data corresponding to the identification information of the audio sound extracted by the identification information extraction unit and the identification information of the tap length acquired by the identification information acquisition unit.

An adaptive filter comprising:
a correction data acquisition unit that acquires, from a correction data storage unit in which correction data calculated based on an audio sound to be controlled is stored together with identification information of the audio sound, the correction data corresponding to the identification information extracted from the audio sound to be controlled;
a correction data output unit that obtains a time code representing the elapsed time of the audio sound to be controlled and outputs the correction data acquired by the correction data acquisition unit in synchronization with the elapsed time represented by the time code;
a filter coefficient calculation unit that obtains filter coefficients using the correction data output from the correction data output unit; and
a filter processing unit that performs filtering on the audio sound to be controlled using the filter coefficients obtained by the filter coefficient calculation unit.