JP3754819B2 - Voice communication method and voice communication apparatus

Info

Publication number: JP3754819B2
Application number: JP07518098A
Authority: JP
Inventors: 誠司佐々木
Original assignee: Hitachi Kokusai Electric Inc; Kokusai Denki Electric Inc
Current assignee: Kokusai Denki Electric Inc
Priority date: 1998-03-24
Filing date: 1998-03-24
Publication date: 2006-03-15
Anticipated expiration: 2018-03-24
Also published as: JPH11272298A

Abstract

PROBLEM TO BE SOLVED: To reduce the quality degradation of reproduced voice by applying, on the transmission side, the same interpolation process as on the reception side to each frame in which the synchronization signal is transmitted, so that the degradation in the reproduced voice of a frame whose voice coding information was discarded does not affect succeeding frames. SOLUTION: The transmission section takes in the voice input in frame ft0, does not extract voice information in frame ft1 but interpolates it instead, and transmits the frame synchronization signal in frame ft2 to secure frame synchronization with the reception side. The interpolation of voice information in frames ft1 and ft51 is the same as the interpolation performed by the voice decoder on the reception side. For example, if the transmission side interpolates by substituting the voice coding information of the previous frame, the reception side likewise interpolates by substituting the voice coding information of the previous frame. Any interpolation method can be used as long as the same interpolation is applied on the transmission side and the reception side.

Description

【０００１】
【発明の属する技術分野】
本発明は、適応符号帳（又は長期予測とも呼ばれる）を使用する音声符号化・復号化技術を用いた音声通信方法及び音声通信装置に係り、特に再生音声の品質を向上できる音声通信方法及び音声通信装置に関する。
【０００２】
【従来の技術】
まず、従来の音声通信装置の概略構成について図５を使って説明する。図５は、従来の音声通信装置の概略構成を示すブロック図である。
従来の音声通信装置は、図５に示すように、送信部分として、送信する音声を入力してサンプリング、量子化しフレーム単位で入力音声を出力する音声入力部１と、入力音声を符号化して音声符号化情報を出力する音声符号化器２と、音声符号化情報を送信する送信部３とから構成されている。
また、受信部分としては、伝送されたフレーム単位の音声符号化情報を受信する受信部４と、受信音声符号化情報を復号化して音声を再生する音声復号化器５と、再生された音声を出力する音声出力部６から構成されている。
【０００３】
ここで、音声符号化器２は、適応符号帳（又は長期予測とも呼ばれる）を使用する音声符号化技術で符号化を行うもので、例えば、移動体通信等で最も広く用いられている音声の符号化方式である符号励振型線形予測（Code Excited Linear Prediction：ＣＥＬＰ）音声符号化方式がよく知られている。
【０００４】
符号励振型線形予測（ＣＥＬＰ）音声符号化方式は、フレーム単位で符号化を行い、１つ前のフレームの符号化情報に基づいて現在のフレームの音声を予測し、予測結果を最適化してその情報を現フレームの音声符号化情報とし、更に当該最適化された予測結果を次のフレームの符号化の際に使用するようになっている。
【０００５】
従って、音声復号化器５で行う符号励振型線形予測（ＣＥＬＰ）の音声復号化方式は、フレーム単位で復号化を行い、１つ前のフレームで復号化した結果を利用して、受信した音声符号化情報に従って復号化を行い、更に当該復号結果を次のフレームの復号化の際に使用するようになっている。
【０００６】
送信部３は、音声符号化器２で符号化されたフレーム単位の音声符号化情報を送信するものであるが、送信側の音声符号化器２と受信側の音声復号器との間でフレーム同期を保持、および補正するために、音声符号化情報を送信する際に所定のフレーム周期毎にフレーム同期信号を音声符号化情報と入れ替えて送信するようになっている。
【０００７】
そして、受信部４では、フレーム単位で伝送される音声符号化情報又はフレーム同期信号を受信し、音声符号化情報の場合は記憶エリアに記憶してから当該音声符号化情報を音声復号化器５に出力し、フレーム同期信号の場合は、記憶エリアに記憶されている例えば１つ前のフレームの音声符号化情報を音声復号化器５に出力するようになっている。
【０００８】
次に、従来の音声通信装置の動作について、図６を使って具体的に説明する。図６は、従来の音声通信装置における音声符号化・復号化処理とフレーム同期信号送受信タイミングを示す説明図である。尚、図６では、音声符号化処理に要する処理時間は１フレームであり、再生音声の復号処理に要する処理時間は１フレームであるとして示している。
【０００９】
図６においてｆｔｎ（ｎ＝０、１、２、…）は送信側（音声符号化器側）でのフレーム番号を示すインデックスであり、ｆｒｎ（ｎ＝０、１、２、…）は受信側（音声復号器側）でのフレーム番号を示すインデックスである。
【００１０】
従来の音声通信装置では、送信側の動作として、音声入力部１において図６（ａ）に示すように、音声が入力されサンプリング，量子化され、１フレーム分の長さを有する入力バッファに蓄積される。
【００１１】
そして、音声入力部１で蓄積された音声は、音声符号化器２で図６（ｂ）に示すようにフレーム単位で音声符号化情報が抽出される（図６では音声情報抽出と表示している）。
ここで、音声符号化情報の抽出は、入力バッファに１フレーム分の音声信号の蓄積が終了した後に開始される。例えば、フレームｆｔ０の区間で蓄積された音声はフレームｆｔ１の区間で抽出処理（符号化処理）が為される。
【００１２】
そして、音声符号化器２で抽出された音声符号化情報が、送信部３で図６（ｃ）に示すように送信される（図６では音声情報送信と表示している）。
ここで、音声符号化情報送信は抽出処理が完了した後に開始される。例えば、フレームｆｔ１の区間で抽出された音声情報はフレームｆｔ２の先頭から送信が開始される。
【００１３】
従来の音声通信装置において通常は、この音声入力、音声情報抽出、音声情報送信が繰り返されるが、受信側とのフレームの同期を確保するために、送信部３における送信処理において所定のフレーム周期毎に音声符号化情報の代わりにフレーム同期信号を送信する。
【００１４】
図６の例では、５０フレーム毎（フレームｆｔ２、ｆｔ５２、…）にフレーム同期信号を送信しており、この時送信すべき音声符号化情報は送信されない。例えば、フレームｆｔ２で送信すべきフレームｆｔ１で抽出された音声符号化情報は送信されないことになる。
【００１５】
一方、従来の音声通信装置の受信側の動作は、受信部４で図６（ｄ）に示すように音声符号化情報が受信されて受信バッファに蓄積される。
但し、図６の例では１フレーム分の音声符号化情報を受信するのに１フレーム分の時間を要するものとし、また、５０フレーム毎（フレームｆｒ２、ｆｒ５２、…）にフレーム同期信号を受信している。
【００１６】
そして、受信部４で蓄積された音声符号化情報によって、次のフレーム区間では復号化のための符号化情報が更新されて図６（ｅ）に示すように音声復号化器５で音声復号処理により音声が再生され（図６では音声情報更新と表示している）、音声出力部６によって図６（ｆ）に示すように再生音声が出力される（図６では再生音声出力と表示している）。
【００１７】
例えば、フレームｆｒ０で受信した音声符号化情報に対し、次フレームの区間ｆｒ１で復号処理を行い、次のフレーム区間ｆｒ２で再生音声を出力する。
【００１８】
但し、受信部４が５０フレーム毎（例えばフレームｆｒ２、ｆｒ５２、… ）にフレーム同期信号を受信した場合は、これらのフレームに対しフレームｆｒ３、ｆｒ５３、… で音声復号処理する際、該当するフレームの音声符号化情報が存在しないため、他のフレームで受信した音声符号化情報により補間処理を行う。
補間処理の一例としては、前フレームで受信した音声符号化情報で置換する方法などがある。
【００１９】
そして、補間された音声符号化情報を用いて音声復号化器５で復号化が行われ、再生音声が出力される。
【００２０】
【発明が解決しようとする課題】
しかしながら、上記従来の音声通信方法及び音声通信装置は、フレーム同期信号を送信するフレーム区間は送信すべき音声符号化情報を廃棄してしまい、受信側で当該区間の音声符号化情報を補間により生成して復号化を行うので、当該フレーム区間の実際の音声符号化情報を用いた復号化に比べて再生音声の品質が劣下するという問題点があった。
【００２１】
更に、従来の音声通信方法及び音声通信装置では、前フレームの音声符号化情報を反映させながら次フレームの符号化を行い、復号化においても前フレームの音声復号結果を反映させながら次フレームの復号化を行う符号化・復号化方法を用いているので、上記再生音声出力の品質劣下がそれに続くフレームの復号化にも影響し、再生音声の品質劣下が数フレームに及び連続的に発生するという問題点があった。
【００２２】
本発明は上記実情に鑑みて為されたもので、フレーム同期信号送信に伴う受信側での音声符号化情報の補間と同様の処理を送信側で施すことによって、フレーム同期信号送信によって音声符号化情報が廃棄されたフレームの再生音声の品質劣下をそれに続くフレームに影響しないようにして、再生音声の品質劣化を軽減できる音声通信方法及び音声通信装置を提供することを目的とする。
【００２３】
【課題を解決するための手段】
上記従来例の問題点を解決するための請求項１記載の発明は、適応符号帳を使用する音声符号化・復号化処理を用いた音声通信方法であって、送信側で入力音声信号を前記音声符号化処理することで音声符号化情報を抽出して送信し、受信側で受信した音声符号化情報を音声復号化処理することで音声信号を再生する音声通信方法において、送信側から周期的に音声符号化情報に替えて同期信号が送信される場合に、前記送信側にて同期信号が送信されるフレームの音声符号化情報について、当該同期信号を受信したフレームの音声符号化情報に対して前記受信側で為される音声符号化情報の補間処理と同じ補間処理を行い、前記補間処理によって得られた音声符号化情報に従って前記適応符号帳を更新することを特徴としており、適応符号帳を用いた音声符号化・復号化に際して、前の音声符号化情報を反映させながら処理が為されるような場合、送信側と受信側で同じ音声符号化情報の補間処理が為されるようになるため、送信側の音声符号化の影響と受信側の音声復号化の影響とが等しくなり、再生音声の品質向上を図ることができる。
【００２４】
上記従来例の問題点を解決するための請求項２記載の発明は、請求項１記載の音声通信方法において、音声符号化情報の補間処理は、１つ前のフレームで得られた音声符号化情報を用いるようにしたものが考えられる。
【００２５】
上記従来例の問題点を解決するための請求項３記載の発明は、音声通信装置において、音声を入力して音声信号を出力する音声入力部と、前記音声信号を適応符号帳を用いて音声符号化処理を行い、音声符号化情報を抽出する音声符号化器と、前記音声符号化情報を送信すると共に周期的に音声符号化情報に替えて同期信号を送信する送信部とを有する送信側と、送信された音声符号化情報を受信すると共に、前記同期信号を受信すると音声符号化情報の補間処理として前フレームで得られた音声符号化情報を出力する受信部と、音声符号化情報を適応符号帳を用いて復号化して音声信号を出力する音声復号化器と、前記音声信号を音声として出力する音声出力部とを有する受信側とを備え、前記音声符号化器が、前記送信部にて同期信号が送信されるフレームの音声符号化情報について、当該同期信号を受信したフレームの音声符号化情報に対して前記受信部で為される補間処理と同じ補間処理を行い、前記補間処理によって得られた音声符号化情報に従って前記適応符号帳を更新することを特徴としており、適応符号帳を用いた音声符号化・復号化に際して、前の音声符号化情報を反映させながら処理が為されるような場合、送信側と受信側で同じ音声符号化情報の補間処理が為されるようになるため、送信側の音声符号化の影響と受信側の音声復号化の影響とが等しくなり、再生音声の品質向上を図ることができる。
【００２６】
上記従来例の問題点を解決するための請求項４記載の発明は、音声通信装置において、入力音声信号についてフレーム単位で音声生成系における声道特性を表現するスペクトル包絡情報を抽出し、同期信号が送信されるフレームでは前フレームのスペクトル包絡情報を当該フレームのスペクトル包絡情報とするスペクトル包絡パラメータ抽出器と、入力音声信号についてフレーム単位でフレーム電力計算を行ってフレーム電力情報を出力し、同期信号が送信されるフレームでは前フレームのフレーム電力情報を当該フレームのフレーム電力情報とするフレーム電力計算器と、入力音声信号に対して前記スペクトル包絡情報を用いて聴覚重み付け処理を行い、聴覚重み付けされた入力音声信号を出力する聴覚重み付けフィルタと、音源信号における周期成分を表現するための符号帳であって入力される制御信号に従って選択された最適な適応符号の候補ベクトルを出力すると共に音源信号の入力を受けて適応符号の候補ベクトルの内容を更新する適応符号帳と、音源信号における雑音成分を表現するための符号帳であって入力される制御信号に従って選択された最適な雑音符号の候補ベクトルを出力する雑音符号帳と、利得を調整するための符号帳であって入力される制御信号に従って選択された適応符号帳用の利得候補ベクトルと雑音符号帳用の利得候補ベクトルとを出力する利得符号帳と、最適な適応符号帳ベクトルに利得候補ベクトルを乗算し、利得調整された最適な適応符号帳ベクトルを出力する第１の乗算器と、最適な雑音符号帳ベクトルに利得候補ベクトルを乗算し、利得調整された最適な雑音符号帳ベクトルを出力する第２の乗算器と、利得調整された最適な適応符号帳ベクトルと利得調整された最適な雑音符号帳ベクトルとを加算し、音源信号を出力する加算器と、前記音源信号に対して前記スペクトル包絡情報を付加すると共に聴覚重み付けを行い、再生音声信号を生成して出力する聴覚重み付け合成フィルタと、前記適応符号帳、前記雑音符号帳、前記利得符号帳における最適の各符号帳ベクトルを探索して各符号帳最適インデックスを出力する符号帳探索処理を行い、同期信号が送信されないフレームでは、前記探索処理で選択された最適の各符号帳ベクトルが出力されるよう、前記適応符号帳、前記雑音符号帳、前記利得符号帳に制御信号を出力して、前記適応符号帳を更新させ、同期信号が送信されるフレームでは、当該フレームの音声符号化情報について当該同期信号を受信したフレームの音声符号化情報に対して受信側で為される音声符号化情報の補間処理と同じ補間処理を行い、前記補間処理にて得られた音声符号化情報に従って最適の各符号帳ベクトルが出力されるよう、前記適応符号帳、前記雑音符号帳、前記利得符号帳に制御信号を出力して、前記適応符号帳を更新させる適応符号帳更新処理を行う最適候補ベクトル選択器とを有する音声符号化器を具備する送信側の装置を備えたことを特徴としており、適応符号帳を用いた音声符号化・復号化に際して、前の音声符号化情報を反映させながら処理が為されるような場合、送信側と受信側で同じ音声符号化情報の補間処理が為されるようになるため、送信側の音声符号化の影響と受信側の音声復号化の影響とが等しくなり、再生音声の品質向上を図ることができる。
【００２７】
上記従来例の問題点を解決するための請求項５記載の発明は、請求項３又は請求項４記載の音声通信装置において、音声符号化情報の補間処理は、１つ前のフレームで得られた音声符号化情報を用いるようにしたものが考えられる。
【００２８】
【発明の実施の形態】
本発明の実施の形態について図面を参照しながら説明する。
本発明の実施の形態に係る音声通信方法及び音声通信装置は、周期的に音声符号化情報の替わりに同期信号が送信されて、受信側で音声符号化情報の補間処理が為される場合に、送信側で同期信号が送信されるフレームに対して受信側で為される音声符号化情報の補間処理と同様の処理を行う音声通信方法及び音声通信装置としているので、適応符号帳を用いた音声符号化・復号化に際して、前の音声符号化情報を反映させながら処理が為されるような場合、送信側と受信側で同様の音声符号化情報の補間処理が為されるようになるため、送信側の音声符号化の影響と受信側の音声復号化の影響とが等しくなり、再生音声の品質向上を図ることができるものである。
【００２９】
本発明の実施の形態に係る音声通信装置（本装置）は、図５に示す構成と基本的に同様になっており、但し、音声符号化器２における処理動作が従来のものと相違している。この音声符号化器２の構成及び動作については後述する。
【００３０】
まず、本発明の実施の形態に係る音声通信方法を図１を用いて説明する。図１は、本発明の実施の形態に係る音声通信装置における音声符号化・復号化処理とフレーム同期信号送受信タイミングを示す説明図である。尚、図１において、音声符号化処理に要する処理時間は１フレームであり、再生音声の復号処理に要する処理時間は１フレームであるとして示している。
【００３１】
ここで、図１に示す処理は、図６に示す処理とほぼ同様であり、但し、音声符号化器２でフレーム単位で音声符号化情報が抽出される処理（図１（ｂ）の処理）が相違している。
具体的には、受信側とのフレームの同期を確保するために、フレームｆｔ２で送信部３にて音声符号化情報の代わりにフレーム同期信号を送信する場合には、従来ではフレームｆｔ０で音声入力を行い、フレームｆｔ１で音声情報抽出を行い、フレームｆｔ２で音声情報を送信せずにフレーム同期信号を送信していたが、本発明の実施の形態ではフレームｆｔ０で音声入力を行い、フレームｆｔ１では音声情報抽出を行わず、音声情報の補間を行い、フレームｆｔ２でフレーム同期信号を送信するものである。
【００３２】
ここで、フレームｆｔ１及びフレームｆｔ５１における音声情報の補間は、受信側の音声復号化器５における補間と同様のものとする。例えば、前のフレームの音声符号化情報で置換する補間を送信側で行うのであれば、受信側も同様に前のフレームの音声復号化情報で置換する補間を行うこととなる。要するに、送信側と受信側とで同じ補間が為されれば、どのような補間方法を用いても構わない。
【００３３】
次に、本発明の実施の形態に係る音声通信方法を実現する音声通信装置について、現在、移動体通信等で最も広く用いられている音声の符号化方式である符号励振型線形予測（Code Excited Linear Prediction：ＣＥＬＰ）音声符号化・復号化方法を例にとって説明する。
【００３４】
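In the CELP speech coding/decoding method, the transmitting side extracts and transmits speech coding information frame by frame, and the receiving side decodes based on the speech coding information it receives.
The speech coding information in the CELP speech coding/decoding method consists of the items shown in [Table 1]. [Table 1] is an example for input speech sampled at 8 kHz and quantized to 16 bits, with one frame being 40 ms (320 samples) and one subframe being 10 ms (80 samples).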
【００３５】
【表１】

【００３６】
ここで、スペクトル包絡情報ｂ１は、人間の音声生成系における声道特性を表現する情報であり、１フレーム（４０ｍｓ）毎に抽出される情報である。
また、フレーム電力情報ｃ１は、フレーム（４０ｍｓ）単位の電力を表す情報である。
【００３７】
適応符号帳最適インデックスｍ１は、音源信号における周期成分を表現するための適応符号帳における最適な候補ベクトルの番号を示す情報であり、雑音符号帳最適インデックスｏ１は、音源信号における雑音成分を表現するための雑音符号帳における最適な候補ベクトルの番号を示す情報であり、利得符号帳最適インデックスｐ１は、利得を調整するための利得符号帳における最適な候補ベクトルの番号を示す情報であり、いずれのインデックスもサブフレーム（１０ｍｓ）毎に抽出される情報である。
【００３８】
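As a result, the speech coding information extracted and transmitted for each frame consists of one set of spectral envelope information b1 and frame power information c1, plus four sets of the adaptive codebook optimum index m1, the noise codebook optimum index o1, and the gain codebook optimum index p1.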
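The per-frame payload described above can be illustrated with a minimal Python sketch. The class and field names (`SubframeIndices`, `FrameInfo`) are assumptions introduced for illustration and do not appear in the source; bit allocations are omitted because [Table 1] is not reproduced in this text. The sample counts follow from the stated 8 kHz sampling rate, 40 ms frame, and 10 ms subframe.

```python
from dataclasses import dataclass
from typing import List

SAMPLE_RATE_HZ = 8000
FRAME_MS = 40
SUBFRAME_MS = 10

FRAME_SAMPLES = SAMPLE_RATE_HZ * FRAME_MS // 1000        # samples per frame
SUBFRAME_SAMPLES = SAMPLE_RATE_HZ * SUBFRAME_MS // 1000  # samples per subframe
SUBFRAMES_PER_FRAME = FRAME_MS // SUBFRAME_MS


@dataclass
class SubframeIndices:
    """Indices extracted once per subframe (hypothetical container)."""
    adaptive_index: int  # m1: adaptive codebook optimum index
    noise_index: int     # o1: noise codebook optimum index
    gain_index: int      # p1: gain codebook optimum index


@dataclass
class FrameInfo:
    """Speech coding information for one frame (hypothetical container)."""
    spectral_envelope: List[float]  # b1: one set per frame
    frame_power: float              # c1: one value per frame
    subframes: List[SubframeIndices]

    def __post_init__(self) -> None:
        # One frame must carry exactly one index set per subframe.
        if len(self.subframes) != SUBFRAMES_PER_FRAME:
            raise ValueError("expected one index set per subframe")
```

Keeping the per-frame and per-subframe items in separate structures mirrors the split in the text: b1 and c1 are extracted once per frame, while m1, o1, and p1 are extracted for every subframe.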
【００３９】
次に、本発明の実施の形態に係る音声通信装置における音声符号化器（本音声符号化器）について、図２を用いて説明する。図２は、本発明の実施の形態に係る音声通信装置における音声符号化器の構成ブロック図である。
【００４０】
本音声符号化器は、図２に示すように、スペクトル包絡パラメータ抽出器１１と、フレーム電力計算器１２と、適応符号帳１３と、聴覚重み付け合成フィルタ１４と、最適候補ベクトル選択器１５と、雑音符号帳１６と、利得符号帳１７と、乗算器１８と、乗算器１９と、加算器２０と、聴覚重み付けフィルタ２１とから構成されている。
【００４１】
次に、本音声符号化器の各部について説明する。
スペクトル包絡パラメータ抽出器１１は、音声入力部１において入力されサンプリングされ、更に量子化された入力音声ａ１をフレーム単位で入力して、スペクトル包絡情報ｂ１を抽出し、音声符号化情報の一部として出力するものである。
【００４２】
但し、本発明のスペクトル包絡パラメータ抽出器１１の特徴部分として、後述する最適候補ベクトル選択器１５から出力される抽出／置換の制御信号ｑ１を入力し、制御信号が抽出を指示している場合は、入力音声ａ１のフレームのスペクトル包絡情報ｂ１を抽出し、制御信号が置換を指示している場合は、抽出を行わずに補間用の音声情報で置き換えを行ってスペクトル包絡情報ｂ１を出力するようになっている。
【００４３】
尚、補間用の音声情報とは、例えば１つ前のフレームの音声情報（スペクトル包絡情報ｂ１）である。
また、抽出と置換との切り替えは、最適候補ベクトル選択器１５からの制御信号ｑ１によらず、内部にフレームカウンタ等を設けて、補間処理を行うタイミングをカウントするようにしても構わない。
【００４４】
ここで、スペクトル包絡情報は、人間の音声生成系における声道特性を表現する情報であり、スペクトル包絡情報ｂ１は量子化された後、復号器側に伝送され再生音声信号を生成するのに用いられる。また、後述するように聴覚重み付けフィルタ２１及び聴覚重み付け合成フィルタ１４において聴覚重み付けを行う時に用いられる。
【００４５】
フレーム電力計算器１２は、音声入力部１からの入力音声ａ１をフレーム単位で入力して、フレーム電力計算を行い、フレーム電力情報ｃ１を音声符号化情報の一部として出力するものである。
ここで、フレーム電力情報ｃ１は復号器側に伝送され再生音声信号を生成するのに用いられる。また、後述するように最適候補ベクトル選択器１５で利得符号帳１７を探索する処理においてフレーム電力情報が用いられる。
【００４６】
但し、本発明のフレーム電力計算器１２の特徴部分として、後述する最適候補ベクトル選択器１５から出力される抽出／置換の制御信号ｑ１を入力し、制御信号が抽出を指示している場合は、入力音声ａ１のフレームのフレーム電力情報ｃ１を抽出し、制御信号が置換を指示している場合は、抽出を行わずに補間用の音声情報で置き換えを行ってフレーム電力情報ｃ１を出力するようになっている。
【００４７】
尚、補間用の音声情報とは、例えば１つ前のフレームの音声情報（フレーム電力情報ｃ１）である。
また、抽出と置換との切り替えは、最適候補ベクトル選択器１５からの制御信号ｑ１によらず、内部にフレームカウンタ等を設けて、補間処理を行うタイミングをカウントするようにしても構わない。
【００４８】
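The perceptual weighting filter 21 applies perceptual weighting (a known technique) to the input signal a1 from the voice input unit 1, subframe by subframe, using the spectral envelope information (parameters) b1, and outputs the perceptually weighted input speech n1.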
【００４９】
適応符号帳１３は、音源信号における周期成分を表現するための符号帳であり、例えば、１２８種類のピッチ成分のパターンを予め記憶しており（サイズ１２８、８０次元）、更に１つ前のサブフレームで抽出された最適な適応符号帳ベクトル及び雑音符号帳ベクトル及び利得符号帳ベクトルにより生成された音源信号を記憶する前音源信号エリアが設けられている。
そして、入力される制御信号ｌ１に従って選択された最適な適応符号の候補ベクトルｄ１を出力するようになっている。
【００５０】
雑音符号帳１６は、音源信号における雑音成分を表現するための符号帳であり、例えば、５１２種類の雑音成分のパターンを記憶していて（サイズ５１２、８０次元）、入力される制御信号ｌ１に従って選択された最適な雑音符号の候補ベクトルｆ１を出力するようになっている。
【００５１】
利得符号帳１７は、利得を調整するための符号帳であり、例えば、１２８種類の利得パターンを記憶していて（サイズ１２８、２次元）、入力される制御信号ｌ１に従って選択された適応符号用の利得候補ベクトルｈ１と、雑音符号用の利得候補ベクトルｉ１とを出力するようになっている。
【００５２】
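The multiplier 18 multiplies the optimum adaptive codebook vector d1 by the gain candidate vector h1 and outputs the gain-adjusted optimum adaptive codebook vector e1.
The multiplier 19 multiplies the optimum noise codebook vector f1 by the gain candidate vector i1 and outputs the gain-adjusted optimum noise codebook vector g1.
The adder 20 adds the gain-adjusted optimum adaptive codebook vector e1 and the gain-adjusted optimum noise codebook vector g1 and outputs the excitation signal j1.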
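As a rough sketch of the signal path through the multipliers 18 and 19 and the adder 20, the excitation signal can be formed as j1 = h1·d1 + i1·f1. The sketch assumes that the gains h1 and i1 act as scalars on vectors of equal length; the function name `build_excitation` is hypothetical.

```python
from typing import List, Sequence


def build_excitation(d1: Sequence[float], f1: Sequence[float],
                     h1: float, i1: float) -> List[float]:
    """Combine codebook vectors into an excitation signal (illustrative only)."""
    if len(d1) != len(f1):
        raise ValueError("codebook vectors must have the same length")
    e1 = [h1 * x for x in d1]  # multiplier 18: gain-adjusted adaptive vector
    g1 = [i1 * x for x in f1]  # multiplier 19: gain-adjusted noise vector
    return [a + b for a, b in zip(e1, g1)]  # adder 20: excitation j1
```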
【００５３】
聴覚重み付け合成フィルタ１４は、音源信号ｊ１に対してスペクトル包絡情報ｂ１を付加すると共に聴覚重み付けを行い、再生音声ｋ１を生成して出力するものである。
【００５４】
具体的には、音源信号ｊ１にスペクトル包絡情報ｂ１を付加するための合成フィルタの係数に聴覚重み付けをするための修正を施してからフィルタリングを行うことになる。
【００５５】
最適候補ベクトル選択器１５は、基本的にはサブフレーム単位で適応符号帳１３，雑音符号帳１６，利得符号帳１７における最適な符号帳ベクトルを選択するものであるが、本発明の特徴部分としてスペクトル包絡パラメータ抽出器１１及びフレーム電力計算器１２への抽出／置換の制御信号ｑ１を出力するようになっている。
【００５６】
ここで、抽出／置換の制御信号ｑ１は、スペクトル包絡パラメータ抽出器１１及びフレーム電力計算器１２において音声情報を抽出するか、又は音声情報を抽出せずに補間用の音声情報で置き換えるかを指示する信号である。
つまり、最適候補ベクトル選択器１５は、通常のフレームの際にはスペクトル包絡パラメータ抽出器１１及びフレーム電力計算器１２に対して抽出／置換の制御信号ｑ１で抽出を指示し、同期信号が送信されるフレームの際には、抽出／置換の制御信号ｑ１で置換を指示するようになっている。
【００５７】
尚、最適候補ベクトル選択器１５から制御信号ｑ１は出力せずに、スペクトル包絡パラメータ抽出器１１及びフレーム電力計算器１２において内部にフレームカウンタ等を設けて、補間処理を行うタイミングをカウントするようにしても構わない。
【００５８】
また、最適候補ベクトル選択器１５における最適な符号帳ベクトルの探索は、サブフレーム単位で、適応符号帳１３，雑音符号帳１６，利得符号帳１７における最適な各符号帳ベクトルを探索して各最適な各符号帳ベクトルの番号を符号帳最適インデックスｍ１、ｏ１、ｐ１として出力する符号帳探索処理と、抽出した音声符号化情報又は補間した音声符号化情報を次のフレームの符号帳探索に適応するための適応符号帳更新処理を行い、これをサブフレームの数だけ繰り返す。その結果、例えば１フレームが４０ｍｓ、サブフレームが１０ｍｓの場合は、１フレームについて４セットの各符号帳最適ベクトルを抽出して音声符号化情報の一部として出力するようになっている。
【００５９】
但し、本発明の特徴部分として、フレーム内の最後のサブフレームについては、符号帳探索処理終了後に、所定フレーム毎に復号器側と同様の音声符号化情報の補間を行う音声情報補間処理を行うようになっている。
音声情報補間処理の詳細については、後述する。
【００６０】
符号帳探索処理は具体的に、制御信号ｌ１により適応符号帳１３、雑音符号帳１６、利得符号帳１７から出力される各候補ベクトルを制御し、各候補ベクトルに対する再生音声ｋ１と聴覚重み付けされた入力音声ｎ１との自乗平均誤差を計算して、それが最小となる候補ベクトルを最適ベクトルとして選定する符号帳探索を行い、各符号帳（適応、雑音および利得符号帳）の最適ベクトルの番号を符号帳最適インデックスｍ１、ｏ１、ｐ１とし、音声符号化情報の一部として出力する処理である。
【００６１】
ここで、最適候補ベクトル選択器１５によりサブフレーム毎に実行される符号帳探索の手順について説明する。
最適候補ベクトル選択器１５における符号帳探索の概要は、まず第１段階として適応符号帳１３における最適な適応符号帳ベクトルを探索する適応符号帳探索（長期予測とも呼ばれる）を行い、次に第２段階として雑音符号帳１６における最適な雑音符号帳ベクトルを探索する雑音符号帳探索を行い、最適な適応符号帳ベクトル及び雑音符号帳ベクトルが決定した後に、最後の第３段階として利得符号帳探索を行うようになっている。
なお、各符号帳探索の詳細については、本発明の音声通信装置の動作で説明する。
【００６２】
そして、適応符号帳更新処理は、選択された最適な適応、雑音、利得符号帳ベクトルで生成される音源信号ｊ１、又は補間処理によって前フレームの音声符号化情報で置換された符号帳ベクトルで生成される音源信号ｊ１により、適応符号帳１３の内部メモリを更新することによって、次サブフレームで用いる適応符号帳１３を作成する処理である。
【００６３】
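As a specific method of updating the internal memory of the adaptive codebook 13, for example, the contents currently stored in the adaptive codebook 13 (for example, 160 samples) are shifted toward the past by one subframe length (80 samples). The latter (newest) half is thereby cleared to zero, and the excitation signal obtained in the current subframe (80 samples) is written into that part.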
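The shift-and-write update described above might look like the following in Python. This is a sketch under the stated example sizes (160-sample memory, 80-sample subframe); the function name `update_adaptive_memory` is an assumption, not taken from the source.

```python
from typing import List, Sequence


def update_adaptive_memory(memory: Sequence[float],
                           excitation: Sequence[float]) -> List[float]:
    """Drop the oldest samples and append the newest excitation (illustrative)."""
    n = len(excitation)
    if n > len(memory):
        raise ValueError("excitation is longer than the codebook memory")
    # Shifting toward the past discards the oldest n samples; the freed
    # newest slots are then filled with the current subframe's excitation.
    return list(memory[n:]) + list(excitation)
```

Because the encoder and the decoder both apply this update with the same excitation signal, their memories stay identical as long as both sides derive the excitation from the same speech coding information.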
【００６４】
次に、本発明の特徴部分である音声符号化情報補間処理は、音声符号化情報抽出対象のフレームが所定フレーム毎に送信部３からフレーム同期信号を送信するフレームである場合に、復号化の際の音声符号化情報の補間処理と同様の処理を行うものである。
【００６５】
具体的には、補間処理方法が例えば前フレームの音声符号化情報での置換を施すような場合には、最適候補ベクトル選択器１５内に前フレームの音声符号化情報を記憶し、最後のサブフレームの符号帳探索終了後に、記憶されている前フレームの音声符号化情報の中の各符号帳の最適インデックスｍ１，ｏ１，ｐ１に従って、適応符号帳１３，雑音符号帳１６，利得符号帳１７から前フレームの適応符号帳最適ベクトルｄ１及び雑音符号帳最適ベクトルｆ１及び利得符号帳最適ベクトルｈ１，ｉ１が出力されるように制御信号ｌ１を制御し、その結果得られた音源信号ｊ１で前述した適応符号帳更新処理を行って適応符号帳１３の内部メモリ内容の更新が行われるようになっている。
【００６６】
ここで、音声符号化情報補間処理の制御フローについて、図３を用いて説明する。図３は、本音声符号化器の最適候補ベクトル選択器１５における音声符号化情報補間処理の流れを示すフローチャート図である。尚、図３において、フレームカウンタＣｆは符号化開始時にリセットされているものとし、５０フレーム毎にフレーム同期信号が挿入されるものとする。
【００６７】
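In the speech coding information interpolation process in the optimum candidate vector selector 15 of this speech encoder, the frame counter Cf is incremented (100) and it is determined whether Cf is greater than 50 (102). If Cf is not greater than 50 (No), the selected adaptive, noise, and gain codebook optimum indices m1, o1, and p1 are stored as the speech coding information of the current frame (110), and the interpolation process ends.
【００６８】
If, on the other hand, Cf is greater than 50 in step 102 (Yes), the frame counter Cf is reset (104), the speech coding information of the current frame is replaced with that of the previous frame (106), and the interpolation process ends.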
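The flow of FIG. 3 as described in the text can be sketched in Python as follows. The class name `InterpolationController` and its method are hypothetical. The sketch follows the wording literally: with an increment followed by the test Cf > 50, the replacement branch fires on the 51st call after a reset. How this aligns with the stated 50-frame synchronization period is not spelled out in the text, so the threshold is kept as a parameter.

```python
from typing import Any, Optional


class InterpolationController:
    """Literal sketch of steps 100 to 110 of the flowchart (illustrative)."""

    def __init__(self, threshold: int = 50) -> None:
        self.threshold = threshold
        self.counter = 0                   # Cf, reset when encoding starts
        self.stored: Optional[Any] = None  # last stored speech coding information

    def process(self, searched_info: Any) -> Any:
        self.counter += 1                  # step 100: increment Cf
        if self.counter > self.threshold:  # step 102: Cf greater than threshold?
            self.counter = 0               # step 104: reset Cf
            if self.stored is None:
                raise RuntimeError("no previous frame available for substitution")
            return self.stored             # step 106: substitute previous frame
        self.stored = searched_info        # step 110: store current frame
        return searched_info
```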
【００６９】
次に、本音声符号化器の動作について、図２を使って説明する。
本音声符号化器では、フレーム単位で入力音声ａ１が入力されると、スペクトル包絡パラメータ抽出器１１でスペクトル包絡情報ｂ１が抽出されて音声符号化情報の一部として送信部３に出力されると共に、聴覚重み付け合成フィルタ１４及び聴覚重み付けフィルタ２１に与えられる。
一方、フレーム電力計算器１２において入力音声ａ１からフレーム電力情報ｃ１が抽出されて、送信部３に音声符号化情報の一部として出力されると共に、最適候補ベクトル選択器１５に与えられる。
【００７０】
尚、この時、スペクトル包絡パラメータ抽出器１１及びフレーム電力計算器１２では、最適候補ベクトル選択器１５からの抽出／置換の制御信号ｑ１に従って、同期信号が送信されるフレームの時には、抽出を行わずにそれぞれ補間用の情報で置換されたスペクトル包絡情報ｂ１及びフレーム電力情報ｃ１が出力される。
【００７１】
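From then on, subframe by subframe, the perceptual weighting filter 21 applies perceptual weighting to the input speech a1 using the spectral envelope information b1 from the spectral envelope parameter extractor 11, and the perceptually weighted input speech n1 is output to the optimum candidate vector selector 15.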
【００７２】
また、最適候補ベクトル選択器１５において、符号帳探索処理の第１段階である適応符号帳探索の動作として、制御信号ｌ１によって、まず適応符号帳１３に記憶された候補ベクトルｄ１が順に適応符号帳１３から出力されるようにし、この時雑音符号帳１６及び利得符号帳１７からは候補ベクトルが出力されないように制御信号ｌ１を制御する。
【００７３】
すると、適応符号帳１３から記憶された候補ベクトルｄ１が順に出力され、乗算器１８及び加算器２０をスルーし、周期性を有する音源信号ｊ１として出力され、聴覚重み付け合成フィルタ１４でスペクトル包絡パラメータ抽出器１１からのスペクトル包絡情報ｂ１を付加すると共に聴覚重み付けが行われ、部分的な再生音声（適応符号帳寄与分）ｋ１が生成されて出力される。
【００７４】
そして、最適候補ベクトル選択器１５では、各候補ベクトルｄ１に対して生成された部分的な再生音声（適応符号帳寄与分）ｋ１に対し最適な利得が与えられた後に、聴覚重み付けフィルタ２１から出力される聴覚重み付けされた入力音声ｎ１との自乗平均誤差が各々計算され、それが最小となる候補ベクトルｄ１が最適な適応符号帳ベクトルとして選定され、選定されたベクトルの番号が適応符号帳１３の符号帳最適インデックスｍ１として出力される。
【００７５】
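Here, the optimum gain is obtained as follows. The partial derivative of the mean-square-error expression with respect to the gain that multiplies the reproduced signal k1 is set to zero, which yields the gain that minimizes the mean square error (the optimum gain). With this gain fixed, the reproduced signal vector k1 is replaced in turn and the mean square error is computed for each one, and the optimum adaptive codebook vector is found in this way.
Since the method of calculating the mean square error is a known technique, a detailed description is omitted here.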
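One common least-squares reading of this step is sketched below; it is an assumption, since the source leaves the error calculation to known techniques. For a target x (the weighted input n1) and a synthesized candidate y (the partial reproduced speech k1), setting the derivative of |x − g·y|² with respect to g to zero gives g = ⟨x, y⟩ / ⟨y, y⟩. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def optimal_gain(target: Sequence[float], synth: Sequence[float]) -> float:
    """Gain minimizing the squared error between target and gain * synth."""
    energy = dot(synth, synth)
    return dot(target, synth) / energy if energy > 0.0 else 0.0


def search_codebook(target: Sequence[float],
                    candidates: List[Sequence[float]]) -> Tuple[int, float]:
    """Return (index, gain) of the candidate with the smallest squared error."""
    best_index, best_gain, best_error = -1, 0.0, float("inf")
    for index, synth in enumerate(candidates):
        gain = optimal_gain(target, synth)
        error = sum((t - gain * s) ** 2 for t, s in zip(target, synth))
        if error < best_error:
            best_index, best_gain, best_error = index, gain, error
    return best_index, best_gain
```

An exhaustive loop like this is simple to verify; practical coders usually avoid recomputing the full error for every candidate, but that optimization is outside what the text describes.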
【００７６】
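Next, as the noise codebook search that forms the second stage of the codebook search process, the optimum candidate vector selector 15 controls the control signal l1 so that the candidate vectors f1 stored in the noise codebook 16 are output from the noise codebook 16 in turn, while no candidate vectors are output from the adaptive codebook 13 or the gain codebook 17.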
【００７７】
すると、雑音符号帳１６から記憶された候補ベクトルｆ１が順に出力され、乗算器１９及び加算器２０をスルーし、雑音の音源信号ｊ１として出力され、聴覚重み付け合成フィルタ１４でスペクトル包絡パラメータ抽出器１１からのスペクトル包絡情報ｂ１を付加すると共に聴覚重み付けが行われ、部分的な再生音声（雑音符号帳寄与分）ｋ１が生成されて出力される。
【００７８】
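In the noise codebook search, each candidate vector f1 is orthogonalized with respect to the optimum adaptive codebook vector that has passed through the perceptually weighted synthesis filter, in order to reduce the quantization error of the reproduced speech (a known technique).
However, the same result is obtained if, for convenience, the orthogonalization is applied to the reproduced speech k1 for each candidate vector, so in the present invention the orthogonalization is assumed to be performed in the optimum candidate vector selector 15.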
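The source treats the orthogonalization as a known technique and gives no formula. As an assumption, the sketch below uses a single Gram-Schmidt step: the component of the synthesized noise candidate that lies along the synthesized optimum adaptive vector is removed. The function names are hypothetical.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def orthogonalize(candidate: Sequence[float],
                  reference: Sequence[float]) -> List[float]:
    """Remove from candidate its projection onto reference (Gram-Schmidt step)."""
    energy = dot(reference, reference)
    if energy == 0.0:
        return list(candidate)  # nothing to project onto
    scale = dot(candidate, reference) / energy
    return [c - scale * r for c, r in zip(candidate, reference)]
```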
【００７９】
そこで、最適候補ベクトル選択器１５では、各候補ベクトルｆ１に対して生成された部分的な再生音声（雑音符号帳寄与分）ｋ１に対し直交化処理を施し、更に最適な利得が与えられた後に、聴覚重み付けフィルタ２１から出力される聴覚重み付けされた入力音声ｎ１との自乗平均誤差が各々計算され、それが最小となる候補ベクトルｆ１が最適な雑音符号帳ベクトルとして選定され、選定されたベクトルの番号が雑音符号帳の符号帳最適インデックスｏ１として出力される。
【００８０】
次に、最適候補ベクトル選択器１５において、符号帳探索処理の第３段階である利得符号帳探索の動作として、制御信号ｌ１によって、適応符号帳１３からは上記適応符号帳探索で決定した最適な適応符号帳ベクトルｄ１が、また雑音符号帳１６からは上記雑音符号帳探索で決定した最適な雑音符号帳ベクトルｆ１が出力されるようにして、更に利得符号帳１７から適応符号用の利得候補ベクトルｈ１と、雑音符号用の利得候補ベクトルｉ１とが記憶されている全てについて順番に出力されるようにする。
【００８１】
これにより、適応符号帳１３からは最適な適応符号帳ベクトルｄ１が出力され、乗算器１８で利得符号帳１７から出力される適応符号用の利得候補ベクトルｈ１と乗算されて、利得調整が為された最適な適応符号帳ベクトルｅ１が出力される。
一方、雑音符号帳１６からは最適な雑音符号帳ベクトルｆ１が出力され、乗算器１９で利得符号帳１７から出力される雑音符号用の利得候補ベクトルｉ１と乗算されて、利得調整が為された最適な雑音符号帳ベクトルｇ１とが出力される。
【００８２】
そして、利得調整が為された最適な適応符号帳ベクトルｅ１と利得調整が為された最適な雑音符号帳ベクトルｇ１とが加算器２０で加算されて音源信号ｊ１が生成され、聴覚重み付け合成フィルタ１４で、スペクトル包絡情報ｂ１が付加されるとともに聴覚重み付けが施された再生音声ｋ１が出力されることになる。
【００８３】
そして、最適候補ベクトル選択器１５においてフレーム電力計算器１２から出力されるフレーム電力情報ｃ１を用いて聴覚重み付けフィルタ２１から出力される聴覚重み付けされた入力音声ｎ１に正規化が施され、再生音声ｋ１の前記正規化された入力音声ｎ１に対する聴覚重み付き自乗平均誤差を求め、それが最小となる適応符号用の利得候補ベクトルｈ１と、雑音符号用の利得候補ベクトルｉ１とが最適な利得符号帳ベクトルとして選定され、選定されたベクトルの番号が利得符号帳の最適利得インデックスｐ１として出力されるようになっている。
【００８４】
そして、符号帳探索処理の結果選択された適応符号帳最適ベクトルｄ１及び雑音符号帳最適ベクトルｆ１及び利得符号帳最適ベクトルｈ１，ｉ１が適応符号帳１３，雑音符号帳１６，利得符号帳１７から出力されるように制御信号ｌ１を制御し、適応符号帳更新処理の動作としてその結果得られた音源信号ｊ１で適応符号帳１３の内部メモリ内容が更新され、その更新結果が次サブフレームの適応符号帳１３として用いられるようになっている。
【００８５】
上記サブフレーム単位の動作が繰り返され、最適候補ベクトル選択器１５において、最後のサブフレームについての符号帳探索処理が終了したなら、本発明の特徴部分である音声符号化情報補間処理の動作として、フレーム同期信号が送信されるフレームである場合に、最適候補ベクトル選択器１５内に記憶されている前フレームの音声符号化情報の中の最後のサブフレームの各符号帳最適インデックスｍ１，ｏ１，ｐ１に従って、適応符号帳１３，雑音符号帳１６，利得符号帳１７から前フレームの適応符号帳最適ベクトルｄ１及び雑音符号帳最適ベクトルｆ１及び利得符号帳最適ベクトルｈ１，ｉ１が出力されるように制御信号ｌ１を制御し、適応符号帳更新処理の動作としてその結果得られた音源信号ｊ１で適応符号帳１３の内部メモリ内容が更新され、その補間による更新結果が次サブフレームの適応符号帳１３として用いられるようになっている。
【００８６】
また、フレーム同期信号が送信されるフレームでない場合には、そのまま符号帳探索処理の結果選択された適応符号帳最適ベクトルｄ１及び雑音符号帳最適ベクトルｆ１及び利得符号帳最適ベクトルｈ１，ｉ１が適応符号帳１３，雑音符号帳１６，利得符号帳１７から出力されるように制御信号ｌ１を制御し、適応符号帳更新処理の動作としてその結果得られた音源信号ｊ１で適応符号帳１３の内部メモリ内容が更新され、その更新結果が次サブフレームの適応符号帳１３として用いられるようになっている。
【００８７】
次に、本発明の実施の形態に係る音声通信装置における音声復号化器（本音声復号化器）について、図４を用いて説明する。図４は、本発明の実施の形態に係る音声通信装置における音声復号化器の構成ブロック図である。
【００８８】
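As shown in FIG. 4, this speech decoder comprises an adaptive codebook 31, a noise codebook 32, a gain codebook 33, a multiplier 34, a multiplier 35, an adder 36, a synthesis filter 37, a post filter 38, and the gain controller 39 described below.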
【００８９】
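Next, each part of this speech decoder will be described.
The adaptive codebook 31 has the same contents as the adaptive codebook 13 of the speech encoder (FIG. 2), and additionally has a previous-excitation area that stores the excitation signal generated in the preceding subframe.
It outputs the adaptive codebook optimum vector d2 selected according to the received adaptive codebook optimum index m2.
【００９０】
The noise codebook 32 has the same contents as the noise codebook 16 of the speech encoder (FIG. 2), and outputs the noise codebook optimum vector f2 selected according to the received noise codebook optimum index o2.
【００９１】
The gain codebook 33 has the same contents as the gain codebook 17 of the speech encoder (FIG. 2), and outputs the optimum gain h2 for the adaptive codebook vector and the optimum gain i2 for the noise codebook vector, selected according to the received gain codebook optimum index p2.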
【００９２】
利得制御器３９は、適応符号帳ベクトルの利得ｈ２と雑音符号帳ベクトルの利得ｉ２とを入力して、受信したフレーム電力情報ｃ２を用いて利得調整を行い、利得調整された適応符号帳ベクトルの利得ｈ２′と雑音符号帳ベクトルの利得ｉ２′とを出力するものである。
【００９３】
乗算器３４は、最適な適応符号帳ベクトルｄ２に利得調整された利得ｈ２′を乗算し、利得調整された最適な適応符号帳ベクトルｅ２を出力するものである。
乗算器３５は、最適な雑音符号帳ベクトルｆ２に利得調整された利得ｉ２′を乗算し、利得調整された最適な雑音符号帳ベクトルｇ２を出力するものである。
加算器３６は、利得調整された最適な適応符号帳ベクトルｅ２と利得調整された最適な雑音符号帳ベクトルｇ２とを加算し、音源信号ｊ２を再生するものである。
【００９４】
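The synthesis filter 37 generates the reproduced speech k2 by applying the received spectral envelope information b2 to the excitation signal j2.
The post filter 38 applies formant emphasis to the reproduced speech k2 in order to improve the perceived quality, and outputs the formant-emphasized reproduced speech a2.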
【００９５】
次に、本音声復号化器の動作について図４を用いて説明する。
本音声復号化器では、フレーム単位で受信した表１に示す音声符号化情報に従って再生音声を生成する。以下にその動作を説明する。
まず、サブフレーム（１０ｍｓ、８０サンプル）毎に以下の処理を行い、音源信号ｊ２が再生される。
【００９６】
具体的には、受信した適応符号帳最適インデックスｍ２、雑音符号帳最適インデックスｏ２を基に、適応符号帳３１、雑音符号帳３２からそれぞれ適応符号帳最適ベクトルｄ２、雑音符号帳最適ベクトルｆ２が出力される。
一方、受信した利得符号帳最適インデックスｐ２を基に、利得符号帳３３から適応符号帳ベクトルの利得ｈ２と雑音符号帳ベクトルの利得ｉ２とが出力され、受信したフレーム電力情報ｃ２を用いて利得制御器３９で利得調整が行われ、利得調整された適応符号帳ベクトルの利得ｈ２′と雑音符号帳ベクトルの利得ｉ２′とが出力される。
【００９７】
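The adaptive codebook optimum vector d2 output from the adaptive codebook 31 is multiplied in the multiplier 34 by the gain-adjusted adaptive codebook gain h2′ from the gain controller 39, yielding the gain-adjusted optimum adaptive codebook vector e2. Likewise, the noise codebook optimum vector f2 output from the noise codebook 32 is multiplied in the multiplier 35 by the gain-adjusted noise codebook gain i2′ from the gain controller 39, yielding the gain-adjusted optimum noise codebook vector g2. The adder 36 then adds e2 and g2 to reproduce the excitation signal j2.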
【００９８】
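After reproduction of the excitation signal j2 is complete, the adaptive codebook 31 is updated with that excitation signal j2, and the updated result is used as the adaptive codebook for the next subframe.
Provided there are no transmission errors, the updated contents of the adaptive codebook 31 of this speech decoder should be exactly equal to the updated contents of the adaptive codebook 13 of this speech encoder.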
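Why matching interpolation keeps the two adaptive codebooks equal can be shown with a deliberately simplified toy model. Everything in it is an assumption for illustration: frames are opaque labels, the "state" is just the last two items used for updating, and the function name `count_state_mismatches` is hypothetical. The decoder always substitutes the previous frame when a synchronization signal arrives; the encoder either updates with the frame it actually extracted (conventional behavior) or performs the same substitution (the behavior described in this document).

```python
from typing import List, Set


def count_state_mismatches(frames: List[str], sync_positions: Set[int],
                           encoder_interpolates: bool) -> int:
    """Count frames after which encoder and decoder states differ (toy model)."""
    enc_state: List[str] = []
    dec_state: List[str] = []
    previous = ""  # last speech coding information actually transmitted
    mismatches = 0
    for n, info in enumerate(frames):
        if n in sync_positions:
            received = previous  # decoder substitutes the previous frame
            enc_update = previous if encoder_interpolates else info
        else:
            received = info
            enc_update = info
            previous = info
        enc_state = (enc_state + [enc_update])[-2:]  # keep a short history
        dec_state = (dec_state + [received])[-2:]
        if enc_state != dec_state:
            mismatches += 1
    return mismatches
```

In this toy model the conventional encoder disagrees with the decoder until the mismatched item leaves the history, while the interpolating encoder never disagrees.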
【００９９】
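Then the following processing is executed for each frame (40 ms, 320 samples).
The excitation signal j2 output from the adder 36 has the received spectral envelope information b2 applied by the synthesis filter 37 to generate the reproduced speech k2, and the post filter 38 then applies formant emphasis to improve the perceived quality and outputs the reproduced speech a2.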
【０１００】
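According to the voice communication method of the embodiment of the present invention, when extracting speech coding information for a frame in which the frame synchronization signal is transmitted, the encoding side (transmitting side) applies the same interpolation as the speech coding information interpolation performed on the decoding side (receiving side). The updated contents of the adaptive codebook internal memory in the transmitting-side speech encoder and in the receiving-side speech decoder are therefore always kept equal, so the quality degradation caused by inserting the frame synchronization signal does not spread over multiple frames, and degradation of the reproduced speech signal is reduced.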
【０１０１】
本発明の実施の形態の音声通信装置によれば、音声符号化器の最適候補ベクトル選択器１５において、フレーム内の最後のサブフレームの符号帳探索処理と適応符号帳更新処理との間に音声符号化情報補間処理を挿入して、フレーム同期信号を送信するフレームに対して補間処理を行うので、音声符号化器の最適候補ベクトル選択器１５以外の部分及び音声復号化器側は従来のまま使用できるので、容易に実現できる効果がある。
【０１０２】
また、本発明の音声符号化器は、ＤＳＰ（デジタル・シグナル・プロセッサ）またはＣＰＵで実現されるため、本発明はそれらのソフトウエアを変更することで容易に実現できる効果がある。
【０１０３】
【発明の効果】
請求項１，２記載の発明によれば、周期的に送信側にて同期信号が送信されるフレームの音声符号化情報について、当該同期信号を受信したフレームの音声符号化情報に対して受信側で為される音声符号化情報の補間処理と同じ補間処理を送信側で行い、補間処理によって得られた音声符号化情報に従って適応符号帳を更新する音声通信方法としているので、適応符号帳を用いた音声符号化・復号化に際して、前の音声符号化情報を反映させながら処理が為されるような場合、送信側と受信側で同じ音声符号化情報の補間処理が為されるようになるため、送信側の音声符号化の影響と受信側の音声復号化の影響とが等しくなり、再生音声の品質向上を図ることができる効果がある。
【０１０４】
請求項３記載の発明によれば、送信側の音声符号化器が、送信部にて同期信号が送信されるフレームの音声符号化情報について、当該同期信号を受信したフレームの音声符号化情報に対して受信部で為される音声符号化情報の補間処理と同じ補間処理を行い、前記補間処理によって得られた音声符号化情報に従って前記適応符号帳を更新する音声通信装置としているので、適応符号帳を用いた音声符号化・復号化に際して、前の音声符号化情報を反映させながら処理が為されるような場合、送信側と受信側で同じ音声符号化情報の補間処理が為されるようになるため、送信側の音声符号化の影響と受信側の音声復号化の影響とが等しくなり、再生音声の品質向上を図ることができる効果がある。
【０１０５】
請求項４記載の発明によれば、周期的に同期信号が送信されるフレームでは、当該フレームの音声符号化情報について当該同期信号を受信したフレームの音声符号化情報に対して受信側で為される音声符号化情報の補間処理と同様の処理をスペクトル包絡パラメータ抽出器及びフレーム電力計算器及び最適候補ベクトル選択器で行い、補間処理された音声符号化情報に従って最適の各符号帳ベクトルが出力されるよう、適応符号帳、雑音符号帳、利得符号帳に制御信号を出力し、適応符号帳、雑音符号帳、利得符号帳で制御信号に従って適応符号、雑音符号、利得の候補ベクトルを出力し、その結果得られる音源信号の入力を受けて適応符号帳が適応符号の候補ベクトルの内容を更新する音声符号化器を具備する送信側装置を備えた音声通信装置としているので、適応符号帳を用いた音声符号化・復号化に際して、前の音声符号化情報を反映させながら処理が為されるような場合、送信側と受信側で同じ音声符号化情報の補間処理が為されるようになるため、送信側の音声符号化の影響と受信側の音声復号化の影響とが等しくなり、再生音声の品質向上を図ることができる効果がある。
【０１０６】
請求項５記載の発明によれば、送信側の音声符号化器が、送信部にて同期信号が送信されるフレームについて受信部で為される音声符号化情報の補間処理と同様に１つ前のフレームで得られた音声符号化情報を用いる補間処理を行う請求項３又は請求項４記載の音声通信装置としているので、適応符号帳を用いた音声符号化・復号化に際して、前の音声符号化情報を反映させながら処理が為されるような場合、簡単な処理によって送信側と受信側で同じ音声符号化情報の補間処理が為されるようになるため、送信側の音声符号化の影響と受信側の音声復号化の影響とが等しくなり、再生音声の品質向上を図ることができる効果がある。
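[Brief description of the drawings]
[FIG. 1] An explanatory diagram showing the speech encoding/decoding processing and the frame synchronization signal transmission/reception timing in the voice communication apparatus according to the embodiment of the present invention.
[FIG. 2] A block diagram of the speech encoder in the voice communication apparatus according to the embodiment of the present invention.
[FIG. 3] A flowchart showing the flow of the speech coding information interpolation process in the optimum candidate vector selector 15 of this speech encoder.
[FIG. 4] A block diagram of the speech decoder in the voice communication apparatus according to the embodiment of the present invention.
[FIG. 5] A block diagram showing the schematic configuration of a conventional voice communication apparatus.
[FIG. 6] An explanatory diagram showing the speech encoding/decoding processing and the frame synchronization signal transmission/reception timing in a conventional voice communication apparatus.
[Explanation of reference numerals]
1: voice input unit, 2: speech encoder, 3: transmission unit, 4: receiving unit, 5: speech decoder, 6: voice output unit, 11: spectral envelope parameter extractor, 12: frame power calculator, 13: adaptive codebook, 14: perceptually weighted synthesis filter, 15: optimum candidate vector selector, 16: noise codebook, 17: gain codebook, 18: multiplier, 19: multiplier, 20: adder, 21: perceptual weighting filter, 31: adaptive codebook, 32: noise codebook, 33: gain codebook, 34: multiplier, 35: multiplier, 36: adder, 37: synthesis filter, 38: post filter, 39: gain controller
[0001]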
[Technical field of the invention]
The present invention relates to a voice communication method and a voice communication apparatus that use speech coding/decoding technology based on an adaptive codebook (also called long-term prediction), and more particularly to a voice communication method and a voice communication apparatus capable of improving the quality of reproduced speech.
[0002]
[Prior art]
First, the schematic configuration of a conventional voice communication apparatus will be described with reference to FIG. 5. FIG. 5 is a block diagram showing the schematic configuration of a conventional voice communication apparatus.
As shown in FIG. 5, the transmitting part of the conventional voice communication apparatus comprises a voice input unit 1 that takes in the voice to be transmitted, samples and quantizes it, and outputs the input speech in units of frames; a speech encoder 2 that encodes the input speech and outputs speech coding information; and a transmission unit 3 that transmits the speech coding information.
The receiving part comprises a receiving unit 4 that receives the transmitted frame-by-frame speech coding information, a speech decoder 5 that decodes the received speech coding information to reproduce speech, and a voice output unit 6 that outputs the reproduced speech.
[0003]
Here, the speech encoder 2 performs encoding with a speech coding technique that uses an adaptive codebook (also called long-term prediction). A well-known example is code-excited linear prediction (CELP) speech coding, the speech coding method most widely used in mobile communication and similar fields.
[0004]
CELP speech coding encodes frame by frame. It predicts the speech of the current frame from the coding information of the previous frame, optimizes the prediction result and uses that information as the speech coding information of the current frame, and then uses the optimized prediction result when encoding the next frame.
[0005]
Accordingly, CELP speech decoding in the speech decoder 5 also operates frame by frame. It decodes according to the received speech coding information while using the result decoded in the previous frame, and then uses that decoding result when decoding the next frame.
[0006]
The transmission unit 3 transmits the frame-by-frame speech coding information encoded by the speech encoder 2. To maintain and correct frame synchronization between the speech encoder 2 on the transmitting side and the speech decoder on the receiving side, it transmits a frame synchronization signal in place of the speech coding information once every predetermined frame period.
[0007]
The receiving unit 4 receives the speech coding information or the frame synchronization signal transmitted in units of frames. When it receives speech coding information, it stores the information in a storage area and then outputs it to the speech decoder 5. When it receives a frame synchronization signal, it outputs to the speech decoder 5 speech coding information held in the storage area, for example that of the previous frame.
[0008]
Next, the operation of the conventional voice communication apparatus will be described specifically with reference to FIG. 6. FIG. 6 is an explanatory diagram showing the speech encoding/decoding processing and the frame synchronization signal transmission/reception timing in the conventional voice communication apparatus. FIG. 6 assumes that the speech encoding process takes one frame and that the decoding process for the reproduced speech also takes one frame.
[0009]
In FIG. 6, ftn (n = 0, 1, 2, ...) is an index indicating the frame number on the transmitting side (speech encoder side), and frn (n = 0, 1, 2, ...) is an index indicating the frame number on the receiving side (speech decoder side).
[0010]
In the conventional voice communication apparatus, on the transmitting side, voice is input to the voice input unit 1, sampled and quantized, and accumulated in an input buffer one frame long, as shown in FIG. 6(a).
[0011]
The speech accumulated in the voice input unit 1 is then processed by the speech encoder 2, which extracts speech coding information frame by frame as shown in FIG. 6(b) (labeled "speech information extraction" in FIG. 6).
Extraction of the speech coding information starts after one frame of the speech signal has been accumulated in the input buffer. For example, the speech accumulated during frame ft0 is extracted (encoded) during frame ft1.
[0012]
The speech coding information extracted by the speech encoder 2 is then transmitted by the transmission unit 3 as shown in FIG. 6(c) (labeled "speech information transmission" in FIG. 6).
Transmission of the speech coding information starts after the extraction process is complete. For example, transmission of the speech information extracted during frame ft1 starts at the beginning of frame ft2.
[0013]
In the conventional voice communication apparatus, this sequence of voice input, speech information extraction, and speech information transmission is normally repeated. To secure frame synchronization with the receiving side, however, the transmission unit 3 transmits a frame synchronization signal instead of the speech coding information once every predetermined frame period.
[0014]
In the example of FIG. 6, the frame synchronization signal is transmitted every 50 frames (frames ft2, ft52, ...), and the speech coding information that should have been transmitted at that time is not sent. For example, the speech coding information extracted in frame ft1, which should have been transmitted in frame ft2, is not transmitted.
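The schedule in this example can be written down in a few lines of Python. The function names and the parameters `first` and `period` are assumptions for illustration; the values 2 and 50 come from the FIG. 6 example.

```python
def is_sync_frame(n: int, first: int = 2, period: int = 50) -> bool:
    """True if transmit frame ft<n> carries the frame synchronization signal."""
    return n >= first and (n - first) % period == 0


def discarded_extraction_frame(sync_frame: int) -> int:
    """Frame whose extracted speech coding information is dropped.

    Extraction takes one frame, so the information that would have been
    sent in ft<n> was extracted in ft<n-1>.
    """
    return sync_frame - 1
```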
[0015]
On the receiving side of the conventional voice communication apparatus, the receiving unit 4 receives the speech coding information and accumulates it in a reception buffer, as shown in FIG. 6(d).
In the example of FIG. 6, receiving one frame of speech coding information is assumed to take one frame period, and a frame synchronization signal is received every 50 frames (frames fr2, fr52, ...).
[0016]
In the next frame period, the coding information used for decoding is updated with the speech coding information accumulated by the receiving unit 4, and the speech decoder 5 reproduces speech by speech decoding as shown in FIG. 6(e) (labeled "speech information update" in FIG. 6). The voice output unit 6 then outputs the reproduced speech as shown in FIG. 6(f) (labeled "reproduced speech output" in FIG. 6).
[0017]
For example, the speech coding information received in frame fr0 is decoded in the next frame period fr1, and the reproduced speech is output in the following frame period fr2.
[0018]
However, when the receiving unit 4 receives a frame synchronization signal every 50 frames (for example, in frames fr2, fr52, ...), no speech coding information exists for those frames when they are decoded in frames fr3, fr53, ..., so interpolation is performed using speech coding information received in other frames.
One example of such interpolation is to substitute the speech coding information received in the previous frame.
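Previous-frame substitution on the receiving side can be sketched as below. The class `ReceivingUnit`, its method name, and the use of `None` to represent a frame that carried the synchronization signal are all assumptions made for this illustration.

```python
from typing import Optional


class ReceivingUnit:
    """Toy model of the receiving unit 4 with a one-frame storage area."""

    def __init__(self) -> None:
        self._stored: Optional[dict] = None

    def on_frame(self, payload: Optional[dict]) -> Optional[dict]:
        # A payload of None stands for a frame synchronization signal.
        if payload is not None:
            self._stored = payload  # normal frame: store, then pass on
        # Sync frame: hand the decoder the most recently stored frame.
        return self._stored
```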
[0019]
The speech decoder 5 then decodes using the interpolated speech coding information, and the reproduced speech is output.
[0020]
[Problems to be solved by the invention]
However, in the conventional voice communication method and voice communication apparatus described above, the speech coding information that should be transmitted is discarded in any frame period in which the frame synchronization signal is transmitted, and the receiving side generates the speech coding information for that period by interpolation before decoding. The quality of the reproduced speech is therefore worse than when decoding with the actual speech coding information for that frame period.
[0021]
Furthermore, the conventional voice communication method and voice communication apparatus use a coding/decoding scheme in which the next frame is encoded while reflecting the speech coding information of the previous frame, and the next frame is likewise decoded while reflecting the decoding result of the previous frame. The degradation of the reproduced speech described above therefore also affects the decoding of subsequent frames, so the quality degradation persists over several consecutive frames.
[0022]
The present invention has been made in view of the above circumstances. Its object is to provide a voice communication method and a voice communication apparatus in which the transmission side performs the same processing as the interpolation of speech coding information that the reception side performs in connection with frame synchronization signal transmission, so that the quality degradation of the reproduced speech of a frame whose speech coding information was discarded because of frame synchronization signal transmission does not affect subsequent frames, thereby reducing the quality degradation of the reproduced speech.
[0023]
[Means for Solving the Problems]
The invention according to claim 1, which solves the problems of the conventional example described above, is a voice communication method using speech encoding/decoding processing with an adaptive codebook, in which the transmission side performs speech encoding processing on an input speech signal to extract and transmit speech coding information, and the reception side performs speech decoding processing on the received speech coding information to reproduce the speech signal. When a synchronization signal is transmitted instead of speech coding information, the transmission side performs, on the speech coding information of the frame in which the synchronization signal is transmitted, the same interpolation processing as the interpolation processing that the reception side performs on the speech coding information of the frame in which the synchronization signal is received, and updates the adaptive codebook according to the speech coding information obtained by that interpolation processing. Because speech encoding/decoding with an adaptive codebook reflects the preceding speech coding information, using the same speech coding information on the transmission side and the reception side makes the influence on speech encoding at the transmission side equal to the influence on speech decoding at the reception side, and the quality of the reproduced speech can be improved.
[0024]
The invention according to claim 2, which solves the problems of the conventional example, is the voice communication method according to claim 1 in which the interpolation processing of the speech coding information uses the speech coding information obtained in the previous frame.
[0025]
The invention according to claim 3, which solves the problems of the conventional example, is a voice communication apparatus comprising a transmission side and a reception side. The transmission side has a voice input unit that inputs speech and outputs a speech signal, a speech encoder that performs speech encoding processing on the speech signal using an adaptive codebook and extracts speech coding information, and a transmission unit that transmits the speech coding information and periodically transmits a synchronization signal instead of the speech coding information. The reception side has a receiving unit that receives the transmitted speech coding information and, when it receives the synchronization signal, outputs the speech coding information obtained in the previous frame as interpolation processing of the speech coding information, a speech decoder that decodes the speech coding information using an adaptive codebook and outputs a speech signal, and a voice output unit that outputs the speech signal as speech. The speech encoder performs, on the speech coding information of the frame in which the transmission unit transmits the synchronization signal, the same interpolation processing as the interpolation processing that the receiving unit performs on the speech coding information of the frame in which the synchronization signal is received, and updates the adaptive codebook according to the speech coding information obtained by that interpolation processing. Because speech encoding/decoding with an adaptive codebook reflects the preceding speech coding information, using the same speech coding information on the transmission side and the reception side makes the influence on speech encoding at the transmission side equal to the influence on speech decoding at the reception side, and the quality of the reproduced speech can be improved.
[0026]
The invention according to claim 4, which solves the problems of the conventional example, is a voice communication apparatus provided with a transmission-side apparatus having a speech encoder that comprises the following elements.
A spectral envelope parameter extractor extracts, in units of frames, spectral envelope information expressing the vocal tract characteristics of the speech generation system from the input speech signal and, in a frame in which the synchronization signal is transmitted, uses the spectral envelope information of the previous frame as the spectral envelope information of that frame. A frame power calculator calculates the frame power of each frame of the input speech signal and outputs frame power information and, in a frame in which the synchronization signal is transmitted, uses the frame power information of the previous frame as the frame power information of that frame. A perceptual weighting filter performs perceptual weighting processing on the input speech signal using the spectral envelope information and outputs the perceptually weighted input speech signal.
An adaptive codebook outputs the candidate vector of the optimal adaptive code selected according to an input control signal, and receives the sound source signal to update the contents of its candidate vectors. A noise codebook, which is a codebook for expressing the noise component in the sound source signal, outputs the candidate vector of the optimal noise code selected according to the input control signal. A gain codebook, which is a codebook for adjusting the gain, outputs the gain candidate vector for the adaptive codebook and the gain candidate vector for the noise codebook selected according to the input control signal.
A first multiplier multiplies the optimal adaptive codebook vector by the gain candidate vector and outputs the gain-adjusted optimal adaptive codebook vector, a second multiplier multiplies the optimal noise codebook vector by the gain candidate vector and outputs the gain-adjusted optimal noise codebook vector, and an adder adds the gain-adjusted optimal adaptive codebook vector and the gain-adjusted optimal noise codebook vector and outputs the sound source signal. A perceptual weighting synthesis filter adds the spectral envelope information to the sound source signal, performs perceptual weighting, and generates and outputs a reproduced speech signal.
An optimal candidate vector selector performs a codebook search process that searches for the optimum codebook vectors in the adaptive codebook, the noise codebook, and the gain codebook and outputs the optimum index of each codebook. In a frame in which no synchronization signal is transmitted, it outputs a control signal to the adaptive codebook, the noise codebook, and the gain codebook so that the optimum codebook vectors selected in the search process are output, and the adaptive codebook is updated. In a frame in which the synchronization signal is transmitted, it performs, on the speech coding information of that frame, the same interpolation processing as the interpolation processing that the reception side performs on the speech coding information of the frame in which the synchronization signal is received, and outputs a control signal to the adaptive codebook, the noise codebook, and the gain codebook so that the optimal codebook vectors according to the speech coding information obtained by the interpolation processing are output, and the adaptive codebook is updated.
Because speech encoding/decoding with an adaptive codebook reflects the preceding speech coding information, performing the same interpolation processing of speech coding information on the transmission side and the reception side makes the influence on speech encoding at the transmission side equal to the influence on speech decoding at the reception side, and the quality of the reproduced speech can be improved.
[0027]
The invention according to claim 5, which solves the problems of the conventional example described above, is the voice communication apparatus according to claim 3 or 4 in which the interpolation processing of the speech coding information uses the speech coding information obtained in the previous frame.
[0028]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention will be described with reference to the drawings.
The voice communication method and the voice communication apparatus according to the embodiment of the present invention are used when a synchronization signal is periodically transmitted instead of speech coding information and the speech coding information is interpolated on the receiving side. On the transmission side, the same processing as the interpolation processing of speech coding information performed on the reception side is applied to the frame in which the synchronization signal is transmitted. Consequently, when speech encoding/decoding using an adaptive codebook reflects the preceding speech coding information, the same interpolation processing is performed on the transmission side and the reception side, the influence on speech encoding at the transmission side becomes equal to the influence on speech decoding at the reception side, and the quality of the reproduced speech can be improved.
[0029]
The voice communication apparatus (this apparatus) according to the embodiment of the present invention has basically the same configuration as that shown in FIG. 5, but differs from the conventional apparatus in the processing operation of the speech encoder 2. The configuration and operation of the speech encoder 2 will be described later.
[0030]
First, a voice communication method according to an embodiment of the present invention will be described with reference to FIG. FIG. 1 is an explanatory diagram showing speech encoding / decoding processing and frame synchronization signal transmission / reception timing in the speech communication apparatus according to the embodiment of the present invention. In FIG. 1, the processing time required for the speech encoding process is 1 frame, and the processing time required for the playback speech decoding process is 1 frame.
[0031]
Here, the process shown in FIG. 1 is almost the same as the process shown in FIG. 6; it differs in the processing by which the speech coder 2 extracts speech coding information in units of frames (the processing in FIG. 1(b)).
Specifically, consider the case in which the transmission unit 3 transmits a frame synchronization signal instead of speech coding information in the frame ft2 in order to secure frame synchronization with the receiving side. Conventionally, speech is input in the frame ft0, speech information is extracted in the frame ft1, and in the frame ft2 the frame synchronization signal is transmitted without transmitting that speech information. In the embodiment of the present invention, by contrast, speech is input in the frame ft0, speech information is not extracted but interpolated in the frame ft1, and the frame synchronization signal is transmitted in the frame ft2.
[0032]
Here, the speech information interpolation in the frames ft1 and ft51 is the same as the interpolation in the speech decoder 5 on the receiving side. For example, if the transmission side interpolates by substituting the speech coding information of the previous frame, the reception side also interpolates by substituting the speech coding information of the previous frame. In short, any interpolation method may be used as long as the same interpolation is performed on the transmission side and the reception side.
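The effect of applying the same interpolation on both sides can be illustrated with a toy model. The sketch below only tracks which coding information each side feeds into its adaptive codebook update; the function name `run`, the string payloads, and the flattening of the one-frame pipeline delays into a single frame index are assumptions made for illustration. It also assumes the synchronization frame is never the first frame.

```python
def run(infos, sync_frames, sender_interpolates):
    """Return the coding information each side uses to update its adaptive codebook."""
    enc_used, dec_used = [], []
    for n, extracted in enumerate(infos):
        if n in sync_frames:
            # Receiver got a sync signal: it reuses the previous frame's information.
            dec_info = dec_used[-1]
            # Sender either mirrors that interpolation or keeps its own extraction.
            enc_info = enc_used[-1] if sender_interpolates else extracted
        else:
            # Normal frame: the extracted information is transmitted and used by both.
            enc_info = dec_info = extracted
        enc_used.append(enc_info)
        dec_used.append(dec_info)
    return enc_used, dec_used


infos = ["a", "b", "c", "d"]                      # hypothetical per-frame information
conventional = run(infos, {2}, sender_interpolates=False)
proposed = run(infos, {2}, sender_interpolates=True)
```

In the conventional case the two histories differ at the synchronization frame, so the adaptive codebooks on the two sides diverge; in the proposed case the histories are identical.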
[0033]
Next, the voice communication apparatus that implements the voice communication method according to the embodiment of the present invention will be described, taking as an example the Code Excited Linear Prediction (CELP) speech encoding/decoding method, which is the speech coding method most widely used in mobile communication and similar fields.
[0034]
In the CELP speech coding / decoding method, speech coding information is extracted and transmitted in frame units on the transmission side, and decoding is performed based on the speech coding information received on the reception side.
Here, the speech coding information in the CELP speech coding/decoding method includes the items shown in [Table 1]. [Table 1] shows an example in which the input speech is sampled at 8 kHz and quantized with 16 bits, one frame is 40 ms (320 samples), and one subframe is 10 ms (80 samples).
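The frame and subframe sizes follow directly from the sampling rate. The short calculation below is a sketch of that arithmetic; the constant names are chosen for this example only. At 8 kHz, a 10 ms subframe corresponds to 80 samples, which agrees with the 80-sample subframe length used later in the adaptive codebook update.

```python
SAMPLE_RATE_HZ = 8000   # sampling frequency given in the example
FRAME_MS = 40           # frame length
SUBFRAME_MS = 10        # subframe length

samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000
samples_per_subframe = SAMPLE_RATE_HZ * SUBFRAME_MS // 1000
subframes_per_frame = FRAME_MS // SUBFRAME_MS
```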
[0035]
[Table 1]

[0036]
Here, the spectrum envelope information b1 is information expressing vocal tract characteristics in a human voice generation system, and is information extracted every frame (40 ms).
The frame power information c1 is information representing the power in units of frames (40 ms).
[0037]
The adaptive codebook optimal index m1 is information indicating the number of the optimal candidate vector in the adaptive codebook, which expresses the periodic component in the excitation signal. The noise codebook optimal index o1 is information indicating the number of the optimal candidate vector in the noise codebook, which expresses the noise component in the excitation signal. The gain codebook optimal index p1 is information indicating the number of the optimal candidate vector in the gain codebook, which adjusts the gain. Each of these indexes is extracted every subframe (10 ms).
[0038]
As a result, the speech coding information extracted and transmitted in units of frames consists of one set of the spectrum envelope information b1 and the frame power information c1, and four sets of the adaptive codebook optimum index m1, the noise codebook optimum index o1, and the gain codebook optimum index p1.
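The per-frame composition described above might be represented as a small data structure. This is a sketch only: the class names, the tuple representation of b1, and the validation rule are assumptions made for illustration; the specification does not define a storage format or bit allocation here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SubframeIndices:
    m1: int  # adaptive codebook optimum index
    o1: int  # noise codebook optimum index
    p1: int  # gain codebook optimum index


@dataclass
class FrameCodingInfo:
    b1: Tuple[float, ...]             # spectral envelope information (representation assumed)
    c1: float                         # frame power information
    subframes: List[SubframeIndices]  # one entry per 10 ms subframe

    def __post_init__(self) -> None:
        # One 40 ms frame holds four 10 ms subframes.
        if len(self.subframes) != 4:
            raise ValueError("expected 4 subframes per 40 ms frame")


# Hypothetical values, for illustration only.
info = FrameCodingInfo(
    b1=(0.1, 0.2),
    c1=1.5,
    subframes=[SubframeIndices(m1=3, o1=17, p1=42) for _ in range(4)],
)
```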
[0039]
Next, a speech encoder (present speech encoder) in the speech communication apparatus according to the embodiment of the present invention will be described with reference to FIG. FIG. 2 is a configuration block diagram of a speech coder in the speech communication apparatus according to the embodiment of the present invention.
[0040]
As shown in FIG. 2, the speech encoder includes a spectral envelope parameter extractor 11, a frame power calculator 12, an adaptive codebook 13, an auditory weighting synthesis filter 14, an optimal candidate vector selector 15, a noise codebook 16, a gain codebook 17, a multiplier 18, a multiplier 19, an adder 20, and an auditory weighting filter 21.
[0041]
Next, each part of the speech coder will be described.
The spectrum envelope parameter extractor 11 receives the input speech a1, which has been input, sampled, and quantized in the speech input unit 1, extracts the spectrum envelope information b1, and outputs it as a part of the speech coding information.
[0042]
However, as a characteristic part of the spectral envelope parameter extractor 11 of the present invention, it receives the extraction/replacement control signal q1 output from the optimum candidate vector selector 15, which is described later. When the control signal instructs extraction, it extracts the spectral envelope information b1 of the frame of the input speech a1. When the control signal instructs replacement, it does not perform extraction but substitutes the speech information for interpolation and outputs the result as the spectral envelope information b1.
[0043]
Note that the speech information for interpolation is, for example, the speech information of the previous frame (spectrum envelope information b1).
Further, instead of relying on the control signal q1 from the optimal candidate vector selector 15, switching between extraction and replacement may be performed by providing a frame counter or the like inside the extractor and counting the timing at which the interpolation processing is to be performed.
[0044]
Here, the spectrum envelope information is information representing vocal tract characteristics in a human voice generation system. The spectrum envelope information b1 is quantized, transmitted to the decoder side, and used to generate a reproduced voice signal. As will be described later, it is also used by the auditory weighting filter 21 and the auditory weighting synthesis filter 14 when auditory weighting is performed.
[0045]
The frame power calculator 12 inputs the input speech a1 from the speech input unit 1 in units of frames, performs frame power calculation, and outputs the frame power information c1 as part of speech coding information.
Here, the frame power information c1 is transmitted to the decoder side and used to generate a reproduced audio signal. As will be described later, the frame power information is used in the process of searching the gain codebook 17 by the optimal candidate vector selector 15.
[0046]
However, as a characteristic part of the frame power calculator 12 of the present invention, it receives the extraction/replacement control signal q1 output from the optimal candidate vector selector 15, which is described later. When the control signal instructs extraction, it extracts the frame power information c1 of the frame of the input voice a1. When the control signal instructs replacement, it does not perform extraction but substitutes the voice information for interpolation and outputs the result as the frame power information c1.
[0047]
Note that the audio information for interpolation is, for example, audio information of the previous frame (frame power information c1).
Further, instead of relying on the control signal q1 from the optimal candidate vector selector 15, switching between extraction and replacement may be performed by providing a frame counter or the like inside the calculator and counting the timing at which the interpolation processing is to be performed.
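The extract-or-replace behaviour of the frame power calculator 12 can be sketched as below. The class name, the boolean `replace` flag standing in for the control signal q1, and in particular the power formula (mean of squared samples) are assumptions; the specification does not state how the frame power is computed.

```python
class FramePowerCalculator:
    """Toy model of the frame power calculator 12 with extract/replace switching."""

    def __init__(self) -> None:
        self.previous_c1 = 0.0  # frame power information of the previous frame

    def process(self, frame, replace: bool) -> float:
        if replace:
            # Sync-signal frame: skip extraction and output the previous value.
            return self.previous_c1
        # Normal frame: mean of squared samples (formula assumed for this sketch).
        c1 = sum(s * s for s in frame) / len(frame)
        self.previous_c1 = c1
        return c1


calc = FramePowerCalculator()
c1_normal = calc.process([1.0, -1.0, 1.0, -1.0], replace=False)
c1_sync = calc.process([3.0, 3.0, 3.0, 3.0], replace=True)
```

The spectral envelope parameter extractor 11 would follow the same pattern, holding the previous b1 instead of the previous c1.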
[0048]
The perceptual weighting filter 21 performs perceptual weighting processing (a known technique) on the input speech a1 from the speech input unit 1 in units of subframes, using the spectral envelope information (parameter) b1, and outputs the perceptually weighted input speech n1.
[0049]
The adaptive codebook 13 is a codebook for expressing the periodic component in the sound source signal. For example, 128 types of pitch component patterns are stored in advance (size 128, 80 dimensions), and a previous sound source signal area is provided for storing the sound source signal generated from the optimum adaptive codebook vector, noise codebook vector, and gain codebook vector extracted in the previous subframe.
The adaptive codebook 13 outputs the candidate vector d1 of the optimum adaptive code selected according to the input control signal l1.
[0050]
The noise codebook 16 is a codebook for expressing the noise component in the sound source signal. It stores, for example, 512 types of noise component patterns (size 512, 80 dimensions) and outputs the candidate vector f1 of the optimal noise code selected according to the input control signal l1.
[0051]
The gain codebook 17 is a codebook for adjusting the gain. It stores, for example, 128 types of gain patterns (size 128, two dimensions) and outputs the gain candidate vector h1 for the adaptive code and the gain candidate vector i1 for the noise code selected according to the input control signal l1.
[0052]
The multiplier 18 multiplies the optimal adaptive codebook vector d1 by the gain candidate vector h1, and outputs the optimal adaptive codebook vector e1 whose gain has been adjusted.
The multiplier 19 multiplies the optimal noise codebook vector f1 by the gain candidate vector i1, and outputs an optimal noise codebook vector g1 whose gain has been adjusted.
The adder 20 adds the optimum adaptive codebook vector e1 whose gain has been adjusted and the optimum noise codebook vector g1 whose gain has been adjusted, and outputs the excitation signal j1.
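The signal path through the multiplier 18, the multiplier 19, and the adder 20 amounts to j1 = h1·d1 + i1·f1. The sketch below assumes that h1 and i1 are scalar gains (the two components of one two-dimensional gain codebook entry) and that d1 and f1 are equal-length sample vectors; the function name is illustrative.

```python
def excitation(d1, f1, h1, i1):
    """Form the excitation j1 from codebook vectors d1, f1 and gains h1, i1."""
    if len(d1) != len(f1):
        raise ValueError("d1 and f1 must have the same length")
    e1 = [h1 * x for x in d1]               # multiplier 18: gain-adjusted adaptive vector
    g1 = [i1 * x for x in f1]               # multiplier 19: gain-adjusted noise vector
    return [a + b for a, b in zip(e1, g1)]  # adder 20: excitation signal j1


# Two-sample toy vectors; real vectors are 80-dimensional in the example above.
j1 = excitation(d1=[1.0, 2.0], f1=[0.5, -1.0], h1=2.0, i1=4.0)
```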
[0053]
The perceptual weighting synthesis filter 14 adds the spectral envelope information b1 to the sound source signal j1 and performs perceptual weighting to generate and output the reproduced sound k1.
[0054]
Specifically, the coefficients of the synthesis filter that adds the spectral envelope information b1 to the sound source signal j1 are modified for auditory weighting, and the filtering is then performed.
[0055]
The optimum candidate vector selector 15 basically selects the optimum codebook vectors in the adaptive codebook 13, the noise codebook 16, and the gain codebook 17 on a subframe basis. It also outputs the extraction/replacement control signal q1 to the spectrum envelope parameter extractor 11 and the frame power calculator 12.
[0056]
Here, the extraction / replacement control signal q1 indicates whether to extract speech information in the spectrum envelope parameter extractor 11 and the frame power calculator 12, or to replace with speech information for interpolation without extracting the speech information. Signal.
In other words, the optimal candidate vector selector 15 uses the extraction/replacement control signal q1 to instruct the spectrum envelope parameter extractor 11 and the frame power calculator 12 to perform extraction in a normal frame, and to perform replacement in a frame in which a synchronization signal is transmitted.
[0057]
Alternatively, the control signal q1 need not be output from the optimum candidate vector selector 15; the spectrum envelope parameter extractor 11 and the frame power calculator 12 may each be provided internally with a frame counter or the like that counts the timing at which the interpolation processing is to be performed.
[0058]
Further, the optimum candidate vector selector 15 performs two processes for each subframe: a codebook search process, which searches the adaptive codebook 13, the noise codebook 16, and the gain codebook 17 for their optimum codebook vectors and outputs the numbers of those vectors as the codebook optimum indexes m1, o1, and p1; and an adaptive codebook update process, which applies the extracted or interpolated speech coding information to the codebook search of the next frame. Both are repeated for the number of subframes. As a result, when one frame is 40 ms and a subframe is 10 ms, for example, four sets of optimum codebook vectors are selected for one frame and their indexes are output as a part of the speech coding information.
[0059]
However, as a characteristic part of the present invention, for the last subframe in the frame, a speech information interpolation process that interpolates the speech coding information in the same way as on the decoder side is performed after completion of the codebook search process, once every predetermined number of frames.
Details of the audio information interpolation processing will be described later.
[0060]
Specifically, in the codebook search process, the control signal l1 controls the candidate vectors output from the adaptive codebook 13, the noise codebook 16, and the gain codebook 17, and for each candidate vector the root mean square error between the reproduced speech k1 and the perceptually weighted input speech n1 is calculated. The candidate vector that minimizes this error is selected as the optimal vector, and the optimal vector numbers of the adaptive, noise, and gain codebooks are output as the codebook optimum indexes m1, o1, and p1, which form part of the speech coding information.
[0061]
Here, the codebook search procedure executed for each subframe by the optimal candidate vector selector 15 will be described.
The outline of the codebook search in the optimal candidate vector selector 15 is as follows. In the first step, an adaptive codebook search (also called long-term prediction) is performed to find the optimal adaptive codebook vector in the adaptive codebook 13. In the second step, a noise codebook search is performed to find the optimal noise codebook vector in the noise codebook 16. After the optimal adaptive codebook vector and noise codebook vector have been determined, a gain codebook search is performed as the third and final step.
Details of each codebook search will be described in the operation of the voice communication apparatus of the present invention.
[0062]
The adaptive codebook update process creates the adaptive codebook 13 to be used in the next subframe by updating the internal memory of the adaptive codebook 13 with the excitation signal j1 generated from the selected optimum adaptive, noise, and gain codebook vectors or, when interpolation is performed, with the excitation signal j1 generated from the codebook vectors given by the substituted speech coding information of the previous frame.
[0063]
Here, as a specific method for updating the internal memory of the adaptive codebook 13, for example, the currently stored contents of the adaptive codebook 13 (for example, 160 samples) are shifted by the subframe length (80 samples). As a result, zeros enter the latter half (the new part), and the sound source signal (80 samples) obtained in the current subframe is substituted into that part.
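The memory update described above can be sketched as a shift followed by a write into the freed latter half. The function name is illustrative, and the shift direction (discarding the oldest 80 samples at the front) is an assumption consistent with the new excitation being placed in the latter half.

```python
def update_adaptive_memory(memory, excitation, subframe_len=80):
    """Shift the stored excitation by one subframe and append the new excitation."""
    if len(excitation) != subframe_len:
        raise ValueError("excitation must be exactly one subframe long")
    if len(memory) < subframe_len:
        raise ValueError("memory must hold at least one subframe")
    # Discard the oldest subframe_len samples, then place the excitation of the
    # current subframe in the latter half (the new part).
    return list(memory[subframe_len:]) + list(excitation)


memory = list(range(160))      # hypothetical 160-sample stored contents
new_j1 = [-1] * 80             # hypothetical excitation of the current subframe
updated = update_adaptive_memory(memory, new_j1)
```

Because the same function is driven by the same excitation on the encoder and decoder sides, the two memories stay identical as long as both sides use the same coding information, including in interpolated frames.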
[0064]
Next, the speech coding information interpolation processing, which is a feature of the present invention, is performed when the frame from which speech coding information would be extracted is a frame in which the transmission unit 3 transmits the frame synchronization signal, which occurs once every predetermined number of frames. In that case, the same processing as the speech coding information interpolation processing performed on the decoder side is carried out.
[0065]
Specifically, when the interpolation method is, for example, substitution with the speech coding information of the previous frame, the speech coding information of the previous frame is stored in the optimal candidate vector selector 15. After completion of the codebook search for the last subframe, the control signal l1 is controlled so that the adaptive codebook 13, the noise codebook 16, and the gain codebook 17 output the adaptive codebook optimal vector d1, the noise codebook optimal vector f1, and the gain codebook optimal vectors h1 and i1 of the previous frame, according to the optimum indexes m1, o1, and p1 in the stored speech coding information of the previous frame. The adaptive codebook update process described above is then performed with the resulting excitation signal j1 to update the contents of the internal memory of the adaptive codebook 13.
[0066]
Here, the control flow of speech coding information interpolation processing will be described with reference to FIG. FIG. 3 is a flowchart showing the flow of speech coding information interpolation processing in the optimum candidate vector selector 15 of the speech coder. In FIG. 3, the frame counter Cf is reset at the start of encoding, and a frame synchronization signal is inserted every 50 frames.
[0067]
In the speech coding information interpolation processing in the optimum candidate vector selector 15 of the speech coder, the frame counter Cf is incremented (100), and it is determined whether Cf is greater than 50 (102). If Cf is not greater than 50 (No), the selected adaptive, noise, and gain codebook optimum indexes m1, o1, and p1 are stored as the speech coding information of the current frame (110), and the interpolation processing ends.
[0068]
On the other hand, if Cf is greater than 50 in the process 102 (Yes), the frame counter Cf is reset (104), the speech coding information of the current frame is replaced with the speech coding information of the previous frame (106), and the interpolation processing ends.
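The control flow of steps 100 to 110 can be sketched as follows. The class and attribute names are illustrative, the reset value of 0 in step 104 is an assumption (the text only says the counter is reset), and plain integers stand in for the stored index sets.

```python
class InterpolationControl:
    """Toy model of the FIG. 3 flow in the optimum candidate vector selector 15."""

    def __init__(self, threshold: int = 50) -> None:
        self.threshold = threshold
        self.cf = 0            # frame counter Cf, reset at the start of encoding
        self.previous = None   # stored coding information of the previous frame

    def process(self, current):
        self.cf += 1                          # step 100: increment Cf
        if self.cf > self.threshold:          # step 102: is Cf greater than 50?
            self.cf = 0                       # step 104: reset Cf (value assumed)
            if self.previous is not None:
                current = self.previous       # step 106: substitute previous frame
        # Step 110: keep the information for the next call (after a substitution
        # this simply re-stores the same previous-frame information).
        self.previous = current
        return current


ctl = InterpolationControl()
outputs = [ctl.process(n) for n in range(1, 53)]  # frames numbered 1..52
```

With these assumptions the first substitution happens on the call at which Cf reaches 51; the exact spacing of substitutions depends on the reset value chosen in step 104.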
[0069]
Next, the operation of the speech coder will be described with reference to FIG.
In the present speech coder, when the input speech a1 is input in units of frames, the spectral envelope parameter extractor 11 extracts the spectral envelope information b1, outputs it to the transmitting unit 3 as a part of the speech coding information, and supplies it to the auditory weighting synthesis filter 14 and the auditory weighting filter 21.
Meanwhile, the frame power calculator 12 extracts the frame power information c1 from the input speech a1, outputs it to the transmission unit 3 as a part of the speech coding information, and supplies it to the optimum candidate vector selector 15.
[0070]
At this time, in a frame in which a synchronization signal is transmitted, the spectrum envelope parameter extractor 11 and the frame power calculator 12 follow the extraction/replacement control signal q1 from the optimal candidate vector selector 15: they do not perform extraction, and instead output the spectrum envelope information b1 and the frame power information c1 replaced with the information for interpolation.
[0071]
Thereafter, the perceptual weighting filter 21 performs perceptual weighting on the input speech a1 using the spectral envelope information b1 from the spectral envelope parameter extractor 11, and the perceptually weighted input speech n1 is output to the optimal candidate vector selector 15.
[0072]
Further, in the optimal candidate vector selector 15, as the operation of the adaptive codebook search, which is the first stage of the codebook search process, the control signal l1 first causes the candidate vectors d1 stored in the adaptive codebook 13 to be output in order. At this time, the control signal l1 is controlled so that no candidate vector is output from the noise codebook 16 or the gain codebook 17.
[0073]
The candidate vectors d1 output in order from the adaptive codebook 13 pass through the multiplier 18 and the adder 20 and are output as the periodic sound source signal j1. The perceptual weighting synthesis filter 14 adds the spectral envelope information b1 from the spectral envelope parameter extractor 11, performs auditory weighting, and generates and outputs the partially reproduced speech (adaptive codebook contribution) k1.
[0074]
Then, for the partially reproduced speech (adaptive codebook contribution) k1 generated for each candidate vector d1, the optimum candidate vector selector 15 applies the optimum gain and calculates the root mean square error with respect to the perceptually weighted input speech n1 output from the perceptual weighting filter 21. The candidate vector d1 that minimizes this error is selected as the optimal adaptive codebook vector, and the number of the selected vector is output as the codebook optimum index m1 of the adaptive codebook 13.
[0075]
Here, the optimum gain is the gain that minimizes the mean square error. It is obtained by setting to zero the partial derivative of the mean-square-error expression with respect to the gain that multiplies the reproduction signal k1. With the gain fixed to this value, the reproduction signal vector k1 is replaced in turn and the mean square error is evaluated, so that the optimum adaptive codebook vector is found.
Since the method for calculating the root mean square error is a known technique, a detailed description thereof is omitted here.
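As an illustration of the search just described, the sketch below uses the standard least-squares result: minimizing the sum over samples of (n1 − g·k1)² with respect to g gives g = ⟨n1, k1⟩ / ⟨k1, k1⟩. The function names are illustrative, the candidates are assumed to be already synthesized and perceptually weighted, and the sum of squared errors is used in place of the root mean square error because both are minimized by the same candidate.

```python
def optimal_gain(target, synth):
    """Gain g minimizing sum((target - g * synth)^2)."""
    energy = sum(k * k for k in synth)
    if energy == 0.0:
        return 0.0  # an all-zero candidate cannot be scaled to match the target
    return sum(n * k for n, k in zip(target, synth)) / energy


def squared_error(target, synth, gain):
    return sum((n - gain * k) ** 2 for n, k in zip(target, synth))


def search_codebook(target, candidates):
    """Return the index of the candidate with the smallest optimally scaled error."""
    best_index, best_error = -1, float("inf")
    for index, synth in enumerate(candidates):
        gain = optimal_gain(target, synth)
        error = squared_error(target, synth, gain)
        if error < best_error:
            best_index, best_error = index, error
    return best_index


n1 = [1.0, 2.0, 3.0]                                   # toy weighted input speech
k1_candidates = [[1.0, 0.0, 0.0], [2.0, 4.0, 6.0], [0.0, 1.0, 0.0]]
m1 = search_codebook(n1, k1_candidates)
```

In this toy case the second candidate is an exact multiple of the target, so it is selected with a gain of 0.5 and zero error.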
[0076]
Next, in the optimal candidate vector selector 15, as the operation of the noise codebook search, which is the second stage of the codebook search process, the control signal l1 first causes the candidate vectors f1 stored in the noise codebook 16 to be output in order. At this time, the control signal l1 is controlled so that no candidate vector is output from the adaptive codebook 13 or the gain codebook 17.
[0077]
The candidate vectors f1 stored in the noise codebook 16 are then output in order, pass through the multiplier 19 and the adder 20, and are output as a noise-like sound source signal j1. The perceptual weighting synthesis filter 14 adds the spectral envelope information b1 from the spectral envelope parameter extractor 11 and applies auditory weighting, so that partially reproduced speech (noise codebook contribution) k1 is generated and output.
[0078]
Here, in the noise codebook search, each candidate vector f1 is orthogonalized with respect to the optimal adaptive codebook vector that has passed through the perceptual weighting synthesis filter, in order to reduce the quantization error of the reproduced speech (a known technique).
However, since the same result is obtained when the orthogonalization is applied to the reproduced speech k1 for each candidate vector, in the present invention the orthogonalization is, for convenience, performed by the optimal candidate vector selector 15.
[0079]
Therefore, the optimum candidate vector selector 15 orthogonalizes the partially reproduced speech (noise codebook contribution) k1 generated for each candidate vector f1, gives it the optimum gain, and calculates the root mean square error between it and the perceptually weighted input speech n1 output from the perceptual weighting filter 21. The candidate vector f1 that minimizes this error is selected as the optimal noise codebook vector, and its number is output as the codebook optimum index o1 of the noise codebook.
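The orthogonalized noise codebook search can be illustrated as follows. This is a minimal sketch under stated assumptions, not the procedure of the specification: the orthogonalization is taken to be a single Gram-Schmidt step that removes from each noise codebook contribution its component along the weighted adaptive codebook contribution, and the names `orthogonalize` and `search_noise_codebook` are hypothetical.

```python
def dot(x, y):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def orthogonalize(k, a):
    """Remove from k its component along a (one Gram-Schmidt step)."""
    energy = dot(a, a)
    if energy == 0.0:
        return list(k)
    scale = dot(k, a) / energy
    return [ki - scale * ai for ki, ai in zip(k, a)]


def search_noise_codebook(target, adaptive_contribution, noise_contributions):
    """Return the index of the noise candidate with the smallest error.

    Each candidate contribution is orthogonalized against the adaptive
    codebook contribution, given its optimum gain, and compared with the
    weighted input speech `target`.
    """
    best_index, best_err = None, None
    for index, k in enumerate(noise_contributions):
        k_orth = orthogonalize(k, adaptive_contribution)
        energy = dot(k_orth, k_orth)
        gain = dot(target, k_orth) / energy if energy > 0.0 else 0.0
        err = sum((t - gain * c) ** 2 for t, c in zip(target, k_orth))
        if best_err is None or err < best_err:
            best_index, best_err = index, err
    return best_index
```

Because the part of the target lying along the adaptive codebook contribution is the same for every candidate, it does not change which candidate gives the minimum.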
[0080]
Next, as the gain codebook search, which is the third stage of the codebook search process, the optimal candidate vector selector 15 uses the control signal l1 to cause the optimum adaptive codebook vector d1 determined by the adaptive codebook search to be output from the adaptive codebook 13 and the optimum noise codebook vector f1 determined by the noise codebook search to be output from the noise codebook 16. It further causes all of the stored gain candidate vectors h1 for the adaptive code and gain candidate vectors i1 for the noise code to be output in order from the gain codebook 17.
[0081]
As a result, the optimum adaptive codebook vector d1 output from the adaptive codebook 13 is multiplied in the multiplier 18 by the gain candidate vector h1 for the adaptive code output from the gain codebook 17, and the gain-adjusted optimum adaptive codebook vector e1 is output.
Likewise, the optimum noise codebook vector f1 output from the noise codebook 16 is multiplied in the multiplier 19 by the gain candidate vector i1 for the noise code output from the gain codebook 17, and the gain-adjusted optimum noise codebook vector g1 is output.
[0082]
The gain-adjusted optimum adaptive codebook vector e1 and the gain-adjusted optimum noise codebook vector g1 are then added by the adder 20 to generate the sound source signal j1. The perceptual weighting synthesis filter 14 adds the spectral envelope information b1 and applies auditory weighting, and the reproduced speech k1 is output.
[0083]
The optimal candidate vector selector 15 normalizes the perceptually weighted input speech n1 output from the perceptual weighting filter 21 using the frame power information c1 output from the frame power calculator 12. It then selects, as the optimal gain code vector, the pair of the adaptive code gain candidate vector h1 and the noise code gain candidate vector i1 for which the reproduced speech k1 minimizes the auditory weighted mean square error with respect to the normalized input speech n1. The number of the selected vector is output as the optimum gain index p1 of the gain codebook.
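The gain codebook search can be sketched in the same way. The following is illustrative only and rests on assumptions that the specification does not state: the frame power c1 is treated as a mean-square value, so the input is normalized by its square root; the weighting synthesis filter is treated as linear, so the reproduced speech for a gain pair (h1, i1) is h1 times the adaptive contribution plus i1 times the noise contribution; and the name `search_gain_codebook` is hypothetical.

```python
import math


def search_gain_codebook(target, frame_power, k_adaptive, k_noise, gain_codebook):
    """Return the index of the gain pair (h, i) with the smallest error.

    target        -- weighted input speech n1 for the subframe
    frame_power   -- frame power information c1 (assumed mean-square value)
    k_adaptive    -- weighted contribution of the optimum adaptive vector
    k_noise       -- weighted contribution of the optimum noise vector
    gain_codebook -- sequence of (h, i) gain pairs
    """
    scale = math.sqrt(frame_power) if frame_power > 0.0 else 1.0
    normalized = [t / scale for t in target]
    best_index, best_err = None, None
    for index, (h, i) in enumerate(gain_codebook):
        # Mean square error between the normalized input and the
        # reproduced speech for this gain pair.
        err = sum(
            (n - (h * a + i * f)) ** 2
            for n, a, f in zip(normalized, k_adaptive, k_noise)
        ) / len(normalized)
        if best_err is None or err < best_err:
            best_index, best_err = index, err
    return best_index
```

Unlike the first two stages, both gains are taken jointly from a stored table, so every stored pair is tried rather than computing a gain in closed form.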
[0084]
The control signal l1 is then set so that the adaptive codebook optimum vector d1, the noise codebook optimum vector f1, and the gain codebook optimum vectors h1 and i1 selected by the codebook search process are output from the adaptive codebook 13, the noise codebook 16, and the gain codebook 17. As the adaptive codebook update processing, the internal memory contents of the adaptive codebook 13 are updated with the resulting excitation signal j1, and the updated contents are used as the adaptive codebook 13 for the next subframe.
[0085]
The above operation is repeated for each subframe. When the codebook search process for the last subframe is completed, the optimal candidate vector selector 15 performs the speech encoded information interpolation processing that is the characteristic part of the present invention. If the frame is one in which the frame synchronization signal is transmitted, the control signal l1 is set according to the codebook optimum indices m1, o1, and p1 of the last subframe in the speech coding information of the previous frame, which are stored in the optimum candidate vector selector 15, so that the adaptive codebook 13, the noise codebook 16, and the gain codebook 17 output the adaptive codebook optimum vector d1, the noise codebook optimum vector f1, and the gain codebook optimum vectors h1 and i1 of the previous frame. As the adaptive codebook update processing, the internal memory contents of the adaptive codebook 13 are updated with the resulting excitation signal j1, and the interpolated update result is used as the adaptive codebook 13 for the next subframe.
[0086]
If the frame is not one in which the frame synchronization signal is transmitted, the control signal l1 is set so that the adaptive codebook optimum vector d1, the noise codebook optimum vector f1, and the gain codebook optimum vectors h1 and i1 selected by the codebook search process are output from the adaptive codebook 13, the noise codebook 16, and the gain codebook 17. As the adaptive codebook update processing, the internal memory contents of the adaptive codebook 13 are updated with the resulting excitation signal j1, and the update result is used as the adaptive codebook 13 for the next subframe.
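The branch between the two cases above can be summarized in a short sketch. It is illustrative only: the type `Indices` and the function `select_update_indices` are hypothetical names, and the fallback used when no previous frame exists is an assumption made so that the example is complete, not something stated in the specification.

```python
from typing import NamedTuple, Optional


class Indices(NamedTuple):
    """Codebook optimum indices of one subframe."""
    m: int  # adaptive codebook optimum index
    o: int  # noise codebook optimum index
    p: int  # gain codebook optimum index


def select_update_indices(
    searched: Indices,
    previous_last: Optional[Indices],
    sync_frame: bool,
) -> Indices:
    """Choose the indices that drive the adaptive codebook update.

    In a frame that carries the frame synchronization signal, the stored
    indices of the last subframe of the previous frame are reused, which
    mirrors the interpolation on the receiving side. Otherwise the
    indices found by the codebook search are used.
    """
    if sync_frame and previous_last is not None:
        return previous_last
    # Assumed fallback: with no previous frame, use the searched indices.
    return searched
```

The selected indices are then used to output the codebook vectors and to update the adaptive codebook in the same way in both cases, so only the selector needs to change.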
[0087]
Next, a speech decoder (present speech decoder) in the speech communication apparatus according to the embodiment of the present invention will be described with reference to FIG. FIG. 4 is a configuration block diagram of a speech decoder in the speech communication apparatus according to the embodiment of the present invention.
[0088]
As shown in FIG. 4, the speech decoder includes an adaptive codebook 31, a noise codebook 32, a gain codebook 33, a multiplier 34, a multiplier 35, an adder 36, a synthesis filter 37, a post filter 38, and a gain controller 39.
[0089]
Next, each part of the speech decoder will be described.
The adaptive codebook 31 is an adaptive codebook having the same contents as the adaptive codebook 13 of the speech encoder (FIG. 1), and is further provided with a previous excitation signal area for storing the excitation signal generated in the previous subframe.
It outputs the adaptive codebook optimum vector d2 selected according to the received adaptive codebook optimum index m2.
[0090]
The noise codebook 32 is a noise codebook having the same contents as the noise codebook 16 of the speech encoder (FIG. 1), and outputs the noise codebook optimum vector f2 selected according to the received noise codebook optimum index o2.
[0091]
The gain codebook 33 is a gain codebook having the same contents as the gain codebook 17 of the speech coder (FIG. 1), and outputs the optimum gain h2 of the adaptive codebook vector and the optimum gain i2 of the noise codebook vector selected according to the received gain codebook optimum index p2.
[0092]
The gain controller 39 receives the gain h2 of the adaptive codebook vector and the gain i2 of the noise codebook vector, adjusts them using the received frame power information c2, and outputs the gain-adjusted adaptive codebook vector gain h2' and noise codebook vector gain i2'.
[0093]
The multiplier 34 multiplies the optimum adaptive codebook vector d2 by the gain-adjusted gain h2', and outputs the gain-adjusted optimum adaptive codebook vector e2.
The multiplier 35 multiplies the optimum noise codebook vector f2 by the gain-adjusted gain i2', and outputs the gain-adjusted optimum noise codebook vector g2.
The adder 36 adds the optimum adaptive codebook vector e2 whose gain has been adjusted and the optimum noise codebook vector g2 whose gain has been adjusted, and reproduces the excitation signal j2.
[0094]
The synthesis filter 37 generates the reproduced sound k2 by adding the received spectral envelope information b2 to the sound source signal j2.
The post filter 38 performs a formant emphasis process on the reproduced sound k2 and outputs a reproduced sound a2 subjected to the formant emphasis process in order to improve the reproduced sound quality in terms of hearing.
[0095]
Next, the operation of this speech decoder will be described with reference to FIG.
In the present speech decoder, reproduced speech is generated in accordance with speech encoding information shown in Table 1 received in units of frames. The operation will be described below.
First, the following processing is performed for each subframe (10 ms, 80 samples) to reproduce the sound source signal j2.
[0096]
Specifically, based on the received adaptive codebook optimum index m2 and noise codebook optimum index o2, the adaptive codebook optimum vector d2 and the noise codebook optimum vector f2 are output from the adaptive codebook 31 and the noise codebook 32, respectively.
Meanwhile, based on the received gain codebook optimum index p2, the gain codebook 33 outputs the gain h2 of the adaptive codebook vector and the gain i2 of the noise codebook vector. The gain controller 39 adjusts these gains using the received frame power information c2, and outputs the gain-adjusted adaptive codebook vector gain h2' and noise codebook vector gain i2'.
[0097]
The adaptive codebook optimum vector d2 output from the adaptive codebook 31 is multiplied in the multiplier 34 by the gain-adjusted adaptive codebook vector gain h2' from the gain controller 39, and the gain-adjusted optimum adaptive codebook vector e2 is output. Similarly, the noise codebook optimum vector f2 output from the noise codebook 32 is multiplied in the multiplier 35 by the gain-adjusted noise codebook vector gain i2' from the gain controller 39, and the gain-adjusted optimum noise codebook vector g2 is output. The vectors e2 and g2 are added by the adder 36 to reproduce the excitation signal j2.
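The decoder-side reconstruction of the excitation signal can be sketched as below. This is an illustration under assumptions, not the decoder of the specification: the way the gain controller 39 uses the frame power information c2 is not detailed in this passage, so the sketch assumes that it scales both gains by the square root of the frame power, and the name `decode_excitation` is hypothetical.

```python
import math


def decode_excitation(d2, f2, gains, frame_power):
    """Reproduce the excitation signal j2 for one subframe.

    d2          -- adaptive codebook optimum vector
    f2          -- noise codebook optimum vector
    gains       -- (h2, i2) read from the gain codebook
    frame_power -- received frame power information c2
    """
    h2, i2 = gains
    # Assumed gain adjustment: scale both gains by sqrt(frame power).
    scale = math.sqrt(frame_power)
    h2_adj, i2_adj = h2 * scale, i2 * scale
    e2 = [h2_adj * x for x in d2]  # multiplier 34
    g2 = [i2_adj * x for x in f2]  # multiplier 35
    return [a + b for a, b in zip(e2, g2)]  # adder 36
```

For example, `decode_excitation([1.0, 0.0], [0.0, 1.0], (0.5, 0.25), 4.0)` returns `[1.0, 0.5]` under this assumed scaling.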
[0098]
After the reproduction of the excitation signal j2 is completed, the adaptive codebook 31 is updated with the excitation signal j2, and the update result is used as the adaptive codebook for the next subframe.
Here, when there is no transmission error, the update result of the adaptive codebook 31 of the speech decoder should be exactly the same as the update result of the adaptive codebook 13 of the speech coder.
[0099]
Then, the following processing is executed for each frame (40 ms, 320 samples).
The synthesis filter 37 adds the received spectral envelope information b2 to the sound source signal j2 output from the adder 36 to generate the reproduced sound k2. The post filter 38 then applies formant emphasis processing to improve the perceived quality, and the reproduced sound a2 is output.
[0100]
According to the speech communication method of the embodiment of the present invention, the speech encoding side (transmission side) performs, as the speech encoding information extraction process for the frame in which the frame synchronization signal is transmitted, the same interpolation processing as the speech encoding information interpolation processing performed on the decoding side (reception side). The update results of the internal memory contents of the adaptive codebooks of the speech encoder on the transmission side and the speech decoder on the reception side are therefore always kept equal. As a result, the deterioration of reproduced speech quality caused by inserting the frame synchronization signal does not extend over a plurality of frames, and the deterioration of the reproduced speech quality can be reduced.
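The effect described above can be checked with a toy simulation. The sketch below is not the codec of the embodiment: the two-sample subframe, the six-sample memory, the codebook contents, the lag rule `SUBFRAME + m`, and the names `update` and `run` are all assumptions chosen to keep the example small, and each frame is treated as a single subframe. It compares the adaptive codebook memories of the encoder and decoder when the encoder does and does not apply the same previous-frame interpolation as the decoder.

```python
SUBFRAME = 2   # toy subframe length (the embodiment uses 80 samples)
MEMORY = 6     # toy adaptive codebook memory length
NOISE = [[1.0, -1.0], [0.5, 0.5], [-1.0, 0.25]]   # toy noise codebook
GAINS = [(0.5, 1.0), (0.25, 0.5), (0.75, 0.25)]   # toy gain pairs (h, i)


def update(memory, indices):
    """Build the excitation from (m, o, p) and shift it into the memory."""
    m, o, p = indices
    lag = SUBFRAME + m                      # assumed index-to-lag rule
    start = len(memory) - lag
    adaptive = memory[start:start + SUBFRAME]
    h, i = GAINS[p]
    excitation = [h * a + i * f for a, f in zip(adaptive, NOISE[o])]
    return memory[SUBFRAME:] + excitation


def run(frames, sync_frames, encoder_interpolates):
    """Return (encoder_memory, decoder_memory) after all frames."""
    encoder = [0.0] * MEMORY
    decoder = [0.0] * MEMORY
    previous = (0, 0, 0)
    for number, searched in enumerate(frames):
        is_sync = number in sync_frames
        # The decoder receives no speech information in a sync frame and
        # substitutes the information of the previous frame.
        decoder_used = previous if is_sync else searched
        # The encoder either does the same or keeps its searched indices.
        if is_sync and encoder_interpolates:
            encoder_used = previous
        else:
            encoder_used = searched
        encoder = update(encoder, encoder_used)
        decoder = update(decoder, decoder_used)
        previous = decoder_used
    return encoder, decoder
```

With the frames `[(0, 0, 0), (1, 1, 1), (2, 2, 2), (0, 1, 2)]` and frame 2 carrying the synchronization signal, the two memories are identical when `encoder_interpolates` is true and differ when it is false, and the difference is still present after the following frame.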
[0101]
According to the speech communication apparatus of the embodiment of the present invention, the optimal candidate vector selector 15 of the speech coder has the speech encoding information interpolation processing inserted between the codebook search process of the last subframe in the frame and the adaptive codebook update process, and performs the interpolation processing on the frame in which the frame synchronization signal is transmitted. The portions of the speech encoder other than the optimal candidate vector selector 15, and the speech decoder side, can therefore be used unchanged, so the apparatus can be realized easily.
[0102]
Further, since the speech encoder of the present invention is realized by a DSP (Digital Signal Processor) or a CPU, the present invention can be easily realized by changing its software.
[0103]
[Effect of the invention]
According to the first and second aspects of the invention, in the voice communication method, the transmission side performs, on the speech coding information of the frame in which the synchronization signal is periodically transmitted, the same interpolation processing as the speech coding information interpolation processing performed on the receiving side on the speech coding information of the frame in which the synchronization signal is received, and updates the adaptive codebook according to the speech coding information obtained by the interpolation processing. Therefore, when speech coding and decoding using an adaptive codebook are processed while reflecting earlier speech coding information, the same speech coding information interpolation processing is performed on the transmitting side and the receiving side, so the influence on speech encoding on the transmission side equals the influence on speech decoding on the reception side, and the quality of the reproduced voice can be improved.
[0104]
According to the third aspect of the present invention, in the voice communication device, the speech encoder on the transmission side performs, on the speech coding information of the frame in which the synchronization signal is transmitted by the transmission unit, the same interpolation processing as the speech coding information interpolation processing performed by the receiving unit on the speech coding information of the frame in which the synchronization signal is received, and updates the adaptive codebook according to the speech coding information obtained by the interpolation processing. Therefore, when speech coding and decoding using an adaptive codebook are processed while reflecting earlier speech coding information, the same speech coding information interpolation processing is performed on the transmitting side and the receiving side, so the influence on speech encoding on the transmission side equals the influence on speech decoding on the reception side, and the quality of the reproduced voice can be improved.
[0105]
According to the invention of claim 4, for the frame in which the synchronization signal is periodically transmitted, the spectrum envelope parameter extractor, the frame power calculator, and the optimal candidate vector selector perform, on the speech coding information of that frame, the same processing as the speech coding information interpolation processing performed on the receiving side on the speech coding information of the frame in which the synchronization signal is received. A control signal is output to the adaptive codebook, the noise codebook, and the gain codebook so that each optimum codebook vector is output according to the interpolated speech coding information, and the adaptive codebook, the noise codebook, and the gain codebook output the adaptive code, noise code, and gain candidate vectors according to the control signal. The speech communication apparatus thus includes a transmission-side apparatus having a speech coder that updates the contents of the adaptive code candidate vectors. Therefore, when speech coding and decoding using an adaptive codebook are processed while reflecting earlier speech coding information, the same speech coding information interpolation processing is performed on the transmitting side and the receiving side, so the influence on speech encoding on the transmitting side equals the influence on speech decoding on the receiving side, and the quality of the reproduced voice can be improved.
[0106]
According to the fifth aspect of the present invention, in the voice communication apparatus according to claim 3 or 4, the speech encoder on the transmission side performs the interpolation processing for the frame in which the synchronization signal is transmitted by the transmitting unit using the speech coding information obtained in the immediately preceding frame, in the same way as the speech coding information interpolation processing performed by the receiving unit. Therefore, when speech coding and decoding using an adaptive codebook are processed while reflecting earlier speech coding information, the same speech coding information interpolation processing is performed on the transmitting side and the receiving side by a simple process, so the influence on speech encoding on the transmitting side equals the influence on speech decoding on the receiving side, and the quality of the reproduced speech can be improved.
[Brief description of the drawings]
FIG. 1 is an explanatory diagram showing speech encoding / decoding processing and frame synchronization signal transmission / reception timing in a speech communication apparatus according to an embodiment of the present invention.
FIG. 2 is a configuration block diagram of a speech coder in the speech communication apparatus according to the embodiment of the present invention.
FIG. 3 is a flowchart showing the flow of speech coding information interpolation processing in the optimum candidate vector selector 15 of the speech coder.
FIG. 4 is a configuration block diagram of a speech decoder in the speech communication apparatus according to the embodiment of the present invention.
FIG. 5 is a block diagram showing a schematic configuration of a conventional voice communication apparatus.
FIG. 6 is an explanatory diagram showing speech encoding / decoding processing and frame synchronization signal transmission / reception timing in a conventional speech communication apparatus.
[Explanation of symbols]
1 ... Speech input part, 2 ... Speech coder, 3 ... Transmission part, 4 ... Reception part, 5 ... Speech decoder, 6 ... Speech output part, 11 ... Spectral envelope parameter extractor, 12 ... Frame power calculator, 13 ... Adaptive codebook, 14 ... Auditory weighting synthesis filter, 15 ... Optimal candidate vector selector, 16 ... Noise codebook, 17 ... Gain codebook, 18 ... Multiplier, 19 ... Multiplier, 20 ... Adder, 21 ... Auditory weighting filter, 31 ... Adaptive codebook, 32 ... Noise codebook, 33 ... Gain codebook, 34 ... Multiplier, 35 ... Multiplier, 36 ... Adder, 37 ... Synthesis filter, 38 ... Post filter, 39 ... Gain controller

Claims

A voice communication method using speech coding and decoding processing with an adaptive codebook, in which speech coding information is extracted and transmitted by performing speech coding processing on an input speech signal on a transmission side, and a speech signal is reproduced by performing speech decoding processing on the speech coding information received on a reception side, wherein, when a synchronization signal is periodically transmitted from the transmission side instead of the speech coding information, the transmission side performs, on the speech coding information of the frame in which the synchronization signal is transmitted, the same interpolation processing as the speech coding information interpolation processing performed on the reception side on the speech coding information of the frame in which the synchronization signal is received, and updates the adaptive codebook in accordance with the speech coding information obtained by the interpolation processing.

The voice communication method according to claim 1, wherein the voice coding information interpolation process uses voice coding information obtained in the previous frame.

A voice communication device comprising: a transmission side having a voice input unit that inputs voice and outputs a voice signal, a voice encoder that performs voice coding processing of the voice signal using an adaptive codebook and extracts voice coded information, and a transmitting unit that transmits the voice coded information and periodically transmits a synchronization signal instead of the voice coded information; and a receiving side having a receiving unit that receives the transmitted voice coded information and, when the synchronization signal is received, outputs the voice coded information obtained in the previous frame as interpolation processing of the voice coded information, a voice decoder that decodes the voice coded information using an adaptive codebook and outputs a voice signal, and a voice output unit that outputs the voice signal as voice,
wherein the voice encoder performs, on the voice coded information of the frame in which the synchronization signal is transmitted by the transmitting unit, the same interpolation processing as the interpolation processing performed by the receiving unit on the voice coded information of the frame in which the synchronization signal is received, and updates the adaptive codebook in accordance with the voice coded information obtained by the interpolation processing.

A spectral envelope parameter extractor that extracts, for each frame of an input voice signal, spectral envelope information representing the vocal tract characteristics of the voice generation system, and that, in a frame in which a synchronization signal is transmitted, uses the spectral envelope information of the previous frame as the spectral envelope information of that frame;
A frame power calculator that calculates frame power for each frame of the input voice signal and outputs frame power information, and that, in a frame in which the synchronization signal is transmitted, uses the frame power information of the previous frame as the frame power information of that frame;
An auditory weighting filter that performs auditory weighting processing on the input voice signal using the spectral envelope information and outputs an auditory weighted input voice signal;
An adaptive codebook, being a codebook for expressing periodic components in a sound source signal, that outputs an optimal adaptive code candidate vector selected in accordance with an input control signal, and that receives the sound source signal as input and updates the contents of the adaptive code candidate vectors;
A noise codebook for expressing a noise component in a sound source signal and outputting a candidate vector of an optimum noise code selected according to an input control signal;
A gain codebook for adjusting a gain and outputting a gain candidate vector for an adaptive codebook selected according to an input control signal and a gain candidate vector for a noise codebook;
A first multiplier for multiplying an optimal adaptive codebook vector by a gain candidate vector and outputting a gain-adjusted optimal adaptive codebook vector;
A second multiplier that multiplies the optimal noise codebook vector by the gain candidate vector and outputs a gain adjusted optimal noise codebook vector;
An adder for adding a gain-adjusted optimal adaptive codebook vector and a gain-adjusted optimal noise codebook vector and outputting a sound source signal;
An auditory weighting synthesis filter that adds the spectral envelope information to the sound source signal and performs auditory weighting to generate and output a reproduced audio signal;
And an optimal candidate vector selector that performs a codebook search process of searching the adaptive codebook, the noise codebook, and the gain codebook for the optimum codebook vector of each and outputting each codebook optimum index,
That, in a frame in which the synchronization signal is not transmitted, outputs a control signal to the adaptive codebook, the noise codebook, and the gain codebook so that each optimum codebook vector selected in the search process is output, and updates the adaptive codebook,
And that, in a frame in which the synchronization signal is transmitted, performs on the speech coding information of that frame the same interpolation processing as the speech coding information interpolation processing performed on the receiving side on the speech coding information of the frame in which the synchronization signal is received, outputs the control signal to the adaptive codebook, the noise codebook, and the gain codebook so that each optimum codebook vector is output according to the speech coding information obtained by the interpolation processing, and performs adaptive codebook update processing for updating the adaptive codebook: a speech communication apparatus comprising a transmission-side apparatus that includes a speech encoder having each of the above elements.

The voice communication apparatus according to claim 3 or 4, wherein the voice coding information interpolation process uses voice coding information obtained in a previous frame.