JP3207281B2

JP3207281B2 - Stereo speech encoding / decoding system, stereo speech decoding device, and single speech / multiple simultaneous speech discrimination device

Info

Publication number: JP3207281B2
Application number: JP02405193A
Authority: JP
Inventors: 重信南; 理岡田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1993-02-12
Filing date: 1993-02-12
Publication date: 2001-09-10
Anticipated expiration: 2016-09-10
Also published as: JPH06236200A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Industrial Field of Application] The present invention relates to a stereo speech encoding/decoding system applied to teleconference systems and the like, to a stereo speech decoding device, and to a single speech / multiple simultaneous speech discrimination device.

[0002]

[Prior Art] In recent years, with advances in communication technology, demand has been growing for teleconference systems that hold conferences between remote sites over communication links.

[0003] As shown in FIG. 5, such a teleconference system generally comprises an input/output system including a microphone 1, a speaker 2, a TV camera 3, a TV 4, an electronic blackboard 5, a FAX 6, and a telewriting unit 7; a control system including an audio unit 8, a control unit 9, a control pad 10, and an image unit 11; and a transmission system including a transmission path 12 and a transmission unit 13. Image information such as moving and still images, together with audio information, is exchanged between remote sites via the transmission path 12.

[0004] Lower transmission cost is desired for such teleconference systems. In particular, if this information can be transmitted at about 64 kbps, the rate that present ordinary subscriber lines can carry, a teleconference system can be realized at far lower cost than a high-quality teleconference system using optical fiber or the like.

[0005] In a teleconference system using such a low-rate transmission path, even monaural audio must be compressed to a low transmission capacity of about 16 kbps by audio data compression such as ADPCM, so stereo audio is not normally used.

[0006] However, it is well known that stereo audio is desirable in a teleconference system, both to give a sense of presence and for speaker identification, that is, knowing who on the other side is speaking.

[0007] Accordingly, in Japanese Patent Application Laid-Open No. Sho 62-51844, the present inventors proposed a stereo audio transmission scheme that enables low-cost, high-quality stereo audio transmission even over a low-rate transmission path.

[0008] This scheme is briefly described with reference to FIG. 6.

[0009] As shown in the figure, the voice $X(\omega)$ of speaker A1 is input to the microphones 1r and 1l of the right and left channels. Echoes from walls and the like are ignored, and $\omega$ denotes frequency. If the transfer functions to the left and right channels are $G_L(\omega)$ and $G_R(\omega)$, the input audio signals $Y_L(\omega)$ and $Y_R(\omega)$ of the left and right channels are

$$Y_L(\omega) = G_L(\omega)\,X(\omega) \quad (1)$$

$$Y_R(\omega) = G_R(\omega)\,X(\omega) \quad (2)$$

From these two equations,

$$Y_L(\omega) = \frac{G_L(\omega)}{G_R(\omega)}\,Y_R(\omega) \quad (3)$$

$$Y_L(\omega) = G(\omega)\,Y_R(\omega) \quad (4)$$

where $G(\omega) = G_L(\omega)/G_R(\omega)$.

[0010] Therefore, once the transfer function $G(\omega)$ is known, the sound of the right channel can be reproduced.

[0011] In this scheme, therefore, stereo transmission does not send the audio of the two channels independently. Instead, the transmitting side sends the audio signal $Y_R(\omega)$ of one channel together with the estimated transfer function $G(\omega)$, and the receiving side reproduces the audio of both channels from the right and left speakers 2r and 2l, using the audio signal $Y_R(\omega)$ itself and a signal synthesized from $Y_R(\omega)$ and the transfer function $G(\omega)$.

[0012] In this scheme, if single speech is assumed, the transfer function can be specified by a simple delay and attenuation. Its information content is therefore far smaller than that of the audio signal $Y_L(\omega)$, and it is easy to estimate, so stereo transmission becomes possible with a smaller transmission volume.
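As a worked illustration of the delay-and-attenuation model described above, the relation of equation (4) can be written out for a single gain and a single delay. The symbols $a$ (attenuation) and $\tau$ (delay) are introduced here for illustration only and do not appear in the original text:

```latex
G(\omega) = a\,e^{-j\omega\tau}
\quad\Longrightarrow\quad
Y_L(\omega) = a\,e^{-j\omega\tau}\,Y_R(\omega)
\quad\Longleftrightarrow\quad
y_L(t) = a\,y_R(t-\tau)
```

Under this assumption only the two numbers $a$ and $\tau$ need to be sent alongside one channel, which is why the side information is so much smaller than a second audio signal.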

[0013] However, because this scheme assumes single speech, during double talk, in which several speakers talk at the same time, an accurate transfer function $G(\omega)$, that is, the additional information, cannot be generated, and the sound image wavers.

[0014]

[Problems to Be Solved by the Invention] In conversations such as conferences, double talk is generally considered to account for a very small proportion of the time. The conventional scheme exploited this property to achieve large bandwidth compression by transmitting single speech monaurally, but because monaural transmission was applied unchanged even during the rare periods of double talk, the sound image wavers.

[0015] Accordingly, an object of the present invention is to provide a high-quality stereo speech encoding/decoding system in which the sound image does not waver even during double talk, together with a stereo speech decoding device and a single speech / multiple simultaneous speech discrimination device.

[0016]

[Means for Solving the Problems] To solve this problem, the stereo speech encoding/decoding system of the first invention is a stereo speech encoding/decoding system for encoding and decoding audio signals of a plurality of channels, characterized by having a function of distinguishing single speech from multiple simultaneous speech, wherein, during single speech, main information consisting of the audio signal of at least one of the plurality of channels and additional information needed to synthesize the audio signals of the remaining channels from the main information are encoded and decoded, and, during multiple simultaneous speech, the audio signals of the plurality of channels are encoded and decoded individually.

[0017] The stereo speech encoding/decoding system of the second invention is characterized in that, in the first invention, the coded audio band of the main-information coding scheme used during single speech is wider than each coded audio band used during multiple simultaneous speech.

[0018] The stereo speech decoding device of the third invention is characterized by comprising: means for decoding main information consisting of the audio signal of at least one of a plurality of channels and additional information needed to synthesize the audio signals of the remaining channels from the main information; means for individually decoding the audio signals of the plurality of channels; means for distinguishing single speech from multiple simultaneous speech on the basis of the additional information; and means for selecting the decoded main information and additional information during single speech, and the individually decoded information during multiple simultaneous speech.

[0019] The single speech / multiple simultaneous speech discrimination device of the fourth invention is characterized by comprising: means for inputting audio signals to first and second low-speed filters; means for generating sign components by applying adaptive predictive (ADPCM) coding to each of the left-right sum component and difference component obtained by additively and subtractively combining the respective output signals of the first and second low-speed filters; means for inputting the generated sign components to a delay line; correlation generating means for averaging the output signals of the delay line to obtain a sign correlation over T samples; and judging means for judging multiple simultaneous speech when the correlation generating means obtains no correlation output over the T samples, and judging single speech when the correlation output is obtained.

[0020]

[0021]

[0022]

[Operation] In the present invention, stereo audio transmission is performed during double talk and only single speech is transmitted monaurally, which prevents the sound image from wavering. Simply switching to stereo transmission during double talk, however, would raise the transmission rate, if only temporarily. The quality is therefore degraded slightly during double talk only, so that stereo transmission is achieved without increasing the transmission rate.

[0023]

[Embodiments] Embodiments of the present invention are described in detail below with reference to the drawings.

[0024] FIG. 1 shows the configuration of a system according to one embodiment of the present invention.

[0025] As shown in the figure, the encoder side comprises first to third monaural speech encoders 101, 103, and 104 and a detector 102.

[0026] The first monaural speech encoder 101 compresses and encodes the sum of the left and right microphone outputs to, for example, 56 kbps during single speech. The detector 102 detects the level difference and delay difference between the left and right microphone output signals, as well as whether the speech is single or multiple simultaneous, and encodes this at, for example, 8 kbps. The second and third monaural speech encoders 103 and 104 individually encode the left and right microphone output signals during multiple simultaneous speech, at a bit rate lower than that of the first monaural speech encoder 101, for example 32 kbps each.

[0027] The decoder side comprises first to third monaural decoders 105, 108, and 109, pseudo-stereo generators 106 and 107, and selectors 110 and 111.

[0028] The first monaural decoder 105 decodes the code sent from the first monaural speech encoder 101 during single speech. The pseudo-stereo generators 106 and 107 apply a delay difference and a gain difference to this decoded output to generate pseudo-stereo audio. The second and third monaural decoders 108 and 109 decode the left and right codes sent from the second and third monaural speech encoders 103 and 104 during multiple simultaneous speech. On the basis of the single/multiple speech decision, the selectors 110 and 111 select and output either the outputs of the pseudo-stereo generators 106 and 107 or the outputs of the second and third monaural decoders 108 and 109.
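The pseudo-stereo generation step can be sketched as follows. This is a minimal illustration, not the circuit of the patent: the function name `pseudo_stereo`, the integer-sample delay, and the convention that a positive delay means the left channel lags are all assumptions made for the example.

```python
def pseudo_stereo(mono, delay_samples, level_ratio):
    """Return (left, right) lists derived from one decoded mono signal.

    delay_samples > 0 delays the left channel relative to the right;
    delay_samples < 0 delays the right channel instead (assumed convention).
    level_ratio is an assumed left/right amplitude ratio applied to the left channel.
    """
    def delayed(signal, d):
        # Shift by d samples, padding with zeros and keeping the original length.
        if d <= 0:
            return list(signal)
        return ([0.0] * d + list(signal))[:len(signal)]

    left = [level_ratio * s for s in delayed(mono, max(delay_samples, 0))]
    right = delayed(mono, max(-delay_samples, 0))
    return left, right
```

For example, `pseudo_stereo([1.0, 2.0, 3.0, 4.0], 1, 0.5)` gives a left channel that is one sample late and half the amplitude of the right channel.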

[0029] With this configuration, during single speech, which makes up most of a conversation, the first monaural speech encoder 101 allows high-quality pseudo-stereo audio transmission at a transmission rate of, for example, 64 kbps. During multiple simultaneous speech and in other states, the second and third monaural speech encoders 103 and 104 allow full stereo audio transmission with the left and right channels coded individually at, for example, 32 kbps each. Although the quality is slightly lower than during single speech, the signal can be coded and transmitted at a total of 64 kbps, so the total coding rate stays constant while disturbance of the sound image during multiple simultaneous speech is prevented and high-quality communication is obtained during single speech.
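The constant-rate property can be checked with simple arithmetic. The sketch below uses the example rates given in the text; the dictionary keys and the helper `total_rate` are names invented for illustration.

```python
LINE_RATE_KBPS = 64  # line rate given in the text

# Example bit allocations from the embodiment, in kbps.
SINGLE_SPEECH = {"mono_sum": 56, "side_info": 8}   # encoder 101 + detector 102
MULTI_SPEECH = {"left": 32, "right": 32}           # encoders 103 and 104

def total_rate(allocation):
    """Sum the per-stream rates of one transmission mode."""
    return sum(allocation.values())
```

Both modes add up to the same 64 kbps, which is what allows the encoder to switch modes without changing the line rate.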

[0030] Next, the configuration of each part is described in detail. In the following, a 7 kHz wideband speech coding scheme is applied during single speech, and a 3.4 kHz telephone-band speech coding scheme is applied during multiple simultaneous speech and in other states.

[0031] FIG. 2 shows a configuration example of the encoder side.

[0032] As shown in the figure, the left and right microphone output signals are each split by band-splitting filters 301, 302, 303, and 304 into two bands: a low band of 0 to 4 kHz (0 to 3.4 kHz during multiple simultaneous speech) and a high band of 4 to 7 kHz.

[0033] Of these filter outputs, the outputs of the high-band filters 301 and 304 are summed across left and right by an adder 305 and then coded at 16 kbps by an adaptive predictive (ADPCM) encoder 306, forming part of the transmit data for single speech. The outputs of the low-band filters 302 and 303 are combined by an adder 307 and a subtractor 308 into left-right sum and difference components, which are input to ADPCM encoders 309 and 310, respectively. The sum component is coded at 40 kbps by the ADPCM encoder 309 and forms part of the transmit data for single speech. In addition, a masker 300 removes the one LSB bit from each sample of this code, and the result, together with the output of the difference-component ADPCM encoder 310, forms the transmit data for multiple simultaneous speech at 32 kbps each.
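The masking step described above can be illustrated as follows. The text does not state the sampling rate or the code word length; the sketch assumes 8 kHz sampling of the low band, so that 40 kbps corresponds to 5 bits per sample and 32 kbps to 4 bits per sample. The function names and the choice of a right shift to drop the LSB are likewise assumptions.

```python
def mask_lsb(code5):
    """Drop the least significant bit of an assumed 5-bit ADPCM code word."""
    if not 0 <= code5 < 32:
        raise ValueError("expected a 5-bit code word")
    return code5 >> 1  # remaining 4-bit code word

def rate_kbps(bits_per_sample, sample_rate_hz=8000):
    """Bit rate in kbps for a given code word length (8 kHz is an assumption)."""
    return bits_per_sample * sample_rate_hz / 1000
```

Under these assumptions, removing one bit per sample takes the sum-component stream from 40 kbps to 32 kbps, matching the rates in the text.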

[0034] Further, the sign components of the outputs of the ADPCM encoders 309 and 310 and the input signals are input to an estimator 311, which detects the left-right level difference and delay difference and at the same time decides between single speech and multiple simultaneous speech.

[0035] A single-speech data combiner 312 combines the 16 kbps ADPCM high-band code, the 40 kbps ADPCM code of the low-band sum component, and the 8 kbps output code of the estimator 311 to generate 64 kbps transmit data.

[0036] A multiple-simultaneous-speech combiner 313 combines the 32 kbps output codes of the left and right ADPCM encoders 306 and 310 to generate 64 kbps transmit data.

[0037] At a switch 314, one of these transmit data sequences is selected according to the single/multiple speech decision signal output by the estimator 311 and sent out on the 64 kbps line.

[0038] FIG. 3 shows a configuration example of the decoder side.

[0039] As shown in the figure, the 64 kbps received data sequence is input to a distributor 315 for single speech and a distributor 316 for multiple simultaneous speech.

[0040] Of the outputs of the single-speech distributor 315, the 40 kbps ADPCM code is input to a low-band ADPCM decoder 317 and the 16 kbps ADPCM code is input to a high-band ADPCM decoder 318. The outputs of these decoders are converted into left and right pseudo-stereo audio by pseudo-stereo synthesizers 319, 320, 321, and 322 on the basis of the 8 kbps output of the distributor 315, which carries the delay difference and gain difference detected on the encoder side. They are then input to the band-synthesis low-band filters 323 and 324, with a band of 0.2 to 4 kHz (3.4 kHz during multiple simultaneous speech), and the high-band filters 325 and 326, with a band of 4 to 7 kHz. The outputs of these filters are band-combined by adders 327 and 328 to form the decoded signal for single speech.

[0041] Meanwhile, the two 32 kbps data sequences output by the multiple-simultaneous-speech distributor 316 are decoded by the low-band ADPCM decoders 317 and 326 and then input to an adder 330 and a subtractor 331, which restore the left and right signals from the sum and difference components. These outputs are passed by switches 332 and 333 to the band-synthesis low-band filters 323 and 324 only during multiple simultaneous speech.
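The sum/difference coding and its inversion by an adder and a subtractor can be sketched as below. The direction of the difference (left minus right) and the factor of 1/2 on reconstruction are assumptions; the text does not specify the scaling.

```python
def to_sum_diff(left, right):
    """Form the sum and difference components of two equal-length signals."""
    total = [l + r for l, r in zip(left, right)]
    diff = [l - r for l, r in zip(left, right)]  # assumed: left minus right
    return total, diff

def from_sum_diff(total, diff):
    """Recover left and right from the sum and difference components."""
    left = [(s + d) / 2 for s, d in zip(total, diff)]
    right = [(s - d) / 2 for s, d in zip(total, diff)]
    return left, right
```

Applying `from_sum_diff` to the output of `to_sum_diff` returns the original pair, which is the role the adder and subtractor play on the decoder side.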

[0042] The sign components of the input codes of the low-band ADPCM decoders 317 and 326 are input to a detector 334 and used as the signal for switching from the multiple-simultaneous-speech state to the single-speech state.

[0043] Switches 335 and 336 are used to suppress the high-band components, which cannot be decoded during multiple simultaneous speech.

[0044] FIG. 4 shows a configuration example of the estimator 311.

[0045] As shown in the figure, one of the sign components of the left and right low-band ADPCM encoders 309 and 310, the signal SIGN(R) (the right component in this example), is input to a tapped delay line 401 of N samples. The other sign component (the left component in this example) is input to a delay line 402 of N/2 samples, which ensures causality between left and right. The outputs of these delay lines are input to exclusive-OR circuits 403-1, ..., 403-N, one for each tap of the delay line 401, and are then averaged by up/down counters 404-1, ..., 404-N, which are cleared every T samples, to obtain the sign correlation over T samples.

[0046] Immediately before clearing, the outputs of the up/down counters 404-1, ..., 404-N are latched by a latch 405 and then encoded by a decoder circuit 406 to give the left-right delay difference information τ, which is updated every T samples.

[0047] A timer 407 generates the clear signal CL and the latch signal LTC every T samples. T is generally set to a value of, for example, about 100 msec.

[0048] Among the outputs of the decoder circuit 406, the code corresponding to the outputs of the latch circuit 405 being all 0 is detected by an OR circuit 408. This all-zero state, that is, the state in which no correlation output was obtained over the T samples, is judged to be the multiple-simultaneous-speech state.
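The sign-correlation estimator described above (tapped delay line, exclusive OR per tap, up/down counters, latch, and all-zero detection) can be sketched in software as follows. This is a hedged illustration, not the patented circuit: the function name `classify_frame`, the counter threshold that turns a count into a latched bit, and the choice of the largest counter as the delay estimate are assumptions, since the text does not specify how the latch and decoder derive their outputs.

```python
def classify_frame(sign_r, sign_l, n_taps, threshold):
    """Classify one frame of per-sample sign bits (each 0 or 1).

    Returns ("single", delay) or ("multiple", None), where delay is the lag
    of the left channel relative to the right in samples (assumed convention).
    """
    half = n_taps // 2
    counters = [0] * n_taps                  # one up/down counter per tap
    for n in range(n_taps, len(sign_r)):
        ref = sign_l[n - half]               # fixed N/2 delay on the other channel
        for k in range(n_taps):
            differ = sign_r[n - k] ^ ref     # exclusive OR of the two sign bits
            counters[k] += -1 if differ else 1
    # Assumed latch rule: a tap is "on" when its counter reaches the threshold.
    latched = [1 if c >= threshold else 0 for c in counters]
    if not any(latched):
        return "multiple", None              # no correlation found in this frame
    best = max(range(n_taps), key=lambda k: counters[k])
    return "single", best - half
```

With a single talker the two sign sequences are shifted copies of each other, so one tap's counter climbs steadily; with several talkers no tap agrees consistently, every counter stays near zero, and the frame is classified as multiple simultaneous speech.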

[0049] The above circuit is also used in the detector 334 on the decoder side, where it provides the signal for switching the decoder from multiple speech to single speech.

[0050] On the encoder side, level detectors 409 and 410 and a comparator 411 additionally detect the left-right level ratio l, which together with the delay difference forms the additional information.

[0051] Thus, in this embodiment, a stereo speech coding scheme whose sound image is not disturbed even during multiple simultaneous speech can be realized by adding relatively simple processing to a widely used wideband monaural ADPCM encoder/decoder.

[0052] The above embodiment is merely one example of carrying out the present invention, and various modifications are possible without departing from the spirit of the invention.

[0053]

[Effects of the Invention] As described above, according to the present invention, stereo audio transmission is performed during double talk and monaural audio transmission only during single speech, so wavering of the sound image is prevented and high-quality stereo audio is obtained.

[Brief description of the drawings]

[FIG. 1] A diagram showing the configuration of a system according to one embodiment of the present invention.

[FIG. 2] A configuration example of the encoder side shown in FIG. 1.

[FIG. 3] A configuration example of the decoder side shown in FIG. 1.

[FIG. 4] A configuration example of the estimator shown in FIG. 2.

[FIG. 5] A diagram showing the general configuration of a teleconference system.

[FIG. 6] A diagram for explaining the conventional stereo audio transmission scheme.

[Explanation of symbols]

101, 103, 104 ... first to third monaural speech encoders
102 ... detector
105, 108, 109 ... first to third monaural decoders
106, 107 ... pseudo-stereo generators
110, 111 ... selectors

Continuation of the front page (58) Fields searched (Int.Cl.⁷, DB name) H04B 14/04 H04N 7/15 H04M 3/56

Claims

(57) [Claims]

1. A stereo speech encoding/decoding system for encoding and decoding audio signals of a plurality of channels, characterized by having a function of distinguishing single speech from multiple simultaneous speech, wherein, during single speech, main information consisting of the audio signal of at least one of the plurality of channels and additional information needed to synthesize the audio signals of the remaining channels from the main information are encoded and decoded, and, during multiple simultaneous speech, the audio signals of the plurality of channels are encoded and decoded individually.

2. The stereo speech encoding/decoding system according to claim 1, characterized in that the coded audio band of the main-information coding scheme used during single speech is wider than each coded audio band used during multiple simultaneous speech.

3. A stereo speech decoding device characterized by comprising: means for decoding main information consisting of the audio signal of at least one of a plurality of channels and additional information needed to synthesize the audio signals of the remaining channels from the main information; means for individually decoding the audio signals of the plurality of channels; means for distinguishing single speech from multiple simultaneous speech on the basis of the additional information; and means for selecting the decoded main information and additional information during single speech, and the individually decoded information during multiple simultaneous speech.

4. A single speech / multiple simultaneous speech discrimination device characterized by comprising: means for inputting audio signals to first and second low-speed filters; means for generating sign components by applying adaptive predictive (ADPCM) coding to each of the left-right sum component and difference component obtained by additively and subtractively combining the respective output signals of the first and second low-speed filters; means for inputting the generated sign components to a delay line; correlation generating means for averaging the output signals of the delay line to obtain a sign correlation over T samples; and judging means for judging multiple simultaneous speech when the correlation generating means obtains no correlation output over the T samples, and judging single speech when the correlation output is obtained.