NZ611801B2 - Device and method for quantizing the gains of the adaptive and fixed contributions of the excitation in a celp codec - Google Patents

Device and method for quantizing the gains of the adaptive and fixed contributions of the excitation in a celp codec

Info

Publication number
NZ611801B2
NZ611801B2 NZ611801A NZ61180112A NZ611801B2 NZ 611801 B2 NZ611801 B2 NZ 611801B2 NZ 611801 A NZ611801 A NZ 611801A NZ 61180112 A NZ61180112 A NZ 61180112A NZ 611801 B2 NZ611801 B2 NZ 611801B2
Authority
NZ
New Zealand
Prior art keywords
gain
excitation
frame
fixed
contribution
Prior art date
Application number
NZ611801A
Other versions
NZ611801A (en
Inventor
Vladimir Malenovsky
Original Assignee
Voiceage Evs Llc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Voiceage Evs Llc filed Critical Voiceage Evs Llc
Priority claimed from PCT/CA2012/000138 external-priority patent/WO2012109734A1/en
Publication of NZ611801A publication Critical patent/NZ611801A/en
Publication of NZ611801B2 publication Critical patent/NZ611801B2/en

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 - Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/083 - Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being an excitation gain

Abstract

Disclosed is a device for quantizing a gain of a fixed contribution of an excitation in a frame, including sub-frames, of a coded sound signal. The device includes an input for a parameter t having a value representative of a classification of the frame. The device also includes an estimator of the gain of the fixed contribution of the excitation in a sub-frame of the frame. The estimator uses the value of the parameter t as a multiplicative factor in at least one term of a function used to calculate the estimated gain of the fixed contribution of the excitation. The device further includes a predictive quantizer of the gain of the fixed contribution of the excitation, in the sub-frame, using the estimated gain.

Description

TITLE
DEVICE AND METHOD FOR QUANTIZING THE GAINS OF ADAPTIVE AND FIXED CONTRIBUTIONS OF THE EXCITATION IN A CELP CODEC
FIELD
The present disclosure relates to quantization of the gain of a fixed contribution of an excitation in a coded sound signal. The present disclosure also relates to joint quantization of the gains of the adaptive and fixed contributions of the excitation.
BACKGROUND
In a coder of a codec structure, for example a CELP (Code-Excited Linear Prediction) codec structure such as ACELP (Algebraic Code-Excited Linear Prediction), an input speech or audio signal (sound signal) is processed in short segments, called frames. In order to capture rapidly varying properties of an input sound signal, each frame is further divided into sub-frames. A CELP codec structure also produces adaptive codebook and fixed codebook contributions of an excitation that are added together to form a total excitation.
Gains related to the adaptive and fixed codebook contributions of the excitation are quantized and transmitted to a decoder along with other encoding parameters. The adaptive codebook contribution and the fixed codebook contribution of the excitation will be referred to as "the adaptive contribution" and "the fixed contribution" of the excitation throughout the document.
WO 2012/109734
There is a need for a technique for quantizing the gains of the adaptive and fixed excitation contributions that improves the robustness of the codec against frame erasures or packet losses that can occur during transmission of the encoding parameters from the coder to the decoder.
SUMMARY
According to a first aspect, the present disclosure relates to a device for quantizing a gain of a fixed contribution of an excitation in a frame, including sub-frames, of a coded sound signal, comprising: an input for a parameter representative of a classification of the frame; an estimator of the gain of the fixed contribution of the excitation in a sub-frame of the frame, wherein the estimator is supplied with the parameter representative of the classification of the frame; and a predictive quantizer of the gain of the fixed contribution of the excitation, in the sub-frame, using the estimated gain.
The present disclosure also relates to a method for quantizing a gain of a fixed contribution of an excitation in a frame, including sub-frames, of a coded sound signal, comprising: receiving a parameter representative of a classification of the frame; estimating the gain of the fixed contribution of the excitation in a sub-frame of the frame, using the parameter representative of the classification of the frame; and predictive quantizing the gain of the fixed contribution of the excitation, in the sub-frame, using the estimated gain.
According to a third aspect, there is provided a device for jointly quantizing gains of adaptive and fixed contributions of an excitation in a frame of a coded sound signal, comprising: a quantizer of the gain of the adaptive contribution of the excitation; and the above described device for quantizing the gain of the fixed contribution of the excitation.
The present disclosure further relates to a method for jointly quantizing gains of adaptive and fixed contributions of an excitation in a frame of a coded sound signal, comprising: quantizing the gain of the adaptive contribution of the excitation; and quantizing the gain of the fixed contribution of the excitation using the above described method.
According to a fifth aspect, there is provided a device for retrieving a quantized gain of a fixed contribution of an excitation in a sub-frame of a frame, comprising: a receiver of a gain codebook index; an estimator of the gain of the fixed contribution of the excitation in the sub-frame, wherein the estimator is supplied with a parameter representative of a classification of the frame; a gain codebook for supplying a correction factor in response to the gain codebook index; and a multiplier of the estimated gain by the correction factor to provide a quantized gain of the fixed contribution of the excitation in the sub-frame.
The present disclosure is also concerned with a method for retrieving a quantized gain of a fixed contribution of an excitation in a sub-frame of a frame, comprising: receiving a gain codebook index; estimating the gain of the fixed contribution of the excitation in the sub-frame, using a parameter representative of a classification of the frame; supplying, from a gain codebook and for the sub-frame, a correction factor in response to the gain codebook index; and multiplying the estimated gain by the correction factor to provide a quantized gain of the fixed contribution of the excitation in said sub-frame.
The present disclosure is still further concerned with a device for retrieving quantized gains of adaptive and fixed contributions of an excitation in a sub-frame of a frame, comprising: a receiver of a gain codebook index; an estimator of the gain of the fixed contribution of the excitation in the sub-frame, wherein the estimator is supplied with a parameter representative of the classification of the frame; a gain codebook for supplying the quantized gain of the adaptive contribution of the excitation and a correction factor for the sub-frame in response to the gain codebook index; and a multiplier of the estimated gain by the correction factor to provide a quantized gain of the fixed contribution of the excitation in the sub-frame.
According to a further aspect, the disclosure describes a method for retrieving quantized gains of adaptive and fixed contributions of an excitation in a sub-frame of a frame, comprising: receiving a gain codebook index; estimating the gain of the fixed contribution of the excitation in the sub-frame, using a parameter representative of a classification of the frame; supplying, from a gain codebook and for the sub-frame, the quantized gain of the adaptive contribution of the excitation and a correction factor in response to the gain codebook index; and multiplying the estimated gain by the correction factor to provide a quantized gain of the fixed contribution of the excitation in the sub-frame.
The foregoing and other features will become more apparent upon reading of the following non-restrictive description of illustrative embodiments, given by way of example only with reference to the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
In the appended drawings:
Figure 1 is a schematic diagram describing the construction of a filtered excitation in a CELP-based coder;
Figure 2 is a schematic block diagram describing an estimator of the gain of the fixed contribution of the excitation in a first sub-frame of each frame;
Figure 3 is a schematic block diagram describing an estimator of the gain of the fixed contribution of the excitation in all sub-frames following the first sub-frame;
Figure 4 is a schematic block diagram describing a state machine in which estimation coefficients are calculated and used for designing a gain codebook for each sub-frame;
Figure 5 is a schematic block diagram describing a gain quantizer;
Figure 6 is a schematic block diagram of another embodiment of gain quantizer equivalent to the gain quantizer of Figure 5.
DETAILED DESCRIPTION
In the following, there is described quantization of a gain of a fixed contribution of an excitation in a coded sound signal, as well as joint quantization of gains of adaptive and fixed contributions of the excitation. The quantization can be applied to any number of sub-frames and deployed with any input speech or audio signal (input sound signal) sampled at any arbitrary sampling frequency.
Also, the gains of the adaptive and fixed contributions of the excitation are quantized without the need of inter-frame prediction. The absence of inter-frame prediction results in improvement of the robustness against frame erasures or packet losses that can occur during transmission of encoded parameters.
The gain of the adaptive contribution of the excitation is quantized directly whereas the gain of the fixed contribution of the excitation is quantized through an estimated gain. The estimation of the gain of the fixed contribution of the excitation is based on parameters that exist both at the coder and the decoder.
These parameters are calculated during processing of the current frame. Thus, no information from a previous frame is required in the course of quantization or decoding which, as mentioned hereinabove, improves the robustness of the codec against frame erasures.
Although the following description will refer to a CELP (Code-Excited Linear Prediction) codec structure, for example ACELP (Algebraic Code-Excited Linear Prediction), it should be kept in mind that the subject matter of the present disclosure may be applied to other types of codec structures.
Optimal unquantized gains for the adaptive and fixed contributions of excitation
In the art of CELP coding, the excitation is composed of two contributions: the adaptive contribution (adaptive codebook excitation) and the fixed contribution (fixed codebook excitation). The adaptive codebook is based on long-term prediction and is therefore related to the past excitation. The adaptive contribution of the excitation is found by means of a closed-loop search around an estimated value of a pitch lag. The estimated pitch lag is found by means of a correlation analysis. The closed-loop search consists of minimizing the mean square weighted error (MSWE) between a target signal (in CELP coding, a perceptually filtered version of the input speech or audio signal (input sound signal)) and the filtered adaptive contribution of the excitation scaled by an adaptive codebook gain. The filter in the closed-loop search corresponds to the weighted synthesis filter known in the art of CELP coding. A fixed codebook search is also carried out by minimizing the mean squared error (MSE) between an updated target signal (after removing the adaptive contribution of the excitation) and the filtered fixed contribution of the excitation scaled by a fixed codebook gain.
The construction of the total filtered excitation is shown in Figure 1. For further reference, an implementation of CELP coding is described in the following document: 3GPP TS 26.190, "Adaptive Multi-Rate - Wideband (AMR-WB) speech codec; Transcoding functions", of which the full contents is herein incorporated by reference.
Figure 1 is a schematic diagram describing the construction of the filtered total excitation in a CELP coder. The input signal 101, formed by the above mentioned target signal, is denoted as $x(i)$ and is used as a reference during the search of gains for the adaptive and fixed contributions of the excitation. The filtered adaptive contribution of the excitation is denoted as $y(i)$ and the filtered fixed contribution of the excitation (innovation) is denoted as $z(i)$. The corresponding gains are denoted as $g_p$ for the adaptive contribution and $g_c$ for the fixed contribution of the excitation. As illustrated in Figure 1, an amplifier 104 applies the gain $g_p$ to the filtered adaptive contribution $y(i)$ of the excitation and an amplifier 105 applies the gain $g_c$ to the filtered fixed contribution $z(i)$ of the excitation. The optimal quantized gains are found by means of minimization of the mean square of the error signal $e(i)$ calculated through a first subtractor 107 subtracting the signal $g_p y(i)$ at the output of the amplifier 104 from the target signal $x(i)$ and a second subtractor 108 subtracting the signal $g_c z(i)$ at the output of the amplifier 105 from the result of the subtraction from the subtractor 107. For all signals in Figure 1, the index $i$ denotes the different signal samples and runs from 0 to $L-1$, where $L$ is the length of each sub-frame. As well known to people skilled in the art, the filtered adaptive codebook contribution is usually computed as the convolution between the adaptive codebook excitation vector $v(n)$ and the impulse response of the weighted synthesis filter $h(n)$, that is $y(n) = v(n) * h(n)$. Similarly, the filtered fixed codebook excitation $z(n)$ is given by $z(n) = c(n) * h(n)$, where $c(n)$ is the fixed codebook excitation.
Assuming the knowledge of the target signal $x(i)$, the filtered adaptive contribution of the excitation $y(i)$ and the filtered fixed contribution of the excitation $z(i)$, the optimal set of unquantized gains $g_p$ and $g_c$ is found by minimizing the energy of the error signal $e(i)$ given by the following relation:
$$e(i) = x(i) - g_p\,y(i) - g_c\,z(i), \quad i = 0, \ldots, L-1 \qquad (1)$$
Equation (1) can be given in vector form as
$$\mathbf{e} = \mathbf{x} - g_p\,\mathbf{y} - g_c\,\mathbf{z} \qquad (2)$$
and minimizing the energy of the error signal, $\mathbf{e}^t\mathbf{e} = \sum_{i=0}^{L-1} e^2(i)$, where $t$ denotes vector transpose, results in optimum unquantized gains
$$g_{p,opt} = \frac{c_1 c_2 - c_3 c_4}{c_0 c_2 - c_4^2}, \qquad g_{c,opt} = \frac{c_0 c_3 - c_1 c_4}{c_0 c_2 - c_4^2} \qquad (3)$$
where the constants or correlations $c_0$, $c_1$, $c_2$, $c_3$, $c_4$ and $c_5$ are calculated as
$$c_0 = \mathbf{y}^t\mathbf{y}, \quad c_1 = \mathbf{x}^t\mathbf{y}, \quad c_2 = \mathbf{z}^t\mathbf{z}, \quad c_3 = \mathbf{x}^t\mathbf{z}, \quad c_4 = \mathbf{y}^t\mathbf{z}, \quad c_5 = \mathbf{x}^t\mathbf{x}. \qquad (4)$$
The optimum gains in Equation (3) are not quantized directly, but they are used in training a gain codebook as will be described later. The gains are quantized jointly, after applying prediction to the gain of the fixed contribution of the excitation. The prediction is performed by computing an estimated value of the gain $g_{c0}$ of the fixed contribution of the excitation. The gain of the fixed contribution of the excitation is given by $g_c = g_{c0} \cdot \gamma$, where $\gamma$ is a correction factor. Therefore, each codebook entry contains two values. The first value corresponds to the quantized gain $g_p$ of the adaptive contribution of the excitation. The second value corresponds to the correction factor $\gamma$ which is used to multiply the estimated gain $g_{c0}$ of the fixed contribution of the excitation. The optimum index in the gain codebook ($g_p$ and $\gamma$) is found by minimizing the mean squared error between the target signal and filtered total excitation. Estimation of the gain of the fixed contribution of the excitation is described in detail below.
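As an illustration of the closed-form solution for the optimum unquantized gains, the following minimal Python sketch computes the correlations c0 to c4 and the two gains for one sub-frame. The function name `optimal_gains` and the toy signals are assumptions made for this example only; they do not come from the disclosure.

```python
def optimal_gains(x, y, z):
    """Return (gp_opt, gc_opt) minimizing sum((x - gp*y - gc*z)**2).

    x: target signal, y: filtered adaptive contribution,
    z: filtered fixed contribution (all of sub-frame length L).
    """
    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    c0 = dot(y, y)  # y'y
    c1 = dot(x, y)  # x'y
    c2 = dot(z, z)  # z'z
    c3 = dot(x, z)  # x'z
    c4 = dot(y, z)  # y'z
    det = c0 * c2 - c4 * c4
    if det == 0.0:
        raise ValueError("y and z are linearly dependent; gains are not unique")
    gp = (c1 * c2 - c3 * c4) / det
    gc = (c0 * c3 - c1 * c4) / det
    return gp, gc


# Toy sub-frame (hypothetical data): the target is an exact mix of y and z,
# so the recovered gains should match the mixing weights.
y = [1.0, 0.0, 2.0, -1.0]
z = [0.0, 1.0, 1.0, 3.0]
x = [0.8 * yi + 2.5 * zi for yi, zi in zip(y, z)]
gp_opt, gc_opt = optimal_gains(x, y, z)
```

The correlation c5 (the target energy) does not affect the gains themselves, which is why the sketch omits it.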
Estimation of the gain of the fixed contribution of the excitation Each frame contains a certain, number of sub-frames. Let us denote the number of sub-frames in a frame as K and the index of the t sub-frame as k. The tion goo of the gain of the fixed contribution of the excitation is performed ently in each sub-frame.
Figure 2 is a schematic block diagram describing an estimator 200 of the gain of the fixed bution of the excitation (hereinafter fixed codebook gain) in a first sub-frame of each frame.
The estimator 200 first ates an tion of the fixed codebook gain in response to a parameter frepresentative of the classification of the current frame. The energy of the innovation codevector from the fixed' codebook is then subtracted from the estimated fixed codebook gain to take into consideration this energy of the filtered innovation ctor. The resulting, estimated fixed codebook gain is multiplied by a correction factor selected from a gain codebook to produce the quantized fixed codebook gain 96.
In one embodiment, the estimator 200 comprises a calculator 201 of a linear estimation of the fixed codebook gain in logarithmic domain. The fixed codebook gain is ted assuming unity-energy of the innovation codevector 202 from the fixed codebook. Only one estimation parameter is used by the calculator 201 , the ter trepresentative of the classification of the current frame. A subtractor 203 then subtracts the energy of the filtered innovation codevector 202 from the fixed ok in logarithmic domain from the linear estimated fixed codebook gain in logarithmic domain at the output of the calculator 201 . A converter 204 converts the estimated fixed ok gain in logarithmic domain from the subtractor 203 to linear domain. The output in linear domain from the converter 204 is the estimated fixed codebook gain 900. A multiplier 205 multiplies the estimated gain God by the correction factor 206 selected from the gain codebook. As described in the preceding paragraph, the output of the multiplier 205 constitutes the quantized fixed codebook gain gc.
The quantized gain $g_p$ of the adaptive contribution of the excitation (hereinafter the adaptive codebook gain) is selected directly from the gain codebook. A multiplier 207 multiplies the filtered adaptive excitation 208 from the adaptive codebook by the quantized adaptive codebook gain $g_p$ to produce the filtered adaptive contribution 209 of the filtered excitation. Another multiplier 210 multiplies the filtered innovation codevector 202 from the fixed codebook by the quantized fixed codebook gain $g_c$ to produce the filtered fixed contribution 211 of the filtered excitation. Finally, an adder 212 sums the filtered adaptive 209 and fixed 211 contributions of the excitation to form the total filtered excitation 214.
In the first sub-frame of the current frame, the estimated fixed codebook gain in logarithmic domain at the output of the subtractor 203 is given by
$$G_{c0}^{(1)} = a_0 + a_1 t - \log_{10}\!\left(\sqrt{E_i}\right) \qquad (5)$$
where $G_{c0}^{(1)} = \log_{10}\!\left(g_{c0}^{(1)}\right)$.
The inner term inside the logarithm of Equation (5) corresponds to the square root of the energy of the filtered innovation vector 202 ($E_i$ is the energy of the filtered innovation vector in the first sub-frame of frame $n$). This inner term (square root of the energy $E_i$) is determined by a first calculator 215 of the energy $E_i$ of the filtered innovation vector 202 and a calculator 216 of the square root of that energy $E_i$. A calculator 217 then computes the logarithm of the square root of the energy $E_i$ for application to the negative input of the subtractor 203. The inner term (square root of the energy $E_i$) has non-zero energy; the energy is incremented by a small amount in case of all-zero frames to avoid log(0).
The estimation of the fixed codebook gain in calculator 201 is linear in logarithmic domain with estimation coefficients $a_0$ and $a_1$ which are found for each sub-frame by means of a mean square minimization on a large signal database (training) as will be explained in the following description. The only estimation parameter in the equation, $t$, denotes the classification parameter for frame $n$ (in one embodiment, this value is constant for all sub-frames in frame $n$). Details about classification of the frames are given below. Finally, the estimated value of the gain in logarithmic domain is converted back to the linear domain ($g_{c0}^{(1)} = 10^{G_{c0}^{(1)}}$) by the calculator 204 and used in the search process for the best index of the gain codebook as will be explained in the following description.
The superscript (1) denotes the first sub-frame of the current frame $n$.
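The first sub-frame estimator can be sketched in a few lines of Python. The sketch computes the log-domain estimate a0 + a1*t - log10(sqrt(Ei)), converts it to the linear domain, and applies a correction factor. The function names, the coefficient values and the `floor` constant used for all-zero vectors are assumptions for illustration; the disclosure only states that the energy is incremented by a small amount.

```python
import math


def estimate_first_subframe_gain(a0, a1, t, filtered_innovation, floor=1e-10):
    """Estimated fixed codebook gain g_c0 in the first sub-frame (linear domain)."""
    energy = sum(s * s for s in filtered_innovation)
    if energy == 0.0:
        energy = floor  # assumed small increment so the logarithm is defined
    log_gain = a0 + a1 * t - math.log10(math.sqrt(energy))
    return 10.0 ** log_gain


def quantized_fixed_gain(gc0, gamma):
    """Quantized fixed codebook gain: estimated gain times correction factor."""
    return gc0 * gamma


# Hypothetical coefficients and data: energy is 25, so sqrt(E_i) is 5 and
# the estimate is 10**(1.0 + 0.5*2) / 5.
gc0 = estimate_first_subframe_gain(a0=1.0, a1=0.5, t=2, filtered_innovation=[3.0, 4.0])
gc = quantized_fixed_gain(gc0, gamma=1.25)
```

Because every input is available in the current frame at both the coder and the decoder, the decoder can run the same function and only needs the codebook index to recover the correction factor.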
As explained in the foregoing description, the parameter $t$ representative of the classification of the current frame is used in the calculation of the estimated fixed codebook gain $g_{c0}$. Different codebooks can be designed for different classes of voice signals. However, this will increase memory requirements. Also, estimation of the fixed codebook gain in the sub-frames following the first sub-frame can be based on the frame classification parameter $t$ and the available adaptive and fixed codebook gains from previous sub-frames in the current frame. The estimation is confined to the frame boundary to increase robustness against frame erasures.
For example, frames can be classified as unvoiced, voiced, generic, or transition frames. Different alternatives can be used for classification. An example is given later below as a non-limitative illustrative embodiment. Further, the number of voice classes can be different from the one used hereinabove. For example, the classification can be only voiced or unvoiced in one embodiment. In another embodiment, more classes can be added such as strongly voiced and strongly unvoiced.
The values for the classification estimation parameter $t$ can be chosen arbitrarily. For example, for narrowband signals, the values of parameter $t$ are set to 1, 3, 5, and 7 for unvoiced, voiced, generic, and transition frames, respectively, and for wideband signals, they are set to 0, 2, 4, and 6, respectively. However, other values for the estimation parameter $t$ can be used for each class. Including this classification estimation parameter $t$ in the design and training for determining estimation parameters will result in better estimation $g_{c0}$ of the fixed codebook gain.
The sub-frames following the first sub-frame in a frame use a slightly different estimation scheme. The difference is in fact that in these sub-frames, both the quantized adaptive codebook gain and the quantized fixed codebook gain from the previous sub-frame(s) in the current frame are used as auxiliary estimation parameters to increase the efficiency.
Figure 3 is a schematic block diagram of an estimator 300 for estimating the fixed codebook gain in the sub-frames following the first sub-frame in a current frame. The estimation parameters include the classification parameter $t$ and the quantized values (parameters 301) of both the adaptive and fixed codebook gains from previous sub-frames of the current frame. These parameters 301 are denoted as $g_p^{(1)}$, $g_c^{(1)}$, $g_p^{(2)}$, $g_c^{(2)}$, etc., where the superscript refers to first, second and other previous sub-frames. An estimation of the fixed codebook gain is calculated and is multiplied by a correction factor selected from the gain codebook to produce a quantized fixed codebook gain $g_c$, forming the gain of the fixed contribution of the excitation (this estimated fixed codebook gain is different from that of the first sub-frame).
In one embodiment, a calculator 302 computes a linear estimation of the fixed codebook gain again in logarithmic domain and a converter 303 converts the gain estimation back to linear domain. The quantized adaptive codebook gains $g_p^{(1)}$, $g_p^{(2)}$, etc. from the previous sub-frames are supplied to the calculator 302 directly while the quantized fixed codebook gains $g_c^{(1)}$, $g_c^{(2)}$, etc. from the previous sub-frames are supplied to the calculator 302 in logarithmic domain through a logarithm calculator 304. A multiplier 305 then multiplies the estimated fixed codebook gain $g_{c0}$ (which is different from that of the first sub-frame) from the converter 303 by the correction factor 306, selected from the gain codebook. As described in the preceding paragraph, the multiplier 305 then outputs a quantized fixed codebook gain $g_c$, forming the gain of the fixed contribution of the excitation.
A first multiplier 307 multiplies the filtered adaptive excitation 308 from the adaptive codebook by the quantized adaptive codebook gain $g_p$ selected directly from the gain codebook to produce the adaptive contribution 309 of the excitation. A second multiplier 310 multiplies the filtered innovation codevector 311 from the fixed codebook by the quantized fixed codebook gain $g_c$ to produce the fixed contribution 312 of the excitation. An adder 313 sums the filtered adaptive 309 and filtered fixed 312 contributions of the excitation together so as to form the total filtered excitation 314 for the current frame.
The estimated fixed codebook gain from the calculator 302 in the $k$-th sub-frame of the current frame in logarithmic domain is given by
$$G_{c0}^{(k)} = a_0 + a_1 t + \sum_{j=1}^{k-1}\left(b_{2j-2}\,G_c^{(j)} + b_{2j-1}\,g_p^{(j)}\right), \quad k = 2, \ldots, K \qquad (6)$$
where $G_c^{(k)} = \log_{10}\!\left(g_c^{(k)}\right)$ is the quantized fixed codebook gain in logarithmic domain in sub-frame $k$, and $g_p^{(k)}$ is the quantized adaptive codebook gain in sub-frame $k$.
For example, in one embodiment, four (4) sub-frames are used ($K=4$) so the estimated fixed codebook gains, in logarithmic domain, in the second, third, and fourth sub-frames from the calculator 302 are given by the following relations:
$$G_{c0}^{(2)} = a_0 + a_1 t + b_0 G_c^{(1)} + b_1 g_p^{(1)},$$
$$G_{c0}^{(3)} = a_0 + a_1 t + b_0 G_c^{(1)} + b_1 g_p^{(1)} + b_2 G_c^{(2)} + b_3 g_p^{(2)}, \text{ and}$$
$$G_{c0}^{(4)} = a_0 + a_1 t + b_0 G_c^{(1)} + b_1 g_p^{(1)} + b_2 G_c^{(2)} + b_3 g_p^{(2)} + b_4 G_c^{(3)} + b_5 g_p^{(3)}.$$
The above estimation of the fixed codebook gain is based on both the quantized adaptive and fixed codebook gains of all previous sub-frames of the current frame. There is also another difference between this estimation scheme and the one used in the first sub-frame. The energy of the filtered innovation vector from the fixed codebook is not subtracted from the linear estimation of the fixed codebook gain in the logarithmic domain from the calculator 302. The reason comes from the use of the quantized adaptive codebook and fixed codebook gains from the previous sub-frames in the estimation equation. In the first sub-frame, the linear estimation is performed by the calculator 201 assuming unit energy of the innovation vector. Subsequently, this energy is subtracted to bring the estimated fixed codebook gain to the same energetic level as its optimal value (or at least close to it). In the second and subsequent sub-frames, the previous quantized values of the fixed codebook gain are already at this level so there is no need to take the energy of the filtered innovation vector into consideration. The estimation coefficients $a_i$ and $b_i$ are different for each sub-frame and they are determined offline using a large training database as will be described later below.
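The estimator for the second and subsequent sub-frames can be sketched as follows. For sub-frame k, the log-domain estimate is a0 + a1*t plus, for each previous sub-frame j of the same frame, b[2j-2]*log10(g_c^(j)) + b[2j-1]*g_p^(j). The function name, argument layout and numeric values are assumptions for illustration only.

```python
import math


def estimate_later_subframe_gain(a0, a1, b, t, prev_gc, prev_gp):
    """Estimated fixed codebook gain g_c0 for sub-frame k >= 2 (linear domain).

    prev_gc, prev_gp: quantized fixed and adaptive codebook gains of the
    k-1 previous sub-frames of the current frame, oldest first.
    b: the 2*(k-1) coefficients b_0 ... b_{2k-3} for this sub-frame.
    """
    if len(prev_gc) != len(prev_gp) or len(b) != 2 * len(prev_gc):
        raise ValueError("need two b coefficients per previous sub-frame")
    log_gain = a0 + a1 * t
    for j, (gc_j, gp_j) in enumerate(zip(prev_gc, prev_gp)):
        # Fixed codebook gains enter in log domain, adaptive gains directly.
        log_gain += b[2 * j] * math.log10(gc_j) + b[2 * j + 1] * gp_j
    return 10.0 ** log_gain


# Third sub-frame (k = 3) with hypothetical coefficients and previous gains.
gc0_k3 = estimate_later_subframe_gain(
    a0=0.1, a1=0.2, b=[0.5, 0.3, 0.25, 0.1], t=2,
    prev_gc=[100.0, 10.0], prev_gp=[1.0, 0.5],
)
```

Note that, unlike the first sub-frame, no innovation energy term is subtracted, and nothing outside the current frame is read.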
Calculation of estimation coefficients
An optimal set of estimation coefficients is found on a large database containing clean, noisy and mixed speech signals in various languages and levels and with male and female talkers.
The estimation coefficients are calculated by running the codec with optimal unquantized values of adaptive and fixed codebook gains on the large database. It is reminded that the optimal unquantized adaptive and fixed codebook gains are found according to Equations (3) and (4).
In the following description it is assumed that the database comprises $N+1$ frames, and the frame index is $n = 0, \ldots, N$. The frame index $n$ is added to the parameters used in the training which vary on a frame basis (classification, first sub-frame innovation energy, and optimum adaptive and fixed codebook gains).
The estimation coefficients are found by minimizing the mean square error between the estimated fixed codebook gain and the optimum gain in the logarithmic domain over all frames in the database.
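For the first sub-frame, the mean square error energy is given by
$$E_{est}^{(1)} = \sum_{n=0}^{N}\left[G_{c0}^{(1)}(n) - \log_{10}\!\left(g_{c,opt}^{(1)}(n)\right)\right]^2 \qquad (7)$$
From Equation (5), the estimated fixed codebook gain in the first sub-frame of frame $n$ is given by
$$G_{c0}^{(1)}(n) = a_0 + a_1 t(n) - \log_{10}\!\left(\sqrt{E_i(n)}\right),$$
then the mean square error energy is given by
$$E_{est}^{(1)} = \sum_{n=0}^{N}\left[a_0 + a_1 t(n) - \log_{10}\!\left(\sqrt{E_i(n)}\right) - \log_{10}\!\left(g_{c,opt}^{(1)}(n)\right)\right]^2 \qquad (8)$$
The minimization problem may be simplified by defining a normalized gain of the innovation vector in logarithmic domain. That is
$$G_i^{(1)}(n) = \log_{10}\!\left(\sqrt{E_i(n)}\right) + \log_{10}\!\left(g_{c,opt}^{(1)}(n)\right), \quad n = 0, \ldots, N \qquad (9)$$
The total error energy then becomes
$$E_{est}^{(1)} = \sum_{n=0}^{N}\left[a_0 + a_1 t(n) - G_i^{(1)}(n)\right]^2 \qquad (10)$$
The solution of the above defined MSE (Mean Square Error) problem is found by the following pair of partial derivatives
$$\frac{\partial}{\partial a_0} E_{est}^{(1)} = 0, \qquad \frac{\partial}{\partial a_1} E_{est}^{(1)} = 0.$$
The optimal values of estimation coefficients resulting from the above equations are given by
$$a_0 = \frac{\sum_{n=0}^{N} t^2(n) \sum_{n=0}^{N} G_i^{(1)}(n) - \sum_{n=0}^{N} t(n) \sum_{n=0}^{N} t(n)\,G_i^{(1)}(n)}{N \sum_{n=0}^{N} t^2(n) - \left[\sum_{n=0}^{N} t(n)\right]^2}, \qquad a_1 = \frac{N \sum_{n=0}^{N} t(n)\,G_i^{(1)}(n) - \sum_{n=0}^{N} t(n) \sum_{n=0}^{N} G_i^{(1)}(n)}{N \sum_{n=0}^{N} t^2(n) - \left[\sum_{n=0}^{N} t(n)\right]^2} \qquad (11)$$
Estimation of the fixed codebook gain in the first sub-frame is performed in logarithmic domain and the estimated fixed codebook gain should be as close as possible to the normalized gain of the innovation vector in logarithmic domain, $G_i^{(1)}(n)$.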
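The first sub-frame coefficients amount to an ordinary least-squares line fit of the normalized log-domain gain against the classification parameter. The following Python sketch implements that closed form; the function name and training values are hypothetical, and the factor written N in Equation (11) is taken here as the number of training frames actually supplied, which is an interpretation made for this example.

```python
def fit_first_subframe_coeffs(t, g_norm):
    """Least-squares fit of g_norm[n] ~ a0 + a1 * t[n]; returns (a0, a1).

    t: classification parameter per training frame.
    g_norm: normalized log-domain gain of the innovation per training frame.
    """
    count = len(t)  # number of training frames (assumed meaning of N)
    sum_t = sum(t)
    sum_g = sum(g_norm)
    sum_tt = sum(ti * ti for ti in t)
    sum_tg = sum(ti * gi for ti, gi in zip(t, g_norm))
    den = count * sum_tt - sum_t * sum_t
    if den == 0:
        raise ValueError("all frames share one class; a0 and a1 are not separable")
    a0 = (sum_tt * sum_g - sum_t * sum_tg) / den
    a1 = (count * sum_tg - sum_t * sum_g) / den
    return a0, a1


# Hypothetical training set: wideband class values and gains that follow
# G = 1.0 + 0.25 * t exactly, so the fit should recover those coefficients.
t_train = [0, 2, 4, 6, 0, 2]
g_train = [1.0 + 0.25 * ti for ti in t_train]
a0_fit, a1_fit = fit_first_subframe_coeffs(t_train, g_train)
```

The guard on the denominator reflects a practical requirement of the training set: it has to contain frames from at least two different classes.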
For the second and other subsequent sub-frames, the estimation scheme is slightly different. The error energy is given by
$$E_{est}^{(k)} = \sum_{n=0}^{N}\left[G_{c0}^{(k)}(n) - G_{c,opt}^{(k)}(n)\right]^2, \quad k = 2, \ldots, K \qquad (12)$$
where $G_{c,opt}^{(k)} = \log_{10}\!\left(g_{c,opt}^{(k)}\right)$. Substituting Equation (6) into Equation (12), the following is obtained
$$E_{est}^{(k)} = \sum_{n=0}^{N}\left[a_0 + a_1 t(n) + \sum_{j=1}^{k-1}\left(b_{2j-2}\,G_c^{(j)}(n) + b_{2j-1}\,g_p^{(j)}(n)\right) - G_{c,opt}^{(k)}(n)\right]^2 \qquad (13)$$
For the calculation of the estimation coefficients in the second and subsequent sub-frames of each frame, the quantized values of both the fixed and adaptive codebook gains of previous sub-frames are used in the above Equation (13). Although it is possible to use the optimal unquantized gains in their place, the usage of quantized values leads to the maximum estimation efficiency in all sub-frames and consequently to better overall performance of the gain quantizer.
Thus, the number of estimation coefficients increases as the index of the current sub-frame is advanced. The gain quantization itself is described in the following description. The estimation coefficients $a_i$ and $b_i$ are different for each sub-frame, but the same symbols were used for the sake of simplicity. Normally, they would either have the superscript $(k)$ associated therewith or they would be denoted differently for each sub-frame, wherein $k$ is the sub-frame index.
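The minimization of the error function in Equation (13) leads to the following system of linear equations, in which every sum runs over $n = 0, \ldots, N$:
$$\begin{bmatrix} N & \sum t(n) & \cdots & \sum g_p^{(k-1)}(n) \\ \sum t(n) & \sum t^2(n) & \cdots & \sum t(n)\,g_p^{(k-1)}(n) \\ \vdots & \vdots & \ddots & \vdots \\ \sum g_p^{(k-1)}(n) & \sum t(n)\,g_p^{(k-1)}(n) & \cdots & \sum \left[g_p^{(k-1)}(n)\right]^2 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ b_{2k-3} \end{bmatrix} = \begin{bmatrix} \sum G_{c,opt}^{(k)}(n) \\ \sum t(n)\,G_{c,opt}^{(k)}(n) \\ \vdots \\ \sum g_p^{(k-1)}(n)\,G_{c,opt}^{(k)}(n) \end{bmatrix} \qquad (14)$$
The solution of this system, i.e. the optimal set of estimation coefficients $a_0, a_1, b_0, \ldots, b_{2k-3}$, is not provided here as it leads to complicated formulas. It is usually solved by mathematical software equipped with a linear equation solver, for example MATLAB. This is advantageously done offline and not during the encoding process.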
For the second sub-frame, Equation (14) reduces to (all sums taken over $n = 0, \ldots, N-1$)

$$\begin{bmatrix} N & \sum_n t(n) & \sum_n G_c^{(1)}(n) & \sum_n g_p^{(1)}(n) \\ \sum_n t(n) & \sum_n t^2(n) & \sum_n t(n)\, G_c^{(1)}(n) & \sum_n t(n)\, g_p^{(1)}(n) \\ \sum_n G_c^{(1)}(n) & \sum_n t(n)\, G_c^{(1)}(n) & \sum_n \left[ G_c^{(1)}(n) \right]^2 & \sum_n G_c^{(1)}(n)\, g_p^{(1)}(n) \\ \sum_n g_p^{(1)}(n) & \sum_n t(n)\, g_p^{(1)}(n) & \sum_n G_c^{(1)}(n)\, g_p^{(1)}(n) & \sum_n \left[ g_p^{(1)}(n) \right]^2 \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ b_0 \\ b_1 \end{bmatrix} = \begin{bmatrix} \sum_n G_{c,opt}^{(2)}(n) \\ \sum_n t(n)\, G_{c,opt}^{(2)}(n) \\ \sum_n G_c^{(1)}(n)\, G_{c,opt}^{(2)}(n) \\ \sum_n g_p^{(1)}(n)\, G_{c,opt}^{(2)}(n) \end{bmatrix}$$

As mentioned hereinabove, calculation of the estimation coefficients is alternated with gain quantization as depicted in Figure 4. More specifically, Figure 4 is a schematic block diagram describing a state machine 400 in which the estimation coefficients are calculated (401) for each sub-frame. The gain codebook is then designed (402) for each sub-frame using the calculated estimation coefficients. Gain quantization (403) for the sub-frame is then conducted on the basis of the calculated estimation coefficients and the gain codebook design. Estimation of the fixed codebook gain itself is slightly different in each sub-frame, the estimation coefficients are found by means of minimum mean square error, and the gain codebook may be designed by using the KMEANS algorithm as described, for example, in MacQueen, J. B. (1967). "Some Methods for classification and Analysis of Multivariate Observations". Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability. University of California Press. pp. 281-297, of which the full contents is herein incorporated by reference.
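For illustration only, a k-means style design of a two-value gain codebook may be sketched as follows. The function name `kmeans_codebook`, the plain Euclidean distance between $(g_p, \gamma)$ pairs, the fixed initial centroids and the toy training pairs are all assumptions of this sketch; the present description does not specify the distortion measure or the initialization used in the codebook design (402).

```python
def kmeans_codebook(training, init, iterations=20):
    """Lloyd / k-means iteration over (g_p, gamma) training pairs.

    training : list of (g_p, gamma) pairs gathered from a training database
    init     : initial centroids, one per codebook entry
    Returns the centroids sorted in ascending order of g_p.
    """
    centroids = [tuple(c) for c in init]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for g_p, gamma in training:
            # assign each training pair to its nearest centroid
            idx = min(range(len(centroids)),
                      key=lambda q: (g_p - centroids[q][0]) ** 2
                      + (gamma - centroids[q][1]) ** 2)
            clusters[idx].append((g_p, gamma))
        updated = []
        for old, members in zip(centroids, clusters):
            if members:
                updated.append((sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members)))
            else:
                updated.append(old)  # keep an empty cell unchanged
        if updated == centroids:
            break  # assignments no longer change
        centroids = updated
    return sorted(centroids)


# Toy training pairs forming two obvious groups (assumption).
pairs = [(0.1, 0.9), (0.2, 1.1), (0.9, 0.5), (1.0, 0.4)]
codebook = kmeans_codebook(pairs, init=[(0.0, 1.0), (1.0, 0.5)])
```

Sorting the resulting entries by adaptive codebook gain is a design choice of the sketch that would later permit a restricted search range.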
Gain quantization

Figure 5 is a schematic block diagram describing a gain quantizer 500.
Before gain quantization it is assumed that both the filtered adaptive excitation 501 from the adaptive codebook and the filtered innovation codevector 502 from the fixed codebook are already known. The gain quantization at the coder is performed by searching the designed gain codebook 503 in the MMSE (Minimum Mean Square Error) sense. As described in the foregoing description, each entry in the gain codebook 503 includes two values: the quantized adaptive codebook gain $g_p$ and the correction factor $\gamma$ for the fixed contribution of the excitation. The estimation of the fixed codebook gain is performed beforehand and the estimated fixed codebook gain $g_{c0}$ is used to multiply the correction factor $\gamma$ selected from the gain codebook 503. In each sub-frame, the gain codebook 503 is searched completely, i.e. for indices $q = 0, \ldots, Q-1$, $Q$ being the number of indices of the gain codebook. It is possible to limit the search range in case the quantized adaptive codebook gain $g_p$ is mandated to be below a certain threshold. To allow reducing the search range, the codebook entries may be sorted in ascending order according to the value of the adaptive codebook gain $g_p$.
Referring to Figure 5, the two-entry gain codebook 503 is searched and each index provides two values - the adaptive codebook gain $g_p$ and the correction factor $\gamma$. A multiplier 504 multiplies the correction factor $\gamma$ by the estimated fixed codebook gain $g_{c0}$ and the resulting value is used as the quantized gain 505 of the fixed contribution of the excitation (quantized fixed codebook gain).
Another multiplier 506 multiplies the filtered adaptive excitation 501 from the adaptive codebook by the quantized adaptive codebook gain $g_p$ from the gain codebook 503 to produce the adaptive contribution 507 of the excitation. A multiplier 508 multiplies the filtered innovation codevector 502 by the quantized fixed codebook gain 505 to produce the fixed contribution 509 of the excitation. An adder 510 sums both the adaptive 507 and fixed 509 contributions of the excitation together so as to form the filtered total excitation 511. A subtractor 512 subtracts the filtered total excitation 511 from the target signal $x_i$ to produce the error signal $e_i$. A calculator 513 computes the energy 515 of the error signal $e_i$ and supplies it back to the gain codebook searching mechanism.
All or a subset of the indices of the gain codebook 503 are searched in this manner and the index of the gain codebook 503 yielding the lowest error energy 515 is selected as the winning index and sent to the decoder.
The gain quantization can be performed by minimizing the energy of the error in Equation (2). The energy is given by

$$E = e^t e = (x - g_p y - g_c z)^t (x - g_p y - g_c z). \qquad (15)$$

Substituting $g_c$ by $\gamma g_{c0}$ the following relation is obtained

$$E = c_5 + g_p^2 c_0 - 2 g_p c_1 + \gamma^2 g_{c0}^2 c_2 - 2 \gamma g_{c0} c_3 + 2 g_p \gamma g_{c0} c_4 \qquad (16)$$

where the constants or correlations $c_0$, $c_1$, $c_2$, $c_3$, $c_4$ and $c_5$ are calculated as in Equation (4) above. The constants or correlations $c_0$, $c_1$, $c_2$, $c_3$, $c_4$ and $c_5$, and the estimated gain $g_{c0}$ are computed before the search of the gain codebook 503, and then the energy in Equation (16) is calculated for each codebook index (each set of entry values $g_p$ and $\gamma$).
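The codebook search based on Equation (16) may be sketched, in a non-limitative manner, as follows. The function names `correlations` and `search_gain_codebook` and the toy vectors are assumptions of the sketch. The correlations are taken here as $c_0 = y^t y$, $c_1 = x^t y$, $c_2 = z^t z$, $c_3 = x^t z$, $c_4 = y^t z$ and $c_5 = x^t x$, which is the assignment obtained by expanding Equation (15) term by term; Equation (4) itself is not reproduced in this passage.

```python
def correlations(x, y, z):
    """Return (c0, ..., c5) for target x, filtered adaptive excitation y
    and filtered innovation codevector z."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    return dot(y, y), dot(x, y), dot(z, z), dot(x, z), dot(y, z), dot(x, x)


def search_gain_codebook(codebook, x, y, z, g_c0):
    """Full search of (g_p, gamma) entries minimizing the energy of Equation (16)."""
    c0, c1, c2, c3, c4, c5 = correlations(x, y, z)  # computed once, before the search
    best_index, best_energy = -1, float("inf")
    for q, (g_p, gamma) in enumerate(codebook):
        g_c = gamma * g_c0  # quantized fixed codebook gain for this entry
        energy = (c5 + g_p * g_p * c0 - 2.0 * g_p * c1
                  + g_c * g_c * c2 - 2.0 * g_c * c3 + 2.0 * g_p * g_c * c4)
        if energy < best_energy:
            best_index, best_energy = q, energy
    return best_index, best_energy


# Toy sub-frame (assumption): x equals 1.0 * y + 2.0 * z, so the entry
# (g_p, gamma) = (1.0, 1.0) with g_c0 = 2.0 reproduces the target exactly.
x = [1.0, 2.0, 0.5, -1.0]
y = [1.0, 0.0, 0.5, 0.0]
z = [0.0, 1.0, 0.0, -0.5]
toy_codebook = [(0.0, 0.5), (1.0, 1.0), (0.5, 2.0)]
winner, error_energy = search_gain_codebook(toy_codebook, x, y, z, g_c0=2.0)
```

Because the six correlations do not depend on the codebook index, each candidate entry costs only a handful of multiplications, which is the practical benefit of using Equation (16) in place of Equation (15).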
The codevector from the gain codebook 503 leading to the lowest energy 515 of the error signal $e_i$ is chosen as the winning codevector and its entry values correspond to the quantized values $g_p$ and $\gamma$. The quantized value of the fixed codebook gain is then calculated as

$$g_c = g_{c0} \cdot \gamma.$$

Figure 6 is a schematic block diagram of an equivalent gain quantizer 600 as in Figure 5, performing calculation of the energy $E_i$ of the error signal $e_i$ using Equation (16). More specifically, the gain quantizer 600 comprises a gain codebook 601, a calculator 602 of constants or correlations, and a calculator 603 of the energy 604 of the error signal. The calculator 602 calculates the constants or correlations $c_0$, $c_1$, $c_2$, $c_3$, $c_4$ and $c_5$ using Equation (4) and the target vector $x$, the filtered adaptive excitation vector $y$ from the adaptive codebook, and the filtered fixed codevector $z$ from the fixed codebook, wherein the superscript $t$ denotes vector transpose. The calculator 603 uses Equation (16) to calculate the energy $E_i$ of the error signal $e_i$ from the estimated fixed codebook gain $g_{c0}$, the correlations $c_0$, $c_1$, $c_2$, $c_3$, $c_4$ and $c_5$ from calculator 602, and the quantized adaptive codebook gain $g_p$ and the correction factor $\gamma$ from the gain codebook 601. The energy 604 of the error signal from the calculator 603 is supplied back to the gain codebook searching mechanism. Again, all or a subset of the indices of the gain codebook 601 are searched in this manner and the index of the gain codebook 601 yielding the lowest error energy 604 is selected as the winning index and sent to the decoder.
In the gain quantizer 600 of Figure 6, the gain codebook 601 has a size that can be different depending on the sub-frame. Better estimation of the fixed codebook gain is attained in later sub-frames in a frame due to increased number of estimation parameters. Therefore a smaller number of bits can be used in later sub-frames. In one embodiment, four (4) sub-frames are used where the numbers of bits for the gain codebook are 8, 7, 6, and 6 corresponding to sub-frames 1, 2, 3, and 4, respectively. In another embodiment at a lower bit rate, 6 bits are used in each sub-frame.
In the decoder, the received index is used to retrieve the values of the quantized adaptive codebook gain $g_p$ and correction factor $\gamma$ from the gain codebook. The estimation of the fixed codebook gain is performed in the same manner as in the coder, as described in the foregoing description. The quantized value of the fixed codebook gain is calculated by the equation $g_c = g_{c0} \cdot \gamma$. Both the adaptive codevector and the innovation codevector are decoded from the bitstream and they become adaptive and fixed excitation contributions that are multiplied by the respective adaptive and fixed codebook gains. Both excitation contributions are added together to form the total excitation. The synthesis signal is found by filtering the total excitation through a LP synthesis filter as known in the art of CELP coding.
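The decoder-side gain retrieval may be illustrated by the following non-limitative Python sketch for the first sub-frame. The function names, the numeric coefficients and the toy codebook are assumptions; only the relations $G_{c0}^{(1)} = a_0 + a_1 t - \log_{10}(\sqrt{E_i})$ and $g_c = g_{c0} \cdot \gamma$ are taken from the foregoing description.

```python
import math


def estimate_first_subframe_gain(a0, a1, t, e_i):
    """Estimated fixed codebook gain g_c0 in linear domain (first sub-frame).

    e_i is the energy of the innovation codevector; log10(sqrt(e_i)) is
    computed as 0.5 * log10(e_i).
    """
    log_gain = a0 + a1 * t - 0.5 * math.log10(e_i)
    return 10.0 ** log_gain


def decode_gains(codebook, index, g_c0):
    """Look up (g_p, gamma) and form the quantized fixed codebook gain."""
    g_p, gamma = codebook[index]
    return g_p, g_c0 * gamma


def total_excitation(adaptive, innovation, g_p, g_c):
    """Sum of the gain-scaled adaptive and fixed excitation contributions."""
    return [g_p * v + g_c * c for v, c in zip(adaptive, innovation)]


# Hypothetical values for illustration only.
g_c0 = estimate_first_subframe_gain(a0=1.0, a1=0.0, t=2, e_i=100.0)
g_p, g_c = decode_gains([(0.5, 0.8), (0.9, 1.2)], index=1, g_c0=g_c0)
excitation = total_excitation([1.0, -1.0], [0.0, 1.0], g_p, g_c)
```

Since the estimate $g_{c0}$ is recomputed from data already available at the decoder, only the codebook index needs to be transmitted for the gains.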
Signal classification

Different methods can be used for determining classification of a frame, for example parameter $t$ of Figure 1. A non-limitative example is given in the following description where frames are classified as unvoiced, voiced, generic, or transition frames. However, the number of voice classes can be different from the one used in this example. For example the classification can be only voiced or unvoiced in one embodiment. In another embodiment more classes can be added such as strongly voiced and strongly unvoiced.
Signal classification can be performed in three steps, where each step discriminates a specific signal class. First, a signal activity detector (SAD) discriminates between active and inactive speech frames. If an inactive speech frame is detected (background noise signal) then the classification chain ends and the frame is encoded with comfort noise generation (CNG). If an active speech frame is detected, the frame is subjected to a second classifier to discriminate unvoiced frames. If the classifier classifies the frame as unvoiced speech signal, the classification chain ends, and the frame is encoded using a coding method optimized for unvoiced signals. Otherwise, the frame is processed through a "stable voiced" classification module. If the frame is classified as stable voiced frame, then the frame is encoded using a coding method optimized for stable voiced signals. Otherwise, the frame is likely to contain a non-stationary signal segment such as a voiced onset or rapidly evolving voiced signal. These frames typically require a general purpose coder and high bit rate for sustaining good subjective quality. The disclosed gain quantization technique has been developed and optimized for stable voiced and general-purpose frames.
However, it can be easily extended for any other signal class. In the following, the classification of unvoiced and voiced signal frames will be described.
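The three-step classification chain described above may be summarized by the following minimal Python sketch. The function name `classify_frame`, its boolean inputs and the returned labels are assumptions used for illustration; the individual detectors are treated as already evaluated.

```python
def classify_frame(is_active, is_unvoiced, is_stable_voiced):
    """Return the class of a frame following the three-step decision chain.

    is_active        : output of the signal activity detector (SAD)
    is_unvoiced      : output of the unvoiced classifier
    is_stable_voiced : output of the "stable voiced" classification module
    """
    if not is_active:
        return "inactive"   # background noise, coded with comfort noise generation
    if is_unvoiced:
        return "unvoiced"   # coding method optimized for unvoiced signals
    if is_stable_voiced:
        return "voiced"     # coding method optimized for stable voiced signals
    return "generic"        # general purpose coder, e.g. onsets and evolving voiced signals
```

The order of the tests matters: a later classifier is only consulted when every earlier one has declined to claim the frame.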
The unvoiced parts of the sound signal are characterized by a missing periodic component and can be further divided into unstable frames, where energy and spectrum change rapidly, and stable frames where these characteristics remain relatively stable. The classification of unvoiced frames uses the following parameters:
- voicing measure $\bar{r}_x$, computed as an averaged normalized correlation;
- average spectral tilt measure ($\bar{e}_t$);
- maximum short-time energy increase at low level ($dE0$) to efficiently detect explosive signal segments;
- maximum short-time energy variation ($dE$) used to assess frame stability;
- tonal stability to discriminate music from unvoiced signal as described in [Jelinek, M., Vaillancourt, T., Gibbs, J., "G.718: A new embedded speech and audio coding standard with high resilience to error-prone transmission channels", In IEEE Communications Magazine, vol. 47, pp. 117-123, October 2009] of which the full contents is herein incorporated by reference; and
- relative frame energy ($E_{rel}$) to detect very low-energy signals.

Voicing measure

The normalized correlation, used to determine the voicing measure, is computed as part of the open-loop pitch analysis.
In the art of CELP coding, the open-loop search module usually outputs two estimates per frame. Here, it is also used to output the normalized correlation measures. These normalized correlations are computed on a weighted signal and a past weighted signal at the open-loop pitch delay. The weighted speech signal $s_w(n)$ is computed using a perceptual weighting filter. For example, a perceptual weighting filter with fixed denominator, suited for wideband signals, is used. An example of a transfer function of the perceptual weighting filter is given by the following relation:
where $A(z)$ is a transfer function of the linear prediction (LP) filter computed by means of the Levinson-Durbin algorithm and is given by the following relation:

$$A(z) = 1 + \sum_{i=1}^{p} a_i z^{-i}.$$

LP analysis and open-loop pitch analysis are well known in the art of CELP coding and, accordingly, will not be further described in the present description.
The voicing measure $\bar{r}_x$ is defined as an average normalized correlation given by the following relation:

$$\bar{r}_x = \bar{C}_{norm} = \frac{1}{3}\left( C_{norm}(d_0) + C_{norm}(d_1) + C_{norm}(d_2) \right)$$

where $C_{norm}(d_0)$, $C_{norm}(d_1)$ and $C_{norm}(d_2)$ are, respectively, the normalized correlation of the first half of the current frame, the normalized correlation of the second half of the current frame, and the normalized correlation of the look-ahead (the beginning of the next frame). The arguments to the correlations are the open-loop pitch lags.
Spectral tilt

The spectral tilt contains information about a frequency distribution of energy. The spectral tilt can be estimated in the frequency domain as a ratio between the energy concentrated in low frequencies and the energy concentrated in high frequencies. However, it can be also estimated in different ways such as a ratio between the two first autocorrelation coefficients of the signal.
The energy in high frequencies and low frequencies is computed following the perceptual critical bands as described in [J. D. Johnston, "Transform Coding of Audio Signals Using Perceptual Noise Criteria," IEEE Journal on Selected Areas in Communications, vol. 6, no. 2, pp. 314-323, February 1988] of which the full contents is herein incorporated by reference. The energy in high frequencies is calculated as the average energy of the last two critical bands using the following relation:

$$E_h = 0.5 \left[ E_{CB}(b_{max} - 1) + E_{CB}(b_{max}) \right]$$

where $E_{CB}(i)$ is the critical band energy of the $i$th band and $b_{max}$ is the last critical band. The energy in low frequencies is computed as average energy of the first 10 critical bands using the following relation:

$$E_l = \frac{1}{10} \sum_{i=b_{min}}^{b_{min}+9} E_{CB}(i)$$

where $b_{min}$ is the first critical band.
The middle critical bands are excluded from the calculation as they do not tend to improve the discrimination between frames with high energy concentration in low frequencies (generally voiced) and with high energy concentration in high frequencies (generally unvoiced). In between, the energy content is not characteristic for any of the classes discussed further and increases the decision confusion.
The spectral tilt is given by

$$e_t = \frac{E_l - N_l}{E_h - N_h}$$

where $N_h$ and $N_l$ are, respectively, the average noise energies in the last two critical bands and first 10 critical bands, computed in the same way as $E_h$ and $E_l$.
The estimated noise energies have been added to the tilt computation to account for the presence of background noise. The spectral tilt estimation is performed twice per frame and average spectral tilt is calculated which is then used in unvoiced frame classification. That is

$$\bar{e}_t = \frac{1}{3}\left( e_{old} + e_t(0) + e_t(1) \right)$$

where $e_{old}$ is the spectral tilt in the second half of the previous frame.
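The spectral tilt computation described above may be sketched in Python as follows, by way of example only. The function names, the choice of 20 critical bands in the toy data, the default `b_min = 0` and the explicit check against a zero denominator are assumptions of the sketch and are not specified in the present description.

```python
def spectral_tilt(e_cb, n_cb, b_min=0):
    """Noise-compensated ratio of low-band to high-band energy.

    e_cb : critical band energies of one spectral analysis
    n_cb : estimated noise energies in the same critical bands
    """
    b_max = len(e_cb) - 1
    e_h = 0.5 * (e_cb[b_max - 1] + e_cb[b_max])   # last two critical bands
    e_l = sum(e_cb[b_min:b_min + 10]) / 10.0      # first 10 critical bands
    n_h = 0.5 * (n_cb[b_max - 1] + n_cb[b_max])
    n_l = sum(n_cb[b_min:b_min + 10]) / 10.0
    if e_h == n_h:
        raise ValueError("high-band energy equals its noise estimate")
    return (e_l - n_l) / (e_h - n_h)


def average_tilt(e_old, e_t0, e_t1):
    """Average of the previous half-frame tilt and the two current tilts."""
    return (e_old + e_t0 + e_t1) / 3.0


# Toy spectrum (assumption): strong low bands, weak high bands, flat noise.
band_energy = [9.0] * 10 + [5.0] * 8 + [3.0] * 2
noise_energy = [1.0] * 20
tilt = spectral_tilt(band_energy, noise_energy)
mean_tilt = average_tilt(3.0, tilt, 5.0)
```

In this toy case the middle eight bands do not enter the result, reflecting their exclusion from the calculation.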
Maximum short-time energy increase at low level

The maximum short-time energy increase at low level $dE0$ is evaluated on the input sound signal $s(n)$, where $n=0$ corresponds to the first sample of the current frame. Signal energy is evaluated twice per sub-frame.
Assuming for example the scenario of four sub-frames per frame, the energy is calculated 8 times per frame. If the total frame length is, for example, 256 samples, each of these short segments may have 32 samples. In the calculation, short-term energies of the last 32 samples from the previous frame and the first 32 samples from the next frame are also taken into consideration. The short-time energies are calculated using the following relation:

$$E_{st}^{(1)}(j) = \max_{i=0}^{31} \left( s^2(i + 32j) \right), \quad j = -1, \ldots, 8,$$

where $j=-1$ and $j=8$ correspond to the end of the previous frame and the beginning of the next frame, respectively. Another set of nine short-term energies is calculated by shifting the signal indices in the previous equation by 16 samples using the following relation:

$$E_{st}^{(2)}(j) = \max_{i=0}^{31} \left( s^2(i + 32j - 16) \right), \quad j = 0, \ldots, 8.$$
For energies that are sufficiently low, i.e. which fulfill the condition $10 \log\left(E_{st}^{(1)}(j)\right) < 37$, the following ratio is calculated

$$rat^{(1)}(j) = \frac{E_{st}^{(1)}(j+1)}{E_{st}^{(1)}(j)}, \quad \text{for } j = -1, \ldots, 6,$$

for the first set of energies and the same calculation is repeated for $E_{st}^{(2)}(j)$ with $j = 0, \ldots, 7$ to obtain two sets of ratios $rat^{(1)}$ and $rat^{(2)}$. The only maximum in these two sets is searched by

$$dE0 = \max\left( rat^{(1)}, rat^{(2)} \right)$$

which is the maximum short-time energy increase at low level.
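The evaluation of the maximum short-time energy increase at low level may be sketched as follows for the example of a 256-sample frame. The buffer layout (320 samples with the first element holding sample $-32$), the base-10 logarithm, the small energy floor that avoids a zero denominator, and the return value of 0.0 when no segment is below the threshold are assumptions of this sketch.

```python
import math

SEG = 32  # samples per short segment


def short_time_energies(s, shift, first_j, last_j):
    """Peak squared sample of each 32-sample segment.

    s[0] holds sample -32, so sample index m is stored at s[m + 32].
    """
    out = {}
    for j in range(first_j, last_j + 1):
        start = SEG + SEG * j - shift  # list position of sample 32*j - shift
        out[j] = max(v * v for v in s[start:start + SEG])
    return out


def max_increase_low_level(s, threshold_db=37.0, floor=1e-9):
    """Largest ratio of consecutive short-time energies among low-level segments."""
    e1 = short_time_energies(s, 0, -1, 8)    # first set, j = -1..8
    e2 = short_time_energies(s, 16, 0, 8)    # second set shifted by 16 samples
    ratios = []
    for energies, indices in ((e1, range(-1, 7)), (e2, range(0, 8))):
        for j in indices:
            e_now = max(energies[j], floor)
            if 10.0 * math.log10(e_now) < threshold_db:
                ratios.append(energies[j + 1] / e_now)
    return max(ratios) if ratios else 0.0


# Toy signal (assumption): unit amplitude with one louder 32-sample burst
# occupying samples 64..95 of the current frame.
signal = [1.0] * 320
for pos in range(32 + 64, 32 + 96):
    signal[pos] = 10.0
de0 = max_increase_low_level(signal)
```

The second, shifted set of energies prevents an abrupt attack from being missed when it happens to straddle a segment boundary of the first set.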
Maximum short-time energy variation

This parameter $dE$ is similar to the maximum short-time energy increase at low level with the difference that the low-level condition is not applied.
Thus, the parameter is computed as the maximum of the following four values:

$$\frac{E_{st}^{(1)}(0)}{E_{st}^{(1)}(-1)}$$

$$\frac{E_{st}^{(1)}(7)}{E_{st}^{(1)}(8)}$$

$$\frac{\max\left( E_{st}^{(1)}(j), E_{st}^{(1)}(j-1) \right)}{\min\left( E_{st}^{(1)}(j), E_{st}^{(1)}(j-1) \right)}, \quad \text{for } j = 1, \ldots, 7$$

$$\frac{\max\left( E_{st}^{(2)}(j), E_{st}^{(2)}(j-1) \right)}{\min\left( E_{st}^{(2)}(j), E_{st}^{(2)}(j-1) \right)}, \quad \text{for } j = 1, \ldots, 8$$

Unvoiced signal classification

The classification of unvoiced signal frames is based on the parameters described above, namely: the voicing measure $\bar{r}_x$, the average spectral tilt $\bar{e}_t$, the maximum short-time energy increase at low level $dE0$ and the maximum short-time energy variation $dE$. The algorithm is further supported by the tonal stability parameter, the SAD flag and the relative frame energy calculated during the noise energy update phase. For more detailed information about these parameters, see for example [Jelinek, M., et al., "Advances in source-controlled variable bitrate wideband speech coding", Special Workshop in MAUI (SWIM): Lectures by masters in speech processing, Maui, Hawaii, January 12-14, 2004] of which the full content is herein incorporated by reference.
The relative frame energy is given by

$$E_{rel} = E_t - \bar{E}_f$$

where $E_t$ is the total frame energy (in dB) and $\bar{E}_f$ is the long-term average frame energy, updated during each active frame by $\bar{E}_f = 0.99 \bar{E}_f + 0.01 E_t$.
The rules for unvoiced classification of wideband signals are summarized below

[(($\bar{r}_x < 0.695$) AND ($\bar{e}_t < 4.0$)) OR ($E_{rel} < -14$)] AND
[last frame INACTIVE OR UNVOICED OR (($e_{old} < 2.4$) AND ($r_x(0) < 0.66$))] AND
[$dE0 < 250$] AND
[$e_t(1) < 2.7$] AND
NOT [(tonal_stability AND (($\bar{r}_x > 0.52$) AND ($e_t > 0.5$)) OR ($e_t > 0.85$)) AND ($E_{rel} > -14$) AND SAD flag set to 1]

The first line of this condition is related to low-energy signals and signals with low correlation concentrating their energy in high frequencies. The second line covers voiced offsets, the third line covers explosive signal segments and the fourth line is related to voiced onsets. The last line discriminates music signals that would be otherwise declared as unvoiced.
If the combined conditions are fulfilled the classification ends by declaring the current frame as unvoiced.
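The combined unvoiced decision may be sketched as a single predicate, as shown below for illustration only. The container `FrameParams` and its field names are hypothetical. Two interpretive assumptions are made and should be checked against a reference implementation: the music clause is read as tonal stability AND (either of the two correlation and tilt alternatives), and the average spectral tilt is used for the tilt tests within that clause.

```python
from dataclasses import dataclass


@dataclass
class FrameParams:
    r_avg: float                      # voicing measure (average normalized correlation)
    r_first: float                    # normalized correlation r_x(0)
    tilt_avg: float                   # average spectral tilt
    tilt_old: float                   # tilt in the second half of the previous frame
    tilt_second: float                # tilt e_t(1) of the second spectral analysis
    e_rel: float                      # relative frame energy in dB
    de0: float                        # maximum short-time energy increase at low level
    tonal_stability: bool
    sad_flag: bool
    last_inactive_or_unvoiced: bool   # class of the previous frame


def is_unvoiced(p):
    """Evaluate the five-line unvoiced classification rule for wideband signals."""
    low_energy_or_corr = (p.r_avg < 0.695 and p.tilt_avg < 4.0) or p.e_rel < -14.0
    not_voiced_offset = p.last_inactive_or_unvoiced or (
        p.tilt_old < 2.4 and p.r_first < 0.66)
    not_explosive = p.de0 < 250.0
    not_voiced_onset = p.tilt_second < 2.7
    # Assumed grouping of the music clause (see the lead-in paragraph).
    music = (p.tonal_stability
             and ((p.r_avg > 0.52 and p.tilt_avg > 0.5) or p.tilt_avg > 0.85)
             and p.e_rel > -14.0
             and p.sad_flag)
    return (low_energy_or_corr and not_voiced_offset and not_explosive
            and not_voiced_onset and not music)


# Hypothetical parameter values of a noise-like frame.
example = FrameParams(r_avg=0.5, r_first=0.5, tilt_avg=2.0, tilt_old=2.0,
                      tilt_second=2.0, e_rel=0.0, de0=10.0,
                      tonal_stability=False, sad_flag=True,
                      last_inactive_or_unvoiced=False)
example_is_unvoiced = is_unvoiced(example)
```

Naming each line of the rule after the signal type it excludes keeps the correspondence with the explanation of the five lines given above.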
Voiced signal classification

If a frame is not classified as inactive frame or as unvoiced frame then it is tested if it is a stable voiced frame. The decision rule is based on the normalized correlation $C$ in each sub-frame (with 1/4 subsample resolution), the average spectral tilt $\bar{e}_t$ and open-loop pitch estimates in all sub-frames (with 1/4 subsample resolution).
The open-loop pitch estimation procedure calculates three open-loop pitch lags: $d_0$, $d_1$ and $d_2$, corresponding to the first half-frame, the second half-frame and the look-ahead (first half-frame of the following frame). In order to obtain precise pitch information in all four sub-frames, 1/4 sample resolution fractional pitch refinement is calculated. This refinement is calculated on a perceptually weighted input signal $s_{wd}(n)$ (for example the input sound signal $s(n)$ filtered through the above described perceptual weighting filter). At the beginning of each sub-frame a short correlation analysis (40 samples) with resolution of 1 sample is performed in the interval (-7, +7) using the following delays: $d_0$ for the first and second sub-frames and $d_1$ for the third and fourth sub-frames. The correlations are then interpolated around their maxima at the fractional positions $d_{max} - 3/4$, $d_{max} - 1/2$, $d_{max} - 1/4$, $d_{max}$, $d_{max} + 1/4$, $d_{max} + 1/2$, $d_{max} + 3/4$. The value yielding the maximum correlation is chosen as the refined pitch lag.
Let the refined open-loop pitch lags in all four sub-frames be denoted as $T(0)$, $T(1)$, $T(2)$ and $T(3)$ and their corresponding normalized correlations as $C(0)$, $C(1)$, $C(2)$ and $C(3)$. Then, the voiced signal classification condition is given by

[$C(0) > 0.605$] AND [$C(1) > 0.605$] AND [$C(2) > 0.605$] AND [$C(3) > 0.605$] AND
[$\bar{e}_t > 4$] AND
[$|T(1) - T(0)| < 3$] AND [$|T(2) - T(1)| < 3$] AND [$|T(3) - T(2)| < 3$]

The above voiced signal classification condition indicates that the normalized correlation must be sufficiently high in all sub-frames, the pitch estimates must not diverge throughout the frame and the energy must be concentrated in low frequencies. If this condition is fulfilled the classification ends by declaring the current frame as voiced. Otherwise the current frame is declared as generic.

Although the present invention has been described in the foregoing description with reference to non-restrictive illustrative embodiments thereof, these embodiments can be modified at will within the scope of the appended claims without departing from the spirit and nature of the present invention.
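By way of illustration of the voiced signal classification condition given in the foregoing description, the test may be sketched in Python as follows. The function name `is_stable_voiced`, its argument layout and the example values are assumptions; the thresholds 0.605, 4 and 3 are those of the condition itself.

```python
def is_stable_voiced(corr, pitch, tilt_avg,
                     corr_min=0.605, tilt_min=4.0, max_jump=3.0):
    """Stable voiced test over the four sub-frames of one frame.

    corr     : normalized correlations C(0)..C(3)
    pitch    : refined open-loop pitch lags T(0)..T(3)
    tilt_avg : average spectral tilt of the frame
    """
    if len(corr) != 4 or len(pitch) != 4:
        raise ValueError("expected values for exactly four sub-frames")
    high_correlation = all(c > corr_min for c in corr)
    smooth_pitch = all(abs(pitch[k] - pitch[k - 1]) < max_jump
                       for k in range(1, 4))
    return high_correlation and smooth_pitch and tilt_avg > tilt_min


# Hypothetical frame with strong correlation, slowly drifting pitch lag
# (1/4 sample resolution) and energy concentrated in low frequencies.
frame_is_voiced = is_stable_voiced(
    corr=[0.9, 0.85, 0.8, 0.7],
    pitch=[57.25, 57.5, 58.0, 58.75],
    tilt_avg=6.0)
```

A frame that fails this test after having passed the activity and unvoiced tests is the one declared as generic.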

Claims (50)

WHAT IS CLAIMED IS:
1. A device for quantizing a gain of a fixed contribution of an excitation in a frame, including sub-frames, of a coded sound signal, comprising: an input for a parameter t having a value representative of a classification of the frame; an estimator of the gain of the fixed contribution of the excitation in a sub-frame of said frame, wherein the estimator uses the value of the parameter t as a multiplicative factor in at least one term of a function used to calculate the estimated gain of the fixed contribution of the excitation; and a predictive quantizer of the gain of the fixed contribution of the excitation, in the sub-frame, using the estimated gain.
2. The quantizing device according to claim 1, wherein the predictive quantizer determines a correction factor for the estimated gain as a quantization of the gain of the fixed contribution of the excitation, and wherein the estimated gain multiplied by the correction factor gives the quantized gain of the fixed contribution of the excitation.
3. The quantizing device according to claim 1 or 2, wherein the estimator comprises, for a first sub-frame of the frame, a calculator of a first estimation of the gain of the fixed contribution of the excitation in response to the value of the parameter t representative of the classification of the frame, and a subtractor of an energy of a filtered innovation codevector from a fixed codebook from the first estimation to obtain the estimated gain.
4. The quantizing device according to claim 2, wherein the estimator comprises, for a first sub-frame of the frame: a calculator of a linear estimation of the gain of the fixed contribution of the excitation in logarithmic domain in response to the value of the parameter t representative of the classification of the frame; a subtractor of an energy of a filtered innovation codevector from a fixed codebook in logarithmic domain from the linear gain estimation from the calculator, the subtractor producing a gain in logarithmic domain; a converter of the gain in logarithmic domain from the subtractor to linear domain to produce the estimated gain; and a multiplier of the estimated gain by the correction factor to produce the quantized gain of the fixed contribution of the excitation.
5. The quantizing device according to claim 1, wherein the estimator, for each sub-frame of said frame following the first sub-frame, is responsive to the value of the parameter t representative of the classification of the frame and gains of adaptive and fixed contributions of the excitation of at least one previous sub-frame of the frame to estimate the gain of the fixed contribution of the excitation.
6. The quantizing device according to claim 5, wherein the estimator comprises, for each sub-frame following the first sub-frame, a calculator of a linear estimation of the gain of the fixed contribution of the excitation in logarithmic domain and a converter of the linear estimation in logarithmic domain to linear domain to produce the estimated gain.
7. The quantizing device according to claim 6, wherein the gains of the adaptive and fixed contributions of the excitation of at least one previous subframe of the frame are quantized gains and the quantized gains of the adaptive contributions of the excitation are supplied to the calculator directly while the quantized gains of the fixed contributions of the excitation are supplied to the calculator in logarithmic domain through a logarithm calculator.
8. The quantizing device according to claim 3 or 4, wherein the calculator of the estimation of the gain of the fixed contribution of the excitation uses in relation to the classification parameter t estimation coefficients determined using a large training database.
9. The quantizing device according to claim 6 or 7, wherein the calculator of a linear estimation of the gain of the fixed contribution of the excitation in logarithmic domain uses in relation to the classification parameter t of the frame and the gains of the adaptive and fixed contributions of the excitation of at least one previous sub-frame estimation coefficients which are different for each sub-frame and determined using a large training database.
10. The quantizing device according to any one of claims 1 to 9, wherein the estimator uses, for estimating the gain of the fixed contribution of the excitation, estimation coefficients different for each sub-frame of the frame.
11. The quantizing device according to any one of claims 1 to 10, wherein the estimator confines estimation of the gain of the fixed contribution of the excitation in the frame to increase robustness against frame erasure.
12. A device for jointly quantizing gains of adaptive and fixed contributions of an excitation in a frame of a coded sound signal, comprising: a quantizer of the gain of the adaptive contribution of the excitation; and the device for quantizing the gain of the fixed contribution of the excitation as defined in any one of claims 1 to 11.
13. The device for jointly quantizing the gains of the adaptive and fixed contributions of the excitation according to claim 12, comprising a gain codebook having entries each comprising the quantized gain of the adaptive contribution of the excitation and a correction factor for the estimated gain.
14. The device for jointly quantizing the gains of the adaptive and fixed contributions of the excitation according to claim 13, wherein the quantizer of the gain of the adaptive contribution of the excitation and the predictive quantizer of the gain of the fixed contribution of the excitation search the gain codebook and select the gain of the adaptive contribution of the excitation from one entry of the gain codebook and the correction factor of the same entry of the gain codebook as a quantization of the gain of the fixed contribution of the excitation.
15. The device for jointly quantizing the gains of the adaptive and fixed contributions of the excitation according to claim 13, comprising a designer of the gain codebook for each sub-frame of the frame.
16. The device for jointly quantizing the gains of the adaptive and fixed contributions of the excitation according to claim 15, wherein the gain codebook has different sizes in different sub-frames of the frame.
17. The device for jointly quantizing the gains of the adaptive and fixed contributions of the excitation according to claim 14, wherein the quantizer of the gain of the adaptive contribution of the excitation and the predictive quantizer of the gain of the fixed contribution of the excitation search the gain codebook completely in each sub-frame.
18. A device for retrieving a quantized gain of a fixed contribution of an excitation in a sub-frame of a frame, comprising: a receiver of a gain codebook index; an estimator of the gain of the fixed contribution of the excitation in the sub-frame, wherein the estimator is supplied with a parameter t having a value representative of the classification of the frame, and uses the value of the parameter t as a multiplicative factor in at least one term of a function used to calculate the estimated gain of the fixed contribution of the excitation; a gain codebook for supplying a correction factor in response to the gain codebook index; and a multiplier of the estimated gain by the correction factor to provide a quantized gain of the fixed contribution of the excitation in said sub-frame.
19. The device for retrieving the quantized gain of the fixed contribution of the excitation according to claim 18, wherein the estimator comprises, for a first sub-frame of the frame, a calculator of a first estimation of the gain of the fixed contribution of the excitation in response to the value of the parameter t representative of the classification of the frame, and a subtractor of an energy of a filtered innovation codevector from a fixed codebook from the first estimation to obtain the estimated gain.
20. The device for retrieving the quantized gain of the fixed contribution of the excitation according to claim 18, wherein the estimator, for each sub-frame of said frame following the first sub-frame, is responsive to the value of the parameter t representative of the classification of the frame and gains of adaptive and fixed contributions of the excitation of at least one previous sub-frame of the frame to estimate the gain of the fixed contribution of the excitation.
21. The device for retrieving the quantized gain of the fixed contribution of the excitation according to any one of claims 18 to 20, wherein the estimator uses, for estimating the gain of the fixed contribution of the excitation, estimation coefficients different for each sub-frame of the frame.
22. The device for retrieving the quantized gain of the fixed contribution of the excitation according to any one of claims 18 to 21, wherein the estimator confines estimation of the gain of the fixed contribution of the excitation in the frame to increase robustness against frame erasure.
23. A device according to claim 18 for retrieving the quantized gain of the fixed contribution of the excitation and a quantized gain of an adaptive contribution of the excitation in the sub-frame of the frame, wherein: the gain codebook supplies the quantized gain of the adaptive contribution of the excitation for the sub-frame in response to the gain codebook index.
24. The device for retrieving the quantized gains of the adaptive and fixed butions of the excitation according to claim 23, wherein the gain codebook comprises entries each comprising the quantized gain of the adaptive contribution of the excitation and the tion factor for the estimated gain.
25. The device for retrieving the zed gains of the adaptive and fixed butions of the excitation according to claim 23 or 24, wherein the gain codebook has ent sizes in different sub-frames of the frame.
26. A method for quantizing a gain of a fixed contribution of an excitation in a frame, including sub-frames, of a coded sound signal, comprising: receiving a parameter t having a value representative of a fication of the frame; estimating the gain of the fixed contribution of the excitation in a ame of said frame, using the value of the parameter t representative of the classification of the frame as a multiplicative factor in at least one term of a function used to calculate the estimated gain of the fixed contribution of the excitation; and predictive quantizing the gain of the fixed contribution of the excitation, in the sub-frame, using the estimated gain.
27. The quantizing method according to claim 26, wherein predictive quantizing the gain of the fixed contribution of the excitation comprises determining a tion factor for the estimated gain as a quantization of the gain of the fixed contribution of the excitation, and wherein the estimated gain multiplied by the tion factor gives the quantized gain of the fixed contribution of the excitation.
28. The quantizing method according to claim 26 or 27, wherein estimating the gain of the fixed contribution of the excitation comprises, for a first sub-frame of the frame, calculating a first estimation of the gain of the fixed contribution of the tion in response to the value of the parameter t representative of the classification of the frame, and subtracting an energy of a filtered innovation codevector from a fixed codebook from the first estimation to obtain the ted gain.
29. The zing method according to claim 27, wherein estimating the gain of the fixed contribution of the excitation comprises, for a first sub-frame of the frame: calculating a linear estimation of the gain of the fixed contribution of the excitation in logarithmic domain in se to the value of the parameter t entative of the classification of the frame; subtracting an energy of a filtered innovation codevector from a fixed codebook in logarithmic domain from the linear gain estimation, to produce a gain in logarithmic domain; converting the gain in logarithmic domain from the subtraction to linear domain to produce the estimated gain; and multiplying the estimated gain by the correction factor to produce the quantized gain of the fixed contribution of the tion.
30. The quantizing method according to any one of claims 26 to 29, wherein estimating the gain of the fixed contribution of the excitation, for each sub-frame of said frame following the first sub-frame, is responsive to the value of the parameter t representative of the classification of the frame and gains of adaptive and fixed contributions of the excitation of at least one previous sub-frame of the frame to estimate the gain of the fixed contribution of the excitation.
31. The quantizing method according to claim 30, wherein estimating the gain of the fixed contribution of the excitation comprises, for each sub-frame following the first sub-frame, calculating a linear estimation of the gain of the fixed contribution of the excitation in logarithmic domain and converting to linear domain the linear estimation in logarithmic domain to produce the estimated gain.
32. The quantizing method according to claim 31, wherein the gains of the adaptive contributions of the excitation of at least one previous sub-frame of the frame are quantized gains and the gains of the fixed contributions of the excitation of at least one previous sub-frame of the frame are quantized gains in logarithmic domain.
33. The quantizing method according to claim 28 or 29, wherein calculating the estimation of the gain of the fixed contribution of the excitation comprises using in relation to the classification parameter estimation coefficients determined using a large training database.
34. The quantizing method according to claim 31 or 32, wherein calculating a linear estimation of the gain of the fixed contribution of the excitation in logarithmic domain comprises using in relation to the classification parameter of the frame and the gains of the adaptive and fixed contributions of the excitation of at least one previous sub-frame estimation coefficients which are different for each sub-frame and determined using a large training database.
35. The quantizing method according to any one of claims 26 to 34, wherein estimating the gain of the fixed contribution of the excitation comprises using, for estimating the gain of the fixed contribution of the excitation, estimation coefficients different for each sub-frame of the frame.
36. The quantizing method according to any one of claims 26 to 35, wherein estimation of the gain of the fixed contribution of the excitation is confined in the frame to increase robustness against frame erasure.
37. A method for jointly quantizing gains of adaptive and fixed contributions of an excitation in a frame of a coded sound signal, comprising: quantizing the gain of the adaptive contribution of the excitation; and quantizing the gain of the fixed contribution of the excitation using the method as defined in any one of claims 26 to 36.
38. The method for jointly quantizing the gains of the adaptive and fixed contributions of the excitation according to claim 37, using a gain codebook having entries each comprising the quantized gain of the adaptive contribution of the excitation and a correction factor for the estimated gain.
39. The method for jointly quantizing the gains of the adaptive and fixed contributions of the excitation according to claim 38, wherein quantizing the gain of the adaptive contribution of the excitation and quantizing the gain of the fixed contribution of the excitation comprises searching the gain codebook and selecting the gain of the adaptive contribution of the excitation from one entry of the gain codebook and the correction factor of the same entry of the gain codebook as a quantization of the gain of the fixed contribution of the excitation.
40. The method for jointly quantizing the gains of the adaptive and fixed contributions of the excitation according to claim 38, comprising designing the gain codebook for each sub-frame of the frame.
41. The method for jointly quantizing the gains of the adaptive and fixed contributions of the excitation according to claim 40, wherein the gain codebook has different sizes in different sub-frames of the frame.
42. The method for jointly quantizing the gains of the adaptive and fixed contributions of the excitation according to claim 39, wherein quantizing the gain of the adaptive contribution of the excitation and quantizing the gain of the fixed contribution of the excitation comprise searching the gain codebook completely in each sub-frame.
43. A method for retrieving a quantized gain of a fixed contribution of an excitation in a sub-frame of a frame, comprising: receiving a gain codebook index; estimating the gain of the fixed contribution of the excitation in the sub-frame, using a value of a parameter t representative of a classification of the frame as a multiplicative factor in at least one term of a function used to calculate the estimated gain of the fixed contribution of the excitation; supplying, from a gain codebook and for the sub-frame, a correction factor in response to the gain codebook index; and multiplying the estimated gain by the correction factor to provide a quantized gain of the fixed contribution of the excitation in said sub-frame.
44. The method for retrieving the quantized gain of the fixed contribution of the excitation according to claim 43, wherein estimating the gain of the fixed contribution of the excitation comprises, for a first sub-frame of the frame, calculating a first estimation of the gain of the fixed contribution of the excitation in response to the value of the parameter t representative of the classification of the frame, and subtracting an energy of a filtered innovation codevector from a fixed codebook from the first estimation to obtain the estimated gain.
45. The method for retrieving the quantized gain of the fixed contribution of the excitation according to claim 43, wherein estimating the gain of the fixed contribution of the excitation comprises using, in each sub-frame of said frame following the first sub-frame, the value of the parameter t representative of the classification of the frame and gains of adaptive and fixed contributions of the excitation of at least one previous sub-frame of the frame to estimate the gain of the fixed contribution of the excitation.
46. The method for retrieving the quantized gain of the fixed contribution of the excitation according to any one of claims 43 to 45, wherein estimating the gain of the fixed contribution of the excitation comprises using estimation coefficients different for each sub-frame of the frame.
47. The method for retrieving the quantized gain of the fixed contribution of the excitation according to any one of claims 43 to 46, wherein the estimation of the gain of the fixed contribution of the excitation confines estimation of the gain of the fixed contribution of the excitation in the frame to increase robustness against frame erasure.
48. A method as defined in claim 43 for retrieving the quantized gain of the fixed contribution of the excitation and a quantized gain of an adaptive contribution of the excitation in the sub-frame of the frame, comprising: supplying, from the gain codebook and for the sub-frame, the quantized gain of the adaptive contribution of the excitation in response to the gain codebook index.
49. The method for retrieving the quantized gains of the adaptive and fixed contributions of the excitation according to claim 48, wherein the gain codebook comprises entries each comprising the quantized gain of the adaptive contribution of the excitation and the correction factor for the estimated gain.
50. The method for retrieving the quantized gains of the adaptive and fixed contributions of the excitation according to claim 48 or 49, wherein the gain codebook has different sizes in different sub-frames of the frame.
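The estimation, codebook search and retrieval steps recited in claims 26 to 50 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names, the base-10 logarithm, the RMS-style energy measure, the coefficient layout (a0, a1, and one pair of coefficients per previous sub-frame) and the simple squared-error search criterion are all assumptions made for illustration. The claims only state that the classification parameter t acts as a multiplicative factor in at least one term, that the coefficients differ per sub-frame, and that the quantized fixed gain is the estimated gain multiplied by a correction factor from the gain codebook.

```python
import math

def log_energy(filtered_codevector):
    # Assumed energy measure: log10 of the RMS value of the filtered innovation codevector.
    mean_sq = sum(x * x for x in filtered_codevector) / len(filtered_codevector)
    return math.log10(math.sqrt(mean_sq))

def estimate_first_subframe(t, a0, a1, filtered_codevector):
    # Linear estimation in the log domain; t multiplies the hypothetical coefficient a1.
    log_est = a0 + a1 * t
    # Subtract the log-domain energy of the filtered innovation, then go back to linear domain.
    return 10.0 ** (log_est - log_energy(filtered_codevector))

def estimate_later_subframe(t, a0, a1, b, prev_fixed_gains, prev_adaptive_gains):
    # b holds one hypothetical (b_c, b_p) coefficient pair per previous sub-frame of the same frame.
    log_est = a0 + a1 * t
    for (b_c, b_p), g_c, g_p in zip(b, prev_fixed_gains, prev_adaptive_gains):
        # Previous quantized fixed gains enter in the log domain, adaptive gains linearly.
        log_est += b_c * math.log10(g_c) + b_p * g_p
    return 10.0 ** log_est

def retrieve_gains(index, codebook, estimated_gain):
    # Decoder side: each entry is (quantized adaptive gain, correction factor).
    g_p, correction = codebook[index]
    return g_p, correction * estimated_gain

def search_codebook(codebook, estimated_gain, target_g_p, target_g_c):
    # Encoder side: exhaustive search over every entry. The squared error on the gains
    # is a stand-in criterion (assumption), not the error measure of an actual codec.
    def error(index):
        g_p, g_c = retrieve_gains(index, codebook, estimated_gain)
        return (g_p - target_g_p) ** 2 + (g_c - target_g_c) ** 2
    return min(range(len(codebook)), key=error)

# Usage with made-up numbers: a unit-energy codevector and a two-entry codebook.
codebook = [(0.3, 0.5), (0.8, 1.2)]
est = estimate_first_subframe(t=2, a0=0.5, a1=0.25, filtered_codevector=[1.0, 1.0, 1.0, 1.0])
idx = search_codebook(codebook, est, target_g_p=0.75, target_g_c=11.0)
g_p_q, g_c_q = retrieve_gains(idx, codebook, est)
```

Because every input to the estimate comes from the current frame (t and the gains of earlier sub-frames of that frame), a decoder running the same functions rebuilds the same estimated gain from the received index alone, which is the in-frame confinement that claims 36 and 47 tie to robustness against frame erasure.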
NZ611801A 2011-02-15 2012-02-14 Device and method for quantizing the gains of the adaptive and fixed contributions of the excitation in a celp codec NZ611801B2 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201161442960P 2011-02-15 2011-02-15
US61/442,960 2011-02-15
PCT/CA2012/000138 WO2012109734A1 (en) 2011-02-15 2012-02-14 Device and method for quantizing the gains of the adaptive and fixed contributions of the excitation in a celp codec

Publications (2)

Publication Number Publication Date
NZ611801A NZ611801A (en) 2015-06-26
NZ611801B2 true NZ611801B2 (en) 2015-09-29

Similar Documents

Publication Publication Date Title
CA2821577C (en) Device and method for quantizing the gains of the adaptive and fixed contributions of the excitation in a celp codec
RU2441286C2 (en) Method and apparatus for detecting sound activity and classifying sound signals
US8392178B2 (en) Pitch lag vectors for speech encoding
EP2102619A1 (en) Method and device for coding transition frames in speech signals
US10672411B2 (en) Method for adaptively encoding an audio signal in dependence on noise information for higher encoding accuracy
US10115408B2 (en) Device and method for quantizing the gains of the adaptive and fixed contributions of the excitation in a CELP codec
NZ611801B2 (en) Device and method for quantizing the gains of the adaptive and fixed contributions of the excitation in a celp codec
HK40028306B (en) Device and method for quantizing the gains of the adaptive and fixed contributions of the excitation in a celp codec
HK40028306A (en) Device and method for quantizing the gains of the adaptive and fixed contributions of the excitation in a celp codec
HK1187441A (en) Device and method for quantizing the gains of the adaptive and fixed contributions of the excitation in a celp codec
HK1187441B (en) Device and method for quantizing the gains of the adaptive and fixed contributions of the excitation in a celp codec