JP3070073B2 - Shape control method based on audio signal

Info

Publication number: JP3070073B2
Application number: JP2185557A
Authority: JP
Inventors: Naoto Iwahashi (岩橋 直人)
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1990-07-13
Filing date: 1990-07-13
Publication date: 2000-07-24
Anticipated expiration: 2015-07-24
Also published as: JPH0473698A

Description

[Field of Industrial Application]

The present invention relates to a shape control method based on an audio signal, which controls the shape of the jaw and lips of a face, in a video image or on a doll or the like, on the basis of the audio signal.

[Summary of the Invention]

The present invention provides a shape control method based on an audio signal in which the formant center frequencies of an input audio signal are subjected to a linear transformation and a nonlinear transformation to obtain the degree of opening of the lower jaw and of the lips, thereby enabling more realistic control of the shape of the jaw and lips.

[Conventional technology]

In conventional animation, for example, the movements of the lips, jaw, and so on of a character who is speaking are decided by the animator, who guesses the lip movements that match the speech on the basis of past experience. The same applies when the lips and jaw of a humanoid robot, a doll, or the like are moved in accordance with speech.

[Problems to be solved by the invention]

In recent years, there has been a demand for animation, robots, dolls, and the like in which the lips, jaw, and so on move more realistically in accordance with speech.

However, as described above, the lip movements and the like that match speech have conventionally been guessed from experience, so they can hardly be called realistic. Methods that compute the movements of the lips and the like with a computer have also been considered, but they require an enormous amount of computation, and in practice more realistic lip movements cannot be obtained easily. Furthermore, the conventional approaches cannot move the lips and the like in real time.

The present invention has therefore been proposed in view of these circumstances, and its object is to provide a shape control method based on an audio signal that can control the shape of the jaw and lips of a face, in a video image or on a doll or the like, more realistically, and that can also perform this control in real time.

[Means for solving the problem]

To achieve the above object, the shape control method based on an audio signal according to the present invention obtains, from an input audio signal, the center frequencies of the formants, which are the peaks of the spectral envelope of the input audio signal, and applies a linear transformation and a nonlinear transformation to these formant center frequencies to obtain the degree of opening of the lower jaw and the degree of lateral opening of the lips of a face, for example in a video image or on a doll (robot). Specifically, the jaw movement and the tongue movement are obtained by linearly transforming the formant center frequencies, and the tongue movement is then nonlinearly transformed, so that the degree of opening of the lower jaw and the degree of lateral opening of the lips are obtained.

[Action]

According to the present invention, a simple linear transformation is applied to the formant center frequencies of an input audio signal derived from actual speech, and a simple nonlinear transformation is then applied to obtain the degree of opening of the lower jaw and the lips. The movements of the lips and lower jaw can therefore be reproduced easily and in real time.

[Embodiment]

An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 shows a face image, for example in an animation, to which the shape control method based on an audio signal according to the embodiment of the present invention is applied.

For the face image of this embodiment shown in FIG. 1, the center frequencies of the formants (for example, the first and second formants H₁ and H₂), which are the peaks of the spectral envelope of the input audio signal as shown in FIGS. 2 to 4, are obtained from the input audio signal, and these formant center frequencies are linearly and nonlinearly transformed to obtain the degree of opening D (cm) of the lower jaw of the face and the degree of lateral opening L (cm) of the lips. That is, in this embodiment, the formant center frequencies are linearly transformed using equation (1), described later, to obtain the jaw movement, that is, the degree of opening D of the lower jaw, together with the movement of the tongue at each of the tongue positions P₁ to P₅ shown in FIG. 8 (FIG. 6) and/or the movement of the tongue-tip shape shown in FIG. 7. This tongue movement is then nonlinearly transformed using equation (2), described later, to obtain the degree of lateral opening L of the lips.

FIG. 2 shows the spectral envelope of the audio signal when, for example, the sound "a" (ア) is uttered; FIG. 3 shows the spectral envelope for "i" (イ), and FIG. 4 that for "u" (ウ). The peaks of the spectral envelopes shown in FIGS. 2 to 4 are called formants, and a formant of an audio signal is generally defined as a damped sinusoidal component of the acoustic impulse response of the vocal tract. For an average vocal tract about 17 cm long, there are generally 3 to 4 formants below 3 kHz and 4 to 5 formants below 5 kHz. For voiced sounds the first three formants are the most important. The peak at the lowest frequency is called the first formant (H₁), the next peak the second formant (H₂), followed by the third formant, the fourth formant, and so on. These formants can be obtained by, for example, cepstral analysis or linear predictive analysis; linear predictive analysis in particular requires only a small amount of computation.
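The passage notes that linear predictive analysis yields the formants cheaply. As an illustration only, the following is a minimal sketch of the standard textbook approach, assuming one voiced frame sampled at `fs` Hz: fit an all-pole (LPC) model by the autocorrelation method and the Levinson-Durbin recursion, take the roots of the prediction polynomial, and read each root's angle as a resonance frequency. The function names, the model-order rule of thumb (2 + fs/1000), the pre-emphasis factor 0.63, and the 400 Hz bandwidth cutoff are illustrative assumptions, not values taken from this patent.

```python
import numpy as np


def lpc_coefficients(x, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns a[0..order] with a[0] = 1, so the all-pole model is 1 / A(z)
    with A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    """
    n = len(x)
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        # Update a[1..i-1] from the previous stage (the right side is evaluated first).
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def estimate_formants(frame, fs, order=None, max_bandwidth=400.0):
    """Return formant frequency estimates (Hz), ascending, for one voiced frame."""
    if order is None:
        order = 2 + int(fs / 1000)  # common rule of thumb (assumption)
    x = np.asarray(frame, dtype=float)
    x = np.append(x[0], x[1:] - 0.63 * x[:-1])  # first-order pre-emphasis (assumed factor)
    x = x * np.hamming(len(x))
    a = lpc_coefficients(x, order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]  # keep one root of each conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bandwidths = -np.log(np.abs(roots)) * fs / np.pi
    keep = (freqs > 90.0) & (bandwidths < max_bandwidth)
    return np.sort(freqs[keep])
```

Called on successive short frames (for example 20 to 30 ms each), the first two returned values can serve as estimates of F₁ and F₂. Roots with a very wide bandwidth are discarded because they tend to model the overall spectral tilt rather than a resonance.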

The formant frequencies of each of the vowels "a" (ア), "i" (イ), "u" (ウ), "e" (エ), and "o" (オ) are obtained as described above. FIG. 5 shows the positions of the vowels when, for example, the center frequencies of the first formant H₁ and the second formant H₂ of each vowel are plotted in a plane. FIG. 5 confirms that "o" lies between "a" and "u", and "e" lies between "i" and "a".

It is known that speech and tongue movement are related as shown, for example, in FIGS. 6 and 7. FIG. 6 shows the curvature function (tongue-shape curvature) C (cm⁻¹) at each of the tongue positions P₁ to P₅ of FIG. 8 for each of the vowels "a", "i", "u", "e", and "o", and FIG. 7 shows the shape of the tongue tip for each vowel. In FIG. 8, the positions P₁ to P₅ lie on the center line of the upper surface of the tongue: P₁ is, for example, 10 mm from the tip of the tongue, P₂ is 5 mm further back than P₁, and P₃, P₄, and P₅ are each a further 5 mm back, in that order. Accordingly, FIG. 6 shows, for the utterance of each vowel, the curvature function (tongue-shape curvature) C of the tongue over a 5 mm range at each of the positions P₁ to P₅, and FIG. 7 shows this tongue-shape curvature C for each vowel converted into the shape of the tongue tip.

For each vowel, the tongue-shape curvature C at each of the positions P₁ to P₅ and the degree of opening D of the lower jaw are related as shown in FIGS. 9 to 13. FIG. 9 shows the relationship between the degree of opening D of the lower jaw during the utterance of each vowel and the tongue-shape curvature C at position P₁; FIG. 10 shows the same relationship for position P₂; and likewise FIGS. 11, 12, and 13 show the relationship between D and the tongue-shape curvature C at positions P₃, P₄, and P₅, respectively. FIGS. 9 to 13 show that, in the relationship between the lower-jaw opening D and the tongue-shape curvature C, "o" lies between "a" and "u", and "e" lies between "a" and "i". This arrangement of the vowels is common to all of FIGS. 9 to 13.

From FIGS. 9 to 13 and FIG. 5 it can be confirmed that the arrangement of the vowels in terms of the lower-jaw opening D and tongue-shape curvature C, and their arrangement in terms of the first and second formants H₁ and H₂, follow the same relationship: "o" lies between "a" and "u", and "e" lies between "a" and "i". That is, the arrangement of the vowels by formant frequency approximately matches their arrangement by tongue-shape curvature C and lower-jaw opening D.

Consequently, a function that approximately maps the formant frequencies of the first and second formants H₁ and H₂ shown in FIG. 5 to the tongue-shape curvature C and lower-jaw opening D shown in FIGS. 9 to 13 can be derived relatively easily.

In this embodiment, this approximate mapping function is linear. The linear transformation can then be expressed by equation (1) [equation not reproduced in this text], where F₁ is the formant frequency (Hz) of the first formant H₁ and F₂ is the formant frequency (Hz) of the second formant H₂. A further definition accompanying equation (1) is likewise not reproduced in this text.

A and B in equation (1) are obtained so that the linear transformation maps the points r_x, p_x, and q_x, which give the positions of "a", "i", and "u" in terms of formant frequency as in FIG. 14, onto the points r_y, p_y, and q_y, which give the positions of "a", "i", and "u" in terms of tongue-shape curvature C and lower-jaw opening D as shown in FIG. 16.
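Because equation (1) is not reproduced here, the following sketch assumes it has the affine form y = A x + B, with x = (F₁, F₂) and y = (C, D) for one tongue position, where A is a 2×2 matrix and B a 2-vector. Under that assumption the six unknowns are fixed exactly by the three correspondences r_x→r_y, p_x→p_y, q_x→q_y, which reduces to one 3×3 linear solve. The names `fit_affine` and `apply_affine` are hypothetical.

```python
import numpy as np


def fit_affine(src, dst):
    """Find A (2x2) and B (2,) such that dst[k] = A @ src[k] + B for three point pairs.

    src: 3x2 array of (F1, F2) points, e.g. for the vowels "a", "i", "u".
    dst: 3x2 array of (C, D) points for the same vowels.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    m = np.hstack([src, np.ones((3, 1))])  # each row is [F1, F2, 1]
    sol = np.linalg.solve(m, dst)  # raises LinAlgError if the three points are collinear
    A = sol[:2].T  # rows of sol[:2] hold the columns of A
    B = sol[2]
    return A, B


def apply_affine(A, B, f):
    """Map one (F1, F2) point to an estimated (C, D) point."""
    return A @ np.asarray(f, dtype=float) + B
```

An affine map preserves weighted averages of points, so a vowel lying between "a" and "u" in the formant plane is mapped between the images of "a" and "u". This is consistent with the observation that "o" and "e" keep their in-between positions in both planes.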

Because formant frequencies vary from speaker to speaker, these individual differences are removed by normalization. The normalization is performed, for example, by transforming the triangle whose vertices are "a", "i", and "u" in FIG. 14 into an equilateral triangle as shown in FIG. 15, which makes it possible to normalize each vowel. The transformation into an equilateral triangle is described later.

Further, in this embodiment, the tongue-shape curvature C is converted into the degree of lateral opening L of the lips by a nonlinear transformation, for example

L = (C + d₁)^(1/2) + d₂   (2)

where d₁ and d₂ are constants. This nonlinear transformation makes the mouth movement more natural.
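Equation (2) is a square-root law followed by an offset. A minimal sketch, assuming scalar inputs: the source gives no values for d₁ and d₂, so they are left as parameters, and clamping a negative radicand C + d₁ to zero is an added assumption that keeps the function defined for every input. The function name `lip_opening` is hypothetical.

```python
import math


def lip_opening(c, d1, d2):
    """Equation (2): L = (C + d1)^(1/2) + d2.

    d1 and d2 are the constants of equation (2); no values are given in the source.
    A negative radicand is clamped to 0 (an assumption, not from the source).
    """
    return math.sqrt(max(c + d1, 0.0)) + d2
```

Since dL/dC = 1/(2·(C + d₁)^(1/2)), the lip width responds strongly to changes in C when C + d₁ is small and progressively less as C grows.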

In this embodiment, the formant frequencies are linearly transformed into the tongue-shape curvature C and the lower-jaw opening D as described above, and the tongue-shape curvature C is nonlinearly transformed into the degree of lateral opening L of the lips. These operations make it possible to estimate the degree of opening D of the lower jaw and the degree of lateral opening L of the lips from the audio signal. Therefore, the shape control method of this embodiment can control the shape of the jaw and lips more realistically, not only in video images such as animation but also on the face of a doll or the like. Furthermore, because individual differences in utterance are removed by normalization, more accurate shape control is possible.

A method of transforming an arbitrary triangle used in the normalization into an equilateral triangle, and of performing the inverse transformation, is now described. As shown in FIG. 17, consider a triangle pqr in the X-Y plane; it is transformed into an equilateral triangle p⁽⁶⁾q⁽⁶⁾r⁽⁶⁾ with side length 1, as shown in FIG. 23, by the following procedure. First, the triangle pqr of FIG. 17 is translated so that the point p (x₁, y₁) moves to the origin (FIG. 18). Next, the X and Y coordinates are scaled so that the X coordinate x₂⁽¹⁾ of the point q⁽¹⁾ and the Y coordinate y₃⁽¹⁾ of the point r⁽¹⁾ of the triangle p⁽¹⁾q⁽¹⁾r⁽¹⁾ of FIG. 18 become 1 (FIG. 19). The triangle p⁽²⁾q⁽²⁾r⁽²⁾ is then rotated by an angle θ so that the Y coordinate y₂⁽²⁾ of the point q⁽²⁾ of FIG. 19 becomes 0 (FIG. 20). The X and Y coordinates are scaled again so that the X coordinate x₂⁽³⁾ of the point q⁽³⁾ and the Y coordinate y₃⁽³⁾ of the point r⁽³⁾ of the triangle p⁽³⁾q⁽³⁾r⁽³⁾ of FIG. 20 become 1 (FIG. 21). Letting a be the difference between the X coordinate x₃⁽⁴⁾ of the point r⁽⁴⁾ of the triangle p⁽⁴⁾q⁽⁴⁾r⁽⁴⁾ of FIG. 21 and X = 0.5, the X coordinate of the point r⁽⁴⁾ is scale-converted using the straight line Y = X/a (FIG. 22). Finally, the Y coordinates are scaled so that the height of the triangle p⁽⁵⁾q⁽⁵⁾r⁽⁵⁾ of FIG. 22 (the Y coordinate of r⁽⁵⁾) becomes √3/2 (FIG. 23). By this procedure, an arbitrary triangle pqr can be transformed into an equilateral triangle, and the inverse transformation is possible by following the procedure in reverse.
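The six steps of the triangle normalization can be sketched as a product of 3×3 homogeneous matrices, which makes the inverse transformation simply the matrix inverse. This is a minimal sketch under stated assumptions: step 5 ("scale-convert the X coordinate of r⁽⁴⁾ using the straight line Y = X/a") is interpreted here as the horizontal shear x' = x − a·y, which moves r⁽⁴⁾ = (x₃⁽⁴⁾, 1) to X = 0.5 while leaving p and q on the X axis fixed, and all helper names are hypothetical. As in the text, step 2 requires the translated q to have a nonzero X coordinate and the translated r a nonzero Y coordinate.

```python
import numpy as np


def translate(tx, ty):
    return np.array([[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]])


def scale(sx, sy):
    return np.array([[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]])


def rotate(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def shear_x(a):
    # x' = x - a*y: points on the X axis stay put, a point at height y = 1 moves left by a.
    return np.array([[1.0, -a, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])


def transform_points(m, pts):
    """Apply a 3x3 homogeneous matrix to an (n, 2) array of points."""
    h = np.hstack([pts, np.ones((len(pts), 1))])
    return (h @ m.T)[:, :2]


def to_equilateral(p, q, r):
    """Matrix mapping triangle pqr to p=(0,0), q=(1,0), r=(0.5, sqrt(3)/2)."""
    pts = np.array([p, q, r], dtype=float)
    m = translate(-pts[0, 0], -pts[0, 1])  # step 1: move p to the origin
    t = transform_points(m, pts)
    m = scale(1.0 / t[1, 0], 1.0 / t[2, 1]) @ m  # step 2: x of q and y of r become 1
    t = transform_points(m, pts)
    m = rotate(-np.arctan2(t[1, 1], t[1, 0])) @ m  # step 3: rotate q onto the X axis
    t = transform_points(m, pts)
    m = scale(1.0 / t[1, 0], 1.0 / t[2, 1]) @ m  # step 4: x of q and y of r become 1 again
    t = transform_points(m, pts)
    m = shear_x(t[2, 0] - 0.5) @ m  # step 5: a = x3 - 0.5, move r to X = 0.5
    return scale(1.0, np.sqrt(3.0) / 2.0) @ m  # step 6: height becomes sqrt(3)/2
```

For example, with a speaker's "a", "i", "u" formant points passed as p, q, r, the returned matrix maps any other (F₁, F₂) point of that speaker into the common equilateral frame via `transform_points`, and `np.linalg.inv` of the same matrix maps normalized points back.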

[Effects of the Invention]

In the shape control method based on an audio signal according to the present invention, the formant center frequencies of an input audio signal are linearly and nonlinearly transformed to obtain the degree of opening of the lower jaw and the lips. As a result, the shape of the jaw and lips of a face, for example in video images such as animation or on a doll, robot, or the like, can be controlled simply and more realistically, and the control can also be performed in real time.

[Brief description of the drawings]

FIG. 1 shows a face image according to an embodiment of the present invention; FIGS. 2 to 4 are characteristic diagrams showing spectral envelopes of audio signals; FIG. 5 illustrates the formant frequencies of audio signals; FIG. 6 shows tongue-shape curvature; FIG. 7 shows the shape of the tongue tip; FIG. 8 shows the tongue positions; FIGS. 9 to 13 illustrate the tongue-shape curvature and lower-jaw opening at each tongue position; FIGS. 14 to 16 illustrate the conversion from formant frequencies to tongue-shape curvature and lower-jaw opening; and FIGS. 17 to 23 illustrate the method of transforming an arbitrary triangle into an equilateral triangle.

D: degree of opening of the lower jaw
L: degree of lateral opening of the lips

Continuation of the front page
(51) Int.Cl.⁷ Identification code FI: G10L 19/02 G10L 9/06 C 21/06 3/00 551H // A63F 13/00 S 9/04 G
(56) References cited: JP-A-4-359299 (JP, A); JP-A-2-83727 (JP, A); JP-A-57-126000 (JP, A); JP-A-47-3008 (JP, A); JP-U-4-40285 (JP, U); Patent 2667455 (JP, B2); Patent 2644789 (JP, B2); Patent 2518683 (JP, B2)
(58) Fields searched (Int.Cl.⁷, DB name): G10L 11/00-13/08; G10L 19/00-21/06; G10L 15/00-17/00; JICST file (JOIS); utility model file (PATOLIS); patent file (PATOLIS)

Claims

(57) [Claims]

A shape control method based on an audio signal, wherein the center frequencies of the formants, which indicate peaks of the spectral envelope of an input audio signal, are obtained from the input audio signal, and the formant center frequencies are linearly and nonlinearly transformed to obtain a degree of opening of the lower jaw and a degree of lateral opening of the lips of a face shape.