JP3535360B2

JP3535360B2 - Sound generation method, sound generation device, and recording medium

Info

Publication number: JP3535360B2
Application number: JP31235197A
Authority: JP
Inventors: 美和子土井; 克之村田; 明森下
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1997-11-13
Filing date: 1997-11-13
Publication date: 2004-06-07
Anticipated expiration: 2017-11-13
Also published as: JPH11143462A

Description

Detailed Description of the Invention

[0001] [Technical Field of the Invention] The present invention relates to a sound generation device that generates sound based on an acquired image and the three-dimensional arrangement positions of acoustic elements.

[0002] [Description of the Related Art] Music is usually performed on an instrument such as a piano or a guitar. On a piano, a piece is performed by playing the keyboard according to the score. On a guitar, music is produced by plucking the strings with a fingernail or a pick. With conventional instruments, therefore, both the playing technique and the resulting sound differ from one instrument to another.

[0003] Electronic musical instruments such as synthesizers were created to overcome this difficulty. They are played from a keyboard, just like a piano, but the player can select the timbre and other characteristics of the sound. When a timbre or the like is selected, a built-in microcomputer changes the corresponding parameters and the sound is generated.

[0004] Synthesizers have certainly made it possible to perform and enjoy music in a variety of timbres. In practice, however, considerable training is needed before a player can change timbres while playing the keyboard and enjoy the instrument freely. Music software is also sold that lets the user perform music by selecting an on-screen synthesizer keyboard, slide bars, and the like with a mouse. This removes the need for a large keyboard, so music can be performed with a single personal computer even in a small space. On the other hand, selecting items one at a time with a mouse instead of playing a keyboard is laborious.

[0005] Research on new instruments that break through these limits of conventional electronic instruments, called Hyper Instruments (hereinafter referred to as "super instruments"), has been carried out by Tod Machover and colleagues at the MIT Media Lab. For example, they developed Bug-Mudra, in which gestures are input with a data glove (a glove with optical fibers routed along the fingers, which measures the bending angle of the finger joints from the light transmitted through the fibers) to play a virtual guitar or the like. However, wearing the data glove is a burden on the performer, and even if the glove is calibrated when it is put on, it shifts during a performance, so it is difficult to measure the finger-joint angles robustly.

[0006] Machover et al. also developed the SensorChair, in which a person sits on a chair that forms an electric field; the movement of the person's body is measured from the change in the electric field caused by that movement, and sound is generated according to the result. Similarly, they developed Frames, in which an electric field is formed inside a rectangular frame suspended in the air, the movement of a hand inserted into the frame is measured from the change in the electric field, and sound is generated. However, the sounds produced by these instruments differ considerably from those of conventional music, and playing them musically requires more skill than conventional classical instruments do.

[0007] Meanwhile, there are programs that generate sound by converting the color at the position indicated by the on-screen mouse pointer into a sound. In this case the correspondence between colors and sounds is fixed in advance, and because there is only one color at the pointer position, only one note is generated at a time. It is very difficult to produce a good melody line this way while continuously clicking the mouse button, and because only single notes are available, it is difficult to generate anything richer than a melody line.

[0008] [Problems to Be Solved by the Invention] Attempts to realize a more intuitive way of playing that does not rely on conventional classical techniques such as playing a keyboard or a guitar, like the SensorChair and Frames of Tod Machover et al., produce sounds that are too avant-garde, and a fair amount of training is needed to play them musically.

[0009] The method of simply converting the display color at the position indicated by the mouse pointer into a sound is too simple to yield musically rich sound, and keeping the mouse button pressed is difficult. An object of the present invention is therefore to provide a sound generation method that can easily generate a desired sound with a simple operation, and a sound generation device using the method.

[0010] [Means for Solving the Problems] (1) The sound generation method of the present invention (claim 1) calculates the power spectrum of acoustic elements based on depth information of an acquired image and the positional relationship of the image to the two-dimensional arrangement positions of the acoustic elements, and generates sound based on the calculated power spectrum and the acoustic elements. A desired sound can thereby be generated easily with a simple operation.

[0011] The sound generation device of the present invention (claim 3) comprises: acoustic element storage means for storing two-dimensional arrangement positions of acoustic elements; image acquisition means for acquiring an image; calculation means for calculating the power spectrum of the acoustic elements based on depth information of the image acquired by the image acquisition means and the positional relationship of that image to the arrangement positions of the acoustic elements stored in the acoustic element storage means; and generation means for generating sound based on the power spectrum calculated by the calculation means and the acoustic elements.

(2) The sound generation method of the present invention (claim 2) calculates the power spectrum of acoustic elements based on depth information of an acquired image and the positional relationship of the image to the three-dimensional arrangement positions of the acoustic elements, and generates sound based on the calculated power spectrum and the acoustic elements. A desired sound can thereby be generated easily with a simple operation.

[0012] The sound generation device of the present invention (claim 4) comprises: acoustic element storage means for storing three-dimensional arrangement positions of acoustic elements; image acquisition means for acquiring an image; calculation means for calculating the power spectrum of the acoustic elements based on depth information of the image acquired by the image acquisition means and the positional relationship of that image to the arrangement positions of the acoustic elements stored in the acoustic element storage means; and generation means for generating sound based on the power spectrum calculated by the calculation means and the acoustic elements.

(3) The sound generation device of the present invention (claim 5) further comprises image storage means for storing the image acquired by the image acquisition means, and the calculation means calculates the power spectrum of the acoustic elements based on depth information of the image stored in the image storage means and the positional relationship of that image to the arrangement positions of the acoustic elements. As a result, the same sound can be generated repeatedly as long as the arrangement of the acoustic elements is unchanged.

[0013] The sound generation device of the present invention (claim 6) comprises changing means for changing the arrangement positions of the acoustic elements, so that a desired sound can be generated easily with a simple operation. For example, the generated sound can be further modified to suit the acquired images of body and hand gestures.

[0014] The sound generation device of the present invention (claim 7) comprises changing means for changing at least one of the power spectrum and the arrangement positions of the acoustic elements according to a predetermined rule, so that a desired sound can be generated easily with a simple operation. For example, varied sound can be generated even when the motion in the acquired body or hand gesture images is monotonous.

[0015] The sound generation device of the present invention (claim 8) comprises changing means for changing at least one of the power spectrum and the arrangement positions of the acoustic elements based on information extracted from a given music file, so that a desired sound can be generated easily with a simple operation. For example, sound matched to background music (BGM) can be generated.

[0016] [Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings.

(First Embodiment) FIG. 1 schematically shows the configuration of the sound generation device according to the first embodiment. As shown in FIG. 1, the sound generation device comprises: an image acquisition unit 1, consisting of imaging means such as a camera, for acquiring images of hand and body gestures in which depth values can be determined (hereinafter, depth images); an acoustic element storage unit 2 that stores the two-dimensional or three-dimensional arrangement positions of a plurality of acoustic elements, for example elements of different frequencies; an acoustic element calculation unit 4 that, from the xy-plane image extracted from the gesture depth image acquired by the image acquisition unit 1 and the arrangement positions stored in the acoustic element storage unit 2, selects the acoustic elements located where they overlap the xy-plane image, and generates sound from the selected acoustic elements and the power spectrum of each selected element calculated from the z values (depth information) determined from the depth image; a power spectrum calculation unit 5 that calculates the power spectrum used by the acoustic element calculation unit 4; a sound generation unit 7 that presents the calculation result of the acoustic element calculation unit 4 as sound; a display unit 6 that displays the arrangement positions of the acoustic elements stored in the acoustic element storage unit 2, the gesture depth images acquired by the image acquisition unit 1, and the calculation results of the acoustic element calculation unit 4; and an information management unit 3 that controls the exchange of information among these units.

[0017] FIG. 2 shows an example of the format in which the acoustic element storage unit 2 stores the arrangement positions: for each type of acoustic element, the type and the multiple positions at which elements of that type are placed are stored.

[0018] Here, acoustic elements are classified into types according to, for example, their frequency. With the storage format of FIG. 2, an acoustic element of a given frequency need not be placed at only one location; it can also be scattered over several locations.

[0019] FIG. 2 shows the case where the acoustic elements are arranged two-dimensionally, so each position is stored as a pair of x and y coordinates. The last cell of each address stores the address of the next record. If "NULL" is written there, there is no further data.
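
The storage format described for FIG. 2 can be pictured as a chain of records, one per element type, each holding several positions and a link to the next record. The following is a minimal sketch under that reading; the class name `ElementRecord`, its field names, and the sample values are illustrative assumptions and are not taken from the figure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ElementRecord:
    """One record of a FIG. 2-style table (hypothetical layout)."""
    kind: str                                      # element type, e.g. "sound 1"
    frequency_hz: float                            # types are distinguished by frequency
    positions: list                                # (x, y) pairs where this type is placed
    next_record: Optional["ElementRecord"] = None  # None plays the role of "NULL"


def iter_records(head):
    """Follow the chained addresses until the NULL terminator is reached."""
    record = head
    while record is not None:
        yield record
        record = record.next_record


# Hypothetical contents; the actual values of FIG. 2 are not reproduced here.
p2 = ElementRecord("sound 2", 200.0, [(5, 1)])
p1 = ElementRecord("sound 1", 100.0, [(1, 1), (9, 12)], next_record=p2)
kinds = [record.kind for record in iter_records(p1)]
```

For a three-dimensional arrangement such as that of FIG. 4, the same sketch would hold (x, y, z) triples in `positions`.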

[0020] FIG. 3 shows an example of how the acoustic elements stored in the format of FIG. 2 are displayed on the display unit 6. In FIG. 3, acoustic elements with frequencies from 100 Hz to 4.0 kHz are arranged. For reference, the note A ("la") used to tune an orchestra is 440 Hz.

[0021] Which frequencies are used and how the acoustic elements are arranged can be set freely, for example according to the intended use of the sound generation device. FIG. 2 shows a storage example for a two-dimensional arrangement, but the invention is not limited to this; a three-dimensional arrangement is also possible.

[0022] FIG. 4 shows another example of how the arrangement positions are stored in the acoustic element storage unit 2, for the case where the acoustic elements are arranged three-dimensionally. In FIG. 4, the stored position of each acoustic element is a set of three coordinates: x, y, and z.

[0023] FIG. 5 shows an example of how the acoustic elements stored in the format of FIG. 4 are displayed on the display unit 6, with the elements arranged three-dimensionally. In FIG. 5 each acoustic element is drawn as a cylinder, but the shape is not limited to this; it may be a sphere, a cone, a cube, or a more complex shape.

[0024] FIG. 6 shows an image of a hand as an example of an image acquired by the image acquisition unit 1. This hand image is a three-dimensional image with x, y, and z coordinates. One way to acquire such an image is to analyze video captured by a video camera using conventional image processing and cut out the shape by edge extraction or the like. However, conventional image processing cannot extract depth values, so the operation of the invention is described here on the assumption that an image in which the depth value (z coordinate) can be determined (a depth image) has been acquired by the image acquisition method described in Japanese Patent Application No. Hei 9-299648. That is, the depth value is assumed to be a value based on the gray level of each pixel.

[0025] FIG. 7 shows sound being generated (played) from two-dimensionally arranged acoustic elements such as those of FIG. 3 using the hand depth image of FIG. 6. For example, the hand image acquired by the image acquisition unit 1 may be overlaid on the image of the two-dimensionally arranged acoustic elements of FIG. 3 already shown on the display unit 6, giving the display shown in FIG. 7.

[0026] Similarly, FIG. 8 shows sound being generated (played) from three-dimensionally arranged acoustic elements such as those of FIG. 5 using the hand depth image of FIG. 6.

[0027] To simplify the explanation, the operation of the sound generation device of FIG. 1 is described below for the case of playing the two-dimensionally arranged acoustic elements of FIG. 7. FIG. 9 is a flowchart of the operation of the sound generation device of FIG. 1; each unit in FIG. 1 performs the following operations under the control of the information management unit 3.

[0028] First, the information management unit 3 reads the acoustic element data stored in advance in the acoustic element storage unit 2, for example in the format of FIG. 2 (step S1), and sends it to the display unit 6, where it is displayed as shown, for example, in FIG. 3 (step S2). The device then waits for image input from the image acquisition unit 1.

[0029] Once image acquisition by the image acquisition unit 1 has started, the information management unit 3 acquires images such as that of FIG. 6 at, for example, 30 frames per second until the image acquisition unit 1 issues an instruction to end acquisition (steps S3 and S4). When an end instruction is received (step S3), processing ends.

[0030] The image acquired by the image acquisition unit 1 is a depth image in a format such as that shown in FIG. 10. The image of FIG. 10 has, for example, 16 pixels in the x (horizontal) direction, 16 pixels in the y (vertical) direction, and 256 gray levels in the z (depth) direction. In other words, depth values are stored in a 16 x 16 matrix. In the example of FIG. 10, a larger depth value means the object is closer to the arrangement positions of the acoustic elements. A depth value of 0 means that there is no imaged object, or that any object is so far away that it is equivalent to being absent; a depth value of 255 means that the imaged object is at the closest position.

[0031] With the acoustic elements already displayed on the display unit 6 as in FIG. 3, an image formed by extracting the pixels of the acquired depth image whose depth value is at or above a certain threshold is overlaid on them and displayed, for example, as shown in FIG. 7 (step S5).
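
The depth image format and the threshold extraction of step S5 can be sketched as follows. This is only an illustration under stated assumptions: the image is held as a list of 16 rows of 16 integers in the range 0 to 255, and the function name `extract_foreground`, the threshold of 30, and the sample pixel values are hypothetical.

```python
WIDTH, HEIGHT = 16, 16  # image size used in the FIG. 10 example


def extract_foreground(depth_image, threshold):
    """Return the (x, y) pixels whose depth value is at or above the threshold."""
    return {
        (x, y)
        for y, row in enumerate(depth_image)
        for x, value in enumerate(row)
        if value >= threshold
    }


# An empty scene: depth value 0 everywhere means nothing is in view.
depth_image = [[0] * WIDTH for _ in range(HEIGHT)]
depth_image[3][2] = 33   # row y=3, column x=2
depth_image[4][2] = 74
depth_image[5][2] = 10   # below the threshold, so it is dropped

foreground = extract_foreground(depth_image, threshold=30)
```

The resulting pixel set is what would be overlaid on the displayed acoustic elements.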

[0032] Next, in step S6, the power spectrum calculation unit 5 calculates the power spectrum using the depth values of the image acquired by the image acquisition unit 1, such as that of FIG. 10. As one way of calculating the power spectrum, for each acoustic element that the hand image interferes with, the depth values of the interfering portion may be converted into the power value of that element.

[0033] The acoustic element calculation unit 4 performs the calculation for generating three-dimensional sound (sound with depth) based on, for example, the acoustic elements that the hand image interferes with and the power spectrum calculated for them, and the sound generation unit 7 generates the actual sound from the result (step S7).

[0034] Next, the method by which the acoustic element calculation unit 4 and the power spectrum calculation unit 5 calculate the power spectrum is described with reference to the flowchart of FIG. 11. FIG. 12 is a graph showing an example of the power spectrum calculated for each acoustic element.

[0035] First, initialization is performed (step S11). Specifically, the matrix Xij that stores the resulting power spectrum for each acoustic element (hereinafter, the power matrix) is cleared, together with the variable i that identifies the element type (for example, frequency) so that a power value can be stored in Xij for each type. The address S used to read the arrangement positions from the acoustic element storage unit 2 is set to the first address ("p1" in FIG. 2).

[0036] Next, based on the position (coordinates) of the acoustic element stored at address S, it is checked whether the element interferes with the depth image (see FIG. 10), that is, whether the region occupied by the element, determined from its arrangement position, overlaps any pixel of the depth image (step S12).

[0037] If the acoustic element interferes with one or more pixels of the depth image, the type of the element at address S (in FIG. 2, "sound 1" when S = p1) is first stored in Xi1. The depth values of the interfering pixels are then accumulated in Xi2 (step S13).

[0038] As shown in FIG. 2, the acoustic element storage unit 2 stores multiple positions under one address such as "p1" (that is, for one type, here acoustic elements of the same frequency), so it is checked whether there are elements of that type at other positions (step S14). If there are, processing returns to step S12 to check whether the element at that position interferes with any pixel; if it does, the depth values of the interfering pixels are superimposed on (for example, added to) Xi2 in step S13. Likewise, when step S12 finds no pixel interfering with the element at the current position, processing proceeds to step S14, because an element of the same type at a different position may still interfere.

[0039] When step S14 has checked every acoustic element of the same type (that is, those stored at the same address, for example of the same frequency), processing proceeds to step S15, where the address variable S and the variable i are advanced so that the same processing (steps S12 to S14) is applied to the next type (for example, address "p2"). In the storage format of FIG. 2, the last cell of each type's storage area holds the address of the storage area for another type, and this address is stored in S. If the address in the last cell is "NULL" (step S16), all acoustic elements have been processed, and processing ends.

[0040] If the address variable S updated in step S15 is not "NULL" in step S16, processing returns to step S12 and the above is repeated.

[0041] FIG. 12 shows an example of a power spectrum calculated in this way. For each acoustic element frequency (Xi1) that interferes with the acquired image, the value Xi2 of the power matrix is shown as the length of a bar.

[0042] The calculated power spectrum, such as that of FIG. 12, may be displayed on the display unit 6 as it is. The flow of FIG. 11 is now explained concretely for the calculation of the power value of the 100 Hz acoustic element in FIG. 12. In FIG. 7, the index finger in the image interferes with the 100 Hz element, so step S12 determines that there is interference. Suppose the depth values of the interfering pixels (say, two of them) are 33 and 74 according to FIG. 10. In step S13 these two depth values are superimposed on the power matrix, giving a power value of 33 + 74 = 107 for the 100 Hz element. Since the 100 Hz element is placed at only this one location, 107 is the power value for 100 Hz.
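
The loop of FIG. 11 (steps S11 to S16) might be sketched as below. Several points are assumptions made only for illustration: each element is taken to occupy a square footprint of `radius` pixels around its position, a pixel counts as part of the hand when its depth value is nonzero, element types are passed in as a plain list rather than a chained table, and the function names are hypothetical.

```python
def interfering_pixels(position, depth_image, radius=1):
    """Step S12: nonzero-depth pixels inside the square footprint around position."""
    px, py = position
    hits = []
    for y in range(max(0, py - radius), min(len(depth_image), py + radius + 1)):
        row = depth_image[y]
        for x in range(max(0, px - radius), min(len(row), px + radius + 1)):
            if row[x] > 0:
                hits.append((x, y))
    return hits


def power_spectrum(records, depth_image, radius=1):
    """Return rows [type, power], i.e. the columns Xi1 and Xi2 of the power matrix."""
    matrix = []                                  # S11: start from a cleared matrix
    for kind, positions in records:              # S15/S16: advance until no type remains
        power = 0
        for position in positions:               # S14: visit every position of this type
            for x, y in interfering_pixels(position, depth_image, radius):
                power += depth_image[y][x]       # S13: superimpose by addition
        matrix.append([kind, power])
    return matrix


image = [[0] * 16 for _ in range(16)]
image[3][2] = 33    # two interfering pixels with the depth values 33 and 74
image[4][2] = 74
records = [("100 Hz", [(2, 3)]), ("200 Hz", [(10, 10)])]
spectrum = power_spectrum(records, image)
```

With these inputs the 100 Hz row accumulates 33 + 74 = 107, matching the worked example, while the 200 Hz element, which no pixel touches, stays at 0.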

[0043] In the above description, the sum of the depth values of the interfering pixels is used directly as the power value of the acoustic element at each frequency, but the power value need not equal the depth value. For example, when the depth value of an interfering pixel is superimposed, one of the following operations may be applied to it.

[0044]
1) power value = depth value × α (α: an arbitrary value)
2) power value = (depth value)
3) power value = k × depth value
4) power value = F(x, y) × depth value (F(x, y): a weight value that depends on the distribution of depth values on the xy plane)
5) power value = F(x) × depth value (F(x): a weight value that depends on the distribution of depth values on the x-axis)
6) power value = F(y) × depth value (F(y): a weight value that depends on the distribution of depth values on the y-axis)

When the acoustic elements are arranged three-dimensionally as shown in FIG. 5, the interference check in step S12 of FIG. 11 also uses the depth value of the pixel. For example, the distance between a pixel and an acoustic element is obtained from the pixel's coordinates on the xy plane, its depth value, and the coordinates of the acoustic element; when that distance is smaller than a predetermined threshold, the pixel and the acoustic element are judged to interfere. When the power value of the acoustic element is then obtained in step S13 of FIG. 11, the same operations as described above may be used, but the calculation need not be based on the depth value. For example, the area of the image interfering with the position of the acoustic element (the number of pixels) may be used as it is, or the power value may be calculated by applying operations 1) to 6) above to that area.
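
The three-dimensional interference check and the area-based power value might look like the sketch below. It assumes that pixel coordinates and depth values share one unit so that a Euclidean distance is meaningful, and that a depth of 0 means "no object". The function names, the `threshold` value and the `alpha` scale factor (operation 1 applied to the pixel count) are illustrative choices, not taken from the description.

```python
import math

def interferes(pixel, element_pos, threshold):
    """True when a pixel lies within `threshold` of an acoustic element.

    pixel and element_pos are (x, y, z); a pixel's z is its depth value.
    """
    return math.dist(pixel, element_pos) < threshold

def power_from_area(depth_image, element_pos, threshold, alpha=1.0):
    """Count the interfering pixels and scale the count by alpha."""
    count = 0
    for y, row in enumerate(depth_image):
        for x, depth in enumerate(row):
            if depth > 0 and interferes((x, y, depth), element_pos, threshold):
                count += 1
    return alpha * count

depth_image = [
    [0, 10, 11],
    [0, 12, 30],
]
# Hypothetical element at x=1, y=0, z=10 with an interference radius of 2.
area_power = power_from_area(depth_image, (1, 0, 10), threshold=2.0)
```

Here the pixels at (1, 0) and (2, 0) fall within the radius, while (1, 1) at depth 12 and (2, 1) at depth 30 do not.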

[0045] If each acoustic element is sounded with the power spectrum shown in FIG. 12, sound that follows the movement of the hand can be generated. However, simply sounding each acoustic element at the power shown in FIG. 12 does not produce a continuous sound.

[0046] Therefore, the acoustic element calculation unit 4 uses an inverse Fourier transform to convert the spectrum on the frequency axis calculated by the power spectrum calculation unit 5, as shown in FIG. 12, into a waveform that varies on the time axis. That is, for each acoustic element a sine wave at that element's frequency is generated with the calculated power as its amplitude (for example, a waveform table storing a waveform for each type of acoustic element may be provided), and these are convolved and integrated to generate the final overall waveform.

[0047] More specifically, referring to FIG. 12, a sine wave is first generated at each frequency with the power shown in FIG. 12 as its amplitude. For example, since the power value of the 100 Hz acoustic element in FIG. 12 is 107 dB, a 100 Hz sine wave with this amplitude is generated, as shown in FIG. 13. Similarly, for the 200 Hz acoustic element in FIG. 12, a 200 Hz sine wave with its power value as the amplitude is generated, as shown in FIG. 14. By integrating the sine waves (trigonometric functions) calculated in this way for each frequency in FIG. 12 over all frequencies, a waveform such as that shown in FIG. 15 is obtained.
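
One way to picture this step is as a direct summation of sinusoids (additive synthesis), sketched below. Note that this is a plain sum over the acoustic elements rather than an FFT-based inverse transform, and that the sample rate, the function name and the 200 Hz power value are assumptions made for illustration.

```python
import math

def synthesize(power_spectrum, duration_s, sample_rate=8000):
    """Sum one sine wave per acoustic element, with amplitude = its power value.

    power_spectrum: {frequency_hz: power_value}
    Returns a list of samples covering duration_s seconds.
    """
    n = int(duration_s * sample_rate)
    return [
        sum(amp * math.sin(2 * math.pi * freq * i / sample_rate)
            for freq, amp in power_spectrum.items())
        for i in range(n)
    ]

# 107 comes from the worked example; the 200 Hz value is made up.
spectrum = {100: 107, 200: 50}
wave = synthesize(spectrum, duration_s=1 / 30)  # one frame at 30 frames/s
```

Each sample is bounded in magnitude by the sum of the amplitudes, here 157.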

[0048] The sound generation unit 7 generates sound having the waveform shown in FIG. 15. If the image acquisition unit 1 acquires images at 30 frames per second, sound with a waveform like that of FIG. 15 is generated every 1/30 second. That is, 30 kinds of sound are generated every second.

(Effects of the First Embodiment) As described above, the first embodiment includes the acoustic element storage unit 2, which stores the two-dimensional or three-dimensional arrangement positions of the acoustic elements. When the image acquisition unit 1 acquires an image (depth image) of a desired object, the power spectrum calculation unit 5 calculates the power spectrum of the acoustic elements based on the positional relationship between the depth information of that image and the two-dimensional or three-dimensional arrangement positions stored in the acoustic element storage unit 2, and the acoustic element calculation unit 4 generates sound based on the calculated power spectrum and the acoustic elements. A rich, full timbre can thus be generated easily by a simple operation (for example, moving a hand or the body).

[0049] Because depth images allow the acoustic elements to be sounded in real time at very low cost and with a small computational load, anyone from small children to the elderly can enjoy playing intuitively with almost no practice, which is a significant benefit.

(Modification of the First Embodiment) The first embodiment described acquiring depth images of human hand gestures and body gestures and computing them against the acoustic elements, but the present invention is not necessarily limited to this. For example, instead of human gestures, images of pets such as goldfish or small birds may be captured, and their depth images computed against the acoustic elements to generate ambient music (BGM) produced by the pet.

[0050] In this case, for example, so that different types and arrangements of acoustic elements can be applied according to the characteristics of various pets, the acoustic element storage unit 2 stores a plurality of acoustic element arrangement tables like that shown in FIG. 2, each suited to the characteristics of one kind of pet. When the user specifies the kind of pet to the information management device 3, the arrangement table may be switched according to that kind.
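
Switching arrangement tables by pet type amounts to a keyed lookup, as in the sketch below. The pet names, coordinates and frequencies are all hypothetical; the description does not list concrete table contents.

```python
# Hypothetical arrangement tables: {(x, y): frequency_hz} per kind of pet.
ARRANGEMENT_TABLES = {
    "goldfish": {(0, 0): 100, (1, 0): 200},
    "small_bird": {(0, 0): 400, (1, 0): 800},
}

def select_arrangement(pet_kind, tables=ARRANGEMENT_TABLES):
    """Return the arrangement table registered for the specified kind of pet."""
    try:
        return tables[pet_kind]
    except KeyError:
        raise ValueError(f"no arrangement table for {pet_kind!r}") from None

current = select_arrangement("goldfish")
```

Raising an error for an unknown kind is one design choice; falling back to a default table would be equally reasonable.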

[0051] The first embodiment assumes that, starting from complete silence, sound is produced with the power obtained from the interference between the range image and the acoustic elements, but this is not a limitation. For example, it is also possible to perform with some acoustic elements already having a certain power value and sounding (active), with, for example, the power of FIG. 12 superimposed on them.

(Second Embodiment) FIG. 16 schematically shows the configuration of a sound generation device according to the second embodiment of the present invention. In FIG. 16, parts identical to those in FIG. 1 are given the same reference numerals, and only the differing parts are described. That is, in FIG. 16 an acoustic element editing unit 8 is added to the configuration of FIG. 1.

[0052] The acoustic element editing unit 8 is used to change the arrangement positions of the acoustic elements stored in advance in the acoustic element storage unit 2. This makes it possible, for example, to change the sound generated in response to the acquired body or hand gestures.

[0053] FIG. 17 shows an example of the screen displayed during editing when the arrangement of the acoustic elements displayed on the display unit 6 is changed. For example, one of the acoustic elements displayed on the display unit 6 as shown in FIG. 3 is selected and then copied or moved to a desired position on the display screen, or deleted from the arrangement. The edited result may further be saved from the acoustic element editing unit 8 to the acoustic element storage unit 2.

[0054] FIG. 17 shows the 400 Hz acoustic element at the upper right of center being selected (shown in reverse video) and copied to the lower left of center. FIG. 18 is a flowchart for explaining the operation of the sound generation device shown in FIG. 16; it differs from FIG. 9 in that steps S21 and S22 are added between steps S3 and S4. That is, when image acquisition is completed in step S3, the process proceeds to step S21, where the information management unit 3 first checks whether there is an instruction to edit the acoustic elements. If there is, the process proceeds to step S22 and executes acoustic element editing processing such as that described with reference to FIG. 17.
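
The copy, move and delete edits, together with the check added in steps S21 and S22, could be sketched as follows. The dictionary layout `{(x, y): frequency_hz}`, the function names and the `(operation, arguments)` form of an edit request are assumptions for illustration only.

```python
def copy_element(placement, src, dst):
    """Return a new arrangement with the element at src also placed at dst."""
    edited = dict(placement)
    edited[dst] = placement[src]
    return edited

def move_element(placement, src, dst):
    """Return a new arrangement with the element at src relocated to dst."""
    edited = dict(placement)
    edited[dst] = edited.pop(src)
    return edited

def delete_element(placement, pos):
    """Return a new arrangement without the element at pos."""
    edited = dict(placement)
    del edited[pos]
    return edited

def process_frame(placement, edit_request=None):
    """S21: check for an edit instruction; S22: apply it if one is present."""
    if edit_request is not None:          # step S21
        op, args = edit_request
        placement = op(placement, *args)  # step S22
    return placement

placement = {(3, 1): 400}  # hypothetical position of the 400 Hz element
edited = process_frame(placement, (copy_element, ((3, 1), (1, 3))))
```

Each edit returns a new dictionary, so the arrangement before the edit stays available for comparison.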

[0055] Suppose that the sound generation device shown in FIG. 16 is used to perform once with a certain arrangement of acoustic elements, and that the arrangement is then changed using the acoustic element editing unit 8. In this case, the user will want to check, against the performance before the edit, whether the effect of the edit is satisfactory. However, it is usually difficult to reproduce exactly the same movement as before. To compensate for this drawback, an image storage unit 9 for storing the images (depth images) previously acquired by the image acquisition unit 1, as shown in FIG. 19, may be added to the configuration of FIG. 16.

[0056] The image storage unit 9 stores the series of depth images acquired by the image acquisition unit 1, for example in compressed form. After the acoustic elements are edited, the stored depth images are recalled, and the power spectrum calculation unit 5 and the acoustic element calculation unit 4 perform the same processing as above based on those depth images and the arrangement positions of the acoustic elements. This makes it easy to check, by comparison with the result before editing, whether the edited arrangement now generates a satisfactory sound.
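
Recording depth images and replaying them against an edited arrangement might be sketched as below. The `ImageStore` class, the use of `pickle` plus `zlib` as the compression scheme, and the nested-list image format are all assumptions; the description says only that the images are stored "compressed or the like".

```python
import pickle
import zlib

class ImageStore:
    """Hypothetical stand-in for the image storage unit 9."""

    def __init__(self):
        self._frames = []

    def record(self, depth_image):
        # Serialize and compress each acquired depth image.
        self._frames.append(zlib.compress(pickle.dumps(depth_image)))

    def replay(self):
        # Yield the stored depth images in the order they were acquired.
        for blob in self._frames:
            yield pickle.loads(zlib.decompress(blob))

def power_for(depth_image, placement):
    """Sum the depth values of pixels that coincide with each element."""
    power = {}
    for (x, y), freq in placement.items():
        depth = depth_image[y][x]
        if depth > 0:
            power[freq] = power.get(freq, 0) + depth
    return power

store = ImageStore()
store.record([[33, 74]])

before = {(0, 0): 100}               # arrangement before editing
after = {(0, 0): 100, (1, 0): 400}   # a 400 Hz element added by editing

result_before = [power_for(img, before) for img in store.replay()]
result_after = [power_for(img, after) for img in store.replay()]
```

Because both runs use the same recorded movement, any difference between the two results comes from the edit alone.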

[0057] When the series of depth images stored in the image storage unit 9 is displayed on the display unit 6, the depth images can be viewed as an animation. Specifically, for example, the movement of the hand as shown in FIG. 6 can be checked. Alternatively, by displaying the series of stored depth images superimposed on the display of the acoustic element arrangement shown in FIG. 3, the sound being played can be checked against the movement of the hand, as shown in FIG. 7.

[0058] By storing depth images in the image storage unit 9 and recalling them in this way, the same sound can be generated any number of times as long as the arrangement of the acoustic elements is unchanged. Even in this case, changing the arrangement of the acoustic elements changes the sound.

(Effects of the Second Embodiment) As described above, according to the second embodiment, by changing (copying, moving, or deleting) the arrangement positions of the acoustic elements with the acoustic element editing unit 8, the user can easily perform (generate sound) to his or her own taste.

[0059] Further, the device includes the image storage unit 9, which stores the images acquired by the image acquisition unit 1. By recalling a desired image from the image storage unit 9 and having the power spectrum calculation unit 5 and the acoustic element calculation unit 4 generate sound from the recalled image and the arrangement positions of the acoustic elements, the same sound can be generated any number of times as long as the arrangement of the acoustic elements is unchanged.

(Third Embodiment) FIG. 20 schematically shows the configuration of a sound generation device according to the third embodiment. In FIG. 20, parts identical to those in FIG. 1 are given the same reference numerals, and only the differing parts are described. That is, in FIG. 20 a time-series transformation unit 10 for changing the arrangement positions and power spectrum of the acoustic elements over time is added to the configuration of FIG. 1.

[0060] The first embodiment noted that it is also possible to perform with some acoustic elements already having a certain power value and sounding (active), with, for example, the power of FIG. 12 superimposed on them. Here this idea is applied so that the time-series transformation unit 10 changes the power spectrum of the acoustic elements over time. That is, the time-series transformation unit 10 uses some function, for example a trigonometric function such as sin or cos, to change the power value of each acoustic element over time.

[0061] For example, the time-series transformation unit 10 may vary the power of some arbitrary acoustic elements over time while those elements are sounding (for example, a state in which the power value of each acoustic element (frequency) in a graph like FIG. 12 rises and falls every 1/K seconds). In that state, the power values based on the images acquired by the image acquisition unit 1, for example every 1/30 second (for example, the power values of the acoustic elements in FIG. 12), can be superimposed on the power matrix of the acoustic elements each time an image is acquired.
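
A time-varying base power with the image-derived power superimposed on it could be sketched as follows. The modulation rule (a sine of time around an offset) and every parameter value are hypothetical; the description states only that some function such as sin or cos is used.

```python
import math

def base_power(t, offset=40.0, swing=20.0, rate_hz=0.5):
    """Hypothetical rule: power oscillates around `offset` as a sine of time t."""
    return offset + swing * math.sin(2 * math.pi * rate_hz * t)

def combined_power(frequencies, image_power, t):
    """Superimpose the image-based power on the time-varying base power.

    frequencies: the acoustic elements that are already sounding (active)
    image_power: {frequency_hz: power_value} obtained from the current frame
    """
    return {f: base_power(t) + image_power.get(f, 0) for f in frequencies}

# At t = 0 the sine term is zero, so every active element starts at the offset.
frame_power = combined_power([100, 200], {100: 107}, t=0.0)
```

The same rule is applied to every active element here for brevity; a per-element rate or phase would be a natural extension.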

[0062] The time-series transformation unit 10 is not limited to changing the power values of the acoustic elements; in the same manner, it can also change the arrangement positions of the acoustic elements themselves according to the trigonometric or other function it holds. For example, when the acoustic elements are arranged three-dimensionally as in FIG. 5, the length of an acoustic element in the z-axis direction (depth direction) may be varied to deform its shape.

[0063] To transform the power spectrum, arrangement positions, and so on of the acoustic elements over time, the time-series transformation unit 10 may also use the music information stored in a file that holds such information, for example a MIDI (Musical Instrument Digital Interface) file.

[0064] To perform a time-series transformation matched to a music file in this way, a music information input unit 11 may, for example, be added to the configuration of FIG. 20 as shown in FIG. 21. This unit extracts music information (for example, pitch and loudness) from a given music file, and the time-series transformation unit 10 changes the frequencies, power values, and so on of the acoustic elements based on it.

(Effects of the Third Embodiment) As described above, according to the third embodiment, the time-series transformation unit 10 changes at least one of the power spectrum and the arrangement positions of the acoustic elements according to a predetermined rule (for example, a trigonometric function with time as its variable). As a result, even when the object being imaged shows little movement, for example a pet such as a goldfish, the power spectrum and arrangement positions of the acoustic elements change over time, which prevents the performance from becoming monotonous.

[0065] Further, by having the time-series transformation unit 10 change at least one of the power spectrum and the arrangement positions of the acoustic elements based on information extracted from a music file (for example, a MIDI file) by the music information input unit 11, sound matched to BGM (background music) can be generated.
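
Driving the acoustic elements from extracted music information might be sketched as below. The event tuples `(start_s, end_s, note, velocity)` are a hypothetical form for the information the music information input unit 11 extracts; no MIDI parsing is shown. The note-to-frequency conversion is the standard equal-temperament formula with note 69 at 440 Hz.

```python
def note_to_hz(note):
    """Convert a MIDI note number to a frequency (note 69 = A4 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)

def power_schedule(events, t):
    """Return {frequency_hz: power} for the notes sounding at time t.

    events: (start_s, end_s, note, velocity) tuples, a hypothetical layout
            in which velocity stands in for loudness.
    """
    return {
        round(note_to_hz(note)): velocity
        for start, end, note, velocity in events
        if start <= t < end
    }

events = [(0.0, 1.0, 69, 90), (1.0, 2.0, 57, 60)]
first = power_schedule(events, 0.5)
second = power_schedule(events, 1.5)
```

The resulting dictionary has the same shape as an image-derived power spectrum, so the two can be superimposed frequency by frequency.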

[0066] The methods described in the first to third embodiments can also be stored, as a program executable by a computer, on a recording medium such as a magnetic disk (floppy disk, hard disk, etc.), an optical disk (CD-ROM, DVD, etc.), or a semiconductor memory, and distributed. That is, the description of the first to third embodiments applies equally when a personal computer or the like executes a program, stored on such a recording medium, that describes processing in accordance with the flowcharts shown in FIGS. 9 and 18.

[0067] [Effects of the Invention] As described above, according to the present invention, a desired sound can be easily generated by a simple operation.

[Brief description of drawings]

FIG. 1 is a diagram showing a configuration example of a sound generation device according to a first embodiment of the present invention.

FIG. 2 is a diagram showing a storage example of the arrangement positions of the acoustic elements stored in the acoustic element storage unit, for the case where the acoustic elements are arranged two-dimensionally.

FIG. 3 is a diagram showing an example of the screen displayed when the acoustic elements stored in the storage format of FIG. 2 are displayed on the display unit.

FIG. 4 is a diagram showing another storage example of the arrangement positions of the acoustic elements stored in the acoustic element storage unit, for the case where the acoustic elements are arranged three-dimensionally.

FIG. 5 is a diagram showing an example of the screen displayed when the acoustic elements stored in the storage format of FIG. 4 are displayed on the display unit.

FIG. 6 is a diagram showing an image of a hand as an example of an image acquired by the image acquisition unit.

FIG. 7 is a diagram for explaining the case where sound is generated from the two-dimensionally arranged acoustic elements of FIG. 3 using the depth image of the hand shown in FIG. 6.

FIG. 8 is a diagram for explaining the case where sound is generated from the three-dimensionally arranged acoustic elements of FIG. 5 using the depth image of the hand shown in FIG. 6.

FIG. 9 is a flowchart for explaining the operation of the sound generation device of FIG. 1.

FIG. 10 is a diagram for explaining the depth values of the image (depth image) acquired by the image acquisition unit.

FIG. 11 is a flowchart for explaining the power spectrum calculation process.

FIG. 12 is a graph showing an example of the power spectrum calculated for each acoustic element.

FIG. 13 is a diagram showing an example of the sine wave (whose amplitude is the power value) of the 100 Hz acoustic element in FIG. 12.

FIG. 14 is a diagram showing an example of the sine wave (whose amplitude is the power value) of the 200 Hz acoustic element in FIG. 12.

FIG. 15 is a diagram showing an example of the waveform generated by superimposing the sine waves of the acoustic elements, each with the calculated power value of its acoustic element as amplitude.

FIG. 16 is a diagram showing a configuration example of a sound generation device according to a second embodiment of the present invention.

FIG. 17 is a diagram showing an example of the screen displayed during editing when the arrangement of the acoustic elements displayed on the display unit is changed.

FIG. 18 is a flowchart for explaining the operation of the sound generation device of FIG. 16.

FIG. 19 is a diagram showing another configuration example of the sound generation device according to the second embodiment of the present invention, for the case where an image storage unit is provided.

FIG. 20 is a diagram showing still another configuration example of the sound generation device according to the second embodiment of the present invention, for the case where a time-series transformation unit is provided.

FIG. 21 is a diagram showing still another configuration example of the sound generation device according to the second embodiment of the present invention, for the case where a music information input unit is provided.

[Explanation of symbols]

1 ... Image acquisition unit
2 ... Acoustic element storage unit
3 ... Information management unit
4 ... Acoustic element calculation unit
5 ... Power spectrum calculation unit
6 ... Display unit
7 ... Sound generation unit
8 ... Acoustic element editing unit
9 ... Image storage unit
10 ... Time-series transformation unit
11 ... Music information input unit

Continuation of the front page

(56) References: JP S64-91190 (JP, A); JP H10-26978 (JP, A); JP H10-240282 (JP, A); JP H3-48638 (JP, Y2); Japanese Patent No. 2629740 (JP, B2)

(58) Fields searched (Int. Cl.⁷, DB name): G10H 1/00 - 7/12; G10K 15/04 302

Claims

(57) [Claims]

1. A sound generation method comprising: an acquisition step of imaging an imaging target and acquiring an image in which each pixel value represents the depth of the imaging target; a selection step of selecting, from among a plurality of acoustic elements each defined by coordinates and a frequency value, the acoustic elements whose coordinates are included in the image region corresponding to the imaging target in the image acquired in the acquisition step; a calculation step of calculating a power value of the sound corresponding to each selected acoustic element based on the pixel values of those pixels, among the pixels of the image region of the imaging target in the image, that correspond to the coordinates of each acoustic element selected in the selection step; and a generation step of generating a sound to be output by superimposing a plurality of sounds each having the frequency value of a selected acoustic element and the corresponding power value.

2. The sound generation method according to claim 1, wherein the coordinates defining each of the plurality of acoustic elements are three-dimensional coordinates.

3. A sound generation device comprising: acquisition means for imaging an imaging target and acquiring an image in which each pixel value represents the depth of the imaging target; selection means for selecting, from among a plurality of acoustic elements each defined by coordinates and a frequency value, the acoustic elements whose coordinates are included in the image region corresponding to the imaging target in the image acquired by the acquisition means; calculation means for calculating a power value of the sound corresponding to each selected acoustic element based on the pixel values of those pixels, among the pixels of the image region of the imaging target in the image, that correspond to the coordinates of each acoustic element selected by the selection means; and generation means for generating a sound to be output by superimposing a plurality of sounds each having the frequency value of a selected acoustic element and the corresponding power value.

4. The sound generation device according to claim 3, wherein the coordinates defining each of the plurality of acoustic elements are three-dimensional coordinates.

5. The sound generation device according to claim 3, further comprising changing means for changing the coordinates of the acoustic elements.

6. The sound generation device, further comprising changing means for changing the power value of each of the plurality of sounds based on a predetermined rule or on information extracted from a given music file.

7. The sound generation device according to claim 3, further comprising changing means for changing the coordinates of the plurality of acoustic elements based on a predetermined rule or on information extracted from a given music file.

8. A machine-readable recording medium storing a program that causes a computer to execute: an acquisition step of imaging an imaging target and acquiring an image in which each pixel value represents the depth of the imaging target; a selection step of selecting, from among a plurality of acoustic elements each defined by coordinates and a frequency value, the acoustic elements whose coordinates are included in the image region corresponding to the imaging target in the image acquired in the acquisition step; a calculation step of calculating a power value of the sound corresponding to each selected acoustic element based on the pixel values of those pixels, among the pixels of the image region of the imaging target in the image, that correspond to the coordinates of each acoustic element selected in the selection step; and a generation step of generating a sound to be output by superimposing a plurality of sounds each having the frequency value of a selected acoustic element and the corresponding power value.

9. The recording medium according to claim 8, wherein the coordinates defining each of the plurality of acoustic elements are three-dimensional coordinates.