JP4185437B2

JP4185437B2 - Video composition output device and audio reproduction output device

Info

Publication number: JP4185437B2
Application number: JP2003371798A
Authority: JP
Inventors: 仁博冨山; 祐一岩舘; 豊折原
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2003-10-31
Filing date: 2003-10-31
Publication date: 2008-11-26
Anticipated expiration: 2023-10-31
Also published as: JP2005136776A

Description

The present invention relates to a video composition output device and an audio reproduction output device. More particularly, it relates to devices that photograph a card bearing characters, patterns, or the like that identify the card, and that output, with high accuracy, moving video, three-dimensional moving video, or audio corresponding to the photographed card information and the card's motion.

Techniques are conventionally known for a device that photographs a card bearing a pattern and outputs a MIDI (Musical Instrument Digital Interface) signal for controlling a musical instrument, and for a device that outputs video. Both operate on the card type and three-dimensional position information detected from the captured camera image (see, for example, Patent Document 1).

In Patent Document 1, a card held by an operator is photographed to obtain a video stream signal. From that signal, three-dimensional information including the card's three-dimensional position and posture is detected, and MIDI data corresponding to that information is output. Patent Document 1 also displays an image corresponding to the three-dimensional information (coordinate position, rotation angle, tilt angle, etc.) obtained by photographing the card.

Three-dimensional shape data is one possible form of the video to be displayed. One known way to acquire the three-dimensional shape data needed for display works in three steps. It measures the approximate shape of a subject, then obtains a more detailed shape by block matching based on that approximate shape, and finally obtains distance information from the camera (see, for example, Patent Document 2).

Patent Document 1: JP 2002-32076 A
Patent Document 2: JP 2003-271928 A

However, Patent Document 1 merely outputs audio and video corresponding to each card. It does not consider outputting video or audio that depends on, for example, the moving speed of a card photographed by the camera. Moreover, when several cards are photographed at the same time, it cannot output video or audio corresponding to their combination or positional relationship. It therefore cannot be said to achieve highly accurate video and audio output using cards.

The present invention has been made in view of the above problems and has two objects. The first is to provide a video composition output device that outputs highly accurate video. It does so by compositing moving video or three-dimensional moving video with the camera video actually being captured, based on card information and card speed information detected from video of the card. The second is to provide an audio reproduction output device that achieves highly accurate audio output based on card information and card speed information.

To solve the above problems, the present invention employs means having the following features.

The invention according to claim 1 is a video composition output device. The device obtains card information from the video signal of a camera photographing a card; this card information consists of the card type, three-dimensional position information, and posture information. Based on it, the device composites another video signal into the camera's video signal and outputs the result. The device comprises:
- a speed detection unit that detects the card's speed from the card information obtained for each frame at a predetermined frame interval of the camera's video signal; and
- a video output unit that selects and outputs the video signal to be composited, based on the card information and the speed information obtained by the speed detection unit.

According to the invention of claim 1, the video is selected using not only the card information but also the speed information. Various videos can therefore be output according to how the card moves, giving highly accurate, highly interactive video.

In the invention according to claim 2, the video output unit selects a moving video based on the card information and the speed information obtained by the speed detection unit. It then controls the display/non-display, size, position, and direction of the moving video, based on the three-dimensional position information and the posture information, and outputs it.

According to the invention of claim 2, a moving video is selected based on the card information and the speed information, and its size, display position, and the like are controlled before output. This achieves more accurate video output.

In the invention according to claim 3, the video output unit selects a three-dimensional moving video based on three inputs: the card information, the speed information obtained by the speed detection unit, and connection information obtained from color information and the three-dimensional position information. It then controls the display/non-display, size, position, and direction of the three-dimensional moving video, based on the three-dimensional position information and the posture information, and outputs it.

According to the invention of claim 3, a three-dimensional moving video is selected based on the card information, the speed information, and the connection information, and its size, display position, and the like are controlled before output. This achieves more accurate video output.

In the invention according to claim 4, when several cards are present in the video signal captured by the camera, the speed detection unit detects relative speed information between those cards based on the card information.

According to the invention of claim 4, video is selected using the card information and the relative speed information for each of the cards. Various videos can therefore be output according to how the cards move relative to one another, giving highly accurate, highly interactive video.

The invention according to claim 5 further comprises a mutual position/posture detection unit. This unit detects, from the card information of the several cards, mutual position/posture information consisting of the distance between the cards and their posture information.

According to the invention of claim 5, the mutual position/posture information can be used to output various videos corresponding to the relative movement and posture of the cards.

In the invention according to claim 6, the video output unit selects and outputs the moving video and/or the three-dimensional moving video based on the relative speed information and the mutual position/posture information.

According to the invention of claim 6, using the relative speed information together with the mutual position/posture information achieves more accurate video output.

The invention according to claim 7 is an audio reproduction output device. The device obtains card information from the video signal of a camera photographing a card; this card information consists of the card type, three-dimensional position information, and posture information. Based on it, the device reproduces and outputs prestored audio. The device comprises:
- a speed detection unit that detects the card's speed from the card information obtained for each frame at a predetermined frame interval of the camera's video signal; and
- an audio reproduction output unit that selects an audio file, based on the card information and the speed information obtained by the speed detection unit, and reproduces and outputs it.

When several cards are present in the video signal captured by the camera, the speed detection unit detects relative speed information between those cards based on the card information.

According to the invention of claim 7, audio is selected using not only the card information but also the speed information, so various sounds can be output according to how the card moves. In addition, audio is selected using the card information and the relative speed information for each of several cards, so various sounds can be output according to how the cards move relative to one another. Highly accurate, highly interactive audio can therefore be output.

The invention according to claim 8 further comprises a mutual position/posture detection unit. This unit detects, from the card information of the several cards, mutual position/posture information consisting of the distance between the cards and their posture information.

According to the invention of claim 8, the mutual position/posture information can be used to output various sounds corresponding to the relative movement and posture of the cards.

In the invention according to claim 9, the audio reproduction output unit selects and outputs the audio file based on the relative speed information and the mutual position/posture information.

According to the invention of claim 9, using the relative speed information together with the mutual position/posture information achieves more accurate audio output.

According to the present invention, highly accurate, highly interactive video or audio can be output. The output is based on card information and card speed information obtained from camera video of the card.

<Outline of the present invention>
In the present invention, a card bearing identification information such as specific characters or patterns is photographed. Card information is detected from the captured camera video; it consists of the card type, three-dimensional position information, and posture information. Based on this card information and on card speed information, a corresponding video is selected from prestored moving videos and three-dimensional moving videos. The selected video is composited with the camera video actually being captured, and the result is output.

Here, the three-dimensional position information gives the card's x, y, and z coordinates as obtained by photographing it with the camera, as explained below with reference to the drawings. FIG. 1 shows an example of the positional relationship between the card and the camera. A character ("a" in FIG. 1) is drawn on the card 11 to identify the card type. Any mark that identifies the card may be used instead of a character, such as a numeral, a pattern, or a photograph of a face.

As shown in FIG. 1, the position information of the card 11 has three coordinates:
- x: the horizontal position relative to the camera;
- y: the vertical position relative to the camera;
- z: the distance from the camera.

The posture information gives the degree of rotation about each of the x, y, and z axes, measured from a predetermined reference position of the card, and thus represents the orientation of the card itself. When the user moves the card, the card information (card type, three-dimensional position information, and posture information) is therefore acquired from the video of the camera photographing the card.
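As a minimal sketch of how the card information described above might be held in software, the Python fragment below stores three things for each card:
- the card type;
- the position (x, y, z);
- the posture (rotations about the x, y, and z axes).

It also maps a point on the card into camera coordinates. The class and function names are hypothetical. The rotation order R = Rz·Ry·Rx and the use of radians are assumptions; the patent specifies neither.

```python
import math
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass(frozen=True)
class CardInfo:
    """Hypothetical container for one detected card (not defined in the patent)."""
    card_type: str   # e.g. "a", identified from the character printed on the card
    position: Vec3   # (x, y, z) in camera coordinates
    posture: Vec3    # rotations (rx, ry, rz) about the x, y, z axes, in radians


def rotation_matrix(rx: float, ry: float, rz: float) -> list[list[float]]:
    """Return R = Rz(rz) @ Ry(ry) @ Rx(rx); this composition order is an assumption."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def card_to_camera(card: CardInfo, local: Vec3) -> Vec3:
    """Map a point given in the card's own frame into camera coordinates (R p + t)."""
    r = rotation_matrix(*card.posture)
    return tuple(
        sum(r[i][j] * local[j] for j in range(3)) + card.position[i] for i in range(3)
    )
```

For example, a card 50 units in front of the camera and rotated 90 degrees about z maps its local point (1, 0, 0) to approximately (0, 1, 50).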

The present invention also takes frames of the camera video separated by a preset frame interval and computes the differences in their three-dimensional position information and posture information. The card's moving speed is detected from those differences. A corresponding moving video and/or three-dimensional moving video is then selected based on the detected speed information, composited with the video signal of the camera actually capturing the scene, and output.
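The speed computation described above can be written as follows. It assumes that the differences are divided by the elapsed time N/f, where N is the frame interval and f is the frame rate; the patent does not state this explicitly. Here p_k = (x_k, y_k, z_k) and θ_k are the position and posture of a card in frame k.

```latex
\mathbf{v}_k = \frac{\mathbf{p}_k - \mathbf{p}_{k-N}}{N/f}, \qquad
\boldsymbol{\omega}_k = \frac{\boldsymbol{\theta}_k - \boldsymbol{\theta}_{k-N}}{N/f}, \qquad
s_k = \lVert \mathbf{v}_k \rVert
```

Take f = 30 frames/s and N = 15 frames, the values given for the speed detection unit 25. Then N/f = 0.5 s. A card that moves 3.5 cm horizontally over those 15 frames therefore has s_k = 3.5 / 0.5 = 7 cm/s.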

Specifically, the card information (card type, three-dimensional position information, and posture information) is acquired by image processing of the card bearing the identification information (specific characters, patterns, etc.), such as region extraction and pattern matching. Various image processing techniques for acquiring such card information already exist, and any of the following can be applied:
- the image recognition techniques described in "Image Processing Engineering" by Ryoichi Suematsu and Hirohisa Yamada, Corona Publishing (hereinafter Non-Patent Document 1);
- the technique described in "Augmented Reality System Based on Marker Tracking and Its Calibration" by Hirokazu Kato, Mark Billinghurst, Koichi Asano, and Keihachiro Tachibana, Transactions of the Virtual Reality Society of Japan (TVRSJ), Vol. 4, No. 4, 1999 (hereinafter Non-Patent Document 2).
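The patent leaves the recognition method to existing techniques such as those in Non-Patent Documents 1 and 2. As one hedged illustration of the pattern-matching step, the sketch below identifies the card type by normalized cross-correlation. It compares a rectified, square grayscale patch of the card interior against a set of registered templates, trying all four 90-degree rotations in the spirit of marker-based AR systems. The following are all hypothetical:
- the function names;
- the 0.8 threshold;
- the assumption that the patch has already been extracted and resampled to the template size.

```python
import math

Patch = list[list[float]]  # square grayscale patch: rows of pixel values


def ncc(a: Patch, b: Patch) -> float:
    """Normalized cross-correlation of two equally sized patches, in [-1, 1]."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den > 0 else 0.0  # a flat patch carries no pattern


def rot90(p: Patch) -> Patch:
    """Rotate a square patch 90 degrees clockwise."""
    return [list(row) for row in zip(*p[::-1])]


def identify_card(patch: Patch, templates: dict[str, Patch], threshold: float = 0.8):
    """Return (card_type, clockwise_quarter_turns) of the best match above threshold, or None."""
    best = None
    best_score = threshold
    for card_type, tmpl in templates.items():
        candidate = patch
        for turns in range(4):
            score = ncc(candidate, tmpl)
            if score > best_score:
                best, best_score = (card_type, turns), score
            candidate = rot90(candidate)
    return best
```

The returned number of quarter turns gives only a coarse in-plane orientation. The full posture would come from the corners of the rectangle drawn on the card, as described for the card information acquisition device 23.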

Based on the card information, two kinds of video undergo projective transformation and viewpoint transformation and are output:
- moving video that is stored in advance or input from outside;
- three-dimensional moving video stored as connection information for vertex groups that carry color information and three-dimensional coordinate information.

The result is composited with the camera video actually capturing the card, using a video composition technique such as chroma key processing, and output.

For the projective and viewpoint transformations, techniques based on OpenGL, a format for representing three-dimensional space, can be applied. They are described, for example, in "OpenGL Programming Guide" by Jackie Neider, Tom Davis, and Mason Woo, Addison-Wesley (hereinafter Non-Patent Document 3). For image composition, video composition such as chroma key processing can be used. It is described, for example, in "Practical Image Processing Learned in C" by Seiki Inoue, Nobuyuki Yagi, Masaki Hayashi, Eisuke Nakasu, Koji Mitani, and Masato Okui, Ohmsha (hereinafter Non-Patent Document 4).

When the card region is a rectangle with vertices (P1, P2, P3, P4), for example, the color information and connection information are as follows:
- The color information consists of the color values (R1, G1, B1) to (R4, G4, B4) of those four vertices.
- The connection information describes how the vertices are joined into regions, for example as the two triangles (P1, P2, P4) and (P1, P3, P4).

The colors inside each triangle (P1, P2, P4) and (P1, P3, P4) are set by interpolating the triangle's vertex colors, and each triangle corresponds to a face of the three-dimensional shape. A technique for forming three-dimensional moving video from color information in this way is described, for example, in "The Annotated VRML 2.0 Reference Manual" by Rikk Carey and Gavin Bell, Addison-Wesley (hereinafter Non-Patent Document 5).
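A minimal sketch of the color interpolation described above, using barycentric (Gouraud-style) weights over the two triangles (P1, P2, P4) and (P1, P3, P4). For simplicity it works in 2D screen coordinates and assumes a unit square with P1 = (0, 0), P2 = (1, 0), P3 = (0, 1), and P4 = (1, 1). In practice a renderer such as OpenGL performs this interpolation itself.

```python
from typing import Optional

Point = tuple[float, float]
Color = tuple[float, float, float]


def barycentric(p: Point, a: Point, b: Point, c: Point) -> tuple[float, float, float]:
    """Barycentric weights of point p with respect to triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        raise ValueError("degenerate triangle")
    w1 = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    w2 = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return w1, w2, 1.0 - w1 - w2


def shade_rectangle(p: Point, verts: list[Point], colors: list[Color],
                    eps: float = 1e-12) -> Optional[Color]:
    """Color at p inside rectangle P1..P4 split into triangles (P1,P2,P4) and (P1,P3,P4)."""
    for tri in ((0, 1, 3), (0, 2, 3)):  # connection information as 0-based indices
        w = barycentric(p, *(verts[i] for i in tri))
        if all(wi >= -eps for wi in w):  # p lies inside (or on an edge of) this triangle
            return tuple(
                sum(wi * colors[i][ch] for wi, i in zip(w, tri)) for ch in range(3)
            )
    return None  # p lies outside the rectangle
```

With P1 red, P2 green, P3 blue, and P4 white, the center of the square lies on the shared edge P1-P4 and receives the average of red and white, (255, 127.5, 127.5).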

The present invention also handles audio. From the prestored audio files, it selects the file to reproduce together with its volume and panning. The selection is based on the card information of the card photographed by the camera (card type, three-dimensional position information, and posture information) and on the card speed information. The selected audio file is then reproduced and output. Possible audio file formats include WAV, AIFF, MP3, WMA, VQ, RM, RAM, MOV, AAC, and ATRAC3.
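The patent does not state how volume and panning are derived from the card information. One plausible mapping, shown below purely as an assumption, works as follows:
- it pans the sound with the card's horizontal position x, using an equal-power pan law;
- it scales the volume with the card's distance z from the camera, so that closer cards sound louder.

The parameter names and default values (half_width, ref_distance) are hypothetical.

```python
import math


def stereo_gains(x: float, z: float, half_width: float = 50.0,
                 ref_distance: float = 30.0) -> tuple[float, float]:
    """Return (left, right) gains for a card at horizontal position x and distance z.

    Both mappings are illustrative assumptions, not taken from the patent.
    """
    pan = max(-1.0, min(1.0, x / half_width))               # -1 = full left, +1 = full right
    volume = min(1.0, ref_distance / z) if z > 0 else 0.0   # full volume within ref_distance
    theta = (pan + 1.0) * math.pi / 4.0                      # equal-power pan law
    return volume * math.cos(theta), volume * math.sin(theta)
```

The equal-power law keeps left² + right² constant across the pan range, so a card moving sideways does not appear to change loudness.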

Furthermore, when several cards are photographed, the present invention uses the cards' mutual types, three-dimensional position information, and posture information to control the output:
- for the moving video and three-dimensional moving video: display/non-display, size, position, and direction;
- for the audio files: on/off state, volume, panning, and the type of audio output.
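As an illustration of the multi-card case, the sketch below works with two cards. It derives mutual position/posture information (their distance and posture difference) and a relative speed. It then applies a toy rule that switches to a different clip when the cards approach each other quickly. Treating the component-wise difference of rotation angles as the relative posture is a simplification. The function names, thresholds (in cm and cm/s), and clip names are hypothetical.

```python
import math

Vec3 = tuple[float, float, float]


def wrap_angle(a: float) -> float:
    """Wrap an angle in radians to [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def mutual_pose(pos_a: Vec3, rot_a: Vec3, pos_b: Vec3, rot_b: Vec3):
    """Distance between two cards and their per-axis posture difference (simplified)."""
    distance = math.dist(pos_a, pos_b)
    rel_rot = tuple(wrap_angle(rb - ra) for ra, rb in zip(rot_a, rot_b))
    return distance, rel_rot


def closing_speed(dist_before: float, dist_now: float, dt: float) -> float:
    """Rate at which the distance shrinks; positive when the cards approach."""
    return (dist_before - dist_now) / dt


def select_pair_clip(distance: float, approach: float,
                     near: float = 5.0, fast: float = 10.0) -> str:
    """Toy rule: a fast approach to within `near` selects a 'collision' clip."""
    if distance < near and approach > fast:
        return "collision"
    return "idle"
```

For example, two cards that close from 10 cm to 4 cm apart within 0.5 s have a closing speed of 12 cm/s. Under the assumed thresholds, that selects the "collision" clip.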

<Example 1: Video composition output device>
Next, a video composition output device to which the present invention is applied is described with reference to the drawings. The device composites moving video and three-dimensional moving video with the video signal from the camera actually capturing the scene, and outputs the result.

FIG. 2 shows an example configuration of a video composition output system according to the present invention. The system of FIG. 2 comprises:
- a camera 22 for photographing a card 21;
- a card information acquisition device 23;
- a video composition output device 24.

The video composition output device 24 in turn comprises:
- a speed detection unit 25;
- a moving video output unit 26;
- a three-dimensional moving video output unit 27;
- a video composition output unit 28.

In the video composition output system shown in FIG. 2, the camera 22 first photographs the card 21, on which specific characters or patterns identifying the card are drawn. The camera video signal captured by the camera 22 is output to the card information acquisition device 23 and to the video composition output unit 28. The card 21 is, for example, held by a user and moved or rotated relative to the camera 22 in three-dimensional coordinates (the x, y, z coordinates shown in FIG. 1).

The card information acquisition device 23 extracts the region of the card 21 from the camera video signal input from the camera 22. It then performs image processing such as pattern matching to acquire card information for each card in the captured video: the card type, three-dimensional position information, and posture information. The acquired card information is output to the speed detection unit 25, the moving video output unit 26, and the three-dimensional moving video output unit 27. To extract the region of the card 21, a rectangle can be drawn on the card 21 of FIG. 2, as shown in Non-Patent Document 2, for example. Acquiring the three-dimensional coordinates of that rectangle's corners then yields the three-dimensional position information and posture information.

The speed detection unit 25 receives the card type, three-dimensional position information, and posture information sequentially from the card information acquisition device 23. For each card type, it obtains the three-dimensional position information and posture information at a preset frame interval of the video signal. It then calculates the differences in position and posture to obtain each card's moving speed, including its direction. The card type and speed information are output to the moving video output unit 26 and the three-dimensional moving video output unit 27. For a video signal with 30 frames per second, for example, the frame interval is set to 15 frames.
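A minimal sketch of the speed detection unit 25, assuming one card per type and continuous detection. It keeps the last N + 1 samples for each card type in a ring buffer. It then divides the position and posture differences between the oldest and newest samples by N/f, using the 30 frames/s and 15-frame interval given above as defaults. The class and method names are hypothetical, and frames in which a card is not detected are not handled.

```python
import math
from collections import deque


class SpeedDetector:
    """Hypothetical per-card-type speed estimator over a fixed frame interval."""

    def __init__(self, fps: float = 30.0, interval: int = 15):
        self.interval = interval
        self.dt = interval / fps            # elapsed time between compared frames (s)
        self.history: dict[str, deque] = {}

    def update(self, card_type, position, posture):
        """Add one frame's card information; return (velocity, speed, angular_rate) or None."""
        buf = self.history.setdefault(card_type, deque(maxlen=self.interval + 1))
        buf.append((tuple(position), tuple(posture)))
        if len(buf) <= self.interval:       # not enough frames yet
            return None
        (p0, r0), (p1, r1) = buf[0], buf[-1]  # frames k - N and k
        velocity = tuple((b - a) / self.dt for a, b in zip(p0, p1))
        angular_rate = tuple((b - a) / self.dt for a, b in zip(r0, r1))
        return velocity, math.hypot(*velocity), angular_rate
```

A card moving 0.5 cm per frame along x yields (15.0, 0.0, 0.0) cm/s once 16 frames have been seen. The per-card velocities could also be subtracted to obtain the relative speed information of claim 4.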

The moving video output unit 26 selects the corresponding moving video according to the input card type and moving speed. The source is either prestored moving video data or moving video input from outside. Based on the card's three-dimensional position information and posture information, it then moves and rotates the selected video by projective transformation and viewpoint transformation. The result is output as a moving video signal to the video composition output unit 28. The transformed video is placed on the card actually captured by the camera 22, with its orientation adjusted to match that of the card. The video can thus be displayed with its orientation following the card's orientation.

The position of the output video is not limited to this in the present invention. For example, the video may instead be output at a position offset from the card by a predetermined amount.
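The placement step described above can be sketched with a simple pinhole camera model. The four corners of the region on (or offset from) the card are transformed by the card pose and projected into the image. This gives the quadrilateral onto which a video frame would then be warped. The following are all assumptions:
- the camera frame convention (x right, y down, z forward);
- the focal length f in pixels;
- the principal point (cx, cy);
- the function name.

In an OpenGL pipeline the projection and modelview matrices would perform this step.

```python
Vec3 = tuple[float, float, float]


def project_card_quad(rotation: list[list[float]], position: Vec3,
                      half_w: float, half_h: float,
                      f: float, cx: float, cy: float,
                      offset: Vec3 = (0.0, 0.0, 0.0)) -> list[tuple[float, float]]:
    """Image coordinates of the 4 corners of a (2*half_w) x (2*half_h) region on the card.

    `rotation` (3x3) and `position` give the card pose in camera coordinates;
    `offset` shifts the region in the card's own frame, e.g. to show the
    video beside the card instead of on it.
    """
    corners = [(-half_w, -half_h), (half_w, -half_h), (half_w, half_h), (-half_w, half_h)]
    image_pts = []
    for u, v in corners:
        local = (u + offset[0], v + offset[1], offset[2])
        X, Y, Z = (
            sum(rotation[i][j] * local[j] for j in range(3)) + position[i]
            for i in range(3)
        )
        if Z <= 0:
            raise ValueError("corner is behind the camera")
        image_pts.append((f * X / Z + cx, f * Y / Z + cy))  # pinhole projection
    return image_pts
```

For a fronto-parallel card 100 units away with f = 500 and principal point (320, 240), a 10 x 10 region projects to a 50 x 50 pixel square centred at (320, 240).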

The three-dimensional moving video output unit 27 selects the corresponding three-dimensional moving video from prestored three-dimensional moving video data. The selection is based on three inputs:
- the speed information from the speed detection unit 25;
- the card information (card type, three-dimensional position information, and posture information) from the card information acquisition device 23;
- the connection information of the vertex groups carrying the color information and three-dimensional coordinate information described above.

Next, based on the three-dimensional position information and posture information of the card 21, the unit moves and rotates the selected three-dimensional moving video by projective and viewpoint transformation. It renders the video with generally available three-dimensional rendering software such as OpenGL, and outputs the resulting three-dimensional moving video signal to the video composition output unit 28.

Besides OpenGL, three-dimensional space formats such as VRML, DirectX, and DXF can be used to represent the three-dimensional space (three-dimensional virtual reality space).

An example of the data used to select the videos stored in the moving video output unit 26 and the three-dimensional moving video output unit 27 is now described with reference to the drawings.

FIG. 3 shows an example of the video selection data stored in each of the moving video output unit and the three-dimensional moving video output unit.

The data items shown in FIG. 3 are:
- a card type identifying the kind of card;
- a speed condition on the speed detected by the speed detection unit 25;
- the video and other items corresponding to each combination of card type and speed condition.

The data stored in the three-dimensional moving video output unit 27 also include the connection information described above.

In FIG. 3, for example, when the card type is "a" and the card's moving speed, measured at the predetermined frame interval, is 7 cm/s, a video of person A walking is selected. When the card type is "a" and the measured speed is 15 cm/s, a video of person A running is selected. The selected video thus differs according to the card's moving speed. Based on this selection, each of the moving video output unit 26 and the three-dimensional moving video output unit 27 picks its video. Each unit then performs projective and viewpoint transformation based on the three-dimensional coordinate information and posture information in the card information. It also controls the display/non-display, size, position, direction, and similar properties of the video before outputting it to the video composition output unit 28.
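The lookup in FIG. 3 can be sketched as a small rule table. The figure gives only the two example speeds: 7 cm/s selects walking and 15 cm/s selects running. The 10 cm/s boundary, the rule structure, and the clip names below are therefore assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SelectionRule:
    """One row of a hypothetical FIG. 3-style selection table."""
    card_type: str
    min_speed: float  # cm/s, inclusive
    max_speed: float  # cm/s, exclusive
    clip: str


RULES = [
    SelectionRule("a", 0.0, 10.0, "person_A_walking"),       # assumed boundary at 10 cm/s
    SelectionRule("a", 10.0, math.inf, "person_A_running"),
]


def select_clip(card_type: str, speed: float, rules=RULES):
    """Return the clip whose card type and speed range match, or None if no rule applies."""
    for rule in rules:
        if rule.card_type == card_type and rule.min_speed <= speed < rule.max_speed:
            return rule.clip
    return None
```

Keeping the table as data rather than code lets new cards and speed bands be added without changing the selection logic.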

The video composition output unit 28 composites three inputs:
- the camera video actually being captured, from the camera 22;
- the moving video from the moving video output unit 26;
- the three-dimensional moving video signal from the three-dimensional moving video output unit 27.

It does so using video composition software based on key signal processing or the like. The composite of the camera video, moving video, and three-dimensional moving video is output as a video signal on a display device, projector, or the like. The specific output content is described later.

For the key signal processing, video can be composited using, for example, chroma key signal processing as described in Non-Patent Document 4 above.

If the video composition output unit 28 receives a video signal from only one of the moving video output unit 26 and the three-dimensional moving video output unit 27, it composites that signal with the video signal from the camera 22 and outputs the result.
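A minimal hard-key sketch of the composition performed by the video composition output unit 28. Each rendered layer is assumed to be drawn over a uniform key color (pure green here). Every layer pixel that is not close to that color replaces the camera pixel. Absent layers are passed as None, which covers the case where only one of the units 26 and 27 supplies a signal. The key color, tolerance, and function names are assumptions; practical chroma keyers use soft keys and spill suppression.

```python
from typing import Optional

Pixel = tuple[int, int, int]
Image = list[list[Pixel]]


def is_key(px: Pixel, key: Pixel, tol: int) -> bool:
    """True if px is within tol of the key color on every channel."""
    return all(abs(a - b) <= tol for a, b in zip(px, key))


def composite(camera: Image, *layers: Optional[Image],
              key: Pixel = (0, 255, 0), tol: int = 40) -> Image:
    """Overlay the non-key pixels of each layer, in order, onto a copy of the camera frame."""
    out = [row[:] for row in camera]
    for layer in layers:
        if layer is None:  # e.g. no 3D moving video for the current card
            continue
        for y, row in enumerate(layer):
            for x, px in enumerate(row):
                if not is_key(px, key, tol):
                    out[y][x] = px
    return out
```

Passing the moving video layer before the three-dimensional layer draws the 3D content on top wherever the two overlap. The patent does not specify this ordering.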

By using the speed information obtained by the speed detection unit 25 in this way, the most suitable video can be selected and displayed. This makes it possible to output highly accurate, highly interactive video in response to the card moved by the user.

<Example 2: Audio reproduction output device>
Next, the configuration of an audio reproduction output system to which the present invention is applied will be described with reference to the drawings.

FIG. 4 is a diagram showing an example configuration of the audio reproduction output system according to the present invention. The audio output system shown in FIG. 4 includes a camera 42 that photographs a card 41, a card information acquisition device 43, and an audio reproduction output device 44. The audio reproduction output device 44 includes a speed detection unit 45 and an audio reproduction output unit 46. The card 41 is the same as the card 21 described above.

First, the camera 42 photographs the card 41, on which identification information such as characters or patterns for identifying the card is drawn. The camera video signal captured by the camera 42 is output to the card information acquisition device 43. The card 41 is held by the user and is moved or rotated relative to the camera 42 in three-dimensional coordinates (the x, y, z coordinates shown in FIG. 1).

The card information acquisition device 43 extracts the region of the card 41 from the camera video signal input from the camera 42 and performs image processing such as pattern matching to acquire card information, consisting of the type, three-dimensional position information, and posture information of the card contained in the captured video. The card information acquisition device 43 outputs the acquired card information to the speed detection unit 45 and the audio reproduction output unit 46.
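The type-identification part of the pattern matching can be illustrated with zero-mean normalised cross-correlation. This sketch assumes the card region has already been extracted and rectified to a fixed-size grayscale patch of the same size as each stored template; the `templates` dictionary and the 0.8 acceptance threshold are hypothetical. Estimating the three-dimensional position and posture is a separate step not shown here.

```python
import numpy as np


def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-mean normalised cross-correlation between two equal-shape patches."""
    a = a.astype(np.float64) - a.mean()
    b = b.astype(np.float64) - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def identify_card(patch, templates, threshold=0.8):
    """Return the card type whose template best matches `patch`, or None below threshold."""
    best_type, best_score = None, threshold
    for card_type, template in templates.items():
        score = ncc(patch, template)
        if score >= best_score:
            best_type, best_score = card_type, score
    return best_type
```

Because the correlation is zero-mean and normalised, the match is insensitive to uniform changes in brightness and contrast of the patch.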

The speed detection unit 45 obtains, for each card type, the three-dimensional position information and posture information at a preset frame interval of the video signal from the card type, three-dimensional position information, and posture information sequentially input from the card information acquisition device 43. It calculates the differences in three-dimensional position and posture to determine the moving speed of each card, including its direction, and outputs the card type and the speed information to the audio reproduction output unit 46.
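The differencing performed by the speed detection unit amounts to a finite difference over the preset frame interval k, v = (p_t − p_{t−k}) · fps / k. This Python sketch assumes positions in centimetres in camera coordinates and a known frame rate; the function name is hypothetical, and posture differences (angular speed) could be handled the same way but are omitted.

```python
import numpy as np


def card_velocity(positions, frame_interval, fps):
    """Estimate a card's velocity from its position history.

    positions: sequence of (x, y, z) in cm, one per video frame, oldest first.
    frame_interval: the preset number of frames k between the two samples compared.
    fps: video frame rate in frames per second.
    Returns (velocity vector in cm/s, speed in cm/s).
    """
    p = np.asarray(positions, dtype=float)
    if len(p) <= frame_interval:
        raise ValueError("need more than frame_interval frames of history")
    dt = frame_interval / fps                  # seconds between the two samples
    v = (p[-1] - p[-1 - frame_interval]) / dt  # velocity including direction
    return v, float(np.linalg.norm(v))
```

For a card moving 0.5 cm per frame along x at 30 fps, `card_velocity(track, frame_interval=3, fps=30)` yields a speed of 15 cm/s, which FIG. 3 would map to the running video.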

The audio reproduction output unit 46 selects a corresponding audio file, according to the input card type and moving speed, from audio files stored in advance or from audio input externally, reproduces the audio signal, and outputs it to an audio output device such as a speaker.

In audio output, when surround reproduction using multiple speakers is available, for example, audio is output to predetermined speakers based on the speed information, three-dimensional position information, and posture information. The volume is also changed according to the distance between the card and the camera, and audio control such as panning, effects, and reverberation is applied. This makes it possible to output realistic audio.
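The distance-dependent volume and panning can be sketched as below. The inverse-distance gain law, the 30 cm `ref_distance`, and constant-power panning driven by the card's horizontal position are all assumptions chosen for illustration; the source does not specify the control laws.

```python
import math


def gain_and_pan(card_xyz, ref_distance=30.0, min_gain=0.05):
    """Map a card position (cm, camera coordinates) to (left, right) speaker gains.

    Assumptions: gain = ref_distance / distance to the camera, clamped to
    [min_gain, 1]; the ratio x / z, clamped to [-1, 1], drives
    constant-power panning from hard left (-1) to hard right (+1).
    """
    x, y, z = card_xyz
    distance = max(math.hypot(x, y, z), 1e-6)             # avoid division by zero
    gain = max(min_gain, min(1.0, ref_distance / distance))  # louder when closer
    pan = max(-1.0, min(1.0, x / max(z, 1e-6)))
    theta = (pan + 1.0) * math.pi / 4.0                    # 0 .. pi/2
    return gain * math.cos(theta), gain * math.sin(theta)
```

Constant-power panning keeps left² + right² equal to gain², so perceived loudness does not dip as the card moves across the centre.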

An example of the data used by the audio reproduction output unit 46 to select audio will now be described with reference to the drawings. FIG. 5 is a diagram illustrating an example of the data held by the audio reproduction output unit. The data items shown in FIG. 5 are the card type obtained from the type of the card, the speed condition obtained from the speed detection unit 45, and the audio to be output for each combination of card type and speed condition.

In FIG. 5, for example, if the card type is b and the speed is 2 cm/s, an audio file of person B whistling is selected. Audio files in formats such as WAV, AIFF, MP3, WMA, VQ, RM, RAM, MOV, AAC, and ATRAC3 can be used.
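A FIG. 5-style lookup can also be organised as ascending speed bands per card type, searched with Python's standard `bisect` module. The 5 cm/s band boundary and the file names are hypothetical; only the sample entry (card type b at 2 cm/s selects person B whistling) comes from the text.

```python
import bisect
from typing import Dict, List, Optional, Tuple

# Hypothetical FIG. 5-style table: per card type, ascending upper speed bounds
# (cm/s) paired with the audio file to play. Bounds and names are illustrative.
AUDIO_TABLE: Dict[str, List[Tuple[float, str]]] = {
    "b": [(5.0, "person_B_whistle.wav"), (float("inf"), "person_B_shout.mp3")],
}


def select_audio(card_type: str, speed_cm_s: float) -> Optional[str]:
    """Return the audio file of the first band whose upper bound is >= speed."""
    bands = AUDIO_TABLE.get(card_type)
    if not bands:
        return None
    bounds = [upper for upper, _ in bands]
    i = bisect.bisect_left(bounds, speed_cm_s)
    return bands[i][1] if i < len(bands) else None
```

Binary search keeps each lookup logarithmic in the number of bands, which matters little for two rows but scales to finely graded speed conditions.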

As described above, the audio reproduction output device to which the present invention is applied can, based on the speed information, output highly accurate, highly interactive audio in response to the card moved by the user.

The video composition output device 24 and the audio reproduction output device 44 described above may also be combined into a single configuration. This enables even more accurate video and audio output.

The camera may capture not just one card but multiple cards. In that case, each card can be handled in the same way as a single card in the embodiments above; however, selecting the output video and audio based on the relative speed and mutual position of the cards makes more accurate video and audio output possible.

A video composition output device and an audio reproduction output device to which the present invention is applied when multiple cards are photographed will now be described. Example 3 below realizes video composition output and audio reproduction output in a single device, but, as in the embodiments above, each may also be implemented as a separate device.

<Example 3: Video composition / audio reproduction output device>
FIG. 6 is a diagram showing an example configuration of the video composition / audio reproduction output system according to the present invention. The system shown in FIG. 6 includes a camera 62 that photographs cards 61-1 and 61-2, a card information acquisition device 63, and a video composition / audio reproduction output device 64.

The video composition / audio reproduction output device 64 includes a speed detection unit 65, a mutual position/posture detection unit 66, a moving image output unit 67, a three-dimensional moving image output unit 68, an audio reproduction output unit 69, and a video composition output unit 70.

The embodiment shown in FIG. 6 uses two cards 61-1 and 61-2, but the present invention is not limited to this, and three or more cards may be used.

First, the camera 62 photographs the cards 61-1 and 61-2. The camera video signal captured by the camera 62 is output to the card information acquisition device 63 and the video composition output unit 70. The cards 61-1 and 61-2 are held by the user and are moved or rotated relative to the camera 62 in three-dimensional coordinates (the x, y, z coordinates shown in FIG. 1).

The card information acquisition device 63 extracts the region of each of the cards 61-1 and 61-2 from the camera video signal input from the camera 62 and performs image processing such as pattern matching to acquire card information, consisting of the type, three-dimensional position information, and posture information of each card contained in the captured video. The card information acquisition device 63 outputs the acquired card information to the speed detection unit 65 and the mutual position/posture detection unit 66.

The speed detection unit 65 obtains, for each of the card types, the three-dimensional position information and posture information at a preset frame interval of the video signal from the card type, three-dimensional position information, and posture information sequentially input from the card information acquisition device 63. It calculates the differences in three-dimensional position and posture to determine the relative moving speed between the cards, including direction, and outputs the card types and the relative speed information between the photographed cards to the moving image output unit 67, the three-dimensional moving image output unit 68, and the audio reproduction output unit 69.
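The relative moving speed between two cards can be obtained by differencing their separation vector over the same frame interval. This sketch, with hypothetical function and parameter names, also derives a scalar closing speed (positive when the cards approach each other), which is one possible way to express the direction of the relative motion.

```python
import numpy as np


def relative_velocity(track_a, track_b, frame_interval, fps):
    """Relative velocity of card b with respect to card a, plus closing speed.

    track_a, track_b: (N, 3) position histories in cm, one row per frame.
    Returns (relative velocity vector in cm/s, closing speed in cm/s), where a
    positive closing speed means the cards are approaching each other.
    """
    a = np.asarray(track_a, dtype=float)
    b = np.asarray(track_b, dtype=float)
    k = frame_interval
    if min(len(a), len(b)) <= k:
        raise ValueError("need more than frame_interval frames for both cards")
    dt = k / fps
    rel_now = b[-1] - a[-1]           # separation vector in the latest frame
    rel_then = b[-1 - k] - a[-1 - k]  # separation vector k frames earlier
    v_rel = (rel_now - rel_then) / dt
    closing = (np.linalg.norm(rel_then) - np.linalg.norm(rel_now)) / dt
    return v_rel, float(closing)
```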

The mutual position/posture detection unit 66 detects, from the card information of each input card, mutual position/posture information (the distance between the cards and their posture information) indicating where the cards are relative to each other and which direction each faces, and outputs the card information and the mutual position/posture information to the moving image output unit 67, the three-dimensional moving image output unit 68, and the audio reproduction output unit 69.
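The mutual position/posture information can be sketched as the distance between the cards, the position of one card expressed in the other's frame, and the angle of their relative rotation. The sketch assumes each card's posture is available as a 3×3 rotation matrix; the source does not state the representation, and the function name is hypothetical.

```python
import numpy as np


def mutual_pose(pos_a, rot_a, pos_b, rot_b):
    """Distance and relative orientation between two cards.

    pos_*: (3,) positions in camera coordinates (cm).
    rot_*: (3, 3) rotation matrices mapping card coordinates to camera coordinates.
    Returns (distance in cm, b's position expressed in a's card frame,
             angle in degrees of the rotation taking a's orientation to b's).
    """
    pos_a, pos_b = np.asarray(pos_a, float), np.asarray(pos_b, float)
    rot_a, rot_b = np.asarray(rot_a, float), np.asarray(rot_b, float)
    offset = pos_b - pos_a
    distance = float(np.linalg.norm(offset))
    b_in_a = rot_a.T @ offset  # where card b sits as seen from card a
    r_rel = rot_a.T @ rot_b    # card b's orientation relative to card a
    # Rotation angle from the trace of a rotation matrix: cos(angle) = (tr - 1) / 2.
    cos_angle = np.clip((np.trace(r_rel) - 1.0) / 2.0, -1.0, 1.0)
    return distance, b_in_a, float(np.degrees(np.arccos(cos_angle)))
```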

The moving image output unit 67 selects a corresponding moving image, from moving image data stored in advance or from moving images input externally, based on the input card information, the relative speed information between the cards, and the mutual position/posture information. Based on the three-dimensional position information and posture information of the cards 61-1 and 61-2, it then moves and rotates the moving image to be output by projective transformation and viewpoint transformation, and outputs it as a moving image signal to the video composition output unit 70. The video produced by the projective and viewpoint transformations is placed on the card actually photographed by the camera 62, with its orientation adjusted to match that of the card.
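Placing a flat moving image onto the photographed card is a projective transformation (homography) from the video frame's corners to the card's corners in the camera image. This NumPy sketch estimates it with the standard direct linear transform (DLT); it assumes the four card corners have already been located, and a production implementation would normally normalise the point coordinates first for numerical stability.

```python
import numpy as np


def homography_from_points(src, dst):
    """Estimate the 3x3 projective transform H with dst ~ H @ src (DLT, 4+ points)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)  # null-space vector = homography up to scale
    return h / h[2, 2]


def apply_homography(h, pts):
    """Map (N, 2) points through H, including the perspective divide."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return homog[:, :2] / homog[:, 2:3]
```

Warping every pixel of the moving image with H (or its inverse, for backward mapping) puts the image onto the card with the card's perspective and orientation.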

The three-dimensional moving image output unit 68 selects a corresponding three-dimensional moving image from three-dimensional moving image data stored in advance, based on the relative speed information between the cards input from the speed detection unit 65, the mutual position/posture information, the card information, and the connection information. Based on the three-dimensional position information and posture information of the cards 61-1 and 61-2, it then applies projective and viewpoint transformations to move and rotate the three-dimensional moving image to be output, renders it with generally available three-dimensional rendering software such as OpenGL, and outputs the generated three-dimensional moving image signal to the video composition output unit 70.
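For the three-dimensional moving image, the card pose defines a rigid model transform, and a renderer such as OpenGL then applies the projection. This sketch builds the 4×4 matrix, projects a vertex with a pinhole camera model as a check, and flattens the matrix in the column-major order that the legacy `glLoadMatrixf` call expects. The intrinsic parameters are hypothetical, and a real OpenGL pipeline may also need an axis flip, because OpenGL's camera looks down −Z while computer-vision poses usually look down +Z.

```python
import numpy as np


def model_matrix(rotation, translation):
    """4x4 rigid transform placing a 3D model on a card (card frame -> camera frame)."""
    m = np.eye(4)
    m[:3, :3] = np.asarray(rotation, dtype=float)
    m[:3, 3] = np.asarray(translation, dtype=float)
    return m


def project(point_card, rotation, translation, f_px, cx, cy):
    """Project a vertex given in card coordinates to pixel coordinates (pinhole, +Z forward)."""
    p = model_matrix(rotation, translation) @ np.append(np.asarray(point_card, float), 1.0)
    x, y, z = p[:3]
    return f_px * x / z + cx, f_px * y / z + cy


def to_gl_column_major(m):
    """Flatten a 4x4 matrix in the column-major order used by OpenGL."""
    return np.asarray(m, dtype=np.float32).flatten(order="F")
```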

The video composition output unit 70 composites the live video input from the camera 62, the moving image input from the moving image output unit 67, and the three-dimensional moving image signal input from the three-dimensional moving image output unit 68 by key signal processing or the like, and outputs the composite of the camera video, the moving image, and the three-dimensional moving image as a video signal to a display device, a projector, or the like. The specific output content is described later.

Meanwhile, the audio reproduction output unit 69 selects a corresponding audio file, according to the input card information, the relative speed information between the cards, and the mutual position/posture information, from audio files stored in advance or from audio input externally, reproduces the audio signal, and outputs it to an audio output device such as a speaker.

In audio output, when surround reproduction using multiple speakers is available, for example, audio is output to predetermined speakers based on the relative speed information and the mutual position/posture information. This makes it possible to output realistic audio.
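One possible reading of routing audio to a predetermined speaker is to send the sound to the speaker nearest the direction of the card pair. This sketch assumes a five-speaker layout with ITU-R BS.775-style azimuths (0°, ±30°, ±110°) and treats the camera as the listening position; both are assumptions for illustration, and the function name is hypothetical.

```python
import math

# Assumed five-speaker layout: azimuth in degrees, 0 = straight ahead, positive = right.
SPEAKERS = {"C": 0.0, "L": -30.0, "R": 30.0, "Ls": -110.0, "Rs": 110.0}


def pick_speaker(pos_a, pos_b):
    """Return the speaker whose azimuth is closest to the midpoint of two cards.

    Positions are (x, y, z) in camera coordinates, with x to the right and z
    pointing away from the camera, which is treated as the listener.
    """
    mid_x = (pos_a[0] + pos_b[0]) / 2.0
    mid_z = (pos_a[2] + pos_b[2]) / 2.0
    azimuth = math.degrees(math.atan2(mid_x, mid_z))

    def angular_gap(speaker_az):
        # Smallest absolute difference between two angles, in degrees.
        return abs((speaker_az - azimuth + 180.0) % 360.0 - 180.0)

    return min(SPEAKERS, key=lambda name: angular_gap(SPEAKERS[name]))
```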

Examples of the data used by the moving image output unit 67, the three-dimensional moving image output unit 68, and the audio reproduction output unit 69 to select the stored video and audio, and concrete examples of the video and audio output to the display based on that data, will be described with reference to FIGS. 7 and 8.

FIG. 7 is a diagram illustrating an example of the data for selecting the video stored in the moving image output unit, the three-dimensional moving image output unit, and the audio reproduction output unit. FIG. 8 is a diagram showing an example of video output and audio output based on the example data of FIG. 7.

FIG. 7(a) is an example of the data held by the moving image output unit 67 and the three-dimensional moving image output unit 68, and FIG. 7(b) is an example of the data held by the audio reproduction output unit 69. In FIG. 7, video information (FIG. 7(a)) or audio information (FIG. 7(b)) is stored for combinations of at least two card types with a relative speed and a mutual position. For the three-dimensional moving image output unit 68, connection information is also included in addition to these items.

The data in FIG. 7 is an example for the case where two cards are photographed; however, the present invention does not limit the number of cards, and data corresponding to more cards may also be stored. When only one card is photographed, the data shown in FIG. 3 or FIG. 5 is referenced. Furthermore, when the data contains multiple videos matching a condition, all of them may be selected, or the videos and audio may be selected according to priorities assigned to them.
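When several entries match the same condition, the two options described above (select all of them, or select by priority) can be sketched as follows; the `Entry` type and the convention that a smaller number means a higher priority are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entry:
    content: str
    priority: int  # smaller number = higher priority (an assumption)


def choose(matches: List[Entry], mode: str = "priority") -> List[Entry]:
    """Resolve several entries matching the same condition.

    mode "all": return every match; mode "priority": return only the
    highest-priority match(es), keeping their original order.
    """
    if not matches or mode == "all":
        return list(matches)
    if mode != "priority":
        raise ValueError(f"unknown mode: {mode}")
    best = min(e.priority for e in matches)
    return [e for e in matches if e.priority == best]
```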

For example, as shown in FIG. 8(a), suppose two cards 81-1 and 81-2 are photographed simultaneously, the type (card type) of card 81-1 is a, and the type (card type) of card 81-2 is b. The video and audio output when cards 81-1 and 81-2 are brought close to each other will be described.

First, when the relative moving speed of the two cards obtained by the speed detection unit 65 is 0.5 cm/s and the mutual position obtained from the mutual position/posture detection unit 66 is 2 cm, a video of person A bowing is selected from FIG. 7(a) as video 1 corresponding to card type a. The video output by the moving image output unit 67 and the three-dimensional moving image output unit 68 is composited with the live camera video and output at a predetermined position (FIG. 8(b)). In addition, an audio file saying “Konnichiwa” (“Hello”) is selected from FIG. 7(b) as the corresponding audio and reproduced through an audio output device such as a speaker.

Meanwhile, a video of person B bowing is selected from the data in FIG. 7(a) as video 2 corresponding to card type b. The video output by the moving image output unit 67 and the three-dimensional moving image output unit 68 is composited with the live camera video and output at a predetermined position (FIG. 8(b)). In addition, an audio file saying “Domo” (a casual greeting) is selected from FIG. 7(b) and reproduced through an audio output device such as a speaker.

Further, when the relative moving speed of the two cards obtained by the speed detection unit 65 is 0.5 cm/s and the mutual position obtained from the mutual position/posture detection unit 66 is 0.2 cm, a video of person A and person B shaking hands is selected from FIG. 7(a) as video 1 corresponding to card type a. The video output by the moving image output unit 67 and the three-dimensional moving image output unit 68 is composited with the live camera video and output with control such as shifting it toward card 81-2 (FIG. 8(c)). In addition, an audio file saying “Yoroshiku” (“Nice to meet you”) is selected from FIG. 7(b) as the corresponding audio and reproduced through an audio output device such as a speaker.

Meanwhile, video 2 corresponding to card type b is set in the data of FIG. 7(a) so that no video is displayed. Because the video for card type a already shows person A and person B shaking hands, no video needs to be displayed for card type b. Displaying video in this way makes it possible to show scenes such as people hugging each other or a person getting into a car with more realistic motion, and to display more detailed video with high accuracy.

In addition, an audio file saying “Yoroshiku” is selected from FIG. 7(b) as the corresponding audio and output from an audio output device such as a speaker.

Further, when the relative moving speed of the two cards obtained by the speed detection unit 65 is 15 cm/s and the mutual position obtained from the mutual position/posture detection unit 66 is 0.5 cm, a video of person A falling over is selected from FIG. 7(a) as video 1 corresponding to card type a. The video output by the moving image output unit 67 and the three-dimensional moving image output unit 68 is composited with the live camera video and output at a predetermined position (FIG. 8(d)). In addition, an audio file with a cry of “Waa!” is selected from FIG. 7(b) as the corresponding audio and output from an audio output device such as a speaker.

Meanwhile, a video of person B falling over is selected from the data in FIG. 7(a) as video 2 corresponding to card type b. The video output by the moving image output unit 67 and the three-dimensional moving image output unit 68 is composited with the live camera video and output at a predetermined position (FIG. 8(d)). In addition, an audio file saying “Itai” (“Ouch”) is selected from FIG. 7(b) and output from an audio output device such as a speaker.
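The FIG. 8 scenarios can be reproduced with a rule table keyed on the card pair, a relative-speed range, and a distance range. The sample points (0.5 cm/s at 2 cm, 0.5 cm/s at 0.2 cm, 15 cm/s at 0.5 cm) and the resulting videos and utterances come from the text; the range boundaries and file names are hypothetical, and a `None` video means nothing is displayed for that card.

```python
from typing import NamedTuple, Optional


class Outcome(NamedTuple):
    video_a: Optional[str]  # None = display nothing for that card
    video_b: Optional[str]
    audio_a: str
    audio_b: str


INF = float("inf")
# (card pair, speed range in cm/s, distance range in cm) -> outcome.
# Ranges are hypothetical; only the sample points in the text come from FIG. 7.
PAIR_RULES = [
    (("a", "b"), (0.0, 5.0), (1.0, 5.0),
     Outcome("A_bows", "B_bows", "konnichiwa.wav", "domo.wav")),
    (("a", "b"), (0.0, 5.0), (0.0, 1.0),
     Outcome("A_and_B_shake_hands", None, "yoroshiku.wav", "yoroshiku.wav")),
    (("a", "b"), (5.0, INF), (0.0, 1.0),
     Outcome("A_falls", "B_falls", "waa.wav", "itai.wav")),
]


def select_pair(types, rel_speed, distance) -> Optional[Outcome]:
    """Return the first outcome whose pair, speed range, and distance range match."""
    for pair, (s_lo, s_hi), (d_lo, d_hi), outcome in PAIR_RULES:
        if pair == tuple(types) and s_lo <= rel_speed < s_hi and d_lo <= distance < d_hi:
            return outcome
    return None
```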

Thus, by selecting video and audio based on the relative speed information and the mutual position/posture information, more varied and more accurate video and audio can be output than with a single card. This makes it possible to output highly accurate, highly interactive video and audio in response to the cards moved by the user.

As described above, according to the present invention, moving images, three-dimensional moving images, and audio are selected based on card information obtained from video of a card bearing identification information such as specific characters or patterns and on the speed information of the card, and are composited with the actually captured video signal, so that video and audio output can be realized with high accuracy. This makes it possible to output highly accurate, highly interactive video and audio in response to the card moved by the user.

Furthermore, when multiple cards are present in the actually captured video signal, selecting moving images, three-dimensional moving images, and audio based on the relative speed information and mutual position/posture information between the cards, and compositing them with the captured video signal, realizes video and audio output with even higher accuracy.

Although preferred embodiments of the present invention have been described in detail above, the present invention is not limited to these specific embodiments, and various modifications and changes are possible within the scope of the gist of the present invention described in the claims.

FIG. 1 is a diagram showing an example of the positional relationship between a card and a camera.
FIG. 2 is a diagram showing an example configuration of the video composition output system according to the present invention.
FIG. 3 is a diagram showing an example of the video selection data stored in each of the moving image output unit and the three-dimensional moving image output unit.
FIG. 4 is a diagram showing an example configuration of the audio reproduction output system according to the present invention.
FIG. 5 is a diagram showing an example of the data held by the audio reproduction output unit.
FIG. 6 is a diagram showing an example configuration of the video composition / audio reproduction output system according to the present invention.
FIG. 7 is a diagram showing an example of the data for selecting the video stored in the moving image output unit, the three-dimensional moving image output unit, and the audio reproduction output unit.
FIG. 8 is a diagram showing an example of video output and audio output based on the example data of FIG. 7.

Explanation of symbols

11, 21, 41, 61, 81 Card
12, 22, 42, 62 Camera
23, 43, 63 Card information acquisition device
24 Video composition output device
25, 45, 65 Speed detection unit
26, 67 Moving image output unit
27, 68 Three-dimensional moving image output unit
28, 70 Video composition output unit
44 Audio reproduction output device
46, 69 Audio reproduction output unit
64 Video composition / audio reproduction output device
66 Mutual position/posture detection unit

Claims

A video composition output device that composites another video signal onto the video signal of a camera photographing a card and outputs the result, based on card information consisting of the card type, three-dimensional position information, and posture information obtained from that video signal, the video composition output device comprising:
a speed detection unit that detects the speed of the card based on the card information obtained for each frame at a predetermined frame interval of the video signal captured by the camera; and
a video output unit that selects and outputs the video signal to be composited based on the card information and the speed information obtained by the speed detection unit.

The video composition output device according to claim 1, wherein the video output unit selects a moving image based on the card information and the speed information obtained by the speed detection unit, and controls and outputs the display, non-display, size, position, and direction of the moving image based on the three-dimensional position information and the posture information.

The video composition output device according to claim 1, wherein the video output unit selects a three-dimensional moving image based on the card information, the speed information obtained by the speed detection unit, and color information and connection information obtained from the three-dimensional position information, and controls and outputs the display, non-display, size, position, and direction of the three-dimensional moving image based on the three-dimensional position information and the posture information.

The video composition output device according to any one of claims 1 to 3, wherein, when a plurality of cards are present in the video signal captured by the camera, the speed detection unit detects relative speed information between the plurality of cards based on the card information.

5. The video composition output device according to claim 4, further comprising a mutual position/posture detection unit that detects mutual position/posture information, including the distance between the cards and posture information, from the card information of the plurality of cards.

6. The video composition output device according to claim 5, wherein the video output unit selects and outputs the moving image and/or the three-dimensional moving image based on the relative speed information and the mutual position/posture information.

An audio reproduction output device that reproduces and outputs pre-stored audio based on card information including the card type, three-dimensional position information, and posture information obtained from the video signal of a camera photographing a card, the audio reproduction output device comprising:
a speed detection unit that detects the speed of the card based on the card information obtained for each frame at a predetermined frame interval of the video signal captured by the camera; and
an audio reproduction output unit that selects, reproduces, and outputs an audio file based on the card information and the speed information obtained by the speed detection unit,
wherein, when a plurality of cards are present in the video signal captured by the camera, the speed detection unit detects relative speed information between the plurality of cards based on the card information.

8. The audio reproduction output device according to claim 7, further comprising a mutual position/posture detection unit that detects mutual position/posture information, including the distance between the cards and posture information, from the card information of the plurality of cards.

9. The audio reproduction output device according to claim 8, wherein the audio reproduction output unit selects and outputs the audio file based on the relative speed information and the mutual position/posture information.