JP4548313B2

JP4548313B2 - Video creation device and video creation method

Info

Publication number: JP4548313B2
Application number: JP2005325274A
Authority: JP
Inventors: 雄介鈴木
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2005-11-09
Filing date: 2005-11-09
Publication date: 2010-09-22
Anticipated expiration: 2025-11-09
Also published as: JP2007133609A

Description

The present invention relates to a video creation device and a video creation method for creating motion video that contains video of meaningful human motion. In particular, it relates to a video creation device and a video creation method that handle motion video of a person that can be associated with words, sentences, symbols, and the like, such as sign language, and that create a continuous motion video by connecting multiple such videos and synthesizing parts of them.

Conventionally, to composite video of a moving person, a video of the person performing a series of motions was recorded, and the image of the person in each frame was composited into the corresponding frame of a CG video, so that live-action video of the moving person appeared on the CG video. However, when single-motion videos are connected and composited to generate a continuous motion video, the position and shape of the person shift between videos, and the resulting motion looks unnatural. To avoid this, compositing onto the CG video was done with a video that recorded the person's whole series of motions, but in that case the continuous motion video could not be reused in another scene of the CG video.

The invention of Patent Document 1 has been proposed to solve this problem. To create highly realistic video content dynamically, that invention prepares multiple live-action video files that capture, for example, the movements of a real person, synthesizes intermediate video portions from them as needed, and then connects them.
Patent Document 1: JP 2003-69900 A

However, when these video files are simply synthesized, distortions in the video, caused for example by parts of the person's body such as the face and a hand overlapping, become conspicuous and give an unnatural impression instead.

In addition, every person performing in the videos used for synthesis must be the same person, which places a heavy burden on that individual when the videos are shot.

The present invention has been made in view of the above problems. It holds and uses the motion video of each part of the body of the person in a video as separate data, thereby reducing, for example, the noise introduced during synthesis, and aims to create high-quality continuous video. Specifically, video that contains only the motion of one body part of the photographed person, such as video of the right-hand region, the left-hand region, or the face region, is divided word by word into individual video files and registered in the motion video storage means, and the metadata divided word by word is registered in the connection information storage means for use. Furthermore, when intermediate video between motions is synthesized, an intermediate image is synthesized for each body region first, and a composite image is then created by superimposing these videos as the final motion of the person. This reduces the synthesis distortion caused by overlapping body regions and the noise introduced during synthesis, and aims to produce high-quality sign language video.

In addition, because the data is held separately for each part of the body, such as the hand regions and the face region, partial data such as the hand region of one person can be used in combination with partial data such as the face region of another person, which widens the range of usable video.

To this end, the video creation device of the present invention is a video creation device that creates a continuous motion video by connecting multiple motion videos, each representing a body motion corresponding to specific information. The device comprises: motion video storage means for individually storing motion videos that each contain the motion of one part of the body; connection reference information storage means for storing, for each motion video, readout information for that motion video in association with one or more pieces of connection reference information shown in that motion video; motion video acquisition means for searching the connection reference information storage means for the readout information of the corresponding motion videos based on multiple pieces of time-ordered input information, and for retrieving the motion videos from the motion video storage means based on that readout information; video creation means for connecting the motion videos of each body part retrieved by the motion video acquisition means and superimposing the connected motion videos of the body parts to create a motion video of the body; video division means for recognizing and dividing the motion video of a specific person using the image information of multiple processing frames obtained by decomposing that video, producing body region frames for each body part, and accumulating them together with time information into a body region file for each body part; connection reference information means for recognizing features of the motion video of each body part and generating connection reference information; information recognition means for recognizing one or more pieces of information represented by the motion video based on an information dictionary prepared in advance; and registration means for dividing the motion video for each piece of recognized information out of the body part information and registering the divided motion video and the connection reference information corresponding to the recognized information in the motion video storage means and the connection reference information storage means, respectively. The video division means divides time-series data consisting of the connection reference information and the time information into portions corresponding to words and, using the time information of the divided time-series data, divides the body region file of each body part word by word. The registration means registers the body region files divided word by word in the motion video storage means and registers the connection reference information divided word by word in the connection reference information storage means.
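
The word-by-word division described above can be pictured with a minimal Python sketch. It assumes, purely for illustration, that a body region file is a list of timestamped frames and that the word boundaries have already been found from the time-series data. The names `Frame` and `split_by_word`, the millisecond timestamps, and the sample words are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Frame:
    time_ms: int   # time information stored with the body region frame
    image: str     # stand-in for the pixel data of one body region frame


def split_by_word(frames: List[Frame],
                  word_spans: List[Tuple[str, int, int]]) -> Dict[str, List[Frame]]:
    """Cut one body region file into per-word clips.

    Each span is (word, start_ms, end_ms); the end time is exclusive.
    """
    return {
        word: [f for f in frames if start <= f.time_ms < end]
        for word, start, end in word_spans
    }


# Hypothetical right-hand region file sampled every 100 ms.
right_hand = [Frame(t * 100, f"rh{t}") for t in range(10)]
# Hypothetical word boundaries taken from the segmented time-series data.
spans = [("hello", 0, 400), ("thanks", 400, 1000)]
clips = split_by_word(right_hand, spans)
```

Applying the same spans to the face and left-hand files would give clips for every body part that share the same word boundaries, which is what lets them be registered together per word.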

Furthermore, the video creation method of the present invention is a method by which a video creation device creates a continuous motion video by connecting multiple motion videos, each representing a body motion corresponding to specific information. The method comprises: a motion video storage step in which motion video storage means individually stores motion videos that each contain the motion of one part of the body; a connection reference information storage step in which connection reference information storage means stores, for each motion video, readout information for that motion video in association with one or more pieces of connection reference information shown in that motion video; a motion video acquisition step in which motion video acquisition means searches the connection reference information storage means for the readout information of the corresponding motion videos based on multiple pieces of time-ordered input information and retrieves the motion videos from the motion video storage means based on that readout information; a video creation step in which video creation means connects the motion videos of each body part retrieved in the motion video acquisition step and superimposes the connected motion videos of the body parts to create a motion video of the body; a video division step in which video division means recognizes and divides the motion video of a specific person using the image information of multiple processing frames obtained by decomposing that video, produces body region frames for each body part, and accumulates them together with time information into a body region file for each body part; a connection reference information step in which connection reference information means recognizes features of the motion video of each body part and generates connection reference information; an information recognition step in which information recognition means recognizes one or more pieces of information represented by the motion video based on an information dictionary prepared in advance; and a registration step in which registration means divides the motion video for each piece of information recognized in the information recognition step out of the body part information and registers the divided motion video and the connection reference information corresponding to the recognized information for use in the motion video storage step and the connection reference information storage step. In the video division step, the video division means divides time-series data consisting of the connection reference information and the time information into portions corresponding to words and, using the time information of the divided time-series data, divides the body region file of each body part word by word. In the registration step, the registration means registers the body region files divided word by word in the motion video storage means and registers the connection reference information divided word by word in the connection reference information storage means.

When videos are synthesized, the distortion caused by overlapping body parts can be eliminated, so that video can be synthesized in a more natural form.

In addition, video of individual body parts, such as video of the hand regions and video of the face region, can be synthesized and used even when it comes from different persons, which widens the range of usable video.

Furthermore, adding a video division unit to the system removes the need for an operator to divide the video manually, so the operator can easily add words and video files to the system.

Next, embodiments of the present invention will be described with reference to the accompanying drawings.

[First Embodiment]
First, a first embodiment of the present invention will be described, taking sign language as an example. Specifically, an example in which the present invention is applied to a video display device that displays sign language expressing the meaning of an input sentence as continuous video will be described with reference to the drawings.

FIG. 1 is a system configuration diagram of the video display device according to the embodiment. As shown in FIG. 1, the video display device comprises an input unit 10, a morphological analysis unit 11, a translation unit 12, a data selection unit 13, a word dictionary 14, selected video files 15, a display unit 16, a synthesis unit 17, and a video file group 18.

The input unit 10 takes in a sentence (Japanese in this embodiment) whose meaning the user wants displayed as video, and passes it to the morphological analysis unit 11. The function of the input unit 10 can be realized in various input forms, for example accepting character input from a keyboard, accepting speech-to-text input in which speech from a microphone is converted to text, or accepting a data stream carried by broadcast radio waves captured by an antenna. In other words, any means by which the user can input the content of the sentence to be displayed as video (input means using button operation, voice, radio waves, and so on) can be used.

The morphological analysis unit 11 receives the input sentence from the input unit 10 and performs a series of processes: it runs morphological analysis on the sentence, decomposes it into morphemes, and assigns a part of speech to each morpheme using a dictionary (not shown). The morphological analysis unit 11 then passes the analyzed words to the translation unit 12.

The translation unit 12 receives the words analyzed by the morphological analysis unit 11 and, according to the word information and part-of-speech information, performs processing such as changing the word order, removing unnecessary words, and adding necessary words. The translation unit 12 passes the result to the data selection unit 13.

The data selection unit 13 receives the processing result (the adjusted words) from the translation unit 12 and searches for the video files needed to express the meaning of each word. On receiving an adjusted word from the translation unit 12, the data selection unit 13 retrieves from the word dictionary 14 the readout information associated with that word (for example, the video file names, in-video metadata, and motion duration described later) and, using the video file names, metadata, and so on as search keys, searches the video file group 18 for the video files needed to express the meaning of the word. The data selection unit 13 stores the retrieved video files in the selected video files 15.

When an input word is not registered in the word dictionary 14, the data selection unit 13 searches the word dictionary 14 for video showing the finger characters (fingerspelling) that express the sounds of the word in sign language, and passes it to the selected video files 15.
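
As a rough illustration of this lookup with a finger-character fallback, the following Python sketch models the word dictionary as an in-memory mapping. Everything named here is an assumption made for the example: the dictionary layout, the file naming pattern, the helper `fingerspelling_files`, and the way a word's sounds are passed in as a list.

```python
from typing import Dict, List

# Hypothetical word dictionary: (word name, part of speech) -> file name per body part.
WORD_DICTIONARY: Dict[tuple, Dict[str, str]] = {
    ("thanks", "noun"): {"face": "thanks_F", "left": "thanks_L", "right": "thanks_R"},
}


def fingerspelling_files(sound: str) -> Dict[str, str]:
    """Return the (hypothetical) finger-character files for one sound."""
    suffixes = (("face", "F"), ("left", "L"), ("right", "R"))
    return {part: f"finger_{sound}_{suffix}" for part, suffix in suffixes}


def select_video_files(word: str, part_of_speech: str,
                       sounds: List[str]) -> List[Dict[str, str]]:
    """Look the word up; fall back to one finger-character entry per sound."""
    entry = WORD_DICTIONARY.get((word, part_of_speech))
    if entry is not None:
        return [entry]
    return [fingerspelling_files(s) for s in sounds]


registered = select_video_files("thanks", "noun", ["sa", "n"])
unregistered = select_video_files("oki", "noun", ["o", "ki"])
```

A registered word yields a single entry with three file names, while an unregistered word yields one entry per sound, so the downstream steps can treat both cases as a sequence of per-part video files.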

The data selection unit 13 constitutes the motion video acquisition means, which searches the word dictionary 14 for the readout information of the corresponding motion videos based on multiple pieces of time-ordered input information and retrieves the motion videos from the video file group 18 based on that readout information.

The video files will now be described with reference to FIG. 2.

A video file is a named segment of video, cut at the portion that expresses the meaning of one word. Specifically, a person performing sign language is filmed so that only the face, only the right hand, or only the left hand is visible, for example by making everything other than the visible part the same color as the background at the time of shooting. The footage is then cut at the portions that express the meaning of each word, and each segment is given a name. These video files are called the face video file, the left-hand video file, and the right-hand video file, respectively.

Each video file consists of a sequence of still images, and each still image is called a frame. Reference numerals 21, 22, and 23 in FIG. 2 show example frames of a face video file, a left-hand video file, and a right-hand video file, respectively. Reference numeral 24 shows an example in which the frames of the face, right-hand, and left-hand video files are displayed together as video of a complete person.

The face video file group 18F, the right-hand video file group 18R, and the left-hand video file group 18L in FIG. 1 are each a collection of video files showing only the named region. They are the collections of video files that are ultimately displayed on the display unit 16 as the elements of the continuous video, and they contain footage of the motions corresponding to each Japanese word to be expressed and to the finger-character motions that express Japanese sounds in sign language. Together they are called the video file group 18. The video file group 18 constitutes the motion video storage means, which individually stores motion videos that each contain the motion of one part of the body.

The word dictionary 14 is a storage area in which each word is stored in association with the file names of the video files containing the video that expresses the meaning of the word, the duration of the motion in the video files, the in-video metadata, and so on. Entries in the word dictionary 14 can be newly registered, added, and deleted. The word dictionary 14 constitutes the connection reference information storage means, which stores, for each motion video, the readout information of that motion video in association with one or more pieces of connection reference information shown in that motion video.

Table 1 shows an example configuration of the word dictionary 14 in this embodiment. As shown in Table 1, the word name is the name of the registered word. The reading is the reading of the word. The part of speech is the name of the word's part of speech. The file name is the name of the video file representing the word; there are three file names, for the face, the left hand, and the right hand. The creation date and time is when the video file was created and the data was registered in the word dictionary. The duration is the duration of the motion in the video file, in seconds.

In Table 1, the metadata is the data extracted from the content of a video file that is needed to synthesize and connect video. The structure of the metadata can vary with the content of the video file and of the video to be output. Because the output video in this embodiment is sign language, the word dictionary 14 holds as metadata, for each frame of the three kinds of sign language video that express the meaning of the input word, the direction of the person's gaze, the mouth shape, the hand position, the hand shape, the hand orientation, the velocity vector, and so on.

In Table 1, the start point is the metadata of the first frame of the video file, and the end point is the metadata of the last frame. The metadata of every other frame has the same structure as that of the first and last frames.
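
One way to picture a word dictionary entry with its per-frame metadata is the Python sketch below. It is a simplification under stated assumptions: the class and field names are invented for this example, a single metadata record per frame stands in for the three per-part files, and the sample values (including the hand-shape symbol `s1`) are illustrative rather than taken from Table 1.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FrameMetadata:
    gaze_direction: str                  # direction symbol, e.g. "a10"
    mouth_shape: str                     # vowel-like symbol, e.g. "a"
    hand_position: Tuple[float, float]   # centroid of the hand region (x, y)
    hand_shape: str                      # hand-shape symbol (hypothetical "s1")
    hand_orientation: str                # pointing direction plus palm direction


@dataclass
class WordEntry:
    word_name: str
    reading: str
    part_of_speech: str
    file_names: Dict[str, str]           # keys: "face", "left", "right"
    created: str                         # creation date and time
    duration_s: float                    # motion duration in seconds
    frames: List[FrameMetadata] = field(default_factory=list)

    @property
    def start_point(self) -> FrameMetadata:
        """Metadata of the first frame of the video file."""
        return self.frames[0]

    @property
    def end_point(self) -> FrameMetadata:
        """Metadata of the last frame of the video file."""
        return self.frames[-1]


entry = WordEntry(
    word_name="thanks", reading="thanks", part_of_speech="noun",
    file_names={"face": "thanks_F", "left": "thanks_L", "right": "thanks_R"},
    created="2005-11-09 10:00", duration_s=1.2,
    frames=[
        FrameMetadata("a10", "a", (10.0, 20.0), "s1", "a8/aG"),
        FrameMetadata("a10", "u", (30.0, 25.0), "s1", "a16/aG"),
    ],
)
```

Exposing the start and end points as properties reflects that they are simply the first and last frame records, which is the information needed when two clips are connected.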

Next, each item in the metadata of Table 1 is described with reference to FIG. 3. The hand position in Table 1 is the centroid of the region occupied by the person's hand in the video, expressed in two-dimensional coordinates in the video. Although not shown in Table 1, a motion velocity vector obtained by differentiating the hand position with respect to time may also be used.
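
The two quantities just described, the centroid of the hand region and a velocity vector, can be sketched in Python as follows. This is a minimal sketch under assumptions that are not in the patent: the hand region is given as a binary mask (nested lists of 0 and 1), coordinates are array indices (column as x, row as y) rather than the screen axes of FIG. 3, and the time derivative is approximated by a finite difference between two frames.

```python
from typing import List, Tuple


def centroid(mask: List[List[int]]) -> Tuple[float, float]:
    """Centroid (x, y) of all non-zero pixels in a binary hand mask."""
    points = [(x, y)
              for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    if not points:
        raise ValueError("mask contains no hand pixels")
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def velocity(p0: Tuple[float, float], p1: Tuple[float, float],
             dt: float) -> Tuple[float, float]:
    """Finite-difference approximation of the hand velocity between two frames."""
    return ((p1[0] - p0[0]) / dt, (p1[1] - p0[1]) / dt)


mask = [[0, 1, 1],
        [0, 1, 1]]
position = centroid(mask)
speed = velocity((0.0, 0.0), (3.0, 6.0), 0.5)
```

With a fixed frame rate, `dt` would simply be the frame interval, so the velocity vector can be filled in for every frame after the first.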

FIG. 3 shows an example of the screen that is actually displayed. The vertical direction of the screen is the y-axis and the horizontal direction is the x-axis, and the intersection of the two axes is the reference point (0, 0). The position of the right hand in the frame can then be expressed as the coordinates (x, y), and the position of the left hand as (x0, y0).

The hand shape in Table 1 is the shape of the fingers, such as whether the person is extending or bending the fingers. Because sign language typically distinguishes about 80 hand shapes, this embodiment defines a symbol for each distinguished hand shape and uses those symbols. FIG. 16 shows some examples of hand shapes and their corresponding symbols.

The hand orientation in Table 1 combines the direction in which the hand points with the direction in which the palm faces. The direction in which the hand points is the direction of a straight line drawn from the person's elbow to the wrist. Sign language typically distinguishes about 20 to 28 pointing directions and about 6 to 8 palm directions, so each direction is represented by a symbol, and the table lists combinations of those symbols.

FIGS. 4, 5, and 6 show the correspondence between hand pointing directions and symbols, and FIG. 7 shows part of an example of the correspondence between palm directions and symbols.

FIG. 4 shows the directions on the plane seen when the person is viewed from directly in front, the so-called frontal plane. For example, the direction in which the person's right hand points in FIG. 4 is denoted a8. The x-axis and y-axis in FIG. 4 are the same axes as those described for FIG. 3.

FIG. 5 shows six directions on the plane seen when the person is viewed from the side, the so-called sagittal plane. The direction in which the person's right hand points in FIG. 5 is denoted a10. The z-axis in FIG. 5 is perpendicular to the x-axis and y-axis of FIG. 4 and points from the person's body in the direction the face is facing.

FIG. 6 shows four directions on the plane seen when the person is viewed from directly above, the so-called horizontal plane. The direction in which the person's right hand points in FIG. 6 is denoted a16. The axes in FIG. 6 are the same as those in FIGS. 4 and 5.

The axis v in FIG. 7 is the axis indicating the direction in which the hand points, as described above. The thin arrows lie on a plane perpendicular to the axis v and indicate the directions the palm can face. The direction in which the palm faces in FIG. 7 is denoted aG.

The gaze direction in Table 1 is the direction in which the person is looking. FIG. 14 shows the correspondence between gaze directions and the coordinate system. Directions are expressed in the same way as in FIGS. 4, 5, and 6, and the axes in FIG. 14 are the same as in those figures. The gaze direction of the person in FIG. 14 is denoted a10.

The mouth shape in Table 1 is the shape of the person's mouth. In sign language, vowels and the like are often used as cues to the sound, so the vowel or other sound that the mouth is forming is used as the symbol. FIG. 15 shows an example of the correspondence between mouth shapes and symbols.

In this embodiment the metadata has the above structure because sign language video is displayed, but part of the information carried by the video itself, such as information on the content the video expresses, video frequency information, or a color histogram, may also be used as metadata.

The selected face video file 15F, the selected left-hand video file 15L, and the selected right-hand video file 15R in FIG. 1 are storage areas that receive, for each word retrieved by the data selection unit 13, the video files for each visible region as described for FIG. 2, and hold them region by region. Together they are called the selected video files 15.

For each of the selected face video file 15F, the selected left-hand video file 15L, and the selected right-hand video file 15R held in the selected video files 15, the synthesis unit 17 takes out two consecutive video files and synthesizes an intermediate video file between them. The synthesis unit 17 constitutes both the video creation means, which connects the motion videos of each body part retrieved by the data selection unit 13 and superimposes the connected motion videos of the body parts to create a motion video of the body, and the synthesis means, which creates an intermediate motion video for each body part based on the temporally consecutive motion videos acquired by the data selection unit 13 and hands it to the video creation means.
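
This passage does not say how the intermediate video is computed. Purely as an assumed illustration, the Python sketch below generates intermediate hand positions for each body part by linear interpolation between the end point of one clip and the start point of the next. Linear interpolation, the function names, and the sample coordinates are all choices made for this example, not the patent's stated method.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def intermediate_positions(end_pos: Point, start_pos: Point, n: int) -> List[Point]:
    """Return n evenly spaced positions strictly between end_pos and start_pos."""
    return [
        (end_pos[0] + (start_pos[0] - end_pos[0]) * k / (n + 1),
         end_pos[1] + (start_pos[1] - end_pos[1]) * k / (n + 1))
        for k in range(1, n + 1)
    ]


def bridge_all_parts(prev_end: Dict[str, Point], next_start: Dict[str, Point],
                     n: int) -> Dict[str, List[Point]]:
    """Interpolate each body part independently, before any superimposition."""
    return {part: intermediate_positions(prev_end[part], next_start[part], n)
            for part in prev_end}


# Hypothetical end point of one word and start point of the next word.
prev_end = {"left": (0.0, 0.0), "right": (8.0, 8.0)}
next_start = {"left": (4.0, 8.0), "right": (8.0, 0.0)}
bridges = bridge_all_parts(prev_end, next_start, 3)
```

Handling each part separately mirrors the idea in the text that the intermediate video is built per body region first and only then combined into the whole-body video.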

The display unit 16 is a functional unit that superimposes the selected video files 15 (F, L, R) as appropriate and displays them continuously.
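
Superimposing per-part frames can be sketched in Python as a simple background-keyed overlay. The sketch assumes that each frame is a small grid of integer pixel values, that the background is marked by a single known value (`BG`), and that later layers are drawn on top of earlier ones. The draw order and the pixel representation are assumptions for this example only.

```python
from typing import List

BG = 0  # assumed value that marks background pixels in every layer

Image = List[List[int]]


def overlay(layers: List[Image]) -> Image:
    """Stack same-sized layers; non-background pixels of later layers win."""
    height, width = len(layers[0]), len(layers[0][0])
    out = [[BG] * width for _ in range(height)]
    for layer in layers:
        for y in range(height):
            for x in range(width):
                if layer[y][x] != BG:
                    out[y][x] = layer[y][x]
    return out


face = [[1, 0], [0, 0]]
left_hand = [[0, 2], [0, 0]]
right_hand = [[3, 0], [3, 0]]
# The face is drawn first and the hands on top of it.
combined = overlay([face, left_hand, right_hand])
```

In this sample the right hand covers the face pixel at the top-left corner, showing that where regions overlap only the layer order decides the result and no blending between parts takes place.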

[Video creation method]
Next, a video creation method using the video display device configured as above is described with reference to the drawings. FIG. 8 is a flowchart of the video creation operation of the video display device of this embodiment.

[1. Input]
In FIG. 8, the user first enters into the input unit 10 the sentence to be expressed as sign language video (s80, s81, s82). In s80, the user types the sentence whose meaning is to be shown in sign language, using a text input means such as a keyboard. The sentence may also be entered by voice. In that case, after the speech is input (s81), speech-to-text conversion (s82) turns it into a sentence, which becomes the input to the next step.

[2. Assignment to words]
The morphological analysis unit 11 performs morphological analysis on the input sentence and decomposes it into morphemes (s83). It then assigns a part of speech to each morpheme using a general Japanese dictionary (not shown) (s84). A morpheme to which a part of speech has been assigned is hereinafter called a word. These words are passed to the translation unit 12.

[3. Parsing and reordering]
The translation unit 12 parses the sentence that has been divided into words (s85) and, as in existing research, rearranges the words into the order prescribed by a grammar expressed as state transitions between words (s86).

[4. Creation of the selected video file]
[4-1 File search]
The data selection unit 13 reads one word, searches the word dictionary 14 using the word's name and its assigned part of speech as the key, and checks whether the word is registered (s87).

If the word is not registered, a fingerspelling file search process is performed to find video files of the finger characters that spell out the sound of the word in sign language (s88).

[4-1-1 Fingerspelling search process]
Details of the fingerspelling file search process (s88) will be described with reference to FIG. 9.

First, the sound of the input word to be fingerspelled is decomposed into characters (sb881), and the characters that make up the word are read out one at a time (sb882). For each character, the name of the file showing the corresponding finger-character motion is retrieved from the word dictionary 14 (sb883) and accumulated in the fingerspelling data (sb884). It is then determined whether file names have been accumulated for all characters (sb885). Steps sb882 to sb884 are repeated until every character has been processed, and the file names to be called are finally determined from the fingerspelling data (sb886).
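
The loop of sb881 to sb886 can be sketched as follows. This is only a minimal illustration under stated assumptions: the table contents, the file names, and the function name `fingerspelling_files` are hypothetical, and the reading of the word is assumed to be available already as a kana string.

```python
# Hypothetical excerpt of the word dictionary 14: one fingerspelling clip per kana.
# The file names are invented for illustration only.
FINGER_CHAR_FILES = {
    "あ": "yubi_a.avi",
    "い": "yubi_i.avi",
    "う": "yubi_u.avi",
}


def fingerspelling_files(reading, table=FINGER_CHAR_FILES):
    """Return the clip file names that spell `reading`, one per character."""
    collected = []                      # plays the role of the fingerspelling data (sb884)
    for char in reading:                # sb881/sb882: split and read one character at a time
        collected.append(table[char])   # sb883/sb884: look up the file name and accumulate it
    return collected                    # sb886: the file names to be called


print(fingerspelling_files("あい"))
```

In this sketch a character missing from the table raises `KeyError`; how the actual device handles such a case is not stated in the description.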

Next, the files are read from the video file group 18 (sb887), and the read video files are added to the selected video file 15 (s813).

Returning to FIG. 8, if it is determined in s87 that the word is registered, it is further determined whether more than one entry matches the input word (s89). If there is exactly one match, the process proceeds to s812: the file name field of the search result is used to read the file for each region from the video file group 18 (s812), and the files are added, region by region, to the selected video files 15F, 15R, and 15L (s813).
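
The branching at s87 and s89 can be pictured with a small lookup sketch. The dictionary layout, the part-of-speech labels, and the function name `lookup` are assumptions made for illustration; only the key (word name plus part of speech) and the three outcomes come from the description.

```python
def lookup(word_dictionary, name, part_of_speech):
    """Return (outcome, matches) for one word, keyed on name and part of speech."""
    matches = word_dictionary.get((name, part_of_speech), [])
    if not matches:
        return "fingerspell", []        # s87: not registered, go to s88
    if len(matches) == 1:
        return "single", matches        # s89: one match, go straight to s812
    return "multiple", matches          # s89: several matches, go to s810/s811


# Hypothetical dictionary: each value lists the clip-number prefixes registered for the word.
sample = {
    ("tomorrow", "noun"): ["01"],
    ("meet", "verb"): ["02", "04"],
}

print(lookup(sample, "tomorrow", "noun"))
print(lookup(sample, "meet", "verb"))
print(lookup(sample, "Oki", "noun"))
```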

[4-2 Selection when there are multiple registrations]
If, on the other hand, it is determined in s89 that there are multiple matches, the metadata in the word dictionary is consulted (s810) and the file to be read is chosen from among the candidates (s811). Specifically, in s810 the data selection unit 13 refers to the metadata of the files representing the immediately preceding word. The processing when the metadata can be referenced is described below using the data in Table 1 as an example.

Consider the case where the preceding word is "tomorrow", for which "01F.avi", "01L.avi", and "01R.avi" have been selected as the video files representing its motion, and the current word is "meet".

As shown in Table 1, there are multiple candidate right-hand video files for "meet", so one of them must be selected. First, the coordinates of the right-hand and left-hand positions at the end of the motion are obtained from the metadata of the files "01F.avi", "01L.avi", and "01R.avi" that represent the preceding word "tomorrow" (sb810).

Then, the coordinates of the right-hand and left-hand positions at the start of the motion are obtained from the metadata of each candidate file representing "meet".

For each candidate, the Euclidean distance between the hand position at the end of the preceding motion and the hand position at the start of the candidate is computed for each hand, the left and right distances are summed, and the candidate with the smallest sum is chosen as the selected file (s811).

In this example, for "04F.avi", "04R.avi", and "04L.avi", the Euclidean distance for the right hand is
$$\left((50-50)^2 + (50-60)^2\right)^{1/2} = 10$$
and for the left hand is
$$\left((100-80)^2 + (90-60)^2\right)^{1/2} = 36.06\ldots$$
so the sum over both hands is 46.06.

For "02F.avi", "02R.avi", and "02L.avi", the right hand gives
$$\left((50-100)^2 + (50-60)^2\right)^{1/2} = 50.99$$
and the left hand is given as
$$\left((100-130)^2 + (90-60)^2\right)^{1/2} = 58.31\ldots$$
although the operands as printed evaluate to about 42.43. The sum over both hands is stated as 109.30 (about 93.42 with the operands as printed). In either case it exceeds 46.06, so "04F.avi", "04L.avi", and "04R.avi" are selected.

If the metadata of the files cannot be referenced, the candidate with the most recent update date and time is used as the output of s811.

In this embodiment, right-hand and left-hand information is used as the criterion, but depending on the configuration, the similarity of gaze direction or mouth shape may be used instead.

The data selection unit 13 then reads the selected video files from the video file group 18 (s812) and adds each of them to the selected video file 15 (F, R, L) (s813).

The processing of s87 to s813 is repeated for every word in the input sentence.

[5. Synthesis of intermediate files]
(Modification of the selected video file)
FIG. 10 shows the relationship between the processing flow of the synthesis unit 17 and the operation flow.

(a. Extraction)
From the region video files of the selected video file 15, the synthesis unit 17 starts, for example, with the right-hand selected video file 15R and extracts two files: a preceding video file 97, which represents an earlier word that has not yet been processed, and a following video file 99, which represents the word immediately after it (s91).

Next, the last frame 98 of the preceding video file 97 and the first frame 910 of the following video file 99 are extracted (s92).

(b. Intermediate frame synthesis)
Next, a plurality of intermediate frames 911 are synthesized by morphing from the last frame 98 and the first frame 910 (s93).

Existing techniques such as cross-dissolve and warping can be used for the morphing. Frames are synthesized so that the image changes gradually from the last frame 98 to the first frame 910.
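
As one concrete reading of s93, a plain cross-dissolve can be sketched as a linear blend between the two frames. The representation of a frame as a 2-D list of grayscale values, the evenly spaced blend weights, and the function name `cross_dissolve` are assumptions for illustration; warping is not covered by this sketch.

```python
def cross_dissolve(last_frame, first_frame, count):
    """Return `count` frames blending linearly from last_frame to first_frame.

    Frames are equally sized 2-D lists of grayscale values.
    """
    frames = []
    for k in range(1, count + 1):
        t = k / (count + 1)             # blend weight, strictly between 0 and 1
        frames.append([
            [(1 - t) * a + t * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(last_frame, first_frame)
        ])
    return frames


middle = cross_dissolve([[0, 100]], [[100, 0]], 3)
print(middle[1])   # prints: [[50.0, 50.0]]
```

The end frames themselves are not repeated in the output, since they already exist in the preceding and following clips.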

(c. Video file creation)
Next, the intermediate frames synthesized in s93 are concatenated to create a synthesized video file 912. That is, time information is added to the intermediate frames and they are assembled into the video file 912 (s94).

(d. Addition)
Next, the synthesized video file 912 is inserted at the corresponding position in the selected video file, that is, between the preceding video file 97 and the following video file 99 (s95).

Steps a, b, c, and d are then repeated until all of the right-hand selected video files have been processed. After the right hand is finished, the same processing is performed on the left-hand and face selected video files 15L and 15F. The body parts may be processed in a different order.
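
The repetition of steps a to d over each body part can be sketched as below. Frames are reduced to plain numbers and the morphing step is replaced by a stand-in, so the names `insert_transitions` and `midpoint` and the sample data are purely hypothetical.

```python
def insert_transitions(clips, make_transition):
    """Return a new clip list with a synthesized clip between each adjacent pair.

    `clips` is a list of frame lists; `make_transition(last, first)` returns
    the frames of the synthesized clip (steps b and c).
    """
    if not clips:
        return []
    result = [clips[0]]
    for previous, following in zip(clips, clips[1:]):             # step a: adjacent pair
        transition = make_transition(previous[-1], following[0])  # steps b and c
        result.append(transition)                                 # step d: insert between them
        result.append(following)
    return result


def midpoint(last, first):
    # Stand-in for morphing: a single "frame" halfway between two numbers.
    return [(last + first) / 2]


# Each body part is processed independently, in any order.
selected = {"R": [[0, 2], [10, 12]], "L": [[5], [7], [9]], "F": [[1]]}
joined = {part: insert_transitions(clips, midpoint) for part, clips in selected.items()}
print(joined["R"])   # prints: [[0, 2], [6.0], [10, 12]]
```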

[6. Display]
The display unit 16 draws the selected video files 15F, 15R, and 15L on top of one another and, as shown at 24 in FIG. 2, finally displays them as video of a person with the face, right hand, and left hand all present.

FIG. 13 shows a display example of video created in this way. It is an example of a display device that shows a sentence and, at the same time, shows the meaning of that sentence translated into sign language.

This display device includes a display unit 1201, which displays information such as train operation status, and a video display device 1202 configured as described above. The display unit 1201 automatically displays the current train operation status and similar information. The video display device 1202 translates the content of the display unit 1201 into sign language by the processing described above and displays the resulting sign language video at the same time as the text on the display unit 1201.

[Effect]
The video of each body part's motion is held as an independent video file, video synthesis is performed on each file separately, and the results are superimposed only at the end for display. This eliminates the distortion that overlapping body parts cause when synthesis is performed in the usual configuration, and allows more natural-looking synthesized video.

In addition, the hand-region video and the face-region video can be used even if they come from different people, which widens the range of usable video.

[Second Embodiment]
Next, a second embodiment of the present invention will be described. In this embodiment, a video dividing unit 101 is added to the video display device system. With the video dividing unit 101, the system can analyze and segment a video file of a person signing a sentence, divide it into motion video files that each show only one body region such as the face, right hand, or left hand, cut out the portion corresponding to each word and store it in the video file group, and add the metadata needed for searching to the word dictionary.

[Configuration]
The second embodiment differs from the first embodiment in that the video dividing unit 101 is newly added. In FIG. 11, components that are the same as or correspond to those in FIG. 1 are therefore given the same reference numerals, and the description of the functions of components already described in the first embodiment is omitted.

The video dividing unit 101 is a functional unit that analyzes and segments a video file of a person signing a sentence and, using image processing techniques such as image recognition and region segmentation, divides it into independent video files that each show only one body region, such as the person's face, right hand, or left hand. It stores these video files in the video file group 18 and adds the metadata needed for searching to the word dictionary 14. The video dividing unit 101 has the processing functions shown in FIG. 12.

The video dividing unit 101 constitutes the following means: a video dividing means that recognizes and divides the image information of a plurality of processing frames, obtained by decomposing the motion video of a specific person, into video frames for each body part and accumulates them as a motion video for each body part; a connection reference information means that generates connection reference information for the motion video of each body part; an information recognition means that recognizes, based on an information dictionary prepared in advance, one or more pieces of information represented by the input motion video; and a registration means that divides the motion video for each piece of information recognized by the information recognition means from the body part information and registers the divided motion video and the connection reference information corresponding to the recognized information in the motion video storage means and the connection reference information storage means. Specifically, these are realized by the processing functions described below.

[Video creation method]
Next, a video creation method using the video display device configured as above will be described. Apart from the video dividing unit 101, the method is the same as in the first embodiment, so only the operation of the video dividing unit 101 is described here.

In FIG. 12, a video file 1101 is a video file of a person expressing a sentence in sign language. The following describes how the video file 1101 is divided into the body regions of the face, right hand, and left hand, cut up word by word, and added as records to the video file group 18 and the word dictionary.

First, the video dividing unit 101 takes in the video file 1101 to be added and breaks the video frames that make up the video file 1101 into a frame group 1103 (s1102). The frame group 1103 is a collection of still images.

Once the video file 1101 has been broken into the frame group 1103, the video dividing unit 101 takes out one frame (s1104) as a processing frame 1105.

The processing frame 1105 is then divided into body regions (s1106).

Using existing methods, such as face region extraction by pattern matching with a wavelet transform and arm region extraction by a combination of color space conversion and optical flow extraction, region frames 1107, 1108, and 1109, each showing exactly one body part, are created from the single processing frame.

Next, an existing algorithm such as centroid detection is applied to the face region frame 1107, the right-hand region frame 1108, and the left-hand region frame 1109, each of which contains only its own body region (s1110), and the position of each region is added as position data 1111.
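
Centroid detection on a region frame can be sketched as the mean coordinate of the pixels that belong to the body part. The binary-mask representation and the function name `centroid` are assumptions for illustration; the description only names centroid detection as one existing algorithm that may be used in s1110.

```python
def centroid(mask):
    """Return the (x, y) centroid of the truthy pixels in a 2-D mask, or None."""
    total = sum_x = sum_y = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                total += 1
                sum_x += x
                sum_y += y
    if total == 0:
        return None                     # the body part does not appear in this frame
    return (sum_x / total, sum_y / total)


hand_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
print(centroid(hand_mask))   # prints: (1.5, 1.5)
```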

For the face region frame 1107, a plurality of feature points are detected within the face region, a graph is built over the feature points, and the gaze direction is recognized by classifying this feature graph (s1112).

The mouth shape is also recognized and classified, for example by pattern matching in the region around the mouth (s1114), and each result is added to time-series data 1117 (s1116).

Image recognition and data addition are performed in the same way for the right-hand and left-hand region frames (1108, 1109).

Using the position data 1111, the orientation of the hand is recognized from the positional relationship between the person's hand and elbow in the frames 1108 and 1109, and the recognized orientation is classified into a symbol (s1113). Next, pixel information for a fixed area around the recognized hand position is obtained, the hand shape is recognized from that pixel information, and the recognized shape is classified into a symbol (s1115). Existing image recognition algorithms, such as neural networks or the computation of higher-order local autocorrelation features, can be applied to the pixels in the fixed area around the hand position to recognize the hand shape.

In this way, the video dividing unit 101 adds the gaze direction, mouth shape, hand position, hand orientation, and hand shape data to the time-series data 1117, together with information indicating the display time of the processing frame 1105 from which the face region frame 1107, the right-hand region frame 1108, and the left-hand region frame 1109 were derived (s1116).

The face region frame 1107, the right-hand region frame 1108, and the left-hand region frame 1109 are then appended to a face region file 1123, a right-hand region file 1124, and a left-hand region file 1125, respectively (s1118).

The face region file 1123, the right-hand region file 1124, and the left-hand region file 1125 are storage areas that each hold frames with their time information and accumulate them as ordinary video files.

Table 2 shows an example of the structure of the time-series data. As shown in Table 2, the time-series data consists of metadata (gaze direction, mouth shape, hand position, hand orientation, and hand shape) recorded at fixed time intervals, with time expressed in milliseconds.
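
One way to picture a row of the time-series data is as a small record type. The field names, the split into separate right-hand and left-hand fields, and the symbol values below are hypothetical; only the kinds of metadata and the millisecond time unit come from the description.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TimeSeriesRecord:
    time_ms: int                      # display time of the frame, in milliseconds
    gaze: str                         # symbol for the gaze direction
    mouth: str                        # symbol for the mouth shape
    right_pos: Tuple[float, float]    # right-hand position
    left_pos: Tuple[float, float]     # left-hand position
    right_dir: str                    # symbol for the right-hand orientation
    left_dir: str                     # symbol for the left-hand orientation
    right_shape: str                  # symbol for the right-hand shape
    left_shape: str                   # symbol for the left-hand shape


# Invented sample values, not taken from Table 2.
record = TimeSeriesRecord(
    time_ms=0, gaze="G1", mouth="M1",
    right_pos=(50.0, 50.0), left_pos=(100.0, 90.0),
    right_dir="D1", left_dir="D2",
    right_shape="S1", left_shape="S2",
)
print(record.time_ms, record.right_pos)
```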

The processing from s1104 to s1118 is repeated until every frame in the frame group 1103 has been processed.

At this point, the various metadata are stored in the time-series data 1117, and the face region file 1123, the right-hand region file 1124, and the left-hand region file 1125 hold video of only the face, only the right hand, and only the left hand, respectively, each as a video file.

When all frames have been processed, the video dividing unit 101 performs sign language recognition on the time-series data using an existing recognition method, such as an HMM that takes as input the hand position, the motion velocity vector obtained as its time derivative, and the hand orientation, and divides the time-series data into portions corresponding to sign language words (s1119).

The video dividing unit 101 then labels each portion of the time-series data with the name of the recognized sign language word (s1120). It reads the face region file, the right-hand region file, and the left-hand region file and, using the time information of the time-series data, divides each of them label by label (s1121).
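
The division of a region file by labelled time ranges (s1121) can be sketched as below. The tuple layouts, the half-open interval convention, and the function name `split_by_labels` are assumptions for illustration, and the sample times and labels are invented.

```python
def split_by_labels(frames, segments):
    """Split (time_ms, frame) pairs into one frame list per labelled segment.

    `segments` is a list of (label, start_ms, end_ms) with half-open ranges
    [start_ms, end_ms). The result keeps the segment order and allows the
    same label to occur more than once.
    """
    return [
        (label, [frame for t, frame in frames if start <= t < end])
        for label, start, end in segments
    ]


right_hand_file = [(0, "r0"), (33, "r1"), (66, "r2"), (100, "r3")]
segments = [("tomorrow", 0, 66), ("meet", 66, 133)]
print(split_by_labels(right_hand_file, segments))
# prints: [('tomorrow', ['r0', 'r1']), ('meet', ['r2', 'r3'])]
```

The same segment list would be applied to the face and left-hand region files so that all three are cut at identical times.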

Finally, the video dividing unit 101 names the divided face region, right-hand region, and left-hand region files, adds them as records to the face video file group 18F, the right-hand video file group 18R, and the left-hand video file group 18L of the video file group 18, respectively (s1122), and adds the metadata to the word dictionary 14.

[Effect]
In addition to the effects of the first embodiment, this embodiment provides the following effect.

Previously, building the system required special measures at shooting time, such as filming the person's motion several times with a different region masked each time. This embodiment removes that need: because the video dividing unit 101 is added to the system, the operator no longer has to divide the video manually and can therefore add words and video files to the system easily.

[Modifications]
In the above embodiments, a keyboard or the like is used as the input unit, but any input device may be used, such as voice input, a GUI-based interface, buttons, or a barcode reader.

In the above embodiments, the video file group 18 and the word dictionary 14 are described as separate components, but the present invention is not limited to this, and other configurations may be used as long as they provide the same functions. For example, the metadata may be embedded in the headers of the video files, and the headers of a specific group of video files may be searched as the word dictionary whenever needed.

In the above embodiments, the metadata of a video file consists of gaze direction, mouth shape, hand position, hand orientation, and hand shape in order to express sign language, but these are only examples and the present invention is not limited to them. Other data, such as facial expression, may also be used.

In the first embodiment, the Euclidean distance is used as the evaluation criterion for selecting a video file, but this is only an example, and another value that gives a similar effect may be used.

In the first embodiment, cross-dissolve and warping are given as examples of morphing methods. They are not mutually exclusive: either one may be used alone, or both may be combined. Other synthesis methods may also be used.

The embodiments describe a video display device that displays sign language video, but the invention is not limited to sign language. Any device that connects multiple videos of meaningful human motions, which can be associated with words, sentences, or symbols, and displays them as continuous human motion video can use it, for example as a system that displays motions expressing a specific meaning such as flag semaphore, pantomime, or dance.

In the embodiments, the dictionary search is performed in Japanese, but by changing the dictionary data, a sign language display system for other languages is also possible.

In the embodiments, each dictionary is shown in table form, but a different data structure such as a tree may be used.

The hardware configuration of elements of the present invention such as the storage means is not restricted. For example, they may all reside on a single PC, or some or all of the elements, such as a database, may be provided by a server on a network.

The above embodiments describe a video display device that includes the display unit 16, but a video creation device without the display unit 16 provides the same operations and effects.

FIG. 1 is a system configuration diagram of a video display device according to the first embodiment of the present invention.
FIG. 2 is a schematic diagram showing an example of the video files for each body part.
FIG. 3 is a schematic diagram showing the position of the hand among the body parts.
FIG. 4 is a schematic diagram showing the orientation of the hand on the frontal plane.
FIG. 5 is a schematic diagram showing the orientation of the hand on the sagittal plane.
FIG. 6 is a schematic diagram showing the orientation of the hand on the horizontal plane.
FIG. 7 is a schematic diagram showing the orientation of the palm.
FIG. 8 is a flowchart showing the process of creating the selected video file.
FIG. 9 is a flowchart showing the process of creating a fingerspelling video file.
FIG. 10 is a flowchart showing the processing in the synthesis unit and a block diagram showing the intermediate data generated by the synthesis unit.
FIG. 11 is a system configuration diagram of a video display device according to the second embodiment of the present invention.
FIG. 12 is a flowchart showing the processing in the video dividing unit.
FIG. 13 is a front view showing an example of use of the video creation device.
FIG. 14 is a schematic diagram showing the correspondence between gaze directions and symbols.
FIG. 15 is a schematic diagram showing the correspondence between mouth shapes and symbols.
FIG. 16 is a schematic diagram showing the correspondence between hand shapes and symbols.

Explanation of symbols

10: input unit, 11: morphological analysis unit, 12: translation unit, 13: data selection unit, 14: word dictionary, 15: selected video file, 15F: face selected video file, 15L: left-hand selected video file, 15R: right-hand selected video file, 16: display unit, 17: synthesis unit, 18: video file group, 18F: face video file group, 18R: right-hand video file group, 18L: left-hand video file group, 101: video dividing unit.

Claims

1. A video creation device that creates a continuous motion video by connecting a plurality of motion videos, each representing a body motion corresponding to specific information, the device comprising:
motion video storage means for individually storing motion videos, each containing the motion of one body part;
connection reference information storage means for storing, for each motion video, the information it represents in association with readout information for that motion video and one or more pieces of connection reference information describing what is shown in that motion video;
motion video acquisition means for retrieving, based on a plurality of pieces of information input in temporal order, the corresponding motion video readout information from the connection reference information storage means, and for extracting the motion videos from the motion video storage means based on that readout information;
video creation means for connecting the motion videos of each body part extracted by the motion video acquisition means, and superimposing the connected motion videos of the body parts to create a motion video of the body;
video dividing means for recognizing the body parts using the image information of a plurality of processing frames obtained by decomposing a motion video of a specific person, dividing the frames into body region frames for each body part, and storing them together with time information as a body region file for each body part;
connection reference information means for recognizing features of the motion video of each body part and generating connection reference information;
information recognition means for recognizing one or more pieces of information represented by the motion video based on an information dictionary prepared in advance; and
registration means for separating, from the body part information, the motion video for each piece of information recognized by the information recognition means, and registering the separated motion video and the connection reference information corresponding to the recognized information in the motion video storage means and the connection reference information storage means, respectively,
wherein the video dividing means divides time-series data composed of the connection reference information and the time information into portions corresponding to words, and uses the time information of the divided time-series data to divide the body region file of each body part word by word, and
wherein the registration means registers the body region files divided word by word in the motion video storage means, and registers the connection reference information divided word by word in the connection reference information storage means.
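
As an illustration only, the acquisition and video creation means recited above can be pictured roughly as follows. This is a minimal sketch and not the claimed implementation: the store layouts, the names `REFERENCE_STORE`, `VIDEO_STORE`, `acquire`, `connect`, and `create_video`, and the use of strings as stand-ins for frames are all assumptions made for the example. It also assumes that the clips for one word have the same number of frames for every body part.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Clip:
    """One stored motion video for a single body part (hypothetical structure)."""
    part: str
    frames: Tuple[str, ...]  # strings stand in for decoded image frames


# Hypothetical connection reference store: word -> body part -> readout key.
REFERENCE_STORE: Dict[str, Dict[str, str]] = {
    "hello": {"face": "f01", "right_hand": "r01", "left_hand": "l01"},
    "thanks": {"face": "f02", "right_hand": "r02", "left_hand": "l02"},
}

# Hypothetical motion video store: readout key -> clip.
VIDEO_STORE: Dict[str, Clip] = {
    "f01": Clip("face", ("f01a", "f01b")),
    "r01": Clip("right_hand", ("r01a", "r01b")),
    "l01": Clip("left_hand", ("l01a", "l01b")),
    "f02": Clip("face", ("f02a",)),
    "r02": Clip("right_hand", ("r02a",)),
    "l02": Clip("left_hand", ("l02a",)),
}


def acquire(words: List[str]) -> List[Dict[str, Clip]]:
    """Look up the readout keys for each word, in input order, and fetch the clips."""
    return [
        {part: VIDEO_STORE[key] for part, key in REFERENCE_STORE[word].items()}
        for word in words
    ]


def connect(per_word: List[Dict[str, Clip]], part: str) -> List[str]:
    """Concatenate the clips of one body part in word order."""
    frames: List[str] = []
    for clips in per_word:
        frames.extend(clips[part].frames)
    return frames


def create_video(words: List[str],
                 parts: Tuple[str, ...] = ("face", "right_hand", "left_hand")
                 ) -> List[Tuple[str, ...]]:
    """Connect each body part's clips, then layer the parts frame by frame."""
    per_word = acquire(words)
    tracks = [connect(per_word, part) for part in parts]
    # Superimposition is modeled as grouping the frames that share a time index.
    return [tuple(layers) for layers in zip(*tracks)]


video = create_video(["hello", "thanks"])
```

In this sketch a word missing from the reference store raises `KeyError`; how the device handles unregistered words is not covered here.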

2. The video creation device according to claim 1, wherein the connection reference information is metadata necessary for synthesizing and connecting the motion videos.

3. The video creation device according to claim 1 or 2, further comprising synthesis means for generating, based on the plurality of motion videos acquired by the motion video acquisition means, an intermediate motion video for the motion videos of each body part, and delivering it to the video creation means.

4. The video creation device according to any one of claims 1 to 3, wherein the motion video is a sign language video expressing the meaning of a word, and
the connection reference information is metadata including the hand position, hand orientation, and hand shape of the person performing the sign language.
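
For illustration, the metadata named in this claim (hand position, hand orientation, hand shape) could be held in a record like the one below. This is a sketch under stated assumptions: the field types, the symbol values such as "up" or "flat", the class names `HandPose` and `ConnectionReference`, and the `poses_match` helper with its tolerance are all hypothetical and are not taken from the patent. The helper only shows one way such metadata might be compared at the boundary between two consecutive sign videos.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HandPose:
    """Hand metadata at one instant (all encodings are assumptions)."""
    position: Tuple[float, float]  # hypothetical normalized (x, y) in the frame
    orientation: str               # hypothetical symbol, e.g. "up", "left"
    shape: str                     # hypothetical handshape symbol, e.g. "flat"


@dataclass(frozen=True)
class ConnectionReference:
    """Connection reference information for one word's hand video (hypothetical)."""
    word: str
    readout_key: str
    start: HandPose  # pose in the first frame of the clip
    end: HandPose    # pose in the last frame of the clip


def poses_match(a: HandPose, b: HandPose, tolerance: float = 0.05) -> bool:
    """Return True when two poses are close enough to be joined directly."""
    dx = a.position[0] - b.position[0]
    dy = a.position[1] - b.position[1]
    close = (dx * dx + dy * dy) ** 0.5 <= tolerance
    return close and a.orientation == b.orientation and a.shape == b.shape


hello = ConnectionReference(
    "hello", "r01",
    start=HandPose((0.5, 0.5), "up", "flat"),
    end=HandPose((0.8, 0.5), "up", "flat"),
)
thanks = ConnectionReference(
    "thanks", "r02",
    start=HandPose((0.5, 0.5), "up", "flat"),
    end=HandPose((0.52, 0.5), "up", "flat"),
)

# A mismatch at the boundary suggests that an intermediate video would be needed.
needs_transition = not poses_match(hello.end, thanks.start)
```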

5. The video creation device according to claim 1, wherein the video creation means includes a display unit that displays the created video.

6. A video creation method, performed in a video creation device, for creating a continuous motion video by connecting a plurality of motion videos, each representing a body motion corresponding to specific information, the method comprising:
a motion video storage step of individually storing, in motion video storage means, motion videos each containing the motion of one body part;
a connection reference information storage step of storing, in connection reference information storage means, for each motion video, the information it represents in association with readout information for that motion video and one or more pieces of connection reference information describing what is shown in that motion video;
a motion video acquisition step in which motion video acquisition means retrieves, based on a plurality of pieces of information input in temporal order, the corresponding motion video readout information from the connection reference information storage means, and extracts the motion videos from the motion video storage means based on that readout information;
a video creation step in which video creation means connects the plurality of motion videos of each body part extracted in the motion video acquisition step, and superimposes the connected motion videos of the body parts to create a motion video of the body;
a video dividing step in which video dividing means recognizes the body parts using the image information of a plurality of processing frames obtained by decomposing a motion video of a specific person, divides the frames into body region frames for each body part, and stores them together with time information as a body region file for each body part;
a connection reference information step in which connection reference information means recognizes features of the motion video of each body part and generates connection reference information;
an information recognition step in which information recognition means recognizes one or more pieces of information represented by the motion video based on an information dictionary prepared in advance; and
a registration step in which registration means separates, from the body part information, the motion video for each piece of information recognized in the information recognition step, and registers the separated motion video and the connection reference information corresponding to the recognized information for use in the motion video storage step and the connection reference information storage step,
wherein, in the video dividing step, the video dividing means divides time-series data composed of the connection reference information and the time information into portions corresponding to words, and uses the time information of the divided time-series data to divide the body region file of each body part word by word, and
wherein, in the registration step, the registration means registers the body region files divided word by word in the motion video storage means, and registers the connection reference information divided word by word in the connection reference information storage means.
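
The word-by-word division recited in the last two clauses can be sketched as follows. This is a hedged illustration, not the patented procedure: it assumes the word boundaries have already been found in the time-series data and are available as (word, start time, end time) spans, and it treats a body region file as a list of (time, frame) pairs. The names `split_by_words`, `body_region_files`, and `word_spans` are hypothetical.

```python
from typing import Dict, List, Tuple

Frame = Tuple[float, str]        # (time in seconds, stand-in for a body region frame)
Span = Tuple[str, float, float]  # (word, start time, end time), end exclusive


def split_by_words(body_region_files: Dict[str, List[Frame]],
                   word_spans: List[Span]
                   ) -> List[Tuple[str, Dict[str, List[Frame]]]]:
    """Cut every body part's file at the same word boundaries using time information."""
    segments = []
    for word, start, end in word_spans:
        per_part = {
            part: [frame for frame in frames if start <= frame[0] < end]
            for part, frames in body_region_files.items()
        }
        segments.append((word, per_part))
    return segments


# Hypothetical body region files for two body parts, sampled every 0.5 s.
files = {
    "right_hand": [(0.0, "r0"), (0.5, "r1"), (1.0, "r2"), (1.5, "r3")],
    "face": [(0.0, "f0"), (0.5, "f1"), (1.0, "f2"), (1.5, "f3")],
}
spans = [("hello", 0.0, 1.0), ("thanks", 1.0, 2.0)]

segments = split_by_words(files, spans)
```

Each entry of `segments` pairs a word with the per-part frame lists that fall inside its time span, which is the unit the registration step would then store. A list is returned rather than a dictionary so that a word occurring twice in the recording yields two separate entries.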

7. The video creation method according to claim 6, wherein the connection reference information is metadata necessary for synthesizing and connecting the motion videos.

8. The video creation method according to claim 6, further comprising a synthesis step of creating, based on the plurality of motion videos acquired in the motion video acquisition step, an intermediate motion video for each body part that is inserted between the motion videos of that body part when they are connected in the video creation step.

9. The video creation method according to claim 6, wherein the motion video is a sign language video expressing the meaning of a word, and
the connection reference information is metadata including the hand position, hand orientation, and hand shape of the person performing the sign language.