JP4446124B2

JP4446124B2 - How to create video content

Info

Publication number: JP4446124B2
Application number: JP2005337185A
Authority: JP
Inventors: 教彰桑原; 和宏桑原; 伸治安部; 清安田
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 2005-11-22
Filing date: 2005-11-22
Publication date: 2010-04-07
Anticipated expiration: 2025-11-22
Also published as: JP2007143029A

Description

The present invention relates to a video content creation method, and more particularly to a method for creating video content, such as a memory video, that presents past photographs of a person with dementia to that person (the viewer) together with narration.

Several studies have aimed to stimulate elderly people with dementia by presenting audiovisual stimuli. For example, Video Respite, described in Non-Patent Document 1, presents a character who talks to the person with dementia viewing the video. Project CIRCA, described in Non-Patent Document 2, uses well-designed multimedia content that includes famous old songs, images, and video footage. With a view to stimulating the long-term memory of people with dementia, we selected the memory video as the audiovisual stimulus. A memory video is a slideshow video created from photographs in the old albums of a person with dementia. Its clinical effectiveness has been shown experimentally in Non-Patent Document 3.

However, creating a memory video is not a simple task. First, old albums are gathered and suitable photographs are selected from them. The photographs used must evoke the viewer's memories of the distant past. Next, the photographs are filmed movie-style with a video camera, applying pan and zoom effects (the so-called Ken Burns effect described in Non-Patent Document 4) as needed. Narration is also added to draw the person with dementia further into the video.

This work is usually done by volunteers with video editing skills; it is not something family caregivers can easily do themselves. In addition, a person with dementia may tire of a memory video after viewing it repeatedly, so that the video gradually loses its initial ability to hold the viewer's attention.

Non-Patent Document 1: Lund, D.A., Hill, R.D., Caserta, M.S., and Wright, S.D.: Video Respite: an innovative resource for family, professional caregivers, and persons with dementia, The Gerontologist, Vol. 35, Issue 5 (1995) 683-687.
Non-Patent Document 2: Gowans, G., Campbell, J., Alm, N., Dye, R., Astell, A., and Ellis, M.: Designing a multimedia conversation aid for reminiscence therapy in dementia care environments, Extended abstracts of the 2004 conference on Human Factors and Computing Systems (2004) 825-836.
Non-Patent Document 3: Yasuda et al.: Creation of memory photo videos for persons with dementia and evaluation of their concentration, 28th Annual Meeting of the Society for Higher Brain Dysfunction (2004).
Non-Patent Document 4: http://en.wikipedia.org/wiki/Ken_Burns

To create an engaging memory video, it is necessary to clarify which video and audio effects should be added. The inventors therefore conducted the following experiment to evaluate the effectiveness of three typical effects: the Ken Burns effect mentioned above, BGM, and narration. For each of three people with dementia, a memory video using all of these effects and videos omitting one of the effects were prepared. Each video was approximately 20 minutes long. During the two-week experimental period, the family caregivers were asked to have the person with dementia watch a video once a day, varying the video and audio effects so that all of the above video types were viewed. If the person grew tired of a video, the caregivers were asked to switch to another type. The caregivers recorded how long the person watched each type of video and rated, subjectively on a scale of 1 to 5, how attentively the person watched it. After the experimental period, interviews were also conducted to investigate the needs of the family caregivers.

As a result, the following findings were obtained.

(1) Narration is highly effective in making a memory video engaging.

(2) Zooming in on the faces of familiar people in a photograph is also essential.

(3) Furthermore, it is important that narration and zoom are linked. That is, if a narration concerns a specific person in the photograph, it must be played while that person is being zoomed in on.

Thus, memory videos have been confirmed to be effective in fostering a stable mental state in people with dementia. Adding video effects and narration is a particularly important factor, and these must be synchronized as described above.

Therefore, the main object of the present invention is to provide a novel video content creation method.

Another object of the present invention is to provide a video content creation method capable of creating video content in which elements such as video effect elements and narration elements are synchronized.

The invention of claim 1 is a method of creating video content that uses a fade-out/fade-in visual effect for photo transitions and places narration and reaction time within the playback period of each photo, wherein a photo is excluded when the maximum video playback time is less than the sum of the minimum fade-out/fade-in time, the narration time, and the reaction time, and a photo is added when the minimum video playback time is greater than the sum of the maximum fade-out/fade-in time, the narration time, and the reaction time.

In the invention of claim 1, photographic image data is input to a computer (12) from a photo data input device (20; reference numerals indicate the corresponding parts or elements in the embodiment, and the same applies hereinafter) that includes, for example, an image scanner. The computer attaches meta information to the photographic image data using, for example, Dublin Core, Image Regions, and FOAF. The computer likewise annotates the narration and stores it in, for example, a storage means (22). Video content is created from the photographic image data stored in this storage means, and the computer (12) sets local constraints on the video effect element, narration element, and reaction time element assigned to each photo, for example on a timeline. The time adjusting means in the rendering process (S5), which is for example the computer, calculates the sum of the required times of the video effect element, narration element, and reaction time element for each photo and adjusts it to match, for example, the desired playback time of the video content.

In the invention of claim 1, an engaging slideshow video (video content) can be produced easily by taking into account the semantic synchronization constraints between video effect elements, such as the Ken Burns effect on annotated regions of a photo, and audio effects, in particular the narration element. Moreover, because the viewer's reaction time is reliably secured, problems such as the next narration overlapping the viewer's reaction do not occur.

The invention of claim 2 is a method of creating video content that uses a fade-out/fade-in visual effect for photo transitions, uses a pan-zoom visual effect for person-to-person transitions within the same photo, and places narration and reaction time within the playback period of each photo, wherein a photo is excluded when the maximum video playback time is less than the sum of the minimum fade-out/fade-in time, the narration time, the reaction time, and the minimum pan-zoom time, and a photo is added when the minimum video playback time is greater than the sum of the maximum fade-out/fade-in time, the narration time, the reaction time, and the maximum pan-zoom time.
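
The photo add/exclude rule stated in claims 1 and 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Photo`, `duration_bounds`, and `adjust_decision` are hypothetical, each photo is assumed to carry exactly one fade-out/fade-in transition, and times are in seconds. Leaving `pan_zooms` at 0 gives the claim 1 case.

```python
from dataclasses import dataclass


@dataclass
class Photo:
    narration: float    # total narration time for this photo
    reaction: float     # total reaction time reserved for the viewer
    pan_zooms: int = 0  # person-to-person pan-zoom transitions (0 for claim 1)


def duration_bounds(photos, fade, pan_zoom=(0.0, 0.0)):
    """Return (shortest, longest) total duration.

    fade and pan_zoom are (min, max) durations of a single effect.
    """
    fixed = sum(p.narration + p.reaction for p in photos)
    moves = sum(p.pan_zooms for p in photos)
    shortest = fixed + len(photos) * fade[0] + moves * pan_zoom[0]
    longest = fixed + len(photos) * fade[1] + moves * pan_zoom[1]
    return shortest, longest


def adjust_decision(photos, fade, video, pan_zoom=(0.0, 0.0)):
    """video is the (min, max) allowed playback time of the whole video."""
    shortest, longest = duration_bounds(photos, fade, pan_zoom)
    if video[1] < shortest:
        return "exclude"  # even the fastest effects overrun the maximum
    if video[0] > longest:
        return "add"      # even the slowest effects fall short of the minimum
    return "keep"
```

When the result is "keep", the effect durations can be chosen anywhere between their minimum and maximum so that the total lands inside the allowed playback window.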

According to the present invention, engaging video content can be produced easily by taking into account the semantic synchronization constraints between video effect elements, such as the Ken Burns effect on annotated regions of a photo, and the narration and reaction time elements.

The above object, other objects, features, and advantages of the present invention will become more apparent from the following detailed description of embodiments given with reference to the drawings.

Referring to FIG. 1, the video content creation apparatus 10 of this embodiment includes a computer 12. A personal computer or a workstation can be used as the computer 12.

The computer 12 is provided with an internal memory 14 such as a hard disk or RAM. Installed in the internal memory 14 as tools for inputting meta information are, for example, Dublin Core (http://dublincore.org), Image Regions (http://www.w3.org), FOAF (http://www.foaf-project.org), and Jena2 (2.1) (http://jena.sourceforge.net). All of these function as means for inputting, registering, or attaching meta information related to photographic image data.

Here, meta information means structured information about data; in this embodiment, it is information that structurally describes the details of the photographic image data being handled. Attaching such meta information is called annotation.

Dublin Core is well known as a representative tool for handling meta information. In addition, Image Regions is used to specify multiple areas of a photo containing people's faces or objects (hereinafter called "regions") and to hold them as meta information. When a region is a person, FOAF is used to attach meta information about that person. Regions are used to add effects to a still photo, such as zooming into a region or panning between regions. Jena2 (2.1) is used to store the meta information in the database in RDF format.

Although not shown, the computer 12 incorporates a graphics board or processor and a sound board or processor, through which GUI screens and other graphics (video) are displayed on the monitor 16 and audio such as BGM (background music) is output from the speaker 18.

A photo data input device 20 is also connected to the computer 12. The photo data input device 20 includes at least one of an image scanner, a digital camera, the Internet (Web), and the like. The image scanner scans past photographs of the person with dementia and inputs color or monochrome photographic image data. A digital camera can input photographic image data shot in real time, and can also be used to photograph old photos and input the resulting image data. The Internet can be used to input photographic image data of the person's past photos sent from remote locations and, where needed, to obtain photographic image data of events related to the person's past. Other types of photo data input device may also be used.

Further, a database 22 is coupled to the computer 12 via an interface 24. In this embodiment, the relational database PostgreSQL 7.4 (http://www.postgresql.org) is used as the database 22.

Although not shown, the computer 12 naturally has input means such as a keyboard and a mouse.

To create a memory video using the video content creation apparatus 10 of the embodiment of FIG. 1, the computer 12 and related components operate according to the procedure shown in FIG. 2.

First, in step S1, photographic image data that may be used in the memory video is input, and annotations are attached to each photo. Specifically, the photo data input device 20 of FIG. 1 is used to input into the computer 12 mainly the photographic image data of past photos of the person with dementia.

At this time, the GUI (Graphical User Interface) screen 26 shown in FIG. 3 is displayed on the monitor 16. The GUI 26 includes a photo display/editing area 28 that occupies most of the left side of the monitor screen. The photo display/editing area 28 displays the photo of the input photographic image data and is also used for editing work such as specifying regions. The GUI 26 also includes a thumbnail display area 30 on the right side of the monitor screen. The thumbnail display area 30 shows thumbnail images so that the photos to be used can be selected from the photographic image data that has been input and retrieved as described later.

The GUI 26 also provides a first meta information input area 32 at the bottom of the monitor screen, a second meta information input area 34 near the center of the screen, and a playback order setting area 36 at the lower right. Meta information defined by Dublin Core that concerns the photo as a whole, as displayed in area 28, is entered in the meta information input area 32. The meta information input area 34 is a pop-up area that appears at input time and is used to enter, via FOAF, meta information for a region that is a person. The playback order setting area 36 is used to set the order in which the photos included in the memory video are played; as described later, the order can be changed by drag and drop.

Photo input and meta information registration in step S1 will now be described concretely. When photographic image data of a photo such as that shown in FIG. 4 (drawn as a line drawing in the figure but actually a photograph; the same applies to the other figures) is input from the photo data input device 20, the computer 12 displays the photo (still image) represented by that image data in the photo display/editing area 28 of the GUI 26 on the monitor 16. At the same time, the computer 12 displays a thumbnail of the photo in the thumbnail display area 30.

When two regions, region 1 and region 2, are then set in the photo using a mouse or the like (not shown), rectangular frames 29a and 29b identifying region 1 and region 2 respectively are displayed in the photo display/editing area 28, as shown in FIG. 5. Once regions 1 and 2 are specified, the meta information of each region shown in FIG. 6 is set in the format defined by Image Regions. That is, the coordinates (x11, y11) of the origin of region 1 (the upper-left corner of the rectangular frame) and the coordinates (x12, y12) of the diagonally opposite corner are registered, and the height h1 and width w1 of region 1 are also registered as meta information. Likewise, for region 2, the origin coordinates, diagonal coordinates, height, and width are registered as x21, y21, x22, y22, h2, and w2, respectively.
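
The region meta information described above (origin, diagonal corner, height, width) can be modeled with a small data structure. The following is only a sketch: the class name `Region` and the helper `zoom_factor` are hypothetical, image coordinates are assumed to grow rightward and downward, and the zoom rule (fit the whole region inside the frame) is an assumption rather than something the text specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    x1: int  # origin: upper-left corner of the rectangular frame
    y1: int
    x2: int  # diagonally opposite corner
    y2: int

    @property
    def width(self) -> int:
        return self.x2 - self.x1

    @property
    def height(self) -> int:
        return self.y2 - self.y1

    @property
    def center(self) -> tuple:
        # A natural target point for a zoom or a pan between regions.
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)


def zoom_factor(region: Region, frame_w: int, frame_h: int) -> float:
    """Largest scale at which the whole region still fits in the frame."""
    return min(frame_w / region.width, frame_h / region.height)
```

Storing only the two corners and deriving height and width avoids the four values drifting out of sync; the text registers all of them explicitly.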

When photos are input and meta information is registered in step S1 in this way, a database such as that shown in FIG. 7 is obtained. On the right of FIG. 7 is the photo display/editing area 28 of FIG. 5, showing the actual photo with its regions specified. Data in the rectangles linked to ellipses labeled "dc:" is meta information registered with Dublin Core. For example, "dc:date" holds the date "20040716" (July 16, 2004), "dc:title" holds the title "at Disney Animal Kingdom", and "dc:description" holds the description "They are very happy.".

Meta information in ellipses labeled "imgReg:" is meta information created when regions are specified with Image Regions. "imgReg:has region" indicates that a region has been set, "imgReg:Rectangle" indicates that the region is rectangular, "imgReg:regionDepict" describes the region, and "imgReg:boundingBox" contains the origin position and size (height h, width w) of the region. "imgReg:coords" holds the coordinates of the origin and the diagonal corner of the region.

The meta information given by "foaf:gender" is the gender when the region is a person ("female" in the example), and that given by "foaf:name" is the name ("Haruka" in the example). The meta information given by "foaf:Person" indicates the relationship between the viewer (the person who watches this memory video) and the person in the photo; in the example, that person is the viewer's "grandchild".
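
The FIG. 7 annotations described above form a small graph, which can be pictured as subject-predicate-object triples. The sketch below uses plain Python tuples rather than an RDF library. The node ids (`photo1`, `region1`, `person1`), the spelling `imgReg:hasRegion`, and the predicate `ex:relationToViewer` are assumptions introduced for illustration; the literal values come from the example in the text.

```python
# Triples transcribed from the FIG. 7 description. Node ids and the
# "ex:relationToViewer" predicate are hypothetical placeholders.
TRIPLES = [
    ("photo1", "dc:date", "20040716"),
    ("photo1", "dc:title", "at Disney Animal Kingdom"),
    ("photo1", "dc:description", "They are very happy."),
    ("photo1", "imgReg:hasRegion", "region1"),
    ("region1", "rdf:type", "imgReg:Rectangle"),
    ("region1", "imgReg:regionDepict", "person1"),
    ("person1", "rdf:type", "foaf:Person"),
    ("person1", "foaf:name", "Haruka"),
    ("person1", "foaf:gender", "female"),
    ("person1", "ex:relationToViewer", "grandchild"),
]


def objects(subject, predicate):
    """Return every object o such that (subject, predicate, o) is stored."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def people_in(photo):
    """Names of the people depicted by the regions of a photo."""
    names = []
    for region in objects(photo, "imgReg:hasRegion"):
        for person in objects(region, "imgReg:regionDepict"):
            names.extend(objects(person, "foaf:name"))
    return names
```

Following the chain photo, region, person is what lets a later search ask for photos showing a particular person.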

The vocabularies of the meta information shown in FIG. 7 are declared as follows.

xmlns:imgReg="http://www.w3.org/2004/02/image-regions#"
xmlns:foaf="xmlns.com/foaf/0.1"
xmlns:dc="http://purl.org/dc/elements/1.1/"

When photo input and meta information registration have been carried out in step S1 of FIG. 2 in this way, the computer 12 stores the photo and the meta information attached to it in the database 22. The description above covers a single photo and its meta information, but each time photographic image data is input from the input device 20, meta information such as that shown in FIG. 7 is registered in the same way, and the photo data with its meta information is stored in the database 22.

In step S1, the narration is also annotated and stored in a database. As mentioned above, narration is important for a memory video, but adding narration is very time-consuming work. In this embodiment, therefore, to shorten the time needed to add narration, a narration database containing several thousand typical narration texts and their corresponding audio data was built and registered in the database 22 (FIG. 1).

As described in detail later, the typical narration texts were created using simple rules based on Japanese grammar: combinations of adverbs, adjectives, and certain nouns were fed into the rules to generate the texts. These words were also used to annotate the narrations, and the word combinations were chosen on the basis of co-occurrence probabilities. Audio data was then produced either with speech synthesis or by having a professional narrator read the texts aloud, compiled into a database, and likewise registered in the database 22.

To let the video producer easily select, from thousands of narration entries, the most appropriate narration for every person in every photo, this embodiment narrows down the narration data by associating the annotations of the narrations with those of the photos. The video producer can therefore set the narration data simply by choosing an appropriate entry from the narrowed-down candidates.
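
The narrowing step described above can be pictured as a filter that matches narration annotations against photo annotations. The text does not give the matching rule, so the sketch below assumes one: keep narrations that refer to the same kind of person and share at least one descriptive word with the photo. `Narration`, `narrow`, and the sample entries are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Narration:
    audio: str        # audio file name
    text: str         # narration text
    refers_to: str    # relation of the person the narration is about
    words: frozenset  # words used to build the narration text


def narrow(narrations, relation, photo_words):
    """Keep narrations about `relation` that share a word with the photo."""
    wanted = frozenset(photo_words)
    hits = [n for n in narrations
            if n.refers_to == relation and n.words & wanted]
    # Most shared words first; sorted() is stable, so ties keep input order.
    return sorted(hits, key=lambda n: len(n.words & wanted), reverse=True)


# Invented sample entries, for illustration only.
SAMPLES = [
    Narration("Narration1.wav", "Your grandchild looks very happy.",
              "grandchild", frozenset({"happy"})),
    Narration("Narration2.wav", "Your grandchild looks cute and happy.",
              "grandchild", frozenset({"cute", "happy"})),
    Narration("Narration3.wav", "Your son looks happy.",
              "son", frozenset({"happy"})),
]
```

The producer would then pick one entry from the short list that `narrow` returns instead of browsing the whole database.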

FIG. 8 shows an example of the annotations attached to a narration. A dedicated vocabulary was defined for narration annotations; in FIG. 8, "na" denotes the namespace of this vocabulary. The narration text of "Narration1.wav" is specified by na:text. "Narration1.wav" also points to several instances specified by na:keyword. The instance on the left of the figure indicates the person the narration text concerns; it points to an instance of foaf:Person through na:referTo. The instance on the right indicates how that person looks, and its dc:description gives the words used to compose the narration text.

Thereafter, in step S2 of FIG. 2, the photos to be used this time are retrieved from those input in step S1, and the set of photos to use is selected. The meta information described above is used for this search. Search conditions based on meta information include, for example, the following.

First, since FOAF attaches meta information identifying a person when a region is a person, this FOAF meta information can be used to search for "photos showing a specific person". Several people can be searched for at once, in which case only photos showing all of the named people are matched.

When the Dublin Core meta information is used, photos can be searched by shooting date. For example, "From (first specified date) to To (second specified date)" retrieves all photos taken on or after the first date and on or before the second. "From (specified date)" retrieves all photos taken on or after that date; likewise, "To (specified date)" retrieves all photos taken on or before that date. Photos "whose specific property contains a specific value" can also be searched for, for example photos whose "dc:title" contains "Disney". Multiple conditions can be selected at once, in which case only photos satisfying all of the set conditions are matched.
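
The search conditions above (all named people present, inclusive date range, title substring, all conditions combined with AND) can be sketched as a single filter. `PhotoRecord`, `search`, and the sample album are hypothetical; the embodiment stores the meta information as RDF in a database, whereas this sketch filters an in-memory list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhotoRecord:
    title: str         # dc:title
    date: str          # dc:date as YYYYMMDD, so string order equals date order
    people: frozenset  # foaf:name of each person region


def search(photos, people=(), date_from=None, date_to=None,
           title_contains=None):
    """Return the photos that satisfy every condition that was supplied."""
    wanted = frozenset(people)
    result = []
    for p in photos:
        if not wanted <= p.people:  # every named person must appear
            continue
        if date_from is not None and p.date < date_from:
            continue
        if date_to is not None and p.date > date_to:
            continue
        if title_contains is not None and title_contains not in p.title:
            continue
        result.append(p)
    return result


# Invented sample album, for illustration only.
ALBUM = [
    PhotoRecord("at Disney Animal Kingdom", "20040716", frozenset({"Haruka"})),
    PhotoRecord("New Year at home", "19990101", frozenset({"Haruka", "Taro"})),
    PhotoRecord("School trip", "19650510", frozenset({"Taro"})),
]
```

For example, `search(ALBUM, people=["Haruka", "Taro"])` keeps only the photo in which both people appear.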

When a set of photos to actually use is selected from the retrieved photos, the playback order of the photos in the set is specified. To set the playback order, the retrieved photos are displayed in the thumbnail display area 30 of the GUI 26 as shown, for example, in FIG. 9, and the playback order of the selected photos can be set by dragging and dropping photos from that list into the playback order setting area 36. The specific details are omitted here.

When the set of photos is selected in step S2, a priority is preferably assigned to each photo. This priority indicates how strongly the photo is wanted in the memory video. As described later, when the total length of the memory video is limited to, say, 20 or 30 minutes and the time required for video effects and narration means the number of photos must be reduced, photos are removed starting with the lowest priority.
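
The priority-based removal described above can be sketched as a loop that drops the lowest-priority photo until the total fits the time limit. `Slide` and `trim_to_limit` are hypothetical names, a larger number is assumed to mean a more wanted photo, and each photo's duration is assumed to already include its effects, narration, and reaction time.

```python
from typing import NamedTuple


class Slide(NamedTuple):
    name: str
    priority: int   # assumed convention: larger means more wanted
    seconds: float  # effects + narration + reaction time for this photo


def trim_to_limit(slides, max_seconds):
    """Remove lowest-priority slides until the total fits; keep play order."""
    kept = list(slides)
    while kept and sum(s.seconds for s in kept) > max_seconds:
        kept.remove(min(kept, key=lambda s: s.priority))
    return kept
```

Removing from a copy of the list, rather than sorting it, preserves the playback order the producer set.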

Next, in step S3 of FIG. 2, the BGM and narration to be played during photo playback are selected. This embodiment supports music files in, for example, MP3 format; songs to be used as BGM are registered as a playlist, and the BGM is played according to that playlist. No further description of the BGM is given here.

Next, in step S4 of FIG. 2, it is determined whether to preview. "YES" is determined in step S4 when photo selection and the other tasks are all complete; otherwise "NO" is determined and the process returns to step S2 or S3 to continue. Since each of steps S1 to S4 in FIG. 2 can be run any number of times, at any time and for any amount of work, by selecting it from a menu, the producer can do as much of the work as needed whenever time allows. In every case the results of the previous session are stored in the database 22, so each session first reads the earlier data from the database 22 and then continues from it or modifies it.

When "YES" is determined in step S4, the following step S5 renders a memory video, for example in RVML format, by applying the Ken Burns effect, BGM, and narration in accordance with a media synchronization method that takes semantics into account.

Note that RVML is a kind of XML designed to represent swf completely, and every version of a swf movie can be expressed as RVML. Apart from a file header that carries information such as the Flash version and the frame rate, a swf file is basically just a sequence of tags. For example, one frame corresponds to the pattern of defining a shape with a definition tag, placing that shape in the frame with an operation tag, and drawing the current frame on the screen with a display tag; this pattern is repeated.

The generated RVML can then be converted into a memory video in Flash movie format by using KineticFusion (http://www.kinesissoftware.com), an RVML-to-Flash tool.

Specifically, step S5 of FIG. 2 is executed according to the procedure shown in FIG. 10. In the first step S11, the computer 12 starts playing the tunes of the playlist (not shown) as BGM. Thereafter, the tunes are switched in sequence and played continuously in playlist order. That is, the computer 12 reads the music data registered in the playlist and has it processed by a sound board or processor, so that the tunes of the playlist are output as sound from the speaker 18 (FIG. 1).

In the next step S13, the computer 12 sets Sw as the width and Sh as the height of the photo display editing area 28 (FIG. 3) of the GUI 26 formed on the display screen of the monitor 16.

Then, in step S15, the computer 12 increments the photo number n (n = n + 1). In the next step S17, the n-th photo listed in the playback order setting area 36 (FIG. 9) is displayed in the display editing area 28 with a fade-in. That is, the computer 12 reads the n-th photo in playback order and its accompanying meta information (annotation) from the database 22 (FIG. 1) and displays the n-th photo. Since n = 1 at first, the first photo is faded in.

In the next step S18, the computer 12 determines whether narration has been assigned to that photo; if narration has already been assigned and registered, it is played back in step S19. The registered narration can therefore be checked in step S19.

If narration has not yet been assigned, then after a pause of a fixed time in step S20, the computer 12 determines in the next step S21 whether a region is specified for the n-th photo, for example by checking whether Image Regions meta information exists. If a region is specified, the computer 12 increments the region number m (m = m + 1) in the next step S23. Then, in step S25, it refers to the Image Regions meta information of the m-th region. This meta information includes the position data and size data of the m-th region. Accordingly, in the next step S27, the computer 12 uses that meta information together with the previously set height Sh and width Sw of the display area to align the m-th region with the center of the area 28.

As an example, if the region has width w1 and height h1, the magnification is set to min[Sw/w1, Sh/h1], and frame by frame the photo image is moved by dX along the horizontal axis and dY along the vertical axis relative to the display area 28, and enlarged by dZ, until the region sits exactly at the center of the display area 28.
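The magnification and the per-frame steps can be illustrated with a small calculation. The sketch below is not the embodiment's implementation: it assumes that the photo is initially drawn at scale 1 with its top-left corner at the origin of the display area, that the region is given as a hypothetical `(x, y, w1, h1)` bounding box, and that the motion is split linearly over a fixed number of frames.

```python
def zoom_plan(Sw, Sh, x, y, w1, h1, frames):
    """Return (scale, dX, dY, dZ) for centering a region in an Sw x Sh area.

    scale is the target magnification min(Sw/w1, Sh/h1); dX, dY and dZ are
    the per-frame translation and zoom increments (linear interpolation is
    an assumption of this sketch).
    """
    scale = min(Sw / w1, Sh / h1)
    # Center of the region in photo coordinates.
    cx, cy = x + w1 / 2, y + h1 / 2
    # After scaling about the origin, the region center lies at
    # (scale*cx, scale*cy); it must end up at the area center (Sw/2, Sh/2).
    total_dx = Sw / 2 - scale * cx
    total_dy = Sh / 2 - scale * cy
    dX = total_dx / frames
    dY = total_dy / frames
    dZ = (scale - 1.0) / frames
    return scale, dX, dY, dZ


scale, dX, dY, dZ = zoom_plan(Sw=640, Sh=480, x=100, y=50, w1=160, h1=120, frames=30)
print(scale, dX, dY, dZ)
```

Taking the minimum of the two ratios keeps the whole region visible: the region fills the area along whichever axis is tighter and leaves margin along the other.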

In step S27, however, the image of the region need not be aligned with the center of the display area; it may instead be aligned with another position such as the upper left or lower right.

Then, after a pause in step S29, the computer 12 determines in the next step S31 whether any regions remain. Because the number of regions is known from the Image Regions meta information, step S31 only needs to determine whether the value incremented in step S23 has become equal to that number.

If regions remain, in the next step S32 the computer 12 applies the video effects A-1 and A-2 indicated by the identifiers RT-1, RT-2, and RT-4 in Table 1.

Here, the video effects are described. The inventors interviewed several creators (producers) with video editing experience about what video effects (effects applied to each photo, and transitions between photos) could be added when converting photos into a memory video. The results were A-1 to A-3 and B-1 to B-5 below.
(A) Effects
A-1: The so-called Ken Burns effect, that is, zooming in on and panning across a rectangular area containing a person's face in the photo (hereinafter, a region).
A-2: Instead of panning, fade out the zoomed-in region and fade in the next region.
A-3: Display a color photo in monochrome first, then gradually shift to color.
(B) Transitions
B-1: Fade in the next photo while fading out the previous photo, so that the two overlap.
B-2: Slide in the next photo.
B-3: Dissolve the previous photo into the next photo.
B-4: Page-peel the previous photo (an effect like turning it up from the lower right corner) to reveal the next photo.
B-5: Rotate the previous photo about its central vertical axis to transition to the next photo.

Based on these results, templates for rendering a memory video were created, as shown in Table 1. The items listed under "Information used" in Table 1 were taken as the annotations to be attached to photos and were designed as an annotation ontology. The framework used for annotation is the Semantic Web, a next-generation Web technology intended to improve search performance and convenience, which relies on two techniques: meta information (additional information describing content) and ontology (definitions of the terms used to describe metadata). That is, the annotations are written in RDF (Resource Description Framework). This choice was made with Web compatibility in mind, anticipating future information exchange in which photos annotated by other people are used in memory videos for oneself or one's family.

Furthermore, the inventors adopted a policy of using existing vocabularies wherever possible. In the experiment for the embodiment of FIG. 1, shooting dates and events were described with the aforementioned Dublin Core, a standard vocabulary for bibliographic information. Information about the people in a photo was described with the aforementioned FOAF, a standard vocabulary for describing people. The aforementioned Image Regions vocabulary was used to describe the person regions in a photo, and the color tone of a photo can be obtained from the color space information of Exif (see http://it.jeita.or.jp/document/publica/standard/exif/english/jeida49e.htm). As for the relationship with a person (subject) in a photo, FOAF defines the knows property, but creating a memory video requires the relationship between the viewer and the subject (a related person) to be defined in more detail. RELATIONSHIP (see http://vocab.org/relationship/), which is defined as an extension of the FOAF knows property, was therefore used to describe parent-child relationships, relationships between relatives, and so on. A sample is shown in FIG. 7 above.

In addition, for the effects and transitions listed in Table 1, an ontology of the video effects used in creating a memory video was defined as shown in Table 2.

Rules for converting photo annotations into video effects are then written and used as templates. Table 3 shows an example representation of the rendering template RT-8 of Table 1. Templates are assumed to be prepared in the same way for the other identifiers shown in Table 1.

Here, the rendering templates themselves are also written as RDF statements, and it is assumed that they are retrieved using the aforementioned RDQL. Because the need for interoperability is low for the rendering ontology, a separate proprietary format may be defined instead.

Returning to step S32: specifically, from the region coordinate values (imgReg:coords) and region bounding values (imgReg:boundingBox) in meta information such as that of FIG. 7, the computer 12 calculates the spacing between regions, namely the spacing in the X (horizontal) direction and in the Y (vertical) direction. It then determines whether these inter-region spacings are at or above, or at or below, a predetermined threshold. When either the X-direction or the Y-direction spacing is at or below the threshold, the Ken Burns effect is used: the view zooms in on a region containing a person's face and then pans to the next region. Conversely, when either the X-direction or the Y-direction spacing is at or above the threshold, the zoomed-in region is faded out and the next region is faded in instead of panning.
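The choice between panning (A-1) and fading (A-2) can be sketched as a simple threshold test. The following is a hedged illustration only: regions are assumed to be `(x, y, w, h)` bounding boxes, the spacing is assumed to be the distance between region centers along each axis, and the threshold value is hypothetical. The two conditions as stated overlap when one spacing is small and the other large; this sketch gives precedence to the first condition, which is an assumption and not something the text specifies.

```python
def region_gaps(a, b):
    """Return the (X, Y) spacing between the centers of two regions.

    Each region is an (x, y, w, h) bounding box (assumed representation).
    """
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    gap_x = abs((ax + aw / 2) - (bx + bw / 2))
    gap_y = abs((ay + ah / 2) - (by + bh / 2))
    return gap_x, gap_y


def choose_region_effect(a, b, threshold):
    """Pick effect A-1 (zoom then pan) or A-2 (fade out, fade in)."""
    gap_x, gap_y = region_gaps(a, b)
    if gap_x <= threshold or gap_y <= threshold:
        return "A-1"  # regions are close along some axis: pan between them
    return "A-2"      # regions are far apart: fade instead of a long pan


near = choose_region_effect((100, 100, 80, 80), (220, 110, 80, 80), threshold=50)
far = choose_region_effect((100, 100, 80, 80), (500, 400, 80, 80), threshold=50)
print(near, far)
```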

Also, when the subject information indicated by foaf:person in FIG. 7 shows a subject who is especially memorable to the viewer, the view zooms in only on that subject's region, and then pans or fades out and in according to the inter-region spacing.

In step S32, the conversion templates exemplified in Table 3 are used to apply the video effects A-1 and A-2 of identifiers RT-1, RT-2, and RT-4 automatically.

The process then returns to step S23, and steps S23 to S32 are repeated.

If "NO" in step S31, that is, if all regions of the n-th photo in the photo list field 38 have been processed, the computer 12 fades the n-th photo out of the photo display editing area 28 in the next step S33.

Next, in step S35, the computer 12 determines whether any photos remain to be processed. Because the number of photos in the photo list shown in FIG. 9 is known in advance, step S35 only needs to determine whether the value incremented in step S15 has become equal to that number.

If photos still remain, in the next step S36 the computer 12 applies the video effects A-3, B-1, and B-5 indicated by the identifiers RT-3, RT-5, RT-6, RT-7, and RT-8 in Table 1.

Specifically, the computer 12 obtains the color tone meta information of the photos (not shown in FIG. 7) from the aforementioned Exif color space information, and determines whether it indicates a color change, such as the previous photo being monochrome and the current photo being color. If there is such a color change, the computer 12 applies the video effect of first displaying the current color photo in monochrome and then gradually shifting it to color.

The computer 12 also obtains the shooting date (date) from the meta information shown in FIG. 7 and calculates the difference in era between the previous photo and the current photo. It then determines whether that difference is at or above, or at or below, a predetermined threshold. When the difference from the previous photo is at or below the threshold, the computer 12 applies the video effect of fading in the next photo while fading out the previous photo so that they overlap. When the difference is at or above the threshold, the computer 12 applies the video effect of rotating the previous photo about its central vertical axis to transition to the current photo.

Furthermore, the computer 12 refers to the shooting date and event (title) in the meta information shown in FIG. 7 to determine whether the scenario is consistent. This can be done, for example, by checking whether the photos are arranged in chronological order, or, for photos of the same period, whether they follow the order of the seasons. When the scenario is determined to be consistent, the video effect of fading in the next photo while fading out the previous photo so that they overlap is applied.

Furthermore, when the difference in era calculated as above is at or below the predetermined threshold but the photo marks a turning point in the scenario, for example entering a new school, getting married, or the birth of a child, the computer 12 applies the video effect of rotating the previous photo about its central vertical axis to transition to the current photo.

The conversion templates exemplified in Table 3 are used to apply the video effects of step S36 automatically.
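The photo-to-photo rules of step S36 can be summarized in a short decision function. This is a sketch under stated assumptions, not the Table 3 templates themselves: the dictionary keys, the 5-year threshold, the `turning_point` flag, and the handling of a difference exactly equal to the threshold (treated here as a large gap) are all hypothetical.

```python
from datetime import date


def choose_photo_effects(prev, cur, threshold_years=5, turning_point=False):
    """Return the list of effect labels to apply when moving from prev to cur.

    prev and cur are dicts with a 'date' (datetime.date) and a 'monochrome'
    flag (assumed representation of the photo meta information).
    """
    effects = []
    # Color change: previous photo monochrome, current photo color -> A-3.
    if prev["monochrome"] and not cur["monochrome"]:
        effects.append("A-3")
    gap_years = abs(cur["date"].year - prev["date"].year)
    if gap_years >= threshold_years or turning_point:
        # Large gap in era, or a turning point in the scenario -> rotate.
        effects.append("B-5")
    else:
        # Small gap and consistent scenario -> overlapping cross-fade.
        effects.append("B-1")
    return effects


old = {"date": date(1950, 4, 1), "monochrome": True}
new = {"date": date(1975, 8, 1), "monochrome": False}
later = {"date": date(1976, 1, 10), "monochrome": False}

print(choose_photo_effects(old, new))
print(choose_photo_effects(new, later))
print(choose_photo_effects(new, later, turning_point=True))
```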

When it is determined in step S35 that photos remain, the process returns to step S15, and steps S15 to S36 are repeated. If "NO" in step S35, the computer 12 stops the BGM in step S37 and ends the process.
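The overall control flow of FIG. 10 can be sketched as two nested loops, one over photos and one over the regions of each photo. The sketch below only records the order of actions as strings; the photo dictionaries and their keys are hypothetical, and the pause of step S20 is assumed to follow whether or not narration was played, which the text does not state explicitly.

```python
def render_plan(photos):
    """Return the ordered list of actions for the rendering loop of FIG. 10."""
    plan = ["S11 start BGM", "S13 set Sw, Sh"]
    for n, photo in enumerate(photos, start=1):        # S15: next photo
        plan.append(f"S17 fade in photo {n}")
        if photo.get("narration"):                      # S18: narration registered?
            plan.append(f"S19 play narration: {photo['narration']}")
        plan.append("S20 pause")
        regions = photo.get("regions", [])              # S21: regions specified?
        for m, _region in enumerate(regions, start=1):  # S23: next region
            plan.append(f"S27 zoom to region {m}")      # S25 to S27
            plan.append("S29 pause")
            if m < len(regions):                        # S31: regions remain?
                plan.append("S32 region effect")
        plan.append(f"S33 fade out photo {n}")
        if n < len(photos):                             # S35: photos remain?
            plan.append("S36 photo transition")
    plan.append("S37 stop BGM")
    return plan


example_photos = [
    {"narration": "hello", "regions": ["face 1", "face 2"]},
    {"regions": []},
]
plan = render_plan(example_photos)
for action in plan:
    print(action)
```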

In this way, following the flowchart of FIG. 10, a series of video content (a memory video) is generated (rendered) by editing the photo image data using its associated meta information and applying video effects. Narration is assigned in parallel with, or following, the application of video effects.

To assign narration, the narration button 38 formed on the GUI 26 shown in FIG. 4 is operated. When narration needs to be added to the generated slideshow, the user clicks the narration button 38 with a mouse (not shown). Then, as shown in FIG. 11, a narration creation area 44 is displayed as a pop-up near the center of the GUI 26, overlapping the photo display editing area 28 and other areas.

A narration candidate text display area 46 is formed in the narration creation area 44. As described later, when the computer 12 creates narration candidate text based on keywords obtained from the meta information or entered by the user, the narration candidate text display area 46 displays that candidate text (sentence). An OK button 48 is placed in association with the narration candidate text display area 46. The OK button 48 is used to indicate that the narration text displayed in the area 46 is to be selected; the user clicks it to adopt the displayed candidate.

The narration candidate text display area 46 may display several candidates at once rather than a single candidate, in which case the user selects one or more narration texts by clicking directly in the area 46.

An instruction button 50 is formed below the narration candidate text display area 46. When, for example, the computer 12 has been unable to create suitable narration candidate text and the user wants to enter suitable keywords manually, the user clicks the instruction button 50 with the mouse. This enables the keyword input area 52 beneath it, so the user can enter keywords with a keyboard (not shown). As shown in FIG. 11, the keyword input area 52 is preferably divided into multiple categories (five in this embodiment) so that a keyword can be entered for each category. By looking at the input area 52, the user can confirm whether the intended keywords have been entered. Having confirmed this, the user operates the OK button 54 to the right of the input area 52, which completes keyword entry.

An end button 56 is provided to the right of the keyword input OK button 54. The end button 56 is operated to finish the narration assignment work.

By operating the narration button 38 in this way, the GUI 26 enters the state shown in FIG. 11, in which the narration assignment operation shown in FIG. 12 can be executed.

In the first step S41 of FIG. 12, the computer 12 determines whether a photo to which narration is to be assigned has been selected. As described earlier, the user selects a photo by clicking a thumbnail image displayed in the thumbnail display area 30. When the computer 12 determines in step S41 that a photo has been selected through an operation in the thumbnail display area 30, it displays the photo represented by the selected thumbnail image in the photo display editing area 28 in the next step S43. At this point, however, the GUI 26 is in a "narration assignment mode" in which the narration creation area 44 overlaps the photo display editing area 28, so the photo cannot be edited in the area 28.

In the next step S45, the computer 12 obtains the meta information registered by the method described earlier, and in the next step S47 it extracts keywords from that meta information. Then, in step S49, the computer 12 displays narration candidate text in the narration candidate text display area 46 based on those keywords.

The method of generating narration candidates is described here.

In the system 10 of this embodiment, typical narration patterns such as those shown in Table 4 are set in advance in the internal memory (not shown) of the computer 12 or in the database 22 (FIG. 1). In that sense, the internal memory and/or the database 22 function as narration pattern setting means. Narration text is then generated automatically by filling the words X1 to X5 shown in Table 4 with keywords extracted from the meta information.

The words X1 to X5 are assigned, as an example, according to Table 5.

For example, when using the first pattern of Table 4, 「Ｘ１ですね（or ですか）」 ("It's X1, isn't it?" or "Is it X1?"), or the second pattern, 「これはＸ１ですね（or ですか）」 ("This is X1, isn't it?" or "Is this X1?"), the word X1 is filled with a noun 2 that denotes a person, thing, place, or season. There are several forms of this noun 2, expressed as {([adverb] + adjective) or (noun 1 + の) + noun 2}. Expanded, the word applied to X1 is one of: "adjective + noun 2" (e.g., きれいな花, "beautiful flower"), "adverb + adjective + noun 2" (e.g., 大変きれいな花, "very beautiful flower"), "adjective + noun 1 の noun 2" (e.g., きれいな庭の花, "beautiful garden flower"), "adverb + adjective + noun 1 の noun 2" (e.g., 大変きれいな庭の花, "very beautiful garden flower"), or "noun 1 の noun 2" (e.g., 庭の花, "garden flower").
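The composition rule for the word X1 can be illustrated with a small helper. This is a minimal sketch: the function name `build_phrase` and its parameters are hypothetical, and the rule that an adverb requires an adjective follows the bracket notation ([adverb] + adjective) for X1 only (the action word X3 uses a different rule).

```python
def build_phrase(noun2, adjective=None, adverb=None, noun1=None):
    """Compose a phrase of the form ([adverb] + adjective) (noun1 + の) noun2."""
    if adverb and not adjective:
        raise ValueError("for X1, an adverb is only used together with an adjective")
    parts = []
    if adjective:
        parts.append((adverb or "") + adjective)  # e.g. 大変 + きれいな
    if noun1:
        parts.append(noun1 + "の")                 # e.g. 庭 + の
    parts.append(noun2)                            # e.g. 花
    return "".join(parts)


full = build_phrase("花", adjective="きれいな", adverb="大変", noun1="庭")
short = build_phrase("花", noun1="庭")
sentence = "{X1}ですね".format(X1=build_phrase("花", adjective="きれいな"))
print(full, short, sentence)
```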

When using the third pattern of Table 4, 「Ｘ２とＸ３しましたね（or しましたか）」 ("You did X3 with X2, didn't you?" or "Did you do X3 with X2?"), the word X2 is filled with a noun 2 that denotes a person. As in the first and second patterns, the noun 2 for X2 can be defined by {([adverb] + adjective) or (noun 1 + の) + noun 2}. The word X3 is filled with a noun 2 that denotes an action, defined by {([adverb] + noun 2)}; that is, an action noun with or without an adverb (e.g., 楽しく旅行, "traveling happily", or 旅行, "travel").

When using the fourth pattern of Table 4, 「Ｘ４でＸ３しましたね（or しましたか）」 ("You did X3 at X4, didn't you?" or "Did you do X3 at X4?"), the word X4 is filled with a noun 2 that denotes a place. As above, this noun 2 is defined by {([adverb] + adjective) or (noun 1 + の) + noun 2}. The word X3 is as described above. Examples of places include 遊園地 ("amusement park") and デパート ("department store").

When using the fifth pattern of Table 4, 「Ｘ５でＸ３しましたね（or しましたか）」 ("You did X3 in X5, didn't you?" or "Did you do X3 in X5?"), the word X5 is filled with a noun 2 that denotes a season and/or a place. As above, this noun 2 is defined by {([adverb] + adjective) or (noun 1 + の) + noun 2}. Examples of seasons include spring, summer, autumn, and winter, or a particular month. The word X3 is as described above.

Furthermore, in the above, the word given to noun 2 is selected from the conceptual structure of common nouns at a granularity suitable for use in narration. The adverb, adjective, and noun 1 are assigned so as to be compatible with that noun 2. For this compatibility, a compatibility table (not shown) is prepared; after noun 2 is determined, the adverb, adjective, and noun 1 are chosen by referring to that table.

Words that can be defined as in Table 5 are thus applied to X1 to X5 of the narration patterns of Table 4; in the embodiment, these words are obtained from the meta information attached to the photo.

FIGS. 14 and 15 illustrate a thesaurus dictionary 58 and a co-occurrence dictionary 60, which function as word determination means in the embodiment. Both are kinds of data dictionary and, in the embodiment, are set in advance in the database 22 shown in FIG. 1 or in the internal memory of the computer 12.

As shown in FIG. 14, the thesaurus dictionary 58 represents spelling variants, related words, synonyms, similar words, and the like as a hierarchical tree or network structure of conceptually similar keywords, organized by category. The example of FIG. 14 shows the categories "place" and "action". For instance, Kairakuen, Korakuen, and Kenrokuen are grouped under the concept "park"; the proper nouns Disneyland and USJ are assigned to the concept "amusement park"; and "park" and "amusement park", together with other concepts such as "department store", belong to the category "place". The same holds for the category "action": the various "... trip" keywords are represented collectively as "travel", and "travel" is classified in the category "action" together with "walk" and "outing". Keywords are accumulated in the same way for the other categories suitable as noun 2 in the narration patterns above, such as "person" and "season".

The relationship or association between particular words is called a co-occurrence relationship. As shown in FIG. 15, the co-occurrence dictionary 60 shows the co-occurrence relationships between the keywords contained in each category, for example "place" and "action". According to the co-occurrence dictionary 60, as an example, the place keyword "department store" is linked only to the action keyword "outing", whereas "park" has strong links to several keywords: "outing", "walk", and "travel".
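The two dictionaries can be pictured as simple lookup tables. The sketch below is an assumption-laden illustration: it models the thesaurus as a child-to-parent mapping and the co-occurrence dictionary as a place-to-actions mapping, using only the entries named in the text (公園 "park", 遊園地 "amusement park", デパート "department store", 場所 "place", 行為 "action", おでかけ "outing", 散歩 "walk", 旅行 "travel"). The actual data structures of FIGS. 14 and 15 are not specified at this level of detail.

```python
# Thesaurus: keyword -> broader concept (entries taken from the description).
THESAURUS = {
    "偕楽園": "公園", "後楽園": "公園", "兼六園": "公園",
    "ディズニーランド": "遊園地", "USJ": "遊園地",
    "公園": "場所", "遊園地": "場所", "デパート": "場所",
    "旅行": "行為", "散歩": "行為", "おでかけ": "行為",
}

# Co-occurrence: place keyword -> action keywords it is linked to.
COOCCURRENCE = {
    "デパート": ["おでかけ"],
    "公園": ["おでかけ", "散歩", "旅行"],
    "遊園地": ["おでかけ", "散歩", "旅行"],
}


def concept_chain(word):
    """Walk up the thesaurus from a keyword to its category."""
    chain = [word]
    while chain[-1] in THESAURUS:
        chain.append(THESAURUS[chain[-1]])
    return chain


def actions_for(word):
    """Return the actions co-occurring with the first matching concept."""
    for concept in concept_chain(word):
        if concept in COOCCURRENCE:
            return COOCCURRENCE[concept]
    return []


print(concept_chain("ディズニーランド"))
print(actions_for("偕楽園"))
```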

Using these tools, narration candidate text is created in step S49 by applying words, in particular noun 2, to the narration patterns of Table 4.

A specific example of narration candidates is described next. The meta information shown in FIG. 7 contains "2004.07.16" as the date (dc:date), "At Disney Animal Kingdom" as the title (dc:title), "rel:grandchildOf" as the person (foaf:person), and "Haruka" as the name (foaf:name).

First, because the date is "2004.07.16", consulting the thesaurus dictionary 58 shows that the category "season" takes the value "summer" (this part of the dictionary is not shown in FIG. 14). When "Disney", which appears in the title "At Disney Animal Kingdom", is looked up in the thesaurus dictionary 58 of FIG. 14, it is found under "amusement park", and "amusement park" belongs to the category "place". Consulting the co-occurrence dictionary 60 of FIG. 15 then shows that "amusement park" has co-occurrence relationships with three actions: "outing", "walk", and "trip". Finally, the "person" is a "grandchild" whose name is "Haruka". In this way, keywords are extracted from the meta information, and searching from those keywords for "words" applicable to the narration patterns yields: season "summer"; place "amusement park"; action "outing", "walk", or "trip"; and person "grandchild Haruka".
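The lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary contents are limited to the keywords named in the text, the month-to-season mapping and the whitespace tokenization of the title are assumptions, and the names `THESAURUS`, `COOCCURRENCE`, `season_of`, and `extract_words` are hypothetical rather than part of the described apparatus.

```python
# Hypothetical thesaurus: keyword -> parent concept, concept -> category.
# Entries are limited to the keywords mentioned in the text.
THESAURUS = {
    "Kairakuen": "park", "Korakuen": "park", "Kenrokuen": "park",
    "Disney": "amusement park", "Disneyland": "amusement park",
    "USJ": "amusement park",
    "park": "place", "amusement park": "place", "department store": "place",
    "trip": "action", "walk": "action", "outing": "action",
}

# Hypothetical co-occurrence dictionary: place concept -> linked actions.
COOCCURRENCE = {
    "department store": ["outing"],
    "park": ["outing", "walk", "trip"],
    "amusement park": ["outing", "walk", "trip"],
}


def season_of(date: str) -> str:
    """Map a 'YYYY.MM.DD' date to a season (month ranges are an assumption)."""
    month = int(date.split(".")[1])
    if 3 <= month <= 5:
        return "spring"
    if 6 <= month <= 8:
        return "summer"
    if 9 <= month <= 11:
        return "autumn"
    return "winter"


def extract_words(meta: dict) -> dict:
    """Derive season, place, actions, and person from photo meta information."""
    words = {"season": season_of(meta["date"])}
    for token in meta["title"].split():
        concept = THESAURUS.get(token)
        # Keep the first token whose concept belongs to the "place" category.
        if concept is not None and THESAURUS.get(concept) == "place":
            words["place"] = concept
            words["actions"] = COOCCURRENCE.get(concept, [])
            break
    words["person"] = f'{meta["relation"]} {meta["name"]}'
    return words


meta = {
    "date": "2004.07.16",
    "title": "At Disney Animal Kingdom",
    "relation": "grandchild",
    "name": "Haruka",
}
words = extract_words(meta)
```

The resulting `words` dictionary holds one entry per slot that a narration pattern could consume; a real implementation would need Japanese morphological analysis instead of splitting on spaces.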

Accordingly, in step S49, the computer 12 generates, as an example, three narration texts such as those shown in Table 6.

The first candidate in Table 6 is obtained by selecting the first narration pattern and applying words to it, the second candidate by selecting the third narration pattern and applying words to it, and the third candidate by selecting the fifth narration pattern and applying words to it.

Then, in step S49, these narration texts are displayed as narration candidates in the narration candidate text display area 46 of FIG. 11, either all at once or one after another.

In step S51, the computer 12 determines whether one of the narration candidate texts has been selected. When a narration candidate text is selected, the computer 12 obtains the audio data for that narration text in the next step S53.

In this embodiment, a voice model is registered in advance in the database 22 of FIG. 1 or in the internal memory of the computer 12, and once the narration text has been decided, the narration audio is created by speech synthesis using that voice model. As an alternative to speech synthesis, speech uttered by, for example, a professional narrator may be recorded in the internal memory or the database 22, and the narration audio may be created by editing that recorded audio data.

In step S55, the computer 12 links the narration audio data acquired or created in step S53 to the photo selected in step S41 and registers it in the database 22.

In this way, narration audio data is attached to one photo. In the next step S57, it is determined whether there are still photos to which narration should be added. While narration is still being added, the end button 56 (FIG. 11) is not pressed, so the process returns to step S41. When the end button 56 is pressed, the result of step S57 is "YES" and the narration adding step S7 ends.

If, however, none of the narration candidates displayed in step S49 is selected in step S51, the computer 12 determines in the next step S59 whether further narration candidates exist. If so, it updates the narration candidates in step S61, displays them again in the narration candidate text display area 46 (step S49), and waits for the user's selection.

If it is determined in step S59 that there is no suitable narration candidate, the computer 12 accepts replacement keywords from the user. To enter keywords, the user operates the instruction button 50 (FIG. 11) and types them into the keyword input area 52. The keywords to be entered are {person, thing, place, season, action}, corresponding to noun 1 and noun 2 of the narration patterns (Table 4), and {what kind}, corresponding to adverbs and adjectives. Some categories or items may be left unspecified.

After the user has entered keywords in this way, the computer 12, in step S49, determines "words" from the user-entered keywords using the thesaurus dictionary 58 and the co-occurrence dictionary 60 as described above, applies those words to the narration patterns of Table 4 to generate narration text, and displays it. Thereafter, as described earlier, the user selects one of the displayed narration candidate texts and thereby decides the narration text.

By registering the audio data of the narration text decided in this way in the database 22 together with the photo image data, video content such as a video movie in RVML format can be produced. In this embodiment, two types of video effect are currently supported. One is the fade-out/fade-in applied at transitions between photos. The other is the pan and zoom effect used to draw the viewer's attention to particular regions of a photo. The latter effect in particular must be accompanied by appropriate narration. For example, the narration "Your son is cute, isn't he?" must be played at the moment the video zooms in on the region containing the viewer's son. Apart from the relationship with the video effects, an appropriate pause (reaction time) must also be inserted after each narration, so that the viewer has enough time to react to it. If the pause is too short, the next narration may cut off the viewer just as he or she is about to say something, which the viewer finds very irritating.

In other words, to make a slideshow video interesting to the viewer, visual effects and audio effects such as BGM and narration must be added, and the visual and audio effects must be synchronized appropriately according to the content of each photo. Synchronized Multimedia Integration Language (SMIL), for example, is well known as a standard representation format for multimedia content on the web. SMIL, however, is intended for describing multimedia content whose components have already been assigned to positions on a timeline. The issue in this embodiment is the synchronization, at the semantic level, between visual effects and audio effects when an attractive slideshow video is created from a set of annotated photos. Specifically, the timing of the narration and BGM must be synchronized with the timing at which visual effects (video effects) are applied to the photos. This is referred to here as "media synchronization in consideration of semantics".

To achieve this "media synchronization in consideration of semantics", in this embodiment all of the elements described above (narration, pause, video effect, photo, and the persons in the photo) are first assembled, and then arranged on the video timeline while taking into account the semantic constraints on the video effect elements, narration elements, and pause (reaction time) elements.

FIG. 15 shows an example. In FIG. 15, there is a constraint that N1 (narration 1) and P1 (family) must be presented at the same time, and N2 (narration 2) and P2 (son) are subject to a similar constraint. In addition, other kinds of constraint, such as the relationship between narration and pause described above, must also be considered when the elements are placed on the video timeline.

Media synchronization in consideration of semantics can be regarded as a description of the temporal constraints (local constraints) between the elements that make up the content, and can be expressed with a model of interval temporal logic. This model is well known as a formalism for describing constraints on temporal relations and is also incorporated in part of the time ontology of OWL-S. Using the interval temporal logic model, the constraints between the elements of FIG. 15 are described as follows. Such constraints are set, for example, in the internal memory 14 (FIG. 1) of the computer 12.

{VE1 overlaps P1, VE1 meets N1, N1 during P1, N1 meets R1, R1 meets VE2, P1 overlaps VE2, VE2 overlaps P2, VE2 meets N2, N2 meets R2, N2 during P2}
That is: visual (video) effect VE1 overlaps photo P1; VE1 immediately precedes narration N1; N1 lies within P1; N1 immediately precedes reaction time R1; R1 immediately precedes video effect VE2; P1 overlaps VE2; VE2 overlaps photo P2; VE2 immediately precedes narration N2; N2 immediately precedes reaction time R2; and N2 lies within P2.
Next, the procedure for calculating, under the above constraints, the specific timing at which each of these elements is presented in the memory video is described. The following conditions are assumed for this calculation. In what follows, the suffix i indicates the i-th element of each type.
Condition 1: A fade-out/fade-in visual effect is used for transitions between photos. Pan and zoom visual effects are applied to transitions from person to person within the same photo.
Condition 2: By default, the photos are presented in the video in chronological order, and panning and zooming between people within each photo proceeds from left to right. The producer can change these defaults as needed.
Condition 3: The time taken by a fade-out/fade-in can be changed freely, but a fade-out/fade-in that is too fast or too slow may make the viewer uncomfortable. Minimum and maximum times are therefore given in advance. These are denoted minTfoi_i and maxTfoi_i, respectively.
Condition 4: Tn_i denotes the time taken by Narration_i, and Tr_i denotes the time, in seconds, needed for the viewer's expected response or reaction.
Condition 5: The time needed for panning and zooming can be controlled by changing the pan speed (Δ pixels/second) and the zoom speed (Δ magnification/second). For the same reason as in Condition 3, minimum and maximum times must be set for this effect as well. These are denoted minTpz_i and maxTpz_i, respectively.
Condition 6: The producer can set the minimum and maximum playback time of the memory video, denoted minTpb and maxTpb, respectively. For example, the producer can specify that the video should be at least 30 minutes and at most 60 minutes long.
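The interval constraints listed above ("meets", "overlaps", "during") can be checked mechanically once each element has a start and end time. The following Python sketch assumes the usual strict definitions of these three interval relations; the concrete times in `TIMELINE` are invented for illustration only, and the names `Interval`, `CONSTRAINTS`, and `violated` are hypothetical.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    start: float
    end: float


def meets(a: Interval, b: Interval) -> bool:
    """a ends exactly where b starts."""
    return a.end == b.start


def overlaps(a: Interval, b: Interval) -> bool:
    """a starts first, b starts before a ends, and b ends last."""
    return a.start < b.start < a.end < b.end


def during(a: Interval, b: Interval) -> bool:
    """a lies strictly inside b."""
    return b.start < a.start and a.end < b.end


RELATIONS = {"meets": meets, "overlaps": overlaps, "during": during}

# The constraint set from the text, as (left, relation, right) triples.
CONSTRAINTS = [
    ("VE1", "overlaps", "P1"), ("VE1", "meets", "N1"), ("N1", "during", "P1"),
    ("N1", "meets", "R1"), ("R1", "meets", "VE2"), ("P1", "overlaps", "VE2"),
    ("VE2", "overlaps", "P2"), ("VE2", "meets", "N2"), ("N2", "meets", "R2"),
    ("N2", "during", "P2"),
]


def violated(timeline: dict) -> list:
    """Return the constraints that the given timeline does not satisfy."""
    return [
        (left, rel, right)
        for left, rel, right in CONSTRAINTS
        if not RELATIONS[rel](timeline[left], timeline[right])
    ]


# Invented example times in seconds (not taken from the source).
TIMELINE = {
    "VE1": Interval(0, 2), "P1": Interval(1, 10),
    "N1": Interval(2, 6), "R1": Interval(6, 9),
    "VE2": Interval(9, 11), "P2": Interval(10.5, 20),
    "N2": Interval(11, 15), "R2": Interval(15, 18),
}
```

With these example times `violated(TIMELINE)` is empty; delaying N1 so that it no longer starts when VE1 ends makes the "VE1 meets N1" constraint appear in the result.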

Under these conditions, the arrangement of the elements on the timeline is calculated.

As described earlier with reference to FIG. 2, the photos used in the memory video and the order in which the people in each photo are shown are decided in step S2. The video effect elements for them can therefore be arranged automatically. The narration selected in step S3 for each person in a photo is played immediately after the corresponding video effect element. Under the semantic synchronization constraints (local constraints between elements) set in this embodiment, each narration is followed by a pause for the viewer's response or reaction, and the next video effect element is executed immediately after that pause. Each video effect element overlaps the elements corresponding to the persons in the photos that precede and follow it in the presentation order. Applying the same constraints, video effect elements, narration elements, and reaction time elements are placed up to the last photo.
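The back-to-back placement described above (video effect, then narration, then reaction time, then the next video effect) can be sketched as a single pass over the elements. This Python sketch is an assumption-laden simplification: it places only the VE, N, and R intervals, takes fixed durations as input, and leaves out the photo intervals that overlap the effects. The names `Segment` and `layout` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    effect: float     # duration of the video effect (fade or pan/zoom), seconds
    narration: float  # Tn_i, seconds
    reaction: float   # Tr_i, seconds


def layout(segments, t0=0.0):
    """Place each segment's effect, narration, and reaction back to back.

    Returns one dict per segment with (start, end) tuples, so that
    VE_i meets N_i, N_i meets R_i, and R_i meets VE_(i+1).
    """
    timeline = []
    t = t0
    for seg in segments:
        ve = (t, t + seg.effect)
        n = (ve[1], ve[1] + seg.narration)
        r = (n[1], n[1] + seg.reaction)
        timeline.append({"VE": ve, "N": n, "R": r})
        t = r[1]  # the next video effect starts when the reaction time ends
    return timeline


plan = layout([Segment(2, 4, 3), Segment(2, 4, 3)])
```

Each entry of `plan` gives the start and end of one effect, narration, and pause; the total running time is the end of the last reaction interval.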

By repeating steps S2 and S3 of FIG. 2 in this way, the timeline of the elements is roughly designed. Then, in the rendering process executed in step S5 of FIG. 2, the computer 12 calculates, under the synchronization constraints described above (Conditions 1 to 6), the exact timing at which each element is executed on the timeline of the memory video.

First, if the maximum video playback time maxTpb is smaller than the sum, over all elements (video effects, narrations, reactions), of the minimum fade-out/fade-in time minTfoi_i, the narration time Tn_i, the reaction time Tr_i, and the minimum pan/zoom time minTpz_i, that is, if Σ(minTfoi_i + Tn_i + Tr_i + minTpz_i) > maxTpb with i running from 1 to n, then in step S5 of FIG. 2 the computer 12 automatically excludes low-priority photos according to the priorities previously assigned to the photos. Alternatively, the producer may select the photos to exclude manually.

Conversely, if the minimum video playback time minTpb is larger than the sum, over all elements, of the maximum fade-out/fade-in time maxTfoi_i, the narration time Tn_i, the reaction time Tr_i, and the maximum pan/zoom time maxTpz_i, that is, if Σ(maxTfoi_i + Tn_i + Tr_i + maxTpz_i) < minTpb with i running from 1 to n, then in the rendering process of step S5 either the computer 12 adds some photos or the producer adds some photos manually. The image data of the added photos is either data already captured in step S1 of FIG. 2 or data captured additionally as needed.

The time adjustment by adding or removing photos is repeated until the playback length of the video is shorter than the maximum playback time maxTpb and longer than the minimum playback time minTpb.
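The two duration checks and the photo adjustment above can be illustrated with a short Python sketch. It is a simplification under stated assumptions: each photo carries a single set of times (one fade, one narration, one reaction, one pan/zoom), a larger `priority` value means a more important photo, and spare photos are added in descending priority. The names `Photo`, `min_total`, `max_total`, and `fit_to_playback_time` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Photo:
    name: str
    priority: int    # assumption: larger value = more important
    min_tfoi: float  # minimum fade-out/fade-in time
    max_tfoi: float  # maximum fade-out/fade-in time
    tn: float        # narration time
    tr: float        # reaction time
    min_tpz: float   # minimum pan/zoom time
    max_tpz: float   # maximum pan/zoom time


def min_total(photos):
    """Shortest possible playback time: sum of minTfoi + Tn + Tr + minTpz."""
    return sum(p.min_tfoi + p.tn + p.tr + p.min_tpz for p in photos)


def max_total(photos):
    """Longest possible playback time: sum of maxTfoi + Tn + Tr + maxTpz."""
    return sum(p.max_tfoi + p.tn + p.tr + p.max_tpz for p in photos)


def fit_to_playback_time(selected, spare, min_tpb, max_tpb):
    """Drop low-priority photos or add spare ones until the bounds can be met."""
    selected = list(selected)
    spare = sorted(spare, key=lambda p: p.priority, reverse=True)
    # Even the fastest rendering is too long: exclude the lowest-priority photo.
    while selected and min_total(selected) > max_tpb:
        selected.remove(min(selected, key=lambda p: p.priority))
    # Even the slowest rendering is too short: add the best spare photo.
    while spare and max_total(selected) < min_tpb:
        selected.append(spare.pop(0))
    return selected


def make(name, priority):
    # Example times in seconds, invented for illustration.
    return Photo(name, priority, 1, 3, 4, 3, 2, 4)


album = [make("a", 3), make("b", 1), make("c", 2)]
trimmed = fit_to_playback_time(album, [], min_tpb=15, max_tpb=25)
padded = fit_to_playback_time([make("a", 3)], [make("d", 1), make("e", 2)],
                              min_tpb=20, max_tpb=60)
```

In this example each photo needs between 10 and 14 seconds, so three photos cannot fit into 25 seconds and the lowest-priority one is dropped, while a single photo cannot fill 20 seconds and one spare photo is added. The sketch does not re-check the upper bound after adding photos, which a fuller implementation would need to do.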

In this way, an attractive slideshow video (video content) can be produced easily by taking into account the semantic synchronization constraints between video effects, such as the Ken Burns effect applied to annotated regions of a photo, and audio effects, in particular narration. Specifically, a video effect element, a narration element, and a reaction time element are allocated to each photo on the timeline under predetermined conditions, and the timeline is designed so that the total time required by these elements falls between the minimum and maximum video playback times. This reliably secures both the synchronization of video effects with narration and the viewer's reaction time.

A block diagram showing a video content creation apparatus according to one embodiment of the present invention.
A flowchart showing the overall operation of the embodiment of FIG. 1.
An illustrative view showing an example of the GUI of the embodiment of FIG. 1.
An illustrative view showing an example of the GUI display during photo input and meta information registration used in step S1 of FIG. 2.
An illustrative view showing the cutting out of a region in FIG. 4.
An illustrative view showing the meta information of the region of FIG. 5.
An illustrative view showing an example of photo annotation.
An illustrative view showing an example of narration annotation.
An illustrative view showing an example of the GUI during the photo selection and playback order setting operation in step S2 of FIG. 2.
A flowchart showing in detail the memory video rendering operation in step S5 of FIG. 2.
An illustrative view showing the narration creation area in the GUI of FIG. 8.
A flowchart showing in detail the narration adding operation in step S7 of FIG. 2.
An illustrative view showing the thesaurus dictionary used for adding narration.
An illustrative view showing the co-occurrence dictionary used for adding narration.
An illustrative view showing an example of the arrangement of elements on the timeline in consideration of semantics.

Explanation of symbols

10 ... Video content creation apparatus
12 ... Computer
14 ... Internal memory
16 ... Monitor
20 ... Photo data input device
22 ... Database
26 ... GUI
28 ... Photo display and editing area
32, 34 ... Meta information input area

Claims

A video content creation method that uses a fade-out/fade-in visual effect for transitions between photos and places a narration and a reaction time within the playback of each photo, wherein
a photo is excluded when the maximum video playback time is less than the sum of the minimum fade-out/fade-in time, the narration time, and the reaction time, and
a photo is added when the minimum video playback time is greater than the sum of the maximum fade-out/fade-in time, the narration time, and the reaction time.

A video content creation method that uses a fade-out/fade-in visual effect for transitions between photos, uses pan and zoom visual effects for person-to-person transitions within the same photo, and places a narration and a reaction time within the playback of each photo, wherein
a photo is excluded when the maximum video playback time is less than the sum of the minimum fade-out/fade-in time, the narration time, the reaction time, and the minimum pan/zoom time, and
a photo is added when the minimum video playback time is greater than the sum of the maximum fade-out/fade-in time, the narration time, the reaction time, and the maximum pan/zoom time.