JP3602765B2

JP3602765B2 - Video search method, computer-readable recording medium storing a program for causing a computer to execute the method, video search processing device, video indexing method, computer-readable recording medium storing a program for causing a computer to execute the method, method for generating explanatory sentences for video content, and computer-readable recording medium storing a program for causing a computer to execute the method

Info

Publication number: JP3602765B2
Application number: JP2000077193A
Authority: JP
Inventors: 隆子橋本; 由香利白田; 博子真野; 篤志飯沢
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1999-07-19
Filing date: 2000-03-17
Publication date: 2004-12-15
Anticipated expiration: 2020-03-17
Also published as: JP2001092849A

Description

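【０００１】
【Technical Field of the Invention】
The present invention relates to a video search method and a video search processing device that perform video search using highly abstract terms on video structured by means of structure indexes and event indexes; a video indexing method that assigns video indexes using highly abstract terms; a method for generating explanatory sentences that describe the content of a video scene; and computer-readable recording media storing programs for causing a computer to execute these methods.
【０００２】
【Prior Art】
In recent years, advances in computer hardware and information processing technology, together with the spread of the Internet and digital satellite broadcasting, have made it possible to use a wide variety of video on a daily basis. As a result, the value, diversity, and entertainment quality of video as information have become increasingly important. In addition to viewing video by continuous playback, as in conventional television broadcasting and video playback, a variety of uses have been proposed, such as searching indexed, structured video for a desired video scene and viewing it, gathering information, and creating digest versions.
【０００３】
To search video efficiently, a video, which is a temporally continuous set of video scenes, is usually divided into smaller units (sections). This division is generally not physical, such as cutting at fixed time intervals or by amount of video, but logical: each section is a set of video scenes satisfying predetermined conditions. After this logical division, attaching an index to each section allows the divided video scenes to be handled as reusable semantic units.
【０００４】
Possible methods of logical division include, for example, having a person watch the video, divide it manually, and assign indexes to it; or, as one real-time authoring (real-time content description) technique, designating the point at which a logical division starts as an index (structure index) and treating the span up to the start point of the next logical division as the section.
【０００５】
In addition to the structure index that represents a section, another kind of video index has been considered: the event index, which represents an event that occurred in the video. Whereas the logical structure index denotes a logical section (duration) of the video, the event index basically has no duration. With current technology, however, an event index can carry only fragmentary information.
【０００６】
Conventionally, video search has meant cutting out a logical section by a search using these event indexes. In other words, a desired portion of the video can be retrieved by issuing a query that matches a term (fragmentary information) set as an event index.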
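【０００７】
【Problems to Be Solved by the Invention】
With the above prior art, however, although video can be searched with a query that matches a term set as a structure index or an event index, there is the problem that it cannot be searched by the meaning of its content.
【０００８】
In other words, since only fragmentary indexes can be attached to video, one can search in units of the logical structure by relying on those fragmentary indexes and extract a portion of the video, but there is no guarantee that the portion the user truly wants will be found.
【０００９】
For clarity, "the meaning of the video content" is defined in this specification as follows. It is information generated for a particular section when a person watches the video and understands its content abstractly and conceptually; that is, information whose meaning can be discovered only through human subjective judgment. Whereas an event index represents an event in the video in fragmentary form, the meaning of the video content is a meaning that a person assigns by judging the content of a section as a whole, based on how the events in the video changed and on the state of the events that resulted from those changes. The meaning of the video content is therefore established only once the content of the section is known.
【００１０】
Furthermore, the section expected as the answer to a user's query does not necessarily correspond to a single structural unit; it may be part of one structural unit or a combination of several. Because the prior art selects a single structural unit as the answer (search result) on the basis of the fragmentary indexes attached to logical structural units, it could not always retrieve a portion of video that satisfies the user.
【００１１】
Also, because video scenes can be interpreted in many ways, when the person who created (defined) the logical structure differs from the person issuing the query, the sections each regards as meaningful units do not necessarily coincide.
【００１２】
Moreover, searching by the meaning of the video content amounts to "searching with highly abstract terms as conditions". The prior art, however, can only extract logical structural units by relying on fragmentarily attached index information, and, apart from specific applications, it generally could not efficiently analyze the meaning of video content from highly abstract search conditions.
【００１３】
In addition, when the prior art attempts to generate a character string describing the video content from indexes, it can only list the fragmentary indexes side by side. It therefore could not convert fragmentary indexes into generally meaningful character strings and produce text that is easy for the user to understand.
【００１４】
During real-time authoring, the attribute (or event index) of an event is sometimes determined by the events that follow it. In baseball video, for example, a double or a triple cannot be determined at the moment the ball is hit. The meaning (attribute or event index) can be assigned only after seeing which base the batter reached.
【００１５】
As a technique addressing this, a real-time content description scheme has been reported in related work by 赤迫, 飯島, 角谷, and 田中 ("映像データのリアルタイム内容記述方式とその実装" [A real-time content description scheme for video data and its implementation], DEWS '99 proceedings, 3A-2, March 4-6, 1999). According to this report, an incomplete index sequence arriving in real time, such as radio narration, is interpreted and replaced by higher-level concepts. That is, higher-level concepts are expressed in advance as state transition diagrams, and replacing the input with those concepts in real time corrects the redundancy and incompleteness of the original input index sequence.
【００１６】
However, as the phrase "real-time content description" suggests, the scheme proposed by 赤迫 et al. presupposes that state transition diagrams are used to interpret indexes arriving in time series. If the scheme is applied directly to search, the indexes are interpreted in time order, so processing takes a long time. Nor does it offer any solution to the question of how to realize highly abstract search at high speed.
【００１７】
Furthermore, since the indexes arriving in time series must always be compared against every state transition diagram, the scheme is not necessarily efficient.
【００１８】
The present invention has been made in view of the above, and its object is to allow queries about the meaning of video content to be made using highly abstract terms or concepts, and to allow the search to be performed at high speed.
【００１９】
Another object of the invention is to interpret the meaning of video content and convert it into generally meaningful character strings, thereby generating explanatory strings for the video content that are easy for the user to understand.
【００２０】
A further object of the invention is to assign indexes using highly abstract terms efficiently and at high speed.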
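【００２１】
【Means for Solving the Problems】
To achieve the above objects, the video search method according to claim 1 is a video search method which targets video to which at least structure indexes, for dividing the video into semantic units, and event indexes, for identifying the content and location of events occurring in the video, have been assigned, and which is structured using a plurality of hierarchical structural units, each structural unit being the video scene of a section delimited by the structure indexes, and which searches the video for a desired video scene using the structure indexes and event indexes, wherein: in advance, for each term that expresses the meaning of a video scene arising from the occurrence of a plurality of events and that can be expressed by a combination of a plurality of events, a structural unit suitable as the unit of search is set as a search granularity, and, based on the occurrence pattern of events that expresses the term, a state transition pattern corresponding to the term is defined as an input sequence fixing the order of a plurality of event indexes that occur successively within the search granularity; and, when a desired video scene is to be retrieved from the video, a term expressing the desired video scene is input, the search granularity corresponding to the input term is taken as the unit of search, it is determined for each structural unit matching the search granularity whether the occurrence pattern of event indexes in the structural unit matches the state transition pattern corresponding to the input term, and the matching structural units are output as the search result.
【００２２】
The video search method according to claim 2 is the method of claim 1, wherein, further, for each state transition pattern corresponding to a term, at least one of the event indexes in the pattern is designated as a key index serving as the starting point of the search; and, when a desired video scene is to be retrieved from the video, structural units that match the search granularity and have an event index matching the key index are first retrieved, after which it is determined for those structural units whether the occurrence pattern of event indexes in the structural unit matches the state transition pattern.
【００２３】
The video search method according to claim 3 is the method of claim 1 or 2, wherein, when a desired video scene is extracted from the video based on the structural unit output as the search result, the video scene identified by that structural unit is output.
【００２４】
The video search method according to claim 4 is the method of claim 1 or 2, wherein, when a desired video scene is extracted from the video based on the structural unit output as the search result, any higher or lower structural unit in the structured video can be designated.
【００２５】
The video search method according to claim 5 is the method of claim 2, wherein, when a desired video scene is extracted from the video based on the structural unit output as the search result, the video scene is extracted by specifying offsets for cutting out video before and after the location at which the key index was assigned.
【００２６】
The video search method according to claim 6 is the method of any one of claims 2 to 5, wherein, when it has been determined whether the occurrence pattern of event indexes in a structural unit matches the state transition pattern corresponding to the input term, an index representing the term is defined as an abstract index, newly added to the matching structural unit, and reused as a key index.
【００２７】
The video search method according to claim 7 is the method of any one of claims 1 to 6, wherein an abstract index is set in advance, as an index representing the term, for each term expressible by a combination of a plurality of events; and, when a state transition pattern is defined, the abstract indexes are used in addition to the event indexes, so that the state transition pattern corresponding to the term is defined as an input sequence of event indexes and abstract indexes.
【００２８】
The video search method according to claim 8 is the method of any one of claims 2 to 7, wherein a plurality of pieces of attribute information are attached to the structure indexes and event indexes; string definition information for generating an explanatory sentence related to the term from the attribute information of the structure indexes and event indexes is attached to the state transition pattern corresponding to the term; and, based on the string definition information of the state transition pattern, an explanatory sentence related to the term is generated by referring to the attribute information in the structural unit output as the search result.
【００２９】
The video search method according to claim 9 is a video search method which targets video to which at least structure indexes, for dividing the video into semantic units, and event indexes, for identifying the content and location of events occurring in the video, have been assigned, and which is structured using a plurality of hierarchical structural units, each structural unit being the video scene of a section delimited by the structure indexes, and which searches the video for a desired video scene using the structure indexes and event indexes, the method comprising: as processing before the video search, a state transition table generation step of setting in advance, for each term that expresses the meaning of a video scene arising from the occurrence of a plurality of events and that can be expressed by a combination of a plurality of events, a structural unit suitable as the unit of search as a search granularity; defining, based on the occurrence pattern of events that expresses the term, a state transition pattern corresponding to the term as an input sequence fixing the order of a plurality of event indexes that occur successively within the search granularity; further designating, for each state transition pattern, at least one of the event indexes in the pattern as a key index serving as the starting point of the search; and generating a state transition table consisting of the term, search granularity, state transition pattern, and key index; and, as processing at the time of video search, a term input step of inputting a term expressing the desired video scene; a search step of referring to the state transition table, taking the search granularity corresponding to the term input in the term input step as the unit of search, and using the key index corresponding to the term to retrieve structural units having an event index matching the key index; a first determination step of referring to the state transition table and determining whether all the event indexes contained in the state transition pattern corresponding to the term are present in the structural unit retrieved in the search step; a second determination step of determining, for structural units determined in the first determination step to contain all of them, whether the occurrence pattern of event indexes in the structural unit matches the state transition pattern corresponding to the input term; and a search result output step of cutting out a video scene from the video based on the structural unit determined to match in the second determination step and outputting it as the search result.
【００３０】
Further, the computer-readable recording medium of claim 10 stores a program for causing a computer to execute the video search method according to any one of claims 1 to 9.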
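【００３１】
The video search processing device according to claim 11 is a video search processing device which targets video to which at least structure indexes, for dividing the video into semantic units, and event indexes, for identifying the content and location of events occurring in the video, have been assigned, and which is structured using a plurality of hierarchical structural units, each structural unit being the video scene of a section delimited by the structure indexes, and which searches the video for a desired video scene using the structure indexes and event indexes, the device comprising: video input means for inputting the structured video to be searched; storage means storing, as a state transition table, a term that designates the desired video scene, expresses the meaning of a video scene arising from the occurrence of a plurality of events, and is expressible by a combination of a plurality of events, a search granularity designating the structural unit to be used as the unit of search when searching the video with the term, a state transition pattern defining the term as an input sequence fixing the order of a plurality of event indexes occurring successively within the search granularity, and a key index designating at least one of the event indexes in the state transition pattern; operation input means for inputting or designating a query term for retrieving the desired video scene; search means for, when a query term is input or designated via the operation input means, referring to the state transition table in the storage means and, using the key index corresponding to the query term, retrieving from the video input by the video input means structural units that match the search granularity and have an event index matching the key index; determination means for receiving the structural units retrieved by the search means and determining whether the occurrence pattern of event indexes in each structural unit matches the state transition pattern corresponding to the query term; and search result output means for cutting out a video scene from the video based on the structural units determined to match by the determination means and outputting it as the search result.
【００３２】
The video search processing device according to claim 12 is the device of claim 11, wherein the determination means consists of first determination means for determining whether all the event indexes contained in the state transition pattern corresponding to the query term are present in the structural unit retrieved by the search means, and second determination means for determining, for structural units determined by the first determination means to contain all of them, whether the occurrence pattern of event indexes in the structural unit matches the state transition pattern corresponding to the query term.
【００３３】
Further, the video indexing method according to claim 13 is a video indexing method which, when structuring video, assigns video indexes including at least structure indexes for dividing the video into semantic units and event indexes for identifying the content and location of events occurring in the video, wherein: a state transition table is set in advance which associates, for each term, a term whose meaning is established by the successive occurrence of a plurality of events, which expresses the meaning of a video scene arising from the occurrence of a plurality of events, and which is expressible by a combination of a plurality of events; a state transition pattern defining the term by an input sequence fixing the order of the plurality of event indexes that express it; and a search granularity designating, for the term, a structural unit of the video defined by a given structure index; and, when video indexes are assigned to the video, the state transition table is referred to and, for each structural unit identified by the structure indexes, a term is searched for such that the structural unit matches the search granularity and the order in which event indexes were assigned within the structural unit matches the input sequence of event indexes of the state transition pattern; if such a term exists, the meaning of that term is judged to have occurred, and an event index or attribute information indicating that the term holds is assigned.
【００３４】
The video indexing method according to claim 14 is a video indexing method which, when structuring video, assigns video indexes including at least structure indexes for dividing the video into semantic units and event indexes for identifying the content and location of events occurring in the video, wherein: a state transition table is set in advance which associates, for each term, a term whose meaning is established by the successive occurrence of a plurality of events, which expresses the meaning of a video scene arising from the occurrence of a plurality of events, and which is expressible by a combination of a plurality of events; a state transition pattern defining the term by an input sequence fixing the order of the plurality of event indexes that express it; a search granularity designating, for the term, a structural unit of the video defined by a given structure index; and a central index designating at least one of the event indexes in the state transition pattern; and, when video indexes are assigned to the video, the state transition table is referred to and, when an event index matching the central index has been assigned, for each structural unit identified by the structure indexes, a term is searched for such that the structural unit matches the search granularity and the order in which event indexes were assigned within the structural unit matches the input sequence of event indexes of the state transition pattern; if such a term exists, the meaning of that term is judged to have occurred, and an event index or attribute information indicating that the term holds is assigned.
【００３５】
The computer-readable recording medium according to claim 15 stores a program for causing a computer to execute the video indexing method according to claim 13 or 14.
【００３６】
The method for generating explanatory sentences for video content according to claim 16 targets video to which at least structure indexes, for dividing the video into semantic units, and event indexes, for identifying the content and location of events occurring in the video, have been assigned, and which is structured using a plurality of hierarchical structural units, each structural unit being the video scene of a section delimited by the structure indexes, and generates explanatory sentences describing the video content using a state transition table in which the information used for generating the explanatory sentences is set, and character strings, or attribute information convertible to character strings, attached in advance to the structure indexes and event indexes, wherein: the state transition table contains, for each term that expresses the meaning of a video scene arising from the occurrence of a plurality of events and is expressible by a combination of a plurality of events, a generation granularity, which is the structural unit suitable as the unit of video for generating an explanatory sentence; a state transition pattern defining the term, based on the occurrence pattern of events that expresses it, as an input sequence fixing the order of a plurality of event indexes occurring successively within the generation granularity; a key index, set for each state transition pattern by selecting at least one of the event indexes in the pattern; and string definition information, set for each state transition pattern, specifying the syntax elements used to generate the explanatory sentence and the source of the character string used for each syntax element; and, when an explanatory sentence for the video content is generated, the state transition table is referred to, generation granularities matching the structural unit for which the sentence is to be generated are searched for, it is determined whether an event index matching the key index of the state transition pattern corresponding to such a generation granularity is present in the target structural unit, if so it is determined whether the occurrence pattern of event indexes in the target structural unit matches that state transition pattern, if the patterns match the term defined by that state transition pattern is judged to hold, and the explanatory sentence for the video scene of the target structural unit is generated using the string definition information of the state transition pattern of the term that holds.
【００３７】
The method according to claim 17 is the method of claim 16, wherein, further, an index representing a particular term can be attached to a structural unit as an abstract index when the state transition pattern defining that term holds; in the state transition table, the abstract index representing the corresponding term is set as one of the key indexes of each state transition pattern; and, when determining whether an event index matching the key index of the state transition pattern corresponding to the generation granularity is present in the target structural unit, the abstract index is used preferentially as the key index to determine whether the corresponding abstract index has been attached to the target structural unit, and, if the abstract index is set, the term defined by the corresponding state transition pattern is judged to hold and the explanatory sentence for the video scene of the target structural unit is generated using the string definition information of the state transition pattern of that term.
【００３８】
The method according to claim 18 is the method of claim 16 or 17, wherein, in the string definition information, attribute information of the structure indexes or event indexes is set as the source of the character string used for a syntax element.
【００３９】
The method according to claim 19 is the method of any one of claims 16 to 18, wherein the string definition information contains syntax elements based on the 5W1H: "when (When), where (Where), why (Why), whose (Who), by what (What), and how it turned out (How)".
【００４０】
The method according to claim 20 is the method of claim 19, wherein, when there are several state transition patterns corresponding to the generation granularity, it is determined for each pattern whether its term holds, and, when several terms hold, the syntax elements of the string definition information set for the state transition patterns of those terms are combined to generate the explanatory sentence for the video scene of the target structural unit.
【００４１】
The method according to claim 21 is the method of claim 20, wherein, when the syntax elements are combined to generate the explanatory sentence and the 5W1H syntax elements include duplicates, priority is given to the syntax element that refers to the event index occurring later in time.
【００４２】
The method according to claim 22 is the method of claim 20 or 21, wherein, when the syntax elements are combined to generate the explanatory sentence and the 5W1H syntax elements include duplicates, the state transition patterns of the terms are compared and priority is given to the syntax elements of the state transition pattern defined using more event indexes.
【００４３】
The method according to claim 23 is the method of any one of claims 19 to 22, wherein, when the syntax elements are combined to generate the explanatory sentence and the 5W1H syntax elements include duplicates, the syntax elements are listed side by side as needed.
【００４４】
The computer-readable recording medium according to claim 24 stores a program for causing a computer to execute the method for generating explanatory sentences for video content according to any one of claims 16 to 23.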
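【００４５】
【Embodiments of the Invention】
Embodiments of the video search method, the computer-readable recording medium storing a program for causing a computer to execute that method, the video search processing device, the video indexing method, the computer-readable recording medium storing a program for causing a computer to execute that method, the method for generating explanatory sentences for video content, and the computer-readable recording medium storing a program for causing a computer to execute that method are described in detail below with reference to the accompanying drawings.
【００４６】
(Embodiment 1)
The video search method, video search processing device, and video indexing method according to Embodiment 1 are described in the following order:
(1) Configuration of the video search processing device
(2) Relationship between structure indexes and event indexes
(3) Structure of the state transition table
(4) Example definitions of state transition patterns (state transition diagrams)
(5) Example of a concrete video search algorithm
(6) Concrete operation example
【００４７】
(1) Configuration of the video search processing device
Fig. 1 shows the configuration of the video search processing device of Embodiment 1: Fig. 1(a) shows an example hardware configuration of the video search processing device 100, and Fig. 1(b) shows its functional block diagram.
The video search processing device 100 targets video to which at least structure indexes, for dividing the video into semantic units, and event indexes, for identifying the content and location of events occurring in the video, have been assigned, and which is structured using a plurality of hierarchical structural units, each structural unit being the video scene of a section delimited by the structure indexes. It searches the video for a desired video scene using the structure indexes and event indexes.
【００４８】
The hardware of the video search processing device 100 need only have at least a CPU (central processing unit), a display, a keyboard, and a magnetic disk; for example, a personal computer such as that shown in Fig. 1(a) can be used.
【００４９】
As shown in Fig. 1(b), the video search processing device 100 comprises: a video input unit 101 for inputting the structured video to be searched; a state transition table storage unit 102 storing, as a state transition table, the terms, search granularities, state transition patterns, and key indexes described later; an operation input unit 103 for inputting or designating a query term for retrieving a desired video scene; a video search unit 104 that, when a query term is input or designated via the operation input unit 103, refers to the state transition table in the storage unit 102 and, using the key index corresponding to the query term, retrieves from the video input through the video input unit 101 the structural units that match the search granularity and have an event index matching the key index; a determination unit 105 that receives the structural units retrieved by the video search unit 104 and determines whether the occurrence pattern of event indexes in each structural unit matches the state transition pattern corresponding to the query term; and a search result output unit 106 that cuts out a video scene from the video based on the structural units determined to match by the determination unit 105 and outputs it as the search result.
【００５０】
The determination unit 105 has a first determination unit 105a, which determines whether all the event indexes contained in the state transition pattern corresponding to the query term are present in the structural unit retrieved by the video search unit 104, and a second determination unit 105b, which determines, for structural units determined by the first determination unit 105a to contain all of them, whether the occurrence pattern of event indexes in the structural unit matches the state transition pattern corresponding to the query term.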
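【００５１】
(2) Relationship between structure indexes and event indexes
Next, the structured video that the video search processing device (video search method) of the invention searches is described. Structured video carries structure indexes, for dividing the video into semantic units, and event indexes, for identifying the content and location of events occurring in the video.
【００５２】
Figs. 2 and 3 are explanatory diagrams showing the relationship between structure indexes and event indexes. In Fig. 2, suppose for example that the video data (video scene) of the section indicated by the structure index "structure 1" represents one entire video. Below structure 1, the video has a plurality of structure indexes: structure 2-a, structure 2-b, structure 2-c, and so on. The sections indicated by these structure indexes are sections into which the entire video indicated by structure 1 is divided, and joining all of them yields exactly the section of the entire video above them.
【００５３】
Below structure 2-a there are in turn a plurality of structure indexes: structure 3-aa, structure 3-ab, structure 3-ac, and so on. Likewise, the sections they indicate divide the section indicated by structure 2-a, and joining all of them yields exactly the section of structure 2-a.
【００５４】
The section indicated by the top-level structure index, structure 1, is the structural unit representing the whole video; the sections indicated by structure 2-a, 2-b, 2-c, and so on are structural units at the structure-2 level; and the sections indicated by structure 3-aa, 3-ab, 3-ac, and so on are structural units at the structure-3 level.
【００５５】
In this way the video (structure 1) is structured using a plurality of hierarchical structural units, each structural unit being the video scene of a section delimited by structure indexes.
【００５６】
In contrast to structure indexes, which represent the structure of the video, event indexes represent events that occurred in the video. As noted above, a structure index denotes a logical section of the video, whereas an event index basically has no duration. An event index is basically attached at the place in the video where the event occurred, as information indicating the content of that event. For example, event indexes 3aa-1 to 3aa-4 may be attached at the times the events occur along the flow of the video (its change along the time axis), as in Fig. 3, or may be attached to the structure index 3aa in which the events occurred, as in Fig. 2. Although a detailed description is omitted, the structured index information consisting of structure indexes and event indexes can of course be stored and managed separately from the actual video scenes.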
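As an illustration of the hierarchy described in section (2), the following is a minimal sketch, not the patent's implementation. The class and field names (`StructuralUnit`, `EventIndex`, `children_tile_parent`) and the time values are hypothetical; the only properties taken from the text are that child sections joined together equal the parent section and that an event index marks a point rather than an interval.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class EventIndex:
    name: str    # what happened (label is hypothetical)
    time: float  # where it happened; an event index has no duration


@dataclass
class StructuralUnit:
    level: str   # e.g. "structure 1", "structure 2" (labels are assumptions)
    start: float
    end: float
    children: list[StructuralUnit] = field(default_factory=list)
    events: list[EventIndex] = field(default_factory=list)

    def children_tile_parent(self) -> bool:
        """True if the child sections, joined in order, equal this section."""
        if not self.children:
            return True
        pos = self.start
        for child in self.children:
            if child.start != pos:
                return False
            pos = child.end
        return pos == self.end


# Hypothetical two-level video: structure 1 split into two structure-2 units.
video = StructuralUnit("structure 1", 0.0, 90.0, children=[
    StructuralUnit("structure 2", 0.0, 40.0, events=[EventIndex("3aa-1", 12.5)]),
    StructuralUnit("structure 2", 40.0, 90.0),
])
```

Keeping the events on the unit in which they occur corresponds to the Fig. 2 style of attachment; storing only `time` and looking the unit up afterwards would correspond to the Fig. 3 style.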
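【００５７】
(3) Structure of the state transition table
Next, the structure of the state transition table, a key element of the invention, is described in detail. To search for a desired video scene using a highly abstract meaning (term), the invention defines in advance a state transition pattern that expresses the term by means of the structural information of the video (structure indexes and event indexes). This processing corresponds to the state transition table generation step of the invention.
【００５８】
As shown in Fig. 4, the state transition table associates the following items (3-1) to (3-4).
(3-1) Term:
A highly abstract search term, set as a character string expressing the meaning of the corresponding video scene. Whereas a term of fragmentary meaning used as an event index can be associated with a single event in the video, a highly abstract term (search term) cannot be associated with a single event alone and is expressible by a combination of a plurality of events. In other words, the meaning of a highly abstract term (search term) is established only once the occurrence of a plurality of events in some section is known.
【００５９】
(3-2) Search granularity:
The structural unit suitable as the unit of search when searching the video. Setting the smallest structural unit in which the meaning of the term (3-1) can hold as the search granularity narrows the search to a small range and makes it efficient.
(3-3) State transition pattern:
An event list corresponding to the term (3-1), defined, based on the occurrence pattern of events that expresses the term, as an input sequence of a plurality of event indexes occurring successively within the search granularity (3-2).
(3-4) Key index:
At least one of the event indexes in the state transition pattern (3-3), designated as the key (central event) that serves as the starting point of the search.
【００６０】
That is, the state transition table expresses the definition of the meaning of video content as a tuple of a list of event indexes within a logical unit of the structure (search granularity), a character string expressing the meaning, and a key index for efficient search. Searching consists of finding (parsing) the state transition patterns of this table in video structured by indexes.
【００６１】
Referring further to Fig. 4: for the string expressing a meaning, "××××・・×", the term is first expressed as an input sequence of a plurality of events, and the event list "event 1, event 2, event 3" is set. As shown in Fig. 5, this event list indicates that, in state A, event 1 occurs, then event 2, then event 3, and the state moves to state B.
【００６２】
Once the event list for the term has been created, the event among events 1 to 3 that most symbolically represents the term, or that is best suited to identifying it, is designated as the key index. Here event 1 is designated as the key index. Next, among the structural units of the structured video, the smallest structural unit in which events 1 to 3 can occur with the meaning of the term is selected and set as the search granularity. Here it is written as granularity 1.
【００６３】
Similarly, for the string expressing a meaning, "００００・・０", the term is first expressed as an input sequence of a plurality of events, and the event list "event 1, event 2, event 4" is set. As shown in Fig. 5, this event list indicates that, in state A, event 1 occurs, then event 2, then event 4, and the state moves to state C.
【００６４】
Once the event list for the term has been created, the event among events 1, 2, and 4 that most symbolically represents the term, or that is best suited to identifying it, is designated as the key index. Here event 4 is designated as the key index. Next, the smallest structural unit in which events 1, 2, and 4 can occur with the meaning of the term is selected and set as the search granularity. Here it is written as granularity 2.
【００６５】
As Fig. 4 shows, granularity 1 and granularity 2, set as search granularities for different terms, may be the same structural unit. It suffices to choose the best search granularity for each term, and the same structural unit may appear as the search granularity of many terms.
【００６６】
When searching for the video content meant by a term using the state transition table, restricting the search to structural units of the term's search granularity improves search efficiency. Moreover, by first parsing the matching events with the key index and then checking whether the event list holds in the structural units of the designated search granularity, fast and efficient search is achieved.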
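The table of Fig. 4 and the two-stage narrowing just described can be sketched as follows. This is an illustrative sketch only: the names `TableRow`, `candidate_units`, and `event_list_holds`, the placeholder term and granularity labels, and the dictionary layout of a structural unit are all assumptions. The order check lets other events intervene, which is an assumed reading based on the remark in section (4) that ignorable event indexes may be omitted from a pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRow:
    term: str                 # (3-1) highly abstract term
    granularity: str          # (3-2) search granularity
    events: tuple[str, ...]   # (3-3) state transition pattern as an event list
    key_index: str            # (3-4) key index


# The two rows described for Fig. 4, with placeholder labels.
TABLE = [
    TableRow("term-1", "granularity-1", ("event1", "event2", "event3"), "event1"),
    TableRow("term-2", "granularity-2", ("event1", "event2", "event4"), "event4"),
]


def candidate_units(row, units):
    """Keep only units at the row's granularity that contain its key index."""
    return [u for u in units
            if u["granularity"] == row.granularity and row.key_index in u["events"]]


def event_list_holds(row, unit):
    """True if the row's events occur in this order within the unit.

    Other events may occur in between (assumed reading).
    """
    remaining = iter(unit["events"])
    return all(e in remaining for e in row.events)
```

Filtering on granularity and key index first means the more expensive order check runs only on the few units that can possibly match.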
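【００６７】
(4) Example definitions of state transition patterns (state transition diagrams)
As described above, in the invention the highly abstract terms (and concepts) that people use in searching are defined in advance as path regular expression patterns of state transitions (state transition patterns).
【００６８】
The preconditions are as follows.
* Event indexes are attached to the video, and the attachment of an event index acts as an input symbol that causes a transition to a new state.
* An event index is defined to have no temporal extent.
* The video data between two event indexes of interest is defined as a scene.
* Scenes flow along time.
【００６９】
This flow of scenes, that is, the state transitions, can be represented by a directed graph labelled with event indexes. The nodes of the graph represent scenes. Each scene carries, as attributes, parameter values describing the situation. For video recording a baseball game, for example, these attributes include the score, the positions and names of the fielders, the batter's name, and the SBO (strike, ball, out) count.
【００７０】
Searching means finding state transition patterns in indexed video. The portion of video found (scenes or indexes) is the search result. When one matching scene sequence contains another, the shortest flow of scenes among them is taken as the search result.
【００７１】
As a path regular expression for a state transition pattern, the case where "scene s0 transitions to scene s1 by event index I" is written "s0-I->s1".
"." denotes any scene and any event index.
"-.->" is abbreviated "→".
A transition between two temporally consecutive scenes is written "==>".
"*" denotes zero or more repetitions and "+" denotes one or more repetitions.
【００７２】
For example, the flow of scenes that starts at scene s0 and reaches scene s1 via the event index sequence "A.*B.*(C.*)*C" is expressed by the following path regular expression.
s0-A->.(→.)*-B->.(→.)*(-C->(→.)*)*-C->s1
This path expression is abbreviated as follows.
s0-ABC+->s1
【００７３】
Strictly speaking, the regular expressions "A.*B.*(C.*)*C" and "ABC+" do not express the same thing, but many event indexes can be ignored and writing them out each time is tedious, so this abbreviation is used. Fig. 6 shows the state transition diagram of the finite state automaton corresponding to this path regular expression.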
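The relationship between the abbreviated form "ABC+" and the full event sequence "A.*B.*(C.*)*C" can be tried out with an ordinary regular expression engine. The sketch below assumes that every event index is encoded as a single character and that a candidate scene sequence is given as a string of such characters; the function name `expand_abbrev` is hypothetical.

```python
import re


def expand_abbrev(abbrev: str) -> re.Pattern:
    """Expand an abbreviated event pattern such as 'ABC+' into a regex
    that tolerates ignorable events between the named ones."""
    parts = []
    i = 0
    while i < len(abbrev):
        sym = re.escape(abbrev[i])
        if i + 1 < len(abbrev) and abbrev[i + 1] == "+":
            # one or more occurrences, with ignorable events in between
            parts.append(f"(?:{sym}.*)*{sym}")
            i += 2
        else:
            parts.append(sym)
            i += 1
    # ignorable events are also allowed between consecutive named events
    return re.compile(".*".join(parts))


pattern = expand_abbrev("ABC+")
print(pattern.pattern)                      # A.*B.*(?:C.*)*C
print(bool(pattern.fullmatch("ADBDCDC")))   # True: D events are ignored
print(bool(pattern.fullmatch("ACB")))       # False: wrong order
```

`fullmatch` is used so that the candidate sequence must start with A and end with C, mirroring a path that begins at s0 and ends at s1.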
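【００７４】
Suppose, as shown in Fig. 7, that event indexes such as A, B, C, and D are attached to a video (video scenes). Searching this video for the scenes s0 and s1 of Fig. 6 yields s0 and s1 as shown in Fig. 7.
【００７５】
Adding a condition expression to the state transition pattern (state transition diagram) makes the search more efficient by exploiting structural information.
For example, the condition expression is written in brackets [] after the state transition pattern.
"s0-ABC+->s1 [at-bat(s0) is the same as at-bat(s1)]"
【００７６】
Example using structural information:
Find the scenes in which the home team comes from behind to take the lead, with the resulting scene sequence extended to the end of that inning.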

【００７７】
Here, the definitions following def are simple textual substitutions; values are actually evaluated when a scene environment such as s0.x is set.
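【００７８】
(5) Example of a concrete video search algorithm
Before describing the concrete video search algorithm, the processing performed before the search is confirmed.
It is assumed that the state transition table has already been stored in the state transition table storage unit 102 of the video search processing device 100 through the state transition table generation step described above.
【００７９】
Fig. 8 is a flowchart showing the video search algorithm of Embodiment 1. To search for a desired portion of video with the video search processing device 100 of Fig. 1, the user inputs the video to be searched into the device via the video input unit 101. If the magnetic disk of the device serves as the video input unit 101, the user need only designate the video to be searched.
【００８０】
The processing at search time is as follows.
First, when the user inputs, via the operation input unit 103, a term expressing the desired video scene, the key index corresponding to that term is obtained from the state transition table as the starting point of the search (S801).
【００８１】
Next, event indexes matching the key index are searched for, yielding a set of key indexes (S802).
For the set of key indexes obtained in step S802, the structure instances (at the search granularity) that contain each key index are obtained from the structural constraint (state transition pattern) specified in the state transition table (S803).
【００８２】
Then, for one structure instance (a structural unit matching the search granularity) (S804), it is determined whether all the event indexes contained in the state transition pattern are present (S805); structure instances that lack any of them are not processed further.
【００８３】
If a structure instance contains several key indexes, those key indexes are removed from the set of key indexes (S806). Then, for the structural units determined to contain all the event indexes, it is determined whether the state transition holds, in other words whether the occurrence pattern of event indexes in the structural unit matches the state transition pattern corresponding to the input term (S807); if it holds, the meaning designated by the highly abstract term is judged to hold (S808). At this point, as needed, a new event index corresponding to the established meaning is added to the structure instance, or the information is attached as an attribute of an existing event index.
【００８４】
Steps S803 to S808 are then repeated for every key index in the set obtained (S809).
【００８５】
After that, the structure instances (structural units) obtained are returned as the return value (S810), and video scenes are cut out as the search result (S811). The cut-out may be the search granularity (structural unit) designated by default; alternatively, any structural unit containing that portion, or a cut with offsets specified before and after the key index, can be designated.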
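The flow of steps S801 to S810 can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the flowchart of Fig. 8 itself: the dictionary layouts, the function names `search` and `in_order`, and the English event labels are hypothetical; a "+" repetition is reduced to "at least one occurrence"; and the loop runs over structural units rather than over key-index hits, so a unit containing several key indexes is visited only once, which plays the role of S806.

```python
def in_order(pattern, events):
    """True if `pattern` occurs as an ordered subsequence of `events`."""
    remaining = iter(events)
    return all(e in remaining for e in pattern)


def search(term, table, units):
    """Return the structural units in which `term` holds.

    table: {term: {"granularity": str, "pattern": [event, ...], "key": event}}
    units: list of {"granularity": str, "events": [event, ...]} in time order
    """
    row = table[term]                                        # S801
    hits = []
    for unit in units:
        if unit["granularity"] != row["granularity"]:        # S803: search granularity
            continue
        if row["key"] not in unit["events"]:                 # S802: key index present
            continue
        if not set(row["pattern"]) <= set(unit["events"]):   # S805: first determination
            continue
        if in_order(row["pattern"], unit["events"]):         # S807: second determination
            hits.append(unit)                                # S808: the meaning holds
    return hits                                              # S810


table = {"timely hit": {"granularity": "at-bat", "pattern": ["hit", "run"], "key": "hit"}}
units = [
    {"granularity": "at-bat", "events": ["hit", "run", "run"]},
    {"granularity": "at-bat", "events": ["run", "hit"]},
    {"granularity": "at-bat", "events": ["hit"]},
    {"granularity": "inning", "events": ["hit", "run"]},
]
print(search("timely hit", table, units) == [units[0]])   # True
```

The cheap set-inclusion test (first determination) rejects units such as the third one before the order-sensitive check (second determination) is attempted.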
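【００８６】
As one application of search, a procedure that generates a commentary (explanatory sentence) describing the meaning can be defined and executed whenever parsing the search condition finds a pattern matching a state transition pattern, so that commentary is generated automatically. That is, character strings describing the video can be generated using the definitions of the meaning of video content (state transition patterns) specified in the state transition table. Merely enumerating the fragmentarily attached indexes yields only a list of terms, but using state transition patterns yields text that is easy for the user to understand.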
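【００８７】
(6) Concrete operation example
A concrete operation example using the above configuration and video search algorithm is described. Fig. 9 is an explanatory diagram showing example video indexes (structure indexes and event indexes) for baseball video. The structure indexes divide the video hierarchically, from the start of the game, into structural units such as inning, top and bottom half, at-bat, and pitch. Such a structure is set in advance as a profile when the video indexes are defined.
【００８８】
Event indexes such as hit, out, and home run are also attached as needed. For video structured by video indexes in this way, with descriptive events set fragmentarily, state transition patterns like that of Fig. 6 are prepared. For example, at at-bat granularity (structural units at the at-bat level), a hit followed by one or more run-scored events means a "timely hit". An out or fly event followed by a run-scored event means a "sacrifice".
【００８９】
The event defined as the central event (key index) is the index that serves as the clue for the search. The video is searched using this key index, and if the state transition pattern is found at the designated granularity, that unit becomes a candidate search result.
【００９０】
State transitions can also be defined for the effect an event index has on the environment parameters describing the situation in the video. In baseball, for example, when a run-scored event occurs at at-bat granularity, the change between the score just before the at-bat begins and the score just after it ends can be expressed as meanings such as "opening score", "tying", or "reversal" (coming from behind to take the lead).
【００９１】
Concrete examples of state transition patterns follow.
* Timely hit
s0 => at-bat-in => s0' -hit・run+→ s1 => at-bat-in => s1'
* Sacrifice
s0 => at-bat-in => s0' -out・run+→ s1 => at-bat-in => s1'
* Double play
s0 => at-bat-in => s0' -out・out→ s1 => at-bat-in => s1'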
* Reversal
s0 => at-bat-in => s0' -run+→ s1 => at-bat-in => s1'
[s0.home_score > s0.away_score && s1'.home_score < s1'.away_score ||
s0.home_score < s0.away_score && s1'.home_score > s1'.away_score]
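The condition attached to the reversal pattern compares the score environment of the scene before the at-bat with that of the scene after it. The sketch below takes a reversal to mean that the leading side differs before and after the at-bat, which is an assumed reading of the bracketed condition; the names `Scene`, `home_score`, `away_score`, and `is_reversal` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    home_score: int
    away_score: int


def is_reversal(before: Scene, after: Scene) -> bool:
    """True if the side that was behind before the at-bat leads after it."""
    home_led = before.home_score > before.away_score
    away_led = before.home_score < before.away_score
    return ((home_led and after.home_score < after.away_score) or
            (away_led and after.home_score > after.away_score))


print(is_reversal(Scene(1, 2), Scene(3, 2)))   # True: home was behind, now leads
print(is_reversal(Scene(2, 1), Scene(3, 1)))   # False: home extends its lead
print(is_reversal(Scene(1, 1), Scene(2, 1)))   # False: breaking a tie is not a reversal
```

Because the predicate only reads scene attributes, it can be evaluated after the event pattern itself (one or more run-scored events within the at-bat) has matched.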
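【００９２】
With state transition patterns like these in place, searches on highly abstract meanings can be handled. Put the other way round, merely defining these state transition patterns enables search based on the meaning of the video content without depending on the structure of the video's index information or on its event definitions. That is, a uniform query environment can be provided for video index information as semi-structured data (video indexes having the semi-structured property that the index structure differs from author to author and is not fixed).
【００９３】
For automatic generation of commentary (explanatory sentences), the state transition table expressing a highly abstract meaning also defines where the data that becomes the subject is taken from (string definition information). Combining that subject, the meaning, and any meaning arising from a change in the situation, joined by connectives, yields commentary that reads naturally to the user. For example, for the term "timely hit", the generation condition of the explanatory sentence (string definition information) is specified as follows.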
【００９４】

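【００９５】
Here, <index.attribute-name> denotes an attribute of the designated video index. [index] denotes the number of times the index occurred within the designated granularity. "term" indicates text that is written when the highly abstract term inside it holds.
【００９６】
The explanatory sentence produced when a search result is obtained under the above specification takes a form such as
"Bottom of the 1st: 高橋's two-run timely hit turns the game around" (１回裏高橋の２点タイムリーで逆転)
.
【００９７】
During real-time authoring, while indexes are being assigned, whenever the key index (central event) of one of the above state transition patterns occurs, it is checked whether the pattern is satisfied before and after it within the designated granularity. When events satisfying the state transition pattern have occurred in succession, the meaning defined there is deemed to have occurred, and processing such as adding a new index, or adding the information as an attribute of the key index, is performed.
【００９８】
For example, if within at-bat granularity a hit is followed by one or more run-scored indexes, the hit is judged to have been timely, and either a "timely hit" index (abstract index) is newly added or "timely" is added as an attribute of the hit index (event index).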
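【００９９】
The "timely hit" index (abstract index) may also be set in the state transition table in advance as a key index.
【０１００】
The video search method and video indexing method of Embodiment 1 described above are realized by having a computer execute a program prepared in advance in accordance with the foregoing description and the procedures shown in the flowcharts. The program is recorded on a computer-readable recording medium such as a hard disk, floppy disk, CD-ROM, MO, or DVD, and is executed by being read from the recording medium by the computer. The program can also be distributed via such a recording medium or via a network.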
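【０１０１】
(Embodiment 2)
The video content explanatory sentence generation device of Embodiment 2 is basically an application of the search performed by the video search processing device 100 of Embodiment 1. It defines and executes a procedure that, when parsing a search condition finds in the video (in a structural unit) a pattern matching a state transition pattern, generates a commentary (explanatory sentence) describing its meaning, and thereby generates explanatory sentences for video content automatically. That is, it generates character strings describing the video content using the definitions of the meaning of video content (state transition patterns) specified in the state transition table.
【０１０２】
However, explanatory sentence generation need not be executed only as post-processing or as an additional function of the video search in the video search processing device 100. For any video to which the structure indexes and event indexes described above have been assigned and which is structured using a plurality of hierarchical structural units, each being the video scene of a section delimited by the structure indexes, explanatory sentence generation can be performed as an independent function, separately from video search. To express the method clearly, the device to which it is applied is here called the video content explanatory sentence generation device 200.
【０１０３】
In Embodiment 2, the structured index information consisting of structure indexes, event indexes, and abstract indexes is managed separately from the actual video scenes, and explanatory sentences are generated by inputting the index information. Since processing is decoupled from the data-heavy video scenes, the load on the device is reduced and explanatory sentences can be generated at high speed.
【０１０４】
The video content explanatory sentence generation device 200 of Embodiment 2 is described below in the following order:
(7) Configuration of the video content explanatory sentence generation device
(8) Relationship between event indexes and abstract indexes
(9) Structure of the state transition table of Embodiment 2
(10) Example algorithm for generating explanatory sentences for video content
(11) Concrete examples of explanatory sentences generated using the syntax and string definition information
【０１０５】
(7) Configuration of the video content explanatory sentence generation device
Fig. 10 shows the configuration of the video content explanatory sentence generation device of Embodiment 2: Fig. 10(a) shows an example hardware configuration of the device 200, and Fig. 10(b) shows its functional block diagram.
The hardware of the device 200 need only have at least a CPU (central processing unit), a display, a keyboard, and a magnetic disk; for example, a personal computer such as that shown in Fig. 10(a) can be used.
【０１０６】
As shown in Fig. 10(b), the device 200 comprises: a video input unit 201 for inputting the structured video for which explanatory sentences are to be generated; a state transition table storage unit 202 storing, as a state transition table, the terms, generation granularities, state transition patterns, string definition information, and key indexes described later; an operation input unit 203 for inputting the various designations needed for generating explanatory sentences; a term search unit 204 that, when the range of video scenes (structural unit) for which an explanatory sentence is to be generated is designated via the operation input unit 203, refers to the state transition table in the storage unit 202, searches for generation granularities matching the target structural unit, and uses them to search for the terms that hold in the structural unit; an explanatory sentence generation unit 205 that generates the explanatory sentence for the video scene using the string definition information of the state transition patterns of the terms found by the term search unit 204 (the terms that hold); and an explanatory sentence display unit 206 that displays the explanatory sentence generated by the generation unit 205.
【０１０７】
The device 200 of Embodiment 2 generates explanatory sentences using the state transition table, in which the information used for generation is set, and the character strings, or attribute information convertible to character strings, attached in advance to the structure indexes and event indexes.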
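【０１０８】
(8) Relationship between event indexes and abstract indexes
Next, the structured video for which the device 200 of Embodiment 2 generates explanatory sentences is described. The structured video used in Embodiment 2 carries structure indexes, for dividing the video into semantic units; event indexes, for identifying the content and location of events occurring in the video; and abstract indexes, which indicate that the meaning of a particular term holds when the state transition pattern defining that term is satisfied.
【０１０９】
The relationship between structure indexes and event indexes is the same as described in "(2) Relationship between structure indexes and event indexes" of Embodiment 1, so the relationship between event indexes and abstract indexes is described here. The generation granularity is the structural unit, defined by structure indexes, that is designated as the unit of video for generating an explanatory sentence.
【０１１０】
An event index is an index for identifying the content and location of an event occurring in the video. In other words, it is attached in association with a single event in the video and has, as attribute information, fragmentary information indicating the content (meaning) of the event.
【０１１１】
An abstract index is an index indicating a meaning expressed by a combination of a plurality of events (that is, a highly abstract term as described in Embodiment 1). Its meaning is established only once the occurrence of a plurality of events in some section is known, and it has a meaning expressible by the occurrence pattern of a plurality of event indexes.
【０１１２】
In Embodiment 2, a state transition pattern is defined as a highly abstract term expressed, based on the occurrence pattern of events (event indexes), as an input sequence of a plurality of event indexes occurring successively within the generation granularity.
【０１１３】
In other words, an abstract index has a meaning expressible by the occurrence pattern of a plurality of event indexes, and a state transition pattern defines a term as an input sequence of a plurality of event indexes. One abstract index and one state transition pattern therefore correspond one to one, with the input sequence (or occurrence pattern) of event indexes as their common constituent. An abstract index can thus stand for the state transition pattern (input sequence of event indexes) to which it corresponds, so a state transition pattern expressed by many event indexes can be expressed with fewer indexes by using abstract indexes together with event indexes. Expressing state transition patterns with both abstract and event indexes also makes it easier to define terms of higher abstraction.
【０１１４】
An abstract index can be attached to the video only when its state transition pattern holds; if an abstract index is attached to the video, the corresponding state transition pattern holds.
【０１１５】
Next, an example of defining a state transition pattern with an abstract index is described with reference to Figs. 11(a) to (c). Suppose, as shown in Fig. 11(a), that the state transition pattern of a term W is defined by the input sequence of event indexes event 1 → event 2 → event 3 → event 8 → event 12.
【０１１６】
Suppose also, as shown in Fig. 11(b), that the state transition pattern of a term Z is defined by the input sequence of event indexes event 1 → event 2 → event 3 → event 8 → event 12 → event 13.
【０１１７】
In this case, if the abstract index W corresponding to term W is defined as expressing the state transition pattern of term W (event 1 → event 2 → event 3 → event 8 → event 12), the state transition pattern of term Z can be written more simply using abstract W (abstract index W) and event 13 (event index), as shown in Fig. 11(c).
【０１１８】
Allowing abstract indexes as well as event indexes in the definition of state transition patterns makes it easier to define the state transition patterns of more complex terms (terms for more abstract concepts).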
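The simplification of Fig. 11 can be checked mechanically: a pattern that mentions an abstract index expands back into the full event sequence. In this sketch the `abstract:` prefix used to mark abstract indexes, the dictionary `PATTERNS`, and the function `expand_pattern` are hypothetical conventions chosen for illustration.

```python
# Patterns for terms W and Z as described for Fig. 11(a) and 11(c).
PATTERNS = {
    "W": ["event1", "event2", "event3", "event8", "event12"],
    "Z": ["abstract:W", "event13"],
}


def expand_pattern(term, patterns):
    """Replace every abstract index in a pattern by the events it stands for."""
    expanded = []
    for item in patterns[term]:
        if item.startswith("abstract:"):
            # an abstract index corresponds one to one to another term's pattern
            expanded.extend(expand_pattern(item.split(":", 1)[1], patterns))
        else:
            expanded.append(item)
    return expanded


print(expand_pattern("Z", PATTERNS))
# ['event1', 'event2', 'event3', 'event8', 'event12', 'event13']
```

The expansion is recursive, so a term defined through Z would expand through W as well; a definition that refers to itself would not terminate, and this sketch does not guard against that.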
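【０１１９】
(9) Structure of the state transition table of Embodiment 2
Fig. 12 shows an example structure of the state transition table of Embodiment 2. The table contains terms, generation granularities, state transition patterns, key indexes, and string definition information.
【０１２０】
A term is a term expressible by a combination of a plurality of events, as described in Embodiment 1.
The generation granularity is the structural unit suitable as the unit of video for generating an explanatory sentence. A structural unit defined by structure indexes can be designated. That is, the generation granularity is the same thing as the search granularity of Embodiment 1.
【０１２１】
A state transition pattern defines a term, based on the occurrence pattern of events that expresses it, as an input sequence of a plurality of event indexes occurring successively within the generation granularity. As described above, a state transition pattern can be expressed using abstract indexes as well as event indexes. In Fig. 12, for example, the state transition pattern of term C expressed with event indexes alone is the input sequence "event 1 → event 4 → event 6 → event 7 → event 8", but using the abstract index of term B (abstract B) it can be expressed as the input sequence "abstract B → event 7 → event 8".
【０１２２】
The key index is set by selecting at least one of the event indexes in the state transition pattern. In Embodiment 2, the abstract index representing the corresponding term, in other words the abstract index representing the whole state transition pattern, is also set as one of the key indexes.
【０１２３】
The string definition information specifies the syntax elements used to generate the explanatory sentence and the source of the character string used for each syntax element. Attribute information of a structure index or event index is set as the source of the character string for a syntax element; for example, the content of the attribute ○○○ of event 1 (event index) is designated as the source. As described later, the source may be given simply by naming the kind of attribute information, or, as with the syntax element (What) of term C, a character string (××××××) may be written directly between quotation marks and used as the source.
【０１２４】
The syntax elements are based on the 5W1H: "when (When), where (Where), why (Why), whose (Who), by what (What), and how it turned out (How)". The string definition information may contain all of the 5W1H syntax elements or only some of them. Other syntax elements can also be set.
【０１２５】
Here, the state transition table of Fig. 12 is regarded as a table in which the abstract index representing each term is defined, and an example of defining abstract indexes for baseball video is described concretely.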
例えば、用語『１塁打』の抽象インデックスを定義すると、

と記述することができる。
【０１２７】
なお、上記の記述は以下のルールで作成されている。
「＆ＡＢＳＩＮＤＥＸ文字列」：
＆ＡＢＳＩＮＤＥＸは抽象インデックスの宣言子（識別子）であり、後続の『文字列』が抽象インデックスの名称（用語）および意味が『文字列』であることを示している。
「＆ＲＡＮＧＥ文字列」：
＆ＲＡＮＧＥは構造単位（生成粒度）の宣言子であり、後続の『文字列』で構造単位（生成粒度）を指定している。
【０１２８】
「＆ＰＡＴＴＥＲＮ文字列」：
＆ＰＡＴＴＥＲＮは、状態遷移パターンの宣言子であり、後続の『文字列』で状態遷移パターンを表現している。例えば、「＆ＰＡＴＴＥＲＮヒット」は、状態遷移パターンが１つの事象（ヒットの事象インデックス）で構成されていることを示している。
「＆ＫＥＹ文字列」：
＆ＫＥＹはキーインデックスの宣言子であり、後続の『文字列』でキーインデックスを指定している。なお、キーインデックスとしては、後続の『文字列』の他に、「＆ＡＢＳＩＮＤＥＸ文字列」で宣言されている抽象インデックスそのものが自動的に指定される。
【０１２９】
「＆ＥＸＰ＜文字列＞」：
＆ＥＸＰは、文字列定義情報の宣言子であり、後続の『＜文字列＞』で構文要素およびその入力元を定義している。例えば、＜Ｗｈｅｎ：ｉｎｎｉｎｇ＿ｔｉｍｅ，Ｗｈｏ：ｂａｔｔｅｒ＿ｎａｍｅ，Ｗｈａｔ：”１塁打”＞の場合、構文要素（Ｗｈｅｎ）は属性情報（ｉｎｎｉｎｇ＿ｔｉｍｅ）を入力元とし、構文要素（Ｗｈｏ）は属性情報（ｂａｔｔｅｒ＿ｎａｍｅ）を入力元とし、構文要素（Ｗｈａｔ）は“１塁打”（“，”の間の文字を入力することを示す）を入力元とすることを示している。
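A declaration written with these declarators can be read into a small dictionary as sketched below. The concrete layout is an assumption: one declarator per line, a space between the declarator and its string, and no commas inside quoted literals. The sample text is likewise hypothetical; its `&PATTERN` and `&EXP` lines follow the examples given in the rules, while the `&RANGE` and `&KEY` values are placeholders chosen for illustration. The function name `parse_abstract_index` is not from the source.

```python
import re


def parse_abstract_index(text: str) -> dict:
    """Parse one abstract-index declaration into a dictionary."""
    decl = {}
    for line in text.strip().splitlines():
        m = re.match(r"&(\w+)\s*(.*)", line.strip())
        if not m:
            continue
        name, body = m.group(1), m.group(2).strip()
        if name == "EXP":
            # <When: a, Who: b, What: "c"> -> {"When": "a", "Who": "b", "What": '"c"'}
            fields = body.strip("<>").split(",")
            decl["EXP"] = {k.strip(): v.strip()
                           for k, v in (f.split(":", 1) for f in fields)}
        else:
            decl[name] = body
    # the declared abstract index itself is automatically a key index
    decl["KEYS"] = [decl["ABSINDEX"]] + ([decl["KEY"]] if "KEY" in decl else [])
    return decl


SAMPLE = """
&ABSINDEX single
&RANGE at-bat
&PATTERN hit
&KEY hit
&EXP <When: inning_time, Who: batter_name, What: "single">
"""

decl = parse_abstract_index(SAMPLE)
print(decl["KEYS"])         # ['single', 'hit']
print(decl["EXP"]["Who"])   # batter_name
```

Quoted values keep their quotation marks after parsing, so later stages can tell a literal such as `"single"` from the name of an attribute.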
【０１３０】
Other example definitions of abstract indexes are shown below.

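【０１３１】
(10) Example algorithm for generating explanatory sentences for video content
Next, the algorithm for generating explanatory sentences is described with reference to the flowchart of Fig. 13. To generate an explanatory sentence with the device 200 of Embodiment 2, the video for which the sentence is wanted, that is, the video of the target structural unit (the index information alone suffices), is first input via the operation input unit 203 and the video input unit 201 (S1301).
【０１３２】
The video of the target structural unit may be input, for example, by having the user designate, via the operation input unit 203, the range of video scenes for which a sentence is wanted, so that the video (index information) is input in the structural unit corresponding to that range; or by setting a particular structural unit in advance as the range for generation and automatically cutting out and inputting the video of that structural unit. With the former, the user can select a desired video scene and have a sentence generated for that scene alone; with the latter, sentences can be generated automatically and continuously for each such structural unit.
【０１３３】
Next, the term search unit 204 refers to the state transition table in the storage unit 202 and searches for a generation granularity matching the structural unit of the video for which a sentence is to be generated (S1302). It determines whether a matching generation granularity was found (S1303); if so, processing proceeds to step S1304, and if not, to step S1307. The term search unit 204 searches the state transition table in order from the top, and moves from step S1302 to step S1303 either when no more matching generation granularities remain or each time a matching one is found.
【０１３４】
Next, the term search unit 204 determines whether an event index matching the key index of the state transition pattern corresponding to that generation granularity is present in the target structural unit (S1304). If not, processing returns to step S1302 to search for the next matching generation granularity. If so, it is determined whether the occurrence pattern of event indexes in the target structural unit matches the input sequence of event indexes of the state transition pattern (S1305).
【０１３５】
If the patterns match in step S1305, the term defined by the state transition pattern is judged to hold, the abstract index of that term is attached to the video (index information) input in step S1301 (S1306), and processing returns to step S1302. Attaching the abstract index of the term to the video allows it to be used, for example, when an explanatory sentence is generated from the same video again.
【０１３６】
Steps S1302 to S1306 are executed until no matching generation granularity remains in the state transition table. In other words, for every matching generation granularity in the table it is determined whether its term holds, and the set of terms that hold is extracted.
【０１３７】
When it is determined in step S1303 that no matching generation granularity remains, the explanatory sentence generation unit 205 generates an explanatory sentence using the string definition information of the terms that hold (S1307), displays it on the explanatory sentence display unit 206, and ends the processing. A syntax based on the 5W1H is prepared in the generation unit 205 in advance. The generation unit 205 generates the explanatory sentence by placing the syntax elements of the string definition information into this syntax.
【０１３８】
The above algorithm was described on the assumption that the key index is an event index. Determining whether a term holds can be made faster by adding, after step S1303, processing that preferentially uses the abstract index as the key index to determine whether the corresponding abstract index (key index) is attached to the target structural unit, and, if it is, judges that the term defined by the corresponding state transition pattern holds. If no abstract index is present, step S1304 and the subsequent steps are executed using the event index designated as the key index.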
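The loop of steps S1302 to S1307, including the abstract-index shortcut, might look as follows. This is a sketch under several assumptions: the dictionary layouts, the function names, and the English labels are hypothetical; a "+" repetition is reduced to "at least one occurrence"; the second table row ("big inning") is an invented placeholder that only serves to show granularity filtering; and, since the sentence template itself is not reproduced in the text, the sentence is simply assembled by joining the available 5W1H elements with spaces.

```python
def holds(pattern, events):
    """True if `pattern` occurs as an ordered subsequence of `events`."""
    remaining = iter(events)
    return all(e in remaining for e in pattern)


def generate(unit, table):
    """Generate an explanatory sentence for one structural unit."""
    elements = {}
    for row in table:
        if row["granularity"] != unit["granularity"]:        # S1302 / S1303
            continue
        if row["term"] in unit["abstract"]:                  # abstract index first
            ok = True
        elif row["key"] not in unit["events"]:               # S1304
            ok = False
        else:
            ok = holds(row["pattern"], unit["events"])       # S1305
        if ok:
            unit["abstract"].add(row["term"])                # S1306
            for slot, src in row["exp"].items():
                # a quoted source is a literal; otherwise it names an attribute
                elements[slot] = src[1:-1] if src.startswith('"') else unit["attrs"][src]
    order = ["When", "Where", "Why", "Who", "What", "How"]   # S1307
    return " ".join(elements[s] for s in order if s in elements)


table = [
    {"term": "timely hit", "granularity": "at-bat", "pattern": ["hit", "run"],
     "key": "hit",
     "exp": {"When": "inning_time", "Who": "batter_name", "What": '"timely hit"'}},
    {"term": "big inning", "granularity": "inning", "pattern": ["run", "run"],
     "key": "run", "exp": {"What": '"big inning"'}},
]
unit = {"granularity": "at-bat", "events": ["hit", "run"], "abstract": set(),
        "attrs": {"inning_time": "bottom of the 1st", "batter_name": "高橋"}}
print(generate(unit, table))   # bottom of the 1st 高橋 timely hit
```

When several terms fill the same slot, this sketch lets the later table row overwrite the earlier one; choosing between duplicated syntax elements is a separate rule.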
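【０１３９】
(11) Concrete examples of explanatory sentences generated using the syntax and string definition information
Next, an example of a syntax based on the 5W1H, and examples of explanatory sentences generated using the syntax and string definition information, are given concretely. To keep the explanation simple, the following 5W1H syntax is used as the example.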

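【０１４０】
① When one term holds (Example 1)
First, the simplest case of generating an explanatory sentence is described.
For example, if the term that holds is "timely hit"
and its string definition information is
"&EXP <When: inning_time, Who: batter_name, What: "timely hit">"
then the string definition information of that term yields the three syntax elements
"When", "Who", "What"
.
【０１４１】
If the contents (character strings) of the attribute information designated as the sources of the syntax elements are
"inning_time = bottom of the 1st"
"batter_name = 高橋"
"What: "timely hit""
then the following explanatory sentence is generated from the above syntax and string definition information.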

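Here, "(none)" marks a part for which there is no corresponding information; not all of the 5W1H syntax elements need be present. In this way an explanatory sentence can be generated from the syntax elements obtained from the string definition information. If the string definition information designates syntax elements other than the 5W1H, the syntax may be adjusted as appropriate to lengthen the sentence, or several syntaxes incorporating other syntax elements may be prepared in advance and selected as appropriate.
【０１４２】
② When several terms hold (Example 2)
When several terms hold, the explanatory sentence generation unit 205 extracts the syntax elements defined in the string definition information of each term, combines them, places them at the positions of the corresponding syntax elements in the 5W1H syntax, and generates the explanatory sentence.
【０１４３】
For example, if the terms that hold are the two terms
"timely hit"
"reversal"
and their string definition information is
"&EXP <When: inning_time, Who: batter_name, What: "timely hit">"
"&EXP <How: "reversal">"
then the string definition information of those terms yields the four syntax elements
"When", "Who", "What", "How"
.
【０１４４】
If the contents (character strings) of the attribute information designated as the sources of the syntax elements are
"inning_time = bottom of the 1st"
"batter_name = 高橋"
"What: "timely hit""
"How: "reversal""
then the following explanatory sentence is generated from the above syntax and string definition information.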

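[0145]
③ When a plurality of terms are established (Example 3)
When a plurality of terms are established and some syntax elements are duplicated (for example, there is more than one "Who" element), the state transition patterns of the terms containing the duplicated elements are consulted, and the syntax element that refers to the later event index in the state transition pattern is selected.
[0146]
For example, when the established terms are
"single"
"timely hit"
that is, two terms,
and their character string definition information is
&EXP<When:inning_time, Who:batter_name, What:"single">
&EXP<When:inning_time, Who:batter_name, What:"timely hit">
then the character string definition information of the established terms yields, as syntax elements,
"When", "Who", and "What",
that is, three syntax elements, but all three are duplicated.
[0147]
Further, suppose the contents (character strings) of the attribute information designated as the input sources of the syntax elements are as follows.
For the single:
inning_time = bottom of the 1st inning
batter_name = Kiyohara
What: "single"
For the timely hit:
inning_time = bottom of the 1st inning
batter_name = Takahashi
What: "timely hit"
Here, inning_time is "bottom of the 1st inning" in both cases, so no selection is needed, but batter_name and What differ, so one of each must be selected.
[0148]
In such a case, the state transition patterns of the terms containing the duplicated syntax elements are consulted, and the syntax element that refers to the later event index in the state transition pattern is selected.
For example, when
the state transition pattern of the single is
&PATTERN hit
and the state transition pattern of the timely hit is
&PATTERN single, run scored+
the positions at which the event index (hit), the event index (single), and the event index (run scored+) are assigned within the range of the video scene are compared, and the event index assigned at the later position is identified. Here, the event index (run scored+) is assumed to be assigned at the latest position.
[0149]
The term that refers to the identified event index (run scored+) is the timely hit, so the syntax elements of the timely hit are selected as the syntax elements that refer to the later event index in the state transition pattern.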
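The selection rule of Example 3 might be sketched as below. The data layout, the function names, and the numeric positions are assumptions for illustration; the text only states that the event index (run scored+) lies at the latest position.

```python
def latest_position(pattern, positions):
    """Latest position among the event indexes a state transition pattern refers to."""
    return max(positions[item.rstrip("+")] for item in pattern)


def resolve_by_latest(candidates, positions):
    """Merge syntax elements; for duplicates keep the term whose pattern reaches latest."""
    merged, owner_pos = {}, {}
    for cand in candidates:
        pos = latest_position(cand["pattern"], positions)
        for slot, value in cand["slots"].items():
            if slot not in merged or pos > owner_pos[slot]:
                merged[slot] = value
                owner_pos[slot] = pos
    return merged


# Hypothetical positions (seconds into the scene) of the assigned event indexes.
POSITIONS = {"hit": 10.0, "single": 12.0, "run_scored": 15.0}
CANDIDATES = [
    {"term": "single", "pattern": ["hit"],
     "slots": {"When": "bottom of the 1st", "Who": "Kiyohara", "What": "single"}},
    {"term": "timely hit", "pattern": ["single", "run_scored+"],
     "slots": {"When": "bottom of the 1st", "Who": "Takahashi", "What": "timely hit"}},
]
MERGED = resolve_by_latest(CANDIDATES, POSITIONS)
```

With these inputs the pattern of the timely hit reaches position 15.0 while that of the single stops at 10.0, so the "Who" and "What" elements of the timely hit win.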
[0150]
From the above syntax and character string definition information, the following description is generated.

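[0151]
Furthermore, by adding a condition for generating a description when a plurality of terms are established, a description can also be generated with the duplicated syntax elements arranged in order of occurrence, as follows.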

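[0152]
④ When a plurality of terms are established (Example 4)
As another example of generating a description when a plurality of terms are established, when the 5W1H syntax elements include duplicates, the state transition patterns of the terms are compared, and the description is generated with priority given to the syntax elements of the state transition pattern defined using more event indexes.
[0153]
For example, when the established terms are
"timely hit"
"timely double"
"lead reversal"
that is, three terms,
and their character string definition information is
&EXP<When:inning_time, Who:batter_name, What:"timely hit">
&EXP<When:inning_time, Who:batter_name, What:"timely double">
&EXP<How:"lead reversal">
then the character string definition information of the established terms yields, as syntax elements,
"When", "Who", "What", and "How",
that is, four syntax elements, but the three elements other than "How" are duplicated.
[0154]
Further, suppose the contents (character strings) of the attribute information designated as the input sources of the syntax elements are as follows.
For the timely hit:
inning_time = bottom of the 1st inning
batter_name = Takahashi
What: "timely hit"
For the timely double:
inning_time = bottom of the 1st inning
batter_name = Takahashi
What: "timely double"
Here, inning_time is "bottom of the 1st inning" in both cases and batter_name is "Takahashi" in both cases, so no selection is needed, but What differs, so one of the two must be selected.
[0155]
In such a case, the state transition patterns of the terms containing the duplicated syntax elements are compared, and the syntax elements of the state transition pattern defined using more event indexes are selected preferentially.
For example, when
the state transition pattern of the timely hit is
&PATTERN single, run scored+
and the state transition pattern of the timely double is
&PATTERN hit, advance to second base, run scored+
the number of event indexes referenced in the state transition pattern is two for the timely hit (the event index (single) and the event index (run scored+)) and three for the timely double (the event index (hit), the event index (advance to second base), and the event index (run scored+)).
Therefore,
number of references of the timely double (3) > number of references of the timely hit (2)
and the syntax elements of the timely double are selected here.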
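The priority rule of Example 4 could be sketched as follows. The data layout and names are assumptions for illustration, and the pattern given for "lead reversal" is a placeholder because the text does not state it.

```python
def merge_preferring_longer_patterns(candidates):
    """Merge syntax elements so that terms with more event indexes overwrite the others."""
    merged = {}
    # sorted() is stable: shorter patterns are applied first, longer ones overwrite them.
    for cand in sorted(candidates, key=lambda c: len(c["pattern"])):
        merged.update(cand["slots"])
    return merged


CANDIDATES = [
    {"term": "timely hit", "pattern": ["single", "run_scored+"],
     "slots": {"When": "bottom of the 1st", "Who": "Takahashi", "What": "timely hit"}},
    {"term": "timely double", "pattern": ["hit", "advance_to_second", "run_scored+"],
     "slots": {"When": "bottom of the 1st", "Who": "Takahashi", "What": "timely double"}},
    # Placeholder pattern: the state transition pattern of this term is not given in the text.
    {"term": "lead reversal", "pattern": ["run_scored+"],
     "slots": {"How": "lead reversal"}},
]
MERGED = merge_preferring_longer_patterns(CANDIDATES)
```

Because the timely double refers to three event indexes and the timely hit to two, the duplicated "What" element ends up as "timely double", while the non-duplicated "How" element is carried over unchanged.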
[0156]
From the above syntax and character string definition information, the following description is generated.

[0157]
As another example of generating a description, the duplicated syntax elements may simply be listed in parallel, as shown below.

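[0158]
That is, the description reads "In the bottom of the 1st inning, between second and third base, because of a gust of wind, (Matsuzaka's error, Takahashi's hit) reversed the lead." Depending on the situation, simply listing the elements in parallel in this way can convey the events that occurred in the video scene more accurately.
[0159]
The video content description generating method according to Embodiment 2 described above is realized by executing a program prepared in advance on a computer, in accordance with the foregoing description and the procedures shown in the flowcharts. The program is recorded on a computer-readable recording medium such as a hard disk, floppy disk, CD-ROM, MO, or DVD, and is executed by being read from the recording medium by the computer. The program can also be distributed via such a recording medium or via a network.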
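[0160]
[Effects of the Invention]
As described above, according to the video search method of the present invention (claims 1 to 8), for each term that represents the meaning of a video scene arising from the occurrence of a plurality of events and that can be expressed by a combination of a plurality of events, a structural unit suitable as the unit to be searched is set in advance as the search granularity, and a state transition pattern corresponding to the term is defined in advance, based on the occurrence pattern of the events that express the term, as an input sequence that fixes the order of the event indexes occurring consecutively within the search granularity. When a desired video scene is searched for, a term expressing the scene is input, and, with the search granularity corresponding to the term as the unit to be searched, it is determined for each structural unit matching the search granularity whether the state transition pattern corresponding to the term matches the occurrence pattern of the event indexes in the structural unit; the matching structural units are output as the search result. Queries about the meaning of the content of a video can therefore be made using terms or concepts with a high degree of abstraction, and the search can be performed at high speed.
[0161]
Further, for each state transition pattern corresponding to a term, at least one of the event indexes in the pattern is designated as a key index that serves as the starting point of the search. When a desired video scene is searched for, structural units that match the search granularity and have an event index matching the key index are retrieved first, and it is then determined for those structural units whether the state transition pattern matches the occurrence pattern of the event indexes in the structural unit. Because the candidate structural units are narrowed down by a fast key index search (that is, a keyword search) before the state transition pattern is evaluated, video search by queries using terms or concepts with a high degree of abstraction can be made even faster.
[0162]
Further, when a desired video scene is extracted from the video based on a structural unit output as the search result, the video scene specified by the structural unit is output; that is, the video scene is output in the appropriate structural unit set in advance as the search granularity. A video scene close to the portion the user wants can therefore be output easily.
[0163]
Further, when a desired video scene is extracted from the video based on a structural unit output as the search result, any higher or lower structural unit in the structured video can be specified. Combining a query in abstract terms with the choice of the structural unit to output makes searching more convenient. This is useful, for example, when the scene the user actually wants to see lies before and/or after the structural unit retrieved by the abstract term, or lies in a part (a lower structural unit) of the retrieved structural unit.
[0164]
Further, when a desired video scene is extracted from the video based on a structural unit output as the search result, the video scene is extracted by specifying offsets for clipping the video before and after the location to which the key index is assigned, which makes searching more convenient. This is useful, for example, when the user also wants to check the situation before and/or after the structural unit retrieved by the abstract term.
[0165]
Further, when it is determined that the state transition pattern corresponding to the input term matches the occurrence pattern of the event indexes in a structural unit, an index representing the term is defined as an abstract index, newly added to the matching structural unit, and reused as a key index. When the video is searched again with the same abstract term, this key index (abstract index) can be used to make the search even faster. Also, when a structural unit output as a search result is to be checked again later, the same structural unit can be retrieved reliably using this key index (abstract index).
[0166]
Further, a plurality of pieces of attribute information are attached to the structure indexes and event indexes, and character string definition information for generating a description related to the term from that attribute information is attached to the state transition pattern corresponding to the term. A description related to the term is generated by referring, based on the character string definition information of the state transition pattern, to the attribute information in the structural unit output as the search result. The meaning of the content of the video can thus be interpreted and converted into a character string with a general meaning, producing a description of the video content that is easy for the user to understand.
[0167]
Further, because abstract indexes are used in addition to event indexes and the state transition pattern corresponding to a term is defined as an input sequence consisting of event indexes and abstract indexes, state transition patterns are easy to define and search processing can also be made faster.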
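[0168]
Further, the video search method of the present invention (claim 9) includes, as processing performed before a video search, a state transition table generation step of: setting, for each term that represents the meaning of a video scene arising from the occurrence of a plurality of events and that can be expressed by a combination of a plurality of events, a structural unit suitable as the unit to be searched as the search granularity; defining, based on the occurrence pattern of the events that express the term, a state transition pattern corresponding to the term as an input sequence that fixes the order of the event indexes occurring consecutively within the search granularity; designating, for each state transition pattern, at least one of the event indexes in the pattern as a key index that serves as the starting point of the search; and generating a state transition table consisting of the term, search granularity, state transition pattern, and key index. As processing performed during a video search, the method includes: a term input step of inputting a term expressing the desired video scene; a search step of referring to the state transition table and, with the search granularity corresponding to the input term as the unit to be searched, retrieving the structural units that have an event index matching the key index corresponding to the term; a first determination step of referring to the state transition table and determining whether all the event indexes contained in the state transition pattern corresponding to the term exist in each retrieved structural unit; a second determination step of determining, for each structural unit in which all of them exist, whether the state transition pattern corresponding to the input term matches the occurrence pattern of the event indexes in the structural unit; and a search result output step of cutting out a video scene from the video based on the structural units determined to match and outputting it as the search result. Queries about the meaning of the content of a video can therefore be made using terms or concepts with a high degree of abstraction, and because the candidate structural units are narrowed down by a fast key index search (that is, a keyword search) before the state transition pattern is evaluated, video search by such queries can be made even faster.
[0169]
Further, the computer-readable recording medium of the present invention (claim 10) records a program for causing a computer to execute the video search method according to any one of claims 1 to 9. By having a computer execute this program, queries about the meaning of the content of a video can be made using terms or concepts with a high degree of abstraction, and because the candidate structural units are narrowed down by a fast key index search (that is, a keyword search) before the state transition pattern is evaluated, video search by such queries can be made even faster.
[0170]
Further, the video search processing device of the present invention (claims 11 and 12) comprises: video input means for inputting the structured video to be searched; storage means that stores, as a state transition table, a term for designating the desired video scene, the term representing the meaning of a video scene arising from the occurrence of a plurality of events and being expressible by a combination of a plurality of events, a search granularity that designates the structural unit to be searched when video is searched with the term, a state transition pattern that defines the term as an input sequence fixing the order of the event indexes occurring consecutively within the search granularity, and a key index that designates at least one of the event indexes in the state transition pattern; operation input means for inputting or designating a query term for searching for the desired video scene; search means that, when a query term is input or designated via the operation input means, refers to the state transition table in the storage means and, using the key index corresponding to the query term, retrieves from the video input by the video input means the structural units that match the search granularity and have an event index matching the key index; determining means that receives the structural units retrieved by the search means and determines whether the state transition pattern corresponding to the query term matches the occurrence pattern of the event indexes in each structural unit; and search result output means that cuts out a video scene from the video based on the structural units determined to match and outputs it as the search result. Queries about the meaning of the content of a video can therefore be made using terms or concepts with a high degree of abstraction, and because the candidate structural units are narrowed down by a fast key index search (that is, a keyword search) before the state transition pattern is evaluated, video search by such queries can be made even faster.
[0171]
Further, according to the video index assigning method of the present invention (claim 13), a state transition table is set in advance that associates, for each term: the term itself, whose meaning is established by the consecutive occurrence of a plurality of events, which represents the meaning of a video scene arising from those events, and which can be expressed by a combination of a plurality of events; a state transition pattern that defines the term as an input sequence fixing the order of the event indexes that express it; and a search granularity that designates, for the term, a structural unit of the video defined by a given structure index. When video indexes are assigned to a video, the state transition table is consulted for each structural unit specified by a structure index to find a term whose search granularity matches the target structural unit and whose state transition pattern matches the order in which event indexes were assigned within that unit. If such a term exists, it is determined that the meaning of the term has occurred, and an event index or attribute information indicating that the term is established is assigned. Indexes using terms with a high degree of abstraction can therefore be assigned efficiently and at high speed.
[0172]
Further, according to the video index assigning method of the present invention (claim 14), a state transition table is set in advance that associates, for each term: the term itself, whose meaning is established by the consecutive occurrence of a plurality of events, which represents the meaning of a video scene arising from those events, and which can be expressed by a combination of a plurality of events; a state transition pattern that defines the term as an input sequence fixing the order of the event indexes that express it; a search granularity that designates, for the term, a structural unit of the video defined by a given structure index; and a central index that designates at least one of the event indexes in the state transition pattern. When video indexes are assigned to a video and an event index matching a central index is assigned, the state transition table is consulted for each structural unit specified by a structure index to find a term whose search granularity matches the target structural unit and whose state transition pattern matches the order in which event indexes were assigned within that unit. If such a term exists, it is determined that the meaning of the term has occurred, and an event index or attribute information indicating that the term is established is assigned. Indexes using terms with a high degree of abstraction can therefore be assigned efficiently and at high speed.
[0173]
Further, the computer-readable recording medium of the present invention (claim 15) records a program for causing a computer to execute the video index assigning method according to claim 13 or 14. By having a computer execute this program, indexes using terms with a high degree of abstraction can be assigned efficiently and at high speed.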
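[0174]
Further, according to the video content description generating method of the present invention (claims 16 to 22), the state transition table holds, for each term that represents the meaning of a video scene arising from the occurrence of a plurality of events and that can be expressed by a combination of a plurality of events: a generation granularity, which sets the structural unit suitable as the unit of video for which a description is generated; a state transition pattern, which defines the term, based on the occurrence pattern of the events that express it, as an input sequence fixing the order of the event indexes occurring consecutively within the generation granularity; a key index, set for each state transition pattern by selecting at least one of the event indexes in the pattern; and character string definition information, set for each state transition pattern, which specifies the syntax elements used to generate the description and the input sources of the character strings used as those syntax elements. When a description of video content is generated, the state transition table is consulted to find the generation granularities that match the structural unit for which the description is to be generated; it is determined whether an event index matching the key index of the state transition pattern for that generation granularity exists in the target structural unit; if it does, it is determined whether the state transition pattern matches the occurrence pattern of the event indexes in the target structural unit; if the patterns match, the term defined by the state transition pattern is determined to be established; and a description of the video scene of the target structural unit is generated using the character string definition information of the established term's state transition pattern. The meaning of the content of the video can thus be interpreted and converted into a character string with a general meaning, producing a description (description character string) of the video content that is easy for the user to understand. In other words, for the target structural unit (video), the most suitable description (character string), based on the meaning of the video content that occurred in that portion, can be generated.
[0175]
Further, when the state transition pattern defining a particular term is established, an index representing that term can be assigned to the structural unit as an abstract index, and in the state transition table the abstract index representing the corresponding term is set as one of the key indexes of each state transition pattern. When determining whether an event index matching the key index of the state transition pattern for the relevant generation granularity exists in the target structural unit, the abstract index is used preferentially as the key index to determine whether the corresponding abstract index has been assigned to the target structural unit. If the abstract index is set, the term defined by the corresponding state transition pattern is determined to be established, and a description of the video scene of the target structural unit is generated using the character string definition information of that pattern. The term corresponding to the meaning of the video content can therefore be identified (established or not) efficiently and in a short time.
[0176]
Further, because the character string definition information specifies attribute information of a structure index or an event index as the input source of the character string used for each syntax element, whenever a state transition pattern matches and its term is established, the attribute information necessarily exists in the required structure index and state transition pattern, so the character strings can be input reliably and the description generated.
[0177]
Further, because the character string definition information contains syntax elements based on 5W1H, namely "when (When), where (Where), why (Why), whose (Who), what (What), and how it turned out (How)", a description incorporating 5W1H information can be generated easily using these syntax elements.
[0178]
Further, when a plurality of state transition patterns correspond to the relevant generation granularity, whether the term is established is determined for each state transition pattern, and when a plurality of terms are established, the syntax elements of the character string definition information set in each term's state transition pattern are combined to generate the description of the video scene of the target structural unit. A description that explains the content in more detail can therefore be generated.
[0179]
Further, when the syntax elements of the character string definition information set in each term's state transition pattern are combined to generate the description of the video scene of the target structural unit, if the 5W1H syntax elements include duplicates, priority is given to the syntax element that refers to the event index occurring later in time, so a description using newer information (attribute information) can be generated.
[0180]
Further, when the syntax elements of the character string definition information set in each term's state transition pattern are combined to generate the description of the video scene of the target structural unit, if the 5W1H syntax elements include duplicates, the state transition patterns of the terms are compared and priority is given to the syntax elements of the state transition pattern defined using more event indexes. A description using newer information (attribute information) can thus be generated, and the meaning of the video content that occurred in that portion can be expressed by a more appropriate description (character string). To illustrate, suppose a second state transition pattern consists of part of a first state transition pattern. Whenever the first pattern is established, the second is also established, but the syntax elements of the first pattern, which has more event indexes, are always selected, so the final outcome in that portion of the video is selected as the syntax element.
[0181]
Further, when the syntax elements of the character string definition information set in each term's state transition pattern are combined to generate the description of the video scene of the target structural unit, if the 5W1H syntax elements include duplicates, the elements are listed in parallel as needed, so a description with more information, in other words one that is easier to understand, can be generated.
[0182]
Further, the computer-readable recording medium according to claim 24 records a program for causing a computer to execute the video content description generating method according to any one of claims 16 to 23, so the meaning of the content of the video can be interpreted and converted into a character string with a general meaning, producing a description (description character string) of the video content that is easy for the user to understand.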
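[Brief Description of the Drawings]
[Fig. 1] A block diagram of the video search processing device according to Embodiment 1 of the present invention.
[Fig. 2] An explanatory diagram showing the relationship between structure indexes and event indexes.
[Fig. 3] An explanatory diagram showing the relationship between structure indexes and event indexes.
[Fig. 4] An explanatory diagram showing the state transition table of Embodiment 1.
[Fig. 5] An explanatory diagram showing the state transition table of Embodiment 1.
[Fig. 6] A state transition diagram of the finite state automaton corresponding to the path regular expression of Embodiment 1.
[Fig. 7] A diagram for explaining the state transitions of the finite state automaton corresponding to the path regular expression of Embodiment 1.
[Fig. 8] A flowchart showing the algorithm of the video search processing of Embodiment 1.
[Fig. 9] An explanatory diagram showing an example of video indexes (structure indexes and event indexes) for baseball video.
[Fig. 10] A block diagram of the video content description generation device of Embodiment 2.
[Fig. 11] An explanatory diagram showing an example of state transition patterns defined using abstract indexes in Embodiment 2.
[Fig. 12] An explanatory diagram showing the state transition table of Embodiment 2.
[Fig. 13] A flowchart showing the algorithm of the video content description generation processing of Embodiment 2.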
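[Explanation of Reference Numerals]
100 Video search processing device
101 Video input unit
102 State transition table storage unit
103 Operation input unit
104 Video search unit
105 Determination unit
105a First determination unit
105b Second determination unit
106 Search result output unit
200 Video content description generation device
201 Video input unit
202 State transition table storage unit
203 Operation input unit
204 Term search unit
205 Description generation unit
206 Description display unit
[0001]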
[Technical Field of the Invention]
The present invention relates to a video search method and a video search processing device that perform video search processing using terms with a high degree of abstraction on video structured using structure indexes and event indexes; a video index assigning method that performs video index assignment using terms with a high degree of abstraction; a video content description generating method that generates a description explaining the video content of a video scene; and a computer-readable recording medium on which a program for causing a computer to execute these methods is recorded.
[0002]
[Prior art]
In recent years, with advances in computer hardware and information processing technology and the spread of the Internet and digital satellite broadcasting, it has become possible to use a wide variety of video on a daily basis. As a result, the value, diversity, and entertainment value of video as information have become even more important. In addition to continuous playback and viewing, as in conventional television broadcasting and video playback, a variety of uses have been proposed, such as searching for and viewing desired video scenes in video that has been structured by assigning indexes, collecting information, and creating digest versions.
[0003]
To search video efficiently, a video, which is a set of temporally continuous video scenes, is usually divided into smaller units (sections). In doing so, the video is generally not divided physically, for example by a predetermined unit of time or by the amount of video, but logically, into sets of video scenes that satisfy predetermined conditions. After this logical division, adding an index to each division allows the divided video scenes to be handled as reusable semantic groups.
[0004]
Conceivable logical division methods include, for example, a method in which a person views the video, divides it manually, and assigns indexes to it, and, as one technique of real-time authoring (real-time content description), a method in which the point corresponding to the start of a logical division is designated as an index (structure index) and the span up to the start point of the next logical division is taken as the section.
[0005]
In addition to the structure index, which indicates a section of the division, an event index, which indicates an event that occurred in the video, has also been considered as a video index. Whereas the logical structure index indicates a logical section (duration) of the video, the event index basically has no section. With current technology, however, only fragmentary information can be set in an event index.
[0006]
Conventionally, video search processing has consisted of cutting out a logical section through a search that uses the event index. In other words, a desired video portion can be retrieved by using a query that matches a term (fragmentary information) set as an event index.
[0007]
[Problems to be solved by the invention]
However, although the conventional technology described above can search video for a query that matches a term set as a structure index or an event index, it cannot search on the meaning of the content of the video.
[0008]
In other words, with the conventional technology, only fragmentary indexes can be assigned to a video. A video portion can be extracted by searching in units of the logical structure based on those fragmentary indexes, but it is not always possible to retrieve the video portion that the user really wants.
[0009]
To make the description clear, "the meaning of the content of the video" is defined in this specification as follows. It is information that arises for a specific section when a person views the video and understands its content abstractly and conceptually; that is, it is meaningful information that can be found only through human subjectivity. Whereas an event index is a fragmentary representation of an event that occurred in the video, the meaning of the content of the video is the meaning that a person assigns by comprehensively judging the content of a certain section from how the events in the video changed and from the state that resulted from those changes. The meaning of the content of the video therefore becomes established only after the content of the video in that section has become clear.
[0010]
Also, when searching video, the section expected as the answer to a user's query does not always correspond to a single structural unit; it may be part of a structural unit or a combination of several structural units. Because the conventional technology selects a single structural unit as the answer (search result) to the query based on the fragmentary indexes assigned to logical structural units, it could not always retrieve a video portion that satisfied the user.
[0011]
Moreover, video scenes can be interpreted in many ways, so when the person who created (that is, defined) the logical structure differs from the person making the query, the sections that each regards as meaningful groups do not necessarily coincide.
[0012]
Furthermore, searching on the meaning of the content of the video amounts to searching with terms that have a high degree of abstraction. The conventional technology, however, can only extract logical structural units by relying on fragmentary index information. Except in specific applications, it generally could not efficiently analyze the meaning of video content using highly abstract search conditions.
[0013]
In addition, when the conventional technology is used to generate a character string that describes the content of a video from indexes, it can only list the fragmentarily assigned indexes side by side, so a sentence that is easy for the user to understand cannot be generated. That is, fragmentary indexes could not be converted into a character string with a general meaning to produce an easily understood sentence.
[0014]
During real-time authoring, the attribute (or event index) for an event is sometimes determined only by subsequent events. In baseball video, for example, it cannot be determined at the moment of the hit whether it is a double or a triple. The meaning cannot be assigned (an attribute or event index added) until the number of bases the batter reaches has been settled.
[0015]
As a technique addressing this problem, related work on real-time content description has been reported by Akasako, Iijima, Kakutani, and Tanaka ("Real-time contents description method of video data and its implementation", DEWS'99 Proceedings, 3A-2, March 4-6, 1999). This work deals with how to interpret an incomplete sequence of index information that arrives in real time, such as radio narration, and replace it with higher-level concepts. The higher-level concepts are expressed in advance as state transition diagrams, and the input is replaced with them in real time, which corrects the redundancy and incompleteness of the original input index sequence.
[0016]
However, as its focus on real-time content description indicates, the method proposed by Akasako et al. presupposes that indexes arriving in time series are interpreted using state transition diagrams. If this method is applied to search as is, the indexes are interpreted in time-series order, and processing takes time. Furthermore, it proposes no solution to the problem of how to perform highly abstract searches at high speed.
[0017]
In addition, the time-series indexes must be interpreted by comparing them against every state transition diagram, so the method is not necessarily efficient.
[0018]
The present invention has been made in view of the above, and it is an object of the present invention to be able to make an inquiry about the meaning of the contents of a video using terms or concepts with a high degree of abstraction and to perform a high-speed search.
[0019]
Further, the present invention has been made in view of the above, and another object is to make it possible to interpret the meaning of the content of a video, convert it into a character string with a general meaning, and generate a description character string of the video content that is easy for the user to understand.
[0020]
Further, the present invention has been made in view of the above, and it is an object of the present invention to efficiently and quickly assign an index using terms having a high degree of abstraction.
[0021]
[Means for Solving the Problems]
To achieve the above objects, the video search method according to claim 1 targets video to which at least structure indexes, which divide the video into semantic groups, and event indexes, which specify the content and location of events that occurred in the video, have been assigned, and which is structured using a plurality of hierarchical structural units, each structural unit being the video scene of a section delimited by a structure index; the method searches the video for a desired video scene using the structure indexes and the event indexes. In this method, for each term that represents the meaning of a video scene arising from the occurrence of a plurality of events and that can be expressed by a combination of a plurality of events, a structural unit suitable as the unit to be searched is set in advance as a search granularity, and, based on the occurrence pattern of the events that express the term, a state transition pattern corresponding to the term is defined in advance as an input sequence that fixes the order of the plurality of event indexes occurring consecutively within the search granularity. When a desired video scene is searched for, a term expressing the desired video scene is input; with the search granularity corresponding to the input term as the unit to be searched, it is determined, for each structural unit that matches the search granularity, whether the state transition pattern corresponding to the input term matches the occurrence pattern of the event indexes in the structural unit; and the matching structural units are output as the search result.
[0022]
The video search method according to claim 2 is the video search method according to claim 1, wherein, for each state transition pattern corresponding to a term, at least one of the event indexes in that state transition pattern is designated as a key index that serves as the starting point of the search, and, when a desired video scene is searched for, structural units that match the search granularity and have an event index matching the key index are retrieved first, and it is then determined for those structural units whether the state transition pattern matches the occurrence pattern of the event indexes in the structural unit.
[0023]
The video search method according to claim 3 is the video search method according to claim 1 or 2, wherein, when a desired video scene is extracted from the video based on a structural unit output as the search result, the video scene specified by that structural unit is output.
[0024]
The video search method according to claim 4 is the video search method according to claim 1 or 2, wherein, when a desired video scene is extracted from the video based on a structural unit output as the search result, any higher or lower structural unit in the structured video can be specified.
[0025]
The video search method according to claim 5 is the video search method according to claim 2, wherein, when a desired video scene is extracted from the video based on a structural unit output as the search result, the video scene is extracted by specifying offsets for clipping the video before and after the location to which the key index is assigned.
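The offset-based clipping can be pictured with a small sketch such as the following. The function name, the use of seconds as the time unit, and the clamping to the bounds of the video are assumptions made for illustration; the text only states that offsets are specified before and after the key index location.

```python
def clip_around_key(key_time, before, after, video_start=0.0, video_end=None):
    """Return (start, end) of a clip spanning `before`/`after` seconds around the key index."""
    start = max(video_start, key_time - before)  # do not run past the start of the video
    end = key_time + after
    if video_end is not None:
        end = min(end, video_end)                # do not run past the end of the video
    return start, end


# Key index assigned 120 s into the video; show 10 s before and 5 s after it.
CLIP = clip_around_key(120.0, before=10.0, after=5.0)
```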
[0026]
The video search method according to claim 6 is the video search method according to any one of claims 2 to 5, wherein, when it is determined that the state transition pattern corresponding to the input term matches the occurrence pattern of the event indexes in a structural unit, an index representing the term is defined as an abstract index, newly added to the matching structural unit, and reused as a key index.
[0027]
The video search method according to claim 7 is the video search method according to any one of claims 1 to 6, wherein an abstract index is set in advance, for each term that can be expressed by a combination of a plurality of events, as an index representing that term, and, when the state transition pattern is defined, the abstract index is used in addition to the event indexes so that the state transition pattern corresponding to the term is defined as an input sequence consisting of event indexes and abstract indexes.
[0028]
The video search method according to claim 8 is the video search method according to any one of claims 2 to 7, wherein a plurality of pieces of attribute information are attached to the structure indexes and the event indexes, character string definition information for generating a description related to the term from the attribute information of the structure indexes and event indexes is attached to the state transition pattern corresponding to the term, and a description related to the term is generated by referring, based on the character string definition information of the state transition pattern, to the attribute information in the structural unit output as the search result.
[0029]
The video search method according to claim 9 targets video to which at least structure indexes, which divide the video into semantic groups, and event indexes, which specify the content and location of events that occurred in the video, have been assigned, and which is structured using a plurality of hierarchical structural units, each structural unit being the video scene of a section delimited by a structure index; the method searches the video for a desired video scene using the structure indexes and the event indexes. As processing performed before the video search, the method includes a state transition table generation step of: setting, for each term that represents the meaning of a video scene arising from the occurrence of a plurality of events and that can be expressed by a combination of a plurality of events, a structural unit suitable as the unit to be searched as a search granularity; defining, based on the occurrence pattern of the events that express the term, a state transition pattern corresponding to the term as an input sequence that fixes the order of the plurality of event indexes occurring consecutively within the search granularity; designating, for each state transition pattern, at least one of the event indexes in that pattern as a key index that serves as the starting point of the search; and generating a state transition table consisting of the term, search granularity, state transition pattern, and key index. As processing performed during the video search, the method includes: a term input step of inputting a term expressing the desired video scene; a search step of referring to the state transition table and, with the search granularity corresponding to the term input in the term input step as the unit to be searched, retrieving the structural units that have an event index matching the key index corresponding to the term; a first determination step of referring to the state transition table and determining whether all the event indexes included in the state transition pattern corresponding to the term exist in each structural unit retrieved in the search step; a second determination step of determining, for each structural unit in which all of them were determined to exist in the first determination step, whether the state transition pattern corresponding to the input term matches the occurrence pattern of the event indexes in the structural unit; and a search result output step of cutting out a video scene from the video based on the structural units determined to match in the second determination step and outputting it as the search result.
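The search step and the two determination steps might be sketched as follows. This is an illustrative reading, not the claimed implementation: the `Unit` class, the table layout, the structure name `at_bat`, and the event names are hypothetical, and the matcher assumes that a pattern must occur as a contiguous run of event indexes, with a trailing `+` meaning one or more repetitions.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    structure: str  # name of the structure index that delimits this unit
    events: list    # event index names in order of occurrence


def _match_from(pattern, events, i):
    """Match the pattern items against events starting at position i."""
    if not pattern:
        return True
    head, rest = pattern[0], pattern[1:]
    repeat = head.endswith("+")  # 'x+' = one or more x (assumed notation)
    name = head[:-1] if repeat else head
    if i >= len(events) or events[i] != name:
        return False
    i += 1
    if _match_from(rest, events, i):
        return True
    while repeat and i < len(events) and events[i] == name:
        i += 1
        if _match_from(rest, events, i):
            return True
    return False


def pattern_occurs(pattern, events):
    """True if the pattern matches starting at some position in events."""
    return any(_match_from(pattern, events, s) for s in range(len(events)))


def search(term, units, table):
    row = table[term]
    needed = {item.rstrip("+") for item in row["pattern"]}
    results = []
    for unit in units:
        # Search step: granularity and key index narrow the candidates.
        if unit.structure != row["granularity"] or row["key_index"] not in unit.events:
            continue
        # First determination: every event index of the pattern is present.
        if not needed <= set(unit.events):
            continue
        # Second determination: the events occur in the order the pattern fixes.
        if pattern_occurs(row["pattern"], unit.events):
            results.append(unit)
    return results


TABLE = {"timely hit": {"granularity": "at_bat",
                        "pattern": ["single", "run_scored+"],
                        "key_index": "run_scored"}}
UNITS = [
    Unit("at_bat", ["hit", "single", "run_scored", "run_scored"]),  # matches
    Unit("at_bat", ["run_scored", "single"]),                        # wrong order
    Unit("at_bat", ["hit", "single"]),                               # key index absent
    Unit("inning", ["single", "run_scored"]),                        # other granularity
]
HITS = search("timely hit", UNITS, TABLE)
```

The cheap membership tests come first so that the ordered pattern match, which is the most expensive check, runs only on units that pass the key index and presence filters.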
[0030]
The computer-readable recording medium according to claim 10 records a program for causing a computer to execute the video search method according to any one of claims 1 to 9.
[0031]
The video search processing device according to claim 11 targets video to which at least structure indexes, which divide the video into semantic groups, and event indexes, which specify the content and location of events that occurred in the video, have been assigned, and which is structured using a plurality of hierarchical structural units, each structural unit being the video scene of a section delimited by a structure index. The device comprises: video input means for inputting the structured video to be searched; storage means that stores, as a state transition table, a term for designating the desired video scene, the term representing the meaning of a video scene arising from the occurrence of a plurality of events and being expressible by a combination of a plurality of events, a search granularity that designates the structural unit to be searched when video is searched with the term, a state transition pattern that defines the term as an input sequence fixing the order of the plurality of event indexes occurring consecutively within the search granularity, and a key index that designates at least one of the event indexes in the state transition pattern; operation input means for inputting or designating a query term for searching for the desired video scene; search means that, when a query term is input or designated via the operation input means, refers to the state transition table in the storage means and, using the key index corresponding to the query term, retrieves from the video input by the video input means the structural units that match the search granularity and have an event index matching the key index; determining means that receives the structural units retrieved by the search means and determines whether the state transition pattern corresponding to the query term matches the occurrence pattern of the event indexes in each structural unit; and search result output means that cuts out a video scene from the video based on the structural units determined to match by the determining means and outputs it as the search result.
[0032]
The video search processing device according to claim 12 is the video search processing device according to claim 11, wherein the determining means comprises: first determining means for determining whether all the event indexes included in the state transition pattern corresponding to the query term exist in a structural unit retrieved by the search means; and second determining means for determining, for each structural unit in which the first determining means has determined that all of them exist, whether the state transition pattern corresponding to the query term matches the occurrence pattern of the event indexes in the structural unit.
[0033]
The video index assigning method according to claim 13 assigns, when a video is structured, video indexes including at least structure indexes, which divide the video into semantic groups, and event indexes, which specify the content and location of events that occurred in the video. In this method, a state transition table is set in advance that associates, for each term: the term itself, whose meaning is established by the consecutive occurrence of a plurality of events, which represents the meaning of a video scene arising from those events, and which can be expressed by a combination of a plurality of events; a state transition pattern that defines the term as an input sequence fixing the order of the plurality of event indexes that express it; and a search granularity that designates, for the term, a structural unit of the video defined by a given structure index. When video indexes are assigned to the video, the state transition table is consulted for each structural unit specified by a structure index to find a term whose search granularity matches the target structural unit and whose state transition pattern matches the order in which event indexes were assigned within that unit. If such a term exists, it is determined that the meaning of the term has occurred, and an event index or attribute information indicating that the term is established is assigned.
[0034]
Further, the video index assigning method according to claim 14 is a method of assigning, when structuring a video, a video index comprising at least a structure index for dividing the video into semantic groups and an event index for specifying the content and location of an event that has occurred in the video. In this method, the following are set in advance in a state transition table in association with each term: a term whose meaning is established by a plurality of consecutively occurring events, which represents the meaning of a video scene arising from those events, and which can be expressed by a combination of a plurality of events; a state transition pattern that defines the term using an input sequence specifying the order of the plurality of event indexes that express the term; a search granularity that designates, for the term, a structural unit of the video defined by a predetermined structure index; and a central index that designates at least one of the event indexes present in the state transition pattern. When a video index is being assigned to the video and an event index matching the central index is assigned, the state transition table is referred to and, for each structural unit specified by the structure index, a search is made for a term whose search granularity matches the target structural unit and whose input sequence of event indexes in the state transition pattern matches the order in which event indexes were assigned within the target structural unit. If a matching term exists, it is determined that the meaning of the matched term has occurred, and an event index or attribute information indicating that the term has been established is assigned.
[0035]
A computer-readable recording medium according to a fifteenth aspect stores a program for causing a computer to execute the video index assigning method according to the thirteenth or fourteenth aspect.
[0036]
Also, the method for generating a description of a video content according to claim 16 targets a video to which at least a structure index for dividing the video into semantic groups and an event index for specifying the content and location of an event occurring in the video have been assigned, and which is structured using a plurality of hierarchical structural units, each structural unit being the video scene of a section delimited by a structure index. The method generates a description of the video content using a state transition table in which information used for generating the description is set, together with character strings, or attribute information convertible into character strings, assigned in advance to the structure indexes and event indexes. In the state transition table, the following are set for each term that represents the meaning of a video scene arising from the occurrence of a plurality of events and that can be expressed by a combination of a plurality of events: a generation granularity that sets an appropriate structural unit as the video unit for which the description is generated; a state transition pattern that, based on the occurrence pattern of the events expressing the term, defines the term as an input sequence specifying the order of a plurality of consecutively occurring event indexes; a key index set, for each state transition pattern, by selecting at least one of the event indexes present in that pattern; and, for each state transition pattern, character string definition information that sets the syntax elements used to generate the description and the input source of the character string used for each syntax element. When a description of the video content is generated, the state transition table is referred to and searched for a generation granularity that matches the structural unit for which the description is to be generated. It is then determined whether an event index matching the key index of the state transition pattern for that generation granularity exists in the target structural unit. If a matching event index exists, it is determined whether the state transition pattern matches the occurrence pattern of the event indexes in the target structural unit. If the occurrence pattern matches, the term defined by the state transition pattern is judged to be established, and a description of the video scene of the target structural unit is generated using the character string definition information of the state transition pattern of the established term.
[0037]
According to a seventeenth aspect of the present invention, in the method for generating a description of a video content according to the sixteenth aspect, when a state transition pattern defining a specific term is established in a structural unit, an index representing the term can be assigned as an abstract index. In the state transition table, the abstract index representing the corresponding term is set as one of the key indexes of each state transition pattern. When determining whether an event index matching the key index of the state transition pattern for the matching generation granularity exists in the target structural unit, the abstract index is given priority as the key index, and it is first determined whether the corresponding abstract index has been assigned to the target structural unit. If the abstract index has been assigned, the term defined by the corresponding state transition pattern is judged to be established, and a description of the video scene of the target structural unit is generated using the character string definition information of the state transition pattern of the established term.
[0038]
A method for generating a description of a video content according to claim 18 is the method for generating a description of a video content according to claim 16 or 17, wherein, in the character string definition information, attribute information of the structure index or the event index is set as the input source of the character string used as the syntax element.
[0039]
According to a nineteenth aspect of the present invention, in the method of generating a description of a video content according to any one of the sixteenth to eighteenth aspects, the character string definition information sets syntax elements based on 5W1H: when (When), where (Where), why (Why), who (Who), what (What), and how (How).
[0040]
According to a twentieth aspect of the present invention, in the method for generating a description of a video content according to the nineteenth aspect, when a plurality of state transition patterns correspond to the matching generation granularity, it is determined for each state transition pattern whether its term is established, and when a plurality of terms are established, the syntax elements of the character string definition information set in the state transition patterns of those terms are combined to generate the description of the video scene of the target structural unit.
[0041]
A video content description generating method according to claim 21 is the video content description generating method according to claim 20, wherein, when the description of the video scene of the target structural unit is generated by combining the syntax elements of the character string definition information set in the state transition pattern of each term, and the same 5W1H syntax element appears more than once, priority is given to the syntax element that refers to the event index occurring later in time.
[0042]
A video content description generation method according to claim 22 is the video content description generation method according to claim 20 or 21, wherein, when the description of the video scene of the target structural unit is generated by combining the syntax elements of the character string definition information set in the state transition pattern of each term, and the same 5W1H syntax element appears more than once, the state transition patterns of the terms are compared and priority is given to the syntax element of the state transition pattern defined using the larger number of event indexes.
[0043]
A video content description generating method according to claim 23 is the video content description generating method according to any one of claims 19 to 22, wherein, when the description of the video scene of the target structural unit is generated by combining the syntax elements of the character string definition information set in the state transition pattern of each term, and the same 5W1H syntax element appears more than once, the duplicated syntax elements are listed in parallel as necessary.
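The three ways of handling a duplicated 5W1H syntax element described in claims 21 to 23 can be sketched as follows. This is only an illustrative sketch: the `Element` type, the `merge_elements` function, the policy names, the example values, and the use of " and " to list elements in parallel are all assumptions made for the example and do not appear in the specification.

```python
from typing import Dict, List, NamedTuple


class Element(NamedTuple):
    text: str          # character string used for the syntax element
    event_time: float  # time of the event index the element refers to (hypothetical)
    pattern_size: int  # number of event indexes in the defining state transition pattern


def merge_elements(per_term: List[Dict[str, Element]], policy: str) -> Dict[str, str]:
    """Combine the 5W1H elements of several established terms into one set."""
    candidates: Dict[str, List[Element]] = {}
    for elements in per_term:
        for slot, element in elements.items():
            candidates.setdefault(slot, []).append(element)

    merged: Dict[str, str] = {}
    for slot, found in candidates.items():
        if policy == "latest":             # claim 21: later event index wins
            merged[slot] = max(found, key=lambda e: e.event_time).text
        elif policy == "largest_pattern":  # claim 22: pattern with more event indexes wins
            merged[slot] = max(found, key=lambda e: e.pattern_size).text
        elif policy == "parallel":         # claim 23: list the duplicates side by side
            merged[slot] = " and ".join(e.text for e in found)
        else:
            raise ValueError(f"unknown policy: {policy}")
    return merged


# Two established terms that both supply a "What" element (hypothetical values).
timely = {"Who": Element("Takahashi", 10.0, 2), "What": Element("timely hit", 10.0, 2)}
reversal = {"What": Element("reversal", 12.0, 1)}

by_latest = merge_elements([timely, reversal], "latest")
by_size = merge_elements([timely, reversal], "largest_pattern")
in_parallel = merge_elements([timely, reversal], "parallel")
```

Only the duplicated element is affected by the policy; an element supplied by a single term, such as "Who" here, is carried over unchanged.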
[0044]
A computer-readable recording medium according to a twenty-fourth aspect stores a program for causing a computer to execute the method for generating a description of a video content according to any one of the sixteenth to twenty-third aspects.
[0045]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, embodiments of the video search method of the present invention, the computer-readable recording medium storing a program for causing a computer to execute that method, the video search processing device, the video index assigning method, the computer-readable recording medium storing a program for causing a computer to execute that method, the method for generating a description of video content, and the computer-readable recording medium storing a program for causing a computer to execute that method will be described in detail with reference to the accompanying drawings.
[0046]
(Embodiment 1)
The video search method, video search processing device, and video index assigning method according to Embodiment 1 will be described in the following order:
(1) Device configuration of video search processing device
(2) Relationship between structure index and event index
(3) Structure of state transition table
(4) Definition example of state transition pattern (state transition diagram)
(5) Specific example of video search algorithm
(6) Specific operation example
[0047]
(1) Device configuration of video search processing device
FIG. 1 shows the configuration of the video search processing device according to the first embodiment. FIG. 1A shows an example of the hardware configuration of the video search processing device 100, and FIG. 1B shows a functional block diagram.
The video search processing device 100 targets a video to which at least a structure index for dividing the video into semantic groups and an event index for specifying the content and location of an event that has occurred in the video have been assigned, and which is structured using a plurality of hierarchical structural units, each being the video scene of a section delimited by a structure index. The device searches the video for a desired video scene using the structure indexes and event indexes.
[0048]
The hardware configuration of the video search processing device 100 may be any device having at least a CPU (central processing unit), a display, a keyboard, and a magnetic disk. For example, a personal computer as shown in the figure can be used.
[0049]
Further, as shown in FIG. 1B, the video search processing device 100 includes: a video input unit 101 for inputting a structured video to be searched; a state transition table storage unit 102 in which the terms, search granularities, state transition patterns, and key indexes described later are stored as a state transition table; an operation input unit 103 for inputting or specifying a query term for searching for a desired video scene; a video search unit 104 that, when a query term is input or specified via the operation input unit 103, refers to the state transition table in the state transition table storage unit 102 and, using the key index corresponding to the query term, searches the video input through the video input unit 101 for structural units that match the search granularity and contain an event index matching the key index; a determination unit 105 that receives the structural units found by the video search unit 104 and determines whether the state transition pattern corresponding to the query term matches the occurrence pattern of the event indexes in each structural unit; and a search result output unit 106 that cuts out a video scene from the video based on the structural units judged by the determination unit 105 to match, and outputs it as the search result.
[0050]
The determination unit 105 includes a first determination unit 105a that determines whether all of the event indexes included in the state transition pattern corresponding to the query term exist in the structural unit found by the video search unit 104, and a second determination unit 105b that determines, for each structural unit in which the first determination unit 105a found all of them present, whether the state transition pattern corresponding to the query term matches the occurrence pattern of the event indexes in that structural unit.
[0051]
(2) Relationship between structure index and event index
Next, a structured video to be searched by the video search processing device (video search method) of the present invention will be described. The structured video is provided with a structure index for dividing the video into a semantic group and an event index for specifying the content and location of an event that has occurred in the video.
[0052]
FIGS. 2 and 3 are explanatory diagrams showing the relationship between the structure index and the event index. In FIG. 2, for example, if the video data (video scene) in the section indicated by the structure index of structure 1 represents one entire video, a plurality of structure indexes such as structure 2-a, structure 2-b, structure 2-c, and so on are provided below it. The sections indicated by structure 2-a, structure 2-b, structure 2-c, and so on are obtained by dividing the entire video indicated by structure 1, and when all of these sections are joined, they coincide with the section of the entire video, which is the higher-level section.
[0053]
Further, a plurality of structure indexes such as structure 3-aa, structure 3-ab, structure 3-ac, and so on are provided below structure 2-a. Similarly, the sections indicated by structure 3-aa, structure 3-ab, structure 3-ac, and so on are obtained by dividing the section indicated by structure 2-a, and when all of these sections are joined, they coincide with the section of structure 2-a, which is the higher-level section.
[0054]
The section indicated by the structure index of the topmost structure 1 is the structural unit representing the entire video. The sections indicated by the structure indexes below it, such as structure 2-a, structure 2-b, and structure 2-c, are structural units of the structure 2 level, and the sections indicated by the structure indexes below those, such as structure 3-aa, structure 3-ab, and structure 3-ac, are structural units of the structure 3 level.
[0055]
As described above, the video (structure 1) is structured using the video scene of the section divided by the structure index as a video structure unit and using a plurality of hierarchical structure units.
[0056]
In contrast to the structure index, which represents the structure of the video, the event index represents an event that has occurred in the video. As described above, the structure index indicates a logical section of a video, whereas the event index is basically an index that has no section. The event index is basically attached, as information indicating the content of the event, at the place in the video where the event occurs. For example, as shown in FIG. 3, the event indexes 3aa-1 to 3aa-4 may be assigned at the points in the flow of the video (along the time axis) where the events occur, or, as also illustrated in the drawings, the event indexes 3aa-1 to 3aa-4 may be attached to the structure index 3aa in which the events occurred. Although a detailed description is omitted, the structured index information composed of structure indexes and event indexes and the actual video scenes can of course be stored and managed separately.
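The relationship described in this section can be illustrated with a small data-structure sketch. The class names, field names, and time values below are hypothetical and chosen only for illustration; the specification does not prescribe any particular representation. The sketch models a structural unit as a section with a start and an end, and an event index as a point on the time axis with no duration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EventIndex:
    label: str   # content of the event
    time: float  # position on the time axis; an event index has no section


@dataclass
class StructuralUnit:
    name: str
    start: float
    end: float
    children: List["StructuralUnit"] = field(default_factory=list)

    def children_cover_section(self) -> bool:
        """True if the child sections, joined in order, coincide with this section."""
        if not self.children:
            return True
        kids = sorted(self.children, key=lambda c: c.start)
        contiguous = all(a.end == b.start for a, b in zip(kids, kids[1:]))
        return (kids[0].start == self.start
                and kids[-1].end == self.end
                and contiguous)

    def events_in(self, events: List[EventIndex]) -> List[EventIndex]:
        """Event indexes that fall inside this structural unit."""
        return [e for e in events if self.start <= e.time < self.end]


# Hypothetical time values for the hierarchy of FIG. 2.
s3 = [StructuralUnit("structure 3-aa", 0, 10),
      StructuralUnit("structure 3-ab", 10, 20),
      StructuralUnit("structure 3-ac", 20, 30)]
s2 = [StructuralUnit("structure 2-a", 0, 30, s3),
      StructuralUnit("structure 2-b", 30, 60),
      StructuralUnit("structure 2-c", 60, 90)]
video = StructuralUnit("structure 1", 0, 90, s2)

events = [EventIndex(f"3aa-{i}", t) for i, t in enumerate([1, 3, 5, 8], start=1)]
```

Because an event index carries only a time, the structural unit it belongs to at any level can be derived from the sections, which is one way to keep the index information separate from the video data itself.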
[0057]
(3) Structure of state transition table
Next, the structure of the state transition table, which is an important element of the present invention, will be described in detail. In the present invention, in order to search for a desired video scene using a meaning (term) with a high degree of abstraction, a state transition pattern expressing the term is defined in advance using the structure information (structure indexes and event indexes) of the video. This processing corresponds to the state transition table generation step of the present invention.
[0058]
As shown in FIG. 4, the state transition table holds the following items (3-1) to (3-4) in association with one another.
(3-1) Terms:
This is a search term with a high degree of abstraction, set as a character string representing the meaning of the corresponding video scene. Whereas a fragmentary term used as an event index can be associated with a single event in the video, a term with a high degree of abstraction (a search term) cannot be mapped to any single event and can only be expressed by a combination of a plurality of events. In other words, such a term takes on its meaning only once it is clear that a plurality of events have occurred within a certain section.
[0059]
(3-2) Search granularity:
An appropriate structural unit is set as a search target unit when searching for a video. That is, by setting the minimum structural unit that satisfies the meaning of the term ((3-1)) as the search granularity, the search target unit can be narrowed down and the search can be performed efficiently.
(3-3) State transition pattern:
Based on the occurrence pattern of the events that express the term ((3-1)), an event list corresponding to the term is defined as an input sequence of a plurality of event indexes that occur consecutively within the search granularity ((3-2)).
(3-4) Key index:
At least one event index among the event indexes existing in the state transition pattern ((3-3)) is specified as a key (center event) to start a search.
[0060]
That is, the state transition table defines the meaning of video content as a set consisting of a list of event indexes within a logical unit (search granularity) of a given structure, a character string representing the meaning of that content, and a key index for performing the search efficiently. Searching then amounts to discovering (parsing) a state transition pattern from the state transition table in a video structured by indexes.
[0061]
Referring further to FIG. 4, for the character string “xxxxxx” representing a meaning, the term is first expressed as an input sequence of a plurality of events, and the event list “event 1, event 2, event 3” is set. This event list indicates that, starting from state A, event 1 occurs, then event 2, then event 3, and the state transitions to state B, as shown in the figure.
[0062]
When the event list for the term is created, an event that symbolically represents the term among events 1 to 3 existing in the event list or an event suitable for specifying the term is designated as a key index. Here, event 1 is specified as a key index. Subsequently, in the structural unit of the structured video, the smallest structural unit in which the events 1 to 3 existing in the event list can occur as the meaning of the term is selected and set as the search granularity. Here, it is described as granularity 1.
[0063]
Similarly, for the character string “0000... 0” representing a meaning, the term is first expressed as an input sequence of a plurality of events, and the event list “event 1, event 2, event 4” is set. This event list indicates that, starting from state A, event 1 occurs, then event 2, then event 4, and the state transitions to state C, as shown in the figure.
[0064]
When the event list for the term has been created, an event that symbolically represents the term among events 1, 2, and 4 in the event list, or an event suited to identifying the term, is designated as the key index. Here, event 4 is designated as the key index. Subsequently, among the structural units of the structured video, the smallest structural unit in which events 1, 2, and 4 of the event list can occur with the meaning of the term is selected and set as the search granularity. Here, it is denoted granularity 2.
[0065]
As shown in FIG. 4, the granularity 1 and the granularity 2 set as the search granularity for different terms may be the same structural unit. An optimal search granularity may be selected for each term, and a plurality of structural units having the same search granularity may exist.
[0066]
In the case of searching for the video content that the term means using the state transition table, for example, search efficiency can be improved by narrowing down the search to structural units of search granularity corresponding to the term. Furthermore, after parsing the corresponding event using the key index, it is possible to perform a high-speed and efficient search by determining whether an event list is established in the structural unit of the specified search granularity.
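The two example rows of the state transition table described above can be written down as a simple data structure. The following is a minimal sketch, assuming a tuple of event labels is an adequate stand-in for a state transition pattern; the names `TransitionRow`, `lookup`, `rows_triggered_by`, `term_x`, and `term_o` are hypothetical (the two term names stand for the placeholder character strings used in FIG. 4).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TransitionRow:
    term: str                 # (3-1) highly abstract search term
    granularity: str          # (3-2) search granularity
    pattern: Tuple[str, ...]  # (3-3) ordered event list
    key_index: str            # (3-4) key index (central event)

    def __post_init__(self):
        # The key index must be one of the event indexes in the pattern.
        if self.key_index not in self.pattern:
            raise ValueError("key index must appear in the state transition pattern")


STATE_TRANSITION_TABLE = [
    TransitionRow("term_x", "granularity 1", ("event 1", "event 2", "event 3"), "event 1"),
    TransitionRow("term_o", "granularity 2", ("event 1", "event 2", "event 4"), "event 4"),
]


def lookup(table: List[TransitionRow], term: str) -> TransitionRow:
    """Return the row for a query term."""
    return next(row for row in table if row.term == term)


def rows_triggered_by(table: List[TransitionRow], event_label: str) -> List[TransitionRow]:
    """Rows whose key index equals the given event index label."""
    return [row for row in table if row.key_index == event_label]
```

Looking rows up by key index as well as by term reflects the two uses of the table in this document: a query starts from a term, while index assignment starts from an event that has just occurred.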
[0067]
(4) Definition example of state transition pattern (state transition diagram)
In the present invention, as described above, terms (and concepts) having a high degree of abstraction used by humans for retrieval are defined in advance as state transition path regular expression patterns (state transition patterns).
[0068]
As a prerequisite,
* Event indexes are assumed to be attached to the video; each event index acts as an input symbol that causes a transition to a new state.
* Also, the event index is defined as having no temporal width.
* The video data between the two event indexes of interest is defined as a scene.
* Scenes flow over time.
[0069]
The flow of the scene, that is, the state of the state transition, can be represented by a directed graph using the event index as a label. The nodes of the graph represent scenes. To each scene, various parameter values expressing the situation are added as attributes. For example, when the video is a video of a baseball game, the attributes include a score, a position and a player name of a defensive player, a batter's name, an SBO (strike ball out), and the like.
[0070]
The search is to find a state transition pattern in the indexed video. The found video part (scene or index) is the search result. If the relevant scene includes other relevant scenes, the shortest scene flow among them is used as the search result.
[0071]
In the path regular expression of a state transition pattern, a transition from scene s0 to scene s1 caused by event index I is written “s0 -I-> s1”.
“.” represents an arbitrary scene or an arbitrary event index.
“-.->” is abbreviated as “→”.
A transition between two temporally consecutive scenes is represented by “==>”.
In addition, “*” indicates zero or more repetitions, and “+” indicates one or more repetitions.
[0072]
For example, the flow of scenes that starts from scene s0 and reaches scene s1 through the event index sequence “A.*B.*(C.*)*C” is expressed by the following path regular expression.
s0 -A-> . (→ .)* -B-> . (→ .)* (-C-> . (→ .)*)* -C-> s1
This path expression is abbreviated as follows.
s0 -ABC+-> s1
[0073]
Strictly speaking, the regular expression “A.*B.*(C.*)*C” and the regular expression “ABC+” do not express the same thing, but with event indexes there are many event indexes that can be ignored, and describing each one would be cumbersome, so this abbreviation is used. FIG. 6 shows a state transition diagram of the finite state automaton corresponding to the path regular expression.
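The difference between the abbreviated form and its expansion can be checked with an ordinary regular expression engine. The sketch below assumes that each event index is encoded as a single letter and that the event indexes of a video are concatenated into a string; the function name `expand` and the sample strings are hypothetical. It uses Python's standard `re` module.

```python
import re


def expand(abbreviated: str) -> str:
    """Expand an abbreviated event sequence such as "ABC+" so that
    ignorable event indexes may occur between the listed ones."""
    tokens = re.findall(r"([A-Za-z])(\+?)", abbreviated)
    parts = []
    for symbol, plus in tokens:
        s = re.escape(symbol)
        # "X+" becomes "(X.*)*X": one or more X, with anything in between.
        parts.append(f"({s}.*)*{s}" if plus else s)
    return ".*".join(parts)


expanded = expand("ABC+")

# "D" stands for an event index that the pattern does not care about.
with_noise = "ADBDCDC"
literal_match = re.search("ABC+", with_noise)     # read as an ordinary regex
expanded_match = re.search(expanded, with_noise)  # read as the abbreviation
wrong_order = re.search(expanded, "ACB")
```

Read as an ordinary regular expression, "ABC+" requires A, B, and C to be adjacent, so it fails on a sequence with intervening events, while the expanded form accepts it and still rejects a sequence in which the events occur in the wrong order.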
[0074]
Further, as shown in FIG. 7, assume that event indexes such as A, B, C, and D are attached to a video (video scene). When this video is searched for the scenes s0 and s1 of FIG. 6, s0 and s1 are obtained as shown in the figure.
[0075]
By further writing conditional expressions into this state transition pattern (state transition diagram), structure information can be used to make the search more efficient.
For example, a conditional expression is written in [ ] after the state transition pattern.
"s0 -ABC+-> s1 [at-bat(s0) is the same as at-bat(s1)]"
[0076]
Example using structure information:
Look for the reversal scene of the home team. However, the end of the scene sequence in the search results must be taken to the end of the inning.

[0077]
Here, the definitions that follow def are simple character-string substitutions. Values are evaluated once a scene environment such as x has been set.
[0078]
(5) Specific example of video search algorithm
Here, before describing a specific video search processing algorithm, processing before performing a video search will be confirmed.
It is assumed that the state transition table is already stored in the state transition table storage unit 102 of the video search processing device 100 through the above-described state transition table generation step.
[0079]
FIG. 8 is a flowchart illustrating an algorithm of the video search processing according to the first embodiment. When searching for a desired video portion using the video search processing device 100 shown in FIG. 1, a user inputs a video to be searched to the video search processing device 100 via the video input unit 101. When the magnetic disk of the apparatus is used as the video input unit 101, it is only necessary to specify the video to be searched.
[0080]
When a video search is performed, the user first inputs a term representing the desired video scene via the operation input unit 103, and the key index corresponding to that term is obtained from the state transition table as the starting point of the search (S801).
[0081]
Next, the video is searched for event indexes that match the key index, and a set of key indexes is obtained as a result (S802).
For the set of key indexes obtained in step S802, the structure instances (structural units of the search granularity) that contain each key index are obtained from the structural constraint (state transition pattern) specified in the state transition table (S803).
[0082]
Subsequently, for each such structure instance (a structural unit that matches the search granularity) (S804), it is determined whether all the event indexes included in the state transition pattern are present (S805). No further processing is performed for structure instances that do not contain all of them.
[0083]
If a structure instance contains a plurality of key indexes, the corresponding key indexes are removed from the set of key indexes (S806). Subsequently, for every structural unit judged to contain all the event indexes, it is determined whether the state transition is established, that is, whether the state transition pattern corresponding to the input term matches the occurrence pattern of the event indexes in the structural unit (S807). If it is established, the meaning specified by the highly abstract term is judged to hold (S808). At this time, if necessary, a new event index corresponding to the established meaning is added to the structure instance, or the information is added as an attribute of an existing event index.
[0084]
Next, the above steps S803 to S808 are repeated for all the key indices of the obtained set of key indices (S809).
[0085]
After that, the obtained structure instances (structural units) are returned as the return value (S810), and video scenes are cut out as the search result (S811). By default the search result is cut out at the specified search granularity (structural unit), but it is also possible to specify an arbitrary structural unit that contains the matched portion, or a cut defined by offsets before and after the key index.
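Steps S801 to S810 can be summarized in a short sketch. It is a simplification under stated assumptions: each structural unit is represented only by its granularity and its ordered event labels, a state transition pattern is treated as an ordered list of labels that must occur in that order with other events allowed in between, and the sample row for "timely hit" (including the choice of "hit" as key index) is hypothetical. The names `Unit`, `Row`, `occurs_in_order`, and `search` are likewise illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Unit:
    name: str
    granularity: str
    events: List[str]  # event index labels in order of occurrence


@dataclass
class Row:
    granularity: str
    pattern: Tuple[str, ...]
    key_index: str


def occurs_in_order(pattern: Tuple[str, ...], events: List[str]) -> bool:
    """True if the pattern labels occur in this order (other events may intervene)."""
    remaining = iter(events)
    return all(label in remaining for label in pattern)


def search(term: str, table: Dict[str, Row], units: List[Unit]) -> List[Unit]:
    row = table[term]                                    # S801: key index for the term
    candidates = [u for u in units                       # S802-S803: units of the search
                  if u.granularity == row.granularity    # granularity that contain the
                  and row.key_index in u.events]         # key index
    results = []
    for unit in candidates:                              # S804, S809: one unit at a time;
        # iterating over units (not key indexes) visits each unit once (cf. S806)
        if not set(row.pattern) <= set(unit.events):     # S805: all event indexes present?
            continue
        if occurs_in_order(row.pattern, unit.events):    # S807: occurrence pattern matches?
            results.append(unit)                         # S808: term is established
    return results                                       # S810: structural units returned


table = {"timely hit": Row("at-bat", ("hit", "point addition"), "hit")}
units = [
    Unit("at-bat 1", "at-bat", ["hit", "point addition", "point addition"]),
    Unit("at-bat 2", "at-bat", ["hit"]),
    Unit("at-bat 3", "at-bat", ["point addition", "hit"]),
    Unit("inning 1", "inning", ["hit", "point addition"]),
]
found = [u.name for u in search("timely hit", table, units)]
```

The cheap membership test of S805 discards units such as "at-bat 2" before the order-sensitive check of S807 is run, which mirrors the two-stage determination performed by the first and second determination units.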
[0086]
As one application of the search, when parsing the search condition finds a pattern that matches a state transition pattern, a commentary (explanatory text) can be generated automatically by defining and executing a procedure that produces text explaining the meaning. That is, a character string that describes a video can be generated by using the meaning definition (state transition pattern) of the video content specified in the state transition table. Simply lining up fragmentary indexes yields nothing more than a list of terms, whereas using a state transition pattern makes it possible to generate text that is easy for the user to read.
[0087]
(6) Specific operation example
A specific operation example using the above configuration and the video search processing algorithm will be described. FIG. 9 is an explanatory diagram showing an example of a video index (structure index and event index) for a baseball video. According to the structure indexes, the video is divided hierarchically from the game as a whole into innings, top/bottom halves, at-bats, and pitches. Such a structure is set in advance as a profile when the video index is defined.
[0088]
In addition, event indexes such as hit, out, and home run are assigned as needed. For a video structured by the video index in this way, with descriptive events set only in fragmentary form, state transition patterns such as the one shown in FIG. 6 are prepared. For example, at the at-bat granularity (at-bat level structural unit), one or more point addition events following a hit mean a "timely hit". Likewise, a point addition event following an out or fly event means a "sacrifice hit".
[0089]
At this time, an event defined as a central event (key index) is an index serving as a clue for search. A video is searched by relying on this key index, and if a state transition pattern is found at a specified granularity, the granularity becomes a candidate for a search result.
[0090]
Furthermore, a state transition can be defined for the effect that an event index has on the environmental parameters representing the state of the video. For example, in baseball, when a point addition event occurs at the at-bat granularity, "opening score", "tie", or "reversal" can be identified from the change between the score immediately before the at-bat begins and the score immediately after it ends.
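The three situations named above can be derived from the score before and after an at-bat, as the following sketch shows. The exact definitions used here are assumptions made for illustration (the first score from 0-0, a lead that becomes level, and a change of the leading team), as are the function name and the returned labels; the specification only states that the situations are identified from the change in the score.

```python
from typing import Optional, Tuple

Score = Tuple[int, int]  # (home score, away score)


def sign(x: int) -> int:
    return (x > 0) - (x < 0)


def classify_score_change(before: Score, after: Score) -> Optional[str]:
    """Classify the effect of an at-bat on the scoring situation."""
    lead_before = sign(before[0] - before[1])  # +1 home leads, -1 away leads, 0 level
    lead_after = sign(after[0] - after[1])
    if before == (0, 0) and after != (0, 0):
        return "opening score"                 # first points of the game (assumed definition)
    if lead_before != 0 and lead_after == 0:
        return "tie"                           # trailing team draws level
    if lead_before * lead_after == -1:
        return "reversal"                      # the leading team changes
    return None                                # points added without any of the above


opening = classify_score_change((0, 0), (1, 0))
tie = classify_score_change((1, 2), (2, 2))
reversal = classify_score_change((1, 2), (3, 2))
plain = classify_score_change((3, 1), (4, 1))
```

Treating the score as an attribute of the scenes before and after the at-bat keeps this check independent of which event indexes actually produced the points.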
[0091]
Here, an example of the state transition pattern will be specifically described.
* Timely hit
s0 ==> at-bat in ==> s0' -hit/point addition+→ s1 ==> at-bat in ==> s1'
* Sacrifice hit
s0 ==> at-bat in ==> s0' -out/point addition+→ s1 ==> at-bat in ==> s1'
* Double play
s0 ==> at-bat in ==> s0' -out/out→ s1 ==> at-bat in ==> s1'
* Reversal
s0 ==> at-bat in ==> s0' -point addition+→ s1 ==> at-bat in ==> s1'
[s0.home score > s0.away score && s1'.home score < s1'.away score ||
 s0.home score < s0.away score && s1'.home score > s1'.away score]
[0092]
By setting state transition patterns as described above, searches with a high level of abstraction can be handled. Put the other way around, simply by defining these state transition patterns, a search based on the meaning of the video content can be performed without depending on the structure of the video's index information or on the event definitions. That is, an environment for unified search queries can be provided for video index information as semi-structured data (video indexes have the semi-structured characteristic that their structure differs from creator to creator and is not fixed).
[0093]
As for the automatic generation of commentary (explanatory text), information (character string definition information) indicating where the data for the subject is to be obtained is also defined in the state transition table, which represents the meaning of the abstract term. In addition, by joining the subject, the semantic information, and the semantic information produced by a change in the situation with connectives, a commentary sentence that reads naturally to the user can be generated. For example, for the term "timely hit", the conditions for generating the description (character string definition information) are specified as follows.
[0094]

[0095]
Here, <index.attribute name> indicates an attribute of the specified video index. [index] indicates the number of times the index occurred within the specified granularity. "Term" indicates that the term with a high degree of abstraction written there is output as written.
[0096]
When a search result is obtained under the above specification, the description takes the following form:
Bottom of the 1st: reversal on Takahashi's two-run timely hit
[0097]
In addition, during real-time authoring, when the key index (central event) of one of the above state transition patterns occurs while indexes are being assigned, it is verified whether the state transition pattern is satisfied before and after the key index within the specified granularity. When the events satisfying the state transition pattern have occurred in sequence, the meaning defined by the pattern is regarded as having occurred, and processing such as adding a new index or adding that information as an attribute of the key index is performed.
[0098]
For example, if one or more additional point indexes follow a hit within the at-bat granularity, the hit is determined to be timely, and processing such as newly adding a timely hit index (abstract index) or adding "timely" as an attribute of the hit index (event index) is performed.
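The real-time check described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `on_event`, the dictionary layout of an index, and the event names are hypothetical. The text offers adding an abstract index or adding an attribute as alternatives; the sketch performs both only to show each option.

```python
def on_event(unit_events, new_event, abstract_indexes):
    """Append new_event to the at-bat being authored and tag a timely hit
    as soon as the pattern 'hit followed by additional point' is completed."""
    unit_events.append(new_event)
    # Verify the pattern around the key index "hit" within the current at-bat.
    if new_event["type"] == "additional point" and len(unit_events) >= 2:
        prev = unit_events[-2]
        if prev["type"] == "hit":
            prev["attrs"]["timely"] = True              # option 1: attribute on the hit index
            if "timely hit" not in abstract_indexes:
                abstract_indexes.append("timely hit")   # option 2: new abstract index
    return unit_events, abstract_indexes

events, abstract = [], []
for kind in ["hit", "additional point", "additional point"]:
    on_event(events, {"type": kind, "attrs": {}}, abstract)
print(events[0]["attrs"], abstract)
```

A second additional point after the first leaves the result unchanged, since the hit has already been tagged.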
[0099]
Further, an index (abstract index) called a timely hit may be set in the state transition table in advance as a key index.
[0100]
The video search method and the video index assigning method according to the first embodiment described above are realized by having a computer execute a program prepared in advance in accordance with the above description and the procedures shown in the flowcharts. This program is recorded on a computer-readable recording medium such as a hard disk, a floppy disk, a CD-ROM, an MO, or a DVD, and is executed by being read from the recording medium by the computer. The program can also be distributed via the recording medium or via a network.
[0101]
(Embodiment 2)
The video content description generating apparatus according to the second embodiment is basically one application of the search performed by the video search processing apparatus 100 according to the first embodiment: it parses the search conditions and, when a pattern matching a state transition pattern is found, defines and executes a procedure for generating a commentary (explanatory text) that explains the meaning of the pattern, thereby automatically generating a description of the video content. That is, a character string describing the video content is generated by using the semantic definition (state transition pattern) of the video content specified in the state transition table.
[0102]
However, the description of the video content need not be generated as post-processing or as an additional function of the video search processing in the video search processing device 100. As long as the structure index and the event index described above have been added, the video scene of each section delimited by the structure index is treated as a structural unit of the video, and the video is structured using a plurality of hierarchical structural units, the description generation processing can be performed as an independent function separate from the video search processing. Therefore, in order to present the video content description generation method of the present invention clearly, the apparatus to which the present invention is applied is referred to here as the video content description generation device 200.
[0103]
In the second embodiment, the index information portion of the structured video, composed of the structure index, the event index, and the abstract index, and the actual video scene portion are managed separately, and it is assumed that only the index information portion is input when a description is generated. As a result, processing can be performed separately from the video scene portion, which has a large data amount, so that the load on the apparatus is reduced and the description can be generated at high speed.
[0104]
Hereinafter, the video content description generating apparatus 200 according to the second embodiment will be described in the following order:
(7) Device configuration of the video content description generation device
(8) Relationship between event index and abstract index
(9) Structure of state transition table according to the second embodiment
(10) Example of algorithm for generating description of video contents
(11) Specific example of an explanatory note generated using syntax and character string definition information
[0105]
(7) Device configuration of the video content description generation device
FIG. 10 shows a configuration diagram of the video content description generation device according to the second embodiment. FIG. 10A shows an example of the hardware configuration of the video content description generation device 200, and FIG. 10B shows a functional block diagram of the video content description generation device 200.
The hardware of the video content description generation device 200 may be any device having at least a CPU (Central Processing Unit), a display, a keyboard, and a magnetic disk. For example, a personal computer as shown in FIG. 10A can be used.
[0106]
Also, as shown in FIG. 10B, the video content description generation device 200 includes: a video input unit 201 for inputting the structured video that is the target of description generation; a state transition table storage unit 202 in which the terms, generation granularities, state transition patterns, character string definition information, and key indexes described later are stored as a state transition table; an operation input unit 203 for inputting the various specifications needed for generating a description of the video content; a term search unit 204 that, when the range (structural unit) of the video scene for which a description is to be generated is specified via the operation input unit 203, refers to the state transition table in the state transition table storage unit 202, searches for generation granularities that match that structural unit, and uses the state transition patterns of the matching generation granularities to search for terms that hold in the structural unit; a description generating unit 205 that generates a description of the video scene using the character string definition information of the state transition patterns of the terms found (the established terms); and a description display unit 206 that displays the description generated by the description generating unit 205.
[0107]
It should be noted that the video content description generating apparatus 200 according to the second embodiment generates a description by using the state transition table, in which the information used for generating descriptions is set, together with attribute information that has been assigned in advance to the structure indexes and event indexes as character strings or in a form convertible to character strings.
[0108]
(8) Relationship between event index and abstract index
Next, the structured video that is the target of description generation by the video content description generating apparatus 200 according to the second embodiment will be described. The structured video used in the second embodiment is given structure indexes for dividing the video into semantic groups, event indexes for specifying the content and location of events that have occurred in the video, and abstract indexes which, when a state transition pattern defining a term holds, indicate that the meaning of that term holds.
[0109]
Note that the relationship between the structure index and the event index is the same as described in "(2) Relationship between structure index and event index" in the first embodiment, so the relationship between the event index and the abstract index will be described here. In addition, as the unit of video used when generating a description, a structural unit defined by a structure index is specified as the generation granularity.
[0110]
The event index is an index for specifying the content and location of an event that has occurred in the video. In other words, it is provided in association with one event on the video and has fragmentary information indicating the content (meaning) of the event as attribute information.
[0111]
The abstract index is an index indicating a meaning represented by a combination of a plurality of events (that is, a term having a high degree of abstraction described in the first embodiment). Further, the abstract index has meaning only when the occurrence of a plurality of events in a certain section becomes clear, and has a meaning that can be expressed by the occurrence pattern of the plurality of event indexes.
[0112]
In the second embodiment, on the other hand, a state transition pattern defines a term having a high degree of abstraction, based on the occurrence pattern of the events (event indexes), as an input sequence of a plurality of event indexes that occur consecutively within the generation granularity.
[0113]
In other words, an abstract index has a meaning that can be expressed by the occurrence pattern of a plurality of event indexes, and a state transition pattern defines a term as an input sequence of a plurality of event indexes. Therefore, an abstract index and a state transition pattern correspond one-to-one, with the input sequence (or occurrence pattern) of event indexes as the defining element of both. Consequently, a state transition pattern (an input sequence of a plurality of event indexes) can be expressed using the abstract index that corresponds to it one-to-one. Expressing a state transition pattern with abstract indexes and event indexes allows it to be represented with fewer indexes, and also makes it easy to define terms having an even higher degree of abstraction.
[0114]
Also, a corresponding abstract index can be added to a video only when its state transition pattern holds; accordingly, when an abstract index has been assigned in a video, this means that the corresponding state transition pattern holds.
[0115]
Next, an example in which a state transition pattern is defined using an abstract index will be described with reference to FIG. 11. For example, as shown in FIG. 11A, suppose that the state transition pattern of a certain term W is defined by the input sequence of event indexes event 1 (event index) → event 2 → event 3 → event 8 → event 12.
[0116]
Further, as shown in FIG. 11B, suppose that the state transition pattern of a certain term Z is defined by the input sequence of event indexes event 1 → event 2 → event 3 → event 8 → event 12 → event 13.
[0117]
In such a case, if the state transition pattern of the term W (event 1 → event 2 → event 3 → event 8 → event 12) is defined as being expressed by the abstract index W corresponding to the term W, the state transition pattern of the term Z can be written in simplified form, as shown in FIG. 11C, using abstract W (abstract index W) and event 13 (event index).
[0118]
When defining a state transition pattern, not only an event index but also an abstract index can be used, so that a state transition pattern of a more complicated term (a term of a more abstract concept) can be easily defined.
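The relationship between the terms W and Z can be illustrated with a short sketch. The table layout and the function name `expand` are hypothetical; the event sequences are those given for FIG. 11 in the text. Expanding the simplified pattern of Z recovers the full input sequence of event indexes.

```python
# State transition patterns keyed by abstract index; entries may refer to
# event indexes or to other abstract indexes (hypothetical representation).
ABSTRACT_PATTERNS = {
    "abstract W": ["event 1", "event 2", "event 3", "event 8", "event 12"],
    "abstract Z": ["abstract W", "event 13"],
}

def expand(pattern, table):
    """Recursively replace abstract indexes with their event-index sequences."""
    out = []
    for item in pattern:
        if item in table:
            out.extend(expand(table[item], table))
        else:
            out.append(item)
    return out

expanded_z = expand(ABSTRACT_PATTERNS["abstract Z"], ABSTRACT_PATTERNS)
print(expanded_z)
```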
[0119]
(9) Structure of state transition table according to the second embodiment
FIG. 12 shows a structural example of a state transition table according to the second embodiment. A term, a generation granularity, a state transition pattern, a key index, and character string definition information are set in the state transition table.
[0120]
The term is a term that can be expressed by a combination of a plurality of events, and is the same as the content described in the first embodiment.
The generation granularity is the structural unit set as the appropriate unit of video for generating a description; any structural unit defined by a structure index can be specified. That is, the generation granularity is the same as the search granularity in the first embodiment.
[0121]
The state transition pattern defines a term as an input sequence of a plurality of event indexes that continuously occur in the generation granularity based on an occurrence pattern of an event expressing the term. Also, as described above, the state transition pattern can be expressed using not only the event index but also the abstract index. For example, in FIG. 12, the state transition pattern of the term C can be expressed as an input sequence represented by “event 1 → event 4 → event 6 → event 7 → event 8” when expressed only by the event index. If represented using an abstract index (abstract B), it can be represented as an input sequence represented by “abstract B → event 7 → event 8”.
[0122]
The key index is set by selecting at least one of the event indexes existing in the state transition pattern. In the second embodiment, it is assumed that an abstract index representing a corresponding term, in other words, an abstract index representing the entire corresponding state transition pattern is set as one of the key indexes.
[0123]
In the character string definition information, the syntax elements used for generating a description and the input source of the character string used for each syntax element are set. Attribute information of a structure index or an event index is set as the input source of a character string. For example, a setting is made such that the content of attribute information 1 of event 1 (event index) is the input source of the character string. As the input source, the type of attribute information may simply be specified as described later, or a character string (xxxxxx) may be written directly between quotation marks, as in the syntax element (What) of the term C, and used as the input source.
[0124]
As the syntax elements, elements based on 5W1H, namely "when (When), where (Where), why (Why), who (Who), what (What), and how it turned out (How)", are set. In the character string definition information, all of the 5W1H syntax elements may be set, or only some of them. Other syntax elements can also be set.
[0125]
Here, the state transition table shown in FIG. 12 is regarded as a table in which an abstract index representing each term is defined, and an example of the definition of the abstract index when a baseball video is targeted will be specifically described.
[0126]
For example, an abstract index for the term "1 base hit" can be described as follows.

[0127]
Note that the above description is created according to the following rules.
"&ABSINDEX character string":
&ABSINDEX is the declarator (identifier) of an abstract index, and indicates that the "character string" that follows is the name (term) of the abstract index and that the meaning of the abstract index is that "character string".
"&RANGE character string":
&RANGE is the declarator of the structural unit (generation granularity), and the "character string" that follows specifies the structural unit (generation granularity).
[0128]
"&PATTERN character string":
&PATTERN is the declarator of the state transition pattern, and the "character string" that follows expresses the state transition pattern. For example, "&PATTERN hit" indicates that the state transition pattern consists of a single event (the hit event index).
"&KEY character string":
&KEY is the declarator of the key index, and the "character string" that follows specifies the key index. Note that, in addition to that "character string", the abstract index itself declared by "&ABSINDEX character string" is automatically specified as a key index.
[0129]
"&EXP <character string>":
&EXP is the declarator of the character string definition information, and the "<character string>" that follows defines the syntax elements and their input sources. For example, <When: inning_time, Who: batter_name, What: "1 base hit"> indicates that the syntax element (When) takes the attribute information (inning_time) as its input source, the syntax element (Who) takes the attribute information (batter_name) as its input source, and the syntax element (What) takes "1 base hit" as its input source (that is, the characters between the quotation marks are input).
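The declarator rules above can be illustrated with a small parser. This is a sketch under stated assumptions: the sample definition in `SAMPLE` is hypothetical (in particular the &RANGE value "at-bat" and the one-declarator-per-line layout are assumed, since the original listing is not reproduced in this text), and `parse_definition` is an illustrative name, not part of the described apparatus.

```python
import re

def parse_definition(text):
    """Parse one abstract-index definition written with the & declarators."""
    entry = {}
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        declarator, _, rest = line.partition(" ")
        rest = rest.strip()
        if declarator == "&ABSINDEX":
            entry["term"] = rest
        elif declarator == "&RANGE":
            entry["granularity"] = rest
        elif declarator == "&PATTERN":
            entry["pattern"] = [p.strip() for p in rest.split(",")]
        elif declarator == "&KEY":
            entry["keys"] = [k.strip() for k in rest.split(",")]
        elif declarator == "&EXP":
            body = rest.strip("<>")
            # Each pair is "Element: source"; a quoted source is a literal string.
            pairs = re.findall(r'(\w+)\s*:\s*("[^"]*"|\w+)', body)
            entry["exp"] = dict(pairs)
    # The abstract index itself is automatically treated as a key index.
    entry["keys"] = entry.get("keys", []) + [entry["term"]]
    return entry

# Hypothetical definition following the rules in the text.
SAMPLE = """
&ABSINDEX 1 base hit
&RANGE at-bat
&PATTERN hit
&KEY hit
&EXP <When: inning_time, Who: batter_name, What: "1 base hit">
"""
entry = parse_definition(SAMPLE)
print(entry)
```

Quoted input sources keep their quotation marks in the parsed result, so a later step can tell a literal string from an attribute name.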
[0130]
Next, examples of definitions of other abstract indexes are shown below.

[0131]
(10) Example of algorithm for generating description of video contents
Next, the algorithm of the description generation process for video content will be described with reference to the flowchart of FIG. 13. When generating a description of video content using the video content description generation device 200 of the second embodiment, first, the video for which a description is to be generated, that is, the video of the target structural unit (the index information alone suffices), is input via the operation input unit 203 and the video input unit 201 (S1301).
[0132]
Here, the video of the target structural unit may be input, for example, by having the user specify via the operation input unit 203 the range of the video scene for which a description is to be generated and inputting the video (index information) of the structural unit corresponding to that range, or by setting a specific structural unit in advance as the range for description generation and automatically extracting and inputting the video of each such structural unit. With the former method, the user can select a desired video scene and generate a description of that scene only. With the latter method, descriptions can be generated continuously and automatically for each specific structural unit.
[0133]
Next, the term search unit 204 refers to the state transition table of the state transition table storage unit 202 and searches for a generation granularity that matches the structural unit of the video for which the description is generated (S1302). As a result of this search, it is determined whether or not there is a matching generation granularity (S1303). If there is a matching generation granularity, the process proceeds to step S1304. If there is no matching generation granularity, the process proceeds to step S1307. The term search unit 204 searches the state transition table in order from the top, and proceeds from step S1302 to step S1303 each time a matching generation granularity is found, until no matching generation granularity remains.
[0134]
Next, the term search unit 204 determines whether an event index that matches the key index of the state transition pattern corresponding to the matching generation granularity exists in the target structural unit (S1304). If no matching event index exists, the process returns to step S1302 to search for the next matching generation granularity. On the other hand, if a matching event index exists, it is determined whether or not the input sequence of event indexes of the state transition pattern matches the occurrence pattern of the event indexes in the target structural unit (S1305).
[0135]
If the occurrence patterns match in step S1305, it is determined that the term defined by the state transition pattern holds, the abstract index of that term is added to the video (index information) input in step S1301 (S1306), and the process returns to step S1302. Adding the abstract index of the established term to the video makes it possible, for example, to use that abstract index when a description is generated again from the same video.
[0136]
The processes in steps S1302 to S1306 are executed until no matching generation granularity remains in the state transition table. In other words, for every matching generation granularity in the state transition table, it is determined whether or not the term of that generation granularity holds, and the set of terms that hold is extracted.
[0137]
If it is determined in step S1303 that there is no matching generation granularity, the description generation unit 205 generates a description using the character string definition information of the established terms (S1307), displays the generated description on the description display unit 206, and ends the process. Note that a syntax based on 5W1H is prepared in advance in the description generation unit 205 for generating descriptions. When generating a description, the description generation unit 205 arranges the syntax elements of the character string definition information in this 5W1H syntax.
[0138]
The algorithm of the description generation process has been described above on the assumption that the key index is an event index. However, the determination of whether a term holds can be sped up by giving priority to the abstract index as the key index: after step S1303, a step is added that determines whether an abstract index (key index) has been attached to the target structural unit, and if one has been set, the term defined by the corresponding state transition pattern is judged to hold. If no abstract index is present, the processing from step S1304 onward is executed using the event index specified as the key index.
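The flow of steps S1302 to S1306, together with the abstract-index shortcut, can be sketched as follows. This is a minimal sketch, not the flowchart of FIG. 13 itself: the table rows, the dictionary layout of a structural unit, the function names, and the simplification of pattern matching to a contiguous run of event indexes are all assumptions made for illustration.

```python
def occurs_in(pattern, events):
    """True if `pattern` appears as a contiguous run inside `events`."""
    n = len(pattern)
    return any(events[i:i + n] == pattern for i in range(len(events) - n + 1))

def established_terms(unit, table):
    """Collect the terms that hold in the given structural unit."""
    found = []
    for row in table:                                    # S1302: scan the table from the top
        if row["granularity"] != unit["granularity"]:    # S1303: generation granularity must match
            continue
        if row["term"] in unit["abstract"]:              # shortcut: abstract index already attached
            found.append(row["term"])
            continue
        if row["key"] not in unit["events"]:             # S1304: key index must be present
            continue
        if occurs_in(row["pattern"], unit["events"]):    # S1305: compare with occurrence pattern
            unit["abstract"].append(row["term"])         # S1306: add the abstract index
            found.append(row["term"])
    return found                                         # S1307 would build the text from these

# Hypothetical state transition table rows.
TABLE = [
    {"term": "1 base hit", "granularity": "at-bat", "key": "hit", "pattern": ["hit"]},
    {"term": "timely hit", "granularity": "at-bat", "key": "hit",
     "pattern": ["hit", "additional point"]},
    {"term": "sacrifice hit", "granularity": "at-bat", "key": "out",
     "pattern": ["out", "additional point"]},
]
unit = {"granularity": "at-bat", "events": ["hit", "additional point"], "abstract": []}
terms = established_terms(unit, TABLE)
print(terms)
```

On a second call with the same unit, both terms are found through the attached abstract indexes without re-checking the event patterns.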
[0139]
(11) Specific example of an explanatory note generated using syntax and character string definition information
Next, an example of a syntax based on 5W1H and examples of descriptions generated using the syntax and character string definition information will be described concretely. To simplify the explanation, the following 5W1H syntax is used as an example.

[0140]
(1) When the number of established terms is one (Example 1)
First, the simplest example of generating a description will be described.
For example, when the established term is "timely hit" and its character string definition information is
&EXP <When: inning_time, Who: batter_name, What: "Timely hit">
the three syntax elements
"When", "Who", "What"
are obtained from the character string definition information of the term.
[0141]
Also, when the contents (character strings) of the attribute information specified as the input sources of the syntax elements are
"inning_time = bottom of the 1st"
"batter_name = Takahashi"
"What: "Timely hit""
the following description is generated from the syntax and the character string definition information.

Note that (none) indicates a position for which there is no corresponding information; not all of the 5W1H syntax elements need be present. As described above, a description can be generated using the syntax elements obtained from the character string definition information. If a syntax element other than 5W1H is specified in the character string definition information, the syntax may be adjusted as appropriate to extend the description, or a plurality of syntaxes incorporating other syntax elements may be prepared in advance and selected as appropriate.
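Example 1 can be sketched in code as follows. The slot order in `SYNTAX_ORDER`, the function names, and the inning value string are assumptions for illustration, since the 5W1H syntax listing itself is not reproduced in this text; the sketch only shows how attribute names and quoted literals are resolved and how missing elements become (none).

```python
# Assumed slot order of the 5W1H syntax (hypothetical).
SYNTAX_ORDER = ["When", "Where", "Who", "Why", "What", "How"]

def resolve(source, attrs):
    """A quoted source is a literal string; otherwise it names an attribute."""
    if source.startswith('"') and source.endswith('"'):
        return source[1:-1]
    return attrs.get(source, "(none)")

def generate(exp, attrs):
    """Fill each 5W1H slot from the character string definition information."""
    slots = {}
    for element in SYNTAX_ORDER:
        slots[element] = resolve(exp[element], attrs) if element in exp else "(none)"
    return slots

exp = {"When": "inning_time", "Who": "batter_name", "What": '"Timely hit"'}
attrs = {"inning_time": "bottom of the 1st", "batter_name": "Takahashi"}
slots = generate(exp, attrs)
print(" / ".join(f"{name}: {value}" for name, value in slots.items()))
```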
[0142]
(2) When there are multiple established terms (Example 2)
When a plurality of terms hold, the description generation unit 205 extracts the syntax elements defined in the character string definition information of each term, places the extracted syntax elements at the positions of the corresponding syntax elements in the 5W1H syntax, and generates a description.
[0143]
For example, when the two terms
"Timely hit"
"Reversal"
hold, and their character string definition information is
&EXP <When: inning_time, Who: batter_name, What: "Timely hit">
&EXP <How: "Reversal">
the syntax elements
"When", "Who", "What", "How"
are obtained from the character string definition information of the established terms.
[0144]
Also, when the contents (character strings) of the attribute information specified as the input sources of the syntax elements are
"inning_time = bottom of the 1st"
"batter_name = Takahashi"
"What: "Timely hit""
"How: "Reversal""
the following description is generated from the syntax and the character string definition information.

[0145]
(3) When there are multiple established terms (Example 3)
When a plurality of terms hold and some syntax elements are duplicated (for example, when there are several "Who" syntax elements), the state transition patterns of the terms containing the duplicated syntax elements are consulted, and the syntax element of the term that refers to the event index occurring later in the state transition pattern is selected.
[0146]
For example, when the two terms
"1 base hit"
"Timely hit"
hold, and their character string definition information is
&EXP <When: inning_time, Who: batter_name, What: "1 base hit">
&EXP <When: inning_time, Who: batter_name, What: "Timely hit">
the syntax elements
"When", "Who", "What"
are obtained from the character string definition information of the established terms, but all three syntax elements are duplicated.
[0147]
Also, suppose that the contents (character strings) of the attribute information specified as the input sources of the syntax elements are as follows.
For the 1 base hit:
"inning_time = bottom of the 1st"
"batter_name = Kiyohara"
"What: "1 base hit""
For the timely hit:
"inning_time = bottom of the 1st"
"batter_name = Takahashi"
"What: "Timely hit""
Here, the content of "inning_time" is "bottom of the 1st" in both cases, so no selection is needed, but the contents of "batter_name" and "What" differ, so one of each must be selected.
[0148]
In such a case, the state transition patterns of the terms containing the duplicated syntax elements are consulted, and the syntax element of the term that refers to the event index occurring later in the state transition pattern is selected.
For example, when the state transition pattern of the 1 base hit is
&PATTERN hit
and the state transition pattern of the timely hit is
&PATTERN 1 base hit, additional point+
the positions at which the event index (hit), the event index (1 base hit), and the event index (additional point+) are attached within the range of the video scene are compared, and the event index attached at the latest position in time is identified. Here, it is assumed that the event index (additional point+) is attached at the latest position in time.
[0149]
Since the term that refers to the identified event index (additional point+) is the timely hit, the syntax elements of the timely hit are selected as those referring to the later event index in the state transition pattern.
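The selection rule of Example 3 can be sketched as follows. The function name, the dictionary layout, and the numeric positions are hypothetical; the positions simply encode the assumption stated in the text that the additional point index is attached later in time than the hit.

```python
def pick_by_latest_event(candidates, positions):
    """Choose the term whose state transition pattern refers to the index
    attached at the latest position in the video scene."""
    def latest(term):
        return max(positions[index] for index in term["pattern"])
    return max(candidates, key=latest)

# Hypothetical time offsets at which each index is attached.
positions = {"hit": 10, "1 base hit": 10, "additional point+": 25}
candidates = [
    {"term": "1 base hit", "pattern": ["hit"], "who": "Kiyohara"},
    {"term": "timely hit", "pattern": ["1 base hit", "additional point+"], "who": "Takahashi"},
]
chosen = pick_by_latest_event(candidates, positions)
print(chosen["term"], chosen["who"])
```

The duplicated "Who" and "What" elements are then taken from the chosen term, here the timely hit.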
[0150]
The following explanation is generated from the syntax and the character string definition information.

[0151]
Further, by adding a condition for generating a description when a plurality of terms are established, a description can be generated by arranging overlapping syntax elements in the order of occurrence time as described below.

[0152]
(4) When there are multiple established terms (Example 4)
As another example of generating a description when a plurality of terms hold, when the 5W1H syntax elements are duplicated, the state transition patterns of the terms are compared, and the description is generated by giving priority to the syntax elements of the state transition pattern defined using more event indexes.
[0153]
For example, when the three terms
"Timely hit"
"Timely two base"
"Reversal"
hold, and their character string definition information is
&EXP <When: inning_time, Who: batter_name, What: "Timely hit">
&EXP <When: inning_time, Who: batter_name, What: "Timely two base">
&EXP <How: "Reversal">
the syntax elements
"When", "Who", "What", "How"
are obtained from the character string definition information of the established terms, but the three syntax elements other than "How" are duplicated.
[0154]
Also, suppose that the contents (character strings) of the attribute information specified as the input sources of the syntax elements are as follows.
For the timely hit:
"inning_time = bottom of the 1st"
"batter_name = Takahashi"
"What: "Timely hit""
For the timely two base:
"inning_time = bottom of the 1st"
"batter_name = Takahashi"
"What: "Timely two base""
Here, the content of "inning_time" is "bottom of the 1st" in both cases and the content of "batter_name" is "Takahashi" in both cases, so no selection is needed for them. However, the contents of "What" differ, so one must be selected.
[0155]
In such a case, the state transition patterns of the terms containing the duplicated syntax elements are compared, and the syntax element of the state transition pattern defined using more event indexes is selected preferentially.
For example, when the state transition pattern of the timely hit is
&PATTERN 1 base hit, additional point+
and the state transition pattern of the timely two base is
&PATTERN hit, second base advance, additional point+
the event indexes referred to in the state transition patterns are, for the "timely hit", the event index (1 base hit) and the event index (additional point+), and for the "timely two base", the event index (hit), the event index (second base advance), and the event index (additional point+).
Therefore,
reference count of timely two base (3) > reference count of timely hit (2)
and the syntax elements of the timely two base are selected.
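The priority rule of Example 4 reduces to comparing the number of indexes each state transition pattern refers to. A minimal sketch, with a hypothetical function name and data layout and the patterns taken from the example:

```python
def pick_by_reference_count(candidates):
    """Prefer the term whose state transition pattern refers to more indexes."""
    return max(candidates, key=lambda term: len(term["pattern"]))

candidates = [
    {"term": "timely hit",
     "pattern": ["1 base hit", "additional point+"],
     "what": "Timely hit"},
    {"term": "timely two base",
     "pattern": ["hit", "second base advance", "additional point+"],
     "what": "Timely two base"},
]
preferred = pick_by_reference_count(candidates)
print(preferred["term"], len(preferred["pattern"]))
```

With reference counts of 3 and 2, the "What" element of the timely two base is used in the description.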
[0156]
The following explanation is generated from the syntax and the character string definition information.

[0157]
In addition, as another example of generating a description, as shown below, simply overlapping syntax elements may be listed in parallel.

[0158]
In other words, the generated description reads "Reversal due to a gust between the first and second and third bases, (Matsuzaka error, Takahashi hit)". As described above, depending on the situation, simply listing the elements in parallel can convey the events occurring in the video scene as more accurate information.
[0159]
The above-described method for generating a description of video content according to the second embodiment is realized by having a computer execute a program prepared in advance in accordance with the above description and the procedures shown in the flowcharts. This program is recorded on a computer-readable recording medium such as a hard disk, a floppy disk, a CD-ROM, an MO, or a DVD, and is executed by being read from the recording medium by the computer. The program can also be distributed via the recording medium or via a network.
[0160]
【Effects of the Invention】
As described above, according to the video search method of the present invention (claims 1 to 8), for each term that represents the meaning of a video scene arising from the occurrence of a plurality of events and that can be expressed by a combination of those events, an appropriate structural unit is set in advance as the search granularity, that is, the unit to be searched when the video is searched. In addition, based on the occurrence pattern of the events expressing the term, a state transition pattern corresponding to the term is defined as an input sequence specifying the order of the plurality of event indexes. When a desired video scene is searched for in the video, a term representing that scene is input; using the search granularity corresponding to the input term as the unit to be searched, it is determined for each structural unit matching the search granularity whether the state transition pattern corresponding to the input term matches the occurrence pattern of the event indexes in that structural unit; and the matching structural units are output as the search result. Consequently, queries about the meaning of the video content can be made using terms or concepts with a high level of abstraction, and the search can be performed at high speed.
[0161]
In addition, for each state transition pattern corresponding to a term, at least one of the event indexes in that state transition pattern is designated as a key index for starting the search. When a desired video scene is searched for in the video, structural units that match the search granularity and contain an event index matching the key index are searched for first, and only for those structural units is it determined whether the state transition pattern matches the occurrence pattern of the event indexes in the structural unit. Because the target structural units are narrowed down by a high-speed key index search (that is, a keyword search) before the determination using the state transition pattern is performed, video search by queries using more abstract terms or concepts can be sped up.
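The narrowing step can be sketched as a lookup table from (structural unit type, event index) to unit ids, followed by the pattern check on the surviving candidates only. This is a sketch under assumptions: the lookup structure, the function names, and the sample data are hypothetical, and contiguous matching is again assumed.

```python
from collections import defaultdict


def contains_run(events, pattern):
    """Return True if `pattern` occurs as a contiguous run inside `events`."""
    n = len(pattern)
    return any(events[i:i + n] == pattern for i in range(len(events) - n + 1))


def build_key_lookup(units):
    """Map (unit kind, event name) to the ids of units containing that event."""
    lookup = defaultdict(list)
    for unit_id, (kind, events) in units.items():
        for event in set(events):
            lookup[(kind, event)].append(unit_id)
    return lookup


def search_with_key(units, lookup, granularity, pattern, key_index):
    """Narrow candidates by key index first, then run the pattern check."""
    candidates = lookup.get((granularity, key_index), [])
    return [uid for uid in candidates if contains_run(units[uid][1], pattern)]


# Hypothetical data: unit id -> (kind, event index names in order).
units = {
    "at-bat-1": ("at_bat", ["pitch", "strike"]),
    "at-bat-2": ("at_bat", ["pitch", "hit", "run"]),
    "at-bat-3": ("at_bat", ["pitch", "run"]),
}
lookup = build_key_lookup(units)
hits = search_with_key(units, lookup, "at_bat", ["hit", "run"], "run")
print(hits)
```

The more expensive sequence comparison runs only on units that already contain the key index, which is the source of the speed-up the paragraph describes.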
[0162]
In addition, when a desired video scene is extracted from the video based on the structural unit output as the search result, the video scene specified by that structural unit is output. That is, because the video scene is output in the appropriate structural unit set in advance as the search granularity, a video scene close to the portion the user wants can be output easily.
[0163]
In addition, when a desired video scene is extracted from the video based on the structural unit output as the search result, an arbitrary higher- or lower-level structural unit of the structured video can be specified. Combining a query that uses a term with a specification of the structural unit to be output makes the search more convenient. This is useful, for example, when the video scene the user actually wants to see lies before and/or after the structural unit retrieved with the abstract term, or lies in only part of that structural unit (a lower-level structural unit).
[0164]
In addition, when a desired video scene is extracted from the video based on the structural unit output as the search result, the video scene is extracted by specifying offsets for clipping the video before and after the position at which the key index is assigned, which makes the search more convenient. This is useful, for example, when the user also wants to check the situation before and/or after the structural unit retrieved with an abstract term representing a video scene.
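Offset-based clipping can be sketched as follows. The function name, the use of seconds as the time unit, and the clamping to the bounds of the video are assumptions for illustration and are not stated in the source.

```python
def clip_around_key(key_time, before, after, video_length):
    """Return (start, end) of a clip around the key index position.

    All values are in seconds (an assumed unit). The clip is clamped so
    that it never extends past the beginning or end of the video.
    """
    start = max(0.0, key_time - before)
    end = min(video_length, key_time + after)
    return start, end


# Key index at 100 s, clip 10 s before and 5 s after, in a 3600 s video.
print(clip_around_key(100.0, 10.0, 5.0, 3600.0))
```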
[0165]
When it is determined that the state transition pattern corresponding to the input term matches the occurrence pattern of the event indexes in a structural unit, an index representing the term is newly added to the matched structural unit as an abstract index and is reused as a key index. Therefore, when the video is searched again using the same abstract term, the search can be further sped up by using this key index (abstract index). Further, when the structural unit output as the search result is to be checked again later, the same structural unit can be reliably retrieved using this key index (abstract index).
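The reuse of an abstract index can be sketched as a tag that is written on the first successful match and consulted first on later searches. The dictionary layout, the `abstract` field, and the term name are hypothetical; contiguous matching is assumed as before.

```python
def contains_run(events, pattern):
    """Return True if `pattern` occurs as a contiguous run inside `events`."""
    n = len(pattern)
    return any(events[i:i + n] == pattern for i in range(len(events) - n + 1))


def search_and_tag(units, term, pattern):
    """Search for `term`; tag each matching unit with the term as an abstract index."""
    hits = []
    for unit in units:
        if term in unit["abstract"]:
            # Abstract index already assigned: no pattern check is needed.
            hits.append(unit["id"])
        elif contains_run(unit["events"], pattern):
            unit["abstract"].add(term)   # newly add the abstract index
            hits.append(unit["id"])
    return hits


units = [
    {"id": "at-bat-1", "events": ["pitch", "hit", "run"], "abstract": set()},
    {"id": "at-bat-2", "events": ["pitch", "strike"], "abstract": set()},
]
first = search_and_tag(units, "scoring_hit", ["hit", "run"])
second = search_and_tag(units, "scoring_hit", ["hit", "run"])
print(first, second)
```

The second call returns the same result through the abstract index alone for the unit that matched the first time.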
[0166]
In addition, a plurality of pieces of attribute information are added to the structure index and the event index, and character string definition information for generating a description related to the term from the attribute information of the structure index and the event index is added to the state transition pattern corresponding to the term. A description related to the term is generated by referring, based on the character string definition information of the state transition pattern, to the attribute information in the structural unit output as the search result. The meaning of the video content can therefore be interpreted and converted into a character string with a general meaning, producing a description of the video content that is easy for the user to understand.
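A minimal sketch of description generation from character string definition information follows. The template syntax, the attribute names `batter` and `runs`, and the sample events are hypothetical; the sketch assumes that each syntax element names one event index and one attribute as its input source.

```python
def generate_description(template, string_definition, events):
    """Fill a sentence template from the attributes of event indexes.

    `string_definition` maps a syntax element (e.g. "who") to a pair
    (event index name, attribute name) that serves as its input source.
    """
    values = {}
    for element, (event_name, attribute) in string_definition.items():
        # Use the first event index in the unit with the required name.
        source = next(e for e in events if e["name"] == event_name)
        values[element] = source["attrs"][attribute]
    return template.format(**values)


# Hypothetical definition attached to one state transition pattern.
string_definition = {"who": ("hit", "batter"), "what": ("run", "runs")}
template = "{who} drives in {what} run(s)"

events = [
    {"name": "pitch", "attrs": {"pitcher": "Matsuzaka"}},
    {"name": "hit", "attrs": {"batter": "Takahashi"}},
    {"name": "run", "attrs": {"runs": 1}},
]
print(generate_description(template, string_definition, events))
```

The sketch relies on the pattern having matched beforehand, so every event index named in the definition is present in the unit.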
[0167]
In addition, since an abstract index is used in addition to the event indexes and the state transition pattern corresponding to a term is defined as an input sequence consisting of event indexes and abstract indexes, the state transition pattern can be defined easily and the search processing can be sped up.
[0168]
According to the video search method of the present invention (claim 9), the processing performed before a video search includes a state transition table generation step. In this step, for each term that represents the meaning of a video scene arising from the occurrence of a plurality of events and that can be expressed by a combination of a plurality of events, a structural unit appropriate as the search target unit is set in advance as the search granularity; a state transition pattern corresponding to the term is defined, based on the occurrence pattern of the events that express the term, as an input sequence specifying the order of a plurality of event indexes that occur consecutively within the search granularity; at least one of the event indexes in each state transition pattern is designated as a key index for starting the search; and a state transition table consisting of the term, search granularity, state transition pattern, and key index is generated. The processing performed during a video search includes: a term input step of inputting a term that expresses the desired video scene; a search step of referring to the state transition table, using the search granularity corresponding to the input term as the search target unit, and searching, with the key index corresponding to the term, for structural units having an event index that matches the key index; a first determination step of referring to the state transition table to determine whether all of the event indexes included in the state transition pattern corresponding to the term exist in each structural unit found in the search step; a second determination step of determining, for the structural units in which all of them were found to exist, whether the state transition pattern corresponding to the input term matches the occurrence pattern of the event indexes in the structural unit; and a search result output step of cutting out a video scene from the video based on the structural units determined to match in the second determination step and outputting it as the search result. Consequently, queries about the meaning of the video content can be made using highly abstract terms or concepts, and, because the target structural units are narrowed down by a high-speed key index search (that is, a keyword search) before the determination using the state transition pattern is performed, video search by queries using more abstract terms or concepts can be sped up.
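The two determination steps can be sketched as a cheap membership test followed by the order-sensitive check. The function names are hypothetical, and the second step again assumes that the pattern must appear as a contiguous run.

```python
def first_determination(pattern, events):
    """Check that every event index in the pattern exists in the unit."""
    return set(pattern) <= set(events)


def second_determination(pattern, events):
    """Check that the pattern occurs in order as a contiguous run (assumption)."""
    n = len(pattern)
    return any(events[i:i + n] == pattern for i in range(len(events) - n + 1))


def judge(pattern, events):
    """Run the order-sensitive check only if the membership check passes."""
    return first_determination(pattern, events) and second_determination(pattern, events)


pattern = ["hit", "run"]
print(judge(pattern, ["pitch", "hit", "run"]))   # all present, order matches
print(judge(pattern, ["run", "pitch", "hit"]))   # all present, order differs
print(judge(pattern, ["pitch", "hit"]))          # "run" is missing
```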
[0169]
Further, according to the computer-readable recording medium of the present invention (claim 10), a program for causing a computer to execute the video search method according to any one of claims 1 to 9 is recorded. Having a computer execute this program makes it possible to query the meaning of the video content using highly abstract terms or concepts, and, because the target structural units are narrowed down by a high-speed key index search (that is, a keyword search) before the determination using the state transition pattern is performed, video search by queries using more abstract terms or concepts can be sped up.
[0170]
Further, the video search processing device of the present invention (claims 11 and 12) comprises: video input means for inputting the structured video to be searched; storage means for storing, as a state transition table, a term that specifies a desired video scene to be searched for, represents the meaning of a video scene arising from the occurrence of a plurality of events, and can be expressed by a combination of a plurality of events, a search granularity that designates the structural unit serving as the search target unit when the video is searched by the term, a state transition pattern that defines the term as an input sequence specifying the order of a plurality of event indexes that occur consecutively within the search granularity, and a key index that designates at least one of the event indexes present in the state transition pattern; operation input means for inputting or specifying a query term for searching for the desired video scene; search means that, when a query term is input or specified via the operation input means, refers to the state transition table of the storage means and, using the key index corresponding to the query term, searches the video input by the video input means for structural units that match the search granularity and have an event index matching the key index; determination means that receives the structural units found by the search means and determines whether the state transition pattern corresponding to the query term matches the occurrence pattern of the event indexes in each structural unit; and search result output means that cuts out a video scene from the video based on the structural units determined to match by the determination means and outputs it as the search result. Consequently, queries about the meaning of the video content can be made using highly abstract terms or concepts, and, because the target structural units are narrowed down by a high-speed key index search (that is, a keyword search) before the determination using the state transition pattern is performed, video search by queries using more abstract terms or concepts can be sped up.
[0171]
Further, according to the video index assigning method of the present invention (claim 13), a state transition table is set in advance that associates, for each term: a term whose meaning is established by a plurality of events occurring consecutively, that represents the meaning of a video scene arising from the occurrence of those events, and that can be expressed by a combination of a plurality of events; a state transition pattern that defines the term as an input sequence specifying the order of the plurality of event indexes expressing the term; and a search granularity that designates, in correspondence with the term, a structural unit of the video delimited by a predetermined structure index. When a video index is assigned to a video, the state transition table is referred to for each structural unit specified by a structure index, and a search is made for a term whose search granularity matches the target structural unit and for which the order of the event indexes assigned within the target structural unit matches the input sequence of the state transition pattern. If a matching term exists, it is determined that the meaning of that term has been established, and an event index or attribute information indicating the establishment of the term is assigned. Indexes that use highly abstract terms can therefore be assigned efficiently and at high speed.
[0172]
According to the video index assigning method of the present invention (claim 14), a state transition table is set in advance that associates, for each term: a term whose meaning is established by a plurality of events occurring consecutively, that represents the meaning of a video scene arising from the occurrence of those events, and that can be expressed by a combination of a plurality of events; a state transition pattern that defines the term as an input sequence specifying the order of the plurality of event indexes expressing the term; a search granularity that designates, in correspondence with the term, a structural unit of the video delimited by a predetermined structure index; and a central index that designates at least one of the event indexes present in the state transition pattern. When a video index is being assigned to a video and an event index matching a central index is assigned, the state transition table is referred to for each structural unit specified by a structure index, and a search is made for a term whose search granularity matches the target structural unit and for which the order of the event indexes assigned within the target structural unit matches the input sequence of the state transition pattern. If a matching term exists, it is determined that the meaning of that term has been established, and an event index or attribute information indicating the establishment of the term is assigned. Indexes that use highly abstract terms can therefore be assigned efficiently and at high speed.
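Index assignment triggered by a central index can be sketched as follows. The table layout, the `established` field, and the term and event names are hypothetical, and the pattern check again assumes a contiguous run of event indexes.

```python
def contains_run(events, pattern):
    """Return True if `pattern` occurs as a contiguous run inside `events`."""
    n = len(pattern)
    return any(events[i:i + n] == pattern for i in range(len(events) - n + 1))


# Hypothetical state transition table with a central index per term.
TABLE = {
    "scoring_hit": {"granularity": "at_bat", "pattern": ["hit", "run"], "central": "run"},
}


def assign_event(unit, event, table=TABLE):
    """Assign an event index; check patterns only when it is a central index."""
    unit["events"].append(event)
    for term, entry in table.items():
        if entry["central"] != event or entry["granularity"] != unit["kind"]:
            continue  # not a trigger for this term in this kind of unit
        if contains_run(unit["events"], entry["pattern"]) and term not in unit["established"]:
            unit["established"].append(term)  # record that the term is established
    return unit["established"]


at_bat = {"kind": "at_bat", "events": [], "established": []}
for name in ["pitch", "hit", "run"]:
    assign_event(at_bat, name)
print(at_bat["established"])
```

Because the pattern check runs only when the newly assigned event index is a central index, most assignments cost a single table scan.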
[0173]
According to the computer-readable recording medium of the present invention (claim 15), a program for causing a computer to execute the video index assigning method according to claim 13 or 14 is recorded. Having a computer execute this program allows indexes that use highly abstract terms to be assigned efficiently and at high speed.
[0174]
According to the method for generating a description of video content of the present invention (claims 16 to 22), the state transition table sets, for each term that represents the meaning of a video scene arising from the occurrence of a plurality of events and that can be expressed by a combination of a plurality of events: a generation granularity, which is a structural unit appropriate as the unit of video for which a description is generated; a state transition pattern that defines the term, based on the occurrence pattern of the events that express it, as an input sequence specifying the order of a plurality of event indexes that occur consecutively within the generation granularity; a key index, which is at least one event index selected from those in each state transition pattern; and, for each state transition pattern, character string definition information that sets the syntax elements used to generate the description and the input source of the character string used for each syntax element. When a description is generated, the state transition table is referred to and searched for a generation granularity that matches the structural unit for which the description is to be generated; it is determined whether an event index matching the key index of the state transition pattern corresponding to that generation granularity exists in the target structural unit; if such an event index exists, it is determined whether the state transition pattern matches the occurrence pattern of the event indexes in the target structural unit; if they match, the term defined by the state transition pattern is determined to be established; and a description of the video scene of the target structural unit is generated using the character string definition information of the state transition pattern of the established term. The meaning of the video content is thus interpreted and converted into a character string with a general meaning, so that a description (explanatory character string) of the video content that is easy for the user to understand can be generated. In other words, for a target structural unit (video), the most suitable description (character string) can be generated according to the meaning of what occurs in that part of the video.
[0175]
Further, when a state transition pattern that defines a specific term is established in a structural unit, an index representing that term can be added as an abstract index, and the state transition table records, for each state transition pattern, whether the abstract index representing the corresponding term is set as one of the key indexes. When it is determined whether an event index matching the key index of the state transition pattern corresponding to the relevant generation granularity exists in the target structural unit, and an abstract index is set as a key index, it is determined whether the corresponding abstract index has been assigned to the target structural unit. If it has, the term defined by the state transition pattern is determined to be established, and a description of the video scene of the target structural unit is generated using the character string definition information of the state transition pattern of the established term. The determination of whether a term corresponding to the meaning of the video content is established can therefore be performed efficiently and at high speed.
[0176]
In addition, the attribute information of the structure index or the event index is set as the input source of the character string used for each syntax element in the character string definition information. When the state transition pattern matches and the term is established, attribute information is therefore always present in the structure index and state transition pattern concerned, so the character string can be input reliably and the description can be generated.
[0177]
In addition, since the character string definition information basically sets the 5W1H syntax elements, namely "when (When), where (Where), why (Why), who (Who), what (What), and how (How)," a description including 5W1H information can easily be generated using these syntax elements.
[0178]
In addition, when there are a plurality of state transition patterns corresponding to the relevant generation granularity, whether a term is established is determined for each state transition pattern, and the description of the video scene of the target structural unit is generated by combining the syntax elements of the character string definition information set in each state transition pattern whose term is established, so a more detailed description can be generated.
[0179]
In addition, when the syntax elements of the character string definition information set in the state transition patterns of the terms are combined to generate a description of the video scene of the target structural unit and some of the 5W1H syntax elements overlap, priority is given to the syntax element that refers to the event index occurring later in time, so a description using newer information (attribute information) can be generated.
[0180]
In addition, when the syntax elements of the character string definition information set in the state transition patterns of the terms are combined to generate a description of the video scene of the target structural unit and some of the 5W1H syntax elements overlap, the state transition patterns of the terms are compared and priority is given to the syntax elements of the state transition pattern defined with more event indexes. A description using newer information (attribute information) can thus be generated, and the meaning of what occurs in that part of the video can be expressed by a more appropriate description (character string). Giving priority to the state transition pattern defined with more event indexes works as follows: if a second state transition pattern consists of part of a first state transition pattern, the second pattern is always established whenever the first is, but the syntax elements of the first pattern, which has more event indexes, are always selected. As a result, the final outcome in that part of the video is selected as the syntax element.
[0181]
In addition, when the syntax elements of the character string definition information set in the state transition patterns of the terms are combined to generate a description of the video scene of the target structural unit and some of the 5W1H syntax elements overlap, the overlapping syntax elements are listed in parallel as necessary, so a description carrying more information, in other words one that is easier to understand, can be generated.
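The three ways of handling overlapping 5W1H syntax elements described in the preceding paragraphs (prefer the later event index, prefer the pattern with more event indexes, or list in parallel) can be compared in a short Python sketch. The `Candidate` class, the policy names, and the sample values are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    element: str        # 5W1H slot, e.g. "what"
    text: str           # character string proposed for the slot
    event_time: float   # time of the event index the element refers to
    pattern_size: int   # number of event indexes in the originating pattern


def resolve(candidates, policy):
    """Resolve overlapping syntax elements slot by slot under one policy."""
    by_slot = {}
    for c in candidates:
        by_slot.setdefault(c.element, []).append(c)
    result = {}
    for slot, group in by_slot.items():
        if policy == "latest":
            result[slot] = max(group, key=lambda c: c.event_time).text
        elif policy == "largest_pattern":
            result[slot] = max(group, key=lambda c: c.pattern_size).text
        elif policy == "parallel":
            result[slot] = ", ".join(c.text for c in group)
        else:
            raise ValueError(f"unknown policy: {policy}")
    return result


candidates = [
    Candidate("what", "Matsuzaka error", event_time=10.0, pattern_size=3),
    Candidate("what", "Takahashi hit", event_time=25.0, pattern_size=2),
]
for policy in ("latest", "largest_pattern", "parallel"):
    print(policy, resolve(candidates, policy))
```

The parallel policy reproduces the listing style of the "(Matsuzaka error, Takahashi hit)" example, while the other two policies pick a single winner per slot.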
[0182]
The computer-readable recording medium according to claim 24 records a program for causing a computer to execute the method for generating a description of video content according to any one of claims 16 to 23. Having a computer execute this program interprets the meaning of the video content and converts it into a character string with a general meaning, so that a description (explanatory character string) of the video content that is easy for the user to understand can be generated.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of a video search processing device according to a first embodiment of the present invention.
FIG. 2 is an explanatory diagram showing a relationship between a structure index and an event index.
FIG. 3 is an explanatory diagram showing a relationship between a structure index and an event index.
FIG. 4 is an explanatory diagram illustrating a state transition table according to the first embodiment.
FIG. 5 is an explanatory diagram illustrating a state transition table according to the first embodiment.
FIG. 6 is a state transition diagram of a finite state automaton corresponding to a path regular expression according to the first embodiment.
FIG. 7 is a diagram illustrating a state transition of the finite state automaton corresponding to the path regular expression according to the first embodiment.
FIG. 8 is a flowchart illustrating an algorithm of a video search process according to the first embodiment.
FIG. 9 is an explanatory diagram showing an example of a video index (structure index and event index) when a baseball video is taken as an example.
FIG. 10 is a configuration diagram of a video content description generation device according to a second embodiment.
FIG. 11 is an explanatory diagram illustrating an example in which a state transition pattern is defined using an abstract index according to the second embodiment.
FIG. 12 is an explanatory diagram illustrating a state transition table according to the second embodiment.
FIG. 13 is a flowchart illustrating an algorithm of a video content description generation process according to the second embodiment.
[Explanation of symbols]
100 Video search processing device
101 Video input unit
102 State transition table storage unit
103 Operation input unit
104 Video search unit
105 Determination unit
105a First determination unit
105b Second determination unit
106 Search result output unit
200 Video content description generation device
201 Video input unit
202 State transition table storage unit
203 Operation input unit
204 Term search unit
205 Description generation unit
206 Description display unit

Claims

A video search method that targets a video to which at least a structure index for dividing the video into semantic groups and an event index for specifying the content and location of an event occurring in the video have been added, the video being structured with a plurality of hierarchically organized structural units in which the video scene of each section delimited by a structure index serves as a structural unit, and that searches for a desired video scene in the video using the structure index and the event index, wherein:
in advance, for a term that represents the meaning of a video scene arising from the occurrence of a plurality of events and that can be expressed by a combination of a plurality of events, a structural unit appropriate as the search target unit when searching the video is set as a search granularity, and a state transition pattern corresponding to the term is defined, based on the occurrence pattern of the events that express the term, as an input sequence specifying the order of a plurality of event indexes that occur consecutively within the search granularity; and
when a desired video scene is searched for in the video, a term representing the desired video scene is input, the search granularity corresponding to the input term is used as the search target unit, it is determined for each structural unit that matches the search granularity whether the state transition pattern corresponding to the input term matches the occurrence pattern of the event indexes in the structural unit, and the matching structural units are output as a search result.

The video search method according to claim 1, wherein, for each state transition pattern corresponding to the term, at least one of the event indexes present in that state transition pattern is further designated as a key index for starting a search, and
when a desired video scene is searched for in the video, structural units that match the search granularity and have an event index matching the key index are searched for, and it is then determined, for those structural units, whether the state transition pattern matches the occurrence pattern of the event indexes in the structural unit.

The video search method, wherein, when a desired video scene is extracted from the video based on the structural unit output as the search result, the video scene specified by that structural unit is output.

The video search method according to claim 1, wherein, when a desired video scene is extracted from the video based on the structural unit output as the search result, an arbitrary higher- or lower-level structural unit of the structured video can be designated.

When extracting a desired video scene from the video based on the structural unit output as the search result, the video scene is extracted by specifying an offset for video clipping before and after the place where the key index is added 3. The video search method according to claim 2, wherein:

6. The video search method according to claim 2, wherein, when it is determined that the state transition pattern corresponding to the input term matches the occurrence pattern of the event indexes in a structural unit, an index representing the term is newly added to the matched structural unit as an abstract index and is reused as a key index.

7. The video search method according to any one of claims 1 to 6, wherein an abstract index is set in advance, for each of the terms that can be expressed by a combination of a plurality of events, as an index representing that term, and
in defining the state transition pattern, the abstract index is used in addition to the event index, so that the state transition pattern corresponding to the term is defined as an input sequence consisting of event indexes and abstract indexes.

A plurality of pieces of attribute information are added to the structure index and the event index,
Character string definition information for generating a description related to the term using the attribute information of the structure index and the event index is added to the state transition pattern corresponding to the term,
8. A description sentence related to the term is generated by referring to attribute information in a structural unit output as the search result based on the character string definition information of the state transition pattern. The video search method according to any one of the above.

In a video search method that targets a video to which at least a structure index for dividing the video into semantic groups and an event index for specifying the content and location of an event occurring in the video have been added, the video being structured with a plurality of hierarchically organized structural units in which the video scene of each section delimited by a structure index serves as a structural unit, and that searches for a desired video scene in the video using the structure index and the event index,
Before performing a video search,
a state transition table generation step is included in which, for a term that represents the meaning of a video scene arising from the occurrence of a plurality of events and that can be expressed by a combination of a plurality of events, a structural unit appropriate as the search target unit when searching the video is set in advance as a search granularity; a state transition pattern corresponding to the term is defined, based on the occurrence pattern of the events that express the term, as an input sequence specifying the order of a plurality of event indexes that occur consecutively within the search granularity; for each state transition pattern corresponding to the term, at least one of the event indexes present in that state transition pattern is designated as a key index for starting a search; and a state transition table consisting of the term, search granularity, state transition pattern, and key index is generated,
When performing video search,
A term input step of inputting a term expressing a desired video scene;
With reference to the state transition table, a search granularity corresponding to the term input in the term input step is set as a search target unit, and, using a key index corresponding to the term, an event index that matches the key index. A search step of searching for structural units having
A first determination step of referring to the state transition table to determine whether or not all of the event indexes included in the state transition pattern corresponding to the term exist in the structural unit searched in the search step;
A second determination is made as to whether or not the state transition pattern corresponding to the input term matches the occurrence pattern of the event index in the structural unit with respect to the structural unit determined to be all present in the first determination step. A determining step;
A search result output step of cutting out a video scene from the video based on the structural unit determined to match in the second determination step and outputting it as a search result;
A video search method comprising:

A computer-readable recording medium having recorded thereon a program for causing a computer to execute the video search method according to any one of claims 1 to 9.

In a video search processing device that targets a video to which at least a structure index for dividing the video into semantic groups and an event index for specifying the content and location of an event occurring in the video have been added, the video being structured with a plurality of hierarchically organized structural units in which the video scene of each section delimited by a structure index serves as a structural unit, and that searches for a desired video scene in the video using the structure index and the event index,
Video input means for inputting the structured video to be searched;
Storage means for storing, as a state transition table: a term that specifies a desired video scene to be searched for, represents the meaning of a video scene arising from the occurrence of a plurality of events, and can be expressed by a combination of a plurality of events; a search granularity that designates the structural unit serving as the search target unit when the video is searched by the term; a state transition pattern that defines the term as an input sequence specifying the order of a plurality of event indexes that occur consecutively within the search granularity; and a key index that designates at least one of the event indexes present in the state transition pattern;
Operation input means for inputting or specifying an inquiry term for searching for the desired video scene,
Search means that, when a query term is input or specified via the operation input means, refers to the state transition table of the storage means and, using the key index corresponding to the query term, searches the video input by the video input means for structural units that match the search granularity and have an event index matching the key index;
Inputting the structural unit searched by the searching means, determining means for determining whether the state transition pattern corresponding to the query term matches the occurrence pattern of the event index in the structural unit,
A search result output unit that cuts out a video scene from the video based on the structural unit determined to be matched by the determination unit and outputs it as a search result;
A video search processing device comprising the above means.
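
The flow through the storage, search, determination, and output means can be sketched as follows. This is only an illustrative sketch, not the claimed implementation: the class names, the baseball-style term "timely hit", the event labels, and the choice to treat "occurring consecutively" as a contiguous run of event indexes are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class StructuralUnit:
    granularity: str   # label of the structure index (hypothetical, e.g. "at_bat")
    start: float       # section start time in seconds
    end: float         # section end time in seconds
    events: list       # event indexes in order of occurrence


@dataclass
class TermEntry:
    granularity: str   # search granularity for the term
    pattern: list      # state transition pattern: ordered event indexes
    key_index: str     # key index: one event index taken from the pattern


# Hypothetical state transition table, keyed by query term.
STATE_TRANSITION_TABLE = {
    "timely hit": TermEntry("at_bat", ["runner_on_base", "hit", "run_scored"], "hit"),
}


def occurs_in_order(pattern, events):
    """True if `pattern` appears as a contiguous run inside `events` (assumed reading)."""
    n = len(pattern)
    return any(events[i:i + n] == pattern for i in range(len(events) - n + 1))


def search(term, units):
    entry = STATE_TRANSITION_TABLE[term]
    # Search means: narrow by search granularity and key index.
    candidates = [u for u in units
                  if u.granularity == entry.granularity and entry.key_index in u.events]
    # Determination means: compare the pattern with the occurrence pattern.
    matched = [u for u in candidates if occurs_in_order(entry.pattern, u.events)]
    # Search result output means: return the sections to cut out.
    return [(u.start, u.end) for u in matched]
```

Filtering by key index first keeps the ordered comparison away from structural units that cannot possibly match.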

12. The video search processing device according to claim 11, wherein the determination means comprises:
First determination means for determining whether all of the event indexes included in the state transition pattern corresponding to the query term exist in the structural unit found by the search means; and
Second determination means for determining, for a structural unit in which the first determination means has determined that all of them exist, whether the state transition pattern corresponding to the query term matches the occurrence pattern of the event indexes in that structural unit.
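
The two-stage determination can be sketched as a cheap membership test followed by an ordered comparison. The function names are hypothetical, and matching the pattern as a contiguous run is an assumed reading of "occurrence pattern"; the claim itself does not fix the matching rule.

```python
def all_events_present(pattern, events):
    """First determination: every event index in the pattern occurs in the unit."""
    return set(pattern) <= set(events)


def order_matches(pattern, events):
    """Second determination: the pattern occurs as a contiguous run (assumption)."""
    n = len(pattern)
    return any(events[i:i + n] == pattern for i in range(len(events) - n + 1))


def determine(pattern, events):
    # The ordered comparison only runs on units that pass the membership test.
    return all_events_present(pattern, events) and order_matches(pattern, events)
```

A unit such as `["b", "a"]` passes the first determination for the pattern `["a", "b"]` but is rejected by the second, which is the case the second stage exists to catch.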

A video index assigning method for assigning, when a video is structured, video indexes that include at least a structure index for dividing the video into semantic groups and an event index for specifying the content and location of an event that occurred in the video, wherein:
The following are set in advance in a state transition table in correspondence with each term: a term whose meaning is established by the successive occurrence of a plurality of events, which represents the meaning that a video scene acquires through the occurrence of those events and can be expressed as a combination of a plurality of events; a state transition pattern that defines the term, based on the occurrence pattern of the events that express it, as an input sequence specifying the order of a plurality of event indexes; and a search granularity that specifies a structural unit of the video delimited by a predetermined structure index;
When a video index is assigned to the video, the state transition table is referred to and, for each structural unit specified by the structure index, a search is made for a term whose search granularity matches the target structural unit and for which the order in which event indexes were assigned in the target structural unit matches the input sequence of event indexes in the state transition pattern; and when a matching term exists, it is determined that the meaning of the matched term has been established, and an event index or attribute information indicating the establishment of that term is assigned.

A video index assigning method for assigning, when a video is structured, video indexes that include at least a structure index for dividing the video into semantic groups and an event index for specifying the content and location of an event that occurred in the video, wherein:
The following are set in advance in a state transition table in correspondence with each term: a term whose meaning is established by the successive occurrence of a plurality of events, which represents the meaning that a video scene acquires through the occurrence of those events and can be expressed as a combination of a plurality of events; a state transition pattern that defines the term, based on the occurrence pattern of the events that express it, as an input sequence specifying the order of a plurality of event indexes; a search granularity that specifies a structural unit of the video delimited by a predetermined structure index; and a central index that designates at least one of the event indexes present in the state transition pattern;
When a video index is assigned to the video, the state transition table is referred to and, if an event index matching the central index has been assigned, a search is made, for each structural unit specified by the structure index, for a term whose search granularity matches the target structural unit and for which the sequence of event indexes assigned in the target structural unit matches the input sequence of event indexes in the state transition pattern; and when a matching term exists, it is determined that the meaning of the matched term has been established, and an event index or attribute information indicating the establishment of that term is assigned.
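
A minimal sketch of the index assignment described in the two methods above, under stated assumptions: the table contents, the dictionary layout of a structural unit, and the `abstract` list that receives the term are all hypothetical, and pattern matching is again taken to mean a contiguous run of event indexes.

```python
def contains_run(pattern, events):
    """True if `pattern` occurs as a contiguous run in `events` (assumed matching rule)."""
    n = len(pattern)
    return any(events[i:i + n] == pattern for i in range(len(events) - n + 1))


# Hypothetical table: term -> (search granularity, state transition pattern, central index)
TABLE = {
    "timely hit": ("at_bat", ["runner_on_base", "hit", "run_scored"], "hit"),
}


def assign_term_indexes(unit):
    """Attach every established term to `unit` and return the attached terms.

    `unit` is a dict with keys 'granularity', 'events', and 'abstract'.
    """
    for term, (granularity, pattern, central) in TABLE.items():
        if unit["granularity"] != granularity:
            continue
        if central not in unit["events"]:
            continue  # pre-filter on the central index before the ordered comparison
        if contains_run(pattern, unit["events"]) and term not in unit["abstract"]:
            unit["abstract"].append(term)  # index indicating the term is established
    return unit["abstract"]
```

Dropping the central-index check gives the variant that compares every structural unit of the right granularity directly against the pattern.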

15. A computer-readable recording medium storing a program for causing a computer to execute the video index assigning method according to claim 13 or 14.

A method for generating a description of video content, which generates a description of the video content of a video scene for a video to which at least a structure index for dividing the video into semantic groups and an event index for specifying the content and location of an event that occurred in the video have been assigned, and which is structured using a plurality of hierarchically arranged structural units, each structural unit being the video scene of a section delimited by the structure index,
The description being generated by using a state transition table in which information used for generating the description is set, together with character strings, or attribute information convertible into character strings, assigned in advance to the structure index and the event index, wherein:
In the state transition table, the following are set:
A generation granularity that, for each term that represents the meaning a video scene acquires through the occurrence of a plurality of events and can be expressed as a combination of a plurality of events, specifies the structural unit appropriate as the unit of video when the description is generated;
A state transition pattern that defines the term, based on the occurrence pattern of the events that express it, as an input sequence specifying the order of a plurality of event indexes occurring consecutively within the generation granularity;
For each state transition pattern, a key index set by selecting at least one of the event indexes present in that state transition pattern; and
For each state transition pattern, character string definition information that sets the syntax elements used to generate the description and the input source of the character string to be used as each syntax element;
When the description is generated, the state transition table is referred to in order to find a generation granularity that matches the structural unit for which the description is to be generated; it is determined whether an event index matching the key index of the state transition pattern corresponding to that generation granularity exists in the target structural unit; if a matching event index exists, it is determined whether the state transition pattern matches the occurrence pattern of the event indexes in the target structural unit; and if they match, it is determined that the term defined by the state transition pattern has been established, and the description of the video scene of the target structural unit is generated using the character string definition information of the state transition pattern of the established term.
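
The generation procedure can be sketched as below. Everything concrete here is an assumption made for illustration: the table entry, the attribute names `player` and `inning`, the output template, and the contiguous-run matching rule. The `strings` mapping plays the role of the character string definition information, naming for each syntax element the event index and attribute that supply its character string.

```python
def contains_run(pattern, names):
    """True if `pattern` occurs as a contiguous run in `names` (assumed matching rule)."""
    n = len(pattern)
    return any(names[i:i + n] == pattern for i in range(len(names) - n + 1))


# Hypothetical state transition table for description generation.
TABLE = [{
    "term": "timely hit",
    "granularity": "at_bat",                       # generation granularity
    "pattern": ["runner_on_base", "hit", "run_scored"],
    "key_index": "hit",
    # syntax element -> (event index, attribute) that supplies the character string
    "strings": {"who": ("hit", "player"), "when": ("hit", "inning")},
    "template": "{who}: {term} ({when})",          # assumed output syntax
}]


def describe(unit, table=TABLE):
    names = [e["index"] for e in unit["events"]]
    for entry in table:
        if entry["granularity"] != unit["granularity"]:
            continue                               # generation granularity must match
        if entry["key_index"] not in names:
            continue                               # key index must be present
        if not contains_run(entry["pattern"], names):
            continue                               # occurrence pattern must match
        # Term established: fill each syntax element from its input source.
        attrs = {e["index"]: e.get("attrs", {}) for e in unit["events"]}
        parts = {elem: attrs[src][attr] for elem, (src, attr) in entry["strings"].items()}
        return entry["template"].format(term=entry["term"], **parts)
    return None
```

For a unit whose `hit` event carries `player="Player A"` and `inning="inning 9"`, the sketch yields `Player A: timely hit (inning 9)`.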

17. The method for generating a description of video content according to claim 16, wherein:
When a state transition pattern defining a specific term is established, an index representing the term can be assigned to the structural unit as an abstract index;
In the state transition table, the abstract index representing the corresponding term is set as one of the key indexes of each state transition pattern; and
When it is determined whether an event index matching the key index of the state transition pattern corresponding to the relevant generation granularity exists in the target structural unit, the abstract index is used preferentially as the key index to determine whether the corresponding abstract index has been assigned to the target structural unit, and if the abstract index has been assigned, it is determined that the term defined by the state transition pattern has been established, and the description of the video scene of the target structural unit is generated using the character string definition information of the state transition pattern of the established term.
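
A short sketch of the abstract-index shortcut, with hypothetical names and data layout: if the abstract index for a term is already attached to the structural unit, the term is treated as established without re-running the pattern comparison; otherwise the ordinary key-index and pattern checks apply. The contiguous-run matching in the fallback is an assumption.

```python
def contains_run(pattern, events):
    """True if `pattern` occurs as a contiguous run in `events` (assumed matching rule)."""
    n = len(pattern)
    return any(events[i:i + n] == pattern for i in range(len(events) - n + 1))


def term_established(unit, entry):
    """`unit` has 'events' and 'abstract' lists; `entry` has 'term', 'key_index', 'pattern'."""
    # Preferred check: the abstract index for the term is already assigned.
    if entry["term"] in unit["abstract"]:
        return True
    # Fallback: ordinary key-index check followed by the pattern comparison.
    if entry["key_index"] not in unit["events"]:
        return False
    return contains_run(entry["pattern"], unit["events"])
```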

18. The method for generating a description of video content according to claim 16, wherein the character string definition information sets attribute information of the structure index or the event index as the input source of the character string used as the syntax element.

The method for generating a description of video content according to any one of claims 16 to 18, wherein the character string definition information sets the syntax elements on the basis of 5W1H: when (When), where (Where), why (Why), who (Who), what (What), and how (How).

20. The method for generating a description of video content according to claim 19, wherein, when a plurality of state transition patterns correspond to the relevant generation granularity, it is determined for each state transition pattern whether its term has been established, and if a plurality of terms have been established, the description of the video scene of the target structural unit is generated by combining the syntax elements of the character string definition information set in the state transition pattern of each term.

21. The method for generating a description of video content according to claim 20, wherein, when the description of the video scene of the target structural unit is generated by combining the syntax elements of the character string definition information set in the state transition pattern of each term, and the 5W1H syntax elements contain duplicates, priority is given to the syntax element that refers to the event index occurring later in time.

22. The method for generating a description of video content according to claim 20, wherein, when the description of the video scene of the target structural unit is generated by combining the syntax elements of the character string definition information set in the state transition pattern of each term, and the 5W1H syntax elements contain duplicates, the state transition patterns of the terms are compared and priority is given to the syntax element of the state transition pattern defined using more event indexes.

23. The method for generating a description of video content according to claim 19, wherein, when the description of the video scene of the target structural unit is generated by combining the syntax elements of the character string definition information set in the state transition pattern of each term, and the 5W1H syntax elements contain duplicates, the duplicated syntax elements are arranged in parallel.
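
The three ways of handling duplicated 5W1H syntax elements (prefer the later event, prefer the pattern defined with more event indexes, or list the duplicates in parallel) can be compared in one sketch. The data layout, the strategy names, and the " and " joiner are assumptions; each established term contributes, per syntax element, a character string and the time of the event index it refers to.

```python
ELEMENTS = ("when", "where", "why", "who", "what", "how")


def merge_elements(patterns, strategy):
    """Combine syntax elements from several established terms.

    Each item of `patterns` is a dict with:
      'pattern'  - the list of event indexes defining the term
      'elements' - syntax element -> (character string, time of referenced event)
    """
    merged = {}
    for elem in ELEMENTS:
        # (string, event time, pattern size) for every term supplying this element
        cands = [(p["elements"][elem][0], p["elements"][elem][1], len(p["pattern"]))
                 for p in patterns if elem in p["elements"]]
        if not cands:
            continue
        if strategy == "latest_event":
            merged[elem] = max(cands, key=lambda c: c[1])[0]
        elif strategy == "largest_pattern":
            merged[elem] = max(cands, key=lambda c: c[2])[0]
        elif strategy == "parallel":
            # Keep every distinct string, in first-seen order.
            merged[elem] = " and ".join(dict.fromkeys(c[0] for c in cands))
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return merged
```

With two terms that both supply `who`, `latest_event` keeps the string tied to the later event, `largest_pattern` keeps the one from the longer pattern, and `parallel` keeps both.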

24. A computer-readable recording medium having recorded thereon a program for causing a computer to execute the method for generating a description of a video content according to any one of claims 16 to 23.