JP4033142B2

JP4033142B2 - Video / scenario matching method, apparatus and program

Info

Publication number: JP4033142B2
Application number: JP2004041592A
Authority: JP
Inventors: 精一紺谷; 行信谷口; 秀信長田; 幸紀南田
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2004-02-18
Filing date: 2004-02-18
Publication date: 2008-01-16
Anticipated expiration: 2024-02-18
Also published as: JP2005236544A

Description

The present invention relates to a video/scenario matching method, apparatus, and program, and in particular to a video/scenario matching method, apparatus, and program for associating scenario text with each partial section (scene) of a video.

As a first conventional method for associating video with scenario text, shot changes in the video are detected and comments are entered shot by shot by cutting and pasting.

As a second method, when the processing that associates the video with the scenario text contains an error, the error is corrected by specifying IN/OUT points for all sections (see, for example, Patent Documents 1 and 2).
JP 2003-224774 A, "Semi-automatic subtitle program production system"; JP 2003-224773 A, "Subtitle editing support system by moving the boundaries of subtitles arranged on a timeline"

However, the first conventional method requires the effort of cutting and pasting, from the scenario text, the text that corresponds to each shot. In addition, because the start of a shot does not coincide with the start of the narration, it takes time from the start of the shot until the narration has been heard to the end. The method also cannot handle video that contains no shot changes.

The second conventional method requires IN/OUT points to be specified for every section whose association is wrong, which is laborious.

The present invention has been made in view of the above points, and its object is to provide a video/scenario matching method, apparatus, and program that improve the efficiency of the work of associating partial sections (scenes) of a video with scenario text.

FIG. 1 is a diagram for explaining the principle of the present invention.

The present invention is a video/scenario matching method for associating scenario text with each partial section (scene) of a video, in which:
video data read from video storage means is associated with scenario text read from scenario storage means (step 1);
using a user interface with an operator,
the result of associating the partial sections of the read video data with the read scenario text is displayed and the operator is prompted to judge it (step 2);
when the operator inputs that the association is incorrect (step 3),
the operator is requested to correct it (step 4), the lock flag of the record in lock information storage means that holds the correspondence between the partial section of the video data and the scenario text is set to ON (step 5), and the operator is asked whether to update the association (step 6); and
if the operator inputs that the update is to be performed, the processing is repeated from the association processing (step 1); otherwise, the processing ends.
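The control flow of steps 1 to 6 can be sketched as a small loop. This is only an illustration under stated assumptions: the function name `matching_loop` and the four callbacks are hypothetical stand-ins for the association processing and the operator dialogue, and the sketch assumes the loop ends when the operator accepts the result.

```python
def matching_loop(associate, is_correct, correct, wants_update):
    """Hypothetical sketch of the operator review loop (steps 1 to 6).

    associate(locks)  -> association result, honouring the locked pairs
    is_correct(res)   -> True if the operator accepts the displayed result
    correct(res)      -> the corrected (dialogue, time) pair to lock
    wants_update()    -> True if the operator asks for re-association
    """
    locks = []
    while True:
        result = associate(locks)        # step 1: associate video and scenario
        if is_correct(result):           # steps 2-3: display and ask the operator
            return result, locks
        locks.append(correct(result))    # steps 4-5: correct, then set the lock
        if not wants_update():           # step 6: update the association?
            return result, locks
```

Each pass adds one locked pair, so every repetition of step 1 runs with more fixed correspondence points than the previous one.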

The present invention comprises:
a data input process of inputting video data from video storage means and inputting scenario text divided at a fixed word length from scenario storage means;
a scenario text analysis process of generating, for each divided unit of the scenario text, a speaker pattern of the scenario, which is a symbol string containing speaker names, and storing it in speaker pattern storage means;
a video analysis process of dividing the video data into unit sections, generating an utterance pattern, which is a symbol string containing speaker names, and storing it in utterance pattern storage means;
an association process of matching the symbol string stored in the speaker pattern storage means against the symbol string stored in the utterance pattern storage means, generating matching information that associates dialogue-line boundaries with times, and storing it in matching information storage means;
an association result display process of reading the matching information from the matching information storage means, acquiring the video corresponding to the matching information from the video storage means, further acquiring the scenario text from the scenario storage means, displaying the association result on display means, and displaying the partial video section designated by the operator;
a re-matching process in which, when the operator judges the association result to be correct and designates a lock of the correspondence, the lock state is indicated on the screen of the display device and the lock flag in lock information storage means is set to ON, and, when the operator instructs correction of the association result, the operator is made to enter a keyword of the scenario to be associated, the scenario text corresponding to the keyword is retrieved from the scenario storage means and displayed on the display device, and, when the operator designates a lock of the correspondence, the lock state is indicated on the screen of the display device, the lock flag in the lock information storage means is set to ON, and the matching information is updated; and
a matching result output process of outputting the matching information once all of the video data and the scenario text have been matched.

Further, in the association process of the present invention, the lock information file is referenced, the scenario text is divided at the parts corresponding to dialogue IDs whose lock flag is ON, the video data corresponding to the times whose lock flag is ON is analyzed, and the parts are associated individually and merged into the matching information storage means.

FIG. 2 is a diagram showing the principle configuration of the present invention.

The present invention is a video/scenario matching apparatus for associating scenario text with each partial section (scene) of a video, comprising:
video storage means 110 for storing video data;
scenario storage means 120 for storing scenario text consisting of a dialogue ID, a speaker ID, a speaker name, and a dialogue line;
scenario text analysis means 140 for reading the scenario text from the scenario storage means 120, generating a speaker pattern of the scenario, and storing it in speaker pattern storage means 12;
video analysis means 130 for reading video data from the video storage means 110, dividing it into unit sections, generating an utterance pattern, which is a symbol string containing speaker names, and storing it in utterance pattern storage means 11;
association means 150 for matching the symbol string stored in the speaker pattern storage means 12 against the symbol string stored in the utterance pattern storage means 11, generating matching information that associates dialogue-line boundaries with times, and storing it in matching information storage means 13;
association result display means 161 for reading the matching information from the matching information storage means 13, acquiring the video corresponding to the matching information from the video storage means 110, further acquiring the scenario text corresponding to the matching information from the scenario storage means 120, displaying the association result on display means, and displaying the partial video section designated by the operator;
re-matching means 162 which, when the operator judges the association result to be correct and designates a lock of the correspondence, indicates the lock state on the screen of the display device and sets the lock flag in lock information storage means 14 to ON, and which, when the operator instructs correction of the association result, makes the operator enter a keyword of the scenario to be associated, retrieves the scenario text corresponding to the keyword from the scenario storage means 120 and displays it on the display device, and, when the operator designates a lock of the correspondence, indicates the lock state on the screen of the display device, sets the lock flag in the lock information storage means 14 to ON, and instructs that the matching information be updated; and
matching result output means 163 for outputting the matching information once all of the video data and the scenario text have been matched.

The association means 150 includes means that, when performing the association, references the lock information file, divides the speaker pattern at the parts corresponding to dialogue IDs whose lock flag is ON, divides the utterance pattern at the parts corresponding to times whose lock flag is ON, associates the resulting pieces individually, and merges them into the matching information storage means.

The present invention is a video/scenario matching program for associating scenario text with each partial section (scene) of a video, the program causing a computer to execute:
a data input step of inputting video data from video storage means and inputting scenario text divided at a fixed word length from scenario storage means;
a scenario text analysis step of generating, for each divided unit of the scenario text, a speaker pattern of the scenario, which is a symbol string containing speaker names, and storing it in speaker pattern storage means;
a video analysis step of dividing the video data into unit sections, generating an utterance pattern, which is a symbol string containing speaker names, and storing it in utterance pattern storage means;
an association step of matching the symbol string stored in the speaker pattern storage means against the symbol string stored in the utterance pattern storage means, generating matching information that associates dialogue-line boundaries with times, and storing it in matching information storage means;
an association result display step of reading the matching information from the matching information storage means, acquiring the video corresponding to the matching information from the video storage means, further acquiring the scenario text from the scenario storage means, displaying the association result on display means, and displaying the partial video section designated by the operator;
a re-matching step in which, when the operator judges the association result to be correct and designates a lock of the correspondence, the lock state is indicated on the screen of the display device and the lock flag in lock information storage means is set to ON, and, when the operator instructs correction of the association result, the operator is made to enter a keyword of the scenario to be associated, the scenario text corresponding to the keyword is retrieved from the scenario storage means and displayed on the display device, and, when the operator designates a lock of the correspondence, the lock state is indicated on the screen of the display device, the lock flag in the lock information storage means is set to ON, and the matching information is updated; and
a matching result output step of outputting the matching information once all of the video data and the scenario text have been matched.

Further, in the association step, the program causes the computer to execute a step of, when performing the association, referencing the lock information file, dividing the speaker pattern at the parts corresponding to dialogue IDs whose lock flag is ON, dividing the utterance pattern at the parts corresponding to times whose lock flag is ON, associating the resulting pieces individually, and merging them into the matching information storage means.

According to the present invention, when video data and a scenario are associated, the lock flag is set to ON for association results that the operator has judged to be correct, so that a better association that passes through the correspondence points designated by the operator can be generated.

In addition, when the dialogue line corresponding to a video is looked up by text search, the search range can be limited to the span between the immediately preceding lock information (the nearest one at a time before the video) and the immediately following lock information (the nearest one at a time after the video). This reduces the number of text search hits and, compared with the case without lock flags, reduces the effort of checking dialogue lines.
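The narrowing of the search range can be sketched as follows. This is a minimal illustration, assuming that dialogue IDs are integers that increase in script order and that each lock is a (dialogue ID, time) pair; the function name `search_dialogue` and the data layout are hypothetical and not taken from the patent.

```python
def search_dialogue(scenario, locks, video_time, keyword):
    """Return the dialogue IDs containing `keyword`, searching only between
    the nearest lock before `video_time` and the nearest lock after it.

    scenario: list of (dialogue_id, text) in script order (assumed layout)
    locks:    list of (dialogue_id, time) pairs whose lock flag is ON
    """
    before = [lock for lock in locks if lock[1] < video_time]
    after = [lock for lock in locks if lock[1] > video_time]
    # Nearest lock on each side; None means the range is open on that side.
    lo_id = max(before, key=lambda lock: lock[1])[0] if before else None
    hi_id = min(after, key=lambda lock: lock[1])[0] if after else None

    hits = []
    for dialogue_id, text in scenario:
        if lo_id is not None and dialogue_id < lo_id:
            continue
        if hi_id is not None and dialogue_id > hi_id:
            continue
        if keyword in text:
            hits.append(dialogue_id)
    return hits
```

With no locks the whole scenario is searched; each additional lock can only shrink the candidate range, which is the effect the paragraph above describes.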

Embodiments of the present invention are described below with reference to the drawings.

First, the terms used in the following description are explained.
・Video: a moving image file such as MPEG.
・Scenario text: a text file in which speaker names and dialogue lines are written in a specific format.
・Scenario matching: the operation of assigning time codes so that a specific dialogue line agrees with the content of the video and audio (performed by repeating the association operation and the correction operation).
・Association operation: an operation that automatically associates the scenario with the video and audio; performed by a separate program (the association program).
・Correction operation: an operation that manually corrects, through the GUI, the matching state obtained as a result of the association operation.
・Cue (CUE): one partial video with which a specific scenario text is associated.
Next, the apparatus configuration of the present invention is described.

FIG. 3 shows the configuration of a video/scenario matching apparatus according to an embodiment of the present invention.

The video/scenario matching apparatus shown in the figure comprises a video storage unit 110, a scenario storage unit 120, a video analysis unit 130, a scenario text analysis unit 140, an association unit 150, a control unit 160, a display unit 170, an input unit 180, an utterance pattern file 11, a speaker pattern file 12, a matching information file 13, and a lock information file 14.

The video storage unit 110 stores the data of the input video file.

The scenario storage unit 120 stores the input scenario text. FIG. 4 shows an example of scenario text. As shown in the figure, the scenario text consists of a dialogue ID, a speaker ID, a speaker name, and a dialogue line.

The video analysis unit 130 reads video data from the video storage unit 110, divides it into unit sections, generates an utterance pattern, which is a list of the speaker names in time sections of a fixed length, and stores it in the utterance pattern file 11. FIG. 5 shows an example of an utterance pattern. As shown in the figure, the utterance pattern 11 consists of a unit section ID, a unit section start point, a unit section end point, and a speaker ID. The utterance pattern 11 is assumed to be stored in storage means in file format.
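The record layout of the utterance pattern can be illustrated with a short sketch. It assumes that speaker-labelled audio segments are already available from some speaker recognition step, which the patent does not detail here; the names `UnitSection` and `utterance_pattern`, the majority-overlap rule, and the silence symbol are illustrative assumptions.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class UnitSection:
    unit_id: int       # unit section ID
    start: float       # unit section start point (seconds)
    end: float         # unit section end point (seconds)
    speaker_id: str    # speaker ID, or the silence symbol


def utterance_pattern(segments, duration, unit_len=1.0, silence="-"):
    """Cut [0, duration) into fixed-length unit sections and label each one
    with the speaker whose segments overlap it most (assumed rule).

    segments: list of (start, end, speaker_id) from speaker recognition
    """
    units = []
    for i in range(ceil(duration / unit_len)):
        start = i * unit_len
        end = min(start + unit_len, duration)
        best, best_overlap = silence, 0.0
        for seg_start, seg_end, speaker in segments:
            overlap = min(end, seg_end) - max(start, seg_start)
            if overlap > best_overlap:
                best, best_overlap = speaker, overlap
        units.append(UnitSection(i, start, end, best))
    return units
```

Reading off `speaker_id` from the returned list in order gives the symbol string that is later matched against the speaker pattern.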

As shown in FIG. 6, the scenario text analysis unit 140 reads the scenario text from the scenario storage unit 120, generates a speaker pattern, which is a list of the speaker names obtained when the dialogue lines are divided at a fixed word length, with dialogue IDs attached, and stores it in the speaker pattern file 12.
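A speaker pattern of this kind might be generated as sketched below. The sketch assumes whitespace-separated words and a hypothetical unit of `unit_words` words per symbol; for Japanese scripts a character count or a morphological analyzer would be needed instead. The names `ScenarioLine` and `speaker_pattern` are illustrative.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class ScenarioLine:
    dialogue_id: int
    speaker_id: str
    speaker_name: str
    dialogue: str


def speaker_pattern(lines, unit_words=3):
    """Emit one (dialogue_id, speaker_id) symbol per fixed-length unit of
    each dialogue line, so that longer lines yield more symbols."""
    pattern = []
    for line in lines:
        n_words = len(line.dialogue.split())
        n_units = max(1, ceil(n_words / unit_words))  # at least one symbol per line
        pattern.extend([(line.dialogue_id, line.speaker_id)] * n_units)
    return pattern
```

Because the number of symbols grows with the length of the line, the symbol string roughly reflects how long each speaker talks, which is what makes it comparable with the time-based utterance pattern.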

The association unit 150 associates the symbol string of the utterance pattern 11 with the symbol string of the speaker pattern 12 by a technique such as DP matching, generates matching information that associates dialogue-line boundaries with times (sec) as shown in FIG. 7, and stores it in the matching information file 13.
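One way to realize DP matching between the two symbol strings is a dynamic-time-warping style alignment, sketched below. The patent names DP matching only as an example technique and does not specify a cost function, so the 0/1 mismatch cost, the allowed moves, and the function name `dp_match` are assumptions.

```python
def dp_match(speaker_pat, utter_pat):
    """Align two speaker-symbol sequences by dynamic programming.

    Returns a monotone list of (i, j) index pairs meaning that
    speaker_pat[i] is associated with utter_pat[j].
    """
    n, m = len(speaker_pat), len(utter_pat)
    if n == 0 or m == 0:
        return []
    inf = float("inf")
    # cost[i][j]: minimum mismatch count aligning the first i and j symbols.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = 0 if speaker_pat[i - 1] == utter_pat[j - 1] else 1
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end, preferring the diagonal move on ties.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        moves = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(moves, key=lambda move: move[0])
    path.reverse()
    return path
```

For each pair in the returned path, the dialogue ID would be taken from the speaker pattern side and the time from the start of the matched unit section on the utterance pattern side.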

As shown in FIG. 8, the control unit 160 acquires the scenario text from the scenario storage unit 120 and still images from the video storage unit 110, based on the dialogue IDs and playback times in the matching information of the matching information file 13, and controls the display device 170 so that they are displayed side by side in the matching result display window. In the matching control window, it displays the scenario text and still image of the selected CUE number (the initial value is the first CUE) and, when the playback button is clicked via the input device 180, plays the video from the designated position.
When it is input via the input device 180 that the correspondence between the partial section of the video being played and the scenario segment is correct, the control unit 160 adds to the lock information file 14 lock information (with the lock flag set to ON) that fixes the correspondence between the dialogue line and the time. When the association unit 150 then associates the symbol strings of the utterance pattern and the speaker pattern, the two patterns are associated by DP matching, the dialogue ID is obtained from the speaker pattern, the playback time (unit section Start) is obtained from the utterance pattern, and a list of dialogue IDs and playback times is created. Because this list may contain several rows with the same dialogue ID, only the row with the earliest time is written to the matching information file 13.

As shown in FIG. 9, the lock information file 14 consists of a lock ID, a dialogue ID, the playback time of the locked video, and a lock flag. When a lock is instructed by the control unit 160, a record is added to the lock information file 14 consisting of a lock ID assigned in locking order, the matched dialogue ID acquired from the matching information file 13, and the playback time of the locked video.
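The lock record and the act of adding one can be sketched as below. The in-memory lists stand in for the lock information file 14 and the matching information file 13; the names `LockRecord` and `add_lock`, and the assumption that CUE numbers are 1-based row numbers of the matching information, are illustrative.

```python
from dataclasses import dataclass


@dataclass
class LockRecord:
    lock_id: int          # assigned in the order locks are made
    dialogue_id: int      # matched dialogue ID from the matching information
    time: float           # playback time of the locked video (seconds)
    locked: bool = True   # lock flag (ON)


def add_lock(lock_file, matching_info, cue_number):
    """Append a lock record for the given CUE.

    matching_info: list of (dialogue_id, time) rows; the CUE number is
    assumed to be the 1-based row number.
    """
    dialogue_id, time = matching_info[cue_number - 1]
    record = LockRecord(len(lock_file) + 1, dialogue_id, time)
    lock_file.append(record)
    return record
```

Clearing every `locked` field to `False` would correspond to initializing all lock flags to OFF before the first association.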

The display device 170 displays the matching result display window and the matching control window in accordance with instructions from the control unit 160.

The matching result display window and the matching control window are described here.

The matching result window is a window such as that shown on the right side of FIG. 9 and includes a CUE number, a lock icon, the first cut image and time of the CUE, a playback position cursor, the first cut image and time of the corrected CUE, a dialogue text display area, and so on.

The matching control window is a window such as that shown on the left side of FIG. 10 and consists of a menu bar (File, Tools, Help), a video display area, a video seek control, a dialogue display area, a dialogue seek control, an audio waveform display area, a frame image display area, and control buttons for correcting the association position. As shown in FIG. 11, the control buttons for correcting the association position include a link button, a lock button, and others. The link button switches between synchronized and unsynchronized display of the video side and the scenario text. The lock button locks a matched pair of video/audio and scenario text. When matching is performed recursively, locked pairs do not move. With these buttons, the operator can release the link for correction, select the correct matching pair, and lock it. If a pair is already correctly matched in the initial state, it can be locked before re-matching.

Next, the operation of the above configuration is described with reference to FIG. 12.

Step 101) The video analysis unit 130 inputs video from the video storage unit 110, and the scenario text analysis unit 140 inputs, from the scenario storage unit 120, scenario text consisting of a dialogue ID, a speaker ID, a speaker name, and a dialogue line.

Step 102) The scenario text analysis unit 140 divides the dialogue lines of the scenario text at a fixed word length, assigns serial numbers, arranges the speaker names in serial-number order, and generates the speaker pattern file 12.

Step 103) The video analysis unit 130 generates, from the video data, the utterance pattern file 11 consisting of a unit section ID, a unit section start point, a unit section end point, and a speaker ID for each time section of a fixed length.

Step 104) The control unit 160 initializes all lock flags in the lock information file 14 to OFF.

Step 105) The association unit 150 performs DP matching on the symbol strings of the utterance pattern file 11 and the speaker pattern file 12, thereby obtains the playback time of the video associated with each dialogue ID, performs the association, and generates the matching information file 13.

Step 106) The control unit 160 reads the matching information file 13, the video data in the video storage unit 110, and the scenario text in the scenario storage unit 120, and displays the matching result display window and the matching control window on the display device 170 as the association result.

Step 107) When the operator uses the text seek buttons (previous CUE / next CUE) in the matching control window to instruct playback of a partial section of the video and/or the scenario text, the control unit 160 displays the partial video and/or scenario text of the designated CUE in the matching control window in accordance with the instruction. At this time, the row with that CUE number is read from the matching information file 13 and the corresponding video and scenario are displayed. The CUE numbers are made to coincide with the row numbers of the matching information file 13 when the file is read.

Step 108) The operator looks at the matching control window displayed on the display device 170 and judges whether the association between the partial section of the video being played and the scenario segment is correct. If it is correct (matched), the processing proceeds to step 109; if it is not correct (not matched), the processing proceeds to step 110.

Step 109) If the partial section and the scenario segment displayed in the matching control window are matched, the operator presses the lock button via the input device 180. The control unit 160 then sets the lock flag in the lock information file 14 to ON and the processing proceeds to step 107. At this time, the control unit 160 notifies the display unit 170 of the current CUE number, and the lock icon corresponding to that CUE number in the matching result display window is shown in the locked state.

Step 110) The operator enters a search keyword via the input device 180 in the text box of the matching control window.

Step 111) The control unit 160 searches the scenario storage unit 120 for the keyword and, if the keyword is found, highlights the keyword in the scenario text column of the matching result display window.

Step 112) When the operator corrects the association and sets the lock to ON, the control unit 160 acquires the associated dialogue ID and the playback time of the locked video from the matching information file 13, sets the lock flag to ON, assigns a lock ID, and adds the record to the lock information file 14.

Step 113) When the operator instructs that the video and the scenario be associated again, the processing moves to step 106, and the processing from step 106 onward is performed again.

In step 103 above, the utterance pattern can be created from the audio of the video by using, for example, the techniques described in "Automatic Speech and Speaker Recognition: Advanced Topics," Kluwer Academic Publishers Group, 1996, ISBN 0-7923-9706-1.

Next, the processing in step 105 above is described in detail.

FIG. 13 is a flowchart of the processing for associating the speaker pattern with the utterance pattern according to the present invention.

First, the association unit 150 clears the matching information file 13 (step 1051) and then, based on the lock information file 14, divides the speaker pattern and the utterance pattern into the same number of pieces (steps 1052 and 1053). The number of divisions is the number of lines in the lock information file plus one. Next, each pair of divided speaker pattern and utterance pattern pieces is DP-matched individually; a dialogue ID is obtained from the speaker pattern and a playback time (unit section Start) is obtained from the utterance pattern, yielding a list of dialogue IDs and playback times (step 1055). From this list, the entry with the earliest time for each dialogue ID is added to the matching information file 13 (step 1056). Repeating this processing for the number of divisions (step 1054) generates the matching information file 13.
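As a non-limiting illustration, the processing of steps 1055 and 1056 for one pair of pieces can be sketched in Python as follows. The embodiment does not specify the DP recurrence, so the sketch assumes a dynamic-time-warping style alignment in which a mismatch of speaker symbols costs 1 and several unit sections may map to one line of dialogue. The names `dp_match` and `earliest_times`, and the representation of the patterns as lists of tuples, are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed representations (not taken from the embodiment):
SpeakerPattern = List[Tuple[int, str]]      # (dialogue ID, speaker symbol)
UtterancePattern = List[Tuple[float, str]]  # (unit-section start in seconds, speaker symbol)


def dp_match(speaker: SpeakerPattern,
             utterance: UtterancePattern) -> List[Tuple[int, float]]:
    """Align the two symbol strings and return (dialogue ID, time) pairs."""
    n, m = len(speaker), len(utterance)
    if n == 0 or m == 0:
        return []
    inf = float("inf")
    # cost[i][j]: minimum cost of aligning the first i lines with the first j sections
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = 0.0 if speaker[i - 1][1] == utterance[j - 1][1] else 1.0
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Trace the cheapest path back from the end of both strings.
    pairs: List[Tuple[int, float]] = []
    i, j = n, m
    while i > 0 and j > 0:
        pairs.append((speaker[i - 1][0], utterance[j - 1][0]))
        best = min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
        if best == cost[i - 1][j - 1]:
            i, j = i - 1, j - 1
        elif best == cost[i][j - 1]:
            j -= 1
        else:
            i -= 1
    pairs.reverse()
    return pairs


def earliest_times(pairs: List[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Keep, for each dialogue ID, the earliest matched time (cf. step 1056)."""
    first: Dict[int, float] = {}
    for dialogue_id, time in pairs:
        if dialogue_id not in first or time < first[dialogue_id]:
            first[dialogue_id] = time
    return sorted(first.items())
```

In this sketch, the list returned by `earliest_times` plays the role of the rows added to the matching information file 13 for one piece; an actual implementation may weight mismatches differently or restrict the allowed path.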

Next, the association operation is described in detail.

When the association unit 150 performs the first association, it matches the speaker pattern against the utterance pattern as shown in FIG. 14. Because all lock information is OFF in this case, the association is performed as is and the result is output to the matching information file 13.

When lock information exists, as shown in FIG. 15, the lock information is referred to and the number of divisions of the speaker pattern is obtained as
Number of divisions = number of lines of lock information + 1
and the speaker pattern is divided according to that number. In the example of the figure, the speaker pattern is divided into speaker pattern pieces 1, 2, and 3: speaker pattern piece 1 runs from the beginning to just before the dialogue ID, speaker pattern piece 2 from dialogue ID 1 to just before dialogue ID 3, and speaker pattern piece 3 from dialogue ID 3 to just before dialogue ID 18.

As shown in FIG. 16, the utterance pattern is likewise divided with reference to the lock information, according to the number of divisions calculated by the above formula: utterance pattern piece 1 runs from the beginning to just before 0:00:00, utterance pattern piece 2 from 0:00:00 to just before 0:07:00, and utterance pattern piece 3 from 0:07:00 to just before 0:15:30.

The speaker pattern pieces and utterance pattern pieces divided as shown in FIGS. 15 and 16 are associated with each other and added to the matching information file 13; repeating this processing for the number of divisions updates the matching information file 13.
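The division described above can be sketched as follows, purely as an illustration. The sketch assumes that each line of lock information is a (dialogue ID, time) pair, that dialogue IDs and times both increase through their patterns, and that times are held in seconds; the function name `split_by_locks` is hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple

SpeakerPattern = List[Tuple[int, str]]      # (dialogue ID, speaker symbol), assumed layout
UtterancePattern = List[Tuple[float, str]]  # (start time in seconds, speaker symbol), assumed layout
Lock = Tuple[int, float]                    # (locked dialogue ID, locked time), assumed layout


def split_by_locks(speaker: SpeakerPattern,
                   utterance: UtterancePattern,
                   locks: List[Lock]) -> List[Tuple[SpeakerPattern, UtterancePattern]]:
    """Split both patterns into len(locks) + 1 corresponding pieces."""
    ids = [dialogue_id for dialogue_id, _ in speaker]
    times = [time for time, _ in utterance]
    pieces = []
    s_lo = u_lo = 0
    for dialogue_id, time in sorted(locks):
        # Each piece ends just before the locked dialogue ID / locked time.
        s_hi = bisect_left(ids, dialogue_id, s_lo)
        u_hi = bisect_left(times, time, u_lo)
        pieces.append((speaker[s_lo:s_hi], utterance[u_lo:u_hi]))
        s_lo, u_lo = s_hi, u_hi
    # The last piece runs from the final lock to the end of both patterns.
    pieces.append((speaker[s_lo:], utterance[u_lo:]))
    return pieces
```

Because each locked dialogue ID starts a new speaker pattern piece and each locked time starts the corresponding utterance pattern piece, matching the pieces pairwise forces the overall association through the locked points.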

Next, the above operation is described concretely with reference to FIGS. 18 to 23.

Before the operation is described, the interface provided by the input device 180 and the output device 190 is explained.

When "next CUE" (or "previous CUE") of the video control is input from the input device 180, the playback screen of the matching control window shows the section following (or preceding) the video section currently displayed there. If the link button for the seek operation is ON, the dialogue shown in the text box moves in sync to the next (previous) dialogue; if it is OFF, the text does not change.

When "next text" (or "previous text") of the text box control is input from the input device 180, the dialogue following (or preceding) the currently displayed dialogue is shown. If the link button for the seek operation is ON, the video section also moves in sync to the next (previous) section.

When the link button is ON and a seek operation is performed on the video or the text, the video and the text change in sync, so the operator can check whether the association is correct.
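The linked seek behavior described above can be modeled, as a minimal sketch, by a small state object. The sketch assumes that a single CUE index identifies both the video section and the dialogue while the link is ON; the class name `SeekState` and its fields are hypothetical and are not part of the embodiment.

```python
from dataclasses import dataclass


@dataclass
class SeekState:
    n_cues: int            # number of CUEs in the matching information (assumed known)
    video_cue: int = 0     # index of the video section shown on the playback screen
    text_cue: int = 0      # index of the dialogue shown in the text box
    linked: bool = True    # state of the link button

    def _clamp(self, index: int) -> int:
        # Stay within the first and last CUE.
        return max(0, min(self.n_cues - 1, index))

    def seek_video(self, step: int) -> None:
        """Next CUE (step=+1) or previous CUE (step=-1) of the video control."""
        self.video_cue = self._clamp(self.video_cue + step)
        if self.linked:
            self.text_cue = self.video_cue  # the text follows the video

    def seek_text(self, step: int) -> None:
        """Next text (step=+1) or previous text (step=-1) of the text box control."""
        self.text_cue = self._clamp(self.text_cue + step)
        if self.linked:
            self.video_cue = self.text_cue  # the video follows the text
```

With `linked` set to False, the two indices move independently, which corresponds to the state in which the operator looks for the correct dialogue for a given video section.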

If the association is wrong, the operator can turn the link button OFF, listen to the audio of the video, and find the corresponding dialogue with the text search function to make the correct association. The range of the text search is limited by the lock information.

In the following, it is assumed that the utterance pattern file 11 and the speaker pattern file 12 have been created by the video analysis unit 130 and the scenario text analysis unit 140, and that the matching information file 13 has been created by the association unit 150.

FIG. 18 is an example of the startup screen at the start of association. The operator starts the program by double-clicking the program icon and clicks "File (F)" on the screen; a file selection dialog box is displayed, and the operator selects a matching information file. At this time, the control unit 160 sets the link button of the matching control window to ON and the lock button to OFF (step 104). The control unit 160 associates the speaker pattern with the utterance pattern and adds the result to the matching information file 13 (step 105). The control unit 160 then reads the matching information from the matching information file 13, refers to the dialogue in the scenario text in the scenario storage unit 120 based on the dialogue ID, and likewise generates a cut video from the video file storage unit 110 based on the time code and displays it in the matching control window (step 106). In addition, the first cut of "CUE1" is displayed in the video display part of the matching result window, and the dialogue of "CUE1" is displayed in the dialogue display part. FIG. 19 shows the initial state, in which the link button is ON and the lock button is OFF.

In this state, operating the video seek control or the dialogue seek control (step 107) displays any desired CUE in the matching control window. The CUE displayed in the matching control window is highlighted with a thick frame of the same color in the matching status display window.

In the above state, when the operator clicks the lock button at any CUE position (steps 107 and 108), the lock button icon changes to red (a locked mark), as shown in FIG. 20. The CUE number of the locked entry in the matching result window changes in the same way as the lock button. The information on the locked CUE is written to the lock information file 14 (step 109).

After the above operation, the correspondence of any section can be corrected. First, when the seek link button is clicked, both the seek link button and the lock button change to the inactive state, as shown in FIG. 21. Next, the operator jumps to the video at a desired time using the video seek control and pauses playback with the video control at that point, as shown in FIG. 22. The seek operation is repeated until playback is paused at the start of a specific line of dialogue. The dialogue text segment with the appropriate correspondence is then selected by specifying a keyword in the dialogue seek control (step 110), and the matched portion is highlighted (step 111).

When the lock button is clicked at this point, the seek link button and the lock button turn ON, and the time information and the dialogue ID are written to the lock information file 14 (step 112).

After the above operation has been repeated and several locks have been set, matching can be performed again.

When the operator selects "Rematch" under Tools in the menu bar shown in FIG. 23, specifies the path of the matching information file 13 to be newly written by the association unit 150, and clicks the match button, the association unit 150 performs the matching processing based on the lock information file 14 and overwrites the matching information file 13. The previous matching information file 13 can also be renamed and kept as a backup.

After the above processing is completed, if the video and the dialogue are correctly associated, the control unit 160 outputs the matching information.

When the operator turns the lock button ON in step 109 above, the control unit 160 adds lock information that fixes the correspondence between the dialogue and the time to the lock information file 14. When the association unit 150 then associates the symbol string of the utterance pattern with the symbol string of the speaker pattern, it divides the speaker pattern at the positions corresponding to the dialogue IDs whose lock flags are ON in the lock information file 14, divides the utterance pattern at the positions corresponding to the times whose lock flags are ON, associates the pieces individually, and integrates the results into the matching information file 13.

As a result, a better association that passes through the corresponding points manually specified to the association unit 150 can be generated.

When the dialogue corresponding to the video is looked up by text search, the search range can be limited to the span from the immediately preceding lock information (the nearest one at a time earlier than the video) to the immediately following lock information (the nearest one at a time later than the video). This reduces the number of text search hits and makes checking the dialogue less laborious than when there is no lock flag. As shown in FIG. 24, without lock information the entire range must be searched, whereas with lock information the locked ranges need not be searched.
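The restriction of the search range by lock information can be illustrated with the following sketch. It assumes that the lock information is held as (time in seconds, dialogue ID) pairs sorted by time and that the scenario is a list of (dialogue ID, text) pairs; the function names `search_range` and `search_dialogue` and the parameters `first_id` and `last_id` are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple


def search_range(locks: List[Tuple[float, int]], video_time: float,
                 first_id: int, last_id: int) -> Tuple[int, int]:
    """Return the (lowest, highest) dialogue ID that needs to be searched."""
    times = [time for time, _ in locks]
    k = bisect_right(times, video_time)  # a lock exactly at video_time counts as preceding
    low = locks[k - 1][1] if k > 0 else first_id         # nearest lock before the video time
    high = locks[k][1] if k < len(locks) else last_id    # nearest lock after the video time
    return low, high


def search_dialogue(lines: List[Tuple[int, str]], keyword: str,
                    low: int, high: int) -> List[int]:
    """Return the dialogue IDs within [low, high] whose text contains the keyword."""
    return [dialogue_id for dialogue_id, text in lines
            if low <= dialogue_id <= high and keyword in text]
```

For example, with locks at 0:00:00, 0:07:00, and 0:15:30 paired with dialogue IDs 1, 3, and 18 (a pairing assumed here for illustration), a video time of 500 seconds restricts the search to dialogue IDs 3 through 18.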

The operations of the video analysis unit 130, the scenario text analysis unit 140, the association unit 150, and the control unit 160 shown in FIG. 3 can be implemented as a program, installed in a computer used as the video scenario matching apparatus, and executed by control means such as a CPU.

The program can also be stored on a hard disk connected to the computer used as the video scenario matching apparatus, or on a portable storage medium such as a flexible disk or CD-ROM, and installed in the computer when the invention is carried out.

The present invention is not limited to the above embodiment, and various modifications and applications are possible within the scope of the claims.

The present invention is applicable to systems that associate video data with scenario data (text data) via a user interface.

FIG. 1 is a diagram for explaining the principle of the present invention.
FIG. 2 is a diagram showing the principle configuration of the present invention.
FIG. 3 is a configuration diagram of the video scenario matching apparatus in one embodiment of the present invention.
FIG. 4 is a configuration diagram of the scenario text in one embodiment of the present invention.
FIG. 5 is a configuration diagram of the utterance pattern file in one embodiment of the present invention.
FIG. 6 is a diagram showing the procedure for generating the speaker pattern file in one embodiment of the present invention.
FIG. 7 is an example of the matching information file in one embodiment of the present invention.
FIG. 8 is a diagram for explaining the operation of the control unit in one embodiment of the present invention.
FIG. 9 is a configuration diagram of the lock information file in one embodiment of the present invention.
FIG. 10 is a diagram for explaining the windows displayed on the display device in one embodiment of the present invention.
FIG. 11 is a diagram for explaining the control buttons for correcting the association position in one embodiment of the present invention.
FIG. 12 is a flowchart of the operation of the video scenario matching apparatus in one embodiment of the present invention.
FIG. 13 is a flowchart of the processing for associating the speaker pattern with the utterance pattern in one embodiment of the present invention.
FIG. 14 is a diagram for explaining the first association in one embodiment of the present invention.
FIG. 15 is a diagram (part 1) for explaining the association when lock information exists in one embodiment of the present invention.
FIG. 16 is a diagram (part 2) for explaining the association when lock information exists in one embodiment of the present invention.
FIG. 17 is a diagram (part 3) for explaining the association when lock information exists in one embodiment of the present invention.
FIG. 18 is an example of the startup screen at the start of association in one embodiment of the present invention.
FIG. 19 is an example of the screen at the time the matching information is opened in one embodiment of the present invention.
FIG. 20 is a display example when the operator has made a judgment (judged correct) in one embodiment of the present invention.
FIG. 21 is a display example when the operator has made a judgment (judged incorrect) in one embodiment of the present invention.
FIG. 22 is an example of correcting the correspondence in one embodiment of the present invention.
FIG. 23 is a display example when rematching is performed in one embodiment of the present invention.
FIG. 24 is a diagram for explaining the effect of the lock information in one embodiment of the present invention.

Explanation of symbols

11 Utterance pattern storage means, utterance pattern file
12 Speaker pattern storage means, speaker pattern file
13 Matching information storage means, matching information file
14 Lock information storage means, lock information file
15 Matching information
110 Video storage means, video storage unit
120 Scenario storage means, scenario storage unit
130 Video analysis means, video analysis unit
140 Scenario text analysis means, scenario text analysis unit
150 Association means, association unit
160 Control unit
161 Association result display means
162 Rematching means
163 Matching result output means
170 Display unit
180 Input unit

Claims

In the video / scenario matching method for associating scenario text with each partial section (scene) of video,
Associate the video data read from the video storage means with the scenario text read from the scenario storage means,
Using the user interface with the operator,
Displaying the correspondence result between the read video data partial section and the read scenario text, prompting the operator to make a decision,
If there is an input from the operator that the correspondence is incorrect,
Requesting the operator to make corrections, turning on a lock flag of a record having a correspondence relationship between a partial section of video data in a lock information storage means and a scenario text, and inquiring to the operator whether or not to update the correspondence;
A video / scenario matching method, characterized in that the association processing is repeated if there is an input from the operator to update.

A data input process for inputting video data from the video storage means and inputting scenario text delimited by a certain word length from the scenario storage means;
A scenario text analysis step of generating a speaker pattern of a scenario that is a symbol string including a speaker name for each unit of the scenario text and storing it in the speaker pattern storage means;
A video analysis process of dividing the video data into unit sections, generating an utterance pattern that is a symbol string including a speaker name, and storing the utterance pattern in a utterance pattern storage unit;
Matching the symbol string stored in the speaker pattern storage means with the symbol string stored in the utterance pattern storage means, generating matching information in which the change of the dialogue and the time are associated with each other, the matching information storage means Storing the matching process;
The matching information is read from the matching information storage means, the video is acquired from the video storage means corresponding to the matching information, the scenario text is acquired from the scenario storage means, and the associated result is displayed. A matching result display process for displaying the video partial section designated by the operator,
When the operator determines that the association result is correct and designates a lock on the correspondence, a display indicating the locked state is shown on the screen of the display device and the lock flag of the lock information storage means is turned ON,
When the operator instructs the association result to be corrected, the operator inputs the keyword of the scenario to be associated, searches the scenario storage unit for the scenario text corresponding to the keyword, and displays it on the display device. When the operator designates the lock of the correspondence relationship, the lock state display on the screen on the display device is displayed, the lock flag of the lock storage means is turned ON, and Realignment process to update the alignment information;
The video / scenario matching method according to claim 1, further comprising: a matching result output process for outputting matching information when matching of all video data and scenario text is completed.

The video / scenario matching method according to claim 2, wherein, in the matching process,
the lock information file is referred to, the scenario text is divided at the portion corresponding to the dialogue ID for which the lock flag is ON, the video data corresponding to the time for which the lock flag is ON is analyzed, and the pieces are individually associated and integrated into the matching information storage means.

A video / scenario matching device for associating scenario text with each partial section (scene) of video,
Video storage means for storing video data;
Scenario storage means for storing a scenario text composed of a dialogue ID, a speaker ID, a speaker name, and dialogue;
A scenario text analysis process of reading the scenario text from the scenario storage means, generating a speaker pattern of the scenario and storing it in the speaker pattern storage means;
A video analysis process of reading the video data from the video storage means, dividing it into unit intervals, generating a speech pattern which is a symbol string including a speaker name, and storing it in the speech pattern storage means;
Matching the symbol string stored in the speaker pattern storage means with the symbol string stored in the utterance pattern storage means, generating matching information in which the change of the dialogue and the time are associated with each other, the matching information storage means Storing association means;
Read the matching information from the matching information storage means, acquire the video corresponding to the matching information from the video storage means, and further acquire the scenario text corresponding to the matching information from the scenario storage means Display the displayed result on the display means, association result display means for displaying the video partial section designated by the operator,
When it is determined by the operator that the matching result is correct and lock of the correspondence relationship is designated, the lock state on the screen on the display device is displayed and the lock flag of the lock information storage means is turned on. In addition, when the operator gives an instruction to correct the association result, the operator inputs a scenario keyword to be associated, and searches the scenario storage means for the scenario text corresponding to the keyword. When displayed on the display device and the corresponding lock is designated by the operator, a display indicating the lock state on the screen on the display device is performed and the lock flag of the lock storage means is turned ON. And re-matching means for instructing to update the matching information;
A video / scenario matching device, comprising: matching result output means for outputting matching information when matching of all video data and scenario text is completed.

The video / scenario matching apparatus according to claim 4, wherein the association means includes
means for, when performing the association, referring to the lock information file, dividing the speaker pattern at the portion corresponding to the dialogue ID for which the lock flag is ON, dividing the speaker pattern at the portion corresponding to the time for which the lock flag is ON, individually associating the pieces, and integrating them into the matching information storage means.

A video / scenario matching program for associating scenario text with each partial section (scene) of video,
On the computer,
A data input step of inputting video data from the video storage means and inputting scenario text delimited by a certain word length from the scenario storage means;
A scenario text analysis step of generating a speaker pattern of a scenario that is a symbol string including a speaker name for each unit in which the scenario text is divided and storing it in a speaker pattern storage means;
A video analysis step of dividing the video data into unit intervals, generating an utterance pattern that is a symbol string including a speaker name, and storing the utterance pattern in the utterance pattern storage means;
Matching the symbol string stored in the speaker pattern storage means with the symbol string stored in the utterance pattern storage means, generating matching information in which the change of the dialogue and the time are associated with each other, the matching information storage means A mapping step to store;
The matching information is read from the matching information storage means, the video is acquired from the video storage means corresponding to the matching information, the scenario text is acquired from the scenario storage means, and the associated result is displayed. A matching result display step for displaying the video partial section specified by the operator,
When the operator determines that the association result is correct and designates a lock on the correspondence, a display indicating the locked state is shown on the screen of the display device and the lock flag of the lock information storage means is turned ON,
When the operator instructs the association result to be corrected, the operator inputs the keyword of the scenario to be associated, searches the scenario storage unit for the scenario text corresponding to the keyword, and displays it on the display device. When the operator designates the lock of the correspondence relationship, the lock state display on the screen on the display device is displayed, the lock flag of the lock storage means is turned ON, and A realignment step to update the alignment information;
A video / scenario matching program for executing a matching result output step of outputting matching information when matching of all video data and scenario text is completed.

The video / scenario matching program according to claim 6, wherein, in the association step,
when performing the association, the lock information file is referred to, the speaker pattern is divided at the portion corresponding to the dialogue ID for which the lock flag is ON, the speaker pattern is divided at the portion corresponding to the time for which the lock flag is ON, and the pieces are individually associated and integrated into the matching information storage means.