JP6700957B2

JP6700957B2 - Subtitle data generation device and program

Info

Publication number: JP6700957B2
Application number: JP2016094531A
Authority: JP
Inventors: 高登河村; 克幸杉森; 馨介塚口; 浜口　斉周
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2016-05-10
Filing date: 2016-05-10
Publication date: 2020-05-27
Anticipated expiration: 2036-05-10
Also published as: JP2017204695A

Description

The present invention relates to a subtitle data generation device and a program.

There is demand for wider adoption of systems that deliver moving-image content, such as television broadcast programs, over a communication line (such as the Internet) simultaneously with the broadcast. If such systems become widely available, viewers are expected to be able to watch programs in good condition in a variety of reception environments.

One component of a system for delivering moving-image content such as broadcast programs over a communication line is the encoder device. In particular, such a system needs an encoder device that takes a television broadcast signal as input and outputs moving-image files or similar output that can be distributed in real time. Such a device is also called a "live encoder". When broadcast program content is delivered over a communication line, this encoder device (live encoder) can generate the files to be delivered for the video (including audio).

Under the standards defined by ARIB (Association of Radio Industries and Businesses), the transmitting equipment sends subtitle data for television broadcasting as text data in the broadcast signal, separately from the video. The television receiver extracts the video and the subtitle text from the broadcast signal and displays them in synchronization according to specified presentation times. Each subtitle text is shown and hidden according to its specified presentation start time and presentation end time.
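The following minimal Python sketch illustrates the start/end rule. The `Cue` type and the `active_cues` function are hypothetical names. The sketch assumes that times are seconds on a clock shared with the video and treats the end time as exclusive. It shows only the timing rule and does not model the ARIB caption encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    """One subtitle text and its presentation window (seconds)."""
    text: str
    start: float  # presentation start time
    end: float    # presentation end time (treated as exclusive here)


def active_cues(cues: list[Cue], t: float) -> list[Cue]:
    """Return the cues that should be on screen at presentation time t."""
    return [c for c in cues if c.start <= t < c.end]


cues = [Cue("Good evening.", 1.0, 3.5), Cue("Here is the news.", 3.5, 6.0)]
print([c.text for c in active_cues(cues, 3.5)])  # ['Here is the news.']
print(active_cues(cues, 7.0))                    # []
```

Because the end time is exclusive, back-to-back cues hand over cleanly: at exactly 3.5 s only the second cue is shown.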

The following documents describe techniques for displaying subtitles.
Patent Document 1 describes a technique for displaying the current subtitle and past subtitles at the same time. In that technique, the display unit has two screens. The first screen (current screen) displays the current subtitle, synchronized with the program video extracted from the received broadcast signal. The second screen (past screen) displays past subtitles, that is, subtitles that were displayed a predetermined time before the video and subtitle currently shown on the current screen.

Patent Document 2 describes a technique for controlling stream output using subtitles displayed in the past. In that technique, subtitles, video, and audio are separated from a stream that is stored in a storage device and carries multiplexed time information. Each separated subtitle is held in a subtitle-list memory and is displayed together with the video that has the corresponding time information. When a particular subtitle is selected from the subtitle history held in that memory, the stream output is controlled based on the time information associated with that subtitle. The technique assumes that the content has been recorded in advance on a recording medium on the viewer side, as in a DVD player or a hard disk recorder.

Patent Document 1: JP 2009-177720 A
Patent Document 2: JP 2003-018491 A

As described above, the related art already allows an encoder device (live encoder) to encode video and audio from a broadcast signal and output them as files. However, conventional encoder devices cannot encode subtitle data in real time. As a result, a television program delivered over a communication network simultaneously with the broadcast can only be delivered as video without subtitles. Current content distribution over communication lines does not deliver subtitle data either.
The techniques described in Patent Documents 1 and 2 let the viewer view past subtitles or control the output stream based on past subtitles. Neither generates distributable subtitle data in real time.
Viewers on the receiving side should also be able to see subtitles when broadcast program content is delivered over a communication line.

The present invention was made in view of the problem described above. It aims to provide a subtitle data generation device, and a program for it, that generates subtitle data from a broadcast signal for distribution in real time. It further aims to provide a content display device, and a program for it, that displays subtitle data and related data generated by such a subtitle data generation device or program.

[1] To solve the above problem, a subtitle data generation device according to one aspect of the present invention includes: a subtitle conversion unit that acquires subtitle text and information on the presentation time of the subtitle text from subtitle data extracted from an externally acquired broadcast signal, and outputs the subtitle text in association with the presentation time; a storage unit that stores a moving-image file encoded on the basis of the broadcast signal; a data generation unit that generates a subtitle file containing the subtitle text, synchronized with the presentation time of the moving-image file stored in the storage unit; and an output unit that outputs the moving-image file read from the storage unit and the subtitle file generated by the data generation unit.

[2] According to another aspect of the present invention, in the above subtitle data generation device, the moving-image file consists of a plurality of moving-image files, each covering a time segment of predetermined length. The storage unit further stores a moving-image playlist file, which is playlist data containing the presentation time of each moving-image file so that the files are presented in the proper order. Referring to the moving-image playlist file, the data generation unit generates a plurality of subtitle files, one for each moving-image file, and also generates a subtitle playlist file, which is playlist data for presenting the generated subtitle files in the proper order. The output unit also outputs the moving-image playlist file and the subtitle playlist file.

[3] According to another aspect of the present invention, in the above subtitle data generation device, when the subtitle text contains an external character, the subtitle conversion unit outputs the subtitle text with the location information of the font for that external character associated with the character.

[4] Another aspect of the present invention is a program that causes a computer to function as: a subtitle extraction unit that extracts subtitle data from an externally acquired broadcast signal; a subtitle conversion unit that acquires subtitle text and information on its presentation time from the subtitle data and outputs the subtitle text in association with the presentation time; a storage unit that stores a moving-image file encoded on the basis of the broadcast signal; a data generation unit that generates a subtitle file containing the subtitle text, synchronized with the presentation time of the moving-image file stored in the storage unit; and an output unit that outputs the moving-image file read from the storage unit and the subtitle file generated by the data generation unit.

[5] A content display device according to another aspect of the present invention includes: a communication unit that receives a moving-image file and a subtitle file corresponding to the moving-image file; a decoding unit that decodes the received moving-image file and outputs video together with the video presentation time, that is, the time at which the video is presented; a subtitle processing unit that outputs, from the received subtitle file, subtitle text together with the subtitle presentation time, that is, the time at which the subtitle text is presented; and a presentation control unit that synchronizes the presentation of the video and the subtitle text based on the video presentation time and the subtitle presentation time, and displays the subtitle text in a subtitle display area that does not overlap the video display area (the area in which the video is displayed).

[6] According to another aspect of the present invention, in the above content display device, the subtitle presentation time information includes a subtitle presentation start time and a subtitle presentation end time. The presentation control unit starts displaying the subtitle text for a given subtitle presentation start time when that start time arrives. It does not end the display of that subtitle text when the corresponding subtitle presentation end time arrives. It displays subsequent subtitle text at positions in the subtitle display area that differ from the position of that subtitle text.
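In aspect [6], the end time is not used to remove a subtitle, and each new subtitle gets its own row. The Python sketch below illustrates this under simple assumptions. `SubtitleHistoryArea` is a hypothetical name, rows are stacked in start-time order, and the actual screen layout is not modeled.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Cue:
    text: str
    start: float  # subtitle presentation start time (s)
    end: float    # subtitle presentation end time (s), ignored below


@dataclass
class SubtitleHistoryArea:
    """Subtitle display area that keeps earlier subtitles visible."""
    cues: list[Cue]                            # assumed sorted by start time
    shown: list[Cue] = field(default_factory=list)

    def update(self, t: float) -> list[str]:
        """Advance to time t and return the rows to draw, oldest first."""
        # `shown` is always a prefix of `cues`, so only later cues need checking.
        for cue in self.cues[len(self.shown):]:
            if cue.start > t:
                break
            self.shown.append(cue)  # display starts at the start time...
        # ...and nothing is removed when `end` passes. Row i holds the i-th
        # cue, so each new subtitle lands below the earlier ones.
        return [c.text for c in self.shown]


area = SubtitleHistoryArea([Cue("A", 1.0, 2.0), Cue("B", 3.0, 4.0)])
print(area.update(1.5))  # ['A']
print(area.update(5.0))  # ['A', 'B']  ('A' stays after its end time)
```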

[7] According to another aspect of the present invention, in the above content display device, when the presentation control unit receives an operation that selects a subtitle text already displayed, it rewinds to the position corresponding to that subtitle text's presentation time and resumes presenting the moving-image file from that position.
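With segmented delivery (aspect [2]), rewinding to a subtitle's presentation time means finding the segment that contains that time and the offset inside the segment. The sketch below assumes that times are measured from the start of the first listed segment and that segment durations come from the #EXTINF values of the moving-image playlist. `seek_position` is a hypothetical helper.

```python
from bisect import bisect_right
from itertools import accumulate


def seek_position(t: float, durations: list[float]) -> tuple[int, float]:
    """Map a subtitle's presentation time to (segment index, offset in segment)."""
    if t < 0:
        raise ValueError("presentation time must be non-negative")
    # Start time of every segment, plus the end of the last one.
    bounds = [0.0, *accumulate(durations)]
    i = bisect_right(bounds, t) - 1  # last segment starting at or before t
    if i >= len(durations):
        raise ValueError("presentation time is past the end of the playlist")
    return i, t - bounds[i]


# The user taps a subtitle that started at 12.5 s; segments are 10 s long.
print(seek_position(12.5, [10.0, 10.0, 10.0]))  # (1, 2.5)
```

In a live playlist whose window slides forward, the origin would have to be rebased to the first segment still listed. That bookkeeping is left out here.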

[8] According to another aspect of the present invention, in the above content display device, the presentation control unit acquires from the subtitle file speaker identification information that identifies the speaker of the subtitle text, and displays the subtitle text associated with that speaker identification information.
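The source does not say how the speaker identification information is encoded in the subtitle file. If the subtitle file is WebVTT, one option is the voice span `<v Name>`, which WebVTT defines for naming the speaker of a cue. The sketch below assumes that encoding. `split_speaker` and `display_line` are hypothetical helpers, and the `Name: text` display format is only an example.

```python
import re
from typing import Optional

# WebVTT voice span at the start of a cue payload: <v Name> or <v.class Name>.
VOICE_SPAN = re.compile(r"<v(?:\.[^\s>]+)*\s+([^>]+)>(.*?)(?:</v>)?\s*$", re.DOTALL)


def split_speaker(payload: str) -> tuple[Optional[str], str]:
    """Return (speaker, text); speaker is None if the cue has no voice span."""
    m = VOICE_SPAN.match(payload)
    if m is None:
        return None, payload
    return m.group(1).strip(), m.group(2)


def display_line(payload: str) -> str:
    """Render the cue text prefixed with its speaker, when one is given."""
    speaker, text = split_speaker(payload)
    return f"{speaker}: {text}" if speaker else text


print(display_line("<v Announcer>Good evening.</v>"))  # Announcer: Good evening.
```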

[9] Another aspect of the present invention is a program that causes a computer to function as: a decoding unit that decodes a moving-image file received by communication and outputs video together with the video presentation time; a subtitle processing unit that outputs, from a subtitle file received by communication, subtitle text together with the subtitle presentation time; and a presentation control unit that synchronizes the presentation of the video and the subtitle text based on the video presentation time and the subtitle presentation time, and displays the subtitle text in a subtitle display area that does not overlap the video display area (the area in which the video is displayed).

According to the present invention, subtitles can also be delivered when a broadcast program is distributed in real time over a communication line simultaneously with the broadcast. In addition, the content display device can display subtitles in time series, which makes the delivered content easier to watch, including in mobile environments.

FIG. 1 is a block diagram showing the functional configuration of a subtitle data generation device according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the schematic functional configuration of a distribution system that includes the subtitle data generation device according to the same embodiment.
FIG. 3 is a schematic diagram showing an example of the playlist file (video m3u8) that the subtitle data generation device according to the same embodiment acquires from the encoder device and stores in the storage unit.
FIG. 4 is a schematic diagram showing an example of the master playlist file (master.m3u8) that the subtitle data generation device according to the same embodiment acquires from the encoder device and stores in the storage unit.
FIG. 5 is a schematic diagram showing the structure of the subtitle data listed by the subtitle conversion unit according to the same embodiment.
FIG. 6 is a schematic diagram showing a configuration example of a subtitle file generated by the data generation unit according to the same embodiment.
FIG. 7 is a schematic diagram showing a configuration example of the subtitle playlist file (subtitle m3u8) generated by the data generation unit according to the same embodiment.
FIG. 8 is a schematic diagram showing a configuration example of the master playlist file (master.m3u8) generated by the data generation unit according to the same embodiment.
FIG. 9 is a block diagram showing the schematic functional configuration of a client device according to the same embodiment.
FIG. 10 is a schematic diagram showing a configuration example of a content presentation screen on the client device according to the same embodiment.
FIG. 11 is a schematic diagram showing an example in which the client device according to the same embodiment displays subtitles in a mode different from that of FIG. 10.

Next, an embodiment of the present invention will be described with reference to the drawings.
FIG. 1 is a block diagram showing the functional configuration of the subtitle data generation device according to the present embodiment. In this figure, reference numeral 1 denotes the subtitle data generation device. As illustrated, the subtitle data generation device 1 includes a subtitle extraction unit 11, a subtitle conversion unit 12, a data generation unit 13, an output unit 14, and a storage unit 20. Each of these units is implemented with electronic circuits and processes electrical signals that represent information. As described later, the functions of the units may instead be implemented with a computer. The function of each unit is described below.

The subtitle extraction unit 11 takes in a broadcast signal input from the outside and extracts subtitle data from it. The broadcast signal reaches the subtitle data generation device 1 over SDI (Serial Digital Interface), a standard interface for broadcasting equipment. The format of the broadcast signal follows standards established by ARIB (Association of Radio Industries and Businesses), and the subtitle data is likewise superimposed on the input broadcast signal as specified by ARIB. The subtitle data is stored in the vertical blanking area of the HD-SDI or SD-SDI signal, and the subtitle extraction unit 11 extracts it from there. The subtitle data may also be stored in another area of the broadcast signal.
The subtitle extraction unit 11 passes the extracted subtitle data to the subtitle conversion unit 12.
The function of the subtitle extraction unit 11 may instead reside in a device outside the subtitle data generation device 1. In that case, the external subtitle extraction unit extracts the subtitle data from the broadcast signal and passes it to the subtitle conversion unit 12 of the subtitle data generation device 1.

After converting the extracted subtitle data into text format, the subtitle conversion unit 12 builds a list of the subtitle sentences and their additional information. The additional information includes the presentation time (presentation start time and presentation end time), the display position of the subtitle on the screen, and character decoration. In other words, the subtitle conversion unit 12 acquires subtitle text and its presentation time from the subtitle data and outputs the two in association with each other. The data listed by the subtitle conversion unit 12 is described later with reference to FIG. 5.
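The Python sketch below shows one row of such a list and how it might be written out as a WebVTT cue. The field names are assumptions, because the structure in FIG. 5 is not reproduced here. Mapping the display position to the WebVTT `position`/`line` cue settings and the decoration to a `<c.class>` span is also an assumption.

```python
from dataclasses import dataclass
from html import escape
from typing import Optional


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


@dataclass(frozen=True)
class SubtitleEntry:
    """One row of the subtitle list (hypothetical field names)."""
    text: str
    start: float                     # presentation start time (s)
    end: float                       # presentation end time (s)
    x_percent: float = 50.0          # display position, % of frame width (assumed)
    y_percent: float = 90.0          # display position, % of frame height (assumed)
    css_class: Optional[str] = None  # character decoration as a class name (assumed)

    def to_vtt_cue(self) -> str:
        """Render the entry as a WebVTT cue block."""
        body = escape(self.text, quote=False)  # escape &, < and > for cue text
        if self.css_class:
            body = f"<c.{self.css_class}>{body}</c>"
        timing = f"{vtt_timestamp(self.start)} --> {vtt_timestamp(self.end)}"
        settings = f"position:{self.x_percent:g}% line:{self.y_percent:g}%"
        return f"{timing} {settings}\n{body}"


print(SubtitleEntry("Hello", 1.0, 2.5, css_class="yellow").to_vtt_cue())
# 00:00:01.000 --> 00:00:02.500 position:50% line:90%
# <c.yellow>Hello</c>
```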

When the text of the extracted subtitle data contains an external character, the subtitle conversion unit 12 converts it into information on an external-character font that can be obtained over the web (also called a "web font"), such as the URL from which that font can be retrieved. URL stands for Uniform Resource Locator. An external character is a character that is not in the character set defined by a standard. Examples of standard character codes are JIS (Japanese Industrial Standards) codes and Unicode. External characters are assigned codes from an unused area of a standard character code system. Pictograms may also be treated as a kind of external character. The subtitle conversion unit 12 holds in advance a table that maps each external character to the location information (such as a URL) of its web font. When an external character appears in the subtitle text, the unit outputs it together with the location information of the corresponding web font. With this conversion, a client device that receives the subtitle data can obtain the appropriate web font and display the external character.
The subtitle conversion unit 12 passes the listed data to the data generation unit 13.
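The Python sketch below shows the lookup under stated assumptions. External characters are assumed to arrive as Unicode Private Use Area code points (U+E000 to U+F8FF), and the correspondence table is assumed to be a plain dictionary. The table contents, the example.com URLs, and `annotate_external_chars` are all hypothetical. Each external character becomes its own run paired with its font URL, which a client could turn into a CSS `@font-face` rule.

```python
from typing import Optional

# Hypothetical table: private-use code point -> web font URL.
EXTERNAL_CHAR_FONTS = {
    "\ue000": "https://fonts.example.com/extchars/e000.woff2",
    "\ue001": "https://fonts.example.com/extchars/e001.woff2",
}


def annotate_external_chars(text: str,
                            table: dict[str, str]) -> list[tuple[str, Optional[str]]]:
    """Split text into runs; each external character carries its font URL."""
    runs: list[tuple[str, Optional[str]]] = []
    plain: list[str] = []
    for ch in text:
        url = table.get(ch)
        if url is None:
            plain.append(ch)  # ordinary character (or an unmapped one): keep as-is
            continue
        if plain:  # flush the ordinary text collected so far
            runs.append(("".join(plain), None))
            plain = []
        runs.append((ch, url))
    if plain:
        runs.append(("".join(plain), None))
    return runs


print(annotate_external_chars("A\ue000B", EXTERNAL_CHAR_FONTS))
```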

The data generation unit 13 generates subtitle files 23 based on the subtitle data converted by the subtitle conversion unit 12 and the playlist file 22 read from the storage unit 20, and writes them to the storage unit 20. The data generation unit 13 also generates a playlist file 24 by adding subtitle-related information to the contents of the playlist file 22, and writes the playlist file 24 to the storage unit 20. The data generation unit 13 leaves the playlist file 22 (moving-image playlist file) itself unmodified. In this way, the data generation unit 13 generates subtitle files 23 containing the subtitle text, synchronized with the presentation times of the moving-image files 21 stored in the storage unit 20.
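One way to achieve this synchronization is sketched below in Python. The sketch assumes that cue times and segment boundaries share one clock starting at 0, that segment durations are taken from the #EXTINF values of the moving-image playlist, and that cues are plain `(start, end, text)` tuples. `build_segment_files` is a hypothetical name. A cue that straddles a boundary is copied, with its full timing, into every segment it overlaps. This follows the HLS specification (RFC 8216), which requires each WebVTT segment to contain every cue displayed during its period.

```python
from itertools import accumulate


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def build_segment_files(cues: list[tuple[float, float, str]],
                        durations: list[float]) -> list[str]:
    """Return one WebVTT document per video segment, in playlist order."""
    bounds = [0.0, *accumulate(durations)]
    files = []
    for seg_start, seg_end in zip(bounds, bounds[1:]):
        lines = ["WEBVTT", ""]
        for start, end, text in cues:
            # Keep every cue that overlaps this segment, with its full timing,
            # so a player that joins mid-cue still shows it.
            if start < seg_end and end > seg_start:
                lines += [f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}", text, ""]
        files.append("\n".join(lines))
    return files


files = build_segment_files([(1.0, 3.0, "A"), (9.0, 12.0, "B")], [10.0, 10.0])
```

Here `files[0]` holds cues A and B, and `files[1]` holds only B. A production HLS stream would also add an `X-TIMESTAMP-MAP` header to each segment to tie cue times to the video's MPEG-2 timestamps. That header is omitted from this sketch.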

The moving-image files 21 and the playlist file 22 are data files acquired from the outside and written to the storage unit 20. They correspond to the broadcast signal input to the subtitle extraction unit 11; that is, an external encoder device (described later) generates them from that broadcast signal. Specifically, the data generation unit 13 refers to the moving-image playlist file (described later) to generate one subtitle file for each moving-image file. It also generates a subtitle playlist file, which is playlist data for presenting the generated subtitle files in the proper order.
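The subtitle files are cut at the same boundaries as the video segments. The subtitle playlist can therefore be derived from the video playlist by keeping its timing tags (#EXTINF, #EXT-X-TARGETDURATION, #EXT-X-MEDIA-SEQUENCE) and swapping each segment URI. The Python sketch below assumes plain relative segment names and a hypothetical naming rule in which `seg100.ts` pairs with `seg100.vtt`. It drops #EXT-X-KEY and #EXT-X-MAP because those tags describe the video media, not the subtitles. `subtitle_playlist_from_video` is a hypothetical helper.

```python
import posixpath

# Tags that describe the video media itself and must not be copied.
VIDEO_ONLY_TAGS = ("#EXT-X-KEY", "#EXT-X-MAP")


def subtitle_playlist_from_video(video_m3u8: str) -> str:
    """Derive a subtitle media playlist that mirrors the video playlist's segments."""
    out = []
    for line in video_m3u8.splitlines():
        if line.startswith(VIDEO_ONLY_TAGS):
            continue
        if line and not line.startswith("#"):
            # Segment URI: assume the subtitle file shares the video file's stem.
            stem, _ext = posixpath.splitext(line)
            line = stem + ".vtt"
        out.append(line)
    return "\n".join(out) + "\n"


video = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:100
#EXTINF:10.0,
seg100.ts
#EXTINF:10.0,
seg101.ts
"""
print(subtitle_playlist_from_video(video))
```

Copying the #EXTINF durations and the media sequence number unchanged keeps segment N of the subtitle playlist aligned with segment N of the video playlist.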

The output unit 14 outputs the moving-image files 21, the subtitle files 23, and the playlist file 24 for distribution. Specifically, it passes them to an external content distribution server device (described later). The playlist file 24 output by the output unit 14 includes a moving-image playlist file, a subtitle playlist file, and a master playlist.

The storage unit 20 stores, at least temporarily, the moving-image files 21, the playlist file 22, the subtitle files 23, and the playlist file 24. To hold these files, it contains a storage medium such as a hard disk drive or a solid-state drive.

Each file held in the storage unit 20 is described further below.
The moving-image file 21 is a file of moving-image content (including video and audio) obtained by encoding the broadcast signal with an external encoder device. The content is divided into files of a predetermined segment length. The segment length is chosen as appropriate, for example 5 or 10 seconds, so there are multiple moving-image files 21, one per segment.
The playlist file 22 describes the overall structure of the moving-image content and is also generated by the external encoder device. It records when (the presentation time) and in what order each of the per-segment moving-image files 21 should be played. In the present embodiment, the playlist file 22 consists of a video m3u8 file (moving-image playlist file) and a master.m3u8 file. The master.m3u8 file is the master file that describes the structure of the entire multimedia content. The video m3u8 file is playlist data containing the presentation time of each moving-image file so that the files are presented in the proper order.
A specific example of the playlist file 22 is described later with reference to the drawings.

The data generation unit 13 generates the subtitle files 23 and the playlist file 24 and writes them to the storage unit 20.
The subtitle file 23 contains the subtitle text data corresponding to a moving-image file 21. It holds subtitle text extracted from the broadcast signal acquired by the subtitle extraction unit 11, which is the same broadcast signal from which the moving-image files 21 were generated. The subtitle files 23 are likewise divided into files of the segment length described above, so each segment has one subtitle file 23 corresponding to its one moving-image file 21.
The playlist file 24 is the playlist file 22 with subtitle-related information added. In the present embodiment, the playlist file 24 includes the video m3u8 file described above, a subtitle m3u8 file (the playlist for the subtitles), and a master.m3u8 file to which subtitle-related information has been added.
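The subtitle information added to master.m3u8 can be expressed with standard HLS tags. An #EXT-X-MEDIA line with TYPE=SUBTITLES declares the subtitle playlist, and a SUBTITLES attribute on each #EXT-X-STREAM-INF links the variant to that group. In the Python sketch below, the group ID, name, language, and `subtitle.m3u8` URI are illustrative assumptions, not values from the source, and `add_subtitle_rendition` is a hypothetical helper.

```python
def add_subtitle_rendition(master_m3u8: str, uri: str, group: str = "subs",
                           name: str = "Japanese", language: str = "ja") -> str:
    """Return a master playlist that advertises one subtitle rendition."""
    media = (f'#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="{group}",NAME="{name}",'
             f'LANGUAGE="{language}",DEFAULT=YES,AUTOSELECT=YES,URI="{uri}"')
    out = []
    for line in master_m3u8.splitlines():
        if line.startswith("#EXT-X-STREAM-INF:"):
            # Link every variant stream to the subtitle group.
            line += f',SUBTITLES="{group}"'
        out.append(line)
        if line == "#EXTM3U":
            out.append(media)  # declare the rendition right after the header
    return "\n".join(out) + "\n"


master = "#EXTM3U\n#EXT-X-STREAM-INF:BANDWIDTH=2000000\nvideo.m3u8\n"
print(add_subtitle_rendition(master, "subtitle.m3u8"))
```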

Thus, a receiver of the playlist file 24, the moving-image files 21, and the subtitle files 23 can use the playlist file 24 to present each segment's moving-image file 21 and subtitle file 23 at the corresponding presentation time. In other words, the moving-image files 21 and the subtitle files 23 are presented in synchronization with each other.

The moving-image files may use formats such as HLS (HTTP Live Streaming) or MPEG-DASH (Dynamic Adaptive Streaming over HTTP), but are not limited to these. The subtitle files may use formats such as WebVTT (Web Video Text Tracks), TTML (Timed Text Markup Language), or ARIB-TTML, but are likewise not limited to these. ARIB-TTML is TTML as specified in an ARIB standard.
Parts of the following description assume a specific format, but the present embodiment can also be applied when video and subtitles are delivered in other file formats.

Next, a distribution system that uses the subtitle data generation device 1 is described.
FIG. 2 is a block diagram showing the schematic functional configuration of a distribution system that includes the subtitle data generation device 1. In other words, it shows how the subtitle data generation device 1 relates to its peripheral devices. As illustrated, the distribution system 100 includes the following devices:
- the subtitle data generation device 1,
- a time code inserter 51,
- a distributor 52,
- an encoder device 53,
- a content distribution server device 61, and
- a client device 71.
The client device 71 may also be called a "content display device". The devices shown in the figure can exchange signals carrying data with one another. In particular, the content distribution server device 61 and the client device 71 are connected by a communication line such as the Internet. Only one client device 71 is shown, but in practice the distribution system 100 may include many client devices 71.

The time code inserter 51 inserts a time code into the input broadcast signal (SDI). It then passes the time-coded broadcast signal to the distributor 52. This link also uses SDI transmission.

The distributor 52 splits the broadcast signal received from the time code inserter 51 into two feeds. The first feed goes to the encoder device 53. The second feed goes to the subtitle extraction unit 11 of the subtitle data generation device 1. In other words, the distributor 52 splits one broadcast signal so that the first feed can be used to encode the moving image and the second feed can be used to extract subtitle data.

The encoder device 53 extracts the video and audio from the input broadcast signal and outputs moving image files containing both. The moving image is divided into segments of a predetermined length. Any segment length can be chosen, for example 5 or 10 seconds. Each moving image file output by the encoder device 53 therefore corresponds to one segment and has the set segment length.
The encoder device 53 also outputs a playlist file that describes these moving image files.
Encoding the video and audio of an input broadcast signal can be done with existing technology. In other words, the encoder device 53 itself is conventional.
The moving image files and the playlist file generated by the encoder device 53 are passed to the subtitle data generation device 1. They are written to its storage unit 20 as the moving image file 21 and the playlist file 22.

The subtitle data generation device 1 acquires the moving image files and the playlist file output by the encoder device 53 and stores them temporarily in its internal storage means. It also acquires the broadcast signal (SDI) directly from the distributor 52 and extracts subtitle data from it. From this data, it generates subtitle files that match the acquired moving image files. So that the moving image files can be presented in sync with the subtitle files it generated, the device adds information about the subtitle files to the playlist file acquired from the encoder device 53 and outputs the result. Finally, the subtitle data generation device 1 passes the moving image files, the subtitle files, and the playlist file to the content distribution server device 61.

The content distribution server device 61 distributes content data to the client device 71. Specifically, it sends moving image files, subtitle files, and playlist files to the client device 71 in response to requests from that device.

The client device 71 requests content data from the content distribution server device 61, receives the distributed data, and presents it to the viewer. For example, the client device 71 receives moving image files, subtitle files, and playlist files from the content distribution server device 61. It then reads the moving image files and subtitle files at the times the received playlist file specifies, and displays the moving image and the subtitles on screen in sync. The client device 71 also outputs the audio contained in the moving image files through an audio output means such as a speaker or an earphone jack.
How the client device 71 displays the subtitle text is described in detail later.

With this configuration, the subtitle data generation device 1 extracts subtitle data from the broadcast signal and outputs subtitle files built from the extracted data. The subtitle files use a data format that the receiving client device 71 can handle easily. The subtitle data generation device 1 also edits the playlist file so that the subtitle data is synchronized with the moving image files. It then passes the playlist file to the content distribution server device 61, together with the moving image files and subtitle files to be distributed.
The content distribution server device 61 can then distribute these files. The receiving client device 71 uses the playlist file to obtain the moving image files and subtitle files it needs to display. It can therefore play back and display the content with subtitles.

FIG. 3 is a schematic diagram showing an example of a playlist file that the subtitle data generation device 1 acquires from the encoder device 53 and stores in the storage unit 20. Specifically, it shows a moving image m3u8 file. "m3u8" is a multimedia playlist format, and an m3u8 file is plain text. Line numbers are added in the figure for convenience. The example data in the figure is described below.

The first line, "#EXTM3U", is the header of the moving image m3u8 file.
The second line, "#EXT-X-VERSION:3", indicates that the compatibility version of the moving image m3u8 file is 3.
The third line, "#EXT-X-TARGETDURATION:5", gives the maximum length of a media file (such as a moving image file) in seconds. The value here is 5, so no media file is longer than 5 seconds.
The fourth line, "#EXT-X-MEDIA-SEQUENCE:0", gives the sequence number of the first URL in this playlist file. Sequence numbers are assigned consecutively to the segments of the media. The value here is 0, so the first URL to appear (on the seventh line) has sequence number 0.

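The four playlist-level tags just described can be read with a few lines of Python. This is a minimal sketch, not the parser used by the encoder device 53 or the subtitle data generation device 1. It handles only the tags discussed here. The segment URI `seg0.ts` in the sample is hypothetical; the other values are taken from the description of FIG. 3.

```python
SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:5
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-PROGRAM-DATE-TIME:2015-06-09T08:42:05.635Z
#EXTINF:5.0
seg0.ts
"""

# Playlist-level tags of FIG. 3, mapped to the key under which each value is stored.
HEADER_TAGS = {
    "#EXT-X-VERSION:": "version",
    "#EXT-X-TARGETDURATION:": "target_duration_s",
    "#EXT-X-MEDIA-SEQUENCE:": "media_sequence",
}


def parse_media_playlist_header(text: str) -> dict:
    """Return the integer-valued header tags of a media m3u8 playlist."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] != "#EXTM3U":
        raise ValueError("not an m3u8 playlist: first line must be #EXTM3U")
    header = {}
    for line in lines[1:]:
        for prefix, key in HEADER_TAGS.items():
            if line.startswith(prefix):
                header[key] = int(line[len(prefix):])
    return header


print(parse_media_playlist_header(SAMPLE_PLAYLIST))
# {'version': 3, 'target_duration_s': 5, 'media_sequence': 0}
```

The header must be checked first because a file that does not start with "#EXTM3U" is not an m3u8 playlist at all.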
Lines 5 to 7 correspond to one segment.
The fifth line, "#EXT-X-PROGRAM-DATE-TIME:2015-06-09T08:42:05.635Z", gives the date and time associated with the start of the segment. In this example, the segment starts at 08:42:05.635 on June 9, 2015. The trailing "Z" indicates that the time is in Coordinated Universal Time (UTC).
The sixth line, "#EXTINF:5.0", gives the length of the segment in seconds. In this example, the segment is 5.0 seconds long.
The seventh line gives the URL of the segment's moving image file (also called a "chunk file" or "TS file").

This concludes the description of the segment on lines 5 to 7.
The rest of the file describes further segments in order. Lines 8 to 10 describe the second segment in the file, lines 11 to 13 the third, and lines 14 to 16 the fourth.
In the example shown in the figure, therefore, the content is distributed as a sequence of 5-second segments.
The moving image m3u8 file in this example has 16 lines, but it may also describe further segments that follow.
This example gives the location of each moving image file as a URL. Any notation that identifies the location may be used instead, for example a relative path.
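Each three-line segment entry described above can be turned into a record with a sequence number, a start time, a duration, and a URI. The following is a minimal sketch under stated assumptions. The sample's second segment and both URIs are hypothetical. When a segment has no "#EXT-X-PROGRAM-DATE-TIME" line, the sketch infers its start as the previous start plus the previous duration; that extrapolation is an assumption, not something the description states.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:5
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-PROGRAM-DATE-TIME:2015-06-09T08:42:05.635Z
#EXTINF:5.0
seg0.ts
#EXTINF:5.0
seg1.ts
"""


@dataclass
class Segment:
    sequence: int              # media sequence number
    start: Optional[datetime]  # from #EXT-X-PROGRAM-DATE-TIME, or inferred
    duration_s: float          # from #EXTINF
    uri: str                   # URL or relative path of the chunk (TS) file


def parse_segments(text: str) -> List[Segment]:
    sequence = 0
    start: Optional[datetime] = None
    duration: Optional[float] = None
    segments: List[Segment] = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#EXT-X-MEDIA-SEQUENCE:"):
            sequence = int(line.split(":", 1)[1])
        elif line.startswith("#EXT-X-PROGRAM-DATE-TIME:"):
            # datetime.fromisoformat() before Python 3.11 rejects a trailing "Z".
            stamp = line.split(":", 1)[1].replace("Z", "+00:00")
            start = datetime.fromisoformat(stamp)
        elif line.startswith("#EXTINF:"):
            # Accept both "#EXTINF:5.0" and "#EXTINF:5.0,title".
            duration = float(line.split(":", 1)[1].split(",", 1)[0])
        elif line and not line.startswith("#"):
            # A non-tag line is the segment URI; it closes the current entry.
            if duration is None:
                raise ValueError(f"segment {line!r} has no #EXTINF")
            if start is None and segments and segments[-1].start is not None:
                prev = segments[-1]
                start = prev.start + timedelta(seconds=prev.duration_s)
            segments.append(Segment(sequence, start, duration, line))
            sequence += 1
            start, duration = None, None
    return segments


for seg in parse_segments(SAMPLE_PLAYLIST):
    print(seg.sequence, seg.start.isoformat(), seg.duration_s, seg.uri)
# 0 2015-06-09T08:42:05.635000+00:00 5.0 seg0.ts
# 1 2015-06-09T08:42:10.635000+00:00 5.0 seg1.ts
```

Because the URI line is treated as the end of an entry, the parser accepts either absolute URLs or relative paths, which fits the note above that either may be used.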

FIG. 4 is also a schematic diagram showing an example of a playlist file that the subtitle data generation device 1 acquires from the encoder device 53 and stores in the storage unit 20. It shows the master of the playlist file (the master.m3u8 file). Line numbers are added in the figure for convenience. The example data is described below.
The first line, "#EXTM3U", is the header of the file.
The second and fourth lines carry streaming information. "PROGRAM-ID=" identifies the program, and "BANDWIDTH=" gives the streaming bandwidth used for distribution.
The third and fifth lines give the names of the moving image m3u8 files.

FIG. 5 is a schematic diagram showing how the subtitle data is structured after the subtitle conversion unit 12 has converted it and compiled it into a list.
The data in the figure is a table with columns such as start time, end time, subtitle text, and display position/character decoration.
Each row of this table corresponds to a segment defined in the file shown in FIG. 3. The subtitle extraction unit 11 extracts subtitle text from the time-coded broadcast signal, together with its associated time code. The subtitle conversion unit 12 then refers to the playlist file 22 stored in the storage unit 20 (the moving image m3u8 file shown in FIG. 3), divides the extracted subtitle text by segment, and assigns each part to its segment.
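The division of subtitle text by segment described above amounts to finding, for each subtitle, the segment whose start time precedes it. Below is a minimal sketch using a binary search over sorted segment start times. It makes three assumptions:
- all times are seconds on a common timeline, for example measured from the first segment's "#EXT-X-PROGRAM-DATE-TIME";
- each subtitle is assigned by its start time alone and is not clipped, consistent with FIG. 6, where a cue ends at 00:00:06.041, past a 5-second segment boundary;
- the third caption is hypothetical sample data.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class Caption:
    start_s: float          # presentation start on the common timeline (s)
    end_s: Optional[float]  # presentation end; None when omitted
    text: str


def assign_to_segments(captions: Sequence[Caption],
                       segment_starts: Sequence[float]) -> Dict[int, List[Caption]]:
    """Group captions by the segment in which their start time falls.

    segment_starts must be sorted ascending; the last segment is open-ended.
    """
    groups: Dict[int, List[Caption]] = {}
    for cap in sorted(captions, key=lambda c: c.start_s):
        index = bisect_right(segment_starts, cap.start_s) - 1
        if index < 0:
            continue  # starts before the first segment listed in the playlist
        groups.setdefault(index, []).append(cap)
    return groups


captions = [
    Caption(0.0, 2.203, "あいうえお"),
    Caption(2.203, 6.041, "かきくけこ"),
    Caption(7.5, None, "さしすせそ"),  # hypothetical third caption
]
groups = assign_to_segments(captions, [0.0, 5.0, 10.0])
print({i: [c.text for c in caps] for i, caps in groups.items()})
# {0: ['あいうえお', 'かきくけこ'], 1: ['さしすせそ']}
```

`bisect_right` places a caption that starts exactly on a boundary into the later segment, which matches half-open segment intervals.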

The start time is the presentation start time of the subtitle for each segment. It corresponds to that segment's start time in the playlist file (the moving image m3u8 file) shown in FIG. 3.
The end time is the presentation end time of the subtitle. The end time may be omitted, in which case the start time of the next segment is treated as the end time of the segment.
The subtitle text is the text of the subtitle presented within the segment. The subtitle extraction unit 11 extracts it from the broadcast signal.
The display position/character decoration is information on where the subtitle text is displayed (coordinates on the screen) and how its characters are styled (character size, typeface, underline, and so on). This information is also extracted from the broadcast signal.
The table may also include other data items.
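The end-time rule above (an omitted end time takes the next segment's start time) is sketched below. The function name is hypothetical. The description does not say what happens to the last row, which has no successor, so the sketch falls back to an optional `final_end` argument; that fallback is an assumption.

```python
from typing import List, Optional


def fill_end_times(starts: List[float], ends: List[Optional[float]],
                   final_end: Optional[float] = None) -> List[Optional[float]]:
    """Replace each missing end time with the next row's start time.

    The last row has no successor and falls back to ``final_end``
    (an assumption; the description leaves this case open).
    """
    if len(starts) != len(ends):
        raise ValueError("starts and ends must have the same length")
    filled: List[Optional[float]] = []
    for i, end in enumerate(ends):
        if end is None:
            end = starts[i + 1] if i + 1 < len(starts) else final_end
        filled.append(end)
    return filled


print(fill_end_times([0.0, 5.0, 10.0], [None, 7.5, None], final_end=15.0))
# [5.0, 7.5, 15.0]
```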

FIG. 6 is a schematic diagram showing an example structure of a subtitle file generated by the data generation unit 13. The data generation unit 13 writes the subtitle data for one segment into one subtitle file, so it generates many subtitle files, one per segment. The figure shows the subtitle file for one segment, with line numbers added for convenience. This subtitle file is described below.

The first line, "WEBVTT", is header information indicating that the file is in WebVTT format.
The second line, "X-TIMESTAMP-MAP:MPEGTS=1522260, LOCAL:00:00:00.000", defines a time mapping. In this example, "MPEGTS=1522260" is taken from the moving image file (chunk file) and corresponds to the time stamp 1522260 in that file. "LOCAL:00:00:00.000" means that this time stamp corresponds to 00:00:00.000 (0 hours, 0 minutes, 0.000 seconds) in local (relative) time.
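The mapping above links relative cue times to time stamps in the moving image file. RFC 8216 (HLS) defines the MPEGTS value as an MPEG-2 TS time stamp on a 90 kHz clock, and spells the header `X-TIMESTAMP-MAP=MPEGTS:<MPEG-2 time>,LOCAL:<cue time>`. The minimal sketch below therefore accepts both that spelling and the form quoted above. The function names are hypothetical.

```python
import re

# MPEG-2 TS presentation time stamps count a 90 kHz clock.
MPEGTS_CLOCK_HZ = 90_000

# Accepts the RFC 8216 form "X-TIMESTAMP-MAP=MPEGTS:<n>,LOCAL:<t>" and the
# "X-TIMESTAMP-MAP:MPEGTS=<n>, LOCAL:<t>" form quoted in the text.
_MAP_RE = re.compile(
    r"X-TIMESTAMP-MAP[=:]\s*MPEGTS[=:]\s*(\d+)\s*,\s*LOCAL[=:]\s*"
    r"((?:\d+:)?\d{2}:\d{2}\.\d{3})"
)


def vtt_time_to_seconds(stamp: str) -> float:
    """Convert a WebVTT time stamp ([hh:]mm:ss.ttt) to seconds."""
    parts = stamp.split(":")
    hours = int(parts[0]) if len(parts) == 3 else 0
    minutes, seconds = int(parts[-2]), float(parts[-1])
    return hours * 3600 + minutes * 60 + seconds


def parse_timestamp_map(line: str):
    """Return (mpegts_ticks, local_seconds) from an X-TIMESTAMP-MAP line."""
    match = _MAP_RE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not an X-TIMESTAMP-MAP line: {line!r}")
    return int(match.group(1)), vtt_time_to_seconds(match.group(2))


def cue_time_to_pts(cue_s: float, mpegts: int, local_s: float) -> int:
    """90 kHz time stamp in the moving image file at which a cue time falls."""
    return mpegts + round((cue_s - local_s) * MPEGTS_CLOCK_HZ)


mpegts, local = parse_timestamp_map("X-TIMESTAMP-MAP:MPEGTS=1522260, LOCAL:00:00:00.000")
print(mpegts / MPEGTS_CLOCK_HZ)               # 16.914 (seconds on the TS clock)
print(cue_time_to_pts(2.203, mpegts, local))  # 1720530
```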

The third and fourth lines form a pair. The third line gives the start time (00:00:00.000) and end time (00:00:02.203) of subtitle presentation, in the relative time described above. The fourth line, "あいうえお" (aiueo), is the text to be presented between that start time and end time. This subtitle text corresponds to the fourth row of the data shown in FIG. 5.
The fifth and sixth lines likewise form a pair. The fifth line gives the start time (00:00:02.203) and end time (00:00:06.041) of subtitle presentation, again in relative time. The sixth line, "かきくけこ" (kakikukeko), is the text to be presented between that start time and end time. It corresponds to the fifth row of the data shown in FIG. 5.
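A per-segment subtitle file like the one just described can be generated from a list of (start, end, text) cues in relative time. This minimal sketch is not the data generation unit 13's actual code, and the function names are hypothetical. It writes the RFC 8216 spelling of the timestamp map. It also emits the blank lines that the WebVTT format requires after the header block and between cues, although the line numbering of FIG. 6 appears to count only non-blank lines.

```python
from typing import Iterable, Tuple


def format_vtt_time(seconds: float) -> str:
    """Format seconds as a WebVTT time stamp hh:mm:ss.ttt."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def build_segment_vtt(mpegts: int, cues: Iterable[Tuple[float, float, str]]) -> str:
    """Build one segment's WebVTT file; cue times are relative to LOCAL 0."""
    out = ["WEBVTT", f"X-TIMESTAMP-MAP=MPEGTS:{mpegts},LOCAL:00:00:00.000", ""]
    for start, end, text in cues:
        out.append(f"{format_vtt_time(start)} --> {format_vtt_time(end)}")
        out.append(text)
        out.append("")  # WebVTT separates cues with a blank line
    return "\n".join(out)


print(build_segment_vtt(1522260, [(0.0, 2.203, "あいうえお"),
                                  (2.203, 6.041, "かきくけこ")]))
```

The printed file has the same header, mapping, and two cue/text pairs as FIG. 6, separated by blank lines.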

FIG. 7 is a schematic diagram showing an example structure of the subtitle playlist file generated by the data generation unit 13. Like the moving image playlist file, the subtitle playlist file is an m3u8 file; here it is also called the "subtitle m3u8 file". Line numbers are added for convenience. Each line of the data is described below.

The first line, "#EXTM3U", is the header of the subtitle m3u8 file.
The second line, "#EXT-X-VERSION:3", indicates that the compatibility version of the subtitle m3u8 file is 3.
The third line, "#EXT-X-TARGETDURATION:5", gives the maximum length of a media file in seconds. In this example, 5 seconds is specified to match the moving image m3u8 file (FIG. 3).
The fourth line, "#EXT-X-MEDIA-SEQUENCE:0", gives the sequence number of the first URL in this playlist file, just as in the moving image m3u8 file (FIG. 3).

Lines 5 and 6 correspond to one segment.
The fifth line, "#EXTINF:5.0", gives the length of the segment in seconds. In this example, the segment is 5.0 seconds long.
The sixth line gives the URL of the segment's subtitle file, which is the file described with reference to FIG. 6 and has the extension ".vtt".
The subtitle m3u8 file has no "#EXT-X-PROGRAM-DATE-TIME" attribute. Segments of the moving image files and of the subtitle files correspond one to one, and each pair starts at the same time. The "#EXT-X-PROGRAM-DATE-TIME" value in the moving image m3u8 file therefore also gives the start time of the corresponding subtitle file.
Alternatively, the subtitle m3u8 file may include the "#EXT-X-PROGRAM-DATE-TIME" attribute.

This concludes the description of the segment on lines 5 and 6.
The rest of the subtitle m3u8 file describes further segments in order. Lines 7 and 8 describe the second segment in the file, lines 9 and 10 the third, and lines 11 and 12 the fourth.
In the example shown in the figure, therefore, the subtitle files form a sequence of 5-second segments, each corresponding to a moving image file.
The subtitle m3u8 file in this example has 12 lines. Like the moving image m3u8 file, it may also describe further segments that follow.
This example gives the location of each subtitle file as a URL. Any notation that identifies the location may be used instead, for example a relative path.
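Because the subtitle segments mirror the video segments one to one, the subtitle m3u8 file can be built from the same durations plus one ".vtt" URI per segment. This minimal sketch has two deliberate choices:
- it takes the target duration as a parameter so that it can match the video playlist;
- it writes "#EXTINF:<duration>," with the trailing comma that the RFC 8216 syntax includes, whereas the figure shows "#EXTINF:5.0".

The function name and the file names `seg0.vtt` and `seg1.vtt` are hypothetical.

```python
from typing import Sequence, Tuple


def build_subtitle_playlist(segments: Sequence[Tuple[float, str]],
                            target_duration: int,
                            media_sequence: int = 0,
                            version: int = 3) -> str:
    """Build a subtitle m3u8 that mirrors the video playlist's segmentation.

    segments: (duration_s, vtt_uri) pairs in the same order as the video
    segments, so the Nth subtitle entry lines up with the Nth video entry.
    No #EXT-X-PROGRAM-DATE-TIME is written; the video playlist supplies it.
    """
    lines = [
        "#EXTM3U",
        f"#EXT-X-VERSION:{version}",
        f"#EXT-X-TARGETDURATION:{target_duration}",
        f"#EXT-X-MEDIA-SEQUENCE:{media_sequence}",
    ]
    for duration, uri in segments:
        lines.append(f"#EXTINF:{duration},")
        lines.append(uri)
    return "\n".join(lines) + "\n"


print(build_subtitle_playlist([(5.0, "seg0.vtt"), (5.0, "seg1.vtt")], target_duration=5))
```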

FIG. 8 is a schematic diagram showing an example structure of the master of the playlist file (the master.m3u8 file) generated by the data generation unit 13. The data generation unit 13 reads the master shown in FIG. 4 and adds information about subtitles to it, producing the master shown in FIG. 8. This master is part of the playlist file 24 written to the storage unit 20. Line numbers are added for convenience. The underlined portions in the figure are not in the original master; the data generation unit 13 added them. The master is described in detail below.

The first line, "#EXTM3U", is the header of the file.
The entire second line was added by the data generation unit 13 and describes the subtitle data. It contains the following attributes:
- "#EXT-X-MEDIA" indicates a media definition.
- "TYPE=SUBTITLES" indicates that the media type is subtitles.
- "GROUP-ID="xxx"" indicates that the group ID of the media is "xxx".
- "NAME="Japanese"" indicates that the name of the media is "Japanese".
- "DEFAULT=YES", "AUTOSELECT=YES", and "FORCED=NO" each set a property of the media.
- "LANGUAGE="ja"" indicates that the language is Japanese.
- "URI="sub/jpn/sub.m3u8"" gives the URI (Uniform Resource Identifier) of the subtitle m3u8 file (playlist file). This URI links the subtitle playlist to the master.

The third and fifth lines carry streaming information. The keyword parameters "PROGRAM-ID" and "BANDWIDTH" are the same as on the second and fourth lines of FIG. 4. "SUBTITLES="xxx"" was added by the data generation unit 13 to refer to the subtitles.
The fourth and sixth lines give the names of the moving image m3u8 files, as on the third and fifth lines of FIG. 4.
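The master rewrite described above can be sketched as two text operations. First, insert an "#EXT-X-MEDIA" subtitle definition as the second line. Second, append `SUBTITLES="<group>"` to every streaming line. This sketch rests on one assumption: the streaming lines of FIG. 4 are standard HLS "#EXT-X-STREAM-INF" tags, the HLS tag that carries PROGRAM-ID and BANDWIDTH. The bandwidth values and variant playlist names in the sample master are hypothetical.

```python
MASTER = """#EXTM3U
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=2000000
video_high.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=800000
video_low.m3u8
"""


def add_subtitles_to_master(master: str, group_id: str, uri: str,
                            name: str = "Japanese", language: str = "ja") -> str:
    """Insert an EXT-X-MEDIA subtitle definition and link every variant to it."""
    lines = master.splitlines()
    if not lines or lines[0].strip() != "#EXTM3U":
        raise ValueError("master playlist must start with #EXTM3U")
    media = (f'#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="{group_id}",NAME="{name}",'
             f'DEFAULT=YES,AUTOSELECT=YES,FORCED=NO,LANGUAGE="{language}",URI="{uri}"')
    out = [lines[0], media]  # the new definition becomes line 2, as in FIG. 8
    for line in lines[1:]:
        if line.startswith("#EXT-X-STREAM-INF:") and "SUBTITLES=" not in line:
            line += f',SUBTITLES="{group_id}"'
        out.append(line)
    return "\n".join(out) + "\n"


print(add_subtitles_to_master(MASTER, group_id="xxx", uri="sub/jpn/sub.m3u8"))
```

The result has six lines, laid out like FIG. 8: header, subtitle definition, then each streaming line followed by its video m3u8 name.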

As shown in FIG. 8, the data generation unit 13 also adds a description of the subtitle data it generates to the master (master.m3u8) before outputting it. Distributing such a master lets the receiving client device 71 locate the subtitle file associated with the moving image file of each segment.

Next, subtitle presentation on the receiving client device is described.
FIG. 9 is a block diagram showing the schematic functional configuration of the client device. As illustrated, the client device 71 includes a communication unit 79, a storage unit 80, a decoding unit 81, a subtitle processing unit 82, a presentation control unit 83, and an output unit 84. The client device 71 is, for example, a personal computer (PC), a smartphone, or a wearable terminal. A wearable terminal may be, for example, a wristwatch-type or eyeglasses-type terminal, but is not limited to these forms.

The communication unit 79 communicates with external devices, for example with the content distribution server device 61 over the Internet. Through this communication, it receives the content data distributed by the content distribution server device 61. Specifically, it receives the moving image file 21, the subtitle file 23, and the playlist file 24 and writes them to the storage unit 80.
The storage unit 80 stores data. Specifically, it stores the moving image file 21, the subtitle file 23, and the playlist file 24, at least temporarily. The storage unit 80 contains a storage medium such as a magnetic hard disk drive or a solid-state drive.

Under the control of the presentation control unit 83, the decoding unit 81 reads the moving image file 21 from the storage unit 80 and decodes it. It passes the resulting video and audio to the presentation control unit 83. The decoding unit 81 also reads the video presentation time information from the moving image file and passes it to the presentation control unit 83.
Also under the control of the presentation control unit 83, the subtitle processing unit 82 reads the subtitle file 23 from the storage unit 80 and obtains the subtitle text and related data from it. The subtitle text may contain external characters (gaiji, characters outside the standard character set). In that case, the subtitle processing unit 82 also reads from the subtitle file 23 the URL of the web font for those characters. It accesses that URL through the communication unit 79, obtains the font data, and outputs the font data associated with the corresponding characters. The subtitle processing unit 82 passes the subtitle text and related data to the presentation control unit 83. It also extracts the presentation time information for the subtitle text from the subtitle file 23 and passes it to the presentation control unit 83.

The presentation control unit 83 controls the presentation of moving image content. Specifically, while referring to the playlist file 24 stored in the storage unit 80, the presentation control unit 83 controls the decoding unit 81 so that it decodes the moving image file 21 segment by segment at predetermined timings. Likewise referring to the playlist file 24, the presentation control unit 83 controls the subtitle processing unit 82 so that it reads and processes the subtitle file 23 for each segment. The presentation control unit 83 then passes the video and the subtitle text to the output unit 84 for display on the screen, synchronized at the appropriate presentation timings. Specifically, the presentation control unit 83 synchronizes the presentation of the video and the subtitle text on the basis of the video presentation time information and the subtitle presentation time information, and displays the subtitle text in a subtitle display area that does not overlap the video display area (the area in which the video is displayed). The presentation control unit 83 may further start displaying a subtitle text at its subtitle presentation start time, leave that text on screen even after its subtitle presentation end time arrives, and display subsequent subtitle texts at other positions within the subtitle display area. The configuration and layout of the screen in this case are described later. In addition, the presentation control unit 83 passes the audio obtained from the moving image file 21 to the output unit 84 so that it is output, in synchronization with the video and other content, from an audio output means (a speaker, an earphone jack, or the like).
The output unit 84 displays the image (video) passed from the presentation control unit 83 on a screen or the like. The output unit 84 also outputs the audio passed from the presentation control unit 83 from the audio output means.
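The difference between conventional subtitle presentation and the variant above, in which a subtitle text stays on screen after its presentation end time, can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes each subtitle text has already been parsed into a hypothetical `Cue` record whose start and end times (in seconds) share the video's clock.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cue:
    """One subtitle text with its presentation start/end times (seconds)."""
    start: float
    end: float
    text: str


def active_cues(cues: List[Cue], t: float) -> List[str]:
    """Conventional mode: show only cues whose [start, end) interval contains t."""
    return [c.text for c in sorted(cues, key=lambda c: c.start) if c.start <= t < c.end]


def history_cues(cues: List[Cue], t: float) -> List[str]:
    """History mode: a cue appears once its start time is reached and is not
    removed when its end time passes; later cues follow earlier ones."""
    return [c.text for c in sorted(cues, key=lambda c: c.start) if c.start <= t]


cues = [Cue(0.0, 2.0, "aiueo"), Cue(2.5, 4.0, "kakikukeko"),
        Cue(5.0, 6.0, "Watch again tomorrow!")]
print(active_cues(cues, 3.0))   # ['kakikukeko']
print(history_cues(cues, 3.0))  # ['aiueo', 'kakikukeko']
```

In history mode the end time is simply ignored for visibility, so the list only grows as playback advances.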

FIG. 10 is a schematic diagram showing an example configuration of the content presentation screen on the client device side. As illustrated, a display unit 101 is provided on the display-surface side of the client device 71. A liquid crystal display device, an organic EL display device, or the like can be used as the display unit 101 ("EL" stands for electroluminescence). In the illustrated example, the display unit 101 is divided into a plurality of areas, and video and subtitle text are displayed in separate areas: the display unit 101 has a video display area 102 and a subtitle display area 103. The display unit 101 consists of a large number of pixels, and a display control means (not shown) divides the pixel region as appropriate so that each of these areas can be controlled individually. In the figure, the video display area 102 shows one frame of the video obtained by decoding the moving image file 21, and the subtitle display area 103 shows the history of the subtitle texts presented up to the current time (the time at which that frame is presented).
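One way a display control means might divide the pixel region into non-overlapping video and subtitle areas is sketched below. The top/bottom arrangement, the 16:9 aspect ratio, and the names `Rect` and `split_display` are assumptions for illustration; FIG. 10 does not fix the exact geometry.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int

    def overlaps(self, other: "Rect") -> bool:
        # Axis-aligned rectangles overlap only if they intersect on both axes.
        return (self.x < other.x + other.w and other.x < self.x + self.w
                and self.y < other.y + other.h and other.y < self.y + self.h)


def split_display(width: int, height: int,
                  aspect: Tuple[int, int] = (16, 9)) -> Tuple[Rect, Rect]:
    """Place the video area at the top with the given aspect ratio and give the
    remaining rows of pixels to the subtitle area below it (hypothetical layout)."""
    aw, ah = aspect
    video_h = min(height, width * ah // aw)
    video = Rect(0, 0, width, video_h)
    subtitles = Rect(0, video_h, width, height - video_h)
    return video, subtitles


video, subs = split_display(1080, 1920)  # e.g. a portrait smartphone screen
print(video, subs)
```

Because the subtitle area starts on the first row below the video, the two regions share no pixels by construction.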

That is, in the display method shown in FIG. 10, the presentation control unit 83 does not superimpose the subtitle text on the video picture (video display area 102) but displays it in an area dedicated to subtitles (subtitle display area 103). Even when the presentation end time of a subtitle text arrives, the presentation control unit 83 does not erase that text from the subtitle display area 103 but keeps displaying it as part of the subtitle history, and successively adds each new subtitle text whose presentation start time has arrived. In this example, each new subtitle text is added below the subtitle texts already displayed. This allows the user to read subtitles going back in time from the scene currently being played. The presentation control unit 83 may also let the user scroll the subtitle display area 103 up and down, so that the user can refer to the subtitle history over a wider span of time.
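A minimal sketch of the scrollable history follows. It assumes, hypothetically, that each subtitle text occupies exactly one line of fixed pixel height and that the scroll offset is measured in pixels from the top of the full history; the function names are illustrative only.

```python
from typing import List


def visible_history(history: List[str], line_height: int, area_height: int,
                    scroll_offset: int) -> List[str]:
    """Return the history entries visible in the subtitle display area.

    scroll_offset is in pixels from the top of the full history list, and each
    entry is assumed to occupy exactly one line of line_height pixels.
    """
    max_offset = max(0, len(history) * line_height - area_height)
    offset = min(max(0, scroll_offset), max_offset)  # clamp to the scrollable range
    first = offset // line_height
    last = -(-(offset + area_height) // line_height)  # ceiling division
    return history[first:last]


def follow_latest_offset(history: List[str], line_height: int, area_height: int) -> int:
    """Scroll offset that keeps the newest (bottom) entry in view."""
    return max(0, len(history) * line_height - area_height)


history = [f"subtitle {i}" for i in range(10)]
offset = follow_latest_offset(history, 40, 120)
print(visible_history(history, 40, 120, offset))  # ['subtitle 7', 'subtitle 8', 'subtitle 9']
```

While the user is not scrolling, a client could recompute `follow_latest_offset` after each append so that the newest text, added at the bottom, stays visible.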

FIG. 11 is a schematic diagram showing an example in which the client device displays subtitles in a mode different from that of FIG. 10. Here too, subtitle text is displayed in the subtitle display area 103. In this example, however, the presentation control unit 83 displays each subtitle in association with an icon corresponding to its speaker; icons for two speakers are shown. The presentation control unit 83 also displays each subtitle text inside a speech-balloon graphic and arranges the subtitle texts in chronological order along the vertical direction of the screen: texts toward the top have earlier presentation times, and texts toward the bottom have later ones. In the display example of the figure, a scroll bar 111 is also displayed so that the user can move the view up and down within the subtitle display area 103. The subtitle texts shown in the figure are as follows. First, the subtitle text "aiueo" (あいうえお) is displayed in association with the icon of the first speaker (left). Next, the subtitle text "kakikukeko" (かきくけこ) is displayed in association with the icon of the second speaker (right). Next, the subtitle text "Watch again tomorrow!" (明日も見てね) is displayed in association with the icon of the first speaker (left). Finally, the same subtitle text "Watch again tomorrow!" (明日も見てね) is displayed in association with the icon of the second speaker (right).

To produce the display of FIG. 11, the presentation control unit 83 uses metadata sent from the content distribution server device 61. For example, this metadata accompanies each subtitle text and includes speaker identification information that identifies its speaker. In association with the speaker identification information, the metadata also includes icon image data or the URL from which the icon image can be obtained. The metadata is stored in the subtitle file 23 and sent from the content distribution server device 61 to the client device 71. The metadata is originally contained in the broadcast signal, and the data generation unit 13 of the subtitle data generation device 1 generates the subtitle file 23 so that the metadata is carried over.
Instead of the speaker identification information described above, the display position within the subtitle display area 103 may be used as the metadata.

That is, to realize the display method shown in FIG. 11, the presentation control unit 83 obtains, from the information contained in the subtitle file, speaker identification information (a speaker ID, the speaker's icon image, the location of the speaker's icon image, or the like) that identifies the speaker of each subtitle text, and displays the subtitle text in association with that speaker identification information.
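The association between subtitle texts and speakers can be sketched as below. The field names, the left/right side mapping, and the example icon URLs are hypothetical; the text only states that the metadata carries speaker identification information (or, alternatively, a display position) together with icon data or an icon URL.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SpeakerCue:
    start: float
    text: str
    speaker_id: Optional[str] = None  # speaker identification information
    position: Optional[str] = None    # alternative metadata: display position ("left"/"right")


def layout_balloons(cues: List[SpeakerCue], icon_urls: Dict[str, str],
                    speaker_side: Dict[str, str]) -> List[dict]:
    """Build render items, oldest first (top of the area), pairing each text
    with a speaker icon and the side of the area its balloon is drawn on."""
    items = []
    for cue in sorted(cues, key=lambda c: c.start):
        if cue.speaker_id is not None:
            side = speaker_side.get(cue.speaker_id, "left")
            icon = icon_urls.get(cue.speaker_id)
        else:
            # No speaker ID: fall back to the display position carried in the metadata.
            side = cue.position or "left"
            icon = None
        items.append({"text": cue.text, "side": side, "icon": icon})
    return items


icons = {"A": "https://example.com/icons/a.png", "B": "https://example.com/icons/b.png"}
sides = {"A": "left", "B": "right"}
cues = [SpeakerCue(0.0, "aiueo", "A"), SpeakerCue(2.0, "kakikukeko", "B"),
        SpeakerCue(4.0, "Watch again tomorrow!", "A"),
        SpeakerCue(5.0, "Watch again tomorrow!", "B")]
for item in layout_balloons(cues, icons, sides):
    print(item["side"], item["text"])
```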

Next, a method for realizing a rewind operation on the content in the client device 71 will be described.
In the client device 71, for example, when the user (the viewer of the content) designates (selects) any subtitle text displayed on the screen, the video may be rewound to the presentation time of that subtitle text. To let the user designate a subtitle text, the client device may, for example, accept operation of a pointing device such as a mouse, or touches on a touch panel. When the user performs such an operation, the presentation control unit 83 obtains the coordinates of the designated position on the screen and identifies from those coordinates which subtitle text was designated. By referring to the playlist file 24, the presentation control unit 83 then obtains the presentation time of the designated subtitle text and identifies the moving image file of the same segment as that subtitle file. In this way, the presentation control unit 83 rewinds the content and resumes presentation of the moving image file 21 and the subtitle file 23 from the rewound position. In other words, when the presentation control unit 83 receives an operation selecting an already displayed subtitle text, it rewinds to the position corresponding to the presentation time of that subtitle text and resumes presentation of the moving image file from that position. In this case, the presentation control unit 83 may instruct the decoding unit 81 to decode the moving image file corresponding to the rewound position again. Alternatively, decoded video may be accumulated in a temporary storage means, and rewind playback may be realized by reading the video out of that temporary storage means again.
The rewind operation described here can be performed with either the subtitle display of FIG. 10 or that of FIG. 11.
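The rewind steps above (screen coordinates, then the selected subtitle text, then its presentation time, then the segment to resume from) can be sketched as follows. This assumes a vertically stacked list of texts, so only the y coordinate matters, and segment start times already read from the playlist; the names and the 6-second segment length in the usage example are hypothetical.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DisplayedCue:
    top: int      # top edge of the text, in history-list pixels (screen y + scroll offset)
    bottom: int   # first pixel row below the text
    start: float  # presentation start time (seconds)
    text: str


def hit_test(cues: List[DisplayedCue], y: int) -> Optional[DisplayedCue]:
    """Return the displayed subtitle text whose vertical extent contains y."""
    for cue in cues:
        if cue.top <= y < cue.bottom:
            return cue
    return None


def seek_target(segment_starts: List[float], t: float) -> Tuple[int, float]:
    """Map an absolute time to (segment index, offset within that segment).

    segment_starts is the sorted list of segment start times, e.g. from a playlist.
    """
    i = bisect.bisect_right(segment_starts, t) - 1
    if i < 0:
        raise ValueError("time precedes the first segment")
    return i, t - segment_starts[i]


shown = [DisplayedCue(0, 40, 1.0, "aiueo"), DisplayedCue(40, 80, 7.5, "kakikukeko")]
selected = hit_test(shown, 50)
if selected is not None:
    print(seek_target([0.0, 6.0, 12.0, 18.0], selected.start))  # (1, 1.5)
```

The resulting segment index identifies which moving image file to decode again (or to re-read from a decoded-frame buffer), and the offset is where playback resumes within it.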

With the client device 71 of the present embodiment, the user can later read and check subtitles that were displayed in the past, going back in presentation time by scrolling the subtitle display area as needed. Furthermore, because each subtitle text is linked to the corresponding position in the video, the user can rewind to and replay a missed scene.

The functions of the subtitle data generation device, the content distribution server device, the client device, and so on in the above embodiment may be implemented by a computer. In that case, a program for realizing these functions may be recorded on a computer-readable recording medium, and the functions may be realized by having a computer system read and execute the program recorded on that medium. The "computer system" here includes an OS and hardware such as peripheral devices. A "computer-readable recording medium" means a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into a computer system. A "computer-readable recording medium" may further include something that holds a program dynamically for a short time, such as the communication line used when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and something that holds the program for a certain period, such as volatile memory inside the computer system acting as the server or client in that case. The program may realize only some of the functions described above, or may realize those functions in combination with a program already recorded in the computer system.

[Modification]
In the above embodiment, a playlist file is generated and distributed together with the moving image files and subtitle files. As a modification, however, the invention may be implemented without a playlist file. In that case, information identifying the sequence number of each moving image file is stored inside that moving image file, and information identifying the sequence number of each subtitle file is stored inside that subtitle file. By generating such files and distributing them from the content distribution server device, the client device can present the moving image files and the subtitle files each in the correct order. When no playlist file is used, information on presentation timing is also stored inside the moving image files and subtitle files, so the client device can synchronize the moving image files with the subtitle files and present them at the appropriate timing. In this case, the data generation unit in the subtitle data generation device does not generate a playlist file, the output unit does not output one, and the content distribution server device does not distribute one.
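Without a playlist, a client could restore ordering and pairing from the information embedded in each file, roughly as sketched below. The text does not specify how the sequence number and timing are embedded, so `MediaFile` is a hypothetical stand-in for whatever the client extracts from each downloaded file.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class MediaFile:
    sequence: int             # sequence number stored inside the file
    presentation_time: float  # presentation timing stored inside the file (seconds)
    name: str


def pair_in_order(videos: List[MediaFile],
                  subtitles: List[MediaFile]) -> List[Tuple[MediaFile, Optional[MediaFile]]]:
    """Order video files by their embedded sequence numbers and pair each with
    the subtitle file of the same sequence number (None if it has not arrived)."""
    subs_by_seq: Dict[int, MediaFile] = {s.sequence: s for s in subtitles}
    return [(v, subs_by_seq.get(v.sequence))
            for v in sorted(videos, key=lambda f: f.sequence)]


# Files may arrive out of order; the embedded sequence numbers restore it.
videos = [MediaFile(3, 12.0, "v3"), MediaFile(1, 0.0, "v1"), MediaFile(2, 6.0, "v2")]
subs = [MediaFile(2, 6.0, "s2"), MediaFile(1, 0.0, "s1")]
for v, s in pair_in_order(videos, subs):
    print(v.name, s.name if s else None)
```

The embedded `presentation_time` values would then drive synchronization in the same way the playlist timing does in the main embodiment.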

Although an embodiment of the present invention has been described in detail above with reference to the drawings, the specific configuration is not limited to this embodiment and includes designs and the like that do not depart from the gist of the present invention.

INDUSTRIAL APPLICABILITY
The present invention can be used in content distribution businesses, in businesses that manufacture and sell devices for such distribution, and in other businesses.

1 Subtitle data generation device
11 Subtitle extraction unit
12 Subtitle conversion unit
13 Data generation unit
14 Output unit
20 Storage unit
21 Moving image file
22 Playlist file
23 Subtitle file
24 Playlist file
51 Time code inserter
52 Distributor
53 Encoder device
61 Content distribution server device
71 Client device (content display device)
79 Communication unit
80 Storage unit
81 Decoding unit
82 Subtitle processing unit
83 Presentation control unit
84 Output unit
100 Distribution system

Claims

A subtitle data generation device comprising:
a subtitle conversion unit that obtains subtitle text and information on the presentation time of the subtitle text from subtitle data extracted from the vertical blanking area of the HD-SDI or SD-SDI of an externally acquired broadcast signal, and outputs the subtitle text in association with the presentation time;
a storage unit that stores a moving image file encoded on the basis of the broadcast signal;
a data generation unit that generates a subtitle file containing the subtitle text so as to be synchronized with the presentation time of the moving image file stored in the storage unit; and
an output unit that outputs the moving image file read from the storage unit and the subtitle file generated by the data generation unit.

The subtitle data generation device according to claim 1, wherein
the moving image file is a plurality of moving image files divided into segments of a predetermined length of time,
the storage unit further stores a video playlist file, which is playlist data including the presentation time information of each moving image file, for presenting the plurality of moving image files in the appropriate order,
the data generation unit, while referring to the video playlist file, generates a plurality of subtitle files corresponding respectively to the plurality of moving image files, and further generates a subtitle playlist file, which is playlist data for presenting the generated subtitle files in the appropriate order, and
the output unit further outputs the video playlist file and the subtitle playlist file.
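The per-segment subtitle files and subtitle playlist of claim 2 can be illustrated with a sketch that mirrors the video segments. The HLS-style tags (`#EXTM3U`, `#EXT-X-TARGETDURATION`, `#EXTINF`, `#EXT-X-ENDLIST`) are an assumption chosen for concreteness, since the claim names no playlist format, and the segment and cue records are assumed to be parsed already.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    uri: str
    start: float     # segment start time from the video playlist (seconds)
    duration: float


@dataclass(frozen=True)
class Cue:
    start: float
    end: float
    text: str


def split_cues(segments: List[Segment], cues: List[Cue]) -> List[List[Cue]]:
    """For each video segment, collect the cues whose interval overlaps it.
    A cue spanning a boundary is repeated in both segments."""
    out = []
    for seg in segments:
        seg_end = seg.start + seg.duration
        out.append([c for c in cues if c.start < seg_end and c.end > seg.start])
    return out


def subtitle_playlist(segments: List[Segment], sub_uris: List[str],
                      ended: bool = False) -> str:
    """Write an HLS-style media playlist whose entries mirror the video segments."""
    target = max(int(math.ceil(s.duration)) for s in segments)
    lines = ["#EXTM3U", f"#EXT-X-TARGETDURATION:{target}"]
    for seg, uri in zip(segments, sub_uris):
        lines.append(f"#EXTINF:{seg.duration:.3f},")
        lines.append(uri)
    if ended:  # omitted while the live stream is still growing
        lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"


segs = [Segment("v0.ts", 0.0, 6.0), Segment("v1.ts", 6.0, 6.0), Segment("v2.ts", 12.0, 6.0)]
cues = [Cue(1.0, 3.0, "a"), Cue(5.0, 7.0, "b"), Cue(13.0, 14.0, "c")]
print(subtitle_playlist(segs, ["s0.vtt", "s1.vtt", "s2.vtt"]))
```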

The subtitle data generation device according to claim 1 or 2, wherein, when the subtitle text includes an external character, the subtitle conversion unit outputs the subtitle text in a form in which location information of a font corresponding to the external character is associated with the external character.
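As a rough illustration of claim 3, the sketch below models a subtitle text as ordinary text interleaved with external-character codes and attaches a font location to each external character. The token model, the code value, and the font URL are hypothetical; the claim does not say how the font location is encoded.

```python
from typing import Dict, List, Optional, Union

# Ordinary text is a str; an external character (one with no standard
# character-set equivalent) is represented by a hypothetical integer code.
Token = Union[str, int]


def attach_font_locations(tokens: List[Token], font_urls: Dict[int, str]) -> List[dict]:
    """Convert tokens to output runs; each external character carries the
    location of a font that can render it (None if unknown)."""
    runs: List[dict] = []
    for tok in tokens:
        if isinstance(tok, int):
            url: Optional[str] = font_urls.get(tok)
            runs.append({"gaiji": tok, "font_url": url})
        elif runs and "text" in runs[-1]:
            runs[-1]["text"] += tok  # merge adjacent ordinary text
        else:
            runs.append({"text": tok})
    return runs


fonts = {0x7A21: "https://example.com/fonts/gaiji.woff"}
print(attach_font_locations(["Hello ", 0x7A21, "!"], fonts))
```

A client could then load the referenced font and render the external character with it, instead of showing a replacement glyph.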

A program for causing a computer to function as:
a subtitle extraction unit that extracts subtitle data from the vertical blanking area of the HD-SDI or SD-SDI of an externally acquired broadcast signal;
a subtitle conversion unit that obtains subtitle text and information on the presentation time of the subtitle text from the subtitle data, and outputs the subtitle text in association with the presentation time;
a storage unit that stores a moving image file encoded on the basis of the broadcast signal;
a data generation unit that generates a subtitle file containing the subtitle text so as to be synchronized with the presentation time of the moving image file stored in the storage unit; and
an output unit that outputs the moving image file read from the storage unit and the subtitle file generated by the data generation unit.