JP6948934B2

JP6948934B2 - Content processing systems, terminals, and programs

Info

Publication number: JP6948934B2
Application number: JP2017243198A
Authority: JP
Inventors: 成暁加藤; 宗遠藤; 馬場　秋継; 秋継馬場; 石川　清彦; 清彦石川; 雅晴高野; 隅倉　正隆; 正隆隅倉; 剛太岩浪; 忠義小山
Original assignee: BITMEDIA INC.; Infocity KK; Japan Broadcasting Corp
Current assignee: BITMEDIA INC.; Infocity KK; Japan Broadcasting Corp
Priority date: 2017-12-19
Filing date: 2017-12-19
Publication date: 2021-10-13
Anticipated expiration: 2037-12-19
Also published as: JP2019110480A

Description

The present invention relates to a content processing system, a terminal device, and a program.

Technology for distributing video and audio of live performances (various events, stage shows, music, and the like) over communication lines such as the Internet has become widespread. In the prior art, devices such as editing equipment, switchers, and mixers are used to edit and process live video and audio. Moreover, in the prior art, editing and processing are performed centrally at a stage preceding the distribution process.

For example, Non-Patent Document 1 describes the system configuration used when a broadcaster actually distributed video of all competitions and all events of a large-scale sporting event by live streaming over the Internet. According to this document, SD-quality IPVandA video resources were transmitted over an international line from the center in the host city (Brazil) to a broadcasting center in Tokyo. The SD-quality video has a bit rate of about 2.5 Mbps. At the broadcasting center, the IPVandA video was re-encoded into lower-bit-rate video and distributed over the Internet. In addition, for the video of some competitions, an audio booth for audio processing was built inside the broadcasting center, and commentary and play-by-play narration unique to the Internet distribution were added before distribution.

島西顕司，遠藤宗，小久保幸紀，折下伸也，坂井駿一，前田彩、「リオデジャネイロオリンピックデジタルコンテンツ制作について」、放送技術、２０１６年１１月、ｐ．１０４−１０６． (Kenji Shimanishi, Mune Endo, Yuki Kokubo, Shinya Orishita, Shunichi Sakai, Aya Maeda, "Rio de Janeiro Olympic Digital Content Production", Broadcasting Technology, November 2016, pp. 104-106.)

However, if content could be added to, or substituted into, content that has already been distributed live, using a configuration that can be realized at lower cost, it would become possible to generate a wide variety of content.

The present invention has been made in recognition of the above problem, and aims to provide a content processing system, a terminal device, and a program with which distributed content can be processed easily using an inexpensive equipment configuration.

[1] To solve the above problem, a content processing system according to one aspect of the present invention comprises: a manifest acquisition unit that acquires an original manifest file included in original content streamed using the Hypertext Transfer Protocol; a segment acquisition unit that acquires an original segment file included in the original content; a decoder unit that decodes and outputs the original segment file; an interface unit that acquires additional content to be newly added in association with the decoded original segment file; an encoder unit that encodes the additional content; a segmentation unit that generates an additional segment file by segmenting the encoded additional content so as to be synchronized with the time of the original segment file; a manifest generation unit that generates, based on the original manifest file, a processed manifest file such that the original segment file and the additional segment file are synchronized; and a redistribution unit that distributes the original segment file, the additional segment file, and the processed manifest file as processed content using the Hypertext Transfer Protocol.

[2] Further, in one aspect of the present invention, in the above content processing system, the manifest generation unit generates the processed manifest file for playing back addition-type processed content that includes all of the acquired original segment files and also includes the additional segment file.

[3] Further, in one aspect of the present invention, in the above content processing system, the manifest generation unit generates the processed manifest file for playing back replacement-type processed content that includes only some of the acquired original segment files and also includes the additional segment file.

[4] Further, one aspect of the present invention is a content processing system including a server device and a terminal device, wherein the terminal device comprises: a manifest acquisition unit that acquires an original manifest file included in original content streamed using the Hypertext Transfer Protocol; a segment acquisition unit that acquires an original segment file included in the original content; a decoder unit that decodes and outputs the original segment file; an interface unit that acquires additional content to be newly added in association with the decoded original segment file; an encoder unit that encodes the additional content; and a segmentation unit that generates an additional segment file by segmenting the encoded additional content so as to be synchronized with the time of the original segment file, and wherein the server device comprises: a manifest generation unit that generates, based on the original manifest file, a processed manifest file such that the original segment file and the additional segment file are synchronized; and a redistribution unit that distributes the original segment file, the additional segment file, and the processed manifest file as processed content using the Hypertext Transfer Protocol.

[5] Further, in one aspect of the present invention, in the above content processing system, the original segment file stores data obtained by encoding at least one of video and audio, and the system further comprises a content generation unit that automatically generates the additional content based on the original segment file by analyzing the video or audio output by the decoder unit.

[6] Further, in one aspect of the present invention, in the above content processing system, the content generation unit generates the additional content including text data obtained by performing recognition processing on the video or audio output by the decoder unit.

[7] Further, in one aspect of the present invention, in the above content processing system, the manifest generation unit has a function of generating, based on an instruction from outside, a processed manifest file for playing back only the original segment file, and the redistribution unit has a function of distributing, based on the instruction from outside, only the original segment file and the processed manifest file.

[8] Further, one aspect of the present invention is a terminal device comprising: a manifest acquisition unit that acquires an original manifest file included in original content streamed using the Hypertext Transfer Protocol; a segment acquisition unit that acquires an original segment file included in the original content; a decoder unit that decodes and outputs the original segment file; an interface unit that acquires additional content to be newly added in association with the decoded original segment file; an encoder unit that encodes the additional content; and a segmentation unit that generates an additional segment file by segmenting the encoded additional content so as to be synchronized with the time of the original segment file.

[9] Further, one aspect of the present invention is a program for causing a computer to function as a content processing system comprising: a manifest acquisition unit that acquires an original manifest file included in original content streamed using the Hypertext Transfer Protocol; a segment acquisition unit that acquires an original segment file included in the original content; a decoder unit that decodes and outputs the original segment file; an interface unit that acquires additional content to be newly added in association with the decoded original segment file; an encoder unit that encodes the additional content; a segmentation unit that generates an additional segment file by segmenting the encoded additional content so as to be synchronized with the time of the original segment file; a manifest generation unit that generates, based on the original manifest file, a processed manifest file such that the original segment file and the additional segment file are synchronized; and a redistribution unit that distributes the original segment file, the additional segment file, and the processed manifest file as processed content using the Hypertext Transfer Protocol.

[10] Further, one aspect of the present invention is a program for causing a computer to function as a terminal device comprising: a manifest acquisition unit that acquires an original manifest file included in original content streamed using the Hypertext Transfer Protocol; a segment acquisition unit that acquires an original segment file included in the original content; a decoder unit that decodes and outputs the original segment file; an interface unit that acquires additional content to be newly added in association with the decoded original segment file; an encoder unit that encodes the additional content; and a segmentation unit that generates an additional segment file by segmenting the encoded additional content so as to be synchronized with the time of the original segment file.

According to the embodiment, only the content to be added needs to be encoded and distributed, while the original content can be distributed as the original segment files without modification. This makes it possible to process and redistribute streamed content with a small-scale device configuration.

A block diagram showing the schematic functional configuration of a distribution system that includes the content processing system 1 according to an embodiment of the present invention.
A block diagram showing the more detailed functional configuration of the terminal device 3 in the embodiment.
A schematic diagram showing the architecture of the processing performed by the content processing system 1 according to the embodiment.
A schematic diagram showing the structure of the segment files when the content processing system 1 according to the embodiment performs processing that adds content.
A schematic diagram showing the structure of the segment files when the content processing system 1 according to the embodiment performs processing that replaces content.
A schematic diagram showing an example of the original manifest file that is distributed from the web server device 7 and received by the content processing system 1 in the embodiment.
A schematic diagram showing an example of the manifest file generated by the manifest generation unit 22 of the server device 2 according to the embodiment.
A schematic diagram showing a configuration example of the processed content management information that the content processing system 1 according to the embodiment uses for management.

Next, an embodiment of the present invention will be described with reference to the drawings.
FIG. 1 is a block diagram showing the schematic functional configuration of a distribution system according to the present embodiment. In this figure, reference numeral 0 denotes the distribution system. As illustrated, the distribution system 0 includes a content processing system 1, an encoder device 6, a web server device 7, and a receiving terminal 8. The web server device 7 and the content processing system 1 are connected via the Internet 100 and can communicate with each other. Likewise, the content processing system 1 and the receiving terminal 8 are connected via the Internet 100 and can communicate with each other. The content processing system 1, the encoder device 6, the web server device 7, and the receiving terminal 8 may each be a dedicated device or may be realized using a computer.

The distribution system 0 is for distributing content (video, audio, text, and the like) from the web server device 7 side to the receiving terminal 8.
The content processing system 1 receives content distributed from the web server device 7, processes it, and redistributes the processed content. Here, processing of content means, for example, adding content (video, audio, text, and the like) or replacing part of the content.
The encoder device 6 is a device that encodes content such as video and audio.
The web server device 7 distributes the content encoded by the encoder device 6 in an HTTP live streaming format. For HTTP live streaming, technologies such as HLS (HTTP Live Streaming) and MPEG-DASH (Dynamic Adaptive Streaming over HTTP) can be used. HLS and MPEG-DASH themselves are existing technologies. Note that "HTTP" means the Hypertext Transfer Protocol.
The receiving terminal 8 receives and plays back the content distributed from the content processing system 1.

As shown in FIG. 1, the content processing system 1 includes a server device 2 and a terminal device 3. The server device 2 and the terminal device 3 operate in cooperation with each other to process, as appropriate, the content distributed from the web server device 7 side. The server device 2 may be composed of a plurality of computers, and may be a so-called cloud server. Although the figure shows a single terminal device 3, the content processing system 1 may include a plurality of terminal devices 3.

The server device 2 includes a manifest acquisition unit 21, a manifest generation unit 22, a segment acquisition unit 25, a segment selection unit 26, and a web server unit 28.
The terminal device 3 includes a decoder unit 41, a playback unit 42, an encoder unit 48, and a segmentation unit 49. A more detailed configuration of the terminal device 3 will be described later with reference to FIG. 2.
Each functional unit listed here is realized using, for example, an electronic circuit. Each functional unit may internally include storage means such as a semiconductor memory or a magnetic hard disk device, as necessary. Each function may also be realized by a computer and software.

In the configuration shown in FIG. 1, the terminal device 3 receives the content distributed from the web server device 7 side directly from the Internet 100; however, the terminal device 3 may instead receive the content from the server device 2.

Here, the structure of the content distributed by the content processing system 1 in the present embodiment will be described.
Content consists of one or more materials.
A material is video, audio, text, or something else. Text as a material includes, for example, text displayed on the screen of the receiving device, such as subtitle text, as well as text processed by a program on the receiving side. The text is, for example, plain text or data in XML format.
Video and audio are encoded as appropriate.
Materials such as video, audio, and text are segmented as appropriate. A segment is a piece of a content material cut to a predetermined time length. The length of a segment is typically on the order of a few seconds. Content is distributed as per-segment files, and is stored and managed as needed. The file for one segment may be called a segment file. Each segment of content is associated with a start time (presentation start time) and a time length. The start time is expressed as an absolute time or a relative time. Instead of a start time, each segment may be associated with some other timing information. This timing information is, for example, clock reference information (also called a "time stamp") in the system that sends out the content. Such timing information does not have to match exactly the time of distribution or playback. It does, however, allow the relative time relationship between consecutive segments to be determined. In the following, the term "start time" of a segment also covers the case where this kind of timing information is used.
A segment is also called a fragment, a chunk, a piece, or the like.
When content consists of a plurality of materials, those materials are synchronized by the start time information associated with each segment.
Because content consists of one or more materials, it may include a plurality of materials at a single point in time. For example, content may include a video material and an audio material at a given point in time. Content may also include a plurality of audio materials or a plurality of video materials at a given point in time. In general, content may include any number of materials at a given point in time.
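The content structure described above can be illustrated with a minimal sketch. The `Segment` class, its field names, the `materials_at` helper, and the sample file names are hypothetical and are not taken from the embodiment; the sketch only assumes that each segment carries a material type, a start time, a time length, and a file location.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One segment file of one material (hypothetical model, for illustration)."""
    material: str    # "video", "audio", "text", ...
    start: float     # start time in seconds (absolute or relative)
    duration: float  # time length in seconds
    uri: str         # location of the segment file

    @property
    def end(self) -> float:
        return self.start + self.duration


def materials_at(segments, t):
    """Return the segments of every material that are presented at time t."""
    return [s for s in segments if s.start <= t < s.end]


# Content with two materials (video and audio), cut into 6-second segments.
segments = [
    Segment("video", 0.0, 6.0, "video_000.ts"),
    Segment("audio", 0.0, 6.0, "audio_000.aac"),
    Segment("video", 6.0, 6.0, "video_001.ts"),
    Segment("audio", 6.0, 6.0, "audio_001.aac"),
]

# Both materials share start times, so they stay synchronized at t = 7.5 s.
current = materials_at(segments, 7.5)
```

Because the materials are matched only through their start times, a further material (for example a second audio track) could be added to the list without touching the existing entries.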

The manifest is the data that indexes the plurality of materials and the plurality of segments in one piece of content. The manifest data is held in a manifest file. The manifest data manages, in association with one another, the start time of each segment, the type of material, and the location information of the segment file that holds the content data of that segment. The location information of a file is a file name, a URI (Uniform Resource Identifier), or similar information. In other words, the manifest data indicates from when (start time), for how many seconds (time length), which type of material (video, audio, or other) should be read from which file and presented. That is, the manifest file contains information about the playback procedure of the distributed content. Specifically, the manifest file in HLS is an m3u8 file, and the manifest file in MPEG-DASH is an MPD file. By referring to the manifest data, a content playback device (or playback program) reads each segment file from the specified location and presents it at the appropriate timing. Here, presenting means displaying video on a display device, outputting audio from a speaker or the like, and so on.
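As a rough illustration of how a playback program might read such a manifest, the following sketch extracts the time length and location of each segment from a simplified HLS media playlist. The function name `parse_media_playlist` and the sample playlist are hypothetical; `#EXTINF` is the standard HLS tag carrying a segment's duration. Deriving relative start times by accumulating durations from zero is an assumption of this sketch, not something stated in the embodiment.

```python
def parse_media_playlist(text):
    """Return (start, duration, uri) tuples from a simplified HLS media playlist.

    Start times are relative: they are accumulated from 0.0 using the
    #EXTINF durations (an assumption of this sketch).
    """
    entries = []
    start = 0.0
    duration = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#EXTINF:"):
            # Tag format: "#EXTINF:<duration>,[<title>]"
            duration = float(line[len("#EXTINF:"):].split(",", 1)[0])
        elif line and not line.startswith("#") and duration is not None:
            # A non-comment line after #EXTINF is the segment file location.
            entries.append((start, duration, line))
            start += duration
            duration = None
    return entries


# Hypothetical original manifest with three segments.
ORIGINAL_PLAYLIST = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:6.0,
seg_000.ts
#EXTINF:6.0,
seg_001.ts
#EXTINF:4.5,
seg_002.ts
"""

timeline = parse_media_playlist(ORIGINAL_PLAYLIST)
```

The resulting list answers the question the manifest is meant to answer: from when, for how long, and from which file each segment should be read.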

In other words, the web server device 7 uses HLS or MPEG-DASH to distribute segment files containing the content. The web server device 7 also distributes a manifest file describing, among other things, the playback procedure for the segment files. For convenience, the files sent from the web server device 7 side are called "original". On the terminal device 3 side, as described later, segment files are generated that are synchronized with the time information of the original segment files, namely the start time (also called the "head time") and the time length (also called the "duration"). A generated segment file is, for example, a file that stores content such as video, audio, video plus audio, or text. For convenience, the files generated on the terminal device 3 side are called "additional".
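The synchronization of additional segment files with the original time information can be sketched as follows. This is a simplified illustration under stated assumptions: the encoded additional content is modeled as a list of `(timestamp, payload)` frames, and the function name `segment_like_original` is hypothetical. A real segmenter would also have to respect codec constraints such as key-frame boundaries, which this sketch ignores.

```python
def segment_like_original(frames, original_timing):
    """Group encoded frames into segments aligned with the original segments.

    frames:          list of (timestamp, payload) pairs of the additional content.
    original_timing: list of (start, duration) pairs of the original segments.
    Returns one (start, duration, payloads) tuple per original segment, so each
    additional segment has the same start time and time length as its original.
    """
    result = []
    for start, duration in original_timing:
        end = start + duration
        payloads = [p for t, p in frames if start <= t < end]
        result.append((start, duration, payloads))
    return result


# Hypothetical additional audio frames and the timing of two original segments.
frames = [(0.0, "a0"), (2.0, "a1"), (5.9, "a2"), (6.0, "a3"), (11.0, "a4")]
timing = [(0.0, 6.0), (6.0, 6.0)]

additional_segments = segment_like_original(frames, timing)
```

Cutting at exactly the original boundaries is what lets a later step pair each additional segment with an original segment by start time alone.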

On the server device 2 side, additional segment files are added to the original segment files to form new content. Alternatively, the server device 2 may replace at least some of the original segment files with additional segment files to form new content. The server device 2 may also both add additional segment files and replace at least some of the original segment files with additional segment files. Note that "replacement" is equivalent to adding an additional segment file and deleting at least some of the original segment files (that is, not passing them downstream). One feature of the present embodiment is that, in every one of the above cases, new additional content generated in the content processing system 1 is added to the original content distributed from the web server device 7 side. Specifically, the segment selection unit 26 selects, from among the segments acquired by the segment acquisition unit 25 and the segments passed from the terminal device 3 side, the segments to be redistributed to the receiving terminal 8.

As described above, additional segment files are added in the server device 2. However, it may also be made possible to select a mode in which no additional segment file is added, for example in response to a request from the receiving terminal 8 on the end-user side. That is, it may be made possible to choose not to add or replace segment files. For convenience, this is called "pass-through".
When instructed to pass the content through, the manifest generation unit 22 generates a processed manifest file for playing back only the original segment files. In that case, the web server unit 28 distributes only the original segment files and the processed manifest file.
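The three behaviors described above (addition, replacement, and pass-through) can be sketched as a single selection step. The function name `select_segments`, the mode strings, and the tuple layout `(start, material, uri)` are hypothetical choices made for this illustration; the embodiment does not prescribe a data format for the segment selection unit 26.

```python
def select_segments(original, additional, mode, replaced=frozenset()):
    """Choose the segment files to redistribute.

    original, additional: lists of (start, material, uri) tuples.
    mode:     "add", "replace", or "passthrough".
    replaced: set of (start, material) keys identifying the original segments
              that are withheld from downstream in "replace" mode.
    """
    if mode == "passthrough":
        chosen = list(original)                     # originals only
    elif mode == "add":
        chosen = list(original) + list(additional)  # all originals plus additions
    elif mode == "replace":
        kept = [s for s in original if (s[0], s[1]) not in replaced]
        chosen = kept + list(additional)            # some originals plus additions
    else:
        raise ValueError(f"unknown mode: {mode}")
    return sorted(chosen)                           # order by start time


# Hypothetical example: new commentary audio for the second segment.
original = [
    (0.0, "audio", "o_a0.aac"), (0.0, "video", "o_v0.ts"),
    (6.0, "audio", "o_a1.aac"), (6.0, "video", "o_v1.ts"),
]
additional = [(6.0, "audio", "x_a1.aac")]

replaced_view = select_segments(original, additional, "replace", {(6.0, "audio")})
```

In this sketch replacement is literally "add the additional segment and do not pass the matching original downstream", which mirrors the equivalence stated in the text.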

Further, the manifest generation unit 22 of the server device 2 generates a manifest file that matches the segment files selected by the segment selection unit 26.

As a result, the content processing system 1 can perform processing such as adding or replacing content. The content processing system 1 can also select an operation in which the original content is passed through and redistributed. The processed content can thus be received and viewed on the receiving terminal 8 side. Even when content is processed in the content processing system 1, content such as video and audio can be replaced or added dynamically without the receiving terminal 8 having to be aware of stream switching or the like.

Next, the function of each functional unit of the server device 2 will be described.
The manifest acquisition unit 21 acquires the manifest file transmitted from the web server device 7. That is, the manifest acquisition unit 21 acquires the original manifest file included in the original content streamed using the Hypertext Transfer Protocol.
The manifest generation unit 22 generates a new manifest file based on the manifest file acquired by the manifest acquisition unit 21 and on the segment files selected by the segment selection unit 26, and passes it to the web server unit 28. The manifest file generated by the manifest generation unit 22 corresponds to the processing performed by the content processing system 1. That is, based on the original manifest file, the manifest generation unit 22 generates a processed manifest file (the manifest file of the processed content) such that the original segment files (the segment files of the original content) and the additional segment files (the segment files of the additional content) are synchronized.

マニュフェスト生成部２２は、取得したオリジナルセグメントファイルのすべてを含み、且つ追加セグメントファイルを含んだ追加型加工コンテンツを再生するための加工マニュフェストファイルを生成することができる。これは、コンテンツの追加用である。また、マニュフェスト生成部２２は、取得した前記オリジナルセグメントファイルのうちの一部のみを含み、且つ追加セグメントファイルを含んだ差し替え型加工コンテンツを再生するための加工マニュフェストファイルを生成することができる。これは、コンテンツの差し替え用である。 The manifest generation unit 22 can generate a processed manifest file for playing back additive-type processed content that includes all of the acquired original segment files and also includes additional segment files. This is used for adding content. The manifest generation unit 22 can also generate a processed manifest file for playing back replacement-type processed content that includes only some of the acquired original segment files and also includes additional segment files. This is used for replacing content.
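The additive and replacement cases described above, together with the pass-through case, can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Segment`, `build_processed_series`, the mode strings, and the `dropped` parameter are all hypothetical, and the sketch assumes that "only some of the original segment files" means dropping whole series (for example, the original audio), which is only one possible reading.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Segment:
    """One segment file (hypothetical model): location, start time, time length."""
    uri: str
    start: float
    duration: float


# A "series" maps a material name (e.g. "video") to its ordered segment files.
Series = Dict[str, List[Segment]]


def build_processed_series(
    original: Series,
    additional: Series,
    mode: str,
    dropped: Optional[Set[str]] = None,
) -> Series:
    """Choose which series the processed manifest should reference.

    mode "passthrough": original series only, unchanged.
    mode "additive":    all original series plus the additional series.
    mode "replacement": original series minus `dropped`, plus the additional series.
    """
    if mode == "passthrough":
        return dict(original)
    if mode == "additive":
        kept = dict(original)
    elif mode == "replacement":
        to_drop = dropped or set()
        kept = {name: segs for name, segs in original.items() if name not in to_drop}
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    kept.update(additional)
    return kept
```

For example, with original series "video" and "audio" and an additional series "commentary", the additive mode yields all three series, while the replacement mode with `dropped={"audio"}` yields "video" and "commentary" only.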

セグメント取得部２５は、ウェブサーバー装置７から送信されるセグメントファイルを取得する。
セグメント選択部２６は、セグメント取得部２５が取得したセグメントファイルと、端末装置３から渡されるセグメントファイルとから、配信対象とするセグメントファイルを選択する。セグメント選択部２６は、配信対象として選択したセグメントファイルをウェブサーバー部２８に渡す。 The segment acquisition unit 25 acquires the segment file transmitted from the web server device 7.
The segment selection unit 26 selects a segment file to be distributed from the segment file acquired by the segment acquisition unit 25 and the segment file passed from the terminal device 3. The segment selection unit 26 passes the segment file selected as the distribution target to the web server unit 28.

ウェブサーバー部２８は、セグメント選択部２６から渡されたセグメントファイルと、マニュフェスト生成部２２によって生成されたマニュフェストファイルとを、インターネット１００経由で配信する。つまり、ウェブサーバー部２８は、オリジナルセグメントファイルと、追加セグメントファイルと、加工マニュフェストファイルとを、加工コンテンツとして、ハイパーテキスト転送プロトコルを用いて配信する。ウェブサーバー部２８は、例えば、ＨＬＳやＭＰＥＧ−ＤＡＳＨといった方法を用いて、コンテンツの再配信を行う。ウェブサーバー部２８は、「再配信部」とも呼ばれる。 The web server unit 28 distributes the segment file passed from the segment selection unit 26 and the manifest file generated by the manifest generation unit 22 via the Internet 100. That is, the web server unit 28 distributes the original segment file, the additional segment file, and the processing manifest file as processed contents by using the hypertext transfer protocol. The web server unit 28 redistributes the content by using, for example, a method such as HLS or MPEG-DASH. The web server unit 28 is also referred to as a "redistribution unit".

端末装置３内の内部の各部については、図２を参照しながら説明するため、ここでは説明を省略する。 Since each part inside the terminal device 3 will be described with reference to FIG. 2, description thereof will be omitted here.

図２は、端末装置３のより詳細な機能構成を示すブロック図である。図示するように、端末装置３は、マニュフェスト取得部３１と、マニュフェスト解析部３２と、セグメント取得部３３と、時刻解析部３５と、デコーダー部４１と、再生部４２と、コンテンツ生成部４３と、Ａ／Ｖインターフェース部４４と、ミキサー部４５と、エンコーダー部４８と、セグメント化部４９と、アップロード部５０と、を含んで構成される。 FIG. 2 is a block diagram showing a more detailed functional configuration of the terminal device 3. As shown in the figure, the terminal device 3 includes a manifest acquisition unit 31, a manifest analysis unit 32, a segment acquisition unit 33, a time analysis unit 35, a decoder unit 41, a reproduction unit 42, a content generation unit 43, an A / V interface unit 44, a mixer unit 45, an encoder unit 48, a segmentation unit 49, and an upload unit 50.

マニュフェスト取得部３１は、ウェブサーバー装置７から送信されるマニュフェストファイルを取得する。つまり、マニュフェスト取得部３１は、ハイパーテキスト転送プロトコルを用いたストリーミングのオリジナルコンテンツに含まれるオリジナルマニュフェストファイルを取得する。マニュフェスト取得部３１は、取得したマニュフェストファイルをマニュフェスト解析部３２に渡す。
マニュフェスト解析部３２は、マニュフェスト取得部３１から渡されたマニュフェストファイルを解析する。即ち、マニュフェスト解析部３２は、ウェブサーバー装置７から配信されるコンテンツの構造を解析する。具体的には、マニュフェスト解析部３２は、マニュフェストファイルから、取得すべきセグメントファイルに関して、その所在情報と開始時刻とを抽出する。マニュフェスト解析部３２は、解析結果に基づきセグメントのアクセス情報をセグメント取得部３３に渡す。具体的には、マニュフェスト解析部３２は、取得すべきセグメントファイルの所在情報と開始時刻の情報を、セグメント取得部３３に渡す。さらに、マニュフェスト解析部３２が、セグメントファイルの時間長の情報を抽出してセグメント取得部３３に渡してもよい。また、マニュフェスト解析部３２は、マニュフェストファイルからセグメントファイルの構成の情報と、各セグメントファイルの時刻情報とを抽出する。マニュフェスト解析部３２は、抽出した情報（各セグメントファイルの時刻情報等）をセグメント化部４９に渡す。 The manifest acquisition unit 31 acquires the manifest file transmitted from the web server device 7. That is, the manifest acquisition unit 31 acquires the original manifest file included in the original content of the streaming using the hypertext transfer protocol. The manifest acquisition unit 31 passes the acquired manifest file to the manifest analysis unit 32.
The manifest analysis unit 32 analyzes the manifest file passed from the manifest acquisition unit 31. That is, the manifest analysis unit 32 analyzes the structure of the content distributed from the web server device 7. Specifically, the manifest analysis unit 32 extracts the location information and the start time of the segment file to be acquired from the manifest file. The manifest analysis unit 32 passes the segment access information to the segment acquisition unit 33 based on the analysis result. Specifically, the manifest analysis unit 32 passes the location information and the start time information of the segment file to be acquired to the segment acquisition unit 33. Further, the manifest analysis unit 32 may extract the time length information of the segment file and pass it to the segment acquisition unit 33. In addition, the manifest analysis unit 32 extracts information on the configuration of the segment file and time information on each segment file from the manifest file. The manifest analysis unit 32 passes the extracted information (time information of each segment file, etc.) to the segmentation unit 49.
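As an illustration of the analysis described above, the following Python sketch extracts the location, start time, and time length of each segment from an HLS media playlist. It rests on assumptions not stated in the source: the function name `parse_media_playlist` is hypothetical, the playlist is assumed to use `#EXTINF` entries, and start times are assumed to be the running sum of the preceding segment durations beginning at zero (a real system might instead use timestamps carried in the playlist or in the segments themselves).

```python
from typing import List, Optional, Tuple


def parse_media_playlist(text: str) -> List[Tuple[str, float, float]]:
    """Return (location, start_time, duration) for each segment in the playlist.

    Assumption: start times accumulate from 0.0 using the #EXTINF durations.
    """
    segments: List[Tuple[str, float, float]] = []
    start = 0.0
    pending: Optional[float] = None  # duration announced by the last #EXTINF tag
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#EXTINF:"):
            # Format: #EXTINF:<duration>,[<title>]
            pending = float(line[len("#EXTINF:"):].split(",", 1)[0])
        elif line and not line.startswith("#") and pending is not None:
            # A non-tag line following #EXTINF is the segment location (URI).
            segments.append((line, start, pending))
            start += pending
            pending = None
    return segments
```

The resulting tuples correspond to the location information, start time, and time length that the manifest analysis unit 32 is described as passing to the segment acquisition unit 33.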

セグメント取得部３３は、マニュフェスト解析部３２から、取得すべきセグメントファイルに関する情報を受け取る。そして、セグメント取得部３３は、マニュフェスト解析部３２から受け取った情報に基づいて、ウェブサーバー装置７から送信されるセグメントファイルを取得する。つまり、セグメント取得部３３は、前記オリジナルコンテンツに含まれるオリジナルセグメントファイルを取得する。セグメント取得部３３は、取得したセグメントファイルをデコーダー部４１に渡す。また、セグメント取得部３３は、取得したセグメントファイルの少なくとも時刻に関する情報を、時刻解析部３５に渡す。 The segment acquisition unit 33 receives information regarding the segment file to be acquired from the manifest analysis unit 32. Then, the segment acquisition unit 33 acquires the segment file transmitted from the web server device 7 based on the information received from the manifest analysis unit 32. That is, the segment acquisition unit 33 acquires the original segment file included in the original content. The segment acquisition unit 33 passes the acquired segment file to the decoder unit 41. Further, the segment acquisition unit 33 passes information regarding at least the time of the acquired segment file to the time analysis unit 35.

時刻解析部３５は、セグメント取得部３３から、セグメントファイル、またはセグメントファイルの時刻に関する情報を受け取る。そして、時刻解析部３５は、セグメントファイルごとに時刻情報の解析を行う。時刻解析部３５は、セグメントファイルごとに、少なくとも開始時刻および時間長の情報を出力する。つまり、時刻解析部３５は、各セグメントの開始時刻および時間長の情報をセグメント化部４９に渡す。 The time analysis unit 35 receives, from the segment acquisition unit 33, either the segment file or information about the time of the segment file. The time analysis unit 35 then analyzes the time information for each segment file and outputs at least the start time and time length for each segment file. That is, the time analysis unit 35 passes the start time and time length of each segment to the segmentation unit 49.

デコーダー部４１は、セグメント取得部３３が取得したセグメントファイルをデコードする。つまり、デコーダー部４１は、オリジナルセグメントファイルをデコードし、出力する。具体的には、デコーダー部４１は、セグメントファイル内に格納されている映像や音声のデータをデコードする。また、デコーダー部４１は、セグメントファイル内に格納されている他のデータ（テキストデータ等）を抽出する。デコーダー部４１は、デコードした結果のデータを再生部４２、コンテンツ生成部４３、およびＡ／Ｖインターフェース部４４に渡す。 The decoder unit 41 decodes the segment file acquired by the segment acquisition unit 33. That is, the decoder unit 41 decodes the original segment file and outputs it. Specifically, the decoder unit 41 decodes the video and audio data stored in the segment file. In addition, the decoder unit 41 extracts other data (text data, etc.) stored in the segment file. The decoder unit 41 passes the decoded data to the reproduction unit 42, the content generation unit 43, and the A / V interface unit 44.

再生部４２は、デコーダー部４１においてデコードされた映像や音声を、指定された時刻情報に基づいて再生する。再生部４２は、映像をディスプレイ装置に表示し、音声をスピーカー等から出力する。また、再生部４２が、デコーダー部４１から渡された映像や音声以外のデータを、定められた方法で適切に処理するようにしてもよい。一例として、再生部４２は、デコーダー部４１から渡されるテキストデータを、指定されたタイミングで、且つ指定された方法で、画面に表示する。このテキストデータは、例えば、タイムドテキスト（timed text）であり、より具体的には、スーパーインポーズや字幕のデータである。また、再生部４２が、デコーダー部４１から渡される画像のデータを画面に表示するようにしてもよい。また、再生部４２が、デコーダー部４１から渡されるテキストデータを読み上げるように合成音声を出力してもよい。また、再生部４２が、デコーダー部４１から渡されるその他のデータを、再生部４２上で稼働するプログラムへの入力として与えてもよい。また、再生部４２が、デコーダー部４１から渡されるプログラムを、再生部４２上で稼働させてもよい。プログラムを実行させる場合、再生部４２は、プログラム実行環境を具備する。プログラム実行環境の一例は、ＪａｖａＳｃｒｉｐｔインタープリターであるが、プログラムの記述言語あるいは形態はこれに限られない。 The reproduction unit 42 reproduces the video and audio decoded by the decoder unit 41 based on designated time information. The reproduction unit 42 displays the video on a display device and outputs the audio from a speaker or the like. The reproduction unit 42 may also process data other than video and audio passed from the decoder unit 41 appropriately, by a predetermined method. As an example, the reproduction unit 42 displays text data passed from the decoder unit 41 on the screen at a specified timing and in a specified manner. This text data is, for example, timed text, and more specifically, superimposed text or subtitle data. The reproduction unit 42 may also display image data passed from the decoder unit 41 on the screen. The reproduction unit 42 may also output synthesized speech that reads aloud the text data passed from the decoder unit 41. The reproduction unit 42 may also give other data passed from the decoder unit 41 as input to a program running on the reproduction unit 42. The reproduction unit 42 may also run, on the reproduction unit 42, a program passed from the decoder unit 41. When a program is to be executed, the reproduction unit 42 is provided with a program execution environment. One example of a program execution environment is a JavaScript interpreter, but the language or form in which the program is written is not limited to this.

コンテンツ生成部４３は、デコーダー部４１がデコードした結果のデータに基づく処理を行う。そして、コンテンツ生成部４３は、その処理の結果として、セグメント取得部３３が取得したセグメントファイルとは別のコンテンツ（あるいはコンテンツの素材）を生成する。
例えば、コンテンツ生成部４３は、デコーダー部４１が出力する映像または音声を解析することによってオリジナルセグメントファイルに基づく追加コンテンツを自動的に生成することができる。
また、コンテンツ生成部４３は、前記デコーダー部が出力する映像または音声の認識処理を行うことによって得られるテキストデータを含んだ前記追加コンテンツを生成することができる。
また、コンテンツ生成部４３は、例えば、デコーダー部４１においてデコードされた音声に含まれる発話文章の音声認識を行い、その文章を書き起こしたテキストデータを出力する。
また、コンテンツ生成部４３は、例えば、デコーダー部４１においてデコードされた音声に含まれる発話文章の言語翻訳処理を行い、翻訳後の文章を、テキストとしてあるいは音声として出力する。
また、コンテンツ生成部４３は、例えば、デコーダー部４１においてデコードされた音声に含まれる発話文章に対する応答を、人工知能等を用いて生成し、生成した応答文章を、テキストとしてあるいは音声として出力する。
また、コンテンツ生成部４３は、例えば、デコーダー部４１においてデコードされた音声のフーリエ解析処理を行い、フーリエ解析の結果のデータを出力する。
また、コンテンツ生成部４３は、例えば、デコーダー部４１においてデコードされた映像に基づいて認識処理（画像認識、文字認識等）を行い、認識処理の結果を出力する。
また、コンテンツ生成部４３は、例えば、デコーダー部４１においてデコードされた映像に関する各種の画像処理を行い、画像処理の結果を出力する。
また、コンテンツ生成部４３は、例えば、デコーダー部４１においてデコードされた映像および音声の内容に関する認識処理を行い、映像および音声内に特定のシーンが検出された場合に、効果音あるいは特定の映像・画像を出力する。
また、コンテンツ生成部４３が、上に例示した処理だけでなく、デコーダー部４１から渡されるコンテンツに基づいて様々な処理を行い、新たなコンテンツを生成するようにしてもよい。
なお、上で例示したコンテンツ生成部４３による処理に含まれる、音声認識処理、言語翻訳処理、人工知能による応答処理、フーリエ変換処理、認識処理、画像処理等の処理自体は、既存技術により実現可能なものである。 The content generation unit 43 performs processing based on the data as a result of decoding by the decoder unit 41. Then, as a result of the processing, the content generation unit 43 generates content (or content material) different from the segment file acquired by the segment acquisition unit 33.
For example, the content generation unit 43 can automatically generate additional content based on the original segment file by analyzing the video or audio output by the decoder unit 41.
In addition, the content generation unit 43 can generate the additional content including text data obtained by performing recognition processing on the video or audio output by the decoder unit.
Further, the content generation unit 43, for example, performs voice recognition of the utterance sentence included in the voice decoded by the decoder unit 41, and outputs the text data transcribed from the sentence.
Further, the content generation unit 43 performs language translation processing of the utterance sentence included in the voice decoded by the decoder unit 41, and outputs the translated sentence as text or voice.
Further, the content generation unit 43 generates, for example, a response to the utterance sentence included in the voice decoded by the decoder unit 41 by using artificial intelligence or the like, and outputs the generated response sentence as a text or a voice.
Further, the content generation unit 43 performs, for example, a Fourier analysis process of the voice decoded by the decoder unit 41, and outputs the data of the result of the Fourier analysis.
Further, the content generation unit 43 performs recognition processing (image recognition, character recognition, etc.) based on the video decoded by the decoder unit 41, and outputs the result of the recognition processing.
Further, the content generation unit 43 performs various image processing on the video decoded by the decoder unit 41, for example, and outputs the result of the image processing.
Further, the content generation unit 43 performs, for example, recognition processing on the content of the video and audio decoded by the decoder unit 41 and, when a specific scene is detected in the video and audio, outputs a sound effect or a specific video or image.
Further, the content generation unit 43 may perform various processes based on the contents passed from the decoder unit 41 in addition to the processes illustrated above to generate new contents.
Note that the individual processes included in the processing by the content generation unit 43 exemplified above, such as voice recognition processing, language translation processing, response processing by artificial intelligence, Fourier transform processing, recognition processing, and image processing, can themselves be realized with existing technology.

Ａ／Ｖインターフェース部４４は、デコーダー部４１によってデコードされたコンテンツ素材を受け取る。また、Ａ／Ｖインターフェース部４４は、デコーダー部４１から渡されたコンテンツ素材をミキサー部４５に渡す。また、Ａ／Ｖインターフェース部４４は、ミキサー部４５から新たなコンテンツ素材を受け取り、エンコーダー部４８に渡す。つまり、Ａ／Ｖインターフェース部４４は、デコードされたオリジナルセグメントファイルに関連付けられる形で新たに追加される追加コンテンツを取得する。なお、Ａ／Ｖインターフェース部４４が、デコーダー部４１から渡されたコンテンツ素材の少なくとも一部を、エンコーダー部４８に渡すようにしてもよい。
なお、Ａ／Ｖインターフェース部４４は、単に「インターフェース部」とも呼ばれる。 The A / V interface unit 44 receives the content material decoded by the decoder unit 41. Further, the A / V interface unit 44 passes the content material passed from the decoder unit 41 to the mixer unit 45. Further, the A / V interface unit 44 receives a new content material from the mixer unit 45 and passes it to the encoder unit 48. That is, the A / V interface unit 44 acquires the newly added additional content in a form associated with the decoded original segment file. The A / V interface unit 44 may pass at least a part of the content material passed from the decoder unit 41 to the encoder unit 48.
The A / V interface unit 44 is also simply referred to as an "interface unit".

ミキサー部４５は、Ａ／Ｖインターフェース部４４から渡されたコンテンツと、外部から入力されるコンテンツとを、適宜混合して、出力する。ミキサー部４５が出力するコンテンツは、Ａ／Ｖインターフェース部４４を経由して、エンコーダー部４８に渡される。ミキサー部４５が混合する処理は、例えば、音声と音声の混合、映像と映像の混合などである。映像コンテンツ素材が内部に音声を含む場合、ミキサー部４５が、映像と音声とを混合する処理を行ってもよい。ミキサー部４５が混合する場合の混合比は、任意に設定可能である。また、ミキサー部４５は、コンテンツ生成部４３から渡されるコンテンツや、外部から入力される映像または音声等を、単独で、Ａ／Ｖインターフェース部４４側に渡してもよい。 The mixer unit 45 mixes, as appropriate, the content passed from the A / V interface unit 44 with content input from the outside, and outputs the result. The content output by the mixer unit 45 is passed to the encoder unit 48 via the A / V interface unit 44. The mixing performed by the mixer unit 45 is, for example, mixing audio with audio or mixing video with video. When the video content material itself contains audio, the mixer unit 45 may mix the video and the audio. The mixing ratio used by the mixer unit 45 can be set arbitrarily. The mixer unit 45 may also pass the content received from the content generation unit 43, or video or audio input from the outside, to the A / V interface unit 44 on its own, without mixing it.
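The audio-with-audio mixing at an arbitrary mixing ratio can be sketched as follows. This is only an illustration under stated assumptions: the function name `mix_audio` is hypothetical, samples are assumed to be floating-point values in the range -1.0 to 1.0 at a common sample rate, and a simple linear weighted sum with clipping is assumed, which the source does not specify.

```python
from typing import List, Sequence


def mix_audio(original: Sequence[float], added: Sequence[float], ratio: float) -> List[float]:
    """Linearly mix two sample sequences.

    ratio = 0.0 keeps only `original`; ratio = 1.0 keeps only `added`.
    The shorter input is treated as silence (0.0) past its end.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    length = max(len(original), len(added))
    mixed: List[float] = []
    for i in range(length):
        a = original[i] if i < len(original) else 0.0
        b = added[i] if i < len(added) else 0.0
        sample = (1.0 - ratio) * a + ratio * b
        # Clip to the assumed valid sample range.
        mixed.append(max(-1.0, min(1.0, sample)))
    return mixed
```

Setting `ratio` to 1.0 corresponds to the case in which the mixer passes the added content on its own, without the original.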

つまり、デコーダー部４１がオリジナルコンテンツをデコードし、再生部４２がオリジナルコンテンツを再生するタイミングに合わせて、ミキサー部４５は新たなコンテンツを取得する。あるいは、デコーダー部４１がオリジナルコンテンツをデコードし、コンテンツ生成部４３がオリジナルコンテンツを処理（解析処理等）するタイミングに合わせて、ミキサー部４５は新たなコンテンツを取得する。
なお、新たなコンテンツは、追加用のコンテンツや、差し替え用のコンテンツである。 That is, the decoder unit 41 decodes the original content, and the mixer unit 45 acquires the new content at the timing when the reproduction unit 42 reproduces the original content. Alternatively, the mixer unit 45 acquires new content at the timing when the decoder unit 41 decodes the original content and the content generation unit 43 processes the original content (analysis processing or the like).
The new content is content for addition or content for replacement.

エンコーダー部４８は、Ａ／Ｖインターフェース部４４から渡されるコンテンツ素材をエンコードし、セグメント化部４９に渡す。つまり、エンコーダー部４８は、追加コンテンツをエンコードする。つまり、エンコーダー部４８は、端末装置３側で追加されたコンテンツ（映像や、音声や、映像プラス音声など）を、再度エンコードして出力する。なお、エンコーダー部４８がエンコード処理する際のパラメーターは、ウェブサーバー装置７側から配信されたオリジナルの映像／音声にしたがって動的に設定される。 The encoder unit 48 encodes the content material passed from the A / V interface unit 44 and passes it to the segmentation unit 49. That is, the encoder unit 48 encodes the additional content. That is, the encoder unit 48 re-encodes and outputs the content (video, audio, video plus audio, etc.) added on the terminal device 3 side. The parameters when the encoder unit 48 performs the encoding process are dynamically set according to the original video / audio distributed from the web server device 7 side.

セグメント化部４９は、セグメント取得部３３が取得したセグメントファイルに同期するように、ミキサー部４５で入力された新たなコンテンツをセグメント化する。つまり、オリジナルコンテンツと新たに追加されたコンテンツは、同期する。具体的には、セグメント化部４９は、時刻解析部３５から渡される時刻情報にしたがって、エンコーダー部４８からの出力を適切に区切り、セグメント化する。つまり、セグメント化部４９は、エンコードされた追加コンテンツを、オリジナルセグメントファイルの時刻に同期するようにセグメント化することによって追加セグメントファイルを生成する。そして、セグメント化部４９は、生成したセグメントファイルをアップロード部５０に渡す。
アップロード部５０は、セグメント化部４９から渡されたセグメントファイルを、サーバー装置２にアップロードする。 The segmentation unit 49 segments new content input by the mixer unit 45 so as to synchronize with the segment file acquired by the segment acquisition unit 33. That is, the original content and the newly added content are synchronized. Specifically, the segmentation unit 49 appropriately divides the output from the encoder unit 48 and segments it according to the time information passed from the time analysis unit 35. That is, the segmentation unit 49 generates the additional segment file by segmenting the encoded additional content so as to synchronize with the time of the original segment file. Then, the segmentation unit 49 passes the generated segment file to the upload unit 50.
The upload unit 50 uploads the segment file passed from the segmentation unit 49 to the server device 2.

上記の処理により、セグメント化部４９は、オリジナルセグメントファイルと追加セグメントファイルとの間で、セグメントの開始時刻およびセグメント時間長を同一にする。そのため、セグメント化部４９は、時刻解析部３５が最初に受信したオリジナルセグメントファイルの先頭タイムスタンプの情報を取得する。そして、セグメント化部４９は、時刻解析部３５から取得したセグメントの開始時刻を起点として、予め定められたセグメント時間長に基づいて、最初のセグメントおよび以後のセグメントの開始時刻を算出する。セグメント化部４９は、そのように算出された各セグメントの開始時刻を用いて、生成する追加セグメントファイルのタイムススタンプの情報を決定する。これにより、サーバー装置２側では、コンテンツの統合を容易に行うことができる。 By the above processing, the segmentation unit 49 makes the segment start time and the segment time length the same between the original segment file and the additional segment file. Therefore, the segmentation unit 49 acquires the information of the head time stamp of the original segment file first received by the time analysis unit 35. Then, the segmentation unit 49 calculates the start time of the first segment and the subsequent segments based on the predetermined segment time length, starting from the start time of the segment acquired from the time analysis unit 35. The segmentation unit 49 determines the time stamp information of the additional segment file to be generated by using the start time of each segment calculated in this way. As a result, the contents can be easily integrated on the server device 2 side.
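The start-time calculation described above amounts to simple arithmetic, sketched below in Python. The function names `segment_start_times` and `segment_index` are hypothetical, and times are assumed to be expressed in seconds as floating-point values; the source does not state the unit or representation of the time stamps.

```python
from typing import List


def segment_start_times(first_start: float, segment_duration: float, count: int) -> List[float]:
    """Start times of `count` consecutive segments.

    `first_start` is the head time stamp of the first original segment received;
    `segment_duration` is the predetermined segment time length.
    """
    if segment_duration <= 0:
        raise ValueError("segment_duration must be positive")
    return [first_start + i * segment_duration for i in range(count)]


def segment_index(timestamp: float, first_start: float, segment_duration: float) -> int:
    """Index of the segment to which an encoded sample at `timestamp` belongs."""
    if segment_duration <= 0:
        raise ValueError("segment_duration must be positive")
    return int((timestamp - first_start) // segment_duration)
```

Cutting the encoder output at the boundaries given by `segment_start_times` gives the additional segment files the same start times and time lengths as the original segment files, which is the property the server device relies on when integrating the content.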

上記のように、端末装置３では、再生部４２が再生したコンテンツ（オリジナルのコンテンツ）と、付加するコンテンツ（映像または音声等）をミキサー部４５でミックスする。これにより、オリジナルコンテンツとミックスしたコンテンツとの間で、映像／音声の同期ずれはほぼ生じない。
例えば、端末装置３が映像および音声を含むオリジナルコンテンツを取得し、そのコンテンツに追加の音声コンテンツを付加する場合、次の３つのコンテンツがサーバー装置２から再配信されることとなる。即ち、オリジナルコンテンツに含まれる音声であるオリジナル音声と、端末側で付加する音声である付加音声と、オリジナルコンテンツに含まれるオリジナル映像の３つのコンテンツである。これらの３つのコンテンツ相互間で、同期ずれは生じない。よって、例えば、上記のオリジナル映像とオリジナル音声とを再生する場合にも、上記のオリジナル映像と付加音声とを再生する場合にも、映像と音声との間で同期ずれは生じない。 As described above, in the terminal device 3, the content (original content) reproduced by the reproduction unit 42 and the content to be added (video, audio, etc.) are mixed by the mixer unit 45. As a result, there is almost no video / audio synchronization shift between the original content and the mixed content.
For example, when the terminal device 3 acquires the original content including video and audio and adds additional audio content to the content, the following three contents are redistributed from the server device 2. That is, there are three contents: the original sound which is the sound included in the original content, the added sound which is the sound added on the terminal side, and the original video included in the original content. There is no synchronization shift between these three contents. Therefore, for example, when the above-mentioned original video and the original audio are reproduced, or when the above-mentioned original video and the additional audio are reproduced, there is no synchronization shift between the video and the audio.

図３は、コンテンツ加工システム１による処理のアーキテクチャーを示す概略図である。図３に示すクラウド処理２０２は、図１におけるサーバー装置２による処理に対応する。つまり、図３に示すアーキテクチャーは、図１におけるサーバー装置２として、いわゆるクラウドサーバーを利用する場合のものである。また、図３に示す端末処理２０３は、図１における端末装置３による処理に対応する。また、ストリーム２００は、ウェブサーバー装置７側から配信されるオリジナルのストリームである。 FIG. 3 is a schematic diagram showing a processing architecture by the content processing system 1. The cloud processing 202 shown in FIG. 3 corresponds to the processing by the server device 2 in FIG. That is, the architecture shown in FIG. 3 is for using a so-called cloud server as the server device 2 in FIG. Further, the terminal process 203 shown in FIG. 3 corresponds to the process by the terminal device 3 in FIG. The stream 200 is an original stream delivered from the web server device 7 side.

クラウド処理２０２は、コンテンツの追加の処理とコンテンツの差し替えの処理を含む。追加用および差し替え用のコンテンツは、端末処理２０３側で生成されるものである。端末処理２０３は、ストリーム２００を参照するとともに、そのストリーム２００に基づき、追加用または差し替え用のコンテンツの素材を生成し、クラウド処理２０２側に提供する。制御２０１は、どの素材を追加するか、また、どの素材をどの素材で差し替えるかといったことを制御する。つまり、制御２０１は、素材のセグメントファイルを取捨選択するとともに、選択されたセグメントに合うマニュフェストファイルを生成するための制御を行う。また、追加用のセグメントファイルや、差し替え用のセグメントファイルは、オリジナルのセグメントファイルとの間で同期するように制御される。つまり、追加用のコンテンツ素材や、差し替え用のコンテンツ素材は、オリジナルのセグメントファイルと整合するようにセグメント化される。そして、追加用のセグメントファイルや差し替え用のセグメントファイルには、オリジナルのセグメントファイルと同期する時刻情報（開始時刻、時間長）が付与される。マニュフェストファイルには、同期を考慮して付与された時刻情報が書き込まれる。つまり、受信側では、マニュフェストファイルを参照することにより、オリジナルのコンテンツと、追加ないしは差し替えのコンテンツとが同期して再生される。 The cloud process 202 includes a process of adding content and a process of replacing content. The content for addition and replacement is generated on the terminal processing 203 side. The terminal processing 203 refers to the stream 200, and based on the stream 200, generates the material of the content for addition or replacement and provides it to the cloud processing 202 side. The control 201 controls which material is added and which material is replaced with which material. That is, the control 201 selects the segment file of the material and controls to generate the manifest file that matches the selected segment. In addition, the segment file for addition and the segment file for replacement are controlled to be synchronized with the original segment file. That is, the additional content material and the replacement content material are segmented so as to be consistent with the original segment file. Then, time information (start time, time length) that synchronizes with the original segment file is added to the additional segment file and the replacement segment file. Time information given in consideration of synchronization is written in the manifest file. That is, on the receiving side, by referring to the manifest file, the original content and the added or replaced content are played back in synchronization.

選択２１０は、クラウド処理２０２から出力されるどのようなストリームを受信側で視聴するかを選択する処理である。
ストリーム２１１は、コンテンツ追加のストリームである。即ち、ストリーム２１１は、オリジナルのストリーム２００に含まれるコンテンツを維持したまま、さらに端末処理２０３において生成された追加のコンテンツを含んだストリームである。
ストリーム２１２は、コンテンツ差し替えのストリームである。即ち、ストリーム２１２は、オリジナルのストリーム２００に含まれるコンテンツのうちの少なくとも一部を、端末処理２０３において生成された追加のコンテンツで置き換えたストリームである。
ストリーム２１３は、パススルーのストリームである。即ち、ストリーム２１３は、オリジナルのストリーム２００にコンテンツを追加したり、ストリーム２００のコンテンツを差し替えたりすることなく、ストリーム２００をそのまま再配信する。 The selection 210 is a process of selecting what kind of stream output from the cloud process 202 is to be viewed on the receiving side.
The stream 211 is a stream for adding content. That is, the stream 211 is a stream that further includes the additional content generated in the terminal processing 203 while maintaining the content included in the original stream 200.
The stream 212 is a content replacement stream. That is, the stream 212 is a stream in which at least a part of the contents included in the original stream 200 is replaced with the additional contents generated in the terminal processing 203.
Stream 213 is a pass-through stream. That is, the stream 213 redistributes the stream 200 as it is without adding content to the original stream 200 or replacing the content of the stream 200.

次に、コンテンツを追加したり差し替えたりした場合におけるセグメントファイルの具体例について説明する。 Next, a specific example of the segment file when the content is added or replaced will be described.

図４は、コンテンツ加工システム１がコンテンツを追加する場合のセグメントファイルの構成を示す概略図である。同図において、横方向が時間軸である。また、同図には、時刻ｔ１，ｔ２，・・・，ｔ７のそれぞれを開始時刻とするセグメントファイルが含まれている。なお、時刻ｔ８以後については記載を省略している。図示するＣ１１，Ｃ１２，・・・，Ｃ１７は、当該コンテンツに含まれる特定の素材（例えば、映像あるいは音声など）のセグメントファイルの系列である。また、Ｃ２１，Ｃ２２，・・・，Ｃ２７は、当該コンテンツに含まれる他の素材（例えば、映像あるいは音声など）のセグメントファイルの系列である。これらの２つの系列、即ち、Ｃ１１，Ｃ１２，・・・，Ｃ１７の系列と、Ｃ２１，Ｃ２２，・・・，Ｃ２７の系列とは、オリジナルのコンテンツに含まれるものである。つまり、これらの２つの系列に属するセグメントファイルを、実線の四角形で示している。一方、時刻ｔ５を開始時刻とするＣ３５と、それに後続するＣ３６，Ｃ３７は、コンテンツ加工システム１によって追加されたコンテンツである。つまり、Ｃ３５，Ｃ３６，Ｃ３７は、オリジナルのコンテンツには含まれていない。これら、コンテンツ加工システム１によって追加されたコンテンツのセグメントファイルを、破線の四角形で示している。図示するように、追加されたコンテンツのセグメントファイルＣ３５，Ｃ３６，Ｃ３７は、それぞれ、オリジナルコンテンツに含まれるセグメントファイルＣ１５，Ｃ１６，Ｃ１７およびＣ２５，Ｃ２６，Ｃ２７と同期している。つまり、追加されたコンテンツは、オリジナルコンテンツのセグメントファイルと同期するように分割され、時刻情報の付与が行われる。 FIG. 4 is a schematic diagram showing the structure of the segment files when the content processing system 1 adds content. In the figure, the horizontal direction is the time axis. The figure includes segment files whose start times are the times t1, t2, ..., t7, respectively. Time t8 and later are omitted from the figure. The illustrated C11, C12, ..., C17 are a series of segment files of a specific material (for example, video or audio) included in the content. C21, C22, ..., C27 are a series of segment files of another material (for example, video or audio) included in the content. These two series, that is, the series C11, C12, ..., C17 and the series C21, C22, ..., C27, are included in the original content. The segment files belonging to these two series are shown as solid rectangles. On the other hand, C35, whose start time is time t5, and C36 and C37, which follow it, are content added by the content processing system 1. That is, C35, C36, and C37 are not included in the original content. The segment files of the content added by the content processing system 1 are shown as dashed rectangles.
As shown, the segment files C35, C36, and C37 of the added content are synchronized with the segment files C15, C16, C17 and C25, C26, and C27 included in the original content, respectively. That is, the added content is divided so as to be synchronized with the segment file of the original content, and the time information is added.

図４で示したように、追加されるコンテンツ（セグメントファイルＣ３５，Ｃ３６，Ｃ３７）の系列は、オリジナルコンテンツのセグメントファイルと同期するようにセグメント化される。つまり、セグメントファイルＣ３５の開始時刻は、セグメントファイルＣ１５およびＣ２５の開始時刻と同じである。また、セグメントファイルＣ３５の時間長が、セグメントファイルＣ１５およびＣ２５の時間長と同一になるようにしてもよい。以後のセグメントファイルに関しても同様である。また、マニュフェストファイル（プレイリストファイル）においては、各セグメントファイルが同期するように時刻情報が記述される。セグメント化部４９およびマニュフェスト生成部２２は、上記の通り系列間でセグメントファイルが同期するように、出力するファイルの時刻情報を制御する。 As shown in FIG. 4, the series of the added contents (segment files C35, C36, C37) is segmented so as to be synchronized with the segment file of the original contents. That is, the start time of the segment file C35 is the same as the start time of the segment files C15 and C25. Further, the time length of the segment file C35 may be the same as the time length of the segment files C15 and C25. The same applies to the subsequent segment files. Further, in the manifest file (playlist file), the time information is described so that each segment file is synchronized. The segmentation unit 49 and the manifest generation unit 22 control the time information of the output file so that the segment files are synchronized between the series as described above.

図５は、コンテンツ加工システム１がコンテンツの差し替えを行う場合のセグメントファイルの構成を示す概略図である。この図においても、横方向が時間軸である。また、図４の場合と同様に、図示するセグメントファイルＣ１１，Ｃ１２，・・・，Ｃ１７は、コンテンツに含まれる特定の素材に属する。そして、本図の場合、セグメントファイルＣ２１，Ｃ２２，・・・，Ｃ２７の系列は、途中から、別のコンテンツのセグメントファイルＣ４５，Ｃ４６，Ｃ４７に差し替えられている。 FIG. 5 is a schematic diagram showing the structure of a segment file when the content processing system 1 replaces the content. Also in this figure, the horizontal direction is the time axis. Further, as in the case of FIG. 4, the illustrated segment files C11, C12, ..., C17 belong to a specific material included in the content. Then, in the case of this figure, the sequence of the segment files C21, C22, ..., C27 is replaced with the segment files C45, C46, C47 of another content from the middle.

図５で示したように、差し替えられるコンテンツ（セグメントファイルＣ４５，Ｃ４６，Ｃ４７）の系列は、オリジナルコンテンツのセグメントファイルと同期するようにセグメント化される。つまり、セグメントファイルＣ４５の開始時刻は、セグメントファイルＣ１５の開始時刻と同じである。また、セグメントファイルＣ４５の時間長が、セグメントファイルＣ１５の時間長と同一になるようにしてもよい。以後のセグメントファイルに関しても同様である。また、マニュフェストファイル（プレイリストファイル）においては、各セグメントファイルが同期するように時刻情報が記述される。セグメント化部４９およびマニュフェスト生成部２２は、上記の通り系列間でセグメントファイルが同期するように、出力するファイルの時刻情報を制御する。 As shown in FIG. 5, the series of the contents to be replaced (segment files C45, C46, C47) is segmented so as to be synchronized with the segment file of the original contents. That is, the start time of the segment file C45 is the same as the start time of the segment file C15. Further, the time length of the segment file C45 may be the same as the time length of the segment file C15. The same applies to the subsequent segment files. Further, in the manifest file (playlist file), the time information is described so that each segment file is synchronized. The segmentation unit 49 and the manifest generation unit 22 control the time information of the output file so that the segment files are synchronized between the series as described above.
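The mid-stream replacement shown in FIG. 5 can be illustrated with a short Python sketch. The function name `replace_from` and the representation of each segment as a (name, start time) pair are hypothetical, and the check that every replacement segment starts exactly at an original segment boundary is an assumption drawn from the synchronization requirement described above.

```python
from typing import List, Tuple

# Each entry is (segment name, start time); the model is hypothetical.
Timeline = List[Tuple[str, float]]


def replace_from(original: Timeline, replacement: Timeline) -> Timeline:
    """Substitute original segments with replacement segments at matching start times.

    Raises ValueError if a replacement segment does not start on an original
    segment boundary, since it could then not be played back in sync.
    """
    original_starts = {start for _, start in original}
    for name, start in replacement:
        if start not in original_starts:
            raise ValueError(f"{name} starts at {start}, which is not a segment boundary")
    by_start = {start: name for name, start in replacement}
    return [(by_start.get(start, name), start) for name, start in original]
```

With the series C21 to C27 starting at t1 to t7 and the replacement segments C45, C46, and C47 starting at t5, t6, and t7, the result is C21, C22, C23, C24, C45, C46, C47, matching the arrangement described for FIG. 5.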

次に、マニュフェストファイルの例について説明する。既に述べたように、コンテンツ加工システム１は、ウェブサーバー装置７側から受信したオリジナルコンテンツを加工し、加工コンテンツとして再配信する。このとき、コンテンツ加工システム１は、オリジナルコンテンツのマニュフェストファイルを受信する。また、コンテンツ加工システム１内のマニュフェスト生成部２２は、再配信する加工コンテンツのためのマニュフェストファイルを生成する。図６および図７は、それぞれ、オリジナルコンテンツのマニュフェストファイルと、加工コンテンツのマニュフェストファイルを示す。なお、図６および図７に示すマニュフェストファイルは、「マスタープレイリスト」とも呼ばれる。マスタープレイリストファイル内に定義されるプレイリストファイルが、実際に再生すべきセグメントファイルの情報を含む。 Next, an example of a manifest file will be described. As described above, the content processing system 1 processes the original content received from the web server device 7 side and redistributes it as processed content. At this time, the content processing system 1 receives the manifest file of the original content. In addition, the manifest generation unit 22 in the content processing system 1 generates a manifest file for the processed content to be redistributed. FIGS. 6 and 7 show a manifest file of the original content and a manifest file of the processed content, respectively. The manifest files shown in FIGS. 6 and 7 are also referred to as "master playlists". The playlist files defined in the master playlist file contain the information about the segment files that are actually to be played.

FIG. 6 is a schematic diagram showing an example of the original manifest file that is distributed from the web server device 7 and received by the content processing system 1. The illustrated manifest file is an M3U8 file (M3U file). In the figure, each line of the file is given a line number for reference. The contents of the original manifest file are as follows.
The "#EXTM3U" on line 1 is the file header and indicates that the file is an extended M3U file.
The "#EXT-X-VERSION:3" on line 2 indicates that the version number of the manifest file is "3".
The "#EXT-X-INDEPENDENT-SEGMENTS" on line 3 is a tag indicating that the content of any segment in every playlist referenced from this master playlist is independent of the information in other segments. That is, the content of a segment can be decoded without information from any other segment.
Lines 4 to 17 define seven streams. Lines 4 and 5 as a pair define the first stream, lines 6 and 7 the second, lines 8 and 9 the third, lines 10 and 11 the fourth, lines 12 and 13 the fifth, lines 14 and 15 the sixth, and lines 16 and 17 the seventh. The information for each stream consists of an "#EXT-X-STREAM-INF" tag and the file name of a playlist file (an m3u8 file name). The "#EXT-X-STREAM-INF" tag has the parameters "BANDWIDTH", "AVERAGE-BANDWIDTH", "CODECS", "RESOLUTION", "FRAME-RATE", and "CLOSED-CAPTION". "BANDWIDTH" represents the bandwidth. "AVERAGE-BANDWIDTH" represents the average bandwidth. "CODECS" represents the encoding and decoding information. "RESOLUTION" represents the resolution of the video. "FRAME-RATE" represents the frame rate (number of frames per unit time). "CLOSED-CAPTION" indicates the presence or absence of closed captions.
The m3u8 file of each stream is written in the manifest file as a relative location, for example "test2_270.m3u8" on line 5.
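The structure of such a master playlist can be illustrated by a small parser. The following is a minimal sketch, not the implementation used in the embodiment: the function names `parse_attributes` and `parse_variants` and all attribute values in `SAMPLE_MASTER` are assumptions made for illustration; only the file name "test2_270.m3u8" is taken from the description, and the sample lists two streams instead of seven for brevity.

```python
import re

# Hypothetical sample modeled on the description of FIG. 6.
# Attribute values and the second file name are invented for illustration.
SAMPLE_MASTER = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-INDEPENDENT-SEGMENTS
#EXT-X-STREAM-INF:BANDWIDTH=800000,AVERAGE-BANDWIDTH=700000,CODECS="avc1.4d401e,mp4a.40.2",RESOLUTION=480x270,FRAME-RATE=30.000
test2_270.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2000000,AVERAGE-BANDWIDTH=1800000,CODECS="avc1.4d401f,mp4a.40.2",RESOLUTION=1280x720,FRAME-RATE=30.000
test2_720.m3u8
"""

# KEY=value pairs; a value is either a quoted string (which may contain
# commas) or an unquoted run of characters up to the next comma.
_ATTR = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_attributes(attr_text):
    """Turn 'A=1,B="x,y"' into {'A': '1', 'B': 'x,y'}."""
    return {key: value.strip('"') for key, value in _ATTR.findall(attr_text)}


def parse_variants(master_text):
    """Return a list of (attributes, playlist_uri) pairs, one per stream.

    Each stream is defined by an #EXT-X-STREAM-INF line followed by the
    line holding the relative location of its playlist file.
    """
    variants = []
    pending = None
    for line in master_text.splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-STREAM-INF:"):
            pending = parse_attributes(line.split(":", 1)[1])
        elif line and not line.startswith("#") and pending is not None:
            variants.append((pending, line))
            pending = None
    return variants


variants = parse_variants(SAMPLE_MASTER)
```

Each tag line is paired with the URI line that follows it, which mirrors the line pairs (4 and 5, 6 and 7, and so on) described for FIG. 6.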

FIG. 7 is a schematic diagram showing an example of the manifest file generated by the manifest generation unit 22 of the server device 2. The manifest generation unit 22 generates the manifest file of FIG. 7 based on the manifest file of FIG. 6 received by the server device 2. The illustrated manifest file is an M3U8 file (M3U file). In the figure, each line of the file is given a line number for reference. The contents of the manifest file of the processed content generated by the content processing system 1 are as follows.
The "#EXTM3U" (header) on line 1 and the "#EXT-X-VERSION:3" (version information) on line 2 are the same as those described for FIG. 6.

Lines 3 and 4 are information added by the manifest generation unit 22 and specify the audio m3u8 files. The "#EXT-X-MEDIA" tags on lines 3 and 4 associate two mutually alternative media. Both tags share the descriptions "TYPE=AUDIO" (the type is audio) and "GROUP-ID="audio"" (the group ID is "audio").
However, the "#EXT-X-MEDIA" tag on line 3 has the description "NAME=mixed" (mixed audio), whereas the tag on line 4 has "NAME=original" (original audio). Thus the names of the audio media differ between lines 3 and 4. In addition, line 3 has the description "DEFAULT=YES" (it is the default audio), whereas line 4 has "DEFAULT=NO" (it is not the default audio). The URL of the playlist file specified by the tag on line 3 is "mixed/playlist.m3u8", while the URL specified by the tag on line 4 is "original/playlist.m3u8".

As described above, the manifest generation unit 22 on the server device 2 side generates a manifest file in accordance with the content for addition or replacement (a series of segment files) generated by the terminal device 3. Specifically, the manifest file generated by the manifest generation unit 22 describes two alternative kinds of audio content on lines 3 and 4. These have different names and specify different playlist URLs. The audio specified on line 3 is the default (that is, it is selected implicitly), whereas the audio specified on line 4 is not.

Furthermore, as shown in FIG. 7, the manifest generation unit 22 rewrites lines 6 to 19 so that they refer to the group ID (GROUP-ID="audio") of the audio content defined on lines 3 and 4. Lines 6 to 19 of FIG. 7 correspond to lines 4 to 17 of FIG. 6. For each of the seven streams described in the manifest file of FIG. 7, the manifest generation unit 22 appends the parameter "AUDIO="audio"" to the "#EXT-X-STREAM-INF" tag. The lowercase "audio" here is the group ID defined on lines 3 and 4. In other words, the seven streams described in FIG. 7 are carried over from the file of FIG. 6, and the manifest generation unit 22 associates the group "audio" with each of them.
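The rewriting performed by the manifest generation unit 22 can be sketched as a simple text transformation. This is an illustrative sketch under stated assumptions, not the actual implementation: the function name `build_processed_manifest` and the sample input are hypothetical, and the NAME values are written as quoted strings although the description shows them unquoted. The description does not state what occupies line 5 of FIG. 7; this sketch simply leaves the original third line of the input there.

```python
# Hypothetical one-stream input in the style of FIG. 6 (values invented).
SAMPLE_ORIGINAL = (
    "#EXTM3U\n"
    "#EXT-X-VERSION:3\n"
    "#EXT-X-INDEPENDENT-SEGMENTS\n"
    "#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=480x270\n"
    "test2_270.m3u8\n"
)


def build_processed_manifest(original_text, group_id="audio"):
    """Derive a processed-content master playlist from the original one.

    Two audio renditions (mixed = default, original = alternative) are
    inserted after the version line, and every #EXT-X-STREAM-INF tag is
    extended with an AUDIO attribute that refers to their group ID.
    """
    media_lines = [
        f'#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="{group_id}",NAME="mixed",'
        'DEFAULT=YES,URI="mixed/playlist.m3u8"',
        f'#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="{group_id}",NAME="original",'
        'DEFAULT=NO,URI="original/playlist.m3u8"',
    ]
    out = []
    for line in original_text.splitlines():
        if line.startswith("#EXT-X-STREAM-INF:"):
            # Associate the stream with the audio group.
            line = f'{line},AUDIO="{group_id}"'
        out.append(line)
        if line.startswith("#EXT-X-VERSION"):
            # The audio renditions become lines 3 and 4 of the output.
            out.extend(media_lines)
    return "\n".join(out) + "\n"


processed = build_processed_manifest(SAMPLE_ORIGINAL)
```

Because only the master playlist text is rewritten, the video streams inherited from the original manifest are referenced unchanged.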

With the manifest file generated by the manifest generation unit 22, the receiving terminal 8 can play not only the original audio (playlist URL: original/playlist.m3u8) but also the processed mixed audio (playlist URL: mixed/playlist.m3u8).
The mixed audio (mixed) is, for example, in the relayed video of a sports event, a mix of the audio from the original distribution source (for example, the event venue sound) and the audio input at the content processing system 1 (for example, Japanese commentary).

FIG. 8 is a schematic diagram showing a configuration example of the processed content management information that the content processing system 1 uses for management. When processing content, the control unit (not shown) of the terminal device 3 generates the processed content management information shown in FIG. 8 and passes it to the server device 2. The control unit (not shown) of the server device 2 receives this processed content management information from the terminal device 3 and stores it. For one item of original content, one terminal device 3 generates one or more items of processed content and passes their segment files to the server device 2. In doing so, the terminal device 3 generates the processed content management information of FIG. 8 for each item of processed content. Alternatively, for one item of original content, a plurality of terminal devices 3 may each generate processed content and pass the segment files to the server device 2. In that case, each terminal device 3 that generates processed content generates the processed content management information of FIG. 8 for each item of processed content.

As illustrated, the processed content management information includes an original content ID, a processed content ID, a replaced series list, an additional series list, and a processor ID (part (A) of the figure). Here, "ID" means identifier.

The original content ID is information for uniquely identifying the original content to be processed. In the illustrated example, an 8-digit number is used as the original content ID, but the data format is not particularly restricted.
The processed content ID is information for uniquely identifying processed content generated in the content processing system 1. In the illustrated example, the original content ID and a branch number (a 4-digit number) are used as the processed content ID, but the data format is not particularly restricted. The processed content ID may be assigned on the terminal device 3 side or on the server device 2 side.
The replaced series list is information on the series of segment files that existed in the original content but were replaced in this processed content. The number of series in the replaced series list is arbitrary. In the illustrated example, the replaced series list (part (B) of the figure) has two series. This means that two of the series contained in the original content are replaced series. The replaced series list is tabular data with the items series ID, series type, start time, and end time. The series ID is information for uniquely identifying a series of segment files. The series type is information representing the type of the series (for example, video, audio, video plus audio, or subtitles). The start time is the time at which replacement of the replaced series starts, expressed as year, month, and day; hours, minutes, and seconds; and a sub-second serial number. The end time is the time at which replacement of the replaced series ends, expressed in the same format as the start time.
The additional series list is information on the series of segment files that did not exist in the original content and were added in this processed content. The number of series in the additional series list is arbitrary. In the illustrated example, the additional series list (part (C) of the figure) has two series. This means that two series are added in the processed content. The additional series list is tabular data and, like the replaced series list, has the items series ID, series type, start time, and end time. The start time represents the time at which addition of the additional series starts. The end time represents the time at which addition of the additional series ends.
The processor ID is information for uniquely identifying the party (a user, a business operator, or the like) that generates the processed content.

Note that a series included in the additional series list may either replace a replaced series or simply be added to the original content.
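The processed content management information described above maps naturally onto a small record type. The following is a minimal sketch under stated assumptions: all class, field, and function names are hypothetical, the hyphen separating the original content ID from the branch number is an invented convention, and start and end times are simplified to `datetime` values rather than the date, time, and sub-second serial number format described for FIG. 8.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class SeriesEntry:
    """One row of the replaced series list or the additional series list."""
    series_id: str
    series_type: str  # e.g. "video", "audio", "video+audio", "subtitle"
    start: datetime   # when replacement or addition starts
    end: datetime     # when replacement or addition ends


@dataclass
class ProcessedContentInfo:
    """Processed content management information for one processed content."""
    original_content_id: str
    processed_content_id: str
    processor_id: str
    replaced_series: List[SeriesEntry] = field(default_factory=list)
    added_series: List[SeriesEntry] = field(default_factory=list)


def make_processed_id(original_content_id, branch):
    """Combine the original content ID with a 4-digit branch number.

    The hyphen separator is an assumption made for this sketch.
    """
    return f"{original_content_id}-{branch:04d}"


def active_series(series_list, t):
    """Return the entries whose [start, end) interval contains time t."""
    return [s for s in series_list if s.start <= t < s.end]


info = ProcessedContentInfo(
    original_content_id="12345678",
    processed_content_id=make_processed_id("12345678", 1),
    processor_id="user-001",
)
info.added_series.append(
    SeriesEntry("A1", "audio",
                datetime(2017, 12, 19, 10, 0, 0),
                datetime(2017, 12, 19, 11, 0, 0))
)
```

Keeping the replaced and additional series in separate lists allows a series to be added without replacing anything, which corresponds to the two cases noted above.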

Note that at least part of the functions of the server device 2 and the terminal device 3 in the embodiment described above may be realized by a computer. In that case, a program for realizing these functions may be recorded on a computer-readable recording medium, and the functions may be realized by having a computer system read and execute the program recorded on that medium. The "computer system" here includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, CD-ROMs, DVD-ROMs, and USB memories, and to storage devices such as hard disks built into a computer system. Furthermore, the "computer-readable recording medium" may also include media that hold a program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or a communication line such as a telephone line, and media that hold a program for a certain period, such as volatile memory inside the computer system serving as the server or client in that case. The program may realize only part of the functions described above, or may realize them in combination with a program already recorded in the computer system.

Next, examples of services that can be realized using the content processing system 1 will be described.

First service example: video synchronization and sharing app
For example, video and audio of a sports competition are distributed from the web server device 7 as original content. A user watching the competition at a stadium or the like launches an application program for content processing (hereinafter sometimes called an "app") on a terminal device 3 (for example, a smartphone). Through the user's operation, the terminal device 3 can shoot video inside the stadium and upload it to the server device 2 as additional content. Through the user's input operation or the like, the terminal device 3 can also upload comment text to the server device 2. Processed content that includes this additional content is distributed from the content processing system 1. The terminal device 3 of another user can receive the processed content and add still other content (video, audio, text, and so on). Content may be added in multiple stages in this way. A content distribution company or the like can also process, edit, and distribute the processed content. In other words, many users can add content to the original content and enjoy it from multiple perspectives.

Second service example: torch relay video distribution service
For example, in connection with a sports event, video and audio of the torch relay and related scenes, as well as video and audio of the sports competition, are distributed from the web server device 7 as original content. For example, a small camera capable of 360-degree shooting is attached to the torch carried in the torch relay. A small terminal device 3 is also attached to the torch. The terminal device 3 attached to the torch uploads 360-degree video from the torch's viewpoint to the server device 2 as additional content. In addition, general users cheering the torch relay along the route shoot the relay with a terminal device 3 (for example, a smartphone) and upload the video to the server device 2 as additional content. The server device redistributes video content that includes the additional content from the web server unit 28.
This makes it possible to enjoy content that uses video from the torch's viewpoint and video shot by general users. Anyone cheering the torch relay along the route can take part as a content creator.

Third service example: generation and distribution of VR video
For example, video is distributed from the web server device 7 as original content. One or more users near the shooting location use terminal devices 3 (for example, smartphones) to shoot an object that appears in the original video, each from a different angle. Each user's terminal device 3 uploads the video shot from its viewpoint to the server device 2 as additional content. For example, the same object is shot by three or more terminal devices 3, each from a different angle. The videos showing the same object from these multiple viewpoints can be redistributed from the web server unit 28 as VR (virtual reality) video.

Fourth service example: metadata addition related to a face recognition function
For example, video and audio of a sports competition are distributed from the web server device 7 as original content. A user who is a viewer registers favorite players in advance on the terminal device 3 (for example, a smartphone). The terminal device 3 stores in advance the face images of the registered favorite players or feature data of those face images. By running the face recognition function, the terminal device 3 detects scenes in the video of the original content in which a favorite player appears. When such a scene is detected, the terminal device 3, for example, notifies the user, adds a chapter mark on the timeline, or generates and displays an automatically edited highlight video that joins together only the scenes in which the player appears. This allows the user to enjoy, in video, the at-bats of a particular baseball player, the shots of a particular golfer, the play of a particular soccer player, and so on.
The terminal device 3 can add further information to such content. For example, through the user's operation, the terminal device 3 can add (correct) appearance scenes of a favorite player that automatic face recognition failed to detect. The terminal device 3 adds this additional information (metadata), for example, as text content. Through the user's operation, the terminal device 3 can also assign a scene name to each scene extracted by face recognition. The terminal device 3 adds this additional information (text representing the scene name) as content. The terminal device 3 can also attach a "like" attribute to each scene extracted by face recognition. The terminal device 3 adds this additional information (metadata representing the "like"), for example, as text content. In this way, the terminal device 3 has a function for modifying metadata. The content processing system 1 can collect this metadata from the terminal devices 3 and redistribute processed content to which the metadata has been added. The creator of the original content can also take in the collected metadata in real time and reflect it in the original content.

Fifth service example: shift-synchronized video generation
For example, in speed skating, skiing, athletics, swimming, and the like, athletes compete on time, but they may start at different times. The web server device 7 distributes, as original content, the video of the athlete who is starting at that moment. Meanwhile, the terminal device 3 has already received and recorded the video of an athlete who started earlier. The terminal device 3 then adds its recorded video in synchronization with the competition video currently being distributed from the web server device 7. In doing so, it performs control so that the start timings of the athletes are aligned. The server device 2 can simultaneously distribute the video distributed from the web server device 7 (real-time video) and the video added by the terminal device 3 (time-shifted recorded video). As a result, the receiving terminal 8 can show athletes with different start times overlaid in time.
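The start alignment described above amounts to a constant time offset between the two videos. The following is a minimal sketch under the assumption that both videos are addressed by playback positions in seconds; the function and parameter names are hypothetical and do not appear in the description.

```python
def recorded_position(live_position, live_start, recorded_start):
    """Map a position in the live video to the matching recorded position.

    live_start:      position (s) in the live video where its athlete starts
    recorded_start:  position (s) in the recording where its athlete starts
    Returns None while the live athlete has not started yet.
    """
    elapsed = live_position - live_start  # time since the live start
    if elapsed < 0:
        return None
    # Show the recorded athlete at the same elapsed time since the start.
    return recorded_start + elapsed


# The live athlete starts at 120 s; the recorded athlete started at 30 s.
aligned = recorded_position(125.0, live_start=120.0, recorded_start=30.0)
```

With the offset fixed at the two start positions, any later position in the live video maps to a recorded position at the same elapsed race time.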

Sixth service example: street and public space monitoring app
For example, video is distributed from the web server device 7 as original content. Meanwhile, terminal devices 3 are installed in advance at a plurality of locations, such as on roads (for example, school routes) and in public spaces such as parks. Each terminal device 3 receives the original content from the web server device 7, shoots video of the place where it is installed (a road, a park, or the like), and provides that video to the server device 2 as additional content. The web server unit 28 of the server device 2 redistributes the content, including the video provided by the terminal devices 3 as additional content. The receiving terminal 8 can then play the videos from the plurality of terminal devices 3 synchronized and at the same time.

Seventh service example: automatic generation of commemorative photos and the like
Original content is distributed from the web server device 7. For example, a plurality of terminal devices 3 are installed at a tourist spot or the like. Each terminal device 3 shoots the place where it is installed and provides high-quality video (or images) to the server device 2 as additional content. The server device 2 distributes content that includes the additional content. Further, a terminal device 3 searches for video (or images) using a specific place (a tourist spot or the like) and time as keys. The server device 2 distributes the video (or images) found by the search. Here, for example, an on-demand distribution mechanism may be used. The terminal device 3 can create a commemorative video (or commemorative photo) by compositing the distributed video (or images) with video (or images) shot with its own imaging means. The terminal device 3 provides the video (or images) obtained by the compositing to the server device 2 as further additional content. The server device 2 then distributes processed content that includes the added content.

Eighth service example: slow-motion replay insertion
For example, video of a sports competition or the like is distributed from the web server device 7 as original content. The content generation unit 43 of the terminal device 3 generates simple slow-motion video by thinning out frames of the received video. The terminal device 3 inserts the generated slow-motion video into the live stream and provides it to the server device 2. The server device 2 redistributes the video with the slow-motion video inserted.
Further, the terminal device 3 receives the redistributed video (containing the slow-motion video) and generates live comment text through the user's input operation or the like. The terminal device 3 provides this live comment text content to the server device 2. The server device 2 redistributes processed content to which the provided text content has been added.

Ninth service example: virtual camera switching
For example, relay video of a sports event, a live music performance, or a stage performance (a play or the like) is distributed from the web server device 7 as original content. The video distributed as original content is shot at a high resolution such as 8K (7680 pixels wide by 4320 pixels high). The video distributed as original content is typically shot with a fixed camera and a wide-angle lens. The terminal device 3 generates a video stream obtained by cropping a partial region of the original content. The cropped video therefore has fewer pixels than the original video. The terminal device 3 may also switch between a plurality of cropped videos, which realizes virtual camera switching. The content generation unit 43 in the terminal device 3 generates the video stream by this cropping and switching. The server device 2 redistributes the video generated by the terminal device 3.
The content generation unit 43 may select the region to be cropped automatically or perform the virtual camera switching automatically. Automatic cropping and automatic virtual switching use, for example, AI (artificial intelligence) technology. Specifically, a learning process is performed in advance by assigning evaluation values to videos obtained as results of cropping and switching. Cropping and virtual switching are then performed by the trained AI.
Further, the terminal device 3 may receive the redistributed cropped video. Then, for example through the user's operation, the terminal device 3 acquires live comment text data. The terminal device 3 provides the acquired text data to the server device 2 as additional content. The server device 2 also redistributes the live comment text data provided by the terminal device 3 as content.
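The cropping and virtual switching in this service example can be illustrated with a toy model. This is a minimal sketch, not the processing of the content generation unit 43: a frame is modeled as a list of pixel rows, and the region names, the switching schedule, and all function names are assumptions made for illustration.

```python
from bisect import bisect_right


def crop(frame, x, y, width, height):
    """Return the sub-rectangle of a frame given as a list of pixel rows."""
    return [row[x:x + width] for row in frame[y:y + height]]


def active_region(schedule, t):
    """Pick the region name in effect at time t.

    schedule is a list of (start_time, region_name) sorted by start_time.
    """
    times = [start for start, _ in schedule]
    i = bisect_right(times, t) - 1
    if i < 0:
        raise ValueError("t precedes the first schedule entry")
    return schedule[i][1]


def virtual_camera_frame(frame, regions, schedule, t):
    """Crop the frame with whichever virtual camera is selected at time t."""
    x, y, width, height = regions[active_region(schedule, t)]
    return crop(frame, x, y, width, height)


# Toy 6x4 frame whose "pixels" are (row, column) tuples.
frame = [[(r, c) for c in range(6)] for r in range(4)]
regions = {"left": (0, 0, 3, 2), "right": (3, 2, 3, 2)}  # x, y, w, h
schedule = [(0.0, "left"), (5.0, "right")]

first_view = virtual_camera_frame(frame, regions, schedule, 1.0)
second_view = virtual_camera_frame(frame, regions, schedule, 6.0)
```

Each region acts as one virtual camera, and the schedule plays the role of the switcher; an automatic selector could fill in the schedule instead of a user.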

第１０サービス例：仮想スイッチングによるクリップ映像の挿入
例えば、記者会見や、ニュースの現場からの中継や、スポーツ中継の映像が、オリジナルコンテンツとして、ウェブサーバー装置７から配信される。端末装置３は、上記のオリジナルコンテンツを再生しながら、ユーザーの操作等に基づき予め記憶しておいたクリップ映像を差し替え画像として挿入し、新たな映像コンテンツを生成する。このとき、例えば、オリジナルコンテンツに含まれる音声の差し替えは行わず、その音声をそのまま再送信（パススルー）する。挿入するクリップ映像は、例えば、当該記者会見やニュースやスポーツ中継に関連するＶＴＲ映像である。端末装置３は、生成された新たなコンテンツをサーバー装置２に提供する。サーバー装置２は、差し替え映像を伴う新たな映像コンテンツを、再配信する。
これにより、現場（記者会見、ニュース中継、スポーツ中継等）では、カメラ映像のライブ配信を行うだけで済む。つまり、現場で必要とする機材は、カメラと、小型ライブ配信用のエンコーダー装置のみである。そして、端末装置３側で必要な映像クリップを挿入する操作を行うことができる。コンテンツ加工システム１側では、映像のライブストリーミングをデコードしたり再エンコードしたりすることなく、映像の差し替え装入が可能となる。
さらに、再配信された映像（クリップ映像が差し替えとして挿入された映像）を端末装置３が受信してもよい。そして、例えばユーザーの操作等により、ライブコメントのテキストデータを端末装置３が取得する。端末装置３は、取得したテキストのデータを追加コンテンツとして、サーバー装置２に提供する。サーバー装置２は、端末装置３から提供されたライブコメントのテキストデータも、コンテンツとして再配信する。 Example of 10th service: Insertion of clip video by virtual switching For example, a press conference, a broadcast from a news site, or a video of a sports broadcast is delivered from the web server device 7 as original content. While playing back the original content, the terminal device 3 inserts a clip video stored in advance based on a user's operation or the like as a replacement image to generate new video content. At this time, for example, the sound included in the original content is not replaced, and the sound is retransmitted (pass-through) as it is. The clip video to be inserted is, for example, a VTR video related to the press conference, news, or sports broadcast. The terminal device 3 provides the generated new content to the server device 2. The server device 2 redistributes new video content accompanied by the replacement video.
As a result, at the site (press conference, news relay, sports broadcast, etc.), it is only necessary to live-stream the camera video. In other words, the only equipment required on site is a camera and a small encoder device for live distribution. The operation of inserting the necessary video clips can then be performed on the terminal device 3 side. On the content processing system 1 side, replacement video can be inserted without decoding or re-encoding the live video stream.
Further, the terminal device 3 may receive the redistributed video (the video in which the clip video has been inserted as a replacement). Then, based on a user operation or the like, the terminal device 3 acquires the text data of a live comment. The terminal device 3 provides the acquired text data as additional content to the server device 2. The server device 2 also redistributes the text data of the live comment provided by the terminal device 3 as content.
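One way to picture clip insertion without decoding or re-encoding is as an edit to the list of segment references alone. The sketch below assumes a manifest modeled as a plain list of segments; the `Segment` type, the `insert_clip` function, and the example URLs are hypothetical and do not correspond to any particular manifest format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    uri: str         # where the segment file is served from (hypothetical URLs)
    duration: float  # seconds


def insert_clip(original: list[Segment], clip: list[Segment],
                start_index: int) -> list[Segment]:
    """Replace as many original segments as the clip has, starting at start_index.

    Only the list of references changes; no segment file is decoded or
    re-encoded. Assumes the clip segments have the same durations as the
    segments they replace, so the timeline length is preserved.
    """
    end = start_index + len(clip)
    if start_index < 0 or end > len(original):
        raise ValueError("clip does not fit inside the original segment list")
    return original[:start_index] + clip + original[end:]


live = [Segment(f"https://origin.example/live/v{i}.ts", 6.0) for i in range(5)]
vtr = [Segment(f"https://edit.example/clip/c{i}.ts", 6.0) for i in range(2)]
for seg in insert_clip(live, vtr, 2):
    print(seg.uri)
```

Because the audio is passed through unchanged in this service example, a corresponding audio segment list would be left as it is.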

第１１サービス例：ライブ配信コメンタリー付加アプリ
例えば、スポーツ中継の映像および音声が、オリジナルコンテンツとして、ウェブサーバー装置７から配信される。端末装置３（例えば、スマートフォン）は、配信された上記オリジナルコンテンツを受信して再生する。また、端末装置３は、オリジナルコンテンツを再生しながら、ユーザーの発話音声を取得し、追加の音声コンテンツを生成する。端末装置３は、生成した音声コンテンツをサーバー装置２に渡す。サーバー装置２は、オリジナルコンテンツに含まれる映像と、端末装置３によって生成された音声とを少なくとも含んだコンテンツを再配信する。これにより、ソーシャルコメンタリーサービス（Social Commentary Service）が実現できる。
また、サーバー装置２は、複数の端末装置３から渡される音声のコンテンツを、相互に同期させ、つまりサーバー装置２上で統合して、オリジナルの映像とともに再配信してもよい。これにより、複数のユーザーによるコメンタリーを、オリジナルの映像に付加して配信することができる。
さらに、端末装置３は、コメンタリーの追加されたコンテンツ（加工コンテンツ）を受信する。そして、端末装置３は、ユーザーの操作等に基づき、コメンタリーの評価情報（例えば、レーティング数値の情報）を取得し、この評価情報を例えば追加のテキストコンテンツとして生成する。端末装置３は、追加のテキストコンテンツをサーバー装置２に渡す。サーバー装置２は、必要に応じて評価情報を適宜処理して、元のコンテンツと共に再配信することができる。これにより、多数の音声トラック（コメンタリーのコンテンツ）のそれぞれに対して、ユーザーの評価情報を付加することができる。これにより、ユーザーからの人気が高い、質の良いコメンタリーを、効率よく選択することも可能となる。 Eleventh service example: Live distribution commentary addition application For example, video and audio of a sports broadcast are distributed from the web server device 7 as original contents. The terminal device 3 (for example, a smartphone) receives and reproduces the distributed original content. Further, the terminal device 3 acquires the user's spoken voice and generates additional voice content while playing back the original content. The terminal device 3 passes the generated audio content to the server device 2. The server device 2 redistributes the content including at least the video included in the original content and the audio generated by the terminal device 3. As a result, a social commentary service can be realized.
Further, the server device 2 may synchronize the audio contents passed from the plurality of terminal devices 3 with each other, that is, integrate them on the server device 2 and redistribute them together with the original video. As a result, commentary by a plurality of users can be added to the original video and distributed.
Further, the terminal device 3 receives the content to which commentary has been added (processed content). Then, the terminal device 3 acquires evaluation information on the commentary (for example, a numerical rating) based on a user operation or the like, and generates this evaluation information as, for example, additional text content. The terminal device 3 passes the additional text content to the server device 2. The server device 2 can process the evaluation information as necessary and redistribute it together with the original content. As a result, user evaluation information can be attached to each of a large number of audio tracks (commentary content). This also makes it possible to efficiently select high-quality commentaries that are popular with users.
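As a rough illustration of how evaluation information could be used to select popular commentary tracks, the sketch below averages the ratings per track and sorts the tracks best first. The rating scale, the track identifiers, and the `rank_tracks` function are assumptions for the example; the text does not specify how the server device 2 processes the evaluation information.

```python
from collections import defaultdict
from statistics import mean


def rank_tracks(ratings: list[tuple[str, int]],
                min_votes: int = 1) -> list[tuple[str, float]]:
    """Average the ratings per commentary track and sort best first.

    Tracks with fewer than min_votes ratings are left out.
    """
    by_track: dict[str, list[int]] = defaultdict(list)
    for track_id, score in ratings:
        by_track[track_id].append(score)
    averages = [(t, mean(s)) for t, s in by_track.items() if len(s) >= min_votes]
    return sorted(averages, key=lambda item: item[1], reverse=True)


# Hypothetical (track id, rating) pairs collected from terminal devices.
ratings = [("alice", 5), ("bob", 3), ("alice", 4), ("carol", 2), ("bob", 4)]
for track_id, avg in rank_tracks(ratings):
    print(track_id, avg)
```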

第１２サービス例：仮想スタジアム
例えば、スポーツ中継等のライブ映像が、オリジナルコンテンツとして、ウェブサーバー装置７から配信される。不特定多数のユーザーの端末装置３が、そのコンテンツを受信するとともに、ユーザーの音声によるコメントを取得し、音声のコンテンツを生成する。端末装置３は、生成された音声のコンテンツをサーバー装置２に渡す。サーバー装置２は、端末装置３から渡される音声のコンテンツを受け取る。サーバー装置２は、多数の端末装置３からそれぞれの音声のコンテンツを受信し、それらを統合してもよい。サーバー装置２は、オリジナルコンテンツに含まれる映像と、端末装置３から受け取った音声とを少なくとも含んだ加工コンテンツを、再配信する。
受信端末８は、オリジナルコンテンツの映像と、端末装置３によって生成された多数のユーザーのコメントとを同期して再生することができる。つまり、受信端末８では、あたかもスポーツ競技が行われている現場のような雰囲気に混合された音声とともに、オリジナルの映像を鑑賞することが可能となる。つまり、視聴ユーザーは、現場の一体感を味わうことができる。
また、サーバー装置２が複数の端末装置３からの音声コメントのコンテンツを集める際に、友人同士の複数のユーザーが持つ端末装置３からの音声コメントのみを統合して再配信するようにしてもよい。また、サーバー装置２が、特定のチームあるいは選手を贔屓にするユーザーが持つ端末装置３からの音声コメントのみを統合して再配信するようにしてもよい。 12th service example: Virtual stadium For example, a live image such as a sports broadcast is distributed from the web server device 7 as original content. The terminal device 3 of an unspecified number of users receives the content, acquires a comment by the user's voice, and generates the voice content. The terminal device 3 passes the generated audio content to the server device 2. The server device 2 receives the audio content passed from the terminal device 3. The server device 2 may receive the respective audio contents from a large number of terminal devices 3 and integrate them. The server device 2 redistributes the processed content including at least the video included in the original content and the audio received from the terminal device 3.
The receiving terminal 8 can play back the video of the original content in synchronization with the comments of the many users generated by the terminal devices 3. That is, on the receiving terminal 8, the original video can be viewed together with audio mixed to sound like the atmosphere at the venue where the sport is actually being played. In other words, the viewing user can enjoy a sense of unity with the venue.
Further, when the server device 2 collects the voice comment content from the plurality of terminal devices 3, it may integrate and redistribute only the voice comments from the terminal devices 3 of a group of users who are friends with one another. Alternatively, the server device 2 may integrate and redistribute only the voice comments from the terminal devices 3 of users who support a specific team or player.
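The selective integration described here amounts to filtering the incoming voice comments before mixing them. A minimal sketch follows, in which the `VoiceComment` fields, the user identifiers, and the team names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceComment:
    user_id: str
    team: str       # team the user supports (hypothetical profile field)
    track_uri: str  # where the uploaded audio is stored (hypothetical)


def select_by_friends(comments: list[VoiceComment],
                      friend_ids: set[str]) -> list[VoiceComment]:
    """Keep only comments from users in the given friend group."""
    return [c for c in comments if c.user_id in friend_ids]


def select_by_team(comments: list[VoiceComment], team: str) -> list[VoiceComment]:
    """Keep only comments from users who support the given team."""
    return [c for c in comments if c.team == team]


comments = [
    VoiceComment("u1", "reds", "audio/u1.aac"),
    VoiceComment("u2", "blues", "audio/u2.aac"),
    VoiceComment("u3", "reds", "audio/u3.aac"),
]
print([c.user_id for c in select_by_friends(comments, {"u1", "u2"})])
print([c.user_id for c in select_by_team(comments, "reds")])
```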

第１３サービス例：ソーシャルオーケストラアプリ
例えば、音楽の演奏音を含む映像が、オリジナルコンテンツとして、ウェブサーバー装置７から配信される。端末装置３は、オリジナルコンテンツの映像を受信して再生するとともに、当該端末装置３のユーザーによる歌唱や、ユーザーによる楽器演奏の音声を、取得する。端末装置３は、音声のコンテンツを生成する。端末装置３は、追加された音声のコンテンツをサーバー装置２に渡す。サーバー装置２は、端末装置３から渡された音声のコンテンツと、オリジナルコンテンツの映像（音声が付加されていてもよい）とを同期させ、再配信する。サーバー装置２は、複数の端末装置３から渡された音声のコンテンツを、オリジナルコンテンツの映像と同期させ、再配信してもよい。これにより、複数の演奏者や歌唱者による仮想音楽セッションを実現することが可能となる。
また、端末装置３は、歌唱するユーザーあるいは楽器を演奏するユーザーを撮影した映像を、さらに追加コンテンツとして取得するようにしてもよい。端末装置３は、取得した映像を、サーバー装置２に渡す。サーバー装置２は、端末装置３から渡された映像をも、オリジナルのコンテンツと同期させ、配信する。
また、複数の端末装置３は、必ずしも同時にセッションする必要はない。つまり、ある端末装置３と他の端末装置３との間で、オリジナルコンテンツを再生するタイミングが異なっており、その結果として歌唱あるいは楽器演奏の音声や映像を取得するタイミングが異なっていても良い。その場合、サーバー装置２は、それぞれのタイミングで取得された追加コンテンツ（映像や音声）を端末装置３から取得し、すべての追加コンテンツを、オリジナルコンテンツのタイミングに同期させて、再配信する。これにより、各端末装置３のユーザーが同時に歌唱あるいは演奏しなくても、セッションのコンテンツを生成することができる。
また、サーバー装置２が、受信端末８に加工コンテンツを配信する際に、例えば受信端末８からの要求に基づいて、特定の音声トラックあるいは特定の映像トラックのみを選択して配信するようにしてもよい。あるいは、受信端末８側で、特定の音声トラックあるいは特定の映像トラックのみを選択して再生するようにしてもよい。
また、サーバー装置２が、受信端末８に加工コンテンツを配信する際に、各音声トラックのボリューム（音量）レベルを任意の比率でミックスするようにしてもよい。これにより、例えば、楽器重視の音声コンテンツや、歌唱重視の音声コンテンツなど、複数のパターンのコンテンツを配信することができるようになる。 13th service example: Social orchestra application For example, a video including a music performance sound is distributed from the web server device 7 as original content. The terminal device 3 receives and reproduces the video of the original content, and also acquires the singing by the user of the terminal device 3 and the sound of the musical instrument performance by the user. The terminal device 3 generates audio content. The terminal device 3 passes the added audio content to the server device 2. The server device 2 synchronizes and redistributes the audio content passed from the terminal device 3 and the video of the original content (audio may be added). The server device 2 may synchronize the audio content passed from the plurality of terminal devices 3 with the video of the original content and redistribute it. This makes it possible to realize a virtual music session by a plurality of performers and singers.
Further, the terminal device 3 may also acquire, as additional content, video of the user singing or playing a musical instrument. The terminal device 3 passes the acquired video to the server device 2. The server device 2 synchronizes the video passed from the terminal device 3 with the original content as well, and distributes it.
Further, the plurality of terminal devices 3 do not necessarily have to take part in the session at the same time. That is, the timing at which the original content is played back may differ between one terminal device 3 and another, and as a result, the timing at which the audio or video of the singing or instrument playing is acquired may also differ. In that case, the server device 2 obtains from the terminal devices 3 the additional content (video or audio) acquired at each of those timings, synchronizes all of the additional content with the timing of the original content, and redistributes it. As a result, the content of the session can be generated even if the users of the terminal devices 3 do not sing or play at the same time.
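The key point of this paragraph is that alignment uses positions in the original content rather than wall-clock time. The sketch below assumes each take reports the position in the original content at which its recording began; the `Take` fields and helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Take:
    user_id: str
    media_start: float  # position in the original content (s) when recording began
    duration: float     # length of the recorded audio or video (s)
    recorded_at: str    # wall-clock time of the recording; not used for alignment


def timeline(takes: list[Take]) -> list[tuple[float, float, str]]:
    """Place every take on the original content's timeline as (start, end, user)."""
    placed = [(t.media_start, t.media_start + t.duration, t.user_id) for t in takes]
    return sorted(placed)


def active_at(takes: list[Take], position: float) -> list[str]:
    """Users whose take overlaps a given position of the original content."""
    return [user for start, end, user in timeline(takes) if start <= position < end]


# Two takes recorded on different days still line up on the media timeline.
takes = [
    Take("singer", 10.0, 30.0, "day 1, evening"),
    Take("guitar", 0.0, 45.0, "day 3, morning"),
]
print(active_at(takes, 12.0))
print(active_at(takes, 42.0))
```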
Further, when the server device 2 distributes the processed content to the receiving terminal 8, it may select and distribute only a specific audio track or a specific video track, for example based on a request from the receiving terminal 8. Alternatively, the receiving terminal 8 may select and play back only a specific audio track or a specific video track.
Further, when the server device 2 distributes the processed content to the receiving terminal 8, the volume level of each audio track may be mixed at an arbitrary ratio. As a result, it becomes possible to deliver a plurality of patterns of content such as audio content that emphasizes musical instruments and audio content that emphasizes singing.
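Mixing tracks at an arbitrary ratio can be sketched as a weighted sum of samples. The example below assumes already decoded, time-aligned tracks of equal length with samples in the range -1.0 to 1.0; the `mix` function and the track names are illustrative only.

```python
def mix(tracks: dict[str, list[float]], gains: dict[str, float]) -> list[float]:
    """Weighted sum of equally long sample lists, clipped to [-1.0, 1.0].

    A track without an entry in gains is muted (gain 0.0).
    """
    lengths = {len(samples) for samples in tracks.values()}
    if len(lengths) != 1:
        raise ValueError("all tracks must have the same number of samples")
    out = [0.0] * lengths.pop()
    for name, samples in tracks.items():
        g = gains.get(name, 0.0)
        for i, s in enumerate(samples):
            out[i] += g * s
    return [max(-1.0, min(1.0, v)) for v in out]


tracks = {"vocal": [0.5, -0.5, 0.25], "piano": [0.25, 0.25, -0.5]}
vocal_heavy = mix(tracks, {"vocal": 1.0, "piano": 0.5})
piano_heavy = mix(tracks, {"vocal": 0.5, "piano": 1.0})
print(vocal_heavy)
print(piano_heavy)
```

The same tracks yield different deliverable patterns simply by changing the gain table, which matches the singing-focused and instrument-focused variants mentioned in the text.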

第１４サービス例：映像作品アフレコアプリ
例えば、映画やアニメーション等の映像が、オリジナルコンテンツとして、ウェブサーバー装置７から配信される。端末装置３は、そのオリジナルコンテンツの映像を受信し、再生する。端末装置３は、オリジナルコンテンツの映像のタイミングに合わせて発話されるユーザーの音声を取得する。端末装置３は、音声のコンテンツを生成し、サーバー装置２に渡す。サーバー装置２は、端末装置３から渡された追加の音声コンテンツと、オリジナルコンテンツである映像とを同期させ、再配信する。
また、サーバー装置２は、複数の端末装置３から追加の音声コンテンツを受信し、複数の音声コンテンツをオリジナルコンテンツに同期させて再配信することもできる。これにより、複数のユーザーが異なる役割を分担してアフレコを行うことが可能となる。サーバー装置２は、複数の端末装置３からの追加の音声コンテンツを、同時に受信してもよいし、異なるタイミングで受信してもよい。
また、サーバー装置２は、１台の端末装置３から、複数回、追加の音声コンテンツを受信し、それら複数の音声コンテンツを、オリジナルの映像コンテンツに同期させて再配信してもよい。これにより、１人のユーザーが、複数の役割を演じながらアフレコを行うことも可能となる。
また、サーバー装置２から再配信された加工コンテンツを、端末装置３が受信するとともに、端末装置３のユーザーが、加工コンテンツへの投票や、評価数値の入力などを行うようにしてもよい。これにより、端末装置３は、様々な加工コンテンツの人気のテキストデータを追加コンテンツとして取得する。端末装置３は、このテキストデータを、サーバー装置２に渡す。サーバー装置２は、加工コンテンツごとの人気を表すテキストデータのコンテンツを、再配信することができる。この仕組みにより、例えば、どの加工コンテンツ（映像）が面白かったかを競うイベントを行うこともできる。 14th service example: Video work dubbing application For example, a video such as a movie or an animation is distributed as original content from the web server device 7. The terminal device 3 receives and reproduces the video of the original content. The terminal device 3 acquires the user's voice spoken at the timing of the video of the original content. The terminal device 3 generates audio content and passes it to the server device 2. The server device 2 synchronizes and redistributes the additional audio content passed from the terminal device 3 and the video which is the original content.
The server device 2 can also receive additional audio content from the plurality of terminal devices 3 and redistribute the plurality of audio content in synchronization with the original content. This makes it possible for a plurality of users to share different roles and perform dubbing. The server device 2 may receive additional audio contents from the plurality of terminal devices 3 at the same time, or may receive them at different timings.
Further, the server device 2 may receive additional audio content a plurality of times from one terminal device 3 and redistribute the plurality of audio contents in synchronization with the original video content. As a result, one user can perform dubbing while playing a plurality of roles.
Further, the terminal device 3 may receive the processed content redistributed from the server device 2, and the user of the terminal device 3 may vote on the processed content, enter a rating value, and so on. The terminal device 3 thereby acquires text data indicating the popularity of the various processed contents as additional content. The terminal device 3 passes this text data to the server device 2. The server device 2 can redistribute the text data content indicating the popularity of each processed content. With this mechanism, it is possible, for example, to hold an event in which participants compete over which processed content (video) was the most entertaining.

第１５サービス例：多言語実況および解説
例えば、スポーツイベントの中継映像が、オリジナルコンテンツとして、ウェブサーバー装置７から配信される。当該スポーツイベントの実況者や解説者の操作により、端末装置３は、当該オリジナルコンテンツの映像（音声を伴っていてもよい）を受信し、再生する。実況者および解説者は、同じ場所に居てもよいし、異なる場所に居てもよい。また、端末装置３は、実況者および解説者の音声を取得し、追加の音声コンテンツを生成する。端末装置３は、生成した音声コンテンツを、サーバー装置２に渡す。サーバー装置２は、オリジナルコンテンツである映像と、端末装置３によって生成された音声コンテンツとを、同期させて配信する。これにより、実況者および解説者は、特定の言語による実況音声および解説音声をコンテンツに付加することができる。また、言語ごとに（例えば、日本語、英語、中国語、フランス語、・・・等）、実況者および解説者が実況音声および解説音声のコンテンツを発し、追加コンテンツを生成するようにしてもよい。また、実況者と解説者が相互に離れていても、その掛け合い音声によるコンテンツを生成することが可能となる。これにより、例えば大規模なスポーツイベントの、多言語実況等が可能となる。 Fifteenth service example: Multilingual play-by-play and commentary For example, a live video of a sporting event is distributed as original content from the web server device 7. In response to operations by a play-by-play announcer or a commentator for the sporting event, the terminal device 3 receives and plays back the video (which may be accompanied by audio) of the original content. The announcer and the commentator may be in the same place or in different places. The terminal device 3 also acquires the voices of the announcer and the commentator and generates additional audio content. The terminal device 3 passes the generated audio content to the server device 2. The server device 2 synchronizes and distributes the video, which is the original content, and the audio content generated by the terminal device 3. As a result, the announcer and the commentator can add play-by-play audio and commentary audio in a specific language to the content. In addition, for each language (for example, Japanese, English, Chinese, French, etc.), an announcer and a commentator may provide play-by-play and commentary audio to generate additional content. Even if the announcer and the commentator are in separate locations, content containing their back-and-forth dialogue can be generated. This makes possible, for example, multilingual live commentary for a large-scale sporting event.

第１６サービス例：ライブストリーミングに対する音声コメンタリー
例えば、何らかの映像コンテンツ（音楽ライブ、トークライブ等を含む任意のコンテンツ）が、オリジナルコンテンツとして、ウェブサーバー装置７から配信される。端末装置３は、そのオリジナルコンテンツを受信し、再生する。また、端末装置３は、ユーザーの音声を取得し、音声による追加コンテンツを生成する。端末装置３は、生成された音声コンテンツを、サーバー装置２に渡す。サーバー装置２は、受信した追加コンテンツを、オリジナルコンテンツと同期させて、再配信する。
サーバー装置２は、複数の端末装置３から追加コンテンツ（音声）を受信し、それらをミックスして再配信してもよい。
この仕組みにより、動画配信サービスにおいて、ユーザーが音声コメントを共有することが可能となる。 Example of 16th service: Audio commentary for live streaming For example, some kind of video content (arbitrary content including music live, talk live, etc.) is distributed from the web server device 7 as original content. The terminal device 3 receives and reproduces the original content. In addition, the terminal device 3 acquires the user's voice and generates additional content by voice. The terminal device 3 passes the generated audio content to the server device 2. The server device 2 synchronizes the received additional content with the original content and redistributes it.
The server device 2 may receive additional content (audio) from the plurality of terminal devices 3, mix them, and redistribute them.
With this mechanism, users can share voice comments in the video distribution service.

第１７サービス例：パブリックビューイング会場の観客音声の配信
例えば、大規模スポーツイベントの中継映像（音声を含む）が、オリジナルコンテンツとして、ウェブサーバー装置７から配信される。いわゆるパブリックビューイング会場において、端末装置３（ＰＣ等）が、オリジナルコンテンツの映像および音声を再生する。映像はパブリックビューイング会場の大画面に表示され、音声はスピーカー等から出力される。端末装置３は、パブリックビューイング会場の観客の音声を取得し、音声コンテンツを生成する。端末装置３は、生成した追加コンテンツをサーバー装置２に渡す。サーバー装置２は、端末装置３から渡された追加の音声コンテンツと、オリジナルコンテンツとを同期させ、それらを再配信する。これにより、受信端末８は、スポーツイベントが行われている会場の音声だけではなく、パブリックビューイング会場の音声をも含んだコンテンツを受信し、再生することができる。
パブリックビューイング会場は、当該スポーツイベントの会場と同地域に存在していてもよいし、遠隔地（異なる国を含む）に存在していてもよい。スポーツイベントの会場とパブリックビューイングの会場とが離れている場合には、スポーツイベントの会場に居る観客の客層と、パブリックビューイングの会場に居る観客の客層とが異なる場合もある。例えば、両会場の観客で、贔屓の選手、贔屓のチーム、文化的背景等が異なる場合もあり得る。この場合、受信端末８側では、配信される加工コンテンツを通して、スポーツイベント会場の音声による臨場感とは異なる、パブリックビューイング会場の音声による臨場感をも味わうことが可能となる。 17th Service Example: Distribution of Audience Audio at Public Viewing Venue For example, a live video (including audio) of a large-scale sporting event is distributed from the web server device 7 as original content. At the so-called public viewing venue, the terminal device 3 (PC or the like) reproduces the video and audio of the original content. The video is displayed on the large screen of the public viewing venue, and the audio is output from speakers and the like. The terminal device 3 acquires the audio of the audience at the public viewing venue and generates audio content. The terminal device 3 passes the generated additional content to the server device 2. The server device 2 synchronizes the additional audio content passed from the terminal device 3 with the original content and redistributes them. As a result, the receiving terminal 8 can receive and play back the content including not only the sound of the venue where the sporting event is held but also the sound of the public viewing venue.
The public viewing venue may be located in the same area as the venue of the sporting event, or in a remote location (including a different country). When the sporting event venue and the public viewing venue are far apart, the makeup of the audience at the sporting event venue may differ from that of the audience at the public viewing venue. For example, the spectators at the two venues may favor different players or teams, or have different cultural backgrounds. In this case, through the processed content that is delivered, the receiving terminal 8 side can also experience the atmosphere conveyed by the audio of the public viewing venue, which differs from the atmosphere conveyed by the audio of the sporting event venue.

第１８サービス例：道案内動画生成アプリ
ユーザーの操作等に基づき、端末装置３は、検索エンジンのサーバー装置に対して検索キーワードを送信する。検索キーワードは、例えば「汐留から日本橋まで」といったように、移動の出発地と目的地の地名を含むものである。このときのユーザーの意図は、道案内の情報を得ることである。すると、オリジナルコンテンツとして、検索キーワードに対応した基本移動映像がウェブサーバー装置７から配信される。検索キーワードが「汐留から日本橋まで」である場合、基本移動映像はそれらの地点間の移動のルートにおける映像である。基本移動映像は、予め、データベースに格納されている。また、基本移動映像には位置情報（経度，緯度）や、時刻情報や、移動手段に関する情報が関連付けられている。なお、基本移動映像が、時間帯毎に予め準備されていてもよい。その場合、実際の時間帯に最も近い時間帯の基本移動映像がオリジナルコンテンツとして配信される。また、基本移動映像は、複数の映像をつなぎ合わせたものであってもよい。例えば、汐留から日本橋までの基本移動映像は、汐留から銀座四丁目までの移動映像と、銀座四丁目から日本橋までの移動映像とをつなぎ合わせたものであってよい。端末装置３は、オリジナルコンテンツである基本移動映像を受信し、再生する。また、端末装置３内のコンテンツ生成部４３は、基本移動映像に関連付けられているデータを抽出する。例えば、コンテンツ生成部４３は、移動中の位置情報（座標情報）を抽出する。そして、コンテンツ生成部４３は、抽出した位置情報に基づいて他のデータベース（例えば、端末装置３自身が持つデータベース、またはインターネットを介してアクセスするサーバー上のデータベース）を検索し、位置に関連した情報を取得する。位置に関連した情報は、例えば、観光スポットの案内情報（場所、見どころ、歴史的背景等）や、飲食店の情報（場所、メニュー等）や、他の店舗の情報（場所、業態、販売物等）などである。位置に関連した情報は、例えば、テキスト情報で与えられる。コンテンツ生成部４３は、オリジナルコンテンツである移動映像と、上記の観光スポットや飲食店や店舗の情報とに基づいて、複合コンテンツを生成する。この複合コンテンツにおいて、観光スポットや飲食店や店舗の情報は、例えば、映像内の特定の場所において表示される画像やテキストを含む。端末装置３は、新たに生成したコンテンツ（移動映像と店舗等の情報の複合コンテンツ）をサーバー装置２に渡す。サーバー装置２は、加工されたコンテンツを再配信する。再配信されたコンテンツは、端末装置３自身で閲覧することもできるし、他の受信端末８で閲覧することもできる。
これにより、観光スポットや店舗等の情報を、わかりやすく観光客らに提供することができる。また、基本移動情報に関連付けて、広告情報を提供するようにしてもよい。 18th service example: Directions video generation application The terminal device 3 transmits a search keyword to the server device of the search engine based on the user's operation or the like. The search keyword includes the place name of the departure point and the destination of the movement, for example, "from Shiodome to Nihonbashi". The intention of the user at this time is to obtain information on directions. Then, as the original content, the basic moving video corresponding to the search keyword is distributed from the web server device 7. When the search keyword is "from Shiodome to Nihonbashi", the basic movement video is the video on the route of movement between those points. The basic moving image is stored in the database in advance. In addition, position information (longitude, latitude), time information, and information related to the means of transportation are associated with the basic moving image. The basic moving image may be prepared in advance for each time zone. In that case, the basic moving video in the time zone closest to the actual time zone is delivered as the original content. Further, the basic moving image may be a combination of a plurality of images. For example, the basic moving image from Shiodome to Nihonbashi may be a combination of the moving image from Shiodome to Ginza 4-chome and the moving image from Ginza 4-chome to Nihonbashi. The terminal device 3 receives and reproduces the basic moving video which is the original content. Further, the content generation unit 43 in the terminal device 3 extracts data associated with the basic moving video. For example, the content generation unit 43 extracts moving position information (coordinate information). 
Then, the content generation unit 43 searches another database (for example, a database held by the terminal device 3 itself or a database on a server accessed via the Internet) based on the extracted position information, and acquires information related to the position. Information related to the position includes, for example, guide information on tourist spots (location, highlights, historical background, etc.), information on restaurants (location, menu, etc.), and information on other stores (location, type of business, goods sold, etc.). The position-related information is given, for example, as text. The content generation unit 43 generates composite content based on the moving video, which is the original content, and the above information on tourist spots, restaurants, and stores. In this composite content, the information on tourist spots, restaurants, and stores includes, for example, images and text displayed at specific places in the video. The terminal device 3 passes the newly generated content (composite content of the moving video and information on stores, etc.) to the server device 2. The server device 2 redistributes the processed content. The redistributed content can be viewed on the terminal device 3 itself or on another receiving terminal 8.
As a result, information on tourist spots, stores, etc. can be provided to tourists in an easy-to-understand manner. Further, the advertisement information may be provided in association with the basic movement information.
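The position-based lookup in this service example can be sketched as a proximity search along the position track associated with the video. In the sketch below, the great-circle distance uses the standard haversine formula; the data types, the 150 m radius, the coordinates, and the place names are illustrative values invented for the example, not data from the embodiment.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackPoint:
    t: float    # seconds from the start of the moving video
    lat: float
    lon: float


@dataclass(frozen=True)
class Poi:
    name: str
    lat: float
    lon: float
    text: str   # guide text to overlay on the video


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6_371_000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def annotate(track: list[TrackPoint], pois: list[Poi],
             radius_m: float = 150.0) -> list[tuple[float, Poi]]:
    """Return (time, poi) pairs for the first time each POI comes within radius_m."""
    seen: set[str] = set()
    out: list[tuple[float, Poi]] = []
    for pt in track:
        for poi in pois:
            near = haversine_m(pt.lat, pt.lon, poi.lat, poi.lon) <= radius_m
            if near and poi.name not in seen:
                seen.add(poi.name)
                out.append((pt.t, poi))
    return out


# Illustrative coordinates only; not surveyed positions of real places.
track = [TrackPoint(0.0, 35.6640, 139.7600),
         TrackPoint(60.0, 35.6715, 139.7650),
         TrackPoint(120.0, 35.6835, 139.7740)]
pois = [Poi("Cafe A", 35.6716, 139.7651, "Coffee and light meals"),
        Poi("Museum B", 35.7000, 139.8000, "Local history exhibits")]
for t, poi in annotate(track, pois):
    print(t, poi.name, poi.text)
```

The resulting time stamps indicate where in the moving video each piece of text or image would be shown.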

第１９サービス例：タイムライン検索キー生成
例えば、任意の映像または音声（両方を含んでもよい）のコンテンツが、オリジナルコンテンツとして、ウェブサーバー装置７から配信される。端末装置３はそのオリジナルコンテンツを受信し、端末装置３のデコーダー部４１はコンテンツをデコードして映像や音声等をコンテンツ生成部４３に提供する。コンテンツ生成部４３は、映像認識エンジンおよび音声認識エンジンを備えている。コンテンツ生成部４３の映像認識エンジンは、デコーダー部４１から供給される映像の認識処理を行い、映像に含まれている人物やオブジェクトやシーンが何であるかを認識し、認識結果のテキストデータを出力する。また、コンテンツ生成部４３の音声認識エンジンは、デコーダー部４１から供給される音声の認識処理を行い、音声に含まれる語や文章が何であるかを認識し、認識結果のテキストデータを出力する。さらに、コンテンツ生成部４３は、映像認識や音声認識の結果として得られるテキストデータを、検索キー用のメタデータとして利用しやすい形に編集し、出力する。検索キー用のメタデータでは、検索語と、映像コンテンツや音声コンテンツにおける時刻位置（例えば、コンテンツの開始時からの相対時刻等）とが相互に関連付けられている。また、検索語と、映像等のシーンとが相互に関連付けられていてもよい。映像コンテンツをシーンの切り替えのポイントで分割したり、音声コンテンツを所定の長さの無音区間で分割したりすることは、既存技術を用いて行うことができる。端末装置３は、検索キー用メタデータとして生成したテキストデータを、サーバー装置２に渡す。サーバー装置２は、オリジナルコンテンツとともに、端末装置３から渡されたテキストデータを再配信する。コンテンツ加工システム１から再配信された加工コンテンツを受信する受信端末８側では、検索キー用のメタデータを参照することにより、映像コンテンツや音声コンテンツ内の検索語に該当する箇所をすばやくサーチする（例えば、頭出しする）ことが可能となる。つまり、受信端末８を操作する視聴者は、所望のシーン等を手軽に視聴することが可能となる。 19th Service Example: Timeline Search Key Generation For example, arbitrary video or audio content (which may include both) is distributed from the web server device 7 as original content. The terminal device 3 receives the original content, and the decoder unit 41 of the terminal device 3 decodes the content and provides video, audio, or the like to the content generation unit 43. The content generation unit 43 includes a video recognition engine and a voice recognition engine. The video recognition engine of the content generation unit 43 performs recognition processing on the video supplied from the decoder unit 41, recognizes what the persons, objects, and scenes contained in the video are, and outputs text data of the recognition result. Further, the voice recognition engine of the content generation unit 43 performs recognition processing on the audio supplied from the decoder unit 41, recognizes what the words and sentences contained in the audio are, and outputs text data of the recognition result.
Further, the content generation unit 43 edits the text data obtained as a result of the video recognition and voice recognition into a form that is easy to use as search-key metadata, and outputs it. In the search-key metadata, each search term is associated with a time position in the video or audio content (for example, a relative time from the start of the content). A search term may also be associated with a scene of the video or the like. Dividing video content at scene-change points and dividing audio content at silent sections of a predetermined length can be done with existing technology. The terminal device 3 passes the text data generated as the search-key metadata to the server device 2. The server device 2 redistributes the text data passed from the terminal device 3 together with the original content. On the receiving terminal 8 side, which receives the processed content redistributed from the content processing system 1, referring to the search-key metadata makes it possible to quickly seek to (for example, cue up) the part of the video or audio content that matches a search term. That is, the viewer who operates the receiving terminal 8 can easily view a desired scene or the like.
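The association between search terms and time positions can be pictured as an inverted index. The sketch below assumes the recognition engines have already produced (time, text) pairs; the whitespace tokenization and the `build_index` and `cue` functions are simplifications introduced for the example, not the metadata format of the embodiment.

```python
from collections import defaultdict
from typing import Optional


def build_index(recognized: list[tuple[float, str]]) -> dict[str, list[float]]:
    """Map each lower-cased word to the sorted time positions (s) where it occurs."""
    index: dict[str, list[float]] = defaultdict(list)
    for t, text in recognized:
        for word in set(text.lower().split()):
            index[word].append(t)
    return {word: sorted(times) for word, times in index.items()}


def cue(index: dict[str, list[float]], term: str,
        after: float = 0.0) -> Optional[float]:
    """First position at or after `after` where the term occurs, else None."""
    for t in index.get(term.lower(), []):
        if t >= after:
            return t
    return None


# Hypothetical recognition results: (seconds from start, recognized text).
recognized = [(12.0, "goal by the home team"),
              (95.5, "penalty kick"),
              (140.0, "second goal")]
index = build_index(recognized)
print(cue(index, "goal"))
print(cue(index, "goal", after=13.0))
```

A receiving terminal could pass the returned position to its player as a seek target.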

第２０サービス例：字幕生成
例えば、任意の映像または音声（両方を含んでもよい）のコンテンツが、オリジナルコンテンツとして、ウェブサーバー装置７から配信される。端末装置３はそのオリジナルコンテンツを受信し、端末装置３のデコーダー部４１はコンテンツをデコードして映像や音声等をコンテンツ生成部４３に提供する。コンテンツ生成部４３は、音声認識エンジンを備えている。コンテンツ生成部４３の音声認識エンジンは、デコーダー部４１から供給される音声の認識処理を行い、音声から文字起こしテキストデータを出力する。また、コンテンツ生成部４３は、自動的に、あるいは少なくとも一部で校閲者の操作にも基づいて、上記テキストデータから字幕テキストデータを生成する。例えば、字幕テキストデータは、タイムドテキストとして、あるいはタイムライン型テキストデータとして出力される。端末装置３は、生成された字幕テキストデータを、サーバー装置２に渡す。サーバー装置２は、オリジナルコンテンツと同期させる形で、端末装置３から渡されたテキストデータを再配信する。この加工コンテンツを受信する受信端末８側では、オリジナルコンテンツとともに字幕テキストを表示することが可能となる。あるいは、受信端末８側では、字幕テキストに対して所定の処理を行った結果を、オリジナルコンテンツに関連付けて出力することができる。 20th service example: Subtitle generation For example, arbitrary video or audio content (which may include both) is distributed from the web server device 7 as original content. The terminal device 3 receives the original content, and the decoder unit 41 of the terminal device 3 decodes the content and provides video, audio, or the like to the content generation unit 43. The content generation unit 43 includes a voice recognition engine. The voice recognition engine of the content generation unit 43 performs the voice recognition process of the voice supplied from the decoder unit 41, and outputs the transcription text data from the voice. In addition, the content generation unit 43 automatically or at least partially generates subtitle text data from the text data based on the operation of the reviewer. For example, the subtitle text data is output as timed text or timeline type text data. The terminal device 3 passes the generated subtitle text data to the server device 2. The server device 2 redistributes the text data passed from the terminal device 3 in synchronization with the original content. On the receiving terminal 8 side that receives the processed content, the subtitle text can be displayed together with the original content. Alternatively, the receiving terminal 8 side can output the result of performing a predetermined process on the subtitle text in association with the original content.
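As one concrete form of timed text, the sketch below serializes subtitle cues as WebVTT. The choice of WebVTT is an assumption made for illustration, since the text only says "timed text" or "timeline type text data"; the function names and the sample cues are likewise hypothetical.

```python
def vtt_time(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, hh:mm:ss.ttt."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def to_webvtt(cues: list[tuple[float, float, str]]) -> str:
    """Serialize (start, end, text) cues, timed relative to the content start."""
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        if end <= start:
            raise ValueError("cue must end after it starts")
        lines += [f"{vtt_time(start)} --> {vtt_time(end)}", text, ""]
    return "\n".join(lines)


# Hypothetical transcript fragments from a speech recognizer.
cues = [(1.0, 3.5, "Welcome to the broadcast."),
        (4.0, 6.25, "Kick-off is moments away.")]
print(to_webvtt(cues))
```

Because the cue times are relative to the start of the content, the subtitle text can be delivered separately and still be displayed in sync with the original segments.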

第２１サービス例：自動ハイライト映像の生成（１）
例えば、任意の映像または音声（両方を含んでもよい）のコンテンツが、オリジナルコンテンツとして、ウェブサーバー装置７から配信される。端末装置３はそのオリジナルコンテンツを受信し、端末装置３のデコーダー部４１はコンテンツをデコードして映像や音声等をコンテンツ生成部４３に提供する。コンテンツ生成部４３は、映像認識エンジンや音声認識エンジンを備えている。これらの認識エンジンは、人工知能の技術を援用するものであってもよい。コンテンツ生成部４３は、映像や音声を認識し解析することによって、オリジナルコンテンツ中の主要イベントを抽出する。この主要イベントは、例えば、音声の内容や、音圧レベルや、映像の内容や、映像内の特定のパターン等に基づき、学習済みの人工知能によって抽出される。コンテンツ生成部４３は、オリジナルコンテンツ中の、主要イベントを含む断片をハイライトシーンとして認識し、例えば複数のハイライトシーンのみを切り出して連結することにより、ハイライト映像（または音声）を生成する。ハイライト映像等は「ハイライトクリップ」とも呼ばれる。端末装置３は、このハイライトクリップのコンテンツをサーバー装置２に渡す。サーバー装置２は、オリジナルコンテンツとともに、端末装置３から渡されたハイライトクリップを再配信する。あるいは、サーバー装置２は、オリジナルコンテンツを置換して、端末装置３から渡されたハイライトクリップのみを再配信する。この仕組みにより、映像や音声のコンテンツのハイライトのみを容易に配信することが可能となる。 21st service example: Automatic highlight video generation (1)
For example, arbitrary video or audio content (which may include both) is distributed from the web server device 7 as original content. The terminal device 3 receives the original content, and the decoder unit 41 of the terminal device 3 decodes the content and provides video, audio, or the like to the content generation unit 43. The content generation unit 43 includes a video recognition engine and a voice recognition engine. These recognition engines may make use of artificial intelligence technology. The content generation unit 43 extracts the main events in the original content by recognizing and analyzing the video and audio. The main events are extracted by trained artificial intelligence based on, for example, the audio content, the sound pressure level, the video content, and specific patterns in the video. The content generation unit 43 recognizes fragments of the original content that include a main event as highlight scenes, and generates a highlight video (or audio) by, for example, cutting out only the highlight scenes and joining them together. Highlight videos and the like are also called "highlight clips". The terminal device 3 passes the highlight clip content to the server device 2. The server device 2 redistributes the highlight clip passed from the terminal device 3 together with the original content. Alternatively, the server device 2 replaces the original content and redistributes only the highlight clip passed from the terminal device 3. This mechanism makes it possible to easily deliver only the highlights of video and audio content.
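Of the cues listed above, the sound pressure level is the simplest to illustrate. The sketch below is a stand-in for the trained AI described in the text, not a reproduction of it: it marks fixed-length windows whose RMS level reaches a threshold and merges adjacent windows into candidate highlight spans. The window length, threshold, and sample values are arbitrary assumptions.

```python
import math


def rms(samples: list[float]) -> float:
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def loud_spans(samples: list[float], rate: int, window_s: float = 1.0,
               threshold: float = 0.5) -> list[tuple[float, float]]:
    """Return (start_s, end_s) spans whose RMS is at or above threshold.

    Consecutive loud windows are merged into one span.
    """
    size = int(rate * window_s)
    if size <= 0:
        raise ValueError("window must contain at least one sample")
    spans: list[tuple[float, float]] = []
    for start in range(0, len(samples) - size + 1, size):
        if rms(samples[start:start + size]) >= threshold:
            begin, end = start / rate, (start + size) / rate
            if spans and spans[-1][1] == begin:
                spans[-1] = (spans[-1][0], end)  # extend the previous span
            else:
                spans.append((begin, end))
    return spans


# Toy signal at 4 samples per second: quiet, loud for 2 s, quiet, loud for 1 s.
signal = [0.1] * 4 + [0.9] * 8 + [0.1] * 4 + [0.8] * 4
print(loud_spans(signal, rate=4))
```

The resulting spans could then be cut out and joined to form a highlight clip.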

以上、複数のサービス例を説明したが、コンテンツ加工システム１は、上で説明したサービス例のうちの複数を組み合わせて実施してもよい。 Although a plurality of service examples have been described above, the content processing system 1 may be implemented by combining a plurality of the service examples described above.

上記実施形態では、サーバー装置２と端末装置３とを用いてコンテンツ加工システム１を構成した。各装置の機能構成は図１および図２に示した通りである。しかし、変形例として、サーバー装置２と端末装置３とのそれぞれへの機能の配置を任意に変更してもよい。また、３個以上の装置に機能分散させてコンテンツ加工システム１を構成してもよい。また、サーバー装置２と端末装置３が有する機能を統合して１台の装置としてコンテンツ加工システム１を構成してもよい。 In the above embodiment, the content processing system 1 is configured by using the server device 2 and the terminal device 3. The functional configuration of each device is as shown in FIGS. 1 and 2. However, as a modification, the arrangement of functions in each of the server device 2 and the terminal device 3 may be arbitrarily changed. Further, the content processing system 1 may be configured by distributing the functions to three or more devices. Further, the contents processing system 1 may be configured as one device by integrating the functions of the server device 2 and the terminal device 3.

According to the present embodiment and its modifications, only the content to be added is encoded and distributed, while the original content is distributed as the original segment files without re-encoding. This makes it possible to process and redistribute streamed content with a relatively small-scale device configuration.
Further, by distributing the additional content in a form that replaces a part of the original segment files, content replacement can also be realized with a relatively small-scale device configuration.
In addition, a wide variety of services can be realized by using the present embodiment.
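As an illustration of the replacement form of distribution, the following minimal sketch builds a manifest in which selected original segments are swapped for additional segments while all other original segment files are referenced unchanged. It assumes, purely for illustration, that the manifest is an HLS-style media playlist (M3U8) and that each additional segment has the same duration as the original segment it replaces. The function name `build_replacement_playlist` and the segment URIs are hypothetical, and the playlist is closed with an end-of-list tag for simplicity, which a live stream would omit.

```python
import math
from typing import Dict, List, Tuple


def build_replacement_playlist(
    original: List[Tuple[float, str]],
    replacements: Dict[int, str],
) -> str:
    """Build an HLS-style media playlist from original (duration, uri) entries.

    `replacements` maps a segment position to the URI of an additional segment
    that takes its place. Original segments that are not replaced are listed
    with their original URIs, so they need no re-encoding.
    """
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{math.ceil(max(d for d, _ in original))}",
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    prev_replaced = False
    for i, (duration, uri) in enumerate(original):
        replaced = i in replacements
        # Mark a switch between original and additional segments, since the
        # additional segments may come from a different encoder.
        if i > 0 and replaced != prev_replaced:
            lines.append("#EXT-X-DISCONTINUITY")
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(replacements[i] if replaced else uri)
        prev_replaced = replaced
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"


# Three original 6-second segments; the middle one is replaced.
original_segments = [(6.0, "orig/seg0.ts"), (6.0, "orig/seg1.ts"), (6.0, "orig/seg2.ts")]
playlist = build_replacement_playlist(original_segments, {1: "added/seg1.ts"})
print(playlist)
```

Passing an empty `replacements` mapping yields a playlist that references only the original segment files, which corresponds to distributing the original content as it is.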

The embodiments, modifications, and service examples of the present invention have been described above in detail with reference to the drawings. However, the specific configuration is not limited to these embodiments, and designs and the like within a range that does not depart from the gist of the present invention are also included.

The present invention can be used, for example, in content distribution businesses (including broadcasting businesses). However, the scope of use of the present invention is not limited to the businesses exemplified here.

0 Distribution system
1 Content processing system (content processing device)
2 Server device
3 Terminal device
6 Encoder device
7 Web server device
8 Receiving terminal
21 Manifest acquisition unit
22 Manifest generation unit
25 Segment acquisition unit
26 Segment selection unit
28 Web server unit (redistribution unit)
31 Manifest acquisition unit
32 Manifest analysis unit
33 Segment acquisition unit
35 Time analysis unit
41 Decoder unit
42 Playback unit
43 Content generation unit
44 A/V interface unit (interface unit)
45 Mixer unit
48 Encoder unit
49 Segmentation unit
50 Upload unit

Claims

A content processing system comprising:
a manifest acquisition unit that acquires an original manifest file included in original content streamed using the hypertext transfer protocol;
a segment acquisition unit that acquires an original segment file included in the original content;
a decoder unit that decodes and outputs the original segment file;
an interface unit that acquires additional content to be newly added in association with the decoded original segment file;
an encoder unit that encodes the additional content;
a segmentation unit that generates an additional segment file by segmenting the encoded additional content so as to be synchronized in time with the original segment file;
a manifest generation unit that generates, based on the original manifest file, a processing manifest file such that the original segment file and the additional segment file are synchronized; and
a redistribution unit that distributes the original segment file, the additional segment file, and the processing manifest file as processed content using the hypertext transfer protocol.

The content processing system according to claim 1,
wherein the manifest generation unit generates the processing manifest file for reproducing addition-type processed content that includes all of the acquired original segment files and also includes the additional segment file.

The content processing system according to claim 1,
wherein the manifest generation unit generates the processing manifest file for reproducing replacement-type processed content that includes only a part of the acquired original segment files and also includes the additional segment file.

A content processing system comprising a server device and a terminal device,
wherein the terminal device comprises:
a manifest acquisition unit that acquires an original manifest file included in original content streamed using the hypertext transfer protocol;
a segment acquisition unit that acquires an original segment file included in the original content;
a decoder unit that decodes and outputs the original segment file;
an interface unit that acquires additional content to be newly added in association with the decoded original segment file;
an encoder unit that encodes the additional content; and
a segmentation unit that generates an additional segment file by segmenting the encoded additional content so as to be synchronized in time with the original segment file,
and wherein the server device comprises:
a manifest generation unit that generates, based on the original manifest file, a processing manifest file such that the original segment file and the additional segment file are synchronized; and
a redistribution unit that distributes the original segment file, the additional segment file, and the processing manifest file as processed content using the hypertext transfer protocol.

The content processing system according to any one of claims 1 to 4,
wherein the original segment file stores data obtained by encoding at least one of video and audio,
the content processing system further comprising a content generation unit that automatically generates the additional content based on the original segment file by analyzing the video or audio output by the decoder unit.

The content processing system according to claim 5,
wherein the content generation unit generates the additional content including text data obtained by performing recognition processing on the video or audio output by the decoder unit.

The content processing system according to any one of claims 1 to 6,
wherein the manifest generation unit has a function of generating, based on an instruction from the outside, a processing manifest file for reproducing only the original segment file, and
the redistribution unit has a function of distributing only the original segment file and the processing manifest file based on the instruction from the outside.

A terminal device comprising:
a manifest acquisition unit that acquires an original manifest file included in original content streamed using the hypertext transfer protocol;
a segment acquisition unit that acquires an original segment file included in the original content;
a decoder unit that decodes and outputs the original segment file;
an interface unit that acquires additional content to be newly added in association with the decoded original segment file;
an encoder unit that encodes the additional content; and
a segmentation unit that generates an additional segment file by segmenting the encoded additional content so as to be synchronized in time with the original segment file.

A program for causing a computer to function as a content processing system comprising:
a manifest acquisition unit that acquires an original manifest file included in original content streamed using the hypertext transfer protocol;
a segment acquisition unit that acquires an original segment file included in the original content;
a decoder unit that decodes and outputs the original segment file;
an interface unit that acquires additional content to be newly added in association with the decoded original segment file;
an encoder unit that encodes the additional content;
a segmentation unit that generates an additional segment file by segmenting the encoded additional content so as to be synchronized in time with the original segment file;
a manifest generation unit that generates, based on the original manifest file, a processing manifest file such that the original segment file and the additional segment file are synchronized; and
a redistribution unit that distributes the original segment file, the additional segment file, and the processing manifest file as processed content using the hypertext transfer protocol.

A program for causing a computer to function as a terminal device comprising:
a manifest acquisition unit that acquires an original manifest file included in original content streamed using the hypertext transfer protocol;
a segment acquisition unit that acquires an original segment file included in the original content;
a decoder unit that decodes and outputs the original segment file;
an interface unit that acquires additional content to be newly added in association with the decoded original segment file;
an encoder unit that encodes the additional content; and
a segmentation unit that generates an additional segment file by segmenting the encoded additional content so as to be synchronized in time with the original segment file.