JP7828984B2 - Processing apparatus, processing program, and processing method

Info

Publication number: JP7828984B2
Application number: JP2024013605A
Authority: JP
Inventors: 渡岡田
Original assignee: 株式会社フレクト
Priority date: 2024-01-31
Filing date: 2024-01-31
Publication date: 2026-03-12
Anticipated expiration: 2044-01-31
Also published as: JP2025118337A

Description

This disclosure relates to a processing device, a processing program, and a processing method for outputting speech information associated with a selected object.

Video distribution systems that operate over the internet are conventionally known. For example, Patent Document 1 describes a "video distribution system characterized by comprising: a recruitment management unit that notifies a user terminal of recruitment requirements including video distribution conditions and acquires submitted videos from the user terminal; a video analysis unit that analyzes whether the submitted videos are eligible for distribution and designates eligible submitted videos as distribution videos; and a video distribution management unit that distributes the distribution videos."

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2022-180967

In view of the technology described above, an object of this disclosure is to provide, through various embodiments, a processing device, a processing program, and a processing method that are easier to use for users such as receivers.

According to one aspect of this disclosure, there is provided a processing device comprising at least one processor, wherein the at least one processor is configured to execute processing for: receiving, from a sender terminal device via a communication interface, speech information that is input in association with each of a plurality of objects included in content generated at the sender terminal device; selecting at least one object from among the plurality of objects via an input interface; and, when outputting the speech information via an output interface, outputting the speech information associated with the selected at least one object.

According to one aspect of this disclosure, there is provided a processing program that, in a computer comprising at least one processor, causes the at least one processor to function so as to execute processing for: receiving, from a sender terminal device via a communication interface, speech information that is input in association with each of a plurality of objects included in content generated at the sender terminal device; selecting at least one object from among the plurality of objects via an input interface; and, when outputting the speech information via an output interface, outputting the speech information associated with the selected at least one object.

According to one aspect of this disclosure, there is provided a processing method executed by at least one processor in a computer comprising the at least one processor, the method comprising: receiving, from a sender terminal device via a communication interface, speech information that is input in association with each of a plurality of objects included in content generated at the sender terminal device; selecting at least one object from among the plurality of objects via an input interface; and, when outputting the speech information via an output interface, outputting the speech information associated with the selected at least one object.

According to this disclosure, it is possible to provide a processing device, a processing program, and a processing method that are easier to use for users such as receivers.

The above effect is merely an example given for convenience of explanation and is not limiting. In addition to, or instead of, the above effect, any effect described in this disclosure or any effect apparent to those skilled in the art may also be achieved.

Figure 1A is a diagram showing an overview of processing in the processing system 1 according to an embodiment of this disclosure.
Figure 1B is a block diagram showing the configuration of the processing system 1 according to one embodiment of this disclosure.
Figure 2A is a block diagram showing the configuration of the server device 100 according to one embodiment of this disclosure.
Figure 2B is a block diagram showing the configuration of the terminal device 200 according to one embodiment of this disclosure.
Figure 3 is a diagram schematically showing information transmitted from the sender terminal device 200-1 as transmission information according to one embodiment of this disclosure.
Figure 4 is a diagram showing a processing sequence executed in the processing system 1 according to one embodiment of this disclosure.
Figure 5 is a diagram showing a processing flow executed in the receiver terminal device 200-2 according to one embodiment of this disclosure.
Figure 6 is a diagram showing an example of a screen output on the sender terminal device 200-1 according to one embodiment of this disclosure.
Figure 7 is a diagram showing an example of a screen output on the receiver terminal device 200-2 according to one embodiment of this disclosure.
Figure 8A is a diagram showing a processing sequence executed in the processing system 1 according to one embodiment of this disclosure.
Figure 8B is a diagram showing a processing sequence executed in the processing system 1 according to one embodiment of this disclosure.
Figure 9 is a diagram showing an overview of processing in the processing system 1 according to an embodiment of this disclosure.

1. Overview of Processing System 1
The processing system 1 according to this disclosure is used so that, for content transmitted from a sender, a receiver can output the speech information associated with a desired object. As one example, for video content transmitted from a sender, the receiver selects one of the character objects appearing in the video content, and the processing system 1 is used to output only the audio associated with that character object.

Figure 1A is a diagram showing an overview of processing in the processing system 1 according to an embodiment of this disclosure. Specifically, Figure 1A shows an example of processing in the distribution of video content using the processing system 1. According to Figure 1A, a user who is the sender uses an available sender terminal device to transmit, via the server device, video content in which the objects character A and character B appear to the receiver terminal device of a user who is the receiver. For this video content, for example, the sender personally inputs both the audio information that is voice A of character A and the audio information that is voice B of character B (typically, the sender is assumed to play both character A and character B in the video content).

The user who is the receiver then uses an available receiver terminal device to receive the video content from the sender terminal device via the server device and play it back. Depending on, for example, personal preference or circumstances, the receiver may want to output the voice of only one of character A and character B from the video content being played back or, put another way, to mute the other. If the receiver could only change the audio volume or the audio settings of the playback application, the only options would be to mute all of the audio transmitted from the sender terminal device, that is, both voice A of character A and voice B of character B, or to keep outputting both. In the processing system 1, however, identification information for identifying each piece of audio information is attached in advance to the audio information of voice A of character A and to the audio information of voice B of character B. This makes it possible to output only the audio the receiver wants while restricting, that is, muting, the output of the other. In the example of Figure 1A, only voice A is output, and the output of voice B is restricted, that is, muted.

The processing system 1 is typically used for video content in which characters such as character A and character B appear, but it can also be used for other video content such as video conferences and conference calls. In such cases as well, as described above, the output of any of the voices can be restricted by specifying the character or the identification information of a user participating in the video conference or conference call.

In this way, in the processing system 1, speech information (for example, the audio information of voice A and the audio information of voice B) is input at the sender terminal device in association with each of a plurality of objects (for example, character A and character B) included in content (for example, video content). At the receiver terminal device, at least one of the plurality of objects (for example, character A) is selected. The receiver terminal device then permits output of the speech information (for example, the audio information of voice A) associated with the selected at least one object (for example, character A) and restricts output of the speech information (for example, the audio information of voice B) associated with any object other than the selected one (for example, character B).
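The permit/restrict behavior summarized above can be illustrated with a minimal Python sketch. This is not the implementation described in the disclosure: the `SpeechInfo` type, the `split_by_selection` function, and the identifiers `character_A` and `character_B` are hypothetical names introduced only for illustration, and a plain string stands in for the audio data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechInfo:
    """One piece of speech information tagged with the object it belongs to."""
    object_id: str  # hypothetical identification information, e.g. "character_A"
    payload: str    # stands in for the audio data of the utterance


def split_by_selection(speech_items, selected_ids):
    """Return (allowed, restricted) lists based on the receiver's selection."""
    selected = set(selected_ids)
    allowed = [s for s in speech_items if s.object_id in selected]
    restricted = [s for s in speech_items if s.object_id not in selected]
    return allowed, restricted


# Speech information as it might arrive at the receiver terminal device.
received = [
    SpeechInfo("character_A", "voice A, line 1"),
    SpeechInfo("character_B", "voice B, line 1"),
    SpeechInfo("character_A", "voice A, line 2"),
]

# The receiver selects character A, so only voice A is permitted for output.
allowed, restricted = split_by_selection(received, {"character_A"})
```

Because each item carries its own identifier, the decision is made per item, which is what separates this from a single volume or mute setting applied to the whole audio stream.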

In this disclosure, "sender" and "receiver" are merely designations used to distinguish the party that transmits content from the party that receives it. That is, a party described as a sender can be a receiver when it receives content from another party, and a party described as a receiver can be a sender when it transmits content to another party. Senders and receivers are not limited to individuals and may be organizations such as companies or groups. This disclosure mainly describes the case in which the sender generates the content personally, but the sender and the party that generates the content may be different. In that case, even if the party that generates the content only generates it and does not transmit it, that party is included among the senders as long as the generated content is transmitted by some party.

Likewise, in this disclosure, "sender terminal device" and "receiver terminal device" are merely designations used to distinguish the terminal device that transmits content from the terminal device that receives it. That is, a device described as a sender terminal device can be a receiver terminal device when it receives content from another terminal device, and a device described as a receiver terminal device can be a sender terminal device when it transmits content to another terminal device.

In this disclosure, "content" means a unit of electronic information transmitted and received via a communication network. Examples of such content include video content, music content, game content, publication content, chat content, SNS (social networking service) content, web content, and combinations of these. Among these, the processing system 1 is preferably used for video content that includes at least image information in which a plurality of character objects appear as characters and audio information associated with each of the character objects. In this disclosure, video content includes not only content distributed through video distribution sites and the like but also, for example, video conference content (including cases in which the camera function is turned off and only audio is transmitted and received), conference call content, and electronic advertising content such as digital signage. Unless otherwise noted, the following description uses video content as the example of content, but content is of course not limited to video content.

In this disclosure, "object" means data included in content or a means for performing operation input on that data. Examples of such objects include character objects, structure objects, decoration objects, text objects, image objects, GUI objects, and combinations of these. Among these, the processing system 1 is preferably used for character objects that appear as characters in video content (for example, character A and character B in Figure 1A). Unless otherwise noted, the following description uses character objects as the example of objects, but objects are of course not limited to character objects.

In this disclosure, "processing device" means any of the devices that make up the processing system 1 and may be the server device, the sender terminal device, or the receiver terminal device. The processing device is not limited to any one of these devices alone and may be a combination of multiple devices among which the processing performed by the processing device is distributed. "Processing program" and "processing method" mean the program and the method executed by the processing device.

2. Configuration of Processing System 1
Figure 1B is a block diagram showing the configuration of the processing system 1 according to one embodiment of this disclosure. According to Figure 1B, the processing system 1 includes a server device 100 that processes content (for example, video content), a sender terminal device 200-1 that transmits content, and a receiver terminal device 200-2 that receives content, and these are communicably connected via a communication network.

Although Figure 1B shows a single sender terminal device 200-1 and a single receiver terminal device 200-2, a plurality of each may of course be included.

Similarly, although Figure 1B shows a single server device 100, multiple server devices and other devices may be combined to distribute processing and storage. In that case, the server device 100 may also refer to a combination of multiple server devices and other devices.

3. Configuration of Server Device 100
Figure 2A is a block diagram showing the configuration of the server device 100 according to one embodiment of this disclosure. The server device 100 does not need to include all of the components shown in Figure 2A; some may be omitted, and other components may be added. The server device 100 also does not need to house the components shown in Figure 2A in a single enclosure; the components and processing of the server device 100 may be distributed across multiple devices.

According to Figure 2A, the server device 100 includes a processor 111 composed of a CPU or the like, a memory 112 including RAM, ROM, non-volatile memory, an HDD, and the like, and a communication interface 113. These components are electrically connected to one another via control lines and data lines.

The processor 111 is composed of a CPU (microcomputer) and functions as a control unit that controls the other connected components on the basis of various programs stored in the memory 112. The processor 111 reads from the memory 112 and executes the program for running the application according to this disclosure and the program for running the OS. Specifically, on the basis of the programs stored in the memory 112, the processor 111 executes processing such as "receiving, from the sender terminal device 200-1 via the communication interface 113, speech information that is input in association with each of a plurality of objects included in content generated at the sender terminal device 200-1" and "transmitting the received content to the receiver terminal device 200-2 via the communication interface 113." The processor 111 is mainly composed of one or more CPUs, but a GPU, an FPGA, or the like may be combined as appropriate.

The memory 112 includes RAM, ROM, non-volatile memory, and an HDD, and functions as a storage unit. The ROM stores, as programs, the instructions for running the application according to this disclosure and the OS. These programs are loaded and executed by the processor 111. The RAM is used for writing and reading data while a program stored in the ROM is being processed by the processor 111. The non-volatile memory is memory to which data is written and from which data is read through execution of the program, and data written to it is retained even after execution of the program has ended. Specifically, the memory 112 stores the programs with which the processor 111 executes the processing described above.

The communication interface 113 functions as a communication unit that transmits and receives information to and from other devices, such as the remotely located sender terminal device 200-1 and receiver terminal device 200-2, via a communication processing circuit and an antenna. The communication processing circuit performs processing for transmitting and receiving the programs and various information used in the processing system 1 as processing progresses. The communication processing circuit operates on the basis of a broadband wireless communication scheme typified by LTE, but it can also operate on the basis of a scheme for narrowband wireless communication, such as wireless LAN typified by IEEE 802.11 or Bluetooth (registered trademark), or a scheme for contactless wireless communication. Wired communication can also be used instead of, or in addition to, wireless communication.

4. Configuration of Terminal Device 200
Figure 2B is a block diagram showing the configuration of the terminal device 200 according to one embodiment of this disclosure. The terminal device 200 does not need to include all of the components shown in Figure 2B; some may be omitted, and other components may be added. The terminal device 200 is used as the sender terminal device 200-1 or the receiver terminal device 200-2, but the two do not need to have the same configuration, and each terminal device may have a different configuration.

According to Figure 2B, the terminal device 200 includes a processor 211 composed of a CPU or the like, a memory 212 including RAM, ROM, non-volatile memory, an HDD, and the like, a communication interface 213, an input interface 214, and an output interface 215. These components are electrically connected to one another via control lines and data lines.

The processor 211 is composed of a CPU (microcomputer) and functions as a control unit that controls the other connected components on the basis of various programs stored in the memory 212. The processor 211 reads from the memory 212 and executes the program for running the application according to this disclosure and the program for running the OS. The processor 211 is mainly composed of one or more CPUs, but a GPU, an FPGA, or the like may be combined as appropriate.

When the terminal device 200 functions as the sender terminal device 200-1, the processor 211 executes, on the basis of the programs stored in the memory 212, processing such as "accepting an operation input by the sender via the input interface 214 and launching an application program for generating content," "accepting an operation input by the sender via the input interface 214 and selecting one of the plurality of objects included in the content on the basis of identification information input to identify that object," "inputting, via the input interface 214, speech information in association with the selected object," and "transmitting content including image information and speech information to the server device 100 via the communication interface 213."
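The sender-side steps listed above (select an object by its identification information, input speech in association with it, then assemble the content for transmission) might be sketched as follows. The `ContentBuilder` class, its method names, and the dictionary layout of the assembled content are assumptions made for illustration and are not specified in the disclosure; transmission to the server device 100 is omitted.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContentBuilder:
    """Collects speech input on the sender side, tagged per object (hypothetical)."""
    object_ids: tuple                  # objects that appear in the content
    current: Optional[str] = None      # object currently selected by the sender
    speech: list = field(default_factory=list)

    def select_object(self, object_id):
        """Select the object that subsequent speech input is associated with."""
        if object_id not in self.object_ids:
            raise ValueError(f"unknown object: {object_id}")
        self.current = object_id

    def input_speech(self, payload):
        """Record speech input together with the selected object's identifier."""
        if self.current is None:
            raise RuntimeError("select an object before entering speech")
        self.speech.append({"object_id": self.current, "payload": payload})

    def build(self, image_info):
        """Assemble the content that would be sent to the server device."""
        return {"image_info": image_info, "speech_info": list(self.speech)}


# The sender plays both characters, switching the selected object in between.
builder = ContentBuilder(object_ids=("character_A", "character_B"))
builder.select_object("character_A")
builder.input_speech("voice A")
builder.select_object("character_B")
builder.input_speech("voice B")
content = builder.build(image_info="frames")
```

Tagging each utterance at input time is the design point here: the receiver can later filter by identifier without having to separate the voices from a mixed audio track.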

When the terminal device 200 functions as the receiver terminal device 200-2, the processor 211 executes, on the basis of the programs stored in the memory 212, processing such as "accepting an operation input by the receiver via the input interface 214, launching an application program for outputting content, and selecting the desired content," "receiving, from the sender terminal device 200-1 via the communication interface 213, content including speech information that is input in association with each of a plurality of objects included in the content generated at the sender terminal device 200-1," "outputting the selected content via the output interface 215," "selecting at least one of the plurality of objects included in the content via the input interface 214," and "when outputting the speech information included in the content via the output interface 215, outputting the speech information associated with the selected at least one object."

The memory 212 includes RAM, ROM, or non-volatile memory and functions as a storage unit. The ROM stores, as programs, the instructions for running the application according to this disclosure and the OS. These programs are loaded and executed by the processor 211. The RAM is used for writing and reading data while a program stored in the ROM is being processed by the processor 211. The non-volatile memory is memory to which data is written and from which data is read through execution of the program, and data written to it is retained even after execution of the program has ended. Specifically, the memory 212 stores the programs with which the processor 211 executes the processing described above.

The communication interface 213 functions as a communication unit that transmits and receives information to and from the connected server device 100 and other terminal devices 200 via a communication processing circuit. The communication processing circuit performs processing for transmitting and receiving the programs and various information used in the processing system 1 as processing progresses. The communication processing circuit operates on the basis of a broadband wireless communication scheme typified by LTE, but it can also operate on the basis of a scheme for narrowband wireless communication, such as wireless LAN typified by IEEE 802.11 or Bluetooth (registered trademark), or a scheme for contactless wireless communication. Wired communication can also be used instead of, or in addition to, wireless communication.

The input interface 214 functions as an input unit that accepts operation input to the terminal device 200 by the sender or receiver and input of various information by the sender or receiver. Examples of the input interface 214 include hard keys such as a keyboard and a mouse; a touch panel that is superimposed on the display of a display device and has an input coordinate system corresponding to the display coordinate system of the display; a microphone for inputting audio information, which is one type of speech information; and sensors for sensing the external environment, such as a camera for capturing images. In the case of a touch panel, icons corresponding to the commands to be input are shown on the display, and the user or business operator selects an icon by performing operation input via the touch panel. Operation input on the touch panel may be detected by any method, such as a capacitive or resistive method. The input interface 214 does not always need to be physically provided in the terminal device 200 and may be connected as needed via a wired or wireless network.

The output interface 215 functions as an output unit for outputting various information. One example of the output interface 215 is an interface for connecting to an external device or apparatus such as a display device composed of a liquid crystal panel, an organic EL display, a plasma display, or the like. If the terminal device 200 itself has a display, however, that display can function as the output interface. In addition, if the terminal device 200 is connected to a display device or the like via the communication interface 213, the communication interface 213 can also function as the output interface 215.

６．コンテンツの例 6. Examples of Content
本実施形態において、上記のとおり、送信者端末装置２００－１においてコンテンツが生成され、サーバ装置１００を介して受信者端末装置２００－２に生成されたコンテンツが出力される。このようなコンテンツには複数のオブジェクトが含まれ、各オブジェクトに対して発話情報が対応付けられている。このようなコンテンツは、通信ネットワークを介して送受信されるひとまとまりの電子的な情報を意味する。その一例としては、動画コンテンツ、音楽コンテンツ、ゲームコンテンツ、出版物コンテンツ、チャットコンテンツ、ＳＮＳコンテンツ、ウェブコンテンツ及びこれらの組み合わせ等が挙げられる。これらの中でも、処理システム１は、複数のオブジェクトであるキャラクタオブジェクトが登場人物として含まれる画像情報と各キャラクタオブジェクトに対してそれぞれ関連付けられた音声情報を少なくとも含む動画コンテンツに対して、好ましくは用いられる。なお、以下では、特に言及しない限り、コンテンツが動画コンテンツである場合を例に説明するが、当然にコンテンツが動画コンテンツのみに限定されるわけではなく、他のコンテンツであっても本実施形態に係る処理は同様に実行可能である。 In this embodiment, as described above, content is generated at the sender terminal device 200-1, and the generated content is output at the receiver terminal device 200-2 via the server device 100. Such content includes a plurality of objects, and speech information is associated with each object. Here, content means a unit of electronic information transmitted and received via a communication network. Examples include video content, music content, game content, publication content, chat content, SNS content, web content, and combinations thereof. Among these, the processing system 1 is preferably used for video content that includes at least image information in which a plurality of objects, namely character objects, appear as characters, and audio information associated with each of those character objects. In the following, unless otherwise specified, the description takes video content as the example, but the content is of course not limited to video content, and the processing according to this embodiment can be executed in the same way for other types of content.

図３は、本開示の一実施形態に係る送信情報として送信者端末装置２００－１から送信される情報を概略的に示す図である。具体的には、図３は、送信者端末装置２００－１において生成されメモリ２１２に記憶されたのちに、サーバ装置１００に送信されコンテンツ管理テーブルに記憶される動画コンテンツの一例を示す図である。 Figure 3 is a schematic diagram illustrating information transmitted from the sender terminal device 200-1 as transmission information according to one embodiment of this disclosure. Specifically, Figure 3 shows an example of video content that is generated in the sender terminal device 200-1, stored in memory 212, and then transmitted to the server device 100 and stored in the content management table.

図３によると、動画コンテンツは、当該動画コンテンツのコンテンツＩＤ情報に対応付けて、画像情報と音声情報を含む。「コンテンツＩＤ情報」は、各動画コンテンツに対して固有の情報であり、各動画コンテンツを識別するための情報である。当該コンテンツＩＤ情報は、送信者端末装置２００－１において新たな動画コンテンツの生成がされるたび、又はサーバ装置１００において新たな動画コンテンツが受信されるたびに生成される。 According to Figure 3, video content includes image information and audio information, associated with the content ID information of the video content. "Content ID information" is unique to each video content and is used to identify each video content. This content ID information is generated each time new video content is generated at the sender terminal device 200-1, or each time new video content is received at the server device 100.

「画像情報」は、動画コンテンツを構成する画像データである。当該画像情報は、静止画像、動画像及びこれらの組み合わせのいずれであってもよい。このような画像情報は、送信者端末装置２００－１において入力インターフェイス２１４の一つとして備えられたカメラによって実空間を撮影されたものであってもよいし、プロセッサ２１１の処理によって仮想的に生成されたものであってもよい。画像情報には、互いに識別可能である複数のオブジェクトが少なくとも含まれ、各オブジェクトに対応付けてオブジェクトＩＤ情報が付与されている。例えば、図３の例では、画像情報には、オブジェクトＩＤ情報が「Ｂ１」であるキャラクタＡのキャラクタオブジェクトと、オブジェクトＩＤ情報が「Ｂ２」であるキャラクタＢのキャラクタオブジェクトが含まれる。このようなオブジェクトは、一例としては、キャラクタオブジェクト、構造物オブジェクト、装飾オブジェクト、テキストオブジェクト、画像オブジェクト、ＧＵＩオブジェクト及びこれらの組み合わせ等が挙げられる。これらの中でも、処理システム１は、動画コンテンツ内において登場人物として含まれるようなキャラクタオブジェクト（例えば、図１ＡのキャラクタＡ及びキャラクタＢ）に対して、好ましくは用いられる。なお、以下では、特に言及しない限り、オブジェクトの例としてキャラクタオブジェクトの場合を説明するが、当然にオブジェクトがキャラクタオブジェクトに限定されるわけではない。 "Image information" refers to image data that constitutes the video content. This image information may be a still image, a moving image, or a combination thereof. Such image information may be captured from real space by a camera provided as one of the input interfaces 214 in the sender terminal device 200-1, or it may be virtually generated by the processing of the processor 211. The image information includes at least multiple objects that are identifiable from one another, and each object is assigned object ID information. For example, in the example in Figure 3, the image information includes a character object of character A with object ID information "B1" and a character object of character B with object ID information "B2". Examples of such objects include character objects, structural objects, decorative objects, text objects, image objects, GUI objects, and combinations thereof. Among these, the processing system 1 is preferably used for character objects that are included as characters in the video content (for example, character A and character B in Figure 1A). Note that, unless otherwise specified, the following explanation will describe the case of character objects as examples of objects, but of course, objects are not limited to character objects.

「音声情報」は、発話情報の一つであり、動画コンテンツを構成する音声データである。当該音声情報は、一例としては、送信者端末装置２００－１において入力インターフェイス２１４の一つとして備えられたマイクによって送信者の音声等が入力された音声データである。ただし、これ以外にも、音声情報は、例えば、入力インターフェイス２１４を介して入力されたテキスト情報に基づいてキャラクタオブジェクトの音声を再現した音声データや、マイクによって入力された送信者の音声をテキスト化したテキストデータ、入力インターフェイス２１４を介して入力されたテキスト情報に基づいて生成されたテキストデータ、又はこれらのうちの少なくともいずれかを変換した他のデータであってもよい。このような音声情報は、典型的には、動画コンテンツに含まれる各オブジェクトの各オブジェクトＩＤ情報に対応付けて記憶される。例えば、図３の例では、キャラクタＡのオブジェクトＩＤ情報である「Ｂ１」に対応付けて音声情報Ａが記憶され、キャラクタＢのオブジェクトＩＤ情報である「Ｂ２」に対応付けて音声情報Ｂが記憶され、いずれのオブジェクトＩＤ情報にも対応付けられていない音声情報としてＢＧＭ音声情報が記憶されている。 "Audio information" is a type of speech information and constitutes audio data that makes up the video content. For example, this audio information is audio data input by a microphone provided as one of the input interfaces 214 in the sender terminal device 200-1, containing the sender's voice, etc. However, the audio information may also be, for example, audio data that reproduces the voice of a character object based on text information input via the input interface 214, text data that transcribes the sender's voice input via the microphone, text data generated based on text information input via the input interface 214, or other data converted from at least one of these. Such audio information is typically stored in association with the object ID information of each object included in the video content. For example, in the example in Figure 3, audio information A is stored in association with "B1," which is the object ID information of character A, audio information B is stored in association with "B2," which is the object ID information of character B, and BGM audio information is stored as audio information not associated with any object ID information.

すなわち、図３によると、コンテンツＩＤ情報が「Ａ１」である動画コンテンツがコンテンツの一例として示されている。当該動画コンテンツには、Ｆ１からＦｎの複数のフレームで構成され時間ｔ０から時間ｔｎの長さを有する動画である画像情報が含まれる。当該画像情報のうちの少なくともいずれかのフレームには、その登場人物として、オブジェクトＩＤ情報が「Ｂ１」であるキャラクタＡと、オブジェクトＩＤ情報が「Ｂ２」であるキャラクタＢが、それぞれオブジェクトとして含まれる。ここで、例えば図１Ａで例示したように、送信者自らが自身の送信者端末装置２００－１を使って、キャラクタＡ及びキャラクタＢをそれぞれ演じ分けている場合が想定されている。したがって、当該動画コンテンツには、時間ｔ０で入力が開始され時間ｔ２で入力が終了されたキャラクタＡの音声情報Ａが含まれる。また、当該動画コンテンツには、時間ｔ１において送信者による操作入力が受け付けられることによって、時間ｔ１で入力が開始され時間ｔ４で入力が終了されたキャラクタＢの音声情報Ｂが含まれる。また、当該動画コンテンツには、時間ｔ３において送信者による操作入力が受け付けられることによって、時間ｔ３で入力が開始され時間ｔ６で入力が終了されたキャラクタＡの音声情報Ａが含まれる。また、当該動画コンテンツには、時間ｔ５において送信者による操作入力が受け付けられることによって、時間ｔ５で入力が開始され時間ｔｎで入力が終了されたキャラクタＢの音声情報Ｂが含まれる。さらに、時間ｔ１から時間ｔ６の期間においては、いずれのオブジェクトにも対応付けられていない音声情報としてＢＧＭ音声情報が含まれている。すなわち、図３の例では、例えば時間ｔ１から時間ｔ２、時間ｔ３から時間ｔ４、及び時間ｔ５から時間ｔ６では、キャラクタＡの音声情報Ａ及びキャラクタＢの音声情報Ｂが同時に再生されることとなる。 That is, Figure 3 shows video content whose content ID information is "A1" as an example of content. This video content includes image information, which is a video composed of a plurality of frames F1 to Fn and having a length from time t0 to time tn. At least one of the frames of the image information includes, as objects, character A with object ID information "B1" and character B with object ID information "B2", each appearing as a character. Here, as illustrated in Figure 1A for example, it is assumed that the sender uses the sender's own sender terminal device 200-1 to play both character A and character B. Accordingly, the video content includes audio information A of character A, whose input started at time t0 and ended at time t2. The video content also includes audio information B of character B, whose input started at time t1 and ended at time t4 as a result of an operation input by the sender being accepted at time t1. The video content also includes audio information A of character A, whose input started at time t3 and ended at time t6 as a result of an operation input by the sender being accepted at time t3. The video content also includes audio information B of character B, whose input started at time t5 and ended at time tn as a result of an operation input by the sender being accepted at time t5. Furthermore, during the period from time t1 to time t6, BGM audio information is included as audio information not associated with any object. That is, in the example of Figure 3, audio information A of character A and audio information B of character B are played back simultaneously from time t1 to time t2, from time t3 to time t4, and from time t5 to time t6, for example.
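
The overlap described for Figure 3 can be checked with a small sketch. This is only an illustration, not the disclosed implementation: the tuple layout, the `overlaps` function, the mapping of time t_i to the integer i, and n = 7 are all assumptions made for the example, and the last segment of audio information B is assumed to start at t5, which is what the stated t5 to t6 overlap implies.

```python
from itertools import combinations

# Hypothetical encoding of the Figure 3 timeline: time t_i is written as the
# integer i, and n = 7 is chosen only for illustration.
SEGMENTS = [
    ("B1", 0, 2),  # audio information A (character A): t0 to t2
    ("B2", 1, 4),  # audio information B (character B): t1 to t4
    ("B1", 3, 6),  # audio information A (character A): t3 to t6
    ("B2", 5, 7),  # audio information B (character B): t5 to tn (assumed start)
]


def overlaps(segments):
    """Return sorted (start, end) spans where audio of different objects overlaps."""
    result = []
    for (id_a, s_a, e_a), (id_b, s_b, e_b) in combinations(segments, 2):
        start, end = max(s_a, s_b), min(e_a, e_b)
        if id_a != id_b and start < end:
            result.append((start, end))
    return sorted(result)


print(overlaps(SEGMENTS))  # [(1, 2), (3, 4), (5, 6)]
```

The three spans correspond to t1 to t2, t3 to t4, and t5 to t6, the periods in which both characters' audio would be played back at once.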

このように、コンテンツには、動画コンテンツを例にすると、コンテンツＩＤ情報に対応付けて、画像情報と音声情報が含まれる。また、当該画像情報には、時間（例えば、ｔ０～ｔｎ）に同期するように、時間に対応付けて動画を構成する各フレーム（画像データ）、各フレームを識別するフレームＩＤ情報（例えば、Ｆ１～Ｆｎ）、及び画像情報の少なくともいずれかのフレームに含まれる各オブジェクトを識別するオブジェクトＩＤ情報が含まれる。また、当該音声情報には、時間（例えば、ｔ０～ｔｎ）に同期するように、時間に対応付けて各音声データ、及び各音声データに対応付けられたオブジェクトＩＤ情報（対応付けられたオブジェクトＩＤ情報がない場合もある）が含まれる。 Thus, taking video content as an example, content includes image information and audio information in association with content ID information. The image information includes, in association with time so as to be synchronized with time (e.g., t0 to tn), each frame (image data) constituting the video, frame ID information identifying each frame (e.g., F1 to Fn), and object ID information identifying each object included in at least one frame of the image information. The audio information includes, in association with time so as to be synchronized with time (e.g., t0 to tn), each piece of audio data and the object ID information associated with that audio data (in some cases no object ID information is associated).
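
One way to picture the structure summarized above is as a few nested records. The following is a minimal sketch under stated assumptions: the class names `Frame`, `AudioSegment`, and `Content`, their field names, and the use of integer time indices are hypothetical and do not come from the disclosure, which does not prescribe a storage format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Frame:
    frame_id: str            # frame ID information, e.g. "F1"
    time: int                # index of the time the frame is synchronized with
    image_data: bytes = b""  # placeholder for the image data of the frame


@dataclass
class AudioSegment:
    start: int                       # time index at which the audio data starts
    end: int                         # time index at which the audio data ends
    audio_data: bytes = b""          # placeholder for the audio data
    object_id: Optional[str] = None  # None when no object is associated (e.g. BGM)


@dataclass
class Content:
    content_id: str                                      # content ID information
    frames: List[Frame] = field(default_factory=list)
    object_ids: List[str] = field(default_factory=list)  # objects in the frames
    audio: List[AudioSegment] = field(default_factory=list)


content = Content(
    content_id="A1",
    frames=[Frame(f"F{i + 1}", i) for i in range(3)],
    object_ids=["B1", "B2"],
    audio=[AudioSegment(0, 2, object_id="B1"), AudioSegment(1, 6)],
)
```

Here the second `AudioSegment` has no `object_id`, mirroring audio information such as BGM that is not associated with any object.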

なお、図３に示す動画コンテンツは、上記のとおりコンテンツの一例であるにすぎない。したがって、コンテンツとして動画コンテンツを用いる場合であっても、上記において例示する各種情報の全てを備える必要はないし、他の情報をさらに備えていてもよい。 The video content shown in Figure 3 is merely one example of content, as described above. Therefore, even when using video content, it is not necessary to include all of the various types of information exemplified above, and additional information may be included.

また、上記のとおり音声情報は発話情報の一つであり、発話情報は、オブジェクトのオブジェクトＩＤ情報に対応付けられ、送信者が入力した情報を再現可能な情報であればいずれでもよく、音声情報以外にも、例えばテキスト情報や画像情報であってもよい。 Furthermore, as mentioned above, audio information is a type of speech information. Speech information can be associated with the object ID information of an object, and can be any type of information that can reproduce the information entered by the sender. Besides audio information, it can also be text information or image information, for example.

７．処理システム１により実行される処理シーケンス 7. Processing Sequence Executed by the Processing System 1
図４は、本開示の一実施形態に係る処理システム１で実行される処理シーケンスを示す図である。具体的には、図４は、送信者端末装置２００－１においてコンテンツが生成され、サーバ装置１００を介して受信者端末装置２００－２において生成されたコンテンツが出力されるまでの一連の処理シーケンスを示す図である。各装置における処理は、各装置のメモリに記憶されたプログラムをプロセッサが処理することによって実行される。 Figure 4 is a diagram showing a processing sequence executed by the processing system 1 according to one embodiment of the present disclosure. Specifically, Figure 4 shows the series of processing from when content is generated at the sender terminal device 200-1 until the generated content is output at the receiver terminal device 200-2 via the server device 100. The processing in each device is executed by the processor of that device processing a program stored in the memory of that device.

（Ａ）コンテンツの生成に係る処理 (A) Processing Related to Content Generation
図４によると、まず、主に送信者端末装置２００－１においてコンテンツの生成に係る処理が実行される。送信者端末装置２００－１のプロセッサ２１１は、入力インターフェイス２１４を介して送信者による操作入力を受け付けて、コンテンツの生成のためのアプリケーションプログラムをメモリ２１２から読み出して、当該アプリケーションプログラムの起動を行う（Ｓ１１）。アプリケーションプログラムが起動されると、プロセッサ２１１は、図３に示すように、コンテンツの画像情報として、フレームＦ１である画像データの記憶を開始する。このとき、プロセッサ２１１は、入力インターフェイス２１４を介して、コンテンツの画像情報に含まれるオブジェクトに関連付けて音声情報の入力を所望するための送信者による操作入力を受け付けると（Ｓ１２）、出力インターフェイス２１５を介して音声入力画面を出力する。 According to Figure 4, processing related to content generation is first executed, mainly at the sender terminal device 200-1. The processor 211 of the sender terminal device 200-1 accepts an operation input by the sender via the input interface 214, reads the application program for content generation from the memory 212, and starts that application program (S11). When the application program has started, the processor 211 begins storing image data, starting with frame F1, as the image information of the content, as shown in Figure 3. At this time, when the processor 211 accepts, via the input interface 214, an operation input by the sender requesting input of audio information in association with an object included in the image information of the content (S12), it outputs an audio input screen via the output interface 215.

ここで、図６は、本開示の一実施形態に係る送信者端末装置２００－１において出力される画面の例を示す図である。具体的には、図６は、送信者端末装置２００－１において図４のＳ１２において音声情報の入力を所望するための送信者による操作入力を受け付けたときに出力される音声入力画面１０の例を示す図である。図６によると、音声入力画面１０には、「アプリケーションＡ」というアプリケーションプログラムの名称と共に、画像情報表示領域１１とオブジェクト選択領域１２が含まれる。画像情報表示領域１１には、現在録画されている画像情報として、例えばフレームＦ１の画像データ１３が出力されている。当該画像データ１３には、その中にキャラクタオブジェクトとしてキャラクタＡの画像１４及びキャラクタＢの画像１５が含まれる。 Here, Figure 6 shows an example of a screen output in a sender terminal device 200-1 according to one embodiment of the present disclosure. Specifically, Figure 6 shows an example of a voice input screen 10 output when the sender terminal device 200-1 receives an operation input from the sender requesting voice information input in S12 of Figure 4. According to Figure 6, the voice input screen 10 includes the name of the application program, "Application A," along with an image information display area 11 and an object selection area 12. The image information display area 11 outputs image data 13 of frame F1, for example, as currently recorded image information. This image data 13 includes an image 14 of character A and an image 15 of character B as character objects.

ここで、画像１４で示されるキャラクタＡにはオブジェクトＩＤ情報として「Ｂ１」が、画像１５で示されるキャラクタＢにはオブジェクトＩＤ情報として「Ｂ２」が付与される。当該オブジェクトＩＤ情報の付与は、例えばカメラによって動画撮影が行われているときに、プロセッサ２１１が物体検知処理を実行することによって各フレーム内に含まれるオブジェクトを検知するとともに、各オブジェクトが初めて検知されたタイミングで各オブジェクトに対してオブジェクトＩＤ情報を割り当てることによって行われる。また、当該オブジェクトＩＤ情報の付与は、例えば送信者端末装置２００－１によって仮想空間上に仮想的に画像情報を生成する場合には、プロセッサ２１１がその生成時に描画されるオブジェクトに対してオブジェクトＩＤ情報を割り当てることによって行われる。したがって、図６の例では、たまたまキャラクタＡの画像１４及びキャラクタＢの画像１５のみが含まれているが、新たに他のキャラクタの画像が含まれる場合には当該他のキャラクタのキャラクタＩＤ情報が生成されることとなる。 Here, character A shown in image 14 is assigned "B1" as object ID information, and character B shown in image 15 is assigned "B2" as object ID information. When a video is being captured by a camera, for example, this assignment is performed by the processor 211 executing object detection processing to detect the objects included in each frame and assigning object ID information to each object at the moment that object is first detected. When the sender terminal device 200-1 virtually generates image information in a virtual space, for example, the assignment is performed by the processor 211 assigning object ID information to each object drawn at the time of generation. Therefore, although the example of Figure 6 happens to include only image 14 of character A and image 15 of character B, character ID information for another character is generated if an image of that other character is newly included.
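
The assignment at first detection can be sketched as a small registry. This is an illustrative assumption only: the `ObjectIdRegistry` class, the "B" prefix with a running number, and the string keys standing in for whatever a real detector or tracker reports are hypothetical, and object detection itself is outside the sketch.

```python
class ObjectIdRegistry:
    """Hands out "B1", "B2", ... the first time each detected object is seen."""

    def __init__(self, prefix="B"):
        self._prefix = prefix
        self._ids = {}  # detector key -> assigned object ID

    def id_for(self, detected_key):
        # Assign a new ID only on first detection; reuse it afterwards.
        if detected_key not in self._ids:
            self._ids[detected_key] = f"{self._prefix}{len(self._ids) + 1}"
        return self._ids[detected_key]


registry = ObjectIdRegistry()
# Hypothetical per-frame detection results for three consecutive frames.
detections = [
    ["character A"],
    ["character A", "character B"],
    ["character B", "character C"],
]
assigned = [[registry.id_for(key) for key in frame] for frame in detections]
print(assigned)  # [['B1'], ['B1', 'B2'], ['B2', 'B3']]
```

A character that first appears in a later frame simply receives the next unused ID, while characters already seen keep theirs.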

オブジェクト選択領域１２には、画像情報に含まれるオブジェクトのうちオブジェクトＩＤ情報が付与されたオブジェクトに対応して各オブジェクトを選択するためのアイコンが含まれる。図６の例では、オブジェクト選択領域１２には、キャラクタＡに対応してキャラクタＡアイコン１６と、キャラクタＢに対応してキャラクタＢアイコン１７が含まれる。プロセッサ２１１は、入力インターフェイス２１４を介してオブジェクト選択領域１２に含まれるいずれかのアイコン（例えば、キャラクタＡアイコン１６及びキャラクタＢアイコン１７のいずれか）に対する送信者の操作入力を受け付けると、当該操作入力がされたアイコンに対応するキャラクタを選択する。図６の例では、キャラクタＡアイコン１６が他のアイコンに対して識別可能に表示されているが、これから入力される音声情報が対応付けられるキャラクタのオブジェクトＩＤ情報として、キャラクタＡのオブジェクトＩＤ情報（すなわち、「Ｂ１」）が選択されたことを示している。 The object selection area 12 contains icons for selecting objects corresponding to objects in the image information that have been assigned object ID information. In the example in Figure 6, the object selection area 12 includes a character A icon 16 corresponding to character A and a character B icon 17 corresponding to character B. When the processor 211 receives an operation input from the sender for any of the icons in the object selection area 12 (for example, either character A icon 16 or character B icon 17) via the input interface 214, it selects the character corresponding to the icon for which the operation input was made. In the example in Figure 6, character A icon 16 is displayed in a way that allows it to be identified from the other icons, indicating that the object ID information of character A (i.e., "B1") has been selected as the object ID information of the character to which the incoming audio information will be associated.

再び図４に戻り、図６に示すとおり、音声情報の入力を所望するキャラクタのオブジェクトＩＤ情報が選択されると、送信者端末装置２００－１のプロセッサ２１１は、入力インターフェイス２１４を介して当該オブジェクトＩＤ情報に対応付けて音声情報の入力を受け付ける（Ｓ１３）。具体的には、プロセッサ２１１は、音声情報が入力されている時間（例えば、Ｔ０）に対応付けられた画像情報の各フレームに同期して、入力インターフェイス２１４の一つであるマイクから送信者が発話した音声データを音声情報としてメモリ２１２に記憶する。 Returning to Figure 4, as shown in Figure 6, when the object ID information of a character for which voice information input is desired is selected, the processor 211 of the sender terminal device 200-1 accepts the voice information input via the input interface 214, associating it with the object ID information (S13). Specifically, the processor 211 stores the voice data spoken by the sender from the microphone, which is one of the input interfaces 214, as voice information in the memory 212, synchronized with each frame of the image information associated with the time (e.g., T0) during which the voice information is being input.

送信者端末装置２００－１のプロセッサ２１１は、Ｓ１１～Ｓ１３の画像情報の録画、音声情報を対応付けるオブジェクトＩＤ情報の選択、及び音声情報の入力を繰り返し、例えば図３に例示する、コンテンツＩＤ情報が「Ａ１」のコンテンツの生成を行う。プロセッサ２１１は、コンテンツの生成が終了すると、コンテンツＩＤ情報に対応付けてメモリ２１２に記憶するとともに、通信インターフェイス２１３を介してサーバ装置１００に生成したコンテンツ（Ｔ１１）を送信する。 The processor 211 of the sender terminal device 200-1 repeats the recording of image information, the selection of the object ID information with which audio information is to be associated, and the input of audio information in S11 to S13, thereby generating, for example, the content with content ID information "A1" illustrated in Figure 3. When content generation is complete, the processor 211 stores the content in the memory 212 in association with the content ID information and transmits the generated content (T11) to the server device 100 via the communication interface 213.

サーバ装置１００のプロセッサ１１１は、送信者端末装置２００－１からコンテンツを受信すると、コンテンツＩＤ情報に対応付けて、メモリ１１２のコンテンツ管理テーブル（図示しない）に受信したコンテンツを記憶する（Ｓ１４）。具体的には、サーバ装置１００のプロセッサ１１１は、例えば図３で示されたコンテンツに含まれる各情報（画像情報、オブジェクトＩＤ情報及び音声情報など）を、メモリ１１２のコンテンツ管理テーブル（図示しない）にコンテンツＩＤ情報に対応付けて記憶する。以上により、コンテンツの生成に係る処理を終了する。 When the processor 111 of the server device 100 receives content from the sender terminal device 200-1, it stores the received content in the content management table (not shown) of memory 112, associating it with the content ID information (S14). Specifically, the processor 111 of the server device 100 stores each piece of information (image information, object ID information, and audio information, etc.) included in the content shown in Figure 3, associating it with the content ID information in the content management table (not shown) of memory 112. With this, the processing related to content generation is completed.

なお、図４においては、送信者端末装置２００－１のプロセッサ２１１は、コンテンツの生成が終了したタイミングでサーバ装置１００に当該コンテンツを送信したが、所定のフレーム数やデータ量のコンテンツが生成されるごとに分割してコンテンツを送信するようにしてもよい。 In Figure 4, the processor 211 of the sender terminal device 200-1 transmitted the content to the server device 100 when content generation was completed. However, the content may be divided and transmitted each time a predetermined number of frames or amount of data is generated.
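
Divided transmission by frame count could look like the following sketch. The function name `split_into_chunks`, the chunk size of 3, and the use of frame ID strings in place of real frame data are assumptions for illustration; the disclosure does not specify how the division is carried out.

```python
def split_into_chunks(frames, frames_per_chunk):
    """Yield successive groups of at most `frames_per_chunk` frames."""
    if frames_per_chunk <= 0:
        raise ValueError("frames_per_chunk must be positive")
    for start in range(0, len(frames), frames_per_chunk):
        yield frames[start:start + frames_per_chunk]


frames = [f"F{i}" for i in range(1, 8)]  # F1 .. F7 as stand-ins for frame data
chunks = list(split_into_chunks(frames, 3))
print(chunks)  # [['F1', 'F2', 'F3'], ['F4', 'F5', 'F6'], ['F7']]
```

Each yielded group would be sent as one transmission; a split by data amount instead of frame count would follow the same pattern with a byte budget.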

また、図４においては、画像情報の録画をしつつ音声情報の入力をすることを前提に説明したが、送信者端末装置２００－１のプロセッサ２１１は、最初に画像情報を生成しておき、後から画像情報の各フレームに同期して、音声情報の入力を行うようにしてもよい。例えば、図３に示す例においては、時間ｔ１から時間ｔ２において音声情報Ａ及び音声情報Ｂが重複して入力されているが、これらは画像情報が生成されたのちに、各音声情報を各フレームに同期して入力することによって、同じ送信者がキャラクタＡ及びキャラクタＢを演じ分けることが可能となる。 Figure 4 was described on the premise that audio information is input while image information is being recorded, but the processor 211 of the sender terminal device 200-1 may instead generate the image information first and then input the audio information afterwards in synchronization with each frame of the image information. For example, in the example shown in Figure 3, audio information A and audio information B overlap from time t1 to time t2. By inputting each piece of audio information in synchronization with the frames after the image information has been generated, the same sender can play both character A and character B.
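
The after-the-fact input described above can be pictured as several recording passes over the same frames, one selected object at a time. The sketch below is hypothetical throughout: `RecordingSession`, its methods, and the integer time indices are assumptions, and actual microphone capture is replaced by simply noting the start and end time of each segment.

```python
class RecordingSession:
    """Collects audio segments tagged with the currently selected object ID."""

    def __init__(self, content_id):
        self.content_id = content_id
        self.selected_object_id = None  # None means no object (e.g. BGM)
        self.segments = []              # (object_id, start, end) tuples

    def select_object(self, object_id):
        # Corresponds to choosing an icon in the object selection area (S12).
        self.selected_object_id = object_id

    def record(self, start, end):
        # Corresponds to accepting audio input for the selected object (S13).
        if start >= end:
            raise ValueError("start must be earlier than end")
        self.segments.append((self.selected_object_id, start, end))


session = RecordingSession("A1")
session.select_object("B1")  # first pass: the sender plays character A
session.record(0, 2)
session.record(3, 6)
session.select_object("B2")  # second pass: the same sender plays character B
session.record(1, 4)
print(session.segments)  # [('B1', 0, 2), ('B1', 3, 6), ('B2', 1, 4)]
```

Because each pass is recorded separately against the same timeline, segments of different characters may overlap even though one person voiced both.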

また、図４においては特に図示はしていないものの、図３に示すように、特定のオブジェクトのオブジェクトＩＤ情報に関連付けられていない音声情報（例えば、ＢＧＭ音声情報）も入力することが可能である。 Furthermore, although not specifically illustrated in Figure 4, as shown in Figure 3, it is also possible to input audio information that is not associated with the object ID information of a specific object (for example, background music audio information).

（Ｂ）コンテンツの出力に係る処理 (B) Processing Related to Content Output
次に、図４によると、主に受信者端末装置２００－２においてコンテンツの出力に係る処理が実行される。当該処理は、例えばコンテンツが動画コンテンツである場合には、受信者端末装置２００－２において所望の動画コンテンツを選択し、当該動画コンテンツを再生する処理である。受信者端末装置２００－２のプロセッサ２１１は、入力インターフェイス２１４を介して受信者による操作入力を受け付けて、コンテンツの出力のためのアプリケーションプログラムをメモリ２１２から読み出して、当該アプリケーションプログラムの起動を行う（Ｓ２１）。アプリケーションプログラムが起動されると、プロセッサ２１１は、出力インターフェイス２１５を介して、例えば、受信者端末装置２００－２において出力が可能な一又は複数の動画コンテンツを選択するためのサムネイル画像が一覧として表示されたコンテンツ選択画面を出力する。そして、プロセッサ２１１は、入力インターフェイス２１４を介してコンテンツ選択画面内の一覧の中から所望のコンテンツのサムネイル画像を選択するための受信者による操作入力を受け付けて、出力するコンテンツの選択をする（Ｓ２２）。プロセッサ２１１は、通信インターフェイス２１３を介して、選択されたコンテンツに対応付けられたコンテンツＩＤ情報（例えば、コンテンツＩＤ情報が「Ａ１」）と共に、当該コンテンツの送信を所望するためのコンテンツ要求（Ｔ２１）をサーバ装置１００に送信する。 Next, according to Figure 4, processing related to content output is executed, mainly at the receiver terminal device 200-2. When the content is video content, for example, this is the processing of selecting the desired video content at the receiver terminal device 200-2 and playing it back. The processor 211 of the receiver terminal device 200-2 accepts an operation input by the receiver via the input interface 214, reads the application program for content output from the memory 212, and starts that application program (S21). When the application program has started, the processor 211 outputs, via the output interface 215, a content selection screen on which, for example, thumbnail images for selecting one or more pieces of video content that can be output at the receiver terminal device 200-2 are displayed as a list. The processor 211 then accepts, via the input interface 214, an operation input by the receiver selecting the thumbnail image of the desired content from the list in the content selection screen, and selects the content to be output (S22). The processor 211 transmits to the server device 100, via the communication interface 213, a content request (T21) requesting transmission of the content, together with the content ID information associated with the selected content (for example, content ID information "A1").

サーバ装置１００のプロセッサ１１１は、通信インターフェイス１１３を介して受信者端末装置２００－２からコンテンツ要求を受信すると、一緒に受信したコンテンツＩＤ情報（例えば、Ａ１）に基づいてコンテンツ管理テーブルを参照し、コンテンツ（例えば図３に例示された情報）を読み出す（Ｓ２３）。プロセッサ１１１は、通信インターフェイス１１３を介して、コンテンツ要求を送信してきた受信者端末装置２００－２に読み出したコンテンツ（Ｔ２２）を送信する。 When the processor 111 of the server device 100 receives a content request from the receiver terminal device 200-2 via the communication interface 113, it refers to the content management table based on the content ID information (e.g., A1) received along with the request and reads the content (e.g., the information illustrated in Figure 3) (S23). The processor 111 then transmits the read content (T22) to the receiver terminal device 200-2 that sent the content request via the communication interface 113.
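
The server-side storing (S14) and reading (S23) keyed by content ID information can be sketched as a simple in-memory table. This is an assumption-laden illustration: the `ContentManagementTable` class, its `store` and `read` methods, and the dictionary payload are hypothetical, since the disclosure does not illustrate the content management table.

```python
class ContentManagementTable:
    """In-memory stand-in for a content management table keyed by content ID."""

    def __init__(self):
        self._rows = {}

    def store(self, content_id, content):
        # S14: store the received content in association with its content ID.
        self._rows[content_id] = content

    def read(self, content_id):
        # S23: look up the content named in a content request (None if absent).
        return self._rows.get(content_id)


table = ContentManagementTable()
table.store("A1", {"object_ids": ["B1", "B2"], "frame_count": 7})

requested = table.read("A1")       # content request (T21) carrying "A1"
print(requested["object_ids"])     # ['B1', 'B2']
print(table.read("A2"))            # None
```

How an unknown content ID should be handled is not stated in the text; returning `None` here is only a placeholder choice.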

受信者端末装置２００－２のプロセッサ２１１は、通信インターフェイス２１３を介してコンテンツを受信すると、出力インターフェイス２１５を介して受信したコンテンツを出力する（Ｓ２４）。ここで、受信者端末装置２００－２のプロセッサ２１１は、入力インターフェイス２１４を介して、受信したコンテンツに、現在出力している画像情報を構成するフレームにオブジェクトＩＤ情報が対応付けられている場合には、受信者の操作入力を介して、出力する音声情報を選択することが可能である。すなわち、プロセッサ２１１は、オブジェクトに関連付けられた音声情報の出力を所望するための受信者による操作入力を受け付けると（Ｓ２５）、出力インターフェイス２１５を介して出力する音声情報の選択を行い、出力される音声情報を変更する処理を実行する（Ｓ２６）。なお、Ｓ２４～Ｓ２６に係る一連の処理の詳細については、図５において後述する。 When the processor 211 of the receiver terminal device 200-2 receives the content via the communication interface 213, it outputs the received content via the output interface 215 (S24). Here, if object ID information is associated with the frame constituting the image information currently being output in the received content, the processor 211 of the receiver terminal device 200-2 can select the audio information to be output according to an operation input by the receiver via the input interface 214. That is, when the processor 211 accepts an operation input by the receiver requesting output of the audio information associated with an object (S25), it selects the audio information to be output via the output interface 215 and executes processing to change the audio information that is output (S26). The series of processing in S24 to S26 is described in detail later with reference to Figure 5.

そして、受信者端末装置２００－２は、Ｓ２４～Ｓ２６のコンテンツの出力、オブジェクトの選択、及びその選択に応じて出力する音声情報の変更を繰り返し、時間ｔｎに達すると、コンテンツの出力を終了する。以上により、コンテンツの出力に係る処理を終了する。 The receiver terminal device 200-2 then repeatedly outputs content, selects objects, and modifies the audio information output according to the selection, as described in S24-S26. When time tn is reached, it terminates the content output. Thus, the process related to content output is completed.

なお、図４においては、受信者端末装置２００－２のプロセッサ２１１は、コンテンツをサーバ装置１００からひとまとまりのデータとして受信しているが、所定のフレーム数やデータ量ごとに受信し、順次出力するようにしてもよい。 In Figure 4, the processor 211 of the receiver terminal device 200-2 receives the content from the server device 100 as a single data set. However, it may also receive the content in predetermined frame or data chunks and output it sequentially.

８．受信者端末装置２００－２の処理フロー 8. Processing Flow of the Receiver Terminal Device 200-2
図５は、本開示の一実施形態に係る受信者端末装置２００－２において実行される処理フローを示す図である。具体的には、図５は、図４のＳ２４～Ｓ２６において受信者端末装置２００－２が行うコンテンツの出力に係る処理のフローを示す図である。当該処理フローは、主に受信者端末装置２００－２がメモリ２１２に記憶されたプログラムを読み出して実行することにより行われる。 Figure 5 is a diagram showing a processing flow executed at the receiver terminal device 200-2 according to one embodiment of the present disclosure. Specifically, Figure 5 shows the flow of the processing related to content output that the receiver terminal device 200-2 performs in S24 to S26 of Figure 4. This processing flow is carried out mainly by the receiver terminal device 200-2 reading and executing a program stored in the memory 212.

図５によると、プロセッサ２１１は、通信インターフェイス２１３を介して、サーバ装置１００から所望するコンテンツ（例えば、図３に示すコンテンツＩＤ情報がＡ１のコンテンツ）を受信する（Ｓ１１１）。そして、プロセッサ２１１は、コンテンツを受信すると、出力インターフェイス２１５を介して受信したコンテンツを出力する。具体的には、プロセッサ２１１は、出力インターフェイス２１５の一つであるディスプレイを介して、受信したコンテンツに含まれる画像情報をフレームＦ１から順次出力する。また、プロセッサ２１１は、出力インターフェイス２１５の一つであるスピーカーを介して、受信したコンテンツに含まれる音声情報を出力する画像情報のフレームに同期して出力する。図３の例では、時間ｔ０からキャラクタＡの音声情報Ａが出力され、時間ｔ１になると音声情報Ａに加えてキャラクタＢの音声情報Ｂ及びＢＧＭ音声情報がそれぞれ出力されることになる。 According to Figure 5, the processor 211 receives desired content (for example, content with content ID information A1 as shown in Figure 3) from the server device 100 via the communication interface 213 (S111). Upon receiving the content, the processor 211 outputs it via the output interface 215. Specifically, the processor 211 outputs the image information contained in the received content sequentially from frame F1 via the display, which is one of the output interfaces 215. The processor 211 also outputs the audio information contained in the received content via the speaker, another output interface 215, synchronized with the image information frames. In the example in Figure 3, character A's audio information A is output from time t0, and at time t1, in addition to audio information A, character B's audio information B and BGM audio information are output.

プロセッサ２１１は、入力インターフェイス２１４を介して、受信したコンテンツに、現在出力している画像情報を構成するフレームにオブジェクトＩＤ情報が対応付けられている場合には、入力インターフェイス２１４を介して受信者の操作入力を受け付けて、出力する音声情報を選択することが可能である。したがって、プロセッサ２１１は、当該操作入力を受け付けることによって、オブジェクトの選択がされたか否かを判断する（Ｓ１１３）。 If, in the received content, object ID information is associated with the frame constituting the image information currently being output, the processor 211 can accept an operation input by the receiver via the input interface 214 and select the audio information to be output. The processor 211 therefore determines, based on whether that operation input has been accepted, whether an object has been selected (S113).

ここで、図７は、本開示の一実施形態に係る受信者端末装置２００－２において出力される画面の例を示す図である。具体的には、図７は、受信者端末装置２００－２において図５のＳ１１２～Ｓ１１３において出力する音声情報を選択するための受信者による操作入力を受け付けたときのコンテンツ出力画面２０の例を示す図である。図７によると、コンテンツ出力画面２０には、「アプリケーションＢ」というアプリケーションプログラムの名称と共に、画像情報表示領域２１とオブジェクト選択領域２２が含まれる。画像情報表示領域２１には、現在出力されている画像情報として、例えばフレームＦ３の画像データ２３が出力されている。当該画像データ２３には、その中にキャラクタオブジェクトとしてキャラクタＡの画像２４及びキャラクタＢの画像２５が含まれる。画像２４で示されるキャラクタＡにはオブジェクトＩＤ情報として「Ｂ１」が、画像２５で示されるキャラクタＢにはオブジェクトＩＤ情報として「Ｂ２」が付与されている。 Here, Figure 7 shows an example of a screen output in a receiver terminal device 200-2 according to one embodiment of the present disclosure. Specifically, Figure 7 shows an example of a content output screen 20 when the receiver terminal device 200-2 receives operation input from a receiver to select the audio information to be output in steps S112 to S113 of Figure 5. According to Figure 7, the content output screen 20 includes the name of the application program, "Application B," along with an image information display area 21 and an object selection area 22. The image information display area 21 displays image data 23 of frame F3, for example, as the currently output image information. This image data 23 includes an image 24 of character A and an image 25 of character B as character objects. Character A, shown in image 24, is assigned the object ID information "B1," and character B, shown in image 25, is assigned the object ID information "B2."

オブジェクト選択領域２２には、音声情報のうち、現在出力されている画像情報のフレームに同期する音声情報に対応付けられたオブジェクトＩＤ情報に対応して、各オブジェクトすなわちキャラクタを選択するためのアイコンが含まれる。例えば、図３の時間ｔ１から時間ｔ２のいずれかのタイミングのコンテンツ出力画面２０を例にすると、当該時間ではキャラクタＡの音声情報Ａ及びキャラクタＢの音声情報Ｂの両方が出力されている。したがって、オブジェクト選択領域２２には、キャラクタＡに対応してキャラクタＡアイコン２６と、キャラクタＢに対応してキャラクタＢアイコン２７が含まれる。プロセッサ２１１は、入力インターフェイス２１４を介してオブジェクト選択領域２２に含まれるいずれかのアイコン（例えば、キャラクタＡアイコン２６及びキャラクタＢアイコン２７のいずれか）に対する受信者の操作入力を受け付けると、当該操作入力がされたアイコンに対応するキャラクタを選択する。図７の例では、キャラクタＡアイコン２６が他のアイコンに対して識別可能に表示されているが、これから出力される音声情報が対応付けられているキャラクタのオブジェクトＩＤ情報として、キャラクタＡのオブジェクトＩＤ情報（すなわち、「Ｂ１」）が選択されたことを示している。 The object selection area 22 contains icons for selecting each object, that is, each character, corresponding to the object ID information associated with the audio information that is synchronized with the frame of the image information currently being output. Taking the content output screen 20 at any point from time t1 to time t2 in Figure 3 as an example, both audio information A of character A and audio information B of character B are being output at that time. The object selection area 22 therefore includes a character A icon 26 corresponding to character A and a character B icon 27 corresponding to character B. When the processor 211 accepts, via the input interface 214, an operation input by the receiver on any of the icons in the object selection area 22 (for example, either the character A icon 26 or the character B icon 27), it selects the character corresponding to the icon on which the operation input was made. In the example of Figure 7, the character A icon 26 is displayed so as to be distinguishable from the other icons, which indicates that the object ID information of character A (that is, "B1") has been selected as the object ID information of the character whose associated audio information is to be output.

なお、図７の例では、オブジェクト選択領域２２には、現在出力されているフレームに含まれるキャラクタに対応して、出力する音声情報を選択するためのアイコンを含むようにした。しかし、これに限らず、コンテンツ全体において少なくとも１フレームにおいて登場するキャラクタについては、常に音声情報を選択するためのアイコンを含むようにし、音声情報が出力されていないときであっても音声情報の選択ができるようにしてもよい。 In the example shown in Figure 7, the object selection area 22 includes an icon for selecting the audio information to be output, corresponding to the character included in the currently outputting frame. However, this is not limited to this example. For characters appearing in at least one frame throughout the entire content, an icon for selecting audio information may always be included, allowing for audio information selection even when no audio information is being output.
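
The two ways of populating the object selection area can be contrasted in a short sketch. Everything here is an assumption made for illustration: the function names, the tuple layout, integer time indices with t_i written as i, and the half-open interval convention (a segment is treated as active from its start up to but not including its end).

```python
# Hypothetical Figure 3 timeline; None marks audio with no object (BGM).
SEGMENTS = [
    ("B1", 0, 2),
    ("B2", 1, 4),
    ("B1", 3, 6),
    ("B2", 5, 7),
    (None, 1, 6),
]


def icons_for_time(segments, time):
    """Object IDs whose audio is synchronized with the frame at `time`."""
    ids = {obj for obj, start, end in segments
           if obj is not None and start <= time < end}
    return sorted(ids)


def icons_for_content(object_ids):
    """Variant: always offer every object that appears somewhere in the content."""
    return sorted(set(object_ids))


print(icons_for_time(SEGMENTS, 1))       # ['B1', 'B2']
print(icons_for_time(SEGMENTS, 2))       # ['B2']
print(icons_for_content(["B1", "B2"]))   # ['B1', 'B2']
```

With the first function the set of icons changes as playback advances; with the second it stays fixed, so a selection can be made even while no audio of that character is being output.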

また、図５のＳ１１３及び図７においては、受信者による操作入力を入力インターフェイス２１４で受け付けることによってオブジェクトを選択する場合について説明した。しかし、これに代えて、又はこれに加えて、入力インターフェイス２１４として、マイクやカメラなどのセンサを使ってオブジェクトを選択することも可能である。例えば、プロセッサ２１１は、カメラを利用して受信者端末装置２００－２を利用している受信者の属性（例えば、年齢、性別など）を認識する。そして、プロセッサ２１１は、その認識結果に基づいて音声の出力をするオブジェクトの選択を行う。例えば、受信者端末装置２００－２としてデジタルサイネージ用の端末装置を用意し、当該端末装置に搭載されたカメラにおいて当該端末装置のディスプレイを参照しているユーザ（受信者）の属性を認識する。そして、ユーザ（受信者）が「子供」であると認識された場合には子供向けのオブジェクト（例えば、動物キャラクタ）以外の音声をミュートにし、「大人」であると認識された場合には大人向けのオブジェクト（例えば、人間キャラクタ）以外の音声をミュートにする。このように、入力インターフェイス２１４としてカメラ等のセンサを用いることによってより多様な選択の方法を実現することが可能である。 Furthermore, Figures 5 (S113) and 7 illustrate the case where an object is selected by receiving operation input from the receiver via the input interface 214. However, it is also possible to select objects using sensors such as microphones or cameras as the input interface 214, either as an alternative or in addition to this. For example, the processor 211 uses the camera to recognize the attributes of the receiver using the receiver terminal device 200-2 (e.g., age, gender, etc.). The processor 211 then selects objects for audio output based on this recognition result. For example, a digital signage terminal device could be provided as the receiver terminal device 200-2, and the camera mounted on the terminal device could recognize the attributes of the user (receiver) viewing the terminal device's display. If the user (receiver) is recognized as a "child," audio from objects other than child-oriented objects (e.g., animal characters) would be muted; if the user is recognized as an "adult," audio from objects other than adult-oriented objects (e.g., human characters) would be muted. In this way, using sensors such as cameras as the input interface 214 makes it possible to realize a wider variety of selection methods.
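As a rough sketch of the sensor-based selection described above, the following assumes the camera-side recognition has already produced an attribute label such as "child" or "adult". The recognition step itself is out of scope here, and the `select_for_viewer` function, the audience tags, and the object names are hypothetical.

```python
def select_for_viewer(object_audiences, viewer_attribute):
    """Split object IDs into (unmuted, muted) sets for a recognized viewer attribute.

    object_audiences maps each object ID to the audience it targets.
    """
    unmuted = {oid for oid, audience in object_audiences.items()
               if audience == viewer_attribute}
    muted = set(object_audiences) - unmuted
    return unmuted, muted


# Hypothetical tagging: animal characters target children, human characters adults.
object_audiences = {"animal_character": "child", "human_character": "adult"}

unmuted, muted = select_for_viewer(object_audiences, "child")
print(sorted(unmuted), sorted(muted))
```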

再び図５に戻り、図７に示すとおり、音声情報の出力を所望するキャラクタのオブジェクトＩＤ情報が選択されると、プロセッサ２１１は選択されたキャラクタの音声情報のみを出力し、それ以外のキャラクタの音声情報の出力を制限（例えば、ミュート）する（Ｓ１１４）。すなわち、プロセッサ２１１は、図３の時間ｔ１から時間ｔ２において、キャラクタＡのオブジェクトＩＤ情報が選択されると、キャラクタＢの音声情報Ｂの出力インターフェイス２１５（例えば、スピーカ）からの出力を制限し、キャラクタＡの音声情報Ａのみが出力されるようにする。一方で、Ｓ１１３においていずれのオブジェクトの選択も行われていない場合には、Ｓ１１４に係る処理はスキップする。 Returning to Figure 5, as shown in Figure 7, when the object ID information of a character whose audio information output is desired is selected, the processor 211 outputs only the audio information of the selected character and restricts (e.g., mutes) the output of audio information of other characters (S114). That is, between time t1 and time t2 in Figure 3, if the object ID information of character A is selected, the processor 211 restricts the output from the output interface 215 (e.g., speaker) of character B's audio information B, so that only character A's audio information A is output. On the other hand, if no object is selected in S113, the process related to S114 is skipped.
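The behavior of S114 can be sketched as below, assuming the audio active at the current time is held as a mapping from object ID to audio data. The `apply_selection` function and the byte-string payloads are illustrative assumptions only.

```python
def apply_selection(active_audio, selected_object_id=None):
    """Return (output, restricted) audio for the current time.

    If no object has been selected, S114 is skipped and everything is output.
    """
    if selected_object_id is None:
        return dict(active_audio), {}
    output = {oid: audio for oid, audio in active_audio.items()
              if oid == selected_object_id}
    restricted = {oid: audio for oid, audio in active_audio.items()
                  if oid != selected_object_id}
    return output, restricted


# Between t1 and t2 both characters are speaking (placeholder payloads).
active_audio = {"char_A": b"audio-A", "char_B": b"audio-B"}
output, restricted = apply_selection(active_audio, "char_A")
print(list(output), list(restricted))
```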

プロセッサ２１１は、時間ｔ０～ｔｎに至る一連のコンテンツの出力を終了するまで、Ｓ１１２～Ｓ１１４に係る処理を常に繰り返す。以上により、本処理フローを終了する。 Processor 211 continuously repeats the processes described in S112 to S114 until it has finished outputting the series of content from time t0 to tn. This completes the processing flow.

Here, because the description covers the case where the content contains only audio information A of character A and audio information B of character B, selecting character A restricts the output of audio information B so that only audio information A is output. However, the output of the audio information of the character selected in Figure 7 may instead be restricted, with the audio information of the unselected character output without restriction.

Furthermore, when the content contains three or more pieces of audio information, the audio information can be output in various combinations, such as:
(1) outputting only the audio information of one selected character and restricting the output of the audio information of all remaining characters;
(2) restricting the output of the audio information of one selected character and outputting the audio information of all remaining characters;
(3) outputting the audio information of multiple selected characters and restricting the output of the audio information of all remaining characters; or
(4) restricting the output of the audio information of multiple selected characters and outputting the audio information of all remaining characters.
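The four combinations reduce to two choices: how many characters are selected, and whether the selection means "output these" or "restrict these". A minimal sketch under that reading follows; the `SelectionMode` enum and the `partition` function are names introduced for this illustration.

```python
from enum import Enum


class SelectionMode(Enum):
    OUTPUT_SELECTED = "output_selected"      # combinations (1) and (3)
    RESTRICT_SELECTED = "restrict_selected"  # combinations (2) and (4)


def partition(all_ids, selected_ids, mode):
    """Return (output_ids, restricted_ids) as sets."""
    all_set = set(all_ids)
    selected = set(selected_ids) & all_set  # ignore IDs not present in the content
    if mode is SelectionMode.OUTPUT_SELECTED:
        output = selected
    else:
        output = all_set - selected
    return output, all_set - output


ids = ["A", "B", "C"]
print(partition(ids, ["A"], SelectionMode.OUTPUT_SELECTED))         # combination (1)
print(partition(ids, ["A", "B"], SelectionMode.RESTRICT_SELECTED))  # combination (4)
```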

Furthermore, although muting was used above as the example of how to restrict audio output, various other restriction methods may be adopted. These include changing the output volume (for example, lowering it), or outputting subtitle text together with audio information that is output normally while not outputting subtitles for audio information that is restricted.
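One way to model these restriction methods is as per-track output settings. The sketch below is an assumption-laden illustration: the `OutputSettings` type, the method names, the default reduced gain of 0.2, and the choice to leave subtitles on when muting are not taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputSettings:
    gain: float           # 1.0 = normal volume, 0.0 = silent
    show_subtitles: bool  # whether subtitle text is shown for this track


NORMAL = OutputSettings(gain=1.0, show_subtitles=True)


def restricted_settings(method, reduced_gain=0.2):
    """Return the settings applied to a restricted track for a given method."""
    if method == "mute":
        # Assumption: muting affects the audio only.
        return OutputSettings(gain=0.0, show_subtitles=True)
    if method == "lower_volume":
        return OutputSettings(gain=reduced_gain, show_subtitles=True)
    if method == "hide_subtitles":
        return OutputSettings(gain=1.0, show_subtitles=False)
    raise ValueError(f"unknown restriction method: {method}")


for method in ("mute", "lower_volume", "hide_subtitles"):
    print(method, restricted_settings(method))
```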

以上、本実施形態においては、受信者等のユーザにとってより使い勝手の良い処理装置、処理プログラム及び処理方法を提供することが可能である。特に、出力されるコンテンツに複数の発話情報（例えば、音声情報）が含まれているような場合には、出力する発話情報（例えば、音声情報）を受信者の選択によって選ぶことが可能である。例えば、従来では、一部のオブジェクトに対応付けられた音声情報を出力したくないという場合、受信者端末装置２００－２等において音量ボタンによる制御を行うことで出力の制限がされていた。したがって、全ての音声情報の出力が制限されることとなった。しかし、本実施形態では、受信者が所望するタイミングで、受信者が所望するオブジェクトに対応付けられた音声情報のみを選択的に出力したり、選択的に出力の制限をすることが可能となる。 In this embodiment, it is possible to provide a processing device, processing program, and processing method that are more user-friendly for users such as receivers. In particular, when the output content includes multiple speech information (e.g., voice information), it is possible for the receiver to select which speech information (e.g., voice information) to output. For example, conventionally, if the receiver did not want to output voice information associated with certain objects, the output was limited by controlling the volume using a volume button on the receiver terminal device 200-2, etc. Therefore, the output of all voice information was limited. However, in this embodiment, it is possible to selectively output only the voice information associated with the object desired by the receiver at the timing desired by the receiver, or to selectively limit the output.

9. Modifications
Modifications of the above embodiment shown in Figures 1 to 7 are described below. The following modifications and the embodiment shown in Figures 1 to 7 can be implemented in combination with one another. Except for the points specifically noted below, processing can be performed in the same manner as described for the embodiment shown in Figures 1 to 7.

(A) Modification 1 relating to the selection of audio information
In the above description, as shown in Figure 4 and elsewhere, the audio information associated with the character selected on the receiver terminal device 200-2 is selected by the processor 211 of the receiver terminal device 200-2, and the output of the audio information of the other characters is restricted. Instead, however, the audio information associated with the character selected on the receiver terminal device 200-2 may be reorganized by the processor 111 of the server device 100 so that the output of the audio information of the other characters is restricted.

Figure 8A is a diagram showing a processing sequence executed in the processing system 1 according to one embodiment of this disclosure. Specifically, Figure 8A shows the processing sequence when the processing related to the selection of audio information, which is one type of speech information, is performed by the processor 111 of the server device 100. The processing in each device is executed by its processor running a program stored in that device's memory.

なお、コンテンツ生成に係るＳ３１～Ｓ３４に係る処理は、図４に示すコンテンツ生成に係るＳ１１～Ｓ１４に係る処理と同じであるため、その説明は省略する。 Note that the processes related to content generation in S31 to S34 are the same as the processes related to content generation in S11 to S14 shown in Figure 4, therefore their explanation is omitted.

また、コンテンツ出力に係る処理のうち、コンテンツ出力を受け付けてアプリケーションプログラムを起動し、出力インターフェイスを介して所望のコンテンツの出力をし、所望のオブジェクトに対応付けらえた音声情報の選択を行うまでのＳ４１～Ｓ４５に係る処理は、図４に示すコンテンツ出力に係る処理のうちのＳ２１～Ｓ２５に係る処理と同じであるため、その説明は省略する。 Furthermore, regarding the content output process, the processes S41 to S45, which involve receiving the content output, launching the application program, outputting the desired content via the output interface, and selecting the audio information associated with the desired object, are the same as the processes S21 to S25 shown in Figure 4 regarding the content output process; therefore, their explanation is omitted.

図７に示す方法等により音声情報の出力を所望するキャラクタのオブジェクトＩＤ情報が選択されると、受信者端末装置２００－２のプロセッサ２１１は、通信インターフェイス２１３を介して、現在出力するコンテンツのコンテンツＩＤ情報と選択されたオブジェクトＩＤ情報を含むオブジェクト選択情報（Ｔ４３）をサーバ装置１００に送信する。 When the object ID information of a character from which audio information output is desired is selected using the method shown in Figure 7, the processor 211 of the receiver terminal device 200-2 transmits object selection information (T43), which includes the content ID information of the content to be output and the selected object ID information, to the server device 100 via the communication interface 213.

サーバ装置１００のプロセッサ１１１は、オブジェクト選択情報を受信すると、コンテンツＩＤ情報に対応付けられたコンテンツをメモリ１１２から読み出して、当該コンテンツを再編成する処理を実行する（Ｓ４６）。具体的には、プロセッサ１１１は、読み出されたコンテンツに含まれる音声情報Ａ及び音声情報Ｂのうち、選択されたオブジェクトＩＤ情報に対応付けられた音声情報（図７の例では音声情報Ａ）をそのままにし、選択されなかった他の音声情報（図７の例では音声情報Ｂ）を当該コンテンツから削除する。そして、プロセッサ１１１は、上記処理によりコンテンツを再編成すると、再編成後のコンテンツを新たにメモリ１１２に記憶するとともに、通信インターフェイス１１３を介してオブジェクト選択情報を送信してきた受信者端末装置２００－２に当該コンテンツ（Ｔ４４）を送信する。 When the processor 111 of the server device 100 receives object selection information, it reads the content associated with the content ID information from the memory 112 and performs a process to reorganize the content (S46). Specifically, the processor 111 keeps the audio information associated with the selected object ID information (audio information A in the example of Figure 7) as is, and deletes the other audio information that was not selected (audio information B in the example of Figure 7) from the content. After reorganizing the content through the above process, the processor 111 stores the reorganized content in the memory 112 and transmits the content (T44) to the receiver terminal device 200-2 that sent the object selection information via the communication interface 113.
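The reorganization in S46 can be sketched as follows, assuming content is stored as a dictionary with an `image` entry and an `audio` mapping keyed by object ID. The `handle_object_selection` function, the message layout, and the choice to store the reorganized copy under a separate key (so that the original remains available for later re-selection) are assumptions of this illustration.

```python
def handle_object_selection(store, selection):
    """Reorganize stored content so it keeps only the selected objects' audio.

    selection is the object selection information: a content ID plus the
    selected object IDs.
    """
    content = store[selection["content_id"]]
    keep = set(selection["object_ids"])
    reorganized = {
        "image": content["image"],
        "audio": {oid: audio for oid, audio in content["audio"].items()
                  if oid in keep},
    }
    # Assumption: the reorganized content is stored alongside the original.
    derived_key = (selection["content_id"], tuple(sorted(keep)))
    store[derived_key] = reorganized
    return reorganized  # this is what would be sent back to the receiver


store = {"content_1": {"image": "frames", "audio": {"A": b"a", "B": b"b"}}}
result = handle_object_selection(
    store, {"content_id": "content_1", "object_ids": ["A"]})
print(list(result["audio"]))
```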

When the receiver terminal device 200-2 receives the reorganized content via the communication interface 213, it outputs the received content via the output interface 215, as in S44. At this point, the audio information of the content does not include audio information B of character B. The processor 211 of the receiver terminal device 200-2 therefore outputs only audio information A of character A via the output interface 215, without outputting audio information B of character B.

Note that the object selection area 22 of the content output screen 20 shown, for example, in Figure 7 always displays icons corresponding to the objects (characters) included in at least some frames of the image information. As a result, even while the output of audio information B is restricted, the receiver can select audio information B again if they want it to be output.

As described above, the example shown in Figure 8A also enables selective output of audio information, as in the embodiment of Figures 1 to 7.

(B) Modification 2 relating to the selection of audio information
In the above description, as shown in Figure 4 and elsewhere, the audio information associated with the character selected on the receiver terminal device 200-2 is selected by the processor 211 of the receiver terminal device 200-2, and the output of the audio information of the other characters is restricted. Instead, however, the audio information associated with the character selected on the receiver terminal device 200-2 may be selected by the processor 211 of the sender terminal device 200-1 so that the output of the audio information of the other characters is restricted.

Figure 8B is a diagram showing a processing sequence executed in the processing system 1 according to one embodiment of this disclosure. Specifically, Figure 8B shows the processing sequence when the processing related to the selection of audio information, which is one type of speech information, is performed by the processor 211 of the sender terminal device 200-1. The processing in each device is executed by its processor running a program stored in that device's memory.

なお、コンテンツが一定のデータ量ごとにストリーミング配信される点を除いて、コンテンツ生成に係るＳ６１～Ｓ６４に係る処理は、図４に示すコンテンツ生成に係るＳ１１～Ｓ１４に係る処理と同じであるため、その説明は省略する。 Except for the fact that the content is streamed in fixed data chunks, the processes related to content generation S61-S64 are the same as the processes related to content generation S11-S14 shown in Figure 4, so their explanation will be omitted.

また、コンテンツが一定のデータ量ごとにストリーミング配信される点を除いて、コンテンツ出力に係る処理のうち、コンテンツ出力を受け付けてアプリケーションプログラムを起動し、出力インターフェイスを介して所望のコンテンツの出力をし、所望のオブジェクトに対応付けらえた音声情報の選択を行うまでのＳ７１～Ｓ７５に係る処理は、図４に示すコンテンツ出力に係る処理のうちのＳ２１～Ｓ２５に係る処理と同じであるため、その説明は省略する。 Furthermore, except for the fact that the content is streamed in fixed data chunks, the processing related to content output, specifically steps S71-S75, which involve receiving the content output, launching the application program, outputting the desired content via the output interface, and selecting the audio information associated with the desired object, is the same as the processing related to content output shown in Figure 4, specifically steps S21-S25. Therefore, a detailed explanation of these steps is omitted.

サーバ装置１００のプロセッサ１１１は、オブジェクト選択情報（Ｔ７３）を受信すると、コンテンツＩＤ情報に基づいてコンテンツの送信者である送信者端末装置２００－１を特定する（Ｓ７６）。そして、プロセッサ１１１は、通信インターフェイス１１３を介して、特定された送信者端末装置２００－１にオブジェクト選択情報（Ｔ７４）を送信する。 When the processor 111 of the server device 100 receives object selection information (T73), it identifies the sender terminal device 200-1, which is the sender of the content, based on the content ID information (S76). Then, the processor 111 transmits the object selection information (T74) to the identified sender terminal device 200-1 via the communication interface 113.
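The lookup-and-forward step in S76 might look like the sketch below, assuming the server keeps a table mapping each content ID to the sender terminal that distributes it. The `forward_selection` function, the table layout, and the terminal identifier are hypothetical.

```python
def forward_selection(content_table, selection):
    """Identify the sender terminal for the content and return the message to forward."""
    entry = content_table[selection["content_id"]]
    # The object selection information is forwarded unchanged (copied here).
    return entry["sender_terminal"], dict(selection)


content_table = {"content_1": {"sender_terminal": "terminal_200_1"}}
selection = {"content_id": "content_1", "object_ids": ["A"]}

destination, message = forward_selection(content_table, selection)
print(destination, message)
```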

When the processor 211 of the sender terminal device 200-1 receives the object selection information via the communication interface 213, it performs selective input of audio information (S77). Specifically, at the sender terminal device 200-1, image information and audio information are being input and distributed in real time, and the processor 211 accepts audio information input in association with object ID information through the processing of S62 and S63 (that is, the processing of S12 and S13 in Figure 4). The processor 211 then refers to the object ID information received in the object selection information and, when audio information associated with the same object ID information is input, stores that audio information in synchronization with the image information. Audio information associated with object ID information different from the received object ID information is still accepted as input but is not included in the content to be transmitted. In other words, the processor 211 generates content that contains only the audio information associated with the object ID information of the character selected by the receiver and contains no audio information associated with the object ID information of other characters.
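The sender-side behavior in S77 can be sketched per streamed chunk: all audio input is accepted, but only audio tied to the selected object IDs goes into the chunk that is transmitted. The `build_chunk` function and the chunk layout below are assumptions for illustration.

```python
def build_chunk(image_frames, captured_audio, selected_object_ids=None):
    """Build one streamed chunk of content.

    captured_audio is a list of (object_id, samples) pairs accepted as input.
    With no selection received, every captured track is included.
    """
    if selected_object_ids is None:
        included = list(captured_audio)
    else:
        wanted = set(selected_object_ids)
        # Unselected audio was accepted as input but is left out of the chunk.
        included = [(oid, samples) for oid, samples in captured_audio
                    if oid in wanted]
    return {"image": image_frames, "audio": included}


captured = [("A", b"a-samples"), ("B", b"b-samples")]
chunk = build_chunk("frames", captured, selected_object_ids=["A"])
print(chunk["audio"])
```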

送信者端末装置２００－１のプロセッサ２１１は、通信インターフェイス２１３を介して、上記のとおり生成したコンテンツ（Ｔ７５）をコンテンツＩＤ情報と共にサーバ装置１００に送信する。サーバ装置１００のプロセッサ１１１は、通信インターフェイス１１３を介してコンテンツを受信すると、コンテンツＩＤ情報に対応付けてコンテンツ管理テーブルに記憶するとともに、通信インターフェイス１１３を介して、オブジェクト選択情報を送信してきた受信者端末装置２００－２に受信した受信したコンテンツ（Ｔ７６）を送信する。 The processor 211 of the sender terminal device 200-1 transmits the content (T75) generated as described above, along with the content ID information, to the server device 100 via the communication interface 213. Upon receiving the content via the communication interface 113, the processor 111 of the server device 100 stores it in the content management table, associating it with the content ID information, and also transmits the received content (T76) to the receiver terminal device 200-2, which sent the object selection information, via the communication interface 113.

受信者端末装置２００－２のプロセッサ２１１は、通信インターフェイス２１３を介してコンテンツを受信すると、出力インターフェイス２１５を介して受信したコンテンツを出力する。このとき、受信したコンテンツには、上記のとおり、選択されたキャラクタのオブジェクトＩＤ情報に対応付けられた音声情報のみが含まれ、他のキャラクタのオブジェクトＩＤ情報に対応付けられた音声情報が含まれていない。すなわち、受信者により選択されたキャラクタ以外のオブジェクトＩＤ情報に対応付けられた音声情報は、その送信が制限されることによって、受信者端末装置２００－２における出力が制限されることになる。 The processor 211 of the receiver terminal device 200-2, upon receiving content via the communication interface 213, outputs the received content via the output interface 215. At this time, the received content includes only the audio information associated with the object ID information of the selected character, as described above, and does not include audio information associated with the object ID information of other characters. In other words, audio information associated with object ID information of characters other than the one selected by the receiver is restricted from transmission, thereby limiting its output in the receiver terminal device 200-2.

以上、図８Ｂに示す例によっても、図１～図７の実施形態と同様に、音声情報の選択的な出力が可能となる。 As described above, the example shown in Figure 8B also enables selective output of audio information, similar to the embodiments shown in Figures 1 to 7.

(C) Modification relating to the restricted audio information
Figures 1 to 8B describe the case where the content contains only audio information A of character A and audio information B of character B, so selecting character A restricts the output of audio information B and only audio information A is output. However, the output of the selected character's audio information may instead be restricted, with the audio information of the unselected character output without restriction.

(C) Modification with multiple senders
In the examples of Figures 1 to 8B, a single sender plays multiple characters by inputting, on one sender terminal device 200-1, the audio information of the multiple characters in association with object ID information. Instead of or in addition to this, audio information of multiple characters may be input in association with object ID information on multiple sender terminal devices 200-1, so that multiple senders play the same character or multiple senders divide multiple characters among themselves.

Figure 9 is a diagram showing an overview of processing in the processing system 1 according to an embodiment of this disclosure. Specifically, Figure 9 shows an example of processing in the distribution of video content using the processing system 1. In Figure 9, for the same video content, audio information A of character A and audio information B of character B are input at the sender terminal device of sender A and transmitted via the server device to the receiver's receiver terminal device. Likewise, audio information C of character C and audio information D of character D are input at the sender terminal device of sender B and transmitted via the server device to the receiver's receiver terminal device. Here, audio information A and audio information B are associated with sender ID information that identifies sender A or sender A's sender terminal device, and audio information C and audio information D are associated with sender ID information that identifies sender B or sender B's sender terminal device. Therefore, when the audio information to be output is selected at the receiver terminal device, the receiver may be allowed to select sender ID information instead of object ID information. For example, if the sender ID information of sender A is selected at the receiver terminal device, only audio information A and audio information B are output, and the output of audio information C and audio information D is restricted. If the sender ID information of sender B is selected at the receiver terminal device, only audio information C and audio information D are output, and the output of audio information A and audio information B is restricted.
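Selection by sender ID can be sketched by tagging each audio track with both an object ID and a sender ID and filtering on the latter. The track records and the `filter_by_sender` function below are hypothetical names used only for this illustration.

```python
def filter_by_sender(tracks, selected_sender_id):
    """Return (output, restricted) track names for a selected sender ID."""
    output = [t["name"] for t in tracks if t["sender_id"] == selected_sender_id]
    restricted = [t["name"] for t in tracks if t["sender_id"] != selected_sender_id]
    return output, restricted


tracks = [
    {"name": "audio A", "object_id": "char_A", "sender_id": "sender_A"},
    {"name": "audio B", "object_id": "char_B", "sender_id": "sender_A"},
    {"name": "audio C", "object_id": "char_C", "sender_id": "sender_B"},
    {"name": "audio D", "object_id": "char_D", "sender_id": "sender_B"},
]

print(filter_by_sender(tracks, "sender_A"))
print(filter_by_sender(tracks, "sender_B"))
```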

As described above, the example shown in Figure 9 also enables selective output of audio information, as in the embodiments of Figures 1 to 8B.

(D) Modifications relating to content, objects, and speech information
In the examples of Figures 1 to 8B, video content was used as the example of content, so the description assumed that the objects are character objects and the speech information is audio information. However, regardless of whether the content is video content or some other content, the same processing can be applied to other objects and other speech information. For example, besides video content, the content may be music content, game content, publication content, chat content, SNS content, web content, or a combination of these. Likewise, besides character objects, the objects may be structure objects, decorative objects, text objects, image objects, GUI objects, or a combination of these. Besides audio information, the speech information may be text information, image information, or a combination of these.

例えば、コンテンツとしてチャットコンテンツを本開示に係る実施形態に適用する場合、オブジェクトとしては各送信者に対応付けられて吹き出し形状をしたＧＵＩオブジェクトが挙げられ、発話情報には各ユーザがチャットとして入力したテキスト情報が挙げられる。このような場合であっても、受信者が所望の送信者のＧＵＩオブジェクトを選択することによって、他の送信者のＧＵＩオブジェクトに対応付けれたチャット（テキスト情報）の出力（表示）を制限する。これによって、特定の送信者のみを選択的に出力することが可能となる。 For example, when applying chat content as content to the embodiments described herein, the objects include GUI objects in the shape of speech bubbles associated with each sender, and the utterance information includes text information entered by each user as chat. Even in such a case, the recipient can restrict the output (display) of chat (text information) associated with the GUI objects of other senders by selecting the GUI object of the desired sender. This makes it possible to selectively output only specific senders.
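For the chat case, the same idea applies to text: selecting one sender's speech-bubble object restricts the display of the other senders' messages. The sketch below assumes messages are simple (sender, text) pairs; the `visible_chat` function and the sample messages are illustrative only.

```python
def visible_chat(messages, selected_sender=None):
    """Return the chat lines to display.

    messages is a list of (sender, text) pairs in arrival order. With no
    selection, every message is displayed.
    """
    return [f"{sender}: {text}" for sender, text in messages
            if selected_sender is None or sender == selected_sender]


messages = [
    ("sender_A", "Hello"),
    ("sender_B", "Hi there"),
    ("sender_A", "How are you?"),
]
print(visible_chat(messages, "sender_A"))
```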

本明細書で説明される処理及び手順は、実施形態において明示的に説明されたものによってのみならず、ソフトウェア、ハードウェア又はこれらの組み合わせによっても実現可能である。具体的には、本明細書で説明された処理及び手順は、集積回路、揮発性メモリ、不揮発性メモリ、磁気ディスク、光ストレージ等の媒体に、当該処理に相当するロジックを実装することによって実現される。また、本明細書で説明される処理及び手順は、それらの処理・手順をコンピュータプログラムとして実装し、処理装置やサーバ装置を含む各種のコンピュータに実行させることが可能である。 The processes and procedures described herein can be implemented not only by those explicitly described in the embodiments, but also by software, hardware, or a combination thereof. Specifically, the processes and procedures described herein are implemented by implementing the logic corresponding to the process on a medium such as an integrated circuit, volatile memory, non-volatile memory, magnetic disk, or optical storage. Furthermore, the processes and procedures described herein can be implemented as computer programs and executed by various computers, including processing units and server devices.

本明細書中で説明される処理及び手順が単一の装置、ソフトウェア、コンポーネント、モジュールによって実行される旨が説明されたとしても、そのような処理又は手順は、複数の装置、複数のソフトウェア、複数のコンポーネント、及び／又は、複数のモジュールによって実行されるものとすることができる。また、本明細書中で説明される各種情報が単一のメモリや記憶部に格納される旨が説明されたとしても、そのような情報は、単一の装置に備えられた複数のメモリ又は複数の装置に分散して配置された複数のメモリに分散して格納されるものとすることができる。さらに、本明細書において説明されるソフトウェア及びハードウェアの要素は、それらをより少ない構成要素に統合して、又は、より多い構成要素に分解することによって実現されるものとすることができる。 Even if the processes and procedures described herein are described as being performed by a single device, software, component, or module, such processes or procedures may be performed by multiple devices, multiple software programs, multiple components, and/or multiple modules. Similarly, even if the various types of information described herein are described as being stored in a single memory or storage unit, such information may be distributed and stored in multiple memories within a single device or in multiple memories distributed across multiple devices. Furthermore, the software and hardware elements described herein may be implemented by integrating them into fewer components or by decomposing them into more components.

1 Processing system
100 Server device
200 Terminal device
200-1 Sender terminal device
200-2 Receiver terminal device

Claims

A processing apparatus comprising at least one processor, wherein the at least one processor is configured to execute processing for:
receiving, from a sender terminal device via a communication interface, speech information input in association with each of a plurality of objects included in content generated at the sender terminal device;
selecting at least one of the plurality of objects via an input interface; and
when outputting the speech information via an output interface, outputting the speech information associated with the selected at least one object.

The processing apparatus according to claim 1, wherein the content is video content, and the object is a character object included in the video content.

The processing apparatus according to claim 2, wherein the utterance information is voice information input in association with the character object.

The processing apparatus according to claim 1, wherein the at least one processor is configured to execute processing for:
receiving each piece of speech information input in association with each of the plurality of objects; and
when outputting the speech information via the output interface, restricting the output of speech information associated with objects other than the selected at least one object, thereby outputting the speech information associated with the selected at least one object.

The processing apparatus according to claim 1, wherein the at least one processor is configured to execute processing for:
receiving, of the speech information input in association with each of the plurality of objects, only the speech information of the selected at least one object; and
when outputting the speech information via the output interface, outputting only the received speech information associated with the at least one object.

The processing apparatus according to claim 5, wherein speech information associated with objects other than the selected at least one object is restricted from being transmitted from the sender terminal device to the processing apparatus.

The processing apparatus according to claim 5, wherein the speech information is received from the sender terminal device via a remotely installed server device, and speech information associated with objects other than the selected at least one object is restricted from being transmitted from the server device to the processing apparatus.

The processing apparatus according to claim 1, wherein the speech information is input in association with each of the plurality of objects by the sender selecting one of the plurality of objects in advance at the sender terminal device.

A processing program that causes the at least one processor of a computer comprising at least one processor to execute processing for:
receiving, from a sender terminal device via a communication interface, speech information input in association with each of a plurality of objects included in content generated at the sender terminal device;
selecting at least one of the plurality of objects via an input interface; and
when outputting the speech information via an output interface, outputting the speech information associated with the selected at least one object.

A processing method performed by at least one processor in a computer comprising the at least one processor, the method comprising:
a step of receiving, from a sender terminal device via a communication interface, speech information input in association with each of a plurality of objects included in content generated at the sender terminal device;
a step of selecting at least one of the plurality of objects via an input interface; and
a step of outputting, when outputting the speech information via an output interface, the speech information associated with the selected at least one object.