JP4564432B2

JP4564432B2 - Video composition apparatus, video composition method, and program

Info

Publication number: JP4564432B2
Application number: JP2005267410A
Authority: JP
Inventors: 添博史川; 村卓也川
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2005-09-14
Filing date: 2005-09-14
Publication date: 2010-10-20
Anticipated expiration: 2025-09-14
Also published as: JP2007081863A

Description

The present invention relates to a video composition apparatus, a video composition method, and a program.

In recent years, significant advances in communication technology have made inexpensive broadband network infrastructure widely available. In particular, as communication services such as ADSL (Asymmetric Digital Subscriber Line) and FTTH (Fiber To The Home) lines have been rolled out, it has become easy to build a broadband network environment not only in companies but also in ordinary households.

With the spread of broadband networks, communication services using video and audio have begun to be offered. By building a system in which information devices capable of sending and receiving data exchange data over a network, two-way communication using video and audio becomes possible. Such a system is called a video conference system. In particular, multipoint video conference systems, which enable communication among multiple members by connecting multiple sites over a network, have attracted attention in recent years.

There are two ways to construct a multipoint video conference system. In one, the conference terminal apparatuses (also called conference client apparatuses, or simply clients) exchange video directly with one another. In the other, a multipoint control unit (MCU; also called a conference server apparatus, or simply a server) receives video from each client, enlarges, reduces, or crops those videos, composes (mixes) them into a single video, and then distributes it to each client. In a server-client configuration such as the latter, each client only needs to exchange video with a single MCU, so compared with the former, both the processing load on each client for sending and receiving video and the accompanying network load can be reduced. Consequently, the multipoint video conference systems currently on offer mostly provide their services using a server-client configuration.
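The load difference between the two configurations can be illustrated by counting one-way video streams. The sketch below assumes that, in the direct-exchange case, every client sends its own video separately to every other client, and that, in the server-client case, each client has exactly one uplink to the MCU and one downlink carrying the composite video. The function name and the returned keys are hypothetical and chosen only for illustration.

```python
def stream_counts(n_clients: int) -> dict:
    """Count one-way video streams for a conference with n_clients participants."""
    # Direct exchange: each client sends its video to each of the other clients.
    mesh_total = n_clients * (n_clients - 1)
    mesh_per_client = 2 * (n_clients - 1)  # streams sent plus streams received

    # Server-client: one uplink and one composite downlink per client.
    mcu_total = 2 * n_clients
    mcu_per_client = 2

    return {
        "mesh_total": mesh_total,
        "mesh_per_client": mesh_per_client,
        "mcu_total": mcu_total,
        "mcu_per_client": mcu_per_client,
    }


if __name__ == "__main__":
    print(stream_counts(4))
```

Under these assumptions, the number of streams each client handles grows with the number of participants in the direct-exchange case but stays constant when an MCU is used.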

There are also various composition layouts for the video composition performed by the MCU. Examples include a layout in which the videos received from the clients are reduced to equal sizes and tiled without gaps, a layout in which one video is made somewhat larger than the others and all are tiled without gaps, and a picture-in-picture layout in which one video is maximized and the other videos are reduced and overlaid in front of it.

Furthermore, for the video composition performed by the MCU, besides a method in which all clients share the same composite video, a method in which each client can view a composite video with its own independent layout has also been considered. In the former method, a representative client specifies the layout of the composite video, and the MCU performs a single video composition and distributes a copy to each client. In the latter method, each client specifies the layout of the composite video it wants and notifies the MCU of that layout. The MCU independently performs a plurality of video compositions based on the layouts notified by the clients and distributes the generated composite videos to the clients.

In general, a video conference system is required to transmit high-quality video and audio in order to serve as a stress-free communication tool. It is therefore unavoidable that the transmission of video and audio data for the video conference system occupies a large share of the network bandwidth available to the user. This is particularly problematic when the available network bandwidth is limited, for example when the network bandwidth is shared by multiple users or when a client is connected via a mobile phone network, or when other data transmission, such as desktop image sharing or application data sharing, is carried out in parallel with video and audio. In such situations, how to economize on video and audio data and transmit it efficiently becomes an important issue.

As a conventional solution to this problem, there is, for example, a method of automatically controlling the quality of video and audio data according to network congestion. In this method, the congestion of the network is checked at the start of a conference or while it is in progress, and when worsening congestion is detected, the transmission quality of the video and audio data is automatically lowered to save communication bandwidth.
JP-A-10-304335; JP-A-2004-187170

In a video conference system in which a plurality of clients connect to an MCU and each client can view its own independent layout, an important issue is how to perform control so that unnecessarily high-quality video data is not transmitted from the clients to the server, even in an environment where network congestion does not change significantly.

In this regard, JP-A-10-304335 (Patent Document 1) senses information on the region that the user of a client is paying attention to and notifies the server of it. Among the plural pieces of video data transmitted from the server to the client, the quality of the video corresponding to the region of interest is raised and the quality of the video corresponding to the other regions is lowered, thereby saving communication bandwidth.

However, while the method of JP-A-10-304335 can control the transmission of video data from the server to the client, it cannot control the transmission of video data from the client to the server.

JP-A-2004-187170 (Patent Document 2) therefore proposes a method in which, when a client selects a screen template, the server notifies the client of image parameter information corresponding to that screen template, and the quality of the image data transmitted from the client is controlled so as to follow that image parameter information.

However, while the method of JP-A-2004-187170 can control the transmission of video data from the client to the server, the transmission control does not take into account how that video is being viewed by the other clients. As a result, even when no other client is displaying the video sent from a client, or when it is displayed only at a very small size, the client still tries to send high-quality video to the server. Sending high-quality video that nobody is paying much attention to is wasteful, and the client ends up transmitting data inefficiently. This is undesirable because it needlessly consumes the client's processing capacity and needlessly strains the network's communication bandwidth.

The present invention has been made to solve the above problems, and its object is to provide a video composition apparatus, a video composition method, and a program that efficiently control the video data transmitted from a video display device to the video composition apparatus and thereby make it possible to save communication bandwidth.

A video composition apparatus as one aspect of the present invention comprises: video receiving means for receiving first to Nth encoded video data from first to Nth (N is an integer of 2 or more) video display devices; decoding means for decoding the first to Nth encoded video data and outputting first to Nth decoded video data; layout information receiving means for receiving first to Nth layout information for video composition from the first to Nth video display devices; layout information storage means for storing the received first to Nth layout information; video composition means for composing first to Nth videos represented by the first to Nth decoded video data in accordance with the first to Nth layout information to generate first to Nth composite videos; video transmitting means for transmitting the first to Nth composite videos to the first to Nth video display devices; determination value calculation means for calculating a determination value for the first video display device based on the layout of the first video in at least the second to Nth layout information among the first to Nth layout information; video quality control means for determining, from the calculated determination value, an encoding parameter to be applied in the first video display device; and video quality control signal transmitting means for transmitting the determined encoding parameter to the first video display device.
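The passage above states only that an encoding parameter is determined from the determination value; it does not say how. As a purely illustrative sketch, the mapping could be a threshold table like the one below. The `EncodingParams` type, the thresholds, and every resolution and bit rate figure are assumptions made up for this example and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodingParams:
    """Hypothetical encoding parameters sent back to a client."""
    width: int
    height: int
    bitrate_kbps: int


# Hypothetical table: (minimum determination value, parameters to apply).
# Entries are ordered from the highest threshold to the lowest.
_PARAM_TABLE = [
    (75, EncodingParams(640, 480, 1024)),
    (50, EncodingParams(320, 240, 384)),
    (25, EncodingParams(160, 120, 128)),
    (0, EncodingParams(80, 60, 32)),
]


def choose_encoding_params(determination_value: int) -> EncodingParams:
    """Pick the first table entry whose threshold the value reaches."""
    for threshold, params in _PARAM_TABLE:
        if determination_value >= threshold:
            return params
    raise ValueError("determination value must be non-negative")


if __name__ == "__main__":
    print(choose_encoding_params(50))
```

The design idea being illustrated is only that a video displayed small by every viewer can be encoded at lower quality; a real implementation would choose its own parameters and granularity.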

A video display device as one aspect of the present invention comprises: video generating means for generating video data; encoding means for encoding the video data in accordance with an encoding parameter and outputting encoded video data; video transmitting means for transmitting the encoded video data to a video composition apparatus; layout information transmitting means for transmitting layout information for video composition to the video composition apparatus; video receiving means for receiving composite video data; display means for displaying the composite video data; layout information receiving means for receiving the layout information used by each of one or more other video display devices; determination value calculation means for calculating a determination value for the own device from the layout of the own device's video in each piece of received layout information; and video quality control means for determining, from the calculated determination value, an encoding parameter to be applied and notifying the encoding means of the determined encoding parameter.

A video composition method as one aspect of the present invention comprises: generating first to Nth video data in first to Nth (N is an integer of 2 or more) video display devices; encoding the first to Nth video data in accordance with first to Nth encoding parameters to generate first to Nth encoded video data; transmitting the first to Nth encoded video data to a video composition apparatus; transmitting first to Nth layout information for video composition to the video composition apparatus; decoding the first to Nth encoded video data to generate first to Nth decoded video data; composing first to Nth videos represented by the first to Nth decoded video data in accordance with the first to Nth layout information to generate first to Nth composite videos; transmitting the first to Nth composite videos to the first to Nth video display devices; calculating a determination value for the first video display device based on the layout of the first video in at least the second to Nth layout information among the first to Nth layout information; and determining, from the calculated determination value, the first encoding parameter to be applied in the first video display device.

A program as one aspect of the present invention causes a computer to execute the steps of: receiving first to Nth encoded video data from first to Nth (N is an integer of 2 or more) video display devices; decoding the first to Nth encoded video data and outputting first to Nth decoded video data; receiving first to Nth layout information for video composition from the first to Nth video display devices; storing the received first to Nth layout information; composing first to Nth videos represented by the first to Nth decoded video data in accordance with the first to Nth layout information to generate first to Nth composite videos; transmitting the first to Nth composite videos to the first to Nth video display devices; calculating a determination value for the first video display device based on the layout of the first video in at least the second to Nth layout information among the first to Nth layout information; determining, from the calculated determination value, an encoding parameter to be applied in the first video display device; and transmitting the determined encoding parameter to the first video display device.

A program as one aspect of the present invention causes a video display device to execute the steps of: generating video data; encoding the video data in accordance with an encoding parameter and outputting encoded video data; transmitting the encoded video data to a video composition apparatus; transmitting layout information for video composition to the video composition apparatus; receiving composite video data; displaying the composite video data; receiving the layout information used by each of one or more other video display devices; calculating a determination value for the own device from the layout of the own device's video in each piece of received layout information; and determining, from the calculated determination value, an encoding parameter to be applied.

According to the present invention, the video data transmitted from the clients to the server is controlled based on how the videos of the plurality of clients participating in a conference are being viewed, which makes it possible to save communication bandwidth.

(First Embodiment)
Hereinafter, a first embodiment of the present invention will be described with reference to FIGS. 1 to 17.

FIG. 1 shows the system configuration of a multipoint video conference system to which the present invention is applied. The system of FIG. 1 comprises a conference server apparatus 100 and four conference client apparatuses 300A, 300B, 300C, and 300D. The conference client apparatuses 300A, 300B, 300C, and 300D are connected to the conference server apparatus 100 via networks 500A, 500B, 500C, and 500D, respectively.

The conference client apparatuses 300A, 300B, 300C, and 300D are equipped with camera devices 310A, 310B, 310C, and 310D, respectively. Each conference client apparatus has a function of transmitting video data captured with its camera device to the conference server apparatus 100 through the networks 500A, 500B, 500C, and 500D. The conference client apparatuses 300A, 300B, 300C, and 300D also have a function of receiving video data transmitted from the conference server apparatus 100 and displaying it on their respective display devices 360A, 360B, 360C, and 360D. They further have a function of transmitting control information to the conference server apparatus 100 through the networks 500A, 500B, 500C, and 500D.

The conference server apparatus 100 has a function of receiving the video data transmitted from the conference client apparatuses 300A, 300B, 300C, and 300D. It also has a function of generating a plurality of composite videos with different layouts, each of which combines the plural pieces of received video data into a single video. The video compositions can be performed independently for each conference client apparatus. The conference server apparatus 100 further has a function of transmitting the composite videos to the conference client apparatuses 300A, 300B, 300C, and 300D through the networks 500A, 500B, 500C, and 500D, respectively, and likewise a function of transmitting control signals to each conference client apparatus through the networks 500A, 500B, 500C, and 500D.

Next, FIG. 2 shows the internal components of the conference server apparatus and the conference client apparatus that relate to the present invention. The functions represented by the internal components of the conference server apparatus and of the conference client apparatus may be realized by causing a computer to execute a program, or may be realized in hardware.

The conference server apparatus 100 has, as its components: a video receiving unit 110 that receives video data using a communication path 510A; a decoding unit 120 that decodes video data; a layout control signal receiving unit 130 that receives a layout control signal using a communication path 530A; a layout information storage unit 140 that stores layout information obtained by analyzing the layout control signal; a video composition unit 150 that performs video composition processing; an encoding unit 160 that encodes the composite video; a video transmitting unit 170 that transmits video data using a communication path 520A; a determination value calculation unit 180 that calculates a determination value from the layout information; a video quality control unit 190 that calculates a video encoding control parameter from the determination value; and a video quality control signal transmitting unit 200 that transmits the video encoding control parameter using a communication path 540A. The operation of each internal component of the conference server apparatus 100 is described below in terms of the processing performed with the conference client apparatus 300A, but the same processing is also performed with the conference client apparatuses 300B, 300C, and 300D.

The conference client apparatus 300A has, as its components: a camera unit 310A; an encoding unit 320A that encodes the synthesized video; a video transmitting unit 330A that transmits video data through the communication path 510A; a video receiving unit 340A that receives video data using the communication path 520A; a decoding unit 350A that decodes video data; a display unit 360A that displays video; a user interface unit 370A; a layout control signal transmitting unit 380A that transmits a layout control signal using the communication path 530A; and a video quality control signal receiving unit 390A that receives the video encoding control parameter using the communication path 540A. Although FIG. 2 shows only the one conference client apparatus 300A, the conference client apparatuses 300B, 300C, and 300D are also connected to the conference server apparatus 100, and each has the same internal components as the conference client apparatus 300A.

The internal components of the conference server apparatus 100 are described in detail below.

The video receiving unit 110 receives encoded video data from the conference client apparatus 300A using the communication path 510A and outputs it to the decoding unit 120.

When video data is input from the video receiving unit 110, the decoding unit 120 decodes it and outputs the result to the video composition unit 150. Although this embodiment uses MPEG4 as the video encoding scheme, the type of encoding scheme is not essential to the present invention, and another video encoding scheme such as MPEG2, H.263, or H.264 may be used instead. In that case, the encoding schemes used in the decoding unit 120 and the encoding unit 320A shall support the same video format.

The layout control signal receiving unit 130 receives a layout control signal from the conference client apparatus 300A using the communication path 530A. The layout control signal is transmitted as an IP packet consisting of an IP header and an IP payload. The IP header contains the IP address of the conference client apparatus 300A that sent the layout control signal, and the IP payload contains the layout information 1000A specified at the conference client apparatus 300A. The layout control signal receiving unit 130 analyzes the layout control signal, identifies the conference client apparatus 300A as the sender from the source IP address contained in the IP header, and extracts the layout information 1000A from the IP payload.
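The handling of a layout control signal described above can be sketched as follows. The text does not define the payload encoding, so this sketch assumes, purely for illustration, that the payload is UTF-8 JSON; the IP addresses, the `CLIENT_BY_IP` table, and the field names are likewise hypothetical.

```python
import json

# Hypothetical mapping from source IP address to conference client apparatus.
CLIENT_BY_IP = {
    "192.0.2.1": "300A",
    "192.0.2.2": "300B",
    "192.0.2.3": "300C",
    "192.0.2.4": "300D",
}


def parse_layout_signal(source_ip: str, payload: bytes):
    """Identify the sending client and extract its layout information."""
    # The sender is identified from the source address in the IP header.
    client = CLIENT_BY_IP[source_ip]  # raises KeyError for an unknown sender
    # The layout information is carried in the payload (assumed JSON here).
    layout = json.loads(payload.decode("utf-8"))
    return client, layout


if __name__ == "__main__":
    raw = b'[{"id": 2, "x": 0, "y": 0, "w": 50, "h": 50, "layer": 1}]'
    print(parse_layout_signal("192.0.2.1", raw))
```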

Here, layout information is information that specifies how the video composition unit 150 composes video. For example, as shown in FIG. 3, it is a set of numerical values giving, for each piece of video data, an identification number, a composition position, a composition size, and a relative position in the front-back direction (layer). The identification number identifies the conference client apparatuses 300A, 300B, 300C, and 300D; here the identification number of 300A is 1, that of 300B is 2, that of 300C is 3, and that of 300D is 4. The composition position and composition size use XY coordinates in which both the horizontal and vertical extents of the composite video are normalized to 100: the composition position is the pair of (X, Y) values of the upper-left corner of the video, and the composition size is the pair of horizontal and vertical lengths (W, H) of the video. In this embodiment, W = H. The relative position in the front-back direction is a natural number from 1 to 4, where 1 is the frontmost display position in the composite video and 4 is the rearmost. Although this embodiment defines layout information as a set of identification number, composition position, composition size, and front-back relative position, it is not limited to this and may also include, for example, a crop position and a crop size. The crop position and crop size can be defined when the video composition unit 150 supports cropping part of a video. As shown in FIG. 5, using XY coordinates in which both the horizontal and vertical extents of the input video are normalized to 100, the crop position is defined by the pair of (X, Y) values of the upper-left corner of the crop region, and the crop size by the pair of horizontal and vertical lengths (W, H) of the crop region. When the layout control signal receiving unit 130 extracts the layout information 1000A, it outputs it to the layout information storage unit 140 and also outputs, to the determination value calculation unit 180, a request to calculate the determination value for the conference client apparatus 300A.
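One possible in-memory representation of a single layout entry, together with the conversion from the normalized 0 to 100 coordinates to pixels, is sketched below. The `LayoutEntry` type, its field names, and the use of integer division for rounding are assumptions made for this illustration; the text only defines the values themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayoutEntry:
    """One row of layout information, in coordinates normalized to 100."""
    video_id: int  # 1 to 4, identifying 300A to 300D
    x: int         # upper-left corner, horizontal
    y: int         # upper-left corner, vertical
    w: int         # horizontal length
    h: int         # vertical length
    layer: int     # 1 = frontmost, 4 = rearmost


def to_pixels(entry: LayoutEntry, canvas_w: int, canvas_h: int):
    """Convert normalized position and size to a pixel rectangle (x, y, w, h)."""
    return (
        entry.x * canvas_w // 100,
        entry.y * canvas_h // 100,
        entry.w * canvas_w // 100,
        entry.h * canvas_h // 100,
    )


if __name__ == "__main__":
    entry = LayoutEntry(video_id=1, x=50, y=0, w=50, h=50, layer=1)
    print(to_pixels(entry, 640, 480))
```

Because the coordinates are normalized, the same layout information can be applied to composite videos of any pixel resolution.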

When the layout information 1000A is input from the layout control signal receiving unit 130, the layout information storage unit 140 stores it. If new input subsequently arrives from the conference client apparatus 300A, the layout information 1000A is updated with the new values. When a request to obtain the layout information 1000A is input from the video composition unit 150, the layout information storage unit 140 outputs the stored layout information 1000A to the video composition unit 150. When layout information 1000B, 1000C, and 1000D is input from the conference client apparatuses 300B, 300C, and 300D, the layout information storage unit 140 stores it in the same way. When a request to obtain the layout information 1000A, 1000B, 1000C, and 1000D is input from the determination value calculation unit 180, the layout information storage unit 140 outputs the stored layout information 1000A, 1000B, 1000C, and 1000D to the determination value calculation unit 180.
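The storage behavior described above (store per client, overwrite on new input, return one layout or all of them) can be sketched with a small dictionary-backed class. The class and method names are hypothetical and are not taken from the patent.

```python
class LayoutStore:
    """Minimal sketch of a per-client layout information store."""

    def __init__(self):
        self._layouts = {}

    def put(self, client_id, layout):
        # A later input from the same client replaces the stored layout.
        self._layouts[client_id] = list(layout)

    def get(self, client_id):
        # Used when one client's composite video is being built.
        return self._layouts[client_id]

    def get_all(self):
        # Used when a determination value is calculated from every layout.
        return dict(self._layouts)


if __name__ == "__main__":
    store = LayoutStore()
    store.put("300A", [{"id": 2, "w": 100}])
    store.put("300A", [{"id": 2, "w": 50}])  # update replaces the old value
    print(store.get("300A"))
```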

When video data is input from the decoding unit 120, the video composition unit 150 outputs to the layout information storage unit 140 an acquisition request for the layout information 1000A of the conference client device 300A. When the layout information 1000A is input from the layout information storage unit 140, the video composition unit 150 composes the video according to the layout information 1000A, for example as shown in FIG. 4, and generates the composite video 1010A. The generated composite video 1010A is input to the encoding unit 160.

When the composite video data 1010A is input from the video composition unit 150, the encoding unit 160 encodes it and outputs it to the video transmission unit 170. This embodiment uses MPEG4 as the video coding scheme, but the choice of coding scheme is not essential to the present invention, and other video coding schemes such as MPEG2, H.263, and H.264 may be used instead. In that case, the coding schemes used in the encoding unit 160 and the decoding unit 350A shall support the same video format.

When the encoded composite video data 1010A is input from the encoding unit 160, the video transmission unit 170 sends the composite video data 1010A to the conference client device 300A over the communication path 520A.

When a determination value calculation request for the conference client device 300A is input from the layout control signal receiving unit 130, the determination value calculation unit 180 outputs to the layout information storage unit 140 an acquisition request for the layout information 1000A, 1000B, 1000C, and 1000D of all conference client devices. When the four pieces of layout information 1000A, 1000B, 1000C, and 1000D shown in FIG. 6 are input from the layout information storage unit 140, the determination value calculation unit 180 calculates from them the determination values 1020A, 1020B, 1020C, and 1020D shown in FIG. 7. A determination value is defined for each conference client device and expresses how large or how sharp the camera video sent by that device is expected to be displayed, that is, how much attention the video is receiving. Here, the determination value 1020A of the conference client device 300A is defined as the maximum of the horizontal lengths (W) of the 300A video on the composite videos viewed by the other conference client devices 300B, 300C, and 300D. In other words, in the three pieces of layout information 1000B, 1000C, and 1000D of FIG. 6, the row whose identification number equals 1 (the identification number of 300A) is examined, and the largest W value among those rows is taken as the determination value 1020A. (In the example of FIG. 6, the determination value 1020A is the maximum of 50, 50, and 25, namely 50.) In this embodiment the determination value is calculated from the three pieces of layout information other than 1000A, but it may instead be calculated from all four pieces including 1000A. Also, although the maximum horizontal length is used as the determination value here, the minimum may be used instead.
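The width-based calculation described above can be sketched in Python as follows. The data layout (a mapping from each viewer's identification number to that viewer's layout rows) and the names `determination_value`, `include_self`, and `use_min` are assumptions chosen for this illustration; only the W values 50, 50, and 25 come from the text.

```python
def determination_value(layouts, target_id, include_self=False, use_min=False):
    """Pick the largest (or smallest) width W assigned to target_id.

    layouts maps a viewer's identification number to that viewer's layout
    information, itself a mapping from a source identification number to
    a row such as {"w": 50}. Only the W field is used here.
    """
    widths = [
        rows[target_id]["w"]
        for viewer_id, rows in layouts.items()
        if (include_self or viewer_id != target_id) and target_id in rows
    ]
    if not widths:
        raise ValueError("no layout refers to the target client")
    return min(widths) if use_min else max(widths)


# W values quoted in the text for client 300A (identification number 1)
# in the layouts of 300B, 300C, and 300D (numbers 2, 3, 4).
example = {2: {1: {"w": 50}}, 3: {1: {"w": 50}}, 4: {1: {"w": 25}}}
print(determination_value(example, target_id=1))  # prints 50
```

The two optional flags correspond to the variations mentioned in the text: including the target's own layout information, and using the minimum instead of the maximum.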

Although the determination value is calculated here from the horizontal length (W) of the video, it may be calculated from other information. As one possible example, when calculating the determination value for the conference client device 300A, the determination value calculation unit 180 may look at the horizontal length of the composite crop applied to the 300A video on the composite videos viewed by the client devices other than 300A, and obtain the determination value from the smallest or the largest of these three lengths. As an example of the former, when even one of the three horizontal crop lengths is less than 100, the determination value may always be output as 100, regardless of the horizontal length (W) of the 300A video on each composite video. When all three horizontal crop lengths equal 100, the maximum of the horizontal lengths (W) of the 300A video on the composite videos is output as the determination value, as before. For the layout information of FIG. 6, the determination value of 300A is then 100, because in the layout information 1000C the horizontal crop length of the video corresponding to 300A is less than 100. As yet another calculation method, when calculating the determination value for 300A, the determination value calculation unit 180 may look at the front-to-back relative position of the 300A video on the composite videos viewed by the client devices other than 300A, determine the minimum or maximum of these positions, and output a determination value of 100 if that value is 1, 75 if it is 2, 50 if it is 3, and 25 if it is 4. For the layout information of FIG. 6, the front-to-back relative positions of the 300A video on the composite videos other than that of 300A are 2, 4, and 4, so the determination value is 75, corresponding to the minimum value 2. After calculating the determination value 1020A, the determination value calculation unit 180 outputs it to the video quality control unit 190.
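The two alternative calculations described above (one based on the horizontal crop length, one based on the front-to-back relative position) might be sketched in Python as follows. The function names are hypothetical, and the crop lengths in the usage note are illustrative values, since the text states only that the crop length in 1000C is below 100.

```python
# Mapping from front-to-back relative position to determination value,
# as stated in the text: 1 -> 100, 2 -> 75, 3 -> 50, 4 -> 25.
Z_ORDER_TO_VALUE = {1: 100, 2: 75, 3: 50, 4: 25}


def value_from_crop_width(crop_widths, widths):
    """Return 100 if any viewer crops the video horizontally.

    crop_widths holds the horizontal crop length of the target video in
    each other viewer's layout; widths holds the displayed width W.
    When no viewer crops (all crop lengths are 100), fall back to max W.
    """
    if any(crop_width < 100 for crop_width in crop_widths):
        return 100
    return max(widths)


def value_from_z_order(z_positions, use_max=False):
    """Map the smallest (or largest) front-to-back position to a value."""
    chosen = max(z_positions) if use_max else min(z_positions)
    return Z_ORDER_TO_VALUE[chosen]
```

With the positions 2, 4, and 4 from the text, `value_from_z_order([2, 4, 4])` yields 75. With assumed crop lengths such as `[100, 50, 100]`, `value_from_crop_width` yields 100 regardless of the displayed widths.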

Three broad methods for calculating the determination value have been described above. These methods may be used separately, or the determination value may be calculated using any combination of them. For example, the determination values obtained by the individual methods may be input to a function that weights each of them. Alternatively, the smallest or largest of the determination values obtained by the individual methods may be used as the final determination value.
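One way to picture these combinations is the Python sketch below. The text does not specify the weighting function, so the weighted average used here is an assumption, as are the function names and the weights in the usage note.

```python
def combine_weighted(values, weights):
    """Combine per-method determination values by a weighted average.

    The weighted average is an assumed choice of weighting function.
    """
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal in length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


def combine_extreme(values, use_max=False):
    """Take the smallest (or largest) of the per-method values."""
    return max(values) if use_max else min(values)
```

For the values 50, 100, and 75 obtained for 300A by the three methods in the FIG. 6 example, equal weights give 75.0, while `combine_extreme` gives 50 or, with `use_max=True`, 100.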

When the determination value 1020A is input from the determination value calculation unit 180, the video quality control unit 190 calculates the video encoding control parameter 1030A. The video encoding control parameter 1030A is the value used when the encoding unit 320A of the conference client device 300A encodes the camera video; here it specifies the resolution and frame rate for encoding. To calculate it, the video quality control unit 190 holds internal correspondence tables between determination values and video encoding control parameters, as shown in FIGS. 8 and 9, and refers to them as needed. Normally, the larger the determination value 1020A, the larger the resolution and frame rate in the video encoding control parameter 1030A, which corresponds to higher-quality video. After calculating the video encoding control parameter 1030A, the video quality control unit 190 adds to it an IP header containing the IP address of the destination conference client device 300A and outputs it to the video quality control signal transmission unit 200.
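A table lookup of this kind could look like the following Python sketch. The thresholds in `QUALITY_TABLE` are assumptions and do not reproduce the tables of FIGS. 8 and 9; the two parameter sets (320 × 240 at 15 fps and 640 × 480 at 30 fps) are the ones that appear in the worked example elsewhere in this description.

```python
from bisect import bisect_right

# (minimum determination value, (width, height), frames per second).
# The threshold values 0 and 50 are assumed for illustration only.
QUALITY_TABLE = [
    (0, (320, 240), 15),
    (50, (640, 480), 30),
]


def encoding_parameters(value):
    """Look up the resolution and frame rate for a determination value."""
    thresholds = [row[0] for row in QUALITY_TABLE]
    index = bisect_right(thresholds, value) - 1
    if index < 0:
        raise ValueError("determination value below the table range")
    _, resolution, frame_rate = QUALITY_TABLE[index]
    return {"resolution": resolution, "frame_rate": frame_rate}
```

Keeping the table sorted by threshold preserves the property stated above: a larger determination value never maps to a lower resolution or frame rate.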

When the video encoding control parameter 1030A with its IP header is input from the video quality control unit 190, the video quality control signal transmission unit 200 further adds an IP header containing, among other things, the IP address of the source conference server device 100, and generates an IP packet. This IP packet is called a video quality control signal. The video quality control signal transmission unit 200 sends the video quality control signal to the conference client device 300A over the communication path 540A.

Next, the internal components of the conference client device 300A are described in detail.

The camera unit 310A outputs to the encoding unit 320A the video data obtained by capturing, for example, the face of a conference participant, document material, or a PC desktop. This embodiment uses video data captured by the camera unit, but the present invention may instead read and use video data stored in an external storage device such as a hard disk, floppy disk, MO (Magneto-Optical disk), CD-R (Compact Disk Recordable), or magnetic tape.

When video data is input from the camera unit 310A, the encoding unit 320A encodes it and outputs it to the video transmission unit 330A. Encoding follows the video encoding control parameters (resolution and frame rate) set in advance as default values. When video encoding control parameters are input from the video quality control signal receiving unit 390A, the encoding unit 320A updates the values it holds with the input values. This embodiment uses MPEG4 as the video coding scheme.
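The way the encoding unit holds default parameters and overwrites them on request can be pictured with the short Python sketch below. The class name `EncoderSettings` and its fields are hypothetical; the default values of 640 × 480 and 30 fps are those given for FIG. 13 in this description.

```python
from dataclasses import dataclass


@dataclass
class EncoderSettings:
    """Video encoding control parameters held by an encoding unit."""

    width: int = 640
    height: int = 480
    frame_rate: int = 30

    def update(self, width, height, frame_rate):
        """Overwrite the held values with newly received parameters."""
        if min(width, height, frame_rate) <= 0:
            raise ValueError("parameters must be positive")
        self.width = width
        self.height = height
        self.frame_rate = frame_rate
```

Each encoded frame would then use whatever values are held at the time, so a received parameter takes effect from the next frame onward in this sketch.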

When the encoded video data is input from the encoding unit 320A, the video transmission unit 330A transmits it to the conference server device 100 over the communication path 510A.

The video receiving unit 340A receives video data from the conference server device 100 over the communication path 520A and outputs it to the decoding unit 350A.

When the encoded video data is input from the video receiving unit 340A, the decoding unit 350A decodes it and outputs it to the display unit 360A. This embodiment uses MPEG4 as the video coding scheme.

The display unit 360A displays the video data input from the decoding unit 350A. The display unit 360A is a display device such as a CRT (Cathode Ray Tube) display or a liquid crystal display.

The user interface unit 370A has a pointing device such as a mouse and an input device such as a keyboard, which the user operates to enter the desired layout information. In response to the user's operation, the user interface unit 370A updates the cursor position of the pointing device shown on the display unit 360A, and when it detects that layout information has been entered, it outputs the layout information to the layout control signal transmission unit 380A. The user can change the layout information visually, in the same way as manipulating ordinary window objects. Alternatively, as shown in FIG. 10, predetermined layout patterns 1040A, 1050A, and 1060A may be displayed as choices on the display unit 360A at the user's instruction. In this case, the user uses the input device to select the pattern judged closest to the desired layout. Each pattern has corresponding layout information, and the layout information of the selected pattern is treated as the layout information desired by the user. Once the desired layout information is determined, it is output to the layout control signal transmission unit 380A.

When layout information is input from the user interface unit 370A, the layout control signal transmission unit 380A adds to it an IP header containing, among other things, the IP address of the destination conference client device 300A, and generates an IP packet. This IP packet is called a layout control signal. After generating the layout control signal, the layout control signal transmission unit 380A transmits it to the conference server device 100 over the communication path 530A.

The video quality control signal receiving unit 390A receives the video quality control signal from the conference server device 100 over the communication path 540A. It extracts the video encoding control parameter 1030A from the IP payload portion of the video quality control signal and outputs it to the encoding unit 320A.

Next, as an example of processing in the conference server device 100 and the conference client devices 300A, 300B, 300C, and 300D, the following shows how the quality of the video sent by each conference client device is controlled dynamically when the conference client devices change the layout information managed inside the conference server device 100. The conference client devices 300A, 300B, 300C, and 300D are distinguished by the identification numbers 1, 2, 3, and 4.

Immediately after the conference starts, as shown in FIG. 11, the layout information of each conference client device stored in the layout information storage unit 140 is set to the default layout information 1070A, 1070B, 1070C, and 1070D. In the layout information 1070A, 1070B, 1070C, and 1070D, all videos have the same size ((W, H) = (50, 50)) and are arranged so as not to overlap one another. The video composition unit 150 composes the video based on this layout information, and the display units 360A, 360B, 360C, and 360D of the conference client devices show a composite screen in which every participant's video has the same size, as in FIG. 12. Also by default, the video encoding control parameters used for encoding in the encoding units 320A, 320B, 320C, and 320D of the conference client devices are all set to a resolution of 640 × 480 and a frame rate of 30 fps, as in FIG. 13.
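A default layout with four equal, non-overlapping 50 × 50 tiles can be sketched in Python as follows. The assignment of clients to grid cells and the front-to-back values are assumptions, since FIG. 11 is not reproduced here; the helper names are hypothetical.

```python
from itertools import combinations


def default_layout(client_ids=(1, 2, 3, 4)):
    """Place four videos in a 2 x 2 grid on the 0-100 normalized canvas."""
    if len(client_ids) != 4:
        raise ValueError("this sketch handles exactly four clients")
    return [
        {"id": client_id,
         "x": 50 * (index % 2), "y": 50 * (index // 2),
         "w": 50, "h": 50,
         "z": index + 1}  # front-to-back order is assumed
        for index, client_id in enumerate(client_ids)
    ]


def overlaps(a, b):
    """True if rectangles a and b share any interior area."""
    return (a["x"] < b["x"] + b["w"] and b["x"] < a["x"] + a["w"]
            and a["y"] < b["y"] + b["h"] and b["y"] < a["y"] + a["h"])


def has_overlap(rows):
    """True if any two rows of a layout overlap."""
    return any(overlaps(a, b) for a, b in combinations(rows, 2))
```

Under these assumptions `has_overlap(default_layout())` is false, matching the statement that the default videos do not overlap.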

Next, consider the situation in which the users participating in the conference change their own layout information. Each user can change the layout of the composite video he or she is currently viewing by using the user interface unit 370A, 370B, 370C, or 370D. Suppose that the users' operations specify the layout information 1080A, 1080B, 1080C, and 1080D shown in FIG. 14. This layout information is sent to the conference server device 100 through the layout control signal transmission units 380A, 380B, 380C, and 380D of the conference client devices and the communication paths 530A, 530B, 530C, and 530D. Once it has been stored in the layout information storage unit 140 via the layout control signal receiving unit 130, the video composition unit 150 refers to it and composes the video as shown in FIG. 15. At the same time, the layout control signal receiving unit 130 inputs to the determination value calculation unit 180 a determination value calculation request for each conference client device. On receiving the request, the determination value calculation unit 180 acquires the layout information 1080A, 1080B, 1080C, and 1080D from the layout information storage unit 140 and calculates the determination value for each conference client device from it, as shown in FIG. 16. For example, the determination value for the conference client device 300A is calculated as follows. First, of the acquired layout information, attention is paid to the layout information other than that obtained from 300A, that is, 1080B, 1080C, and 1080D. In each, the row whose identification number is 1 (the identification number of 300A) is examined and the horizontal length (W) of the video in that row is extracted. In this case, the W value extracted from 1080B is 25, that from 1080C is 25, and that from 1080D is 25. The largest of these three values, 25, becomes the determination value 1090A of 300A. The determination value calculation unit 180 performs the same processing for the other conference client devices 300B, 300C, and 300D. For the conference client device 300B, the W values extracted from the layout information 1080A, 1080C, and 1080D are 50, 100, and 100, respectively, so the determination value 1090B of 300B is 100. For the conference client device 300C, the W values extracted from 1080A, 1080B, and 1080D are 50, 100, and 25, respectively, so the determination value 1090C of 300C is 100. For the conference client device 300D, the W values extracted from 1080A, 1080B, and 1080C are 50, 25, and 25, respectively, so the determination value 1090D of 300D is 50. The determination value calculation unit 180 inputs the determination values 1090A, 1090B, 1090C, and 1090D calculated in this way to the video quality control unit 190.
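The worked example above can be reproduced with the short Python sketch below. The `WIDTHS` table collects only the W values quoted in the text; the width each viewer gives to its own video is not stated and is therefore omitted. The function name is hypothetical.

```python
# Width W that each viewer assigns to each other source, as quoted in the
# text for layout information 1080A-1080D. Keys are identification numbers
# (1 = 300A, 2 = 300B, 3 = 300C, 4 = 300D).
WIDTHS = {
    1: {2: 50, 3: 50, 4: 50},     # 1080A
    2: {1: 25, 3: 100, 4: 25},    # 1080B
    3: {1: 25, 2: 100, 4: 25},    # 1080C
    4: {1: 25, 2: 100, 3: 25},    # 1080D
}


def determination_values(widths):
    """For each client, take the largest W given to it by the other viewers."""
    values = {}
    for target in widths:
        values[target] = max(
            row[target]
            for viewer, row in widths.items()
            if viewer != target and target in row
        )
    return values


print(determination_values(WIDTHS))  # prints {1: 25, 2: 100, 3: 100, 4: 50}
```

The result matches the determination values 1090A to 1090D derived in the text: 25, 100, 100, and 50.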

On receiving the determination values, the video quality control unit 190 converts them into video encoding control parameters. The video quality control unit 190 holds internally a correspondence table between determination values and video encoding control parameters such as that of FIG. 8, and refers to it to calculate the video encoding control parameters as shown in FIG. 17. For example, for the determination value 1090A, the corresponding video encoding control parameter 1100A is calculated as a resolution of 320 × 240 dots and a frame rate of 15 fps. The video encoding control parameters 1100B, 1100C, and 1100D corresponding to the determination values 1090B, 1090C, and 1090D are calculated in the same way, as shown in FIG. 17.

Once the video encoding control parameters 1100A, 1100B, 1100C, and 1100D have been calculated, they are sent to the respective conference client devices through the video quality control signal transmission unit 200 and the communication paths 540A, 540B, 540C, and 540D.

The conference client devices 300A, 300B, 300C, and 300D receive the video encoding control parameters 1100A, 1100B, 1100C, and 1100D, respectively. These parameters are input to the encoding units 320A, 320B, 320C, and 320D, respectively, and the video quality used for encoding is adjusted accordingly. In this case, compared with the default video encoding control parameters, only the video encoding control parameter 1100A for the video transmitted by the conference client device 300A has changed. (1100B, 1100C, and 1100D turn out to be the same as the default parameters shown in FIG. 13.) In other words, because the other conference client devices 300B, 300C, and 300D display the video of the conference client device 300A at a smaller size than the other videos, the determination value of 300A is regarded as low. The video quality of the camera video encoding in the encoding unit 320A is therefore lowered, and as a result the amount of data transmitted over the communication path 510A is reduced.
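As a rough indication of the size of this reduction, the Python sketch below compares the number of raw pixels per second before and after the change for 300A. It counts uncompressed pixels only; the encoded bitrate depends on the codec and the content, so the ratio should not be read as the actual bandwidth saving. The function name is hypothetical.

```python
def raw_pixel_rate(width, height, frame_rate):
    """Number of uncompressed pixels produced per second."""
    return width * height * frame_rate


default_rate = raw_pixel_rate(640, 480, 30)   # default parameters
reduced_rate = raw_pixel_rate(320, 240, 15)   # parameters 1100A for 300A
print(default_rate // reduced_rate)  # prints 8
```

Halving each dimension and the frame rate reduces the raw pixel rate by a factor of 2 × 2 × 2 = 8.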

Thereafter, each time a user gives an instruction to change the layout information, the video quality of the camera video encoding in each conference client device is adjusted in the same manner as above.

The detailed internal configuration and operation of the conference server device 100 and the conference client devices 300A to 300D have been described above as the first embodiment of the present invention. According to the present invention, the video data transmitted from the clients to the server can be controlled based on how the videos of the clients participating in the conference are being viewed, which makes it possible to save communication bandwidth.

(Second Embodiment)
A second embodiment of the present invention is described below with reference to FIG. 1 and FIGS. 3 to 18.

The conference server device 100 has a function of receiving the video data transmitted from the conference client devices 300A, 300B, 300C, and 300D, and a function of combining the multiple pieces of received video data into a single video. The composition can be performed independently for each conference client device. The conference server device 100 also has a function of transmitting the composite video to each of the conference client devices 300A, 300B, 300C, and 300D through the networks 500A, 500B, 500C, and 500D, and likewise a function of transmitting control signals to each conference client device through the networks 500A, 500B, 500C, and 500D.

Next, the internal components of the conference server device and the conference client device that relate to the present invention are described with reference to FIG. 18. The functions represented by the internal components of the conference server device and of the conference client device may be realized by causing a computer to execute a program, or may be realized in hardware.

The conference server device 100 has, as its components, a video receiving unit 110 that receives video data over the communication path 510A, a decoding unit 120 that decodes video data, a layout control signal receiving unit 130 that receives layout control signals over the communication path 530A, a layout information storage unit 140 that stores the layout information obtained by analyzing the layout control signals, a video composition unit 150 that composes video, an encoding unit 160 that encodes the composite video, a video transmission unit 170 that transmits video data over the communication path 520A, and a layout information notification signal transmission unit 210 that transmits layout information notification signals over the communication path 540A. The operation of each internal component of the conference server device 100 is described below in terms of the processing performed with the conference client device 300A, but the same processing is also performed with 300B, 300C, and 300D.

The conference client device 300A has, as its components, a camera unit 310A, an encoding unit 320A that encodes composite video, a video transmission unit 330A that transmits video data over the communication path 510A, a video receiving unit 340A that receives video data over the communication path 520A, a decoding unit 350A that decodes video data, a display unit 360A that displays video, a user interface unit 370A, a layout control signal transmission unit 380A that transmits layout control signals over the communication path 530A, a layout information notification signal receiving unit 220A that receives layout information notification signals over the communication path 540A, a determination value calculation unit 180A that calculates a determination value from the layout information, and a video quality control unit 190A that calculates video encoding control parameters from the determination value. Although FIG. 18 shows only the single conference client device 300A, the conference client devices 300B, 300C, and 300D are also connected to the conference server device 100, and each has the same internal components as the conference client device 300A.

The video receiving unit 110 operates in the same way as the video receiving unit 110 described in the first embodiment.

The decoding unit 120 operates in the same way as the decoding unit 120 described in the first embodiment.

The layout control signal receiving unit 130 receives a layout control signal from conference client device 300A over communication path 530A. The layout control signal is transmitted as an IP packet consisting of an IP header and an IP payload. The IP header contains the IP address of the conference client device 300A that sent the signal, and the IP payload contains the layout information 1000A specified at conference client device 300A. The layout control signal receiving unit 130 analyzes the layout control signal, identifies the sending conference client device 300A from the source IP address in the IP header, and extracts the layout information 1000A from the IP payload.

Here, as shown in FIG. 3, the layout information is the same set of values described in the first embodiment: identification number, composition position, composition size, and relative position in the front-rear direction. It is not limited to these and may also include, for example, a cut-out position and a cut-out size for composition, both defined as in the first embodiment. After extracting the layout information 1000A, the layout control signal receiving unit 130 outputs it to the layout information storage unit 140 and also outputs, to the layout information notification signal transmission unit 210, a request to transmit the layout information notification signals relating to each of 300B, 300C, and 300D.

When the layout information 1000A is input from the layout control signal receiving unit 130, the layout information storage unit 140 stores it. If a new input subsequently arrives from conference client device 300A, the stored layout information 1000A is updated with the new values. When a request to acquire the layout information 1000A is input from the video composition unit 150, the layout information storage unit 140 outputs the stored layout information 1000A to the video composition unit 150. Layout information 1000B, 1000C, and 1000D input from conference client devices 300B, 300C, and 300D is stored in the same way. When a request to acquire the three pieces of layout information 1000B, 1000C, and 1000D is input from the layout information notification signal transmission unit 210, the layout information storage unit 140 outputs the stored layout information 1000B, 1000C, and 1000D to the layout information notification signal transmission unit 210.
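The storage behavior described above (store, overwrite on a new input, serve one client's layout to the composition unit, serve everyone else's layouts to the notification sender) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class names `LayoutRow` and `LayoutStore`, the field names, and the use of strings such as `"300A"` as client keys are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayoutRow:
    """One row of layout information (field names are illustrative)."""
    video_id: int  # identification number of the source video
    x: int         # composition position
    y: int
    w: int         # composition size, horizontal (W)
    h: int         # composition size, vertical
    layer: int     # relative position in the front-rear direction


class LayoutStore:
    """Keeps the most recent layout information per conference client."""

    def __init__(self):
        self._by_client = {}

    def put(self, client, rows):
        # A new input from the same client replaces the stored value.
        self._by_client[client] = tuple(rows)

    def get(self, client):
        # What the video composition unit would request.
        return self._by_client[client]

    def get_others(self, client):
        # What the notification sender would request: all layouts except
        # the one belonging to `client` itself.
        return {c: r for c, r in self._by_client.items() if c != client}


store = LayoutStore()
store.put("300A", [LayoutRow(1, 0, 0, 50, 50, 1)])
store.put("300B", [LayoutRow(1, 0, 0, 25, 25, 1)])
store.put("300A", [LayoutRow(1, 0, 0, 100, 100, 1)])  # update overwrites
print(store.get("300A")[0].w)             # 100
print(sorted(store.get_others("300A")))   # ['300B']
```

Keying the store by client makes the "all except the destination" lookup a single filter, which is the only query the notification path needs.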

The video composition unit 150 operates in the same way as the video composition unit 150 described in the first embodiment.

The encoding unit 160 operates in the same way as the encoding unit 160 described in the first embodiment.

The video transmission unit 170 operates in the same way as the video transmission unit 170 described in the first embodiment.

When a layout information transmission request is input from the layout control signal receiving unit 130, the layout information notification signal transmission unit 210 outputs to the layout information storage unit 140 a request to acquire the layout information 1000B, 1000C, and 1000D of the three conference client devices other than 300A. When the three pieces of layout information 1000B, 1000C, and 1000D, as shown in FIG. 6, are input from the layout information storage unit 140, the unit adds an IP header containing, among other fields, the IP address of the destination conference client device 300A, thereby generating an IP packet. This IP packet is called a layout information notification signal. Having generated the layout information notification signal, the layout information notification signal transmission unit 210 transmits it to conference client device 300A over communication path 540A.

The camera unit 310A operates in the same way as the camera unit 310A described in the first embodiment.

The encoding unit 320A holds the video encoding control parameter internally and operates in the same way as the encoding unit 320A described in the first embodiment.

The video transmission unit 330A uses communication path 510A and operates in the same way as the video transmission unit 330A described in the first embodiment.

The video receiving unit 340A uses communication path 520A and operates in the same way as the video receiving unit 340A described in the first embodiment.

The decoding unit 350A operates in the same way as the decoding unit 350A described in the first embodiment.

The display unit 360A operates in the same way as the display unit 360A described in the first embodiment.

The user interface unit 370A has the same input devices as the user interface unit 370A described in the first embodiment and operates in the same way.

The layout control signal transmission unit 380A uses communication path 530A and operates in the same way as the layout control signal transmission unit 380A described in the first embodiment.

The layout information notification signal receiving unit 220A receives a layout information notification signal from the conference server device 100 over communication path 540A. It extracts the layout information 1000B, 1000C, and 1000D of the three conference client devices from the IP payload of the layout information notification signal and outputs the extracted layout information to the determination value calculation unit 180A.

When the three pieces of layout information 1000B, 1000C, and 1000D are input from the layout information notification signal receiving unit 220A, the determination value calculation unit 180A calculates a determination value 1090A from them, as shown in FIG. 16. The determination value 1090A represents how much attention the camera video sent by conference client device 300A is receiving from the users of the other conference client devices. Here, the determination value 1090A is defined as the maximum horizontal length (W) of the 300A video across the composite videos viewed at conference client devices 300B, 300C, and 300D. That is, in each of the layout information 1000B, 1000C, and 1000D in FIG. 6, the row whose identification number equals 1 (the identification number of 300A) is examined, and the largest W value among those rows is taken as the determination value 1090A. In this embodiment the determination value is calculated from the three pieces of layout information excluding 1000A, but it may instead be calculated from all four pieces including 1000A. In that case, the layout information notification signal from the layout information notification signal transmission unit 210 contains the four pieces of layout information 1000A, 1000B, 1000C, and 1000D, and the maximum W value of the 300A video across these four is defined as the determination value 1090A. After calculating the determination value 1090A, the determination value calculation unit 180A outputs it to the video quality control unit 190A.
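The definition above (take the row for one's own identification number in each received layout table, then take the maximum W) can be sketched as follows. The row shape `{"id": ..., "w": ...}`, the sample W values, and the choice to return 0 when the video appears in no table are assumptions for illustration; the figures referenced in the text are not reproduced here.

```python
def determination_value(own_id, layouts):
    """Return the largest W given to video `own_id` across `layouts`.

    `layouts` is an iterable of layout tables; each table is a list of
    rows shaped like {"id": 1, "w": 25}. Returning 0 when the video
    appears in no table is an assumption, not something the text states.
    """
    widths = [
        row["w"]
        for table in layouts
        for row in table
        if row["id"] == own_id
    ]
    return max(widths, default=0)


# Hypothetical tables standing in for 1000B, 1000C and 1000D.
others = [
    [{"id": 1, "w": 25}, {"id": 2, "w": 75}],
    [{"id": 1, "w": 50}, {"id": 3, "w": 50}],
    [{"id": 1, "w": 100}],
]
print(determination_value(1, others))  # 100
```

To use the four-layout variant mentioned in the text, the caller would simply pass the device's own table along with the other three.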

When the determination value 1090A is input from the determination value calculation unit 180A, the video quality control unit 190A calculates the video encoding control parameter 1030A. The video encoding control parameter 1030A is the value used when the encoding unit 320A of conference client device 300A encodes the camera video; here it specifies the resolution and frame rate for encoding. To calculate it, the video quality control unit 190A holds internally a table associating determination values with video encoding control parameters, as shown in FIGS. 8 and 9, and consults this table as needed. In general, the larger the determination value 1090A, the larger the resolution and frame rate in the video encoding control parameter 1030A, which corresponds to higher-quality video. After calculating the video encoding control parameter 1030A, the video quality control unit 190A outputs it to the encoding unit 320A.
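A table lookup of this kind might look like the sketch below. The actual table of FIG. 8 is not reproduced in this text, so the two-row table and its threshold of 50 are assumptions. They are chosen only so that the outputs agree with the values quoted in the worked example later in the description (a determination value of 25 gives 320×240 at 15 fps; 50 and 100 give 640×480 at 30 fps).

```python
from typing import NamedTuple


class EncodingParams(NamedTuple):
    width: int
    height: int
    fps: int


# (minimum determination value, parameters), highest threshold first.
# The thresholds are assumed; only the parameter values appear in the text.
PARAM_TABLE = [
    (50, EncodingParams(640, 480, 30)),
    (0, EncodingParams(320, 240, 15)),
]


def encoding_params(value):
    """Return the parameters of the first row whose threshold `value` meets."""
    for minimum, params in PARAM_TABLE:
        if value >= minimum:
            return params
    raise ValueError(f"determination value out of range: {value}")


print(encoding_params(25))   # EncodingParams(width=320, height=240, fps=15)
print(encoding_params(100))  # EncodingParams(width=640, height=480, fps=30)
```

Keeping the table as data rather than branching logic mirrors the text's description of a correspondence table that is "consulted as needed", and a finer-grained table can be substituted without touching the lookup.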

Next, consider a situation in which the users participating in the conference have changed their own layout information. Each user can change the layout of the composite video he or she is currently viewing through the user interface units 370A, 370B, 370C, and 370D, respectively. Suppose that the users' operations have specified the layout information 1080A, 1080B, 1080C, and 1080D shown in FIG. 14. This layout information is sent to the conference server device 100 through the layout control signal transmission units 380A, 380B, 380C, and 380D of the conference client devices and the communication paths 530A, 530B, 530C, and 530D. Once it has been stored in the layout information storage unit 140 via the layout control signal receiving unit 130, the video composition unit 150 refers to it and composites the videos as shown in FIG. 15. At the same time, the layout control signal receiving unit 130 outputs to the layout information notification signal transmission unit 210 a request to transmit the layout information notification signal for each conference client device. When, for example, a request to transmit a layout information notification signal to conference client device 300A is input, the layout information notification signal transmission unit 210 acquires from the layout information storage unit 140 the three pieces of layout information 1080B, 1080C, and 1080D (all except 1080A) and sends them to conference client device 300A over communication path 540A. Similarly, for a request addressed to conference client device 300B it acquires the layout information 1080A, 1080C, and 1080D and sends it to 300B; for a request addressed to 300C it acquires 1080A, 1080B, and 1080D and sends it to 300C; and for a request addressed to 300D it acquires 1080A, 1080B, and 1080C and sends it to 300D.
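The fan-out rule in this paragraph (each client receives every stored layout except its own) can be sketched in a few lines. The function name and the use of plain strings as stand-ins for the layout tables are assumptions for illustration.

```python
def notification_payloads(stored):
    """Map each destination client to the layouts of all *other* clients."""
    return {
        dest: {src: layout for src, layout in stored.items() if src != dest}
        for dest in stored
    }


# Strings stand in for the actual layout tables.
stored = {"300A": "1080A", "300B": "1080B", "300C": "1080C", "300D": "1080D"}
payloads = notification_payloads(stored)
print(sorted(payloads["300A"].values()))  # ['1080B', '1080C', '1080D']
print(sorted(payloads["300C"].values()))  # ['1080A', '1080B', '1080D']
```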

On receiving layout information from the conference server device 100, each conference client device passes it through the layout information notification signal receiving unit 220A to the determination value calculation unit 180A, which calculates the determination value. For conference client device 300A, for example, the determination value is calculated as follows, as shown in FIG. 16. Conference client device 300A receives three pieces of layout information: 1080B, 1080C, and 1080D. First, in each piece of layout information, the row whose identification number is 1 (the identification number of 300A) is examined, and the horizontal length (W) of the video in that row is extracted. In this case the W value extracted from 1080B is 25, the W value extracted from 1080C is 25, and the W value extracted from 1080D is 25. The largest of these three values, namely 25, becomes the determination value 1090A of conference client device 300A. After calculating the determination value 1090A, the determination value calculation unit 180A inputs it to the video quality control unit 190A.

On receiving the determination value 1090A, the video quality control unit 190A converts it into a video encoding control parameter. The video quality control unit 190A holds internally a table associating determination values with video encoding control parameters, as in FIG. 8, and refers to this table to calculate the video encoding control parameter as shown in FIG. 17. For conference client device 300A, for example, the corresponding video encoding control parameter 1100A is calculated as a resolution of 320×240 dots and a frame rate of 15 fps.

Once the video encoding control parameter 1100A has been calculated, it is input to the encoding unit 320A, and the video quality used for encoding is adjusted accordingly.

The other conference client devices 300B, 300C, and 300D perform the same processing. For conference client device 300B, the W values extracted from the received layout information 1080A, 1080C, and 1080D are 50, 100, and 100, respectively, so it calculates its own determination value 1090B as 100 and, in accordance with parameter 1100B, adjusts the video encoding performed by the encoding unit 320B to a resolution of 640×480 dots and a frame rate of 30 fps. For conference client device 300C, the W values extracted from the received layout information 1080A, 1080B, and 1080D are 50, 100, and 25, respectively, so it calculates its own determination value 1090C as 100 and, in accordance with parameter 1100C, adjusts the video encoding performed by the encoding unit 320C to a resolution of 640×480 dots and a frame rate of 30 fps. For conference client device 300D, the W values extracted from the received layout information 1080A, 1080B, and 1080C are 50, 25, and 25, respectively, so it calculates its own determination value 1090D as 50 and, in accordance with parameter 1100D, adjusts the video encoding performed by the encoding unit 320D to a resolution of 640×480 dots and a frame rate of 30 fps.

In this case, compared with the default video encoding control parameters, only the video encoding control parameter 1100A for the video sent by conference client device 300A has changed. (1100B, 1100C, and 1100D end up identical to the default parameters shown in FIG. 13.) In other words, because the other conference client devices 300B, 300C, and 300D all display the video of conference client device 300A at a smaller size than the other videos, the determination value of conference client device 300A is regarded as low. The video quality of the camera video encoding in the encoding unit 320A is therefore lowered, and as a result the amount of data transmitted over communication path 510A is reduced.
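The worked example for all four clients can be reproduced with a short script. The W values and the resulting resolutions and frame rates are the ones quoted in the description; the threshold of 50 in `params_for`, the default of 640×480 at 30 fps (inferred from the remark that 1100B to 1100D match the defaults), and the raw pixel-rate ratio are assumptions added for illustration. The ratio is a rough indicator only and is not the encoded bitrate.

```python
# W values each client extracts for its own video from the three
# layout tables it receives (values as quoted in the description).
received_w = {
    "300A": [25, 25, 25],
    "300B": [50, 100, 100],
    "300C": [50, 100, 25],
    "300D": [50, 25, 25],
}

DEFAULT = (640, 480, 30)  # assumed default, inferred from the text


def params_for(value):
    # Assumed two-step table; only the outputs for 25, 50 and 100
    # are stated in the description.
    return (640, 480, 30) if value >= 50 else (320, 240, 15)


results = {}
for client, widths in received_w.items():
    value = max(widths)                      # determination value
    w, h, fps = params_for(value)            # encoding parameters
    ratio = (w * h * fps) / (DEFAULT[0] * DEFAULT[1] * DEFAULT[2])
    results[client] = (value, (w, h, fps), ratio)
    print(client, value, f"{w}x{h}@{fps}fps", ratio)
```

Under these assumptions only 300A drops below the default, to one eighth of the default raw pixel rate, which matches the text's conclusion that the saving occurs on communication path 510A.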

The detailed internal configuration and operation of the conference server device 100 and the conference client devices 300A to 300D have been described above as the second embodiment of the present invention. Although the configuration differs from that of the first embodiment, the present invention likewise makes it possible to control the video data sent from each client to the server according to how the videos of the clients participating in the conference are being viewed, and thereby to save communication bandwidth.

FIG. 1 is a schematic diagram of a multipoint video conference system.
FIG. 2 is a block diagram showing the configuration of the conference server device and the conference client device according to the first embodiment of the present invention.
FIG. 3 is a diagram showing layout information example 1 according to the first embodiment of the present invention.
FIG. 4 is a diagram showing composite video example 1 according to the first embodiment of the present invention.
FIG. 5 is a diagram showing the coordinate axes in a composite video according to the first embodiment of the present invention.
FIG. 6 is a diagram showing layout information example 2 according to the first embodiment of the present invention.
FIG. 7 is a diagram showing determination value example 1 according to the first embodiment of the present invention.
FIG. 8 is a diagram showing the table associating determination values with video encoding control parameters according to the first embodiment of the present invention.
FIG. 9 is a diagram showing a video encoding control parameter example according to the first embodiment of the present invention.
FIG. 10 is a diagram showing layout pattern examples according to the first embodiment of the present invention.
FIG. 11 is a diagram showing layout information example 3 according to the first embodiment of the present invention.
FIG. 12 is a diagram showing display screen example 1 according to the first embodiment of the present invention.
FIG. 13 is a diagram showing video encoding control parameter example 2 according to the first embodiment of the present invention.
FIG. 14 is a diagram showing layout information example 4 according to the first embodiment of the present invention.
FIG. 15 is a diagram showing display screen example 1 according to the first embodiment of the present invention.
FIG. 16 is a diagram showing determination value example 2 according to the first embodiment of the present invention.
FIG. 17 is a diagram showing video encoding control parameter example 3 according to the first embodiment of the present invention.
FIG. 18 is a block diagram showing the configuration of the conference server device and the conference client device according to the second embodiment of the present invention.

Explanation of symbols

100 … Conference server device
110 … Video receiving unit
120 … Decoding unit
130 … Layout control signal receiving unit
140 … Layout information storage unit
150 … Video composition unit
160 … Encoding unit
170 … Video transmission unit
180, 180A … Determination value calculation unit
190, 190A … Video quality control unit
200 … Video quality information transmission unit
210 … Layout information notification signal transmission unit
220A … Layout information notification signal receiving unit
300A, 300B, 300C, 300D … Conference client device
310A, 310B, 310C, 310D … Camera unit
320A, 320B, 320C, 320D … Encoding unit
330A … Video transmission unit
340A … Video receiving unit
350A … Decoding unit
360A, 360B, 360C, 360D … Display unit
370A, 370B, 370C, 370D … User interface unit
380A, 380B, 380C, 380D … Layout control signal transmission unit
390A … Video quality control signal receiving unit
500A, 500B, 500C, 500D … Network
510A, 520A, 530A, 540A … Communication path
1000A, 1000B, 1000C, 1000D … Layout information
1010A … Composite video
1020A, 1020B, 1020C, 1020D … Determination value
1030A … Video encoding control parameter
1040A, 1050A, 1060A … Layout pattern
1070A, 1070B, 1070C, 1070D … Layout information
1080A, 1080B, 1080C, 1080D … Layout information
1090A, 1090B, 1090C, 1090D … Determination value
1100A, 1100B, 1100C, 1100D … Video encoding control parameter

Claims

1. A video composition apparatus comprising:
video receiving means for receiving, from first to Nth (N being an integer of 2 or more) video display devices, first to Nth encoded video data representing first to Nth videos;
decoding means for decoding the first to Nth encoded video data and outputting first to Nth decoded video data;
layout information receiving means for receiving, from the first to Nth video display devices, first to Nth layout information each indicating a layout of the first to Nth videos in one composite video obtained by compositing the first to Nth videos;
layout information storage means for storing the received first to Nth layout information;
video composition means for compositing the first to Nth videos represented by the first to Nth decoded video data in accordance with the first to Nth layout information to generate first to Nth composite videos;
video transmission means for transmitting the first to Nth composite videos to the first to Nth video display devices;
determination value calculation means for calculating a determination value for the first video display device based on at least the layout of the first video in the second to Nth layout information among the first to Nth layout information;
video quality control means for determining, from the calculated determination value, an encoding parameter to be applied in the first video display device; and
video quality control signal transmitting means for transmitting the determined encoding parameter to the first video display device.

2. The video composition apparatus according to claim 1, wherein the determination value calculation means calculates the determination value based on a size of the first video in the second to Nth layout information.

3. The video composition apparatus according to claim 2, wherein the determination value is calculated based on the largest or smallest of the sizes of the first video in the second to Nth layout information.

4. The video composition apparatus according to claim 1, wherein each of the first to Nth layout information includes a cut-out size from the first video, and the determination value calculation means calculates the determination value based on the cut-out size from the first video in the second to Nth layout information.

5. The video composition apparatus according to claim 4, wherein the determination value calculation means calculates the determination value based on the smallest or largest of the cut-out sizes from the first video in the second to Nth layout information.

6. The video composition apparatus according to claim 1, wherein each of the first to Nth layout information includes hierarchy information of the first to Nth videos, and the determination value calculation means calculates the determination value based on the hierarchy level of the first video in the second to Nth layout information.

7. The video composition apparatus according to claim 6, wherein the determination value calculation means calculates the determination value based on the smallest or largest of the hierarchy levels of the first video in the second to Nth layout information.

8. The video composition apparatus according to any one of claims 1 to 7, wherein the video quality control means has a table associating the determination value with the encoding parameter and determines the encoding parameter based on the table.

9. The video composition apparatus according to claim 1, wherein the encoding parameter includes at least one of a resolution and a frame rate.

10. A video display device comprising:
video generation means for generating video data;
encoding means for encoding the video data according to an encoding parameter and outputting encoded video data;
video transmission means for transmitting the encoded video data to a video composition apparatus;
layout information transmitting means for transmitting layout information for video composition to the video composition apparatus;
video receiving means for receiving composite video data;
display means for displaying the composite video data;
layout information receiving means for receiving the layout information used by each of one or more other video display devices;
determination value calculation means for calculating a determination value for the own device from the layout of the own device's video in each piece of received layout information; and
video quality control means for determining, from the calculated determination value, an encoding parameter to be applied and notifying the encoding means of the determined encoding parameter.

The video display device according to claim 10, wherein the determination value calculation unit calculates the determination value based on a size of the video of the device itself in each received layout information.

The video display device according to claim 11, wherein the determination value is calculated based on a largest or smallest image size of the own device in each received layout information.

Each of the received layout information includes a cut size from a video represented by the video data, and the determination value calculation means calculates the determination value based on the cut size in each layout information. Item 13. The video display device according to Item 10.

14. The video display device according to claim 13, wherein the determination value calculating means calculates the determination value based on the smallest or largest of the cutout sizes of the video in each of the layout information.

15. The video display device according to claim 10, wherein each of the layout information includes hierarchy information of a plurality of videos to be combined, and the determination value calculating means calculates the determination value based on the layer of the video of the own device in each of the layout information.

16. The video display device according to claim 15, wherein the determination value calculating means calculates the determination value based on the smallest or largest of the layers of the video of the own device in each of the layout information.
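A layer-based calculation of this kind could be sketched as below. The sketch assumes that each layout maps a source identifier to an integer layer number, with 1 taken to be the frontmost layer; that numbering, the function name, and the `None` result for a video that appears in no layout are assumptions made for illustration, not details stated in the claims.

```python
from typing import Dict, Iterable, Optional


def layer_determination_value(
    own_id: str,
    layouts: Iterable[Dict[str, int]],
    use_smallest: bool = True,
) -> Optional[int]:
    """Smallest (or largest) layer number of the own video across layouts."""
    # Collect the layer assigned to the own video in every layout that uses it.
    layers = [layout[own_id] for layout in layouts if own_id in layout]
    if not layers:
        return None  # assumption: the own video is not composited anywhere
    return min(layers) if use_smallest else max(layers)
```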

17. The video display device according to any one of the preceding video display device claims, wherein the video quality control means includes a table in which the determination value and the encoding parameter are associated with each other, and determines the encoding parameter based on the table.

18. The video display device according to claim 10, wherein the encoding parameter includes at least one of resolution and frame rate.

19. A video composition method comprising:
Generating first to Nth video data in first to Nth video display devices (N is an integer of 2 or more);
Encoding the first to Nth video data according to first to Nth encoding parameters to generate first to Nth encoded video data;
Transmitting the first to Nth encoded video data to a video composition device;
Transmitting first to Nth layout information for video composition to the video composition device;
Decoding the first to Nth encoded video data to generate first to Nth decoded video data;
Synthesizing the first to Nth videos represented by the first to Nth decoded video data according to the first to Nth layout information to generate first to Nth synthesized videos;
Transmitting the first to Nth synthesized videos to the first to Nth video display devices;
Calculating a determination value for the first video display device based on at least the layout of the first video in the second to Nth layout information among the first to Nth layout information; and
Determining the first encoding parameter to be applied by the first video display device from the calculated determination value.
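The last two steps of the method can be illustrated with a minimal sketch. It assumes that each device's layout maps a source identifier to a displayed (width, height) pair, that the determination value is the largest displayed area, and that only the other devices' layouts are consulted; the claim says "at least" those layouts, so including the device's own layout would also be consistent with it. All names and the single frame-rate threshold are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical layout shape: source id -> (width, height) as displayed.
Layout = Dict[str, Tuple[int, int]]


def determination_values(layouts: Dict[str, Layout]) -> Dict[str, int]:
    """For each device, the largest area at which other devices show its video."""
    values: Dict[str, int] = {}
    for device_id in layouts:
        areas = [
            width * height
            for viewer_id, layout in layouts.items()
            if viewer_id != device_id  # use only the other devices' layouts
            for source_id, (width, height) in layout.items()
            if source_id == device_id
        ]
        values[device_id] = max(areas, default=0)
    return values


def choose_frame_rate(value: int, threshold: int = 320 * 240) -> int:
    """Hypothetical rule: a higher frame rate only for prominently shown video."""
    return 30 if value >= threshold else 15
```

In this sketch a device whose video is shown only as a small thumbnail by every other participant is asked to encode at the lower frame rate, regardless of how large it displays its own video locally.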

20. A program that causes a computer to execute:
Receiving first to Nth encoded video data from first to Nth (N is an integer of 2 or more) video display devices;
Decoding the first to Nth encoded video data and outputting first to Nth decoded video data;
Receiving first to Nth layout information for video composition from the first to Nth video display devices;
Storing the received first to Nth layout information;
Synthesizing the first to Nth videos represented by the first to Nth decoded video data according to the first to Nth layout information to generate first to Nth synthesized videos;
Transmitting the first to Nth synthesized videos to the first to Nth video display devices;
Calculating a determination value for the first video display device based on at least the layout of the first video in the second to Nth layout information among the first to Nth layout information;
Determining an encoding parameter to be applied in the first video display device from the calculated determination value; and
Transmitting the determined encoding parameter to the first video display device.

21. A program for causing a video display device to execute:
Generating video data;
Encoding the video data according to encoding parameters and outputting encoded video data;
Transmitting the encoded video data to a video composition device;
Transmitting layout information for video composition to the video composition device;
Receiving composite video data;
Displaying the composite video data;
Receiving layout information respectively used by one or more other video display devices;
Calculating a determination value for the own device from the layout of the video of the own device in each received layout information; and
Determining an encoding parameter to be applied from the calculated determination value.