JP7449715B2

JP7449715B2 - Framing area learning device, framing area estimating device, and programs thereof

Info

Publication number: JP7449715B2
Application number: JP2020027891A
Authority: JP
Inventors: 敦志荒井; 俊枝三須; 秀樹三ツ峰; 淳洗井
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2020-02-21
Filing date: 2020-02-21
Publication date: 2024-03-14
Anticipated expiration: 2040-02-21
Also published as: JP2021132349A

Description

The present invention relates to a framing area learning device that learns the framing parameters of a camera, a framing area estimating device, and programs for both.

Automatic shooting, whether with a camera mounted on a drive mechanism such as a pan head or by cropping the image of a fixed camera, amounts to successively determining the framing, that is, the composition of the camera. For example, Patent Document 1 proposes a method that determines, from the camera's image data, a composition containing a target of attention and controls the drive mechanism accordingly. Patent Document 2 proposes a method that learns composition information and updates the rules for cropping from the original image.

Patent Document 1: Japanese Patent No. 4464902
Patent Document 2: Japanese Patent No. 5016540

The prior art of Patent Documents 1 and 2 determines a final composition for a given shooting intention and then controls the drive mechanism or crops the image. Consider using these techniques to imitate the camera work of an actual camera operator. A camera operator continuously determines the optimal composition according to the subject and its surroundings. With the prior art described above, it is difficult to determine such situation-dependent compositions automatically, as a camera operator does.

Accordingly, an object of the present invention is to provide a framing area learning device, a framing area estimating device, and programs thereof that can automatically determine, from various camera positions, framing areas like those an actual camera operator would frame.

To solve the above problem, the framing area learning device according to the present invention converts camera parameters, which indicate the attitude and angle of view of a camera, into framing parameters, which indicate the framing area framed by the camera, and learns the framing parameters in association with subject information indicating at least one of the position and the velocity of a subject. The device includes a conversion unit and a learning unit.

With this configuration, the conversion unit converts camera parameters, acquired in synchronization with the subject information, into framing parameters based on a preset conversion rule.
The learning unit then learns the subject information and the framing parameters in association with each other to generate a trained inference model for the framing parameters.

In this way, the framing area learning device learns framing parameters expressed in a world coordinate system that does not depend on the camera position, rather than camera parameters that do. It can therefore generate a trained inference model that is independent of camera position.

Also to solve the above problem, the framing area estimating device according to the present invention estimates framing parameters indicating the framing area of a camera, using the trained inference model generated by the framing area learning device according to the present invention. The device includes an estimation unit and an inverse transformation unit.

With this configuration, the estimation unit receives subject information indicating at least one of the position and velocity of the subject and, using the trained inference model, outputs estimates of the framing parameters corresponding to that subject information.
The inverse transformation unit then converts the framing-parameter estimates output by the estimation unit into estimates of the camera parameters, which indicate the attitude and angle of view of the camera, based on a preset inverse transformation rule.

In this way, the framing area estimating device uses the trained inference model to estimate framing parameters in the camera-position-independent world coordinate system and converts the estimated framing parameters into camera parameters. It can therefore estimate the camera parameters specific to each camera with a trained inference model that does not depend on camera position.

The present invention can also be implemented as a program that causes a computer to function as the framing area learning device or the framing area estimating device described above.

Because the present invention uses an inference model that does not depend on camera position, it can automatically determine, from various camera positions, framing areas like those an actual camera operator would frame.

FIG. 1 is an explanatory diagram illustrating an example of a conventional camera, in which (a) is an external view of the camera and (b) is a block diagram of its sensor.
FIG. 2 is a block diagram showing the configuration of the framing area learning device according to the first embodiment.
FIG. 3 is a block diagram showing the configuration of the framing area estimating device according to the first embodiment.
FIG. 4 is an explanatory diagram illustrating the world coordinate system and the camera coordinate system in the first embodiment.
FIG. 5 is an explanatory diagram illustrating the conversion of angle of view and scale before and after the camera position is moved.
FIG. 6 is an explanatory diagram illustrating a soccer match video as an example of situation data.
FIG. 7 is an explanatory diagram of position maps, in which (a) shows the positions of the players and the ball, (b) the positions of the players of one team, (c) the positions of the players of the other team, and (d) the position of the ball.
FIG. 8 is an explanatory diagram of velocity maps, in which (a) shows the positions and velocity components of the players of one team, (b) is a velocity map showing those velocity components as dots, and (c) is a velocity map showing them as circular regions.
FIG. 9 is a flowchart showing the operation of the framing area learning device according to the first embodiment.
FIG. 10 is a flowchart showing the operation of the framing area estimating device according to the first embodiment.
FIG. 11 is a block diagram showing the configuration of the learning unit according to the second embodiment.
FIG. 12 is a block diagram showing the configuration of the learning unit according to the third embodiment.

(First embodiment)
Embodiments of the present invention are described below with reference to the drawings.
First, an example of the camera 3 related to the first embodiment is described, followed by the configurations of the framing area learning device 1 and the framing area estimating device 2 according to the first embodiment.

[Camera]
The camera 3 is described with reference to FIG. 1.
The camera 3 is remotely controllable. As shown in FIG. 1(a), the camera 3 is mounted on a drive mechanism 30 such as a pan head, which controls pan, tilt, zoom, and focus. Zoom and focus are actually controlled by an internal mechanism of the camera 3, but they are described here as part of the drive mechanism 30.

As shown in FIG. 1(b), the camera 3 includes a sensor 31 that detects camera operation. The sensor 31 measures the attitude of the camera 3, the angle of view of its lens (not shown), and the focal position. The sensor 31 includes a pan angle sensor 31a that detects the pan angle of the pan head to which the camera 3 is fixed, a tilt angle sensor 31b that detects the tilt angle of the camera 3, a zoom sensor 31c that detects the zoom amount (angle of view) of the camera 3, and a focus sensor 31d that detects the focal position (focus) of the lens built into the camera 3.

For example, the pan angle sensor 31a and the tilt angle sensor 31b can be implemented with a rotary encoder, a potentiometer, or a gyro sensor attached to the pan head. The zoom sensor 31c and the focus sensor 31d can be implemented by reading the rotation angles of the zoom ring and the focus ring with a rotary encoder or a potentiometer. Alternatively, linear sensors installed on the sliding parts of the lens of the camera 3 can serve as the zoom sensor 31c and the focus sensor 31d.

[Configuration of the framing area learning device]
The configuration of the framing area learning device 1 is described with reference to FIG. 2.
The framing area learning device 1 converts the camera parameters of the camera 3 into framing parameters of the camera 3 and learns the converted framing parameters in association with situation data (subject information) j, which indicates at least one of the position and velocity of the subject.

Here, the framing parameters indicate the framing area (composition) framed by the camera 3 and are expressed in the world coordinate system.
The world coordinate system is a three-dimensional coordinate system attached to the space in which the subject exists; it is shared by all cameras 3.
The camera parameters indicate the attitude and angle of view of the camera 3 and are expressed in the camera coordinate system.
The camera coordinate system is a three-dimensional coordinate system referenced to the camera 3; each camera 3 has its own.
The framing parameters, world coordinate system, camera parameters, and camera coordinate system are detailed later.

As shown in FIG. 2, the framing area learning device 1 includes a situation data storage unit (subject information storage unit) 10A, a camera parameter storage unit 10B, a data selection unit 11, a time adjustment unit 12, a situation data reading unit (subject information reading unit) 13, a camera parameter reading unit 14, a conversion unit 15, a quantization unit 16, and a learning unit 17.

The situation data storage unit 10A is a storage device, such as a hard disk drive (HDD), solid state drive (SSD), or memory, that stores in advance the situation data j described later. Here, the situation data storage unit 10A stores the situation data j(ν) for each time ν, that is, {j(ν)} for ν ∈ {1, 2, …, N}.

The camera parameter storage unit 10B is a storage device, such as an HDD, SSD, or memory, that stores in advance the camera parameters described later. Here, the camera parameter storage unit 10B stores the camera parameters θ(ν) for each time ν, that is, {θ(ν)} for ν ∈ {1, 2, …, N}.

The situation data j(ν) and the camera parameters θ(ν) are assumed to be acquired in synchronization, at the same times.
Although FIG. 2 shows the situation data storage unit 10A and the camera parameter storage unit 10B separately, they may be integrated into a single storage unit.

The data selection unit 11 sequentially selects the times ν = n (n = 1, 2, …, N) associated with the situation data j(ν) and the camera parameters θ(ν), and outputs each selected time n to the time adjustment unit 12 and the situation data reading unit 13.

The time adjustment unit 12 holds a preset offset time Δn, which can be set to any value, and adds Δn to the time n input from the data selection unit 11 to compute the future time (n+Δn). It then outputs the future time (n+Δn) to the camera parameter reading unit 14.

The situation data reading unit 13 reads, from the situation data storage unit 10A, the situation data j(n) for the time n input from the data selection unit 11, and outputs j(n) to the learning unit 17.

The camera parameter reading unit 14 reads the camera parameters θ(n+Δn) at the future time (n+Δn), that is, at the reading time n of the situation data j(n) shifted by the preset offset time Δn. The future time (n+Δn) is supplied by the time adjustment unit 12. The camera parameter reading unit 14 then outputs θ(n+Δn) to the conversion unit 15 and the learning unit 17.

The reason for shifting the reading time of the camera parameters θ(n+Δn) forward in time (into the future) by the offset time Δn is as follows.
When the camera operator's camera work lags behind the action, the lag is also reflected in the camera parameters θ. Offsetting the reading time of θ lets the model anticipate the operator's delayed camera work, which reduces the effect of that lag in θ.
Offsetting the reading time of θ also reduces the effect of processing delays that occur in the framing area estimating device 2.
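The pairing carried out by the data selection unit 11, the time adjustment unit 12, and the two reading units can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes integer time indices, stored series held in Python dicts keyed by ν, and that samples whose future time n+Δn falls outside the recording are simply skipped (the text does not say how the end of the sequence is handled). The function name `make_training_pairs` is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def make_training_pairs(
    situation: Dict[int, Any],  # j(ν), keyed by time index ν
    camera: Dict[int, Any],     # θ(ν), keyed by the same time index
    offset: int,                # Δn, in time-index units (assumed)
) -> List[Tuple[Any, Any]]:
    """Pair each j(n) with the camera parameters θ(n + Δn) read Δn steps later."""
    pairs = []
    for n in sorted(situation):      # data selection: n = 1, 2, ..., N
        future = n + offset          # time adjustment: n + Δn
        if future not in camera:     # assumption: drop samples with no future θ
            continue
        pairs.append((situation[n], camera[future]))
    return pairs


situation = {1: "j1", 2: "j2", 3: "j3", 4: "j4"}
camera = {1: "th1", 2: "th2", 3: "th3", 4: "th4"}
print(make_training_pairs(situation, camera, offset=2))
# → [('j1', 'th3'), ('j2', 'th4')]
```

With offset = 0 the function reduces to plain same-time pairing, which is the baseline the offset is meant to improve on.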

The conversion unit 15 receives from the camera parameter reading unit 14 the camera parameters θ(n+Δn) paired with the situation data j(n). Based on a preset conversion rule T, it converts these camera parameters, expressed in the camera coordinate system, into continuous values of the framing parameters in the world coordinate system, g(n+Δn) = T(θ(n+Δn)), and outputs g(n+Δn) to the quantization unit 16.
The conversion process of the conversion unit 15 is detailed later.

The quantization unit 16 receives the framing parameters from the conversion unit 15 as continuous values g(n+Δn) in the world coordinate system. Based on a preset quantization rule Q, it quantizes them into discrete values in the world coordinate system, q(n+Δn) = Q(g(n+Δn)), and outputs q(n+Δn) to the learning unit 17.
The quantization process of the quantization unit 16 is detailed later.

Here, a discrete value means the framing parameter expressed, for the inference model, as a spatially discrete value in the world coordinate system, for example on a mesh with a pitch of 0.25 m, 0.5 m, or 1.0 m.
A continuous value means the framing parameter expressed, for the inference model, as a spatially continuous value in the world coordinate system.
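As an illustration of a quantization rule Q and its inverse Q⁻¹, the sketch below snaps each world-coordinate component of a framing parameter to the nearest point of a uniform mesh. The actual rules are detailed later in the specification; rounding to the nearest mesh point, a single pitch shared by all components, and the names `quantize` and `dequantize` are assumptions made here.

```python
import math
from typing import Sequence, Tuple


def quantize(g: Sequence[float], pitch: float) -> Tuple[int, ...]:
    """Q: map continuous world coordinates [m] to integer mesh indices."""
    # floor(x + 0.5) rounds half up, avoiding Python's round-half-to-even.
    return tuple(math.floor(x / pitch + 0.5) for x in g)


def dequantize(q: Sequence[int], pitch: float) -> Tuple[float, ...]:
    """Q⁻¹: map mesh indices back to the coordinates of the mesh points."""
    return tuple(i * pitch for i in q)


g = (3.26, -12.9)            # e.g. a framing-area centre (X, Y) in metres
q = quantize(g, pitch=0.5)
print(q, dequantize(q, pitch=0.5))
# → (7, -26) (3.5, -13.0)
```

Because Q⁻¹ returns the mesh point rather than the original value, the round trip loses up to half a pitch per component; that is the resolution given up in exchange for a discrete learning target.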

The learning unit 17 receives the situation data j(n) from the situation data reading unit 13 and the discrete framing-parameter values q(n+Δn) from the quantization unit 16. By learning j(n) and q(n+Δn) in association with each other, it generates a trained inference model for the framing parameters. In other words, the learning unit 17 takes in data pairs of j(n) and q(n+Δn) and outputs the learned parameters P.
The configuration of the learning unit 17 is described in the second and third embodiments.

Any inference model may be used. For example, a machine-learning method such as a support vector machine (SVM) or the k-nearest neighbor method (k-NN) may serve as the inference model, as may a neural network. When the situation data j is image-like data, a convolutional neural network (CNN) is preferably used.
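To make the k-NN option concrete, the sketch below treats each discrete framing parameter q as a class label and predicts it by majority vote among the k training samples whose situation-data feature vectors are closest in Euclidean distance. Flattening the situation data j into a fixed-length vector, the choice of distance, and the default k = 3 are assumptions for illustration; the text does not specify them.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

# (feature vector derived from j(n), discrete framing label q(n+Δn)) -- assumed layout
Sample = Tuple[Sequence[float], Hashable]


def knn_predict(train: List[Sample], x: Sequence[float], k: int = 3) -> Hashable:
    """Return the framing label most common among the k nearest training samples."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train = [
    ((0.0, 0.0), (0, 0)),
    ((0.2, 0.1), (0, 0)),
    ((0.1, 0.3), (0, 0)),
    ((5.0, 5.0), (10, 4)),
    ((5.2, 4.9), (10, 4)),
]
print(knn_predict(train, (0.1, 0.1)))        # → (0, 0)
print(knn_predict(train, (4.8, 5.1), k=1))   # → (10, 4)
```

A trained SVM or CNN would replace `knn_predict` while keeping the same input/output contract: situation data in, discrete framing label out.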

Whether learning uses the discrete values q(n+Δn) or the continuous values g(n+Δn) of the framing parameters can be chosen according to the type of inference model.
When the learning unit 17 learns the continuous values g(n+Δn) directly, the framing area learning device 1 need not include the quantization unit 16; in that case, the conversion unit 15 outputs g(n+Δn) to the learning unit 17.

As described above, the framing area learning device 1 learns the discrete values q or continuous values g of the framing parameters in the camera-position-independent world coordinate system, rather than the camera-position-dependent camera parameters θ, and can therefore generate a trained inference model that does not depend on camera position. As a result, it can automatically determine, from various camera positions, framing areas like those an actual camera operator would frame.

A conventional inference model learns the camera parameters θ as they are and outputs them as they are. Because θ depends on the camera position, such a model can be used only if the camera position is the same during learning and estimation. In practice, camera positions differ between learning and estimation, which makes conventional inference models hard to use. The trained inference model of the framing area learning device 1, by contrast, learns framing parameters in the world coordinate system, so it remains usable even when the camera position differs between learning and estimation.

Furthermore, because the framing area learning device 1 shifts the reading time of the camera parameters θ(n+Δn) into the future, it reduces the effect of lag in the camera parameters θ and of processing delays in the framing area estimating device 2.

[Configuration of the framing area estimating device]
The configuration of the framing area estimating device 2 is described with reference to FIG. 3.
The framing area estimating device 2 estimates framing parameters using the trained inference model generated by the framing area learning device 1 and converts the estimated framing parameters into camera parameters. As shown in FIG. 3, the framing area estimating device 2 includes an estimation unit 20, an inverse quantization unit 21, and an inverse transformation unit 22.

The estimation unit 20 receives the situation data j described later and, using the trained inference model, outputs estimates of the framing parameters corresponding to j in the world coordinate system. That is, the estimation unit 20 is the trained inference model configured with the learned parameters P; given situation data j, it outputs discrete estimates q_est of the framing parameters to the inverse quantization unit 21.

Hereinafter, a framing-parameter estimate that takes a discrete value may be called a "discrete estimate", and one that takes a continuous value a "continuous estimate".

Based on a preset inverse quantization rule Q⁻¹, the inverse quantization unit 21 converts the discrete estimates q_est of the framing parameters input from the estimation unit 20 into continuous estimates g_est = Q⁻¹(q_est), and outputs g_est to the inverse transformation unit 22.

The inverse quantization performed by the inverse quantization unit 21 is detailed later.
When the estimation unit 20 outputs continuous estimates g_est of the framing parameters directly, the framing area estimating device 2 need not include the inverse quantization unit 21; in that case, the estimation unit 20 outputs g_est to the inverse transformation unit 22.

Based on a preset inverse transformation rule T⁻¹, the inverse transformation unit 22 converts the continuous estimates g_est of the framing parameters in the world coordinate system, input from the inverse quantization unit 21, into estimates of the camera parameters in the camera coordinate system, θ_est = T⁻¹(g_est), and outputs θ_est externally.
The inverse transformation process of the inverse transformation unit 22 is detailed later.
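One possible form of an inverse transformation rule T⁻¹ is sketched below purely as an illustration; the actual rule is detailed later in the specification and is not reproduced here. The sketch assumes a hypothetical framing parameter g = (X, Y, W): the point on the field plane Z = 0 where the optical axis should land, and the desired horizontal width W [m] of the framed area there. It also assumes roll φ = 0 and a convention in which, for pan α and tilt δ, the optical axis in world coordinates is (−sin α cos δ, cos α cos δ, −sin δ), so α = δ = 0 looks along world +Y and positive δ tilts downward. The name `framing_to_camera` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def framing_to_camera(t_w: Sequence[float], g: Sequence[float]) -> Tuple[float, float, float]:
    """Hypothetical T⁻¹: framing parameter (X, Y, W) -> (pan, tilt, beta) in radians."""
    X, Y, W = g
    # Vector from the camera position t_w to the target point on the plane Z = 0.
    vx, vy, vz = X - t_w[0], Y - t_w[1], 0.0 - t_w[2]
    horizontal = math.hypot(vx, vy)
    distance = math.hypot(horizontal, vz)   # slant distance to the target
    # Solve (-sin a cos d, cos a cos d, -sin d) = v / |v| for a (pan) and d (tilt).
    pan = math.atan2(-vx, vy)
    tilt = math.atan2(-vz, horizontal)
    # Horizontal angle of view that spans width W at that distance
    # (W is treated as measured perpendicular to the optical axis).
    beta = 2.0 * math.atan(W / (2.0 * distance))
    return pan, tilt, beta


# Camera 10 m high at (0, -40); frame a 20 m wide area centred on the field origin.
print(framing_to_camera((0.0, -40.0, 10.0), (0.0, 0.0, 20.0)))
```

A controller would then drive the pan head and zoom toward the returned angles; converting β into a zoom setting depends on the lens and is outside this sketch.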

As described above, the framing area estimating device 2 uses an inference model that does not depend on camera position, so it can automatically determine, from various camera positions, framing areas like those an actual camera operator would frame. Specifically, it uses the trained inference model to estimate the discrete estimates q_est of the framing parameters in the camera-position-independent world coordinate system and converts them into estimates θ_est of the camera parameters. Generating control signals for the camera 3 from θ_est allows the attitude of the camera 3 to be controlled at various camera positions.

With reference to FIGS. 4 and 5, the following describes, in order, the framing parameters in the world coordinate system and the camera parameters in the camera coordinate system, the transformation and inverse transformation processes, and the quantization and inverse quantization processes.

<Framing parameters in the world coordinate system and camera parameters in the camera coordinate system>
Using a soccer field as an example, FIG. 4 illustrates the relationship between the world coordinate system in which the subject exists and the camera coordinate system, which reflects the position and attitude of the camera 3.

As shown in FIG. 4, the world coordinate system has its origin at the center 91 of the soccer field 90, an X axis passing through the origin parallel to the sidelines 92, a Y axis parallel to the center line 93, and a +Z axis perpendicular to the plane of the soccer field 90 and pointing upward. The X, Y, and Z axes thus form a right-handed system, in that order.

In the camera coordinate system, the optical axis direction of the camera 3 is the +z axis. Consider the case where the origin of the camera coordinate system coincides with that of the world coordinate system and no axis is rotated. In this case, the camera +x axis coincides with the world +X axis, the camera +y axis with the world −Z axis, and the camera +z axis with the world +Y axis. The x, y, and z axes of the camera coordinate system therefore also form a right-handed system, in that order.

In the camera coordinate system, let the rotation angle about the y axis be the pan angle α [rad], the rotation angle about the x axis be the tilt angle δ [rad], and the rotation angle about the z axis be the roll angle φ [rad]. The attitude R of the world coordinate system in the camera coordinate system is then expressed by the following equation (1); that is, equation (1) relates the pan angle α, tilt angle δ, and roll angle φ of the camera coordinate system to the attitude R.

As shown in FIG. 5, let t_w = [t_wx, t_wy, t_wz] be the origin of the camera coordinate system (the position of the camera 3) in the world coordinate system. The origin of the world coordinate system expressed in the camera coordinate system, t_c = [t_cx, t_cy, t_cz], can then be written as t_c = −R t_w.
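To show how the attitude R and the relation t_c = −R t_w fit together, the sketch below builds R from the pan angle α and tilt angle δ with roll φ = 0. Equation (1) itself is not reproduced in this text, so the composition order and sign conventions are assumptions: R maps world coordinates to camera coordinates, the unrotated pose is +x = +X, +y = −Z, +z = +Y as described above, the camera is panned about its y axis and then tilted about its x axis, and δ < 0 tilts the optical axis downward. All function names are hypothetical.

```python
import math
from typing import List

Mat = List[List[float]]
Vec = List[float]

def matmul(a: Mat, b: Mat) -> Mat:
    """3x3 matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def transpose(a: Mat) -> Mat:
    return [[a[j][i] for j in range(3)] for i in range(3)]

def matvec(a: Mat, v: Vec) -> Vec:
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

# Unrotated pose: camera +x = world +X, +y = world -Z, +z = world +Y.
# Each row is a camera axis expressed in world coordinates, so R0 maps world -> camera.
R0: Mat = [[1.0, 0.0, 0.0],
           [0.0, 0.0, -1.0],
           [0.0, 1.0, 0.0]]

def rot_x(a: float) -> Mat:
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(a: float) -> Mat:
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def attitude(alpha: float, delta: float) -> Mat:
    """Hypothetical attitude R (world -> camera) for pan alpha, tilt delta, roll 0."""
    # Camera-to-world rotation: unrotated pose, then pan about the camera y axis,
    # then tilt about the (panned) camera x axis.
    cam_to_world = matmul(transpose(R0), matmul(rot_y(alpha), rot_x(delta)))
    return transpose(cam_to_world)

def world_origin_in_camera(r: Mat, t_w: Vec) -> Vec:
    """t_c = -R t_w: the world origin expressed in camera coordinates."""
    return [-x for x in matvec(r, t_w)]
```

For a camera at t_w = [0, −50, 10] with α = δ = 0, this gives t_c = [0, 10, 50]: the field center lies 50 units ahead along the optical axis and 10 units below it (camera +y points down). The third row of R is the optical axis direction in world coordinates.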

The framing area of the camera 3 is determined by the position t_w of the camera 3, its attitude R, and its angle of view β (more precisely, the horizontal angle of view of the lens attached to the camera 3).
The position t_w of the camera 3 may be measured directly at the shooting site. Alternatively, it can be estimated from the correspondence between objects in an image captured from that position and the ground truth, i.e., the actual real-world data for those objects.

The attitude R of the camera 3 can be obtained from the sensors 31 (the pan angle sensor 31a and the tilt angle sensor 31b) connected to the camera 3. Because commonly used pan heads have no rotation mechanism for the roll angle φ, the roll angle can be set to φ = 0. Therefore, once the pan angle α and the tilt angle δ in the camera coordinate system are known, the attitude R of the camera 3 with respect to the world coordinate system can be determined.

When a zoom lens is used, the angle of view β of the camera 3 changes with the rotation of the zoom ring. The mapping from the output value of the zoom sensor 31c (input) to the angle of view β (output) depends on the combination of the sensor size of the camera 3 and the zoom lens. Therefore, the angle of view β can be obtained by measuring these input/output pairs in advance and storing them in a lookup table.
Alternatively, the focal length f may be used in place of the angle of view β, because for a given sensor size of the camera 3 the focal length and the angle of view determine each other uniquely.
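The lookup-table approach described above can be sketched as piecewise-linear interpolation over pre-measured pairs of zoom sensor output and angle of view. The calibration values in the usage example are made up, and linear interpolation between samples is an assumption (the text only says the pairs are tabulated). The focal-length helpers use the standard pinhole relation β = 2·atan(w / 2f) for a sensor of width w; all function names are hypothetical.

```python
import bisect
import math
from typing import List, Tuple

def make_zoom_lut(pairs: List[Tuple[float, float]]):
    """Build a lookup function from (zoom_sensor_output, angle_of_view_rad) pairs."""
    pairs = sorted(pairs)
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]

    def angle_of_view(sensor_value: float) -> float:
        # Clamp to the measured range, then interpolate linearly between neighbours.
        if sensor_value <= xs[0]:
            return ys[0]
        if sensor_value >= xs[-1]:
            return ys[-1]
        i = bisect.bisect_right(xs, sensor_value)
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (sensor_value - x0) / (x1 - x0)

    return angle_of_view

def focal_length_to_angle(f_mm: float, sensor_width_mm: float) -> float:
    """Horizontal angle of view of an ideal pinhole camera."""
    return 2.0 * math.atan(sensor_width_mm / (2.0 * f_mm))

def angle_to_focal_length(beta: float, sensor_width_mm: float) -> float:
    """Inverse of focal_length_to_angle for the same sensor width."""
    return sensor_width_mm / (2.0 * math.tan(beta / 2.0))
```

For example, `make_zoom_lut([(0.0, 1.0), (100.0, 0.5), (200.0, 0.1)])(50.0)` interpolates halfway between the first two samples and returns 0.75.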

From the above, the minimum set of camera parameters that is both necessary for determining the framing area and obtainable in practice consists of three values: the pan angle α, the tilt angle δ, and the angle of view β. That is, the pan angle α, tilt angle δ, and angle of view β of the camera 3 constitute the camera parameter θ.

If the position of the camera 3 is fixed, the framing parameters corresponding to the situation data j can be estimated by an inference model trained with the camera parameter θ as teacher data. To estimate the framing parameters independently of the camera position, however, the framing parameters of the camera 3 and the scale s must be converted into the world coordinate system for learning, and the resulting estimates must be inversely transformed back into the camera coordinate system to obtain the camera parameter estimate θ_est.

When the tilt angle of the camera 3 satisfies δ < 0 (i.e., the camera is tilted downward), let c_w = [c_wx, c_wy, 0] be the intersection position at which the optical axis (z axis) of the camera 3 meets the XY plane (Z = 0) of the world coordinate system, as shown in FIGS. 4 and 5. The intersection position c_w can then be calculated by the following equation (2).

The scale s indicates the size of the framing area in the world coordinate system. As shown in FIG. 5, the scale s can be calculated by the following equation (3) from the optical axis length l, i.e., the distance from the position t_w of the camera 3 to the intersection position c_w, and the angle of view β of the camera 3. That is, the intersection coordinates c_wx and c_wy and the scale s constitute the continuous value g of the framing parameters.

In FIG. 5, t_w_new denotes the position of the camera 3 after it has moved, and β_est denotes the angle of view estimated after the move. Likewise, s_est denotes the scale estimated after the move, l_est the optical axis length, and c_w_est the intersection position.

<Conversion processing and inverse transformation processing>
As described above, equations (2) and (3) correspond to the conversion rule T preset in the conversion unit 15. That is, the conversion unit 15 uses equations (2) and (3) to convert the camera parameters θ = [α, δ, β] in the camera coordinate system into the continuous value g = [c_wx, c_wy, s] of the framing parameters in the world coordinate system.
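A minimal sketch of the conversion rule T under stated assumptions: equations (2) and (3) are not reproduced in this text, so the optical axis direction below assumes that α = 0 looks along +Y, positive α turns toward +X, and δ < 0 looks downward. The intersection c_w is found by extending the optical axis from t_w down to Z = 0, and the scale s is assumed to be the horizontal width 2·l·tan(β/2) of the field of view at distance l. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

def optical_axis(alpha: float, delta: float) -> Tuple[float, float, float]:
    """Unit optical-axis direction in world coordinates (assumed convention)."""
    return (math.sin(alpha) * math.cos(delta),
            math.cos(alpha) * math.cos(delta),
            math.sin(delta))

def convert(theta: Sequence[float], t_w: Sequence[float]) -> Tuple[float, float, float]:
    """Camera parameters theta = (alpha, delta, beta) -> framing parameters g = (c_wx, c_wy, s)."""
    alpha, delta, beta = theta
    if delta >= 0.0 or t_w[2] <= 0.0:
        raise ValueError("optical axis must point down from a camera above the field")
    d = optical_axis(alpha, delta)
    l = -t_w[2] / d[2]                  # optical axis length from t_w to the Z = 0 plane
    c_wx = t_w[0] + l * d[0]
    c_wy = t_w[1] + l * d[1]
    s = 2.0 * l * math.tan(beta / 2.0)  # assumed form of the scale s
    return (c_wx, c_wy, s)
```

With the camera at (0, −50, 10), α = 0, and δ = −atan(10/50), the optical axis hits the field center, so `convert` returns c_wx = c_wy = 0 (up to rounding).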

Next, consider the inverse transformation from the continuous estimate g_est of the framing parameters in the world coordinate system to the camera parameter estimate θ_est. The pan angle estimate α_est can be calculated by equation (4), and the tilt angle estimate δ_est by equation (5).

As shown in FIG. 5, letting l_est be the optical axis length from the position t_w_new of the camera 3 to the intersection position c_w_est, the estimate β_est of the angle of view of the camera 3 can be calculated by the following equation (6).

Equations (4) to (6) correspond to the inverse transformation rule T⁻¹ preset in the inverse transformation unit 22. That is, the inverse transformation unit 22 uses equations (4) to (6) to inversely transform the continuous estimate g_est = [c_wx_est, c_wy_est, s_est] of the framing parameters into the estimate θ_est = [α_est, δ_est, β_est] of the camera parameters in the camera coordinate system.
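The inverse transformation rule T⁻¹ can be sketched with one consistent set of assumptions (α = 0 looks along +Y, positive α turns toward +X, δ < 0 looks downward, and s = 2·l·tan(β/2)). The pan and tilt estimates follow from the direction of the vector from the camera position t_w_new to c_w_est, and β_est inverts the assumed scale relation. Equations (4) to (6) are not reproduced in this text, so this is an illustrative choice rather than the patent's exact formulas; `inverse_transform` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple

def inverse_transform(g_est: Sequence[float], t_w_new: Sequence[float]) -> Tuple[float, float, float]:
    """Framing parameters (c_wx, c_wy, s) -> camera parameters (alpha, delta, beta)."""
    c_wx, c_wy, s_est = g_est
    vx = c_wx - t_w_new[0]
    vy = c_wy - t_w_new[1]
    vz = -t_w_new[2]                          # the intersection lies on Z = 0
    horizontal = math.hypot(vx, vy)
    l_est = math.hypot(horizontal, vz)        # optical axis length
    alpha_est = math.atan2(vx, vy)            # 0 when looking along +Y
    delta_est = math.atan2(vz, horizontal)    # negative when looking down
    beta_est = 2.0 * math.atan(s_est / (2.0 * l_est))
    return (alpha_est, delta_est, beta_est)
```

Because t_w_new may differ from the camera position used during training, the same world-coordinate estimate can be turned into pan, tilt, and angle of view for a camera placed elsewhere, which is the motivation for learning in world coordinates.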

<Quantization processing and inverse quantization processing>
When the inference model handles discrete values rather than continuous values, quantization from continuous to discrete values and inverse quantization from discrete to continuous values are required. As described above, the continuous value g of the framing parameters consists of three parameters: c_wx, c_wy, and s. Since the number of vector dimensions is k = 3, the continuous value of the framing parameters is written g = [g_1, g_2, g_3]. The discrete value of the framing parameters, q = [q_1(g_1), q_2(g_2), q_3(g_3)], can then be calculated by the following equation (7), where A_k and B_k are parameters that can be set arbitrarily for each dimension k.

Equation (7) corresponds to the quantization rule Q preset in the quantization unit 16. That is, the quantization unit 16 uses equation (7) to quantize the framing parameters from the continuous value g into the discrete value q.

The inverse quantization unit 21 performs inverse quantization as expressed by the following equation (8), which corresponds to the inverse quantization rule Q⁻¹ preset in the inverse quantization unit 21. That is, the inverse quantization unit 21 uses equation (8) to inversely quantize the discrete estimate q_k_est of the framing parameters into the continuous estimate g_k_est.
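Since equations (7) and (8) are not reproduced here, the sketch below assumes the simplest rule compatible with per-dimension parameters A_k and B_k: a uniform affine quantizer q_k = round(A_k·g_k + B_k) and its inverse g_k = (q_k − B_k) / A_k. In this reading, A_k acts as the number of bins per unit (for example per meter) and B_k as an offset that keeps indices non-negative; the values in the usage example are arbitrary.

```python
import math
from typing import List, Sequence

def quantize(g: Sequence[float], a: Sequence[float], b: Sequence[float]) -> List[int]:
    """Assumed rule Q: q_k = round(A_k * g_k + B_k), rounding halves up."""
    return [math.floor(ak * gk + bk + 0.5) for gk, ak, bk in zip(g, a, b)]

def dequantize(q: Sequence[int], a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Assumed rule Q^-1: g_k = (q_k - B_k) / A_k, i.e. the centre of the bin."""
    return [(qk - bk) / ak for qk, ak, bk in zip(q, a, b)]
```

With A = [1, 1, 2] and B = [60, 40, 0], g = [12.3, −7.8, 10.26] quantizes to [72, 32, 21] and dequantizes to [12.0, −8.0, 10.5]; the round-trip error in each dimension is at most 0.5 / A_k.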

<Situation data>
With reference to FIGS. 6 to 8, the situation data j will be described using a soccer match video as an example.
The situation data j indicates, as the situation of the subject, at least one of the position and the speed of the subject. For example, as shown in FIG. 6, the situation data j may be a soccer match video j_1. The match video j_1 in FIG. 6 shows the positions of the players 9A and the ball 9B, which are the subjects. For legibility, only some of the players 9A are labeled in FIG. 6.

The situation data j may also be a map that images at least one of the position and the speed of the subject. As an example of the situation data j, FIG. 7 shows position maps indicating the positions of the players and the ball.

The position map j_2 in FIG. 7(a) shows the positions of all players and the ball. In this map, ○ indicates the position of a player on one team, × the position of a player on the other team, and ● the position of the ball.
The position map j_3 in FIG. 7(b) shows the positions of the players on one team. In the position map j_3, each player's position is shown as a circular region whose shading fades from the center outward. Because these blurred circular regions indicate each player's position only roughly, the position map j_3 can improve the accuracy of the inference model.
The position map j_4 in FIG. 7(c) shows the positions of the players on the other team using circular regions like those of the position map j_3. Like the position map j_3, the position map j_4 can improve the accuracy of the inference model.
The position map j_5 in FIG. 7(d) shows the position of the ball using a circular region like those of the position map j_3.
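The blurred circular regions of the position maps j_3 to j_5 can be approximated by drawing a Gaussian blob at each subject position on a grid covering the field. This is a sketch under assumptions the text does not state: the grid resolution, the Gaussian profile, keeping the per-cell maximum where blobs overlap, and the 105 m × 68 m field extent are all illustrative choices, and `render_position_map` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

def render_position_map(
    positions: Sequence[Tuple[float, float]],
    width: int = 105,
    height: int = 68,
    field_x: float = 105.0,
    field_y: float = 68.0,
    sigma_m: float = 2.0,
) -> List[List[float]]:
    """Return a height x width grid with a Gaussian blob (peak 1.0) at each (X, Y) position.

    World coordinates have their origin at the centre of the field, so X lies in
    [-field_x/2, field_x/2] and Y in [-field_y/2, field_y/2].
    """
    grid = [[0.0] * width for _ in range(height)]
    cell_x = field_x / width
    cell_y = field_y / height
    for px, py in positions:
        for row in range(height):
            # Centre of this cell in world coordinates (row 0 is the +Y edge).
            y = field_y / 2.0 - (row + 0.5) * cell_y
            for col in range(width):
                x = -field_x / 2.0 + (col + 0.5) * cell_x
                d2 = (x - px) ** 2 + (y - py) ** 2
                value = math.exp(-d2 / (2.0 * sigma_m ** 2))
                # Overlapping subjects keep the larger value rather than summing.
                grid[row][col] = max(grid[row][col], value)
    return grid
```

Calling it once with one team's positions, once with the other team's, and once with the ball position yields maps in the spirit of j_3, j_4, and j_5.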

FIG. 8 shows, as an example of the situation data j, speed maps indicating the velocities of the players.
FIG. 8(a) illustrates the positions and velocity components (X-axis and Y-axis components) of the players on one team. The velocity components are values normalized according to a predetermined rule.
The speed map j_6 in FIG. 8(b) disregards the position of each player in FIG. 8(a) and plots each player's velocity components as a point.
The speed map j_7 in FIG. 8(c) shows each player's velocity components from FIG. 8(a) as blurred circular regions. Because these blurred circular regions indicate each player's velocity only roughly, the speed map j_7 can improve the accuracy of the inference model.
In FIGS. 8(b) and 8(c), the vertical and horizontal axes are drawn for explanatory purposes only; the speed maps j_6 and j_7 need not contain these axis lines.
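The speed maps j_6 and j_7 discard player positions and place each player's normalized velocity components (v_X, v_Y) in a two-dimensional velocity plane. The sketch below assumes the normalization maps each component into [−1, 1] and that the map is a square grid centered on zero velocity; with `blur=False` it marks single cells as in j_6, and with `blur=True` it draws Gaussian blobs as in j_7. The grid size, sigma, and function name are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

def render_speed_map(
    velocities: Sequence[Tuple[float, float]],
    size: int = 33,
    blur: bool = True,
    sigma_cells: float = 1.5,
) -> List[List[float]]:
    """Map normalized (vx, vy) in [-1, 1]^2 onto a size x size grid (centre = zero velocity)."""
    grid = [[0.0] * size for _ in range(size)]
    half = (size - 1) / 2.0
    for vx, vy in velocities:
        # Clamp, then convert to fractional grid coordinates (row 0 is vy = +1).
        cx = half + max(-1.0, min(1.0, vx)) * half
        cy = half - max(-1.0, min(1.0, vy)) * half
        if not blur:
            grid[round(cy)][round(cx)] = 1.0   # point marker, as in j_6
            continue
        for row in range(size):
            for col in range(size):
                d2 = (col - cx) ** 2 + (row - cy) ** 2
                grid[row][col] = max(grid[row][col], math.exp(-d2 / (2.0 * sigma_cells ** 2)))
    return grid
```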

Any of the match video j_1, the position maps j_2 to j_5, and the speed maps j_6 and j_7 may be used as the situation data j. For example, only one of them may be used, or any two or more of them may be combined. In particular, combining the position maps j_3 and j_4 of the players of both teams as the situation data j can improve the accuracy of the inference model.

[Operation of the framing area learning device]
The operation of the framing area learning device 1 will be described with reference to FIG. 9.
In step S10, the data selection unit 11 selects a time ν = n that is linked to both the situation data j(ν) and the camera parameter θ(ν).
In step S11, the time adjustment unit 12, in which an offset time Δn is preset, adds the offset time Δn to the time n to calculate the future time (n + Δn).

In step S12, the situation data reading unit 13 reads the situation data j(n) at time n from the situation data storage unit 10A.
In step S13, the camera parameter reading unit 14 reads the camera parameter θ(n + Δn) at the future time (n + Δn) from the camera parameter storage unit 10B.

In step S14, the conversion unit 15 converts the camera parameter θ(n + Δn) into the continuous value g(n + Δn) of the framing parameters in the world coordinate system.
In step S15, the quantization unit 16 quantizes the continuous value g(n + Δn) of the framing parameters into the discrete value q(n + Δn).

In step S16, the learning unit 17 generates a trained inference model of the framing parameters by learning the situation data j(n) read in step S12 in association with the discrete value q(n + Δn) of the framing parameters quantized in step S15.
If the learning unit 17 learns the continuous value g(n + Δn) of the framing parameters directly in step S16, step S15 may be omitted.
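Steps S10 to S15 amount to pairing the situation data at time n with the camera parameters at the future time n + Δn before they are converted and quantized. The sketch below shows only that pairing. The time-indexed dictionaries stand in for the storage units 10A and 10B, integer frame indices stand in for time, and the `convert`/`quantize` callables for the units 15 and 16 are hypothetical placeholders.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple, TypeVar

J = TypeVar("J")

def build_training_pairs(
    situation_store: Dict[int, J],
    camera_store: Dict[int, Sequence[float]],
    offset: int,
    convert: Callable[[Sequence[float]], Sequence[float]],
    quantize: Optional[Callable[[Sequence[float]], Sequence[int]]] = None,
) -> List[Tuple[J, Sequence]]:
    """Return (j(n), q(n + offset)) pairs; with quantize=None, (j(n), g(n + offset))."""
    pairs = []
    for n in sorted(situation_store):             # S10: select a time n
        future = n + offset                       # S11: future time n + Δn
        if future not in camera_store:
            continue                              # no camera parameters recorded at n + Δn
        j_n = situation_store[n]                  # S12: read j(n)
        theta = camera_store[future]              # S13: read θ(n + Δn)
        g = convert(theta)                        # S14: θ -> g (world coordinates)
        target = quantize(g) if quantize else g   # S15 (skipped for continuous learning)
        pairs.append((j_n, target))
    return pairs
```

Setting `offset=0` gives the same-time pairing described later as a modified example.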

[Operation of the framing area estimating device]
The operation of the framing area estimating device 2 will be described with reference to FIG. 10.
In step S20, the situation data j is input to the estimation unit 20.
In step S21, the estimation unit 20 uses the trained inference model to output the discrete estimate q_est of the framing parameters corresponding to the situation data j.

In step S22, the inverse quantization unit 21 inversely quantizes the discrete estimate q_est of the framing parameters into the continuous estimate g_est.
In step S23, the inverse transformation unit 22 inversely transforms the continuous estimate g_est of the framing parameters into the camera parameter estimate θ_est.
If the estimation unit 20 outputs the continuous estimate g_est of the framing parameters directly in step S21, step S22 may be omitted.
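The estimation flow of steps S20 to S23 is a straight composition of three stages. In the sketch below, the trained model, the inverse quantization rule, and the inverse transformation rule are passed in as callables because their concrete forms are not fixed by the text; the function name and the lambda stand-ins used in the check are purely hypothetical.

```python
from typing import Callable, Optional, Sequence

def estimate_camera_parameters(
    j,
    model: Callable[[object], Sequence],
    inverse_transform: Callable[[Sequence[float]], Sequence[float]],
    dequantize: Optional[Callable[[Sequence[int]], Sequence[float]]] = None,
) -> Sequence[float]:
    """S20-S23: situation data j -> camera parameter estimate theta_est."""
    out = model(j)                                    # S20-S21: q_est (or g_est)
    g_est = dequantize(out) if dequantize else out    # S22, skipped for a continuous model
    return inverse_transform(g_est)                   # S23: g_est -> theta_est
```

Leaving `dequantize` as `None` corresponds to the case where the estimation unit 20 outputs g_est directly.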

(Second embodiment)
An example configuration of the learning unit 17 according to the second embodiment will be described with reference to FIG. 11.
The learning unit 17 updates learning parameters during the learning process and outputs the final learning parameters as trained parameters P. As shown in FIG. 11, the learning unit 17 includes an estimation unit 170, an inverse quantization unit 171, an inverse quantization unit 172, an error evaluation unit 173, a parameter update unit 174, and a parameter storage unit 175.

The estimation unit 170 is an inference model configured with the learning parameters stored in the parameter storage unit 175. Like the estimation unit 20 of FIG. 3, when the situation data j is input, the estimation unit 170 outputs the discrete estimate q_est of the framing parameters to the inverse quantization unit 171.

Like the inverse quantization unit 21 of FIG. 3, the inverse quantization unit 171 inversely quantizes the discrete estimate q_est of the framing parameters input from the estimation unit 170 into the continuous estimate g_est, and outputs g_est to the error evaluation unit 173.

Like the inverse quantization unit 21 of FIG. 3, the inverse quantization unit 172 inversely quantizes the discrete value q of the framing parameters input from the quantization unit 16 of FIG. 2 into the continuous value g, and outputs g to the error evaluation unit 173.

If the estimation unit 170 outputs the continuous estimate g_est of the framing parameters directly, the learning unit 17 need not include the inverse quantization units 171 and 172. In this case, the estimation unit 170 outputs g_est to the error evaluation unit 173, and the continuous value g of the framing parameters is input to the error evaluation unit 173 from the conversion unit 15 of FIG. 2.

The error evaluation unit 173 calculates the error between the continuous estimate g_est of the framing parameters input from the inverse quantization unit 171 and the continuous value g input from the inverse quantization unit 172, for example as a sum-of-squares error or a cross-entropy error, and outputs the calculated error to the parameter update unit 174.

The parameter update unit 174 generates a learning signal from the error between g_est and g calculated by the error evaluation unit 173, and updates the learning parameters in the parameter storage unit 175. That is, the parameter update unit 174 updates the learning parameters in the parameter storage unit 175 by backpropagation so as to minimize the error calculated by the error evaluation unit 173.

The parameter storage unit 175 is a storage device that stores, as teacher data, the learning parameters updated by the parameter update unit 174. The learning parameters in the parameter storage unit 175 are referenced by the estimation unit 170.

Because the learning unit 17 thus does not need to convert the framing parameter estimate q_est output by the estimation unit 170 into the camera coordinate system, the amount of computation can be reduced.
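To make the roles of the estimation unit 170, error evaluation unit 173, and parameter update unit 174 concrete, the sketch below substitutes a toy linear model for the convolutional network, outputs the continuous g directly (the case where the inverse quantization units are not needed), and minimizes the sum-of-squares error in world coordinates by plain gradient descent. The linear model, learning rate, epoch count, and synthetic data are hypothetical stand-ins; for a linear model, the backpropagated gradient reduces to the closed form in the comments.

```python
from typing import List, Sequence, Tuple

def sse(g_est: Sequence[float], g: Sequence[float]) -> float:
    """Sum-of-squares error between estimated and target framing parameters."""
    return sum((a - b) ** 2 for a, b in zip(g_est, g))

def train_linear(
    data: Sequence[Tuple[Sequence[float], Sequence[float]]],
    out_dim: int,
    lr: float = 0.05,
    epochs: int = 200,
) -> Tuple[List[List[float]], List[float]]:
    """Fit g_est = W x + b by stochastic gradient descent on the SSE."""
    in_dim = len(data[0][0])
    w = [[0.0] * in_dim for _ in range(out_dim)]   # plays the role of the parameter storage
    b = [0.0] * out_dim
    for _ in range(epochs):
        for x, g in data:
            # Estimation: g_est = W x + b
            g_est = [sum(w[o][i] * x[i] for i in range(in_dim)) + b[o] for o in range(out_dim)]
            # Error evaluation: d(SSE)/d(g_est) = 2 (g_est - g)
            grad = [2.0 * (e - t) for e, t in zip(g_est, g)]
            # Parameter update: gradient step on W and b.
            for o in range(out_dim):
                for i in range(in_dim):
                    w[o][i] -= lr * grad[o] * x[i]
                b[o] -= lr * grad[o]
    return w, b
```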

(Third embodiment)
An example configuration of the learning unit 17B according to the third embodiment will be described with reference to FIG. 12.
The learning unit 17 according to the second embodiment evaluates, in the world coordinate system, the error between the continuous estimate g_est and the continuous value g of the framing parameters. In contrast, the learning unit 17B according to the third embodiment evaluates, in the camera coordinate system, the error between the camera parameter estimate θ_est and the camera parameter θ; this is the difference from the second embodiment.

As shown in FIG. 12, the learning unit 17B includes an estimation unit 170, an inverse quantization unit 171, an inverse quantization unit 172, a parameter update unit 174, a parameter storage unit 175, an inverse transformation unit 176, and an error evaluation unit 177.
The components other than the inverse transformation unit 176 and the error evaluation unit 177 are the same as in the second embodiment, so their description is omitted.

Like the inverse transformation unit 22 of FIG. 3, the inverse transformation unit 176 inversely transforms the continuous estimate g_est of the framing parameters input from the inverse quantization unit 171 into the camera parameter estimate θ_est, and outputs θ_est to the error evaluation unit 177.

The error evaluation unit 177 calculates the error between the camera parameter estimate θ_est input from the inverse transformation unit 176 and the camera parameter θ input from the camera parameter reading unit 14 of FIG. 2, for example as a sum-of-squares error or a cross-entropy error, and outputs the calculated error to the parameter update unit 174.

Because the learning unit 17B thus evaluates the error in the camera coordinate system, in which the camera operator actually decides the composition, the estimation accuracy can be improved.
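For the third embodiment, the error is measured after mapping the framing estimate back to camera parameters. The sketch below uses a hypothetical inverse transformation (assuming α = 0 looks along +Y, positive α turns toward +X, δ < 0 looks downward, and s = 2·l·tan(β/2)) and computes a sum-of-squares error over (α, δ, β). Wrapping the pan difference into (−π, π] is an addition of this sketch; the text does not specify it.

```python
import math
from typing import Sequence, Tuple

def to_camera(g: Sequence[float], t_w: Sequence[float]) -> Tuple[float, float, float]:
    """Assumed inverse transformation: (c_wx, c_wy, s) -> (alpha, delta, beta)."""
    vx, vy, vz = g[0] - t_w[0], g[1] - t_w[1], -t_w[2]
    horizontal = math.hypot(vx, vy)
    l = math.hypot(horizontal, vz)               # optical axis length
    return (math.atan2(vx, vy), math.atan2(vz, horizontal), 2.0 * math.atan(g[2] / (2.0 * l)))

def wrap_angle(a: float) -> float:
    """Map an angle difference into (-pi, pi]."""
    return math.pi - (math.pi - a) % (2.0 * math.pi)

def camera_sse(g_est: Sequence[float], theta: Sequence[float], t_w: Sequence[float]) -> float:
    """Sum-of-squares error in camera coordinates between T^-1(g_est) and the recorded theta."""
    alpha_e, delta_e, beta_e = to_camera(g_est, t_w)
    return (wrap_angle(alpha_e - theta[0]) ** 2
            + (delta_e - theta[1]) ** 2
            + (beta_e - theta[2]) ** 2)
```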

(Modified examples)
Although the embodiments of the present invention have been described in detail above, the present invention is not limited to these embodiments and includes design changes that do not depart from the gist of the present invention.
Although specific configurations of the learning unit have been described in the second and third embodiments, the learning unit is not limited to these configurations.

In each of the embodiments described above, soccer match video has been used as an example, but the present invention is not limited to this. The present invention is particularly suitable for video in which the camera work of the camera operator correlates with the positions of the subjects, because highly accurate camera parameters can then be estimated. For example, the present invention is well suited to video of games such as basketball and rugby, and to footage of concerts and stage performances.
Furthermore, when the world coordinate system is shared, a trained inference model can be reused as is. For example, soccer fields that comply with international standards allow the same world coordinate system to be set, so a trained inference model can be reused without modification.

In the first embodiment described above, the read time of the camera parameters is offset, but the present invention is not limited to this; the offset time may be set to 0. In this case, the learning unit learns the situation data and the framing parameters at the same time in association with each other.
In the first embodiment described above, one example each of the conversion rule and inverse transformation rule and of the quantization rule and inverse quantization rule has been described, but the present invention is not limited to these.

In each of the embodiments described above, the framing area learning device and the framing area estimating device have been described as independent hardware, but the present invention is not limited to this. For example, the present invention can also be realized by programs that cause hardware resources of a computer, such as a CPU, memory, and hard disk, to operate as each of the devices described above. These programs may be distributed via communication lines, or written to and distributed on recording media such as CD-ROMs and flash memory.

1 Framing area learning device
10A Situation data storage unit (subject information storage unit)
10B Camera parameter storage unit
11 Data selection unit
12 Time adjustment unit
13 Situation data reading unit (subject information reading unit)
14 Camera parameter reading unit
15 Conversion unit
16 Quantization unit
17 Learning unit
2 Framing area estimating device
20 Estimation unit
21 Inverse quantization unit
22 Inverse transformation unit
170 Estimation unit
171 Inverse quantization unit
172 Inverse quantization unit
173, 177 Error evaluation unit
174 Parameter update unit
175 Parameter storage unit
176 Inverse transformation unit

Claims

A framing area learning device that converts camera parameters indicating the attitude and angle of view of a camera into framing parameters indicating a framing area to be framed by the camera, and learns the framing parameters in association with subject information indicating at least one of the position and speed of a subject, the framing area learning device comprising:
a conversion unit that converts the camera parameters, acquired in synchronization with the subject information, into the framing parameters based on a preset conversion rule; and
a learning unit that generates a trained inference model of the framing parameters by learning the subject information and the framing parameters in association with each other.

The framing area learning device according to claim 1, further comprising:
a subject information storage unit that stores the subject information in advance;
a subject information reading unit that reads the subject information at a predetermined time from the subject information storage unit;
a camera parameter storage unit that stores the camera parameters in advance; and
a camera parameter reading unit that reads the camera parameters at a future time offset by a preset offset time from the read time of the subject information, wherein
the conversion unit converts the camera parameters read by the camera parameter reading unit into the framing parameters at the future time, and
the learning unit learns the subject information at the read time and the framing parameters at the future time in association with each other.

The framing area learning device according to claim 1 or 2, wherein the subject information includes at least one of a position map that images the position of the subject in a world coordinate system and a speed map that images the speed of the subject in the world coordinate system.

The framing area learning device according to claim 3, wherein the learning unit uses a convolutional neural network as the inference model.

The framing area learning device according to any one of claims 1 to 4, further comprising a quantization unit that receives the framing parameters from the conversion unit as continuous values in a world coordinate system and quantizes them into discrete values in the world coordinate system based on a preset quantization rule, wherein the learning unit learns the discrete values of the framing parameters in association with the subject information.

A framing area estimating device that estimates framing parameters indicating a framing area of a camera using a trained inference model generated by the framing area learning device according to any one of claims 1 to 5, the framing area estimating device comprising:
an estimation unit that receives subject information indicating at least one of the position and speed of a subject and outputs, using the trained inference model, an estimate of the framing parameters corresponding to the subject information; and
an inverse transformation unit that inversely transforms the estimate of the framing parameters output by the estimation unit into an estimate of camera parameters indicating the attitude and angle of view of the camera, based on a preset inverse transformation rule.

The framing area estimating device according to claim 6, wherein the subject information includes at least one of a position map that images the position of the subject in a world coordinate system and a speed map that images the speed of the subject in the world coordinate system.

The framing area estimating device according to claim 6 or 7, wherein the estimation unit outputs the estimate of the framing parameters as discrete values in a world coordinate system, the framing area estimating device further comprising an inverse quantization unit that inversely quantizes the estimate of the framing parameters input from the estimation unit into continuous values in the world coordinate system based on a preset inverse quantization rule.

A program for causing a computer to function as the framing area learning device according to any one of claims 1 to 5.

A program for causing a computer to function as the framing area estimating device according to any one of claims 6 to 8.