JP7625407B2 - Information processing device, control method thereof, and program

Info

Publication number: JP7625407B2
Application number: JP2020198081A
Authority: JP
Inventors: 敬介牧田
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2020-11-30
Filing date: 2020-11-30
Publication date: 2025-02-03
Anticipated expiration: 2040-11-30
Also published as: JP2022086194A

Description

本発明は、情報処理装置及びその制御方法、並びにプログラムに関し、特に、カメラによる目標被写体の追尾撮影を制御する情報処理装置及びその制御方法、並びにプログラムに関する。 The present invention relates to an information processing device, a control method thereof, and a program, and in particular to an information processing device, a control method thereof, and a program for controlling tracking and photographing a target subject by a camera.

従来、ユーザが操作器からカメラを遠隔操作することで所望の映像を取得する、雲台システムが広く知られている。例えば、テレビのニュースで目にする航空機の映像は、空港屋上に常設された雲台装置を放送局から遠隔操作することで撮影されている。 Conventionally, a camera platform system has been widely known, in which a user remotely controls a camera using a control device to obtain the desired image. For example, the images of aircraft seen on television news are captured by a broadcasting station remotely controlling a camera platform device permanently installed on the roof of an airport.

また、この雲台装置に画像認識技術を搭載して、映像内に映る被写体を認識し、被写体の動きに合わせて自動でパン、チルト、ズームを動作させ追尾する自動追尾システムが提案されている。これにより、ユーザが操作器を操作しなくても、動きのある被写体を自動で撮影することができる。 An automatic tracking system has also been proposed that incorporates image recognition technology into the camera platform device to recognize subjects in the image and automatically pan, tilt, and zoom to track the subject's movements. This makes it possible to automatically capture moving subjects without the user having to operate the control device.

近年、画像認識技術として機械学習（ＡＩ）を用いた例が知られており、例えば、特許文献１には、カメラで撮像される被写体を検出・監視するシステムが提案されている。 In recent years, examples of image recognition technology that uses machine learning (AI) have become known. For example, Patent Document 1 proposes a system that detects and monitors subjects captured by a camera.

特許第６１８４２７１号公報 Japanese Patent No. 6184271

しかし、特許文献１に開示された従来技術では、機械学習の処理結果の確からしさ（尤度）が低い場合に、サブカメラのアングルを変更するため、主カメラでは被写体の追尾ができず、連続した追尾映像を撮影できない。被写体の検出精度を向上させ、尤度が低下するケースを取り除くことができれば、主カメラによる追尾映像が撮影可能だが、そのためには多くの学習が必要となる。 However, in the conventional technology disclosed in Patent Document 1, when the certainty (likelihood) of the machine learning processing results is low, the angle of the sub-camera is changed, so the main camera cannot track the subject and cannot capture continuous tracking footage. If the accuracy of subject detection could be improved and cases in which the likelihood is reduced could be eliminated, it would be possible to capture tracking footage using the main camera, but this would require a lot of learning.

本発明は上述した課題に鑑みてなされたものであり、被写体の検出精度が低下するケースでも、特別な学習を必要とせず、撮影中の被写体を追尾することができる情報処理装置及びその制御方法、並びにプログラムを提供することを目的とする。 The present invention has been made in consideration of the above-mentioned problems, and aims to provide an information processing device, a control method thereof, and a program that can track a subject during shooting without requiring special learning, even in cases where the subject detection accuracy is reduced.

本発明の請求項１に係る情報処理装置は、動画像を撮影する撮像装置のパン、チルト、ズームを制御することにより、前記撮像装置により目標被写体を追尾する情報処理装置であって、前記撮像装置で撮影された動画像のフレームを取得する取得手段と、前記取得したフレーム内に存在するオブジェクトを推定する推定手段と、前記推定されたオブジェクトが予め設定された追尾オブジェクトである場合、前記取得したフレーム内に存在するオブジェクトを追尾するよう前記撮像装置を制御する制御手段とを備え、前記制御手段は、前記取得したフレーム内に存在するオブジェクトのサイズが第１の閾値以下である場合、前記追尾オブジェクトを前記予め設定された追尾オブジェクトよりも実サイズが小さいオブジェクトに再設定することを特徴とする。 The information processing device of claim 1 of the present invention is an information processing device that tracks a target subject using an imaging device by controlling the pan, tilt, and zoom of the imaging device that captures moving images, and is equipped with an acquisition means for acquiring frames of moving images captured by the imaging device, an estimation means for estimating an object present in the acquired frames, and a control means for controlling the imaging device to track the object present in the acquired frames if the estimated object is a predetermined tracking object, and is characterized in that, if the size of the object present in the acquired frames is equal to or smaller than a first threshold, the control means resets the tracking object to an object whose actual size is smaller than the predetermined tracking object.
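The retargeting rule of claim 1 can be sketched roughly as follows. This is only an illustration under stated assumptions: the threshold value, the use of bounding-box area as the "size", and the mapping from "aircraft" to "bird" as an object of smaller actual size are all hypothetical and are not specified in this part of the description.

```python
from dataclasses import dataclass

# Hypothetical values: the claim fixes neither the threshold nor the substitute tag.
FIRST_THRESHOLD = 32 * 32  # on-screen area in pixels (assumed unit)
SMALLER_OBJECT = {"aircraft": "bird"}  # preset tag -> tag with smaller actual size (assumed)


@dataclass
class Detection:
    tag: str
    width: int   # bounding-box width in pixels
    height: int  # bounding-box height in pixels


def select_tracking_tag(preset_tag: str, detection: Detection) -> str:
    """Return the tag to track, resetting it when the object appears small."""
    size = detection.width * detection.height
    if size <= FIRST_THRESHOLD and preset_tag in SMALLER_OBJECT:
        # The object is at or below the first threshold: track the smaller object.
        return SMALLER_OBJECT[preset_tag]
    return preset_tag
```

With these assumed values, a 20 x 20 pixel detection switches the tracking tag from "aircraft" to "bird", while a 100 x 100 pixel detection leaves it unchanged.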

本発明によれば、被写体の検出精度が低下するケースでも、特別な学習を必要とせず、撮影中の被写体を追尾することができる。 According to the present invention, even in cases where the accuracy of subject detection is reduced, it is possible to track a subject during shooting without requiring special learning.

本発明の実施形態に係る情報処理装置を含む自動追尾システムの構成を示す図である。 FIG. 1 is a diagram showing a configuration of an automatic tracking system including an information processing device according to an embodiment of the present invention.

図１における情報処理装置、雲台装置、及び操作装置のハードウェア構成を示す図である。 FIG. 2 is a diagram showing the hardware configurations of the information processing device, the camera platform device, and the operation device in FIG. 1.

図１における情報処理装置、雲台装置、及び操作装置が備えるハードウェア資源及びプログラムを利用することで実現されるソフトウェア構成を示す図である。 FIG. 3 is a diagram showing a software configuration realized by using hardware resources and programs included in the information processing device, the camera platform device, and the operation device in FIG. 1.

図３における推定部での推定処理の概念図である。 FIG. 4 is a conceptual diagram of an estimation process in an estimation unit in FIG. 3.

本発明の実施形態に係る学習フェーズでの処理の詳細な流れを示すフローチャートである。 FIG. 5 is a flowchart showing a detailed flow of processing in a learning phase according to the embodiment of the present invention.

図３における学習部に入力される学習用データの例を示す図である。 FIG. 6 is a diagram showing an example of learning data input to a learning unit in FIG. 3.

図４の学習済みモデルを利用した自動追尾システムにおける自動追尾撮影動作を説明する図である。 FIG. 7 is a diagram illustrating an automatic tracking and photographing operation in an automatic tracking system using the trained model of FIG. 4.

本発明の実施形態に係る推定フェーズでの処理の詳細な流れを示すフローチャートである。 FIG. 8 is a flowchart showing a detailed flow of processing in an estimation phase according to the embodiment of the present invention.

図１は、本発明の実施形態に係る情報処理装置１００を含む自動追尾システム１を示す図である。 Figure 1 is a diagram showing an automatic tracking system 1 including an information processing device 100 according to an embodiment of the present invention.

図１において、自動追尾システム１は、情報処理装置１００、雲台装置２００、操作装置３００、及びネットワークＮＥＴを備える。 In FIG. 1, the automatic tracking system 1 includes an information processing device 100, a camera platform device 200, an operation device 300, and a network NET.

操作装置３００は、ユーザ操作を受け付ける装置であり、その受け付けたユーザ操作に応じた指令（制御信号）が、ネットワークＮＥＴ及び情報処理装置１００を経由し、雲台装置２００に送信される。 The operation device 300 is a device that accepts user operations, and commands (control signals) corresponding to the accepted user operations are transmitted to the camera platform device 200 via the network NET and the information processing device 100.

雲台装置２００は、操作装置３００から送信された制御信号の内容に応じて動作する。これにより、ユーザは、雲台装置２００の遠隔操作が可能となる。雲台装置２００が備えるカメラ２０１（図２）で撮影された映像は、情報処理装置１００に入力され、自動追尾撮影に必要な各種演算及び記録が行われる。 The camera platform device 200 operates according to the contents of the control signal transmitted from the operation device 300. This allows the user to remotely control the camera platform device 200. Images captured by a camera 201 (FIG. 2) provided on the camera platform device 200 are input to the information processing device 100, where the various calculations and recording required for automatic tracking shooting are performed.

ネットワークＮＥＴは、公衆電話回線やインターネット等の通信回線である。雲台装置２００及び情報処理装置１００は、空港や鉄塔、テレビ局屋上といったスポットに設置され、操作装置３００はテレビ局内等に設置される。以下、本実施例では、情報処理装置１００及び雲台装置２００が空港に設置され、雲台装置により自動追尾撮影される対象物が航空機である場合について説明する。 The network NET is a communication line such as a public telephone line or the Internet. The camera platform device 200 and the information processing device 100 are installed in spots such as airports, steel towers, and TV station rooftops, and the operation device 300 is installed inside the TV station, etc. In the following embodiment, a case will be described in which the information processing device 100 and the camera platform device 200 are installed in an airport, and the object to be automatically tracked and photographed by the camera platform device is an airplane.

図２は、情報処理装置１００、雲台装置２００、及び操作装置３００のハードウェア構成を示す図である。 Figure 2 is a diagram showing the hardware configuration of the information processing device 100, the camera platform device 200, and the operation device 300.

情報処理装置１００は、ＲＡＭ１０１、ＧＰＵ１０２、ＣＰＵ１０３、入力部１０４、記憶部１０５、シリアル通信部１０６、ネットワーク通信部１０７、及びＵＩ部１０８を備える。 The information processing device 100 includes a RAM 101, a GPU 102, a CPU 103, an input unit 104, a memory unit 105, a serial communication unit 106, a network communication unit 107, and a UI unit 108.

雲台装置２００は、カメラ２０１、駆動部２０２、シリアル通信部２０３、ＣＰＵ２０４、及び記憶部２０５を備える。 The camera platform device 200 includes a camera 201, a drive unit 202, a serial communication unit 203, a CPU 204, and a memory unit 205.

操作装置３００は、ネットワーク通信部３０１、操作部３０２、記憶部３０３、ＣＰＵ３０４、及び表示部３０５を備える。 The operation device 300 includes a network communication unit 301, an operation unit 302, a memory unit 303, a CPU 304, and a display unit 305.

以下、情報処理装置１００のハードウェア資源の詳細について説明する。 The details of the hardware resources of the information processing device 100 are explained below.

ＲＡＭ１０１は、揮発性のメモリであり、ＣＰＵ１０３の主メモリ、ワークエリア等の一時記憶領域として用いられる。 RAM 101 is a volatile memory and is used as a temporary storage area such as the main memory and work area of CPU 103.

ＣＰＵ１０３は、例えば記憶部１０５に格納されるプログラムに従い、ＲＡＭ１０１をワークメモリとして用いて、情報処理装置１００の各部を制御する。 The CPU 103 controls each part of the information processing device 100, for example, according to a program stored in the memory unit 105, using the RAM 101 as a work memory.

ＧＰＵ１０２は、データの並列処理により効率的な演算を行うことができるので、以下後述するようにディープラーニングにより学習モデルを用いて複数回に渡り学習を行う処理はＧＰＵ１０２で行われる。 Since GPU 102 can perform efficient calculations through parallel processing of data, the process of learning multiple times using a learning model through deep learning, as described below, is performed by GPU 102.

入力部１０４（取得手段）は、映像信号を有線の映像信号線を介して情報処理装置１００に入力するためのインターフェイスであり、例えばＵＳＢ等の各種通信インターフェイスである。 The input unit 104 (acquisition means) is an interface for inputting a video signal to the information processing device 100 via a wired video signal line, and is, for example, any of various communication interfaces such as USB.

記憶部１０５は、不揮発性のメモリであり、画像データやその他のデータ、ＣＰＵ１０３が動作するための各種プログラム等が、それぞれ所定の領域に格納されている。記憶部１０５は、例えばＨＤＤやフラッシュメモリなどの磁気ディスクにより構成される。 The storage unit 105 is a non-volatile memory in which image data and other data, various programs for the operation of the CPU 103, and the like are stored in respective predetermined areas. The storage unit 105 is configured, for example, by a storage device such as an HDD (magnetic disk) or flash memory.

シリアル通信部１０６は、ＣＰＵ１０３の制御に基づき、雲台装置２００とシリアル通信をするためのインターフェイスである。 The serial communication unit 106 is an interface for serial communication with the camera platform device 200 under the control of the CPU 103.

ネットワーク通信部１０７は、ＣＰＵ１０３の制御に基づき、ネットワークＮＥＴを介して操作装置３００と通信するための通信インターフェイスである。 The network communication unit 107 is a communication interface for communicating with the operation device 300 via the network NET under the control of the CPU 103.

ＵＩ部１０８は、情報処理装置１００に対するユーザ操作を受け付けると共に、ユーザへ情報処理装置１００の情報を表示するためのユーザインターフェイスである。ＵＩ部１０８は、具体的には、タッチパネルにより構成されるが、キーボードやマウス、ディスプレイ等をさらに備えていてもよい。 The UI unit 108 is a user interface for accepting user operations on the information processing device 100 and displaying information on the information processing device 100 to the user. Specifically, the UI unit 108 is configured with a touch panel, but may further include a keyboard, a mouse, a display, etc.

以下、雲台装置２００のハードウェア資源の詳細について説明する。 The hardware resources of the camera platform device 200 are described in detail below.

カメラ２０１（撮像装置）は、雲台装置２００が設置された周囲を撮影し、目標被写体の映像を撮影する。カメラ２０１は撮影倍率を変更可能な光学ズームレンズが備えている。すなわち、雲台装置２００は、ＣＰＵ２０４からカメラ２０１に光学ズームレンズに対する光学ズーム制御命令を送信することで、撮像映像の倍率を変更する光学ズーム機能を有する。さらに、雲台装置２００は、ＣＰＵ２０４からカメラ２０１に撮像された画像の一部を局所的に拡大させるデジタルズーム制御命令を送信することで、撮像映像の倍率を変更するデジタルズーム機能も有する。デジタルズーム機能は、光学ズーム機能では倍率が足りない場合、すなわち、より撮影映像を拡大したい場合に利用される。また、カメラ２０１は、情報処理装置１００の入力部１０４と有線の映像信号線で接続されており、撮影した映像信号を情報処理装置１００へ出力する。 The camera 201 (imaging device) captures the surroundings around the camera platform device 200 and captures an image of a target subject. The camera 201 is equipped with an optical zoom lens that can change the imaging magnification. That is, the camera platform device 200 has an optical zoom function that changes the magnification of the captured image by transmitting an optical zoom control command for the optical zoom lens from the CPU 204 to the camera 201. Furthermore, the camera platform device 200 also has a digital zoom function that changes the magnification of the captured image by transmitting a digital zoom control command from the CPU 204 to the camera 201 to locally enlarge a part of the captured image. The digital zoom function is used when the magnification is insufficient with the optical zoom function, that is, when it is desired to further enlarge the captured image. The camera 201 is also connected to the input unit 104 of the information processing device 100 by a wired video signal line, and outputs the captured video signal to the information processing device 100.
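The relationship between the two zoom functions can be illustrated with a short sketch: the optical zoom is used up to its limit, and the digital zoom supplies only the remaining magnification. The function name `split_zoom` and the parameter `optical_max` are hypothetical; the description does not state how the requested magnification is divided between the two functions.

```python
from typing import Tuple


def split_zoom(requested: float, optical_max: float) -> Tuple[float, float]:
    """Split a requested magnification into (optical, digital) factors.

    Assumption: optical zoom is preferred, and digital zoom is applied
    only when the optical zoom alone cannot reach the requested value.
    """
    if requested < 1.0 or optical_max < 1.0:
        raise ValueError("magnifications must be >= 1.0")
    optical = min(requested, optical_max)
    digital = requested / optical  # 1.0 when the optical zoom suffices
    return optical, digital
```

For example, with an assumed optical limit of 20x, a request for 10x uses the optical zoom alone, whereas a request for 40x uses 20x optical combined with 2x digital zoom.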

駆動部２０２は、雲台装置２００をパン、チルト方向に旋回させるためのアクチュエータ及びその駆動回路、周辺回路である。すなわち、雲台装置２００は、ＣＰＵ２０４から駆動部２０２に目標被写体に対してパン、チルト方向に旋回する追尾制御命令を送信することで追尾映像を撮影する追尾撮影機能を有する。 The driving unit 202 is an actuator for rotating the camera platform device 200 in the pan and tilt directions, as well as its driving circuit and peripheral circuits. That is, the camera platform device 200 has a tracking shooting function that shoots tracking video by sending a tracking control command from the CPU 204 to the driving unit 202 to rotate the camera platform device 200 in the pan and tilt directions relative to a target subject.

シリアル通信部２０３は、情報処理装置１００のシリアル通信部１０６と接続され、ＣＰＵ２０４の制御に基づき、情報処理装置１００とシリアル通信をするためのインターフェイスである。 The serial communication unit 203 is connected to the serial communication unit 106 of the information processing device 100, and is an interface for performing serial communication with the information processing device 100 based on the control of the CPU 204.

ＣＰＵ２０４は、例えば記憶部２０５に格納されるプログラムに従い、雲台装置２００の各部を制御する。 The CPU 204 controls each part of the camera platform device 200 according to a program stored in the memory unit 205, for example.

記憶部２０５は、不揮発性のメモリであり、雲台装置２００の設定データやその他のデータ、ＣＰＵ２０４が動作するための各種プログラム等が、それぞれ所定の領域に格納されている。 The storage unit 205 is a non-volatile memory in which the setting data of the camera platform device 200, other data, various programs for the operation of the CPU 204, and the like are stored in respective predetermined areas.

以下、操作装置３００のハードウェア資源の詳細について説明する。 The hardware resources of the operation device 300 are described in detail below.

ネットワーク通信部３０１は、ＣＰＵ３０４の制御に基づき、ネットワークＮＥＴを介して情報処理装置１００と通信するための通信インターフェイスである。 The network communication unit 301 is a communication interface for communicating with the information processing device 100 via the network NET based on the control of the CPU 304.

操作部３０２は、ジョイスティック、操作レバーや各種スイッチであり、ユーザは、操作部３０２を操作することで、雲台装置２００の旋回制御やズーム制御、ゲインなどの調整を行う。尚、ここでのズーム制御は、上述した光学ズーム機能及びデジタルズーム機能を利用したズーム制御を指す。 The operation unit 302 is a joystick, an operating lever, and various switches, and the user operates the operation unit 302 to perform rotation control, zoom control, gain adjustment, and the like of the camera platform device 200. Note that zoom control here refers to zoom control that utilizes the optical zoom function and digital zoom function described above.

記憶部３０３は、不揮発性のメモリであり、操作装置３００の設定データやその他のデータ、ＣＰＵ３０４が動作するための各種プログラム等が、それぞれ所定の領域に格納されている。 The storage unit 303 is a non-volatile memory, and the setting data of the operation device 300, other data, various programs for the operation of the CPU 304, etc. are stored in respective designated areas.

ＣＰＵ３０４は、例えば記憶部３０３に格納されるプログラムに従い、操作装置３００の各部を制御する。 The CPU 304 controls each part of the operation device 300 according to a program stored in the memory unit 303, for example.

表示部３０５は、ＬＥＤ及びタッチパネルを備え、雲台装置２００のステータスや警告等をユーザに通知する。 The display unit 305 is equipped with an LED and a touch panel, and notifies the user of the status of the camera platform device 200, warnings, etc.

尚、ＣＰＵ１０３、ＣＰＵ２０４、ＣＰＵ３０４はいずれも、１つ以上のプロセッサーにより構成するようにしてもよい。 Note that CPU 103, CPU 204, and CPU 304 may each be configured with one or more processors.

図３は、図２に示す情報処理装置１００、雲台装置２００、及び操作装置３００が備えるハードウェア資源及びプログラムを利用することで実現されるソフトウェア構成を示す図である。 Figure 3 is a diagram showing a software configuration realized by using the hardware resources and programs provided in the information processing device 100, the camera platform device 200, and the operation device 300 shown in Figure 2.

以下、情報処理装置１００のソフトウェア構成について説明する。 The software configuration of the information processing device 100 is described below.

情報処理装置１００は、学習部１５０、データ記憶部１５１、推定対象設定部１５２、モード管理部１５３、画像処理部１５４、推定部１５５、推定結果処理部１５６、及び雲台制御部１５７を備える。 The information processing device 100 includes a learning unit 150, a data storage unit 151, an estimation target setting unit 152, a mode management unit 153, an image processing unit 154, an estimation unit 155, an estimation result processing unit 156, and a camera platform control unit 157.

学習部１５０（学習手段）は、推定部１５５で推定処理を行うための学習処理を実行する。学習部１５０により実行される学習処理の詳細な内容については後述する。 The learning unit 150 (learning means) executes a learning process to enable the estimation unit 155 to perform estimation processing. The details of the learning process executed by the learning unit 150 will be described later.

データ記憶部１５１は、自動追尾撮影した映像の記録処理や、学習用データの記録処理を行う。 The data storage unit 151 records the video captured using automatic tracking and records the learning data.

推定対象設定部１５２は、推定部１５５が出力するオブジェクトのタグを管理する。ここで、タグは、学習部１５０に入力する学習用データの一部、及び推定部１５５が出力するデータのひとつであり、オブジェクトが何であるかを示すラベルである。タグの具体例として、航空機、犬、猫、鳥などが挙げられる。推定対象設定部１５２には、ユーザが自動追尾撮影したいオブジェクトのタグを予め設定することができ、複数のタグを設定することも可能である。本実施例では、航空機の自動追尾撮影を行うため、航空機のタグが自動追尾撮影したいオブジェクトのタグとして設定される。 The estimation target setting unit 152 manages the tags of objects output by the estimation unit 155. Here, the tag is part of the learning data input to the learning unit 150 and one of the data output by the estimation unit 155, and is a label indicating what the object is. Specific examples of tags include aircraft, dogs, cats, and birds. In the estimation target setting unit 152, the user can set in advance the tag of an object that he or she wishes to automatically track and photograph, and it is also possible to set multiple tags. In this embodiment, in order to perform automatic tracking and photographing of an aircraft, the tag of the aircraft is set as the tag of the object that he or she wishes to automatically track and photograph.

モード管理部１５３は、情報処理装置１００のモードを管理し、学習モード、自動追尾モード、マニュアルモードの３モードを管理する。各モードの詳細な内容については後述する。 The mode management unit 153 manages the mode of the information processing device 100, and manages three modes: learning mode, automatic tracking mode, and manual mode. The details of each mode will be described later.

画像処理部１５４は、雲台装置２００から受信した映像信号（画像；フレーム）の処理を行う。具体的には受信した画像のリサイズや、輝度調整である。 The image processing unit 154 processes the video signal (image; frame) received from the camera platform device 200. Specifically, it resizes the received image and adjusts the brightness.

推定部１５５（推定手段）は、画像処理部１５４から出力された画像を入力データとし、学習部１５０によって生成された学習済みモデルへ入力し推定処理を行う。これにより、雲台装置２００から受信したフレーム内に存在する被写体（オブジェクト）を推定する。 The estimation unit 155 (estimation means) takes the image output from the image processing unit 154 as input data, inputs it to the trained model generated by the learning unit 150, and performs estimation processing. In this way, it estimates the subject (object) present in the frame received from the camera platform device 200.

推定結果処理部１５６は、推定部１５５から出力された推定結果に追尾対象が含まれている場合、各種ノイズ処理、平均化処理を実施し、追尾対象の映像内の位置（被写体現在位置）を出力する。雲台装置２００で撮影する映像には、追尾対象以外のノイズ（航空機以外の航空機や、背景の一部や雲など航空機と見間違えるもの）が存在する。推定結果処理部１５６では、これらのノイズを処理し、信頼度の高い被写体現在位置を出力する役割がある。また、前述した推定対象設定部１５２に航空機に加えて他のタグを設定すると、推定結果処理部１５６でのノイズ処理に負荷がかかったり、出力する被写体の現在位置の信頼性が低下したりする。そのため、推定対象設定部１５２に設定するタグは、必要最小限にしておくことが好ましい。 When the estimation result output from the estimation unit 155 includes the tracking target, the estimation result processing unit 156 performs various noise processing and averaging processing, and outputs the position of the tracking target in the image (current position of the subject). The image captured by the camera platform device 200 contains noise other than the tracking target (aircraft other than the tracked aircraft, as well as parts of the background, clouds, and other things that can be mistaken for an aircraft). The role of the estimation result processing unit 156 is to process this noise and output a highly reliable current position of the subject. Furthermore, if other tags are set in the estimation target setting unit 152 in addition to the aircraft, the noise processing load on the estimation result processing unit 156 increases, or the reliability of the output current position of the subject decreases. For this reason, it is preferable to keep the tags set in the estimation target setting unit 152 to the minimum necessary.
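One possible form of the noise processing and averaging is sketched below: detections whose tag differs from the tracking target are discarded, the remaining detection with the highest likelihood is kept, and its center is averaged over a short window of recent frames. The class name `PositionSmoother`, the window length, and the choice of a moving average are assumptions for illustration; the description does not specify the actual noise processing.

```python
from collections import deque
from typing import Deque, Iterable, Optional, Tuple

# (tag, center_x, center_y, likelihood); a hypothetical layout for this sketch
Detection = Tuple[str, float, float, float]


class PositionSmoother:
    """Filter detections by tag and average the target position over frames."""

    def __init__(self, target_tag: str, window: int = 5) -> None:
        self.target_tag = target_tag
        self.history: Deque[Tuple[float, float]] = deque(maxlen=window)

    def update(self, detections: Iterable[Detection]) -> Optional[Tuple[float, float]]:
        # Discard everything that is not tagged as the tracking target.
        candidates = [d for d in detections if d[0] == self.target_tag]
        if not candidates:
            return None  # no tracking target in this frame, so no position output
        # Keep the most likely candidate and record its center.
        best = max(candidates, key=lambda d: d[3])
        self.history.append((best[1], best[2]))
        n = len(self.history)
        return (sum(p[0] for p in self.history) / n,
                sum(p[1] for p in self.history) / n)
```

Averaging over several frames trades a small delay for a steadier position, which matters because the output drives the pan and tilt motion.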

雲台制御部１５７は、前述したモードに応じて、雲台装置２００を制御する制御信号を演算し、雲台装置２００へ出力する。現在のモードが自動追尾モードの場合は、被写体の現在位置と、追尾目標位置（追尾撮影中に撮影画面内で被写体を保持したい位置）を基に、雲台装置２００を制御する制御信号を演算し、雲台装置２００へ出力する。また、現在のモードがマニュアルモードの場合は、操作装置３００から送信された制御信号を雲台装置２００へ出力する。 The camera platform control unit 157 calculates a control signal for controlling the camera platform device 200 according to the aforementioned mode, and outputs it to the camera platform device 200. If the current mode is the automatic tracking mode, it calculates the control signal based on the current position of the subject and the tracking target position (the position at which the subject is to be held within the shooting screen during tracking shooting), and outputs it to the camera platform device 200. If the current mode is the manual mode, it outputs the control signal sent from the operation device 300 to the camera platform device 200.
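In the automatic tracking mode, the control signal depends on the offset between the subject's current position and the tracking target position. A minimal sketch, assuming simple proportional control with normalized speeds, is shown below; the function `tracking_command`, the `gain` parameter, and the sign convention are hypothetical and are not taken from the description.

```python
from typing import Tuple


def clamp(value: float, low: float, high: float) -> float:
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def tracking_command(
    current: Tuple[float, float],
    target: Tuple[float, float],
    frame_size: Tuple[int, int],
    gain: float = 1.0,
) -> Tuple[float, float]:
    """Return normalized (pan, tilt) speeds in [-1, 1].

    Assumption: the speed is proportional to the position error
    expressed as a fraction of the frame width or height.
    """
    error_x = (current[0] - target[0]) / frame_size[0]
    error_y = (current[1] - target[1]) / frame_size[1]
    pan = clamp(gain * error_x, -1.0, 1.0)
    tilt = clamp(gain * error_y, -1.0, 1.0)
    return pan, tilt
```

When the subject already sits at the tracking target position, both speeds are zero and the camera platform holds its orientation.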

尚、学習部１５０による処理にはＣＰＵ１０３に加えてＧＰＵ１０２を用いる。具体的には、学習モデルを含む学習プログラムを実行する場合に、ＣＰＵ１０３とＧＰＵ１０２が協働して演算を行うことで学習処理を行う。また、推定部１５５も学習部１５０と同様にＣＰＵ１０３に加えてＧＰＵ１０２が協働して演算を行うことで推定処理を行う。尚、学習部１５０及び推定部１５５の処理は、ＣＰＵ１０３またはＧＰＵ１０２の演算処理能力によっては、ＣＰＵ１０３またはＧＰＵ１０２のみにより演算が行われても良い。 The learning unit 150 uses the GPU 102 in addition to the CPU 103 for processing. Specifically, when executing a learning program including a learning model, the CPU 103 and the GPU 102 work together to perform calculations to perform the learning process. Similarly to the learning unit 150, the estimation unit 155 also works together with the CPU 103 and the GPU 102 to perform calculations to perform the estimation process. Depending on the calculation processing capabilities of the CPU 103 or the GPU 102, the processing of the learning unit 150 and the estimation unit 155 may be performed by the CPU 103 or the GPU 102 alone.

以下、雲台装置２００のソフトウェア構成について説明する。 The software configuration of the camera platform device 200 is explained below.

雲台装置２００は、パンチルト制御部２５０、カメラ制御部２５１、設定管理部２５２、及び通信部２５３を備える。 The camera platform device 200 includes a pan/tilt control unit 250, a camera control unit 251, a setting management unit 252, and a communication unit 253.

パンチルト制御部２５０は、通信部２５３で受信した駆動部２０２の駆動を制御する制御信号に基づいて、パン、チルトを駆動するための信号を、駆動部２０２へ出力する。 The pan/tilt control unit 250 outputs a signal for driving the pan and tilt to the drive unit 202 based on a control signal that controls the drive of the drive unit 202 received by the communication unit 253.

カメラ制御部２５１は、通信部２５３で受信したカメラ２０１を制御する制御信号に基づいて、カメラ２０１を制御するための信号を、カメラ２０１へ出力する。 The camera control unit 251 outputs a signal for controlling the camera 201 to the camera 201 based on the control signal for controlling the camera 201 received by the communication unit 253.

設定管理部２５２は、操作装置３００の設定を管理する。設定管理部２５２で管理される具体的な設定項目としては、パン、チルトの最高速や、駆動可能範囲などの駆動パラメータが挙げられる。 The setting management unit 252 manages the settings of the operation device 300. Specific setting items managed by the setting management unit 252 include drive parameters such as the maximum pan and tilt speeds and the drivable range.

通信部２５３は、雲台制御部１５７と予め定めた通信ルール（プロトコル）に則って、制御指令や、ステータス情報のやり取りを行う。 The communication unit 253 exchanges control commands and status information with the camera platform control unit 157 in accordance with predetermined communication rules (protocol).

以下、操作装置３００のソフトウェア構成について説明する。 The software configuration of the operation device 300 is described below.

操作装置３００は、通信部３５０、及び表示部３５１を備える。 The operation device 300 includes a communication unit 350 and a display unit 351.

通信部３５０は、雲台制御部１５７と予め定めた通信ルール（プロトコル）に則って、制御信号やステータス情報のやり取りを行う。 The communication unit 350 exchanges control signals and status information with the camera platform control unit 157 in accordance with predetermined communication rules (protocol).

図４は、推定部１５５での推定処理の概念図である。 Figure 4 is a conceptual diagram of the estimation process in the estimation unit 155.

図４に示すように、推定部１５５は、学習済みモデル４０３に入力データ４００を入力して推定処理を実行し、学習済みモデル４０３から出力データ４０１を出力する。 As shown in FIG. 4, the estimation unit 155 inputs input data 400 to a trained model 403, executes an estimation process, and outputs output data 401 from the trained model 403.

入力データ４００は、雲台装置２００のカメラ２０１で撮影され、画像処理部１５４で処理された画像データである。尚、雲台装置２００は動画を撮影するため、実際に入力データ４００として学習済みモデル４０３に入力されるデータは、動画像中の１フレームである。 The input data 400 is image data captured by the camera 201 of the camera platform device 200 and processed by the image processing unit 154. Note that since the camera platform device 200 captures video, the data actually input to the trained model 403 as the input data 400 is one frame of the video.

出力データ４０１は、入力データ４００である画像内に存在すると推定されたオブジェクトのタグ、座標、尤度を含む。 The output data 401 includes tags, coordinates, and likelihoods of objects estimated to be present in the image, which is the input data 400.

推定されたオブジェクトのタグは、学習時に入力した学習用データに含まれるタグの中から選択される。学習用データの詳細については後述する。 The tags of the estimated objects are selected from the tags contained in the training data entered during training. Details of the training data will be described later.

推定されたオブジェクトの座標は、画像４０２に示すように、推定されたオブジェクトの外接枠の左上座標（座標１）と、右下座標（座標２）である。この２点の座標から、オブジェクトサイズや中心点座標が演算できる。 The coordinates of the estimated object are the upper left coordinate (coordinate 1) and the lower right coordinate (coordinate 2) of the bounding box of the estimated object, as shown in image 402. From the coordinates of these two points, the object size and center point coordinates can be calculated.
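The calculation of object size and center point from the two corner coordinates can be written out as follows. The class and field names are illustrative and do not come from the description.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BoundingBox:
    x1: float  # upper-left x (coordinate 1)
    y1: float  # upper-left y (coordinate 1)
    x2: float  # lower-right x (coordinate 2)
    y2: float  # lower-right y (coordinate 2)

    @property
    def size(self) -> Tuple[float, float]:
        """Width and height of the bounding box."""
        return (self.x2 - self.x1, self.y2 - self.y1)

    @property
    def center(self) -> Tuple[float, float]:
        """Midpoint between the two corner coordinates."""
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)
```

For a box with corners (100, 50) and (300, 150), the size is 200 x 100 and the center point is (200, 100).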

推定されたオブジェクトの尤度は、０～１の値であり、値が大きい程、出力するタグに対する学習済みモデル４０３による推定結果の信頼度が高いことを示す。 The likelihood of an estimated object is a value between 0 and 1, and the higher the value, the higher the reliability of the estimation result by the trained model 403 for the output tag.

学習済みモデル４０３は、ニューラルネットワークであり、これの内部パラメータは学習部１５０による学習モデルの機械学習によって生成される。 The trained model 403 is a neural network, and its internal parameters are generated through machine learning of the learning model by the learning unit 150.

尚、学習部１５０は、誤差検出部及び更新部を備える。 The learning unit 150 includes an error detection unit and an update unit.

誤差検出部は、入力層に入力される入力データに応じてニューラルネットワークの出力層から出力される出力データと、教師データとの誤差を得る。誤差検出部は、損失関数を用いて、ニューラルネットワークからの出力データと教師データとの誤差を計算するようにしてもよい。 The error detection unit obtains the error between the teacher data and the output data output from the output layer of the neural network in response to the input data input to the input layer. The error detection unit may use a loss function to calculate the error between the output data from the neural network and the teacher data.

更新部は、誤差検出部で得られた誤差に基づいて、その誤差が小さくなるように、ニューラルネットワークのノード間の結合重み付け係数等を更新する。この更新部は、例えば、誤差逆伝播法を用いて、結合重み付け係数等を更新する。誤差逆伝播法は、上記の誤差が小さくなるように、各ニューラルネットワークのノード間の結合重み付け係数等を調整する手法である。 The update unit updates the connection weighting coefficients between the nodes of the neural network based on the error obtained by the error detection unit so as to reduce the error. This update unit updates the connection weighting coefficients, for example, using the backpropagation method. The backpropagation method is a technique for adjusting the connection weighting coefficients between the nodes of each neural network so as to reduce the above-mentioned error.
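The division of labor between the error detection unit and the update unit can be illustrated with a deliberately small example. The sketch uses a single linear unit with a squared-error loss, which is the single-layer special case of the gradient that backpropagation computes for a full network; the loss function, the learning rate `lr`, and all function names are assumptions rather than details from the description.

```python
from typing import List


def forward(weights: List[float], inputs: List[float]) -> float:
    """Output of a single linear unit: the weighted sum of its inputs."""
    return sum(w * x for w, x in zip(weights, inputs))


def detect_error(output: float, teacher: float) -> float:
    """Squared-error loss between the output and the teacher data."""
    return 0.5 * (output - teacher) ** 2


def update(weights: List[float], inputs: List[float],
           teacher: float, lr: float = 0.1) -> List[float]:
    """One gradient-descent step; dL/dw_i = (output - teacher) * x_i."""
    delta = forward(weights, inputs) - teacher
    return [w - lr * delta * x for w, x in zip(weights, inputs)]
```

Repeating the update step moves the weights in the direction that reduces the error, which is the behavior the update unit is described as providing.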

図５は、本実施形態に係る学習フェーズでの処理の詳細な流れを示すフローチャートである。 Figure 5 is a flowchart showing the detailed flow of processing in the learning phase according to this embodiment.

まずステップＳ５００で、ＣＰＵ１０３は、モード管理部１５３により管理されている現在のモードが学習モードか否かを判断する。この結果、ＣＰＵ１０３が学習モードであると判断した場合はステップＳ５０１に進み、そうでない場合は、ステップＳ５０６に進む。 First, in step S500, the CPU 103 determines whether the current mode managed by the mode management unit 153 is the learning mode. If the CPU 103 determines that the current mode is the learning mode, the process proceeds to step S501. If not, the process proceeds to step S506.

ステップＳ５０１では、ＣＰＵ１０３は、データ記憶部１５１から学習用データを一つ読み出してステップＳ５０２に進む。ここで、学習用データについて図６を参照し説明する。 In step S501, the CPU 103 reads one piece of learning data from the data storage unit 151 and proceeds to step S502. Here, the learning data will be described with reference to FIG. 6.

図６は、学習部１５０に入力される学習用データの例を示す図である。 Figure 6 shows an example of learning data input to the learning unit 150.

学習用データは、画像とその画像内に含まれるオブジェクトのタグ（教師データ）が紐づけられたデータである。尚、画像に含まれるオブジェクトはひとつであることが好ましく、画像サイズは、複数の学習用データ間で同一であることが好ましい。本実施例では、空港で航空機の自動追尾撮影を行うため、入力データとして航空機の映像を用意し、加えて自動追尾撮影中に、雲台装置２００のカメラ２０１に写る可能性が高い他のオブジェクト映像を用意する。具体的な他のオブジェクトは、例えば鳥や凧が挙げられる。また、これらのオブジェクトの画像は、予め雲台装置２００を用いて撮影・記録すればよい。学習用データのうち、教師データは前述したオブジェクトの画像からオブジェクトが何であるかを目視で判断し、設定する。図６では学習用データ６０１～６０３を例示しており、それぞれ、航空機、鳥、凧が入力データである画像に含まれるオブジェクトである例を示す。 The learning data is data in which an image is linked to the tag (teacher data) of the object contained in that image. Each image preferably contains a single object, and the image size is preferably the same across all pieces of learning data. In this embodiment, because automatic tracking shooting of an aircraft is performed at an airport, images of aircraft are prepared as input data, together with images of other objects that are likely to be captured by the camera 201 of the camera platform device 200 during automatic tracking shooting. Specific examples of such other objects include birds and kites. Images of these objects may be captured and recorded in advance using the camera platform device 200. The teacher data in the learning data is set by visually judging, from each object image, what the object is. FIG. 6 illustrates learning data 601 to 603, in which the object contained in the input image is an aircraft, a bird, and a kite, respectively.

続いて、図５に戻り、ステップＳ５０２では、ＣＰＵ１０３は、データ記憶部１５１から読み出した学習用データが決められたルール通り（正規）か否かを判断する。この結果、ＣＰＵ１０３がルール通りであると判断した場合は、ステップＳ５０３に進み、そうでない場合はステップＳ５０１に戻る。 Returning to FIG. 5, in step S502, the CPU 103 determines whether the learning data read from the data storage unit 151 conforms to the predetermined rules (is valid). If the CPU 103 determines that the learning data conforms to the rules, the process proceeds to step S503; otherwise, the process returns to step S501.

ステップＳ５０３では、ＣＰＵ１０３は、データ記憶部１５１から読み出した学習用データを学習部１５０にあるニューラルネットワークである学習モデルに入力する。 In step S503, the CPU 103 inputs the learning data read from the data storage unit 151 into a learning model, which is a neural network in the learning unit 150.

ステップＳ５０４では、ＣＰＵ１０３とＧＰＵ１０２が協働して、学習部１５０で学習処理を行い、学習モデルの内部パラメータを更新する。 In step S504, the CPU 103 and GPU 102 work together to perform learning processing in the learning unit 150 and update the internal parameters of the learning model.

続いて、ステップＳ５０５では、ＣＰＵ１０３は、データ記憶部１５１に記録されている全ての学習用データに基づく学習が済んだか否かに応じ、全ての学習が済んだ場合は本処理を終了する一方、そうでなければステップＳ５０１に戻る。 Next, in step S505, the CPU 103 determines whether learning based on all learning data recorded in the data storage unit 151 has been completed, and if all learning has been completed, ends this process, otherwise returns to step S501.

以上の処理によって、推定部１５５により推定処理が実行される学習済みモデル４０３の内部パラメータが決定される。 By the above processing, the internal parameters of the trained model 403 on which the estimation process is performed by the estimation unit 155 are determined.
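The loop formed by steps S501 to S505 can be summarized in a short sketch. The validity rule used for step S502 (a non-empty image and a tag drawn from a known set) is an assumption, since the description does not state the rules; `train_step` is a hypothetical callback standing in for steps S503 and S504.

```python
from typing import Callable, Iterable, Optional, Set, Tuple

Sample = Tuple[Optional[object], Optional[str]]  # (image, tag)


def is_valid(sample: Sample, known_tags: Set[str]) -> bool:
    """Assumed rule for S502: an image is present and its tag is known."""
    image, tag = sample
    return image is not None and tag in known_tags


def run_learning_phase(
    samples: Iterable[Sample],
    known_tags: Set[str],
    train_step: Callable[[Sample], None],
) -> int:
    """Run S501 to S505 and return the number of samples used for training."""
    trained = 0
    for sample in samples:                    # S501: read one piece of learning data
        if not is_valid(sample, known_tags):  # S502: skip data that breaks the rules
            continue
        train_step(sample)                    # S503, S504: input and update parameters
        trained += 1
    return trained                            # S505: all learning data processed
```

Invalid entries are skipped rather than aborting the run, mirroring the return from S502 to S501 in the flowchart.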

尚、ステップＳ５０６以降の処理は、前述したモードが、マニュアルモードもしくは、自動追尾モード時の処理の流れを示す。 Steps S506 onward show the flow of processing when the aforementioned mode is the manual mode or the automatic tracking mode.

ステップＳ５０６では、ＣＰＵ１０３は、現在のモードがマニュアルモードか否かを判断する。この結果、ＣＰＵ１０３が現在のモードがマニュアルモードであると判断した場合は、ステップＳ５０７に進み、そうでなければ、ステップＳ５０９に進む。 In step S506, the CPU 103 determines whether the current mode is manual mode. If the CPU 103 determines that the current mode is manual mode, the process proceeds to step S507. If not, the process proceeds to step S509.

ステップＳ５０７では、ＣＰＵ１０３は、操作装置３００からの制御信号（指令）を受信し、ステップＳ５０８に進む。 In step S507, the CPU 103 receives a control signal (command) from the operation device 300 and proceeds to step S508.

ステップＳ５０８では、ＣＰＵ１０３は、受信した制御信号を雲台装置２００へ送信し、本処理を終了する。 In step S508, the CPU 103 transmits the received control signal to the camera platform device 200 and ends this process.

ステップＳ５０９では、ＣＰＵ１０３は、後述する自動追尾モード処理を実行した後、本処理を終了する。 In step S509, the CPU 103 executes the auto-tracking mode process described below, and then ends this process.
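
The branch in steps S506 to S509 amounts to a small dispatcher. The sketch below is illustrative only: the `Mode` enum and the three callbacks are hypothetical names standing in for the mode management unit 153, the operation device 300, the camera platform device 200, and the automatic tracking mode process.

```python
from enum import Enum
from typing import Callable


class Mode(Enum):
    MANUAL = "manual"
    AUTO_TRACKING = "auto_tracking"


def dispatch(
    mode: Mode,
    receive_command: Callable[[], str],
    send_to_platform: Callable[[str], None],
    run_auto_tracking: Callable[[], None],
) -> Mode:
    """Run one pass of S506 to S509 and return the mode that was handled."""
    if mode is Mode.MANUAL:              # S506: is the current mode manual?
        command = receive_command()      # S507: control signal from the operation device
        send_to_platform(command)        # S508: forward it to the camera platform device
        return Mode.MANUAL
    run_auto_tracking()                  # S509: automatic tracking mode process
    return Mode.AUTO_TRACKING
```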

図７は、図４の学習済みモデル４０３を利用した自動追尾システム１における自動追尾撮影動作を説明する図である。 Figure 7 is a diagram explaining the automatic tracking and shooting operation in the automatic tracking system 1 using the trained model 403 in Figure 4.

まず、ステップＳ７０１にて、操作装置３００から情報処理装置１００に追尾開始命令の制御信号が送信される。尚、この制御信号は、操作装置３００の操作部３０２に対して所定のユーザ操作があったときに、ＣＰＵ３０４により生成され、情報処理装置１００に送信される。 First, in step S701, a control signal for a tracking start command is transmitted from the operation device 300 to the information processing device 100. Note that this control signal is generated by the CPU 304 and transmitted to the information processing device 100 when a predetermined user operation is performed on the operation unit 302 of the operation device 300.

続いて、ステップＳ７０２にて、情報処理装置１００から雲台装置２００に追尾開始位置命令の制御信号が送信される。追尾開始位置とは、自動追尾撮影を開始する雲台装置２００のパン、チルト、ズームの位置であり、ユーザが操作部３０２を操作することで、予め雲台装置２００の記憶部２０５に保持させておく。 Next, in step S702, the information processing device 100 transmits a control signal for a tracking start position command to the camera platform device 200. The tracking start position is the pan, tilt, and zoom position of the camera platform device 200 at which automatic tracking shooting starts, and is stored in advance in the storage unit 205 of the camera platform device 200 by the user operating the operation unit 302.

ステップＳ７０３にて、追尾開始位置命令を受信した雲台装置２００のＣＰＵ２０４は、パン、チルト、ズームの位置を追尾開始位置に移動する。 In step S703, the CPU 204 of the camera platform device 200, having received the tracking start position command, moves the pan, tilt, and zoom positions to the tracking start position.

ステップＳ７０４にて、パン、チルト、ズームの位置が追尾開始位置に到達したら、雲台装置２００のＣＰＵ２０４は、情報処理装置１００へ、追尾開始位置に到達したこと及び、カメラ２０１で撮影した映像信号を送信する。 In step S704, when the pan, tilt, and zoom positions reach the tracking start position, the CPU 204 of the camera platform device 200 transmits to the information processing device 100 a notification that the tracking start position has been reached, together with the video signal captured by the camera 201.

ステップＳ７０５にて、雲台装置２００から映像信号を受信した情報処理装置１００のＣＰＵ１０３は、記憶部１０５に雲台装置２００からの映像信号を記録する記録処理を開始する。 In step S705, the CPU 103 of the information processing device 100, having received the video signal from the camera platform device 200, starts a recording process that records that video signal in the storage unit 105.

続いて、ステップＳ７０６－１にて、ＣＰＵ１０３は、ＧＰＵ１０２と協働して、受信した映像信号（画像）に含まれる被写体を推定する推定処理を行う。 Next, in step S706-1, the CPU 103 cooperates with the GPU 102 to perform an estimation process to estimate the subject contained in the received video signal (image).

さらに、ステップＳ７０６－２にて、ＣＰＵ１０３は、推定結果処理部１５６及び雲台制御部１５７によって雲台制御命令を演算し、ステップＳ７０７にて、ＣＰＵ１０３は、演算した雲台制御命令の制御信号を雲台装置２００へ送信する。 Furthermore, in step S706-2, the CPU 103 calculates a camera platform control command using the estimation result processing unit 156 and the camera platform control unit 157, and in step S707, the CPU 103 transmits a control signal for the calculated camera platform control command to the camera platform device 200.

ステップＳ７０８にて、雲台装置２００のＣＰＵ２０４は、駆動部２０２で受信した制御信号に従ってパン、チルトの位置を制御すると共にカメラ２０１でズームを制御する。その後、ステップＳ７０９にて、ＣＰＵ２０４は、カメラ２０１で撮影した映像信号を情報処理装置１００へ送信する。 In step S708, the CPU 204 of the camera platform device 200, in accordance with the received control signal, controls the pan and tilt positions with the drive unit 202 and controls the zoom with the camera 201. Then, in step S709, the CPU 204 transmits the video signal captured by the camera 201 to the information processing device 100.

以降、ステップＳ７０６－１～ステップＳ７０９の処理が実行されることで、目標被写体の自動追尾撮影が可能となる。 Then, the processing of steps S706-1 to S709 is executed, enabling automatic tracking and shooting of the target subject.

図８は、本実施形態に係る推定フェーズにおける処理の詳細な流れを示すフローチャートである。尚、この処理は、図５のステップＳ５０９の自動追尾モード処理でもある。よって、以下、本処理を自動追尾モード処理という。 Figure 8 is a flowchart showing a detailed flow of processing in the estimation phase according to this embodiment. Note that this processing is also the automatic tracking mode processing of step S509 in Figure 5. Therefore, hereinafter, this processing will be referred to as automatic tracking mode processing.

自動追尾モードが開始し、ステップＳ８００ａで、ＣＰＵ１０３は、雲台装置２００から画像（動画像のフレーム）を情報処理装置１００の入力部１０４で受信する。 When the automatic tracking mode starts, in step S800a, the CPU 103 receives an image (a frame of a moving image) from the camera platform device 200 via the input unit 104 of the information processing device 100.

ステップＳ８００で、ＣＰＵ１０３は、受信した画像に対し、画像処理部１５４でサイズの変更（リサイズ）を行う。推定部１５５は、学習済みモデル４０３に入力データ４００として入力する画像データのサイズが大きければ大きいほど、推定処理に時間がかかる。すなわち、ステップＳ８００の処理は、入力データ４００のサイズを小さくすることで、推定処理に要する時間を削減する目的がある。本実施形態では、ステップＳ８００でのサイズ変更後における推定処理に要する時間が５０ｍｓ程度になるように入力データ４００をリサイズする。但し、推定処理に要する時間はＣＰＵ１０３やＧＰＵ１０２の演算処理能力に関わるため、この限りではない。 In step S800, the CPU 103 has the image processing unit 154 resize the received image. The larger the image data input to the trained model 403 as the input data 400, the longer the estimation process of the estimation unit 155 takes. The purpose of step S800 is therefore to shorten the estimation process by reducing the size of the input data 400. In this embodiment, the input data 400 is resized so that the estimation process after the resizing in step S800 takes about 50 ms. However, the time is not limited to this value, because the time required for the estimation process depends on the computational capabilities of the CPU 103 and the GPU 102.

続いて、ステップＳ８０１では、ＣＰＵ１０３は、推定対象設定部１５２から推定対象とするタグ（追尾オブジェクト）を取得する。本実施形態では、推定対象設定部１５２の初期値として、航空機が設定されているので、ステップＳ８０１において、タグとして航空機が取得される。 Next, in step S801, the CPU 103 acquires a tag (tracking object) to be estimated from the estimation target setting unit 152. In this embodiment, an aircraft is set as the initial value of the estimation target setting unit 152, and therefore, in step S801, an aircraft is acquired as the tag.

続いてステップＳ８０２では、ＣＰＵ１０３は、ステップＳ８００でリサイズした入力データ４００を学習済みモデル４０３に入力する。 Next, in step S802, the CPU 103 inputs the input data 400 resized in step S800 into the trained model 403.

ステップＳ８０３では、ＣＰＵ１０３とＧＰＵ１０２が協働して推定部１５５で推定処理を実行し、出力データ４０１を出力する。 In step S803, the CPU 103 and the GPU 102 cooperate to execute estimation processing in the estimation unit 155 and output the output data 401.

続いてステップＳ８０４に進み、ＣＰＵ１０３は、出力データ４０１にステップＳ８０１で取得したタグが含まれるかを判断する。この結果、ＣＰＵ１０３が出力データ４０１にステップＳ８０１で取得したタグが含まれると判断した場合、ステップＳ８０５に進み、そうでない場合はステップＳ８９０に進む。 Then, the process proceeds to step S804, where the CPU 103 determines whether the output data 401 includes the tag acquired in step S801. As a result, if the CPU 103 determines that the output data 401 includes the tag acquired in step S801, the process proceeds to step S805, and if not, the process proceeds to step S890.

ステップＳ８９０では、ＣＰＵ１０３は、雲台装置２００に停止指令の制御信号を送信し、ステップＳ８９１に進む。 In step S890, the CPU 103 transmits a control signal for a stop command to the camera platform device 200, and then proceeds to step S891.

ステップＳ８９１では、ＣＰＵ１０３は、モード管理部１５３により管理されている現在のモードをマニュアルモードに変更し、自動追尾モード処理を終了する。 In step S891, the CPU 103 changes the current mode managed by the mode management unit 153 to manual mode and ends the auto-tracking mode processing.

このように、ステップＳ８０４において、推定部１５５からの出力データ４０１に、推定対象設定部１５２で設定したタグが含まれなかった場合、ＣＰＵ１０３は、追尾対象が撮影可能範囲から消失したと判断し、自動追尾モードを終了する。 Thus, if the output data 401 from the estimation unit 155 does not include the tag set by the estimation target setting unit 152 in step S804, the CPU 103 determines that the tracking target has disappeared from the shooting range and ends the automatic tracking mode.

ステップＳ８０５では、ＣＰＵ１０３（演算手段）は、出力データ４０１に含まれる、推定されたオブジェクトの外接枠の左上座標と右下座標から、画面内における推定されたオブジェクトのオブジェクトサイズと中心点座標を演算する。次に、ＣＰＵ１０３（制御手段）は、演算された中心点座標と、追尾目標位置の差分を基に、パン、チルトの制御指令を演算し、その指令の制御信号を雲台装置２００に送信する。同様に、ＣＰＵ１０３（制御手段）は、演算されたオブジェクトサイズと追尾目標サイズを基にズームの制御指令を演算し、その指令の制御信号を雲台装置２００に送信する。追尾目標位置と、追尾目標サイズは、予め情報処理装置１００の記憶部１０５に登録しておけばよい。本実施形態では、追尾目標位置として画面内の中心座標が、追尾目標サイズとして画面の３０％のサイズが予め登録されているが、これに限らない。また、追尾目標位置と追尾目標サイズは、操作装置３００から情報処理装置１００に設定できる構成としても良い。 In step S805, the CPU 103 (calculation means) calculates the object size and center point coordinates of the estimated object within the screen from the upper-left and lower-right coordinates of the bounding box of the estimated object included in the output data 401. Next, the CPU 103 (control means) calculates pan and tilt control commands based on the difference between the calculated center point coordinates and the tracking target position, and transmits a control signal for those commands to the camera platform device 200. Similarly, the CPU 103 (control means) calculates a zoom control command based on the calculated object size and the tracking target size, and transmits a control signal for that command to the camera platform device 200. The tracking target position and tracking target size may be registered in advance in the storage unit 105 of the information processing device 100. In this embodiment, the center coordinates of the screen are registered in advance as the tracking target position, and 30% of the screen size as the tracking target size, but the values are not limited to these. The tracking target position and tracking target size may also be configurable in the information processing device 100 from the operation device 300.
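
The calculation in step S805 can be sketched as below. Several points are assumptions made only for illustration: the object size is taken as the ratio of the bounding-box area to the frame area (the text says only "30% of the screen size"), the commands are simple proportional terms with a hypothetical `gain`, and the sign convention (positive pan when the object is right of the target) is arbitrary.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Box:
    """Bounding box given by its upper-left and lower-right corners, in pixels."""
    left: float
    top: float
    right: float
    bottom: float


def center_and_size(box: Box, frame_w: int, frame_h: int) -> Tuple[float, float, float]:
    """Return the center point (x, y) and the box area as a fraction of the frame area."""
    cx = (box.left + box.right) / 2
    cy = (box.top + box.bottom) / 2
    ratio = ((box.right - box.left) * (box.bottom - box.top)) / (frame_w * frame_h)
    return cx, cy, ratio


def control_command(
    box: Box,
    frame_w: int,
    frame_h: int,
    target_pos: Tuple[float, float],
    target_size: float,
    gain: float = 1.0,
) -> Tuple[float, float, float]:
    """Return (pan, tilt, zoom) commands as normalized proportional errors."""
    cx, cy, ratio = center_and_size(box, frame_w, frame_h)
    pan = gain * (cx - target_pos[0]) / frame_w    # horizontal offset from the target position
    tilt = gain * (cy - target_pos[1]) / frame_h   # vertical offset from the target position
    zoom = gain * (target_size - ratio)            # positive: object smaller than target, zoom in
    return pan, tilt, zoom
```

With the values registered in this embodiment, `target_pos` would be the screen center and `target_size` would be `0.30`. A centered object yields zero pan and tilt commands, and an object smaller than the target size yields a positive zoom command.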

以上までの処理により、雲台装置２００は情報処理装置１００から送信されたパン、チルトの制御指令及びズームの制御指令の制御信号に従って、カメラ２０１の旋回・変倍制御を行う。また、かかる旋回・変倍制御中にカメラ２０１で撮影された映像が、情報処理装置１００に再び送信される。以下後述する場合を除き（ステップＳ８０８でＮＯ）、上記再び雲台装置２００から送信された画像を入力データ４００とし、ステップＳ８００ａからの処理が繰り返される。これにより、ＣＰＵ１０３は、カメラ２０１からフレームを受信する毎に、その受信したフレームに基づいてカメラ２０１の自動追尾撮影を制御することが実現できる。 By the above processing, the camera platform device 200 performs rotation and magnification control of the camera 201 according to the control signals of the pan and tilt control commands and the zoom control command transmitted from the information processing device 100. In addition, the image captured by the camera 201 during such rotation and magnification control is transmitted again to the information processing device 100. Except as described below (NO in step S808), the image transmitted again from the camera platform device 200 is used as input data 400, and the processing from step S800a is repeated. This allows the CPU 103 to control the automatic tracking shooting of the camera 201 based on the received frame each time it receives a frame from the camera 201.

ところで、離陸中の航空機を、常設された雲台装置２００から撮影する場合、航空機は雲台装置２００から遠ざかるため、自動追尾モードにおいてはステップＳ８０５での上記ズームの制御指令により追尾目標サイズに近づくよう制御する。しかし、カメラ２０１でかかる自動追尾撮影を継続すると、入力データ４００中の航空機のサイズは小さくなり、画像の画質劣化が生じる。その結果推定部１５５からの出力データ４０１にある航空機のタグの尤度が低くなる。また、カメラ２０１での光学ズーム可能な領域を超え、デジタルズームにより、入力データ４００中の航空機のズームを行う場合、画像の画質劣化が生じる。これによっても、推定部１５５からの出力データ４０１にある航空機のタグの尤度が低くなる。このような場合、雲台装置２００から送信された画像に含まれる被写体に航空機が存在しても学習済みモデル４０３によるその被写体を航空機と推定する精度が下がるため、自動追尾撮影の継続が難しい。 When an aircraft taking off is shot from the permanently installed camera platform device 200, the aircraft moves away from the camera platform device 200, so in the automatic tracking mode the zoom control command of step S805 keeps the object size close to the tracking target size. However, as the camera 201 continues such automatic tracking shooting, the aircraft in the input data 400 becomes smaller and the image quality degrades. As a result, the likelihood of the aircraft tag in the output data 401 from the estimation unit 155 decreases. Image quality also degrades when the optical zoom range of the camera 201 is exceeded and the aircraft in the input data 400 is enlarged by digital zoom, which likewise lowers the likelihood of the aircraft tag in the output data 401 from the estimation unit 155. In such cases, even if an aircraft is among the subjects in the image transmitted from the camera platform device 200, the accuracy with which the trained model 403 estimates that subject to be an aircraft drops, making it difficult to continue automatic tracking shooting.

よってステップＳ８０６以降の処理では、以下説明する通り、ステップＳ８０５での演算結果、ステップＳ８０３で得られた出力データ４０１、及びステップＳ８００ａで受信したフレームのメタ情報の少なくとも１つの情報に応じて推定対象を再設定する。これにより、情報処理装置１００による、雲台装置２００による航空機の自動追尾撮影の制御を可能とする。 Therefore, in the processing from step S806 onwards, as described below, the estimation target is reset according to at least one of the calculation results from step S805, the output data 401 obtained in step S803, and the meta information of the frame received in step S800a. This enables the information processing device 100 to control automatic tracking and photography of the aircraft using the camera platform device 200.

ステップＳ８０６では、ＣＰＵ１０３は、ステップＳ８０５で演算されたオブジェクトサイズが所定値以下（第１の閾値以下）であるか否かを判断する。この結果、ＣＰＵ１０３が演算されたオブジェクトサイズが所定値以下であると判断した場合は、ステップＳ８０９に進み、そうでない場合は、ステップＳ８０７に進む。 In step S806, the CPU 103 determines whether the object size calculated in step S805 is equal to or smaller than a predetermined value (equal to or smaller than a first threshold value). If the CPU 103 determines that the object size calculated is equal to or smaller than the predetermined value, the process proceeds to step S809; otherwise, the process proceeds to step S807.

ステップＳ８０７では、ＣＰＵ１０３は、出力データ４０１からオブジェクトの尤度を取得し、尤度が所定値以下（第２の閾値以下）である場合は、ステップＳ８０９に進み、そうでない場合はステップＳ８０８に進む。 In step S807, the CPU 103 obtains the likelihood of the object from the output data 401, and if the likelihood is equal to or less than a predetermined value (equal to or less than a second threshold), the process proceeds to step S809; otherwise, the process proceeds to step S808.

ステップＳ８０８では、ＣＰＵ１０３は、カメラ２０１において画質低下要因があるか否かを判断する。この結果、ＣＰＵ１０３がカメラ２０１において画質低下要因があると判断した場合は、ステップＳ８０９に進み、そうでない場合はステップＳ８００ａに戻る。実施形態では、ＣＰＵ１０３は、ステップＳ８００ａで受信したフレームの撮影時において、カメラ２０１がデジタルズームを使用している場合、画質低下要因があると判断する。尚、カメラ２０１からフレームを取得する際、そのフレームのメタ情報として、撮影時のデジタルズームを使用有無の情報も取得する。 In step S808, the CPU 103 determines whether or not there is a factor that reduces image quality in the camera 201. As a result, if the CPU 103 determines that there is a factor that reduces image quality in the camera 201, the process proceeds to step S809, and if not, the process returns to step S800a. In the embodiment, the CPU 103 determines that there is a factor that reduces image quality if the camera 201 used digital zoom when capturing the frame received in step S800a. When acquiring a frame from the camera 201, information on whether or not digital zoom was used when capturing the frame is also acquired as meta information for the frame.

尚、画質低下要因の有無の判断は、撮影されたフレームの輝度、その撮影の際のカメラ２０１のゲイン、その撮影の際の時刻を単独又は複合的に行うようにしても良い。例えば、カメラ２０１からフレームを取得する際、そのフレームのメタ情報として、撮影時の輝度、ゲイン、及び撮影時刻も取得するようにする。これにより、ＣＰＵ１０３は、取得したメタ情報から、フレーム撮影時の輝度が低く、カメラ２０１のゲインが高く、かつ撮影時刻が夜間である場合には、画質低下要因があると判断できる。 The presence or absence of factors that degrade image quality may be determined based on the brightness of the captured frame, the gain of the camera 201 at the time of capture, and the time of capture, either alone or in combination. For example, when acquiring a frame from the camera 201, the brightness, gain, and capture time at the time of capture are also acquired as meta information for the frame. This allows the CPU 103 to determine from the acquired meta information that there is a factor that degrades image quality if the brightness at the time of capture of the frame was low, the gain of the camera 201 was high, and the capture time was at night.
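
The image-quality check of step S808, including the luminance, gain, and capture-time variant described above, might look like the following sketch. The `FrameMeta` fields, the threshold values, and the definition of night (18:00 to 06:00) are hypothetical; the text does not specify units or thresholds.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class FrameMeta:
    """Meta information delivered with a frame (field names are assumptions)."""
    digital_zoom_used: bool
    luminance: float      # mean frame luminance on an assumed 0 to 255 scale
    gain_db: float        # camera gain at capture, assumed to be in dB
    captured_at: time     # local capture time


def is_night(t: time, night_start: time = time(18, 0), night_end: time = time(6, 0)) -> bool:
    """Assumed definition of night: from night_start through midnight until night_end."""
    return t >= night_start or t < night_end


def has_quality_degradation(
    meta: FrameMeta,
    low_luminance: float = 50.0,
    high_gain_db: float = 24.0,
) -> bool:
    """S808: digital zoom in use, or a dark, high-gain frame captured at night."""
    if meta.digital_zoom_used:
        return True
    return (
        meta.luminance <= low_luminance
        and meta.gain_db >= high_gain_db
        and is_night(meta.captured_at)
    )
```

The combined condition follows the example in the text (low luminance, high gain, and night-time capture together); using each criterion alone would be an equally valid reading of "alone or in combination".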

ステップＳ８０９では、ＣＰＵ１０３は、推定対象設定部１５２で管理される推定対象を再設定する。具体的には、初期値である航空機に加え、航空機と誤認しやすいタグ（本実施形態においては、航空機より実サイズが小さい鳥及び凧のタグ）を設定し、ステップＳ８００ａに戻る。尚、誤認しやすいタグについては、追尾目標サイズとして航空機の場合より小さいサイズ、ここでは画面の１０％のサイズが予め登録されている。これにより、以降新たに雲台装置２００から受信する画像（次フレーム以降の画像）に基づく入力データ４００からは、推定部１５５は、ステップＳ８０３において航空機以外に鳥や凧のタグを含む出力データ４０１を出力することとなる。よって、ステップＳ８０４にて推定部１５５が航空機以外のタグを出力した、すなわち、推定結果が航空機を鳥や凧と誤認した結果となったとしても、航空機の自動追尾撮影を継続することが可能となる。 In step S809, the CPU 103 resets the estimation target managed by the estimation target setting unit 152. Specifically, in addition to the initial value of aircraft, a tag that is easily mistaken for an aircraft (in this embodiment, a bird and kite tag whose actual size is smaller than that of an aircraft) is set, and the process returns to step S800a. For tags that are easily mistaken, a tracking target size smaller than that of an aircraft, here 10% of the screen size, is preregistered. As a result, from the input data 400 based on the image (image from the next frame onward) newly received from the camera platform device 200, the estimation unit 155 outputs output data 401 including bird and kite tags in addition to aircraft in step S803. Therefore, even if the estimation unit 155 outputs a tag other than an aircraft in step S804, that is, the estimation result is a result of misidentifying an aircraft as a bird or kite, it is possible to continue automatic tracking and shooting of the aircraft.
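
Steps S806 to S809 can be summarized as one predicate plus one update of the estimation targets. In this sketch the threshold values and function names are hypothetical; only the initial target (aircraft at 30% of the screen) and the 10% target size for the easily confused tags come from the text. The estimation targets are modeled as a mapping from tag to tracking target size.

```python
from typing import Dict, Iterable


def should_reset(
    object_size: float,
    likelihood: float,
    quality_degraded: bool,
    size_threshold: float = 0.05,        # first threshold (value assumed)
    likelihood_threshold: float = 0.5,   # second threshold (value assumed)
) -> bool:
    """S806, S807, S808 evaluated in order; any one of them triggers S809."""
    return (
        object_size <= size_threshold
        or likelihood <= likelihood_threshold
        or quality_degraded
    )


def reset_targets(
    targets: Dict[str, float],
    confusable: Iterable[str] = ("bird", "kite"),
    confusable_target_size: float = 0.10,
) -> Dict[str, float]:
    """S809: keep the existing targets and add the tags easily mistaken for them."""
    updated = dict(targets)
    for tag in confusable:
        updated.setdefault(tag, confusable_target_size)
    return updated
```

Starting from `{"aircraft": 0.30}`, a reset yields targets for aircraft, bird, and kite, so a later frame in which the distant aircraft is reported as a bird still passes the check in step S804.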

尚、本実施形態では推定対象設定部１５２で管理される推定対象を再設定した際に、ＣＰＵ１０３（通知手段）が、シリアル通信部１０６を介して雲台装置２００にその旨を通知してもよい。この場合、雲台装置２００は、かかる通知を受信した際、内部で保持する駆動パラメータを変更する。具体的には、推定対象が鳥や凧に再設定された場合、その実サイズは推定対象の初期値である飛行機より小さく、且つ画面内での移動量も小さくなる。そこで、雲台装置２００は、設定管理部２５２で管理されるパン、チルト、ズームの駆動パラメータを変更し、推定対象が再設定される前より微細な制御を実現する。 In this embodiment, when the estimation target managed by the estimation target setting unit 152 is reset, the CPU 103 (notification means) may notify the camera platform device 200 of the reset via the serial communication unit 106. In this case, on receiving the notification, the camera platform device 200 changes the drive parameters it holds internally. Specifically, when the estimation target is reset to a bird or a kite, its actual size is smaller than that of the aircraft, which is the initial estimation target, and its amount of movement within the screen is also smaller. The camera platform device 200 therefore changes the pan, tilt, and zoom drive parameters managed by the setting management unit 252 to achieve finer control than before the estimation target was reset.

尚、本実施形態では、ステップＳ８０６，Ｓ８０７，Ｓ８０８の判断は、この順で実行されたが、これらの少なくとも１つのみを判断するようにしてもよいし、異なる順で実行するようにしてもよい。 In this embodiment, the determinations of steps S806, S807, and S808 are performed in this order, but only one or more of them may be performed, or they may be performed in a different order.

尚、本実施形態では航空機が離陸するシーンを自動追尾撮影する例として説明をしたが、これに限らず、航空機が着陸するシーンを自動追尾撮影しても良い。 Note that in this embodiment, an example has been described in which automatic tracking and photography is performed on a scene in which an aircraft is taking off, but this is not limiting, and automatic tracking and photography may also be performed on a scene in which an aircraft is landing.

尚、本実施形態では、情報処理装置１００と雲台装置２００間をシリアル通信と、有線の映像信号線で接続したが、これに限らず公衆電話回線や、インターネット等の通信回線で接続しても良い。例えば、情報処理装置１００はクラウドサーバであってもよい。 In this embodiment, the information processing device 100 and the camera platform device 200 are connected by serial communication and a wired video signal line, but the connection is not limited to this and may be made by a public telephone line or a communication line such as the Internet. For example, the information processing device 100 may be a cloud server.

さらに、本実施形態では、学習部１５０も情報処理装置１００内に存在したが、学習部１５０は情報処理装置１００と接続する外部装置に存在してもよい。この場合、情報処理装置１００は、学習済みモデル４０３の内部パラメータをその外部装置より取得する。 In addition, in this embodiment, the learning unit 150 is also present within the information processing device 100, but the learning unit 150 may be present in an external device connected to the information processing device 100. In this case, the information processing device 100 acquires the internal parameters of the trained model 403 from the external device.

尚、上述した各処理部のうち、推定部１５５については、機械学習された学習済みモデル４０３を用いて推定処理を実行したが、ルックアップテーブル（ＬＵＴ）等のルールベースの推定処理を行ってもよい。その場合には、例えば、入力データ４００と出力データ４０１との関係を予めＬＵＴとして作成する。そして、この作成したＬＵＴを情報処理装置１００の記憶部１０５に格納しておくとよい。推定部１５５の推定処理を行う場合には、この格納されたＬＵＴを参照して、出力データを取得することができる。つまりＬＵＴは、学習済みモデル４０３と同等の処理をするためのプログラムとして、ＣＰＵあるいはＧＰＵなどと協働で動作することにより、推定部１５５の推定処理を行う。 Note that, among the above-mentioned processing units, the estimation unit 155 executes the estimation process using the learned model 403 that has been machine-learned, but rule-based estimation process such as a look-up table (LUT) may also be performed. In that case, for example, the relationship between the input data 400 and the output data 401 is created in advance as an LUT. Then, it is advisable to store this created LUT in the storage unit 105 of the information processing device 100. When the estimation unit 155 executes the estimation process, the output data can be obtained by referring to this stored LUT. In other words, the LUT operates in cooperation with a CPU or GPU as a program for performing the same process as the learned model 403, thereby executing the estimation process of the estimation unit 155.

また、学習部１５０は、追尾対象及びこれと誤認しやすいもの以外のオブジェクトのタグ（例えば雲。以下、「不正解タグ」という。）に紐づいたオブジェクト映像についても学習用データとして用いて学習するようにしてもよい。この場合、ステップＳ８０９で推定対象を再設定する際に不正解タグも加える。その後、ステップＳ８０４で出力データ４０１に含まれるタグが不正解タグしかない場合、ステップＳ８９０に進み、情報処理装置１００は、追尾対象が撮影可能範囲から消失したと判断し、自動追尾モードを終了する。 The learning unit 150 may also learn from object images linked to tags of objects other than the tracking target and the objects easily mistaken for it (for example, clouds; hereinafter referred to as "incorrect tags"). In this case, the incorrect tags are also added when the estimation target is reset in step S809. Thereafter, if the only tags included in the output data 401 in step S804 are incorrect tags, the process proceeds to step S890, where the information processing device 100 determines that the tracking target has disappeared from the shooting range and ends the automatic tracking mode.
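
The modified check of step S804 with incorrect tags can be illustrated as follows. The function name and the use of plain string sets are assumptions made for this sketch.

```python
from typing import Iterable, Set


def tracking_continues(
    output_tags: Iterable[str],
    estimation_targets: Set[str],
    incorrect_tags: Set[str],
) -> bool:
    """S804 variant: continue only if some output tag is a target other than an incorrect tag."""
    valid_targets = estimation_targets - incorrect_tags
    return any(tag in valid_targets for tag in output_tags)
```

If the model reports only "cloud" while the targets are aircraft, bird, kite, and cloud, the function returns `False`, which corresponds to proceeding to step S890.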

尚、本実施形態では、学習部１５０により学習処理がなされる学習モデル、及び推定部の推定処理に用いられる学習済みモデル４０３は、ニューラルネットワークのアルゴリズムを利用したものであったが、その他のアルゴリズムを利用したものであってもよい。例えば、決定木やＳＶＭ（ＳｕｐｐｏｒｔＶｅｃｔｏｒＭａｃｈｉｎｅ）のアルゴリズムを用いたものであってもよい。 In this embodiment, the learning model that is learned by the learning unit 150 and the trained model 403 that is used in the estimation process of the estimation unit use a neural network algorithm, but other algorithms may be used. For example, a decision tree or an SVM (Support Vector Machine) algorithm may be used.

本発明は、上述の実施形態の１以上の機能を実現するプログラムを、ネットワークまたは記憶媒体を介してシステムまたは装置に供給し、そのシステムまたは装置のコンピュータがプログラムを読出し実行する処理でも実現可能である。コンピュータは、１または複数のプロセッサーまたは回路を有し、コンピュータ実行可能命令を読み出し実行するために、分離した複数のコンピュータまたは分離した複数のプロセッサーまたは回路のネットワークを含みうる。 The present invention can also be realized by providing a program that realizes one or more functions of the above-described embodiments to a system or device via a network or storage medium, and having a computer in the system or device read and execute the program. The computer has one or more processors or circuits, and may include multiple separate computers or a network of multiple separate processors or circuits to read and execute computer-executable instructions.

プロセッサーまたは回路は、中央演算処理装置（ＣＰＵ）、マイクロプロセッシングユニット（ＭＰＵ）、グラフィクスプロセッシングユニット（ＧＰＵ）、特定用途向け集積回路（ＡＳＩＣ）、フィールドプログラマブルゲートアレイ（ＦＰＧＡ）を含みうる。また、プロセッサーまたは回路は、デジタルシグナルプロセッサ（ＤＳＰ）、データフロープロセッサ（ＤＦＰ）、またはニューラルプロセッシングユニット（ＮＰＵ）を含みうる。 The processor or circuitry may include a central processing unit (CPU), a microprocessing unit (MPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA). The processor or circuitry may also include a digital signal processor (DSP), a data flow processor (DFP), or a neural processing unit (NPU).

１００ 情報処理装置 100 Information processing device
１０３，２０４ ＣＰＵ 103, 204 CPU
１０６，２０３ シリアル通信部 106, 203 Serial communication unit
１５０ 学習部 150 Learning unit
１５２ 推定対象設定部 152 Estimation target setting unit
１５５ 推定部 155 Estimation unit
１５７ 雲台制御部 157 Camera platform control unit
２００ 雲台装置 200 Camera platform device
２０１ カメラ 201 Camera
２０２ 駆動部 202 Drive unit
３００ 操作装置 300 Operation device
ＮＥＴ ネットワーク NET Network

Claims

1. An information processing device that tracks a target subject by controlling pan, tilt, and zoom of an imaging device that captures a moving image, comprising:
an acquisition means for acquiring frames of the moving image captured by the imaging device;
an estimation means for estimating an object present in the acquired frame; and
a control means for controlling the imaging device so as to track the object present in the acquired frame when the estimated object is a preset tracking object,
wherein the control means resets the tracking object to an object whose actual size is smaller than that of the preset tracking object when the size of the object present in the acquired frame is equal to or smaller than a first threshold.

2. The information processing device according to claim 1, wherein the control means resets the tracking object when the likelihood of the estimated object is equal to or smaller than a second threshold.

3. The information processing device according to claim 1, wherein the control means resets the tracking object when it is determined, based on meta information of the frame, that there is a factor causing image quality degradation in the imaging device.

4. The information processing device according to claim 3, wherein
the imaging device performs the zoom using at least one of an optical zoom and a digital zoom, and
the control means determines that the factor causing image quality degradation exists when the meta information includes information indicating that the digital zoom was used when the frame was captured by the imaging device.

5. The information processing device according to claim 3 or 4, wherein the control means determines that the factor causing image quality degradation exists based on at least one of the brightness of the frame, the gain of the imaging device when the frame was captured, and the time when the frame was captured, which are included in the meta information.

6. The information processing device according to claim 1, further comprising a notification means that, when the control means resets the tracking object, notifies the imaging device to that effect.

7. The information processing device according to claim 6, wherein the imaging device changes drive parameters of pan, tilt, and zoom when it receives the notification from the notification means.

8. The information processing device according to claim 1, further comprising a learning means that generates internal parameters of a trained model used by the estimation means to estimate the object.

9. The information processing device according to claim 8, wherein the learning means uses learning data in which an image and teacher data indicating an object contained in the image are linked to each other.

10. The information processing device according to claim 8 or 9, wherein the estimation means, each time it receives a frame of the moving image from the imaging device, resizes the received frame, inputs it to the trained model as input data, and outputs information about the object from the trained model as output data.

11. A method for controlling an information processing device that tracks a target subject by controlling pan, tilt, and zoom of an imaging device that captures a moving image, comprising:
an acquisition step of acquiring frames of the moving image captured by the imaging device;
an estimation step of estimating an object present in the acquired frame; and
a control step of controlling the imaging device so as to track the object present in the acquired frame when the estimated object is a preset tracking object,
wherein the control step resets the tracking object to an object whose actual size is smaller than that of the preset tracking object when the size of the object present in the acquired frame is equal to or smaller than a first threshold.

12. A computer-executable program that causes a computer to function as each of the means of the information processing device according to any one of claims 1 to 10.