JP7609726B2 - Computer system, method for controlling input of data to a model, and control device

Info

Publication number: JP7609726B2
Application number: JP2021115708A
Authority: JP
Inventors: 忠幸松村; 潔人伊藤; 弘之水野
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2021-07-13
Filing date: 2021-07-13
Publication date: 2025-01-07
Anticipated expiration: 2041-07-13
Also published as: JP2023012202A

Description

The present invention relates to controlling the input of data to a model that determines actions.

Algorithms that use machine learning to automatically control targets such as vehicles, robots, and drones are under development. For example, end-to-end deep reinforcement learning is used to develop algorithms that output vehicle control signals from images.

End-to-end deep reinforcement learning requires a large amount of training data and an enormous amount of training time, and training is often difficult. For this reason, algorithm development based on object-based reinforcement learning has attracted attention.

In object-based reinforcement learning, objects are extracted from images, and a policy for determining the action of the control target is learned based on information about the extracted objects.

In general, a policy is given as a model such as a neural network. When the model is a neural network, it has multiple input slots. Usually, changing the input slot into which object information is fed changes the output value, which in turn affects the accuracy of the policy. Controlling how object information is input to the input slots is therefore important.

One known method for controlling the input of object information to the input slots is the technique described in Non-Patent Document 1.

Non-Patent Document 1 describes a method that determines a score for each object based on the object's appearance information, selects objects based on the score, and inputs the information of the selected objects into a neural network.

Non-Patent Document 1: D. Wang, C. Devin, Q.-Z. Cai, F. Yu and T. Darrell, "Deep Object-Centric Policies for Autonomous Driving," 2019 International Conference on Robotics and Automation (ICRA), 2019, pp. 8853-8859, doi: 10.1109/ICRA.2019.8794224.

Because the appearance of objects is highly diverse, either a person must define the object appearances and the score calculation method, or the method of calculating the priority must be learned separately. The processing cost is therefore high. In addition, the method described in Non-Patent Document 1 yields only a small improvement in the accuracy of the policy.

The present invention provides a computer system and a method that realize control of the input of object information to a model in order to obtain a highly accurate output.

A representative example of the invention disclosed in the present application is as follows. A computer system comprises at least one computer having an arithmetic unit, a storage device connected to the arithmetic unit, and an interface connected to the arithmetic unit. The computer system holds model information for managing a model that determines an action of a control target using object data representing features of objects included in images of the surroundings of the control target, and input control information for managing rules for inputting the object data to the model, the rules being based on labels assigned according to the motion of the objects. The computer system receives time-series data of the images; detects the objects in each of the plurality of images; selects a target image from among the plurality of images; calculates, using two or more of the images including the target image, motion data representing the motion of the objects detected in the target image; stores the target image and the motion data in association with each other; assigns, for each of the plurality of images, the labels to the objects detected in the image based on the motion data; and controls, for each of the plurality of images, the input to the model of the object data of the objects detected in the image based on the input control information and the labels.

According to the present invention, input control of object data (object information) that improves the accuracy of the model output can be realized. Problems, configurations, and effects other than those described above will become clear from the following description of the embodiments.

FIG. 1 is a diagram illustrating an example of the hardware configuration of a computer included in the computer system of the first embodiment.
FIG. 2A is a diagram illustrating an example of the functional configuration of the computer of the first embodiment.
FIG. 2B is a diagram illustrating an example of the functional configuration of the computer of the first embodiment.
FIG. 3A is a diagram illustrating an example of an image handled by the computer system of the first embodiment.
FIG. 3B is a diagram illustrating an example of an image handled by the computer system of the first embodiment.
FIG. 3C is a diagram illustrating an example of an image handled by the computer system of the first embodiment.
FIG. 4 is a flowchart illustrating an example of the processing executed by the computer of the first embodiment.
FIG. 5 is a diagram illustrating an example of the data structure of the bounding box information generated by the computer of the first embodiment.
FIG. 6 is a diagram illustrating an example of the data structure of the motion information generated by the computer of the first embodiment.
FIG. 7 is a diagram illustrating an example of the data structure of the object label information generated by the computer of the first embodiment.
Four further drawings are diagrams each illustrating an example of input control of object data by the input control unit of the first embodiment.
FIG. 10 is a flowchart illustrating an example of the bounding box correction process executed by the bounding box prediction unit of the first embodiment.
FIG. 11 is a diagram illustrating an example of a method by which the bounding box prediction unit of the first embodiment calculates a predicted bounding box.
The remaining drawings relate to the second embodiment: two diagrams illustrating an example of the functional configuration of the computer, a diagram illustrating an example of the data structure of the cluster information, a diagram illustrating an example of the data structure of the history information, a diagram illustrating an example of input control of object data by the input control unit, and a flowchart illustrating the reclustering process executed by the computer.

Embodiments of the present invention are described below with reference to the drawings. However, the present invention should not be construed as being limited to the description of the embodiments given below. Those skilled in the art will readily understand that the specific configuration can be changed without departing from the concept or spirit of the present invention.

In the configurations of the invention described below, identical or similar configurations or functions are given the same reference numerals, and duplicate descriptions are omitted.

Terms such as "first," "second," and "third" in this specification are used to identify components and do not necessarily limit their number or order.

The position, size, shape, range, and the like of each component shown in the drawings may not represent the actual position, size, shape, range, and the like, in order to facilitate understanding of the invention. The present invention is therefore not limited to the position, size, shape, range, and the like disclosed in the drawings.

The computer system is composed of at least one computer 100. FIG. 1 is a diagram illustrating an example of the hardware configuration of the computer 100 included in the computer system of the first embodiment.

The computer 100 has a processor 101, a memory 102, an auxiliary storage device 103, a network interface 104, and an input/output interface 105. The hardware elements are connected to one another via a bus.

The processor 101 is an arithmetic unit that controls the entire computer 100 and executes programs stored in the memory 102. By executing processing according to a program, the processor 101 operates as a functional unit (module) that realizes a specific function. In the following description, when processing is described with a functional unit as the subject, this indicates that the processor 101 is executing the program that realizes that functional unit.

The memory 102 is a storage device that stores the programs executed by the processor 101 and the information used by those programs. The memory 102 is also used as a work area.

The auxiliary storage device 103 is a storage device that stores data persistently, for example, an HDD (Hard Disk Drive) or an SSD (Solid State Drive). The programs and information stored in the memory 102 may instead be stored in the auxiliary storage device 103. In this case, the processor 101 reads the programs and information from the auxiliary storage device 103 and loads them into the memory 102.

The network interface 104 is an interface for connecting to external devices via a network.

The input/output interface 105 is an interface for connecting to an input device 111 and an output device 112. The input device 111 is, for example, a keyboard, a mouse, or a touch panel. The output device 112 is, for example, a display.

Note that the computer 100 does not have to have the input/output interface 105.

The computer system includes a computer 100 that learns a policy for determining an action and a computer 100 that determines an action using the learned policy. A single computer 100 may perform both the learning and the action determination.

FIGS. 2A and 2B are diagrams illustrating an example of the functional configuration of the computer 100 of the first embodiment.

FIG. 2A shows the functional configuration of a computer 100-1 that learns a policy. The computer 100-1 has an object detection unit 200, a bounding box prediction unit 201, an object classification unit 202, a label assignment unit 203, an input control unit 204, and a learning unit 210. The computer 100-1 also holds classification definition information 230, occlusion detection definition information 231, label definition information 232, object data definition information 233, input control information 234, model information 235, and learning information 236. In addition, self information 237 and reward information 238 are input to the computer 100-1 from the outside. The self information 237 and the reward information 238 are input together with each image 250.

The classification definition information 230 is information for classifying objects; for example, it is the definition information of a classifier. The occlusion detection definition information 231 is information on a method for detecting a state in which one object is occluded by another object. The label definition information 232 is information on a method for assigning labels based on the motion of objects; for example, it stores entries that associate an object type with a label assignment rule based on the object motion to be evaluated. The object data definition information 233 is information on the data structure of the object data (features) that is input when the policy determines an action. The input control information 234 is information on the rules for inputting object data to the model that realizes the policy. The model information 235 is the definition information of the model. The learning information 236 is information for controlling the learning process of the policy; for example, it includes a learning rate. The self information 237 is information on the internal state of the control target and its surrounding environment. The reward information 238 is information on the reward for an action.

The object detection unit 200 detects objects in an image 250 of the surroundings of the control target. The object detection unit 200 generates a rectangular region (bounding box) that encloses each detected object. The object detection unit 200 outputs the image 250 and the bounding boxes of the objects to the bounding box prediction unit 201 and the object classification unit 202, and outputs the image 250 to the label assignment unit 203.

The bounding box prediction unit 201 predicts the bounding box of an object using multiple images 250, compares the bounding box generated by the object detection unit 200 with the predicted bounding box, and corrects the bounding box based on the comparison result. The bounding box prediction unit 201 outputs the corrected bounding box to the label assignment unit 203.

The object classification unit 202 classifies the types of the objects included in the image 250. In this embodiment, the object classification unit 202 classifies the object type for each bounding box. The object classification unit 202 outputs the classification result to the label assignment unit 203.

The label assignment unit 203 analyzes the motion of the objects included in the images 250 using multiple images 250, and assigns labels to the objects included in each image 250 based on their motion. The label assignment unit 203 outputs the object labels and the image 250 to the input control unit 204.

The input control unit 204 generates object data 800 (see FIG. 8) for each object included in the image 250, and controls the input of the object data 800 to the model based on the label of the object.
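The role of the input control unit 204 can be pictured with a minimal sketch. The concrete input rules held in the input control information 234 are not given at this point in the text, so the sketch assumes a purely hypothetical rule: the object labeled "Lk" is placed in the k-th input slot, unused slots are zero-filled, and labels beyond the number of slots are dropped. The function name `fill_slots` and its parameters are illustrative and do not come from the source.

```python
from typing import Dict, List, Sequence


def fill_slots(labeled: Dict[str, Sequence[float]],
               num_slots: int,
               feature_dim: int) -> List[List[float]]:
    """Place object data into model input slots according to labels.

    Assumed rule (hypothetical): label "Lk" maps to slot index k - 1.
    """
    # Start with zero-filled slots so that empty slots have a defined value.
    slots = [[0.0] * feature_dim for _ in range(num_slots)]
    for label, features in labeled.items():
        index = int(label[1:]) - 1  # "L1" -> 0, "L2" -> 1, ...
        if 0 <= index < num_slots:
            slots[index] = list(features)
        # Objects whose label exceeds the slot count are not input.
    return slots


slots = fill_slots({"L2": [1.0, 2.0], "L1": [3.0, 4.0], "L5": [9.0, 9.0]},
                   num_slots=3, feature_dim=2)
```

Under this assumed rule, the same labeled object always reaches the same slot regardless of the order in which the detector happened to report it.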

The learning unit 210 learns a policy using the object data 800 input by the input control unit 204. The learning unit 210 includes an action determination unit 220 that determines an action based on the policy. The action determination unit 220 determines an action based on the policy defined by the model information 235, the self information 237, and the object data 800, and outputs the action as action data 260. The action data 260 is, for example, a control signal for controlling the torque and steering of a vehicle.

The learning unit 210 learns the policy based on the learning information 236, the reward information 238, and the action data 260, and reflects the learning result in the model information 235.

The first embodiment is described using reinforcement learning as an example, but a learning method other than reinforcement learning may be used.

FIG. 2B shows the functional configuration of a computer 100-2 that determines an action using the policy. The computer 100-2 has the object detection unit 200, the bounding box prediction unit 201, the object classification unit 202, the label assignment unit 203, the input control unit 204, and the action determination unit 220. The computer 100-2 also holds the classification definition information 230, the occlusion detection definition information 231, the label definition information 232, the object data definition information 233, the input control information 234, and the model information 235, and receives the self information 237 as input.

The functions and information of the computer 100-2 are the same as those of the computer 100-1, and their description is omitted.

Regarding the functional units of the computers 100-1 and 100-2, multiple functional units may be combined into one functional unit, or one functional unit may be divided into multiple functional units by function. For example, the object detection unit 200 may include the bounding box prediction unit 201.

The computer system may also take a form in which it learns a model and sets the learned model in a control target such as a vehicle, a robot, or a drone. In this case, the control target includes a camera that acquires images, a drive device, and a control device, such as a microcomputer, having the functions shown in FIG. 2B. The control target controls the drive device based on the action data 260 output from the control device.

Next, an example of the image 250 handled by the computer system of the first embodiment is described.

FIGS. 3A, 3B, and 3C are diagrams illustrating an example of the image 250 handled by the computer system of the first embodiment.

As shown in FIG. 3A, the computer system of the first embodiment handles images acquired by a camera mounted on a vehicle 300. The camera is assumed to acquire images of a range 310. The range 310 includes objects (vehicles) 320-1, 320-2, 320-3, and 320-4. A three-dimensional coordinate system is shown for the sake of explanation.

The camera outputs an image 250 such as those shown in FIGS. 3B and 3C. FIG. 3B shows the image 250 in a coordinate system whose origin is the upper-left corner of the image. FIG. 3C shows the image 250 in a coordinate system whose origin is the center of the top edge of the image.
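The relationship between the two image coordinate systems can be sketched as a simple horizontal shift. The sketch assumes that both systems share the same axis directions and pixel scale, which the text does not state explicitly; the function name and the `image_width` parameter are illustrative.

```python
from typing import Tuple


def top_left_to_top_center(x: float, y: float,
                           image_width: float) -> Tuple[float, float]:
    """Convert a point from the upper-left-origin system (FIG. 3B style)
    to the top-center-origin system (FIG. 3C style).

    Assumption: only the origin differs, so only x is shifted.
    """
    return (x - image_width / 2.0, y)


center_point = top_left_to_top_center(320.0, 100.0, image_width=640.0)
```

In the top-center system, objects to the left and right of the camera axis have horizontal coordinates of opposite sign, which may make lateral motion easier to compare.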

The object detection unit 200 generates a bounding box 330 that encloses each detected object. From the image 250 in FIGS. 3B and 3C, bounding boxes 330-1, 330-2, 330-3, and 330-4 enclosing the objects 320-1, 320-2, 320-3, and 320-4 are generated.

Only vehicles appear as objects 320 in the image 250 of FIGS. 3B and 3C, but objects of other types, such as people, road signs, and traffic lights, may also be included.

FIG. 4 is a flowchart illustrating an example of the processing executed by the computer 100-1 of the first embodiment. FIG. 5 is a diagram illustrating an example of the data structure of the bounding box information generated by the computer 100-1 of the first embodiment. FIG. 6 is a diagram illustrating an example of the data structure of the motion information generated by the computer 100-1 of the first embodiment. FIG. 7 is a diagram illustrating an example of the data structure of the object label information generated by the computer 100-1 of the first embodiment.

The computer 100-1 acquires multiple images 250 arranged in chronological order (time-series data of the images 250) (step S101). The acquired images 250 are stored in the work area.

The object detection unit 200 of the computer 100-1 executes an object detection process on each image 250 (step S102).

In the object detection process, the object detection unit 200 detects one or more objects in one image 250 and generates a bounding box 330 for each detected object. Methods for detecting objects in an image are known techniques, so a detailed description is omitted.

The object detection unit 200 generates bounding box information 500 as the result of the object detection process for one image 250. The bounding box information 500 stores entries each including an ID 501, a position 502, and a size 503. One entry corresponds to the bounding box 330 of one object. The fields included in an entry are not limited to those described above. An entry may omit some of the fields described above and may include other fields.

The ID 501 is a field that stores identification information of the bounding box 330 (object). The position 502 is a field that stores the coordinates of the center of the bounding box 330. Instead of the center coordinates, the coordinates of the upper-left corner of the bounding box 330 or the like may be stored. The size 503 is a field that stores the width and height of the bounding box 330.
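One entry of the bounding box information 500 might be represented as follows. This is a minimal sketch: the class and attribute names are illustrative, and only the three fields named in the text (ID 501, position 502, size 503) are modeled, with the position stored as center coordinates.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BoundingBoxEntry:
    box_id: int    # ID 501: identifies the bounding box (object)
    cx: float      # position 502: center x coordinate
    cy: float      # position 502: center y coordinate
    width: float   # size 503: width
    height: float  # size 503: height

    @property
    def area(self) -> float:
        """Area of the box, the product of width and height."""
        return self.width * self.height

    def top_left(self) -> Tuple[float, float]:
        """Upper-left corner, the alternative position representation."""
        return (self.cx - self.width / 2.0, self.cy - self.height / 2.0)


entry = BoundingBoxEntry(box_id=1, cx=100.0, cy=50.0, width=40.0, height=20.0)
```

The bounding box information 500 for one image would then be a collection of such entries, one per detected object.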

The object detection unit 200 also refers to the bounding box information 500 of each image 250 and tracks the bounding boxes 330, thereby associating the bounding boxes 330 across the images 250. Based on this association, the object detection unit 200 assigns the same ID to the bounding boxes 330 of the same object.
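The text does not specify how the bounding boxes are tracked. As one illustration only, the association could be sketched with greedy matching on intersection over union (IoU), a common technique that is swapped in here as an assumption. The names `iou` and `associate`, the threshold of 0.3, and the `next_id` parameter are all hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (center x, center y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two center-format boxes."""
    ax0, ay0, ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx0, by0, bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def associate(prev: Dict[int, Box], current: List[Box],
              next_id: int, threshold: float = 0.3) -> Dict[int, Box]:
    """Give each current box the ID of the best-overlapping previous box.

    Boxes without a sufficiently overlapping predecessor get a fresh ID.
    The caller must pass a next_id larger than every ID in prev.
    """
    unused = dict(prev)
    assigned: Dict[int, Box] = {}
    for box in current:
        best_id, best_score = None, threshold
        for prev_id, prev_box in unused.items():
            score = iou(prev_box, box)
            if score >= best_score:
                best_id, best_score = prev_id, score
        if best_id is None:
            best_id = next_id
            next_id += 1
        else:
            del unused[best_id]  # each previous box is matched at most once
        assigned[best_id] = box
    return assigned


prev_boxes = {1: (100.0, 100.0, 40.0, 20.0), 2: (300.0, 100.0, 40.0, 20.0)}
curr_boxes = [(302.0, 101.0, 40.0, 20.0), (104.0, 100.0, 42.0, 21.0), (500.0, 50.0, 10.0, 10.0)]
tracked = associate(prev_boxes, curr_boxes, next_id=3)
```

Greedy matching is simple but order-dependent; a real tracker might use optimal assignment or motion prediction instead.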

The bounding box prediction unit 201 of the computer 100-1 executes a bounding box correction process on each image 250 (step S103). The bounding box correction process is described with reference to FIGS. 10 and 11.

The object classification unit 202 of the computer 100-1 executes an object classification process on each image 250 (step S104).

Known techniques can be used to classify the objects included in an image, so a detailed description is omitted. For example, a classification method using a CNN (Convolutional Neural Network) is conceivable.

The label assignment unit 203 of the computer 100-1 starts loop processing over the images 250 (step S105). Specifically, the label assignment unit 203 selects one image 250 as the target (target image 250). Here, the images 250 are assumed to be selected from the most recent image 250 toward the past. The images 250 may instead be selected in chronological order.

The label assignment unit 203 of the computer 100-1 starts loop processing over the object types (step S106). Specifically, the label assignment unit 203 selects one object type as the target (target object type).

The label assignment unit 203 of the computer 100-1 calculates the motion data of the objects of the target object type using the target image 250 (step S107). Specifically, the following processing is executed.

(S107-1) The label assignment unit 203 reads the image 250 that immediately precedes the target image 250 in the time series.

(S107-2) The label assignment unit 203 reads the data of the bounding boxes 330 of the objects of the target object type from the bounding box information 500 of each of the target image 250 and the read image 250.

(S107-3) The label assignment unit 203 uses the data of the bounding boxes 330 to calculate indicators representing the motion of the bounding box 330 of each object of the target object type. For example, the label assignment unit 203 calculates the amount of change in the center of the bounding box 330 and in the size (height and width) of the bounding box 330.

At this time, the label assignment unit 203 may also calculate features to be included in the object data 800, such as a collision grace time (time to collision). The collision grace time is calculated using any one of formulas (1), (2), and (3).

Here, w(t) represents the width of the bounding box 330 in the target image 250, and Δw represents the amount of change in the width of the bounding box 330. h(t) represents the height of the bounding box 330 in the target image 250, and Δh represents the amount of change in the height of the bounding box 330. s(t) represents the area of the bounding box 330 in the target image 250, given as the product of the width and the height, and Δs represents the amount of change in the area of the bounding box 330.
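Formulas (1) to (3) themselves are not reproduced in this text. Purely as an illustration of how the defined symbols could combine, the sketch below assumes each formula is the ratio of the current size to its change between the two images, that is, w(t)/Δw, h(t)/Δh, or s(t)/Δs, with the result expressed in units of the interval between the two images. This ratio form is an assumption, not the confirmed content of the formulas, and the function name is hypothetical.

```python
import math


def grace_time(size_now: float, delta: float) -> float:
    """Assumed collision grace time: current size divided by its change.

    size_now may be w(t), h(t), or s(t); delta is the matching change.
    Returns infinity when the size is not changing.
    """
    if delta == 0:
        return math.inf
    return size_now / delta


# A box 40 px wide that grew by 4 px since the previous image.
ttc_from_width = grace_time(40.0, 4.0)
```

Under this assumption, a box that grows quickly relative to its size yields a small value, which corresponds to an object that is approaching rapidly.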

（Ｓ１０７－４）ラベル付与部２０３は、指標から構成される動きデータを生成し、動き情報６００に登録する。 (S107-4) The labeling unit 203 generates motion data composed of indicators and registers it in the motion information 600.

図６に示すように、動き情報６００は、ＩＤ６０１、位置変化量６０２、サイズ変化量６０３、及び衝突猶予時間６０４を含むエントリを格納する。一つのエントリが一つの物体の動きデータに対応する。なお、エントリに含まれるフィールドは前述したものに限定されない。前述したフィールドのいずれかを含まなくてもよいし、また、他のフィールドを含んでもよい。 As shown in FIG. 6, the motion information 600 stores entries including an ID 601, a position change amount 602, a size change amount 603, and a collision grace period 604. One entry corresponds to the motion data of one object. Note that the fields included in an entry are not limited to those described above. An entry may not include any of the above-mentioned fields, or may include other fields.

ＩＤ６０１は、ＩＤ５０１と同一のフィールドである。位置変化量６０２は、境界ボックス３３０の中心の変化量を格納するフィールドである。サイズ変化量６０３は、境界ボックス３３０のサイズの変化量を格納するフィールドである。衝突猶予時間６０４は、衝突猶予時間を格納するフィールドである。図６に示すように動きデータは、指標を成分とするベクトルとして扱うことができる。 ID601 is the same field as ID501. Position change amount 602 is a field that stores the amount of change in the center of bounding box 330. Size change amount 603 is a field that stores the amount of change in size of bounding box 330. Collision grace period 604 is a field that stores the collision grace period. As shown in FIG. 6, the motion data can be treated as a vector whose components are indices.

The motion information 600 is generated for each object type. If the target image 250 is the oldest image, no earlier image 250 exists, so the label assignment unit 203 sets each index to 0. The motion data may also be calculated using three or more images 250 that are consecutive in the time series.

以上がステップＳ１０７の処理の説明である。 This concludes the explanation of the processing in step S107.

The label assignment unit 203 of the computer 100-1 assigns labels to the objects of the target object type based on the motion information 600 and the label definition information 232, and generates object label information 700 (step S108).

In the label definition information 232, for example, the following assignment rules are defined.
(Assignment rule 1) Labels "L1" to "Ln" are assigned in descending order of the amount of change of the center along the y-axis.
(Assignment rule 2) The motion data is sorted by the amount of change of the center along the x-axis, and labels "L1" to "Ln" are assigned in descending order of the difference in the x-axis amount of change from the motion data at both ends.
n represents the number of objects.

付与規則１は、動きデータに含まれる物理量そのものを指標としてラベルを付与する方法である。付与規則２は、動きデータに含まれる物理量から算出される、周囲の物体の動きとの違いを示す指標に基づいてラベルを付与する方法である。 Assignment rule 1 is a method of assigning a label using the physical quantity contained in the motion data itself as an index. Assignment rule 2 is a method of assigning a label based on an index that indicates the difference from the motion of surrounding objects, calculated from the physical quantity contained in the motion data.
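Assignment rule 1 can be sketched as a simple ranking. This is a minimal illustration, assuming the signed y-axis change is used for ordering (the description does not say whether the magnitude or the signed value is meant); the function name and the input layout are hypothetical.

```python
def assign_labels_rule1(motion: dict[str, tuple[float, float]]) -> dict[str, str]:
    """Assign "L1".."Ln" in descending order of the center's y-axis change.

    `motion` maps an object ID to its (dx, dy) center change.
    Returns a mapping from object ID to label.
    """
    ranked = sorted(motion, key=lambda oid: motion[oid][1], reverse=True)
    return {oid: f"L{rank}" for rank, oid in enumerate(ranked, start=1)}
```

Because the label depends only on the ranking of the motion index, the same kind of movement receives the same label regardless of what the object looks like.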

When the target image 250 is the oldest image, the label assignment unit 203 reuses, as is, the labels assigned for the image 250 that immediately follows it in the time series.

図７に示すように、物体ラベル情報７００は、ＩＤ７０１及びラベル７０２を含むエントリを格納する。一つのエントリが一つの物体に対応する。なお、エントリに含まれるフィールドは前述したものに限定されない。前述したフィールドのいずれかを含まなくてもよいし、また、他のフィールドを含んでもよい。 As shown in FIG. 7, object label information 700 stores entries including an ID 701 and a label 702. One entry corresponds to one object. Note that the fields included in an entry are not limited to those described above. An entry may not include any of the above-mentioned fields, or may include other fields.

ＩＤ７０１は、ＩＤ５０１と同一のフィールドである。ラベル７０２は、決定されたラベルを格納するフィールドである。図７では、中心のｙ軸の変化量の大きい順に「Ｌ１」から「Ｌｎ」までのラベルが付与された結果を示す。ｎは物体の数を表す。 ID701 is the same field as ID501. Label 702 is a field that stores the determined label. Figure 7 shows the result of assigning labels from "L1" to "Ln" in descending order of the amount of change on the y-axis of the center. n represents the number of objects.

計算機１００－１のラベル付与部２０３は、すべての物体種別について処理が完了したか否かを判定する（ステップＳ１０９）。 The label assignment unit 203 of computer 100-1 determines whether processing has been completed for all object types (step S109).

すべての物体種別について処理が完了していないと判定された場合、計算機１００－１のラベル付与部２０３は、ステップＳ１０６に戻り、同様の処理を実行する。 If it is determined that processing has not been completed for all object types, the label assignment unit 203 of computer 100-1 returns to step S106 and executes the same processing.

すべての物体種別について処理が完了したと判定された場合、計算機１００－１のラベル付与部２０３は、すべての画像２５０について処理が完了したか否かを判定する（ステップＳ１１０）。 If it is determined that processing has been completed for all object types, the label assignment unit 203 of the computer 100-1 determines whether processing has been completed for all images 250 (step S110).

すべての画像２５０について処理が完了していないと判定された場合、計算機１００－１のラベル付与部２０３は、ステップＳ１０５に戻り、同様の処理を実行する。 If it is determined that processing has not been completed for all images 250, the label assignment unit 203 of the computer 100-1 returns to step S105 and executes the same processing.

すべての画像２５０について処理が完了したと判定された場合、計算機１００－１の入力制御部２０４は、物体データ定義情報２３３に基づいて、各画像２５０に含まれる物体の物体データ８００を生成する（ステップＳ１１１）。 If it is determined that processing has been completed for all images 250, the input control unit 204 of the computer 100-1 generates object data 800 for the objects contained in each image 250 based on the object data definition information 233 (step S111).

The input control unit 204 of the computer 100-1 controls, for each image 250, the input of the object data 800 to the action determination unit 220 based on the object label information 700 and the input control information 234 (step S112).

Here, a method for controlling the input of object data to the action determination unit 220 will be described. FIGS. 8A, 8B, 9A, and 9B are diagrams illustrating an example of input control of the object data 800 by the input control unit 204 of the first embodiment.

When the action determination unit 220 is a neural network composed of an input layer 810, a hidden layer 811, and an output layer 812 as shown in FIGS. 8A and 8B, the input control information 234 stores information indicating the correspondence between labels and input slots. In this case, the input control unit 204 inputs the object data 800 of each object into the input slot corresponding to the label of that object.

なお、図８Ａは、物体種別が一つの場合の物体データ８００の入力制御を示し、図８Ｂは、物体種別が二つの場合の物体データ８００の入力制御を示す。 Note that FIG. 8A shows input control of object data 800 when there is one object type, and FIG. 8B shows input control of object data 800 when there are two object types.
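The slot-based input control of FIGS. 8A and 8B can be sketched as follows. This is an illustrative sketch only: the mapping `slot_of_label` stands in for the input control information 234, and zero-filling empty slots and dropping objects whose label has no slot are assumptions not stated in the description.

```python
def build_slot_input(objects, slot_of_label, n_slots, feat_dim):
    """Place each object's feature vector into the slot assigned to its label.

    objects:       list of (label, feature_vector) pairs
    slot_of_label: mapping from label to slot index
    Returns one flat input vector of length n_slots * feat_dim.
    """
    slots = [[0.0] * feat_dim for _ in range(n_slots)]  # empty slots stay zero
    for label, features in objects:
        idx = slot_of_label.get(label)
        if idx is not None:  # objects with an unmapped label are dropped
            slots[idx] = list(features)
    return [value for slot in slots for value in slot]
```

With this arrangement, a given slot always receives the object that moves in a given way, which is the property the description relies on to make the meaning of each input stable.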

When the action determination unit 220 is composed of a recurrent neural network (RNN) 820 and a neural network (NN) 830 as shown in FIGS. 9A and 9B, the input control information 234 stores information associating each label with the position of the object data 800 in the input order to the RNN 820. In this case, the input control unit 204 inputs the object data 800 to the RNN 820 in the order corresponding to the labels of the objects.

なお、図９Ａは、物体種別が一つの場合の物体データ８００の入力制御を示し、図９Ｂは、物体種別が二つの場合の物体データ８００の入力制御を示す。 Note that FIG. 9A shows input control of object data 800 when there is one object type, and FIG. 9B shows input control of object data 800 when there are two object types.
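For the RNN configuration of FIGS. 9A and 9B, the input control reduces to ordering the object data by label before feeding it in sequence. A minimal sketch, assuming `order_of_label` is a hypothetical stand-in for the input control information 234 and that objects with an unmapped label are skipped:

```python
def order_for_rnn(objects, order_of_label):
    """Return feature vectors in the input order defined for their labels.

    objects:        list of (label, feature_vector) pairs
    order_of_label: mapping from label to its position in the input sequence
    """
    known = [(label, feats) for label, feats in objects if label in order_of_label]
    known.sort(key=lambda item: order_of_label[item[0]])
    return [feats for _, feats in known]
```

The returned list would then be passed to the recurrent network one element per step; the network itself is outside the scope of this sketch.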

計算機１００－１の学習部２１０は、入力が制御された物体データ８００を用いて学習処理を実行する（ステップＳ１１３）。その後、計算機１００－１は、一連の処理を終了する。 The learning unit 210 of the computer 100-1 executes the learning process using the object data 800 whose input has been controlled (step S113). After that, the computer 100-1 ends the series of processes.

学習処理は公知の技術であるため詳細な説明を省略する。なお、学習部２１０は、画像２５０が入力されるたびに、学習処理を実行するのではなく、所定の周期で実行してもよい。周期が経過していない場合には、ステップＳ１１２及びステップＳ１１３の処理は省略できる。学習処理が実行されない場合、物体データ８００は蓄積される。 Since the learning process is a known technique, a detailed description will be omitted. Note that the learning unit 210 may execute the learning process at a predetermined cycle rather than every time an image 250 is input. If the cycle has not elapsed, the processes of steps S112 and S113 can be omitted. If the learning process is not executed, the object data 800 is accumulated.

The process executed by the computer 100-2 is the same as that of the computer 100-1, except that the process of step S113 is replaced with a process of outputting the action data 260.

The label assignment unit 203 assigns labels based on the movement of each object (bounding box 330). Because the movement of objects is less diverse than their appearance, each input can be given a clear meaning. For example, input rules for the object data 800 can be defined in terms such as large horizontal movement, proximity to the host vehicle, or high moving speed. Controlling the input of the object data 800 based on such input rules can be expected to improve learning efficiency and output accuracy.

FIG. 10 is a flowchart illustrating an example of the bounding box correction process executed by the bounding box prediction unit 201 of the first embodiment. FIG. 11 is a diagram illustrating an example of a method by which the bounding box prediction unit 201 of the first embodiment calculates a predicted bounding box.

The bounding box prediction unit 201 selects a target image 250 and performs the bounding box correction process described below on the target image 250. Note that the images 250 are selected in order from the latest image 250 toward the past.

境界ボックス予測部２０１は、ターゲット画像２５０に含まれる物体のループ処理を開始する（ステップＳ２０１）。具体的には、境界ボックス予測部２０１は、ターゲット画像２５０の境界ボックス情報５００から一つの境界ボックス３３０を選択する。 The bounding box prediction unit 201 starts loop processing of objects included in the target image 250 (step S201). Specifically, the bounding box prediction unit 201 selects one bounding box 330 from the bounding box information 500 of the target image 250.

境界ボックス予測部２０１は、ターゲット画像２５０と、ターゲット画像２５０より時系列が一つ前の画像２５０とを用いて、選択された物体の予測境界ボックス３３０を算出する（ステップＳ２０２）。 The bounding box prediction unit 201 calculates a predicted bounding box 330 of the selected object using the target image 250 and the image 250 that is chronologically preceding the target image 250 (step S202).

For example, the bounding box prediction unit 201 calculates the predicted bounding box 330 of the object 320-2 in the selected target image 250 (t2) based on the bounding box 330 of the object 320-2 in the image 250 (t1) immediately preceding the target image 250 in the time series, and on the size, moving speed, and the like of the object 320-1 that overlaps the object 320-2.

なお、予測境界ボックス３３０の算出方法は公知の技術を用いればよいため、詳細な説明は省略する。 The method for calculating the predicted bounding box 330 can be done using known technology, so a detailed explanation is omitted.

境界ボックス予測部２０１は、選択された物体の境界ボックス３３０と、予測境界ボックス３３０との誤差を算出する（ステップＳ２０３）。 The bounding box prediction unit 201 calculates the error between the bounding box 330 of the selected object and the predicted bounding box 330 (step S203).

例えば、境界ボックス予測部２０１は、中心のｘ軸の誤差又は境界ボックス３３０の高さの誤差等を算出する。 For example, the bounding box prediction unit 201 calculates the error in the x-axis of the center or the error in the height of the bounding box 330.

境界ボックス予測部２０１は、算出された誤差及び遮蔽検出定義情報２３１に基づいて、選択された物体が他の物体によって遮蔽されているか否かを判定する（ステップＳ２０４）。 The bounding box prediction unit 201 determines whether the selected object is occluded by another object based on the calculated error and the occlusion detection definition information 231 (step S204).

遮蔽検出定義情報２３１には誤差の閾値が格納されており、境界ボックス予測部２０１は、誤差及び閾値の比較結果に基づいて、選択された物体が他の物体によって遮蔽されているか否かを判定する。例えば、誤差が閾値より大きい場合、境界ボックス予測部２０１は、選択された物体が他の物体によって遮蔽されていると判定する。 The occlusion detection definition information 231 stores an error threshold, and the bounding box prediction unit 201 determines whether the selected object is occluded by another object based on the result of comparing the error and the threshold. For example, if the error is greater than the threshold, the bounding box prediction unit 201 determines that the selected object is occluded by another object.

選択された物体が他の物体によって遮蔽されていないと判定された場合、境界ボックス予測部２０１はステップＳ２０６に進む。 If it is determined that the selected object is not occluded by another object, the bounding box prediction unit 201 proceeds to step S206.

選択された物体が他の物体によって遮蔽されていると判定された場合、境界ボックス予測部２０１は、選択された物体の境界ボックス３３０を予測境界ボックス３３０に置換し（ステップＳ２０５）、その後、ステップＳ２０６に進む。 If it is determined that the selected object is occluded by another object, the bounding box prediction unit 201 replaces the bounding box 330 of the selected object with the predicted bounding box 330 (step S205), and then proceeds to step S206.

具体的には、境界ボックス予測部２０１は、境界ボックス情報５００から選択されたエントリの位置５０２及びサイズ５０３に、予測境界ボックス３３０の値を設定する。 Specifically, the bounding box prediction unit 201 sets the values of the predicted bounding box 330 to the position 502 and size 503 of the entry selected from the bounding box information 500.
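Steps S203 to S205 can be sketched as a threshold test followed by a replacement. This is a hedged illustration: the description mentions the x-axis error of the center or the height error as examples, and combining them by taking the larger of the two is an assumption, as are the names `Box` and `correct_box`.

```python
from dataclasses import dataclass


@dataclass
class Box:
    cx: float  # x-coordinate of the center
    cy: float  # y-coordinate of the center
    w: float   # width
    h: float   # height


def correct_box(observed: Box, predicted: Box, threshold: float):
    """Return (box_to_use, occluded).

    The object is treated as occluded when the error between the observed
    and predicted boxes exceeds the threshold; in that case the predicted
    box replaces the observed one.
    """
    error = max(abs(observed.cx - predicted.cx), abs(observed.h - predicted.h))
    occluded = error > threshold
    return (predicted if occluded else observed), occluded
```

In the described system the threshold would come from the occlusion detection definition information 231, and the returned box would be written back to the position 502 and size 503 fields.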

遮蔽状態を考慮して補正された境界ボックス３３０を用いることによって、物体の動きを正確に評価することができる。 By using a bounding box 330 that has been corrected to take into account the occlusion state, the object's movement can be accurately evaluated.

ステップＳ２０６では、ターゲット画像２５０に含まれるすべての物体について処理が完了したか否かを判定する（ステップＳ２０６）。 In step S206, it is determined whether processing has been completed for all objects contained in the target image 250 (step S206).

ターゲット画像２５０に含まれるすべての物体について処理が完了していないと判定された場合、境界ボックス予測部２０１は、ステップＳ２０１に戻り、同様の処理を実行する。 If it is determined that processing has not been completed for all objects contained in the target image 250, the bounding box prediction unit 201 returns to step S201 and executes the same processing.

When it is determined that processing has been completed for all objects contained in the target image 250, the bounding box prediction unit 201 ends the bounding box correction process.

実施例２では、過去の物体の動きデータから構成されるクラスタを利用してラベルを付与する点が異なる。以下、実施例１との差異を中心に実施例２について説明する。 The difference in the second embodiment is that labels are assigned using clusters constructed from past object movement data. Below, the second embodiment will be explained, focusing on the differences from the first embodiment.

実施例２の計算機システムを構成する計算機１００のハードウェア構成は実施例１と同一である。 The hardware configuration of the computer 100 constituting the computer system of the second embodiment is the same as that of the first embodiment.

FIGS. 12A and 12B are diagrams showing an example of the functional configuration of the computer 100 of the second embodiment.

FIG. 12A shows the functional configuration of the computer 100-1 that learns a policy. The computer 100-1 of the second embodiment differs from that of the first embodiment in that it holds cluster information 239 and history information 240.

FIG. 12B shows the functional configuration of the computer 100-2 that uses the policy to determine an action. The computer 100-2 of the second embodiment differs from that of the first embodiment in that it holds cluster information 239 and history information 240.

FIG. 13 is a diagram showing an example of the data structure of the cluster information 239 of the second embodiment.

The cluster information 239 stores entries including a cluster ID 1301, a center of gravity 1302, and a range 1303. One entry corresponds to one cluster. Note that the fields included in an entry are not limited to those described above. An entry may omit any of the fields described above, or may include other fields.

クラスタＩＤ１３０１は、クラスタの識別情報を格納するフィールドである。重心１３０２は、クラスタの重心を格納するフィールドである。範囲１３０３は、クラスタの大きさを格納するフィールドである。 Cluster ID 1301 is a field that stores identification information of a cluster. Center of gravity 1302 is a field that stores the center of gravity of a cluster. Range 1303 is a field that stores the size of the cluster.

実施例２では、動きデータに含まれるｘ座標、ｙ座標、幅、高さの各々の変化量を軸とする特徴量空間のクラスタを想定している。なお、特徴量空間はこれに限定されない。 In the second embodiment, a cluster in a feature space is assumed, with the axes representing the amount of change in each of the x-coordinate, y-coordinate, width, and height contained in the motion data. Note that the feature space is not limited to this.

なお、物体種別ごとにクラスタを構成してもよい。この場合、物体種別ごとにクラスタ情報２３９が存在する。 In addition, a cluster may be configured for each object type. In this case, cluster information 239 exists for each object type.

FIG. 14 is a diagram showing an example of the data structure of the history information 240 of the second embodiment.

履歴情報２４０は、位置変化量１４０１及びサイズ変化量１４０２を含むエントリを格納する。一つのエントリが動きデータに対応する。なお、エントリに含まれるフィールドは前述したものに限定されない。前述したフィールドのいずれかを含まなくてもよいし、また、他のフィールドを含んでもよい。 The history information 240 stores entries including a position change amount 1401 and a size change amount 1402. One entry corresponds to motion data. Note that the fields included in an entry are not limited to those described above. An entry may not include any of the above-mentioned fields, or may include other fields.

位置変化量１４０１及びサイズ変化量１４０２は、位置変化量６０２及びサイズ変化量６０３と同一のフィールドである。 Position change amount 1401 and size change amount 1402 are the same fields as position change amount 602 and size change amount 603.

なお、物体種別ごとにクラスタを構成する場合、エントリには物体種別を格納するフィールドが含まれる。 When configuring clusters for each object type, the entry includes a field that stores the object type.

In the label definition information 232 of the second embodiment, for example, the following assignment rule is defined.
(Assignment rule 3) The label "Ln" is assigned to cluster n.
n represents the number of clusters.

実施例２の計算機１００－１が実行する処理の流れは実施例１と同様であるが、一部処理が異なる。 The process flow executed by the computer 100-1 in the second embodiment is the same as that in the first embodiment, but some processing is different.

ステップＳ１０８では、ラベル付与部２０３は、動き情報６００、ラベル定義情報２３２、及びクラスタ情報２３９に基づいて、ターゲット物体種別の物体にラベルを付与し、物体ラベル情報７００を生成する。 In step S108, the label assignment unit 203 assigns a label to the object of the target object type based on the motion information 600, the label definition information 232, and the cluster information 239, and generates object label information 700.

具体的には、ラベル付与部２０３は、物体の動きデータ及びクラスタ情報２３９に基づいて、物体が所属するクラスタを決定する。ラベル付与部２０３は、動きデータを履歴情報２４０に格納する。ラベル付与部２０３は、クラスタ及びラベル定義情報２３２に基づいて、ターゲット物体種別の物体にラベルを付与する。 Specifically, the label assignment unit 203 determines the cluster to which the object belongs based on the object's motion data and the cluster information 239. The label assignment unit 203 stores the motion data in the history information 240. The label assignment unit 203 assigns a label to the object of the target object type based on the cluster and label definition information 232.
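The cluster-based labeling of the second embodiment can be sketched as a lookup followed by assignment rule 3. This is only an illustration: treating the range 1303 as a radius around the center of gravity 1302 and choosing the nearest matching cluster are assumptions, and the dictionary keys are hypothetical.

```python
import math


def find_cluster(motion, clusters):
    """Return the ID of the nearest cluster whose range contains `motion`.

    motion:   vector of change amounts, e.g. (dx, dy, dw, dh)
    clusters: list of dicts with keys "id", "centroid", and "range"
    Returns None when the motion data falls outside every cluster.
    """
    best_id, best_dist = None, math.inf
    for cluster in clusters:
        dist = math.dist(motion, cluster["centroid"])
        if dist <= cluster["range"] and dist < best_dist:
            best_id, best_dist = cluster["id"], dist
    return best_id


def label_for_cluster(cluster_id: int) -> str:
    """Assignment rule 3: cluster n receives the label "Ln"."""
    return f"L{cluster_id}"
```

The same motion vector would also be appended to the history information 240 so that it can be used in later re-clustering.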

実施例２の物体データ８００の入力の制御方法は実施例１と同一の制御方法を採用できる。また、図１５に示すような制御方法も採用できる。 The control method for inputting the object data 800 in the second embodiment can be the same as that in the first embodiment. In addition, a control method such as that shown in FIG. 15 can also be used.

FIG. 15 is a diagram illustrating an example of input control of the object data 800 by the input control unit 204 of the second embodiment.

The action determination unit 220 is composed of the RNN 820 and the NN 830. In this case, the label definition information 232 defines the label assignment rule for each cluster, and the input control information 234 stores information associating the label of each cluster with the position of the object data 800 in the input order to the RNN 820. The input control unit 204 inputs the object data 800 of each cluster to the RNN 820 in the order corresponding to the labels of the objects.

FIG. 16 is a flowchart illustrating the re-clustering process executed by the computer 100-1 of the second embodiment.

計算機１００－１は、所定数の動きデータが蓄積された場合、所定時間経過した場合、又は、実行要求を受け付けた場合、再クラスタリング処理を実行する。ここでは、ラベル付与部２０３が実行するものとする。なお、他の機能部が実行してもよい。 The computer 100-1 executes the reclustering process when a predetermined number of pieces of motion data have been accumulated, when a predetermined time has elapsed, or when an execution request has been received. Here, it is assumed that the label assignment unit 203 executes the process. However, it may also be executed by another functional unit.

The label assignment unit 203 performs clustering using the history information 240 (step S301). A known method may be used for the clustering, so a detailed description is omitted.

ラベル付与部２０３は、クラスタ情報２３９を参照し、前回のクラスタと、新たなクラスタとを対応付けできるか否かを判定する（ステップＳ３０２）。 The label assignment unit 203 refers to the cluster information 239 and determines whether the previous cluster can be associated with the new cluster (step S302).

For example, if, for every cluster, the error between the centers of gravity of the previous cluster and the new cluster is smaller than a threshold and the error between their ranges is also smaller than a threshold, the label assignment unit 203 determines that the previous clusters can be associated with the new clusters.
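The determination in step S302 can be sketched as below. The description gives only the threshold conditions; pairing each previous cluster with its nearest unused new cluster, and requiring equal cluster counts, are assumptions made for this sketch, and the function name and dictionary keys are hypothetical.

```python
import math


def clusters_match(previous, new, centroid_tol, range_tol):
    """Return True if every previous cluster has a close counterpart in `new`.

    previous, new: lists of dicts with keys "centroid" and "range"
    A pair matches when the centroid distance is below centroid_tol and the
    difference in range is below range_tol.
    """
    if len(previous) != len(new):
        return False
    remaining = list(new)
    for old in previous:
        # Greedy pairing with the nearest still-unmatched new cluster.
        candidate = min(
            remaining, key=lambda c: math.dist(old["centroid"], c["centroid"])
        )
        if math.dist(old["centroid"], candidate["centroid"]) >= centroid_tol:
            return False
        if abs(old["range"] - candidate["range"]) >= range_tol:
            return False
        remaining.remove(candidate)
    return True
```

A True result corresponds to the branch that only updates the cluster information 239 (step S303); a False result corresponds to the branch that rebuilds it and asks the user to reset the label definition information 232 and the input control information 234.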

前回のクラスタと、新たなクラスタとを対応付けできると判定された場合、ラベル付与部２０３は、クラスタ情報２３９を更新し（ステップＳ３０３）、再クラスタリング処理を終了する。 If it is determined that the previous cluster can be associated with the new cluster, the label assignment unit 203 updates the cluster information 239 (step S303) and terminates the reclustering process.

具体的には、ラベル付与部２０３は、各クラスタの重心１３０２及び範囲１３０３を更新する。再クラスタリングの前後でクラスタは実質的に変化していないため、ラベル定義情報２３２及び入力制御情報２３４は変化する必要がない。 Specifically, the label assignment unit 203 updates the center of gravity 1302 and range 1303 of each cluster. Since the clusters are substantially unchanged before and after reclustering, the label definition information 232 and input control information 234 do not need to be changed.

前回のクラスタと、新たなクラスタとを対応付けができないと判定された場合、ラベル付与部２０３は、クラスタ情報２３９を更新する（ステップＳ３０４）。 If it is determined that the previous cluster cannot be associated with the new cluster, the label assignment unit 203 updates the cluster information 239 (step S304).

具体的には、ラベル付与部２０３は、クラスタ情報２３９を初期化し、クラスタリングの結果に基づいて、新たなクラスタのエントリをクラスタ情報２３９に設定する。 Specifically, the label assignment unit 203 initializes the cluster information 239 and sets an entry for a new cluster in the cluster information 239 based on the clustering result.

ラベル付与部２０３は、ラベル定義情報２３２及び入力制御情報２３４の再設定をユーザに要求する（ステップＳ３０５）。 The label assignment unit 203 requests the user to reset the label definition information 232 and the input control information 234 (step S305).

In the second embodiment, a label is set according to the cluster to which an object belongs, so when the clusters change, the label definition information 232 and the input control information 234 need to be reset. Therefore, when the clusters change, the label assignment unit 203 prompts the user to reset the label definition information 232 and the input control information 234.

ラベル付与部２０３は、ユーザからラベル定義情報２３２及び入力制御情報２３４の入力を受け付ける（ステップＳ３０６）。その後、ラベル付与部２０３は再クラスタリング処理を終了する。 The label assignment unit 203 receives input of the label definition information 232 and the input control information 234 from the user (step S306). After that, the label assignment unit 203 ends the reclustering process.

なお、本発明は上記した実施例に限定されるものではなく、様々な変形例が含まれる。また、例えば、上記した実施例は本発明を分かりやすく説明するために構成を詳細に説明したものであり、必ずしも説明した全ての構成を備えるものに限定されるものではない。また、各実施例の構成の一部について、他の構成に追加、削除、置換することが可能である。 The present invention is not limited to the above-described embodiments, but includes various modified examples. For example, the above-described embodiments are provided to explain the present invention in detail, and are not necessarily limited to those including all of the described configurations. In addition, it is possible to add, delete, or replace part of the configuration of each embodiment with another configuration.

また、上記の各構成、機能、処理部、処理手段等は、それらの一部又は全部を、例えば集積回路で設計する等によりハードウェアで実現してもよい。また、本発明は、実施例の機能を実現するソフトウェアのプログラムコードによっても実現できる。この場合、プログラムコードを記録した記憶媒体をコンピュータに提供し、そのコンピュータが備えるプロセッサが記憶媒体に格納されたプログラムコードを読み出す。この場合、記憶媒体から読み出されたプログラムコード自体が前述した実施例の機能を実現することになり、そのプログラムコード自体、及びそれを記憶した記憶媒体は本発明を構成することになる。このようなプログラムコードを供給するための記憶媒体としては、例えば、フレキシブルディスク、ＣＤ－ＲＯＭ、ＤＶＤ－ＲＯＭ、ハードディスク、ＳＳＤ（ＳｏｌｉｄＳｔａｔｅＤｒｉｖｅ）、光ディスク、光磁気ディスク、ＣＤ－Ｒ、磁気テープ、不揮発性のメモリカード、ＲＯＭなどが用いられる。 The above-mentioned configurations, functions, processing units, processing means, etc. may be realized in part or in whole by hardware, for example by designing them as integrated circuits. The present invention can also be realized by software program code that realizes the functions of the embodiments. In this case, a storage medium on which the program code is recorded is provided to a computer, and a processor included in the computer reads the program code stored in the storage medium. In this case, the program code itself read from the storage medium realizes the functions of the above-mentioned embodiments, and the program code itself and the storage medium on which it is stored constitute the present invention. Examples of storage media for supplying such program code include flexible disks, CD-ROMs, DVD-ROMs, hard disks, SSDs (Solid State Drives), optical disks, magneto-optical disks, CD-Rs, magnetic tapes, non-volatile memory cards, and ROMs.

また、本実施例に記載の機能を実現するプログラムコードは、例えば、アセンブラ、Ｃ／Ｃ＋＋、ｐｅｒｌ、Ｓｈｅｌｌ、ＰＨＰ、Ｐｙｔｈｏｎ、Ｊａｖａ（登録商標）等の広範囲のプログラム又はスクリプト言語で実装できる。 In addition, the program code that realizes the functions described in this embodiment can be implemented in a wide range of program or script languages, such as assembler, C/C++, perl, Shell, PHP, Python, Java (registered trademark), etc.

さらに、実施例の機能を実現するソフトウェアのプログラムコードを、ネットワークを介して配信することによって、それをコンピュータのハードディスクやメモリ等の記憶手段又はＣＤ－ＲＷ、ＣＤ－Ｒ等の記憶媒体に格納し、コンピュータが備えるプロセッサが当該記憶手段や当該記憶媒体に格納されたプログラムコードを読み出して実行するようにしてもよい。 Furthermore, the program code of the software that realizes the functions of the embodiment may be distributed over a network and stored in a storage means such as a computer's hard disk or memory, or in a storage medium such as a CD-RW or CD-R, and the processor of the computer may read and execute the program code stored in the storage means or storage medium.

上述の実施例において、制御線や情報線は、説明上必要と考えられるものを示しており、製品上必ずしも全ての制御線や情報線を示しているとは限らない。全ての構成が相互に接続されていてもよい。 In the above examples, the control lines and information lines are those that are considered necessary for the explanation, and not all control lines and information lines in the product are necessarily shown. All components may be interconnected.

100 Computer
101 Processor
102 Memory
103 Auxiliary storage device
104 Network interface
105 Input/output interface
111 Input device
112 Output device
200 Object detection unit
201 Bounding box prediction unit
202 Object classification unit
203 Label assignment unit
204 Input control unit
210 Learning unit
220 Action determination unit
230 Classification definition information
231 Occlusion detection definition information
232 Label definition information
233 Object data definition information
234 Input control information
235 Model information
236 Learning information
237 Self information
238 Reward information
239 Cluster information
240 History information
250 Image
260 Action data
300 Vehicle
310 Range
320 Object
330 Bounding box
500 Bounding box information
600 Motion information
700 Object label information
800 Object data
810 Input layer
811 Hidden layer
812 Output layer
820 Recurrent neural network
830 Neural network

Claims

1. A computer system comprising:
At least one computer having a computing device, a storage device connected to the computing device, and an interface connected to the computing device;
The system holds model information for managing a model that determines a behavior of a control target using object data that represents features of objects included in an image of the control target, and input control information for managing input rules for the object data to the model based on labels that are assigned based on the movement of the object,
receiving time series data of the image;
Detecting the object in each of a plurality of the images;
selecting a target image from the plurality of images, calculating motion data representing a motion of the object detected from the target image using two or more of the images including the target image, and storing the target image and the motion data in association with each other;
for each of a plurality of said images, applying said label to said object detected in said image based on said motion data;
A computer system comprising: a computer that controls, for each of a plurality of said images, input of said object data of said object detected from said image to said model based on said input control information and said label.

2. The computer system of claim 1,
The motion data is a vector having physical quantities representing the motion of the object as components,
The computer system comprises:
holding label definition information for managing a rule for assigning the label based on an index calculated using the motion data;
Calculating the indicator using the motion data of the object detected from the image;
A computer system comprising: a computer that assigns the label to the object detected from the image based on the index and the label definition information.

3. The computer system of claim 1,
The motion data is a vector having physical quantities representing the motion of the object as components,
The computer system comprises:
holding cluster information for managing clusters configured from a plurality of pieces of motion data in a feature space having an axis of the physical quantity included in the motion data, and label definition information for managing a rule for assigning the label based on the cluster to which the cluster belongs;
Identifying the cluster to which the object detected from the image belongs based on the motion data and the cluster information;
A computer system comprising: a computer that assigns the label to the object detected from the image based on the identified cluster and the label definition information.

4. The computer system of claim 1,
A bounding box is calculated as an area including the object included in the image;
using two or more of the images including the target image, selecting a target object from among the objects included in the target image, and determining whether the target object is occluded by another of the objects;
correcting the bounding box of the target object if the target object is occluded by another of the objects;
A computer system comprising: a computer that calculates the motion data of the object detected from the target image based on an amount of change in the bounding box of the object in two or more images including the target image.

5. The computer system of claim 1, wherein the computer system:
accepts time series data of learning images for training the model;
detects the object from each of a plurality of the learning images;
selects a target learning image from the plurality of learning images, calculates the motion data of the object detected from the target learning image using two or more of the learning images including the target learning image, and stores the target learning image and the motion data in association with each other;
for each of the plurality of learning images, assigns the label to the object detected from the learning image based on the motion data;
for each of the plurality of learning images, controls input of the object data of the object detected from the learning image to the model based on the input control information and the label; and
updates the model by executing a learning process using the object data whose input is controlled, and reflects a result of the update in the model information.

6. A method for controlling input of data to a model, the method being executed by a computer system, wherein
the computer system includes at least one computer having a computing device, a storage device connected to the computing device, and an interface connected to the computing device,
the computer system holds model information for managing a model that determines a behavior of a control target using object data that represents features of objects included in an image of the control target, and input control information for managing rules for inputting the object data to the model based on labels that are assigned based on the motion of the objects, and
the method comprises:
a first step of the at least one computer receiving time series data of the image;
a second step of the at least one computer detecting the object in each of a plurality of the images;
a third step of the at least one computer selecting a target image from the plurality of images, calculating motion data representing a motion of the object detected from the target image using two or more of the images including the target image, and storing the target image and the motion data in association with each other;
a fourth step of the at least one computer assigning, for each of the plurality of images, the label to the object detected from the image based on the motion data; and
a fifth step of the at least one computer controlling, for each of the plurality of images, input of the object data of the object detected from the image to the model based on the input control information and the label.
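
The label-based input control of the fifth step can be sketched as below. The sketch assumes that the model has a fixed number of input slots, that the input control information maps each label to the slots reserved for it, that unused slots are zero-filled, and that objects without a free slot are not input; none of these details are stated in the claims, and the names `INPUT_CONTROL`, `NUM_SLOTS`, `FEATURE_DIM`, and `build_model_input` are hypothetical.

```python
# Hypothetical input control information: label -> slots reserved for it.
INPUT_CONTROL = {"fast": [0, 1], "slow": [2], "static": [3]}
NUM_SLOTS = 4    # assumed number of input slots of the model
FEATURE_DIM = 2  # assumed length of one object's feature vector


def build_model_input(objects, control=INPUT_CONTROL):
    """Place each (label, features) pair into a slot reserved for its label.

    Unused slots stay zero-filled. An object whose label has no free slot
    left, or no entry in the control information, is skipped.
    """
    slots = [[0.0] * FEATURE_DIM for _ in range(NUM_SLOTS)]
    used = {}  # label -> number of slots already filled for that label
    for label, features in objects:
        allowed = control.get(label, [])
        i = used.get(label, 0)
        if i >= len(allowed):
            continue
        slots[allowed[i]] = list(features)
        used[label] = i + 1
    return slots
```

Because each label always feeds the same slots, objects with the same kind of motion reach the same inputs of the model regardless of the order in which they were detected.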

7. The method for controlling input of data to a model according to claim 6, wherein
the motion data is a vector having, as components, physical quantities representing the motion of the object,
the computer system holds label definition information for managing a rule for assigning the label based on an index calculated using the motion data, and
the fourth step includes:
a step of the at least one computer calculating the index using the motion data of the object detected from the image; and
a step of the at least one computer assigning the label to the object detected from the image based on the index and the label definition information.

8. The method for controlling input of data to a model according to claim 6, wherein
the motion data is a vector having, as components, physical quantities representing the motion of the object,
the computer system holds cluster information for managing clusters each composed of a plurality of pieces of the motion data in a feature space whose axes are the physical quantities included in the motion data, and label definition information for managing a rule for assigning the label based on the cluster to which the object belongs, and
the fourth step includes:
a step of the at least one computer identifying the cluster to which the object detected from the image belongs based on the motion data and the cluster information; and
a step of the at least one computer assigning the label to the object detected from the image based on the identified cluster and the label definition information.

9. The method for controlling input of data to a model according to claim 6, wherein
the second step includes:
a step of the at least one computer calculating a bounding box that is an area including the object included in the image;
a step of the at least one computer, using two or more of the images including the target image, selecting a target object from among the objects included in the target image and determining whether the target object is occluded by another of the objects; and
a step of the at least one computer correcting the bounding box of the target object if the target object is occluded by another of the objects, and
the third step includes a step of the at least one computer calculating the motion data of the object detected from the target image based on an amount of change in the bounding box of the object across two or more of the images including the target image.

10. The method for controlling input of data to a model according to claim 6, further comprising:
a step of the at least one computer receiving time series data of learning images for training the model;
a step of the at least one computer detecting the object from each of a plurality of the learning images;
a step of the at least one computer selecting a target learning image from the plurality of learning images, calculating the motion data of the object detected from the target learning image using two or more of the learning images including the target learning image, and storing the target learning image and the motion data in association with each other;
a step of the at least one computer assigning, for each of the plurality of learning images, the label to the object detected from the learning image based on the motion data;
a step of the at least one computer controlling, for each of the plurality of learning images, input of the object data of the object detected from the learning image to the model based on the input control information and the label; and
a step of the at least one computer updating the model by executing a learning process using the object data whose input is controlled, and reflecting a result of the update in the model information.

11. A control device that determines a behavior of a control target, the control device comprising
a computing device, a storage device connected to the computing device, and an interface connected to the computing device, wherein
the control device:
holds model information for managing a model that determines the behavior of the control target using object data that represents features of objects included in an image of the control target, and input control information for managing rules for inputting the object data to the model based on labels that are assigned based on the motion of the objects;
receives time series data of the image;
detects the object in each of a plurality of the images;
selects a target image from the plurality of images, calculates motion data representing a motion of the object detected from the target image using two or more of the images including the target image, and stores the target image and the motion data in association with each other;
for each of the plurality of images, assigns the label to the object detected from the image based on the motion data; and
for each of the plurality of images, controls input of the object data of the object detected from the image to the model based on the input control information and the label.

12. The control device according to claim 11, wherein
the motion data is a vector having, as components, physical quantities representing the motion of the object, and
the control device:
holds label definition information for managing a rule for assigning the label based on an index calculated using the motion data;
calculates the index using the motion data of the object detected from the image; and
assigns the label to the object detected from the image based on the index and the label definition information.

13. The control device according to claim 11, wherein
the motion data is a vector having, as components, physical quantities representing the motion of the object, and
the control device:
holds cluster information for managing clusters each composed of a plurality of pieces of the motion data in a feature space whose axes are the physical quantities included in the motion data, and label definition information for managing a rule for assigning the label based on the cluster to which the object belongs;
identifies the cluster to which the object detected from the image belongs based on the motion data and the cluster information; and
assigns the label to the object detected from the image based on the identified cluster and the label definition information.

14. The control device according to claim 11, wherein the control device:
calculates a bounding box that is an area including the object included in the image;
using two or more of the images including the target image, selects a target object from among the objects included in the target image and determines whether the target object is occluded by another of the objects;
corrects the bounding box of the target object if the target object is occluded by another of the objects; and
calculates the motion data of the object detected from the target image based on an amount of change in the bounding box of the object across two or more of the images including the target image.