JP7700501B2

JP7700501B2 - Inference device, inference method, and program

Info

Publication number: JP7700501B2
Application number: JP2021074174A
Authority: JP
Inventors: 大輔大秋
Original assignee: Fuji Electric Co Ltd
Current assignee: Fuji Electric Co Ltd
Priority date: 2021-04-26
Filing date: 2021-04-26
Publication date: 2025-07-01
Anticipated expiration: 2041-04-26
Also published as: JP2022168599A

Description

本発明は、推論装置、推論方法、及びプログラムに関する。 The present invention relates to an inference device, an inference method, and a program.

画像認識等といった様々なタスクを機械学習モデルにより実行することが従来から行われている。また、或るタスクを実行する際に、複数の機械学習モデルの中から適切な機械学習モデルを選択することも従来から行われている。例えば、特許文献１には、複数のＣＮＮ（Convolutional Neural Network）モデルの中からデータベースに適するＣＮＮモデルを選択する技術が開示されている。 Machine learning models have long been used to perform various tasks such as image recognition. It is also established practice, when performing a given task, to select an appropriate machine learning model from among multiple machine learning models. For example, Patent Document 1 discloses a technology for selecting, from among multiple CNN (Convolutional Neural Network) models, a CNN model suitable for a database.

特開２０１８－９２６１４号公報 JP 2018-92614 A

しかしながら、ＣＮＮモデルは、ＲＡＭ（Random Access Memory）等といったハードウェアリソースの使用量が多いことが一般的である。このため、特許文献１では、複数のＣＮＮモデルの中からどのＣＮＮモデルを選んだとしても、ハードウェアリソースの使用量を大きく削減させることは期待できないと考えられる。また、特許文献１では、推論を実行する前にＣＮＮモデルを選択しており、推論時に動的にＣＮＮモデルを変更することはできない。 However, CNN models generally use a large amount of hardware resources such as RAM (Random Access Memory). For this reason, in Patent Document 1, no matter which CNN model is selected from among multiple CNN models, it is not expected that the amount of hardware resources used will be significantly reduced. In addition, in Patent Document 1, the CNN model is selected before inference is performed, and the CNN model cannot be dynamically changed during inference.

一方で、推論時の状況によっては、ＣＮＮモデル等の汎用的な機械学習モデルだけでなく、より軽量で特定の状況に特化した機械学習モデルを適用可能な場合がある。このため、推論時の状況が特定の状況のときには、当該状況に特化した軽量な機械学習モデルにより推論を行うことで、ハードウェアリソースの使用量を大きく削減できると考えられる。また、推論時の状況は刻々と変化し得るため、推論に使用する機械学習モデルを動的に変更できることが好ましいと考えられる。 On the other hand, depending on the situation during inference, it may be possible to apply not only general-purpose machine learning models such as CNN models, but also lighter machine learning models specialized for specific situations. For this reason, when the situation during inference is specific, it is believed that the amount of hardware resources used can be significantly reduced by performing inference using a lightweight machine learning model specialized for that situation. Furthermore, since the situation during inference can change from moment to moment, it is considered preferable to be able to dynamically change the machine learning model used for inference.

本発明の一実施形態は、上記の点に鑑みてなされたもので、推論時の状況に応じて複数のモデルの中から適切なモデルを動的に選択することを目的とする。 One embodiment of the present invention has been made in consideration of the above points, and aims to dynamically select an appropriate model from among multiple models depending on the situation at the time of inference.

上記目的を達成するため、一実施形態に係る推論装置は、複数のデータの各々に対して所定のタスクの推論を繰り返し実行する推論装置であって、複数のモデルの中から選択されたモデルを示す選択モデルにより、前記データに対して前記タスクの推論を実行する推論部と、前記推論の結果に基づいて、次の繰り返しにおける推論を実行する選択モデルを前記複数のモデルの中から選択する選択部と、を有する。 To achieve the above object, an inference device according to one embodiment repeatedly executes inference of a predetermined task on each of a plurality of data items, and has an inference unit that executes inference of the task on the data using a selected model, that is, a model selected from among a plurality of models, and a selection unit that selects, based on the result of the inference, the selected model to be used for inference in the next iteration from among the plurality of models.

推論時の状況に応じて複数のモデルの中から適切なモデルを動的に選択することができる。 The appropriate model can be dynamically selected from multiple models depending on the situation at the time of inference.

本実施形態に係る動的モデル切替推論装置のハードウェア構成の一例を示す図である。 FIG. 1 is a diagram showing an example of a hardware configuration of the dynamic model switching inference device according to the present embodiment.
本実施形態に係る動的モデル切替推論装置の機能構成の一例を示す図である。 FIG. 2 is a diagram showing an example of a functional configuration of the dynamic model switching inference device according to the present embodiment.
認識状態パターン情報の一例を示す図である。 FIG. 3 is a diagram showing an example of recognition state pattern information.
モデル種類情報の一例を示す図である。 FIG. 4 is a diagram showing an example of model type information.
動的モデル切替推論処理の流れの一例を示すフローチャートである。 FIG. 5 is a flowchart showing an example of the flow of the dynamic model switching inference process.
モデル選択処理の流れの一例を示すフローチャートである。 FIG. 6 is a flowchart showing an example of the flow of the model selection process.
時刻ｔ－１と時刻ｔにおける状態変数値の一例を示す図である。 FIG. 7 is a diagram showing an example of state variable values at time t-1 and time t.
時刻ｔ－１に対する時刻ｔにおける状態変数値の変化の一例を示す図である。 FIG. 8 is a diagram showing an example of changes in state variable values at time t relative to time t-1.

以下、本発明の一実施形態について説明する。本実施形態では、画像中の物体の種類とその位置を認識する画像認識タスクを対象として、当該タスクの推論時の状況に応じて複数のモデルの中から適切なモデルを動的に選択した上で、この選択したモデルに切り替えて推論を行うことが可能な動的モデル切替推論装置１０について説明する。なお、本実施形態では、モデルとは、画像認識タスクに適用可能な機械学習モデル（画像認識モデル）又は画像認識手法のことを意味する。 One embodiment of the present invention will be described below. In this embodiment, a dynamic model switching inference device 10 will be described that is capable of dynamically selecting an appropriate model from among multiple models depending on the situation at the time of inference of an image recognition task of recognizing the type of object in an image and its position, and then switching to the selected model to perform inference. Note that in this embodiment, a model refers to a machine learning model (image recognition model) or an image recognition method that can be applied to an image recognition task.

ここで、本実施形態では、各モデルは、画像中の物体の種類（物体種類）とその物体の画像中の位置（画像中位置）とを認識すると共に、その認識率を算出するものとする。言い換えれば、各モデルは、認識率と、物体種類と、画像中位置とを出力するものとする。認識率とはモデルによる特徴の認識精度を表す指標値であり、例えば、モデルによって出力された物体種類及び画像中位置の確からしさを表している。 In this embodiment, each model recognizes the type of an object in an image (object type) and the position of that object in the image (position in the image), and calculates a recognition rate for the result. In other words, each model outputs a recognition rate, an object type, and a position in the image. The recognition rate is an index value representing how accurately the model recognizes the features; for example, it represents the likelihood that the object type and the position in the image output by the model are correct.

なお、物体種類や画像中位置はモデルによって認識される画像中の特徴であるため、以下では、物体種類及び画像中位置をそれぞれ「特徴」ともいう。また、認識率、物体種類及び画像中位置はモデルの認識状態を表す変数であるため、認識率、物体種類、及び画像中位置をそれぞれ「状態変数」ともいう。 Note that, since the object type and the position in the image are features in the image recognized by the model, hereinafter, the object type and the position in the image are also referred to as "features", respectively. Furthermore, since the recognition rate, the object type, and the position in the image are variables that represent the recognition state of the model, the recognition rate, the object type, and the position in the image are also referred to as "state variables", respectively.

ただし、認識率、物体種類、画像中位置以外にも、画像認識タスクにおけるモデルの認識状態を表す様々な変数を状態変数としてもよい。例えば、撮像装置から物体までの距離、画像認識の認識率に影響を与える条件（例えば、撮影環境における照明条件、１日における時刻等）といったものを表す変数を状態変数としてもよい。 However, in addition to the recognition rate, object type, and position in the image, various variables that represent the recognition state of the model in the image recognition task may be used as state variables. For example, variables that represent the distance from the imaging device to the object, conditions that affect the recognition rate of image recognition (e.g., lighting conditions in the shooting environment, time of day, etc.) may be used as state variables.
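The three state variables described above can be pictured as one small record per inference step. The following is a minimal sketch, not the patent's data format: the class name `StateVariables`, the field names, the value range of the recognition rate, and the (x, y) representation of the position are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StateVariables:
    """State variable values output by one inference step (illustrative names)."""
    recognition_rate: float        # assumed to lie in [0.0, 1.0]
    object_type: str               # recognized object type, e.g. "cup"
    position: Tuple[float, float]  # position in the image, assumed (x, y) in pixels


# Hypothetical inference result at some time t.
state_t = StateVariables(recognition_rate=0.92, object_type="cup", position=(120.0, 80.0))
```

Additional state variables mentioned in the text, such as distance to the object or lighting conditions, could be added as further fields under the same assumption.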

＜動的モデル切替推論装置１０のハードウェア構成＞ <Hardware Configuration of the Dynamic Model Switching Inference Device 10>
まず、本実施形態に係る動的モデル切替推論装置１０のハードウェア構成について、図１を参照しながら説明する。図１は、本実施形態に係る動的モデル切替推論装置１０のハードウェア構成の一例を示す図である。 First, the hardware configuration of the dynamic model switching inference device 10 according to the present embodiment will be described with reference to FIG. 1. FIG. 1 is a diagram showing an example of the hardware configuration of the dynamic model switching inference device 10 according to the present embodiment.

図１に示すように、本実施形態に係る動的モデル切替推論装置１０は一般的なコンピュータ又はコンピュータシステムで実現され、入力装置１０１と、表示装置１０２と、外部Ｉ／Ｆ１０３と、通信Ｉ／Ｆ１０４と、プロセッサ１０５と、メモリ装置１０６とを有する。これら各ハードウェアは、それぞれがバス１０７を介して通信可能に接続されている。 As shown in FIG. 1, the dynamic model switching inference device 10 according to this embodiment is realized by a general computer or computer system, and has an input device 101, a display device 102, an external I/F 103, a communication I/F 104, a processor 105, and a memory device 106. Each of these pieces of hardware is connected to each other so as to be able to communicate with each other via a bus 107.

入力装置１０１は、例えば、キーボードやマウス、タッチパネル等である。表示装置１０２は、例えば、ディスプレイ等である。なお、動的モデル切替推論装置１０は、入力装置１０１及び表示装置１０２のうちの少なくとも一方を有していなくてもよい。 The input device 101 is, for example, a keyboard, a mouse, or a touch panel. The display device 102 is, for example, a display. Note that the dynamic model switching inference device 10 may lack at least one of the input device 101 and the display device 102.

外部Ｉ／Ｆ１０３は、記録媒体１０３ａ等の外部装置とのインタフェースである。動的モデル切替推論装置１０は、外部Ｉ／Ｆ１０３を介して、記録媒体１０３ａの読み取りや書き込み等を行うことができる。なお、記録媒体１０３ａとしては、例えば、ＣＤ（Compact Disc）、ＤＶＤ（Digital Versatile Disk）、ＳＤメモリカード（Secure Digital memory card）、ＵＳＢ（Universal Serial Bus）メモリカード等がある。 The external I/F 103 is an interface with an external device such as a recording medium 103a. The dynamic model switching inference device 10 can read from and write to the recording medium 103a via the external I/F 103. Examples of the recording medium 103a include a CD (Compact Disc), a DVD (Digital Versatile Disc), an SD memory card (Secure Digital memory card), and a USB (Universal Serial Bus) memory card.

通信Ｉ／Ｆ１０４は、動的モデル切替推論装置１０を通信ネットワークに接続するためのインタフェースである。プロセッサ１０５は、例えば、ＣＰＵ（Central Processing Unit）やＧＰＵ（Graphics Processing Unit）等の各種演算装置である。メモリ装置１０６は、例えば、ＨＤＤ（Hard Disk Drive）やＳＳＤ（Solid State Drive）、ＲＡＭ、ＲＯＭ（Read Only Memory）、フラッシュメモリ等の各種記憶装置である。 The communication I/F 104 is an interface for connecting the dynamic model switching inference device 10 to a communication network. The processor 105 is, for example, any of various arithmetic devices such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit). The memory device 106 is, for example, any of various storage devices such as an HDD (Hard Disk Drive), an SSD (Solid State Drive), a RAM, a ROM (Read Only Memory), or a flash memory.

本実施形態に係る動的モデル切替推論装置１０は、図１に示すハードウェア構成を有することにより、後述する動的モデル切替推論処理を実現することができる。なお、図１に示すハードウェア構成は一例であって、動的モデル切替推論装置１０は、他のハードウェア構成を有していてもよい。例えば、動的モデル切替推論装置１０は、複数のプロセッサ１０５を有していてもよいし、複数のメモリ装置１０６を有していてもよい。 The dynamic model switching inference device 10 according to this embodiment has the hardware configuration shown in FIG. 1, and is therefore capable of implementing the dynamic model switching inference process described below. Note that the hardware configuration shown in FIG. 1 is merely an example, and the dynamic model switching inference device 10 may have other hardware configurations. For example, the dynamic model switching inference device 10 may have multiple processors 105, or multiple memory devices 106.

また、本実施形態に係る動的モデル切替推論装置１０は、例えば、一般的なコンピュータ又はコンピュータシステムと比べてハードウェアリソースに制約がある組み込み機器等であってもよい。 The dynamic model switching inference device 10 according to this embodiment may be, for example, an embedded device that has limited hardware resources compared to a general computer or computer system.

＜動的モデル切替推論装置１０の機能構成＞ <Functional Configuration of the Dynamic Model Switching Inference Device 10>
次に、本実施形態に係る動的モデル切替推論装置１０の機能構成について、図２を参照しながら説明する。図２は、本実施形態に係る動的モデル切替推論装置１０の機能構成の一例を示す図である。 Next, the functional configuration of the dynamic model switching inference device 10 according to the present embodiment will be described with reference to FIG. 2. FIG. 2 is a diagram showing an example of the functional configuration of the dynamic model switching inference device 10 according to the present embodiment.

図２に示すように、本実施形態に係る動的モデル切替推論装置１０は、推論部２０１と、状態変数取得部２０２と、認識状態パターン判定部２０３と、モデル選択部２０４とを有する。これら各部は、例えば、動的モデル切替推論装置１０にインストールされた１以上のプログラムが、プロセッサ１０５に実行させる処理により実現される。 As shown in FIG. 2, the dynamic model switching inference device 10 according to this embodiment has an inference unit 201, a state variable acquisition unit 202, a recognition state pattern determination unit 203, and a model selection unit 204. Each of these units is realized, for example, by a process in which one or more programs installed in the dynamic model switching inference device 10 are executed by the processor 105.

また、本実施形態に係る動的モデル切替推論装置１０は、記憶部２０５を有する。記憶部２０５は、例えば、メモリ装置１０６により実現される。なお、記憶部２０５は、例えば、動的モデル切替推論装置１０と通信ネットワークを介して接続されるデータベースサーバ等により実現されてもよい。 The dynamic model switching inference device 10 according to this embodiment also has a storage unit 205. The storage unit 205 is realized, for example, by the memory device 106. Note that the storage unit 205 may also be realized, for example, by a database server connected to the dynamic model switching inference device 10 via a communication network.

推論部２０１は、画像データを入力として、モデル選択部２０４により選択されたモデル（以下、選択モデルともいう。）により推論処理を行って、当該画像データが表す画像中の物体種類と画像中位置とを認識すると共に、その認識率を算出する。 The inference unit 201 receives image data as input and performs inference processing using a model selected by the model selection unit 204 (hereinafter also referred to as the selected model), recognizing the object type and position in the image represented by the image data, and calculating the recognition rate.

状態変数取得部２０２は、推論部２０１による推論結果（つまり、認識率、物体種類、画像中位置）を表す状態変数値を取得する。 The state variable acquisition unit 202 acquires state variable values that represent the inference results (i.e., recognition rate, object type, position in the image) by the inference unit 201.

認識状態パターン判定部２０３は、記憶部２０５に記憶されている認識状態パターン情報を参照して、状態変数取得部２０２により取得された状態変数値（及びその変化）から現在の認識状態パターンを判定する。ここで、認識状態パターンとは推論時の状況を表すパターンのことであり、後述するように、例えば、認識率が高い／低い、物体種類の変化が大きい／小さい、画像中位置の変化が大きい／小さいの組み合わせによって定義される。なお、認識状態パターン情報の詳細については後述する。 The recognition state pattern determination unit 203 refers to the recognition state pattern information stored in the storage unit 205 and determines the current recognition state pattern from the state variable values (and their changes) acquired by the state variable acquisition unit 202. Here, the recognition state pattern is a pattern that represents the situation at the time of inference, and is defined by a combination of, for example, high/low recognition rate, large/small change in object type, and large/small change in position in the image, as described below. Details of the recognition state pattern information will be described later.

モデル選択部２０４は、記憶部２０５に記憶されているモデル種類情報を参照して、認識状態パターン判定部２０３により判定された認識状態パターンから選択モデルを決定する。ここで、モデル種類情報とは、認識状態パターン毎に、その認識状態パターンに対して適切なモデルが定義された情報である。適切なモデルとは、例えば、該当の認識状態パターンにおいて、高い精度で、かつ、軽量に画像認識が可能なモデルのこと（つまり、当該認識状態パターンに特化した軽量なモデルのこと）である。なお、モデル種類情報の詳細については後述する。 The model selection unit 204 refers to the model type information stored in the storage unit 205 and determines the selected model from the recognition state pattern determined by the recognition state pattern determination unit 203. Here, the model type information is information that defines an appropriate model for each recognition state pattern. An appropriate model is, for example, a model that is capable of highly accurate and lightweight image recognition for the relevant recognition state pattern (i.e., a lightweight model specialized for the relevant recognition state pattern). Details of the model type information will be described later.

記憶部２０５は、各種情報（例えば、認識状態パターン情報やモデル種類情報、各モデルのモデルデータ等）を記憶する。なお、記憶部２０５には、認識状態パターン情報やモデル種類情報、モデルデータ以外にも、後述する動的モデル切替推論処理の実行に必要な様々な情報が記憶される。 The storage unit 205 stores various information (e.g., recognition state pattern information, model type information, model data for each model, etc.). In addition to the recognition state pattern information, model type information, and model data, the storage unit 205 also stores various information necessary for executing the dynamic model switching inference process described below.

≪認識状態パターン情報≫ <<Recognition State Pattern Information>>
次に、記憶部２０５に記憶されている認識状態パターン情報の一例について、図３を参照しながら説明する。図３は、認識状態パターン情報の一例を示す図である。 Next, an example of the recognition state pattern information stored in the storage unit 205 will be described with reference to FIG. 3. FIG. 3 is a diagram showing an example of the recognition state pattern information.

図３に示す認識状態パターン情報には、認識率と物体種類の変化と画像中位置の変化との組み合わせに応じた５つの認識状態パターンが定義されている。ここで、物体種類の変化とは、現在の時刻と１つ前の時刻との間における物体種類の値の変化のことである。同様に、画像中位置の変化とは、現在の時刻と１つ前の時刻との間における画像中位置の値の変化のことである。なお、図３中における「×」は認識率が低い、物体種類の変化が大きい、画像中位置の変化が大きいことを表し、「〇」は認識率が高い、物体種類の変化が小さい、画像中位置の変化が小さいことを表す。また、「－」は定義されないことを意味する。 The recognition state pattern information shown in FIG. 3 defines five recognition state patterns according to combinations of the recognition rate, the change in object type, and the change in position in the image. Here, the change in object type is the change in the value of the object type between the current time and the immediately preceding time. Similarly, the change in position in the image is the change in the value of the position in the image between the current time and the immediately preceding time. In FIG. 3, "x" indicates a low recognition rate, a large change in object type, or a large change in position in the image, while "o" indicates a high recognition rate, a small change in object type, or a small change in position in the image. "-" means that the item is not defined.

認識状態パターン１は、認識率が「×」である場合を表す認識状態パターンである。この場合、物体種類の変化と画像中位置の変化は定義されない。これは、認識率が低い場合は、物体種類及び画像中位置が正しくない可能性が高いためである。 Recognition state pattern 1 is a recognition state pattern that represents a case where the recognition rate is "x". In this case, changes in object type and changes in position in the image are not defined. This is because when the recognition rate is low, there is a high possibility that the object type and position in the image are incorrect.

認識状態パターン２は、認識率が「〇」、物体種類の変化が「×」、画像中位置の変化が「×」である場合を表す認識状態パターンである。認識状態パターン３は、認識率が「〇」、物体種類の変化が「×」、画像中位置の変化が「〇」である場合を表す認識状態パターンである。認識状態パターン４は、認識率が「〇」、物体種類の変化が「〇」、画像中位置の変化が「×」である場合を表す認識状態パターンである。認識状態パターン５は、認識率が「〇」、物体種類の変化が「〇」、画像中位置の変化が「〇」である場合を表す認識状態パターンである。 Recognition state pattern 2 is a recognition state pattern that represents a case where the recognition rate is "o", the change in object type is "x", and the change in position in the image is "x". Recognition state pattern 3 is a recognition state pattern that represents a case where the recognition rate is "o", the change in object type is "x", and the change in position in the image is "o". Recognition state pattern 4 is a recognition state pattern that represents a case where the recognition rate is "o", the change in object type is "o", and the change in position in the image is "x". Recognition state pattern 5 is a recognition state pattern that represents a case where the recognition rate is "o", the change in object type is "o", and the change in position in the image is "o".
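The five patterns above amount to a small lookup table. The sketch below encodes them, reading "o" as a high recognition rate or a small change and "x" as a low recognition rate or a large change. The function name `determine_pattern` and its boolean parameters are hypothetical and introduced only for illustration.

```python
from typing import Optional


def determine_pattern(rate_high: bool,
                      type_change_small: Optional[bool] = None,
                      position_change_small: Optional[bool] = None) -> int:
    """Return the recognition state pattern number (1 to 5) described for FIG. 3."""
    if not rate_high:
        # Pattern 1: the changes are not defined when the recognition rate is low.
        return 1
    if type_change_small is None or position_change_small is None:
        raise ValueError("both changes are required when the recognition rate is high")
    # Keys are (object type change is small, position change is small).
    table = {
        (False, False): 2,  # both change greatly
        (False, True): 3,   # only the object type changes greatly
        (True, False): 4,   # only the position changes greatly
        (True, True): 5,    # both changes are small
    }
    return table[(type_change_small, position_change_small)]
```

For example, a high recognition rate with a large object type change and a small position change maps to pattern 3.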

なお、図３に示す例では、認識率が「×」の場合は認識状態パターン１のみとしたが、この認識状態パターン１は更に細分化されてもよい。すなわち、認識率「×」、物体種類の変化「×」、画像中位置の変化「×」である認識状態パターン１－１、認識率「×」、物体種類の変化「×」、画像中位置の変化「〇」である認識状態パターン１－２、認識率「×」、物体種類の変化「〇」、画像中位置の変化「×」である認識状態パターン１－３、認識率「×」、物体種類の変化「〇」、画像中位置の変化「〇」である認識状態パターン１－４と細分化されてもよい。 In the example shown in FIG. 3, recognition state pattern 1 is the only pattern for a recognition rate of "x", but recognition state pattern 1 may be further subdivided. That is, it may be subdivided into recognition state pattern 1-1 (recognition rate "x", change in object type "x", change in position in the image "x"), recognition state pattern 1-2 (recognition rate "x", change in object type "x", change in position in the image "o"), recognition state pattern 1-3 (recognition rate "x", change in object type "o", change in position in the image "x"), and recognition state pattern 1-4 (recognition rate "x", change in object type "o", change in position in the image "o").

また、図３に示す例では、認識率が高い／低い、物体種類の変化と画像中位置の変化が大きい／小さいの組み合わせとしたが、更に細分化されてもよい。すなわち、例えば、認識率が高い／中程度／低いと３つに細分化されてもよい。同様に、物体種類の変化と画像中位置の変化の少なくとも一方が大きい／中程度／小さいと３つに細分化されてもよい。ただし、３つに細分化することは一例であって、更に細かく細分化（つまり、４つ以上に細分化）されてもよい。 In the example shown in FIG. 3, the patterns are combinations of a high/low recognition rate and a large/small change in object type and in position in the image, but the levels may be further subdivided. For example, the recognition rate may be divided into three levels: high, medium, and low. Similarly, at least one of the change in object type and the change in position in the image may be divided into three levels: large, medium, and small. Division into three levels is only an example, and finer division (that is, four or more levels) may be used.

また、図３に示す例は、認識率、物体種類及び画像中位置の３つが状態変数である場合の認識状態パターンの一例であり、状態変数の数が増えれば認識状態パターン数も増加し得る。 The example shown in Figure 3 is an example of a recognition state pattern when the three state variables are the recognition rate, object type, and position in the image, and the number of recognition state patterns can increase as the number of state variables increases.

≪モデル種類情報≫ <<Model Type Information>>
次に、記憶部２０５に記憶されているモデル種類情報の一例について、図４を参照しながら説明する。図４は、モデル種類情報の一例を示す図である。 Next, an example of the model type information stored in the storage unit 205 will be described with reference to FIG. 4. FIG. 4 is a diagram showing an example of the model type information.

図４に示すモデル種類情報には、認識状態パターン１～認識状態パターン５の各々に対して適切なモデルＡ～モデルＥが定義されている。 The model type information shown in Figure 4 defines appropriate models A to E for each of recognition state patterns 1 to 5.

認識状態パターン１は、認識率が低い認識状態パターンである。このため、認識状態パターン１に対しては、複数の特徴（物体種類と画像中位置）を高い精度で認識可能な汎用的なモデルＡを適切なモデルとする。ここで、モデルＡとしては、例えば、特徴量を自動で抽出するようなモデルと、着目する特徴量を手動で選定又は選択して、複数の特徴量を組み合わせるモデルとが存在する。前者のモデルとしては、例えば、ＣＮＮモデル等が挙げられる。一方で、後者のモデルとしては、ＳＩＦＴ（Scale-Invariant Feature Transform）特徴量やＨＯＧ（Histogram of Oriented Gradients）特徴量等といった複数の特徴量を組み合わせたモデル等が挙げられる。なお、認識状態パターン１ではＣＮＮモデル等の汎用的なモデルＡが用いられるため、ハードウェアリソースの使用量削減の効果は期待できない。これは、認識状態パターン１ではハードウェアリソースの使用量削減よりも、認識性能の向上を優先すべきためである。 Recognition state pattern 1 is a recognition state pattern with a low recognition rate. Therefore, for recognition state pattern 1, a general-purpose model A capable of recognizing multiple features (object type and position in the image) with high accuracy is used as the appropriate model. Candidates for model A include, for example, models that extract features automatically and models in which the features of interest are selected manually and multiple features are combined. An example of the former is a CNN model. Examples of the latter include models that combine multiple features such as SIFT (Scale-Invariant Feature Transform) features and HOG (Histogram of Oriented Gradients) features. Note that, because a general-purpose model A such as a CNN model is used in recognition state pattern 1, no reduction in hardware resource usage can be expected. This is because, in recognition state pattern 1, improving recognition performance should take priority over reducing hardware resource usage.

認識状態パターン２～認識状態パターン４は、認識率は高い一方で、特定の特徴の変化が大きい認識状態パターンである。 Recognition state patterns 2 to 4 are recognition state patterns that have high recognition rates but large changes in certain features.

認識状態パターン２は、物体種類と画像中位置の両方の変化が大きいため、これら両方の特徴を軽量に認識可能なモデルＢを適切なモデルとする。モデルＢとしては、例えば、ＳＩＦＴ特徴量を使用した画像認識モデル又は画像認識手法、ＳＵＲＦ（Speeded-Up Robust Features）特徴量を使用した画像認識モデル又は画像認識手法等が挙げられる。ただし、認識状態パターン２は物体種類と画像中位置の両方の変化が大きいため、ＣＮＮモデルをモデルＢとしてもよいし、ＣＮＮモデルと同等の認識性能を持つ画像認識モデル等をモデルＢとしてもよい。 In recognition state pattern 2, both the object type and the position in the image change greatly, so model B, which can recognize both of these features in a lightweight manner, is used as the appropriate model. Examples of model B include an image recognition model or method using SIFT features and an image recognition model or method using SURF (Speeded-Up Robust Features) features. However, because both the object type and the position in the image change greatly in recognition state pattern 2, a CNN model, or an image recognition model with recognition performance equivalent to that of a CNN model, may also be used as model B.

認識状態パターン３は、物体種類の変化が大きいため、物体種類を軽量に認識可能なモデルＣを適切なモデルとする。モデルＣとしては、例えば、ＳＩＦＴ特徴量を使用した画像認識モデル又は画像認識手法、ＨＯＧ特徴量を使用した画像認識モデル又は画像認識手法等が挙げられる。 In recognition state pattern 3, the object type changes greatly, so model C, which can recognize the object type in a lightweight manner, is used as the appropriate model. Examples of model C include an image recognition model or method using SIFT features and an image recognition model or method using HOG features.

なお、ＳＩＦＴ特徴量を使用した画像認識モデル又は画像認識手法は細かな特徴の把握に優れていることが知られている一方で、ＨＯＧ特徴量を使用した画像認識モデル又は画像認識手法は大まかな特徴の把握に優れていることが知られている。このため、例えば、同一物品の物体における種類を認識する場合（例えば、カップにおける絵柄の種類を認識する場合等）はＳＩＦＴ特徴量を使用した画像認識モデル又は画像認識手法が優れている。一方で、例えば、物体の物品種類を認識する場合（例えば、物品がカップか否かを認識する場合等）はＨＯＧ特徴量を使用した画像認識モデル又は画像認識手法が優れている。 Image recognition models or methods using SIFT features are known to be good at capturing fine features, whereas those using HOG features are known to be good at capturing coarse features. For this reason, SIFT-based models or methods are better suited to distinguishing varieties within the same kind of item (for example, recognizing which pattern is printed on a cup), whereas HOG-based models or methods are better suited to recognizing the kind of item itself (for example, recognizing whether or not an item is a cup).

認識状態パターン４は、画像中位置の変化が大きいため、画像中位置を軽量に認識可能なモデルＤを適切なモデルとする。ここで、軽量に認識可能なモデル（以下、軽量モデルともいう。）とは、特定の特徴量の認識に特化し、省リソースで画像認識が可能なモデルのことをいう。モデルＤとしては、例えば、ＳＵＲＦ特徴量を使用した画像認識モデル又は画像認識手法、ＨＯＧ特徴量を使用した画像認識モデル又は画像認識手法等が挙げられる。 In recognition state pattern 4, the position in the image changes greatly, so model D, which can recognize the position in the image in a lightweight manner, is used as the appropriate model. Here, a model capable of lightweight recognition (hereinafter also referred to as a lightweight model) is a model that is specialized for recognizing specific features and can perform image recognition with few resources. Examples of model D include an image recognition model or method using SURF features and an image recognition model or method using HOG features.

認識状態パターン５は、認識率が高く、かつ、各特徴の変化も小さい認識状態パターンである。この場合は、最も軽量なモデルＥを適切なモデルとする。モデルＥとしては、例えば、ＳＵＲＦ特徴量を使用した画像認識モデル又は画像認識手法等が挙げられる。ただし、これ以外に、軽量な画像認識モデル又は画像認識手法をモデルＥとしてもよい。 Recognition state pattern 5 is a recognition state pattern with a high recognition rate and small changes in each feature. In this case, the most lightweight model E is the appropriate model. Model E can be, for example, an image recognition model or image recognition method using SURF features. However, other lightweight image recognition models or image recognition methods may also be used as model E.

なお、モデルＡ～モデルＥの全てが異なるモデルである必要はなく、一部のモデルが同一のモデルであってもよい。例えば、モデルＤとモデルＥが共にＳＵＲＦ特徴量を使用した画像認識モデル又は画像認識手法であってもよい。 Note that it is not necessary for all of models A to E to be different models, and some of the models may be the same model. For example, models D and E may both be image recognition models or image recognition methods that use SURF features.
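The model type information can likewise be sketched as a mapping from recognition state pattern to model label. In the sketch below, the dictionary names and the concrete technique assigned to each label are assumptions chosen from the examples in the text; they are not the assignments fixed by the patent. Models D and E deliberately share one technique to illustrate that labels need not map to distinct models.

```python
# Recognition state pattern number -> model label, as described for FIG. 4.
MODEL_TYPE_INFO = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E"}

# Hypothetical assignment of a concrete technique to each label.
# D and E share the same technique, which the text explicitly allows.
MODEL_TECHNIQUE = {
    "A": "cnn",   # general-purpose model for low recognition rates
    "B": "sift",  # object type and position both change greatly
    "C": "hog",   # object type changes greatly
    "D": "surf",  # position changes greatly
    "E": "surf",  # lightest case: both changes are small
}


def select_model(pattern: int) -> str:
    """Return the label of the model defined for the given pattern."""
    return MODEL_TYPE_INFO[pattern]


# Usage: look up the technique for recognition state pattern 4.
technique_for_pattern_4 = MODEL_TECHNIQUE[select_model(4)]
```

Keeping the label table separate from the technique table means a deployment could swap the technique behind a label without touching the pattern logic.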

また、上記の各モデルの具体例は一例であって、上述したＳＩＦＴ特徴量、ＳＵＲＦ特徴量又はＨＯＧ特徴量を使用した画像認識モデル又は画像認識手法以外にも、様々な画像認識モデル又は画像認識手法を用いることが可能である。例えば、Ｈａａｒ－ｌｉｋｅ特徴量、テンプレートマッチング等といった画像認識手法が用いられてもよい。 The specific examples of each model above are merely examples, and various image recognition models or methods can be used in addition to the image recognition models or methods using the above-mentioned SIFT features, SURF features, or HOG features. For example, image recognition methods such as Haar-like features, template matching, etc. may be used.

＜動的モデル切替推論処理＞ <Dynamic Model Switching Inference Process>
次に、本実施形態に係る動的モデル切替処理について、図５を参照しながら説明する。図５は、動的モデル切替推論処理の流れの一例を示すフローチャートである。なお、以下では、時刻を表すインデックスをｔとして、或る時刻ｔにおける動的モデル切替処理について説明する。 Next, the dynamic model switching inference process according to the present embodiment will be described with reference to FIG. 5. FIG. 5 is a flowchart showing an example of the flow of the dynamic model switching inference process. In the following, t denotes an index representing time, and the dynamic model switching inference process at a certain time t is described.

まず、推論部２０１は、時刻ｔの画像データを取得する（ステップＳ１０１）。なお、推論部２０１は、動的モデル切替推論装置１０と接続される他の装置又は機器（例えば、撮像装置等）から画像データを取得してもよいし、記憶部２０５から画像データを取得してもよい。 First, the inference unit 201 acquires image data at time t (step S101). Note that the inference unit 201 may acquire image data from another device or equipment (e.g., an imaging device, etc.) connected to the dynamic model switching inference device 10, or may acquire image data from the storage unit 205.

次に、推論部２０１は、上記のステップＳ１０１で取得した画像データを入力として、現在の選択モデルにより推論処理を行って、認識率と物体種類と画像中位置とを表す状態変数値を出力する（ステップＳ１０２）。この状態変数値は、時刻ｔにおける推論結果として記憶部２０５に格納される。なお、ｔ＝１のとき（つまり、最初に推論処理を行うとき）は予め決められたモデル（例えば、ＣＮＮモデル等の汎用的なモデル）を選択モデルとすればよい。 Next, the inference unit 201 performs inference processing using the current selected model with the image data acquired in step S101 as input, and outputs state variable values representing the recognition rate, object type, and position in the image (step S102). These state variable values are stored in the storage unit 205 as the inference results at time t. Note that when t=1 (i.e., when the inference processing is performed for the first time), a predetermined model (for example, a general-purpose model such as a CNN model) may be used as the selected model.

次に、状態変数取得部２０２は、上記のステップＳ１０２で出力された状態変数値（つまり、時刻ｔにおける状態変数値）と、１つ前の時刻の状態変数値（つまり、時刻ｔ－１における状態変数値）とを取得する（ステップＳ１０３）。 Next, the state variable acquisition unit 202 acquires the state variable values output in step S102 above (i.e., the state variable values at time t) and the state variable values at the previous time (i.e., the state variable values at time t-1) (step S103).

次に、動的モデル切替推論装置１０は、モデル選択処理を実行する（ステップＳ１０４）。このモデル選択処理では、時刻ｔにおける状態変数値、又は、時刻ｔにおける状態変数値と時刻ｔ－１における状態変数値の両方に基づいて、次の時刻ｔ＋１の推論処理で用いる選択モデルを決定する。なお、モデル選択処理の詳細については後述する。 Next, the dynamic model switching inference device 10 executes a model selection process (step S104). In this model selection process, a selected model to be used in the next inference process at time t+1 is determined based on the state variable values at time t, or on both the state variable values at time t and time t-1. The model selection process will be described in detail later.

次に、推論部２０１は、次の画像データがあるか否かを判定する（ステップＳ１０５）。そして、次の画像データがある場合は、時刻ｔをｔ＋１に更新した上で（ステップＳ１０６）、推論部２０１は、上記のステップＳ１０１に戻る。これにより、各時刻ｔに対して上記のステップＳ１０１～ステップＳ１０４が繰り返し実行される。一方で、次の画像データが無い場合は、推論部２０１は、動的モデル切替推論処理を終了する。 Next, the inference unit 201 determines whether or not there is next image data (step S105). If there is next image data, the inference unit 201 updates the time t to t+1 (step S106) and returns to the above step S101. As a result, the above steps S101 to S104 are repeatedly executed for each time t. On the other hand, if there is no next image data, the inference unit 201 ends the dynamic model switching inference process.
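Steps S101 to S106 form a loop in which the result at time t determines the model used at time t+1. The sketch below shows only that control flow. The names `run_inference_loop`, `models`, and `choose_next`, the dictionary-based state, and the stub models are all hypothetical stand-ins, and the selection rule in the usage part is a placeholder rather than the model selection process of FIG. 6.

```python
from typing import Callable, Dict, Iterable, List, Optional

State = Dict[str, object]
Model = Callable[[object], State]
Chooser = Callable[[State, Optional[State]], str]


def run_inference_loop(images: Iterable[object],
                       models: Dict[str, Model],
                       choose_next: Chooser,
                       initial_model: str = "A") -> List[State]:
    """Run inference on each image, re-selecting the model after every step."""
    selected = initial_model           # predetermined model for the first step
    previous: Optional[State] = None   # state variable values at time t-1
    results: List[State] = []
    for image in images:                           # S101, S105, S106
        current = models[selected](image)          # S102: infer with the selected model
        results.append(current)                    # keep the result for time t
        selected = choose_next(current, previous)  # S103, S104: model for time t+1
        previous = current
    return results


def make_stub(name: str, rate: float) -> Model:
    """Build a stub model that always reports the same recognition rate."""
    return lambda image: {"model": name, "recognition_rate": rate}


# Usage with stub models and a placeholder selection rule.
stub_models = {"A": make_stub("A", 0.9), "E": make_stub("E", 0.5)}
placeholder_rule = lambda cur, prev: "E" if cur["recognition_rate"] >= 0.8 else "A"
used = [r["model"] for r in run_inference_loop(range(4), stub_models, placeholder_rule)]
```

Because the model for the next step is chosen only after the current result is available, the model that processes an image is always the one selected from the preceding image's result.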

≪モデル選択処理≫ <<Model Selection Process>>
次に、上記のステップＳ１０４におけるモデル選択処理について、図６を参照しながら説明する。図６は、モデル選択処理の流れの一例を示すフローチャートである。 Next, the model selection process in step S104 above will be described with reference to FIG. 6. FIG. 6 is a flowchart showing an example of the flow of the model selection process.

まず、認識状態パターン判定部２０３は、時刻ｔにおける状態変数値に含まれる認識率が所定の閾値ｔｈ_１以上であるか否かを判定する（ステップＳ２０１）。当該認識率が閾値ｔｈ_１以上である場合は認識率が高いと判定される一方で、閾値ｔｈ_１未満である場合は認識率が低いと判定される。ただし、例えば、閾値ｔｈ_１１と閾値ｔｈ_１２とを準備し、当該認識率が閾値ｔｈ_１１以上の場合は認識率が高いと判定し、閾値ｔｈ_１２未満の場合は認識率が低いと判定し、閾値ｔｈ_１２以上かつ閾値ｔｈ_１１未満の場合は時刻ｔ－１における本ステップの判定結果と同じとしてもよい。 First, the recognition state pattern determination unit 203 determines whether the recognition rate included in the state variable values at time t is equal to or greater than a predetermined threshold th_1 (step S201). If the recognition rate is equal to or greater than the threshold th_1, the recognition rate is determined to be high, whereas if it is less than the threshold th_1, the recognition rate is determined to be low. However, for example, thresholds th_11 and th_12 may be prepared, and the recognition rate may be determined to be high if it is equal to or greater than th_11, determined to be low if it is less than th_12, and, if it is equal to or greater than th_12 and less than th_11, given the same determination result as this step at time t-1.
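The two-threshold variant is a hysteresis rule: between the two thresholds, the previous determination is kept, which avoids flipping between high and low when the recognition rate hovers near a single threshold. A minimal sketch follows; the function name `is_rate_high` and its parameter names are hypothetical.

```python
def is_rate_high(rate: float, th_11: float, th_12: float, previous_high: bool) -> bool:
    """Decide high/low for the recognition rate using an upper and a lower threshold."""
    if th_12 > th_11:
        raise ValueError("th_12 (lower threshold) must not exceed th_11 (upper threshold)")
    if rate >= th_11:
        return True          # at or above the upper threshold: high
    if rate < th_12:
        return False         # below the lower threshold: low
    return previous_high     # in between: reuse the determination from time t-1
```

With th_11 = 0.8 and th_12 = 0.6, a rate of 0.7 is judged high if the previous step was judged high and low otherwise.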

The threshold th_1 (or the thresholds th_11 and th_12) may be a preset fixed value or a variable value that depends on past recognition rates. For example, the mean μ_1 and variance σ_1 of the recognition rates at each time from time t-ΔT (where ΔT is a predetermined time width) to time t may be calculated, and the thresholds set as th_11 = μ_1 + σ_1 and th_12 = μ_1 - σ_1. To prevent th_11 from becoming too low, a predetermined tuning value R' (for example, R' = 70%) may be used so that th_11 = max(μ_1 + σ_1, R').
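The variable thresholds can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the window is passed in as a plain list, and σ is computed as the population variance because the text calls it the variance without specifying the estimator.

```python
from statistics import fmean, pvariance
from typing import Sequence, Tuple


def adaptive_thresholds(history: Sequence[float], floor: float) -> Tuple[float, float]:
    """Return (upper, lower) thresholds from the values observed in the window.

    history: values at each time from t-ΔT to t (e.g. recognition rates in %).
    floor:   tuning value (R' in the text) below which the upper threshold may not fall.
    """
    mu = fmean(history)
    sigma = pvariance(history)  # assumption: population variance
    upper = max(mu + sigma, floor)
    lower = mu - sigma
    return upper, lower
```

For a window of recognition rates [80.0, 82.0, 78.0, 80.0] and R' = 70, the mean is 80 and the variance is 2, so the sketch returns thresholds of 82 and 78.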

If the recognition rate is determined to be low in step S201 above, the recognition state pattern determination unit 203 refers to the recognition state pattern information shown in FIG. 3 and determines that the current recognition state pattern is "recognition state pattern 1" (step S202).

The model selection unit 204 then refers to the model type information shown in FIG. 4 and selects model A, which corresponds to recognition state pattern 1, as the selected model (step S203).

On the other hand, if the recognition rate is determined to be high in step S201 above, the recognition state pattern determination unit 203 calculates the change in the state variable values (excluding the recognition rate) at time t relative to time t-1 (step S204). The recognition rate is excluded because it has already been determined to be high or low in step S201 above.

Here, an example of a method for calculating the change in the state variable values (excluding the recognition rate) will be described. For example, as shown in the upper part of FIG. 7, assume that the state variable values representing the result of inference processing performed by a certain selected model on the image data at time t-1 are a recognition rate of "80%", an object type of "0: cup", and a position in the image of "X=100, Y=150". Also, as shown in the lower part of FIG. 7, assume that the state variable values representing the result of inference processing performed by a certain selected model on the image data at time t are a recognition rate of "85%", an object type of "0: cup", and a position in the image of "X=150, Y=100". The recognition rate takes a value from 0 to 100 inclusive, and the object type takes a category value representing the type of object. The position in the image takes XY coordinate values with the upper left corner of the image represented by the image data as the origin, the rightward direction as the positive direction of the X axis, and the downward direction as the positive direction of the Y axis.

In this case, the recognition state pattern determination unit 203, for example, sets the change in object type to "0" if the object type at time t is the same as the object type at time t-1 and to "1" if they differ. The recognition state pattern determination unit 203 also calculates the change in position in the image, for example, by subtracting the X and Y coordinate values of the position in the image at time t-1 from the X and Y coordinate values of the position in the image at time t, respectively. The resulting changes in object type and position in the image are as shown in FIG. 8. That is, the change in object type at time t relative to time t-1 is "0", and the change in position in the image at time t relative to time t-1 is "X=50, Y=0".
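The change calculation in step S204 can be sketched as below. The dictionary keys and the function name are hypothetical, and the numbers in the usage note are illustrative values chosen for this sketch rather than the values of FIG. 7 or FIG. 8.

```python
from typing import Dict, Tuple


def state_change(prev: Dict[str, int], curr: Dict[str, int]) -> Tuple[int, Tuple[int, int]]:
    """Return (object type change, (dX, dY)) between time t-1 and time t."""
    # Object type is categorical: 0 if unchanged, 1 if different.
    type_change = 0 if curr["object_type"] == prev["object_type"] else 1
    # Position change is the value at time t minus the value at time t-1.
    dx = curr["x"] - prev["x"]
    dy = curr["y"] - prev["y"]
    return type_change, (dx, dy)
```

For example, an object of type 0 at (40, 60) followed by an object of type 1 at (70, 45) gives a type change of 1 and a position change of (30, -15). A negative Y component means upward movement, because the Y axis points downward.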

Next, the recognition state pattern determination unit 203 performs a threshold determination on the changes in the state variable values other than the recognition rate to determine whether each change is large or small (step S205). That is, the recognition state pattern determination unit 203 determines whether the change in object type is large or small and whether the change in position in the image is large or small. The details of each determination are described below.

- When determining whether the change in object type is large or small
For example, the recognition state pattern determination unit 203 calculates the change rate Rv of the object type, and determines that the change in object type is large if Rv is equal to or greater than a predetermined threshold th_2 and small if Rv is less than th_2. Here, Rv is calculated, for example, as Rv = (n/N) × 100, where n is the number of changes in object type between time t-ΔT and time t, and N is the number of time indexes included in the time width ΔT.
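As a rough illustration of Rv = (n/N) × 100, the sketch below counts type changes in a window of recognized object types. The function names are hypothetical, and taking N as the number of entries in the window is an assumption about how the time indexes are counted.

```python
from typing import Sequence


def object_type_change_rate(types: Sequence[int]) -> float:
    """Rv in percent for object types observed from t-ΔT to t, oldest first."""
    if not types:
        raise ValueError("the window must contain at least one time index")
    # n: number of adjacent pairs whose object type differs.
    n = sum(1 for prev, curr in zip(types, types[1:]) if prev != curr)
    # Assumption: N is the number of time indexes in the window.
    return 100 * n / len(types)


def is_change_large(rate: float, threshold: float) -> bool:
    """Single-threshold determination: large if rate >= threshold."""
    return rate >= threshold
```

A window of [0, 0, 1, 1] contains one change over four time indexes, so Rv is 25%.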

Alternatively, for example, two thresholds th_21 and th_22 may be prepared: the change in object type is determined to be large if the change rate Rv is equal to or greater than th_21, small if Rv is less than th_22, and, if Rv is less than th_21 and equal to or greater than th_22, the determination result of this step at time t-1 may be reused (or, if this step was not executed at time t-1, the most recent determination result of this step at an earlier time).

The threshold th_2 (or the thresholds th_21 and th_22) may be a preset fixed value or a variable value that depends on past values of the change rate Rv. For example, the mean μ_2 and variance σ_2 of the change rate Rv of the object type at each time from time t-ΔT to time t may be calculated, and the thresholds set as th_21 = μ_2 + σ_2 and th_22 = μ_2 - σ_2. To prevent th_21 from becoming too low, a predetermined tuning value Rv' (for example, Rv' = 30%) may be used so that th_21 = max(μ_2 + σ_2, Rv').

- When determining whether the change in position in the image is large or small
For example, the recognition state pattern determination unit 203 calculates the change rate Rd of the position in the image, and determines that the change in position in the image is large if Rd is equal to or greater than a predetermined threshold th_3 and small if Rd is less than th_3. Here, Rd is, for example, the higher of the change rate Rdx in the X-axis direction and the change rate Rdy in the Y-axis direction between time t-ΔT and time t (that is, Rd = max(Rdx, Rdy)). Rdx is calculated, for example, as Rdx = (dx/Px) × 100, where dx is the amount of movement in the X-axis direction between time t-ΔT and time t and Px is the total number of pixels in the X-axis direction of the image represented by the image data. Similarly, Rdy is calculated, for example, as Rdy = (dy/Py) × 100, where dy is the amount of movement in the Y-axis direction between time t-ΔT and time t and Py is the total number of pixels in the Y-axis direction of the image represented by the image data.
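The position change rate Rd = max(Rdx, Rdy) can be sketched as follows. The function name is hypothetical, and taking the absolute value of dx and dy is an assumption: the text speaks of an amount of movement, which is treated here as a magnitude.

```python
def position_change_rate(dx: float, dy: float, width_px: int, height_px: int) -> float:
    """Rd in percent: the larger of the X and Y movement relative to image size."""
    # Assumption: the amount of movement is a magnitude, so the sign is dropped.
    rdx = 100 * abs(dx) / width_px
    rdy = 100 * abs(dy) / height_px
    return max(rdx, rdy)
```

For a 640 × 480 image, a movement of dx = 64 and dy = -24 gives Rdx = 10% and Rdy = 5%, so Rd is 10%.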

Alternatively, for example, two thresholds th_31 and th_32 may be prepared: the change in position in the image is determined to be large if the change rate Rd is equal to or greater than th_31, small if Rd is less than th_32, and, if Rd is less than th_31 and equal to or greater than th_32, the determination result of this step at time t-1 may be reused (or, if this step was not executed at time t-1, the most recent determination result of this step at an earlier time).

The threshold th_3 (or the thresholds th_31 and th_32) may be a preset fixed value or a variable value that depends on past values of the change rate Rd. For example, the mean μ_3 and variance σ_3 of the change rate Rd of the position in the image at each time from time t-ΔT to time t may be calculated, and the thresholds set as th_31 = μ_3 + σ_3 and th_32 = μ_3 - σ_3. To prevent th_31 from becoming too low, a predetermined tuning value Rd' (for example, Rd' = 200%) may be used so that th_31 = max(μ_3 + σ_3, Rd').

Next, the recognition state pattern determination unit 203 refers to the recognition state pattern information shown in FIG. 3 and determines which of recognition state patterns 1 to 5 the current recognition state pattern is (step S206). That is, the recognition state pattern determination unit 203 determines "recognition state pattern 2" if the change in object type is large and the change in position in the image is also large, "recognition state pattern 3" if the change in object type is large and the change in position in the image is small, "recognition state pattern 4" if the change in object type is small and the change in position in the image is large, and "recognition state pattern 5" if the change in object type is small and the change in position in the image is also small.

Next, the model selection unit 204 refers to the model type information shown in FIG. 4 and selects the model corresponding to the recognition state pattern determined in step S206 above as the selected model (step S207). That is, the model selection unit 204 selects model B if "recognition state pattern 2" was determined in step S206 above, model C if "recognition state pattern 3" was determined, model D if "recognition state pattern 4" was determined, and model E if "recognition state pattern 5" was determined.
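Steps S201 to S207 amount to a small decision table. The sketch below restates that table in code; the function names and the dictionary are hypothetical, and the pattern-to-model mapping simply mirrors the correspondence described in the text (patterns 1 to 5 to models A to E).

```python
# Pattern-to-model correspondence as described for the model type information.
MODEL_FOR_PATTERN = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E"}


def determine_pattern(rate_high: bool, type_change_large: bool,
                      position_change_large: bool) -> int:
    """Return the recognition state pattern number (1 to 5)."""
    if not rate_high:
        return 1  # low recognition rate, regardless of the other changes
    if type_change_large and position_change_large:
        return 2
    if type_change_large:
        return 3
    if position_change_large:
        return 4
    return 5


def select_model(rate_high: bool, type_change_large: bool,
                 position_change_large: bool) -> str:
    """Return the name of the model corresponding to the determined pattern."""
    pattern = determine_pattern(rate_high, type_change_large, position_change_large)
    return MODEL_FOR_PATTERN[pattern]
```

For instance, a high recognition rate with a small change in object type and a large change in position yields pattern 4 and therefore model D.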

Following step S203 or step S207 above, the model selection unit 204 reads the model data of the selected model from the storage unit 205 (step S208). As a result, the selected model is used in the inference processing at the next time t+1.
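The overall effect, in which the model chosen at time t is the one used at time t+1, can be sketched as a simple loop. Everything here is an assumption made for illustration: models are represented as plain callables keyed by name, `select` stands in for the whole model selection process, and loading model data from storage is omitted.

```python
from typing import Any, Callable, Dict, Iterable, List


def run_with_switching(frames: Iterable[Any],
                       models: Dict[str, Callable[[Any], Any]],
                       select: Callable[[Any], str],
                       initial: str) -> List[str]:
    """Run inference on each frame and return the model name used at each time."""
    selected = initial
    used = []
    for frame in frames:
        result = models[selected](frame)  # inference at time t with the current model
        used.append(selected)
        selected = select(result)         # model to be used at time t+1
    return used
```

The returned list makes the switching visible: each entry depends only on the inference result of the preceding frame, never on the frame currently being processed.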

<Summary>
As described above, the dynamic model switching inference device 10 according to the present embodiment, targeting an image recognition task, can dynamically switch the selected model to the model corresponding to a recognition state pattern, which represents the pattern of the result of inference processing performed by the current selected model and of the change in that result. As a result, for example, when the recognition rate is low, a general-purpose model with a higher recognition rate can be used as the selected model, and when a certain feature changes significantly, a lightweight model specialized for recognizing that feature can be used as the selected model. In particular, when a certain recognition rate or higher is being obtained, inference processing can be performed by a lightweight model specialized for recognizing a specific feature, so the amount of hardware resources used can be reduced compared with always performing inference processing with a general-purpose model.

Although the present embodiment targets an image recognition task, this is merely an example, and the invention is applicable to various tasks in which inference processing is repeatedly executed by a model with data as input. For example, it is applicable to tasks such as data classification, regression, prediction, and anomaly detection.

The present invention is not limited to the specifically disclosed embodiment above, and various modifications, changes, combinations with known techniques, and the like are possible without departing from the scope of the claims.

10 Dynamic model switching inference device
101 Input device
102 Display device
103 External I/F
103a Recording medium
104 Communication I/F
105 Processor
106 Memory device
107 Bus
201 Inference unit
202 State variable acquisition unit
203 Recognition state pattern determination unit
204 Model selection unit
205 Storage unit

Claims

1. An inference device that repeatedly executes inference of a predetermined task for each of a plurality of data, the inference device comprising:
an inference unit that executes inference of the task on the data using a selected model, which is a model selected from a plurality of models; and
a selection unit that selects, based on a result of the inference, a selected model for performing inference in a next iteration from among the plurality of models,
wherein the result of the inference includes a recognition result for each of one or more features of the data and a recognition rate representing the likelihood of the recognition result, and
the selection unit selects, depending on a pattern of the recognition result and the recognition rate included in a result of inference in a current iteration, a model corresponding to the pattern as the selected model for performing inference in the next iteration.

2. The inference device according to claim 1, wherein, if the recognition rate is less than a predetermined threshold, the selection unit selects, as the model corresponding to the pattern, a general-purpose model capable of recognizing the one or more features as the selected model.

3. The inference device according to claim 1, wherein, if the recognition rate is equal to or greater than the threshold, the selection unit selects, as the model corresponding to the pattern, a lightweight model capable of recognizing a specific feature as the selected model.

4. The inference device according to any one of claims 1 to 3, wherein the pattern is represented by a combination of whether or not the recognition rate is equal to or greater than the threshold and either the presence or absence, or the magnitude, of a change between the recognition result included in the inference result in the current iteration and the recognition result included in the inference result in the previous iteration.

5. The inference device according to claim 4, wherein
the task is an image recognition task,
the one or more features include at least one of a type of an object in an image represented by the data and a position of the object in the image, and
the pattern is represented by a combination of whether or not the recognition rate is equal to or greater than the threshold, whether or not there is a change between the type of the object included in the inference result in the current iteration and the type of the object included in the inference result in the previous iteration, and the magnitude of the change between the position of the object included in the inference result in the current iteration and the position of the object included in the inference result in the previous iteration.

6. An inference method for repeatedly executing inference of a predetermined task for each of a plurality of data, wherein a computer executes:
an inference procedure of performing inference of the task on the data using a selected model, which is a model selected from a plurality of models; and
a selection procedure of selecting, based on a result of the inference, a selected model for performing inference in a next iteration from among the plurality of models,
wherein the result of the inference includes a recognition result for each of one or more features of the data and a recognition rate representing the likelihood of the recognition result, and
the selection procedure selects, depending on a pattern of the recognition result and the recognition rate included in a result of inference in a current iteration, a model corresponding to the pattern as the selected model for performing inference in the next iteration.

7. A program for repeatedly executing inference of a predetermined task for each of a plurality of data, the program causing a computer to execute:
an inference procedure of performing inference of the task on the data using a selected model, which is a model selected from a plurality of models; and
a selection procedure of selecting, based on a result of the inference, a selected model for performing inference in a next iteration from among the plurality of models,
wherein the result of the inference includes a recognition result for each of one or more features of the data and a recognition rate representing the likelihood of the recognition result, and
the selection procedure selects, depending on a pattern of the recognition result and the recognition rate included in a result of inference in a current iteration, a model corresponding to the pattern as the selected model for performing inference in the next iteration.