JP7345355B2

JP7345355B2 - object identification device

Info

Publication number: JP7345355B2
Application number: JP2019194498A
Authority: JP
Inventors: 修瀬川
Original assignee: Chubu Electric Power Co Inc
Current assignee: Chubu Electric Power Co Inc
Priority date: 2019-10-25
Filing date: 2019-10-25
Publication date: 2023-09-15
Anticipated expiration: 2039-10-25
Also published as: JP2021068293A

Description

The present invention relates to an object identification device that identifies objects based on imaging information, and in particular to an object identification device that individually identifies a plurality of objects (including a plurality of objects having the same shape) arranged in a target area.

Techniques for detecting objects using deep learning (a machine learning method based on multilayer neural networks) are being studied. For example, many end-to-end methods have been proposed, such as SSD (Single Shot MultiBox Detector) and YOLO (You Only Look Once), that simultaneously detect the position and category of a target object based on imaging information. These methods are based on multi-task learning that simultaneously performs learning by a multilayer neural network for detecting the position of an object and learning by a multilayer neural network for determining the category of the object.
An object detection technique using SSD is disclosed in, for example, Non-Patent Document 1, and an object detection technique using YOLO is disclosed in, for example, Non-Patent Document 2.

Non-Patent Document 1: "SSD: Single Shot MultiBox Detector", Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu and Alexander C. Berg (2015), https://arxiv.org/pdf/1512.02325.pdf
Non-Patent Document 2: "You Only Look Once: Unified, Real-Time Object Detection", Joseph Redmon, Santosh Divvala, Ross Girshick and Ali Farhadi (2016), https://pjreddie.com/media/files/papers/yolo.pdf

The object detection techniques disclosed in Non-Patent Document 1 and Non-Patent Document 2 can detect the positions and categories of a plurality of objects arranged in a target area, but they cannot individually identify those objects, in particular a plurality of objects having the same shape (including "substantially the same shape").
For this reason, conventionally, placement information indicating the arrangement state of the plurality of objects arranged in the target area (the positions and IDs of the objects) is created (registered) in advance as a two-dimensional map, and the objects are individually identified by matching object detection information detected based on the imaging information output from the imaging means (the positions and categories of the plurality of objects arranged in the target area within the imaging area) against the placement information indicated by the two-dimensional map (the positions and IDs of the objects).
However, the two-dimensional map must be created manually each time an object is registered, and creating it takes effort and time.
The present invention was devised in view of these points, and its object is to provide a technique that makes it easy to individually identify a plurality of objects (including a plurality of objects having the same shape) arranged in a target area.

The object identification device of the present invention is used to individually identify a plurality of objects arranged in a predetermined target area, in particular a plurality of objects that includes a plurality of objects having the same shape (including "substantially the same shape").
The present invention includes an imaging means, a target area determination means, an object detection means, an object placement information generation means, an object identification means, and a storage means.
The imaging means outputs imaging information indicating an imaging screen that captures an imaging area that includes a target area and a plurality of objects (including a plurality of objects having the same shape) arranged in the target area. As the imaging means, a digital camera using CCD or CMOS is preferably used.
The target area determination means determines the target area on the imaging screen based on the imaging information from the imaging means.
The object detection means detects the categories and positions of the plurality of objects arranged in the target area on the imaging screen, based on the imaging information from the imaging means and the target area on the imaging screen determined by the target area determination means. Preferably, the object detection means outputs object detection information indicating the categories of the plurality of objects arranged in the target area and their positions in the target area. Various known methods can be used both for determining the target area by the target area determination means and for detecting the categories and positions of objects by the object detection means. For example, detection methods based on deep learning, such as SSD and YOLO, can be used.
The object placement information generation means generates object placement information indicating the arrangement state of the plurality of objects in the target area, based on the categories and positions (object detection information) of the plurality of objects arranged in the target area on the imaging screen as detected by the object detection means, and stores it in the storage means. The object placement information includes identification information (ID) that identifies each object and position information that indicates the position of the object in the target area. The identification information (ID) is assigned to each object by an appropriate method. Preferably, identification information (ID) that includes category information indicating the category of the object is used.
The object identification means individually identifies the plurality of objects arranged in the target area on the imaging screen, based on the object detection information output from the object detection means (the categories and positions of the plurality of objects arranged in the target area on the imaging screen) and the object placement information stored in the storage means (the identification information and positions of the plurality of objects arranged in the target area). For example, it determines the correspondence between the plurality of objects arranged in the target area on the imaging screen and the plurality of objects indicated by the object placement information.
The present invention is configured so that it can be set to an object placement information generation mode and an object identification mode. When the object placement information generation mode is set, the target area determination means, the object detection means, and the object placement information generation means generate object placement information and store it in the storage means. When the object identification mode is set, the target area determination means, the object detection means, and the object identification means individually identify the plurality of objects arranged in the target area on the imaging screen. Various methods can be used to make the two modes selectable. For example, the device can be configured so that the object placement information generation mode is set when object placement information generation mode setting information is input from an input means or the like, and the object identification mode is set when object identification mode setting information is input from the input means or the like. Alternatively, the device can be configured so that, when object identification start information is input from the input means or the like, the object placement information generation mode is set first and the object identification mode is set afterward.
In the present invention, object placement information indicating the arrangement state of the plurality of objects in the target area is generated in the object placement information generation mode, and the plurality of objects arranged in the target area are individually identified in the object identification mode by referring to that object placement information. The plurality of objects arranged in the target area can therefore be individually identified with ease.
A different form of the present invention includes an angle of view change amount determination means. The angle of view change amount determination means determines whether the amount of change in the angle of view between the imaging screen (M) that was used to generate the object placement information stored in the storage means and the imaging screen (P) indicated by the imaging information output from the imaging means exceeds a predetermined range. As the method for this determination, a method that determines whether the similarity between the imaging screen (M) and the imaging screen (P) is low can preferably be used. For example, feature points are detected in the imaging screen (M) and the imaging screen (P), and feature vectors containing those feature points of the two screens whose correspondences match are compared: if the similarity of the feature vectors (for example, the cosine distance between the two vectors) is equal to or greater than a predetermined value, it is determined that the amount of change in the angle of view does not exceed the predetermined range, and if the similarity is less than the predetermined value, it is determined that the amount of change exceeds the predetermined range. Other methods can of course also be used for this determination. Note that the amount of change in the angle of view between the target area (M) in the imaging screen (M) and the target area (P) in the imaging screen (P) is equivalent to the amount of change in the angle of view between the imaging screen (M) and the imaging screen (P).
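The similarity check described above can be sketched as follows. This is a minimal illustration, assuming each imaging screen has already been reduced to a fixed-length feature vector built from its matched feature points; the function names and the threshold value 0.9 are hypothetical and do not come from the patent. The text calls the measure a cosine distance, but because a larger value is taken to mean a smaller change, the sketch computes the cosine similarity.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption of this sketch: a zero vector counts as "not similar".
        return 0.0
    return dot / (norm_u * norm_v)


def view_angle_changed(features_m, features_p, threshold=0.9):
    """Return True when the similarity between screen (M) and screen (P)
    is below the threshold, i.e. the change exceeds the allowed range."""
    return cosine_similarity(features_m, features_p) < threshold
```

With identical vectors the similarity is 1 and no change is reported; with orthogonal vectors the similarity is 0 and a change is reported.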
When the object identification mode is set and the angle of view change amount determination means determines that the amount of change in the angle of view exceeds the predetermined range, the target area determination means, the object detection means, and the object placement information generation means generate object placement information and store it in the storage means.
When the imaging position or imaging angle (imaging direction) of the imaging means is changed, the deviation between the arrangement state of the plurality of objects on the current imaging screen and their arrangement state on the imaging screen that was used to generate the stored object placement information (in particular, the deviation in position on the imaging screen) becomes large, and the accuracy with which the object identification means individually identifies the plurality of objects by matching the object detection information against the object placement information may decrease. In this embodiment, when the amount of change in the angle of view between the imaging screen used to generate the object placement information stored in the storage means and the imaging screen indicated by the imaging information output from the imaging means exceeds the predetermined range, the object placement information is regenerated and stored in the storage means. The plurality of objects arranged in the target area can therefore be reliably identified individually even when the imaging position or imaging angle (imaging direction) of the imaging means has been changed.
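The interplay between the two modes and the regeneration step can be outlined as below. It is an illustrative sketch only: `detect`, `generate`, `identify`, and `changed` are hypothetical callables standing in for the object detection means, the object placement information generation means, the object identification means, and the angle of view change amount determination means, and the `state` dictionary stands in for the storage means.

```python
def run_identification_step(frame, state, detect, generate, identify, changed):
    """Process one imaging screen in the object identification mode.

    state holds 'reference_frame' (the screen used to build the stored
    placement information) and 'placement' (None until first generated).
    """
    detections = detect(frame)
    needs_generation = (
        state["placement"] is None
        or changed(state["reference_frame"], frame)
    )
    if needs_generation:
        # Equivalent to running the object placement information
        # generation mode on the current screen.
        state["placement"] = generate(detections)
        state["reference_frame"] = frame
    return identify(detections, state["placement"])
```

The first call always generates placement information, which corresponds to the variant in which the generation mode is set first and the identification mode afterward.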
In a different form of the present invention, the object identification means individually identifies the plurality of objects by matching the categories and positions of the plurality of objects arranged in the target area on the imaging screen, as detected by the object detection means, against the identification information and positions of the plurality of objects indicated by the object placement information stored in the storage means, while allowing a predetermined error range. As the method of matching object positions, preferably the coordinates of the four corner points of the bounding box (the candidate region in which the object detection means estimates the object to exist) are matched, or the width and height of the bounding box and the coordinates of its center of gravity are matched.
In this embodiment, individual identification of a plurality of objects can be easily performed.
In a different form of the present invention, the object detection means detects the categories of the plurality of objects and their relative positions in the target area, and the object placement information generation means generates object placement information indicating the identification information of the plurality of objects and their relative positions in the target area. As the relative position, for example, coordinates in a coordinate system set on the imaging screen are used.
The object identification means matches the object detection information against the object placement information using the relative positions of the objects.
In this embodiment, the object detection information and the object placement information can be matched easily.
In a different form of the present invention, the target area determination means and the object detection means are configured by multilayer neural networks.
In this embodiment, it is possible to easily perform a process of determining a target area and a process of detecting categories and positions of a plurality of objects arranged in a target area.

According to the present invention, a plurality of objects arranged in a target area can easily be individually identified.

FIG. 1 is a block diagram of an embodiment of the object identification device of the present invention.
FIG. 2 is a diagram showing the neural network configuration of SSD.
FIG. 3 is a diagram showing an example of a captured image.
FIG. 4 is a diagram showing the target area in the captured image shown in FIG. 3.
FIG. 5 is a diagram showing an example of an object detection result based on the captured image shown in FIG. 3.
FIG. 6 is a diagram showing an example of object detection information based on the captured image shown in FIG. 3.
FIG. 7 is a diagram showing an example of object placement information based on the object detection information shown in FIG. 6.
FIG. 8 is a diagram illustrating the operation of individually identifying a plurality of objects based on object detection information and object placement information.
FIG. 9 is a flowchart illustrating an embodiment of operation in the object placement information generation mode.
FIG. 10 is a flowchart illustrating an embodiment of operation in the object identification mode.
FIG. 11 is a diagram showing a method of determining the amount of change in the angle of view using feature vectors of captured images.
FIG. 12 is a flowchart illustrating another embodiment of operation in the object identification mode.

Embodiments of the present invention will be described below with reference to the drawings.
FIG. 1 is a block diagram of an embodiment of the object identification device of the present invention.
The object identification device of this embodiment includes an imaging means 10, a processing means 20, a storage means 30, an input means 40, a display means 50, and the like.

The imaging means 10 is preferably a digital camera using a CCD or CMOS. The imaging means 10 is arranged so that it can image an imaging area that includes the target area and the plurality of objects arranged in the target area. In the following description of the embodiment, the plurality of objects is assumed to include at least one type of object of which there are multiple instances having the same shape.
In the object detection processing described later, the category (type) of an object is determined based on the "shape of the object". The expression "same shape" is used to include "substantially the same shape", that is, shapes that the object detection processing determines to belong to objects of the same category.

The processing means 20 is composed of a CPU or the like. The processing means 20 includes a management means 21, an imaging information input means 22, a target area determination means 23, an object detection means 24, an object placement information generation means 25, an angle of view change amount determination means 26, and an object identification means 27. In this embodiment, the means 21 to 27 are all provided in the CPU that constitutes the processing means 20. Of course, at least one of the means 21 to 27 may instead be implemented on a CPU different from the CPU that constitutes the processing means 20.

The processing of each of the means 21 to 27 will be described with reference to FIGS. 2 to 8.
The following describes a case in which, as shown in FIG. 3, three volumes 120a to 120c of the same shape, two dials 130a and 130b of the same shape, and eleven buttons 140a to 140k of the same shape are arranged on an operation panel 110 of a control board 100.
The operation panel 110 is surrounded by an edge 111 that is formed in a rectangular shape by an upper edge 111a, a left edge 111b, a lower edge 111c, and a right edge 111d.
Reference numeral 11 denotes the imaging screen obtained when the imaging means 10 images the imaging area that includes the operation panel 110, the volumes 120a to 120c, the dials 130a and 130b, and the buttons 140a to 140k.
The operation panel 110 corresponds to the "target area" of the present invention, and the edge 111 (upper edge 111a to right edge 111d) of the operation panel 110 corresponds to the "target area edge (target area upper edge to target area right edge)" of the present invention. The volumes 120a to 120c, the dials 130a and 130b, and the buttons 140a to 140k correspond to the "plurality of objects arranged in the target area" of the present invention, and the volumes 120a to 120c, the dials 130a and 130b, or the buttons 140a to 140k correspond to the "plurality of objects having the same shape" of the present invention. The volume, dial, and button correspond to the "categories of objects" of the present invention.

The management means 21 manages the processing of the means 22 to 27.
The imaging information input means 22 receives the imaging information from the imaging means 10.

The target area determination means 23 determines the target area based on the imaging information from the imaging means 10 that was input via the imaging information input means 22.

Various methods can be used by the target area determination means 23 to determine the target area. For example, the SSD disclosed in Non-Patent Document 1 can be used.
As shown in FIG. 2, SSD is based on a multilayer CNN (convolutional neural network) and is composed of a layer that estimates candidate regions in which objects exist and a layer that determines the category of the object within each candidate region. The layer that estimates candidate regions divides the image information into a plurality of rectangular regions of predetermined sizes (default boxes) and estimates the candidate region of each object (bounding box) while taking the offset of the rectangular regions into account. The layer that determines the category of the object within a candidate region uses a separately trained CNN to determine the category of the object in that candidate region.
The SSD used in the target area determination means 23 is composed of a layer that estimates candidate regions for the target area in which the plurality of objects are arranged and a layer that determines the target area within the candidate regions. The target area determination means 23 configured by SSD determines the target area, and its position, on the imaging screen obtained by imaging the imaging area.
In this embodiment, as shown in FIG. 4, the operation panel 110 surrounded by the edge 111 is determined to be the target area 110.

The object detection means 24 detects the categories and positions of the plurality of objects arranged (displayed) in the target area on the imaging screen, based on the imaging information from the imaging means 10 input via the imaging information input means 22 and the target area determined by the target area determination means 23.
Various methods can be used by the object detection means 24 to detect the categories and positions of the plurality of objects arranged in the target area. For example, as with the target area determination means 23, the SSD shown in FIG. 2 can be used.
The SSD used in the object detection means 24 is composed of a layer that estimates the candidate region (bounding box) of each of the plurality of objects in the target area and a layer that detects the category of the object within each candidate region.
In this embodiment, the bounding boxes indicated by broken lines in FIG. 5 are detected, together with the volumes 120a to 120c, the dials 130a and 130b, and the buttons 140a to 140k inside the respective bounding boxes.

The object detection means 24 then outputs object detection information indicating the categories and positions of the plurality of objects arranged in the target area. As the position of an object, preferably the coordinates of the four corner points of its bounding box are used, or the width and height of the bounding box and the coordinates of its center of gravity. As the coordinates, xy coordinates in an xy coordinate system set in the target area are preferably used. The "xy coordinates in the xy coordinate system set in the target area" correspond to the "relative position in the target area" of the present invention.
In this embodiment, as shown in FIG. 6, object detection information is output indicating that three volumes V, two dials D, and eleven switches S (the buttons 140a to 140k) are arranged at the positions of the bounding boxes indicated by broken lines.
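The two position representations and the target-area coordinate system mentioned above can be illustrated with a short sketch. The `Box` type and both function names are hypothetical. Placing the origin at the top-left corner of the target area follows the coordinate system described for the operation panel; dividing by the width and height of the target area is an additional assumption of this sketch, not something the text states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle given by its top-left and bottom-right corners."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def to_relative(box: Box, area: Box) -> Box:
    """Express `box` with the top-left corner of `area` as the origin,
    scaled so that the area spans 0..1 on both axes (assumed scaling)."""
    width = area.x_max - area.x_min
    height = area.y_max - area.y_min
    if width <= 0 or height <= 0:
        raise ValueError("target area must have positive width and height")
    return Box(
        (box.x_min - area.x_min) / width,
        (box.y_min - area.y_min) / height,
        (box.x_max - area.x_min) / width,
        (box.y_max - area.y_min) / height,
    )


def center_size(box: Box):
    """Convert the corner form into (center_x, center_y, width, height)."""
    return (
        (box.x_min + box.x_max) / 2,
        (box.y_min + box.y_max) / 2,
        box.x_max - box.x_min,
        box.y_max - box.y_min,
    )
```

For a rectangle, the center of gravity coincides with the geometric center, so `center_size` yields the second representation directly from the four corners.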

The object placement information generation means 25 creates object placement information based on the object detection information output from the object detection means 24 (the categories and positions of the plurality of objects arranged in the target area) and stores it in the storage means 30. The object placement information includes identification information (ID) that identifies each object and position information that indicates its position. The method of assigning identification information (ID) to each object can be chosen as appropriate. The identification information (ID) includes category information indicating the category of the object. Preferably, objects having the same shape are given identification information consisting of category information indicating the type of object (common information) and a number (individual information). The numbers are assigned, for example, so that they increase in order along the direction in which the variance of the objects' center-of-gravity positions is large.
As the position of an object indicated by the object placement information, the position in the target area is preferably used, and more preferably the relative position in the target area.
In FIG. 7, an xy coordinate system is set with the upper edge 111a and the left edge 111b of the operation panel 110 (the target area upper edge and the target area left edge) as the x-axis and y-axis, respectively, and the xy coordinates in this coordinate system are used as the relative positions.
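One possible reading of the numbering rule (category plus a number that increases along the direction of larger variance) is sketched below. The function name and the tuple layout are hypothetical, and the sketch treats each category as a single group; numbering schemes that split one category into several groups, such as separate rows and columns, would need extra grouping logic that is not shown here.

```python
from collections import defaultdict
from statistics import pvariance


def assign_ids(detections):
    """Build placement information from detections.

    detections: iterable of (category, center_x, center_y)
    returns:    dict mapping an ID such as "V1" to (center_x, center_y)
    """
    by_category = defaultdict(list)
    for category, cx, cy in detections:
        by_category[category].append((cx, cy))

    placement = {}
    for category, centers in by_category.items():
        var_x = pvariance([c[0] for c in centers])
        var_y = pvariance([c[1] for c in centers])
        # Number along the axis on which the centers are spread most widely.
        axis = 0 if var_x >= var_y else 1
        ordered = sorted(centers, key=lambda c: c[axis])
        for number, center in enumerate(ordered, start=1):
            placement[f"{category}{number}"] = center
    return placement
```

Objects laid out in a row are numbered from left to right, and objects laid out in a column are numbered from top to bottom, because y grows downward in the coordinate system described above.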

For example, as shown in FIG. 7, the object placement information generation means 25 assigns identification information [V1] to [V3] to the three volumes [V] in order from the left along the x-axis, assigns identification information [D1] and [D2] to the two dials [D] in order from the left along the x-axis, assigns identification information [S1] to [S7] to the upper seven of the eleven buttons [S] in order from the left along the x-axis, and assigns identification information [S8] to [S11] to the four buttons [S] on the right side in order from the top.
In FIG. 7, the positions of the volumes [V1] to [V3], the dials [D1] and [D2], and the buttons [S1] to [S11] are indicated by the width and height of their respective bounding boxes and by the point G, which is the center of gravity. The position of point G is expressed by xy coordinates in the xy coordinate system.
The configuration of "representing the position of the center of gravity by xy coordinates in an xy coordinate system set in the imaging screen or in the target area of the imaging screen" is included in the configuration of "representing the position of the object by a relative position" of the present invention.
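The numbering rule described above (a category prefix plus a number assigned along the direction in which the centers of gravity are spread more widely) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Detection` record, the `assign_ids` function, and the use of the population variance per category are assumptions introduced here. Note that this simple rule numbers every object of a category along a single axis, so it does not reproduce the two-group numbering of the buttons [S1] to [S11] in FIG. 7.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import pvariance


@dataclass(frozen=True)
class Detection:
    """Hypothetical detection record: category plus bounding-box geometry."""
    category: str  # e.g. "V" (volume), "D" (dial), "S" (button)
    x: float       # x coordinate of the center of gravity in the target area
    y: float       # y coordinate of the center of gravity in the target area
    w: float       # bounding-box width
    h: float       # bounding-box height


def assign_ids(detections):
    """Return {ID: Detection}, numbering each category along its axis of larger variance."""
    by_category = defaultdict(list)
    for det in detections:
        by_category[det.category].append(det)

    placement = {}
    for category, items in by_category.items():
        var_x = pvariance([d.x for d in items])
        var_y = pvariance([d.y for d in items])
        # Sort along the axis with the larger spread; the other axis breaks ties.
        if var_x >= var_y:
            ordered = sorted(items, key=lambda d: (d.x, d.y))
        else:
            ordered = sorted(items, key=lambda d: (d.y, d.x))
        for number, det in enumerate(ordered, start=1):
            placement[f"{category}{number}"] = det
    return placement
```

With three volumes in a horizontal row and two dials stacked vertically, the volumes would be numbered from the left and the dials from the top.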

The angle-of-view change amount determination means 26 will be described later.

The object identification means 27 individually identifies the plurality of objects arranged in the target area based on the object detection information (the categories and positions of the plurality of objects arranged in the target area) output from the object detection means 24 and the object placement information (the identification information and position of each object) stored in the storage means 30.
Various methods can be used as the object identification method of the object identification means 27. For example, the positions of objects of the same shape are extracted from the object detection information and from the object placement information. The position extracted from the object detection information is then compared with the position extracted from the object placement information to determine whether the two positions match. This determination is made while allowing a predetermined error range. That is, if the difference between the two positions is within the predetermined error range, the two positions are determined to match; if the difference exceeds the predetermined error range, the two positions are determined not to match. If the two positions are determined to match, the object located at the position extracted from the object detection information is identified as the object indicated by the identification information assigned to the object located at the position extracted from the object placement information.

Specifically, as shown in FIG. 8, the object identification means 27 compares the object detection information output from the object detection means 24 with the object placement information stored in the storage means 30.
For example, the position of the volume [V](1) included in the object detection information (the width and height of its bounding box and the xy coordinates of its center of gravity) is compared with the positions of the volumes [V1] to [V3] included in the object placement information, which have the same shape (same category) as the volume [V](1) (the width and height of their bounding boxes and the xy coordinates of their centers of gravity), and it is determined whether each difference is within a predetermined range. If the differences between the position of the volume [V](1) and the positions of the volumes [V2] and [V3] exceed the predetermined range, while the difference between the position of the volume [V](1) and the position of the volume [V1] is within the predetermined range, the volume [V](1) included in the object detection information is identified (individually identified) as the volume [V1] included in the object placement information.
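The comparison described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the tuple layout `(category, x, y, w, h)`, the function name `identify`, and the use of a single tolerance `tol` applied to every component are introduced here and are not specified in the text, which only requires that a predetermined error range be allowed.

```python
def identify(detection, placement, tol=5.0):
    """Return the ID of the registered object that matches `detection`, or None.

    detection: (category, x, y, w, h) taken from the object detection information.
    placement: {ID: (category, x, y, w, h)} taken from the object placement information.
    tol: hypothetical error range allowed for each compared component.
    """
    category, x, y, w, h = detection
    for object_id, (reg_cat, rx, ry, rw, rh) in placement.items():
        if reg_cat != category:
            continue  # only objects of the same shape (category) are compared
        differences = (abs(x - rx), abs(y - ry), abs(w - rw), abs(h - rh))
        if all(diff <= tol for diff in differences):
            return object_id  # positions match within the error range
    return None  # no registered object lies within the error range
```

The sketch returns the first registered object within the tolerance. If the tolerance were larger than the spacing between neighboring objects of the same category, more than one candidate could match, so in practice the error range would presumably be chosen smaller than that spacing.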

The storage means 30 is constituted by a ROM, a RAM, and the like, and stores the programs that execute the processing of the means 21 to 27 as well as various data. The object placement information generated by the object placement information generation means 25 is stored in the object placement information storage section 31 of the storage means 30.
The input means 40 is constituted by a keyboard or the like and is used to input various information.
The display means 50 is constituted by a liquid crystal display device, an organic EL display device, or the like, and displays various information. When the display means 50 is a display means that allows information to be input by touching a display portion shown on the display screen, the input means 40 can be omitted.

Next, the operation of the object identification device of this embodiment will be described. The object identification device of this embodiment is configured so that it can be set to an object placement information generation mode or an object identification mode.
For example, the device is configured to be set to the object placement information generation mode or the object identification mode when object placement information generation mode setting information or object identification mode setting information is input from the input means 40. Alternatively, the device is configured so that, when object identification start information is input from the input means 40, the object placement information generation mode is set first, and the object identification mode is set after the object placement information has been generated and stored in the storage means 30.

The operation when the object placement information generation mode is set will be described with reference to the flowchart shown in FIG. 9.
In step A1, the imaging information captured by the imaging means 10 is input. The processing of step A1 is executed by the imaging information input means 22.
In step A2, the target area is determined based on the input imaging information. The processing of step A2 is executed by the target area determination means 23.
In step A3, the categories and positions of the plurality of objects arranged in the target area are detected based on the input imaging information and the target area determined in step A2, and object detection information is output. The processing of step A3 is executed by the object detection means 24.
In step A4, object placement information including identification information that identifies each object and the position of each object is generated based on the object detection information detected in step A3, which indicates the categories and positions of the objects arranged in the target area. The processing of step A4 is executed by the object placement information generation means 25.
In step A5, the object placement information generated in step A4 is stored in the storage means 30. The processing of step A5 is executed by the object placement information generation means 25.

The operation when the object identification mode is set will be described with reference to the flowchart shown in FIG. 10.
In step B1, the imaging information captured by the imaging means 10 is input. The processing of step B1 is executed by the imaging information input means 22.
In step B2, the target area is determined based on the input imaging information. The processing of step B2 is executed by the target area determination means 23.
In step B3, the categories and positions of the plurality of objects arranged in the target area are detected based on the input imaging information and the target area determined in step B2, and object detection information is output. The processing of step B3 is executed by the object detection means 24.
In step B4, the object detection information detected in step B3, which indicates the categories and positions of the objects arranged in the target area, is compared with the object placement information (the identification information and position of each object) stored in the storage means 30 (generated in step A4 of the object placement information generation mode), and the plurality of objects arranged in the target area detected in step B3 are individually identified. The processing of step B4 is executed by the object identification means 27.
After the processing of step B4 is completed, the process returns to step B1. The processing of steps B1 to B4 may be repeated continuously or may be executed at predetermined time intervals.

Next, a different embodiment of the present invention will be described.
When the imaging position or imaging angle (imaging direction) of the imaging means 10 is changed, the positions (xy coordinates) on the imaging screen of the plurality of objects arranged in the target area change.
If the positions on the imaging screen of the plurality of objects arranged in the target area change significantly, the accuracy of matching the object detection information against the object placement information, that is, the accuracy of individually identifying each object, may decrease.
For this reason, when the imaging position or imaging angle (imaging direction) of the imaging means 10 has been changed, it is preferable to regenerate the object placement information so as to prevent a decrease in the accuracy of individually identifying each object.
A change in the imaging position or imaging angle (imaging direction) of the imaging means 10 can be detected from the amount of change in the angle of view of the imaging screen, as described later.

This embodiment includes the angle-of-view change amount determination means 26 shown in FIG. 1.
The angle-of-view change amount determination means 26 determines whether the amount of change in the angle of view between the imaging screen (M) used to generate the object placement information stored in the storage means 30 and the imaging screen (P) indicated by the imaging information output from the imaging means 10 exceeds a predetermined range. The angle of view changes with the imaging position and imaging angle (imaging direction) of the imaging means 10.
Various methods can be used by the angle-of-view change amount determination means 26 to determine whether the amount of change in the angle of view exceeds the predetermined range (angle-of-view change amount determination methods).
For example, a method that determines whether the similarity between the imaging screen (M) and the imaging screen (P) is lower than a predetermined value can be used. The similarity between the imaging screen (M) and the imaging screen (P) can be determined from the similarity between the feature points of the imaging screen (M) and the feature points of the imaging screen (P). Here, the feature points of the imaging screen (M) and the imaging screen (P) can be detected using a method based on local features, such as SIFT. As described later, the similarity between the imaging screen (M) and the imaging screen (P) can be determined by using their feature points.

Here, the amount of change in the angle of view between the target area (M) on the imaging screen (M) and the target area (P) on the imaging screen (P) is equivalent to the amount of change in the angle of view between the imaging screen (M) and the imaging screen (P).
In this embodiment, as shown in FIG. 11, the angle-of-view change amount determination means 26 determines whether the amount of change in the angle of view between the imaging screen (M) and the imaging screen (P) exceeds the predetermined range by determining whether the similarity between the feature points of the target area (M) on the imaging screen (M) used to generate the object placement information stored in the storage means 30 and the feature points of the target area (P) on the imaging screen (P) indicated by the imaging information output from the imaging means 10 is lower than a predetermined value.
As the feature points of the target area, for example, the locations illustrated by broken lines in FIG. 11 are detected. Preferably, a plurality of feature points are detected.
Then, from among the feature points of the two target areas, the set of feature points whose correspondence is determined to match, based on the similarity of their local features, is used to generate a feature vector for each target area. It is then determined whether the similarity between the feature vectors of the two target areas is greater than or equal to a predetermined value.
When the feature vector of the target area (M) is denoted <M> and the feature vector of the target area (P) is denoted <P>, the similarity of the feature vectors is expressed, for example, by the following cosine measure of the two vectors (referred to here as the cosine distance; the formula is what is commonly called the cosine similarity).
Sim(<M>, <P>) = (<M> · <P>) / (|<M>| |<P>|)
If the cosine distance is greater than or equal to the predetermined value, it is determined that the similarity of the feature vectors is high and that the amount of change in the angle of view does not exceed the predetermined range. If it is less than the predetermined value, it is determined that the similarity of the feature vectors is low and that the amount of change in the angle of view exceeds the predetermined range.
Various methods other than those described above can be used for feature point detection, feature vector calculation, and similarity determination.
Furthermore, various methods other than the above can be used to detect the amount of change in the angle of view, such as a method that uses optical flow to calculate the amount of change caused by pixel movement as a velocity vector.
The feature points of the imaging screen (M) and the imaging screen (P) are not limited to the feature points of the target area (M) on the imaging screen (M) and the target area (P) on the imaging screen (P).
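The similarity test on the two feature vectors can be sketched as follows. The function names and the threshold value of 0.9 are assumptions chosen for illustration; the text only states that the value of Sim(<M>, <P>) is compared with a predetermined value. How the feature vectors themselves are built from matched feature points is outside this sketch.

```python
import math


def cosine_similarity(m, p):
    """Sim(<M>, <P>) = (<M> . <P>) / (|<M>| |<P>|) for two equal-length vectors."""
    if len(m) != len(p):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(m, p))
    norm = math.sqrt(sum(a * a for a in m)) * math.sqrt(sum(b * b for b in p))
    if norm == 0.0:
        raise ValueError("similarity is undefined for a zero vector")
    return dot / norm


def view_angle_changed(m, p, threshold=0.9):
    """Return True when the similarity falls below the (hypothetical) threshold.

    A value below the threshold is taken to mean that the amount of change in
    the angle of view exceeds the predetermined range.
    """
    return cosine_similarity(m, p) < threshold
```

Because the measure depends only on the direction of the vectors, two feature vectors that differ by a positive scale factor are treated as unchanged.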

The operation of the object identification device of this embodiment will be described. The object identification device of this embodiment is configured so that it can be set to the object placement information generation mode or the object identification mode.
The operation when the object placement information generation mode is set is the same as the operation shown in FIG. 9, so its description is omitted.

The operation when the object identification mode is set will be described with reference to the flowchart shown in FIG. 12.
The operations of steps C1 and C2 are the same as those of steps B1 and B2 shown in FIG. 10.
In step C3, it is determined whether the amount of change in the angle of view exceeds the predetermined range. If the amount of change in the angle of view exceeds the predetermined range, the process proceeds to step C4. If it does not exceed the predetermined range, the process proceeds to step C7. The processing of step C3 is executed by the angle-of-view change amount determination means 26.
In steps C4 to C6, the object placement information is generated (regenerated) and stored in the storage means 30 in the same way as in steps A3 to A5 shown in FIG. 9.
In steps C7 and C8, the plurality of objects arranged in the target area are individually identified in the same way as in steps B3 and B4 shown in FIG. 10.
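The branching in steps C2 to C8 can be sketched as a single function that receives the individual means as callables. Everything named here (`identification_step`, the `store` dictionary and its keys, and the callable parameters) is hypothetical. Two further assumptions are made because the text does not state them: the frame used for regeneration is stored as the new reference screen (M), and after regeneration the pass ends without identification so that the next frame is processed from step C1.

```python
def identification_step(frame, store, find_area, detect, make_placement,
                        view_changed, identify_all):
    """Run steps C2 to C8 on one captured frame (step C1 is the capture itself).

    store: dict with keys "reference_frame" (screen M) and "placement".
    Returns the identification result, or None when the placement was regenerated.
    """
    area = find_area(frame)                               # C2: determine target area
    if view_changed(store["reference_frame"], frame):     # C3: angle-of-view check
        detections = detect(frame, area)                  # C4: detect objects
        placement = make_placement(detections)            # C5: regenerate placement
        store["placement"] = placement                    # C6: store placement
        store["reference_frame"] = frame                  # assumed: update screen M
        return None                                       # assumed: wait for next frame
    detections = detect(frame, area)                      # C7: detect objects
    return identify_all(detections, store["placement"])   # C8: individual identification
```

Passing the means in as callables keeps the control flow separate from the detector and matcher, so the same function could be exercised with simple stand-ins before a real detector is attached.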

A preferred application of the present invention is a monitoring device or the like that monitors the states of a plurality of operation sections (including a plurality of operation sections of the same shape) arranged on an operation panel or the like of a control board.
If an object detection means based on the SSD disclosed in Non-Patent Document 1 or the YOLO disclosed in Non-Patent Document 2 is used in such a monitoring device, a plurality of objects of the same shape existing in the target area cannot be individually identified, as described above.
For this reason, conventionally, placement information (the position and ID of each object) indicating the placement state of the plurality of objects arranged in the target area has been created (registered) in advance as a two-dimensional map, and each object has been individually identified by comparing the object detection information (the positions and categories of the plurality of objects arranged in the target area) detected based on the imaging information output from the imaging means with the object placement information (the position and ID of each object) indicated by the two-dimensional map.
However, the two-dimensional map must be created manually each time objects are registered, and creating it requires effort and time.
In contrast, when the object identification device of the present invention is used, the object placement information is generated automatically, so there is no need to manually create a two-dimensional map showing the placement state of the objects arranged in the target area.
In addition, each time imaging information is input, the plurality of operation sections arranged on the operation panel are individually identified by comparing the object detection information with the object placement information, so the operation section operated by an operator and the state of each operation section can be easily identified.

The present invention is not limited to the configurations described in the embodiments, and various changes, additions, and deletions are possible.
Although the processing means 20 is constituted by the management means 21 through the object identification means 27, the configuration of the processing means 20 is not limited to this. For example, unnecessary means can be omitted, and a plurality of means can be integrated.
The configurations of the target area determination means 23, the object detection means 24, the object placement information generation means 25, the angle-of-view change amount determination means 26, and the object identification means 27 are not limited to those described in the embodiments and can be variously modified without departing from the gist of the present invention.
The processing shown in FIGS. 9, 10, and 12 can be modified in various ways.

10 Imaging means
11 Imaging screen
20 Processing means
21 Management means
22 Imaging information input means
23 Target area determination means
24 Object detection means
25 Object placement information generation means
26 Angle-of-view change amount determination means
27 Object identification means
30 Storage means
31 Object placement information storage section
40 Input means
50 Display means
100 Control board
110 Operation panel (target area)
111 Edge (target area edge)
111a Upper edge (target area upper edge)
111b Left edge (target area left edge)
111c Lower edge (target area lower edge)
111d Right edge (target area right edge)
120a to 120c Volumes
130a, 130b Dials
140a to 140k Buttons

Claims

An object identification device that individually identifies a plurality of objects, including a plurality of objects of the same shape, arranged in a target area,
the device comprising an imaging means, a target area determination means, an object detection means, an object placement information generation means, an object identification means, and a storage means, wherein
the imaging means outputs imaging information indicating an imaging screen obtained by imaging an imaging area that includes the target area and the plurality of objects arranged in the target area,
the target area determination means determines the target area on the imaging screen based on the imaging information output from the imaging means,
the object detection means detects the categories and positions of the plurality of objects arranged in the target area on the imaging screen, based on the imaging information output from the imaging means and the target area on the imaging screen determined by the target area determination means,
the object placement information generation means generates object placement information indicating identification information and positions of the plurality of objects, based on the categories and positions of the plurality of objects arranged in the target area on the imaging screen detected by the object detection means, and stores it in the storage means,
the object identification means individually identifies the plurality of objects arranged in the target area on the imaging screen, based on the categories and positions of the plurality of objects arranged in the target area on the imaging screen detected by the object detection means and on the identification information and positions of the plurality of objects indicated by the object placement information stored in the storage means,
the device can be set to an object placement information generation mode and an object identification mode,
when the object placement information generation mode is set, the object placement information is generated by the target area determination means, the object detection means, and the object placement information generation means and is stored in the storage means, and
when the object identification mode is set, the plurality of objects arranged in the target area on the imaging screen are individually identified by the target area determination means, the object detection means, and the object identification means.

The object identification device according to claim 1,
further comprising an angle-of-view change amount determination means, wherein
the angle-of-view change amount determination means determines whether the amount of change in the angle of view between the imaging screen used to generate the object placement information stored in the storage means and the imaging screen indicated by the imaging information output from the imaging means exceeds a predetermined range, and
when the object identification mode is set and the angle-of-view change amount determination means determines that the amount of change in the angle of view exceeds the predetermined range, the object placement information is generated by the target area determination means, the object detection means, and the object placement information generation means and is stored in the storage means.

The object identification device according to claim 1 or 2, wherein
the object identification means individually identifies the plurality of objects by comparing, while allowing a predetermined error range, the categories and positions of the objects arranged in the target area on the imaging screen detected by the object detection means with the identification information and positions indicated by the object placement information stored in the storage means.

The object identification device according to any one of claims 1 to 3, wherein
the object detection means detects the categories of the plurality of objects and their relative positions in the target area, and
the object placement information generation means generates object placement information indicating the identification information of the plurality of objects and their relative positions in the target area.

The object identification device according to any one of claims 1 to 4, wherein
the target area determination means and the object detection means are constituted by a multilayer neural network.