JP7624875B2

JP7624875B2 - Task recognition device and task recognition method

Info

Publication number: JP7624875B2
Application number: JP2021083456A
Authority: JP
Inventors: 卓馬寺田; 洋登永吉; 克行中村
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2021-05-17
Filing date: 2021-05-17
Publication date: 2025-01-31
Anticipated expiration: 2041-05-17
Also published as: WO2022244536A1; JP2022176819A

Description

The present invention relates to a task recognition device and a task recognition method. In particular, it relates to a task recognition device and method suited to work monitoring, in which tasks are recognized by relating workers or robots working at a manufacturing site or similar location to the objects around them.

At manufacturing sites, improvement activities are carried out to keep human work safe and to make it more accurate, in order to maintain product quality. One such activity is the development of technology that recognizes the motions of human workers, which is being applied in industry. Examples include detecting whether workers are acting in accordance with the site's standard operating procedures, and calculating how much time a specific task takes. The manufacturing sites targeted by such technology are typically workplaces where a variety of machines and tools are installed, where machine movements differ from process to process, and where different tools are used for each task.

Meanwhile, in computer vision for task recognition, rapid advances in computing performance and the ease of collecting large amounts of general data have made machine-learning-based motion recognition an active field. In this motion recognition technology, video data is input and an information processing device recognizes the motions performed by the people in the video. An example of a public database for this kind of motion recognition is Non-Patent Document 1, in which labels such as "Playing Piano," "Surfing," and "Table Tennis Shot" are assigned to each item of image data.

Object detection algorithms for acquiring object regions from images are described in, for example, Non-Patent Documents 2 and 3.

In relation to this, Patent Document 1, for example, discloses a method in which an information processing device links objects present in an image. Patent Document 1 proposes an information processing device for counting people that finds the correspondence between a mirror image reflected in a mirror and the corresponding real image, enabling accurate person counting even in images that contain mirror reflections. To find this correspondence, the device identifies, among multiple moving objects in the image, two or more that correspond to the same real moving object, based on the similarity of their speeds derived from changes in position. It then determines whether the temporal changes in their amounts of movement are similar in shape, and thereby estimates the link between the two objects (the real image and the mirror image).
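The shape comparison attributed to Patent Document 1 can be illustrated with a minimal sketch. It assumes that each moving object is given as a track of (x, y) positions, that "similar in shape" means one speed profile is roughly a scaled copy of the other, and that cosine similarity with a 0.95 threshold is an acceptable stand-in for the comparison. All three are assumptions; Patent Document 1's exact criterion is not given here, and the function names are hypothetical.

```python
import math


def speeds(track):
    """Per-frame speed magnitudes from a list of (x, y) positions."""
    return [math.dist(p, q) for p, q in zip(track, track[1:])]


def shape_similarity(a, b):
    """Cosine similarity of two speed series (assumed similarity measure).

    A value of 1.0 means one series is a positively scaled copy of the other,
    which is what a mirror image of a real object is expected to produce.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def same_real_object(track_a, track_b, threshold=0.95):
    """Judge two tracks to be the same real object (hypothetical threshold)."""
    return shape_similarity(speeds(track_a), speeds(track_b)) >= threshold
```

For example, a real track with speeds 1, 2, 3 and a mirror track with speeds 0.5, 1.0, 1.5 give a similarity of 1.0 up to rounding. Two different objects that happen to move at the same time but with different speed profiles fall below the threshold.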

Techniques for behavior recognition using sensors and cameras are described in, for example, Non-Patent Document 4.

In addition, Non-Patent Document 5 describes a technique for behavior recognition using a GCN (Graph Convolution Network) as one form of machine learning. A GCN is a branch of so-called convolutional neural networks. It generates a graph made up of nodes, each holding the feature and class information of an object, and edges connecting those nodes, and it infers the class of a target node or of the entire graph from the features of adjacent nodes. In a GCN, the connections between nodes are either defined in advance or determined automatically according to changes in the video. Non-Patent Document 5 adopts two approaches. The first detects the positions of objects in video captured by a camera, calculates an overlap rate from the degree of overlap between objects, and estimates the relationship between two objects according to that overlap rate. The second regards similar objects appearing in time-series frame images as the same object and relates objects across frames in the time direction. With these approaches, a graph can be generated by predefining or estimating the relationships between objects, enabling behavior recognition that takes those objects into account.
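The overlap-rate approach can be sketched as follows. Intersection-over-union (IoU) of two bounding boxes is assumed as the definition of "overlap rate", and the 0.1 edge threshold is hypothetical. The sketch only illustrates the general idea, not Non-Patent Document 5's exact procedure.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def overlap_edges(boxes, threshold=0.1):
    """Connect every pair of named boxes whose IoU exceeds the threshold."""
    names = sorted(boxes)
    return [
        (p, q)
        for i, p in enumerate(names)
        for q in names[i + 1:]
        if iou(boxes[p], boxes[q]) > threshold
    ]
```

With this rule, two objects whose boxes never overlap get no edge at all, however closely their activity is coordinated.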

Patent Document 1: JP 2020-87358 A

Non-Patent Document 1: University of Central Florida, "UCF101 - Action Recognition Data Set", [online], [retrieved April 16, 2021], Internet <URL: https://www.crcv.ucf.edu/research/data-sets/ucf101/>
Non-Patent Document 2: Ren, Shaoqing, et al., "Faster R-CNN: Towards real-time object detection with region proposal networks", IEEE Transactions on Pattern Analysis and Machine Intelligence 39.6, 2016, pp. 1137-1149
Non-Patent Document 3: Redmon, Joseph, et al., "You only look once: Unified, real-time object detection" (YOLO), Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016
Non-Patent Document 4: Vrigkas, M., Nikou, C. and Kakadiaris, I. A., "A Review of Human Activity Recognition Methods", Frontiers in Robotics and AI, 2015
Non-Patent Document 5: Wang, Xiaolong, and Abhinav Gupta, "Videos as space-time region graphs", Proceedings of the European Conference on Computer Vision (ECCV), 2018

When computer-vision motion recognition is applied to workers at manufacturing sites, it must recognize motions that are specific to each site. A common approach is to focus only on the target worker and capture his or her movements. With this approach, however, it is difficult to distinguish multiple different tasks that are performed in the same posture. Furthermore, at sites where machines are present, performing a task may involve not only the worker but also surrounding machines moving at the same time, or tools being handled at the same time. In other words, the object (person or thing) that is the target of task recognition and the objects around it may change simultaneously.

At manufacturing sites, the appearance of a machine may change without the machine moving significantly. As described above, Non-Patent Document 5 uses the overlap rate or the similarity between objects to estimate their relationships, but with either approach the relationships between objects at a manufacturing site may not be handled properly. For example, when a machine door opens during processing and a new workpiece is carried in and loaded into the machine, the door and the workpiece may not overlap in image space, depending on where the camera is installed, even though they are operating simultaneously. It is therefore necessary to estimate relationships with a method that does not depend on the overlap or distance between objects.

Patent Document 1, which proposes a method for finding the link between real and mirror images, assumes that the linked objects are the same physical object. When a task is performed, multiple objects at the manufacturing site (machines, workpieces, workers) may move simultaneously, but their amounts of movement over time differ from object to object. It is therefore difficult to estimate the relationships between machines, workpieces, and workers at a manufacturing site from the similarity of their movement amounts over time.

In addition, Patent Document 1 and Non-Patent Document 5 use indices such as overlap rate and similarity to determine the connections between objects, and make the determination against a threshold. Consequently, these methods judge two objects to be related only when the indices take extreme values. When objects should be recognized as connected even though their overlap rate or similarity does not exceed the threshold, such methods cannot make that determination, and an additional algorithm is needed to estimate the relationship from, for example, the positional relationship of the objects.

An object of the present invention is to provide a task recognition device that, when recognizing tasks at manufacturing sites and the like on the basis of connections between objects, can estimate the relationships between objects in a general-purpose way with a simple algorithm.

The task recognition device of the present invention is preferably a device that analyzes image data and thereby recognizes the tasks of objects involved in the work. It includes: an image acquisition unit that acquires image data captured by an imaging device; an analysis area detection unit that analyzes the image data obtained from the image acquisition unit and detects areas related to objects; a feature extraction unit that extracts image features of each object-related area detected by the analysis area detection unit and calculates the amount of change in those features within a predetermined period; and a relevance estimation unit that calculates an object relevance between a first object and a second object based on a first amount of change and a second amount of change. The first amount of change is the change, within the predetermined period, in the image feature of the area detected for the first object; the second amount of change is the change, over the same period, in the image feature of the area detected for the second object.

According to the present invention, it is possible to provide a task recognition device that, when recognizing tasks at manufacturing sites and the like on the basis of connections between objects, can estimate the relationships between objects in a general-purpose way with a simple algorithm.

FIG. 1 is a schematic configuration diagram of a task recognition system.
FIG. 2 is a functional configuration diagram of a task recognition device according to Embodiment 1.
FIG. 3 is a hardware/software configuration diagram of the task recognition device.
FIG. 4 is a flowchart showing a series of processes, from image acquisition to learning, in the task recognition device according to Embodiment 1.
FIG. 5A is a flowchart showing details of the feature change extraction process (for motion change).
FIG. 5B is a flowchart showing details of the feature change extraction process (for texture change).
FIG. 6 is a flowchart showing details of the relevance estimation process.
FIG. 7 is a flowchart showing details of the graph generation process.
FIG. 8 is a flowchart showing a series of processes, from image acquisition to inference, in the task recognition device.
FIG. 9 is a diagram illustrating a concrete example of calculating object relevance from image data.
FIG. 10 is a diagram illustrating an example of applying object relevance to graph generation.
FIG. 11 is a diagram illustrating how node task labels are assigned to nodes as an inference result.
FIG. 12 is a functional configuration diagram of a task recognition device according to Embodiment 2.
FIG. 13 is a flowchart showing a series of processes, from image acquisition to learning, in the task recognition device according to Embodiment 2.
FIG. 14 is a diagram showing an example of an object relevance editing screen.
FIG. 15 is a diagram illustrating a concrete example of calculating object relevance from image data according to Embodiment 3.

Each embodiment of the present invention will be described below with reference to FIGS. 1 to 15.

[Embodiment 1]
Embodiment 1 of the present invention will be described below with reference to FIGS. 1 to 11.

This embodiment describes an example of a task recognition system that detects a worker and surrounding objects from images of the user at work, represents their state as a graph structure, and recognizes work actions through machine learning with a GCN based on that graph structure. Using a graph structure as the data structure for recognizing work actions is only one example; a similar network structure or algorithm may be used when applying this embodiment.

First, the configuration of the task recognition system will be described with reference to FIGS. 1 to 3.

As shown in FIG. 1, the task recognition system 1 consists of an image acquisition device 6, a sensor 3, and a task recognition device 100, connected via a communication means 4.

The communication means 4 may be wired or wireless, and may be a LAN (Local Area Network) or a WAN (Wide Area Network) such as the Internet. It may also be a communication means that complies with a serial standard such as USB (Universal Serial Bus) or RS-232C.

The image acquisition device 6 (imaging device) acquires image data showing the worker 2, the surrounding objects 5, and so on. It is, for example, a camera that acquires (captures) video or still image data, such as a digital (RGB) camera, an infrared camera, a thermographic camera, a time-of-flight (TOF) camera, or a stereo camera. Although a single image acquisition device 6 is shown in FIG. 1, multiple cameras may be used, for example when there are multiple subjects to capture, and images from the different cameras may be used together.

The sensor 3 is installed in the work environment where the worker 2 works; it detects the state of the worker 2 and of the work environment and outputs physical information. The sensor 3 is, for example, a motion detection sensor, a human presence sensor, a temperature sensor, a humidity sensor, an acceleration sensor, a speed sensor, an acoustic sensor (microphone), an ultrasonic sensor, a vibration sensor, a millimeter-wave radar, a laser radar (LIDAR: Laser Imaging Detection and Ranging), or an infrared depth sensor.

The task recognition device 100 recognizes the work actions of the worker 2 or of a surrounding object 5 based on the image data acquired by the image acquisition device 6.

Next, the functional configuration of the task recognition device will be described with reference to FIG. 2.
As shown in FIG. 2, the task recognition device 100 has the following functional units: an image acquisition unit 101, an analysis area detection unit 102, a feature extraction unit 103, a relevance estimation unit 104, a graph generation unit 105, a task learning unit 106, a task inference unit 107, and a storage unit 110.

The image acquisition unit 101 is a functional unit that acquires the image data 200 from the image acquisition device 6. The image data 200 is, for example, still image data sent from the image acquisition device 6, or the data of the frames that make up video data sent from it.

The analysis area detection unit 102 is a functional unit that detects object areas in the image obtained by the image acquisition unit 101. An object area can be set manually, by the user drawing a rectangle on the image, or acquired automatically with an object detection algorithm such as those described in Non-Patent Documents 2 and 3; either method may be used.

The feature extraction unit 103 is a functional unit that extracts image features and the amounts of change in those features.

For feature extraction, the feature extraction unit 103 extracts features obtained from the image, such as color features, motion features, and CNN (Convolutional Neural Network) features. In this embodiment, each feature is assumed to be a scalar value. For change extraction, the feature extraction unit 103 retains data in time series, such as the preceding and following frames, and extracts the amount of change in a feature from the difference between the features of those frames. Besides features extracted from a given area, the change may also be that of a quantity that represents the area as a single point, such as its position.
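A minimal sketch of scalar per-area features, assuming a grayscale image stored as a list of rows and rectangular areas in pixel coordinates. Mean gray level stands in for a "color feature" and the box center for the "single point" representation. The `Region` class and both feature choices are hypothetical illustrations, not the embodiment's actual features.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Rectangular object area; x1, y1 inclusive, x2, y2 exclusive (hypothetical)."""
    x1: int
    y1: int
    x2: int
    y2: int


def mean_intensity(image, region):
    """Scalar color-like feature: mean gray value inside the region."""
    values = [
        image[y][x]
        for y in range(region.y1, region.y2)
        for x in range(region.x1, region.x2)
    ]
    if not values:
        raise ValueError("empty region")
    return sum(values) / len(values)


def center(region):
    """Single-point representation of the region (its center)."""
    return ((region.x1 + region.x2) / 2, (region.y1 + region.y2) / 2)
```

Computing such a feature for the same region in every frame yields the per-object time series from which the change amounts are taken.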

The relevance estimation unit 104 is a functional unit that calculates an object relevance, which estimates the strength of the connection between objects, from the feature changes output by the feature extraction unit 103. The relevance estimation unit 104 first normalizes the change obtained for each object to the range 0 to 1 using the correction parameters 201. It then combines the normalized feature changes of the objects, for example by taking their product, to obtain the strength of the relationship between them, and outputs the result as the object relevance. The correction parameters 201 are used to normalize feature changes; they compensate for differences between objects in rectangle size and in average amount of change. Although the example above computes the relationship strength as the product of the feature changes, other methods such as their sum or a weighted linear sum may also be used.
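The normalize-then-combine step can be sketched as follows. The form of the correction parameter 201 is not specified here, so it is assumed to be a per-object scale (a typical upper bound of that object's change): dividing by it and clipping yields a value in [0, 1]. The function names and the default weights are hypothetical.

```python
from itertools import combinations


def normalize(change, scale):
    """Map a raw change amount into [0, 1].

    `scale` stands in for the correction parameter 201 and is assumed to be a
    typical upper bound of the object's change over the period.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    return min(max(change / scale, 0.0), 1.0)


def object_relevance(change_a, change_b, scale_a, scale_b,
                     mode="product", weights=(0.5, 0.5)):
    """Combine two normalized changes observed in the same period."""
    a = normalize(change_a, scale_a)
    b = normalize(change_b, scale_b)
    if mode == "product":
        return a * b
    if mode == "sum":
        return a + b
    if mode == "weighted":  # weighted linear sum
        return weights[0] * a + weights[1] * b
    raise ValueError(f"unknown mode: {mode}")


def relevance_table(changes, scales, mode="product"):
    """Relevance for every unordered pair of named objects."""
    return {
        (p, q): object_relevance(changes[p], changes[q], scales[p], scales[q], mode)
        for p, q in combinations(sorted(changes), 2)
    }
```

For instance, with changes {"door": 30, "workpiece": 12, "worker": 2} and scales {"door": 40, "workpiece": 16, "worker": 20}, the door and workpiece, which change together, score 0.5625, while the door and the almost idle worker score about 0.075. Because the score uses no overlap or distance, it does not matter whether the door and workpiece boxes touch in the image. With the product, a pair scores high only when both objects change; the sum variant is more lenient.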

The graph generation unit 105 is a functional unit that generates the graph used by the GCN. It creates one graph node for each area acquired by the analysis area detection unit 102, assigns to each node the features obtained by the feature extraction of the feature extraction unit 103, and assigns to it a node identification label 202. A node identification label 202 is the name of an object at the manufacturing site, such as "robot", "worker", or "processing machine". Assuming that every pair of nodes is connected, the unit places edges between the nodes and weights each edge with the object relevance obtained from the relevance estimation unit 104.
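A minimal sketch of the graph described above: one node per detected area carrying its identification label and scalar feature, every pair of nodes connected, and each edge weighted by object relevance. The relevance map keyed by node-index pairs, and the `Node`/`Graph` types, are hypothetical conventions for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str      # node identification label, e.g. "robot", "worker"
    feature: float  # scalar feature from the feature extraction step


@dataclass
class Graph:
    nodes: list
    adjacency: list = field(default_factory=list)  # symmetric edge weights


def build_graph(nodes, relevance):
    """Fully connect the nodes and weight each edge by object relevance.

    `relevance` maps an index pair (i, j) with i < j to a relevance score;
    pairs missing from the map get weight 0.0.
    """
    n = len(nodes)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            weight = relevance.get((i, j), 0.0)
            adj[i][j] = adj[j][i] = weight
    return Graph(nodes=nodes, adjacency=adj)
```

Keeping the relevance as a continuous edge weight, rather than thresholding it into edge/no-edge, lets weakly related objects still contribute a little to each other during learning.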

The task learning unit 106 is a functional unit that performs machine learning with a GCN, based on the node task labels 203 (the ground-truth task labels) and the graph generated by the graph generation unit 105, and generates an inference model 210. In this learning process, the task learning unit 106 trains a GCN algorithm on the generated graph and outputs the inference model 210 as the result. The node task labels 203 are ground-truth labels for task recognition: for example, "welding" or "screw tightening" when the node is a "robot", and "control panel operation" or "floor cleaning" when the node is a "worker".
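The GCN architecture is not specified here. As one common formulation, the symmetric-normalized propagation rule popularized by Kipf and Welling, a single layer computes H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), where A is the relevance-weighted adjacency, I adds self-loops, and D is the degree matrix of A + I. The sketch below implements only that forward pass in plain Python. Training W against the node task labels 203 is omitted, and the function names are hypothetical.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, feats, weight):
    """One GCN layer with self-loops and symmetric degree normalization.

    adj:    n x n non-negative edge weights (e.g. object relevance)
    feats:  n x f node features
    weight: f x k learnable weights
    """
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # >= 1 thanks to the self-loop
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]  # ReLU
```

With no edges, each node sees only its own features; with a strong edge, the two nodes' features are mixed, which is how a related machine can inform the task predicted for a worker.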

The task inference unit 107 is a functional unit that uses the inference model 210, obtained through learning by the task learning unit 106, to perform inference on image data 200 for which task recognition is desired, and obtains an inference result 220.
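Turning model outputs into task labels can be sketched as follows, assuming the inference model 210 yields one score per task for each node. Both the node-level rule and the graph-level rule (mean pooling followed by argmax) are common choices, assumed here for illustration; the function names are hypothetical.

```python
def assign_task_labels(node_scores, task_names):
    """Node-level result: the highest-scoring task for each node."""
    labels = []
    for scores in node_scores:
        if len(scores) != len(task_names):
            raise ValueError("score vector length must match task_names")
        best = max(range(len(scores)), key=scores.__getitem__)
        labels.append(task_names[best])
    return labels


def graph_task_label(node_scores, task_names):
    """Graph-level result: argmax of the mean score over all nodes."""
    n = len(node_scores)
    mean = [sum(col) / n for col in zip(*node_scores)]
    return task_names[max(range(len(mean)), key=mean.__getitem__)]
```

For two nodes scored over ("welding", "screw tightening", "control panel operation") as [0.1, 0.7, 0.2] and [0.6, 0.3, 0.1], the node labels are "screw tightening" and "welding", and the graph-level label is "screw tightening".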

The storage unit 110 is a functional unit that stores data. It holds the image data 200, correction parameters 201, node identification labels 202, node task labels 203, inference model 210, and inference results 220.

Next, the hardware and software configuration of the task recognition device will be described with reference to FIG. 3. The task recognition device 100 can be implemented on a general information processing device (computer) and, as shown in FIG. 3, includes a processor 11, a main memory device 12, a communication device 13, an input device 14, an output device 15, and an auxiliary storage device 16.

The processor 11 is a semiconductor device that executes programs in the main memory device 12 and controls each part of the task recognition device 100. It is, for example, a CPU (Central Processing Unit), an MPU (Micro Processing Unit), a GPU (Graphics Processing Unit), or an AI (Artificial Intelligence) chip.

The main memory device 12 temporarily stores programs and data, and is, for example, RAM (Random Access Memory) such as SRAM (Static Random Access Memory) or DRAM (Dynamic Random Access Memory).

The communication device 13 is an interface device that communicates with other devices via the communication means 4, and is, for example, a NIC (Network Interface Card), a wireless communication module, a USB (Universal Serial Bus) module, or a serial communication module. The communication device 13 can also function as an input device that receives information from other communicatively connected devices, and as an output device that transmits information to them. The task recognition device 100 uses the communication device 13 to communicate with the image acquisition device 6 and the sensor 3 via the communication means 4.

The input device 14 is a user interface device that accepts information from the user, such as a keyboard, mouse, card reader, or touch panel.

The output device 15 is a user interface device that outputs various kinds of information (display output, audio output, printed output, etc.). It is, for example, a display device that visualizes information (an LCD (Liquid Crystal Display), graphics card, etc.), an audio output device (speaker), or a printer.

The auxiliary storage device 16 is, for example, a hard disk drive (HDD) or a solid state drive (SSD). Programs and data stored in the auxiliary storage device 16 are loaded into the main memory device 12 as needed, and the processor 11 executes the programs while referring to the loaded data.

In this embodiment, an image acquisition program 161, an analysis area detection program 162, a feature extraction program 163, a relevance estimation program 164, a graph generation program 165, a task learning program 166, and a task inference program 167 are installed in the auxiliary storage device 16.

The image acquisition program 161, analysis area detection program 162, feature extraction program 163, relevance estimation program 164, graph generation program 165, task learning program 166, and task inference program 167 implement the functions of the image acquisition unit 101, analysis area detection unit 102, feature extraction unit 103, relevance estimation unit 104, graph generation unit 105, task learning unit 106, and task inference unit 107, respectively.

Although the functions of the task recognition device 100 have been described as being implemented by the processor 11 reading and executing programs stored in the main memory device 12, they may instead be implemented by hardware that constitutes the task recognition device 100 (an FPGA, ASIC, AI chip, etc.).

Next, the processing performed by the task recognition device will be described with reference to FIGS. 4 to 8.

First, the image acquisition unit 101 of the task recognition device 100 acquires the image data 200 transmitted from the image acquisition device 6 (S100).

Next, the analysis area detection unit 102 detects object regions in the images of the image data 200 obtained by the image acquisition unit 101 (S101).

Next, the feature extraction unit 103 extracts feature amounts from the images over a predetermined frame interval (S102).

Next, the feature extraction unit 103 buffers the data as a time series, for example the preceding and succeeding frames, and extracts the change amount of each feature from the difference between the feature amounts of those frames (S103). This process is described in detail later.

Next, the relevance estimation unit 104 uses the change amounts extracted in S103 to calculate an object relevance, which estimates the strength of the connection between objects (S104). This calculation is described in detail later.

Next, the graph generation unit 105 generates a graph for use in the graph convolutional network (GCN) based on the object relevance calculated in S104 (S105).

Next, the work learning unit 106 performs machine learning with the GCN on the graph generated in S105 and generates the inference model 210 (S106).

Next, the process of extracting the change amounts of the features will be described in detail with reference to FIGS. 5A and 5B.

This process corresponds to S103 in FIG. 4. In this embodiment, motion change and texture change are described in detail as example features, but any image feature that contributes to task recognition may be used.

First, the process of extracting the motion of a moving object in the image as the change amount of a feature will be described with reference to FIG. 5A.

First, a group of predetermined object regions is acquired (S200). Because these are the regions in which motion is to be measured, they need not be focused tightly on the objects themselves; any predetermined region in image space associated with an object may be used.

Next, the amount of motion within each region is calculated from the preceding and succeeding frames by optical flow (S201). Optical flow is a method of analyzing the apparent motion pattern of objects between adjacent frames, caused by movement of the objects or of the camera.

Next, the total amount of motion within the region of each object is calculated (S202).

Finally, the calculated total is output as the change amount of the feature (S203).
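Steps S202 and S203 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes a dense optical-flow field has already been computed (OpenCV's `cv2.calcOpticalFlowFarneback`, for example, returns one as an H×W×2 array) and converted to nested lists of `(dx, dy)` tuples, and it assumes each object region is an axis-aligned box `(x0, y0, x1, y1)`. The function name `motion_change` and the box format are hypothetical.

```python
import math


def motion_change(flow, region):
    """Sum optical-flow magnitudes inside one object region (S202-S203).

    flow:   dense flow field as a list of rows, each row a list of (dx, dy)
            tuples (assumed already computed between two adjacent frames)
    region: (x0, y0, x1, y1) box with exclusive upper bounds (hypothetical format)
    """
    x0, y0, x1, y1 = region
    total = 0.0
    for y in range(y0, y1):
        for x in range(x0, x1):
            dx, dy = flow[y][x]
            total += math.hypot(dx, dy)  # per-pixel motion magnitude
    return total  # scalar change amount of the motion feature
```

For a 2×2 region in which every pixel moves by (3, 4), each magnitude is 5, so the returned change amount is 20.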

Next, the process of extracting changes in image texture as feature change amounts will be described with reference to FIG. 5B.

A change in image texture refers to a change in the RGB values or grayscale values of the image.

First, a group of predetermined object regions is acquired (S210).

Next, a trained model capable of identifying an object or changes in an object is acquired as the feature extraction model (S211). Examples of models that extract local image features include template feature models, holistic feature models, co-occurrence feature models, and part feature models.

Feature amounts for each object are obtained by applying the image data of each object region to the feature extraction model (S212).

Then, feature amounts are obtained from the image data of two frames, such as the preceding and succeeding frames, and the distance between them is calculated (S213). For example, when a feature amount consists of n elements, this distance is the n-dimensional Euclidean distance.

Next, the distance calculated in S213 is output as the change amount of the feature (S214).
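The texture-change steps of FIG. 5B can be sketched as below. The trained feature extraction model is replaced here by a hypothetical toy extractor `mean_color` (mean RGB of the region) purely so the example runs; in practice any model from S211 would be passed as `extract`. The distance is the n-dimensional Euclidean distance, computed with Python's `math.dist` (Python 3.8+).

```python
import math


def mean_color(pixels):
    """Hypothetical stand-in for a trained feature extractor:
    returns the mean (R, G, B) of a region given as a list of RGB tuples."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def texture_change(prev_pixels, next_pixels, extract=mean_color):
    """Extract a feature vector for the same region in two frames and
    return their Euclidean distance as the feature change amount."""
    f_prev = extract(prev_pixels)
    f_next = extract(next_pixels)
    return math.dist(f_prev, f_next)
```

A region whose mean color moves from (0, 0, 0) to (3, 4, 0) yields a change amount of 5.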

Next, the process of estimating the relevance between objects will be described with reference to FIG. 6. This process corresponds to S104 in FIG. 4.

First, the change amount of the feature of each object region extracted in S103 of FIG. 4 is obtained (S300).

Next, the change amounts are normalized using the correction parameters 201 (S301). Because the feature amounts of different regions can vary widely, aligning their scales is useful; however, this correction is optional.

Next, the object relevance is calculated from the normalized change amounts of two regions obtained in S301 (S302). The object relevance is an index of how strongly the two objects are related.

For example, when the change amount of the feature for the region of object A is $C_A$ and that for the region of object B is $C_B$, the object relevance can be calculated as their product (Equation 1):

$$\mathrm{Relevance}(A, B) = C_A \times C_B \qquad (1)$$

Alternatively, the object relevance can be calculated as their sum (Equation 2):

$$\mathrm{Relevance}(A, B) = C_A + C_B \qquad (2)$$

As another alternative, the object relevance can be calculated as their weighted linear sum (Equation 3):

$$\mathrm{Relevance}(A, B) = K_A C_A + K_B C_B \qquad (3)$$

Here, $K_A$ and $K_B$ are weighting coefficients determined by how much importance is placed on the change amounts of object A and object B, respectively. Operations other than these may also be used.
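Equations 1 to 3 can be expressed as a single helper. The function name, the `mode` strings, and the default weights of 1.0 are hypothetical choices for illustration; the inputs are the (optionally normalized) scalar change amounts $C_A$ and $C_B$.

```python
def object_relevance(c_a, c_b, mode="product", k_a=1.0, k_b=1.0):
    """Object relevance of two objects from their scalar change amounts.

    mode "product":  Equation 1, C_A * C_B
    mode "sum":      Equation 2, C_A + C_B
    mode "weighted": Equation 3, K_A * C_A + K_B * C_B
    """
    if mode == "product":
        return c_a * c_b
    if mode == "sum":
        return c_a + c_b
    if mode == "weighted":
        return k_a * c_a + k_b * c_b
    raise ValueError(f"unknown mode: {mode!r}")
```

One design consequence worth noting: the product is zero whenever either object is static, so it rewards objects that change together, whereas the sum and weighted sum stay positive if only one object changes.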

Next, the object relevance obtained in S302 is normalized using the correction parameters 201 (S303). This makes learning easier in the subsequent learning phase; this correction is also optional.

Finally, the object relevance obtained in S303 is output (S304).

Next, the graph generation process will be described in detail with reference to FIG. 7.
For each object region, the feature amount obtained by the feature extraction process in S102 of FIG. 4 is acquired (S400).

Next, the node-specific label assigned to each object region is obtained from the node-specific labels 202 (S401).

Next, the object relevance between each pair of objects obtained by the relevance estimation process in S104 of FIG. 4 is acquired (S402).

Next, the feature amount acquired in S400 and the node-specific label acquired in S401 are assigned to each node (S403).

Next, the object relevance is assigned as the weight of the edge between nodes (S404). This weight is, for example, the edge weight used in the convolutional layers of the GCN.

By training, for example, a GCN on the graph generated in this way, an inference model 210 can be constructed that estimates the task of the object corresponding to each node to which a node-specific label has been assigned.
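To make the role of the edge weights concrete, the sketch below performs one graph-convolution step on such a graph. The source does not fix a GCN variant; this assumes the common symmetric-normalized rule $H' = \mathrm{ReLU}(\hat{D}^{-1/2}(A + I)\hat{D}^{-1/2} H W)$ of Kipf and Welling, with the object relevance values as the entries of the weighted adjacency matrix $A$ (assumed non-negative). The function name and the data layout are hypothetical, and a real model would learn `weight` rather than fix it.

```python
import math


def gcn_layer(features, edges, weight):
    """One graph-convolution step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).

    features: list of n node feature vectors, each of length f (S400)
    edges:    {(i, j): relevance} for undirected edges between node indices (S404)
    weight:   f x out weight matrix as a list of rows (learned in practice)
    """
    n = len(features)
    # Weighted adjacency with self-loops; each edge weight is an object relevance.
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (i, j), w in edges.items():
        adj[i][j] = adj[j][i] = w
    deg = [sum(row) for row in adj]
    # Symmetric normalization keeps feature scales comparable across nodes.
    norm = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    f = len(features[0])
    # Aggregate neighbour features, weighted by the normalized relevance.
    agg = [[sum(norm[i][k] * features[k][c] for k in range(n)) for c in range(f)]
           for i in range(n)]
    out = len(weight[0])
    # Linear transform followed by ReLU.
    return [[max(0.0, sum(agg[i][c] * weight[c][o] for c in range(f)))
             for o in range(out)]
            for i in range(n)]


# Node-specific labels (S401) kept alongside the node indices.
nodes = ["robot", "processing machine", "workpiece"]
print(gcn_layer([[1.0], [3.0], [5.0]], {(0, 1): 1.0}, [[1.0]]))
# [[2.0], [2.0], [5.0]]
```

In this usage, the robot and processing-machine nodes exchange information through their weighted edge, while the unconnected workpiece node keeps its own feature.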

Next, the series of processes from image acquisition to inference in the task recognition device will be described with reference to FIG. 8.
The processes S100 to S105 are the same as those shown in FIG. 4, except that the image data 200 input in S100 is the image data on which task recognition inference is to be performed.

After the graph generation in S105, the work inference unit 107 performs task recognition inference using the inference model 210 obtained in S106 of FIG. 4 and outputs the inference result (S110). The inference result is a graph in which a node work label 203 is assigned to each node. A specific example of the inference result is described later.

Next, a concrete example of calculating the object relevance from image data will be described with reference to FIG. 9.

This example calculates the object relevance between object A and object B for time-series data such as video. Clock 1, clock 2, and so on denote ranges along the time axis of the time-series data. When object A and object B both appear in the image, image features are obtained for each from the image data of the same frames. For clock 1 (C01), which spans a predetermined number of frames, a distance value such as the Euclidean distance is calculated between the feature F1A extracted in the first frame and the feature F2A extracted in the last frame, giving the change amount d1A of the feature of object A. Features are extracted for object B in the same way, and its change amount d1B is obtained from its distance value. From the change amounts d1A and d1B, the object relevance P1 is obtained, for example as their product in the case of Equation 1. P1 is the object relevance for clock 1 (C01). The same process is performed for clock 2 (C02) to calculate the object relevance P2, and it is repeated for each clock as video data is obtained. The clocks need not use disjoint frames; some frames may be shared between clocks.
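The per-clock calculation of FIG. 9 can be sketched as follows, using the product form of Equation 1. The representation is an assumption for illustration: each object's features are a list of per-frame vectors, and each clock is a `(first_frame, last_frame)` index pair, so clocks may share frames as the text allows. The function name is hypothetical.

```python
import math


def relevance_per_clock(feats_a, feats_b, clocks):
    """Object relevance P for each clock window, following FIG. 9.

    feats_a, feats_b: per-frame feature vectors of object A and object B
    clocks:           list of (first_frame, last_frame) index pairs; windows
                      may overlap, so frames can be shared between clocks
    """
    relevances = []
    for first, last in clocks:
        d_a = math.dist(feats_a[first], feats_a[last])  # change amount of A
        d_b = math.dist(feats_b[first], feats_b[last])  # change amount of B
        relevances.append(d_a * d_b)  # Equation 1 (product form)
    return relevances
```

With object A's feature moving from (0, 0) to (3, 4) and object B's from (0, 0) to (1, 0) during clock 1, and neither changing during clock 2 (which shares frame 1 with clock 1), the relevances are 5 for clock 1 and 0 for clock 2.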

Next, an example of applying the object relevance to graph generation will be described with reference to FIG. 10.

Assume that, for object A, a change amount F11A of the feature has been obtained within a given clock, and, for object B, a change amount F11B has been obtained within the same clock. Each change amount is normalized with the correction parameters 201, yielding the normalized change amount F11AN for object A and F11BN for object B.

Next, the object relevance P11 is calculated as the product of the normalized change amounts (in the case of Equation 1) and applied as the weight w11 of the edge connecting object A and object B in graph G01 for that clock. The same processing is performed for every pair of objects, assigning weights to the edges connecting them. FIG. 10 shows an example in which the average object relevance between object A and object B over the time range of interest is 0.1, and this value is applied as the weight of the edge connecting the robot node and the processing machine node.
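The FIG. 10 flow (normalize, multiply, average over the time range) can be sketched as below. The source does not define the form of the correction parameters 201; this sketch assumes, hypothetically, one scale divisor per object, such as the largest change expected for it. The function name is also hypothetical.

```python
def edge_weight(changes_a, changes_b, scale_a, scale_b):
    """Edge weight between two object nodes over a time range of interest.

    changes_a, changes_b: per-clock change amounts of objects A and B
    scale_a, scale_b:     hypothetical correction parameters (scale divisors)
    """
    if not changes_a or len(changes_a) != len(changes_b):
        raise ValueError("need equal, non-empty per-clock change lists")
    # Normalize each change amount, then apply Equation 1 per clock.
    products = [(ca / scale_a) * (cb / scale_b)
                for ca, cb in zip(changes_a, changes_b)]
    # Average the per-clock relevance over the time range of interest.
    return sum(products) / len(products)
```

For instance, per-clock changes [2.0, 0.0] for A with scale 10 and [1.0, 1.0] for B with scale 1 give per-clock relevances 0.2 and 0.0, whose average 0.1 becomes the edge weight.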

Next, the task recognition result obtained by inference will be described with reference to FIG. 11.
FIG. 11 illustrates how node work labels are assigned to the nodes as the inference result.

In this embodiment, task recognition is performed by the task inference process S110 of FIG. 8 using the inference model 210 generated by the task learning process S106 of FIG. 4, and a node work label 203 is obtained for the object represented by each node.

In the example of FIG. 11(a), for node 1 "robot", node 2 "processing machine", and node 3 "workpiece", the node work labels obtained at time T1 are "welding" for node 1, "stopped" for node 2, and "being welded" for node 3. FIG. 11(b) is a timing chart for node 1 "robot", showing that the robot is performing welding at time T1.
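A timing chart like FIG. 11(b) can be derived from the node work labels inferred at successive time steps. The sketch below assumes, hypothetically, that inference yields one label per time step for a given node at a fixed sampling interval `dt`; it collapses runs of identical labels into `(label, start, end)` segments, from which the time spent on each task can also be read off.

```python
from itertools import groupby


def timing_chart(labels, dt=1.0):
    """Collapse one node's per-time-step work labels into segments.

    labels: node work labels in time order (one per inference step, assumed)
    dt:     time between inference steps (hypothetical fixed interval)
    """
    segments = []
    t = 0
    for label, run in groupby(labels):
        n = len(list(run))  # number of consecutive steps with this label
        segments.append((label, t * dt, (t + n) * dt))
        t += n
    return segments
```

For the robot node, the label sequence ["stopped", "welding", "welding", "stopped"] becomes a welding segment from 1.0 to 3.0 surrounded by two stopped segments.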

As described above, according to this embodiment, the object relevance is calculated from the change amounts of image features, and GCN learning and inference are performed on a graph whose nodes are objects and whose edges connect related objects, with the object relevance used as the edge weights. As a result, task recognition that exploits the connections between objects, for example at a manufacturing site, can be performed with a simple, general-purpose algorithm for estimating the relevance between objects.

[Embodiment 2]
The second embodiment will be described below with reference to FIGS. 12 to 14.

The first embodiment described an example of a task recognition device that calculates the object relevance between objects from the change amounts of their features and applies it as the edge weights of the GCN graph to perform task recognition.

This embodiment describes a task recognition device that performs the same task recognition as in the first embodiment and additionally allows the user to edit the object relevance between the objects corresponding to the nodes.

The following description focuses on the differences from the first embodiment.

First, the functional configuration of the task recognition device according to the second embodiment will be described with reference to FIG. 12.
The task recognition device 100 of this embodiment includes a relevance editing unit 108 in addition to the components of the task recognition device 100 of the first embodiment shown in FIG. 2.

The relevance editing unit 108 is a functional unit that edits the relevance values of the objects recognized as nodes.

The storage unit 110 also stores relevance data 204, which holds the relevance values of the objects recognized as nodes.

Next, the series of processes from image acquisition to learning in the task recognition device according to the second embodiment will be described with reference to FIG. 13.

In this series of processes, a relevance editing process S120 is inserted between S104 and S105 of the processes of the first embodiment shown in FIG. 4.

The relevance editing process S120 edits the relevance values of the objects recognized as nodes of the graph. In this process, user-defined object relevance values are set, and the object relevance obtained in the relevance estimation process S104 together with the user-defined object relevance is output as the relevance data 204 and passed to the graph generation process S105.
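One way to assemble the relevance data 204 from the two sources is sketched below. The source does not state how estimated and user-defined values are combined; this sketch assumes, hypothetically, that a user-defined value overrides the estimate for the same pair of objects and that pairs are unordered (so "robot/machine" and "machine/robot" are the same edge). The function name and dictionary layout are also assumptions.

```python
def merge_relevance(estimated, user_defined):
    """Combine estimated (S104) and user-defined (S120) object relevance.

    Both arguments map (object, object) pairs to relevance values. Pairs are
    treated as unordered, and user-defined values take precedence (assumed).
    """
    merged = {frozenset(pair): value for pair, value in estimated.items()}
    for pair, value in user_defined.items():
        merged[frozenset(pair)] = value  # the user's edit wins
    return merged
```

A user edit can thus both override an estimated edge and introduce an edge the estimator never produced, with the merged dictionary passed on to graph generation.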

Next, the user interface provided by the task recognition device will be described with reference to FIG. 14.

The task recognition device 100 of this embodiment displays the object relevance assigned to the edges of the graph on the object relevance editing screen 400 shown in FIG. 14 and allows it to be edited. The task recognition device 100 reads either the result of the relevance estimation process S104 or a setting file from which predetermined objects can be selected as candidates. The user then selects two targets to be linked. In the example of FIG. 14, object A is selected in box 411 for target 1 and object B is selected in box 412 for target 2. The degree of relevance between the selected objects can be entered in box 42, so the user can assign any degree of relevance between them. In addition, the feature type can be selected in the feature box 43, and the change amount of that feature can be set for each target in box 441 (target 1) and box 442 (target 2), allowing the user to set any object relevance.

The graph display 500 visualizes the degree of relevance: it shows that nodes 511 and 512 are connected, that features 53 and the like are displayed for each object, and that object A and object B are linked. The numerical object relevance between object A and object B can be edited in box 52.

As described above, the task recognition device of this embodiment allows the user to freely edit the change amounts of the features of each object and the object relevance.

[Embodiment 3]
The third embodiment of the present invention will be described below with reference to FIG. 15.
The first embodiment described a task recognition device that obtains the change amount of the feature of each object region in an image, calculates the object relevance between two objects by combining the change amount for one object region with that for another, and performs task recognition by machine learning with this relevance as the edge weights of a GCN graph.

This embodiment likewise obtains the change amounts of the features of object regions and calculates the object relevance between objects from them, but it also takes into account how the image data evolves over time.

The following description focuses on the differences from the first embodiment.

The task recognition device 100 of this embodiment has the same functional configuration as that shown in FIG. 2 of the first embodiment. As in the first embodiment, the relevance estimation process S104 of FIG. 4 calculates the object relevance between two objects from the change amounts of the features in their regions, but the calculation method differs.

The relevance estimation process S104 in the third embodiment will be described below with reference to FIG. 15.
The relevance estimation unit 104 of the task recognition device 100 acquires, for object A (60A), feature 1 (FA1) obtained by optical flow or a similar method and feature 2 (FA2) obtained by a 3DCNN or a similar method.

A 3DCNN (3D convolutional neural network) recognizes actions in video while taking spatiotemporal information into account, by combining spatial information (2D) and temporal information (1D) and performing 3D convolution.

The relevance estimation unit 104 also acquires feature 1 (FB1) and feature 2 (FB2) for object B (60B) in the same manner as for object A (60A).

Next, distance values (d11, d12, d21, d22) are calculated between the corresponding features of the two objects obtained within the same clock.

Then the change in these distance values between consecutive clocks is obtained, and the results for feature 1 and feature 2 are combined into the object relevance. For example, the distance d11 between feature 1 of object A and feature 1 of object B is calculated at clock 1, and the distance d12 is calculated in the same way at clock 2. The absolute difference $\Delta_1 = |d_{11} - d_{12}|$ expresses the change between clock 1 and clock 2, and the degree of change between the clocks, $D_{1R}$, is obtained as the reciprocal of $\Delta_1 + 1$ (Equation 4):

$$D_{1R} = \frac{1}{\Delta_1 + 1} \qquad (4)$$

Likewise, for feature 2 of object A and feature 2 of object B, the change between clock 1 and clock 2 is obtained from $\Delta_2 = |d_{21} - d_{22}|$, and the degree of change $D_{2R}$ is calculated by Equation 5:

$$D_{2R} = \frac{1}{\Delta_2 + 1} \qquad (5)$$

The arithmetic mean of $D_{1R}$ and $D_{2R}$ is then taken as the similarity $S$:

$$S = \frac{D_{1R} + D_{2R}}{2}$$

In generating the graph, with $W_1$ the edge weight in the graph for clock 1 and $W_2$ the edge weight in the graph for clock 2, the final edge weight $W_2'$ for clock 2 is calculated by Equation 6:

$$W_2' = W_1 \times S + W_2 \qquad (6)$$

In other words, when the degree-of-change values $D_{1R}$ and $D_{2R}$, and hence $S$, are small, that is, when the distances between the objects change substantially between clocks, the edge weight in the graph for clock 2 is not emphasized. When they are large, that is, when those distances remain stable, $W_1$ is carried over and the edge weight for clock 2 is emphasized.
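Equations 4 to 6 can be traced end to end with the sketch below. The data layout is an assumption for illustration: for each feature type (for example, an optical-flow feature and a 3DCNN feature), each object has a pair of vectors, one for clock 1 and one for clock 2. The function name and dictionary keys are hypothetical.

```python
import math


def corrected_weight(feats_a, feats_b, w1, w2):
    """Return (W2', S) following Equations 4 to 6.

    feats_a, feats_b: {feature_type: (vector_at_clock1, vector_at_clock2)}
                      for object A and object B (hypothetical layout)
    w1, w2:           edge weights of the clock 1 and clock 2 graphs
    """
    degrees = []
    for kind in feats_a:
        d_clock1 = math.dist(feats_a[kind][0], feats_b[kind][0])  # e.g. d11
        d_clock2 = math.dist(feats_a[kind][1], feats_b[kind][1])  # e.g. d12
        delta = abs(d_clock1 - d_clock2)
        degrees.append(1.0 / (delta + 1.0))  # Equations 4 and 5
    s = sum(degrees) / len(degrees)  # arithmetic mean: similarity S
    return w1 * s + w2, s  # Equation 6
```

If the optical-flow distance between A and B stays at 1 across both clocks ($D_{1R} = 1$) while the 3DCNN distance grows from 0 to 3 ($D_{2R} = 1/4$), then $S = 0.625$, and with $W_1 = 0.4$ and $W_2 = 0.2$ the corrected weight is $W_2' = 0.45$.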

As described above, in this embodiment, calculating both the differences between objects and the differences between clocks makes it possible to compute an object relevance that accounts for the time-series data and to generate the GCN graph accordingly.

1: task recognition system, 2: worker, 3: sensor, 4: communication means, 5: surrounding object, 6: image acquisition device,
100: task recognition device,
11: processor, 12: main storage device, 13: communication device, 14: input device, 15: output device, 16: auxiliary storage device,
101: image acquisition unit, 102: analysis area detection unit, 103: feature extraction unit, 104: relevance estimation unit, 105: graph generation unit, 106: work learning unit, 107: work inference unit, 108: relevance editing unit, 110: storage unit,
200: image data, 201: correction parameters, 202: node-specific labels, 203: node work labels, 204: relevance data, 210: inference model, 220: inference result

Claims

A task recognition device that analyzes image data to recognize the task of an object involved in the task, comprising:
an image acquisition unit that acquires image data captured by the imaging device;
an analysis area detection unit that analyzes the image data obtained from the image acquisition unit and detects an area related to an object;
a feature extraction unit that extracts a feature amount of the image of the region related to the object detected by the analysis area detection unit and calculates the change amount of that feature amount within a predetermined period of time; and
a relevance estimation unit that calculates an object relevance between a first object and a second object based on a first change amount of the feature amount of the image detected as a region related to the first object within a predetermined period of time and a second change amount of the feature amount of the image detected as a region related to the second object within the same predetermined period of time.

The task recognition device according to claim 1, wherein the first change amount and the second change amount of the feature amount are scalar quantities, and
the object relevance between the first object and the second object is calculated as any one of the product, the sum, or a weighted linear sum of the first change amount and the second change amount.

The task recognition device according to claim 1, further comprising a task learning unit that performs machine learning for task recognition based on the object relevance calculated by the relevance estimation unit and the feature amount of the image of the region related to the object extracted by the feature extraction unit, and outputs an inference model.

The task recognition device according to claim 3, further comprising a task inference unit that performs task recognition inference based on image data captured by a photographing device and the inference model output by the task learning unit.

The task recognition device according to claim 3, wherein the data includes data expressing a graph in which regions related to objects are nodes and related objects are connected by edges, and
an inference model that generates a node task label for each node is generated by a graph convolutional network (GCN) using the object relevance as the edge weights.

The task recognition device according to claim 1, wherein the data includes data expressing a graph in which regions related to objects are nodes and related objects are connected by edges,
the device further comprising a display/editing unit that displays, and allows editing of, the feature amounts of the regions related to the objects and the object relevance.

The task recognition device according to claim 5, wherein the relevance estimation unit:
calculates a first object relevance between the first object and the second object based on a first change amount of the image feature amount of the region detected as relating to the first object within a first period and a second change amount of the image feature amount of the region detected as relating to the second object within the first period;
calculates a second object relevance between the first object and the second object based on a first change amount of the image feature amount of the region detected as relating to the first object within a second period and a second change amount of the image feature amount of the region detected as relating to the second object within the second period;
calculates a degree of change in the object relevance from the absolute value of the difference between the first object relevance and the second object relevance, and calculates a similarity based on that degree of change using the first change amount within the first period;
corrects the weight of a first edge of the graph, corresponding to the first period, by the similarity based on the degree of change in the object relevance; and
calculates the corrected weight of a second edge in the second period as the sum of the corrected weight of the first edge for the first period and the weight of the second edge in the second period.
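The edge-weight correction across two periods can be sketched in Python as follows. The claim does not fix the similarity formula or how the similarity "corrects" the first edge weight, so this sketch makes two loudly labelled assumptions: the similarity is the hypothetical function 1 / (1 + d / |c1|), where d is the degree of change in relevance and c1 the first change amount in the first period, and the correction is a multiplication by that similarity. Only the absolute difference of the two relevances and the final summation of edge weights follow the claim text directly.

```python
from typing import Tuple


def similarity(relevance_change: float, first_change: float, eps: float = 1e-9) -> float:
    """HYPOTHETICAL similarity: 1.0 when the relevance is unchanged, shrinking as
    the change grows relative to the first change amount of the first period."""
    return 1.0 / (1.0 + relevance_change / (abs(first_change) + eps))


def corrected_edge_weights(r1: float, r2: float, first_change: float,
                           w1: float, w2: float) -> Tuple[float, float]:
    """Return (corrected first-period edge weight, corrected second-period edge weight).

    r1, r2: object relevance of the same object pair in the first and second periods.
    w1, w2: weights of the edge between that pair in the first and second periods.
    """
    degree_of_change = abs(r1 - r2)                 # |first relevance - second relevance|
    s = similarity(degree_of_change, first_change)
    w1_corrected = s * w1                           # assumption: correction = scaling by s
    w2_corrected = w1_corrected + w2                # sum of corrected first and second weights
    return w1_corrected, w2_corrected


if __name__ == "__main__":
    print(corrected_edge_weights(0.5, 0.5, 2.0, 0.4, 0.3))
```

Under these assumptions, an unchanged relevance carries the first-period weight forward in full, while a large change in relevance reduces how much of it accumulates into the second-period edge.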

A task recognition method for a task recognition device that analyzes image data to recognize tasks of a worker and a task device, the method comprising:
an image acquisition step in which an image acquisition unit acquires image data captured by a photographing device;
an analysis region detection step in which an analysis region detection unit analyzes the image data acquired in the image acquisition step and detects regions related to objects;
a feature extraction step of extracting an image feature amount of each region related to an object detected in the analysis region detection step, and calculating the change amount of that feature amount within a predetermined period;
a relevance estimation step of calculating an object relevance between a first object and a second object based on a first change amount of the image feature amount of the region detected as relating to the first object within a predetermined period and a second change amount of the image feature amount of the region detected as relating to the second object within the same predetermined period;
a task learning step of performing machine learning for task recognition based on the object relevance calculated in the relevance estimation step and the image feature amounts of the regions related to the objects extracted in the feature extraction step, and outputting an inference model; and
a task inference step of performing inference for task recognition based on image data captured by the photographing device and the inference model output in the task learning step.
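The feature extraction and relevance estimation steps can be illustrated with a short Python sketch. The claim does not specify the feature vectors or how a change amount is measured, so the sketch assumes that each detected region already has one feature vector per frame within the period, that its change amount is the summed Euclidean distance between consecutive frames, and that the relevance of a pair is the product of the two change amounts (the product being one of the combinations listed in the dependent method claim on scalar change amounts). The region names and coordinates are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def change_amount(frames: Sequence[Sequence[float]]) -> float:
    """Assumed change amount: summed Euclidean distance between the feature
    vectors of consecutive frames of one region within the period."""
    return sum(math.dist(a, b) for a, b in zip(frames, frames[1:]))


def pairwise_relevance(tracks: Dict[str, List[List[float]]]) -> Dict[Tuple[str, str], float]:
    """Object relevance for every pair of regions observed in the same period,
    taken here as the product of the two change amounts."""
    changes = {name: change_amount(frames) for name, frames in tracks.items()}
    return {(a, b): changes[a] * changes[b] for a, b in combinations(sorted(tracks), 2)}


if __name__ == "__main__":
    window = {
        "hand": [[0.0, 0.0], [3.0, 4.0], [3.0, 4.0]],
        "screwdriver": [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]],
        "box": [[5.0, 5.0], [5.0, 5.0], [5.0, 5.0]],
    }
    print(pairwise_relevance(window))
    # {('box', 'hand'): 0.0, ('box', 'screwdriver'): 0.0, ('hand', 'screwdriver'): 10.0}
```

The hand and the screwdriver both move within the same window and so receive a nonzero relevance, while the static box is unrelated to either.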

The task recognition method according to claim 8, wherein the first change amount and the second change amount of the feature amount are scalar quantities, and the object relevance between the first object and the second object is calculated as any one of the product, the sum, or a weighted linear sum of the first change amount and the second change amount.
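The three combinations named in this claim can be written as a small Python function. The function name, the `mode` strings and the default weights `alpha` and `beta` for the weighted linear sum are hypothetical illustrations, not values taken from the patent.

```python
def object_relevance(c1: float, c2: float, mode: str = "product",
                     alpha: float = 0.5, beta: float = 0.5) -> float:
    """Combine two scalar change amounts into an object relevance.

    mode: "product" (c1 * c2), "sum" (c1 + c2) or "weighted" (alpha * c1 + beta * c2).
    alpha and beta are illustrative defaults only.
    """
    if mode == "product":
        return c1 * c2
    if mode == "sum":
        return c1 + c2
    if mode == "weighted":
        return alpha * c1 + beta * c2
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    print(object_relevance(2.0, 3.0), object_relevance(0.0, 3.0, "sum"))
```

The choice matters when only one object moves: the product is zero whenever either change amount is zero, so a static object is never related to a moving worker, whereas the sum and the weighted linear sum still yield a nonzero relevance, and the weighted form can emphasise one object's motion over the other's.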

The task recognition method according to claim 8, wherein the task recognition device has data representing a graph in which the regions related to the objects are nodes and related objects are connected by edges, the method further comprising a step of generating an inference model that outputs a node task label for each node by using the object relevance as the edge weights together with a Graph Convolution Network (GCN).