JP7633588B2

JP7633588B2 - Motion recognition device, motion recognition method, and motion recognition program

Info

Publication number: JP7633588B2
Application number: JP2020109441A
Authority: JP
Inventors: 滋穂里内田; 健太西行
Original assignee: Omron Corp
Current assignee: Omron Corp
Priority date: 2020-06-25
Filing date: 2020-06-25
Publication date: 2025-02-20
Anticipated expiration: 2040-06-25
Also published as: JP2022006885A; CN113850114A

Description

The present invention relates to a motion recognition device, a motion recognition method, and a motion recognition program.

Conventionally, cameras, sensors, and the like are installed on production lines in factories and similar sites, and workers' motions are recognized and evaluated based on the motion information these devices obtain. For example, Patent Document 1 below determines, based on information obtained by motion capture or the like, whether non-standard work that deviates from standard work has occurred.

Patent Document 1 assumes a motion to be performed in a given work process and determines whether that motion is present and whether it is abnormal.

Patent Document 1: International Publication No. WO 2018/131630

Motions performed in work processes, however, often differ from process to process even when they belong to the same motion type. For example, a screw-tightening process and a case-fitting process both include, as a common motion, moving a hand to the work object and grasping it, but the grasping movement differs between the two. Specifically, in the screw-tightening process the worker grasps, for example, a part placed in a parts box with one hand and, with the other hand, an electric screwdriver suspended by a cable from above and in front of the worker. In the case-fitting process, by contrast, the worker grasps, for example, a case placed in a parts box with both hands.

When the recognition target is a common motion whose movement differs from process to process, the technique of Patent Document 1 can recognize the common motion performed in any one work process. To recognize the common motion performed in another work process, however, separate design work or the like is required so that the motion in that other process can also be recognized.

The present invention therefore provides a motion recognition device, a motion recognition method, and a motion recognition program capable of recognizing each of the common motions performed in different work processes.

A motion recognition device according to one aspect of the present disclosure includes: an acquisition unit that acquires motion information including the elapsed time of work, common motions that are recognition targets and are included in common in different work processes, and skeletal data of a worker; a conversion unit that, based on the coordinates corresponding to each of a plurality of body parts included in the skeletal data, calculates the distance between each coordinate and the origin coordinates corresponding to an origin part, which is the body part defined as the origin, and converts each coordinate included in the skeletal data into the calculated distance; and a learning unit that, based on the motion information after the conversion unit has converted the coordinates into distances, trains as a model for motion recognition a model that outputs information indicating a motion of the worker that corresponds to one of the common motions.

According to this aspect, the distance between the coordinates corresponding to each of the plurality of body parts included in the skeletal data and the origin coordinates corresponding to the origin part is calculated, and each coordinate of the skeletal data is replaced with the calculated distance. A model for motion recognition that outputs information indicating a motion of the worker corresponding to one of the common motions can then be trained on the motion information in which each coordinate of the skeletal data has been replaced with distance data.

A motion recognition device according to another aspect of the present disclosure includes: an acquisition unit that acquires time-series information on a worker's motion; a conversion unit that, based on the coordinates corresponding to each of a plurality of body parts included in skeletal data extracted from the time-series information, calculates the distance between each coordinate and the origin coordinates corresponding to an origin part, which is the body part defined as the origin, and converts each coordinate included in the skeletal data into the calculated distance; and a motion recognition unit that inputs the distances produced by the conversion unit into a trained model and recognizes the worker's motion based on information output from the trained model, the information indicating a motion of the worker that corresponds to one of the common motions that are recognition targets and are included in common in different work processes.

According to this aspect, the distance between the coordinates corresponding to each of the plurality of body parts included in the skeletal data extracted from the time-series information of the worker to be recognized and the origin coordinates corresponding to the origin part is calculated, and each coordinate of the skeletal data is replaced with the calculated distance. Inputting the resulting distance data into the trained model then makes it possible to recognize a motion of the worker that corresponds to one of the common motions.

In each of the above aspects, the origin part may be either the neck or the waist.

This allows the origin part to be set near the center of movement of the worker to be recognized.

A motion recognition method according to another aspect of the present disclosure is a motion recognition method executed by a processor and includes: acquiring motion information including the elapsed time of work, common motions that are recognition targets and are included in common in different work processes, and skeletal data of a worker; calculating, based on the coordinates corresponding to each of a plurality of body parts included in the skeletal data, the distance between each coordinate and the origin coordinates corresponding to an origin part, which is the body part defined as the origin, and converting each coordinate included in the skeletal data into the calculated distance; and training, as a model for motion recognition and based on the motion information after the coordinates have been converted into distances, a model that outputs information indicating a motion of the worker that corresponds to one of the common motions.

A motion recognition method according to another aspect of the present disclosure is a motion recognition method executed by a processor and includes: acquiring time-series information on a worker's motion; calculating, based on the coordinates corresponding to each of a plurality of body parts included in skeletal data extracted from the time-series information, the distance between each coordinate and the origin coordinates corresponding to an origin part, which is the body part defined as the origin, and converting each coordinate included in the skeletal data into the calculated distance; and inputting the converted distances into a trained model and recognizing the worker's motion based on information output from the trained model, the information indicating a motion of the worker that corresponds to one of the common motions that are recognition targets and are included in common in different work processes.

A motion recognition program according to another aspect of the present disclosure causes a computer to function as: an acquisition unit that acquires motion information including the elapsed time of work, common motions that are recognition targets and are included in common in different work processes, and skeletal data of a worker; a conversion unit that, based on the coordinates corresponding to each of a plurality of body parts included in the skeletal data, calculates the distance between each coordinate and the origin coordinates corresponding to an origin part, which is the body part defined as the origin, and converts each coordinate included in the skeletal data into the calculated distance; and a learning unit that, based on the motion information after the conversion unit has converted the coordinates into distances, trains as a model for motion recognition a model that outputs information indicating a motion of the worker that corresponds to one of the common motions.

A motion recognition program according to another aspect of the present disclosure causes a computer to function as: an acquisition unit that acquires time-series information on a worker's motion; a conversion unit that, based on the coordinates corresponding to each of a plurality of body parts included in skeletal data extracted from the time-series information, calculates the distance between each coordinate and the origin coordinates corresponding to an origin part, which is the body part defined as the origin, and converts each coordinate included in the skeletal data into the calculated distance; and a motion recognition unit that inputs the distances produced by the conversion unit into a trained model and recognizes the worker's motion based on information output from the trained model, the information indicating a motion of the worker that corresponds to one of the common motions that are recognition targets and are included in common in different work processes.

According to the present invention, it is possible to provide a motion recognition device, a motion recognition method, and a motion recognition program capable of recognizing each of the common motions performed in different work processes.

FIG. 1 is a diagram illustrating an overview of a motion recognition system according to an embodiment of the present invention.
FIG. 2 is a schematic diagram illustrating an example of a grasping motion in a screw-tightening process.
FIG. 3 is a schematic diagram illustrating an example of a grasping motion in a case-fitting process.
FIG. 4 is a diagram illustrating the functional configuration of the motion recognition system and the motion recognition device.
FIG. 5 is a diagram illustrating an example of motion information stored in the motion recognition device.
FIG. 6 is a diagram schematically illustrating how the distances between the coordinates corresponding to the neck in the skeletal data and each of the other coordinates are calculated.
FIG. 7 is a diagram illustrating the hardware configuration of the motion recognition device.
FIG. 8 is a flowchart illustrating an example of the operation of the motion recognition device in the learning mode.
FIG. 9 is a flowchart illustrating the procedure of the conversion process shown in FIG. 8.
FIG. 10 is a flowchart illustrating an example of the operation of the motion recognition device in the motion recognition mode.

An embodiment according to one aspect of the present invention (hereinafter, "the present embodiment") is described below with reference to the drawings. In the drawings, elements given the same reference numerals have the same or similar configurations.

§1 Application Example
First, an example of a situation to which the present invention is applied is described with reference to FIG. 1. In the motion recognition system 100 according to the present embodiment, image sensors 20a, 20b, and 20c capture the motion of a worker A performed in a work area R, and the motion recognition device 10, which acquires the captured video, recognizes common motions of the worker A using a trained model. Here, a common motion is a motion that is included in common in different work processes (a motion belonging to the same motion type). Examples of common motions are motions performed in common across different work processes, such as grasping, transporting, and adjusting.

Grasping is defined as the motion of moving a hand to the work object and gripping it; for example, reaching for a part or a tool. Transporting is defined as the motion of moving the work object to a target location; for example, carrying a part or a tool toward the product being assembled. Adjusting is defined as the motion of bringing the work to its target state; for example, assembling parts.

The present embodiment describes, as an example, the case where the common motion is grasping, but it applies equally to other common motions such as transporting and adjusting.

The trained model is a model for motion recognition trained to take as input motion information that includes, among other things, distance data generated from a worker's skeletal data, and to output information indicating the worker's motion. The distance data input during training indicates the distance between the coordinates corresponding to each of the plurality of body parts included in the worker's skeletal data and the origin coordinates corresponding to the origin part, which is the body part defined as the origin.
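The input/output relationship described above can be sketched as follows. This is only an illustration: the text here does not specify the model type, so a 1-nearest-neighbor classifier is used as a stand-in, and the names `to_features`, `NearestNeighborModel`, the body-part keys, and all coordinate values are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def to_features(skeleton: Dict[str, Point], origin_part: str = "neck") -> List[float]:
    """Replace every body-part coordinate with its distance from the origin part."""
    origin = skeleton[origin_part]
    # Sort the keys so the feature order is the same for every frame.
    return [math.dist(skeleton[part], origin) for part in sorted(skeleton)]


class NearestNeighborModel:
    """Stand-in classifier (an assumption; not the model used in the patent)."""

    def __init__(self) -> None:
        self.samples: List[Tuple[List[float], str]] = []

    def fit(self, features: List[List[float]], labels: List[str]) -> None:
        self.samples = list(zip(features, labels))

    def predict(self, feature: List[float]) -> str:
        if not self.samples:
            raise ValueError("model has not been trained")
        # Return the label of the closest training sample.
        return min(self.samples, key=lambda s: math.dist(s[0], feature))[1]


# Hypothetical labeled frames: an outstretched hand is "grasp", a hand near the body is "none".
training = [
    ({"neck": (0.0, 0.0), "right_hand": (0.6, 0.4)}, "grasp"),
    ({"neck": (0.0, 0.0), "right_hand": (0.1, 0.2)}, "none"),
]
model = NearestNeighborModel()
model.fit([to_features(s) for s, _ in training], [label for _, label in training])

# A frame of a worker standing elsewhere in the image and reaching in a different direction.
query = {"neck": (5.0, 5.0), "right_hand": (5.0, 5.7)}
predicted = model.predict(to_features(query))
```

Because only distances from the origin part are used as features, the query frame is matched to the "grasp" sample even though the worker's position and reach direction differ from the training frame.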

The origin part can be set to, for example, the neck or the waist. When the worker's upper body is the recognition target, the neck is preferably set as the origin part; when the worker's whole body is the recognition target, the waist is preferably set as the origin part. In this way, the origin part is preferably set near the center of movement of the worker to be recognized.

The distance data is explained further with reference to FIG. 2 and FIG. 3. FIG. 2 illustrates a movement in the screw-tightening process in which a worker Aa reaches out with the right hand to grasp an electric screwdriver D suspended by a cable from above and in front of the worker Aa. FIG. 3 illustrates a movement in the case-fitting process in which a worker Ab reaches out with both hands to grasp a case C placed in a parts box.

In both FIG. 2 and FIG. 3, the workers Aa and Ab are performing a grasping motion. However, the right hand of the worker Aa and the right hand of the worker Ab, for example, move in completely different ways during the grasping motion. Conventionally, recognizing such grasping motions requires generating a learning model or the like for each work process and recognizing the grasping motion performed in each process separately.

The present invention, by contrast, replaces each coordinate included in the worker's skeletal data with its distance from a certain body part taken as the origin, and trains a model on the features of those distances in association with the worker's grasping motion, so that the grasping motions performed in different work processes can each be recognized. A specific explanation follows.

In FIG. 2 and FIG. 3, if the neck of each worker Aa and Ab is taken as the origin and the movement of each worker's right hand is analyzed, both grasping motions can be characterized as a motion in which the right hand moves away from the neck. For the movement shown in FIG. 2, in which the worker Aa grasps the electric screwdriver D with the right hand in the screw-tightening process, it is desirable that all workers in that process move in the same efficient way. Likewise, for the movement shown in FIG. 3, in which the worker Ab grasps the case C with the right hand in the case-fitting process, it is desirable that all workers in that process move in the same efficient way.

Therefore, by training the model on the right-hand distance data corresponding to the grasping motion performed in each work process, in association with the grasping motion of that process, the movement of a worker grasping the work object with the right hand can be recognized accurately in each work process.
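The observation above, that two differently shaped reaches share the feature "the right hand moves away from the neck", can be illustrated with a small sketch. The trajectories are made-up values chosen only for illustration, the neck is assumed to stay still, and the function names are hypothetical.

```python
import math


def distances_from_origin(origin, trajectory):
    """Euclidean distance from a fixed origin to each point of a trajectory."""
    return [math.dist(origin, point) for point in trajectory]


def is_increasing(values):
    """True if every value is strictly larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


neck = (0.0, 0.0)  # origin part, assumed stationary in this illustration

# Hypothetical right-hand positions: reaching up and forward (screwdriver).
screwdriver_reach = [(0.2, -0.3), (0.4, 0.0), (0.6, 0.4)]
# Hypothetical right-hand positions: reaching down and forward (case).
case_reach = [(0.2, -0.3), (0.4, -0.5), (0.6, -0.7)]

screwdriver_distances = distances_from_origin(neck, screwdriver_reach)
case_distances = distances_from_origin(neck, case_reach)
```

The two coordinate trajectories diverge, yet both distance series increase monotonically, which is the shared feature a single model can learn for the grasping motion.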

In this way, the motion recognition device 10 according to the present embodiment can recognize each of the common motions, such as grasping, performed in different work processes.

§2 Configuration Example
[Functional Configuration]
Next, an example of the functional configuration of the motion recognition system 100 and the motion recognition device 10 according to the present embodiment is described with reference to FIG. 4. The motion recognition system 100 includes three image sensors 20a, 20b, and 20c and the motion recognition device 10. In the following, the three image sensors 20a, 20b, and 20c are referred to collectively as the image sensor 20 when there is no need to distinguish them. The motion recognition device 10 has, as its functional configuration, for example, an acquisition unit 11, a conversion unit 12, a learning unit 13, a motion recognition unit 14, and a storage unit 19. The storage unit 19 stores, for example, a video 19a, motion information 19b, and a trained model 19c.

The present embodiment describes the case where the motion recognition device 10 has both a function for training a model for motion recognition (learning mode) and a function for recognizing a worker's motion (motion recognition mode), but these functions may instead be provided separately in independent devices.

The functional components of the motion recognition system 100 and the motion recognition device 10 are described in turn below.

<Image Sensor>
The image sensor 20 is, for example, a general-purpose camera and captures video that includes scenes of the worker A performing motions in the work area R. The image sensor 20 has, as its functional configuration, for example, a detection unit. The detection unit detects the motion of the worker A and outputs video showing that motion as time-series information.

The time-series information is not limited to video. For example, it may be information on coordinates indicating the motion of the worker A, measured by a motion capture device provided in place of the image sensor 20.

The image sensors 20a, 20b, and 20c are arranged so that the entire work area R and the whole body of the worker A can be captured. For example, each of the image sensors 20a, 20b, and 20c may be arranged so that it alone captures the entire work area R and the whole body of the worker A, or each may capture only part of the work area R and the worker A so that the combined videos cover the entire work area R and the whole body of the worker A. The image sensors 20a, 20b, and 20c may also capture the work area R and the worker A at different magnifications. Three image sensors 20 are not required; at least one is sufficient.

<Acquisition Unit>
The acquisition unit 11 acquires, from the image sensor 20, time-series information (video, in the present embodiment) on the motions performed by the worker A. The time-series information acquired by the acquisition unit 11 is transmitted to the storage unit 19 and stored as the video 19a. The acquisition unit 11 also acquires the video 19a stored in the storage unit 19.

The acquisition unit 11 extracts, from the images of the video 19a, skeletal data indicating the movement of the worker's skeleton. The skeletal data can be represented by coordinates (x, y) corresponding to each of a plurality of body parts. The present embodiment describes the case of two-dimensional coordinates (x, y), but it applies equally to three-dimensional coordinates (x, y, z). Information indicating the certainty of each coordinate value may be added alongside the coordinate values.
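One possible in-memory representation of a single frame of skeletal data is sketched below. The class name `Keypoint`, the body-part keys, and the numeric values are hypothetical; the text only states that each body part has coordinates and, optionally, a certainty value.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Keypoint:
    """2D coordinate of one body part, with an optional certainty value."""
    x: float
    y: float
    confidence: Optional[float] = None  # certainty of the estimate, if available


# One frame of skeletal data: body-part name -> keypoint.
Skeleton = Dict[str, Keypoint]

frame: Skeleton = {
    "neck": Keypoint(0.50, 0.20, 0.98),
    "right_hand": Keypoint(0.80, 0.35, 0.91),
    "left_hand": Keypoint(0.25, 0.60),  # no certainty value attached
}
```

For three-dimensional coordinates, a `z` field would be added to `Keypoint` in the same way.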

The skeletal data is transmitted to the storage unit 19 and stored as part of the motion information 19b. The acquisition unit 11 also acquires the motion information 19b stored in the storage unit 19.

The motion information 19b is described with reference to FIG. 5. The motion information 19b includes, as data items, for example, an elapsed time item, a right-hand motion item, a left-hand motion item, and a skeletal data item. The elapsed time item stores the time elapsed since a reference time, namely the time at which the first of all the processes subject to the work started. The elapsed time interval can be set arbitrarily; for example, it may be set per video frame or at a fixed interval such as every second.

The right-hand motion item stores information indicating which of the common motions that are recognition targets the right hand's motion corresponds to. The left-hand motion item stores the same information for the left hand. The information stored in the right-hand and left-hand motion items can be registered, for example, by viewing the video, checking the motion of each hand at each elapsed time, and entering the motion observed.

The skeletal data item of the motion information 19b stores the skeletal data extracted from the video at the corresponding elapsed time. The skeletal data stored in this item is later replaced with the distance data calculated by the conversion process described below.

The motion information in the first row of FIG. 5 is based on the video at the point one second after the first process started. It stores information indicating that the worker's right-hand motion corresponds to none of the common motions and that the worker's left-hand motion corresponds to grasping, together with the skeletal data extracted from the video at that point. The motion information in the last row of FIG. 5 is based on the video at the point two minutes and fifty-three seconds after the first process started. It stores information indicating that the motions of both the right and left hands correspond to adjusting, together with the skeletal data extracted from the video at that point.
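The two rows described above could be held in a record type such as the following. This is a sketch only: `MotionRecord`, its field names, and the label strings are hypothetical, and the skeleton coordinates are placeholders because the actual values in FIG. 5 are not given in the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class MotionRecord:
    """One row of motion information (field names are illustrative)."""
    elapsed_seconds: float            # time since the first process started
    right_hand_motion: Optional[str]  # common motion label, or None if none applies
    left_hand_motion: Optional[str]
    skeleton: Dict[str, Tuple[float, float]]  # body part -> (x, y)


records = [
    # First row: 1 s elapsed, right hand matches no common motion, left hand grasping.
    MotionRecord(1.0, None, "grasp",
                 {"neck": (0.5, 0.2), "right_hand": (0.6, 0.5)}),   # placeholder coordinates
    # Last row: 2 min 53 s = 173 s elapsed, both hands adjusting.
    MotionRecord(173.0, "adjust", "adjust",
                 {"neck": (0.5, 0.2), "right_hand": (0.7, 0.4)}),   # placeholder coordinates
]
```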

<Conversion Unit>
The conversion unit 12 shown in FIG. 4 executes a conversion process that converts each coordinate included in the skeletal data into a distance from the origin coordinates. In the conversion process, based on the coordinates corresponding to each of the plurality of body parts included in the skeletal data, the distance between each coordinate and the origin coordinates corresponding to the origin part is calculated, and each coordinate included in the skeletal data is replaced with the calculated distance.

The conversion unit 12 serves the same function in the learning mode and in the motion recognition mode, but the skeletal data it receives differs. In the learning mode, the distances are calculated from the skeletal data included in the motion information generated from the training video. In the motion recognition mode, the distances are calculated from the skeletal data extracted from the video of the worker to be recognized.

FIG. 6 schematically shows the neck set as the origin part a, and the calculation of the distance between the origin coordinates corresponding to the neck and the coordinates corresponding to each body part.

Converting each coordinate of the skeletal data shown in FIG. 6 into a distance from the origin coordinates enables the following kind of recognition. For example, if the distance corresponding to the right hand grows over time, it is possible to recognize how far the right hand is being moved away from the neck (extended). Conversely, if the distance corresponding to the right hand shrinks over time, it is possible to recognize how far the right hand is being moved toward the neck (retracted).
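As a minimal sketch of this reading of the distance series, the function below classifies the net change of an origin-to-part distance over a time window and reports its magnitude. The function name, the labels, and the tolerance are assumptions made for illustration.

```python
def distance_trend(distances, tolerance=1e-9):
    """Classify the net change of an origin-to-part distance series.

    Returns a (label, amount) pair, where amount is the absolute net change.
    """
    if len(distances) < 2:
        raise ValueError("need at least two samples")
    change = distances[-1] - distances[0]
    if change > tolerance:
        return "extending", change    # part is moving away from the origin part
    if change < -tolerance:
        return "retracting", -change  # part is moving toward the origin part
    return "steady", 0.0


# Hypothetical neck-to-right-hand distances over consecutive frames.
reach_label, reach_amount = distance_trend([0.30, 0.45, 0.70])
pull_label, pull_amount = distance_trend([0.70, 0.50])
```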

The distance d_i from the origin coordinates can be calculated, for example, by formula (1). In formula (1), the origin coordinates corresponding to the origin part a are (x_a, y_a), and the coordinates corresponding to an arbitrary body part i are (x_i, y_i), where i ranges over the body parts. The coordinates corresponding to the arbitrary body part i may include the origin coordinates themselves.
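Formula (1) itself is not reproduced in this text. Assuming it is the ordinary Euclidean distance between the two points defined above, it would read:

```latex
d_i = \sqrt{(x_i - x_a)^2 + (y_i - y_a)^2} \tag{1}
```

Under this assumption, the origin part itself (i = a) yields d_a = 0, and a three-dimensional variant would add a (z_i - z_a)^2 term under the square root.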

Based on the distance d_i calculated by formula (1), the conversion unit 12 replaces the coordinates (x_i, y_i) corresponding to an arbitrary body part i with, for example, (d_i, d_i).
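The replacement described above can be sketched as follows for a single skeleton. This is only an illustration: the dictionary layout, the part names, and the function name `convert_frame` are hypothetical, and the distance is assumed to be Euclidean.

```python
import math


def convert_frame(skeleton, origin_part="neck"):
    """Replace each (x_i, y_i) with (d_i, d_i), where d_i is the assumed
    Euclidean distance from the starting part's coordinates."""
    x_a, y_a = skeleton[origin_part]
    converted = {}
    for part, (x_i, y_i) in skeleton.items():
        d_i = math.hypot(x_i - x_a, y_i - y_a)
        converted[part] = (d_i, d_i)  # same slot count as the original coordinates
    return converted


# Hypothetical skeleton for one frame (pixel coordinates).
frame = {"neck": (100.0, 50.0), "right_hand": (103.0, 54.0), "left_hand": (94.0, 58.0)}
print(convert_frame(frame))
```

Writing the distance into both coordinate slots keeps the shape of the skeletal data unchanged, so a model that expects two values per body part can consume the converted data without modification.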

The conversion unit 12 may further include an adjustment unit that performs an adjustment process on the skeletal data extracted from the video. The adjustment unit executes the adjustment process before the conversion process described above is executed.

The adjustment process includes, for example, a time-series interpolation process for the skeletal data, a height (physique) normalization process, a time-series smoothing process for the skeletal data, a shift process for the skeletal data, and a noise addition process. Each of these processes is described below.

The time-series interpolation process fills in missing skeletal data based on other skeletal data located before and after it in time. Missing data can arise, for example, for body parts that cannot be estimated because they are hidden by the worker's posture or the like.
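The description does not specify how the gap is filled. One simple possibility, shown here purely as an assumption, is linear interpolation between the nearest known frames, with gaps at either end copying the nearest known value. The function name `fill_missing` and the use of `None` for missing entries are hypothetical.

```python
def fill_missing(track):
    """Fill None entries in a per-part list of (x, y) tuples by linear
    interpolation between the nearest known frames (assumed method)."""
    known = [i for i, p in enumerate(track) if p is not None]
    if not known:
        raise ValueError("track has no known points")
    filled = list(track)
    for i, p in enumerate(track):
        if p is not None:
            continue
        before = max((k for k in known if k < i), default=None)
        after = min((k for k in known if k > i), default=None)
        if before is None:
            filled[i] = track[after]      # leading gap: copy first known value
        elif after is None:
            filled[i] = track[before]     # trailing gap: copy last known value
        else:
            t = (i - before) / (after - before)
            (x0, y0), (x1, y1) = track[before], track[after]
            filled[i] = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    return filled


# The right hand is occluded in the middle frame.
print(fill_missing([(0.0, 0.0), None, (2.0, 4.0)]))
```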

The height (physique) normalization process normalizes the skeletal data based on body shape in order to absorb differences in physique, for example between men and women. Normalized data can be generated, for example, by dividing the skeletal data by the length of the torso (for example, the length from the nose to the waist).
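A minimal sketch of the torso-length division follows. The part names `nose` and `waist`, the dictionary layout, and the function name are assumptions; the torso length is taken here as the Euclidean distance between the two parts.

```python
import math


def normalize_by_torso(skeleton, top="nose", bottom="waist"):
    """Divide every coordinate by the torso length (assumed to be the
    Euclidean distance between the `top` and `bottom` parts)."""
    (xt, yt), (xb, yb) = skeleton[top], skeleton[bottom]
    torso = math.hypot(xt - xb, yt - yb)
    if torso == 0:
        raise ValueError("torso length is zero; cannot normalize")
    return {part: (x / torso, y / torso) for part, (x, y) in skeleton.items()}


skeleton = {"nose": (0.0, 0.0), "waist": (0.0, 50.0), "right_hand": (25.0, 10.0)}
print(normalize_by_torso(skeleton))
```

After this step, two workers of different height performing the same reach produce coordinates on a comparable scale.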

The time-series smoothing process removes noise from the changes of the skeletal data along the time axis. Smoothed data can be generated, for example, by applying a Gaussian filter to the skeletal data.
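The following sketch applies a one-dimensional Gaussian filter along the time axis to a single coordinate series. The values of `sigma` and `radius` and the handling of the series ends (renormalizing over the samples that exist) are assumptions, since the description only names the filter type.

```python
import math


def gaussian_kernel(sigma, radius):
    """Return normalized Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(series, sigma=1.0, radius=2):
    """Smooth one coordinate over time; near the ends, the weights are
    renormalized over the frames that actually exist."""
    kernel = gaussian_kernel(sigma, radius)
    n = len(series)
    out = []
    for t in range(n):
        acc = 0.0
        norm = 0.0
        for k, w in zip(range(-radius, radius + 1), kernel):
            j = t + k
            if 0 <= j < n:
                acc += w * series[j]
                norm += w
        out.append(acc / norm)
    return out


# A single-frame jitter spike in an otherwise still coordinate.
print(smooth([0.0, 0.0, 10.0, 0.0, 0.0]))
```

In practice the same filter would be applied to the x and y series of every body part.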

The shift process aligns the origin of the skeletal data and unifies the starting point, which suppresses variation between motions and makes the features of a motion easier to recognize. For example, the entire skeletal data is translated so that the neck joint is located at the origin, producing the shifted skeletal data.
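The translation described above amounts to subtracting the neck coordinates from every body part. A minimal sketch, with a hypothetical dictionary layout and function name:

```python
def shift_to_origin(skeleton, origin_part="neck"):
    """Translate the whole skeleton so that `origin_part` lands at (0, 0)."""
    ox, oy = skeleton[origin_part]
    return {part: (x - ox, y - oy) for part, (x, y) in skeleton.items()}


skeleton = {"neck": (100.0, 50.0), "right_hand": (130.0, 90.0)}
print(shift_to_origin(skeleton))
```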

The noise addition process virtually increases the amount of skeletal data by adding noise to it. The noise can be generated, for example, by randomly generating values within a range that remains valid as skeletal data.
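One way to realize this kind of augmentation is sketched below. The uniform distribution, the `max_offset` bound standing in for "a range that remains valid as skeletal data", and the function name are all assumptions; the description does not specify the noise distribution.

```python
import random


def augment_with_noise(skeleton, copies=3, max_offset=2.0, seed=None):
    """Create `copies` jittered variants of one skeleton by adding a
    uniform random offset in [-max_offset, max_offset] to each coordinate."""
    rng = random.Random(seed)
    augmented = []
    for _ in range(copies):
        noisy = {
            part: (x + rng.uniform(-max_offset, max_offset),
                   y + rng.uniform(-max_offset, max_offset))
            for part, (x, y) in skeleton.items()
        }
        augmented.append(noisy)
    return augmented


variants = augment_with_noise({"neck": (100.0, 50.0)}, copies=3, seed=0)
print(len(variants))
```

Passing a fixed `seed` makes the augmented training set reproducible between runs.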

It is preferable that the adjustment unit executes every process of the adjustment process described above in the learning mode, and executes every process except the noise addition process in the motion recognition mode.
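The mode-dependent selection can be expressed as a small configuration function. The step names, the mode strings, and the execution order (taken from the order in which the processes are listed, which the description does not state to be mandatory) are assumptions.

```python
def build_adjustment_steps(mode):
    """Return the names of the adjustment steps to run for a given mode.
    Noise addition is used only when training data is being prepared."""
    steps = ["interpolate", "normalize", "smooth", "shift"]
    if mode == "learning":
        steps.append("add_noise")
    elif mode != "recognition":
        raise ValueError(f"unknown mode: {mode!r}")
    return steps


print(build_adjustment_steps("learning"))
print(build_adjustment_steps("recognition"))
```

Leaving noise out at recognition time is the usual practice for data augmentation: the perturbation helps the model generalize during training, whereas at inference it would only degrade the input.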

<Learning unit>
The learning unit 13 shown in FIG. 4 is a function used in the learning mode. Based on the motion information 19b after the conversion unit 12 has converted each coordinate of the skeletal data into a distance, the learning unit 13 generates (trains) a motion recognition model that outputs information indicating a motion of the worker that corresponds to one of the common motions.

The model trained by the learning unit 13 is transmitted to the storage unit 19 and stored as the trained model 19c.

<Motion recognition unit>
The motion recognition unit 14 is a function used in the motion recognition mode. The motion recognition unit 14 inputs, to the trained model 19c, the motion information 19b including the distance data that the conversion unit 12 has converted from each coordinate of the skeletal data of the worker to be recognized, and recognizes the motion of the worker based on the information output from the trained model 19c indicating a motion of the worker that corresponds to one of the common motions. A known motion recognition method such as ST-GCN (Spatial Temporal Graph Convolutional Networks) can be used for this recognition.

[Hardware configuration]
Next, an example of the hardware configuration of the motion recognition device 10 according to the present embodiment will be described with reference to FIG. 7. The motion recognition device 10 includes a CPU (Central Processing Unit) 10a corresponding to an arithmetic device, a RAM (Random Access Memory) 10b corresponding to the storage unit 19, a ROM (Read Only Memory) 10c corresponding to the storage unit 19, a communication device 10d, an input device 10e, and a display device 10f. These components are connected via a bus so that they can exchange data with one another. Although the present embodiment describes a case where the motion recognition device 10 is configured as a single computer, the motion recognition device 10 may be realized using a plurality of computers.

The CPU 10a functions as a control unit that executes programs stored in the RAM 10b or the ROM 10c and performs calculation and processing of data. The CPU 10a receives various input data from the input device 10e and the communication device 10d, and displays the results of calculations on the input data on the display device 10f or stores them in the RAM 10b or the ROM 10c.

The RAM 10b is composed of, for example, a semiconductor memory element and stores rewritable data. The ROM 10c is composed of, for example, a semiconductor memory element and stores data that can be read but not rewritten.

The communication device 10d is an interface that connects the motion recognition device 10 to external equipment. The communication device 10d is connected to the image sensor 20 via a communication network such as a LAN (Local Area Network) or the Internet, for example, and receives video from the image sensor 20.

The input device 10e is an interface that accepts data input from a user and may include, for example, a keyboard, a mouse, and a touch panel.

The display device 10f is an interface that visually displays the results of calculations performed by the CPU 10a and the like, and can be configured with, for example, an LCD (Liquid Crystal Display).

The program may be provided stored in a computer-readable storage medium such as the RAM 10b or the ROM 10c, or may be provided via a communication network connected through the communication device 10d. In the motion recognition device 10, the CPU 10a executes the program to perform the operations of the acquisition unit 11, the conversion unit 12, the learning unit 13, and the motion recognition unit 14 shown in FIG. 4. These physical configurations are examples and need not be independent components. For example, the motion recognition device 10 may include an LSI (Large-Scale Integration) in which the CPU 10a is integrated with the RAM 10b and the ROM 10c.

§3 Operation example
FIG. 8 is a flowchart showing an example of the operation of the motion recognition device 10 according to the present embodiment in the learning mode. In this operation, a motion recognition model is trained using the motion information 19b obtained after each coordinate included in the skeletal data extracted from the learning video has been converted into a distance from the starting point coordinates.

First, the acquisition unit 11 acquires, from the storage unit 19, the motion information 19b generated based on the learning video 19a (step S101). This motion information 19b is generated from a video of a worker working according to a work process that includes a common motion. The skeletal data item of the motion information 19b stores the skeletal data of the worker corresponding to that video.

Next, the conversion unit 12 executes a conversion process that converts each coordinate included in the skeletal data into a distance from the starting point coordinates (step S102). The procedure of this conversion process is described later.

Next, the learning unit 13 trains a motion recognition model that outputs information indicating a motion of the worker that corresponds to one of the common motions, based on the motion information 19b including the distance data obtained by the conversion in step S102 (step S103). This operation then ends.

The procedure of the conversion process executed in step S102 will be described with reference to FIG. 9. In this conversion process, the neck is assumed to have been set in advance as the starting part.

First, the acquisition unit 11 extracts one set of skeletal data per frame from the images of the video 19a, which has, for example, F frames (step S201). That is, the acquisition unit 11 extracts F sets of skeletal data.

Next, using formula (1) above, the conversion unit 12 calculates the distance between the coordinates corresponding to each of the plurality of body parts included in one set of skeletal data and the starting point coordinates corresponding to the starting part (step S202).

Next, the conversion unit 12 converts each coordinate included in the skeletal data into the distance calculated in step S202 (step S203). As a result, the distance data is stored in the skeletal data item of the motion information 19b.

Next, the conversion unit 12 determines whether the coordinates of all F sets of skeletal data have been converted into distances (step S204). If the determination is NO, the process returns to step S202; if the determination is YES, the conversion process ends.
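The loop over steps S202 to S204 can be sketched as follows, assuming the F sets of skeletal data from step S201 are already available as a list of dictionaries. The data layout, the function name `convert_video`, and the Euclidean distance are assumptions for illustration.

```python
import math


def convert_video(frames, origin_part="neck"):
    """Convert every frame's coordinates to distances from the starting part."""
    converted = []
    index = 0
    while index < len(frames):                      # S204: all F sets converted?
        skeleton = frames[index]
        x_a, y_a = skeleton[origin_part]
        distances = {                               # S202: distance per body part
            part: math.hypot(x - x_a, y - y_a)
            for part, (x, y) in skeleton.items()
        }
        converted.append(                           # S203: replace coordinates
            {part: (d, d) for part, d in distances.items()}
        )
        index += 1
    return converted


# Two hypothetical frames (F = 2) in which the right hand moves away from the neck.
frames = [
    {"neck": (0.0, 0.0), "right_hand": (3.0, 4.0)},
    {"neck": (0.0, 0.0), "right_hand": (6.0, 8.0)},
]
print(convert_video(frames))
```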

FIG. 10 is a flowchart showing an example of the operation of the motion recognition device 10 according to the present embodiment in the motion recognition mode. This operation assumes that the video 19a capturing the motion of the worker to be recognized has already been stored in the storage unit 19.

First, the acquisition unit 11 acquires the video 19a for motion recognition from the storage unit 19 (step S301). This video 19a captures a worker working according to a work process that includes a common motion. The skeletal data item of the motion information 19b stores the skeletal data of the worker corresponding to that video.

Next, the conversion unit 12 executes a conversion process that converts each coordinate included in the skeletal data into a distance from the starting point coordinates (step S302). The procedure of this conversion process is the same as the procedure of FIG. 9 described above, so its description is omitted.

Next, the motion recognition unit 14 inputs the motion information 19b including the distance data obtained by the conversion in step S302 to the trained model 19c, and recognizes the motion of the worker based on the information output from the trained model 19c indicating a motion of the worker that corresponds to one of the common motions (step S303). This operation then ends.
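To illustrate only the data flow between training (step S103) and recognition (step S303), the sketch below uses a deliberately simple nearest-centroid classifier as a stand-in for the model. It is not ST-GCN and is not the model of the embodiment; the labels "grasp" and "carry", the two-element distance vectors, and the function names are hypothetical.

```python
import math


def train_centroids(samples):
    """samples: list of (distance_vector, label). Returns {label: centroid}.
    Stand-in for the learning step; a real system would train e.g. ST-GCN."""
    sums, counts = {}, {}
    for vector, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vector))
        for k, value in enumerate(vector):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def recognize(model, vector):
    """Return the label whose centroid is closest to the distance vector."""
    return min(model, key=lambda label: math.dist(model[label], vector))


# Distance vectors: (left-hand distance, right-hand distance) from the neck.
training = [
    ([5.0, 20.0], "grasp"), ([6.0, 22.0], "grasp"),
    ([5.0, 8.0], "carry"), ([7.0, 10.0], "carry"),
]
model = train_centroids(training)
print(recognize(model, [5.5, 19.0]))
```

The point of the sketch is that both phases consume the same distance representation, so the conversion unit's output format must be identical in the learning mode and the motion recognition mode.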

As described above, the motion recognition device 10 according to the present embodiment can calculate the distance between the coordinates corresponding to each of the plurality of body parts included in the skeletal data extracted from the video of the learning-target worker and the starting point coordinates corresponding to the starting part, and can replace each coordinate of the skeletal data with the calculated distance. Then, based on the motion information after each coordinate of the skeletal data has been replaced with distance data, it can train a motion recognition model that outputs information indicating a motion of the worker that corresponds to one of the common motions.

Likewise, the device can calculate the distance between the coordinates corresponding to each of the plurality of body parts included in the skeletal data extracted from the video of the worker to be recognized and the starting point coordinates corresponding to the starting part, and can replace each coordinate of the skeletal data with the calculated distance. By inputting the resulting distance data into the trained model, it can recognize a motion of the worker that corresponds to one of the common motions.

Therefore, the motion recognition device 10 according to the present embodiment can recognize each of the common motions performed in different work processes.

The present invention is not limited to the embodiment described above and can be implemented in various other forms without departing from the gist of the present invention. For example, the embodiment of the present invention can also be described as in the following appendices. However, the embodiment of the present invention is not limited to the forms described in the following appendices. The embodiment of the present invention may also take a form in which the descriptions of the appendices are substituted for one another or combined.

[Appendix 1]
A motion recognition device (10) comprising:
an acquisition unit (11) that acquires motion information (19b) including an elapsed time of a task, a common motion included in common in the different work processes to be recognized, and skeletal data of a worker;
a conversion unit (12) that, based on coordinates corresponding to each of a plurality of body parts included in the skeletal data, calculates a distance between each of the coordinates and starting point coordinates corresponding to a starting part defined as a starting point among the body parts, and converts each of the coordinates included in the skeletal data into the calculated distance; and
a learning unit (13) that trains, as a motion recognition model, a model that outputs information indicating a motion of the worker corresponding to one of the common motions, based on the motion information (19b) after the coordinates have been converted into the distances by the conversion unit (12).

[Appendix 2]
A motion recognition device (10) comprising:
an acquisition unit (11) that acquires time-series information (19a) on a motion of a worker;
a conversion unit (12) that, based on coordinates corresponding to each of a plurality of body parts included in skeletal data extracted from the time-series information (19a), calculates a distance between each of the coordinates and starting point coordinates corresponding to a starting part defined as a starting point among the body parts, and converts each of the coordinates included in the skeletal data into the calculated distance; and
a motion recognition unit (14) that inputs the distance obtained by the conversion by the conversion unit (12) into a trained model (19c), and recognizes the motion of the worker based on information, output from the trained model (19c), indicating a motion of the worker corresponding to one of the common motions included in common in the different work processes to be recognized.

[Appendix 3]
The motion recognition device (10) according to Appendix 1 or 2, wherein the starting part is either the neck or the waist.

[Appendix 4]
A motion recognition method executed by a processor (10a), the method comprising:
acquiring motion information (19b) including an elapsed time of a task, a common motion included in common in the different work processes to be recognized, and skeletal data of a worker;
based on coordinates corresponding to each of a plurality of body parts included in the skeletal data, calculating a distance between each of the coordinates and starting point coordinates corresponding to a starting part defined as a starting point among the body parts, and converting each of the coordinates included in the skeletal data into the calculated distance; and
training, as a motion recognition model, a model that outputs information indicating a motion of the worker corresponding to one of the common motions, based on the motion information (19b) after the coordinates have been converted into the distances.

[Appendix 5]
A motion recognition method executed by a processor (10a), the method comprising:
acquiring time-series information on a motion of a worker;
based on coordinates corresponding to each of a plurality of body parts included in skeletal data extracted from the time-series information (19a), calculating a distance between each of the coordinates and starting point coordinates corresponding to a starting part defined as a starting point among the body parts, and converting each of the coordinates included in the skeletal data into the calculated distance; and
inputting the distance obtained by the conversion into a trained model (19c), and recognizing the motion of the worker based on information, output from the trained model (19c), indicating a motion of the worker corresponding to one of the common motions included in common in the different work processes to be recognized.

[Appendix 6]
A motion recognition program that causes a computer to function as:
an acquisition unit (11) that acquires motion information (19b) including an elapsed time of a task, a common motion included in common in the different work processes to be recognized, and skeletal data of a worker;
a conversion unit (12) that, based on coordinates corresponding to each of a plurality of body parts included in the skeletal data, calculates a distance between each of the coordinates and starting point coordinates corresponding to a starting part defined as a starting point among the body parts, and converts each of the coordinates included in the skeletal data into the calculated distance; and
a learning unit (13) that trains, as a motion recognition model, a model that outputs information indicating a motion of the worker corresponding to one of the common motions, based on the motion information (19b) after the coordinates have been converted into the distances by the conversion unit (12).

[Appendix 7]
A motion recognition program that causes a computer to function as:
an acquisition unit (11) that acquires time-series information (19a) on a motion of a worker;
a conversion unit (12) that, based on coordinates corresponding to each of a plurality of body parts included in skeletal data extracted from the time-series information (19a), calculates a distance between each of the coordinates and starting point coordinates corresponding to a starting part defined as a starting point among the body parts, and converts each of the coordinates included in the skeletal data into the calculated distance; and
a motion recognition unit (14) that inputs the distance obtained by the conversion by the conversion unit (12) into a trained model (19c), and recognizes the motion of the worker based on information, output from the trained model (19c), indicating a motion of the worker corresponding to one of the common motions included in common in the different work processes to be recognized.

10...motion recognition device, 10a...CPU, 10b...RAM, 10c...ROM, 10d...communication device, 10e...input device, 10f...display device, 11...acquisition unit, 12...conversion unit, 13...learning unit, 14...motion recognition unit, 19...storage unit, 19a...video, 19b...motion information, 19c...trained model, 20a, 20b, 20c...image sensor, 100...motion recognition system, A...worker, R...work area, a...starting part

Claims

1. A motion recognition device comprising:
an acquisition unit that acquires, from a storage unit, motion information generated based on time-series information on a motion of a first worker in a task performed by the first worker to be recognized, the motion information storing, in association with each other, an elapsed time of the task performed by the first worker, information indicating a common motion commonly performed by all workers although the movement differs in each work process, and skeletal data of the first worker;
a conversion unit that, based on coordinates of a coordinate system representing the skeletal data and corresponding to each of a plurality of body parts included in the skeletal data of the acquired motion information, calculates a distance between each of the coordinates and starting point coordinates corresponding to a starting part, among the body parts, that is defined as a starting point near the center of the worker's movement, and replaces each of the coordinates included in the skeletal data with the calculated distance; and
a motion recognition unit that inputs motion information including the distance after the replacement by the conversion unit into a trained model, and recognizes the motion of the first worker based on information, output from the trained model, indicating a motion of a worker corresponding to one of the common motions,
wherein the trained model is a model trained to receive, as input, motion information generated based on time-series information on a motion of a second worker in a task performed by the second worker to be learned, after the coordinates included in the skeletal data of that motion information have been replaced with the distance by the conversion unit, and to output information indicating a motion of a worker corresponding to one of the common motions.

2. The motion recognition device according to claim 1, wherein the starting part is either the neck or the waist.

3. A motion recognition method executed by a processor, the method comprising:
acquiring, from a storage unit, motion information generated based on time-series information on a motion of a first worker in a task performed by the first worker to be recognized, the motion information storing, in association with each other, an elapsed time of the task performed by the first worker, information indicating a common motion commonly performed by all workers although the movement differs in each work process, and skeletal data of the first worker;
based on coordinates of a coordinate system representing the skeletal data and corresponding to each of a plurality of body parts included in the skeletal data of the acquired motion information, calculating a distance between each of the coordinates and starting point coordinates corresponding to a starting part, among the body parts, that is defined as a starting point near the center of the worker's movement, and replacing each of the coordinates included in the skeletal data with the calculated distance; and
inputting motion information including the distance after the replacement into a trained model, and recognizing the motion of the first worker based on information, output from the trained model, indicating a motion of a worker corresponding to one of the common motions,
wherein the trained model is a model trained to receive, as input, motion information generated based on time-series information on a motion of a second worker in a task performed by the second worker to be learned, after the coordinates included in the skeletal data of that motion information have been replaced with the distance, and to output information indicating a motion of a worker corresponding to one of the common motions.

4. A motion recognition program that causes a computer to function as:
an acquisition unit that acquires, from a storage unit, motion information generated based on time-series information on a motion of a first worker in a task performed by the first worker to be recognized, the motion information storing, in association with each other, an elapsed time of the task performed by the first worker, information indicating a common motion commonly performed by all workers although the movement differs in each work process, and skeletal data of the first worker;
a conversion unit that, based on coordinates of a coordinate system representing the skeletal data and corresponding to each of a plurality of body parts included in the skeletal data of the acquired motion information, calculates a distance between each of the coordinates and starting point coordinates corresponding to a starting part, among the body parts, that is defined as a starting point near the center of the worker's movement, and replaces each of the coordinates included in the skeletal data with the calculated distance; and
a motion recognition unit that inputs motion information including the distance after the replacement by the conversion unit into a trained model, and recognizes the motion of the first worker based on information, output from the trained model, indicating a motion of a worker corresponding to one of the common motions,
wherein the trained model is a model trained to receive, as input, motion information generated based on time-series information on a motion of a second worker in a task performed by the second worker to be learned, after the coordinates included in the skeletal data of that motion information have been replaced with the distance by the conversion unit, and to output information indicating a motion of a worker corresponding to one of the common motions.