JP7635822B2

JP7635822B2 - Joint point detection device, joint point detection method, and program

Info

Publication number: JP7635822B2
Application number: JP2023502224A
Authority: JP
Inventors: 遊哉石井
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2021-02-26
Filing date: 2022-02-01
Publication date: 2025-02-26
Anticipated expiration: 2042-02-01
Also published as: WO2022181251A1; JPWO2022181251A1

Description

本発明は、画像から生体の関節点を検出するための、関節点検出装置、及び関節点検出方法に関し、更には、これらを実現するためのプログラムに関する。 The present invention relates to a joint point detection device and a joint point detection method for detecting joint points of a living body from an image, and further to a program for implementing these.

近年、画像から人の姿勢を推定するシステムが提案されている。このようなシステムは、映像監視、ユーザインタフェース等の分野での利用が期待されている。例えば、画像監視システムにおいて、人の姿勢を推定できれば、カメラに写った人物が何をしているかを推定できるので、監視精度の向上が図られる。また、ユーザインタフェースにおいて、人の姿勢を推定できれば、ジェスチャーによる入力が可能となる。 In recent years, systems have been proposed that can estimate a person's posture from an image. Such systems are expected to be used in fields such as video surveillance and user interfaces. For example, if a person's posture can be estimated in an image surveillance system, it would be possible to estimate what a person captured on camera is doing, thereby improving surveillance accuracy. Furthermore, in a user interface, if a person's posture can be estimated, input would be possible through gestures.

例えば、非特許文献１は、画像から人の姿勢、とりわけ、人の手の姿勢を推定するシステムを開示している。非特許文献１に開示されたシステムは、まず、手の画像を含む画像データを取得すると、取得した画像データを、関節点毎の画像特徴量を機械学習したニューラルネットワークに入力して、関節点毎に、関節点の存在確率を色彩及び濃度によって表現するヒートマップを出力させる。For example, Non-Patent Document 1 discloses a system that estimates a person's posture, particularly the posture of a person's hand, from an image. The system disclosed in Non-Patent Document 1 first acquires image data including an image of the hand, inputs the acquired image data into a neural network that has machine-learned image features for each joint point, and outputs a heat map that represents the probability of the existence of each joint point using color and density.

続いて、非特許文献１に開示されたシステムは、関節点と対応するヒートマップとの関係を機械学習したニューラルネットワークに、出力されたヒートマップを入力する。また、このようなニューラルネットワークは複数個用意されており、あるニューラルネットワークからの出力結果は、別のニューラルネットワークに入力される。この結果、ヒートマップ上の関節点の位置がリファインされる。Next, the system disclosed in Non-Patent Document 1 inputs the output heat map to a neural network that has learned the relationship between joint points and the corresponding heat map through machine learning. In addition, multiple such neural networks are prepared, and the output results from one neural network are input to another neural network. As a result, the positions of the joint points on the heat map are refined.

また、特許文献１も、画像から手の姿勢を推定するシステムを開示している。特許文献１に開示されたシステムも、非特許文献１に開示されたシステムと同様に、ニューラルネットワークを使用して、関節点の座標を推定する。Patent Document 1 also discloses a system for estimating hand posture from an image. Like the system disclosed in Non-Patent Document 1, the system disclosed in Patent Document 1 also uses a neural network to estimate the coordinates of joint points.

Patent Document 1: JP 2017-191576 A

Non-Patent Document 1: Christian Zimmermann, Thomas Brox, "Learning to Estimate 3D Hand Pose from Single RGB Images", [online], University of Freiburg, [Retrieved February 8, 2021], Internet <URL: https://openaccess.thecvf.com/content_ICCV_2017/papers/Zimmermann_Learning_to_Estimate_ICCV_2017_paper.pdf>

非特許文献１又は特許文献１に開示されたシステムを用いれば、上述したように、画像から人の手の関節点の座標を推定することができるが、これらのシステムには、以下のように推定精度が低下するという問題点がある。 As described above, by using the systems disclosed in Non-Patent Document 1 or Patent Document 1, it is possible to estimate the coordinates of the joint points of a person's hand from an image. However, these systems have the problem that the estimation accuracy decreases as described below.

まず、生体には多くの関節点があり、画像には、一部の関節点が映っていない場合がある。このような場合、非特許文献１及び特許文献１に開示されたシステムでは、画像に映っていない関節点のヒートマップでの位置が誤った位置となることがある。そして、この結果、ニューラルネットワークによって各関節点の位置がリファインされる際に、画像に映っていない関節点の誤った位置に引きずられて、画像に写っている関節点の位置までも誤った位置となる。First, a living body has many joint points, and some joint points may not be captured in the image. In such cases, in the systems disclosed in Non-Patent Document 1 and Patent Document 1, the positions of the joint points not captured in the image may be incorrect in the heat map. As a result, when the positions of each joint point are refined by the neural network, they are dragged along by the incorrect positions of the joint points not captured in the image, and even the positions of the joint points captured in the image become incorrect.

本発明の目的の一例は、関節点の位置の推定精度の向上を図り得る、関節点検出装置、関節点検出方法、及びプログラムを提供することにある。 An example of an object of the present invention is to provide a joint point detection device, a joint point detection method, and a program that can improve the accuracy of estimating the position of a joint point.

In order to achieve the above object, a joint point detection device according to one aspect of the present invention comprises:
a total feature output unit that outputs, from image data of a target, for each joint point of the target, a first feature representing that joint point;
a partial feature output unit that, for each of a plurality of specific joint points among the joint points of the target, receives as input the first features representing the joint points other than that specific joint point and, using a first machine learning model that has machine-learned the positional relationships between the joint points other than that specific joint point, outputs second features representing the joint points other than that specific joint point; and
an integrated feature output unit that, for each joint point of the target, receives as input the second feature representing that joint point and, using a second machine learning model that has machine-learned the position of that joint point, outputs a third feature representing that joint point.

In order to achieve the above object, a joint point detection method according to one aspect of the present invention includes:
a total feature output step of outputting, from image data of a target, for each joint point of the target, a first feature representing that joint point;
a partial feature output step of, for each of a plurality of specific joint points among the joint points of the target, receiving as input the first features representing the joint points other than that specific joint point and, using a first machine learning model that has machine-learned the positional relationships between the joint points other than that specific joint point, outputting second features representing the joint points other than that specific joint point; and
an integrated feature output step of, for each joint point of the target, receiving as input the second feature representing that joint point and, using a second machine learning model that has machine-learned the position of that joint point, outputting a third feature representing that joint point.

Furthermore, in order to achieve the above object, a program according to one aspect of the present invention causes a computer to execute:
a total feature output step of outputting, from image data of a target, for each joint point of the target, a first feature representing that joint point;
a partial feature output step of, for each of a plurality of specific joint points among the joint points of the target, receiving as input the first features representing the joint points other than that specific joint point and, using a first machine learning model that has machine-learned the positional relationships between the joint points other than that specific joint point, outputting second features representing the joint points other than that specific joint point; and
an integrated feature output step of, for each joint point of the target, receiving as input the second feature representing that joint point and, using a second machine learning model that has machine-learned the position of that joint point, outputting a third feature representing that joint point.

以上のように、本発明によれば、関節点の位置の推定精度の向上を図ることができる。 As described above, according to the present invention, it is possible to improve the accuracy of estimating the position of joint points.

FIG. 1 is a diagram showing a schematic configuration of a joint point detection device according to an embodiment.
FIG. 2 is a diagram showing in more detail the configuration of the joint point detection device 10 according to the embodiment.
FIG. 3 is a diagram illustrating the function of the total feature output unit in the embodiment.
FIG. 4 is a diagram illustrating the function of the partial feature output unit in the embodiment.
FIG. 5 is a diagram illustrating the function of the integrated feature output unit in the embodiment.
FIG. 6 is a flow chart showing the operation of the joint point detection device in the embodiment.
FIG. 7 is a diagram showing the configuration of the joint point detection device 10 according to a modified example of the embodiment.
FIG. 8 is a diagram illustrating the function of the total feature output unit in the modified example of the embodiment.
FIG. 9 is a diagram illustrating the function of the partial feature output unit in the modified example of the embodiment.
FIG. 10 is a diagram illustrating the function of the integrated feature output unit in the modified example of the embodiment.
FIG. 11 is a block diagram showing an example of a computer that realizes the joint point detection device according to the embodiment.

(Embodiment)
Hereinafter, a joint point detection device, a joint point detection method, and a program according to an embodiment will be described with reference to FIGS. 1 to 11.

[Device configuration]
First, a schematic configuration of the joint point detection device according to the embodiment will be described with reference to FIG. 1. FIG. 1 is a diagram showing a schematic configuration of the joint point detection device according to the embodiment.

図１に示す実施の形態における関節点検出装置１０は、対象、例えば、生体、ロボット等の関節点を検出するための装置である。図１に示すように、関節点検出装置１０は、全特徴量出力部１１と、部分特徴量出力部１２と、統合特徴量出力部１３と、を備えている。The joint point detection device 10 in the embodiment shown in Figure 1 is a device for detecting joint points of an object, such as a living body, a robot, etc. As shown in Figure 1, the joint point detection device 10 includes a total feature output unit 11, a partial feature output unit 12, and an integrated feature output unit 13.

全特徴量出力部１１は、対象の画像データから、対象の関節点それぞれ毎に、その関節点を表す第１の特徴量を出力する。The total feature output unit 11 outputs a first feature representing each joint point of the target from the image data of the target.

部分特徴量出力部１２は、対象の関節点のうちの複数の特定の関節点それぞれ毎に、その特定の関節点以外の関節点を表す第１の特徴量を入力として、第１の機械学習モデルを用いて、その特定の関節点以外の関節点を表す第２の特徴量を出力する。第１の機械学習モデルは、特定の関節点以外の関節点間の位置関係を機械学習している機械学習モデルである。The partial feature output unit 12 receives, for each of a plurality of specific joint points among the target joint points, a first feature representing a joint point other than the specific joint point, and outputs a second feature representing the joint point other than the specific joint point using a first machine learning model. The first machine learning model is a machine learning model that machine-learns the positional relationships between the joint points other than the specific joint points.

統合特徴量出力部１３は、対象の関節点それぞれ毎に、その関節点を表す第２の特徴量を入力として、第２の機械学習モデルを用いて、その関節点を表す第３の特徴量を出力する。第２の機械学習モデルは、関節点の位置を機械学習している機械学習モデルである。The integrated feature output unit 13 receives, for each target joint point, a second feature representing the joint point, and outputs a third feature representing the joint point using a second machine learning model. The second machine learning model is a machine learning model that learns the positions of the joint points by machine learning.

このように、実施の形態では、特定の関節点毎に、それ以外の関節点について、第１の機械学習モデルを用いて、第２の特徴量が出力される。第１の機械学習モデルは、特定の関節点以外の関節点間の位置関係を機械学習しているので、第２の特徴量は、特定の関節点が見えない場合における、それ以外の関節点の位置を適切に示すことができる。そして、各関節点の最終的な特徴量である第３の特徴量は、この第２の特徴量から得られているので、実施の形態によれば、各関節点の位置の推定精度を向上することが可能となる。 In this manner, in the embodiment, for each specific joint point, the second feature is output using the first machine learning model for the other joint points. Since the first machine learning model learns the positional relationships between joint points other than the specific joint point by machine learning, the second feature can appropriately indicate the positions of the other joint points when the specific joint point is not visible. And since the third feature, which is the final feature for each joint point, is obtained from this second feature, according to the embodiment, it is possible to improve the estimation accuracy of the position of each joint point.

続いて、図２～図５を用いて、実施の形態における関節点検出装置１０の構成及び機能について具体的に説明する。図２は、実施の形態における関節点検出装置１０の構成をより具体的に示す図である。図３は、実施の形態における全特徴量出力部の機能を説明する図である。図４は、実施の形態における部分特徴量出力部の機能を説明する図である。図５は、実施の形態における統合特徴量出力部の機能を説明する図である。 Next, the configuration and functions of the joint point detection device 10 in the embodiment will be specifically described using Figures 2 to 5. Figure 2 is a diagram showing the configuration of the joint point detection device 10 in the embodiment in more detail. Figure 3 is a diagram explaining the functions of the total feature output unit in the embodiment. Figure 4 is a diagram explaining the functions of the partial feature output unit in the embodiment. Figure 5 is a diagram explaining the functions of the integrated feature output unit in the embodiment.

図２に示すように、実施の形態では、関節点検出装置１０は、上述した全特徴量出力部１１、部分特徴量出力部１２、及び統合特徴量出力部１３に加えて、関節点検出部１４と、記憶部１５と、を備えている。As shown in FIG. 2, in an embodiment, the joint point detection device 10 includes, in addition to the total feature output unit 11, partial feature output unit 12, and integrated feature output unit 13 described above, a joint point detection unit 14 and a memory unit 15.

記憶部１５は、第１の機械学習モデル１６及び第２の機械学習モデル１７を格納している。実施の形態では、第１の機械学習モデル１６及び第２の機械学習モデル１７は、畳み込みニューラルネットワーク（ＣＮＮ：Convolutional Neural Network）によって構築されている。記憶部１５は、実施の形態では、後述する第３の機械学習モデル１８も格納している。第３の機械学習モデル１８も、ＣＮＮによって構築されている。なお、以降においては、機械学習モデルは、「ＣＮＮ」とも表記する。The memory unit 15 stores a first machine learning model 16 and a second machine learning model 17. In the embodiment, the first machine learning model 16 and the second machine learning model 17 are constructed by a convolutional neural network (CNN). In the embodiment, the memory unit 15 also stores a third machine learning model 18, which will be described later. The third machine learning model 18 is also constructed by a CNN. Note that hereinafter, the machine learning model will also be referred to as "CNN".

更に、以降においては、関節点の検出の対象が人の手である場合を例に挙げて説明する。なお、実施の形態において、関節点の検出の対象は、人の手に限定されず、人の体全体であっても良いし、他の部位であっても良い。また、関節点の検出の対象は、関節点を有するものであれば良く、人以外のもの、例えば、ロボットであっても良い。更に、実施の形態では、関節点に加え、関節点以外の部分、例えば、指先といった特徴的な部分も、検出の対象となっていても良い。 Furthermore, hereinafter, an example will be described in which the target of joint point detection is a human hand. Note that in the embodiments, the target of joint point detection is not limited to a human hand, but may be the entire human body or other parts. Also, the target of joint point detection may be anything that has joint points, and may be something other than a human, for example, a robot. Furthermore, in the embodiments, in addition to joint points, parts other than joint points, for example characteristic parts such as fingertips, may also be detected.

加えて、実施の形態では、第１の特徴量、第２の特徴量、及び第３の特徴量として、ヒートマップが用いられているとする。ヒートマップは、画像上の関節点が存在する可能性を表現するマップであり、例えば、関節点の存在の可能性を色の濃淡で表現することができる。なお、第１の特徴量、第２の特徴量、及び第３の特徴量として、ヒートマップ以外のものが用いられても良い。この点については変形例で後述する。In addition, in the embodiment, a heat map is used as the first feature amount, the second feature amount, and the third feature amount. A heat map is a map that represents the possibility of the existence of a joint point on an image, and for example, the possibility of the existence of a joint point can be represented by a shade of color. Note that something other than a heat map may be used as the first feature amount, the second feature amount, and the third feature amount. This point will be described later in a modified example.

全特徴量出力部１１は、実施の形態では、まず、対象の画像データ２０を取得する。そして、全特徴量出力部１１は、図３に示すように、第３のＣＮＮ１８に、画像データ２０を入力し、関節点の第１の特徴量として、第１のヒートマップ２１を出力させる。また、図３の例では、第１のヒートマップ２１は、画像データ２０上の関節点毎に複数出力されている。In the embodiment, the total feature output unit 11 first acquires the target image data 20. Then, as shown in Fig. 3, the total feature output unit 11 inputs the image data 20 to the third CNN 18, which outputs a first heat map 21 as a first feature of the joint point. Also, in the example of Fig. 3, multiple first heat maps 21 are output for each joint point on the image data 20.

第３のＣＮＮ１８は、画像上の関節点とヒートマップとの関係を学習している機械学習モデルである。第３のＣＮＮ１８の機械学習では、関節点の画像データと、正解となるヒートマップとが訓練データとなる。そして、第３のＣＮＮ１８の機械学習は、訓練データとなる画像データの出力結果（ヒートマップ）と正解となるヒートマップとの差分が小さくなるように、パラメータを更新することで行われる。The third CNN 18 is a machine learning model that learns the relationship between joint points on an image and a heat map. In the machine learning of the third CNN 18, image data of the joint points and a heat map that is the correct answer are used as training data. The machine learning of the third CNN 18 is performed by updating parameters so that the difference between the output result (heat map) of the image data that is the training data and the heat map that is the correct answer is reduced.
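The training objective described above (reducing the difference between the output heat map and the correct heat map) can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how the correct heat map is generated or which difference measure is used, so the Gaussian target, the `sigma` parameter, and the mean squared error are hypothetical choices, as are the function names.

```python
import numpy as np

def gaussian_heatmap(height, width, cx, cy, sigma=2.0):
    """Assumed form of a correct heat map: a 2D Gaussian peaking at pixel (cx, cy)."""
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

def heatmap_loss(predicted, target):
    """Assumed difference measure: mean squared error between two heat maps."""
    return float(np.mean((predicted - target) ** 2))

# Correct heat map for a joint point located at x=10, y=20.
target = gaussian_heatmap(32, 32, cx=10, cy=20)

# A perfect prediction has zero loss; a shifted one has a positive loss,
# which a parameter update would then try to reduce.
loss_exact = heatmap_loss(target, target)
loss_shifted = heatmap_loss(gaussian_heatmap(32, 32, cx=14, cy=20), target)
```

In an actual training loop, the loss would be computed on the CNN output and minimized by a gradient-based optimizer; that part is omitted here.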

部分特徴量出力部１２は、実施の形態では、図４に示すように、特定の関節点毎に構築された第１のＣＮＮ１６を用いる。図４の例では、特定の関節点は、第１関節、第２関節、指付け根であり、これらの関節点毎に第１のＣＮＮ１６が構築されている。また、図４の例では、第１のＣＮＮ１６は、特定の関節点に加えて、指先についても構築されている。In the embodiment, the partial feature output unit 12 uses a first CNN 16 constructed for each specific joint point as shown in Fig. 4. In the example of Fig. 4, the specific joint points are the first joint, the second joint, and the base of the finger, and a first CNN 16 is constructed for each of these joint points. Also, in the example of Fig. 4, the first CNN 16 is constructed for the fingertips in addition to the specific joint points.

第１のＣＮＮ１６は、上述したように、特定の関節点以外の関節点間の位置関係を機械学習している機械学習モデルである。第１のＣＮＮ１６の機械学習では、対応する特定の関節点（又は指先）以外の関節点のヒートマップと、正解となるヒートマップとが、訓練データとなる。第１のＣＮＮ１６の機械学習も、訓練データとなる画像データの出力結果（ヒートマップ）と正解となるヒートマップとの差分が小さくなるように、パラメータを更新することで行われる。As described above, the first CNN 16 is a machine learning model that machine-learns the positional relationships between joint points other than specific joint points. In the machine learning of the first CNN 16, the heat map of joint points other than the corresponding specific joint point (or fingertip) and the correct heat map become training data. The machine learning of the first CNN 16 is also performed by updating parameters so that the difference between the output result (heat map) of the image data that is the training data and the correct heat map becomes small.

部分特徴量出力部１２は、第１のＣＮＮ１６それぞれに、全特徴量出力部１１から入力された第１のヒートマップ２１のうち、対応する指先又は特定の関節点以外の関節点の第１のヒートマップ２１を入力する。これにより、第２の特徴量として、指先以外の第２のヒートマップ２２と、第１関節以外の第２のヒートマップ２２と、第２関節以外の第２のヒートマップ２２と、指付け根以外の第２のヒートマップ２２とが、出力される。The partial feature output unit 12 inputs, to each of the first CNNs 16, the first heat map 21 of the joint points other than the corresponding fingertip or specific joint point, out of the first heat map 21 input from the full feature output unit 11. As a result, the second heat map 22 other than the fingertip, the second heat map 22 other than the first joint, the second heat map 22 other than the second joint, and the second heat map 22 other than the base of the finger are output as second features.
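The input selection performed by the partial feature output unit 12 can be sketched as below. The index layout is an assumption made for illustration only (index 0 for the wrist, then base, second joint, first joint, and fingertip for each of five fingers); the source states that the hand has 21 joint points but does not define their ordering, and `GROUPS` and `partial_inputs` are hypothetical names.

```python
import numpy as np

NUM_JOINTS = 21  # number of hand joint points stated in the embodiment

# Hypothetical index layout: 0 = wrist, then four points per finger.
GROUPS = {
    "base":         [1, 5, 9, 13, 17],
    "second_joint": [2, 6, 10, 14, 18],
    "first_joint":  [3, 7, 11, 15, 19],
    "fingertip":    [4, 8, 12, 16, 20],
}

def partial_inputs(first_heatmaps):
    """For each first model, keep only the heat maps of the joint points
    other than the group that model corresponds to.

    first_heatmaps: array of shape (NUM_JOINTS, H, W).
    Returns {group name: (kept joint indices, heat maps of those joints)}.
    """
    selected = {}
    for name, excluded in GROUPS.items():
        kept = [j for j in range(NUM_JOINTS) if j not in excluded]
        selected[name] = (kept, first_heatmaps[kept])
    return selected

first_heatmaps = np.random.rand(NUM_JOINTS, 16, 16)
inputs = partial_inputs(first_heatmaps)
```

Each entry of `inputs` would then be passed to the first CNN built for that group, which never sees the heat maps of the joint points it is meant to do without.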

統合特徴量出力部１３は、実施の形態では、図５に示すように、関節点毎に構築された第２のＣＮＮ１７を用いる。図５の例では、第２のＣＮＮ１７は、手の２１個の関節点それぞれ毎に、構築されている。In the embodiment, the integrated feature output unit 13 uses a second CNN 17 constructed for each joint point as shown in Figure 5. In the example of Figure 5, the second CNN 17 is constructed for each of the 21 joint points of the hand.

第２のＣＮＮ１７は、上述したように、対応する関節点の位置を機械学習している機械学習モデルである。第２のＣＮＮ１７の機械学習でも、対応する関節点のヒートマップと、正解となるヒートマップとが、訓練データとなる。第２のＣＮＮ１７の機械学習も、訓練データとなる画像データの出力結果（ヒートマップ）と正解となるヒートマップとの差分が小さくなるように、パラメータを更新することで行われる。As described above, the second CNN 17 is a machine learning model that machine-learns the positions of corresponding joint points. In the machine learning of the second CNN 17, the heat map of the corresponding joint points and the correct heat map are also used as training data. The machine learning of the second CNN 17 is also performed by updating parameters so that the difference between the output result (heat map) of the image data that is the training data and the correct heat map is reduced.

統合特徴量出力部１３は、第１のＣＮＮ１６それぞれが出力した第２のヒートマップ２２をマージし、第２のＣＮＮ１７それぞれに、対応する関節点の第２のヒートマップ２２だけを入力する。これにより、第３の特徴量として、関節点毎に、第３のヒートマップ２３が出力される。The integrated feature output unit 13 merges the second heat maps 22 output by each of the first CNNs 16, and inputs only the second heat maps 22 of the corresponding joint points to each of the second CNNs 17. As a result, a third heat map 23 is output for each joint point as a third feature.
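The merge-and-route step can be sketched as follows, using a toy case with three joint points and two first models. The averaging function standing in for each second CNN is an assumption used only to make the example runnable; `merge_by_joint` and `integrate` are hypothetical names.

```python
import numpy as np

def merge_by_joint(partial_outputs, num_joints):
    """Collect, for every joint point, all second heat maps that any
    first model produced for it.

    partial_outputs: list of (kept joint indices, maps of shape (len(kept), H, W)).
    """
    merged = [[] for _ in range(num_joints)]
    for kept, maps in partial_outputs:
        for joint, heatmap in zip(kept, maps):
            merged[joint].append(heatmap)
    return merged

def integrate(merged, second_models):
    """Feed each per-joint second model only the maps of its own joint point."""
    return [model(np.stack(maps)) for model, maps in zip(second_models, merged)]

h = w = 4
out_a = ([1, 2], np.ones((2, h, w)))        # first model that excluded joint 0
out_b = ([0, 1], 3.0 * np.ones((2, h, w)))  # first model that excluded joint 2
merged = merge_by_joint([out_a, out_b], num_joints=3)

average = lambda stack: stack.mean(axis=0)  # stand-in for a trained second CNN
third = integrate(merged, [average] * 3)
```

Joint 1 was kept by both first models, so its second model receives two heat maps; joints 0 and 2 each receive one.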

その後、関節点検出部１４は、対象である手の関節点毎の第３のヒートマップ２３を取得する。そして、関節点検出部１４は、関節点毎の第３のヒートマップ２３を用いて、対象の関節点それぞれの座標を検出する。Then, the joint point detection unit 14 obtains a third heat map 23 for each joint point of the target hand. The joint point detection unit 14 then uses the third heat map 23 for each joint point to detect the coordinates of each of the target joint points.

具体的には、関節点検出部１４は、関節点毎に、第３のヒートマップ２３の最も濃度の高い箇所を特定し、特定した箇所の画像上の２次元座標を検出する。また、関節点毎に、第３のヒートマップ２３が複数存在する場合は、関節点検出部１４は、第３のヒートマップ２３毎に最も濃度の高い箇所の２次元座標を特定し、更に、特定した各２次元座標の平均を求め、求めた平均の座標を最終的な座標とする。Specifically, the joint point detection unit 14 identifies the location of highest density in the third heat map 23 for each joint point, and detects the two-dimensional coordinates of the identified location on the image. Furthermore, if there are multiple third heat maps 23 for each joint point, the joint point detection unit 14 identifies the two-dimensional coordinates of the location of highest density for each third heat map 23, and further calculates the average of each of the identified two-dimensional coordinates, and sets the calculated average coordinates as the final coordinates.
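The coordinate detection described above can be sketched as follows: the highest-density location of each heat map is found, and when a joint point has several heat maps the resulting coordinates are averaged. The function names are hypothetical, and the heat maps are assumed to be 2D arrays indexed as `[row, column]`.

```python
import numpy as np

def peak_xy(heatmap):
    """Return the (x, y) image coordinates of the highest-density location."""
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return float(col), float(row)

def detect_joint(heatmaps):
    """Average the peak coordinates over all heat maps of one joint point."""
    peaks = np.array([peak_xy(h) for h in heatmaps])
    x, y = peaks.mean(axis=0)
    return float(x), float(y)

a = np.zeros((8, 8))
a[3, 2] = 1.0  # peak at x=2, y=3
b = np.zeros((8, 8))
b[5, 4] = 1.0  # peak at x=4, y=5

single = detect_joint([a])       # (2.0, 3.0)
averaged = detect_joint([a, b])  # (3.0, 4.0)
```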

[Device operation]
Next, the operation of the joint point detection device 10 in the embodiment will be described with reference to FIG. 6. FIG. 6 is a flow chart showing the operation of the joint point detection device in the embodiment. In the following description, FIGS. 1 to 5 will be referred to as appropriate. In the embodiment, the joint point detection method is carried out by operating the joint point detection device 10. Therefore, the following description of the operation of the joint point detection device 10 also serves as the description of the joint point detection method in the embodiment.

図６に示すように、最初に、全特徴量出力部１１は、対象の画像データ２０を取得し、第３の機械学習モデル１８に、取得した画像データ２０を入力し、各関節点を表す第１のヒートマップ２１を出力させる（ステップＡ１）。As shown in FIG. 6, first, the full feature output unit 11 acquires target image data 20, inputs the acquired image data 20 to the third machine learning model 18, and outputs a first heat map 21 representing each joint point (step A1).

Next, the partial feature output unit 12 inputs, to each first CNN 16, those of the first heat maps 21 output in step A1 that belong to the joint points other than the corresponding fingertip or specific joint point, and causes each first CNN 16 to output second heat maps 22 representing the joint points other than that fingertip or specific joint point (step A2).

Next, the integrated feature output unit 13 merges the second heat maps 22 output by the first CNNs 16 in step A2, inputs to each second CNN 17 only the second heat maps 22 of the corresponding joint point, and causes each second CNN 17 to output a third heat map 23 representing the corresponding joint point (step A3).

次に、関節点検出部１４は、ステップＡ３で出力された関節点毎の第３のヒートマップ２３を用いて、対象の関節点それぞれの座標を検出する（ステップＡ４）。Next, the joint point detection unit 14 detects the coordinates of each target joint point using the third heat map 23 for each joint point output in step A3 (step A4).
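Steps A1 to A4 can be strung together as in the following sketch. All models are passed in as plain callables and replaced by trivial stand-ins (a fixed heat map generator, an identity function, and an average), so this only illustrates the data flow, not the trained networks; every name here is hypothetical.

```python
import numpy as np

def detect_joint_points(image, third_model, first_models, second_models):
    # Step A1: one first heat map per joint point, shape (J, H, W).
    first = third_model(image)
    num_joints = first.shape[0]

    # Step A2: each first model sees only the joints outside its excluded set.
    per_joint = [[] for _ in range(num_joints)]
    for excluded, model in first_models:
        kept = [j for j in range(num_joints) if j not in excluded]
        second = model(first[kept])
        for j, heatmap in zip(kept, second):
            per_joint[j].append(heatmap)

    # Step A3: each second model receives only the maps of its own joint.
    # Assumes every joint is kept by at least one first model.
    third = [second_models[j](np.stack(per_joint[j])) for j in range(num_joints)]

    # Step A4: take the highest-density location of each third heat map.
    coords = []
    for heatmap in third:
        row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
        coords.append((int(col), int(row)))
    return coords

def toy_third_model(image):
    """Stand-in for the third CNN: fixed peaks, ignoring the image."""
    maps = np.zeros((3, 8, 8))
    for j, (x, y) in enumerate([(1, 2), (3, 4), (5, 6)]):
        maps[j, y, x] = 1.0
    return maps

identity = lambda maps: maps                        # stand-in for a first CNN
mean_over_stack = lambda stack: stack.mean(axis=0)  # stand-in for a second CNN

coords = detect_joint_points(
    np.zeros((8, 8, 3)),
    toy_third_model,
    first_models=[({0}, identity), ({2}, identity)],
    second_models=[mean_over_stack] * 3,
)
```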

以上のように、実施の形態では、例えば、指先、第１関節、第２関節、付け根については、画像で見えなくなる可能性が高いため、これら以外の部分の第１の特徴量を用いて、第２の特徴量が出力される。そして、このようにして出力された第２の特徴量から、各関節点の第３の特徴量が得られるため、最終的に得られる各関節点の座標は正確な値となる。つまり、実施の形態によれば、各関節点の位置の推定精度を向上することが可能となる。 As described above, in the embodiment, for example, since there is a high possibility that the fingertip, first joint, second joint, and base will not be visible in the image, the second feature amount is output using the first feature amount of the other parts. Then, the third feature amount of each joint point is obtained from the second feature amount output in this way, so that the finally obtained coordinates of each joint point are accurate values. In other words, according to the embodiment, it is possible to improve the estimation accuracy of the position of each joint point.

[Modification]
Next, a modified example of the embodiment will be described with reference to FIGS. 7 to 10. FIG. 7 is a diagram showing the configuration of the joint point detection device 10 in the modified example of the embodiment. FIG. 8 is a diagram explaining the function of the total feature output unit in the modified example of the embodiment. FIG. 9 is a diagram explaining the function of the partial feature output unit in the modified example of the embodiment. FIG. 10 is a diagram explaining the function of the integrated feature output unit in the modified example of the embodiment.

図７に示すように、本変形例では、第１の機械学習モデル３１、第２の機械学習モデル３２は、グラフ畳み込みネットワーク（ＧＣＮ：Graphic Convolution Network）によって構築されている。ＧＣＮは、複数のノードで構成されたグラフ構造を入力とし、隣接するノードを用いて畳み込み処理を実行するネットワークである。本変形例においては、機械学習モデルは、「ＧＣＮ」とも表記する。なお、第３の機械学習モデルとしては、図２及び図３に示した第３のＣＮＮ１８が用いられている。As shown in FIG. 7, in this modification, the first machine learning model 31 and the second machine learning model 32 are constructed by a graph convolution network (GCN). A GCN is a network that takes a graph structure composed of multiple nodes as input and performs convolution processing using adjacent nodes. In this modification, the machine learning model is also written as "GCN". Note that the third machine learning model is the third CNN 18 shown in FIG. 2 and FIG. 3.
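A single graph convolution over adjacent nodes can be sketched as follows. The source does not specify the propagation rule, so this uses a commonly used one as an assumption: self-loops are added, the adjacency matrix is symmetrically normalized by node degree, and a ReLU is applied. The function name and the toy data are hypothetical.

```python
import numpy as np

def gcn_layer(node_features, adjacency, weight):
    """One graph convolution: relu(D^-1/2 (A + I) D^-1/2 X W).

    node_features: (N, F_in), adjacency: (N, N), weight: (F_in, F_out).
    """
    a_hat = adjacency + np.eye(adjacency.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))   # degree normalization
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ node_features @ weight, 0.0)

# Three nodes holding (x, y) coordinate values, connected as a chain 0-1-2.
nodes = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
chain = np.array([[0.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])

out = gcn_layer(nodes, chain, np.eye(2))
# With no edges, each node only sees itself, so the features pass through.
isolated = gcn_layer(nodes, np.zeros((3, 3)), np.eye(2))
```

With the chain adjacency, each output row mixes a node's own coordinates with those of its neighbours, which is how positional relationships between joint points can enter the model.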

加えて、本変形例では、第１の特徴量、及び第２の特徴量として、画像上の関節点の位置を示す座標値が用いられる。具体的には、ＧＣＮが用いられるため、ＧＣＮから出力される特徴量は、各ノードが関節点の座標値を表すグラフ構造である。In addition, in this modified example, coordinate values indicating the position of the joint point on the image are used as the first feature and the second feature. Specifically, since a GCN is used, the feature output from the GCN has a graph structure in which each node represents the coordinate value of the joint point.

また、本変形例では、特徴量として座標値が出力されるため、関節点検出装置１０においては、図２に示した関節点検出部１４は省略されている。なお、上述した点以外は、本変形例においても、関節点検出装置１０は、実施の形態で述べた構成を備えている。In addition, in this modified example, since coordinate values are output as feature quantities, the joint point detection unit 14 shown in Fig. 2 is omitted in the joint point detection device 10. Other than the points described above, the joint point detection device 10 in this modified example also has the configuration described in the embodiment.

全特徴量出力部１１は、本変形例においても、上述の例と同様に、図８に示すように、第３のＣＮＮ１８に、画像データ２０を入力し、関節点の第１の特徴量として、第１のヒートマップ２１を出力させる。In this modified example, as in the above example, the total feature output unit 11 inputs image data 20 to the third CNN 18 as shown in Figure 8, and outputs a first heat map 21 as the first feature of the joint point.

但し、本変形例では、全特徴量出力部１１は、出力された第１のヒートマップ２１を用いて、対象の関節点それぞれの座標値（第１の座標値）４１を算出する。座標値の算出は、上述した関節点検出部１４での処理と同様に行われる。そして、全特徴量出力部１１は、関節点毎の第１の座標値４１を、グラフ構造として、出力する。However, in this modified example, the total feature output unit 11 calculates the coordinate values (first coordinate values) 41 of each target joint point using the output first heat map 21. The calculation of the coordinate values is performed in the same manner as the processing in the joint point detection unit 14 described above. Then, the total feature output unit 11 outputs the first coordinate values 41 for each joint point as a graph structure.
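Converting the first heat maps into a graph structure of coordinate values can be sketched as below. The edge list describing which joint points are connected is an assumption supplied by the caller; the source does not define the graph's edges, and `heatmaps_to_graph` is a hypothetical name.

```python
import numpy as np

def heatmaps_to_graph(heatmaps, edges):
    """Build (node coordinates, adjacency matrix) from per-joint heat maps.

    heatmaps: (J, H, W); edges: iterable of (joint_a, joint_b) pairs.
    """
    num_joints, height, width = heatmaps.shape
    flat_peaks = heatmaps.reshape(num_joints, -1).argmax(axis=1)
    rows, cols = np.unravel_index(flat_peaks, (height, width))
    nodes = np.stack([cols, rows], axis=1).astype(float)  # (J, 2) as (x, y)

    adjacency = np.zeros((num_joints, num_joints))
    for a, b in edges:
        adjacency[a, b] = adjacency[b, a] = 1.0  # undirected edge
    return nodes, adjacency

heatmaps = np.zeros((3, 8, 8))
for j, (x, y) in enumerate([(1, 2), (3, 4), (5, 6)]):
    heatmaps[j, y, x] = 1.0

nodes, adjacency = heatmaps_to_graph(heatmaps, edges=[(0, 1), (1, 2)])
```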

部分特徴量出力部１２は、本変形例では、図９に示すように、特定の関節点毎に構築された第１のＧＣＮ３１を用いる。図９の例でも、特定の関節点は、第１関節、第２関節、指付け根であり、これらの関節点毎に第１のＧＣＮ３１が構築されている。また、第１のＧＣＮ３１も、特定の関節点に加え、指先についても構築されている。In this modified example, the partial feature output unit 12 uses a first GCN 31 constructed for each specific joint point, as shown in Fig. 9. In the example of Fig. 9, the specific joint points are the first joint, the second joint, and the base of the finger, and a first GCN 31 is constructed for each of these joint points. Furthermore, the first GCN 31 is also constructed for the fingertips in addition to the specific joint points.

第１のＧＣＮ３１は、特定の関節点以外の関節点間の位置関係を機械学習している機械学習モデルである。第１のＧＣＮ３１の機械学習では、対応する特定の関節点（又は指先）以外の関節点の座標値と、正解となる座標値とが、訓練データとなる。第１のＧＣＮ３１の機械学習も、訓練データとなる画像データの出力結果（座標値）と正解となる座標値との差分が小さくなるように、パラメータを更新することで行われる。The first GCN 31 is a machine learning model that machine-learns the positional relationships between joint points other than specific joint points. In the machine learning of the first GCN 31, the coordinate values of joint points other than the corresponding specific joint point (or fingertip) and the correct coordinate values become training data. The machine learning of the first GCN 31 is also performed by updating parameters so that the difference between the output result (coordinate values) of the image data that becomes the training data and the correct coordinate values becomes small.

The partial feature output unit 12 inputs, to each of the first GCNs 31, those of the first coordinate values 41 received from the total feature output unit 11 that belong to the joint points other than the corresponding fingertip or specific joint point. As a result, the second coordinate values 42 other than the fingertip, the second coordinate values 42 other than the first joint, the second coordinate values 42 other than the second joint, and the second coordinate values 42 other than the base of the finger are output as the second features.
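Selecting the graph that a first GCN 31 receives can be sketched as removing the excluded nodes and restricting the adjacency matrix to the remaining ones. How the source handles edges that passed through a removed joint point is not stated, so simply dropping them is an assumption here, and `exclude_joints` is a hypothetical name.

```python
import numpy as np

def exclude_joints(nodes, adjacency, excluded):
    """Return the kept indices and the sub-graph without the excluded joints."""
    kept = [j for j in range(nodes.shape[0]) if j not in excluded]
    return kept, nodes[kept], adjacency[np.ix_(kept, kept)]

# Four joint points along one finger, connected as a chain 0-1-2-3.
nodes = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
adjacency = np.array([[0.0, 1.0, 0.0, 0.0],
                      [1.0, 0.0, 1.0, 0.0],
                      [0.0, 1.0, 0.0, 1.0],
                      [0.0, 0.0, 1.0, 0.0]])

# Exclude joint 3 (for example, the fingertip in this toy layout).
kept, sub_nodes, sub_adjacency = exclude_joints(nodes, adjacency, excluded={3})
```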

In this modified example, the integrated feature output unit 13 uses second GCNs 32, one constructed for each joint point, as shown in Fig. 10. In the example of Fig. 10, a second GCN 32 is constructed for each of the 21 joint points of the hand.

As described above, each second GCN 32 is a machine learning model that has machine-learned the position of its corresponding joint point. In the machine learning of a second GCN 32, the coordinate values of the corresponding joint point and the correct coordinate values serve as training data. The second GCN 32 is likewise trained by updating its parameters so that the difference between the output result (coordinate values) obtained for the training data and the correct coordinate values becomes small.
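The parameter update described here (reducing the difference between output coordinate values and correct coordinate values) can be illustrated with a deliberately simple stand-in. The sketch below uses a linear model, a mean squared error loss, and plain gradient descent. None of these choices is stated in the source, and the data are synthetic.

```python
import numpy as np

def mse(pred, target):
    """Mean squared difference between predicted and correct coordinates."""
    return float(np.mean((pred - target) ** 2))

def train_step(w, x, target, lr=0.1):
    """One gradient-descent step for the linear model pred = x @ w."""
    pred = x @ w
    grad = 2.0 * x.T @ (pred - target) / pred.size  # gradient of the MSE
    return w - lr * grad

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 2))                  # stand-in input coordinate values
true_w = np.array([[1.0, 0.2], [-0.1, 0.9]])  # synthetic ground-truth mapping
target = x @ true_w                           # stand-in correct coordinate values

w = np.zeros((2, 2))
loss_before = mse(x @ w, target)
for _ in range(200):
    w = train_step(w, x, target)
loss_after = mse(x @ w, target)
```

A real GCN would replace the linear model, but the loop has the same shape: compute outputs, measure the difference from the correct values, and adjust the parameters to shrink it.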

The integrated feature output unit 13 merges the second coordinate values 42 output by the first GCNs 31 and inputs to each second GCN 32 only the second coordinate values 42 of the corresponding joint point. As a result, a third coordinate value 43 is output for each joint point as the third feature. The third coordinate value 43 output for each joint point is detected as the coordinates of that joint point.
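The source does not say how the merge is carried out. One possible reading is sketched below: every estimate of a joint is collected from whichever first models produced it, and the per-joint set is handed to that joint's second model. The averaging in `refine` is only a placeholder for a trained second GCN, and the index layout is the same assumed layout (wrist at index 0, four joints per finger).

```python
import numpy as np

NUM_JOINTS = 21

def merge_candidates(outputs):
    """outputs: list of (indices, coords) pairs, one per first model.

    Returns a list whose entry j stacks every estimate of joint j.
    """
    per_joint = [[] for _ in range(NUM_JOINTS)]
    for indices, coords in outputs:
        for idx, xy in zip(indices, coords):
            per_joint[idx].append(xy)
    return [np.stack(c) for c in per_joint]

def refine(candidates):
    """Placeholder for a per-joint second model: average the candidates."""
    return candidates.mean(axis=0)

# Assumed groups: base, second joint, first joint, fingertip of each finger.
groups = [[k + 4 * f for f in range(5)] for k in (1, 2, 3, 4)]
truth = np.arange(NUM_JOINTS * 2, dtype=float).reshape(NUM_JOINTS, 2)

# Stand-in outputs of the four first models (here simply the true values).
outputs = []
for g in groups:
    keep = [j for j in range(NUM_JOINTS) if j not in g]
    outputs.append((keep, truth[keep]))

merged = merge_candidates(outputs)
third_coords = np.stack([refine(c) for c in merged])
```

With this layout the wrist receives four estimates and every other joint receives three, because each joint is left out of exactly one group's output.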

Thus, in this modified example, GCNs are used as the machine learning models and coordinate values are used as the features. In this case as well, the second features are output using the first features of the parts other than those likely to be hidden in the image. Therefore, this modified example also makes it possible to improve the accuracy of estimating the position of each joint point.

[Program]
The program in the embodiment may be any program that causes a computer to execute steps A1 to A4 shown in Fig. 6. By installing this program on a computer and executing it, the joint point detection device and the joint point detection method in the embodiment can be realized. In this case, the processor of the computer functions as the total feature output unit 11, the partial feature output unit 12, the integrated feature output unit 13, and the joint point detection unit 14, and performs the processing.
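As a loose sketch of how such a program might be organized, the code below chains four callables in order. It assumes that steps A1 to A4 correspond, in that order, to the total feature output, partial feature output, integrated feature output, and joint point detection. That mapping, and the names `run_pipeline` and `make_step`, are assumptions for this example. The steps here only record that they ran.

```python
from typing import Callable, List, Sequence

Step = Callable[[object], object]

def run_pipeline(image_data: object, steps: Sequence[Step]) -> object:
    """Apply the steps in order, feeding each result into the next step."""
    result = image_data
    for step in steps:
        result = step(result)
    return result

def make_step(name: str, log: List[str]) -> Step:
    """Create a placeholder step that records its name and passes data on."""
    def step(data: object) -> object:
        log.append(name)
        return data
    return step

log: List[str] = []
names = ("A1_total", "A2_partial", "A3_integrated", "A4_detect")
steps = [make_step(n, log) for n in names]
out = run_pipeline("image", steps)
```

Because each step is an independent callable, the same structure also fits the case where different computers each take on one of the four roles.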

In this embodiment, the storage unit 15 may be realized by storing the data files that constitute it in a storage device, such as a hard disk, provided in the computer, or it may be realized by a storage device of another computer.

Examples of the computer include, in addition to a general-purpose PC, a smartphone and a tablet terminal device.

The program in the embodiment may be executed by a computer system made up of multiple computers. In this case, for example, each computer may function as any one of the total feature output unit 11, the partial feature output unit 12, the integrated feature output unit 13, and the joint point detection unit 14.

[Physical configuration]
A computer that realizes the joint point detection device 10 by executing the program in the embodiment will now be described with reference to Fig. 11. Fig. 11 is a block diagram showing an example of a computer that realizes the joint point detection device in the embodiment.

As shown in Fig. 11, the computer 110 includes a CPU (Central Processing Unit) 111, a main memory 112, a storage device 113, an input interface 114, a display controller 115, a data reader/writer 116, and a communication interface 117. These components are connected to one another via a bus 121 so that they can exchange data.

The computer 110 may also include a GPU (Graphics Processing Unit) or an FPGA (Field-Programmable Gate Array) in addition to, or instead of, the CPU 111. In this aspect, the GPU or FPGA can execute the program in the embodiment.

The CPU 111 loads the program in the embodiment, which consists of a group of codes and is stored in the storage device 113, into the main memory 112 and executes the codes in a predetermined order to perform various computations. The main memory 112 is typically a volatile storage device such as a DRAM (Dynamic Random Access Memory).

The program in the embodiment is provided stored on a computer-readable recording medium 120. The program in the embodiment may also be distributed over the Internet, to which the computer is connected via the communication interface 117.

Specific examples of the storage device 113 include a hard disk drive and a semiconductor storage device such as a flash memory. The input interface 114 mediates data transmission between the CPU 111 and input devices 118 such as a keyboard and a mouse. The display controller 115 is connected to the display device 119 and controls what is displayed on the display device 119.

The data reader/writer 116 mediates data transmission between the CPU 111 and the recording medium 120, reads the program from the recording medium 120, and writes the results of processing in the computer 110 to the recording medium 120. The communication interface 117 mediates data transmission between the CPU 111 and other computers.

Specific examples of the recording medium 120 include general-purpose semiconductor storage devices such as CF (Compact Flash (registered trademark)) and SD (Secure Digital), magnetic recording media such as a flexible disk, and optical recording media such as a CD-ROM (Compact Disk Read Only Memory).

The joint point detection device 10 in the embodiment can also be realized by using hardware corresponding to each unit, rather than by a computer on which the program is installed. Furthermore, the joint point detection device 10 may be realized partly by the program and partly by hardware.

Some or all of the above-described embodiment can be expressed by (Appendix 1) to (Appendix 12) below, but is not limited to the following descriptions.

(Appendix 1)
A joint point detection device comprising:
a total feature output unit that outputs, from image data of a target, a first feature amount representing each joint point of the target, for each of the joint points;
a partial feature output unit that, for each of a plurality of specific joint points among the joint points of the target, receives as input the first feature amounts representing the joint points other than the specific joint point, and outputs second feature amounts representing the joint points other than the specific joint point by using a first machine learning model that has machine-learned the positional relationships between the joint points other than the specific joint point; and
an integrated feature output unit that, for each of the joint points of the target, receives as input the second feature amount representing the joint point, and outputs a third feature amount representing the joint point by using a second machine learning model that has machine-learned the position of the joint point.

(Appendix 2)
The joint point detection device according to Appendix 1, further comprising
a joint point detection unit that detects the coordinates of each of the joint points of the target by using the third feature amount for each of the joint points of the target.

(Appendix 3)
The joint point detection device according to Appendix 1, wherein
the first machine learning model and the second machine learning model are constructed by convolutional neural networks, and
each of the first feature amount, the second feature amount, and the third feature amount includes a heat map representing the likelihood that a joint point exists on the image.

(Appendix 4)
The joint point detection device according to Appendix 1, wherein
the first machine learning model and the second machine learning model are constructed by graph convolutional networks, and
each of the first feature amount, the second feature amount, and the third feature amount includes a coordinate value indicating the position of a joint point on the image.

(Appendix 5)
A joint point detection method comprising:
a total feature output step of outputting, from image data of a target, a first feature amount representing each joint point of the target, for each of the joint points;
a partial feature output step of, for each of a plurality of specific joint points among the joint points of the target, receiving as input the first feature amounts representing the joint points other than the specific joint point, and outputting second feature amounts representing the joint points other than the specific joint point by using a first machine learning model that has machine-learned the positional relationships between the joint points other than the specific joint point; and
an integrated feature output step of, for each of the joint points of the target, receiving as input the second feature amount representing the joint point, and outputting a third feature amount representing the joint point by using a second machine learning model that has machine-learned the position of the joint point.

(Appendix 6)
The joint point detection method according to Appendix 5, further comprising
a joint point detection step of detecting the coordinates of each of the joint points of the target by using the third feature amount for each of the joint points of the target.

(Appendix 7)
The joint point detection method according to Appendix 5, wherein
the first machine learning model and the second machine learning model are constructed by convolutional neural networks, and
each of the first feature amount, the second feature amount, and the third feature amount includes a heat map representing the likelihood that a joint point exists on the image.

(Appendix 8)
The joint point detection method according to Appendix 5, wherein
the first machine learning model and the second machine learning model are constructed by graph convolutional networks, and
each of the first feature amount, the second feature amount, and the third feature amount includes a coordinate value indicating the position of a joint point on the image.

(Appendix 9)
A program that causes a computer to execute:
a total feature output step of outputting, from image data of a target, a first feature amount representing each joint point of the target, for each of the joint points;
a partial feature output step of, for each of a plurality of specific joint points among the joint points of the target, receiving as input the first feature amounts representing the joint points other than the specific joint point, and outputting second feature amounts representing the joint points other than the specific joint point by using a first machine learning model that has machine-learned the positional relationships between the joint points other than the specific joint point; and
an integrated feature output step of, for each of the joint points of the target, receiving as input the second feature amount representing the joint point, and outputting a third feature amount representing the joint point by using a second machine learning model that has machine-learned the position of the joint point.

(Appendix 10)
The program according to Appendix 9, further causing the computer to execute
a joint point detection step of detecting the coordinates of each of the joint points of the target by using the third feature amount for each of the joint points of the target.

(Appendix 11)
The program according to Appendix 9, wherein
the first machine learning model and the second machine learning model are constructed by convolutional neural networks, and
each of the first feature amount, the second feature amount, and the third feature amount includes a heat map representing the likelihood that a joint point exists on the image.

(Appendix 12)
The program according to Appendix 9, wherein
the first machine learning model and the second machine learning model are constructed by graph convolutional networks, and
each of the first feature amount, the second feature amount, and the third feature amount includes a coordinate value indicating the position of a joint point on the image.
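Where the third feature amount is a heat map and joint coordinates are detected from it, one simple way to do so is to take the location of the maximum value in each heat map. The sketch below illustrates that approach only. The source does not state how coordinates are extracted, and the name `heatmaps_to_coords` and the (x, y) output order are assumptions.

```python
import numpy as np

def heatmaps_to_coords(heatmaps):
    """Convert heat maps of shape (num_joints, H, W) to (x, y) peak positions."""
    n, h, w = heatmaps.shape
    flat_peak = heatmaps.reshape(n, -1).argmax(axis=1)  # peak index per joint
    ys, xs = np.unravel_index(flat_peak, (h, w))
    return np.stack([xs, ys], axis=1)

# Two stand-in heat maps of height 5 and width 7, each with a single peak.
heat = np.zeros((2, 5, 7))
heat[0, 1, 3] = 1.0   # joint 0 peaks at row 1, column 3
heat[1, 4, 6] = 0.5   # joint 1 peaks at row 4, column 6
coords = heatmaps_to_coords(heat)
```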

Although the present invention has been described above with reference to the embodiment, the present invention is not limited to the above embodiment. Various modifications that can be understood by a person skilled in the art can be made to the configuration and details of the present invention within its scope.

This application claims priority based on Japanese Patent Application No. 2021-029410, filed on February 26, 2021, the disclosure of which is incorporated herein in its entirety.

As described above, the present invention can improve the accuracy of estimating the positions of joint points. The present invention is useful in fields that require posture detection of subjects having joint points, such as humans and robots. Specific fields include video surveillance and user interfaces.

REFERENCE SIGNS LIST
10 Joint point detection device
11 Total feature output unit
12 Partial feature output unit
13 Integrated feature output unit
14 Joint point detection unit
15 Storage unit
16 First machine learning model (CNN)
17 Second machine learning model (CNN)
18 Third machine learning model (CNN)
20 Image data
21 First heat map
22 Second heat map
23 Third heat map
31 First machine learning model (GCN)
32 Second machine learning model (GCN)
41 First coordinate value
42 Second coordinate value
43 Third coordinate value
110 Computer
111 CPU
112 Main memory
113 Storage device
114 Input interface
115 Display controller
116 Data reader/writer
117 Communication interface
118 Input device
119 Display device
120 Recording medium
121 Bus

Claims

1. A joint point detection device comprising:
a total feature output means for outputting, from image data of a target, a first feature amount representing each joint point of the target, for each of the joint points;
a partial feature output means for outputting, for each of a plurality of specific joint points among the joint points of the target, a second feature amount representing the joint points other than the specific joint point by using a first machine learning model that corresponds to the specific joint point and that has machine-learned the positional relationships between the joint points other than the specific joint point; and
an integrated feature output means for outputting, for each of the joint points of the target, a third feature amount representing the joint point by using a second machine learning model that corresponds to the joint point and that has machine-learned the position of the joint point.

2. The joint point detection device according to claim 1, further comprising
a joint point detection means for detecting the coordinates of each of the joint points of the target by using the third feature amount for each of the joint points of the target.

3. The joint point detection device according to claim 1, wherein
the first machine learning model and the second machine learning model are constructed by convolutional neural networks, and
each of the first feature amount, the second feature amount, and the third feature amount includes a heat map representing the likelihood that a joint point exists on the image.

4. The joint point detection device according to claim 1, wherein
the first machine learning model and the second machine learning model are constructed by graph convolutional networks, and
each of the first feature amount, the second feature amount, and the third feature amount includes a coordinate value indicating the position of a joint point on the image.

5. A joint point detection method comprising:
outputting, from image data of a target, a first feature amount representing each joint point of the target, for each of the joint points;
for each of a plurality of specific joint points among the joint points of the target, receiving as input the first feature amounts representing the joint points other than the specific joint point, and outputting second feature amounts representing the joint points other than the specific joint point by using a first machine learning model that corresponds to the specific joint point and that has machine-learned the positional relationships between the joint points other than the specific joint point; and
for each of the joint points of the target, receiving as input the second feature amount representing the joint point, and outputting a third feature amount representing the joint point by using a second machine learning model that corresponds to the joint point and that has machine-learned the position of the joint point.

6. A program that causes a computer to execute:
outputting, from image data of a target, a first feature amount representing each joint point of the target, for each of the joint points;
for each of a plurality of specific joint points among the joint points of the target, receiving as input the first feature amounts representing the joint points other than the specific joint point, and outputting second feature amounts representing the joint points other than the specific joint point by using a first machine learning model that corresponds to the specific joint point and that has machine-learned the positional relationships between the joint points other than the specific joint point; and
for each of the joint points of the target, receiving as input the second feature amount representing the joint point, and outputting a third feature amount representing the joint point by using a second machine learning model that corresponds to the joint point and that has machine-learned the position of the joint point.