JP7316236B2

JP7316236B2 - Skeletal tracking method, device and program

Info

Publication number: JP7316236B2
Application number: JP2020033081A
Authority: JP
Inventors: 智尋中塚; 和之田坂; 仁志西村
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2020-02-28
Filing date: 2020-02-28
Publication date: 2023-07-27
Anticipated expiration: 2040-02-28
Also published as: JP2021135877A

Description

The present invention relates to a skeleton tracking method, device, and program, and more particularly to a skeleton tracking method, device, and program that estimate a skeleton by tracking the joint points of a person across time-series images.

Skeleton tracking is a technique for estimating a person's skeleton and its motion in time-series images. When multiple people are assumed to be present in the images, there are broadly two approaches.

The first is the bottom-up approach disclosed in Non-Patent Documents 1 and 2: the joint points of people are detected in each frame, and joint point tracking and skeleton estimation are performed simultaneously by finding the optimal connections, both within and between frames, among all joint points detected in multiple frames.

Non-Patent Document 1 discloses a technique that, for all of the input time-series images, performs (1) joint point detection, (2) confidence estimation (joint point adjacency confidence and joint point tracking confidence), and (3) confidence-based tracking estimation, and then outputs the skeleton tracking result.

The joint point adjacency confidence is the confidence that two joint points within a frame are directly connected by a limb or the like, as a wrist and an elbow are. The joint point tracking confidence is the confidence that joint points in different frames are the same joint point. Confidence-based tracking estimation considers a graph whose nodes are all the detected joint points and whose edge weights are the estimated confidences, and estimates a skeleton for each person by partitioning the graph optimally under several constraints.

Non-Patent Document 2 estimates skeleton tracking each time a frame image is read and tracks joint points between two frames more effectively. Specifically, joint points are detected in the frame of interest; then the confidence that joint points within that frame are adjacent to each other and the tracking confidence from the joint points in the immediately preceding frame are estimated. Each confidence is obtained from the joint point existence probability maps, the joint adjacency direction vector maps (PAF), and the joint tracking direction vector maps (TAF) output by a trained model.

In particular, the TAF encodes joint point motion and is computed from the previous and current PAFs and the previous TAF. By using the previous TAF, Non-Patent Document 2 can estimate the current motion in light of past motion. Finally, a graph is considered whose nodes are the joint points of the frame of interest and of the immediately preceding frame and whose edge weights are the estimated confidences, and a skeleton is estimated for each person by partitioning the graph optimally under several constraints.

The second is the top-down approach disclosed in Non-Patent Document 3: person detection and skeleton estimation are performed in each frame, and each person is then tracked.

In Non-Patent Document 3, skeleton estimation is performed for each bounding box obtained by person detection in each frame, and the positions of the resulting joint points in the next frame are estimated by optical flow. Tracking is performed by associating the persons for whom these estimated positions are closest to the joint point positions actually obtained in the next frame.

Skeleton estimation in each frame is crucial to this approach. Patent Document 1 discloses a method that estimates skeletons robustly, even in complex scenes containing various objects, by setting a skeleton model in advance.

Patent Document 1: JP 2017-091377 A

Non-Patent Document 1: Iqbal, Umar, Anton Milan, and Juergen Gall. "Posetrack: Joint multi-person pose estimation and tracking." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
Non-Patent Document 2: Raaj, Yaadhav, et al. "Efficient Online Multi-Person 2D Pose Tracking with Recurrent Spatio-Temporal Affinity Fields." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
Non-Patent Document 3: Xiao, Bin, Haiping Wu, and Yichen Wei. "Simple baselines for human pose estimation and tracking." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
Non-Patent Document 4: Cao, Zhe, et al. "OpenPose: realtime multi-person 2D pose estimation using Part Affinity Fields." arXiv preprint arXiv:1812.08008 (2018).
Non-Patent Document 5: He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
Non-Patent Document 6: Doering, Andreas, Umar Iqbal, and Juergen Gall. "Joint flow: Temporal flow fields for multi person tracking." arXiv preprint arXiv:1805.04596 (2018).
Non-Patent Document 7: Bewley, Alex, et al. "Simple online and realtime tracking." 2016 IEEE International Conference on Image Processing (ICIP). IEEE, 2016.

The top-down approach, to which Non-Patent Document 3 belongs, raises the concern that computational cost grows in proportion to the number of people detected in each frame. In addition, person detection and skeleton estimation have been observed to fail in difficult situations such as when multiple people come close to one another, and tracking, which depends heavily on these processes, inevitably degrades as well. Patent Document 1, a skeleton estimation method, has no mechanism for resolving such difficult situations and does not consider tracking in the first place.

In the bottom-up approach disclosed in Non-Patent Documents 1 and 2, skeleton estimation and tracking are performed simultaneously for all detected persons, so an increase in the number of people has relatively little effect on processing time. Furthermore, because a solution that is optimal for both skeleton estimation and tracking is sought, a loss of accuracy in one can potentially be offset by the other.

However, Non-Patent Document 1 is an offline process that estimates skeleton tracking for multiple frames at once. Moreover, it associates joint points directly between the current frame and multiple past frames, and this association is computationally expensive. It is therefore difficult to use where real-time performance is required.

Non-Patent Document 2 tracks only between consecutive frames and does not directly use information from frames earlier than the immediately preceding one. Instead, it recursively feeds the motion information and appearance features of the immediately preceding frame into inference on the current frame, which reduces the association work and enables online, real-time processing.

However, tracking by association between consecutive frames is easily degraded when occlusions occur frequently. In addition, because joint points are associated using only two kinds of vectors, one representing the joint motion direction (TAF) and one representing the joint connection direction (PAF), joint points that look very different may be confused with each other as long as their positions are similar. Problems therefore remain, particularly in situations such as sports video where multiple people come close together and occlude one another.

An object of the present invention is to solve the above technical problems and to provide a skeleton tracking method, device, and program that can continue tracking with high reliability and achieve accurate skeleton tracking even when a joint point being tracked temporarily cannot be detected in a frame image because of occlusion or the like.

To achieve the above object, the present invention is a skeleton tracking device that tracks a person's skeleton based on joint point tracking results, characterized by the following configuration.

(1) The device comprises: joint point detection means for detecting the positions and appearance features of joint points from each frame of time-series images; storage means for storing, for each joint point with a detection history, the history of its position and appearance features as a joint point tracking flow; confidence estimation means for estimating, based on their respective positions and appearance features, the confidence that each joint point detected from a frame image corresponds to each joint point tracking flow; and skeleton estimation means for estimating a skeleton based on the combinations of joint point and joint point tracking flow whose correspondence has the highest confidence.

(2) The device comprises updating means that, based on the skeleton estimation result, updates the position registered in the joint point tracking flow having the highest-confidence correspondence with the position of the joint point, and adds the appearance information of the joint point to that joint point tracking flow as history.

(3) When updating a joint point tracking flow, the updating means does not add an appearance feature if the flow's history already contains an appearance feature whose similarity to the one to be added exceeds a predetermined reference value.

(4) When the number of appearance features in the history registered in a joint point tracking flow exceeds a predetermined number, the updating means deletes one feature of a pair of appearance features whose similarity exceeds a predetermined reference value.

(5) The confidence estimation means calculates the similarity between the appearance feature of a joint point and each appearance feature in the history of a joint point tracking flow, and estimates the confidence based on one of the mean, maximum, and median of these similarities.
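The update rules in items (3) and (4) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the use of cosine similarity, the two separate thresholds `add_threshold` and `prune_threshold`, the limit `max_len`, and the choice to delete the older member of a similar pair are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def add_feature(history, f_new, add_threshold=0.9, prune_threshold=0.5, max_len=10):
    """Return a new appearance history after trying to add f_new.

    All parameter values are assumptions chosen for illustration.
    """
    # Rule (3): do not add a feature that is too similar to a stored one.
    if any(cosine(f_new, f) > add_threshold for f in history):
        return history
    history = history + [f_new]
    # Rule (4): if the history is too long, delete one member of a similar pair.
    if len(history) > max_len:
        for i in range(len(history)):
            for j in range(i + 1, len(history)):
                if cosine(history[i], history[j]) > prune_threshold:
                    # Assumption: the older member (index i) is the one deleted.
                    return history[:i] + history[i + 1:]
    # If no pair is similar enough, the history is left as it is.
    return history
```

With a single shared threshold, rule (3) would already prevent any stored pair from exceeding it, so this sketch assumes the pruning threshold is the lower of the two.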

The present invention achieves the following effects.

(1) For each joint point with a detection history, the history of its position and appearance features is stored as a joint point tracking flow; the confidence that each joint point detected from a frame image corresponds to each joint point tracking flow is estimated based on their respective positions and appearance features; and a person's skeleton is estimated based on the combinations of joint point and joint point tracking flow whose correspondence has the highest confidence. Consequently, even when a joint point being tracked temporarily cannot be detected in a frame image because of occlusion or the like, tracking can continue with high reliability and accurate skeleton estimation becomes possible.

(2) Based on the skeleton estimation result, the position of each joint point is registered as an update in the joint point tracking flow having the highest-confidence correspondence, and the appearance information of the joint point is added to that flow as history, so each joint point tracking flow holds the latest position and appearance information.

(3) When updating a joint point tracking flow, the updating means does not add an appearance feature if the flow's history already contains one whose similarity to it exceeds a predetermined reference value. The appearance feature history is therefore kept diverse, which enables confidence estimation that is robust to changes in appearance.

(4) When the number of appearance features in the history of a joint point tracking flow exceeds a predetermined number, the updating means deletes one feature of a pair whose similarity exceeds a predetermined reference value. This keeps the history diverse while limiting its size, and at the same time reduces the storage capacity and computational cost required for tracking.

(5) The confidence estimation means calculates the similarity between the appearance feature of a joint point and each appearance feature in the history of a joint point tracking flow, and estimates the confidence based on one of the mean, maximum, and median of these similarities, so the similarity between a joint point and a tracking flow's appearance history can be obtained in a practical and quantitative manner.

FIG. 1 is a diagram showing the configuration of the main parts of a skeleton estimation device according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the definition of joint points.
FIG. 3 is a diagram showing the configuration of a joint point tracking flow list.
FIG. 4 is a flowchart showing the procedure for estimating the tracking confidence.
FIG. 5 is a diagram schematically showing the method of estimating the tracking confidence.
FIG. 6 is a diagram showing an example of graph generation and subgraph partitioning in which joint points and joint point tracking flows are regarded as nodes.

Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 shows the configuration of the main parts of a skeleton estimation device 1 according to one embodiment of the present invention. This embodiment assumes that skeleton estimation is performed on time-series images containing multiple people, and that the time-series images last longer than one frame and contain no scene changes.

Such a skeleton estimation device can be configured by installing an application (program) that implements each function on at least one general-purpose computer or server equipped with a CPU, memory, interfaces, a bus connecting them, and so on. Alternatively, it can be configured as a dedicated or single-function machine in which part of the application is implemented in hardware or ROM.

An image input unit 101 receives RGB time-series images in which people appear as subjects. A joint point detection unit 102 acquires the time-series images frame by frame from the image input unit 101 and, following a joint point definition such as the example in FIG. 2, detects joint points in each frame image while distinguishing their types, such as right wrist and right elbow.

Any existing method can be used for joint point detection. For example, a method similar to that of Non-Patent Document 4, a skeleton estimation method, can be used: a frame image is input to a pre-trained CNN model, an existence probability map is output for each joint type, and coordinates with high probability are taken as the positions of that joint. In this embodiment, a type id, a position p, and an appearance feature f are acquired for each joint point and provided to a confidence estimation unit 103.
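As a rough illustration of taking "coordinates with high probability" from an existence probability map, the sketch below picks strict local maxima above a threshold from a 2-D map given as nested lists. The function name `find_peaks`, the threshold value, and the 8-neighbour maximum test are assumptions for illustration; the source does not specify how peaks are selected.

```python
def find_peaks(prob_map, threshold=0.5):
    """Return (x, y, score) for strict local maxima of a 2-D probability map.

    prob_map is a list of rows; threshold is a hypothetical cut-off.
    """
    h, w = len(prob_map), len(prob_map[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = prob_map[y][x]
            if v < threshold:
                continue
            # Collect the values of up to 8 neighbouring cells.
            neighbours = [prob_map[ny][nx]
                          for ny in range(max(0, y - 1), min(h, y + 2))
                          for nx in range(max(0, x - 1), min(w, x + 2))
                          if (ny, nx) != (y, x)]
            # Keep the cell only if it is strictly larger than all neighbours.
            if all(v > n for n in neighbours):
                peaks.append((x, y, v))
    return peaks
```

One map per joint type would be processed this way, so each returned peak inherits the type id of its map. Several peaks in one map correspond to the same joint type on different people.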

Here, the joint point type id identifies which part of the skeleton, such as the right wrist or the right elbow, the joint point is. The position p represents the location of the joint point in the frame image. The appearance feature f represents the image features of a local region centered on the joint point in the frame image; two typical ways of calculating it are described below.

(1) Method using a CNN

As disclosed in Non-Patent Document 5, the feature is calculated by inputting the image near the detected joint position (for example, a 5×5 rectangular region centered on the detected position) to a pre-trained CNN model.

(2) Method reusing the preceding stage

When the joint point detection unit 102 uses a method similar to that of Non-Patent Document 4, the output of an intermediate layer of the CNN model can be regarded as the appearance features of the input image, so the features at the location corresponding to the detected joint position are cut out and used as the appearance feature.
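Method (2) can be pictured as indexing an intermediate feature map at the joint position. The sketch below is only an assumption-laden illustration: the `feature_map[channel][row][col]` layout, the downsampling factor `stride`, and the function name are hypothetical, and the L2 normalization simply anticipates the later requirement that appearance features have length 1.

```python
import math

def appearance_feature(feature_map, x, y, stride=8):
    """Cut out the channel vector at the cell covering image pixel (x, y).

    feature_map[c][row][col] is a hypothetical intermediate-layer output whose
    spatial resolution is the image resolution divided by `stride`.
    """
    rows, cols = len(feature_map[0]), len(feature_map[0][0])
    col = min(int(x) // stride, cols - 1)   # clamp to the map boundary
    row = min(int(y) // stride, rows - 1)
    vec = [channel[row][col] for channel in feature_map]
    norm = math.sqrt(sum(v * v for v in vec))
    # Normalize to unit length so dot products act as cosine similarities.
    return [v / norm for v in vec] if norm > 0 else vec
```

Because the feature map is already computed during joint detection, this variant adds almost no extra inference cost compared with method (1).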

A storage unit 105 stores a joint point tracking flow list 105a and joint point adjacency information 105b. As in the example in FIG. 3, the joint point tracking flow list 105a contains multiple joint point tracking flows F, one generated for each joint point with a detection history. Each joint point tracking flow F consists of the identifier ID of the person to whom the joint point belongs, the joint point type id, the position p, and the history of appearance features f.

The appearance feature history is the set of appearance features f extracted when the same joint point was detected in past frames, and consists of several to several tens of appearance features f per joint point. In FIG. 3, for each joint point, an appearance feature f is shown schematically as a circle at each frame position where the joint point was detected, and a blank is shown at each frame position where it was not.

The position coordinates in the frame image in which the joint point was last detected can be used as the position p. The position of a joint point tracking flow F may also reflect motion information, in the form of the position transitions so far. The joint point adjacency information 105b specifies the adjacency relationships among the joint points defined in the joint point tracking flow list 105a.
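The stored data described above might be represented as follows. This is a sketch only: the class name, field names, joint type strings, and the `ADJACENT_TYPES` set are hypothetical, and `observe` appends every feature without the similarity-based filtering described elsewhere in this document.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class JointTrackingFlow:
    person_id: int                      # identifier ID of the person the joint belongs to
    joint_type: str                     # joint point type id, e.g. "right_wrist"
    position: Tuple[float, float]       # position p where the joint was last detected
    appearance_history: List[List[float]] = field(default_factory=list)

    def observe(self, position, feature):
        """Record a new detection: overwrite the position, extend the history."""
        self.position = position
        self.appearance_history.append(feature)

# Hypothetical joint point adjacency information: pairs of directly connected types.
ADJACENT_TYPES = {("right_wrist", "right_elbow"), ("right_elbow", "right_shoulder")}

# The joint point tracking flow list is then simply a list of flows.
flow_list = [JointTrackingFlow(person_id=0, joint_type="right_wrist", position=(10.0, 20.0))]
flow_list[0].observe((12.0, 21.0), [0.6, 0.8])
```

When a joint is not detected in a frame, nothing is appended, which corresponds to the blanks in FIG. 3; the flow keeps its last known position and its earlier appearance features.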

The confidence estimation unit 103 includes a joint point adjacency confidence estimation unit 103a and a joint point tracking confidence estimation unit 103b. The joint point adjacency confidence estimation unit 103a estimates, in accordance with the defined adjacency relationships between joint points, the confidence that joint points detected by the joint point detection unit 102 are adjacent to each other.

Any existing method can be used for this confidence estimation. For example, the method disclosed in Non-Patent Document 4, a skeleton estimation method, may be adopted: an image is input to a pre-trained CNN model, which outputs vector maps representing the connections between adjacent joint points in the image, and the adjacency confidence of two detected joint points is obtained from the similarity between the direction vector connecting them and the output vectors. Regardless of the estimation method, the confidence between joint points for which no adjacency relationship is defined is always 0.
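One way to compare the direction between two detected joint points with a vector map is to average, along the segment joining them, the dot product between the map's vectors and the segment's unit direction. The sketch below follows that idea in the spirit of Part Affinity Fields; the function name, the separate `paf_x`/`paf_y` maps, and the number of sample points are assumptions, not details given in the source.

```python
import math

def adjacency_confidence(paf_x, paf_y, p_a, p_b, num_samples=10):
    """Average alignment between the vector map and the direction p_a -> p_b.

    paf_x[y][x] and paf_y[y][x] hold the x and y components of the vector map
    (hypothetical layout). Points are (x, y) pixel coordinates inside the map.
    num_samples must be at least 2.
    """
    ax, ay = p_a
    bx, by = p_b
    length = math.hypot(bx - ax, by - ay)
    if length == 0:
        return 0.0
    ux, uy = (bx - ax) / length, (by - ay) / length   # unit direction vector
    total = 0.0
    for k in range(num_samples):
        t = k / (num_samples - 1)                      # 0.0 at p_a, 1.0 at p_b
        x = int(round(ax + t * (bx - ax)))
        y = int(round(ay + t * (by - ay)))
        total += paf_x[y][x] * ux + paf_y[y][x] * uy   # dot product at the sample
    return total / num_samples
```

For joint type pairs with no defined adjacency, the caller would skip this computation and use 0, as the text requires.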

The joint point tracking confidence estimation unit 103b estimates the confidence that each joint point detected from the current frame image is associated, as its tracking target, with each joint point tracking flow F in the joint point tracking flow list 105a held in the storage unit 105.

FIG. 4 is a flowchart showing the procedure by which the joint point tracking confidence estimation unit 103b estimates the tracking confidence, and FIG. 5 schematically shows the estimation method.

In step S1, one of the joint points that the joint point detection unit 102 detected from the current frame image is selected as the joint point of interest. In step S2, one of the joint point tracking flows F registered in the joint point tracking flow list whose joint point type id matches is selected as the flow of interest.

In step S3, the confidence that the joint point of interest is associated with the joint point tracking flow F of interest as its tracking target is estimated based on the similarity of their positions and appearance features. In this embodiment, the confidence between a joint point and a joint point tracking flow of a different joint type (the left/right distinction is not counted as a difference) is always 0, regardless of the similarity of position or appearance. The effective confidence is therefore estimated from the similarity of the position p and the appearance feature f.

The position similarity S1 can be calculated by the following equation (1), where d is the Euclidean distance between the position of the joint point and the position registered in the joint point tracking flow.

S1 = 1 / (1 + d)   (1)
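Equation (1) maps a distance of 0 to a similarity of 1 and decays toward 0 as the distance grows. A minimal sketch, with a hypothetical function name and positions given as (x, y) tuples:

```python
import math

def position_similarity(p_joint, p_flow):
    """S1 = 1 / (1 + d), where d is the Euclidean distance between the points."""
    d = math.dist(p_joint, p_flow)
    return 1.0 / (1.0 + d)
```

For example, a joint detected 5 pixels from the flow's last position gets S1 = 1/6, while an exact match gets S1 = 1.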

The appearance similarity S2 can be represented by the mean, median, or maximum of the cosine similarities [f0·f1, …, f0·fn], where f0 is the appearance feature of the joint point detected in the current frame and [f1, f2, …, fn] is the appearance feature history registered in the joint point tracking flow. Each appearance feature f0, …, fn is assumed to be a one-dimensional vector normalized to length 1.
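Because every feature is normalized to length 1, each cosine similarity reduces to a dot product. The sketch below computes S2 under that assumption; the function name and the `mode` parameter are hypothetical, and a non-empty history is assumed.

```python
import statistics

def appearance_similarity(f0, history, mode="mean"):
    """S2: aggregate of the dot products f0.f1, ..., f0.fn (unit-length vectors)."""
    sims = [sum(a * b for a, b in zip(f0, f)) for f in history]
    if mode == "mean":
        return statistics.mean(sims)
    if mode == "median":
        return statistics.median(sims)
    if mode == "max":
        return max(sims)
    raise ValueError("mode must be 'mean', 'median', or 'max'")
```

The maximum rewards a single close match in the history, whereas the mean and median reflect agreement with the history as a whole; the source leaves the choice open.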

In this way, this embodiment calculates the similarity between the appearance feature of a joint point and each appearance feature in the history of a joint point tracking flow and estimates the confidence based on one of the mean, maximum, and median of these similarities, so the similarity between the two can be obtained in a practical and quantitative manner.

The confidence between the joint point of interest and the joint point tracking flow of interest is the combined similarity of position and appearance, which can be represented by the product of the position similarity S1 and the appearance similarity S2 obtained above. When no joint point tracking flow is held in the storage unit, as in the first iteration of the repeated processing, the tracking confidence is not calculated.

In step S4, it is determined whether confidence estimation has been completed between the joint point of interest and all joint point tracking flows whose joint point type id matches. If not, the process returns to step S2, and the above processing is repeated while switching the joint point tracking flow of interest.

When confidence estimation has been completed between the joint point of interest and all joint point tracking flows whose joint point type id matches, the process proceeds to step S5, where it is determined whether confidence estimation has been completed for all joint points against all joint point tracking flows with a matching joint point type id. If not, the process returns to step S1, and the above processing is repeated while switching the joint point of interest. When all confidence estimation has been completed, the process ends.
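Steps S1 to S5 amount to a double loop over detected joint points and type-matched tracking flows. The following sketch assumes joints and flows are plain dictionaries with hypothetical keys (`type`, `pos`, `feat`, `history`), uses an exact type match, and picks the maximum as the aggregate for S2; pairs that are absent from the result are to be read as confidence 0.

```python
import math

def tracking_confidences(joints, flows):
    """Return {(joint_index, flow_index): S1 * S2} for type-matched pairs."""
    conf = {}
    for i, joint in enumerate(joints):            # S1: pick a joint (S5 ends this loop)
        for k, flow in enumerate(flows):          # S2: pick a flow (S4 ends this loop)
            if flow["type"] != joint["type"]:
                continue                          # different joint type: confidence 0
            # S3: position similarity, equation (1)
            s1 = 1.0 / (1.0 + math.dist(joint["pos"], flow["pos"]))
            # S3: appearance similarity as the maximum dot product (unit vectors)
            s2 = max(sum(a * b for a, b in zip(joint["feat"], f))
                     for f in flow["history"])
            conf[(i, k)] = s1 * s2
    return conf
```

With an empty flow list, as in the first iteration, the function returns an empty dictionary, matching the statement that no tracking confidence is calculated in that case.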

The confidence-based skeleton tracking estimation unit 104 includes a graph generation unit 104a, a subgraph partitioning unit 104b, an updating unit 104c, and an appearance feature pruning unit 104d, and estimates skeletons based on the combinations of joint point and joint point tracking flow whose correspondence has the highest confidence.

As shown in FIG. 6, the graph generation unit 104a generates a graph in which all joint points and joint point tracking flows are regarded as nodes. The edge weights between nodes are defined as follows.

That is, (1) the weight of an edge between joint point nodes is the confidence estimated by the joint point adjacency confidence estimation unit 103a. (2) The weight of an edge between a joint point node and a joint point tracking flow node is the confidence estimated by the joint point tracking confidence estimation unit 103b. (3) The weight of an edge between joint point tracking flow nodes is the adjacency confidence between joint point tracking flows, which is described later. When no joint point tracking flow has been accumulated in the storage unit 105, such as in the first pass of the iterative process, the graph is instead constructed only from the edges between joint point nodes.
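The three edge types can be illustrated with a small weighted edge map. This is a sketch under assumptions: the node encoding `("joint", id)` / `("flow", id)`, the function name `build_graph`, and the dictionary inputs are all hypothetical, and the confidences are taken as already estimated.

```python
from typing import Dict, Tuple

Node = Tuple[str, int]   # ("joint", id) or ("flow", id); illustrative encoding
Edge = Tuple[Node, Node]


def build_graph(
    joint_adjacency: Dict[Tuple[int, int], float],
    joint_tracking: Dict[Tuple[int, int], float],
    flow_adjacency: Dict[Tuple[int, int], float],
) -> Dict[Edge, float]:
    """Collect the three kinds of weighted edges into one graph."""
    edges: Dict[Edge, float] = {}
    # (1) joint-joint edges: joint point adjacency confidence
    for (a, b), w in joint_adjacency.items():
        edges[(("joint", a), ("joint", b))] = w
    # (2) joint-flow edges: joint point tracking confidence
    for (j, f), w in joint_tracking.items():
        edges[(("joint", j), ("flow", f))] = w
    # (3) flow-flow edges: adjacency confidence between tracking flows
    for (f, g), w in flow_adjacency.items():
        edges[(("flow", f), ("flow", g))] = w
    return edges
```

Passing empty mappings for the second and third arguments yields a graph made only of joint-joint edges, which corresponds to the first-pass fallback described above.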

The subgraph division unit 104b obtains the skeleton tracking result for each person by dividing the graph generated as described above into optimal subgraphs under constraints on the configuration of the joint points of the human body. In this embodiment, the generated graph is divided into per-person subgraphs by, for example, the same method as in Non-Patent Documents 1, 2, and 6.

The update unit 104c updates the joint point tracking flow 105a and the joint point adjacency information 105b based on the skeleton tracking result for each person. In this embodiment, the person IDs of multiple joint point tracking flows that are linked to one another as the same person are updated to the most frequent person ID among those already registered in the respective joint point tracking flows. If there is no linked joint point tracking flow, a new person ID is set.
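The person ID rule amounts to a majority vote. The sketch below assumes each linked flow carries either a registered person ID or `None`; the function name `resolve_person_id` and the `next_new_id` parameter are hypothetical. How ties are broken is not stated in the text, so this sketch simply keeps the ID that appears first among the tied ones.

```python
from collections import Counter
from typing import List, Optional


def resolve_person_id(linked_flow_person_ids: List[Optional[int]], next_new_id: int) -> int:
    """Pick the most frequent registered person ID, or assign a new one."""
    registered = [pid for pid in linked_flow_person_ids if pid is not None]
    if not registered:
        # No linked tracking flow has an ID yet: set a new person ID.
        return next_new_id
    # most_common(1) returns the highest count; ties keep first-seen order.
    return Counter(registered).most_common(1)[0][0]
```

For example, flows carrying IDs 3, 3, and 5 that end up in the same subgraph would all be relabeled as person 3.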

The update unit 104c also registers the appearance feature amount f of the joint point in the corresponding joint point tracking flow F as an additional entry in its appearance feature amount history. The update unit 104c further updates the position P of the corresponding joint point tracking flow F based on the position p of the joint point. If motion information is registered in association with the position, the coordinate position in the next frame is predicted from this motion information and the current position p, and the predicted result is registered as the updated position of the corresponding joint point tracking flow F.

For example, as in Non-Patent Document 7, the motion of the joint point can be predicted with a Kalman filter to predict the joint point position in the next frame. Alternatively, after the image of the next frame has been obtained, the joint point position in the next frame may be predicted by a method using optical flow as in Non-Patent Document 3, or by a method using TAF as in Non-Patent Document 2.
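To make the position update concrete, the following sketch uses a plain constant-velocity extrapolation. This is a deliberately simplified stand-in and not the Kalman filter, optical flow, or TAF methods cited above. It assumes the registered motion information is a per-frame 2D velocity, and the name `predict_next_position` is illustrative.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]


def predict_next_position(p: Point, velocity: Optional[Point] = None) -> Point:
    """Predict the next-frame position from the current position p.

    With no motion information registered, the current position is kept as is.
    Otherwise the position is advanced by one frame of motion.
    """
    if velocity is None:
        return p
    return (p[0] + velocity[0], p[1] + velocity[1])
```

A Kalman filter would additionally maintain an uncertainty estimate and blend the prediction with each new observation, which this sketch omits.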

On the other hand, for a joint point that was included in the skeleton of some person but for which no corresponding joint point tracking flow F had been registered, a joint point tracking flow F corresponding to that joint point is newly registered. In this joint point tracking flow F, the position of the joint point is registered as its position, and the appearance feature amount f of the joint point is registered as its appearance feature amount history.

When the number of appearance feature amounts registered in the joint point tracking flow F exceeds a certain number, the appearance feature amount pruning unit 104d deletes appearance feature amounts in order from the oldest registration. Alternatively, it extracts two of the appearance feature amounts registered in the joint point tracking flow F as an appearance feature amount pair, and deletes one member of any pair whose similarity exceeds a predetermined reference value.
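Both pruning strategies can be sketched in a few lines. The sketch assumes the history is a list ordered from oldest to newest and uses cosine similarity as the similarity measure, which the text does not specify. The function names, the `max_len` and `threshold` parameters, and the choice to keep the older member of a similar pair are all assumptions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def prune_oldest(history: List, max_len: int) -> List:
    """Drop the oldest entries so that at most max_len remain."""
    return history[max(0, len(history) - max_len):]


def prune_similar(history: List[Sequence[float]], threshold: float) -> List[Sequence[float]]:
    """Keep an entry only if it is not too similar to any entry already kept."""
    kept: List[Sequence[float]] = []
    for f in history:
        if all(cosine(f, g) <= threshold for g in kept):
            kept.append(f)
    return kept
```

Age-based pruning bounds memory regardless of content, while similarity-based pruning removes redundancy and tends to leave a more varied set of appearances.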

When adding the appearance feature amount f detected from the current frame image to the joint point tracking flow F, the appearance feature amount pruning unit 104d further compares f with the history of already registered appearance feature amounts, and restricts the additional registration of f if its similarity to any registered entry exceeds a predetermined reference value. Alternatively, f replaces the registered entry whose similarity exceeds the reference value. This diversifies the appearance feature amounts registered in the joint point tracking flow while limiting storage capacity, and also reduces the computational cost of the tracking process.
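The registration rule can be sketched as a single function with a switch between the two behaviors (skip or replace). The similarity measure is passed in as a callback because the text does not fix one. The names `register_feature`, `similarity`, `threshold`, and `replace` are hypothetical.

```python
from typing import Callable, List, TypeVar

F = TypeVar("F")


def register_feature(
    history: List[F],
    f: F,
    similarity: Callable[[F, F], float],
    threshold: float,
    replace: bool = False,
) -> List[F]:
    """Return a new history after trying to register feature f.

    If f is too similar to a registered entry, it is either skipped
    (replace=False) or swapped in for that entry (replace=True).
    Otherwise f is appended as a new entry.
    """
    updated = list(history)
    for i, g in enumerate(updated):
        if similarity(f, g) > threshold:
            if replace:
                updated[i] = f
            return updated
    updated.append(f)
    return updated
```

In both modes the history length does not grow when a near-duplicate arrives, which is what keeps storage and per-frame comparison cost bounded.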

Thus, according to this embodiment, for each joint point having a detection history, its position and the history of its appearance feature amounts are stored as a joint point tracking flow; the confidence that each joint point detected from a frame image corresponds to each joint point tracking flow is estimated based on their respective positions and appearance feature amounts; and the skeleton of a person is estimated based on the combinations of joint points and joint point tracking flows having the highest-confidence correspondence. Therefore, even if a joint point being tracked temporarily cannot be detected from the frame image because of occlusion or the like, high-accuracy tracking can continue and accurate skeleton estimation becomes possible.

Reference signs: 101: image input unit; 102: joint point detection unit; 103: confidence estimation unit; 103a: joint point adjacency confidence estimation unit; 103b: joint point tracking confidence estimation unit; 104: confidence-based skeleton tracking estimation unit; 104a: graph generation unit; 104b: subgraph division unit; 104c: update unit; 104d: appearance feature amount pruning unit; 105: storage unit; 105a: joint point tracking flow list; 105b: joint point adjacency information

Claims

1. A skeleton tracking device for estimating a human skeleton based on tracking results of joint points, comprising:
joint point detection means for detecting, from each frame of time-series images, the position of a joint point and an appearance feature amount, which is an image feature of a local range centered on the position of the joint point detected in each frame;
storage means for storing, for each joint point having a detection history, its position and a history of appearance feature amounts, which is a set of appearance feature amounts, as a joint point tracking flow;
confidence estimating means for estimating the confidence that each joint point detected from a frame image corresponds to each joint point tracking flow, based on the similarity of their respective positions and appearance feature amounts; and
skeleton estimating means for estimating the skeleton of each person based on combinations of joint points and joint point tracking flows having the highest-confidence correspondence,
wherein the confidence estimating means estimates the confidence based on the similarity between the appearance feature amount of a joint point detected from a frame image and the set of appearance feature amounts of a joint point having a detection history.

2. The skeleton tracking device according to claim 1, further comprising updating means for updating the joint point tracking flow based on the skeleton estimation result.

3. The skeleton tracking device according to claim 2, wherein the updating means updates and registers the position of the joint point in the joint point tracking flow having the highest-confidence correspondence, and adds the appearance information of the joint point as a history to the joint point tracking flow having the highest-confidence correspondence.

4. The skeleton tracking device according to claim 3, wherein the updating means does not add an appearance feature amount if a history entry whose similarity to the appearance feature amount to be added exceeds a predetermined reference value is already registered in the joint point tracking flow.

5. The skeleton tracking device according to any one of claims 2 to 4, wherein, when the number of appearance feature amount history entries registered in the joint point tracking flow exceeds a predetermined number, the updating means deletes one member of an appearance feature amount pair whose similarity exceeds a predetermined reference value.

6. The skeleton tracking device according to any one of claims 1 to 5, wherein the confidence estimating means calculates the similarity between the appearance feature amount of the joint point and each appearance feature amount in the history of the joint point tracking flow, and estimates the confidence based on one of the average value, the maximum value, and the median value of those similarities.

7. A skeleton tracking method in which a computer estimates a human skeleton based on tracking results of joint points, the method comprising:
detecting, from each frame of time-series images, the position of a joint point and an appearance feature amount, which is an image feature of a local range centered on the position of the joint point detected in each frame;
storing, for each joint point having a detection history, its position and a history of appearance feature amounts, which is a set of appearance feature amounts, as a joint point tracking flow;
estimating the confidence that each joint point detected from a frame image corresponds to each joint point tracking flow, based on the similarity of their respective positions and appearance feature amounts; and
estimating the skeleton of each person based on combinations of joint points and joint point tracking flows having the highest-confidence correspondence,
wherein the confidence is estimated based on the similarity between the appearance feature amount of a joint point detected from a frame image and the set of appearance feature amounts of a joint point having a detection history.

8. The skeleton tracking method according to claim 7, wherein the joint point tracking flow is updated based on the estimation result of the skeleton.

9. The skeleton tracking method according to claim 8, wherein, when the joint point tracking flow is updated, the position of the joint point is updated and registered in the joint point tracking flow having the highest-confidence correspondence, and the appearance information of the joint point is added as a history to the joint point tracking flow having the highest-confidence correspondence.

10. A skeleton tracking program for estimating a human skeleton based on tracking results of joint points, the program causing a computer to execute:
a procedure for detecting, from each frame of time-series images, the position of a joint point and an appearance feature amount, which is an image feature of a local range centered on the position of the joint point detected in each frame;
a procedure for storing, for each joint point having a detection history, its position and a history of appearance feature amounts, which is a set of appearance feature amounts, as a joint point tracking flow;
a procedure for estimating the confidence that each joint point detected from a frame image corresponds to each joint point tracking flow, based on the similarity of their respective positions and appearance feature amounts; and
a procedure for estimating the skeleton of each person based on combinations of joint points and joint point tracking flows having the highest-confidence correspondence,
wherein the procedure for estimating the confidence estimates the confidence based on the similarity between the appearance feature amount of a joint point detected from a frame image and the set of appearance feature amounts of a joint point having a detection history.

11. The skeleton tracking program according to claim 10, further comprising a procedure for updating the joint point tracking flow based on the skeleton estimation result.

12. The skeleton tracking program according to claim 11, wherein, in the updating procedure, the position of the joint point is updated and registered in the joint point tracking flow having the highest-confidence correspondence, and the appearance information of the joint point is added as a history to the joint point tracking flow having the highest-confidence correspondence.