JP7631769B2

JP7631769B2 - Information processing device, matching program, and matching method

Info

Publication number: JP7631769B2
Application number: JP2020204537A
Authority: JP
Inventors: 健畑中; 光礼千野; 慶一松岡; 明浩陸; 健竹林; 和也山元
Original assignee: エフサステクノロジーズ株式会社
Priority date: 2020-12-09
Filing date: 2020-12-09
Publication date: 2025-02-19
Anticipated expiration: 2040-12-09
Also published as: JP2022091612A

Description

The present invention relates to an information processing device and the like.

In recent years, data has been put to use in a variety of fields. Personal data is particularly important because it enables services tailored to an individual's behavior and preferences. One example of personal data is a captured image of an individual.

Face recognition technology is used to identify people in captured images and to recognize their facial expressions. For example, it recognizes a person's facial expression from a face image of a single person. Posture estimation technology is used to recognize a person's posture in a captured image. For example, it estimates a person's posture from an image of a single person and estimates skeletal information from that posture.

Technologies that use face recognition and posture estimation have been disclosed (see, for example, Patent Documents 1 and 2). In Patent Document 1, for example, face information and skeletal information are extracted from an image of a subject, the positional relationship between the face position based on the face information and the skeleton position based on the skeletal information is judged to be normal or abnormal, and if it is judged abnormal, an output indicates that the subject cannot be authenticated.

Patent Document 1: International Publication No. 2020/152917
Patent Document 2: JP 2014-053965 A

However, for an image containing multiple people, it is difficult to accurately match the face information recognized by face recognition with the posture information estimated by posture estimation. For example, if the target image shows multiple people close together, as in a group photo, matching between face information and posture information may fail. A case in which matching fails is described with reference to FIG. 18.

FIG. 18 is a diagram illustrating a case where matching fails. FIG. 18 shows a group photo in which multiple people are close together. From such a group photo, for example, the face information indicated by symbol a0 is recognized by face recognition, and the person information on which the posture information indicated by symbol b0 is based is estimated by posture estimation. In the group photo, however, the face information indicated by symbol a1 overlaps with another person's person information, so matching between face information and person information may fail. In other words, matching between face information and posture information may fail.

In one aspect, an object of the present invention is to accurately match face information recognized by face recognition with posture information estimated by posture estimation for an image containing multiple people.

In one aspect, an information processing device includes: a first detection unit that detects the face of each person from a captured image containing multiple people; a second detection unit that detects people from the captured image; a posture estimation unit that, for each piece of person detection data corresponding to the people detected by the second detection unit, estimates a posture from the captured image based on that person detection data; a calculation unit that calculates face candidate data from skeletal information based on posture data for the posture estimated by the posture estimation unit; and an identification unit that, for the face candidate data calculated for each piece of person detection data, identifies as the same person the face detection data having the largest overlap area among the face detection data of the people detected by the first detection unit.

According to one embodiment, for an image containing multiple people, face information recognized by face recognition and posture information estimated by posture estimation can be matched with high accuracy.

FIG. 1 is a block diagram showing the functional configuration of an information processing system including an information processing device according to the embodiment.
FIG. 2 is a diagram showing an example of a target image.
FIG. 3 is a diagram showing an example of face detection data.
FIG. 4 is a diagram showing an example of person detection data.
FIG. 5 is a diagram showing an example of face detection.
FIG. 6 is a diagram showing an example of person detection.
FIG. 7 is a diagram showing an example of posture estimation data.
FIG. 8 is a diagram showing an example of human body proportions.
FIG. 9 is a diagram showing an example of face candidate data.
FIG. 10 is a diagram explaining coordinates according to the embodiment.
FIG. 11A is a diagram (1) explaining sideways orientation determination according to the embodiment.
FIG. 11B is a diagram (2) explaining sideways orientation determination according to the embodiment.
FIG. 12A is a diagram (1) explaining face candidate extraction according to the embodiment.
FIG. 12B is a diagram (2) explaining face candidate extraction according to the embodiment.
FIG. 13A is a diagram (1) explaining matching according to the embodiment.
FIG. 13B is a diagram (2) explaining matching according to the embodiment.
FIG. 14 is a diagram showing an example of matching according to the embodiment.
FIG. 15 is a diagram showing an example of an overall flowchart of information processing according to the embodiment.
FIG. 16 is a diagram showing an example of a flowchart of the matching process according to the embodiment.
FIG. 17 is a diagram showing an example of a computer that executes a matching program.
FIG. 18 is a diagram explaining a case where matching fails.

Embodiments of the information processing device, matching program, and matching method disclosed in the present application are described in detail below with reference to the drawings. The present invention is not limited by these embodiments.

[Configuration of a system including an information processing device]
FIG. 1 is a block diagram showing the functional configuration of an information processing system including an information processing device according to the embodiment. The information processing system 9 includes an information processing device 1 and a camera 3. The information processing device 1 and the camera 3 are connected, for example, by a network 5. The network 5 may be wireless or wired, as long as image data can be transmitted from the camera 3 to the information processing device 1. In the wireless case, examples of the network 5 include WiFi (registered trademark) and Bluetooth (registered trademark).

The information processing device 1 performs face detection and posture estimation on image data, transmitted from the camera 3, that contains multiple people, and uses the resulting information to match a face and a posture for each person. FIG. 2 shows an example of a target image used in the embodiment. As shown in FIG. 2, the target image is a group photo containing multiple people. In such an image, one person's face overlaps another person's body, and the people are close together.

The information processing device 1 has a communication unit 10, a control unit 20, and a storage unit 30. The communication unit 10 controls communication with other devices. For example, the communication unit 10 receives, from the camera 3, image data of an image in which multiple people are captured. The communication unit 10 may also receive, from the camera 3, video data of a video in which multiple people are captured. In the embodiment, video data is treated as a sequence of image data, so the description below deals with individual image data.

The storage unit 30 stores various data and the programs executed by the control unit 20. For example, the storage unit 30 stores an image storage 31, face detection data 32, person detection data 33, posture estimation data 34, face candidate data 35, and a matching list 36.

The control unit 20 is a processing unit that controls the entire information processing device 1, and has a saving unit 21, a face detection unit 22, a person detection unit 23, a posture estimation unit 24, a sideways orientation determination unit 25, a face candidate extraction unit 26, and a matching unit 27. The saving unit 21 saves image data. For example, the saving unit 21 receives image data transmitted from the camera 3 via the communication unit 10 and saves it in the image storage 31.

The face detection unit 22 detects faces from a captured image in which multiple people are captured. For example, the face detection unit 22 acquires the target image data from the image storage 31 and detects the individual faces in it. One example of a tool used for face detection is OpenCV, but any tool capable of detecting faces may be used. For each detected face, the face detection unit 22 generates face detection data 32 in which a face ID is assigned to the face coordinates and the face range.

An example of the face detection data 32 is described with reference to FIG. 3. As shown in FIG. 3, the face detection data 32 associates a face ID (identifier), an x coordinate, a y coordinate, a width, and a height. The face ID uniquely identifies a face. The x and y coordinates give the position of a predetermined corner of the rectangle that represents the detected face, with the origin at the upper left of the captured image. The predetermined corner may be the lower left, upper left, lower right, or upper right of the rectangle, as long as it is fixed in advance. The width and height are the horizontal length and the height of that rectangle. The rectangle represents the face range, and the face coordinates together with the face range are called a bounding box.

As an example, for face ID "0", the stored x coordinate is "1333", the y coordinate is "98", the width is "413", and the height is "413".
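Because the description leaves the reference corner open, code that consumes these rows has to fix a convention before comparing rectangles. The following is a minimal sketch, not the embodiment's implementation; the names `Box` and `to_edges` and the corner labels such as `"upper_left"` are assumptions introduced for illustration. It converts a row into explicit edges, given that the origin is the upper left of the image and y grows downward.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """One detection row: an ID plus corner coordinates, width, and height."""
    ident: int
    x: int
    y: int
    width: int
    height: int


def to_edges(box, corner):
    """Return (left, top, right, bottom); y grows downward from the image top.

    `corner` names which rectangle corner (x, y) refers to, for example
    "upper_left" or "lower_right" (hypothetical labels).
    """
    vert, horiz = corner.split("_")
    left = box.x if horiz == "left" else box.x - box.width
    top = box.y if vert == "upper" else box.y - box.height
    return (left, top, left + box.width, top + box.height)


# Row for face ID 0 from the example, read here as an upper-left corner
# (an assumption; the description allows any fixed corner).
face0 = Box(ident=0, x=1333, y=98, width=413, height=413)
print(to_edges(face0, "upper_left"))
```

Normalizing every rectangle to edges once keeps later overlap calculations independent of which corner a particular detector reports.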

Returning to FIG. 1, the person detection unit 23 detects people from a captured image in which multiple people are captured. For example, the person detection unit 23 detects people from the target image data. One example of a tool used for person detection is YOLO (You Only Look Once), but any tool capable of detecting people may be used. For each detected person, the person detection unit 23 generates person detection data 33 in which a body ID is assigned to the person's coordinates and the person's range.

An example of the person detection data 33 is described with reference to FIG. 4. As shown in FIG. 4, the person detection data 33 associates a body ID, an x coordinate, a y coordinate, a width, and a height. The body ID uniquely identifies a person. The x and y coordinates give the position of a predetermined corner of the rectangle that represents the detected person, with the origin at the upper left of the captured image. The predetermined corner may be the lower left, upper left, lower right, or upper right of the rectangle, as long as it is fixed in advance. The width and height are the horizontal length and the height of that rectangle. The rectangle represents the person's range, and the person's coordinates together with the person's range are called a bounding box.

As an example, for body ID "0", the stored x coordinate is "1176", the y coordinate is "109", the width is "754", and the height is "680".

FIG. 5 is a diagram showing an example of face detection. As shown in FIG. 5, the bounding boxes of the faces detected by the face detection unit 22 are drawn on the target image. The face detection unit 22 detects individual faces in the image data of the target image using a face detection tool. For each detected face, the face detection unit 22 then generates face detection data 32 in which a face ID is assigned to the face coordinates and the face range. The face coordinates are, for example, the coordinates of the lower left corner of the rectangle with the origin at the upper left of the captured image, and the bounding box is expressed by the face coordinates and the face range (width and height).

FIG. 6 is a diagram showing an example of person detection. As shown in FIG. 6, the bounding box of a person detected by the person detection unit 23 is drawn on the target image. For convenience of explanation, FIG. 6 shows the bounding box of only one person, but in practice the bounding boxes of all detected people are drawn. The person detection unit 23 detects individual people in the image data of the target image using a person detection tool. For each detected person, the person detection unit 23 then generates person detection data 33 in which a body ID is assigned to the person's coordinates and the person's range. The person's coordinates are, for example, the coordinates of the lower left corner of the rectangle with the origin at the upper left of the captured image, and the bounding box is expressed by the person's coordinates and the person's range (width and height).

Returning to FIG. 1, the posture estimation unit 24 estimates a person's posture from the captured image for each piece of person detection data 33 corresponding to the people detected by the person detection unit 23, based on that person detection data 33. For example, the posture estimation unit 24 extracts from the captured image the image (person detection image) corresponding to each piece of person detection data 33. Based on each person detection image, the posture estimation unit 24 estimates the position information of the person's joint points (skeletal information) using a posture estimation tool. One example of a posture estimation tool is DeepPose, but any tool capable of estimating the position information of a person's joint points (skeletal information) may be used. The posture estimation unit 24 generates posture estimation data 34 by assigning to the estimated skeletal information the body ID of the person detection data 33 on which it is based.

An example of the posture estimation data 34 is described with reference to FIG. 7. As shown in FIG. 7, the posture estimation data 34 associates a body ID with skeletal information. The body ID corresponds to the body ID of the person detection data 33. The skeletal information associates a joint number, an x coordinate, and a y coordinate. The joint number is a number corresponding to a joint point. As an example, joint number "0" indicates the nose, "1" the left eye, "2" the right eye, "3" the left ear, "4" the right ear, "5" the left shoulder, "6" the right shoulder, "7" the left elbow, "8" the right elbow, "9" the left wrist, "10" the right wrist, "11" the left hip, "12" the right hip, "13" the left knee, "14" the right knee, "15" the left ankle, and "16" the right ankle. The joint numbers correspond to the numbers in the joint model of a person shown on the right of FIG. 7. The x and y coordinates are the coordinates of the estimated joint point, with the origin at the upper left of the captured image. The definition of the joint numbers is not limited to the one described above.

As an example, for body ID "0", joint number "5" (left shoulder) is stored with the x coordinate "1173" and the y coordinate "481", and joint number "6" (right shoulder) is stored with the x coordinate "953" and the y coordinate "464".
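The joint numbering and the per-body skeletal information described above can be pictured as a small lookup structure. The sketch below is illustrative only; the names `Joint`, `posture_estimation_data`, and `joint_xy` are assumptions, and only the two shoulder rows given in the example are filled in.

```python
from enum import IntEnum


class Joint(IntEnum):
    """Joint numbers as listed in the description of FIG. 7."""
    NOSE = 0
    LEFT_EYE = 1
    RIGHT_EYE = 2
    LEFT_EAR = 3
    RIGHT_EAR = 4
    LEFT_SHOULDER = 5
    RIGHT_SHOULDER = 6
    LEFT_ELBOW = 7
    RIGHT_ELBOW = 8
    LEFT_WRIST = 9
    RIGHT_WRIST = 10
    LEFT_HIP = 11
    RIGHT_HIP = 12
    LEFT_KNEE = 13
    RIGHT_KNEE = 14
    LEFT_ANKLE = 15
    RIGHT_ANKLE = 16


# body ID -> {joint number: (x, y)}, origin at the upper left of the image.
posture_estimation_data = {
    0: {
        Joint.LEFT_SHOULDER: (1173, 481),
        Joint.RIGHT_SHOULDER: (953, 464),
    },
}


def joint_xy(data, body_id, joint):
    """Look up the estimated coordinates of one joint of one body."""
    return data[body_id][joint]


print(joint_xy(posture_estimation_data, 0, Joint.RIGHT_SHOULDER))
```

Keying the outer mapping by body ID mirrors the way the posture estimation data 34 inherits the body ID of the person detection data 33 it was derived from.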

Returning to FIG. 1, the sideways orientation determination unit 25 uses human body proportions to determine, from the posture of a person estimated by the posture estimation unit 24, whether the person is facing sideways.

For example, the sideways orientation determination unit 25 performs the following process for each posture estimated by the posture estimation unit 24. Using the human body proportions, the sideways orientation determination unit 25 determines whether the person is facing sideways from the skeletal information in the posture estimation data 34 corresponding to the posture. The human body proportions are the proportions (coefficients) of each part within the whole body, calculated from publicly known ratios of the lengths of the parts of the human body. Details are given later. As an example, the sideways orientation determination unit 25 calculates the lengths of predetermined parts using the skeletal information in the posture estimation data 34. The predetermined parts are, for example, the upper arm, lower arm, torso, thigh, and calf. Using the human body proportions, the sideways orientation determination unit 25 multiplies the length of each predetermined part by the ratio for that part to obtain an estimate of the torso length from that part. The sideways orientation determination unit 25 then adopts the largest of these estimates as the torso length. The part-specific ratios used to estimate the torso length are 9/5 for the upper arm, 9/4 for the lower arm, 1 for the torso, 3/2 for the thigh, and 3/2 for the calf. The part coefficients and the part-specific ratios are obtained from the human body proportions.
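The torso-length estimate described above can be sketched as follows. This is a minimal illustration rather than the embodiment's code: the function name `estimate_torso_length`, the part keys, and the sample lengths are assumptions, and the way each part length is measured from joint points (for example, shoulder to elbow for the upper arm) is not specified in this passage, so measured lengths are taken as input.

```python
# Ratios that convert a measured part length into a torso-length estimate,
# stored as (numerator, denominator) pairs taken from the description.
TORSO_RATIO = {
    "upper_arm": (9, 5),
    "lower_arm": (9, 4),
    "torso": (1, 1),
    "thigh": (3, 2),
    "calf": (3, 2),
}


def estimate_torso_length(part_lengths):
    """Return the largest torso-length estimate over the measured parts.

    part_lengths maps a part name to its measured length in pixels.
    Raises ValueError if no part length is supplied.
    """
    estimates = [
        length * TORSO_RATIO[part][0] / TORSO_RATIO[part][1]
        for part, length in part_lengths.items()
    ]
    return max(estimates)


# Hypothetical measured lengths in pixels (not values from the source).
measured = {
    "upper_arm": 20.0,
    "lower_arm": 12.0,
    "torso": 30.0,
    "thigh": 18.0,
    "calf": 16.0,
}
torso_length = estimate_torso_length(measured)
print(torso_length)
```

With these sample values the upper arm yields the largest estimate, so it, rather than the directly measured torso, determines the torso length.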

The sideways orientation determination unit 25 also calculates the shoulder width using the skeletal information in the posture estimation data 34. It then determines whether the shoulder width is shorter than 1/3 of the torso length, and if so, determines that the body is facing sideways. The sideways orientation determination unit 25 associates a sideways flag with the posture estimation data 34 of each person (body ID) whose body is determined to be sideways. To exclude such people from face candidate extraction, the sideways orientation determination unit 25 passes only the posture estimation data 34 of people determined not to be sideways to the face candidate extraction unit 26. People facing sideways are excluded because, when the body is sideways, the x coordinate of one shoulder falls within the x range of the face, so the face candidate extraction unit 26 described later cannot extract the face candidate correctly.
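The sideways test above reduces to a single comparison. The sketch below is illustrative only. It assumes the shoulder width is the straight-line distance between the two shoulder joints, which the description does not state explicitly, and the names `is_sideways` and `skeleton` and the torso lengths passed in are hypothetical.

```python
import math

LEFT_SHOULDER, RIGHT_SHOULDER = 5, 6  # joint numbers from the description


def is_sideways(skeleton, torso_length):
    """Return True when the shoulder width is shorter than 1/3 of the torso.

    skeleton maps joint numbers to (x, y) image coordinates.
    Assumption: shoulder width is the Euclidean distance between shoulders.
    """
    lx, ly = skeleton[LEFT_SHOULDER]
    rx, ry = skeleton[RIGHT_SHOULDER]
    shoulder_width = math.hypot(lx - rx, ly - ry)
    return shoulder_width < torso_length / 3


# Shoulder coordinates of body ID 0 from the posture estimation example.
skeleton = {LEFT_SHOULDER: (1173, 481), RIGHT_SHOULDER: (953, 464)}

# Hypothetical torso lengths: the shoulder width here is about 220.7 pixels.
print(is_sideways(skeleton, 600.0))
print(is_sideways(skeleton, 700.0))
```

Only bodies for which this check returns False would be handed on to face candidate extraction.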

An example of the human body proportions is described with reference to FIG. 8. The left diagram of FIG. 8 shows publicly known ratios of the lengths of the parts of the human body. The right diagram of FIG. 8 shows the proportions (coefficients) of each part within the whole body, calculated from those ratios. As shown in the right diagram of FIG. 8, the coefficient of the torso is "9" and the coefficient of the shoulder width is "6". Within the torso, the coefficients of the chest and the waist are "5" and "4". The coefficients of the thigh and the calf are "6" and "6".
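The part-specific ratios follow directly from these coefficients: the torso coefficient divided by the part coefficient. The short check below is a sketch using exact fractions. The names `COEFF` and `ratio_to_torso` are assumptions, and the arm coefficients are omitted because this passage does not list them.

```python
from fractions import Fraction

# Coefficients read from the right diagram of FIG. 8 as described above.
COEFF = {
    "torso": 9,
    "shoulder_width": 6,
    "chest": 5,
    "waist": 4,
    "thigh": 6,
    "calf": 6,
}


def ratio_to_torso(part):
    """Factor that scales a part length up (or down) to the torso length."""
    return Fraction(COEFF["torso"], COEFF[part])


print(ratio_to_torso("thigh"))           # torso : thigh
print(ratio_to_torso("shoulder_width"))  # torso : shoulder width
```

The thigh and calf factors reduce to 3/2, and the torso-to-shoulder factor of 3/2 corresponds to a torso to shoulder width proportion of 3 to 2 for a body seen from the front.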

The body is determined to be sideways when the shoulder width is shorter than 1/3 of the torso length for the following reason. When the body faces forward (or backward), the human body proportions give a torso to shoulder width ratio of approximately 3 to 2. When the apparent shoulder width drops below one third of the torso length, that is, when the shoulders appear narrower than a 3 to 1 torso to shoulder width ratio, the shoulders are so narrow that the body can no longer be assumed to face forward (or backward). In that case, the x coordinate of one shoulder is presumed to fall within the x range of the face. For this reason, the sideways orientation determination unit 25 determines that the body is sideways when the shoulder width is shorter than 1/3 of the torso length and excludes the person from face candidate extraction.

The face candidate extraction unit 26 extracts face candidate data using the human body proportions and the posture estimation data 34. For example, the face candidate extraction unit 26 acquires, from the skeletal information for the target body ID, the x and y coordinates of the left shoulder (joint number "5") and the x and y coordinates of the right shoulder (joint number "6"). With the origin at the upper left of the target captured image, the face candidate extraction unit 26 takes the smaller of the two shoulder x coordinates as the x coordinate of the face candidate rectangle, and the larger of the two shoulder y coordinates as the y coordinate of the face candidate rectangle. It takes the length of the horizontal component of the shoulder width as the width of the face candidate rectangle. Using the human body proportions, it takes 5/9 of the torso length as the height of the face candidate rectangle. The height is set to 5/9 of the torso length because, according to the human body proportions shown in FIG. 8, the ratio of the torso to the head including the neck is 9 to 5. The face candidate extraction unit 26 then generates, for each target body ID, the coordinates, width, and height of the face candidate rectangle as face candidate data 35.
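The four rules above (smaller shoulder x, larger shoulder y, horizontal shoulder span, and 5/9 of the torso length) can be sketched as a single function. This is an illustration under stated assumptions, not the embodiment's implementation: the name `face_candidate` is hypothetical, the shoulder coordinates are borrowed from the FIG. 10 example in this description, and the torso length of 36 is an invented value.

```python
def face_candidate(left_shoulder, right_shoulder, torso_length):
    """Return (x, y, width, height) of the face candidate rectangle.

    Coordinates use the image origin at the upper left, y growing downward.
    """
    lx, ly = left_shoulder
    rx, ry = right_shoulder
    x = min(lx, rx)                  # smaller of the shoulder x coordinates
    y = max(ly, ry)                  # larger of the shoulder y coordinates
    width = abs(lx - rx)             # horizontal component of shoulder width
    height = torso_length * 5 / 9    # torso : head including neck = 9 : 5
    return (x, y, width, height)


# Shoulders from the FIG. 10 example; torso length 36 is hypothetical.
print(face_candidate((28, 24), (14, 34), 36))
```

Because y grows downward, the returned (x, y) reads as the lower-left corner of the rectangle at shoulder level, with the rectangle extending upward by its height; that interpretation is an inference from the rules rather than an explicit statement in the text.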

An example of the face candidate data 35 is described with reference to FIG. 9. As shown in FIG. 9, the face candidate data 35 associates a body ID, an x coordinate, a y coordinate, a width, and a height. The body ID corresponds to the body ID of the person detection data 33. The x and y coordinates give the position of a predetermined corner of the rectangle that represents the extracted face candidate, with the origin at the upper left of the captured image. The predetermined corner may be the lower left, upper left, lower right, or upper right of the rectangle, as long as it is fixed in advance. The width and height are the horizontal length and the height of that rectangle. The rectangle represents the range of the face candidate, and the coordinates of the face candidate together with its range are called a bounding box.

Returning to FIG. 1, the matching unit 27 matches (links) each piece of face candidate data 35 with the face detection data 32 that has the largest overlap area among the multiple pieces of face detection data 32, treating them as the same person. For example, the matching unit 27 performs the following process for each piece of face candidate data 35. The matching unit 27 calculates the overlap area between the bounding box obtained from the target face candidate data 35 and the bounding box obtained from the face detection data 32 of every face detected by the face detection unit 22. The matching unit 27 determines that the body ID of the target face candidate data 35 matches the face ID of the face detection data 32 with the largest overlap area. In other words, the matching unit 27 identifies the matched face ID and body ID as the same person. The matching unit 27 then outputs the identified face ID and body ID to the matching list 36.
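The largest-overlap rule can be sketched as follows. This is a minimal illustration, not the embodiment's implementation. Rectangles are assumed to be already normalized to (left, top, right, bottom) edges, the names `overlap_area` and `match` are hypothetical, and skipping a candidate that overlaps no detected face is an assumption, since the description does not say how that case is handled.

```python
def overlap_area(a, b):
    """Area of intersection of two (left, top, right, bottom) rectangles."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, w) * max(0, h)


def match(face_candidates, face_detections):
    """Pair each body ID with the face ID whose box overlaps it the most.

    face_candidates: {body_id: box}; face_detections: {face_id: box}.
    Returns a matching list of (face_id, body_id) tuples.
    """
    matching_list = []
    for body_id, cand in face_candidates.items():
        best_id, best_area = None, 0
        for face_id, face in face_detections.items():
            area = overlap_area(cand, face)
            if area > best_area:
                best_id, best_area = face_id, area
        if best_id is not None:  # assumption: no overlap means no match
            matching_list.append((best_id, body_id))
    return matching_list


# Hypothetical boxes: two face candidates and three detected faces.
candidates = {0: (0, 0, 10, 10), 1: (20, 0, 30, 10)}
faces = {7: (1, 1, 9, 9), 8: (8, 0, 24, 10), 9: (100, 100, 110, 110)}
print(match(candidates, faces))
```

In this sample, face 8 overlaps both candidates, but body 0 is paired with face 7 because that overlap is larger, which is the behavior the rule is meant to produce when neighboring people are close together.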

ここで、実施例に係る座標を、図１０を参照して説明する。図１０は、実施例に係る座標を説明する図である。なお、図１０では、説明の便宜上、対象となる撮像画像の一部を抜粋した画像を用いて説明する。対象の撮像画像の左上が、原点の座標（０，０）である。そして、原点から右方向がｘ座標であり、原点から下方向がｙ座標である。ここでは、例えば、姿勢推定部２４によって推定された人の関節点の座標（骨格情報）が表わされている。一例として、右肩の座標は、（１４，３４）である。右肩の座標は、図７に示す人物の関節モデル内の関節Ｎｏ「６」に対応する座標である。左肩の座標は、（２８，２４）である。左肩の座標は、図７に示す人物の関節モデル内の関節Ｎｏ「５」に対応する座標である。右腰の座標は、（２８，５５）である。右腰の座標は、図７に示す人物の関節モデル内の関節Ｎｏ「１２」に対応する座標である。左腰の座標は、（４４，５２）である。左腰の座標は、図７に示す人物の関節モデル内の関節Ｎｏ「１１」に対応する座標である。 Here, the coordinates according to the embodiment will be described with reference to FIG. 10. FIG. 10 is a diagram for explaining the coordinates according to the embodiment. For convenience of explanation, FIG. 10 uses an excerpt of the target captured image. The upper left corner of the target captured image is the origin (0, 0). The x coordinate increases to the right of the origin, and the y coordinate increases downward from the origin. Here, for example, the coordinates (skeletal information) of the joint points of a person estimated by the posture estimation unit 24 are shown. As an example, the coordinates of the right shoulder are (14, 34), which correspond to joint number "6" in the joint model of the person shown in FIG. 7. The coordinates of the left shoulder are (28, 24), which correspond to joint number "5" in the joint model of the person shown in FIG. 7. The coordinates of the right hip are (28, 55), which correspond to joint number "12" in the joint model of the person shown in FIG. 7. The coordinates of the left hip are (44, 52), which correspond to joint number "11" in the joint model of the person shown in FIG. 7.

図１１Ａおよび図１１Ｂは、実施例に係る横向き判定を説明する図である。図１１Ａには、体が横を向いている人物の顔候補のバウンディングボックスが表わされている。黒丸は、左肩と右肩の座標である。体が横を向いていると、一方の肩（ここでは、右肩）の位置のｘ座標が顔の範囲のｘ座標と被ってしまうため、顔候補抽出部２６によって抽出される顔候補が正しく抽出されない。そこで、横向き判定部２５は、体が横を向いている人物を顔候補抽出の対象から除外する。 Figures 11A and 11B are diagrams illustrating sideways orientation determination according to an embodiment. Figure 11A shows the bounding boxes of face candidates for a person whose body is turned to the side. The black circles represent the coordinates of the left and right shoulders. When the body is turned to the side, the x-coordinate of the position of one shoulder (here, the right shoulder) overlaps with the x-coordinate of the face range, and face candidates extracted by the face candidate extraction unit 26 are not extracted correctly. Therefore, the sideways orientation determination unit 25 excludes people whose body is turned to the side from the targets for face candidate extraction.

図１１Ｂに示すように、姿勢推定部２４によって推定された人の関節点の座標（骨格情報）の一部が表わされている。右肩の座標は、（１４，３４）である。左肩の座標は、（２８，２４）である。右腰の座標は、（２８，５５）である。左腰の座標は、（４４，５２）である。また、図示しないが、左肘の座標と左手首の座標がある。右肩と右腰の間や左肩と左腰の間は、それぞれ胴体を示す。左肘と左手首の間は、下腕を示す。 As shown in FIG. 11B, some of the coordinates (skeletal information) of a person's joint points estimated by the posture estimation unit 24 are shown. The coordinates of the right shoulder are (14, 34). The coordinates of the left shoulder are (28, 24). The coordinates of the right hip are (28, 55). The coordinates of the left hip are (44, 52). Although not shown, there are also the coordinates of the left elbow and the left wrist. The space between the right shoulder and right hip, and the space between the left shoulder and left hip, respectively, indicate the torso. The space between the left elbow and left wrist indicates the lower arm.

横向き判定部２５は、姿勢推定データ３４の骨格情報を用いて、それぞれの部位の長さを算出する。ここでは、一例として、胴体の長さ、下腕を算出する。右肩の座標（１４，３４）と右腰の座標（２８，５５）を用いて、胴体の長さは、「２５」と算出される。また、左肩の座標（２８，２４）と左腰の座標（４４，５２）を用いて、胴体の長さは、「３２」と算出される。また、下腕の長さは、左肘の座標と左手首の座標を用いて「８」と算出されたとする。 The sideways orientation determination unit 25 calculates the length of each part using the skeletal information of the posture estimation data 34. Here, as an example, the length of the torso and the lower arm are calculated. Using the coordinates of the right shoulder (14, 34) and the coordinates of the right hip (28, 55), the length of the torso is calculated to be "25". Using the coordinates of the left shoulder (28, 24) and the coordinates of the left hip (44, 52), the length of the torso is calculated to be "32". The length of the lower arm is calculated to be "8" using the coordinates of the left elbow and the left wrist.

横向き判定部２５は、人体比率を用いて、所定の部位の長さに、部位に応じた比率を乗算して、最大の乗算結果を胴体の長さと決定する。ここでは、右肩と右腰から算出される胴体の長さは、比率が１であるので、「２５」と算出される。また、左肩と左腰から算出される胴体の長さは、比率が１であるので、「３２」と算出される。また、下腕から算出される胴体の長さは、比率が９／４であるので、「１８」（＝８×９／４）と算出される。そして、横向き判定部２５は、ｍａｘ（２５，３２，１８）を算出して、胴体の長さを「３２」と決定する。なお、ここでは、横向き判定部２５は、一例として、胴体、下腕から胴体の長さを決定したが、実際には、上腕、太もも、ふくらはぎなどを加えて胴体の長さを決定する。 The sideways orientation determination unit 25 uses the human body ratio to multiply the length of each specified part by the ratio for that part, and takes the largest product as the length of the torso. Here, the torso length obtained from the right shoulder and right hip is "25" because the ratio is 1. The torso length obtained from the left shoulder and left hip is "32" because the ratio is 1. The torso length obtained from the lower arm is "18" (= 8 x 9/4) because the ratio is 9/4. The sideways orientation determination unit 25 then calculates max(25, 32, 18) and determines the length of the torso to be "32". Note that the torso and lower arm are used here only as an example; in practice, the sideways orientation determination unit 25 also uses the upper arms, thighs, calves, and so on to determine the length of the torso.
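
The torso length estimate described above can be sketched as follows. This is only an illustration under stated assumptions: the names `TORSO_RATIO`, `distance`, and `estimate_torso_length` are hypothetical, only the two ratios given in the text (1 for the torso, 9/4 for the lower arm) are included, and the elbow and wrist coordinates are made-up values chosen so that the lower arm length is 8, since the text does not give them. Unlike the text, the sketch does not round each intermediate length to an integer.

```python
import math

# Ratios that convert a measured part length into an equivalent torso length.
# Only the ratios stated in the text are listed here.
TORSO_RATIO = {"torso": 1.0, "lower_arm": 9 / 4}


def distance(p, q):
    """Euclidean distance between two (x, y) joint points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def estimate_torso_length(segments):
    """Return the largest ratio-scaled length among the given segments.

    segments: iterable of (part_name, point_a, point_b).
    """
    return max(TORSO_RATIO[name] * distance(a, b) for name, a, b in segments)


segments = [
    ("torso", (14, 34), (28, 55)),      # right shoulder to right hip
    ("torso", (28, 24), (44, 52)),      # left shoulder to left hip
    ("lower_arm", (30, 40), (30, 48)),  # hypothetical left elbow and left wrist
]
torso_length = estimate_torso_length(segments)
```

With these inputs the left shoulder to left hip segment gives the largest value, which rounds to the "32" used in the text.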

また、横向き判定部２５は、右肩の座標と左肩の座標を用いて、肩幅の長さを算出する。ここでは、右肩の座標が（１４，３４）であり、左肩の座標が（２８，２４）であるので、肩幅の長さは、「１７」と算出される。 The sideways orientation determination unit 25 also calculates the shoulder width using the coordinates of the right shoulder and the left shoulder. In this case, the coordinates of the right shoulder are (14, 34) and the coordinates of the left shoulder are (28, 24), so the shoulder width is calculated to be "17."

そして、横向き判定部２５は、肩幅の長さが胴体の長さの１／３よりも短いか否かを判定し、短いと判定した場合には、体が横向きと判定する。ここでは、肩幅の長さが「１７」、胴体の長さが「３２」であるため、肩幅の長さ「１７」が胴体の長さの１／３「約１０」よりも長い。したがって、体が横向きでないと判定される。この結果、横向き判定部２５は、この人物の姿勢推定データ３４を顔候補抽出部２６による顔候補抽出の対象とする。なお、横向き判定部２５は、体が横向きと判定した場合には、体が横向きと判定された人物の姿勢推定データ３４を顔候補抽出部２６による顔候補抽出の対象としない。 Then, the sideways orientation determination unit 25 determines whether the shoulder width is shorter than 1/3 of the torso length, and if it is determined that the shoulder width is shorter, it determines that the body is sideways. Here, the shoulder width is "17" and the torso length is "32", so the shoulder width "17" is longer than 1/3 of the torso length (approximately 10). Therefore, it is determined that the body is not sideways. As a result, the sideways orientation determination unit 25 makes the posture estimation data 34 of this person a target for face candidate extraction by the face candidate extraction unit 26. Note that, if the sideways orientation determination unit 25 determines that the body is sideways, it does not make the posture estimation data 34 of the person whose body has been determined to be sideways a target for face candidate extraction by the face candidate extraction unit 26.
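
The sideways check above reduces to a single comparison. The following is a minimal sketch; the function name `is_sideways` is hypothetical, and the shoulder width is left unrounded, whereas the text rounds it to 17.

```python
import math


def is_sideways(right_shoulder, left_shoulder, torso_length):
    """Return True when the shoulder width is shorter than 1/3 of the torso length."""
    shoulder_width = math.hypot(
        left_shoulder[0] - right_shoulder[0],
        left_shoulder[1] - right_shoulder[1],
    )
    return shoulder_width < torso_length / 3


# Values from the text: shoulder width is about 17 and the torso length is 32,
# so the body is judged not to be sideways.
facing_sideways = is_sideways((14, 34), (28, 24), 32)
```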

図１２Ａおよび図１２Ｂは、実施例に係る顔候補抽出を説明する図である。図１２Ａには、人物の顔候補のバウンディングボックスが表わされている。大きい黒丸は、左肩と右肩の位置を示す。小さい黒丸は、左腰と右腰の位置を示す。矢印で示される肩と腰の間の長さは、胴体の長さである。顔候補抽出部２６は、肩幅の水平成分の長さを顔候補のバウンディングボックスの横幅とする。顔候補抽出部２６は、人体比率を用いて、胴体の長さの５／９を顔候補のバウンディングボックスの高さとする。これは、図８の人体比率から、胴体の長さと、頭部および首部の長さとの比率が、９対５であるからである。 12A and 12B are diagrams for explaining face candidate extraction according to an embodiment. In FIG. 12A, the bounding box of a person's face candidate is shown. The large black circle indicates the position of the left shoulder and the right shoulder. The small black circle indicates the position of the left hip and the right hip. The length between the shoulder and the hip indicated by the arrow is the length of the torso. The face candidate extraction unit 26 determines the length of the horizontal component of the shoulder width as the width of the bounding box of the face candidate. Using the human body ratio, the face candidate extraction unit 26 determines the height of the bounding box of the face candidate to be 5/9 of the torso length. This is because, according to the human body ratio in FIG. 8, the ratio between the length of the torso and the length of the head and neck is 9:5.

図１２Ｂに示すように、姿勢推定部２４によって推定された人の関節点の座標（骨格情報）の一部が表わされている。右肩の座標は、（１４，３４）である。左肩の座標は、（２８，２４）である。顔候補抽出部２６は、左肩のｘ座標と右肩のｘ座標の小さい方の値を顔候補のバウンディングボックスのｘ座標とする。ここでは、バウンディングボックスのｘ座標は、「１４」を示す。顔候補抽出部２６は、左肩のｙ座標と右肩のｙ座標の大きい方の値を顔候補のバウンディングボックスのｙ座標とする。ここでは、バウンディングボックスのｙ座標は、「３４」を示す。そして、顔候補抽出部２６は、肩幅の水平成分の長さを顔候補のバウンディングボックスの横幅とする。ここでは、バウンディングボックスの横幅は、肩幅の水平成分の長さが１４と算出されるので、「１４」を示す。そして、顔候補抽出部２６は、人体比率を用いて、胴体の長さの５／９を顔候補のバウンディングボックスの高さとする。ここでは、胴体の長さは、図１１Ｂで算出したとおり「３２」であるので、バウンディングボックスの高さは、１８（＝３２×５／９）を示す。これにより、顔候補のバウンディングボックスは、顔候補の座標として（１４，３４）、顔候補の範囲の横幅として１４顔候補の範囲の高さとして１８と抽出される。点線の矩形が、顔候補のバウンディングボックスである。 As shown in FIG. 12B, a part of the coordinates (skeletal information) of the joint points of the person estimated by the posture estimation unit 24 is shown. The coordinates of the right shoulder are (14, 34). The coordinates of the left shoulder are (28, 24). The face candidate extraction unit 26 sets the smaller of the x coordinate of the left shoulder and the x coordinate of the right shoulder as the x coordinate of the bounding box of the face candidate. Here, the x coordinate of the bounding box indicates "14". The face candidate extraction unit 26 sets the larger of the y coordinate of the left shoulder and the y coordinate of the right shoulder as the y coordinate of the bounding box of the face candidate. Here, the y coordinate of the bounding box indicates "34". The face candidate extraction unit 26 then sets the length of the horizontal component of the shoulder width as the width of the bounding box of the face candidate. Here, the width of the bounding box indicates "14" since the length of the horizontal component of the shoulder width is calculated to be 14. Then, using the human body ratio, the face candidate extraction unit 26 sets the height of the bounding box of the face candidate to 5/9 of the length of the torso. In this case, the length of the torso is "32" as calculated in FIG. 11B, so the height of the bounding box is 18 (= 32 x 5/9). 
As a result, the bounding box of the face candidate is extracted with the coordinates of the face candidate being (14, 34), the width of the face candidate range being 14, and the height of the face candidate range being 18. The dotted rectangle is the bounding box of the face candidate.
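
The derivation of the face candidate bounding box can be sketched as below, assuming the upper-left image origin used in the embodiment. The function name `face_candidate_bbox` is hypothetical. The height is returned unrounded (about 17.8), whereas the text rounds it to 18.

```python
def face_candidate_bbox(right_shoulder, left_shoulder, torso_length):
    """Return (x, y, width, height) of the face candidate rectangle.

    Assumes the image origin is at the upper left, so the larger shoulder
    y coordinate is the lower of the two shoulders.
    """
    x = min(right_shoulder[0], left_shoulder[0])
    y = max(right_shoulder[1], left_shoulder[1])
    width = abs(left_shoulder[0] - right_shoulder[0])  # horizontal component of shoulder width
    height = torso_length * 5 / 9                      # torso : head and neck = 9 : 5
    return x, y, width, height


x, y, width, height = face_candidate_bbox((14, 34), (28, 24), 32)
```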

図１３Ａおよび図１３Ｂは、実施例に係るマッチングを説明する図である。図１３Ａには、１つの顔候補データ３５から得られるバウンディングボックスｂｂｏｘ１と、顔検出部２２によって検出された１つの顔検出データ３２から得られるバウンディングボックスｂｂｏｘ２との重なりが表わされている。ｂｂｏｘ１の顔候補の座標は、（ｂｂｏｘ１＿ｘ、ｂｂｏｘ１＿ｙ）である。ｂｂｏｘ１の横幅は、ｂｂｏｘ１＿ｗである。ｂｂｏｘ１の高さは、ｂｂｏｘ１＿ｈである。また、ｂｂｏｘ２の顔候補の座標は、（ｂｂｏｘ２＿ｘ、ｂｂｏｘ２＿ｙ）である。ｂｂｏｘ２の横幅は、ｂｂｏｘ２＿ｗである。ｂｂｏｘ２の高さは、ｂｂｏｘ２＿ｈである。 Figures 13A and 13B are diagrams for explaining matching according to an embodiment. Figure 13A shows the overlap between a bounding box bbox1 obtained from one face candidate data 35 and a bounding box bbox2 obtained from one face detection data 32 detected by the face detection unit 22. The coordinates of the face candidate in bbox1 are (bbox1_x, bbox1_y). The width of bbox1 is bbox1_w. The height of bbox1 is bbox1_h. The coordinates of the face candidate in bbox2 are (bbox2_x, bbox2_y). The width of bbox2 is bbox2_w. The height of bbox2 is bbox2_h.

マッチング部２７は、バウンディングボックスｂｂｏｘ１とバウンディングボックスｂｂｏｘ２との重なりの四角形の４点である左下、左上、右下、右上を、以下のように求める。なお、ｌｅｆｔは、重なりの左の上下のｘ座標の値を示す。ｒｉｇｈｔは、重なりの右の上下のｘ座標の値を示す。ｂｏｔｔｏｍは、重なりの下の左右のｙ座標の値を示す。ｔｏｐは、重なりの上の左右のｙ座標の値を示す。
ｌｅｆｔ＝ｍａｘ（ｂｂｏｘ１＿ｘ，ｂｂｏｘ２＿ｘ）
ｔｏｐ＝ｍａｘ（ｂｂｏｘ１＿ｙ－ｂｂｏｘ１＿ｈ，ｂｂｏｘ２＿ｙ－ｂｂｏｘ２＿ｈ）
ｒｉｇｈｔ＝ｍｉｎ（ｂｂｏｘ１＿ｘ＋ｂｂｏｘ１＿ｗ，ｂｂｏｘ２＿ｘ＋ｂｂｏｘ２＿ｗ）
ｂｏｔｔｏｍ＝ｍｉｎ（ｂｂｏｘ１＿ｙ，ｂｂｏｘ２＿ｙ）
すなわち、重なりの四角形の左下の座標は、（ｌｅｆｔ，ｂｏｔｔｏｍ）である。重なりの四角形の左上の座標は、（ｌｅｆｔ，ｔｏｐ）である。重なりの四角形の右下の座標は、（ｒｉｇｈｔ，ｂｏｔｔｏｍ）である。重なりの四角形の右上の座標は、（ｒｉｇｈｔ，ｔｏｐ）である。 The matching unit 27 determines the four points of the overlapping rectangle between bounding boxes bbox1 and bbox2, the bottom left, top left, bottom right, and top right, as follows: "left" indicates the x coordinate values of the top and bottom of the left of the overlap; "right" indicates the x coordinate values of the top and bottom of the right of the overlap; "bottom" indicates the y coordinate values of the left and right of the bottom of the overlap; and "top" indicates the y coordinate values of the left and right of the top of the overlap.
left=max(bbox1_x, bbox2_x)
top=max(bbox1_y-bbox1_h, bbox2_y-bbox2_h)
right=min(bbox1_x+bbox1_w, bbox2_x+bbox2_w)
bottom=min(bbox1_y, bbox2_y)
That is, the coordinates of the bottom left corner of the overlapping rectangle are (left, bottom). The coordinates of the top left corner of the overlapping rectangle are (left, top). The coordinates of the bottom right corner of the overlapping rectangle are (right, bottom). The coordinates of the top right corner of the overlapping rectangle are (right, top).

そして、マッチング部２７は、重なりの四角形が存在する場合には、重なりの面積を求める。ここでは、ｌｅｆｔがｒｉｇｈｔより小さく、且つｔｏｐがｂｏｔｔｏｍより小さければ、重なりの四角形が存在すると判定される。そして、重なりの面積は、以下のように求められる。
重なりの面積＝（ｒｉｇｈｔ－ｌｅｆｔ）×（ｂｏｔｔｏｍ－ｔｏｐ） Then, if an overlapping rectangle exists, the matching unit 27 calculates the area of the overlap. Here, if the left is smaller than the right and the top is smaller than the bottom, it is determined that an overlapping rectangle exists. Then, the area of the overlap is calculated as follows.
Overlap area = (right-left) x (bottom-top)

図１３Ｂに示すように、顔候補データ３５から得られるバウンディングボックス（破線）と、顔検出データ３２から得られるバウンディングボックス（実線）とが表わされている。顔候補データ３５から得られるバウンディングボックスについて、顔候補の座標が（１４，３４）、顔候補の横幅および高さが、それぞれ１４、１８で表わされている。顔検出データ３２から得られるバウンディングボックスについて、顔の座標が（１４，２８）、顔の横幅および高さが、それぞれ１２，１３で表わされている。 As shown in FIG. 13B, a bounding box (dashed line) obtained from face candidate data 35 and a bounding box (solid line) obtained from face detection data 32 are shown. For the bounding box obtained from face candidate data 35, the coordinates of the face candidate are represented as (14, 34), and the width and height of the face candidate are represented as 14 and 18, respectively. For the bounding box obtained from face detection data 32, the coordinates of the face are represented as (14, 28), and the width and height of the face are represented as 12 and 13, respectively.

そうすると、マッチング部２７は、重なりの四角形の４点（左下、左上、右下、右上）を、以下のように求める。
ｌｅｆｔ＝ｍａｘ（１４，１４）＝１４
ｔｏｐ＝ｍａｘ（２８－１３，３４－１８）＝ｍａｘ（１５，１６）＝１６
ｒｉｇｈｔ＝ｍｉｎ（２６，２８）＝２６
ｂｏｔｔｏｍ＝ｍｉｎ（２８，３４）＝２８
すなわち、重なりの四角形の左下の座標は、（１４，２８）である。重なりの四角形の左上の座標は、（１４，１６）である。重なりの四角形の右下の座標は、（２６，２８）である。重なりの四角形の右上の座標は、（２６，１６）である。 Then, the matching unit 27 finds the four points (lower left, upper left, lower right, and upper right) of the overlapping rectangle as follows.
left=max(14,14)=14
top=max(28-13,34-18)=max(15,16)=16
right=min(26,28)=26
bottom=min(28,34)=28
That is, the coordinates of the bottom left corner of the overlapping rectangle are (14, 28). The coordinates of the top left corner of the overlapping rectangle are (14, 16). The coordinates of the bottom right corner of the overlapping rectangle are (26, 28). The coordinates of the top right corner of the overlapping rectangle are (26, 16).

そして、マッチング部２７は、ｌｅｆｔ（１４）がｒｉｇｈｔ（２６）より小さく、且つｔｏｐ（１６）がｂｏｔｔｏｍ（２８）より小さいので、重なりの四角形が存在するので、重なりの面積を求める。重なりの面積は、１４４（＝（２６－１４）×（２８－１６））と計算される。 Then, since left (14) is smaller than right (26) and top (16) is smaller than bottom (28), an overlapping rectangle exists, so the matching unit 27 calculates the overlapping area. The overlapping area is calculated as 144 (= (26-14) x (28-16)).
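
The overlap calculation and the FIG. 13B example above can be reproduced with a short sketch. The function name `overlap_area` and the tuple layout `(x, y, width, height)` are assumptions made for illustration, as is returning 0 when no overlapping rectangle exists; the text only states that the area is computed when the overlap exists. As in the formulas above, (x, y) is treated as the lower-left corner with y increasing downward.

```python
def overlap_area(bbox1, bbox2):
    """Overlap area of two boxes given as (x, y, width, height).

    (x, y) is the lower-left corner; the image origin is the upper left,
    so the top edge of a box is at y - height.
    """
    x1, y1, w1, h1 = bbox1
    x2, y2, w2, h2 = bbox2
    left = max(x1, x2)
    top = max(y1 - h1, y2 - h2)
    right = min(x1 + w1, x2 + w2)
    bottom = min(y1, y2)
    if left < right and top < bottom:
        return (right - left) * (bottom - top)
    return 0  # assumption: no overlapping rectangle is reported as area 0


face_candidate = (14, 34, 14, 18)  # from face candidate data 35
face_detected = (14, 28, 12, 13)   # from face detection data 32
area = overlap_area(face_candidate, face_detected)  # 144, as in the text
```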

図１４は、実施例に係るマッチングの一例を示す図である。図１４に示す破線の矩形が、体ＩＤが「１」を示す顔候補データ３５から得られるバウンディングボックスであるとする。 Figure 14 is a diagram showing an example of matching according to the embodiment. The dashed rectangle shown in Figure 14 is assumed to be a bounding box obtained from face candidate data 35 showing a body ID of "1".

マッチング部２７は、対象の顔候補データ３５から得られるバウンディングボックスと、顔検出部２２によって検出された全ての顔の顔検出データ３２から得られるバウンディングボックスとの重なりの面積を求める。ここでは、体ＩＤが「１」を示す顔候補のバウンディングボックスに重なる顔検出データ３２から得られるバウンディングボックスが表わされている。重なる顔検出データ３２の顔ＩＤは、「１」、「２」、「３」である。体ＩＤ「１」と、顔ＩＤ「１」、「２」、「３」との重なり面積は、「２０」、「３０」、「１０」であるとする。 The matching unit 27 finds the overlap area between the bounding box obtained from the target face candidate data 35 and the bounding boxes obtained from the face detection data 32 of all faces detected by the face detection unit 22. Here, the bounding boxes obtained from the face detection data 32 that overlap the bounding box of the face candidate indicating a body ID of "1" are shown. The overlapping face detection data 32 face IDs are "1", "2", and "3". The overlap areas between the body ID "1" and the face IDs "1", "2", and "3" are assumed to be "20", "30", and "10".

そして、マッチング部２７は、対象の顔候補データ３５の体ＩＤと、重なりの面積が最大の顔検出データ３２の顔ＩＤとをマッチングしたと判定する。ここでは、体ＩＤ「１」と顔ＩＤ「２」との重なりの面積が最大となっている。そこで、体ＩＤ「１」と顔ＩＤ「２」とがマッチングしたと判定される。そして、マッチング部２７は、体ＩＤ「１」と顔ＩＤ「２」とを紐付けて、マッチングリスト３６に出力する。 The matching unit 27 then determines that the body ID of the target face candidate data 35 is matched with the face ID of the face detection data 32 with the largest overlapping area. Here, the overlapping area between body ID "1" and face ID "2" is the largest. Therefore, it is determined that body ID "1" and face ID "2" are matched. The matching unit 27 then links body ID "1" and face ID "2" together and outputs them to the matching list 36.
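
Selecting the face ID with the largest overlap for one body ID can be sketched as follows. The function name `best_face_match` is hypothetical, and returning `None` when no face overlaps is an assumption, since the text does not describe that case.

```python
def best_face_match(overlaps):
    """Return the face ID with the largest overlap area, or None if nothing overlaps.

    overlaps: dict mapping face ID to its overlap area with one face candidate.
    """
    overlapping = {face_id: area for face_id, area in overlaps.items() if area > 0}
    if not overlapping:
        return None
    return max(overlapping, key=overlapping.get)


# Overlap areas for body ID 1 from the FIG. 14 example.
matched_face_id = best_face_match({1: 20, 2: 30, 3: 10})  # face ID 2
```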

図１５は、実施例に係る情報処理の全体のフローチャートの一例を示す図である。図１５に示すように、制御部２０は、複数の人物が写った映像を入力する（ステップＳ１１）。制御部２０は、入力した映像について、フレーム単位の画像で、以降の処理を実行する（ステップＳ１２）。 Fig. 15 is a diagram showing an example of an overall flowchart of information processing according to an embodiment. As shown in Fig. 15, the control unit 20 inputs a video image showing multiple people (step S11). The control unit 20 performs subsequent processing on the input video image on a frame-by-frame basis (step S12).

顔検出部２２は、画像データに対して顔を検出し、検出した顔ごとに、顔座標と顔範囲に対して顔ＩＤを割り振り、顔検出データ３２を生成する（ステップＳ１３）。そして、顔検出部２２は、マッチング処理をすべく、ステップＳ２０に移行する。 The face detection unit 22 detects faces in the image data, assigns a face ID to the face coordinates and face area for each detected face, and generates face detection data 32 (step S13). The face detection unit 22 then proceeds to step S20 to perform matching processing.

一方、人物検出部２３は、画像データに対して人物を検出し、検出した人物ごとに、人物の座標と人物の範囲に対して体ＩＤを割り振り、人物検出データ３３を生成する（ステップＳ１４）。 Meanwhile, the person detection unit 23 detects people from the image data, assigns a body ID to the coordinates and range of each detected person, and generates person detection data 33 (step S14).

続いて、姿勢推定部２４は、画像から、人物検出データ３３ごとに対応する画像（人物検出画像）を抽出する（ステップＳ１５）。そして、姿勢推定部２４は、抽出した全ての人物検出画像に対して姿勢推定し、姿勢推定して得られる骨格情報を推定する。姿勢推定部２４は、骨格情報に対して、基にした人物検出データ３３の体ＩＤを割り振り、姿勢推定データ３４を生成する（ステップＳ１６）。ここでいう骨格情報は、人の関節点の位置情報のことをいう。 Next, the posture estimation unit 24 extracts from the image an image (person detection image) corresponding to each person detection data 33 (step S15). Then, the posture estimation unit 24 performs posture estimation for all extracted person detection images, and estimates skeletal information obtained by the posture estimation. The posture estimation unit 24 assigns the body ID of the person detection data 33 on which it is based to the skeletal information, and generates posture estimation data 34 (step S16). Skeletal information here refers to position information of a person's joint points.

続いて、横向き判定部２５は、人体比率および姿勢推定データ３４を用いて、体が横向きの人がいるか否かを判定する（ステップＳ１７）。例えば、横向き判定部２５は、姿勢推定データ３４の骨格情報を用いて、肩幅の長さを算出し、肩幅の長さが胴体の長さの１／３よりも短いか否かを判定し、短いと判定した場合には、体が横向きと判定する。なお、胴体の長さは、骨格情報から算出される上腕、下腕、胴体、太もも、ふくらはぎの長さと人体比率とから算出される。 Next, the sideways orientation determination unit 25 uses the human body ratio and posture estimation data 34 to determine whether or not a person's body is sideways (step S17). For example, the sideways orientation determination unit 25 calculates the shoulder width using the skeletal information of the posture estimation data 34, determines whether or not the shoulder width is shorter than 1/3 of the torso length, and if it is determined to be shorter, determines that the body is sideways. The torso length is calculated from the lengths of the upper arm, lower arm, torso, thigh, and calf calculated from the skeletal information and the human body ratio.

体が横向きの人がいないと判定した場合には（ステップＳ１７；Ｎｏ）、横向き判定部２５は、顔候補を生成すべく、ステップＳ１９に移行する。一方、体が横向きの人がいると判定した場合には（ステップＳ１７；Ｙｅｓ）、横向き判定部２５は、体が横向きの人の姿勢推定データ３４に横向きフラグを対応付ける（ステップＳ１８）。そして、横向き判定部２５は、顔候補を生成すべく、ステップＳ１９に移行する。 If it is determined that there is no person with a sideways body (step S17; No), the sideways orientation determination unit 25 proceeds to step S19 to generate face candidates. On the other hand, if it is determined that there is a person with a sideways body (step S17; Yes), the sideways orientation determination unit 25 associates a sideways orientation flag with the posture estimation data 34 of the person with a sideways body (step S18). Then, the sideways orientation determination unit 25 proceeds to step S19 to generate face candidates.

ステップＳ１９において、顔候補抽出部２６は、人体比率と、横向きフラグの対応付けのない姿勢推定データ３４（骨格情報）から顔候補の座標と顔候補の範囲を算出して顔候補データ３５を生成する（ステップＳ１９）。例えば、顔候補抽出部２６は、姿勢推定データ３４の体ＩＤに対する骨格情報から左肩のｘ座標およびｙ座標、並びに右肩のｘ座標およびｙ座標を取得する。顔候補抽出部２６は、左肩のｘ座標と右肩のｘ座標の小さい方の値を顔候補の矩形のｘ座標とし、左肩のｙ座標と右肩のｙ座標の大きい方の値を顔候補の矩形のｙ座標とする。そして、顔候補抽出部２６は、肩幅の水平成分の長さを顔候補の矩形の横幅とする。顔候補抽出部２６は、人体比率を用いて、胴体の長さの５／９を顔候補の矩形の高さとする。胴体の長さは、横向き判定部２５によって算出される。そして、顔候補抽出部２６は、姿勢推定データ３４の体ＩＤごとに、顔候補の座標、横幅および高さを顔候補データ３５として生成する。 In step S19, the face candidate extraction unit 26 calculates the coordinates and range of each face candidate from the human body ratio and the posture estimation data 34 (skeletal information) that has no sideways orientation flag associated with it, and generates face candidate data 35 (step S19). For example, the face candidate extraction unit 26 acquires the x and y coordinates of the left shoulder and the x and y coordinates of the right shoulder from the skeletal information for the body ID of the posture estimation data 34. The face candidate extraction unit 26 sets the smaller of the x coordinates of the left and right shoulders as the x coordinate of the face candidate rectangle, and sets the larger of the y coordinates of the left and right shoulders as the y coordinate of the face candidate rectangle. Then, the face candidate extraction unit 26 sets the length of the horizontal component of the shoulder width as the width of the face candidate rectangle. Using the human body ratio, the face candidate extraction unit 26 sets 5/9 of the length of the torso as the height of the face candidate rectangle. The length of the torso is calculated by the sideways orientation determination unit 25. Then, for each body ID in the posture estimation data 34, the face candidate extraction unit 26 generates the coordinates, width, and height of the face candidate as face candidate data 35.

そして、マッチング部２７は、姿勢推定データ３４の体ＩＤごとに、マッチング処理を実行する（ステップＳ２０）。なお、マッチング処理のフローチャートは、後述する。そして、制御部２０は、未処理の画像が存在する場合には、ステップＳ１２に移行する（ステップＳ２１）。一方、制御部２０は、未処理の画像が存在しない場合には、情報処理を終了する。 Then, the matching unit 27 executes a matching process for each body ID in the posture estimation data 34 (step S20). Note that a flowchart of the matching process will be described later. Then, if there are unprocessed images, the control unit 20 proceeds to step S12 (step S21). On the other hand, if there are no unprocessed images, the control unit 20 ends the information processing.

図１６は、実施例に係るマッチング処理のフローチャートの一例を示す図である。図１６に示すように、マッチング部２７は、体ＩＤ単位で処理を実行する（ステップＳ３１）。マッチング部２７は、体ＩＤに対応する姿勢推定データ３４に横向きフラグが対応付けられているか否かを判定する（ステップＳ３２）。体ＩＤに対応する姿勢推定データ３４に横向きフラグが対応付けられていると判定した場合には（ステップＳ３２；Ｙｅｓ）、マッチング部２７は、次の体ＩＤに対応する処理を実行すべく、ステップＳ３７に移行する。 FIG. 16 is a diagram showing an example of a flowchart of the matching process according to the embodiment. As shown in FIG. 16, the matching unit 27 executes the process on a body ID basis (step S31). The matching unit 27 determines whether or not a sideways orientation flag is associated with the posture estimation data 34 corresponding to the body ID (step S32). If it is determined that a sideways orientation flag is associated with the posture estimation data 34 corresponding to the body ID (step S32; Yes), the matching unit 27 proceeds to step S37 to execute the process for the next body ID.

一方、体ＩＤに対応する姿勢推定データ３４に横向きフラグが対応付けられていないと判定した場合には（ステップＳ３２；Ｎｏ）、マッチング部２７は、顔ＩＤ単位で処理を実行する（ステップＳ３３）。マッチング部２７は、体ＩＤと顔ＩＤのそれぞれのバウンディングボックス（座標と範囲）の重なり面積を算出する（ステップＳ３４）。マッチング部２７は、未処理の顔ＩＤが存在する場合には、ステップＳ３３に移行する（ステップＳ３５）。一方、マッチング部２７は、未処理の顔ＩＤが存在しない場合には、体ＩＤと顔ＩＤとをマッチングすべく、ステップＳ３６に移行する。 On the other hand, if it is determined that no sideways orientation flag is associated with the posture estimation data 34 corresponding to the body ID (step S32; No), the matching unit 27 executes processing on a face ID basis (step S33). The matching unit 27 calculates the overlapping area of the bounding boxes (coordinates and range) of the body ID and the face ID (step S34). If there is an unprocessed face ID, the matching unit 27 returns to step S33 (step S35). On the other hand, if there is no unprocessed face ID, the matching unit 27 proceeds to step S36 to match the body ID and the face ID.

ステップＳ３６において、マッチング部２７は、体ＩＤと顔ＩＤとのバウンディングボックス（座標と範囲）の重なり面積が最大のものをペアとしてマッチングする（ステップＳ３６）。そして、マッチング部２７は、未処理の体ＩＤが存在する場合には、ステップＳ３１に移行する（ステップＳ３７）。 In step S36, the matching unit 27 matches the body ID and face ID with the largest overlapping area of the bounding boxes (coordinates and range) as a pair (step S36). If there is an unprocessed body ID, the matching unit 27 proceeds to step S31 (step S37).

一方、マッチング部２７は、未処理の体ＩＤが存在しない場合には、マッチング部２７は、マッチング結果をマッチングリスト３６に出力する（ステップＳ３８）。そして、マッチング部２７は、マッチング処理を終了する。 On the other hand, if there is no unprocessed body ID, the matching unit 27 outputs the matching result to the matching list 36 (step S38). Then, the matching unit 27 ends the matching process.
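
The flow of steps S31 to S38 can be sketched as two nested loops. This is an illustration only: the names `match_bodies_to_faces`, `sideways_flags`, `face_candidates`, and `face_detections` are hypothetical, boxes are assumed to be `(x, y, width, height)` tuples with a lower-left corner, and skipping a body ID that overlaps no face is an assumption not spelled out in the text. The sketch also tracks the running maximum inside the face loop instead of collecting all areas before step S36.

```python
def overlap_area(bbox1, bbox2):
    """Overlap area of two (x, y, width, height) boxes; 0 if they do not overlap."""
    x1, y1, w1, h1 = bbox1
    x2, y2, w2, h2 = bbox2
    left, right = max(x1, x2), min(x1 + w1, x2 + w2)
    top, bottom = max(y1 - h1, y2 - h2), min(y1, y2)
    if left < right and top < bottom:
        return (right - left) * (bottom - top)
    return 0


def match_bodies_to_faces(sideways_flags, face_candidates, face_detections):
    """Return a list of (body_id, face_id) pairs.

    sideways_flags: dict body ID -> True if the sideways orientation flag is set.
    face_candidates: dict body ID -> face candidate box.
    face_detections: dict face ID -> detected face box.
    """
    matching_list = []
    for body_id, sideways in sideways_flags.items():        # S31: per body ID
        if sideways:                                         # S32: flagged bodies are skipped
            continue
        best_face_id, best_area = None, 0
        for face_id, face_box in face_detections.items():   # S33: per face ID
            area = overlap_area(face_candidates[body_id], face_box)  # S34
            if area > best_area:
                best_face_id, best_area = face_id, area
        if best_face_id is not None:                         # S36: largest overlap wins
            matching_list.append((body_id, best_face_id))
    return matching_list                                     # S38: output the matching result


pairs = match_bodies_to_faces(
    sideways_flags={1: False, 2: True},
    face_candidates={1: (14, 34, 14, 18)},
    face_detections={1: (14, 28, 12, 13), 2: (100, 100, 5, 5)},
)
```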

なお、実施例に係る情報処理装置１によって実行されるマッチング処理は、用途として映像配信サービスに適用することができる。例えば、映像配信サービスは、複数の人物が写った映像からそれぞれの顔と体（姿勢）とをマッチングする。加えて、映像配信サービスは、顔の表情が変わったときや体の動きが変わったときを人の感情が動いた瞬間の特徴と捉えて、その瞬間の画像やその瞬間の前後の映像をユーザに配信する。一例として、ボウリングでストライクをとって皆でハイタッチしているような部分の映像が挙げられる。また、映像配信サービスは、対象の人をハイライトして、その瞬間の画像やその瞬間の前後の映像をユーザに配信しても良い。 The matching process executed by the information processing device 1 according to the embodiment can be applied to a video distribution service. For example, the video distribution service matches the face and body (posture) of each of the multiple people shown in a video. In addition, the video distribution service treats a change in facial expression or body movement as a sign of a moment when a person's emotions were stirred, and distributes an image of that moment, or video from just before and after it, to the user. One example is the part of a video in which someone bowls a strike and everyone exchanges high-fives. The video distribution service may also highlight the target person when distributing the image of that moment or the video from just before and after it.

［実施例の効果］
このようにして、上記実施例では、情報処理装置１は、複数の人物を含む撮像画像からそれぞれの人物の顔を検出する。情報処理装置１は、撮像画像から人物を検出する。情報処理装置１は、検出された複数の人物に対応する人物検出データごとに、人物検出データに基づいて、撮像画像から姿勢を推定する。情報処理装置１は、推定された姿勢に対する姿勢データに基づく骨格情報から、顔候補データを算出する。情報処理装置１は、人物検出データごとに算出される顔候補データに対して、検出されたそれぞれの人物の顔検出データの中で重なり面積が最も大きい顔検出データを同一人物として特定する。かかる構成によれば、情報処理装置１は、顔検出から検出された顔と姿勢推定から推定された姿勢（体）とのマッチング精度を向上させることができる。すなわち、情報処理装置１は、人が密集しているような画像上でも、顔と体とを精度良くマッチングすることができる。 [Effects of the embodiment]
In this way, in the above embodiment, the information processing device 1 detects the faces of each person from a captured image including multiple people. The information processing device 1 detects people from a captured image. For each person detection data corresponding to the multiple detected people, the information processing device 1 estimates a posture from the captured image based on the person detection data. The information processing device 1 calculates face candidate data from skeletal information based on posture data for the estimated posture. For the face candidate data calculated for each person detection data, the information processing device 1 identifies face detection data with the largest overlap area among the face detection data of each detected person as the same person. With this configuration, the information processing device 1 can improve the matching accuracy between a face detected from face detection and a posture (body) estimated from posture estimation. That is, the information processing device 1 can match a face and a body with high accuracy even on an image where people are crowded together.

また、上記実施例では、情報処理装置１は、人体全体の中の部位の比率を示す人体比率および姿勢データから得られる骨格情報に基づいて、顔候補データを算出する。かかる構成によれば、情報処理装置１は、人体比率を用いることで、姿勢データから顔の位置の候補を算出することができる。 In addition, in the above embodiment, the information processing device 1 calculates face candidate data based on human body ratios that indicate the ratios of parts of the entire human body and skeletal information obtained from posture data. With this configuration, the information processing device 1 can calculate candidates for the position of the face from the posture data by using the human body ratios.

また、上記実施例では、情報処理装置１は、人体比率を用いて、骨格情報から得られる所定の部位の長さから胴体の長さを推定する。情報処理装置１は、骨格情報から得られる右肩と左肩の位置と、肩幅の水平成分の長さおよび胴体の長さに基づいて顔候補データを算出する。かかる構成によれば、情報処理装置１は、肩の位置と胴体の長さを用いることで、顔の位置の候補を算出することができる。 In addition, in the above embodiment, the information processing device 1 uses human body proportions to estimate the length of the torso from the length of a specific part obtained from the skeletal information. The information processing device 1 calculates face candidate data based on the positions of the right and left shoulders obtained from the skeletal information, the length of the horizontal component of the shoulder width, and the length of the torso. With this configuration, the information processing device 1 can calculate candidates for the position of the face by using the position of the shoulders and the length of the torso.

また、上記実施例では、情報処理装置１は、さらに、肩幅の長さおよび胴体の長さを用いて体が横向きであるか否かを判定する。情報処理装置１は、体が横向きであると判定した場合には、骨格情報から顔候補データを算出しない。かかる構成によれば、情報処理装置１は、人体比率を用いることで、体が横向きであるか否かを判定することができる。そして、情報処理装置１は、体が横向きである骨格情報を顔候補の算出から除外することで、精度の悪い顔候補の算出を避けることができる。 In the above embodiment, the information processing device 1 further determines whether the body is facing sideways using the shoulder width length and the torso length. If the information processing device 1 determines that the body is facing sideways, it does not calculate face candidate data from skeletal information. With this configuration, the information processing device 1 can determine whether the body is facing sideways by using human body proportions. Then, the information processing device 1 can avoid calculating face candidates with low accuracy by excluding skeletal information in which the body is facing sideways from the calculation of face candidates.

［その他］
なお、実施例では、顔検出データ３２、人物検出データ３３、姿勢推定データ３４および顔候補データ３５の座標について、原点を撮像画像の左上とした場合を説明した。しかしながら、原点の位置を撮像画像の左下とした場合であっても良い。かかる場合には、例えば、顔候補抽出部２６は、左肩のｘ座標と右肩のｘ座標の小さい方の値を顔候補の矩形のｘ座標とし、左肩のｙ座標と右肩のｙ座標の小さい方の値を顔候補の矩形のｙ座標とすれば良い。 [Others]
In the embodiment, the coordinates of the face detection data 32, the person detection data 33, the posture estimation data 34, and the face candidate data 35 are described with the origin at the upper left of the captured image. However, the origin may instead be located at the lower left of the captured image. In that case, for example, the face candidate extraction unit 26 may set the smaller of the x coordinates of the left and right shoulders as the x coordinate of the face candidate rectangle, and set the smaller of the y coordinates of the left and right shoulders as the y coordinate of the face candidate rectangle.
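
The effect of the origin choice on the face candidate corner can be sketched as below. The function name `face_candidate_corner` and the `origin` parameter are hypothetical. The sketch covers only the corner selection stated in the text; other formulas that depend on the direction of the y axis would also need adjusting and are not shown.

```python
def face_candidate_corner(right_shoulder, left_shoulder, origin="upper_left"):
    """Return the (x, y) corner of the face candidate rectangle.

    origin: "upper_left" (y grows downward) or "lower_left" (y grows upward).
    In both cases the lower of the two shoulders supplies the y coordinate.
    """
    x = min(right_shoulder[0], left_shoulder[0])
    if origin == "upper_left":
        y = max(right_shoulder[1], left_shoulder[1])
    elif origin == "lower_left":
        y = min(right_shoulder[1], left_shoulder[1])
    else:
        raise ValueError(f"unsupported origin: {origin}")
    return x, y
```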

また、図示した情報処理装置１の各構成要素は、必ずしも物理的に図示の如く構成されていることを要しない。すなわち、各装置の分散・統合の具体的態様は図示のものに限られず、その全部または一部を、各種の負荷や使用状況などに応じて、任意の単位で機能的または物理的に分散・統合して構成することができる。例えば、横向き判定部２５と顔候補抽出部２６とを１つの部として統合しても良い。また、横向き判定部２５を胴体の長さを決定する決定部と、胴体の長さを用いて横向きを判定する判定部とに分散しても良い。また、画像ストレージ３１、顔検出データ３２などを記憶する記憶部３０を情報処理装置１の外部装置としてネットワーク経由で接続するようにしても良い。さらに、制御部２０および記憶部３０をクラウド上に置き、利用者は、パーソナルコンピュータなどからネットワークを介して当該クラウドに接続して、制御部２０により実行される処理を利用することも可能である。このようにすることで、本発明の処理をサービスとして提供することができるようになる。 In addition, each component of the illustrated information processing device 1 does not necessarily have to be physically configured as illustrated. In other words, the specific manner in which the devices are distributed and integrated is not limited to that shown in the figure, and all or part of them can be functionally or physically distributed or integrated in any unit depending on various loads and usage conditions. For example, the sideways orientation determination unit 25 and the face candidate extraction unit 26 may be integrated into one unit. The sideways orientation determination unit 25 may also be divided into a decision unit that determines the length of the torso and a determination unit that determines sideways orientation using the length of the torso. In addition, the storage unit 30, which stores the image storage 31, the face detection data 32, and so on, may be connected via a network as an external device of the information processing device 1. Furthermore, the control unit 20 and the storage unit 30 may be placed on a cloud, and a user may connect to the cloud via a network from a personal computer or the like to use the processing executed by the control unit 20. In this way, the processing of the present invention can be provided as a service.

The various processes described in the above embodiment can be realized by executing a program prepared in advance on a computer such as a personal computer or a workstation. An example of a computer that executes a matching program realizing the same functions as the information processing device 1 shown in FIG. 1 is therefore described below. FIG. 17 is a diagram showing an example of a computer that executes the matching program.

As shown in FIG. 17, the computer 200 has a CPU 203 that executes various arithmetic processes, an input device 215 that accepts data input from a user, and a display device 209. The computer 200 also has a drive device 213 that reads programs and the like from a storage medium, and a communication I/F (Interface) 217 that exchanges data with other computers via a network. The computer 200 further has a memory 201 that temporarily stores various information, and an HDD (Hard Disk Drive) 205. The memory 201, the CPU 203, the HDD 205, the display device 209, the drive device 213, the input device 215, and the communication I/F 217 are connected by a bus 219.

The drive device 213 is, for example, a device for a removable disk 211. The HDD 205 stores a matching program 205a and matching process related information 205b. The communication I/F 217 serves as the interface between the network and the inside of the device, and controls the input and output of data to and from other computers. For example, a modem or a LAN adapter can be used as the communication I/F 217.

The display device 209 displays data such as a cursor, icons, and toolboxes, as well as documents, images, and function information. For example, a liquid crystal display or an organic EL (Electroluminescence) display can be used as the display device 209.

The CPU 203 reads the matching program 205a, loads it into the memory 201, and executes it as a process. This process corresponds to the functional units of the information processing device 1. The matching process related information 205b corresponds to the image storage 31, the face detection data 32, and so on. For example, the removable disk 211 stores information such as the matching program 205a.

Note that the matching program 205a does not necessarily have to be stored in the HDD 205 from the beginning. For example, the program may be stored on a "portable physical medium" such as a flexible disk (FD), CD-ROM, DVD, magneto-optical disk, or IC card that is inserted into the computer 200. The computer 200 may then read the matching program 205a from such a medium and execute it.

REFERENCE SIGNS LIST
1 Information processing device
3 Camera
5 Network
9 Information processing system
10 Communication unit
20 Control unit
21 Saving unit
22 Face detection unit
23 Person detection unit
24 Posture estimation unit
25 Sideways determination unit
26 Face candidate extraction unit
27 Matching unit
30 Storage unit
31 Image storage
32 Face detection data
33 Person detection data
34 Posture estimation data
35 Face candidate data
36 Matching list

Claims

An information processing device comprising:
a first detection unit that detects the face of each person from a captured image including a plurality of people;
a second detection unit that detects each person from the captured image;
a posture estimation unit that estimates, for each item of person detection data corresponding to each person detected by the second detection unit, a posture corresponding to the person detection data from the captured image;
a first calculation unit that uses a human body ratio, indicating the ratio of a part to the entire human body, to estimate the length of a torso according to the length of a predetermined part obtained from skeletal information based on the posture estimated by the posture estimation unit, calculates a shoulder width using the skeletal information, and calculates face candidate data from the skeletal information based on posture data for the posture estimated by the posture estimation unit;
a determination unit that determines whether or not the body is turned sideways using the shoulder width and the length of the torso;
a second calculation unit that, when it is determined that the body is not turned sideways, calculates the face candidate data based on the positions of the right and left shoulders obtained from the skeletal information, the length of the horizontal component of the shoulder width, and the length of the torso, and, when it is determined that the body is turned sideways, does not calculate the face candidate data from the skeletal information; and
an identification unit that identifies, for the face candidate data calculated for each item of person detection data, the face detection data having the largest overlapping area among the face detection data of the persons detected by the first detection unit as belonging to the same person.

A matching program that causes a computer to execute processing comprising:
detecting the face of each person from a captured image containing multiple people;
detecting each person from the captured image;
estimating, for each item of person detection data corresponding to each detected person, a posture corresponding to the person detection data from the captured image;
estimating, using a human body ratio indicating the ratio of a part to the entire human body, the length of a torso according to the length of a predetermined part obtained from skeletal information based on the estimated posture, calculating a shoulder width using the skeletal information, and calculating face candidate data from the skeletal information based on posture data for the estimated posture;
determining whether or not the body is turned sideways using the shoulder width and the length of the torso;
when it is determined that the body is not turned sideways, calculating the face candidate data based on the positions of the right and left shoulders obtained from the skeletal information, the length of the horizontal component of the shoulder width, and the length of the torso, and, when it is determined that the body is turned sideways, not calculating the face candidate data from the skeletal information; and
identifying, for the face candidate data calculated for each item of person detection data, the face detection data having the largest overlapping area among the face detection data of the detected persons as belonging to the same person.

A matching method in which a computer executes processing comprising:
detecting the face of each person from a captured image containing multiple people;
detecting each person from the captured image;
estimating, for each item of person detection data corresponding to each detected person, a posture corresponding to the person detection data from the captured image;
estimating, using a human body ratio indicating the ratio of a part to the entire human body, the length of a torso according to the length of a predetermined part obtained from skeletal information based on the estimated posture, calculating a shoulder width using the skeletal information, and calculating face candidate data from the skeletal information based on posture data for the estimated posture;
determining whether or not the body is turned sideways using the shoulder width and the length of the torso;
when it is determined that the body is not turned sideways, calculating the face candidate data based on the positions of the right and left shoulders obtained from the skeletal information, the length of the horizontal component of the shoulder width, and the length of the torso, and, when it is determined that the body is turned sideways, not calculating the face candidate data from the skeletal information; and
identifying, for the face candidate data calculated for each item of person detection data, the face detection data having the largest overlapping area among the face detection data of the detected persons as belonging to the same person.
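
The identification step recited above, which associates each face candidate with the detected face having the largest overlapping area, can be illustrated with a short sketch. The following is only one possible reading under stated assumptions: the rectangle representation `(x, y, width, height)` with a top-left origin, the names `overlap_area` and `match_faces`, and the use of `None` for a person whose face candidate was not calculated (for example, a person determined to be turned sideways) are all hypothetical and are not taken from the embodiment.

```python
from typing import List, Optional, Tuple

# Hypothetical rectangle format: (x, y, width, height), origin at the top left.
Rect = Tuple[float, float, float, float]


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two axis-aligned rectangles (0.0 if disjoint)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    width = min(ax + aw, bx + bw) - max(ax, bx)
    height = min(ay + ah, by + bh) - max(ay, by)
    return max(width, 0.0) * max(height, 0.0)


def match_faces(
    face_candidates: List[Optional[Rect]],
    face_detections: List[Rect],
) -> List[Optional[int]]:
    """For each face candidate, return the index of the detected face with the
    largest overlapping area, or None when there is no candidate or no overlap."""
    matches: List[Optional[int]] = []
    for candidate in face_candidates:
        best_index: Optional[int] = None
        best_area = 0.0
        if candidate is not None:
            for index, face in enumerate(face_detections):
                area = overlap_area(candidate, face)
                if area > best_area:
                    best_index, best_area = index, area
        matches.append(best_index)
    return matches
```

In this sketch a candidate that overlaps no detected face is left unmatched rather than being forced onto the nearest face; how such cases are handled in practice is a design choice that the claims do not fix.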