JP6989485B2

JP6989485B2 - Multi-label data learning support device, multi-label data learning support method and multi-label data learning support program

Info

Publication number: JP6989485B2
Application number: JP2018239239A
Authority: JP
Inventors: 博章三沢; 博基古川; 一則和久井
Original assignee: Hitachi Industry and Control Solutions Co Ltd
Current assignee: Hitachi Industry and Control Solutions Co Ltd
Priority date: 2018-12-21
Filing date: 2018-12-21
Publication date: 2022-01-05
Anticipated expiration: 2038-12-21
Also published as: JP2020101968A

Description

The present invention relates to a multi-label data learning support device, a multi-label data learning support method, and a multi-label data learning support program.

In recent years, systems and services that use machine learning, and deep learning in particular, have been increasing in the field of image recognition. Deep learning (hereinafter "DL") includes supervised learning, in which training image data is given correct-answer labels, and unsupervised learning, in which training is performed without correct-answer labels. Supervised learning is often used when DL is applied to, for example, image classification problems that recognize what the object in an input image is, failure diagnosis that judges whether an object in an image is normal or abnormal, and regression problems such as estimating age from an image of a person.

In supervised learning, the more image data is used for training, the better the performance (recognition accuracy, generalization performance, and so on) of the resulting classifier generally becomes. However, supervised learning has the problem that attaching correct-answer labels to image data (annotation) requires a very large number of man-hours.

To address this problem, the technique described in Patent Document 1 proposes a system that uses a method called active learning, in which data with a high learning effect is extracted from unlabeled image data and presented to an annotation worker (hereinafter "annotator").

Specifically, a classifier is first generated from a small amount (for example, tens to hundreds) of labeled images, and a large amount (for example, thousands to tens of thousands) of unlabeled images is input to the generated classifier for recognition processing. Next, image data with a high learning effect is extracted by a calculation based on the output values of the recognition processing (for example, the entropy of the output values is calculated and image data with large entropy is extracted). Here, image data with a high learning effect means image data for which the result estimated by the classifier is likely to be wrong. The annotator then assigns correct labels to that image data and the classifier is generated again, so that the performance of the classifier is raised step by step. This method is called active learning.
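The entropy-based extraction mentioned above can be sketched as follows. This is a minimal illustration only, assuming that the classifier output for each image is a probability distribution over classes; the function names, the file names, and the choice of natural logarithm are assumptions made here and are not taken from Patent Document 1.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_uncertain(outputs, k):
    """Return the k image ids whose classifier output has the largest entropy."""
    ranked = sorted(outputs, key=lambda image_id: entropy(outputs[image_id]),
                    reverse=True)
    return ranked[:k]

# Hypothetical classifier outputs for three unlabeled images.
outputs = {
    "img_a.jpg": [0.98, 0.01, 0.01],  # confident prediction, low entropy
    "img_b.jpg": [0.40, 0.35, 0.25],  # uncertain prediction, high entropy
    "img_c.jpg": [0.70, 0.20, 0.10],
}
picked = select_uncertain(outputs, 2)
```

The images in `picked` are the ones that would be shown to the annotator first, since a flat output distribution suggests that the classifier's estimate is more likely to be wrong.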

Patent Document 1: JP-A-2017-167834

The technique described in Patent Document 1 can be expected to make annotation more efficient for single-label data (one correct-answer label per image). However, for multi-label data, in which multiple correct-answer labels are assigned to one image (for example, attaching age, gender, and clothing labels to an image of a person), the annotation man-hours grow in proportion to the number of labels compared with the single-label case. The burden on the annotator who assigns the correct-answer labels therefore also increases.

The present invention has been made in view of the above problem, and its object is to provide a multi-label data learning support device, a multi-label data learning support method, and a multi-label data learning support program that can reduce the amount of annotation work on image data having multi-labels.

To achieve the above object, the multi-label data learning support device of the present invention includes: a storage unit that holds a labeled image data DB (database) storing labeled image data, an unlabeled image data DB storing unlabeled image data, and a label-related information DB storing label-related information, which indicates the relationships among multiple labels and the tendency of the content of the multi-labels attached to image data; a learning unit that acquires the labeled image data and generates a classifier by machine learning; an inference unit that uses the generated classifier to infer, for unlabeled image data, each label constituting the multi-label; an annotation image selection unit that compares the inference result for each label of the unlabeled image data with the relationship between labels indicated by the label-related information and, when they differ, selects that unlabeled image data as an annotation target image requiring annotation; and an annotation processing unit that displays the selected annotation target image on a display device and accepts input of the correct-answer labels for that image data.

According to the present invention, it is possible to provide a multi-label data learning support device, a multi-label data learning support method, and a multi-label data learning support program that reduce the amount of annotation work on image data having multi-labels.

FIG. 1 is a block diagram showing the configuration of the multi-label data learning support device according to the present embodiment.
FIG. 2 is a diagram showing a data configuration example of the labeled image data information stored in the labeled image data DB according to the present embodiment.
FIG. 3 is a diagram showing a data configuration example of the unlabeled image data information (before inference) stored in the unlabeled image data DB according to the present embodiment.
FIG. 4 is a diagram showing a data configuration example of the unlabeled image data information (after inference) stored in the unlabeled image data DB according to the present embodiment.
FIG. 5 is a diagram showing a data configuration example of the label-related information stored in the label-related information DB according to the present embodiment.
FIG. 6 is a flowchart showing the flow of the overall processing executed by the multi-label data learning support device according to the present embodiment.
FIG. 7 is a flowchart showing the flow of the annotation target image selection processing executed by the annotation image selection unit of the multi-label data learning support device according to the present embodiment.
FIG. 8 is a diagram showing a data configuration example of the unlabeled image data information (after the annotation candidate images are determined) stored in the unlabeled image data DB according to the present embodiment.
FIG. 9 is a diagram illustrating the label-related information definition UI (User Interface) according to the present embodiment.
FIG. 10 is a diagram illustrating the label-related information history confirmation UI according to the present embodiment.
FIG. 11 is a diagram illustrating the label-related information definition UI according to the present embodiment.

Hereinafter, an embodiment of the present invention (hereinafter "the present embodiment") will be described in detail with reference to the drawings.

The multi-label data learning support device 1 according to the present embodiment (see FIG. 1, described later) defines the "relationship between labels" of the multi-label, which differs for each image shooting location, as "label-related information". By using this label-related information when extracting data with a high learning effect for active learning, the device reduces the amount of annotation work on image data having multi-labels.
The "relationship between labels" in the present embodiment means the "tendency" of the content of the multi-labels attached to the image data captured at a given shooting location. Taking as an example images of people with three kinds of labels (an age label, a gender label, and a clothing label), the content of the multi-labels may show a tendency that depends on the shooting location, such as "at this location, a man (gender label) in his 20s (age label) is likely to be wearing a suit (clothing label)" or "at this location, a person in their 30s (age label) wearing a skirt (clothing label) is likely to be a woman (gender label)". This tendency is defined in advance as "label-related information".

In the multi-label data learning support device 1, the above label-related information is defined for each shooting location. Suppose that the classifier, when inferring on unlabeled image data, outputs "age: 20s, gender: male, clothing: work clothes". This inference differs from the defined tendency that "a man in his 20s is likely to be wearing a suit", so the inference for this unlabeled image data is likely to be wrong. The device extracts the images for which the classifier's inference is judged likely to be wrong, and annotation is performed on that multi-label image data. In this way, the multi-label data learning support device 1 performs active learning on images having multi-labels and raises the performance of the classifier efficiently, which makes it possible to reduce the work man-hours.
Hereinafter, the multi-label data learning support device 1 according to the present embodiment will be described in detail.

<Multi-label data learning support device>
FIG. 1 is a block diagram showing the configuration of the multi-label data learning support device 1 according to the present embodiment.
As shown in FIG. 1, the multi-label data learning support device 1 is a computer including a control unit 10, an input unit 20, an output unit 30, and a storage unit 40.

The input unit 20 has a function of accepting information input via a communication interface (not shown) for sending and receiving information over a network or the like, or via an input device (not shown) such as a touch panel or keyboard. Input to the input unit 20 may be realized by a user interface (UI) that the annotator 9 can operate visually, or through a medium such as a CSV file.

The input unit 20 includes an annotation processing unit 21, an image data reading unit 22, and a label-related information definition unit 23.
The annotation processing unit 21 generates labeled image data by accepting the correct-answer label information that the annotator 9 or the like has assigned to unlabeled image data. The annotation processing unit 21 then stores the generated labeled image data in the labeled image data DB (database) 41, described later, in the storage unit 40.

The image data reading unit 22 accepts input of labeled image data and unlabeled image data. On receiving labeled image data, the image data reading unit 22 generates one record of the labeled image data information 410 (see FIG. 2), described later, and stores the labeled image data in the labeled image data DB 41 in the storage unit 40. On receiving unlabeled image data, it generates one record of the unlabeled image data information 420 (see FIG. 3), described later, and stores the unlabeled image data in the unlabeled image data DB 42 in the storage unit 40.
The image data reading unit 22 may accept labeled and unlabeled image data one image at a time through operation by the annotator 9, or may accept in a single batch all image data contained in a specific directory.
The image data reading unit 22 also has a function of determining, when it accepts image data, whether that image data is labeled. If the image data is labeled, the image data reading unit 22 stores it in the labeled image data DB 41 as labeled image data. If it is not labeled, the image data reading unit 22 stores it in the unlabeled image data DB 42 as unlabeled image data.
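The routing performed by the image data reading unit 22 might look like the following sketch. The record layout, the field names, the path "A_101.jpg", and the rule that a record counts as labeled only when every label field is filled in are all assumptions made for illustration; the description does not state how the presence of labels is detected.

```python
# Hypothetical field names for the four labels used in the examples.
LABEL_FIELDS = ("gender", "age_group", "clothing_upper", "clothing_lower")

def is_labeled(record):
    """Assumption: a record is labeled only if every label field has a value."""
    return all(record.get(field) for field in LABEL_FIELDS)

def read_image_data(records):
    """Route incoming records to the labeled or the unlabeled store."""
    labeled_db, unlabeled_db = [], []
    for record in records:
        target = labeled_db if is_labeled(record) else unlabeled_db
        target.append(record)
    return labeled_db, unlabeled_db

incoming = [
    {"location": "Factory A", "path": "A_001.jpg", "gender": "male",
     "age_group": "30s", "clothing_upper": "work clothes",
     "clothing_lower": "work clothes"},
    {"location": "Factory A", "path": "A_101.jpg"},  # no labels attached
]
labeled_db, unlabeled_db = read_image_data(incoming)
```

Plain lists stand in for the labeled image data DB 41 and the unlabeled image data DB 42 here; a real implementation would persist the records and the image files.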

The label-related information definition unit 23 accepts, from the annotator 9 or the like, input of label-related information, that is, the definition of the relationships between labels. This information indicates the relationship between labels and the "tendency" of the content of the multi-labels attached to the captured image data. The label-related information definition unit 23 stores the accepted label-related information in the label-related information DB 43 in the storage unit 40. The details of the label-related information are described later with reference to FIG. 5. The user interface (UI) that the annotator 9 or the like uses to enter label-related information is described with reference to FIGS. 9 to 11.

The output unit 30 has the function of a communication interface (not shown) for sending and receiving information over a network or the like, and of an interface that outputs information to a display device (not shown) such as a monitor.
The output unit 30 causes the display device to display the results obtained through the functions of the input unit 20 and the control unit 10: the accuracy evaluation result of the classifier 111, the image data selected from the unlabeled image data as images to be annotated (annotation target images), the screen of the label-related information definition user interface 200 (FIGS. 9 and 11), the screen of the label-related information history confirmation user interface 250 (FIG. 10), and so on, described later.

The storage unit 40 is composed of a hard disk, flash memory, RAM (Random Access Memory), and the like. The storage unit 40 stores the labeled image data DB 41, the unlabeled image data DB 42, the label-related information DB 43, and other information needed for the processing executed by the multi-label data learning support device 1.

FIG. 2 is a diagram showing a data configuration example of the labeled image data information 410 stored in the labeled image data DB 41 according to the present embodiment.
The labeled image data DB 41 stores the labeled image data itself (the captured images) and, for each image (one record per image), information related to that image as the labeled image data information 410, for example in table form (see FIG. 2).

As shown in FIG. 2, the labeled image data information 410 stores the following data items: shooting location 411, captured image storage path 412, and the multi-label for the captured image (here, as an example, the labels gender 413, age group 414, clothing (upper body) 415, and clothing (lower body) 416).
For example, as shown in the first row (first record) of the labeled image data information 410 in FIG. 2, "Factory A" is stored as the shooting location 411 of the image data, and "A_001.jpg" is stored as the captured image storage path 412, which indicates where the image data is saved. As the label information of the multi-label attached to the image data, gender 413 is stored as "male", age group 414 as "30s", clothing (upper body) 415 as "work clothes", and clothing (lower body) 416 as "work clothes".

FIG. 3 is a diagram showing a data configuration example of the unlabeled image data information (before inference) 420 (420A) stored in the unlabeled image data DB 42 according to the present embodiment.
The unlabeled image data DB 42 stores the unlabeled image data itself (the captured images) together with information related to that image data as the unlabeled image data information 420, for example in table form (see FIG. 3).
The stored content of the data items in the unlabeled image data information 420 changes as the processing of the multi-label data learning support device 1 proceeds. FIG. 3 shows the state at the stage where the image data reading unit 22 has acquired the unlabeled image data and stored it in the storage unit 40, before the inference processing by the control unit 10 (inference unit 13), described later. In FIG. 3, the unlabeled image data information in this pre-inference state is denoted by reference numeral 420A.

As shown in FIG. 3, the unlabeled image data information (before inference) 420 (420A) stores the following data items: shooting location 421, captured image storage path 422, the multi-label for the captured image (the labels gender 423, age group 424, clothing (upper body) 425, and clothing (lower body) 426), annotation candidate 427, and relevance 428.
The shooting location 421, the captured image storage path 422, and the multi-label (gender 423 through clothing (lower body) 426) have the same meaning as in the labeled image data information 410 shown in FIG. 2. However, the field of each label in the multi-label is blank ("-"). The annotation candidate 427 and the relevance 428 are data items needed when selecting, from the unlabeled image data, the images on which annotation is to be performed (the "annotation target images" described later); the details are described later.
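One possible in-memory shape for a record of the unlabeled image data information 420 is sketched below. The class name, the field names, the path "A_101.jpg", and the types chosen for the annotation candidate and relevance items are assumptions; the description only names the data items and says that the label fields are blank before inference.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class UnlabeledImageRecord:
    location: str                                 # shooting location 421
    path: str                                     # captured image storage path 422
    gender: Optional[str] = None                  # gender 423
    age_group: Optional[str] = None               # age group 424
    clothing_upper: Optional[str] = None          # clothing (upper body) 425
    clothing_lower: Optional[str] = None          # clothing (lower body) 426
    annotation_candidate: Optional[bool] = None   # item 427, type assumed
    relevance: Optional[float] = None             # item 428, type assumed

    def labels(self):
        """Return the four label fields as a dictionary."""
        return {
            "gender": self.gender,
            "age_group": self.age_group,
            "clothing_upper": self.clothing_upper,
            "clothing_lower": self.clothing_lower,
        }

    def is_pre_inference(self):
        """True while every label field is still blank (the 420A state)."""
        return all(value is None for value in self.labels().values())

record = UnlabeledImageRecord(location="Factory A", path="A_101.jpg")
```

Using `None` for the blank fields keeps the pre-inference and post-inference states in the same record type, which mirrors how the same table is reused across FIGS. 3 and 4.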

Returning to FIG. 1, the control unit 10 controls the multi-label data learning support device 1 as a whole and includes a learning unit 11, an evaluation unit 12, an inference unit 13, and an annotation image selection unit 14.

The learning unit 11 generates the classifier 111 from labeled training data by machine learning such as DL. Specifically, the learning unit 11 selects a predetermined amount of image data (for example, 90% of the image data) from the labeled image data stored in the labeled image data DB 41 and generates the classifier 111 by machine learning. The learning unit 11 selects this predetermined amount of training data from the labeled training data, for example, at random.
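The random selection of a predetermined share of the labeled data can be sketched as follows. The function name, the fixed seed, and the rounding rule are assumptions for illustration; the description gives 90% only as an example and does not prescribe a splitting procedure.

```python
import random

def split_labeled(records, train_fraction=0.9, seed=0):
    """Randomly split labeled records into a training part and a held-out part."""
    rng = random.Random(seed)        # local generator, so the split is repeatable
    shuffled = list(records)         # copy, leaving the caller's list untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Ten hypothetical record ids standing in for labeled images.
train_part, held_out = split_labeled(list(range(10)))
```

The held-out part is the labeled data that is not used to generate the classifier, which is what the evaluation unit 12 needs for its accuracy check.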

The evaluation unit 12 evaluates the accuracy of the classifier 111 generated by the learning unit 11. Specifically, the evaluation unit 12 uses the labeled image data that the learning unit 11 did not use to generate the classifier 111 to determine whether the performance of the classifier 111 has reached a predetermined recognition accuracy (the target recognition accuracy).
The predetermined recognition accuracy may be written in a parameter file or the like in advance and stored in the storage unit 40, or the input unit 20 may accept it from the annotator 9 or the like when the multi-label data learning support device 1 starts up and store it in the storage unit 40.
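A check against the target recognition accuracy could be sketched as below. The description does not say how accuracy is computed for multi-label output, so the metric used here (the share of individual label values predicted correctly on held-out data) is an assumption, as are the function and field names.

```python
def label_accuracy(predictions, truths):
    """Share of individual label values that match the correct-answer labels."""
    total = 0
    correct = 0
    for predicted, truth in zip(predictions, truths):
        for name, value in truth.items():
            total += 1
            correct += int(predicted.get(name) == value)
    return correct / total if total else 0.0

def reached_target(predictions, truths, target):
    """True if the measured accuracy is at or above the target accuracy."""
    return label_accuracy(predictions, truths) >= target

# Two hypothetical held-out images with two labels each.
truths = [
    {"gender": "male", "age_group": "30s"},
    {"gender": "female", "age_group": "30s"},
]
predictions = [
    {"gender": "male", "age_group": "30s"},
    {"gender": "female", "age_group": "20s"},  # one wrong label value
]
score = label_accuracy(predictions, truths)
```

A stricter alternative would count an image as correct only when all of its labels match; which definition fits depends on how the target accuracy is specified.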

When the evaluation unit 12 determines that the performance of the classifier 111 has not reached the predetermined recognition accuracy, the inference unit 13 uses the classifier 111 generated by the learning unit 11 at that point to perform inference on all unlabeled image data stored in the unlabeled image data DB 42.

FIG. 4 is a diagram showing a data configuration example of the unlabeled image data information (after inference) 420 (420B) stored in the unlabeled image data DB 42 according to the present embodiment.
For each unlabeled image, the inference unit 13 uses the classifier 111 to infer (estimate) each label of the multi-label and stores the result in the unlabeled image data information of the record corresponding to that image. In FIG. 4, the inference results for the labels gender 423, age group 424, clothing (upper body) 425, and clothing (lower body) 426 are stored as label information in the unlabeled image data information 420. In FIG. 4, the unlabeled image data information in this post-inference state is denoted by reference numeral 420B.

Returning to FIG. 1, the annotation image selection unit 14 uses the results inferred by the inference unit 13 for each label of the unlabeled image data, together with the label-related information 430 (see FIG. 5) stored in the label-related information DB 43, to select the captured images to be annotated preferentially (hereinafter "annotation target images").

First, the annotation image selection unit 14 refers to the label-related information DB 43 and reads the label-related information to be used for selecting annotation images. If the label-related information is undefined, the annotation image selection unit 14 accepts input of the label-related information from the annotator 9 or the like via the label-related information definition unit 23. The details of defining the label-related information are described later with reference to FIGS. 9 to 11 (the label-related information definition user interface and the label-related information history confirmation user interface).

FIG. 5 is a diagram showing a data configuration example of the label-related information 430 stored in the label-related information DB 43 according to the present embodiment.
The label-related information 430 stores the following data items: ID 431, shooting location 432, the multi-label for the captured image (here, as an example, the labels gender 433, age group 434, clothing (upper body) 435, and clothing (lower body) 436), explanatory label 437, target label 438, and relevance 439.

The ID 431 stores a unique identifier assigned to the label-related information (one record). The shooting location 432 stores the location where the captured images were taken. Gender 433, age group 434, clothing (upper body) 435, and clothing (lower body) 436 are examples of the labels making up the multi-label. The explanatory label 437 (first label) and the target label 438 (second label) define which labels are related to each other, as described below. The relevance 439 defines the strength of the relationship between the explanatory label 437 and the target label 438 (for example, in the range 0 to 1.0).

For example, as in the record whose ID 431 is "A_001", when the explanatory label 437 is defined as "gender" and "age group", the target label 438 as "clothing (upper body)", and the relevance 439 as "0.9", the record "A_001" defines that "at Factory A, 90% of men in their 20s wear work clothes on the upper body". Likewise, as in the record whose ID 431 is "A_005", when the explanatory label 437 is defined as "age group", "clothing (upper body)", and "clothing (lower body)", the target label 438 as "gender", and the relevance 439 as "0.8", the record "A_005" defines that "of the people in their 50s photographed at Factory A whose clothing (upper and lower body) is a suit, 80% are men". For labels that take no part in the relationship, the field may be left blank ("-"), as with clothing (lower body) 436 in the record "A_001".
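The record "A_001" and the comparison against an inference result can be sketched as follows. The class and function names are assumptions, and the rule that an inference "differs" from a definition when it matches every explanatory label but not the target label is one reading of the description; the actual selection procedure is the one described with reference to FIG. 7.

```python
from dataclasses import dataclass

@dataclass
class LabelRule:
    rule_id: str        # ID 431
    location: str       # shooting location 432
    values: dict        # label name -> defined value (blank labels omitted)
    explanatory: tuple  # explanatory label 437 (first label)
    target: tuple       # target label 438 (second label)
    relevance: float    # relevance 439, in the range 0 to 1.0

def contradicts(rule, inferred):
    """True if the inference fits the explanatory labels but not the target."""
    premise = all(inferred.get(n) == rule.values[n] for n in rule.explanatory)
    conclusion = all(inferred.get(n) == rule.values[n] for n in rule.target)
    return premise and not conclusion

# "At Factory A, 90% of men in their 20s wear work clothes on the upper body."
rule_a_001 = LabelRule(
    rule_id="A_001",
    location="Factory A",
    values={"gender": "male", "age_group": "20s",
            "clothing_upper": "work clothes"},
    explanatory=("gender", "age_group"),
    target=("clothing_upper",),
    relevance=0.9,
)

suspicious = {"gender": "male", "age_group": "20s", "clothing_upper": "suit"}
unrelated = {"gender": "male", "age_group": "30s", "clothing_upper": "suit"}
```

Here `suspicious` would be flagged because it matches the explanatory labels of "A_001" but infers a different upper-body clothing label, while `unrelated` falls outside the definition and is not flagged by it.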

On the other hand, if the label-related information 430 has already been defined, the annotation image selection unit 14 extracts, from the label-related information 430 stored in the label-related information DB 43, the records whose shooting location 432 is the same as that of the image data to be annotated. The annotation image selection unit 14 then uses the results of the inference processing by the inference unit 13 and the extracted label-related information 430 to select the unlabeled image data to be annotated preferentially (the annotation target images). The selection of annotation target images by the annotation image selection unit 14 is described in detail later with reference to FIG. 7.
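Extracting the definitions that share a shooting location with the image under consideration is a simple filter, sketched below. The dictionary layout, the record "B_001", and the location "Office B" are hypothetical and used only to show that definitions from other locations are ignored.

```python
def rules_for_location(rules, location):
    """Return the label-related records defined for the given shooting location."""
    return [rule for rule in rules if rule["location"] == location]

# Hypothetical contents of the label-related information DB.
rules = [
    {"id": "A_001", "location": "Factory A", "relevance": 0.9},
    {"id": "A_005", "location": "Factory A", "relevance": 0.8},
    {"id": "B_001", "location": "Office B", "relevance": 0.7},
]
factory_rules = rules_for_location(rules, "Factory A")
```

Only the records in `factory_rules` would then be compared with the inference result of an image taken at Factory A.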

The annotation target images selected by the annotation image selection unit 14 are output to a display device or the like by the output unit 30. The correct labels assigned by the annotator 9 or the like are then received, and the newly labeled image data is stored in the labeled image data DB 41. At this time, the unlabeled image data selected as annotation target images is deleted from the unlabeled image data DB 42.
The learning unit 11 of the multi-label data learning support device 1 performs machine learning again with the added labeled image data included and regenerates the classifier 111, thereby improving its performance. This processing is repeated until the classifier 111 reaches a predetermined recognition accuracy.

<Processing flow>
Next, the flow of processing executed by the multi-label data learning support device 1 is described. The overall flow is described first with reference to FIG. 6, followed by the details of the annotation target image selection process with reference to FIG. 7.

FIG. 6 is a flowchart showing the overall flow of processing executed by the multi-label data learning support device 1 according to the present embodiment.
It is assumed that a predetermined recognition accuracy (target recognition accuracy), used to judge the performance of the classifier 111, is stored in advance in the storage unit 40 of the multi-label data learning support device 1.

First, the image data reading unit 22 of the multi-label data learning support device 1 accepts input of labeled image data and unlabeled image data (step S1). For the accepted labeled image data, the image data reading unit 22 generates the labeled image data information 410 shown in FIG. 2 and stores it in the labeled image data DB 41. For the accepted unlabeled image data, the image data reading unit 22 generates the unlabeled image data information (before inference) 420 (420A) shown in FIG. 3 and stores it in the unlabeled image data DB 42.

Here, for example, the multi-label data learning support device 1 accepts a small amount of labeled image data (for example, tens to hundreds of images) and a large amount of unlabeled image data (for example, thousands to tens of thousands of images). If all of the image data accepted by the image data reading unit 22 is unlabeled, the annotation processing unit 21 randomly selects image data from the accepted unlabeled image data and outputs it to the output unit 30, so that the annotator 9 or the like performs annotation and labeled image data with correct labels is generated.

Next, the learning unit 11 generates the classifier 111 by machine learning such as DL, using the labeled training data stored in the labeled image data DB 41 (step S2). Here, the learning unit 11 randomly selects a predetermined proportion (for example, 90%) of the labeled image data as training data and generates the classifier 111 by machine learning.

Subsequently, the evaluation unit 12 uses the labeled image data that the learning unit 11 did not use to generate the classifier 111 to determine whether the performance of the classifier 111 is equal to or higher than the predetermined recognition accuracy (target recognition accuracy) (step S3). If it is (step S3 → Yes), learning of the classifier 111 is regarded as complete and the processing ends. If the predetermined recognition accuracy has not been reached (step S3 → No), the processing proceeds to step S4.

In step S4, the inference unit 13 uses the classifier 111 generated in step S2 to perform inference processing for each label on all of the image data stored in the unlabeled image data DB 42. The inference unit 13 then stores the information of each label obtained as the inference result in the unlabeled image data information (after inference) 420 (420B) (see FIG. 4).

Next, the annotation image selection unit 14 refers to the label-related information DB 43 (FIG. 1) and determines whether the label-related information 430 (see FIG. 5) used for selecting annotation images has already been defined (step S5).
The annotation image selection unit 14 can make this determination based on, for example, whether the shooting location 432 item of the label-related information 430 (FIG. 5) contains the same shooting location as the shooting location 421 of the unlabeled image data information 420 (FIG. 4).

If the label-related information 430 has been defined (step S5 → Yes), the annotation image selection unit 14 reads the defined label-related information (step S6). Specifically, from the label-related information 430 (FIG. 5) stored in the label-related information DB 43, the annotation image selection unit 14 extracts, for example, the records whose shooting location 432 is the same as that of the image data to be annotated, as the label-related information to be used this time. Alternatively, the annotation image selection unit 14 may receive information specifying the ID 431 of records in the label-related information 430 (FIG. 5) and use those records as the label-related information to be used this time.
If the label-related information has not been defined (step S5 → No), the annotation image selection unit 14 performs processing for defining the label-related information (step S7). Details of defining the label-related information are described later (see FIGS. 9 to 11).
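The branch in steps S5 to S7 can be sketched as a lookup by shooting location. The following is a rough illustration, not the device's actual implementation: the function name `relations_for_location`, the dictionary keys, and the records "B_001" and "Store C" are assumptions introduced for the example.

```python
from typing import Dict, List


def relations_for_location(relation_db: List[Dict], location: str) -> List[Dict]:
    """Return the label-related records defined for one shooting location (S5/S6)."""
    return [rec for rec in relation_db if rec["location"] == location]


# Toy stand-in for the label-related information DB 43.
relation_db = [
    {"id": "A_001", "location": "Factory A", "relevance": 0.9},
    {"id": "A_005", "location": "Factory A", "relevance": 0.8},
    {"id": "B_001", "location": "Factory B", "relevance": 0.7},  # hypothetical record
]

for location in ("Factory A", "Store C"):
    records = relations_for_location(relation_db, location)
    if records:
        # Step S5 -> Yes: read the defined records (step S6).
        print(location, "-> use", [r["id"] for r in records])
    else:
        # Step S5 -> No: label-related information must be defined first (step S7).
        print(location, "-> not defined yet")
```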

Next, the annotation image selection unit 14 uses the inference result of each label obtained for the unlabeled image data in step S4 and the defined label-related information to select the unlabeled image data to be annotated preferentially (annotation target images) (step S8: annotation target image selection process). Details of this process are described later with reference to FIG. 7.

Subsequently, the annotation image selection unit 14 displays the selected annotation target images on the screen of the display device via the output unit 30 (step S9). The annotation processing unit 21 then accepts input of correct labels from the annotator 9 or the like; that is, annotation is performed on the selected annotation target images (step S10).

The annotation processing unit 21 adds the image data to which correct labels have newly been assigned (labeled image data) to the labeled image data DB 41 and deletes the information of that image data from the unlabeled image data DB 42 (step S11).
At this time, the annotation processing unit 21 may reset the information stored in the label fields of the unlabeled image data information (after inference) 420 (420B) (see FIG. 4) to the pre-inference state of FIG. 3, or may retain it and overwrite it each time the inference processing is repeated.

Subsequently, the multi-label data learning support device 1 returns to step S2 and repeats the above processing until the performance of the classifier 111 is equal to or higher than the predetermined recognition accuracy (target recognition accuracy) (see step S3).
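The loop formed by steps S2 to S11 can be summarized in a short sketch. It is an illustration under stated assumptions rather than the device's implementation: the function `run_learning_support`, its parameters, and the injected callables (`train`, `evaluate`, `infer`, `select`, `annotate`) are hypothetical names, and the toy stand-ins at the bottom replace a real classifier and a human annotator. The `max_rounds` guard is also an addition that the flowchart does not mention.

```python
import random
from typing import Any, Callable, Dict, List, Tuple

Labels = Dict[str, str]


def run_learning_support(
    labeled: List[Tuple[str, Labels]],
    unlabeled: List[str],
    train: Callable[[List[Tuple[str, Labels]]], Any],
    evaluate: Callable[[Any, List[Tuple[str, Labels]]], float],
    infer: Callable[[Any, str], Labels],
    select: Callable[[Dict[str, Labels]], List[str]],
    annotate: Callable[[str], Labels],
    target_accuracy: float,
    train_fraction: float = 0.9,
    max_rounds: int = 100,
    seed: int = 0,
) -> Any:
    """Repeat S2-S11 until the held-out accuracy reaches the target."""
    rng = random.Random(seed)
    model = None
    for _ in range(max_rounds):
        # S2: train on a random ~90% of the labeled data (assumes >= 2 labeled items).
        shuffled = list(labeled)
        rng.shuffle(shuffled)
        n_train = min(len(shuffled) - 1, round(len(shuffled) * train_fraction))
        train_set, eval_set = shuffled[:n_train], shuffled[n_train:]
        model = train(train_set)
        # S3: evaluate on the labeled data not used for training.
        if evaluate(model, eval_set) >= target_accuracy:
            break
        # S4: infer every label for every unlabeled image.
        predictions = {img: infer(model, img) for img in unlabeled}
        # S8: choose the images to annotate preferentially.
        chosen = select(predictions)
        if not chosen:
            break
        # S9-S11: collect correct labels and move the images between the two stores.
        for img in chosen:
            labeled.append((img, annotate(img)))
            unlabeled.remove(img)
    return model


# Toy stand-ins: the "model" is just the training-set size.
labeled = [(f"L_{i}.jpg", {"gender": "male"}) for i in range(4)]
unlabeled = [f"U_{i}.jpg" for i in range(20)]
model = run_learning_support(
    labeled, unlabeled,
    train=lambda data: len(data),
    evaluate=lambda m, data: m / 10,
    infer=lambda m, img: {"gender": "male"},
    select=lambda preds: list(preds)[:2],
    annotate=lambda img: {"gender": "male"},
    target_accuracy=0.85,
)
print(model, len(labeled), len(unlabeled))
```

Passing the steps in as callables keeps the control flow of FIG. 6 visible without committing to any particular learning framework.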

<Annotation target image selection process>
Next, the annotation target image selection process executed by the annotation image selection unit 14 of the multi-label data learning support device 1 is described. This process is executed in step S8 of FIG. 6.
FIG. 7 is a flowchart showing the flow of the annotation target image selection process executed by the annotation image selection unit 14 of the multi-label data learning support device 1 according to the present embodiment.

First, the annotation image selection unit 14 acquires the defined label-related information 430 (FIG. 5) (step S81).
The annotation image selection unit 14 then selects one item of unlabeled image data from the unlabeled image data DB 42 (step S82).

Subsequently, the annotation image selection unit 14 compares the inference result for the selected unlabeled image data (see FIG. 4) with the defined label-related information; that is, it determines whether the inference result differs from the defined label-related information (step S83).
Specifically, the record of the selected unlabeled image data in the unlabeled image data information (after inference) 420 (420B) shown in FIG. 4 is compared with the records of the label-related information 430 in FIG. 5 to search for a record in which the label information of the explanatory label 437 (first label) matches but the label information of the target label 438 (second label) differs. For example, the record in FIG. 4 whose captured image storage path 422 is "A_101.jpg" matches the explanatory labels (gender, age) of the record in FIG. 5 whose ID 431 is "A_002", with the label information "gender: male, age: 30s", but its label information for the target label (clothing (upper body)) differs.

If, in step S83, the comparison between the record of the unlabeled image data information (after inference) 420 (420B) and the records of the label-related information 430 finds a record in which the label information of the explanatory label 437 matches but the label information of the target label 438 differs (step S83 → Yes), that unlabeled image data is determined to be an annotation candidate image (step S84).
For unlabeled image data determined to be an annotation candidate image, a flag (for example, "YES") is set in the annotation candidate 427 field, as shown in the unlabeled image data information (after annotation candidate image determination) 420 (420C) in FIG. 8. In addition, the annotation image selection unit 14 extracts the value of the degree of relevance 439 stored in the record of the label-related information 430 (FIG. 5) that was judged to differ in the comparison, and stores it in the degree of relevance 428 column of the unlabeled image data information (after annotation candidate image determination) 420 (420C). Here, the value "0.9" of the degree of relevance 439 in the record of FIG. 5 whose ID 431 is "A_002" is stored as the value "0.9" of the degree of relevance 428 in the record of FIG. 8 whose captured image storage path 422 is "A_101.jpg".

On the other hand, if no such record is found in step S83 (step S83 → No), the flag of the annotation candidate 427 is set to, for example, "NO" (that is, no flag is raised), as shown in the unlabeled image data information (after annotation candidate image determination) 420 (420C) in FIG. 8, and the processing proceeds to step S85.
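The comparison in steps S83 and S84 can be sketched as follows. This is a minimal illustration with several assumptions: the function and key names are hypothetical, the concrete target value "work clothes" for record "A_002" and the inferred value "suit" are invented for the example (the figures are not reproduced here), and picking the contradicted record with the highest relevance when several match is a choice the specification does not state.

```python
from typing import Dict, List, Optional


def contradicted_relation(prediction: Dict[str, str],
                          relations: List[Dict]) -> Optional[Dict]:
    """Find a relation whose explanatory labels match but whose target labels differ."""
    hits = []
    for rel in relations:
        explanatory_matches = all(
            prediction.get(name) == value for name, value in rel["explanatory"].items())
        target_differs = any(
            prediction.get(name) != value for name, value in rel["target"].items())
        if explanatory_matches and target_differs:
            hits.append(rel)
    # Assumption: if several relations are contradicted, keep the strongest one.
    return max(hits, key=lambda r: r["relevance"]) if hits else None


def flag_candidate(record: Dict, relations: List[Dict]) -> Dict:
    """Set the annotation candidate flag 427 and the degree of relevance 428."""
    hit = contradicted_relation(record["labels"], relations)
    record["candidate"] = "YES" if hit else "NO"
    record["relevance"] = hit["relevance"] if hit else None
    return record


relations = [
    {"id": "A_002",
     "explanatory": {"gender": "male", "age": "30s"},
     "target": {"clothing_upper": "work clothes"},  # assumed target value
     "relevance": 0.9},
]
record = {"path": "A_101.jpg",
          "labels": {"gender": "male", "age": "30s",
                     "clothing_upper": "suit", "clothing_lower": "suit"}}  # assumed inference
flag_candidate(record, relations)
print(record["candidate"], record["relevance"])
```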

In step S85, the annotation image selection unit 14 determines whether all of the unlabeled image data stored in the unlabeled image data DB 42 has been processed. If unprocessed unlabeled image data remains (step S85 → No), the processing returns to step S82 and continues with the next item of unlabeled image data. If all of the unlabeled image data has been processed (step S85 → Yes), the processing proceeds to step S86.

In step S86, the annotation image selection unit 14 determines whether the number of annotation candidate images determined exceeds a predetermined threshold. This threshold limits the number of items of unlabeled image data on which annotation is performed, and is stored in the storage unit 40 in advance.
If the threshold is not exceeded (step S86 → No), the annotation image selection unit 14 selects all of the annotation candidate images as annotation target images on which annotation is to be performed (step S87), and ends the process.

If the threshold is exceeded (step S86 → Yes), the annotation image selection unit 14 refers to the unlabeled image data information (after annotation candidate image determination) 420 (420C) (FIG. 8) and, from the records of unlabeled image data for which the annotation candidate 427 flag is set, selects annotation candidate images in descending order of the degree of relevance 428 until the number reaches the predetermined threshold, as the annotation target images on which annotation is to be performed (step S88), and ends the process.
Candidates are selected in descending order of the degree of relevance 428 because the higher the value, the stronger the relationship between the explanatory label and the target label, so an inference result that differs from the defined label-related information is more likely to be wrong.
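Steps S86 to S88 amount to a capped selection ordered by relevance. The sketch below assumes a list of dictionaries with hypothetical keys `candidate` and `relevance` standing in for fields 427 and 428; the file names and values are invented for the example.

```python
from typing import Dict, List


def select_annotation_targets(records: List[Dict], max_targets: int) -> List[Dict]:
    """Pick annotation target images from the flagged candidates (S86-S88)."""
    candidates = [r for r in records if r["candidate"] == "YES"]
    if len(candidates) <= max_targets:
        # S86 -> No: the threshold is not exceeded, so every candidate is selected (S87).
        return candidates
    # S86 -> Yes: keep the candidates with the highest relevance, up to the threshold (S88).
    ranked = sorted(candidates, key=lambda r: r["relevance"], reverse=True)
    return ranked[:max_targets]


records = [
    {"path": "a.jpg", "candidate": "YES", "relevance": 0.9},
    {"path": "b.jpg", "candidate": "NO", "relevance": None},
    {"path": "c.jpg", "candidate": "YES", "relevance": 0.8},
    {"path": "d.jpg", "candidate": "YES", "relevance": 0.95},
    {"path": "e.jpg", "candidate": "YES", "relevance": 0.7},
]
print([r["path"] for r in select_annotation_targets(records, 2)])
```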

<User interface>
Next, the user interface (hereinafter, "UI") provided by the label-related information definition unit 23 of the multi-label data learning support device 1 according to the present embodiment is described. The label-related information definition unit 23 displays the label-related information definition UI 200 (see FIGS. 9 and 11) and the label-related information history confirmation UI 250 on the display device via the output unit 30, accepts input of the settings for defining label-related information from the annotator 9 or the like, and stores them in the label-related information DB 43 as the label-related information 430 (FIG. 5).

FIG. 9 is a diagram illustrating the label-related information definition UI 200 according to the present embodiment. The annotator 9 uses the label-related information definition UI 200 when newly defining label-related information or adding a definition.

First, the annotator 9 uses the image load (menu) 201 to select a captured image from the shooting location for which label-related information is to be defined. The selected image is displayed in the image display area 202, and its shooting location is displayed in the shooting location text box 203. FIG. 9 shows an example in which "Factory A" is displayed as the shooting location. When label-related information is newly defined, a unique ID is automatically displayed in the ID text box 204. The annotator 9 or the like may also set this ID freely, provided it is unique.

Next, the procedure by which the annotator 9 or the like sets the definition of label-related information is described.
The label-related information definition unit 23 accepts, via the label pull-down 205, selection of the label information of the labels that are related. FIG. 9 shows an example in which "male" is selected for the gender label. The related type pull-down 206 then accepts selection of whether each selected label is an explanatory label ("explanatory") (first label) or a target label ("target") (second label). One or more items are selected for each of the explanatory label and the target label.

Next, the annotator 9 sets the degree of relevance of the label-related information. The degree of relevance may be set with the relevance slider 207 or with the relevance text editor 208.
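A definition entered through such a UI could be checked before registration, for instance as sketched below. The function `validate_definition` and the role strings are hypothetical. The requirement of at least one explanatory and one target label follows the text above, whereas the range check of 0 to 1 for the degree of relevance is an assumption inferred from the example values "0.9" and "0.8".

```python
from typing import Dict


def validate_definition(roles: Dict[str, str], relevance: float) -> None:
    """Raise ValueError if a label-related definition is incomplete.

    roles maps a label name to "explanatory", "target", or "none".
    """
    explanatory = [name for name, role in roles.items() if role == "explanatory"]
    target = [name for name, role in roles.items() if role == "target"]
    if not explanatory:
        raise ValueError("at least one explanatory label (first label) is required")
    if not target:
        raise ValueError("at least one target label (second label) is required")
    # Assumption: the degree of relevance is a ratio between 0 and 1.
    if not 0.0 <= relevance <= 1.0:
        raise ValueError("degree of relevance must be between 0 and 1")


# Definition corresponding to record "A_001": gender and age explain upper-body clothing.
validate_definition(
    {"gender": "explanatory", "age": "explanatory",
     "clothing_upper": "target", "clothing_lower": "none"},
    relevance=0.9,
)
print("definition accepted")
```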

The defined label-related information is newly registered in the label-related information 430 (FIG. 5) when the registration button 209 is pressed. The registered information can also be output in a format such as a CSV file or a binary file.
The output label-related information can be loaded into the label-related information DB 43 (see FIG. 1) by selecting the file via the label-related information definition load (menu) 210. When the history (button) 211 is pressed, the loaded label-related information is displayed in the label-related information history confirmation UI 250 of FIG. 10, where it can be reviewed as a list.
By specifying, with a mouse or the like, label-related information displayed in the label-related information history confirmation UI 250 of FIG. 10 (for example, the record with ID "A_001"), the annotator 9 can review and modify the definition on the label-related information definition UI 200, as shown in FIG. 11.
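The CSV output and reload described above might look roughly like the following round trip. The column names, the ";" separator for multiple explanatory labels, and the function names are assumptions made for this sketch; the specification states only that CSV or binary output is possible, not the file layout.

```python
import csv
import io
from typing import Dict, List

# Hypothetical column layout loosely following fields 431 to 439 of FIG. 5.
FIELDS = ["id", "location", "gender", "age", "clothing_upper",
          "clothing_lower", "explanatory", "target", "relevance"]


def dump_relations(rows: List[Dict]) -> str:
    """Serialize label-related records to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def load_relations(text: str) -> List[Dict]:
    """Parse CSV text back into records, restoring the numeric relevance."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["relevance"] = float(row["relevance"])
        rows.append(dict(row))
    return rows


rows = [
    {"id": "A_001", "location": "Factory A", "gender": "male", "age": "20s",
     "clothing_upper": "work clothes", "clothing_lower": "-",
     "explanatory": "gender;age", "target": "clothing_upper", "relevance": 0.9},
    {"id": "A_005", "location": "Factory A", "gender": "male", "age": "50s",
     "clothing_upper": "suit", "clothing_lower": "suit",
     "explanatory": "age;clothing_upper;clothing_lower", "target": "gender",
     "relevance": 0.8},
]
text = dump_relations(rows)
restored = load_relations(text)
print(text)
```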

In this way, the label-related information definition UI 200 allows the annotator 9 to input label-related information in a simple and visually intuitive manner, which reduces the burden on the annotator 9 of defining label-related information.

As described above, with the multi-label data learning support device 1, the multi-label data learning support method, and the multi-label data learning support program according to the present embodiment, the "relationships between labels" of a multi-label, which differ from one shooting location to another, are defined in advance as label-related information. This label-related information can then be used to select annotation target images when extracting data with a high learning effect in active learning. As a result, annotation can be performed efficiently, that is, the amount of annotation work can be reduced, even for image data with multi-labels.

The present embodiment has been described using an example in which the shooting location is "Factory A" and the multi-label "gender", "age", "clothing (upper body)", and "clothing (lower body)" is attached to each photographed person. However, the present invention is not limited to this example. It can also be used, for example, to attach a multi-label such as "scratch type", "scratch position", and "dirt" to objects (industrial products, food, and so on) photographed in a visual inspection process in a factory. In that case, the relationships and tendencies among labels such as "scratch type", "scratch position", and "dirt" are defined as label-related information. For example, when the scratch types include line scratches, pinholes, scratches, and dents, label-related information is defined with content such as "on the visual inspection line for product B (shooting location), when a line scratch (scratch type) is on the upper surface (scratch position), a red foreign substance (dirt) is attached". Thus, the subject is not limited to people, and the present invention can also be applied when attaching multi-labels to various objects.

The "shooting location" defined in the label-related information according to the present invention is not limited to a place (area) such as a factory, commercial facility, park, or station. It only needs to identify the shooting area captured in the image, and can be defined freely by the annotator 9 according to the object to which the multi-label is attached. For example, the inside of a store, a store entrance, a conference room, one process of a production line (visual inspection), or an area capturing a specific part of a product can be set as the "shooting location".

The present invention can also be realized by a program (multi-label data learning support program) that causes the hardware resources of a general-purpose computer to operate as the functions of the multi-label data learning support device 1. The program can be distributed via a communication line, or recorded on a recording medium such as a CD-ROM and distributed.

1 Multi-label data learning support device
10 Control unit
11 Learning unit
12 Evaluation unit
13 Inference unit
14 Annotation image selection unit
20 Input unit
21 Annotation processing unit
22 Image data reading unit
23 Label-related information definition unit
30 Output unit
40 Storage unit
41 Labeled image data DB
42 Unlabeled image data DB
43 Label-related information DB
111 Classifier
200 Label-related information definition user interface (UI)
250 Label-related information history confirmation user interface (UI)
410 Labeled image data information
420 Unlabeled image data information
430 Label-related information
437 Explanatory label (first label)
438 Target label (second label)

Claims

A multi-label data learning support device that supports learning of a classifier that assigns a multi-label, indicating a plurality of labels, to image data, the device comprising:
a storage unit that stores a labeled image data DB (DataBase) storing labeled image data, an unlabeled image data DB storing unlabeled image data, and a label-related information DB storing label-related information that indicates relationships among the plurality of labels and a tendency of the contents of the multi-label attached to the image data;
a learning unit that acquires the labeled image data and generates the classifier by machine learning;
an inference unit that performs, on the unlabeled image data, inference processing for each label constituting the multi-label using the generated classifier;
an annotation image selection unit that compares the inference result of each label for the unlabeled image data obtained by the inference processing with the relationships among the labels indicated by the label-related information and, when they differ, selects the unlabeled image data as an annotation target image, that is, an image requiring annotation, which is the work of attaching correct labels to image data; and
an annotation processing unit that displays the selected annotation target image on a display device, accepts input of correct labels for the image data, and stores the image data with the accepted correct labels in the labeled image data DB as the labeled image data.

The multi-label data learning support device according to claim 1, wherein
the label-related information is information that, for each shooting location indicating a shooting area of the image data, associates a first label with a second label, each consisting of one or more labels selected from the labels constituting the multi-label, and
the annotation image selection unit extracts the information of the first label and the information of the second label from the information of each label obtained as the inference result for the unlabeled image data, and performs the comparison according to whether the extracted information matches the relationship between the labels indicated by the information of the second label associated with the first label stored as the label-related information.

The multi-label data learning support device according to claim 2, further comprising a label-related information definition unit that displays, on the display device, an input screen for defining which of the labels constituting the multi-label correspond to the first label and to the second label, and accepts input of the label-related information.

A multi-label data learning support method for a multi-label data learning support device that supports learning of a classifier that assigns a multi-label, indicating a plurality of labels, to image data, wherein
the multi-label data learning support device
comprises a storage unit that stores a labeled image data DB storing labeled image data, an unlabeled image data DB storing unlabeled image data, and a label-related information DB storing label-related information that indicates relationships among the plurality of labels and a tendency of the contents of the multi-label attached to the image data, and executes:
a step of acquiring the labeled image data and generating the classifier by machine learning;
a step of performing, on the unlabeled image data, inference processing for each label constituting the multi-label using the generated classifier;
a step of comparing the inference result of each label for the unlabeled image data obtained by the inference processing with the relationships among the labels indicated by the label-related information and, when they differ, selecting the unlabeled image data as an annotation target image, that is, an image requiring annotation, which is the work of attaching correct labels to image data; and
a step of displaying the selected annotation target image on a display device, accepting input of correct labels for the image data, and storing the image data with the accepted correct labels in the labeled image data DB as the labeled image data.

A multi-label data learning support program for operating a computer as the multi-label data learning support device according to any one of claims 1 to 3.