JP7623868B2

JP7623868B2 - Data processing device and method

Info

Publication number: JP7623868B2
Application number: JP2021053198A
Authority: JP
Inventors: 竜介関; 康貴岡田; 雄喜片山; 怜広見; 葵荻島
Original assignee: Denso Ten Ltd
Current assignee: Denso Ten Ltd
Priority date: 2021-03-26
Filing date: 2021-03-26
Publication date: 2025-01-29
Anticipated expiration: 2041-03-26
Also published as: JP2022150552A

Description

The present invention relates to a data processing device and method.

Building an inference model that performs image recognition (object detection, object classification, or the like) through supervised learning, a type of machine learning, generally requires a large number of images with annotation information (thousands to millions of images, or more). The annotation information is created, for example, by manual annotation by humans or by automatic annotation using a trained model. Supervised learning can then be performed using a dataset that includes the image data of these images together with the annotation information.

Japanese Translation of PCT International Application Publication No. 2017-515189

However, with both manual and automatic annotation, an object in an image may be annotated with incorrect class information (a label). Incorrect class information can be found by having a person visually check each image, but such checking incurs enormous cost (time and human effort). If a device could automatically extract the data whose class information is erroneous, the cost of searching a dataset for defects would be reduced, which would be beneficial.

An object of the present invention is to provide a data processing device and method that contribute to identifying errors in class information.

A data processing device according to the present invention has a configuration (first configuration) comprising: an image classification dataset acquisition unit that acquires an image classification dataset including a plurality of object images, each including image data of an object, and class information identifying the class of the object in each object image; a feature derivation unit that derives a feature of each object image in the image classification dataset; a clustering processing unit that clusters m object images included in the plurality of object images based on the features of the m object images and the class information for the m object images (m is an integer of 3 or more); an index derivation unit that, for each of the one or more clusters obtained by the clustering, derives the centroid of the plurality of feature points belonging to the cluster and the distance between the centroid and each feature, and derives the mean and variance of the plurality of distances derived for the plurality of features; and an error identification unit that, based on the derived mean and variance, identifies a cluster to which an object image with erroneous class information belongs as an error cluster.

In the data processing device according to the first configuration, the error identification unit may identify, as the error cluster, a cluster whose derived mean is equal to or greater than a predetermined first threshold and whose derived variance is equal to or greater than a predetermined second threshold (second configuration).
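The index derivation and thresholding of the first and second configurations can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the use of Euclidean distance, the population variance, and the threshold values are all assumptions chosen for the example.

```python
import math
from statistics import fmean, pvariance


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    dim = len(points[0])
    return [fmean(p[d] for p in points) for d in range(dim)]


def cluster_indices(points):
    """Return (mean, variance) of the distances from each feature to the centroid."""
    c = centroid(points)
    distances = [math.dist(p, c) for p in points]  # Euclidean distance (assumption)
    return fmean(distances), pvariance(distances)  # population variance (assumption)


def find_error_clusters(clusters, mean_threshold, variance_threshold):
    """Flag clusters whose mean AND variance both reach their thresholds."""
    flagged = []
    for cluster_id, points in clusters.items():
        mean_d, var_d = cluster_indices(points)
        if mean_d >= mean_threshold and var_d >= variance_threshold:
            flagged.append(cluster_id)
    return flagged


# Hypothetical 2-D features: "tight" is compact, "spread" contains one outlier.
clusters = {
    "tight": [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0)],
    "spread": [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0)],
}
error_clusters = find_error_clusters(clusters, mean_threshold=3.0, variance_threshold=1.0)
```

With these illustrative thresholds only the cluster containing the outlier is flagged; suitable threshold values in practice would depend on the feature space and are not given here.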

In the data processing device according to the first or second configuration, the clustering processing unit may classify the m object images into k clusters in the clustering, where k is equal to the number of distinct object classes among the m object images as represented by the class information for the m object images (third configuration).

In the data processing device according to any one of the first to third configurations, the error identification unit may recluster the n object images belonging to the error cluster based on the n features of the n object images and, based on the result of the reclustering, identify from among the n object images an object image with erroneous class information (n is an integer of 3 or more) (fourth configuration).

In the data processing device according to the fourth configuration, the error identification unit may classify the n object images into first and second clusters by the reclustering and, when the total number of object images belonging to the second cluster is smaller than the total number of object images belonging to the first cluster, identify the object images belonging to the second cluster as object images with erroneous class information (fifth configuration).
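The count-based rule of the fifth configuration can be illustrated with a small sketch. The reclustering algorithm is not fixed by the text above, so a bare-bones two-cluster k-means with a deterministic initialisation is assumed here purely for illustration; `two_means` and `minority_members` are hypothetical names.

```python
import math


def mean_point(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def two_means(points, iterations=20):
    """Split points into two clusters and return a list of 0/1 labels."""
    # Deterministic start (assumption): the first point and the point farthest from it.
    c0 = points[0]
    c1 = max(points, key=lambda p: math.dist(p, c0))
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [0 if math.dist(p, c0) <= math.dist(p, c1) else 1 for p in points]
        group0 = [p for p, label in zip(points, labels) if label == 0]
        group1 = [p for p, label in zip(points, labels) if label == 1]
        if not group0 or not group1:
            break  # degenerate split: nothing more to refine
        new_c0, new_c1 = mean_point(group0), mean_point(group1)
        if new_c0 == c0 and new_c1 == c1:
            break  # converged
        c0, c1 = new_c0, new_c1
    return labels


def minority_members(points):
    """Indices of the points in the smaller of the two reclustered groups."""
    labels = two_means(points)
    count1 = sum(labels)
    count0 = len(labels) - count1
    if count0 == count1:
        return []  # no minority: the rule does not apply
    minority = 1 if count1 < count0 else 0
    return [i for i, label in enumerate(labels) if label == minority]


# Hypothetical features of n = 6 object images in one error cluster.
features = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (5.0, 5.0), (5.2, 4.9)]
suspects = minority_members(features)
```

Here the two distant points form the smaller group, so their indices are returned as candidates for erroneous class information.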

In the data processing device according to the fourth configuration, the error identification unit may classify the n object images into first and second clusters by the reclustering and, when the distance between the centroid of the n features of the n object images and the second cluster is greater than the distance between that centroid and the first cluster, identify the object images belonging to the second cluster as object images with erroneous class information (sixth configuration).
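The distance-based rule of the sixth configuration can be sketched in the same spirit. The text does not say how the distance between the overall centroid and a cluster is measured; the sketch below assumes it is the Euclidean distance to that cluster's own centroid, and the function name `suspect_cluster` is hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def suspect_cluster(first_cluster, second_cluster, tolerance=1e-12):
    """Return which cluster lies farther from the centroid of all n features."""
    overall = centroid(first_cluster + second_cluster)
    # Assumption: distance to a cluster means distance to that cluster's centroid.
    d_first = math.dist(overall, centroid(first_cluster))
    d_second = math.dist(overall, centroid(second_cluster))
    if abs(d_first - d_second) <= tolerance:
        return None  # equidistant: the rule does not single out a cluster
    return "second" if d_second > d_first else "first"


# Hypothetical reclustering result for n = 6 object images.
first = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2)]
second = [(5.0, 5.0), (5.2, 4.9)]
result = suspect_cluster(first, second)
```

Under this centroid-to-centroid assumption, the cluster farther from the overall centroid is always the one with fewer members, because the overall centroid is the size-weighted mean of the two cluster centroids. A different definition of the distance to a cluster (for example, the distance to its nearest member) would not necessarily agree with the count-based rule.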

In the data processing device according to any one of the fourth to sixth configurations, the plurality of object images may be images cut out from a plurality of input images, and the data processing device may be provided with a learning dataset generation unit that uses the plurality of input images as a plurality of learning images and generates a learning dataset including the image data of the plurality of learning images and annotation information identifying the position and class of each object in each learning image. Based on the identification result of the error identification unit, the learning dataset generation unit may extract from the plurality of input images, as an annotation target, an input image that includes an object image with erroneous class information, and may correct the annotation information for the annotation target using information supplied from outside for that annotation target (seventh configuration).

In the data processing device according to any one of the fourth to sixth configurations, the plurality of object images may be images cut out from a plurality of input images, and the data processing device may be provided with a learning dataset generation unit that extracts some of the plurality of input images as a plurality of learning images and generates a learning dataset including the image data of the plurality of learning images and annotation information identifying the position and class of each object in each learning image. Based on the identification result of the error identification unit, the learning dataset generation unit may exclude from the plurality of learning images any input image that includes an object image with erroneous class information (eighth configuration).
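The exclusion step of the eighth configuration amounts to mapping each flagged object image back to the input image it was cut from and dropping that input image. A minimal sketch, assuming the mapping is available as a dictionary (the identifiers and the function name are hypothetical):

```python
def exclude_flagged_inputs(learning_images, source_of, flagged_objects):
    """Drop every learning image that contains a flagged object image.

    learning_images: identifiers of the candidate learning images
    source_of:       maps an object image identifier to its input image identifier
    flagged_objects: object images identified as having erroneous class information
    """
    flagged_inputs = {source_of[obj] for obj in flagged_objects}
    return [img for img in learning_images if img not in flagged_inputs]


# Hypothetical identifiers: OI1 and OI2 were cut from II1, OI3 from II2, OI4 from II3.
source_of = {"OI1": "II1", "OI2": "II1", "OI3": "II2", "OI4": "II3"}
kept = exclude_flagged_inputs(["II1", "II2", "II3"], source_of, ["OI3"])
```

Note that a single flagged object image removes the whole input image, including any correctly labelled objects it contains.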

The data processing device according to any one of the first to eighth configurations may further comprise a machine learning processing unit that creates an inference model for object classification by executing machine learning of a neural network using the image classification dataset, and the feature derivation unit may derive the feature of each object image by inputting the image classification dataset to the inference model (ninth configuration).

A data processing method according to the present invention has a configuration (tenth configuration) comprising: an image classification dataset acquisition step of acquiring an image classification dataset including a plurality of object images, each including image data of an object, and class information identifying the class of the object in each object image; a feature derivation step of deriving a feature of each object image in the image classification dataset; a clustering processing step of clustering m object images included in the plurality of object images based on the features of the m object images and the class information for the m object images (m is an integer of 3 or more); an index derivation step of, for each of the one or more clusters obtained by the clustering, deriving the centroid of the plurality of feature points belonging to the cluster and the distance between the centroid and each feature, and deriving the mean and variance of the plurality of distances derived for the plurality of features; and an error identification step of, based on the derived mean and variance, identifying a cluster to which an object image with erroneous class information belongs as an error cluster.

According to the present invention, it is possible to provide a data processing device and method that contribute to identifying errors in class information.

FIG. 1 is a configuration diagram of a data processing device according to an embodiment of the present invention.
FIG. 2 is a diagram showing a plurality of input images assumed in the embodiment of the present invention.
FIG. 3 is a diagram showing how a plurality of object images are cut out from one input image according to the embodiment of the present invention.
FIG. 4 is a diagram showing a plurality of object images assumed in the embodiment of the present invention.
FIG. 5 is a diagram showing an evaluation dataset (ES1) including a plurality of object images and corresponding class information according to a first example belonging to the embodiment of the present invention.
FIG. 6 is a diagram showing the distribution of a plurality of feature points derived for the plurality of object images of FIG. 5 according to the first example.
FIG. 7 is a diagram showing an evaluation dataset (ES2) including a plurality of object images and corresponding class information according to the first example.
FIG. 8 is a diagram showing the distribution of a plurality of feature points derived for the plurality of object images of FIG. 7 according to the first example.
FIG. 9 is a diagram showing an evaluation dataset (ES3) including a plurality of object images and corresponding class information according to the first example.
FIG. 10 is a diagram showing the distribution of a plurality of feature points derived for the plurality of object images of FIG. 9 according to the first example.
FIG. 11 is a diagram showing the result of reclustering performed on the evaluation dataset of FIG. 7 according to a second example belonging to the embodiment of the present invention.
FIG. 12 is a diagram showing the result of reclustering performed on the evaluation dataset of FIG. 9 according to the second example.
FIG. 13 is a partial flowchart of the operation of the data processing device according to a fourth example belonging to the embodiment of the present invention.

Examples of embodiments of the present invention are described in detail below with reference to the drawings. In the drawings referred to, the same parts are given the same reference signs, and duplicate descriptions of the same parts are omitted in principle. In this specification, for brevity, a symbol or reference sign that refers to information, a signal, a physical quantity, a member, or the like may be written so that the name of the corresponding information, signal, physical quantity, member, or the like is omitted or abbreviated. For example, the image classification dataset acquisition unit referred to by "10" below (see FIG. 1) may be written as the image classification dataset acquisition unit 10 or abbreviated as the acquisition unit 10; these all refer to the same thing.

FIG. 1 shows a configuration diagram of a data processing device 1 according to this embodiment. The data processing device 1 includes an image classification dataset acquisition unit 10, a machine learning processing unit 20, an inference unit 30, a clustering processing unit 40, an index derivation unit 50, an error identification unit 60, a learning dataset generation unit 70, and a database 80. The data processing device 1 may be configured as a single computer device or as a plurality of physically separate computer devices. The data processing device 1 may also be configured using so-called cloud computing.

The image classification dataset acquisition unit 10 acquires an image classification dataset based on a plurality of input images. Here, as shown in FIG. 2, the image classification dataset is assumed to be acquired based on L input images. L is an arbitrary integer of 2 or more and has a value of, for example, several thousand to several million. Each input image includes one or more objects. For example, a large number of still images selected from images captured by a camera mounted on a vehicle such as an automobile may be used as the plurality of input images. The i-th of the L input images may be referred to by the symbol "II[i]", where i represents an arbitrary integer.

In this embodiment, the fact that an image contains image data of an object may be expressed by saying that the object is included or present in the image. Similarly, the fact that an image region of interest within an image (for example, an object region described below) contains image data of an object may be expressed by saying that the object is included or present in that image region. In this embodiment, an object refers to a recognition target object that is the subject of image recognition by the data processing device 1.

Image 610 in FIG. 3 is an example of one input image. The input image 610 includes three objects 611 to 613. Objects 611, 612, and 613 are a vehicle, a human, and a traffic sign, respectively. In this embodiment, the vehicle is mainly assumed to be an automobile capable of traveling on a road.

The acquisition unit 10 may have an object detector generated by machine learning. In this case, the acquisition unit 10 can generate the image classification dataset by performing object detection on each input image using the object detector. Object detection involves position identification, which identifies the position of each object in the image subject to image recognition, and class identification, which identifies the class (type) of each object in that image.

Accordingly, in object detection on the input image 610, the object detector can set in the input image 610 an object region 611B including the object 611, an object region 612B including the object 612, and an object region 613B including the object 613. The object regions 611B to 613B are set by identifying the positions of the objects 611 to 613 in the input image 610. The acquisition unit 10 generates object images 621 to 623 by cutting out the object regions 611B to 613B from the input image 610. The object region of an object is a rectangular region (preferably the smallest such rectangle) that encloses the image of the object, and is also called a bounding box. In class identification, the object detector identifies, for each object region, the class (type) of the object in that object region.
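Cutting an object image out of an input image along its bounding box can be sketched as follows. The sketch assumes, for illustration only, that an image is a list of pixel rows and that a box is given by the (x, y) coordinates of the two ends of one diagonal, both inclusive; the function name `crop` is hypothetical.

```python
def crop(image, box):
    """Cut the rectangular object region out of an image.

    image: list of rows, each row a list of pixel values
    box:   ((x0, y0), (x1, y1)), the two ends of one diagonal (inclusive)
    """
    (x0, y0), (x1, y1) = box
    left, right = sorted((x0, x1))
    top, bottom = sorted((y0, y1))
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# Hypothetical 4-row by 5-column image whose pixel value encodes (row, column).
image = [[10 * r + c for c in range(5)] for r in range(4)]
object_image = crop(image, ((1, 1), (3, 2)))
```

Sorting the coordinates makes the result independent of which diagonal of the rectangle is supplied.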

There are a plurality of object classes. For concreteness and clarity, any object attended to as the target of some process, such as position identification or class identification, may be referred to as an object of interest. In object detection, each object in an input image can be an object of interest. Class identification determines to which of the plurality of classes the object of interest belongs. The class of an object is synonymous with the type of the object. Here, for concreteness, the plurality of classes is assumed to include first to third classes, which are a vehicle, a human, and a traffic sign, respectively. In this case, class identification determines whether the object of interest is a vehicle, a human, or a traffic sign, or whether it is an object of a class other than the first to third classes (for example, a fourth class).

In class identification on the input image 610, the objects in the object regions 611B, 612B, and 613B are determined to be a vehicle (that is, a first-class object), a human (that is, a second-class object), and a traffic sign (that is, a third-class object), respectively, assuming that there is no error in the class identification.

When the object detector performs object detection on each input image, a large number of object images, each including image data of one object, are generated from the input images II[1] to II[L], and class information is assigned to each object image. The class information assigned to an object image is information that identifies the class of the object in that object image. Accordingly, assuming that there is no error in the class identification, the class information assigned to the object images 621, 622, and 623 represents the first, second, and third classes, respectively. Class information is also called a label. Information representing the result of the position identification is called object position information. The object position information for an input image identifies the position of the object region of each object in that input image (specifically, the coordinate values of both ends of one diagonal of the object region). Information including the class information and the object position information is called annotation information. In object detection by the object detector, annotation information is generated for each input image.

As shown in FIG. 4, it is assumed below that a total of H object images OI[1] to OI[H] are generated from the input images II[1] to II[L], and the class information assigned to the object image OI[i] is denoted by the symbol "CLS[i]". H is an arbitrary integer of 2 or more; because one or more object regions are assumed to be cut out from each input image, "H≧L" holds. In practice, H has a value of, for example, several thousand to several million. The image classification dataset includes the image data of the object images OI[1] to OI[H] and the class information CLS[1] to CLS[H], and in the image classification dataset the image data of the object image OI[i] and the class information CLS[i] are associated with each other. In addition, extraction source information indicating from which input image each object image was derived is shared and recognized by each part of the data processing device 1. Thus, for example, if the object images OI[1] to OI[5] were cut out from the input image II[1], the extraction source information specifies this. The extraction source information may be included in the image classification dataset.
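One possible in-memory layout for such a dataset is sketched below. The record type, its field names, and the pixel placeholder are assumptions made for illustration; Python list indices start at 0, whereas the symbols OI[i] and II[i] in the text start at 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectImageRecord:
    pixels: tuple      # image data of one object image (placeholder representation)
    class_label: int   # class information CLS[i] associated with that image
    source_input: int  # extraction source: number of the input image it was cut from


def group_by_source(records):
    """Map each input image number to the list indices of its object images."""
    groups = {}
    for index, record in enumerate(records):
        groups.setdefault(record.source_input, []).append(index)
    return groups


# Hypothetical dataset: three object images cut from input image 1, one from input image 2.
records = [
    ObjectImageRecord(pixels=((0,),), class_label=1, source_input=1),
    ObjectImageRecord(pixels=((1,),), class_label=2, source_input=1),
    ObjectImageRecord(pixels=((2,),), class_label=3, source_input=1),
    ObjectImageRecord(pixels=((3,),), class_label=1, source_input=2),
]
by_source = group_by_source(records)
```

Keeping the extraction source in each record makes it straightforward to trace a suspect object image back to its input image, and the grouping confirms that the number of object images is at least the number of input images in this example.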

The method of generating the image classification dataset has been described for the case where the acquisition unit 10 has an object detector, but it is not essential for the acquisition unit 10 to have one. For example, the image classification dataset may be generated by inputting annotation information into the data processing device 1 through manual operation by a human. That is, based on operation information given to the data processing device 1 by its administrator through a predetermined man-machine interface (not shown), an object region may be set for each input image and the class information of each object region (in other words, each object image) may be specified, whereby the annotation information for each input image is generated and the image classification dataset is generated and acquired. Alternatively, the acquisition unit 10 may receive a previously created dataset from an external device (not shown) different from the data processing device 1 through wired or wireless communication. The dataset in this case includes the image data of the input images II[1] to II[L] and the annotation information for each input image, and the acquisition unit 10 may generate and acquire the image classification dataset based on the received dataset. In the following description, unless otherwise specified, the image classification dataset refers to the image classification dataset acquired by the acquisition unit 10.

The machine learning processing unit 20 has a neural network 21 (hereinafter, NN 21) and executes machine learning processing. In the machine learning processing, the machine learning processing unit 20 creates an inference model that performs a predetermined inference by executing machine learning of the NN 21 using the image classification dataset. The machine learning here may be classified as deep learning, and the NN 21 may therefore be a deep neural network. The parameters (weights and biases) of the NN 21 are set appropriately during the machine learning. The NN 21 after its parameters (weights and biases) have been set appropriately corresponds to the inference model.

The inference unit 30 performs the predetermined inference using the inference model formed by the NN 21. The NN 21 is a neural network for image classification, and the inference of the inference model is therefore inference for image classification. That is, when given an object image including an object, the inference unit 30 performs, as the inference, class identification that identifies (determines) the class of the object included in the object image. In doing so, the inference unit 30 executes feature derivation processing, which derives from the object image a feature useful for class identification, and identifies (determines) the class of the object included in the object image based on the feature. The feature is a multidimensional vector quantity and may be read as a feature vector. The feature derivation processing is executed by a feature derivation unit 31 included in the inference unit 30. When the image classification dataset is input to the inference model of the NN 21, the inference model performs inference on each object image in the image classification dataset. During this inference, the feature derivation processing is executed for each object image in the image classification dataset, whereby the feature of each object image is derived.

The clustering processing unit 40 executes clustering processing based on the result of the feature derivation processing. That is, in the clustering processing, the clustering processing unit 40 clusters a plurality of object images in the image classification dataset based on the features derived for the object images in the image classification dataset. The clustering processing unit 40 may reduce the dimensionality of the features derived in the feature derivation processing and perform the clustering based on the features after dimensionality reduction. Dimensionality reduction can be performed using, for example, principal component analysis. In the following, a feature refers to the feature derived in the feature derivation processing itself if dimensionality reduction is not performed, and to the feature after dimensionality reduction if it is performed.

The clustering is executed based on the distances between the plurality of features derived for the plurality of object images in the image classification dataset. The distance between features is a distance in the feature space in which the features are defined, for example, the Euclidean distance or the cosine similarity. Through the clustering processing, object images in the image classification dataset that have similar features are classified into a common cluster. Hierarchical clustering may be used as the clustering method. However, the clustering may also be performed by non-hierarchical clustering (such as the k-means method).
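The two measures named above can be written out as a short sketch. Cosine similarity is a similarity (larger means more alike) rather than a distance, so the sketch also shows one common conversion, 1 minus the similarity; that conversion and the function names are assumptions for illustration, not something the text prescribes.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two feature vectors."""
    return math.dist(a, b)


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Assumed conversion of cosine similarity into a dissimilarity."""
    return 1.0 - cosine_similarity(a, b)


d = euclidean_distance((0.0, 0.0), (3.0, 4.0))
s = cosine_similarity((1.0, 0.0), (0.0, 1.0))
```

The Euclidean distance depends on vector magnitude, while the cosine measure depends only on direction, so the two can group the same features differently.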

ここで、クラスタリング処理部４０におけるクラスタリングは、当該画像分類データセットに含まれる全物体画像に対して一括して行われるものではない。本実施形態では、物体画像ＯＩ［１］～ＯＩ［Ｈ］を複数の組に分割し、組ごとに当該組に属する物体画像のクラスタリングを行う。換言すれば、入力画像［１］～入力画像［Ｌ］を複数の組に分割し、組ごとに、当該組に属する入力画像内の物体画像のクラスタリングを行う。このようなクラスタリングの具体例及び技術的意義については後述される。 The clustering in the clustering processing unit 40 is not performed collectively on all object images included in the image classification dataset. In this embodiment, the object images OI[1] to OI[H] are divided into a number of groups, and clustering of the object images belonging to each group is performed for each group. In other words, the input images [1] to [L] are divided into a number of groups, and clustering of the object images in the input images belonging to each group is performed for each group. Specific examples and technical significance of such clustering will be described later.
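One possible way to form such groups is sketched below. The record layout (`object_id`, `image_index`, `class`), the function name, and the rule of grouping a fixed number of consecutive input images are all hypothetical; the embodiment does not prescribe how the groups are formed.

```python
from collections import defaultdict

def split_into_evaluation_sets(object_records, images_per_group):
    """Group object records by consecutive blocks of input-image indices."""
    groups = defaultdict(list)
    for record in object_records:
        # Input images are numbered from 1, as in II[1] to II[L].
        group_id = (record["image_index"] - 1) // images_per_group
        groups[group_id].append(record)
    return [groups[g] for g in sorted(groups)]

# Hypothetical object images cut out from three input images.
records = [
    {"object_id": 1, "image_index": 1, "class": "vehicle"},
    {"object_id": 2, "image_index": 1, "class": "human"},
    {"object_id": 3, "image_index": 2, "class": "vehicle"},
    {"object_id": 4, "image_index": 3, "class": "human"},
]
evaluation_sets = split_into_evaluation_sets(records, images_per_group=2)
for group in evaluation_sets:
    print([r["object_id"] for r in group])
```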

画像分類データセットの作成には、入力画像内の物体の位置及び物体のクラスに対するアノテーションが必要である。画像分類データセットを物体検出器により作成する場合にも手動操作を通じて作成する場合にも、物体のクラスが誤ってアノテーションされる場合がある。つまり例えば、注目物体が本当は人間（即ち第２クラスの物体）であるのにかかわらず、当該注目物体が車両（即ち第１クラスの物体）であると誤ってクラス情報が付与され、結果、誤ったクラス情報が画像分類データセットに含まれることもある。 Creating an image classification dataset requires annotation of the location of objects in the input images and the object classes. Whether an image classification dataset is created using an object detector or through manual operations, the object classes may be incorrectly annotated. For example, an object of interest may be erroneously assigned as a vehicle (i.e., a first class object) when in fact it is a human (i.e., a second class object), resulting in incorrect class information being included in the image classification dataset.

指標導出部５０は、このような誤りを特定するために有益な指標を導出し、誤り特定部６０は、導出された指標に基づいて誤りのあるデータ等を特定する（詳細は後述）。 The index derivation unit 50 derives indices that are useful for identifying such errors, and the error identification unit 60 identifies erroneous data, etc. based on the derived indices (details will be described later).

学習用データセット生成部７０は、複数の学習用画像の画像データと各学習用画像内の物体の位置及びクラスを特定するアノテーション情報とを含む学習用データセットを生成する。 The training dataset generator 70 generates a training dataset that includes image data of multiple training images and annotation information that identifies the position and class of objects in each training image.

取得部１０にて取り扱われる入力画像ＩＩ［１］～ＩＩ［Ｌ］の画像データ及び各入力画像のアノテーション情報を含むデータセットを、便宜上、暫定データセットと称する。暫定データセットは、生成部７０にて生成されるべき学習用データセットの元となるデータセットであり、仮に、暫定データセットにおけるアノテーションに一切誤りがなければ暫定データセットそのものが学習用データセットとなる。但し、上述したように、膨大な数のアノテーションには誤りが含まれることがあり、生成部７０にて当該誤りを修正又は除去し、誤りの修正又は除去後の暫定データセットを学習用データセットとして生成する。 For convenience, the dataset including the image data of input images II[1] to II[L] handled by the acquisition unit 10 and the annotation information of each input image is referred to as the provisional dataset. The provisional dataset is the original dataset for the learning dataset to be generated by the generation unit 70, and if there are no errors in the annotations in the provisional dataset, the provisional dataset itself becomes the learning dataset. However, as described above, a huge number of annotations may contain errors, and the generation unit 70 corrects or removes the errors, and generates the provisional dataset after the errors have been corrected or removed as the learning dataset.

データベース８０は、不揮発性の記録媒体から成り、生成部７０の制御の下、生成部７０にて生成された学習用データセットを記憶する。暫定データセットもデータベース８０に記憶されていて良い。データベース８０に記憶された暫定データセットが生成部７０により学習用データセットに書き換え又は更新されると考えても良い。磁気ディスク、光ディスク若しくは半導体メモリ、又は、それらの組み合わせにてデータベース８０を構成することができる。尚、データベース８０はデータ処理装置１の外部に設けられた外部記録装置であっても良い。 The database 80 is made of a non-volatile recording medium, and stores the learning dataset generated by the generation unit 70 under the control of the generation unit 70. The provisional dataset may also be stored in the database 80. It may also be considered that the provisional dataset stored in the database 80 is rewritten or updated into the learning dataset by the generation unit 70. The database 80 may be configured from a magnetic disk, an optical disk, or a semiconductor memory, or a combination of these. The database 80 may also be an external recording device provided outside the data processing device 1.

以下、複数の実施例の中で、データ処理装置１に関わる幾つかの具体的な動作例、応用技術、変形技術等を説明する。本実施形態にて上述した事項は、特に記述無き限り且つ矛盾無き限り、以下の各実施例に適用される。各実施例において、上述の事項と矛盾する事項がある場合には、各実施例での記載が優先されて良い。また矛盾無き限り、以下に示す複数の実施例の内、任意の実施例に記載した事項を、他の任意の実施例に適用することもできる（即ち複数の実施例の内の任意の２以上の実施例を組み合わせることも可能である）。 Below, several specific examples of operation, application techniques, modification techniques, etc. related to the data processing device 1 will be described in multiple examples. The matters described above in this embodiment are applied to each of the following examples unless otherwise specified and unless there is a contradiction. If there are any matters in each example that contradict the matters described above, the description in each example may take precedence. Furthermore, unless there is a contradiction, the matters described in any of the multiple examples shown below can also be applied to any of the other examples (i.e., any two or more of the multiple examples can also be combined).

＜＜第１実施例＞＞ <<First Example>>
第１実施例を説明する。上述したように、クラスタリング処理部４０におけるクラスタリングは、当該画像分類データセットに含まれる全物体画像に対して一括して行われるものではなく、当該画像分類データセットに含まれる全物体画像の一部を含む評価データセットを単位にクラスタリングが行われる。 The first example will be described. As described above, the clustering in the clustering processing unit 40 is not performed collectively on all object images included in the image classification dataset; instead, clustering is performed in units of an evaluation dataset that includes a portion of all object images included in the image classification dataset.

まず、図５を参照し、物体画像ＯＩ［１］～ＯＩ［Ｈ］の一部である物体画像ＯＩ［１１］～ＯＩ［２０］の画像データを含んだ評価データセットＥＳ１に対するクラスタリングを説明する。評価データセットＥＳ１は、画像分類データセットから抽出された、物体画像ＯＩ［１１］～ＯＩ［２０］のクラス情報（即ちＣＬＳ［１１］～ＣＬＳ［２０］；図４参照）を含む。物体画像ＯＩ［１１］～ＯＩ［２０］は１又は複数の入力画像ＩＩ［Ｊ_Ａ１］～ＩＩ［Ｊ_Ａ２］から切り出された物体画像であるとする。“０＜Ｊ_Ａ１≦Ｊ_Ａ２＜Ｌ”であり、差（Ｊ_Ａ２－Ｊ_Ａ１）は０又は１以上の整数値を持つ。 First, referring to FIG. 5, clustering for an evaluation dataset ES1 including image data of object images OI[11] to OI[20], which are part of the object images OI[1] to OI[H], will be described. The evaluation dataset ES1 includes the class information of the object images OI[11] to OI[20] (i.e., CLS[11] to CLS[20]; see FIG. 4) extracted from the image classification dataset. Assume that the object images OI[11] to OI[20] are object images cut out from one or more input images II[J_A1] to II[J_A2]. Here "0 < J_A1 ≦ J_A2 < L" holds, and the difference (J_A2 - J_A1) is an integer equal to or greater than 0.

物体画像ＯＩ［１１］～ＯＩ［１４］の夫々に含まれる物体は車両であって、物体画像ＯＩ［１５］～ＯＩ［２０］の夫々に含まれる物体は人間である。画像分類データセット及び評価データセットＥＳ１において、物体画像ＯＩ［１１］～ＯＩ［１４］の各クラス情報は車両に対応する第１クラスを指し示し、物体画像ＯＩ［１５］～ＯＩ［２０］の各クラス情報は人間に対応する第２クラスを指し示す。これらのクラス情報に誤りはない。各物体画像に対して特徴量導出処理が実行されることで各物体画像の特徴量が導出される。以下、物体画像ＯＩ［ｉ］の特徴量を記号“ＦＴ［ｉ］”にて参照する。 The object contained in each of object images OI[11] to OI[14] is a vehicle, and the object contained in each of object images OI[15] to OI[20] is a human. In the image classification dataset and evaluation dataset ES1, the class information of each of object images OI[11] to OI[14] indicates a first class corresponding to a vehicle, and the class information of each of object images OI[15] to OI[20] indicates a second class corresponding to a human. There is no error in this class information. A feature derivation process is performed on each object image to derive the features of each object image. Hereinafter, the feature of object image OI[i] will be referred to by the symbol "FT[i]".

図６は、物体画像ＯＩ［１１］～ＯＩ［２０］に対応する特徴量ＦＴ［１１］～ＦＴ［２０］を特徴空間上にプロットした様子を示す図である。実際には、各特徴量は多次元ベクトル量であるが、図６では各特徴量が二次元ベクトル量であるかのように示している。特徴量が特徴空間にプロットされる様子を示す後述の他の図においても同様である。但し、上述の次元削減により特徴量の次元数を“２”にまで削減するようにしても良い。 Figure 6 shows feature quantities FT[11] to FT[20] corresponding to object images OI[11] to OI[20] plotted in feature space. In reality, each feature quantity is a multidimensional vector quantity, but in Figure 6, each feature quantity is shown as if it were a two-dimensional vector quantity. This is also true for other figures described below that show how feature quantities are plotted in feature space. However, the number of dimensions of the feature quantities may be reduced to "2" by the above-mentioned dimensionality reduction.

尚、以下では、任意の自然数ｕについて、物体画像ＯＩ［ｕ］が或る注目したクラスタに分類される又は属することを、特徴量ＦＴ［ｕ］が当該注目したクラスタに分類される又は属すると表現することもある。即ち、物体画像ＯＩ［ｕ］が当該注目したクラスタに分類される又は属することを、特徴量ＦＴ［ｕ］が当該注目したクラスタに分類される又は属することは等価である。 In the following, for any natural number u, the fact that the object image OI[u] is classified into or belongs to a certain cluster of interest may be expressed as the feature FT[u] being classified into or belonging to the cluster of interest. In other words, the fact that the object image OI[u] is classified into or belongs to the cluster of interest is equivalent to the fact that the feature FT[u] is classified into or belongs to the cluster of interest.

クラスタリング処理部４０は、画像分類データセット（評価データセットＥＳ１）における物体画像ＯＩ［１１］～ＯＩ［２０］のクラス情報に基づき、物体画像ＯＩ［１１］～ＯＩ［２０］をｋ_Ａ個のクラスタに分類する（即ち、クラスタ数をｋ_Ａに設定した上でクラスタリングを行う）。ｋ_Ａは、画像分類データセットの物体画像ＯＩ［１１］～ＯＩ［２０］へのクラス情報により表されるクラスの種類数（即ちクラス情報により表される物体画像ＯＩ［１１］～ＯＩ［２０］の物体の種類数）である。換言すれば、ｋ_Ａは、物体画像ＯＩ［１１］～ＯＩ［２０］に付与された計１０個のクラス情報が指し示すクラスの種類数（総数）に等しい。図５の例では、物体画像ＯＩ［１１］～ＯＩ［２０］に付与された計１０個のクラス情報が指し示すクラスは第１クラス及び第２クラスの２つであるため“ｋ_Ａ＝２”である。クラスタリングとして非階層型クラスタリングを行う場合にはクラスタの総数を“ｋ_Ａ＝２”に設定した上でクラスタリングを行えば良く、クラスタリングとして階層型クラスタリングを行う場合にはクラスタの総数が“ｋ_Ａ＝２”となった時点でクラスタリングを止めれば良い。 The clustering processing unit 40 classifies the object images OI[11] to OI[20] into k_A clusters based on the class information of the object images OI[11] to OI[20] in the image classification dataset (evaluation dataset ES1) (i.e., clustering is performed with the number of clusters set to k_A). k_A is the number of distinct classes represented by the class information for the object images OI[11] to OI[20] in the image classification dataset (i.e., the number of object types in the object images OI[11] to OI[20] according to the class information). In other words, k_A is equal to the total number of distinct classes indicated by the 10 pieces of class information assigned to the object images OI[11] to OI[20]. In the example of FIG. 5, the 10 pieces of class information assigned to the object images OI[11] to OI[20] indicate two classes, the first class and the second class, so "k_A = 2". When non-hierarchical clustering is used, the clustering is performed with the total number of clusters set to "k_A = 2"; when hierarchical clustering is used, the clustering is stopped when the total number of clusters reaches "k_A = 2".
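The step of fixing the number of clusters from the annotated class information and then stopping hierarchical clustering at that number can be sketched as follows. This is a minimal illustration, assuming agglomerative merging by centroid distance (the linkage criterion is not specified in the embodiment); the function names and the two-dimensional toy features are hypothetical.

```python
import math

def number_of_clusters(class_labels):
    """k is the number of distinct classes named by the annotations."""
    return len(set(class_labels))

def agglomerative_clustering(features, k):
    """Merge the two closest clusters (centroid distance) until k remain."""
    clusters = [[i] for i in range(len(features))]

    def centroid(members):
        dim = len(features[0])
        return [sum(features[i][d] for i in members) / len(members)
                for d in range(dim)]

    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = math.dist(centroid(clusters[a]), centroid(clusters[b]))
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Toy stand-ins for FT[11] to FT[20]: four "vehicle" and six "human" features.
labels = ["vehicle"] * 4 + ["human"] * 6
features = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5), (9.5, 10)]
k = number_of_clusters(labels)
clusters = agglomerative_clustering(features, k)
print(k, clusters)
```

Note that k is derived from the annotated labels, not from the true object types, so a mislabeled object image does not add a cluster of its own; it is absorbed into one of the k clusters.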

図６に示す如く、特徴量ＦＴ［１１］～ＦＴ［２０］に対しては２つのクラスタＣＴ_Ａ１及びＣＴ_Ａ２が設定される。特徴量ＦＴ［１１］～ＦＴ［１４］はクラスタＣＴ_Ａ１に分類され、特徴量ＦＴ［１５］～ＦＴ［２０］はクラスタＣＴ_Ａ２に分類される。図６において、点Ｇ_Ａ１はクラスタＣＴ_Ａ１に属する特徴量ＦＴ［１１］～ＦＴ［１４］の重心を表し、点Ｇ_Ａ２はクラスタＣＴ_Ａ２に属する特徴量ＦＴ［１５］～ＦＴ［２０］の重心を表す。 As shown in FIG. 6, two clusters CT_A1 and CT_A2 are set for the features FT[11] to FT[20]. The features FT[11] to FT[14] are classified into the cluster CT_A1, and the features FT[15] to FT[20] are classified into the cluster CT_A2. In FIG. 6, point G_A1 represents the center of gravity of the features FT[11] to FT[14] belonging to the cluster CT_A1, and point G_A2 represents the center of gravity of the features FT[15] to FT[20] belonging to the cluster CT_A2.

指標導出部５０は、クラスタリング処理部４０により生成されたクラスタごとに、当該クラスタの重心を導出し且つ当該重心と当該クラスタに属する各特徴量との距離を導出する。重心及び特徴量間の距離は、２つの特徴量間の距離と同様に、特徴量が定義される特徴空間上の距離であって、例えば、ユークリッド距離又はコサイン類似度である。特徴量ＦＴ［１１］～ＦＴ［２０］に対して導出された距離を、夫々、記号“ｄ［１１］～ｄ［２０］”にて参照する。距離ｄ［１１］～ｄ［１４］は、夫々、重心Ｇ_Ａ１と特徴量ＦＴ［１１］～ＦＴ［１４］との距離を表す。距離ｄ［１５］～ｄ［２０］は、夫々、重心Ｇ_Ａ２と特徴量ＦＴ［１５］～ＦＴ［２０］との距離を表す。 The index derivation unit 50 derives the center of gravity of each cluster generated by the clustering processing unit 40 and derives the distance between the center of gravity and each feature amount belonging to the cluster. The distance between the center of gravity and the feature amount is a distance in the feature space in which the feature amount is defined, similar to the distance between two feature amounts, and is, for example, Euclidean distance or cosine similarity. The distances derived for the feature amounts FT[11] to FT[20] are referred to by the symbols "d[11] to d[20]". The distances d[11] to d[14] respectively represent the distance between the center of gravity G _A1 and the feature amounts FT[11] to FT[14]. The distances d[15] to d[20] respectively represent the distance between the center of gravity G _A2 and the feature amounts FT[15] to FT[20].

指標導出部５０は、更に、クラスタリング処理部４０により生成されたクラスタごとに、導出した複数の距離の平均及び分散を導出する。指標導出部５０にて導出される平均及び分散を、以下では、距離平均、距離分散と称する。クラスタＣＴ_Ａ１に対して導出される距離平均及び距離分散を、夫々、記号“ＡＶＥ_Ａ１”及び“σ^２ _Ａ１”にて参照する。そうすると、距離平均ＡＶＥ_Ａ１は距離ｄ［１１］～ｄ［１４］の平均であり、距離分散σ^２ _Ａ１は距離ｄ［１１］～ｄ［１４］の分散である。同様に、クラスタＣＴ_Ａ２に対して導出される距離平均及び距離分散を、夫々、記号“ＡＶＥ_Ａ２”及び“σ^２ _Ａ２”にて参照する。そうすると、距離平均ＡＶＥ_Ａ２は距離ｄ［１５］～ｄ［２０］の平均であり、距離分散σ^２ _Ａ２は距離ｄ［１５］～ｄ［２０］の分散である。 The index derivation unit 50 further derives, for each cluster generated by the clustering processing unit 40, the average and variance of the derived distances. The average and variance derived by the index derivation unit 50 are hereinafter referred to as the distance average and the distance variance. The distance average and distance variance derived for the cluster CT_A1 are referred to by the symbols "AVE_A1" and "σ²_A1", respectively. The distance average AVE_A1 is then the average of the distances d[11] to d[14], and the distance variance σ²_A1 is the variance of the distances d[11] to d[14]. Similarly, the distance average and distance variance derived for the cluster CT_A2 are referred to by the symbols "AVE_A2" and "σ²_A2", respectively. The distance average AVE_A2 is then the average of the distances d[15] to d[20], and the distance variance σ²_A2 is the variance of the distances d[15] to d[20].
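The derivation of the center of gravity, the per-feature distances, and the distance average and distance variance can be sketched for one cluster as follows. The sketch assumes Euclidean distance and the population variance (the embodiment does not state whether a population or sample variance is intended); the function name and toy features are hypothetical.

```python
import math
from statistics import fmean, pvariance

def distance_statistics(cluster_features):
    """Return (centroid, distance average, distance variance) for one cluster."""
    dim = len(cluster_features[0])
    centroid = [fmean(f[d] for f in cluster_features) for d in range(dim)]
    # One distance per feature, measured from the cluster's center of gravity.
    distances = [math.dist(f, centroid) for f in cluster_features]
    return centroid, fmean(distances), pvariance(distances)

# A compact cluster (one true class) versus a cluster that mixes two classes.
pure = [(0, 0), (2, 0), (0, 2), (2, 2)]
mixed = pure + [(10, 10), (12, 10)]
_, pure_mean, pure_var = distance_statistics(pure)
_, mixed_mean, mixed_var = distance_statistics(mixed)
print(pure_mean, pure_var)
print(mixed_mean, mixed_var)
```

In this toy data the mixed cluster yields both a larger distance average and a larger distance variance than the compact one, which is the property the later error determination relies on.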

誤り特定部６０は、指標導出部５０にてクラスタごとに導出された距離平均及び距離分散に基づいて、各クラスタが誤りクラスタであるか否かを判定する。誤りクラスタとは、クラス情報に誤りのある物体画像が属するクラスタである。クラス情報に誤りのある物体画像とは、換言すれば、誤りのあるクラス情報が付与された物体画像であり、以下、誤りラベル付き物体画像とも称する。これとは逆に、正しいクラス情報が付与された物体画像を正解ラベル付き物体画像と称する。また、以下では、誤り特定部６０にて誤りクラスタであると特定されなかったクラスタを正解クラスタと称することがある。誤り特定部６０は、各クラスタが正解クラスタ及び誤りクラスタの何れであるかを判定する機能を有している、とも言える。 The error identification unit 60 judges whether each cluster is an error cluster based on the distance average and distance variance derived for each cluster by the index derivation unit 50. An error cluster is a cluster to which an object image with erroneous class information belongs. An object image with erroneous class information is, in other words, an object image to which erroneous class information has been assigned, and is hereinafter also referred to as an object image with an error label. Conversely, an object image to which correct class information has been assigned is referred to as an object image with a correct label. In addition, hereinafter, a cluster that has not been identified as an error cluster by the error identification unit 60 may be referred to as a correct cluster. It can also be said that the error identification unit 60 has a function of judging whether each cluster is a correct cluster or an error cluster.

誤り特定部６０は、クラスタリング処理部４０により生成されたクラスタごとに、距離平均及び距離分散を夫々所定の閾値ＴＨ_１及びＴＨ_２と比較し、その比較結果に基づき所定の誤り条件を満たすクラスタを誤りクラスタとして特定する。或るクラスタについて、対応する距離平均が所定の閾値ＴＨ_１以上であって且つ対応する距離分散が所定の閾値ＴＨ_２以上であるとき、当該クラスタについて誤り条件が満たされる。 For each cluster generated by the clustering processing unit 40, the error identification unit 60 compares the distance average and the distance variance with predetermined thresholds TH_1 and TH_2, respectively, and identifies a cluster that satisfies a predetermined error condition as an error cluster based on the comparison result. For a given cluster, the error condition is satisfied when the corresponding distance average is equal to or greater than the predetermined threshold TH_1 and the corresponding distance variance is equal to or greater than the predetermined threshold TH_2.
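The error condition stated above reduces to a conjunction of two comparisons, which can be sketched as follows. The function name and the numeric values are hypothetical and chosen only for illustration.

```python
def is_error_cluster(distance_mean, distance_variance, th_1, th_2):
    """Error condition: mean >= TH_1 and variance >= TH_2 must both hold."""
    return distance_mean >= th_1 and distance_variance >= th_2

# Invented threshold values for illustration only.
TH_1, TH_2 = 2.0, 0.5
print(is_error_cluster(1.2, 0.1, TH_1, TH_2))  # both below the thresholds
print(is_error_cluster(3.0, 0.9, TH_1, TH_2))  # both at or above the thresholds
print(is_error_cluster(3.0, 0.1, TH_1, TH_2))  # only the mean is large
```

Because both comparisons must hold, a cluster with a large distance average but a small distance variance is not identified as an error cluster under this condition.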

閾値ＴＨ_１及びＴＨ_２を以下のように設定して良い。即ち、各々に認識対象物体を含む複数の閾値設定用物体画像を予め用意しておき、複数の閾値設定用物体画像に対するクラスタリング結果に基づいて閾値ＴＨ_１及びＴＨ_２を設定して良い。その際、クラスタリング処理部４０は、複数の閾値設定用物体画像を複数のクラスタに分類する。次に、指標導出部５０は、クラスタごとに、複数の距離の平均及び分散を導出する。次に、複数の閾値設定用物体画像について、導出された複数の距離の平均に適当なオフセット値（０でも可）を加えた値を閾値ＴＨ_１に設定し、導出された複数の距離の分散に適当なオフセット値（０でも可）を加えた値を閾値ＴＨ_２に設定する。そして、設定した閾値ＴＨ_１及びＴＨ_２を、対応するクラスタ（同じ種別の物体に関するクラスタ）に対する上記比較に使用して良い。複数の閾値設定用物体画像の画像データは、既存且つ任意のデータセットから得られるものであって良く、例えば、Microsoft COCOデータセットに含まれる画像データであって良い。 The thresholds TH_1 and TH_2 may be set as follows. That is, a plurality of threshold-setting object images, each of which includes a recognition target object, may be prepared in advance, and the thresholds TH_1 and TH_2 may be set based on the clustering result for the plurality of threshold-setting object images. In that case, the clustering processing unit 40 classifies the plurality of threshold-setting object images into a plurality of clusters. Next, the index derivation unit 50 derives the average and variance of the distances for each cluster. Then, for the plurality of threshold-setting object images, the value obtained by adding an appropriate offset value (which may be 0) to the derived average of the distances is set as the threshold TH_1, and the value obtained by adding an appropriate offset value (which may be 0) to the derived variance of the distances is set as the threshold TH_2. The thresholds TH_1 and TH_2 set in this way may be used in the above comparison for the corresponding cluster (a cluster relating to the same type of object). The image data of the plurality of threshold-setting object images may be obtained from any existing dataset, and may be, for example, image data included in the Microsoft COCO dataset.
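The threshold setting described above can be sketched as follows, assuming that the distance average and distance variance of each reference cluster have already been derived. The per-class dictionary layout, the function name, and all numeric values (reference statistics and offsets) are invented for illustration.

```python
def thresholds_from_reference(reference_mean, reference_variance,
                              mean_offset=0.0, variance_offset=0.0):
    """TH_1 and TH_2 are the reference statistics plus optional offsets."""
    th_1 = reference_mean + mean_offset
    th_2 = reference_variance + variance_offset
    return th_1, th_2

# Invented reference statistics per object type: (distance mean, distance variance).
reference_stats = {"vehicle": (1.0, 0.20), "human": (1.5, 0.30)}
thresholds = {
    cls: thresholds_from_reference(mean, var, mean_offset=0.5, variance_offset=0.1)
    for cls, (mean, var) in reference_stats.items()
}
print(thresholds)
```

Keeping one threshold pair per object type reflects the statement that the thresholds are applied to the corresponding cluster for the same type of object; an offset of 0 simply reuses the reference statistics unchanged.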

クラスタＣＴ_Ａ１については距離平均ＡＶＥ_Ａ１及び距離分散σ^２ _Ａ１が夫々閾値ＴＨ_１及びＴＨ_２と比較され、クラスタＣＴ_Ａ２については距離平均ＡＶＥ_Ａ２及び距離分散σ^２ _Ａ２が夫々閾値ＴＨ_１及びＴＨ_２と比較される。図５及び図６の例では、画像分類データセットにおける物体画像ＯＩ［１１］～ＯＩ［２０］のクラス情報に誤りはない。このため、クラスタＣＴ_Ａ１における距離平均ＡＶＥ_Ａ１及び距離分散σ^２ _Ａ１は閾値ＴＨ_１及びＴＨ_２より小さく、且つ、クラスタＣＴ_Ａ２における距離平均ＡＶＥ_Ａ２及び距離分散σ^２ _Ａ２も閾値ＴＨ_１及びＴＨ_２より小さい。故に、クラスタＣＴ_Ａ１及びＣＴ_Ａ２は何れも誤りクラスタとして特定されない（換言すれば正解クラスタであると判定される）。 For the cluster CT_A1, the distance average AVE_A1 and the distance variance σ²_A1 are compared with the thresholds TH_1 and TH_2, respectively, and for the cluster CT_A2, the distance average AVE_A2 and the distance variance σ²_A2 are compared with the thresholds TH_1 and TH_2, respectively. In the example of FIGS. 5 and 6, there is no error in the class information of the object images OI[11] to OI[20] in the image classification dataset. Therefore, the distance average AVE_A1 and the distance variance σ²_A1 of the cluster CT_A1 are smaller than the thresholds TH_1 and TH_2, respectively, and the distance average AVE_A2 and the distance variance σ²_A2 of the cluster CT_A2 are also smaller than the thresholds TH_1 and TH_2, respectively. Consequently, neither the cluster CT_A1 nor the cluster CT_A2 is identified as an error cluster (in other words, both are determined to be correct clusters).

次に、図７を参照し、物体画像ＯＩ［１］～ＯＩ［Ｈ］の一部である物体画像ＯＩ［３１］～ＯＩ［４０］の画像データを含んだ評価データセットＥＳ２に対するクラスタリングを説明する。評価データセットＥＳ２は、画像分類データセットから抽出された、物体画像ＯＩ［３１］～ＯＩ［４０］のクラス情報（即ちＣＬＳ［３１］～ＣＬＳ［４０］；図４参照）を含む。物体画像ＯＩ［３１］～ＯＩ［４０］は１又は複数の入力画像ＩＩ［Ｊ_Ｂ１］～ＩＩ［Ｊ_Ｂ２］から切り出された物体画像であるとする。“Ｊ_Ａ２＜Ｊ_Ｂ１≦Ｊ_Ｂ２＜Ｌ”であり、差（Ｊ_Ｂ２－Ｊ_Ｂ１）は０又は１以上の整数値を持つ。 Next, referring to FIG. 7, clustering for an evaluation dataset ES2 including image data of object images OI[31] to OI[40], which are part of the object images OI[1] to OI[H], will be described. The evaluation dataset ES2 includes the class information of the object images OI[31] to OI[40] (i.e., CLS[31] to CLS[40]; see FIG. 4) extracted from the image classification dataset. Assume that the object images OI[31] to OI[40] are object images cut out from one or more input images II[J_B1] to II[J_B2]. Here "J_A2 < J_B1 ≦ J_B2 < L" holds, and the difference (J_B2 - J_B1) is an integer equal to or greater than 0.

物体画像ＯＩ［３１］～ＯＩ［３４］の夫々に含まれる物体は車両であり、物体画像ＯＩ［３５］～ＯＩ［３８］の夫々に含まれる物体は人間であり、物体画像ＯＩ［３９］及びＯＩ［４０］の夫々に含まれる物体は交通標識である。画像分類データセット及び評価データセットＥＳ２において、物体画像ＯＩ［３１］～ＯＩ［３４］の各クラス情報は車両に対応する第１クラスを指し示し、物体画像ＯＩ［３５］～ＯＩ［３８］の各クラス情報は人間に対応する第２クラスを指し示す。これらのクラス情報に誤りはない。但し、画像分類データセット及び評価データセットＥＳ２において、物体画像ＯＩ［３９］及びＯＩ［４０］のクラス情報は人間に対応する第２クラスを指し示しており、故に物体画像ＯＩ［３９］及びＯＩ［４０］のクラス情報には誤りがある。 The object included in each of the object images OI[31] to OI[34] is a vehicle, the object included in each of the object images OI[35] to OI[38] is a human, and the object included in each of the object images OI[39] and OI[40] is a traffic sign. In the image classification dataset and the evaluation dataset ES2, the class information of each of the object images OI[31] to OI[34] indicates a first class corresponding to a vehicle, and the class information of each of the object images OI[35] to OI[38] indicates a second class corresponding to a human. There is no error in this class information. However, in the image classification dataset and the evaluation dataset ES2, the class information of the object images OI[39] and OI[40] indicates a second class corresponding to a human, and therefore the class information of the object images OI[39] and OI[40] contains an error.

図８は、物体画像ＯＩ［３１］～ＯＩ［４０］に対応する特徴量ＦＴ［３１］～ＦＴ［４０］を特徴空間上にプロットした様子を示す図である。クラスタリング処理部４０は、画像分類データセット（評価データセットＥＳ２）における物体画像ＯＩ［３１］～ＯＩ［４０］のクラス情報に基づき、物体画像ＯＩ［３１］～ＯＩ［４０］をｋ_Ｂ個のクラスタに分類する（即ち、クラスタ数をｋ_Ｂに設定した上でクラスタリングを行う）。ｋ_Ｂは、画像分類データセットの物体画像ＯＩ［３１］～ＯＩ［４０］へのクラス情報により表されるクラスの種類数（即ちクラス情報により表される物体画像ＯＩ［３１］～ＯＩ［４０］の物体の種類数）である。換言すれば、ｋ_Ｂは、物体画像ＯＩ［３１］～ＯＩ［４０］に付与された計１０個のクラス情報が指し示すクラスの種類数（総数）に等しい。図７の例では、物体画像ＯＩ［３１］～ＯＩ［４０］に付与された計１０個のクラス情報が指し示すクラスは、本来は第１～第３クラスの３つとなるべきところ、実際には誤って第１クラス及び第２クラスの２つとなっているため“ｋ_Ｂ＝２”である。クラスタリングとして非階層型クラスタリングを行う場合にはクラスタの総数を“ｋ_Ｂ＝２”に設定した上でクラスタリングを行えば良く、クラスタリングとして階層型クラスタリングを行う場合にはクラスタの総数が“ｋ_Ｂ＝２”となった時点でクラスタリングを止めれば良い。 FIG. 8 is a diagram showing the features FT[31] to FT[40] corresponding to the object images OI[31] to OI[40] plotted in the feature space. The clustering processing unit 40 classifies the object images OI[31] to OI[40] into k_B clusters based on the class information of the object images OI[31] to OI[40] in the image classification dataset (evaluation dataset ES2) (i.e., clustering is performed with the number of clusters set to k_B). k_B is the number of distinct classes represented by the class information for the object images OI[31] to OI[40] in the image classification dataset (i.e., the number of object types in the object images OI[31] to OI[40] according to the class information). In other words, k_B is equal to the total number of distinct classes indicated by the 10 pieces of class information assigned to the object images OI[31] to OI[40]. In the example of FIG. 7, the 10 pieces of class information assigned to the object images OI[31] to OI[40] should indicate three classes, the first to third classes, but actually indicate only two, the first class and the second class, because of the annotation error, so "k_B = 2". When non-hierarchical clustering is used, the clustering is performed with the total number of clusters set to "k_B = 2"; when hierarchical clustering is used, the clustering is stopped when the total number of clusters reaches "k_B = 2".

図８に示す如く、特徴量ＦＴ［３１］～ＦＴ［４０］に対しては２つのクラスタＣＴ_Ｂ１及びＣＴ_Ｂ２が設定される。ここでは、特徴量ＦＴ［３１］～ＦＴ［３４］はクラスタＣＴ_Ｂ１に分類され、特徴量ＦＴ［３５］～ＦＴ［４０］はクラスタＣＴ_Ｂ２に分類されたとする。図８において、点Ｇ_Ｂ１はクラスタＣＴ_Ｂ１に属する特徴量ＦＴ［３１］～ＦＴ［３４］の重心を表し、点Ｇ_Ｂ２はクラスタＣＴ_Ｂ２に属する特徴量ＦＴ［３５］～ＦＴ［４０］の重心を表す。 As shown in FIG. 8, two clusters CT_B1 and CT_B2 are set for the features FT[31] to FT[40]. Here, it is assumed that the features FT[31] to FT[34] are classified into the cluster CT_B1 and the features FT[35] to FT[40] are classified into the cluster CT_B2. In FIG. 8, point G_B1 represents the center of gravity of the features FT[31] to FT[34] belonging to the cluster CT_B1, and point G_B2 represents the center of gravity of the features FT[35] to FT[40] belonging to the cluster CT_B2.

指標導出部５０は、上述したように、クラスタリング処理部４０により生成されたクラスタごとに、当該クラスタの重心を導出し且つ当該重心と当該クラスタに属する各特徴量との距離を導出する。特徴量ＦＴ［３１］～ＦＴ［４０］に対して導出された距離を、夫々、記号“ｄ［３１］～ｄ［４０］”にて参照する。距離ｄ［３１］～ｄ［３４］は、夫々、重心Ｇ_Ｂ１と特徴量ＦＴ［３１］～ＦＴ［３４］との距離を表す。距離ｄ［３５］～ｄ［４０］は、夫々、重心Ｇ_Ｂ２と特徴量ＦＴ［３５］～ＦＴ［４０］との距離を表す。 As described above, the index derivation unit 50 derives the center of gravity of each cluster generated by the clustering processing unit 40, and derives the distance between the center of gravity and each feature belonging to the cluster. The distances derived for the features FT[31] to FT[40] are referred to by the symbols "d[31] to d[40]", respectively. The distances d[31] to d[34] represent the distances between the center of gravity G _B1 and the features FT[31] to FT[34], respectively. The distances d[35] to d[40] represent the distances between the center of gravity G _B2 and the features FT[35] to FT[40], respectively.

指標導出部５０は、更に上述の如くクラスタごとに距離平均及び距離分散を導出する。クラスタＣＴ_Ｂ１に対して導出される距離平均及び距離分散を、夫々、記号“ＡＶＥ_Ｂ１”及び“σ^２ _Ｂ１”にて参照する。そうすると、距離平均ＡＶＥ_Ｂ１は距離ｄ［３１］～ｄ［３４］の平均であり、距離分散σ^２ _Ｂ１は距離ｄ［３１］～ｄ［３４］の分散である。同様に、クラスタＣＴ_Ｂ２に対して導出される距離平均及び距離分散を、夫々、記号“ＡＶＥ_Ｂ２”及び“σ^２ _Ｂ２”にて参照する。そうすると、距離平均ＡＶＥ_Ｂ２は距離ｄ［３５］～ｄ［４０］の平均であり、距離分散σ^２ _Ｂ２は距離ｄ［３５］～ｄ［４０］の分散である。 The index derivation unit 50 further derives the distance average and distance variance for each cluster as described above. The distance average and distance variance derived for the cluster CT_B1 are referred to by the symbols "AVE_B1" and "σ²_B1", respectively. The distance average AVE_B1 is then the average of the distances d[31] to d[34], and the distance variance σ²_B1 is the variance of the distances d[31] to d[34]. Similarly, the distance average and distance variance derived for the cluster CT_B2 are referred to by the symbols "AVE_B2" and "σ²_B2", respectively. The distance average AVE_B2 is then the average of the distances d[35] to d[40], and the distance variance σ²_B2 is the variance of the distances d[35] to d[40].

誤り特定部６０は、クラスタＣＴ_Ｂ１及びＣＴ_Ｂ２の夫々について誤り条件の成否を判定する。クラスタＣＴ_Ｂ１については距離平均ＡＶＥ_Ｂ１及び距離分散σ^２ _Ｂ１が夫々閾値ＴＨ_１及びＴＨ_２と比較され、クラスタＣＴ_Ｂ２については距離平均ＡＶＥ_Ｂ２及び距離分散σ^２ _Ｂ２が夫々閾値ＴＨ_１及びＴＨ_２と比較される。図７及び図８の例では、クラスタＣＴ_Ｂ１における距離平均ＡＶＥ_Ｂ１及び距離分散σ^２ _Ｂ１は夫々閾値ＴＨ_１及びＴＨ_２より小さく、結果、クラスタＣＴ_Ｂ１は誤りクラスタとして特定されない（換言すれば正解クラスタであると判定される）。但し、クラスタＣＴ_Ｂ２における距離平均ＡＶＥ_Ｂ２及び距離分散σ^２ _Ｂ２は夫々閾値ＴＨ_１及びＴＨ_２より大きい。このため、クラスタＣＴ_Ｂ２は誤り条件を満たし、結果、クラスタＣＴ_Ｂ２は誤りクラスタとして特定される。 The error identification unit 60 determines whether the error condition is satisfied for each of the clusters CT_B1 and CT_B2. For the cluster CT_B1, the distance average AVE_B1 and the distance variance σ²_B1 are compared with the thresholds TH_1 and TH_2, respectively, and for the cluster CT_B2, the distance average AVE_B2 and the distance variance σ²_B2 are compared with the thresholds TH_1 and TH_2, respectively. In the example of FIGS. 7 and 8, the distance average AVE_B1 and the distance variance σ²_B1 of the cluster CT_B1 are smaller than the thresholds TH_1 and TH_2, respectively, and as a result the cluster CT_B1 is not identified as an error cluster (in other words, it is determined to be a correct cluster). However, the distance average AVE_B2 and the distance variance σ²_B2 of the cluster CT_B2 are larger than the thresholds TH_1 and TH_2, respectively. Therefore, the cluster CT_B2 satisfies the error condition and is identified as an error cluster.

次に、図９を参照し、物体画像ＯＩ［１］～ＯＩ［Ｈ］の一部である物体画像ＯＩ［５１］～ＯＩ［６０］の画像データを含んだ評価データセットＥＳ３に対するクラスタリングを説明する。評価データセットＥＳ３は、画像分類データセットから抽出された、物体画像ＯＩ［５１］～ＯＩ［６０］のクラス情報（即ちＣＬＳ［５１］～ＣＬＳ［６０］；図４参照）を含む。物体画像ＯＩ［５１］～ＯＩ［６０］は１又は複数の入力画像ＩＩ［Ｊ_Ｃ１］～ＩＩ［Ｊ_Ｃ２］から切り出された物体画像であるとする。“Ｊ_Ｂ２＜Ｊ_Ｃ１≦Ｊ_Ｃ２＜Ｌ”であり、差（Ｊ_Ｃ２－Ｊ_Ｃ１）は０又は１以上の整数値を持つ。 Next, referring to FIG. 9, clustering for an evaluation dataset ES3 including image data of object images OI[51] to OI[60], which are part of the object images OI[1] to OI[H], will be described. The evaluation dataset ES3 includes the class information of the object images OI[51] to OI[60] (i.e., CLS[51] to CLS[60]; see FIG. 4) extracted from the image classification dataset. Assume that the object images OI[51] to OI[60] are object images cut out from one or more input images II[J_C1] to II[J_C2]. Here "J_B2 < J_C1 ≦ J_C2 < L" holds, and the difference (J_C2 - J_C1) is an integer equal to or greater than 0.

物体画像ＯＩ［５１］～ＯＩ［５８］の夫々に含まれる物体は車両であり、物体画像ＯＩ［５９］及びＯＩ［６０］の夫々に含まれる物体は人間である。画像分類データセット及び評価データセットＥＳ３において、物体画像ＯＩ［５１］～ＯＩ［５８］の各クラス情報は車両に対応する第１クラスを指し示す。これらのクラス情報に誤りはない。但し、画像分類データセット及び評価データセットＥＳ３において、物体画像ＯＩ［５９］及びＯＩ［６０］のクラス情報も車両に対応する第１クラスを指し示しており、故に物体画像ＯＩ［５９］及びＯＩ［６０］のクラス情報には誤りがある。 The object contained in each of object images OI[51] to OI[58] is a vehicle, and the object contained in each of object images OI[59] and OI[60] is a human. In the image classification dataset and evaluation dataset ES3, the class information of each of object images OI[51] to OI[58] indicates the first class corresponding to a vehicle. There is no error in this class information. However, in the image classification dataset and evaluation dataset ES3, the class information of object images OI[59] and OI[60] also indicates the first class corresponding to a vehicle, and therefore the class information of object images OI[59] and OI[60] contains an error.

図１０は、物体画像ＯＩ［５１］～ＯＩ［６０］に対応する特徴量ＦＴ［５１］～ＦＴ［６０］を特徴空間上にプロットした様子を示す図である。クラスタリング処理部４０は、画像分類データセット（評価データセットＥＳ３）における物体画像ＯＩ［５１］～ＯＩ［６０］のクラス情報に基づき、物体画像ＯＩ［５１］～ＯＩ［６０］をｋ_Ｃ個のクラスタに分類する（即ち、クラスタ数をｋ_Ｃに設定した上でクラスタリングを行う）。ｋ_Ｃは、画像分類データセットの物体画像ＯＩ［５１］～ＯＩ［６０］へのクラス情報により表されるクラスの種類数（即ちクラス情報により表される物体画像ＯＩ［５１］～ＯＩ［６０］の物体の種類数）である。換言すれば、ｋ_Ｃは、物体画像ＯＩ［５１］～ＯＩ［６０］に付与された計１０個のクラス情報が指し示すクラスの種類数（総数）に等しい。図９の例において、物体画像ＯＩ［５１］～ＯＩ［６０］に付与された計１０個のクラス情報が指し示すクラスは、本来は第１及び第２クラスの２つとなるべきところ、誤って第１クラスのみとなっているため“ｋ_Ｃ＝１”である。クラスタリングとして非階層型クラスタリングを行う場合にはクラスタの総数を“ｋ_Ｃ＝１”に設定した上でクラスタリングを行えば良く、クラスタリングとして階層型クラスタリングを行う場合にはクラスタの総数が“ｋ_Ｃ＝１”となるまでクラスタリングを継続すれば良い。但し、“ｋ_Ｃ＝１”の場合には実体的なクラスタリング用の演算は不要とも言え、単に特徴量ＦＴ［５１］～ＦＴ［６０］を全て包含する単一のクラスタを設定すれば足る。 FIG. 10 is a diagram showing the features FT[51] to FT[60] corresponding to the object images OI[51] to OI[60] plotted in the feature space. The clustering processing unit 40 classifies the object images OI[51] to OI[60] into k_C clusters based on the class information of the object images OI[51] to OI[60] in the image classification dataset (evaluation dataset ES3) (i.e., clustering is performed with the number of clusters set to k_C). k_C is the number of distinct classes represented by the class information for the object images OI[51] to OI[60] in the image classification dataset (i.e., the number of object types in the object images OI[51] to OI[60] according to the class information). In other words, k_C is equal to the total number of distinct classes indicated by the 10 pieces of class information assigned to the object images OI[51] to OI[60]. In the example of FIG. 9, the 10 pieces of class information assigned to the object images OI[51] to OI[60] should indicate two classes, the first and second classes, but erroneously indicate only the first class, so "k_C = 1". When non-hierarchical clustering is used, the clustering is performed with the total number of clusters set to "k_C = 1"; when hierarchical clustering is used, the clustering is continued until the total number of clusters becomes "k_C = 1". However, when "k_C = 1", no substantive clustering computation is needed, and it suffices simply to set a single cluster that contains all of the features FT[51] to FT[60].

図１０に示す如く、特徴量ＦＴ［５１］～ＦＴ［６０］に対しては１つのクラスタＣＴ_Ｃ１が設定される。図１０において、点Ｇ_Ｃ１はクラスタＣＴ_Ｃ１に属する特徴量ＦＴ［５１］～ＦＴ［６０］の重心を表す。 As shown in FIG. 10, one cluster CT_C1 is set for the features FT[51] to FT[60]. In FIG. 10, point G_C1 represents the center of gravity of the features FT[51] to FT[60] belonging to the cluster CT_C1.

指標導出部５０は、上述したように、クラスタリング処理部４０により生成されたクラスタごとに、当該クラスタの重心を導出し且つ当該重心と当該クラスタに属する各特徴量との距離を導出する。評価データセットＥＳ３に対しては単一のクラスタＣＴ_Ｃ１が設定されるため、クラスタＣＴ_Ｃ１のみに注目して各距離が導出される。特徴量ＦＴ［５１］～ＦＴ［６０］に対して導出された距離を、夫々、記号“ｄ［５１］～ｄ［６０］”にて参照する。距離ｄ［５１］～ｄ［６０］は、夫々、重心Ｇ_Ｃ１と特徴量ＦＴ［５１］～ＦＴ［６０］との距離を表す。 As described above, the index derivation unit 50 derives the center of gravity of each cluster generated by the clustering processing unit 40, and derives the distance between the center of gravity and each feature amount belonging to the cluster. Since a single cluster CT _C1 is set for the evaluation data set ES3, each distance is derived focusing only on the cluster CT _C1 . The distances derived for the features FT[51] to FT[60] are referred to by the symbols "d[51] to d[60]", respectively. The distances d[51] to d[60] represent the distance between the center of gravity G _C1 and the features FT[51] to FT[60], respectively.

指標導出部５０は、更にクラスタＣＴ_Ｃ１に対して距離平均及び距離分散を導出する。クラスタＣＴ_Ｃ１に対して導出される距離平均及び距離分散を、夫々、記号“ＡＶＥ_Ｃ１”及び“σ^２ _Ｃ１”にて参照する。そうすると、距離平均ＡＶＥ_Ｃ１は距離ｄ［５１］～ｄ［６０］の平均であり、距離分散σ^２ _Ｃ１は距離ｄ［５１］～ｄ［６０］の分散である。 The index derivation unit 50 further derives the distance average and distance variance for the cluster CT _C1 . The distance average and distance variance derived for the cluster CT _C1 are referred to by the symbols "AVE _C1 " and "σ ² _C1 ", respectively. Then, the distance average AVE _C1 is the average of the distances d[51] to d[60], and the distance variance σ ² _C1 is the variance of the distances d[51] to d[60].

誤り特定部６０は、クラスタＣＴ_Ｃ１について誤り条件の成否を判定する。クラスタＣＴ_Ｃ１について距離平均ＡＶＥ_Ｃ１及び距離分散σ^２ _Ｃ１が夫々閾値ＴＨ_１及びＴＨ_２と比較される。図９及び図１０の例において、クラスタＣＴ_Ｃ１における距離平均ＡＶＥ_Ｃ１及び距離分散σ^２ _Ｃ１は夫々閾値ＴＨ_１及びＴＨ_２よりも大きい。このため、クラスタＣＴ_Ｃ１は誤り条件を満たし、結果、クラスタＣＴ_Ｃ１は誤りクラスタとして特定される。 The error identification unit 60 judges whether the error condition is satisfied for the cluster CT _C1 . The distance average AVE _C1 and the distance variance σ ² _C1 for the cluster CT _C1 are compared with the thresholds TH ₁ and TH ₂ , respectively. In the examples of Fig. 9 and Fig. 10, the distance average AVE _C1 and the distance variance σ ² _C1 for the cluster CT _C1 are greater than the thresholds TH ₁ and TH ₂ , respectively. Therefore, the cluster CT _C1 satisfies the error condition, and as a result, the cluster CT _C1 is identified as an error cluster.
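The index derivation and the threshold test can be sketched as below. Several points are assumptions of this sketch rather than statements of the specification: the distance is Euclidean, the variance is the population variance, and the error condition is taken to hold when both the distance average and the distance variance exceed their thresholds (which is the situation in the FIG. 9 and FIG. 10 example). The function names and the threshold values are hypothetical.

```python
import math
from statistics import fmean, pvariance


def centroid(points):
    """Center of gravity of a list of equal-length feature vectors."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def distance_stats(points):
    """Return (distance average, distance variance) around the centroid."""
    g = centroid(points)
    dists = [math.dist(g, p) for p in points]  # assumed Euclidean distance
    return fmean(dists), pvariance(dists)      # assumed population variance


def is_error_cluster(points, th1, th2):
    """Assumed error condition: average > TH_1 and variance > TH_2."""
    ave, var = distance_stats(points)
    return ave > th1 and var > th2


# A tight cluster versus a cluster mixing two distant groups (made-up data).
tight = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
mixed = [(0.0, 0.0)] * 8 + [(10.0, 10.0)] * 2
tight_flag = is_error_cluster(tight, th1=1.0, th2=0.5)
mixed_flag = is_error_cluster(mixed, th1=1.0, th2=0.5)
```

In this made-up data, only the mixed cluster exceeds both thresholds, so only it would be flagged as an error cluster.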

正解クラスタ内の特徴量は真に同一のクラスの物体についての特徴量であるため、正解クラスタ内の特徴量のばらつきは比較的小さい。これに対し、誤りクラスタ内の特徴量は互いに異なるクラスの物体についての特徴量を含むため、誤りクラスタ内の特徴量のばらつきは比較的大きくなる。これらを考慮し、上述の如く、クラスタごとに距離平均及び距離分散を導出し、それらの導出結果に基づいて誤りクラスタを特定する。この方法により、人の目に頼ることなくデータ処理装置１自身で誤りクラスタを良好に検出することが可能となり、人の目に頼る場合と比べてデータセット中の不備を探すコスト（時間及び人間の労力）を削減できる。誤りクラスタを特定できれば、誤りクラスタに属する物体画像のクラス情報を修正して学習用データセットを生成する又は当該物体画像を含む入力画像を学習用データセットから除外するなどといった対応が可能となり、結果、学習用データセットの質を高めることが可能となる。学習用データセットの質を高めることで、学習用データセットを用いた機械学習にて構築される画像認識用の推論モデルの性能を高めることができる。 The features in the correct cluster are for objects of the same class, so the variation in the features in the correct cluster is relatively small. In contrast, the features in the error cluster include features for objects of different classes, so the variation in the features in the error cluster is relatively large. Taking these factors into consideration, as described above, the distance mean and distance variance are derived for each cluster, and the error cluster is identified based on the derived results. This method makes it possible for the data processing device 1 itself to detect error clusters well without relying on human eyes, and reduces the cost (time and human effort) of searching for defects in the dataset compared to when relying on human eyes. If an error cluster can be identified, it becomes possible to take measures such as correcting the class information of the object image belonging to the error cluster to generate a learning dataset or excluding an input image including the object image from the learning dataset, and as a result, it becomes possible to improve the quality of the learning dataset. By improving the quality of the learning dataset, the performance of an inference model for image recognition constructed by machine learning using the learning dataset can be improved.

＜＜第２実施例＞＞ <<Second Embodiment>>
第２実施例を説明する。第２実施例は上述の第１実施例と組み合わせて実施される。第２実施例では、誤りクラスタが特定された場合に、誤り特定部６０により実行可能な誤りラベル特定処理を説明する。 A second embodiment will be described. The second embodiment is implemented in combination with the first embodiment described above. The second embodiment describes an error label identification process that the error identification unit 60 can execute when an error cluster has been identified.

誤りラベル特定処理において、誤り特定部６０は、誤りクラスタ内で当該誤りクラスタに属する物体画像を第１及び第２クラスタにグループ分けするクラスタリングを行う。誤りラベル特定処理におけるクラスタリングを、クラスタリング処理部４０が行う上述のクラスタリングと明確に区別すべく、再クラスタリングと称する。再クラスタリングは、誤りクラスタに属する各物体画像の特徴量に基づいて行う。即ち、或る誤りクラスタにｎ枚の物体画像が属するのであれば、当該ｎ枚の物体画像について特徴量導出処理にて導出されたｎ個の特徴量に基づき、再クラスタリングを行う。但し、この再クラスタリングは“ｎ≧３”を満たすことを前提に実行される。故に、誤りラベル特定処理は“ｎ≧３”を満たす誤りクラスタに対して実行可能である。再クラスタリングを、クラスタ数を２に設定した非階層型クラスタリング（ｋ平均法など）にて実現できる。 In the error label identification process, the error identification unit 60 performs clustering that groups the object images belonging to an error cluster into first and second clusters within that error cluster. To distinguish it clearly from the above-described clustering performed by the clustering processing unit 40, the clustering in the error label identification process is called reclustering. Reclustering is performed based on the feature quantity of each object image belonging to the error cluster. In other words, if n object images belong to a certain error cluster, reclustering is performed based on the n feature quantities derived for those n object images in the feature derivation process. Reclustering is executed on the premise that "n ≧ 3" is satisfied. Therefore, the error label identification process can be executed on any error cluster that satisfies "n ≧ 3". Reclustering can be realized by non-hierarchical clustering (such as the k-means method) with the number of clusters set to 2.
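Reclustering with the number of clusters set to 2 can be sketched with a basic k-means loop. The sketch below is one possible realization under stated assumptions, not the routine used by the error identification unit 60: the function name `recluster_two`, the Euclidean distance, the deterministic initialization (first point plus the point farthest from it), and the sample feature vectors are all introduced here for illustration.

```python
import math


def centroid(points):
    """Center of gravity of a list of equal-length feature vectors."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def recluster_two(points, max_iter=100):
    """Split points into two clusters; returns two lists of point indices."""
    if len(points) < 3:
        raise ValueError("reclustering assumes n >= 3")
    # Assumed initialization: the first point and the point farthest from it.
    c1 = points[0]
    c2 = max(points, key=lambda p: math.dist(p, c1))
    assign = None
    for _ in range(max_iter):
        new_assign = [
            0 if math.dist(p, c1) <= math.dist(p, c2) else 1 for p in points
        ]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        group0 = [p for p, a in zip(points, assign) if a == 0]
        group1 = [p for p, a in zip(points, assign) if a == 1]
        if not group0 or not group1:
            break  # degenerate split, e.g. all points identical
        c1, c2 = centroid(group0), centroid(group1)
    first = [i for i, a in enumerate(assign) if a == 0]
    second = [i for i, a in enumerate(assign) if a == 1]
    return first, second


# Six made-up feature vectors: four close together, two far away.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (5.0, 5.0), (5.2, 5.1)]
first, second = recluster_two(points)
```

The function returns index lists rather than copies of the feature vectors so that each index can be mapped back to its object image.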

誤りラベル特定処理に係る誤り特定部６０は、再クラスタリングの結果に基づき、上記ｎ枚の物体画像の中から誤りラベル付き物体画像（即ちクラス情報に誤りのある物体画像）を特定する。誤り特定部６０は、誤りクラスタに属する各物体画像が正解ラベル付き物体画像及び誤りラベル付き物体画像の何れであるかを判定する機能を有しているとも言える。誤りラベル特定処理として、以下の第１又は第２誤りラベル特定処理を採用できる。 The error identification unit 60 involved in the error label identification process identifies an erroneously labeled object image (i.e., an object image with an error in the class information) from among the n object images based on the result of reclustering. It can also be said that the error identification unit 60 has a function of determining whether each object image belonging to an error cluster is an object image with a correct label or an object image with an error label. As the error label identification process, the following first or second error label identification process can be adopted.

図７及び図８に対応する上記評価データセットＥＳ２に注目して第１誤りラベル特定処理を説明する。図１１を参照する。評価セットＥＳ２に対しては第１実施例で述べたようにクラスタＣＴ_Ｂ２が誤りクラスタとして特定される。誤りクラスタＣＴ_Ｂ２には物体画像ＯＩ［３５］～ＯＩ［４０］が属する。換言すれば、誤りクラスタＣＴ_Ｂ２には特徴量ＦＴ［３５］～ＦＴ［４０］が属する。故に、誤りクラスタＣＴ_Ｂ２において“ｎ＝６”である。誤り特定部６０は、特徴量ＦＴ［３５］～ＦＴ［４０］に基づき物体画像ＯＩ［３５］～ＯＩ［４０］の再クラスタリングを行うことで物体画像ＯＩ［３５］～ＯＩ［４０］を第１及び第２クラスタに分類する。図１１において、クラスタＣＴ_Ｂ２＿１及びＣＴ_Ｂ２＿２の内、何れか一方が第１クラスタに相当し、他方が第２クラスタに相当する。特徴量ＦＴ［３５］～ＦＴ［３８］はクラスタＣＴ_Ｂ２＿１に分類され、特徴量ＦＴ［３９］及びＦＴ［４０］はクラスタＣＴ_Ｂ２＿２に分類される。図１１における点Ｇ_Ｂ２は図８の重心Ｇ_Ｂ２と一致する。 The first error label identification process will be described with reference to the evaluation dataset ES2 corresponding to FIGS. 7 and 8. Refer to FIG. 11. For the evaluation dataset ES2, the cluster CT_B2 is identified as an error cluster, as described in the first embodiment. The object images OI[35] to OI[40] belong to the error cluster CT_B2. In other words, the feature quantities FT[35] to FT[40] belong to the error cluster CT_B2. Therefore, "n = 6" for the error cluster CT_B2. The error identification unit 60 reclusters the object images OI[35] to OI[40] based on the feature quantities FT[35] to FT[40], thereby classifying the object images OI[35] to OI[40] into first and second clusters. In FIG. 11, one of the clusters CT_B2_1 and CT_B2_2 corresponds to the first cluster and the other corresponds to the second cluster. The feature quantities FT[35] to FT[38] are classified into the cluster CT_B2_1, and the feature quantities FT[39] and FT[40] are classified into the cluster CT_B2_2. The point G_B2 in FIG. 11 coincides with the center of gravity G_B2 in FIG. 8.

第１誤りラベル特定処理では、誤りクラスタ内の各物体画像の再クラスタリングにより形成される第１及び第２クラスタの内、属する物体画像がより少ない方のクラスタ内の物体画像を誤りラベル付き物体画像として特定する。即ち、第１誤りラベル特定処理では、誤りクラスタ内の各物体画像の再クラスタリングにより第１及び第２クラスタを形成した後、第２クラスタに属する物体画像の総数が第１クラスタに属する物体画像の総数よりも少ないならば、第２クラスタに属する物体画像を誤りラベル付き物体画像（即ちクラス情報に誤りのある物体画像）として特定し、第１クラスタに属する物体画像の総数が第２クラスタに属する物体画像の総数よりも少ないならば、第１クラスタに属する物体画像を誤りラベル付き物体画像として特定する。 In the first error label identification process, of the first and second clusters formed by reclustering each object image in the error cluster, the object image in the cluster with fewer object images is identified as an object image with an error label. That is, in the first error label identification process, after forming the first and second clusters by reclustering each object image in the error cluster, if the total number of object images belonging to the second cluster is smaller than the total number of object images belonging to the first cluster, the object image belonging to the second cluster is identified as an object image with an error label (i.e., an object image with an error in class information), and if the total number of object images belonging to the first cluster is smaller than the total number of object images belonging to the second cluster, the object image belonging to the first cluster is identified as an object image with an error label.

図１１の例では、クラスタＣＴ_Ｂ２＿１に属する物体画像の総数は４であって且つクラスタＣＴ_Ｂ２＿２に属する物体画像の総数は２である。このため、評価データセットＥＳ２の誤りクラスタに対して第１誤りラベル特定処理を適用した場合には、クラスタＣＴ_Ｂ２＿２に属する各物体画像が誤りラベル付き物体画像として特定される。 In the example of FIG. 11, the total number of object images belonging to the cluster CT_B2_1 is 4, and the total number of object images belonging to the cluster CT_B2_2 is 2. Therefore, when the first error label identification process is applied to the error cluster of the evaluation dataset ES2, each object image belonging to the cluster CT_B2_2 is identified as an erroneously labeled object image.
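The size comparison used by the first error label identification process can be sketched as follows. The function name `mislabeled_by_size` is hypothetical, and the handling of equal-sized clusters (returning `None`) is an assumption of this sketch, since the passage above does not address ties.

```python
def mislabeled_by_size(first, second):
    """Return the members of the smaller of two reclustered groups.

    first and second are lists of object-image indices.
    Assumption of this sketch: a tie yields None (no decision).
    """
    if len(first) == len(second):
        return None
    return first if len(first) < len(second) else second


# FIG. 11 example: CT_B2_1 holds OI[35]..OI[38], CT_B2_2 holds OI[39], OI[40].
ct_b2_1 = [35, 36, 37, 38]
ct_b2_2 = [39, 40]
flagged = mislabeled_by_size(ct_b2_1, ct_b2_2)
```

Here `flagged` holds the indices 39 and 40, matching the result stated for FIG. 11.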

上記評価データセットＥＳ２に注目して第２誤りラベル特定処理を説明する。特徴量ＦＴ［３５］～ＦＴ［４０］をクラスタＣＴ_Ｂ２＿１及びＣＴ_Ｂ２＿２に分類するまでの処理は、第１及び第２誤りラベル特定処理間で共通である。第２誤りラベル特定処理では、誤りクラスタ内の各物体画像の再クラスタリングにより形成される第１及び第２クラスタの内、当該誤りクラスタの重心からより遠い方のクラスタ内の物体画像を誤りラベル付き物体画像として特定する。即ち、第２誤りラベル特定処理では、誤りクラスタ内の各物体画像の再クラスタリングにより第１及び第２クラスタを形成する。そして、当該誤りクラスタの重心と第２クラスタとの距離が当該誤りクラスタの重心と第１クラスタとの距離よりも大きいならば、第２クラスタに属する物体画像を誤りラベル付き物体画像（即ちクラス情報に誤りのある物体画像）として特定する。逆に、当該誤りクラスタの重心と第１クラスタとの距離が当該誤りクラスタの重心と第２クラスタとの距離よりも大きいならば、第１クラスタに属する物体画像を誤りラベル付き物体画像として特定する。当該誤りクラスタの重心と第１クラスタとの距離とは、当該誤りクラスタの重心と第１クラスタに属する全特徴量の重心との距離であって良く、当該誤りクラスタの重心と第２クラスタとの距離とは、当該誤りクラスタの重心と第２クラスタに属する全特徴量の重心との距離であって良い。 The second error label identification process will be described with a focus on the evaluation data set ES2. The process up to classifying the feature quantities FT[35] to FT[40] into the clusters CT _{B2_1} and CT _{B2_2} is common to the first and second error label identification processes. In the second error label identification process, of the first and second clusters formed by reclustering each object image in the error cluster, the object image in the cluster farther from the center of gravity of the error cluster is identified as an object image with an error label. That is, in the second error label identification process, the first and second clusters are formed by reclustering each object image in the error cluster. Then, if the distance between the center of gravity of the error cluster and the second cluster is greater than the distance between the center of gravity of the error cluster and the first cluster, the object image belonging to the second cluster is identified as an object image with an error label (i.e., an object image with an error in class information). 
Conversely, if the distance between the center of gravity of the error cluster and the first cluster is greater than the distance between the center of gravity of the error cluster and the second cluster, the object image belonging to the first cluster is identified as an object image with an error label. The distance between the center of gravity of the error cluster and the first cluster may be the distance between the center of gravity of the error cluster and the center of gravity of all features belonging to the first cluster, and the distance between the center of gravity of the error cluster and the second cluster may be the distance between the center of gravity of the error cluster and the center of gravity of all features belonging to the second cluster.

図１１の例において（図８も参照）、誤りクラスタＣＴ_Ｂ２の重心Ｇ_Ｂ２とクラスタＣＴ_Ｂ２＿２との距離（詳細には、クラスタＣＴ_Ｂ２＿２に属する特徴量ＦＴ［３９］及びＦＴ［４０］の重心と重心Ｇ_Ｂ２との距離）は、誤りクラスタＣＴ_Ｂ２の重心Ｇ_Ｂ２とクラスタＣＴ_Ｂ２＿１との距離（詳細には、クラスタＣＴ_Ｂ２＿１に属する特徴量ＦＴ［３５］～ＦＴ［３８］の重心と重心Ｇ_Ｂ２との距離）よりも大きい。このため、評価データセットＥＳ２の誤りクラスタに対して第２誤りラベル特定処理を適用した場合にも、第１誤りラベル特定処理を適用した場合と同様、クラスタＣＴ_Ｂ２＿２に属する各物体画像が誤りラベル付き物体画像として特定される。 In the example of FIG. 11 (see also FIG. 8), the distance between the center of gravity G_B2 of the error cluster CT_B2 and the cluster CT_B2_2 (more specifically, the distance between the center of gravity G_B2 and the center of gravity of the feature quantities FT[39] and FT[40] belonging to the cluster CT_B2_2) is greater than the distance between the center of gravity G_B2 and the cluster CT_B2_1 (more specifically, the distance between the center of gravity G_B2 and the center of gravity of the feature quantities FT[35] to FT[38] belonging to the cluster CT_B2_1). Therefore, when the second error label identification process is applied to the error cluster of the evaluation dataset ES2, each object image belonging to the cluster CT_B2_2 is identified as an erroneously labeled object image, just as when the first error label identification process is applied.
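The distance comparison used by the second error label identification process can be sketched as below, taking the cluster distance to be the distance between the center of gravity of the error cluster and the center of gravity of each reclustered group (the definition the text says may be used). The function name `mislabeled_by_distance`, the Euclidean distance, the tie handling, and the sample feature vectors are assumptions of this sketch.

```python
import math


def centroid(points):
    """Center of gravity of a list of equal-length feature vectors."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def mislabeled_by_distance(points, first, second):
    """Return the index group whose centroid lies farther from the
    centroid of the whole error cluster (None if the distances match)."""
    g = centroid(points)
    d_first = math.dist(g, centroid([points[i] for i in first]))
    d_second = math.dist(g, centroid([points[i] for i in second]))
    if math.isclose(d_first, d_second):
        return None  # assumption: no decision on a tie
    return first if d_first > d_second else second


# Made-up feature vectors: indices 0..3 form one group, 4..5 the other.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (5.0, 5.0), (5.2, 5.1)]
flagged = mislabeled_by_distance(points, [0, 1, 2, 3], [4, 5])
```

With this data the two-member group lies farther from the overall center of gravity, so it is the group that is flagged.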

図９及び図１０に対応する上記評価データセットＥＳ３に注目して第１誤りラベル特定処理を説明する。図１２を参照する。評価データセットＥＳ３に対しては第１実施例で述べたようにクラスタＣＴ_Ｃ１が誤りクラスタとして特定される。誤りクラスタＣＴ_Ｃ１には物体画像ＯＩ［５１］～ＯＩ［６０］が属する。換言すれば、誤りクラスタＣＴ_Ｃ１には特徴量ＦＴ［５１］～ＦＴ［６０］が属する。故に、誤りクラスタＣＴ_Ｃ１において“ｎ＝１０”である。誤り特定部６０は、特徴量ＦＴ［５１］～ＦＴ［６０］に基づき物体画像ＯＩ［５１］～ＯＩ［６０］の再クラスタリングを行うことで物体画像ＯＩ［５１］～ＯＩ［６０］を第１及び第２クラスタに分類する。図１２において、クラスタＣＴ_Ｃ１＿１及びＣＴ_Ｃ１＿２の内、何れか一方が第１クラスタに相当し、他方が第２クラスタに相当する。特徴量ＦＴ［５１］～ＦＴ［５８］はクラスタＣＴ_Ｃ１＿１に分類され、特徴量ＦＴ［５９］及びＦＴ［６０］はクラスタＣＴ_Ｃ１＿２に分類される。図１２における点Ｇ_Ｃ１は図１０の重心Ｇ_Ｃ１と一致する。 The first error label identification process will be described with reference to the evaluation dataset ES3 corresponding to FIGS. 9 and 10. Refer to FIG. 12. For the evaluation dataset ES3, the cluster CT_C1 is identified as an error cluster, as described in the first embodiment. The object images OI[51] to OI[60] belong to the error cluster CT_C1. In other words, the feature quantities FT[51] to FT[60] belong to the error cluster CT_C1. Therefore, "n = 10" for the error cluster CT_C1. The error identification unit 60 reclusters the object images OI[51] to OI[60] based on the feature quantities FT[51] to FT[60], thereby classifying the object images OI[51] to OI[60] into first and second clusters. In FIG. 12, one of the clusters CT_C1_1 and CT_C1_2 corresponds to the first cluster and the other corresponds to the second cluster. The feature quantities FT[51] to FT[58] are classified into the cluster CT_C1_1, and the feature quantities FT[59] and FT[60] are classified into the cluster CT_C1_2. The point G_C1 in FIG. 12 coincides with the center of gravity G_C1 in FIG. 10.

図１２の例では、クラスタＣＴ_Ｃ１＿１に属する物体画像の総数は８であって且つクラスタＣＴ_Ｃ１＿２に属する物体画像の総数は２である。このため、評価データセットＥＳ３の誤りクラスタに対して第１誤りラベル特定処理を適用した場合には、クラスタＣＴ_Ｃ１＿２に属する各物体画像が誤りラベル付き物体画像として特定される。 In the example of FIG. 12, the total number of object images belonging to the cluster CT_C1_1 is 8, and the total number of object images belonging to the cluster CT_C1_2 is 2. Therefore, when the first error label identification process is applied to the error cluster of the evaluation dataset ES3, each object image belonging to the cluster CT_C1_2 is identified as an erroneously labeled object image.

図１２の例において（図１０も参照）、誤りクラスタＣＴ_Ｃ１の重心Ｇ_Ｃ１とクラスタＣＴ_Ｃ１＿２との距離（詳細には、クラスタＣＴ_Ｃ１＿２に属する特徴量ＦＴ［５９］及びＦＴ［６０］の重心と重心Ｇ_Ｃ１との距離）は、誤りクラスタＣＴ_Ｃ１の重心Ｇ_Ｃ１とクラスタＣＴ_Ｃ１＿１との距離（詳細には、クラスタＣＴ_Ｃ１＿１に属する特徴量ＦＴ［５１］～ＦＴ［５８］の重心と重心Ｇ_Ｃ１との距離）よりも大きい。このため、評価データセットＥＳ３の誤りクラスタに対して第２誤りラベル特定処理を適用した場合にも、第１誤りラベル特定処理を適用した場合と同様、クラスタＣＴ_Ｃ１＿２に属する各物体画像が誤りラベル付き物体画像として特定される。 In the example of FIG. 12 (see also FIG. 10), the distance between the center of gravity G_C1 of the error cluster CT_C1 and the cluster CT_C1_2 (more specifically, the distance between the center of gravity G_C1 and the center of gravity of the feature quantities FT[59] and FT[60] belonging to the cluster CT_C1_2) is greater than the distance between the center of gravity G_C1 and the cluster CT_C1_1 (more specifically, the distance between the center of gravity G_C1 and the center of gravity of the feature quantities FT[51] to FT[58] belonging to the cluster CT_C1_1). Therefore, when the second error label identification process is applied to the error cluster of the evaluation dataset ES3, each object image belonging to the cluster CT_C1_2 is identified as an erroneously labeled object image, just as when the first error label identification process is applied.

本実施例に示す如く、誤りクラスタに対して再クラスタリングを実行すれば、誤りクラスタが、正解ラベル付き物体画像が属するクラスタと、誤りラベル付き物体画像が属するクラスタと、に分解されると見込まれる。このため、再クラスタリングの結果を参照することで、誤りラベル付き物体画像を容易に特定することが可能となる。 As shown in this embodiment, if reclustering is performed on the erroneous clusters, the erroneous clusters are expected to be decomposed into clusters to which correctly labeled object images belong and clusters to which erroneously labeled object images belong. Therefore, by referring to the results of reclustering, it becomes possible to easily identify erroneously labeled object images.

より具体的には、誤りクラスタに対して再クラスタリングを実行して第１及び第２クラスタを形成したとき、第１及び第２クラスタの内、一方のクラスタに属する物体画像は正解ラベル付き物体画像であって且つ他方のクラスタに属する物体画像は誤りラベル付き物体画像であると見込まれる。一方で、基本的に画像分類データセットのアノテーション情報の大部分は正しいと期待される。従って、第１及び第２クラスタの内、属する物体画像がより少ない方のクラスタ内の物体画像が誤りラベル付き物体画像である可能性が高い。この知見に基づく第１誤りラベル特定処理を用いれば、誤りラベル付き物体画像を精度良く特定することができる。 More specifically, when reclustering is performed on the erroneous clusters to form the first and second clusters, it is expected that the object images belonging to one of the first and second clusters are correctly labeled object images, and the object images belonging to the other cluster are incorrectly labeled object images. On the other hand, it is expected that the majority of annotation information in the image classification dataset is basically correct. Therefore, of the first and second clusters, the object images in the cluster that contains fewer object images are more likely to be incorrectly labeled object images. By using the first erroneous label identification process based on this knowledge, it is possible to accurately identify incorrectly labeled object images.

また、画像分類データセットのアノテーション情報の大部分は正しいと期待されるのであるから、誤りクラスタ内の各物体画像の再クラスタリングにより形成される第１及び第２クラスタの内、当該誤りクラスタの重心からより遠い方のクラスタ内の物体画像が誤りラベル付き物体画像である可能性が高い。この知見に基づく第２誤りラベル特定処理を用いることでも、誤りラベル付き物体画像を精度良く特定することができる。 In addition, since most of the annotation information in an image classification dataset is expected to be correct, of the first and second clusters formed by reclustering each object image in an erroneous cluster, the object image in the cluster farther from the center of gravity of the erroneous cluster is likely to be an erroneously labeled object image. By using the second erroneous label identification process based on this knowledge, erroneously labeled object images can also be identified with high accuracy.
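A short derivation, offered as an observation rather than as part of the specification, shows how the two identification processes relate when the cluster distance is taken to be the centroid-to-centroid distance (a definition the second embodiment permits) and when the center of gravity G of the error cluster is the arithmetic mean of its feature quantities. The symbols n_1, n_2, g_1, and g_2 are introduced here for illustration: n_1 and n_2 are the numbers of feature quantities in the first and second clusters, and g_1 and g_2 are the centers of gravity of those clusters.

```latex
\begin{align}
G &= \frac{n_1 g_1 + n_2 g_2}{n_1 + n_2}, \\
G - g_1 &= \frac{n_2}{n_1 + n_2}\,(g_2 - g_1), \qquad
G - g_2 = \frac{n_1}{n_1 + n_2}\,(g_1 - g_2), \\
\frac{\lVert G - g_2 \rVert}{\lVert G - g_1 \rVert} &= \frac{n_1}{n_2}
\quad (g_1 \neq g_2).
\end{align}
```

Under these assumptions, the cluster farther from G is always the one with fewer members, so the second process selects the same cluster as the first. In the FIG. 11 example (n_1 = 4, n_2 = 2), the distance to CT_B2_2 would be twice the distance to CT_B2_1, and in the FIG. 12 example (n_1 = 8, n_2 = 2) the distance to CT_C1_2 would be four times the distance to CT_C1_1. With a different definition of the cluster distance, the two processes need not coincide.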

＜＜第３実施例＞＞ <<Third Embodiment>>
第３実施例を説明する。第３実施例は上述の第１及び第２実施例と組み合わせて実施される。上述したように、学習用データセット生成部７０は、複数の学習用画像の画像データと各学習用画像内の物体の位置及びクラスを特定するアノテーション情報とを含む学習用データセットを生成する。入力画像ＩＩ［１］～ＩＩ［Ｌ］の夫々は学習用画像の候補であり、仮に、取得部１０にて取り扱われる入力画像ＩＩ［１］～ＩＩ［Ｌ］の画像データ及び各入力画像のアノテーション情報を含むデータセット（暫定データセット）において、誤りラベル付き物体画像が一切存在しないのであれば、暫定データセットが、そのまま学習用データセットとなる。但し、実際には、或る程度、誤りラベル付き物体画像が存在することが多い。第３実施例では、誤りラベル付き物体画像が特定された場合に、学習用データセット生成部７０により実行可能な誤りラベル対応処理を説明する。 A third embodiment will be described. The third embodiment is implemented in combination with the first and second embodiments described above. As described above, the learning dataset generation unit 70 generates a learning dataset that includes image data of a plurality of learning images and annotation information that identifies the position and class of the objects in each learning image. Each of the input images II[1] to II[L] is a candidate learning image. If the dataset handled by the acquisition unit 10 (the provisional dataset), which includes the image data of the input images II[1] to II[L] and the annotation information of each input image, contains no erroneously labeled object images at all, the provisional dataset becomes the learning dataset as it is. In practice, however, some erroneously labeled object images are often present. The third embodiment describes an error label correspondence process that the learning dataset generation unit 70 can execute when an erroneously labeled object image has been identified.

まず、図５及び図６に対応する上記評価データセットＥＳ１に注目する。評価データセットＥＳ１に関して、クラスタリング処理部４０により生成されたクラスタＣＴ_Ａ１及びＣＴ_Ａ２は何れも誤りクラスタは特定されず、故に、物体画像ＯＩ［１１］～ＯＩ［２０］は何れも誤りラベル付き物体画像として特定されない。生成部７０は、誤りラベル付き物体画像が特定されない評価データセット（ここではＥＳ１）に対しては誤りラベル対応処理を実行しない。生成部７０は、誤りクラスタが一切特定されない評価データセット中の各物体画像を正解ラベル付き物体画像とみなし、正解ラベル付き物体画像を含む各入力画像を学習用画像に設定する。学習用画像に設定された各入力画像の画像データは学習用データセットに含められる。 First, attention is paid to the evaluation data set ES1 corresponding to FIG. 5 and FIG. 6. For the evaluation data set ES1, neither of the clusters CT _A1 and CT _A2 generated by the clustering processing unit 40 is identified as an erroneous cluster, and therefore none of the object images OI[11] to OI[20] are identified as erroneously labeled object images. The generation unit 70 does not perform erroneous label correspondence processing on the evaluation data set (ES1 in this case) in which no erroneously labeled object images are identified. The generation unit 70 regards each object image in the evaluation data set in which no erroneous clusters are identified as a correctly labeled object image, and sets each input image including a correctly labeled object image as a learning image. The image data of each input image set as a learning image is included in the learning data set.

例えば、物体画像ＯＩ［１１］～ＯＩ［１４］が入力画像ＩＩ［Ｊ_Ａ１］に含まれ、物体画像ＯＩ［１５］～ＯＩ［１７］が入力画像ＩＩ［Ｊ_Ａ１＋１］に含まれ、且つ、物体画像ＯＩ［１８］～ＯＩ［２０］が入力画像ＩＩ［Ｊ_Ａ１＋２］に含まれるのであれば（ここでは“Ｊ_Ａ１＋２＝Ｊ_Ａ２”であると仮定）、３枚の入力画像ＩＩ［Ｊ_Ａ１］～ＩＩ［Ｊ_Ａ１＋２］を３枚の学習用画像に設定し、入力画像ＩＩ［Ｊ_Ａ１］～ＩＩ［Ｊ_Ａ１＋２］の画像データと、暫定データセットにおける入力画像ＩＩ［Ｊ_Ａ１］～ＩＩ［Ｊ_Ａ１＋２］のアノテーション情報とを、そのまま学習用データセットに含める。 For example, if object images OI[11] to OI[14] are included in input image II[J _A1 ], object images OI[15] to OI[17] are included in input image II[J _A1 +1], and object images OI[18] to OI[20] are included in input image II[J _A1 +2] (assuming here that "J _A1 +2 = J _A2 "), then the three input images II[J _A1 ] to II[J _A1 +2] are set as three learning images, and the image data of input images II[J _A1 ] to II[J _A1 +2] and the annotation information of input images II[J _A1 ] to II[J _A1 +2] in the provisional dataset are included as they are in the learning dataset.

次に、図７、図８及び図１１に対応する上記評価データセットＥＳ２に注目する。評価データセットＥＳ２に関しては、クラスタリング処理部４０により生成されたクラスタＣＴ_Ｂ１及びＣＴ_Ｂ２の内、クラスタＣＴ_Ｂ２のみが誤りクラスタとして特定され、且つ、物体画像ＯＩ［３１］～［４０］の内、物体画像ＯＩ［３９］及び［４０］のみが誤りラベル付き物体画像として特定される。生成部７０は、誤りラベル付き物体画像が特定された評価データセット（ここではＥＳ２）に対して誤りラベル対応処理を実行する。誤りラベル対応処理として、以下の第１誤りラベル対応処理を採用できる。 Next, attention is paid to the evaluation data set ES2 corresponding to Figures 7, 8, and 11. Regarding the evaluation data set ES2, of the clusters CT _B1 and CT _B2 generated by the clustering processing unit 40, only the cluster CT _B2 is identified as an erroneous cluster, and of the object images OI [31] to [40], only the object images OI [39] and [40] are identified as erroneously labeled object images. The generating unit 70 executes an erroneous label correspondence process on the evaluation data set (ES2 in this case) in which erroneously labeled object images have been identified. As the erroneous label correspondence process, the following first erroneous label correspondence process can be adopted.

今、物体画像ＯＩ［３１］～ＯＩ［３４］が入力画像ＩＩ［Ｊ_Ｂ１］に含まれ、物体画像ＯＩ［３５］～ＯＩ［３７］が入力画像ＩＩ［Ｊ_Ｂ１＋１］に含まれ、且つ、物体画像ＯＩ［３８］～ＯＩ［４０］が入力画像ＩＩ［Ｊ_Ｂ１＋２］に含まれるケースＣＡＳＥ１を想定する（当該ケースでは“Ｊ_Ｂ１＋２＝Ｊ_Ｂ２”）。 Now assume a case CASE1 in which the object images OI[31] to OI[34] are included in the input image II[J_B1], the object images OI[35] to OI[37] are included in the input image II[J_B1+1], and the object images OI[38] to OI[40] are included in the input image II[J_B1+2] (in this case, "J_B1+2 = J_B2").

ケースＣＡＳＥ１に係る第１誤りラベル対応処理において、生成部７０は、誤りラベル付き物体画像である物体画像ＯＩ［３９］及び［４０］を含む入力画像ＩＩ［Ｊ_Ｂ１＋２］をアノテーション対象として抽出し、暫定データセットにおける入力画像ＩＩ［Ｊ_Ｂ１＋２］のアノテーション情報に誤りが含まれる可能性があることを、データ処理装置１の管理者に対して通知する（この通知を通知ＮＴ１と称する）。通知ＮＴ１は、データ処理装置１に設けられた又は接続された表示画面（不図示）での表示を通じて行われて良い。通知ＮＴ１は、入力画像ＩＩ［Ｊ_Ｂ１＋２］のアノテーション情報の正解を求める正解要求通知を含んでいて良い。通知ＮＴ１を受けた管理者は、入力画像ＩＩ［Ｊ_Ｂ１＋２］のアノテーション情報の正解を示す正解情報を、ポインティングデバイス等の所定のマンマシンインターフェース（不図示）を通じてデータ処理装置１に入力する。ここにおける正解情報は、物体画像ＯＩ［３９］及び［４０］中の物体が交通標識であること（即ち第３クラスの物体であること）を指し示す。生成部７０は、正解情報に基づき入力画像ＩＩ［Ｊ_Ｂ１＋２］に対するアノテーション情報（暫定データセット中のアノテーション情報）を修正し、入力画像ＩＩ［Ｊ_Ｂ１＋２］の画像データと当該修正したアノテーション情報とを、１枚の教師用画像の画像データ及び当該教師用画像のアノテーション情報の組として、学習用データセットに含める。 In the first error label correspondence process for case CASE1, the generation unit 70 extracts the input image II[J _B1 +2] including the object images OI[39] and [40], which are object images with an error label, as an annotation target, and notifies the administrator of the data processing device 1 that the annotation information of the input image II[J _B1 +2] in the provisional data set may contain an error (this notification is referred to as notification NT1). The notification NT1 may be performed through a display on a display screen (not shown) provided in or connected to the data processing device 1. The notification NT1 may include a correct answer request notification for requesting a correct answer for the annotation information of the input image II[J _B1 +2]. The administrator who has received the notification NT1 inputs correct answer information indicating the correct answer for the annotation information of the input image II[J _B1 +2] into the data processing device 1 through a predetermined man-machine interface (not shown) such as a pointing device. The correct answer information here indicates that the objects in the object images OI[39] and [40] are traffic signs (i.e., objects of the third class). 
The generation unit 70 corrects the annotation information for the input image II[J _B1 +2] (annotation information in the provisional data set) based on the correct answer information, and includes the image data of the input image II[J _B1 +2] and the corrected annotation information in the learning data set as a set of image data of one teacher image and the annotation information of the teacher image.

ケースＣＡＳＥ１において、入力画像ＩＩ［Ｊ_Ｂ１］及びＩＩ［Ｊ_Ｂ１＋１］は２枚の学習用画像に設定され、入力画像ＩＩ［Ｊ_Ｂ１］及びＩＩ［Ｊ_Ｂ１＋１］の画像データと、暫定データセットにおける入力画像ＩＩ［Ｊ_Ｂ１］及びＩＩ［Ｊ_Ｂ１＋１］のアノテーション情報とが、そのまま学習用データセットに含められる。 In case CASE 1, input images II[J _B1 ] and II[J _B1 +1] are set as two learning images, and the image data of input images II[J _B1 ] and II[J _B1 +1] and the annotation information of input images II[J _B1 ] and II[J _B1 +1] in the provisional dataset are included directly in the learning dataset.

物体画像ＯＩ［３１］～ＯＩ［４０］が全て１枚の入力画像ＩＩ［Ｊ_Ｂ１］に含まれるケースＣＡＳＥ２もあり得る（当該ケースでは“Ｊ_Ｂ１＝Ｊ_Ｂ２”）。 There may also be a case CASE2 in which all of the object images OI[31] to OI[40] are included in a single input image II[J _B1 ] (in this case, "J _B1 =J _B2 ").

ケースＣＡＳＥ２に係る第１誤りラベル対応処理において、生成部７０は、誤りラベル付き物体画像である物体画像ＯＩ［３９］及び［４０］を含む入力画像ＩＩ［Ｊ_Ｂ１］をアノテーション対象として抽出し、暫定データセットにおける入力画像ＩＩ［Ｊ_Ｂ１］のアノテーション情報に誤りが含まれる可能性があることを、データ処理装置１の管理者に対して通知する（この通知を通知ＮＴ２と称する）。通知ＮＴ２は、データ処理装置１に設けられた又は接続された表示画面（不図示）での表示を通じて行われて良い。通知ＮＴ２は、入力画像ＩＩ［Ｊ_Ｂ１］のアノテーション情報の正解を求める正解要求通知を含んでいて良い。通知ＮＴ２を受けた管理者は、入力画像ＩＩ［Ｊ_Ｂ１］のアノテーション情報の正解を示す正解情報を、ポインティングデバイス等の所定のマンマシンインターフェース（不図示）を通じてデータ処理装置１に入力する。ここにおける正解情報は、物体画像ＯＩ［３９］及び［４０］中の物体が交通標識であること（即ち第３クラスの物体であること）を指し示す。生成部７０は、正解情報に基づき入力画像ＩＩ［Ｊ_Ｂ１］に対するアノテーション情報（暫定データセット中のアノテーション情報）を修正し、入力画像ＩＩ［Ｊ_Ｂ１］の画像データと当該修正したアノテーション情報とを、１枚の教師用画像の画像データ及び当該教師用画像のアノテーション情報の組として、学習用データセットに含める。 In the first error label correspondence process related to case CASE2, the generation unit 70 extracts the input image II[J _B1 ] including the object images OI[39] and [40], which are object images with an error label, as an annotation target, and notifies the administrator of the data processing device 1 that the annotation information of the input image II[J _B1 ] in the provisional data set may contain an error (this notification is referred to as notification NT2). The notification NT2 may be performed through a display on a display screen (not shown) provided in or connected to the data processing device 1. The notification NT2 may include a correct answer request notification for requesting a correct answer for the annotation information of the input image II[J _B1 ]. The administrator who has received the notification NT2 inputs correct answer information indicating the correct answer for the annotation information of the input image II[J _B1 ] into the data processing device 1 through a predetermined man-machine interface (not shown) such as a pointing device. The correct answer information here indicates that the objects in the object images OI[39] and [40] are traffic signs (i.e., objects of the third class). 
The generation unit 70 corrects the annotation information for the input image II[J _B1 ] (annotation information in the provisional dataset) based on the correct answer information, and includes the image data of the input image II[J _B1 ] and the corrected annotation information in the learning dataset as a set of image data of one teacher image and the annotation information of the teacher image.

誤りラベル対応処理として、以下の第２誤りラベル対応処理を採用することもできる。上記ケースＣＡＳＥ１に係る第２誤りラベル対応処理において、生成部７０は、誤りラベル付き物体画像である物体画像ＯＩ［３９］及び［４０］を含む入力画像ＩＩ［Ｊ_Ｂ１＋２］を学習用画像に設定しない。即ち、入力画像ＩＩ［Ｊ_Ｂ１＋２］の画像データを学習用データセットから除外する（学習用データセットに含めない）。第２誤りラベル対応処理が採用される場合、ケースＣＡＳＥ１においては、第１誤りラベル対応処理が採用される場合と同様に、入力画像ＩＩ［Ｊ_Ｂ１］及びＩＩ［Ｊ_Ｂ１＋１］は２枚の学習用画像に設定され、入力画像ＩＩ［Ｊ_Ｂ１］及びＩＩ［Ｊ_Ｂ１＋１］の画像データと、暫定データセットにおける入力画像ＩＩ［Ｊ_Ｂ１］及びＩＩ［Ｊ_Ｂ１＋１］のアノテーション情報とが、そのまま学習用データセットに含められる。上記ケースＣＡＳＥ２に係る第２誤りラベル対応処理において、生成部７０は、誤りラベル付き物体画像である物体画像ＯＩ［３９］及び［４０］を含む入力画像ＩＩ［Ｊ_Ｂ１］を学習用画像に設定しない。即ち、入力画像ＩＩ［Ｊ_Ｂ１］の画像データを学習用データセットから除外する（学習用データセットに含めない）。尚、第２誤りラベル対応処理が採用される場合であっても、ケースＣＡＳＥ１、ＣＡＳＥ２において、上記通知ＮＴ１、ＮＴ２が行われるようにしても良い（但し正解要求通知は非実行）。 The following second error label correspondence process can also be adopted as the error label correspondence process. In the second error label correspondence process for CASE1, the generation unit 70 does not set the input image II[J_B1+2], which includes the erroneously labeled object images OI[39] and OI[40], as a learning image. That is, the image data of the input image II[J_B1+2] is excluded from (not included in) the learning dataset. When the second error label correspondence process is adopted, in CASE1, as when the first error label correspondence process is adopted, the input images II[J_B1] and II[J_B1+1] are set as two learning images, and the image data of the input images II[J_B1] and II[J_B1+1] and the annotation information of the input images II[J_B1] and II[J_B1+1] in the provisional dataset are included in the learning dataset as they are. In the second error label correspondence process for CASE2, the generation unit 70 does not set the input image II[J_B1], which includes the erroneously labeled object images OI[39] and OI[40], as a learning image. That is, the image data of the input image II[J_B1] is excluded from (not included in) the learning dataset. Note that even when the second error label correspondence process is adopted, the notifications NT1 and NT2 may still be issued in CASE1 and CASE2 (although the correct answer request notification is not issued).

評価データセットＥＳ２に注目して第１及び第２誤りラベル対応処理を説明したが、評価データセットＥＳ３を含む他の評価データセットに対しても同様の処理が実行される。 The first and second erroneous label response processes have been described with a focus on the evaluation dataset ES2, but similar processes are also performed on other evaluation datasets, including the evaluation dataset ES3.

このように、第１誤りラベル対応処理を採用する生成部７０は、複数の入力画像（ＩＩ［１］～ＩＩ［Ｌ］）を複数の学習用画像に設定し、複数の入力画像（ＩＩ［１］～ＩＩ［Ｌ］）としての複数の学習用画像の画像データと各学習用画像内の物体の位置及びクラスを特定するアノテーション情報とを含む学習用データセットを生成する。この際、第１誤りラベル対応処理を採用する生成部７０は、誤り特定部６０の特定結果（誤りクラスタ及び誤りラベル付き物体画像の特定結果）に基づき、誤りラベル付き物体画像を含む入力画像をアノテーション対象として抽出し、アノテーション対象に対して外部から与えられた情報（正解情報）を用いて、アノテーション対象に対するアノテーション情報を修正する。修正後のアノテーション情報が学習用データセットに含められることになる。 In this way, the generation unit 70 that employs the first error label correspondence process sets multiple input images (II[1] to II[L]) as multiple learning images, and generates a learning dataset that includes image data of the multiple learning images as the multiple input images (II[1] to II[L]) and annotation information that identifies the position and class of an object in each learning image. At this time, the generation unit 70 that employs the first error label correspondence process extracts input images that include erroneously labeled object images as annotation targets based on the identification results (identification results of error clusters and erroneously labeled object images) of the error identification unit 60, and corrects the annotation information for the annotation targets using information provided from outside for the annotation targets (correct answer information). The corrected annotation information is included in the learning dataset.
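The first error label correspondence process can be sketched as two steps: extracting the input images that contain erroneously labeled object images, then applying externally supplied correct answers to their annotation information. The sketch below uses the CASE1 layout with three input images. The function names, the dictionary-based data layout, the omission of position information, and the class numbers assigned to OI[31] to OI[38] are assumptions made for illustration; only the third class for OI[39] and OI[40] comes from the text.

```python
def annotation_targets(object_to_image, mislabeled_objects):
    """Input images that contain at least one erroneously labeled object."""
    return sorted({object_to_image[obj] for obj in mislabeled_objects})


def apply_corrections(annotations, corrections):
    """Return a corrected copy of the annotations.

    annotations: {image_id: {object_id: class_id}}
    corrections: {(image_id, object_id): correct_class_id}, supplied externally
    """
    fixed = {image: dict(classes) for image, classes in annotations.items()}
    for (image, obj), correct_class in corrections.items():
        fixed[image][obj] = correct_class
    return fixed


# CASE1 layout (class numbers for OI[31]..OI[38] are illustrative).
object_to_image = {
    **{obj: "J_B1" for obj in range(31, 35)},
    **{obj: "J_B1+1" for obj in range(35, 38)},
    **{obj: "J_B1+2" for obj in range(38, 41)},
}
provisional = {
    "J_B1": {obj: 1 for obj in range(31, 35)},
    "J_B1+1": {obj: 2 for obj in range(35, 38)},
    "J_B1+2": {obj: 2 for obj in range(38, 41)},
}

targets = annotation_targets(object_to_image, [39, 40])
# Correct answers entered by the administrator: traffic signs (third class).
corrections = {("J_B1+2", 39): 3, ("J_B1+2", 40): 3}
learning_annotations = apply_corrections(provisional, corrections)
```

Only `"J_B1+2"` is extracted as an annotation target. The annotations of the other two images pass into the learning dataset unchanged, and the provisional dataset itself is left unmodified because the corrections are applied to a copy.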

これに対し、第２誤りラベル対応処理を採用する生成部７０は、複数の入力画像（ＩＩ［１］～ＩＩ［Ｌ］）の一部を複数の学習用画像として抽出し、抽出によって得られた複数の学習用画像の画像データと各学習用画像内の物体の位置及びクラスを特定するアノテーション情報とを含む学習用データセットを生成する。この際、第２誤りラベル対応処理を採用する生成部７０は、誤り特定部６０の特定結果（誤りクラスタ及び誤りラベル付き物体画像の特定結果）に基づき、誤りラベル付き物体画像を含む入力画像を上記複数の学習用画像から除外する。例えば、入力画像ＩＩ［１］～ＩＩ［１００００］が存在する場合において、入力画像ＩＩ［１］～ＩＩ［９８００］の夫々が誤りラベル付き物体画像を含まず、且つ、入力画像ＩＩ［９８０１］～ＩＩ［１００００］の夫々が誤りラベル付き物体画像を含むのであれば、入力画像ＩＩ［１］～ＩＩ［９８００］のみが計９８００枚の学習用画像として抽出されて、９８００枚の学習用画像の画像データを含む学習用データセットが生成される。 In contrast, the generation unit 70 that adopts the second erroneous label handling process extracts a portion of the multiple input images (II[1] to II[L]) as multiple learning images, and generates a learning dataset that includes the image data of the extracted learning images and annotation information that identifies the position and class of each object in each learning image. In doing so, the generation unit 70 excludes the input images that include erroneously labeled object images from the multiple learning images, based on the identification results of the error identification unit 60 (the identified error clusters and erroneously labeled object images). For example, if there are input images II[1] to II[10000], none of the input images II[1] to II[9800] contains an erroneously labeled object image, and each of the input images II[9801] to II[10000] contains an erroneously labeled object image, then only the input images II[1] to II[9800] are extracted as a total of 9,800 learning images, and a learning dataset containing the image data of those 9,800 learning images is generated.
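The numerical example above can be reproduced with the short Python sketch below. It is a sketch under stated assumptions rather than the actual processing of the generation unit 70: the `InputImage` class, the function name `select_learning_images`, and the simplification that each input image contains exactly one object image are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class InputImage:
    index: int                                      # j in II[j]
    object_ids: list = field(default_factory=list)  # object images cut out of this input image


def select_learning_images(input_images, mislabeled_object_ids):
    """Keep only the input images that contain no erroneously labeled object image."""
    bad = set(mislabeled_object_ids)
    return [img for img in input_images if not bad.intersection(img.object_ids)]


# Assumption for illustration: input image II[j] contains the single object image j.
images = [InputImage(j, [j]) for j in range(1, 10001)]
flagged = range(9801, 10001)  # object images identified as erroneously labeled
learning_images = select_learning_images(images, flagged)
print(len(learning_images))   # 9800
```

An input image is dropped as a whole even if only one of its object images is flagged, which is why this process needs no correct answer information but yields fewer learning images.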

第１及び第２ラベル対応処理の何れを採用した場合でも、質の高い学習用データセット、即ちアノテーション情報に誤りの少ない（理想的には誤りの無い）学習用データセットを生成することが可能である。学習用データセットの質を高めることで、学習用データセットを用いた機械学習にて構築される画像認識用の推論モデルの性能を高めることができる。第２ラベル対応処理の採用時には手動による正解情報の入力作業が不要となるというメリットがある。一方で、第１ラベル対応処理の採用時には、第２ラベル対応処理の採用時と比べて、学習用画像の枚数を増やすことができるというメリットがある。 Whether the first or the second erroneous label handling process is adopted, it is possible to generate a high-quality learning dataset, i.e., a learning dataset with few errors (ideally none) in the annotation information. Improving the quality of the learning dataset improves the performance of the inference model for image recognition that is constructed by machine learning using that dataset. The second erroneous label handling process has the advantage that no manual input of correct answer information is required. The first erroneous label handling process, on the other hand, has the advantage that it yields more learning images than the second process.

＜＜第４実施例＞＞ <<Fourth Embodiment>>
第４実施例を説明する。図１３にデータ処理装置１の動作の一部フローチャートを示す。画像分類データセットの取得処理、機械学習処理、及び、特徴量導出処理は、ステップＳ１１に至る前に実行済みであるとする。但し、特徴量導出処理の実行タイミングは後述のステップＳ１２に至る前であれば任意である。 A fourth embodiment will now be described. FIG. 13 shows a partial flowchart of the operation of the data processing device 1. It is assumed that the image classification dataset acquisition process, the machine learning process, and the feature amount derivation process have been executed before step S11 is reached. However, the feature amount derivation process may be executed at any time before step S12, described later, is reached.

図１３の動作では、評価データセットを単位にクラスタリング処理を行うために、まずステップＳ１１において、クラスタリング処理部４０がｍ枚の物体画像の画像データと当該ｍ枚の物体画像のクラス情報とを含む評価データセットを設定する。ｍ枚の物体画像は画像分類データセット中のＭ枚の入力画像から切り出された物体画像である。Ｍは１以上の整数である。 In the operation of FIG. 13, in order to perform clustering processing on an evaluation dataset basis, first, in step S11, the clustering processing unit 40 sets an evaluation dataset including image data of m object images and class information of the m object images. The m object images are object images cut out from M input images in the image classification dataset. M is an integer equal to or greater than 1.

ｍは３以上の整数である。但し、ｍは４以上の整数であっても良いし、５以上の整数であっても良いし、更にそれ以上の整数であっても良い。ｍは固定値ｍ_ＣＮＳＴ（例えば“ｍ＝ｍ_ＣＮＳＴ＝１０”）であっても良い。“ｍ＝ｍ_ＣＮＳＴ”である場合、Ｍの値は不定であり、固定値ｍ_ＣＮＳＴの枚数分の物体画像を含むＭ枚の入力画像を画像分類データセット中から抽出することによって評価データセットが設定される。“ｍ＝ｍ_ＣＮＳＴ”とするのではなく、“Ｍ＝Ｍ_ＣＮＳＴ”としても良い。Ｍ_ＣＮＳＴは固定値であり、基本的には２以上の整数であるが、１であり得ても良い。“Ｍ＝Ｍ_ＣＮＳＴ”である場合、Ｍ_ＣＮＳＴ枚の入力画像に設定された物体領域（バウンディングボックス）の総数がｍの値になるので、評価データセットが設定されるたびにｍの値は変動しうる。“Ｍ＝Ｍ_ＣＮＳＴ”とすることを原則としつつも、Ｍ枚の入力画像に設定される物体領域（バウンディングボックス）の総数が必要数（少なくとも３）に満たない場合には、当該総数が必要数以上となるまでＭの値を増大させて良い。 m is an integer of 3 or greater. However, m may be an integer of 4 or greater, 5 or greater, or greater still. m may be a fixed value m_CNST (e.g., "m = m_CNST = 10"). When "m = m_CNST", the value of M is not fixed, and the evaluation dataset is set by extracting, from the image classification dataset, M input images that together contain m_CNST object images. Instead of "m = m_CNST", "M = M_CNST" may be used. M_CNST is a fixed value and is basically an integer of 2 or more, but may be 1. When "M = M_CNST", the total number of object regions (bounding boxes) set in the M_CNST input images becomes the value of m, so the value of m may vary each time an evaluation dataset is set. While "M = M_CNST" is the general rule, if the total number of object regions (bounding boxes) set in the M input images is less than the required number (at least 3), the value of M may be increased until the total reaches the required number or more.
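The "M = M_CNST" rule with its fallback can be sketched as follows. This is a minimal illustration, assuming that each evaluation dataset is built from consecutive input images; the function name `next_evaluation_dataset`, its parameters, and the sample bounding box counts are hypothetical.

```python
def next_evaluation_dataset(boxes_per_image, start, m_cnst_images=2, required=3):
    """Choose input images [start, end) for one evaluation dataset.

    boxes_per_image[i] is the number of object regions (bounding boxes) in input image i.
    Returns (end, m), where m is the number of object images in the evaluation dataset.
    """
    # Rule of thumb: take M = M_CNST input images (fewer only at the end of the data).
    end = min(start + m_cnst_images, len(boxes_per_image))
    m = sum(boxes_per_image[start:end])
    # If that yields fewer object images than required, increase M one image at a time.
    while m < required and end < len(boxes_per_image):
        m += boxes_per_image[end]
        end += 1
    return end, m


boxes = [1, 1, 0, 4, 2, 3]  # hypothetical bounding box counts for six input images
first = next_evaluation_dataset(boxes, 0)
second = next_evaluation_dataset(boxes, first[0])
print(first)   # (4, 6): M grew from 2 to 4 input images to reach at least 3 objects
print(second)  # (6, 5): M = 2 input images already contain 5 objects
```

As the output shows, m varies from one evaluation dataset to the next under this rule, in line with the paragraph above.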

ステップＳ１１にて評価データセットが設定されると、その設定された評価データセットに対しステップＳ１２～Ｓ１５の処理又はステップＳ１２～Ｓ１６の処理が順次実行される。即ちステップＳ１２において、クラスタリング処理部４０は、評価データセットにおけるｍ枚の物体画像の特徴量及びクラス情報に基づきｍ枚の物体画像のクラスタリングを行い、これによってｍ枚の物体画像を１以上のクラスタに分類する。続くステップＳ１３において、指標導出部５０は、クラスタごとに特徴量の重心、距離平均及び距離分散を導出する。そしてステップＳ１４において、誤り特定部６０は、距離平均及び距離分散に基づき誤りクラスタを特定するための処理を実行する。この処理にて誤りクラスタが特定されたか否かがステップＳ１５にてチェックされる。誤りクラスタが特定された場合（ステップＳ１５のＹ）にはステップＳ１６の処理を行ってからステップＳ１７に進むが、誤りクラスタが特定されなかった場合（ステップＳ１５のＮ）にはステップＳ１６の処理を経ずにステップＳ１７に直接進む。ステップＳ１６において、誤り特定部６０は、誤りクラスタの再クラスタリングを経て誤りラベル付き物体画像を特定する。 When the evaluation data set is set in step S11, the processes of steps S12 to S15 or steps S12 to S16 are sequentially performed on the set evaluation data set. That is, in step S12, the clustering processing unit 40 performs clustering of the m object images in the evaluation data set based on the feature amounts and class information of the m object images, thereby classifying the m object images into one or more clusters. In the following step S13, the index derivation unit 50 derives the center of gravity, distance average, and distance variance of the feature amounts for each cluster. Then, in step S14, the error identification unit 60 performs a process for identifying an error cluster based on the distance average and distance variance. In step S15, it is checked whether an error cluster has been identified by this process. If an error cluster has been identified (Y in step S15), the process of step S16 is performed and then the process proceeds to step S17, but if an error cluster has not been identified (N in step S15), the process proceeds directly to step S17 without going through the process of step S16. In step S16, the error identification unit 60 identifies object images with erroneous labels through reclustering of the error clusters.
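Steps S12 to S16 can be illustrated with the following self-contained Python sketch. Several points are assumptions made for illustration and are not taken from this section: the clustering of step S12 is simplified to grouping by class label (so that the number of clusters equals the number of classes), the reclustering of step S16 uses a plain 2-means procedure, and the thresholds `th_mean` and `th_var` and the two-dimensional feature amounts are hypothetical. The error cluster test (mean and variance both at or above a threshold) and the rule that the smaller of the two reclustered groups is treated as erroneously labeled follow claims 2 and 3.

```python
import math
from statistics import fmean, pvariance


def centroid(points):
    """Center of gravity of a list of feature vectors."""
    return [fmean(axis) for axis in zip(*points)]


def two_means(points, iterations=10):
    """Split points into two groups (assumed reclustering method); returns 0/1 per point."""
    c = centroid(points)
    seed_a = max(points, key=lambda p: math.dist(p, c))       # farthest from the centroid
    seed_b = max(points, key=lambda p: math.dist(p, seed_a))  # farthest from the first seed
    centers = [seed_a, seed_b]
    assign = [0] * len(points)
    for _ in range(iterations):
        assign = [0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
                  for p in points]
        for g in (0, 1):
            members = [p for p, a in zip(points, assign) if a == g]
            if members:
                centers[g] = centroid(members)
    return assign


def find_mislabeled(features, labels, th_mean, th_var):
    """Return indices of object images judged to carry an erroneous label."""
    clusters = {}                                   # S12: one cluster per class (assumption)
    for i, label in enumerate(labels):
        clusters.setdefault(label, []).append(i)
    flagged = []
    for idx in clusters.values():
        pts = [features[i] for i in idx]
        c = centroid(pts)                           # S13: centroid, distance mean and variance
        dists = [math.dist(p, c) for p in pts]
        if len(idx) < 3 or fmean(dists) < th_mean or pvariance(dists) < th_var:
            continue                                # S14/S15: not an error cluster
        assign = two_means(pts)                     # S16: recluster the error cluster
        n0, n1 = assign.count(0), assign.count(1)
        if n0 == n1:
            continue                                # no minority group to single out
        minority = 0 if n0 < n1 else 1
        flagged += [i for i, a in zip(idx, assign) if a == minority]
    return sorted(flagged)


features = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (5, 5), (5.1, 5),  # labeled "car"
            (5, 5.1), (5.1, 5.1), (4.9, 5)]                            # labeled "person"
labels = ["car"] * 6 + ["person"] * 3
print(find_mislabeled(features, labels, th_mean=1.0, th_var=0.5))  # [4, 5]
```

In this toy data the two "car" samples at indices 4 and 5 lie among the "person" features, which inflates the distance mean and variance of the "car" cluster; the reclustering then isolates them as the minority group.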

ステップＳ１７において、データ処理装置１（例えばクラスタリング処理部４０）は、画像分類データセットに含まれる全ての物体画像に対する評価が完了したか否かを判定する。画像分類データセットに含まれる全ての物体画像が何れかの評価データセットに含められた上でステップＳ１２～Ｓ１５の処理又はステップＳ１２～Ｓ１６の処理が行われたならば、上記評価が完了したと判定して（ステップＳ１７のＹ）ステップＳ１８に進む。画像分類データセットに含まれる１以上の物体画像が未だ評価データセットに含められていないのであれば、ステップＳ１１に戻ってステップＳ１１以降の処理を繰り返す。 In step S17, the data processing device 1 (e.g., the clustering processing unit 40) determines whether or not the evaluation of all object images included in the image classification dataset has been completed. If all object images included in the image classification dataset have been included in one of the evaluation datasets and the processing of steps S12 to S15 or of steps S12 to S16 has been performed on them, it is determined that the evaluation has been completed (Y in step S17) and the process proceeds to step S18. If one or more object images included in the image classification dataset have not yet been included in an evaluation dataset, the process returns to step S11 and repeats the processing from step S11 onward.

ステップＳ１８において、学習用データセット生成部７０は、必要に応じて第３実施例で述べた誤りラベル対応処理を行いつつ、暫定データセットを元に学習用データセットを生成する。生成された学習用データセットはデータベース８０に記憶される。 In step S18, the learning dataset generation unit 70 generates a learning dataset based on the provisional dataset, while performing the erroneous label handling process described in the third embodiment as necessary. The generated learning dataset is stored in the database 80.

＜＜第５実施例＞＞ <<Fifth Embodiment>>
第５実施例を説明する。第５実施例では、上述した事項に対する応用技術、変形技術又は補足事項等を説明する。 A fifth embodiment will now be described. The fifth embodiment describes applied techniques, modified techniques, and supplementary matters relating to the matters described above.

上述の実施例において、誤りクラスタに属する誤りラベル付き物体画像の数が２つとなる具体例に挙げたが（図７～図１２参照）、１つの誤りクラスタに属する誤りラベル付き物体画像の数は１であり得るし、３以上であり得る。 In the above embodiment, a specific example was given in which the number of erroneously labeled object images belonging to an error cluster was two (see Figures 7 to 12), but the number of erroneously labeled object images belonging to one error cluster could be one, or three or more.

データ処理装置１は、ハードウェアとして、演算処理装置であるＣＰＵ（Central Processing Unit）、ＧＰＵ（Graphics Processing Unit）、ＲＯＭ（Read only memory）及びＲＡＭ（Random access memory）等を備える。データ処理装置１は、ＲＯＭに格納されたプログラム又は他の装置から通信を通じて取得されたプログラムをＣＰＵにて実行することにより、図１に示す各部位の機能を実現して良く、故に図１３のステップＳ１１～Ｓ１８の各処理を実現して良い。 The data processing device 1 includes, as hardware, a central processing unit (CPU), a graphics processing unit (GPU), a read only memory (ROM), and a random access memory (RAM), which are arithmetic processing devices. The data processing device 1 may realize the functions of each part shown in FIG. 1 by executing, in the CPU, a program stored in the ROM or a program acquired from another device through communication, and thus may realize each process of steps S11 to S18 in FIG. 13.

データ処理装置１にて生成された学習用データセットを用いて、任意のニューラルネットワークを機械学習させることにより画像認識用の推論モデルを構築できる。例えば、車載装置においてＤＮＮ（Deep Neural Network）を含む学習器を形成しておき、学習用データセットを教師データとして用いて、車載装置の学習器に機械学習を行わせるようにしても良い。機械学習を経て車載装置の学習器により物体検出が可能な推論モデルが形成される。そして、車載装置にて推論モデルによる物体検出を行わせ、物体検出の結果を、車両で実施され得る自動運転又は運転支援等に利用して良い。ここにおける車載装置は、自動車等の車両に設置される装置を指す。 An inference model for image recognition can be constructed by machine learning any neural network using the learning dataset generated by the data processing device 1. For example, a learning device including a DNN (Deep Neural Network) may be formed in the in-vehicle device, and the learning dataset may be used as training data to cause the learning device of the in-vehicle device to perform machine learning. After machine learning, an inference model capable of object detection is formed by the learning device of the in-vehicle device. Then, object detection is performed by the inference model in the in-vehicle device, and the results of the object detection may be used for autonomous driving or driving assistance that may be implemented in the vehicle. The in-vehicle device here refers to a device installed in a vehicle such as an automobile.

尚、データ処理装置１自体が車載装置であっても構わない。車両（例えば放送中継車）によっては、豊富な計算資源を有する車載装置が設置されることもあり、この場合においては特にデータ処理装置１自体を車載装置とすることも可能である。 The data processing device 1 itself may be an in-vehicle device. Depending on the vehicle (e.g., a broadcast relay vehicle), an in-vehicle device with abundant computing resources may be installed, and in this case, it is particularly possible for the data processing device 1 itself to be an in-vehicle device.

データ処理装置１により実行される処理の一部又は全部は、ソフトウェアおよびハードウェアの混在処理により実現しても良い。前述した方法をコンピュータに実行させるコンピュータプログラム及びそのプログラムを記録したコンピュータ読み取り可能な記録媒体は、本実施形態の範囲に含まれる。ここで、コンピュータ読み取り可能な記録媒体は、例えば、フレキシブルディスク、ハードディスク、ＣＤ－ＲＯＭ、ＭＯ、ＤＶＤ、ＤＶＤ－ＲＯＭ、ＤＶＤ－ＲＡＭ、大容量ＤＶＤ、次世代ＤＶＤ、半導体メモリである。 Part or all of the processing executed by the data processing device 1 may be realized by a mixture of software and hardware. A computer program that causes a computer to execute the above-mentioned method and a computer-readable recording medium on which the program is recorded are included in the scope of this embodiment. Here, computer-readable recording media are, for example, flexible disks, hard disks, CD-ROMs, MOs, DVDs, DVD-ROMs, DVD-RAMs, large-capacity DVDs, next-generation DVDs, and semiconductor memories.

本発明の実施形態は、特許請求の範囲に示された技術的思想の範囲内において、適宜、種々の変更が可能である。以上の実施形態は、あくまでも、本発明の実施形態の例であって、本発明ないし各構成要件の用語の意義は、以上の実施形態に記載されたものに制限されるものではない。上述の説明文中に示した具体的な数値は、単なる例示であって、当然の如く、それらを様々な数値に変更することができる。 The embodiments of the present invention can be modified in various ways as appropriate within the scope of the technical ideas set forth in the claims. The above embodiments are merely examples of the present invention, and the meanings of the terms of the present invention or each of the constituent elements are not limited to those described in the above embodiments. The specific numerical values shown in the above description are merely examples, and can, of course, be changed to various numerical values.

REFERENCE SIGNS LIST
１データ処理装置 1 Data processing device
１０画像分類データセット取得部 10 Image classification dataset acquisition unit
２０機械学習処理部 20 Machine learning processing unit
２１ニューラルネットワーク 21 Neural network
３０推論部 30 Inference unit
３１特徴量導出部 31 Feature amount derivation unit
４０クラスタリング処理部 40 Clustering processing unit
５０指標導出部 50 Index derivation unit
６０誤り特定部 60 Error identification unit
７０学習用データセット生成部 70 Learning dataset generation unit
８０データベース 80 Database

Claims

1. A data processing device comprising an arithmetic processing unit, wherein
the arithmetic processing unit:
creates an inference model for object classification by performing machine learning on a neural network using an image classification dataset including a plurality of object images, each of which includes image data of an object, and class information that identifies a class of the object in each of the object images;
derives a feature amount of each object image by inputting the image classification dataset into the inference model;
performs clustering to classify m object images into k clusters based on feature amounts of the m object images included in the plurality of object images and the class information for the m object images;
derives, for each cluster obtained by the clustering, a center of gravity of the plurality of feature amounts belonging to the cluster and a distance between the center of gravity and each feature amount, and derives an average and a variance of the distances derived for the plurality of feature amounts;
identifies, as an error cluster, a cluster to which an object image having an error in the class information belongs, based on the derived average and variance; and
performs reclustering of n object images belonging to the error cluster based on n feature amounts for the n object images, and identifies an object image having an error in the class information from among the n object images based on a result of the reclustering,
where m represents an integer of 3 or more, n represents an integer of 3 or more, and
k is equal to the number of object classes in the m object images represented by the class information for the m object images.

2. The data processing device according to claim 1, wherein the arithmetic processing unit identifies, as the error cluster, a cluster whose derived average is equal to or greater than a predetermined first threshold and whose derived variance is equal to or greater than a predetermined second threshold.

3. The data processing device according to claim 1, wherein the arithmetic processing unit classifies the n object images into first and second clusters by the reclustering, and when the total number of object images belonging to the second cluster is smaller than the total number of object images belonging to the first cluster, identifies the object images belonging to the second cluster as object images having an error in the class information.

4. The data processing device according to claim 1, wherein the arithmetic processing unit classifies the n object images into first and second clusters by the reclustering, and when a distance between the center of gravity of the n feature amounts for the n object images and the second cluster is greater than a distance between the center of gravity of the n feature amounts for the n object images and the first cluster, identifies the object images belonging to the second cluster as object images having an error in the class information.

5. The data processing device according to any one of claims 1 to 4, wherein
the plurality of object images are images cut out from a plurality of input images,
the arithmetic processing unit uses the plurality of input images as a plurality of learning images, and generates a learning dataset including image data of the plurality of learning images and annotation information that identifies a position and a class of an object in each learning image, and
the arithmetic processing unit extracts, from the plurality of input images, an input image including an object image having an error in the class information as an annotation target based on the result of identifying the error cluster, and corrects the annotation information for the annotation target using information provided from outside for the annotation target.

6. The data processing device according to claim 1, wherein
the plurality of object images are images cut out from a plurality of input images,
the arithmetic processing unit extracts a portion of the plurality of input images as a plurality of learning images, and generates a learning dataset including image data of the plurality of learning images and annotation information that identifies a position and a class of an object in each learning image, and
the arithmetic processing unit excludes, from the plurality of learning images, input images including object images having an error in the class information, based on the result of identifying the error cluster.

7. A data processing method comprising:
a feature derivation step of creating an inference model for object classification by executing machine learning of a neural network using an image classification dataset including a plurality of object images, each of which includes image data of an object, and class information that identifies a class of the object in each object image, and deriving a feature amount of each object image by inputting the image classification dataset into the inference model;
a clustering processing step of performing clustering to classify m object images into k clusters based on feature amounts of the m object images included in the plurality of object images and the class information for the m object images;
an index deriving step of deriving, for each cluster obtained by the clustering, a center of gravity of the plurality of feature amounts belonging to the cluster and a distance between the center of gravity and each feature amount, and deriving an average and a variance of the distances derived for the plurality of feature amounts; and
an error identifying step of identifying, as an error cluster, a cluster to which an object image having an error in the class information belongs, based on the derived average and variance, wherein
in the error identifying step, reclustering is performed on n object images belonging to the error cluster based on n feature amounts for the n object images, and an object image having an error in the class information is identified from among the n object images based on a result of the reclustering,
m represents an integer of 3 or more, n represents an integer of 3 or more, and
k is equal to the number of object classes in the m object images represented by the class information for the m object images.