JP7824601B2 - Information processing device and information processing method

Info

Publication number: JP7824601B2
Application number: JP2022041786A
Authority: JP
Inventors: 祐生鵜飼; 弘亘藤吉; 隆義山下; 翼平川
Original assignee: Glory Ltd; Chubu University
Current assignee: Glory Ltd; Chubu University
Priority date: 2022-03-16
Filing date: 2022-03-16
Publication date: 2026-03-05
Anticipated expiration: 2042-03-16
Also published as: WO2023176445A1; JP2023136262A

Description

本発明は、機械学習に関する情報処理装置（特に、機械学習に関する説明性を向上させるための情報処理装置）、およびそれに関連する技術に関する。 The present invention relates to an information processing device related to machine learning (in particular, an information processing device for improving explainability related to machine learning) and related technologies.

近年、深層学習（ディープラーニング）などの機械学習を用いた推論処理技術が急速に進化を遂げている。 In recent years, inference processing technologies using machine learning such as deep learning have evolved rapidly.

しかしながら、機械学習における学習モデルが非常に複雑であること等に起因して、学習モデルによる推論結果がどのような判断根拠に基づいて得られているのかが必ずしも明確ではない（説明が容易ではない）、という問題が存在する。 However, because learning models in machine learning are extremely complex, among other factors, it is not always clear (that is, not easy to explain) on what basis a learning model arrives at its inference results.

特に、推論結果が重要な影響を与える場面では、判断根拠の説明性を向上させることが要求されている。たとえば、特許文献１に記載の技術は、このような要求に応える技術の一つである。 In particular, in situations where the inference results have a significant impact, there is a demand for improved explainability of the basis for judgments. For example, the technology described in Patent Document 1 is one technology that meets this demand.

特開２００１－３３３７６号公報 Japanese Patent Application Laid-Open No. 2001-33376

上述の特許文献１においては、画像領域内における注目領域（学習モデルによる推論処理での注目領域）が可視化され、当該注目領域が推論に利用された領域として把握される。すなわち、画像内のいずれの領域に注目して類似性が判断されたかを把握できる。 In the aforementioned Patent Document 1, the region of interest within the image area (the region of interest in the inference process using the learning model) is visualized, and this region of interest is recognized as the region used in the inference. In other words, it is possible to understand which region within the image was focused on to determine similarity.

しかしながら、特許文献１の技術では、推論処理において、画像内の注目領域（注目位置）を把握できるとしても、どのようなコンセプト（概念）の類似性が推論結果に影響を与えているかを把握することは困難である。換言すれば、推論処理における判断根拠をコンセプトベースで説明することは困難である。 However, with the technology of Patent Document 1, even if it is possible to identify the region of interest (position of interest) within an image during inference processing, it is difficult to understand what concept similarities are influencing the inference results. In other words, it is difficult to explain the basis for judgments made during inference processing on a concept basis.

そこで、この発明は、画像の類似性の根拠をコンセプトベースで説明することが可能な技術を提供することを課題とする。 Accordingly, an object of this invention is to provide a technique capable of explaining the basis for image similarity on a concept basis.

上記課題を解決すべく、本発明に係る情報処理装置は、機械学習された学習モデルへの複数の入力画像のそれぞれの入力に対応して前記学習モデルから出力される複数の特徴ベクトルを取得し、前記複数の特徴ベクトルに対する階層化クラスタリング処理を実行することにより階層化された複数のクラスタを生成し、前記複数のクラスタのうちの特定クラスタに対応する部分空間あるいはベクトルを、前記特定クラスタのコンセプトとして抽出する制御部、を備えることを特徴とする。 In order to solve the above problem, the information processing device of the present invention is characterized by comprising a control unit that acquires multiple feature vectors output from a machine-learned learning model corresponding to each input of multiple input images to the learning model, generates multiple hierarchical clusters by performing a hierarchical clustering process on the multiple feature vectors, and extracts a subspace or vector corresponding to a specific cluster from the multiple clusters as a concept of the specific cluster.

前記制御部は、前記特定クラスタに関する代表ベクトルを、前記特定クラスタのコンセプトとして抽出してもよい。 The control unit may extract a representative vector related to the specific cluster as a concept of the specific cluster.

前記制御部は、前記代表ベクトルに対応する仮想的な入力画像であり且つ前記特定クラスタのコンセプトを可視化した画像であるコンセプト可視化画像を、前記代表ベクトルと前記学習モデルとに基づいて生成し、当該コンセプト可視化画像を表示部に表示させてもよい。 The control unit may generate a concept visualization image, which is a virtual input image corresponding to the representative vector and which visualizes the concept of the specific cluster, based on the representative vector and the learning model, and display the concept visualization image on the display unit.

前記情報処理装置は、前記特定クラスタの属性情報の入力を受け付ける受付部、をさらに備えてもよい。 The information processing device may further include a reception unit that receives input of attribute information for the specific cluster.

前記制御部は、前記学習モデルに対する第１画像の入力に対して前記学習モデルから出力される第１特徴ベクトルと前記学習モデルに対する第２画像の入力に対して前記学習モデルから出力される第２特徴ベクトルとに基づき前記第１画像と前記第２画像との類似性を判断する場合において、前記複数のクラスタにそれぞれ対応する複数のコンセプトのうちの少なくとも１つのコンセプトについて、前記第１画像と前記第２画像との類似性の判断に対する寄与度を算出してもよい。 When the control unit determines the similarity between the first image and the second image based on a first feature vector output from the learning model in response to a first image input to the learning model and a second feature vector output from the learning model in response to a second image input to the learning model, the control unit may calculate the contribution of at least one concept among a plurality of concepts corresponding to the plurality of clusters to the determination of the similarity between the first image and the second image.

前記制御部は、前記学習モデルに対する第１画像の入力に対して前記学習モデルから出力される第１特徴ベクトルと前記学習モデルに対する第２画像の入力に対して前記学習モデルから出力される第２特徴ベクトルとに基づき、前記第１画像と前記第２画像との類似性を判断する場合において、前記複数のクラスタにそれぞれ対応する複数の部分空間のうち、その直交補空間への前記第１および第２特徴ベクトルの射影ベクトルの相互間の距離を相対的に小さくする部分空間、または当該部分空間を張るベクトルを、前記第１画像と前記第２画像とが互いに似ていないと判断される根拠となるコンセプトとして抽出してもよい。 When determining the similarity between the first image and the second image based on a first feature vector output from the learning model in response to a first image being input to the learning model and a second feature vector output from the learning model in response to a second image being input to the learning model, the control unit may extract, from among a plurality of subspaces corresponding respectively to the plurality of clusters, a subspace for which the distance between the projections of the first and second feature vectors onto its orthogonal complement is relatively small, or a vector spanning that subspace, as a concept that serves as the basis for determining that the first image and the second image are not similar to each other.
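
The selection described above can be illustrated with a minimal sketch. It assumes that each concept is a single non-zero vector (a one-dimensional subspace), that distances are Euclidean, and that "relatively small" is read as "smallest among the candidates"; the function names and the concept labels are hypothetical and do not appear in the specification.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_out(v, concept):
    """Project v onto the orthogonal complement of the line spanned by concept.

    Assumes concept is a non-zero vector.
    """
    scale = dot(v, concept) / dot(concept, concept)
    return [x - scale * c for x, c in zip(v, concept)]


def residual_distance(v1, v2, concept):
    """Distance between v1 and v2 after the concept direction is removed."""
    return math.dist(project_out(v1, concept), project_out(v2, concept))


def dissimilarity_concept(v1, v2, concepts):
    """Return the label of the concept whose removal brings v1 and v2 closest.

    concepts: dict mapping a (hypothetical) label to a concept vector.
    """
    return min(concepts, key=lambda name: residual_distance(v1, v2, concepts[name]))
```

In this sketch, a concept whose removal makes the two feature vectors nearly coincide is the one along which they differ most, which is why it is reported as the basis for the dissimilarity judgment.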

また、本発明に係る情報処理装置は、機械学習された学習モデルへの複数の入力画像のそれぞれの入力に対応して前記学習モデルから出力される複数の特徴ベクトルを取得し、前記複数の特徴ベクトルに対する階層化クラスタリング処理を実行することにより階層化された複数のクラスタを生成し、前記複数のクラスタのうちの特定クラスタに対応する２以上の入力画像を、前記特定クラスタのコンセプトを表す画像群として決定する制御部、を備えることを特徴とする。 The information processing device according to the present invention is also characterized by comprising a control unit that acquires a plurality of feature vectors output from a machine-learned learning model corresponding to the input of a plurality of input images to the learning model, generates a plurality of hierarchical clusters by performing a hierarchical clustering process on the plurality of feature vectors, and determines two or more input images corresponding to a specific cluster among the plurality of clusters as a group of images representing the concept of the specific cluster.

また、本発明に係る情報処理方法は、ａ）機械学習された学習モデルへの複数の入力画像のそれぞれの入力に対応して前記学習モデルから出力される複数の特徴ベクトルを取得するステップと、ｂ）前記複数の特徴ベクトルに対する階層化クラスタリング処理を実行することにより階層化された複数のクラスタを生成するステップと、ｃ）前記複数のクラスタのうちの特定クラスタに対応する部分空間あるいはベクトルを、前記特定クラスタのコンセプトとして抽出するステップと、を備えることを特徴とする。 In addition, the information processing method of the present invention is characterized by comprising: a) a step of acquiring a plurality of feature vectors output from a machine-learned learning model corresponding to each input of a plurality of input images to the learning model; b) a step of generating a plurality of hierarchical clusters by performing a hierarchical clustering process on the plurality of feature vectors; and c) a step of extracting a subspace or vector corresponding to a specific cluster from the plurality of clusters as a concept of the specific cluster.
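
Steps a) to c) can be sketched as follows, under stated assumptions. The specification does not fix a linkage criterion or a way of choosing the representative vector in this passage, so the sketch assumes centroid linkage with Euclidean distance and takes the unit-length cluster mean as the representative (concept) vector. All function names are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def hierarchical_clusters(features):
    """Agglomerative clustering with centroid linkage (an assumed choice).

    Returns every cluster formed along the way as a tuple of member indices:
    the leaves first, then each merged cluster, ending with the root.
    """
    active = [(i,) for i in range(len(features))]
    all_clusters = list(active)
    while len(active) > 1:
        best = None  # (distance, index_i, index_j) of the closest pair so far
        for i in range(len(active)):
            for j in range(i + 1, len(active)):
                ci = centroid([features[k] for k in active[i]])
                cj = centroid([features[k] for k in active[j]])
                d = math.dist(ci, cj)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = tuple(sorted(active[i] + active[j]))
        active = [c for k, c in enumerate(active) if k not in (i, j)] + [merged]
        all_clusters.append(merged)
    return all_clusters


def representative_vector(features, cluster):
    """Mean of the cluster members, rescaled to unit length when possible."""
    c = centroid([features[k] for k in cluster])
    norm = math.sqrt(sum(x * x for x in c))
    return c if norm == 0 else [x / norm for x in c]
```

For four feature vectors, `hierarchical_clusters` yields the four leaves, two intermediate clusters, and the root; any of these can be passed to `representative_vector` as the specific cluster.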

本発明によれば、画像の類似性の根拠をコンセプトベースで説明することが可能である。 According to this invention, the basis for image similarity can be explained on a concept basis.

画像処理システムを示す概略図である。 Schematic diagram showing the image processing system.
画像処理装置における処理の概要を示す概念図である。 Conceptual diagram showing an overview of processing in the image processing device.
画像処理装置における処理を示すフローチャートである。 Flowchart showing processing in the image processing device.
コンセプト解析処理を示すフローチャートである。 Flowchart showing the concept analysis process.
類似判断の根拠を説明する処理を示すフローチャートである。 Flowchart showing the process for explaining the basis for a similarity determination.
非類似判断の根拠を説明する処理を示すフローチャートである。 Flowchart showing the process for explaining the basis for a dissimilarity determination.
第１フェーズにおける学習処理を示す概念図である。 Conceptual diagram showing the learning process in the first phase.
学習が進展した状態における特徴空間等を示す図である。 Diagram showing the feature space and the like in a state where learning has progressed.
第２フェーズにおける推論処理について説明する図である。 Diagram explaining the inference process in the second phase.
推論処理結果の一例を示す図である。 Diagram showing an example of an inference processing result.
階層化クラスタリング処理結果に係るデンドロイド（樹形図）等を示す図である。 Diagram showing a dendrogram (tree diagram) and the like for the results of the hierarchical clustering process.
特定クラスタ周辺の階層関係を示す図である。 Diagram showing the hierarchical relationships around a specific cluster.
クラスタを構成する入力画像等を示す図である。 Diagram showing input images and the like that constitute a cluster.
特徴ベクトルが超球面上にマッピングされた状態を示している。 Diagram showing feature vectors mapped onto a hypersphere.
特徴空間に関する２次元的表現と３次元的表現との対応関係を示す図である。 Diagram showing the correspondence between two-dimensional and three-dimensional representations of the feature space.
詳細な学習結果（超球面上での分布）を示す図である。 Diagram showing detailed learning results (distribution on the hypersphere).
線形分離器によって生成された分離平面等を示す図である。 Diagram showing a separating plane and the like generated by a linear separator.
各クラスタのコンセプトベクトルを示す図である。 Diagram showing the concept vectors of each cluster.
各クラスタのコンセプトベクトルを示す図である。 Diagram showing the concept vectors of each cluster.
各クラスタのコンセプトベクトルを示す図である。 Diagram showing the concept vectors of each cluster.
２つの入力画像の類似度について説明するための概念図である。 Conceptual diagram for explaining the similarity between two input images.
特徴ベクトルが特定平面（部分空間）に射影される様子を示す図である。 Diagram showing how a feature vector is projected onto a specific plane (subspace).
特徴ベクトルが特定直線（部分空間）に射影される様子を示す図である。 Diagram showing how a feature vector is projected onto a specific line (subspace).
特徴ベクトルが特定直線に射影される様子（コンセプトベクトルがｘ軸と同じ向きを向いている場合）を示す図である。 Diagram showing how a feature vector is projected onto a specific line (when the concept vector points in the same direction as the x-axis).
第３フェーズにおける解析処理結果等を示す図である。 Diagram showing analysis processing results and the like in the third phase.
或るコンセプトの詳細説明画面を示す図である。 Diagram showing a detailed explanation screen for a certain concept.
別のコンセプトの詳細説明画面を示す図である。 Diagram showing a detailed explanation screen for another concept.
コンセプト可視化画像の生成処理の概略を示す図である。 Diagram showing an outline of the process for generating a concept visualization image.
２つの画像が互いに類似していない旨の判断の根拠（非類似判断の根拠）を求める処理を示す概念図である。 Conceptual diagram showing the process of obtaining the basis for determining that two images are not similar to each other (the basis for a dissimilarity determination).

以下、本発明の実施形態を図面に基づいて説明する。 Embodiments of the present invention will be described below with reference to the drawings.

＜１．第１実施形態＞ <1. First Embodiment>
＜１－１．システム概要＞ <1-1. System Overview>
図１は、画像処理システム１を示す概略図である。図１に示されるように、画像処理システム１は、撮影画像を撮像する複数（多数）の撮影装置（監視カメラ等）２０と、撮影画像を処理する画像処理装置３０とを備えている。画像処理装置３０は、撮影画像の対象（ここでは対象人物）を識別ないし分類するための各種の処理を実行する装置である。画像処理装置３０は、各種の情報を処理する情報処理装置であるとも表現される。 Fig. 1 is a schematic diagram showing an image processing system 1. As shown in Fig. 1, the image processing system 1 includes a plurality (a large number) of image capture devices (such as surveillance cameras) 20 that capture images, and an image processing device 30 that processes the captured images. The image processing device 30 is a device that executes various processes for identifying or classifying subjects (here, target persons) in the captured images. The image processing device 30 can also be described as an information processing device that processes various types of information.

各撮影装置２０で撮影された撮影画像は、通信ネットワーク（ＬＡＮおよび／またはインターネット等）を介して画像処理装置３０に入力される。そして、画像処理装置３０による画像処理等によって、撮影画像内の対象人物等を識別ないし分類する処理等が行われる。詳細には、複数の撮影画像に撮影された複数の人物の中から、特定人物を識別（認識）する処理等が行われる。 Images captured by each image capture device 20 are input to the image processing device 30 via a communications network (such as a LAN and/or the Internet). The image processing device 30 then performs image processing to identify or classify people and other subjects in the captured images. In more detail, processing is performed to identify (recognize) a specific person from among multiple people captured in multiple captured images.

たとえば、所定エリア内に配置された複数の撮影装置２０による複数の撮影画像の中から、特定人物が写っている撮影画像（被写体として特定人物を含む画像）を探し出す処理が行われる。複数の撮影装置２０は、互いに異なる複数の場所（道路沿いの互いに異なる箇所、互いに異なる複数の店舗（内の各箇所）、および／または同一店舗（特に大型店舗）内の互いに異なる複数の箇所等）に分散して配置される。そして、画像処理装置３０は、検索対象の特定人物を複数の撮影画像の中から検索し、検索した１又は複数の撮影画像に対応する各撮影装置を特定することによって、所定エリア内における当該特定人物の行動（移動経路等）を特定する。端的に言えば、画像処理装置３０は、特定人物を追跡することが可能である。特定人物としては、迷子追跡処理における迷子（子供等）、あるいは、犯人追跡処理における犯人（被疑者）等が例示される。たとえば、或る撮影装置２０Ａの撮影画像と別の撮影装置２０Ｂの撮影画像と更に別の撮影装置２０Ｃの撮影画像との合計３枚の撮影画像に当該特定人物（検索対象人物）が含まれている（写っている）場合を想定する。この場合、画像処理装置３０は、当該特定人物が当該撮影装置２０Ａ，２０Ｂ，２０Ｃに対応する３カ所に存在していたことを知得できる。また、画像処理装置３０は、各撮影画像の撮影時刻（詳細には、当該３カ所の撮影画像に関する撮影時刻順序）に基づいて、当該３カ所の移動順序を知得することもできる。 For example, a process is performed to find, from among multiple images captured by multiple image capture devices 20 placed within a predetermined area, the captured images that show a specific person (images containing the specific person as a subject). The multiple image capture devices 20 are distributed across multiple different locations (such as different points along a road, multiple different stores (or points within them), and/or multiple different points within the same store (especially a large store)). The image processing device 30 then searches the multiple captured images for the specific person being sought and identifies the image capture devices corresponding to the one or more captured images it finds, thereby determining that person's behavior (movement path, etc.) within the predetermined area. In short, the image processing device 30 is capable of tracking a specific person. Examples of the specific person include a lost child in a lost child tracking process and a criminal (suspect) in a criminal tracking process. For example, assume that the specific person (the search target) appears in a total of three captured images: one from an image capture device 20A, one from another image capture device 20B, and one from yet another image capture device 20C. In this case, the image processing device 30 can determine that the specific person was present at the three locations corresponding to the image capture devices 20A, 20B, and 20C. The image processing device 30 can also determine the order in which the specific person moved among the three locations based on the capture time of each image (more specifically, the order of the capture times of the images from the three locations).
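
The derivation of a movement order from capture times can be sketched as below. The sketch assumes each matched image is recorded as a pair of a camera identifier and a capture time; the record layout and the function name are hypothetical.

```python
from datetime import datetime


def movement_order(sightings):
    """Return camera identifiers ordered by capture time.

    sightings: list of (camera_id, capture_time) pairs for one tracked person.
    """
    return [camera for camera, _ in sorted(sightings, key=lambda s: s[1])]


# Illustrative data only: three matched images from devices 20A, 20B, and 20C.
sightings = [
    ("20B", datetime(2022, 3, 16, 10, 5)),
    ("20A", datetime(2022, 3, 16, 10, 0)),
    ("20C", datetime(2022, 3, 16, 10, 12)),
]
route = movement_order(sightings)
```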

このような推論処理、複数の撮影装置２０で撮影された複数の撮影画像に基づき同一人物を識別する処理は、人物再識別（再同定ないし再認識）（Person Re-Identification）処理とも称される。なお、人物再識別に関する特定人物の追跡処理は、犯人を追跡する犯人追跡処理、および迷子を探す（追跡する）迷子追跡処理等に限定されず、たとえば、マーケティング等に利用するために各個人の行動を追跡する追跡処理等であってもよい。 This type of inference processing, that is, processing that identifies the same person based on multiple images captured by multiple image capture devices 20, is also called person re-identification processing. Note that tracking a specific person by person re-identification is not limited to criminal tracking processing or lost child tracking processing (searching for a lost child); it may also be, for example, tracking processing that follows the behavior of individuals for use in marketing and the like.

図２は、画像処理装置３０における処理の概要を示す概念図であり、図３は、画像処理装置３０における処理を示すフローチャートである。 Figure 2 is a conceptual diagram showing an overview of the processing in the image processing device 30, and Figure 3 is a flowchart showing the processing in the image processing device 30.

この実施形態では、図２および図３に示されるように、最初に、画像処理装置３０は、上記のような推論処理を行うための機械学習処理（学習モデル４００を機械学習する処理）を第１フェーズＰＨ１（図２）にて実行する（ステップＳ１１（図３））。詳細には、このような機械学習処理として、メトリックラーニング（距離学習とも称される）が実行される。より詳細には、ディープニューラルネットワーク（特に畳み込みニューラルネットワーク（Convolutional Neural Network））を用いたディープメトリックラーニング（Deep Metric Learning）が利用される。当該メトリックラーニングでは、入力画像２１０（２１１）の入力に対して特徴空間（特徴量空間）における特徴ベクトル２５０（２５１）（図７参照）を出力する学習モデル４００が用いられる。このような学習モデル４００は、入力画像（入力）から特徴ベクトル（出力）への変換（写像）を示すモデルである、とも表現される。第１フェーズＰＨ１における処理によって、機械学習された学習モデル４００（学習済みの学習モデル）（４２０）が生成される。 In this embodiment, as shown in Figures 2 and 3, the image processing device 30 first executes, in the first phase PH1 (Figure 2), machine learning processing (processing that trains the learning model 400) for performing the inference processing described above (step S11 (Figure 3)). Specifically, metric learning (also called distance learning) is executed as this machine learning processing. More specifically, deep metric learning using a deep neural network (in particular, a convolutional neural network) is used. This metric learning uses a learning model 400 that, given an input image 210 (211), outputs a feature vector 250 (251) (see Figure 7) in a feature space (also called a feature quantity space). Such a learning model 400 can also be described as a model representing the transformation (mapping) from an input image (input) to a feature vector (output). The processing in the first phase PH1 produces the machine-learned learning model 400 (trained learning model) (420).

次に、画像処理装置３０は、第２フェーズＰＨ２（図２）の処理として推論処理を実行する（ステップＳ１２）。具体的には、第１フェーズＰＨ１にて学習された学習モデル（学習済みモデル）４００（４２０）を利用することによって、推論処理が行われる。詳細には、所定エリア内で撮影された複数の撮影画像２１３（ギャラリー画像とも称する）の中から、特定人物を含む画像を探し出す処理等が、推論処理として実行される。より詳細には、特定人物の画像である検索元の画像２１５（クエリ画像とも称する）との類似度合いが所定程度以上（換言すれば、特徴空間における特徴ベクトル間の距離が所定距離以下）の画像を、特定人物と同一の人物の画像として探し出す処理等が、推論処理として実行される。あるいは、クエリ画像２１５に類似した画像をその類似順に探し出す処理等が推論処理（人物再識別処理）として実行されてもよい。なお、複数のギャラリー画像２１３は、探索範囲を構成する画像群（探索範囲画像群）とも称される。 Next, the image processing device 30 executes inference processing as processing of the second phase PH2 (FIG. 2) (step S12). Specifically, the inference processing is performed by utilizing the learning model (trained model) 400 (420) learned in the first phase PH1. Specifically, the inference processing includes processing such as searching for images containing a specific person from among multiple captured images 213 (also referred to as gallery images) captured within a specific area. More specifically, the inference processing includes processing such as searching for images that have a predetermined degree of similarity or higher with a search source image 215 (also referred to as a query image), which is an image of the specific person (in other words, the distance between the feature vectors in feature space is a predetermined distance or less), as images of the same person. Alternatively, the inference processing (person re-identification processing) may include processing such as searching for images similar to the query image 215 in order of similarity. The multiple gallery images 213 are also referred to as a group of images constituting a search range (search range image group).
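
The two search variants described above (a distance threshold, or ranking by similarity) can be sketched as follows. The sketch assumes the feature vectors have already been obtained from the trained model and that distance is Euclidean; the function name, the dictionary layout, and the threshold parameter are hypothetical.

```python
import math


def search_gallery(query_vec, gallery, max_distance=None):
    """Rank gallery images by feature-space distance to the query image.

    gallery: dict mapping a gallery image id to its feature vector.
    max_distance: optional threshold; when given, only images whose
    distance is at or below it are returned.
    Returns a list of (image_id, distance) pairs, nearest first.
    """
    ranked = sorted(
        ((img_id, math.dist(query_vec, vec)) for img_id, vec in gallery.items()),
        key=lambda item: item[1],
    )
    if max_distance is not None:
        ranked = [item for item in ranked if item[1] <= max_distance]
    return ranked
```

Calling the function without `max_distance` corresponds to the ranking variant, and passing a threshold corresponds to treating sufficiently close images as showing the same person.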

さらに、画像処理装置３０は、クエリ画像２１５と上記推論処理にて探し出された画像２１３との２つの画像（入力画像）の相互間の類似性に関する判断根拠をコンセプトベースで説明する処理（説明情報の生成処理等）を、第３フェーズＰＨ３（図２）の処理として実行する（ステップＳ１３）。 Furthermore, the image processing device 30 executes a process (such as a process for generating explanatory information) to explain, on a concept basis, the basis for determining the similarity between two images (input images), the query image 215 and the image 213 found in the above-mentioned inference process, as a process in the third phase PH3 (Figure 2) (step S13).

具体的には、まず、画像処理装置３０は、当該２つの画像の類似性に関する判断根拠の導出に先立って、機械学習された学習モデル４００（学習済みモデル４２０とも称する）にて、どのようなコンセプトが獲得（学習）されたかを解析する（図４参照）。なお、図４は、このような処理（コンセプト解析処理）を示すフローチャートである。 Specifically, before deriving the basis for determining the similarity between the two images, the image processing device 30 first analyzes what concepts have been acquired (learned) in the machine-learned learning model 400 (also referred to as the trained model 420) (see Figure 4). Note that Figure 4 is a flowchart showing this process (concept analysis process).

次に、画像処理装置３０は、当該２つの画像の類似性に関する判断根拠を導出する。詳細には、当該２つの画像の相互間の類似性に関する判断根拠（互いに類似している旨の判断の根拠）をコンセプトベースで説明するための情報（説明情報）を生成する処理等が実行される（図５参照）。なお、図５は、このような処理（類似判断の根拠を説明する処理）を示すフローチャートである。 Next, the image processing device 30 derives the basis for determining the similarity between the two images. In detail, it executes a process to generate information (explanatory information) for explaining the basis for determining the similarity between the two images (the basis for determining that they are similar to each other) on a concept basis (see Figure 5). Figure 5 is a flowchart showing this process (processing for explaining the basis for determining similarity).

より詳細には、学習モデル４００にて獲得された各種のコンセプトのうち、当該２つの画像の類似性に特に大きな影響を及ぼすコンセプト（寄与度が大きなコンセプト）が主要コンセプトとして抽出される。たとえば、複数のコンセプトのうち、（寄与度の高い順序等で）上位数個のコンセプトが主要コンセプトとして抽出される。そして、当該コンセプトが、２つの画像の類似性に関する判断根拠として決定されるとともに、当該コンセプトを表現するための各種画像が表示部３５ｂに表示される（ユーザに提示される）。 More specifically, from among the various concepts acquired by the learning model 400, concepts that have a particularly large influence on the similarity between the two images (concepts with a large contribution) are extracted as main concepts. For example, the top few concepts (ranked in descending order of contribution, for instance) are extracted as the main concepts. These concepts are then determined as the basis for the similarity judgment between the two images, and various images representing them are displayed on the display unit 35b (presented to the user).
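
Selecting the top few concepts by contribution can be sketched as below. How the contribution of each concept is calculated is not covered by this sketch; the scores are assumed to be given, and the function name, the concept labels, and the default of three concepts are hypothetical.

```python
def main_concepts(contributions, top_k=3):
    """Return the labels of the top_k concepts in descending order of contribution.

    contributions: dict mapping a concept label to its contribution score.
    """
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:top_k]]


# Illustrative scores only; the labels are placeholders, not from the specification.
scores = {"concept_a": 0.41, "concept_b": 0.07, "concept_c": 0.33, "concept_d": 0.19}
top = main_concepts(scores, top_k=2)
```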

このような処理（図４および図５等の処理）については後に詳述する。 This type of processing (such as that shown in Figures 4 and 5) will be described in more detail later.

＜１－２．画像処理装置３０＞ <1-2. Image Processing Device 30>
図１を再び参照する。図１に示されるように、画像処理装置３０は、コントローラ３１（制御部とも称される）と記憶部３２と通信部３４と操作部３５とを備える。 Referring again to Fig. 1, the image processing device 30 includes a controller 31 (also referred to as a control unit), a storage unit 32, a communication unit 34, and an operation unit 35.

コントローラ３１は、画像処理装置３０に内蔵され、画像処理装置３０の動作を制御する制御装置である。 The controller 31 is a control device built into the image processing device 30 that controls the operation of the image processing device 30.

コントローラ３１は、１又は複数のハードウェアプロセッサ（例えば、ＣＰＵ（Central Processing Unit）およびＧＰＵ（Graphics Processing Unit））等を備えるコンピュータシステムとして構成される。コントローラ３１は、ＣＰＵ等において、記憶部（ＲＯＭおよび／またはハードディスクなどの不揮発性記憶部）３２内に格納されている所定のソフトウエアプログラム（以下、単にプログラムとも称する）を実行することによって、各種の処理を実現する。なお、当該プログラム（詳細にはプログラムモジュール群）は、ＵＳＢメモリなどの可搬性の記録媒体に記録され、当該記録媒体から読み出されて画像処理装置３０にインストールされるようにしてもよい。あるいは、当該プログラムは、通信ネットワーク等を経由してダウンロードされて画像処理装置３０にインストールされるようにしてもよい。 The controller 31 is configured as a computer system equipped with one or more hardware processors (e.g., a CPU (Central Processing Unit) and a GPU (Graphics Processing Unit)). The controller 31 performs various processes by executing, in the CPU or the like, a predetermined software program (hereinafter simply referred to as a program) stored in a storage unit 32 (a non-volatile storage unit such as a ROM and/or a hard disk). The program (more specifically, a group of program modules) may be recorded on a portable storage medium such as a USB memory, read from the storage medium, and installed on the image processing device 30. Alternatively, the program may be downloaded via a communication network or the like and installed on the image processing device 30.

具体的には、コントローラ３１は、上述の第１フェーズＰＨ１における学習処理、第２フェーズＰＨ２における推論処理、および第３フェーズＰＨ３における説明処理（説明情報生成処理等）を実行する。 Specifically, the controller 31 executes the learning process in the first phase PH1, the inference process in the second phase PH2, and the explanation process (such as the explanation information generation process) in the third phase PH3.

記憶部３２は、ハードディスクドライブ（ＨＤＤ）および／またはソリッドステートドライブ（ＳＳＤ）等の記憶装置で構成される。記憶部３２は、学習モデル４００（学習モデルに関する学習パラメータおよびプログラムを含む）（ひいては学習済みモデル４２０）等を記憶する。 The storage unit 32 is composed of a storage device such as a hard disk drive (HDD) and/or a solid state drive (SSD). The storage unit 32 stores the learning model 400 (including learning parameters and programs related to the learning model) (and thus the trained model 420), etc.

通信部３４は、ネットワークを介したネットワーク通信を行うことが可能である。このネットワーク通信では、たとえば、ＴＣＰ／ＩＰ（Transmission Control Protocol / Internet Protocol）等の各種のプロトコルが利用される。当該ネットワーク通信を利用することによって、画像処理装置３０は、所望の相手先（たとえば、撮影装置２０あるいは不図示の情報格納装置等）との間で各種のデータ（撮影画像データおよび正解データ等）を授受することが可能である。 The communication unit 34 is capable of performing network communication via a network. This network communication uses various protocols, such as TCP/IP (Transmission Control Protocol/Internet Protocol). By using this network communication, the image processing device 30 can exchange various data (captured image data, correct answer data, etc.) with a desired destination (for example, the imaging device 20 or an information storage device (not shown)).

操作部３５は、画像処理装置３０に対する操作入力を受け付ける操作入力部３５ａと、各種情報の表示出力を行う表示部３５ｂとを備えている。操作入力部３５ａとしてはマウスおよびキーボード等が用いられ、表示部３５ｂとしてはディスプレイ（液晶ディスプレイ等）が用いられる。また、操作入力部３５ａの一部としても機能し且つ表示部３５ｂの一部としても機能するタッチパネルが設けられてもよい。 The operation unit 35 includes an operation input unit 35a that accepts operation inputs to the image processing device 30, and a display unit 35b that displays and outputs various information. A mouse and keyboard are used as the operation input unit 35a, and a display (such as a liquid crystal display) is used as the display unit 35b. A touch panel that functions as both part of the operation input unit 35a and part of the display unit 35b may also be provided.

なお、画像処理装置３０は、教師データを用いて学習モデル４００を機械学習する機能を備えているので、学習モデル生成装置とも称される。また、画像処理装置３０は、学習モデル４００（４２０）を用いて対象の識別および／また分類に関する推論を実行する装置でもあるので、推論装置とも称される。さらに、画像処理装置３０は、類似性に関する説明情報を生成する装置でもあるので、説明情報生成装置とも称される。また、画像処理装置３０は、学習モデル４００（４２０）により獲得されたコンセプトを抽出する装置でもあるのでコンセプト抽出装置とも称され、２つの画像の類似性の根拠を当該コンセプトに基づいて説明（解析）する装置でもあるので（類似性）解析装置とも称される。 The image processing device 30 has a function of training the learning model 400 using training data, and is therefore also referred to as a learning model generation device. It also uses the learning model 400 (420) to perform inference regarding identification and/or classification of targets, and is therefore also referred to as an inference device. Further, it generates explanatory information regarding similarity, and is therefore also referred to as an explanatory information generation device. It also extracts the concepts acquired by the learning model 400 (420), and is therefore also referred to as a concept extraction device; and because it explains (analyzes) the basis for the similarity between two images based on those concepts, it is also referred to as a (similarity) analysis device.

また、ここでは、様々な処理（機能）が１つの画像処理装置３０によって実現されているが、これに限定されない。たとえば、様々な処理が複数の装置で分担されて実現されてもよい。たとえば、上述の第１フェーズＰＨ１における学習処理と、第２フェーズＰＨ２における推論処理と、第３フェーズＰＨ３における説明処理（説明情報生成処理等）とが、それぞれ別個の装置で実行されてもよい。 Furthermore, while various processes (functions) are implemented here by a single image processing device 30, this is not limiting. For example, various processes may be shared and implemented by multiple devices. For example, the learning process in the first phase PH1, the inference process in the second phase PH2, and the explanation process (explanation information generation process, etc.) in the third phase PH3 described above may each be executed by separate devices.

＜１－３．学習段階（第１フェーズＰＨ１）の処理＞ <1-3. Processing in the Learning Stage (First Phase PH1)>
図３に示されるように、この実施形態では、第１フェーズＰＨ１における学習処理（ステップＳ１１）と、第２フェーズＰＨ２における推論処理（ステップＳ１２）と、第３フェーズＰＨ３における説明処理（ステップＳ１３）とがこの順序で実行される。 As shown in FIG. 3, in this embodiment, a learning process (step S11) in the first phase PH1, an inference process (step S12) in the second phase PH2, and an explanation process (step S13) in the third phase PH3 are executed in this order.

以下では、まず、第１フェーズＰＨ１における学習処理（ステップＳ１１）（図２および図３参照）について説明する。 Below, we will first explain the learning process (step S11) in the first phase PH1 (see Figures 2 and 3).

図７は、第１フェーズＰＨ１における学習処理を示す概念図である。 Figure 7 is a conceptual diagram showing the learning process in the first phase PH1.

図７に示されるように、第１フェーズＰＨ１（ステップＳ１１）においては、メトリックラーニング（距離学習）によって、学習モデル４００（詳細には学習前の学習モデル４１０（図２））に対する機械学習処理が実行される。詳細には、正解ラベル付き複数の教師データ（教師データ群）における複数の入力画像２１０（２１１）が学習モデル４００に対して順次に入力され、学習モデル４００からの出力群（複数の特徴ベクトル２５０（２５１））が取得される（図７参照）。そして、入力画像２１０（入力）と特徴空間における特徴ベクトル２５０（出力）との写像関係が学習される。より具体的には、特徴空間での距離（特徴ベクトル間の距離）が入力空間での入力画像の類似度を反映するように、学習モデル４００（写像関係）が学習される。たとえば、トリプレットロスなどの評価関数を最小化（最適化）するような学習処理等が行われる。このような学習処理によって、学習前の学習モデル４００（４１０）が学習され、学習済みモデル４２０が生成される（ステップＳ１１）。 As shown in FIG. 7, in the first phase PH1 (step S11), machine learning processing is performed on the learning model 400 (more specifically, the untrained learning model 410 (FIG. 2)) by metric learning (distance learning). Specifically, the multiple input images 210 (211) in multiple items of training data with correct answer labels (a training data group) are input sequentially to the learning model 400, and a group of outputs (multiple feature vectors 250 (251)) is obtained from the learning model 400 (see FIG. 7). The mapping between an input image 210 (input) and a feature vector 250 (output) in the feature space is then learned. More specifically, the learning model 400 (the mapping) is trained so that distances in the feature space (distances between feature vectors) reflect the similarity of the input images in the input space. For example, training is performed so as to minimize (optimize) an evaluation function such as triplet loss. Through this training, the untrained learning model 400 (410) is trained and the trained model 420 is generated (step S11).
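
As an illustration of the triplet loss named above as one possible evaluation function, the following sketch computes the loss for a single triplet of feature vectors. It uses the common hinge form with Euclidean distance; the margin value of 0.3 and the function name are assumptions, since the specification only names triplet loss as an example.

```python
import math


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge-style triplet loss for one (anchor, positive, negative) triplet.

    anchor and positive are feature vectors of images of the same person;
    negative is a feature vector of an image of a different person.
    The loss is zero once the negative is farther from the anchor than the
    positive by at least the margin.
    """
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)
```

Minimizing this quantity over many triplets pulls feature vectors of the same person together and pushes those of different persons apart, which is the property the trained mapping is meant to have.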

より詳細には、まず、画像処理装置３０は、機械学習用の人物画像２１０（２１１とも称する）を生成する。たとえば、画像処理装置３０は、撮影装置２０から取得した複数の撮影画像のそれぞれに対して人物抽出処理およびサイズ調整処理（リサイズ処理）を施して複数の人物画像２１０（２１１）を生成する。当該複数の人物画像２１０は、学習モデル４００に対する入力画像群として準備される。換言すれば、学習モデル４００に対する入力画像２１０（２１１）として、各人物画像２１０（２１１）が準備される。たとえば、幅（横）Ｗ０画素および高さ（縦）Ｈ０画素の画素配列（矩形形状の画素配列）を有するカラー画像（３チャンネル）が各入力画像２１０として準備される。換言すれば、入力画像２１０は、Ｗ０×Ｈ０×ＣＨ０のボクセルデータ（ただし、ＣＨ０＝３）として生成される。 More specifically, the image processing device 30 first generates a person image 210 (also referred to as 211) for machine learning. For example, the image processing device 30 performs a person extraction process and a size adjustment process (resizing process) on each of a plurality of captured images acquired from the image capture device 20 to generate a plurality of person images 210 (211). The plurality of person images 210 are prepared as a group of input images for the learning model 400. In other words, each person image 210 (211) is prepared as an input image 210 (211) for the learning model 400. For example, a color image (3 channels) having a pixel array (rectangular pixel array) with a width (horizontal) of W0 pixels and a height (vertical) of H0 pixels is prepared as each input image 210. In other words, the input image 210 is generated as voxel data of W0 x H0 x CH0 (where CH0 = 3).

また、複数の入力画像２１０の人物が同じ人物であるか異なる人物かに関する正解情報（正解ラベル）が、当該複数の入力画像２１０のそれぞれに付与される。たとえば、各入力画像２１０（２１１）に対して人物ＩＤ（人物を識別する識別子）等が付与される。詳細には、同一人物の画像には同じ人物ＩＤが付与され、異なる人物の画像には異なる人物ＩＤが付与される。このようにして、正解ラベルと入力画像２１０（２１１）との組み合わせが、正解ラベル付き教師データとして付与される。 In addition, correct answer information (correct answer label) regarding whether the people in the multiple input images 210 are the same person or different people is assigned to each of the multiple input images 210. For example, a person ID (an identifier that identifies a person) is assigned to each input image 210 (211). In detail, the same person ID is assigned to images of the same person, and different person IDs are assigned to images of different people. In this way, the combination of the correct answer label and the input image 210 (211) is assigned as training data with a correct answer label.

つぎに、当該複数の入力画像２１０（入力画像群）が順次に学習モデル４００に入力され、学習モデル４００からの複数の出力、すなわち特徴空間における複数の特徴ベクトル２５０（特徴ベクトル群）が順次に出力される（図７参照）。 Next, the multiple input images 210 (input image group) are sequentially input to the learning model 400, and multiple outputs from the learning model 400, i.e., multiple feature vectors 250 (feature vector group) in feature space, are sequentially output (see Figure 7).

ここにおいて、学習モデル４００は、複数の層（階層）が階層的に接続される階層構造を有している。具体的には、学習モデル４００は、入力層と複数の中間層と出力層とを備えている。複数の中間層は、特徴抽出層等を備えて構成される。特徴抽出層は、１又は複数の畳み込み層と１のプーリング層とが繰り返し配置されること等によって構成される。各畳み込み層では、畳み込み処理を実行するフィルタにより画像内の特徴が抽出される。また、各プーリング層では、微小画素範囲（たとえば、２×２の画素範囲）毎の平均画素値あるいは最大画素値等を抽出するプーリング処理（平均プーリング処理あるいは最大プーリング処理等）が行われ、画素サイズが低減（たとえば、縦横の各方向に１／２）される（情報量が凝縮される）。入力画像２１０に対して複数の特徴抽出処理が施されることによって、特徴マップ２３０（不図示）が生成される。また、当該特徴マップ２３０の各チャンネル画像に対してプーリング処理（たとえば、最大プーリング処理）が施されることによって、所定のチャンネル数（次元数）ＣＨ１を有する特徴ベクトル２５０が生成され、当該特徴ベクトル２５０が学習モデル４００から出力される。 Here, the learning model 400 has a hierarchical structure in which multiple layers (hierarchies) are hierarchically connected. Specifically, the learning model 400 includes an input layer, multiple intermediate layers, and an output layer. The multiple intermediate layers include a feature extraction layer, etc. The feature extraction layer is configured by repeatedly arranging one or more convolutional layers and one pooling layer, etc. In each convolutional layer, features within the image are extracted using a filter that performs convolution processing. In addition, in each pooling layer, a pooling process (such as average pooling or max pooling) is performed to extract the average pixel value or maximum pixel value for each small pixel range (e.g., a 2x2 pixel range), thereby reducing the pixel size (e.g., by half in both the vertical and horizontal directions) (condensing the amount of information). A feature map 230 (not shown) is generated by performing multiple feature extraction processes on the input image 210. Furthermore, by performing a pooling process (for example, a max pooling process) on each channel image of the feature map 230, a feature vector 250 having a predetermined number of channels (number of dimensions) CH1 is generated, and the feature vector 250 is output from the learning model 400.
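
The two pooling steps described above can be sketched as follows. This is a minimal illustration in plain Python, not the patented implementation: the function names `max_pool_2x2` and `global_max_pool` and the sample values are hypothetical, and a real model would use a deep learning framework rather than nested lists.

```python
def max_pool_2x2(channel):
    """2x2 max pooling with stride 2 on one channel (a list of pixel rows).

    Halves the height and width, keeping the largest value of each block.
    """
    h, w = len(channel), len(channel[0])
    return [
        [max(channel[y][x], channel[y][x + 1],
             channel[y + 1][x], channel[y + 1][x + 1])
         for x in range(0, w - 1, 2)]
        for y in range(0, h - 1, 2)
    ]


def global_max_pool(feature_map):
    """Reduce each channel of a feature map to its single maximum value.

    feature_map is a list of CH1 channels, each a list of pixel rows;
    the result is a CH1-dimensional feature vector.
    """
    return [max(max(row) for row in channel) for channel in feature_map]


# Hypothetical 4x4 channel, pooled down to 2x2.
channel = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]
pooled = max_pool_2x2(channel)

# Hypothetical feature map with two 2x2 channels, reduced to a 2-D vector.
feature_vector = global_max_pool([pooled, [[1, 0], [0, 3]]])
```

With the dimensions given in the description, the same global pooling would reduce a 14 x 14 x 1024 feature map to a 1024-dimensional feature vector.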

なお、このような学習モデル４００（ニューラルネットワーク）としては、たとえば、ＶＧＧ１６あるいはＲｅｓＮｅｔ（Residual Network）（残差ネットワーク）等が用いられればよい。ＶＧＧ１６は、１３層の畳み込み層と５層のプーリング層と３層の全結合層とを有する畳み込みニューラルネットワークモデルである。また、ＲｅｓＮｅｔ（Residual Network）（残差ネットワーク）は、層間で残差を足し合わせることを含む畳み込みニューラルネットワークである。ＲｅｓＮｅｔにおける特徴抽出層は、畳み込み層と活性化関数とスキップコネクション（ショートカットコネクション）との組合せ等で構成される複数の残差ブロック等で構成される。 Note that such a learning model 400 (neural network) may be, for example, VGG16 or ResNet (Residual Network). VGG16 is a convolutional neural network model with 13 convolutional layers, five pooling layers, and three fully connected layers. ResNet (Residual Network) is a convolutional neural network that includes summing residuals between layers. The feature extraction layer in ResNet is composed of multiple residual blocks, each made up of a combination of convolutional layers, activation functions, and skip connections (shortcut connections).

入力画像２１０における画像の各種の特徴は、特徴マップ２３０におけるチャンネルごと（換言すれば、特徴ベクトル２５０のチャンネル（要素）ごとに）に抽出される。なお、入力画像２１０における画像の特徴は、特徴マップ２３０における各チャンネルの２次元画像内において、その大まかな位置が保持された状態で抽出される。 Various image features in the input image 210 are extracted for each channel in the feature map 230 (in other words, for each channel (element) of the feature vector 250). Note that the image features in the input image 210 are extracted while maintaining their approximate positions within the two-dimensional image of each channel in the feature map 230.

たとえば、特徴マップ２３０は、それぞれ幅Ｗ１画素および高さＨ１画素の画素配列（矩形形状の画素配列）の２次元配列データで構成されるチャンネルをＣＨ１個備える３次元配列データ（Ｗ１×Ｈ１×ＣＨ１のボクセルデータ）である。特徴マップ２３０の各チャンネルのサイズ（Ｗ１×Ｈ１）は、たとえば、１４×１４である。特徴ベクトル２５０の各要素（の数値）は、各チャンネルで抽出された特徴を表している。特徴ベクトル２５０の次元数ＣＨ１は、特徴マップ２３０のチャンネル数ＣＨ１であり、たとえば、１０２４である。ただし、これに限定されず、各チャンネルのサイズ（Ｗ１×Ｈ１）およびチャンネル数ＣＨ１は、他の値であってもよい。たとえば、チャンネル数ＣＨ１（特徴ベクトル２５０の次元数）は、５１２、あるいは２０４８などであってもよい。 For example, the feature map 230 is three-dimensional array data (voxel data of W1 x H1 x CH1) comprising CH1 channels, each channel consisting of a two-dimensional array of pixels (a rectangular pixel array) W1 pixels wide and H1 pixels high. The size (W1 x H1) of each channel of the feature map 230 is, for example, 14 x 14. Each element (value) of the feature vector 250 represents the feature extracted in the corresponding channel. The number of dimensions CH1 of the feature vector 250 equals the number of channels CH1 of the feature map 230 and is, for example, 1024. However, the values are not limited to these, and the size (W1 x H1) of each channel and the number of channels CH1 may take other values. For example, the number of channels CH1 (the number of dimensions of the feature vector 250) may be 512 or 2048.

理想的には、特徴空間（学習モデル４００の出力空間）において、同一人物を被写体とする複数の入力画像２１０（２１１）に対応する複数の特徴ベクトル２５０（２５１）は互いに近い位置に配置され、異なる人物に関する複数の入力画像に対応する複数の特徴ベクトル２５０は互いに遠い位置に配置される。ただし、学習前の学習モデル４００からの出力に基づく特徴ベクトル群の分布（図７の最右欄参照）は、このような理想的な分布状態からずれている。 Ideally, in the feature space (the output space of the learning model 400), multiple feature vectors 250 (251) corresponding to multiple input images 210 (211) featuring the same person as a subject would be located close to each other, and multiple feature vectors 250 corresponding to multiple input images of different people would be located far from each other. However, the distribution of feature vectors based on the output from the learning model 400 before learning (see the rightmost column in Figure 7) deviates from this ideal distribution.

つぎに、メトリックラーニングにおいて、トリプレットロス（Triplet Loss）などの評価関数を最適化（最小化）するように、学習モデル４００が学習される。これによって、入力空間での入力画像の類似度が特徴空間での距離（特徴ベクトル間の距離）に対応するように、学習モデル４００（写像関係）が学習される。換言すれば、特徴空間における特徴ベクトルの分布位置が学習の進行に応じて徐々に変更される。非常に良好な機械学習が実行されれば、特徴空間における特徴ベクトルの分布は、上述の理想的な分布状態に徐々に近づいていく（図８の最右欄参照）。具体的には、最終的な特徴空間において、同じ人物（および似た服装の人物）の画像の対応特徴ベクトルは比較的近くに分布し、異なる人物（および大きく異なる服装の人物）の画像の対応特徴ベクトルは比較的離れて分布する。このような機械学習の結果、学習前の学習モデル４００（４１０とも称する）は、学習済みの学習モデル４００（４２０とも称する）に変化する。学習済みモデル４２０は、入力画像に応じた特徴量（特徴ベクトル）を抽出する特徴抽出器である、とも表現される。 Next, in metric learning, the learning model 400 is trained to optimize (minimize) an evaluation function such as triplet loss. This trains the learning model 400 (mapping relationship) so that the similarity of input images in the input space corresponds to the distance in the feature space (the distance between feature vectors). In other words, the distribution of feature vectors in the feature space gradually changes as the learning progresses. If very good machine learning is performed, the distribution of feature vectors in the feature space gradually approaches the ideal distribution described above (see the rightmost column in Figure 8). Specifically, in the final feature space, corresponding feature vectors of images of the same person (or people wearing similar clothing) will be relatively close to each other, while corresponding feature vectors of images of different people (or people wearing significantly different clothing) will be relatively far apart. As a result of this machine learning, the pre-learning learning model 400 (also referred to as 410) transforms into the trained learning model 400 (also referred to as 420). The trained model 420 can also be described as a feature extractor that extracts features (feature vectors) according to the input image.
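
As an illustration of the evaluation function mentioned above, the following sketch computes one common form of the triplet loss for a single (anchor, positive, negative) triplet. The description does not specify the exact formulation, so the use of plain (non-squared) Euclidean distance and the margin value of 0.2 are assumptions, and the function names are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Loss is zero once the negative is farther than the positive by `margin`.

    anchor / positive: feature vectors of two images of the same person.
    negative: feature vector of an image of a different person.
    margin: assumed hyperparameter, not taken from the description.
    """
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)


# Hypothetical 2-D feature vectors.
anchor = [0.0, 0.0]
positive = [0.1, 0.0]
easy_loss = triplet_loss(anchor, positive, [1.0, 0.0])  # negative already far away
hard_loss = triplet_loss(anchor, positive, [0.2, 0.0])  # negative too close
```

Minimizing this quantity over many triplets pulls same-person vectors together and pushes different-person vectors apart, which is the change in distribution described above.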

なお、図７および図８内の各最右欄においては、複数の特徴ベクトル２５０（２５１）を特徴空間にマッピングした様子が示されている。当該最右欄では、各特徴ベクトル２５１は、１つの点（詳細には点状の図形）として表現されている。詳細には、多数の入力画像２１１に対応する多数の特徴ベクトル２５１のうちの一部が、それぞれ点状の図形（白丸、黒丸、ハッチング付き白丸、ハッチング付き黒丸、白四角、黒四角等）で示されている。また、図７および図８のそれぞれにおいて、最右欄（特徴空間を示す大きな四角形部分）と当該最右欄の左側の部分（複数の特徴ベクトル２５１（細長い帯状の矩形で示されている）が配列された部分）とは同じ状況を示している。また、便宜上、ここでは本来同じ人物（および非常に似た服装の人物）に対応する複数の点を同じ図形（点状図形）で示している。ただし、画像処理装置３０は、何れの点（特徴ベクトル（換言すれば、入力画像））が同じ人物に本来対応するか（正解ラベル）を知らない。 Note that the rightmost columns in Figures 7 and 8 show how multiple feature vectors 250 (251) are mapped onto the feature space. In the rightmost columns, each feature vector 251 is represented as a single point (more specifically, a point-like figure). More specifically, some of the multiple feature vectors 251 corresponding to the multiple input images 211 are represented as point-like figures (white circles, black circles, hatched white circles, hatched black circles, white squares, black squares, etc.). In Figures 7 and 8, the rightmost column (the large rectangular portion representing the feature space) and the left portion of the rightmost column (the portion where multiple feature vectors 251 (shown as elongated strip-like rectangles) are arranged) show the same situation. For convenience, multiple points that actually correspond to the same person (and people wearing very similar clothing) are represented by the same figure (point-like figure). However, the image processing device 30 does not know which points (feature vectors (in other words, input images)) actually correspond to the same person (correct labels).

＜１－４．推論段階（第２フェーズＰＨ２）の処理＞ <1-4. Processing in the inference stage (second phase PH2)>
つぎに、第２フェーズＰＨ２（ステップＳ１２）（図２および図３参照）における推論処理について図９および図１０を参照しつつ説明する。図９は、特徴ベクトル２５０（２５３）を用いた推論処理について説明する図である。図１０は、推論処理結果の一例を示す図である。 Next, the inference processing in the second phase PH2 (step S12) (see FIGS. 2 and 3) will be described with reference to FIGS. 9 and 10. FIG. 9 is a diagram illustrating the inference processing using the feature vector 250 (253). FIG. 10 is a diagram showing an example of the inference processing result.

第２フェーズＰＨ２（ステップＳ１２）においては、画像処理装置３０は、探索範囲の複数の人物画像（具体的には、新たな複数の入力画像２１０（２１３））内の対象（ここでは対象人物）を識別（ないし分類）する推論処理を実行する。具体的には、ターゲットエリアにて（ターゲットエリアに配置された撮影装置２０により）撮影された新たな複数の入力画像２１０（２１３）の中から、探索対象（探索元）の入力画像２１５（新たな入力画像）内の人物と同一の人物が探索される。換言すれば、画像処理装置３０は、当該複数の入力画像２１３の人物の中から、探索対象の入力画像２１５（クエリ画像）内の人物と同一の人物を識別（認識）する。 In the second phase PH2 (step S12), the image processing device 30 performs an inference process to identify (or classify) objects (here, target persons) in multiple person images in the search range (specifically, multiple new input images 210 (213)). Specifically, a person identical to a person in the search target (search source) input image 215 (new input image) is searched for among multiple new input images 210 (213) captured in the target area (by a camera device 20 placed in the target area). In other words, the image processing device 30 identifies (recognizes) from among the people in the multiple input images 213, a person identical to a person in the search target input image 215 (query image).

そのため、まず、画像処理装置３０は、探索範囲の複数の人物画像（具体的には、新たな複数の入力画像２１０（ギャラリー画像２１３））を学習モデル４２０にそれぞれ入力し、当該学習モデル４２０からの出力をそれぞれ取得する。具体的には、図９に示されるように、各入力画像２１３に対する出力として、特徴ベクトル２５０（２５３）が取得される。また、各特徴ベクトル２５０（２５３）は、たとえば、１０２４次元のベクトルとして生成される。このような特徴ベクトル２５３が、各入力画像２１３の特徴を表すベクトルとして、複数の入力画像２１３のそれぞれに関して求められる（図９左側参照）。 To this end, the image processing device 30 first inputs multiple person images within the search range (specifically, multiple new input images 210 (gallery images 213)) into the learning model 420 and obtains output from the learning model 420. Specifically, as shown in FIG. 9, a feature vector 250 (253) is obtained as output for each input image 213. Furthermore, each feature vector 250 (253) is generated as a 1024-dimensional vector, for example. Such a feature vector 253 is obtained for each of the multiple input images 213 as a vector representing the features of each input image 213 (see the left side of FIG. 9).

同様に、画像処理装置３０は、探索対象の入力画像（クエリ画像）２１５を学習モデル４２０に入力し、当該学習モデル４２０から出力された特徴ベクトル２５０（２５５）を取得する（図９右側参照）。なお、クエリ画像２１５は、たとえば、複数の入力画像２１３（ギャラリー画像）とは別の画像（探索用に新たに付与された画像等）である。ただし、これに限定されず、クエリ画像２１５は、複数の入力画像２１３（ギャラリー画像）の中から何らかの契機等によって発見（特定）された探索対象人物に関する画像等であってもよい。 Similarly, the image processing device 30 inputs the input image (query image) 215 to be searched into the learning model 420 and obtains the feature vector 250 (255) output from the learning model 420 (see the right side of Figure 9). Note that the query image 215 is, for example, an image (such as an image newly assigned for search purposes) that is different from the multiple input images 213 (gallery images). However, without being limited to this, the query image 215 may also be an image related to the person to be searched that has been discovered (identified) from the multiple input images 213 (gallery images) due to some trigger or the like.

つぎに、画像処理装置３０は、クエリ画像２１５の特徴ベクトル２５５と複数の入力画像２１３に関する複数の特徴ベクトル２５３のそれぞれとの類似度合い（たとえば、ユークリッド距離、あるいはベクトル間の内積（コサイン類似度）等）を算出する。また、当該類似度合いの高い順（類似度合いの降順）に当該複数の特徴ベクトル２５３が並べ替えられる。より詳細には、ユークリッド距離の昇順に（あるいは、コサイン類似度の降順に）複数の特徴ベクトル２５３が並べ替えられる。 Next, the image processing device 30 calculates the degree of similarity between the feature vector 255 of the query image 215 and each of the multiple feature vectors 253 for the multiple input images 213, using a measure such as the Euclidean distance or the inner product between the vectors (cosine similarity). The multiple feature vectors 253 are then sorted from most similar to least similar. More specifically, they are sorted in ascending order of Euclidean distance (or in descending order of cosine similarity).
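
The ranking step can be sketched as follows, assuming (as the description does) that all feature vectors are L2-normalized so that the inner product equals the cosine similarity. The function names and the 2-D sample vectors are hypothetical stand-ins for the 1024-dimensional features.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_gallery(query, gallery):
    """Return (gallery_index, similarity) pairs, most similar first.

    Assumes every vector is already L2-normalized, so the inner product
    is the cosine similarity.
    """
    scored = [(index, dot(query, g)) for index, g in enumerate(gallery)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical unit vectors: one query and three gallery features.
query = [1.0, 0.0]
gallery = [[0.0, 1.0], [0.6, 0.8], [1.0, 0.0]]
ranking = rank_gallery(query, gallery)
order = [index for index, _ in ranking]
```

Here the gallery vector identical to the query ranks first and the orthogonal one ranks last.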

たとえば、画像処理装置３０は、特徴空間における特徴ベクトル２５５との距離が所定の距離以下（すなわち、類似度合いが所定程度以上）の１又は２以上の特徴ベクトル２５３を、クエリ画像２１５内の人物と同一の人物の特徴ベクトル２５３として特定する。換言すれば、画像処理装置３０は、特定された当該１又は２以上の特徴ベクトル２５３に対応する１又は２以上の入力画像２１３内の人物を、クエリ画像２１５内の人物と同一の人物であると認識する。 For example, the image processing device 30 identifies one or more feature vectors 253 whose distance from the feature vector 255 in the feature space is less than or equal to a predetermined distance (i.e., whose degree of similarity is greater than or equal to a predetermined level) as feature vectors 253 of the same person as the person in the query image 215. In other words, the image processing device 30 recognizes the person in the one or more input images 213 corresponding to the identified one or more feature vectors 253 as the same person as the person in the query image 215.

図１０は、複数の入力画像２１３にそれぞれ対応する複数の特徴ベクトル２５３（図１０にて砂地ハッチングを付した白丸でそれぞれ示される）が特徴空間にて分布する様子を示している。図１０では、クエリ画像２１５の特徴ベクトル２５５（白星印参照）から所定の距離範囲内に、３つの特徴ベクトル２５３（Ｖ３０１，Ｖ３０２，Ｖ３０３）が存在している。この場合、たとえば、当該３つの特徴ベクトル２５３（Ｖ３０１，Ｖ３０２，Ｖ３０３）に対応する３つの画像２１３が同一人物の画像として抽出される。また、当該３つの特徴ベクトル２５３は、特徴ベクトル２５５との類似度の降順に（距離の昇順に）並べられている。ここでは、上位３つの特徴ベクトル２５３に対応する３つの人物画像２１３が、クエリ画像２１５の人物と同一の人物（あるいは非常に類似する人物）の画像である、と認識されている。 Figure 10 shows how multiple feature vectors 253 (represented by open circles with sand-hatched lines in Figure 10) corresponding to multiple input images 213 are distributed in feature space. In Figure 10, three feature vectors 253 (V301, V302, V303) exist within a predetermined distance range from the feature vector 255 (see the open star) of the query image 215. In this case, for example, the three images 213 corresponding to the three feature vectors 253 (V301, V302, V303) are extracted as images of the same person. The three feature vectors 253 are also sorted in descending order of similarity to the feature vector 255 (ascending order of distance). Here, the three person images 213 corresponding to the top three feature vectors 253 are recognized as images of the same person (or a person very similar) as the person in the query image 215.

なお、これに限定されず、当該距離の昇順に並べ替えられた上位所定数の特徴ベクトル２５０（２５３）に対応する入力画像２１３内の人物が、クエリ画像２１５内の人物と同一の人物であると認識されてもよい。あるいは、複数の入力画像２１３が、クエリ画像２１５との（特徴ベクトル２５５に関する）距離の昇順（類似度の降順）に並べ替えられるだけでもよい。この場合でも、画像処理装置３０は、実質的にクエリ画像内の人物と同一の人物である可能性が高い人物をその可能性順に探し出す処理（同一人物の認識処理）を実行しており、当該処理は、クエリ画像内の対象人物を認識する推論処理の一つである。 However, without being limited to this, the person in the input image 213 corresponding to a predetermined number of feature vectors 250 (253) sorted in ascending order of distance may be recognized as the same person as the person in the query image 215. Alternatively, multiple input images 213 may simply be sorted in ascending order of distance (related to feature vector 255) from the query image 215 (descending order of similarity). Even in this case, the image processing device 30 essentially executes a process of searching for people who are likely to be the same person as the person in the query image in order of likelihood (same person recognition process), and this process is one type of inference process for recognizing the target person in the query image.
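
The two selection rules described above (a distance threshold, or a fixed number of top-ranked results) can be sketched as follows. The labels V301 to V303 follow FIG. 10, while V304, the coordinates, the threshold of 0.5 and the function names are hypothetical values chosen only for illustration.

```python
import math


def sorted_by_distance(query, gallery):
    """Return (distance, label) pairs in ascending order of Euclidean distance."""
    return sorted((math.dist(query, vec), label) for label, vec in gallery.items())


def within_distance(query, gallery, max_distance):
    """Labels whose feature vector lies within max_distance of the query."""
    return [label for d, label in sorted_by_distance(query, gallery)
            if d <= max_distance]


def top_k(query, gallery, k):
    """Labels of the k gallery vectors nearest to the query."""
    return [label for _, label in sorted_by_distance(query, gallery)[:k]]


# Hypothetical 2-D positions in feature space.
query = (0.0, 0.0)
gallery = {
    "V301": (0.1, 0.0),
    "V302": (0.0, 0.2),
    "V303": (0.3, 0.0),
    "V304": (1.0, 1.0),
}
same_person = within_distance(query, gallery, max_distance=0.5)
nearest_two = top_k(query, gallery, k=2)
```

Returning the full sorted list without any cut-off corresponds to the third variant, in which the gallery images are only reordered by likelihood.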

また、ここでは、複数の特徴ベクトル２５０（２５１，２５３，２５５）は、それぞれ正規化（詳細にはＬ２正規化）されているものとする。また、当該複数の特徴ベクトル２５０のうちのいずれか２つの特徴ベクトルの類似性を示す指標として、２つの特徴ベクトル２５０（Ｆとも表記する）間の内積（換言すれば、コサイン類似度）を採用する。具体的には、２つの入力画像Ｘ（クエリ画像Ｘｑおよびギャラリー画像Ｘｇ）にそれぞれ対応する２つの特徴ベクトルＦ（Ｆ^ｑ，Ｆ^ｇ）の内積（＝Ｆ^ｑ・Ｆ^ｇ＝ｑ・ｇ）が、２つの入力画像の類似度として用いられる。特徴ベクトルＦ^ｑは、入力画像（クエリ画像）Ｘ^ｑに対する学習モデル４００からの出力ベクトル（特徴空間（出力空間）における特徴ベクトル）Ｆであり、特徴ベクトルＦ^ｇは、入力画像（或るギャラリー画像）Ｘ^ｇに対する学習モデル４００からの出力ベクトルＦである。なお、記号「・」は、内積を表す。また、特徴ベクトルＦ^ｑを単に特徴ベクトルｑとも表現し、特徴ベクトルＦ^ｇを単に特徴ベクトルｇとも表現する。このような類似度Ｓｔは、次の式（１）のように表現される。 Here, each of the multiple feature vectors 250 (251, 253, 255) is assumed to be normalized (specifically, L2-normalized). The inner product (in other words, the cosine similarity) between two feature vectors 250 (also denoted F) is used as the index of similarity between any two of the multiple feature vectors 250. Specifically, the inner product (= F^q·F^g = q·g) of the two feature vectors F (F^q, F^g) corresponding to two input images X (query image X^q and gallery image X^g) is used as the similarity between the two input images. The feature vector F^q is the output vector F (a feature vector in the feature space (output space)) from the learning model 400 for the input image (query image) X^q, and the feature vector F^g is the output vector F from the learning model 400 for the input image (a certain gallery image) X^g. The symbol "·" denotes the inner product. The feature vector F^q is also written simply as the feature vector q, and the feature vector F^g simply as the feature vector g. This similarity St is expressed by the following equation (1).
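
Equation (1) itself is not reproduced in this text. From the definitions in the preceding paragraph it can be reconstructed, as an assumption rather than a quotation of the published formula, as the inner product of the two L2-normalized feature vectors. The element indices q_i and g_i are notation introduced here, and CH1 is the number of dimensions of the feature vectors:

```latex
S_t = F^{q} \cdot F^{g} = q \cdot g = \sum_{i=1}^{CH_1} q_i \, g_i \qquad (1)
```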

なお、各特徴ベクトルＦ（２５０）は、正規化されている（各ベクトルＦの大きさは１である）ため、２つの特徴ベクトル間の内積は、２つの特徴ベクトル間のコサイン類似度に等しい。また、コサイン類似度（および内積）が大きい（「１」に近い）ということは、２つの特徴ベクトルＦのなす角度θが小さいこと（２つの特徴ベクトルＦが類似していること）、ひいては、当該２つの特徴ベクトルＦに対応する２つの入力画像が類似していることを意味する。すなわち、２つの特徴ベクトルＦの類似度Ｓｔ（２つの特徴ベクトルＦの内積）が大きいほど、２つの特徴ベクトルＦに対応する２つの入力画像は類似する。 Note that because each feature vector F (250) is normalized (the magnitude of each vector F is 1), the dot product between two feature vectors is equal to the cosine similarity between the two feature vectors. Furthermore, a large cosine similarity (and dot product) (close to 1) means that the angle θ between the two feature vectors F is small (the two feature vectors F are similar), and therefore the two input images corresponding to the two feature vectors F are similar. In other words, the greater the similarity St (the dot product of the two feature vectors F) between the two feature vectors F, the more similar the two input images corresponding to the two feature vectors F are.
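
The relationship stated above can be checked numerically. The sketch below L2-normalizes two hypothetical vectors and confirms that their inner product equals the cosine similarity of the unnormalized vectors. It also checks the standard identity that, for unit vectors, the squared Euclidean distance equals 2 - 2(q·g), which is why sorting by ascending distance and sorting by descending cosine similarity give the same order. All names and values are illustrative.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


# Hypothetical unnormalized feature vectors.
raw_q, raw_g = [3.0, 4.0], [4.0, 3.0]
q, g = l2_normalize(raw_q), l2_normalize(raw_g)

# Inner product of the unit vectors.
inner = dot(q, g)

# Cosine similarity computed directly from the unnormalized vectors.
cosine = dot(raw_q, raw_g) / (math.sqrt(dot(raw_q, raw_q))
                              * math.sqrt(dot(raw_g, raw_g)))

# Angle between the vectors, and squared Euclidean distance on the unit circle.
angle_deg = math.degrees(math.acos(inner))
squared_distance = sum((a - b) ** 2 for a, b in zip(q, g))
```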

＜１－５．特徴ベクトルＦの分布について＞ <1-5. Distribution of feature vector F>
ここにおいて、図７および図８（の各最右欄）においては、特徴ベクトルＦの分布は平面（超平面）上の点群として２次元的に表現されている。 In FIGS. 7 and 8 (in the rightmost columns), the distribution of feature vectors F is expressed two-dimensionally as a group of points on a plane (hyperplane).

一方、図１４等に示されるように、特徴ベクトルＦの分布は球面（超球面）上の点群として３次元的にも表現することも可能である。以下では、後者の表現（超球面を用いた３次元的な表現）を主に用いて説明する。 On the other hand, as shown in Figure 14, etc., the distribution of feature vectors F can also be expressed three-dimensionally as a group of points on a sphere (hypersphere). In the following, we will mainly use the latter expression (three-dimensional expression using a hypersphere) for explanation.

図１４は、学習済みモデル４２０から出力された複数（ここでは３つ）の特徴ベクトルＦ（２５０）が超球面上にマッピングされた状態を示している。ここでは、各特徴ベクトルＦ（２５０）は、正規化されている（各ベクトルＦのノルム（大きさ）は１である）。それ故、図１４に示されるように、特徴ベクトルＦは、原点を始点とし超球面上の点を終点とするベクトルで表現され得る。 Figure 14 shows the state in which multiple (here, three) feature vectors F (250) output from the trained model 420 are mapped onto the hypersphere. Here, each feature vector F (250) is normalized (the norm (magnitude) of each vector F is 1). Therefore, as shown in Figure 14, the feature vector F can be represented as a vector starting from the origin and ending at a point on the hypersphere.

また、上述したように、２つの特徴ベクトルＦのなす角度θが小さいこと（当該２つの特徴ベクトルＦの内積が大きいこと）は、当該２つの特徴ベクトルＦに対応する２つの入力画像が類似していることを意味する。 Also, as mentioned above, a small angle θ between two feature vectors F (a large dot product of the two feature vectors F) means that the two input images corresponding to the two feature vectors F are similar.

図１５は、特徴空間に関する２次元的表現と３次元的表現との対応関係を示す図である。図１５において、左右上下に大別される４つの図形群のうち、図１５の上側の２つの図形群は、各特徴ベクトルＦ（２５０）が超平面に分布する様子を示す図である。これに対して、図１５の下側の２つの図形群は、各特徴ベクトルＦ（２５０）が超球面に分布する様子を示す図である。 Figure 15 shows the correspondence between two-dimensional and three-dimensional representations of feature space. Of the four groups of figures in Figure 15, roughly divided into left, right, top, and bottom, the two groups of figures at the top of Figure 15 show how each feature vector F(250) is distributed on a hyperplane. In contrast, the two groups of figures at the bottom of Figure 15 show how each feature vector F(250) is distributed on a hypersphere.

詳細には、図１５の左下の図形群（球体およびその表面の点状図形等）は、図７の最右欄と同様、特徴空間における「未学習の学習モデル４００」による出力ベクトル（特徴ベクトルＦ）の分布を示している。当該左下の図形群は、特徴空間を３次元的に図示するものであり、図１５の左上側の図形群（大きな矩形およびその内部の図形）と同様の状況を（ただし、２次元的にではなく３次元的に）示している。 In detail, the group of figures in the lower left of Figure 15 (spheres and point figures on their surfaces, etc.) show the distribution of output vectors (feature vectors F) from the "untrained learning model 400" in feature space, similar to the right-most column of Figure 7. The group of figures in the lower left illustrates the feature space in three dimensions, and shows a similar situation to the group of figures in the upper left of Figure 15 (large rectangles and figures inside them) (although in three dimensions rather than two dimensions).

また、図１５の右下の図形群（球体およびその表面の点状図形等）は、図８の最右欄と同様、特徴空間における「学習済みの学習モデル４００（４２０）」による出力ベクトル（特徴ベクトルＦ）の分布を示している。当該右下の図形群は、特徴空間を３次元的に図示するものであり、図１５の右上側の図形群（大きな矩形およびその内部の図形）と同様の状況を（ただし、２次元的にではなく３次元的に）示している。 Furthermore, the group of figures (spheres and point-like figures on their surfaces, etc.) in the lower right of Fig. 15 shows the distribution of output vectors (feature vectors F) in the feature space by the "trained learning model 400 (420)," similar to the rightmost column of Fig. 8. The group of figures in the lower right illustrates the feature space three-dimensionally, and shows a situation similar to that of the group of figures (large rectangles and figures inside them) in the upper right of Fig. 15 (however, three-dimensionally, not two-dimensionally).

なお、図１５の上側および図８等は、便宜上、特徴ベクトルＦの分布を２次元的に表現する概念図であり、図１５の下側および図１４等は、便宜上、特徴ベクトルＦの分布を３次元的に表現する概念図である。実際の特徴ベクトルＦは、通常、２次元ないし３次元ベクトルではなく、非常に高次（更に高次の）多次元ベクトル（たとえば１０２４次元ベクトル）である。４次元以上のベクトルは、３次元空間において可視的に図示することは困難であり、特徴ベクトルＦに関するこれらの図示表現（図８、図１４、図１５等参照）は、いずれも、簡略化した仮想的なものである。 Note that, for convenience, the upper part of Figure 15 and Figure 8 etc. are conceptual diagrams that represent the distribution of feature vectors F in two dimensions, while the lower part of Figure 15 and Figure 14 etc. are conceptual diagrams that represent the distribution of feature vectors F in three dimensions. Actual feature vectors F are not usually two- or three-dimensional vectors, but rather very high-dimensional (even higher) multidimensional vectors (for example, 1024-dimensional vectors). Vectors with four or more dimensions are difficult to visually illustrate in three-dimensional space, and these graphical representations of feature vectors F (see Figures 8, 14, 15 etc.) are all simplified virtual representations.

＜１－６．第１および第２フェーズの処理の詳細等＞ <1-6. Details of the first and second phase processes>
本実施形態（特に上述の第１フェーズＰＨ１および第２フェーズＰＨ２）では、より詳細には次のような人物再識別（再同定）処理等が実行される。 In this embodiment (particularly the first phase PH1 and the second phase PH2 described above), more specifically, the following person re-identification process is executed.

まず、第１フェーズＰＨ１（ステップＳ１１）において、多数の人物（たとえば、数百人～数万人）のそれぞれを個々に含む多数の画像（たとえば、数千枚～数十万枚）に基づき学習モデル４００が機械学習される。なお、当該多数の画像に係る多数の人物は、互いに異なる服を着ており、完全に同じ服装の別人はいない、との前提である。同じような服装の人物画像に対応する特徴ベクトル同士は、特徴空間（学習モデル４００の出力空間）において相対的に（異なる服装の人物画像に対応する特徴ベクトルよりも）近くに配置されるように、学習が行われる。 First, in the first phase PH1 (step S11), the learning model 400 is machine-learned based on a large number of images (e.g., thousands to hundreds of thousands), each of which contains one of a large number of people (e.g., hundreds to tens of thousands). It is assumed that the people in these images are wearing different clothing from one another, and that no two different people are wearing exactly the same clothing. Learning is performed so that feature vectors corresponding to images of people wearing similar clothing are positioned relatively closer to each other in the feature space (the output space of the learning model 400) than feature vectors corresponding to images of people wearing different clothing.

詳細には、同じ人物（同じ服を着用している同じ人物）の画像に対応する特徴ベクトルは、特徴空間において非常に近接して配置されるように、学習される。換言すれば、特徴空間において、或る服装特徴を有する同一人物に係る複数の画像は、非常に近くに配置されるように学習される。すなわち、同じ人物の画像同士（詳細にはその特徴ベクトル同士）は、特徴空間において非常に近接して配置される。また、似た服装の異なる人物（類似する服装特徴を有する互いに異なる人物）の画像同士（詳細にはその特徴ベクトル同士）も、特徴空間において近接して配置される。一方、大きく異なる服装の異なる人物の画像同士（詳細にはその特徴ベクトル同士）は、特徴空間において比較的離れて配置される。 In more detail, feature vectors corresponding to images of the same person (the same person wearing the same clothes) are trained to be positioned very close together in feature space. In other words, multiple images of the same person with certain clothing characteristics are trained to be positioned very close together in feature space. That is, images of the same person (more specifically, their feature vectors) are positioned very close together in feature space. Furthermore, images (more specifically, their feature vectors) of different people wearing similar clothing (different people with similar clothing characteristics) are also positioned close together in feature space. On the other hand, images (more specifically, their feature vectors) of different people wearing very different clothing are positioned relatively far apart in feature space.

図８の最右欄には、このような学習結果の一例が示されている。詳細には、学習済みモデル４２０による各出力ベクトル（特徴ベクトルＦ）が点状の図形でそれぞれ表現され、複数の特徴ベクトルＦを表す複数の点状図形が、特徴空間（を表す大きな矩形）内にプロット（配置）されて示されている。 An example of such a learning result is shown in the rightmost column of Figure 8. In detail, each output vector (feature vector F) from the trained model 420 is represented by a dot-like figure, and multiple dot-like figures representing multiple feature vectors F are plotted (arranged) within (a large rectangle representing) the feature space.

また、図１１および図１６は、（図８の最右欄よりも）詳細な学習結果の一例を示す図である。図１１の中央部分から下半部分に亘る大きな矩形内には、学習済みの学習モデル４００（学習済みモデル４２０）による出力ベクトル（特徴ベクトルＦ）の分布が２次元的に示されている。また、図１６は、同様の状況を３次元的に表現した図である。 Furthermore, Figures 11 and 16 show an example of a more detailed learning result (than the rightmost column of Figure 8). Within the large rectangle spanning from the center to the lower half of Figure 11, the distribution of output vectors (feature vectors F) from the trained learning model 400 (trained model 420) is shown in two dimensions. Furthermore, Figure 16 is a diagram that depicts a similar situation in three dimensions.

上述のように、特徴空間においては、同一人物の複数（たとえば２つの）の画像にそれぞれ対応する複数（たとえば２つ）の特徴ベクトルＦは、非常に近接して（比較的近接して）配置される。たとえば、図１１および図１６において、２つの点状図形（たとえば、クロスハッチング付きの白丸図形のペア）が互いに非常に近接して配置されている。これは、同一人物の２つの画像（同じ衣服を着用した両人物画像）にそれぞれ対応する２つの特徴ベクトルＦが非常に近接して配置されていることを表している。換言すれば、点状図形のペアは、同じ服装の同一人物の画像ペアに対応する。 As described above, in feature space, multiple (e.g., two) feature vectors F corresponding to multiple (e.g., two) images of the same person are located very close (relatively close). For example, in Figures 11 and 16, two point-like figures (e.g., a pair of cross-hatched white circle figures) are located very close to each other. This indicates that two feature vectors F corresponding to two images of the same person (both images of the person wearing the same clothing) are located very close to each other. In other words, the pair of point-like figures corresponds to a pair of images of the same person wearing the same clothing.

また、似た服装の異なる人物の画像同士（詳細にはその特徴ベクトルＦ同士）も、（同一人物の画像同士ほどの近接度合いではないものの）特徴空間において互いに近接して配置される。たとえば、図１１および図１６において、９つの白丸図形ペア（合計１８個の白丸図形）が比較的近くに配置されている。また、当該９つの白丸図形ペア（ハッチング有無の双方）のうち、特に６つの白丸図形ペア（ハッチング付き）が比較的近くに配置されている。 Furthermore, images of different people wearing similar clothing (more specifically, their feature vectors F) are also positioned close to each other in feature space (although not as close as images of the same person). For example, in Figures 11 and 16, nine pairs of white circles (a total of 18 white circles) are positioned relatively close to each other. Of these nine pairs of white circles (both with and without hatching), six pairs of white circles (with hatching) are particularly positioned relatively close to each other.

一方、大きく異なる人物の画像同士（詳細にはその特徴ベクトルＦ同士）は、特徴空間において比較的離れて配置される。たとえば、黒丸図形ペアと白丸図形ペア（ハッチング無し）とは大きく離れて配置されている。 On the other hand, images of people that are significantly different (specifically, their feature vectors F) are positioned relatively far apart in feature space. For example, a pair of black circles and a pair of white circles (no hatching) are positioned far apart.

このように、同一人物に関する特徴ベクトルＦは比較的狭い範囲に密集して存在し、似た服装の異なる人物に関する特徴ベクトルＦは若干広い範囲に密集して存在する。一方、大きく異なる人物（大きく異なる服装を着用した異なる人物）に関する特徴ベクトルＦは、比較的遠く離れて（比較的大きく分散して）存在する。 In this way, feature vectors F relating to the same person are clustered within a relatively narrow range, while feature vectors F relating to different people wearing similar clothing are clustered within a slightly wider range. On the other hand, feature vectors F relating to very different people (different people wearing very different clothing) lie relatively far apart (are relatively widely dispersed).

なお、各図においては、図示の都合上、複数の特徴ベクトルＦのうち一部の特徴ベクトルＦのみが示されている。特に、図８、図１５および図１７以降では、図１１および図１６等よりも更に少数の特徴ベクトルＦのみが示されている。 Note that, for convenience of illustration, each figure shows only a portion of the multiple feature vectors F. In particular, in Figures 8, 15, and 17 onwards, even fewer feature vectors F are shown than in Figures 11 and 16, etc.

次に、第２フェーズＰＨ２（ステップＳ１２）において、画像処理装置３０は、学習済みの学習モデル４００（４２０）を用いて２人の人物（クエリ画像２１５の人物とギャラリー画像２１３の人物）が同じ人物か否か（両人物の類似度合い）を判定する。なお、同一人物は同じ服を着用しているとの前提で、同じような服装の人物が同一人物（詳細には、同一人物である可能性が高い人物）であるとして探索される。同じような服装の人物の画像ペア間の類似度は相対的に高くなる。 Next, in the second phase PH2 (step S12), the image processing device 30 uses the trained learning model 400 (420) to determine whether two people (the person in the query image 215 and the person in the gallery image 213) are the same person (the degree of similarity between the two people). It is assumed that the same person wears the same clothes, and people wearing similar clothing are searched for as the same person (more specifically, people who are likely to be the same person). The similarity between pairs of images of people wearing similar clothing is relatively high.

具体的には、たとえば、式（１）で示される類似度Ｓｔを最大にする特徴ベクトルＦの組み合わせ（ひいては対応する画像ペア）が、求められる。詳細には、クエリ画像の特徴ベクトルＦとの類似度Ｓｔを最大化する特徴ベクトルＦに対応するギャラリー画像が抽出される。すなわち、その特徴ベクトルが互いに類似する２つの画像が、同一人物の画像として抽出される。たとえば、図２１（あるいは図１４）等に示されるように、同じ服装（あるいは似た服装）の人物が、同一人物であると推定されて抽出される。 Specifically, for example, the combination of feature vectors F (and thus the corresponding image pair) that maximizes the similarity St shown in equation (1) is found. More specifically, the gallery image corresponding to the feature vector F that maximizes the similarity St with the feature vector F of the query image is extracted. In other words, two images whose feature vectors are similar to each other are extracted as images of the same person. For example, as shown in Figure 21 (or Figure 14), people wearing the same (or similar) clothing are presumed to be the same person and extracted.

さらに、次の第３フェーズＰＨ３（ステップＳ１３）では、同一の人物（ないし類似する人物）であると画像処理装置３０が判断した根拠を説明する処理等が、画像処理装置３０によって実行される。たとえば、類似していると判断した根拠が、白いショートパンツを着用している点なのか、および／または、パターン付き（チェック柄等）のシャツを着用している点なのか等が解析される。また、その前準備として、学習済みモデル４２０は、学習データに基づき、どのようなコンセプトを学習したのかに関する解析処理等が実行される。これらについては次述する。 Furthermore, in the next third phase PH3 (step S13), the image processing device 30 executes processing such as explaining the basis on which it determined that the persons are the same (or similar). For example, it analyzes whether the basis for the similarity determination is that both are wearing white shorts and/or that both are wearing patterned (checked, etc.) shirts. In addition, as a preliminary step, analysis processing is executed to determine what concepts the trained model 420 has learned from the training data. These are described next.

＜１－７．説明段階（第３フェーズＰＨ３）の処理（概要）＞
この実施形態においては、画像処理装置３０は更に第３フェーズＰＨ３（図２）の処理を実行する。第３フェーズＰＨ３の処理は、２つの画像（入力画像）の相互間の類似性に関する判断根拠をコンセプトベースで説明する処理（説明情報の生成処理等）である。ここでは、２つの画像の類似性として、クエリ画像２１５と上記推論処理にて探し出された画像２１３との類似性が判断される場合について主に説明する。 <1-7. Explanation Stage (Third Phase PH3) Processing (Overview)>
In this embodiment, the image processing device 30 further executes the processing of the third phase PH3 (FIG. 2). The third phase PH3 explains, on a concept basis, the basis for the determination regarding the similarity between two images (input images); it includes, for example, the generation of explanatory information. The following mainly describes the case in which the two images are the query image 215 and the image 213 found by the inference processing described above.

この第３フェーズＰＨ３の処理は、サブフェーズＰＨ３ａ（ステップＳ２０（図４））の処理と、サブフェーズＰＨ３ｂ（ステップＳ３０（図５））の処理とに大別される。 The processing of this third phase PH3 is broadly divided into processing of subphase PH3a (step S20 (Figure 4)) and processing of subphase PH3b (step S30 (Figure 5)).

前者のサブフェーズＰＨ３ａ（ステップＳ２０（図４））では、機械学習された学習モデル４２０にて如何なるコンセプトが獲得（学習）されたかを解析する解析処理が行われる。前者のサブフェーズＰＨ３ａは、後者のサブフェーズＰＨ３ｂの処理の前処理である。 In the former subphase PH3a (step S20 (Figure 4)), an analysis process is performed to analyze what concepts have been acquired (learned) in the machine-learned learning model 420. The former subphase PH3a is a preprocessing step for the latter subphase PH3b.

具体的には、サブフェーズＰＨ３ａにおいては、当該学習モデル４００への複数の入力画像（たとえば、機械学習に用いられた複数の入力画像２１１）の入力に対して当該学習モデル４００から出力される特徴空間における複数の特徴ベクトルＦが取得される。そして、当該複数の特徴ベクトルＦに対する階層化クラスタリング処理を実行することにより、階層化された複数のクラスタＧが生成される。さらに、当該複数のクラスタのうちの特定クラスタに対応するベクトル（当該特定クラスタに関する代表ベクトル）が、当該特定クラスタのコンセプト（詳細には、当該コンセプトを表すベクトル等）として抽出される。当該特定クラスタに関する代表ベクトルとしては、たとえば、当該特定クラスタに関するコンセプト活性化ベクトルＣＡＶ（後述）が利用される。なお、後述するように、特定クラスタに対応する部分空間（代表ベクトルで張られる部分空間等）が、当該特定クラスタのコンセプトとして抽出されてもよい。 Specifically, in subphase PH3a, multiple feature vectors F are obtained in the feature space output from the learning model 400 in response to multiple input images (e.g., multiple input images 211 used in machine learning) input to the learning model 400. A hierarchical clustering process is then performed on the multiple feature vectors F to generate multiple hierarchical clusters G. Furthermore, a vector corresponding to a specific cluster among the multiple clusters (a representative vector for the specific cluster) is extracted as the concept of the specific cluster (more specifically, a vector representing the concept, etc.). For example, the concept activation vector CAV (described below) for the specific cluster is used as the representative vector for the specific cluster. As described below, a subspace corresponding to the specific cluster (e.g., a subspace spanned by the representative vectors) may also be extracted as the concept of the specific cluster.

一方、後者のサブフェーズＰＨ３ｂ（ステップＳ３０（図５））では、２つの画像の類似性に関する判断根拠を導出する処理等が行われる。具体的には、２つの画像の類似性に対する各種コンセプト（サブフェーズＰＨ３ａ（図４）で得られたコンセプト）による影響が評価される。より具体的には、学習モデル４００が機械学習により獲得した各種コンセプト（当該学習モデル４００から抽出された複数のコンセプト）について、２つの画像の類似性に対する寄与度がそれぞれ算出される。そして、当該寄与度等に基づき、これらのコンセプトのうち主要コンセプトが特定される。さらに、当該主要コンセプトを説明する画面表示等が行われる。 On the other hand, in the latter sub-phase PH3b (step S30 (Figure 5)), processing is performed to derive the basis for determining the similarity between the two images. Specifically, the influence of various concepts (concepts obtained in sub-phase PH3a (Figure 4)) on the similarity between the two images is evaluated. More specifically, the contribution of each of the various concepts acquired by the learning model 400 through machine learning (multiple concepts extracted from the learning model 400) to the similarity between the two images is calculated. Then, based on the contribution, etc., a main concept from among these concepts is identified. Furthermore, a screen display explaining the main concept is performed.

以下、サブフェーズＰＨ３ａ，ＰＨ３ｂについてこの順序で説明する。 Subphases PH3a and PH3b will be explained below in that order.

＜１－８．サブフェーズＰＨ３ａ（ステップＳ２０）の処理＞
まず、サブフェーズＰＨ３ａの処理について図４を参照しつつ説明する。 <1-8. Processing of sub-phase PH3a (step S20)>
First, the processing of sub-phase PH3a will be described with reference to FIG. 4.

＜ステップＳ２１＞
図４に示されるように、まずステップＳ２１において、コントローラ３１（画像処理装置３０）は、機械学習された学習モデル４２０への複数の入力画像２１０（２１１）の入力に対して学習モデル４２０から出力された（特徴空間における）複数の特徴ベクトル２５１を取得する。ここでは、学習済みモデル４２０への入力画像２１０として、学習済みモデル４２０の機械学習に用いた教師データ（詳細には、当該教師データを構成する入力画像２１１）を用いる。また、複数の特徴ベクトル２５１は、学習モデル４００に関する学習処理の最終段階にて（多数回の繰り返しを伴う学習処理の後に）最終的に出力される特徴ベクトルである、とも表現される。 <Step S21>
As shown in Fig. 4, first, in step S21, the controller 31 (image processing device 30) acquires a plurality of feature vectors 251 (in the feature space) output from the machine-learned learning model 420 in response to the input of a plurality of input images 210 (211). Here, the training data used in the machine learning of the trained model 420 (more specifically, the input images 211 constituting that training data) is used as the input images 210 to the trained model 420. The plurality of feature vectors 251 can also be described as the feature vectors finally output at the final stage of the learning process for the learning model 400 (that is, after a learning process involving many iterations).

各特徴ベクトル２５１は、学習済みモデル４２０への各入力画像２１１の入力に対して当該学習済みモデル４２０から出力される各ベクトル（次元数ＣＨ１のベクトル、たとえば、１０２４次元のベクトル）である。上述のように、このようにして得られた複数の特徴ベクトル２５１は、学習後の学習モデル４２０によって特徴空間内の適切な位置に分布する。学習モデル４００に関する距離学習（メトリックラーニング）の結果、特徴空間内での複数の特徴ベクトル２５１の相互間の距離は、入力空間での対応入力画像の類似度を反映している（図１１の下段および図１６等参照）。特徴空間内における当該複数の特徴ベクトル２５１の分布は、学習によって獲得されたコンセプトに基づく分布であるとも考えられる。具体的には、同一コンセプトに対応する特徴ベクトル群、および互いに類似するコンセプトに対応する特徴ベクトル群は、特徴空間において比較的近くに分布している、と考えられる。 Each feature vector 251 is the vector (a vector with CH1 dimensions, for example, a 1024-dimensional vector) output from the trained model 420 in response to the input of each input image 211 to the trained model 420. As described above, the multiple feature vectors 251 obtained in this manner are placed at appropriate positions in the feature space by the trained learning model 420. As a result of distance learning (metric learning) for the learning model 400, the distances between the multiple feature vectors 251 in the feature space reflect the similarity of the corresponding input images in the input space (see the bottom of Figure 11, Figure 16, etc.). The distribution of the multiple feature vectors 251 in the feature space can also be regarded as a distribution based on the concepts acquired through learning. Specifically, feature vectors corresponding to the same concept, and feature vectors corresponding to mutually similar concepts, are considered to lie relatively close to one another in the feature space.

＜ステップＳ２２：階層化クラスタリング処理＞
次に、ステップＳ２２において、コントローラ３１は、特徴空間内での位置関係等に基づき複数の特徴ベクトル２５１に対する階層化クラスタリング処理を実行することにより、階層化された複数のクラスタを生成する。なお、階層化クラスタリング処理は、階層型クラスタリング処理あるいは階層的クラスタリング処理などとも称される。 <Step S22: Hierarchical clustering process>
Next, in step S22, the controller 31 generates a plurality of hierarchical clusters by performing a hierarchical clustering process on the plurality of feature vectors 251 based on, for example, their positional relationships in the feature space. The Japanese text gives two alternative names for this process (階層型クラスタリング処理 and 階層的クラスタリング処理); all three terms correspond to hierarchical clustering in English.

階層化クラスタリング処理は、複数の要素（集合の要素）（ここでは特徴ベクトルＦ（２５１））を順次にグルーピングして、階層化されたクラスタ（グループ）を形成する処理である。 Hierarchical clustering is a process in which multiple elements of a set (here, the feature vectors F (251)) are grouped sequentially to form hierarchical clusters (groups).

具体的には、階層化クラスタリング処理においては、最も類似する（相互間の類似度が最も高い）暫定クラスタ（次述）同士を１つずつ順次に結合してクラスタを生成することが（全体が１つのクラスタになるまで）繰り返される。これによって、階層化されたクラスタが形成される。暫定クラスタ（暫定的なクラスタ）は、最初は、単一の特徴ベクトルＦで構成され、その後は、単一の特徴ベクトルＦ、又は２以上の特徴ベクトルＦで構成される。当該２以上の特徴ベクトルＦで構成される暫定クラスタは、階層化クラスタリング処理にて生成された（新たな）クラスタを意味する。 Specifically, in the hierarchical clustering process, the two most similar provisional clusters (those with the highest mutual similarity; see below) are merged into a new cluster, one pair at a time, and this is repeated until everything belongs to a single cluster. Hierarchical clusters are formed as a result. Initially, every provisional cluster consists of a single feature vector F; thereafter, a provisional cluster consists of either a single feature vector F or two or more feature vectors F. A provisional cluster consisting of two or more feature vectors F is a (new) cluster generated by the hierarchical clustering process.

なお、階層化クラスタリング処理において２つの暫定クラスタが相互に類似するか否かは、当該２つの暫定クラスタの距離等（たとえば、ユークリッド距離、あるいはコサイン類似度）に基づいて判定される。階層化クラスタリング処理としては、重心法、最短距離法、最長距離法、群平均法、あるいはウォード法（Ward's method）などの各種の手法が用いられればよい。これらの手法は、当該２つの暫定クラスタの距離等（類似度）を具体的にどのような量として算出するか等に応じて分類される。たとえば、重心法は、２つの暫定クラスタの重心間の距離を当該２つの暫定クラスタの距離（類似度）とする手法である。また、最短距離法は、一方の暫定クラスタのいずれかの要素と他方の暫定クラスタのいずれかの要素との距離のうち、最も短い要素間距離を当該２つの暫定クラスタの距離とする手法である。なお、各手法において類似度を表す指標値は、距離に限定されず、コサイン類似度等が用いられてもよい。以下では、主にコサイン類似度（式（１）参照）を用いる場合について例示する。 In hierarchical clustering, whether two provisional clusters are similar to each other is determined based on the distance between the two provisional clusters (e.g., Euclidean distance or cosine similarity). Hierarchical clustering can be performed using a variety of methods, including the centroid method, shortest distance method, longest distance method, group average method, or Ward's method. These methods are classified according to the specific quantity used to calculate the distance (similarity) between the two provisional clusters. For example, the centroid method uses the distance between the centroids of the two provisional clusters as the distance (similarity) between the two provisional clusters. The shortest distance method uses the shortest inter-element distance between any element in one provisional cluster and any element in the other provisional cluster as the distance between the two provisional clusters. Note that the index value representing similarity in each method is not limited to distance; cosine similarity, etc., may also be used. The following examples primarily use cosine similarity (see Equation (1)).
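The difference between the linkage methods named above can be made concrete with a short Python sketch. It is illustrative only: it uses Euclidean distance for readability (the description mainly uses cosine similarity), omits Ward's method, and all function names and the toy clusters `c1` and `c2` are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(cluster):
    """Component-wise mean of the points in a cluster."""
    n = len(cluster)
    return [sum(v[i] for v in cluster) / n for i in range(len(cluster[0]))]

def centroid_distance(c1, c2):
    """Centroid method: distance between the two cluster centroids."""
    return euclidean(centroid(c1), centroid(c2))

def shortest_distance(c1, c2):
    """Shortest distance method: closest pair of elements across the clusters."""
    return min(euclidean(a, b) for a in c1 for b in c2)

def longest_distance(c1, c2):
    """Longest distance method: farthest pair of elements across the clusters."""
    return max(euclidean(a, b) for a in c1 for b in c2)

def group_average_distance(c1, c2):
    """Group average method: mean distance over all cross-cluster pairs."""
    total = sum(euclidean(a, b) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))

# Two toy provisional clusters on a line.
c1 = [[0.0, 0.0], [2.0, 0.0]]
c2 = [[5.0, 0.0], [9.0, 0.0]]
results = {
    "centroid": centroid_distance(c1, c2),
    "shortest": shortest_distance(c1, c2),
    "longest": longest_distance(c1, c2),
    "group_average": group_average_distance(c1, c2),
}
```

For these two clusters the shortest distance method gives 3, the longest distance method gives 9, and both the centroid and group average methods give 6, which shows how the choice of method changes which pair of clusters is merged next.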

より詳細には、階層化クラスタリング処理において、まず、全ての暫定クラスタ（最初は個々の特徴ベクトルＦ）同士の組み合わせについて評価値（具体的には、類似度Ｓｔ（式（１）参照）等）が求められる。そして、最も高い評価値（類似度Ｓｔ）を有する暫定クラスタ同士（要素ペア）を結合したグループが新たなクラスタとして形成される。そして、同様の処理が繰り返されることによって、順次に新たなクラスタ（同位クラスタあるいは上位クラスタ）が形成されていき、最終的には、大きな１つのクラスタが形成される。形成された複数のクラスタの包含関係（換言すれば、上下関係）は、１つの樹形図（デンドロイドとも称する）（図１１の上段参照）で表現される。 More specifically, in the hierarchical clustering process, an evaluation value (specifically, the similarity St (see equation (1)) or the like) is first calculated for every pair of provisional clusters (initially, the individual feature vectors F). The pair of provisional clusters with the highest evaluation value (similarity St) is then merged to form a new cluster. Repeating this process forms new clusters (same-level or higher-level clusters) one after another until a single large cluster remains. The inclusion relationships (in other words, the hierarchical relationships) among the clusters formed are represented by a single tree diagram (a dendrogram, which this description also calls a "dendroid") (see the top row of Figure 11).
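The merge loop described above can be sketched as follows. This is a naive illustration under stated assumptions, not the patented implementation: it assumes cosine similarity between unit-normalized vectors, uses the group average as the similarity between provisional clusters (one of several methods the description allows), and the names `pair_similarity` and `agglomerate` and the 2-dimensional toy vectors are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def pair_similarity(c1, c2, vecs):
    """Group-average cosine similarity between two provisional clusters.

    c1 and c2 are lists of indices into vecs (a list of unit vectors).
    """
    total = sum(
        sum(a * b for a, b in zip(vecs[i], vecs[j]))
        for i in c1 for j in c2
    )
    return total / (len(c1) * len(c2))

def agglomerate(vectors):
    """Repeatedly merge the most similar pair; return the merge history."""
    vecs = [normalize(v) for v in vectors]
    clusters = [[i] for i in range(len(vecs))]  # each vector starts alone
    history = []
    while len(clusters) > 1:
        best = None
        # Evaluate every pair of provisional clusters.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = pair_similarity(clusters[a], clusters[b], vecs)
                if best is None or s > best[0]:
                    best = (s, a, b)
        s, a, b = best
        history.append((clusters[a], clusters[b], s))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return history

# Toy feature vectors: two near the x-axis, two near the y-axis.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.2, 0.8]]
history = agglomerate(features)
```

The returned history lists the merges in order (vectors 0 and 1 first, then 2 and 3, then the two resulting clusters), which is exactly the information a tree diagram encodes. The loop is cubic in the number of vectors, so it is suitable only for illustration.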

図１１の下段（および図１６）には、図８等と同様に、複数の特徴ベクトルＦ（２５１）が特徴空間（学習済みモデル４２０の出力空間）内に分布する様子が示されている。そして、このような分布を有する複数の特徴ベクトル２５１が、複数階層に階層化された複数のクラスタを形成している。なお、図１１の上段には、当該複数の特徴ベクトルＦに関する樹形図（デンドロイド）が併せて示されている。 The lower part of Figure 11 (and Figure 16), similar to Figure 8, shows how multiple feature vectors F (251) are distributed within the feature space (the output space of the trained model 420). The multiple feature vectors 251 having such a distribution form multiple clusters hierarchically organized in multiple layers. The upper part of Figure 11 also shows a tree diagram (dendroid) for the multiple feature vectors F.

たとえば、上述のような階層化クラスタリング処理の途中段階において、クラスタＧ３１０（図１１の中央左側付近参照）が形成される。クラスタＧ３１０は、互いに近接する複数（図１１では６個）の特徴ベクトル２５１（点状黒丸図形）によって構成される。また、その次以降の或る段階では、クラスタＧ３１０を包含するようなクラスタＧ３００が形成される。クラスタＧ３００は、クラスタＧ３１０内の特徴ベクトル２５１と、クラスタＧ３１０に近接する他の特徴ベクトル２５１（ハッチング付きの点状黒丸図形）とで構成される。クラスタＧ３００は、クラスタＧ３１０の上位クラスタ（親クラスタとも称する）とも表現される。 For example, during the intermediate stage of the hierarchical clustering process described above, cluster G310 (see the area near the center left of Figure 11) is formed. Cluster G310 is composed of multiple (six in Figure 11) feature vectors 251 (dotted black circles) that are close to each other. Then, at a subsequent stage, cluster G300 is formed, which encompasses cluster G310. Cluster G300 is composed of feature vectors 251 within cluster G310 and other feature vectors 251 (hatched dotted black circles) that are close to cluster G310. Cluster G300 is also referred to as the higher-level cluster (also referred to as the parent cluster) of cluster G310.

同様に、階層化クラスタリング処理の或る途中段階において、クラスタＧ１１１とクラスタＧ１１２とクラスタＧ１２０とが形成される（図１１の下側中央付近参照）。クラスタＧ１１１は、互いに近接する複数（図１１では６個）の特徴ベクトル２５１（縦横クロス（格子状）ハッチング付きの点状白丸図形）によって構成される。クラスタＧ１１２は、互いに近接する複数（図１１では６個）の特徴ベクトル２５１（斜めクロスハッチング付きの点状白丸図形）によって構成される。また、クラスタＧ１２０は、互いに近接する複数（図１１では６個）の特徴ベクトル２５１（ハッチング無しの点状白丸図形）によって構成される。その次以降の或る段階では、２つのクラスタＧ１１１，Ｇ１１２の双方を包含するようなクラスタＧ１１０（２つのクラスタＧ１１１，Ｇ１１２の上位クラスタ）が構成される。その後の或る段階では、２つのクラスタＧ１１０，Ｇ１２０の双方を包含するようなクラスタＧ１００（２つのクラスタＧ１１０，Ｇ１２０の上位クラスタ）が構成される。さらに後の或る段階では、２つのクラスタＧ１００，Ｇ２００の双方を包含するようなクラスタＧ１０（２つのクラスタＧ１００，Ｇ２００の上位クラスタ）が構成される。 Similarly, at a certain intermediate stage of the hierarchical clustering process, clusters G111, G112, and G120 are formed (see the lower center of Figure 11). Cluster G111 is composed of multiple (six in Figure 11) feature vectors 251 (dotted white circles with vertical and horizontal cross (checkerboard) hatching) that are close to each other. Cluster G112 is composed of multiple (six in Figure 11) feature vectors 251 (dotted white circles with diagonal cross hatching) that are close to each other. Cluster G120 is composed of multiple (six in Figure 11) feature vectors 251 (dotted white circles without hatching) that are close to each other. At a subsequent stage, cluster G110 (a super-cluster of clusters G111 and G112) is formed, encompassing both clusters G111 and G112. At a later stage, a cluster G100 (a super-cluster of the two clusters G110 and G120) is constructed that includes both clusters G110 and G120. At an even later stage, a cluster G10 (a super-cluster of the two clusters G100 and G200) is constructed that includes both clusters G100 and G200.

また、他のクラスタＧ４００，Ｇ５１０，Ｇ５００なども、階層化クラスタリング処理の進展に伴って形成されていく。 In addition, other clusters G400, G510, G500, etc. will be formed as the hierarchical clustering process progresses.

＜階層化クラスタリング処理の処理結果＞
このような階層化クラスタリング処理によって、例えば図１１のような複数のクラスタ（Ｇ１１１，Ｇ１１２，Ｇ１１０，Ｇ１２０，Ｇ１００，Ｇ２００，Ｇ１０，Ｇ３１０，Ｇ３００，Ｇ４００，Ｇ５１０，Ｇ５００等）が形成される。なお、図１１においては、図示の都合上、多数の特徴ベクトル２５１のうちの一部の特徴ベクトル２５１のみが図示されており、且つ、多数のクラスタのうちの一部のクラスタのみが図示されている。 <Results of hierarchical clustering processing>
By such hierarchical clustering processing, a plurality of clusters (G111, G112, G110, G120, G100, G200, G10, G310, G300, G400, G510, G500, etc.) are formed, for example, as shown in Fig. 11. Note that, for convenience of illustration, Fig. 11 shows only some of the many feature vectors 251, and also shows only some of the many clusters.

また、各特徴ベクトルＦ（２５１）は、それぞれ、各入力画像Ｘ（２１１）に対応している。それ故、階層化クラスタリング処理は、複数の特徴ベクトルＦ（２５１）をクラスタリングする処理であるとともに、複数の入力画像Ｘ（２１１）をクラスタリングする処理でもある（図１３参照）。図１３は、図１１に示される複数のクラスタのうちの一部のクラスタを示す図である。図１３においては、当該一部のクラスタに対応する入力画像２１０（２１１）が示されている。なお、入力画像２１０は実際には撮影画像であるものの、図１３（以後の図（図１４、図２１、図２５、図２６等）でも同様）では図示の都合上、入力画像２１０（２１１、２１３，２１５）がＣＧ（コンピュータグラフィックス）画像で表現されている。 Furthermore, each feature vector F (251) corresponds to an input image X (211). The hierarchical clustering process therefore clusters not only the multiple feature vectors F (251) but also the multiple input images X (211) (see Figure 13). Figure 13 shows some of the clusters shown in Figure 11, together with the input images 210 (211) corresponding to those clusters. Note that although the input images 210 are actually photographed images, for convenience of illustration the input images 210 (211, 213, 215) are represented as CG (computer graphics) images in Figure 13 and in the subsequent figures (Figures 14, 21, 25, 26, etc.).

たとえば、クラスタＧ１１１には、縦横クロス（格子状）ハッチング付きの複数の点状白丸図形（図１１参照）にそれぞれ対応する複数の入力画像が含まれている、とも表現できる（図１３も参照）。同様に、クラスタＧ１１２には、斜めクロスハッチング付きの複数の点状白丸図形（図１１参照）にそれぞれ対応する複数の入力画像が含まれている。また、上位クラスタＧ１１０には、下位クラスタＧ１１１に含まれる複数の入力画像と、下位クラスタＧ１１２に含まれる複数の入力画像との双方が含まれている。 For example, cluster G111 can also be expressed as including a plurality of input images corresponding to a plurality of dotted white circle shapes with vertical and horizontal cross (checkerboard) hatching (see Figure 11) (see also Figure 13). Similarly, cluster G112 includes a plurality of input images corresponding to a plurality of dotted white circle shapes with diagonal cross hatching (see Figure 11). Furthermore, upper cluster G110 includes both a plurality of input images included in lower cluster G111 and a plurality of input images included in lower cluster G112.

より詳細には、図１３に示されるように、クラスタＧ１１１は、「模様多めの白（白地）シャツ（且つその模様が直線的なもの）」を着用している少なくとも１人（ここでは３人以上）の人物に関する複数の画像２１１で構成されている。クラスタＧ１１２は、「模様多めの白シャツ（且つその模様が曲線的なもの）」を着用している少なくとも１人の人物に関する複数の画像２１１で構成されている。上位クラスタＧ１１０は、下位クラスタＧ１１１に含まれる人物画像２１１と、下位クラスタＧ１１２に含まれる人物画像２１１との双方を備えて構成されている。より具体的には、上位クラスタＧ１１０は、「模様多めの白シャツ」（その模様は直線的な模様であってもよく曲線的な模様であってもよい）を着用している少なくとも１人の人物に関する複数の画像２１１で構成されている。 More specifically, as shown in FIG. 13, cluster G111 is composed of multiple images 211 of at least one person (here, three or more people) wearing a "heavily patterned white (white background) shirt (where the pattern is linear)." Cluster G112 is composed of multiple images 211 of at least one person wearing a "heavily patterned white shirt (where the pattern is curved)." Upper cluster G110 is composed of both person images 211 included in lower cluster G111 and person images 211 included in lower cluster G112. More specifically, upper cluster G110 is composed of multiple images 211 of at least one person wearing a "heavily patterned white shirt" (where the pattern may be linear or curved).

また、クラスタＧ１１０（「模様多めの白シャツ」の人物画像に対応するクラスタ）に対して同位関係を有するクラスタＧ１２０は、「模様少なめの白シャツ」を着用している少なくとも１人の人物に関する複数の画像２１１で構成されている。 Furthermore, cluster G120, which is at the same level as cluster G110 (the cluster corresponding to images of people wearing a "heavily patterned white shirt"), is composed of multiple images 211 of at least one person wearing a "lightly patterned white shirt."

さらに、上位クラスタＧ１００は、下位クラスタＧ１１０に含まれる人物画像２１１と、下位クラスタＧ１２０に含まれる人物画像２１１との双方を備えて構成されている。より具体的には、上位クラスタＧ１００は、「模様有りの白シャツ」（その模様は多めであってもよく少なめであってもよい）を着用している少なくとも１人の人物に関する複数の画像２１１で構成されている。 Furthermore, the upper cluster G100 is composed of both the person images 211 included in the lower cluster G110 and the person images 211 included in the lower cluster G120. More specifically, the upper cluster G100 is composed of multiple images 211 of at least one person wearing a "patterned white shirt" (the pattern may be heavy or light).

また、クラスタＧ１００（「模様有りの白シャツ」の人物画像に対応するクラスタ）に対して同位関係を有するクラスタＧ２００は、「薄いピンク色のシャツ」を着用している少なくとも１人の人物に関する複数の画像２１１で構成されている。 Furthermore, cluster G200, which has a peer relationship with cluster G100 (the cluster corresponding to images of people wearing "patterned white shirts"), is composed of multiple images 211 of at least one person wearing a "light pink shirt."

その他のクラスタも同様に、特定の概念で互いに類似した服装の人物の画像で構成される。 Similarly, each of the other clusters consists of images of people whose clothing is similar with respect to a particular concept.

このように、各入力画像２１０（２１１）に対応する複数の特徴ベクトル２５０（２５１）をクラスタリングすることは、複数の入力画像２１０をクラスタリングすることと等価である。 In this way, clustering multiple feature vectors 250 (251) corresponding to each input image 210 (211) is equivalent to clustering multiple input images 210.

換言すれば、（階層化クラスタリング処理で生成された）特定クラスタに所属する複数の入力画像には共通の特徴が存在し、当該特定クラスタは、固有のコンセプトを有していると解釈される。 In other words, multiple input images belonging to a specific cluster (generated by the hierarchical clustering process) share common features, and the specific cluster is interpreted as having a unique concept.

また、上位クラスタのコンセプトは、その下位クラスタのコンセプトを包括（包含）するコンセプト（包括的コンセプト）である。逆に、下位クラスタのコンセプトは、その上位クラスタのコンセプトを細分化したコンセプトである。端的に言えば、上位コンセプトは粒度の粗いコンセプトであり、下位コンセプトは粒度の細かいコンセプトである。また、上位コンセプトは、比較的多くの人物間で共有されるコンセプトであり、下位コンセプトは、比較的少ないの人物間で共有されるコンセプトである、とも表現される。 Furthermore, the concepts of a higher-level cluster are concepts (overarching concepts) that encompass (subsume) the concepts of its lower-level clusters. Conversely, the concepts of lower-level clusters are concepts that subdivide the concepts of their higher-level clusters. In short, higher-level concepts are coarse-grained concepts, while lower-level concepts are fine-grained concepts. It can also be said that higher-level concepts are concepts shared by a relatively large number of people, while lower-level concepts are concepts shared by a relatively small number of people.

なお、図１１の樹形図（デンドロイド）は一例であり、学習データ等に依拠して異なる樹形図が生成される。 Note that the tree diagram (dendroid) in Figure 11 is just an example, and different tree diagrams will be generated depending on the training data, etc.

また、このような階層化クラスタリング処理においては、非常に多数のクラスタが生成される。階層化クラスタリング処理で生成された全てのクラスタが以後の処理（特にステップＳ３１（後述）以降の処理）に利用されてもよいが、これに限定されない。たとえば、このような多数のクラスタのうち、所定人数（たとえば３人）以上を含むクラスタが、類似性の説明根拠を示すクラスタ（当該説明根拠のコンセプトを形成するクラスタ）として利用されることが好ましい。これによれば、（学習データの人物に固有の特徴への依存を抑制し）コンセプトのロバスト性を向上させることが可能である。換言すれば、単一の人物のみで構成されるクラスタは、類似性の説明根拠を示すクラスタからは除外されてもよい。 Such a hierarchical clustering process generates a very large number of clusters. All of the clusters generated may be used in the subsequent processing (particularly the processing from step S31 (described below) onward), but this is not required. For example, among these many clusters, it is preferable to use those containing at least a predetermined number of people (e.g., three) as the clusters that provide the explanatory basis for similarity (the clusters that form the concepts of that explanatory basis). This improves the robustness of the concepts by reducing dependence on characteristics unique to individual people in the training data. In other words, clusters consisting of only a single person may be excluded from the clusters that provide the explanatory basis for similarity.

また、特に、階層化クラスタリング処理にて生成された全てのクラスタのうち、そのコンセプトベクトルＵ（後述）が互いに１次独立（線形独立）となるような複数のクラスタのみが、用いられることが好ましい。 In particular, it is preferable to use only clusters whose concept vectors U (described below) are linearly independent of each other out of all the clusters generated by the hierarchical clustering process.
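The description does not state how a linearly independent set of concept vectors is chosen. One possible approach, shown here purely as an assumption, is a greedy Gram-Schmidt pass that keeps a vector only if it is not already in the span of the vectors kept so far. The function name `select_independent`, the tolerance `tol`, and the toy vectors are hypothetical, and the result depends on the order in which the vectors are visited.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def select_independent(vectors, tol=1e-8):
    """Return indices of a linearly independent subset (greedy, order-dependent)."""
    basis = []  # orthonormal basis spanning the vectors kept so far
    kept = []   # indices of the vectors that were kept
    for idx, v in enumerate(vectors):
        residual = list(v)
        # Remove the components that lie in the span of the kept vectors.
        for b in basis:
            coeff = dot(residual, b)
            residual = [r - coeff * x for r, x in zip(residual, b)]
        norm = math.sqrt(dot(residual, residual))
        if norm > tol:  # something new remains, so v is independent
            basis.append([r / norm for r in residual])
            kept.append(idx)
    return kept

# Toy concept vectors: the third is the sum of the first two.
concept_vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
kept = select_independent(concept_vectors)
```

In this toy case the third vector is dropped because it is a linear combination of the first two.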

＜ステップＳ２３：コンセプトベクトルＵの抽出処理＞
次のステップＳ２３では、コントローラ３１は、各クラスタに対応する部分空間あるいはベクトルを、当該各クラスタのコンセプトとして抽出する（ステップＳ２３）。ここでは、各クラスタに対応するベクトル（「コンセプトベクトル」（次述））が、当該各クラスタのコンセプトとして抽出される。 <Step S23: Extraction process of concept vector U>
In the next step S23, the controller 31 extracts a subspace or vector corresponding to each cluster as the concept of that cluster. Here, a vector corresponding to each cluster (a "concept vector," described next) is extracted as the concept of that cluster.

より詳細には、まず、コントローラ３１は、全クラスタのうち、以後の処理での検討対象となり得るクラスタを選択する。具体的には、階層化クラスタリング処理にて生成された全クラスタの中から、所定の基準を充足する一部のクラスタが選択される。選択されたクラスタは、検討対象候補クラスタとも称される。当該所定の基準は、たとえば、（上述のような）所定数（たとえば３人）以上の人物を含むクラスタであること等である。あるいは、全てのクラスタが検討対象候補クラスタとして選択されてもよい。 More specifically, the controller 31 first selects, from all clusters, clusters that can be considered for subsequent processing. Specifically, from all clusters generated by the hierarchical clustering process, a portion of clusters that satisfy predetermined criteria are selected. The selected clusters are also referred to as candidate clusters for consideration. The predetermined criteria may be, for example, that the clusters contain a predetermined number of people (e.g., three people) or more (as described above). Alternatively, all clusters may be selected as candidate clusters for consideration.
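The selection criterion above (keep clusters that contain at least a predetermined number of people) might look like the following in Python. This is a sketch under assumptions: it presumes that each image carries a person label, and the names `select_candidate_clusters`, `person_of_image`, and the cluster names are hypothetical.

```python
def select_candidate_clusters(clusters, person_of_image, min_persons=3):
    """Keep clusters whose images cover at least min_persons distinct people.

    clusters: dict mapping a cluster name to a list of image ids.
    person_of_image: dict mapping an image id to a person label (assumed available).
    """
    selected = {}
    for name, image_ids in clusters.items():
        persons = {person_of_image[i] for i in image_ids}
        if len(persons) >= min_persons:
            selected[name] = image_ids
    return selected

# Toy data: six images of four people (A to D).
person_of_image = {0: "A", 1: "A", 2: "B", 3: "C", 4: "D", 5: "D"}
clusters = {
    "G_single": [0, 1],            # one person only, excluded
    "G_three": [0, 2, 3],          # three people, kept
    "G_all": [0, 1, 2, 3, 4, 5],   # four people, kept
}
candidates = select_candidate_clusters(clusters, person_of_image)
```

Counting distinct people rather than images is what removes a cluster built entirely from several images of the same person.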

そして、コントローラ３１は、選択された一部のクラスタ（複数の特定クラスタ）のそれぞれに対応する各ベクトルＵを、当該各クラスタ（各特定クラスタ）のコンセプトとして抽出する（ステップＳ２３）。特定クラスタに対応するベクトルＵは、具体的には、当該特定クラスタに関する代表ベクトルである。なお、「代表ベクトル」は、特定クラスタに属する複数の特徴ベクトルのいずれかであることを要さず、当該複数の特徴ベクトルで構成された１つのまとまりを代表的に示すベクトル（当該特定クラスタに属する複数の特徴ベクトルを象徴するような代表的なベクトル）であればよい。また、ベクトルＵは、当該特定クラスタのコンセプトを表現するベクトル（当該コンセプトのベクトル表現）であることから、「コンセプトベクトル」とも表現される。当該コンセプトベクトルＵも、特徴ベクトルＦと同様に、正規化されていることが好ましい。 The controller 31 then extracts each vector U corresponding to each of the selected clusters (multiple specific clusters) as a concept for that cluster (each specific cluster) (step S23). Specifically, the vector U corresponding to a specific cluster is a representative vector for that specific cluster. Note that the "representative vector" does not have to be one of the multiple feature vectors belonging to the specific cluster; it may be a vector that representatively represents a group made up of the multiple feature vectors (a representative vector that symbolizes the multiple feature vectors belonging to the specific cluster). Furthermore, since vector U is a vector that expresses the concept of the specific cluster (a vector representation of the concept), it is also referred to as a "concept vector." It is preferable that the concept vector U is also normalized, similar to feature vector F.

特定クラスタのコンセプトベクトルＵは、たとえば、当該特定クラスタに属する（当該特定クラスタを構成する）複数の特徴ベクトルＦ（２５１）の平均ベクトルとして求められる。当該平均ベクトルは、特徴空間の超平面（超球面）上における当該複数の特徴ベクトルＦの重心位置（平均位置）を示すベクトルであることから、重心ベクトルとも称される。 The concept vector U of a specific cluster is calculated, for example, as the average vector of multiple feature vectors F (251) that belong to (constitute) that specific cluster. This average vector is also called the centroid vector, because it indicates the centroid position (average position) of the multiple feature vectors F on the hyperplane (hypersphere) of the feature space.
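As a minimal sketch of the averaging approach, the following Python computes the mean of a cluster's feature vectors and rescales it to unit length. The rescaling step reflects the stated preference for normalized concept vectors; the mean of unit vectors generally lies inside the hypersphere, so it is projected back onto it. The function name `concept_vector` and the toy cluster are hypothetical.

```python
import math

def concept_vector(cluster_vectors):
    """Mean of the cluster's feature vectors, rescaled to unit length."""
    n = len(cluster_vectors)
    dim = len(cluster_vectors[0])
    mean = [sum(v[i] for v in cluster_vectors) / n for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean]

# Toy cluster of two unit vectors; real vectors would have e.g. 1024 dimensions.
cluster = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
u = concept_vector(cluster)
```

For this toy cluster the unnormalized mean is (0.5, 0.5, 0), and the resulting concept vector points midway between the two members with length 1.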

あるいは、特定クラスタのコンセプトベクトルＵは、特定クラスタの「コンセプト活性化ベクトル」（ＣＡＶ：Concept Activation Vector）であってもよい。特定クラスタのコンセプト活性化ベクトルは、特定クラスタに属する要素とそれ以外の要素とを分離する分離平面５０１の法線ベクトルである（図１７参照）。図１７においては、特徴空間（ここでは、超球面）における分離平面５０１が示されている。分離平面５０１は、特定クラスタＧａに属する要素（点状白丸図形（で表される特徴ベクトルＦａ）参照）と特定クラスタＧａに属しない要素（他の点状図形（で表される特徴ベクトルＦｂ）参照）とを分離する平面である。このような分離平面５０１は、２クラス分類の線形分離器（線形識別器）（サポートベクトルマシン等）によって求めることができる。この分離平面５０１に垂直な（且つ外向きの）ベクトルが、特定クラスタＧａのコンセプト活性化ベクトル（ＣＡＶ）である。 Alternatively, the concept vector U of a specific cluster may be the "concept activation vector" (CAV) of that cluster. The concept activation vector of a specific cluster is the normal vector of a separation plane 501 that separates the elements belonging to the specific cluster from all other elements (see Figure 17). Figure 17 shows the separation plane 501 in the feature space (here, a hypersphere). The separation plane 501 separates the elements that belong to the specific cluster Ga (the white dots, which represent feature vectors Fa) from the elements that do not (the other dots, which represent feature vectors Fb). Such a separation plane 501 can be obtained with a two-class linear classifier (linear separator), such as a support vector machine. The vector perpendicular to this separation plane 501 (and pointing outward) is the concept activation vector (CAV) of the specific cluster Ga.
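To illustrate how a normal vector is obtained from a two-class linear classifier, the sketch below trains a simple perceptron instead of the support vector machine mentioned above; this substitution is made only to keep the example dependency-free, and a perceptron finds a separating plane only when the two groups are linearly separable. The function names, hyperparameters, and toy data are hypothetical, and orienting the vector toward the cluster's side of the plane is an assumption about what "outward" means.

```python
import math

def train_linear_separator(inside, outside, epochs=100, lr=0.1):
    """Perceptron training: returns (w, b) with w.x + b > 0 for cluster members."""
    dim = len(inside[0])
    w = [0.0] * dim
    b = 0.0
    samples = [(x, 1) for x in inside] + [(x, -1) for x in outside]
    for _ in range(epochs):
        errors = 0
        for x, y in samples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified (or on the plane): update
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:  # every sample is on the correct side
            break
    return w, b

def concept_activation_vector(inside, outside):
    """Unit normal of the separating plane, oriented toward the cluster."""
    w, _ = train_linear_separator(inside, outside)
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]

# Toy 2-dimensional data: cluster members near the x-axis, others elsewhere.
inside = [[0.9, 0.1], [0.8, 0.3], [1.0, 0.0]]
outside = [[0.1, 0.9], [0.0, 1.0], [-0.5, 0.5]]
cav = concept_activation_vector(inside, outside)
```

Projecting any feature vector onto `cav` then gives a larger value for cluster members than for non-members in this toy data, which is the property that makes the normal vector usable as a concept direction.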

図１８～図２０は、各クラスタのコンセプトベクトルＵ（ＣＡＶ等）を示す図である。各図において、その左側には或るクラスタＧが示されており、その右側には当該或るクラスタＧのコンセプトベクトルＵ（超球面上の点へ向かうベクトルで表現されたコンセプトベクトル）が示されている。 Figures 18 to 20 show the concept vectors U (CAV, etc.) for each cluster. In each figure, a certain cluster G is shown on the left, and the concept vector U for that certain cluster G (a concept vector expressed as a vector pointing to a point on the hypersphere) is shown on the right.

たとえば、図１８の上段にはクラスタＧ１１０のコンセプトベクトルＵ（Ｕ１１０）が示されており、図１８の中段にはクラスタＧ１２０のコンセプトベクトルＵ（Ｕ１２０）が示されている。また、図１８の下段にはクラスタＧ１００のコンセプトベクトルＵ（Ｕ１００）が示されている。 For example, the upper part of Figure 18 shows the concept vector U (U110) of cluster G110, and the middle part of Figure 18 shows the concept vector U (U120) of cluster G120. The lower part of Figure 18 shows the concept vector U (U100) of cluster G100.

また、図１９の上段にはクラスタＧ２００のコンセプトベクトルＵ（Ｕ２００）が示されており、図１９の中段にはクラスタＧ３１０のコンセプトベクトルＵ（Ｕ３１０）が示されている。また、図１９の下段にはクラスタＧ３００のコンセプトベクトルＵ（Ｕ３００）が示されている。 The upper part of Figure 19 shows the concept vector U (U200) of cluster G200, and the middle part of Figure 19 shows the concept vector U (U310) of cluster G310. The lower part of Figure 19 shows the concept vector U (U300) of cluster G300.

同様に、図２０の上段にはクラスタＧ４００のコンセプトベクトルＵ（Ｕ４００）が示されており、図２０の中段にはクラスタＧ５１０のコンセプトベクトルＵ（Ｕ５１０）が示されている。また、図２０の下段にはクラスタＧ５００のコンセプトベクトルＵ（Ｕ５００）が示されている。 Similarly, the upper part of Figure 20 shows the concept vector U (U400) of cluster G400, and the middle part of Figure 20 shows the concept vector U (U510) of cluster G510. The lower part of Figure 20 shows the concept vector U (U500) of cluster G500.

なお、ここでは、図示されていないが、他のコンセプトベクトルＵ、たとえば、コンセプトベクトルＵ１１１，Ｕ１１２，Ｕ１０等も同様に求められる。 Although not shown here, other concept vectors U, such as concept vectors U111, U112, U10, etc., can also be calculated in a similar manner.

これらの図に示されるように、デンドロイド（図１１上段参照）にて互いに近くに存在するクラスタ同士のコンセプトベクトルＵ同士は、比較的類似する。たとえば、クラスタＧ１１０のコンセプトベクトルＵ１１０（図１８上段参照）とクラスタＧ１２０のコンセプトベクトルＵ１２０（図１８中段参照）とは、互いに比較的近い（比較的近い向きを有している）。逆に、デンドロイド（図１１上段参照）にて互いに遠く離れたクラスタ同士のコンセプトベクトルＵ同士は、大きく異なる。たとえば、クラスタＧ１００のコンセプトベクトルＵ１００（図１８下段参照）とクラスタＧ５００のコンセプトベクトルＵ５００（図２０下段参照）とは、互いに大きく異なる（大きく異なる向きを有している）。 As shown in these figures, the concept vectors U of clusters that are close to each other in the dendroid (see top row of Figure 11) are relatively similar. For example, the concept vector U110 of cluster G110 (see top row of Figure 18) and the concept vector U120 of cluster G120 (see middle row of Figure 18) are relatively close to each other (have relatively similar orientations). Conversely, the concept vectors U of clusters that are far apart from each other in the dendroid (see top row of Figure 11) are significantly different from each other. For example, the concept vector U100 of cluster G100 (see bottom row of Figure 18) and the concept vector U500 of cluster G500 (see bottom row of Figure 20) are significantly different from each other (have significantly different orientations).

このように、コンセプトベクトルＵの類似性は、各クラスタの特徴（換言すれば、各クラスタのコンセプト）の類似性を反映している。 In this way, the similarity of the concept vector U reflects the similarity of the features of each cluster (in other words, the concepts of each cluster).

また、上述したように、（階層化クラスタリング処理で生成された）特定クラスタに所属する複数の入力画像には共通の特徴が存在し、当該特定クラスタは、固有のコンセプトを有していると解釈される。 Furthermore, as mentioned above, multiple input images belonging to a specific cluster (generated by the hierarchical clustering process) share common characteristics, and the specific cluster is interpreted as having a unique concept.

そこで、コントローラ３１は、各クラスタのコンセプトベクトルＵを当該各クラスタのコンセプト（ないしコンセプト表現）として抽出する。各クラスタのコンセプトベクトルＵは、当該各クラスタのコンセプトを表現するベクトルである。換言すれば、各クラスタのコンセプトベクトルＵは、各クラスタの「ベクトルによるコンセプト表現」でもある。 The controller 31 therefore extracts the concept vector U of each cluster as the concept (or concept expression) of that cluster. The concept vector U of each cluster is a vector that expresses the concept of that cluster. In other words, the concept vector U of each cluster is also a "vector-based concept expression" of that cluster.

以上のように、学習済みモデル４２０は、（階層化クラスタリング処理にて生成された）各クラスタのコンセプトを学習したモデルであると解釈される。そして、各クラスタのコンセプトベクトルＵが、当該各クラスタのコンセプトとして抽出される。 As described above, the trained model 420 is interpreted as a model that has learned the concepts of each cluster (generated by the hierarchical clustering process). The concept vector U of each cluster is then extracted as the concept of that cluster.

なお、上述したように、図１６および図１８～２０等においては、各コンセプトベクトルＵが３次元的に表現されている。ただし、実際には、各コンセプトベクトルＵは、非常に高い次元（β次元（たとえば１０２４次元））のベクトル（多次元ベクトル）である。したがって、互いに異なるクラスタ（コンセプト）に対応する非常に多数（γmax個）（たとえば３００個）（ただし、γmax＜β））のコンセプトベクトルＵが、１次独立（線形独立）のベクトルとして存在する。 As mentioned above, each concept vector U is represented three-dimensionally in Figure 16, Figures 18 to 20, and elsewhere. In reality, however, each concept vector U is a vector of very high dimension (a β-dimensional vector, e.g., β = 1024). Because a β-dimensional space admits up to β linearly independent vectors, a very large number γmax (e.g., 300, where γmax < β) of concept vectors U corresponding to mutually different clusters (concepts) exist as linearly independent vectors.

また、このような多次元の（たとえば１０２４次元）コンセプトベクトルＵの図示（３次元表現）には限界がある。各コンセプトベクトルＵは、図示可能な３つの次元以外の他の次元（４次元目以降の第ｉ次元等）に実質的な特徴成分を有していることが多い。たとえば、図１６、図１８～図２０等においては、互いに大きく異なる複数のコンセプトベクトルＵは３次元内にて異なる向きを有している。ただし、実際には、互いに大きく異なる複数のコンセプトベクトルＵは４次元目以降の第ｉ次元等において相違する向きを有していることが多い。 Furthermore, there are limitations to illustrating (three-dimensionally expressing) such multidimensional (e.g., 1024-dimensional) concept vectors U. Each concept vector U often has substantial feature components in dimensions other than the three dimensions that can be illustrated (such as the i-th dimension from the fourth dimension onward). For example, in Figures 16, 18 to 20, etc., multiple concept vectors U that are significantly different from one another have different orientations within the three dimensions. However, in reality, multiple concept vectors U that are significantly different from one another often have different orientations in the i-th dimension from the fourth dimension onward.

＜１－９．２つの画像の類似度（コンセプト群ごとの類似度Ｓｃ等）＞ <1-9. Similarity of two images (similarity Sc for each concept group, etc.)>
サブフェーズＰＨ３ｂ（図５）の処理について説明する前に、２つの画像（画像ペア）の類似度について説明する。２つの画像（第１画像および第２画像）としては、クエリ画像とステップＳ１２で抽出されたギャラリー画像とが例示される。ただし、これに限定されず、任意の２つの画像の類似性についても同様である。 Before describing the processing of sub-phase PH3b (FIG. 5), the similarity between two images (an image pair) will be described. The two images (first and second images) are, for example, a query image and a gallery image extracted in step S12. However, the present invention is not limited to this, and the similarity between any two images can be determined in the same manner.

まず、上述のように、２つの入力画像の相互間における類似度Ｓｔ（画像ペア相互間における全体的な類似度Ｓｔ）は、上式（１）で表される。すなわち、２つの入力画像Ｘの類似度Ｓｔは、当該２つの入力画像Ｘに対する学習モデル４００からの出力ベクトルｑ，ｇ（特徴ベクトルＦ）の内積（ｑ・ｇ）（ここでは、ｃｏｓθ）として算出（表現）される。たとえば、図２１に示すような２つの入力画像（クエリ画像およびギャラリー画像）の類似度Ｓｔは、それぞれの特徴ベクトルｑ，ｇの内積（図２１では、Ｓｔ＝０．７７８）として算出される。 First, as described above, the similarity St between two input images (the overall similarity St between an image pair) is expressed by the above formula (1). That is, the similarity St between two input images X is calculated (expressed) as the dot product (q·g) (here, cos θ) of the output vectors q and g (feature vector F) from the learning model 400 for the two input images X. For example, the similarity St between two input images (query image and gallery image) shown in Figure 21 is calculated as the dot product of their respective feature vectors q and g (St = 0.778 in Figure 21).
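The overall similarity St described above can be sketched as follows. This is a minimal illustration, assuming the feature vectors are scaled to unit length so that their dot product equals cos θ; the function names and the sample vectors are hypothetical and do not come from the specification.

```python
import math

def normalize(v):
    """Scale v to unit length (L2 norm)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def overall_similarity(q, g):
    """St: dot product of the unit-normalized feature vectors, i.e. cos(theta)."""
    qn, gn = normalize(q), normalize(g)
    return sum(a * b for a, b in zip(qn, gn))

# Hypothetical 3-dimensional feature vectors (real ones would have beta dimensions).
q = [1.0, 2.0, 2.0]
g = [2.0, 1.0, 2.0]
st = overall_similarity(q, g)
print(round(st, 3))  # 0.889
```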

ここで、当該類似度Ｓｔ（全体的な類似度）とは異なる別の指標（具体的には、類似度Ｓｃ）を導入する。この類似度Ｓｃは、複数のコンセプトのうち、類似性に関する寄与度合いを考慮する対象として選択されたコンセプト（考慮対象コンセプトないし被選択コンセプトとも称する）で説明される、（両画像の）類似度合いを示す指標値である。端的に言えば、類似度Ｓｃは、２つの入力画像の相互間における全体的な類似性ではなく、一部の被選択コンセプトに起因する部分的な類似性を表す。換言すれば、類似度Ｓｃは、２つの入力画像の相互間における全体的な類似度Ｓｔのうち、一部の被選択コンセプト（考慮対象コンセプト）が寄与する成分を表す。謂わば、類似度Ｓｃは、両画像の類似性に対して被選択コンセプトが寄与する程度（寄与度）を示す指標である。類似度Ｓｃは、コンセプトごと（コンセプトベクトルごと、或いはクラスタごと）の類似度（寄与度）である、とも表現される。なお、類似度Ｓｃは、類似度Ｓｔと同様に、スカラー（値）である。 Here, we introduce another index (specifically, similarity Sc) that differs from the similarity St (overall similarity). The similarity Sc is an index value indicating the degree of similarity (between the two images) that is explained by the concepts selected, from among the multiple concepts, as the ones whose contribution to the similarity is to be considered (also referred to as concepts under consideration or selected concepts). Simply put, similarity Sc represents not the overall similarity between the two input images but the partial similarity attributable to some selected concepts. In other words, similarity Sc represents the component of the overall similarity St between the two input images that is contributed by some selected concepts (concepts under consideration). So to speak, similarity Sc is an index indicating the degree to which the selected concepts contribute to the similarity between the two images (their contribution). Similarity Sc can also be described as the similarity (contribution) for each concept (each concept vector, or each cluster). Note that similarity Sc, like similarity St, is a scalar (value).

この被選択コンセプトに関する類似度Ｓｃは、被選択コンセプトに対応したコンセプトベクトルＵにより張られる部分空間（特定部分空間とも称する）上での画像特徴量（特徴ベクトルＦｑ（＝ｑ）およびＦｇ（＝ｇ））間の類似度合いで表現される。具体的には、当該類似度Ｓｃは、第１特徴ベクトルｑを当該特定部分空間に射影（直交射影）したベクトル（Ｐｑ）と、第２特徴ベクトルｇを当該特定部分空間に射影（直交射影）したベクトル（Ｐｇ）との内積で表現される。ベクトル（Ｐｑ）は、射影行列Ｐ（次述）を特徴ベクトルｑに対して（左から）作用させたベクトル（正射影ベクトル）であり、ベクトルＰｇは、射影行列Ｐを特徴ベクトルｇに対して（左から）作用させたベクトル（正射影ベクトル）である。 This similarity Sc for the selected concept is expressed as the degree of similarity between image features (feature vectors Fq (= q) and Fg (= g)) in a subspace (also called a specific subspace) spanned by the concept vector U corresponding to the selected concept. Specifically, this similarity Sc is expressed as the dot product of a vector (Pq) obtained by projecting (orthogonally projecting) the first feature vector q onto the specific subspace, and a vector (Pg) obtained by projecting (orthogonally projecting) the second feature vector g onto the specific subspace. Vector (Pq) is a vector (orthogonal projection vector) obtained by applying (from the left) the projection matrix P (described below) to feature vector q, and vector Pg is a vector (orthogonal projection vector) obtained by applying (from the left) the projection matrix P to feature vector g.

すなわち、当該類似度Ｓｃは、次の式（２）で表現される。 That is, the similarity Sc is expressed by the following equation (2):
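Read from the preceding paragraph (the inner product of the two feature vectors after each is projected by P), equation (2) can be written out as below. The notation is a reconstruction from the description, not a copy of the original equation.

```latex
S_c = (Pq) \cdot (Pg) = (Pq)^{\mathsf{T}} (Pg) \tag{2}
```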

ただし、行列Ｐは、各特徴ベクトルＦを、上記特定部分空間（被選択コンセプトに対応したコンセプトベクトルＵにより張られる部分空間）に射影する特定の射影行列（詳細には直交射影行列）である。具体的には、射影行列（直交射影行列）Ｐは、行列Ｂを用いて次の式（３）で算出される。行列Ｐは、β×βのサイズを有している。値βは、ベクトルＦ（ｑあるいはｇ）の次元数（たとえば１０２４）である。 However, matrix P is a specific projection matrix (more specifically, an orthogonal projection matrix) that projects each feature vector F onto the specific subspace (the subspace spanned by concept vector U corresponding to the selected concept). Specifically, the projection matrix (orthogonal projection matrix) P is calculated using matrix B according to the following equation (3). Matrix P has a size of β×β. The value β is the number of dimensions of vector F (q or g) (for example, 1024).

ここで、行列Ｂは、所定数（被選択コンセプトの個数（γ個））のコンセプトベクトルＵ（縦ベクトル（列ベクトル））を横方向に並べた行列である。行列Ｂは、β×γ（たとえば、１０２４×２）サイズを有している。また、γは、選択するコンセプトの個数（たとえば、２（１あるいは３などでもよい））である。また、行列の右上の添え字「Ｔ」は転置（行列）であることを示す。 Here, matrix B is a matrix in which a predetermined number (the number of selected concepts, γ) of concept vectors U (column vectors) are arranged side by side. Matrix B has a size of β × γ (for example, 1024 × 2). γ is the number of concepts to be selected (for example, 2; it may also be 1, 3, or the like). The superscript "T" at the upper right of a matrix indicates the transpose (transposed matrix).
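Assuming equation (3) is the standard formula for the orthogonal projection onto the column space of B, it would take the following form. This is a reconstruction under that assumption; the sizes (β × γ)(γ × γ)(γ × β) multiply out to β × β, which agrees with the size stated for P.

```latex
P = B \left( B^{\mathsf{T}} B \right)^{-1} B^{\mathsf{T}} \tag{3}
```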

γ個のコンセプトベクトルＵは、互いに１次独立（線形独立）となるように選択される。また、被選択コンセプトの個数γの最大値は、βである。 The γ concept vectors U are selected so that they are linearly independent of each other. Furthermore, the maximum value of the number γ of selected concepts is β.

なお、式（３）は、１又は２以上のベクトルｂで張られる部分空間（２以上のベクトルｂを基底とする部分空間）への直交射影を表す行列（直交射影行列）を求める一般的な式でもある。ただし、ベクトルｂとしてコンセプトベクトルＵを用いている。 Note that equation (3) is also the general equation for finding the matrix (orthogonal projection matrix) that represents the orthogonal projection onto the subspace spanned by one or more vectors b (the subspace having two or more vectors b as its basis). Here, the concept vectors U are used as the vectors b.

行列Ｂ（ひいては行列Ｐ）は、被選択コンセプト（詳細には当該被選択コンセプトに対応するコンセプトベクトル）に依拠して、異なる行列になる。詳細には、いずれのコンセプト（およびいくつのコンセプト）を選択するかに依拠して、行列Ｂ（および行列Ｐ）は変動する。 Matrix B (and therefore matrix P) will differ depending on the selected concepts (more specifically, the concept vectors corresponding to those selected concepts). More specifically, matrix B (and matrix P) will vary depending on which concepts (and how many concepts) are selected.
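The projection onto the subspace spanned by the selected concept vectors, and the resulting similarity Sc, can be sketched in plain Python as follows. This is an illustration only: it assumes the standard orthogonal projection P = B(BᵀB)⁻¹Bᵀ and linearly independent concept vectors, and it applies P to a vector by solving a small γ × γ system instead of forming the β × β matrix. All function names and sample values are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def project(concepts, f):
    """Return P f, where the columns of B are the vectors in `concepts`."""
    gram = [[dot(u, v) for v in concepts] for u in concepts]  # B^T B (gamma x gamma)
    rhs = [dot(u, f) for u in concepts]                       # B^T f
    coeffs = solve(gram, rhs)                                 # (B^T B)^-1 B^T f
    return [sum(c * u[i] for c, u in zip(coeffs, concepts)) for i in range(len(f))]

def concept_similarity(concepts, q, g):
    """Sc: inner product of the two projected feature vectors."""
    return dot(project(concepts, q), project(concepts, g))

# Two non-orthogonal concept vectors that span the plane z = 0.
u1 = [1.0, 0.0, 0.0]
u2 = [1.0, 1.0, 0.0]
q = [2.0, 1.0, 2.0]
g = [1.0, 2.0, 2.0]
pq = project([u1, u2], q)  # z component removed
sc = concept_similarity([u1, u2], q, g)
print(pq, sc)
```

Selecting a different set of concept vectors changes `gram` and therefore the projection, which mirrors the dependence of B and P on the selected concepts.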

なお、逆行列を算出する際の便宜上、式（３）の代わりに次の式（４）を用いて射影行列Ｐが算出されてもよい。+εＥの項は、無限大への発散等を防止するための調整項である。行列Ｅは次元数βの単位行列（β×β（サイズ）の単位行列）であり、値εは微少な定数である。 For convenience in calculating the inverse matrix, the projection matrix P may be calculated using the following equation (4) instead of equation (3). The term +εE is an adjustment term to prevent divergence to infinity. Matrix E is a unit matrix of dimension β (unit matrix of β × β (size)), and the value ε is a tiny constant.

再び式（２）を参照し、当該式（２）について説明する。 Referring again to equation (2), we will explain equation (2).

式（２）においては、上述のように、２つの入力画像Ｘ（Ｘｑ，Ｘｇ）に対応する２つの特徴ベクトルＦ（具体的には、ベクトルｑ，ｇ）が、特定の射影行列Ｐを用いて２つの射影ベクトル（ＰＦ）に変換される。射影ベクトル（ＰＦ）は、射影行列Ｐを特徴ベクトルＦに対して（左から）作用させたベクトルである。上述のように、当該特定の射影行列Ｐは、特徴空間内における各特徴ベクトルを、特定の部分空間（特定コンセプトに対応する部分空間）に射影（直交射影）する射影行列（直交射影行列）である。 In equation (2), as described above, two feature vectors F (specifically, vectors q and g) corresponding to two input images X (Xq, Xg) are transformed into two projection vectors (PF) using a specific projection matrix P. The projection vectors (PF) are vectors obtained by applying the projection matrix P to the feature vectors F (from the left). As described above, the specific projection matrix P is a projection matrix (orthogonal projection matrix) that projects (orthogonally projects) each feature vector in the feature space onto a specific subspace (a subspace corresponding to a specific concept).

この特定の射影行列Ｐ（式（３）等参照）は、評価対象のｎ個の被選択コンセプト（１又は２以上のコンセプト）で構成されるコンセプト群ごとに規定される。 This specific projection matrix P (see equation (3), etc.) is defined for each concept group consisting of n selected concepts (one or more concepts) to be evaluated.

たとえば、評価対象のコンセプトが単一のコンセプトである場合、特定の射影行列Ｐは、特徴空間内における各特徴ベクトルＦを、特定の直線（特定の部分空間）に射影する射影行列である。より詳細には、単一のコンセプトベクトルＵで規定される単一のコンセプトに関する特定の射影行列Ｐは、特徴空間内における各特徴ベクトルを、特定の直線（当該単一のコンセプトベクトルＵを含む直線）に射影する射影行列である。 For example, if the concept to be evaluated is a single concept, the specific projection matrix P is a projection matrix that projects each feature vector F in the feature space onto a specific line (a specific subspace). More specifically, the specific projection matrix P for a single concept defined by a single concept vector U is a projection matrix that projects each feature vector in the feature space onto a specific line (a line that includes the single concept vector U).

図２３は、このような射影行列Ｐ（ただし、単一のコンセプトベクトルＵ１で張られる部分空間（すなわち直線）への直交射影行列）により各特徴ベクトルｑ，ｇがそれぞれ特定の直線（単一のコンセプトベクトルＵ１を含む直線）に射影（正射影）される様子を示している。 Figure 23 shows how each feature vector q, g is projected (orthogonally projected) onto a specific line (a line containing a single concept vector U1) using such a projection matrix P (which is an orthogonal projection matrix onto a subspace (i.e., a line) spanned by a single concept vector U1).

図２３に示されるように、この直交射影行列Ｐによって、ベクトルｑはベクトルｑ１に変換され且つベクトルｇはベクトルｇ１に変換される。図２３では、変換後のベクトルｑ１，ｇ１は太い破線で示されている。このように、ベクトルｑ１，ｇ１は、特定クラスタ（特定コンセプトベクトル）に対応する部分空間（詳細には直線）に対して、２つの特徴ベクトルｑ，ｇをそれぞれ射影した射影ベクトルである。 As shown in Figure 23, this orthogonal projection matrix P transforms vector q into vector q1 and vector g into vector g1. In Figure 23, the transformed vectors q1 and g1 are indicated by thick dashed lines. In this way, vectors q1 and g1 are projection vectors obtained by projecting two feature vectors q and g onto a subspace (more specifically, a line) corresponding to a specific cluster (specific concept vector).

それ故、式（２）は、次の式（５）のように変形される。具体的には、２つの射影ベクトルｑ１，ｇ１間の内積（ｑ１・ｇ１＝ｑｓ＊ｇｓ）が、当該各クラスタ（コンセプト）に関する類似度Ｓｃとして求められる。 Therefore, equation (2) is transformed into the following equation (5). Specifically, the inner product of the two projection vectors q1 and g1 (q1 · g1 = qs * gs) is obtained as the similarity Sc for each cluster (concept).

ここで、値ｑｓは、ベクトルｑ１の大きさ（コンセプトベクトルＵの向きの直線へと射影したベクトルｑの射影成分）であり、値ｇｓは、ベクトルｇ１の大きさ（コンセプトベクトルＵの向きの直線へと射影したベクトルｇの射影成分）である。ただし、ベクトルｑ１（，ｇ１）とコンセプトベクトルＵとのなす角度が９０度～２７０度（degree）である場合には、値ｑｓ（，ｇｓ）は負の値である。式（２）で示される類似度Ｓｃは、式（５）で示されるように、値ｑｓと値ｇｓとの積として算出される。 Here, the value qs is the magnitude of vector q1 (the projected component of vector q projected onto a line in the direction of concept vector U), and the value gs is the magnitude of vector g1 (the projected component of vector g projected onto a line in the direction of concept vector U). However, if the angle between vector q1(, g1) and concept vector U is between 90 and 270 degrees, the value qs(, gs) is a negative value. The similarity Sc shown in equation (2) is calculated as the product of the values qs and gs, as shown in equation (5).
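For a single concept, the product form Sc = qs * gs can be sketched as below, taking qs and gs as the signed components of q and g along the direction of U (that is, q · U / |U| and g · U / |U|). The helper names and sample vectors are hypothetical; the example is chosen so that one component is negative.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def signed_component(f, u):
    """Signed length of the projection of f onto the line through u."""
    return dot(f, u) / math.sqrt(dot(u, u))

def single_concept_similarity(q, g, u):
    """Sc for one concept vector: product of the two signed components."""
    return signed_component(q, u) * signed_component(g, u)

u = [3.0, 4.0, 0.0]   # concept vector (need not be unit length here)
q = [1.0, 0.0, 0.0]
g = [0.0, -1.0, 0.0]  # points away from u, so its component is negative

qs = signed_component(q, u)
gs = signed_component(g, u)
sc = single_concept_similarity(q, g, u)
print(round(qs, 2), round(gs, 2), round(sc, 2))  # 0.6 -0.8 -0.48
```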

なお、図２４は、図２３と同様の図である。ただし、図２４は、仮にコンセプトベクトルＵがｘ軸と同じ向きを向いている場合を示している。この場合には、図２４に示されるように、値ｑｓはベクトルｑ１のｘ方向成分（第１成分）に等しく、値ｇｓはベクトルｇ１のｘ方向成分（第１成分）に等しい。図２４は、説明の単純化のために示したが、一般的には、図２３のような射影変換が想定される。 Note that Figure 24 is similar to Figure 23. However, Figure 24 illustrates the case where the concept vector U is oriented in the same direction as the x-axis. In this case, as shown in Figure 24, the value qs is equal to the x-direction component (first component) of vector q1, and the value gs is equal to the x-direction component (first component) of vector g1. Figure 24 is shown for the sake of simplicity, but a projective transformation like that shown in Figure 23 is generally assumed.

式（２）（特に式（５））によれば、単一のコンセプトベクトルＵに関する寄与度（換言すれば、両画像の類似性に対する単一の被選択コンセプトの寄与度）が、類似度Ｓｃとして算出される。換言すれば、コンセプトごとの類似度Ｓｃが算出される。 According to equation (2) (particularly equation (5)), the contribution of a single concept vector U (in other words, the contribution of a single selected concept to the similarity between both images) is calculated as the similarity Sc. In other words, the similarity Sc for each concept is calculated.

また、評価対象のコンセプトが２つのコンセプトである場合、特定の射影行列Ｐは、特徴空間内における各特徴ベクトルＦを、特定の平面（特定の部分空間）に射影する射影行列である。より詳細には、２つのコンセプトに関する特定の射影行列Ｐは、特徴空間内における各特徴ベクトルＦを、特定の平面（２つのコンセプトに対応する２つのコンセプトベクトルＵ１，Ｕ２で張られる平面）に射影する射影行列である。そして、式（２）に従って、２つの特徴ベクトルＦを当該射影行列Ｐによって射影変換した後の２つの射影ベクトルの内積（Ｐｑ）・（Ｐｇ）が、２つの画像の類似性への評価対象コンセプトによる寄与度Ｓｃとして算出される。 Furthermore, when there are two concepts to be evaluated, the specific projection matrix P is a projection matrix that projects each feature vector F in the feature space onto a specific plane (a specific subspace). More specifically, the specific projection matrix P for the two concepts is a projection matrix that projects each feature vector F in the feature space onto a specific plane (the plane spanned by the two concept vectors U1 and U2 corresponding to the two concepts). Then, according to equation (2), the inner product (Pq) · (Pg) of the two projection vectors obtained by projecting the two feature vectors F with the projection matrix P is calculated as the contribution Sc of the concepts to be evaluated to the similarity between the two images.

図２２は、このような射影行列Ｐ（２つのコンセプトベクトルＵ１，Ｕ２で張られる部分空間（平面）への直交射影行列）により各特徴ベクトルｑ，ｇがそれぞれ当該平面（２つのコンセプトベクトルＵ１，Ｕ２で張られる平面）に射影される様子を示している。図２２では、図示の簡単化のため、コンセプトベクトルＵ１，Ｕ２が平面ｚ＝０（ｘｙ平面）に平行である場合が示されている。換言すれば、２つのコンセプトベクトルＵ１，Ｕ２で張られる平面が、ｚ＝０で表される平面である場合が示されている。 Figure 22 shows how each feature vector q, g is projected onto the plane (the plane spanned by the two concept vectors U1, U2) using this projection matrix P (an orthogonal projection matrix onto the subspace (plane) spanned by the two concept vectors U1, U2). For simplicity, Figure 22 shows the case where the concept vectors U1, U2 are parallel to the plane z=0 (xy plane). In other words, the case where the plane spanned by the two concept vectors U1, U2 is the plane represented by z=0 is shown.

図２２に示されるように、２つのコンセプトベクトルＵ１，Ｕ２を用いて算出された当該直交射影行列Ｐによって、ベクトルｑはベクトルｑ１２に変換され且つベクトルｇはベクトルｇ１２に変換される。なお、変換後のベクトルｑ１２，ｇ１２は太い破線で示されている。 As shown in Figure 22, the orthogonal projection matrix P calculated using the two concept vectors U1 and U2 transforms vector q into vector q12 and vector g into vector g12. Note that the transformed vectors q12 and g12 are shown by thick dashed lines.

この場合、直交射影行列Ｐによる変換後の両ベクトルｑ１２，ｇ１２の内積（換言すれば、特定部分空間（平面ｚ＝０）での両ベクトルｑ１２，ｇ１２の類似度）が、類似度Ｓｃとして算出される。すなわち、２つのコンセプトを考慮対象（評価対象）とする場合、対応する２つのコンセプトベクトルＵ１，Ｕ２で張られる平面に対して２つの特徴ベクトルＦを射影変換した後の２つのベクトルの内積（ｑ１２・ｇ１２）が、類似度Ｓｃとして算出される。 In this case, the dot product of both vectors q12 and g12 after transformation using the orthogonal projection matrix P (in other words, the similarity between both vectors q12 and g12 in the specific subspace (plane z = 0)) is calculated as the similarity Sc. In other words, when two concepts are being considered (evaluated), the dot product of the two vectors (q12 · g12) after projecting two feature vectors F onto the plane spanned by the two corresponding concept vectors U1 and U2 is calculated as the similarity Sc.

このようにして、２つのコンセプトベクトルＵ１，Ｕ２に関する寄与度（換言すれば、両画像の類似性に対する２つの被選択コンセプトの寄与度）が、式（２）に基づき類似度Ｓｃとして算出され得る。 In this way, the contribution of the two concept vectors U1 and U2 (in other words, the contribution of the two selected concepts to the similarity between the two images) can be calculated as the similarity Sc based on equation (2).

同様にして、任意の個数（所定数）のコンセプトベクトルＵに関する寄与度（換言すれば、両画像の類似性に対する当該所定数の被選択コンセプトの寄与度）（類似度）Ｓｃが、式（２）に基づき算出され得る。なお、３つ以上のコンセプトベクトルＵに関する寄与度Ｓｃは、４次元以上の変換前空間からそれよりも低い次元（コンセプトベクトルの個数に等しい次元）の変換後空間へと射影変換された２つのベクトルの内積として考えればよい（ただし、図示は困難である）。 Similarly, the contribution (similarity) Sc for any number (predetermined number) of concept vectors U (in other words, the contribution of the predetermined number of selected concepts to the similarity between both images) can be calculated based on equation (2). The contribution Sc for three or more concept vectors U can be thought of as the inner product of two vectors projected from a pre-transformation space of four or more dimensions to a post-transformation space of lower dimensions (dimensions equal to the number of concept vectors) (however, this is difficult to illustrate).

＜１－１０．サブフェーズＰＨ３ｂ（ステップＳ３０）の処理＞ <1-10. Processing of sub-phase PH3b (step S30)>
＜サブフェーズＰＨ３ｂの概要＞ <Outline of sub-phase PH3b>
つぎに、サブフェーズＰＨ３ｂ（ステップＳ３０）の処理について図５を参照しつつ説明する。サブフェーズＰＨ３ｂの処理は、２つの画像の類似性に関する判断根拠を導出する処理等である。 Next, the processing of sub-phase PH3b (step S30) will be described with reference to FIG. 5. The processing of sub-phase PH3b includes deriving a basis for determining the similarity between two images.

サブフェーズＰＨ３ｂでは、同一の人物（ないし類似する人物）であると画像処理装置３０が判断した根拠を説明する処理等が、画像処理装置３０によって実行される。ここでは、ステップＳ１２で同一人物の画像（所定程度以上に類似している画像）と判定された２つの画像（クエリ画像Ｘｑおよびギャラリー画像Ｘｇ）の類似性に関する判断根拠を導出する処理について主に説明する。ただし、本発明は、これに限定されず、任意の２つの画像の類似性についての判断処理およびその判断根拠の導出処理等が実行されてもよい。 In subphase PH3b, the image processing device 30 executes processes such as explaining the basis for the image processing device 30's determination that the images are of the same person (or similar people). Here, the process of deriving the basis for determining the similarity between two images (query image Xq and gallery image Xg) determined in step S12 to be images of the same person (images that are similar to a predetermined degree or more) will be mainly described. However, the present invention is not limited to this, and processes such as determining the similarity between any two images and deriving the basis for that determination may be executed.

このサブフェーズＰＨ３ｂの処理によれば、類似していると判断した根拠が、たとえば、特定の特徴を有するシャツを着用している点なのか、特定の特徴を有するボトムスを着用している点なのか等が解析され得る。また、シャツの模様に特徴を見いだしたのか等もが解析され得る。 The processing of this sub-phase PH3b makes it possible to analyze whether the basis for judging the images to be similar is, for example, that the person is wearing a shirt with specific characteristics or that the person is wearing bottoms with specific characteristics. It also makes it possible to analyze, for example, whether a distinguishing feature was found in the pattern of the shirt.

このサブフェーズＰＨ３ｂでは、上述したコンセプトベクトルＵ（ステップＳ２３（図４）参照）を利用して、類似性に関する判断根拠が解析される。 In this sub-phase PH3b, the basis for determining similarity is analyzed using the concept vector U described above (see step S23 (Figure 4)).

＜ステップＳ３１：主要コンセプトの抽出＞ <Step S31: Extraction of main concepts>
具体的には、まず、ステップＳ３１（図５）において、学習モデル４００にて獲得された各種のコンセプトのうち、２つの画像の類似性に特に大きな影響を及ぼすコンセプト（寄与度が大きなコンセプト）が主要コンセプトとして抽出される。詳細には、複数のコンセプトのうち、所定の基準による上位数個のコンセプトが主要コンセプトとして抽出される。たとえば、次述する２つの手法のいずれか、具体的には、第１手法と第２手法とのいずれかが実行されて、主要コンセプトが抽出される。 Specifically, in step S31 (FIG. 5), among the various concepts acquired by the learning model 400, concepts that have a particularly large influence on the similarity between two images (concepts with a large contribution) are extracted as main concepts. More specifically, among the multiple concepts, the top several concepts according to a predetermined criterion are extracted as main concepts. For example, the main concepts are extracted by executing one of the following two methods, specifically, the first method or the second method.

以下では、２つの手法（第１手法および第２手法）について例示する。 Below, two methods (method 1 and method 2) are illustrated.

まず、第１手法について説明する。 First, we will explain the first method.

第１手法は、画像ペアの類似度に対するコンセプト毎の寄与度Ｓｃ（式（５）参照）を全コンセプトについて求め、当該寄与度Ｓｃの大きな上位数個のコンセプトを主要コンセプトとして抽出する手法である。 The first method calculates the contribution Sc (see formula (5)) of each concept to the similarity between image pairs for all concepts, and extracts the top few concepts with the highest contribution Sc as main concepts.

具体的には、第１手法においては、まず、コントローラ３１は、単一のコンセプトベクトルＵの寄与度Ｓｃ（上述の式（５）に基づく類似度Ｓｃ）を、複数の候補コンセプトベクトルＵ（後述）（たとえば全てのコンセプトベクトルＵ）のそれぞれについて求める。すなわち、コントローラ３１は、或るクラスタに対応する単一のコンセプトベクトルＵで表現される一のコンセプトが全体の類似度にどの程度寄与しているか、を求める処理（コンセプトごとの寄与度Ｓｃを求める処理）を、全てのコンセプトについて実行する。次に、コントローラ３１は、全てのコンセプトをその寄与度の大きい順に（寄与度の降順に）並べ替える。そして、コントローラ３１は、上位数個のコンセプトを、２つの画像の類似度に大きく寄与する主要なコンセプト（主要コンセプト）として決定する。 Specifically, in the first method, the controller 31 first calculates the contribution Sc of a single concept vector U (the similarity Sc based on the above-described equation (5)) for each of multiple candidate concept vectors U (described below) (e.g., all concept vectors U). That is, the controller 31 performs, for all concepts, a process for calculating the degree to which one concept represented by a single concept vector U corresponding to a certain cluster contributes to the overall similarity (a process for calculating the contribution Sc for each concept). Next, the controller 31 sorts all concepts in descending order of their contribution. The controller 31 then determines the top several concepts as the main concepts, that is, the concepts that contribute significantly to the similarity between the two images.

なお、第１手法においては（第２手法も同様）、上記ステップＳ２０で抽出された複数のコンセプト（複数のコンセプトベクトルＵ）の全てが、類似性の根拠となるコンセプトの候補（候補コンセプトとも称する）として特定されてもよい。ただし、これに限定されず、候補コンセプトは、上記ステップＳ２０で抽出された全てのコンセプトのうちの一部のコンセプトであってもよい。換言すれば、類似性の根拠となるコンセプトベクトルの候補（候補コンセプトベクトルＵとも称する）は、上記ステップＳ２０で抽出された全てのコンセプトベクトルのうちの一部のコンセプトベクトルであってもよい。このように、一部の候補コンセプト（候補コンセプトベクトル）の中から、主要コンセプト（主要コンセプトベクトル）が決定されてもよい。 Note that in the first method (and likewise in the second method), all of the multiple concepts (multiple concept vectors U) extracted in step S20 above may be identified as candidates for the concepts that serve as the basis for similarity (also referred to as candidate concepts). However, the present invention is not limited to this, and the candidate concepts may be only some of the concepts extracted in step S20 above. In other words, the candidates for the concept vectors that serve as the basis for similarity (also referred to as candidate concept vectors U) may be only some of the concept vectors extracted in step S20 above. In this way, the main concepts (main concept vectors) may be determined from among a subset of candidate concepts (candidate concept vectors).
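The first method (score every candidate concept on its own, sort in descending order, keep the top few) can be sketched as follows. This is a minimal illustration; the concept names, the dictionary layout, and the `top_k` parameter are hypothetical, and the per-concept score uses the product of signed components described for equation (5).

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def signed_component(f, u):
    """Signed length of the projection of f onto the line through u."""
    return dot(f, u) / math.sqrt(dot(u, u))

def rank_concepts(q, g, concepts, top_k):
    """Return the top_k (name, Sc) pairs, sorted by descending contribution Sc.

    `concepts` maps a (hypothetical) concept name to its concept vector.
    """
    scores = {
        name: signed_component(q, u) * signed_component(g, u)
        for name, u in concepts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]

# Hypothetical candidate concepts in a 3-dimensional feature space.
concepts = {
    "c0": [1.0, 0.0, 0.0],
    "c1": [0.0, 1.0, 0.0],
    "c2": [0.0, 0.0, 1.0],
}
q = [3.0, 2.0, 1.0]
g = [3.0, 1.0, -1.0]
ranked = rank_concepts(q, g, concepts, top_k=2)
print(ranked)  # [('c0', 9.0), ('c1', 2.0)]
```

Passing only a subset of the extracted concepts as `concepts` corresponds to restricting the search to a subset of candidate concepts.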

次に、第２手法について説明する。 Next, we will explain the second method.

第２手法は、寄与度（類似度）について検討する点においては第１手法に類似する。 The second method is similar to the first method in that it considers contribution (similarity).

ただし、第２手法では、或る時点（ｉ回目の繰り返し処理時点）で既に選択（考慮）されたコンセプト（選択済みコンセプト）以外のコンセプト（未選択コンセプト）の中から、次順位コンセプトを探索する探索処理が繰り返し実行される。ここで、次順位コンセプトは、未選択コンセプトのうち、考慮される類似度成分（そのコンセプトによって追加的に説明される類似度成分）を最も大きく増大させるようなコンセプトである。具体的には、或る時点における未選択（未考慮）コンセプトのコンセプトベクトルＵ（未選択コンセプトベクトルＵ）のうち、考慮される類似度成分を最大化するようなコンセプトベクトルＵを探索する探索処理が繰り返し実行される。換言すれば、選択済みコンセプトでは未だ説明されていない類似度成分のうち、新たに（追加的に）説明できる成分を最も大きくする未選択コンセプト（次順位コンセプト）が順次に探索される。 However, in the second method, a search process is repeatedly performed to search for a next-ranked concept from among concepts (unselected concepts) other than concepts (selected concepts) that have already been selected (considered) at a certain point in time (at the time of the i-th iteration). Here, the next-ranked concept is the concept, among the unselected concepts, that most significantly increases the considered similarity component (the similarity component that is additionally explained by that concept). Specifically, a search process is repeatedly performed to search for a concept vector U that maximizes the considered similarity component from among the concept vectors U (unselected concept vectors U) of unselected (unconsidered) concepts at a certain point in time. In other words, an unselected concept (next-ranked concept) that maximizes the newly (additionally) explainable component from among the similarity components not yet explained by the selected concepts is sequentially searched for.

詳細には、或る時点で既に選択されたコンセプトベクトルＵにより張られる部分空間の「直交補空間」上で類似度に最も寄与するコンセプトベクトルＵが探索される（後述する式（６）等参照）。そして、当該探索処理が繰り返されることによって、主要コンセプトが決定される。このような点において、第２手法は、コンセプト毎の寄与度を主に用いて主要コンセプトを決定する第１手法と相違する。 Specifically, the concept vector U that contributes most to similarity is searched for in the "orthogonal complement" of the subspace spanned by the concept vector U already selected at a certain point in time (see equation (6) below, etc.). This search process is then repeated to determine the main concept. In this respect, the second method differs from the first method, which determines the main concept primarily using the contribution of each concept.

第２手法によれば、高い独立性を有するコンセプトを主要コンセプトとして抽出することが可能である。換言すれば、コンセプト間の重複を少なくした主要コンセプトを抽出することが可能である。 The second method makes it possible to extract concepts with high independence as main concepts. In other words, it is possible to extract main concepts with minimal overlap between concepts.

以下、第２手法について詳細に説明する。 The second method is explained in detail below.

第２手法では、各特徴ベクトルＦ（詳細には、特徴ベクトルｑ，ｇ）が、特定の射影行列Ｒ（次述）による射影変換後の射影ベクトルに変換される。 In the second method, each feature vector F (specifically, feature vectors q and g) is transformed into a projection vector using a specific projection matrix R (described next).

特定の射影行列Ｒは、特定の射影行列Ｐを用いて（１－Ｐ）で表現される行列（Ｒ＝１－Ｐ）である。 A specific projection matrix R is a matrix (R = 1-P) expressed as (1-P) using a specific projection matrix P.

特定の射影行列Ｐ（詳細には、Ｐｎとも表記する）は、或る時点で既に考慮されたｎ個（γ個）のコンセプトベクトルＵに基づく射影行列（詳細には、直交射影行列）である。射影行列Ｐ（詳細にはＰｎ）は、ｎ個のコンセプトベクトルＵを横方向に並べた行列Ｂに基づいて、式（３）または式（４）を用いて求められる行列である。なお、値ｎは、上記探索処理が繰り返されるごとに１つずつ増加していく値である。 A specific projection matrix P (more specifically, also referred to as Pn) is a projection matrix (more specifically, an orthogonal projection matrix) based on n (γ) concept vectors U that have already been considered at a certain point in time. The projection matrix P (more specifically, Pn) is a matrix calculated using equation (3) or (4) based on matrix B, in which n concept vectors U are arranged horizontally. Note that the value n increases by one each time the above search process is repeated.

一方、射影行列Ｒ（詳細には、Ｒｎとも表記する）は、射影行列Ｐｎによる射影空間（部分空間）に直交する部分空間（直交補空間）へと各特徴ベクトルＦを射影する行列であり、（１－Ｐｎ）に相当する行列である。換言すれば、射影行列Ｐｎによる射影空間（部分空間）の直交補空間が、射影行列Ｒｎによる射影空間（部分空間）である。なお、射影行列Ｒｎは、直交射影行列でもある。 On the other hand, the projection matrix R (more specifically, also referred to as Rn) is a matrix that projects each feature vector F onto a subspace (orthogonal complement) that is orthogonal to the projection space (subspace) created by the projection matrix Pn, and is equivalent to (1-Pn). In other words, the orthogonal complement of the projection space (subspace) created by the projection matrix Pn is the projection space (subspace) created by the projection matrix Rn. Note that the projection matrix Rn is also an orthogonal projection matrix.

なお、特徴ベクトルＦは、特定の部分空間へ射影する射影行列Ｐ（＝Ｐｎ）を特徴ベクトルＦに作用させたベクトル（ＰＦ）と、当該特定の部分空間の直交補空間へと射影する射影行列Ｒ（＝Ｒｎ＝１－Ｐｎ）を特徴ベクトルＦに作用させたベクトル（ＲＦ）とに分離される。ベクトル（ＰＦ）は、射影行列Ｐ（詳細には行列Ｂ）を構成するｎ個のコンセプトベクトルＵで考慮される成分である、とも表現される。また、ベクトル（ＲＦ）は、射影行列Ｐ（行列Ｂ）を構成するｎ個のコンセプトベクトルＵでは未だ考慮されていない成分である、とも表現される。 Note that feature vector F is separated into vector (PF) obtained by applying projection matrix P (= Pn), which projects onto a specific subspace, to feature vector F, and vector (RF) obtained by applying projection matrix R (= Rn = 1 - Pn), which projects onto the orthogonal complement of that specific subspace, to feature vector F. Vector (PF) can also be expressed as a component considered in the n concept vectors U that make up projection matrix P (more specifically, matrix B). Vector (RF) can also be expressed as a component not yet considered in the n concept vectors U that make up projection matrix P (matrix B).

また、射影行列Ｐが、たとえば単一のコンセプトベクトルＵで張られる部分空間（特定直線）への射影行列である場合、射影行列Ｒは、当該部分空間（特定直線）に直交する部分空間（残りの部分空間）への射影行列である。簡単化のためコンセプトベクトルＵが３次元ベクトルであるとすると、射影行列Ｒは、当該部分空間（特定直線）に直交する部分空間（「平面（当該特定直線に垂直な平面）」）への射影行列である。 Furthermore, if the projection matrix P is, for example, a projection matrix onto a subspace (specific straight line) spanned by a single concept vector U, the projection matrix R is a projection matrix onto a subspace (the remaining subspace) orthogonal to that subspace (specific straight line). For simplicity's sake, if the concept vector U is considered a three-dimensional vector, the projection matrix R is a projection matrix onto a subspace ("plane (plane perpendicular to the specific straight line)") orthogonal to that subspace (specific straight line).
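The split of a feature vector F into PF and RF can be sketched for the single-concept, three-dimensional case just described. The sketch reads the "1" in R = 1 - P as the identity matrix, so that RF = F - PF; the function name and sample values are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def split(f, u):
    """Split f into P f (component along the line through u)
    and R f = f - P f (component in the plane perpendicular to u)."""
    scale = dot(f, u) / dot(u, u)
    pf = [scale * x for x in u]
    rf = [a - b for a, b in zip(f, pf)]
    return pf, rf

u = [0.0, 0.0, 2.0]   # single concept vector along the z axis
f = [1.0, 2.0, 3.0]
pf, rf = split(f, u)
print(pf, rf)          # [0.0, 0.0, 3.0] [1.0, 2.0, 0.0]
print(dot(pf, rf))     # 0.0, the two parts are orthogonal
```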

この第２手法では、次の式（６）で表現される指標値Ｑｎを最大化する新たなコンセプト（詳細には、当該新たなコンセプトに対応する新たなコンセプトベクトルＵｒ）が探索される。具体的には、第（ｎ＋１）順位の新たなコンセプトベクトルＵが探索される。すなわち、次順位コンセプトベクトルＵ（ひいては次順位コンセプト）が探索される。 In this second method, a new concept (more specifically, a new concept vector Ur corresponding to the new concept) is searched for that maximizes the index value Qn expressed by the following equation (6). Specifically, a new concept vector U of the (n+1)th rank is searched for. In other words, the next-ranked concept vector U (and therefore the next-ranked concept) is searched for.

詳細には、未だ選択されていない（残余の）コンセプトベクトルＵｒ（新たな選択対象コンセプトの候補）のそれぞれについて、式（６）に基づいて、指標値Ｑｎが算出される。ここで、「既に選択（考慮）されたコンセプトベクトル」は、行列Ｐｎ（行列Ｒｎに対応する行列）を構成するために利用されたｎ個（１又は２以上）のコンセプトベクトルＵを意味する。また、「未だ選択（考慮）されていないコンセプトベクトル」は、当該行列Ｐｎを構成するために利用されたｎ個のコンセプトベクトルＵ以外のコンセプトベクトルＵ（残余のコンセプトベクトルＵ）を意味する。 Specifically, for each of the not-yet-selected (remaining) concept vectors Ur (candidates for a new concept to be selected), an index value Qn is calculated based on equation (6). Here, "concept vectors already selected (considered)" refers to the n (one or more) concept vectors U used to construct the matrix Pn (the matrix corresponding to the matrix Rn). Furthermore, "concept vectors not yet selected (considered)" refers to the concept vectors U (remaining concept vectors U) other than the n concept vectors U used to construct the matrix Pn.

式（６）のベクトルＵｒは、新たな選択対象コンセプトの候補に対応する未選択の一のコンセプトベクトルＵ（一の候補コンセプトベクトルＵ）である。 The vector Ur in equation (6) is an unselected concept vector U (a candidate concept vector U) corresponding to a new candidate concept to be selected.

ベクトル（Ｒｑ）は、射影行列Ｒを特徴ベクトルｑに対して（左から）作用させたベクトルであり、ベクトルＲｇは、射影行列Ｒを特徴ベクトルｇに対して（左から）作用させたベクトルである。 Vector (Rq) is the vector obtained by applying the projection matrix R to the feature vector q (from the left), and vector Rg is the vector obtained by applying the projection matrix R to the feature vector g (from the left).

換言すれば、ベクトル（Ｒｑ）は、射影行列Ｐによる射影空間に対する直交補空間へと特徴ベクトルｑを射影したベクトルであり、ベクトル（Ｒｇ）は、射影行列Ｐによる射影空間に対する直交補空間へと特徴ベクトルｇを射影したベクトルである。 In other words, vector (Rq) is the vector obtained by projecting feature vector q onto the orthogonal complement of the projected space defined by projection matrix P, and vector (Rg) is the vector obtained by projecting feature vector g onto the orthogonal complement of the projected space defined by projection matrix P.

これらのベクトル（Ｒｑ）およびベクトル（Ｒｇ）は、それぞれ、未だ考慮されていない部分空間への射影ベクトルである。また、ベクトル（Ｒｑ）とベクトルＵｒ（未だ考慮されていない新たなコンセプトベクトル）との内積（Ｒｑ・Ｕｒ）は、まだ考慮されていない部分空間への射影ベクトルのうち、当該ベクトルＵｒによる寄与成分（ベクトルＵｒへの射影成分）を表している。ベクトル（Ｒｇ）とベクトルＵｒ（未だ考慮されていない新たなコンセプトベクトル）との内積（Ｒｇ・Ｕｒ）も同様である。 These vectors (Rq) and (Rg) are each projection vectors onto the subspace that has not yet been considered. Furthermore, the inner product (Rq · Ur) of vector (Rq) and vector Ur (a new concept vector that has not yet been considered) represents, within the projection vector onto the not-yet-considered subspace, the component contributed by vector Ur (the projection component onto vector Ur). The same applies to the inner product (Rg · Ur) of vector (Rg) and vector Ur (a new concept vector that has not yet been considered).

そして、式（６）では、これらの２つの内積同士の積（スカラー（値））が指標値Ｑｎとして求められる。具体的には、ベクトル（Ｒｑ）とベクトルＵｒとの内積（（Ｒｑ）・Ｕｒ）が算出されるとともに、ベクトル（Ｒｇ）とベクトルＵｒとの内積（（Ｒｇ）・Ｕｒ）が算出され、これらの内積同士の積（スカラー積）（式（６）の右辺）が算出される。 In equation (6), the product (scalar (value)) of these two dot products is calculated as the index value Qn. Specifically, the dot product ((Rq) · Ur) of vector (Rq) and vector Ur is calculated, and the dot product ((Rg) · Ur) of vector (Rg) and vector Ur is calculated, and the product (scalar product) of these dot products (the right-hand side of equation (6)) is calculated.
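Written out from the description above (the product of the two inner products, with R taken as 1 - Pn and "1" read as the identity), equation (6) would take the following form. The notation is a reconstruction from the text, not a copy of the original equation.

```latex
Q_n = \bigl( (R_n q) \cdot U_r \bigr) \, \bigl( (R_n g) \cdot U_r \bigr), \qquad R_n = 1 - P_n \tag{6}
```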

この指標値Ｑｎは、既に考慮したコンセプトでは未だ考慮されていない部分空間（既に考慮したコンセプトベクトルＵで張られる部分空間に対する「直交補空間」）への射影ベクトル（当該直交補空間へと特徴ベクトルＦを射影したベクトル（ＲＦ））と新たなコンセプトベクトルＵｒとの内積（（ＲＦ）・Ｕｒ）同士の積である。 This index value Qn is the product of the two inner products ((RF) · Ur), each taken between the new concept vector Ur and a projection vector (the vector (RF) obtained by projecting a feature vector F onto the orthogonal complement). The projection is onto the subspace not yet covered by the already-considered concepts, namely the "orthogonal complement" of the subspace spanned by the already-considered concept vectors U.

これまでに考慮したコンセプト（１以上のコンセプトベクトルＵ）では未だ考慮できていない部分空間（直交補空間）に各特徴ベクトルＦを射影した各射影ベクトル（ＲＦ）は、各特徴ベクトルＦのうちの未考慮成分（既考慮コンセプトでは未だ考慮できていない成分）を表している。また、当該未考慮成分と新たなコンセプトベクトルＵとの内積は、当該未考慮成分と（当該新たなコンセプトベクトルＵに対応する）新たなコンセプトとの類似性を表している。それ故、指標値Ｑｎは、２つの特徴ベクトルｑ，ｇのうちの未考慮成分と新たなコンセプトとの類似性を表している。すなわち、指標値Ｑｎは、当該未考慮成分のうち、類似性に関する判断根拠について、当該新たなコンセプトで説明できる成分の大きさを示している。換言すれば、指標値Ｑｎは、新たなコンセプト（コンセプトベクトルＵ）が未考慮成分に寄与する度合いを表している。 Each projection vector (RF), obtained by projecting a feature vector F onto the subspace (orthogonal complement) not yet covered by the concepts considered so far (one or more concept vectors U), represents the unconsidered component of that feature vector F (the component not yet covered by the already-considered concepts). Furthermore, the inner product of this unconsidered component and the new concept vector U represents the similarity between the unconsidered component and the new concept (corresponding to the new concept vector U). Therefore, the index value Qn represents the similarity between the unconsidered components of the two feature vectors q and g and the new concept. That is, the index value Qn indicates how much of the unconsidered components, as a basis for the similarity judgment, can be explained by the new concept. In other words, the index value Qn represents the degree to which the new concept (concept vector U) contributes to the unconsidered components.

それ故、指標値Ｑｎを最大化するコンセプトベクトルＵは、２つの特徴ベクトルＦに関する未考慮成分との類似度を最大化するコンセプトであり、未考慮成分を最も説明できるコンセプトである。 Therefore, the concept vector U that maximizes the index value Qn is the concept that maximizes the similarity with the unconsidered components for the two feature vectors F, and is the concept that can best explain the unconsidered components.

そして、探索処理の結果、未選択のコンセプトベクトルＵｒのうち、当該指標値Ｑｎを最大化するコンセプトベクトルＵｒが、新たなコンセプトベクトルＵ（次順位コンセプトベクトル）として求められる。すなわち、指標値Ｑｎを最大化するコンセプトベクトルＵに対応するコンセプトが、新たな主要コンセプト（次順位（第（ｎ＋１）順位）の主要コンセプト）として選択される。換言すれば、指標値Ｑｎを最大化するクラスタが次順位クラスタとして決定される。 As a result of the search process, the concept vector Ur that maximizes the index value Qn among the unselected concept vectors Ur is determined as the new concept vector U (next-ranked concept vector). That is, the concept corresponding to the concept vector U that maximizes the index value Qn is selected as the new main concept (the next-ranked, i.e., (n+1)th-ranked, main concept). In other words, the cluster that maximizes the index value Qn is determined as the next-ranked cluster.

さらに、当該新たなコンセプトベクトル（次順位コンセプトベクトル）をも用いた新たな行列Ｐ，Ｒが求められる。具体的には、次順位コンセプトベクトルの追加に伴って、コンセプトベクトルＵの選択数が１つ増大（インクリメント）し、値ｎも１つ増大する。また、コンセプトベクトルＵの選択数等のインクリメントに応じて、射影行列Ｐのランク（階数）は１つ増大し且つ射影行列Ｒのランク（階数）は１つ減少する。換言すれば、射影行列Ｐによる射影空間の次元数は１つ増大し、射影行列Ｒによる射影空間の次元数は１つ減少する。 Furthermore, new matrices P and R are calculated using this new concept vector (next-ranked concept vector) as well. Specifically, with the addition of the next-ranked concept vector, the number of selected concept vectors U is incremented by one, and the value n also increases by one. In response to this increment, the rank of projection matrix P increases by one and the rank of projection matrix R decreases by one. In other words, the dimensionality of the space projected onto by projection matrix P increases by one, and the dimensionality of the space projected onto by projection matrix R decreases by one.
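The rank behavior described above can be checked numerically. The following is a minimal sketch, assuming that P is the orthogonal projector onto the span of the selected concept vectors (built here as B·pinv(B)) and that R = I − P, with I the identity matrix; the helper name `projectors` and the random concept vectors are hypothetical and only serve the illustration.

```python
import numpy as np


def projectors(B):
    """Return (P, R) for a matrix B whose columns are the selected concept vectors.

    P projects onto the span of the columns of B (assumed form: B @ pinv(B)),
    and R = I - P projects onto the orthogonal complement of that span.
    """
    P = B @ np.linalg.pinv(B)
    R = np.eye(B.shape[0]) - P
    return P, R


rng = np.random.default_rng(0)
D = 6                                # hypothetical feature dimension
concepts = rng.normal(size=(D, 3))   # three hypothetical concept vectors (columns)

for n in (1, 2, 3):
    P, R = projectors(concepts[:, :n])
    rank_P = np.linalg.matrix_rank(P, tol=1e-9)
    rank_R = np.linalg.matrix_rank(R, tol=1e-9)
    # Each added concept vector raises rank(P) by one and lowers rank(R) by one.
    print(f"n={n}: rank(P)={rank_P}, rank(R)={rank_R}")
```

For linearly independent concept vectors, rank(P) equals n and rank(R) equals D − n at every step, which matches the increment and decrement described in the text.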

以後、同様にして、所定数（たとえば５個）の主要コンセプトが選択されるまで当該探索処理が繰り返される。これによって、第２手法において、上位数個（上位所定数）の主要コンセプト（詳細には、当該主要コンセプトに対応するコンセプトベクトルＵ）が決定される。 The search process is then repeated in a similar manner until a predetermined number (e.g., five) of major concepts are selected. This allows the second method to determine the top few (top predetermined number) major concepts (more specifically, the concept vectors U corresponding to those major concepts).

より具体的には、第１順位のコンセプトベクトルＵの決定の際には、ゼロ個のコンセプトベクトルＵが選択された状態から開始されればよい。この場合、式（６）において、ベクトル（Ｒｑ）は特徴ベクトルｑ自体であり且つベクトル（Ｒｇ）は特徴ベクトルｇ自体であるとして、ベクトル（Ｒｑ）と候補コンセプトベクトルＵとの内積等が求められればよい。 More specifically, when determining the first-ranked concept vector U, it is sufficient to start with zero concept vectors U selected. In this case, in equation (6), the vector (Rq) is the feature vector q itself, and the vector (Rg) is the feature vector g itself, and the dot product between the vector (Rq) and the candidate concept vector U can be calculated.

あるいは、第１手法と同様にして、式（５）に基づいて複数のコンセプトベクトルＵについての各類似度Ｓｃが算出され、算出された複数の類似度Ｓｃのうちの最大値に対応するコンセプトベクトルＵが第１順位のコンセプトベクトルＵとして求められてもよい。なお、第１順位のコンセプトを求めるにあたっては、式（６）と式（５）とは等価である（Ｒｑ・Ｕｒ＝ｑ・Ｕｒ＝ｑｓ＝Ｐｑ）。 Alternatively, as in the first method, the similarity Sc for each of a plurality of concept vectors U may be calculated based on equation (5), and the concept vector U corresponding to the maximum of the calculated similarities Sc may be determined as the first-ranked concept vector U. Note that when determining the first-ranked concept, equations (6) and (5) are equivalent (Rq·Ur = q·Ur = qs = Pq).

また、たとえば、第１順位のコンセプトベクトルＵがコンセプトベクトルＵ１１０に決定された後、第２順位のコンセプトベクトルＵが決定される際には、行列Ｂは、単一のコンセプトベクトルＵ１１０（縦ベクトル）になる。そして、行列Ｂ（当該縦ベクトル）に基づき行列Ｐ（階数＝１）が求められ（式（３）あるいは式（４）参照）、行列Ｐに基づき行列Ｒが求められる（Ｒ＝１－Ｐ）。そして、式（６）の指標値Ｑｎを最大化する次順位（第２順位）コンセプトベクトルＵ（たとえばコンセプトベクトルＵ４００）が求められる。 For example, after the first-ranked concept vector U is determined as concept vector U110, when the second-ranked concept vector U is determined, matrix B becomes a single concept vector U110 (column vector). Then, matrix P (rank = 1) is calculated based on matrix B (the column vector) (see equation (3) or (4)), and matrix R is calculated based on matrix P (R = 1 - P). Then, the next-ranked (second-ranked) concept vector U (for example, concept vector U400) that maximizes the index value Qn of equation (6) is calculated.

さらに、その後、第３順位のコンセプトベクトルＵが決定される際には、行列Ｂは、２つのコンセプトベクトルＵ１１０，Ｕ４００が横方向に並べられた行列である。そして、行列Ｂに基づき行列Ｐ（階数＝２）が求められ（式（３）あるいは式（４）参照）、行列Ｐに基づき行列Ｒが求められる（Ｒ＝１－Ｐ）。そして、式（６）の指標値Ｑｎを最大化する次順位（第３順位）コンセプトベクトルＵ（たとえばコンセプトベクトルＵ５１０）が求められる。 Furthermore, when the third-ranked concept vector U is subsequently determined, matrix B is a matrix in which two concept vectors U110 and U400 are arranged horizontally. Then, matrix P (rank = 2) is calculated based on matrix B (see equation (3) or (4)), and matrix R is calculated based on matrix P (R = 1 - P). Then, the next-ranked (third-ranked) concept vector U (for example, concept vector U510) that maximizes the index value Qn of equation (6) is calculated.

以降、同様の処理が繰り返され、上位数個（たとえば５個）のコンセプトベクトルＵが決定される。 The same process is then repeated to determine the top few (e.g., five) concept vectors U.
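The iterative search walked through above can be sketched as a greedy loop. This is a minimal illustration, not the patented implementation: it assumes unit-norm candidate concept vectors, builds P as B·pinv(B) (an assumed form of the projection matrix onto the span of the selected vectors), and takes R = I − P with I the identity matrix. The identifiers U110, U111 and U400 are borrowed from the text, but their numeric values, the vectors q and g, and the function names are hypothetical.

```python
import numpy as np


def complement_projector(selected, dim):
    """R = I - P, where P projects onto the span of the selected concept vectors."""
    if not selected:
        return np.eye(dim)            # nothing selected yet: Rq = q and Rg = g
    B = np.column_stack(selected)     # selected concept vectors side by side
    P = B @ np.linalg.pinv(B)
    return np.eye(dim) - P


def greedy_concepts(q, g, candidates, k):
    """Select k concept ids, each maximizing Qn = ((Rq)·Ur) * ((Rg)·Ur)."""
    chosen = []
    while len(chosen) < min(k, len(candidates)):
        R = complement_projector([candidates[c] for c in chosen], q.shape[0])
        rq, rg = R @ q, R @ g         # components not yet covered by chosen concepts
        scores = {cid: float(rq @ u) * float(rg @ u)
                  for cid, u in candidates.items() if cid not in chosen}
        chosen.append(max(scores, key=scores.get))
    return chosen


# Toy data: U111 is nearly parallel to U110, U400 is orthogonal to U110.
candidates = {
    "U110": np.array([1.0, 0.0, 0.0]),
    "U111": np.array([0.95, 0.0, np.sqrt(1 - 0.95**2)]),
    "U400": np.array([0.0, 1.0, 0.0]),
}
q = np.array([0.8, 0.6, 0.0])
g = np.array([0.8, 0.6, 0.0])

# Ranking by the per-concept product (q·U)(g·U) alone.
per_concept = sorted(candidates,
                     key=lambda c: -float(q @ candidates[c]) * float(g @ candidates[c]))
print(per_concept)                             # ['U110', 'U111', 'U400']
print(greedy_concepts(q, g, candidates, 2))    # ['U110', 'U400']
```

In this toy setting the per-concept ranking places the near-duplicate U111 second, whereas the greedy search skips it, because once U110 has been selected the remaining component of q and g has no projection onto U111.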

＜別の終了条件＞ <Another termination condition>
なお、ここでは、所定数の主要コンセプトが得られるまで上記探索処理が繰り返されている（換言すれば、所定数の主要コンセプトが得られることが終了条件として設定されている）が、これに限定されず、別の終了条件（終了判定条件）の下で上記探索処理が繰り返されてもよい。 Here, the search process is repeated until a predetermined number of main concepts are obtained (in other words, obtaining a predetermined number of main concepts is set as the termination condition). However, the process is not limited to this, and the search process may be repeated under a different termination condition (termination determination condition).

たとえば、式（７）の条件を満たすことが終了条件として設定されてもよい。詳細には、値ｎのインクリメント後に式（７）の終了条件が判定されればよい。 For example, the termination condition may be set to be that the condition in equation (7) is satisfied. In particular, the termination condition in equation (7) may be determined after the value n is incremented.

式（７）の左辺（（Ｐｑ）・（Ｐｇ））は、上述の類似度Ｓｃと同じである（式（２）参照）。行列Ｐは、被選択コンセプト（選択済みコンセプト）を示すｎ個の被選択コンセプトベクトルＵで張られる部分空間への射影行列である。すなわち、式（７）の左辺は、ベクトルｑ，ｇをそれぞれ当該部分空間へ射影した射影ベクトルの内積（すなわち、ｎ個の被選択コンセプトベクトルＵによる寄与度）を表す。 The left-hand side of equation (7) ((Pq)·(Pg)) is the same as the similarity Sc described above (see equation (2)). The matrix P is a projection matrix onto a subspace spanned by n selected concept vectors U representing selected concepts (already selected concepts). In other words, the left-hand side of equation (7) represents the inner product of the projection vectors obtained by projecting vectors q and g onto the subspace (i.e., the contribution of the n selected concept vectors U).

一方、式（７）の右辺は、変換前の両ベクトルｑ，ｇの内積に一定の割合（１－δ）を乗じた値である。ここで、値δは、割合を示す定数（０＜δ＜１）であり、値（１－δ）も、全体に対する割合を示している。 On the other hand, the right-hand side of equation (7) is the value obtained by multiplying the inner product of the two vectors q and g before transformation by a constant ratio (1-δ). Here, the value δ is a constant indicating the ratio (0<δ<1), and the value (1-δ) also indicates the ratio to the whole.

たとえば、値δが０．４の場合、値（１－δ）は０．６である。この場合、式（７）の終了条件は、変換後の両ベクトル（Ｐｑ），（Ｐｇ）の内積が、変換前の両ベクトルｑ，ｇの内積の６０％の値よりも大きくなることを意味する。換言すれば、式（７）は、値ｎの増大に伴って徐々に増大する値Ｓｃ（ｎ個のコンセプトベクトルＵに対応する射影行列Ｐを用いて算出される類似度Ｓｃ）が、全体の類似度Ｓｔに対する一定割合（１－δ）の値よりも大きくなった時点で探索処理が終了することを意味する。 For example, if the value δ is 0.4, the value (1 - δ) is 0.6. In this case, the termination condition of equation (7) means that the inner product of the transformed vectors (Pq) and (Pg) is greater than 60% of the inner product of the untransformed vectors q and g. In other words, equation (7) means that the search process ends when the value Sc (the similarity Sc calculated using the projection matrix P corresponding to n concept vectors U), which gradually increases as the value n increases, becomes greater than a certain percentage (1 - δ) of the overall similarity St.

たとえば、第５順位までのコンセプトベクトルＵが決定された後、第（ｎ＋１）順位（具体的には、第６順位）の新たなコンセプトベクトルＵｒを探索する際に、式（７）の終了条件が成立すると、探索処理の繰り返しが終了する。そして、第５順位までのコンセプトベクトルＵが主要コンセプトベクトルとして決定される。 For example, after the concept vectors U up to the fifth rank have been determined, if the termination condition of equation (7) is satisfied when searching for a new concept vector Ur of the (n+1)th rank (specifically, the sixth rank), the repetition of the search process ends. The concept vectors U up to the fifth rank are then determined as the main concept vectors.

このようにして、コンセプトベクトルＵの個数ｎを予め決定する代わりに、値δ（ひいては、ｎ個のコンセプトベクトルＵで考慮すべき寄与度の割合）を予め決定してもよい。 In this way, instead of predetermining the number n of concept vectors U, the value δ (and thus the proportion of contributions to be considered among the n concept vectors U) may be predetermined.
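The δ-based termination test can be sketched as follows. This is a minimal illustration under stated assumptions: the condition is read from the description as (Pq)·(Pg) > (1 − δ)(q·g), P is built as B·pinv(B) from the already-selected concept vectors, and all names and numeric values are hypothetical.

```python
import numpy as np


def explained_similarity(q, g, selected):
    """(Pq)·(Pg): the part of the similarity covered by the selected concept vectors."""
    B = np.column_stack(selected)
    P = B @ np.linalg.pinv(B)         # assumed form of the projection matrix P
    return float((P @ q) @ (P @ g))


def should_stop(q, g, selected, delta):
    """True once (Pq)·(Pg) exceeds the fraction (1 - delta) of the overall q·g."""
    return explained_similarity(q, g, selected) > (1.0 - delta) * float(q @ g)


q = np.array([0.8, 0.6, 0.0])
g = np.array([0.6, 0.8, 0.0])
u1 = np.array([1.0, 0.0, 0.0])        # hypothetical first-ranked concept vector
u2 = np.array([0.0, 1.0, 0.0])        # hypothetical second-ranked concept vector

print(should_stop(q, g, [u1], delta=0.4))        # False: 0.48 is below 60% of 0.96
print(should_stop(q, g, [u1, u2], delta=0.4))    # True: 0.96 exceeds 60% of 0.96
```

With this check placed after each increment of n, the number of main concepts is decided by δ rather than fixed in advance.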

＜コンセプト決定例＞ <Example of concept decision>
上述のような第２手法によれば、たとえば、最上位（第１順位）コンセプトＣ１、第２順位コンセプトＣ２、第３順位コンセプトＣ３、第４順位コンセプトＣ４、および第５順位コンセプトＣ５が、この順序で探索されて求められる。 According to the second technique described above, for example, the top (first) concept C1, the second concept C2, the third concept C3, the fourth concept C4, and the fifth concept C5 are searched for and found in this order.

より具体的には、コンセプトベクトルＵ１１０（図１８最上段参照）が最上位コンセプトＣ１として特定される（図１１および図１３等も参照）。この場合、２つの画像の類似性に関する最大の判断根拠（学習済みモデル４２０による判断根拠）は、コンセプトベクトルＵ１１０で表現される特徴であることが判る。換言すれば、コンセプトベクトルＵ１１０に対応するクラスタＧ１１０の特徴が、（類似している旨の判断に関する）最大の判断根拠であることが判る。当該クラスタＧ１１０は、「模様多めの白シャツ」を着用している人物の画像群で構成されている（図１１および図１３参照）ことから、当該コンセプトＣ１は、「模様多めの白シャツ」、とも表現できる。 More specifically, concept vector U110 (see the top row of Figure 18) is identified as the highest-level concept C1 (see also Figures 11 and 13, etc.). In this case, it can be seen that the greatest basis for determining the similarity between the two images (the basis for determination by the trained model 420) is the feature expressed by concept vector U110. In other words, it can be seen that the feature of cluster G110 corresponding to concept vector U110 is the greatest basis for determining (the similarity). Since cluster G110 is composed of a group of images of a person wearing a "white shirt with many patterns" (see Figures 11 and 13), concept C1 can also be expressed as "white shirt with many patterns."

また、コンセプトベクトルＵ４００（図２０最上段参照）が第２順位コンセプトＣ２として特定される。換言すれば、コンセプトベクトルＵ４００に対応するクラスタＧ４００の特徴が、第１順位コンセプトＣ１とは別観点の比較的大きな判断根拠であることが判る。当該クラスタＧ４００は、「白い短めのボトムス」を着用している人物の画像群で構成されている（図１１等参照）ことから、当該コンセプトＣ２は、「白い短めのボトムス」、とも表現できる。 Furthermore, concept vector U400 (see the top row of Figure 20) is identified as the second-ranked concept C2. In other words, it can be seen that the characteristics of cluster G400 corresponding to concept vector U400 provide a relatively large basis for judgment from a different perspective than the first-ranked concept C1. Since cluster G400 is composed of a group of images of people wearing "short white bottoms" (see Figure 11, etc.), concept C2 can also be expressed as "short white bottoms."

同様に、コンセプトベクトルＵ５１０（図２０中段参照）が第３順位コンセプトＣ３（「薄い青色のボトムス」）として特定される。また、コンセプトベクトルＵ２００（図１９最上段参照）が第４順位コンセプトＣ４（「薄いピンク色のシャツ」）として特定され、コンセプトベクトルＵ３１０（図１９中段参照）が第５順位コンセプトＣ５（「無地のワインレッド色のシャツ」）として特定される。 Similarly, concept vector U510 (see middle row of Figure 20) is identified as the third-ranked concept C3 ("light blue bottoms"). Furthermore, concept vector U200 (see top row of Figure 19) is identified as the fourth-ranked concept C4 ("light pink shirt"), and concept vector U310 (see middle row of Figure 19) is identified as the fifth-ranked concept C5 ("plain wine-red shirt").

＜ステップＳ３２：主要コンセプトの説明処理等＞ <Step S32: Explanation of main concepts, etc.>
つぎに、ステップＳ３２（図５）において、ステップＳ３１での解析結果等を表示する処理が実行される。換言すれば、２つの画像が互いに類似する旨の判断に関する判断根拠（説明情報）を提示する処理が実行される。 Next, in step S32 (FIG. 5), a process is executed to display the analysis results, etc., from step S31. In other words, a process is executed to present the basis (explanatory information) for determining that the two images are similar to each other.

図２５は、上述のような探索処理結果（解析処理結果）の一例（画面表示例）を示す図である。なお、図２５の表示画面６００およびその他の各種の画面６１０，６２０（後述）等は、たとえば表示部３５ｂに表示される。 Figure 25 is a diagram showing an example (screen display example) of the search processing results (analysis processing results) described above. Note that the display screen 600 in Figure 25 and various other screens 610, 620 (described below) are displayed, for example, on the display unit 35b.

図２５の表示画面６００においては、第２手法に基づく５つの主要コンセプトＣ１～Ｃ５が提示されている。 Display screen 600 in Figure 25 presents five main concepts C1 to C5 based on the second method.

具体的には、表示画面６００のグラフ表示領域６０９において、式（２）（より詳細には式（５））に基づき算出された類似度Ｓｃが、グラフ化されて示されている。上位数個（ここでは５個）のコンセプトについて、画像ペア相互間における類似性（ここではクエリ画像とギャラリー画像との類似性）に対する寄与度が算出され、当該寄与度が表示されている。なお、寄与度は、数値（０．１８等）で表示されてもよく、図２５に示されるようにグラフ化して（数値を棒グラフの長さへと変換した状態で）表示されてもよい。 Specifically, in the graph display area 609 of the display screen 600, the similarity Sc calculated based on equation (2) (more specifically, equation (5)) is displayed in a graph. For the top few (here, five) concepts, the contribution to the similarity between the image pairs (here, the similarity between the query image and the gallery image) is calculated and this contribution is displayed. Note that the contribution may be displayed as a numerical value (e.g., 0.18) or may be displayed as a graph (with the numerical value converted into the length of the bar graph) as shown in FIG. 25.

ここでは第２手法に基づき、第１順位から第５順位の５つの主要コンセプトＣ１～Ｃ５が求められている。また、これら５つの主要コンセプトＣ１～Ｃ５のそれぞれについて類似度（寄与度）Ｓｃが示されている。さらに、５つの主要コンセプトＣ１～Ｃ５以外のコンセプトによる寄与度（残りの寄与度）も（「その他」欄）にて示されている。当該残りの寄与度は、たとえば、５つのコンセプトＣ１～Ｃ５を被選択コンセプトとして算出した類似度Ｓｃ（式（２）参照）を、全体の寄与度Ｓｔ（式（１）参照）から差し引くことによって算出される。 Here, five main concepts C1 to C5, ranked first to fifth, are determined based on the second method. The similarity (contribution) Sc is also shown for each of these five main concepts C1 to C5. Furthermore, the contribution (remaining contribution) of concepts other than the five main concepts C1 to C5 is also shown (in the "Other" column). This remaining contribution is calculated, for example, by subtracting the similarity Sc (see formula (2)) calculated using the five concepts C1 to C5 as the selected concepts from the overall contribution St (see formula (1)).
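The values behind such a bar graph, including the "Other" entry, could be computed along the following lines. This is only a sketch under assumptions not confirmed by this section: the per-concept contribution is taken as (q·U)(g·U) for a unit-norm concept vector U, the joint similarity Sc of all selected concepts as (Pq)·(Pg) with P = B·pinv(B), and the overall similarity St as q·g. The function name and the numeric values are hypothetical.

```python
import numpy as np


def contribution_table(q, g, concepts):
    """Per-concept contributions plus the remaining 'Other' contribution.

    concepts: dict mapping a concept label to a unit-norm concept vector.
    """
    table = {cid: float(q @ u) * float(g @ u) for cid, u in concepts.items()}
    B = np.column_stack(list(concepts.values()))
    P = B @ np.linalg.pinv(B)
    joint = float((P @ q) @ (P @ g))   # similarity Sc covered by all selected concepts
    total = float(q @ g)               # overall similarity St
    table["Other"] = total - joint     # remaining contribution
    return table


q = np.array([0.7, 0.5, 0.5])
g = np.array([0.6, 0.6, 0.5])
concepts = {
    "C1": np.array([1.0, 0.0, 0.0]),
    "C2": np.array([0.0, 1.0, 0.0]),
}
for label, value in contribution_table(q, g, concepts).items():
    print(f"{label}: {value:.2f}")     # C1: 0.42, C2: 0.30, Other: 0.25
```

When the selected concept vectors are not mutually orthogonal, the per-concept values generally do not sum to the joint similarity, which is why the remaining contribution is derived from the joint value rather than from the individual bars.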

特に、当該５つのコンセプトＣ１～Ｃ５は、この順序で高い順位から低い順位へと（降順に）並ぶように、主要コンセプトとして抽出されている。この順序Ｃ１～Ｃ５は、第２手法に基づく順序であり、式（５）で算出された類似度Ｓｃに基づく順位（第１手法に基づく順位）とは異なる順序である。 In particular, the five concepts C1 to C5 are extracted as main concepts so that they are arranged in descending order from highest to lowest. This order C1 to C5 is based on the second method and is different from the order based on the similarity Sc calculated using equation (5) (the order based on the first method).

仮に第１手法が用いられる場合には、この５つのコンセプトの中では、式（５）で算出された類似度Ｓｃに基づき、コンセプトＣ１，Ｃ２，Ｃ４，Ｃ３，Ｃ５の順序（コンセプトごとの類似度の降順）で抽出される。これに対して、第２手法では、コンセプトＣ１，Ｃ２，Ｃ３，Ｃ４，Ｃ５の順序で抽出されている。すなわち、コンセプトＣ３に関する式（５）による類似度Ｓｃは、コンセプトＣ４に関する式（５）による類似度Ｓｃよりも小さいにもかかわらず、コンセプトＣ３はコンセプトＣ４よりも上位のコンセプトとして抽出されている。 If the first method were used, these five concepts would be extracted in the order of concepts C1, C2, C4, C3, and C5 (descending order of similarity for each concept) based on the similarity Sc calculated using equation (5). In contrast, with the second method, concepts are extracted in the order of C1, C2, C3, C4, and C5. In other words, even though the similarity Sc for concept C3 using equation (5) is smaller than the similarity Sc for concept C4 using equation (5), concept C3 is extracted as a higher-ranking concept than concept C4.

また、第１手法では、式（５）で算出された類似度Ｓｃに基づき、たとえば第１順位のコンセプトＣ１（クラスタＧ１１０に対応するコンセプトベクトルＵ１１０）の下位コンセプト（下位コンセプトベクトル）等も比較的上位のコンセプトとして抽出され得る。詳細には、（クラスタＧ１１０を代表する）コンセプトベクトルＵ１１０のみならず、その下位の（クラスタＧ１１１，Ｇ１１２（図１１参照）を代表する）コンセプトベクトルＵ１１１，Ｕ１１２もが、主要なコンセプトベクトル（コンセプト）として抽出され得る。コンセプトベクトルＵ１１１がコンセプトベクトルＵ１１０に類似する場合、コンセプトベクトルＵ１１１に関する類似度Ｓｃも、コンセプトベクトルＵ１１０と同様に高い値になる可能性があるからである。なお、コンセプトベクトルＵ１１１は、クラスタＧ１１０の下位クラスタＧ１１１を代表するコンセプトベクトルであり、コンセプトベクトルＵ１１２は、クラスタＧ１１０の下位クラスタＧ１１２を代表するコンセプトベクトルである。 Furthermore, in the first method, based on the similarity Sc calculated using equation (5), lower-level concepts (lower-level concept vectors) of the first-ranked concept C1 (concept vector U110 corresponding to cluster G110) can also be extracted as relatively higher-level concepts. Specifically, not only the concept vector U110 (representing cluster G110) but also its lower-level concept vectors U111 and U112 (representing clusters G111 and G112 (see FIG. 11)) can be extracted as major concept vectors (concepts). This is because if concept vector U111 is similar to concept vector U110, the similarity Sc for concept vector U111 is likely to be as high as that for concept vector U110. Note that concept vector U111 is the concept vector representing lower-level cluster G111 of cluster G110, and concept vector U112 is the concept vector representing lower-level cluster G112 of cluster G110.

一方、第２手法では、上述のように、或る時点で既に選択（考慮）されたコンセプトベクトル（たとえば、コンセプトベクトルＵ１１０）以外のコンセプトベクトルのうち、考慮される類似度成分を最も大きく増大させるようなコンセプトベクトルが探索される。その結果、コンセプトベクトルＵ１１０とは比較的大きく異なる（高い独立性を有する）他のコンセプトベクトルＵが、主要なコンセプトベクトル（主要コンセプト）として抽出され易くなる。逆に言えば、クラスタＧ１１０の下位クラスタＧ１１１，Ｇ１１２をそれぞれ代表するコンセプトベクトルＵ１１１，Ｕ１１２は、主要なコンセプトベクトル（主要コンセプト）としては抽出され難くなる。図２５の例では、コンセプトベクトルＵ１１１，Ｕ１１２は、上位５つの主要コンセプトベクトル（主要コンセプト）としては抽出されていない。 On the other hand, as described above, the second method searches for a concept vector that will most significantly increase the considered similarity component among concept vectors other than a concept vector already selected (considered) at a certain point in time (for example, concept vector U110). As a result, other concept vectors U that are relatively different (highly independent) from concept vector U110 are more likely to be extracted as major concept vectors (major concepts). Conversely, concept vectors U111 and U112, which represent lower-level clusters G111 and G112 of cluster G110, are less likely to be extracted as major concept vectors (major concepts). In the example of Figure 25, concept vectors U111 and U112 are not extracted as the top five major concept vectors (major concepts).

このように、第２手法によれば、高い独立性を有するコンセプトを主要コンセプトとして抽出することが可能である。換言すれば、コンセプト間の重複を少なくした主要コンセプトを抽出することが可能である。 In this way, the second method makes it possible to extract concepts with high independence as main concepts. In other words, it is possible to extract main concepts with minimal overlap between concepts.

また、図２５の表示画面６００は、グラフ表示領域６０９に加えて、ボタン６０１～６０５等を有している。各ボタン６０１～６０５には、対応するクラスタＧの識別子（「Ｇ１１０」等）が表示されている。画像処理装置３０およびユーザは、当該識別子（識別ＩＤ）によって、各コンセプトＣ１～Ｃ５等の各対応クラスタを一意に特定（識別）することが可能である。 The display screen 600 in FIG. 25 also has buttons 601 to 605, etc., in addition to the graph display area 609. Each of the buttons 601 to 605 displays the identifier of the corresponding cluster G (e.g., "G110"). Using this identifier (identification ID), the image processing device 30 and the user can uniquely identify the cluster corresponding to each of the concepts C1 to C5, etc.

また、各コンセプトの詳細情報は次のようにして表示等される。 In addition, detailed information about each concept will be displayed as follows:

表示画面６００のグラフ表示領域６０９内の各コンセプトＣ１～Ｃ５の文字部分、あるいは当該各文字部分の直下に設けられたボタン６０１～６０５がマウス操作等によって押下されると、対応するコンセプトに関する詳細情報画面が表示される。 When the text portion of each concept C1 to C5 in the graph display area 609 of the display screen 600, or the buttons 601 to 605 located directly below each text portion, is pressed using a mouse or other means, a detailed information screen for the corresponding concept is displayed.

たとえば、コンセプトＣ１の直下のボタン６０１が押下されると、詳細情報画面６１０（図２６参照）が表示される。また、コンセプトＣ２の直下のボタン６０２が押下されると、詳細情報画面６２０（図２７参照）が表示される。その他のボタンについても同様である。 For example, when button 601 directly below concept C1 is pressed, detailed information screen 610 (see Figure 26) is displayed. Likewise, when button 602 directly below concept C2 is pressed, detailed information screen 620 (see Figure 27) is displayed. The same applies to the other buttons.

図２６は、コンセプトＣ１に関する詳細情報の表示画面６１０を示す図である。また、図２７は、コンセプトＣ２に関する詳細情報の表示画面６２０を示す図である。他のコンセプトＣ３～Ｃ５等についても同様に、詳細情報の表示画面（詳細情報画面とも称する）が存在する。以下では、コンセプトＣ１に関する詳細情報の表示画面６１０を中心に説明する。 Figure 26 is a diagram showing a display screen 610 for detailed information about concept C1. Figure 27 is a diagram showing a display screen 620 for detailed information about concept C2. Similarly, display screens for detailed information (also referred to as detailed information screens) also exist for other concepts C3 to C5, etc. The following explanation will focus on the display screen 610 for detailed information about concept C1.

表示画面６１０は、領域６１１～６１４およびボタン６１５を有している。 Display screen 610 has areas 611 to 614 and button 615.

領域６１１（上側領域とも称する）は、当該コンセプトＣ１の対応クラスタＧ１１０を構成する複数の画像（コンセプト構成画像）の表示領域である。 Area 611 (also referred to as the upper area) is a display area for multiple images (concept constituent images) that make up the corresponding cluster G110 of the concept C1.

領域６１２（下側領域とも称する）は、領域６１１に表示された複数の画像のそれぞれについて、学習済みモデル４２０によって特徴的な領域であると判定された領域（発火領域）を示すヒートマップ画像を示す図である。領域６１２においては、上側の領域６１１の各画像の直下に、当該各画像に対応するヒートマップ画像がそれぞれ表示されている。下側領域６１２のヒートマップによって各人物のシャツ部分に学習済みモデル４２０が着目していることが知得される。 Area 612 (also referred to as the lower area) shows, for each of the multiple images displayed in area 611, a heat map image indicating the regions determined by the trained model 420 to be characteristic regions (activated, or "firing", regions). In area 612, the heat map image corresponding to each image is displayed directly below that image in the upper area 611. From the heat maps in lower area 612, it can be seen that the trained model 420 is focusing on the shirt portion of each person.

領域６１３は、「コンセプト可視化画像」（後述）の表示領域である。コンセプト可視化画像は、クラスタＧ１１０のコンセプト（コンセプトベクトル）を可視化した画像である。コンセプト可視化画像は、クラスタＧ１１０の代表ベクトル（コンセプトベクトル）に対応する仮想的な入力画像であり、Feature Visualization法（後述）等を用いて生成される。コンセプト可視化画像は、クラスタＧ１１０の抽象的概念（抽象的コンセプト）を表現した画像である、とも表現できる。 Area 613 is a display area for a "concept visualization image" (described below). The concept visualization image is an image that visualizes the concept (concept vector) of cluster G110. It is a virtual input image that corresponds to the representative vector (concept vector) of cluster G110, and is generated using the Feature Visualization method (described below) or the like. The concept visualization image can also be described as an image that expresses the abstract concept of cluster G110.

領域６１４は、クラスタＧ１１０の属性情報（コンセプト名称等）の表示領域である。後述するように、画像処理装置３０は、ユーザからの操作入力（クラスタの属性情報の入力（文字入力等））を受け付けると、当該属性情報を記憶部３２に格納するとともに、当該属性情報（コンセプト名称等）を領域６１４に表示する。なお、図２６では、当該操作入力後の表示状態（文字列「コンセプトＣ１：模様多めの白シャツ」）が示されている。当該操作入力前（文字列「模様多めの白シャツ」の入力前）においては、たとえば文字列「コンセプトＣ１」のみが領域６１４に表示される。 Area 614 is a display area for attribute information (concept name, etc.) of cluster G110. As will be described later, when the image processing device 30 receives operational input from the user (input of cluster attribute information (text input, etc.)), it stores the attribute information in the storage unit 32 and displays the attribute information (concept name, etc.) in area 614. Note that Figure 26 shows the display state after the operational input (character string "Concept C1: heavily patterned white shirt"). Before the operational input (before the character string "heavily patterned white shirt" was input), for example, only the character string "Concept C1" was displayed in area 614.

ボタン６１５は、関連クラスタの情報を表示する旨の指示を受け付けるボタンである。ボタン６１５が押下されると、図１１上段のようなデンドログラム（特にコンセプトＣ１付近）、および／または図１１下段（あるいは図１３）のようなベン図が表示される。これにより、コンセプトＣ１に対応するクラスタＧ１１０に対する関連クラスタ（詳細には、同位クラスタＧ１２０，Ｇ２００、上位クラスタＧ１００、下位クラスタＧ１１１，Ｇ１１２等）の存在および包含関係等が表示される。さらに、デンドログラムあるいはベン図における各関連クラスタの対応位置をマウスでクリックすることによって、当該関連クラスタ（たとえば同位クラスタＧ１２０）に関する詳細情報表示画面が表示される。なお、これに限定されず、当該マウスクリックに応じて、図１３のような表示画面が表示されても良い。詳細には、当該表示画面において、クラスタＧ１１０の関連クラスタ（Ｇ１１０，Ｇ１２０，Ｇ１００，Ｇ２００等）をそれぞれ構成する各画像群が、クラスタＧ１１０付近のベン図（関連クラスタの上下関係（包含関係）等を示す図）とともに表示されてもよい。 Button 615 accepts an instruction to display information about related clusters. Pressing button 615 displays a dendrogram like the one shown in the top row of Figure 11 (particularly around concept C1) and/or a Venn diagram like the one shown in the bottom row of Figure 11 (or Figure 13). This displays the existence and inclusion relationships of related clusters (specifically, same-level clusters G120 and G200, higher-level cluster G100, lower-level clusters G111 and G112, etc.) with respect to cluster G110 corresponding to concept C1. Furthermore, by clicking the corresponding position of each related cluster in the dendrogram or Venn diagram with the mouse, a detailed information display screen for that related cluster (e.g., same-level cluster G120) is displayed. Note that this is not a limitation; a display screen like the one shown in Figure 13 may also be displayed in response to the mouse click. In particular, on the display screen, each group of images constituting clusters related to cluster G110 (G110, G120, G100, G200, etc.) may be displayed together with a Venn diagram (a diagram showing the hierarchical relationships (inclusion relationships) of related clusters) near cluster G110.

ユーザは、図２６の表示画面６１０から、コンセプトＣ１の詳細情報を知得することができる。 The user can obtain detailed information about concept C1 from display screen 610 in Figure 26.

具体的には、下側領域６１２のヒートマップによって各人物のシャツ部分に学習済みモデル４２０が着目していることが知得される。 Specifically, the heat map in the lower region 612 indicates that the trained model 420 is focusing on the shirt portion of each person.

また、上側領域６１１の複数の画像によって、コンセプトＣ１（詳細には、その対応クラスタ）がどのような画像で構成されているのかを視覚的に知得することが可能である。 Furthermore, the multiple images in the upper area 611 make it possible to visually understand what images concept C1 (more specifically, its corresponding cluster) is made up of.

さらに、領域６１３のコンセプト可視化画像によって、コンセプトＣ１を抽象的に可視化した画像を知得することが可能である。 Furthermore, the concept visualization image in area 613 makes it possible to obtain an image that abstractly visualizes concept C1.

また、領域６１４の属性情報（詳細には、コンセプト名称およびユーザ備考情報等）によって、言語的表現によってコンセプトの内容を知得することが可能である。 In addition, the attribute information in area 614 (specifically, the concept name and user notes, etc.) makes it possible to obtain the content of the concept through linguistic expression.

ここにおいて、クラスタＧ１１０のコンセプト名称は、次のようにしてユーザによって把握されて入力等されればよい。なお、他のクラスタのコンセプト名称についても同様である。 Here, the concept name of cluster G110 can be understood and input by the user as follows. The same applies to the concept names of other clusters.

具体的には、まず、ユーザは、クラスタＧ１１０に関する詳細情報画面６１０（図２６）を視認する。詳細情報画面６１０（詳細には、その上側領域６１１）には、クラスタＧ１１０を構成する画像群（図１３も参照）が表示される。 Specifically, the user first views the detailed information screen 610 (Figure 26) regarding cluster G110. The detailed information screen 610 (more specifically, its upper area 611) displays the group of images that make up cluster G110 (see also Figure 13).

その後、ユーザは、ボタン６１５を押下して、図１１下段のようなベン図を表示させる。そして、ユーザは、当該ベン図を参照しつつ、当該ベン図内のクラスタＧ１２０の位置を押下すること等によって、クラスタＧ１１０の同位クラスタＧ１２０に関する詳細情報画面（図２６と同様の詳細情報画面）を表示させる。クラスタＧ１２０に関する詳細情報画面には、クラスタＧ１２０を構成する画像群（図１３参照）が表示される。 The user then presses button 615 to display a Venn diagram like the one shown in the lower part of Figure 11. Then, while referring to the Venn diagram, the user can display a detailed information screen (similar to the detailed information screen in Figure 26) about cluster G120, which is the same level as cluster G110, by, for example, pressing the position of cluster G120 within the Venn diagram. The detailed information screen about cluster G120 displays the group of images that make up cluster G120 (see Figure 13).

また、必要に応じて、同様の操作によって、クラスタＧ１１０の上位クラスタＧ１００に関する詳細情報画面をも表示させる。 If necessary, a detailed information screen regarding cluster G100, which is higher than cluster G110, can also be displayed by performing the same operation.

ユーザは、これらの詳細情報画面に含まれる画像群を相互に比較検討することによって、各クラスタの特徴を把握することが可能である。 Users can understand the characteristics of each cluster by comparing the images contained in these detailed information screens.

たとえば、クラスタＧ１１０を同位クラスタＧ１２０と比較すること等によって、両クラスタＧ１１０，Ｇ１２０の特徴が把握される。具体的には、クラスタＧ１２０は、「模様少なめの白シャツ」を着用した人物の画像で構成されており、クラスタＧ１１０は、「模様多めの白シャツ」を着用した人物の画像で構成されていることが判る。 For example, by comparing cluster G110 with peer cluster G120, the characteristics of both clusters G110 and G120 can be understood. Specifically, it can be seen that cluster G120 is composed of images of people wearing "white shirts with little pattern," while cluster G110 is composed of images of people wearing "white shirts with a lot of pattern."

また、上位クラスタＧ１００は、両クラスタＧ１１０，Ｇ１２０を包含するクラスタであることから、「模様有りの白シャツ」であることが判る。 Furthermore, since the upper cluster G100 is a cluster that includes both clusters G110 and G120, it can be determined that it is a "white shirt with a pattern."

さらに、上位クラスタＧ１００とその同位クラスタＧ２００とを比較すること等によって、両クラスタＧ１００，Ｇ２００の特徴が把握される。具体的には、クラスタＧ１００は、「模様有り白シャツ」を着用した人物の画像で構成されており、クラスタＧ２００は、「薄いピンク色のシャツ」を着用した人物の画像で構成されていることが判る。換言すれば、クラスタＧ１００は「薄いピンク色以外のシャツ」であることが判る。 Furthermore, by comparing the higher-level cluster G100 with its peer cluster G200, the characteristics of both clusters G100 and G200 can be understood. Specifically, it can be seen that cluster G100 is composed of images of people wearing "patterned white shirts," while cluster G200 is composed of images of people wearing "light pink shirts." In other words, it can be seen that cluster G100 is "shirts other than light pink."

これらの検討によって、ユーザは、クラスタＧ１１０のコンセプト名称として「模様多めの白シャツ」を決定することができる。そして、ユーザは、詳細情報画面６１０の領域６１４に「模様多めの白シャツ」の文字列を入力する。これに応じて、画像処理装置３０は、このような入力操作を受け付け、クラスタＧ１１０のコンセプト名称として「模様多めの白シャツ」を記憶部３２に登録する。また、その後に詳細情報画面６１０が表示される際には、領域６１４に当該コンセプト名称「模様多めの白シャツ」が表示される。 Through these considerations, the user can decide on "heavily patterned white shirt" as the concept name for cluster G110. The user then inputs the character string "heavily patterned white shirt" into area 614 on the detailed information screen 610. In response, the image processing device 30 accepts this input operation and registers "heavily patterned white shirt" in the memory unit 32 as the concept name for cluster G110. Furthermore, when the detailed information screen 610 is subsequently displayed, the concept name "heavily patterned white shirt" is displayed in area 614.

なお、これに限定されず、ユーザは、図１３のような表示画面を視認することによって、これらの情報を纏めて取得して、クラスタＧ１１０のコンセプト名称を把握等してもよい。 However, the procedure is not limited to this; the user may obtain all of this information together by viewing a display screen such as that shown in Figure 13, and thereby identify the concept name of cluster G110.

また、このようなコンセプト名称の把握および入力等は、ユーザの所望のクラスタおよびその関連クラスタのみ（すなわち比較的少数のクラスタのみ）について、この時点（ステップＳ３２）等に実行されればよい。ただし、これに限定されず、ステップＳ２３の直後等において、全てのクラスタについて当該処理が実行されてもよい。 Furthermore, such concept name identification and input may be performed at this time (step S32) for only the user's desired cluster and its related clusters (i.e., only a relatively small number of clusters). However, this is not limited to this, and the process may be performed for all clusters, for example, immediately after step S23.

＜コンセプト可視化画像＞ <Concept visualization image>
ここで、Feature Visualization法を用いたコンセプト可視化画像の生成処理について説明する。 Here, the process of generating a concept visualization image using the Feature Visualization method will be described.

図２８は、Feature Visualization法を用いたコンセプト可視化画像の生成処理の概略を示す図である。Feature Visualization法は、ニューラルネットワークの或る中間層における特定の発火（特定の中間出力）がどのような入力に応じて発生するのかを調べる手法である。ここでは、中間層からの中間出力に代えて、出力層からの最終出力（すなわち特徴ベクトルＦ）が採用される。すなわち、学習済みモデル４２０からの出力ベクトル（特徴ベクトルＦ）がどのような入力画像に応じて発生するのか、が調べられる。 Figure 28 shows an overview of the process for generating a concept visualization image using the Feature Visualization method. The Feature Visualization method is a technique for investigating what inputs cause a specific firing (a specific intermediate output) in a certain intermediate layer of a neural network. Here, the final output from the output layer (i.e., feature vector F) is used instead of the intermediate output from the intermediate layer. In other words, it is investigated what input images cause the output vector (feature vector F) from the trained model 420 to occur.

Specifically, as shown in FIG. 28, the input image that minimizes the distance d(F, U) between the output vector of the trained model 420 (that is, the feature vector F) and a target vector (here, a given concept vector U) is obtained as the concept visualization image. The feature vector F is a function F(I) of the image vector I of the input image. Here, the input image is represented by the image vector I, a vector obtained by rearranging the two-dimensionally arranged pixel values of the input image into one dimension. The distance d(F, U) and the like are also abbreviated simply as the distance d and the like.

More specifically, the input image vector I (a concept visualization image reflecting the features of the concept vector U) is obtained by repeating a process of finding, using the internal parameters and the like of the trained model 420 (acquired through machine learning), the increment δI that minimizes the increment δd of the distance d.

The increment δd of the distance d is expressed by equation (8). Here, the vector H is a constant vector (a vector whose components are each constants) calculated from the weights and activation functions of the intermediate layers of the neural network. Equation (8) is derived, for example, by summing over all components k the relationship (expressed by the gradient information and the like of the trained model 420) between the increment δIk of the value Ik (a scalar), which is the k-th component of the image vector I, and the increment δd of the distance d(F(Ik), U). This distance d(F(Ik), U) is the distance between the concept vector U and the feature vector F(Ik) corresponding to the image having that k-th component Ik.

As shown in equation (8), the increment δd is expressed as the inner product of the vector H and the vector δI (the increment vector of the vector I).
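Equation (8) is not reproduced in this text. Based on the description above (a sum over all components k that yields an inner product of H and δI), it presumably takes a form such as the following. Identifying the component H_k with the partial derivative of d with respect to I_k is an assumption made here for illustration.

```latex
\begin{equation}
\delta d \;=\; \sum_{k} H_k \,\delta I_k \;=\; H \cdot \delta I ,
\qquad
H_k \;=\; \frac{\partial\, d\bigl(F(I),\,U\bigr)}{\partial I_k}
\tag{8}
\end{equation}
```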

The process of minimizing the distance d is realized by repeatedly making the increment δd as small as possible, that is, a negative value of the largest possible magnitude (also referred to as the maximum negative value). More precisely, the process is realized by repeatedly finding, among the vectors δI having the same norm (magnitude), the δI that minimizes the increment δd. This is equivalent to repeatedly finding the direction of δI that most reduces the distance d (that minimizes the increment δd).

The direction of δI that minimizes the increment δd is the direction that minimizes its inner product with the vector H (that is, makes the inner product a negative value of maximum magnitude). Therefore, the δI that minimizes the increment δd is obtained as the vector that, among the vectors δI having a predetermined norm (magnitude), points in the direction opposite to the constant vector H (the δI for which cos θ = -1). Here, the angle θ is the angle between the two vectors H and δI.

The operation of adding such a vector δI to the vector I (see equation (9)) is then repeated many times (for example, thousands of times). As the initial value (initial vector) of the vector I, a vector corresponding to a random noise image, or to a suitable input image (such as an input image belonging to the cluster corresponding to the concept vector U), may be used. ε is a predetermined constant.
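Equation (9) is not reproduced in this text. The argument above can be written out as follows. The normalization by the norm of H and the role of ε (taken here as the fixed norm of δI) are assumptions, since the text states only that ε is a predetermined constant.

```latex
\begin{align}
\delta d &= H \cdot \delta I
          = \lVert H \rVert \,\lVert \delta I \rVert \cos\theta
          \;\ge\; -\lVert H \rVert \,\lVert \delta I \rVert
          \quad (\text{equality at } \cos\theta = -1), \notag \\
\delta I &= -\,\varepsilon\,\frac{H}{\lVert H \rVert},
\qquad
I \;\leftarrow\; I + \delta I
\tag{9}
\end{align}
```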

Through this processing, the input image vector I that minimizes the distance d between the feature vector F and the target vector (concept vector U) is obtained, that is, the input image obtained by rearranging the input image vector I into a two-dimensional array.
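The iteration described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the trained model 420 is replaced by a hypothetical toy model F(I) = tanh(W I), the distance d is taken to be the squared Euclidean distance, and the step is δI = -ε H / |H|. The names `feature`, `distance_and_gradient`, and `visualize_concept` are illustrative only.

```python
import numpy as np

def feature(W, I):
    """Toy stand-in for the trained model: F(I) = tanh(W I)."""
    return np.tanh(W @ I)

def distance_and_gradient(W, I, U):
    """Return d = |F(I) - U|^2 and H = dd/dI for the toy model."""
    F = feature(W, I)
    diff = F - U
    d = float(diff @ diff)
    # Chain rule: dd/dI = W^T [2 (F - U) * (1 - F^2)]
    H = W.T @ (2.0 * diff * (1.0 - F ** 2))
    return d, H

def visualize_concept(W, U, I0, eps=0.01, steps=2000):
    """Repeat I <- I + dI, where dI = -eps * H / |H| points opposite to H."""
    I = I0.copy()
    history = []
    for _ in range(steps):
        d, H = distance_and_gradient(W, I, U)
        history.append(d)
        norm = np.linalg.norm(H)
        if norm < 1e-12:  # gradient vanished; no direction reduces d further
            break
        I = I + (-eps * H / norm)
    return I, history

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 16)) / 4.0     # hypothetical model weights
U = feature(W, rng.normal(size=16))    # a reachable target "concept vector"
I0 = rng.normal(size=16)               # random-noise initial image vector
I_opt, history = visualize_concept(W, U, I0)
```

In this sketch `I_opt` plays the role of the image vector I; reshaping it to two dimensions would give the concept visualization image. With a real network, H would be obtained by backpropagation rather than from a closed-form expression.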

For example, the "concept visualization image" of concept C2 (see area 623 in FIG. 27) is an image generated using the Feature Visualization method based on the concept vector U400 (see the top row of FIG. 20) of cluster G400 ("short white bottoms"). This concept visualization image has, near its center (near the area indicated by the thick dashed circle), a portion resembling "short white bottoms (white shorts)". Such an image shows that the concept vector U400 reflects the feature "short white bottoms".

Likewise, the "concept visualization image" of concept C1 (see area 613 in FIG. 26) is an image generated using the Feature Visualization method based on the concept vector U110 (see the top row of FIG. 18) of cluster G110 ("heavily patterned white shirt"). This concept visualization image has, slightly above its center, a portion resembling a "patterned shirt". Such an image shows that the concept vector U110 reflects the feature "patterned (heavily patterned) shirt".

Note that it is not necessarily easy for the user to fully grasp the content of each concept from its "concept visualization image" alone. The concept visualization image is preferably used as a supplementary aid.

<1-11. Effects of the embodiment>
According to the above embodiment, a plurality of hierarchical clusters are generated by performing hierarchical clustering on a plurality of feature vectors F (step S22 (FIG. 4)). Then, a vector (specifically, a concept vector U) corresponding to a specific cluster among the plurality of clusters is extracted as the concept of that specific cluster (step S23).

Therefore, the concept corresponding to a specific hierarchical cluster can be managed and understood by means of its representative vector (concept vector U). Furthermore, by performing analysis processing and the like using the concept vector U, it becomes possible, for example, to explain the basis of image similarity on a concept basis.

Furthermore, according to the above embodiment, two or more input images corresponding to a specific cluster (such as G110) among the plurality of clusters generated by the hierarchical clustering are determined and displayed as an image group representing the concept of the specific cluster (the specific concept) (see FIGS. 13 and 26, etc.). In other words, the two or more input images constituting the specific cluster are displayed as an image group corresponding to the concept of the specific cluster. For example, the plurality of input images corresponding to cluster G110 (see area 611 in FIG. 26) are displayed as the image group corresponding to the concept of cluster G110. Because this image group is a group of mutually similar images (a similar image group) that expresses the concept of the specific cluster, it is also referred to as a similar image group for concept expression. This makes it possible to present visually to the user what kind of concept the specific cluster has.

Furthermore, according to the above embodiment, the similar image group of a specific cluster is displayed together with the respective heat maps (see area 612 (FIG. 26), etc.). Therefore, the firing regions (feature regions) within the images relating to the specific cluster can be identified from the heat maps, and the features of the specific cluster can then be grasped conceptually.

In addition, the similar image group of a specific cluster is displayed together with the concept visualization image (see area 613 (FIG. 26), etc.). Therefore, the features of the specific cluster can be grasped conceptually (in particular, logically) from the similar image group while also being grasped visually from the concept visualization image.

Furthermore, in the above embodiment, a plurality of concepts (specifically, a plurality of concept vectors U) corresponding to a plurality of hierarchical clusters are extracted. Therefore, the hierarchical relationships (parent-child relationships), inclusion relationships, and the like among the plurality of concepts can be grasped.

Furthermore, according to the above embodiment, when the similarity between two images (a first image and a second image) is determined in step S31 (FIG. 5), the contribution to the similarity between the first image and the second image is calculated for each of a plurality of candidate concepts. Therefore, the contribution (importance) of each concept to the similarity of the image pair can be grasped.

In the above embodiment, the contribution is calculated for each of a large number of concepts (candidate concepts), and the top few main concepts are determined by, for example, sorting on the basis of the contributions. However, the present invention is not limited to this. For example, the contribution to the similarity between the two images may be calculated only for at least one concept designated by the user on the basis of the user's interests or the like (a designated concept). Specifically, the contribution to the similarity between the two images may be calculated only for the concept vector U corresponding to the designated concept (the designated concept vector). This allows the user to learn to what extent an arbitrarily designated concept contributes to (influences) the similarity between the two images.

Furthermore, in the above embodiment, as shown in FIG. 25, the basis for the similarity between the two images (the first image and the second image) is displayed as the contributions (numerical values) of a predetermined number (one to several) of top-ranked concepts. Therefore, an objective evaluation criterion for the basis of the judgment can be provided.

In particular, the contribution Sc of each concept (see equation (5)) is calculated as the inner product of the two projection vectors obtained by projecting the two feature vectors F corresponding to the two images onto the subspace (line) spanned by that concept's own concept vector U. Therefore, the contribution of each concept to the similarity can be presented objectively.
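The contribution described above can be illustrated with a short NumPy sketch. Equation (5) is not reproduced in this section, so the form used here, Sc = (Pq)·(Pg) with P the projection onto the line spanned by U, is inferred from the wording of the paragraph and should be read as an assumption. The vectors and the function names `line_projection` and `contribution` are hypothetical.

```python
import numpy as np

def line_projection(U):
    """Projection matrix onto the line spanned by concept vector U."""
    U = np.asarray(U, dtype=float)
    return np.outer(U, U) / (U @ U)

def contribution(q, g, U):
    """Sc: inner product of the projections of q and g onto span{U}."""
    P = line_projection(U)
    return float((P @ q) @ (P @ g))

q = np.array([0.6, 0.8, 0.0])    # feature vector of the first image
g = np.array([0.8, 0.6, 0.0])    # feature vector of the second image
U1 = np.array([1.0, 1.0, 0.0])   # hypothetical concept vectors
U2 = np.array([0.0, 0.0, 1.0])
sc1 = contribution(q, g, U1)     # both images share a large component along U1
sc2 = contribution(q, g, U2)     # neither image has a component along U2
```

Here `sc1` exceeds `sc2`, which corresponds to the concept of U1 being presented as the stronger basis for the similarity.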

FIG. 25 also shows that, among the plurality of hierarchical concepts, the contribution Sc of concept C1 is higher than the contribution Sc of concept C2. This indicates that, in the similarity determination by the image processing device 30, the feature of concept C1 ("heavily patterned white shirt") contributes more than the feature of concept C2 ("short white bottoms").

In other words, it can be understood that the "similarity judgment" (the judgment that the two images are similar to each other) places more weight on the person wearing a "shirt" with specific features than on the person wearing "bottoms" with specific features.

In FIG. 25, concept C1 (cluster G110) is also determined to be the concept with the highest contribution Sc. This indicates that the contribution Sc of cluster G110 is greater than the contribution Sc of its sibling cluster G120 (see FIG. 11) (see the left-pointing arrow in FIG. 12). That is, the feature of cluster G110 ("heavily patterned white shirt") contributes more than the feature of cluster G120 ("lightly patterned white shirt"). Put differently, similarity is judged on the basis of the "heavily patterned (white shirt)" feature rather than the "lightly patterned (white shirt)" feature. The "similarity judgment" can thus be understood as having singled out the abundance of pattern on the shirt as a distinguishing feature.

The fact that concept C1 (cluster G110) has the highest contribution also indicates that the contribution Sc of cluster G110 is greater than the contribution Sc of its parent cluster G100 (see FIG. 11) (see the downward arrow in FIG. 12). That is, the feature of cluster G110 ("heavily patterned white shirt"), which is a more detailed (lower-level) feature than the feature of cluster G100 ("patterned white shirt"), is shown to contribute strongly. In other words, similarity is judged not merely on the shirt being a "patterned white shirt" but also on it being "heavily patterned".

The fact that concept C1 (cluster G110) has the highest contribution further indicates that the contribution Sc of cluster G110 is greater than the contribution Sc of each of its child clusters G111 and G112 (see FIG. 11) (see the upward arrows in FIG. 12). That is, the feature of cluster G110 ("heavily patterned white shirt") is shown to contribute more than the feature of cluster G111 ("heavily patterned white shirt whose pattern is straight-lined") (see also FIG. 13). It is likewise shown to contribute more than the feature of cluster G112 ("heavily patterned white shirt whose pattern is curved"). In other words, similarity is judged without taking the type of pattern (straight-lined or curved) into account.

<1-12. Modification of the first embodiment>
In the above embodiment, the vector (concept vector U) corresponding to a specific cluster is extracted as the concept of that cluster, but the present invention is not limited to this.

For example, in the first embodiment, a "subspace" corresponding to a specific cluster may be extracted as the concept of the specific cluster (a specific subspace representing the concept). In other words, a "subspace" corresponding to a specific cluster may be extracted as the concept representation of the specific cluster (a concept representation by means of a subspace).

Specifically, the subspace spanned by the concept vector U of the specific cluster itself (more precisely, the line obtained after projection by the projection matrix P corresponding to the concept vector U, that is, the line containing the concept vector U) may be extracted as the concept of the specific cluster. For example, the subspace spanned by the concept vector U100 (see the bottom row of FIG. 18) of cluster G100 itself (see the lower part of FIG. 11), that is, the line containing the concept vector U100, may be extracted as the concept of cluster G100.

Alternatively, the subspace spanned by a predetermined number of concept vectors U respectively corresponding to a predetermined number (for example, two) of child clusters contained in a specific cluster may be extracted as the concept (concept representation) of the specific cluster. For example, the subspace (plane) spanned by the two concept vectors U110 and U120 (see FIG. 18) respectively corresponding to the two child clusters G110 and G120 of cluster G100 (see the lower part of FIG. 11) may be extracted as the concept of cluster G100.
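A plane spanned by two concept vectors can be represented by a projection matrix, as in the following sketch. Equation (3) is not reproduced in this section, so the standard form P = B (BᵀB)⁻¹ Bᵀ, with the concept vectors as the columns of B, is assumed here. The numerical values of `U110` and `U120` are hypothetical and four-dimensional only for readability.

```python
import numpy as np

def subspace_projection(vectors):
    """Projection matrix onto the span of linearly independent vectors."""
    B = np.column_stack(vectors).astype(float)   # basis vectors as columns
    return B @ np.linalg.inv(B.T @ B) @ B.T       # P = B (B^T B)^-1 B^T

U110 = np.array([1.0, 0.0, 0.0, 0.0])   # hypothetical values
U120 = np.array([1.0, 1.0, 0.0, 0.0])
P = subspace_projection([U110, U120])

# Any feature vector can now be split into the part lying in the plane
# (explained by the two child concepts) and the remainder.
F = np.array([0.5, 0.5, 0.5, 0.5])
in_plane = P @ F
remainder = F - in_plane
```

The basis vectors need not be orthogonal; the inverse of BᵀB accounts for their overlap, and the rank of `P` equals the number of independent concept vectors.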

In particular, in the first embodiment, a specific concept vector U is extracted in the similarity determination for two images as the basis concept for the judgment that the two images are similar to each other (the "similarity judgment"), but the present invention is not limited to this. For example, the subspace (line) spanned by the specific concept vector U or the like may be extracted as the basis concept for the similarity judgment.

The same applies to the second embodiment described next. In particular, in the second embodiment, a specific concept vector U is extracted (as described below) as the basis concept for the judgment that two images are not similar to each other (the "dissimilarity judgment"), but the present invention is not limited to this. For example, the subspace (line) spanned by the specific concept vector U or the like may be extracted as the basis concept for the "dissimilarity judgment".

<2. Second embodiment>
In the first embodiment, processing for explaining the basis of a judgment that two images are similar to each other (a "similarity judgment") (FIG. 5) was described as an example of processing for generating explanatory information about an inference result. However, the present invention is not limited to this. For example, processing for explaining the basis of a judgment that two images are not similar to each other (a "dissimilarity judgment") (see FIG. 6) may be performed as the processing for generating explanatory information about an inference result. In other words, the basis explained for "similarity" may be the basis of a dissimilarity judgment rather than of a similarity judgment. The second embodiment describes such an aspect. The following description focuses on the differences from the first embodiment. In the second embodiment, step S40 (FIG. 6) is executed instead of step S30 (FIG. 5).

FIG. 29 is a conceptual diagram showing the process of determining the basis of a "dissimilarity judgment" (step S41). For convenience of illustration, the entire space is simplified to a three-dimensional space in FIG. 29. The concept vector U points in the same direction as the z axis, and the line extending in the z direction represents the subspace spanned by the concept vector U. The xy plane (the plane z = 0) represents the orthogonal complement of the subspace spanned by the concept vector U. Here, the two feature vectors q and g are feature vectors F corresponding to two images that are not similar to each other. FIG. 29 shows that the directions of the two feature vectors F differ greatly, that is, that the two images are not similar to each other.

As shown in FIG. 29, consider the orthogonal complement (here, the plane perpendicular to the concept vector U) of the subspace (line) represented by the specific concept vector U corresponding to a specific concept (specific cluster).

The projection vector (RF) (specifically, Rq or Rg) obtained by projecting a feature vector F onto this orthogonal complement consists of the components of the feature vector F other than the component explained by the concept vector U, that is, the components not yet explained by the concept vector U (the residual components). If two such projection vectors (RF) are close to each other, the not-yet-explained components (residual components) are similar, and consequently the residual components have little influence on the similarity judgment (dissimilarity judgment) for the two feature vectors F. Conversely, this means that the component already explained by the concept vector U has a large influence on the dissimilarity judgment for the two feature vectors F.

Using this property, in the second embodiment, when the distance between the projection vectors Rq and Rg obtained by projecting the two feature vectors q and g onto the orthogonal complement is small (in particular, when they are very close), the specific concept (the corresponding specific concept vector U) is extracted as the "basis concept for the dissimilarity judgment". More precisely, a specific concept vector U for which the distance between the projection vectors Rq and Rg onto the orthogonal complement (a plane or the like) of its subspace (the subspace (line) spanned by that specific concept vector U) is relatively small (smaller than for other concept vectors U) is extracted as the "basis concept for the dissimilarity judgment". This is described more specifically below.

When the projection matrix onto the subspace represented by the specific concept vector U corresponding to a specific concept (specific cluster) is denoted by the matrix P (see equation (3), etc.), the projection matrix R onto the orthogonal complement of that subspace is given by (1-P), where 1 denotes the identity matrix. Therefore, the projection vector obtained by projecting the feature vector q onto the orthogonal complement is the vector ((1-P)q), and the projection vector obtained by projecting the feature vector g onto the orthogonal complement is the vector ((1-P)g) (see FIG. 29).

The distance between these projection vectors is |((1-P)q)-((1-P)g)|, and the square of this distance is defined as the evaluation value Sd1 (see equation (10)).
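The evaluation value Sd1 as defined in the text can be computed directly, as in the sketch below. The example vectors mirror the situation of FIG. 29 (U along the z axis, q and g differing mainly in z) but are hypothetical values, and the function name `sd1` is illustrative.

```python
import numpy as np

def sd1(q, g, U):
    """Squared distance between the projections of q and g onto the
    orthogonal complement of the line spanned by U."""
    U = np.asarray(U, dtype=float)
    P = np.outer(U, U) / (U @ U)     # projection onto span{U}
    R = np.eye(len(U)) - P           # the "1" in (1 - P) is the identity matrix
    diff = R @ q - R @ g
    return float(diff @ diff)

q = np.array([0.6, 0.0, 0.8])
g = np.array([0.6, 0.0, -0.8])
along_z = sd1(q, g, np.array([0.0, 0.0, 1.0]))   # removes the z difference
along_x = sd1(q, g, np.array([1.0, 0.0, 0.0]))   # leaves the z difference intact
```

Removing the z component makes the two residual vectors coincide, so the concept along z would be the candidate basis for the dissimilarity in this toy case, whereas the concept along x explains none of the difference.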

As described above, the matrix P (see equation (3), etc.) differs depending on the matrix B and hence on the concept vector U. Therefore, finding the matrix P that minimizes the evaluation value Sd1 is equivalent to finding the concept vector U that minimizes the evaluation value Sd1. Accordingly, the concept vector U that minimizes the evaluation value Sd1 is extracted as the "basis concept for the dissimilarity judgment".

Minimizing the evaluation value Sd1 is equivalent to "maximizing" the evaluation value Sd2 of equation (11). The simpler evaluation value Sd2 may therefore be used instead.
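Equations (10) and (11) are not reproduced in this text. Equation (10) follows directly from the definition given above. For equation (11), one form consistent with the stated equivalence is shown below; it is an assumption, and it relies on P being an orthogonal projection (symmetric and idempotent).

```latex
\begin{align}
S_{d1} &= \bigl\lVert (1-P)\,q - (1-P)\,g \bigr\rVert^{2} \tag{10} \\
S_{d2} &= \bigl\lVert P\,q - P\,g \bigr\rVert^{2} \tag{11} \\
S_{d1} &= (q-g)^{\top}(1-P)\,(q-g)
        = \lVert q-g \rVert^{2} - S_{d2} \notag
\end{align}
```

Because the term |q-g|² does not depend on P, the concept vector U that minimizes Sd1 is the one that maximizes Sd2.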

In this way, the evaluation value Sd2 (or Sd1) is used to obtain the main concept vector U as the "basis concept for the dissimilarity judgment".

For this purpose, a method similar to the first method described above (also referred to as the third method) or a method similar to the second method (also referred to as the fourth method) may be used.

Specifically, in the third method, the controller 31 first calculates the evaluation value Sd2 (equation (11) above) for a single concept vector U, for each of a plurality of candidate concept vectors U. Next, the controller 31 sorts the plurality of concepts corresponding to the plurality of candidate concept vectors U in descending order of evaluation value Sd2. The controller 31 then determines the top few concepts to be the "basis concepts for the dissimilarity judgment" for the two images (in particular, the main such concepts).
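The third method amounts to scoring each candidate independently and sorting, as the following sketch shows. It assumes the form Sd2 = |Pq - Pg|², which for a single concept vector reduces to the squared component of (q - g) along U. The candidate names and vectors, and the functions `sd2` and `rank_concepts`, are hypothetical.

```python
import numpy as np

def sd2(q, g, U):
    """Assumed form of Sd2: |Pq - Pg|^2 with P projecting onto span{U}."""
    U = np.asarray(U, dtype=float)
    coeff = ((q - g) @ U) / np.sqrt(U @ U)   # component of (q - g) along U
    return float(coeff ** 2)

def rank_concepts(q, g, candidates, top_k=2):
    """Sort candidate concepts by Sd2 in descending order; keep the top few."""
    scored = [(name, sd2(q, g, U)) for name, U in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

q = np.array([0.6, 0.0, 0.8])
g = np.array([0.6, 0.0, -0.8])
candidates = {                       # hypothetical concept vectors
    "A": np.array([1.0, 0.0, 0.0]),
    "B": np.array([0.0, 0.0, 1.0]),
    "C": np.array([0.0, 1.0, 1.0]),
}
top = rank_concepts(q, g, candidates)
```

Each candidate is evaluated in isolation here, so overlapping concepts such as "B" and "C" can both rank highly; the fourth method described next avoids this by evaluating candidates jointly with those already selected.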

In the fourth method, on the other hand, a search process is executed repeatedly to find, among the concepts other than those already selected (considered) at a given point (the i-th iteration), that is, among the unselected concepts, the concept that maximizes the evaluation value Sd2 (or minimizes the evaluation value Sd1). The search process is repeated until a predetermined termination condition is satisfied (for example, until a predetermined number of main concepts have been determined). With each iteration the number of selected concept vectors U increases by one, and the rank of the projection matrix P increases by one accordingly.

In each method, the evaluation value D given by equation (12) can also be calculated. This evaluation value D represents the component of the "dissimilar" judgment that has not yet been accounted for by the n concepts (concept vectors U) selected up to a given point (the residual component of the similarity evaluation). Specifically, the evaluation value D is calculated as the value (also referred to as the residual) obtained by subtracting the degree of the "dissimilar" judgment explained by the n concepts (Sd2/2) from the degree of the "dissimilar" judgment (dissimilarity judgment) (1-q·g). The evaluation value Sd2 is multiplied by the coefficient 1/2 so that the evaluation value D becomes zero when P becomes the identity matrix (full rank; the β×β identity matrix), that is, when concepts (concept vectors U) corresponding to the entire space have been considered.
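Equation (12) is not reproduced in this text. From the description, it presumably has the form below. The check that D vanishes for full-rank P additionally assumes that q and g are unit vectors and that Sd2 = |Pq - Pg|²; neither is stated in this section.

```latex
\begin{align}
D &= (1 - q \cdot g) - \tfrac{1}{2}\, S_{d2} \tag{12} \\
P = 1 \;\Rightarrow\;
S_{d2} &= \lVert q - g \rVert^{2}
        = \lVert q \rVert^{2} + \lVert g \rVert^{2} - 2\, q \cdot g
        = 2\,(1 - q \cdot g)
\;\Rightarrow\; D = 0 \notag
\end{align}
```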

In the fourth method, this evaluation value D may be used (instead of the evaluation value Sd2 or the like). That is, a search process may be executed to find, among the concepts other than those already selected (considered) at a given point (the unselected concepts), the concept that minimizes the evaluation value D. The search process is repeated until a predetermined termination condition is satisfied (for example, until a predetermined number of main concepts have been determined). With each iteration the number of selected concept vectors U increases by one, the rank of the projection matrix P increases by one accordingly, and the evaluation value D gradually decreases.
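The greedy search of the fourth method can be sketched as follows, using the evaluation value D as the criterion. The sketch assumes unit-length feature vectors, the form D = (1 - q·g) - Sd2/2 with Sd2 = |P(q - g)|², and a fixed number of selections as the termination condition. The candidate vectors and the function names are hypothetical.

```python
import numpy as np

def projection(vectors, dim):
    """Projection onto the span of the selected vectors (zero matrix if none)."""
    if not vectors:
        return np.zeros((dim, dim))
    B = np.column_stack(vectors)
    return B @ np.linalg.pinv(B)      # orthogonal projection onto the column space of B

def residual_d(q, g, P):
    """D = (1 - q.g) - Sd2 / 2, with Sd2 = |P(q - g)|^2 (assumed form)."""
    diff = P @ (q - g)
    return float((1.0 - q @ g) - 0.5 * (diff @ diff))

def greedy_select(q, g, candidates, n_select):
    """Repeatedly add the unselected concept that minimizes D."""
    selected, trace = [], []
    remaining = dict(candidates)
    while remaining and len(selected) < n_select:
        chosen = [candidates[s] for s in selected]
        best = min(
            remaining,
            key=lambda name: residual_d(
                q, g, projection(chosen + [remaining[name]], len(q))
            ),
        )
        selected.append(best)
        del remaining[best]
        P = projection([candidates[s] for s in selected], len(q))
        trace.append(residual_d(q, g, P))   # D after this selection
    return selected, trace

q = np.array([0.6, 0.0, 0.8])          # unit-length feature vectors
g = np.array([0.0, 0.6, -0.8])
candidates = {                          # hypothetical concept vectors
    "x": np.array([1.0, 0.0, 0.0]),
    "y": np.array([0.0, 1.0, 0.0]),
    "z": np.array([0.0, 0.0, 1.0]),
}
selected, trace = greedy_select(q, g, candidates, n_select=3)
```

In this toy case the concept along z is chosen first because it accounts for most of the difference between q and g, and `trace` records how D shrinks toward zero as the rank of P grows with each selection.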

With the third method, the fourth method, or the like described above, a specific concept vector U among the multiple concept vectors U corresponding to the multiple clusters is extracted as the "basis concept for the dissimilarity judgment." This specific concept vector U is the one that minimizes the distance between the projection vectors (Rq) and (Rg) of the feature vectors q and g onto the orthogonal complement of the subspace corresponding to that concept vector U.

In other words, among the multiple subspaces corresponding to the multiple clusters, the concept vector U spanning the subspace whose orthogonal complement yields a relatively small distance (smaller than for the other subspaces) between the projection vectors ((1-P)q) and ((1-P)g) of the two feature vectors q and g is extracted as the "basis concept for the dissimilarity judgment."
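
The link between a small distance in the orthogonal complement and a large distance inside the subspace follows from the Pythagorean identity for orthogonal projections. The short derivation below writes I for the identity matrix (the text writes 1 - P) and assumes that P is an orthogonal projection matrix, that is, P² = P and Pᵀ = P:

```latex
\begin{aligned}
\|q-g\|^2 &= \|P(q-g)\|^2 + \|(I-P)(q-g)\|^2, \\
\|(I-P)q-(I-P)g\|^2 &= \|q-g\|^2 - \|Pq-Pg\|^2 .
\end{aligned}
```

Because the squared distance between q and g is fixed for a given image pair, the subspace that minimizes the distance between (I-P)q and (I-P)g is the one that captures the largest share of the difference q - g.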

Put simply, among the multiple concepts corresponding to the multiple clusters, a concept whose removal would cause the two images to be judged similar to each other is extracted as the "basis concept for the dissimilarity judgment."

As a result of the processing described above, "basis concepts for the dissimilarity judgment" such as the following can be extracted.

For example, when an image of a person wearing a light pink shirt and an image of a person wearing a plain wine-red shirt are judged to be dissimilar, concept vector U200, concept vector U310, and the like can be extracted as the "basis concepts for the dissimilarity judgment" (see Figure 11, etc.).

Alternatively (although not shown in Figure 11, etc.), suppose there is a higher-level cluster for "blue shirt" and a lower-level cluster for "blue checked shirt." An image of a person wearing a blue shirt and an image of a person wearing a blue checked shirt may then be judged to be dissimilar. In that case, the concept vector of the "blue shirt" cluster, the concept vector of the "blue checked shirt" cluster, and the like can be extracted as the basis concepts for the dissimilarity judgment.

In this way, the concepts C (concept vectors U) that best reflect the respective characteristics of the two images being compared can be extracted as main concepts.

In step S42, a display similar to those in Figures 25 and 26 is presented. However, the display explains the basis concepts for the "dissimilarity judgment" rather than those for the "similarity judgment."

Specifically, as in Figure 25, the evaluation values Sd2 (or Sd1) of the top few main concepts (for example, two to five) are displayed as a graph. The evaluation value D (residual) may also be displayed. In addition, as in Figure 26, etc., detailed information on each main concept (the images making up the corresponding cluster, heat map images, a concept visualization image, and so on) is displayed.

As in Figure 26, etc., detailed information is displayed for each of the top few main concepts (the basis concepts for the "dissimilarity judgment").

With the processing described above, when an image pair is dissimilar, the basis on which the pair is judged dissimilar (the basis for the dissimilarity judgment) can be identified.

In this second embodiment, a specific concept vector U is extracted as the "basis concept for the dissimilarity judgment." However, as noted above, the embodiment is not limited to this; for example, the subspace (a straight line) spanned by that specific concept vector U may instead be extracted as the "basis concept for the dissimilarity judgment."

<3. Modifications, etc.>
Although embodiments of the present invention have been described above, the present invention is not limited to what has been described.

For example, the processing of sub-phase PH3a (step S20) need not be performed after the second phase PH2 (step S12); it may instead be performed immediately after the first phase PH1 (step S11).

In each of the above embodiments, the multiple input images 211 used for machine learning are used in sub-phase PH3a as the input images 210 to the trained model 420. However, the embodiments are not limited to this, and multiple input images other than the input images 211 (for example, input images 213) may be used. That said, using the input images 211 on which the trained model 420 was trained is preferable to using such other input images (input images 213, etc.), because the resulting distribution of feature vectors 250 for the input images 210 (the output distribution of the trained model 420) is considered to be relatively accurate.

The above embodiments illustrate the present invention as applied to person recognition, but the invention is not limited to this. For example, it may be applied to product recognition, or to lesion recognition (lesion detection) and the like.

Likewise, the above embodiments illustrate the present invention as applied to metric learning (distance learning), but the invention is not limited to this and may also be applied to classification learning and the like.

For example, the invention may be applied to classification learning that uses a trained model 420 comprising a feature extraction layer (configured as a CNN or the like) that extracts image features and a fully connected layer that performs classification and similar processing based on the features extracted by the feature extraction layer. Specifically, an intermediate output vector of the trained model 420 (the vector output from the feature extraction layer and input to the fully connected layer that follows it) may be acquired as the feature vector F in the feature space. In other words, the feature vector F output from the trained model 420 is not limited to the vector that the trained model 420 finally outputs (the final output); it may also be a vector that the trained model 420 outputs at an intermediate stage (an intermediate output). The hierarchical clustering process, the concept vector extraction process, and so on may then be executed on the feature vectors F.
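
The flow described above (take the intermediate output as the feature vector F, cluster hierarchically, then extract a vector per cluster) can be sketched as follows. Everything here is an illustrative assumption: the two random weight matrices merely stand in for the trained model 420, centroid-linkage agglomeration is one possible hierarchical clustering scheme, and the normalized cluster mean is one possible representative vector. None of these choices is taken from the specification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for trained model 420 (hypothetical weights, not a real CNN).
W_feat = rng.normal(size=(16, 8))  # "feature extraction layer": 16-dim input -> 8-dim feature
W_fc = rng.normal(size=(8, 3))     # "fully connected layer": 8-dim feature -> 3 class scores

def extract_feature(x):
    """Intermediate output: the vector handed from the feature layer to the FC layer."""
    return np.maximum(x @ W_feat, 0.0)

def classify(x):
    """Final output of the toy model (class index); not used for clustering."""
    return int(np.argmax(extract_feature(x) @ W_fc))

def agglomerate(F):
    """Naive centroid-linkage agglomerative clustering over the rows of F.
    Returns one (new_id, left_id, right_id, member_indices) tuple per merge."""
    clusters = {i: [i] for i in range(len(F))}
    merges, next_id = [], len(F)
    while len(clusters) > 1:
        ids = list(clusters)
        best = None
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                a, b = ids[i], ids[j]
                d = np.linalg.norm(F[clusters[a]].mean(axis=0)
                                   - F[clusters[b]].mean(axis=0))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        members = clusters.pop(a) + clusters.pop(b)
        clusters[next_id] = members
        merges.append((next_id, a, b, list(members)))
        next_id += 1
    return merges

def concept_vector(F, members):
    """Representative vector of a cluster, ASSUMED here to be its normalized mean."""
    mean = F[members].mean(axis=0)
    return mean / np.linalg.norm(mean)

X = rng.normal(size=(10, 16))   # ten stand-in "input images"
F = extract_feature(X)          # intermediate outputs used as feature vectors F
merges = agglomerate(F)         # hierarchy: each merge is a cluster at some level
concepts = {cid: concept_vector(F, members) for cid, _, _, members in merges}
```

The point of the sketch is only that clustering operates on the intermediate features F rather than on the class scores; a real system would take F from the actual trained model and could use any hierarchical clustering routine.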

1 Image processing system
30 Image processing device (information processing device)
31 Controller
210, 211, 213, 215 Person image (input image)
250, 251, 253, 255, F Feature vector
400, 410, 420 Learning model
501 Separation plane
600, 610, 620 Display screen
C Concept
G Cluster
U Concept vector

Claims

An information processing device comprising:
a control unit that acquires a plurality of feature vectors output from a machine-learned learning model in response to input of a plurality of input images to the learning model, generates a plurality of hierarchical clusters by executing a hierarchical clustering process on the plurality of feature vectors, and extracts a subspace or vector corresponding to a specific cluster among the plurality of clusters as a concept of the specific cluster.

The information processing device according to claim 1, wherein the control unit extracts a representative vector of the specific cluster as the concept of the specific cluster.

The information processing device according to claim 2, wherein the control unit generates, based on the representative vector and the learning model, a concept visualization image that is a virtual input image corresponding to the representative vector and that visualizes the concept of the specific cluster, and displays the concept visualization image on a display unit.

The information processing device according to claim 1, further comprising:
a receiving unit that receives input of attribute information of the specific cluster.

The information processing device according to any one of claims 1 to 4, wherein, when similarity between a first image and a second image is determined based on a first feature vector output from the learning model in response to input of the first image to the learning model and a second feature vector output from the learning model in response to input of the second image to the learning model, the control unit calculates a degree of contribution, to the determination of the similarity between the first image and the second image, of at least one concept among a plurality of concepts respectively corresponding to the plurality of clusters.

The information processing device according to any one of claims 1 to 4, wherein, when similarity between a first image and a second image is determined based on a first feature vector output from the learning model in response to input of the first image to the learning model and a second feature vector output from the learning model in response to input of the second image to the learning model, the control unit extracts, from among a plurality of subspaces respectively corresponding to the plurality of clusters, a subspace for which the distance between the projection vectors of the first and second feature vectors onto its orthogonal complement is relatively small, or a vector spanning that subspace, as a concept serving as a basis for determining that the first image and the second image are dissimilar to each other.

An information processing device comprising:
a control unit that acquires a plurality of feature vectors output from a machine-learned learning model in response to input of a plurality of input images to the learning model, generates a plurality of hierarchical clusters by executing a hierarchical clustering process on the plurality of feature vectors, and determines two or more input images corresponding to a specific cluster among the plurality of clusters as a group of images representing the concept of the specific cluster.

An information processing method comprising:
a) obtaining a plurality of feature vectors output from a machine-learned learning model in response to input of a plurality of input images to the learning model;
b) generating a plurality of hierarchical clusters by performing a hierarchical clustering process on the plurality of feature vectors; and
c) extracting a subspace or vector corresponding to a specific cluster from among the plurality of clusters as a concept of the specific cluster.