JP7655779B2

JP7655779B2 - Learning model updating device and learning model updating method

Info

Publication number: JP7655779B2
Application number: JP2021086450A
Authority: JP
Inventors: 敦廣池; 全孔
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2021-05-21
Filing date: 2021-05-21
Publication date: 2025-04-02
Anticipated expiration: 2041-05-21
Also published as: JP2022179162A

Description

The present invention relates to a learning model update device and a learning model update method.

Larger storage capacities, improved computing performance, and faster network transfer speeds have made it possible to accumulate large amounts of electronic history information in many kinds of business operations. The stored content is no longer limited to traditional bibliographic information; diverse media such as video and images can now be stored as well.

Against this background, there is a growing need for information processing technology that puts this electronic history information to use. For example, in the examination of patents, trademarks, and designs, a variety of information, such as the filed documents, related images, and bibliographic information assigned during examination, accumulates as work history. If this history could be used to make examination and application-related work more efficient, it would contribute greatly to the development of industry.

Business information in which images are stored together with an arbitrary number of keywords is common. There is also strong demand for automatically assigning appropriate keywords to images acquired in the course of business.

Training data in which multiple labels are assigned to each data item is called multi-label training data. Deep neural networks are commonly trained on multi-label data with a cross-entropy loss. In this approach, the network is built to produce one output value per distinct label. When assigning keywords, however, it is often difficult to know in advance how many distinct keywords will be used. Moreover, when the expected number of distinct keywords is extremely large, problems arise both in configuring the network model and in executing the processing.

Triplet network learning, one of the learning methods for deep neural networks, instead trains the network to output feature vectors such that data with the same label are close together and data with different labels are far apart.

FIG. 1 illustrates the triplet concept. In triplet network learning, data with the same label as the data of interest (the anchor) 111 is treated as a positive example 112, data with a different label is treated as a negative example 113, and a triplet 110 is formed from the anchor, the positive example, and the negative example.

Each data item in the triplet is input to a neural network model 120 with shared weights, and the neural network constituting the model 120 outputs a feature vector 130 for each item.

Although FIG. 1 shows a CNN (Convolutional Neural Network) as the neural network model 120, the model is not limited to a CNN; any known neural network model can be used. The same applies throughout this specification.

Next, the distance $D_+$ 140 between the anchor 111 and the positive example 112 in the feature space, and the distance $D_-$ 150 between the anchor 111 and the negative example 113, are calculated. The loss function that serves as the learning objective is defined as the sum, over triplets, of the difference between these two distances. Non-Patent Document 1 applies triplet network learning to learning features for personal authentication from face images.

For a given training set, the number of possible triplets is enormous. Non-Patent Document 2 discusses a method that uses similar-vector search to select triplets that make learning efficient.

Non-Patent Document 1: Schroff, Florian, Dmitry Kalenichenko, James Philbin. "Facenet: A unified embedding for face recognition and clustering", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2015)
Non-Patent Document 2: Yasumitsu Ikeura, Koichi Okamoto, Ryohei Kashima, and Atsushi Hiroike: Multimodal Deep Learning Platform for IoT Data, Hitachi Review, 102-3, 119-123 (2020)

The triplet network learning described above is an effective way to obtain feature vectors, but it applies only when a single label is assigned to each data item. With multi-label training data, the positive/negative relationship between the anchor and other data cannot be defined unambiguously, so triplets cannot be formed.

The present invention has been made in view of this problem, and its object is to provide a learning model update device and a learning model update method that enable triplet network learning on multi-label training data.

To solve the above problem, a learning model update device according to one aspect of the present invention accepts a learning model that calculates image feature vectors of image data to which multiple labels have been assigned, and updates that learning model. The device comprises: a label similarity calculation unit that calculates the label similarity between a query, which is one image data item selected from the multiple image data accepted as training data for the learning model, and multiple image data different from the image data selected as the query; a selection processing unit that selects, from the multiple image data used in calculating the label similarity, pairs of image data whose label similarity gap satisfies a predetermined condition as pairs of positive and negative examples; and a model update processing unit that updates the learning model based on the selected pairs of positive and negative examples.

According to the present invention, a learning model update device and a learning model update method that enable triplet network learning on multi-label training data can be realized.

FIG. 1 is a schematic diagram of general triplet network learning.
FIG. 2 is a configuration diagram of the learning model update device according to Embodiment 1.
FIG. 3 is a flow of the processing of the learning model update device according to Embodiment 1.
FIG. 4 is a flow of the triplet set generation process of the learning model update device according to Embodiment 1.
FIG. 5 is an explanatory diagram of the per-search-result triplet generation process in the learning model update device according to Embodiment 1.
FIG. 6 is a flow of the per-search-result triplet generation process in the learning model update device according to Embodiment 1.
FIG. 7 is a configuration diagram of the learning model update device according to Embodiment 2.
FIG. 8 is a flow of the processing of the learning model update device according to Embodiment 2.
FIG. 9 is a flow of the triplet set generation process in the learning model update device according to Embodiment 2.
FIG. 10 is a flow of the per-anchor triplet generation process in the learning model update device according to Embodiment 2.
FIG. 11 is a configuration diagram of a search device that uses a network model updated according to Embodiment 1 or Embodiment 2.

Embodiments of the present invention are described below with reference to the drawings. The embodiments described below do not limit the invention as claimed, and not all of the elements and combinations described in the embodiments are necessarily essential to the solution provided by the invention.

In the figures used to describe the embodiments, parts having the same function are given the same reference numerals, and repeated description of them is omitted.

In the following description, information may be referred to with expressions such as "xxx data", but the information may have any data structure. That is, to show that the information does not depend on the data structure, "xxx data" may also be called an "xxx table". "xxx data" may also be called simply "xxx". The structure of each piece of information in the following description is an example; information may be held in divided or combined form.

In the following description, processing may be described with a "program" as the subject. A program is executed by a processor (for example, a CPU (Central Processing Unit)) and performs the defined processing while using storage resources (for example, memory) and/or communication interface devices (for example, ports) as appropriate, so the subject of the processing may be regarded as the program. Processing described with a program as the subject may equally be regarded as processing performed by the processor or by a computer that has the processor.

The learning model update device of this embodiment has, as an example, the following configuration.

In multi-label training data, the label similarity between data items is defined based on the labels assigned to each item. Taking a given data item as the anchor, the device extracts the pairs of data whose difference (gap) in label similarity to the anchor is largest, or alternatively the set of pairs whose gap is at or above a certain value. Within each pair, the item with the higher label similarity to the anchor is the positive example candidate and the item with the lower label similarity is the negative example candidate. From the extracted candidate pairs, the required number is selected in ascending order of distance from the anchor in the feature vector space, forming a set of triplets.

By performing the above processing with each item in the training data as the anchor, the full set of triplets to be learned is constructed. A loss function is defined over this full set, and learning for feature vector extraction is carried out by minimizing it.

This embodiment (Embodiment 1) achieves efficient learning by combining the technique described above with similar-vector search.

FIG. 2 is a configuration diagram of this embodiment.

The learning model update device 210 of this embodiment accepts a set of image data annotated with multi-label information as training data 220. In practice, training of a deep neural network rarely starts from completely random network weights. Instead, a network model 230 already trained on another learning task is supplied as the initial state to make learning more efficient.

Inside the learning model update device 210, the inference processing unit 211 extracts feature vectors of the training data using the state of the network model at that point in the learning process. The similar vector search processing unit 212 performs similar-vector search over the set of feature vectors extracted by the inference processing unit 211.

The triplet set generation processing unit 213 constructs the set of triplets used for learning, based on the results of the similar-vector search by the similar vector search processing unit 212. More specifically, the label similarity calculation unit 213a of the triplet set generation processing unit 213 calculates the label similarity between a query, which is one image data item selected from the multiple image data constituting the training data 220, and multiple image data different from the image data selected as the query. The selection processing unit 213b selects, from the multiple image data used in calculating the label similarity, pairs of image data whose label similarity gap satisfies a predetermined condition as pairs of positive and negative examples.

The model update processing unit 214 updates the network model using the set of training triplets constructed by the triplet set generation processing unit 213. The state of the updated network model 240 is saved to an external storage medium by the model saving processing unit 215 as needed.

FIG. 3 shows the processing flow of the learning model update device 210.

The learning model update device 210 first acquires the training data and the initial state of the network model (301). In the subsequent inference process (302), feature vectors of the training data are extracted.

The clustering process (303) that follows is preprocessing that allows the similar-vector search performed in the triplet set generation process (304) to run quickly. Similar-vector search using clustering is discussed in Non-Patent Document 2, among others. The triplet set generation process (304) is described later.

From the set of triplets obtained in the triplet set generation process (304), the training triplets actually used for learning are selected according to a certain criterion (305), which is described later. The training triplets form an ordered array whose elements are triplets. The network model is then updated using these training triplets (306). The next inference process (302) uses this updated network model.

In this embodiment, the processing from the inference process (302) through the network model update (306) constitutes one learning cycle. In the end-of-learning determination (307), learning normally ends when the number of cycles reaches a preset maximum, but it may also be ended based on other evaluation criteria.
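The cycle of steps 302 to 306, repeated until the check in step 307, can be sketched roughly as follows. This is only an illustrative skeleton: the function `run_training` and the five callables passed to it are hypothetical names that do not appear in the specification, and the stop condition shown is just the preset maximum number of cycles.

```python
def run_training(model, training_data, max_cycles,
                 extract_features, build_index,
                 generate_triplets, select_triplets, update_model):
    """Repeat the learning cycle (302)-(306) until the end check (307).

    All callables are hypothetical placeholders supplied by the caller.
    """
    for _ in range(max_cycles):  # (307) stop at the preset maximum number of cycles
        features = extract_features(model, training_data)             # (302) inference
        index = build_index(features)                                 # (303) clustering as search preprocessing
        triplets = generate_triplets(training_data, features, index)  # (304) triplet set generation
        training_triplets = select_triplets(triplets, features)       # (305) selection of training triplets
        model = update_model(model, training_triplets)                # (306) model used by the next inference
    return model
```

Passing the steps in as callables keeps the sketch independent of any particular network library; a different end criterion would replace the fixed `range(max_cycles)` loop.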

FIG. 4 shows the flow of the triplet set generation process (304).

First, the triplet set is emptied (401). Next, one data item in the training data is selected as the query (402), data whose feature vectors are similar to the query's feature vector are searched for (403), and the search result is obtained (404). The per-search-result triplet generation process (405), described later, is then performed on the obtained search result.

The triplet set obtained for that search result is then added to the triplet set of this process (406). Steps 402 to 406 are normally performed with every training data item as the query, but an upper limit may be placed on the number of queries because of, for example, constraints on computation. In that case, a fixed number of items chosen at random from the training data can be used as queries.
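Steps 401 to 406 can be sketched as below. The sketch makes several assumptions that are not in the specification: the search in steps 403 and 404 is a brute-force nearest-neighbour scan rather than a cluster-based search, distance is squared Euclidean distance, and `make_triplets` is a hypothetical callback standing in for step 405. The names `search_similar`, `generate_triplet_set`, `top_k`, `max_queries`, and `seed` are likewise illustrative.

```python
import random


def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def search_similar(query_id, features, top_k):
    """Brute-force stand-in for the similar-vector search (403)-(404).

    Returns the ids of the top_k nearest items, nearest first, excluding the query.
    """
    others = [data_id for data_id in features if data_id != query_id]
    others.sort(key=lambda data_id: squared_distance(features[query_id], features[data_id]))
    return others[:top_k]


def generate_triplet_set(features, make_triplets, top_k, max_queries=None, seed=0):
    """features: dict id -> feature vector; make_triplets: hypothetical step (405)."""
    triplet_set = []                                             # (401) start from an empty set
    query_ids = list(features)
    if max_queries is not None and max_queries < len(query_ids):
        # Optional cap on the number of queries: sample them at random
        query_ids = random.Random(seed).sample(query_ids, max_queries)
    for query_id in query_ids:                                   # (402) pick a query
        ranked_ids = search_similar(query_id, features, top_k)   # (403)-(404) search and get result
        triplet_set.extend(make_triplets(query_id, ranked_ids))  # (405)-(406) build and add triplets
    return triplet_set
```

For example, with `features = {"a": (0, 0), "b": (1, 0), "c": (5, 5)}`, the call `search_similar("a", features, 2)` ranks `"b"` ahead of `"c"`.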

Next, the per-search-result triplet generation process (405) is described in detail.

First, the label similarity for data with multiple labels is defined as follows.

$$ s(q, x) = \frac{|Q \cap X|}{|Q \cup X|} \tag{1} $$

Here, q is the data item selected as the query, x is any data item in the search results, Q is the set of labels assigned to the query q, and X is the set of labels assigned to the data item x. The denominator on the right-hand side of formula (1) is the number of elements in the union of Q and X, and the numerator is the number of elements in their intersection. The label similarity s(q, x) between q and x is 1 when the two label sets match exactly and 0 when they share no labels.
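The set-based definition above (intersection size divided by union size) can be written as a minimal Python sketch. The function name `label_similarity` is illustrative, and the value returned when both label sets are empty is an assumption, since the text does not define that case.

```python
def label_similarity(q_labels, x_labels):
    """Label similarity of formula (1): |Q ∩ X| / |Q ∪ X|."""
    q, x = set(q_labels), set(x_labels)
    union = q | x
    if not union:
        # Assumption: two items without any labels get similarity 0.0
        return 0.0
    return len(q & x) / len(union)
```

With labels `{"a", "b"}` and `{"b", "c"}`, one label is shared out of three distinct labels, so the similarity is 1/3.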

Next, for a pair (x, y) of two different data items in the search results, the difference (gap) g(q, x, y) in their label similarity to the query q is defined as follows.

$$ g(q, x, y) = s(q, x) - s(q, y) \tag{2} $$

The value of this formula ranges from -1 to 1.
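As a small sketch, the gap is the label similarity of x to the query minus that of y. The helper names `jaccard` and `label_gap` are illustrative, and treating two empty label sets as similarity 0 is an assumption.

```python
def jaccard(a, b):
    """Label similarity: shared labels divided by all distinct labels."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0  # assumption: 0.0 when both are empty


def label_gap(q_labels, x_labels, y_labels):
    """Gap of formula (2): similarity of x to the query minus similarity of y to the query."""
    return jaccard(q_labels, x_labels) - jaccard(q_labels, y_labels)
```

The two extremes of the range occur when one item matches the query's labels exactly and the other shares none of them.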

FIG. 5 schematically shows how label similarity changes across the search results.

The horizontal axis of FIG. 5 is the rank within the search results, and the vertical axis is the label similarity of each result. If similarity in the feature vector space were highly correlated with label similarity, label similarity would be expected to decrease monotonically as the rank number increases, that is, moving down the ranking. In practice, however, the two disagree: a lower-ranked item may show high label similarity, and a higher-ranked item may show low label similarity.

The label similarity gap of formula (2) is used to capture this. Focusing on the item at a given rank, the gap in label similarity is calculated between that item and each item ranked below it. In the variable names of formula (2), the lower-ranked item corresponds to x and the item of interest corresponds to y.

In this embodiment, the pairs of search results for which this label similarity gap is largest are extracted. In each pair, the lower-ranked item is taken as the positive example candidate for forming a triplet, and the higher-ranked item as the negative example candidate.

For a given search result, there are usually multiple pairs that attain the maximum gap. Label similarity is defined as a real number between 0 and 1 (and the gap, by formula (2), between -1 and 1), but as formula (1) shows, label similarity is determined by combinations of fairly small integers such as the number of labels assigned to each item. As a result, gap values are not distributed continuously; they take values from a discrete set.

Consequently, the pair giving the maximum is generally not unique, and many such pairs usually occur. When several maximum-gap pairs share the same negative example candidate, the pair whose positive example candidate ranks higher, that is, the pair whose two members are closer together in search rank, is selected.

From the set of positive/negative candidate pairs extracted in this way, a pre-specified number of pairs is selected, starting from the pairs whose negative example candidate ranks highest, and each selected pair is combined with the query to form the triplet set. If there are fewer pairs than the specified number, all pairs are used. If the maximum gap is smaller than a pre-specified threshold, no triplets are formed; in that case the generated triplet set is empty.

FIG. 6 shows the flow of the per-search-result triplet set generation process (405) described above.

First, the label similarity between the query and each data item in the search results is calculated (601). Next, the pairs of search results with the largest label similarity gap are extracted as positive/negative example candidate pairs (602). From the extracted pairs, N are selected, starting from those whose negative example candidate ranks highest (603). Finally, N triplets are formed by combining the query with each selected pair (604).
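Steps 601 to 604, together with the tie-breaking rule and the threshold check described earlier, can be sketched as follows. This is one possible reading, not the patented implementation: the names `triplets_for_search_result`, `n_max`, and `min_gap` are hypothetical, the search results are assumed not to contain the query itself, and exact fractions are used so that equal gaps compare as equal.

```python
from fractions import Fraction


def jaccard(a, b):
    """Label similarity of formula (1), as an exact fraction."""
    a, b = set(a), set(b)
    union = a | b
    # Assumption: similarity is 0 when neither item has any label
    return Fraction(len(a & b), len(union)) if union else Fraction(0)


def triplets_for_search_result(query_id, query_labels, results, n_max, min_gap):
    """results: list of (data_id, labels), most similar in feature space first."""
    # (601) label similarity between the query and every search result
    sims = [jaccard(query_labels, labels) for _, labels in results]
    count = len(results)

    # (602) i is the higher-ranked item (negative candidate),
    #       j is the lower-ranked item (positive candidate)
    gaps = {(i, j): sims[j] - sims[i]
            for i in range(count) for j in range(i + 1, count)}
    if not gaps:
        return []
    best_gap = max(gaps.values())
    if best_gap < min_gap:
        return []  # maximum gap below the threshold: empty triplet set

    # Per negative candidate, keep the max-gap pair whose positive candidate
    # ranks highest (smallest j)
    pairs = []
    for i in range(count):
        for j in range(i + 1, count):
            if gaps[(i, j)] == best_gap:
                pairs.append((i, j))
                break

    # (603) pairs are already ordered by negative-candidate rank; take up to n_max
    # (604) combine each selected pair with the query: (query, positive, negative)
    return [(query_id, results[j][0], results[i][0]) for i, j in pairs[:n_max]]
```

The double loop over pairs is quadratic in the number of search results, which is acceptable for a short result list; a longer list would call for a more careful scan.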

For each generated triplet, the following value (the per-triplet loss function) is defined.

$$ \ell(q, x, y) = \lVert q - x \rVert^2 - \lVert q - y \rVert^2 \tag{3} $$

Here, q is the feature vector of the query, x is the feature vector of the positive example, and y is the feature vector of the negative example. In this embodiment, triplets are formed by choosing a positive example that is farther from the query in the feature vector space than the negative example, so the value of the above formula is non-negative.

In the selection of training triplets (305) in FIG. 3, the triplet set obtained by the triplet set generation process (304) is sorted in ascending order of the value of formula (3). If the set contains more triplets than a specified number, the triplets at the bottom of the order, that is, those with relatively large values of formula (3), are discarded.
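The sort-and-truncate step can be sketched as below, assuming that formula (3) is the squared distance from the query to the positive example minus the squared distance from the query to the negative example, as the later reference to squared-distance differences suggests. The names `select_training_triplets` and `max_count` are illustrative.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_value(q, x, y):
    """Assumed form of formula (3): ||q - x||^2 - ||q - y||^2 (x positive, y negative)."""
    return squared_distance(q, x) - squared_distance(q, y)


def select_training_triplets(triplets, features, max_count):
    """Sort (query, positive, negative) id triplets by formula (3), smallest first, and truncate."""
    ranked = sorted(
        triplets,
        key=lambda t: triplet_value(features[t[0]], features[t[1]], features[t[2]]),
    )
    return ranked[:max_count]  # triplets with relatively large values are discarded
```

Because `sorted` is stable, triplets with equal values keep the order in which they were generated.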

In the network model update (306) of FIG. 3, optimization is performed so as to decrease the following loss function.

$$ L = \sum_i \max\left( \lVert q_i - x_i \rVert^2 - \lVert q_i - y_i \rVert^2,\; b \right) \tag{4} $$

Here, i is the index into the array of training triplets, $q_i$ is the feature vector of the query of the i-th triplet, $x_i$ is the feature vector of the positive example of the i-th triplet, and $y_i$ is the feature vector of the negative example of the i-th triplet. Also, b is the lower limit on the difference in squared distances: when the difference in squared distance between the positive and negative examples of a triplet falls below b, that triplet is excluded from the optimization.
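One way to read this loss is sketched below. It assumes that formula (4) sums the squared-distance differences over the training triplets with each term clamped from below at b, so that a clamped term is constant and contributes no gradient; that exact form is an assumption, and the function names are illustrative. The sketch only evaluates the loss value and does not perform the optimization.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def clamped_triplet_loss(triplet_vectors, b):
    """Sum of max(||q - x||^2 - ||q - y||^2, b) over (q, x, y) feature-vector triplets.

    Also returns how many triplets are still active (difference not below b).
    """
    total = 0.0
    active = 0
    for q, x, y in triplet_vectors:
        diff = squared_distance(q, x) - squared_distance(q, y)
        if diff >= b:
            active += 1
        total += max(diff, b)  # assumed clamp at the lower limit b
    return total, active
```

In an automatic-differentiation framework the same clamp would make the excluded triplets drop out of the parameter update without any explicit filtering.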

In this embodiment, the optimization is not performed over all training triplets at once. As in much deep neural network training, the whole array of training triplets is divided into small arrays called mini-batches, and the network is updated sequentially, one mini-batch at a time.

Therefore, even if the per-triplet loss function of formula (3) is non-negative for every triplet at the start of a cycle, changes in the network state during the updates can make the difference in squared distance between positive and negative examples negative. The b in formula (4) prevents learning from failing through numerical instability, which can occur if the network is updated on the basis of a negative value of extremely large magnitude. For this reason, b is set in advance to an appropriate scalar value.

The network update (306) is also performed in the order produced by the sort in step (305), so learning with triplets that have small values of formula (3) takes priority. Learning triplets with a small per-triplet loss first means processing first those triplets that are expected to need only small corrections to the feature space, that is, those expected to be easy to learn. This is a measure for achieving a more stable learning process.
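Splitting the sorted array into mini-batches while preserving the sorted order can be sketched in a few lines. The generator name `iter_mini_batches` and the parameter `batch_size` are illustrative; the text does not specify a batch size.

```python
def iter_mini_batches(training_triplets, batch_size):
    """Yield consecutive slices of the already sorted triplet array, in order."""
    for start in range(0, len(training_triplets), batch_size):
        yield training_triplets[start:start + batch_size]
```

Because the slices are taken in sequence and not shuffled, the first mini-batches contain the triplets with the smallest sorted values, which matches the easy-first ordering described above.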

In the per-search-result triplet set generation process described with FIG. 6, the candidate pairs extracted are those with the largest label similarity gap, but all pairs whose gap is at or above a certain threshold may be extracted instead. Likewise, although pairs whose negative example candidate ranks higher are given priority in the selection from the extracted pairs, pairs with a smaller per-triplet loss of formula (3) may be given priority, or pairs may be selected at random. In the selection of training triplets (305) in FIG. 3, triplets with a larger label similarity gap may be given priority, or triplets may be selected at random.

また、式（１）のラベル類似度の定義方法において、他の定義を採用してもよい。例えば、次式は、式（１）と同様、集合演算を用いた別の定義方法である。

In addition, other definitions may be adopted in the method of defining the label similarity in formula (1). For example, the following formula is another definition method using set operations, similar to formula (1).

Furthermore, multi-label data can be represented as a vector whose dimension is the total number of labels that exist, with each component set to 1 if the corresponding label is assigned to the data item and 0 if it is not. In this case, the label similarity can be defined from the inner product of the vectors or from the distance between them.
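The vector representation above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the example label vocabulary, and the particular mapping from distance to similarity (1 / (1 + distance)) are all assumptions introduced here.

```python
import math

def to_multi_hot(labels, vocabulary):
    """Encode a label set as a 0/1 vector with one component per known label."""
    return [1 if label in labels else 0 for label in vocabulary]

def inner_product_similarity(u, v):
    """Label similarity as the inner product (number of shared labels for 0/1 vectors)."""
    return sum(a * b for a, b in zip(u, v))

def distance_similarity(u, v):
    """Assumed mapping from Euclidean distance to a similarity: 1 / (1 + distance)."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return 1.0 / (1.0 + dist)

# Hypothetical vocabulary and label sets, for illustration only.
vocabulary = ["cat", "dog", "car", "tree"]
u = to_multi_hot({"cat", "dog"}, vocabulary)
v = to_multi_hot({"dog", "tree"}, vocabulary)
shared = inner_product_similarity(u, v)
```

For 0/1 vectors the inner product counts the labels the two items share, so it behaves like the intersection size in a set-based definition.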

As described in detail above, this embodiment realizes a learning model update device 210 and a learning model update method that enable triplet network learning on multi-label learning data.

This embodiment (Example 2) describes a learning process that does not involve similar vector search.

FIG. 7 is a configuration diagram of this embodiment.

Many of the processing units constituting this embodiment provide the same functions as the corresponding processing units of Example 1 shown in FIG. 2.

The learning model update device 710 receives, as learning data 720, a set of image data to which multi-label information has been assigned. If necessary, it also receives a network model 730 trained in advance on another learning task as the initial state for learning.

Inside the learning model update device 710, a triplet set generation processing unit 711 generates the set of triplets to be used for learning, and a model update processing unit 712 updates the network model using the generated set of learning triplets. The state of the updated network model 740 is saved to an external storage medium by a model saving processing unit 713 as necessary.

FIG. 8 shows the processing flow of the learning model update device 710.

The learning model update device 710 first acquires the learning data and the initial state of the network model (801). Next, it calculates the label similarity of equation (1) between all pairs of learning data and holds the values (802).
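Step (802) can be sketched as a precomputed pairwise table. The sketch assumes that equation (1) is the ratio of the intersection size to the union size of the two label sets; Appendix 8 states only that the similarity is based on those two counts, so the exact form, like the function names and example labels, is an assumption.

```python
def label_similarity(labels_a, labels_b):
    """Assumed form of equation (1): |A ∩ B| / |A ∪ B| (Jaccard index)."""
    union = labels_a | labels_b
    return len(labels_a & labels_b) / len(union) if union else 0.0

def pairwise_label_similarity(label_sets):
    """Compute and hold the label similarity for every pair of learning data (step 802)."""
    n = len(label_sets)
    table = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = label_similarity(label_sets[i], label_sets[j])
            table[i][j] = table[j][i] = s  # the assumed measure is symmetric
    return table

# Hypothetical label sets for three images.
label_sets = [{"logo", "circle"}, {"logo"}, {"text"}]
table = pairwise_label_similarity(label_sets)
```

Holding the table trades memory quadratic in the number of learning data for not recomputing similarities in every learning cycle.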

Next, the triplet set generation process (803) of this embodiment is performed. The details of this process are described later. From the set of triplets obtained in the triplet set generation process (803), the learning triplets actually used for learning are selected (804) based on a certain criterion. The details of this criterion are described later. The learning triplets form an ordered array whose elements are triplets. The network model is updated (805) using the learning triplets obtained in this way.

In this embodiment, the processing from the triplet set generation process (803) through the network model update (805) constitutes one learning cycle. In the learning end determination process (806), learning normally ends when the number of cycles reaches a preset maximum number of cycles, but learning may also be ended based on other evaluation criteria.
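The cycle structure of steps (803) through (806) can be sketched as a simple loop. The three callables and the optional `should_stop` hook are hypothetical placeholders for the processes the text describes; nothing here reflects the actual network update.

```python
def run_training(generate_triplets, select_triplets, update_model,
                 max_cycles, should_stop=None):
    """Run learning cycles until the maximum count or another criterion is met."""
    cycles = 0
    while cycles < max_cycles:
        triplets = generate_triplets()        # step 803
        ordered = select_triplets(triplets)   # step 804: ordered array of triplets
        for triplet in ordered:
            update_model(triplet)             # step 805: update in the given order
        cycles += 1
        # Step 806: optional alternative termination criterion.
        if should_stop is not None and should_stop(cycles):
            break
    return cycles
```

Passing the stages as functions keeps the loop independent of how triplets are generated, so the same skeleton would also fit a variant that uses a different selection criterion.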

FIG. 9 shows the processing flow of the triplet set generation process (803).

First, the triplet set is emptied (901). Next, one data item in the learning data is selected as an anchor (902). Based on the data item selected as the anchor, the per-anchor triplet generation process (903) is performed. The details of this process are described later.

Next, the triplet set obtained for the anchor is added to the triplet set of this process (904). The processing from (902) to (904) is normally performed with every item of learning data selected as an anchor in turn, but an upper limit may be placed on the number of anchors, for example to limit the amount of computation. In that case, a fixed number of data items selected at random from the learning data are used as anchors.
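The anchor loop of FIG. 9, including the optional cap on the number of anchors, might look like the following sketch. The names `choose_anchors`, `generate_triplet_set`, and the `triplets_for_anchor` callback are hypothetical, and the fixed seed is only there to make the illustration reproducible.

```python
import random

def choose_anchors(num_items, max_anchors, rng):
    """Use every item as an anchor, or a random subset when a cap is set."""
    indices = list(range(num_items))
    if max_anchors is None or max_anchors >= num_items:
        return indices
    return rng.sample(indices, max_anchors)

def generate_triplet_set(num_items, triplets_for_anchor, max_anchors=None, seed=0):
    """Collect the per-anchor triplets into one triplet set (steps 901 to 904)."""
    rng = random.Random(seed)
    triplet_set = []                                             # step 901
    for anchor in choose_anchors(num_items, max_anchors, rng):   # step 902
        triplet_set.extend(triplets_for_anchor(anchor))          # steps 903 and 904
    return triplet_set
```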

FIG. 10 shows the processing flow of the per-anchor triplet generation process (903).

First, the set S of pairs of positive example candidates and negative example candidates, which is data held temporarily during this process, is emptied (1001). Next, a pair of data items (x, y) is selected at random from the learning data excluding the anchor a (1002). Then, from the label similarity between a and x and the label similarity between a and y, the label similarity gap g(a, x, y) of equation (2) is calculated (1003). In this embodiment, the anchor a corresponds to the query q of equation (2). If the gap g(a, x, y) is equal to or greater than a predefined threshold, (x, y) is added to the set S (1005).

The processing from (1002) to (1005) is repeated until a termination condition is met (1006). In this embodiment, the repetition ends when the number of elements in the set S reaches a predefined number, or when the number of iterations reaches a predefined maximum.

Next, the set S of pairs of positive example candidates and negative example candidates extracted in this way is sorted in descending order of the label similarity gap, and the top-ranked pairs are selected (1007). The number of pairs selected has an upper limit, and pairs beyond that limit, that is, pairs with a relatively small label similarity gap, are discarded. A set of triplets is then constructed from the selected pairs of positive example candidates and negative example candidates (1008).
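The per-anchor procedure of FIG. 10 can be sketched as below. Two forms are assumed because equations (1) and (2) are not reproduced in this passage: label similarity is taken as intersection size over union size, and the gap as sim(a, x) minus sim(a, y). The function and parameter names are hypothetical.

```python
import random

def jaccard(a, b):
    """Assumed stand-in for equation (1): |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def label_gap(labels, a, x, y):
    """Assumed form of equation (2): sim(a, x) - sim(a, y)."""
    return jaccard(labels[a], labels[x]) - jaccard(labels[a], labels[y])

def triplets_for_anchor(labels, a, threshold, max_pairs, max_trials, top_k, rng):
    """Sample (x, y) pairs for anchor a and keep those with the largest gaps."""
    others = [i for i in range(len(labels)) if i != a]
    candidates = set()                                    # step 1001
    for _ in range(max_trials):                           # step 1006: trial limit
        if len(candidates) >= max_pairs:                  # step 1006: size limit
            break
        x, y = rng.sample(others, 2)                      # step 1002
        if label_gap(labels, a, x, y) >= threshold:       # step 1003 and threshold test
            candidates.add((x, y))                        # step 1005
    ranked = sorted(candidates,
                    key=lambda pair: label_gap(labels, a, *pair),
                    reverse=True)                         # step 1007
    return [(a, x, y) for x, y in ranked[:top_k]]         # step 1008

# Hypothetical label sets; item 0 is used as the anchor.
labels = [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}, {"b", "c"}]
triplets = triplets_for_anchor(labels, 0, threshold=0.3, max_pairs=4,
                               max_trials=200, top_k=3, rng=random.Random(0))
```

Because pairs are drawn at random, the result depends on the random generator; only the properties (gap at or above the threshold, descending gap order, at most `top_k` triplets) are fixed.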

Unlike in Example 1, the triplets generated by the triplet set generation process (803) of this embodiment are subject to no constraint in the feature vector space on the selection of positive and negative examples. The per-triplet loss function of equation (3) may therefore take a negative value. In the selection of learning triplets (804), the set of generated triplets is sorted in ascending order of the absolute value of the per-triplet loss function of equation (3), and a fixed number of top-ranked triplets are selected as the triplets finally used for learning. In the subsequent network model update (805), the model is updated with each triplet in the sorted order.
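The selection rule of step (804) can be sketched as follows. Equation (3) is not reproduced in this passage, so the loss here is an assumed unclamped form, d(a, p) - d(a, n) + margin with Euclidean distance, chosen only because it can go negative as the text describes. The feature vectors and names are illustrative.

```python
import math

def triplet_loss(features, triplet, margin=0.0):
    """Assumed form of equation (3): d(a, p) - d(a, n) + margin, not clamped at zero."""
    a, p, n = triplet
    return (math.dist(features[a], features[p])
            - math.dist(features[a], features[n]) + margin)

def select_training_triplets(features, triplets, limit):
    """Sort by ascending |loss| and keep the first `limit` triplets (step 804)."""
    ranked = sorted(triplets, key=lambda t: abs(triplet_loss(features, t)))
    return ranked[:limit]

# Hypothetical 2-D feature vectors indexed by data item.
features = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (0.0, 0.5)]
triplets = [(0, 1, 2), (0, 2, 1), (0, 1, 3), (0, 3, 1)]
selected = select_training_triplets(features, triplets, limit=2)
```

The returned list preserves the sorted order, so iterating over it performs the updates of step (805) from the smallest absolute loss upward.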

Therefore, this embodiment also achieves the same effects as Example 1 described above.

This embodiment (Example 3) describes a search device 1100 that searches for similar image data using the network models 240 and 740 updated by the learning model update devices 210 and 710 of Examples 1 and 2.

FIG. 11 is a configuration diagram of the search device 1100 using the network models 240 and 740 updated according to Example 1 or Example 2.

When image data serving as a query (search target) is input to the image input unit 1101 of the search device 1100, the feature calculation unit 1102 calculates the feature values of the input image data. Next, the search unit 1103 inputs the calculated feature values to the network models 240 and 740 and obtains, as the search result, image data that is similar (or judged to be similar) to the query. The search result display unit 1104 then displays the image data obtained by the search unit 1103 on a display device such as a display (not shown).
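One plausible reading of the search step is a nearest-neighbor ranking over feature vectors produced by the updated model. The sketch below assumes that the stored images have already been embedded and that similarity is measured by Euclidean distance; neither detail is stated in the text, and the identifiers are hypothetical.

```python
import math

def search_similar(query_vector, indexed_vectors, top_k):
    """Return the ids of the top_k stored images closest to the query vector."""
    ranked = sorted(indexed_vectors.items(),
                    key=lambda item: math.dist(query_vector, item[1]))
    return [image_id for image_id, _ in ranked[:top_k]]

# Hypothetical precomputed feature vectors for three stored images.
index = {"img_a": (0.0, 0.0), "img_b": (1.0, 1.0), "img_c": (5.0, 5.0)}
results = search_similar((0.9, 0.9), index, top_k=2)
```

A full sort is enough for illustration; a large collection would normally use an approximate nearest-neighbor index instead.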

The embodiments above describe the configurations in detail in order to explain the present invention clearly, and the invention is not necessarily limited to embodiments having all of the configurations described. Part of the configuration of each embodiment may also be added to, deleted from, or replaced with another configuration.

Some or all of the configurations, functions, processing units, processing means, and the like described above may be realized in hardware, for example by designing them as integrated circuits. The present invention can also be realized by software program code that implements the functions of the embodiments. In this case, a storage medium on which the program code is recorded is provided to a computer, and a processor of the computer reads the program code stored in the storage medium. The program code read from the storage medium then itself realizes the functions of the embodiments described above, and the program code itself and the storage medium storing it constitute the present invention. Examples of storage media for supplying such program code include flexible disks, CD-ROMs, DVD-ROMs, hard disks, SSDs (Solid State Drives), optical disks, magneto-optical disks, CD-Rs, magnetic tapes, non-volatile memory cards, and ROMs.

The program code that realizes the functions described in the embodiments can be implemented in a wide range of programming or scripting languages, such as assembler, C/C++, Perl, Shell, PHP, Java (registered trademark), and Python.

Furthermore, all or part of the software program code that realizes the functions of each embodiment may be stored in advance in the storage of the machine learning system, or may be stored in the storage as needed from a non-transitory storage device of another device connected to the network, or from a non-transitory storage medium via an external I/F (not shown) of the machine learning system.

Furthermore, the software program code that realizes the functions of the embodiments may be distributed over a network and stored in storage means such as a computer's hard disk or memory, or in a storage medium such as a CD-RW or CD-R, and a processor of the computer may read and execute the program code stored in the storage means or storage medium.

In the embodiments described above, the control lines and information lines shown are those considered necessary for the explanation, and not all control lines and information lines of a product are necessarily shown. All the components may be interconnected.
(Appendix 1)
A learning model update device that accepts a learning model for calculating image feature vectors of image data to which a plurality of labels are assigned, and that updates the learning model, the device comprising:
a label similarity calculation unit that calculates label similarity between a query, which is one piece of image data selected from a plurality of pieces of the image data accepted as learning data for the learning model, and a plurality of pieces of the image data different from the image data selected as the query;
a selection processing unit that selects, from among the plurality of pieces of image data used in calculating the label similarity, a pair of pieces of image data for which a gap in the label similarity satisfies a predetermined condition, as a pair of a positive example and a negative example; and
a model update processing unit that updates the learning model based on the selected pair of the positive example and the negative example.
(Appendix 2)
The learning model update device according to (Appendix 1), wherein the predetermined condition is that the gap is maximum.
(Appendix 3)
The learning model update device according to (Appendix 1), wherein the predetermined condition is that the gap is equal to or greater than a predetermined threshold.
(Appendix 4)
The learning model update device according to (Appendix 2) or (Appendix 3), further comprising a similar vector search processing unit that, based on the learning model, performs a similar vector search for each of a plurality of the positive examples and a plurality of the negative examples for the query, and calculates a ranking of the similar vector search results for the query,
wherein the selection processing unit selects the pair of the positive example and the negative example based on the predetermined condition and the ranking.
(Appendix 5)
The learning model update device according to (Appendix 4), wherein the selection processing unit extracts, as candidates, a plurality of pairs of the positive example and the negative example for which the gap in the label similarity satisfies the predetermined condition, identifies the negative examples that fall within a predetermined rank in descending order of the ranking, and selects, from the extracted candidate pairs, a pair of the positive example and the negative example that includes an identified negative example.
(Appendix 6)
The learning model update device according to (Appendix 4), wherein the selection processing unit extracts, as candidates, a plurality of pairs of the positive example and the negative example for which the gap in the label similarity satisfies the predetermined condition, and selects, from the extracted candidate pairs, the pair of the positive example and the negative example having the smallest triplet loss function.
(Appendix 7)
The learning model update device according to (Appendix 6), wherein the triplet loss function is calculated based on a sum-of-squares error of the distance between the feature vector of the query and the feature vector of the positive example and the distance between the feature vector of the query and the feature vector of the negative example.
(Appendix 8)
The learning model update device according to (Appendix 1), wherein the label similarity is calculated based on the number of elements in the union and the number of elements in the intersection of the set of labels assigned to the query and the sets of labels assigned to the plurality of pieces of image data different from the image data selected as the query.
(Appendix 9)
A learning model update method performed by a learning model update device that accepts a learning model for calculating image feature vectors of image data to which a plurality of labels are assigned, and that updates the learning model, the method comprising:
calculating label similarity between a query, which is one piece of image data selected from a plurality of pieces of the image data accepted as learning data for the learning model, and a plurality of pieces of the image data different from the image data selected as the query;
selecting, from among the plurality of pieces of image data used in calculating the label similarity, a pair of pieces of image data for which a gap in the label similarity satisfies a predetermined condition, as a pair of a positive example and a negative example; and
updating the learning model based on the selected pair of the positive example and the negative example.

110: triplet
111: anchor
112: positive example
113: negative example
120: deep neural network
130: feature vector
140: distance between positive example and anchor
150: distance between negative example and anchor
210, 710: learning model update device
211: inference unit
212: similar vector search processing unit
213, 711: triplet set generation processing unit
213a: label similarity calculation unit
213b: selection processing unit
214, 712: model update processing unit
215, 713: model saving processing unit
220: learning data
230, 240, 730, 740: network model

Claims

1. A learning model update device that accepts a learning model for calculating image feature vectors of image data to which a plurality of labels are assigned, and that updates the learning model, the device comprising:
a label similarity calculation unit that calculates label similarity between a query, which is one piece of image data selected from a plurality of pieces of the image data accepted as learning data for the learning model, and a plurality of pieces of the image data different from the image data selected as the query;
a selection processing unit that selects, from among the plurality of pieces of image data used in calculating the label similarity, a pair of pieces of image data for which a gap in the label similarity satisfies a predetermined condition, as a pair of a positive example and a negative example; and
a model update processing unit that updates the learning model based on the selected pair of the positive example and the negative example.

2. The learning model update device according to claim 1, wherein the predetermined condition is that the gap is maximum.

3. The learning model update device according to claim 1, wherein the predetermined condition is that the gap is equal to or greater than a predetermined threshold.

4. The learning model update device according to claim 2, further comprising a similar vector search processing unit that searches for the image data having an image feature vector similar to the image feature vector of the query, obtains a similar vector search result for the query, and calculates a ranking within the similar vector search result for the query,
wherein the selection processing unit selects the pair of the positive example and the negative example based on the predetermined condition and the ranking.

5. The learning model update device according to claim 4, wherein the selection processing unit extracts, as candidates, a plurality of pairs of the positive example and the negative example for which the gap in the label similarity satisfies the predetermined condition, identifies a pre-specified number of the negative examples in descending order of the ranking, and selects, from the extracted candidate pairs, a pair of the positive example and the negative example that includes an identified negative example.

6. The learning model update device according to claim 4, wherein the selection processing unit extracts, as candidates, a plurality of pairs of the positive example and the negative example for which the gap in the label similarity satisfies the predetermined condition, and selects the pair of the positive example and the negative example from the extracted candidate pairs by giving priority to candidate pairs having a small triplet loss function.

7. The learning model update device according to claim 6, wherein the triplet loss function is calculated based on a sum-of-squares error of the distance between the feature vector of the query and the feature vector of the positive example and the distance between the feature vector of the query and the feature vector of the negative example.

8. The learning model update device according to claim 1, wherein the label similarity is calculated based on the number of elements in the union and the number of elements in the intersection of the set of labels assigned to the query and the sets of labels assigned to the plurality of pieces of image data different from the image data selected as the query.

9. A learning model update method performed by a learning model update device that accepts a learning model for calculating image feature vectors of image data to which a plurality of labels are assigned, and that updates the learning model, the method comprising:
calculating label similarity between a query, which is one piece of image data selected from a plurality of pieces of the image data accepted as learning data for the learning model, and a plurality of pieces of the image data different from the image data selected as the query;
selecting, from among the plurality of pieces of image data used in calculating the label similarity, a pair of pieces of image data for which a gap in the label similarity satisfies a predetermined condition, as a pair of a positive example and a negative example; and
updating the learning model based on the selected pair of the positive example and the negative example.