JP7540595B2

JP7540595B2 - Model learning device, model learning method, and program

Info

Publication number: JP7540595B2
Application number: JP2023526801A
Authority: JP
Inventors: 圭吾若山; 翔一郎齊藤
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2021-06-11
Filing date: 2021-06-11
Publication date: 2024-08-27
Anticipated expiration: 2041-06-11
Also published as: WO2022259517A1; JPWO2022259517A1

Description

The present invention relates to machine learning technology.

In recent years, machine learning has been used in fields such as acoustic event detection, image segmentation, and image recognition. The general procedure of machine learning is as follows.

(1) One label indicating the class to which a piece of data belongs (called a correct label) is assigned to that piece of data, producing data with a correct label. Repeating this for many pieces of data produces a set of data with correct labels.

(2) A model is trained using the set of data with correct labels.

(3) Using the trained model, the class to which input data belongs (called the correct class) is estimated.

Improving the accuracy of correct-class estimation with a trained model generated by the above procedure requires more data with correct labels. Increasing the number of classes to be estimated likewise requires more data with correct labels. However, assigning correct labels is very labor-intensive, and it is difficult to generate a large amount of data with correct labels.

Non-Patent Document 1 therefore proposes the following machine learning procedure.

(1) One label indicating a class to which a piece of data does not belong (called a complementary label) is assigned to that piece of data, producing data with a complementary label, that is, a label that is known to be wrong. Repeating this for many pieces of data produces a set of data with complementary labels.

(2) A model is trained using the set of data with complementary labels.

Because assigning a complementary label to a piece of data takes less effort than assigning a correct label, more labeled data can be generated. In addition, the accuracy of correct-class estimation with a trained model generated by the procedure of Non-Patent Document 1 is comparable to that of a trained model generated by the general procedure.

Non-Patent Document 1: T. Ishida et al., “Complementary-Label Learning for Arbitrary Losses and Models,” ICML 2019, pp.2971-2980, 2019.

The technology of Non-Patent Document 1 targets problems in which each piece of data belongs to exactly one class (hereinafter referred to as multi-class classification problems). It cannot handle problems in which a piece of data does not necessarily belong to exactly one class, that is, it may belong to two or more classes (hereinafter referred to as multi-label classification problems).

The present invention therefore aims to provide a model learning technique that uses complementary labels and targets multi-label classification problems.

One aspect of the present invention includes: a training data generation unit that generates, from a set of data to which one or more complementary labels are assigned, a set of data to which one complementary label is assigned (hereinafter referred to as a training data set); a first risk calculation unit that, using a batch that is a subset of the training data set, calculates a risk ^-R(g:^-loss) of a decision function g with respect to a loss function ^-loss calculated by the following formula;

(where K is the number of classes into which data is classified, and loss is the loss function used when training a model using a set of data to which one or more correct labels are assigned); and a model update unit that updates the model using the risk ^-R(g:^-loss).

Another aspect of the present invention includes: a training data generation unit that generates, from a set of data to which one or more complementary labels are assigned, a set of data to which one complementary label is assigned (hereinafter referred to as a training data set); a first risk calculation unit that, using a batch that is a subset of the training data set, calculates a risk ^-R(g:^-loss) of a decision function g with respect to a loss function ^-loss calculated by the following formula;

(where K is the number of classes into which data is classified, and loss is the loss function used when training a model using a set of data to which one or more correct labels are assigned); a second risk calculation unit that, using a batch that is a subset of a set of data to which one or more correct labels are assigned, calculates a risk R(g:loss) of the decision function g with respect to the loss function loss; a third risk calculation unit that calculates a risk R(g) from the risk ^-R(g:^-loss) and the risk R(g:loss) by R(g)=α^-R(g:^-loss)+(1-α)R(g:loss) (where α is a constant satisfying 0<α<1); and a model update unit that updates the model using the risk R(g).

According to the present invention, model learning using complementary labels becomes possible for multi-label classification problems.

FIG. 1 is a block diagram showing the configuration of the model learning device 100.
FIG. 2 is a flowchart showing the operation of the model learning device 100.
FIG. 3 is a block diagram showing the configuration of the model learning device 200.
FIG. 4 is a flowchart showing the operation of the model learning device 200.
FIG. 5 is a diagram showing an example of the functional configuration of a computer that realizes each device in the embodiments of the present invention.

Embodiments of the present invention are described in detail below. Components having the same function are given the same number, and duplicate explanations are omitted.

Before describing each embodiment, the notation used in this specification is explained.

^ (caret) denotes a superscript. For example, x^{y^z} means that y^z is a superscript of x, and x_{y^z} means that y^z is a subscript of x. _ (underscore) denotes a subscript. For example, x^{y_z} means that y_z is a superscript of x, and x_{y_z} means that y_z is a subscript of x.

In addition, marks such as "^" and "~" in ^x and ~x for a character x should properly be written directly above "x", but are written as ^x and ~x because of the notational constraints of the specification.

<Technical Background>
In the embodiments of the present invention, a model for a multi-label classification problem is trained using data to which one or more complementary labels are assigned. The embodiments can also be used to train a model for a multi-class classification problem using data to which one or more complementary labels are assigned.

In the following, the number of classes to which data may belong, that is, the number of classes into which data is classified, is K, and the set of correct labels is [K]={1, …, K}. Here, the correct labels are the label indicating membership in class 1, …, and the label indicating membership in class K, denoted 1, …, K, respectively.

Consider K complementary labels. The K complementary labels are the label indicating that data does not belong to class 1, …, and the label indicating that data does not belong to class K, denoted ^-1, …, ^-K, respectively. The set of complementary labels is [^-K]={^-1, …, ^-K}.

In the embodiments of the present invention, data to which one or more complementary labels are assigned is handled as follows. For data to which M complementary labels are assigned, where M is 2 or more, M pieces of data each having one complementary label are generated from that data. Using the symbols introduced below, this reads: "from data (x_i, (^-y_1, …, ^-y_M)) to which M complementary labels are assigned (where x_i∈χ and ^-y_1, …, ^-y_M∈[^-K]), the data (x_i, ^-y_1), …, (x_i, ^-y_M), each with one complementary label, are generated."
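The expansion from M complementary labels to M single-label records can be sketched as follows. This is a minimal illustration, not the implementation of the specification: the function name `expand_complementary_labels`, the toy records, and the use of 0-based class indices in place of ^-1, …, ^-K are all assumptions made for the example.

```python
from typing import Iterable, List, Sequence, Tuple, TypeVar

X = TypeVar("X")


def expand_complementary_labels(
    dataset: Iterable[Tuple[X, Sequence[int]]],
) -> List[Tuple[X, int]]:
    """Turn each record (x, (cy_1, ..., cy_M)) into M records (x, cy_m)."""
    expanded: List[Tuple[X, int]] = []
    for x, comp_labels in dataset:
        if len(comp_labels) == 0:
            raise ValueError("each record needs at least one complementary label")
        for cy in comp_labels:
            expanded.append((x, cy))
    return expanded


# Hypothetical toy input: "clip_a" is known NOT to be class 0 and NOT class 2.
toy = [("clip_a", (0, 2)), ("clip_b", (1,))]
print(expand_complementary_labels(toy))
# prints [('clip_a', 0), ('clip_a', 2), ('clip_b', 1)]
```

A record that already has a single complementary label passes through unchanged, so the same function covers both the M=1 and the M≥2 case.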

Correct label learning and complementary label learning are described in detail below. Here, correct label learning means training a model using a set of data to which one or more correct labels are assigned, and complementary label learning means training a model using a set of data to which one or more complementary labels are assigned.

[Correct label learning]
Let χ be the set of data and g:χ→R^K be a decision function, and let g_k be the k-th element of g. Let D be a distribution on χ×[K] (where the random variables of D are written (X, Y)~D), let P_k=P(X|Y=k) and π_k=P(Y=k) for k=1, …, K, and let loss:[K]×R^K→R_+ be the loss function of correct label learning. Then the risk R(g:loss) of the decision function g with respect to the loss function loss and the distribution D is expressed by the following equation.

R(g:loss) = E_{(X,Y)~D}[loss(Y, g(X))]

The risk R(g:loss) can also be expressed by the following equation.

R(g:loss) = Σ_{k=1}^K π_k E_{P_k}[loss(k, g(X))]

When training a model for a multi-label classification problem, the following binary cross entropy or multi-label soft margin can be used as the loss function loss. Here, y_k is 1 if class k is present and 0 otherwise.

(Binary cross entropy)

(Multi-label soft margin)

When training a model for a multi-class classification problem, softmax cross entropy can be used as the loss function loss.
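The formulas for the two multi-label losses are not reproduced in this text. As a rough illustration only, the sketch below uses the commonly used logit-based definitions: binary cross entropy summed over the K classes, and a soft-margin variant that averages the same per-class terms over K. Whether the specification normalizes the sums in this way is an assumption, and the function names are hypothetical.

```python
import math
from typing import Sequence


def log_sigmoid(z: float) -> float:
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def binary_cross_entropy(y: Sequence[int], g: Sequence[float]) -> float:
    """Sum over classes of -[y_k log s(g_k) + (1 - y_k) log(1 - s(g_k))].

    y_k is 1 if class k is present and 0 otherwise; g_k is the k-th score
    of the decision function; s is the sigmoid function.
    """
    # log(1 - sigmoid(z)) equals log_sigmoid(-z).
    return -sum(
        yk * log_sigmoid(gk) + (1 - yk) * log_sigmoid(-gk)
        for yk, gk in zip(y, g)
    )


def multilabel_soft_margin(y: Sequence[int], g: Sequence[float]) -> float:
    """Class-averaged variant: the same per-class terms divided by K."""
    return binary_cross_entropy(y, g) / len(g)


# With all scores at 0 every class contributes log 2.
y_example, g_example = (1, 0, 1), (0.0, 0.0, 0.0)
print(binary_cross_entropy(y_example, g_example))
print(multilabel_soft_margin(y_example, g_example))
```

Working on raw scores rather than on sigmoid outputs avoids taking the logarithm of a value that has been rounded to 0 or 1.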

[Complementary label learning]
Let ^-D be a distribution on χ×[^-K] (where the random variables of ^-D are written (X, ^-Y)~^-D), let ^-P_k=P(X|^-Y=k) and ^-π_k=P(^-Y=k) for k=1, …, K, and let ^-loss:[^-K]×R^K→R_+ be the loss function of complementary label learning. Then the risk ^-R(g:^-loss) of the decision function g with respect to the loss function ^-loss and the distribution ^-D is expressed by the following equation.

^-R(g:^-loss) = E_{(X,^-Y)~^-D}[^-loss(^-Y, g(X))]

The risk ^-R(g:^-loss) can also be expressed by the following equation.

^-R(g:^-loss) = Σ_{k=1}^K ^-π_k E_{^-P_k}[^-loss(k, g(X))]

The loss function ^-loss is calculated from the loss function loss by the following formula.

A model can also be trained using both a set of data to which one or more correct labels are assigned and a set of data to which one or more complementary labels are assigned. In this case, the risk R(g) of the decision function g may be calculated using the following formula.

R(g) = α^-R(g:^-loss) + (1-α)R(g:loss)

Here, α is a constant satisfying 0<α<1.
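The formula that derives ^-loss from loss is not reproduced in this text. For orientation, Non-Patent Document 1 (Ishida et al.) gives, for the multi-class case, a complementary loss of the form ^-loss(k, g) = -(K-1)·loss(k, g) + Σ_{j=1}^K loss(j, g). The sketch below implements that form on the assumption that the formula used here has the same shape, which this text does not confirm. The function names and the choice of softmax cross entropy as the base loss are likewise illustrative.

```python
import math
from typing import Callable, Sequence

# A base loss takes a class index and the score vector g(x).
Loss = Callable[[int, Sequence[float]], float]


def softmax_cross_entropy(k: int, g: Sequence[float]) -> float:
    """-log softmax(g)[k], computed with the log-sum-exp trick."""
    m = max(g)
    log_z = m + math.log(sum(math.exp(v - m) for v in g))
    return log_z - g[k]


def complementary_loss(loss: Loss, comp_k: int, g: Sequence[float]) -> float:
    """-(K - 1) * loss(comp_k, g) + sum_j loss(j, g), with K = len(g).

    comp_k is the 0-based index of the class the data is known NOT to belong to.
    """
    num_classes = len(g)
    total = sum(loss(j, g) for j in range(num_classes))
    return -(num_classes - 1) * loss(comp_k, g) + total


# Scores that strongly favor class 0 are penalized when the complementary
# label says "not class 0", and barely penalized for "not class 1".
scores = (5.0, 0.0, 0.0)
print(complementary_loss(softmax_cross_entropy, 0, scores))
print(complementary_loss(softmax_cross_entropy, 1, scores))
```

Because the base loss is passed in as a callable, the same wrapper can be tried with other per-class losses without changing the complementary-loss code.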

<First Embodiment>
The model learning device 100 is described below with reference to FIGS. 1 and 2. FIG. 1 is a block diagram showing the configuration of the model learning device 100. FIG. 2 is a flowchart showing the operation of the model learning device 100. As shown in FIG. 1, the model learning device 100 includes a training data generation unit 110, a first risk calculation unit 120, a model update unit 130, a termination condition determination unit 140, and a recording unit 190. The recording unit 190 is a component that records, as appropriate, information necessary for the processing of the model learning device 100.

The operation of the model learning device 100 is described with reference to FIG. 2.

In S110, the training data generation unit 110 generates, from a set of data to which one or more complementary labels are assigned (hereinafter referred to as the input complementary-labeled data set), a set of data to which one complementary label is assigned (hereinafter referred to as the training data set).

In S120, the first risk calculation unit 120 calculates, using a batch that is a subset of the training data set generated in S110, the risk ^-R(g:^-loss) of the decision function g with respect to the loss function ^-loss calculated by the following formula.

(where K is the number of classes into which data is classified, and loss is the loss function used when training a model using a set of data to which one or more correct labels are assigned)

In S130, the model update unit 130 updates the model using the risk ^-R(g:^-loss) calculated in S120. Specifically, the model update unit 130 updates the model so as to minimize the risk ^-R(g:^-loss). When training a model for acoustic event detection, the model may be the DNN model with a self-attention mechanism described in Reference Non-Patent Document 1. When training a model for image segmentation, the model may be the DNN model with class activation maps described in Reference Non-Patent Document 2.

(Reference Non-Patent Document 1: Q. Kong et al., “Sound Event Detection of Weakly Labelled Data with CNN-Transformer and Automatic Threshold Optimization,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol.28, pp.2450-2460, 2020.)
(Reference Non-Patent Document 2: Y. Wang et al., “Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation,” CVPR 2020, pp.12275-12284, 2020.)

In S140, if a predetermined termination condition is satisfied, the termination condition determination unit 140 ends the processing and takes the model obtained in S130 as the trained model; otherwise, the processing returns to S120. The termination condition may be, for example, whether the upper limit on the number of model updates has been reached.
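The loop formed by S120 to S140 can be sketched as follows. This is a toy under stated assumptions, not the training procedure of the specification: `risk_fn` stands in for the batch estimate of ^-R(g:^-loss), the model is reduced to a flat list of parameters, and a finite-difference gradient step replaces whatever optimizer a DNN model would actually use. The text only says that the model is updated so as to minimize the risk and that an update-count limit may serve as the termination condition.

```python
import random
from typing import Any, Callable, List, Sequence

# risk_fn(parameters, batch) returns the batch risk as a float.
RiskFn = Callable[[List[float], Sequence[Any]], float]


def train(
    training_set: Sequence[Any],
    params: Sequence[float],
    risk_fn: RiskFn,
    batch_size: int,
    max_updates: int,
    lr: float = 0.1,
    eps: float = 1e-4,
    seed: int = 0,
) -> List[float]:
    """Repeat batch risk evaluation and a gradient step until the update limit."""
    rng = random.Random(seed)
    theta = list(params)
    for _ in range(max_updates):  # S140: stop once the update limit is reached
        # S120: draw a batch (a subset of the training data set) and score it.
        batch = rng.sample(list(training_set), min(batch_size, len(training_set)))
        base = risk_fn(theta, batch)
        # S130: step the parameters in the direction that lowers the batch risk,
        # using a forward finite-difference estimate of the gradient.
        grad = []
        for i in range(len(theta)):
            bumped = theta[:i] + [theta[i] + eps] + theta[i + 1:]
            grad.append((risk_fn(bumped, batch) - base) / eps)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


# Hypothetical stand-in risk: mean squared distance of one parameter to the data.
def toy_risk(theta: List[float], batch: Sequence[float]) -> float:
    return sum((theta[0] - x) ** 2 for x in batch) / len(batch)


print(train([1.0, 1.0, 1.0, 1.0], [0.0], toy_risk, batch_size=2, max_updates=200))
```

Finite differences keep the example dependency-free; with a real DNN the gradient would normally come from automatic differentiation instead.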

(Modification)
The processing of S110 to S140 above may instead be performed as follows.

In S110, the training data generation unit 110 generates, from a set of data to which one or more complementary labels are assigned (hereinafter referred to as the input complementary-labeled data set), a batch that is a subset of the input complementary-labeled data set, and generates from that batch a set of data to which one complementary label is assigned (hereinafter referred to as the training data set).

In S120, the first risk calculation unit 120 calculates, using the training data set generated in S110, the risk ^-R(g:^-loss) of the decision function g with respect to the loss function ^-loss calculated by formula (1).

In S130, the model update unit 130 updates the model using the risk ^-R(g:^-loss) calculated in S120.

In S140, if a predetermined termination condition is satisfied, the termination condition determination unit 140 ends the processing and takes the model obtained in S130 as the trained model; otherwise, the processing returns to S110.
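The difference in this modification is the order of operations: a batch is drawn from the multi-label records first, and only that batch is expanded into single-label records. A minimal sketch, assuming hypothetical names and 0-based class indices for the complementary labels:

```python
import random
from typing import List, Sequence, Tuple, TypeVar

X = TypeVar("X")


def batch_then_expand(
    input_set: Sequence[Tuple[X, Sequence[int]]],
    batch_size: int,
    rng: random.Random,
) -> List[Tuple[X, int]]:
    """Draw a batch of multi-label records, then split each into (x, cy) pairs."""
    batch = rng.sample(list(input_set), min(batch_size, len(input_set)))
    return [(x, cy) for x, comp_labels in batch for cy in comp_labels]


# Hypothetical toy input set with one to three complementary labels per record.
toy_input = [("a", (0, 2)), ("b", (1,)), ("c", (0, 1, 2))]
print(batch_then_expand(toy_input, batch_size=2, rng=random.Random(0)))
```

With this ordering, all single-label records derived from one input record fall into the same step, and the number of pairs used per step varies with how many complementary labels the sampled records carry.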

According to the embodiment of the present invention, model learning using complementary labels becomes possible for multi-label classification problems. Using data with complementary labels allows training on more data, which improves the accuracy of estimation with the trained model and makes estimation over a larger number of classes possible.

<Second Embodiment>
The model learning device 200 is described below with reference to FIGS. 3 and 4. FIG. 3 is a block diagram showing the configuration of the model learning device 200. FIG. 4 is a flowchart showing the operation of the model learning device 200. As shown in FIG. 3, the model learning device 200 includes a training data generation unit 110, a first risk calculation unit 120, a second risk calculation unit 220, a third risk calculation unit 230, a model update unit 240, a termination condition determination unit 140, and a recording unit 190. The recording unit 190 is a component that records, as appropriate, information necessary for the processing of the model learning device 200.

The operation of the model learning device 200 is described with reference to FIG. 4.

In S110, the training data generation unit 110 generates, from a set of data to which one or more complementary labels are assigned (hereinafter referred to as the input complementary-labeled data set), a set of data to which one complementary label is assigned (hereinafter referred to as the training data set).

In S120, the first risk calculation unit 120 calculates, using a batch that is a subset of the training data set generated in S110, the risk ^-R(g:^-loss) of the decision function g with respect to the loss function ^-loss calculated by the following formula.

(where K is the number of classes into which data is classified, and loss is the loss function used when training a model using a set of data to which one or more correct labels are assigned)

In S220, the second risk calculation unit 220 calculates, using a batch that is a subset of a set of data to which one or more correct labels are assigned (hereinafter referred to as the input correct-labeled data set), the risk R(g:loss) of the decision function g with respect to the loss function loss.

In S230, the third risk calculation unit 230 calculates the risk R(g) from the risk ^-R(g:^-loss) calculated in S120 and the risk R(g:loss) calculated in S220 by R(g)=α^-R(g:^-loss)+(1-α)R(g:loss) (where α is a constant satisfying 0<α<1).

In S240, the model update unit 240 updates the model using the risk R(g) calculated in S230. Specifically, the model update unit 240 updates the model so as to minimize the risk R(g). As in the first embodiment, the model described in Reference Non-Patent Document 1 can be used when training a model for acoustic event detection, and the model described in Reference Non-Patent Document 2 can be used when training a model for image segmentation.

In S140, if a predetermined termination condition is satisfied, the termination condition determination unit 140 ends the processing and takes the model obtained in S240 as the trained model; otherwise, the processing returns to S120 and S220.
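The weighting performed in S230 is a convex combination of the two batch risks. The sketch below shows that step together with a plain batch average of a per-example loss; the helper names, the toy loss, and the toy decision function are assumptions for illustration only.

```python
from typing import Any, Callable, Sequence, Tuple


def mean_loss(
    loss: Callable[[Any, Any], float],
    batch: Sequence[Tuple[Any, Any]],
    g: Callable[[Any], Any],
) -> float:
    """Empirical risk on a batch: the average of loss(label, g(x))."""
    return sum(loss(label, g(x)) for x, label in batch) / len(batch)


def combined_risk(comp_risk: float, ordinary_risk: float, alpha: float) -> float:
    """R(g) = alpha * comp_risk + (1 - alpha) * ordinary_risk, with 0 < alpha < 1."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    return alpha * comp_risk + (1.0 - alpha) * ordinary_risk


# Hypothetical numbers: a complementary-label risk of 0.8 and a correct-label
# risk of 0.4, with three quarters of the weight on the correct-label risk.
print(combined_risk(0.8, 0.4, alpha=0.25))
```

Rejecting α=0 and α=1 mirrors the stated constraint 0<α<1; at those end points one of the two data sets would be ignored entirely.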

(Modification)
The processing of S110 to S140 above may instead be performed as follows.

In S110, the training data generation unit 110 generates, from a set of data to which one or more complementary labels are assigned (hereinafter referred to as the input complementary-labeled data set), a batch that is a subset of the input complementary-labeled data set, and generates from that batch a set of data to which one complementary label is assigned (hereinafter referred to as the training data set).

In S120, the first risk calculation unit 120 calculates, using the training data set generated in S110, the risk ^-R(g:^-loss) of the decision function g with respect to the loss function ^-loss calculated by formula (2).

In S220, the second risk calculation unit 220 calculates, using a batch that is a subset of a set of data to which one or more correct labels are assigned (hereinafter referred to as the input correct-labeled data set), the risk R(g:loss) of the decision function g with respect to the loss function loss.

In S230, the third risk calculation unit 230 calculates the risk R(g) from the risk ^-R(g:^-loss) calculated in S120 and the risk R(g:loss) calculated in S220 by R(g)=α^-R(g:^-loss)+(1-α)R(g:loss) (where α is a constant satisfying 0<α<1).

In S240, the model update unit 240 updates the model using the risk R(g) calculated in S230.

In S140, if a predetermined termination condition is satisfied, the termination condition determination unit 140 ends the processing and takes the model obtained in S240 as the trained model; otherwise, the processing returns to S110 and S220.

<Additional Notes>
FIG. 5 is a diagram showing an example of the functional configuration of a computer that realizes each of the devices described above (that is, each node). The processing in each device can be carried out by loading into the recording unit 2020 a program that causes the computer to function as that device, and having the control unit 2010, the input unit 2030, the output unit 2040, and so on operate.

本発明の装置は、例えば単一のハードウェアエンティティとして、キーボードなどが接続可能な入力部、液晶ディスプレイなどが接続可能な出力部、ハードウェアエンティティの外部に通信可能な通信装置（例えば通信ケーブル）が接続可能な通信部、ＣＰＵ（Central Processing Unit、キャッシュメモリやレジスタなどを備えていてもよい）、メモリであるＲＡＭやＲＯＭ、ハードディスクである外部記憶装置並びにこれらの入力部、出力部、通信部、ＣＰＵ、ＲＡＭ、ＲＯＭ、外部記憶装置の間のデータのやり取りが可能なように接続するバスを有している。また必要に応じて、ハードウェアエンティティに、ＣＤ－ＲＯＭなどの記録媒体を読み書きできる装置（ドライブ）などを設けることとしてもよい。このようなハードウェア資源を備えた物理的実体としては、汎用コンピュータなどがある。 The device of the present invention, for example as a single hardware entity, has an input section to which a keyboard or the like can be connected, an output section to which an LCD display or the like can be connected, a communication section to which a communication device (for example a communication cable) capable of communicating with the outside of the hardware entity can be connected, a CPU (which may also have a central processing unit, cache memory, registers, etc.), memories such as RAM and ROM, an external storage device such as a hard disk, and buses connecting these input section, output section, communication section, CPU, RAM, ROM, and external storage device so that data can be exchanged between them. If necessary, the hardware entity may also be provided with a device (drive) capable of reading and writing recording media such as a CD-ROM. An example of a physical entity equipped with such hardware resources is a general-purpose computer.

ハードウェアエンティティの外部記憶装置には、上述の機能を実現するために必要となるプログラムおよびこのプログラムの処理において必要となるデータなどが記憶されている（外部記憶装置に限らず、例えばプログラムを読み出し専用記憶装置であるＲＯＭに記憶させておくこととしてもよい）。また、これらのプログラムの処理によって得られるデータなどは、ＲＡＭや外部記憶装置などに適宜に記憶される。The external storage device of the hardware entity stores the programs required to realize the above-mentioned functions and the data required in processing these programs (not limited to an external storage device, but for example the programs may be stored in a ROM, which is a read-only storage device). Data obtained by processing these programs is stored appropriately in RAM, an external storage device, etc.

ハードウェアエンティティでは、外部記憶装置（あるいはＲＯＭなど）に記憶された各プログラムとこの各プログラムの処理に必要なデータが必要に応じてメモリに読み込まれて、適宜にＣＰＵで解釈実行・処理される。その結果、ＣＰＵが所定の機能（上記、…部、…手段などと表した各構成部）を実現する。In a hardware entity, each program stored in an external storage device (or ROM, etc.) and the data required to process each program are loaded into memory as needed, and interpreted, executed, and processed by the CPU as appropriate. As a result, the CPU realizes a specified function (each component represented as the above, ... unit, ... means, etc.).

本発明は上述の実施形態に限定されるものではなく、本発明の趣旨を逸脱しない範囲で適宜変更が可能である。また、上記実施形態において説明した処理は、記載の順に従って時系列に実行されるのみならず、処理を実行する装置の処理能力あるいは必要に応じて並列的にあるいは個別に実行されるとしてもよい。The present invention is not limited to the above-described embodiments, and appropriate modifications can be made without departing from the spirit of the present invention. Furthermore, the processes described in the above embodiments are not limited to being executed chronologically in the order described, but may be executed in parallel or individually depending on the processing capacity of the device executing the processes or as necessary.

既述のように、上記実施形態において説明したハードウェアエンティティ（本発明の装置）における処理機能をコンピュータによって実現する場合、ハードウェアエンティティが有すべき機能の処理内容はプログラムによって記述される。そして、このプログラムをコンピュータで実行することにより、上記ハードウェアエンティティにおける処理機能がコンピュータ上で実現される。As mentioned above, when the processing functions of the hardware entities (the devices of the present invention) described in the above embodiments are realized by a computer, the processing contents of the functions that the hardware entities should have are described by a program. Then, by executing this program on a computer, the processing functions of the hardware entities are realized on the computer.

この処理内容を記述したプログラムは、コンピュータで読み取り可能な記録媒体に記録しておくことができる。コンピュータで読み取り可能な記録媒体としては、例えば、磁気記録装置、光ディスク、光磁気記録媒体、半導体メモリ等どのようなものでもよい。具体的には、例えば、磁気記録装置として、ハードディスク装置、フレキシブルディスク、磁気テープ等を、光ディスクとして、ＤＶＤ（Digital Versatile Disc）、ＤＶＤ－ＲＡＭ（Random Access Memory）、ＣＤ－ＲＯＭ（Compact Disc Read Only Memory）、ＣＤ－Ｒ（Recordable）／ＲＷ（ReWritable）等を、光磁気記録媒体として、ＭＯ（Magneto-Optical disc）等を、半導体メモリとしてＥＥＰ－ＲＯＭ（Electronically Erasable and Programmable-Read Only Memory）等を用いることができる。 The program describing the processing contents can be recorded on a computer-readable recording medium. Examples of computer-readable recording media include magnetic recording devices, optical disks, magneto-optical recording media, and semiconductor memories. Specifically, for example, a hard disk drive, a flexible disk, a magnetic tape, etc. can be used as a magnetic recording device; a DVD (Digital Versatile Disc), a DVD-RAM (Random Access Memory), a CD-ROM (Compact Disc Read Only Memory), a CD-R (Recordable)/RW (ReWritable), etc. can be used as an optical disk; an MO (Magneto-Optical disc) can be used as a magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) can be used as a semiconductor memory.

また、このプログラムの流通は、例えば、そのプログラムを記録したＤＶＤ、ＣＤ－ＲＯＭ等の可搬型記録媒体を販売、譲渡、貸与等することによって行う。さらに、このプログラムをサーバコンピュータの記憶装置に格納しておき、ネットワークを介して、サーバコンピュータから他のコンピュータにそのプログラムを転送することにより、このプログラムを流通させる構成としてもよい。 The program may be distributed, for example, by selling, transferring, lending, etc. portable recording media such as DVDs and CD-ROMs on which the program is recorded. Furthermore, the program may be distributed by storing the program in a storage device of a server computer and transferring the program from the server computer to other computers via a network.

このようなプログラムを実行するコンピュータは、例えば、まず、可搬型記録媒体に記録されたプログラムもしくはサーバコンピュータから転送されたプログラムを、一旦、自己の記憶装置に格納する。そして、処理の実行時、このコンピュータは、自己の記憶装置に格納されたプログラムを読み取り、読み取ったプログラムに従った処理を実行する。また、このプログラムの別の実行形態として、コンピュータが可搬型記録媒体から直接プログラムを読み取り、そのプログラムに従った処理を実行することとしてもよく、さらに、このコンピュータにサーバコンピュータからプログラムが転送されるたびに、逐次、受け取ったプログラムに従った処理を実行することとしてもよい。また、サーバコンピュータから、このコンピュータへのプログラムの転送は行わず、その実行指示と結果取得のみによって処理機能を実現する、いわゆるＡＳＰ（Application Service Provider）型のサービスによって、上述の処理を実行する構成としてもよい。なお、本形態におけるプログラムには、電子計算機による処理の用に供する情報であってプログラムに準ずるもの（コンピュータに対する直接の指令ではないがコンピュータの処理を規定する性質を有するデータ等）を含むものとする。A computer that executes such a program, for example, first stores in its own storage device the program recorded on a portable recording medium or the program transferred from a server computer. Then, when executing a process, the computer reads the program stored in its own storage device and executes the process according to the read program. As another execution form of this program, the computer may read the program directly from the portable recording medium and execute the process according to the program, or may execute the process according to the received program each time a program is transferred from the server computer to this computer. In addition, the server computer may not transfer the program to this computer, but may execute the above-mentioned process by a so-called ASP (Application Service Provider) type service that realizes the processing function only by issuing an execution instruction and obtaining the results. Note that the program in this embodiment includes information used for processing by an electronic computer that is equivalent to a program (such as data that is not a direct command to the computer but has properties that specify the processing of the computer).

In this embodiment, the hardware entity is configured by executing a predetermined program on a computer; however, at least part of the processing may instead be implemented in hardware.

The foregoing description of the embodiments of the invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Modifications and variations are possible in light of the above teachings. The embodiments were chosen and described to provide the best illustration of the principles of the invention and to enable those skilled in the art to utilize the invention in various embodiments and with various modifications as suited to the practical use contemplated. All such modifications and variations are within the scope of the invention as defined by the appended claims when interpreted in accordance with the breadth to which they are fairly, legally, and equitably entitled.

Claims

A model learning device comprising:
a training data generation unit that generates, from a set of data to which one or more complementary labels are assigned, a set of data to which one complementary label is assigned (hereinafter referred to as a training data set);
a first risk calculation unit that calculates, using a batch that is a subset of the training data set, a risk $\bar{R}(g : \overline{\mathrm{loss}})$ of a decision function $g$ with respect to a loss function $\overline{\mathrm{loss}}$ calculated by the following formula

(where $K$ is the number of classes into which data is classified, and $\mathrm{loss}$ is the loss function used when training a model using a set of data to which one or more correct labels are assigned);
a second risk calculation unit that calculates, using a batch that is a subset of a set of data to which one or more correct labels are assigned, a risk $R(g : \mathrm{loss})$ of the decision function $g$ with respect to the loss function $\mathrm{loss}$;
a third risk calculation unit that calculates a risk $R(g)$ from the risk $\bar{R}(g : \overline{\mathrm{loss}})$ and the risk $R(g : \mathrm{loss})$ by $R(g) = \alpha \bar{R}(g : \overline{\mathrm{loss}}) + (1 - \alpha) R(g : \mathrm{loss})$ (where $\alpha$ is a constant satisfying $0 < \alpha < 1$); and
a model updating unit that updates the model using the risk $R(g)$.
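As a non-limiting illustration of the training data generation recited above, the following Python sketch turns data carrying one or more complementary labels into data carrying exactly one complementary label, and then cuts the result into batches. The claim does not state how the conversion is performed; emitting one example per complementary label is an assumption made here for illustration, and the names `to_single_complementary` and `make_batches` are hypothetical.

```python
import random

def to_single_complementary(dataset):
    """Expand (x, complementary_labels) pairs into (x, complementary_label) pairs.

    ASSUMPTION: one output example is emitted per complementary label.
    The claim only requires that each output item carries one such label.
    """
    training_set = []
    for x, comp_labels in dataset:
        if not comp_labels:
            raise ValueError("each item needs at least one complementary label")
        for label in sorted(comp_labels):  # sorted for a reproducible order
            training_set.append((x, label))
    return training_set

def make_batches(training_set, batch_size, seed=0):
    """Shuffle a copy of the training set and cut it into batches (subsets)."""
    items = list(training_set)
    random.Random(seed).shuffle(items)
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

# Item "x1" is known NOT to belong to classes 0 and 2; "x2" not to class 1.
multi = [("x1", {0, 2}), ("x2", {1})]
single = to_single_complementary(multi)
print(single)  # [('x1', 0), ('x1', 2), ('x2', 1)]
batches = make_batches(single, batch_size=2)
```

Each batch produced this way is a subset of the training data set and could be passed to the first risk calculation.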

A model learning method comprising:
a training data generation step in which a model learning device generates, from a set of data to which one or more complementary labels are assigned, a set of data to which one complementary label is assigned (hereinafter referred to as a training data set);
a first risk calculation step in which the model learning device calculates, using a batch that is a subset of the training data set, a risk $\bar{R}(g : \overline{\mathrm{loss}})$ of a decision function $g$ with respect to a loss function $\overline{\mathrm{loss}}$ calculated by the following formula

(where $K$ is the number of classes into which data is classified, and $\mathrm{loss}$ is the loss function used when training a model using a set of data to which one or more correct labels are assigned);
a second risk calculation step in which the model learning device calculates, using a batch that is a subset of a set of data to which one or more correct labels are assigned, a risk $R(g : \mathrm{loss})$ of the decision function $g$ with respect to the loss function $\mathrm{loss}$;
a third risk calculation step in which the model learning device calculates a risk $R(g)$ from the risk $\bar{R}(g : \overline{\mathrm{loss}})$ and the risk $R(g : \mathrm{loss})$ by $R(g) = \alpha \bar{R}(g : \overline{\mathrm{loss}}) + (1 - \alpha) R(g : \mathrm{loss})$ (where $\alpha$ is a constant satisfying $0 < \alpha < 1$); and
a model updating step in which the model learning device updates the model using the risk $R(g)$.
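As a non-limiting illustration of the three risk calculations recited above, the following Python sketch computes the two batch risks and combines them with the weight α. The formula defining the complementary loss is not reproduced in this text, so the sketch assumes a form known from the complementary-label learning literature, namely -(K-1)·loss(ȳ, g(x)) + Σ_j loss(j, g(x)); this is an assumption and may differ from the claimed formula. The function names, the identity decision function `g`, and the choice of a sigmoid-based per-class loss are likewise hypothetical. The model update itself (for example, a gradient step on R(g)) is omitted.

```python
import math

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def loss(labels, scores):
    """Ordinary loss: sigmoid cross entropy averaged over the K classes.
    `labels` is the set of correct class indices, `scores` is g(x)."""
    terms = [softplus(-s) if j in labels else softplus(s)
             for j, s in enumerate(scores)]
    return sum(terms) / len(scores)

def complementary_loss(comp_label, scores):
    """ASSUMED complementary loss: -(K-1)*loss(ybar) + sum_j loss(j)."""
    k = len(scores)
    return (-(k - 1) * loss({comp_label}, scores)
            + sum(loss({j}, scores) for j in range(k)))

def batch_risk(batch, g, loss_fn):
    """Empirical risk: mean of loss_fn(label, g(x)) over the batch."""
    return sum(loss_fn(y, g(x)) for x, y in batch) / len(batch)

def combined_risk(comp_batch, labeled_batch, g, alpha):
    """R(g) = alpha * Rbar(g : loss_bar) + (1 - alpha) * R(g : loss)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    r_bar = batch_risk(comp_batch, g, complementary_loss)
    r = batch_risk(labeled_batch, g, loss)
    return alpha * r_bar + (1.0 - alpha) * r

def g(x):
    """Hypothetical decision function: inputs are already score vectors."""
    return x

comp_batch = [([1.0, -1.0], 0)]       # (x, one complementary label)
labeled_batch = [([1.0, -1.0], {0})]  # (x, set of correct labels)
risk = combined_risk(comp_batch, labeled_batch, g, alpha=0.5)
```

With two classes, the assumed complementary loss for label 0 equals the ordinary loss for label 1, which matches the intuition that "not class 0" means "class 1".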

The model learning method according to claim 2,
wherein the loss function is binary cross entropy or multi-label soft margin.
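As a non-limiting illustration of the two loss functions named above, the following Python sketch gives commonly used definitions: binary cross entropy computed on per-class probabilities, and multi-label soft margin computed on raw per-class scores, both averaged over the classes. These definitions follow the conventions of common machine learning libraries and are an assumption; the claim does not fix the reduction or the input scale. The function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(probs, targets):
    """Mean over classes of -[t*log(p) + (1-t)*log(1-p)], with p in (0, 1)."""
    total = sum(t * math.log(p) + (1 - t) * math.log(1.0 - p)
                for p, t in zip(probs, targets))
    return -total / len(probs)

def multilabel_soft_margin(scores, targets):
    """Mean over classes of -[t*log(sigmoid(s)) + (1-t)*log(1-sigmoid(s))]."""
    def softplus(z):  # stable log(1 + exp(z))
        return max(z, 0.0) + math.log1p(math.exp(-abs(z)))
    total = sum(t * softplus(-s) + (1 - t) * softplus(s)
                for s, t in zip(scores, targets))
    return total / len(scores)

scores = [2.0, -1.0, 0.5]   # raw outputs of the decision function g
targets = [1, 0, 1]         # multi-hot vector of correct labels
bce = binary_cross_entropy([sigmoid(s) for s in scores], targets)
msm = multilabel_soft_margin(scores, targets)
```

Under these definitions the two losses coincide when the probabilities are the sigmoid of the scores; they differ mainly in whether the model outputs probabilities or raw scores.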

A program for causing a computer to execute the model learning method according to claim 2 or 3.