JP7618917B2

JP7618917B2 - Machine learning device, machine learning method, and machine learning program

Info

Publication number: JP7618917B2
Application number: JP2021010177A
Authority: JP
Inventors: 尹誠楊; 英樹竹原; 晋吾木田
Original assignee: JVCKenwood Corp
Current assignee: JVCKenwood Corp
Priority date: 2021-01-26
Filing date: 2021-01-26
Publication date: 2025-01-22
Anticipated expiration: 2041-01-26
Also published as: JP2022114065A

Description

The present invention relates to machine learning technology.

Humans can acquire new knowledge through long-term experience while retaining old knowledge without forgetting it. In contrast, the knowledge of a convolutional neural network (CNN) depends on the dataset used for training, and adapting to a change in the data distribution requires retraining the CNN parameters on the entire dataset. As a CNN learns new tasks, its estimation accuracy on old tasks decreases. Thus, when a CNN is trained sequentially, catastrophic forgetting is unavoidable: the learning results for old tasks are lost while a new task is learned.

Incremental learning, also called continual learning, has been proposed as a way to avoid catastrophic forgetting. PackNet is one continual learning method.

Patent Document 1 discloses an information processing system that creates training data by labeling images similar to an input verification image, and trains a supervised image classifier using the created training data.

Patent Document 1: JP 2017-111731 A

PackNet, a continual learning method, can avoid the catastrophic forgetting problem. However, input data that lacks label information cannot be used for continual learning. In addition, the learning may become biased if the frequency with which input data is used for learning is not controlled.

The present invention was made in view of these circumstances. Its purpose is to provide a machine learning technique that can assign label information to input data and control the frequency with which the input data is used for learning.

To solve the above problem, a machine learning device according to one aspect of the present invention includes the following units:
a classification unit, which calculates the similarity between a feature map of an input task and a representative feature map of each pre-registered task, and assigns the label of the most similar task to the input task;
a control unit, which determines whether to use the input task for continual learning;
a continual learning unit, which, when the input task is to be used for continual learning, performs continual learning using the labeled input task as training data.

Another aspect of the present invention is a machine learning method. This method includes the following steps:
a classification step of calculating the similarity between a feature map of an input task and a representative feature map of each pre-registered task, and assigning the label of the most similar task to the input task;
a control step of determining whether to use the input task for continual learning;
a continual learning step of performing continual learning using the labeled input task as training data when the input task is to be used for continual learning.

Any combination of the above components is also a valid aspect of the present invention. The same applies to any conversion of the expression of the present invention among a method, a device, a system, a recording medium, a computer program, and the like.

According to the present invention, it is possible to provide a machine learning technique that assigns label information to input data and controls the frequency with which the input data is used for learning.

FIG. 1 is a configuration diagram of a machine learning device according to an embodiment.
FIG. 2 is a configuration diagram of the screening processing unit in FIG. 1.
FIG. 3 is a configuration diagram of the screening unit in FIG. 2.
FIG. 4 is a configuration diagram of the learning control unit in FIG. 2.
FIG. 5 is a configuration diagram of the task storage unit in FIG. 1.
FIG. 6 is a diagram showing the configuration of the task database in FIG. 1.
FIG. 7 is a flowchart illustrating a continual learning procedure performed by the machine learning device of FIG. 1.
FIG. 8 is a diagram illustrating the feature maps output from each convolutional layer of a convolutional neural network.
FIG. 9 is a diagram illustrating the concatenation of feature maps output from the convolutional layers of a convolutional neural network.
FIG. 10 is a diagram illustrating a method of generating a concatenated feature map by adding feature maps that have different numbers of channels.
FIG. 11 is a diagram illustrating a method of generating a concatenated feature map by adding feature maps that differ in both the number of channels and the size.
FIG. 12 is a diagram illustrating a method of generating a concatenated feature map by combining (concatenating) feature maps.

Figure 1 is a configuration diagram of a machine learning device 100 according to an embodiment. The machine learning device 100 includes the following units:
an input unit 10;
a screening processing unit 20;
a task storage unit 30;
a continual learning unit 40;
a task database 50;
an inference unit 60;
an output unit 70.

The input unit 10 supplies tasks to be labeled to the screening processing unit 20, and supplies unknown tasks to the inference unit 60. In this example, the tasks are image recognition tasks, that is, recognition of a specific object in an image. For instance, task 1 is recognizing a cat and task 2 is recognizing a dog.

The screening processing unit 20 first compares the feature map of the input task with the representative feature maps of the tasks registered in the task database 50 to calculate their similarity. It then classifies the input task into the task with the highest similarity. Finally, it determines whether to use the input task for continual learning, based on at least one of two values: the similarity, and the frequency with which tasks having the same similarity as the input task have been used for learning.

When the image of an input task is fed to a neural network model serving as the object recognition model, a feature map is output. If the feature map of the input image is similar to, for example, the feature map of a cat, the input image is recognized as a cat image and given the label "cat".

Note that a task recognizes an object not from a single feature map but from multiple feature maps. The similarity is therefore calculated by comparing the feature map of the input task with a representative value, such as the average, of the task's multiple feature maps.

The similarity is calculated by comparing the absolute values of the elements of the two feature maps. For example, if the feature maps are 256 x 256, the absolute values of 65,536 elements are compared. A threshold is also set: if the similarity exceeds the threshold, the two feature maps are determined to overlap.

Let the elements of feature map A be a_ij and the elements of feature map B be b_ij. The absolute difference between values at the same position in A and B is then aggregated, for example, as d_1(A,B), d_2(A,B), d_∞(A,B), or d_m(A,B).
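The equations defining these distances are not present in this text. Under two assumptions, they can be written as shown below. First, d_1, d_2 and d_∞ are taken to be the usual ℓ1 (Manhattan), ℓ2 (Euclidean) and ℓ∞ (Chebyshev) distances over element-wise differences. Second, d_m is read as the Minkowski distance of order m; the text does not confirm this reading. For all of these, a smaller value means the two feature maps are more alike.

```latex
\begin{aligned}
d_1(A,B)      &= \sum_{i,j} \lvert a_{ij} - b_{ij} \rvert \\
d_2(A,B)      &= \Bigl( \sum_{i,j} \lvert a_{ij} - b_{ij} \rvert^{2} \Bigr)^{1/2} \\
d_\infty(A,B) &= \max_{i,j} \lvert a_{ij} - b_{ij} \rvert \\
d_m(A,B)      &= \Bigl( \sum_{i,j} \lvert a_{ij} - b_{ij} \rvert^{m} \Bigr)^{1/m}
\end{aligned}
```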

In the above description, the similarity between feature maps is calculated from the absolute differences between values at the same position in the two feature maps. Other methods may also be used. For example, a feature-map sum of absolute differences can be computed for each feature map as SAD = SAD_H + SAD_V, where SAD_H is the horizontal sum and SAD_V the vertical sum of absolute differences. Feature maps A and B may then be determined to overlap if the difference between SAD_A (the SAD of A) and SAD_B (the SAD of B) is smaller than a threshold.

For simplicity, assume a 3 x 3 feature map with the following rows:
first row: a1, a2, a3;
second row: a4, a5, a6;
third row: a7, a8, a9.
SAD_H and SAD_V are then given by:
SAD_H = |a1-a2| + |a2-a3| + |a4-a5| + |a5-a6| + |a7-a8| + |a8-a9|
SAD_V = |a1-a4| + |a2-a5| + |a3-a6| + |a4-a7| + |a5-a8| + |a6-a9|

As another way to calculate the similarity, a comparison based on Euclidean distance or cosine distance may be used.
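Below is a minimal NumPy sketch of the SAD comparison above, together with the cosine distance mentioned as an alternative. The function names and the threshold value are hypothetical. The sketch generalizes the 3 x 3 formulas to any 2-D map by taking differences between horizontally and vertically adjacent elements.

```python
import numpy as np

def sad_h(fm):
    """Horizontal SAD: sum of |difference| between horizontally adjacent elements."""
    return float(np.abs(np.diff(fm, axis=1)).sum())

def sad_v(fm):
    """Vertical SAD: sum of |difference| between vertically adjacent elements."""
    return float(np.abs(np.diff(fm, axis=0)).sum())

def sad(fm):
    """Feature-map SAD = SAD_H + SAD_V."""
    return sad_h(fm) + sad_v(fm)

def overlaps_by_sad(fm_a, fm_b, threshold):
    """Treat A and B as overlapping when their SAD values differ by less than threshold."""
    return abs(sad(fm_a) - sad(fm_b)) < threshold

def cosine_distance(fm_a, fm_b):
    """1 - cosine similarity of the flattened maps (0 means same direction)."""
    a = np.ravel(fm_a).astype(float)
    b = np.ravel(fm_b).astype(float)
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# 3 x 3 example with a1..a9 = 1..9 (row-major)
a = np.arange(1, 10, dtype=float).reshape(3, 3)
print(sad_h(a), sad_v(a))  # 6.0 18.0
```

Note that SAD summarizes each map by a single scalar before comparison. Two maps with very different content can therefore still have similar SAD values, which is one reason the text lists Euclidean and cosine distance as alternatives.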

The task storage unit 30 updates the task database 50 with the classified input task. It also controls, based on at least one of the frequency and the similarity, whether the classified input task is provided to the continual learning unit 40. The input task is used for continual learning in either of these cases:
its frequency is less than the average frequency of the tasks registered in the task database 50;
its similarity is less than a predetermined threshold.
It is not used for continual learning when its frequency is equal to or greater than the average frequency, or when its similarity is equal to or greater than the predetermined threshold. The frequency of an input task can be evaluated, for example, by the number of times the input task has been learned. Alternatively, it can be evaluated by a relative index that compares the number of times the input task has been used with the number of times the other tasks registered in the task database 50 have been used.
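The decision rule above can be sketched as a single predicate. This is an illustrative sketch only: the function and parameter names are hypothetical, and so is the behavior when the database is empty. The frequency is taken as a plain count, which is one of the measures the text allows.

```python
def use_for_continual_learning(task_frequency, registered_frequencies,
                               similarity, similarity_threshold):
    """Return True if the classified input task should be used for continual learning.

    Rule from the text: use the task when its frequency is below the average
    frequency of the registered tasks, or when its similarity is below the
    threshold. Otherwise skip it.
    """
    if not registered_frequencies:
        return True  # hypothetical choice: nothing registered yet, so learn
    average = sum(registered_frequencies) / len(registered_frequencies)
    return task_frequency < average or similarity < similarity_threshold

print(use_for_continual_learning(1, [1, 5, 6], 0.9, 0.8))  # True  (1 < average 4)
print(use_for_continual_learning(6, [1, 5, 6], 0.9, 0.8))  # False (frequent and similar)
```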

When the continual learning unit 40 receives a classified input task from the task storage unit 30, it performs continual learning of the object recognition model using the classified (labeled) input task as training data.

The inference unit 60 uses the object recognition model trained by the continual learning unit 40 to infer the classification of an unknown task received from the input unit 10. The output unit 70 outputs the inference result produced by the inference unit 60.

Figure 2 is a configuration diagram of the screening processing unit 20 in Figure 1. The screening processing unit 20 includes a screening unit 80 and a learning control unit 90.

The screening unit 80 compares the representative feature map of each task read from the task database 50 with the feature map of the input task. It classifies the input task into the most similar task and assigns label information to the input task. The screening unit 80 supplies the label information and similarity of the input task to the learning control unit 90, and supplies the feature map of the input task to the task storage unit 30.

The learning control unit 90 determines whether to provide the input task to the continual learning unit 40. The decision is based on the frequency with which tasks having the same similarity as the input task have been learned.

Figure 3 is a configuration diagram of the screening unit 80. The screening unit 80 includes a representative feature map acquisition unit 82, a feature map conversion unit 84, and a classification unit 86.

The representative feature map acquisition unit 82 reads the representative feature map of each task from the task database 50 and supplies it to the classification unit 86. The feature map conversion unit 84 calculates the feature map of the input task and supplies it to the classification unit 86.

The classification unit 86 calculates the similarity between the feature map of the input task and the representative feature map of each task in the task database 50. It classifies the input task into the task with the highest similarity and assigns that task's label to the input task. The classification unit 86 then supplies the label and similarity of the input task to the learning control unit 90, and the feature map of the input task to the task storage unit 30.
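The classification unit's "most similar task wins" step can be sketched as an argmax over similarity scores. This sketch assumes cosine similarity as the measure, which the text permits as one option; the absolute-difference measures above could be substituted by taking, for example, the negative distance. The dictionary layout and the function names are hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity of two feature maps after flattening."""
    a = np.ravel(a).astype(float)
    b = np.ravel(b).astype(float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(input_map, representative_maps):
    """Return (label, similarity) of the most similar registered task.

    representative_maps: dict mapping label -> representative feature map.
    """
    scores = {label: cosine_similarity(input_map, rep)
              for label, rep in representative_maps.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

reps = {"cat": np.array([[1.0, 0.0], [0.0, 0.0]]),
        "dog": np.array([[0.0, 0.0], [0.0, 1.0]])}
label, sim = classify(np.array([[0.9, 0.1], [0.0, 0.0]]), reps)
print(label)  # cat
```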

Figure 4 is a configuration diagram of the learning control unit 90. The learning control unit 90 includes a frequency information acquisition unit 92, a similarity information acquisition unit 94, and a learning frequency control unit 96.

The frequency information acquisition unit 92 acquires from the task database 50 the frequency with which tasks having the same label and the same similarity as the input task have been learned, and supplies it to the learning frequency control unit 96. The similarity information acquisition unit 94 supplies the similarity of the input task to the learning frequency control unit 96. In addition, the frequency information acquisition unit 92 supplies the frequency to the task storage unit 30. Likewise, the similarity information acquisition unit 94 supplies the similarity to the task storage unit 30.

The learning frequency control unit 96 determines, based on at least one of the frequency and the similarity, whether to use the input task for continual learning. It then notifies the task storage unit 30 of that decision.

In continual learning, the results depend strongly on the order of the training data, and any bias in how the training data is presented has a large effect on the results. To maintain the diversity of the training data, the frequency with which tasks having the same similarity have been learned is measured, and the frequency distribution of the training data is kept uniform.

The learning frequency control unit 96 uses the input task for continual learning in either of these cases:
the learning frequency of the input task is lower than the average learning frequency of the tasks registered in the task database 50;
the similarity of the input task is lower than a predetermined threshold.
Actively using tasks with a low learning frequency or low similarity for continual learning improves inference accuracy evenly across all tasks.

For example, suppose cat images have already been learned frequently and cat recognition is already sufficiently accurate. Learning an input cat image again would then bias the learning. Skipping additional learning once a task has been learned sufficiently keeps the number of learning iterations even across tasks, which lowers the risk of catastrophic forgetting. Conversely, if dog images have been learned less often than other tasks, continuing to learn input dog images evens out the number of learning iterations across tasks.

Figure 5 is a configuration diagram of the task storage unit 30. The task storage unit 30 includes an update determination unit 32, which decides how the tasks registered in the task database 50 are updated.

The update determination unit 32 updates the tasks registered in the task database 50 as necessary, based on the following inputs:
the feature map;
the similarity;
the frequency;
whether continual learning is to be performed.
The learning frequency of each task is stored per similarity value, and the frequency information is updated every time. A task's representative feature map is updated using the feature maps of input tasks that are provided for continual learning. Input tasks that are not provided for continual learning are ignored and are not used to update the representative feature map.
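The sketch below shows one possible layout for a task record and its update. The text does not say how similarity values are grouped for the per-similarity frequency, so the 10-bin quantization here is a hypothetical choice. The sketch also assumes the representative map is a mean, updated incrementally. As in step S40, the frequency count is incremented on every input, and the representative map changes only when the input is used for learning. Counting only inputs that are actually learned would be an equally plausible reading of "learning frequency".

```python
from dataclasses import dataclass, field

import numpy as np

@dataclass
class TaskRecord:
    """Hypothetical task-database entry: representative map plus per-bin frequency."""
    representative_map: np.ndarray
    num_maps: int = 1                               # maps averaged into the representative
    frequency: dict = field(default_factory=dict)   # similarity bin -> count

def similarity_bin(similarity, num_bins=10):
    """Quantize a similarity in [0, 1] into one of num_bins bins (assumed scheme)."""
    return min(int(similarity * num_bins), num_bins - 1)

def update_task(record, feature_map, similarity, use_for_learning):
    """Update the frequency for this similarity bin; update the map only if learned."""
    b = similarity_bin(similarity)
    record.frequency[b] = record.frequency.get(b, 0) + 1
    if use_for_learning:
        # incremental mean: new_mean = old_mean + (x - old_mean) / n
        record.num_maps += 1
        record.representative_map = (
            record.representative_map
            + (feature_map - record.representative_map) / record.num_maps
        )

rec = TaskRecord(np.zeros((2, 2)))
update_task(rec, np.full((2, 2), 2.0), 0.95, use_for_learning=True)
update_task(rec, np.full((2, 2), 10.0), 0.55, use_for_learning=False)
print(rec.frequency)  # {9: 1, 5: 1}
```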

The task storage unit 30 tells the continual learning unit 40 whether continual learning is to be performed. If the input task is to be used, the continual learning unit 40 performs continual learning using the input task as training data. Otherwise, it ignores the input task.

Figure 6 is a diagram showing the configuration of the task database 50. The task database 50 stores a representative feature map 52 and a frequency 54 for each task. The representative value of a task's feature maps is, for example, the mean, median, mode, maximum, or minimum of the task's multiple feature maps.
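Computing a representative feature map from a task's feature maps reduces to an element-wise reduction over a stack of maps, as sketched below. The function name is hypothetical. The mode is omitted because it is not well defined for continuous-valued maps without some binning, which the text does not specify.

```python
import numpy as np

def representative_map(feature_maps, method="mean"):
    """Element-wise representative value over a list of same-shaped feature maps."""
    stack = np.stack(feature_maps, axis=0)          # (num_maps, ...) array
    reducers = {"mean": np.mean, "median": np.median,
                "max": np.max, "min": np.min}
    return reducers[method](stack, axis=0)

maps = [np.array([[1.0, 2.0]]), np.array([[3.0, 6.0]]), np.array([[5.0, 4.0]])]
print(representative_map(maps, "mean"))  # [[3. 4.]]
```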

Figure 7 is a flowchart explaining the continual learning procedure performed by the machine learning device 100 of Figure 1.

The input unit 10 receives a task whose label is unknown and provides it to the screening processing unit 20 (S10).

In the screening processing unit 20, the classification unit 86 of the screening unit 80 classifies the input task and assigns it a label (S20). The classification is based on the similarity between the feature map of the input task and the representative feature map of each task registered in the task database 50.

In the screening processing unit 20, the learning frequency control unit 96 of the learning control unit 90 determines whether to use the input task for continual learning (S30). The decision is based on at least one of the frequency and the similarity of the input task.

The update determination unit 32 of the task storage unit 30 updates the frequency associated with the similarity in the task database 50. If the input task is used for continual learning, it also updates the representative feature map of the registered task using the feature map of the input task (S40).

If the input task is to be used for continual learning (Y in S50), the continual learning unit 40 performs continual learning of the object recognition model using the labeled input task (S60). If the input task is not to be used for continual learning (N in S50), the input task is ignored and the process proceeds to step S70.

If tasks remain (N in S70), the process returns to step S10 and the next task is input. If no tasks remain (Y in S70), continual learning ends.
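Steps S10 to S70 can be sketched as a single loop. The following are all hypothetical stand-ins for a real CNN and training procedure:
the task-database layout, with one frequency per task and no similarity bins;
the use of cosine similarity;
the callables `extract_features` and `train_step`.
The sketch only illustrates the control flow of the flowchart.

```python
import numpy as np

def cosine_similarity(a, b):
    a = np.ravel(a).astype(float)
    b = np.ravel(b).astype(float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def continual_learning_loop(inputs, task_db, extract_features, train_step,
                            similarity_threshold=0.8):
    """Run S10-S70 over `inputs`; return a list of (label, used_for_learning).

    task_db: dict label -> {"rep": ndarray, "freq": int, "n": int} (hypothetical layout).
    """
    decisions = []
    for x in inputs:                                      # S10: next input task
        fm = extract_features(x)
        # S20: classify by the most similar representative feature map
        scores = {lbl: cosine_similarity(fm, r["rep"]) for lbl, r in task_db.items()}
        label = max(scores, key=scores.get)
        sim = scores[label]
        # S30: learn if under-represented or dissimilar
        avg_freq = np.mean([r["freq"] for r in task_db.values()])
        rec = task_db[label]
        use = bool(rec["freq"] < avg_freq or sim < similarity_threshold)
        # S40: update frequency; update representative map (running mean) if used
        rec["freq"] += 1
        if use:
            rec["n"] += 1
            rec["rep"] = rec["rep"] + (fm - rec["rep"]) / rec["n"]
        # S50/S60: train on the labeled input, or ignore it
        if use:
            train_step(x, label)
        decisions.append((label, use))
    return decisions                                      # S70: no tasks remain

db = {"cat": {"rep": np.array([1.0, 0.0]), "freq": 5, "n": 1},
      "dog": {"rep": np.array([0.0, 1.0]), "freq": 1, "n": 1}}
trained = []
result = continual_learning_loop([np.array([1.0, 0.1]), np.array([0.1, 1.0])], db,
                                 extract_features=lambda x: x,
                                 train_step=lambda x, lbl: trained.append(lbl))
print(result)  # [('cat', False), ('dog', True)]
```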

The feature maps described in the above embodiment are explained in detail below with reference to Figures 8 to 12.

Figure 8 is a diagram explaining the feature maps output from each convolutional layer of a convolutional neural network. In this embodiment, a feature map is the output data of a convolutional layer when input data is fed to the convolutional neural network. As shown in Figure 8, there are two types of feature maps: intermediate output data from the convolutional layer of each intermediate layer, and final output data from the convolutional layer of the final layer.

Figure 9 is a diagram explaining the concatenation of feature maps output from the convolutional layers of a convolutional neural network. When obtaining the feature map of input data, using only the feature map from the final convolutional layer can miss some features. Therefore, as shown in Figure 9, it is preferable to build the feature map of the input data by concatenating the final feature map with the feature maps of selected intermediate convolutional layers. For example, feature map 2 of layer 2 may be concatenated with the final feature map. As another example, feature map 2 of layer 2, feature map 5 of layer 5, and the final feature map may be concatenated. Many such combinations are possible, and they can be chosen according to the purpose.

Figure 10 is a diagram explaining a method of generating a concatenated feature map by adding feature maps with different numbers of channels. In this example, feature map 2 (64 channels, size 256 x 256) is concatenated with feature map 4 (128 channels, size 256 x 256). To give feature map 2 the same number of channels as feature map 4, a convolutional layer is placed after feature map 2. This layer has 64 input channels, 128 output channels, and a 1 x 1 filter. When feature map 2 is passed through it, the output new feature map 2 keeps the same size but has 128 channels, that is, 128 channels at 256 x 256. Adding new feature map 2 and feature map 4, which now have the same number of channels, produces a concatenated feature map with 128 channels and a size of 256 x 256.
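A 1 x 1 convolution is a per-pixel linear map across channels, so it can be sketched in NumPy as a tensor contraction over the channel axis. The weights below are random stand-ins for the learned parameters of the 64-to-128 convolutional layer described above. The helper names are hypothetical, and float32 is used to keep the 256 x 256 example light.

```python
import numpy as np

def conv1x1(fm, weight, bias=None):
    """1x1 convolution of a (C_in, H, W) map with a (C_out, C_in) weight matrix.

    Each output pixel is a linear combination of the input channels at the
    same pixel, so the spatial size H x W is unchanged.
    """
    out = np.tensordot(weight, fm, axes=([1], [0]))  # -> (C_out, H, W)
    if bias is not None:
        out = out + bias[:, None, None]
    return out

def fuse_by_addition(fm_a, fm_b, weight):
    """Project fm_a to fm_b's channel count with a 1x1 conv, then add element-wise."""
    projected = conv1x1(fm_a, weight)
    if projected.shape != fm_b.shape:
        raise ValueError("channel count and size must match after projection")
    return projected + fm_b

rng = np.random.default_rng(0)
fm2 = rng.standard_normal((64, 256, 256), dtype=np.float32)   # feature map 2
fm4 = rng.standard_normal((128, 256, 256), dtype=np.float32)  # feature map 4
w = rng.standard_normal((128, 64), dtype=np.float32) * 0.1     # stand-in for learned weights
fused = fuse_by_addition(fm2, fm4, w)
print(fused.shape)  # (128, 256, 256)
```

In a deep learning framework, the same layer would normally be a trainable 2-D convolution with kernel size 1, rather than a fixed matrix.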

Figure 11 is a diagram explaining a method of generating a concatenated feature map by adding feature maps that differ in both the number of channels and the size. In this example, feature map 2 (64 channels, size 256 x 256) is concatenated with feature map 4 (128 channels, size 64 x 64).

First, the channel counts are unified. To give feature map 2 the same number of channels as feature map 4, its channel count is increased by zero padding, that is, the values of the added channels are filled with zeros. This converts feature map 2 to 128 channels.

Next, the sizes are unified. To give feature map 2 the same size as feature map 4, feature map 2 is downsampled by pooling with a stride of 4. Examples of pooling include max pooling, which outputs the maximum value, and average pooling, which outputs the average value; other pooling methods may also be used. The result is a new feature map 2 with 128 channels and a size of 64 x 64.

Adding new feature map 2 and feature map 4, which now have the same number of channels and the same size, produces a concatenated feature map with 128 channels and a size of 64 x 64. Alternatively, the channel count of feature map 2 can be matched to that of feature map 4 by placing a convolutional layer after feature map 2, as described for Figure 10.
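The channel zero-padding and stride-4 downsampling steps can be sketched as follows. The text gives only the stride, so this sketch assumes non-overlapping windows (kernel size equal to stride). The helper names are hypothetical.

```python
import numpy as np

def zero_pad_channels(fm, target_channels):
    """Append all-zero channels to a (C, H, W) map until it has target_channels."""
    extra = target_channels - fm.shape[0]
    if extra < 0:
        raise ValueError("target_channels must be >= the current channel count")
    return np.pad(fm, ((0, extra), (0, 0), (0, 0)))  # constant mode pads with 0

def pool2d(fm, k, mode="max"):
    """Non-overlapping k x k pooling (kernel = stride = k) on a (C, H, W) map."""
    c, h, w = fm.shape
    if h % k or w % k:
        raise ValueError("H and W must be divisible by k")
    blocks = fm.reshape(c, h // k, k, w // k, k)
    if mode == "max":
        return blocks.max(axis=(2, 4))
    if mode == "mean":
        return blocks.mean(axis=(2, 4))
    raise ValueError(f"unknown pooling mode: {mode}")

rng = np.random.default_rng(0)
fm2 = rng.standard_normal((64, 256, 256))            # feature map 2
fm4 = rng.standard_normal((128, 64, 64))             # feature map 4
new_fm2 = pool2d(zero_pad_channels(fm2, 128), k=4)   # -> (128, 64, 64)
fused = new_fm2 + fm4
print(fused.shape)  # (128, 64, 64)
```

With max pooling, the zero-padded channels stay exactly zero. After addition, those channels therefore carry only feature map 4's values.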

Figure 12 is a diagram explaining a method of generating a concatenated feature map by combining feature maps. In this example, feature map 2 (64 channels, size 256 x 256) is concatenated with feature map 4 (128 channels, size 64 x 64).

Because the sizes differ, feature map 2 is first downsampled to the size of feature map 4 by pooling with a stride of 4. This gives a new feature map 2 with 64 channels and a size of 64 x 64. New feature map 2 and feature map 4 have different channel counts but the same size, so they can be concatenated, producing a concatenated feature map with 192 channels and a size of 64 x 64. In general, feature maps of the same size can be concatenated as 3-D tensors even if their channel counts differ.

For example, write the size of a feature map as [H x W], so that its dimensions are [number of channels x H x W]. Let feature map A have dimensions [64 x 64 x 64] and feature map B have dimensions [128 x 64 x 64]. Because their sizes H x W match, concatenating A and B gives dimensions [(channels of A + channels of B) x H x W] = [(64 + 128) x 64 x 64] = [192 x 64 x 64].
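Channel-wise concatenation corresponds to joining along the channel axis of (C, H, W) arrays. The sketch below assumes non-overlapping stride-4 max pooling for the downsampling step; the helper name is hypothetical.

```python
import numpy as np

def max_pool(fm, k):
    """Non-overlapping k x k max pooling on a (C, H, W) map (kernel = stride = k)."""
    c, h, w = fm.shape
    return fm.reshape(c, h // k, k, w // k, k).max(axis=(2, 4))

rng = np.random.default_rng(0)
fm2 = rng.standard_normal((64, 256, 256))   # feature map 2: 64 channels, 256 x 256
fm4 = rng.standard_normal((128, 64, 64))    # feature map 4: 128 channels, 64 x 64
new_fm2 = max_pool(fm2, 4)                  # -> (64, 64, 64)
combined = np.concatenate([new_fm2, fm4], axis=0)  # join along the channel axis
print(combined.shape)  # (192, 64, 64)
```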

In Figure 12, the sizes were unified before the feature maps were combined. However, feature maps that have the same number of channels but different sizes can also be combined. For example, combining feature map A (64 channels, size 64 x 64) with feature map B (64 channels, size 128 x 64) produces a concatenated feature map with 64 channels and a size of 192 x 64.

In general, let a feature map's size be m x n. Two feature maps can be combined when exactly one of m or n is the same for both.
If feature map A has size m x n_1 and feature map B has size m x n_2, align the rows and place the elements of feature map B to the left of the elements of feature map A. This produces a concatenated feature map of size m x (n_1 + n_2).
If feature map A has size m_1 x n and feature map B has size m_2 x n, align the columns and place the elements of feature map B below the elements of feature map A. This produces a concatenated feature map of size (m_1 + m_2) x n.
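The two spatial concatenations described above can be sketched with NumPy on (C, H, W) arrays, where m is the row count H and n is the column count W. The ordering follows the text: B is placed below A when the columns are aligned, and to the left of A when the rows are aligned. The function names are hypothetical.

```python
import numpy as np

def concat_below(a, b):
    """Same channels and width n: (C, m1, n) + (C, m2, n) -> (C, m1 + m2, n), B below A."""
    return np.concatenate([a, b], axis=1)

def concat_left(a, b):
    """Same channels and height m: (C, m, n1) + (C, m, n2) -> (C, m, n1 + n2), B left of A."""
    return np.concatenate([b, a], axis=2)

fa = np.zeros((64, 64, 64))    # feature map A: 64 channels, 64 x 64
fb = np.ones((64, 128, 64))    # feature map B: 64 channels, 128 x 64
print(concat_below(fa, fb).shape)  # (64, 192, 64)
```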

When the similarity between the input task's feature map concatenated in this way and the pre-registered representative feature map of each task is calculated, the two may differ in both the number of channels and the size. In that case, both the number of channels and the size must be aligned using the two methods illustrated by the concatenation methods of FIGS. 10 and 11.
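The methods of FIGS. 10 and 11 are not described in this section, so the following is only a sketch of the general idea under two stated assumptions: both maps are zero-padded to the larger channel count and the larger H × W, and similarity is measured as cosine similarity of the flattened maps. `align`, `cosine_similarity`, and `aligned_similarity` are hypothetical names, and neither zero-padding nor cosine similarity should be read as the patent's method.

```python
import numpy as np


def align(fmap: np.ndarray, channels: int, height: int, width: int) -> np.ndarray:
    """Zero-pad a [C x H x W] map up to the target shape (assumed strategy)."""
    c, h, w = fmap.shape
    if c > channels or h > height or w > width:
        raise ValueError("target shape must not be smaller than the input")
    return np.pad(fmap, ((0, channels - c), (0, height - h), (0, width - w)))


def cosine_similarity(x: np.ndarray, y: np.ndarray) -> float:
    """Cosine similarity of two aligned maps, flattened to vectors (assumed metric)."""
    x = x.ravel().astype(np.float64)
    y = y.ravel().astype(np.float64)
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0


def aligned_similarity(input_map: np.ndarray, representative_map: np.ndarray) -> float:
    # Align both the channel count and the spatial size to the larger of the two maps.
    target = tuple(max(p, q) for p, q in zip(input_map.shape, representative_map.shape))
    return cosine_similarity(align(input_map, *target), align(representative_map, *target))


input_map = np.ones((192, 64, 64))       # e.g. a concatenated input-task map
representative = np.ones((128, 64, 32))  # hypothetical stored representative map
print(aligned_similarity(input_map, representative))  # ~0.577, i.e. sqrt(1/3)
```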

The various processes of the machine learning device 100 described above can, of course, be implemented as an apparatus using hardware such as a CPU and memory, and can also be implemented by firmware stored in a ROM (read-only memory), flash memory, or the like, or by software running on a computer or the like. The firmware or software program can be provided recorded on a computer-readable recording medium, exchanged with a server over a wired or wireless network, or transmitted and received as data broadcasting over terrestrial or satellite digital broadcasting.

As described above, the machine learning device 100 of this embodiment can automatically label an input task based on its feature map. In addition, by deciding whether to use an input task for continual learning based on the learning frequency or the similarity of that task, the device suppresses learning bias and reduces the risk of catastrophic forgetting.
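The labeling and frequency-based decision summarized here (and recited in claims 1 and 2) can be sketched as below. This is a minimal illustration, not the device's implementation. `TaskRecord` and `screen` are hypothetical names, cosine similarity is an assumed metric, the maps are assumed to be already aligned to the same shape, and incrementing the frequency each time a task is used is assumed bookkeeping. The similarity-based decision also mentioned in the text is not modeled.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class TaskRecord:
    """Hypothetical entry of the task database: one registered task."""
    representative_map: np.ndarray  # representative feature map of the task
    frequency: int = 0              # how often the task has been used for learning


def cosine(x: np.ndarray, y: np.ndarray) -> float:
    """Cosine similarity of two equally shaped maps (assumed metric)."""
    x, y = x.ravel(), y.ravel()
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0


def screen(input_map: np.ndarray, database: dict, threshold: int):
    """Label the input task and decide whether to use it for continual learning.

    Returns (label, use_for_learning). The label is that of the registered task
    whose representative map is most similar to the input map; the input task is
    used for learning only if that task's frequency is below the threshold.
    """
    label = max(database, key=lambda name: cosine(input_map, database[name].representative_map))
    use_for_learning = database[label].frequency < threshold
    if use_for_learning:
        database[label].frequency += 1  # assumed bookkeeping after each use
    return label, use_for_learning


db = {
    "task_a": TaskRecord(np.array([1.0, 0.0])),
    "task_b": TaskRecord(np.array([0.0, 1.0]), frequency=5),
}
print(screen(np.array([0.9, 0.1]), db, threshold=3))  # ('task_a', True)
print(screen(np.array([0.1, 0.9]), db, threshold=3))  # ('task_b', False)
```

Gating on a per-task frequency counter means that tasks the model has already seen many times stop consuming training updates, which is one simple way to limit the learning bias the text describes.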

The present invention has been described above based on an embodiment. The embodiment is merely illustrative, and those skilled in the art will understand that various modifications to the combinations of its components and processes are possible and that such modifications also fall within the scope of the present invention.

10 Input unit, 20 Screening processing unit, 30 Task storage unit, 32 Update determination unit, 40 Continual learning unit, 50 Task database, 52 Representative feature map, 54 Frequency, 60 Inference unit, 70 Output unit, 80 Screening unit, 82 Representative feature map acquisition unit, 84 Feature map conversion unit, 86 Classification unit, 90 Learning control unit, 92 Frequency information acquisition unit, 94 Similarity information acquisition unit, 96 Learning frequency control unit, 100 Machine learning device.

Claims

A machine learning device comprising:
a classification unit that calculates a similarity between a feature map of an input task and a representative feature map of each task registered in advance, and assigns to the input task the label of the task having the highest similarity;
a control unit that determines whether the input task is to be used for continual learning; and
a continual learning unit that performs continual learning using the labeled input task as training data when the input task is used for continual learning,
wherein the control unit determines whether to use the input task for continual learning based on the frequency with which the task whose representative feature map has the highest similarity to the feature map of the input task has been used for learning.

The machine learning device according to claim 1, wherein the control unit uses the input task for continual learning when the frequency is less than a predetermined threshold.

A machine learning method executed by a computer, comprising:
a classification step of calculating a similarity between a feature map of an input task and a representative feature map of each task registered in advance, and assigning to the input task the label of the task having the highest similarity;
a control step of determining whether the input task is to be used for continual learning; and
a continual learning step of performing continual learning using the labeled input task as training data when the input task is used for continual learning,
wherein the control step determines whether to use the input task for continual learning based on the frequency with which the task whose representative feature map has the highest similarity to the feature map of the input task has been used for learning.

A machine learning program comprising:
a classification step of calculating a similarity between a feature map of an input task and a representative feature map of each task registered in advance, and assigning to the input task the label of the task having the highest similarity;
a control step of determining whether the input task is to be used for continual learning; and
a continual learning step of performing continual learning using the labeled input task as training data when the input task is used for continual learning,
wherein the control step determines whether to use the input task for continual learning based on the frequency with which the task whose representative feature map has the highest similarity to the feature map of the input task has been used for learning.