JP6692271B2 - Multitask processing device, multitask model learning device, and program

Info

Publication number: JP6692271B2
Application number: JP2016190234A
Authority: JP
Inventors: 卓弘金子; 薫平松; 邦夫柏野
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2016-09-28
Filing date: 2016-09-28
Publication date: 2020-05-13
Anticipated expiration: 2036-09-28
Also published as: JP2018055377A

Description

The present invention relates to a multitask processing device, a multitask model learning device, and a program.

Among systems that evaluate input data and return feedback, rule-based feedback systems and comparison-based feedback systems are known. A rule-based feedback system gives feedback against evaluation criteria that are set heuristically, for example a criterion such as “○○ should be within △△”. A comparison-based feedback system evaluates the input by its difference from a reference example performed by a professional. Learning-based feedback systems are also known (for example, Non-Patent Document 1).

Pirsiavash, “Assessing the Quality of Actions”, ECCV2014

However, a rule-based feedback system requires professional knowledge, and in a comparison-based feedback system the difference is hard to define because of individual variation.

When the learning-based feedback system described in Non-Patent Document 1 is applied to a feedback system that gives expression-improvement feedback on an input image showing a face region, then, as shown in FIG. 24, key points representing the positions of the eyes and nose are detected from the input image and a score is calculated from the key points. A feedback value is then calculated from the relationship between the detected key points and the score.

However, because key point extraction must be performed first, the score calculation depends on the key point detection accuracy. In addition, because the key points are the only information available for calculating the feedback value, the information in the input image itself cannot be reflected.

The present invention has been made to solve the above problems, and its object is to provide a multitask processing device and a program that can obtain the results of a plurality of tasks for data from a single model.

Another object of the present invention is to provide a multitask model learning device and a program that can obtain a multitask model for acquiring the results of a plurality of tasks.

To achieve the above object, the multitask processing device according to the first invention includes a task processing unit that, for input data, uses a multitask model for processing a plurality of tasks, including an evaluation task that evaluates the data and an information presentation task that extracts information from the data, to obtain outputs for the evaluation task and the information presentation task.

The multitask model may be a multitask DNN (Deep Neural Network) in which the evaluation task and the information presentation task share a feature space, and which performs the evaluation task of evaluating the data based on feature quantities in the feature space and performs the information presentation task of extracting information from the data based on feature quantities in the feature space.

The multitask processing device according to the first invention may further include a feedback generation unit that, when a target value is given for one or more evaluation tasks among the plurality of tasks, calculates the output of the information presentation task so as to approach the target value, and presents information on the output of the information presentation task based on both the output calculated by the task processing unit and the output calculated so as to approach the target value.

The data may be image information.

The multitask model may be a DNN (Deep Neural Network) having a plurality of layers, and the feedback generation unit may modify the output of any one of the plurality of layers obtained when the task processing unit applies the input data to the multitask model, thereby calculating the information that the information presentation task of the multitask model extracts when the evaluation task of the multitask model evaluates the data as having the target value.

The multitask model learning device according to the second invention includes a learning unit that, based on learning data consisting of data and the results of a plurality of tasks for the data, learns a multitask DNN (Deep Neural Network) in which the plurality of tasks share a feature space and which performs the plurality of tasks based on feature quantities in the feature space.

Here too, the data may be image information.

The program according to the present invention is a program for causing a computer to function as each unit of the above multitask processing device and the above multitask model learning device.

According to the multitask processing device and program of the present invention, outputs for the evaluation task and the information presentation task are obtained for input data by using a multitask model for processing a plurality of tasks, including an evaluation task that evaluates the data and an information presentation task that extracts information from the data. As a result, the results of a plurality of tasks for the data can be obtained from a single model.

According to the multitask model learning device and program of the present invention, a multitask model in which a plurality of tasks share a feature space, and which performs the plurality of tasks based on feature quantities in the feature space, is learned from learning data that includes data and the results of the plurality of tasks for the data. As a result, a multitask model for acquiring the results of a plurality of tasks can be obtained.

FIG. 1 is an explanatory diagram of a system that feeds back specific instructions for producing a smile for image information of a face region.
FIG. 2 is an explanatory diagram of an example of a system that presents feedback information for each piece of image information of a different face region.
FIG. 3 is a conceptual diagram of the feedback system in the present embodiment.
FIG. 4 is an explanatory diagram of the advantages of a learning-based feedback system over rule-based and comparison-based systems.
FIG. 5 is an explanatory diagram of how learning data is provided.
FIG. 6 is an explanatory diagram of examples of information presentation methods.
FIG. 7 is an explanatory diagram of an outline of the processing of the present embodiment.
FIG. 8 is a block diagram showing the configuration of the multitask processing device according to the embodiment of the present invention.
FIG. 9 is an explanatory diagram of the three formulation parts in the multitask model.
FIG. 10 is an explanatory diagram of simultaneous learning of facial expression recognition and key point extraction.
FIG. 11 is an explanatory diagram of an outline of the processing of the multitask CNN.
FIG. 12 is an explanatory diagram of an outline of the learning processing of the multitask CNN.
FIG. 13 is an explanatory diagram of processing by the feedback generation unit 30.
FIG. 14 is an explanatory diagram of processing by the feedback generation unit 30.
FIG. 15 is an explanatory diagram of processing by the feedback generation unit 30.
FIG. 16 is an explanatory diagram of an outline of the processing of the present embodiment.
FIG. 17 is a flowchart showing the learning processing routine in the multitask processing device according to the embodiment of the present invention.
FIG. 18 is a flowchart showing the multitask processing routine in the multitask processing device according to the embodiment of the present invention.
FIG. 19 is a diagram showing an example of a pseudo chart of the multitask processing routine.
FIG. 20 is a flowchart showing the feedback generation processing routine in the multitask processing device according to the embodiment of the present invention.
FIG. 21 is an explanatory diagram of an outline of the experiment.
FIG. 22 is an explanatory diagram of an outline of the experiment.
FIG. 23 is an explanatory diagram of an example of the experimental results.
FIG. 24 is an explanatory diagram of an example of application to a feedback system that gives expression-improvement feedback.

<Outline of Embodiment of the Present Invention>
The present embodiment aims to realize a feedback system. For example, in a feedback system for expression improvement, as shown in FIG. 1, specific instructions for producing a smile are fed back to the user for image information of a face region, which is one example of input data. A feedback system for face-region image information is expected to be applicable to applications such as (1) to (4) below.

(1) What expression should an actor make in a given situation?
(2) What expression should one make during an interview?
(3) What is the correct mouth movement when pronouncing English?
(4) Support for people who have difficulty controlling their facial expressions

In particular, the present embodiment attempts to realize a feedback system that generates feedback adaptively for diverse input data. For example, as shown in FIG. 2, feedback information suited to each image is presented for each piece of image information of a different face region.

FIG. 3 shows a conceptual diagram of the feedback system in the present embodiment. As shown in FIG. 3, when input data such as an image or video is input at [1.Input], the degree of achievement is checked at [2-2.Check of Achievement] based on the evaluation criteria at [2-1.Evaluation] and the target state [Target]. If the target state has not been achieved, feedback is generated at [3-2.Feedback Proposal] according to the improvement plan at [3-1.Formulation of improvement Plan], and the input data is corrected.

As FIG. 3 shows, two questions arise here: how to obtain the evaluation criteria, and how to obtain the specific instructions.

To address this, the present embodiment uses a learning-based feedback system and learns the evaluation criteria from a large amount of data. Professional knowledge is then unnecessary, because the evaluation criteria are extracted from the data. In addition, as shown in FIG. 4, a learning-based feedback system is more effective than rule-based and comparison-based systems in regions that have not yet been discovered.

Conventional techniques target Recognition, Detection, and Localization. For example, facial expression recognition, an example of Recognition, recognizes the emotion corresponding to an expression, such as Happy, Sad, or Angry. Face detection is an example of Detection. Detection of key points that represent the positions of the eyes and nose in a face region is an example of Localization. The purpose of these conventional techniques is thus to recognize the current state (What people do).

In contrast, the present embodiment performs Feedback Generation and presents, for example, the instructions needed to improve an expression. Its purpose is to learn how to change (How people can improve). Because this task differs greatly from those targeted by conventional techniques, a new learning method is required.

When building a feedback system, there are two ways to provide learning data. The first is to collect learning data for all patterns from each user. The second is to collect learning data for one pattern from each user. As shown in FIG. 5, collecting all patterns from each user has the advantage that a correct example exists for every person, but the disadvantage that the burden on each person is large. Collecting one pattern from each user has the advantage that the burden on each person is small, but the disadvantage that no correct example exists for each person. A person could also annotate the correct answers after the data is collected, but this requires expert knowledge and does not scale. For the sake of scalable data collection, the present embodiment therefore chooses to collect learning data for one pattern from each user.

As for how to present information, the formats shown in FIG. 6 are conceivable. As shown in FIG. 6, consider the case where image information of a particular user's face is input and feedback that leads the user to smile is presented. It is then necessary to choose among four options: presenting a transformed version of the same user's face image, presenting the same user's face image with indicators added, presenting a transformed version of a different user's face image, or presenting a different user's face image with indicators added.

For ease of understanding of the presented information, the present embodiment chooses to add indicators to the same user's face image and present the result as feedback.

In the present embodiment, as shown in FIG. 7, when image information of a user's face is input, two tasks are performed: facial expression recognition, which is an example of an evaluation task that evaluates the image information, and an information presentation task that extracts key points, such as the eyes and nose, from the image information. Then “smile” is set as the target, a feedback plan that leads the user to smile is generated by using the learned strategy, and the generated feedback plan is displayed as arrows on the image information.

Embodiments of the present invention are described in detail below with reference to the drawings.

<Configuration of Multitask Processing Device According to Embodiment of the Present Invention>

The configuration of the multitask processing device according to the embodiment of the present invention is described next. As shown in FIG. 8, the multitask processing device 100 according to the embodiment of the present invention can be configured as a computer including a CPU, a RAM, and a ROM that stores various data and the programs for executing the processing routines described later. Functionally, as shown in FIG. 8, the multitask processing device 100 includes an input unit 10, a calculation unit 20, and an output unit 40.

The input unit 10 receives learning data. The input unit 10 also receives input data. The learning data includes data and the results of a plurality of tasks for that data. The present embodiment is described using the case where both the learning data and the input data are image information of a person's face region.

The calculation unit 20 includes a learning data storage unit 22, a learning unit 24, a model storage unit 26, a task processing unit 28, and a feedback generation unit 30.

The learning data storage unit 22 stores the learning data received by the input unit 10. The learning data consists of image information, which is an example of data, together with an evaluation of the image information and the key points of the image information, which are examples of the results of a plurality of tasks for the data. As the key points, information on the positions of both eyes, the nose, and both corners of the mouth is attached to the image information. As the evaluation, information representing a facial expression (for example, anger, happiness, or neutral) is attached to the image information.
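One way to picture a single learning-data record is the minimal Python sketch below. It is illustrative only: the class name `TrainingSample`, the key point names, and the pixel layout are assumptions introduced here, not details given in the source.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical names for the five key points listed in the text
# (both eyes, the nose, and both corners of the mouth).
KEYPOINT_NAMES = ("left_eye", "right_eye", "nose", "mouth_left", "mouth_right")


@dataclass
class TrainingSample:
    image: List[List[float]]              # grayscale pixels, rows x cols (assumed layout)
    keypoints: List[Tuple[float, float]]  # one (x, y) pair per entry in KEYPOINT_NAMES
    expression: str                       # evaluation label, e.g. "anger", "happiness", "neutral"

    def __post_init__(self) -> None:
        # Reject records whose key point annotation is incomplete.
        if len(self.keypoints) != len(KEYPOINT_NAMES):
            raise ValueError("expected one (x, y) pair per key point")


# A toy 4x4 image with both task results attached.
sample = TrainingSample(
    image=[[0.0] * 4 for _ in range(4)],
    keypoints=[(1.0, 1.0), (3.0, 1.0), (2.0, 2.0), (1.0, 3.0), (3.0, 3.0)],
    expression="happiness",
)
```

Each record carries the input and the results of both tasks, which is what allows the two tasks to be learned together from the same images.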

Based on the learning data stored in the learning data storage unit 22, the learning unit 24 learns a multitask model in which a plurality of tasks share a feature space and which performs the plurality of tasks based on feature quantities in the feature space. In the present embodiment, it learns a multitask DNN (Deep Neural Network), which is an example of a multitask model in which an evaluation task that evaluates image information and an information presentation task that extracts key points from the image information share a feature space. Using a multitask model makes it possible to process a plurality of tasks and obtain outputs for them. The present embodiment is described using the case where an evaluation task and an information presentation task, which are examples of the plurality of tasks, are processed. The multitask DNN therefore performs the evaluation task based on feature quantities in the feature space and performs the information presentation task based on feature quantities in the same feature space.

The multitask model of the present embodiment is a model for performing an evaluation task that evaluates image information and an information presentation task that extracts key points, which are an example of information, from the image information. The present embodiment is described using the case where the multitask model is a CNN (Convolutional Neural Network), which is an example of a DNN (Deep Neural Network) having a plurality of layers. The multitask model used in the present embodiment is therefore a multitask CNN having a plurality of layers.

FIG. 9 shows the three parts of the multitask model that are formulated. As shown in FIG. 9, the present embodiment needs formulations for facial expression recognition, which is an example of an evaluation task, for key point extraction, which is the information presentation task, and for feedback planning.

Facial expression recognition is formulated as a classification problem: given image information x_0 as input, identify the expression that x_0 shows. For example, y^c calculated by the following equation (1) is identified as the expression shown by the image information x_0. Here f denotes the multitask CNN and w denotes the parameters of the multitask CNN. The index c denotes facial expression recognition, which is an example of an evaluation task, and the index r denotes key point extraction, which is the information presentation task.

(1)

Key point extraction is formulated as a regression problem: given image information x_0 as input, extract the key points of x_0. For example, y^r calculated by the following equation (2) is extracted as the key points of the image information x_0.

(2)

In feedback planning, when the image information x_0 and a target value y^{c*}, which is the desired facial expression recognition result, are input, feedback planning is performed according to, for example, the following equation (3).

(3)

In this problem setting, the correct feedback y^{r*} is unknown, so it cannot be learned explicitly. In the present embodiment, the strategy is instead acquired implicitly through simultaneous learning.

Specifically, as shown in FIG. 10, the relationship (strategy) between the expression y^c and the key points y^r is acquired implicitly by learning facial expression recognition and key point extraction simultaneously. Feedback planning is then performed by using the acquired strategy.

As concrete processing of the multitask CNN, as shown in FIG. 11, the image information x_0 received by the input unit 10 is applied to the multitask CNN. Facial expression recognition is performed by evaluating x_0 as one of the expressions, and the key points of x_0 are extracted. In FIG. 11, conv denotes a convolutional layer and fc denotes a fully connected layer. The input image information x_0 is processed by the multitask CNN as shown in the following equation, and the feature quantity x^L is extracted.

The multitask CNN then performs facial expression recognition according to the following equation (4), taking the extracted feature quantity x^L as input.

(4)

The multitask CNN also performs key point extraction according to the following equation (5), taking the extracted feature quantity x^L as input.

(5)
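The shared-feature forward pass described above can be sketched in plain Python as follows. This is a simplified illustration, not the model of the source: small fully connected layers stand in for the convolutional layers, the expression output is reduced to a single smile probability, and all function names, shapes, and weights are assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def affine(w: Matrix, b: Vector, x: Sequence[float]) -> Vector:
    """Compute W x + b for one layer."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v: Sequence[float]) -> Vector:
    return [max(0.0, a) for a in v]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(x0: Sequence[float],
            shared: Sequence[Tuple[Matrix, Vector]],
            w_c: Vector, b_c: float,
            w_r: Matrix, b_r: Vector) -> Tuple[Vector, float, Vector]:
    """Return (shared feature x^L, smile probability, key point outputs)."""
    x = list(x0)
    for w, b in shared:            # shared layers: both tasks use these features
        x = relu(affine(w, b, x))
    p_smile = sigmoid(sum(wi * xi for wi, xi in zip(w_c, x)) + b_c)  # evaluation head
    keypoints = affine(w_r, b_r, x)                                  # regression head
    return x, p_smile, keypoints


# Toy example: one shared identity layer, a 2-dimensional feature, two outputs.
features, p_smile, keypoints = forward(
    x0=[1.0, 2.0],
    shared=[([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])],
    w_c=[1.0, -0.5], b_c=0.0,
    w_r=[[1.0, 1.0], [2.0, 0.0]], b_r=[0.0, 1.0],
)
```

Because both heads read the same `features` vector, any change to that vector moves the expression score and the key points together, which is the property the feedback generation relies on.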

FIG. 12 shows an example of the concrete learning processing. In the present embodiment, as shown in FIG. 12, facial expression recognition and key point extraction are learned simultaneously. The relationship between expressions and key points is thereby embedded in the shared feature space.

For facial expression recognition, the parameters w^{s1}, ..., w^{sL}, w_c of the multitask CNN are learned so as to minimize the cross-entropy error shown in the following equation (6). Here y_i^c is a label representing the expression: it takes the value 1 for a smile and 0 otherwise. N is the number of learning data samples.

(6)
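As a rough illustration of a cross-entropy error over binary smile labels, the sketch below uses the standard binary form. The function name, the averaging over N, and the clipping constant `eps` are assumptions made here; the exact form of equation (6) is not reproduced in this text.

```python
import math
from typing import Sequence


def cross_entropy(labels: Sequence[int], probs: Sequence[float],
                  eps: float = 1e-12) -> float:
    """Mean binary cross-entropy between 0/1 labels and predicted smile probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# An undecided predictor (p = 0.5 everywhere) scores ln 2 per sample.
undecided = cross_entropy([1, 0], [0.5, 0.5])
confident_right = cross_entropy([1, 0], [0.9, 0.1])
confident_wrong = cross_entropy([1, 0], [0.1, 0.9])
```

The loss falls as the predicted probability moves toward the label, so minimizing it pushes the evaluation head toward the annotated expressions.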

For key point extraction, the parameters w^{s1}, ..., w^{sL}, w_r of the multitask CNN are learned so as to minimize the squared error shown in the following equation (7).

(7)
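A squared-error loss for the key point regression can be sketched in the same style. The function name and the averaging over samples are assumptions for illustration, since equation (7) itself is not reproduced in this text.

```python
from typing import Sequence


def squared_error(targets: Sequence[Sequence[float]],
                  preds: Sequence[Sequence[float]]) -> float:
    """Mean over samples of the summed squared key point coordinate errors."""
    total = 0.0
    for t, p in zip(targets, preds):
        total += sum((ti - pi) ** 2 for ti, pi in zip(t, p))
    return total / len(targets)


# Two samples with one (x, y) key point each: the first is off by (3, 4), the second is exact.
loss = squared_error([[0.0, 0.0], [1.0, 1.0]], [[3.0, 4.0], [1.0, 1.0]])
```

In a joint training setup the two losses would typically be combined, for example by a weighted sum, so that gradients from both tasks update the shared parameters; the weighting is a design choice not specified in the text above.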

The model storage unit 26 stores the parameters w^{s1}, ..., w^{sL}, w_c, w_r of the multitask CNN learned by the learning unit 24.

The task processing unit 28 applies the multitask CNN stored in the model storage unit 26 to the image information received by the input unit 10 and obtains outputs for the plurality of tasks. In the present embodiment, it evaluates the image information by facial expression recognition and extracts the key points of the image information.

When a target value is given for facial expression recognition, which is an example of an evaluation task among the plurality of tasks, the feedback generation unit 30 calculates the output of key point extraction, which is an example of an information presentation task, so that the evaluation approaches the target value. The feedback generation unit 30 then presents information on the output of key point extraction based on both the key point output calculated by the task processing unit 28 and the key point output calculated so as to approach the target value.

Specifically, based on the evaluation result from the task processing unit 28, the target value for the facial expression evaluation, and the multitask CNN stored in the model storage unit 26, the feedback generation unit 30 calculates the key points that the multitask CNN extracts when its facial expression recognition evaluates the input as having the target value. It then presents the difference between the key points extracted by the task processing unit 28 and the calculated key points.
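The difference between the two sets of key points can be expressed as one displacement vector per key point, which is what an arrow overlay would draw. The sketch below is illustrative; the function name and the coordinate values are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def keypoint_arrows(current: Sequence[Point], target: Sequence[Point]) -> List[Point]:
    """Return (dx, dy) from each currently extracted key point to its target position."""
    return [(tx - cx, ty - cy) for (cx, cy), (tx, ty) in zip(current, target)]


# Hypothetical mouth corners: the target positions are wider apart and higher
# (image y grows downward, so a smaller y means higher in the image).
current = [(10.0, 20.0), (30.0, 20.0)]
target = [(8.0, 18.0), (32.0, 18.0)]
arrows = keypoint_arrows(current, target)
```

Each arrow starts at the current key point and points toward where that key point would be if the expression were evaluated as the target.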

More specifically, the feedback generation unit 30 modifies the output of any one of the plurality of layers obtained when the image information input by the task processing unit 28 is applied to the multitask CNN. It repeats the modification of that layer's output until the facial expression recognition of the multitask CNN evaluates the input as the target value. In this way, the feedback generation unit 30 calculates the key points that the multitask CNN extracts when its facial expression recognition evaluates the input as the target value.

FIG. 13 is an explanatory diagram of the processing performed by the feedback generation unit 30. As shown in FIG. 13, when the current state "Not Smile" is obtained as the evaluation result from the task processing unit 28, the feedback generation unit 30 takes the target value "Smile" as the correct answer and calculates the loss function given by the following equation (6A).

(6A)
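The body of equation (6A) is not reproduced in this text. As a rough guide only, a cross-entropy loss that takes the target class as the correct answer is commonly written as below. The symbols y_k (the output of the facial expression recognition for class k) and t_k (1 for the target class such as "Smile", 0 otherwise) are assumptions introduced here for illustration, and the form may differ from the equation in the original publication.

```latex
L^{s} = -\sum_{k} t_{k} \log y_{k}
```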

Next, as shown in FIG. 14, the feedback generation unit 30 propagates the cross-entropy error by backpropagation and modifies x^l, the output of any one of the plurality of layers of the multitask CNN (that is, the input of any one of the layers). The weights w, which are the parameters of the multitask CNN, are kept fixed.

Then, as shown in FIG. 15, the feedback generation unit 30 calculates the feature amount x^L by forward propagation using the modified output x^l, and calculates the loss function, with the target value as the correct answer, based on the calculated feature amount x^L.

The feedback generation unit 30 repeats the modification of the output of the selected layer by backpropagation, the calculation of the feature amount x^L, and the calculation of the loss function until the value of the loss function falls below a threshold.
When the value of the loss function falls below the threshold, the feedback generation unit 30 recalculates the key points based on the feature amount x^L obtained by forward propagation using the modified output x^l.

The feedback generation unit 30 then adds the difference between the key points extracted by the task processing unit 28 and the recalculated key points to the image information and outputs the result as feedback.

In this way, in the present embodiment, the difference between the current state, which is the evaluation result from the task processing unit 28, and the target value is propagated deep into the multitask CNN and reflected in the recalculation of the key points (see FIG. 16).
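The modification loop described above can be sketched in a few lines of Python. This is a minimal illustration only: the two-unit network, the weight values, the sigmoid evaluation head, the step size, the threshold of 0.1, and all function names are assumptions made here and are not taken from the embodiment. The sketch modifies an intermediate activation x^l by gradient descent while the weights stay fixed, then recomputes a key point from the modified activation.

```python
import math

# Hypothetical, already-learned weights; they are never updated below.
W_SHARED = [[0.8, -0.3], [0.2, 0.9]]  # shared layer: x^l -> x^L
W_CLS = [3.0, -2.0]                   # evaluation head: x^L -> "Smile" logit
W_KEY = [[2.0, 0.5], [-0.5, 1.0]]     # presentation head: x^L -> one (x, y) key point


def forward(x_l):
    """Forward propagation from x^l to x^L, the Smile probability, and the key point."""
    x_top = [math.tanh(sum(w * v for w, v in zip(row, x_l))) for row in W_SHARED]
    logit = sum(w * v for w, v in zip(W_CLS, x_top))
    p_smile = 1.0 / (1.0 + math.exp(-logit))
    key_point = [sum(w * v for w, v in zip(row, x_top)) for row in W_KEY]
    return x_top, p_smile, key_point


def loss_gradient(x_top, p_smile):
    """Gradient of the cross-entropy loss -log(p_smile) with respect to x^l."""
    d_logit = p_smile - 1.0
    d_pre = [d_logit * w * (1.0 - t * t) for w, t in zip(W_CLS, x_top)]
    return [sum(W_SHARED[i][j] * d_pre[i] for i in range(len(d_pre)))
            for j in range(len(W_SHARED[0]))]


def generate_feedback(x_l, threshold=0.1, step=0.5, max_iters=1000):
    """Modify x^l until the loss for the target 'Smile' is below the threshold."""
    x_l = list(x_l)
    x_top, p_smile, key_point = forward(x_l)
    original_key_point = key_point
    loss = -math.log(p_smile)
    for _ in range(max_iters):
        if loss < threshold:
            break
        grad = loss_gradient(x_top, p_smile)               # backpropagation to x^l
        x_l = [v - step * g for v, g in zip(x_l, grad)]    # modify the activation only
        x_top, p_smile, key_point = forward(x_l)           # forward propagation
        loss = -math.log(p_smile)
    difference = [new - old for new, old in zip(key_point, original_key_point)]
    return loss, difference
```

Calling `generate_feedback([-0.5, 0.5])` starts from an activation that the toy evaluation head scores as unlikely to be a smile and returns the final loss together with the displacement of the key point, which plays the role of the feedback.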

The output unit 40 presents the feedback obtained by the feedback generation unit 30.

<Operation of the multitask processing device according to the embodiment of the present invention>

Next, the operation of the multitask processing device 100 according to the embodiment of the present invention will be described. The multitask processing device 100 executes a learning process that learns the multitask CNN and a multitask process that presents feedback using the multitask CNN.

First, the learning process that learns the multitask CNN will be described. When the input unit 10 receives learning data, it stores the learning data in the learning data storage unit 22. The multitask processing device 100 then executes the learning processing routine shown in FIG. 17.

<Learning processing routine>
First, in step S100, the learning unit 24 learns the parameters w^s1, ..., w^sL, w_c, w_r of the multitask CNN, based on the learning data stored in the learning data storage unit 22, so as to minimize the cross-entropy error given by equation (6) above and the least-squares error given by equation (7) above. This yields a multitask CNN in which facial expression recognition and key point extraction share a feature space.

Next, in step S102, the learning unit 24 stores the parameters w^s1, ..., w^sL, w_c, w_r of the multitask CNN learned in step S100 in the model storage unit 26 and ends the learning processing routine.
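The objective minimized in step S100 combines a cross-entropy term for facial expression recognition with a squared-error term for key point extraction. The following sketch shows one way such a joint loss could be evaluated for a single training sample. How the two terms are combined is not stated in this passage, so the plain weighted sum, the `keypoint_weight` parameter, and the function names are assumptions.

```python
import math


def cross_entropy(class_probabilities, correct_class):
    """Cross-entropy error for one sample, given predicted class probabilities."""
    return -math.log(class_probabilities[correct_class])


def squared_error(predicted_key_points, true_key_points):
    """Sum of squared differences between predicted and annotated coordinates."""
    return sum((p - t) ** 2 for p, t in zip(predicted_key_points, true_key_points))


def multitask_loss(class_probabilities, correct_class,
                   predicted_key_points, true_key_points, keypoint_weight=1.0):
    """Joint loss: classification term plus (assumed) weighted regression term."""
    return (cross_entropy(class_probabilities, correct_class)
            + keypoint_weight * squared_error(predicted_key_points, true_key_points))
```

Because both terms depend on the shared layers, minimizing their sum updates the shared parameters with gradients from both tasks, which is what lets the two tasks share a feature space.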

Next, the multitask process that presents feedback using the multitask CNN will be described. The following takes as an example the case where the input unit 10 receives image information sequentially. When the input unit 10 receives image information, the multitask processing device 100 executes the multitask processing routine shown in FIG. 18. The multitask processing routine shown in FIG. 18 is an example of processing executed according to the pseudo chart of the algorithm shown in FIG. 19.

<Multitask processing routine>
In step S200, a target value for the evaluation by facial expression recognition is set. For example, "Smile" is set as the target value.

In step S202, the input unit 10 acquires the input image information.

In step S203, the task processing unit 28 applies the image information received in step S202 to the multitask CNN stored in the model storage unit 26 and calculates the feature amount of the image information.

In step S204, based on the feature amount calculated in step S203, the task processing unit 28 evaluates the image information by facial expression recognition using the multitask CNN stored in the model storage unit 26 and acquires the evaluation result as the current state.

In step S206, based on the feature amount calculated in step S203, the task processing unit 28 extracts the key points of the image information using the multitask CNN stored in the model storage unit 26.

In step S208, the feedback generation unit 30 determines whether the target value set in step S200 matches the current state acquired in step S204. If they match, the process proceeds to step S210. If they do not match, the process proceeds to step S212.

In step S210, the feedback generation unit 30 displays the key points extracted in step S206.

In step S212, based on the target value of the evaluation for facial expression recognition set in step S200, the multitask CNN stored in the model storage unit 26, and the outputs x^1, ..., x^L of the layers of the multitask CNN calculated in step S203, the feedback generation unit 30 recalculates the key points that the multitask CNN extracts when its facial expression recognition evaluates the input as the target value. Step S212 is implemented by the feedback generation processing routine shown in FIG. 20.

<Feedback generation processing routine>
In step S300, the feedback generation unit 30 calculates the loss function based on the feature amount x^L, the target value of the evaluation for facial expression recognition, and the multitask CNN stored in the model storage unit 26, and obtains the calculation result L^s.

In step S302, based on the calculation result L^s of the loss function obtained in step S300, the feedback generation unit 30 modifies the input x^l of a specific layer by backpropagation, for example according to the following equation (8).

(8)
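The body of equation (8) is not reproduced in this text. For orientation only, a gradient-based modification of a layer input with the weights held fixed is often written as the update below. The step size \eta is an assumed symbol, and the actual equation (8) in the original publication may take a different form.

```latex
x^{l} \leftarrow x^{l} - \eta \, \frac{\partial L^{s}}{\partial x^{l}}
```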

In step S304, the feedback generation unit 30 calculates the feature amount x^L by forward propagation based on the modification result of step S302.

In step S306, the feedback generation unit 30 determines whether the value of L^s calculated in step S300 is equal to or less than a preset threshold. If the value of L^s is equal to or less than the threshold, the process proceeds to step S307. If the value of L^s is greater than the threshold, the process returns to step S300.

In step S307, the feedback generation unit 30 extracts the key points of the image information.

In step S308, based on the feature amount x^L calculated in step S304, the feedback generation unit 30 recalculates the key points using the multitask CNN stored in the model storage unit 26, outputs the recalculated key points, and ends the feedback generation processing routine.

Next, the process returns to the multitask processing routine of FIG. 18. In step S214, the difference between the key points extracted in step S206 and the key points obtained in step S212 is displayed as feedback. For example, arrows pointing from the key points extracted in step S206 to the key points obtained in step S212 are displayed as feedback.
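The arrow feedback of step S214 amounts to one displacement vector per key point. The sketch below computes those vectors from two lists of (x, y) coordinates. The function name and the representation of an arrow as a (start point, displacement) pair are assumptions chosen for illustration, and drawing the arrows on the image is left out.

```python
def feedback_arrows(extracted_key_points, recalculated_key_points):
    """Return one (start, displacement) pair per key point.

    Each arrow starts at the key point extracted from the input image and
    points toward the corresponding recalculated key point.
    """
    return [((x0, y0), (x1 - x0, y1 - y0))
            for (x0, y0), (x1, y1) in zip(extracted_key_points, recalculated_key_points)]
```

For example, a mouth-corner key point at (10, 20) whose recalculated position is (12, 17) yields an arrow starting at (10, 20) with displacement (2, -3).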

In step S216, the feedback generation unit 30 determines whether to end the process. For example, when the input of image information has ended, the multitask processing routine ends. When the input of image information continues, the process returns to step S202.
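The control flow of steps S200 to S216 can be summarized as a loop over incoming images. In the sketch below the three callables stand in for the multitask CNN and the feedback generation routine. Their names and signatures, and the tagged tuples used as return values, are hypothetical and serve only to show the branching.

```python
def multitask_processing_routine(images, target, evaluate,
                                 extract_key_points, recalculate_key_points):
    """Loop over images; `target` is the value set in step S200."""
    feedback = []
    for image in images:                                  # S202, repeated via S216
        current_state = evaluate(image)                   # S203 and S204
        key_points = extract_key_points(image)            # S206
        if current_state == target:                       # S208
            feedback.append(("keypoints", key_points))    # S210
        else:
            new_points = recalculate_key_points(image, target)   # S212
            difference = [(nx - x, ny - y)
                          for (x, y), (nx, ny) in zip(key_points, new_points)]
            feedback.append(("difference", difference))   # S214
    return feedback
```

When the evaluation already matches the target, the routine only shows the extracted key points; the more expensive recalculation runs only for images whose current state differs from the target.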

[Experimental results]

Next, experimental results for the multitask processing device 100 according to the present embodiment will be described. As shown in FIG. 21, experiments were conducted using four types of multitask CNN, (a) to (d).

The multitask CNN of (a) modifies the output of a shallow layer. The multitask CNN of (b) modifies the output of a deeper layer than that of (a). The multitask CNN of (c) modifies the output of the deepest layer. Models (a) to (c) therefore differ in the position of the layer that is modified by backpropagation when feedback is generated. By contrast, the CNN of (d) separates facial expression recognition, which is the evaluation task, from key point extraction, which is the information presentation task, and the two tasks are learned by separate networks.

Table 1 shows the details of the network structure of the CNNs used in the experiments.

Further, as shown in FIG. 22, the evaluation of the direction in which the corners of the mouth move was defined using FACS (Facial Action Coding System). FACS defines criteria for the physical changes of the face that accompany a given facial expression. For example, the criteria are defined by associating expressions with movements, as in "Anger: Lip tightener, Happiness: Lip corner puller".

Table 2 summarizes the experimental results. As Table 2 shows, accuracy is higher when the layer modified by backpropagation is shallower and lower when it is deeper. This indicates that the relationship between key points, which are landmarks within the face region, and facial expressions, which represent the emotion of the face, is held in these layers (the shallow layers or later-stage layers of the multitask CNN). Model (d), in which the two tasks are handled by separate networks, gave the worst result. This shows that learning with a multitask model is useful.

FIG. 23 shows examples of feedback presented using the multitask CNN. "To happiness" denotes feedback for producing a smile, and the figure shows that the feedback is presented appropriately. It also shows that feedback can be generated adaptively for various inputs and target expressions.

As described above, the multitask processing device according to the embodiment of the present invention obtains outputs for a plurality of tasks by applying a multitask model for processing the plurality of tasks to the input image information, so that an appropriate evaluation result and appropriate key points for the image information can be obtained from a single model.

Further, the multitask processing device according to the embodiment of the present invention applies the input image information to a multitask CNN that performs an evaluation task of evaluating image information and an information presentation task of extracting key points from image information. By evaluating the input image information and extracting its key points in this way, an appropriate evaluation result and appropriate key points for the image information can be obtained from a single model.

Further, based on the evaluation result from the task processing unit, the target value of the evaluation by the evaluation task, and the multitask CNN, the multitask processing device according to the embodiment of the present invention calculates the key points that the information presentation task of the multitask CNN extracts when the evaluation task of the multitask CNN evaluates the input as the target value. By presenting the difference between the key points extracted by the task processing unit and the calculated key points, appropriate feedback can be presented.

Further, the multitask processing device according to the embodiment of the present invention learns a multitask CNN based on learning data comprising image information, evaluations of the image information, and key points of the image information. In this multitask CNN, the evaluation task of evaluating image information and the information presentation task of extracting key points from image information share a feature space, and both tasks are performed based on the feature amount of that feature space. A multitask model for obtaining appropriate evaluation results and key points can thereby be obtained.

Further, modifying the output of a later-stage layer among the plurality of layers of the multitask CNN modifies the output of a layer that holds the relationship between the evaluation of the image information and its key points, so that appropriate evaluation results and key points can be obtained. Among the later-stage layers, modifying a layer farther from the output layer means that many nonlinear transformations are applied before the output, so an output with greater diversity can be obtained. Conversely, modifying a layer closer to the output layer means that only a few nonlinear or linear transformations are applied before the output, so an output that directly reflects the feedback result can be obtained.

Further, the multitask CNN of the present embodiment produces two outputs from one input, and feedback can be generated by propagating information between the two outputs. Propagating deep into the network also allows the input image information to be reflected.

Further, feedback can be generated adaptively for diverse input data, and the "rules" needed for feedback generation can be learned from data.

The present invention is not limited to the embodiment described above, and various modifications and applications are possible without departing from the gist of the invention.

For example, the above embodiment uses a multitask CNN as an example of a multitask model, but the invention is not limited to this. Another multitask DNN may be used, or a multitask model other than a DNN may be used.

Further, the above embodiment uses image information as an example of data, but the invention is not limited to this. For example, signals such as acoustic signals or observation data used in the agricultural field may be used.

Further, the above embodiment describes a multitask processing device that performs both the learning process and the multitask process, but a multitask model learning device that performs the learning process and a multitask processing device that performs the multitask process may be configured as separate devices. In that case, the multitask model learning device includes the learning data storage unit 22 and the learning unit 24, and the multitask processing device includes the task processing unit 28 and the feedback generation unit 30.

Further, the above embodiment describes, as an example of the evaluation performed by the evaluation task, the case of evaluating whether or not a facial expression is a smile, but the invention is not limited to this. For example, "regression" (for facial expressions, a degree of smiling) or "ranking" (for facial expressions, smiling more than person A but less than person B) may be used as the evaluation performed by the evaluation task.
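For the regression and ranking variants mentioned above, the evaluation loss would change while the rest of the procedure could stay the same. The sketch below shows two loss functions commonly used for such tasks: a squared error for a smile degree and a pairwise hinge loss for ranking. Neither form is specified in the embodiment, so both functions, their names, and the `margin` parameter are assumptions.

```python
def regression_loss(predicted_degree, target_degree):
    """Squared error between a predicted and a target smile degree."""
    return (predicted_degree - target_degree) ** 2


def ranking_loss(score_higher, score_lower, margin=1.0):
    """Pairwise hinge loss.

    The loss is zero once the sample that should rank higher scores at least
    `margin` above the sample that should rank lower.
    """
    return max(0.0, margin - (score_higher - score_lower))
```

A target such as "smiling more than person A but less than person B" could then be expressed as the sum of two ranking terms, one for each comparison.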

Further, the above embodiment describes key point extraction from image information as an example of the information extraction performed by the information presentation task, but the invention is not limited to this. For example, the key points of the present embodiment are represented as points, but information represented as a region, such as a bounding box, may be extracted instead.

Further, the above embodiment describes the case of one evaluation task and one information presentation task, but the invention is not limited to this. There may be a plurality of evaluation tasks and a plurality of information presentation tasks. In that case, when target values are given for one or more evaluation tasks, the outputs of the information presentation tasks are calculated so that the evaluations approach the target values.

Further, the above embodiment describes the case where the difference between the key points extracted by the task processing unit 28 and the recalculated key points is presented as feedback, but the invention is not limited to this. For example, when a region related to the feedback is presented, the result recalculated for the target value may be presented as it is, rather than as a difference.
Further, the results of a plurality of information presentation tasks may be combined when presenting information. For example, when one task presents a region related to the feedback and another extracts key points, only the key points within the region may be presented.
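Combining a region-type output with key points, as in the last example, reduces to a containment test. The sketch below keeps only the key points that fall inside an axis-aligned box. The box layout (x_min, y_min, x_max, y_max), the inclusive boundaries, and the function name are assumptions made for illustration.

```python
def key_points_in_region(key_points, box):
    """Keep the (x, y) key points that lie inside the box, boundaries included."""
    x_min, y_min, x_max, y_max = box
    return [(x, y) for x, y in key_points
            if x_min <= x <= x_max and y_min <= y <= y_max]
```

With a region around the mouth, for instance, this would present only the mouth key points and suppress those around the eyes and nose.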

The multitask processing device described above has a computer system inside it. When a WWW system is used, the "computer system" also includes a homepage providing environment (or display environment).

Further, although this specification describes an embodiment in which the program is installed in advance, the program may also be provided stored on a computer-readable recording medium.

10 Input unit
20 Arithmetic unit
22 Learning data storage unit
24 Learning unit
26 Model storage unit
28 Task processing unit
30 Feedback generation unit
40 Output unit
100 Multitask processing device

Claims

A multitask processing device comprising:
a task processing unit that obtains, for input data, outputs for an evaluation task and an information presentation task by using a multitask DNN (Deep Neural Network) having a plurality of layers for processing a plurality of tasks including the evaluation task for evaluating data and the information presentation task for extracting information from the data; and
a feedback generation unit that recalculates the output of the information presentation task obtained when an input or an output of any one layer of the multitask DNN is modified so that the output of one or more evaluation tasks among the plurality of tasks approaches a target value, and that presents information regarding the output of the information presentation task based on the output of the information presentation task calculated by the task processing unit and the recalculated output of the information presentation task.

The multitask processing device according to claim 1, wherein
the task processing unit performs, on an input image, facial expression recognition of a person's face shown in the image and extraction of key points representing parts of the person's face, by using a multitask DNN having a plurality of layers for performing the facial expression recognition and the key point extraction, and
the feedback generation unit recalculates the key point extraction obtained when the input or the output of any one layer of the multitask DNN is modified so that the facial expression recognition approaches the target value, and presents feedback in which the difference between the key point extraction result obtained by the task processing unit and the recalculated key point extraction result is added to the image.

The multitask processing device according to claim 1 or 2, wherein the multitask DNN is a model that shares a feature space between the evaluation task and the information presentation task, performs the evaluation task of evaluating data based on a feature amount of the feature space, and performs the information presentation task of extracting information from data based on the feature amount of the feature space.

The multi-task processing device according to claim 1, wherein the data is image information.

The multitask processing device according to claim 1, further comprising a learning unit that learns, based on learning data composed of data and results of a plurality of tasks for the data, the multitask DNN, which shares a feature space among the plurality of tasks and performs the plurality of tasks based on a feature amount of the feature space,
wherein the task processing unit uses the multitask DNN learned by the learning unit to obtain outputs for the evaluation task and the information presentation task.

A program for causing a computer to function as each unit constituting the multitask processing device according to any one of claims 1 to 5.