JP7700951B2

JP7700951B2 - Image conversion device, method and program

Info

Publication number: JP7700951B2
Application number: JP2024502365A
Authority: JP
Inventors: 雄貴蔵内; 真奈笹川; 直紀萩山; 文香佐野; 隆二山本
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2022-02-25
Filing date: 2022-02-25
Publication date: 2025-07-01
Anticipated expiration: 2042-02-25
Also published as: JPWO2023162132A1; WO2023162132A1

Description

Embodiments of the present invention relate to image conversion devices, methods, and programs.

Non-Patent Document 1 discloses the possibility of manipulating emotional experience through real-time feedback of deformed facial expressions (facial expression transformation). In Non-Patent Document 1, the subject's face is tracked in real time and a natural facial expression transformation is applied. Non-Patent Document 1 uses the Rigid MLS (Moving Least Squares) method as the image transformation method to deform the facial expression in the facial image. The Rigid MLS method distorts an image by recognizing feature points in the image and moving them. Such a method is also disclosed in Non-Patent Document 2. Note that the facial image may be an image of the subject's face, an image extracted from the face of a computer-generated avatar, or the like.
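The Rigid MLS method mentioned above can be sketched as follows. This is a minimal illustration of the rigid variant of Moving Least Squares deformation (Schaefer et al., 2006), not the implementation used in Non-Patent Document 1: for each image point v it computes inverse-distance weights to the original feature points p, the weighted centroids p* and q* of the original and moved points, and the best-fitting rotation, then maps v rigidly. The weight exponent `alpha` and the function name are assumptions chosen for illustration.

```python
import math


def rigid_mls_point(v, p, q, alpha=1.0):
    """Map point v with rigid Moving Least Squares deformation.

    v: (x, y) point to deform; p: original control (feature) points;
    q: moved control points, same order as p; alpha: weight falloff (assumed).
    """
    vx, vy = v
    weights = []
    for (px, py), (qx, qy) in zip(p, q):
        d2 = (px - vx) ** 2 + (py - vy) ** 2
        if d2 == 0.0:
            # v sits on a control point, so it moves exactly to that point's target.
            return (float(qx), float(qy))
        weights.append(1.0 / d2 ** alpha)  # w_i = 1 / |p_i - v|^(2 * alpha)
    w_sum = sum(weights)
    # Weighted centroids p* and q*.
    psx = sum(w * px for w, (px, _) in zip(weights, p)) / w_sum
    psy = sum(w * py for w, (_, py) in zip(weights, p)) / w_sum
    qsx = sum(w * qx for w, (qx, _) in zip(weights, q)) / w_sum
    qsy = sum(w * qy for w, (_, qy) in zip(weights, q)) / w_sum
    # Rotation angle minimizing sum_i w_i |R p_hat_i - q_hat_i|^2 (2-D weighted Procrustes).
    dot = cross = 0.0
    for w, (px, py), (qx, qy) in zip(weights, p, q):
        hpx, hpy = px - psx, py - psy
        hqx, hqy = qx - qsx, qy - qsy
        dot += w * (hpx * hqx + hpy * hqy)
        cross += w * (hpx * hqy - hpy * hqx)
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # f(v) = R (v - p*) + q*
    dx, dy = vx - psx, vy - psy
    return (c * dx - s * dy + qsx, s * dx + c * dy + qsy)
```

Applying this function to every pixel (or, more cheaply, to a coarse grid that is then interpolated) yields the warped image; because the mapping is locally a rotation plus translation, it avoids the shearing that an affine MLS variant can introduce.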

Non-Patent Document 1: Shigeo Yoshida et al., "Manipulation of Emotional Experience by Real-time Deformed Facial Feedback," The Transactions of Human Interface Society, Vol. 17, No. 1, 2015.
Non-Patent Document 2: Tomas Jakab et al., "Unsupervised Learning of Object Landmarks through Conditional Image Generation," NIPS, 2018.

However, if the angle of the subject's face changes or part of the face is hidden so that the above feature points cannot be recognized, the facial expression conversion stops at an unnatural moment, and only unnaturally converted facial images are obtained. In other words, the facial expression shown in the facial image cannot be converted seamlessly.

This invention has been made in view of the above circumstances, and its object is to provide an image conversion device, method, and program that can seamlessly convert the facial expressions shown in facial images.

In order to solve the above problem, an image conversion device according to one aspect of the present invention comprises: a feature point recognition unit that recognizes feature points of facial parts recognized from an image including a human face; a change amount correction unit that corrects change amounts, each representing the amount of deformation of one of the feature points of the facial parts corresponding to a converted expression, used when converting the recognized facial expression into the converted expression, based on the ratio of the angle of the face from the front to the limit angle at which the face in the image can no longer be recognized from the front, and on the proportion of the entire face area that remains after excluding the areas of the face hidden by objects; and an expression conversion unit that obtains a converted image in which the person's facial expression is converted by deforming the feature points by the corrected change amounts.

In order to solve the above problem, an image conversion method according to one aspect is a method performed by an image conversion device that converts facial expressions in an image of a human face, and comprises: recognizing, by a feature point recognition unit of the image conversion device, feature points of facial parts recognized from an image including a human face; correcting, by a change amount correction unit of the image conversion device, change amounts, each representing the amount of deformation of one of the feature points of the facial parts corresponding to a converted expression, used when converting the recognized facial expression into the converted expression, based on the ratio of the angle of the face from the front to the limit angle at which the face in the image can no longer be recognized from the front, and on the proportion of the entire face area that remains after excluding the areas where the face is hidden by objects; and obtaining, by a facial expression conversion unit of the image conversion device, a converted image in which the person's facial expression is converted by deforming the feature points by the corrected change amounts.

According to the present invention, facial expressions appearing in facial images can be converted seamlessly.

FIG. 1 is a block diagram showing an example of the configuration of an image conversion device according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the hardware configuration of the image conversion device.
FIG. 3 is a diagram showing an example of facial feature points.
FIG. 4 is a diagram showing an example of a storage format of feature points.
FIG. 5 is a diagram showing an example of a storage format of change amounts.
FIG. 6 is a flowchart showing an example of the image conversion processing performed by the image conversion device.
FIG. 7 is a diagram showing an example of a neural network used by the display ratio calculation unit.
FIG. 8 is a diagram showing an example of grid cells (grid areas) processed by the display ratio calculation unit.

[One embodiment]
An embodiment of the present invention will now be described with reference to the drawings.
(Configuration example)
FIG. 1 is a block diagram showing an example of the configuration of an image conversion device according to an embodiment of the present invention.
In the example shown in Figure 1, an image conversion device 100 according to one embodiment of the present invention has an image acquisition unit 11, a feature point recognition unit 12, a face angle calculation unit 13, a display ratio calculation unit 14, a converted facial expression input unit 15, a change amount storage unit 16, a change amount correction unit 17, a facial expression conversion unit 18, and an image output unit 19.

The image acquisition unit 11 acquires a facial image of a user from, for example, an image captured by a web camera or from an avatar. The image acquisition unit 11 outputs the acquired facial image to the feature point recognition unit 12, the display ratio calculation unit 14, and the facial expression conversion unit 18.

The feature point recognition unit 12 receives as input the facial image acquired by the image acquisition unit 11 and recognizes the feature points of the facial parts recognized in that image. The feature point recognition method used by the feature point recognition unit 12 is described later. The feature point recognition unit 12 outputs the recognized feature points to the face angle calculation unit 13 and the change amount correction unit 17.

The face angle calculation unit 13 receives as input the feature points recognized by the feature point recognition unit 12 and calculates the angle of the face in the facial image, for example, the angle of the current position of the face center relative to its position when the face is facing forward (sometimes referred to as the face angle from the front). It outputs data on the calculated angle to the change amount correction unit 17.

The display ratio calculation unit 14 receives as input the facial image acquired by the image acquisition unit 11, calculates the proportion of the whole face that is hidden in that image, and outputs data on the calculated ratio to the change amount correction unit 17.

The converted facial expression input unit 15 acquires a converted facial expression (sometimes referred to as the converted facial expression to be converted), that is, the target expression, such as a smile, that the user specifies through a user interface such as a keyboard. The converted facial expression input unit 15 outputs the acquired converted facial expression to the change amount correction unit 17.

The change amount storage unit 16 stores in advance, for each target facial expression, change amounts representing the amount of deformation (movement of the coordinate values) of each feature point. A change amount indicates how far the coordinates of a feature point should be moved for the target expression. The change amounts can be determined in advance, for example, by a user applying facial expression transformation to an expressionless face in a specific facial image and adjusting the amounts until the expression looks natural.

The change amount correction unit 17 receives as input the feature points recognized by the feature point recognition unit 12, the face angle calculated by the face angle calculation unit 13, and the display ratio calculated by the display ratio calculation unit 14.
The change amount correction unit 17 also reads, from the change amount storage unit 16, the change amounts corresponding to the target expression indicated by the converted facial expression input from the converted facial expression input unit 15.
Based on the input feature points, face angle, and display ratio, the change amount correction unit 17 corrects the change amounts for the target expression using a formula described later, and outputs the corrected change amounts to the facial expression conversion unit 18.

The facial expression conversion unit 18 receives as input the change amounts corrected by the change amount correction unit 17, that is, the change amounts representing the deformation corresponding to the converted facial expression. It moves each feature point in the input facial image by that feature point's corrected change amount, thereby obtaining a facial image whose expression has been converted. The facial expression conversion unit 18 outputs the converted facial image to the image output unit 19.
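The point-moving step above can be sketched as follows. This is a minimal illustration assuming that feature points and corrected change amounts are both held as dictionaries keyed by feature point ID, in pixels; the function name is hypothetical. It only computes the target positions; an image warp (for example Rigid MLS) would then move the surrounding pixels accordingly.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]  # (x, y) in pixels


def apply_change_amounts(points: Dict[int, Point],
                         corrected: Dict[int, Point]) -> Dict[int, Point]:
    """Move each feature point by its corrected change amount (dx, dy).

    Feature points that have no entry in `corrected` are left where they are.
    """
    moved = {}
    for fid, (x, y) in points.items():
        dx, dy = corrected.get(fid, (0.0, 0.0))
        moved[fid] = (x + dx, y + dy)
    return moved
```

For example, moving the two mouth-corner points (IDs 49 and 55 in a 68-point scheme, used here purely as an illustration) outward and upward by a few pixels raises the corners of the mouth while leaving every other point untouched.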

The image output unit 19 receives the converted facial image from the facial expression conversion unit 18 and outputs it. Here, outputting includes, for example, storing the facial image in a storage medium, displaying it on a display, and transmitting it to another device via a communication network.

FIG. 2 is a diagram showing an example of the hardware configuration of the image conversion device 100.
The image conversion device 100 is implemented by a computer such as a personal computer, a smartphone, or a server computer. As shown in FIG. 2, the image conversion device 100 has a hardware processor 111A (sometimes simply referred to as a processor) such as a CPU (Central Processing Unit). Using a multi-core, multi-threaded CPU allows multiple information processes to be executed simultaneously. The processor 111A may also include multiple CPUs. In the image conversion device 100, a program memory 111B, a data memory 112, a communication interface 114, and an input/output interface 113 are connected to the processor 111A via a bus 115.

The communication interface 114 may include, for example, one or more wired or wireless communication modules. The communication interface 114 can communicate with other computers, web cameras, and the like connected via a cable or via a network (NW) such as a LAN (Local Area Network) or the Internet.

An input device 200 and an output device 300 are connected to the input/output interface 113. The input device 200 includes input devices such as a keyboard and a pointing device such as a mouse, as well as sensor devices such as a camera. The output device 300 is a display device such as a liquid crystal display or a CRT (Cathode Ray Tube) display. The input device 200 and the output device 300 may also be implemented as a so-called tablet-type input/display device. Such a device is formed by placing an input detection sheet using an electrostatic or pressure-sensitive method on the display screen of a display device using, for example, liquid crystal or organic EL (Electro Luminescence). The input/output interface 113 passes operation information entered on the input device 200 to the processor 111A and causes the output device 300 to display information generated by the processor 111A.

The input device 200 and the output device 300 need not be connected to the input/output interface 113. If each is provided with a communication unit for connecting to the communication interface 114 directly or via a network, the input device 200 and the output device 300 can exchange information with the processor 111A.

In addition, the input/output interface 113 may have a function for reading from and writing to a recording medium such as a semiconductor memory (for example, flash memory), or a function for connecting to a reader/writer that can read from and write to such a recording medium. The input/output interface 113 may also have a function for connecting to other devices.

The program memory 111B is a non-transitory tangible computer-readable storage medium that combines a non-volatile memory that can be written and read at any time with a non-volatile memory that can only be read. Examples of the writable non-volatile memory include an HDD (Hard Disk Drive) and an SSD (Solid State Drive); an example of the read-only non-volatile memory is a ROM (Read Only Memory). The program memory 111B stores the programs, such as an image conversion program, that the processor 111A needs in order to execute the various control processes according to the embodiment. That is, the processing functions of the image acquisition unit 11, feature point recognition unit 12, face angle calculation unit 13, display ratio calculation unit 14, converted facial expression input unit 15, change amount correction unit 17, facial expression conversion unit 18, and image output unit 19 can all be realized by having the processor 111A read and execute the image conversion program stored in the program memory 111B.
It should be noted that some or all of these processing functions may be implemented in a variety of other forms, including integrated circuits such as application specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs).

The data memory 112 is a tangible computer-readable storage medium that combines, for example, the above non-volatile memory with a volatile memory such as a RAM (Random Access Memory). The data memory 112 stores the various data acquired and created in the course of the various processes; areas for storing such data are allocated in the data memory 112 as needed.

FIG. 3 is a diagram showing an example of facial feature points. The stars in FIG. 3 are the feature points recognized by the processor 111A, and the number next to each star is a unique feature point ID (IDentifier) that identifies that feature point. The number of feature point IDs and the facial part to which each ID corresponds are determined by the feature point recognition method employed. For example, the feature point with ID "18" is predefined as the left end of the eyebrow on the left side of the image as seen by the viewer.

FIG. 4 is a diagram showing an example of the storage format of feature points. As shown in FIG. 4, the data memory 112 stores, in table format, the x and y coordinates of each feature point in the facial image in association with its feature point ID. Coordinates are in pixels. In the example of FIG. 3, therefore, the data memory 112 stores the x and y coordinates of the feature points with IDs "1" to "68".

The data memory 112 stores the converted facial expression specified by the user, which the processor 111A acquires when operating as the converted facial expression input unit 15 described above.
The data memory 112 may also hold the change amounts stored in the change amount storage unit 16 described above.

FIG. 5 is a diagram showing an example of the storage format of change amounts. As shown in FIG. 5, the data memory 112 stores, in table format for each converted facial expression, the change in the x coordinate and the change in the y coordinate of each feature point in association with its feature point ID. These change amounts are independent of the person who is the subject. Change amounts are in pixels and express both the direction and the magnitude of a feature point's movement. For example, a change amount of "+1" means a movement of one pixel in the positive direction.
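The FIG. 5 table can be pictured as a nested mapping from converted facial expression to feature point ID to a signed (dx, dy) pixel offset. The sketch below is a hypothetical in-memory layout: the expression name, the IDs, and the numbers are made up for illustration and do not come from the patent.

```python
from typing import Dict, Tuple

Offset = Tuple[int, int]  # signed (dx, dy) in pixels; "+1" = one pixel in the positive direction

# Hypothetical change amount table (cf. FIG. 5): expression -> feature point ID -> (dx, dy).
# The values are person-independent offsets; these particular numbers are illustrative only.
CHANGE_AMOUNTS: Dict[str, Dict[int, Offset]] = {
    "smile": {49: (-1, -2), 55: (+1, -2)},
}


def change_amounts_for(expression: str) -> Dict[int, Offset]:
    """Return the per-feature-point change amounts stored for a converted expression."""
    try:
        return CHANGE_AMOUNTS[expression]
    except KeyError:
        raise ValueError(f"no change amounts stored for {expression!r}") from None
```

Keying both this table and the FIG. 4 coordinate table by the same feature point ID lets the two be joined directly when the change amounts are applied.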

The data memory 112 may store the facial image converted by the processor 111A when it operates as the facial expression conversion unit 18 described above.
The data memory 112 may also store various intermediate data generated while the processor 111A operates.

(Operation)
Next, the operation of the image conversion device 100 will be described.
FIG. 6 is a flowchart showing an example of the image conversion processing performed by the image conversion device 100. The processor 111A of the image conversion device 100 starts the operation shown in this flowchart by reading and executing the image conversion program stored in the program memory 111B. Execution of the image conversion program by the processor 111A starts when an instruction to perform image conversion is received from the input device 200 via the input/output interface 113 or via the communication interface 114.

The processor 111A operates as the converted facial expression input unit 15 and waits for the user to specify a converted facial expression, that is, the target expression such as a smile (step S1). For example, the processor 111A determines whether an input signal from the input device 200, received via the input/output interface 113 or the communication interface 114, contains a specification of a converted facial expression. If such a specification has been input, the processor 111A proceeds to step S2.

The processor 111A stores the specified converted facial expression in the data memory 112 (step S2).

The processor 111A operates as the image acquisition unit 11 and acquires a facial image (step S3). For example, the processor 111A acquires, via the input/output interface 113, an image of the subject's face captured by the camera of the input device 200. Alternatively, the processor 111A acquires, via the communication interface 114, a facial image captured by a web camera connected to a network, or the face of an avatar generated by another computer. The processor 111A stores the acquired facial image in the data memory 112.

The processor 111A operates as the feature point recognition unit 12 and recognizes feature points in the facial image stored in the data memory 112 (step S4). The processor 111A recognizes the feature points using, for example, dlib's face_landmark_detection example (see, for example, http://dlib.net/face_landmark_detection.py.html). Specifically, the processor 111A extracts from the input facial image the distribution of luminance gradient directions known as HOG (Histogram of Oriented Gradients) features. Models trained on data that associates HOG features with the positions of facial feature points are publicly available, so the processor 111A inputs the extracted HOG features into such a trained model and obtains the positions of the facial feature points. The processor 111A stores the obtained feature point positions in the data memory 112.
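A minimal sketch of step S4 with dlib's Python bindings follows, assuming the publicly distributed 68-point model file `shape_predictor_68_face_landmarks.dat` is available locally. In dlib's own example, HOG features are used by the frontal face detector, and the landmarks themselves come from a separately trained `shape_predictor` model. dlib numbers the 68 points from 0, whereas FIG. 3 and FIG. 4 use IDs "1" to "68", so the helper `to_feature_table` (a hypothetical name) shifts the index by one to produce the FIG. 4 style table.

```python
from typing import Dict, Iterable, Optional, Tuple

Table = Dict[int, Tuple[int, int]]


def to_feature_table(points: Iterable[Tuple[int, int]]) -> Table:
    """Turn landmarks in detector order (0-based) into a table keyed by
    1-based feature point IDs, matching the "1" to "68" numbering of FIG. 3."""
    return {i + 1: (int(x), int(y)) for i, (x, y) in enumerate(points)}


def detect_feature_table(image, predictor_path="shape_predictor_68_face_landmarks.dat") -> Optional[Table]:
    """Detect the first face in a numpy image (RGB or grayscale) and return its landmarks."""
    import dlib  # imported lazily so to_feature_table works without dlib installed

    detector = dlib.get_frontal_face_detector()  # HOG + linear SVM face detector
    predictor = dlib.shape_predictor(predictor_path)
    faces = detector(image, 1)  # upsample once to catch smaller faces
    if len(faces) == 0:
        return None  # recognition failed, e.g. face turned away or hidden
    shape = predictor(image, faces[0])
    return to_feature_table((shape.part(i).x, shape.part(i).y) for i in range(shape.num_parts))
```

Returning `None` on detection failure makes explicit the situation this document is concerned with: when the face turns too far or is covered, no landmarks are available for that frame.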

The processor 111A operates as the face angle calculation unit 13 and calculates the angle of the face in the facial image using, for example, OpenCV (step S5).
Specifically, the processor 111A measures in advance the three-dimensional positions (P_3d) of the feature points of the facial parts when the face is facing forward, and holds them in the data memory 112.
The processor 111A obtains the current two-dimensional positions (P'_2d) of the feature points of the facial parts in the facial image.
The processor 111A calculates the two-dimensional positions (P_2d) that the feature points take when the three-dimensional positions (P_3d) are rotated or translated and projected onto the image.
The processor 111A calculates these two-dimensional positions using, for example, the ProjectPoints2 function of OpenCV (see, for example, http://opencv.jp/opencv-2svn/py/camera_calibration_and_3d_reconstruction.html#projectpoints2).

The processor 111A calculates the sum of squares (D) of the distances between the two-dimensional positions (P_2d) and the two-dimensional positions (P'_2d).
The processor 111A determines, by global optimization, the angle (and amount of translation) that minimizes this sum of squares D.

The processor 111A takes the angle (and amount of translation) that minimizes the sum of squares D as the face angle from the front (a), calculating it using, for example, OpenCV's solvePnP function (see, for example, http://opencv.jp/opencv-2svn/cpp/camera_calibration_and_3d_reconstruction.html#cv-solvepnp).
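The project-then-compare procedure of step S5 can be illustrated without OpenCV. The sketch below makes several simplifying assumptions that are not in the source: a hypothetical six-point frontal 3-D face model in millimetres, rotation about the vertical axis (yaw) only, a fixed distance to an ideal pinhole camera, and an exhaustive 1-degree search standing in for the global optimization. In practice `cv2.projectPoints` and `cv2.solvePnP` handle full 3-D rotation, translation, and lens distortion.

```python
import math

# Hypothetical frontal 3-D model (mm): nose tip, chin, outer eye corners, mouth corners.
MODEL_3D = [(0.0, 0.0, -50.0), (0.0, 70.0, 0.0), (-45.0, -35.0, 0.0),
            (45.0, -35.0, 0.0), (-30.0, 40.0, 0.0), (30.0, 40.0, 0.0)]


def project(points_3d, yaw_deg, tz=600.0, f=800.0, cx=320.0, cy=240.0):
    """P_2d: rotate the model about the vertical axis, place it tz mm in front of
    a pinhole camera (focal length f px, principal point cx, cy), and project."""
    t = math.radians(yaw_deg)
    c, s = math.cos(t), math.sin(t)
    out = []
    for x, y, z in points_3d:
        xr, zr = c * x + s * z, -s * x + c * z
        zc = zr + tz
        out.append((f * xr / zc + cx, f * y / zc + cy))
    return out


def sum_sq(p2d, p2d_obs):
    """D: sum of squared distances between projected and observed 2-D points."""
    return sum((u - uo) ** 2 + (v - vo) ** 2 for (u, v), (uo, vo) in zip(p2d, p2d_obs))


def estimate_yaw(p2d_obs, step=1):
    """Yaw in [-90, 90] degrees minimizing D (a crude stand-in for global optimization)."""
    return min(range(-90, 91, step), key=lambda a: sum_sq(project(MODEL_3D, a), p2d_obs))
```

Given observed landmarks `obs = project(MODEL_3D, 20)`, `estimate_yaw(obs)` recovers 20, because D is exactly zero there and positive at every other candidate angle.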

The processor 111A calculates in advance the limit face angle (A) up to which recognition is possible, as an angle independent of the person being photographed, by running the face recognition tool while moving the face and obtaining the feature point positions at the moment recognition fails. It holds this angle in the data memory 112.
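The claims use the ratio of the face angle a to the limit angle A. The sketch below shows one assumed way to obtain that ratio. `cv2.solvePnP` returns the rotation as a Rodrigues vector whose direction is the rotation axis and whose length is the rotation angle in radians, so the vector's norm gives a single angle a. Clipping the ratio to [0, 1] is an additional assumption not stated in the source.

```python
import math


def rotation_angle_deg(rvec):
    """Rotation angle in degrees of a Rodrigues vector (axis * angle in radians),
    e.g. rvec.ravel() from cv2.solvePnP."""
    return math.degrees(math.sqrt(sum(c * c for c in rvec)))


def angle_ratio(a_deg, limit_deg):
    """Ratio a / A of the face angle from the front to the recognition limit,
    clipped to [0, 1] (the clipping is an assumption)."""
    if limit_deg <= 0:
        raise ValueError("limit angle A must be positive")
    return min(max(a_deg / limit_deg, 0.0), 1.0)
```

With a face turned 15 degrees and a limit of 60 degrees, `angle_ratio(15, 60)` gives 0.25; a ratio approaching 1 signals that recognition is about to fail.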

Next, the processor 111A operates as the display ratio calculation unit 14 and calculates, for the facial image, the display ratio of the face, which is the proportion of the entire face area that is hidden by objects other than the face (step S6). For example, if 10% of the entire face is hidden by objects other than the face, the display ratio of the face is 10%.
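Once a detector has located the face and any occluding objects (for example a hand), the occluded share can be computed from their bounding boxes. The sketch below assumes axis-aligned integer boxes `(x0, y0, x1, y1)`, half-open on the right and bottom edges, and counts pixels so that overlapping occluders are not double-counted. The step S6 paragraph expresses the ratio as the hidden share, while the claims phrase it in terms of the area remaining after hidden regions are excluded. The function returns the hidden fraction, and the visible fraction is its complement.

```python
def occluded_fraction(face_box, occluder_boxes):
    """Fraction of the face box covered by the union of occluder boxes.

    Boxes are (x0, y0, x1, y1) in integer pixels, half-open on x1 and y1.
    Pixel counting is O(face area * occluders): simple, not fast.
    """
    fx0, fy0, fx1, fy1 = face_box
    total = (fx1 - fx0) * (fy1 - fy0)
    if total <= 0:
        raise ValueError("empty face box")
    hidden = 0
    for y in range(fy0, fy1):
        for x in range(fx0, fx1):
            if any(x0 <= x < x1 and y0 <= y < y1 for x0, y0, x1, y1 in occluder_boxes):
                hidden += 1
    return hidden / total
```

For a 10 × 10 face box with a hand box covering its left half, `occluded_fraction((0, 0, 10, 10), [(0, 0, 5, 10)])` is 0.5, and the visible share is `1.0 - 0.5`.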

An example of the calculation performed by the display ratio calculation unit 14 is described here with reference to FIGS. 7 and 8.

FIG. 7 is a diagram showing an example of a neural network used by the display ratio calculation unit. FIG. 8 is a diagram showing an example of the grid cells processed by the display ratio calculation unit. The example described here uses an input image containing animals and various objects, but the method applies equally when the image contains a human face and objects hiding the face, such as a hand or other objects.

In the example shown in Figures 7 and 8, the well-known YOLO (You Only Look Once) method may be used. YOLO is a general object detection method based on deep learning and is disclosed, for example, in the following document:
Joseph Redmon, et al., "YOLOv3: An Incremental Improvement", arXiv preprint arXiv:1804.02767, 2018.

In this method, the processor 111A resizes the face image to a square and inputs it into a CNN (Convolutional Neural Network), as shown in Figure 7. CNNs are widely used in image processing. The processor 111A extracts features from the face image through the 24 convolutional layers (Conv. Layer) and 4 pooling layers of the CNN (see reference sign a in Figure 7). The 2 fully connected layers (Conn. Layer) then estimate the bounding boxes of objects in the image and the probabilities of the object types (see reference sign b in Figure 7). The final output size of the convolutional layers, 7 × 7, matches the number of grid cells into which the image is divided.

The input image is divided into S × S grid cells, as shown in Figure 8 (see Figure 8(a)).
For each grid cell, the processor 111A estimates B object bounding boxes. For each bounding box, it outputs five values: the coordinates, width, and height (x, y, w, h) of the box, and a confidence score that the box contains an object (see Figure 8(b)).

The coordinates x and y are the center coordinates of the bounding box, measured from the boundary of the grid cell. The width w and height h are relative to the size of the entire image. The confidence score represents the probability that the bounding box contains an object rather than background: "1" for an object and "0" for background.
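The sketch below makes the coordinate convention concrete by converting one cell-relative prediction into pixel coordinates. It assumes that x and y are offsets in [0, 1] within grid cell (row, col), that w and h are fractions of the whole image, and that the image is divided into an S × S grid. The function name and argument order are illustrative only.

```python
def decode_box(row, col, x, y, w, h, S, img_w, img_h):
    """Convert a cell-relative YOLO-style prediction to pixel (cx, cy, bw, bh)."""
    cell_w, cell_h = img_w / S, img_h / S
    cx = (col + x) * cell_w  # x offset is measured from the cell's left edge
    cy = (row + y) * cell_h  # y offset is measured from the cell's top edge
    return cx, cy, w * img_w, h * img_h
```

For example, with a 448 × 448 image and S = 7, each cell is 64 pixels wide. A box at the center of cell (row 3, column 2) therefore has its center at pixel (160, 224).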

One index of the accuracy of object region estimation is IoU (Intersection over Union). It indicates how well the ground-truth bounding box and the estimated bounding box agree. In the YOLO method above, the confidence score of a bounding box represents this IoU.
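IoU is the area of the intersection of two boxes divided by the area of their union. The minimal sketch below assumes corner-format boxes (x1, y1, x2, y2).

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # 0 if the boxes do not overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Two 2 × 2 boxes offset by one unit in each direction share a 1 × 1 square, so their IoU is 1 / (4 + 4 - 1) = 1/7.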

The processor 111A also estimates object type probabilities for each grid cell. For example, given C classification classes, the processor estimates the probability that an object in a grid cell belongs to each class, i.e., the conditional probability given that the cell contains an object (see Figure 8(c)).
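Combining the per-box values with the per-cell class probabilities gives the size of the network's final output for one image. S = 7, B = 2, and C = 20 are the settings of the original YOLO paper (Redmon et al., 2016), used here only as an assumed illustration. Its 7 × 7 grid matches the output size mentioned for Figure 7.

$$
S \times S \times (5B + C) = 7 \times 7 \times (5 \cdot 2 + 20) = 7 \times 7 \times 30
$$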

By combining these class probabilities with the bounding boxes described above, the processor 111A obtains multiple bounding boxes, each labeled with the kind of object it contains (see Figure 8(d)).
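In YOLO, this combination is a multiplication. The class-specific score of a box is its confidence score times the conditional class probability of its grid cell. The minimal sketch below uses hypothetical class names and numbers.

```python
def class_scores(confidence, class_probs):
    """Multiply a box's confidence by each conditional class probability.

    Returns (best_class, best_score, all_scores).
    """
    scores = {name: confidence * p for name, p in class_probs.items()}
    best = max(scores, key=scores.get)  # class with the highest combined score
    return best, scores[best], scores
```

A box with confidence 0.8 in a cell whose probabilities are 0.9 for "face" and 0.1 for "hand" is labeled "face" with a score of 0.72.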

These bounding boxes include overlapping regions, so the processor 111A filters them with a technique called NMS (Non-Maximum Suppression), starting from the boxes with the highest confidence scores (see Figure 8(e)). NMS uses a threshold to suppress boxes whose IoU with an already selected box is large, i.e., boxes that overlap it heavily. The result is the set of detected object regions.
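The selection step can be sketched as greedy NMS. The loop repeatedly keeps the highest-scoring remaining box and discards every remaining box whose IoU with it exceeds the threshold. The threshold of 0.5 is an assumed default. Detectors usually run this per class, which the sketch omits.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, threshold=0.5):
    """Greedy non-maximum suppression; returns indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest remaining score
        keep.append(best)
        # Drop remaining boxes that overlap the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
    return keep
```

For example, two heavily overlapping boxes (IoU about 0.68) collapse to the higher-scoring one, while a distant third box is kept.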

When an object region overlaps the face region, the processor 111A calculates the display ratio of the face by dividing the area of the overlapping region by the area of the face region.
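The ratio is the overlapping area divided by the face area. When several objects are detected, their boxes may also overlap one another, so adding the individual overlaps could count some pixels twice. The sketch below avoids that by counting covered pixels on an integer grid. The box format (x1, y1, x2, y2) with half-open integer bounds is an assumption. The function returns the hidden proportion, which is the quantity H used in Formula (1).

```python
def hidden_ratio(face, occluders):
    """Fraction of the face box covered by at least one occluder box.

    Boxes are integer pixel rectangles (x1, y1, x2, y2), half-open.
    Counting pixels avoids double-counting where occluders overlap each other.
    """
    fx1, fy1, fx2, fy2 = face
    face_area = (fx2 - fx1) * (fy2 - fy1)
    if face_area <= 0:
        return 0.0
    covered = 0
    for y in range(fy1, fy2):
        for x in range(fx1, fx2):
            if any(ox1 <= x < ox2 and oy1 <= y < oy2
                   for ox1, oy1, ox2, oy2 in occluders):
                covered += 1
    return covered / face_area
```

A hand box covering the top tenth of a 10 × 10 face box yields 0.1, which matches the 10% example for step S6.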

Next, the processor 111A operates as the change amount correction unit 17. It reads out from the change amount storage unit 16 the change amount corresponding to the target facial expression. It then corrects this change amount based on the feature points recognized in step S4, the face angle calculated in step S5, and the display ratio calculated in step S6 (step S7).

Specifically, the processor 111A obtains the face angle a from the front and the limit face angle A up to which the face can be recognized. It also obtains the proportion H of the face that is hidden, relative to the entire face area. Based on these, it attenuates (i.e., corrects) the change amount of the facial expression conversion using the following Formula (1), and stores the corrected result in the data memory 112.
$$\Delta P_{\mathrm{new}} = \Delta P \cdot (1 - H) \cdot \frac{a}{A} \qquad (1)$$
On the left side of Formula (1), ΔP_new is the attenuated, i.e., corrected, change amount of the facial expression conversion. ΔP on the right side is the change amount before correction.

That is, in the above example, the corrected change amount is calculated based on both of the following:
(1) the ratio a/A of the face angle a from the front to the limit face angle A up to which the face can be recognized;
(2) the proportion H of the hidden area to the entire face area.
The calculation is not limited to this example. Within the range of allowable accuracy, the corrected change amount may be calculated based on only one of (1) the ratio a/A and (2) the proportion H.
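Formula (1) and its single-factor variants can be written as one function. The sketch applies the formula exactly as stated, multiplying by (1 - H) and by a/A. The flags `use_angle` and `use_occlusion` are hypothetical names for choosing which factors apply.

```python
def corrected_change(delta, H, a, A, use_angle=True, use_occlusion=True):
    """Apply Formula (1): dP_new = dP * (1 - H) * a / A.

    delta: (dx, dy) change amount of one feature point before correction.
    H: fraction of the face area hidden by other objects (0..1).
    a: face angle from the front; A: limit angle for recognition (A > 0).
    The flags allow using only one of the two factors, as the text permits.
    """
    factor = 1.0
    if use_occlusion:
        factor *= 1.0 - H
    if use_angle:
        factor *= a / A
    return delta[0] * factor, delta[1] * factor
```

With ΔP = (2, 4), H = 0.1, a = 30, and A = 60, the corrected change is (0.9, 1.8). Dropping the angle factor gives (1.8, 3.6).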

Feature points may become unrecognizable when the face angle changes or part of the face is hidden. Correcting the change amount in this way keeps the facial expression conversion from stopping at an unnatural moment in those cases, so the facial expression in the face image is converted naturally.

The processor 111A operates as the facial expression conversion unit 18 and converts the facial expression of the face image stored in the data memory 112 (step S8). That is, the processor 111A converts the face image based on the corrected change amount for the target facial expression stored in the data memory 112. For example, it may use an existing MLS implementation (see, for example, https://github.com/Jarvis73/Moving-Least-Squares).

Specifically, the processor 111A moves each feature point by its corrected change amount for the target facial expression, as stored in the data memory 112. For example, when the facial expression is converted to a smile, the control point with feature point ID "1" has coordinates (23, 45) before conversion (see Figure 4). The processor 111A adds 1 to the x coordinate and 2 to the y coordinate (see Figure 5), moving the pixel of that feature point to (24, 47).
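Step S8 begins by computing where each control point should move. The minimal sketch below assumes that feature points and corrected change amounts are held in dictionaries keyed by feature point ID, which is a hypothetical layout. Feature point ID 1 reproduces the (23, 45) to (24, 47) example above; ID 2 is made up.

```python
def move_feature_points(points, deltas):
    """Return target positions: each feature point shifted by its corrected change.

    points, deltas: dicts mapping feature point ID -> (x, y).
    IDs without a stored change amount stay where they are.
    """
    moved = {}
    for pid, (x, y) in points.items():
        dx, dy = deltas.get(pid, (0, 0))
        moved[pid] = (x + dx, y + dy)
    return moved
```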

Then, for the feature points, the processor 111A applies the affine transformation shown in Formula (2) below. This transformation family includes the Helmert transformation (similarity transformation) and rigid deformation.
$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix} \qquad (2)$$

In Formula (2), x and y are the coordinates of a nearby feature point, and x' and y' are those coordinates plus the change amount. a, b, c, and d are parameters, and t_x and t_y are translation parameters. The processor 111A computes the least-squares error between the transformed feature-point coordinates x, y and the displaced coordinates x', y'. It then obtains, by global optimization, the parameters a, b, c, d, t_x, and t_y that minimize this error. Finally, the processor 111A substitutes the coordinates of each target point to be transformed for x and y and computes the point's coordinates after the affine transformation using these parameters.
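The parameter fit can be sketched as ordinary least squares. The sketch assumes that Formula (2) has the standard affine form x' = a·x + b·y + t_x, y' = c·x + d·y + t_y, consistent with the parameter definitions above. The helper names are illustrative. The x' and y' rows share one 3 × 3 normal-equation matrix, which is solved here by Cramer's rule. The optional `weights` argument is an assumption not described in the text. In the Moving Least Squares method, these weights depend on the distance from each control point to the point being deformed, so the fit is repeated per target point. Leaving them equal gives the single global fit described here.

```python
def solve3(m, v):
    """Solve the 3x3 linear system m @ s = v by Cramer's rule."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    d = det(m)
    if d == 0:
        raise ValueError("degenerate (e.g. collinear) feature points")
    # Replace column k with v to get the k-th unknown.
    return [det([row[:k] + [v[i]] + row[k + 1:] for i, row in enumerate(m)]) / d
            for k in range(3)]

def fit_affine(src, dst, weights=None):
    """Least-squares (a, b, c, d, tx, ty) with x' = a*x + b*y + tx, y' = c*x + d*y + ty."""
    if weights is None:
        weights = [1.0] * len(src)
    m = [[0.0] * 3 for _ in range(3)]
    rx, ry = [0.0] * 3, [0.0] * 3
    for w, (x, y), (xp, yp) in zip(weights, src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                m[i][j] += w * row[i] * row[j]  # shared normal matrix
            rx[i] += w * row[i] * xp
            ry[i] += w * row[i] * yp
    a, b, tx = solve3(m, rx)
    c, d, ty = solve3(m, ry)
    return a, b, c, d, tx, ty

def apply_affine(params, x, y):
    """Transform one target point with the fitted parameters."""
    a, b, c, d, tx, ty = params
    return a * x + b * y + tx, c * x + d * y + ty
```

If the displaced points were generated by an exact affine map, `fit_affine` recovers that map's six parameters, and `apply_affine` then moves any other pixel consistently. Rigid or similarity variants would constrain the parameters further; this sketch fits the general affine case.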

The processor 111A stores the face image converted in this way in the data memory 112 as a converted image.

The processor 111A operates as the image output unit 19 and outputs the converted image stored in the data memory 112 (step S9). For example, the processor 111A causes the output device 300 to display the face image via the input/output interface 113. Alternatively, it transmits the face image over the network via the communication interface 114. The image is then displayed on a display device connected to the network, or on the display unit of another computer connected to the network.

The processor 111A determines whether to end the operation of the image conversion device 100 shown in the flowchart of Figure 6 (step S10). For example, the processor 111A checks whether the user has instructed it to end image conversion, either from the input device 200 via the input/output interface 113 or via the communication interface 114. If the operation is to end (YES in step S10), the processor 111A ends the operation shown in the flowchart of Figure 6.

If the operation is not to end yet (NO in step S10), the processor 111A operates as the conversion facial expression input unit 15. It determines whether the user has input a change to the target facial expression (step S11). If there has been no such input (NO in step S11), the processor 111A returns to step S3. If there has been such input (YES in step S11), the processor 111A returns to step S2.
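The branching of steps S10 and S11 can be summarized as a loop. In this minimal sketch, the incoming face images and the user's inputs are modeled as hypothetical Python sequences. `convert` is a hypothetical callable standing in for the per-frame processing that ends with the output in step S9. An input is `None` for no input, `"stop"` to end, or `("change", name)` to switch the target expression.

```python
def run_loop(frames, events, initial_expression, convert):
    """Sketch of the Figure 6 loop around steps S10 and S11.

    frames: face images in arrival order; events: one user input per frame.
    Returns the list of converted outputs.
    """
    expression = initial_expression           # target expression (step S2)
    outputs = []
    for frame, event in zip(frames, events):  # next face image (from step S3)
        outputs.append(convert(frame, expression))  # processing through S9
        if event == "stop":                   # S10: YES -> end
            break
        if isinstance(event, tuple) and event[0] == "change":
            expression = event[1]             # S11: YES -> back to S2
        # S11: NO -> back to S3 with the same target expression
    return outputs
```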

The image conversion device 100 according to the embodiment described above includes the face angle calculation unit 13, the display ratio calculation unit 14, the change amount correction unit 17, and the facial expression conversion unit 18. The facial expression conversion unit 18 transforms the feature points by a deformation amount corresponding to the target facial expression, obtaining a converted image in which the person's facial expression is converted.
Therefore, the image conversion device 100 according to the embodiment can convert the facial expression in a face image naturally. It does not stop the conversion at an unnatural moment even when feature points cannot be recognized because the face angle changes or part of the face is hidden.

[Other embodiments]
The present invention is not limited to the above embodiment.
For example, the flow of each process described above is not limited to the described procedure. The order of some steps may be changed, and some steps may be performed in parallel.

The process flow described above converts the facial expressions in face images as they are acquired in real time. It can equally be applied to converting the facial expressions in stored face images, without real-time processing.

The methods described in each embodiment can be implemented as a program (software means) executable by a computer. The program can be stored on a recording medium, such as a magnetic disk (Floppy (registered trademark) disk, hard disk, etc.), an optical disc (CD-ROM, DVD, MO, etc.), or a semiconductor memory (ROM, RAM, flash memory, etc.). It can also be distributed by transmission over a communication medium. The programs stored on the medium also include a setting program. The setting program configures in the computer the software means to be executed, including not only execution programs but also tables and data structures. The computer that realizes this device reads the program recorded on the recording medium and, in some cases, constructs the software means using the setting program. It then executes the processing described above with its operation controlled by the software means. The recording medium referred to in this specification is not limited to one for distribution. It also includes storage media, such as magnetic disks and semiconductor memories, provided inside the computer or in devices connected via a network.

The present invention is not limited to the embodiments described above and can be modified in various ways at the implementation stage without departing from the gist of the invention. The embodiments may also be combined as appropriate, in which case the combined effects are obtained. Furthermore, the embodiments include inventions at various stages, and various inventions can be extracted by appropriately combining the disclosed constituent elements. For example, some constituent elements may be deleted from all of those shown in the embodiments. If the problem can still be solved and the effect still obtained, the configuration without those elements can be extracted as an invention.

100: Image conversion device
11: Image acquisition unit
12: Feature point recognition unit
13: Face angle calculation unit
14: Display ratio calculation unit
15: Conversion facial expression input unit
16: Change amount storage unit
17: Change amount correction unit
18: Facial expression conversion unit
19: Image output unit
111A: Processor
111B: Program memory
112: Data memory
113: Input/output interface
114: Communication interface
115: Bus
200: Input device
300: Output device

Claims

1. An image conversion device comprising:
a feature point recognition unit that recognizes feature points of facial parts from an image including a human face;
a change amount correction unit that, when the recognized facial expression is converted into a target converted facial expression, corrects a change amount representing the amount of deformation of each of the feature points of the facial parts according to the converted facial expression, based on the ratio of the angle of the face from the front to the limit angle, measured from the front, at which the face in the image can no longer be recognized, and on the ratio of the area of the face excluding areas hidden by objects to the entire area of the face; and
an expression conversion unit that obtains a converted image in which the facial expression of the person is converted by transforming the feature points according to the corrected change amount.

2. The image conversion device according to claim 1, wherein the change amount correction unit corrects the change amount by multiplying a predetermined change amount for each of the feature points of the facial parts by the ratio of the angle of the face from the front to the limit angle, measured from the front, at which the face in the image can no longer be recognized, and by the ratio of the area of the face excluding the area hidden by an object to the total area of the face.

3. The image conversion device according to claim 1, wherein two-dimensional positions of the feature points of the facial parts are calculated for the case where their three-dimensional positions with the face facing forward are rotated or moved, and the angle at which the sum of squares of the distances between the calculated two-dimensional positions and the current two-dimensional positions of the feature points is minimized is calculated as the angle of the face from the front.

4. The image conversion device according to claim 1, further comprising:
a storage device in which a change amount representing the amount of deformation of each of the feature points is stored in advance for each converted facial expression; and
a conversion expression input unit for inputting the converted facial expression,
wherein the change amount correction unit reads out from the storage device the change amount corresponding to the input converted facial expression and corrects the change amount that has been read out.

5. An image conversion method performed by an image conversion device for converting a facial expression in an image of a human face, the method comprising:
recognizing, by a feature point recognition unit of the image conversion device, feature points of facial parts from an image including a human face;
correcting, by a change amount correction unit of the image conversion device, when the recognized facial expression is converted into a target converted facial expression, a change amount representing the amount of deformation of each of the feature points of the facial parts according to the converted facial expression, based on the ratio of the angle of the face from the front to the limit angle, measured from the front, at which the face in the image can no longer be recognized, and on the ratio of the area of the face excluding areas hidden by objects to the entire area of the face; and
obtaining, by a facial expression conversion unit of the image conversion device, a converted image in which the facial expression of the person is converted by transforming the feature points by the corrected change amount.

6. An image conversion processing program that causes a processor to function as each unit of the image conversion device according to claim 1.